Journal of Phonetics & Audiology

Open Access

ISSN: 2471-9455

Research Article - (2016) Volume 2, Issue 1

Virtual Space of English Consonants: Shorter Distance Produced by Japanese Learners of English

Kaoru Tomita*
Faculty of Literature and Social Sciences, Yamagata University, 1-4-12 Kojirakawa-machi, Yamagata 990-8560, Japan
*Corresponding Author: Kaoru Tomita, Faculty of Literature and Social Sciences, Yamagata University, 1-4-12 Kojirakawa-machi, Yamagata 990-8560, Japan, Tel: (023) 628-4793, Fax: (023) 628-4793

Abstract

This study investigates how Japanese learners of English pronounce the paired consonants /s/ and /S/, and /b/ and /v/, in English minimal-pair words whose counterparts are English-based loanwords in Japanese written in katakana. The frequency of the spectral peak, the duration, and the intensity of these consonants, as produced by six Japanese learners of English and six native English speakers, are measured with acoustic equipment. Among these phonetic features, significant differences in the frequency of the spectral peak between /s/ and /S/ are observed for both native English speakers and Japanese learners of English. There are also significant differences in duration between /b/ and /v/, and in intensity between /S/ and /s/ and between /b/ and /v/. The hypothesis that the distance between the values of these features for each consonant pair tends to be smaller for the Japanese learners of English than for the native English speakers is also verified. Implications for further research are briefly discussed.

Keywords: Consonant; Consonantal distance; Duration; Formant; Intensity

Introduction

Japanese and English have different phonological systems, which produce differences in timing: English is classified as a stress-timed language and Japanese as a mora-timed language. These differences might also cause differences in the length, manner and even place of consonant articulation. A standard textbook of Japanese-English phonetics and phonology states that the numbers of English and Japanese consonants do not differ greatly and that, broadly, the former are uttered with stronger breath than the latter.

The manners and places of articulation of English and Japanese consonants presented in Table 1, however, invite several questions about those explanations of Japanese and English phonological features.

                     Bilabial  Labio-dental  Dental  Alveolar  Post-alveolar  Palatal  Velar  Uvular  Glottal
Plosive              pb        -             td      td        -              -        kg     -       -
Affricate            -         -             -       ts        tSdZ           -        -      -       -
Nasal                m         -             n       n         -              -        N      N       -
Flap                 -         -             -       -         ɾ              -        -      -       -
Fricative            ɸ         fv            TD      sz        SZ             -        -      -       h
Approximant          -         -             -       ɹ         -              j        w      -       -
Lateral approximant  -         -             -       l         -              -        -      -       -

Table 1: Consonants that have the same manner and place of articulation in English and Japanese are presented in small letters, and those whose manner or place differs in large letters, with italics for English and non-italics for Japanese.

It is then expected that these different phonological systems would affect Japanese learners of English at some stages of learning. For example, one may ask to what extent Japanese speakers’ /S/ in she differs from that of English speakers in acoustic properties such as the frequency of the noise region. Transfer from one’s native language plays an important role in learning a foreign language [1]:

• Foreign accents are not the result of just “missing the mark” in random ways. To the contrary, careful inspection shows that the deviations between the goal and what is achieved are systematic; and can usually be attributed to the phonology, including the phonological rules, of one’s native language. The phenomenon of mispronunciations in a second language in ways attributable to the phonology of the first language is called transfer.

Transfer from a native language, for example from English-based loanwords in Japanese to English, has rarely been studied empirically [2]:

• The argument that loanwords in Japanese are of (great) detriment to learners of English has been generated from observations of errors and evidenced with anecdotes. These studies have focused on perceived interference in pronunciation and word meaning. As with Contrastive Analysis itself, there is little empirical evidence presented, only descriptions of gross, superficial features of produced language.

Consonantal transfer would be observable in phonetic features such as duration and intensity, and these can be analyzed together with the values of the formant, a concentration of acoustic energy.

Consonants are easy to describe in articulatory terms, whereas vowels are easier to describe in acoustic terms [3]. Consonants are also more difficult than vowels to measure with a single category [4]. Both, however, can be dealt with on the basis of the same concepts [3]:

• This use of different dimensions in the description of consonants and vowels suggests that the articulation of these two classes of sounds has little in common. However, the articulatory description of both consonants and vowels is largely based on location of constriction (“place of articulation” in consonants, “frontness” in vowels) and degree of constriction (“manner of articulation” in consonants, “height” in vowels).

Formant values are therefore used to describe the consonantal transfer observed in utterances by Japanese learners of English.

Consonants are generally classified by voicing, place of articulation and manner of articulation. The consonants /s/ and /S/, for example, are both voiceless sounds produced with a stream of air directed at the upper teeth, which creates noisy turbulent flow. Only their place of articulation differs. The sound /s/ is made by placing the tip or blade of the tongue at a location just forward of the alveolar ridge. The sound /S/ is made by placing the blade of the tongue at a location just behind the alveolar ridge.

By looking at the spectral noise region, we can tell /s/ from /S/: a high-frequency noise region is important for /s/ and a lower-frequency noise region for /S/. These noise regions span certain ranges. The energy for /s/ lies largely above 4,000 Hz, and that for /S/ begins lower, at around 2,500 Hz [5]. Identification of [s] appears to depend on energy peaks at about 5,000 and 8,000 Hz, whereas identification of [S] is related to a peak at about 2,500 Hz [4]. The spectral peaks of [s] and [S] also differ: [S] typically exhibits a mid-frequency spectral peak at around 2,500-3,500 Hz, whereas alveolar [s] is produced with a shorter anterior cavity than [S] and therefore displays a primary spectral peak at higher frequencies, ranging from 4,000 to 7,000 Hz [3].

The consonant /v/ is a voiced sound formed by touching the lower lip to the upper teeth, with a constriction tight enough that air passing through it flows turbulently, making a hissing noise. The consonant /b/ is a voiced sound formed with the two lips, with the airflow through the mouth momentarily closed off.

Consonants are also discussed in groups that are distinctive in their articulatory and acoustic properties: plosives, affricates, nasals, fricatives and approximants. Spectral change is a vital part of creating the characteristic timbre of consonants [6]. Plosives, for instance, are always coproduced with a vowel and are acoustically characterized by a period of silence (corresponding to the vocal tract closure), followed by a sudden broadband burst of energy (as the vocal tract constriction is released), followed by formant transitions that typically last about 50 msec.

Languages traditionally classified as stress-timed have low %V (the percentage of duration occupied by vowels) and high ΔC values (consonantal interval variability), while languages traditionally classified as syllable-timed or mora-timed have high %V and low ΔC values [6]. This would affect the duration of consonants in a stress-timed language when they are produced by learners whose native language is syllable-timed or mora-timed.
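The two rhythm measures mentioned above can be illustrated with a minimal sketch. It assumes that ΔC is taken as the standard deviation of consonantal interval durations, and the interval durations below are hypothetical values chosen only to contrast a mora-timed-like and a stress-timed-like pattern; they are not data from this study.

```python
from statistics import pstdev

def rhythm_metrics(intervals):
    """Return (%V, deltaC) for a list of (kind, duration_ms) intervals.

    kind is "V" for a vocalic interval and "C" for a consonantal one.
    deltaC is computed here as the population standard deviation of the
    consonantal interval durations (an assumption of this sketch).
    """
    vowels = [d for kind, d in intervals if kind == "V"]
    consonants = [d for kind, d in intervals if kind == "C"]
    percent_v = 100.0 * sum(vowels) / (sum(vowels) + sum(consonants))
    delta_c = pstdev(consonants)
    return percent_v, delta_c

# Hypothetical interval durations in msec (illustration only).
mora_like = [("C", 60), ("V", 90), ("C", 62), ("V", 88), ("C", 58), ("V", 92)]
stress_like = [("C", 150), ("V", 60), ("C", 40), ("V", 110), ("C", 210), ("V", 50)]

mora_pv, mora_dc = rhythm_metrics(mora_like)
stress_pv, stress_dc = rhythm_metrics(stress_like)
print(f"mora-like:   %V = {mora_pv:.1f}, deltaC = {mora_dc:.1f} msec")
print(f"stress-like: %V = {stress_pv:.1f}, deltaC = {stress_dc:.1f} msec")
```

In this toy contrast the mora-like sequence has the higher %V and the lower ΔC, which is the direction of the difference described in the paragraph.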

By looking at the duration of noise segments, we can tell /b/ from /v/. It is reported that when stops, affricates and fricatives are compared in an equivalent context, the fricatives generally have the longest noise segments. The interval from the release of a consonant constriction to the onset of voicing is larger in fricatives than in stops [7]. In a study of noise segment durations for stops, affricates and fricatives in Mandarin, Czech and German, the following durational boundaries were identified: 62 to 78 msec for the stop-affricate boundary, and 132 to 133 msec for the affricate-fricative boundary [8-10].
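As a rough illustration of how such durational boundaries separate the three manner classes, the sketch below places a single threshold at the midpoint of each reported range. The midpoints, the function name and the example durations are assumptions made for illustration; the cited boundaries come from Mandarin, Czech and German and are not claimed to hold for the English data of this study.

```python
# Midpoints of the reported boundary ranges (an assumption of this sketch).
STOP_AFFRICATE_MS = (62 + 78) / 2        # 70.0 msec
AFFRICATE_FRICATIVE_MS = (132 + 133) / 2  # 132.5 msec

def classify_noise_duration(duration_ms):
    """Label a noise segment by duration alone, using the midpoint thresholds."""
    if duration_ms < STOP_AFFRICATE_MS:
        return "stop"
    if duration_ms < AFFRICATE_FRICATIVE_MS:
        return "affricate"
    return "fricative"

# Hypothetical noise segment durations in msec.
for duration in (24, 100, 149):
    print(duration, classify_noise_duration(duration))
```

Duration alone is a coarse cue, so a classification of this kind is best read as a first approximation rather than a decision rule.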

On the basis of these phonological features, it is predicted that there will be significant differences in frequency of noise region, duration and intensity between the paired English consonants /S/ and /s/, or /b/ and /v/, as produced by both native English speakers and Japanese learners of English. In addition, for minimal-pair consonants such as /s/ and /S/, Japanese learners of English would produce /s/ with a noise region at lower frequencies, and /S/ with a noise region at higher frequencies, than native speakers of English do. For minimal-pair consonants such as /v/ and /b/, Japanese learners of English would produce /v/ with shorter duration, and /b/ with longer duration, than native speakers of English do. For both kinds of minimal-pair consonants, Japanese learners of English would produce /S/ with lower intensity and /s/ with higher intensity, or /b/ with lower intensity and /v/ with higher intensity, than native speakers of English do.

On the basis of these predictions, it is hypothesized that the distance between the locations that the blade of the tongue touches in producing /s/ and /S/, which can be called virtual consonant space, would be smaller for Japanese learners of English than for native English speakers. Likewise, the distance between the durations of /v/ and /b/, and the distance between the intensities of /S/ and /s/ or of /b/ and /v/, which can also be called virtual consonant space, would be smaller for Japanese learners of English than for native English speakers.

Experiment

Method

Subjects: Three female speakers of American English (hereafter FE1, FE2, FE3), three male speakers of American English (hereafter ME1, ME2, ME3), four female Japanese learners of English (FJ1, FJ2, FJ3, FJ4) and two male Japanese learners of English (MJ1, MJ2) participated in the experiment. The native English speakers, aged 20 or 23, came from Oklahoma, U.S.A., as exchange students on a one-year term. The Japanese learners of English, who were from northern Japan, were college students aged 20 or 21. On the basis of their TOEIC® scores, they were regarded as intermediate-level learners of college English.

Stimuli: The stimuli consisted of pairs of monosyllabic or disyllabic words. They were (1) veering /vI«¨IN/, (2) beer /bI«¨/, (3) view /vju:/, (4) boom /bu:m/, (5) virtual /vÎ:]u«l/, (6) bargain /bA:]gIn/, (7) seafood /si:]u:d/, (8) shifting /SIftIN/, (9) suit /su:]/, (10) shoot /Su:]/, (11) son /sÃn/, and (12) shutting /SÃtIN/. All these words except “veering” and “son” are used as English-based loanwords in Japanese, where they are pronounced as /bi±u/, /bju:/, /bu:mu/, /ba:tSa±u/, /ba:gen/, /si:u: do/, /siutingu/, /su:tsu/, /sju:to/, /sjaÂtingu/. Each stimulus item was printed in the carrier phrase I said “ ”. on a 21 cm × 30 cm sheet of paper.

Procedure

Each speaker was presented with test sheets and asked to produce each test item ten times, pronouncing each word as clearly as possible. This instruction was particularly important for the native English speakers, because elision of consonants is not uncommon in conversational speech.

Recordings were made of 1440 items (12 speakers × 12 items × 10 repetitions) in a sound-treated room using a Sony unidirectional dynamic microphone (F-V640) and a Marantz solid state recorder (PMD670). The microphone was positioned at a mouth-to-microphone distance of approximately 5 cm. Recordings lasted approximately one hour for each subject.

Acoustic measurements

The speech samples were analyzed using the Praat speech analysis software (http://www.praat.org). The sampling rate was 44.1 kHz with 16-bit resolution. For each consonant, the mean frequency of the noise region, the mean duration and the mean intensity were used for the analyses.
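The idea of locating a spectral peak can be sketched in a few lines. This is not Praat's procedure: it is a plain discrete Fourier transform over a Hann-windowed frame, applied to a synthetic two-tone signal that stands in for a frication segment. The function name, the 10 msec frame length and the test signal are assumptions made for illustration; only the 44.1 kHz sampling rate is taken from the text.

```python
import cmath
import math

def spectral_peak_hz(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(samples)
    # Hann window to reduce leakage from the frame edges.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    best_bin, best_magnitude = 0, -1.0
    for k in range(1, n // 2):
        coefficient = sum(
            w * cmath.exp(-2j * math.pi * k * i / n)
            for i, w in enumerate(windowed)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate / n

SAMPLE_RATE = 44100
FRAME_LENGTH = 441  # 10 msec frame, giving 100 Hz bin spacing

# Synthetic frame: a strong 5000 Hz component plus a weaker 2500 Hz component.
frame = [
    math.sin(2 * math.pi * 5000 * i / SAMPLE_RATE)
    + 0.3 * math.sin(2 * math.pi * 2500 * i / SAMPLE_RATE)
    for i in range(FRAME_LENGTH)
]
peak = spectral_peak_hz(frame, SAMPLE_RATE)
print(f"spectral peak: {peak:.0f} Hz")
```

With real frication noise the spectrum is far less clean than this two-tone signal, so some averaging or smoothing over frames would normally precede peak picking.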

Analyses

Frequency of noise region, duration and intensity were compared between subjects. An analysis of variance (ANOVA) with repeated measures was a basic tool used for mean comparisons.
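The degrees of freedom reported with the tables (1 and 18 for a within-speaker consonant comparison, 6 and 63 or 4 and 45 for a between-speaker comparison) are what a one-way layout with ten tokens per cell produces. The sketch below computes a plain one-way F-value to make that arithmetic concrete; it is not necessarily the repeated-measures procedure used in the study, and the token values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical spectral peaks (Hz) for ten tokens each of /s/ and /S/ by one speaker.
s_tokens = [7400, 7550, 7700, 7620, 7810, 7480, 7660, 7590, 7730, 7800]
sh_tokens = [4000, 4120, 4080, 4210, 3950, 4150, 4060, 4190, 4030, 4160]

f_value, df_between, df_within = one_way_anova([s_tokens, sh_tokens])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

Two groups of ten tokens give 1 and 18 degrees of freedom, and seven groups of ten give 6 and 63, matching the values stated in the table notes.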

Results

The frequencies range from 1129 to 10890 Hz. The durations range from 13 to 149 msec. The intensities range from 36 to 79 dB.

Frequency of noise regions for /s/ and /S/ comparison

The spectral noise region is measured by taking the frequency of the highest peak in a spectral slice. The mean frequencies of the spectral peak in hertz observed in the consonants /s/ and /S/ of the experimental words produced by each subject are shown in Tables 2 and 3. A 2 × 7 ANOVA with two consonants and seven female speakers and a 2 × 5 ANOVA with two consonants and five male speakers are performed to examine the speaker effect and the consonantal quality effect. As predicted, the comparison of /s/ and /S/ shows that the frequency of the noise region of /s/ is significantly higher than that of /S/ for all native English speakers and Japanese learners of English, except for MJ2’s seafood-shifting.

/s/-/S/     FE1     FE2     FE3     FJ1     FJ2     FJ3     FJ4     Mean  F-value(a)  p-value  Comparison
Seafood     7634    7994    10890   7288    2562    6560    5665    6941  27.35       <0.001   FE3>FE2, FE1, FJ1, FJ3, FJ4>FJ2
Shifting    4095    3714    1129    3609    3304    4724    5358    3704  18.12       <0.001   FJ4, FJ3>FE1, FE2, FJ1, FJ2>FE3
Mean        5864    5854    6009    5448    2931    5642    5511
F-value(b)  115.64  114.81  610.88  14.00   3.93    9.61    4.16
p-value     0.001   0.001   0.001   0.001   0.06    0.06    0.05
Suit        5572    6629    8635    5489    4545    6556    6185    6230  5.76        <0.001   FE3>FE2, FJ3, FJ4, FE1, FJ1
Shoot       4132    3639    3753    3698    3629    4201    3972    3860  6.65        <0.001   FJ3, FE1, FJ4>FE3, FJ1, FE2, FJ2
Mean        4847    5134    6194    4593    4087    5378    5078
F-value(b)  6.53    56.45   24.75   14.25   30.01   15.94   266.20
p-value     0.02    0.001   0.001   0.001   0.001   0.001   0.001
Son         7192    7694    9599    8424    6293    6932    5050    7312  18.85       <0.001   FE3, FJ1>FE2, FE1, FJ3, FJ2>FJ4
Shutting    4189    3501    3861    3762    4368    4568    6588    4405  38.78       <0.001   FJ4>FJ3, FJ2, FE1, FE3, FJ1>FE2
Mean        5690    5597    6730    6093    5330    5750    5819
F-value(b)  467.78  91.69   545.07  137.71  11.21   41.18   24.64
p-value     0.001   0.001   0.001   0.001   0.004   0.001   0.001

Table 2: Mean frequency of spectral peak for the female speakers [Hz]. (a) The degrees of freedom are all 6 and 63. (b) The degrees of freedom are all 1 and 18.

/s/-/S/     ME1     ME2     ME3     MJ1     MJ2     Mean  F-value(a)  p-value  Comparison
Seafood     8067    5603    6998    6670    4661    6400  13.83       <0.001   ME1, ME3>MJ1>ME2, MJ2
Shifting    4639    2781    4291    3513    4667    3978  42.91       <0.001   ME1, ME3>MJ1>MJ2>ME2
Mean        6353    4192    5644    5091    4664
F-value(b)  33.86   291.80  226.61  35.74   40.49
p-value     0.001   0.001   0.001   0.001   NS
Suit        6042    5516    5225    6881    4687    5670  10.42       <0.001   MJ1, ME1>ME2, ME3, MJ2
Shoot       4068    2541    3180    3196    3801    3357  19.90       <0.001   ME1, MJ2>MJ1, ME2>ME2
Mean        5055    4028    4202    5038    4244
F-value(b)  20.67   224.68  32.55   279.56  40.846
p-value     0.001   0.001   0.001   0.001   0.001
Son         6576    5830    5716    7733    5623    5695  27.34       <0.001   MJ1>ME1>ME2, ME3, MJ2
Shutting    4313    2700    4222    3629    4133    3799  42.91       <0.001   MJ1>ME1, ME3, MJ2>ME2
Mean        5444    4265    4969    5681    4878
F-value(b)  69.66   222.50  49.60   318.44  87.65
p-value     0.001   0.001   0.001   0.001   0.001

Table 3: Mean frequency of spectral peak for the male speakers [Hz]. (a) The degrees of freedom are all 4 and 45. (b) The degrees of freedom are all 1 and 18. NS: not significant.

Duration of segments for /b/ and /v/ comparison

The duration is measured by observing the spectrogram and taking the range over which energy is spread fairly evenly. The mean durations in milliseconds of the consonants /v/ and /b/ produced by each subject are shown in Tables 4 and 5. A 2 × 7 ANOVA with two consonants and seven female speakers and a 2 × 5 ANOVA with two consonants and five male speakers are performed to examine the speaker effect and the consonantal quality effect. As predicted, there are significant differences between the durations of /v/ and /b/ for both native English speakers and Japanese learners of English. The comparison of /v/ and /b/ shows that the duration of /v/ is significantly longer than that of /b/ for native English speakers, except for FE2’s view-boom and virtual-bargain, FE3’s view-boom, ME1’s view-boom, ME2’s view-boom, and ME3’s veering-beer and virtual-bargain. The comparison also shows that /v/ is significantly longer than /b/ for Japanese learners of English, except for FJ1’s view-boom and virtual-bargain, FJ2’s view-boom and virtual-bargain, FJ3’s virtual-bargain, FJ4’s veering-beer and virtual-bargain, MJ1’s veering-beer and view-boom, and MJ2’s veering-beer and virtual-bargain.

/v/-/b/     FE1     FE2    FE3    FJ1   FJ2    FJ3    FJ4    Mean  F-value(a)  p-value  Comparison
Veering     149     47     97     52    32     68     36     68    12.49       <0.001   FE1>FE3>FJ3, FJ1, FE2, FJ3, FJ2
Beer        24      17     44     38    53     45     37     37    75.23       <0.001   FJ2, FJ3, FE3, FJ1, FJ4, FE1>FE2
Mean        86      32     70     45    42     56     36
F-value(b)  695.71  80.73  94.13  6.87  17.81  5.58   0.09
p-value     0.001   0.001  0.001  0.01  0.001  0.03   NS
View        82      30     52     59    57     85     32     56    18.28       <0.001   FJ3, FE1>FJ1, FJ2, FE3>FJ4, FE2
Boom        29      26     48     52    59     60     51     46    13.98       <0.001   FJ2, FJ2, FJ1, FJ4, FE3>FE1, FE2
Mean        55      28     50     55    58     72     41
F-value(b)  24.59   3.24   0.41   2.41  0.32   12.08  11.74
p-value     0.001   NS     NS     NS    NS     0.003  0.003
Virtual     98      26     50     61    57     61     42     56    16.92       <0.001   FE1>FJ1, FJ3, FJ2, FE3>FJ4, FE2
Bargain     48      29     61     48    55     63     35     48    4.79        <0.001   FJ3, FE3, FJ1, FJ2, FE1, FJ4>FE2
Mean        73      27     55     54    56     62     38
F-value(b)  11.07   0.68   8.25   1.49  0.06   0.06   3.64
p-value     0.004   NS     0.010  NS    NS     NS     NS

Table 4: Mean duration for the female speakers [msec]. (a) The degrees of freedom are all 6 and 63. (b) The degrees of freedom are all 1 and 18. NS: not significant.

/v/-/b/     ME1    ME2    ME3   MJ1    MJ2   Mean  F-value(a)  p-value  Comparison
Veering     30     71     44    25     57    45    2.12        NS
Beer        13     23     38    29     41    28    22.44       <0.001   MJ2, ME3>MJ1, ME2>ME1
Mean        21     47     41    27     34
F-value(b)  61.28  84.16  1.68  2.91   0.52
p-value     0.001  0.001  NS    NS     NS
View        26     33     50    35     41    37    6.66        <0.001   ME3, MJ2>MJ1, ME2, ME1
Boom        20     26     40    39     30    31    11.54       <0.001   ME3, MJ1>MJ2, ME2, ME1
Mean        23     29     45    37     35
F-value(b)  3.46   2.80   4.84  0.56   4.48
p-value     NS     NS     0.04  NS     0.04
Virtual     24     29     49    38     35    35    24.92       <0.001   ME3>MJ1, MJ2>ME2, ME1
Bargain     14     39     43    48     33    35    18.59       <0.001   MJ1, ME3, ME2>MJ2>ME1
Mean        19     34     46    43     34
F-value(b)  19.72  12.18  1.07  6.20   0.40
p-value     0.001  0.003  NS    0.023  NS

Table 5: Mean duration for the male speakers [msec]. (a) The degrees of freedom are all 4 and 45. (b) The degrees of freedom are all 1 and 18. NS: not significant.

Intensity of segments for /S/ and /s/ or /b/ and /v/ comparison

The intensity is measured by observing the spectrogram and taking energy values. The mean intensities in decibels of the consonants /S/ and /s/, or /b/ and /v/, produced by each subject are shown in Tables 6 and 7. A 2 × 7 ANOVA with two consonants and seven female speakers and a 2 × 5 ANOVA with two consonants and five male speakers are performed to examine the speaker effect and the consonantal effect. As expected, the comparison of /S/ and /s/ shows that the intensity of /S/ is significantly higher than that of /s/, and the comparison of /b/ and /v/ shows that the intensity of /b/ is significantly higher than that of /v/, for both native English speakers and Japanese learners of English. For native English speakers, the intensity of /S/ is significantly higher than that of /s/ except for ME1’s shoot-suit and shutting-son and ME3’s shifting-seafood and shutting-son, and the intensity of /b/ is significantly higher than that of /v/ except for FE3’s bargain-virtual, ME2’s bargain-virtual, and ME3’s beer-veering. For Japanese learners of English, six of the 18 /S/ and /s/ comparisons and 12 of the 18 /b/ and /v/ comparisons do not show a significant difference.

/S/-/s/     FE1    FE2    FE3    FJ1     FJ2    FJ3    FJ4   Mean  F-value(a)  p-value  Comparison
Shifting    56     49     54     54      67     40     47    52    85.49       <0.001   FJ2>FE1, FE3, FJ1>FJ4, FE2>FJ3
Seafood     56     42     47     43      73     38     46    43    115.40      <0.001   FJ2>FE1>FE3, FJ1, FJ4>FE2, FJ3
Mean        56     45     50     48      70     39     46
F-value(b)  0.23   72.90  46.97  102.93  22.19  0.52   0.52
p-value     NS     0.001  0.001  0.001   0.001  NS     NS
Shoot       61     51     51     56      48     43     46    50    39.82       <0.001   FE1>FJ1>FE2, FE3>FJ2, FJ4>FJ3
Suit        59     44     47     49      58     37     45    48    36.08       <0.001   FE1, FJ2>FJ1, FE3, FJ4, FE2>FJ2
Mean        60     47     49     52      53     40     45
F-value(b)  7.83   22.25  15.17  16.33   13.87  26.45  0.01
p-value     0.010  0.001  0.001  0.001   0.002  0.001  NS
Shutting    60     47     53     54      42     42     46    48    46.65       <0.001   FE1>FJ1, FE3>FE2, FJ4>FJ2, FJ3
Son         57     40     47     45      56     36     44    46    46.96       <0.001   FE1, FJ2>FE3, FJ1, FJ4>FE2, FJ3
Mean        58     43     50     49      49     39     45
F-value(b)  8.32   39.03  32.54  42.43   58.31  25.79  1.03
p-value     0.010  0.001  0.001  0.001   0.001  0.001  NS

/b/-/v/     FE1     FE2    FE3     FJ1    FJ2    FJ3    FJ4    Mean  F-value(a)  p-value  Comparison
Beer        73      53     59      72     72     57     62     64    90.89       <0.001   FE1, FJ1, FJ2>FJ4, FE3>FJ3>FE2
Veering     57      48     57      70     70     53     68     60    118.61      <0.001   FJ1, FJ2, FJ4>FE1, FE3>FJ3>FE2
Mean        65      50     58      71     71     55     65
F-value(b)  138.44  17.66  13.88   5.86   2.43   7.48   20.48
p-value     0.001   0.001  0.002   0.026  NS     0.014  0.001
Boom        77      59     66      73     76     54     64     67    71.57       <0.001   FE1, FJ2, FJ1>FE3, FJ4>FE2, FJ3
View        58      50     57      71     70     45     64     59    134.83      <0.001   FJ1, FJ2>FJ4>FE1, FE3>FE2>FJ2
Mean        67      54     61      72     73     49     64
F-value(b)  436.50  50.87  183.50  2.76   31.22  43.70  0.18
p-value     0.001   0.001  0.001   NS     0.001  0.001  NS
Bargain     70      57     62      65     74     55     68     64    17.49       <0.001   FJ2, FE1, FJ4>FJ1, FE3>FE2, FJ3
Virtual     56      53     59      68     73     52     67     61    38.67       <0.001   FJ2, FJ1>FJ4>FE3, FE1>FE2, FJ3
Mean        63      55     60      66     73     53     67
F-value(b)  22.12   15.69  1.39    0.63   1.18   1.18   0.042
p-value     0.001   0.001  NS      NS     NS     NS     NS

Table 6: Mean intensity for the female speakers [dB]. (a) The degrees of freedom are all 6 and 63. (b) The degrees of freedom are all 1 and 18. NS: not significant.

/S/-/s/     ME1   ME2    ME3    MJ1    MJ2    Mean  F-value(a)  p-value  Comparison
Shifting    52    48     68     48     53     53    78.94       <0.001   ME3>MJ2, ME1>ME2, MJ1
Seafood     49    44     65     40     54     58    89.58       <0.001   ME3>MJ2>ME1>ME2, MJ1
Mean        50    46     66     44     53
F-value(b)  4.49  18.76  3.71   22.37  1.81
p-value     0.04  0.001  NS     0.001  NS
Shoot       56    51     62     49     53     54    36.88       <0.001   ME3>ME1, MJ2>ME2, MJ1
Suit        54    45     56     40     50     49    23.43       <0.001   ME3, ME1>MJ2, ME2>MJ1
Mean        55    48     59     44     51
F-value(b)  2.07  38.71  40.09  14.04  2.81
p-value     NS    0.001  0.001  0.001  NS
Shutting    54    48     66     45     46     51    99.22       <0.001   ME3>ME1, MJ2>ME2, MJ1
Son         52    45     66     40     46     49    90.36       <0.001   ME3>ME1>MJ2, ME2>MJ1
Mean        53    46     66     42     46
F-value(b)  1.88  10.96  0.00   11.83  44.11
p-value     NS    0.004  NS     0.003  0.001

/b/-/v/     ME1     ME2    ME3    MJ1   MJ2    Mean  F-value(a)  p-value  Comparison
Beer        76      58     71     60    74     67    89.97       <0.001   ME1, MJ2>ME3>MJ1, ME2
Veering     62      48     69     58    70     61    50.25       <0.001   MJ2, ME3>ME1, MJ1>ME2
Mean        69      53     70     59    72
F-value(b)  67.63   42.80  2.20   2.20  8.32
p-value     0.001   0.001  NS     NS    0.010
Boom        79      61     79     58    65     68    65.31       <0.001   ME1, ME3>MJ2, ME2>MJ1
View        60      55     67     57    61     60    13.22       <0.001   ME3>MJ2, ME1, MJ1>ME2
Mean        69      58     73     57    63
F-value(b)  432.01  4.79   38.10  0.78  1.72
p-value     0.001   0.001  0.001  NS    NS
Bargain     76      59     79     61    67     68    69.95       <0.001   ME3, ME1>MJ2>MJ1, ME2
Virtual     67      57     75     62    66     65    41.87       <0.001   ME3>ME1, MJ2>MJ1>ME2
Mean        71      58     77     61    66
F-value(b)  44.41   2.99   7.66   2.59  0.32
p-value     0.001   NS     0.013  NS    NS

Table 7: Mean intensity for the male speakers [dB]. (a) The degrees of freedom are all 4 and 45. (b) The degrees of freedom are all 1 and 18. NS: not significant.

Discussion

This study provides precise descriptions of the qualities of English consonants produced by Japanese learners of English, using four consonants, /s/, /S/, /b/ and /v/, whose frequency of spectral peak, duration and intensity are measured. There are significant differences between the frequencies of the spectral peak of /s/ and /S/ produced by both native English speakers and Japanese learners of English. There are significant differences between the durations of /b/ and /v/ produced by both groups, and there are also significant differences between the intensities of /S/ and /s/, and of /b/ and /v/, produced by both groups. However, the number of cases that do not show a significant difference is not the same for the two groups. There are many more cases for Japanese learners of English than for native English speakers in which the paired consonants, /s/ and /S/ or /b/ and /v/, do not show a significant difference.

English distinguishes /b/ and /v/, but Japanese has no /v/, only /b/. English also distinguishes /s/ and /S/, but Japanese does not; it has only /s/. The Japanese /s/, however, is palatalized in some contexts [11]:

• Palatalization of consonants before /i/ is regular in Japanese, and its effect is especially notable with /s/, /z/, /t/, /d/, /n/, /h/. This can threaten intelligibility when transferred to English consonants preceding /i:/ and /I/.

Given this difference between the two languages, the virtual distance pictured by the frequency of spectral peak of the constituent consonants, by their duration, or by their intensity is expected to differ between the native English speakers and the Japanese learners of English.

Shorter consonant-distance hypotheses

The distance in the frequency of spectral peak between the paired consonants /s/ and /S/ is compared for each pair of words, which occur in the phonological contexts __ /I/, __ /u/ and __ /A/. The mean distance in frequency of spectral peak for the consonants in the paired words is shown in Tables 8 and 9. A 1 × 7 ANOVA with one paired distance of consonants and seven female speakers and a 1 × 5 ANOVA with one paired distance of consonants and five male speakers are performed to examine the speaker effect.

/s/-/S/           FE1   FE2   FE3   FJ1   FJ2    FJ3   FJ4    Mean  F-value  p-value  Comparison
Seafood-shifting  3539  4280  6884  3679  -7422  1836  3065   2265  25.54    <0.001   FE3, FE2, FJ1>FE1, FJ4>FJ3>FJ2
Suit-shoot        1439  2989  4881  1791  9162   2355  2213   3547  5.78     <0.001   FJ2>FE3, FE2>FJ3, FJ4, FJ1, FE1
Son-shutting      3003  4193  5738  4662  1925   2364  -1538  2477  40.56    <0.001   FE3, FJ1>FE2, FE1>FJ3, FJ2>FJ4

Table 8: Mean distance of frequency of spectral peak between a pair of consonants for female speakers [Hz]. The degrees of freedom are all 6 and 63.
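The notion of distance used here can be made concrete with a small sketch: the distance for a word pair is the mean spectral peak of the /s/ word minus that of the /S/ word. The values below are the seafood and shifting means of four speakers copied from Table 2, a subset chosen for illustration; the function and variable names are not the author's.

```python
# Mean spectral peaks (Hz) for seafood (/s/) and shifting (/S/), copied from Table 2.
seafood_shifting = {
    "FE1": (7634, 4095),
    "FE2": (7994, 3714),
    "FJ1": (7288, 3609),
    "FJ3": (6560, 4724),
}

def pair_distance(s_peak_hz, sh_peak_hz):
    """Distance between the paired consonants, /s/ minus /S/, in Hz."""
    return s_peak_hz - sh_peak_hz

distances = {speaker: pair_distance(*peaks) for speaker, peaks in seafood_shifting.items()}

def group_mean(prefix):
    """Average the distances of the speakers whose label starts with prefix."""
    values = [d for speaker, d in distances.items() if speaker.startswith(prefix)]
    return sum(values) / len(values)

print(distances)
print("native subset mean:", group_mean("FE"))
print("learner subset mean:", group_mean("FJ"))
```

For these four speakers the computed distances agree with the corresponding seafood-shifting entries of Table 8, and the learner subset shows the smaller mean distance.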

The distance in duration between the paired consonants /b/ and /v/ is compared for each pair of words, which occur in the phonological contexts __ /I/, __ /u/ and __ /A/. The mean distance in duration for the consonants in the paired words is shown in Tables 10 and 11. A 1 × 7 ANOVA with one paired distance of consonants and seven female speakers and a 1 × 5 ANOVA with one paired distance of consonants and five male speakers are performed to examine the speaker effect.

The distance in intensity between the paired consonants /s/ and /S/, or /b/ and /v/, is compared for each pair of words, which occur in the phonological contexts __ /I/, __ /u/ and __ /A/. The mean distance in intensity for the consonants in the paired words is shown in Tables 12 and 13. A 1 × 7 ANOVA with one paired distance of consonants and seven female speakers and a 1 × 5 ANOVA with one paired distance of consonants and five male speakers are performed to examine the speaker effect.

/s/-/S/           ME1   ME2   ME3   MJ1   MJ2   Mean  F-value  p-value  Comparison
Seafood-shifting  3428  2821  2706  3156  693   2560  12.15    <0.001   ME1, MJ1, ME2, ME3>MJ2
Suit-shoot        1974  2975  2045  3684  886   2312  10.99    <0.001   MJ1, ME2>ME3, ME1>MJ2
Son-shutting      2263  3129  1493  4104  1490  2495  31.52    <0.001   MJ1>ME2>ME1, ME3, MJ2

Table 9: Mean distance of frequency of spectral peak between a pair of consonants for male speakers [Hz]. The degrees of freedom are all 4 and 45.

/s/-/S/           ME1   ME2   ME3   MJ1   MJ2   Mean  F-value  p-value  Comparison
Seafood-shifting  3428  2821  2706  3156  693   319   68.20    <0.001   FE1>FE3, FE2>FJ3>FJ1, FJ4>FJ2
Suit-shoot        1974  2975  2045  3684  886   101   13.88    <0.001   FE1>FJ3, FJ1, FE3, FE2>FJ2, FJ4
Son-shutting      2263  3129  1493  4104  1490  81    15.65    <0.001   FE1>FJ1, FJ4, FJ2, FJ3, FE2>FE3

Table 10: Mean distance of duration between a pair of consonants for female speakers [msec]. The degrees of freedom are all 6 and 63.

/v/-/b/          ME1  ME2  ME3  MJ1   MJ2  Mean  F-value  p-value  Comparison
Veering-beer     164  485  58   -45   -74  117   37.38    <0.001   ME2>ME1, ME3>MJ1, MJ2
View-boom        55   69   100  -35   112  60    2.35     NS
Virtual-bargain  95   -99  56   -100  16   -6    6.19     <0.001   ME1, ME3, MJ2>ME2, MJ1

Table 11: Mean distance of duration between a pair of consonants for male speakers [msec]. The degrees of freedom are all 4 and 45. NS not significant.

Pair              FE1  FE2  FE3  FJ1  FJ2   FJ3  FJ4  Mean  F-value  p-value  Comparison
Shifting-seafood  6    72   73   110  -65   16   19   33    15.31    <0.001   FJ1, FE3, FE2>FJ4, FJ3, FE1>FJ2
Shoot-suit        22   69   42   67   -97   58   2    23    15.45    <0.001   FE2, FJ1, FJ3, FE3, FE1>FJ4>FJ2
Shutting-son      35   70   61   94   -145  56   22   27    35.76    <0.001   FJ1, FE2, FE3, FJ3>FE1, FJ4>FJ2
Beer-veering      156  51   18   24   18    38   -60  35    36.39    <0.001   FE1>FE2, FJ3, FJ1, FE3, FJ2>FJ4
Boom-view         187  91   96   23   56    93   -9   76    27.33    <0.001   FE1>FE3, FJ3, FE2>FJ2, FJ1>FJ4
Bargain-virtual   146  37   2    -29  8     36   2    28    6.47     <0.001   FE1>FE2, FJ3, FJ2, FE3, FJ4, FJ1

Table 12: Mean distance of intensity between a pair of consonants for female speakers [dB]. The degrees of freedom are all 6 and 63.

Pair              ME1  ME2  ME3  MJ1  MJ2  Mean  F-value  p-value  Comparison
Shifting-seafood  30   42   28   87   -16  34    8.14     <0.001   MJ1, ME2>ME1, ME3, MJ2
Shoot-suit        18   61   60   6    34   51    3.71     <0.011   MJ1, ME2, ME3, MJ2>ME1
Shutting-son      20   36   0    50   88   38    6.28     <0.001   MJ2, MJ1, ME2, ME1, ME3
Beer-veering      143  101  25   18   39   65    14.23    <0.001   ME1, ME2>MJ2, ME3, MJ1
Boom-view         185  64   117  13   39   83    16.85    <0.001   ME1>ME3, ME2>MJ2, MJ1
Bargain-virtual   89   19   39   -17  12   28    7.96     <0.001   ME1, ME3>ME2, MJ2, MJ1

Table 13: Mean distance of intensity between a pair of consonants for male speakers [dB]. The degrees of freedom are all 4 and 45.

Virtual distance of /s/ and /S/ measured in frequency of spectral peak produced by the native English speakers and the Japanese learners of English are presented in Figures 1 and 2.

Figure 1: Distances of /s/ and /S/ in paired words, 1: seafood-shifting, 2: suit-shoot, 3: son-shutting, measured by frequency of spectral peak of female speakers [Hz].

Figure 2: Distances of /s/ and /S/ in paired words, 1: seafood-shifting, 2: suit-shoot, 3: son-shutting, measured by frequency of spectral peak of male speakers [Hz].

Among the three pairs of words measured by frequency of spectral peak, the pair that has /s/ or /S/ before /i/ presents a much shorter distance for the Japanese learners of English than for the native English speakers. This phenomenon can be explained by the regular palatalization of /s/ before /i/ in Japanese. The palatalization of /k/ before /i/ in English has been simulated [12]. Seafood in its katakana version is pronounced as [si]u:do], and this Japanese [sj] is similar to English /S/. Katakana letters are used to write loanwords in Japanese, and that would affect how Japanese learners of English produce these words in English. Different katakana letters, サ[sa] and シャ[sja], or ス[su] and シュ[sju], are used for the pairs with /s/ or /S/ before /A/ and before /u/, whereas the same letter, シ[sji], is used for the pair with /s/ or /S/ before /i/.

The virtual distance of /v/ and /b/ measured in duration, as produced by native English speakers and Japanese learners of English, is presented in Figures 3 and 4.

Figure 3: Distances of /v/ and /b/ in 1: veering-beer, 2: view-boom, 3: virtual-bargain, measured by duration of female speakers [msec].

Figure 4: Distances of /v/ and /b/ in 1: veering-beer, 2: view-boom, 3: virtual-bargain, measured by duration of male speakers [msec].

For all the pairs of words measured by duration, the distance is shorter for Japanese learners of English than for native English speakers.

Virtual distances between /S/ and /s/, or /b/ and /v/, measured in intensity and produced by native English speakers and Japanese learners of English, are presented in Figures 5 and 6.

Figure 5: Distances of /S/ and /s/ in paired-words, 1: shifting-seafood, 2: shoot-suit, 3: shutting-son, and /b/ and /v/ in 4: beer-veering, 5: boom-view, 6: bargain-virtual, measured by intensity of female speakers [dB].

Figure 6: Distances of /S/ and /s/ in paired-words, 1: shifting-seafood, 2: shoot-suit, 3: shutting-son, and /b/ and /v/ in 4: beer-veering, 5: boom-view, 6: bargain-virtual, measured by intensity of male speakers [dB].

All the pairs of words measured in intensity show a shorter distance for Japanese learners of English than for native English speakers. Among the three types of phonetic features used as measures in this study, intensity shows the largest differences between the native English speakers and the Japanese learners of English. These differences, which are reflected in what this study calls the virtual space of English consonants, are easier to grasp when visualized in three dimensions. Of course, the units used for each measure reflect different kinds of phonetic variation. For each measure (frequency, duration and intensity), the distances of all the paired words are added and their mean is calculated. The means are placed on three dimensions: frequency (4105 Hz for native English speakers and 2354 Hz for Japanese learners of English), duration (2537 msec for native English speakers and 2335 msec for Japanese learners of English), and intensity (474 dB for native English speakers and 18 dB for Japanese learners of English). The virtual space of English consonants is then calculated as the volume of an ellipsoid, by multiplying the frequency, the duration, the intensity and 4/3π. The one produced by native English speakers is 20634 × 10^6 Hz msec dB and the one produced by Japanese learners of English is 413 × 10^6 Hz msec dB. Simple multiplication may not reveal the precise values of virtual space, but these two numbers at least show that there is a big difference between what the native English speakers and the Japanese learners of English can deal with.
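The arithmetic behind the two virtual-space values can be retraced with a short sketch. It treats the three mean distances as the semi-axes of an ellipsoid, which is how the multiplication by 4/3π reads. The function and variable names are illustrative, and the use of 4.18 as a rounded value of 4/3π is an assumption, made only because it gives values close to the reported totals.

```python
import math


def ellipsoid_volume(a, b, c, four_thirds_pi=4.0 / 3.0 * math.pi):
    """Volume of an ellipsoid whose semi-axes are a, b and c."""
    return four_thirds_pi * a * b * c


# Mean distances as given in the text: (frequency, duration, intensity).
NATIVE = (4105, 2537, 474)
LEARNER = (2354, 2335, 18)

# Volumes with the exact value of 4/3 * pi.
native_exact = ellipsoid_volume(*NATIVE)
learner_exact = ellipsoid_volume(*LEARNER)

# Assumption: the reported totals look like 4/3 * pi rounded to 4.18.
native_rounded = ellipsoid_volume(*NATIVE, four_thirds_pi=4.18)
learner_rounded = ellipsoid_volume(*LEARNER, four_thirds_pi=4.18)

# The ratio of the two spaces does not depend on the constant factor.
ratio = native_exact / learner_exact

print(f"native  (factor 4.18): {native_rounded / 1e6:.1f} x 10^6")
print(f"learner (factor 4.18): {learner_rounded / 1e6:.1f} x 10^6")
print(f"native / learner ratio: {ratio:.1f}")
```

Under these assumptions the native speakers' space comes out roughly fifty times larger than the learners' space, and most of that gap comes from the intensity dimension.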

This study compares phonetic features, namely frequency of spectral peak, duration and intensity, in the production of the paired consonants /S/ and /s/, or /b/ and /v/, by native English speakers and Japanese learners of English in the phonological contexts of _ /i/, _ /u/ or _ /A/. One point that was not hypothesized but is newly observed in the results is that, among the three phonological contexts, that of /i/ seems to affect the preceding consonants strongly and to produce clear differences between the native English speakers and the Japanese learners of English. This might be because the front part of the vowel space for /i/ is very narrow. It would be worthwhile to focus on the phonological context of /i/ and observe how the native English speakers and the Japanese learners of English pronounce consonants differently.

Further research

The present study takes only a first step in broader research on Japanese learners’ consonant qualities in English. As such, many problems and questions remain for future investigations. Some of them are mentioned here.

First of all, both /i/ and /I/ are used to form the phonological contexts in this study, but it would be better to select either one. The same is true for the context of /A/. This study includes /A/, /Ã/ and /Î/ in order to arrange words with minimally paired consonants in the phonological context, but it would also be better to select one type from the vowel contexts of /A/, /Ã/ or /Î/.

The gestures required to contrast manner of articulation at some places are different from those at other places [13]. How to measure the phonetic features of each consonant precisely at these different places is an important issue. Research focusing on a more dynamic cue has been introduced and recommended [3]:

• Subsequent research on invariance focused on a more dynamic cue, namely the change in distribution of high-frequency energy as compared to low-frequency energy between consonant onset and the onset of the following vowel. This approach was a refinement of the earlier research in that it still captured the basic notion that bilabials are characterized by a relative predominance of energy in the lower frequencies while alveolars showed a predominance of energy in the higher frequencies. Using this dynamic criterion, 91 percent of the labial, dental, and alveolar plosives in English, French, and Malayalam were correctly classified.

From this point of view, measurements of consonants such as /b/ and /v/ should take their dynamic change into account, as in the following description [3]:

• Since plosives are produced with a complete constriction followed by a release, the change in energy from plosive to vowel is relatively large, certainly larger than that for approximants, which have only a moderate constriction.
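The dynamic cue quoted above, the change in high-frequency versus low-frequency energy between consonant onset and vowel onset, can be illustrated with a toy sketch. This is not the procedure of [3]. The 2000 Hz band split, the synthetic two-tone frames, and all function names are assumptions chosen only to make the idea concrete.

```python
import cmath
import math

RATE = 8000  # sampling rate in Hz (assumed)
N = 128      # frame length in samples (assumed)


def band_energies(frame, rate, split_hz):
    """Return (low, high) spectral energy of one frame, split at split_hz."""
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop below Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))  # naive DFT bin
        if k * rate / n < split_hz:
            low += abs(coeff) ** 2
        else:
            high += abs(coeff) ** 2
    return low, high


def tilt_db(frame, rate=RATE, split_hz=2000.0):
    """High-band minus low-band energy in dB (positive: high band dominates)."""
    low, high = band_energies(frame, rate, split_hz)
    eps = 1e-12  # guards against log of zero
    return 10.0 * math.log10((high + eps) / (low + eps))


def mix(low_amp, high_amp):
    """Synthetic frame: a 500 Hz tone plus a 3000 Hz tone."""
    return [low_amp * math.sin(2 * math.pi * 500 * t / RATE)
            + high_amp * math.sin(2 * math.pi * 3000 * t / RATE)
            for t in range(N)]


def tilt_change(consonant_frame, vowel_frame):
    """Change in spectral tilt from consonant onset to vowel onset, in dB."""
    return tilt_db(vowel_frame) - tilt_db(consonant_frame)


# Hypothetical frames: low-dominant (labial-like), high-dominant (alveolar-like).
labial_onset = mix(1.0, 0.1)
alveolar_onset = mix(0.1, 1.0)
vowel_onset = mix(1.0, 0.3)

print(tilt_change(labial_onset, vowel_onset))
print(tilt_change(alveolar_onset, vowel_onset))
```

In this toy setting the tilt rises toward the vowel for the low-dominant onset and falls for the high-dominant onset, so the sign of the change separates the two cases. Real speech would need windowing, several frames, and thresholds estimated from data.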

As for the distances, not only the real distances that speakers produce when pronouncing two different consonants, but also so-called perceptual distances should be taken into consideration. Speakers know the perceptual distances between two phonological elements, and based on this knowledge, they attempt to minimize the perceptual disparity between two corresponding elements in phonology [14].

Furthermore, there are several items that need to be clarified. A first question to ask may be whether and how consonant quality as found in this study contributes to the putatively low intelligibility noted for spoken English words produced by Japanese learners of English. Although some educationists support the idea of a lingua franca core, which provides a so-called pronunciation norm for non-native speakers [15], others, including the author of this study, still think these norms are not adequate for language learning in classrooms. A second question of interest may be which English consonants are difficult for Japanese (and other language) speakers to acquire and why. A third question, which the author finds interesting, involves the possible variability in consonant quality within speakers.

The Japanese speakers in this study distinguish paired consonants fairly well. It is, however, too early to conclude that Japanese learners of English acquire English sounds very well. The virtual space of consonants produced by the Japanese learners of English is much smaller than the one produced by the native English speakers. This means that although the former distinguish paired sounds well, their degree of discrimination is smaller than that of the latter.

Acknowledgment

The author wishes to thank Emily Goodwill, Thomas Green Jr., Amber Numamoto, Aaron Molinas, Zoe Nieves, Ahren Kerwood, Risa Endo, Hitomi Hori, Keiko Sakai, Shunetsu Arai, Misato Kameyama, and Tatsuhiro Higuchi for their active participation in the language experiment.

References

  1. Daulton FE (2007) Japan’s Built-in Lexicon of English-based Loanwords. Toronto: Multilingual Matters LTD.
  2. Reetz H, Jongman A (2009) Phonetics: Transcription, Production, Acoustics, and Perception. Oxford: Blackwell Publishing.
  3. Kent RD, Read C (1992) The Acoustic Analysis of Speech. California: Singular Publishing Group Inc.
  4. Crystal D (2010) The Cambridge Encyclopedia of Language. 3rd edition. Cambridge: Cambridge University Press.
  5. Patel AD (2008) Music, Language, and the Brain. Oxford: Oxford University Press.
  6. Shaw JA, Davidson L (2011) Perceptual similarity in input-output mappings: A computational/experimental study of non-native speech production. Lingua 121: 1344-1358.
  7. Klatt DH (1975) Voice onset time, frication, and aspiration in word-initial consonant clusters. J Speech Hear Res 18: 686-706.
  8. Klatt DH (1976) Linguistic uses of segmental duration in English: acoustic and perceptual evidence. J Acoust Soc Am 59: 1208-1221.
  9. Shinn P (1984) A cross-language investigation of the stop, affricate and fricative manner of articulation. Providence RI.
  10. Walker R (2011) Teaching the Pronunciation of English as a Lingua Franca. Oxford: Oxford University Press.
  11. Morley RL (2014) Implications of an exemplar-theoretic model of phoneme genesis: a velar palatalization case study. Lang Speech 57: 3-41.
  12. Jong KJ, Hao YC, Park H (2009) Evidence for featural units in the acquisition of speech production skills: Linguistic structure in foreign accent. Journal of Phonetics 37: 357-373.
  13. Kawahara S, Shinohara K (2009) The role of psychoacoustic similarity in Japanese puns: A corpus study. Journal of Linguistics 45: 111-138.
  14. Jenkins J (2002) A sociolinguistically based, empirically researched pronunciation syllabus for English as an international language. Applied Linguistics 23: 83-103.
Citation: Tomita K (2016) Virtual Space of English Consonants: Shorter Distance Produced by Japanese Learners of English. J Phonet and Audiol 2:112.

Copyright: © 2016 Tomita K. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.