ISSN: 2471-9455
Research Article - (2016) Volume 2, Issue 1
This study investigates how Japanese learners of English pronounce the paired consonants /s/ and /S/ and /b/ and /v/ in English minimal-pair words, most of which have counterparts among English-based loanwords written in katakana in Japanese. The frequency of the spectral peak, the duration, and the intensity of these consonants, as produced by six Japanese learners of English and six native English speakers, are measured acoustically. Among these phonetic features, significant differences in spectral-peak frequency between /s/ and /S/ are observed for both the native English speakers and the Japanese learners of English. There are also significant differences in duration between /b/ and /v/, and in intensity between /S/ and /s/ and between /b/ and /v/. The hypothesis that the distance between paired consonants on each of these features is smaller for the Japanese learners of English than for the native English speakers is also verified. Implications for further research are briefly discussed.
Keywords: Consonant; Consonantal distance; Duration; Formant; Intensity
Japanese and English have different phonological systems, which produce differences in timing: English is a stress-timed language, whereas Japanese is a mora-timed language. This may also cause differences in the length, manner, and even place of consonant articulation. A standard textbook of Japanese-English phonetics and phonology states that English and Japanese do not differ greatly in their number of consonants, and that English consonants are broadly uttered with stronger breath than Japanese ones.
The manners and places of articulation of English and Japanese consonants presented in Table 1, however, raise several questions about these explanations.
Manner | Bilabial | Labio-dental | Dental | Alveolar | Post-alveolar | Palatal | Velar | Uvular | Glottal |
---|---|---|---|---|---|---|---|---|---|
Plosive | pb | td | td | kg | |||||
Affricate | ts | tSdZ | |||||||
Nasal | m | n | n | N | N | ||||
Flap | } | ||||||||
Fricative | | fv | TD | sz | SZ | h | |||
Approximant | ¨ | j | w | ||||||
Lateral approximant | l |
Table 1: Consonants that have the same manner and place of articulation in English and Japanese are presented in small letters, and those whose manner or place differs are presented in large letters, in italics for English and non-italics for Japanese.
These different phonological systems are therefore expected to affect Japanese learners of English at some stages of learning. For example, one may ask to what extent Japanese speakers' /S/ in "she" differs from that of English speakers in acoustic properties such as the frequency of the noise region. Transfer from one's native language plays an important role in learning a foreign language [1]:
• Foreign accents are not the result of just “missing the mark” in random ways. To the contrary, careful inspection shows that the deviations between the goal and what is achieved are systematic; and can usually be attributed to the phonology, including the phonological rules, of one’s native language. The phenomenon of mispronunciations in a second language in ways attributable to the phonology of the first language is called transfer.
Transfer from the native language, for example from English-based loanwords in Japanese to English, has not been studied empirically [2]:
• The argument that loanwords in Japanese are of (great) detriment to learners of English has been generated from observations of errors and evidenced with anecdotes. These studies have focused on perceived interference in pronunciation and word meaning. As with Contrastive Analysis itself, there is little empirical evidence presented, only descriptions of gross, superficial features of produced language.
Consonantal transfer should be observable in phonetic features such as duration and intensity, and these can be analyzed together with values describing concentrations of acoustic energy, that is, formants.
Consonants are easier to describe in articulatory terms, whereas vowels are easier to describe in acoustic terms [3], and consonants are more difficult than vowels to measure with a single category [4]. Both classes, however, can be dealt with on the basis of the same concepts [3]:
• This use of different dimensions in the description of consonants and vowels suggests that the articulation of these two classes of sounds has little in common. However, the articulatory description of both consonants and vowels is largely based on location of constriction (“place of articulation” in consonants, “frontness” in vowels) and degree of constriction (“manner of articulation” in consonants, “height” in vowels).
Formant values are therefore used here to describe consonantal transfer in utterances by Japanese learners of English.
Consonants are generally classified by voicing, place of articulation, and manner of articulation. The consonants /s/ and /S/, for example, are both voiceless sounds produced with a stream of air directed at the upper teeth, which creates noisy turbulent flow; only their place of articulation differs. /s/ is made by bringing the tip or blade of the tongue close to a location just forward of the alveolar ridge, and /S/ by bringing the blade of the tongue close to a location just behind the alveolar ridge.
The spectral noise region distinguishes /s/ from /S/: /s/ has a high-frequency noise region and /S/ a lower-frequency one. The energy for /s/ lies largely above 4,000 Hz, whereas that for /S/ begins lower, at around 2,500 Hz [5]. Identification of [s] appears to depend on energy peaks at about 5,000 and 8,000 Hz, whereas identification of [S] is related to a peak at about 2,500 Hz [4]. Similarly, [S] typically exhibits a mid-frequency spectral peak around 2,500-3,500 Hz, whereas alveolar [s], produced with a shorter anterior cavity than [S], displays a primary spectral peak at higher frequencies, ranging from 4,000 to 7,000 Hz [3].
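The cited ranges suggest a crude decision rule. The sketch below is only an illustration under an assumption the study does not make: it labels a token by whether its spectral peak lies at or above 4,000 Hz, the lower edge of the [s] range cited from [3]. The function name `classify_sibilant` is hypothetical.

```python
# Hypothetical rule of thumb based on the ranges cited above: [S] peaks
# around 2,500-3,500 Hz and [s] at about 4,000-7,000 Hz [3].
# The 4,000 Hz boundary is an assumption, not a criterion used in this study.
BOUNDARY_HZ = 4000.0


def classify_sibilant(peak_hz: float, boundary_hz: float = BOUNDARY_HZ) -> str:
    """Return 's' for a peak at or above the boundary, otherwise 'S' (SAMPA for the 'sh' sound)."""
    if peak_hz <= 0:
        raise ValueError("peak frequency must be positive")
    return "s" if peak_hz >= boundary_hz else "S"


if __name__ == "__main__":
    # Mean spectral peaks (Hz) for seafood /s/ and shifting /S/, taken from Table 2.
    for speaker, s_peak, sh_peak in [("FE2", 7994, 3714), ("FJ2", 2562, 3304)]:
        print(speaker, classify_sibilant(s_peak), classify_sibilant(sh_peak))
```

With these values the rule labels FE2's tokens s and S but places both of FJ2's tokens on the /S/ side. A single fixed boundary ignores speaker differences, so it is a rough illustration rather than a classifier.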
The consonant /v/ is a voiced sound formed by bringing the lower lip against the upper teeth to make a tight constriction, so that air passing through the constriction flows turbulently and makes a hissing noise. The consonant /b/ is a voiced sound formed with both lips, during which the airflow through the mouth is momentarily closed off.
Consonants are also discussed in groups that are distinctive in their articulatory and acoustic properties: plosives, affricates, nasals, fricatives, and approximants. Spectral change is a vital part of creating the characteristic timbre of consonants [6]. Stops, for example, are always coproduced with a vowel and are acoustically characterized by a period of silence (corresponding to the vocal tract closure), followed by a sudden broadband burst of energy (as the vocal tract constriction is released), followed by formant transitions that typically last about 50 msec.
Languages traditionally classified as stress-timed have low %V (percentage of duration occupied by vowels) and high ΔC values (consonantal interval variability), while languages traditionally classified as syllable-timed or mora-timed have high %V and low ΔC values [6]. This may affect the duration of consonants in a stress-timed language when they are produced by learners whose native language is syllable-timed or mora-timed.
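%V and ΔC are simple to compute once an utterance has been segmented into vocalic and consonantal intervals. The following sketch assumes the intervals are already measured in milliseconds and uses the population standard deviation for ΔC; the function name and the example durations are hypothetical.

```python
from statistics import pstdev


def rhythm_metrics(vocalic_ms, consonantal_ms):
    """Return (%V, deltaC) for one utterance.

    %V is the share of total duration occupied by vocalic intervals (in percent);
    deltaC is the standard deviation of the consonantal interval durations (ms).
    """
    if not vocalic_ms or not consonantal_ms:
        raise ValueError("need at least one vocalic and one consonantal interval")
    total = sum(vocalic_ms) + sum(consonantal_ms)
    if total <= 0:
        raise ValueError("total duration must be positive")
    percent_v = 100.0 * sum(vocalic_ms) / total
    # Population SD is an assumption; some studies use the sample SD instead.
    delta_c = pstdev(consonantal_ms)
    return percent_v, delta_c


if __name__ == "__main__":
    # Hypothetical interval durations in milliseconds.
    print(rhythm_metrics([100, 80, 120], [50, 150, 100]))
```

On comparable material, a stress-timed language would be expected to yield a lower %V and a higher ΔC than a mora-timed one.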
The duration of the noise segment distinguishes /b/ from /v/. When stops, affricates, and fricatives are compared in an equivalent context, fricatives generally have the longest noise segments, and the interval from the release of a consonant constriction to the onset of voicing is longer in fricatives than in stops [7]. In a study of noise-segment durations for stops, affricates, and fricatives in Mandarin, Czech, and German, the following durational boundaries were identified: 62 to 78 msec for the stop-affricate boundary and 132 to 133 msec for the affricate-fricative boundary [8-10].
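These durational boundaries can be read as a simple three-way lookup. The sketch below is hypothetical: it takes the midpoint of each reported boundary range as a cut-off, and because the boundaries come from Mandarin, Czech, and German rather than English, it is at best a rough reference for /b/ and /v/.

```python
# Boundaries reported for Mandarin, Czech, and German [8-10]; taking the
# midpoint of each reported range is an assumption made for illustration.
STOP_AFFRICATE_MS = (62 + 78) / 2         # 70.0 ms
AFFRICATE_FRICATIVE_MS = (132 + 133) / 2  # 132.5 ms


def manner_from_noise_duration(noise_ms: float) -> str:
    """Map a noise-segment duration (ms) to the manner class it would fall into."""
    if noise_ms < 0:
        raise ValueError("duration must be non-negative")
    if noise_ms < STOP_AFFRICATE_MS:
        return "stop"
    if noise_ms < AFFRICATE_FRICATIVE_MS:
        return "affricate"
    return "fricative"


if __name__ == "__main__":
    for duration in (30, 100, 149):
        print(duration, manner_from_noise_duration(duration))
```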
On the basis of these phonological features, significant differences in the frequency of the noise region, duration, and intensity are predicted between the paired English consonants /S/ and /s/ and /b/ and /v/, as produced by both native English speakers and Japanese learners of English. In addition, for the minimal-paired consonants /s/ and /S/, Japanese learners of English are predicted to produce /s/ with a noise region at lower frequencies, and /S/ with a noise region at higher frequencies, than native speakers of English. For the minimal-paired consonants /v/ and /b/, Japanese learners of English are predicted to produce /v/ with a shorter duration and /b/ with a longer duration than native speakers of English. For both pairs, Japanese learners of English are predicted to produce /S/ with lower and /s/ with higher intensity, and /b/ with lower and /v/ with higher intensity, than native speakers of English.
On the basis of these predictions, it is hypothesized that the distance between the locations at which the blade of the tongue articulates /s/ and /S/, which can be called virtual consonant space, is smaller for Japanese learners of English than for native English speakers. Likewise, the distance between the durations of /v/ and /b/, and the distance between the intensities of /S/ and /s/ or of /b/ and /v/, which can also be called virtual consonant space, is hypothesized to be smaller for Japanese learners of English than for native English speakers.
Method
Subjects: Three female speakers of American English (hereafter FE1, FE2, FE3), three male speakers of American English (ME1, ME2, ME3), four female Japanese learners of English (FJ1, FJ2, FJ3, FJ4), and two male Japanese learners of English (MJ1, MJ2) participated in the experiment. The native English speakers, aged 20 or 23, were one-year exchange students from Oklahoma, U.S.A. The Japanese learners of English were college students from northern Japan, aged 20 or 21. On the basis of their TOEIC® scores, they were regarded as intermediate-level learners of college English.
Stimuli: The stimuli consisted of six pairs of monosyllabic or disyllabic words: (1) veering /vI@rIN/, (2) beer /bI@r/, (3) view /vju:/, (4) boom /bu:m/, (5) virtual /v3:tSu@l/, (6) bargain /bA:rgIn/, (7) seafood /si:fu:d/, (8) shifting /SIftIN/, (9) suit /su:t/, (10) shoot /Su:t/, (11) son /sVn/, and (12) shutting /SVtIN/. All these words except "veering" and "son" are used as English-based loanwords in Japanese, where they are pronounced /bi:ru/, /bju:/, /bu:mu/, /ba:tSaru/, /ba:gen/, /si:fu:do/, /sifutingu/, /su:tsu/, /sju:to/, and /sjattingu/, respectively. Each stimulus item was printed in the carrier phrase "I said ___." on a 21 cm × 30 cm sheet of paper.
Procedure
Each speaker was presented with the test sheets and asked to produce each test item ten times, pronouncing each word as clearly as possible. This instruction was particularly important for the native English speakers, because consonant elision is not uncommon in conversational speech.
Recordings were made of 1,440 tokens (12 speakers × 12 items × 10 repetitions) in a sound-treated room using a Sony unidirectional dynamic microphone (F-V640) and a Marantz solid-state recorder (PMD670). The microphone was positioned approximately 5 cm from the speaker's lips. Recordings lasted approximately one hour for each subject.
Acoustic measurements
The speech samples were analyzed using the Praat speech analysis software (http://www.praat.org), with a sampling rate of 44.1 kHz and 16-bit resolution. For each consonant, the mean frequency of the noise region, the mean duration, and the mean intensity were used for the analyses.
Analyses
The frequency of the noise region, duration, and intensity were compared between subjects. Analysis of variance (ANOVA) with repeated measures was the basic tool used for comparing means.
Across speakers and words, the mean values range from 1,129 to 10,890 Hz in frequency, from 13 to 149 msec in duration, and from 36 to 79 dB in intensity.
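For reference, the F ratio of a one-way between-groups ANOVA can be computed as in the sketch below. It is a generic sketch, not necessarily the authors' exact procedure (the text describes repeated measures); it is included because the degrees of freedom in the table footnotes, 1 and 18 for two consonants, 6 and 63 for seven speakers, and 4 and 45 for five speakers, are what this design yields with ten tokens per group. The function name is hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance: F is undefined")
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Two hypothetical consonants with ten tokens each -> df = (1, 18).
    s_tokens = [float(6000 + 100 * i) for i in range(10)]
    sh_tokens = [float(3500 + 100 * i) for i in range(10)]
    print(one_way_anova_f([s_tokens, sh_tokens]))
```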
Frequency of noise regions for /s/ and /S/ comparison
The spectral noise region was measured as the frequency of the highest spectral peak observed in a spectral slice. The mean frequencies of the spectral peak (Hz) of /s/ and /S/ in the experimental words produced by each subject are shown in Tables 2 and 3. A 2 × 7 ANOVA (two consonants × seven female speakers) and a 2 × 5 ANOVA (two consonants × five male speakers) are performed to examine the speaker effect and the consonant effect. As predicted, the frequency of the noise region of /s/ is significantly higher than that of /S/ for all native English speakers and Japanese learners of English except FJ2's and MJ2's seafood-shifting pairs.
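A spectral slice can be approximated in code by windowing a short stretch of frication noise and taking the frequency of the largest DFT bin. The pure-Python sketch below is a stand-in for the Praat measurement, not a reproduction of it: the Hann window, the plain O(n²) DFT, and the function name `spectral_peak_hz` are illustrative assumptions, and a real analysis would use an FFT library.

```python
import cmath
import math


def spectral_peak_hz(samples, sample_rate):
    """Frequency (Hz) of the largest-magnitude DFT bin of a Hann-windowed frame, excluding DC."""
    n = len(samples)
    if n < 4:
        raise ValueError("frame too short")
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(samples)]
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


if __name__ == "__main__":
    rate = 44100  # sampling rate used in this study
    frame = [math.sin(2 * math.pi * 5000 * i / rate) for i in range(441)]  # 10 ms test tone
    print(spectral_peak_hz(frame, rate))
```

The frequency resolution equals the sampling rate divided by the frame length (100 Hz for a 10 ms frame at 44.1 kHz). With real frication noise, averaging several frames or smoothing the spectrum typically gives a more stable peak.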
/s/-/S/ | FE1 | FE2 | FE3 | FJ1 | FJ2 | FJ3 | FJ4 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|---|---|
Seafood | 7634 | 7994 | 10890 | 7288 | 2562 | 6560 | 5665 | 6941 | 27.35 | <0.001 | FE3>FE2, FE1, FJ1, FJ3, FJ4>FJ2 |
Shifting | 4095 | 3714 | 1129 | 3609 | 3304 | 4724 | 5358 | 3704 | 18.12 | <0.001 | FJ4, FJ3>FE1, FE2, FJ1, FJ2>FE3 |
Mean | 5864 | 5854 | 6009 | 5448 | 2931 | 5642 | 5511 | | | | |
F-value (b) | 115.64 | 114.81 | 610.88 | 14.00 | 3.93 | 9.61 | 4.16 | | | | |
p-value | 0.001 | 0.001 | 0.001 | 0.001 | 0.06 | 0.06 | 0.05 | | | | |
Suit | 5572 | 6629 | 8635 | 5489 | 4545 | 6556 | 6185 | 6230 | 5.76 | <0.001 | FE3>FE2, FJ3, FJ4, FE1, FJ1 |
Shoot | 4132 | 3639 | 3753 | 3698 | 3629 | 4201 | 3972 | 3860 | 6.65 | <0.001 | FJ3, FE1, FJ4>FE3, FJ1, FE2, FJ2 |
Mean | 4847 | 5134 | 6194 | 4593 | 4087 | 5378 | 5078 | | | | |
F-value (b) | 6.53 | 56.45 | 24.75 | 14.25 | 30.01 | 15.94 | 266.20 | | | | |
p-value | 0.02 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | | | | |
Son | 7192 | 7694 | 9599 | 8424 | 6293 | 6932 | 5050 | 7312 | 18.85 | <0.001 | FE3, FJ1>FE2, FE1, FJ3, FJ2>FJ4 |
Shutting | 4189 | 3501 | 3861 | 3762 | 4368 | 4568 | 6588 | 4405 | 38.78 | <0.001 | FJ4>FJ3, FJ2, FE1, FE3, FJ1>FE2 |
Mean | 5690 | 5597 | 6730 | 6093 | 5330 | 5750 | 5819 | | | | |
F-value (b) | 467.78 | 91.69 | 545.07 | 137.71 | 11.21 | 41.18 | 24.64 | | | | |
p-value | 0.001 | 0.001 | 0.001 | 0.001 | 0.004 | 0.001 | 0.001 | | | | |
Table 2: Mean frequency of spectral peak for the female speakers [Hz]. (a) The degrees of freedom are all 6 and 63 (speaker effect). (b) The degrees of freedom are all 1 and 18 (consonant effect).
/s/-/S/ | ME1 | ME2 | ME3 | MJ1 | MJ2 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|
Seafood | 8067 | 5603 | 6998 | 6670 | 4661 | 6400 | 13.83 | <0.001 | ME1, ME3>MJ1>ME2, MJ2 |
Shifting | 4639 | 2781 | 4291 | 3513 | 4667 | 3978 | 42.91 | <0.001 | ME1, ME3>MJ1>MJ2>ME2 |
Mean | 6353 | 4192 | 5644 | 5091 | 4664 | | | | |
F-value (b) | 33.86 | 291.80 | 226.61 | 35.74 | 40.49 | | | | |
p-value | 0.001 | 0.001 | 0.001 | 0.001 | NS | | | | |
Suit | 6042 | 5516 | 5225 | 6881 | 4687 | 5670 | 10.42 | <0.001 | MJ1, ME1>ME2, ME3, MJ2 |
Shoot | 4068 | 2541 | 3180 | 3196 | 3801 | 3357 | 19.90 | <0.001 | ME1, MJ2>MJ1, ME3>ME2 |
Mean | 5055 | 4028 | 4202 | 5038 | 4244 | | | | |
F-value (b) | 20.67 | 224.68 | 32.55 | 279.56 | 40.846 | | | | |
p-value | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | | | | |
Son | 6576 | 5830 | 5716 | 7733 | 5623 | 5695 | 27.34 | <0.001 | MJ1>ME1>ME2, ME3, MJ2 |
Shutting | 4313 | 2700 | 4222 | 3629 | 4133 | 3799 | 42.91 | <0.001 | MJ1>ME1, ME3, MJ2>ME2 |
Mean | 5444 | 4265 | 4969 | 5681 | 4878 | | | | |
F-value (b) | 69.66 | 222.50 | 49.60 | 318.44 | 87.65 | | | | |
p-value | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | | | | |
Table 3: Mean frequency of spectral peak for the male speakers [Hz]. (a) The degrees of freedom are all 4 and 45 (speaker effect). (b) The degrees of freedom are all 1 and 18 (consonant effect). NS: not significant.
Duration of segments for /b/ and /v/ comparison
Duration was measured on the spectrogram as the interval over which energy is spread fairly evenly. The mean durations (msec) of /v/ and /b/ produced by each subject are shown in Tables 4 and 5. A 2 × 7 ANOVA (two consonants × seven female speakers) and a 2 × 5 ANOVA (two consonants × five male speakers) are performed to examine the speaker effect and the consonant effect. As predicted, there are significant differences between the durations of /v/ and /b/ for both native English speakers and Japanese learners of English. For the native English speakers, /v/ is significantly longer than /b/ except in FE2's view-boom and virtual-bargain, FE3's view-boom, ME1's view-boom, ME2's view-boom, and ME3's veering-beer and virtual-bargain pairs. For the Japanese learners of English, /v/ is also significantly longer than /b/ except in FJ1's view-boom and virtual-bargain, FJ2's view-boom and virtual-bargain, FJ3's virtual-bargain, FJ4's veering-beer and virtual-bargain, MJ1's veering-beer and view-boom, and MJ2's veering-beer and virtual-bargain pairs.
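The spectrogram-based duration measurement can be approximated automatically by thresholding short-time energy. In the hypothetical sketch below, the 5 ms frame length and the threshold of 10% of the peak frame energy are arbitrary assumptions; the study itself located segment boundaries by inspecting spectrograms.

```python
def segment_duration_ms(samples, sample_rate, frame_ms=5.0, threshold_ratio=0.1):
    """Duration (ms) of the longest run of frames whose mean energy is at least
    threshold_ratio times the largest frame energy."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    energies = [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not energies:
        raise ValueError("signal shorter than one frame")
    peak = max(energies)
    if peak == 0:
        return 0.0
    longest = run = 0
    for energy in energies:
        run = run + 1 if energy >= threshold_ratio * peak else 0
        longest = max(longest, run)
    return longest * frame_len * 1000 / sample_rate


if __name__ == "__main__":
    rate = 1000  # a low rate keeps the example small
    signal = [0.0] * 100 + [1.0, -1.0] * 30 + [0.0] * 100  # 60 ms burst in silence
    print(segment_duration_ms(signal, rate))
```

The result is quantized to whole frames, so the frame length sets the precision of the measured duration.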
/v/-/b/ | FE1 | FE2 | FE3 | FJ1 | FJ2 | FJ3 | FJ4 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|---|---|
Veering | 149 | 47 | 97 | 52 | 32 | 68 | 36 | 68 | 12.49 | <0.001 | FE1>FE3>FJ3, FJ1, FE2, FJ4, FJ2 |
Beer | 24 | 17 | 44 | 38 | 53 | 45 | 37 | 37 | 75.23 | <0.001 | FJ2, FJ3, FE3, FJ1, FJ4, FE1>FE2 |
Mean | 86 | 32 | 70 | 45 | 42 | 56 | 36 | | | | |
F-value (b) | 695.71 | 80.73 | 94.13 | 6.87 | 17.81 | 5.58 | 0.09 | | | | |
p-value | 0.001 | 0.001 | 0.001 | 0.01 | 0.001 | 0.03 | NS | | | | |
View | 82 | 30 | 52 | 59 | 57 | 85 | 32 | 56 | 18.28 | <0.001 | FJ3, FE1>FJ1, FJ2, FE3>FJ4, FE2 |
Boom | 29 | 26 | 48 | 52 | 59 | 60 | 51 | 46 | 13.98 | <0.001 | FJ3, FJ2, FJ1, FJ4, FE3>FE1, FE2 |
Mean | 55 | 28 | 50 | 55 | 58 | 72 | 41 | | | | |
F-value (b) | 24.59 | 3.24 | 0.41 | 2.41 | 0.32 | 12.08 | 11.74 | | | | |
p-value | 0.001 | NS | NS | NS | NS | 0.003 | 0.003 | | | | |
Virtual | 98 | 26 | 50 | 61 | 57 | 61 | 42 | 56 | 16.92 | <0.001 | FE1>FJ1, FJ3, FJ2, FE3>FJ4, FE2 |
Bargain | 48 | 29 | 61 | 48 | 55 | 63 | 35 | 48 | 4.79 | <0.001 | FJ3, FE3, FJ1, FJ2, FE1, FJ4>FE2 |
Mean | 73 | 27 | 55 | 54 | 56 | 62 | 38 | | | | |
F-value (b) | 11.07 | 0.68 | 8.25 | 1.49 | 0.06 | 0.06 | 3.64 | | | | |
p-value | 0.004 | NS | 0.010 | NS | NS | NS | NS | | | | |
Table 4: Mean duration for the female speakers [msec]. (a) The degrees of freedom are all 6 and 63 (speaker effect). (b) The degrees of freedom are all 1 and 18 (consonant effect). NS: not significant.
/v/-/b/ | ME1 | ME2 | ME3 | MJ1 | MJ2 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|
Veering | 30 | 71 | 44 | 25 | 57 | 45 | 2.12 | NS | |
Beer | 13 | 23 | 38 | 29 | 41 | 28 | 22.44 | <0.001 | MJ2, ME3>MJ1, ME2>ME1 |
Mean | 21 | 47 | 41 | 27 | 34 | | | | |
F-value (b) | 61.28 | 84.16 | 1.68 | 2.91 | 0.52 | | | | |
p-value | 0.001 | 0.001 | NS | NS | NS | | | | |
View | 26 | 33 | 50 | 35 | 41 | 37 | 6.66 | <0.001 | ME3, MJ2>MJ1, ME2, ME1 |
Boom | 20 | 26 | 40 | 39 | 30 | 31 | 11.54 | <0.001 | ME3, MJ1>MJ2, ME2, ME1 |
Mean | 23 | 29 | 45 | 37 | 35 | | | | |
F-value (b) | 3.46 | 2.80 | 4.84 | 0.56 | 4.48 | | | | |
p-value | NS | NS | 0.04 | NS | 0.04 | | | | |
Virtual | 24 | 29 | 49 | 38 | 35 | 35 | 24.92 | <0.001 | ME3>MJ1, MJ2>ME2, ME1 |
Bargain | 14 | 39 | 43 | 48 | 33 | 35 | 18.59 | <0.001 | MJ1, ME3, ME2>MJ2>ME1 |
Mean | 19 | 34 | 46 | 43 | 34 | | | | |
F-value (b) | 19.72 | 12.18 | 1.07 | 6.20 | 0.40 | | | | |
p-value | 0.001 | 0.003 | NS | 0.023 | NS | | | | |
Table 5: Mean duration for the male speakers [msec]. (a) The degrees of freedom are all 4 and 45 (speaker effect). (b) The degrees of freedom are all 1 and 18 (consonant effect). NS: not significant.
Intensity of segments for /S/ and /s/ or /b/ and /v/ comparison
Intensity was measured from the energy values observed on the spectrogram. The mean intensities (dB) of /S/ and /s/ and of /b/ and /v/ produced by each subject are shown in Tables 6 and 7. A 2 × 7 ANOVA (two consonants × seven female speakers) and a 2 × 5 ANOVA (two consonants × five male speakers) are performed to examine the speaker effect and the consonant effect. As expected, the intensity of /S/ is significantly higher than that of /s/, and the intensity of /b/ is significantly higher than that of /v/, for both native English speakers and Japanese learners of English. For the native English speakers, /S/ is significantly more intense than /s/ except in FE1's shifting-seafood, ME1's shoot-suit and shutting-son, and ME3's shifting-seafood and shutting-son pairs, and /b/ is significantly more intense than /v/ except in FE3's bargain-virtual, ME2's bargain-virtual, and ME3's beer-veering pairs. For the Japanese learners of English, six of the 18 /S/-/s/ comparisons and 12 of the 18 /b/-/v/ comparisons show no significant difference.
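Intensity in decibels is a logarithmic measure of mean signal power. The sketch below computes it relative to 2 × 10^-5 Pa, the standard reference for sound pressure level, which, as we understand the Praat documentation, is also the reference Praat uses when sample values are treated as pascals; with uncalibrated recordings the values are only relative. The function name is hypothetical.

```python
import math

REFERENCE_PA = 2e-5  # 20 micropascals, the standard sound-pressure reference


def intensity_db(samples_pa):
    """Sound pressure level (dB) of a segment whose samples are pressures in pascals."""
    if not samples_pa:
        raise ValueError("empty segment")
    mean_square = sum(x * x for x in samples_pa) / len(samples_pa)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / REFERENCE_PA ** 2)


if __name__ == "__main__":
    quiet = [2e-5, -2e-5] * 50        # RMS equal to the reference: about 0 dB
    louder = [x * 10 for x in quiet]  # ten times the amplitude: about 20 dB higher
    print(intensity_db(quiet), intensity_db(louder))
```

Because the scale is logarithmic, a 6 dB difference between two consonants corresponds to roughly a doubling of RMS amplitude.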
/S/-/s/ | FE1 | FE2 | FE3 | FJ1 | FJ2 | FJ3 | FJ4 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|---|---|
Shifting | 56 | 49 | 54 | 54 | 67 | 40 | 47 | 52 | 85.49 | <0.001 | FJ2>FE1, FE3, FJ1>FJ4, FE2>FJ3 |
Seafood | 56 | 42 | 47 | 43 | 73 | 38 | 46 | 43 | 115.40 | <0.001 | FJ2>FE1>FE3, FJ1, FJ4>FE2, FJ3 |
Mean | 56 | 45 | 50 | 48 | 70 | 39 | 46 | | | | |
F-value (b) | 0.23 | 72.90 | 46.97 | 102.93 | 22.19 | 0.52 | 0.52 | | | | |
p-value | NS | 0.001 | 0.001 | 0.001 | 0.001 | NS | NS | | | | |
Shoot | 61 | 51 | 51 | 56 | 48 | 43 | 46 | 50 | 39.82 | <0.001 | FE1>FJ1>FE2, FE3>FJ2, FJ4>FJ3 |
Suit | 59 | 44 | 47 | 49 | 58 | 37 | 45 | 48 | 36.08 | <0.001 | FE1, FJ2>FJ1, FE3, FJ4, FE2>FJ3 |
Mean | 60 | 47 | 49 | 52 | 53 | 40 | 45 | | | | |
F-value (b) | 7.83 | 22.25 | 15.17 | 16.33 | 13.87 | 26.45 | 0.01 | | | | |
p-value | 0.010 | 0.001 | 0.001 | 0.001 | 0.002 | 0.001 | NS | | | | |
Shutting | 60 | 47 | 53 | 54 | 42 | 42 | 46 | 48 | 46.65 | <0.001 | FE1>FJ1, FE3>FE2, FJ4>FJ2, FJ3 |
Son | 57 | 40 | 47 | 45 | 56 | 36 | 44 | 46 | 46.96 | <0.001 | FE1, FJ2>FE3, FJ1, FJ4>FE2, FJ3 |
Mean | 58 | 43 | 50 | 49 | 49 | 39 | 45 | | | | |
F-value (b) | 8.32 | 39.03 | 32.54 | 42.43 | 58.31 | 25.79 | 1.03 | | | | |
p-value | 0.010 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | NS | | | | |

/b/-/v/ | FE1 | FE2 | FE3 | FJ1 | FJ2 | FJ3 | FJ4 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|---|---|
Beer | 73 | 53 | 59 | 72 | 72 | 57 | 62 | 64 | 90.89 | <0.001 | FE1, FJ1, FJ2>FJ4, FE3>FJ3>FE2 |
Veering | 57 | 48 | 57 | 70 | 70 | 53 | 68 | 60 | 118.61 | <0.001 | FJ1, FJ2, FJ4>FE1, FE3>FJ3>FE2 |
Mean | 65 | 50 | 58 | 71 | 71 | 55 | 65 | | | | |
F-value (b) | 138.44 | 17.66 | 13.88 | 5.86 | 2.43 | 7.48 | 20.48 | | | | |
p-value | 0.001 | 0.001 | 0.002 | 0.026 | NS | 0.014 | 0.001 | | | | |
Boom | 77 | 59 | 66 | 73 | 76 | 54 | 64 | 67 | 71.57 | <0.001 | FE1, FJ2, FJ1>FE3, FJ4>FE2, FJ3 |
View | 58 | 50 | 57 | 71 | 70 | 45 | 64 | 59 | 134.83 | <0.001 | FJ1, FJ2>FJ4>FE1, FE3>FE2>FJ3 |
Mean | 67 | 54 | 61 | 72 | 73 | 49 | 64 | | | | |
F-value (b) | 436.50 | 50.87 | 183.50 | 2.76 | 31.22 | 43.70 | 0.18 | | | | |
p-value | 0.001 | 0.001 | 0.001 | NS | 0.001 | 0.001 | NS | | | | |
Bargain | 70 | 57 | 62 | 65 | 74 | 55 | 68 | 64 | 17.49 | <0.001 | FJ2, FE1, FJ4>FJ1, FE3>FE2, FJ3 |
Virtual | 56 | 53 | 59 | 68 | 73 | 52 | 67 | 61 | 38.67 | <0.001 | FJ2, FJ1>FJ4>FE3, FE1>FE2, FJ3 |
Mean | 63 | 55 | 60 | 66 | 73 | 53 | 67 | | | | |
F-value (b) | 22.12 | 15.69 | 1.39 | 0.63 | 1.18 | 1.18 | 0.042 | | | | |
p-value | 0.001 | 0.001 | NS | NS | NS | NS | NS | | | | |
Table 6: Mean intensity for the female speakers [dB]. (a) The degrees of freedom are all 6 and 63 (speaker effect). (b) The degrees of freedom are all 1 and 18 (consonant effect). NS: not significant.
/S/-/s/ | ME1 | ME2 | ME3 | MJ1 | MJ2 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|
Shifting | 52 | 48 | 68 | 48 | 53 | 53 | 78.94 | <0.001 | ME3>MJ2, ME1>ME2, MJ1 |
Seafood | 49 | 44 | 65 | 40 | 54 | 58 | 89.58 | <0.001 | ME3>MJ2>ME1>ME2, MJ1 |
Mean | 50 | 46 | 66 | 44 | 53 | | | | |
F-value (b) | 4.49 | 18.76 | 3.71 | 22.37 | 1.81 | | | | |
p-value | 0.04 | 0.001 | NS | 0.001 | NS | | | | |
Shoot | 56 | 51 | 62 | 49 | 53 | 54 | 36.88 | <0.001 | ME3>ME1, MJ2>ME2, MJ1 |
Suit | 54 | 45 | 56 | 40 | 50 | 49 | 23.43 | <0.001 | ME3, ME1>MJ2, ME2>MJ1 |
Mean | 55 | 48 | 59 | 44 | 51 | | | | |
F-value (b) | 2.07 | 38.71 | 40.09 | 14.04 | 2.81 | | | | |
p-value | NS | 0.001 | 0.001 | 0.001 | NS | | | | |
Shutting | 54 | 48 | 66 | 45 | 46 | 51 | 99.22 | <0.001 | ME3>ME1, MJ2>ME2, MJ1 |
Son | 52 | 45 | 66 | 40 | 46 | 49 | 90.36 | <0.001 | ME3>ME1>MJ2, ME2>MJ1 |
Mean | 53 | 46 | 66 | 42 | 46 | | | | |
F-value (b) | 1.88 | 10.96 | 0.00 | 11.83 | 44.11 | | | | |
p-value | NS | 0.004 | NS | 0.003 | 0.001 | | | | |

/b/-/v/ | ME1 | ME2 | ME3 | MJ1 | MJ2 | Mean | F-value (a) | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|
Beer | 76 | 58 | 71 | 60 | 74 | 67 | 89.97 | <0.001 | ME1, MJ2>ME3>MJ1, ME2 |
Veering | 62 | 48 | 69 | 58 | 70 | 61 | 50.25 | <0.001 | MJ2, ME3>ME1, MJ1>ME2 |
Mean | 69 | 53 | 70 | 59 | 72 | | | | |
F-value (b) | 67.63 | 42.80 | 2.20 | 2.20 | 8.32 | | | | |
p-value | 0.001 | 0.001 | NS | NS | 0.010 | | | | |
Boom | 79 | 61 | 79 | 58 | 65 | 68 | 65.31 | <0.001 | ME1, ME3>MJ2, ME2>MJ1 |
View | 60 | 55 | 67 | 57 | 61 | 60 | 13.22 | <0.001 | ME3>MJ2, ME1, MJ1>ME2 |
Mean | 69 | 58 | 73 | 57 | 63 | | | | |
F-value (b) | 432.01 | 4.79 | 38.10 | 0.78 | 1.72 | | | | |
p-value | 0.001 | 0.001 | 0.001 | NS | NS | | | | |
Bargain | 76 | 59 | 79 | 61 | 67 | 68 | 69.95 | <0.001 | ME3, ME1>MJ2>MJ1, ME2 |
Virtual | 67 | 57 | 75 | 62 | 66 | 65 | 41.87 | <0.001 | ME3>ME1, MJ2>MJ1>ME2 |
Mean | 71 | 58 | 77 | 61 | 66 | | | | |
F-value (b) | 44.41 | 2.99 | 7.66 | 2.59 | 0.32 | | | | |
p-value | 0.001 | NS | 0.013 | NS | NS | | | | |
Table 7: Mean intensity of spectral peak for the male speakers [dB]. (a) The degrees of freedom are all 4 and 45 (speaker effect). (b) The degrees of freedom are all 1 and 18 (consonant effect). NS: not significant.
This study examined English consonants produced by Japanese learners of English, using four consonants, /s/, /S/, /b/, and /v/, whose spectral-peak frequency, duration, and intensity were measured. There are significant differences in spectral-peak frequency between /s/ and /S/ produced by both native English speakers and Japanese learners of English, significant differences in duration between /b/ and /v/ produced by both groups, and significant differences in intensity between /S/ and /s/ and between /b/ and /v/ produced by both groups. However, the number of cases showing no significant difference is not the same for the two groups: there are many more such cases between the paired consonants /s/ and /S/ or /b/ and /v/ for the Japanese learners of English than for the native English speakers.
English distinguishes /b/ and /v/, but Japanese has only /b/ and no /v/. Likewise, English distinguishes /s/ and /S/, but Japanese does not distinguish them and has only /s/. The Japanese /s/, however, is palatalized in some contexts [11]:
• Palatalization of consonants before /i/ is regular in Japanese, and its effect is especially notable with /s/, /z/, /t/, /d/, /n/, /h/. This can threaten intelligibility when transferred to English consonants preceding /i:/ and /I/.
Given this difference between the two languages, the virtual distance between the constituent consonants, as pictured by their spectral-peak frequency, duration, or intensity, is expected to differ between the native English speakers and the Japanese learners of English.
Shorter consonant-distance hypotheses
The distance in spectral-peak frequency between the paired consonants /s/ and /S/ is compared for each word pair, in the phonological contexts __ /I/, __ /u/, and __ /A/. The mean distances in spectral-peak frequency between the consonants in each word pair are shown in Tables 8 and 9. A 1 × 7 ANOVA (one paired distance × seven female speakers) and a 1 × 5 ANOVA (one paired distance × five male speakers) are performed to examine the speaker effect.
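A speaker's distance for one word pair is the mean difference between the two consonants' measurements. The hypothetical helper below pairs tokens by repetition, which is an assumption; because the mean of paired differences equals the difference of the two means, pairing order does not change the result. Applied to FE1's mean peaks from Table 2, it reproduces FE1's seafood-shifting value in Table 8.

```python
from statistics import fmean


def mean_pair_distance(first_tokens, second_tokens):
    """Mean token-by-token difference (first minus second) for one speaker and word pair."""
    if not first_tokens or len(first_tokens) != len(second_tokens):
        raise ValueError("need two non-empty lists of equal length")
    return fmean(a - b for a, b in zip(first_tokens, second_tokens))


if __name__ == "__main__":
    # FE1's mean spectral peaks for seafood /s/ and shifting /S/ (Table 2).
    print(mean_pair_distance([7634], [4095]))
```

A negative distance, as for some learners in Tables 8 and 11, means the usual ordering of the two consonants is reversed.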
/s/-/S/ | FE1 | FE2 | FE3 | FJ1 | FJ2 | FJ3 | FJ4 | Mean | F-value | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|---|---|
Seafood-Shifting | 3539 | 4280 | 6884 | 3679 | -7422 | 1836 | 3065 | 2265 | 25.54 | <0.001 | FE3, FE2, FJ1>FE1, FJ4>FJ3>FJ2 |
Suit-Shoot | 1439 | 2989 | 4881 | 1791 | 9162 | 2355 | 2213 | 3547 | 5.78 | <0.001 | FJ2>FE3, FE2>FJ3, FJ4, FJ1, FE1 |
Son-Shutting | 3003 | 4193 | 5738 | 4662 | 1925 | 2364 | -1538 | 2477 | 40.56 | <0.001 | FE3, FJ1>FE2, FE1>FJ3, FJ2>FJ4 |
Table 8: Mean distance of frequency of spectral peak between a pair of consonants for female speakers [Hz]. The degrees of freedom are all 6 and 63.
The distance in duration between the paired consonants /b/ and /v/ is compared for each word pair, in the phonological contexts __ /I/, __ /u/, and __ /A/. The mean distances in duration between the consonants in each word pair are shown in Tables 10 and 11. A 1 × 7 ANOVA (one paired distance × seven female speakers) and a 1 × 5 ANOVA (one paired distance × five male speakers) are performed to examine the speaker effect.
The distance in intensity between the paired consonants /s/ and /S/ or /b/ and /v/ is compared for each word pair, in the phonological contexts __ /I/, __ /u/, and __ /A/. The mean distances in intensity between the consonants in each word pair are shown in Tables 12 and 13. A 1 × 7 ANOVA (one paired distance × seven female speakers) and a 1 × 5 ANOVA (one paired distance × five male speakers) are performed to examine the speaker effect.
/s/-/S/ | ME1 | ME2 | ME3 | MJ1 | MJ2 | Mean | F-value | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|
Seafood-Shifting | 3428 | 2821 | 2706 | 3156 | 693 | 2560 | 12.15 | <0.001 | ME1, MJ1, ME2, ME3>MJ2 |
Suit-Shoot | 1974 | 2975 | 2045 | 3684 | 886 | 2312 | 10.99 | <0.001 | MJ1, ME2>ME3, ME1>MJ2 |
Son-Shutting | 2263 | 3129 | 1493 | 4104 | 1490 | 2495 | 31.52 | <0.001 | MJ1>ME2>ME1, ME3, MJ2 |
Table 9: Mean distance of frequency of spectral peak between a pair of consonants for male speakers [Hz]. The degrees of freedom are all 4 and 45.
/v/-/b/ | FE1 | FE2 | FE3 | FJ1 | FJ2 | FJ3 | FJ4 | Mean | F-value | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|---|---|
Veering-Beer | | | | | | | | 31.9 | 68.20 | <0.001 | FE1>FE3, FE2>FJ3>FJ1, FJ4>FJ2 |
View-Boom | | | | | | | | 10.1 | 13.88 | <0.001 | FE1>FJ3, FJ1, FE3, FE2>FJ2, FJ4 |
Virtual-Bargain | | | | | | | | 8.1 | 15.65 | <0.001 | FE1>FJ1, FJ4, FJ2, FJ3, FE2>FE3 |
Table 10: Mean distance of duration between a pair of consonants for female speakers [msec]. The degrees of freedom are all 6 and 63.
/v/-/b/ | ME1 | ME2 | ME3 | MJ1 | MJ2 | Mean | F-value | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|
Veering-Beer | 16.4 | 48.5 | 5.8 | -4.5 | -7.4 | 11.7 | 37.38 | <0.001 | ME2>ME1, ME3>MJ1, MJ2 |
View-Boom | 5.5 | 6.9 | 10.0 | -3.5 | 11.2 | 6.0 | 2.35 | NS | |
Virtual-Bargain | 9.5 | -9.9 | 5.6 | -10.0 | 1.6 | -0.6 | 6.19 | <0.001 | ME1, ME3, MJ2>ME2, MJ1 |
Table 11: Mean distance of duration between a pair of consonants for male speakers [msec]. The degrees of freedom are all 4 and 45. NS: not significant.
/S/-/s/, /b/-/v/ | FE1 | FE2 | FE3 | FJ1 | FJ2 | FJ3 | FJ4 | Mean | F-value | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|---|---|
Shifting-Seafood | 0.6 | 7.2 | 7.3 | 11.0 | -6.5 | 1.6 | 1.9 | 3.3 | 15.31 | <0.001 | FJ1, FE3, FE2>FJ4, FJ3, FE1>FJ2 |
Shoot-Suit | 2.2 | 6.9 | 4.2 | 6.7 | -9.7 | 5.8 | 0.2 | 2.3 | 15.45 | <0.001 | FE2, FJ1, FJ3, FE3, FE1>FJ4>FJ2 |
Shutting-Son | 3.5 | 7.0 | 6.1 | 9.4 | -14.5 | 5.6 | 2.2 | 2.7 | 35.76 | <0.001 | FJ1, FE2, FE3, FJ3>FE1, FJ4>FJ2 |
Beer-Veering | 15.6 | 5.1 | 1.8 | 2.4 | 1.8 | 3.8 | -6.0 | 3.5 | 36.39 | <0.001 | FE1>FE2, FJ3, FJ1, FE3, FJ2>FJ4 |
Boom-View | 18.7 | 9.1 | 9.6 | 2.3 | 5.6 | 9.3 | -0.9 | 7.6 | 27.33 | <0.001 | FE1>FE3, FJ3, FE2>FJ2, FJ1>FJ4 |
Bargain-Virtual | 14.6 | 3.7 | 0.2 | -2.9 | 0.8 | 3.6 | 0.2 | 2.8 | 6.47 | <0.001 | FE1>FE2, FJ3, FJ2, FE3, FJ4, FJ1 |
Table 12: Mean distance of intensity between a pair of consonants for female speakers [dB]. The degrees of freedom are all 6 and 63.
/S/-/s/, /b/-/v/ | ME1 | ME2 | ME3 | MJ1 | MJ2 | Mean | F-value | p-value | Comparison |
---|---|---|---|---|---|---|---|---|---|
Shifting-Seafood | 3.0 | 4.2 | 2.8 | 8.7 | -1.6 | 3.4 | 8.14 | <0.001 | MJ1, ME2>ME1, ME3, MJ2 |
Shoot-Suit | 1.8 | 6.1 | 6.0 | 0.6 | 3.4 | 5.1 | 3.71 | <0.011 | MJ1, ME2, ME3, MJ2>ME1 |
Shutting-Son | 2.0 | 3.6 | 0.0 | 5.0 | 8.8 | 3.8 | 6.28 | <0.001 | MJ2, MJ1, ME2, ME1, ME3 |
Beer-Veering | 14.3 | 10.1 | 2.5 | 1.8 | 3.9 | 6.5 | 14.23 | <0.001 | ME1, ME2>MJ2, ME3, MJ1 |
Boom-View | 18.5 | 6.4 | 11.7 | 1.3 | 3.9 | 8.3 | 16.85 | <0.001 | ME1>ME3, ME2>MJ2, MJ1 |
Bargain-Virtual | 8.9 | 1.9 | 3.9 | -1.7 | 1.2 | 2.8 | 7.96 | <0.001 | ME1, ME3>ME2, MJ2, MJ1 |
Table 13: Mean distance of intensity between a pair of consonants for male speakers [dB]. The degrees of freedom are all 4 and 45.
The virtual distances between /s/ and /S/ measured by spectral-peak frequency for the native English speakers and the Japanese learners of English are presented in Figures 1 and 2.
Among the three word pairs measured by spectral-peak frequency, the pair with /s/ or /S/ before /i/ shows a much shorter distance for the Japanese learners of English than for the native English speakers. This can be explained by the regular palatalization of /s/ before /i/ in Japanese; palatalization of /k/ before /i/ in English has also been simulated [12]. The katakana version of seafood is pronounced [sji:fu:do], and this Japanese [sj] is similar to English /S/. Katakana letters are used to write loanwords in Japanese, and this would affect how Japanese learners of English pronounce these words in English. Different katakana letters, サ [sa] and シャ [sja], or ス [su] and シュ [sju], are used for the pairs of /s/ or /S/ before /A/ and before /u/, whereas the same letter, シ [sji], is used for the pair of /s/ or /S/ before /i/.
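The argument turns on which English contrasts survive in katakana spelling. The small lookup below encodes only the spellings cited in this paragraph (the dictionary and function names are hypothetical) and checks in which vowel context /s/ and /S/ receive the same katakana.

```python
# Katakana spellings cited above, keyed by (English consonant, following vowel)
# in the SAMPA-style notation of this paper; values are (katakana, romanization).
KATAKANA = {
    ("s", "A"): ("サ", "sa"),
    ("S", "A"): ("シャ", "sja"),
    ("s", "u"): ("ス", "su"),
    ("S", "u"): ("シュ", "sju"),
    ("s", "i"): ("シ", "sji"),
    ("S", "i"): ("シ", "sji"),
}


def merged_contexts(table=KATAKANA):
    """Vowel contexts in which /s/ and /S/ are written with the same katakana."""
    vowels = sorted({vowel for _, vowel in table})
    return [v for v in vowels if table[("s", v)] == table[("S", v)]]


if __name__ == "__main__":
    print(merged_contexts())
```

Only the /i/ context comes out as merged, which is the context in which the learners' /s/-/S/ distance was shortest.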
Virtual distance of /v/ and /b/ measured in duration produced by native English speakers and Japanese learners of English are presented in Figures 3 and 4.
The distance of all the pairs of word measured by duration presents shorter one for Japanese learners of English than for native English speakers.
The virtual distances between /S/ and /s/, or /b/ and /v/, in intensity, as produced by the native English speakers and the Japanese learners of English, are presented in Figures 5 and 6.
For all the word pairs, the distance in intensity is shorter for the Japanese learners of English than for the native English speakers. Of the three phonetic features used as norms of analysis in this study, intensity shows the largest differences between the native English speakers and the Japanese learners of English. This study calls the space these differences are reflected in the virtual space of English consonants, and the differences would be easier to grasp if visualized in three dimensions. Of course, each norm is measured in a different unit and reflects a different kind of phonetic variation. For each norm (frequency, duration and intensity), the mean distances of all the paired words are added and their mean is calculated.

These values serve as the three dimensions:
- frequency: 4105 Hz for native English speakers and 2354 Hz for Japanese learners of English;
- duration: 2537 msec for native English speakers and 2335 msec for Japanese learners of English;
- intensity: 474 dB for native English speakers and 18 dB for Japanese learners of English.

The virtual space of English consonants is modeled as a three-dimensional ellipse (an ellipsoid). Its volume is calculated by multiplying the frequency, the duration, the intensity and 4/3π. The volume for the native English speakers is 20634 × 10⁶ Hz·msec·dB, and that for the Japanese learners of English is 413 × 10⁶ Hz·msec·dB. Simple multiplication may not reveal the precise values of the virtual space. Still, these two numbers at least show a large difference in the range that the native English speakers and the Japanese learners of English can each deal with.
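The volume calculation in the preceding paragraph corresponds to the standard ellipsoid formula V = (4/3)πabc, with the three mean distances taken as semi-axes. The sketch below reproduces that calculation from the rounded values quoted in the text. The results come out at roughly 2.068 × 10¹⁰ and 4.144 × 10⁸ Hz·msec·dB. Both are within about 0.4% of the reported 20634 × 10⁶ and 413 × 10⁶, and the small gaps presumably reflect rounding of the inputs. The variable names are illustrative only.

```python
import math


def ellipsoid_volume(a, b, c):
    """Volume of an ellipsoid with semi-axes a, b and c: (4/3) * pi * a * b * c."""
    return 4.0 / 3.0 * math.pi * a * b * c


# Mean distances quoted in the text: (frequency [Hz], duration [msec], intensity [dB]).
native_axes = (4105, 2537, 474)
learner_axes = (2354, 2335, 18)

native_volume = ellipsoid_volume(*native_axes)
learner_volume = ellipsoid_volume(*learner_axes)
print(f"native:  {native_volume / 1e6:.0f} x 10^6 Hz*msec*dB")
print(f"learner: {learner_volume / 1e6:.0f} x 10^6 Hz*msec*dB")
print(f"ratio:   {native_volume / learner_volume:.1f}")
```

The product mixes three different units, so its absolute size depends on the unit choice (msec versus s, for instance). The ratio between the two groups, about 50, does not depend on the unit choice as long as both groups use the same units.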
This study compares phonetic features (frequency of spectral peak, duration and intensity) in the production of the paired consonants /S/ and /s/, or /b/ and /v/. The comparison is made between native English speakers and Japanese learners of English in the phonological contexts _/i/, _/u/ and _/A/. One finding was not hypothesized but newly observed. Among the three phonological contexts, /i/ seems to strongly affect the preceding consonant and to produce clear differences between the two groups. This might be because the front part of the vowel space, where /i/ lies, is very narrow. Focusing on the /i/ context would therefore be a good way to observe how the pronunciation of consonants differs between native English speakers and Japanese learners of English.
Further research
The present study is only a first step in broader research on the consonant quality of Japanese learners of English. Many problems and questions therefore remain for future investigation, and some of them are mentioned here.
First, both /i/ and /I/ are used to form the phonological contexts in this study, but it would be better to select only one of them. The same is true for the context of /A/. This study includes /A/, /Ã/ and /Î/ to build words with minimal-paired consonants in that context, but it would be better to select only one of /A/, /Ã/ or /Î/.
The gestures required to contrast manners of articulation differ from one place of articulation to another [13]. Measuring the phonetic features of each consonant precisely across these different places is therefore an important issue. Research focusing on a more dynamic cue is introduced and recommended in [3]:
• Subsequent research on invariance focused on a more dynamic cue, namely the change in distribution of high-frequency energy as compared to low-frequency energy between consonant onset and the onset of the following vowel. This approach was a refinement of the earlier research in that it still captured the basic notion that bilabials are characterized by a relative predominance of energy in the lower frequencies while alveolars showed a predominance of energy in the higher frequencies. Using this dynamic criterion, 91 percent of the labial, dental, and alveolar plosives in English, French, and Malayalam were correctly classified.
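The dynamic cue quoted above can be illustrated with a minimal sketch: measure how the balance of high- versus low-frequency energy changes from a consonant-onset frame to a vowel-onset frame. This is not the cited researchers' procedure. The 2000 Hz cutoff, the 10 ms frame length, the dB balance measure and all function names are assumptions made for illustration. The example signals are synthetic tones, not speech.

```python
import cmath
import math


def band_energies(frame, sample_rate, cutoff_hz):
    """Split the spectral energy of one frame into (low, high) bands at cutoff_hz.

    A naive DFT over bins 0..N/2 keeps the sketch dependency-free; it is far
    too slow for real recordings, where an FFT library would be used instead.
    """
    n = len(frame)
    low = high = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energy = abs(coeff) ** 2
        if k * sample_rate / n < cutoff_hz:
            low += energy
        else:
            high += energy
    return low, high


def high_low_balance_db(frame, sample_rate, cutoff_hz, eps=1e-12):
    """High-frequency energy relative to low-frequency energy of a frame, in dB."""
    low, high = band_energies(frame, sample_rate, cutoff_hz)
    return 10 * math.log10((high + eps) / (low + eps))


def dynamic_cue_db(onset_frame, vowel_frame, sample_rate, cutoff_hz=2000.0):
    """Change in high/low balance from consonant onset to vowel onset (dB).

    Negative: the vowel onset is relatively richer in low frequencies than
    the consonant onset. Positive: the reverse.
    """
    return (high_low_balance_db(vowel_frame, sample_rate, cutoff_hz)
            - high_low_balance_db(onset_frame, sample_rate, cutoff_hz))


# Hypothetical 10 ms frames at 8 kHz: a high-frequency "onset" and a
# low-frequency "vowel onset".
sr, n = 8000, 80
high_tone = [math.sin(2 * math.pi * 3000 * t / sr) for t in range(n)]
low_tone = [math.sin(2 * math.pi * 300 * t / sr) for t in range(n)]
print(dynamic_cue_db(high_tone, low_tone, sr))
```

Applied to real /b/ and /v/ tokens, the frames would be cut from recordings at the consonant release and at the vowel onset. The cutoff would also need to be chosen with reference to the literature rather than fixed at the value assumed here.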
From this point of view, it would be better to measure consonants such as /b/ and /v/ in terms of their dynamic change, following the characteristics described in [3]:
• Since plosives are produced with a complete constriction followed by a release, the change in energy from plosive to vowel is relatively large, certainly larger than that for approximants, which have only a moderate constriction.
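The contrast in the quotation above can be sketched as a level change in dB between a consonant frame and the following vowel frame. A near-silent plosive closure should then show a larger rise into the vowel than a moderately constricted approximant. The amplitudes below are invented for illustration, not measurements. Whether /v/ as produced by these speakers patterns with approximants in this respect is not established here. The function names are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of a frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def energy_change_db(consonant_frame, vowel_frame, eps=1e-12):
    """Level rise (dB) from a consonant frame to the following vowel frame."""
    return 20 * math.log10((rms(vowel_frame) + eps) / (rms(consonant_frame) + eps))


# Synthetic frames with hypothetical amplitudes: a near-silent plosive
# closure, a moderately constricted approximant, and a vowel.
sr, n = 8000, 80
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(n)]
closure = [0.01 * x for x in tone]
approximant = [0.3 * x for x in tone]
vowel = tone

print(energy_change_db(closure, vowel), energy_change_db(approximant, vowel))
```

With these invented amplitudes the closure-to-vowel rise is 40 dB, and the approximant-to-vowel rise is about 10 dB.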
As for distances, future work should consider not only the actual distances that speakers produce when pronouncing two different consonants but also so-called perceptual distances. Speakers know the perceptual distances between two phonological elements. Based on this knowledge, they attempt to minimize the perceptual disparity between two corresponding elements in phonology [14].
Furthermore, several further questions need to be clarified. A first question is whether, and how, the consonant quality found in this study contributes to the putatively low intelligibility of spoken English words produced by Japanese learners of English. Some educationists support the idea of a Lingua Franca Core, which provides a so-called pronunciation norm for non-native speakers [15]. Others, including the authors of this study, think these norms are not adequate for language learning in classrooms. A second question of interest is which English consonants are difficult for speakers of Japanese (and other languages) to acquire, and why. A third question, which the authors find interesting, involves possible within-speaker variability in consonant quality.
The Japanese speakers in this study distinguish the paired consonants fairly well. It is, however, too early to conclude that Japanese learners of English acquire English sounds very well. The virtual space of consonants produced by the Japanese learners of English is much smaller than that produced by the native English speakers. This means that, although the former distinguish paired sounds very well, their degree of discrimination is smaller than that of the latter.
The authors wish to thank Emily Goodwill, Thomas Green Jr., Amber Numamoto, Aaron Molinas, Zoe Nieves, Ahren Kerwood, Risa Endo, Hitomi Hori, Keiko Sakai, Shunetsu Arai, Misato Kameyama and Tatsuhiro Higuchi for their active participation in the language experiment.