L2 production & perception
Discussion Topic 4:
Learning to produce & perceive the L2
- Flege (1993) examined Chinese Ss’ production and perception of vowel duration as a cue to the word-
final distinction between English /t/-/d/, yielding a correlation of r = 0.54, p > 0.01;
- Research examining word-initial stops (Flege & Schmidt, 1995; Schmidt & Flege, 1995) focused on native
Spanish Ss’ production of VOT in English /p/ and the location of the “best” /p/ in continua differing in
VOT; r = 0.54, p < 0.01. In a post-hoc analysis, Ss were divided according to overall degree of foreign
accent. A significant production-perception correlation was obtained for “proficient” Ss (i.e., those with
relatively mild foreign accents, r = 0.49, p < 0.01) but not for less proficient Ss (r = -0.004);
- Flege et al. (1997) examined the production and perception of English vowels by 20 native speakers
each of German, Spanish, Korean and Mandarin. The measure of perception was the size of the shift
from one vowel category to another based on changes in F1 frequency. The measure of production was
the size of F1 differences produced in pairs of English vowels. A production-perception correlation of r =
0.53 was obtained for English /i/ and /ɪ/, and a correlation of r = 0.52 was obtained for /ɛ/ and /æ/.
Write if you have something to say on this topic. Please send a carefully edited text and
permission to publish on this site if you want to make your comments public. Thanks. JEF
The relation between the production and perception of L2 sounds
has attracted a lot of attention over the years, and generated even more
confusion. Here I lay out the SLM position after providing some
background information. Then I summarize some relevant data,
concluding this post with a brief description of what I consider to be
the most appropriate way to assess the production-perception
relation in L2 speech learning.
Work focusing on infant development suggests that infants begin
showing an influence of the surrounding linguistic environment on
perception somewhat before showing an influence of the ambient
language on their vocal output. Research on (monolingual) speech
and language development indicates that young children's
perception of L1 segments generally "leads" their production of
those segments. However, adult-child perceptual differences aren't
evident to the casual observer; and children's articulation of
segments continues to be refined long beyond the point that their
productions are recognizable.
Whatever its time course, alignment presupposes error correction mechanisms. One mechanism enables
children to modify their production so that their vocal output corresponds to the perceptual representations
they have developed (which, in turn, are based on what they have heard over a fairly long period of time). The
other error correction mechanism helps guide speech in real time via self-hearing. Using these mechanisms, L1
learning children eventually stop "misarticulating" L1 sounds. L1 phonetic categories continue to develop but,
eventually, these too reach completion so that, at some point during adolescence, the children become mature
"speaker-hearers" of their L1.
The neural representations used in the auditory processing and
articulation of segments are localized in different regions of the
brain, but are connected to one another directly and via higher
processing centers. At a neural level, production and perception
show a mutual influence, which depends on level of processing and
type of stimulation.
The child's need to align patterns of production and perception is demonstrated by cross-language research,
which shows that although languages may differ in the phonetic specification of speech sounds, they always
show co-ordinated patterns of production and perception. As an example, Spanish and English both have /t/,
but the segment is realized with longer VOT in English than Spanish. Correspondingly, English adults require
longer VOT values to identify stimuli as /t/ (as opposed to /d/) than Spanish-speaking adults do.
It is uncertain how long it takes children learning an L1 to align production and perception. Here too the results
of cross-language research are relevant. Flege and Eefting (1987a,b, 1988) showed that children learning
Spanish and English as an L1 differed from adult monolingual speakers of Spanish and English in much the
same way. In both languages, 8-9 year-old children produced stops with shorter VOT values and, in an
identification experiment, showed phoneme boundaries (i.e., cross-overs from predominantly /d/ to /t/
judgments) at shorter VOT values than adults did. Even though they had not yet reached adult-like levels, the
children's production and perception were aligned. Perhaps production changes little by little as perception
changes so that there is never a large gap between production and perception.
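The phoneme boundary mentioned above can be located in several ways. The sketch below shows one simple option, linear interpolation to the 50% cross-over of an identification function. It is only an illustration: the function name `crossover_vot`, the VOT steps, and the response proportions are all hypothetical and are not taken from Flege and Eefting's data or analysis.

```python
def crossover_vot(vot_ms, prop_t):
    """Return the VOT (ms) at which /t/ responses first reach 50%.

    vot_ms: VOT values of the continuum steps, in increasing order.
    prop_t: proportion of /t/ judgments at each step (0.0 to 1.0).
    The boundary is found by linear interpolation between the two
    steps that straddle 50%.
    """
    points = list(zip(vot_ms, prop_t))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification function never crosses 50%")


# Hypothetical identification data (NOT real results): proportion of
# /t/ responses at each VOT step for two listener groups.
vot_steps = [0, 10, 20, 30, 40, 50]
adults = [0.0, 0.05, 0.20, 0.60, 0.95, 1.0]
children = [0.0, 0.10, 0.40, 0.80, 1.0, 1.0]

adult_boundary = crossover_vot(vot_steps, adults)
child_boundary = crossover_vot(vot_steps, children)

# With these invented numbers the children's boundary falls at a
# shorter VOT than the adults', the pattern described in the text.
print(f"adults: {adult_boundary:.1f} ms, children: {child_boundary:.1f} ms")
```

A group's mean produced VOT could then be compared with its boundary to see whether production and perception shift together.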
When researchers talk about the "attunement" of infants and
children to their L1, they are generally referring to a gradual
modification of auditory perceptual representations. It is generally
assumed that these perceptual representations ("phonetic
categories" in the SLM framework) guide the development of
articulatory motor plans that eventually can be used by children to
reproduce the sounds they have heard.
Once the L1 phonetic system has been fully established, error correction mechanisms are less important. For
adults who become profoundly deaf as the result of taking ototoxic drugs, the ability to produce L1 segments in
a native-like fashion is affected minimally, and not even right away. These unfortunate individuals have little
need, it seems, for feedback to maintain the correct articulatory patterns they established as children.
2. L2 speech learning.
The SLM proposes that all of the capacities that were used in successful L1 speech development -- including
the ability to align production and perception -- remain intact and accessible to learners of an L2. More
specifically, the SLM proposes that, as in L1 learning, perception generally "leads" production. Accurate
perception does not entail accurate production; however, accurate production requires accurate perception.
In an invited talk presented at the ICPhS meeting held in San Francisco, I (Flege, 1999a) reviewed research
comparing the production and perception of phonetic segments in an L2. In this talk, I summarized research in
which the relation between segmental production and perception was evaluated through correlational analyses.
All of the studies yielded moderate correlations of about r = 0.50, including those summarized in the bulleted
list at the top of this page.
In my ICPhS talk, I offered several explanations as to why L2 research does not yield higher production-
perception correlations:
1. Not all aspects of perception are “transported” (Bever’s term) from perception to production. Example:
I can perceptually distinguish Italian trilled /r/ from other variants of /r/ but, to my great embarrassment,
cannot produce trills. (I take consolation in the fact that trills are learned late by most Italian children
and not at all by some Italians, including my late father-in-law, who used a uvular /r/ instead of a trilled
/r/.)
2. The “transportation” of properties from (perceptual) phonetic categories to phonetic implementation
rules takes time. This observation implies that if two groups differed in perception but not production at
Time 1 of a longitudinal study, they may differ in production at Time 2.
3. The measures of segmental production and/or perception submitted to correlation analyses are limited
by ceiling effects.
4. Production and perception are inherently incommensurable, making comparison difficult. Flege
(1999) cited a study examining the perception and production of phonemic length contrasts in Swedish.
The phonetic dimension of interest in both domains was overall vowel duration, seemingly a
commensurable dimension. A correlation of r = 0.70 was obtained, higher than the values of about
r = 0.50 obtained in the other studies.
These considerations suggest that correlation analysis does not provide the best method to evaluate the
contingency at the heart of the SLM hypothesis, i.e., that production accuracy cannot be greater than
perceptual accuracy.
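To make the difference between a correlation and a contingency concrete, here is a minimal sketch. It assumes that each S has one perception score and one production score, both expressed as proportion correct, and that 0.8 counts as "accurate". The scores, the 0.8 threshold, and the function names are hypothetical choices made for illustration only. They do not come from any of the studies cited.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def contingency(perception, production, threshold=0.8):
    """Count Ss in each (perception accurate?, production accurate?) cell."""
    table = {(p, q): 0 for p in (True, False) for q in (True, False)}
    for perc, prod in zip(perception, production):
        table[(perc >= threshold, prod >= threshold)] += 1
    return table


# Hypothetical scores (NOT real data), one pair per S.
perception = [0.95, 0.90, 0.92, 0.85, 0.88, 0.60, 0.55, 0.70]
production = [0.90, 0.50, 0.85, 0.40, 0.95, 0.45, 0.50, 0.30]

r = pearson_r(perception, production)
table = contingency(perception, production)

# The hypothesis is contradicted only by Ss who produce accurately
# without perceiving accurately, i.e. the (False, True) cell.
violations = table[(False, True)]
print(f"r = {r:.2f}, violations = {violations}")
```

In this invented data set the correlation is only moderate, because some Ss perceive accurately but produce poorly, yet no S falls in the one cell that would contradict the hypothesis. The contingency is one-directional, so counting that cell addresses it more directly than r does.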
In the ICPhS talk, I cited a study which made use of a more appropriate form of analysis. The study in
question examined production and perception of English vowels by native speakers of Italian living in
Canada. Most of the Ss examined succeeded in producing most English vowels in a readily identifiable manner.
One exception was English /ʌ/, which was produced inaccurately by 31 of the 72 native Italian Ss. Given that
the Italian vowel that is perceptually closest to English /ʌ/ is /a/, a categorial discrimination test was used to
evaluate the discrimination of English /ʌ/ from Italian /a/.
The Ss who produced English /ʌ/ accurately were found to discriminate English /ʌ/-Italian /a/ significantly
better than the 31 Italian Ss who produced /ʌ/ poorly. A difference in production accuracy was not obtained,
however, when the Ss were divided into subgroups based on the ability to discriminate English /ʌ/-English /æ/
or the ability to discriminate English /ʌ/-English /ɑ/.
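The subgroup comparison just described can be sketched as follows. This is an illustration only: the discrimination scores are invented, the group sizes are arbitrary, and Welch's t statistic is an assumed choice of test, not necessarily the one used in the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical discrimination scores (NOT real data) for English /ʌ/
# vs. Italian /a/, with Ss grouped by how well they PRODUCED /ʌ/.
accurate_producers = [0.90, 0.85, 0.95, 0.80, 0.88]
poor_producers = [0.70, 0.65, 0.75, 0.60, 0.72]

mean_diff = mean(accurate_producers) - mean(poor_producers)
t_stat = welch_t(accurate_producers, poor_producers)
print(f"mean difference = {mean_diff:.3f}, Welch t = {t_stat:.2f}")
```

Splitting the same Ss on a different perceptual contrast and comparing production accuracy across the resulting subgroups would follow the same pattern with the roles of the two measures exchanged.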
A proposal by Bever anticipated the finding just mentioned. Bever (1981, p. 193) proposed that once the
production and perception of L1 phonetic segments have been successfully “aligned” via a “psycho-grammar”
there is no further need for the “internal communication” between production and perception. As a result, the
psycho-grammar which served to align production and perception in L1 speech acquisition “falls into disrepair
because of disuse”. Use it or lose it. Bever suggested, however, that losing the capacity to align production and
perception is not inevitable so long as “one is continually learning a new language.”