Article



Second Language Research 29(2) 165–183 © The Author(s) 2013. Reprints and permissions: sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/0267658313479360. slr.sagepub.com

Extracting words from the speech stream at first exposure


Ellenor Shoemaker
Université Sorbonne Nouvelle, Paris, France

Rebekah Rast
The American University of Paris, France

Abstract
The earliest stages of adult language acquisition have received increased attention in recent years (cf. Carroll, introduction to this issue). The study reported here aims to contribute to this discussion by investigating the role of several variables in the development of word recognition strategies during the very first hours of exposure to a novel target language (TL). Eighteen native speakers of French with no previous exposure to Polish were tested at intervals throughout a 6.5-hour intensive Polish course on their ability to extract target words from Polish sentences. Following Rast and Dommergues' (2003) first exposure study, stimuli were designed to investigate the effect of three factors:

1. the lexical transparency of the target word with respect to the native language (L1);
2. the frequency of the target word in the input;
3. the target word's position in the sentence.

Results suggest that utterance position plays an essential role in learners' ability to recognize words in the signal at first exposure, indicating acute sensitivity to the edges of prosodic domains. In addition, transparent words (e.g. profesor 'professor') were recognized significantly better than non-transparent words (e.g. lekarz 'doctor'), suggesting that first exposure learners are highly dependent on L1 phonological forms. Furthermore, the frequency of a target word in the input did not affect performance, suggesting that at the very beginning stages of learning, the amount of exposure to a lexical item alone does not play a significant role in recognition.

Keywords
acquisition of second language phonology, first exposure, Polish language, spoken word recognition
Corresponding author: Ellenor Shoemaker, Département Monde Anglophone, Université Sorbonne Nouvelle Paris 3, 13 rue de Santeuil, 75005 Paris, France. Email: ellenor.shoemaker@univ-paris3.fr

Downloaded from slr.sagepub.com at BEIJING FOREIGN STUDIES UNIV on July 9, 2013


I Introduction
The notion that running speech comprises individual lexical units is based in a psychological reality rather than an acoustic one. As noted by Kuhl (2000: 11,852), 'no speaker of any language perceives acoustic reality; in each case, perception is altered in the service of language.' It is a listener's linguistic experience rather than the inherent acoustic properties of the signal that allows him or her to identify and extract discrete lexical items from running speech. The variable and continuous nature of speech requires listeners to apply language-specific processing strategies in order to comprehend the speech stream. The sounds of a language not only attach to one another without pause in a continuous acoustic signal, but phonemes are also subject to myriad phonological processes that can modify their acoustic realization, thus rendering the mapping of lexical forms onto acoustic input problematic. Despite this variability, however, spoken word recognition in one's native language is not only efficient, but effortless.

According to Cutler (1996), models of spoken word recognition fall roughly into two categories: models that propose that recognition is the by-product of lexical competition, and models that propose that recognition is aided by explicit acoustic and/or phonological cues to where word boundaries lie. Competition-based recognition centers on the notion that segmentation of the speech signal emerges as a result of competition between candidates in the mental lexicon as they are activated by the acoustic input. This view hinges on the fact that listeners possess a well-stocked mental lexicon and consequently recognize the beginning of a word by identifying where the preceding word ends.
Models of lexical competition such as TRACE (McClelland and Elman, 1986) and Shortlist (Norris, 1994) therefore do not posit specialized mechanisms for the identification of word boundaries.1 The speech stream is segmented into non-overlapping words when the lexical competition process results in an optimal parse of the signal.

A second theory of segmentation and word recognition is based on the exploitation of phonetic and phonological detail in the identification of word and syllable boundaries. An ever-growing body of work has established that listeners exploit the phonetic variation that occurs at the edges of prosodic domains in processing the speech signal. Nakatani and Dukes (1977) were among the first to show that (native) listeners can use the presence of aspiration in word-initial voiceless stops in English to distinguish between such potentially ambiguous pairs as 'loose pills' and 'Lou spills', where an aspirated /p/ in the former signals a preceding word boundary. Further research in this domain has established that native speakers make use of myriad acoustic and phonological cues to locate the edges of words, including variation in segmental duration (Shoemaker, in press), the presence of full, unreduced vowels (Cutler and Butterfield, 1992), changes in fundamental frequency (Welby, 2007), and phonotactic constraints (McQueen, 1998), among others.

A substantial body of research also suggests that listeners make use of the rhythmic characteristics of language to identify word boundaries in the speech stream. The Metrical Segmentation Strategy (Cutler and Norris, 1988) proposes that prosody-based segmentation is a language-universal processing procedure, but that the rhythmic cues used in segmentation are particular to each language (or family of languages). Numerous studies have supported this hypothesis. Segmentation in English and Dutch has been



shown to be stress-based (Cutler and Norris, 1988; van Zon and de Gelder, 1993) in that listeners exploit the fact that most content words in these languages begin with a strong (stressed) syllable and subsequently assume that a word boundary will directly precede such a strong syllable. Conversely, syllable-based segmentation routines have been demonstrated in the Romance languages (Mehler et al., 1981; Sebastián-Gallés et al., 1992; Tabossi et al., 2000), where syllables are more or less equally weighted and listeners assume that syllable boundaries coincide with word boundaries.

More recent work has explored the dynamic nature of speech perception by examining the simultaneous exploitation of multiple cues at different processing levels in the segmentation of connected speech, thereby offering a hierarchy of cues based upon the saliency of each individual cue in different speech processing environments. Mattys (2003), for example, showed differential sensitivity to stress and phonotactic cues in English-speaking listeners when these cues were presented in clear speech as opposed to noise. When the two cues were pitted against one another in clear speech, participants showed more sensitivity to phonotactic constraints, but more sensitivity to stress when stimuli were presented in noise. A further study investigating the simultaneous exploitation of stress and coarticulation as boundary cues showed that coarticulation outweighed stress in clear speech, while stress outweighed coarticulation in a degraded signal (Mattys, 2004).

The exploitation of language-specific processing strategies such as those mentioned above renders (native) spoken word recognition effortless, to the extent that it is impossible not to comprehend spoken input in the native language (L1). The ease with which the L1 is processed, however, contrasts with the effort that can be required in the processing of a second language (L2).
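The stress-based routine just described can be caricatured in a few lines: posit a word boundary immediately before every strong syllable. The sketch below is our own illustration, not an implemented model from this literature; syllables are pre-annotated as strong ('S') or weak ('W').

```python
# Illustrative sketch of the Metrical Segmentation Strategy heuristic for a
# stress-timed language: hypothesize a word boundary immediately before
# every strong (stressed) syllable. Input is a list of (syllable, strength)
# pairs, where strength is 'S' (strong) or 'W' (weak).

def mss_segment(syllables):
    """Group syllables into candidate words, starting a new word at each
    strong syllable (except at the very start of the input)."""
    words, current = [], []
    for syl, strength in syllables:
        if strength == "S" and current:
            words.append("".join(current))
            current = []
        current.append(syl)
    if current:
        words.append("".join(current))
    return words

# "con-duct the or-ches-tra", with stress on 'con' and 'or':
stream = [("con", "S"), ("duct", "W"), ("the", "W"),
          ("or", "S"), ("ches", "W"), ("tra", "W")]
print(mss_segment(stream))  # ['conductthe', 'orchestra']
```

Note that the toy output also illustrates a known limitation of a purely stress-driven heuristic: the weak-initial function word 'the' is glued onto the preceding word.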
Word recognition and segmentation strategies that are employed efficiently in the L1 may not be applicable to the L2. For example, a stress-based segmentation routine that aids in the segmentation of English would be ineffective in the segmentation of French, which lacks lexical stress. Conversely, a syllable-based processing routine employed in French would be rendered inefficient in English given the ambisyllabic status of word-medial intervocalic consonants (Kahn, 1980). Research on processing by late learners has suggested that the processing of an L2 can in fact be constrained by the application of L1 segmentation routines (for a review, see Cutler, 2001).

However, while a great deal of research has been undertaken concerning the limitations of L2 speech processing at intermediate and advanced levels of proficiency, very little research to date has specifically dealt with how learners initially break into the sound stream of a novel foreign language, what aids them in this process at first exposure, and how they manage to improve perceptual strategies over time. One exception is the study of statistical learning using artificial language learning paradigms (e.g. Saffran et al., 1996); however, only in recent years have studies based on natural language input been seriously devoted to the investigation of how learners approach segmentation at first exposure. Extant models of natural L2 acquisition do not explicitly take into account the developmental aspects of word recognition at the very beginning stages of learning due to what is largely perceived as the insurmountable obstacle of controlling and measuring natural language input. Some recent studies have managed to find ways to surmount this obstacle in a non-instructional language acquisition setting. Gullberg et al. (2010), for


instance, found that participants were capable of extracting possible Mandarin word forms and phonotactic constraints after only 7–14 minutes of exposure to Mandarin Chinese by means of audio-visual material, and that with gestural support they were even able to extract sound–referent pairings. Other studies have focused on instructed L2 acquisition at first exposure (e.g. Rast, 2008), which allows for full control of the linguistic input and input treatments.

One methodology that has been used in L2 acquisition studies to capture learners' perceptual ability, regardless of proficiency level, is the sentence repetition task. Sentence repetition tasks have traditionally been used to determine how a learner perceives and memorizes target language (TL) utterances in the short term. Klein's (1986) study on sentence repetitions, for example, revealed a privileged role for the processing of information in initial and final positions. Barcroft and VanPatten (1997) tested a position effect as well and found that beginning English learners of Spanish attended more to utterance-initial items than to items in medial or final positions. Rast and Dommergues (2003) used this paradigm to investigate what elements of Polish could be perceived and repeated by first exposure participants and learners (L1 French) after 4 hours and again after 8 hours of Polish instruction. Participants listened to sentences in Polish recorded by a native speaker and were asked to repeat the sentences as best they could. The data were analysed in terms of correct repetitions of individual words relative to the following factors: hours of instruction, word length (measured in syllables), word stress, phonemic distance (based on a French–Polish phonemic comparison), transparency (based on a French–Polish lexical comparison), the position of the word in the sentence, and the frequency of the word in the Polish input.
Results showed a significant effect of word stress, phonemic distance, transparency, and sentence position on the ability of both first exposure participants and learners to repeat Polish words at all time intervals. No effect of word length was found, and a frequency effect appeared only after 8 hours of exposure.

The current study expands on Rast and Dommergues' work by testing fewer variables but with more control over the type of language activity by removing the production task. A purely perceptual word recognition task was designed to focus more specifically on sentence segmentation and word recognition by alleviating the need for learners to (re)produce orally. Carroll (2012) suggests that position effects in particular may be task dependent. Therefore, removing the production portion of the task may allow us to home in more precisely on comprehension strategies to which learners have access before aspects of the TL have been acquired.

This line of inquiry is based on several assumptions. First, we assume that first exposure learners do not have complete access to lexically-based segmentation strategies in that the TL lexicon has not yet been acquired, and therefore competition among lexical candidates in the TL cannot occur. Furthermore, we assume that learners at first exposure do not yet have knowledge of explicit phonetic and phonological cues to word boundaries in the TL that could aid in the localization of word boundaries. Conversely, we assume that first exposure learners do have access to the implicit knowledge that (1) syllables and words make up the acoustic signal, and that (2) perceptual strategies can be employed to extract these items efficiently.
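The statistical learning paradigm cited above (Saffran et al., 1996) rests on the transitional probability between adjacent syllables, which is high within words and dips at word boundaries. A minimal sketch of that computation follows; the syllable stream and the three 'words' in it are invented for illustration.

```python
from collections import Counter

def transitional_probs(stream):
    """TP(x -> y) = freq(xy) / freq(x), computed over adjacent syllable pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# Familiarization stream concatenating three invented trisyllabic "words"
# (bidaku, golatu, padoti) in varying order with no pauses between them:
stream = "bi da ku go la tu pa do ti go la tu bi da ku pa do ti".split()
tps = transitional_probs(stream)
print(tps[("bi", "da")], tps[("da", "ku")])   # within-word transitions: 1.0 1.0
print(tps[("ku", "go")])                      # across a word boundary: 0.5
```

A learner (or model) positing boundaries wherever the transitional probability drops recovers the word edges without any lexicon.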


Following the results of Rast and Dommergues (2003), we examine the contribution of three factors to a learner's ability to extract words from the signal: the frequency of the item in the input, the transparency of the item with respect to the L1, and the position of the item in an utterance.

First, the finding by Rast and Dommergues (2003) that frequency (measured as > 20 tokens in the input) only became a significant factor in participants' ability to repeat words after 8 hours of exposure to Polish merits further investigation. Slobin (1985) suggests that learners must take note of sameness in order for frequency to come into play. In other words, they need to recognize that they have previously seen or heard a given item or structure in the input and make note of it. This implies that some sort of recognition or extraction must take place repeatedly. One should also note that generally frequent items will become even more frequent over time; an item that reached the frequency threshold at 4 hours will likely be even more frequent at 8 hours, suggesting that frequency does not act alone, but rather is correlated with overall exposure.

With regard to lexical transparency, evidence of cross-linguistic influence in the results of the sentence repetition task in Rast and Dommergues (2003) adds yet more support to the well-established claim that learners use prior linguistic knowledge in L2 acquisition (Odlin, 1989). First exposure participants and learners were better able to repeat Polish words that shared formal (and possibly semantic) features with French words than those that did not. Results also showed that increased exposure had a stronger influence on opaque words than transparent ones. Can we then assume that transparent words are not learned in the same way as opaque words because transparent words are mapped onto existing mental representations whereas opaque ones are not?
Furthermore, will frequency effects be stronger for opaque words than for transparent ones in a purely perceptual task? Once again, a word recognition task will allow us to observe the effect of transparency on the ability of our learners to recognize lexical items without having to reproduce them.

First exposure participants and learners in Rast and Dommergues' (2003) study also relied on the position of a word in the sentence, correctly repeating more words in initial and final positions than in medial position, thereby providing support for Klein's (1986) claim that learners tend to process items in initial and final positions before those in medial position. These results are not, however, in line with Barcroft and VanPatten's (1997) findings that utterance-initial items were more acoustically salient than those in medial or final positions. The effect of position is not yet thoroughly understood, and studies show contradictory results. As mentioned above, the issue of the relevant task must be addressed. To what degree will utterance-initial and utterance-final positions be favored if the task requires the learner to recognize a lexical item in the acoustic stream, but not to reproduce it?

Polish was chosen as the target language of the current study in order to compare results with those of Rast and Dommergues (2003), and because its phonological and morpho-syntactic systems differ significantly from those of the L1 of the study's participants (French), allowing for observation of the role of the L1 in the acquisition process. In this section, we briefly outline some surface phonological similarities and differences between French and Polish at both the segmental and suprasegmental levels. For comprehensive phonological accounts of French and Polish the reader is directed to Tranel (1987) and Gussmann (2007), respectively.

Table 1. Phonemic inventories of French and Polish.

               French                                         Polish
Oral vowels    /a, ɑ,¹ e, ɛ, ə, i, œ, o, ɔ, ø, u, y/          /a, ɛ, i, ɔ, u, ɨ/
Nasal vowels   /ɑ̃, ɛ̃, œ̃, ɔ̃/                                   /ɛ̃, ɔ̃/²
Glides         /j, ɥ, w/                                      /j, w/
Consonants     /p, t, k, b, d, ɡ, m, n, ɲ,                    /p, t, k, b, d, ɡ, m, n, ɲ, ŋ,
               f, v, s, z, ʃ, ʒ, ʁ, l/                        f, v, s, z, ʂ, ʐ, x, ɕ, ʑ,
                                                              ts, dz,³ tʂ, dʐ, tɕ, dʑ, r,⁴ l/

Notes. 1. Not all French speakers make a distinction between /a/ and /ɑ/, most opting for the more centralized /a/. 2. It should be noted that not all phonologists agree that these two vowels are nasal, maintaining instead that they are realizations of /ɛ/ and /ɔ/ followed by a nasalized glide (for discussion see Gussmann, 2007). 3. These affricates are attested in French, but only in loan words. 4. Alveolar trill.
Sources. French data adapted from Tranel, 1987; Polish data adapted from Gussmann, 2007.

Concerning segmental inventories (see Table 1), French has a relatively complex vocalic system that includes 12 oral vowels and four nasal vowels. Polish, on the other hand, comprises six oral vowels and two nasal vowels. The consonantal inventories of the two languages differ largely as well. French comprises 17 consonantal segments and three glides. As noted by Gussmann (2007), what Polish may lack in its limited vocalic inventory, it more than makes up for in its consonantal system, which boasts 27 consonantal segments and two glides, including an extremely rich inventory of fricatives and affricates. In addition, Polish consonants are systematically palatalized in certain environments; some phonologists argue that palatalized consonants count as separate phonemes, while others argue that palatalization is merely allophonic variation.

At the suprasegmental level, French and Polish share the prosodic characteristic of fixed stress; however, they differ in where stress falls. Stress accent in French (mainly signaled by duration) consistently falls on the last syllable of a word in isolation or on the last syllable of an utterance, while Polish stress (mainly signaled by F0) falls on the penultimate syllable of a word. While both languages share fixed stress, Polish exhibits some leniency in the displacement of stress, as evidenced by loan words from Greek and Latin that carry stress on the antepenultimate syllable, e.g. 'fizyka 'physics' or re'publika 'republic'. Words that are borrowed into French from other languages, however, are without exception regularized to conform to the French pattern.

Regarding the rhythmic classification of the two languages, other differences emerge. French is considered to be a classic example of a syllable-timed language (Mehler et al., 1981), in that its rhythm is based on a regular distribution of roughly equally weighted syllables and a lack of vowel reduction.
Like French, Polish lacks vowel reduction; however, research on the classification of Polish rhythm is mixed as to whether it is syllable-timed or stress-timed. Syllable-timed languages tend to have relatively simple syllable structures and full, unreduced vowels, while stress-timed languages allow for more complex syllabic structure and vowel reduction in weak syllables, resulting in greater overall durational variability between consonantal and vocalic segments in stress-timed languages than in syllable-timed languages. Exploiting this fact, Ramus et al. (1999) measured durational


ratios of segments in eight languages from naturally produced corpora and found that, while languages traditionally considered to be stress-timed patterned together (e.g. English, Dutch) and syllable-timed languages also patterned together (e.g. French, Spanish, Italian), Polish patterned differently from both groups, leading the authors to conclude that Polish is neither stress-timed nor syllable-timed. Perceptual data confirmed this finding. Ramus et al. (2003) explored whether naive listeners can differentiate languages based solely on the durational ratios of consonants and vowels in synthesized speech. As predicted, languages in different rhythmic classes were easily discriminated from one another (e.g. English and Spanish). Participants easily discriminated Polish from Spanish, which the researchers take to indicate that Polish is not a syllable-timed language; however, it was also easily discriminated from English (although somewhat less easily than from Spanish). The authors take this to indicate that Polish cannot be classified along the traditional distinction between syllable-based and stress-based rhythm.
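Two of the durational measures introduced by Ramus et al. (1999) are %V, the proportion of total utterance duration occupied by vocalic intervals, and ΔC, the standard deviation of consonantal interval durations. A minimal sketch of both measures follows; the interval durations below are invented for illustration.

```python
from statistics import pstdev

def rhythm_metrics(intervals):
    """Compute %V (proportion of utterance duration that is vocalic) and
    Delta-C (standard deviation of consonantal interval durations) from a
    list of (type, duration_ms) pairs, with type 'V' or 'C'."""
    v = [d for t, d in intervals if t == "V"]
    c = [d for t, d in intervals if t == "C"]
    pct_v = 100 * sum(v) / (sum(v) + sum(c))
    delta_c = pstdev(c)
    return pct_v, delta_c

# Toy segmented utterance (durations in ms, invented):
utt = [("C", 80), ("V", 110), ("C", 70), ("V", 100), ("C", 150), ("V", 90)]
pct_v, delta_c = rhythm_metrics(utt)
print(round(pct_v, 1), round(delta_c, 1))  # 50.0 35.6
```

High %V and low ΔC characterize syllable-timed languages; low %V and high ΔC characterize stress-timed ones, which is how Polish could fall outside both clusters.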

II Method
An intensive five-day Polish course, taught by a native speaker of Polish, was conducted in Paris, France. The learning environment represented an authentic instructed language-learning situation using a communication-based method that excluded all use of metalanguage as well as explicit explanations of grammar and pronunciation. In addition, learners were asked not to consult Polish dictionaries, grammar books, or any outside input (spoken or written) for the duration of the data collection period. In order to control the input received by learners and the frequency of lexical items, a teaching script was strictly followed by the instructor. All learners received a total of 6.5 hours of instruction in Polish. The course was recorded, filmed and subsequently transcribed in its entirety in the CHAT format of the CHILDES programs (MacWhinney, 2000).

1 Participants
Eighteen native speakers of French were selected by means of a questionnaire and an interview with respect to specific criteria, and were remunerated for their participation. The average age of participants was 21.2 years (range 19–27). All participants reported English as their L2 and a Romance language as their third language (L3). Polish, the target language of the study, was thus the learners' fourth language (L4).2 None had any previous knowledge of Polish or other Slavic languages.

2 Materials
A list of 16 words in Polish (see Appendix 1) was compiled according to two criteria: transparency with respect to the L1 (French) and the word's frequency in the classroom input. Transparency was measured independently, based on the judgments of 13 native speakers of French who did not participate in the Polish course and who had no previous knowledge of Slavic languages.3 Participants in the transparency test heard a list of 71 words presented aurally and were asked to give a translation in French to the best of their ability. Following


Rast and Dommergues (2003), words with 0 correct translations across participants were classified as low transparency (LT) and words with more than 50% correct translations were classified as high transparency (HT). Each test word was further classified as low frequency (LF) if the word was completely absent from the classroom input (0 tokens) and high frequency (HF) if the word appeared more than 20 times in the classroom input.4 Test words were counterbalanced across the frequency and transparency categories; four words appeared in each combination of categories (HT/HF, HT/LF, LT/HF and LT/LF). All test words were of two or three syllables and carried stress on the penultimate syllable.

In order to measure the possible effect of a word's position in an utterance,5 48 test sentences of 20–25 syllables were created, which included each of the 16 test words in three different positions: initial (IP), medial (MP), and final (FP). Care was taken to avoid subordination or other syntactic structures that might introduce a pause before or after test words in the sentences. Thirty-three distracter sentences of 20–25 syllables were also created and were presented during the test along with 11 additional distracter words (presented three times each) that were not present in the test sentences. All sentences and words were recorded by a female native speaker of Polish in a sound-treated booth.
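The two classification criteria above can be summarized procedurally. The sketch below is our own illustration of those thresholds; the token counts and translation scores are invented.

```python
# Hedged reconstruction of the stimulus classification criteria described
# above; the example token counts and translation percentages are invented.

def classify(word, token_counts, pct_correct_translations):
    """Return (frequency class, transparency class) for a candidate test word.
    HF: > 20 tokens in the classroom input; LF: absent from the input (0 tokens).
    HT: > 50% correct translations; LT: 0 correct translations.
    Words falling between the thresholds were not used as test items (None)."""
    n = token_counts.get(word, 0)
    freq = "HF" if n > 20 else ("LF" if n == 0 else None)
    transp = ("HT" if pct_correct_translations > 50
              else ("LT" if pct_correct_translations == 0 else None))
    return freq, transp

counts = {"profesor": 25}                 # invented classroom token counts
print(classify("profesor", counts, 85))   # ('HF', 'HT')
print(classify("lekarz", counts, 0))      # ('LF', 'LT')
```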

3 Procedures
Participants were tested at two time intervals throughout the course: pre-exposure (T1; 0 hours of instruction) and after 6.5 hours of exposure (T2). The experimental protocol was created using E-Prime experimental software (Schneider et al., 2002) and presented on either laptop or desktop computers. Stimuli were presented binaurally through headphones. The experimental procedure was loosely based on Carroll (2006). In each experimental trial, participants heard a sentence in Polish followed immediately by the word OK.6 They then heard a Polish word in isolation. Their task was to report whether or not the word was present in the sentence they had heard by pressing either (1) or (2), respectively, on the computer keyboard. Stimuli were presented in randomized order. Participants completed a training portion (10 trials) before beginning the experimental portion (81 trials) in order to familiarize themselves with the procedure. Items included in the training portion were not included in the experimental portion. No response limit was set; participants were instructed to respond quickly, though not so quickly as to sacrifice accuracy. Each testing session lasted approximately 15 minutes.
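The scoring logic of this yes/no task can be sketched as follows; the trial records and field names below are our own invention for illustration.

```python
# Minimal sketch of scoring the yes/no word-recognition task described above:
# each trial pairs a sentence with a probe word; response "1" = present,
# "2" = absent. A response is correct when it matches the trial's ground truth.

def score(trials):
    """Percentage of trials where the yes/no response matches ground truth."""
    correct = sum(1 for t in trials
                  if (t["response"] == "1") == t["word_in_sentence"])
    return 100 * correct / len(trials)

trials = [
    {"word_in_sentence": True,  "response": "1"},   # hit
    {"word_in_sentence": True,  "response": "2"},   # miss
    {"word_in_sentence": False, "response": "2"},   # correct rejection
    {"word_in_sentence": False, "response": "1"},   # false alarm
]
print(score(trials))  # 50.0
```

Accuracy of this kind, computed per participant and condition, is what enters the analyses of variance reported below.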

III Results
Mean accuracy scores at T1 (0 hours of exposure) were examined using factorial analyses of variance (ANOVA) according to Transparency (LT and HT) and utterance Position (IP, MP, and FP).7 Testing at T1 revealed a significant effect of Transparency: F(1, 34) = 59.37, p < .0001. HT words were recognized better than LT words (88.4% versus 63.7%, respectively). A test word's Position in the utterance also had a significant effect on recognition accuracy at T1: F(2, 51) = 56.30, p < .0001. Words were recognized in IP 75.4% of the time, in MP 56.3% of the time, and in FP 96.3% of the time. Post-hoc analyses (Scheffé) demonstrated that the difference was significant among all three positions.


There was also a significant interaction between Transparency and Position at T1: F(2, 102) = 14.05, p < .0001, indicating that the effect of Transparency was not equivalent in all utterance positions. Post-hoc analyses showed that the effect of transparency was stronger in IP and in MP than in FP (possibly due to a ceiling effect, in that FP words were initially recognized with 96.3% accuracy).

Mean accuracy scores were then analysed for T2 (6.5 hours of exposure). Testing at T2 again revealed a significant effect of Transparency: F(1, 34) = 18.59, p = .0001. HT words were again recognized better than LT words (93.9% versus 82.1%, respectively). A test word's position in the utterance also had a significant effect on recognition: F(2, 51) = 27.91, p < .0001. At T2, words were recognized in IP 88.8% of the time, in MP 75.2% of the time, and in FP 100% of the time. Further post-hoc analyses also showed that the difference was again significant between all positions. No effect of a test word's frequency in the input was observed: F(1, 34) = .002, n.s. LF words and HF words were recognized equally well (87.9% and 88.1%, respectively, after 6.5 hours of exposure). There was additionally a significant interaction between Transparency and Position at T2: F(2, 102) = 8.34, p = .0004, again suggesting that the effect of Transparency was not equivalent in all utterance positions at T2. Further post-hoc analyses showed again that the effect of transparency was stronger in IP and in MP than in FP. No further significant interactions were revealed.

Word recognition performance at the two test Sessions (T1 and T2) was subsequently compared using a repeated-measures ANOVA, treating Transparency, Frequency, Position, and Session as within-participant variables. Overall mean accuracy scores at T1 (76.0%) and T2 (87.9%) revealed a significant effect of Session: F(1, 17) = 43.38, p < .0001. Participants significantly improved in the recognition of test items after 6.5 hours of exposure.
A significant interaction was further observed between Session and Transparency, F(1, 17) = 21.25, p = .0002, indicating that the effect of transparency was not equivalent at the two test sessions. Post-hoc analyses showed that sensitivity to LT words increased significantly from T1 to T2, while sensitivity to HT words did not. In addition, a significant interaction was observed between Session and Position, F(2, 34) = 7.14, p = .0026, indicating that the effect of a word's position in the utterance was also not equivalent at the two test sessions. Post-hoc analyses revealed that words in IP and MP were recognized significantly better at T2 than at T1, while words in FP showed no significant improvement. Additionally, there was a significant interaction between Transparency and Position, F(2, 34) = 31.97, p < .0001. This interaction suggests that the effect of Transparency was not equivalent in all utterance positions from T1 to T2, no doubt due to the fact that significant improvement was observed for LT words in IP and MP, but not in FP. No further significant interactions were observed. Mean accuracy rates at T1 and T2 are presented in Table 2.
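For readers who wish to see how an F ratio of the kind reported above (e.g. F(2, 51)) is obtained, the sketch below computes a one-way F over per-participant accuracy scores in the three utterance positions. The scores are invented, and the analyses reported in this study were factorial and repeated-measures rather than one-way; only the basic between/within decomposition is illustrated.

```python
from statistics import mean

def one_way_f(groups):
    """F = MS_between / MS_within for k independent groups of scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

ip = [70, 78, 75, 80]   # initial-position accuracy (%), invented
mp = [55, 60, 52, 58]   # medial-position accuracy, invented
fp = [95, 97, 96, 98]   # final-position accuracy, invented
f, df_b, df_w = one_way_f([ip, mp, fp])
print(df_b, df_w)  # 2 9
print(round(f, 1))
```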

IV Discussion
The current experiments explore what information is available in the acoustic signal that will aid the adult learner in extracting lexical forms from running speech at first exposure, i.e. before the TL lexicon and phonological system have been acquired. At T1, before any exposure to Polish, participants performed well above chance in the recognition of Polish words (76% mean accuracy), demonstrating that learners come to the table


Table 2. Mean accuracy (percentage) in word recognition task at T1 and T2 according to Frequency, Transparency, and utterance Position (standard deviations are in parentheses).

                            T1            T2
Frequency        High       76.9 (10.5)   87.9 (8.8)
                 Low        75.2 (8.1)    88.1 (7.5)
Transparency     High       88.4 (7.0)    95.9 (5.9)
                 Low        63.7 (11.6)   82.1 (10.0)
Position         Initial    75.4 (11.1)   88.8 (8.9)
                 Medial     56.3 (15.2)   75.2 (14.8)
                 Final      96.3 (5.3)    100.0 (0.0)
Global accuracy             76.0 (8.1)    87.9 (7.0)

with efficient perceptual tools already in place.8 We discuss below the exploitation of each of the three factors examined here.

1 Transparency
HT words (e.g. profesor 'professeur'/'professor') were extracted more easily from the speech stream than LT words (e.g. lekarz 'médecin'/'doctor') at both test times, suggesting that learners may be highly dependent on phonetic and lexical forms already established in the L1. Before any exposure to the Polish language, learners were able to use the transparency of items to recognize lexical forms in running speech. In other words, the phonetic forms of transparent words in Polish appear to be sufficient to activate L1 forms in the mental lexicon from the very first exposure. After 6.5 hours of exposure to spoken Polish (T2), the effect of Transparency was again seen; HT words were extracted better than LT words. What is particularly striking is the fact that significant improvement from T1 to T2 was observed in the recognition of LT words, but not in the recognition of HT words. This discrepancy could be attributed to a possible ceiling effect in that HT words were recognized extremely effectively at both test sessions (88.4% and 95.9%, respectively) and therefore did not allow as much room for improvement as LT words. A further possibility is that the discrepancy is due to increased sensitivity on the part of participants to the phonological system of Polish. Specifically, the fact that recognition of LT words (including those that were absent from the input) increased significantly, while recognition of HT words did not, would lead us to conclude that participants are acquiring sensitivity to general phonological forms and/or prosodic patterns of Polish rather than to specific lexical items that are acquired through repeated exposure. This hypothesis is discussed in further detail below.

2 Frequency
The frequency of a word in the input did not play a significant role in the recognition of individual lexical items. At T2 (6.5 hours of exposure), words that were frequent in the input were not recognized significantly better than those heard only during the administration of the test at T1. This finding further refines the results of Rast and Dommergues (2003), where a frequency effect was found after 8 hours of exposure but not after 4 hours. Frequency effects have been studied extensively in psycholinguistics and second language acquisition (for a review, see Ellis, 2002). Gass and Mackey (2002: 257) address the complexity of frequency effects and highlight the importance of several central issues: 'How do frequency effects interact with other aspects of the second language acquisition process, and when and under what conditions do they come into play? Likewise, when and under what conditions do they not play a role?' Our findings suggest that frequency effects, when measured in terms of repetitions of lexical items during intensive language instruction, did not play a role in the overall ability of participants to recognize words in the speech stream. Given the large quantity of research that provides evidence for a strong role for frequency, this finding may seem surprising. Several observations can be made, however. As mentioned above, Slobin (1985) proposes that frequency involves taking note of sameness, or rather familiarity and unfamiliarity, with some regularity. He further notes that the organism keeps track of frequency of patterns in experience, 'with automatic capacities to strengthen the traces of repeated experience and to more readily retrieve frequent and recent information' (Slobin, 1985: 1165–66). Our results, taken together with those of Rast and Dommergues (2003), suggest that 6.5 hours of intensive language instruction (1.5 hours/day) or 8 hours of extensive instruction (1.5 hours/week) is sufficient exposure for the learner to begin to extract lexical items from running speech, but that recognition accuracy is not specifically based in repeated exposure.
This finding is further in line with research on artificial language learning by Endress and Bonatti (2007), who showed that word forms can be segmented after only two minutes of exposure, but that more exposure is required to create representations of words. These authors suggest that there may in fact be two separate learning mechanisms: one that rapidly extracts structural information (such as boundary cues) from the speech signal, and another that operates more slowly and that computes the distributional properties within the structure. Specifically, the results presented here suggest that frequent exposure to test words at this early stage was not sufficient for participants to build lasting mental representations of these items.

3 Utterance position
Recognition results with respect to a word's position in an utterance clearly point to a learner's reliance on the edges of prosodic domains in the recognition of TL lexical items, providing evidence that prosodic boundaries are highly salient for first exposure learners. Words in both initial and final position were recognized more readily than words in medial position. Furthermore, words in final position were recognized better than words in both initial and medial position. The effect of Position was consistent at T1 and T2; MP words were recognized worse than both IP and FP words. However, recognition of MP words, regardless of their transparency or frequency, increased more than recognition of both IP and FP words from T1 to T2, an effect which we again interpret to indicate increased perceptual sensitivity to the Polish phonological system, which allowed participants to better break into the signal.

The finding that words in initial and final position were better recognized than those in medial position is in line with Rast and Dommergues' (2003) sentence repetition results. However, unlike the current study, Rast and Dommergues found no significant difference between the repetition of words in initial and those in final position, which may be evidence of differing processing strategies in production as opposed to perception. It should also be noted that the current results concerning a word's position in an utterance are contra those of Barcroft and VanPatten (1997), who found a preference for the left edge of an utterance over the right edge. While both initial and final utterance position can facilitate segmentation in that either the right or left edge of an item is necessarily marked by silence, we see two further potential reasons why words may be recognized better in final position. The first is an effect of acoustic and/or working memory, in that participants could more easily keep an acoustic trace of a word in final position in memory than a word in medial or initial position, both because the time needed to retain the form before responding is much shorter and because working memory is not further encumbered by incoming material. The test sentences employed in the current study contained 20–25 syllables, with an average duration of 4.38 sec, a relatively long sentence length in naturally produced speech. If participants are relying heavily on acoustic memory in the recognition of words, it is reasonable to assume that words in final position hold a privileged position. The second is a possible effect of phrase-final lengthening, which could render words in utterance-final position easier to recognize. This would in effect give listeners a double cue in that not only is the right edge of the word marked by silence, but the word itself is longer and therefore more acoustically salient than the same word in other positions.
Post-hoc analyses confirmed that words produced in final position (mean 692 msec) were significantly longer than the same word in initial (mean 551 msec) and medial (mean 510 msec) positions: F(2, 45) = 19.32, p < .0001. Analyses showed no significant durational differences between initial and medial words. Either of these two factors, or the two in combination, could render words in final position more easily recognizable than words in both initial and medial position.
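The logic of the duration comparison above can be sketched with a hand-computed one-way ANOVA. The durations below are invented for illustration only; they are not the study's measurements, and merely mimic the reported pattern (final words longer than initial and medial ones).

```python
# Minimal sketch of a one-way ANOVA F statistic over word durations (msec)
# grouped by utterance position. NOTE: the data are hypothetical, chosen
# only to reproduce the pattern described in the text (final > initial ~ medial).

def one_way_f(*groups):
    """F statistic for a one-way ANOVA: MS_between / MS_within."""
    k = len(groups)                               # number of groups
    n = sum(len(g) for g in groups)               # total observations
    grand = sum(sum(g) for g in groups) / n       # grand mean
    # Between-group sum of squares (variability due to position)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares (residual variability)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

initial = [545, 560, 548, 552, 551]   # hypothetical durations, mean ~551 msec
medial  = [505, 515, 508, 512, 510]   # mean ~510 msec
final   = [685, 700, 690, 695, 692]   # mean ~692 msec: phrase-final lengthening

print(round(one_way_f(initial, medial, final), 1))
```

A large F here simply reflects that between-position differences dwarf within-position variability, which is what the reported post-hoc result expresses.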

V General discussion
What elements can adult learners use to break into a novel acoustic signal, transforming it from a stream of incomprehensible noise into a sequence of neatly segmented, recognizable sound forms? The current study has demonstrated a clear role of both utterance position and lexical transparency in a learner's ability to recognize words at first exposure. The results demonstrate rapid improvement in the learner's ability to break into running speech in the TL after limited exposure. One possibility as to why the learners of this study may have developed sensitivity to the Polish phonetic system so rapidly and effectively is based in general language-learning strategies. As noted above, all of the participants reported English as their L2 and a Romance language as their L3, and therefore they were all experienced language learners. A growing body of research has found this experience to be beneficial to general language learning (see, for example, De Angelis, 2007). Given that all of the learners in the current study had previous experience with processing a novel acoustic stream, they no doubt had certain strategies already in place.

One further possibility is that participants gained sensitivity to the Polish phonological system. Given that input frequency did not have an effect on recognition, we posit that participants did not build lasting mental representations of individual lexical entries that they were exposed to, but rather that they acquired prosodic or segmental information specific to Polish, which allowed them to better segment the signal. If improvement in recognition from T1 to T2 were based on the acquisition of individual lexical entries through exposure to input, we would expect HF words to be recognized better than LF words at T2, which was not the case. This hypothesis is further supported by the fact that the recognition of LT words improved regardless of their frequency in the input.

While improved recognition of words due to increased sensitivity to the Polish phonological system is plausible, the current results do not allow us to pinpoint whether phonological knowledge was acquired at the segmental or the suprasegmental level, or both. Specifically regarding the improved recognition of LT words, one possibility is that participants gained sensitivity to the Polish phonemic inventory. The LT words contain segments not attested in French (among them palatalized /p, f, l/) as well as unattested consonant clusters. Thus, if participants gained sensitivity to these individual segments or clusters, LT words could feasibly be easier to extract from the sentences. At the suprasegmental level, increased sensitivity to the distribution of stress in Polish could also have played a role in word recognition. We are not aware of any research that specifically addresses rhythmically based speech segmentation strategies in Polish (by either native or non-native speakers), and the mixed nature of Polish rhythmic structure means that any supposition should be approached with caution.
However, given the regular distribution of stress, it is reasonable to assume that a segmentation strategy that exploits stress placement in the localization of word boundaries would be efficient. Segmentation strategies based on the regular distribution of stress have been demonstrated in languages such as English (Cutler and Butterfield, 1992), and could in fact be an efficient strategy in Polish in that stress is fixed and therefore even more regular than in English. If participants gained sensitivity to the overall rhythm of Polish, it would follow that they became sensitive to the fact that stress in Polish words falls almost exclusively on the penultimate syllable of a word. This information could help them extract the test words used in the current study in that all 16 test words follow this stress pattern, which would in effect signal to the listener the right edge of the word (the syllable following the stressed syllable). Additionally, all the test words used in the current study consisted of two or three syllables, and there was no possibility of secondary stress placement within words. Therefore, not only does penultimate stress help learners locate the right edge of the word, but the length of the word aids learners in finding the left edge of the word in that it either immediately precedes the stressed syllable (in two-syllable words) or is located one syllable to the left of the stressed syllable (in three-syllable words). All in all, the recognition of stress would constitute an efficient strategy for the localization of word boundaries and the extraction of word forms in the current stimuli.

This hypothesis is arguably in line with research concerning phonological acquisition in infants, whose speech development progresses in a similar fashion. A large body of work has focused on the notion of prosodic bootstrapping, which holds that infants first acquire the prosodic structure of a language and then in turn use this knowledge to identify discrete prosodic units, and finally the words and segments that make up these units. Work by Jusczyk and Aslin (1995) has shown evidence of speech segmentation in infants at about 6–7 months, before the acquisition of individual lexical items is in place, the implication being that infants can identify word boundaries before they acquire the words delineated by these boundaries. In other words, infants learn to identify cues to prosodic words before they acquire the segmental content of words. This conclusion is also based on evidence that infants can discriminate rhythmic classes before they can discriminate lexical items. For example, infants can discriminate between a stress-based language such as English and a mora-based language such as Japanese (Nazzi et al., 1998). However, infants cannot discriminate between languages that exhibit similar metrical structures, for example, French and Spanish (Mehler et al., 1988) or English and Dutch (Nazzi et al., 1998), further suggesting that they are attending to the rhythm of the languages and not individual segments or words.

It could be argued, however, that the particular language pairing employed in the current study, L1 French and TL Polish, is problematic for a theory in which first exposure participants are using stress placement in the segmentation of running speech. Previous research on the perception of stress has suggested that speakers of French exhibit a certain 'deafness' to stress stemming from the lack of lexical stress in French. It has been proposed that (monolingual) French speakers have never acquired this parameter in their native phonology and therefore have particular difficulty perceiving it in other languages (Dupoux et al., 1997).
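The penultimate-stress heuristic described above can be sketched as a toy procedure. The syllable stream, stress marks, and boundary rule below are simplified assumptions for illustration; this is not a model of the participants' behavior, only of how fixed penultimate stress could in principle signal word edges.

```python
# Toy sketch (simplified assumptions, not the study's method): hypothesize
# word boundaries in a syllable stream using fixed penultimate stress.
# A word's right edge is placed after the syllable that follows each
# stressed syllable, as described in the text above.

def segment_by_penultimate_stress(syllables):
    """Split a list of (syllable, is_stressed) pairs into candidate words."""
    words, current = [], []
    close_after = None                 # index at which the current word ends
    for i, (syl, stressed) in enumerate(syllables):
        current.append(syl)
        if stressed:
            close_after = i + 1        # word ends after the post-stress syllable
        if close_after == i:
            words.append("".join(current))
            current, close_after = [], None
    if current:                        # flush any trailing material
        words.append("".join(current))
    return words

# 'profesor pracuje': pro-FE-sor pra-CU-je, stress on the penultimate syllable
stream = [("pro", False), ("fe", True), ("sor", False),
          ("pra", False), ("cu", True), ("je", False)]
print(segment_by_penultimate_stress(stream))  # ['profesor', 'pracuje']
```

Note that the rule recovers both edges for the two- and three-syllable test words described above, since the left edge is at most one syllable before the stressed one.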
One could argue that if French speakers are transferring their L1 pattern of syllable-based segmentation (and lack of lexical stress), they may not be able to efficiently parse the signal based on stress placement in Polish. In response to this, however, we would argue that the distribution of stress in Polish may be more accessible to speakers of French in that, as explained above, Polish shows characteristics of both syllable-timed languages and stress-timed languages. Specifically, Polish, like French, does not have vowel reduction, therefore rendering the length and weight of syllables more regular than in a stress-timed language such as English. This fact may render the edges of syllables (and words) more available to speakers of French, and may render the perception of stress more accessible.

We emphasize that our conclusions concerning the acquisition of Polish prosodic structure by participants in this study are purely hypothetical. The current study was not specifically designed to test the acquisition of prosody in Polish, and thus our conclusions must remain conjectural at this point. Furthermore, we are not proposing that information acquired concerning the phonemic inventory in Polish did not also aid in recognition; however, given the prosodic structure of Polish, it seems that rhythmic information may be more useful to participants than segmental information in the signal. Further research would be required to test whether participants at first exposure are specifically attending to general prosodic characteristics as they break into the TL signal; however, data from infant phonological acquisition would lead us to believe that this is plausible.

Crucially, while the current data show a rapid increase in the learners' ability to match and extract phonetic forms from the acoustic stream, they do not speak to the learner's capacity to assign meaning to these forms. As Carroll notes, 'hearing words is merely a first step in a series of processes which take the speech signal as their input and culminate in an interpretation' (2004: 228). Therefore, one further question that emerges from the current results is whether participants are not only recognizing phonetic forms, but also attaching meaning to these forms. In other words, are recognition and association of meaning two completely separate processes? Can there be segmentation without recognition, in which unknown words remain arbitrary (yet recognizable) strings of phonemes? Given that the meaning of HT words is (generally) equivalent in both the L1 and the TL, we assume that learners are able to map HT words onto existing lexical entries that include meaning associations; however, we cannot make the same assumption for the LT words. What is clear from the current results is that participants significantly improved in their ability to match Polish phonetic forms after just 6.5 hours of exposure.

Further research testing different source and target language combinations, as well as research specifically targeted at interactions between the acquisition of phonological systems and form-meaning associations, would be required to address these and other unanswered questions. One further avenue for research could include introspective data collection in which participants are asked to reflect upon the particular strategies that they employed in the recognition of words in continuous speech. This type of meta-analysis could contribute to our ability to pinpoint what strategies language learners may be employing during word recognition in the initial exposure to the target language.

Acknowledgements
We wish to express our thanks to those who made this study possible, in particular Marzena Watorek, Paulina Kurzepa, Ewa Lenart, Maya Hickmann, and Sophie Wauquier. We also thank our anonymous reviewers for their insightful comments.

Funding
This research project was supported by a grant from the Programme d'Aide à la Recherche Innovante (2011–12), Université Paris 8, St-Denis, France.

Notes
1. Later versions of Shortlist do incorporate metrically-based segmentation.
2. Space limitations prevent us from entering into discussion about the details of learners' background languages and thus from contributing more fully to research on multilingual speech processing. Information about the learners' languages was collected by means of a language questionnaire in which learners rated their own proficiency level in each of their languages. Our objective was to create a homogeneous group; therefore, candidates whose language background differed significantly from the group profile were not retained for the study.
3. We distinguish 'transparency' from 'cognate' in order to emphasize our focus on the learner. We are not concerned here with etymology but rather with the psychotypology of the learner, i.e. what the learner perceives as the same or different in the L1 and TL (see Kellerman, 1980).
4. Frequency measures were calculated based solely on the Polish professor's oral input. Participants were exposed to limited written input in the form of presentation slides and some further aural input in the form of recorded dialogues used in listening comprehension exercises; however, these tokens were not included in the frequency count. Furthermore, all forms of a word were counted regardless of case and gender inflection (e.g. studentk was included in frequency counts for the target item studentem). With regard to the distribution of frequency over time, all frequent words appeared in the input during at least three of the five class sessions. Although frequency counts for each item differed, they were comparable with respect to the two categories of transparency (high and low). We also point out that, since participants were tested twice using the same test materials, there was very limited exposure to the eight LF words in that participants were exposed to these items three times in each of the two test sessions (once in each utterance position).
5. It should be noted that the input contained varied syntactic structures (SVO, OVS and others), and therefore target items were well distributed across sentence positions in the classroom input.
6. The word 'OK' was included between the sentence and the test word in order to prevent participants from relying solely on echoic (i.e. acoustic) memory.
7. Frequency measures were not analysed at T1 given that participants had not yet been exposed to the HF test words.
8. As pointed out by a reviewer, the fact that the learners in this study performed well above chance in the word recognition task before any exposure to the target language could be due to the fact that the participants were all experienced language learners. Polish represented an L4 for all of the participants and therefore these learners already possessed successful language learning strategies. For this reason, generalizations to less experienced learners should be made with caution.

References
Altenberg E (2005) The perception of word boundaries in a second language. Second Language Research 21: 325–58.
Barcroft J and VanPatten B (1997) Acoustic salience of grammatical forms: The effect of location, stress, and boundedness on Spanish L2 input processing. In: Glass WR and Pérez-Leroux AT (eds) Contemporary perspectives on the acquisition of Spanish: Volume 2. Somerville, MA: Cascadilla Press, 109–21.
Carroll S (2004) Segmentation: Learning how to hear words in the L2 speech stream. Transactions of the Philological Society 102: 227–54.
Carroll S (2006) The micro-structure of a learning problem: Prosodic prominence, attention, segmentation and word learning in a second language. Unpublished paper presented at the Annual Meeting of the Canadian Linguistic Association, York University, Toronto, Ontario, Canada.
Carroll S (2012) First exposure learners make use of top-down lexical knowledge when learning. In: Braunmüller K and Gabriel C (eds) Multilingual individuals and multilingual societies. Amsterdam: John Benjamins, 23–46.
Cutler A (1996) Prosody and the word boundary problem. In: Morgan J and Demuth K (eds) Signal to syntax: Bootstrapping from speech to grammar in early acquisition. Mahwah, NJ: Erlbaum, 87–99.
Cutler A (2001) Listening to a second language through the ears of a first. Interpreting 5: 1–23.
Cutler A and Butterfield S (1992) Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language 31: 218–36.
Cutler A and Norris D (1988) The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14: 113–21.
Cutler A, Mehler J, Norris D and Segui J (1989) Limits of bilingualism. Nature 340: 229–30.
De Angelis G (2007) Third or additional language acquisition. Clevedon: Multilingual Matters.
Dupoux E, Pallier C, Sebastián-Gallés N and Mehler J (1997) A destressing 'deafness' in French? Journal of Memory and Language 36: 406–21.
Ellis N (2002) Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24: 143–88.
Endress A and Bonatti LL (2007) Rapid learning of syllable classes from a perceptually continuous speech stream. Cognition 105: 247–99.
Gass S and Mackey A (2002) Frequency effects and second language acquisition: A complex picture? Studies in Second Language Acquisition 24: 249–60.
Gullberg M, Roberts L, Dimroth C, Veroude K and Indefrey P (2010) Adult language learning after minimal exposure to an unknown natural language. Language Learning 60: 5–24.
Gussmann E (2007) The phonology of Polish. Oxford: Oxford University Press.
Jusczyk P and Aslin R (1995) Infants' detection of the sound patterns of words in fluent speech. Cognitive Psychology 28: 1–23.
Kahn D (1980) Syllable-based generalizations in English phonology. New York: Garland.
Kellerman E (1980) Œil pour œil. Encrages, Special issue: Acquisition d'une langue étrangère [Foreign language acquisition]: 54–63.
Klein W (1986) Second language acquisition. Cambridge: Cambridge University Press.
Kuhl P (2000) A new view of language acquisition. Proceedings of the National Academy of Sciences 97: 11850–57.
MacWhinney B (2000) The CHILDES Project: Tools for analysing talk. Mahwah, NJ: Lawrence Erlbaum Associates.
Mattys S (2003) Stress-based speech segmentation revisited. In: Proceedings of Eurospeech: The 8th Annual Conference on Speech Communication and Technology. Geneva, 121–24.
Mattys S (2004) Stress versus coarticulation: Towards an integrated approach to explicit speech segmentation. Journal of Experimental Psychology: Human Perception and Performance 30: 397–408.
McClelland J and Elman J (1986) The TRACE model of speech perception. Cognitive Psychology 18: 1–86.
McQueen J (1998) Segmentation of continuous speech using phonotactics. Journal of Memory and Language 39: 21–46.
Mehler J, Dommergues J-Y, Frauenfelder U and Segui J (1981) The syllable's role in speech segmentation. Journal of Verbal Learning and Verbal Behavior 20: 298–305.
Mehler J, Jusczyk P, Lambertz G, Halsted N, Bertoncini J and Amiel-Tison C (1988) A precursor of language acquisition in young infants. Cognition 29: 143–78.
Nakatani L and Dukes K (1977) Locus of segmental cues to word juncture. Journal of the Acoustical Society of America 62: 714–19.
Nazzi T, Bertoncini J and Mehler J (1998) Language discrimination in newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance 24: 756–77.
Norris D (1994) Shortlist: A connectionist model of continuous speech recognition. Cognition 52: 189–234.
Odlin T (1989) Language transfer: Cross-linguistic influence in language learning. Cambridge: Cambridge University Press.
Ramus F, Nespor M and Mehler J (1999) Correlates of linguistic rhythm in the speech signal. Cognition 73: 265–92.
Ramus F, Dupoux E and Mehler J (2003) The psychological reality of rhythm classes: Perceptual studies. In: 15th International Congress of Phonetic Sciences. Barcelona, 337–42.
Rast R (2008) Foreign language input: Initial processing. Clevedon: Multilingual Matters.
Rast R and Dommergues J-Y (2003) Towards a characterisation of saliency on first exposure to a second language. EUROSLA Yearbook 3: 131–56.
Saffran JR, Aslin RN and Newport EL (1996) Statistical learning by 8-month-olds. Science 274: 1926–28.
Schneider W, Eschman A and Zuccolotto A (2002) E-Prime user's guide. Pittsburgh, PA: Psychology Software Tools.
Sebastián-Gallés N, Dupoux E, Segui J and Mehler J (1992) Contrasting syllabic effects in Catalan and Spanish: The role of stress. Journal of Memory and Language 31: 18–32.
Shoemaker E (in press) Durational cues to word recognition in spoken French. Applied Psycholinguistics.
Slobin D (1985) Crosslinguistic evidence for the language-making capacity. In: Slobin D (ed.) The crosslinguistic study of language acquisition: Volume II. Hillsdale, NJ: Lawrence Erlbaum, 1157–1256.
Tabossi P, Collina S, Mazzetti M and Zoppello M (2000) Syllables in the processing of spoken Italian. Journal of Experimental Psychology: Human Perception and Performance 26: 758–75.
Tranel B (1987) The sounds of French: An introduction. Cambridge: Cambridge University Press.
Van Zon M and de Gelder B (1993) Perception of word boundaries by Dutch listeners. In: Proceedings of the 3rd European Conference on Speech Communication and Technology. Berlin, 689–92.
Welby P (2007) The role of early fundamental frequency rises and elbows in French word segmentation. Speech Communication 49: 28–48.

Appendix 1. Polish test words (with French and English translations).

High frequency (HF)

High Transparency (HT):
francuski /fran'tsuski/ 'français' (French)
profesor /prɔ'fɛsɔr/ 'professeur' (professor)
studentem /stu'dɛntɛm/ 'étudiant' (student)
fotograf /fɔ'tɔgraf/ 'photographe' (photographer)

Low Transparency (LT):
lekarz /'lɛkaʂ/ 'médecin' (doctor)
język /'jɛ̃zɨk/ 'langue' (language)
pracuje /pra'tsujɛ/ 'travaille' (works, 3rd person singular)
Niemcem /'ɲɛmtsɛm/ 'allemand' (German)

Low frequency (LF)

High Transparency (HT):
dokument /dɔ'kumɛnt/ 'document' (document)
lampa /'lampa/ 'lampe' (lamp)
plastyk /'plastɨk/ 'plastique' (plastic)
ananas /a'nanas/ 'ananas' (pineapple)

Low Transparency (LT):
śpiewak /'ɕpʲɛvak/ 'chanteur' (singer)
świetnie /'ɕfʲɛtɲɛ/ 'bien' (well, adverb)
Litwinem /lit'finɛm/ 'lituanien' (Lithuanian)
lodówka /lɔ'dufka/ 'frigo' (refrigerator)