Subtle patterns of learner language: 13 topics for further research
Steve Pepper, 2013-09-26, ASKeladden
[Figure: word cloud of high-frequency Norwegian words, e.g. og, er, det, å, i, jeg, som, en, at, på]
Introduction
• An application of the detection-based
argument (Jarvis 2010)
– Modelled on Jarvis & Crossley (2012)
• Use of data mining methods to
1) automatically detect (predict) the L1
2) identify (lexical) features that serve to
discriminate between L1 groups, i.e.
L1 predictors
• Major advantages:
– Ability to recognize positive as well as
negative transfer
– Ability to detect very subtle patterns that
might otherwise escape notice
Jarvis & Crossley (2012)
Evidence of the third kind...
• The method supplies the first two kinds of
evidence “out of the box”
– The focus here is therefore on supplying the
third kind
• Sources of type 3 evidence
– the learner’s L1 performance
– comparable users’ L1 performance
– contrastive grammars
– traditional grammars
• Involves Contrastive Interlanguage Analysis
(Granger 1996)
– IL_L2 <> NL_L1 (L2 interlanguage vs. native L1 usage)
Evidence for transfer (Jarvis 2010)
1. Intergroup heterogeneity
2. Intragroup homogeneity
3. Cross-language congruity
4. Intralingual contrasts
L1 predictors
• 55 features (i.e. words) selected using
Discriminant Analysis (see box)
– DA explained on Saturday at LCR 2013
• Subjected to post-hoc analysis using
Tukey’s HSD
– single-step multiple comparison procedure
and statistical test that is used in conjunction
with an ANOVA to find means that differ
statistically from each other
• The output is not very easy to
interpret…
andre, at, av, bare,
barn, barna, bo, da, de,
den, det, du, eller, en,
enn, er, et, for, fordi,
fra, han, har, hun, i,
ikke, jeg, kan, liker,
man, mange, med,
meg, men, mennesker,
mer, min, mye, norge,
norsk, når, og, også,
om, på, skal, som,
sted, så, til, veldig,
venner, vi, viktig,
være, å
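To make the notion of a lexical feature concrete, the following is a minimal sketch of how one learner text might be turned into a vector of feature-word frequencies. It is an illustration only, not the pipeline used in the study: the tokenisation, the normalisation per 1,000 tokens, the function name `feature_vector` and the sample sentence are all assumptions, and only a handful of the 55 words are included.

```python
import re
from collections import Counter

# A handful of the 55 feature words listed in the box (subset chosen for brevity).
FEATURES = ["den", "det", "en", "et", "er", "skal", "og", "eller"]

def feature_vector(text, features=FEATURES, per=1000):
    """Return occurrences of each feature word per `per` tokens of `text`.

    Normalising per 1,000 tokens is an assumption made for this sketch.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty texts
    return {word: per * counts[word] / total for word in features}

# Invented sample sentence (13 tokens, two of them "skal").
sample = "Jeg skal lage middag og han skal lese en bok eller et blad."
vec = feature_vector(sample)
for word, value in vec.items():
    print(f"{word:6s} {value:7.1f}")
```

Vectors of this kind, one per text and labelled with the writer's L1, are the sort of input a classifier such as Discriminant Analysis operates on.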
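Homogeneity table for den (L1s ordered by increasing mean; - = not in the subset):
SH EN PL DE NO RU
 -  -  -  -  -  X
 -  Y  Y  Y  Y  -
 X  X  X  -  -  -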
Df Sum Sq Mean Sq F value Pr(>F)
myData$L1 5 1790 358.1 10.11 2.65e-09 ***
Residuals 594 21044 35.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = myData[, X] ~ myData$L1)
$`myData$L1`
diff lwr upr p adj
en-de -1.373 -3.7796269 1.03362692 0.5781845
no-de 0.032 -2.3746269 2.43862692 1.0000000
pl-de -0.239 -2.6456269 2.16762692 0.9997514
ru-de 3.186 0.7793731 5.59262692 0.0023298
sh-de -2.434 -4.8406269 -0.02737308 0.0456381
no-en 1.405 -1.0016269 3.81162692 0.5528485
pl-en 1.134 -1.2726269 3.54062692 0.7583997
ru-en 4.559 2.1523731 6.96562692 0.0000013
sh-en -1.061 -3.4676269 1.34562692 0.8063672
pl-no -0.271 -2.6776269 2.13562692 0.9995400
ru-no 3.154 0.7473731 5.56062692 0.0026907
sh-no -2.466 -4.8726269 -0.05937308 0.0409536
ru-pl 3.425 1.0183731 5.83162692 0.0007589
sh-pl -2.195 -4.6016269 0.21162692 0.0969624
sh-ru -5.620 -8.0266269 -3.21337308 0.0000000
sh en pl de no ru
2.806 3.867 5.001 5.240 5.272 8.426
feature: den
NOTE:
Tukey’s HSD was performed for
groups of six L1s at a time. There were
six such “groups of six”:
– DE, EN, PL and RU were always
included (along with the control
group NO)
– NL, SH, SP, SO, SQ and VI
were each added in turn
– The example above shows the
homogeneity table for the group
of L1s that includes SH
– Examples to follow (including
the next one) contain up to six
homogeneity tables at once
Essence represented visually
as a “homogeneity table”
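The step from the Tukey output to a homogeneity table can be sketched as follows. This is a minimal illustration, not the author's script: the means and adjusted p-values are copied from the output for den, while the function names, the 0.05 threshold and the simple left-to-right grouping are assumptions. The grouping relies on equal group sizes, so that L1s that do not differ form contiguous runs once sorted by mean.

```python
# Group means and adjusted p-values copied from the Tukey HSD output for "den".
means = {"sh": 2.806, "en": 3.867, "pl": 5.001, "de": 5.240, "no": 5.272, "ru": 8.426}
p_adj = {
    ("en", "de"): 0.5781845, ("no", "de"): 1.0000000, ("pl", "de"): 0.9997514,
    ("ru", "de"): 0.0023298, ("sh", "de"): 0.0456381, ("no", "en"): 0.5528485,
    ("pl", "en"): 0.7583997, ("ru", "en"): 0.0000013, ("sh", "en"): 0.8063672,
    ("pl", "no"): 0.9995400, ("ru", "no"): 0.0026907, ("sh", "no"): 0.0409536,
    ("ru", "pl"): 0.0007589, ("sh", "pl"): 0.0969624, ("sh", "ru"): 0.0000000,
}

def differs(a, b, alpha=0.05):
    """True if the pair (a, b) differs significantly at level alpha."""
    p = p_adj[(a, b)] if (a, b) in p_adj else p_adj[(b, a)]
    return p < alpha

def homogeneous_subsets(means, alpha=0.05):
    """Return L1s sorted by mean and the maximal runs with no significant pair."""
    order = sorted(means, key=means.get)
    subsets = []
    for start in range(len(order)):
        end = start + 1
        # Extend the run while the next L1 differs from none of its members.
        while end < len(order) and not any(
            differs(order[i], order[end], alpha) for i in range(start, end)
        ):
            end += 1
        group = order[start:end]
        # Keep only maximal runs (skip runs contained in an earlier one).
        if not any(set(group) <= set(s) for s in subsets):
            subsets.append(group)
    return order, subsets

order, subsets = homogeneous_subsets(means)
print(" ".join(f"{l1.upper():>2}" for l1 in order))
for k, group in enumerate(reversed(subsets)):  # highest subset first
    mark = "XY"[k % 2]
    print(" ".join(f"{mark if l1 in group else '-':>2}" for l1 in order))
```

For den this yields three subsets: RU on its own, then EN, PL, DE and NO, then SH, EN and PL, which matches the one X, four Y and three X marks of the table on the slide.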
#1 NL speakers overuse skal
• Finite form of modal auxiliary skulle; used to
form the future tense
han skal lage middag i kveld
he will make dinner tonight
– Other methods:
• non-past: han lager middag i kveld
• construction komme til + infinitive
• Recognized tendency for beginners to overuse
this form
– Partly due to overly simplistic explanations
in teaching materials
• “Futurum lager vi av skal + infinitiv”
(Greftegreff 1985)
• Analysis shows that skal is overused by NL, SH,
SO, SQ and VI learners
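Homogeneity tables for skal (L1s ordered by increasing mean; - = not in the subset):
RU DE EN NO PL NL
 -  -  -  -  -  Y
 X  X  X  X  X  -
RU DE EN NO PL SH
 -  -  -  -  -  Y
 X  X  X  X  X  -
RU DE EN NO PL SO
 -  -  -  -  -  Y
 X  X  X  X  X  -
RU DE EN NO PL SP
 X  X  X  X  X  X
RU DE EN NO PL SQ
 -  -  -  -  -  Y
 X  X  X  X  X  -
RU DE EN NO PL VI
 -  -  -  -  -  Y
 X  X  X  X  X  -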
Possible explanations: proficiency? thematic bias? transfer?
Proficiency?
• We have CEFR ratings for 7 of the 10
L1 groups (not NL, SH, SQ)
– VI and SO score lowest
– DE and EN score highest
• For these 7 L1 groups, overuse of skal
thus correlates with linguistic and/or
cultural distance
– VI and SO communities in Norway
originated as refugees
– If lower proficiency explains overuse
of skal for VI and SO, chances are
that it also does so for SH and SQ
– But this does not explain the NL case
• So could the reason for NL users’
overuse be thematic bias?
[Figure: chart of CEFR ratings (A2, A2/B1, B1, B1/B2, B2, B2/C1, C1) on a 0–100 scale for SO, VI, SP, RU, PL, EN and DE]
Thematic bias?
• Some topics are more concerned with future events than others
– Over half the occurrences of skal are in 6 of the 46 topics
• Compare occurrences per text (“freq”) with the topic held constant
– Framtida: 4.9 (NL) >> 2.9 (SP)
– Bomiljø: 1.3 (NL) >> 0.5 (EN) and 0.6 (SP)
– Nyheter: 1.1 (NL) >> 0.7 (DE) and 0.4 (EN)
• Even with the topic held constant, the tendency is clear
• Thematic bias can thus be ruled out
Topic                                  DE              EN              NL              SP
                                   wc  tc freq     wc  tc freq     wc  tc freq     wc  tc freq
Framtida                            -   -    -      -   -    -     39   8  4.9     29  10  2.9
Bomiljø                             -   -    -     20  38  0.5     21  16  1.3     14  23  0.6
Bolig og bosted                     -   -    -      -   -    -     13   9  1.4      -   -    -
Frivillig hjelp i organisasjoner    2   5  0.4      -   -    -      9   2  4.5      -   -    -
Nyheter                             7  10  0.7      4   9  0.4      8   7  1.1      2   -
Reise                               -   -    -      -   -    -      8  14  0.6      -   -    -
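The "freq" column can be recomputed from the other two as wc divided by tc, which makes the topic-by-topic comparison easy to check. The sketch below copies the (wc, tc) pairs from the table. The dictionary layout and the function names are assumptions made for illustration, and the incomplete SP cell for Nyheter is left out because its tc value is missing.

```python
# (wc, tc) pairs copied from the table; cells shown as "-" are simply omitted.
counts = {
    "Framtida": {"NL": (39, 8), "SP": (29, 10)},
    "Bomiljø": {"EN": (20, 38), "NL": (21, 16), "SP": (14, 23)},
    "Bolig og bosted": {"NL": (13, 9)},
    "Frivillig hjelp i organisasjoner": {"DE": (2, 5), "NL": (9, 2)},
    "Nyheter": {"DE": (7, 10), "EN": (4, 9), "NL": (8, 7)},
    "Reise": {"NL": (8, 14)},
}

def freq(wc, tc):
    """Occurrences of skal per text, rounded to one decimal as in the table."""
    return round(wc / tc, 1)

def nl_is_highest(topic):
    """True if NL has a higher rate than every other L1 attested for the topic."""
    rates = {l1: wc / tc for l1, (wc, tc) in counts[topic].items()}
    others = [rate for l1, rate in rates.items() if l1 != "NL"]
    return bool(others) and all(rates["NL"] > rate for rate in others)

# Only topics with at least one other L1 besides NL allow a comparison.
shared_topics = [topic for topic in counts if len(counts[topic]) > 1]
result = {topic: nl_is_highest(topic) for topic in shared_topics}

for topic, row in counts.items():
    cells = ", ".join(f"{l1} {freq(wc, tc)}" for l1, (wc, tc) in row.items())
    print(f"{topic}: {cells}")
print(result)
```

In every topic where NL can be compared with another L1, the NL rate is the highest, which is the pattern the slide relies on when it rules out thematic bias.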
Cross-linguistic explanation
• In NL the future tenses are formed with the auxiliary zullen
hij zal het diner vanavond maken
• NL zullen cognate with skulle – finite form zal similar in form to skal
– EN shall also cognate with skal and similar in form, but much less frequent
in EN than ’ll, will and going to
– DE werden is neither cognate nor similar in form
• Conclusion: Strong tendency for NL speakers to overuse skal appears to
be a case of formal lexical transfer
– Caveat: NL has other means to express future action, including the non-past
tense (hij maakt het diner vanavond) and the auxiliary gaan
– Further investigation of relative frequencies necessary in order to confirm or
disconfirm possible transfer effects
➔ Is there anything else that should be considered???
#2 DE speakers overuse en
• Speakers of Slavic languages use the indefinite
articles en (m.) and et (n.) much less frequently
than learners from other L1 backgrounds
– Also applies to SO, SQ and VI, as expected
• But why do DE speakers use the masculine form
en more than everyone else?
– DE forms ein (m., n.), eine (f.) bear strong formal
resemblance to en
– Tendency to use en instead of et because of this?
– Detailed error analysis required.
• Hypothesis
– That DE speakers commit errors of type
<sic type="W" corr="et"><word>en</word></sic>
more frequently than other L1 groups
➔ Comments???
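One way the hypothesis could be checked is by counting the relevant error annotations per L1 group. The sketch below is a hypothetical illustration: only the `<sic type="W" corr="et"><word>en</word></sic>` markup comes from the slide, while the `<corpus>` and `<text l1="...">` wrapper, the sample sentences and the function name are invented and need not match the real ASK format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented mini-corpus; only the <sic> element follows the markup on the slide.
SAMPLE = """
<corpus>
  <text l1="DE">Jeg har <sic type="W" corr="et"><word>en</word></sic> hus.</text>
  <text l1="DE">Hun kjøpte <sic type="W" corr="et"><word>en</word></sic> eple.</text>
  <text l1="EN">Vi så <sic type="W" corr="en"><word>et</word></sic> film.</text>
  <text l1="PL">Han har <sic type="W" corr="et"><word>en</word></sic> barn.</text>
</corpus>
"""

def count_substitutions(xml_string, wrong="en", corr="et"):
    """Count, per L1, the type-W errors where `wrong` was written for `corr`."""
    root = ET.fromstring(xml_string)
    tally = Counter()
    for text in root.iter("text"):
        for sic in text.iter("sic"):
            word = sic.findtext("word", default="").strip().lower()
            if sic.get("type") == "W" and sic.get("corr") == corr and word == wrong:
                tally[text.get("l1")] += 1
    return tally

errors = count_substitutions(SAMPLE)
print(dict(errors))
```

On real data the raw counts would still need to be normalised, for example by the number of texts or of indefinite articles per group, before L1 groups are compared.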
PL RU EN NO NL DE
Y Y Y
X X X
Y Y
X X
PL SH RU EN NO DE
Y Y
X X
Y Y
X X X
PL RU SO EN NO DE
Y Y
X X X
Y Y Y
X X X
PL RU SP EN NO DE
Y Y
X X X
Y Y Y
X X
PL RU SQ EN NO DE
Y Y
X X X
Y Y Y
X X
PL VI RU EN NO DE
Y Y
X X
Y Y Y
X X X
#3 EN speakers overuse et
• Cross-linguistic explanation?
– Avoidance of en (as indefinite article) due to
identification with the numeral ‘one’?
– Greater similarity between EN ‘a’ [ə] and NO et
(short vowel, unvoiced dental plosive) than between
‘a’ and NO en (formal lexical transfer)?
• Greater similarity between en and EN ‘an’, but ‘an’
much less frequent than ‘a’
– Wiktionary rankings #102 and #5 respectively
– ‘a’ occurs 11 times more often than ‘an’
– Evidence that frequency constrains transfer?
• Conclusion: L1 transfer appears to be at work
when EN speakers overuse et
➔ But how can this be proved beyond doubt???
RU PL DE NL NO EN
X X
Y Y Y
X X X
RU PL SH DE NO EN
Y Y
X X X X
SO RU PL DE NO EN
X X
Y Y
X X X X
RU PL DE SP NO EN
X X
Y Y Y
X X X X
RU PL SQ DE NO EN
X X
Y Y
X X X X
RU PL DE VI NO EN
X X
Y Y Y
X X X X
#4 PL and RU speakers: den and det
• These are 3SG pronouns, demonstratives, and
(preposed) definite articles
• RU speakers use den (m.) significantly more
often than all other L1 groups, including PL
speakers
• PL speakers use det (n.) significantly more
often than RU speakers
– Absolute usage figures:
• den PL 122, RU 166 (~40:60)
• det PL 668, RU 496 (~60:40)
➔ Why???
➔ How can we find out???
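As a first step towards an answer, the absolute figures can be turned into shares, and the cross-over between den and det checked with a simple 2×2 chi-square test. This is a rough sketch rather than the analysis used in the study: it treats every token as an independent observation and ignores differences in text length, and the function names are assumptions.

```python
import math

# Absolute usage figures from the slide.
usage = {"den": {"PL": 122, "RU": 166}, "det": {"PL": 668, "RU": 496}}

def shares(word):
    """Percentage of the PL + RU total for `word` contributed by each group."""
    total = sum(usage[word].values())
    return {l1: round(100 * n / total) for l1, n in usage[word].items()}

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

den_shares = shares("den")
det_shares = shares("det")
chi2, p = chi_square_2x2(
    usage["den"]["PL"], usage["den"]["RU"],
    usage["det"]["PL"], usage["det"]["RU"],
)
print(den_shares, det_shares)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

The shares come out at roughly 42:58 for den and 57:43 for det, in line with the approximate 40:60 and 60:40 given on the slide.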
NOTE:
• 3SG personal pronouns
are identical in
PL (on, ona, ono) and
RU (он, она, оно)
• Demonstrative pronouns
– PL ten, ta, to
– RU этот, эта, это
$den (L1s ordered by increasing mean; - = not in the subset)
SH EN PL DE NO RU
 -  -  -  -  -  X
 -  Y  Y  Y  Y  -
 X  X  X  -  -  -
$det (L1s ordered by increasing mean; - = not in the subset)
NO RU SH EN DE PL
 -  -  X  X  X  X
 -  Y  Y  Y  Y  -
 X  X  X  X  -  -
#5 EN speakers overuse er
• EN speakers use er ‘is, are’ significantly more often
than all other L1 groups (except PL and SH)
• Most likely explanation: formal transfer
– formal resemblance er [æɾ] ~ are [ɑ(ɹ)]
      EN          NO
    sg   pl     sg   pl
1.  am   are    er   er
2.  are  are    er   er
3.  is   are    er   er
• High salience of ‘to be’ in English (not least
because of present continuous)
– And yet, ENPC shows finite forms of NO være to
be more frequent than finite forms of EN be
• 8,182 vs. 6,566 occurrences
➔ So how to explain EN overuse???
RU NO NL DE PL EN
X X
Y Y Y
X X X X
RU NO DE PL SH EN
X X X
Y Y Y
X X X
SO RU NO DE PL EN
X X
Y Y
X X X X
RU NO DE SP PL EN
Y Y
X X X
Y Y Y
X X X
RU NO SQ DE PL EN
X X
Y Y Y
X X X X
RU VI NO DE PL EN
X X
Y Y
X X X X
#6 While RU speakers underuse er
• PL and SH speakers use er more than RU
speakers
– Despite the fact that they are all Slavic languages
• PL and SH have a copula in the present tense
(być and бити ~ biti)
PL dom jest tam
SH кућа је тамо ~ kuća je tamo
‘the house is there’
• RU no longer has such a copula
RU дом _ там
‘the house is there’
➔ Case proved???
RU NO NL DE PL EN
X X
Y Y Y
X X X X
RU NO DE PL SH EN
X X X
Y Y Y
X X X
SO RU NO DE PL EN
X X
Y Y
X X X X
RU NO DE SP PL EN
Y Y
X X X
Y Y Y
X X X
RU NO SQ DE PL EN
X X
Y Y Y
X X X X
RU VI NO DE PL EN
X X
Y Y
X X X X
#7 Many L1 groups underuse være
Underuse by RU, SH, SO, SQ and VI
Possible cross-linguistic explanations:
RU no copula in present tense
VI copula là not used with adjectives
(because adjectives are verbal), thus:
Mai là sinh viên
‘Mai is (a) student’
but
Mai cao
‘Mai is tall’
SH copula exists but little used due to
contact with other Balkan languages
SO yahay ‘to be’ contracts with adjectives,
losing its root (-ah-) in the process
SQ no infinitives (është is finite form)
➔ Case proved???
RU NL DE PL NO EN
Y Y Y Y Y
X X X X X
SH RU DE PL NO EN
Y Y Y Y
X X X X X
SO RU DE PL NO EN
X X X X
Y Y Y Y
X X X X
RU DE PL NO SP EN
Y Y Y Y Y
X X X X
SQ RU DE PL NO EN
Y Y Y Y
X X X X X
VI RU DE PL NO EN
Y Y Y Y
X X X X X
#8 But EN speakers overuse være
• Overuse by EN speakers
– Difference is statistically significant w.r.t. RU, SH, SO, SQ
and VI
• Difference w.r.t. NO not statistically significant, but still
noticeable
– In the English-Norwegian Parallel Corpus, be
occurs much more frequently in English texts
(both fiction and non-fiction) than være does in
Norwegian texts
• be: 3,126 occurrences
• være: 1,193 occurrences
– Worthy of a more detailed investigation using
ENPC
➔ Alternative explanations?
RU NL DE PL NO EN
Y Y Y Y Y
X X X X X
SH RU DE PL NO EN
Y Y Y Y
X X X X X
SO RU DE PL NO EN
X X X X
Y Y Y Y
X X X X
RU DE PL NO SP EN
Y Y Y Y Y
X X X X
SQ RU DE PL NO EN
Y Y Y Y
X X X X X
VI RU DE PL NO EN
Y Y Y Y
X X X X X
#9 Prepositions i and på
• Preposition på ‘on’
– EN (overuse) vs. DE (underuse)
– Investigate using error analysis
– Check type and token frequencies of
constructions in which corresponding
L1 forms (on and auf) are congruent
in one L1 but not the other, e.g.:
– NO på søndag ≡ EN on Sunday
but ≠ DE am Sonntag
whereas
– NO på engelsk ≡ DE auf Englisch
but ≠ EN in English
• Preposition i ‘in’
– RU (overuse) vs. PL (underuse)
– Investigate using error analysis
➔ Any suggestions???
$i
PL EN DE NO NL RU
X X X
Y Y Y Y
X X X X
PL EN DE SH NO RU
Y Y
X X X X X
PL EN DE SO NO RU
Y Y Y
X X X X X
PL EN SP DE NO RU
Y Y
X X X X X
PL EN DE NO SQ RU
X X X
Y Y Y
X X X X
PL EN DE NO VI RU
X X X
Y Y Y Y
X X X X
$på
DE RU NO NL PL EN
Y Y Y Y Y
X X X X X
DE RU NO SH PL EN
Y Y Y Y Y
X X X X X
SO DE RU NO PL EN
X X X X
Y Y Y Y
X X X X
DE RU NO SP PL EN
Y Y Y Y Y
X X X X X
DE SQ RU NO PL EN
Y Y Y Y Y
X X X X X
DE RU NO VI PL EN
Y Y Y Y Y
X X X X X
Prepositions, especially spatial prepositions, are renowned for being “among the hardest expressions to acquire when learning a second language”
(Coventry & Garrod 2004: 4) and they have already been the subject of some interesting work based on ASK (Szymanska 2010; Malcher 2011).
#10 Prepositions til and fra
• Preposition til ‘to’
– underused by all L1 groups,
especially DE, SH and SQ
– …
• Preposition fra ‘from’
– used significantly more often
by EN speakers than by PL
or native speakers
– …
➔ Any suggestions here???
$til
DE RU PL NL EN NO
Y Y Y Y Y
X X X X X
SH DE RU PL EN NO
Y Y Y Y
X X X X X
DE RU SO PL EN NO
Y Y Y Y Y
X X X X X
DE RU SP PL EN NO
Y Y Y Y Y
X X X X X
SQ DE RU PL EN NO
Y Y Y Y
X X X X X
DE RU PL VI EN NO
Y Y Y Y Y
X X X X X
$fra
NO PL DE NL RU EN
X X X X
Y Y Y Y
X X X
NO PL SH DE RU EN
X X
Y Y Y
X X X X
NO PL DE SO RU EN
X X X X
Y Y Y Y
X X X
NO PL DE SP RU EN
X X X X
Y Y Y Y
X X X
NO PL DE SQ RU EN
X X X X
Y Y Y Y
X X X
NO PL DE VI RU EN
X X X X
Y Y Y Y
X X X X
#11 Underuse and overuse of og
• Striking contrast between PL speakers
(underuse) and RU speakers (overuse)
– Cannot be formal transfer, since PL i and RU и
are phonologically identical
• Different token frequencies in L1s?
– Wiktionary frequency lists (WFREQ)*
• RU и ranked as #1
• PL i ranked as #2 (after w ‘in’)
– Raw frequencies not comparable in WFREQ
• Zipfian distribution?
• Requires further investigation
➔ Your suggestions???
PL DE NL EN NO RU
Y Y
X X X X
PL DE SH EN NO RU
X X
Y Y
X X X X
PL DE EN SO NO RU
X X
Y Y Y
X X X X
PL SP DE EN NO RU
X X
Y Y
X X X X
PL SQ DE EN NO RU
X X
Y Y
X X X X
VI PL DE EN NO RU
X X
Y Y
X X X X
* http://en.wiktionary.org/wiki/Wiktionary:FREQ
#12 Overuse and underuse of eller
• DE and EN speakers overuse eller ‘or’
– Difference w.r.t. NL is highly significant
• This seems odd. (Are the Dutch more
decisive than the English and Germans?)
– Difference between DE and NO also statistically significant
– Frequency related?
• Mutual correspondence between NO eller
and EN ‘or’ is 84%
• RU speakers underuse eller
– Strong formal resemblance with или (ili)
• Possible cross-linguistic explanation
– или has a more restricted distribution
– Not used in negative contexts
он не любит ни футбол, ни теннис
‘he doesn’t like football or tennis’
RU NO NL PL EN DE
X X
Y Y Y Y
X X X X
RU SH NO PL EN DE
X X
Y Y Y
X X X X
RU SO NO PL EN DE
X X
Y Y Y
X X X X
RU NO PL SP EN DE
X X
Y Y Y Y
X X X
RU SQ NO PL EN DE
X X
Y Y Y
X X X X
RU VI NO PL EN DE
X X
Y Y Y
X X X X
#13 More general questions
• Misclassification can also be revealing
– Texts written by EN learners are more often misclassified as SP, rather than NL
or DE, despite EN being more closely related to the latter
➔ Why???
– Texts by SO and SQ learners are most often misclassified as RU, whilst texts
by VI learners are most often misclassified as PL
➔ Again, why???
• All 12 patterns discussed above pertain to Indo-European languages
most closely related to NO (DE, EN, NL; PL, RU)
– There are no really clear-cut predictors for the most distantly related L1s,
i.e. SO, SQ and VI
➔ Why???
Conclusion
• Discriminant analysis reveals subtle patterns of L2 usage that
would otherwise go undetected
• Homogeneity tables based on Tukey’s HSD can help us
understand those patterns
• Contrastive analysis is required in order to confirm that the
patterns are due to cross-linguistic influence
• All 13 issues discussed in this chapter are suitable topics for
further research using ASK
• This study has merely scratched the surface…
13 research questions
1. Why do NL speakers overuse skal?
2. Why do DE speakers overuse en?
3. Why do EN speakers overuse et?
4. Why do PL and RU speakers differ so
much in their use of den and det?
5. Why do EN speakers overuse er?
6. Why do RU speakers underuse er?
7. Why do many L1 groups underuse være?
8. Why do EN speakers, on the other hand,
overuse være?
9. Why do EN speakers overuse på, while DE
speakers underuse it?
And why do RU speakers overuse i, while PL
speakers underuse it?
10. Why do all L1 groups underuse til –
and why do EN speakers overuse fra?
11. Why do PL and RU speakers differ so
markedly in their use of og?
12. Why do EN and DE speakers overuse eller and
why do RU speakers underuse it?
13. What lies behind the misclassification patterns,
and why are there no good predictors for SO,
SQ and VI?
References
Donaldson, Bruce. 1997. Dutch: A Comprehensive Grammar. London: Routledge.
Granger, Sylviane. 1996. From CA to CIA and back: An integrated approach to computerized bilingual and
learner corpora. In Karin Aijmer, Bengt Altenberg and Mats Johansson (eds.) Languages in Contrast.
Papers from a symposium on text-based cross-linguistic studies. Lund 4–5 March 1994. Lund: Lund
University Press [Lund Studies in English 88], 37–51.
Greftegreff, Liv Astrid. 1985. Enkel norsk grammatikk. Oslo: NKS-Forlaget.
Husby, Olaf. 1999. En kort innføring i albansk. Trondheim: Tapir.
Husby, Olaf. 2001. En kort innføring i somali. Trondheim: Tapir.
Jarvis, Scott. 2010. Comparison-based and detection-based approaches to transfer research. EUROSLA
Yearbook 10, 169–192.
Jarvis, Scott & Scott A. Crossley (eds.) 2012. Approaching Language Transfer through Text Classification.
Explorations in the detection-based approach. Bristol: Multilingual Matters.
Koolhoven, H. 1961. Teach yourself Dutch. London: The English Universities Press.
Lie, Svein. 2005. Kontrastiv grammatikk – med norsk i sentrum, 3rd Edition. Oslo: Novus.
Malcher, Jenny. 2011. Jeg liker å treffe folk i café. Man må nyter de fine tingene på verden! Preposisjoner og
morsmålstransfer – en korpusbasert studie med i og på i fokus. Master's thesis, Department of Linguistics
and Scandinavian Studies, University of Oslo.
Mønnesland, Svein. 1990. Serbokroatisk-norsk kontrastiv grammatikk. In Hvenekilde, Anne (ed.) Med to
språk: Fem kontrastive språkstudier for lærere. Oslo: Cappelen.
Saeed, John Ibrahim. 1993. Somali Reference Grammar, 2nd Edition. Kensington, MD: Dunwoody Press.
Szymanska, Oliwia. 2010b. A conceptual approach towards the use of prepositional phrases in Norwegian – the
case of i and på. Folia Scandinavica 11, 173-183.
Wade, Terence. 2011. A Comprehensive Russian Grammar. Malden, MA: Wiley.
Wiull, Hans Olaf. 2007. Bli bedre i norsk – se forskjellene mellom norsk og vietnamesisk. Oslo: VOX.

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Subtle patterns of learner language: 13 topics for further research

  • 1. Subtle patterns of learner language: 13 topics for further research. Steve Pepper, 2013-09-26, ASKeladden. [Word cloud: og er det å i jeg som en at på for de til ikke har med vi kan av man men om et så mange den var må eller seg også mye veldig når være fra norge andre alle skal meg du vil noen hvis mer mennesker ha dette barn bare blir viktig fordi folk da han min barna hva noe få dem bli synes hvor selv etter hadde oss nå land år kommer ting gjøre alt enn dag der livet tror venner gå flere stor får trenger]
  • 2. Introduction • An application of the detection-based argument (Jarvis 2010) – Modelled on Jarvis & Crossley (2012) • Use of data mining methods to 1) automatically detect (predict) the L1 2) identify (lexical) features that serve to discriminate between L1 groups, i.e. L1 predictors • Major advantages: – Ability to recognize positive as well as negative transfer – Ability to detect very subtle patterns that might otherwise escape notice Jarvis & Crossley (2012)
  • 3. Evidence of the third kind... • The method supplies the first two kinds of evidence “out of the box” – The focus here is therefore on supplying the third kind • Sources of type 3 evidence – the learner’s L1 performance – comparable users’ L1 performance – contrastive grammars – traditional grammars • Involves Contrastive Interlanguage Analysis (Granger 1996) – ILL2 < > NLL1 Evidence for transfer (Jarvis 2010) 1. Intergroup heterogeneity 2. Intragroup homogeneity 3. Cross-language congruity 4. Intralingual contrasts
  • 4. L1 predictors • 55 features (i.e. words) selected using Discriminant Analysis (see box) – DA explained on Saturday at LCR 2013 • Subjected to post-hoc analysis using Tukey’s HSD – single-step multiple comparison procedure and statistical test that is used in conjunction with an ANOVA to find means that differ statistically from each other • The output is not very easy to interpret… andre, at, av, bare, barn, barna, bo, da, de, den, det, du, eller, en, enn, er, et, for, fordi, fra, han, har, hun, i, ikke, jeg, kan, liker, man, mange, med, meg, men, mennesker, mer, min, mye, norge, norsk, når, og, også, om, på, skal, som, sted, så, til, veldig, venner, vi, viktig, være, å
  • 5. Essence represented visually as a “homogeneity table”
    Feature: den

    Homogeneity table:
    SH EN PL DE NO RU
                   X
       Y  Y  Y  Y
    X  X  X

    ANOVA:
                 Df Sum Sq Mean Sq F value   Pr(>F)
    myData$L1     5   1790   358.1   10.11 2.65e-09 ***
    Residuals   594  21044    35.4
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Tukey multiple comparisons of means
    95% family-wise confidence level
    Fit: aov(formula = myData[, X] ~ myData$L1)

    $`myData$L1`
            diff        lwr         upr     p adj
    en-de -1.373 -3.7796269  1.03362692 0.5781845
    no-de  0.032 -2.3746269  2.43862692 1.0000000
    pl-de -0.239 -2.6456269  2.16762692 0.9997514
    ru-de  3.186  0.7793731  5.59262692 0.0023298
    sh-de -2.434 -4.8406269 -0.02737308 0.0456381
    no-en  1.405 -1.0016269  3.81162692 0.5528485
    pl-en  1.134 -1.2726269  3.54062692 0.7583997
    ru-en  4.559  2.1523731  6.96562692 0.0000013
    sh-en -1.061 -3.4676269  1.34562692 0.8063672
    pl-no -0.271 -2.6776269  2.13562692 0.9995400
    ru-no  3.154  0.7473731  5.56062692 0.0026907
    sh-no -2.466 -4.8726269 -0.05937308 0.0409536
    ru-pl  3.425  1.0183731  5.83162692 0.0007589
    sh-pl -2.195 -4.6016269  0.21162692 0.0969624
    sh-ru -5.620 -8.0266269 -3.21337308 0.0000000

    Group means:
       sh    en    pl    de    no    ru
    2.806 3.867 5.001 5.240 5.272 8.426

    NOTE: Tukey’s HSD was performed for groups of six L1s at a time. There were six such “groups of six”:
    – DE, EN, PL and RU were always included (along with the control group NO)
    – NL, SH, SP, SO, SQ and VI were each added in turn
    – The example above shows the homogeneity table for the group of L1s that includes SH
    – Examples to follow (including the next one) contain up to six homogeneity tables at once
  • 6. #1 NL speakers overuse skal • Finite form of modal auxiliary skulle; used to form the future tense han skal lage middag i kveld he will make dinner tonight – Other methods: • non-past: han lager middag i kveld • construction komme til + infinitive • Recognized tendency for beginners to overuse this form – Partly due to overly simplistic explanations in teaching materials • “Futurum lager vi av skal + infinitiv” (Greftegreff 1985) • Analysis shows that skal is overused by NL, SH, SO, SQ and VI learners RU DE EN NO PL NL Y X X X X X RU DE EN NO PL SH Y X X X X X RU DE EN NO PL SO Y X X X X X RU DE EN NO PL SP X X X X X X RU DE EN NO PL SQ Y X X X X X RU DE EN NO PL VI Y X X X X X ? proficiency ? thematic bias ? transfer
  • 7. Proficiency? • We have CEFR ratings for 7 of the 10 L1 groups (not NL, SH, SQ) – VI and SO score lowest – DE and EN score highest • For these 7 L1 groups, overuse of skal thus correlates with linguistic and/or cultural distance – VI and SO communities in Norway originated as refugees – If lower proficiency explains overuse of skal for VI and SO, chances are that it also does so for SH and SQ – But this does not explain the NL case • So could the reason for NL users’ overuse be thematic bias? [Chart: CEFR ratings (A2, A2/B1, B1, B1/B2, B2, B2/C1, C1) for the L1 groups SO, VI, SP, RU, PL, EN, DE, on a scale of 0 to 100]
  • 8. Thematic bias?
    • Some topics are more concerned with future events than others
      – Over half the occurrences of skal are in 6 of the 46 topics
    • Cf. occurrences per text (“freq”) with the topic held constant
      – 4.9 (NL) >> 2.9 (SP)
      – 1.3 (NL) >> 0.5 (EN) and 0.6 (SP)
      – 1.1 (NL) >> 0.7 (DE) and 0.4 (EN)
    • Even with the topic held constant, the tendency is clear
    • Thematic bias can thus be ruled out

                                        DE            EN            NL            SP
    Topic                               wc  tc  freq  wc  tc  freq  wc  tc  freq  wc  tc  freq
    Framtida                            -   -   -     -   -   -     39  8   4.9   29  10  2.9
    Bomiljø                             -   -   -     20  38  0.5   21  16  1.3   14  23  0.6
    Bolig og bosted                     -   -   -     -   -   -     13  9   1.4   -   -   -
    Frivillig hjelp i organisasjoner    2   5   0.4   -   -   -     9   2   4.5   -   -   -
    Nyheter                             7   10  0.7   4   9   0.4   8   7   1.1   2   -
    Reise                               -   -   -     -   -   -     8   14  0.6   -   -   -
  • 9. Cross-linguistic explanation • In NL the future tenses are formed with the auxiliary zullen hij zal het diner vanavond maken • NL zullen cognate with skulle – finite form zal similar in form to skal – EN shall also cognate with skal and similar in form, but much less frequent in EN than ’ll, will and going to – DE werden is neither cognate nor similar in form • Conclusion: Strong tendency for NL speakers to overuse skal appears to be a case of formal lexical transfer – Caveat: NL has other means to express future action, including the non-past tense (hij maakt het diner vanavond) and the auxiliary gaan – Further investigation of relative frequencies necessary in order to confirm or disconfirm possible transfer effects ➔ Is there anything else that should be considered???
  • 10. #2 DE speakers overuse en • Speakers of Slavic languages use the indefinite articles en (m.) and et (n.) much less frequently than learners from other L1 backgrounds – Also applies to SO, SQ and VI. As expected • But why do DE speakers use the masculine form en more than everyone else? – DE forms ein (m., n.), eine (f.) bear strong formal resemblance to en – Tendency to use en instead of et because of this? – Detailed error analysis required. • Hypothesis – That DE speakers commit errors of type <sic type="W" corr="et"><word>en</word></sic> more frequently than other L1 groups ➔ Comments??? PL RU EN NO NL DE Y Y Y X X X Y Y X X PL SH RU EN NO DE Y Y X X Y Y X X X PL RU SO EN NO DE Y Y X X X Y Y Y X X X PL RU SP EN NO DE Y Y X X X Y Y Y X X PL RU SQ EN NO DE Y Y X X X Y Y Y X X PL VI RU EN NO DE Y Y X X Y Y Y X X X
  • 11. #3 EN speakers overuse et • Cross-linguistic explanation? – Avoidance of en (as indefinite article) due to identification with the numeral ‘one’? – Greater similarity between EN ‘a’ [ə] and NO et (short vowel, unvoiced dental plosive) than between ‘a’ and NO en (formal lexical transfer)? • Greater similarity between en and EN ‘an’, but ‘an’ much less frequent than ‘a’ – Wiktionary rankings #102 and #5 respectively – ‘a’ occurs 11 times more often than ‘an’ – Evidence that frequency constrains transfer? • Conclusion: L1 transfer appears to be at work when EN speakers overuse et ➔ But how can this be proved beyond doubt??? RU PL DE NL NO EN X X Y Y Y X X X RU PL SH DE NO EN Y Y X X X X SO RU PL DE NO EN X X Y Y X X X X RU PL DE SP NO EN X X Y Y Y X X X X RU PL SQ DE NO EN X X Y Y X X X X RU PL DE VI NO EN X X Y Y Y X X X X
  • 12. #4 PL and RU speakers: den and det • These are 3SG pronouns, demonstratives, and (preposed) definite articles • RU speakers use den (m.) significantly more often than all other L1 groups, including PL speakers • PL speakers use det (n.) significantly more often than RU speakers – Absolute usage figures: • den PL 122, RU 166 (~40:60) • det PL 668, RU 496 (~60:40) ➔ Why??? ➔ How can we find out??? NOTE: • 3SG personal pronouns are identical in PL (on, ona, ono) and RU (он, она, оно) • Demonstrative pronouns – PL ten, ta, to – RU этот, эта, это $den SH EN PL DE NO RU X Y Y Y Y X X X $det NO RU SH EN DE PL X X X X Y Y Y Y X X X X
  • 13. #5 EN speakers overuse er • EN speakers use er ‘is, are’ statistically more than all other L1 groups (except PL and SH) • Most likely explanation: formal transfer – formal resemblance er [æɾ] ~ are [ɑ(ɹ)] EN NO sg pl sg pl 1. am are er er 2. are are er er 3. is are er er • High salience of ‘to be’ in English (not least because of present continuous) – And yet, ENPC shows finite forms of NO være to be more frequent than finite forms of EN be • 8,182 vs. 6,566 occurrences ➔ So how to explain EN overuse??? RU NO NL DE PL EN X X Y Y Y X X X X RU NO DE PL SH EN X X X Y Y Y X X X SO RU NO DE PL EN X X Y Y X X X X RU NO DE SP PL EN Y Y X X X Y Y Y X X X RU NO SQ DE PL EN X X Y Y Y X X X X RU VI NO DE PL EN X X Y Y X X X X
  • 14. #6 While RU speakers underuse er • PL and SH speakers use er more than RU speakers – Despite the fact that they are all Slavic languages • PL and SH have a copula in the present tense (być and бити ~ biti) PL dom jest tam SH кућа је тамо ~ kuća je tamo ‘the house is there’ • RU no longer has such a copula RU дом _ там ‘the house is there’ ➔ Case proved??? RU NO NL DE PL EN X X Y Y Y X X X X RU NO DE PL SH EN X X X Y Y Y X X X SO RU NO DE PL EN X X Y Y X X X X RU NO DE SP PL EN Y Y X X X Y Y Y X X X RU NO SQ DE PL EN X X Y Y Y X X X X RU VI NO DE PL EN X X Y Y X X X X
  • 15. #7 Many L1 groups underuse være Underuse by RU, SH, SO, SQ and VI Possible cross-linguistic explanations: RU no copula in present tense VI copula là not used with adjectives (because adjectives are verbal), thus: Mai là sinh viên ‘Mai is (a) student’ but Mai cao ‘Mai is tall’ SH copula exists but little used due to contact with other Balkan languages SO yahay ‘to be’ contracts with adjectives, losing its root (-ah-) in the process SQ no infinitives (është is finite form) ➔ Case proved??? RU NL DE PL NO EN Y Y Y Y Y X X X X X SH RU DE PL NO EN Y Y Y Y X X X X X SO RU DE PL NO EN X X X X Y Y Y Y X X X X RU DE PL NO SP EN Y Y Y Y Y X X X X SQ RU DE PL NO EN Y Y Y Y X X X X X VI RU DE PL NO EN Y Y Y Y X X X X X
  • 16. #8 But EN speakers overuse være • Overuse by EN speakers – Difference is statistical w.r.t. RU, SH, SO, SQ and VI • Difference w.r.t. NO not statistical, but still noticeable – In the English-Norwegian Parallel Corpus, be occurs much more frequently in English texts (both fiction and non-fiction) than være does in Norwegian texts • be: 3,126 occurrences • være: 1,193 occurrences – Worthy of a more detailed investigation using ENPC ➔ Alternative explanations? RU NL DE PL NO EN Y Y Y Y Y X X X X X SH RU DE PL NO EN Y Y Y Y X X X X X SO RU DE PL NO EN X X X X Y Y Y Y X X X X RU DE PL NO SP EN Y Y Y Y Y X X X X SQ RU DE PL NO EN Y Y Y Y X X X X X VI RU DE PL NO EN Y Y Y Y X X X X X
  • 17. #9 Prepositions i and på • Preposition på ‘on’ – EN (overuse) vs. DE (underuse) – Investigate using error analysis – Check type and token frequencies of constructions in which corresponding L1 forms (on and auf) are congruent in one L1 but not the other, e.g.: – NO på søndag ≡EN on Sunday but≠DE am Sonntag whereas – NO på engelsk ≡DE auf Englisch but≠EN in English • Preposition i ‘in’ – RU (overuse) vs. PL (underuse) – Investigate using error analysis ➔ Any suggestions??? $i PL EN DE NO NL RU X X X Y Y Y Y X X X X PL EN DE SH NO RU Y Y X X X X X PL EN DE SO NO RU Y Y Y X X X X X PL EN SP DE NO RU Y Y X X X X X PL EN DE NO SQ RU X X X Y Y Y X X X X PL EN DE NO VI RU X X X Y Y Y Y X X X X $på DE RU NO NL PL EN Y Y Y Y Y X X X X X DE RU NO SH PL EN Y Y Y Y Y X X X X X SO DE RU NO PL EN X X X X Y Y Y Y X X X X DE RU NO SP PL EN Y Y Y Y Y X X X X X DE SQ RU NO PL EN Y Y Y Y Y X X X X X DE RU NO VI PL EN Y Y Y Y Y X X X X X Prepositions, especially spatial prepositions, are renowned for being “among the hardest expressions to acquire when learning a second language” (Coventry & Garrod 2004: 4) and they have already been the subject of some interesting work based on ASK (Szymanska 2010; Malcher 2011).
  • 18. #10 Prepositions til and fra • Preposition til ‘to’ – underused by all L1 groups, especially DE, SH and SQ – … • Preposition fra ‘from’ – used statistically more often by EN speakers than by PL or native speakers – … ➔ Any suggestions here??? $til DE RU PL NL EN NO Y Y Y Y Y X X X X X SH DE RU PL EN NO Y Y Y Y X X X X X DE RU SO PL EN NO Y Y Y Y Y X X X X X DE RU SP PL EN NO Y Y Y Y Y X X X X X SQ DE RU PL EN NO Y Y Y Y X X X X X DE RU PL VI EN NO Y Y Y Y Y X X X X X $fra NO PL DE NL RU EN X X X X Y Y Y Y X X X NO PL SH DE RU EN X X Y Y Y X X X X NO PL DE SO RU EN X X X X Y Y Y Y X X X NO PL DE SP RU EN X X X X Y Y Y Y X X X NO PL DE SQ RU EN X X X X Y Y Y Y X X X NO PL DE VI RU EN X X X X Y Y Y Y X X X X
  • 19. #11 Underuse and overuse of og • Striking contrast between PL speakers (underuse) and RU speakers (overuse) – Cannot be formal transfer, since PL i and RU и are phonologically identical • Different token frequencies in L1s? – Wiktionary frequency lists (WFREQ)* • RU и ranked as #1 • PL i ranked as #2 (after w ‘in’) – Raw frequencies not comparable in WFREQ • Zipfian distribution? • Requires further investigation ➔ Your suggestions??? PL DE NL EN NO RU Y Y X X X X PL DE SH EN NO RU X X Y Y X X X X PL DE EN SO NO RU X X Y Y Y X X X X PL SP DE EN NO RU X X Y Y X X X X PL SQ DE EN NO RU X X Y Y X X X X VI PL DE EN NO RU X X Y Y X X X X * http://en.wiktionary.org/wiki/Wiktionary:FREQ
  • 20. #12 Overuse and underuse of eller • DE and EN speakers overuse eller ‘or’ – Difference w.r.t. NL is highly statistical • This seems odd. (Are the Dutch more decisive than the English and Germans?) – Difference between DE and NO also statistical – Frequency related? • Mutual correspondence between NO eller and EN ‘or’ is 84% • RU speakers underuse eller – Strong formal resemblance with или (ili) • Possible cross-linguistic explanation – или has a more restricted distribution – Not used in negative contexts он не любит ни футбол, ни теннис ‘he doesn’t like football or tennis’ RU NO NL PL EN DE X X Y Y Y Y X X X X RU SH NO PL EN DE X X Y Y Y X X X X RU SO NO PL EN DE X X Y Y Y X X X X RU NO PL SP EN DE X X Y Y Y Y X X X RU SQ NO PL EN DE X X Y Y Y X X X X RU VI NO PL EN DE X X Y Y Y X X X X
  • 21. #13 More general questions • Misclassification can also be revealing – Texts written by EN learners are more often misclassified as SP, rather than NL or DE, despite EN being more closely related to the latter ➔ Why??? – Texts by SO and SQ learners are most often misclassified as RU, whilst texts by VI learners are most often misclassified as PL ➔ Again, why??? • All the 12 patterns discussed above pertain to Indo-European languages most closely related to NO (DE, EN, NL; PL, RU) – There are no really clear-cut predictors for the most distantly related L1s, i.e. SO, SQ and VI ➔ Why???
  • 22. Conclusion • Discriminant analysis reveals subtle patterns of L2 usage that would otherwise go undetected • Homogeneity tables based on Tukey’s HSD can help us understand those patterns • Contrastive analysis is required in order to confirm that the patterns are due to cross-linguistic influence • All 13 issues discussed in this chapter are suitable topics for further research using ASK • This study has merely scratched the surface…
  • 23. 13 research questions 1. Why do NL speakers overuse skal? 2. Why do DE speakers overuse en? 3. Why do EN speakers overuse et? 4. Why do PL and RU speakers differ so much in their use of den and det? 5. Why do EN speakers overuse er? 6. Why do RU speakers underuse er? 7. Why do many L1 groups underuse være? 8. Why do EN speakers, on the other hand, overuse være? 9. Why do EN speakers overuse på, while DE speakers underuse it? And why do RU speakers overuse i, while PL speakers underuse it? 10. Why do all L1 groups underuse til – and why do EN speakers overuse fra? 11. Why do PL and RU speakers differ so markedly in their use of og? 12. Why do EN and DE speakers overuse eller and why do RU speakers underuse it? 13. What lies behind the misclassification patterns, and why are there no good predictors for SO, SQ and VI?
  • 24. References
    Donaldson, Bruce. 1997. Dutch: A Comprehensive Grammar. London: Routledge.
    Granger, Sylviane. 1996. From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In Karin Aijmer, Bengt Altenberg and Mats Johansson (eds.) Languages in Contrast. Papers from a symposium on text-based cross-linguistic studies. Lund 4–5 March 1994. Lund: Lund University Press [Lund Studies in English 88], 37–51.
    Greftegreff, Liv Astrid. 1985. Enkel norsk grammatikk. Oslo: NKS-Forlaget.
    Husby, Olaf. 1999. En kort innføring i albansk. Trondheim: Tapir.
    Husby, Olaf. 2001. En kort innføring i somali. Trondheim: Tapir.
    Jarvis, Scott. 2010. Comparison-based and detection-based approaches to transfer research. EUROSLA Yearbook 10, 169–192.
    Jarvis, Scott & Scott A. Crossley (eds.) 2012. Approaching Language Transfer through Text Classification. Explorations in the detection-based approach. Bristol: Multilingual Matters.
    Koolhoven, H. 1961. Teach yourself Dutch. London: The English Universities Press.
    Lie, Svein. 2005. Kontrastiv grammatikk – med norsk i sentrum, 3rd Edition. Oslo: Novus.
    Malcher, Jenny. 2011. Jeg liker å treffe folk i café. Man må nyter de fine tingene på verden! Preposisjoner og morsmålstransfer – en korpusbasert studie med i og på i fokus. Master's thesis, Department of Linguistics and Scandinavian Studies, University of Oslo.
    Mønnesland, Svein. 1990. Serbokroatisk-norsk kontrastiv grammatikk. In Hvenekilde, Anne (ed.) Med to språk: Fem kontrastive språkstudier for lærere. Oslo: Cappelen.
    Saeed, John Ibrahim. 1993. Somali Reference Grammar, 2nd Edition. Kensington, MD: Dunwoody Press.
    Szymanska, Oliwia. 2010b. A conceptual approach towards the use of prepositional phrases in Norwegian – the case of i and på. Folia Scandinavica 11, 173–183.
    Wade, Terence. 2011. A Comprehensive Russian Grammar. Malden, MA: Wiley.
    Wiull, Hans Olaf. 2007. Bli bedre i norsk – se forskjellene mellom norsk og vietnamesisk. Oslo: VOX.
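
A homogeneity table of the kind shown on slide 5 can be derived mechanically from the Tukey HSD output. The sketch below is one possible reading of that step, not the author's actual procedure (the slides show R output but not how the tables were built). It assumes that two groups belong to the same homogeneous subset when their adjusted p-value is at or above 0.05, and that subsets are maximal runs of groups ordered by mean. The means and p-values are copied from slide 5; the function names are hypothetical.

```python
# Group means and adjusted p-values for the feature "den", copied from slide 5.
means = {"sh": 2.806, "en": 3.867, "pl": 5.001,
         "de": 5.240, "no": 5.272, "ru": 8.426}
p_adj = {
    ("en", "de"): 0.5781845, ("no", "de"): 1.0000000, ("pl", "de"): 0.9997514,
    ("ru", "de"): 0.0023298, ("sh", "de"): 0.0456381, ("no", "en"): 0.5528485,
    ("pl", "en"): 0.7583997, ("ru", "en"): 0.0000013, ("sh", "en"): 0.8063672,
    ("pl", "no"): 0.9995400, ("ru", "no"): 0.0026907, ("sh", "no"): 0.0409536,
    ("ru", "pl"): 0.0007589, ("sh", "pl"): 0.0969624, ("sh", "ru"): 0.0000000,
}


def differs(a, b, alpha=0.05):
    """True if groups a and b differ significantly (assumed threshold: alpha)."""
    p = p_adj[(a, b)] if (a, b) in p_adj else p_adj[(b, a)]
    return p < alpha


def homogeneous_subsets(means, alpha=0.05):
    """Return groups ordered by mean and the maximal runs with no significant pair."""
    order = sorted(means, key=means.get)
    subsets = []
    for start in range(len(order)):
        end = start + 1
        # Extend the run while the next group differs from none of its members.
        while end < len(order) and not any(
            differs(order[i], order[end], alpha) for i in range(start, end)
        ):
            end += 1
        run = order[start:end]
        # Skip runs already contained in a previously found (larger) run.
        if not any(set(run) <= set(found) for found in subsets):
            subsets.append(run)
    return order, subsets


order, subsets = homogeneous_subsets(means)
print(" ".join(g.upper() for g in order))
for subset in subsets:
    # One row per subset: mark member groups, leave the others blank.
    print(" ".join("X ".strip().ljust(2) if g in subset else "  " for g in order).rstrip())
```

Under these assumptions the three subsets are SH/EN/PL, EN/PL/DE/NO, and RU alone, which agrees with the count of marks in the slide's table (three, four and one). The sketch marks every subset with X, whereas the slides alternate X and Y between adjacent subsets.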

Editor's Notes

  1. J&C focused on (1). One of my goals was to take this one step further by also doing (2).
  2. I assume everyone is familiar with Jarvis’ framework for methodological rigour in transfer research, and the kinds of evidence he calls for (Jarvis 2000, Jarvis & Pavlenko 2008, Jarvis 2010). (The real Type 3 evidence can only be found in the head of the language user…)
  3. Average proficiency level for SO and VI clearly lower than the others. Correlates negatively with linguistic and cultural distance. Likely that this also explains results for SH and SQ (for which figures are not available). But what about NL (for which we also do not have CEFR ratings)? Would expect them to be up there with DE and EN, in which case lower proficiency level cannot explain the pattern of overuse amongst NL learners. Could it be thematic bias?
  4. The table compares the number of occurrences per topic for DE, EN, NL and SP speakers for the six topics that have the most occurrences of skal amongst NL speakers. The word count (wc), number of texts (tc) and the ratio between them is shown for each L1 group. Thus, NL speakers produced 39 occurrences of skal in the 8 texts entitled Framtida, a ratio of 4.9 occurrences per text. This figure can be compared with that for SP speakers writing on the same topic, i.e. 2.9 – a considerable difference. Comparisons can also be made between EN and SP speakers for the topic Bomiljø (‘Residential environment’), and between DE and EN speakers for the topic Nyheter (‘News’). In each case the ratio of occurrences of skal per text is consistently higher for NL speakers. In other words, even when choice of topic is held constant, the predilection of NL speakers for the word skal is still very clear. So what is the explanation?
  5. Caveat: Apparently there are other mechanisms for forming the future in Dutch, Koolhoven’s statement notwithstanding. One of them is to use the present tense with future meaning. A proper contrastive analysis is required to determine the relative frequencies of the various forms.
  6. Wiktionary rankings are 102 and 5 respectively, with ‘a’ used more than 11 times more often than ‘an’.
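
The per-text ratios described in note 4 are simple divisions of the word count (wc) by the number of texts (tc). As a minimal sketch, the snippet below recomputes them for the three topics that the slide compares across L1 groups, using the counts from the table on slide 8. The dictionary layout and function name are illustrative assumptions, not part of the original analysis.

```python
# (topic, L1) -> (occurrences of "skal", number of texts), from the slide 8 table.
counts = {
    ("Framtida", "NL"): (39, 8),
    ("Framtida", "SP"): (29, 10),
    ("Bomiljø", "EN"): (20, 38),
    ("Bomiljø", "NL"): (21, 16),
    ("Bomiljø", "SP"): (14, 23),
    ("Nyheter", "DE"): (7, 10),
    ("Nyheter", "EN"): (4, 9),
    ("Nyheter", "NL"): (8, 7),
}


def rates_by_topic(counts):
    """Occurrences per text for each L1 group, grouped by topic, to one decimal."""
    rates = {}
    for (topic, l1), (wc, tc) in counts.items():
        rates.setdefault(topic, {})[l1] = round(wc / tc, 1)
    return rates


rates = rates_by_topic(counts)
for topic, by_l1 in rates.items():
    # List the L1 groups from highest to lowest rate within each topic.
    ranked = sorted(by_l1.items(), key=lambda item: item[1], reverse=True)
    print(topic, ", ".join(f"{l1} {rate}" for l1, rate in ranked))
```

With the topic held constant, NL has the highest rate in each of the three topics, which is the comparison the slide relies on when it rules out thematic bias.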