The use of film subtitles to estimate word frequencies | Applied Psycholinguistics | Cambridge Core
Abstract
We examine the use of film subtitles as an approximation of word frequencies in human interactions. Because subtitle files are widely available on the Internet, they may offer a fast and easy way to obtain word frequency measures for language registers other than written text. We compiled a corpus of 52 million French words drawn from a variety of films. Frequency measures based on this corpus compared well with other spoken and written frequency measures, and they explained variance in lexical decision times beyond what is accounted for by the available French written frequency measures.
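As a rough illustration of how word frequencies might be derived from subtitle files, the sketch below counts tokens in SubRip-style (`.srt`) text and converts the counts to occurrences per million words. The file format, the regular expressions, and the function names `subtitle_tokens` and `frequency_per_million` are assumptions made for this example; the abstract does not describe the actual processing pipeline, and the tokenization here is deliberately naive (it splits on apostrophes and hyphens, so "l'homme" becomes "l" and "homme").

```python
import re
from collections import Counter

# Assumed SubRip conventions: cue timing lines look like
# "00:00:01,000 --> 00:00:03,000" and cue numbers are bare integers.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")
TAG = re.compile(r"<[^>]+>")      # inline markup such as <i>...</i>
WORD = re.compile(r"[^\W\d_]+")   # runs of letters, accented ones included


def subtitle_tokens(srt_text):
    """Yield lowercase word tokens from SubRip-style subtitle text."""
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, cue numbers, and timing lines.
        # Note: a dialogue line consisting only of digits is also skipped.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        line = TAG.sub(" ", line)
        for match in WORD.finditer(line.lower()):
            yield match.group()


def frequency_per_million(counts):
    """Convert raw counts to occurrences per million tokens."""
    total = sum(counts.values())
    return {word: n * 1_000_000 / total for word, n in counts.items()}


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\n<i>Bonjour</i> le monde\n\n"
        "2\n00:00:04,000 --> 00:00:06,000\nBonjour le chat\n"
    )
    counts = Counter(subtitle_tokens(sample))
    for word, freq in sorted(frequency_per_million(counts).items()):
        print(word, round(freq, 1))
```

Per-million values make counts comparable across corpora of different sizes, which is what a comparison against existing written or spoken frequency norms would require. A real pipeline would also need decisions about character encoding, duplicate subtitle files, and lemmatization, none of which are covered here.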