Language & Statistics II
Instructor: Prof. Noah Smith
Meeting time: Tuesday and Thursday 3-4:20 pm
Office hours: Tuesday and Thursday 4:30-5:30 pm, NSH 2602F (right after lecture)
Location: Doherty 1217
Prerequisite: Language and Statistics (11-761) or permission of instructor. Recommended: Algorithms for Natural Language Processing (11-711); Machine Learning (15-681, 15-781, or 11-746)
This course will cover modern empirical methods in natural language processing. It is designed for language technologies students who want to understand statistical methodology in the language domain, and for machine learning students who want to know about current problems and solutions in text processing.
Students will, upon completion, understand how statistical modeling and learning can be applied to text, be able to develop and apply new statistical models for problems in their own research, and be able to critically read papers from the major related conferences (EMNLP and *ACL). A recurring theme will be the tradeoffs between computational cost, mathematical elegance, and applicability to real problems. The course will be organized around methods, with concrete tasks introduced throughout.
Each student will give a ~20-minute oral presentation on his/her literature review. A period of discussion will follow, in which we will aim to find connections between student topics. The driving questions will be: What can be borrowed from one area and applied to another? And what challenges are not being met by current methods?
A final written exam will be given to test basic competence with the technical material covered in the lectures.
Schedule (dates tentative): each entry gives the dates, the topic, and suggested readings.
8/29
Philosophy: the empirical way of thinking about language.
8/31-9/5
Stochastic models for sequences: Markov models, hidden Markov models, and related algorithms
Manning & Schütze, 1999 (ch. 9); Smith, 2004 (works through an HMM example); Eisner, 2002 (MS Excel spreadsheet illustrating forward-backward)
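If the HMM readings are new to you, the forward algorithm they build up to fits in a few lines. This toy sketch (all probabilities invented, not taken from the readings) computes the total probability of an observation sequence under a two-state HMM:

```python
# Forward algorithm for a toy 2-state HMM; all numbers are invented.

def forward(obs, pi, trans, emit):
    """Return P(obs) by summing over all hidden state paths."""
    n = len(pi)
    # alpha[s] = P(obs[0..t], state at time t = s)
    alpha = [pi[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

pi = [0.6, 0.4]                    # initial state probabilities
trans = [[0.7, 0.3], [0.4, 0.6]]   # trans[r][s] = P(next=s | prev=r)
emit = [{'a': 0.9, 'b': 0.1},      # emit[s][o] = P(symbol o | state s)
        {'a': 0.2, 'b': 0.8}]

prob = forward(['a', 'b', 'a'], pi, trans, emit)  # 0.10893
```

The same chart, filled right to left, gives the backward probabilities used in forward-backward (cf. the Eisner spreadsheet above).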
9/7-9/14
Log-linear/exponential/maximum entropy models, conditional estimation, CRFs, regularization, and convex optimization
There's a lot of tutorial material on these kinds of models. Here are some starting points: Adam Berger's page, Adwait Ratnaparkhi's tutorial, a handout I made for another class.
Research papers: Lafferty, McCallum, and Pereira, 2001; Chen and Rosenfeld, 1999; Khudanpur and Wu, 2000; Rosenfeld, Chen, and Zhu, 2000; Della Pietra, Della Pietra, and Lafferty, 1997
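As a concrete illustration of the conditional models these readings cover, here is a minimal log-linear (maximum entropy) classifier: the probability of a label is proportional to the exponentiated dot product of a weight vector with a feature vector. The feature templates and weights below are invented for the example, not drawn from any of the papers:

```python
import math

# Minimal conditional log-linear model; features and weights are invented.

def p_label_given_x(x, y, labels, weights, features):
    """P(y | x) = exp(w . f(x, y)) / sum_y' exp(w . f(x, y'))."""
    def score(label):
        return sum(weights.get(f, 0.0) for f in features(x, label))
    log_z = math.log(sum(math.exp(score(yp)) for yp in labels))
    return math.exp(score(y) - log_z)

def features(x, y):
    # One indicator feature per (word, label) pair, plus a label bias.
    return [f"word={x}^label={y}", f"bias^label={y}"]

labels = ["NOUN", "VERB"]
weights = {"word=run^label=VERB": 1.5, "word=run^label=NOUN": 0.5}

p = p_label_given_x("run", "VERB", labels, weights, features)  # ~0.731
```

Conditional estimation fits the weights to maximize the log of exactly this quantity over training data; CRFs apply the same form to whole label sequences.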
9/19-9/21
Interspeech (no lecture)
9/26-9/28
Weighted finite-state technology
Eisner, 2002; Stolcke and Omohundro, 1993; Mehryar Mohri's list of references will be helpful if you want to know about algorithms for FSTs. Karttunen, 2001 will tell you all about two-level morphology using FSTs.
Tools: Xerox's FS group, AT&T FSM libraries, RWTH FSA toolkit
10/3-10/12
Stochastic grammars and statistical parsing
Johnson, 1998
Papers about some important parsers: Charniak, 1997; Charniak, 2000; Collins, 2003; Klein and Manning, 2003; McDonald, Pereira, Ribarov, and Hajic, 2005
10/17-10/19
Weighted dynamic programming
Goodman, 1999; Eisner, Goldlust, and Smith, 2005; if you're in love, Shieber, Schabes, and Pereira, 1995
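The point of Goodman's semiring view can be shown with a toy sketch (all weights invented): one generic chart recurrence computes either a sum over paths or a best path, depending only on which "plus" operation is plugged in:

```python
# Semiring-generic sequence DP, in the spirit of Goodman (1999).
# All weights are invented for illustration.

def run_dp(init, steps, plus, times):
    """init: {state: weight}; steps: list of {prev: {next: weight}} tables."""
    chart = dict(init)
    for step in steps:
        new = {}
        for r, w in chart.items():
            for s, tw in step.get(r, {}).items():
                val = times(w, tw)
                new[s] = plus(new[s], val) if s in new else val
        chart = new
    return chart

init = {'A': 0.5, 'B': 0.5}
steps = [{'A': {'A': 0.9, 'B': 0.1}, 'B': {'A': 0.3, 'B': 0.7}}]

# Sum-product semiring: total weight of all paths into each state.
total = run_dp(init, steps, lambda a, b: a + b, lambda a, b: a * b)
# Max-product (Viterbi) semiring: weight of the best path into each state.
best = run_dp(init, steps, max, lambda a, b: a * b)
```

Swapping in log-space or boolean semirings recovers other familiar algorithms from the same recurrence; that interchangeability is what the Eisner, Goldlust, and Smith paper exploits.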
10/24
Discriminative training: perceptron, boosting, maximum margin estimation
Collins (2002); Taskar and Klein's tutorial at ACL 2005 on maximum margin methods
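Collins (2002) runs the perceptron over whole structures; the update is easier to see in a toy multiclass version (features and data invented here): predict with the current weights, and on a mistake add the gold label's features and subtract the guessed label's features:

```python
# Toy multiclass perceptron in the spirit of Collins (2002).
# Feature template and training data are invented for illustration.

def phi(x, y):
    return {f"word={x}^tag={y}": 1.0}

def predict(x, labels, w):
    # Ties go to the earliest label in the list (Python's max keeps the first).
    return max(labels, key=lambda y: sum(w.get(f, 0.0) * v
                                         for f, v in phi(x, y).items()))

def perceptron_train(data, labels, epochs=5):
    w = {}
    for _ in range(epochs):
        for x, gold in data:
            guess = predict(x, labels, w)
            if guess != gold:
                for f, v in phi(x, gold).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in phi(x, guess).items():
                    w[f] = w.get(f, 0.0) - v
    return w

data = [("dog", "NOUN"), ("runs", "VERB"), ("cat", "NOUN")]
w = perceptron_train(data, ["NOUN", "VERB"])
```

In the structured case, `predict` becomes a decoder (e.g., Viterbi) over exponentially many outputs, but the additive update is unchanged.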
10/26
Information extraction (guest lecture: Vitor Carvalho)
Cohen and McCallum's tutorial at KDD 2003; Siefkes and Siniakov, 2005
10/31-11/2
Discriminative training (continued) and reranking; transformation-based learning
Collins, 2000; Collins and Duffy, 2002; Brill, 1992
11/7
Unsupervised learning: clustering and EM, clustering words
Brown et al., 1992; Pereira, Tishby, and Lee, 1993; Schütze, 1993
11/9-11/14
The EM algorithm for structured models with hidden and partially hidden data; contrastive estimation
Merialdo, 1994; Pereira and Schabes, 1992 (note corrected link); Klein and Manning, 2002; Smith and Eisner, 2005
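The flavor of EM with hidden data comes through in the classic two-coin mixture (all numbers invented): each row of flips was generated by one of two coins of unknown bias, and the coin's identity is the hidden variable. The E-step computes responsibilities; the M-step re-estimates biases from expected counts:

```python
# Toy EM for a mixture of two biased coins; data and starting points invented.

def em_two_coins(data, theta_a, theta_b, iters=20):
    """data: list of (heads, tails) counts, one pair per sequence of flips."""
    for _ in range(iters):
        heads_a = flips_a = heads_b = flips_b = 0.0
        for heads, tails in data:
            # E-step: posterior probability that coin A produced this row.
            like_a = theta_a ** heads * (1 - theta_a) ** tails
            like_b = theta_b ** heads * (1 - theta_b) ** tails
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            # Accumulate expected counts for the M-step.
            heads_a += resp_a * heads
            flips_a += resp_a * (heads + tails)
            heads_b += resp_b * heads
            flips_b += resp_b * (heads + tails)
        # M-step: maximum-likelihood re-estimation from expected counts.
        theta_a = heads_a / flips_a
        theta_b = heads_b / flips_b
    return theta_a, theta_b

data = [(9, 1), (8, 2), (2, 8), (1, 9), (7, 3)]
theta_a, theta_b = em_two_coins(data, 0.6, 0.4)
```

For structured models (Merialdo's HMM tagger, Pereira and Schabes's grammars), the E-step's expected counts come from forward-backward or inside-outside rather than a simple posterior over two coins.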
11/16
Semisupervised learning: Yarowsky algorithms, co-training.
Yarowsky, 1995; Blum and Mitchell, 1998; Nigam and Ghani, 2000; Abney, 2004
11/21
Experimentation and hypothesis testing.
11/28
Final presentations
3:00 Mengqiu Wang: question answering
3:30 Yitao Sun: syntactic language modeling
11/30
Final presentations
3:00 David Huggins-Daines: optimality theory
12/5
Final presentations
3:00 Kevin Gimpel: topic modeling
3:30 Greg Hanneman: statistical syntactic machine translation (1)
12/7
Final presentations
3:00 Amr Ahmed: statistical syntactic machine translation (2)
3:30 Jaime Arguello: unsupervised morphology induction
?
Course review
12/11
Final exam