Language & Statistics II

Instructor: Prof. Noah Smith
Meeting time: Tuesday and Thursday 3-4:20 pm
Office hours: Tuesday and Thursday 4:30-5:30 pm, NSH 2602F (right after lecture)
Location: Doherty 1217

Prerequisite: Language and Statistics (11-761) or permission of the instructor.
Recommended: Algorithms for Natural Language Processing (11-711); Machine Learning (15-681, 15-781, or 11-746).

This course will cover modern empirical methods in natural language processing. It is designed for language technologies students who want to understand statistical methodology in the language domain, and for machine learning students who want to know about current problems and solutions in text processing.

Upon completion, students will understand how statistical modeling and learning can be applied to text, be able to develop and apply new statistical models to problems in their own research, and be able to critically read papers from the major related conferences (EMNLP and *ACL). A recurring theme will be the tradeoffs among computational cost, mathematical elegance, and applicability to real problems. The course will be organized around methods, with concrete tasks introduced throughout.

Each student will give a ~20-minute oral presentation on his/her literature review. A period of discussion will follow, in which we will aim to find connections between student topics. The driving questions will be: What can be borrowed from one area and applied to another? And what challenges are not being met by current methods?

A final written exam will be given to test basic competence with the technical material covered in the lectures.

Schedule (dates tentative)

8/29
Topic: Philosophy: the empirical way of thinking about language
Readings: Pereira, 2000; Abney, 1996
Slides: pdf

8/31-9/5
Topic: Stochastic models for sequences: Markov models, hidden Markov models, and related algorithms
Readings: Manning & Schütze, 1999 (ch. 9); Smith, 2004 (works through an HMM example); Eisner, 2002 (MS Excel spreadsheet illustrating forward-backward)
Slides: pdf1, pdf2
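
To make the forward-backward material concrete, here is a minimal sketch of the forward algorithm for a discrete HMM; the two-state model and all parameters below are toy values for illustration, not from the readings:

    # Forward algorithm for a discrete HMM: computes the probability of an
    # observation sequence by summing over all hidden state paths, in
    # O(T * K^2) time. All parameters are illustrative toy values.
    pi    = [0.6, 0.4]                    # initial state distribution
    trans = [[0.7, 0.3], [0.4, 0.6]]      # trans[i][j] = p(next state j | state i)
    emit  = [[0.9, 0.1], [0.2, 0.8]]      # emit[i][o] = p(symbol o | state i)

    def forward(obs):
        K = len(pi)
        # alpha[j] = p(observations so far, current state = j), updated over time
        alpha = [pi[j] * emit[j][obs[0]] for j in range(K)]
        for o in obs[1:]:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(K)) * emit[j][o]
                     for j in range(K)]
        return sum(alpha)

    print(forward([0, 1, 1, 0]))          # likelihood of a short toy sequence

The backward pass is the mirror image of this recurrence; together they give the posteriors used later in the EM lectures.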

9/7-9/14
Topic: Log-linear/exponential/maximum entropy models, conditional estimation, CRFs, regularization, and convex optimization
Readings: There's a lot of tutorial material on these kinds of models; some starting points are Adam Berger's page, Adwait Ratnaparkhi's tutorial, and a handout I made for another class. Research papers: Lafferty, McCallum, and Pereira, 2001; Chen and Rosenfeld, 1999; Khudanpur and Wu, 2000; Rosenfeld, Chen, and Zhu, 2000; Della Pietra, Della Pietra, and Lafferty, 1997
Slides: pdf1, pdf2, pdf3
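
As a concrete companion to this unit, here is a minimal sketch of conditional estimation for a binary log-linear (maximum entropy) model with L2 regularization, trained by plain gradient ascent; the feature function, data, and hyperparameters are all toy assumptions:

    import math

    # Conditional log-linear model: p(y | x) proportional to exp(w . f(x, y)),
    # trained by gradient ascent on the L2-regularized conditional
    # log-likelihood. Features, data, and step sizes are toys.
    LABELS = [0, 1]

    def f(x, y):
        # Toy feature function: input features, signed by the label.
        return [xi if y == 1 else -xi for xi in x]

    def p_y_given_x(w, x):
        scores = [math.exp(sum(wi * fi for wi, fi in zip(w, f(x, y))))
                  for y in LABELS]
        z = sum(scores)                       # partition function
        return [s / z for s in scores]

    def train(data, dim, rate=0.1, reg=0.1, iters=200):
        w = [0.0] * dim
        for _ in range(iters):
            grad = [-reg * wi for wi in w]    # gradient of the L2 penalty
            for x, y in data:
                probs = p_y_given_x(w, x)
                for y2 in LABELS:             # observed minus expected features
                    coeff = (1.0 if y2 == y else 0.0) - probs[y2]
                    for k, fk in enumerate(f(x, y2)):
                        grad[k] += coeff * fk
            w = [wi + rate * g for wi, g in zip(w, grad)]
        return w

    data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
    w = train(data, dim=2)
    print(p_y_given_x(w, [1.0, 0.0]))         # should put most mass on label 1

The objective is concave, which is why this naive ascent suffices here; the lectures cover the serious convex optimization machinery.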

9/19-9/21
Interspeech (no lecture)

9/26-9/28
Topic: Weighted finite-state technology
Readings: Eisner, 2002; Stolcke and Omohundro, 1993. Mehryar Mohri's list of references will be helpful if you want to know about algorithms for FSTs; Karttunen, 2001 will tell you all about two-level morphology using FSTs.
Tools: Xerox's FS group, AT&T FSM libraries, RWTH FSA toolkit
Slides: pdf1, pdf2
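
For a feel of what the weighted machinery computes, here is a minimal sketch of best-path transduction through a toy weighted FST over the tropical (min, +) semiring; the machine itself is invented for illustration, and real work should use the toolkits above:

    import heapq

    # Toy weighted FST: each arc is (input, output, weight, next state).
    # transduce() finds the cheapest output string for an input by Dijkstra
    # search over (position, state) pairs. All weights are illustrative.
    arcs = {
        0: [("a", "a", 1.0, 0), ("a", "b", 0.5, 1), ("b", "b", 0.0, 0)],
        1: [("a", "b", 0.2, 1), ("b", "b", 0.4, 0)],
    }
    START, FINALS = 0, {0: 0.0, 1: 0.3}    # final states carry exit weights

    def transduce(inp):
        heap = [(0.0, 0, START, "")]       # (cost, position, state, output)
        done = set()
        answer = None
        while heap:
            cost, i, q, out = heapq.heappop(heap)
            if (i, q) in done:
                continue
            done.add((i, q))
            if i == len(inp):
                if q in FINALS:
                    total = cost + FINALS[q]
                    if answer is None or total < answer[0]:
                        answer = (total, out)
                continue
            for sym_in, sym_out, w, r in arcs.get(q, []):
                if sym_in == inp[i]:
                    heapq.heappush(heap, (cost + w, i + 1, r, out + sym_out))
        return answer

    print(transduce("aab"))                # (cheapest total weight, its output)

In practice such machines are built by composing simpler ones; composition, determinization, and minimization are what the toolkits are for.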

10/3-10/12
Topic: Stochastic grammars and statistical parsing
Readings: Johnson, 1998. Papers about some important parsers: Charniak, 1997; Charniak, 2000; Collins, 2003; Klein and Manning, 2003; McDonald, Pereira, Ribarov, and Hajic, 2005
Slides: pdf1, pdf2, pdf3, pdf4
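
As a small worked instance of stochastic grammars, here is a minimal sketch of probabilistic CKY for a PCFG in Chomsky normal form; the grammar and sentence are invented toys, not from the readings:

    # Probabilistic CKY for a toy PCFG in Chomsky normal form. chart[(i, j)]
    # maps each nonterminal to the best probability of deriving words i..j-1.
    lex = {"the": {"DT": 1.0}, "dog": {"NN": 0.6},
           "barks": {"VB": 1.0, "NN": 0.4}}
    rules = {("S",  ("NP", "VB")): 0.7,      # p(rule | left-hand side)
             ("S",  ("NP", "VP")): 0.3,
             ("NP", ("DT", "NN")): 1.0,
             ("VP", ("VB", "NP")): 1.0}

    def cky(words):
        n = len(words)
        chart = {(i, i + 1): dict(lex.get(w, {})) for i, w in enumerate(words)}
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                cell = {}
                for k in range(i + 1, j):    # split point
                    for (lhs, (b, c)), p in rules.items():
                        if b in chart[(i, k)] and c in chart[(k, j)]:
                            score = p * chart[(i, k)][b] * chart[(k, j)][c]
                            if score > cell.get(lhs, 0.0):
                                cell[lhs] = score
                chart[(i, j)] = cell
        return chart[(0, n)].get("S", 0.0)

    print(cky("the dog barks".split()))      # best parse probability, or 0.0

The parsers in the readings differ mainly in how they define and estimate the rule probabilities (lexicalization, markovization, discriminative scores), not in this chart skeleton.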

10/17-10/19
Topic: Weighted dynamic programming
Readings: Goodman, 1999; Eisner, Goldlust, and Smith, 2005; if you're in love, Shieber, Schabes, and Pereira, 1995
Slides: pdf1, pdf2
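
In the semiring spirit of these readings, here is a minimal sketch showing one dynamic program run under two different semirings; the trellis weights are toy values:

    import operator
    from functools import reduce

    # One dynamic program, two semirings: the identical recurrence computes
    # the total weight of all paths under (+, *) and the best single path
    # under (max, *). Start weights and transition weights are toys.
    def run_dp(start, trans, steps, plus, times, zero):
        vals = list(start)       # vals[j]: aggregated weight of paths ending in j
        for _ in range(steps):
            vals = [reduce(plus,
                           (times(vals[i], trans[i][j]) for i in range(len(vals))),
                           zero)
                    for j in range(len(vals))]
        return reduce(plus, vals)  # combine over final states (here, all states)

    start = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.3, 0.7]]
    print(run_dp(start, trans, 3, operator.add, operator.mul, 0.0))  # sum of all paths
    print(run_dp(start, trans, 3, max, operator.mul, 0.0))           # best single path

Swapping the operator pair is the whole point: the same code gives likelihoods, Viterbi scores, or (with other semirings) n-best lists and expectations.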

10/24
Topic: Discriminative training: perceptron, boosting, maximum margin estimation
Readings: Collins, 2002; Taskar and Klein's tutorial at ACL 2005 on maximum margin methods
Slides: pdf
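
The heart of perceptron training is a predict-then-correct update. Here is a minimal multiclass sketch; in the structured case of Collins, 2002 the argmax is a Viterbi search over whole tag sequences, which this toy version replaces with an argmax over a small label set, and the feature map and data are invented:

    # Perceptron training: decode with the current weights, then add the
    # features of the gold output and subtract those of the guess.
    LABELS = [0, 1, 2]

    def f(x, y):
        # Toy joint feature map: place the input features in label y's block.
        vec = [0.0] * (len(x) * len(LABELS))
        for k, xk in enumerate(x):
            vec[y * len(x) + k] = xk
        return vec

    def decode(w, x):
        return max(LABELS,
                   key=lambda y: sum(wi * fi for wi, fi in zip(w, f(x, y))))

    def train(data, dim, epochs=10):
        w = [0.0] * dim
        for _ in range(epochs):
            for x, y in data:
                guess = decode(w, x)
                if guess != y:               # mistake-driven update
                    w = [wi + gi - bi
                         for wi, gi, bi in zip(w, f(x, y), f(x, guess))]
        return w

    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
    w = train(data, dim=6)
    print([decode(w, x) for x, _ in data])   # recovers the training labels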

10/26
Topic: Information extraction (guest lecture: Vitor Carvalho)
Readings: Cohen and McCallum's tutorial at KDD 2003; Siefkes and Siniakov, 2005

10/31-11/2
Topic: Discriminative training (continued) and reranking; transformation-based learning
Readings: Collins, 2000; Collins and Duffy, 2002; Brill, 1992
Slides: pdf1, pdf2

11/7
Topic: Unsupervised learning: clustering and EM; clustering words
Readings: Brown et al., 1992; Pereira, Tishby, and Lee, 1993; Schütze, 1993
Slides: pdf
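
As a small worked instance of EM, here is a sketch that fits a mixture of two biased coins to toy count data, alternating posterior computation (E-step) with reestimation (M-step); all numbers are illustrative:

    # EM for a mixture of two biased coins: each "sequence" of flips was
    # generated by one of two coins whose identity is hidden. E-step:
    # posterior over which coin; M-step: reestimate each coin's bias.
    seqs = [(8, 2), (3, 7), (9, 1), (2, 8), (7, 3)]  # (heads, tails) per sequence

    def em(seqs, iters=20):
        p = [0.6, 0.5]          # heads probability per coin (arbitrary init)
        mix = [0.5, 0.5]        # mixing weights
        for _ in range(iters):
            counts = [[0.0, 0.0], [0.0, 0.0]]  # expected (heads, tails) per coin
            totals = [0.0, 0.0]                # expected sequences per coin
            for h, t in seqs:
                like = [mix[k] * p[k]**h * (1 - p[k])**t for k in range(2)]
                z = sum(like)
                for k in range(2):
                    post = like[k] / z         # E-step: p(coin k | sequence)
                    counts[k][0] += post * h
                    counts[k][1] += post * t
                    totals[k] += post
            # M-step: maximize the expected complete-data log-likelihood
            p = [counts[k][0] / (counts[k][0] + counts[k][1]) for k in range(2)]
            mix = [totals[k] / len(seqs) for k in range(2)]
        return p, mix

    print(em(seqs))             # coin biases separate toward ~0.8 and ~0.25

The word-clustering papers apply exactly this alternation, with cluster membership as the hidden variable and much larger count tables.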

11/9-11/14
Topic: The EM algorithm for structured models, with hidden and partially hidden data; contrastive estimation
Readings: Merialdo, 1994; Pereira and Schabes, 1992 (note corrected link); Klein and Manning, 2002; Smith and Eisner, 2005
Slides: pdf1, pdf2

11/16
Topic: Semisupervised learning: Yarowsky algorithms, co-training
Readings: Yarowsky, 1995; Blum and Mitchell, 1998; Nigam and Ghani, 2000; Abney, 2004
Slides: pdf
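
The bootstrapping loop behind these methods is short to state in code. Here is a Yarowsky-flavored self-training sketch in which everything (the data, the voting classifier, the confidence threshold) is an illustrative toy; the real algorithm adds smoothing, rule ranking, and one-sense-per-discourse constraints:

    from collections import Counter, defaultdict

    # Self-training: seed rules label a few examples, a classifier is
    # retrained on everything labeled so far, and only confident new labels
    # are kept each round. Contexts and senses are invented toys.
    pool = [["factory", "workers"], ["leaf", "flower"], ["assembly", "workers"],
            ["garden", "flower"], ["assembly", "line"], ["garden", "leaf"]]
    seeds = {"factory": "industrial", "flower": "living"}  # word -> sense

    def train(labeled):
        counts = defaultdict(Counter)
        for ctx, sense in labeled:
            for w in ctx:
                counts[w][sense] += 1
        return {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def classify(rules, ctx):
        votes = Counter(rules[w] for w in ctx if w in rules)
        if not votes:
            return None, 0.0
        sense, n = votes.most_common(1)[0]
        return sense, n / len(ctx)      # crude confidence: fraction agreeing

    labeled, threshold = [], 0.5
    for _ in range(4):                  # a few bootstrapping rounds
        rules = {**train(labeled), **seeds}    # seed rules take priority
        labeled = [(ctx, sense) for ctx in pool
                   for sense, conf in [classify(rules, ctx)]
                   if sense is not None and conf >= threshold]
    print(labeled)                      # the whole pool ends up labeled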

11/21
Topic: Experimentation and hypothesis testing
Slides: pdf
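
For this unit, here is a minimal sketch of a paired bootstrap test for comparing two systems' per-item scores; the scores are fabricated toy numbers, and this resampling scheme is one common choice among several:

    import random

    # Paired bootstrap test: resample the test set with replacement and ask
    # how often system A's mean score fails to beat system B's. All scores
    # below are fabricated toy values.
    random.seed(0)
    a = [0.8, 0.7, 0.9, 0.6, 0.75, 0.85, 0.7, 0.9]   # per-item scores, system A
    b = [0.7, 0.7, 0.8, 0.6, 0.70, 0.80, 0.6, 0.85]  # per-item scores, system B

    def paired_bootstrap(a, b, samples=10000):
        n, losses = len(a), 0
        for _ in range(samples):
            idx = [random.randrange(n) for _ in range(n)]
            diff = sum(a[i] - b[i] for i in idx) / n
            if diff <= 0:
                losses += 1
        return losses / samples   # approximate p-value for "A is no better than B"

    print(paired_bootstrap(a, b))

Pairing matters: resampling the same items for both systems controls for item difficulty, which an unpaired test would wash out.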

11/28
Final presentations:
3:00 Mengqiu Wang: question answering
3:30 Yitao Sun: syntactic language modeling

11/30
Final presentations:
3:00 David Huggins-Daines: optimality theory

12/5
Final presentations:
3:00 Kevin Gimpel: topic modeling
3:30 Greg Hanneman: statistical syntactic machine translation (1)

12/7
Final presentations:
3:00 Amr Ahmed: statistical syntactic machine translation (2)
3:30 Jaime Arguello: unsupervised morphology induction

Date TBD
Course review

12/11
Final exam