CS/INFO 630, Spring 2006: Human Language Technology
(or, IR, NLP, and special guests; or, Representing and accessing digital information)
Note: After the Spring 2006 semester, this course was renumbered; the current version is CS674/INFO 630.
Prof. Lillian Lee (follow link for contact information and office hours)
TR 2:55-4:10, Bard 140
Official description (from the course catalog):
Information retrieval has evolved from the problem of locating books in a library to a multitude of tasks ubiquitous in business, science, and personal life. Modern information systems automatically compose newspapers, extract facts from the web, and analyze usage patterns. This course covers the necessary techniques for representing, organizing, and accessing digital information that is in textual or semistructured form. Topics combine information retrieval, natural language processing, and machine learning, with links to work in databases and data mining.
Administrative info:
- Course description and policies (distributed 1/24/06). Includes exam dates and tentative syllabus.
- Examples for lecture-guide preparation and an example (for a different class) of the scribe-notes portion (distributed 1/26/06)
- Midterm exam, mid(-)term notes, and Lecture 15 Krafft/Rabkin guide (includes brief solution sketches for some midterm questions)
- Information regarding the final exam

Resources: (reference texts, other lecture notes and slides, etc.)

Lectures:
- 1/24/06: "Sense and Sensibility: Automatically Analyzing Subject and Sentiment in Human-Authored Texts" (preface)
- Handouts: course description and policies [pdf]
- 1/26/06: Basics of information retrieval; the vector-space model
- Lecture guides: cover sheet, Haque/Shaparenko, Rabkin/Krafft
- Handouts: Examples for lecture-guide preparation [pdf] and an example (for a different class) of the scribe-notes portion [ps]
- Lecture references:
* Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin 24(4), pp. 35–43, 2001. [ps]
* TREC conference proceedings. [website]. See especially the overviews.
- Additional references:
* See the course resources page for other overviews.
* David Dubin. The most influential paper Gerard Salton never wrote. Library Trends, Spring 2004. [html] An interesting sidelight on the origins of the vector space model, the assumptions underlying the VSM, and phantom citations.
* Karen Spärck Jones. IDF term weighting and IR research lessons. Journal of Documentation 60(5):521–523 (2004). [pdf] Reviews the history of the introduction of the three "consensus" term-weighting quantities. Notes that Spärck Jones' data consisted of presence/absence wordlists, thus explaining the non-incorporation of term-frequency information.
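The vector-space model covered in this lecture reduces to a few lines of code. The sketch below is illustrative only: the raw-tf × log-idf weighting and the sparse-dict vector representation are assumptions of this sketch, not the exact scheme from class.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf-idf vectors (sparse dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Weight each term by raw tf times idf = log(N / df);
        # terms occurring in every document get weight zero.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents sharing distinctive (low-df) terms with the query then score higher than those sharing only common terms.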
- 1/31/06: An example of empirical IR research: length normalization
- Lecture guides: cover sheet for Danis/Rogan (statement of problem corrected 2/21/06), Danis/Rogan scribe notes, Danis/Rogan problems, Leshed/Shami (cover sheet for Leshed/Shami superseded by updated cover sheet for Danis/Rogan)
- Lecture references:
* Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document length normalization. SIGIR 1996. [ps, pdf]
* Justin Zobel. How reliable are large-scale information retrieval experiments? SIGIR 1998. [pdf, pdf2]
- Additional references:
* Mark Sanderson and Hideo Joho. Forming test collections with no system pooling. SIGIR 2004. [pdf]
* Ellen Voorhees and Donna Harman. Overview of the Eighth Text REtrieval Conference (TREC-8) [good ps, poor pdf]. Section 3 discusses the effects of pooling.
- 2/2/06: Completion of pivoted document-length normalization
- Lecture guides: cover sheet, Dejgosha/Hu scribe notes, Hamatake/Park
- Additional references:
* Abdur Chowdury, M. Catherine McCabe, David Grossman, Ophir Frieder. Document normalization revisited. SIGIR poster (2002). [pdf,pdf2]
* Jaap Kamps, Maarten de Rijke, and Boerkur Sigurbjoernsson. Length normalization in XML retrieval. SIGIR (2004) [abstract,pdf]
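To recap the idea from Singhal et al., pivoted normalization interpolates between a constant (the pivot, often set to the average document length) and the raw normalization factor. The slope value below is a typical tuned setting, not one prescribed by the paper.

```python
def pivoted_norm(doc_len, pivot, slope=0.2):
    """Pivoted length-normalization factor: (1 - slope) * pivot + slope * doc_len.
    Documents shorter than the pivot get a larger divisor than their raw length
    (demoted relative to plain length normalization); longer documents get a
    smaller one (promoted), correcting cosine normalization's bias against
    long documents."""
    return (1.0 - slope) * pivot + slope * doc_len

# A term's score contribution is then (tf weight) / pivoted_norm(doc_len, pivot).
```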
- 2/7/06: Introduction to probabilistic retrieval (classic case)
- Lecture guides: cover sheet, Au/Gerner/Kot, Babinski/Lin
- Lecture references:
* Karen Spärck Jones, Steve Walker, and Stephen Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, pp. 779–808, 809–840 (2000). [html of preface, pdf of Part 1, pdf of Part 2]
* William S. Cooper. Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval. ACM Transactions on Information Systems (TOIS), pp. 100–111, 1995. [pdf]
- Additional references:
* Fabio Crestani, Mounia Lalmas, C. J. Van Rijsbergen, and Iain Campbell. "Is this document relevant? ...probably": A survey of probabilistic models in information retrieval. ACM Computing Surveys 30(4) (1998). [pdf,pdf2]
* M. E. Maron. Probabilistic approaches to the document retrieval problem. SIGIR, pp. 98–107 (1982). [pdf] Discusses models for interpreting the probability of relevance, and also gives a personal historical perspective on information science (as Maron construes it).
* Stephen Robertson. The probability ranking principle in IR. Journal of Documentation 33 (1977). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997). Appendix gives an example where ranking by probability of relevance is suboptimal.
* Stephen Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science 27:129–146 (1976). [pdf]
- 2/9/06: Completion of a derivation of (a version of) the RSJ model; the case of binary attributes; an isolated-Poisson model for term frequencies
- Lecture guides: cover sheet, Connolly/Mujeeb, Haque/Shaparenko
- Lecture references:
* W. Bruce Croft and D. Harper. Using probabilistic models of information retrieval without relevance information. Journal of Documentation 35:285–295 (1979). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
- Additional references, including samples from the continuing examination of explanations for/derivations of IDF weighting:
* Kishore Papineni. Why inverse document frequency? NAACL (2001). [pdf]
* Stephen Robertson. Understanding inverse document frequency: On theoretical arguments for IDF. Journal of Documentation 60(5):503–520 (2004). [pdf]
* Arjen P. de Vries and Thomas Roelleke. Relevance Information: A Loss of Entropy but a Gain for IDF? SIGIR (2005). [pdf,pdf2]
- 2/14/06: The two-Poisson model and approximations (complete classic probabilistic retrieval)
- Lecture guides: cover sheet (3/9/06: minor update to note that a corrected version of the lecture handout has been posted), Krafft/Rabkin, Leshed/Shami
- Handout: pdf (3/7/06: corrected lecture number and noted typo in Singhal's Okapi equation)
- Lecture references:
* Abraham Bookstein and D. R. Swanson. Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25:312–318 (1974).
* Stephen P. Harter. A probabilistic approach to automatic keyword indexing, part I: On the distribution of specialty words in a technical literature. Journal of the American Society for Information Science 26(4). [pdf]
* Stephen E. Robertson and Steve Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR, pp. 232–241 (1994). [pdf, pdf2] Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
- Additional references:
* Eugene L. Margulis. N-Poisson document modelling. SIGIR, pp. 177–189 (1992). [pdf]
* S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3 (1994). [pdf, ps.gz] Introduces BM25.
* Origin of the name "Okapi". [html]
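The BM25 ranking function introduced in the Okapi work above combines an RSJ-style idf with a saturating tf component and pivoted-style length normalization. A sketch of the commonly cited form follows; the defaults k1 = 1.2, b = 0.75 are conventional choices, not values fixed by the papers, and note that this idf variant can go negative for terms in more than half the collection.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_len, df, n_docs,
               k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query.
    doc_tf: term -> count in this document; df: term -> document frequency."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        # RSJ-style idf (no relevance information).
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5))
        # tf saturates as it grows; b controls length normalization.
        denom = tf + k1 * (1.0 - b + b * doc_len / avg_len)
        score += idf * tf * (k1 + 1.0) / denom
    return score
```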
- 2/16/06: Introduction to the language modeling approach
- Lecture guide: cover sheet, Danis/Rogan scribe notes, Danis/Rogan problems
- Lecture references:
* Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. SIGIR (1998). [pdf]
* John Lafferty and Chengxiang Zhai. Probabilistic relevance models based on document and query generation. In Croft and Lafferty, eds., Language Modeling and Information Retrieval (2003). [pdf,ps]
* ChengXiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. SIGIR (2001). [pdf, ps]
- Additional references:
* Language modeling and information retrieval. The Lemur project. [html]
* Xiaoyong Liu and W. Bruce Croft. Statistical language modeling for information retrieval. Annual Review of Information Science and Technology 39 (2003). [pdf]
- 2/21/06: More on the LM approach
- Lecture guides: cover sheet for Babinski/Lin, Babinski/Lin (3/10/06: lecturer-induced typo in exponent for HMM LM incorporated into guide and hence notice removed from cover sheet), Hamatake/Park scribe notes (3/16: a few typos corrected), Hamatake/Park problems
- Additional references:
* David MacKay and Linda Peto. A hierarchical Dirichlet language model. Natural Language Engineering 1(3):289–307 (1995). [pdf]
* ChengXiang Zhai. Notes on the KL-divergence retrieval formula and Dirichlet prior smoothing (2003). [pdf] Our discussion of the effects of using corpus-dependent Dirichlet smoothing is modeled on the derivation given in this paper, although the initial scoring function is a bit different.
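Dirichlet prior smoothing, as in the Zhai–Lafferty references, gives p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu). A minimal query-likelihood scorer built on that estimate, assuming precomputed corpus unigram probabilities (the mu default is a common setting, not a prescribed one):

```python
import math

def query_log_likelihood(query, doc_tf, doc_len, corpus_prob, mu=2000):
    """Log query likelihood under a Dirichlet-smoothed document LM:
    p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu)."""
    score = 0.0
    for w in query:
        p = (doc_tf.get(w, 0) + mu * corpus_prob[w]) / (doc_len + mu)
        score += math.log(p)
    return score
```

Documents are then ranked by this score; ties in smoothed mass are broken by actual term occurrences in the document.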
- 2/23/06: Completion of alternate LM formulation; introduction to relevance feedback
- Lecture guides (cover sheet announces the altered derivation presented in both sets of scribe notes): Au/Gerner/Kot (3/29: a few typos corrected), Connolly/Mujeeb
- Lecture references:
* Ian Ruthven and Mounia Lalmas. A survey on the use of relevance feedback for information access systems. Knowledge Engineering Review 18(1) (2003). [pdf]
- Additional references:
* ChengXiang Zhai and John Lafferty. Document language models, query models, and risk minimization for information retrieval. SIGIR (2001) [ps,pdf;long version] Considers the query and the relevant documents to have been generated by the same source LM.
* Karen Spärck Jones. Language modelling's generative model: is it rational? (2004) [pdf]
- 2/28/06: Relevance feedback methods
- Lecture guides: cover sheet (updated 6/12/06), Haque/Shaparenko, Krafft/Rabkin
- Lecture references:
* Victor Lavrenko and W. Bruce Croft. Relevance-based language models. SIGIR (2001) [pdf,pdf2]
* Stephen Robertson. On term selection for query expansion. Journal of Documentation 46(4):359–364 (1990) [pdf]
* J. J. Rocchio. Relevance feedback in information retrieval. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing (1971)
* E. Ide and G. Salton. Interactive search strategies and dynamic file organization in information retrieval. Ibid., pp. 373–393 (1971).
- Additional references:
* Gerard Salton and Chris Buckley. Improving retrieval performance by relevance feedback. JASIS 41(4): 288–297 (1990). [pdf]
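Rocchio's method from the reference above has a standard vector form: move the query toward the centroid of the judged-relevant documents and away from the nonrelevant centroid. The weights alpha, beta, gamma below are conventional illustrative values, not ones from the original paper.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance-feedback update on sparse term vectors (dicts):
    q' = alpha*q + (beta/|R|) * sum(R) - (gamma/|N|) * sum(N).
    Negative weights are clipped to zero, as is common practice."""
    def add(acc, vec, weight):
        for t, w in vec.items():
            acc[t] = acc.get(t, 0.0) + weight * w
    new_q = {}
    add(new_q, query, alpha)
    for d in relevant:
        add(new_q, d, beta / len(relevant))
    for d in nonrelevant:
        add(new_q, d, -gamma / len(nonrelevant))
    return {t: w for t, w in new_q.items() if w > 0}
```

Terms prominent in the relevant set are thereby introduced into (or up-weighted in) the expanded query.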
- 3/2/06: Relevance feedback: further explorations and evaluation issues
- Lecture guides: Danis/Rogan, Leshed/Shami
- Lecture references:
* Y.K. Chang, C. Cirillo, J. Razon. Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton, ed., The SMART retrieval system — experiments in automatic document processing, pp. 355–370 (1971).
* Helene Fowkes and Micheline Beaulieu. Interactive searching behavior: Okapi experiments for TREC8. Proceedings of 22nd BCS-IRSG European Colloquium on IR Research (2000).
* Ian Ruthven. Re-examining the potential effectiveness of interactive query expansion. SIGIR, pp. 213–220 (2003). [pdf, pdf2]
* Xuehua Shen and ChengXiang Zhai. Active feedback in ad hoc information retrieval. SIGIR, pp. 59–66 (2005). [pdf, pdf2]
- Additional references:
* Micheline Beaulieu, Helene Fowkes, Nega Alemayehu, and Mark Sanderson. Interactive Okapi at Sheffield — TREC-8 (1999). [pdf]
- 3/7/06: Completion of relevance feedback; implicit feedback sources
- Lecture guides: Hamatake/Park
- Lecture references:
* Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. SIGIR, pp. 449–456 (2005). [pdf,pdf2]
* Masahiro Morita and Yoichi Shinoda. Information filtering based on user behavior analysis and best match text retrieval. SIGIR, pp. 272–281 (1994). [pdf]
* Diane Kelly and Nicholas J. Belkin. Display time as implicit feedback: Understanding task effects. SIGIR (2004). [pdf, pdf2]
- Additional references:
* NIPS 2005 workshop on machine learning for implicit feedback and user modeling [homepage] and associated PASCAL "inferring relevance from eye movements challenge" 2005 [homepage]
* Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum 37(2), pp. 18–28 (2003). [pdf, pdf2]
* Combining eye movements and collaborative filtering for proactive information retrieval. Kai Puolamäki, Jarkko Salojärvi, Eerika Savia, Jaana Simola, and Samuel Kaski. SIGIR (2005). [pdf, pdf2]
* Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Beyond the commons: Investigating the value of personalizing web search. Workshop on New Technologies for Personalized Information Access (PIA) (2005). [pdf] Study showing that individuals can give the same results different relevance judgments even when they share the same information need, that the same query can represent different information needs, and related findings.
- 3/9/06: An LM-based approach to implicit feedback
- Lecture guides: cover sheet (updated 5/15/06 to correct a mistake in notation), Au/Gerner/Kot, Babinski/Lin
- Lecture references:
* Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-sensitive information retrieval using implicit feedback. SIGIR, pp. 43–50 (2005). [pdf, pdf2]
- Additional references:
* Thomas Cover and Joy A. Thomas. Elements of information theory. Wiley (1991). [book homepage] Well-known introduction to the field.
* Lillian Lee. Similarity-based approaches to natural language processing (1997). [pdf] Section 2.3 contains some introductory material describing motivations for the use of the Kullback-Leibler (KL) divergence.
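Since the LM-based feedback methods above compare language models by Kullback-Leibler divergence, here is the quantity itself; in this sketch, smoothing to keep q(x) nonzero wherever p(x) > 0 is the caller's responsibility.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)) for distributions given
    as dicts mapping outcomes to probabilities. Nonnegative, and zero
    exactly when p and q agree on p's support."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)
```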
- 3/14/06: Clickthrough data as implicit feedback: human validation
- Lecture guides: Connolly/Mujeeb, Haque/Shaparenko
- Lecture references:
* Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. Accurately interpreting clickthrough data as implicit feedback. SIGIR, pp. 154–161 (2005). [pdf,pdf2]
- 3/16/06: midterm exam (see also mid(-)term notes (3/28/06) and the Lecture 15 Krafft/Rabkin guide (includes brief solution sketches for some midterm questions))
- 3/28/06: Clickthrough data as relative implicit feedback
- Lecture guides: Au/Gerner, Krafft/Rabkin (includes brief solution sketches for some midterm questions)
- Handouts: mid(-)term notes
- Lecture references:
* Thorsten Joachims. Optimizing search engines using clickthrough data. KDD, 2002 [pdf, pdf2]
* Filip Radlinski and Thorsten Joachims. Query chains: learning to rank from implicit feedback. KDD, pp. 239–248 (2005). [pdf, pdf2] Co-winner, best student paper award.
- Additional references:
* Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, pp. 115–132 (2000) [ps.gz]
- 3/30/06: Matrix-theoretic characterizations of corpus structure (introduction to the singular value decomposition (SVD))
- Lecture guides: Danis/Rogan, Kot/Leshed
- Additional references:
* Lloyd N. (Nick) Trefethen and David Bau III. Numerical Linear Algebra (1997). [book homepage] This book, and especially lectures one, four, and five (all from part I, "Fundamentals"), should have been referenced in class: our presentation turns out to resemble theirs, but is, not surprisingly, not nearly as nice.
- 4/4/06: The SVD
- Lecture guides: Connolly/Lin/Mujeeb, Hamatake/Park (Update, 6/12/06: corrigenda)
- Additional references: (see also previous lecture)
* Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition (1996). [book homepage]. The "Bible" on, well, matrix computations. Extensive coverage of the singular value decomposition (SVD), although certainly not as gentle an introduction as in class.
- 4/6/06: "Few-factor" representations (LSI, pLSI, and others)
- Lecture guides: Haque/Shaparenko. Note: In class, I incorrectly stated that the GMAT automated scoring system, e-rater, uses LSI; I confused two different systems. This lecture guide corrects the error.
- Handouts: Lecture aid
- Lecture references:
* David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. NIPS 16 (2003). [pdf]
* Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6):391–407 (1990). [pdf, pdf2]
* Peter W. Foltz, Darrell Laham, and Thomas K. Landauer. Automated essay scoring: Applications to education technology. Proceedings of ED-MEDIA (1999). [html]
* Thomas Hofmann. Probabilistic latent semantic indexing. SIGIR, pp. 50–57 (1999). [pdf,pdf2]
* Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review 104(2):211–240 (1997). [pdf,html,html2]
* Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791 (1999) [pdf]
* Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. Latent semantic indexing: a probabilistic analysis. Symposium on Principles of Database Systems, pp. 159–168 (1998). [pdf] Journal version: Journal of Computer and System Sciences 61(2):217–235 (2000). [ps]
- Additional references:
* Rie Kubota Ando and Lillian Lee. Iterative Residual Rescaling: An analysis and generalization of LSI. SIGIR, pp. 154–162 (2001) [pdf,pdf2]
* Yossi Azar, Amos Fiat, Anna Karlin, Frank McSherry, and Jared Saia. Spectral analysis of data. Symposium on Theory of Computing (STOC), pp. 619–626 (2001) [pdf,(longer?) ps]
* Thomas Hofmann and Jan Puzicha. Statistical models for co-occurrence data. AI Memo 1625, MIT (1998). [pdf]
* Fernando Pereira, Naftali Tishby, and Lillian Lee. Distributional clustering of English words. Proceedings of the ACL (1993). [pdf]
* Donald E. Powers, Jill C. Burstein, Martin Chodorow, Mary E. Fowles, and Karen Kukich. Stumping e-rater: Challenging the validity of automated essay scoring (2001). [pdf] An interesting study in which knowledgeable humans were asked to produce essays that would be incorrectly scored by automatic methods.
* Michael Steele and David Aldous. Introduction to the interface of probability and algorithms. Probability and Algorithms: A Panel Report. National Academy of Sciences (1993). [pdf] One subsection gives a quick calculation for the expected number of occupied tables in the Chinese restaurant process.
* Demo: score your own essay (on psycholinguistics) with LSI. [home page]
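At the heart of LSI is a truncated SVD of the term-document matrix. As a self-contained illustration (pure-Python power iteration, rank 1 only; real systems use a proper sparse SVD routine), the top singular factor can be extracted as follows:

```python
import math

def rank1_lsi(matrix, iters=100):
    """Power iteration for the top singular triple of a term-document
    matrix (list of rows), giving the best rank-1 approximation
    A ~ sigma * u * v^T. Assumes the all-ones start vector has a
    component along the top right singular vector."""
    m, n = len(matrix), len(matrix[0])
    v = [1.0] * n                          # document-space direction
    for _ in range(iters):
        # One step of power iteration on A^T A: u = A v, then v = A^T u.
        u = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(matrix[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    u = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u] if sigma else u
    return sigma, u, v
```

Deflating (subtracting sigma * u * v^T and repeating) would yield further factors, i.e. the k-dimensional "latent" space LSI actually uses.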
- 4/11/06: Modeling syntactic structure: context-free grammars
- Lecture guides: Au, Krafft/Rabkin
- Handouts: Lecture aid
- Lecture references:
* Gerald Gazdar, Ewan Klein, Geoffrey K. Pullum, and Ivan A. Sag. Generalized phrase structure grammar. Harvard University Press (1985).
* Maurice Gross. Méthodes en syntaxe: régime des constructions complétives. Hermann (1975).
- Additional references:
* James Allen. Section 5.2: Movement phenomena in language. Natural language understanding, second edition. Benjamin/Cummings (1995).
* Christopher Culy. The complexity of the vocabulary of Bambara. Linguistics and Philosophy 8:345–351 (1985). [Springer link]
* Daniel Jurafsky and James H. Martin. Chapter 9: Context-free grammars for English. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall (2000).
* Fernando C. N. Pereira and Stuart M. Shieber. Prolog and natural-language analysis. CSLI (1987). [pdf] Sections 4.2.3–4.2.5 and 4.3.3 (pages 102ff. of the pdf file) describe syntactic aspects of filler-gap dependencies.
* Geoff Pullum, 1986. Footloose and context-free. Natural language and linguistic theory 4, pp. 283–289. Reprinted in The great Eskimo vocabulary hoax, U. of Chicago Press 1991. A very fun read.
* Stuart Shieber. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8:333–343 (1985). [pdf,Springer link]
- 4/13/06: Feature-based context-free grammars; introduction to tree-adjoining grammars
- Lecture guides: Kot/Leshed, Danis/Rogan
- Lecture references:
* Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct grammar. Journal of Computer and System Sciences 10(1):136–163 (1975). - Additional references:
* James Allen. Section 5.3: Handling questions in context-free grammars. Natural language understanding, second edition. Benjamin/Cummings (1995). Motivated by gap feature propagation in GPSG, as described in Gazdar et al. (1985) (full citation in references for the previous lecture).
* Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3 (Beyond words), pp. 69–123 (1997). [pdf]
* Daniel Jurafsky and James H. Martin. Chapter 11: Features and unification. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall (2000).
* K. Vijay-Shankar and Aravind K. Joshi. Some computational properties of tree adjoining grammars. Proceedings of the 23rd ACL, pp. 82–93 (1985). [pdf,pdf2]
* The XTAG project. [home page]. Site includes a tutorial, grammars for English and Korean, synchronous-tag implementation, and a supertagging demo.
- 4/18/06: TAGs, continued
- Lecture guides: Connolly/Lin/Mujeeb
- Additional references: (see also above)
* Lillian Lee. "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. In Computer science: Reflections on the Field, Reflections from the Field, National Academies Press, pp. 111–118 (2004). [pdf]
- 4/20/06: TAGs for idiom analysis; adjunction constraints
- Lecture guides: Hamatake/Park
- Lecture references:
* K. Vijay-Shanker and Aravind K. Joshi. Feature structures based tree adjoining grammars. COLING, pp. 714–719 (1988). [pdf]
- Additional references: (see also above)
* Anne Abeillé and Yves Schabes. Parsing idioms in lexicalized TAGs. Proc. of the 4th EACL, pp. 1–9 (1989). [pdf]
- 4/25/06: Algorithms for grammar parsing and learning
- Lecture references:
* Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian, Kimmen Sjölander, Rebecca C. Underwood, and David Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research 22(23):5112–20 (1994). [pdf]
* David B. Searls. The linguistics of DNA. American Scientist 80(6):579–591 (1992).
* Andreas Stolcke. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics 21(2):165–201 (1995). [pdf, html, tech-report ps]
* Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, and Takashi Yokomori. Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science 210(2):277–303, special issue on genome informatics (1999). [pdf]
- Additional references:
* Daniel Jurafsky and James H. Martin. Chapter 10: Parsing with context-free grammars. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall (2000).
* Lillian Lee. Fast context-free grammar parsing requires fast Boolean matrix multiplication. Journal of the ACM 49(1):1–15 (2002). [pdf,pdf2]
* Giorgio Satta. Tree adjoining grammar parsing and Boolean matrix multiplication. Computational Linguistics 20(2):173–192 (1994). [pdf]
* Giorgio Satta. Tutorial on tree adjoining grammar parsing. Seventh international workshop on tree adjoining grammars and related formalisms (TAG+7) (2004). [ps]
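The chart-based parsing these references build on can be illustrated with a bare CKY recognizer for a grammar in Chomsky normal form; the rule encoding below (tuples of left-hand side and right-hand side) is an assumption of this sketch.

```python
def cky_recognize(words, grammar, start="S"):
    """CKY recognizer for a CNF grammar.
    grammar: rules as (A, (B, C)) for binary rules or (A, (w,)) for
    terminal rules. Returns True iff `start` derives the word sequence."""
    n = len(words)
    # chart[i][j]: the set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for lhs, rhs in grammar:
            if rhs == (w,):
                chart[i][i + 1].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # split point
                for lhs, rhs in grammar:
                    if (len(rhs) == 2
                            and rhs[0] in chart[i][k]
                            and rhs[1] in chart[k][j]):
                        chart[i][j].add(lhs)
    return start in chart[0][n]
```

Replacing the sets with probability tables (taking maxes or sums over rule applications) turns this recognizer into the probabilistic CKY variants the stochastic-CFG references rely on.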
- 4/27/06: The EM algorithm
- Lecture references:
* James K. Baker. Trainable grammars for speech recognition. Speech communication papers for the 97th meeting of the Acoustical Society of America, pp. 547–550 (1979)
* Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language 4:35–56 (1990)
* Detlef Prescher. A tutorial on the Expectation-Maximization algorithm including maximum-likelihood estimation and EM training of probabilistic context-free grammars. [pdf] The handout for lecture consists of pages 44–47.
- Additional references: Note that, unfortunately, notation varies across different presentations, with the same variable often used to refer to different things.
* Adam Berger. Convexity, maximum likelihood and all that. [ps]
* Zhiyi Chi and Stuart Geman. Estimation of probabilistic context-free grammars. Computational Linguistics 24(2):298–305 (1998). [pdf]
* Michael Collins. The EM algorithm (1997) [ps]
* Dempster, Laird, and Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39(1):1–38 (1977) [ocr]
* Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm and extensions. Second edition. Wiley (2006). [errata for first printing]
* Ted Pedersen. The EM algorithm: selected readings (2001) [pdf]
* Stefan Riezler. Getting EM to work (talk at EMNLP 2001) [ps.gz]
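As a concrete instance of the algorithm, here is EM for the textbook two-coin mixture (my example, not one from lecture): each draw secretly picks one of two coins and flips it a fixed number of times; only the heads counts are observed, and the coin identity is the hidden variable. Mixture weights are held fixed at 1/2 to keep the sketch short.

```python
def em_two_coins(flip_counts, n_flips, iters=50, p_init=(0.6, 0.4)):
    """EM estimation of the biases of two coins in a 50/50 mixture,
    given observed heads counts out of n_flips per draw."""
    pa, pb = p_init
    for _ in range(iters):
        num_a = den_a = num_b = den_b = 0.0
        for h in flip_counts:
            # E-step: posterior responsibility of coin A for this draw.
            la = pa ** h * (1 - pa) ** (n_flips - h)
            lb = pb ** h * (1 - pb) ** (n_flips - h)
            r = la / (la + lb)
            # M-step accumulators: expected heads and flips per coin.
            num_a += r * h
            den_a += r * n_flips
            num_b += (1 - r) * h
            den_b += (1 - r) * n_flips
        pa, pb = num_a / den_a, num_b / den_b
    return pa, pb
```

The inside-outside algorithm for PCFGs has exactly this shape, with the chart-computed expected rule counts playing the role of the responsibility-weighted heads counts.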
- 5/2/06: Conclusion of EM; introduction to maximum-entropy models.
- Handouts: lecture aid
- Lecture references:
* Adam Berger, Stephen Della Pietra, and Vincent Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics 22(1):39–71 (1996). [pdf, ps]
- Additional references:
* Adam Berger's page on "MaxEnt and exponential models" [html] Contains many tutorials on relevant topics.
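The model family itself is compact: given weights and feature functions, a conditional maximum-entropy (log-linear) model sets p(y|x) proportional to exp(sum_i lambda_i * f_i(x, y)). A sketch of just the probability computation (weight training, e.g. by iterative scaling or gradient methods, is omitted; the feature encoding is an assumption of this sketch):

```python
import math

def maxent_prob(weights, features, x, labels):
    """Conditional maxent model: p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x),
    where features is a list of functions f(x, y) -> float."""
    scores = {y: math.exp(sum(w * f(x, y) for w, f in zip(weights, features)))
              for y in labels}
    z = sum(scores.values())          # partition function Z(x)
    return {y: s / z for y, s in scores.items()}
```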
- 5/4/06: Conclusion of maximum entropy and of the class
- Handouts: Information regarding the final exam
- Lecture references:
* John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML, pp. 282–289 (2001) [pdf]