CS 6740/INFO 6300, Spring 2010 (original) (raw)

Instructor: Prof. Lillian Lee. Office hours: usually Fridays 10:45-11:45 or by appointment, but please check my webpage, http://www.cs.cornell.edu/home/llee, for updates.
Lectures: TR 10:10–11:25, Upson 315
Midterm: March 18th, in class; Final: Thursday May 13, 2-4:30pm, Upson 315

The content below renders in tab-organized format if javascript is enabled. This page sets a cookie so that the next time you return to this page (if ever ...), the tab from which you left will be open.

Administration

Brief overview

Philosophy, Spring 2010: This class is a graduate-level introduction to research fundamentals for information retrieval and natural language processing. The course focuses on the development and derivation of major ideas, and aims to promote research skills for students working in and outside of language technologies.While this course is thus not primarily a survey of the field, pointers to related/current work will be provided. Because of the wealth of Cornell machine-learning courses, learning is not an emphasis of this class (despite its immense importance in the field) to avoid overlap.

Please see the Course description and policies handout for more information.

Tentative syllabus: See “Lectures” section/tab.

Prerequisites: (Firm) knowledge of elementary computer science, probability, and linear algebra. Neither CS/INFO 4300 nor CS/COGST/LING 4740 are prerequisites.

Administrative handouts

Cornell Code of Academic Integrity, and Acknowledging the Work of Others . I strongly encourage students to read the “Acknowledging ...” document carefully; it's quite informative.
Course description and policies. Includes course philosophy, course work requirements, exam dates, and tentative syllabus.
Guidelines for lecture-guide preparation
Example problem for lecture-guide preparation.
Sample midterm (from 2006) and brief solution notes (see section 2 of document)
Final(-)exam guide, including sample problems; solutions thereunto

General resources

Homepage for my previous running of this course click here for Fall 07

Reference texts

Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval (1999). US mirror.
Rik Belew, Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW(2000).
Eugene Charniak, Statistical Language Learning, 1993.
Bruce Croft, Don Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice (2010). Some chapters available online.
W. Bruce Croft and John Lafferty, editors, Language Modeling for Information Retrieval (2003).
William B. Frakes and Ricardo Baeza-Yates, Information Retrieval: Data Structures & Algorithms (1992).
Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition (2008).
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, (2008).
Christopher D. Manning and Hinrich Schuetze, Foundations of Statistical Natural Language Processing (1999). Completely online via Cornell.
Fernando C. N. Pereira and Stuart M. Shieber. Prolog and Natural-Language Analysis. (1987).
C. J. van Rijsbergen, Information Retrieval, second edition (1979). Alternate version w/o indexing or page divisions.
Karen Spärck Jones and Peter Willett. Readings in Information Retrieval (1997).

Pointers to papers

Alistair Moffat, Justin Zobel, and David Hawking, Recommended reading for IR research students, SIGIR Forum 39(2):3–14, 2005. [pdf,pdf2]

Cornell IR/NLP courses

Datasets and software:

List of locally available resources (requires Cornell IP address)
Cornell-produced datasets (and others here)
Bow
Galago
LEMUR
Lucene
LDA-(topic-modeling-)related code
MALLET
NLTK
Nutch
openNLP
Zettair

0. (Jan 26) A prefatory lecture

Lecture slides
Handouts: course description and policies
The projects described in lecture correspond to these papers:
Rie Kubota Ando.Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement. SIGIR, 2000.
Rie Kubota Ando and Lillian Lee. Iterative residual rescaling: An analysis and generalization of LSI. SIGIR, pp. 154–162, 2001.
Matt Thomas, Bo Pang, and Lillian Lee. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. EMNLP, pp. 327–335, 2006.
Cristian Danescu-Niculescu-Mizil, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. WWW, pp. 141–150, 2009.
We'll be covering latent semantic indexing (LSI) itself in more depth later in the course. Here is some additional material and references on some of the other topics we covered.
slides for an hour-long talk on IRR: pdf (Part One), pdf (Part Two)
slides for a 30-minute talk on the Amazon study
Rie Kubota Ando. The document representation problem: An Analysis of LSI and Iterative Residual Rescaling. Ph.D. thesis, 2001.
Shay David and Trevor Pinch. Six degrees of reputation: The use and abuse of online review and recommendation systems. First Monday, special issue on commercial applications of the Internet, July 2006. .
Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Now Publishers/Foundations and Trends in Information Retrieval, 2008.
Cornell movie-review datasets and congressional-debate datasets for performing sentiment-analysis experiments on.

1. (Jan 28) Information-retrieval basics (setting, evaluation); intro to the vector-space model

Lecture guide: none, but see the problems (section 4) in the FA07 lecture guide by Nikos Karampatziakis and Ainur Yessenalina
Handouts: Example problem for lecture-guide preparation
Lecture references:
- TREC datasets
- TREC conference proceedings website. .
Additional references:
- Chapter 8 "Evaluating search engines" of the Croft/Metzler/Strohman book
- Chapter 8 "Evaluation in information retrieval" of the Manning/Raghavan/Schütze book
- Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure stability. SIGIR 2000, pp. 33–40.
- David Durbin. The most influential paper Gerard Salton never wrote. Library Trends, Spring 2004.
- Hui Fang, Tao Tao, and ChengXiang Zhai. A formal study of information retrieval heuristics. SIGIR, pp. 49–56, 2004. . slides
- Donna Harman. The history of IDF and its influences on IR and other fields. In Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones, pp. 69–79, 2005.
- Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4): pp. 422–446, 2002. ,
- Stephen Robertson. A new interpretation of average precision. SIGIR, 2008, pp. 689–690.
- Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin 24(4), pp. 35–43, 2001.
- Ellen M. Voorhees and Donna K. Harman. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.
- William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai. Precision-at-ten considered redundant. SIGIR 2008, pp. 695–696.

0. (Jan 26) A prefatory lecture

Lecture slides
Handouts: course description and policies
The projects described in lecture correspond to these papers:
Rie Kubota Ando.Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement. SIGIR, 2000.
Rie Kubota Ando and Lillian Lee. Iterative residual rescaling: An analysis and generalization of LSI. SIGIR, pp. 154–162, 2001.
Matt Thomas, Bo Pang, and Lillian Lee. Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. EMNLP, pp. 327–335, 2006.
Cristian Danescu-Niculescu-Mizil, Gueorgi Kossinets, Jon Kleinberg, and Lillian Lee. How opinions are received by online communities: A case study on Amazon.com helpfulness votes. WWW, pp. 141–150, 2009.
We'll be covering latent semantic indexing (LSI) itself in more depth later in the course. Here is some additional material and references on some of the other topics we covered.
slides for an hour-long talk on IRR: pdf (Part One), pdf (Part Two)
slides for a 30-minute talk on the Amazon study
Rie Kubota Ando. The document representation problem: An Analysis of LSI and Iterative Residual Rescaling. Ph.D. thesis, 2001.
Shay David and Trevor Pinch. Six degrees of reputation: The use and abuse of online review and recommendation systems. First Monday, special issue on commercial applications of the Internet, July 2006. .
Bo Pang and Lillian Lee. Opinion mining and sentiment analysis. Now Publishers/Foundations and Trends in Information Retrieval, 2008.
Cornell movie-review datasets and congressional-debate datasets for performing sentiment-analysis experiments on.

1. (Jan 28) Information-retrieval basics (setting, evaluation); intro to the vector-space model

Lecture guide: none, but see the problems (section 4) in the FA07 lecture guide by Nikos Karampatziakis and Ainur Yessenalina
Handouts: Example problem for lecture-guide preparation
Lecture references:
- TREC datasets
- TREC conference proceedings website. .
Additional references:
- Chapter 8 "Evaluating search engines" of the Croft/Metzler/Strohman book
- Chapter 8 "Evaluation in information retrieval" of the Manning/Raghavan/Schütze book
- Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure stability. SIGIR 2000, pp. 33–40.
- David Durbin. The most influential paper Gerard Salton never wrote. Library Trends, Spring 2004.
- Hui Fang, Tao Tao, and ChengXiang Zhai. A formal study of information retrieval heuristics. SIGIR, pp. 49–56, 2004. . slides
- Donna Harman. The history of IDF and its influences on IR and other fields. In Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones, pp. 69–79, 2005.
- Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4): pp. 422–446, 2002. ,
- Stephen Robertson. A new interpretation of average precision. SIGIR, 2008, pp. 689–690.
- Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin 24(4), pp. 35–43, 2001.
- Ellen M. Voorhees and Donna K. Harman. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.
- William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai. Precision-at-ten considered redundant. SIGIR 2008, pp. 695–696.

1. (Jan 28) Information-retrieval basics (setting, evaluation); intro to the vector-space model

Lecture guide: none, but see the problems (section 4) in the FA07 lecture guide by Nikos Karampatziakis and Ainur Yessenalina
Handouts: Example problem for lecture-guide preparation
Lecture references:
- TREC datasets
- TREC conference proceedings website. .
Additional references:
- Chapter 8 "Evaluating search engines" of the Croft/Metzler/Strohman book
- Chapter 8 "Evaluation in information retrieval" of the Manning/Raghavan/Schütze book
- Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure stability. SIGIR 2000, pp. 33–40.
- David Durbin. The most influential paper Gerard Salton never wrote. Library Trends, Spring 2004.
- Hui Fang, Tao Tao, and ChengXiang Zhai. A formal study of information retrieval heuristics. SIGIR, pp. 49–56, 2004. . slides
- Donna Harman. The history of IDF and its influences on IR and other fields. In Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones, pp. 69–79, 2005.
- Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4): pp. 422–446, 2002. ,
- Stephen Robertson. A new interpretation of average precision. SIGIR, 2008, pp. 689–690.
- Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin 24(4), pp. 35–43, 2001.
- Ellen M. Voorhees and Donna K. Harman. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.
- William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai. Precision-at-ten considered redundant. SIGIR 2008, pp. 695–696.

2. (Feb 2) length normalization (who'da thunk?)

Lecture guide: none, but see the problems (note that problem 2 starts on the page numbered 13) in the FA07 lecture guide by Cristian Danescu-Niculescu-Mizil and Vasumathi Raman
Lecture references:
- S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR 1994, pp. 354–354. ,
- Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando Pereira.AT&T at TREC-7. TREC, 1998.2.
Additional references:
- Section 6.4 “Variant tf-idf functions” of the Manning/Raghavan/Schütze book
- John Broglio, James P. Callan, W. Bruce Croft and Daniel W. Nachbar. Document retrieval and routing using the INQUERY system. TREC 3, 1995.
- Aleksander Kolcz and Wen-tau Yih. Raising the baseline for high-precision text classifiers.KDD 2007, pp. 400–409.

3. (Feb 4) pivoted document-length normalization

Lecture guide by Lakshmi Ganesh and Navin Sivakumar
Lecture references:
- Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document length normalization. SIGIR 1996. ,.
- S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR 1994, pp. 354–354. ,
Additional references:
- Section 6.4.4 “Pivoted normalized document length” of the Manning/Raghavan/Schütze book
- Abdur Chowdhury, M. Catherine McCabe, David Grossman, Ophir Frieder. Document normalization revisited. SIGIR 2002, pp. 381–382.
- Tze Leung Chung, Robert Wing Pong Luk, Kam Fai Wong, Kui Lam Kwok, Dik Lun Lee. Adapting pivoted document-length normalization for query size: Experiments in Chinese and English. ACM Transactions on Asian Language Information Processing (TALIP) 5(3):245--263 (2006).
- Ben He, Iadh Ounis. On setting the hyper-parameters of term frequency normalization for information retrieval. TOIS 25(3), 2007. . .
- Martin Kaszkiel and Justin Zobel. Passage retrieval revisited.SIGIR 1997.
- David E. Losada, Leif Azzopardi, and Mark Baillie. Revisiting the relationship between document length and relevance. CIKM, pp. 419–428, 2008.

1. (Jan 28) Information-retrieval basics (setting, evaluation); intro to the vector-space model

Lecture guide: none, but see the problems (section 4) in the FA07 lecture guide by Nikos Karampatziakis and Ainur Yessenalina
Handouts: Example problem for lecture-guide preparation
Lecture references:
- TREC datasets
- TREC conference proceedings website. .
Additional references:
- Chapter 8 "Evaluating search engines" of the Croft/Metzler/Strohman book
- Chapter 8 "Evaluation in information retrieval" of the Manning/Raghavan/Schütze book
- Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure stability. SIGIR 2000, pp. 33–40.
- David Durbin. The most influential paper Gerard Salton never wrote. Library Trends, Spring 2004.
- Hui Fang, Tao Tao, and ChengXiang Zhai. A formal study of information retrieval heuristics. SIGIR, pp. 49–56, 2004. . slides
- Donna Harman. The history of IDF and its influences on IR and other fields. In Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones, pp. 69–79, 2005.
- Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4): pp. 422–446, 2002. ,
- Stephen Robertson. A new interpretation of average precision. SIGIR, 2008, pp. 689–690.
- Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin 24(4), pp. 35–43, 2001.
- Ellen M. Voorhees and Donna K. Harman. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.
- William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai. Precision-at-ten considered redundant. SIGIR 2008, pp. 695–696.

2. (Feb 2) length normalization (who'da thunk?)

Lecture guide: none, but see the problems (note that problem 2 starts on the page numbered 13) in the FA07 lecture guide by Cristian Danescu-Niculescu-Mizil and Vasumathi Raman
Lecture references:
- S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR 1994, pp. 354–354. ,
- Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando Pereira.AT&T at TREC-7. TREC, 1998.2.
Additional references:
- Section 6.4 “Variant tf-idf functions” of the Manning/Raghavan/Schütze book
- John Broglio, James P. Callan, W. Bruce Croft and Daniel W. Nachbar. Document retrieval and routing using the INQUERY system. TREC 3, 1995.
- Aleksander Kolcz and Wen-tau Yih. Raising the baseline for high-precision text classifiers.KDD 2007, pp. 400–409.

3. (Feb 4) pivoted document-length normalization

Lecture guide by Lakshmi Ganesh and Navin Sivakumar
Lecture references:
- Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document length normalization. SIGIR 1996. ,.
- S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR 1994, pp. 354–354. ,
Additional references:
- Section 6.4.4 “Pivoted normalized document length” of the Manning/Raghavan/Schütze book
- Abdur Chowdhury, M. Catherine McCabe, David Grossman, Ophir Frieder. Document normalization revisited. SIGIR 2002, pp. 381–382.
- Tze Leung Chung, Robert Wing Pong Luk, Kam Fai Wong, Kui Lam Kwok, Dik Lun Lee. Adapting pivoted document-length normalization for query size: Experiments in Chinese and English. ACM Transactions on Asian Language Information Processing (TALIP) 5(3):245--263 (2006).
- Ben He, Iadh Ounis. On setting the hyper-parameters of term frequency normalization for information retrieval. TOIS 25(3), 2007. . .
- Martin Kaszkiel and Justin Zobel. Passage retrieval revisited.SIGIR 1997.
- David E. Losada, Leif Azzopardi, and Mark Baillie. Revisiting the relationship between document length and relevance. CIKM, pp. 419–428, 2008.

4. (Feb 9) Evaluation: annotation and experimental design

Lecture guide by Jerzy Hausknecht and Kent Sutherland
Lecture references:
- Omar Alonso. Guidelines for designing crowdsourcing-based relevance experiments. Manuscript, date 2009?
- Chris Buckley and Ellen M. Voorhees. Retrieval evaluation with incomplete information. SIGIR, pp. 25–32, 2004.
- Stefan Büttcher, Charles L. A. Clarke, Peter C. K. Yeung, and Ian Soboroff. Reliable Information Retrieval Evaluation with Incomplete and Biased Judgements. SIGIR, pp. 63–70, 2007.
- Ben Carterette, James Allan, and Ramesh Sitaraman. Minimal test collections for retrieval evaluation. SIGIR, pp. 268–275, 2006. 2
- Luis Von Ahn and Laura Dabbish. Labeling images with a computer game. CHI 2004. , .
- Justin Zobel. How reliable are large-scale information retrieval experiments? SIGIR 1998.
Additional references:
- Chris Callison-Burch. Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. EMNLP 2009, pp. 286–295. 2
- Benjamin Carterette. Low-cost and robust evaluation of IR systems. PhD thesis, 2008.
- Gordon V. Cormack and Christopher R. Palmer and Charles L. A. Clarke. Efficient construction of large test collections. SIGIR 1998, pp. 282–289.
- Alistair Moffat and Justin Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM TOIS 27(1), 2008.
- Panos Ipeirotis. Your estimated waiting time: Infinite!. Blog post dated Feb 5, 2009.
- Tetsuya Sakai and Noriko Kando. On information retrieval metrics designed for evaluation with incomplete relevance assessments. .Information Retrieval, 11(5):447–470, 2008. journal-site html
- Mark Sanderson and Hideo Joho. Forming test collections with no system pooling. SIGIR 2004.
- Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Ng. Cheap and fast — But is it good? Evaluating non-expert annotations for natural language tasks. EMNLP 2008, pp. 254–263.
- Section 3.3.2 of Ellen Voorhees and Donna Harman. Overview of the Eighth Text REtrieval Conference (TREC-8). poor pdf.
- William Webber and Laurence A. F. Park. Score adjustment for correction of pooling bias. SIGIR 2009, pp. 444 451. .

5. (Feb 11) Introduction to (Robertson/Spärck Jones) probabilistic retrieval

Lecture guide by Ellis Weng and Andrew Owens
Lecture references:
- Stephen Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129–46 (1976).
Additional references:
- Probabilistic information retrieval chapter of the Manning/Raghavan/Schütze book.
- Michael D. Gordon and Peter Lenk. When is the probability ranking principle suboptimal? Journal of the American Society for Information Science 43:1–14 (1992).
- C. J. “Keith” van Rijsbergen. The emergence of probabilistic accounts of information retrieval. Pp. 23–38 in John I. Tait. ed., Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones (2005)
- Stephen Robertson and Hugo Zaragoza. The Probabilistic Relevance Framework: BM25 and Beyond, Foundations and Trends in Information Retrieval 3(4):pp. 333–389, 2009.
- Karen Spärck Jones, Steve Walker, and Stephen Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, pp. 779–808, 809–840 (2000). html of preface, pdf of Part 1, pdf of Part 2

6. (Feb 16) RSJ probabilistic retrieval: binary models and the IDF

Lecture guide: none, but see the problem in the FA07 lecture guide by Nam Nguyen and Myle Ott
Lecture references:
- W. Bruce Croft and D. Harper. Using probabilistic models of information retrieval without relevance information.. Journal of Documentation 35: 285–295 (1979). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
- Lillian Lee. IDF revisited: A simple new derivation within the Robertson-Spärck Jones probabilistic model. SIGIRpp. 751–752, 2007. slides
- Nam Nguyen and Myle Ott. FA07 lecture guide, with question showing that the hyperbolic estimate can be thought of as a “special case” of the linear model
- Stephen E. Robertson and Steve Walker. On relevance weights with little relevance information. SIGIR, pp. 16–24 (1997)
Additional references:
- The Spärck Jones / Robertson IDF page
- Kenneth W. Church and William A. Gale. Inverse document frequency (IDF): A measure of deviations from Poisson. Third Workshop on Very Large Corpora (WVLC), pp. 121–130, 1995. ps
- Warren Greiff. A theory of term weighting based on exploratory data analysis. SIGIR 1998, pages 11–19.
- Kishore Papineni. Why inverse document frequency? NAACL (2001).
- Thomas Roelleke and Jun Wang. TF-IDF uncovered: a study of theories and probabilities SIGIR, pp 435–442, 2008. slides

7. (Feb 18) Two-Poisson models and BM weighting

Lecture guide by Taiyang Chen. See also the the problems in the FA07 lecture guide by Nikos Karampatziakis and Ainur Yessenalina and the problems in the FA07 lecture guide by Cristian Danescu-Niculescu-Mizil
Handouts: excerpt from Singhal (2001) comparing BM25 and pivoted-document-length scoring functions
Lecture references:
- Abraham Bookstein and D. R. Swanson. Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5):312–318 (1974).
- Kenneth W. Church and William A. Gale Poisson mixtures. Natural Language Engineering 1(2): 163–190, 1995.
- Stephen P. Harter. A probabilistic approach to automatic keyword indexing, part I: On the distribution of specialty words in a technical literature. Journal of the American Society for Information Science 26(4):197–206, 1975.
- Stephen E. Robertson and Steve Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR, pp. 232–241 (1994). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
Additional references:
- David Blei and John Lafferty. Topic models. In A. Srivastava and M. Sahami, editors, Text Mining: Theory and Applications, 2009.
- S. E. Robertson, S. Walker, and M. Beaulieu. Okapi at TREC-7: automatic ad hoc, filtering, VLC and interactive. Proceedings of TREC 7, pp. 253–264, 1999. pdf.gz,

5. (Feb 11) Introduction to (Robertson/Spärck Jones) probabilistic retrieval

Lecture guide by Ellis Weng and Andrew Owens
Lecture references:
- Stephen Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129–46 (1976).
Additional references:
- Probabilistic information retrieval chapter of the Manning/Raghavan/Schütze book.
- Michael D. Gordon and Peter Lenk. When is the probability ranking principle suboptimal? Journal of the American Society for Information Science 43:1–14 (1992).
- C. J. “Keith” van Rijsbergen. The emergence of probabilistic accounts of information retrieval. Pp. 23–38 in John I. Tait. ed., Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones (2005)
- Stephen Robertson and Hugo Zaragoza. The Probabilistic Relevance Framework: BM25 and Beyond, Foundations and Trends in Information Retrieval 3(4):pp. 333–389, 2009.
- Karen Spärck Jones, Steve Walker, and Stephen Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, pp. 779–808, 809–840 (2000). html of preface, pdf of Part 1, pdf of Part 2

6. (Feb 16) RSJ probabilistic retrieval: binary models and the IDF

Lecture guide: none, but see the problem in the FA07 lecture guide by Nam Nguyen and Myle Ott
Lecture references:
- W. Bruce Croft and D. Harper. Using probabilistic models of information retrieval without relevance information.. Journal of Documentation 35: 285–295 (1979). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
- Lillian Lee. IDF revisited: A simple new derivation within the Robertson-Spärck Jones probabilistic model. SIGIRpp. 751–752, 2007. slides
- Nam Nguyen and Myle Ott. FA07 lecture guide, with question showing that the hyperbolic estimate can be thought of as a “special case” of the linear model
- Stephen E. Robertson and Steve Walker. On relevance weights with little relevance information. SIGIR, pp. 16–24 (1997)
Additional references:
- The Spärck Jones / Robertson IDF page
- Kenneth W. Church and William A. Gale. Inverse document frequency (IDF): A measure of deviations from Poisson. Third Workshop on Very Large Corpora (WVLC), pp. 121–130, 1995. ps
- Warren Greiff. A theory of term weighting based on exploratory data analysis. SIGIR 1998, pages 11–19.
- Kishore Papineni. Why inverse document frequency? NAACL (2001).
- Thomas Roelleke and Jun Wang. TF-IDF uncovered: a study of theories and probabilities SIGIR, pp 435–442, 2008. slides

7. (Feb 18) Two-Poisson models and BM weighting

Lecture guide by Taiyang Chen. See also the the problems in the FA07 lecture guide by Nikos Karampatziakis and Ainur Yessenalina and the problems in the FA07 lecture guide by Cristian Danescu-Niculescu-Mizil
Handouts: excerpt from Singhal (2001) comparing BM25 and pivoted-document-length scoring functions
Lecture references:
- Abraham Bookstein and D. R. Swanson. Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5):312–318 (1974).
- Kenneth W. Church and William A. Gale Poisson mixtures. Natural Language Engineering 1(2): 163–190, 1995.
- Stephen P. Harter. A probabilistic approach to automatic keyword indexing, part I: On the distribution of specialty words in a technical literature. Journal of the American Society for Information Science 26(4):197–206, 1975.
- Stephen E. Robertson and Steve Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR, pp. 232–241 (1994). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
Additional references:
- David Blei and John Lafferty. Topic models. In A. Srivastava and M. Sahami, editors, Text Mining: Theory and Applications, 2009.
- S. E. Robertson, S. Walker, and M. Beaulieu. Okapi at TREC-7: automatic ad hoc, filtering, VLC and interactive. Proceedings of TREC 7, pp. 253–264, 1999. pdf.gz,

8 (Feb 23) Intro to the language-modeling approach to IR

Lecture references:
- Djoerd Hiemstra and Wessel Kraaij. Twenty-One at TREC-7: Ad hoc and cross language track
- John Lafferty and Chengxiang Zhai. Probabilistic relevance models based on document and query generation. In Croft and Lafferty, eds.,Language Modeling and Information Retrieval (2003).
- Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. SIGIR (1998), pp. 275–281.
- Karen Spärck Jones and Stephen Robertson. LM vs PM: where's the relevance. 2001 Workshop on Language Modeling and Information Retrieval.
- ChengXiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. SIGIR(2001). pdf of journal version, TOIS 22(2):179--214
Additional references:
- The Lemur toolkit for language modeling and information retrieval
- Oren Kurland and Lillian Lee. PageRank without hyperlinks: Structural re-ranking using links induced by language models. SIGIR, pp. 306–313, 2005.
- David MacKay and Linda Peto. A hierarchical Dirichlet language model. Natural Language Engineering 1(3):289–307 (1995).
- Thomas Roelleke and Jun Wang. TF-IDF uncovered: a study of theories and probabilities. SIGIR 2008, pp. 435–442. slides
- Chengxiang Zhai. Statistical Language Models for Information Retrieval: A Critical Review. Now Publishers (2008). Also Foundations and Trends in Information Retrieval 2(3):137–213.

9 (Feb 25) About query likelihood; relevance LMs

Lecture guide by Navin Sivakumar, Lakshmi Ganesh, and Taiyang Chen
Lecture references:
- Victor Lavrenko and W. Bruce Croft. Relevance-based language models. SIGIR (2001)

10 (Mar 2) More on language models

Lecture guide by Jerzy Hausknecht and Kent Sutherland
Lecture references:
- Google n-gram corpus. announcement (with stats)
- John Lafferty and ChengXiang Zhai. Document language models, query models, and risk minimization for information retrieval. SIGIR(2001)
- Voltaire. Candide. Google books preview
Additional references:
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2), 1993.
- Thomas Cover and Joy A. Thomas. Elements of information theory. Wiley (1991) book homepage

11 (Mar 4) The Good-Turing estimate

Lecture guide, by Ellis Weng and Andrew Owens
Lecture references:
- I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika 40: 237–264 (1953). . .
- Frederick Jelink and Robert Mercer. Probability distribution estimation from sparse data. IBM Technical Disclosure Bulletin 28: 2591–2594 (1985).
- Arthur Nádas. On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-33(6):1414–1416, 1985.
Additional references:
- Frederick Jelinek, Chapter 15 “Estimation of probabilities from counts and the back-off method” of Statistical methods for speech recognition, MIT Press (1997). Google books preview
- Alon Orlitsky, Narayana P. Santhanam, and Junan Zhang give Codebreakers: The Inside Story of Bletchley Park by Hinsley and Stripp as a source for asserting that Good and Turing were working on deciphering the Enigma code.

12 (Mar 9) Smoothing; LM evaluation

Handouts: Sample midterm (from 2006). Also available are brief solution notes (see section 2 of document)
Lecture references:
- Frederick Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. Workshop on Pattern Recognition in Practice, 1980.
- Slava Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing 35(3):400–401 (1987).
Additional references:
- Timothy Bell, John Cleary and Ian Witten. Text compression (1990)
- Stanley F. Chen and Joshua Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. ACL, 1996. techrpt version
- William Gale and Kenneth W. Church. What's wrong with adding one?. In Corpus-Based Research into Language: In honour of Jan Aarts, 1994.
- Frederick Jelinek. Some of my best friends are linguists. Language Resources and Evaluation 39:25–34, 2005.

13 (Mar. 16) Zipf's law and Miller's monkeys

handout
Lecture references:
- J.B. Estoup, 1916. Gammes Stenographiques. Institut Stenographique de France.
- W. Nelson Francis and Henry Kucera, 1982. Frequency Analysis of English Usage. Houghton Mifflin. Also of potential interest: the Brown corpus manual, 1979; and a search interface.
- Benoit Mandelbrot, 1957. Th�orie math�matique de la d'Estoup-Zipf. Inst. de Statistique de l'Univ�rsit�.
- George A. Miller, 1957. Some effects of intermittent silence. American Journal of Psychology 70, pp. 311-313.
- George Zipf, 1949. Human Behavior and the principle of least effort. Addison-Wesley Press.
- See also section 1.4 of the Manning/Schütze text or chapter 4 of Timothy C. Bell, John G. Cleary, and Ian H. Witten, Text Compression, 1990.
Additional references:
- Zipf's Law (webpage by Wentian Li). Many Zipf's Law refs in many fields.

8 (Feb 23) Intro to the language-modeling approach to IR

Lecture references:
- Djoerd Hiemstra and Wessel Kraaij. Twenty-One at TREC-7: Ad hoc and cross language track
- John Lafferty and Chengxiang Zhai. Probabilistic relevance models based on document and query generation. In Croft and Lafferty, eds.,Language Modeling and Information Retrieval (2003).
- Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. SIGIR (1998), pp. 275–281.
- Karen Spärck Jones and Stephen Robertson. LM vs PM: where's the relevance. 2001 Workshop on Language Modeling and Information Retrieval.
- ChengXiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. SIGIR(2001). pdf of journal version, TOIS 22(2):179--214
Additional references:
- The Lemur toolkit for language modeling and information retrieval
- Oren Kurland and Lillian Lee. PageRank without hyperlinks: Structural re-ranking using links induced by language models. SIGIR, pp. 306–313, 2005.
- David MacKay and Linda Peto. A hierarchical Dirichlet language model. Natural Language Engineering 1(3):289–307 (1995).
- Thomas Roelleke and Jun Wang. TF-IDF uncovered: a study of theories and probabilities. SIGIR 2008, pp. 435–442. slides
- Chengxiang Zhai. Statistical Language Models for Information Retrieval: A Critical Review. Now Publishers (2008). Also Foundations and Trends in Information Retrieval 2(3):137–213.

9 (Feb 25) About query likelihood; relevance LMs

Lecture guide by Navin Sivakumar, Lakshmi Ganesh, and Taiyang Chen
Lecture references:
- Victor Lavrenko and W. Bruce Croft. Relevance-based language models. SIGIR (2001)

10 (Mar 2) More on language models

Lecture guide by Jerzy Hausknecht and Kent Sutherland
Lecture references:
- Google n-gram corpus. announcement (with stats)
- John Lafferty and ChengXiang Zhai. Document language models, query models, and risk minimization for information retrieval. SIGIR(2001)
- Voltaire. Candide. Google books preview
Additional references:
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2), 1993.
- Thomas Cover and Joy A. Thomas. Elements of information theory. Wiley (1991) book homepage

11 (Mar 4) The Good-Turing estimate

Lecture guide, by Ellis Weng and Andrew Owens
Lecture references:
- I. J. Good. The population frequencies of species and the estimation of population parameters. Biometrika 40: 237–264 (1953). . .
- Frederick Jelink and Robert Mercer. Probability distribution estimation from sparse data. IBM Technical Disclosure Bulletin 28: 2591–2594 (1985).
- Arthur Nádas. On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-33(6):1414–1416, 1985.
Additional references:
- Frederick Jelinek, Chapter 15 “Estimation of probabilities from counts and the back-off method” of Statistical methods for speech recognition, MIT Press (1997). Google books preview
- Alon Orlitsky, Narayana P. Santhanam, and Junan Zhang give Codebreakers: The Inside Story of Bletchley Park by Hinsley and Stripp as a source for asserting that Good and Turing were working on deciphering the Enigma code.

12 (Mar 9) Smoothing; LM evaluation

Handouts: Sample midterm (from 2006). Also available are brief solution notes (see section 2 of document)
Lecture references:
- Frederick Jelinek and Robert L. Mercer. Interpolated estimation of Markov source parameters from sparse data. Workshop on Pattern Recognition in Practice, 1980.
- Slava Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing 35(3):400–401 (1987).
Additional references:
- Timothy Bell, John Cleary and Ian Witten. Text compression (1990)
- Stanley F. Chen and Joshua Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. ACL, 1996. techrpt version
- William Gale and Kenneth W. Church. What's wrong with adding one?. In Corpus-Based Research into Language: In honour of Jan Aarts, 1994.
- Frederick Jelinek. Some of my best friends are linguists. Language Resources and Evaluation 39:25–34, 2005.

13 (Mar. 16) Zipf's law and Miller's monkeys

handout
Lecture references:
- J.B. Estoup, 1916. Gammes Stenographiques. Institut Stenographique de France.
- W. Nelson Francis and Henry Kucera, 1982. Frequency Analysis of English Usage. Houghton Mifflin. Also of potential interest: the Brown corpus manual, 1979; and a search interface.
- Benoit Mandelbrot, 1957. Th�orie math�matique de la d'Estoup-Zipf. Inst. de Statistique de l'Univ�rsit�.
- George A. Miller, 1957. Some effects of intermittent silence. American Journal of Psychology 70, pp. 311-313.
- George Zipf, 1949. Human Behavior and the principle of least effort. Addison-Wesley Press.
- See also section 1.4 of the Manning/Schütze text or chapter 4 of Timothy C. Bell, John G. Cleary, and Ian H. Witten, Text Compression, 1990.
Additional references:
- Zipf's Law (webpage by Wentian Li). Many Zipf's Law refs in many fields.

14 (Mar 30) Relevance feedback

Lecture references:
- Ide, E. 1971. New experiments in relevance feedback. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing, pp. 337–354 (1971). book ACM metadata
- E. Ide and G. Salton. Interactive search strategies and dynamic file organization in information retrieval. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing, pp. 373–393 (1971).
- Stephen Robertson. On term selection for query expansion.Journal of Documentation 46(4):359–364 (1990)
- J. J. Rocchio. Relevance feedback in information retrieval. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing, pp. 313–323 (1971).
- Ian Ruthven and Mounias Lalmas. A survey on the use of relevance feedback for information access systems. Knowledge Engineering Review18(2):95–145 (2003).
- Jaime Teevan, Eytan Adar, Rosie Jones and Michael Potts. Information Re-retrieval: Repeat queries in Yahoo's logs. SIGIR(2007)
- Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Beyond the commons: Investigating the value of personalizing web search. Workshop on New Technologies for Personalized Information Access, 2005.
Additional references:
- Y.K. Chang, C. Cirillo, J. Razon. Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton, ed., The SMART retrieval system — experiments in automatic document processing, pp. 355–370 (1971).
- Ian Ruthven. Re-examining the potential effectiveness of interactive query expansion. SIGIR, pp. 213–220 (2003). .
- Gerard Salton and Chris Buckley. Improving retrieval performance by relevance feedback. JASIS 41(4): 288–297 (1990).

15 (Apr 1) Clickthrough data as implicit relevance feedback

Lecture guide by Navin Sivakumar, Lakshmi Ganesh, and Taiyang Chen
Lecture references:
- Thorsten Joachims. Optimizing search engines using clickthrough data. KDD, 2002
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. Accurately interpreting clickthrough data as implicit feedback. SIGIR, pp. 154–161 (2005).
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. ,Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Transactions on Information Systems (TOIS), 25(2) (2007)
- Diane Kelly and Nicholas J. Belkin. Display time as implicit feedback: Understanding task effects. SIGIR (2004). SIGIR (2004).
- Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum 37(2), pp. 18–28 (2003).
- Masahiro Morita and Yoichi Shinoda. Information filtering based on user behavior analysis and best match text retrieval. SIGIR, pp. 272–281 (1994).
- Filip Radlinski and Thorsten Joachims. Query chains: learning to rank from implicit feedback. KDD, pp. 239–248 (2005)
- Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo Kojo and Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Technical Report, Computer and Information Science, University of Helsinki (2005). The PASCAL 2005 challenge
- Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. SIGIR, pp. 449–456 (2005).
Additional references:
- Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, pp. 115–132 (2000)
- Lori Lorigo, Bing Pan, Helene Hembrooke, Thorsten Joachims, Laura Granka, and Geri Gay. The influence of task and gender on search and evaluation behavior using Google Information Processing and Management 42(4):1123–1131, 2006.
- Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-sensitive information retrieval using implicit feedback. SIGIR, pp. 43–50 (2005).

16 (Apr 13) End relevance feedback; begin syntactic structure

Lecture guide by Ellis Weng and Andrew Owens
Lecture references
- Tobias Wolff. Bullet in the brain. Reprinted in Our Story Begins, Knopf, 2008. Wolff reading from it
Additional references:
- The Gerard Salton award is given every three years as a lifetime achievement award in information retrieval. The lectures of the awardees often address their visions of the future of the field.

14 (Mar 30) Relevance feedback

Lecture references:
- Ide, E. 1971. New experiments in relevance feedback. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing, pp. 337–354 (1971). book ACM metadata
- E. Ide and G. Salton. Interactive search strategies and dynamic file organization in information retrieval. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing, pp. 373–393 (1971).
- Stephen Robertson. On term selection for query expansion.Journal of Documentation 46(4):359–364 (1990)
- J. J. Rocchio. Relevance feedback in information retrieval. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing, pp. 313–323 (1971).
- Ian Ruthven and Mounias Lalmas. A survey on the use of relevance feedback for information access systems. Knowledge Engineering Review18(2):95–145 (2003).
- Jaime Teevan, Eytan Adar, Rosie Jones and Michael Potts. Information Re-retrieval: Repeat queries in Yahoo's logs. SIGIR(2007)
- Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Beyond the commons: Investigating the value of personalizing web search. Workshop on New Technologies for Personalized Information Access, 2005.
Additional references:
- Y.K. Chang, C. Cirillo, J. Razon. Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton, ed., The SMART retrieval system — experiments in automatic document processing, pp. 355–370 (1971).
- Ian Ruthven. Re-examining the potential effectiveness of interactive query expansion. SIGIR, pp. 213–220 (2003). .
- Gerard Salton and Chris Buckley. Improving retrieval performance by relevance feedback. JASIS 41(4): 288–297 (1990).

15 (Apr 1) Clickthrough data as implicit relevance feedback

Lecture guide by Navin Sivakumar, Lakshmi Ganesh, and Taiyang Chen
Lecture references:
- Thorsten Joachims. Optimizing search engines using clickthrough data. KDD, 2002
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. Accurately interpreting clickthrough data as implicit feedback. SIGIR, pp. 154–161 (2005).
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. ,Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Transactions on Information Systems (TOIS), 25(2) (2007)
- Diane Kelly and Nicholas J. Belkin. Display time as implicit feedback: Understanding task effects. SIGIR (2004). SIGIR (2004).
- Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum 37(2), pp. 18–28 (2003).
- Masahiro Morita and Yoichi Shinoda. Information filtering based on user behavior analysis and best match text retrieval. SIGIR, pp. 272–281 (1994).
- Filip Radlinski and Thorsten Joachims. Query chains: learning to rank from implicit feedback. KDD, pp. 239–248 (2005)
- Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo Kojo and Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Technical Report, Computer and Information Science, University of Helsinki (2005). The PASCAL 2005 challenge
- Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. SIGIR, pp. 449–456 (2005).
Additional references:
- Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, pp. 115–132 (2000)
- Lori Lorigo, Bing Pan, Helene Hembrooke, Thorsten Joachims, Laura Granka, and Geri Gay. The influence of task and gender on search and evaluation behavior using Google Information Processing and Management 42(4):1123–1131, 2006.
- Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-sensitive information retrieval using implicit feedback. SIGIR, pp. 43–50 (2005).

16 (Apr 13) End relevance feedback; begin syntactic structure

Lecture guide by Ellis Weng and Andrew Owens
Lecture references
- Tobias Wolff. Bullet in the brain. Reprinted in Our Story Begins, Knopf, 2008. Wolff reading from it
Additional references:
- The Gerard Salton award is given every three years as a lifetime achievement award in information retrieval. The lectures of the awardees often address their visions of the future of the field.

16 (Apr 13) End relevance feedback; begin syntactic structure

Lecture guide by Ellis Weng and Andrew Owens
Lecture references
- Tobias Wolff. Bullet in the brain. Reprinted in Our Story Begins, Knopf, 2008. Wolff reading from it
Additional references:
- The Gerard Salton award is given every three years as a lifetime achievement award in information retrieval. The lectures of the awardees often address their visions of the future of the field.

17 (Apr 15) Feature-based CFGs with unification constraints

Lecture guide by Jerzy Hausknecht and Kent Sutherland
Handouts: sample CFG
Additional references:
- Chapter 12.1 (“Constituency”)-12.3 (“Some grammar rules for English”) of the Jurafsky and Martin text (2nd edition)
- Chapter 15.1 (“Feature structures”), 15.2 (“Unification of feature structures”), 15.3 (“Feature structures in the grammar”)
- Chapter 9 (“Building feature based grammars”) of Natural Language Processing with Python --- Analyzing Text with the Natural Language Toolkit, Steven Bird, Ewan Klein, and Edward Loper. O'Reilly, 2009.
- Michael Collins. Three generative, lexicalised models for statistical parsing. ACL, 1997.
- Kevin Knight. Unification: A multidisciplinary survey. ACM Computing Surveys 21(1), 1989.

18 (Apr 20). Feature-based CFGs; TAGs

Handouts: feature-based CFG example rules
Lecture references:
- Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct grammar. Journal of Computer and System Sciences 10(1):136–163 (1975).
Additional references:
- James Allen. Section 5.3: Handling questions in context-free grammars. Natural Language Understanding, second edition. Benjamin/Cummings (1995).
- Anthony Kroch and Aravind Joshi. The linguistic relevance of tree adjoining grammar. UPenn Technical Report MS-CIS-85-16
- Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3 (Beyond words), pp. 69–123 (1997). [pdf]
- Fernando C. N. Pereira and Stuart M. Shieber. Sections 4.2.3-4.2.7 of Prolog and Natural-language Analysis. CLSI (1987)
- Ivan Sag, Thomas Wasow, and Emily Bender. Syntactic Theory: A Formal Introduction (2nd ed). CLSI 2003.
- K. Vijay-Shankar and Aravind K. Joshi. Some computational properties of tree adjoining grammars. ACL, pp. 82–93 (1985).
- K. Vijayashanker. A study of tree adjoining grammars. PhD thesis, University of Pennsylvania, 1988.
- The XTAG project. project home page.

19 (Apr 22) More on TAGs

Handouts: Sample tree substitution grammar
Additional references:
- Anoop Sarkar, Practical experiments in parsing using tree adjoining grammars. TAG+5, 2000.
- Giorgio Satta. Tree adjoining grammar parsing and Boolean matrix multiplication. Computational Linguistics 20(20:173–191, 1994.
- Andreas Stolcke. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics 21(2):165–201.

20 (April 27) Feature-based TAGs

Lecture references:
- K. Vijay-Shanker and Aravind K. Joshi. Feature structures based tree adjoining grammars. COLING, pp. 714–719 (1988)
Additional references:
- Anne Abeillé and Yves Schabes. Parsing idioms in lexicalized TAGs. EACL, pp. 1–9 (1989).
- Aravind Joshi, K. Vijay-Shanker, and David Weir. The Convergence of Mildly Context-Sensitive Grammar Formalisms. In Peter Sells, Stuart Shieber and Thomas Wasow, Eds., Foundational Issues in Natural Language Processing, 1991. tech report version
- Yves Schabes and Richard C. Waters. Tree Insertion Grammar: A cubic-time, parsable formalism that lexicalizes context-free grammar without changing the trees produced. Computational Linguistics 21(4), 1995.

16 (Apr 13) End relevance feedback; begin syntactic structure

Lecture guide by Ellis Weng and Andrew Owens
Lecture references
- Tobias Wolff. Bullet in the brain. Reprinted in Our Story Begins, Knopf, 2008. Wolff reading from it
Additional references:
- The Gerard Salton award is given every three years as a lifetime achievement award in information retrieval. The lectures of the awardees often address their visions of the future of the field.

17 (Apr 15) Feature-based CFGs with unification constraints

Lecture guide by Jerzy Hausknecht and Kent Sutherland
Handouts: sample CFG
Additional references:
- Chapter 12.1 (“Constituency”)-12.3 (“Some grammar rules for English”) of the Jurafsky and Martin text (2nd edition)
- Chapter 15.1 (“Feature structures”), 15.2 (“Unification of feature structures”), 15.3 (“Feature structures in the grammar”)
- Chapter 9 (“Building feature based grammars”) of Natural Language Processing with Python --- Analyzing Text with the Natural Language Toolkit, Steven Bird, Ewan Klein, and Edward Loper. O'Reilly, 2009.
- Michael Collins. Three generative, lexicalised models for statistical parsing. ACL, 1997.
- Kevin Knight. Unification: A multidisciplinary survey. ACM Computing Surveys 21(1), 1989.

18 (Apr 20). Feature-based CFGs; TAGs

Handouts: feature-based CFG example rules
Lecture references:
- Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct grammar. Journal of Computer and System Sciences 10(1):136–163 (1975).
Additional references:
- James Allen. Section 5.3: Handling questions in context-free grammars. Natural Language Understanding, second edition. Benjamin/Cummings (1995).
- Anthony Kroch and Aravind Joshi. The linguistic relevance of tree adjoining grammar. UPenn Technical Report MS-CIS-85-16
- Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3 (Beyond words), pp. 69–123 (1997). [pdf]
- Fernando C. N. Pereira and Stuart M. Shieber. Sections 4.2.3-4.2.7 of Prolog and Natural-language Analysis. CLSI (1987)
- Ivan Sag, Thomas Wasow, and Emily Bender. Syntactic Theory: A Formal Introduction (2nd ed). CLSI 2003.
- K. Vijay-Shankar and Aravind K. Joshi. Some computational properties of tree adjoining grammars. ACL, pp. 82–93 (1985).
- K. Vijayashanker. A study of tree adjoining grammars. PhD thesis, University of Pennsylvania, 1988.
- The XTAG project. project home page.

19 (Apr 22) More on TAGs

Handouts: Sample tree substitution grammar
Additional references:
- Anoop Sarkar, Practical experiments in parsing using tree adjoining grammars. TAG+5, 2000.
- Giorgio Satta. Tree adjoining grammar parsing and Boolean matrix multiplication. Computational Linguistics 20(20:173–191, 1994.
- Andreas Stolcke. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics 21(2):165–201.

20 (April 27) Feature-based TAGs

Lecture references:
- K. Vijay-Shanker and Aravind K. Joshi. Feature structures based tree adjoining grammars. COLING, pp. 714–719 (1988)
Additional references:
- Anne Abeillé and Yves Schabes. Parsing idioms in lexicalized TAGs. EACL, pp. 1–9 (1989).
- Aravind Joshi, K. Vijay-Shanker, and David Weir. The Convergence of Mildly Context-Sensitive Grammar Formalisms. In Peter Sells, Stuart Shieber and Thomas Wasow, Eds., Foundational Issues in Natural Language Processing, 1991. tech report version
- Yves Schabes and Richard C. Waters. Tree Insertion Grammar: A cubic-time, parsable formalism that lexicalizes context-free grammar without changing the trees produced. Computational Linguistics 21(4), 1995.

21 (Apr 29) PCFGs and EM

Lecture references:
- Zhiyi Chi and Stuart Geman. Estimation of probabilistic context-free grammars. Computational Linguistics 24(2):298–305 (1998).
- Frederick Jelinek. Statistical Methods for Speech Recognition, MIT Press, 1997.
- Detlef Prescher. A tutorial on the Expectation-Maximization algorithm including maximum-likelihood estimation and EM training of probabilistic context-free grammars. Presented at the 15th European Summer School in Logic, Language and Information (ESSLLI-03).
Additional references:
- Adam Berger. Convexity, maximum likelihood and all that. Manuscript.
- Michael Collins. The EM algorithm (1997)
- Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm and extensions. Second edition. Wiley (2006). errata for first printing
- Ted Pedersen. The EM algorithm: selected readings Manuscript, 2001.
- Stefan Riezler. Getting EM to work. talk slides

22. Finish EM, start discourse

Handouts: discourse examples
Lecture references:
- James Baker. Trainable grammars for speech recognition. Journal of the Acoustical Society of America 65(S1):S132–132, 1979.
- Jerry R. Hobbs. Resolving Pronoun References, Lingua 44:311–338, 1978. Reprinted in Grosz, Sparck Jones, and Webber, Readings in Natural Language Processing.
- Dempster, Laird, and Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39(1):1–38 (1977)
- Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language 4(1):35–56, 1990.
- William C. Mann and Sandra A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text 8(3): 243–281, 1988.
- Fernando Pereira and Yves Schabes. Inside-outside reestimation from partially bracketed corpora. ACL, pp. 128–135, 1992.
- Remko Scha and Livia Polanyi. An Augmented Context Free Grammar for Discourse. COLING, pp. 573–577, 1988.

21 (Apr 29) PCFGs and EM

Lecture references:
- Zhiyi Chi and Stuart Geman. Estimation of probabilistic context-free grammars. Computational Linguistics 24(2):298–305 (1998).
- Frederick Jelinek. Statistical Methods for Speech Recognition, MIT Press, 1997.
- Detlef Prescher. A tutorial on the Expectation-Maximization algorithm including maximum-likelihood estimation and EM training of probabilistic context-free grammars. Presented at the 15th European Summer School in Logic, Language and Information (ESSLLI-03).
Additional references:
- Adam Berger. Convexity, maximum likelihood and all that. Manuscript.
- Michael Collins. The EM algorithm (1997)
- Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm and extensions. Second edition. Wiley (2006). errata for first printing
- Ted Pedersen. The EM algorithm: selected readings Manuscript, 2001.
- Stefan Riezler. Getting EM to work. talk slides

22. Finish EM, start discourse

Handouts: discourse examples
Lecture references:
- James Baker. Trainable grammars for speech recognition. Journal of the Acoustical Society of America 65(S1):S132–132, 1979.
- Jerry R. Hobbs. Resolving Pronoun References, Lingua 44:311–338, 1978. Reprinted in Grosz, Sparck Jones, and Webber, Readings in Natural Language Processing.
- Dempster, Laird, and Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39(1):1–38 (1977)
- Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language 4(1):35–56, 1990.
- William C. Mann and Sandra A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text 8(3): 243–281, 1988.
- Fernando Pereira and Yves Schabes. Inside-outside reestimation from partially bracketed corpora. ACL, pp. 128–135, 1992.
- Remko Scha and Livia Polanyi. An Augmented Context Free Grammar for Discourse. COLING, pp. 573–577, 1988.

22. Finish EM, start discourse

Handouts: discourse examples
Lecture references:
- James Baker. Trainable grammars for speech recognition. Journal of the Acoustical Society of America 65(S1):S132–132, 1979.
- Jerry R. Hobbs. Resolving Pronoun References, Lingua 44:311–338, 1978. Reprinted in Grosz, Sparck Jones, and Webber, Readings in Natural Language Processing.
- Dempster, Laird, and Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39(1):1–38 (1977)
- Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language 4(1):35–56, 1990.
- William C. Mann and Sandra A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text 8(3): 243–281, 1988.
- Fernando Pereira and Yves Schabes. Inside-outside reestimation from partially bracketed corpora. ACL, pp. 128–135, 1992.
- Remko Scha and Livia Polanyi. An Augmented Context Free Grammar for Discourse. COLING, pp. 573–577, 1988.

23 (May 6) Local and global theories of discourse

Handouts: final exam guide [and here are solutions], discourse examples
Lecture references:
- Susan E. Brennan, Marilyn W. Friedman, Carl J. Pollard. A centering approach to pronouns. ACL, 155–162, 1987.
- Barbara Grosz, Aravind Joshi, and Scott Weinstein. Providing a unified account of definite noun phrases in discourse. ACL, 44–50, 1983. journal version: “Centering: A framework for modeling the local coherence of discourse”, 1995
- Barbara J. Grosz and Candace L. Sidner, 1986.Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12(3):175–204.
Additional references:
- Marilyn Walker. Limited attention and discourse structure. Computational Linguistics 22(2):255–264, 1996.

22. Finish EM, start discourse

Handouts: discourse examples
Lecture references:
- James Baker. Trainable grammars for speech recognition. Journal of the Acoustical Society of America 65(S1):S132–132, 1979.
- Jerry R. Hobbs. Resolving Pronoun References, Lingua 44:311–338, 1978. Reprinted in Grosz, Sparck Jones, and Webber, Readings in Natural Language Processing.
- Dempster, Laird, and Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39(1):1–38 (1977)
- Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language 4(1):35–56, 1990.
- William C. Mann and Sandra A. Thompson. Rhetorical structure theory: Toward a functional theory of text organization. Text 8(3): 243–281, 1988.
- Fernando Pereira and Yves Schabes. Inside-outside reestimation from partially bracketed corpora. ACL, pp. 128–135, 1992.
- Remko Scha and Livia Polanyi. An Augmented Context Free Grammar for Discourse. COLING, pp. 573–577, 1988.

23 (May 6) Local and global theories of discourse

Handouts: final exam guide [and here are solutions], discourse examples
Lecture references:
- Susan E. Brennan, Marilyn W. Friedman, Carl J. Pollard. A centering approach to pronouns. ACL, 155–162, 1987.
- Barbara Grosz, Aravind Joshi, and Scott Weinstein. Providing a unified account of definite noun phrases in discourse. ACL, 44–50, 1983. journal version: “Centering: A framework for modeling the local coherence of discourse”, 1995
- Barbara J. Grosz and Candace L. Sidner, 1986.Attention, Intentions, and the Structure of Discourse. Computational Linguistics 12(3):175–204.
Additional references:
- Marilyn Walker. Limited attention and discourse structure. Computational Linguistics 22(2):255–264, 1996.