CS 674/INFO 630, Fall 2007

CS 674/INFO 630, Fall 2007: Advanced Language Technologies (or: IR, NLP, and special guests)

Prof. Lillian Lee (follow link for contact information and office hours)
TR 1:25-2:40, Hollister 312.
Final exam: Thursday, December 13, 2-4:30pm, Thurston 202.

Short description: This course is a graduate-level introduction to research fundamentals for information retrieval and natural language processing. Please see the Course description and policies handout for more information.

Tentative syllabus: Three fundamental paradigms in information retrieval: the vector-space model; the (Robertson-Spärck Jones) probabilistic retrieval paradigm; the language-modeling approach. Relevance feedback, explicit and implicit. Latent Semantic Indexing (LSI). Feature-based context-free grammars (CFGs). Tree adjoining grammars (TAGs). Parsing. The Expectation-Maximization (EM) algorithm. Maximum-entropy modeling.

Administrative info:

Resources: (reference texts, other lecture notes and slides, etc.)

The homepage for the previous running of this course may also be useful.

Lectures:


Quick links: starts of the units on: the vector space model (8/28/07); RSJ probabilistic retrieval (9/11/07); language-modeling approaches to IR (9/20/07); relevance feedback (10/2/07); implicit relevance feedback (10/11/07); the SVD and LSI (10/30/07); CFGs (11/8/07); TAGs (11/15/07); EM (11/27/07)


  1. 8/23/07: "Sense and Sensibility: Automatically Analyzing Subject and Sentiment in Human-Authored Texts"
    (a prefatory lecture; students are not responsible for the material that was covered)
    • Handouts: course description and policies [pdf]
    • Main lecture references:
      * Rie Kubota Ando and Lillian Lee. Iterative residual rescaling: An analysis and generalization of LSI. SIGIR, pp. 154–162, 2001. [pdf]
      * Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. ACL, pp. 271–278, 2004. [pdf]
    • Additional material and references:
      * slides for an hour-long talk on IRR: pdf (Part One), pdf (Part Two)
      * Rie Kubota Ando. The Document Representation Problem: An Analysis of LSI and Iterative Residual Rescaling. Ph.D. thesis, 2001. [pdf]
      * Cornell movie-review datasets and congressional-debate datasets for performing sentiment-analysis experiments on.

  1. 8/28/07: Basics of information retrieval; the vector-space model
    • Lecture guide
    • Handouts: Example problem for lecture-guide preparation [pdf]
    • Lecture references:
      * TREC conference proceedings. [website]. See especially the overviews.
    • Additional references:
      * David Dubin. The most influential paper Gerard Salton never wrote. Library Trends, Spring 2004. [html] An interesting sidelight on the origins of the vector space model, the assumptions underlying the VSM, and phantom citations.
      * Hui Fang, Tao Tao, and ChengXiang Zhai. A formal study of information retrieval heuristics. SIGIR, pp. 49–56, 2004. Best paper award. [pdf,pdf2]
      * Donna Harman. The history of IDF and its influences on IR and other fields. In Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones, pp. 69–79, 2005.
      * Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin 24(4), pp. 35–43, 2001. [pdf]
      * Karen Spärck Jones. IDF term weighting and IR research lessons. Journal of Documentation 60(5):521–523 (2004). [pdf]
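    • Illustrative sketch: to make the vector-space model concrete, here is a minimal Python implementation (not course code) of tf-idf weighting with cosine-similarity scoring, using one common instantiation (raw-count tf, idf = log(N/df)) of the many weighting schemes the lecture surveys:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document (a list of tokens) to a sparse tf-idf vector.

    Uses raw term counts for tf and idf = log(N / df); both choices are
    illustrative, not the only ones covered in lecture."""
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(N / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Ranking a query then amounts to treating it as a short document and sorting the corpus by cosine score against it.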

  1. 8/30/07: length normalization (who'da thunk?)
    • Lecture guide
    • Lecture references:
      * Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando Pereira. AT&T at TREC-7. [pdf, pdf.gz]

  1. 9/4/07: pivoted document-length normalization
    • Lecture guide
    • Lecture references:
      * Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document length normalization. SIGIR 1996. [pdf, pdf2]
    • Additional references:
      * Tze Leung Chung, Robert Wing Pong Luk, Kam Fai Wong, Kui Lam Kwok, and Dik Lun Lee. Adapting pivoted document-length normalization for query size: Experiments in Chinese and English. ACM Transactions on Asian Language Information Processing (TALIP) 5(3):245–263 (2006). [pdf]
      * Jaap Kamps, Maarten de Rijke, and Börkur Sigurbjörnsson. Length normalization in XML retrieval. SIGIR 2004. [abstract,pdf,pdf of 2005 version in Information Retrieval]
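    • Illustrative sketch: a minimal Python rendering (not course code) of the pivoted normalization factor; a document's term weights are divided by this factor rather than by the raw length-based normalizer. The slope is a tunable constant; 0.2 is a commonly cited setting, not necessarily the course's:

```python
def pivoted_norm_factor(doc_len, avg_len, slope=0.2):
    """Pivoted length-normalization factor in the style of Singhal,
    Buckley & Mitra (SIGIR 1996): the old normalizer doc_len / avg_len
    is "tilted" around the pivot (here, the average length), so long
    documents are penalized less and short ones slightly more."""
    return (1.0 - slope) + slope * (doc_len / avg_len)
```

At the pivot the factor is exactly 1; above it, the factor grows more slowly than the raw length ratio would.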

  1. 9/6/07: remarks on evaluation (of SBM SIGIR '96, and beyond)
    • Lecture references:
      * Chris Buckley and Ellen M. Voorhees. Retrieval evaluation with incomplete information. SIGIR, pp. 25–32, 2004. [pdf,pdf2]
      * Stefan Büttcher, Charles L. A. Clarke, Peter C. K. Yeung, and Ian Soboroff. Reliable Information Retrieval Evaluation with Incomplete and Biased Judgements. SIGIR, pp. 63–70, 2007 [pdf, pdf2]
      * Abdur Chowdury, M. Catherine McCabe, David Grossman, Ophir Frieder. Document normalization revisited. SIGIR poster (2002). [pdf,pdf2]
      * Tetsuya Sakai. Alternatives to Bpref. SIGIR, pp. 71–78, 2007. [pdf,pdf of talk slides]
      * Section 3.3.2 of Ellen Voorhees and Donna Harman. Overview of the Eighth Text REtrieval Conference (TREC-8) [good ps, poor pdf].
      * Justin Zobel. How reliable are large-scale information retrieval experiments? SIGIR 1998. [pdf,pdf2]
    • Additional references:
      * Ben He, Iadh Ounis. On setting the hyper-parameters of term frequency normalization for information retrieval. TOIS 25(3), 2007. [pdf] For a different retrieval paradigm, but still suggestive in the context of this lecture.
      * Mark Sanderson and Hideo Joho. Forming test collections with no system pooling. SIGIR 2004. [pdf]
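    • Illustrative sketch: (non-interpolated) average precision, the per-query quantity underlying the MAP numbers in TREC-style evaluations like those discussed above, in Python (not course code):

```python
def average_precision(ranked, relevant):
    """Non-interpolated average precision of one ranked list: the mean of
    the precision values at each rank where a relevant document appears,
    divided by the total number of relevant documents (so documents the
    system never retrieves still count against it)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Bpref, discussed in the Buckley–Voorhees and Sakai papers above, modifies this idea to cope with unjudged documents.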

  1. 9/11/07: Introduction to probabilistic retrieval (Robertson-Spärck Jones version)
    • Lecture guide
    • Lecture references:
      * Karen Spärck Jones, Steve Walker, and Stephen Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, pp. 779–808, 809–840 (2000). [html of preface, pdf of Part 1, pdf of Part 2]
      * William S. Cooper. Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval. ACM Transactions on Information Systems (TOIS), pp. 100–111, 1995. [pdf]
      * Michael D. Gordon and Peter Lenk. When is the probability ranking principle suboptimal? Journal of the American Society for Information Science 43:1–14 (1992). [pdf]
      * Stephen Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science 27(3):129–146 (1976). [pdf]
      * C. J. “Keith” van Rijsbergen. The emergence of probabilistic accounts of information retrieval. Pp. 23–38 in John I. Tait, ed., Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones (2005)
    • Additional references:
      * Fabio Crestani, Mounia Lalmas, C. J. Van Rijsbergen, and Iain Campbell. "Is this document relevant? ...probably": A survey of probabilistic models in information retrieval. ACM Computing Surveys 30(4) (1998). [pdf,pdf2]
      * Norbert Fuhr. Probabilistic models in information retrieval. The Computer Journal (1992) [pdf]
      * Bojidar Mateev, Elke Mittendorf and Peter Schäuble. Where the Linked Dependence Assumption Fails and How to Move Beyond It. 19th Annual BCS-IRSG Colloquium on IR. [pdf]
      * M. E. Maron. Probabilistic approaches to the document retrieval problem. SIGIR, pp. 98–107 (1982). [pdf] Discusses models for interpreting the probability of relevance, and also gives a personal historical perspective on information science (as Maron construes it).
      * Stephen Robertson. The probability ranking principle in IR. Journal of Documentation 33 (1977). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997). The appendix gives an example where ranking by probability of relevance is suboptimal.

  1. 9/13/07: Probabilistic retrieval with binary attribute variables: derivations of the IDF
    • Lecture guide
    • Lecture references:
      * W. Bruce Croft and D. Harper. Using probabilistic models of information retrieval without relevance information. Journal of Documentation 35: 285–295 (1979). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
      * L. Lee. IDF revisited: A simple new derivation within the Robertson-Spärck Jones probabilistic model. SIGIR pp. 751–752, 2007. [pdf,pdf2]
      * Stephen E. Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science (JASIS) 27(3):129–146, 1976. Reprinted in Peter Willett (ed.), Document Retrieval Systems, pp. 143–160, 1988. [pdf].
      * Stephen E. Robertson and Steve Walker. On relevance weights with little relevance information. SIGIR, pp. 16–24 (1997) [pdf].
    • Additional references:
      * Kenneth W. Church and William A. Gale. Inverse document frequency (IDF): A measure of deviations from Poisson. Third Workshop on Very Large Corpora (WVLC), pp. 121–130, 1995. [pdf, ps]
      * Kishore Papineni. Why inverse document frequency? NAACL (2001). [pdf]
      * Stephen Robertson. Understanding inverse document frequency: On theoretical arguments for IDF. Journal of Documentation 60(5):503–520 (2004). [pdf]
      * Arjen P. de Vries and Thomas Roelleke. Relevance Information: A Loss of Entropy but a Gain for IDF? SIGIR (2005). [pdf,pdf2]
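    • Illustrative sketch: the RSJ relevance weight with the usual add-0.5 smoothing, in Python (not course code); with no relevance information it collapses to an IDF-like quantity, which is the connection this lecture develops:

```python
import math

def rsj_weight(N, n, R=0, r=0):
    """Robertson-Spärck Jones relevance weight with add-0.5 smoothing.

    N: collection size; n: documents containing the term;
    R: known relevant documents; r: relevant documents containing the
    term.  With no relevance information (R = r = 0), this reduces to
    log((N - n + 0.5) / (n + 0.5)), an IDF-like quantity."""
    return math.log((r + 0.5) * (N - n - R + r + 0.5) /
                    ((n - r + 0.5) * (R - r + 0.5)))
```

As with IDF, rarer terms (smaller n) receive larger weights.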

  1. 9/18/07: Probabilistic retrieval with attribute-count variables: Poisson-based models
    • Lecture guide
    • Lecture references:
      * Abraham Bookstein and D. R. Swanson. Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5):312–318 (1974). [pdf]
      * Stephen P. Harter. A probabilistic approach to automatic keyword indexing, part I: On the distribution of specialty words in a technical literature. Journal of the American Society for Information Science 26(4):197–206, 1975. [pdf]
    • Additional references:
      * Eugene L. Margulis. N-Poisson document modelling. SIGIR, pp. 177–189 (1992). [pdf]

  1. 9/20/07: Derivation of BM/Okapi term weighting. Intro to the language-modeling paradigm for IR
    • Lecture guides: part one, part two
    • Lecture references:
      * John Lafferty and Chengxiang Zhai. Probabilistic relevance models based on document and query generation. In Croft and Lafferty, eds., Language Modeling and Information Retrieval (2003). [pdf]
      * Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. SIGIR (1998). [pdf]
      * Stephen E. Robertson and Steve Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR, pp. 232–241 (1994) [pdf, pdf2]. Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
    • Additional references:
      * S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. Proceedings of TREC-3, pp. 109–126, 1995. [pdf,ps.gz]
      * S. E. Robertson, S. Walker, and M. Beaulieu. Okapi at TREC-7: Automatic ad hoc, filtering, VLC and interactive. Proceedings of TREC-7, pp. 253–264, 1999. [ps.gz (might be better),pdf.gz,pdf2]
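    • Illustrative sketch: one term's contribution to the Okapi BM25 score, in Python (not course code); k1 = 1.2 and b = 0.75 are the conventional defaults from the literature, not values fixed in lecture:

```python
import math

def bm25_term_weight(tf, df, N, doc_len, avg_len, k1=1.2, b=0.75):
    """One term's contribution to an Okapi BM25 document score.

    tf: term count in the document; df: document frequency; N: collection
    size.  The tf factor saturates (it approaches k1 + 1 as tf grows),
    and b controls how strongly document length is normalized."""
    idf = math.log((N - df + 0.5) / (df + 0.5))
    K = k1 * ((1.0 - b) + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / (tf + K)
```

The saturation in tf is the practical payoff of the 2-Poisson derivation covered in this unit.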

  1. 9/25/07: More on the LM approach

  1. 9/27/07: Completion of the LM approach
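    • Illustrative sketch: query-likelihood scoring with Dirichlet-prior smoothing, in Python (not course code); mu = 2000 is a commonly used default in the LM-for-IR literature, not a course-specified value:

```python
import math
from collections import Counter

def query_loglik(query, doc, collection, mu=2000.0):
    """Query-likelihood score log p(q|d) under a unigram document model
    smoothed with a Dirichlet prior: each term's probability mixes its
    document count with its collection-wide frequency."""
    dlen, clen = len(doc), len(collection)
    dcnt, ccnt = Counter(doc), Counter(collection)
    score = 0.0
    for t in query:
        p = (dcnt[t] + mu * ccnt[t] / clen) / (dlen + mu)
        if p == 0.0:           # term absent from the whole collection
            return float("-inf")
        score += math.log(p)
    return score
```

Documents are ranked by this score; smoothing is what keeps a single unseen query term from zeroing out an otherwise good match.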

  1. 10/2/07: Introduction to relevance feedback

  1. 10/4/07: Automatic query expansion (AQE) and interactive query expansion (IQE)

  1. 10/11/07: Implicit feedback. Further notes on language models (in preparation for combining them with implicit feedback).

  1. 10/16/07: More on language models, in preparation for combining them with relevance feedback


  1. 10/23/07: An LM-based approach to implicit feedback
    • Lecture guide
    • Lecture references:
      * Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-sensitive information retrieval using implicit feedback. SIGIR, pp. 43–50 (2005). [pdf,pdf2]
      * John Lafferty and ChengXiang Zhai. Document language models, query models, and risk minimization for information retrieval. SIGIR (2001) [ps,pdf;long version]

  1. 10/25/07: Clickthrough data as implicit feedback
    • Lecture guide
    • Lecture references:
      * Filip Radlinski and Thorsten Joachims. Query chains: Learning to rank from implicit feedback. KDD, pp. 239–248 (2005). [pdf,pdf2] Co-winner of the best student paper award.
      * Thorsten Joachims. Optimizing search engines using clickthrough data. KDD, 2002 [pdf,pdf2]
      * Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. Accurately interpreting clickthrough data as implicit feedback. SIGIR, pp. 154–161 (2005). [pdf,pdf2]
      * Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Transactions on Information Systems (TOIS), 25(2) (2007) [pdf,pdf2]
    • Additional references:
      * Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, pp. 115–132 (2000) [ps.gz]
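    • Illustrative sketch: Joachims-style pairwise preferences extracted from clicks (a clicked result is preferred to each unclicked result ranked above it), in Python (not course code); pairs like these are the training input to a ranking learner such as the ranking SVM:

```python
def click_skip_above(ranking, clicks):
    """Extract (preferred, over) document pairs from one results page,
    following the "click > skip above" heuristic: a clicked document is
    preferred to every unclicked document the user scanned past above it.

    ranking: list of doc ids in rank order; clicks: set of clicked ids."""
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicks:
            prefs.extend((doc, ranking[j]) for j in range(i)
                         if ranking[j] not in clicks)
    return prefs
```

Note the heuristic yields only relative judgments; the eye-tracking studies above examine how trustworthy even those are.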

  1. 10/30/07: The term-document matrix, revisited (preparation for the SVD)
    • Lecture guide
    • Lecture references:
      * Lloyd N. (Nick) Trefethen and David Bau III. Numerical Linear Algebra (1997). [book homepage] See lectures one, four, and five (all from Part I, "Fundamentals").

  1. 11/1/07: The singular value decomposition and “approximating ± convex hulls”
    • Additional references:
      * Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition (1996). [book homepage]. The "Bible" on, well, matrix computations. Extensive coverage of the singular value decomposition (SVD), although certainly not as gentle an introduction as in class.

  1. 11/6/07: "Few-factor" representations (LSI, pLSI, and others)
    • Lecture guide
    • Lecture references:
      * David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. NIPS 16 (2003). [pdf]
      * Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science 41(6):391–407 (1990). [pdf,pdf2]
      * Thomas Hofmann. Probabilistic latent semantic indexing. SIGIR, pp. 50–57 (1999). [pdf,pdf2]
    • Additional references:
      * Rie Kubota Ando and Lillian Lee. Iterative Residual Rescaling: An analysis and generalization of LSI. SIGIR, pp. 154–162 (2001) [pdf,pdf2]
      * Yossi Azar, Amos Fiat, Anna Karlin, Frank McSherry, and Jared Saia. Spectral analysis of data. Symposium on Theory of Computing (STOC), pp. 619–626 (2001) [pdf,(longer?) ps]
      * Thomas Hofmann and Jan Puzicha. Statistical models for co-occurrence data. AI Memo 1625, MIT (1998). [pdf]
      * Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review 104(2):211–240 (1997). [pdf,html,html2]
      * Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791 (1999) [pdf]
      * Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. Latent semantic indexing: A probabilistic analysis. Symposium on Principles of Database Systems, pp. 159–168 (1998) [pdf]. Journal version: Journal of Computer and System Sciences 61(2):217–235 (2000). [ps]
      * Fernando Pereira, Naftali Tishby, and Lillian Lee. Distributional clustering of English words. Proceedings of the ACL (1993). [pdf]
      * Donald E. Powers, Jill C. Burstein, Martin Chodorow, Mary E. Fowles, and Karen Kukich. Stumping e-rater: Challenging the validity of automated essay scoring (2001). [pdf] An interesting study in which knowledgeable humans were asked to produce essays that would be incorrectly scored by automatic methods.
      * Michael Steele and David Aldous. Introduction to the interface of probability and algorithms. Probability and Algorithms: A Panel Report. National Academy of Sciences (1993). [pdf] One subsection gives a quick calculation for the expected number of occupied tables in the Chinese restaurant process.
      * Demo: score your own essay (on psycholinguistics) with LSI. [home page]
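    • Illustrative sketch: a pure-Python power iteration (not course code) that recovers the top right-singular vector of a small term-document matrix, i.e. the first "latent semantic" direction that a rank-1 LSI truncation keeps; a real implementation would of course call a library SVD routine:

```python
def top_singular_direction(A, iters=200):
    """Power iteration on A^T A for the top right-singular vector of the
    term-document matrix A (rows = terms, columns = documents).  Returns
    a unit vector over documents; repeated deflation would yield further
    LSI dimensions."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        # w = A v, then v <- A^T w, renormalized
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v
```

On a matrix with two documents sharing a term pattern and one unrelated document, the top direction weights the related pair equally and nearly ignores the outlier.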

  1. 11/8/07: Modeling syntactic structure: context-free grammars
    • Lecture guide
    • Lecture references:
      * Noam Chomsky. Three models for the description of language. I.R.E. Transactions on Information Theory 2, pp. 113–124 (1956).
      * Igor Mel'čuk. Studies in Dependency Syntax, Karoma (1979).
      * Lucien Tesnière. Éléments de syntaxe structurale, Klincksieck (1959).
    • Additional references:
      * James Allen. Section 5.2: Movement phenomena in language. Natural language understanding, second edition. Benjamin/Cummings (1995).
      * Christopher Culy. The complexity of the vocabulary of Bambara. Linguistics and Philosophy 8:345–351 (1985). [Springer link]
      * Daniel Jurafsky and James H. Martin. Chapter 9: Context-free grammars for English. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall (2000). [homepage] In the second edition, this is chapter 12 (draft form).
      * Lillian Lee. “I'm sorry Dave, I'm afraid I can't do that”: Linguistics, Statistics, and Natural Language Processing circa 2001. In Computer Science: Reflections on the Field, Reflections from the Field, National Academies Press, pp. 111–118, 2004. [pdf, html]
      * Geoffrey Pullum. Footloose and context-free. Natural Language and Linguistic Theory 4, pp. 283–289 (1986). Reprinted in The Great Eskimo Vocabulary Hoax, U. of Chicago Press (1991). A very fun read.
      * Stuart Shieber. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8:333–343 (1985). [pdf,Springer link]
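    • Illustrative sketch: CKY recognition for a context-free grammar in Chomsky normal form, in Python (not course code); the toy grammar and sentence below are invented for the example:

```python
def cky_recognize(words, grammar, start="S"):
    """CKY recognition for a CFG in Chomsky normal form.

    grammar maps a right-hand side -- a 1-tuple terminal or a 2-tuple of
    nonterminals -- to the set of left-hand sides producing it.
    chart[i][j] holds the nonterminals deriving words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(grammar.get((w,), set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= grammar.get((B, C), set())
    return start in chart[0][n]
```

Keeping backpointers instead of bare sets turns this recognizer into a parser; probabilistic variants replace set union with max-probability updates.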

  1. 11/13/07: Augmented context-free grammars
    • Lecture guide
    • Lecture references:
      * James Allen. Section 5.3: Handling questions in context-free grammars. Natural Language Understanding, second edition. Benjamin/Cummings (1995).
    • Additional references:
      * Daniel Jurafsky and James H. Martin. Chapter 11: Features and unification. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall (2000). [homepage] In the second edition, this is chapter 16 (draft form).
      * Fernando C. N. Pereira and Stuart M. Shieber. Prolog and Natural-Language Analysis. CSLI (1987) [pdf]. Sections 4.2.3–4.2.5 and 4.3.3 (pages 102ff. of the pdf file) describe syntactic aspects of filler-gap dependencies.

  1. 11/15/07: Introduction to tree-adjoining grammars
    • Lecture references:
      * Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct grammar. Journal of Computer and System Sciences 10(1):136–163 (1975).
    • Additional references:
      * Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3 (Beyond words), pp. 69–123 (1997). [pdf]
      * K. Vijay-Shanker and Aravind K. Joshi. Some computational properties of tree adjoining grammars. Proceedings of the 23rd ACL, pp. 82–93 (1985). [pdf,pdf2]
      * The XTAG project. [home page]. Site includes a tutorial, grammars for English and Korean, synchronous-tag implementation, and a supertagging demo.

  1. 11/20/07: adjunction constraints and feature-based TAGs
    • Lecture references:
      * K. Vijay-Shanker and Aravind K. Joshi. Feature structures based tree adjoining grammars. COLING, pp. 714–719 (1988) [pdf]
    • Additional references:
      * Anne Abeillé and Yves Schabes. Parsing idioms in lexicalized TAGs. Proc. of the 4th EACL, pp. 1–9 (1989). [pdf]
      * Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian, Kimmen Sjölander, Rebecca C. Underwood, and David Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research 22(23):5112–20 (1994). [pdf]
      * David B. Searls. The linguistics of DNA. American Scientist 80(6):579–591 (1992).
      * Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, Takashi Yokomori. Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science 210(2): 277–303, special issue on genome informatics (1999) [pdf]

  1. 11/27/07: The EM algorithm, part one

  1. 11/29/07: Conclusion of EM, and the course
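    • Illustrative sketch: EM for a textbook two-coin mixture, in Python (not course code). The E-step computes responsibilities (the posterior probability that each session came from coin A), and the M-step re-estimates the parameters from the resulting expected counts; the data and initialization below are invented for the example:

```python
def em_two_coins(flips, p=0.6, q=0.5, w=0.5, iters=50):
    """EM for a mixture of two biased coins.  Each element of `flips` is
    a (heads, tails) pair from one session generated by coin A (heads
    prob. p, mixing weight w) or coin B (heads prob. q).  Coin identities
    are the hidden variables; returns the estimates (p, q, w)."""
    for _ in range(iters):
        # E-step: responsibility of coin A for each session
        resp = []
        for h, t in flips:
            la = w * (p ** h) * ((1 - p) ** t)
            lb = (1 - w) * (q ** h) * ((1 - q) ** t)
            resp.append(la / (la + lb))
        # M-step: re-estimate parameters from expected head/tail counts
        ha = sum(r * h for r, (h, t) in zip(resp, flips))
        ta = sum(r * t for r, (h, t) in zip(resp, flips))
        hb = sum((1 - r) * h for r, (h, t) in zip(resp, flips))
        tb = sum((1 - r) * t for r, (h, t) in zip(resp, flips))
        p, q = ha / (ha + ta), hb / (hb + tb)
        w = sum(resp) / len(resp)
    return p, q, w
```

Each iteration is guaranteed not to decrease the data likelihood; with clearly separated data the responsibilities harden and the estimates approach the per-cluster head frequencies.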