Information Retrieval and Web Search

Required textbook

Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze (Cambridge University Press, 2008).

This book is available from Amazon, the Stanford bookstore, or your favorite book purveyor. You can also download and print chapters for free at the book website. (We’d appreciate any reports of typos or of higher-level problems for the third printing.)

This book will be referred to as IIR in the reading assignments listed in the course schedule section.

Other useful references

Note:
Some of the slides and video links are from a previous offering of the course. We leave them here for your reference; they will be updated or replaced by the time of each lecture. * marks the most recently updated slides.
The complementary videos are on Canvas, and the slides of the videos are linked below.

Course Schedule

Week Date Event Description & materials Readings & other resources
Week 1 Tues. 4/2 Lecture (Pandu) Introduction to the course Videos: "Semistructured Data" Slides:PPT |PDF/6 PDF/1 IIR chapter 1 MG section 3.2 MIR section 8.2 Shakespeare plays
Thurs. 4/4 Lecture (Chris) Inverted Indices: Dictionary and postings lists, boolean querying Videos: "Document Encodings" , "Tokens", "Terms", "Stemming", "Skip Lists" Slides:PPT |PDF/6 PDF/1 IIR chapter 2 MG sections 3.6, 4.3 MIR section 7.2 Porter's stemmer (MIR) Porter stemming algorithm (Official) A skip list cookbook (Pugh 1990) Fast phrase querying with combined indexes (Williams, Zobel, Bahle 2004) Efficient phrase querying with an auxiliary index (Bahle, Williams, Zobel 2002)
Week 2 Tues. 4/9 Lecture (Pandu) Index Construction Videos: "Index Construction" Slides:PPT |PDF/6 PDF/1 IIR chapter 4
Tues. 4/9 PA1 release Programming assignment #1 released
Thurs. 4/11 Lecture (Chris) Algorithms for postings list compression Videos: "Index Compression" Slides:PPT |PDF/6 PDF/1 IIR chapter 5 MG sections 3.3-3.4 Compression of inverted indexes for fast query evaluation (Scholer et al. 2002) Inverted index compression using word-aligned binary codes (Anh and Moffat 2005) Inverted index compression and query processing with optimized document ordering (Yan et al. 2009)
Week 3 Tues. 4/16 Lecture (Pandu) Spelling correction Videos: "Dictionaries and Tolerant Retrieval" Slides:PPT|PDF/6 PDF/1 IIR chapter 3 MG section 4.2 How to write a spelling corrector (Peter Norvig) Techniques for automatically correcting words in text (Kukich 1992) Finding approximate matches in large lexicons (Zobel and Dart 1995) Efficient Generation and Ranking of Spelling Error Corrections (Tillenius)
Tues. 4/16 PS1 release Problem set #1 released
Tues. 4/16 Query quiz release Query quiz released
Thurs. 4/18 Lecture (Pandu) Scoring, term weighting and the vector space model Videos: "Computing Scores" Slides:PPT |PDF/6 PDF/1 IIR chapter 7 IIR chapter 11
Sun. 4/20 Query quiz due Query quiz due
Week 4 Tues. 4/23 PA1 due Programming assignment #1 due
Tues. 4/23 Guest lecture Guest lecture by Joachim Kupke (Principal Software Engineer, Google) NOTE: attendance required for on-campus students
Tues. 4/23 PA2 release Programming assignment #2 released
Thurs. 4/25 Lecture (Chris) Probabilistic IR: the binary independence model, BM25, BM25F Videos: "Vector Space Model" Slides:PPT |PDF/6 PDF/1 IIR chapter 6 IIR chapter 11
Week 5 Tues. 4/30 PS1 due Problem set #1 due
Tues. 4/30 Lecture (Chris) Evaluation methods & NDCG Videos: "Result Summaries" Slides:PPT |PDF/6 PDF/1 IIR chapter 8 MG section 4.5 MIR chapter 3
Tues. 4/30 Ranking quiz release Ranking quiz released
Thurs. 5/2 Lecture (Pandu) Systems issues in efficient retrieval and scoring Slides: PPT |PDF/6 PDF/1 IIR chapter 6 IIR chapter 7 Efficient Query Evaluation using a Two-Level Retrieval Process (Broder et al. 2003)
Week 6 Tues. 5/7 PA2 due Programming assignment #2 due
Tues. 5/7 Lecture (Pandu) Classification and clustering in vector spaces (Naive Bayes, kNN, decision boundaries) Slides: PPT |PDF/6 PDF/1 Videos: "Naive Bayes" IIR chapter 13 IIR chapter 14 Reuters-21578 Machine learning in automated text categorization (Sebastiani 2002) A re-examination of text categorization methods (Yang et al. 1999) A comparison of event models for naive Bayes text classification (McCallum et al. 1998) Tackling the poor assumptions of naive Bayes text classifiers (Rennie et al. 2003) Evaluating and optimizing autonomous text classification systems (Lewis 1995) Tom Mitchell. Machine Learning. McGraw-Hill, 1997. Trevor Hastie, Robert Tibshirani, Jerome Friedman. Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001. Open Calais Weka
Thurs. 5/9 Lecture (Chris) Text classification Slides: PPT |PDF/6 PDF/1 IIR chapter 15 Reuters-21578 A tutorial on support vector machines for pattern recognition (Burges 1998) Using SVMs for text categorization (Dumais 1998) Inductive learning algorithms and representations for text categorization (Dumais et al. 1998) A re-examination of text categorization methods (Yang et al. 1999) Text categorization based on regularized linear classification methods (Zhang et al. 2001) A loss function analysis for classification methods in text categorization (Li et al. 2003) Trevor Hastie, Robert Tibshirani, Jerome Friedman. Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001. Thorsten Joachims. Learning to Classify Text using Support Vector Machines. Kluwer, 2002.
Thurs. 5/9 PA3 release Programming assignment #3 released
Week 7 Tues. 5/14 Lecture (Chris) Distributed word representations for IR Slides: PPT |PDF/6 PDF/1 Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) GloVe: Global Vectors for Word Representation (Pennington et al., 2014)
Tues. 5/14 PS2 release Problem set #2 released
Thurs. 5/16 Lecture (Chris) Learning to rank Slides: PPT |PDF/6 PDF/1 IIR sections 6.1.2-6.1.3 IIR section 15.4 LETOR benchmark datasets Discriminative models for information retrieval (Nallapati 2004) Adapting ranking SVM to document retrieval (Cao et al. 2006) A support vector method for optimizing average precision (Yue et al. 2007)
Week 8 Tues. 5/21 Lecture (Chris) Link analysis Slides: PPT |PDF/6 PDF/1 IIR chapter 21 Ranking the web frontier (Eiron et al. 2004) The WebGraph framework I: Compression techniques (Boldi et al. 2004) Extrapolation methods for accelerating PageRank computations (Kamvar et al. 2003) Searching the workplace web (Fagin et al. 2003)
Thurs. 5/23 PS2 due Problem set #2 due
Thurs. 5/23 Guest lecture Guest lecture by Susan Dumais (Distinguished Scientist & Deputy Managing Director, Microsoft Research Lab) Slides: PDF/1 NOTE: attendance required for on-campus students
Week 9 Tues. 5/28 Lecture (Pandu) Crawling and near-duplicate pages Slides: PPT |PDF/6 PDF/1 IIR chapter 19 IIR chapter 20 Mercator: A scalable, extensible web crawler (Heydon et al. 1999) A standard for robot exclusion
Thurs. 5/30 PA3 due Programming assignment #3 due
Thurs. 5/30 Lecture (Chris) Question answering Slides: PPT |PDF/6 PDF/1
Week 10 Tues. 6/4 Lecture (Pandu) Personalization Slides: PPT |PDF/6 PDF/1 J. Teevan, S. Dumais, E. Horvitz. Potential for personalization. 2010 J. Pitkow et al. Personalized search. 2002 J. Teevan, S. Dumais, E. Horvitz. Personalizing search via automated analysis of interests and activities. 2005 P. Bennett et al. Inferring and using location metadata to personalize Web search. 2011 T. Haveliwala. Topic-sensitive pagerank. 2002. G. Jeh and J. Widom. Scaling personalized Web search. 2003 M. Curtiss et al. Unicorn: A system for searching the social graph. 2013
Exam week Fri. 6/7 Final exam Alternate final exam (8:30-11:30am)
Wed. 6/12 Final exam Final exam (3:30-6:30pm) Practice final and solution are on Canvas

FAQ

Can I take this course on a credit/no credit basis?

Yes. Credit will be given to those who would have otherwise earned a C- or above.

Can I audit or sit in?

In general, we are very open to guests sitting in if you are a member of the Stanford community (registered student, staff, and/or faculty). As a courtesy, we would appreciate it if you emailed us first or talked to the instructor after the first class you attend.

I have a question about the class. What is the best way to reach the course staff?

In general, we ask students to use the Piazza forum for our class so that other students may benefit from your questions and our answers. If you have a personal matter that you believe is not appropriate to share on Piazza (even in a private post), you may email the course staff at cs276-spr1819-staff@lists.stanford.edu. We may NOT be able to reply to emails sent to individual instructors or TAs regarding the class.

As an SCPD student, how do I take the final exam?

If you are a local SCPD student, you are encouraged to come to Stanford for one of the on-campus exams. If you decide to take an on-campus exam, please let us know in advance (through a survey that we will send out closer to the final exam date). If you are not local or can't make it to the on-campus exams, you need to line up an exam monitor (usually your manager or a co-worker at your company) and submit the form specifying this person to SCPD in advance. You won't get an exam if you don't have an exam monitor on file. You need to make sure we get the exam back promptly (the monitor should scan it and email it directly to us). If you are taking the exam in the first 24-hour period, you need to make sure we get the exam back from your monitor by Saturday 12:30 pm PT. If you are taking the exam in the second 24-hour period, you need to make sure we get the exam back from your monitor by Wednesday 7:30 pm PT. We need to grade exams immediately after that in order to submit grades on time. Please refer to the course policies page for final exam details.

Will there be virtual office hours for SCPD students?

We will be sure to join a Google Hangout for at least some office hours. We will use QueueStatus and will post the Google Hangout link on the QueueStatus page during each office hour for SCPD students.