COS 435, Spring 2002: Home Page
Directory
General Information | Schedule and Readings | Work of the Course | Project Page | Announcements
Course Summary
We study both classic techniques for indexing documents and searching text and new algorithms that exploit properties of the Web (e.g., links) and modern digital libraries, including multimedia collections. We also study techniques for finding relationships and patterns that have not been explicitly modeled within digital collections, e.g., "mining" data in massive databases for new information. Finally, because improvements in network technology alone cannot meet the ever-increasing demand for more information delivered faster, we examine techniques such as caching and distributed storage that make information delivery more efficient.
Prerequisites
COS 217 and COS 226.
Administrative Information
Meeting time: Mon, Wed 11:00 AM--12:20 PM
Meeting place: Room 301 Computer Science Building
Extra meetings: We may need to make up a class or two that we miss due to my schedule. Therefore, we may have a class during reading period and/or some evening classes during the semester. Class participants will be consulted before any make-up class time is chosen.
Professor: Andrea LaPaugh, aslp@cs.princeton.edu,
304 CS Building, 258-4568, or Forbes College Office*, 258-5232
Office hours: Monday and Wednesday 12:20--1:00 PM or by appointment. Please catch me after class or send email to make an appointment.
* In my "other life" I am Master of Forbes College; you are welcome to call me at either office.
Course secretary: Mitra Kelly, 323 CS Building, 258-4562, mkelly@cs.princeton.edu
Reading
Required text: None
Supplemental reading on reserve at the Engineering Library:
- Baeza-Yates and Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1999.
- expect additions as we progress in the semester
We will also use reprints and online material.
Syllabus
(This is the general list of topics and is probably a superset of what we will have time to cover. Please see Schedule and Readings for specific topics and reading assignments.)
Part 1, topics in information retrieval and manipulation:
- Indexing and inverted files
- Keyword-based searching
- Vector space model of documents
- Latent Semantic Indexing
- Ranking documents
- Evaluating retrieval systems
- Using URL structure for Web document categorizing
Part 2, topics in document similarity and information discovery:
- Web crawling
- Document similarity
- Clustering
- Pattern recognition
- Semantic and feedback techniques
Part 3, systems issues in delivering digital information:
- Information caching
- Information prefetching
- Distributed storage
- Broadcast-based systems
- Reliability and permanence
A.S. LaPaugh. Content last changed February 2002.