# The Modern Algorithmic Toolbox (CS168), Spring 2024
## Instructor:
- Gregory Valiant (Office hours: Tues 3-4pm, Gates 162). Contact: email me at my last name at stanford.edu or via our course Ed page.
## Course Assistants:
- Sidhant Bansal (Office hours: Mon 4:15-6:15pm, Huang Basement)
- Luci Bresette (Office hours: Wed 10:30-11:30, Fri 10:30-11:30, Huang Basement)
- Isaac Gorelik (Office hours: Mon 11-1, Huang Basement)
- Meenal Gupta (Office hours: Wed 8-10pm, Zoom link, entry password 856878)
- Benson Kung (Office hours: Wed 11-1, Zoom link)
- Ali Malik (Office hours: Wed 6-8pm, Huang Basement)
- Ligia Melo (Office hours: Mon 9-11am Huang Basement)
- Joey Rivkin (Office hours: Tues 10:45-11:45 and Wed 10-11am, Huang Basement)
- Luna Yang (Office hours: Wed 2-4pm, Huang Basement)
## Lecture Time/Location:
Tues/Thurs, 1:30-2:50pm in NVIDIA Auditorium
Ed site for online discussion/questions. This link includes the access code that you need the first time you sign up.
## Prerequisites:
CS107 and CS161, or permission from the instructor.
## Course Description
This course will provide a rigorous and hands-on introduction to the central ideas and algorithms that constitute the core of the modern algorithms toolkit. Emphasis will be on understanding the high-level theoretical intuitions and principles underlying the algorithms we discuss, as well as developing a concrete understanding of when and how to implement and apply the algorithms. The course will be structured as a sequence of one-week investigations; each week will introduce one algorithmic idea, and discuss the motivation, theoretical underpinning, and practical applications of that algorithmic idea. Each topic will be accompanied by a mini-project in which students will be guided through a practical application of the ideas of the week. Topics include modern techniques in hashing, dimension reduction, linear and convex programming, gradient descent and regression, sampling and estimation, compressive sensing, linear-algebraic techniques (principal components analysis, singular value decomposition, spectral techniques), and an intro to differential privacy.
## Proposed Lecture Schedule
Week 1: Modern Hashing
- **Lecture 1 (Tues 4/2):** Course introduction. "Consistent" hashing.
* Lecture notes
Supplementary material:
* The Akamai paper: Karger/Lehman/Leighton/Levine/Lewin/Panigrahy, Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, STOC 1997.
* Akamai stories by co-founder Tom Leighton.
* Further implementation details (see Section 3).
* The Chord paper: Stoica et al., Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, SIGCOMM 2001.
* The Amazon Dynamo paper: DeCandia et al., Dynamo: Amazon's Highly Available Key-value Store, SOSP 2007.
* Review videos on hashing: Operations and Applications, Implementation Details Part 1 and Part 2.
- **Lecture 2 (Thurs 4/4):** Property-preserving lossy compression. From majority elements to approximate heavy hitters. From bloom filters to the count-min sketch. (A short count-min sketch code sketch follows this week's references.)
* Lecture notes
Supplementary material:
* Review videos on bloom filters: The Basics and Heuristic Analysis
* Broder/Mitzenmacher, Network Applications of Bloom Filters: A Survey, 2005.
* Cormode/Muthukrishnan, An Improved Data Stream Summary: The Count-Min Sketch and its Applications, 2003.
* One of several count-min sketch implementations.
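A minimal count-min sketch in Python, as a concrete companion to Lecture 2. This is an illustrative sketch, not the course's reference code: it assumes the hash functions can be simulated by hashing (seed, item) pairs, and the `width`/`depth` values are arbitrary defaults rather than tuned parameters.

```python
# Count-min sketch: a small table of counters indexed by several hash functions.
# Estimates can only overcount, never undercount.
import random

class CountMinSketch:
    def __init__(self, width=1000, depth=5, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.seeds = [rng.randrange(2**32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Simulate independent hash functions by hashing (seed, item) pairs.
        return hash((self.seeds[row], item)) % self.width

    def add(self, item, count=1):
        for row in range(len(self.table)):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Each row overestimates (due to collisions); take the minimum across rows.
        return min(self.table[row][self._index(item, row)]
                   for row in range(len(self.table)))

cms = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    cms.add(word)
print(cms.estimate("a"))  # at least the true count, 3
```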
Week 2: Data with Distances (Similarity Search, Nearest Neighbor, Dimension Reduction, LSH)
- **Lecture 3 (Tues 4/9):** Similarity Search. (Dis)similarity metrics: Jaccard, Euclidean, Lp. Efficient algorithm for finding similar elements in small/medium (i.e., <20) dimensions using k-d trees.
* Lecture notes.
Supplementary material:
* Original paper of Bentley: Multidimensional binary search trees used for associative searching, 1975.
* Python scipy kd-tree implementation here.
- **Lecture 4 (Thurs 4/11):** Curse of Dimensionality, kissing number. Distance-preserving compression. Estimating Jaccard similarity using MinHash (a short code sketch follows this week's references). Euclidean distance preserving dimensionality reduction (aka the Johnson-Lindenstrauss Transform).
* Lecture notes
Supplementary material:
* A nice survey of "kissing number", and some other strange phenomena from high-dimensional spaces: Kissing Numbers, Sphere Packings, and some Unexpected Proofs (from 2000).
* Origins of MinHash at Alta Vista: Broder, Identifying and Filtering Near-Duplicate Documents (from 2000).
* Ailon/Chazelle, Faster Dimension Reduction, CACM '10.
* Andoni/Indyk, Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, CACM '08.
* For much more on Locality Sensitive Hashing, see this chapter of the CS246 textbook (by Leskovec, Rajaraman, and Ullman).
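A minimal MinHash sketch of Lecture 4's Jaccard-estimation idea. The hash functions are again simulated by hashing (seed, element) pairs, and the choice of 200 hash functions is an arbitrary illustrative parameter: the fraction of signature coordinates on which two sets agree estimates their Jaccard similarity.

```python
import random

def minhash_signature(s, num_hashes=200, seed=0):
    rng = random.Random(seed)
    seeds = [rng.randrange(2**32) for _ in range(num_hashes)]
    # For each simulated hash function, keep the minimum hash value over the set.
    return [min(hash((h, x)) for x in s) for h in seeds]

def estimate_jaccard(sig_a, sig_b):
    # Fraction of coordinates where the two signatures agree.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

A = set(range(0, 80))
B = set(range(40, 120))
sig_a, sig_b = minhash_signature(A), minhash_signature(B)
print(estimate_jaccard(sig_a, sig_b))  # close to the true Jaccard similarity 40/120 = 1/3
```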
Week 3: Generalization and Regularization
- **Lecture 5 (Tues 4/16):** Generalization (or, how much data is enough?). Learning an unknown function from samples from an unknown distribution. Training error vs. test error. PAC guarantees for linear classifiers. Empirical risk minimization.
* Lecture notes
- **Lecture 6 (Thurs 4/18):** Regularization. The polynomial embedding and random projection, L2 regularization, and L1 regularization as a computationally tractable surrogate for L0 regularization; frequentist and Bayesian perspectives on regularization. (A short ridge-regression code sketch follows this week's references.)
* Lecture notes
Supplementary material:
* A 2016 paper arguing that, to understand why deep learning works, we need to rethink the theory of generalization. This paper was quite controversial, with one camp thinking that its conclusions are completely obvious, and the other camp thinking that it is revealing an extremely deep mystery. You decide for yourself! Paper is here.
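A minimal numpy sketch of L2 regularization from Lecture 6, using the closed-form ridge solution w = (X^T X + lam*I)^{-1} X^T y. The synthetic dataset and the values of `lam` below are illustrative assumptions, not anything from the assignments: the point is just that larger regularization shrinks the fitted coefficients toward zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]                 # only 3 informative coordinates
y = X @ w_true + 0.1 * rng.normal(size=50)    # noisy labels

for lam in [0.0, 1.0, 100.0]:
    w = ridge_fit(X, y, lam)
    print(lam, np.round(w[:3], 2))  # larger lam shrinks the coefficients toward 0
```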
Week 4: Linear-Algebraic Techniques: Understanding Principal Components Analysis
- **Lecture 7 (Tues 4/23):** Understanding Principal Component Analysis (PCA). Minimizing squared distances equals maximizing variance. Use cases for data visualization and data compression. Failure modes for PCA.
* Lecture notes
Supplementary material:
* An exposition by 23andMe of the fact that the top 2 principal components of genetic SNP data of Europeans essentially recover the geography of Europe: nice exposition with figures. Original Nature paper: Genes mirror geography in Europe, Nature, Aug. 2008.
* Eigenfaces (see also this blog post)
* There are tons of PCA tutorials floating around the Web (some good, some not so good), which you are also permitted to refer to.
- **Lecture 8 (Thurs 4/25):** How PCA works. Maximizing variance as finding the "direction of maximum stretch" of the covariance matrix. The simple geometry of "diagonals in disguise." The power iteration algorithm. (A short power-iteration code sketch follows this week's references.)
* Lecture notes
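A minimal numpy sketch of the power iteration from Lecture 8, run on synthetic two-dimensional data stretched along one direction (the dataset and iteration count are illustrative assumptions): repeatedly multiplying a random vector by the covariance matrix and renormalizing converges to the top principal component.

```python
import numpy as np

def top_principal_component(X, num_iters=100, seed=0):
    X = X - X.mean(axis=0)             # center the data
    cov = X.T @ X                      # (unnormalized) covariance matrix
    rng = np.random.default_rng(seed)
    v = rng.normal(size=X.shape[1])
    for _ in range(num_iters):
        v = cov @ v                    # stretch toward the top eigenvector
        v /= np.linalg.norm(v)         # renormalize to avoid overflow
    return v

rng = np.random.default_rng(1)
# Synthetic data stretched mostly along the direction (1, 1) / sqrt(2).
X = np.outer(rng.normal(size=200), [1.0, 1.0]) + 0.1 * rng.normal(size=(200, 2))
print(top_principal_component(X))      # approximately +/-[0.707, 0.707]
```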
Week 5: Linear-Algebraic Techniques: Understanding the Singular Value Decomposition and Tensor Methods
- **Lecture 9 (Tues 4/30):** Low-rank matrix approximations. The singular value decomposition (SVD), applications to matrix compression, de-noising, and matrix completion (i.e., recovering missing entries). (A short low-rank-approximation code sketch follows this week's references.)
* Lecture notes
Supplementary material:
* For a striking recent application of low-rank approximation, check out the LoRA paper from 2021. The punchline is that in many cases, when fine-tuning large language models, the updates are close to low rank. Hence one can explicitly train these fine-tuning updates to the original model in the low-rank factorized parameter space, reducing the number of parameters that need to be trained by 1000x or 10,000x!
- **Lecture 10 (Thurs 5/2):** Tensor methods. Differences between matrices and tensors, the uniqueness of low-rank tensor factorizations, and Jenrich's algorithm.
* Lecture notes
Supplementary material:
* A blog post discussing Spearman's original experiment and the motivation for tensor methods: here.
* See chapter 3 of Ankur Moitra's course notes here for a more technical, in-depth discussion of tensor methods and Jenrich's algorithm.
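A minimal numpy sketch of Lecture 9's low-rank approximation: keep only the top k singular values/vectors of a noisy, approximately rank-3 matrix and watch the relative reconstruction error drop once k reaches the true rank. The synthetic matrix and the values of k are illustrative assumptions.

```python
import numpy as np

def low_rank_approx(A, k):
    # Best rank-k approximation (in Frobenius norm) via the SVD.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
# A rank-3 matrix plus a little noise.
A = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 80)) + 0.01 * rng.normal(size=(100, 80))

for k in [1, 2, 3, 4]:
    err = np.linalg.norm(A - low_rank_approx(A, k)) / np.linalg.norm(A)
    print(k, round(err, 4))  # the error drops sharply once k reaches the true rank
```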
Week 6: Spectral Graph Theory
- **Lecture 11 (Tues 5/7):** Graphs as matrices and the Laplacian of a graph. Interpretations of the largest and smallest eigenvectors/eigenvalues of the Laplacian. Spectral embeddings, and an overview of applications (e.g., graph coloring, spectral clustering).
* Lecture notes.
Supplementary material:
* Dan Spielman's excellent lecture notes for his semester-long course on Spectral Graph Theory. The notes include a number of helpful plots.
* Amin Saberi offered a grad seminar a few years ago that covered some of the research frontier of Spectral Graph Theory. Hopefully he will offer this again soon...
- **Lecture 12 (Thurs 5/9):** Spectral techniques, Part 2. Interpretations of the second eigenvalue (via conductance and isoperimetric number), and connections with the speed at which random walks/diffusions converge. (A short spectral-embedding code sketch follows this week's references.)
* Lecture notes.
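A minimal numpy sketch of the spectral ideas from Lectures 11 and 12, on a small random graph with two planted clusters (the graph model and edge probabilities are illustrative assumptions): build the Laplacian L = D - A and inspect the eigenvector of the second-smallest eigenvalue, whose sign pattern typically separates the two clusters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20   # two clusters of 10 nodes each
A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        same_cluster = (i < 10) == (j < 10)
        # Dense edges within clusters, sparse edges across.
        if rng.random() < (0.8 if same_cluster else 0.05):
            A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A          # Laplacian L = D - A
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                 # eigenvector of the second-smallest eigenvalue
print(np.sign(fiedler[:10]))            # (typically) all one sign...
print(np.sign(fiedler[10:]))            # ...and the other cluster gets the opposite sign
```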
Week 7: Sampling and Estimation
- **Lecture 13 (Tues 5/14):** Reservoir sampling (how to select a random sample from a data stream; a short code sketch follows this week's references). Basic probability tools: Markov's inequality and Chebyshev's inequality. Importance Sampling (how to make inferences about one distribution based on samples from a different distribution). Good-Turing estimate of the missing/unseen mass.
* Lecture notes
Supplementary material:
* A paper of mine showing that one can accurately estimate the structure of the unobserved portion of a distribution---not just the total probability mass. Paper link here.
- **Lecture 14 (Thurs 5/16):** Markov chains, stationary distributions. Markov Chain Monte Carlo (MCMC) as an approach to solving hard problems by sampling from carefully crafted distributions.
* Lecture notes
Supplementary material:
* A basic description of MCMC, here.
* Lecture notes from Persi Diaconis on MCMC, including a description of the MCMC approach to decoding substitution ciphers, here.
* Example of MCMC used for fitting extremely complex biological models: The human splicing code... Science, Jan. 2015.
* For those interested in computer Go: here is the Jan. 2016 Nature paper from Google's DeepMind group.
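A minimal sketch of Lecture 13's reservoir sampling: maintain a uniform random sample of k items from a stream of unknown length using O(k) memory. The stream and the value k = 5 are illustrative.

```python
import random

def reservoir_sample(stream, k, seed=0):
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            # Keep item i with probability k / (i + 1), replacing a uniform slot.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(10**6), k=5))  # 5 uniformly random elements of the stream
```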
Week 8: The Fourier Perspective (and other bases)
- **Lecture 15 (Tues 5/21):** Fourier methods, part 1.
* Lecture notes for Lectures 15 and 16
Supplementary material:
* A very basic intro to Fourier transformations with some nice visualizations: here.
* A book version of a Stanford course on the Fourier Transform, which has many extremely nice applications: pdf here.
- **Lecture 16 (Thurs 5/23):** Fourier methods, part 2. Emphasis on convolution and its many applications (fast multiplication, physics simulations, etc.). (A short FFT-convolution code sketch follows this week's references.)
* Lecture notes combined with Lecture 15.
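A minimal numpy sketch of Lecture 16's convolution theme: convolve two coefficient sequences (equivalently, multiply two polynomials) by pointwise-multiplying their FFTs, which takes O(n log n) time instead of the O(n^2) direct computation. The example polynomials are illustrative.

```python
import numpy as np

def convolve_fft(a, b):
    n = len(a) + len(b) - 1
    # Zero-pad to the output length, transform, multiply pointwise, and invert.
    fa = np.fft.rfft(a, n)
    fb = np.fft.rfft(b, n)
    return np.rint(np.fft.irfft(fa * fb, n)).astype(int)

# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(convolve_fft([1, 2, 3], [4, 5]))  # [ 4 13 22 15]
print(np.convolve([1, 2, 3], [4, 5]))   # same answer, computed directly
```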
Week 9: Online Learning with Multiplicative Weights, and Convex Optimization
- **Lecture 17 (Tues 5/28):** Online learning, the multiplicative weights algorithm, and applications. (A short multiplicative-weights code sketch follows this week's references.)
* Lecture notes
- **Lecture 18 (Thurs 5/30):** Compressive sensing, linear and convex programming, and solving LPs via the multiplicative weights algorithm.
* Lecture notes on compressive sensing and Lecture notes on linear and convex programming.
Supplementary material:
* Linear programming FAQ, out of date but still with lots of useful info.
* For convex optimization at Stanford, start with Stephen Boyd.
* For more on matrix completion, start with Chapter 7 of Moitra's notes.
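A minimal sketch of Lecture 17's multiplicative weights algorithm in the experts setting. The synthetic loss matrix and the learning rate eta = 0.1 are illustrative assumptions: experts that incur loss have their weights multiplied down, and the algorithm's total (expected) loss ends up close to that of the best single expert.

```python
import numpy as np

def multiplicative_weights(loss_matrix, eta=0.1):
    T, n = loss_matrix.shape           # T rounds, n experts, losses in [0, 1]
    w = np.ones(n)
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()                # play the distribution proportional to the weights
        total_loss += p @ loss_matrix[t]
        w *= 1 - eta * loss_matrix[t]  # multiplicatively punish lossy experts
    return total_loss

rng = np.random.default_rng(0)
losses = rng.random((1000, 5))
losses[:, 2] *= 0.2                    # expert 2 is clearly the best
print(round(multiplicative_weights(losses), 1))  # not much larger than...
print(round(losses[:, 2].sum(), 1))              # ...the best expert's total loss
```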
Week 10: Privacy Preserving Computation
- **Lecture 19 (Tues 6/4):** Differential Privacy in data analysis and machine learning.
* We won't have the usual lecture notes for this lecture, though the first chapter of the Dwork/Roth book linked below is a fantastic starting place for learning about differential privacy. (A short Laplace-mechanism code sketch follows this week's references.)
Supplementary material:
* A fairly recent book on Differential Privacy by Cynthia Dwork and Aaron Roth: here (should be downloadable from the Stanford network).
* The original paper of Dwork, McSherry, Nissim, and Smith, introducing differential privacy.
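A minimal sketch of the Laplace mechanism, the standard starting point for differential privacy (covered in the Dwork/Roth chapter above). The dataset, query, and epsilon values are illustrative assumptions: a counting query has sensitivity 1, so adding Laplace noise with scale 1/epsilon yields an epsilon-differentially private release.

```python
import numpy as np

def private_count(data, predicate, epsilon, rng):
    # Counting queries have sensitivity 1: adding or removing one person
    # changes the count by at most 1, so Laplace(1/epsilon) noise suffices.
    true_count = sum(predicate(x) for x in data)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
ages = rng.integers(18, 90, size=10_000)     # a synthetic "sensitive" dataset
for eps in [0.1, 1.0]:
    est = private_count(ages, lambda a: a >= 65, epsilon=eps, rng=rng)
    print(eps, round(est, 1))                # smaller epsilon => more noise, more privacy
```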
## Coursework
- Assignments (70% of Grade): There will be 9 weekly mini-projects centered around the topics covered that week. Each mini-project contains both written and programming parts. You are strongly encouraged to work in teams of up to four students. If you work in a team, only one member should submit all of the relevant files.
For the written part, you are encouraged to use LaTeX to typeset your homeworks; we've provided a template for your convenience. We will be using the GradeScope online submission system. Please create an account on Gradescope using your Stanford ID and join CS168 using entry code 2PN4J7.
For the programming part, you are encouraged to use Numpy and Pyplot in Python (Python tutorial, Numpy tutorial, Matplotlib tutorial), matlab (tutorial), or some other scientific computing tool (with plotting). Here is a comprehensive python tutorial using Google's colab that was put together for the CS231n course, with examples of plotting using matplotlib at the very end.
Assignments will be released on Tuesdays, and are due at 11am on the Thursday of the following week. No late assignments will be accepted, but we will drop your lowest assignment grade when calculating your final grade.
- Final Exam (30% of Grade): There will be an in-class final exam, covering all the material with an emphasis on the high-level punchlines of each lecture. We will post a practice exam a few weeks before the end of the quarter.
## Collaboration Policy
You can discuss the problems at a high level with other groups and contact the course staff (via Ed or office hours) for additional help. And of course, you are encouraged to help respond to Ed questions.
You may refer to the course notes and research papers linked from the course webpage, and can also use other references that you find online. You may *NOT* consult solution sets or code from past offerings of this course. Of course, you are expected to understand everything that is written on any assignment turned in with your name; if you do refer to references, make sure you cite them appropriately, and make sure that all your words are your own.
You are also permitted to use general resources for whatever programming language you choose to use. None of the assignments require you to write much code, and you may *NOT* reuse any code from friends/internet sources. If you use helper functions or code snippets that you did not write yourself (aside from standard Python packages, etc.), you must clearly cite the source in your writeup.
Please follow the honor code.