Lectures and Readings : Computer Vision : Fall 2024

Computer Vision (CMU 16-385)

The lecture slides for this course are available in the Lecture Slides Folder.

(Overview of computer vision)

(Image transformations, point image processing, linear shift-invariant image filtering, convolution, image gradients)

Basic reading:

Szeliski textbook, Section 3.2

(Image downsampling, aliasing, Gaussian image pyramid, Laplacian image pyramid, Fourier series, frequency domain, Fourier transform, frequency-domain filtering, sampling)

Basic reading:

Szeliski textbook, Sections 3.4 and 3.5

Additional reading:

Burt and Adelson, "The Laplacian Pyramid as a Compact Image Code", IEEE Transactions on Communications 1983. The original Laplacian pyramid paper.
Hubel and Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex", The Journal of Physiology 1962. A foundational paper describing information processing in the visual system, including the different types of filtering it performs; Hubel and Wiesel won the Nobel Prize in Physiology or Medicine in 1981 for the discoveries described in this paper.

(Finding boundaries, line fitting, line parameterization, Hough transform, Hough circles)

Basic reading:

Szeliski textbook, Sections 7.4 and A.2

(Visualizing quadratics, Harris corner detector, multi-scale detection)

Basic reading:

Szeliski textbook, Section 7.1
The Singular Value Decomposition (from Numerical Linear Algebra by Trefethen and Bau). Note: For a covariance matrix (or any symmetric positive semidefinite matrix, for that matter), the eigenvalues and eigenvectors coincide with the singular values and singular vectors.
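
The note on eigenvalues and singular values can be made concrete with a short derivation. This is a sketch under the assumption that the matrix is real, symmetric, and positive semidefinite; the symbols A, Q, and Λ are introduced here for illustration and do not come from the readings.

```latex
% Eigendecomposition of a real symmetric positive semidefinite matrix A
A = Q \Lambda Q^{\top}, \qquad Q^{\top} Q = I, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \quad
\lambda_1 \ge \dots \ge \lambda_n \ge 0.

% A singular value decomposition has the form
A = U \Sigma V^{\top}, \qquad U^{\top} U = V^{\top} V = I, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n), \quad \sigma_i \ge 0.

% Choosing U = V = Q and \Sigma = \Lambda satisfies every SVD requirement, so
\sigma_i = \lambda_i, \qquad u_i = v_i = q_i .

% If A is symmetric but not positive semidefinite, only the magnitudes agree:
\sigma_i = |\lambda_i| .
```

The last line shows why positive semidefiniteness matters: with a negative eigenvalue, the sign has to be absorbed into one of the singular vectors, so the left and right singular vectors no longer coincide.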

(Designing feature descriptors, MOPS descriptor, GIST descriptor, Histogram of Textons descriptor, HOG descriptor, SIFT)

Basic reading:

Szeliski textbook, Section 7.1

(2D transformations, projective geometry, classification of 2D transformations, determining unknown 2D transformations)

Basic reading:

Szeliski textbook, Section 2.1

Additional reading:

Hartley and Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press 2004. A comprehensive treatment of all aspects of projective geometry relating to computer vision, and also a very useful reference for the second part of the class.
Richter-Gebert, "Perspectives on projective geometry", Springer 2011. A beautiful, thorough, and very accessible mathematics textbook on projective geometry (available online for free from CMU's library).

(Panoramas, image homographies, computing with homographies, direct linear transform (DLT), random sample consensus (RANSAC))

Basic reading:

Szeliski textbook, Section 2.1

Additional reading:

Hartley and Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press 2004. Chapters 2 and 4 in particular cover homography estimation in detail.

(Introduction to learning-based vision, image classification, bag-of-words, K-means clustering, classification, K-nearest neighbors, naive Bayes, support vector machines)

Basic reading:

Szeliski textbook, Section 6.2

(Perceptron, neural networks, training perceptrons, gradient descent, backpropagation, stochastic gradient descent)

Basic reading (No standard textbooks yet!):

(Intro to vision for video, optical flow, constant flow, Horn-Schunck flow)

Basic reading:

Szeliski textbook, Section 8.4

(Motion magnification using optical flow, image alignment, Lucas-Kanade alignment, Baker-Matthews alignment, inverse alignment, KLT tracking, mean-shift tracking, modern trackers)

Basic reading:

Szeliski textbook, Sections 4.1.1, 5.3, and 8.1