16-721 Learning-Based Methods in Vision

"The aim of computer vision is to overfit to our visual world"
-- remark by Antonio Torralba (after his third beer)

Overview

A graduate seminar course in Computer Vision with emphasis on using large amounts of real data (images, video, textual annotations, user preferences, etc.) to learn the structure of our visual world toward the ultimate goal of Image Understanding. We will be reading an eclectic mix of classic and recent papers on topics including: theories of perception, low-level vision (color, texture), mid-level vision (grouping and segmentation), object and scene recognition, image parsing, words and pictures models, image manifolds, etc.

Prerequisite: 16-720 or similar Computer Vision course

We will meet on Tuesdays and Thursdays from 10:30am-11:50am in NSH 3002.

Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
Office Hours: Tuesdays at Noon, Thursdays at 1:30pm

TA: Jean-Francois Lalonde, A521 Newell-Simon Hall.
Office Hours: Monday 1:30pm and Wednesday 1:30pm (also by appointment if you can't make it: jlalonde at cs)

Projects

Check out this list of data sources for some ideas on where to get images to work with.

Challenges:

Class Schedule

A list of suggested papers to present is available here.

If you want to change your presentation date, please arrange a swap with another student and notify the instructor and the TA at least two weeks in advance.

Introduction

Date	Presenter	Paper title	Slides
Jan. 16	Alyosha Efros	Introduction, Vision: Measurement vs. Perception Administrative stuff, overview of the course, datasets	Intro ppt
Jan. 18	Alyosha Efros	Overview lecture on theories of Visual Perception 1. Cavanagh, P. (1995) Vision is getting easier every day 2. Cavanagh, P. (1991) What's up in top-down processing? 3. Cavanagh, P. (2005) The Artist as Neuroscientist Suggested reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century?	Theories ppt
Jan. 23	Alyosha Efros	Overview lecture on the physiology of vision 4. Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision	Physiology ppt

Part 1: Images

Learning Features from Data

Date	Presenter	Paper title	Slides
Jan. 25	Byron Evaluator: Eakta	5. Olshausen, B. & Field, D. (1996) Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, Nature (Byron) Code available Additional background reading: here	Coming soon...
Jan. 30	Byron Andrew Evaluator: Eakta	We will first finish the Olshausen & Field paper from last class. 6. Serre, T., Wolf, L. and Poggio, T. (2005) Object recognition with features inspired by visual cortex, CVPR (Andrew) There is some code available	Serre ppt

Distributions of Features

Date	Presenter	Paper title	Slides
Feb. 1	Frederik Jean-Francois	7. Rubner, Y., Tomasi, C. and Guibas, L.J. (2000) The Earth Mover's Distance as a Metric for Image Retrieval, IJCV (Frederik) There is some code available Additional reading: Levina, E. and Bickel, P.J. (2001) The Earth Mover's Distance is the Mallows Distance: Some Insights from Statistics, ICCV 8. Martin, Fowlkes and Malik (2004) Learning to Detect Natural Image Boundaries Using Local Brightness, Color and Texture Cues, PAMI (Jean-Francois) Code and data available short version, NIPS 2002	Rubner ppt Martin pdf

Images as Texture ("Bag of Words" models)

Date	Presenter	Paper title	Slides
Feb. 6	Alyosha	9. Renninger, L.W. & Malik, J. (2004) When is scene recognition just texture recognition?, Vision Research (Alyosha) Data available 10. Csurka, G., Bray, C., Dance, C., and Fan, L. (2004) Visual categorization with bags of keypoints (Alyosha) 11. Winn, J., Criminisi, A. and Minka, T. (2005) Object Categorization by Learned Universal Visual Dictionary (Alyosha)	Coming soon...

Images as Scenes

Date	Presenter	Paper title	Slides
Feb. 8	Sebastian	12. Torralba, A. and Oliva, A. (2003) Statistics of Natural Image Categories, Network: Computation in Neural Systems (Sebastian) 13. Torralba, A. and Oliva, A. (2002) Depth estimation from image structure, PAMI (Sebastian) 14. Oliva, A. and Torralba, A. (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope, IJCV (Sebastian)	Gist pdf

Images as Feature Vectors

Date	Presenter	Paper title	Slides
Feb. 13	Google talk! (Henry Rowley)
Feb. 15	Alyosha	15. Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding, Science (Presenter: Alyosha, Evaluator: Ankur) Code available 16. Tenenbaum, J.B., De Silva, V. and Langford, J.C. (2000) A global geometric framework for nonlinear dimensionality reduction, Science (Presenter: Alyosha, Evaluator: Ankur) Code available	Manifolds ppt
Feb. 20	Devi Evaluator: Ankur	Ankur will evaluate papers 15 and 16 Additional applications Pless, R. (2003) Using isomap to explore video sequences, ICCV (Devi) Pless, R. and Simon, I. (2002) Using Thousands of Images of an Object, Computer Vision, Pattern Recognition and Image Processing (Devi) Mohan, A., Winnemoller, H., Tumblin, J. and Gooch, B. (2005) Light Waving: Light Position Estimates from Photos Alone, Eurographics (Website) (Devi)	Isomap applications ppt
Feb. 22	Ralph	17. Tenenbaum & Freeman (2000) Separating Style and Content with Bilinear Models, Neural Computation (Ralph)	Coming soon...

Image matching (Distance Transforms)

Date	Presenter	Paper title	Slides
Feb. 27	Alyosha Evaluator: Minh	18. Learned-Miller, E. (2005) Data Driven Image Models through Continuous Joint Alignment, PAMI (Alyosha) Code available (Evaluator: Minh)	Registration ppt
Mar. 1	Ankur Evaluator: Byron	19. Huttenlocher, D., Klanderman, G. and Rucklidge, W. (1993) Comparing Images Using the Hausdorff Distance, PAMI (Ankur) 20. Borgefors, G. (1988) Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm, PAMI (Ankur) Huttenlocher & Felzenszwalb, P. (2004) Distance Transforms of Sampled Functions, Cornell Computing and Information Science Technical Report TR2004-1963 (Evaluator: Byron) Code available	Comparison ppt

Image Correspondence (Caltech-101-fest!)

Date	Presenter	Paper title	Slides
Mar. 6	Ross Alyosha	21. Zhang, H., Berg, A., Maire, M. and Malik, J. (2006) SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, CVPR (Ross) 22. Frome, A., Singer, Y. and Malik, J. (2006) Image Retrieval and Recognition Using Local Distance Functions, NIPS (to appear) (Ross) 23. Berg, A., Berg, T. and Malik, J. (2005) Shape Matching and Object Recognition using Low Distortion Correspondences, CVPR (Alyosha) Alternative approach: Leordeanu, M. and Hebert, M. (2005) A Spectral Technique for Correspondence Problems using Pairwise Constraints, ICCV	SVM-KNN ppt
Mar. 8	Special lecture by Andrew Zisserman!

Lots of Data is Fun!

Date	Presenter	Paper title	Slides
Mar. 13	No class: Spring break!
Mar. 15	No class: Spring break!
Mar. 20	Hongwen Ross	24. Lazebnik, S., Schmid, C. and Ponce, J. (2006) Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR (Ross) Background: Grauman, K. and Darrell, T. (2005) The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV 27. Sivic, J. and Zisserman, A. (2003) Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV (webpage) (Presented by A.Z. last class) 28. Nistér, D. and Stewénius, H. (2006) Scalable Recognition with a Vocabulary Tree (Hongwen)	Coming soon...
Mar. 22	Alyosha Ralph	25. Zitnick & Kanade (2003) Content-free image retrieval, unpublished (Alyosha) 26. Berg, T., Berg, A., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E. and Forsyth, D.A. (in submission) Names and Faces (Ralph)	Coming soon...
Mar. 27	Devi Jean-Francois	27. Dalal and Triggs (2005) Histograms of Oriented Gradients for Human Detection, CVPR (Devi) Data available 28. Marszalek, M. and Schmid, C. (2006) Spatial weighting for bag-of-features, CVPR (Devi) 29. Snavely, N., Seitz, S.M. and Szeliski, R. (2006) Photo tourism: Exploring photo collections in 3D, SIGGRAPH, (webpage) (Jean-Francois)	Coming soon...

Boosting Background

Date	Presenter	Paper title	Slides
Mar. 29	Sebastian Minh	30. AdaBoost background (Sebastian) 31. Friedman, J. H., Hastie, T. and Tibshirani, R. (1998) Additive Logistic Regression: a Statistical View of Boosting (Sebastian) 32. Schneiderman, H. and Kanade, T. (2004) Object Detection Using the Statistics of Parts, IJCV (Presenter: Minh, Evaluator: Andrew) Demo available 33. Viola, P. and Jones, M. (2001) Robust Real-time Object Detection, Second International Workshop on Statistical and Computational Theories of Vision (Presenter: Minh, Evaluator: Andrew) Short version	Obj. detection ppt Evaluation ppt

Part 2: Objects and Parts

Segmentation

Date	Presenter	Paper title	Slides
Apr. 3-5	Alyosha Fred	34. Wertheimer, M. (1923) Laws of Organization in Perceptual Forms (Alyosha) 35. Weiss, Y. (1999) Segmentation using eigenvectors: a unifying view, ICCV (Fred) 36. Ng, A.Y., Jordan, M.I. and Weiss, Y. (2001) On Spectral Clustering: Analysis and an algorithm, NIPS (Fred)	Coming soon...
Apr. 10	Ross Jean-Francois Evaluator: Hongwen	37. Tu and Zhu (2002) Image Segmentation by Data-Driven Markov Chain Monte Carlo, PAMI (Ross)	Coming soon...
Apr. 12	Jean-Francois Evaluator: Hongwen	38. Boykov and Jolly (2001) Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in ND Images, ICCV (Jean-Francois) Application: Li, Y., Sun, J., Tang, C.K. and Shum, H. (2004) Lazy Snapping, SIGGRAPH (Jean-Francois, Evaluator: Hongwen)	Coming soon...

Grouping Repeated Structures

Date	Presenter	Paper title	Slides
Apr. 17	Ankur Eakta	39. Boiman, O. and Irani, M. (2006) Similarity by Composition, NIPS (Ankur)	Coming soon...
Apr. 19	No classes (from academic calendar)
Apr. 24	Eakta Alyosha Evaluator: Fred	40. Kannan, A., Winn, J. and Rother, C. (2006) Clustering appearance and shape by learning jigsaws, NIPS (Eakta) 41. Ren, X. and Malik, J. (2003) Learning a Classification Model for Segmentation, ICCV Superpixel code available (Evaluator: Fred) 42. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T. and Zisserman, A. (2006) Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, CVPR (Alyosha)	Coming soon...

From Features to Objects

Date	Presenter	Paper title	Slides
Apr. 26	Hongwen	44. Torralba, A., Murphy, K.P. and Freeman, W.T. (in press) Sharing visual features for multiclass and multiview object detection, PAMI (Hongwen) 45. Opelt, A., Pinz, A. and Zisserman, A. (2006) Incremental learning of object detectors using a visual shape alphabet, CVPR 46. Ferrari, V., Fevrier, L., Jurie, F. and Schmid, C. (2006) Groups of Adjacent Contour Segments for Object Detection, INRIA Technical Report 47. Leibe, B., Leonardis, A. and Schiele, B. (2004) Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV'04 Workshop on Statistical Learning in Computer Vision (Hongwen) 48. Leibe, B., Seemann, E. and Schiele, B. (2005) Pedestrian Detection in Crowded Scenes, CVPR	Coming soon...

Scenes, Context, and Image Parsing

Date	Presenter	Paper title	Slides
May 1	Byron Alyosha	66. Saxena, A., Chung, S. and Ng, A.Y. (2005) Learning Depth from Single Monocular Images, NIPS (Byron) 64. Hoiem, D., Efros, A.A. and Hebert, M. (2005) Geometric Context from a Single Image, ICCV (Alyosha) 67. Tu, Z., Chen, X., Yuille, A. and Zhu, S.C. (2005) Image Parsing: Unifying Segmentation, Detection, and Recognition, IJCV 68. Ren, X., Fowlkes, C. and Malik, J. (2006) Figure/Ground Assignment in Natural Images, ECCV 69. Cornelis, N., Leibe, B., Cornelis, K. and Van Gool, L. (2006) 3D City Modeling Using Cognitive Loops, 3DPVT	Coming soon...

Face Modeling / Recognition

Date	Presenter	Paper title	Slides
May 3	Andrew Minh Evaluator: Ralph	70. Sinha, P., Balas, B.J., Ostrovsky, Y., and Russell, R. (under review) Face recognition by humans: 20 results all computer vision researchers should know about (Andrew) 71. Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active Appearance Models, ECCV (Minh) (Evaluator: Ralph)	Coming soon...

Final project presentations

Date	Information
May 7	The presentations will be from 1:00 to 4:00 pm. The location is PH226A (that's Porter Hall). See here for updated information from the HUB (search for 16721).

Similar Courses

This course has been inspired by courses offered by several of my colleagues. Here is a partial list:

Selected Topics in Vision & Learning (Serge Belongie, UCSD)
Learning and Inference in Vision (Bill Freeman, MIT)
Object Recognition (Kristen Grauman, Texas-Austin)
High-level Recognition in Computer Vision (Fei-Fei Li, Princeton)
Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
Recognition Problems in Computer Vision (Greg Mori, SFU)
Scene Understanding Seminar (Aude Oliva, MIT)
Visual Recognition (Pietro Perona, Caltech)
Vision and Learning (Jianbo Shi, UPenn)

Some tutorials, workshops and seminars:

CMU VASC Seminar (Spring 2007)
Sicily Workshop on Category-level Object Recognition (2006)
IMA Visual Learning and Recognition Workshop (2006)
MSRI Visual Recognition Workshop (2006)
Scene Understanding Symposium SUnS'06 (2006)
Recognizing and Learning Object Categories (ICCV 2005 Tutorial)

Page created and maintained by Jean-Francois Lalonde (email: jlalonde at cs dot cmu dot edu)