16-721 Learning-Based Methods in Vision

"The aim of computer vision is to overfit to our visual world"
-- remark by Antonio Torralba (after his third beer)

Overview

A graduate seminar course in Computer Vision with emphasis on using large amounts of real data (images, video, textual annotations, user preferences, etc.) to learn the structure of our visual world, toward the ultimate goal of Image Understanding. We will be reading an eclectic mix of classic and recent papers on topics including: theories of perception, low-level vision (color, texture), mid-level vision (grouping and segmentation), object and scene recognition, image parsing, words and pictures models, image manifolds, etc.

Prerequisite: 16-720 or similar Computer Vision course

We will meet on Tuesdays and Thursdays from 10:30am to 11:50am in NSH 3002.

Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
Office Hours: Tuesdays at Noon, Thursdays at 1:30pm

TA: Jean-Francois Lalonde, A521 Newell-Simon Hall.
Office Hours: Monday 1:30pm and Wednesday 1:30pm (also by appointment if you can't make it: jlalonde at cs)

Projects

Check out this list of data sources for some ideas on where to get images to work with.

Challenges:

Class Schedule

A list of suggested papers to present is available here.

If you want to change your presentation date, please arrange a swap with another student and notify the instructor and the TA at least two weeks in advance.

Introduction

Date Presenter Paper title Slides
Jan. 16 Alyosha Efros Introduction, Vision: Measurement vs. Perception Administrative stuff, overview of the course, datasets Intro ppt
Jan. 18 Alyosha Efros Overview lecture on theories of Visual Perception 1. Cavanagh, P. (1995) Vision is getting easier every day 2. Cavanagh, P. (1991) What's up in top-down processing? 3. Cavanagh, P. (2005) The Artist as Neuroscientist Suggested reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century? Theories ppt
Jan. 23 Alyosha Efros Overview lecture on the physiology of vision 4. Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision Physiology ppt

Part 1: Images

Learning Features from Data

Date Presenter Paper title Slides
Jan. 25 Byron Evaluator: Eakta 5. Olshausen, B. & Field, D. (1996) Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, Nature (Byron) Code available Additional background reading: here Coming soon...
Jan. 30 Byron Andrew Evaluator: Eakta We will first finish the Olshausen & Field paper from last class. 6. Serre, T., Wolf, L. and Poggio, T. (2005) Object recognition with features inspired by visual cortex, CVPR (Andrew) There is some code available Serre ppt

Distributions of Features

Date Presenter Paper title Slides
Feb. 1 Frederik Jean-Francois 7. Rubner, Y., Tomasi, C. and Guibas, L.J. (2000) The Earth Mover's Distance as a Metric for Image Retrieval, IJCV (Frederik) There is some code available Additional reading: Levina, E. and Bickel, P.J. (2001) The Earth Mover's Distance is the Mallows Distance: Some Insights from Statistics, ICCV 8. Martin, Fowlkes and Malik (2004) Learning to Detect Natural Image Boundaries Using Local Brightness, Color and Texture Cues, PAMI (Jean-Francois) Some code and data are available short version, NIPS 2002 Rubner ppt Martin pdf

Images as Texture ("Bag of Words" models)

Date Presenter Paper title Slides
Feb. 6 Alyosha 9. Renninger, L.W. & Malik, J. (2004) When is scene recognition just texture recognition?, Vision Research (Alyosha) Data available 10. Csurka, G., Bray, C., Dance, C., and Fan, L. (2004) Visual categorization with bags of keypoints (Alyosha) 11. Winn, J., Criminisi, A. and Minka, T. (2005) Object Categorization by Learned Universal Visual Dictionary (Alyosha) Coming soon...

Images as Scenes

Date Presenter Paper title Slides
Feb. 8 Sebastian 12. Torralba, A. and Oliva, A. (2003) Statistics of Natural Image Categories, Network: Computation in Neural Systems (Sebastian) 13. Torralba, A. and Oliva, A. (2002) Depth estimation from image structure, PAMI (Sebastian) 14. Oliva, A. and Torralba, A. (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope, IJCV (Sebastian) Gist pdf

Images as Feature Vectors

Date Presenter Paper title Slides
Feb. 13 Google talk! (Henry Rowley)
Feb. 15 Alyosha 15. Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding, Science (Presenter: Alyosha, Evaluator: Ankur) Code available 16. Tenenbaum, J.B., De Silva, V. and Langford, J.C. (2000) A global geometric framework for nonlinear dimensionality reduction, Science (Presenter: Alyosha, Evaluator: Ankur) Code available Manifolds ppt
Feb. 20 Devi Evaluator: Ankur Ankur will evaluate papers 15 and 16 Additional applications Pless, R. (2003) Using isomap to explore video sequences, ICCV (Devi) Pless, R. and Simon, I. (2002) Using Thousands of Images of an Object, Computer Vision, Pattern Recognition and Image Processing (Devi) Mohan, A., Winnemoller, H., Tumblin, J. and Gooch, B. (2005) Light Waving: Light Position Estimates from Photos Alone, Eurographics (Website) (Devi) Isomap applications ppt
Feb. 22 Ralph 17. Tenenbaum & Freeman (2000) Separating Style and Content with Bilinear Models, Neural Computation (Ralph) Coming soon...

Image matching (Distance Transforms)

Date Presenter Paper title Slides
Feb. 27 Alyosha Evaluator: Minh 18. Learned-Miller, E. (2005) Data Driven Image Models through Continuous Joint Alignment, PAMI (Alyosha) Code available (Evaluator: Minh) Registration ppt
Mar. 1 Ankur Evaluator: Byron 19. Huttenlocher, D., Klanderman, G. and Rucklidge, W. (1993) Comparing Images Using the Hausdorff Distance, PAMI (Ankur) 20. Borgefors, G. (1988) Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm, PAMI (Ankur) Huttenlocher, D. & Felzenszwalb, P. (2004) Distance Transforms of Sampled Functions, Cornell Computing and Information Science Technical Report TR2004-1963 (Evaluator: Byron) Code available Comparison ppt

Image Correspondence (Caltech-101-fest!)

Date Presenter Paper title Slides
Mar. 6 Ross Alyosha 21. Zhang, H., Berg, A., Maire, M. and Malik, J. (2006) SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, CVPR (Ross) 22. Frome, A., Singer, Y. and Malik, J. (2006) Image Retrieval and Recognition Using Local Distance Functions, NIPS (to appear) (Ross) 23. Berg, A., Berg, T. and Malik, J. (2005) Shape Matching and Object Recognition using Low Distortion Correspondences, CVPR (Alyosha) Alternative approach: Leordeanu, M. and Hebert, M. (2005) A Spectral Technique for Correspondence Problems using Pairwise Constraints, ICCV SVM-KNN ppt
Mar. 8 Special lecture by Andrew Zisserman!

Lots of Data is Fun!

Date Presenter Paper title Slides
Mar. 13 No class: Spring break!
Mar. 15 No class: Spring break!
Mar. 20 Hongwen Ross 24. Lazebnik, S., Schmid, C. and Ponce, J. (2006) Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR (Ross) Background: Grauman, K. and Darrell, T. (2005) The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV 27. Sivic, J. and Zisserman, A. (2003) Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV (webpage) (Presented by A.Z. last class) 28. Nistér, D. and Stewénius, H. (2006) Scalable Recognition with a Vocabulary Tree (Hongwen) Coming soon...
Mar. 22 Alyosha Ralph 25. Zitnick & Kanade (2003) Content-free image retrieval, unpublished (Alyosha) 26. Berg, T., Berg, A., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E. and Forsyth, D.A. (in submission) Names and Faces (Ralph) Coming soon...
Mar. 27 Devi Jean-Francois 27. Dalal and Triggs (2005) Histograms of Oriented Gradients for Human Detection, CVPR (Devi) Data available 28. Marszalek, M. and Schmid, C. (2006) Spatial weighting for bag-of-features, CVPR (Devi) 29. Snavely, N., Seitz, S.M. and Szeliski, R. (2006) Photo tourism: Exploring photo collections in 3D, SIGGRAPH, (webpage) (Jean-Francois) Coming soon...

Boosting Background

Date Presenter Paper title Slides
Mar. 29 Sebastian Minh 30. AdaBoost background (Sebastian) 31. Friedman, J.H., Hastie, T. and Tibshirani, R. (1998) Additive Logistic Regression: a Statistical View of Boosting (Sebastian) 32. Schneiderman, H. and Kanade, T. (2004) Object Detection Using the Statistics of Parts, IJCV (Presenter: Minh, Evaluator: Andrew) Demo available 33. Viola, P. and Jones, M. (2001) Robust Real-time Object Detection, Second International Workshop on Statistical and Computational Theories of Vision (Presenter: Minh, Evaluator: Andrew) Short version Obj. detection ppt Evaluation ppt

Part 2: Objects and Parts

Segmentation

Date Presenter Paper title Slides
Apr. 3-5 Alyosha Fred 34. Wertheimer, M. (1923) Laws of Organization in Perceptual Forms (Alyosha) 35. Weiss, Y. (1999) Segmentation using eigenvectors: a unifying view, ICCV (Fred) 36. Ng, A.Y., Jordan, M.I. and Weiss, Y. (2001) On Spectral Clustering: Analysis and an algorithm, NIPS (Fred) Coming soon...
Apr. 10 Ross Jean-Francois Evaluator: Hongwen 37. Tu and Zhu (2002) Image Segmentation by Data-Driven Markov Chain Monte Carlo, PAMI (Ross) Coming soon...
Apr. 12 Jean-Francois Evaluator: Hongwen 38. Boykov and Jolly (2001) Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in ND Images, ICCV (Jean-Francois) Application: Li, Y., Sun, J., Tang, C.K. and Shum, H. (2004) Lazy Snapping, SIGGRAPH (Jean-Francois, Evaluator: Hongwen) Coming soon...

Grouping Repeated Structures

Date Presenter Paper title Slides
Apr. 17 Ankur Eakta 39. Boiman, O. and Irani, M. (2006) Similarity by Composition, NIPS (Ankur) Coming soon...
Apr. 19 No classes (from academic calendar)
Apr. 24 Eakta Alyosha Evaluator: Fred 40. Kannan, A., Winn, J. and Rother, C. (2006) Clustering appearance and shape by learning jigsaws, NIPS (Eakta) 41. Ren, X. and Malik, J. (2003) Learning a Classification Model for Segmentation, ICCV Superpixel code available (Evaluator: Fred) 42. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T. and Zisserman, A. (2006) Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, CVPR (Alyosha) Coming soon...

From Features to Objects

Date Presenter Paper title Slides
Apr. 26 Hongwen 44. Torralba, A., Murphy, K.P. and Freeman, W.T. (in press) Sharing visual features for multiclass and multiview object detection, PAMI (Hongwen) 45. Opelt, A., Pinz, A. and Zisserman, A. (2006) Incremental learning of object detectors using a visual shape alphabet, CVPR 46. Ferrari, V., Fevrier, L., Jurie, F. and Schmid, C. (2006) Groups of Adjacent Contour Segments for Object Detection, INRIA Technical Report 47. Leibe, B., Leonardis, A. and Schiele, B. (2004) Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV'04 Workshop on Statistical Learning in Computer Vision (Hongwen) 48. Leibe, B., Seemann, E. and Schiele, B. (2005) Pedestrian Detection in Crowded Scenes, CVPR Coming soon...

Scenes, Context, and Image Parsing

Date Presenter Paper title Slides
May 1 Byron Alyosha 66. Saxena, A., Chung, S. and Ng, A.Y. (2005) Learning Depth from Single Monocular Images, NIPS (Byron) 64. Hoiem, D., Efros, A.A. and Hebert, M. (2005) Geometric Context from a Single Image, ICCV (Alyosha) 67. Tu, Z., Chen, X., Yuille, A. and Zhu, S.C. (2005) Image Parsing: Unifying Segmentation, Detection, and Recognition, IJCV 68. Ren, X., Fowlkes, C. and Malik, J. (2006) Figure/Ground Assignment in Natural Images, ECCV 69. Cornelis, N., Leibe, B., Cornelis, K. and Van Gool, L. (2006) 3D City Modeling Using Cognitive Loops, 3DPVT Coming soon...

Face Modeling / Recognition

Date Presenter Paper title Slides
May 3 Andrew Minh Evaluator: Ralph 70. Sinha, P., Balas, B.J., Ostrovsky, Y., and Russell, R. (under review) Face recognition by humans: 20 results all computer vision researchers should know about (Andrew) 71. Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active Appearance Models, ECCV (Minh) (Evaluator: Ralph) Coming soon...

Final project presentations

Date Information
May 7 The presentations will be from 1:00 to 4:00 pm. The location is PH226A (that's Porter Hall). See here for updated information from the HUB (search for 16721).

Similar Courses

This course has been inspired by those offered by several of my colleagues. Here is a partial list:

Some tutorials, workshops and seminars:

Page created and maintained by Jean-Francois Lalonde (email: jlalonde at cs dot cmu dot edu)