16-721 Learning-Based Methods in Vision
"The aim of computer vision is to overfit to our visual world" -- remark by Antonio Torralba (after his third beer)
Overview
A graduate seminar course in Computer Vision with emphasis on using large amounts of real data (images, video, textual annotations, user preferences, etc.) to learn the structure of our visual world, toward the ultimate goal of Image Understanding. We will be reading an eclectic mix of classic and recent papers on topics including theories of perception, low-level vision (color, texture), mid-level vision (grouping and segmentation), object and scene recognition, image parsing, words-and-pictures models, and image manifolds.
Prerequisite: 16-720 or similar Computer Vision course
We will meet on Tuesdays and Thursdays from 10:30am-11:50am in NSH 3002.
Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
Office Hours: Tuesdays at noon, Thursdays at 1:30pm
TA: Jean-Francois Lalonde, A521 Newell-Simon Hall.
Office Hours: Mondays and Wednesdays at 1:30pm (also by appointment if you can't make it: jlalonde at cs)
Projects
Check out this list of data sources for some ideas on where to get images to work with.
Challenges:
Class Schedule
A list of suggested papers to present is available here.
If you want to change your presentation date, please arrange a swap with another student and notify the instructor and the TA at least two weeks in advance.
Introduction
Part 1: Images
Learning Features from Data
Distributions of Features
Date
Presenter
Paper title
Slides
Feb. 1
Frederik, Jean-Francois
7. Rubner, Y., Tomasi, C. and Guibas, L.J. (2000) The Earth Mover's Distance as a Metric for Image Retrieval, IJCV (Frederik). Some code is available.
Additional reading: Levina, E. and Bickel, P.J. (2001) The Earth Mover's Distance is the Mallows Distance: Some Insights from Statistics, ICCV
8. Martin, Fowlkes and Malik (2004) Learning to Detect Natural Image Boundaries Using Local Brightness, Color and Texture Cues, PAMI (Jean-Francois). Some code and data are available. Short version: NIPS 2002
Rubner ppt, Martin pdf
Images as Texture ("Bag of Words" models)
Date
Presenter
Paper title
Slides
Feb. 6
Alyosha
9. Renninger, L.W. & Malik, J. (2004) When is scene recognition just texture recognition?, Vision Research (Alyosha). Data available
10. Csurka, G., Bray, C., Dance, C., and Fan, L. (2004) Visual categorization with bags of keypoints (Alyosha)
11. Winn, J., Criminisi, A. and Minka, T. (2005) Object Categorization by Learned Universal Visual Dictionary (Alyosha)
Coming soon...
Images as Scenes
Images as Feature Vectors
Date
Presenter
Paper title
Slides
Feb. 13
Google talk! (Henry Rowley)
Feb. 15
Alyosha
15. Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding, Science (Presenter: Alyosha, Evaluator: Ankur). Code available
16. Tenenbaum, J.B., De Silva, V. and Langford, J.C. (2000) A global geometric framework for nonlinear dimensionality reduction, Science (Presenter: Alyosha, Evaluator: Ankur). Code available
Manifolds ppt
Feb. 20
Devi (Evaluator: Ankur)
Ankur will evaluate papers 15 and 16.
Additional applications:
Pless, R. (2003) Using isomap to explore video sequences, ICCV (Devi)
Pless, R. and Simon, I. (2002) Using Thousands of Images of an Object, Computer Vision, Pattern Recognition and Image Processing (Devi)
Mohan, A., Winnemoller, H., Tumblin, J. and Gooch, B. (2005) Light Waving: Light Position Estimates from Photos Alone, Eurographics (Website) (Devi)
Isomap applications ppt
Feb. 22
Ralph
17. Tenenbaum & Freeman (2000) Separating Style and Content with Bilinear Models, Neural Computation (Ralph)
Coming soon...
Date
Presenter
Paper title
Slides
Feb. 27
Alyosha (Evaluator: Minh)
18. Learned-Miller, E. (2005) Data Driven Image Models through Continuous Joint Alignment, PAMI (Alyosha, Evaluator: Minh). Code available
Registration ppt
Mar. 1
Ankur (Evaluator: Byron)
19. Huttenlocher, D., Klanderman, G. and Rucklidge, W. (1993) Comparing Images Using the Hausdorff Distance, PAMI (Ankur)
20. Borgefors, G. (1988) Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm, PAMI (Ankur)
Huttenlocher & Felzenszwalb, P. (2004) Distance Transforms of Sampled Functions, Cornell Computing and Information Science Technical Report TR2004-1963 (Evaluator: Byron). Code available
Comparison ppt
Image Correspondence (Caltech-101-fest!)
Date
Presenter
Paper title
Slides
Mar. 6
Ross, Alyosha
21. Zhang, H., Berg, A., Maire, M. and Malik, J. (2006) SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, CVPR (Ross)
22. Frome, A., Singer, Y. and Malik, J. (2006) Image Retrieval and Recognition Using Local Distance Functions, NIPS (to appear) (Ross)
23. Berg, A., Berg, T. and Malik, J. (2005) Shape Matching and Object Recognition using Low Distortion Correspondences, CVPR (Alyosha)
Alternative approach: Leordeanu, M. and Hebert, M. (2005) A Spectral Technique for Correspondence Problems using Pairwise Constraints, ICCV
SVM-KNN ppt
Mar. 8
Special lecture by Andrew Zisserman!
Lots of Data is Fun!
Date
Presenter
Paper title
Slides
Mar. 13
No class: Spring break!
Mar. 15
No class: Spring break!
Mar. 20
Hongwen, Ross
24. Lazebnik, S., Schmid, C. and Ponce, J. (2006) Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR (Ross)
Background: Grauman, K. and Darrell, T. (2005) The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV
27. Sivic, J. and Zisserman, A. (2003) Video Google: A Text Retrieval Approach to Object Matching in Videos, ICCV (webpage) (Presented by A.Z. last class)
28. Nistér, D. and Stewénius, H. (2006) Scalable Recognition with a Vocabulary Tree (Hongwen)
Coming soon...
Mar. 22
Alyosha, Ralph
25. Zitnick & Kanade (2003) Content-free image retrieval, unpublished (Alyosha)
26. Berg, T., Berg, A., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E. and Forsyth, D.A. (in submission) Names and Faces (Ralph)
Coming soon...
Mar. 27
Devi, Jean-Francois
27. Dalal and Triggs (2005) Histograms of Oriented Gradients for Human Detection, CVPR (Devi). Data available
28. Marszalek, M. and Schmid, C. (2006) Spatial weighting for bag-of-features, CVPR (Devi)
29. Snavely, N., Seitz, S.M. and Szeliski, R. (2006) Photo tourism: Exploring photo collections in 3D, SIGGRAPH (webpage) (Jean-Francois)
Coming soon...
Boosting Background
Date
Presenter
Paper title
Slides
Mar. 29
Sebastian, Minh
30. AdaBoost background (Sebastian)
31. Friedman, J. H., Hastie, T. and Tibshirani, R. (1998) Additive Logistic Regression: a Statistical View of Boosting (Sebastian)
32. Schneiderman, H. and Kanade, T. (2004) Object Detection Using the Statistics of Parts, IJCV (Presenter: Minh, Evaluator: Andrew). Demo available
33. Viola, P. and Jones, M. (2001) Robust Real-time Object Detection, Second International Workshop on Statistical and Computational Theories of Vision (Presenter: Minh, Evaluator: Andrew). Short version available
Obj. detection ppt, Evaluation ppt
Part 2: Objects and Parts
Segmentation
Date
Presenter
Paper title
Slides
Apr. 3-5
Alyosha, Fred
34. Wertheimer, M. (1923) Laws of Organization in Perceptual Forms (Alyosha)
35. Weiss, Y. (1999) Segmentation using eigenvectors: a unifying view, ICCV (Fred)
36. Ng, A.Y., Jordan, M.I. and Weiss, Y. (2001) On Spectral Clustering: Analysis and an algorithm, NIPS (Fred)
Coming soon...
Apr. 10
Ross, Jean-Francois (Evaluator: Hongwen)
37. Tu and Zhu (2002) Image Segmentation by Data-Driven Markov Chain Monte Carlo, PAMI (Ross)
Coming soon...
Apr. 12
Jean-Francois (Evaluator: Hongwen)
38. Boykov and Jolly (2001) Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images, ICCV (Jean-Francois)
Application: Li, Y., Sun, J., Tang, C.K. and Shum, H. (2004) Lazy Snapping, SIGGRAPH (Jean-Francois, Evaluator: Hongwen)
Coming soon...
Grouping Repeated Structures
Date
Presenter
Paper title
Slides
Apr. 17
Ankur, Eakta
39. Boiman, O. and Irani, M. (2006) Similarity by Composition, NIPS (Ankur)
Coming soon...
Apr. 19
No classes (from academic calendar)
Apr. 24
Eakta, Alyosha (Evaluator: Fred)
40. Kannan, A., Winn, J. and Rother, C. (2006) Clustering appearance and shape by learning jigsaws, NIPS (Eakta)
41. Ren, X. and Malik, J. (2003) Learning a Classification Model for Segmentation, ICCV (Evaluator: Fred). Superpixel code available
42. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T. and Zisserman, A. (2006) Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, CVPR (Alyosha)
Coming soon...
From Features to Objects
Date
Presenter
Paper title
Slides
Apr. 26
Hongwen
44. Torralba, A., Murphy, K.P. and Freeman, W.T. (in press) Sharing visual features for multiclass and multiview object detection, PAMI (Hongwen)
45. Opelt, A., Pinz, A. and Zisserman, A. (2006) Incremental learning of object detectors using a visual shape alphabet, CVPR
46. Ferrari, V., Fevrier, L., Jurie, F. and Schmid, C. (2006) Groups of Adjacent Contour Segments for Object Detection, INRIA Technical Report
47. Leibe, B., Leonardis, A. and Schiele, B. (2004) Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV'04 Workshop on Statistical Learning in Computer Vision (Hongwen)
48. Leibe, B., Seemann, E. and Schiele, B. (2005) Pedestrian Detection in Crowded Scenes, CVPR
Coming soon...
Scenes, Context, and Image Parsing
Date
Presenter
Paper title
Slides
May 1
Byron, Alyosha
66. Saxena, A., Chung, S. and Ng, A.Y. (2005) Learning Depth from Single Monocular Images, NIPS (Byron)
64. Hoiem, D., Efros, A.A. and Hebert, M. (2005) Geometric Context from a Single Image, ICCV (Alyosha)
67. Tu, Z., Chen, X., Yuille, A. and Zhu, S.C. (2005) Image Parsing: Unifying Segmentation, Detection, and Recognition, IJCV
68. Ren, X., Fowlkes, C. and Malik, J. (2006) Figure/Ground Assignment in Natural Images, ECCV
69. Cornelis, N., Leibe, B., Cornelis, K. and Van Gool, L. (2006) 3D City Modeling Using Cognitive Loops, 3DPVT
Coming soon...
Face Modeling / Recognition
Date
Presenter
Paper title
Slides
May 3
Andrew, Minh (Evaluator: Ralph)
70. Sinha, P., Balas, B.J., Ostrovsky, Y., and Russell, R. (under review) Face recognition by humans: 20 results all computer vision researchers should know about (Andrew)
71. Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active Appearance Models, ECCV (Minh, Evaluator: Ralph)
Coming soon...
Final project presentations
Date
Information
May 7
The presentations will be from 1:00 to 4:00 pm. The location is PH226A (that's Porter Hall). See here for updated information from the HUB (search for 16721).
Similar Courses
This course has been inspired by courses offered by several of my colleagues. Here is a partial list:
Selected Topics in Vision & Learning (Serge Belongie, UCSD)
Learning and Inference in Vision (Bill Freeman, MIT)
Object Recognition (Kristen Grauman, Texas-Austin)
High-level Recognition in Computer Vision (Fei-Fei Li, Princeton)
Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
Recognition Problems in Computer Vision (Greg Mori, SFU)
Scene Understanding Seminar (Aude Oliva, MIT)
Visual Recognition (Pietro Perona, Caltech)
Vision and Learning (Jianbo Shi, UPenn)
Some tutorials, workshops and seminars:
Page created and maintained by Jean-Francois Lalonde (email: jlalonde at cs dot cmu dot edu)