Topics in Artificial Intelligence -- Machine Learning with Large-scale Data
General Information
Time: Mondays 12:00 - 3:00 PM | Place: CBIM 22 |
---|---|
Instructor: Tina Eliassi-Rad | Office hours: Mondays 3:00 - 4:00 PM in CBIM 08 |
Course number: 16:198:598 | Credits: 3 |
Overview
This graduate-level course covers machine-learning algorithms, programming environments, and software frameworks that are designed to effectively deal with large-scale (i.e., big) data.
Prerequisites: A previous course on machine learning or data mining. A strong knowledge of algorithms and programming (Java, C, and scripting/dynamic languages).
Textbook
- Scaling up Machine Learning: Parallel and Distributed Approaches. Edited by Ron Bekkerman, Mikhail Bilenko, and John Langford. Cambridge University Press, December 30, 2011
Resources
- (textbook) Kevin Murphy, Machine Learning: A Probabilistic Perspective. ISBN 0262018020, MIT Press, 2012.
- (textbook) Christopher Bishop, Pattern Recognition and Machine Learning. ISBN 0387310738, Springer, 2006.
- (textbook) Tom Mitchell, Machine Learning. ISBN 0070428077, McGraw-Hill, 1997.
- (textbook, free on-line) Trevor Hastie, Robert Tibshirani and Jerome Friedman, Elements of Statistical Learning. ISBN 0387952845, Springer, 2009 (2nd edition).
- (textbook, free on-line) David MacKay, Information Theory, Inference, and Learning Algorithms. ISBN 0521642981, Cambridge University Press, 2003.
- (textbook, free on-line) Roberto Battiti and Mauro Brunato, The LION Way: Machine Learning plus Intelligent Optimization. Lionsolver, Inc., 2013.
- Probability Review (David Blei, Princeton)
- Probability Theory Review (Arian Maleki and Tom Do, Stanford)
- Linear Algebra Tutorial (C.T. Abdallah, Penn)
- Linear Algebra Review and Reference (Zico Kolter and Chuong Do, Stanford)
- Statistical Data Mining Tutorials (Andrew Moore, Google/CMU)
- Theoretical CS Cheat Sheet (Princeton)
Grading
You will be evaluated based on student presentations (40%) and a substantial semester-long project (60%). The project must include at least one big data set, at least one learning/mining algorithm, and a real-world application. For the project, you will need to prepare a proposal, give a presentation at the end of the semester, and write a final report. More details will be provided in class.
Notes, Policies, and Guidelines
- We will use the class Sakai site for announcements, assignments, and your contributions.
- When emailing me about the course, begin the subject line with [f14 cs598].
- For your Hadoop-based jobs, you can use the DCS Hadoop cluster. For big non-Hadoop jobs, you can use aurora.cs. If you don't have accounts on these machines, let me know.
- Course projects must be done individually.
- Any regrading request must be submitted in writing within one week of the material being returned. The request must describe the grading error precisely and concisely.
- Refresh your knowledge of the university's policies on academic integrity and plagiarism. There is zero tolerance for cheating!
Schedule / Syllabus (Subject to Change)
Some Similar Courses in Other Universities
- Machine Learning with Large Datasets by William Cohen (taught Spring 2012, Spring 2013 and Spring 2014)
- Big Data: Large Scale Machine Learning by John Langford and Yann LeCun (taught Spring 2013)
- Machine Learning for Big Data / Statistics for Big Data by Carlos Guestrin and Emily Fox (taught Winter 2013)