Machine learning in Python with scikit-learn
What you will learn
At the end of this course, you will be able to:
- Grasp the fundamental concepts of machine learning
- Build a predictive modeling pipeline with scikit-learn
- Develop intuitions behind machine learning models from linear models to gradient-boosted decision trees
- Evaluate the statistical performance of your models
Description
Predictive modeling is a pillar of modern data science. In this field, scikit-learn is a central tool: it is easily accessible yet powerful, and it dovetails naturally with the wider ecosystem of data-science tools based on the Python programming language.
This course is an in-depth introduction to predictive modeling with scikit-learn. Step-by-step, didactic lessons introduce the fundamental methodological and software tools of machine learning, making the course a stepping stone to more advanced challenges in artificial intelligence, text mining, or data science.
The course is more than a cookbook: it will teach you to be critical about each step in the design of a predictive modeling pipeline, from choices in data preprocessing to choosing models, gaining insights into their failure modes, and interpreting their predictions.
The training will be essentially practical, focusing on examples of applications with code executed by the participants.
The MOOC is free of charge, and all the course materials are available at https://inria.github.io/scikit-learn-mooc/.
The authors of the course are scikit-learn core developers; they will be your guides throughout the training!
Format
The course will cover practical aspects through the use of Jupyter notebooks and regular exercises. Throughout the course, we will highlight scikit-learn best practices and give you the intuition to use scikit-learn in a methodologically sound way.
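As a rough illustration of the kind of code such notebooks tend to contain, the sketch below builds a small predictive pipeline and evaluates it with cross-validation against a trivial baseline. It is not taken from the course materials: the synthetic dataset, the choice of logistic regression, and all parameter values are assumptions made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numerical data (an assumption; the course uses its own datasets).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Chaining preprocessing and model in one pipeline means the scaler is
# re-fitted on the training part of each cross-validation split only.
model = make_pipeline(StandardScaler(), LogisticRegression())

# A baseline that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")

# 5-fold cross-validation; the default score for classifiers is accuracy.
model_scores = cross_val_score(model, X, y, cv=5)
baseline_scores = cross_val_score(baseline, X, y, cv=5)

print(f"Pipeline accuracy: {model_scores.mean():.3f} +/- {model_scores.std():.3f}")
print(f"Baseline accuracy: {baseline_scores.mean():.3f} +/- {baseline_scores.std():.3f}")
```

Keeping the preprocessing inside the pipeline, rather than scaling the full dataset up front, is one common way to avoid leaking information from the test folds into training.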
Prerequisites
The course aims to be accessible without a strong technical background. The requirements for this course are:
- basic knowledge of Python programming: defining variables, writing functions, importing modules
- some prior experience with the NumPy, pandas and Matplotlib libraries is recommended but not required
For a quick introduction to these libraries, you can use the following resources: Introduction to NumPy and Matplotlib by Sebastian Raschka and 10 minutes to pandas.
Assessment and certification
Students' work in the course is assessed through quizzes after the lessons and programming exercises at the end of every module.
An Open Badge for successful completion of the course will be issued on request to learners who obtain an overall score of 60% correct answers to all the quizzes and programming exercises.
Course plan
- Machine Learning concepts
- Fitting a scikit-learn model on numerical data
- Handling categorical data
- Overfitting and Underfitting
- Validation and learning curves
- Bias versus variance trade-off
- Intuitions on linear models
- Modelling a non-linear relationship between data and target
- Regularization in linear models
- Linear models for classification
- Intuitions on tree-based models
- Decision tree in classification
- Decision tree in regression
- Hyperparameters of decision tree
- Ensemble method using bootstrapping
- Ensemble based on boosting
- Hyperparameter tuning with ensemble methods
- Comparing a model with simple baselines
- Choice of cross-validation
Course team
Arturo Amor is an engineer at Inria. He is in charge of broadening the scikit-learn documentation's accessibility to all kinds of users.
Loïc Estève is a research engineer at Inria. He has been a scikit-learn core developer since 2016.
Olivier Grisel is a machine learning engineer at Inria. He has been a scikit-learn core developer since 2010.
Guillaume Lemaître is a research engineer at Inria. He has been a scikit-learn core developer since 2017.
Gaël Varoquaux is a research director at Inria. He is one of the creators of scikit-learn and the project manager for the scikit-learn consortium.
Thomas Schmitt is a machine learning engineer at Inria.
Organizations
Partnership
Hosting the Jupyter notebook execution environment for this MOOC.
Social networks
Follow us on Twitter @InriaLearnLab and feel free to use the #ScikitLearnMooc hashtag.
License
License for the course content
Attribution
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.