Python Data Science – Real Python

Python Data Science Tutorials

“Data science” is about as broad a term as they come. It may be easiest to describe it by listing its more concrete components:

Data exploration & analysis.

Data visualization. A pretty self-explanatory name: taking data and turning it into something visual and colorful.

Classical machine learning. Conceptually, we could define this as any supervised or unsupervised learning task that is not deep learning (see below). Scikit-learn is far and away the go-to tool for implementing classification, regression, clustering, and dimensionality reduction, while StatsModels is less actively developed but still has a number of useful features.

Deep learning. This is a subset of machine learning that is seeing a renaissance and is commonly implemented with Keras, among other libraries. It has improved dramatically since the early 2010s, with milestones such as AlexNet in 2012, one of the first designs to stack convolutional layers directly on top of one another.

Data storage and big data frameworks. Big data is best defined as data that is either too large to reside on a single machine or can’t be processed without a distributed environment. The Python bindings to Apache technologies feature heavily here.

Odds and ends. This includes subtopics such as natural language processing and image manipulation with libraries such as OpenCV.
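To make the first component, data exploration and analysis, a little more concrete, here is a minimal sketch that uses only Python's standard library. The sample records, the field names, and the helpers summarize() and group_by() are all hypothetical and invented for illustration; real projects would more likely reach for a dedicated data analysis library.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a sequence of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def group_by(records, key, field):
    """Collect the values of `field` into lists keyed by the value of `key`."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record[field])
    return groups


# Hypothetical sample data, made up purely for this example.
records = [
    {"city": "A", "temp_c": 12.0},
    {"city": "A", "temp_c": 14.0},
    {"city": "A", "temp_c": 16.0},
    {"city": "B", "temp_c": 20.0},
    {"city": "B", "temp_c": 24.0},
]

# Explore the data: one summary per group.
for city, temps in group_by(records, "city", "temp_c").items():
    print(city, summarize(temps))
```

Grouping first and summarizing second mirrors the usual exploratory workflow: split the data by a category, then compare simple statistics across the groups before moving on to visualization or modeling.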

All data science tutorials at Real Python: