What Is a Data Science Pipeline?

Last Updated : 1 Jul, 2025

Data science is a field that focuses on extracting knowledge from large data sets. It covers preparing data, analyzing it and presenting the findings so that an organization can make informed decisions. A data science pipeline is the sequence of steps that transforms raw data from various sources into an understandable format, so that it can be stored and used for analysis.
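As a rough illustration of that definition, the sketch below takes raw records from two different sources (a CSV string and a JSON string), converts them into one consistent format and stores them in a table that can be queried for analysis. The data, the field names, the `sales` table and the extract/transform/load split are all assumptions made for this example; it shows one common way to organize such a pipeline, not a prescribed design.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs from two different sources (assumed formats and fields).
RAW_CSV = "id,name,amount\n1, Alice ,10.5\n2,Bob,\n"
RAW_JSON = '[{"id": 3, "name": "carol", "amount": "7"}]'


def extract(raw_csv, raw_json):
    """Parse each source into a list of plain dictionaries."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    rows.extend(json.loads(raw_json))
    return rows


def transform(rows):
    """Normalize types and name casing; skip records with a missing amount."""
    records = []
    for row in rows:
        raw_amount = row.get("amount")
        if raw_amount is None or str(raw_amount).strip() == "":
            continue  # incomplete record: no amount to analyze
        name = str(row["name"]).strip().title()
        records.append((int(row["id"]), name, float(raw_amount)))
    return records


def load(records, conn):
    """Store the cleaned records in a table so they can be queried later."""
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_csv, raw_json):
    """Run extract -> transform -> load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw_csv, raw_json)), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline(RAW_CSV, RAW_JSON)
    print(conn.execute("SELECT name, amount FROM sales ORDER BY id").fetchall())
```

Running it prints `[('Alice', 10.5), ('Carol', 7.0)]`: the record for Bob is dropped because its amount is missing, and the two sources end up in a single table with consistent types.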

Within a pipeline, the raw data passes through the following stages:

**Step 1: Problem Definition**

**Step 2: Data Collection**

**Step 3: Data Cleaning and Preprocessing**

**Step 4: Exploratory Data Analysis (EDA)**

**Step 5: Data Modeling**

**Step 6: Model Evaluation**

**Step 7: Deployment**

**Step 8: Monitoring and Maintenance**

**Step 9: Reporting**
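To make these stages more concrete, here is a minimal, hypothetical walk-through of Steps 1 to 6 and Step 9 in plain Python. The data values, the function names (`collect`, `clean`, `explore`, `fit`, `evaluate`), the straight-line least-squares model, the every-fourth-record holdout split and mean absolute error as the metric are all assumptions chosen for illustration. Real pipelines usually rely on libraries such as pandas and scikit-learn, and Deployment (Step 7) and Monitoring and Maintenance (Step 8) are not shown here.

```python
import math
import statistics

# Step 1, problem definition (assumed for this sketch): predict y from a single feature x.


def collect():
    """Step 2: stand-in for reading raw records from a real source."""
    # Hypothetical values; includes a duplicate and a missing target on purpose.
    return [(1, 3.1), (2, 4.9), (2, 4.9), (3, None), (4, 9.2),
            (5, 10.8), (6, 13.1), (7, 15.0), (8, 16.9), (9, 19.2)]


def clean(rows):
    """Step 3: drop records with a missing target and exact duplicates."""
    seen, cleaned = set(), []
    for x, y in rows:
        if y is None or (x, y) in seen:
            continue
        seen.add((x, y))
        cleaned.append((float(x), float(y)))
    return cleaned


def explore(rows):
    """Step 4: summary statistics and the Pearson correlation of x and y."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows)
    spread = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return {"n": len(rows), "mean_x": mx, "mean_y": my, "corr": cov / spread}


def fit(train):
    """Step 5: ordinary least-squares line y = slope * x + intercept."""
    mx = statistics.fmean([x for x, _ in train])
    my = statistics.fmean([y for _, y in train])
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate(model, test):
    """Step 6: mean absolute error on held-out records."""
    slope, intercept = model
    return statistics.fmean([abs(slope * x + intercept - y) for x, y in test])


def run():
    rows = clean(collect())
    summary = explore(rows)
    # Hold out every fourth cleaned record for evaluation (a simple deterministic split).
    train = [r for i, r in enumerate(rows) if i % 4 != 3]
    test = [r for i, r in enumerate(rows) if i % 4 == 3]
    model = fit(train)
    # Step 9: collect the findings into a report.
    return {**summary, "slope": model[0], "intercept": model[1], "test_mae": evaluate(model, test)}


if __name__ == "__main__":
    for key, value in run().items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Each stage is a separate function so that it can be tested or swapped out on its own, for example replacing `collect` with a database query or `fit` with a different model, without touching the other steps.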
