Sales Forecast Prediction - Python

Sales forecasting means estimating future sales from trends in historical data. It is an important part of business planning because it helps organizations make informed decisions about inventory management, marketing strategies and resource allocation. In this article we will build a sales forecast prediction model using Python.

Below is the step-by-step implementation of the sales prediction model.

1. **Importing Required Libraries**

Before starting, make sure the necessary libraries are installed. For this project we will use pandas, numpy, matplotlib, seaborn, xgboost and scikit-learn. You can install them using pip:

pip install pandas numpy matplotlib seaborn scikit-learn xgboost

```python
import numpy as np  # used later for np.sqrt when computing RMSE
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```

2. **Loading the Dataset**

For this we will use a sales dataset that contains columns such as Row ID, Order ID, Customer ID, Order Date and Sales. You can download the dataset from here.

```python
file_path = 'train.csv'
data = pd.read_csv(file_path)

# Show the first five rows
data.head()
```

**Output:**

[Image: Dataset (first rows returned by `data.head()`)]
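Before going further, it can help to confirm that the columns used in the later steps are actually present. The following is a minimal sketch, not part of the original walkthrough: the three-row CSV is made up and stands in for `train.csv`, and the names `sample_csv`, `required` and `missing_cols` are illustrative.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for train.csv (values are made up)
sample_csv = io.StringIO(
    "Row ID,Order ID,Order Date,Customer ID,Sales\n"
    "1,A-0001,08/11/2017,C-001,261.96\n"
    "2,A-0001,08/11/2017,C-001,731.94\n"
    "3,A-0002,12/06/2017,C-002,14.62\n"
)
sample = pd.read_csv(sample_csv)

# Columns that the preprocessing and lag steps depend on
required = ["Order Date", "Sales"]
missing_cols = [c for c in required if c not in sample.columns]

print(sample.dtypes)
print("Missing columns:", missing_cols)
print("Null counts:", sample[required].isna().sum().to_dict())
```

The same checks can be run on `data` after loading the real file; an empty `missing_cols` list and zero null counts mean the later steps have what they need.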

3. **Data Preprocessing and Visualization**

In this block, we will preprocess the data and visualize the sales trend over time.

```python
# Parse day/month/year strings into datetime values
data['Order Date'] = pd.to_datetime(data['Order Date'], format='%d/%m/%Y')

# Total sales per order date
sales_by_date = data.groupby('Order Date')['Sales'].sum().reset_index()

plt.figure(figsize=(12, 6))
plt.plot(sales_by_date['Order Date'], sales_by_date['Sales'], label='Sales', color='red')
plt.title('Sales Trend Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```

**Output:**

[Figure: Sales Trend Over Time]
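The effect of the explicit date format and the per-date aggregation is easier to see on a tiny example. This is a sketch with made-up values; the frame `orders` and the result `daily` are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical orders (made-up values); dates are day/month/year strings
orders = pd.DataFrame({
    "Order Date": ["08/11/2017", "08/11/2017", "12/06/2017"],
    "Sales": [100.0, 50.0, 25.0],
})

# With an explicit format, 08/11/2017 is read as 8 November, not 11 August
orders["Order Date"] = pd.to_datetime(orders["Order Date"], format="%d/%m/%Y")

# One row per date; groupby sorts the dates in ascending order by default
daily = orders.groupby("Order Date")["Sales"].sum().reset_index()
print(daily)
```

The two orders placed on 8 November collapse into a single row with their combined sales, and the June date comes first in the result.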

4. **Feature Engineering - Creating Lagged Features**

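Here we create lagged features to capture temporal patterns in the sales data. Each `lag_i` column holds the `Sales` value from `i` rows earlier, and the first rows, which have no complete history, are dropped. Note that the function is applied to the individual order rows of `data` in the order they appear in the file, not to the daily totals in `sales_by_date`, so `lag_1` is the previous row's sale rather than the previous day's total.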

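```python
def create_lagged_features(data, lag=1):
    """Return a copy of data with lag_1..lag_n columns built from 'Sales'."""
    lagged_data = data.copy()
    for i in range(1, lag + 1):
        lagged_data[f'lag_{i}'] = lagged_data['Sales'].shift(i)
    return lagged_data

lag = 5
sales_with_lags = create_lagged_features(data[['Order Date', 'Sales']], lag)

# The first `lag` rows contain NaN in at least one lag column
sales_with_lags = sales_with_lags.dropna()
```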
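If the lags are meant to follow calendar time, one option is to compute them on a date-sorted series with one row per day. The sketch below shows that variant on made-up daily totals; `add_lags` is a hypothetical helper, and using it would give different results from the ones reported in this article.

```python
import pandas as pd

# Hypothetical daily totals (made-up values), deliberately out of order
daily = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-01-03", "2017-01-01", "2017-01-02", "2017-01-04"]
    ),
    "Sales": [30.0, 10.0, 20.0, 40.0],
})

def add_lags(frame, column="Sales", lag=2):
    """Sort by date, then add lag_1..lag_n columns built from `column`."""
    out = frame.sort_values("Order Date").reset_index(drop=True)
    for i in range(1, lag + 1):
        out[f"lag_{i}"] = out[column].shift(i)
    # The first `lag` rows have no complete history, so drop them
    return out.dropna().reset_index(drop=True)

lagged = add_lags(daily, lag=2)
print(lagged)
```

Sorting before shifting is the important part: after it, `lag_1` on each row is the total of the preceding date in the series.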

5. **Preparing the Data for Training**

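In this step we prepare the data for training and testing. The lag columns become the features `X` and `Sales` is the target `y`. Passing `shuffle=False` keeps the rows in their original order, so the first 80% are used for training and the last 20% for testing.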

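```python
# Features: the lag columns; target: the current Sales value
X = sales_with_lags.drop(columns=['Order Date', 'Sales'])
y = sales_with_lags['Sales']

# shuffle=False preserves row order, so the test set is the last 20% of rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
```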
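What an unshuffled split does can be illustrated without any library. The helper `chronological_split` below is hypothetical and only mirrors the idea behind `shuffle=False`: the leading rows go to training and the trailing rows to testing.

```python
import numpy as np

def chronological_split(X, y, test_size=0.2):
    """Hypothetical helper: first rows for training, last rows for testing."""
    n_test = int(np.ceil(len(X) * test_size))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Ten made-up samples whose order stands in for time
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 10

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo, test_size=0.2)
print("train rows:", X_tr.ravel().tolist())
print("test rows:", X_te.ravel().tolist())
```

Because no test row precedes a training row, the model is evaluated only on data that comes after what it was trained on, which is the situation a forecast faces in practice.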

6. **Training the XGBoost Model**

Here we will train the **XGBoost** model. XGBoost is a gradient boosting library: it builds an ensemble of decision trees in sequence, with each new tree fitted to reduce the errors of the trees before it. It is widely used for regression tasks such as sales forecasting.

```python
model_xgb = xgb.XGBRegressor(
    objective='reg:squarederror',  # squared-error loss for regression
    n_estimators=100,              # number of boosted trees
    learning_rate=0.1,
    max_depth=5,
)
model_xgb.fit(X_train, y_train)
```

7. **Making Predictions and Evaluating the Model**

Here we make predictions and evaluate the model performance using RMSE.

```python
predictions_xgb = model_xgb.predict(X_test)

# np is NumPy (requires `import numpy as np`)
rmse_xgb = np.sqrt(mean_squared_error(y_test, predictions_xgb))

print(f"RMSE: {rmse_xgb:.2f}")
```

**Output:**

RMSE: 734.63

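The RMSE of 734.63 is the square root of the mean squared difference between the actual and predicted sales, expressed in the same units as `Sales`. A lower RMSE means the predictions are closer to the actual values, and large errors are penalized more heavily than small ones. Whether this score is acceptable depends on the scale of the sales values in the dataset, so it should be judged against typical sales amounts rather than in isolation.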
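One way to put an RMSE in context is to compare it with a naive baseline that predicts each value as the previous one. The sketch below computes RMSE by hand with NumPy on made-up numbers; `rmse`, `model_pred` and `naive_pred` are illustrative names and the values are not taken from the dataset.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up actual values and model predictions
actual = np.array([100.0, 120.0, 90.0, 110.0])
model_pred = np.array([105.0, 115.0, 95.0, 105.0])

# Naive baseline: each value is predicted as the previous actual value,
# so the first point has no prediction and is left out of both scores
naive_pred = actual[:-1]

model_rmse = rmse(actual[1:], model_pred[1:])
naive_rmse = rmse(actual[1:], naive_pred)

print(f"model RMSE: {model_rmse:.2f}")
print(f"naive RMSE: {naive_rmse:.2f}")
```

A model is only adding value if its RMSE is clearly below that of such a baseline on the same test rows.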

8. **Visualizing Results**

We will plot both the actual and predicted sales to visually compare the performance of the model.

```python
plt.figure(figsize=(12, 6))
plt.plot(y_test.index, y_test, label='Actual Sales', color='red')
plt.plot(y_test.index, predictions_xgb, label='Predicted Sales', color='green')
plt.title('Sales Forecasting using XGBoost')
# y_test.index is the row index of the test set, not a date
plt.xlabel('Row index')
plt.ylabel('Sales')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
```

**Output:**

[Figure: Sales Forecasting using XGBoost (actual vs. predicted sales)]

In the plot, the predicted values follow the actual values fairly closely, which suggests that the lag features carry useful signal, although a visual comparison alone does not prove the model's accuracy. Machine learning models such as XGBoost can improve sales forecasts by capturing temporal patterns in historical data, which in turn helps businesses optimize inventory, pricing and demand planning.