How to perform Causal Analysis?

Last Updated : 2 May, 2026

Causal analysis is a technique used to understand why something happens by identifying cause–effect relationships. It helps analyze how changes in one variable affect another, supporting better decision-making across various fields. Causal analysis helps answer key questions such as:

For example, increasing the price of a product may lead to a decrease in its demand. Here, price is the cause and demand is the effect. By analyzing data, we can determine whether this relationship truly exists and how strong the impact is.

Correlation & Causation

**Correlation:** Refers to a situation where two variables change together, but one does not necessarily cause the other.

**Causation:** Refers to a situation where a change in one variable directly causes a change in another.
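
The difference can be illustrated with simulated data. The following is a minimal sketch, not a definitive method: the variables (`temperature`, `ice_cream_sales`, `sunburn_cases`) and all coefficients are hypothetical and chosen only for illustration. Two outcomes are both driven by a third variable, so they correlate strongly even though neither causes the other. Once the shared driver is accounted for, the correlation largely disappears.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 5_000

# Hypothetical confounder: temperature drives both outcomes below.
temperature = rng.normal(loc=25.0, scale=5.0, size=n)
ice_cream_sales = 2.0 * temperature + rng.normal(scale=3.0, size=n)
sunburn_cases = 0.5 * temperature + rng.normal(scale=1.0, size=n)

# The raw correlation is strong although neither variable causes the other.
raw_corr = np.corrcoef(ice_cream_sales, sunburn_cases)[0, 1]


def residuals(y, x):
    """Return the part of y not explained by a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)


# Correlation after removing the influence of temperature from both variables.
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(sunburn_cases, temperature),
)[0, 1]

print("Raw correlation:", round(raw_corr, 3))
print("Correlation after adjusting for temperature:", round(partial_corr, 3))
```

In this simulated setting the raw correlation is high, while the adjusted correlation is close to zero, which is the pattern expected when a relationship is driven by a common cause.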

**Key Concepts**

Steps to Perform

  1. **Defining the Problem:** Clearly identify the issue to be analyzed, as this sets the foundation for the process.
  2. **Identifying Variables:** Break the problem into the key variables that can influence the outcome.
  3. **Collection of Data:** Gather relevant and reliable data using methods such as surveys, experiments, or existing datasets.
  4. **Establishing Relationships:** Determine how the variables are related using appropriate tools or methods.
  5. **Distinguishing Correlation from Causation:** Check whether each relationship is genuinely causal and not just coincidental.
  6. **Considering Confounding Variables:** Identify other factors that may influence the relationship and affect the results.
  7. **Interpreting the Results:** Analyze the findings to draw meaningful conclusions and support decision-making.
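
Steps 5 and 6 can be sketched with simulated data where the true effect is known in advance. This is an illustrative sketch under stated assumptions: the data-generating process, the confounder `party_size`, the helper `ols`, and every coefficient are hypothetical. A regression that ignores the confounder overstates the effect, while one that includes it recovers a value close to the true effect.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10_000

# Hypothetical data-generating process (all numbers are assumptions):
# party_size influences both the bill and the tip, so it is a confounder.
party_size = rng.integers(1, 7, size=n).astype(float)  # values 1..6
bill = 12.0 * party_size + rng.normal(scale=5.0, size=n)
true_effect = 0.10
tip = true_effect * bill + 0.50 * party_size + rng.normal(scale=1.0, size=n)


def ols(y, *predictors):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Regression of tip on bill only: the slope absorbs part of party_size's effect.
naive_slope = ols(tip, bill)[1]

# Regression of tip on bill and party_size: the confounder is held fixed.
adjusted_slope = ols(tip, bill, party_size)[1]

print("True effect:    ", true_effect)
print("Naive slope:    ", round(naive_slope, 3))
print("Adjusted slope: ", round(adjusted_slope, 3))
```

Adjusting this way only works for confounders that are actually measured and included in the model, so the choice of variables in step 2 matters.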

**Common Methods**

Implementation

Suppose we want to understand how a customer’s total bill influences the tip amount. The goal is to analyze whether an increase in total bill leads to a higher tip using causal analysis.

The dataset used is publicly available and contains information about restaurant bills and tips. It is the `tips.csv` file from the seaborn-data repository and is loaded directly from its URL in the loading step.

It includes variables such as `total_bill` and `tip`, the two columns used in this analysis.

**1. Importing Libraries**

Importing libraries like pandas, matplotlib and statsmodels.

```python
# Importing libraries
import pandas as pd               # for data handling
import matplotlib.pyplot as plt   # for plotting graphs
import statsmodels.api as sm      # to study the relationship between variables
```

**2. Loading the Dataset**

Loading the dataset directly from the URL and viewing the data.

```python
data = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")

# selecting relevant columns
data = data[["total_bill", "tip"]]

# renaming for simplicity: total_bill becomes "price", tip becomes "demand"
data.columns = ["price", "demand"]

print(data.head())
```

**3. Visualizing the Relationship**

Plotting the relationship between bill amount and tip.

```python
# scatter plot of total bill (price) against tip (demand)
plt.scatter(data["price"], data["demand"])

plt.xlabel("Total Bill (Price)")
plt.ylabel("Tip (Demand)")
plt.title("Price vs Demand")

plt.show()
```


Total Bill vs Tip Scatter Plot

**4. Applying Regression Analysis**

Building a regression model to analyze how one variable affects the other.

```python
X = data["price"]   # independent variable
y = data["demand"]  # dependent variable

# add an intercept term to the model
X = sm.add_constant(X)

# fit an ordinary least squares (OLS) regression
model = sm.OLS(y, X).fit()
```
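
To make clearer what the OLS fit computes, here is a minimal sketch for the single-predictor case, where the slope equals the covariance of the two variables divided by the variance of the predictor. The `bill` and `tip` arrays are hypothetical values for illustration only and are not taken from `tips.csv`. The result is cross-checked against NumPy's general least-squares solver.

```python
import numpy as np

# Hypothetical values for illustration only (not from the tips dataset).
bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([2.0, 3.0, 5.0, 6.0, 9.0])

# Closed-form simple regression: slope = cov(x, y) / var(x).
slope = np.cov(bill, tip, ddof=1)[0, 1] / np.var(bill, ddof=1)
intercept = tip.mean() - slope * bill.mean()

# Cross-check with least squares on the design matrix [1, bill],
# which mirrors adding a constant column before fitting.
design = np.column_stack([np.ones_like(bill), bill])
coef, *_ = np.linalg.lstsq(design, tip, rcond=None)

print("Slope:", slope)
print("Intercept:", intercept)
print("Least-squares [intercept, slope]:", coef)
```

Both routes give the same line, which is why the constant column is added before fitting: without it, the model would be forced through the origin.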

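**5. Understanding the Output**

The fitted model's output helps interpret how strongly and how significantly the bill amount is related to the tip.

```python
print("Coefficient:", model.params["price"])   # estimated slope for total bill
print("P-value:", model.pvalues["price"])      # significance of the slope
print("R-squared:", model.rsquared)            # share of tip variation explained
```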

**Output:**

Coefficient: 0.1050245173843534

P-value: 6.692470646863736e-34

R-squared: 0.45661658635167657

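Higher total bills are usually associated with higher tips. The coefficient of about 0.105 means that each additional unit of total bill is associated with roughly 0.105 more units of tip. The very small p-value (about 6.7e-34) indicates that this association is statistically significant, and the R-squared of about 0.457 means that the total bill explains roughly 46% of the variation in tips. A regression on observational data does not prove causation by itself, so confounding variables should still be considered before treating the relationship as causal.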
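
For a regression with a single predictor, R-squared equals the square of the Pearson correlation between the two variables, so the reported R-squared of about 0.457 corresponds to a correlation of roughly 0.68. The sketch below checks this identity on simulated data. The variables `x` and `y` and the coefficients used to generate them are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical predictor and outcome with a noisy linear relationship.
x = rng.uniform(5.0, 50.0, size=200)
y = 1.0 + 0.1 * x + rng.normal(scale=1.0, size=200)

# Fit a straight line and compute R-squared from its definition.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept
ss_res = np.sum((y - fitted) ** 2)    # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
r_squared = 1.0 - ss_res / ss_tot

# Pearson correlation between predictor and outcome.
pearson_r = np.corrcoef(x, y)[0, 1]

print("R-squared:", r_squared)
print("Squared correlation:", pearson_r ** 2)
```

This identity holds only when there is one predictor and an intercept. With several predictors, R-squared measures the fit of the whole model instead.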

**Advantages**

**Limitations**

**Applications**