Quantile-Quantile Plots
Last Updated : 6 Apr, 2026
The Quantile-Quantile (Q–Q) plot is a graphical method used to determine whether a dataset follows a specific probability distribution or whether two datasets come from the same population. It is particularly useful for assessing whether data is normally distributed or follows another known distribution.
- Plots sample quantiles against theoretical quantiles (or another dataset’s quantiles)
- A roughly straight line indicates a good match between distributions
- Deviations from the line reveal skewness, heavy or light tails or outliers
- Widely used to check distributional assumptions in statistical modeling
- Useful in data analysis, hypothesis testing and quality control
Quantiles And Percentiles
Quantiles are points in a dataset that divide the data into intervals containing equal probabilities or proportions of the total distribution. They are often used to describe the spread or distribution of a dataset. The most common quantiles are:
- **Median (50th percentile):** The median is the middle value of a dataset when it is ordered from smallest to largest. It divides the dataset into two equal halves.
- **Quartiles (25th, 50th and 75th percentiles):** Quartiles divide the dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median and the third quartile (Q3) is the value below which 75% of the data falls.
- **Percentiles:** Percentiles are similar to quartiles but divide the dataset into 100 equal parts. For example, the 90th percentile is the value below which 90% of the data falls.
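As a minimal sketch of these definitions, the quantiles above can be computed with NumPy's `np.quantile` and `np.percentile`. The array `values` is made up for illustration, and NumPy's default linear interpolation between neighbouring observations is assumed; other quantile conventions give slightly different numbers.

```python
import numpy as np

# Small illustrative dataset (values are invented for this example)
values = np.array([2, 4, 4, 5, 7, 9, 10, 12, 15, 21])

# Median: the 0.5 quantile
median = np.quantile(values, 0.5)

# Quartiles: the 0.25, 0.5 and 0.75 quantiles
q1, q2, q3 = np.quantile(values, [0.25, 0.5, 0.75])

# 90th percentile: np.percentile takes the probability on a 0-100 scale
p90 = np.percentile(values, 90)

print("median:", median)
print("quartiles:", q1, q2, q3)
print("90th percentile:", p90)
```

`np.quantile(values, 0.9)` and `np.percentile(values, 90)` describe the same point; only the scale of the probability argument differs.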
**Note:**
- A Q-Q plot is a plot of the quantiles of the first dataset against the quantiles of the second dataset.
- For reference, a 45-degree line is plotted. If the samples come from the same distribution, the points will lie approximately along this line.
How to Draw Q-Q plot
To draw a Quantile-Quantile (Q-Q) plot, you can follow these steps:
**1. Collect the Data:** Gather the dataset for which you want to create the Q-Q plot. Ensure that the data are numerical and represent a random sample from the population of interest.
**2. Sort the Data:** Arrange the data in ascending order. The i-th smallest value is the i-th sample quantile, so the ordering determines which theoretical quantile each observation is paired with.
**3. Choose a Theoretical Distribution:** Determine the theoretical distribution against which you want to compare your dataset. Common choices include the normal distribution, the exponential distribution or any other distribution you expect the data to follow.
**4. Calculate Theoretical Quantiles:** Compute the quantiles of the chosen theoretical distribution at the probabilities that correspond to the sorted observations. For example, if you're comparing against a normal distribution, you would use the inverse cumulative distribution function (CDF) of the normal distribution to find the expected quantiles.
**5. Plotting:**
- Plot the theoretical quantiles on the x-axis.
- Plot the corresponding sorted dataset values on the y-axis.
- Each data point (x, y) represents a pair of expected and observed values.
- Draw the points as a scatter plot and add a straight reference line to visually inspect the relationship between the dataset and the theoretical distribution.
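The steps above can be sketched by hand with NumPy and SciPy. This is an illustration under stated assumptions, not the only way to build the plot: the sample is simulated, and the plotting positions `(i - 0.5) / n` are one common convention chosen here (library functions such as `scipy.stats.probplot` use a slightly different formula).

```python
import numpy as np
from scipy import stats

# Step 1: a simulated sample stands in for real data (assumption)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0, scale=1, size=200)

# Step 2: sort ascending so the i-th value is the i-th sample quantile
ordered = np.sort(sample)
n = ordered.size

# Steps 3-4: normal reference; probabilities (i - 0.5) / n for i = 1..n,
# mapped through the inverse CDF (ppf) to get theoretical quantiles
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = stats.norm.ppf(probs)

# Step 5: (theoretical[i], ordered[i]) are the points of the Q-Q plot
for x, y in list(zip(theoretical, ordered))[:3]:
    print(f"theoretical={x:.3f}  observed={y:.3f}")
```

To draw the plot, the two arrays can be passed to `plt.scatter(theoretical, ordered)`, with a reference line added on top.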
Interpretation of Q-Q plot
- If the points on the plot fall approximately along a straight line, it suggests that your dataset follows the assumed distribution.
- Deviations from the straight line indicate departures from the assumed distribution, requiring further investigation.
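Straightness can also be summarised numerically. As a rough sketch, `scipy.stats.probplot` returns the correlation coefficient `r` of the least-squares line through the Q-Q points; values near 1 mean the points are close to a line. The two simulated samples below are assumptions for illustration, and `r` is only an informal summary here, not a formal test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0, scale=1, size=500)
skewed_sample = rng.exponential(scale=1, size=500)

# Without a plot argument, probplot only computes the points and the fit:
# ((theoretical, ordered), (slope, intercept, r))
_, (_, _, r_normal) = stats.probplot(normal_sample, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_sample, dist="norm")

print(f"r for normal sample: {r_normal:.4f}")
print(f"r for skewed sample: {r_skewed:.4f}")
```

The skewed sample should give a visibly lower `r`, matching the curved pattern it produces in the plot.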
Exploring Distribution Similarity with Q-Q Plots
Q-Q plots visually assess whether two datasets follow the same distribution by comparing their quantiles.
- Comparing datasets helps determine if they can be merged to improve estimation of parameters like location and scale.
- A Q-Q plot is constructed by plotting quantiles of one dataset against corresponding quantiles of another.
- Points lying close to a diagonal line indicate that the two distributions are similar.
- Deviations from the diagonal suggest differences in shape, spread or tail behavior.
- Tests such as chi-square and Kolmogorov–Smirnov assess overall distribution differences statistically.
- These tests may not reveal where specific distributional differences occur.
- Q-Q plots provide a detailed visual comparison by examining quantile-by-quantile differences.
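A two-sample comparison can be sketched as follows. Because the datasets may differ in size, both are evaluated at the same grid of probabilities; the grid from 0.01 to 0.99 and the simulated samples are assumptions made for this example. `scipy.stats.ks_2samp` is shown alongside as the overall test mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=0, scale=1, size=300)   # first dataset
b = rng.normal(loc=0, scale=1, size=500)   # second dataset, different size

# Evaluate both samples at a common set of probabilities
probs = np.linspace(0.01, 0.99, 99)
qa = np.quantile(a, probs)
qb = np.quantile(b, probs)

# Quantile-by-quantile gap: shows where the two distributions differ
gaps = qa - qb
worst = np.argmax(np.abs(gaps))
print(f"largest gap {gaps[worst]:.3f} at probability {probs[worst]:.2f}")

# Overall two-sample Kolmogorov-Smirnov test for comparison
ks = stats.ks_2samp(a, b)
print(f"KS statistic={ks.statistic:.3f}, p-value={ks.pvalue:.3f}")
```

Plotting `qa` against `qb` with the line y = x gives the two-sample Q-Q plot; the `gaps` array locates the differences that a single test statistic does not.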
Python Implementation Of Q-Q Plot
Here we create a Q-Q plot to compare a sample dataset with a theoretical normal distribution. It helps visually assess whether the data follows a normal distribution.
```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate example data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Create Q-Q plot
stats.probplot(data, dist="norm", plot=plt)
plt.title('Normal Q-Q plot')
plt.xlabel('Theoretical quantiles')
plt.ylabel('Ordered Values')
plt.grid(True)
plt.show()
```
**Output:**
Figure: Q-Q plot
Because the data points lie approximately along a straight line in the Q-Q plot, the dataset is consistent with the assumed theoretical distribution, which here is the normal distribution.
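The fitted line carries information too. With a standard normal reference, the slope and intercept of the least-squares line returned by `scipy.stats.probplot` roughly estimate the scale and location of the data. The sketch below assumes a simulated sample with mean 5 and standard deviation 2 to make this visible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
shifted = rng.normal(loc=5, scale=2, size=1000)

# Fit of ordered values against standard normal quantiles
_, (slope, intercept, r) = stats.probplot(shifted, dist="norm")

print(f"slope     = {slope:.2f} (sample std  = {shifted.std(ddof=1):.2f})")
print(f"intercept = {intercept:.2f} (sample mean = {shifted.mean():.2f})")
```

A straight Q-Q line with a slope far from 1 or an intercept far from 0 therefore still indicates a normal shape, just with a different spread or centre than the standard normal.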
Types of Q-Q plots
A Q-Q plot against a normal reference takes a characteristic shape depending on the distribution of the data. The most common patterns are:
- **Normal Distribution:** A symmetric distribution where the points lie approximately along a straight diagonal line.
- **Right-skewed Distribution:** The points curve upward away from the line at the upper end, giving a convex shape that indicates a longer tail on the right side.
- **Left-skewed Distribution:** The points curve downward away from the line at the lower end, giving a concave shape that indicates a longer tail on the left side.
- **Under-dispersed Distribution:** The data have less spread than the reference. Against a standard normal, the points follow a shallower trend than the 45-degree line, and truncated or light tails make the ends flatten out.
- **Over-dispersed Distribution:** The data have more spread than the reference. Against a standard normal, the points follow a steeper trend than the 45-degree line, indicating higher variance or dispersion than the theoretical distribution.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

np.random.seed(0)  # make the example reproducible

# Generate one sample per pattern
normal_data = np.random.normal(loc=0, scale=1, size=1000)
right_skewed_data = np.random.exponential(scale=1, size=1000)
left_skewed_data = -np.random.exponential(scale=1, size=1000)
under_dispersed_data = np.random.normal(loc=0, scale=0.5, size=1000)
under_dispersed_data = under_dispersed_data[
    (under_dispersed_data > -1) & (under_dispersed_data < 1)]  # Truncate
over_dispersed_data = np.concatenate((
    np.random.normal(loc=-2, scale=1, size=500),
    np.random.normal(loc=2, scale=1, size=500)))

# Create Q-Q plots. Every panel uses a normal reference so that
# skewness and dispersion show up as departures from the line.
datasets = [
    (normal_data, 'Normal Distribution'),
    (right_skewed_data, 'Right-skewed Distribution'),
    (left_skewed_data, 'Left-skewed Distribution'),
    (under_dispersed_data, 'Under-dispersed Distribution'),
    (over_dispersed_data, 'Over-dispersed Distribution'),
]

plt.figure(figsize=(15, 10))
for i, (sample, label) in enumerate(datasets, start=1):
    plt.subplot(2, 3, i)
    stats.probplot(sample, dist="norm", plot=plt)
    plt.title(f'Q-Q Plot - {label}')

plt.tight_layout()
plt.show()
```
**Output:**
Figure: Q-Q plot for different distributions
Applications
- Checking distribution assumptions
- Detecting outliers
- Comparing datasets
- Assessing normality
- Validating statistical models
Advantages
- Easy visual comparison of distributions
- Works for datasets of different sizes
- Helps identify deviations and outliers
- Useful for model validation