Quantile-Quantile Plots

Last Updated : 6 Apr, 2026

The Quantile-Quantile (Q–Q) plot is a graphical method used to determine whether a dataset follows a specific probability distribution or whether two datasets come from the same population. It is particularly useful for assessing whether data is normally distributed or follows another known distribution.

Quantiles And Percentiles

Quantiles are points in a dataset that divide the data into intervals containing equal probabilities or proportions of the total distribution. They are often used to describe the spread or distribution of a dataset. The most common quantiles are:

  1. **Median (50th percentile):** The median is the middle value of a dataset when it is ordered from smallest to largest. It divides the dataset into two equal halves.
  2. **Quartiles (25th, 50th and 75th percentiles):** Quartiles divide the dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median and the third quartile (Q3) is the value below which 75% of the data falls.
  3. **Percentiles:** Percentiles are similar to quartiles but divide the dataset into 100 equal parts. For example, the 90th percentile is the value below which 90% of the data falls.
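
As a quick illustration of these definitions, the following sketch computes the median, the quartiles and the 90th percentile with NumPy. The sample (the integers 1 to 100) is a hypothetical dataset chosen for readability, and the values rely on NumPy's default linear-interpolation rule; other quantile conventions can give slightly different results.

```python
import numpy as np

# Hypothetical sample: the integers 1..100 (an assumption for illustration)
values = np.arange(1, 101)

# Median is the 0.5 quantile (50th percentile)
median = np.quantile(values, 0.5)

# Quartiles are the 0.25, 0.5 and 0.75 quantiles
q1, q2, q3 = np.quantile(values, [0.25, 0.5, 0.75])

# np.percentile takes the level on a 0-100 scale instead of 0-1
p90 = np.percentile(values, 90)

print("median:", median)
print("quartiles:", q1, q2, q3)
print("90th percentile:", p90)
```

Note that `np.quantile(values, 0.9)` and `np.percentile(values, 90)` describe the same point; only the scale of the level differs.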

How to Draw Q-Q plot

To draw a Quantile-Quantile (Q-Q) plot, you can follow these steps:

**1. Collect the Data:** Gather the dataset for which you want to create the Q-Q plot. Ensure that the data are numerical and represent a random sample from the population of interest.

**2. Sort the Data:** Arrange the data in ascending order (the usual convention), so that each sorted value can be paired with its corresponding quantile.

**3. Choose a Theoretical Distribution:** Determine the theoretical distribution against which you want to compare your dataset. Common choices include the normal distribution, the exponential distribution or any other distribution you expect the data to follow.

**4. Calculate Theoretical Quantiles:** Compute the quantiles of the chosen theoretical distribution. For example, if you're comparing against a normal distribution, you would use the inverse cumulative distribution function (the inverse CDF, also called the quantile function) of the normal distribution to find the expected quantiles.

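**5. Plotting:** Plot the sorted data values (y-axis) against the corresponding theoretical quantiles (x-axis). If the data follow the chosen distribution, the points fall approximately on a straight line.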
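
The steps above can be sketched by hand without a plotting helper. This is a minimal illustration, not a definitive implementation: the sample is synthetic, and the plotting positions `(i - 0.5) / n` are one common convention among several, assumed here for simplicity.

```python
import numpy as np
from scipy import stats

# Step 1: a hypothetical sample (synthetic data, assumed for illustration)
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)

# Step 2: sort the data in ascending order
sample_quantiles = np.sort(data)

# Step 3: choose the reference distribution (standard normal here)
# Step 4: theoretical quantiles from the inverse CDF (ppf) evaluated at
# the plotting positions (i - 0.5) / n, which is an assumed convention
n = sample_quantiles.size
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical_quantiles = stats.norm.ppf(probs)

# Step 5: the Q-Q plot is the set of (theoretical, sample) pairs.
# A least-squares line through them summarizes location and scale.
slope, intercept = np.polyfit(theoretical_quantiles, sample_quantiles, 1)
r = np.corrcoef(theoretical_quantiles, sample_quantiles)[0, 1]

print("slope (approx. sample std):", slope)
print("intercept (approx. sample mean):", intercept)
print("correlation of the Q-Q points:", r)
```

Passing the two arrays to `plt.scatter(theoretical_quantiles, sample_quantiles)` would draw the plot itself. Because the reference is the standard normal, the slope and intercept of the fitted line roughly estimate the sample's standard deviation and mean.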

Interpretation of Q-Q plot

Exploring Distribution Similarity with Q-Q Plots

Q-Q plots visually assess whether two datasets follow the same distribution by comparing their quantiles.
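
A two-sample comparison can be sketched by evaluating the quantiles of both datasets at the same probability levels. The datasets, the seed and the helper name `max_quantile_gap` are hypothetical choices for this illustration; the gap it reports is only a rough numerical summary, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical datasets: two from the same distribution, one from another
sample_a = rng.normal(loc=0, scale=1, size=500)
sample_b = rng.normal(loc=0, scale=1, size=300)
sample_c = rng.exponential(scale=1, size=300)

# Evaluate every dataset at the same probability levels, so samples
# of different sizes can still be paired quantile by quantile
probs = np.linspace(0.01, 0.99, 99)
q_a = np.quantile(sample_a, probs)
q_b = np.quantile(sample_b, probs)
q_c = np.quantile(sample_c, probs)

def max_quantile_gap(q_x, q_y):
    """Largest vertical distance of the Q-Q points from the line y = x."""
    return np.max(np.abs(q_x - q_y))

gap_same = max_quantile_gap(q_a, q_b)   # same distribution
gap_diff = max_quantile_gap(q_a, q_c)   # different distributions

print("max gap, same distribution:", gap_same)
print("max gap, different distributions:", gap_diff)
```

Plotting `q_a` against `q_b` would give points near the line y = x, while `q_a` against `q_c` would bend away from it.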

Python Implementation Of Q-Q Plot

Here we create a Q-Q plot to compare a sample dataset with a theoretical normal distribution. It helps visually assess whether the data follows a normal distribution.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate example data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Create Q-Q plot
stats.probplot(data, dist="norm", plot=plt)
plt.title('Normal Q-Q plot')
plt.xlabel('Theoretical quantiles')
plt.ylabel('Ordered Values')
plt.grid(True)
plt.show()
```

**Output:**

[Figure: Q-Q plot]

Because the data points lie close to a straight line in the Q-Q plot, the dataset is consistent with the assumed theoretical distribution, which in this case is the normal distribution.
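
The visual impression can be complemented with the fit statistics that `scipy.stats.probplot` returns when no plot is requested. The sketch below regenerates the same kind of sample; treating a correlation close to 1 as "approximately linear" is an informal rule of thumb assumed here, not a formal normality test.

```python
import numpy as np
import scipy.stats as stats

# Same kind of example data as in the plot above
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Without plot=..., probplot returns the Q-Q points and a least-squares fit:
# (theoretical quantiles, ordered values), (slope, intercept, r)
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")

print("slope:", slope)          # close to the sample standard deviation
print("intercept:", intercept)  # close to the sample mean
print("r:", r)                  # correlation of the Q-Q points with the line
```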

Types of Q-Q plots

A Q-Q plot takes on a characteristic shape depending on how the data differ from the reference distribution. The most common patterns are:

  1. **Normal Distribution:** A symmetric distribution where the Q-Q plot shows points approximately along a straight diagonal line if the data adhere to a normal distribution.
  2. **Right-skewed Distribution:** A distribution where the observed quantiles deviate from the straight line towards the upper end, indicating a longer tail on the right side.
  3. **Left-skewed Distribution:** A distribution where the observed quantiles deviate from the straight line towards the lower end, indicating a longer tail on the left side.
  4. **Under-dispersed Distribution:** A distribution with lower variance than the theoretical distribution. The points trace a slope shallower than the line y = x, and if the tails are also shorter, the points flatten at both ends.
  5. **Over-dispersed Distribution:** A distribution with higher variance or dispersion than the theoretical distribution. The points are more spread out and trace a slope steeper than the line y = x.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

np.random.seed(0)  # fixed seed so the figure is reproducible

# One sample per pattern
normal_data = np.random.normal(loc=0, scale=1, size=1000)
right_skewed_data = np.random.exponential(scale=1, size=1000)
left_skewed_data = -np.random.exponential(scale=1, size=1000)

# Narrow normal sample, truncated to (-1, 1) to shorten the tails
under_dispersed_data = np.random.normal(loc=0, scale=0.5, size=1000)
under_dispersed_data = under_dispersed_data[(under_dispersed_data > -1) & (under_dispersed_data < 1)]

# Mixture of two normals centred at -2 and 2, giving a wider spread
over_dispersed_data = np.concatenate((np.random.normal(loc=-2, scale=1, size=500),
                                      np.random.normal(loc=2, scale=1, size=500)))

# Create Q-Q plots. Every panel uses the normal distribution as the
# reference so that the skew and dispersion patterns are visible as
# departures from the straight line.
plt.figure(figsize=(15, 10))

plt.subplot(2, 3, 1)
stats.probplot(normal_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Normal Distribution')

plt.subplot(2, 3, 2)
stats.probplot(right_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Right-skewed Distribution')

plt.subplot(2, 3, 3)
stats.probplot(left_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Left-skewed Distribution')

plt.subplot(2, 3, 4)
stats.probplot(under_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Under-dispersed Distribution')

plt.subplot(2, 3, 5)
stats.probplot(over_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Over-dispersed Distribution')

plt.tight_layout()
plt.show()
```

**Output:**

[Figure: Q-Q plot for different distributions]
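
The effect of dispersion can also be checked numerically. In this sketch, which uses hypothetical synthetic samples, the slope of the line fitted by `scipy.stats.probplot` against the standard normal roughly tracks the sample's standard deviation, so a slope below 1 suggests less spread than the reference and a slope above 1 suggests more.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# Hypothetical samples with less and more spread than the standard normal
narrow = rng.normal(loc=0, scale=0.5, size=1000)
wide = rng.normal(loc=0, scale=2.0, size=1000)

# probplot returns ((theoretical, ordered), (slope, intercept, r));
# only the fitted slope is needed here
_, (slope_narrow, _, _) = stats.probplot(narrow, dist="norm")
_, (slope_wide, _, _) = stats.probplot(wide, dist="norm")

print("slope for narrow sample:", slope_narrow)
print("slope for wide sample:", slope_wide)
```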

Applications

Advantages