Apriori Algorithm

Last Updated : 13 May, 2026

Apriori Algorithm is a data mining technique used to identify items that frequently appear together in large datasets. It helps discover relationships and association rules between items, making it widely used in market basket analysis.

**For Example:**

If customers often buy bread and butter together in a grocery store, the store can place these items nearby or create combo offers to improve sales and customer experience.

Working

**1. Identifying Frequent Item-Sets**

**2. Creating Possible Item Groups**

**3. Removing Infrequent Item Groups**

**4. Generating Association Rules**
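The first three steps above can be sketched as a level-wise loop. This is a minimal illustration, not a tuned implementation: the function name `apriori` and the `TRANSACTIONS` list are hypothetical, and the data is made up for demonstration rather than taken from the article's table. Rule generation (step 4) is left out here.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"butter"},
]

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Step 1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update({c: support(c) for c in current})
        k += 1
        # Step 2: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Step 3: drop candidates that have an infrequent (k-1)-subset,
        # then drop those below the support threshold.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

frequent = apriori(TRANSACTIONS, min_support=0.5)
```

With this made-up data, the loop stops after the 2-itemset level because only one pair survives, so no 3-item candidate can be formed.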

Key Metrics of Apriori Algorithm

1. Support

Support measures how frequently an item or item-set appears in the dataset relative to the total number of transactions.

\text{Support}(X) = \frac{\text{Number of transactions containing } X}{\text{Total number of transactions}}
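As a small sketch, the support formula translates directly into a function. The name `support` and the `TRANSACTIONS` data are assumptions made for illustration; the data is not the article's table.

```python
# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    containing = sum(itemset <= set(t) for t in transactions)
    return containing / len(transactions)

bread_support = support({"bread"}, TRANSACTIONS)                    # 3 of 5 transactions
bread_butter_support = support({"bread", "butter"}, TRANSACTIONS)   # 1 of 5 transactions
```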

2. Confidence

Confidence measures the likelihood that item Y is purchased when item X is purchased.

\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}
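A possible sketch of the confidence formula follows. It divides raw counts rather than supports, which gives the same value because the total number of transactions cancels out. The names `count` and `confidence` and the data are hypothetical.

```python
# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions)

def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: count(X and Y together) / count(X)."""
    x, y = set(x), set(y)
    return count(x | y, transactions) / count(x, transactions)

bread_to_milk = confidence({"bread"}, {"milk"}, TRANSACTIONS)      # 3 / 3
butter_to_bread = confidence({"butter"}, {"bread"}, TRANSACTIONS)  # 1 / 3
```

Note that confidence is not symmetric in general: swapping X and Y changes the denominator.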

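3. Lift

Lift measures how much more likely two items are to be purchased together than would be expected if they were purchased independently. A lift above 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 indicates a negative association.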

\text{Lift}(X \rightarrow Y) = \frac{\text{Confidence}(X \rightarrow Y)}{\text{Support}(Y)}
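The lift formula can be sketched in the same way. Substituting the definitions of confidence and support gives lift = N * count(X and Y) / (count(X) * count(Y)), which is what the hypothetical `lift` function below computes on made-up data.

```python
# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions)

def lift(x, y, transactions):
    """Lift of X -> Y, i.e. Confidence(X -> Y) / Support(Y)."""
    x, y = set(x), set(y)
    n = len(transactions)
    return n * count(x | y, transactions) / (
        count(x, transactions) * count(y, transactions)
    )

bread_milk_lift = lift({"bread"}, {"milk"}, TRANSACTIONS)      # 5 * 3 / (3 * 3)
butter_bread_lift = lift({"butter"}, {"bread"}, TRANSACTIONS)  # 5 * 1 / (3 * 3)
```

In this invented data, bread and milk have a lift above 1 (bought together more often than independence would suggest), while butter and bread have a lift below 1. Unlike confidence, lift gives the same value when X and Y are swapped.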

Example

Let's understand the Apriori Algorithm with the help of an example. Consider the following dataset, for which we will find the frequent Item-Sets and generate association rules:

Table: Transactions of a Grocery Shop

**Step 1: Setting the Parameters**

**Minimum Support Threshold:** 50% (an item-set must appear in at least 3 of the 5 transactions, since 50% of 5 is 2.5). Support is calculated with this formula:

\text{Support}(A) = \frac{\text{Number of transactions containing itemset } A}{\text{Total number of transactions}}

**Minimum Confidence Threshold:** 70% (you can change the parameter values to suit the use case and problem statement). Confidence is calculated with this formula:

\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}
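As a quick check of the arithmetic behind the support threshold, the percentage can be converted into a minimum transaction count. The constant names below are hypothetical; the values 50%, 70% and 5 transactions come from this step.

```python
import math

N_TRANSACTIONS = 5     # size of the example dataset
MIN_SUPPORT = 0.5      # 50% minimum support
MIN_CONFIDENCE = 0.7   # 70% minimum confidence

# Smallest whole number of transactions whose share is at least MIN_SUPPORT.
min_count = math.ceil(MIN_SUPPORT * N_TRANSACTIONS)  # ceil(2.5) = 3

def is_frequent(transaction_count):
    """True if an item-set seen in `transaction_count` transactions meets the threshold."""
    return transaction_count / N_TRANSACTIONS >= MIN_SUPPORT
```

An item-set seen in 2 transactions has 40% support and is rejected, while one seen in 3 transactions has 60% support and is kept.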

**Step 2: Find Frequent 1-Item-Sets**

Let's count how many transactions include each item in the dataset (calculating the frequency of each item).

Table: Frequent 1-Itemsets

All items have support ≥ 50%, so they all qualify as frequent 1-Item-Sets. Any item with support < 50% would be omitted from the frequent 1-Item-Sets.
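The counting in this step could be sketched with `collections.Counter`. The transactions below are hypothetical stand-ins chosen so that every item reaches 50% support; they are not the article's table.

```python
from collections import Counter

# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"butter"},
]
MIN_SUPPORT = 0.5

n = len(TRANSACTIONS)

# Each transaction is a set, so an item is counted at most once per transaction.
item_counts = Counter(item for t in TRANSACTIONS for item in t)

# Keep only the items whose support reaches the threshold.
frequent_1 = {
    item: c / n for item, c in item_counts.items() if c / n >= MIN_SUPPORT
}
```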

**Step 3: Generate Candidate 2-Item-Sets**

Combine the frequent 1-Item-Sets into pairs and calculate their support. For this use case we get 3 item pairs: (Bread, Butter), (Bread, Milk) and (Butter, Milk). Their support values are calculated in the same way as in Step 2.

Table: Candidate 2-Itemsets

**Frequent 2-Item-Sets:** {Bread, Milk} meets the 50% minimum support threshold. However, {Bread, Butter} and {Butter, Milk} do not meet the threshold, so they are omitted.
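One way to sketch the pairing and filtering in this step is with `itertools.combinations`. The transactions are hypothetical, chosen only so that the outcome matches the one described above ({Bread, Milk} frequent, the other two pairs not); the actual support values in the article's table may differ.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"butter"},
]
MIN_SUPPORT = 0.5
frequent_items = ["bread", "butter", "milk"]  # frequent 1-itemsets from Step 2

n = len(TRANSACTIONS)

# Every unordered pair of frequent items is a candidate 2-itemset.
pair_support = {
    pair: sum(set(pair) <= t for t in TRANSACTIONS) / n
    for pair in combinations(sorted(frequent_items), 2)
}

# Keep only the pairs that reach the minimum support.
frequent_2 = {p: s for p, s in pair_support.items() if s >= MIN_SUPPORT}
```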

**Step 4: Generate Candidate 3-Item-Sets**

The Apriori Algorithm generates candidate 3-itemsets only from frequent 2-itemsets, because every subset of a frequent itemset must itself be frequent. Since only {Bread, Milk} satisfies the minimum support threshold in Step 3, no valid 3-itemset can be generated.
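The candidate-generation rule in this step can be sketched as a join followed by a subset check. The helper name `next_candidates` is hypothetical; it assumes all input itemsets have the same size.

```python
from itertools import combinations

def next_candidates(frequent_k):
    """Build (k+1)-item candidates whose k-item subsets are all frequent."""
    frequent_k = {frozenset(s) for s in frequent_k}
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k))) + 1
    # Join: union two frequent itemsets when the result has exactly k items.
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k}
    # Prune: every (k-1)-item subset of a candidate must itself be frequent.
    return {
        c for c in joined
        if all(frozenset(s) in frequent_k for s in combinations(c, k - 1))
    }

# Only one frequent pair, as in Step 3: nothing to join, so no candidates.
only_one_pair = next_candidates([{"bread", "milk"}])

# Two frequent pairs: the join yields a triple, but {butter, milk} is missing.
two_pairs = next_candidates([{"bread", "milk"}, {"bread", "butter"}])

# All three pairs frequent: the triple survives as a candidate.
all_pairs = next_candidates(
    [{"bread", "milk"}, {"bread", "butter"}, {"butter", "milk"}]
)
```

The second and third calls are extra scenarios, not part of the article's example, included to show when pruning removes a candidate and when it keeps one.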

**Step 5: Generate Association Rules**

Now we generate association rules from the frequent itemsets and calculate their confidence values.

Rule 1: Bread implies Butter

If a customer buys Bread, they are likely to buy Butter as well.

Rule 2: Butter implies Bread

If a customer buys Butter, they are likely to buy Bread as well.

Rule 3: Bread implies Milk

If a customer buys Bread, they are likely to buy Milk as well.
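Rule generation can be sketched by splitting a frequent itemset into every antecedent and consequent pair and keeping the rules that reach the minimum confidence. The function `rules_from` and the transactions are hypothetical; the confidence values below come from the made-up data, not from the article's table.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"butter"},
]
MIN_CONFIDENCE = 0.7

def rules_from(itemset, transactions, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    itemset = frozenset(itemset)

    def count(s):
        return sum(s <= t for t in transactions)

    rules = []
    # Try every non-empty proper subset of the itemset as the antecedent X.
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            conf = count(itemset) / count(x)  # Support(X ∪ Y) / Support(X)
            if conf >= min_confidence:
                rules.append((set(x), set(itemset - x), conf))
    return rules

bread_milk_rules = rules_from({"bread", "milk"}, TRANSACTIONS, MIN_CONFIDENCE)
```

For the pair {bread, milk} this yields two rules, one in each direction, and the caller can then rank them by confidence or lift.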

The Apriori Algorithm, as demonstrated in the bread-butter example, is used for market basket analysis by food delivery platforms such as Zomato and Swiggy, where it helps identify customer behaviour patterns and optimise recommendations.

Applications