Association Rule Mining in R Programming

Last Updated : 5 Jul, 2025

Association Rule Mining is an unsupervised learning technique used to discover interesting relationships, patterns, or associations between items in large transactional datasets. It helps answer questions like, “What products are frequently bought together?” This technique is widely used in market basket analysis, recommendation systems, and customer behavior prediction.

Key Measures in Association Rule Mining

To evaluate and rank the strength of discovered rules, three key metrics are used:

**Support:** Support tells how frequently an item or a set of items appears in the dataset. For two items A and B:

**Formula:**

$$\text{Support} = \frac{\text{Number of transactions with both A and B}}{\text{Total number of transactions}} = P(A \cap B)$$

**Confidence:** Confidence tells how likely item B is to be purchased when item A is purchased.

**Formula:**

$$\text{Confidence} = \frac{\text{Number of transactions with both A and B}}{\text{Number of transactions with A}} = \frac{P(A \cap B)}{P(A)}$$

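**Lift:** Lift tells how likely item B is to be purchased with A compared to the regular purchase rate of B. The expected confidence is that regular rate, P(B). A lift > 1 means A and B are positively related, a lift of 1 means they are independent, and a lift < 1 means they are negatively related.

**Formula:**

$$\text{Lift} = \frac{\text{Confidence}}{\text{Expected Confidence}} = \frac{P(A \cap B)}{P(A) \cdot P(B)}$$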

**Example:** A customer makes 4 transactions. In the first transaction, she buys 1 apple, 1 beer, 1 rice, and 1 chicken. In the second transaction, she buys 1 apple, 1 beer, and 1 rice. In the third transaction, she buys 1 apple and 1 beer only. In the fourth transaction, she buys 1 apple and 1 orange.

**Support(Apple)** = 4/4

So the support of {Apple} is 4 out of 4, or 100%.

**Confidence(Apple -> Beer)** = Support(Apple, Beer) / Support(Apple)
= (3/4) / (4/4)
= 3/4

So the confidence of {Apple -> Beer} is 3 out of 4, or 75%.

**Lift(Beer -> Rice)** = Support(Beer, Rice) / (Support(Beer) * Support(Rice))
= (2/4) / ((3/4) * (2/4))
= 1.33

Since the lift is greater than 1, rice is more likely to be bought when beer is bought than it is on average.
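The same arithmetic can be checked in base R. The sketch below is only an illustration: the `baskets` list and the helper names `support`, `confidence`, and `lift` are made up for this example and do not come from any package.

```r
# Toy data: the four transactions from the example (illustrative names).
baskets <- list(
  c("apple", "beer", "rice", "chicken"),
  c("apple", "beer", "rice"),
  c("apple", "beer"),
  c("apple", "orange")
)

# Fraction of transactions that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence(lhs -> rhs) = Support(lhs and rhs) / Support(lhs).
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

# Lift(lhs -> rhs) = Confidence(lhs -> rhs) / Support(rhs).
lift <- function(lhs, rhs, baskets) {
  confidence(lhs, rhs, baskets) / support(rhs, baskets)
}

support("apple", baskets)             # 1
confidence("apple", "beer", baskets)  # 0.75
lift("beer", "rice", baskets)         # about 1.33
```

Writing lift as confidence divided by the support of the right-hand side gives the same value as the formula P(A ∩ B) / (P(A) · P(B)), since confidence is already P(A ∩ B) / P(A).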

Implementation of Association Rule Mining in R

We perform market basket analysis using the Apriori algorithm on a groceries dataset, in which items are listed in the itemDescription column and grouped into transactions by Member_number.

1. Installing required packages

We install the necessary libraries for rule mining and visualization.

    install.packages("arules")
    install.packages("arulesViz")
    install.packages("igraph")
    install.packages("visNetwork")

2. Loading the libraries

We load the required libraries into the session to make their functions available.

    library(arules)
    library(arulesViz)
    library(igraph)
    library(visNetwork)

3. Reading the dataset

We read the CSV file containing the transactions into R as a data frame.

You can download the dataset from here.

    data <- read.csv("/content/Groceries data.csv", stringsAsFactors = FALSE)

4. Converting data to transaction format

We group items by customer ID to create transactions.

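    # One character vector of items per member
    transactions_list <- split(data$itemDescription, data$Member_number)
    # Keep each item once per member; a transaction is a set of distinct items
    transactions_list <- lapply(transactions_list, unique)
    transactions <- as(transactions_list, "transactions")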
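To see what the grouping step produces, here is a small base R sketch on a made-up data frame. The object `toy` and its values are hypothetical; only the column names match the dataset. It also shows a detail worth checking: a member who buys the same item more than once contributes that item repeatedly, while a basket is meant to be a set of distinct items, so the sketch keeps each item once with `unique()`.

```r
# Hypothetical rows in the same shape as the groceries data.
toy <- data.frame(
  Member_number   = c(1001, 1001, 1001, 1002, 1002),
  itemDescription = c("whole milk", "rolls/buns", "whole milk", "soda", "yogurt"),
  stringsAsFactors = FALSE
)

# One character vector of items per member; list names are the member IDs.
toy_list <- split(toy$itemDescription, toy$Member_number)

# Member 1001 bought "whole milk" twice; keep each item once per basket.
toy_list <- lapply(toy_list, unique)

str(toy_list)
```

The resulting named list is the shape that can then be coerced to a transactions object.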

5. Displaying item summary and plotting frequencies

We visualize the top items purchased across all transactions.

    summary(transactions)
    itemFrequencyPlot(transactions, topN = 10, type = "absolute",
                      col = "steelblue", main = "Top 10 Items")

**Output:** [Figure: transaction summary] [Figure: bar plot of the top 10 items]

6. Running the Apriori algorithm

We generate association rules from the transaction data.

    rules <- apriori(transactions, parameter = list(supp = 0.001, conf = 0.3))
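Here `supp = 0.001` keeps only itemsets that occur in at least 0.1% of transactions, and `conf = 0.3` keeps only rules whose confidence is at least 30%. To make the two thresholds concrete, the base R sketch below applies them by brute force to a toy dataset. It is not the Apriori algorithm, which prunes candidates level by level. It simply enumerates every ordered pair of items, and all names and threshold values in it are illustrative assumptions.

```r
# Toy baskets (hypothetical data, not the groceries file).
baskets <- list(
  c("apple", "beer", "rice", "chicken"),
  c("apple", "beer", "rice"),
  c("apple", "beer"),
  c("apple", "orange")
)
min_supp <- 0.5  # illustrative thresholds, far higher than in the call above
min_conf <- 0.7

n <- length(baskets)
items <- sort(unique(unlist(baskets)))

# Support of an itemset: share of baskets containing all of its items.
supp <- function(s) {
  sum(vapply(baskets, function(b) all(s %in% b), logical(1))) / n
}

# Enumerate every ordered pair (lhs -> rhs) of distinct items.
pairs <- expand.grid(lhs = items, rhs = items, stringsAsFactors = FALSE)
pairs <- pairs[pairs$lhs != pairs$rhs, ]

# Support of the pair, and confidence = support(pair) / support(lhs).
pairs$support    <- mapply(function(a, b) supp(c(a, b)), pairs$lhs, pairs$rhs)
pairs$confidence <- pairs$support / vapply(pairs$lhs, supp, numeric(1))

# Keep only the rules that pass both thresholds.
rules_toy <- pairs[pairs$support >= min_supp & pairs$confidence >= min_conf, ]
rules_toy
```

With these thresholds, four rules survive: apple -> beer, beer -> apple, rice -> apple, and rice -> beer. Lowering either threshold admits more rules, which is why very small values such as 0.001 are common on large, sparse transaction data.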

**Output:** [Figure: Apriori run output]

7. Displaying generated rules

We print the number of rules generated and inspect a few.

    cat("Number of rules generated:", length(rules), "\n")

    if (length(rules) > 0) {
      inspect(rules[1:min(10, length(rules))])
    } else {
      cat("No rules were generated. Try lowering support or confidence.\n")
    }

**Output:** [Figure: first rules shown by inspect()]

8. Sorting and inspecting rules by lift

We sort rules to identify those with the highest lift.

    if (length(rules) > 0) {
      top_lift_rules <- sort(rules, by = "lift", decreasing = TRUE)
      inspect(top_lift_rules[1:min(10, length(top_lift_rules))])
    }

**Output:** [Figure: rules sorted by lift]

9. Plotting the rules as a scatter plot

We plot each rule by its support and confidence, with shading indicating lift.

    if (length(rules) > 0) {
      plot(rules, method = "scatterplot",
           measure = c("support", "confidence"),
           shading = "lift",
           main = "Scatter Plot of Association Rules")
    }

**Output:** [Figure: scatter plot of association rules]

Applications of Association Rule Mining