Identify and Remove Duplicate Data in R

Last Updated : 30 Apr, 2026

Duplicate data is a common issue in real-world datasets, where identical rows or repeated values appear more than once. If not handled properly, duplicates can affect analysis results and lead to incorrect conclusions. R provides simple functions to identify and remove duplicates from vectors and data frames.


Identifying Duplicate Data in a Vector

The duplicated() function returns a logical vector that marks each element repeating an earlier one as TRUE. Passing that result to sum() gives the number of duplicate values, because each TRUE counts as 1.

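```r
vec <- c(1, 2, 3, 4, 4, 5)

# TRUE marks an element that repeats an earlier one
duplicated(vec)

# Count the duplicates (each TRUE counts as 1)
sum(duplicated(vec))
```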

**Output:**

[1] FALSE FALSE FALSE FALSE  TRUE FALSE
[1] 1

Here, the value 4 appears twice, so one duplicate is identified.
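Beyond counting, it is often useful to see which values are repeated and where they sit. The sketch below uses only base R (`duplicated()`, `which()`, and `anyDuplicated()`); the variable names are illustrative.

```r
vec <- c(1, 2, 3, 4, 4, 5)

# Values that repeat an earlier element
dup_values <- vec[duplicated(vec)]

# Positions of those repeated elements
dup_positions <- which(duplicated(vec))

# Index of the first duplicate, or 0 if the vector has none
first_dup <- anyDuplicated(vec)

print(dup_values)     # 4
print(dup_positions)  # 5
print(first_dup)      # 5
```

`anyDuplicated()` can serve as a quick check before doing any cleanup, since a result of 0 means there is nothing to remove.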

Removing Duplicate Data in a Vector

The unique() function removes duplicates from a vector: it returns each distinct value once, in order of first appearance.

```r
vec <- c(1, 2, 3, 4, 4, 5)

# Keep one copy of each value
unique(vec)
```

**Output:**

[1] 1 2 3 4 5
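As a small illustration of how unique() treats ordering, the sketch below uses a hypothetical unsorted vector `vec2`. It shows that unique() keeps values in order of first appearance, and that filtering with `!duplicated()` gives the same result.

```r
# Hypothetical unsorted input, chosen for illustration
vec2 <- c(3, 1, 3, 2, 1)

# unique() keeps the first occurrence of each value, in original order
uniq <- unique(vec2)

# Sort afterwards if an ordered result is wanted
sorted_uniq <- sort(uniq)

# Filtering with !duplicated() selects the same elements
same <- identical(uniq, vec2[!duplicated(vec2)])

print(uniq)         # 3 1 2
print(sorted_uniq)  # 1 2 3
print(same)         # TRUE
```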

Identifying Duplicate Data in a Data Frame

For a data frame, duplicated() returns a logical vector with one entry per row. A row is marked TRUE when it is identical to an earlier row, and sum() of that vector gives the number of duplicate rows.

**Syntax:**

duplicated(dataframe)

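```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

res                   # print the data frame
duplicated(res)       # TRUE for rows that repeat an earlier row
sum(duplicated(res))  # number of duplicate rows
```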

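**Output:**

res prints all seven rows. duplicated(res) returns FALSE for rows 1 to 5 and TRUE for rows 6 and 7, and sum(duplicated(res)) returns 2.

**Here:**

Rows 6 (Geeta) and 7 (Paul) are exact copies of rows 2 and 4, so they are flagged as duplicates. The first occurrence of each row is not counted.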
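To inspect the duplicated rows themselves rather than just count them, the logical vector from duplicated() can be used as a row index. The sketch below is one way to do this in base R; `dup_rows` and `all_copies` are illustrative names, and the `fromLast = TRUE` argument is used to flag the earlier copies as well.

```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Only the later copies (rows that repeat an earlier row)
dup_rows <- res[duplicated(res), ]

# Every row involved in a duplication, including the first occurrences:
# scan once from the top and once from the bottom, then combine
all_copies <- res[duplicated(res) | duplicated(res, fromLast = TRUE), ]

print(dup_rows)    # rows 6 and 7
print(all_copies)  # rows 2, 4, 6 and 7
```

Looking at all copies side by side can help confirm that the rows really are redundant before dropping them.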

Removing Duplicate Data in a Data Frame

The following methods remove duplicate rows from a data frame.

Method 1: Using unique()

Applied to a data frame, unique() returns the data frame with duplicate rows removed, keeping the first occurrence of each row.

**Syntax:**

unique(dataframe)

```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

res          # original data, 7 rows
unique(res)  # duplicate rows removed
```

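**Output:**

res prints all seven rows. unique(res) returns five rows (Ram, Geeta, John, Paul, Cassie); the repeated Geeta and Paul rows are dropped.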
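After removing duplicates, a quick sanity check can confirm how many rows were dropped and that none remain. This is a minimal base R sketch; the names `deduped`, `n_removed`, `same_result`, and `none_left` are illustrative.

```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

deduped <- unique(res)

# Number of rows removed by deduplication
n_removed <- nrow(res) - nrow(deduped)

# Filtering with !duplicated() is an equivalent way to drop the repeats
same_result <- all.equal(deduped, res[!duplicated(res), ])

# anyDuplicated() returns 0 when no duplicate rows remain
none_left <- anyDuplicated(deduped) == 0

print(n_removed)  # 2
print(same_result)
print(none_left)  # TRUE
```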

Method 2: Using distinct()

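distinct() comes from the dplyr package, which is part of the tidyverse. Install dplyr (or the whole tidyverse) and load it before calling distinct(). With no columns specified, distinct() keeps only the unique rows of the data frame.

**Syntax:**

distinct(dataframe, ..., .keep_all = TRUE)

**Parameters:**

dataframe: the data frame to deduplicate.

...: optional columns used to decide uniqueness. If none are given, all columns are used.

.keep_all: if TRUE, all columns are kept in the result, taking the first row for each distinct combination. The default is FALSE, which keeps only the columns listed.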

**Example 1:** Using the distinct() function

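```r
library(tidyverse)  # loads dplyr, which provides distinct()

res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

res            # original data, 7 rows
distinct(res)  # unique rows across all columns
```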

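**Output:**

res prints all seven rows. distinct(res) returns the five unique rows (Ram, Geeta, John, Paul, Cassie), the same rows that unique(res) keeps.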

**Example 2:** Printing unique rows in terms of the maths column

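```r
library(tidyverse)  # loads dplyr, which provides distinct()

res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

res
# One row per distinct maths value; .keep_all = TRUE keeps the other columns
distinct(res, maths, .keep_all = TRUE)
```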

**Output:**

The output returns a data frame with distinct rows based on the "maths" column, keeping only the first occurrence of each unique value. For this data, that leaves four rows: Ram (7), Geeta (8), Paul (9), and Cassie (10).
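If dplyr is not available, a similar column-based deduplication can be sketched in base R by applying duplicated() to the chosen column or columns. This is an equivalent approach for illustration, not how distinct() is implemented; `by_maths` and `by_name_maths` are illustrative names. Unlike distinct(), this approach keeps the original row names.

```r
res <- data.frame(
  name    = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
  maths   = c(7, 8, 8, 9, 10, 8, 9),
  science = c(5, 7, 6, 8, 9, 7, 8),
  history = c(7, 7, 7, 7, 7, 7, 7)
)

# Keep the first row for each distinct maths value
by_maths <- res[!duplicated(res$maths), ]

# Keep the first row for each distinct (name, maths) combination
by_name_maths <- res[!duplicated(res[c("name", "maths")]), ]

print(by_maths)             # rows 1, 2, 4 and 5
print(nrow(by_name_maths))  # 5
```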