Comprehensive Guide to Scatter Plot using ggplot2 in R

A scatter plot uses dots to represent the values of two numeric variables and is used to observe the relationship between them. To draw a scatter plot we will use the **geom_point()** function from the ggplot2 package, a free and open-source visualization package widely used in R.

The package can be installed with the R function install.packages():

```r
install.packages("ggplot2")
```
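If the same script runs on several machines, a common pattern is to install ggplot2 only when it is missing. This minimal sketch uses base R's requireNamespace() for the check and then loads the package. Printing the version is optional, but it helps because some arguments differ between ggplot2 releases.

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package for the current session
library(ggplot2)

# Show which version is installed
print(packageVersion("ggplot2"))
```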

**For example:** we use ggplot2 to create a scatter plot of Sepal.Length vs. Sepal.Width from the built-in iris dataset.

```r
library(ggplot2)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
```

**Output:**

[Figure: Basic Scatterplot with ggplot2 in R]

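**Syntax:**

geom_point(size, color, fill, shape, stroke)

**Parameters:**

- size: size of the points.
- color: color of the points (for shapes 21 to 25, the outline color).
- fill: fill color, used only by shapes 21 to 25.
- shape: point shape, usually given as a number from 0 to 25.
- stroke: width of the point border.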
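Here is a minimal sketch of these parameters used as fixed values. They are set outside aes(), so they apply to every point. Shape 21 is a filled circle, which is why color sets its outline and fill sets its interior. The particular colors and sizes are arbitrary values chosen for illustration.

```r
library(ggplot2)

# Fixed (non-mapped) values go directly in geom_point(), outside aes()
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(
    shape = 21,        # filled circle: has both an outline and an interior
    size = 3,          # point size
    color = "black",   # outline color
    fill = "orange",   # interior color (only used by shapes 21 to 25)
    stroke = 1         # outline width
  )
print(p)
```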

1. Scatter plot with groups

Here we distinguish the points by a grouping variable (factor-level data). Mapping color inside **aes()** gives each group its own color, and the grouping variable should be a factor.

**Syntax:**

aes(color = factor(variable))

We create a scatter plot of Sepal.Length vs. Sepal.Width from the iris dataset and color the points by Sepal.Width converted to a factor. Because Sepal.Width is numeric, factor() turns each distinct value into its own level, so the legend gets one color per value.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = factor(Sepal.Width)))
```

**Output:**

[Figure: Sepal.Length vs. Sepal.Width colored by factor(Sepal.Width)]
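Why convert to a factor at all? When you map a numeric variable to color, it gets a continuous gradient. A factor instead gets one distinct hue per level. To show the difference, this sketch uses the built-in mtcars dataset, which is not otherwise used in this article. Its cyl column holds only the values 4, 6 and 8.

```r
library(ggplot2)

# cyl is numeric (4, 6, 8): mapped as-is it gets a continuous color gradient
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = cyl))

# factor(cyl) has three levels: each gets its own discrete hue
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

print(p_continuous)
print(p_discrete)
```

Printing p_continuous shows a colorbar legend. Printing p_discrete shows a legend with three separate keys.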

2. Changing color in Scatter plot

Here we map the color aesthetic inside aes() to a variable, so each point's color depends on its value. In this example the points are colored by the Species variable.

```r
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species))
```

**Output:**

[Figure: Sepal.Length vs. Sepal.Width colored by Species]
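Mapping an aesthetic and setting it are easy to confuse. In this sketch, color inside aes() is driven by Species and produces a legend. Color outside aes() paints every point the same, and "steelblue" is just an example color. Writing aes(color = "steelblue") instead would treat the string as data. That gives a legend entry labeled steelblue and a default hue rather than blue.

```r
library(ggplot2)

# Mapping: color inside aes() follows a variable and produces a legend
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species))

# Setting: color outside aes() is one fixed color for every point, no legend
p_fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue")

print(p_mapped)
print(p_fixed)
```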

3. Changing Shape of Data points in a Scatter plot

To change the shape of the data points, map the **shape** aesthetic inside aes(). In this example both shape and color are mapped to the Species variable, so each species gets its own marker and color.

```r
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, shape = Species, color = Species))
```

**Output:**

[Figure: points shaped and colored by Species]
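To check which markers ggplot2 picked, you can inspect the built layer data with layer_data(). This sketch maps only shape to Species and prints the shape code assigned to each group. Keep in mind that the default shape palette supports at most six levels. For more groups, supply shapes yourself with scale_shape_manual().

```r
library(ggplot2)

# Map only shape to Species; size is a fixed value for all points
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(shape = Species), size = 2.5)
print(p)

# layer_data() returns the data ggplot2 actually drew, including the shape codes
d <- layer_data(p)
print(unique(d[, c("group", "shape")]))
```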

4. Changing the size of points in a Scatter plot

Point size is controlled by the **size** aesthetic. To give every point the same size, set size to a constant outside aes(). A constant such as size = 0.5 placed inside aes() would instead pass through the size scale and add a spurious "0.5" legend. Here we set the size of all points to 0.5.

```r
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width), size = 0.5)
```

**Output:**

[Figure: scatter plot with a custom point size]
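Size becomes informative when you map it to a numeric variable instead of setting a constant. This turns the scatter plot into a bubble chart. In this sketch Petal.Width is an arbitrary choice of variable. scale_size(range = c(1, 6)) sets the smallest and largest point sizes, and alpha = 0.5 makes overlapping points easier to see.

```r
library(ggplot2)

# Map size to a numeric variable: wider petals -> larger points (bubble chart)
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(size = Petal.Width), alpha = 0.5) +  # alpha reveals overlapping points
  scale_size(range = c(1, 6))                         # smallest and largest point size
print(p)
```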

5. Label points in Scatter plot

To label the data points, map the **label** aesthetic in **geom_text()**. In this example, we color the points by Species using a manual color palette. We add a text label to each point with geom_text(), nudged slightly up and to the right so it does not sit on top of the point. We then customize the plot with a title, axis labels and a minimal theme, and position the legend to the right.

```r
library(ggplot2)

# One color per Species level, in level order: setosa, versicolor, virginica
color_palette <- c("blue", "green", "red")

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  # Label each point with its species, offset slightly from the point
  geom_text(aes(label = Species), position = position_nudge(x = 0.05, y = 0.05),
            size = 3, show.legend = FALSE) +
  scale_color_manual(values = color_palette) +
  theme_minimal() +
  ggtitle("Sepal Length vs. Sepal Width") +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  theme(legend.position = "right")
```

**Output:**

[Figure: labeled scatter plot colored by Species]
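Labeling all 150 points makes the plot crowded. A common alternative is to label only a subset by passing a separate data frame to geom_text(). In this sketch, the subset wide (points with Sepal.Width above 4) is a hypothetical choice made for illustration. vjust = -0.8 moves each label just above its point.

```r
library(ggplot2)

# Hypothetical subset to label: only the points with the widest sepals
wide <- subset(iris, Sepal.Width > 4)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  # data = wide: the text layer draws labels for the subset only
  geom_text(data = wide, aes(label = Species),
            vjust = -0.8, size = 3, show.legend = FALSE)
print(p)
```

When many labels are needed, the ggrepel package's geom_text_repel() moves overlapping labels apart automatically.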

Regression lines in a Scatter plot with ggplot2 in R

Regression models a target value as a function of one or more independent variables. It is mostly used to find relationships between variables and for forecasting. In ggplot2, the stat_smooth() function adds a fitted line or smoothed curve to the plot.

**Example:** we create a scatter plot of Sepal.Length vs. Sepal.Width from the iris dataset and add a linear regression line using stat_smooth() with the lm method to show the best-fit line. The shaded band around the line is a 95% confidence interval, drawn by default (se = TRUE).

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth(method = "lm")
```

**Output:**

[Figure: scatter plot with a linear regression line]
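To see what stat_smooth(method = "lm") draws, this sketch fits the same simple linear model with lm(). It then compares the model's predictions against the line ggplot2 computed, read back with layer_data(). The variable names fit, smooth and predicted are illustrative. The comparison should print TRUE because both fit Sepal.Width on Sepal.Length by least squares.

```r
library(ggplot2)

# Fit the same simple linear model that stat_smooth(method = "lm") uses (y ~ x)
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
print(coef(fit))   # intercept and slope

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# The fitted line ggplot2 drew, evaluated on a grid of x values
smooth <- layer_data(p, 2)
predicted <- predict(fit, newdata = data.frame(Sepal.Length = smooth$x))
print(all.equal(smooth$y, unname(predicted)))
```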

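**Syntax:**

stat_smooth(method = "method_name", formula = formula_to_be_used, geom = "geom_name")

**Parameters:**

- method: the smoothing method, for example "lm", "glm", "gam" or "loess".
- formula: the model formula, written in terms of the aesthetics x and y, for example y ~ x.
- geom: the geom used to draw the result; the default is "smooth".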
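Note that the formula argument refers to the aesthetics x and y, not to the original column names. This sketch fits a quadratic curve by combining method = "lm" with formula = y ~ poly(x, 2). The polynomial degree and the line color are illustrative choices.

```r
library(ggplot2)

# Quadratic fit: method = "lm" with a polynomial in x (the aesthetic, not Sepal.Length)
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), geom = "smooth",
              color = "darkgreen")
print(p)
```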

1. Using stat_smooth with LOESS in a Scatter plot

We create a scatter plot and add a smoothing curve with stat_smooth() without specifying a method. In that case ggplot2 picks the method from the number of observations. It uses LOESS for fewer than 1,000 points, as with the 150 rows of iris, and a GAM (from the mgcv package) for larger data.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth()
```

**Output:**

[Figure: scatter plot with a LOESS smoothing curve]
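With LOESS, the span argument of stat_smooth() sets the fraction of the data used in each local fit, and it defaults to 0.75. Smaller values follow the points more closely, while larger values give a smoother curve. This sketch overlays two spans, 0.5 and 0.9, chosen only for comparison.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  # Smaller span: each local fit uses less data, so the curve follows the points more closely
  stat_smooth(method = "loess", formula = y ~ x, span = 0.5, color = "purple") +
  # Larger span: smoother curve; se = FALSE hides the confidence band
  stat_smooth(method = "loess", formula = y ~ x, span = 0.9, color = "orange", se = FALSE)
print(p)
```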

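Alternative Method:

The **geom_smooth()** function can also add a regression line or smoothing curve. geom_smooth() and stat_smooth() accept the same method and formula arguments.

**Syntax:**

geom_smooth(method = "method_name", formula = formula_to_be_used)

**Parameters:**

- method: the smoothing method, for example "lm" or "loess".
- formula: the model formula in terms of x and y, for example y ~ x.

**Example:** we create a scatter plot and add a smoothing curve with geom_smooth(). When no method is given, it is chosen automatically from the number of observations, which means LOESS for the iris data.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth()
```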

**Output:**

[Figure: scatter plot with a geom_smooth() curve]

To make this choice explicit, pass method = "loess" and formula = y ~ x to geom_smooth().
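Here is a sketch of the explicit call alongside the automatic version for comparison. For the 150-row iris data both should produce the same curve, because the automatic choice for fewer than 1,000 observations is LOESS with y ~ x. Stating both method and formula also suppresses the message ggplot2 prints when it picks them for you. The names p and p_auto are illustrative.

```r
library(ggplot2)

# Explicit LOESS smoother with the simple formula y ~ x
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Automatic choice, for comparison (ggplot2 prints a message naming what it chose)
p_auto <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth()

print(p)
```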

2. Intercept and slope in a Scatter plot

To draw a straight line with a given intercept and slope, use geom_abline(). geom_smooth() has no intercept or slope arguments and would ignore them with a warning. We create a scatter plot and add a red, dashed line with intercept 37, slope -5 and line width 1.5.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  # linewidth needs ggplot2 >= 3.4.0; older versions use size instead
  geom_abline(intercept = 37, slope = -5, color = "red", linetype = "dashed", linewidth = 1.5)
```

**Output:**

[Figure: scatter plot with a red dashed line]

3. Change the point color, shape and size manually

ggplot2 provides manual scales such as scale_color_manual(), scale_fill_manual(), scale_size_manual(), scale_shape_manual() and scale_linetype_manual(). They assign values you choose (colors, fills, sizes, shapes, line types) to the levels of a categorical variable. Here we use scale_color_manual() for the point colors and scale_shape_manual() for the point shapes.

**Syntax:**

scale_color_manual(..., values)

**Parameter:**

- values: a vector of aesthetic values (here, colors), one per level of the mapped variable.

**Example:** we color and shape the points by the Species variable. geom_smooth() adds a linear regression line per species, with no confidence band (se = FALSE), extended across the full x range (fullrange = TRUE). Custom shapes and colors are applied to the points, and the legend is positioned at the top. Note that scale_shape_manual() only has an effect when shape is mapped in aes().

```r
library(ggplot2)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species, shape = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, fullrange = TRUE) +
  scale_shape_manual(values = c(3, 16, 17)) +
  scale_color_manual(values = c("pink", "yellow", "green")) +
  theme(legend.position = "top")
```

**Output:**

[Figure: points colored and shaped by Species with per-species regression lines]
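Passing values as a named vector ties each value to a specific factor level, so the result does not depend on level order. The color and shape choices, and the names species_colors and species_shapes, are illustrative in this sketch.

```r
library(ggplot2)

# Named vectors bind each value to a specific level of Species
species_colors <- c(setosa = "darkorange", versicolor = "purple", virginica = "cyan4")
species_shapes <- c(setosa = 16, versicolor = 17, virginica = 15)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      color = Species, shape = Species)) +
  geom_point(size = 2) +
  scale_color_manual(values = species_colors) +
  scale_shape_manual(values = species_shapes)
print(p)
```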

4. Adding marginal rugs to a Scatter plot with ggplot2 in R

To add marginal rugs to the scatter plot, use geom_rug(). We create a scatter plot and add rugs: short ticks along the x and y axes that show how the values of each variable are distributed.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_rug()
```

**Output:**

[Figure: scatter plot with marginal rugs]
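By default, geom_rug() draws ticks on the bottom and left axes (sides = "bl"). This sketch moves them to the top and right with sides = "tr", colors them by Species, and makes them semi-transparent so overlapping ticks stay visible. These settings are illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  # sides = "tr": rugs on the top and right axes (default is "bl", bottom and left)
  # alpha = 0.5 keeps overlapping ticks from tied measurements visible
  geom_rug(sides = "tr", alpha = 0.5)
print(p)
```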

Scatter plots with 2-D density estimation

To add a 2-D density estimate to a scatter plot, use **geom_density_2d()** for contour lines or **geom_density_2d_filled()** for filled contour bands, both from ggplot2.

**Example:** we create a scatter plot and add 2-D density contours with geom_density_2d() to show where the data points are concentrated.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_density_2d()
```

**Output:**

[Figure: scatter plot with 2-D density contours]
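The density is estimated separately for each group, so mapping color to Species gives one set of contours per species. In this sketch bins = 5 is an illustrative value. It limits the number of contour levels so the three sets remain readable.

```r
library(ggplot2)

# color = Species splits the data into groups; contours are estimated per group
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  geom_density_2d(bins = 5)   # fewer contour levels keep three overlapping sets readable
print(p)
```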

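**Syntax:**

ggplot(data, aes(x, y)) + geom_density_2d(color, alpha)

**Parameters:**

- color: color of the contour lines.
- alpha: transparency, from 0 (fully transparent) to 1 (opaque).

geom_density_2d_filled() fills the bands between contours, and by default its fill colors are mapped to the density level.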

1. Adding aesthetics to the 2-D density estimates

We create a scatter plot with filled density bands from geom_density_2d_filled(), made semi-transparent with alpha = 0.5, and draw the contour lines with geom_density_2d() on top. Layers are drawn in the order they are added. The filled bands therefore come first and the points last, because otherwise the opaque bands would hide the points.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_density_2d_filled(alpha = 0.5) +
  geom_density_2d(color = "black") +
  geom_point()
```

**Output:**

[Figure: scatter plot with filled 2-D density contours]

2. Scatter plots with ellipses

To draw an ellipse around a cluster of data points, use the stat_ellipse() function. It computes the ellipse from the data. By default this is a 95% confidence ellipse that assumes a multivariate t-distribution (type = "t", level = 0.95). Mapping a categorical variable such as Species to color groups the points, so one ellipse is drawn per species.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  stat_ellipse()
```

**Output:**

[Figure: scatter plot with confidence ellipses]
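stat_ellipse() can also draw filled ellipses. In this sketch, geom = "polygon" with a mapped fill and alpha = 0.2 produces shaded regions. type = "norm" assumes a multivariate normal distribution instead of the default t-distribution, and level = 0.90 draws 90% ellipses. These are illustrative settings, not recommendations.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  # Filled, semi-transparent 90% ellipses under a multivariate normal assumption
  stat_ellipse(aes(fill = Species), geom = "polygon", alpha = 0.2,
               type = "norm", level = 0.90)
print(p)
```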

In this article, we explored how to create and customize scatter plots with ggplot2 in the R programming language.