GitHub - riazarbi/diffdfs: Compute The Difference Between Dataframes
diffdfs
A small R package to compute the difference between data frames.
Install
Install via CRAN with install.packages("diffdfs")
Alternatively, install directly from this repository with devtools::install_github("riazarbi/diffdfs")
Use
This package has just two functions, checkkey and diffdfs.
checkkey is a helper for diffdfs, but you can use it on its own if it suits your purposes.
Here are some examples you can run in your R session:
iris$key <- 1:nrow(iris)
old_df <- iris[1:100,]
old_df[75,1] <- 100
new_df <- iris[50:150,]

diffdfs(new_df, old_df, key_cols = "key")
  operation Sepal.Length Sepal.Width Petal.Length Petal.Width   Species key
1       new          6.3         3.3          6.0         2.5 virginica 101
2       new          5.8         2.7          5.1         1.9 virginica 102
3       new          7.1         3.0          5.9         2.1 virginica 103
4       new          6.3         2.9          5.6         1.8 virginica 104
5       new          6.5         3.0          5.8         2.2 virginica 105
6       new          7.6         3.0          6.6         2.1 virginica 106
... ...
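To make the idea of a keyed diff concrete, here is a minimal base R sketch. It is not the package's implementation: toy_diff is a hypothetical helper written for illustration, it assumes a single key column and identical column order in both data frames, and it returns a plain list rather than the operation column that diffdfs produces. It also ignores NA handling.

```r
# Hypothetical helper, NOT part of diffdfs: a base R sketch of a keyed diff.
# Assumes one key column and the same columns, in the same order, in both inputs.
toy_diff <- function(new_df, old_df, key_col) {
  new_keys <- new_df[[key_col]]
  old_keys <- old_df[[key_col]]

  # Rows whose key appears only in the new data frame
  added <- new_df[!(new_keys %in% old_keys), , drop = FALSE]
  # Rows whose key appears only in the old data frame
  removed <- old_df[!(old_keys %in% new_keys), , drop = FALSE]

  # Rows whose key appears in both: line them up by key, then compare values
  shared <- intersect(new_keys, old_keys)
  new_shared <- new_df[match(shared, new_keys), , drop = FALSE]
  old_shared <- old_df[match(shared, old_keys), , drop = FALSE]
  changed <- Reduce(
    `|`,
    Map(function(a, b) as.character(a) != as.character(b), new_shared, old_shared)
  )
  modified <- new_shared[changed, , drop = FALSE]

  list(new = added, deleted = removed, modified = modified)
}

# Same setup as the example above, on a copy of iris
df <- iris
df$key <- seq_len(nrow(df))
old_df <- df[1:100, ]
old_df[75, 1] <- 100
new_df <- df[50:150, ]

res <- toy_diff(new_df, old_df, "key")
nrow(res$new)      # keys 101 to 150
nrow(res$deleted)  # keys 1 to 49
res$modified$key   # the row that was altered in old_df
```

Comparing values as character strings keeps the sketch short and sidesteps factor level mismatches, at the cost of being slower and less strict than a type-aware comparison.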
irisint = iris
irisint$rownum = 1:nrow(irisint)
key_cols = c("rownum")

checkkey(irisint, key_cols, TRUE)
Checking that key column rows are unique
[1] TRUE

checkkey(irisint, "Species", TRUE)
Checking that key column rows are unique
[1] FALSE
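The uniqueness check shown above can be approximated in base R. The sketch below is an assumption about what such a check involves, not the source of checkkey: toy_checkkey is a hypothetical name, and it takes only the data frame and the key columns.

```r
# Hypothetical helper, NOT the package's checkkey: returns TRUE when no
# combination of values across key_cols is repeated in df.
toy_checkkey <- function(df, key_cols) {
  # anyDuplicated() returns 0 when there are no duplicate rows
  anyDuplicated(df[, key_cols, drop = FALSE]) == 0
}

irisint <- iris
irisint$rownum <- seq_len(nrow(irisint))

unique_rownum  <- toy_checkkey(irisint, "rownum")                # TRUE
unique_species <- toy_checkkey(irisint, "Species")               # FALSE
unique_pair    <- toy_checkkey(irisint, c("Species", "rownum"))  # TRUE
```

A composite key passes as long as the combination of columns is unique, even when one of the columns on its own is not.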
More detail
If you'd like to see more detail on the rationale behind this package, and a toy implementation of a diffdfs-driven data versioning strategy, read my blog post on the subject here.
Contributing
Riaz Arbi is the maintainer of this package. If you'd like to point out a bug or make a suggestion, create an issue in this repo.