Package index
Distributed Data Frame
[SparkDataFrame-class](SparkDataFrame.html)
S4 class that represents a SparkDataFrame
[groupedData()](GroupedData.html)
S4 class that represents a GroupedData
[agg()](summarize.html)
[summarize()](summarize.html)
summarize
[arrange()](arrange.html)
[orderBy(_<SparkDataFrame>_,_<characterOrColumn>_)](arrange.html)
Arrange Rows by Variables
[approxQuantile(_<SparkDataFrame>_,_<character>_,_<numeric>_,_<numeric>_)](approxQuantile.html)
Calculates the approximate quantiles of numerical columns of a SparkDataFrame
[as.data.frame()](as.data.frame.html)
Download data from a SparkDataFrame into an R data.frame
[attach(_<SparkDataFrame>_)](attach.html)
Attach SparkDataFrame to R search path
[broadcast()](broadcast.html)
broadcast
[cache()](cache.html)
Cache
[cacheTable()](cacheTable.html)
Cache Table
[checkpoint()](checkpoint.html)
checkpoint
[collect()](collect.html)
Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.
[coltypes()](coltypes.html)
[`coltypes<-`()](coltypes.html)
coltypes
[colnames()](columns.html)
[`colnames<-`()](columns.html)
[columns()](columns.html)
[names(_<SparkDataFrame>_)](columns.html)
[`names<-`(_<SparkDataFrame>_)](columns.html)
Column Names of SparkDataFrame
[count()](count.html)
[n()](count.html)
Count
[createDataFrame()](createDataFrame.html)
[as.DataFrame()](createDataFrame.html)
Create a SparkDataFrame
[createExternalTable()](createExternalTable-deprecated.html)
(Deprecated) Create an external table
[createOrReplaceTempView()](createOrReplaceTempView.html)
Creates a temporary view using the given name.
[createTable()](createTable.html)
Creates a table based on the dataset in a data source
[crossJoin(_<SparkDataFrame>_,_<SparkDataFrame>_)](crossJoin.html)
CrossJoin
[crosstab(_<SparkDataFrame>_,_<character>_,_<character>_)](crosstab.html)
Computes a pair-wise frequency table of the given columns
[cube()](cube.html)
cube
[describe()](describe.html)
describe
[distinct()](distinct.html)
[unique(_<SparkDataFrame>_)](distinct.html)
Distinct
[dim(_<SparkDataFrame>_)](dim.html)
Returns the dimensions of a SparkDataFrame
[drop()](drop.html)
drop
[dropDuplicates()](dropDuplicates.html)
dropDuplicates
[dropna()](nafunctions.html)
[na.omit()](nafunctions.html)
[fillna()](nafunctions.html)
A set of SparkDataFrame functions working with NA values
[dtypes()](dtypes.html)
DataTypes
[except()](except.html)
except
[exceptAll()](exceptAll.html)
exceptAll
[explain()](explain.html)
Explain
[filter()](filter.html)
[where()](filter.html)
Filter
[getNumPartitions(_<SparkDataFrame>_)](getNumPartitions.html)
getNumPartitions
[group_by()](groupBy.html)
[groupBy()](groupBy.html)
GroupBy
[head(_<SparkDataFrame>_)](head.html)
Head
[hint()](hint.html)
hint
[histogram(_<SparkDataFrame>_,_<characterOrColumn>_)](histogram.html)
Compute histogram statistics for given column
[insertInto()](insertInto.html)
insertInto
[intersect()](intersect.html)
Intersect
[intersectAll()](intersectAll.html)
intersectAll
[isLocal()](isLocal.html)
isLocal
[isStreaming()](isStreaming.html)
isStreaming
[join(_<SparkDataFrame>_,_<SparkDataFrame>_)](join.html)
Join
[limit()](limit.html)
Limit
[localCheckpoint()](localCheckpoint.html)
localCheckpoint
[merge()](merge.html)
Merges two data frames
[mutate()](mutate.html)
[transform()](mutate.html)
Mutate
[ncol(_<SparkDataFrame>_)](ncol.html)
Returns the number of columns in a SparkDataFrame
[count(_<SparkDataFrame>_)](nrow.html)
[nrow(_<SparkDataFrame>_)](nrow.html)
Returns the number of rows in a SparkDataFrame
[orderBy()](orderBy.html)
Ordering Columns in a WindowSpec
[persist()](persist.html)
Persist
[pivot(_<GroupedData>_,_<character>_)](pivot.html)
Pivot a column of the GroupedData and perform the specified aggregation.
[printSchema()](printSchema.html)
Print Schema of a SparkDataFrame
[randomSplit()](randomSplit.html)
randomSplit
[rbind()](rbind.html)
Union two or more SparkDataFrames
[rename()](rename.html)
[withColumnRenamed()](rename.html)
rename
[registerTempTable()](registerTempTable-deprecated.html)
(Deprecated) Register Temporary Table
[repartition()](repartition.html)
Repartition
[repartitionByRange()](repartitionByRange.html)
Repartition by range
[rollup()](rollup.html)
rollup
[sample()](sample.html)
[sample_frac()](sample.html)
Sample
[sampleBy()](sampleBy.html)
Returns a stratified sample without replacement
[saveAsTable()](saveAsTable.html)
Save the contents of the SparkDataFrame to a data source as a table
[schema()](schema.html)
Get schema object
[select()](select.html)
[`$`(_<SparkDataFrame>_)](select.html)
[`$<-`(_<SparkDataFrame>_)](select.html)
Select
[selectExpr()](selectExpr.html)
SelectExpr
[show(_<Column>_)](show.html)
[show(_<GroupedData>_)](show.html)
[show(_<SparkDataFrame>_)](show.html)
[show(_<WindowSpec>_)](show.html)
[show(_<StreamingQuery>_)](show.html)
show
[showDF()](showDF.html)
showDF
[str(_<SparkDataFrame>_)](str.html)
Compactly display the structure of a dataset
[storageLevel(_<SparkDataFrame>_)](storageLevel.html)
StorageLevel
[subset()](subset.html)
[`[[`(_<SparkDataFrame>_,_<numericOrcharacter>_)](subset.html)
[`[[<-`(_<SparkDataFrame>_,_<numericOrcharacter>_)](subset.html)
[`[`(_<SparkDataFrame>_)](subset.html)
Subset
[summary()](summary.html)
summary
[take()](take.html)
Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame
[tableToDF()](tableToDF.html)
Create a SparkDataFrame from a SparkSQL table or view
[toJSON(_<SparkDataFrame>_)](toJSON.html)
toJSON
[union()](union.html)
Return a new SparkDataFrame containing the union of rows
[unionAll()](unionAll.html)
Return a new SparkDataFrame containing the union of rows.
[unionByName()](unionByName.html)
Return a new SparkDataFrame containing the union of rows, matched by column names
[unpersist()](unpersist.html)
Unpersist
[unpivot()](unpivot.html)
[melt(_<SparkDataFrame>_,_<ANY>_,_<ANY>_,_<character>_,_<character>_)](unpivot.html)
Unpivot a DataFrame from wide format to long format.
[with()](with.html)
Evaluate an R expression in an environment constructed from a SparkDataFrame
[withColumn()](withColumn.html)
WithColumn
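As a rough illustration of how several of the functions listed above combine, the sketch below builds a SparkDataFrame from R's built-in `faithful` dataset, filters and aggregates it, and collects the result locally. It assumes a local Spark installation that `sparkR.session()` can start; the variable names and the threshold value are illustrative only.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

# Create a SparkDataFrame from a local R data.frame.
df <- createDataFrame(faithful)

# Keep the longer eruptions (the threshold 3 is arbitrary).
long_eruptions <- filter(df, df$eruptions > 3)

# Count rows per waiting time, then sort by that count, descending.
counts <- summarize(groupBy(long_eruptions, long_eruptions$waiting),
                    count = n(long_eruptions$waiting))
sorted <- arrange(counts, desc(counts$count))

# head() and collect() both return ordinary R data.frames.
top_rows <- head(sorted, 5)
local_counts <- collect(sorted)

sparkR.session.stop()
```

Everything before `head()` and `collect()` stays a SparkDataFrame; only those two calls move data into the local R session.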
Data import and export
[read.df()](read.df.html)
[loadDF()](read.df.html)
Load a SparkDataFrame
[read.jdbc()](read.jdbc.html)
Create a SparkDataFrame representing the database table accessible via JDBC URL
[read.json()](read.json.html)
Create a SparkDataFrame from a JSON file.
[read.orc()](read.orc.html)
Create a SparkDataFrame from an ORC file.
[read.parquet()](read.parquet.html)
Create a SparkDataFrame from a Parquet file.
[read.text()](read.text.html)
Create a SparkDataFrame from a text file.
[write.df()](write.df.html)
[saveDF()](write.df.html)
Save the contents of a SparkDataFrame to a data source.
[write.jdbc()](write.jdbc.html)
Save the content of a SparkDataFrame to an external database table via JDBC.
[write.json()](write.json.html)
Save the contents of a SparkDataFrame as a JSON file
[write.orc()](write.orc.html)
Save the contents of a SparkDataFrame as an ORC file, preserving the schema.
[write.parquet()](write.parquet.html)
Save the contents of a SparkDataFrame as a Parquet file, preserving the schema.
[write.text()](write.text.html)
Save the content of a SparkDataFrame in a text file at the specified path.
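The readers and writers above are typically used as a pair. The following is a minimal sketch of a Parquet round trip, assuming a local Spark installation; the output path under `tempdir()` is a hypothetical location chosen for illustration.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

df <- createDataFrame(faithful)

# Hypothetical output location under the R session's temp directory.
out_path <- file.path(tempdir(), "faithful_parquet")

# Write as Parquet; mode = "overwrite" replaces any existing output.
write.df(df, path = out_path, source = "parquet", mode = "overwrite")

# Read the data back into a new SparkDataFrame.
restored <- read.df(out_path, source = "parquet")

printSchema(restored)
n_rows <- count(restored)

sparkR.session.stop()
```

`write.parquet()` and `read.parquet()` should serve the same purpose here when the format is fixed; `write.df()` and `read.df()` take the format as the `source` argument.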
Column functions
Schema Definitions
Structured Streaming
Spark MLlib
MLlib is Spark’s machine learning (ML) library
[AFTSurvivalRegressionModel-class](AFTSurvivalRegressionModel-class.html)
S4 class that represents an AFTSurvivalRegressionModel
[ALSModel-class](ALSModel-class.html)
S4 class that represents an ALSModel
[BisectingKMeansModel-class](BisectingKMeansModel-class.html)
S4 class that represents a BisectingKMeansModel
[DecisionTreeClassificationModel-class](DecisionTreeClassificationModel-class.html)
S4 class that represents a DecisionTreeClassificationModel
[DecisionTreeRegressionModel-class](DecisionTreeRegressionModel-class.html)
S4 class that represents a DecisionTreeRegressionModel
[FMClassificationModel-class](FMClassificationModel-class.html)
S4 class that represents an FMClassificationModel
[FMRegressionModel-class](FMRegressionModel-class.html)
S4 class that represents an FMRegressionModel
[FPGrowthModel-class](FPGrowthModel-class.html)
S4 class that represents an FPGrowthModel
[GBTClassificationModel-class](GBTClassificationModel-class.html)
S4 class that represents a GBTClassificationModel
[GBTRegressionModel-class](GBTRegressionModel-class.html)
S4 class that represents a GBTRegressionModel
[GaussianMixtureModel-class](GaussianMixtureModel-class.html)
S4 class that represents a GaussianMixtureModel
[GeneralizedLinearRegressionModel-class](GeneralizedLinearRegressionModel-class.html)
S4 class that represents a generalized linear model
[glm(_<formula>_,_<ANY>_,_<SparkDataFrame>_)](glm.html)
Generalized Linear Models (R-compliant)
[IsotonicRegressionModel-class](IsotonicRegressionModel-class.html)
S4 class that represents an IsotonicRegressionModel
[KMeansModel-class](KMeansModel-class.html)
S4 class that represents a KMeansModel
[KSTest-class](KSTest-class.html)
S4 class that represents a KSTest
[LDAModel-class](LDAModel-class.html)
S4 class that represents an LDAModel
[LinearRegressionModel-class](LinearRegressionModel-class.html)
S4 class that represents a LinearRegressionModel
[LinearSVCModel-class](LinearSVCModel-class.html)
S4 class that represents a LinearSVCModel
[LogisticRegressionModel-class](LogisticRegressionModel-class.html)
S4 class that represents a LogisticRegressionModel
[MultilayerPerceptronClassificationModel-class](MultilayerPerceptronClassificationModel-class.html)
S4 class that represents a MultilayerPerceptronClassificationModel
[NaiveBayesModel-class](NaiveBayesModel-class.html)
S4 class that represents a NaiveBayesModel
[PowerIterationClustering-class](PowerIterationClustering-class.html)
S4 class that represents a PowerIterationClustering
[PrefixSpan-class](PrefixSpan-class.html)
S4 class that represents a PrefixSpan
[RandomForestClassificationModel-class](RandomForestClassificationModel-class.html)
S4 class that represents a RandomForestClassificationModel
[RandomForestRegressionModel-class](RandomForestRegressionModel-class.html)
S4 class that represents a RandomForestRegressionModel
[fitted()](fitted.html)
Get fitted result from a k-means model
[freqItems(_<SparkDataFrame>_,_<character>_)](freqItems.html)
Finding frequent items for columns, possibly with false positives
[spark.als()](spark.als.html)
[summary(_<ALSModel>_)](spark.als.html)
[predict(_<ALSModel>_)](spark.als.html)
[write.ml(_<ALSModel>_,_<character>_)](spark.als.html)
Alternating Least Squares (ALS) for Collaborative Filtering
[spark.bisectingKmeans()](spark.bisectingKmeans.html)
[summary(_<BisectingKMeansModel>_)](spark.bisectingKmeans.html)
[predict(_<BisectingKMeansModel>_)](spark.bisectingKmeans.html)
[fitted(_<BisectingKMeansModel>_)](spark.bisectingKmeans.html)
[write.ml(_<BisectingKMeansModel>_,_<character>_)](spark.bisectingKmeans.html)
Bisecting K-Means Clustering Model
[spark.decisionTree()](spark.decisionTree.html)
[summary(_<DecisionTreeRegressionModel>_)](spark.decisionTree.html)
[print(_<summary.DecisionTreeRegressionModel>_)](spark.decisionTree.html)
[summary(_<DecisionTreeClassificationModel>_)](spark.decisionTree.html)
[print(_<summary.DecisionTreeClassificationModel>_)](spark.decisionTree.html)
[predict(_<DecisionTreeRegressionModel>_)](spark.decisionTree.html)
[predict(_<DecisionTreeClassificationModel>_)](spark.decisionTree.html)
[write.ml(_<DecisionTreeRegressionModel>_,_<character>_)](spark.decisionTree.html)
[write.ml(_<DecisionTreeClassificationModel>_,_<character>_)](spark.decisionTree.html)
Decision Tree Model for Regression and Classification
[spark.fmClassifier()](spark.fmClassifier.html)
[summary(_<FMClassificationModel>_)](spark.fmClassifier.html)
[predict(_<FMClassificationModel>_)](spark.fmClassifier.html)
[write.ml(_<FMClassificationModel>_,_<character>_)](spark.fmClassifier.html)
Factorization Machines Classification Model
[spark.fmRegressor()](spark.fmRegressor.html)
[summary(_<FMRegressionModel>_)](spark.fmRegressor.html)
[predict(_<FMRegressionModel>_)](spark.fmRegressor.html)
[write.ml(_<FMRegressionModel>_,_<character>_)](spark.fmRegressor.html)
Factorization Machines Regression Model
[spark.fpGrowth()](spark.fpGrowth.html)
[spark.freqItemsets()](spark.fpGrowth.html)
[spark.associationRules()](spark.fpGrowth.html)
[predict(_<FPGrowthModel>_)](spark.fpGrowth.html)
[write.ml(_<FPGrowthModel>_,_<character>_)](spark.fpGrowth.html)
FP-growth
[spark.gaussianMixture()](spark.gaussianMixture.html)
[summary(_<GaussianMixtureModel>_)](spark.gaussianMixture.html)
[predict(_<GaussianMixtureModel>_)](spark.gaussianMixture.html)
[write.ml(_<GaussianMixtureModel>_,_<character>_)](spark.gaussianMixture.html)
Multivariate Gaussian Mixture Model (GMM)
[spark.gbt()](spark.gbt.html)
[summary(_<GBTRegressionModel>_)](spark.gbt.html)
[print(_<summary.GBTRegressionModel>_)](spark.gbt.html)
[summary(_<GBTClassificationModel>_)](spark.gbt.html)
[print(_<summary.GBTClassificationModel>_)](spark.gbt.html)
[predict(_<GBTRegressionModel>_)](spark.gbt.html)
[predict(_<GBTClassificationModel>_)](spark.gbt.html)
[write.ml(_<GBTRegressionModel>_,_<character>_)](spark.gbt.html)
[write.ml(_<GBTClassificationModel>_,_<character>_)](spark.gbt.html)
Gradient Boosted Tree Model for Regression and Classification
[spark.glm()](spark.glm.html)
[summary(_<GeneralizedLinearRegressionModel>_)](spark.glm.html)
[print(_<summary.GeneralizedLinearRegressionModel>_)](spark.glm.html)
[predict(_<GeneralizedLinearRegressionModel>_)](spark.glm.html)
[write.ml(_<GeneralizedLinearRegressionModel>_,_<character>_)](spark.glm.html)
Generalized Linear Models
[spark.isoreg()](spark.isoreg.html)
[summary(_<IsotonicRegressionModel>_)](spark.isoreg.html)
[predict(_<IsotonicRegressionModel>_)](spark.isoreg.html)
[write.ml(_<IsotonicRegressionModel>_,_<character>_)](spark.isoreg.html)
Isotonic Regression Model
[spark.kmeans()](spark.kmeans.html)
[summary(_<KMeansModel>_)](spark.kmeans.html)
[predict(_<KMeansModel>_)](spark.kmeans.html)
[write.ml(_<KMeansModel>_,_<character>_)](spark.kmeans.html)
K-Means Clustering Model
[spark.kstest()](spark.kstest.html)
[summary(_<KSTest>_)](spark.kstest.html)
[print(_<summary.KSTest>_)](spark.kstest.html)
(One-Sample) Kolmogorov-Smirnov Test
[spark.lda()](spark.lda.html)
[spark.posterior()](spark.lda.html)
[spark.perplexity()](spark.lda.html)
[summary(_<LDAModel>_)](spark.lda.html)
[write.ml(_<LDAModel>_,_<character>_)](spark.lda.html)
Latent Dirichlet Allocation
[spark.lm()](spark.lm.html)
[summary(_<LinearRegressionModel>_)](spark.lm.html)
[predict(_<LinearRegressionModel>_)](spark.lm.html)
[write.ml(_<LinearRegressionModel>_,_<character>_)](spark.lm.html)
Linear Regression Model
[spark.logit()](spark.logit.html)
[summary(_<LogisticRegressionModel>_)](spark.logit.html)
[predict(_<LogisticRegressionModel>_)](spark.logit.html)
[write.ml(_<LogisticRegressionModel>_,_<character>_)](spark.logit.html)
Logistic Regression Model
[spark.mlp()](spark.mlp.html)
[summary(_<MultilayerPerceptronClassificationModel>_)](spark.mlp.html)
[predict(_<MultilayerPerceptronClassificationModel>_)](spark.mlp.html)
[write.ml(_<MultilayerPerceptronClassificationModel>_,_<character>_)](spark.mlp.html)
Multilayer Perceptron Classification Model
[spark.naiveBayes()](spark.naiveBayes.html)
[summary(_<NaiveBayesModel>_)](spark.naiveBayes.html)
[predict(_<NaiveBayesModel>_)](spark.naiveBayes.html)
[write.ml(_<NaiveBayesModel>_,_<character>_)](spark.naiveBayes.html)
Naive Bayes Models
[spark.assignClusters()](spark.powerIterationClustering.html)
PowerIterationClustering
[spark.findFrequentSequentialPatterns()](spark.prefixSpan.html)
PrefixSpan
[spark.randomForest()](spark.randomForest.html)
[summary(_<RandomForestRegressionModel>_)](spark.randomForest.html)
[print(_<summary.RandomForestRegressionModel>_)](spark.randomForest.html)
[summary(_<RandomForestClassificationModel>_)](spark.randomForest.html)
[print(_<summary.RandomForestClassificationModel>_)](spark.randomForest.html)
[predict(_<RandomForestRegressionModel>_)](spark.randomForest.html)
[predict(_<RandomForestClassificationModel>_)](spark.randomForest.html)
[write.ml(_<RandomForestRegressionModel>_,_<character>_)](spark.randomForest.html)
[write.ml(_<RandomForestClassificationModel>_,_<character>_)](spark.randomForest.html)
Random Forest Model for Regression and Classification
[spark.survreg()](spark.survreg.html)
[summary(_<AFTSurvivalRegressionModel>_)](spark.survreg.html)
[predict(_<AFTSurvivalRegressionModel>_)](spark.survreg.html)
[write.ml(_<AFTSurvivalRegressionModel>_,_<character>_)](spark.survreg.html)
Accelerated Failure Time (AFT) Survival Regression Model
[spark.svmLinear()](spark.svmLinear.html)
[predict(_<LinearSVCModel>_)](spark.svmLinear.html)
[summary(_<LinearSVCModel>_)](spark.svmLinear.html)
[write.ml(_<LinearSVCModel>_,_<character>_)](spark.svmLinear.html)
Linear SVM Model
[read.ml()](read.ml.html)
Load a fitted MLlib model from the input path.
[write.ml()](write.ml.html)
Saves the MLlib model to the input path
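Most of the model entries above share one pattern: a `spark.*` fitting function plus `summary()`, `predict()`, and `write.ml()` methods, with `read.ml()` to load the model again. The sketch below walks through that pattern with `spark.glm()`, assuming a local Spark installation; the formula and the save path are illustrative choices only.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

df <- createDataFrame(faithful)

# Fit a Gaussian GLM predicting waiting time from eruption length.
model <- spark.glm(df, waiting ~ eruptions, family = "gaussian")
model_summary <- summary(model)

# predict() returns a SparkDataFrame with an added "prediction" column.
preds <- predict(model, df)
pred_cols <- columns(preds)
first_preds <- head(select(preds, "eruptions", "waiting", "prediction"))

# Hypothetical save location; overwrite = TRUE replaces an earlier save.
model_path <- file.path(tempdir(), "faithful_glm")
write.ml(model, model_path, overwrite = TRUE)
reloaded <- read.ml(model_path)

sparkR.session.stop()
```

Other model families should follow the same fit, inspect, predict, and persist sequence, although the arguments of each `spark.*` function differ.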