Package index

Distributed Data Frame

[SparkDataFrame-class](SparkDataFrame.html)

S4 class that represents a SparkDataFrame

[groupedData()](GroupedData.html)

S4 class that represents a GroupedData

[agg()](summarize.html) [summarize()](summarize.html)

summarize

[arrange()](arrange.html) [orderBy(_<SparkDataFrame>_,_<characterOrColumn>_)](arrange.html)

Arrange Rows by Variables

[approxQuantile(_<SparkDataFrame>_,_<character>_,_<numeric>_,_<numeric>_)](approxQuantile.html)

Calculates the approximate quantiles of numerical columns of a SparkDataFrame

[as.data.frame()](as.data.frame.html)

Download data from a SparkDataFrame into an R data.frame

[attach(_<SparkDataFrame>_)](attach.html)

Attach SparkDataFrame to R search path

[broadcast()](broadcast.html)

broadcast

[cache()](cache.html)

Cache

[cacheTable()](cacheTable.html)

Cache Table

[checkpoint()](checkpoint.html)

checkpoint

[collect()](collect.html)

Collects all the elements of a SparkDataFrame and coerces them into an R data.frame.

[coltypes()](coltypes.html) [`coltypes<-`()](coltypes.html)

coltypes

[colnames()](columns.html) [`colnames<-`()](columns.html) [columns()](columns.html) [names(_<SparkDataFrame>_)](columns.html) [`names<-`(_<SparkDataFrame>_)](columns.html)

Column Names of SparkDataFrame

[count()](count.html) [n()](count.html)

Count

[createDataFrame()](createDataFrame.html) [as.DataFrame()](createDataFrame.html)

Create a SparkDataFrame

[createExternalTable()](createExternalTable-deprecated.html)

(Deprecated) Create an external table

[createOrReplaceTempView()](createOrReplaceTempView.html)

Creates a temporary view using the given name.

[createTable()](createTable.html)

Creates a table based on the dataset in a data source

[crossJoin(_<SparkDataFrame>_,_<SparkDataFrame>_)](crossJoin.html)

CrossJoin

[crosstab(_<SparkDataFrame>_,_<character>_,_<character>_)](crosstab.html)

Computes a pair-wise frequency table of the given columns

[cube()](cube.html)

cube

[describe()](describe.html)

describe

[distinct()](distinct.html) [unique(_<SparkDataFrame>_)](distinct.html)

Distinct

[dim(_<SparkDataFrame>_)](dim.html)

Returns the dimensions of SparkDataFrame

[drop()](drop.html)

drop

[dropDuplicates()](dropDuplicates.html)

dropDuplicates

[dropna()](nafunctions.html) [na.omit()](nafunctions.html) [fillna()](nafunctions.html)

A set of SparkDataFrame functions working with NA values

[dtypes()](dtypes.html)

DataTypes

[except()](except.html)

except

[exceptAll()](exceptAll.html)

exceptAll

[explain()](explain.html)

Explain

[filter()](filter.html) [where()](filter.html)

Filter

[getNumPartitions(_<SparkDataFrame>_)](getNumPartitions.html)

getNumPartitions

[group_by()](groupBy.html) [groupBy()](groupBy.html)

GroupBy

[head(_<SparkDataFrame>_)](head.html)

Head

[hint()](hint.html)

hint

[histogram(_<SparkDataFrame>_,_<characterOrColumn>_)](histogram.html)

Compute histogram statistics for given column

[insertInto()](insertInto.html)

insertInto

[intersect()](intersect.html)

Intersect

[intersectAll()](intersectAll.html)

intersectAll

[isLocal()](isLocal.html)

isLocal

[isStreaming()](isStreaming.html)

isStreaming

[join(_<SparkDataFrame>_,_<SparkDataFrame>_)](join.html)

Join

[limit()](limit.html)

Limit

[localCheckpoint()](localCheckpoint.html)

localCheckpoint

[merge()](merge.html)

Merges two data frames

[mutate()](mutate.html) [transform()](mutate.html)

Mutate

[ncol(_<SparkDataFrame>_)](ncol.html)

Returns the number of columns in a SparkDataFrame

[count(_<SparkDataFrame>_)](nrow.html) [nrow(_<SparkDataFrame>_)](nrow.html)

Returns the number of rows in a SparkDataFrame

[orderBy()](orderBy.html)

Ordering Columns in a WindowSpec

[persist()](persist.html)

Persist

[pivot(_<GroupedData>_,_<character>_)](pivot.html)

Pivot a column of the GroupedData and perform the specified aggregation.

[printSchema()](printSchema.html)

Print Schema of a SparkDataFrame

[randomSplit()](randomSplit.html)

randomSplit

[rbind()](rbind.html)

Union two or more SparkDataFrames

[rename()](rename.html) [withColumnRenamed()](rename.html)

rename

[registerTempTable()](registerTempTable-deprecated.html)

(Deprecated) Register Temporary Table

[repartition()](repartition.html)

Repartition

[repartitionByRange()](repartitionByRange.html)

Repartition by range

[rollup()](rollup.html)

rollup

[sample()](sample.html) [sample_frac()](sample.html)

Sample

[sampleBy()](sampleBy.html)

Returns a stratified sample without replacement

[saveAsTable()](saveAsTable.html)

Save the contents of the SparkDataFrame to a data source as a table

[schema()](schema.html)

Get schema object

[select()](select.html) [`$`(_<SparkDataFrame>_)](select.html) [`$<-`(_<SparkDataFrame>_)](select.html)

Select

[selectExpr()](selectExpr.html)

SelectExpr

[show(_<Column>_)](show.html) [show(_<GroupedData>_)](show.html) [show(_<SparkDataFrame>_)](show.html) [show(_<WindowSpec>_)](show.html) [show(_<StreamingQuery>_)](show.html)

show

[showDF()](showDF.html)

showDF

[str(_<SparkDataFrame>_)](str.html)

Compactly display the structure of a dataset

[storageLevel(_<SparkDataFrame>_)](storageLevel.html)

StorageLevel

[subset()](subset.html) [`[[`(_<SparkDataFrame>_,_<numericOrcharacter>_)](subset.html) [`[[<-`(_<SparkDataFrame>_,_<numericOrcharacter>_)](subset.html) [`[`(_<SparkDataFrame>_)](subset.html)

Subset

[summary()](summary.html)

summary

[take()](take.html)

Take the first NUM rows of a SparkDataFrame and return the results as an R data.frame

[tableToDF()](tableToDF.html)

Create a SparkDataFrame from a SparkSQL table or view

[toJSON(_<SparkDataFrame>_)](toJSON.html)

toJSON

[union()](union.html)

Return a new SparkDataFrame containing the union of rows

[unionAll()](unionAll.html)

Return a new SparkDataFrame containing the union of rows.

[unionByName()](unionByName.html)

Return a new SparkDataFrame containing the union of rows, matched by column names

[unpersist()](unpersist.html)

Unpersist

[unpivot()](unpivot.html) [melt(_<SparkDataFrame>_,_<ANY>_,_<ANY>_,_<character>_,_<character>_)](unpivot.html)

Unpivot a DataFrame from wide format to long format.

[with()](with.html)

Evaluate an R expression in an environment constructed from a SparkDataFrame

[withColumn()](withColumn.html)

WithColumn
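As a rough illustration of how several of the functions listed above fit together, the sketch below builds a SparkDataFrame from R's built-in `faithful` dataset, filters it, aggregates by group, and brings a few rows back to R. The local master setting, the application name, and the variable names are assumptions made for this example only.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]", appName = "index-sketch")

# createDataFrame(): local R data.frame -> SparkDataFrame
df <- createDataFrame(faithful)

# filter(): keep rows with eruptions longer than 3 minutes
long <- filter(df, df$eruptions > 3)

# groupBy() + summarize() + n(): count rows per waiting time
waiting_counts <- summarize(groupBy(long, long$waiting),
                            count = n(long$waiting))

# arrange() sorts on the cluster; head() returns a local R data.frame
head(arrange(waiting_counts, desc(waiting_counts$count)))

sparkR.session.stop()
```

Only `head()` moves data to the R process here; the other calls return new SparkDataFrame or GroupedData objects.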

Data import and export

[read.df()](read.df.html) [loadDF()](read.df.html)

Load a SparkDataFrame

[read.jdbc()](read.jdbc.html)

Create a SparkDataFrame representing the database table accessible via JDBC URL

[read.json()](read.json.html)

Create a SparkDataFrame from a JSON file.

[read.orc()](read.orc.html)

Create a SparkDataFrame from an ORC file.

[read.parquet()](read.parquet.html)

Create a SparkDataFrame from a Parquet file.

[read.text()](read.text.html)

Create a SparkDataFrame from a text file.

[write.df()](write.df.html) [saveDF()](write.df.html)

Save the contents of SparkDataFrame to a data source.

[write.jdbc()](write.jdbc.html)

Save the content of SparkDataFrame to an external database table via JDBC.

[write.json()](write.json.html)

Save the contents of SparkDataFrame as a JSON file

[write.orc()](write.orc.html)

Save the contents of SparkDataFrame as an ORC file, preserving the schema.

[write.parquet()](write.parquet.html)

Save the contents of SparkDataFrame as a Parquet file, preserving the schema.

[write.text()](write.text.html)

Save the content of SparkDataFrame in a text file at the specified path.
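A minimal sketch of a write/read round trip with the Parquet functions listed above. The temporary output path and the use of the built-in `mtcars` dataset are assumptions for illustration; other formats follow the same pattern with their own `read.*` and `write.*` pairs.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

df <- createDataFrame(mtcars)

# Hypothetical output location inside the R session's temp directory
path <- file.path(tempdir(), "mtcars_parquet")

# write.parquet(): save the SparkDataFrame, replacing any earlier output
write.parquet(df, path, mode = "overwrite")

# read.parquet(): load it back as a new SparkDataFrame
restored <- read.parquet(path)
printSchema(restored)

sparkR.session.stop()
```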

Column functions

Schema Definitions

Structured Streaming

Spark MLlib

MLlib is Spark’s machine learning (ML) library

[AFTSurvivalRegressionModel-class](AFTSurvivalRegressionModel-class.html)

S4 class that represents an AFTSurvivalRegressionModel

[ALSModel-class](ALSModel-class.html)

S4 class that represents an ALSModel

[BisectingKMeansModel-class](BisectingKMeansModel-class.html)

S4 class that represents a BisectingKMeansModel

[DecisionTreeClassificationModel-class](DecisionTreeClassificationModel-class.html)

S4 class that represents a DecisionTreeClassificationModel

[DecisionTreeRegressionModel-class](DecisionTreeRegressionModel-class.html)

S4 class that represents a DecisionTreeRegressionModel

[FMClassificationModel-class](FMClassificationModel-class.html)

S4 class that represents an FMClassificationModel

[FMRegressionModel-class](FMRegressionModel-class.html)

S4 class that represents an FMRegressionModel

[FPGrowthModel-class](FPGrowthModel-class.html)

S4 class that represents an FPGrowthModel

[GBTClassificationModel-class](GBTClassificationModel-class.html)

S4 class that represents a GBTClassificationModel

[GBTRegressionModel-class](GBTRegressionModel-class.html)

S4 class that represents a GBTRegressionModel

[GaussianMixtureModel-class](GaussianMixtureModel-class.html)

S4 class that represents a GaussianMixtureModel

[GeneralizedLinearRegressionModel-class](GeneralizedLinearRegressionModel-class.html)

S4 class that represents a generalized linear model

[glm(_<formula>_,_<ANY>_,_<SparkDataFrame>_)](glm.html)

Generalized Linear Models (R-compliant)

[IsotonicRegressionModel-class](IsotonicRegressionModel-class.html)

S4 class that represents an IsotonicRegressionModel

[KMeansModel-class](KMeansModel-class.html)

S4 class that represents a KMeansModel

[KSTest-class](KSTest-class.html)

S4 class that represents a KSTest

[LDAModel-class](LDAModel-class.html)

S4 class that represents an LDAModel

[LinearRegressionModel-class](LinearRegressionModel-class.html)

S4 class that represents a LinearRegressionModel

[LinearSVCModel-class](LinearSVCModel-class.html)

S4 class that represents a LinearSVCModel

[LogisticRegressionModel-class](LogisticRegressionModel-class.html)

S4 class that represents a LogisticRegressionModel

[MultilayerPerceptronClassificationModel-class](MultilayerPerceptronClassificationModel-class.html)

S4 class that represents a MultilayerPerceptronClassificationModel

[NaiveBayesModel-class](NaiveBayesModel-class.html)

S4 class that represents a NaiveBayesModel

[PowerIterationClustering-class](PowerIterationClustering-class.html)

S4 class that represents a PowerIterationClustering

[PrefixSpan-class](PrefixSpan-class.html)

S4 class that represents a PrefixSpan

[RandomForestClassificationModel-class](RandomForestClassificationModel-class.html)

S4 class that represents a RandomForestClassificationModel

[RandomForestRegressionModel-class](RandomForestRegressionModel-class.html)

S4 class that represents a RandomForestRegressionModel

[fitted()](fitted.html)

Get fitted result from a k-means model

[freqItems(_<SparkDataFrame>_,_<character>_)](freqItems.html)

Finding frequent items for columns, possibly with false positives

[spark.als()](spark.als.html) [summary(_<ALSModel>_)](spark.als.html) [predict(_<ALSModel>_)](spark.als.html) [write.ml(_<ALSModel>_,_<character>_)](spark.als.html)

Alternating Least Squares (ALS) for Collaborative Filtering

[spark.bisectingKmeans()](spark.bisectingKmeans.html) [summary(_<BisectingKMeansModel>_)](spark.bisectingKmeans.html) [predict(_<BisectingKMeansModel>_)](spark.bisectingKmeans.html) [fitted(_<BisectingKMeansModel>_)](spark.bisectingKmeans.html) [write.ml(_<BisectingKMeansModel>_,_<character>_)](spark.bisectingKmeans.html)

Bisecting K-Means Clustering Model

[spark.decisionTree()](spark.decisionTree.html) [summary(_<DecisionTreeRegressionModel>_)](spark.decisionTree.html) [print(_<summary.DecisionTreeRegressionModel>_)](spark.decisionTree.html) [summary(_<DecisionTreeClassificationModel>_)](spark.decisionTree.html) [print(_<summary.DecisionTreeClassificationModel>_)](spark.decisionTree.html) [predict(_<DecisionTreeRegressionModel>_)](spark.decisionTree.html) [predict(_<DecisionTreeClassificationModel>_)](spark.decisionTree.html) [write.ml(_<DecisionTreeRegressionModel>_,_<character>_)](spark.decisionTree.html) [write.ml(_<DecisionTreeClassificationModel>_,_<character>_)](spark.decisionTree.html)

Decision Tree Model for Regression and Classification

[spark.fmClassifier()](spark.fmClassifier.html) [summary(_<FMClassificationModel>_)](spark.fmClassifier.html) [predict(_<FMClassificationModel>_)](spark.fmClassifier.html) [write.ml(_<FMClassificationModel>_,_<character>_)](spark.fmClassifier.html)

Factorization Machines Classification Model

[spark.fmRegressor()](spark.fmRegressor.html) [summary(_<FMRegressionModel>_)](spark.fmRegressor.html) [predict(_<FMRegressionModel>_)](spark.fmRegressor.html) [write.ml(_<FMRegressionModel>_,_<character>_)](spark.fmRegressor.html)

Factorization Machines Regression Model

[spark.fpGrowth()](spark.fpGrowth.html) [spark.freqItemsets()](spark.fpGrowth.html) [spark.associationRules()](spark.fpGrowth.html) [predict(_<FPGrowthModel>_)](spark.fpGrowth.html) [write.ml(_<FPGrowthModel>_,_<character>_)](spark.fpGrowth.html)

FP-growth

[spark.gaussianMixture()](spark.gaussianMixture.html) [summary(_<GaussianMixtureModel>_)](spark.gaussianMixture.html) [predict(_<GaussianMixtureModel>_)](spark.gaussianMixture.html) [write.ml(_<GaussianMixtureModel>_,_<character>_)](spark.gaussianMixture.html)

Multivariate Gaussian Mixture Model (GMM)

[spark.gbt()](spark.gbt.html) [summary(_<GBTRegressionModel>_)](spark.gbt.html) [print(_<summary.GBTRegressionModel>_)](spark.gbt.html) [summary(_<GBTClassificationModel>_)](spark.gbt.html) [print(_<summary.GBTClassificationModel>_)](spark.gbt.html) [predict(_<GBTRegressionModel>_)](spark.gbt.html) [predict(_<GBTClassificationModel>_)](spark.gbt.html) [write.ml(_<GBTRegressionModel>_,_<character>_)](spark.gbt.html) [write.ml(_<GBTClassificationModel>_,_<character>_)](spark.gbt.html)

Gradient Boosted Tree Model for Regression and Classification

[spark.glm()](spark.glm.html) [summary(_<GeneralizedLinearRegressionModel>_)](spark.glm.html) [print(_<summary.GeneralizedLinearRegressionModel>_)](spark.glm.html) [predict(_<GeneralizedLinearRegressionModel>_)](spark.glm.html) [write.ml(_<GeneralizedLinearRegressionModel>_,_<character>_)](spark.glm.html)

Generalized Linear Models

[spark.isoreg()](spark.isoreg.html) [summary(_<IsotonicRegressionModel>_)](spark.isoreg.html) [predict(_<IsotonicRegressionModel>_)](spark.isoreg.html) [write.ml(_<IsotonicRegressionModel>_,_<character>_)](spark.isoreg.html)

Isotonic Regression Model

[spark.kmeans()](spark.kmeans.html) [summary(_<KMeansModel>_)](spark.kmeans.html) [predict(_<KMeansModel>_)](spark.kmeans.html) [write.ml(_<KMeansModel>_,_<character>_)](spark.kmeans.html)

K-Means Clustering Model

[spark.kstest()](spark.kstest.html) [summary(_<KSTest>_)](spark.kstest.html) [print(_<summary.KSTest>_)](spark.kstest.html)

(One-Sample) Kolmogorov-Smirnov Test

[spark.lda()](spark.lda.html) [spark.posterior()](spark.lda.html) [spark.perplexity()](spark.lda.html) [summary(_<LDAModel>_)](spark.lda.html) [write.ml(_<LDAModel>_,_<character>_)](spark.lda.html)

Latent Dirichlet Allocation

[spark.lm()](spark.lm.html) [summary(_<LinearRegressionModel>_)](spark.lm.html) [predict(_<LinearRegressionModel>_)](spark.lm.html) [write.ml(_<LinearRegressionModel>_,_<character>_)](spark.lm.html)

Linear Regression Model

[spark.logit()](spark.logit.html) [summary(_<LogisticRegressionModel>_)](spark.logit.html) [predict(_<LogisticRegressionModel>_)](spark.logit.html) [write.ml(_<LogisticRegressionModel>_,_<character>_)](spark.logit.html)

Logistic Regression Model

[spark.mlp()](spark.mlp.html) [summary(_<MultilayerPerceptronClassificationModel>_)](spark.mlp.html) [predict(_<MultilayerPerceptronClassificationModel>_)](spark.mlp.html) [write.ml(_<MultilayerPerceptronClassificationModel>_,_<character>_)](spark.mlp.html)

Multilayer Perceptron Classification Model

[spark.naiveBayes()](spark.naiveBayes.html) [summary(_<NaiveBayesModel>_)](spark.naiveBayes.html) [predict(_<NaiveBayesModel>_)](spark.naiveBayes.html) [write.ml(_<NaiveBayesModel>_,_<character>_)](spark.naiveBayes.html)

Naive Bayes Models

[spark.assignClusters()](spark.powerIterationClustering.html)

PowerIterationClustering

[spark.findFrequentSequentialPatterns()](spark.prefixSpan.html)

PrefixSpan

[spark.randomForest()](spark.randomForest.html) [summary(_<RandomForestRegressionModel>_)](spark.randomForest.html) [print(_<summary.RandomForestRegressionModel>_)](spark.randomForest.html) [summary(_<RandomForestClassificationModel>_)](spark.randomForest.html) [print(_<summary.RandomForestClassificationModel>_)](spark.randomForest.html) [predict(_<RandomForestRegressionModel>_)](spark.randomForest.html) [predict(_<RandomForestClassificationModel>_)](spark.randomForest.html) [write.ml(_<RandomForestRegressionModel>_,_<character>_)](spark.randomForest.html) [write.ml(_<RandomForestClassificationModel>_,_<character>_)](spark.randomForest.html)

Random Forest Model for Regression and Classification

[spark.survreg()](spark.survreg.html) [summary(_<AFTSurvivalRegressionModel>_)](spark.survreg.html) [predict(_<AFTSurvivalRegressionModel>_)](spark.survreg.html) [write.ml(_<AFTSurvivalRegressionModel>_,_<character>_)](spark.survreg.html)

Accelerated Failure Time (AFT) Survival Regression Model

[spark.svmLinear()](spark.svmLinear.html) [predict(_<LinearSVCModel>_)](spark.svmLinear.html) [summary(_<LinearSVCModel>_)](spark.svmLinear.html) [write.ml(_<LinearSVCModel>_,_<character>_)](spark.svmLinear.html)

Linear SVM Model

[read.ml()](read.ml.html)

Load a fitted MLlib model from the input path.

[write.ml()](write.ml.html)

Saves the MLlib model to the input path
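The model functions above share a common fit, summarize, predict, save, and load pattern. The sketch below shows it with `spark.glm()` on R's built-in `iris` dataset; the formula, the temporary model path, and the variable names are assumptions chosen for the example.

```r
library(SparkR)

# Assumption: a local Spark installation is available to SparkR.
sparkR.session(master = "local[1]")

# createDataFrame() replaces dots in column names with underscores
# (Sepal.Length becomes Sepal_Length) and emits a warning about it.
df <- createDataFrame(iris)

# Fit a Gaussian GLM
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species,
                   family = "gaussian")
summary(model)

# predict() returns a SparkDataFrame with an added "prediction" column
preds <- predict(model, df)
head(select(preds, "Sepal_Length", "prediction"))

# write.ml() / read.ml(): persist the fitted model and load it again
path <- file.path(tempdir(), "glm_model")  # hypothetical location
write.ml(model, path, overwrite = TRUE)
reloaded <- read.ml(path)

sparkR.session.stop()
```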

Distributed R

SQL Catalog

Spark Session and Context