Introduction to Apache Spark's Core API (Part I)

Hello coders, I hope you are all doing well.

Over the past few months, I've been learning the Spark framework along with other big data topics. Spark is essentially a cluster computing framework, although I know a single phrase can't capture the entire framework.

In this post, I am going to discuss the core APIs of Apache Spark using Python as the programming language. I'm assuming you have a basic knowledge of the Spark framework (tuples, RDDs, pair RDDs, and DataFrames) and its initialization steps.

When we launch a Spark shell, either in Scala or Python (i.e. Spark Shell or PySpark), it initializes a SparkContext as `sc` and a SQLContext as `sqlContext`.
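To make this concrete, here is a minimal sketch of what that looks like in a PySpark shell, assuming `sc` and `sqlContext` have already been created for you (the sample numbers and the column name `value` are just illustrative):

```python
# In the PySpark shell, `sc` (SparkContext) and `sqlContext` (SQLContext)
# are created for us, so we can use them right away.

# Build an RDD from a local Python list and run a couple of actions.
numbers = sc.parallelize([1, 2, 3, 4, 5])
print(numbers.count())    # 5
print(numbers.collect())  # [1, 2, 3, 4, 5]

# Build a DataFrame from the same data through sqlContext.
df = sqlContext.createDataFrame([(n,) for n in [1, 2, 3, 4, 5]], ["value"])
df.show()
```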

For now, these are all the functions and methods for plain RDDs, i.e. RDDs without keys. In my next post, I will explain the functions and methods for pair RDDs, with multiple example snippets.
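To give a flavor of what those plain (non-pair) RDD operations look like, here is a small sketch; the sample words and lambdas are just illustrative, while the methods themselves (`map`, `filter`, `distinct`, `count`, `collect`, `takeOrdered`) are standard RDD operations:

```python
# A few common operations on a plain RDD (no key/value structure),
# assuming `sc` from the PySpark shell above.
words = sc.parallelize(["spark", "core", "api", "spark", "rdd"])

# Transformations: lazily build new RDDs.
upper = words.map(lambda w: w.upper())
long_words = words.filter(lambda w: len(w) > 3)
unique = words.distinct()

# Actions: trigger computation and return results to the driver.
print(upper.collect())        # ['SPARK', 'CORE', 'API', 'SPARK', 'RDD']
print(long_words.count())     # 3
print(unique.takeOrdered(3))  # first 3 unique words in sorted order
```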

Thanks for reading and happy coding!
