Practical Spark Tips for Data Scientists
I know — Spark is sometimes frustrating to work with.
Although we can sometimes manage our big data with tools like Rapids or parallelization, there is no way around using Spark if you are working with terabytes of data.
In my last few posts on Spark, I explained how to work with PySpark RDDs and Dataframes. Although these posts explain a lot about how to work with RDDs and Dataframe operations, they are still not quite enough.
Why? Because Spark throws memory errors quite often, and it is only when you actually work with big datasets in Spark that you truly learn how to work with it.
This post is going to be about “Practical Spark and memory management tips for Data Scientists.”