Practical Spark Tips for Data Scientists

Rahul Agarwal
Mar 20, 2020

I know — Spark is sometimes frustrating to work with.

Although we can sometimes manage our big data with tools like RAPIDS or parallelization, there is no way around Spark once you are working with terabytes of data.

In my last few posts on Spark, I explained how to work with PySpark RDDs and DataFrames. Although those posts cover a lot of ground on RDD and DataFrame operations, they are still not quite enough.
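For context, here is a minimal sketch of the kind of RDD and DataFrame operations those earlier posts walk through. The app name and toy data are illustrative choices of mine, not examples from those posts:

```python
from pyspark.sql import SparkSession

# A local session; in practice you would run this against a cluster.
spark = SparkSession.builder.appName("spark-tips-demo").getOrCreate()

# RDD API: transform and aggregate a parallelized collection.
rdd = spark.sparkContext.parallelize(range(10))
squares_sum = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
print(squares_sum)  # 285

# DataFrame API: the same computation, expressed declaratively.
df = spark.createDataFrame([(x,) for x in range(10)], ["n"])
df.selectExpr("sum(n * n) AS squares_sum").show()
```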

Why? Because Spark throws memory errors all the time, and it is only when you genuinely work on big datasets with Spark that you learn how to truly work with it.

This post is going to be about "Practical Spark and memory management tips for Data Scientists."
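As a taste of what that involves, here is a minimal, hedged sketch of the kind of memory levers such tips revolve around. The specific values (executor memory, shuffle partition count) are illustrative assumptions on my part, not the post's actual recommendations:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

# Size executor memory and shuffle parallelism explicitly rather than
# relying on defaults; both values here are made-up placeholders.
spark = (
    SparkSession.builder
    .appName("memory-demo")
    .config("spark.executor.memory", "8g")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

df = spark.range(10_000_000)

# Let cached partitions spill to disk instead of failing with an OOM error.
df.persist(StorageLevel.MEMORY_AND_DISK)
print(df.count())
```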
