MLWhiz | AI Unwrapped

Accelerating Spark 3.0 Google DataProc Project with NVIDIA GPUs in 6 simple steps

Rahul Agarwal
Aug 04, 2020

Data Exploration is a key part of Data Science. And does it take long? Ahh. Don’t even ask. Preparing a data set for ML requires not only understanding the data set, cleaning it, and creating new features, but also repeating these steps until we have a fine-tuned system.

As we moved towards bigger datasets, Apache Spark came as a ray of hope. It gave us a scalable, distributed, in-memory system to work with Big Data. Along the way, we also saw frameworks like PyTorch and TensorFlow that inherently parallelize matrix computations across thousands of GPU cores.

But in the past, we never saw these two systems working in tandem. We continued to use Spark for Big Data ETL tasks and GPUs for matrix-intensive problems in Deep Learning.
