Accelerating a Spark 3.0 Google Dataproc Project with NVIDIA GPUs in 6 Simple Steps
Data Exploration is a key part of Data Science. And does it take long? Ahh. Don’t even ask. Preparing a dataset for ML requires not only understanding it, cleaning it, and creating new features; it also means repeating these steps over and over until we have a fine-tuned system.
As we moved towards bigger datasets, Apache Spark came as a ray of hope. It gave us a scalable, distributed in-memory system to work with Big Data. At the same time, frameworks like PyTorch and TensorFlow appeared that inherently parallelize matrix computations across thousands of GPU cores.
But we rarely saw these two systems working in tandem. We continued to use Spark for Big Data ETL tasks and GPUs for matrix-intensive problems in Deep Learning.