Make your own Super Pandas using Multiproc
Parallelization is awesome.
We data scientists have got laptops with quad-core, octa-core, turbo-boost. We work with servers with even more cores and computing power.
But do we really utilize the raw power we have at hand?
Instead, we wait for time taking processes to finish. Sometimes for hours, when urgent deliverables are at hand.
Can we do better? Can we get better?
In this series of posts named Python Shorts , I will explain some simple constructs provided by Python, some essential tips and some use cases I come up with regularly in my Data Science work.
This post is about using the computing power we have at hand and applying it to the data structure we use most.
Problem Statement
We have got a huge pandas data frame, and we want to apply a complex function to it which takes a lot of time.
For this post, I will use data from the Quora Insincere Question Classification on Kaggle, and we need to create some numerical features like length, the number of punctuations, etc. on it.
The competit…
Keep reading with a 7-day free trial
Subscribe to MLWhiz | AI Unwrapped to keep reading this post and get 7 days of free access to the full post archives.