Minimal Pandas Subset for Data Scientists
Pandas is a vast library.
Data manipulation is a breeze with pandas, and it has become such a standard that parallelization libraries like RAPIDS and Dask are being built to mirror Pandas syntax.
Still, I generally have some issues with it.
There are multiple ways of doing the same thing in Pandas, and that can make it troublesome for a beginner.
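To make that concrete, here is a small sketch of one task (keeping the rows for a single city) written four different ways. The DataFrame and its column names `city` and `sales` are made up for illustration and are not the data set used later in the post.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({"city": ["Delhi", "Pune", "Delhi"], "sales": [10, 20, 30]})

# Four spellings of the same row filter.
a = df[df["city"] == "Delhi"]        # boolean mask with []
b = df.loc[df["city"] == "Delhi"]    # boolean mask with .loc
c = df.query("city == 'Delhi'")      # string expression
d = df[df.city.eq("Delhi")]          # attribute access plus .eq()

# All four return the same rows with the same index.
same = a.equals(b) and b.equals(c) and c.equals(d)
print(same)
```

None of these is wrong, which is exactly why picking one and sticking to it helps.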
This has inspired me to come up with a minimal subset of pandas functions I use while coding.
I have tried them all, and currently I stick to one particular way of doing each thing. It is like a mind map.
Sometimes I pick a way because it is fast, sometimes because it is more readable, and sometimes because I can do it with my current knowledge. And sometimes I avoid a particular way because I know it will be a headache in the long run (think multi-index).
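As a rough illustration of the multi-index point, grouping by two keys produces a nested index by default, while passing `as_index=False` keeps the keys as ordinary columns. This is only a sketch on made-up data (the columns `city`, `year`, and `sales` are assumptions), not necessarily the approach the rest of the post recommends.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Pune", "Pune"],
    "year": [2019, 2020, 2019, 2019],
    "sales": [10, 20, 30, 40],
})

# Default behaviour: the group keys become a two-level MultiIndex.
nested = df.groupby(["city", "year"])["sales"].sum()

# as_index=False keeps the keys as regular columns in a flat DataFrame.
flat = df.groupby(["city", "year"], as_index=False)["sales"].sum()

print(type(nested.index).__name__)
print(list(flat.columns))
```

The flat result can be filtered, merged, and sorted with the same column-based calls used everywhere else, without first resetting or unpacking index levels.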
This post is about handling most data manipulation cases in Python in a straightforward, simple, and matter-of-fact way, with a sprinkling of recommendations throughout.
I will be using a data set…