Minimal Pandas Subset for Data Scientists on GPU
Data manipulation is a breeze with pandas, and it has become such a standard for it that a lot of parallelization libraries like Rapids and Dask are being created in line with Pandas syntax.
Sometimes back, I wrote about the subset of Pandas functionality I end up using often. In this post, I will talk about handling most of those data manipulation cases in Python on a GPU using cuDF.
With a sprinkling of some recommendations throughout.
PS: for benchmarking, all the experiments below are done on a Machine with 128 GB RAM and a Titan RTX GPU with 24 GB RAM.
Keep reading with a 7-day free trial
Subscribe to MLWhiz | AI Unwrapped to keep reading this post and get 7 days of free access to the full post archives.