Lightning Fast XGBoost on Multiple GPUs

XGBoost is one of the most widely used libraries for data science. When XGBoost first came out, it was lightning fast compared to its nearest rival, scikit-learn’s GBM in Python. But as time has passed, it has been rivaled by some awesome libraries like LightGBM and CatBoost, on both speed and accuracy. I, for one, use LightGBM for most use cases where I only have a CPU for training.
Share your Projects even more easily with this New Streamlit Feature

A machine learning project is never really complete if we don’t have a good way to showcase it. In the past, a well-made visualization or a small PPT used to be enough for showcasing a data science project. But with the advent of dashboarding tools like RShiny and Dash, a good data scientist needs a fair bit of knowledge of web frameworks to get along. As Sten Sootla says in his satire piece, which I thoroughly enjoyed:
100x faster Hyperparameter Search Framework with Pyspark

Recently, I was working on tuning hyperparameters for a huge machine learning model. Manual tuning was not an option since I had to tweak a lot of parameters. Hyperopt was not an option either, as it works serially, i.e., only a single model is built at a time. So each model was taking a long time to train, and I was pretty short on time.
How to Deploy a Streamlit App using an Amazon Free ec2 instance?

A machine learning project is never really complete if we don’t have a good way to showcase it. In the past, a well-made visualization or a small PPT used to be enough for showcasing a data science project. But with the advent of dashboarding tools like RShiny and Dash, a good data scientist needs a fair bit of knowledge of web frameworks to get along. And web frameworks are hard to learn.
Minimal Pandas Subset for Data Scientists on GPU

Data manipulation is a breeze with pandas, and it has become so much the standard that a lot of parallelization libraries like RAPIDS and Dask are being built around pandas syntax. Some time back, I wrote about the subset of pandas functionality I end up using often. In this post, I will talk about handling most of those data manipulation cases in Python on a GPU using cuDF.
Become a Data Scientist in 2020 with these 10 resources

I am a mechanical engineer by education, and I started my career with a core job in the steel industry, venturing around big blast furnaces and rolling mills in heavy steel-reinforced gumboots and a plastic helmet. Artificial safety measures, to say the least, as I knew that nothing would save me if something untoward happened. Maybe some running shoes would have helped. As for the helmet, I would just say that molten steel burns at 1370 degrees C.
Confidence Intervals Explained Simply for Data Scientists

Recently, I was asked how to explain confidence intervals in simple terms to a layperson. I found that it is hard to do. Confidence intervals are always a headache to explain, even to someone who knows about them, let alone someone who doesn’t understand statistics. I went to Wikipedia to find something, and here is the definition: In statistics, a confidence interval (CI) is a type of estimate computed from the statistics of the observed data.