XGBoost is one of the most used libraries for data science. At the time XGBoost came into existence, it was lightning fast compared to its nearest rival, Python’s Scikit-learn GBM. But as time has progressed, it has been rivaled by some awesome libraries like LightGBM and CatBoost, on both speed and accuracy. I, for one, use LightGBM for most of the use cases where I only have a CPU for training.
A Machine Learning project is never really complete if we don’t have a good way to showcase it. While in the past a well-made visualization or a small PPT used to be enough for showcasing a data science project, with the advent of dashboarding tools like RShiny and Dash, a good data scientist needs to have a fair bit of knowledge of web frameworks to get along. As Sten Sootla says in his satire piece, which I thoroughly enjoyed:
Recently, I was working on tuning hyperparameters for a huge Machine Learning model. Manual tuning was not an option since I had to tweak a lot of parameters. Hyperopt was also not an option, as it works serially: only a single model is built at a time. Each model took a long time to train, and I was pretty short on time.
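The bottleneck above is sequential evaluation: each set of hyperparameters is only tried after the previous model has finished training. A toy sketch of the alternative, evaluating several candidates at once, can be done with nothing but the standard library. The objective function, parameter names, and ranges here are all made up for illustration; a real run would replace `train_and_score` with actual model training (and would use processes or a cluster rather than threads, since training is CPU-bound):

```python
import concurrent.futures
import random

def train_and_score(params):
    """Stand-in for an expensive model-training run.
    A toy objective: the score is best near lr=0.1 and depth=6."""
    return -((params["lr"] - 0.1) ** 2 + (params["depth"] - 6) ** 2)

def sample_params(rng):
    """Draw one random hyperparameter candidate (hypothetical search space)."""
    return {"lr": rng.uniform(0.01, 0.3), "depth": rng.randint(3, 10)}

rng = random.Random(42)
candidates = [sample_params(rng) for _ in range(32)]

# Evaluate several candidates concurrently instead of one after another.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(train_and_score, candidates))

best = candidates[max(range(len(candidates)), key=scores.__getitem__)]
print(best)
```

This is plain random search, not the Bayesian search Hyperopt does, but it illustrates why parallel evaluation matters when a single model takes a long time to train.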
A Machine Learning project is never really complete if we don’t have a good way to showcase it. While in the past a well-made visualization or a small PPT used to be enough for showcasing a data science project, with the advent of dashboarding tools like RShiny and Dash, a good data scientist needs to have a fair bit of knowledge of web frameworks to get along. And web frameworks are hard to learn.
Data manipulation is a breeze with pandas, and it has become such a standard that many parallelization libraries like RAPIDS and Dask are being created in line with the pandas syntax. Some time back, I wrote about the subset of pandas functionality I end up using most often. In this post, I will talk about handling most of those data manipulation cases in Python on a GPU using cuDF.
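To see what "in line with the pandas syntax" means in practice, here is a small pandas snippet using a made-up sales table; cuDF exposes the same `DataFrame`, `groupby`, and `sort_values` calls, so on a GPU-equipped machine the same code typically runs with `import cudf as pd` instead:

```python
import pandas as pd  # on a GPU, cuDF aims to be a drop-in: import cudf as pd

# Hypothetical sales data for illustration
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "sales": [10, 15, 7, 3, 20],
})

# Total sales per store, highest first; cuDF supports the same chained calls.
summary = (
    df.groupby("store", as_index=False)["sales"]
      .sum()
      .sort_values("sales", ascending=False)
)
print(summary)
```

The appeal is exactly this: the data manipulation code you already know keeps working, while the execution moves to the GPU.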
I am a mechanical engineer by education, and I started my career with a core job in the steel industry, venturing around big blast furnaces and rolling mills in those heavy steel-reinforced gumboots and that plastic helmet. Artificial safety measures, to say the least, as I knew that nothing would save me if something untoward happened. Maybe some running shoes would have helped. As for the helmet, I would just say that molten steel burns at 1,370 degrees Celsius.
Recently, I was asked how to explain confidence intervals in simple terms to a layperson, and I found that it is hard to do. Confidence intervals are always a headache to explain even to someone who knows about them, let alone someone who doesn’t understand statistics. I went to Wikipedia to find something, and here is the definition: “In statistics, a confidence interval (CI) is a type of estimate computed from the statistics of the observed data.”
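Before wrestling with the definition, it can help to just see a confidence interval computed. The sketch below uses a made-up sample of commute times and the normal approximation for a 95% interval around the mean; for a sample this small, a t-multiplier (about 2.26 for 9 degrees of freedom) would be the more careful choice:

```python
import math
import statistics

# Hypothetical sample: ten daily commute times in minutes
sample = [31, 28, 35, 30, 29, 33, 27, 32, 34, 30]

n = len(sample)
mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation / sqrt(n)
sem = statistics.stdev(sample) / math.sqrt(n)

# 95% interval under the normal approximation (z = 1.96)
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean = {mean:.1f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The number the interval brackets is fixed; it is the interval itself that would vary from sample to sample, which is exactly the part that makes the definition so awkward to explain.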