The 5 Sampling Algorithms every Data Scientist need to know

Data Science is the study of algorithms. I grapple with many algorithms on a day-to-day basis, so I thought of listing some of the most common and most used ones in this new DS Algorithm series. This post is about some of the most common sampling techniques one can use while working with data. Simple Random Sampling: Say you want to select a subset of a population in which each member of the population has an equal probability of being chosen.
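As a rough illustration of simple random sampling as described in the excerpt above, here is a minimal sketch using Python's standard `random` module. The helper name `simple_random_sample`, the example population of 1 to 100, and the seed value are assumptions made for illustration; they do not come from the original post.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct items so that every member of the population
    has the same chance of being chosen (sampling without replacement).

    `seed` is an optional, illustrative parameter for reproducibility.
    """
    rng = random.Random(seed)
    # random.sample picks k unique elements uniformly at random.
    return rng.sample(list(population), k)


# Hypothetical population: the integers 1 through 100.
population = range(1, 101)
subset = simple_random_sample(population, k=10, seed=42)
print(subset)
```

Passing the same seed returns the same subset, which is handy when results need to be reproduced; leaving `seed` as `None` gives a fresh draw each time.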
Bayesian Bandits explained simply

Exploration and exploitation play a key role in any business. Any good business will try to “explore” various opportunities where it can make a profit. At the same time, it also tries to focus on a particular opportunity it has already found and “exploit” it. Let me explain this further with a thought experiment. Thought Experiment: Assume that we have infinitely many slot machines. Every slot machine has some win probability.
Minimal Pandas Subset for Data Scientists

Pandas is a vast library. Data manipulation is a breeze with pandas, and it has become such a standard that a lot of parallelization libraries like RAPIDS and Dask are being created in line with pandas syntax. Still, I generally have some issues with it. There are multiple ways of doing the same thing in pandas, and that can make it troublesome for a beginner.
The Hitchhikers guide to handle Big Data using Spark

Big Data has become synonymous with data engineering. But the line between data engineers and data scientists is blurring day by day. At this point in time, I think that Big Data must be in the repertoire of all data scientists. Reason: too much data is getting generated day by day. And that brings us to Spark. Now most of the Spark documentation, while good, does not explain it from the perspective of a data scientist.
3 Great Additions for your Jupyter Notebooks

I love Jupyter notebooks and the power they provide. They can be used to present findings as well as share code in the most effective manner, which was not easy with earlier IDEs. Yet something is still amiss. There are a few features I want in my text editor that don’t come by default in Jupyter. But fret not. Just like everything in Python, Jupyter too has third-party extensions.
An End to End Introduction to GANs

I bet most of us have seen a lot of AI-generated human faces in recent times, be it in papers or blogs. We have reached a stage where it is becoming increasingly difficult to distinguish between actual human faces and faces generated by Artificial Intelligence. In this post, I will help the reader understand how they can create and build such applications on their own.
The Hitchhiker’s Guide to Feature Extraction

Good features are the backbone of any machine learning model. And good feature creation often needs domain knowledge, creativity, and lots of time. In this post, I am going to talk about: various methods of feature creation, both automated and manual; different ways to handle categorical features; longitude and latitude features; some Kaggle tricks; and some other ideas to think about for feature creation.