The Hitchhikers guide to handle Big Data using Spark

Big Data has become synonymous with data engineering, but the line between data engineers and data scientists is blurring day by day. At this point, I think Big Data must be in the repertoire of every data scientist. The reason: too much data is being generated every day. And that brings us to Spark. Most of the Spark documentation, while good, does not explain it from the perspective of a data scientist.
3 Great Additions for your Jupyter Notebooks

I love Jupyter notebooks and the power they provide. They can be used to present findings as well as share code in the most effective manner, which was not easy with earlier IDEs. Yet something is still amiss. There are a few features I want in my text editor that don’t come by default in Jupyter. But fret not. Just like everything in Python, Jupyter too has third-party extensions.
An End to End Introduction to GANs

I bet most of us have seen a lot of AI-generated faces in recent times, be it in papers or blogs. We have reached a stage where it is becoming increasingly difficult to distinguish between actual human faces and faces generated by Artificial Intelligence. In this post, I will help the reader understand how to create and build such applications on their own.
The Hitchhiker’s Guide to Feature Extraction

Good features are the backbone of any machine learning model. And good feature creation often needs domain knowledge, creativity, and lots of time. In this post, I am going to talk about: various methods of feature creation, both automated and manual; different ways to handle categorical features; longitude and latitude features; some Kaggle tricks; and some other ideas to think about for feature creation.
The Nation of a Billion Votes

It is election month in India, and a quote by Dr. Rahat Indori sums it up pretty well: “सरहदों पर बहुत तनाव है क्या , पता तो करो चुनाव है क्या !” For English speakers, this means: Is there a lot of tension at the borders? Just ask if the elections are on. This election, India has talked about a lot of issues. News channels have talked about patriotism, socialism, and religion, as well as terrorism.
A primer on *args, **kwargs, decorators for Data Scientists

Python has a lot of constructs that are reasonably easy to learn and use in our code. Then there are some constructs that always confuse us when we encounter them. And then there are some that even seasoned programmers struggle to understand. *args, **kwargs and decorators are constructs that fall into this category. I guess a lot of my data science friends have faced them too.
Python’s One Liner graph creation library with animations Hans Rosling Style

I distinctly remember the time when Seaborn came along. I was really fed up with Matplotlib. To create even simple graphs, I had to run through so many StackOverflow threads. The time I could have spent thinking up good ideas for presenting my data was being spent handling Matplotlib. And it was frustrating. Seaborn is much better than Matplotlib, yet it also demands a lot of code for a simple “good looking” graph.