A Layman's guide to moving from Keras to PyTorch

Recently I started a text classification competition on Kaggle, and as part of the competition, I had to move to PyTorch to get deterministic results. I have always worked with Keras in the past and it has given me pretty good results, but I got to know that the CuDNNGRU/CuDNNLSTM layers in Keras are not deterministic, even after setting the seeds.
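
For context, a minimal sketch of what "setting the seeds" means on the PyTorch side (a generic seeding helper, not the code from the linked post; the cuDNN flags trade some speed for determinism):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed every RNG PyTorch touches so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Ask cuDNN for deterministic kernels (slower, but reproducible).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```
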
What Kagglers are using for Text Classification

With the problem of image classification more or less solved by deep learning, text classification is the next developing theme in deep learning. For those who don't know, text classification is a common task in natural language processing that transforms a sequence of text of indefinite length into a category. How could you use it? To find the sentiment of a review, to find toxic comments on a platform like Facebook, or to find insincere questions on Quora.
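
As a minimal illustration of the task itself, here is a toy scikit-learn pipeline (illustrative only; the post itself covers the deep learning approaches Kagglers use):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Variable-length text in, fixed category out.
texts = ["loved this movie, great acting", "terrible plot, waste of time"]
labels = [1, 0]  # 1 = positive sentiment, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["great acting but a terrible plot"]))
```
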
To all Data Scientists - The one Graph Algorithm you need to know

Graphs provide us with a very useful data structure. They can help us find structure within our data. With the advent of machine learning and big data, we need to get as much information as possible about our data, and learning a little bit of graph theory can certainly help us with that. Here is a Graph Analytics for Big Data course on Coursera by UC San Diego which I highly recommend for learning the basics of graph theory.
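
To make "finding structure" concrete, here is a tiny networkx sketch; connected components is chosen purely for illustration and is not necessarily the algorithm the post refers to:

```python
import networkx as nx

# A small friendship graph; connected components reveal the
# separate communities hiding in a flat edge list.
G = nx.Graph()
G.add_edges_from([("alice", "bob"), ("bob", "carol"), ("dan", "erin")])
for component in nx.connected_components(G):
    print(component)  # {'alice', 'bob', 'carol'} then {'dan', 'erin'}
```
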
Using XGBoost for time series prediction tasks

Recently Kaggle master Kazanova, along with some of his friends, released a "How to Win a Data Science Competition" Coursera course. You can start for free with the 7-day free trial. The course involved a final project, which was itself a time series prediction problem. Here I will describe how I got a top-10 position as of writing this article. Description of the problem: in this competition we were given a challenging time-series dataset consisting of daily sales data, kindly provided by one of the largest Russian software firms, 1C Company.
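
The usual way to point XGBoost at a time series is to reframe it as supervised learning with lag features. A generic sketch under that assumption (illustrative column names, not the competition's actual schema):

```python
import pandas as pd
import xgboost as xgb

# Toy monthly sales series; real data would come from the competition files.
df = pd.DataFrame({"month": range(1, 25), "sales": range(100, 124)})

# Lag features turn "predict the future" into an ordinary regression.
for lag in (1, 2, 3):
    df[f"sales_lag_{lag}"] = df["sales"].shift(lag)
df = df.dropna()

train = df[df["month"] < 22]
test = df[df["month"] >= 22]
features = [c for c in df.columns if c.startswith("sales_lag")]

model = xgb.XGBRegressor(n_estimators=100, max_depth=3)
model.fit(train[features], train["sales"])
print(model.predict(test[features]))
```
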
Good Feature Building Techniques - Tricks for Kaggle - My Kaggle Code Repository

Oftentimes we fall short of creativity, and creativity is one of the basic ingredients of what we do: creating features needs creativity. So here is a list of ideas I have gathered in day-to-day life, where people have used creativity to get great results on Kaggle leaderboards. Take a look at the How to Win a Data Science Competition: Learn from Top Kagglers course in the Advanced Machine Learning specialization by Kazanova (the number 3 Kaggler at the time of writing).
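One example of the kind of trick such lists collect, sketched generically (a groupby-aggregate feature; not necessarily from the post):

```python
import pandas as pd

# Aggregate a numeric column per group and merge it back as features.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "purchase": [10.0, 30.0, 5.0, 5.0, 20.0],
})
agg = (df.groupby("user_id")["purchase"]
         .agg(["mean", "max"])
         .add_prefix("user_purchase_")
         .reset_index())
df = df.merge(agg, on="user_id", how="left")
print(df)
```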

Maths Beats Intuition probably every damn time

Einstein once said that "God does not play dice with the universe". But actually, he does. Everything happening around us can be explained in terms of probabilities. We repeatedly watch things around us happen due to chance, yet we never learn; we always get dumbfounded by the playfulness of nature. One such way intuition plays with us is the Birthday problem. Problem statement: in a room full of N people, what is the probability that two or more people share the same birthday (assumption: 365 days in a year)?
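The standard answer works through the complement: the chance that all N birthdays are distinct is the product of (365 - k)/365 for k from 0 to N-1, so the answer is one minus that. In Python:

```python
def birthday_collision_prob(n: int) -> float:
    """P(at least two of n people share a birthday), 365-day year."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

print(round(birthday_collision_prob(23), 4))  # 0.5073 -- past 50% at just 23 people
```

The counterintuitive part is how fast the product decays: the probability crosses 50% at only 23 people.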

Today I Learned This Part I: What are word2vec Embeddings?

Recently Quora put out a question similarity competition on Kaggle. This was the first time I attempted an NLP problem, so there was a lot to learn. The one thing that blew my mind away was the word2vec embeddings. Till now, whenever I heard the term word2vec, I visualized it as a way to create a bag-of-words vector for a sentence. For those who don't know bag of words: if we have a series of sentences (documents), a bag of words represents each document as a vector of word counts over the shared vocabulary, ignoring word order.
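
A minimal sketch of that bag-of-words view, using scikit-learn's CountVectorizer (word2vec, by contrast, learns a dense vector per word):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Each document becomes a vector of raw word counts; order and
# meaning are discarded, unlike with word2vec embeddings.
docs = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(X.toarray())
```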