Recently I started up with a competition on kaggle on text classification, and as a part of the competition, I had to somehow move to Pytorch to get deterministic results. Now I have always worked with Keras in the past and it has given me pretty good results, but somehow I got to know that the CuDNNGRU/CuDNNLSTM layers in keras are not deterministic, even after setting the seeds.
With the problem of Image Classification is more or less solved by Deep learning, Text Classification is the next new developing theme in deep learning. For those who don’t know, Text classification is a common task in natural language processing, which transforms a sequence of text of indefinite length into a category of text. How could you use that? To find sentiment of a review. Find toxic comments in a platform like Facebook Find Insincere questions on Quora.
Recently Kaggle master Kazanova along with some of his friends released a “How to win a data science competition” Coursera course. You can start for free with the 7-day Free Trial. The Course involved a final project which itself was a time series prediction problem. Here I will describe how I got a top 10 position as of writing this article. Description of the Problem: In this competition we were given a challenging time-series dataset consisting of daily sales data, kindly provided by one of the largest Russian software firms - 1C Company.
Often times it happens that we fall short of creativity. And creativity is one of the basic ingredients of what we do. Creating features needs creativity. So here is the list of ideas I gather in day to day life, where people have used creativity to get great results on Kaggle leaderboards. Take a look at the How to Win a Data Science Competition: Learn from Top Kagglers course in the Advanced machine learning specialization by Kazanova(Number 3 Kaggler at the time of writing).
Recently Quora put out a Question similarity competition on Kaggle. This is the first time I was attempting an NLP problem so a lot to learn. The one thing that blew my mind away was the word2vec embeddings. Till now whenever I heard the term word2vec I visualized it as a way to create a bag of words vector for a sentence. For those who don’t know bag of words: If we have a series of sentences(documents)