Creating a great machine learning system is an art. There are a lot of things to consider while building one. But we as data scientists often worry about only certain parts of the project. Most of the time that part is modeling, but in reality, the success or failure of a machine learning project depends on a lot of other factors.
I always get confused whenever someone talks about generative vs. discriminative classification models. I end up reading about it again and again, yet somehow it eludes me. So I thought of writing a post on it to improve my understanding. This post is about understanding generative models and how they differ from discriminative models. In the end, we will create a simple generative model ourselves. Discriminative vs. Generative Classifiers. Problem Statement: Given some input data X, we want to classify the data into labels y.
We as data scientists have gotten quite comfortable with Pandas, SQL, or any other relational database. We are used to seeing our users in rows with their attributes as columns. But does the real world really behave like that? In a connected world, users cannot be considered independent entities. They have relationships with one another, and we would sometimes like to include those relationships while building our machine learning models.
Just kidding, nothing is hotter than Jennifer Lawrence. But as you are here, let’s proceed. Practitioners in any field are only as good as the tools they use. Data scientists are no different. But sometimes we don’t even know which tools we need, or whether we need them at all. We are not able to fathom that there could be a more natural way to solve the problem we face.
Einstein once said that “God does not play dice with the universe”. But actually he does. Everything happening around us could be explained in terms of probabilities. We repeatedly watch things around us happen by chance, yet we never learn. We always get dumbfounded by the playfulness of nature. One of the ways our intuition plays tricks on us is the Birthday problem. Problem Statement: In a room full of N people, what is the probability that 2 or more people share the same birthday? (Assumption: 365 days in a year, each equally likely.)
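To make the problem statement concrete, here is a minimal sketch of one way to compute the answer. It uses the standard complement trick: first find the probability that all N birthdays are distinct, then subtract it from 1. The function name `birthday_collision_probability` and its parameters are my own for illustration, and the code assumes every day of the year is equally likely.

```python
def birthday_collision_probability(n_people: int, days_in_year: int = 365) -> float:
    """Probability that at least two of n_people share a birthday.

    Assumes birthdays are independent and uniform over days_in_year days.
    """
    if n_people < 0:
        raise ValueError("n_people must be non-negative")
    # With more people than days, a shared birthday is guaranteed.
    if n_people > days_in_year:
        return 1.0

    # P(all distinct) = (365/365) * (364/365) * ... * ((365 - N + 1)/365)
    p_all_distinct = 1.0
    for k in range(n_people):
        p_all_distinct *= (days_in_year - k) / days_in_year

    # P(at least one shared birthday) is the complement.
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(birthday_collision_probability(n), 4))
```

Under these assumptions the probability crosses 50% at just 23 people, which is the part most of us find counterintuitive.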
Recently Quora put out a question similarity competition on Kaggle. This was the first time I had attempted an NLP problem, so there was a lot to learn. The one thing that blew my mind was the word2vec embeddings. Until now, whenever I heard the term word2vec, I visualized it as a way to create a bag-of-words vector for a sentence. For those who don’t know bag of words: If we have a series of sentences (documents)
I have been looking to create this list for a while now. There are many people on Quora who ask me how I started in the data science field. And so I wanted to create this reference. To be frank, when I first started learning, it all looked very utopian and out of this world. The Andrew Ng course felt like black magic. And it still doesn’t cease to amaze me.