We, as data scientists, have gotten quite comfortable with Pandas, SQL, or some other relational database. We are used to seeing our users in rows with their attributes as columns. But does the real world behave like that? In a connected world, users cannot be considered independent entities. They have relationships with each other, and we would sometimes like to include those relationships while building our machine learning models.
When we create our machine learning models, a common task that falls on us is tuning them. People end up taking different manual approaches. Some of them work and some don’t, and a lot of time is spent waiting and running the code again and again. That brings us to the quintessential question: can we automate this process? A while back, I was working on an in-class competition from the “How to win a data science competition” Coursera course.
Creating a great machine learning system is an art, and there are a lot of things to consider while building one. But it often happens that we as data scientists only worry about certain parts of the project. Most of the time that part is modeling, but in reality, the success or failure of a machine learning project depends on a lot of other factors.
I always get confused whenever someone talks about generative vs. discriminative classification models. I end up reading about it again and again, yet somehow it eludes me. So I thought of writing a post on it to improve my understanding. This post is about understanding generative models and how they differ from discriminative models. In the end, we will create a simple generative model ourselves. Discriminative vs. Generative Classifiers. Problem statement: Given some input data X, we want to classify the data into labels y.
We, as data scientists, have gotten quite comfortable with Pandas, SQL, or some other relational database. We are used to seeing our users in rows with their attributes as columns. But does the real world really behave like that? In a connected world, users cannot be considered independent entities. They have relationships with each other, and we would sometimes like to include those relationships while building our machine learning models.
One of the main tasks while working with text data is to create a lot of text-based features. One might want to find certain patterns in the text, such as email addresses or phone numbers in a large body of text. While such tasks may sound fairly trivial, they become much simpler if we use the power of Python’s regex module. For example, let’s say you are tasked with finding the number of punctuation marks in a particular piece of text.
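As a rough illustration of the punctuation-counting task mentioned above, here is a minimal sketch using Python’s `re` module. The function name `count_punctuation` and the sample sentence are made up for this example, and it assumes that "punctuation" means the ASCII characters in `string.punctuation`; other definitions (for instance, Unicode punctuation) would need a different pattern.

```python
import re
import string

# Character class built from string.punctuation (ASCII punctuation only).
# re.escape makes characters such as ']' and '\' safe inside the brackets.
PUNCT_PATTERN = re.compile(f"[{re.escape(string.punctuation)}]")


def count_punctuation(text: str) -> int:
    """Return how many ASCII punctuation characters appear in text."""
    # findall returns one match per punctuation character found.
    return len(PUNCT_PATTERN.findall(text))


# Hypothetical sample text, used only for illustration.
sample = "Hello, world! Is this a test? Yes: it is."
print(count_punctuation(sample))  # prints 5
```

The same count could be obtained with a plain loop over the characters; the regex version is shown because it extends naturally to other patterns, such as email addresses or phone numbers.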
I am a mechanical engineer by education, and I started my career with a core job in the steel industry. But I didn’t like it, so I left. I made it my goal to move into the analytics and data science space sometime around 2013. Since then, it has taken me a lot of failures and a lot of effort to make the shift. Now, people on social networks ask me how I got started in the data science field.