Today I Learned This Part I: What are word2vec Embeddings?

Apr 09, 2017

∙ Paid

Recently Quora put out a Question similarity competition on Kaggle. This is the first time I was attempting an NLP problem so a lot to learn. The one thing that blew my mind away was the word2vec embeddings.

Till now whenever I heard the term word2vec I visualized it as a way to create a bag of words vector for a sentence.

For those who don’t know bag of words: If we have a series of sentences(documents)

This is good - [1,1,1,0,0]
This is bad - [1,1,0,1,0]
This is awesome - [1,1,0,0,1]

Bag of words would encode it using 0:This 1:is 2:good 3:bad 4:awesome

But it is much more powerful than that.

What word2vec does is that it creates vectors for words. What I mean by that is that we have a 300 dimensional vector for every word(common bigrams too) in a dictionary.

How does that help?

We can use this for multiple scenarios but the most common are:

A. Using word2vec embeddings we can find out similarity between words. Assume you have to answer if these two statements signify the same thing:

President g…

Keep reading with a 7-day free trial

Subscribe to MLWhiz | AI Unwrapped to keep reading this post and get 7 days of free access to the full post archives.