Explaining BERT Simply Using Sketches

Jul 24, 2021

In my last series of posts on Transformers, I talked about how a transformer works and how to implement one yourself for a translation task.

In this post, I will go a step further and try to explain BERT, one of the most popular NLP models with a Transformer at its core. When it was first introduced, BERT achieved state-of-the-art performance on many NLP tasks, including classification, question answering, and NER tagging.

Specifically, unlike other posts on the same topic, I will go through the highly influential BERT paper, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," keeping the jargon to a minimum and explaining how BERT works through sketches.
