MLWhiz | AI Unwrapped

MLWhiz | AI Unwrapped

Share this post

MLWhiz | AI Unwrapped
MLWhiz | AI Unwrapped
Understanding Transformers, the Data Science Way
Copy link
Facebook
Email
Notes
More

Understanding Transformers, the Data Science Way

Who are we? - Data Scientists

Rahul Agarwal's avatar
Rahul Agarwal
Sep 20, 2020
∙ Paid

Share this post

MLWhiz | AI Unwrapped
MLWhiz | AI Unwrapped
Understanding Transformers, the Data Science Way
Copy link
Facebook
Email
Notes
More
Share
Understanding Transformers, the Data Science Way

Transformers have become the defacto standard for NLP tasks nowadays.

While the Transformer architecture was introduced with NLP, they are now being used in Computer Vision and to generate music as well. I am sure you would all have heard about the GPT3 Transformer and its applications thereof.

But all these things aside, they are still hard to understand as ever.

It has taken me multiple readings through the Google research paper that first introduced transformers along with just so many blog posts to really understand how a transformer works.

So, I thought of putting the whole idea down in as simple words as possible and with some very basic Math and some puns as I am a proponent of having some fun while learning. I will try to keep both the jargon and the technicality to a minimum, yet it is such a topic that I could only do so much. And my goal is to make the reader understand even the most gory details of Transformer by the end of this post.

Also, this is officially my longest post bo…

Keep reading with a 7-day free trial

Subscribe to MLWhiz | AI Unwrapped to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Rahul Agarwal
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More