Become a Data Scientist in 2020 with these 10 resources

I am a mechanical engineer by education, and I started my career with a core job in the steel industry.

I spent my days in heavy steel-reinforced gumboots and a plastic helmet, venturing around big blast furnaces and rolling mills. Artificial safety measures, to say the least, as I knew that nothing would save me if something untoward happened. Maybe some running shoes would have helped. As for the helmet, I would just say that molten steel burns at 1370 degrees C.

My constant fear made me realize that the job was not for me, so around 2011 I made it my goal to move into the Analytics and Data Science space. Since then, MOOCs have been my go-to option for learning new things, and I ended up taking a lot of them, good ones and bad ones.

Now in 2020, with the Data Science field changing so rapidly, there is no shortage of resources to learn data science. But that often poses a problem for beginners: where to start and what to learn. There are a lot of great resources on the internet, but that means there are a lot of bad ones too.

Too many choices often lead to stagnation, because the anxiety of choosing gets in the way of learning.

In his book The Paradox of Choice: Why More Is Less, Barry Schwartz argues that eliminating consumer choices can greatly reduce anxiety for shoppers. The same holds true for Data Science courses.

This post offers recommendations to lost souls facing too many choices about where to start their Data Science journey.


1) Python 3 Programming Specialization

“GoodBye World” for Python 2.7!!!

First, you need a programming language. This specialization from the University of Michigan is about learning to use Python and creating things on your own.

You will learn about programming fundamentals like variables, conditionals, and loops, and get to some intermediate material like keyword parameters, list comprehensions, lambda expressions, and class inheritance.
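
To give a flavor of that intermediate material, here is a small sketch of my own. It is not taken from the course, and the names `scale`, `Shape`, and `Circle` are made up purely for illustration.

```python
import math

# Keyword parameter with a default value, plus a list comprehension
def scale(values, factor=2):
    return [v * factor for v in values]

# Lambda expression used as a sort key (shortest word first)
words = ["pandas", "numpy", "scikit-learn"]
by_length = sorted(words, key=lambda w: len(w))

# Class inheritance: Circle reuses describe() from Shape and supplies area()
class Shape:
    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

doubled = scale([1, 2, 3])            # uses the default factor
tripled = scale([1, 2, 3], factor=3)  # overrides it by keyword
print(doubled, tripled, by_length, Circle(1).describe())
```

If each line of this reads naturally to you by the end of the specialization, you are in good shape for the next step.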

You might also like to go through my Python Shorts posts while going through this specialization.

Python Shorts Posts


2) Applied Data Science with Python

Do first, understand later

We need to get a taste of Machine Learning before understanding it fully.

This specialization in Applied Data Science with Python gives an intro to many modern machine learning methods that you should know about. Not a thorough grinding, but you will get the tools to build your models.

This skills-based specialization is intended for learners who have a basic python or programming background, and want to apply statistical, machine learning, information visualization, text analysis, and social network analysis techniques through popular python toolkits such as pandas, matplotlib, scikit-learn, nltk, and networkx to gain insight into their data.

You might also like to go through a few of my posts while going through this specialization:


3) Machine Learning Theory and Fundamentals

After completing the courses above, you will gain the status of what I would like to call a “Beginner.”

Congrats!!! You know stuff; you know how to implement things.

You are Useful

Yet, you do not fully understand all the math and grind that goes behind all these models.

You need to understand what goes on behind clf.fit. It's time to face the music. Nobody is going to take you seriously until you understand the math behind your models.
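
As a rough idea of what a `fit` call can hide, here is a minimal sketch of linear regression trained by gradient descent in plain Python. `TinyLinearRegression` is a hypothetical class of mine, not the scikit-learn implementation (which uses different solvers), and the learning rate and iteration count are assumptions picked for this toy data.

```python
class TinyLinearRegression:
    """Hypothetical stand-in for a fit/predict style estimator: y ~ w * x + b."""

    def __init__(self, learning_rate=0.05, n_iterations=2000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.w = 0.0
        self.b = 0.0

    def fit(self, xs, ys):
        n = len(xs)
        for _ in range(self.n_iterations):
            # Prediction errors under the current parameters
            errors = [(self.w * x + self.b) - y for x, y in zip(xs, ys)]
            # Gradients of the mean squared error with respect to w and b
            grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
            grad_b = (2.0 / n) * sum(errors)
            # Take a small step against the gradient
            self.w -= self.learning_rate * grad_w
            self.b -= self.learning_rate * grad_b
        return self

    def predict(self, xs):
        return [self.w * x + self.b for x in xs]

# Toy data generated from y = 2x + 1, so we know what fit should recover
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
model = TinyLinearRegression().fit(xs, ys)
print(model.w, model.b)
```

The loop is the whole story here: compute errors, compute gradients, nudge the parameters, repeat. Understanding why that works is the math this section is about.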

If you don’t understand it you won’t be able to improve it

Here comes the Game Changer Machine Learning course. It contains the maths behind many of the Machine Learning algorithms.

I rank this as the one course you have to take: it motivated me to get into this field, and Andrew Ng is a great instructor. It was also the first course I took when I started.

This course has a little of everything — Regression, Classification, Anomaly Detection, Recommender systems, Neural networks, plus a lot of great advice.

You might also want to go through a few of my posts while going through this course:


4) Learn Statistical Inference

“Facts are stubborn things, but statistics are pliable.”― Mark Twain

Mine Çetinkaya-Rundel teaches this course on Inferential Statistics. And it cannot get simpler than this one.

She is a great instructor and explains the fundamentals of Statistical inference nicely — a must-take course.

You will learn about hypothesis testing, confidence intervals, and statistical inference methods for numerical and categorical data.
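
To make those two ideas concrete, here is a small sketch using only Python's standard library. The sample values are made up for illustration, and I am assuming the large-sample normal (z) approximation; the course itself covers when a t-based method is the better choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Made-up measurements, purely for illustration
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.0, 5.5, 5.3, 4.8, 5.4,
          5.7, 5.1, 5.2, 5.6, 5.0, 5.3, 5.4, 5.2, 5.5, 5.1,
          4.9, 5.3, 5.6, 5.2, 5.4, 5.0, 5.5, 5.3, 5.1, 5.2]

def z_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    standard_error = stdev(data) / sqrt(len(data))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(data)
    return center - z * standard_error, center + z * standard_error

def z_test_p_value(data, null_mean):
    """Two-sided p-value for H0: the true mean equals null_mean."""
    standard_error = stdev(data) / sqrt(len(data))
    z = (mean(data) - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

low, high = z_confidence_interval(sample)
p_value = z_test_p_value(sample, null_mean=5.0)
print(f"95% CI: ({low:.3f}, {high:.3f}), p-value vs 5.0: {p_value:.2g}")
```

A small p-value says the data would be surprising if the true mean were 5.0; the confidence interval gives the range of means that the data do not contradict.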

You might also want to go through a few of my posts while going through this specialization:


5) Learn SQL Basics for Data Science

SQL is the heart of all data ETL

While we feel much more accomplished by creating models and coming up with different hypotheses, the role of data munging can’t be overstated.

And with the ubiquity of SQL in ETL and data preparation tasks, everyone should know at least a little bit of it to be useful.

SQL has also become a de facto standard for working with Big Data tools like Apache Spark. This SQL specialization from UC Davis will teach you about SQL as well as how to use SQL for distributed computing.

From the Course website:

Through four progressively more difficult SQL projects with data science applications, you will cover topics such as SQL basics, data wrangling, SQL analysis, AB testing, distributed computing using Apache Spark, and more
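
For a small taste of the kind of SQL used in data preparation, here is a sketch that runs an aggregation against an in-memory SQLite database from Python. The `orders` table and its rows are invented for this example and have nothing to do with the course projects.

```python
import sqlite3

# In-memory database with a hypothetical orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 15.0),
     ("carol", 50.0), ("bob", 5.0)],
)

# Total spend per customer, keeping only customers who spent at least 40
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= 40
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

GROUP BY, HAVING, and ORDER BY cover a surprising share of everyday data wrangling, and the same query shape carries over to distributed SQL engines.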

You might also want to go through a few of my posts while going through this specialization:


6) Advanced Machine Learning

In the big leagues, there is no spoonfeeding.

You might not agree with this, but everything we have done so far has been spoonfed learning. The material was structured, and the math was minimal. But that has prepared you for the next steps.

This Advanced Machine Learning specialization by top Kaggle machine learning practitioners and CERN scientists takes another approach: it works through a lot of difficult concepts, showing how things were done in the past and covering the most recent advancements in the Machine Learning world. The description on the website says:

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice.

You might like to look at a few of my posts while trying to understand some of the material in this course.


7) Deep Learning

Deep Learning is the Future

Andrew Ng is back again with his new Deep Learning Specialization. And this is Pure Gold.

Andrew Ng has achieved mastery in explaining difficult concepts in an easy-to-understand way. The nomenclature he follows is different from all other tutorials and courses on the net, and I hope it catches on, as it is pretty helpful in understanding all the basic concepts.

From the specialization website:

Learn the foundations of Deep Learning, understand how to build neural networks, and learn how to lead successful machine learning projects. You will learn about Convolutional networks, RNNs, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization, and more. You will work on case studies from healthcare, autonomous driving, sign language reading, music generation, and natural language processing.

You might like to look at a few of my posts while trying to understand some of the material in this course.


8) PyTorch

Python on Fire

I usually don't advocate learning a particular tool, but here I do. PyTorch is incredible, and if you understand it, you will be able to read the code in a lot of recent research papers. PyTorch has become a default framework for researchers working in Deep Learning, and it will only pay off for us to learn it.

A structured way to learn PyTorch is by taking this course on Deep Neural Networks with PyTorch. From the course website:

The course will start with Pytorch’s tensors and Automatic differentiation package. Then each section will cover different models starting off with fundamentals such as Linear Regression, and logistic/softmax regression. Followed by Feedforward deep neural networks, the role of different activation functions, normalization and dropout layers. Then Convolutional Neural Networks and Transfer learning will be covered. Finally, several other Deep learning methods will be covered.

You might also look at this post of mine, Moving from Keras to Pytorch, where I try to explain how to work with PyTorch.


9) Getting Started with AWS for Machine Learning

The secret: it’s not what you know, it’s what you show.

There are a lot of things to consider while building a great machine learning system. But often it happens that we, as data scientists, only worry about certain parts of the project.

But do we ever think about how we will deploy our models once we have them?

I have seen a lot of ML projects, and a lot of them are doomed to fail because they don’t have a set plan for production from the outset.

Having a good platform and understanding how that platform deploys Machine Learning apps will make all the difference in the real world. This course on AWS for implementing Machine Learning applications promises just that.

This course will teach you:

1. How to build, train and deploy a model using Amazon SageMaker with built-in algorithms and Jupyter Notebook instance.
2. How to build intelligent applications using Amazon AI services like Amazon Comprehend, Amazon Rekognition, Amazon Translate and others.

You might also look at this post of mine, where I try to talk about apps and explain how to plan for Production.


10) Data Structures and Algorithms

Algorithms. Yes, you need them.

Algorithms and data structures are an integral part of data science. While most of us data scientists don’t take a proper algorithms course while studying, they are essential all the same.

Many companies ask about data structures and algorithms as part of their interview process for hiring data scientists.

These questions take the same zeal to crack as your Data Science interviews, so you might want to set aside some time to study algorithms and data structures.
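
For a sense of what such a question can look like, here is a sketch of a common warm-up problem: find two numbers in a list that add up to a target. The problem choice and the function name `two_sum` are my own illustration, not taken from any particular company or course.

```python
def two_sum(numbers, target):
    """Return indices (i, j) with i < j and numbers[i] + numbers[j] == target.

    Returns None when no such pair exists. A dictionary lets us do this in
    one pass instead of checking every pair.
    """
    seen = {}  # value -> index where it first appeared
    for j, value in enumerate(numbers):
        complement = target - value
        if complement in seen:
            return seen[complement], j
        seen.setdefault(value, j)
    return None

print(two_sum([2, 7, 11, 15], 9))
```

The interesting part in an interview is usually the trade-off: the dictionary spends extra memory to bring the running time down from quadratic to linear in the length of the list.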

One of the best resources I found to learn algorithms is the Algorithm Specialization on Coursera by UC San Diego. From the specialization website:

You will learn algorithmic techniques for solving various computational problems and will implement about 100 algorithmic coding problems in a programming language of your choice. No other online course in Algorithms even comes close to offering you a wealth of programming challenges that you may face at your next job interview.

You might also like to look at a few of my posts while trying to understand some of the material in this specialization.


Continue Learning

I am going to be writing more beginner-friendly posts in the future too. Follow me on Medium or subscribe to my blog to be informed about them. As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.

Also, a small disclaimer — There might be some affiliate links in this post to relevant resources, as sharing knowledge is never a bad idea.
