# Become a Data Scientist in 2020 with these 10 resources

## 1) Python 3 Programming Specialization

## 2) Applied Data Science with Python

## 3) Machine Learning Theory and Fundamentals

## 4) Learn Statistical Inference

## 5) Learn SQL Basics for Data Science

## 6) Advanced Machine Learning

## 7) Deep Learning

## 8) PyTorch

## 9) Getting Started with AWS for Machine Learning

## 10) Data Structures and Algorithms

## Continue Learning

I am a mechanical engineer by education, and I started my career with a core job in the steel industry.

With those heavy steel-reinforced gumboots and that plastic helmet, I ventured around big blast furnaces and rolling mills. Artificial safety measures, to say the least, as I knew that nothing would save me if something untoward happened. Maybe some running shoes would have helped. As for the helmet, I would just say that molten steel burns at 1370 degrees C.

As I realized, based on my constant fear, that job was not for me, so I made it my goal to move into the Analytics and Data Science space somewhere around 2011. Since then, MOOCs have been my go-to option for learning new things, and I ended up taking a lot of them. Good ones and bad ones.

Now in 2020, with the Data Science field changing so rapidly, there is no shortage of resources to learn data science. But that often poses a problem for a beginner: where to start, and what to learn? There are a lot of great resources on the internet, but that means there are a lot of bad ones too.

Too many choices often result in stagnation, as anxiety is not good when it comes to learning.

In his book, *The Paradox of Choice: Why More Is Less*, Schwartz argues that eliminating consumer choices can greatly reduce anxiety for shoppers. And the same remains true for Data Science courses as well.

**This post is about providing recommendations to lost souls with a lot of choices on where to start their Data Science Journey.**

“GoodBye World” for Python 2.7!!!

First, you need a programming language. This specialization from the University of Michigan is about learning to use Python and creating things on your own.

You will learn about programming fundamentals like variables, conditionals, and loops, and get to some intermediate material like keyword parameters, list comprehensions, lambda expressions, and class inheritance.
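To give a taste of those topics, here is a small sketch that touches each of them once. It is not taken from the specialization; every name in it (`describe`, `Animal`, `Dog`, and so on) is made up for illustration.

```python
# Illustrative only: all names below are invented for this sketch.

def describe(name, greeting="Hello", punctuation="!"):
    """Keyword parameters: callers can override defaults by name."""
    return f"{greeting}, {name}{punctuation}"

# Variables, a loop, and a conditional
squares_of_evens = []
for n in range(10):
    if n % 2 == 0:
        squares_of_evens.append(n * n)

# The same result written as a list comprehension
squares_comprehension = [n * n for n in range(10) if n % 2 == 0]

# A lambda expression used as a sort key
words = ["pandas", "numpy", "scikit-learn"]
by_length = sorted(words, key=lambda w: len(w))

# Class inheritance: Dog reuses Animal.__init__ and overrides speak()
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"

class Dog(Animal):
    def speak(self):
        return f"{self.name} barks"

print(describe("World", greeting="GoodBye"))  # GoodBye, World!
print(squares_of_evens)                       # [0, 4, 16, 36, 64]
print(by_length)                              # ['numpy', 'pandas', 'scikit-learn']
print(Dog("Rex").speak())                     # Rex barks
```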

You might also like to go through my Python Shorts posts while going through this specialization.

<strong>Python Shorts Posts</strong>

Do first, understand later

We need to get a taste of Machine Learning before understanding it fully.

This specialization in Applied Data Science with Python gives an intro to many modern machine learning methods that you should know about. Not a thorough grinding, but you will get the tools to build your models.

This skills-based specialization is intended for learners who have a basic Python or programming background, and want to apply statistical, machine learning, information visualization, text analysis, and social network analysis techniques through popular Python toolkits such as pandas, matplotlib, scikit-learn, nltk, and networkx to gain insight into their data.

You might also like to go through a few of my posts while going through this specialization:

<strong>Python’s One Liner graph creation library with animations Hans Rosling Style</strong>

<strong>3 Awesome Visualization Techniques for every dataset</strong>

After completing the courses above, you will gain the status of what I would like to call a **“Beginner.”**

Congrats!!! *You know stuff; you know how to implement things.*

You are Useful

Yet, you do not fully understand all the math and grind that goes behind all these models.

You need to understand what goes on behind the `clf.fit`. It's time to face the music. Nobody is going to take you seriously till you understand the math behind your models.
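As a rough idea of what a `fit` call can hide, here is a minimal sketch of fitting a straight line with batch gradient descent on the mean squared error. It is an assumption-laden toy: the function name `fit_line`, the learning rate, and the data are all made up, and real libraries generally use more sophisticated solvers than this loop.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error.

    Hypothetical helper for illustration; not any library's actual fit().
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y       # prediction minus target
            grad_w += 2 * error * x / n   # d(MSE)/dw
            grad_b += 2 * error / n       # d(MSE)/db
        # Step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1, so the fit should recover w ~ 2, b ~ 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

If the learning rate is set too high, the updates overshoot and the parameters diverge, which is exactly the kind of thing you can only debug once you know what the loop is doing.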

If you don’t understand it you won’t be able to improve it

Here comes the game changer: the Machine Learning course. It covers the math behind many of the Machine Learning algorithms.

I will put this course as the *one course you have to take*, as this course motivated me to get into this field, and Andrew Ng is a great instructor. Also, this was the first course that I took myself when I started.

This course has a little of everything — Regression, Classification, Anomaly Detection, Recommender systems, Neural networks, plus a lot of great advice.

You might also want to go through a few of my posts while going through this course:

<strong>The Hitchhiker’s Guide to Feature Extraction</strong>

<strong>The 5 Classification Evaluation metrics every Data Scientist must know</strong>

<strong>The 5 Feature Selection Algorithms every Data Scientist should know</strong>

<strong>The Simple Math behind 3 Decision Tree Splitting criterions</strong>

“Facts are stubborn things, but statistics are pliable.”― Mark Twain

Mine Çetinkaya-Rundel teaches this course on Inferential Statistics. And it cannot get simpler than this one.

She is a great instructor and explains the fundamentals of Statistical inference nicely — a must-take course.

You will learn about hypothesis testing, confidence intervals, and statistical inference methods for numerical and categorical data.
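For a concrete feel of one of those ideas, here is a minimal sketch of a confidence interval for a mean, using only the standard library. It assumes a normal approximation (for small samples a t distribution is usually the better choice), and the function name and sample values are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation).

    Hypothetical helper; the sample below is made-up data.
    """
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # z is the critical value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * standard_error, m + z * standard_error

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Asking for a higher confidence level widens the interval, which is the trade-off the course spends time building intuition for.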

You might also want to go through a few of my posts while going through this specialization:

- <strong>P-value Explained Simply for Data Scientists</strong>
- <strong>Confidence Intervals Explained Simply for Data Scientists</strong>

SQL is the heart of all data ETL

While we feel much more accomplished by creating models and coming up with different hypotheses, the role of data munging can’t be overstated.

And with the ubiquity of SQL when it comes to ETL and data preparation tasks, everyone should know a little bit of it to at least be useful.

SQL has also become a de facto standard for working with Big Data tools like Apache Spark. This SQL specialization from UC Davis will teach you about SQL as well as how to use SQL for distributed computing.

From the Course website:

Through four progressively more difficult SQL projects with data science applications, you will cover topics such as SQL basics, data wrangling, SQL analysis, A/B testing, distributed computing using Apache Spark, and more.
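To show the kind of query this is about, here is a small sketch that runs plain SQL from Python using the built-in `sqlite3` module and an in-memory database. The `visits` table, its columns, and the rows are all made up; the query computes a per-variant conversion rate, as one might in a simple A/B test.

```python
import sqlite3

# In-memory database with a hypothetical table of A/B test visits
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (user_id INTEGER, variant TEXT, converted INTEGER)"
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, "A", 0), (2, "A", 1), (3, "B", 1), (4, "B", 1), (5, "A", 0), (6, "B", 0)],
)

# Aggregate per variant: number of visitors and share that converted
rows = conn.execute(
    """
    SELECT variant,
           COUNT(*)       AS visitors,
           AVG(converted) AS conversion_rate
    FROM visits
    GROUP BY variant
    ORDER BY variant
    """
).fetchall()
conn.close()

for variant, visitors, rate in rows:
    print(variant, visitors, round(rate, 3))
```

The same `SELECT ... GROUP BY` pattern carries over to most SQL engines, which is a large part of why a little SQL goes a long way.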

You might also want to go through a few of my posts while going through this specialization:

- <strong>Learning SQL the Hard Way</strong>
- <strong>The Hitchhikers guide to handle Big Data using Spark</strong>
- <strong>5 Ways to add a new column in a PySpark Dataframe</strong>

In the big leagues, there is no spoonfeeding.

You might not agree with this, but until now, whatever we have done has been spoon-fed learning. The material was structured, and the math has been minimal. But that has prepared you for the next steps. This Advanced Machine Learning specialization by top Kaggle machine learning practitioners and CERN scientists takes another approach: it works through a lot of difficult concepts, guiding you through how things worked in the past and the most recent advancements in the machine learning world. The description on the website says:

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice.

You might like to look at a few of my posts while trying to understand some of the material in this course.

Deep Learning is the Future

Andrew Ng is back again with his new Deep Learning Specialization. And this is pure gold.

Andrew Ng has achieved mastery in explaining difficult concepts in an easy-to-understand way. The nomenclature he follows is different from all other tutorials and courses on the net, and I hope it catches on, as it is pretty helpful in understanding all the basic concepts.

From the specialization website:

Learn the foundations of Deep Learning, understand how to build neural networks, and learn how to lead successful machine learning projects. You will learn about Convolutional networks, RNNs, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization, and more. You will work on case studies from healthcare, autonomous driving, sign language reading, music generation, and natural language processing.

You might like to look at a few of my posts while trying to understand some of the material in this course.

- <strong>An End to End Introduction to GANs</strong>
- <strong>Object Detection using Deep Learning Approaches: An End to End Theoretical Perspective</strong>

Python on Fire

I rarely advocate learning a specific tool, but here I do. The reason is that it is incredible and, seriously, you will be able to read the code in a lot of recent research papers if you understand PyTorch. PyTorch has become a default framework for researchers working in Deep Learning, and it will only pay for us to learn it.

A structured way to learn PyTorch is by taking this course on Deep Neural Networks with PyTorch. From the course website:

The course will start with Pytorch’s tensors and Automatic differentiation package. Then each section will cover different models starting off with fundamentals such as Linear Regression, and logistic/softmax regression. Followed by Feedforward deep neural networks, the role of different activation functions, normalization and dropout layers. Then Convolutional Neural Networks and Transfer learning will be covered. Finally, several other Deep learning methods will be covered.
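To make "automatic differentiation" less abstract, here is a toy sketch of the idea in plain Python. This is not PyTorch's API: the `Value` class and everything in it is hypothetical, written only to show how a library can record operations and then walk them backwards to compute gradients.

```python
class Value:
    """A scalar that remembers how it was computed (toy example, not PyTorch)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Order nodes so each one is processed after everything that uses it
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()

# y = w * x + b, so dy/dw = x, dy/dx = w, dy/db = 1
x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
print(y.data, w.grad, x.grad, b.grad)  # 7.0 3.0 2.0 1.0
```

A real framework does the same bookkeeping over tensors rather than single numbers, and with many more operations, which is why learning one well pays off.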

You might also look at this post of mine, where I try to explain how to work with PyTorch.

The secret: it’s not what you know, it’s what you show.

There are a lot of things to consider while building a great machine learning system. But often it happens that we, as data scientists, only worry about certain parts of the project.

**But do we ever think about how we will deploy our models once we have them?**

I have seen a lot of ML projects, and a lot of them are doomed to fail as they don’t have a set plan for production from the outset.

Having a good platform and understanding how that platform deploys machine learning apps will make all the difference in the real world. This course on AWS for implementing Machine Learning applications promises just that.

This course will teach you:

- How to build, train and deploy a model using Amazon SageMaker with built-in algorithms and Jupyter Notebook instance.
- How to build intelligent applications using Amazon AI services like Amazon Comprehend, Amazon Rekognition, Amazon Translate and others.

You might also look at these posts of mine, where I talk about apps and explain how to plan for production.

- <strong>How to write Web apps using simple Python for Data Scientists?</strong>
- <strong>How to Deploy a Streamlit App using an Amazon Free ec2 instance?</strong>
- <strong>Take your Machine Learning Models to Production with these 5 simple steps</strong>

Algorithms. Yes, you need them.

Algorithms and data structures are an integral part of data science. While most of us data scientists don’t take a proper algorithms course while studying, they are essential all the same.

Many companies ask about data structures and algorithms as part of their interview process for hiring data scientists.

These questions will require the same zeal to crack as your Data Science interviews, and thus, you might want to set aside some time to study algorithms and data structures.

**One of the best resources I found to learn algorithms is the Algorithm Specialization on Coursera by UC San Diego.** From the specialization website:

You will learn algorithmic techniques for solving various computational problems and will implement about 100 algorithmic coding problems in a programming language of your choice. No other online course in Algorithms even comes close to offering you a wealth of programming challenges that you may face at your next job interview.

You might also like to look at a few of my posts while trying to understand some of the material in this specialization.

- <strong>3 Programming concepts for Data Scientists</strong>
- <strong>A simple introduction to Linked Lists for Data Scientists</strong>
- <strong>Dynamic Programming for Data Scientists</strong>
- <strong>Handling Trees in Data Science Algorithmic Interview</strong>

I am going to be writing more beginner-friendly posts in the future too. Follow me on Medium or subscribe to my blog.

Also, a small disclaimer — There might be some affiliate links in this post to relevant resources, as sharing knowledge is never a bad idea.
