Spark - MLWhiz

Deep Learning Natural Language Processing Awesome Guides Chat Gpt Series Everything Programmers need to learn about GPT — Using OpenAI and Understanding Prompting

By Rahul Agarwal 13 August 2023

ChatGPT’s free conversational interface offers a tantalizing glimpse into the future of AI.

Deep Learning Natural Language Processing Awesome Guides Chat Gpt Series Everything you need to learn about GPT — How Does ChatGPT Work?

By Rahul Agarwal 07 July 2023

ChatGPT is what everyone is talking about nowadays. Will it take all the jobs?

Opinion How should I start Reading code?

By Rahul Agarwal 27 November 2022

Or how to get better at hacking? Reading code is a hard skill to inculcate.

Bash How to Set Environment Variables in Linux?

By Rahul Agarwal 27 November 2022

Or how to use the export command. The Linux shell has become a constant part of every ML engineer’s, data scientist’s, and programmer’s life.

Awesome Guides The Primer on Asyncio that I Wish I Had

By Rahul Agarwal 26 November 2022

Parallelism and concurrency aren’t the same things. In some cases, concurrency is much more powerful.

Deep Learning Natural Language Processing Awesome Guides Explaining BERT Simply Using Sketches

By Rahul Agarwal 24 July 2021

In my last series of posts on Transformers, I talked about how a transformer works and how to implement one yourself for a translation task.

Deep Learning Natural Language Processing Awesome Guides How Can Data Scientists Use Parallel Processing?

By Rahul Agarwal 24 July 2021

Finally, my program is running! Should I go and get a coffee?

Programming Awesome Guides Solve almost every Binary Search Problem

By Rahul Agarwal 24 July 2021

Algorithms are an integral part of data science. While most of us data scientists don’t take a proper algorithms course while studying, they are important all the same.

Learning Resources Free Coursera Courses for Learners in India

By Rahul Agarwal 02 June 2021

In one of my previous posts, I talked about how to become a data scientist using some awesome resources from Coursera.

Data Science A Layman’s guide to ROC Curves And AUC

By Rahul Agarwal 03 February 2021

ROC curves, or receiver operating characteristic curves, are one of the most common evaluation metrics for checking a classification model’s performance.

Data Science Programming Use Iterators, Generators, and Generator Expressions

By Rahul Agarwal 28 November 2020

Python in many ways has made our life easier when it comes to programming.

Data Science Programming Object Oriented Programming Explained Simply for Data Scientists

By Rahul Agarwal 24 November 2020

Object-Oriented Programming or OOP can be a tough concept to understand for beginners.

Programming How to Look like a 10x developer

By Rahul Agarwal 13 November 2020

I love working with shell commands. They are fast and provide a ton of flexibility to do ad-hoc things.

Programming Learning Resources How I cracked my MLE interview at Facebook

By Rahul Agarwal 30 October 2020

It was August last year and I was in the process of giving interviews.

Data Science The Use of Machine Learning in Predicting Sports Match Outcomes

By Rahul Agarwal 22 October 2020

Out of all spheres of technological advancement, artificial intelligence has always attracted the most attention from the general public.

Programming Awesome Guides Understanding Transformers, the Programming Way

By Rahul Agarwal 10 October 2020

Transformers have become the de facto standard for NLP tasks nowadays. They started out in NLP, but they are now being used in computer vision and sometimes to generate music as well.

Deep Learning Computer Vision How I Created a Dataset for Instance Segmentation from Scratch?

By Rahul Agarwal 04 October 2020

Recently, I was looking for a toy dataset for my new book’s chapter (you can subscribe to the updates here) on instance segmentation.

Deep Learning Natural Language Processing Awesome Guides Understanding Transformers, the Data Science Way

By Rahul Agarwal 20 September 2020

Transformers have become the de facto standard for NLP tasks nowadays.

Deep Learning Natural Language Processing Computer Vision Awesome Guides The Most Complete Guide to PyTorch for Data Scientists

By Rahul Agarwal 08 September 2020

PyTorch has sort of become one of the de facto standards for creating neural networks now, and I love its interface.

Data Science Awesome Guides Create an Awesome Development Setup for Data Science using Atom

By Rahul Agarwal 02 September 2020

Before I even begin this article, let me just say that I love iPython Notebooks, and Atom is not an alternative to Jupyter in any way.

Deep Learning Computer Vision Awesome Guides A Layman’s Introduction to GANs for Data Scientists using PyTorch

By Rahul Agarwal 27 August 2020

Most of us in data science have seen a lot of AI-generated people in recent times, whether in papers, blogs, or videos.

Data Science Learning Resources Become an ML Engineer with these courses from Amazon and Google

By Rahul Agarwal 27 August 2020

With ML Engineer roles in vogue and a lot of people preparing for them, my readers often ask me to recommend courses specifically for ML engineer roles rather than for data science roles.

Data Science How to use SQL with Pandas?

By Rahul Agarwal 27 August 2020

Pandas is one of the best data manipulation libraries in recent times.

Data Science 5 Essential Business-Oriented Critical Thinking Skills For Data Scientists

By Rahul Agarwal 12 August 2020

As Alexander Pope said, to err is human. By that metric, who is more human than us data scientists?

Deep Learning Computer Vision Creating my First Deep Learning + Data Science Workstation

By Rahul Agarwal 09 August 2020

Creating my workstation has been a dream for me, if nothing else.

Big Data Data Science Accelerating Spark 3.0 Google DataProc Project with NVIDIA GPUs in 6 simple steps

By Rahul Agarwal 04 August 2020

Data Exploration is a key part of Data Science. And does it take long?

Programming Computer Vision Awesome Guides Deployment could be easy — A Data Scientist’s Guide to deploy an Image detection FastAPI API using Amazon ec2

By Rahul Agarwal 04 August 2020

Just recently, I had written a simple tutorial on FastAPI, which was about simplifying and understanding how APIs work, and creating a simple API using the framework.

Natural Language Processing Deep Learning Computer Vision Awesome Guides A definitive guide for Setting up a Deep Learning Workstation with Ubuntu

By Rahul Agarwal 24 June 2020

Creating my own workstation has been a dream for me if nothing else.

Awesome Guides Data Science A Layman’s Guide for Data Scientists to create APIs in minutes

By Rahul Agarwal 24 June 2020

Have you ever been in a situation where you want to provide your model predictions to a frontend developer without them having access to model related code?

Deep Learning Computer Vision Awesome Guides End to End Pipeline for setting up Multiclass Image Classification for Data Scientists

By Rahul Agarwal 24 June 2020

Have you ever wondered how Facebook takes care of the abusive and inappropriate images shared by some of its users?

Deep Learning Data Science How to run your ML model Predictions 50 times faster?

By Rahul Agarwal 24 June 2020

With the advent of so many computing and serving frameworks, it is getting more stressful by the day for developers to put a model into production.

Big Data Data Science Awesome Guides The Most Complete Guide to pySpark DataFrames

By Rahul Agarwal 24 June 2020

Big Data has become synonymous with data engineering. But the line between data engineers and data scientists is blurring day by day.

Data Science Don’t Democratize Data Science

By Rahul Agarwal 25 May 2020

Every few years, some academic and professional field gets a lot of cachet in the popular imagination.

Data Science Five Cognitive Biases In Data Science (And how to avoid them)

By Rahul Agarwal 25 May 2020

Recently, I was reading Rolf Dobelli’s The Art of Thinking Clearly, which made me think about cognitive biases in a way I never had before.

Deep Learning Stop Worrying and Create your Deep Learning Server in 30 minutes

By Rahul Agarwal 25 May 2020

I have found myself creating a Deep Learning Machine time and time again whenever I start a new project.

Programming How and Why to use f strings in Python3?

By Rahul Agarwal 24 May 2020

Python provides us with many styles of coding. And with time, Python has regularly come up with new coding standards and tools that adhere even more to the coding standards in the Zen of Python.

Natural Language Processing Deep Learning Awesome Guides Using Deep Learning for End to End Multiclass Text Classification

By Rahul Agarwal 24 May 2020

Have you ever thought about how toxic comments get flagged automatically on platforms like Quora or Reddit?

Awesome Guides Data Science A Newspaper for COVID-19 — The CoronaTimes

By Rahul Agarwal 29 March 2020

It seems that the way that I consume information has changed a lot.

Learning Resources 5 Online Courses you can take for free during COVID-19 Epidemic

By Rahul Agarwal 27 March 2020

With Coronavirus on the prowl, there has been a huge demand across the world for MOOCs as schools and universities continue to shut down.

Data Science Deep Learning Can AI help in fighting against Corona?

By Rahul Agarwal 25 March 2020

Feeling Helpless? I know I am. With the whole shutdown situation, what I thought was once a paradise for my introvert self doesn’t look so good when it is actually happening.

Big Data Data Science Awesome Guides Practical Spark Tips for Data Scientists

By Rahul Agarwal 20 March 2020

I know — Spark is sometimes frustrating to work with.

Learning Resources Data Science 5 tips for getting your first Data Science job in 2020

By Rahul Agarwal 24 February 2020

Many of my followers ask me — How difficult is it to get a job in the Data Science field?

Big Data Data Science Awesome Guides 5 Ways to add a new column in a PySpark Dataframe

By Rahul Agarwal 24 February 2020

Too much data is getting generated day by day. Although sometimes we can manage our big data using tools like Rapids or parallelization, Spark is an excellent tool to have in your repertoire if you are working with terabytes of data.

Data Science Bamboolib — Learn and use Pandas without Coding

By Rahul Agarwal 23 February 2020

Have you ever been frustrated by doing data exploration and manipulation with Pandas?

Data Science Programming Lightning Fast XGBoost on Multiple GPUs

By Rahul Agarwal 23 February 2020

XGBoost is one of the most used libraries for data science.

Data Science Share your Projects even more easily with this New Streamlit Feature

By Rahul Agarwal 23 February 2020

A Machine Learning project is never really complete if we don’t have a good way to showcase it.

Big Data Data Science 100x faster Hyperparameter Search Framework with Pyspark

By Rahul Agarwal 22 February 2020

Recently I was working on tuning hyperparameters for a huge Machine Learning model.

Data Science How to Deploy a Streamlit App using an Amazon Free ec2 instance?

By Rahul Agarwal 22 February 2020

A Machine Learning project is never really complete if we don’t have a good way to showcase it.

Data Science Minimal Pandas Subset for Data Scientists on GPU

By Rahul Agarwal 22 February 2020

Data manipulation is a breeze with pandas, and it has become such a standard for it that a lot of parallelization libraries like Rapids and Dask are being created in line with Pandas syntax.

Data Science Awesome Guides Confidence Intervals Explained Simply for Data Scientists

By Rahul Agarwal 21 February 2020

Recently, I got asked about how to explain confidence intervals in simple terms to a layperson.

Data Science Awesome Guides Learning SQL the Hard Way

By Rahul Agarwal 21 February 2020

A data scientist who doesn’t know SQL is not worth their salt.

Data Science Add this single word to make your Pandas Apply faster

By Rahul Agarwal 20 February 2020

We as data scientists have got laptops with quad-core, octa-core, turbo-boost.

Data Science Programming Handling Trees in Data Science Algorithmic Interview

By Rahul Agarwal 29 January 2020

Algorithms and data structures are an integral part of data science.

Data Science Programming A simple introduction to Linked Lists for Data Scientists

By Rahul Agarwal 28 January 2020

Algorithms and data structures are an integral part of data science.

Data Science Programming Dynamic Programming for Data Scientists

By Rahul Agarwal 28 January 2020

Algorithms and data structures are an integral part of data science.

Data Science Awesome Guides The 5 most useful Techniques to Handle Imbalanced datasets

By Rahul Agarwal 28 January 2020

Have you ever faced an issue where you have such a small sample for the positive class in your dataset that the model is unable to learn?

3 Industries That Benefit from Data Science

By Rahul Agarwal 15 January 2020

Collecting and analysing data, including but not limited to text, images, and video formats, is a huge part of various industries.

Data Science Using Gradient Boosting for Time Series prediction tasks

By Rahul Agarwal 28 December 2019

Time series prediction problems are pretty frequent in the retail domain.

Data Science Awesome Guides Take your Machine Learning Models to Production with these 5 simple steps

By Rahul Agarwal 25 December 2019

Creating a great machine learning system is an art.

Learning Resources Data Science 3 Mistakes you should not make in a Data Science Interview

By Rahul Agarwal 24 December 2019

People often ask me how to land a data science job.

Programming Data Science 3 Programming concepts for Data Scientists

By Rahul Agarwal 09 December 2019

Algorithms are an integral part of data science. While most of us data scientists don’t take a proper algorithms course while studying, they are important all the same.

Awesome Guides Data Science How to write Web apps using simple Python for Data Scientists?

By Rahul Agarwal 07 December 2019

A Machine Learning project is never really complete if we don’t have a good way to showcase it.

Deep Learning Computer Vision Awesome Guides Demystifying Object Detection and Instance Segmentation for Data Scientists

By Rahul Agarwal 05 December 2019

I like deep learning a lot but Object Detection is something that doesn’t come easily to me.

Data Science How to find Feature importances for BlackBox Models?

By Rahul Agarwal 04 December 2019

Data Science is the study of algorithms. I grapple with many algorithms on a day-to-day basis, so I thought of listing some of the most common and most used algorithms one will end up using in this new DS Algorithm series.

Data Science Awesome Guides The Simple Math behind 3 Decision Tree Splitting criterions

By Rahul Agarwal 12 November 2019

Decision Trees are great and are useful for a variety of tasks.

Data Science Awesome Guides P-value Explained Simply for Data Scientists

By Rahul Agarwal 11 November 2019

Recently, I got asked about how to explain p-values in simple terms to a layperson.

Data Science Natural Language Processing Adding Interpretability to Multiclass Text Classification models

By Rahul Agarwal 08 November 2019

Explain Like I am 5. It is a basic tenet of learning for me, where I try to distill any concept into a more palatable form.

Awesome Guides Data Science The 5 Classification Evaluation metrics every Data Scientist must know

By Rahul Agarwal 07 November 2019

What do we want to optimize for? Most businesses fail to answer this simple question.

Data Science 4 Graph Algorithms on Steroids for data Scientists with cuGraph

By Rahul Agarwal 06 November 2019

We, as data scientists, have gotten quite comfortable with Pandas, SQL, or any other relational database.

Data Science Automate Hyperparameter Tuning for your models

By Rahul Agarwal 10 October 2019

When we create our machine learning models, a common task that falls on us is how to tune them.

Data Science 6 Important Steps to build a Machine Learning System

By Rahul Agarwal 26 September 2019

Creating a great machine learning system is an art. There are a lot of things to consider while building a great machine learning system.

Data Science A Generative Approach to Classification

By Rahul Agarwal 23 September 2019

I always get confused whenever someone talks about generative vs. discriminative classification models.

Data Science Awesome Guides The Ultimate Guide to using the Python regex module

By Rahul Agarwal 01 September 2019

One of the main tasks while working with text data is to create a lot of text-based features.

Data Science Awesome Guides How did I learn Data Science?

By Rahul Agarwal 12 August 2019

I am a Mechanical engineer by education. And I started my career with a core job in the steel industry.

Data Science Awesome Guides The 5 Feature Selection Algorithms every Data Scientist should know

By Rahul Agarwal 07 August 2019

Data Science Awesome Guides The 5 Sampling Algorithms every Data Scientist need to know

By Rahul Agarwal 30 July 2019

Data Science is the study of algorithms. I grapple with many algorithms on a day-to-day basis, so I thought of listing some of the most common and most used algorithms one will end up using in this new DS Algorithm series.

Data Science Bayesian Bandits explained simply

By Rahul Agarwal 21 July 2019

Exploration and Exploitation play a key role in any business.

Awesome Guides Data Science Minimal Pandas Subset for Data Scientists

By Rahul Agarwal 20 July 2019

Pandas is a vast library. Data manipulation is a breeze with pandas, and it has become such a standard for it that a lot of parallelization libraries like Rapids and Dask are being created in line with Pandas syntax.

Big Data Data Science Awesome Guides The Hitchhikers guide to handle Big Data using Spark

By Rahul Agarwal 07 July 2019

Big Data has become synonymous with data engineering. But the line between data engineers and data scientists is blurring day by day.

Data Science 3 Great Additions for your Jupyter Notebooks

By Rahul Agarwal 28 June 2019

I love Jupyter notebooks and the power they provide.

Deep Learning Computer Vision Awesome Guides An End to End Introduction to GANs using Keras

By Rahul Agarwal 17 June 2019

I bet most of us have seen a lot of AI-generated faces in recent times, be it in papers or blogs.

Data Science Awesome Guides The Hitchhiker’s Guide to Feature Extraction

By Rahul Agarwal 19 May 2019

Good Features are the backbone of any machine learning model.

Programming Data Science A primer on *args, **kwargs, decorators for Data Scientists

By Rahul Agarwal 14 May 2019

Python has a lot of constructs that are reasonably easy to learn and use in our code.

Data Science Python’s One Liner graph creation library with animations Hans Rosling Style

By Rahul Agarwal 05 May 2019

I distinctly remember the time when Seaborn came out. I was really fed up with Matplotlib.

Programming Data Science Make your own Super Pandas using Multiproc

By Rahul Agarwal 02 May 2019

Parallelization is awesome. We data scientists have got laptops with quad-core, octa-core, turbo-boost.

Programming Data Science Minimize for loop usage in Python

By Rahul Agarwal 23 April 2019

Python provides us with many styles of coding. In a way, it is pretty inclusive.

Data Science Programming Python Pro Tip: Start using Python defaultdict and Counter in place of dictionary

By Rahul Agarwal 22 April 2019

Learning a language is easy. Whenever I start with a new language, I focus on a few things in the order below, and it is a breeze to get started with writing code in any language.

Data Science Awesome Guides 3 Awesome Visualization Techniques for every dataset

By Rahul Agarwal 19 April 2019

Visualizations are awesome. However, a good visualization is annoyingly hard to make.

Data Science Why Sublime Text for Data Science is Hotter than Jennifer Lawrence?

By Rahul Agarwal 31 March 2019

Just Kidding, Nothing is hotter than Jennifer Lawrence. But as you are here, let’s proceed.

Natural Language Processing Deep Learning Awesome Guides NLP Learning Series: Part 4 - Transfer Learning Intuition for Text Classification

By Rahul Agarwal 30 March 2019

This post is the fourth post of the NLP Text classification series.

Natural Language Processing Deep Learning Awesome Guides NLP Learning Series: Part 3 - Attention, CNN and what not for Text Classification

By Rahul Agarwal 09 March 2019

This post is the third post of the NLP Text classification series.

Natural Language Processing Deep Learning Awesome Guides What my first Silver Medal taught me about Text Classification and Kaggle in general?

By Rahul Agarwal 19 February 2019

Kaggle is an excellent place for learning. And I learned a lot of things from the recently concluded competition on Quora Insincere questions classification in which I got a rank of 182/4037.

Natural Language Processing Deep Learning Awesome Guides NLP Learning Series: Part 2 - Conventional Methods for Text Classification

By Rahul Agarwal 08 February 2019

This is the second post of the NLP Text classification series.

Natural Language Processing Deep Learning Awesome Guides NLP Learning Series: Part 1 - Text Preprocessing Methods for Deep Learning

By Rahul Agarwal 17 January 2019

Recently, I started up with an NLP competition on Kaggle called Quora Question insincerity challenge.

Natural Language Processing Deep Learning Computer Vision Awesome Guides A Layman guide to moving from Keras to Pytorch

By Rahul Agarwal 06 January 2019

Recently I started a text classification competition on Kaggle, and as a part of the competition, I had to somehow move to PyTorch to get deterministic results.

Natural Language Processing Deep Learning Awesome Guides What Kagglers are using for Text Classification

By Rahul Agarwal 17 December 2018

With the problem of image classification more or less solved by deep learning, text classification is the next developing theme in deep learning.

Data Science Big Data To all Data Scientists - The one Graph Algorithm you need to know

By Rahul Agarwal 07 December 2018

Graphs provide us with a very useful data structure. They can help us to find structure within our data.

Deep Learning Computer Vision Object Detection: An End to End Theoretical Perspective

By Rahul Agarwal 22 September 2018

We all know about the image classification problem. Given an image, can you find out the class the image belongs to?

Data Science Awesome Guides Using XGBoost for time series prediction tasks

By Rahul Agarwal 26 December 2017

Recently Kaggle master Kazanova along with some of his friends released a “How to win a data science competition” Coursera course.

Data Science The story of every distribution - Discrete Distributions

By Rahul Agarwal 14 September 2017

Distributions play an important role in the life of every Statistician.

Data Science Maths Beats Intuition probably every damn time

By Rahul Agarwal 16 April 2017

Einstein once said that “God does not play dice with the universe”.

Natural Language Processing Deep Learning Today I Learned This Part I: What are word2vec Embeddings?

By Rahul Agarwal 09 April 2017

Recently Quora put out a Question similarity competition on Kaggle. This was my first attempt at an NLP problem, so there was a lot to learn.

Data Science Learning Resources Top Data Science Resources on the Internet right now

By Rahul Agarwal 26 March 2017

I have been looking to create this list for a while now.

Data Science Basics Of Linear Regression

By Rahul Agarwal 23 March 2017

Today we will look into the basics of linear regression. Here we go:

Data Science Top advice for a Data Scientist

By Rahul Agarwal 05 March 2017

A data scientist needs to be critical and always on the lookout for something that others miss.

Data Science Machine Learning Algorithms for Data Scientists

By Rahul Agarwal 05 February 2017

As a data scientist I believe that a lot of work has to be done before Classification/Regression/Clustering methods are applied to the data you get.

Things to see while buying a Mutual Fund

By Rahul Agarwal 24 December 2016

This post deviates from the pattern of the blogs I have written so far, but I found that finance also uses a lot of statistics.

Data Science Pandas For All - Some Basic Pandas Functions

By Rahul Agarwal 27 October 2016

I have been working with Pandas for quite a few days now, and I feel I have gotten quite good at it.

Data Science Deploying ML Apps using Python and Flask- Learning about Flask

By Rahul Agarwal 10 January 2016

It has been a long time since I wrote anything on my blog.

Data Science Shell Basics every Data Scientist Should know - Part II(AWK)

By Rahul Agarwal 11 October 2015

Yesterday I got introduced to awk programming on the shell, and is it cool!

Data Science Shell Basics every Data Scientist Should know -Part I

By Rahul Agarwal 09 October 2015

Shell commands are powerful. Life would be hell without the shell, as I like to say (and that is probably the reason I dislike Windows).

Data Science Awesome Guides Create basic graph visualizations with SeaBorn- The Most Awesome Python Library For Visualization yet

By Rahul Agarwal 13 September 2015

When it comes to data preparation and getting acquainted with data, the one step we normally skip is the data visualization.

Big Data Data Science Awesome Guides Learning Spark using Python: Basics and Applications

By Rahul Agarwal 07 September 2015

I generally have a use case for Hadoop in my daily job.

Data Science Awesome Guides Behold the power of MCMC

By Rahul Agarwal 21 August 2015

Last time I wrote an article on MCMC and how it could be useful.

Data Science Awesome Guides My Tryst With MCMC Algorithms

By Rahul Agarwal 19 August 2015

The things that I find hard to understand push me to my limits.

Big Data Data Science Hadoop Mapreduce Streaming Tricks and Techniques

By Rahul Agarwal 09 May 2015

I have been using Hadoop a lot nowadays and thought I would write about some of the novel techniques that a user could use to get the most out of the Hadoop ecosystem.

Data Science Exploring Vowpal Wabbit with the Avazu Clickthrough Prediction Challenge

By Rahul Agarwal 01 December 2014

In online advertising, click-through rate (CTR) is a very important metric for evaluating ad performance.

Data Science Data Science 101 : Playing with Scraping in Python

By Rahul Agarwal 02 October 2014

This is a simple illustration of using the Pattern module to scrape web data using Python.

Data Science Dictvectorizer for One Hot Encoding of Categorical Data

By Rahul Agarwal 30 September 2014

THE PROBLEM: Recently I was working on the Criteo Advertising Competition on Kaggle.

Data Science Learning pyspark – Installation – Part 1

By Rahul Agarwal 28 September 2014

This is part one of a learning series on pyspark, the Python binding to Spark, which is written in Scala.

Big Data Machine Learning Hadoop, Mapreduce and More – Part 1

By Rahul Agarwal 27 September 2014

I have been putting off learning Hadoop for some time.
