MLWhiz | AI Unwrapped

MLWhiz | AI Unwrapped

Share this post

MLWhiz | AI Unwrapped
MLWhiz | AI Unwrapped
5 Ways to add a new column in a PySpark Dataframe
Copy link
Facebook
Email
Notes
More

5 Ways to add a new column in a PySpark Dataframe

Rahul Agarwal's avatar
Rahul Agarwal
Feb 24, 2020
∙ Paid

Share this post

MLWhiz | AI Unwrapped
MLWhiz | AI Unwrapped
5 Ways to add a new column in a PySpark Dataframe
Copy link
Facebook
Email
Notes
More
Share
5 Ways to add a new column in a PySpark Dataframe

Too much data is getting generated day by day.

Although sometimes we can manage our big data using tools like Rapids or Parallelization , Spark is an excellent tool to have in your repertoire if you are working with Terabytes of data.

In my last post on Spark, I explained how to work with PySpark RDDs and Dataframes.

Although this post explains a lot on how to work with RDDs and basic Dataframe operations, I missed quite a lot when it comes to working with PySpark Dataframes.

And it is only when I required more functionality that I read up and came up with multiple solutions to do one single thing.

How to create a new column in spark?

Now, this might sound trivial, but believe me, it isn’t. With so much you might want to do with your data, I am pretty sure you will end up using most of these column creation processes in your workflow. Sometimes to utilize Pandas functionality, or occasionally to use RDDs based partitioning or sometimes to make use of the mature python ecosystem.

This post is…

Keep reading with a 7-day free trial

Subscribe to MLWhiz | AI Unwrapped to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Rahul Agarwal
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More