Shell Basics Every Data Scientist Should Know - Part II (AWK)

Oct 11, 2015


Yesterday I was introduced to awk programming on the shell, and is it cool! It lets you do things on the command line that you never imagined. In fact, when you think about it, it is a whole data analytics tool in itself. You can do selections, group-bys, means, medians, sums, deduplication and appends. You just ask. For everyday work on delimited text files, there is very little it can't do.
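To make those claims concrete, here is a minimal sketch of a few of these operations as awk one-liners. It assumes a small hypothetical file, `sample.csv`, with the columns region, product and units; the file name and its contents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data (assumption): region,product,units
cat > sample.csv <<'EOF'
east,apple,10
west,apple,4
east,pear,6
west,pear,4
east,apple,10
EOF

# Selection: print only the rows where units (column 3) exceed 5
awk -F',' '$3 > 5' sample.csv

# Group by: total units per region (column 1).
# Note: the order in which "for (r in total)" visits keys is unspecified.
awk -F',' '{ total[$1] += $3 } END { for (r in total) print r, total[r] }' sample.csv

# Mean of column 3: accumulate a sum and a row count, divide at the end
awk -F',' '{ s += $3; n++ } END { if (n > 0) print s / n }' sample.csv

# Deduplication: print each distinct line once, keeping the first occurrence
awk '!seen[$0]++' sample.csv
```

The deduplication idiom works because `seen[$0]++` evaluates to 0 (false) the first time a line is seen, so `!` makes the pattern true only for first occurrences.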

And it is easy to learn.

In this post, I will give you a brief introduction to awk and show how you could add it to your daily workflow.

Please see my previous post if you want some background, or a basic-to-intermediate understanding of shell commands.

Basics/Fundamentals

Let me start with an example. Say you want to sum a column in a comma-delimited file. How would you do that in the shell?

Here is the command. The great thing about awk is that this command took me about 5 seconds to write. I did not have to open a text editor and write a Python script.

It lets you do ad hoc work quickly.

awk 'BEGIN{ sum=0; FS=","} { sum += $5 } END { print sum }…
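The command above is cut off, so here is a self-contained sketch of the same idea, assuming a hypothetical five-column file, `data.csv`, whose fifth column holds the numbers to sum. It shows the three parts of an awk program: `BEGIN` runs once before any input, the main block runs once per line, and `END` runs once after the last line.

```shell
#!/bin/sh
# Hypothetical comma-delimited file (assumption): column 5 holds the values to sum
cat > data.csv <<'EOF'
1,a,x,2015-10-01,12.5
2,b,y,2015-10-02,7.5
3,c,z,2015-10-03,30
EOF

# BEGIN: initialise the accumulator and set the field separator to a comma
# main block: add field 5 of every line to the running total
# END: print the total once all lines have been read
awk 'BEGIN { sum = 0; FS = "," } { sum += $5 } END { print sum }' data.csv

# Equivalent shorter form: -F sets the separator, and awk variables start at 0,
# so the explicit initialisation is optional
awk -F',' '{ sum += $5 } END { print sum }' data.csv
```

Both invocations print the same total; the `BEGIN` form is handy when you want to set several variables before reading any input.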


Subscribe to MLWhiz | AI Unwrapped to keep reading this post and get 7 days of free access to the full post archives.