Data Science Programming

Python Pro Tip: Start using Python defaultdict and Counter in place of dictionary

Python Pro Tip: Start using Python defaultdict and Counter in place of dictionary

Learning a language is easy. Whenever I start with a new language, I focus on a few things in below order, and it is a breeze to get started with writing code in any language.

  • Operators and Data Types: +,-,int,float,str

  • Conditional statements: if,else,case,switch

  • Loops: For, while

  • Data structures: List, Array, Dict, Hashmaps

  • Define Function

However, learning to write a language and writing a language in an optimized way are two different things.

Every Language has some ingredients which make it unique.

Yet, a new programmer to any language will always do some forced overfitting. A Java programmer, new to python, for example, might write this code to add numbers in a list.

x=[1,2,3,4,5]

sum_x = 0
for i in range(len(x)):
    sum_x+=x[i]

While a python programmer will naturally do this:

sum_x = sum(x)

In this series of posts named ‘Python Shorts’, I will explain some simple constructs that Python provides, some essential tips and some use cases I come up with regularly in my Data Science work.

This series is about efficient and readable code.

Counter and defaultdict — Use Cases

Let’s say I need to count the number of word occurrences in a piece of text. Maybe for a book like Hamlet. How could I do that?

Python always provides us with multiple ways to do the same thing. But only one way that I find elegant.

This is a Naive Python implementation using the dict object.

text = "I need to count the number of word occurrences in a piece of text. How could I do that? Python provides us with multiple ways to do the same thing. But only one way I find beautiful."

word_count_dict = {}
for w in text.split(" "):
    if w in word_count_dict:
        word_count_dict[w]+=1
    else:
        word_count_dict[w]=1

We could use defaultdict to reduce the number of lines in the code.

from Collections import defaultdict
word_count_dict = defaultdict(int)
for w in text.split(" "):
    word_count_dict[w]+=1

We could also have used Counter to do this.

from Collections import Counter
word_count_dict = Counter()
for w in text.split(" "):
    word_count_dict[w]+=1

If we use Counter, we can also get the most common words using a simple function.

word_count_dict.most_common(10)
---------------------------------------------------------------
[('I', 3), ('to', 2), ('the', 2)]

Other use cases of Counter:

# Count Characters
Counter('abccccccddddd')
---------------------------------------------------------------
Counter({'a': 1, 'b': 1, 'c': 6, 'd': 5})

# Count List elements
Counter([1,2,3,4,5,1,2])
---------------------------------------------------------------
Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1})

So, why ever use defaultdict ?

Notice that in Counter, the value is always an integer.

What if we wanted to parse through a list of tuples and wanted to create a dictionary of key and list of values.

The main functionality provided by a defaultdict is that it defaults a key to empty/zero if it is not found in the defaultdict.

s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'), ('fruit', 'banana'), ('fruit', 'orange'),('fruit','banana')]

d = defaultdict(list)

for k, v in s:
     d[k].append(v)

print(d)
---------------------------------------------------------------
defaultdict(<class 'list'>, {'color': ['blue', 'orange', 'yellow'], 'fruit': ['banana', 'orange', 'banana']})

banana comes two times in fruit, we could use set

d = defaultdict(set)

for k, v in s:
     d[k].add(v)

print(d)
---------------------------------------------------------------
defaultdict(<class 'set'>, {'color': {'yellow', 'blue', 'orange'}, 'fruit': {'banana', 'orange'}})

Conclusion

To conclude, I will say that there is always a beautiful way to do anything in Python. Search for it before you write code. Going to StackOverflow is okay. I go there a lot of times when I get stuck. Always Remember:

Creating a function for what already is provided is not pythonic.

Also if you want to learn more about Python 3, I would like to call out an excellent course on Learn Intermediate level Python from the University of Michigan. Do check it out.

If you liked this post do share. It will help increase coverage for this post. I am going to be writing more beginner friendly posts in the future too. Let me know what you think about the series. Follow me up at <strong>Medium</strong> or Subscribe to my <strong>blog</strong> to be informed about them. As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz .

Start your future with a Data Analysis Certificate.
comments powered by Disqus