Python Pro Tip: Start using Python defaultdict and Counter in place of dictionary
Learning a language is easy. Whenever I start with a new language, I focus on a few things in below order, and it is a breeze to get started with writing code in any language.
Operators and Data Types: +,-,int,float,str
Conditional statements: if,else,case,switch
Loops: For, while
Data structures: List, Array, Dict, Hashmaps
Define Function
However, learning to write a language and writing a language in an optimized way are two different things.
Every Language has some ingredients which make it unique.
Yet, a new programmer to any language will always do some forced overfitting. A Java programmer, new to python, for example, might write this code to add numbers in a list.
x=[1,2,3,4,5]
sum_x = 0
for i in range(len(x)):
sum_x+=x[i]
While a python programmer will naturally do this:
sum_x = sum(x)
In this series of posts named ‘Python Shorts’, I will explain some simple constructs that Python provides, some essential tips and some use cases I come up with regularly in my Data Science work.
This series is about efficient and readable code.
Counter and defaultdict — Use Cases
Let’s say I need to count the number of word occurrences in a piece of text. Maybe for a book like Hamlet. How could I do that?
Python always provides us with multiple ways to do the same thing. But only one way that I find elegant.
This is a Naive Python implementation using the dict object.
text = "I need to count the number of word occurrences in a piece of text. How could I do that? Python provides us with multiple ways to do the same thing. But only one way I find beautiful."
word_count_dict = {}
for w in text.split(" "):
if w in word_count_dict:
word_count_dict[w]+=1
else:
word_count_dict[w]=1
We could use defaultdict to reduce the number of lines in the code.
from Collections import defaultdict
word_count_dict = defaultdict(int)
for w in text.split(" "):
word_count_dict[w]+=1
We could also have used Counter to do this.
from Collections import Counter
word_count_dict = Counter()
for w in text.split(" "):
word_count_dict[w]+=1
If we use Counter, we can also get the most common words using a simple function.
word_count_dict.most_common(10)
---------------------------------------------------------------
[('I', 3), ('to', 2), ('the', 2)]
Other use cases of Counter:
# Count Characters
Counter('abccccccddddd')
---------------------------------------------------------------
Counter({'a': 1, 'b': 1, 'c': 6, 'd': 5})
# Count List elements
Counter([1,2,3,4,5,1,2])
---------------------------------------------------------------
Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1})
So, why ever use defaultdict ?
Notice that in Counter, the value is always an integer.
What if we wanted to parse through a list of tuples and wanted to create a dictionary of key and list of values.
The main functionality provided by a defaultdict is that it defaults a key to empty/zero if it is not found in the defaultdict.
s = [('color', 'blue'), ('color', 'orange'), ('color', 'yellow'), ('fruit', 'banana'), ('fruit', 'orange'),('fruit','banana')]
d = defaultdict(list)
for k, v in s:
d[k].append(v)
print(d)
---------------------------------------------------------------
defaultdict(<class 'list'>, {'color': ['blue', 'orange', 'yellow'], 'fruit': ['banana', 'orange', 'banana']})
banana comes two times in fruit, we could use set
d = defaultdict(set)
for k, v in s:
d[k].add(v)
print(d)
---------------------------------------------------------------
defaultdict(<class 'set'>, {'color': {'yellow', 'blue', 'orange'}, 'fruit': {'banana', 'orange'}})
Conclusion
To conclude, I will say that there is always a beautiful way to do anything in Python. Search for it before you write code. Going to StackOverflow is okay. I go there a lot of times when I get stuck. Always Remember:
Creating a function for what already is provided is not pythonic.
Also if you want to learn more about Python 3, I would like to call out an excellent course on Learn Intermediate level Python from the University of Michigan. Do check it out.