In many ways, Python has made our lives easier when it comes to programming.
With so many libraries and features on offer, we sometimes overlook some of the useful things built into the language.
One such feature is generators and generator expressions. I put off learning about them for a long time, but they are useful.
Have you ever encountered `yield` in Python code and not known what it meant? Or wondered what an iterator or a generator is and why we use it? Or have you used `ImageDataGenerator` while working with Keras without understanding what is going on behind the scenes? Then this post is for you.
In this series of posts named Python Shorts, I will explain some simple constructs provided by Python, some essential tips, and some use cases I come across regularly in my data science work.
This post is about explaining some of the more difficult concepts in an easy-to-understand way.
Let us say that we need to run a for loop over 10 million prime numbers.
I am using prime numbers here for illustration, but the same idea extends to cases where we have to process a lot of images, files in a database, or big data.
How would you proceed with such a problem?
Simple. We can create a list and keep all the prime numbers there.
Really? Think of the memory such a list would occupy.
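To get a rough feel for the difference, here is a small sketch that compares a fully built list with a generator over the same range. It uses one million plain integers instead of 10 million primes to keep it quick, and the variable names are only illustrative. Note that `sys.getsizeof` reports the size of the container alone, not the integers it refers to, so the real gap is larger still.

```python
import sys

# A list stores a reference to every element up front.
numbers_list = [n for n in range(1_000_000)]

# A generator only stores what it needs to produce the next element.
numbers_gen = (n for n in range(1_000_000))

list_size = sys.getsizeof(numbers_list)  # grows with the number of elements
gen_size = sys.getsizeof(numbers_gen)    # stays small regardless of the range

print(f"list: {list_size} bytes, generator: {gen_size} bytes")
```

The exact numbers depend on the Python version and platform, but the list should come out several orders of magnitude larger.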
It would be great if we had something that could just keep track of the last prime number we checked and return only the next prime number.
That is where iterators could help us.
We create a class named `Primes` and use it to generate primes.

    def check_prime(number):
        # Numbers below 2 are not prime.
        if number < 2:
            return False
        # A composite number always has a divisor no larger than its square root.
        for divisor in range(2, int(number ** 0.5) + 1):
            if number % divisor == 0:
                return False
        return True


    class Primes:
        def __init__(self, max):
            # The maximum number of primes we want generated.
            self.max = max
            # The last number we checked for primality.
            self.number = 1
            # Number of primes generated so far. We raise StopIteration when it reaches max.
            self.primes_generated = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self.primes_generated >= self.max:
                raise StopIteration
            # Advance to the next prime with a loop instead of recursion.
            self.number += 1
            while not check_prime(self.number):
                self.number += 1
            self.primes_generated += 1
            return self.number

We can then use this as:

    prime_generator = Primes(10000000)
    for x in prime_generator:
        # Process here
        pass

Here I have defined an iterator. Many lazily evaluated tools, such as the `ImageDataGenerator` mentioned earlier, are built on the same idea.
Every iterator needs to have:
an `__iter__` method that returns self,
a `__next__` method that returns the next value, and
a `StopIteration` exception raised to signal the end of the iteration.
Every iterator takes the above form and we can tweak the functions to our liking in this boilerplate code to do what we want to do.
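To make the protocol concrete, here is roughly what a `for` loop does with an iterator behind the scenes. The `Countdown` class is a made-up example for illustration, not something from the prime number code, and the `while` loop is only an approximation of what Python does internally.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# Roughly what `for x in Countdown(3): ...` does for us.
iterator = iter(Countdown(3))
collected = []
while True:
    try:
        collected.append(next(iterator))
    except StopIteration:
        # The for loop swallows this exception and simply ends.
        break

print(collected)  # [3, 2, 1]
```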
Notice that we don’t keep all the prime numbers in memory, just the state of the iterator:
the last number we checked, and
how many primes we have returned already.
But it seems a little too much code. Can we do better?
Put simply, generators give us an easy way to write iterators using the `yield` statement.

    def Primes(max):
        number = 1
        generated = 0
        while generated < max:
            number += 1
            if check_prime(number):
                generated += 1
                yield number

We can use the function as:

    prime_generator = Primes(10)
    for x in prime_generator:
        # Process here
        pass

It is so much simpler to read. But what is `yield`?
We can think of `yield` as a `return` statement, in the sense that it hands a value back to the caller.
But when a `yield` happens, the state of the function is also saved in memory. So at every iteration of the for loop, function variables like `number`, `generated` and `max` are kept around until the next value is requested.
So the function above takes care of all the boilerplate code for us, simply by using the `yield` statement.
Much more Pythonic.
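One way to see the saved state is to drive a generator by hand with `next()`. The sketch below uses a compact, lower-case `primes` generator and an `is_prime` helper written just for this illustration, so it can run on its own.

```python
def is_prime(number):
    # Illustrative helper: trial division up to the square root.
    return number >= 2 and all(
        number % d for d in range(2, int(number ** 0.5) + 1)
    )


def primes(max_count):
    number = 1
    generated = 0
    while generated < max_count:
        number += 1
        if is_prime(number):
            generated += 1
            yield number  # pause here; number and generated are preserved


gen = primes(3)
first = next(gen)   # runs the body until the first yield
second = next(gen)  # resumes right after the yield with the old state
third = next(gen)
print(first, second, third)  # 2 3 5

try:
    next(gen)  # the while loop finishes, so the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```

Each call to `next()` picks up exactly where the previous one stopped, and the `StopIteration` we had to raise by hand in the class version is raised automatically when the function body ends.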
While not strictly better than the previous solution, we can also use a generator expression for the same task, although we might lose some functionality. Generator expressions look just like list comprehensions, but they don’t keep the whole list in memory.

    primes = (i for i in range(2, 100000000) if check_prime(i))
    for x in primes:
        # do something
        pass

Functionality loss: we can generate all primes below a fixed limit, but we can’t directly ask for exactly 10M primes. The expression on its own can only do so much.
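One possible workaround, sketched here as an assumption rather than something from the original code, is to pair an unbounded generator expression with `itertools.count` and `itertools.islice` from the standard library. The `is_prime` helper and the small count of 10 are only for illustration.

```python
from itertools import count, islice


def is_prime(number):
    # Illustrative helper: trial division up to the square root.
    return number >= 2 and all(
        number % d for d in range(2, int(number ** 0.5) + 1)
    )


# count(2) yields 2, 3, 4, ... forever, so this expression has no upper limit.
all_primes = (n for n in count(2) if is_prime(n))

# islice stops pulling values after the requested number of primes.
first_ten = list(islice(all_primes, 10))
print(first_ten)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Whether this is still more readable than the plain generator function is a matter of taste.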
But generator expressions let us do some pretty cool things.
Let us say we want all Pythagorean triplets with sides smaller than 1000.
How can we get them?
With a generator, now that we know how to use one.

    def triplet(n):
        # Find all the Pythagorean triplets between 1 and n
        for a in range(n):
            for b in range(a):
                for c in range(b):
                    if a*a == b*b + c*c:
                        yield (a, b, c)

We can use this as:

    triplet_generator = triplet(1000)
    for x in triplet_generator:
        print(x)
    ------------------------------------------------------------
    (5, 4, 3)
    (10, 8, 6)
    (13, 12, 5)
    (15, 12, 9)
    .....

Or, we could also have used a generator expression here:

    triplet_generator = ((a, b, c)
                         for a in range(1000)
                         for b in range(a)
                         for c in range(b)
                         if a*a == b*b + c*c)
    for x in triplet_generator:
        print(x)
    ------------------------------------------------------------
    (5, 4, 3)
    (10, 8, 6)
    (13, 12, 5)
    (15, 12, 9)
    .....

Isn’t Python beautiful?
***We must always try to reduce the memory footprint in Python.*** Iterators and generators provide us with a way to do that through lazy evaluation.
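Lazy evaluation means that no work happens until a value is actually requested. A minimal sketch of this, using a made-up `squares` generator that records every value it computes:

```python
def squares(limit, log):
    """Hypothetical generator that records each input it actually processes."""
    for i in range(limit):
        log.append(i)
        yield i * i


log = []
lazy_squares = squares(1_000_000, log)
log_before = list(log)
print(log_before)  # [] because creating the generator runs none of its body

first_three = [next(lazy_squares) for _ in range(3)]
print(first_three)  # [0, 1, 4]
print(log)          # [0, 1, 2] because only three values were ever computed
```

Even though the generator is set up for a million values, we only pay for the three we consumed.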
How do we choose which one to use? What we can do with generator expressions we could have done with generators or iterators too.
There is no single correct answer here. Whenever I face such a dilemma, I think in terms of functionality vs readability. Generally,
Functionality wise: Iterators>Generators>Generator Expressions.
Readability wise: Iterators<Generators<Generator Expressions.
It is not necessary that you end up using them in your code now. But I guess understanding how these things work helps mitigate some of the confusion and panic one faces whenever these constructs come up.
Understanding is vital when it comes to coding.
Also, if you want to learn more about Python 3, I would like to call out an excellent intermediate-level Python course from the University of Michigan. Do check it out.
Also, a small disclaimer: there might be some affiliate links in this post to relevant resources, as sharing knowledge is never a bad idea.