This post is a moron's attempt to understand something that most others take for granted, a recipe given in the itertools documentation. This is not a solution to any real problem.
Groups of n
Groups of n is a function that takes an iterable and returns n consecutive elements from that iterable.
I needed something like this, while I was trying to convert a rather silly ID-generation function written in Perl to Python. It turned out to be a simple hexadecimal encoding/decoding function.
Under normal circumstances, this is how you would implement that in Python:
import codecs def hexenc(input): return codecs.getencoder("hex")(input)[0] def hexdec(input): return codecs.getdecoder("hex")(input)[0]
In any case, I was halfway through, and so I decided to continue, and ended up with the following implementation:
def hexenc(input): return "".join( hex(ord(char))[2:].rjust(2, "0") for char in input ) def hexdec(input): return "".join( chr(int("".join(hx), 16)) for hx in groupsofn(2, input) )
Notice that the hexdec
function assumes that a function called “groupsofn” exists and that it splits the input into n consecutive items.
Plain Generator Implementation
This is a plain generator implementation of it, and incredibly dumb:
def groupsofn(n, iterable, pad=False, padwith=None): from itertools import repeat group = [] for i, item in enumerate(iterable): group.append(item) if (i + 1) % n == 0: yield group group = [] if pad and 0 < len(group) < n: group.extend(repeat(padwith, n - len(group))) if group: yield group
The itertools
Recipe
The itertools documentation provides the following recipe:
# from itertools recipe def grouper(n, iterable, padvalue=None): """grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x') """ return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
How does this work? This left me puzzled for some time. (I am sure expert programmers would not find anything to be puzzled here, but I am just a plain, dumb programmer.)
How the itertools
Recipe Works?
Let us assume for the time being that we are not concerned with padding. Now, the implementation simplifies to the following:
def grouper(n, iterable): return izip(*[iterable]*n)
The [iterable]*n
bit makes n references to the same iterator. Then when izip
picks one from each iterator, we simply get consecutive elements. It is like taking apples from the same basket, but we think we are taking it from three different baskets.
Consider the example of invoking grouper
with 3 (I am expanding the variables in each step):
def grouper(3, iterable): return izip(*([iterable] * 3)) # multiplying the list creates a list with 3 copies def grouper(3, iterable): return izip(*[iterable, iterable, iterable]) # Unpacking makes the list into 3 params to izip def grouper(3, iterable): return izip(iterable, iterable, iterable) # Now, let us look just at izip (conceptually, it # yields, one from each iterator packed into a tuple) def izip(iter1, iter2, iter3): # ... imaginary izip implementation yield (iter1.next(), iter2.next(), iter3.next()) # Note that iter1 is iter2 is iter3 == True. # so, it is essentially this: def izip(iterable, iterable, iterable) # ... imaginary izip implementation yield (iterable.next(), iterable.next(), iterable.next())
Adding the Padding
The chain(iterable, repeat(padvalue, n-1)
bit ensures that there will be enough items in the iterable for izip
to go through, because izip
bails out when the shortest iterator ends.
Let us consider the case, where there is one odd (not the mathematical sense) item at the end (that is, len(iterable) % n == 1). If n - 1, padding is always added to the end of the iterable, then, izip
will not exit and leave out that one last item.
Consider the other extreme, when, there are no odd ones out at the end, then the padded n-1 values, will not give enough elements for the iterator, to consume and that extra padding will be left behind.
In other cases, then having the additional padding of n-1, will ensure that that 1 element is returned, and for anything larger, the excess padding will be skipped.
X: padding value *: values in iterable n == 3 len(iterable) == 7 * * * * * * * X X len(iterable) == 6 * * * * * * X X len(iterable) == 8 * * * * * * * * X X
A consequence of this is that it does not provide a way to leave out padding altogether.
A grouper
with Optional Padding
To add optional padding, I could use the simple generator based code listed earlier. However, after you see the itertools
recipe, you just don't feel like manually accumulating anything.
The groupby
Solution
The itertools.groupby
looked promising, and after a few trials, came up with the following:
def grouper(n, iterable): for k, g in groupby(enumerate(iterable), lambda x: x[0] // n): yield (e for i, e in g)
I chose to follow the itertools.groupby
's behavior of keeping the groups as generators.
A similar implementation was pointed out by David Isaac in the Python mailing list.
Adding Padding to the groupby
Solution
To add padding, we could do something dumb like this:
def pad(length, iterable, padding=None): return islice(chain(iterable, repeat(padding)), length) def grouper(n, iterable, pad=False, padding=None): if pad: for k, g in groupby(enumerate(iterable), lambda x: x[0] // n): yield pad(n, (e for i, e in g), padding) else: for k, g in groupby(enumerate(iterable), lambda x: x[0] // n): yield (e for i, e in g)
The padding routine is an adaptation of the padnone
recipe from the itertools
documentation.
However, This implementation has a big problem. We pointlessly pad every group, when, we might only need to do it for one.
The Hybrid
We already have a solution that works well for the case where padding is required. So, the obvious solution is to use a hybrid.
Here is the full version of the hybrid code:
def grouper(n, iterable, pad=False, padding=None): if pad: return izip(*[chain(iterable, repeat(padding, n-1))]*n) else: return ((e for i, e in group) for key, group in groupby(enumerate(iterable), lambda x: x[0] // n) )