This post is a moron's attempt to understand something that most others take for granted, a recipe given in the itertools documentation. This is not a solution to any real problem.
Groups of n
Groups of n is a function that takes an iterable and returns n consecutive elements from that iterable.
I needed something like this, while I was trying to convert a rather silly ID-generation function written in Perl to Python. It turned out to be a simple hexadecimal encoding/decoding function.
Under normal circumstances, this is how you would implement that in Python:
import codecs def hexenc(input): return codecs.getencoder("hex")(input) def hexdec(input): return codecs.getdecoder("hex")(input)
In any case, I was halfway through, and so I decided to continue, and ended up with the following implementation:
def hexenc(input): return "".join( hex(ord(char))[2:].rjust(2, "0") for char in input ) def hexdec(input): return "".join( chr(int("".join(hx), 16)) for hx in groupsofn(2, input) )
Notice that the
hexdec function assumes that a function called “groupsofn” exists and that it splits the input into n consecutive items.
Plain Generator Implementation
This is a plain generator implementation of it, and incredibly dumb:
def groupsofn(n, iterable, pad=False, padwith=None): from itertools import repeat group =  for i, item in enumerate(iterable): group.append(item) if (i + 1) % n == 0: yield group group =  if pad and 0 < len(group) < n: group.extend(repeat(padwith, n - len(group))) if group: yield group
# from itertools recipe def grouper(n, iterable, padvalue=None): """grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x') """ return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
How does this work? This left me puzzled for some time. (I am sure expert programmers would not find anything to be puzzled here, but I am just a plain, dumb programmer.)
itertools Recipe Works?
Let us assume for the time being that we are not concerned with padding. Now, the implementation simplifies to the following:
def grouper(n, iterable): return izip(*[iterable]*n)
[iterable]*n bit makes n references to the same iterator. Then when
izip picks one from each iterator, we simply get consecutive elements. It is like taking apples from the same basket, but we think we are taking it from three different baskets.
Consider the example of invoking
grouper with 3 (I am expanding the variables in each step):
def grouper(3, iterable): return izip(*([iterable] * 3)) # multiplying the list creates a list with 3 copies def grouper(3, iterable): return izip(*[iterable, iterable, iterable]) # Unpacking makes the list into 3 params to izip def grouper(3, iterable): return izip(iterable, iterable, iterable) # Now, let us look just at izip (conceptually, it # yields, one from each iterator packed into a tuple) def izip(iter1, iter2, iter3): # ... imaginary izip implementation yield (iter1.next(), iter2.next(), iter3.next()) # Note that iter1 is iter2 is iter3 == True. # so, it is essentially this: def izip(iterable, iterable, iterable) # ... imaginary izip implementation yield (iterable.next(), iterable.next(), iterable.next())
Adding the Padding
chain(iterable, repeat(padvalue, n-1) bit ensures that there will be enough items in the iterable for
izip to go through, because
izip bails out when the shortest iterator ends.
Let us consider the case, where there is one odd (not the mathematical sense) item at the end (that is, len(iterable) % n == 1). If n - 1, padding is always added to the end of the iterable, then,
izip will not exit and leave out that one last item.
Consider the other extreme, when, there are no odd ones out at the end, then the padded n-1 values, will not give enough elements for the iterator, to consume and that extra padding will be left behind.
In other cases, then having the additional padding of n-1, will ensure that that 1 element is returned, and for anything larger, the excess padding will be skipped.
X: padding value *: values in iterable n == 3 len(iterable) == 7 * * * * * * * X X len(iterable) == 6 * * * * * * X X len(iterable) == 8 * * * * * * * * X X
A consequence of this is that it does not provide a way to leave out padding altogether.
grouper with Optional Padding
To add optional padding, I could use the simple generator based code listed earlier. However, after you see the
itertools recipe, you just don't feel like manually accumulating anything.
itertools.groupby looked promising, and after a few trials, came up with the following:
def grouper(n, iterable): for k, g in groupby(enumerate(iterable), lambda x: x // n): yield (e for i, e in g)
I chose to follow the
itertools.groupby's behavior of keeping the groups as generators.
A similar implementation was pointed out by David Isaac in the Python mailing list.
Adding Padding to the
To add padding, we could do something dumb like this:
def pad(length, iterable, padding=None): return islice(chain(iterable, repeat(padding)), length) def grouper(n, iterable, pad=False, padding=None): if pad: for k, g in groupby(enumerate(iterable), lambda x: x // n): yield pad(n, (e for i, e in g), padding) else: for k, g in groupby(enumerate(iterable), lambda x: x // n): yield (e for i, e in g)
The padding routine is an adaptation of the
padnone recipe from the
However, This implementation has a big problem. We pointlessly pad every group, when, we might only need to do it for one.
We already have a solution that works well for the case where padding is required. So, the obvious solution is to use a hybrid.
Here is the full version of the hybrid code:
def grouper(n, iterable, pad=False, padding=None): if pad: return izip(*[chain(iterable, repeat(padding, n-1))]*n) else: return ((e for i, e in group) for key, group in groupby(enumerate(iterable), lambda x: x // n) )