2008-09-28

Scala for Pythonistas: Part 2

This is the second installment in the series that examines features that a beginner to intermediate level Pythonista would use, and their Scala equivalents. I am not an expert-level Pythonista, so I won't cover any advanced features.

For-loops

We have already seen the for-comprehensions (see part 1) in Scala. The for-loop construct is similar, but without the yield bit.

# Python
acc = 0
for x in xs:
  acc += x

// Scala
var acc = 0
for (x <- xs) {
  acc += x
} 

No break or continue

Scala does not support break and continue. Additional if-else must be used to simulate continue. To simulate break, an additional flag has to be maintained.

Scala does this to keep the syntax of the core language small and to avoid the complications of adding another control mechanism. Some would point out that this is related to the whole "goto considered harmful" business, because break and continue are restricted forms of goto. However, I think, Scala is trying to push functional-style programming, and that this isn't a misguided attempt to enforce "goto-less programming".

Before I describe how to simulate break in Scala, I want to point out that using break and continue judiciously can make code clearer and more concise than its convoluted work around. So, if your language provides a break construct, do not avoid it just for fanatical adherence to "anti-goto-ism".

Donald Knuth pointed out that fanatically avoiding all goto-like constructs might lead to unnecessarily complex and obscure code[Knuth].

Simulating break

Consider how you will simulate break in Scala. You will notice that the use of break is more concise and clearer. The work-around has to maintain an additional condition variable (think of C-style for-loop):

# Python
acc = 0
for x in xs:
  acc += x
  if test(x, acc):
     break

// Scala
var acc = 0
var continue = true
for (x <- xs.takeWhile(e => continue)) {
  acc += x
  continue = !test(x, acc)
} 

There are cases, when we can avoid the additional condition variable. This is when our test depends only on the values accessible from the takeWhile's predicate, and when the values accessible from the takeWhile's predicate are the appropriate values, and when the initial case would not be false. And we can also rephrase the test case to avoid the negation.

Depending on which would be clearer, you are most likely to use other constructs or break the computation into smaller functions (Scala's compile time and Java VM's runtime optimizations should, hopefully, keep the function call cost a bit lower than that of Python).

Simulating continue

Simulation of continue is described in the discussion of while loops below (which applies to for-loops as well).

While Loops

While loop construct is similar to that of Python (and most other languages). Consider the following Python code:

res = 0
while test(res):
  res += work(res)

Here is the Scala equivalent:

var res = 0
while (test(res)) {
  res += work(res)
}
Simulating break
# Python
res = 0
while test1(res):
  res += work(res)
  if test2(res):
     break

// Scala
var res = 0
var continue = test1(res)
while (continue) {
  res += work(res)
  continue = test1(res) && !test2(res)
}

Scala does not provide break or continue, so we need to use a “continue” flag and update the flag. Alternatively, we can use functional constructs or express it in another way; which may or may not be clearer.

Of course, we can do this kind of programming in Python as well. In fact, some, especially those in academia, might insist on programming the above Python as follows:

res = 0
do_continue = test1(res)
while do_continue:
  res += work(res)
  do_continue = test1(res) and not test2(res)

In this particular case, I think I will stick to using break in Python.

Simulating continue

Similarly, Python's continue can be simulated by an additional if-else block.

# Python (I find this particular case more confusing)
res = 0
while test1(res):
  res += work1(res)
  if test2(res):
     continue
  work2(res)

// Scala
var res = 0
while (test1(res)) {
  res += work1(res)
  if (!test2(res)) {
     work2(res)
  }
}

# Python; better
res = 0
while test1(res):
  res += work1(res)
  if not test2(res):
     work2(res)

In this case, in Python, Java, and Scala, I prefer to use a nested if-else block over continue. The only advantage continue has over the nested-if-else is that you can avoid another level of nesting, which isn't worth it in this case. However, the continue construct might be more appropriate when using a fail/exit-early programming style.

Simulating enumerate

Although the for-each constructs in Python and Scala makes iterating over items in an iterable convenient, it becomes a bit annoying when we also need to keep track of the index. In Python, the solution is to use a wrapper generator, like the built-in enumerate. In Scala, the zipWithIndex method of the iterable is used instead:

# Python
for i, x in enumerate(xs):
  print(i, x)

// Scala
for ((x, i) <- xs.zipWithIndex) {
  println(i + " " + x)
}

Wrap-up

Hopefully, I can continue cover more of this fun stuff in later installments.

References

Notes

[Knuth: Structured Goto]
In "Structured Programming with go-to Statements" describes cases where goto-like constructs are clearer and more concise than convoluted code to avoid goto-like constructs.

2008-09-13

Scala for Pythonistas: Part 1

This examines features that a beginner to intermediate level Pythonista would use, and their Scala equivalents. I am not an expert-level Pythonista, so I won't cover any advanced features.

This is the first installment.

Tuples

Scala has tuples, and supports some of the tuple unpacking tricks familiar to Pythonistas.

-- tuple literals
# Python
resp = (42, "The Answer")
// Scala
val resp = (42, "The Answer")

-- tuple member access (access first item)
# Python
resp[0]
// Scala
resp._1

-- multiple assignments using tuples
# Python
num, desc = 42, "The Answer"
// Scala (parentheses are always required)
val (num, desc) = (42, "The Answer")

Pythonistas sometimes get into the habit of using and abusing the iterable nature of Python's tuples. This won't work in Scala, because, tuples are not considered an iterable in Scala. Also, Tuples are not defined as a general sequence type in Scala and is only defined for up to 20 members.

The idea of a tuple, is that of something like a C struct instance or a database row (see PyFAQ1). This is how we usually use tuples in Python, e.g., to return multiple values, parameters, etc. Python, being a dynamic language, and because Python decided to make tuple a sequence type, tuples start looking like immutable lists, and sometimes we just abuse it. I plead guilty to this; several times.

In Scala, tuples can only be used as, tuples in the C struct-like sense. At least, there won't be endless flame-wars on the nature of tuples. :-)

Comprehensions

Every Pythonista's favorite, comprehensions, has its analogue in Scala. It is called for-comprehension.

Scala's for-comprehensions are more like Python's generator comprehensions, than list comprehensions, in that they build generators. Furthermore, the generators built this way can be invoked multiple times (unlike generators created by Python's comprehensions). Also, the type of generator created by a for-comprehension depends on the type of sequence the for-comprehension was iterating over.

Here are a few examples:

-- simple comprehension
# Python
sqs = (x*x for x in range(1, 10))

// Scala
val sqs = for (x <- (1 until 10)) yield x*x


-- comprehension with filter
# Python
esqs = (x*x for x in range(1, 10) if x % 2 == 0)

// Scala
val esqs = for (x <- (1 until 10) if x % 2 == 0) yield x*x


-- nested comprehensions (cross-product-style results)
# Python
ccps = ((x, y, x+y) for x in range(1, 5) 
                    for y in range(5, 10))

// Scala
val ccps = for (x <- (1 until 5)
                y <- (5 until 10)
           ) yield (x, y, x+y)


-- chained comprehensions
# Python
dds = ((y, y+1) for y in (x*2 for x in range(1, 5)) 
            if y < 5)

// Scala
val dds = for (y <- (for (x <- (1 until 5)) yield x*2) 
               if y < 5
              ) yield (y, y+1)

In the case of the example with nested comprehensions, semi-colons are required if multiple x <- xs expressions are to appear on the same line.

Ranges

Python 2.x's xrange() or 3.x's range() built-in is represented by Scala's until method available to Scala's Int. Just as it is in Python 2.x, ranges are only available for 32-bit integer types. The behavior is very similar (the following examples assumes Python 3.x built-ins, substitute xrange for range for 2.x equivalent):

-- simple range: 0, 1, 2, 3, 4
# Python 3.x (use xrange for 2.x)
xs = range(0, 5)

// Scala
val xs = 0 until 5


-- range with custom increment: 0, 2, 4
# Python 3.x
xs = range(0, 5, 2)

// Scala
val xs = 0 until 5 by 2


-- reversed range: 5, 4, 3, 2, 1
# Python 3.x
xs = range(5, 0, -1)

// Scala
val xs = 5 until 0 by -1


-- invalid ranges => empty
# Python 3.x
range(5, 1)
range(1, 5, -1)

// Scala
5 until 1
1 until 5 by -1

Note that, the behavior is almost identical, including the response to invalid (nonsensical) range specifications like counting from 5 up to 1.

Also, note that the generator created by until can be invoked multiple times, similar to the generators created by Scala's for-comprehensions.

The equivalent of Python 2.x's range() built-in is the range factory method of the List object. The parameters and behavior would match the behavior of Python's range exactly. See the example below:

# Python 2.x (in 3.x use list(range(...)))
xs = range(0, 5, 2)

// Scala
val xs = List.range(0, 5, 2)

Scala also provides inclusive range constructor and provides all the options (Thanks to Seth Tisue for pointing this and the "by" method out). In Python, there is no built-in inclusive range, but you can simulate it by adding 1 to (for positive ranges) or subtracting 1 from (for negative ranges) the range end.

// simple range ; gives 1, 2, 3, 4, 5
val xs = 1 to 5 
# Python
xs = range(1, 6)

// range with custom step; gives 1, 3, 5
val cs = 1 to 5 by 2
# Python
cs = range(1, 6, 2)

// reversed range; gives 5, 4, 3, 2, 1
val rs = 5 to 1 by -1
# Python
rs = range(5, 0, -1)

Note that until and to are not language keywords, but methods available on Int objects through implicit conversion to RichInt objects. For instance, we could have written 0 until 5 as 0.until(5). by is also not a keyword, but a method on Inclusive and Range (the inclusive range object created by to).

Wrap-up

Hopefully, I can cover more of this fun stuff in later installments.

References

The following were used to check the facts:

  • Programming in Scala
  • Python documentation 2.5, 2.6beta and 3.0beta

Notes

[PyFAQ1]
Why are there separate tuple and list data types Python FAQ
2.6 and above will have a factory method namedtuple, that will allow you to build tuples that work more like structs
[Seth Tisue]
Seth Tisue is the lead developer of NetLogo at the Northwestern University, Evanston, IL.

2008-07-20

Groupsofn, itertools grouper Recipe, and Padding

This post is a moron's attempt to understand something that most others take for granted, a recipe given in the itertools documentation. This is not a solution to any real problem.

Groups of n

Groups of n is a function that takes an iterable and returns n consecutive elements from that iterable.

I needed something like this, while I was trying to convert a rather silly ID-generation function written in Perl to Python. It turned out to be a simple hexadecimal encoding/decoding function.

Under normal circumstances, this is how you would implement that in Python:

import codecs

def hexenc(input):
    return codecs.getencoder("hex")(input)[0]

def hexdec(input):
    return codecs.getdecoder("hex")(input)[0]

In any case, I was halfway through, and so I decided to continue, and ended up with the following implementation:

def hexenc(input):
    return "".join(
        hex(ord(char))[2:].rjust(2, "0")
            for char in input
    )

def hexdec(input):
    return "".join(
        chr(int("".join(hx), 16))
            for hx in groupsofn(2, input)
    )

Notice that the hexdec function assumes that a function called “groupsofn” exists and that it splits the input into n consecutive items.

Plain Generator Implementation

This is a plain generator implementation of it, and incredibly dumb:

def groupsofn(n, iterable, pad=False, padwith=None):
    from itertools import repeat
    group = []
    for i, item in enumerate(iterable):
        group.append(item)
        if (i + 1) % n == 0:
            yield group
            group = []
    
    if pad and 0 < len(group) < n:
        group.extend(repeat(padwith, n - len(group)))
    
    if group:
        yield group

The itertools Recipe

The itertools documentation provides the following recipe:

# from itertools recipe
def grouper(n, iterable, padvalue=None):
    """grouper(3, 'abcdefg', 'x') --> ('a','b','c'),
                              ('d','e','f'), ('g','x','x')
    """
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

How does this work? This left me puzzled for some time. (I am sure expert programmers would not find anything to be puzzled here, but I am just a plain, dumb programmer.)

How the itertools Recipe Works?

Let us assume for the time being that we are not concerned with padding. Now, the implementation simplifies to the following:

def grouper(n, iterable):
    return izip(*[iterable]*n)

The [iterable]*n bit makes n references to the same iterator. Then when izip picks one from each iterator, we simply get consecutive elements. It is like taking apples from the same basket, but we think we are taking it from three different baskets.

Consider the example of invoking grouper with 3 (I am expanding the variables in each step):

def grouper(3, iterable):
    return izip(*([iterable] * 3))

# multiplying the list creates a list with 3 copies
def grouper(3, iterable):
    return izip(*[iterable, iterable, iterable])

# Unpacking makes the list into 3 params to izip
def grouper(3, iterable):
    return izip(iterable, iterable, iterable)

# Now, let us look just at izip (conceptually, it 
# yields, one from each iterator packed into a tuple)
def izip(iter1, iter2, iter3):
    # ... imaginary izip implementation
    yield (iter1.next(), iter2.next(), iter3.next())

# Note that iter1 is iter2 is iter3 == True.
# so, it is essentially this:
def izip(iterable, iterable, iterable)
    # ... imaginary izip implementation
    yield (iterable.next(), iterable.next(), iterable.next())
Adding the Padding

The chain(iterable, repeat(padvalue, n-1) bit ensures that there will be enough items in the iterable for izip to go through, because izip bails out when the shortest iterator ends.

Let us consider the case, where there is one odd (not the mathematical sense) item at the end (that is, len(iterable) % n == 1). If n - 1, padding is always added to the end of the iterable, then, izip will not exit and leave out that one last item.

Consider the other extreme, when, there are no odd ones out at the end, then the padded n-1 values, will not give enough elements for the iterator, to consume and that extra padding will be left behind.

In other cases, then having the additional padding of n-1, will ensure that that 1 element is returned, and for anything larger, the excess padding will be skipped.

X: padding value
 *: values in iterable
 n == 3

 len(iterable) == 7
 * * *
 * * *
 * X X

 len(iterable) == 6
 * * *
 * * *
 X X

 len(iterable) == 8
 * * *
 * * *
 * * X
 X

A consequence of this is that it does not provide a way to leave out padding altogether.

A grouper with Optional Padding

To add optional padding, I could use the simple generator based code listed earlier. However, after you see the itertools recipe, you just don't feel like manually accumulating anything.

The groupby Solution

The itertools.groupby looked promising, and after a few trials, came up with the following:

def grouper(n, iterable):
    for k, g in groupby(enumerate(iterable), lambda x: x[0] // n):
        yield (e for i, e in g)

I chose to follow the itertools.groupby's behavior of keeping the groups as generators.

A similar implementation was pointed out by David Isaac in the Python mailing list.

Adding Padding to the groupby Solution

To add padding, we could do something dumb like this:

def pad(length, iterable, padding=None):
    return islice(chain(iterable, repeat(padding)), length)

def grouper(n, iterable, pad=False, padding=None):
    if pad:
        for k, g in groupby(enumerate(iterable), 
                                  lambda x: x[0] // n):
            yield pad(n, (e for i, e in g), padding)
    else:
        for k, g in groupby(enumerate(iterable), 
                                  lambda x: x[0] // n):
            yield (e for i, e in g)

The padding routine is an adaptation of the padnone recipe from the itertools documentation.

However, This implementation has a big problem. We pointlessly pad every group, when, we might only need to do it for one.

The Hybrid

We already have a solution that works well for the case where padding is required. So, the obvious solution is to use a hybrid.

Here is the full version of the hybrid code:

def grouper(n, iterable, pad=False, padding=None):
    if pad:
        return izip(*[chain(iterable, repeat(padding, n-1))]*n)
    else:
        return ((e for i, e in group) for key, group in 
                    groupby(enumerate(iterable), lambda x: x[0] // n)
        )

Resources