Sunday, October 9, 2011

Python and Clojure comparisons: functional programming

Doing these comparisons actually helps me learn clojure since I'm more familiar with python.  I figure that this is probably true for a lot of other people, so hopefully this will help others learn clojure as well.  OTOH, what I will cover here is more of a functional style of python programming that people might not be used to.  So I hope this will also help people figure out the functional style of programming along with me.  And as a warning, I am no functional guru.  I'm a guy whose first language was C++, then C, then python, then Java...and I forgot the order of everything else.  I've only recently begun programming in a functional style.  But I hope that since I am new at this, I'll be in the same boat as others trying to learn this way of programming as well.

So in this entry, I'll cover some functional style programming between the two languages.  But before I go too deep, let me begin with a brief introduction on my own understanding of functional programming.  As the name suggests, functional programming puts functions as first class citizens.  This means that functions can be passed as arguments to functions, or they can be returned as arguments.  Functions can also "close over" data inside that function, and this is what the term closure means (and where the play on words, clojure, comes from) .  Functional programming also takes after math.  In math, functions take parameters belonging to some domain, and map an element from the domain to a range.  Therefore, functions in functional programming languages seem to frown upon taking no arguments and returning void (returning nothing).  Imagine a mathematical function:

nil = f()

What would that even mean?  But now I'll go over some common ground between the two languages.

In clojure, it is often idiomatic to use the functions map, apply, reduce, or filter to solve problems.  When I first was learning clojure, I had some difficulty in understanding the difference between the first three of these functions.

The map function takes as parameters, a function, and one or more collections.  The number of collections passed in depends on the function that is passed in to map.  If the function takes 2 arguments, then you pass in 2 collections.  If the function takes 3 args, then you pass in 3 collections, etc etc.  The map function will take the 1st element in each collection, and pass them as arguments to the function we are using.  Here's a trivial example in clojure:

(map #(+ % %2 %3) [ 1 2 3 ] [4 5 6] [10 20 30])

Notice the #(+ % %2 %3) anonymous function.  This is idiomatic clojure's way of defining an anonymous function.  However, I could have also done this:

(fn [ f s t ] (+ f s t))


Anonymous functions in clojure are similar to lambdas in python but more powerful.  Lambdas in python can only contain a single expression, thus limiting their power.  The equivalent lambda in python would look like this:

lambda x, y, z: return x + y + z

But getting back to clojure's map function, the return of map is a lazy sequence containing all the mappings of collection values passed to the function inside of the map.  One key to remember here is that we return a lazy sequence, and as such, sometimes it might be necessary to iterate through all values (using a doall).

So how about python's map?  Python's map is similar except that it doesn't return a lazy sequence (a generator in python terms).  The equivalent python code of the above would be this:

map(lambda x,y,z: x + y + z, [1, 2, 3], [ 4, 5, 6], [10, 20,30])


In python, you might be tempted to do this instead and it would return the same result:

[ x+y+z ; for x,y,z in zip([1,2,3], [4,5,6], [10,20,30]) ]


However, this has a disadvantage compared to the clojure way.  The disadvantage is that clojure's map returns a lazy sequence, but neither python's map, nor its list comprehension is lazy.  This means that for huge collections, you may run out of memory.  However, there IS a way to create a lazy sequence in python, but don't use map() and don't use the list comprehension above either.  Imagine if I had done this:


def increment(start=0, inc=1):
  i = start
  while True:
    yield i
    i += inc

def get_item(gen, i):
    res = gen.next()
    x = 0
    while x != i:
        res = gen.next()
    return res

needed_value = get_item(increment(), 100000) ## if you have a huge sequence

By calling get_item(), it will only return the value of gen at some iteration.  The advantage to this style is that since it is using a generator, it is not consuming the entire list in memory.  If i used a generator expression using for example xrange, and used list() to realize he values, this could be a lifesaver.  This same concept applies to clojure's lazy sequences.  Rather than hold the entire sequence in memory, a lazy sequence only holds as much as you require.  Sometimes, when you need to iterate over the whole sequence, you have to force the evaluation using doall.

Update:
So python also has generator comprehensions as well as list comprehensions.  Although the code above shows how a generator is manually created, there's a simpler syntax.

for term in (x+y+z ; for x,y,z in zip([1,2,3], [4,5,6], [10,20,30])):
    print term

Notice the use of () parens instead of [] that list comprehensions use.  The above code would create a generator and the for term in would operate on this generator.  This way you get the advantage of a lazy evaluator.

So that covers map...what about reduce?  Perhaps you've heard of the map/reduce algorithm used by projects from Google or Hadoop.  We've just seen above what map does.  It applies values from one or more collections, and passes them to a function, and the return values are put inside of a sequence.  The reduce method is similar, but it takes a specific kind of function.  This function must take 2 arguments.  Also, reduce only takes one collection.  So you will often see the result of a map, applied to a reduce since the map call returns a collection, and reduce takes only one collection.  The result of a reduce is a scalar, not a collection of some sort.

As its name suggests, reduce reduces a collection one by one in order to arrive at a scalar value.  The classic example of reduce is a summation.  For example, you can sum the series of numbers like this:


(reduce #(+ % %2) (take 10 (iterate inc 1)))


That would sum the numbers 1 - 10.  If you're not familiar enough with clojure yet, I'll break it down.  The inner (iterate inc 1) returns a lazy (infinite) sequence.  The inc function adds 1 to the argument passed in.  So (iterate inc 1) is saying, "create an infinite sequence starting at 1, and each succeeding element is 1 + the last element".  In other words (1 2 3 4 5 ...).  The take function takes a number of elements to grab from a sequence.  So (take 10 (iterate inc 1)) says, "give me the first 10 elements of the natural numbers, and return them as a sequence".  This is good since reduce requires a sequence as its second argument.  The #(+ % %2) is an anonymous function that simply adds the two arguments together.  The % and %2 are placeholders for arguments.

When reduce is called the first time, it will take the first 2 elements from the sequence, and pass these to % and %2 respectively.  The result of #(+ % %2) is used as the incoming argument to % on all succeeding calls, and the next element in the sequence is used for the %2 argument.  So for example, the first several calls would look like this:

(+ 1 2)  ;; returns 3
(+ 3 3)  ;; returns 6
(+ 6 4) ;; returns 10
(+ 10 5)
...

So hopefully that explains map and reduce, but what about apply?  Actually, python doesn't really have a version of apply...at least not what clojure considers an apply.  Since I can't really compare, I'll move on to something else.

A common scenario is creating a vector composed of the same index element in all the other vectors.  For example, given these arrays:

a = [ 1, 2, 3]
b = ['a', 'b', 'c']
c = ['first', 'second', 'third']

I want to create a set of vectors like this:

d= [[1, 'a', 'first'], [2, 'b', 'second'], [3, 'c', 'third']]

Python has a built in function called zip() which does this for you.

d= zip(a, b, c)

The only difference is that python will return a tuple rather than a list of the newly created zip.  For clojure, you may think the equivalent is zipmap, but it's not.  Clojure's zipmap only takes two sequences, it doesn't take more (or less) than that.  So how do you pass in an arbitrary number of sequences as we did with python's zip() function?  One solution is a map.

(map vector (iterate inc 1) ["a" "b" "c"] [:first :second :third] )

Try passing that into the REPL, and notice what you get.  You actually get a sequence, not another vector.  This is something to watch out for in Clojure.  Test it out by wrapping the call above inside of (class ).

So how about iterating through things?  Is it true that in Clojure (and other loops) you always have to do recursion?  Well, not exactly.  But since this is a rather complex topic, I'll save that for a later blog :)

No comments:

Post a Comment