Wednesday, December 3, 2014

A critique of python. Functional python to the rescue?

Perhaps my post about the OpenStack clojure tool may have given away some hints, but I am trying to distance myself from python.  I believe that in the next 5-10 years, python will have much of its thunder stolen by the new kids on the block: go, julia, and swift, and possibly even clojure or elixir.

I'm a big believer in "right tool for the right job".  The trouble is that people think python is great for anything.  I can already hear people sharpening their swords, but let me explain.  Python can be used for just about anything.  If a site like YouTube is mostly written in python, I think that goes to show its power.  Also, look at OpenStack itself.  A project with a million lines of code is nothing to sneeze at.  However, python has its problems, especially at scale.  And I believe a big reason that big projects are written in python is that it's perceived as an "easy" language.  Because it is "easy", there is a wealth of programmers to hire, and it's also very fast to churn out code.  If code is easy to crank out, it seems logical to think that scaling should be easier as well.  Another big problem with python is that even though it is "multi-paradigm", it is basically an OOP language with mutable imperative programming baked in.

The trouble is that the strength of python is also its weakness.  Python is good at rapidly banging out some idea to see if it works.  Due to its dynamically typed nature, you don't have to write tons of boilerplate code like you would with Java or even C++.  If you want to extend the abilities of a class, no problem: monkey-patching to the rescue.  And very likely, some other programmer(s) have written a 3rd party library for you to consume.  However, do you see the problems inherent in these powers?

Because python is dynamically typed, when you first read python code, it is not at all obvious what kinds of arguments you are supposed to pass into a function.  Docstrings, I hear you say?  Even if docstrings exist, all too often they are out of date or even plain wrong.  How many times have you had to crack open a debugger to figure out why your function wasn't working as expected, only to find the argument you passed in was the wrong type?  I would go so far as to say that the time saved by not having to statically type your variables is lost during debugging.  At least Python3 introduced argument annotations (which, btw, are not necessarily type hinting), perhaps because they realized duck typing isn't always enough?  Duck typing, while convenient, offers very weak guarantees about whether your object *should* be responding to some method call.  And monkey-patching is not the best for production-quality distributed software.  How can you be sure that some monkey-patched new field or function won't clash with what another client is doing?  Finally, while it may seem like having a ton of 3rd party libraries is great, python never solved what OSGi did for Java (or what the upcoming project jigsaw will do).
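To see why annotations are "not necessarily type hinting", here's a quick sketch (the function is hypothetical, just for illustration): the interpreter records the annotations but never checks them.

```python
# Python 3 function annotations: recorded by the interpreter, never enforced.
def scale(value: int, factor: float) -> float:
    return value * factor

print(scale(3, 2.0))    # 6.0
print(scale("ab", 2))   # 'abab' -- no type error; the annotations are ignored
print(scale.__annotations__)
```

Passing a string where an int was "declared" doesn't even warrant a warning, which is exactly the weak guarantee I'm complaining about.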

Let's say you have module foo, and it depends on importing module baz version 1.2.  Then you have module bar, and it too has a dependency on module baz, but at version 2.0+.  Sorry friends, but you are SOL in python.  This is because python has a flat PYTHONPATH, much like vanilla Java does with its CLASSPATH.  OSGi solved this by essentially writing custom class loaders and adding metadata to OSGi modules (and I believe some JVM bytecode hackery too).
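You can watch the flat namespace in action without even touching disk.  Here's a sketch that fakes two versions of a hypothetical module "baz": the import system keeps exactly one slot per module name, so whichever version loads last wins for everybody.

```python
import sys
import types

# Simulate two versions of a hypothetical module "baz" in memory.
baz_v1 = types.ModuleType("baz")
baz_v1.VERSION = "1.2"
baz_v2 = types.ModuleType("baz")
baz_v2.VERSION = "2.0"

sys.modules["baz"] = baz_v1
import baz
print(baz.VERSION)   # 1.2

sys.modules["baz"] = baz_v2
import baz           # just re-reads the single sys.modules slot
print(baz.VERSION)   # 2.0 -- foo and bar could never each see their own baz
```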

The weaknesses above don't even factor in other problems of scale.  Take for example speed.  Unless you are running pypy, python's speed leaves something to be desired.  But not every problem is speed-insensitive.  Some may say this is a moot point and "my program is fast enough", as Guido himself seems to think.  Even Guido says the solution is to write the slow parts of your python code in C(++).  Really?  Okay, for the moment, maybe the python people are safe, because your competitors' programs, written in ruby, lua, or python as well, will probably be about as fast.  And, you think, the feature set you churn out will be far greater than those statically typed guys programming in Java, Scala, C++, Haskell, etc.  There's no way they could churn out as much functionality in the same time frame as python, right?

Ok, let's buy into that last argument (even though I don't think it is necessarily true).  What if you start comparing the old world dynamic languages (ie python, ruby, perl, etc) with the new generation of languages?  Say, for example, julia, go, elixir, or clojure?  Julia is blazing fast, and clojure is no slouch either.  Now, let's say your competitor has written his product in one of these languages, while yours is still using python.  That speed equivalence across the old school languages is gone, and your product will suffer.

And speed isn't the only scaling problem python has.  Python has a model for concurrency which requires, in essence, a distributed model of computing even on a single machine.  Yes, I can hear the chorus of shouting now: "Stop blaming the GIL!!!"  And if you think your answer is multiprocessing, that is not always the easiest of solutions.  Firstly, multiprocessing on Windows is a pain due to the requirement that all arguments to the new process must be pickleable.  Secondly, multiprocessing itself doesn't save you from synchronization problems, even if you think using Queues for serialization will (to be fair, most languages don't make concurrency easy, though some new languages' claim to fame is that they make it quite a bit easier, like clojure, erlang, or scala via the Akka library).  And since the end of Moore's law is around the corner (I remember reading once that at the 12nm limit, the electron can jump out, and we're almost there), speed increases will come either from some form of concurrency/parallelism, or from a new computer architecture, perhaps fiber optics or quantum computers.
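The pickling requirement bites harder than people expect.  A rough sketch of the constraint itself (no actual worker processes spawned, since spawned workers receive their arguments via pickle, the failure mode is identical):

```python
import pickle

def square(x):               # a module-level function pickles fine
    return x * x

payload = pickle.dumps(square)
print(pickle.loads(payload)(3))   # 9

try:
    pickle.dumps(lambda x: x * x)  # lambdas have no importable name
except Exception as e:             # pickle.PicklingError in CPython
    print("can't ship a lambda to a worker:", e)
```

So the moment your callback, bound method, or closure isn't pickleable, your tidy multiprocessing design starts sprouting module-level wrapper functions.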

When you start writing huge amounts of code, especially concurrent code, python's lack of immutable variables is really head-scratching.  Yes, I understand tuples, strings, ints, etc. are immutable, but there's overhead in subclassing one of those types to create your own immutable type.  Also, persistent data types only exist in 3rd party libraries like pyrsistent.  That means you can't really be sure if your dict is the mutable or immutable kind.  A lot of people don't understand the importance of immutable data, especially if they have never written concurrent code.  But I can't understand how python still doesn't have a simple way of creating a read-only constant.  Globals aren't necessarily evil if they are read-only.
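The closest the stdlib gets to a read-only constant is a workaround, not a language feature.  A sketch (names are hypothetical): MappingProxyType gives you an immutable *view*, but the hidden backing dict can still be mutated by whoever holds it, which rather proves my point.

```python
from types import MappingProxyType

_defaults = {"retries": 3, "timeout": 30}
DEFAULTS = MappingProxyType(_defaults)   # read-only view over _defaults

print(DEFAULTS["retries"])   # 3
try:
    DEFAULTS["retries"] = 5
except TypeError as e:
    print("read-only:", e)   # mappingproxy does not support item assignment
```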

So, why do functional programmers rave about immutable persistent data structures?  Imagine a scenario as simple as this:

sizes = [10, 20, 30]
configure_sizes(sizes)

What will sizes be after running configure_sizes()?  As a client, you don't know what that function does to the sizes variable.  Ok, I can hear a smart aleck say I should have used a tuple.  But what if I had used a dict?  If sizes = 100 (a simple non-compound type) and configure_sizes() changed it, you'd be upset, wouldn't you?  Why should compound types be treated differently?  Let's look at another evil of mutable imperative OOP.

class Foo(object):
    def __init__(self, x):
        self.x = x
    def multiplier(self, y):
        return self.x * y

def modifier(foo_obj, newval):
    foo_obj.x = newval

f = Foo(10)
result = f.multiplier(2)        # 20
modifier(f, 3)                  # silently mutates the hidden state f.x
final_result = f.multiplier(2)  # now 6 -- same call, different answer

In this case, the output of the Foo.multiplier() method depends on hidden state.  This is bad, because it means the end client can't ever be fully sure, given the input he passes in, what the result will be.  This is why in functional programming, the only thing that determines the output is the input.
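For comparison, here's a minimal functional rewrite of the example above (using a namedtuple for brevity): state is explicit and immutable, so the same call always yields the same answer.

```python
from collections import namedtuple

# Instances of a namedtuple are immutable.
FrozenFoo = namedtuple("FrozenFoo", ["x"])

def multiplier(foo, y):
    return foo.x * y          # output depends only on the inputs

f = FrozenFoo(10)
result = multiplier(f, 2)            # 20
f2 = f._replace(x=3)                 # "modification" returns a new object
final_result = multiplier(f, 2)      # still 20 -- f was never mutated
```

Nobody can pull the rug out from under you between the two calls; anyone who wants a different x has to make a new value and pass it explicitly.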

Python's crippled lambda also kind of sucks.  Why should it be limited to a single expression?
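A tiny illustration of the restriction (the function names here are made up): anything that needs a statement has to be hoisted into a full def.

```python
# A lambda body must be a single expression; statements are a SyntaxError.
double = lambda x: x * 2                 # fine: one expression
# double_and_log = lambda x: y = x * 2   # SyntaxError: no assignments allowed

def double_and_log(x):                   # the full def python forces instead
    y = x * 2
    print(y)
    return y
```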

Is python a bad language?  Definitely not.  There are some very cool features like generators, coroutines, list and dict comprehensions, and decorators.  But the combination of no true type hinting, no standard immutable data types, no easy way to do concurrency, and a by-default imperative style of programming leaves a lot to be desired.  However, writing python in a functional style is possible.

Toolz
toolz is a python package that expands upon the functional paradigm of using higher order functions.  I highly recommend reading through the site, as they make a good case for functional programming in python.  Toolz includes some useful functions that work with sequences.  For example, they have a function called accumulate, which is similar to the reduce of most other functional languages, except that it also yields the intermediate results.
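In case you don't have toolz installed, the stdlib can sketch the same idea: itertools.accumulate behaves like toolz's accumulate (toolz just takes the function first, as in accumulate(add, nums)).

```python
from functools import reduce
from itertools import accumulate
from operator import add

nums = [1, 2, 3, 4]
print(reduce(add, nums))             # 10 -- only the final value
print(list(accumulate(nums, add)))   # [1, 3, 6, 10] -- every intermediate step
```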


Pyrsistent
pyrsistent is a set of immutable data structures that can be used in python.  It gives you immutable vectors (lists), immutable maps (dictionaries), and even immutable classes.  Immutable data is of immense help during concurrent programming.  If you have multiple threads acting on the same immutable (read-only) data structure, you can never have inconsistent results.  Even when you aren't writing concurrent code, it is very useful because you don't have to keep track of state all the time.  Functional programming highly stresses always using lexically scoped, pure functions.  Pure meaning that given the same inputs to a function, you will always get the same outputs.  There is no hidden state (such as globals).
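To make the "update returns a new thing" idea concrete without requiring pyrsistent, here is a toy sketch of the pattern (pyrsistent implements it properly with structural sharing; this naive version just copies the whole dict):

```python
def assoc(d, key, value):
    """Return a new dict with key set to value; the original is untouched."""
    updated = dict(d)     # naive full copy -- pyrsistent shares structure instead
    updated[key] = value
    return updated

config = {"host": "localhost", "port": 8080}
config2 = assoc(config, "port", 9090)
print(config["port"], config2["port"])   # 8080 9090
```

Every "writer" gets its own value, and readers holding the old one are never surprised.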

hy
A lisp running on python?  Cool.  Now, it's no clojure, but it might be possible to get halfway there by using hy in combination with toolz and pyrsistent.  Although hy doesn't have literals for certain data types, it should be possible to add them, because hy has reader macros (which clojure lacks, btw).