Monday, September 7, 2015

How to evaluate forms given to a clojure macro without throwing an exception

OK, I probably shouldn't admit this, but it took me the better part of 2 days of straight coding to come up with a macro that I wanted.  In a nutshell, I wanted to be able to call a sequence of functions and collect the results, even if one of those functions would throw an exception.  For example, something like this:

(let [x 2
      y 0]
  (try+
    (* x 2)
    (* x y)
    (+ 9 y)
    (/ 1 y)))

Do you see why I needed a macro for try+?  What if I had tried to write it as a function?  Since clojure is by default an eager language, it evaluates the arguments first and then supplies the results to the calling function.  But any one of those argument forms can throw an exception, which makes the whole call fail, and none of the forms to the right of the offending one get evaluated at all.  One way around that is to quote each form and evaluate it inside the function:

(defn awkward-try [& fncalls]
  (for [fnc fncalls]
    (try
      (eval fnc)
      (catch Exception ex ex))))

(awkward-try
  '(* 2 2)
  '(/ 1 0))

However, making the user quote every form is clunky and unnecessary, although the solution was quite a bit more difficult.  Before I show you the working solution, I'll show a failed attempt, because sometimes it's just as useful to see something you thought would work but didn't.

So an early attempt I made was similar to the awkward-try function above and it looked like this:

(defmacro firsttry+
  "Takes a body of function calls and calls them lazily.  If a function throws
   an exception, don't propagate it.  Collect the exception in the
   results"
  [& body]
  `(for [arg# '~body]
     (try
       (eval arg#)
       (catch Exception ex#
         (println "caught exception")
         ex#))))

And if you try this, it seems to work:

(firsttry+
  (* 2 2)
  (/ 1 0))
caught exception
=> (4 #error {
 :cause "Divide by zero"
 :via
 [{:type java.lang.ArithmeticException
   :message "Divide by zero"
   :at [clojure.lang.Numbers divide "Numbers.java" 158]}]
 :trace
 ...)


The problem is when you try to use let bound symbols:

(let [x 2
      y 0]
  (firsttry+
    (* 2 x)
    (/ 1 y)))
caught exception
caught exception
=> (#error {
 :cause "Unable to resolve symbol: x in this context"
 :via
 [{:type clojure.lang.Compiler$CompilerException
   :message "java.lang.RuntimeException: Unable to resolve symbol: x in this context, compiling:(/home/stoner/.IdeaIC14/system/tmp/form-init96648934167159599.clj:4:5)"
   :at [clojure.lang.Compiler analyze "Compiler.java" 6543]}
  {:type java.lang.RuntimeException
   :message "Unable to resolve symbol: x in this context"
   :at [clojure.lang.Util runtimeException "Util.java" 221]}]
 :trace
...
 } #error {
 :cause "Unable to resolve symbol: y in this context"
 :via
 [{:type clojure.lang.Compiler$CompilerException
   :message "java.lang.RuntimeException: Unable to resolve symbol: y in this context, compiling:(/home/stoner/.IdeaIC14/system/tmp/form-init96648934167159599.clj:5:5)"
   :at [clojure.lang.Compiler analyze "Compiler.java" 6543]}
  {:type java.lang.RuntimeException
   :message "Unable to resolve symbol: y in this context"
   :at [clojure.lang.Util runtimeException "Util.java" 221]}]
 :trace
 ...)

Hmmm, so what's all this about not being able to resolve the symbols x and y when they are let bound?  The key is understanding how the arguments passed to a macro are exposed at macroexpansion time.  Notice the somewhat strange '~body in the for expression.  First off, it wasn't even clear to me what would be in the body symbol once it was evaluated.  I couldn't just do ~body, because the whole point of the macro was to avoid evaluating the body!  But I did need to pull the elements out.

I also couldn't use ~@body, because that would produce the wrong form in a for expression.  Like let, loop, doseq, and binding, for takes one or more binding pairs.  If I had used unquote-splice, the expansion would have looked something like this:

(for [arg# (* 2 2) (/ 1 0)]
  ...)

Which is not a valid binding form.  So I thought, ok, let me try '~body, which I expected would return what body represented (including any substitutions) without actually evaluating it, because it would be quoted.  I thought doing that would give me this:

(for [arg#  '((* 2 2) (/ 1 0))]
  ... )

But that's not what happens, and what you really get is:

(for [arg# '((* 2 x) (/ 1 y))]
  ... )

And that is why the clojure compiler complains that it doesn't know what the symbols x and y are.  So ok, that explains the unknown symbol problem, but why did it happen?  Why didn't it substitute the value of 2 for x and 0 for y?  The reason is that a macro receives its arguments as unevaluated source forms, so body literally contains the symbols x and y; and eval runs later, in an empty lexical environment, where let-bound locals are simply not visible.  That still left the question: how can I handle each s-expression in the body one by one if I can't use ~body, ~@body or '~body?
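You can see this directly by macroexpanding the call.  Here's a small sketch (the failed macro repeated, minus the println, so the snippet stands alone):

```clojure
;; The failed attempt again, self-contained.
(defmacro firsttry+ [& body]
  `(for [arg# '~body]
     (try
       (eval arg#)
       (catch Exception ex# ex#))))

;; The expansion still contains the literal symbols x and y -- nothing
;; ever substituted their let-bound values.
(println (macroexpand-1 '(firsttry+ (* 2 x) (/ 1 y))))
```

The printed expansion contains (quote ((* 2 x) (/ 1 y))) verbatim, which is exactly what eval later chokes on.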

After a lot of trial and error, I finally decided to try a different tack, and I looked at the or macro in clojure.  I saw that it just does a simple self-recursion based on different arities.  I realized I could do this too, but I wanted to save the results of calling each function.  It took me a little while to realize that, once again, lazy-seq is your friend.

Here's the final code I finally came up with that works:

163 (defmacro wrap
164   "Takes a function call and surrounds it with a try catch.  Logs the function
165    name and the args supplied to the function"
166   [head]
167   `(let [fnname# (first '~head)
168          args# (rest (list ~@head))]
169      (timbre/info "evaluating function:" fnname# ", args:" args#)
170      (try
171        ~head
172        (catch Exception ex#
173          [{:name fnname# :args args# :ex ex#}]))))
174 
175 
176 (defmacro try+
177   ([head]
178    [`(wrap ~head)])
179   ([head & tail]
180    `(lazy-seq
181       (cons
182        (wrap ~head)
183        (try+ ~@tail)))))
184 

The wrap macro is really just a helper macro to log what is getting called.  It takes one of the forms from body.  So from the let example above, on the first execution head will be (* x 2).  Note that head is still the unevaluated source form, x and all: ~head on lines 178 and 182 splices the form back into the generated code without evaluating it, so it gets compiled in place inside the let's lexical scope, which is why x resolves correctly.  Recall that with macros, an expression is not eagerly evaluated automatically.  So what happens here is:

(wrap (* x 2))

But since wrap is itself a macro, (* x 2) does not get evaluated at expansion time.  That's why the later form (/ 1 y) does not throw an exception as soon as wrap is expanded.  Otherwise, try+ just uses destructuring to split the forms submitted to it into a head and a tail.

Just another note: it's a little tricky to figure out when and where to start the syntax quoting.  For example, in one of my earlier attempts, I did not put the ` syntax quote on line 180 but on 181 instead, and what I noticed was that the macro would not evaluate lazily; try+ would evaluate all the forms given to it in one shot.  I believe the reason is that because I did not syntax quote the entire lazy-seq form, the macro expander evaluated the whole form at macroexpansion time, so by the time the run time phase came around, everything had already been calculated.
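To see the laziness for yourself, here's a trimmed-down variant of the two macros (the timbre logging dropped and the failure record simplified to a plain map, so it runs with no dependencies):

```clojure
;; Simplified wrap: on failure, keep the form and the exception in a map.
(defmacro wrap [head]
  `(try
     ~head
     (catch Exception ex# {:form '~head :ex ex#})))

(defmacro try+
  ([head] `(list (wrap ~head)))
  ([head & tail]
   `(lazy-seq (cons (wrap ~head) (try+ ~@tail)))))

(def results
  (let [x 2
        y 0]
    (try+ (* x 2) (* x y) (+ 9 y) (/ 1 y))))

;; Asking for the first element does not evaluate the later forms.
(println (first results))           ;; 4
;; Forcing the fourth element finally runs (/ 1 y) and captures the exception.
(println (:form (nth results 3)))   ;; (/ 1 y)
```

Note that the let-bound x and y resolve fine here, because ~head splices each form back into the code where it was written.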

Some other gotchas I noticed: lazy-seq either goes on infinitely, or, if the recursion is finite, the final value returned by the recursive call must be some seq type.  If you notice, line 178 returns a vector.  I needed that because, at the end of the recursion, it has to return something seqable.  That's why you normally see a pattern like:

...
(lazy-seq
  (if some-pred?
    (cons x (foo y))
    []))

Since cons takes (element, collection) as its args, the 2nd arg to cons should be some kind of collection.  If you see an error like:

IllegalArgumentException Don't know how to create ISeq from: java.lang.Long

Then you are probably passing a scalar (a Long, for example) where cons expects a collection.
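That error is easy to reproduce on purpose:

```clojure
;; cons requires its second argument to be seqable; a bare Long is not.
(def msg
  (try
    (cons 1 2)
    (catch IllegalArgumentException e (.getMessage e))))

(println msg)
;; Don't know how to create ISeq from: java.lang.Long
```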

Sunday, August 23, 2015

Evangelizing clojure

Since I've started my new position where I get to work in clojure, I've been itching to get others at my workplace to see where clojure would be useful.  Currently, my workplace is a python, java, C and shell shop (with a smattering of ruby here and there).  I'm one of the few engineers there who gets to work with clojure.

And that's sad.  Unfortunately, a common fear for most companies is the difficulty of finding engineers who are proficient in a certain technology stack.  I quite frankly find that a rather lame excuse.  Any engineer worth their salt should be able to learn a new language.  In fact, I'd rather hire engineers with minds curious enough to learn a not-hot language with a very different paradigm.  If management's concern is that they want an engineer to "hit the ground running", I think they are sacrificing long term benefits for short term gains.  I used to work at a company that decided to use perl for all its scripting efforts, and they wound up running many perl "camps" where engineers spent an entire week in intensive perl training.  If there are companies that make a living teaching new languages, why not take advantage of that?  So when I hear managers claim that the lack of engineers who can program in clojure is a detriment, I find that a weak excuse.

There's also the odd paradox that some companies don't seem to be so concerned about other new "hot" languages like Go or Swift.  Perhaps it's because those two languages are backed by the giants Google and Apple respectively, and therefore they must have gotten something right.  Personally, having had a cursory glance at Go and Swift, I've found nothing particularly outstanding about them compared to other new languages without the hotness (clojure, elixir, rust, elm or julia for example).

So what can we clojurians do to help others understand where clojure could be a viable alternative?  I think we need to do several things:

  • Point out how language X has certain weaknesses that could be resolved with clojure
  • Point out how clojure can live synergistically with a Java ecosystem
  • Help train and educate others that lisps aren't as scary/gross as they think
  • Get people familiar with the tools and ecosystem of clojure

For example, I hope to release a set of tutorials to help compare and contrast how clojure could solve problems more elegantly than python.  It would cover things like how to write highly concurrent programs in comparison with python, and how immutability can help make more robust programs.  I'd also show how python decorators, which are sometimes compared to lisp macros, fail to deliver the same power as a lisp-style macro.

Another topic that I don't see discussed too much, is how to integrate clojure with legacy java projects.  I'd like to create some articles talking about how to use TestNG with clojure, how gen-class really works, and how to plug clojure into a maven or gradle based program.  I'd also like to give more examples on how to use java interop constructs, including defprotocols, proxy and using them to bridge java and clojure.

Another hindrance is, IMHO, purely psychological.  I find people's first reactions to lisp syntax somewhat amusing.  It's such an immediate and almost visceral reaction that I have to wonder why lisp syntax is so (initially) despised by so many.  Is it perhaps because there is a perceived relationship between lisp syntax and XML (and people hate XML)?  I remember my first reaction to lisp in college was just "whoaaaa".  But I also remember my first reaction to python's significant whitespace: "who the hell thought having white space matter was a good thing!!".  After about 3 weeks, I didn't even notice it anymore.  And the same thing happened to me with clojure.  But how do you get people to even try clojure for 3-4 weeks?

Finally, another big barrier for people coming to clojure is the tools and ecosystem.  For starters, a large chunk of the tutorials and videos you will see online use emacs + CIDER as the IDE.  I basically started learning emacs about 3 years ago in order to do clojure.  Now, I'm an older guy, so I'm not afraid of basic text editors, unlike some young whipper-snappers who seem to be at a loss without a full-fledged IDE.  For Java programming I do enjoy something like IntelliJ or Eclipse, but emacs is a pretty cool IDE for clojure.  While there is a clojure plugin for vim, the majority of the community works with emacs.  There's another interesting IDE called Cursive which is supposed to be able to debug both clojure and java code, which would come in handy.

Beyond the IDE, there are the build tools, so coming to grips with leiningen and perhaps boot would be useful.  Also, while a background in Java isn't absolutely necessary for learning clojure, it will definitely help (the same goes for javascript if you want to learn clojurescript).  Some familiarity with the underlying runtime (the JVM or a javascript engine) will go a long way toward making you a better clojurian.

Figuring out Clojure vars vs. symbols

Although I realized there was some kind of difference between a clojure var and a symbol, I hadn't really considered what the difference was.  To make matters worse, I just thought of a clojure symbol the way symbols are usually thought of in other languages.  In other words, I considered a symbol to be an object or reference to something that could be used by the program.  However, symbols have their own special meaning in clojure.

So, what exactly is a var?  When I first was learning clojure, I kept reading on websites and in the books I had that clojure doesn't have variables.  Instead, they have vars and bindings.  Well, ok, but what in the world does that mean, and how do vars differ from variables?  Furthermore,  I pretty much had assumed vars and symbols were (almost) the same thing.  For example, if I have:

(def foo 10)

Ok, so foo kind of looks like what other programming languages would call a variable.  But if foo isn't a variable, what is it?  It's a var right?

Hold on partner, we have to consider how we are looking at the thing called foo.  In the line above, yes, foo is a var.  But if I just type foo at the repl, what is it?  Or what is 'foo, or #'foo?

Let's step back for a moment and consider what Rich has wanted clojure to do.  Clojure is a language that dearly wants to separate identity, state and values.  Identity is what names a thing, state is a value at a moment in time, and values are...well, values :)  In Python, if I do this:

bar = [10]

Then bar is a variable whose value is the list [10].  However, it has not separated the notion of identity, state and value.  Identity, state and value are all commingled in the variable bar.

So back in clojure land, how we look at foo depends on how it is being evaluated.  Put simply, foo (by itself) is a symbol which can be used to look up a var.  In this example, foo is our identity.  So you might now be wondering what the 10 is, as that obviously seems to be a value.  Values have to be stored somewhere, and the var is what actually holds the value.
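Here's a quick sketch of that chain, using resolve to go from symbol to var by hand (resolve looks a symbol up in the current namespace):

```clojure
(def foo 10)

(let [sym 'foo               ;; just a name, not yet tied to anything
      v   (resolve sym)]     ;; the var interned under that name
  (println (type sym))       ;; clojure.lang.Symbol
  (println (type v))         ;; clojure.lang.Var
  (println @v))              ;; 10, the value the var holds
```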

Normally we think of foo as neither a symbol nor a var, but a value.  In other words, I could just mentally replace foo with the value 10 wherever I see it.  But wait kimosabe, you are forgetting about clojure's macros...but I am getting ahead of myself.  If I just type foo in the repl, I get its value back, which is 10.

foo
10

(type foo)
java.lang.Long

Okay, so it seems like for all intents and purposes the symbol foo _is_ 10.  But is it?  What does the documentation say about def anyways?

boot.user=> (doc def)
-------------------------
def
  (def symbol doc-string? init?)
Special Form
  Creates and interns a global var with the name
  of symbol in the current namespace (*ns*) or locates such a var if
  it already exists.  If init is supplied, it is evaluated, and the
  root binding of the var is set to the resulting value.  If init is
  not supplied, the root binding of the var is unaffected.

  Please see http://clojure.org/special_forms#def
nil

Hmmm, so (def foo 10) interns a var in the current namespace with the name of the symbol.   Have you wondered if def returns anything?

(println (def x 100))
;; prints #'boot.user/x

Ah, so def returns the var itself.  The definition says that a var with the name of the symbol is created by a def.  Ok, is a symbol just a lookup name?  Where does it fit into the picture?  Consider this:

(symbol "foo")
(type (symbol "foo"))

What does that return?  It returns....gasp....a symbol :)  But what good is that?  It doesn't actually return 10.  Why not?  To get the value that foo represents (the value its var contains), we could do something like this:

(eval (symbol "foo"))

But let's try another thought experiment to help illuminate the difference between vars, symbols and values.  Consider what this returns before trying this in the repl:

(var (symbol "foo"))

If you did try that in the repl, you'll notice that it threw an exception...how rude!!

clojure.lang.Compiler$CompilerException: java.lang.ClassCastException: clojure.lang.PersistentList cannot be cast to clojure.lang.Symbol, compiling:(/tmp/boot.user2720475669809682962.clj:1:1)
           java.lang.ClassCastException: clojure.lang.PersistentList cannot be cast to clojure.lang.Symbol

Hmmm, so it looks like var is actually evaluating (symbol "foo"), and not the result of (symbol "foo").  Ok, let's try this:

(defmacro huh [var-string]
  `(let [x# (-> ~(symbol var-string) var)]
     x#))

So why did I have to make a macro?  var is a special form, so it doesn't eagerly evaluate the arg that gets passed into it.  By the way, try doing (-> (symbol "foo") var) and see what happens (and you'll see why I needed a macro).  If you look at the documentation for var, it says that it returns the var (not the value) of a symbol.  You can see that by doing this:

(type (huh "foo"))
clojure.lang.Var

So remember what we've done here.  By having (symbol "foo") we are creating a symbol.  This object does not evaluate to 10.  In fact, neither does getting the var which is pointed to by the symbol foo.  In order to actually get the value of the var object, we need to dereference it.  Let's make a small change to our macro:

(defmacro huh [var-string]
  `(let [x# (-> ~(symbol var-string) var)]
     @x#)) ;; deref the var to get its value

(huh "foo")
10

So why bother making a distinction between symbols and vars?  I mean, wouldn't it be simpler to just have the symbol directly reference the value?  Why have this 2-level look up system of symbol -> var -> value?  Recall what I said earlier about maintaining a distinction between identity, state, and value.  Another answer is to think about macros and macro expansion time vs. compile time.  Here's another exploration:

(doseq [e '(def foo 10)]
  (println e "is a" (type e)))

def is a clojure.lang.Symbol
foo is a clojure.lang.Symbol
10 is a java.lang.Long
nil


Ahhhh, so when the reader looks at (def foo 10), foo is a symbol.  By having a var looked up by a symbol, and then the value retrieved from the var, we can delay actually getting the value...by retrieving the var instead.  Also, consider how many times clojure wants the symbol of a thing rather than its value.  Furthermore, some clojure functions want the var itself rather than the value.  For example:

(defn ^{:version "1.0"} doubler 
  [x]
  (* x 2))

;(meta doubler)    ;; Wrong, the metadata doesn't belong to the doubler function, but to the var itself
(meta #'doubler)   ;; equivalent to (meta (var doubler))
{:version "1.0", :arglists ([x]), :line 1, :column 1, :file "/tmp/boot.user2720475669809682962.clj", :name doubler, :ns #object[clojure.lang.Namespace 0x2d471d43 "boot.user"]}


Another example is when we require or import from within the repl.  When you require or import at the repl (as opposed to using the :require or :import directives in the ns macro), you pass quoted symbols: namespace names for require, class names for import.

Finally, remember that vars can have thread-local bindings.  That's why symbols shouldn't just point to values, as you may want to give another thread some other binding value.
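Here's a minimal sketch of that last point (the var name *greeting* is just made up for the example):

```clojure
;; A dynamic var can be rebound per binding scope (and per thread); the
;; symbol stays the same, but the var serves up a different value.
(def ^:dynamic *greeting* "hello")

(defn greet [] *greeting*)

(println (greet))                        ;; hello
(println (binding [*greeting* "bonjour"]
           (greet)))                     ;; bonjour
(println (greet))                        ;; hello again
```

If the symbol pointed straight at a value, there would be no place to hang that per-thread indirection.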

I hope this makes the differences between vars and symbols a little more clear. 




Tuesday, July 7, 2015

Clojure....here I come!!

So, on July 15th, I'll be starting my new position at Red Hat, working as a Quality Engineer on the subscription-manager team.  One may be wondering why I would leave a hot product like Openstack to go into the relatively obscure quality team.  One word:  clojure.

I'll get to do clojure and be paid for it (and not have to be skunkworks)!  That alone is sufficient reason for me to have wanted to take on this role.  I'm pretty stoked about it, but my clojure has gotten a little rusty in the last few months.  If it wasn't for some hy code I was writing, I'd probably have forgotten a lot.

For example, for fun, I'm working on my first ever web application.  I wanted to do something fun because I've never made a web application before (I know...almost 9 years into my career, and I've never made a web app before).  So I am finally turning my role playing game ideas into a web app.  I saw this site that is a virtual roleplaying table and that looked cool.  But I'm far from that, and decided to just work on implementing characters and rules in clojure first.  Since part of my calculation requires working with exponents (yes, this game will require a computer), I thought it'd be neat to make a lazy exponent calculator.  I wanted something like this:

(take 4 (lazy-expt 2)) => (1 2 4 8)

And I was very confused about how to go about doing it.  Of course, lazy-seq was something I needed, but I couldn't figure out how to accumulate my results.  I really didn't want to force the user to pass in an accumulator.  That's when multiple-arity functions made me see the light.

(defn lazy-expt
  "Lazy sequence for exponents"
  ([base]
   (lazy-seq
    (cons 1 (lazy-expt base 1))))
  ([base acc]
   (let [total (* base acc)]
     (lazy-seq
      (cons total (lazy-expt base total))))))

The multiple arity allowed me to avoid requiring the user to pass in an accumulated result.  I very rarely use multiple arity functions, and instead tend to use functions with default params or extra params (i.e. using & in the argument vector).
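As an aside, the same sequence can also be built with clojure's iterate, which hides the accumulator entirely (named powers here to avoid clashing with lazy-expt above):

```clojure
;; iterate repeatedly applies the function to the previous result,
;; yielding 1, base, base^2, ... lazily.
(defn powers [base]
  (iterate #(* base %) 1))

(println (take 4 (powers 2)))   ;; (1 2 4 8)
```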

Just for fun, I'll start working on finding square (and other) roots to a number although the methods to do that look a lot more difficult.

I am currently learning luminus, clojurescript, webgl and HTML5.  I discovered that browsers have an experimental ability to get access to the webcam and microphone.  One thing I hate is forcing users to use flash, applet or plugin for that.  And instead of a 2d table mat, I want a 3d environment.  I'm also boning up on the OrientDB graph database, because that's what I'm going to use to store data.

I'm stoked.  It's an ambitious project, but as my grandfather used to say "shoot for the stars, hit the moon".  Or, "if it comes to you easily, it isn't worth it".  

Friday, February 27, 2015

How to modify a clojure map from a given sequence

Okay, this turned out to be rather challenging for me.  As I've been telling people, the syntax of clojure is relatively simple; the hard part is learning how to deal with immutability.  For a little OpenStack project I was working on, I wanted to convert the service catalog that got returned as a JSON string into something a little more search-worthy.  But the problem I was having was that I wanted to modify a map based on items in a sequence.  Here's a contrived example.

Suppose I have this list:

(def people [{:first "John" :last "Doe" :age 42 :company "LinkedIn"}
             {:first "Harry" :last "Smith" :age 31 :company "NASA"}])

and this already existing map:

(def companies {1 "Google", 2 "RedHat"})

So, how do I create a new map of companies by iterating through people, and adding the company the person belongs to?


Unlike in a mutable language, where I could just directly change the map on each iteration, that won't work in clojure.  Let me show you an equivalent example in python and why something similar in clojure won't work.

people = [{"first": "John", "last": "Doe", "age": 42, "company": "LinkedIn"},
          {"first": "Harry", "last": "Smith", "age": 31, "company": "NASA"}]
companies = {1: "Google", 2: "RedHat"}

for id, p in enumerate(people, 3):
    c_name = p.get("company")
    companies[id] = c_name     # mutates the dictionary


Do you see why a similar approach in clojure won't work?  My first temptation was to do something like this:

(for [{:keys [company]} people
      i (range 3 5)]
  (assoc companies i company))

Yields this:
 ({1 "Google", 2 "RedHat", 3 "LinkedIn"} {1 "Google", 2 "RedHat", 4 "LinkedIn"} {1 "Google", 2 "RedHat", 3 "NASA"} {1 "Google", 2 "RedHat", 4 "NASA"})


Okay, so that's not what I want.  There are 2 problems.  The first is that I don't want the nested loop (that's why I get 4 entries).  The second, and perhaps more serious, is that each call to assoc associates the key-value pair onto the original map, and the change is "forgotten" when the for moves on to the next item in the sequence.  Remember, we're not mutating the map as we iterate.  So I thought, ok, let me use recur.

(defn addme [coll m id keyname]
  (if (empty? coll)
    m
    (let [p (first coll)
          val (p keyname)]
      (recur (rest coll) (assoc m id val) (inc id) keyname))))

(addme people companies 3 :company)

While this works, I hope you see that it's not the prettiest thing to look at.  It's also rather verbose.  So I scratched my head a little bit and realized I could use reduce.

Frankly, reduce had always been a little obscure to me.  I had seen it used for things like +, but + can already take multiple args.  So I was never quite clear where reduce would come in handy.  Then it dawned on me....reduce is really recursion with a function that takes two args and it "accumulates" results.

For the moment, forget about the mapping of index to company, and let's look at a simpler problem: mapping the numbers 1-26 to the letters a-z.  There's always clojure's zipmap function for this.  If all you need is to map one collection (as keys) to another collection (as values), it's pretty simple:

(let [alpha-int (range 1 27)
      alpha-char (for [i alpha-int] (char (+ i 96)))]
  (zipmap alpha-int alpha-char))

Another way is to use reduce.  Generally, reduce transforms a sequence into a scalar value.  But if you look at reduce, all it does is take a function of 2 arguments and return some value.  What if the thing that is returned is a sequence?  Remember, reduce initially pulls the first 2 items from your sequence, operates on those 2 values, and returns something.  On the next iteration, that return value is used as the first argument, and _one_ more item is pulled from the collection.  This continues until the sequence is empty.

Clojure's reduce also has a handy variant: instead of pulling the first 2 items from the sequence on the first iteration (really it's recursion), you supply an initial value as an optional argument.  In that case, on the first iteration only one item is pulled from the sequence:

(defn map-seq [m val]
  (let [offset (+ val 96)  ; \a = 97, so a val starting at 1 maps to \a
        c (char offset)]   ; convert 97 to \a
    (assoc m val c)))      ; transform m by associating val with c

(reduce map-seq {} (range 1 27))  

Think about what's happening on the first 3 calls

  1. (map-seq {} 1) => {1 \a}                      ; (assoc {} 1 \a) => {1 \a}
  2. (map-seq {1 \a} 2) => {1 \a 2 \b}             ; (assoc {1 \a} 2 \b) => {1 \a 2 \b}
  3. (map-seq {1 \a 2 \b} 3) => {1 \a 2 \b 3 \c}   ; (assoc {1 \a 2 \b} 3 \c) => {1 \a 2 \b 3 \c}


So, getting back to our other problem, how would we use reduce?  Like the above demonstrated, we need to create a function that we pass to reduce, that takes 2 arguments.  The first argument is the map, and in this case, the second value is a 2 element vector.

; A simple function that takes a map, and a collection that is a key-value pair
(defn add-to-map [m coll]
  (let [[k v] coll]
    (assoc m k v)))

; Takes a sequence of maps, looks up keyname's value in each map, and
; returns a sequence of [index value] pairs
(defn make-indexed [keyname coll & start]
  (let [[s] (if start
              start
              [0])
        vals (for [m coll] (m keyname))]
    (map #(vector (+ s %) %2) (range) vals)))
  
(reduce add-to-map companies (make-indexed :company people 3))

Okay, some of you may be thinking that's more verbose than the recursive function addme.  However, there's an advantage to breaking this up into subfunctions and using reduce: those 2 subfunctions, add-to-map and make-indexed, can be used in other scenarios.  In fact, add-to-map can be used like zipmap:

(let [pairs (map #(vector %1 %2) (range 1 27) (for [c (range 1 27)] (char (+ 96 c))))]
  (reduce add-to-map {} pairs))


Wednesday, December 3, 2014

A critique of python. Functional python to the rescue?

Perhaps my post about the OpenStack clojure tool may have given away some hints, but I am trying to distance myself from python.  I believe that in the next 5-10 years, python will have much of its thunder taken by the likes of the new kids on the block:  go, julia, and swift, and possibly even clojure or elixir.

I'm a big believer in "the right tool for the right job".  The trouble is that people think python is great for anything.  I can already hear people sharpening their swords, but let me explain.  Python can be used for just about anything.  If a site like YouTube is mostly written in python, that goes to show its power.  Also, look at OpenStack itself; a project with a million lines of code is nothing to sneeze at.  However, python has its problems, especially at scale.  I believe a big reason that big projects are written in python is that it's perceived as an "easy" language.  Because it is "easy", there is a wealth of programmers to hire, and it's very fast to churn out code.  If code is easy to crank out, it seems logical to think that scaling should be easier as well.  Another big problem with python is that even though it is "multi-paradigm", it is basically an OOP language with mutable imperative programming baked in.

The trouble is that the strength of python is also its weakness.  Python is good at rapidly banging out some idea to see if it works.  Due to its dynamically typed nature, you don't have to write tons of boilerplate code like you would with Java or even C++.  If you want to extend the abilities of a class, no problem, monkey-patching to the rescue.  And very likely, some other programmer(s) have written a 3rd party library for you to consume.  However, do you see the problems inherent with these powers?

Because python is dynamically typed, when you first read python code it is not at all obvious what kinds of arguments you are supposed to pass into a function.  Docstrings, I hear you say?  Even if docstrings exist, all too often they are out of date or even plain old wrong.  How many times have you had to crack open a debugger to figure out why your function wasn't working as expected, only to find that the argument you passed in was the wrong type?  I would go so far as to say that the time saved by not having to statically type your variables is lost during debugging.  At least Python3 introduced argument annotations (which, btw, are not necessarily type hints), perhaps because they realized duck typing isn't always enough.  Duck typing, while convenient, offers very weak guarantees that your object actually supports the functions you call on it.  And monkey-patching is not the best for production quality distributed software.  How can you be sure that some monkey-patched field or function won't clash with what another client is doing?  Finally, while it may seem like having a ton of 3rd party libraries is great, python never solved what OSGi did for Java (or what the upcoming project jigsaw will do for Java).

Let's say you have module foo, and it depends on importing module baz version 1.2.  Then you have module bar, and it too has a dependency on module baz, but at version 2.0+.  Sorry friends, but you are SOL in python.  This is because python has a flat PYTHONPATH much like vanilla Java does with its CLASSPATH.  OSGi solved this by essentially writing custom class loaders and adding metadata to OSGi modules (and I believe some JVM bytecode hackery too).
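The mechanism behind that flat namespace is easy to demonstrate (this is just an illustration with a stdlib module, not a real version conflict): Python keeps exactly one module object per name in sys.modules, so two importers can never see two different versions of baz.

    import sys
    import json as json_for_foo   # pretend module foo imports baz ...
    import json as json_for_bar   # ... and module bar imports baz too

    # Both importers get the very same module object: there is one slot
    # per module name in sys.modules, so only one version can ever be loaded
    print(json_for_foo is json_for_bar)         # True
    print(sys.modules['json'] is json_for_foo)  # True
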

The weaknesses above don't even take into consideration other problems of scale.  Take for example speed.  Unless you are running pypy, python's speed leaves something to be desired.  And not every problem is speed-insensitive.  Some may say this is a moot point and "my program is fast enough", as Guido himself seems to think.  Even Guido says the solution is to write the slow parts of your python code in C(++).  Really?  Okay, for the moment, maybe the python people are safe, because your competitors' programs, written in ruby, lua, or python as well, will probably be about as fast.  And, you think, the feature set you churn out will be far greater than those statically typed guys programming in Java, Scala, C++, Haskell, etc.  There's no way they could churn out as much functionality in the same time frame as python, right?

Ok, let's buy into that last argument (even though I don't think it is necessarily true).  What if you start comparing the old-world dynamic languages (ie python, ruby, perl, etc) with the new generation of languages?  Say, for example, Julia, go, elixir, or clojure.  Julia is blazing fast, and clojure is no slouch either.  Now suppose your competitor has written his product in one of these languages, while yours is still in python.  That speed equivalence across the old-school languages is gone, and your product will suffer.

And speed isn't the only scaling problem python has.  Python's model for concurrency requires, in essence, a distributed model of computing even on a single machine.  Yes, I can hear the chorus of shouting now: "Stop blaming the GIL!!!"  And if you think your answer is multiprocessing, that is not always the easiest of solutions.  Firstly, multiprocessing on Windows is a pain due to the requirement that all arguments to the new process be pickleable.  Secondly, multiprocessing doesn't free you from synchronization concerns, even if you use Queues for serialization (to be fair, most languages don't make concurrency easy, though some newer languages' claim to fame is that they make it quite a bit easier, like clojure, erlang, or Scala via the Akka library).  And since the end of Moore's law is around the corner (I remember reading once that near the 12nm limit, electrons can tunnel out, and we're almost there), speed increases will have to come either from some form of concurrency/parallelism, or from a new computer architecture, perhaps fiber optics or quantum computers.
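The Windows pickling restriction is easy to demonstrate without even spawning a process (a toy sketch): multiprocessing pickles every argument it hands to a worker, and things like lambdas don't survive pickle.

    import pickle

    # A lambda we might naively want to hand to a worker process
    callback = lambda x: x * 2

    try:
        # multiprocessing does this under the hood for every argument
        pickle.dumps(callback)
        picklable = True
    except Exception as ex:
        picklable = False
        print("not picklable:", type(ex).__name__)

    print(picklable)  # False

The same failure bites nested functions, bound sockets, open file handles, and so on, which is why "just use multiprocessing" is not a drop-in answer.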

When you start writing huge amounts of code, especially concurrent code, python's lack of immutable variables is really head-scratching.  Yes, I understand tuples, strings, ints, etc are immutable, but there's overhead in subclassing one of those types to create your own immutable type.  Also, persistent data types only exist in 3rd party libraries like pyrsistent.  That means you can't really be sure whether a given dict is the mutable or immutable kind.  A lot of people don't understand the importance of having immutable data, especially if they have never written concurrent code.  But I can't understand how python still doesn't have a simple way of creating a read-only constant.  Globals aren't necessarily evil if they are read-only.

So, why do functional programmers rave about immutable persistent data structures?  Imagine a scenario as simple as this:

sizes = [10, 20, 30]
configure_sizes(sizes)

What will sizes be after running configure_sizes()?  As a client, you don't know what that function does to the sizes variable.  Ok, I can hear a smart aleck say I should have used a tuple.  But what if I had used a dict?  If sizes = 100 (a simple non-compound type) and configure_sizes() changed it, you'd be upset, wouldn't you?  Why should compound types be treated differently?  Let's look at another evil of mutable imperative OOP.
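To make the danger concrete, here is one hypothetical implementation of configure_sizes() (my own invention, purely for illustration) that mutates its argument in place; the caller's list silently changes out from under it.

    def configure_sizes(szs):
        # Hypothetical implementation: "helpfully" doubles every entry in place
        for i, s in enumerate(szs):
            szs[i] = s * 2

    sizes = [10, 20, 30]
    configure_sizes(sizes)
    print(sizes)  # [20, 40, 60] -- the caller's list was silently mutated

    # A tuple at least makes the mutation fail loudly
    try:
        configure_sizes((10, 20, 30))
        tuple_mutated = True
    except TypeError:
        tuple_mutated = False
    print(tuple_mutated)  # False
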

class Foo(object):
    def __init__(self, x):
        self.x = x
    def multiplier(self, y):
        return self.x * y

def modifier(foo_obj, newval):
    foo_obj.x = newval

f = Foo(10)
result = f.multiplier(2)
modifier(f, 3)
final_result = f.multiplier(2)

In this case, the output of the Foo.multiplier() method depends on hidden state (self.x).  This is bad, because it means that the end client can't ever be fully sure, given the input he passes in, what the result will be.  This is why in functional programming, the only thing that determines the output is the input.
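A functional rewrite of the same example (a minimal sketch) makes the point: pass x explicitly instead of hiding it in mutable object state, and the output is fully determined by the arguments.

    def multiplier(x, y):
        # Pure: the result depends only on the arguments, never on hidden state
        return x * y

    result = multiplier(10, 2)       # always 20, no matter what ran before
    final_result = multiplier(3, 2)  # "changing state" is just passing a new value
    print(result, final_result)      # 20 6

There is no modifier() to sneak in behind the caller's back; if you want a different x, you say so at the call site.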

Python's crippled lambda also kind of sucks.  Why should its body be limited to a single expression?
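For example (a trivial sketch), anything beyond a single expression forces you back to def:

    double = lambda x: x * 2   # fine: a single expression

    # A lambda body cannot contain statements, so this is a SyntaxError:
    #   clamp = lambda x: if x > 10: 10

    def clamp(x):              # anything with statements needs def
        if x > 10:
            return 10
        return x

    print(double(4), clamp(42), clamp(7))  # 8 10 7
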

Is python a bad language?  Definitely not.  There are some very cool features like generators, coroutines, list and dict comprehensions, and decorators.  But the combination of no true type hinting, no standard immutable data types, no easy way to do concurrency, and a by-default imperative style of programming leaves a lot to be desired.  However, writing python in a functional style is possible.

Toolz
toolz is a python package that expands upon the functional paradigm of using higher order functions.  I highly recommend reading through the site, as they make a good case for functional programming in python.  Toolz includes some useful functions that work with sequences.  For example, they have a function called accumulate, which is very similar to the reduce found in most other functional languages, except that it also yields the intermediate results.
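If toolz isn't installed, the stdlib's itertools.accumulate behaves much the same way, so here is a sketch using it as a stand-in: like reduce, but yielding every intermediate value along the way.

    from functools import reduce
    from itertools import accumulate
    import operator

    nums = [1, 2, 3, 4]
    total = reduce(operator.add, nums)            # 10 -- only the final value
    steps = list(accumulate(nums, operator.add))  # [1, 3, 6, 10] -- every step
    print(total, steps)
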


Pyrsistent
pyrsistent is a set of immutable data objects that can be used in python.  They give you immutable vectors (lists), immutable maps (dictionaries), and even immutable classes.  Immutable data is of immense help during concurrent programming.  If you have multiple threads acting on the same immutable (read-only) data structure, you can never have inconsistent results.  Even when you aren't writing concurrent code, it is very useful because you don't have to keep track of state all the time.  Functional programming stresses always using lexically scoped, pure functions.  Pure means that given the same inputs to a function, you will always get the same outputs.  There is no hidden state (such as globals).
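pyrsistent may not be available everywhere, so as a rough stdlib approximation (not pyrsistent's actual API, and not a true persistent structure), types.MappingProxyType gives a read-only view of a dict that at least fails loudly on writes:

    from types import MappingProxyType

    # A read-only view over a plain dict
    config = MappingProxyType({'retries': 3, 'timeout': 30})
    print(config['retries'])   # reads work just like a dict: 3

    try:
        config['retries'] = 5  # any write is rejected
        writable = True
    except TypeError:
        writable = False
    print(writable)  # False

Pyrsistent's structures go further: "modifying" one cheaply returns a new object while the original stays untouched, which is the persistent part.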

hy
A lisp running on python?  Cool.  Now, it's no clojure, but it might be possible to get halfway there by using hy in combination with toolz and pyrsistent.  Although hy doesn't have literals for certain data types, it should be possible to add them, because hy has reader macros (which clojure lacks, btw).


Sunday, September 7, 2014

Wrapping my head around python asyncio

Trying to understand python's asyncio is a challenge.  First, I personally don't know which is more difficult: multithreaded programming or event-driven programming.  Multithreaded programming has the difficulty of properly finding and eliminating race conditions, deadlocks, and livelocks.  Event-driven programming has the difficulty of non-intuitive flow of control and many layers of abstraction and indirection.  So where do we even start?  If you just start reading the official documentation on asyncio, you probably won't get too far.  Reading PEP 3156 won't get you much farther, though I do recommend studying both.

My main motivation for learning asyncio is probably a little unusual.  I wanted to write something like pexpect, without using pexpect.  In a nutshell, I wanted to interact with a child subprocess more than once.  Python's subprocess module doesn't quite let you do this, even though Popen.communicate() may seem to.  The problem is that it is a "one-shot" communication.  You feed it one string and then you are done.  But what if you need to answer multiple prompts from your child process?

So where can we start?  I'm learning this too, so as I go, I'll introduce more examples.  So let's make a small example of calling a subprocess using asyncio.  I won't explain it in detail in this blog.

I will, however, briefly explain what coroutines are.  In a nutshell, a coroutine is a way to factor out code that uses yield.  The reason this matters is that yield is "contagious": the very presence of yield in a function turns that function into a generator.  So what do you do when you realize that some code using yield could be factored out into its own function?  That's where coroutines come in, and they can be spotted by their use of the new "yield from" syntax.
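A tiny sketch of that "factoring out" idea (my own toy example): the inner generator is pulled out of the outer one, and yield from delegates sends and yields to it transparently, capturing its return value.

    def read_pair():
        # The factored-out helper: the presence of yield makes it a generator
        a = yield
        b = yield
        return (a, b)   # in Python 3.3+, the return value travels via yield from

    def reader(results):
        # yield from forwards send()/next() to read_pair and captures its return
        pair = yield from read_pair()
        results.append(pair)

    collected = []
    g = reader(collected)
    next(g)            # prime the generator, pausing at the first yield
    g.send(1)          # becomes 'a' inside read_pair
    try:
        g.send(2)      # becomes 'b'; read_pair returns and reader finishes
    except StopIteration:
        pass
    print(collected)   # [(1, 2)]

Without yield from, reader() would have to manually iterate read_pair(), forward every send(), and fish the return value out of StopIteration itself.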


subproc_shell.py

"""
Took the example from the tulip project and modified it to make it more like pexpect
"""

import asyncio
import os
import re
from asyncio.subprocess import PIPE, STDOUT


@asyncio.coroutine
def send_input(writer, inputs, que, regex):
    """
    The coroutine that sends its input to the input stream (usually stdin).

    :param writer: the stream where input will go (usually stdin)
    :param inputs: a sequence of bytes objects (not strings)
    :param que: an asyncio.Queue used to check if we have what we need
    :param regex: a re.compile() object used to see if an item from the que matches
    :return: None
    """
    inputs.reverse()  # We have to reverse because we pop() from the end
    try:
        while inputs:
            item = yield from que.get()
            if item is None:
                break
            m = regex.match(item.decode())
            if m:
                line = inputs.pop()
                writer.write(line)
                d = writer.drain()
                if d:
                    # writer.drain() returns a generator
                    yield from d
        # Close stdin only after all the input is sent; closing inside the
        # loop would cut the child off after the very first prompt
        writer.close()
    except asyncio.QueueEmpty:
        pass
    except BrokenPipeError:
        print('stdin: broken pipe error')
    except ConnectionResetError:
        print('stdin: connection reset error')
    except Exception as ex:
        print(ex)


@asyncio.coroutine
def log_errors(reader):
    while True:
        line = yield from reader.read(512)
        if not line:
            break
        print('ERROR', repr(line))


@asyncio.coroutine
def read_stdout(stdout, que):
    """
    The coroutine that reads non-blocking from a reader stream.

    :param stdout: the stream we will read from
    :param que: an asyncio.Queue object we put lines into
    """
    while True:
        # Use read() instead of readline() so we don't pause on a newline
        line = yield from stdout.read(512)
        print('Received from child:', repr(line))
        que.put_nowait(line)  # put the line into the que so send_input() can read it
        if not line:
            que.put_nowait(None)  # sentinel telling send_input() to stop
            break


@asyncio.coroutine
def start(cmd, inp=None, queue=None, shell=True, wait=True, **kwargs):
    """
    Kicks off the subprocess.

    :param cmd: str of the command to run
    :param inp: a sequence of bytes objects to feed to the child's stdin
    :param queue: an optional asyncio.Queue shared by the reader and writer
    :param shell: if True, run cmd through the shell
    :param wait: if True, wait for the child and print its exit code
    :param kwargs: extra arguments for the subprocess creation call
    """
    kwargs['stdout'] = PIPE
    kwargs['stderr'] = STDOUT
    if inp is None and 'stdin' not in kwargs:
        kwargs['stdin'] = None
    else:
        kwargs['stdin'] = PIPE

    fnc = asyncio.create_subprocess_shell if shell else asyncio.create_subprocess_exec
    proc = yield from fnc(cmd, **kwargs)

    # Stores our output from read_stdout() and pops off (maybe) from send_input()
    q = queue or asyncio.Queue()
    regex = re.compile("Reset counter")

    tasks = []
    if proc.stdout is not None:
        tasks.append(read_stdout(proc.stdout, q))
    else:
        print('No stdout')
    if inp is not None:
        tasks.append(send_input(proc.stdin, inp, q, regex))
    else:
        print('No stdin')

    # stderr was merged into stdout above, so proc.stderr will be None here
    if proc.stderr is not None:
        tasks.append(log_errors(proc.stderr))

    if tasks:
        # Feed stdin while consuming stdout to avoid a hang
        # when the stdin pipe is full
        yield from asyncio.wait(tasks)

    if wait:
        exitcode = yield from proc.wait()
        print("exit code: %s" % exitcode)
    else:
        return proc


def main():
    if os.name == 'nt':
        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)
    else:
        loop = asyncio.get_event_loop()
    loop.run_until_complete(
        start('c:\\python34\\python.exe dummy.py',
              inp=[str(x).encode() for x in (3, 3, 0)]))
    loop.close()


if __name__ == '__main__':
    main()