Sunday, March 13, 2016

Haskell, java 8, and calculus

I bought the Haskell Programming book last week, and though the price tag was pretty hefty, it's been a good read so far.  It is the first book I have seen that actually covers the basics of the lambda calculus.  After that cursory examination I now understand currying much better, and the basis of lisp also makes more sense.  I also discovered that haskell can look a lot like lisp.

((+) (length [1,2,3])
       (length [x | x <- [0..20], ((==) (mod x 2) 0)]))

That basically adds the lengths of 2 lists.  You do need to convert some infix operators to prefix form (e.g. (+) or (==)), but otherwise, that's very lispy.  I'm sure a real haskeller would cry at that code, but honestly, I don't see anything wrong with it :).  I'm also getting a better grasp on haskell's types, namely the difference between type constructors, data constructors, and typeclasses.  If only C++ templates or java generics were like haskell's type classes!!

This is the number one thing I miss in clojure.  As cool as clojure is, the lack of type information still bothers me.  The little bit I've looked at core.typed leaves me wishing for a stronger type system that can handle type parameterization better.  Also, there are the limitations of the JVM itself, for example the lack of tail call optimization, continuations, lightweight threads or the generation of native binaries.  I've often said that clojure will be the gateway drug to haskell.  I think that clojure really kickstarted a lot of people's interest in functional programming (even more so than Scala, which has a hybrid approach and didn't really require people to use immutable, persistent, or lazy data structures).  I can see haskell taking a prominent place in my programming future, but Clojure is still an interesting language, and I'll be using it for any JVM related project where I can (though I'll be exploring frege too).  I think clojure will be in my toolbox kind of like python (should be)...a tool to quickly hash out an idea, and then re-implement in a fully typed language.

Another thing that's caused clojure to lose a little bit of lustre for me is some recent problems I've been having with pheidippides.  Recently, I removed the Java bits of my code, because I discovered a way to work around a bug I had found in clojure.  Everything seemed to work ok when a single Module sent just one message to the Controller.  But if I sent another message (either from the same Module or a different one), my code in clojure would sit in a while loop like this:

(while (.hasRemaining buff)
    (.read chan buff))

The buff has a size of one byte.  It was meant to read in the first byte of a message (the opcode) and, based on that, determine how many more bytes to read in.  The Selector had already determined that the channel (the chan var there) was in the ready set.  Now according to the documentation it is possible that there really isn't data there from the SelectionKey.  So I double checked and called (.isReadable sel-key) in an earlier function, where sel-key contains the channel I'm reading from.  Since that returned true, the channel should have been readable and there must be at least one byte in it.  But for some reason, my clojure code would spin in that while loop forever.

However, when I resurrected my Java code....which ironically calls the very same clojure code, it works.

    // Look up the clojure function through the Java API and invoke it
    public static PersistentVector test(SocketChannel chan) {
        IFn require = Clojure.var("clojure.core", "require");
        require.invoke(Clojure.read("pheidippides.messaging.core"));
        IFn getMessage = Clojure.var("pheidippides.messaging.messages", "get-chan-msg");
        PersistentVector msg = (PersistentVector) getMessage.invoke(chan);
        System.out.println("Opcode is " + msg.get(1).toString());
        return msg;
    }

Why is the Java class invoking the same function from clojure working, but the actual function from clojure is not?  Well, it turns out I forgot to do something in my clojure code that I was doing in the Java version.  In the Java version, I was using an Iterator to walk through the selected-key set, and as I handled each event, I called iter.remove().  In the clojure code, rather than directly walk through the set with an iterator, I was using doseq to walk through the set.  However, at the end, I forgot to clear the set.  That meant that keys I had already handled stayed in the selected-key set, so it kept growing every time a SocketChannel sent data to the Controller.  The NIO Selector never removes keys from the selected-key set on its own; it expects you to remove each key once you have handled it, which means that the Set is mutable.
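To make the fix concrete, here is a minimal sketch of walking the selected-key set with doseq and then clearing it.  This is not the actual pheidippides code: handle-ready-keys! and the handler callback are names I made up for illustration, and it assumes the selector has already returned from a select call.

```clojure
(import '[java.nio.channels Selector SelectionKey])

(defn handle-ready-keys!
  "Calls (handler key) for each key in the selector's selected-key set, then
   clears the set.  `handler` is a hypothetical callback supplied by the caller."
  [^Selector selector handler]
  (let [ready (.selectedKeys selector)]
    ;; Snapshot the keys so the set is never mutated while we iterate over it.
    (doseq [^SelectionKey k (vec ready)]
      (handler k))
    ;; The selector never removes keys from this set itself, so do it here.
    (.clear ready)))
```

Clearing after the doseq plays the same role as calling iter.remove() on each key in the Java version; either way the set is empty before the next select.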

Basically, although writing in pure clojure is nice, as soon as you have to reach down into Java, things get dirty.  Since I was using java's nio.channels, I have no choice but to deal with their mutable data.  Indeed, looking at my clojure code, I see how imperative it looks.


One reason I'll continue to use clojure is that for better or worse, the JVM is the world's number one platform (though javascript is mounting a big offensive on that front).  The JVM is continually improving, and in the Da Vinci project they are working on things like TCO, value types, and even the generation of native executables (though according to the JVMLS of 2015, that feature will only allow statically linked executables, and it will be a premium feature not in Java SE).  Since so many projects use the JVM in one form or another, it pays to understand the JVM and JVM languages.

In fact, at work, I had the need to grab all classes that had the @Test annotation applied to them from TestNG.  I'll get to that in a later post.  But I will say a few things:

  • Java lambdas take getting used to, since each one is in essence an instance of a functional interface whose type is inferred
  • Java lambdas can't mutate the local variables they capture (those have to be effectively final)
  • The stream API has a way to create new data structures instead of mutating existing ones

Although it was a little funky getting used to Java 8 lambdas and the new stream API, at least Java does seem to be heading in the right direction.  Unfortunately, there's just a ton of impure stuff and it can make things a pain.

The other thing occupying my time is refreshing my calculus.  About 3 weeks ago, there was a sale on udemy and I bought a couple of courses for $24, including 3 for calculus, and one for linear algebra.  Why go back over calculus?  I'd eventually like to study for the GRE, and my math is pretty rusty.  Plus, haskell has just gotten me back in the mood to get better at math.

I've always felt it was a shame that I minored in math and yet in my career, I basically haven't used it.  I think people tend to denigrate languages like haskell for being too academic, and not "real world" enough.  But at least there is scientific rigor to what they are doing.  I sometimes feel that "software engineering" is a misnomer.  "Programmers" are just churning out code that hopefully works.  But there's not really an analysis of what's going on.  As Rich Hickey said about simple vs. easy, the programming world wants the "easy" way to do things.  The problem is that very often "easy" is not composable, testable, or really what we need.  The easiness is just a veneer to get something "up and running", but the minute you run into trouble, you have to descend into a sea of madness to figure out what's really going on.

The argument I hear from the dynamic programmers is that types just get in their way.  But do they?  How many times does a python programmer have to sit at a debugger, and figure out why something isn't working, only to realize they passed in the wrong type?  How many times have you cursed at a null or None because some file wasn't there that you expected or a network hiccup caused a socket timeout?  Of course, all we have is anecdotal evidence, and if we can't prove that type information, Maybe monads that eliminate null conditions, or guaranteeing when functions can produce side effects are good....why go through all that trouble?  Plus, it seems that typing out a few extra characters is just too much for some (never mind that type inference can save you a lot of typing).

I truly think we need more rigor in the software engineering world.  To give you an example, at Red Hat, they want us to do more end-to-end scenario testing.  That sounds like a reasonable goal.  After all, passing unit tests don't guarantee that the code will work in actual usage (perhaps your mock didn't do something the real component would do).  And ditto for functional tests.  But why?  Why should a functional test pass, but in the real world fail?  It's because of impurity.  If a functional (or unit) test passes, it's because your mock essentially guaranteed a certain result.  In effect, your mock made your test more pure.  But the real world gets in the way.  Network connections time out, files disappear, a hardware resource becomes unavailable, a module changes some singleton or other global variable, or simply some part of your data mutates due to something else.

In a pure functional language, if a functional test passes but the code fails in the real world, you know the problem has to be in one of your monads.  This helps pinpoint the problem much faster.  Good luck tracking down the problem if your language doesn't explicitly point out which functions are pure and which are not.

Since haskell essentially is mathematical, I figure getting better at calculus and linear algebra will put me back into a better frame of mind.  To be honest, calculus and linear algebra actually help in the real world too.  I feel like with what I've been doing lately, all I'm doing is pushing ghostly bits in the electronic ether.  Hopefully one day I can actually do something that will provide real-world value.


Sunday, February 28, 2016

Pheidippides and a new blog

I didn't realize it's been so long since I've written a blog.  I am going to try and correct that and write at least 2 blogs a month.  I've been up to a few things, so I'd like to talk about them as things progress.

The first thing though is that I've been at work on a messaging system I've named pheidippides.  It's partially a learning project, and partially something I can use at work.  In essence, it's a system and protocol that will handle both RPC and pub/sub style messaging.  Both of these will be handled in an asynchronous and decoupled nature.  Pheidippides is all about sending messages from one Module to another.

Encoding:
Rather than use JSON or XML to represent a message, it will use the binary encoded msgpack

Protocol:
Instead of using http with REST style mechanics, pheidippides will have its own protocol for interpreting messages

Transport:
While the first focus of pheidippides will be on TCP/IP, it will be abstracted enough to handle native (in-memory) transport, and some investigation into supporting dbus is also being done


Why yet another messaging framework?

Partially, as I stated above, pheidippides is a learning project.  I've been curious about messaging frameworks and all the hoopla surrounding them.  From a more pragmatic standpoint, I wanted to replace our test framework's rampant usage of making SSH calls to do things.  I've also been considering making something akin to Ansible or Puppet, but using pheidippides as the means to "get stuff done".  Also, at work, we heavily make use of dbus messages, and I think I can make an adapter from pheidippides messages to dbus messages (given enough time).  If I can do this, it would be a huge advantage to be able to create a bridge between dbus and pheidippides (I'm gonna call it pdd for short from now on...reminder to self...make a shorter name for projects).  By having this bridge, any Module hooked into pdd would also get dbus events.  I'm also not unfamiliar with low-level protocols.  At my first job, we basically had a router that had attached services, and at LSI, we had a message passing system between the (user library) driver and the firmware.

Also, most other messaging systems fall short for one reason or another.

HTTP w/REST:  Is still a request/response (synchronous) way of doing things.  Also, there's a lot of overhead with JSON or XML not to mention all the baggage of http

Websockets:  Websockets solves the synchronous problem of REST, but then you're left with implementing your own protocol anyway

AMQP: Comes close, but its RPC mechanism takes a little work, and at least as I understand it, it requires a coupling of caller and callee

WAMP-proto:  I actually just discovered this yesterday, and it comes the closest to what I have been envisioning.  However, it doesn't use a binary encoding of messages, and although they say they will support outside of websockets, they appear to not have done so yet.

There's a ton of work I still need to do.  Here's a rough overview I've made

Right now, pheidippides isn't actually usable.  I just recently got it working to send a simple register message, but the Controller isn't even doing anything with it yet.  But, at least I got the encoding/decoding and transport for TCP/IP working.  If you look at the issues list, there's still a ton I need to do.

But I have to say, I'm having quite a bit of fun with this.  I'm learning a lot about networking on Java, as well as NIO2.  I have an idea to write a macro for something which I feel is a shortcoming in clojure (namely, that defrecords can't inherit...if you have defrecord A with fields foo, bar and baz, and I want another defrecord B that has those same fields but additionally quux, there's no clean way to do it in clojure without repeating the fields).

I'm also stoked to get pheidippides up and running to replace all of our SSH calls in our test code.  While SSH calls work ok, for one, they are very non-JVM'ish.  Basically we're coding in bash or shelling out to a subprocess, and then scraping the results.  That's just fugly in my opinion.  In fact, our code doesn't even get SSH output (or ProcessBuilder output) in real time (but my other library commando fixes that...although I've not replaced it in our testware yet).  Eventually, I'd like pheidippides to be the foundation for an agent based system to provision our test VMs.  The hotness seems to be Ansible, but while Ansible touts the fact that it doesn't require an Agent as an advantage, I also view that as a weakness (because Ansible is a push-only system, it cannot get events or messages that are not explicitly called for).

But beyond work, I envision using pheidippides as the basis for a real-time distributed analysis system.  I'm even going to look into Infinispan for the database, as that would also give me a distributed data-grid and distributed execution.

Finally, I'm going to move most of my blog posts to rarebreed.github.io (once I figure out how to use jekyll).  I like the fact that blogs are just markdown files, and that would let me share code examples and snippets much more easily.

Monday, September 7, 2015

How to evaluate forms given to a clojure macro without throwing an exception

OK, I probably shouldn't admit this, but it took me the better part of 2 days of straight coding to come up with a macro that I wanted.  In a nutshell, I wanted to be able to call a sequence of functions and collect the results, even if one of those functions would throw an exception.  For example, something like this:

(let [x 2
      y 0]
  (try+
    (* x 2)
    (* x y)
    (+ 9 y)
    (/ 1 y)))

Do you see why I needed a macro for the try+?  What if I had tried to write it as a function?  Since clojure is by default an eager language, it will try to evaluate the arguments first and then supply the results of the evaluation to the calling function.  However, it is quite possible that a function call supplied as an argument to another function throws an exception.  In that case the calling function never gets called at all, and any other "functions as arguments" to the right of the offending one don't get evaluated either.  One way around that would be to quote each function call, and evaluate it inside the function:

(defn awkward-try [& fncalls]
  (for [fnc fncalls]
    (try
      (eval fnc)
      (catch Exception ex ex))))

(awkward-try
  '(* 2 2)
  '(/ 1 0))

However, making the user quote the functions is unnecessary, although the solution for it was quite a bit more difficult.  Before I show you the working solution, I'll show a failed attempt to make it work, because sometimes, it's just as useful to show something you thought would work but didn't.

So an early attempt I made was similar to the awkward-try function above and it looked like this:

(defmacro firsttry+
  "Takes a body of function calls and calls them lazily.  If a function throws
   an exception, don't propagate it.  Collect the exception in the
   results"
  [& body]
  `(for [arg# '~body]
     (try
       (eval arg#)
       (catch Exception ex#
         (println "caught exception")
         ex#))))

And if you try this, it seems to work:

(firsttry+
  (* 2 2)
  (/ 1 0))
caught exception
=> (4 #error {
 :cause "Divide by zero"
 :via
 [{:type java.lang.ArithmeticException
   :message "Divide by zero"
   :at [clojure.lang.Numbers divide "Numbers.java" 158]}]
 :trace
 ...)


The problem is when you try to use let bound symbols:

(let [x 2
      y 0]
  (firsttry+
    (* 2 x)
    (/ 1 y)))
caught exception
caught exception
=> (#error {
 :cause "Unable to resolve symbol: x in this context"
 :via
 [{:type clojure.lang.Compiler$CompilerException
   :message "java.lang.RuntimeException: Unable to resolve symbol: x in this context, compiling:(/home/stoner/.IdeaIC14/system/tmp/form-init96648934167159599.clj:4:5)"
   :at [clojure.lang.Compiler analyze "Compiler.java" 6543]}
  {:type java.lang.RuntimeException
   :message "Unable to resolve symbol: x in this context"
   :at [clojure.lang.Util runtimeException "Util.java" 221]}]
 :trace
...
 } #error {
 :cause "Unable to resolve symbol: y in this context"
 :via
 [{:type clojure.lang.Compiler$CompilerException
   :message "java.lang.RuntimeException: Unable to resolve symbol: y in this context, compiling:(/home/stoner/.IdeaIC14/system/tmp/form-init96648934167159599.clj:5:5)"
   :at [clojure.lang.Compiler analyze "Compiler.java" 6543]}
  {:type java.lang.RuntimeException
   :message "Unable to resolve symbol: y in this context"
   :at [clojure.lang.Util runtimeException "Util.java" 221]}]
 :trace
 ...)

Hmmm, so what's all this stuff about not being able to resolve symbol x and y when there are let bound symbols?  The key is in understanding how at macroexpansion time, the arguments that got passed in are exposed.  If you notice, I have a somewhat strange '~body in the for expression.  First off, it wasn't even clear to me what was in the body symbol once it was evaluated.  I couldn't just do ~body because the whole exercise of the macro was to avoid evaluating the body!  But, I did need to pull the elements out.

I also couldn't use ~@body, because that would have the wrong form in a for expression.  Like let, loop, doseq, and binding, the for macro takes one or more pairs.  If I had done an unquote-splice, it would have done something like this when expanded:

(for [arg# (* 2 2) (/ 1 0)]
  ...)

Which is not the right form.  So I thought ok, let me try '~body which I thought would return what body represented (including any substitutions), but without actually evaluating it because it would be quoted.  I thought doing that would be like this:

(for [arg#  '((* 2 2) (/ 1 0))]
  ... )

But that's not what happens, and what you really get is:

(for [arg# '((* 2 x) (/ 1 y))]
  ... )

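And that is why the clojure compiler complains that it doesn't know what the symbols x and y are.  So ok, that explains the unknown symbol problem, but why did that happen?  Why didn't it substitute the value of 2 for x and 0 for y?  I honestly wasn't sure at the time.  As far as I can tell, it's because a macro only receives the unevaluated forms at macroexpansion time, before the let has bound anything, and eval then compiles the quoted form on its own, where the let's locals are not visible.  Also, how can I substitute all symbols within each s-expression in the body of the macro one by one if I can't do ~body, ~@body or '~body?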
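The difference between handing a quoted form to eval and placing the form directly into the expansion can be seen in a small sketch.  The macro names eval-form and splice-form are made up for this illustration, and local-x is deliberately a name with no global var behind it.

```clojure
;; eval-form and splice-form are hypothetical names used only for this sketch.
(defmacro eval-form
  "Quotes the form and hands it to eval at run time, like the '~body attempt."
  [form]
  `(try
     (eval '~form)
     (catch Exception ex# :unresolved)))

(defmacro splice-form
  "Places the form directly into the expansion, so it is compiled in place."
  [form]
  `(try
     ~form
     (catch Exception ex# :failed)))

;; eval cannot see the let-bound local, so the symbol fails to resolve.
(def via-eval   (let [local-x 2] (eval-form   (* 2 local-x))))   ; => :unresolved

;; The spliced form is compiled inside the let, so local-x resolves normally.
(def via-splice (let [local-x 2] (splice-form (* 2 local-x))))   ; => 4
```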

After a lot of trial and error, I finally decided to try a different tack, and I looked at the or macro in clojure.  I saw that it just did a simple (stack overflowing) self-recursion based on different arities.    I realized I could do this too, but I wanted to save the results of calling each function.  It took me a little while to realize that once again, lazy-seq is your friend.

Here's the code I finally came up with that works:

163 (defmacro wrap
164   "Takes a function call and surrounds it with a try catch.  Logs the function name
165    and the args supplied to the function"
166   [head]
167   `(let [fnname# (first '~head)
168          args# (rest (list ~@head))]
169      (timbre/info "evaluating function:" fnname# ", args:" args#)
170      (try
171        ~head
172        (catch Exception ex#
173          [{:name fnname# :args args# :ex ex#}]))))
174 
175 
176 (defmacro try+
177   ([head]
178    [`(wrap ~head)])
179   ([head & tail]
180    `(lazy-seq
181       (cons
182        (wrap ~head)
183        (try+ ~@tail)))))
184 

The wrap macro is really just a helper macro to help print out what is getting called.  It takes one of the forms from body.  So for the let example above, on the first expansion head will be the form (* 2 x).  Notice that x is not replaced by 2 at this point: a macro only ever sees the unevaluated forms.  Using ~head on lines 178 and 182 places that form into the expansion without evaluating it, and because the expanded code ends up inside the let, x resolves like any other local when the code finally runs.  That is the difference from the eval attempt, since eval compiles its form with no access to the surrounding locals.  So what happens here is:

(wrap (* 2 x))

But since wrap is itself a macro, (* 2 x) does not get evaluated yet.  That's why when it gets to the next form of (/ 1 y), it does not throw an exception as soon as wrap is expanded.  Otherwise, try+ uses destructuring to split the forms submitted to it into a head and tail.
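For anyone who wants to try the idea without pulling in timbre, here is a stripped-down sketch of the same shape.  The names safe and try-all are hypothetical stand-ins for wrap and try+; they are not the macros listed above, and the logging is left out on purpose.

```clojure
;; `safe` and `try-all` are simplified, hypothetical stand-ins for wrap and try+.
(defmacro safe
  "Evaluates form inside a try; on failure returns a map describing what failed."
  [form]
  `(try
     ~form
     (catch Exception ex#
       {:form '~form :ex ex#})))

(defmacro try-all
  "Evaluates each form lazily, collecting results and caught exceptions."
  ([form]
   `[(safe ~form)])
  ([form & more]
   `(lazy-seq
      (cons (safe ~form)
            (try-all ~@more)))))

;; let-bound symbols work because the forms are compiled inside the let.
(def results
  (let [x 2
        y 0]
    (doall (try-all (* x 2) (* x y) (+ 9 y) (/ 1 y)))))

;; Replace the error map with a keyword to make the outcome easy to read.
(map #(if (map? %) :error %) results)   ; => (4 0 9 :error)
```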

Just another note, it's a little tricky to figure out when and where to start the syntax quoting.  For example, in one of my earlier attempts, I did not put the ` syntax quote on line 180, but on line 181 instead.  And what I noticed was that the macro would not evaluate lazily.  The try+ macro would evaluate all the forms given to it in one shot.  I believe the reason is that macros run at macroexpansion time.  Because I did not syntax quote the entire lazy-seq form, the macro expander was expanding the entire form all in one shot at macro expansion time, so when it got back to the evaluation run time phase, everything had already been calculated.

Another gotcha I noticed was that lazy-seq either wants to go on infinitely, or if it is a finite recursion, the final thing the recursive call returns must be some seqable collection.  If you notice on line 178, it returns a vector.  I needed that because at the end of the recursion, it has to return something cons can build on.  That's why normally you see a pattern of:

...
(lazy-seq
  (if some-pred?
    (cons x (foo y))
    []))

Since cons takes (element, collection) as its args, the 2nd arg to cons should be some kind of collection.  If you see an error like:

IllegalArgumentException Don't know how to create ISeq from: java.lang.Long

Then you are probably trying to cons an element onto a scalar (a Long for example) instead of onto a collection.
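Here is a small runnable instance of that pattern, plus the mistake that triggers the error.  The function countdown is a made-up example, not something from the macro above.

```clojure
(defn countdown
  "Lazily counts down from n to 1.  The recursion ends by returning an empty
   vector, which gives the last cons a collection to build on."
  [n]
  (lazy-seq
    (if (pos? n)
      (cons n (countdown (dec n)))
      [])))

(def ok (countdown 3))   ; realizes to (3 2 1)

;; The second argument to cons is a Long here, not a collection.
(def bad
  (try
    (cons 1 2)
    (catch IllegalArgumentException _ :not-a-collection)))   ; => :not-a-collection
```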

Sunday, August 23, 2015

Evangelizing clojure

Since I've started my new position where I get to work in clojure, I've been itching to try to get others at my workplace to see where clojure would be useful.  Currently, my workplace is a python, java, C and shell workshop (with a smattering of ruby here and there).  I'm one of the few engineers at my work who gets to work with clojure.

And that's sad.  Unfortunately, a common fear for most companies is the difficulty in finding engineers who are proficient in a certain technology stack.  I quite frankly find that a rather lame excuse.  Any engineer worth their salt should be able to learn a new language.  And in fact, I'd rather hire engineers who have a mind curious enough to learn a not-hot language with a very different paradigm.  If management's concern is that they want an engineer to "hit the ground running", I think they are sacrificing long term benefits for short term gains.  I used to work at a company that decided to use perl for all its scripting efforts, and they wound up having many perl "camps" where engineers spent an entire week going through intensive perl training.  If there are companies that make a living teaching new languages, why not take advantage of that?  So when I hear managers claim that the lack of engineers with the skill to program in clojure is a detriment, I find that a weak excuse.

There's also the odd paradox that some companies don't seem to be so concerned about other new "hot" languages like Go or Swift.  Perhaps it's because those two languages are backed by the giants Google and Apple respectively, and so therefore, they must have gotten something right.  Personally, having had a cursory glance at Go and Swift, I've found nothing particularly outstanding about them compared to other new languages without the hotness (clojure, elixir, rust, elm or julia for example).

So what can we clojurians do to help others understand where clojure could be a viable alternative?  I think we need to do several things:

  • Point out how language X has certain weaknesses that could be resolved with clojure
  • Point out how clojure can live synergistically with a Java ecosystem
  • Help train and educate others that lisps aren't as scary/gross as they think
  • Get people familiar with the tools and ecosystem of clojure

For example, I hope to release a set of tutorials to help compare and contrast how clojure could solve problems more elegantly than python.  It would cover things like how to do highly concurrent programs in comparison with python and how immutability can help make more robust programs.  I'd also show how python decorators, which are sometimes compared to lisp macros, fail to deliver the same power as a lisp-style macro.

Another topic that I don't see discussed too much, is how to integrate clojure with legacy java projects.  I'd like to create some articles talking about how to use TestNG with clojure, how gen-class really works, and how to plug clojure into a maven or gradle based program.  I'd also like to give more examples on how to use java interop constructs, including defprotocols, proxy and using them to bridge java and clojure.

Another hindrance is, IMHO, purely psychological.  I find people's first reactions to lisp syntax somewhat amusing.  It's such an immediate and almost visceral reaction that I have to truly wonder why lisp syntax is so (initially) despised by so many.  Is it perhaps because there is a relationship between lisp syntax and XML (and people hate XML)?  I remember my first reaction to lisp in college and I was just like "whoaaaa".  But I also remember my first reaction to python's syntax where white space mattered was like, "who the hell thought having white space matter was a good thing!!".  But after about 3 weeks, I didn't even notice it anymore.  And the same thing happened with me with clojure.  But how do you get people to even try clojure for 3-4 weeks?

Finally, another big barrier for people coming to clojure is the tools and ecosystem.  For starters, a large chunk of tutorials and videos you will see online use emacs + CIDER as the IDE.  I basically started learning emacs about 3 years ago in order to do clojure.  Now, I'm an older guy, so I'm not afraid of basic text editors unlike some young whipper-snappers who seem to be at a loss without a full fledged IDE.  Now for Java programming I do enjoy something like IntelliJ or Eclipse, but emacs is a pretty cool IDE for clojure.  While there is a plugin for vim and clojure, the majority of the community does work with emacs.  There's another interesting IDE called cursive which is supposed to have the ability to debug both clojure and java code which would come in handy.

Beyond the IDE, there are build tools, so coming to grips with leiningen and perhaps boot would be useful.  Also, if you don't have any background in Java, while it's not absolutely necessary for learning clojure, it will definitely help (the same is true if you don't really know javascript, but want to know clojurescript).  So some familiarity with the underlying runtime (the JVM or javascript engine) will go a long way to making you a better clojurian.

Figuring out Clojure vars vs. symbols

Although I realized that there was some kind of difference between a clojure var and a symbol, I hadn't really considered what the difference was.  To make matters worse, I just considered a clojure symbol as symbols are usually considered in other languages.  In other words, I just considered a symbol to be an object or reference to something that could be used by the program.  However, symbols have their own special meaning in clojure.

So, what exactly is a var?  When I first was learning clojure, I kept reading on websites and in the books I had that clojure doesn't have variables.  Instead, they have vars and bindings.  Well, ok, but what in the world does that mean, and how do vars differ from variables?  Furthermore,  I pretty much had assumed vars and symbols were (almost) the same thing.  For example, if I have:

(def foo 10)

Ok, so foo kind of looks like what other programming languages would call a variable.  But if foo isn't a variable, what is it?  It's a var right?

Hold on partner, we have to consider how we are looking at the thing called foo.  In the line above, yes, foo is a var.  But if I just type foo at the repl, what is it?  Or what is 'foo, or #'foo?

Let's step back for a moment and consider what Rich has wanted clojure to do.  Clojure is a language that dearly wants to separate identity, state and values.  Identity is what names a thing, state is a value at a moment in time, and values are...well, values :)  In Python, if I do this:

bar = [10]

Then bar is a variable which refers to a list containing 10.  However, it has not separated the notion of identity, state and value.  Identity, state and value are all commingled in the variable bar.

So back in clojure land,  how we look at  foo depends on how it is being evaluated.  Put simply foo (by itself) is a symbol which can be used to look up a var.  In this example, foo is our identity.  So you might now be wondering what the 10 is as that obviously seems to be a value.  Values have to be stored somewhere  and the var is what actually holds some value.

Normally we think of foo as neither a symbol nor a var, but a value.  In other words, I could just mentally replace the value of 10 wherever I see foo.  But wait kimosabe, you are forgetting about clojure's macros, but I am getting ahead of myself.   If I just type foo in the repl, I get its value back which is 10.

foo
10

(type foo)
java.lang.Long

Okay, so it seems like for all intents and purposes the symbol foo _is_ 10.  But is it?  What does the documentation say about def anyways?

boot.user=> (doc def)
-------------------------
def
  (def symbol doc-string? init?)
Special Form
  Creates and interns a global var with the name
  of symbol in the current namespace (*ns*) or locates such a var if
  it already exists.  If init is supplied, it is evaluated, and the
  root binding of the var is set to the resulting value.  If init is
  not supplied, the root binding of the var is unaffected.

  Please see http://clojure.org/special_forms#def
nil

Hmmm, so (def foo 10) interns a var in the current namespace with the name of the symbol.   Have you wondered if def returns anything?

(println (def x 100))

Ah, so def returns the var itself.  The definition says that a var with the name of the symbol is created by a def.  Ok, is a symbol just a lookup name?  Where does it fit into the picture?  Consider this:

(symbol "foo")
(type (symbol "foo"))

What does that return?  It returns....gasp....a symbol :)  But what good is that?  It doesn't actually return 10.  Why not?  To get the value (the var contains) that foo represents, we could do something like this:

(eval (symbol "foo"))

But let's try another thought experiment to help illuminate the difference between vars, symbols and values.  Consider what this returns before trying this in the repl:

(var (symbol "foo"))

If you did try that in the repl, you'll notice that threw an exception...how rude!!

clojure.lang.Compiler$CompilerException: java.lang.ClassCastException: clojure.lang.PersistentList cannot be cast to clojure.lang.Symbol, compiling:(/tmp/boot.user2720475669809682962.clj:1:1)
           java.lang.ClassCastException: clojure.lang.PersistentList cannot be cast to clojure.lang.Symbol

Hmmm, so it looks like var is being handed the unevaluated list (symbol "foo") itself, and not the result of evaluating (symbol "foo").  Ok, let's try this:

(defmacro huh [var-string]
  `(let [x# (-> ~(symbol var-string) var)]
     x#))

So why did I have to make a macro?  var is a special form, so it doesn't eagerly evaluate the arg that gets passed into it.  By the way, try doing (-> (symbol "foo") var) and see what happens (and you'll see why I needed a macro).  If you look at the documentation for var, it says that it returns the var (not the value) of a symbol.  You can see that by doing this:

(type (huh "foo"))
clojure.lang.Var

So remember what we've done here.  By having (symbol "foo") we are creating a symbol.  This object does not evaluate to 10.  In fact, neither does getting the var which is pointed to by the symbol foo.  In order to actually get the value of the var object, we need to dereference it.  Let's make a small change to our macro:

(defmacro huh [var-string]
  `(let [x# (-> ~(symbol var-string) var)]
     @x#)) ;; dereference the var to get its value

(huh "foo")
10
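For what it's worth, here is a rough sketch of the same symbol -> var -> value walk without a macro.  It assumes foo was def'd as 10 (redefined here so the snippet stands alone), and it leans on resolve, which is an ordinary function and therefore does evaluate its argument.  The names foo-sym, foo-var and foo-val are just made up for illustration.

```clojure
(def foo 10)

;; A symbol is just a name; creating one does not look anything up.
(def foo-sym (symbol "foo"))

;; resolve is a normal function, so (symbol "foo") is evaluated first.
;; It looks the symbol up in the current namespace and returns the var.
(def foo-var (resolve foo-sym))

;; deref (or the @ reader macro) pulls the current value out of the var.
(def foo-val (deref foo-var))

(println (type foo-sym))  ; clojure.lang.Symbol
(println (type foo-var))  ; clojure.lang.Var
(println foo-val)         ; 10
```

Each def above holds one level of the lookup, which makes it easy to poke at the symbol, the var and the value separately in the repl.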

So why bother making a distinction between symbols and vars?  I mean, wouldn't it be simpler to just have the symbol directly reference the value?  Why have this 2-level look up system of symbol -> var -> value?  Recall what I said earlier about maintaining a distinction between identity, state, and value.  Another answer is to think about macros and macro expansion time vs. compile time.  Here's another exploration:

(doseq [e '(def foo 10)]
  (println e "is a" (type e)))

def is a clojure.lang.Symbol
foo is a clojure.lang.Symbol
10 is a java.lang.Long
nil


Ahhhh, so when the reader looks at (def foo 10), foo is a symbol.   By having a var looked up by a symbol, and then the value retrieved from the var, we can delay actually getting the value....by retrieving the var instead.  Also, consider how many times clojure wants the symbol of a thing, rather than its value.  Furthermore, some clojure functions want the var itself rather than the value.  For example:

(defn ^{:version "1.0"} doubler 
  [x]
  (* x 2))

;(meta doubler)     ;; Wrong, the metadata doesn't belong to the doubler function, but to the var itself
(meta #'doubler)   ;; equivalent to (meta (var doubler))
{:version "1.0", :arglists ([x]), :line 1, :column 1, :file "/tmp/boot.user2720475669809682962.clj", :name doubler, :ns #object[clojure.lang.Namespace 0x2d471d43 "boot.user"]}


Another example is when we require or import from within the repl.  When you call require or import from the repl (as opposed to using the :require or :import directives in the ns macro), you have to pass quoted symbols naming the namespaces or classes on the classpath.
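A tiny sketch of what that looks like at the repl.  The alias name string is my own choice, and clojure.string and java.util.Date are only picked because they ship with clojure and the JDK.

```clojure
;; The quote keeps clojure from trying to evaluate the symbols;
;; require and import want the names themselves, not their values.
(require '[clojure.string :as string])
(import 'java.util.Date)

(println (string/upper-case "foo"))  ; FOO
(println (type (Date.)))             ; java.util.Date
```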

Finally, remember that vars can have thread-local bindings.  That's why symbols shouldn't just point to values, as you may want to give another thread some other binding value.
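Here's a minimal sketch of that thread-local behavior.  The var *bonus* and the function current-bonus are hypothetical names I made up for the example.  Note that only vars marked ^:dynamic can be rebound with binding.

```clojure
;; Only vars marked ^:dynamic can be rebound with binding.
(def ^:dynamic *bonus* 10)

(defn current-bonus [] *bonus*)

;; Inside binding, the current thread sees 42 ...
(def inside
  (binding [*bonus* 42]
    (current-bonus)))

;; ... while a plain java.lang.Thread started in the meantime
;; still sees the root binding of the var.
(def other-thread
  (let [result (promise)]
    (binding [*bonus* 42]
      (doto (Thread. #(deliver result (current-bonus)))
        (.start)
        (.join)))
    @result))

;; Once binding exits, the root value is visible again.
(def outside (current-bonus))

(println inside other-thread outside)  ; 42 10 10
```

I used a raw Thread on purpose.  As far as I know, future and bound-fn carry the caller's bindings over to the other thread, so they would report 42 instead.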

I hope this makes the differences between vars and symbols a little more clear. 




Tuesday, July 7, 2015

Clojure....here I come!!

So, on July 15th, I'll be starting my new position at Red Hat, working as a Quality Engineer on the subscription-manager team.  One may be wondering why I would leave a hot product like Openstack to go into the relatively obscure quality team.  One word:  clojure.

I'll get to do clojure and be paid for it (and not have to be skunkworks)!  That alone is sufficient reason for me to have wanted to take on this role.  I'm pretty stoked about it, but my clojure has gotten a little rusty in the last few months.  If it wasn't for some hy code I was writing, I'd probably have forgotten a lot.

For example, for fun, I'm working on my first ever web application.  I wanted to do something fun because I've never made a web application before (I know...almost 9 years into my career, and I've never made a web app before).  So I am finally turning my role playing game ideas into a web app.  I saw this site that is a virtual roleplaying table and that looked cool.  But I'm far from that, and decided to just work on implementing characters and rules in clojure first.  Since part of my calculation requires working with exponents (yes, this game will require a computer), I thought it'd be neat to make a lazy exponent calculator.  I wanted something like this:

(take 4 (lazy-expt 2)) => (1 2 4 8)

And I was very confused about how to go about doing it.  Of course, lazy-seq was something I needed, but I couldn't figure out how to accumulate my results.  I really didn't want to force the user to pass in an accumulator.  That's when multiple-arity functions made me see the light.

(defn lazy-expt
  "Lazy sequence for exponents"
  ([base]
   (let [orig 1]
     (lazy-seq
       (cons orig (lazy-expt base orig)))))
  ([base acc]
   (let [total (* base acc)]
     (lazy-seq
       (cons total (lazy-expt base total))))))

The multiple arity allowed me to not require the user to pass in an accumulated result.  I very rarely use multiple arity functions and instead tend to use functions with default params or extra params (i.e. using & in the argument vector).
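For comparison, the same sequence can probably be had from clojure's built-in iterate, which also hides the accumulator from the caller.  This is just a sketch, and lazy-expt-iter is a made-up name to keep it apart from the version above.

```clojure
(defn lazy-expt-iter
  "Lazy sequence of the powers of base: base^0, base^1, base^2, ..."
  [base]
  ;; iterate returns x, (f x), (f (f x)), ... lazily
  (iterate #(* base %) 1))

(println (take 4 (lazy-expt-iter 2)))  ; (1 2 4 8)
```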

Just for fun, I'll start working on finding square (and other) roots to a number although the methods to do that look a lot more difficult.

I am currently learning luminus, clojurescript, webgl and HTML5.  I discovered that browsers have an experimental ability to get access to the webcam and microphone.  One thing I hate is forcing users to use flash, applet or plugin for that.  And instead of a 2d table mat, I want a 3d environment.  I'm also boning up on the OrientDB graph database, because that's what I'm going to use to store data.

I'm stoked.  It's an ambitious project, but as my grandfather used to say "shoot for the stars, hit the moon".  Or, "if it comes to you easily, it isn't worth it".  

Friday, February 27, 2015

How to modify a clojure map from a given sequence

Okay, this turned out to be rather challenging to me.  As I've been telling some people, the syntax of clojure is relatively simple; the hard part is learning how to deal with immutability.  For a little OpenStack project I was working on, I wanted to convert the service catalog that got returned as a JSON string into something a little more search-worthy.  But the problem I was having was that I wanted to modify a map based on items in a sequence.  Here's a contrived example.

Suppose I have this list:

(def people [{:first "John" :last "Doe" :age 42 :company "LinkedIn"}
                     {:first "Harry" :last "Smith" :age 31 :company "NASA"}])

and this already existing map:

(def companies {1 "Google", 2 "RedHat"})

So, how do I create a new map of companies by iterating through people, and adding the company the person belongs to?


Unlike a mutable language, where I could just directly change the map upon each iteration, that won't work in clojure.  Let me show you an equivalent example in python and show why something similar in clojure won't work.

people = [{"first" : "John", "last": "Doe", "age": 42, "company": "LinkedIn"},
                 {"first": "Harry",  "last": "Smith", "age": 31, "company": "NASA"}]
companies = {1: "Google", 2: "RedHat"}

for id, p in enumerate(people, 3):
    c_name = p.get("company")
    companies[id] = c_name     # mutates the dictionary


Do you see why a similar approach in clojure won't work?  So, my first temptation to do something similar in clojure was like this:

(for [{:keys [company]}  people
         i (range 3 5)]
    (assoc companies i company))

Yields this:
 ({1 "Google", 2 "RedHat", 3 "LinkedIn"} {1 "Google", 2 "RedHat", 4 "LinkedIn"} {1 "Google", 2 "RedHat", 3 "NASA"} {1 "Google", 2 "RedHat", 4 "NASA"})


Okay, so that's not what I want.  There are 2 problems.  The first is that I don't want the nested loop (that's why I get 4 entries).  The second and perhaps more serious is that when the for sequence calls assoc, it's only associating the key-value pair once on the map, and then "forgetting" the change on the next item in the sequence.  Remember, we're not mutating the map as we iterate through it.   So I thought, ok, let me use recur.

(defn addme [coll m id keyname]
  (let [p (first coll)
        val (get p keyname)]   ; get returns nil once coll is exhausted
    (if (nil? val)
      m
      (recur (rest coll) (assoc m id val) (inc id) keyname))))

(addme people companies 3 :company)

While this works, I hope you see that it's not the prettiest thing to look at.  It's also rather verbose.  So I scratched my head a little bit and realized I could use reduce.

Frankly, reduce had always been a little obscure to me.  I had seen it used for things like +, but + can already take multiple args.  So I was never quite clear where reduce would come in handy.  Then it dawned on me....reduce is really recursion with a function that takes two args and it "accumulates" results.

For the moment, forget about the mapping of index to company, and let's look at perhaps a simpler problem.  Here, we can map 1-26 to the letters a-z.  There's always clojure's zipmap function to do this.  If all you need to do is map one collection (as keys) to another collection (as values), it's pretty simple:

(let [alpha-int (range 1 27)
        alpha-char (for [i alpha-int] (char (+ i 96)))]
  (zipmap alpha-int alpha-char))

Another way is to use reduce.  Generally, reduce transforms a sequence into a scalar value.  But if you look at reduce, all it does is take a function that takes 2 arguments and returns some value.  What if the thing that is returned is a collection?  Remember, reduce initially pulls the first 2 items from your sequence, operates on those 2 values, and returns something.  On the next iteration, that return value is then used as the first argument, and _one_ more item is pulled from the collection.  This continues until the sequence is empty.

Clojure's reduce also has a handy form where, instead of pulling the first 2 items from the sequence on the first iteration (really it's recursion), you supply an optional initial value between the function and the collection.  In that case, on the first iteration, the initial value is used as the first argument and only one item is pulled from the sequence.

(defn map-seq [m val]
  (let [offset (+ val 96)  ; \a = 97, so if we have val starting at 1, it will map to \a
        c (char offset)]   ; convert 97 to \a
    (assoc m val c)))      ; return a new map with val associated to c

(reduce map-seq {} (range 1 27))  

Think about what's happening on the first 3 calls

  1. (map-seq {} 1) => {1 \a}  ; (assoc {} 1 \a) => {1 \a}
  2. (map-seq {1 \a} 2) => {1 \a 2 \b}  ; (assoc {1 \a} 2 \b) => {1 \a 2 \b}
  3. (map-seq {1 \a 2 \b} 3) => {1 \a 2 \b 3 \c}  ; (assoc {1 \a 2 \b} 3 \c) => {1 \a 2 \b 3 \c}
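If you'd rather watch those intermediate maps than work them out by hand, clojure's reductions returns every accumulated value instead of only the last one.  A small sketch, with the reducing function written inline so the snippet stands alone (steps is just a name I picked):

```clojure
(def steps
  (reductions (fn [m n] (assoc m n (char (+ n 96))))  ; same idea as map-seq
              {}                                      ; the initial value
              (range 1 4)))                           ; 1, 2, 3

(println steps)  ; ({} {1 a} {1 a, 2 b} {1 a, 2 b, 3 c})
```

The first element is the initial value itself, and the last element is what reduce would have returned.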


So, getting back to our other problem, how would we use reduce?  As demonstrated above, we need to create a function to pass to reduce that takes 2 arguments.  The first argument is the map, and in this case, the second argument is a 2 element vector.

; A simple function that takes a map, and a collection that is a key-value pair
(defn add-to-map [m coll]
  (let [[k v] coll]
    (assoc m k v)))

; Takes a sequence of maps, looking for a value in the map, and returns a
; mapped sequence
(defn make-indexed [keyname coll & start]
  (let [[s] (if start
                   start
                   [0])
         vals (for [m coll] (m keyname))]
    (map #(vector (+ s %) %2)  (range)  vals)))
  
(reduce add-to-map companies (make-indexed :company people 3))

Okay, some of you may be thinking that's more verbose than the recursive function addme.  However, there's an advantage to breaking this up into subfunctions and using reduce.  Those 2 subfunctions, add-to-map and make-indexed, can be used in other scenarios.  In fact, add-to-map can be used like zipmap:

(let [pairs (map #(into [] [%1 %2]) (range 1 27) (for [c (range 1 27)] (+ 96 c)))]
  (reduce add-to-map {} pairs))
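As an aside, the original companies problem can likely be squeezed down further with into and map-indexed, since into is itself built on reduce.  This is only a sketch.  The people and companies data are repeated from above so it runs on its own, and updated is a name I made up.

```clojure
(def people [{:first "John" :last "Doe" :age 42 :company "LinkedIn"}
             {:first "Harry" :last "Smith" :age 31 :company "NASA"}])

(def companies {1 "Google", 2 "RedHat"})

;; map-indexed hands us a 0-based index, so shift it to start at 3.
;; into then adds each [id company] pair to a new map.
(def updated
  (into companies
        (map-indexed (fn [i person] [(+ 3 i) (:company person)])
                     people)))

(println updated)    ; {1 Google, 2 RedHat, 3 LinkedIn, 4 NASA}
(println companies)  ; {1 Google, 2 RedHat}
```

The second println shows that the original companies map is left untouched.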