Sunday, March 13, 2016

Haskell, Java 8, and calculus

I bought the Haskell Programming book last week, and though the price tag was pretty hefty, it's been a good read so far.  It is the first book I have seen that actually covers the basics of the lambda calculus.  After that cursory examination I now understand currying much better, and the basis of Lisp also makes more sense.  I also discovered that Haskell can look a lot like Lisp.

((+) (length [1,2,3])
     (length [x | x <- [0..20], ((==) (mod x 2) 0)]))

That basically adds the lengths of two lists.  You do need to convert some infix operators to prefix form (e.g. (+) or (==)), but otherwise, that's very lispy.  I'm sure a real Haskeller would cry at that code, but honestly, I don't see anything wrong with it :).  I'm also getting a better grasp on Haskell's types, namely the difference between type constructors, data constructors, and typeclasses.  If only C++ templates or Java generics were like Haskell's type classes!

Type classes are the number one thing I miss in Clojure.  As cool as Clojure is, the lack of type information still bothers me.  The little bit I've looked at core.typed leaves me wishing for a stronger type system that can handle type parameterization better.  There are also the limitations of the JVM itself: for example, the lack of tail call optimization, continuations, lightweight threads, or the generation of native binaries.

I've often said that Clojure will be the gateway drug to Haskell.  I think Clojure really kickstarted a lot of people's interest in functional programming (even more so than Scala, which has a hybrid approach and didn't really require people to use immutable, persistent, or lazy data structures).  I can see Haskell taking a prominent place in my programming future, but Clojure is still an interesting language, and I'll be using it for any JVM-related project where I can (though I'll be exploring Frege too).  I think Clojure will be in my toolbox kind of like Python (should be)...a tool to quickly hash out an idea, and then re-implement in a fully typed language.

Another thing that's caused Clojure to lose a little bit of lustre for me is some recent problems I've been having with pheidippides.  Recently, I removed the Java bits of my code, because I discovered a way to work around a bug I had found in Clojure.  Everything seemed to work ok when a single Module sent just one message to the Controller.  But if I sent another message (either from the same Module or a different one), my Clojure code would sit in a while loop like this:

(while (.hasRemaining buff)
    (.read chan buff))

The buff is a one-byte buffer.  It was meant to read in the first byte of a message (the opcode) and, given that, determine how many more bytes to read.  The Selector had already determined that the channel (the chan var there) was in the ready set.  Now, according to the documentation, it's possible for a SelectionKey to be selected even though there isn't actually any data to read.  So I double-checked by calling (.isReadable sel-key) in a previous function, where sel-key contains the channel I'm reading from.  Since that returned true, the channel should have been readable, and there should have been at least one byte in it.  But for some reason, my Clojure code would spin in that while loop forever.

However, when I resurrected my Java code...which ironically calls the very same Clojure code...it worked.

public static PersistentVector test(SocketChannel chan) {
    // Load the Clojure namespace, then look up and invoke the very same
    // message-reading function the pure Clojure path uses
    IFn require = Clojure.var("clojure.core", "require");
    require.invoke(Clojure.read("pheidippides.messaging.core"));
    IFn getMessage = Clojure.var("pheidippides.messaging.messages", "get-chan-msg");
    PersistentVector msg = (PersistentVector) getMessage.invoke(chan);
    System.out.println("Opcode is " + msg.get(1).toString());
    return msg;
}

Why did the Java class invoking the same Clojure function work, while calling that function directly from Clojure did not?  Well, it turns out I forgot to do something in my Clojure code that I was doing in the Java version.  In the Java version, I used an Iterator to walk through the selected-key set, and as I handled each event I called iter.remove().  In the Clojure code, rather than walking the set directly with an iterator, I was using doseq, but at the end I forgot to clear the set.  That meant the selected-key set was growing every time a SocketChannel sent data to the Controller.  The NIO Selector expects you to remove keys from the selected-key set yourself once you've handled them, which means the set is mutable.
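
For reference, this is roughly what the fixed Clojure event loop looks like.  It's only a sketch: poll-events is a made-up name, and handle-key stands in for whatever actually reads and dispatches the message.

(import '[java.nio.channels Selector])

;; Block until at least one channel is ready, handle each selected key,
;; then clear the selected-key set so the same keys aren't handed back
;; on the next select.
(defn poll-events [^Selector selector handle-key]
  (when (pos? (.select selector))
    (let [selected (.selectedKeys selector)]
      (doseq [sel-key selected]
        (when (.isReadable sel-key)
          (handle-key sel-key)))
      ;; This is the step I had forgotten: the Selector never empties this
      ;; set itself, so without the clear it just keeps growing.
      (.clear selected))))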

Basically, although writing pure Clojure is nice, as soon as you have to reach down into Java, things get dirty.  Since I was using Java's nio.channels, I had no choice but to deal with its mutable data.  Indeed, looking at my Clojure code, I can see how imperative it looks.


One reason I'll continue to use Clojure is that, for better or worse, the JVM is the world's number one platform (though JavaScript is mounting a big offensive on that front).  The JVM is continually improving, and in the Da Vinci project they are working on things like TCO, value types, and even the generation of native executables (though according to the JVMLS of 2015, that feature will only allow statically linked executables, and it will be a premium feature not in Java SE).  Since so many projects use the JVM in one form or another, it pays to understand the JVM and JVM languages.

In fact, at work, I needed to grab all the classes that had TestNG's @Test annotation applied to them.  I'll get to that in a later post.  But I will say a few things:

  • Java lambdas take getting used to, since they are in essence type-inferred implementations of functional interfaces
  • Java lambdas don't allow mutation of captured local variables (they have to be effectively final)
  • The stream API gives you a way to build new data structures instead of mutating existing ones

Although it was a little funky getting used to Java 8 lambdas and the new stream API, at least Java does seem to be heading in the right direction.  Unfortunately, there's just a ton of impure stuff, and it can make things a pain.

The other thing occupying my time is refreshing my calculus.  About three weeks ago, there was a sale on Udemy and I bought a few courses for $24, including three for calculus and one for linear algebra.  Why go back over calculus?  I'd eventually like to study for the GRE, and my math is pretty rusty.  Plus, Haskell has just gotten me back in the mood to get better at math.

I've always felt it was a shame that I minored in math and yet, in my career, I've basically never used it.  I think people tend to denigrate languages like Haskell for being too academic and not "real world" enough, but at least there is scientific rigor to what they are doing.  I sometimes feel that "software engineering" is a misnomer.  "Programmers" are just churning out code that hopefully works, without much analysis of what's really going on.  As Rich Hickey said about simple vs. easy, the programming world wants the "easy" way to do things.  The problem is that very often "easy" is not composable, testable, or really what we need.  The easiness is just a veneer to get something "up and running", but the minute you run into trouble, you have to descend into a sea of madness to figure out what's really going on.

The argument I hear from dynamic-language programmers is that types just get in their way.  But do they?  How many times does a Python programmer have to sit at a debugger and figure out why something isn't working, only to realize they passed in the wrong type?  How many times have you cursed at a null or None because a file you expected wasn't there, or a network hiccup caused a socket timeout?  Of course, all we have is anecdotal evidence, and if we can't prove that type information, the Maybe monad eliminating null conditions, or guarantees about when functions can produce side effects are good...why go through all that trouble?  Plus, it seems that typing out a few extra characters is just too much for some (never mind that type inference can save you a lot of typing).

I truly think we need more rigor in the software engineering world.  To give you an example, at Red Hat, they want us to do more end-to-end scenario testing.  That sounds like a reasonable goal.  After all, passing unit tests doesn't guarantee the code will work in actual usage (perhaps your mock didn't do something the real component would do), and ditto for functional tests.  But why?  Why should a functional test pass, but fail in the real world?  It's because of impurity.  If a functional (or unit) test passes, it's because your mock essentially guaranteed a certain result.  In effect, your mock made your test more pure.  But the real world gets in the way.  Network connections time out, files disappear, a hardware resource becomes unavailable, a module changes some singleton or other global variable, or some part of your data simply mutates due to something else.

In a pure functional language, if a functional test passes but the real thing fails, you know the problem has to be in one of your effectful (monadic) functions.  This helps pinpoint the problem much faster.  Good luck tracking down the problem if your language doesn't explicitly point out which functions are pure and which are not.

Since Haskell is essentially mathematical, I figure getting better at calculus and linear algebra will put me back into a better frame of mind.  To be honest, calculus and linear algebra actually help in the real world too.  I feel like, with what I've been doing lately, all I'm doing is pushing ghostly bits in the electronic ether.  Hopefully one day I can actually do something that will provide real-world value.


Sunday, February 28, 2016

Pheidippides and a new blog

I didn't realize it had been so long since I'd written a blog post.  I am going to try to correct that and write at least two posts a month.  I've been up to a few things, so I'd like to talk about them as things progress.

The first thing, though, is that I've been at work on a messaging system I've named pheidippides.  It's partially a learning project and partially something I can use at work.  In essence, it's a system and protocol that will handle both RPC and pub/sub style messaging.  Both will be handled in an asynchronous and decoupled manner.  Pheidippides is all about sending messages from one Module to another.

Encoding:
Rather than use JSON or XML to represent a message, pheidippides will use the binary-encoded msgpack format (a quick round-trip sketch follows below)

Protocol:
Instead of using HTTP with REST-style mechanics, pheidippides will have its own protocol for interpreting messages

Transport:
While the first focus of pheidippides will be on TCP/IP, it will be abstracted enough to handle native (in-memory) transport, and some investigation into supporting dbus is also being done
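
Here's the kind of round trip I have in mind for the encoding.  This is just a sketch: it assumes the clojure-msgpack library (msgpack.core's pack and unpack), and the message shape is made up purely for illustration.

(require '[msgpack.core :as msg])

;; A made-up register message; the real pheidippides message layout
;; will be defined by the protocol, not by this map.
(def register-msg {"op" 1 "module" "disk-monitor"})

;; pack returns a compact byte array, much smaller than the equivalent JSON
(def packed (msg/pack register-msg))

;; unpack turns the bytes back into the original data
(def decoded (msg/unpack packed))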


Why yet another messaging framework?

Partially, as I stated above, pheidippides is a learning project.  I've been curious about messaging frameworks and all the hoopla surrounding them.  From a more pragmatic standpoint, I wanted to replace our test framework's rampant use of SSH calls to get things done.  I've also been considering making something akin to Ansible or Puppet, but using pheidippides as the means to "get stuff done".  Also, at work, we make heavy use of dbus messages, and I think I can make an adapter from pheidippides messages to dbus messages (given enough time).  If I can do this, it would be a huge advantage to have a bridge between dbus and pheidippides (I'm gonna call it pdd for short from now on...reminder to self...make a shorter name for projects).  By having this bridge, any Module hooked into pdd would also get dbus events.  I'm also not unfamiliar with low-level protocols: at my first job, we basically had a router with attached services, and at LSI, we had a message-passing system between the (user library) driver and the firmware.

Also, most other messaging systems fall short for one reason or another.

HTTP w/REST:  It's still a request/response (synchronous) way of doing things.  Also, there's a lot of overhead with JSON or XML, not to mention all the baggage of HTTP

Websockets:  Websockets solve the synchronous problem of REST, but then you're left implementing your own protocol anyway

AMQP: Comes close, but its RPC mechanism takes a little work, and at least as I understand it, it requires a coupling of caller and callee

WAMP-proto:  I actually just discovered this yesterday, and it comes the closest to what I have been envisioning.  However, it doesn't use a binary encoding of messages, and although they say they will support transports other than websockets, they don't appear to have done so yet.

There's a ton of work I still need to do.  Here's a rough overview I've made.

Right now, pheidippides isn't actually usable.  I just recently got it working well enough to send a simple register message, but the Controller isn't even doing anything with it yet.  But at least I got the encoding/decoding and transport for TCP/IP working.  If you look at the issues list, there's still a ton I need to do.

But I have to say, I'm having quite a bit of fun with this.  I'm learning a lot about networking in Java, as well as NIO2.  I also have an idea to write a macro for something I feel is a shortcoming in Clojure: defrecords can't inherit.  If you have a defrecord A with fields foo, bar, and baz, and you want another defrecord B that has those same fields plus quux, there's no clean way to do it in Clojure without repeating the fields.
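
To make the repetition concrete, here's a tiny sketch.  The field names are just the placeholders from the paragraph above, and defrecord+ is the hypothetical macro I'd like to write, not something that exists today.

;; Today, B has to restate every field that A already declares
(defrecord A [foo bar baz])
(defrecord B [foo bar baz quux])

;; What I'd like to be able to write instead (hypothetical macro):
;; (defrecord+ B A [quux])
;; which would expand to the defrecord B above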

I'm also stoked to get pheidippides up and running to replace all of the SSH calls in our test code.  While SSH calls work OK, for one thing, they are very non-JVM'ish.  Basically, we're coding in bash or shelling out to a subprocess and then scraping the results.  That's just fugly in my opinion.  In fact, our code doesn't even get SSH output (or ProcessBuilder output) in real time (my other library, commando, fixes that, although I've not replaced it in our testware yet).  Eventually, I'd like pheidippides to be the foundation for an agent-based system to provision our test VMs.  The hotness seems to be Ansible, but while Ansible touts not requiring an agent as an advantage, I also view that as a weakness (because Ansible is a push-only system, it cannot get events or messages that are not explicitly called for).

But beyond work, I envision using pheidippides as the basis for a real-time distributed analysis system.  I'm even going to look into Infinispan for the database, as that would also give me a distributed data-grid and distributed execution.

Finally, I'm going to move most of my blog posts to rarebreed.github.io (once I figure out how to use Jekyll).  I like the fact that the posts are just markdown files, and that would let me share code examples and snippets much more easily.