Sunday, February 19, 2012

Laziness in clojure. Or "Why didn't I see it was just induction?"

Ever have one of those "A-ha!" moments of clarity?  One of those, "ohh, so that's what Eureka means" events of enlightenment?

So people who have been doing functional programming for awhile, and those more astute than I will probably go, "well, yeah...duh", but I just had a great realization that creating lazy sequences is really just the inverse of recursion.  And what's the inverse of recursion?  Induction.

So, I'm not going to go into what induction is exactly (whether it be mathematically speaking from a weak or strong perspective).

I decided to use a function I've written recursively several times that is of practical benefit (kinda) and turn it into a lazy sequence instead.  Basically this is an interest calculator, given a certain starting principal, an amount that is saved (per year), and an interest.  For my recursive solution, I also give a number of years, and in this case, I walk up to the number of years (starting with year 1).  But for a lazy sequence, this isn't necessary.  This is because, like induction, you don't try to hit the base case.

First, let's look at the recursive solution, then compare it to the lazy one.

(defn savings
  [ base saving interest years ]
  (loop [ b base
          y years ]
    (if (= y 0)
      b
      (do
        (println "year =" y "base = " b)
        (recur (* (+ b saving) interest) (dec y))))))

I did add a little print here so you can see how the interest rate accumulates.  But this is your basic tail-call recursive solution to a problem like this.  You would call it like this:

(savings 0 4000 1.07 10)


Which would calculate how much you would save in 10 years with a starting principal of 0 dollars, saving 4000 a year, and at a 1.07% total interest (yeah, pretty good I admit) for a total of 10 years.

But what about a lazy sequence solution?  And why would you want a lazy solution anyway?  Let's look at the lazy solution first.  In some ways, it is easier.


(defn lazy-savings
  [ base savings interest ]
  (let [acc (* (+ savings base) interest)
        b (+ base acc savings) ]
    (lazy-seq
     (cons b (lazy-savings b savings interest)))))

Notice the call to lazy-seq at the last line from the bottom.  This is what replaces the recursive (in this case implicit recursion...notice I didn't use recur here) call with a lazy one.  Without the recursion being wrapped in lazy-seq, there would be an eager evaluation, meaning that lazy-savings would be called again.  And since there is no base-case as there is with the savings function above, it would eventually blow the call stack.

But why is this solution nicer?  Because it allows you to do more interesting things with it.  Whereas the function savings only returns a single value, lazy-savings returns a seq of values.  Let's call it with the following:


user=> (take 10 (lazy-savings 0 4000 0.07))

You should get this:



(notice the interest rate is only .07 rather than 1.07)

See how you get all the values?  If you wanted the final value, you could have called it with last.

(last (take 10 (lazy-savings 0 4000 0.07)))

Or if you wanted the 5th year, you could have either used (last (take 5 (lazy-savings 0 4000 0.07))) or you could have have done (nth (lazy-savings 0 4000 0.07) 5 ).  In other words, having a lazily generated sequence is more powerful than the recursive version.


So the takeaway is two parts:
1) prefer laziness when possible
2) wrap the recursive part in lazy-seq, and don't worry about ending at the base case

Perhaps the last part bears a little more discussion.  Recursion and induction are really two ways of looking at the same problem.  Both involve a base case...a known condition.  Generally, in an induction, you start with the base case and move forward, showing that if F(n-1) is true, then F(n) is also true (apologies to the math purists, as I can't recall if this is weak or strong induction).  With recursion, you normally "walk backwards" to the base case.  If for example you know that F(0) = 1, and you are solving for F(10), then you apply F(n-1) until you hit F(0) combining all the intermediate results.

Saturday, February 4, 2012

Implementing clojure...in D?

Well, if you read my last few posts, you know I've been looking at a system's programming language called D.  This is kind of a jack-of-all trades programming language, but what I find interesting about D is many of the features that you don't see in other systems programming languages in the C family (C/C++/ObjectiveC).  For example:

*lambdas
closures
nested functions
**const and immutable  (this is actually more secure than Java's final)
tail call optimization (Java might get this in Java 8)
***concurrency support (though no MVCC STM)
garbage collection
lazy evaluation of function args
true float, complex and imaginary numbers (ok, this is in other C family languages)

* C++11x does have support for lambdas, not sure about closures
**C++'s const is a huge confusing pain. D's seems a little simplified
***C++11x concurrency support appears (on my cursory examination) to consist of old-school locks and mutexes albeit in a portable language native fashion

There's also work on a LLVM compiler for D.  This got me to thinking that it might be feasible to implement Clojure in D.  Having LLVM support would enable a JIT compiler for D code, just as clojure emits bytecode on the fly for the JVM.  Having true TCO, a more bare metal approach, imaginary number support, real floating support, and even safer immutability might even give it a leg up on Clojure itself...just as PyPy is even faster than canonical cpython.  Implementing Clojure in C or C++ would be much harder I think, due to those languages lack of certain features.

Now first off, I will be the first to admit that I don't have the brain power to begin such a project, though if someone else took up the mantle, I would gladly help.  I simply don't know enough about compilers, automata theory, grammars, AST, lexers, parsers, scanners, etc to go about creating my own language.  It's always been a dream of mine...but I just don't have enough knowledge on the subject to go about doing it.  When I finally pay off all my original school loans and finally try to get my Master's degree, I'll think about this as a project.  But for now, it's just a nice fantasy.

You might be asking why I don't think clojure, as-is, is good enough.  While high-level languages are great for building applications that essentially just get, manipulate and update data in one form or another, when you have to get to the metal and talk to the OS, they really are not all that hot.  When you have need to get at drivers or system information, high level languages like Java or C# (or even languages like python or ruby) will leave you feeling frustrated.  Since I work as an SDET for a company that builds SAS controllers, I routinely have to deal with low-level issues at the driver level (or even firmware level...of course at that level, you're pretty much stuck with C/C++ or assembly).

While tools like SWIG or jnaerator helps, it leaves a lot to be desired.  I would LOVE to have a language with the expressive power and flexibility of Clojure, but with the ability to do low level calls with our many C/C++ libraries.   Yes, I am aware of JNA, bridj, and SWIG.  I've even played with HawtJNI a little.  While they are nice, dealing with callbacks or going the opposite direction (from C calling Java) is problematic.  That's why hand rolling JNI code, despite its difficulty, is in some ways still the best option.

Now admittedly, D doesn't natively understand header files, so it won't be a drop in replacement.  But since D understands the same data types, it doesn't look like too much of a stretch to convert header files to D (though admittedly, tedious).  For example, Java's lack of unsigned data types kills me when I do JNI (not to mention how much of a pain it is).  Python's ctypes is probably the easiest of the high-level languages to muck around with C shared libraries, but it is of course slow (though PyPy is helping in that area enormously).


This idea really has my brain itching, and I wish I knew more about how to get started (not to mention have the time to do it).  Not only would I have to learn all the aforementioned things about automata and compiler theory, I would have to basically become a guru in D and Clojure.  I've only scratched the surface of Clojure (I still haven't played with protocols or multimethods, and I've only made one toy macro).  And I am just now starting to learn D, and I can't wait for the book by Alexandrescu to arrive.


UPDATE: A thought just occurred to me.  I could start writing some of the persistent data structures in D, like clojure's persistent data structures.  This is something I could probably do now, and it would help solidify my understanding of data structures again.  So I am going to go about creating D versions of persistent maps, lists, etc.  I'll have to think about the sequence interface, and how I would implement that in D.  A wrinkle is that Andrei wrote about the disadvantages of only doing forward iterative algorithms (like clojure's seqs).  But I did see some examples in D of creating lazy containers, so at least I know it's possible to implement.

For reference, I will be looking at the clojure source code, and these:

Videocast of Rich Hickey on data structures
MIT's OpenCourseWare class that has a section on persistent structures
Andrei's article on using ranges instead of iterators






Friday, February 3, 2012

Using emacs and leiningen on Windows for clojure pt. 3: Getting SLIME'd

jinspector.clj In the last post, I made a booboo when I made the dependencies for our project.  The logback groupID is actually ch.qos.logback, not just qos.logback.  So make sure you change your project.clj file accordingly.


(defproject MyFirstCljProject "1.0.0-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 [org.jboss.netty/netty "3.2.6.Final"]
                 [ch.qos.logback/logback-core "0.9.30"]
                 [ch.qos.logback/logback-classic "0.9.30"]
                 [org.slf4j/slf4j-api "1.6.3"]])


Let's add our first source file.  Notice in your MyFirstCljProject, there's a src directory, and under that, there is a MyFirstCljProject directory.  Leiningen is kind of like Maven in that the src directory is the root folder for your package.  Leiningen, by default through the new command, created your top level MyFirstCljProject directory.  This is the first level element to your clojure package.  Still confused?

Let's say you wanted to create a namespace like:

MyFirstCljProject.jinspector

That means you would find a  ../src/MyFirstCljProject/jinspector.clj file.  As another example, imagine you wanted a namespace of tools.networking.netty-client.  When you have - in the namespace, you MUST have an underscore in the actual name of the file.  So you would have a folder path of:

../src/tools/networking/netty_client.clj.

So now that we've got some namespacing under our belt, let's actually write that file.  Let's create our own version of a Java class inspector.  There is a clojure-contrib.repl-utils which does what I'll do here, but this is just for illustrative purposes.  Open a file in emacs by using C-x C-f, and opening the C:/users/your_name/MyFirstCljProject/src/MyFirstCljProject/jinspector.clj


(ns MyFirstCljProject.jinspector)


(defn inspect
  [ x ]
  (let [cl (class? x) x (.getClass x)
        fields (.getFields cl)
        pfields (.getDeclaredFields cl)
        methods (.getMethods cl)
        pmethods (.getDeclaredMethods cl)
        p (.getSuperclass cl) ]
    (doseq [ x (concat fields members) ]
      (println x))))

Now that we have our method, let's actually try and use it...from SLIME!  The first thing we have to do is actually start our project.  Now, I've had some trouble using the swank-clojure command, 'clojure-jack-in'.  This command is supposed to allow you to automatically connect to a leiningen project with a slime interface.  Unfortunately, I haven't been able to get it to work.

However, you can start it manually.  While your jinspector.clj buffer is the active one, do a M-x elein-run-task, and when it asks you for the task, enter 'swank'.   Once you do that, you should have something like this:


Notice how the new buffer says "Connection opened on localhost port 4005"?  That's our queue that we can use slime to connect to the swank-clojure plugin.  You can do that with the M-x slime-connect command.  It will ask you for an IP address (use the default 127.0.0.1...that's your local machine), and also hit enter again to use the default port of 4005.  Once you do that, it should look like this:


Now we can begin playing with the clojure REPL!!


Looking at D language again.

Seriously, I need to stop reading so much on the web.  I've got the attention span of a 5yr old in a candy store when it comes to programming languages.  Somehow, I don't know how, I stumbled upon GLFW, which is yet another library aiming to help with OpenGL (like libSDL or freeglut or SFML).  It turns out it natively supports D, and that got me to thinking.   One thing led to another, and I came across an interesting article on concu rrency support in D by the renowned author and C++ guru Andrei Alexandrescu.

Another interesting aspect I discovered is the immutable support in D, and how it seems to mirror Clojure's default immutability (although in D, things aren't immutable by default...kind of like Scala, and that lack of default of default immutability has led people to call Scala not truly functional).  However, while Clojure uses Software Transactional Memory as its way of avoiding ugly locks to memory access, D is using a different approach to its memory model.  For example, by default, threads do not share data, and this has to be explicitly made so.  It seems that D is using the more tried and true message-passing approach made most popular by erlang. It also appears that at least for the PyPy developers, that STM may able to remove the GIL in Python, so I wonder if the message passing model is the best to use, given that clojure is using STM, and even Scala is adding a STM approach to concurrency above and beyond their actor model.

So now I am all the more interested in D.  It's basically a more modern alternative to C/C++...even C++11x.  While C++11x does natively support threads, it still seems to be using the traditional, "lock memory access with mutexes" approach.  Templates seem easier, and of course, there's built in garbage collection.

Since I am just starting, I am seriously considering using D and GLFW instead of C++ and SDL.  SDL does look more advanced than GLFW however.  Plus, as I mentioned before, I don't want my C++ skills to rust away.  But D just looks a LOT nicer than C++ now.