Saturday, February 4, 2012

Implementing Clojure...in D?

Well, if you read my last few posts, you know I've been looking at a systems programming language called D.  This is kind of a jack-of-all-trades programming language, but what I find interesting about D is how many of its features you don't see in other systems programming languages in the C family (C/C++/Objective-C).  For example:

*lambdas
closures
nested functions
**const and immutable  (these are transitive, so this is actually stricter than Java's final)
tail call optimization (Java might get this in Java 8)
***concurrency support (though no MVCC STM)
garbage collection
lazy evaluation of function args
true float, complex and imaginary numbers (ok, this is in other C family languages)

* C++11 (formerly C++0x) does have support for lambdas, and their capture lists let them act as closures, though nothing keeps a variable captured by reference alive
**C++'s const is a huge confusing pain. D's seems a little simplified.
***C++11 concurrency support appears (on my cursory examination) to consist of old-school locks and mutexes, albeit in a portable, language-native fashion
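
To make a few of the items in that list concrete, here is a minimal sketch that touches nested functions, closures, a lambda, immutable data and a lazy argument. It assumes a reasonably recent D2 compiler, and the names makeCounter and logIf are made up purely for illustration.

```d
import std.stdio : writeln;

// Nested function + closure: `next` is nested inside makeCounter and
// captures `count`. Returning its address yields a delegate, and the
// captured frame is kept alive by the garbage collector.
int delegate() makeCounter()
{
    int count = 0;
    int next() { return ++count; }
    return &next;
}

// Lazy argument: `message` is evaluated only if it is actually used.
// (Hypothetical helper, not a standard library function.)
bool logIf(bool enabled, lazy string message)
{
    if (enabled)
        writeln(message);
    return enabled;
}

void main()
{
    auto counter = makeCounter();
    counter();
    writeln(counter());              // the second call sees the captured state

    auto square = (int x) => x * x;  // lambda
    writeln(square(7));

    immutable int[] primes = [2, 3, 5];
    // primes[0] = 4;                // would not compile: the data is immutable
    writeln(primes.length);

    int evaluations = 0;
    string expensive() { ++evaluations; return "built the message"; }
    logIf(false, expensive());       // the argument expression is never run
    writeln(evaluations);
}
```

The lazy parameter is the part with no direct C-family equivalent: the call site looks like an ordinary call, but the compiler wraps the argument expression in a delegate and only runs it on demand.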

There's also work on an LLVM-based compiler for D.  This got me thinking that it might be feasible to implement Clojure in D.  Having LLVM support could enable a JIT compiler for D code, just as Clojure emits bytecode on the fly for the JVM.  Having true TCO, a more bare-metal approach, imaginary number support, real floating-point support, and even safer immutability might even give it a leg up on Clojure itself...just as PyPy is often faster than canonical CPython.  Implementing Clojure in C or C++ would be much harder, I think, due to those languages' lack of certain features.

Now first off, I will be the first to admit that I don't have the brain power to begin such a project, though if someone else took up the mantle, I would gladly help.  I simply don't know enough about compilers, automata theory, grammars, ASTs, lexers, parsers, scanners, etc. to go about creating my own language.  It's always been a dream of mine...but I just don't have enough knowledge on the subject to go about doing it.  When I finally pay off all my original school loans and finally try to get my Master's degree, I'll think about this as a project.  But for now, it's just a nice fantasy.

You might be asking why I don't think Clojure, as-is, is good enough.  While high-level languages are great for building applications that essentially just get, manipulate and update data in one form or another, when you have to get to the metal and talk to the OS, they really are not all that hot.  When you need to get at drivers or system information, high-level languages like Java or C# (or even languages like Python or Ruby) will leave you feeling frustrated.  Since I work as an SDET for a company that builds SAS controllers, I routinely have to deal with low-level issues at the driver level (or even firmware level...of course at that level, you're pretty much stuck with C/C++ or assembly).

While tools like SWIG or JNAerator help, they leave a lot to be desired.  I would LOVE to have a language with the expressive power and flexibility of Clojure, but with the ability to make low-level calls into our many C/C++ libraries.   Yes, I am aware of JNA, BridJ, and SWIG.  I've even played with HawtJNI a little.  While they are nice, dealing with callbacks or going the opposite direction (C calling into Java) is problematic.  That's why hand-rolling JNI code, despite its difficulty, is in some ways still the best option.

Now admittedly, D doesn't natively understand C header files, so it won't be a drop-in replacement.  But since D understands the same data types, including the unsigned ones, it doesn't look like too much of a stretch to convert header files to D declarations by hand (though admittedly, it's tedious).  By comparison, Java's lack of unsigned data types kills me when I do JNI (not to mention how much of a pain JNI itself is).  Python's ctypes is probably the easiest way among the high-level languages to muck around with C shared libraries, but it is of course slow (though PyPy is helping in that area enormously).
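
As a rough idea of what that hand conversion looks like, here is a minimal sketch. The strlen declaration mirrors the real C library function; the phy_info struct is a hypothetical header entry invented for illustration and does not come from any actual driver.

```d
import std.stdio : writeln;

// C:  size_t strlen(const char *s);
// The D declaration just restates the prototype with C linkage.
extern (C) size_t strlen(const(char)* s);

// C (hypothetical header, for illustration only):
//     struct phy_info {
//         uint8_t  phy_id;
//         uint32_t link_rate;
//         uint64_t sas_address;
//     };
// The fixed-width unsigned C types have direct D counterparts.
struct PhyInfo
{
    ubyte phyId;       // uint8_t
    uint  linkRate;    // uint32_t
    ulong sasAddress;  // uint64_t
}

void main()
{
    // D string literals are zero-terminated and convert to const(char)*.
    writeln(strlen("hello"));

    PhyInfo info;
    info.sasAddress = ulong.max;   // no sign tricks needed, unlike Java's long
    writeln(info.sasAddress == 0xFFFF_FFFF_FFFF_FFFF);
}
```

There is no wrapper layer and no generated glue here: the declaration is the binding, and the linker resolves strlen against the C runtime.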


This idea really has my brain itching, and I wish I knew more about how to get started (not to mention have the time to do it).  Not only would I have to learn all the aforementioned things about automata and compiler theory, I would have to basically become a guru in D and Clojure.  I've only scratched the surface of Clojure (I still haven't played with protocols or multimethods, and I've only made one toy macro).  And I am just now starting to learn D, and I can't wait for the book by Alexandrescu to arrive.


UPDATE: A thought just occurred to me.  I could start writing some persistent data structures in D, modeled on Clojure's.  This is something I could probably do now, and it would help solidify my understanding of data structures again.  So I am going to go about creating D versions of persistent maps, lists, etc.  I'll have to think about the sequence interface, and how I would implement that in D.  One wrinkle is that Andrei wrote about the disadvantages of supporting only forward iteration (as Clojure's seqs do).  But I did see some examples in D of creating lazy containers, so at least I know it's possible to implement.
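
As a starting point, here is a minimal sketch of the simplest persistent structure, a singly linked list, exposed through D's range primitives (empty, front, popFront, save) as a stand-in for a seq-like interface. This is not how Clojure implements its lists; the names Node, PList and cons are my own, and the sketch assumes element types without mutable indirections, such as int.

```d
import std.algorithm : equal;
import std.range : isForwardRange;

// One immutable cons cell. Cells are never modified after construction,
// so any number of lists can safely share them.
struct Node(T)
{
    T value;
    immutable(Node!T)* next;

    this(T value, immutable(Node!T)* next) immutable
    {
        this.value = value;
        this.next = next;
    }
}

// A persistent list is just a rebindable pointer to an immutable cell.
struct PList(T)
{
    private immutable(Node!T)* first;

    // Returns a NEW list with `value` in front; this list is untouched.
    PList cons(T value)
    {
        return PList(new immutable(Node!T)(value, first));
    }

    // Range primitives: popFront only moves this view's pointer,
    // it never alters the shared cells.
    @property bool empty() const { return first is null; }
    @property T front() const { return first.value; }
    void popFront() { first = first.next; }
    @property PList save() { return this; }
}

static assert(isForwardRange!(PList!int));

void main()
{
    PList!int nil;
    auto a = nil.cons(3).cons(2).cons(1);   // 1, 2, 3
    auto b = a.cons(0);                     // 0, 1, 2, 3

    assert(equal(a, [1, 2, 3]));            // a is unchanged by building b
    assert(equal(b, [0, 1, 2, 3]));

    // Structural sharing: the tail of b is the very same cells as a.
    auto rest = b.save;
    rest.popFront();
    assert(rest == a);
}
```

Because the list satisfies the forward range interface, the generic algorithms in std.algorithm work on it without any extra code, which is roughly the role the seq abstraction plays in Clojure.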

For reference, I will be looking at the Clojure source code, and these:

Videocast of Rich Hickey on data structures
MIT's OpenCourseWare class that has a section on persistent structures
Andrei's article on using ranges instead of iterators





