Monday, April 30, 2012

Making a Scheme...the grand plan

Man I hope I don't jinx myself, but there's a pretty good chance I might be working in a new position soon and I won't be an SDET anymore.  Lest Mr. Murphy come and visit me, I won't say what I might be doing until all the i's are dotted, and all the t's crossed.  But I will say that I will be doing a lot more low-level coding again.

So, it looks like once again, I'll be switching focus on what I'll be doing for my hobby time.  I'll be getting knee deep into linux internals again and need to brush up on my C.  I'm also going to spend a little more time looking at the Minix source code.  But especially, I'll be looking at LLVM and Scheme, specifically PLT Racket.  Why all of this?  And isn't Scheme a higher level language?

Let me start with why I am looking at Scheme now.  Although Clojure is a pretty cool language (especially the Software Transactional Memory, which I haven't seen an equivalent of in any other Lisps), it's still in Virtual Machine land.  And unfortunately, Java isn't all that great at interfacing with low-level C shared libraries or OS API's.  That's where Scheme comes in.  There are a couple of flavors of Scheme out there that have the ability translate into C code (for example Gambit Scheme and Chicken Scheme).  And although both of those Schemes look kind of cool, I am currently looking at PLT Racket (formerly known as PLT Scheme).

I'm even reading a bit of the R6RS standard, in order to help me wrap my head around Scheme a little better (Clojure, though a relative of lisp, seems to me to be neither truly a Common Lisp nor Scheme derivative...in essence, it seems to be its own branch on the lisp family tree along with language like Shen).  Although python is nice, and it is pretty easy, I want to learn a new higher level language that will let me interface with C more easily.  Scheme fits this bill, but it does have a drawback that I mentioned above....no easy concurrency support (and yeah, I have read that continuations can be used for a kind of parallelism, but I am not sure it can be used for concurrency).

So this is where LLVM fits in.  LLVM is a set of libraries which provides a front end (convert source code to the AST), optimizer and code generator (to generate the actual binary machine code for a specific architecture)  for a programming language.  LLVM is an interesting project, as it aims to help people write compilers, interpreters, or even JIT/VM's.  It can do so by providing a "universal" Interface Representation called the LLVM IR (I like to think of it as a universal assembly).  One could even create a language (it has lexers, scanners and parsers as well) with it.  And LLVM makes it trivial to call into the C ABI.

Do you see where I am heading with this?

R6RS Scheme standard
LLVM to create a JIT'ed language that can easily interface to C

There's one last piece of the puzzle...concurrency support.  Right now, the rage seems to be Erlang style message passing (Actors) for concurrency, but STM is gaining a lot of traction (Scala is implementing not one, but three STM libraries, haskell has 2 STM implementations, and pypy is experimenting with STM support).  I found a pretty interesting article by a C++ guru Bartosz Milewski on STM, including a link to an academic paper on Transactional Locking II.

Previously, I had toyed with the idea of implementing Clojure style syntax in D.  But the more I think about it, I realized it failed to satisfy several goals:

1) The stable dmd compiler only supports x86.  I want to work with  ARM processors
2) The front end to dmd2 is not open source
3) The LLVM based compiler ldc is not stable for D and is lagging behind
4) While it would make adding support for D modules easy, my real goal is support for C libraries
5) I would have to master D syntax and lisp/scheme

So I decided to something even harder:
1) Learn LLVM, including garbage collection and JIT byte generation
2) Learn how to make a lisp reader (it will be a LLVM based app )
3) Figure out how to implement STM in all of this
4) Slowly add in R6RS requirements (but not all)


Yeah yeah...I only have so much hobby time.  But creating a language is really something I've wanted to do for a LOOONG time.  I am getting close to paying off all my debts, which means I will be able to go and start on my Master's degree fairly soon.  And I really want to be able to design my own language.  Yup, I know, it's a dream of a lot of people, but I still think it would be cool.  But I also want it to be practical.

Some of the things that appealed to me about Clojure was that:
1) The syntax seemed easier for me to read that other lisps and schemes (more than just parens)
2) The appeal of built-in concurrent programming via STM
3) More scheme-like functional approach (immutability as the default for example)
4) Supports both a JIT bytecode generation, and a AOT compiled mode

So I definitely want to keep these in the toy language I will create.  But of course, there's one big hole in Clojure...good C/C++ library support.  Where Clojure can easily interface with Java, I want this language to easily interface with C (and ideally in both directions...but that might be too hard to support for now).  As a consequence, I want the language to support both a JIT'ed and AOT compiled (native) mode.

This is obviously a monumental task.  But I think it will force me to become a better Computer Scientist.  I will have to get better at many of the fundamentals of CS.  And yes, you DO need the things you learn about in school at work (choosing the right data structures, understanding complexity analysis to find poor algorithms, the ability to prove your solution is correct, etc etc).  Moreover, I am a firm believer that having learned several languages, and more importantly, different styles of programming has made me a better programmer.  When I was interviewing for my new position (at my own company), one of the interviewers noticed that I had Clojure down on my resume.  He seemed impressed and curious at the same time (he had heard of Clojure, and did a tiny bit of elisp, but thought lisp was too hard).

My feeling is that all the naysayers of lambdas in the upcoming Java 8 will eventually see their usefulness, instead of just decrying them as a "me too" feature for Java to catch up with C# on the bullet point list.  But, much to my surprise, many engineers are loathe to change.  I guess that's just one reason I consider myself a scientist rather than an engineer (engineers after all want stability, but scientists, in their quest for truth must be willing to give up the old in order to learn the new).

No comments:

Post a Comment