Saturday, May 5, 2012

How do you make a language? Good question...

So here I am, trying to figure out how to make my own scheme-like language, but I am not really sure where to start.  I actually never took compiler theory in school, but Automata Theory was a required class.  But even though I did take automata theory, that was many moons ago, and I have since forgotten a lot about it.  I mean, what DOES it take to create a language anyway?

How much do I need to know about lexers and tokenizers?  Does my language have to be understood by a LALR parser?  An LL parser?  And how do I make a parser anyway?  Do I have to use something like YACC and Bison, or maybe ANTLR?  What about my EBNF forms, how do I know they are complete?  Does the language have to be a context free grammar?  What does context free mean anyway?  Is a context free language different from a regular language?

And these are really more just grammar production and syntax questions.  What about creating control forms, concurrency support, tail-call optimization, etc etc that are features of the language?  Where does one even begin when trying to design a language?

That's why I've decided to look at the R6RS scheme reference as a starting point.  This language (which I am thinking of calling Shi, is the Chinese transliteration for the Pali word Vijnana which very loosely translates to "mind") won't actually be a scheme per se, but it will be scheme-like, just as clojure isn't exactly a lisp or scheme (however, the more I learn about scheme, the more clojure seems like a scheme, since it is a lisp-1 and it is more functional in nature).

However, that will only get me so far.  For example, what other features in the language need to be implemented?  What exactly is the goal of this language?  So I decided to list down some of the things I wanted to implement:


  1. Persistent data structures
  2. Lazy evaluation by default
  3. Dynamically typed by default, but with type hints
  4. Tail call optimized
  5. JIT'ed
  6. Support for continuation passing style
  7. Support for C FFI
  8. Some kind of concurrency support (debating between STM and message passing)
  9. Garbage collected
  10. lisp1 style lexically scoped with one namespace
  11. Hygienic macros only

Some features I'd eventually like to implement (but probably in libraries)

And that's for starters.  The design decisions above will impact the implementation.  From what I've read about LLVM so far, it looks like the LLVM IR will give me support for #5, #7 and #9 above.  It will also provide #4, but only on x86(_64) and PowerPC (but not the ARM...dammit).  #3 will be interesting, I'll have to think about how to do this (I'll probably peek at the Clojure source code, and see how they do this, as I think it's a pretty cool feature).  The immutable data structures will have to be provided at a somewhat higher level.  Although LLVM provides primitives for immutability, this is different from actually implementing persistent data structures.  Often, red-black trees are used to create associative arrays for example, but I still have to figure out how to make the structure persistent.

Even just figuring out what goes into making a programming language has been pretty fascinating so far, and I have only scratched the surface.  Ultimately, what fascinates me is the theory of computation itself, and I hope that creating my own language based on scheme will give me a greater insight into the lambda calculus and computation itself.

No comments:

Post a Comment