Tuesday, July 10, 2012

Getting back into C(++)

It's been a crazy couple of weeks.  I started my new position as a Linux driver developer on June 18th, followed shortly by the Colorado Springs wildfire.  But, once again, I get to dive deep into native programming and get a better understanding of exactly how the Linux kernel works.  Unfortunately, my C(++) skills have gotten rusty over the last 4 years or so.

It's not like I have written zero C(++) programs, but they have been few and far between.  So for the last few days while I've been at home, I've been re-reading Bruce Eckel's free book on C++, Thinking in C++.  The reason I am reading this, as opposed to my old copy of C: The Complete Reference, is that A) I need to get better at C++ and B) Bruce Eckel's book compares and contrasts many of the differences between C and C++.

As for the second point, I like that this book helps me kill two birds with one stone.  It highlights many of the loosely typed features and pitfalls of the C language, and what C++ does to overcome them.  By covering how C would do something and then contrasting it with the C++ way, it's kind of like getting a refresher on C while learning C++.  Because of this, it is recommended that you know the fundamentals of C before reading Bruce Eckel's book.  I still highly recommend it because it's one of the few books that actually talks about header files and what inclusion guards are, and gives at least a brief look at how object files are linked and what linking is for.  Bruce also wrote a 2nd volume, which includes a more thorough examination of some advanced topics in C++ (e.g., templates and the STL).  I also bought some notes from Scott Meyers on C++11, since I think if I am going to write in C++, it may as well be the newest version with some of the included goodies.
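For anyone who hasn't run into inclusion guards before, here is a rough single-file sketch of the idea.  The file names (rect.h, rect.cpp) and the Rect type are made up for illustration and are not taken from the book.  The header's contents are pasted in twice to mimic what happens when two different #include chains pull in the same header.

```cpp
// ---- contents of a hypothetical header, rect.h ----
#ifndef RECT_H   // first inclusion: RECT_H is not defined yet, so keep going
#define RECT_H   // mark the header as seen

struct Rect {
    int width;
    int height;
};

// Declaration only. In a real project the definition lives in a separate
// .cpp file, and the linker connects callers to it when the object files
// are combined into the final program.
int area(const Rect& r);

#endif  // RECT_H

// ---- the same header pulled in a second time (say, via another header) ----
#ifndef RECT_H   // already defined, so the preprocessor skips to #endif
#define RECT_H
struct Rect { int width; int height; };  // without the guard: redefinition error
int area(const Rect& r);
#endif  // RECT_H

// ---- contents of a hypothetical rect.cpp ----
#include <cassert>

int area(const Rect& r) {
    assert(r.width >= 0 && r.height >= 0);  // sanity check on the inputs
    return r.width * r.height;
}
```

Deleting the two guard lines from the second copy should make the compiler complain that Rect is defined twice, which is exactly the problem the guard exists to prevent.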

As to the first point listed above, I need to get better at C++ because LLVM is written in C++.  I have been somewhat concerned to learn that the lead programmer (and probably others) for the LLVM project is paid by Apple, and thus the Mac is getting all the love (for example, LLDB only works on Mac OS X, and libc++ likewise is OS X only).  I have no love for Apple.  (Flame me all you want, but their tyrannical control is almost the worst of any big company I know of.  As the article I linked to indicates, I feel this way because I am all for freedom, even if that freedom requires a learning curve.)  I find it ironic that in this country that is supposed to love freedom so much, we are willing to give up so much of it to corporations who dictate to us how things will be, or who make it "just work" even if making it just work takes away my freedoms.  If you are going to argue that the "free market" handles this by letting consumers choose the company they want, tell that to all the litigation-happy companies that use absurd patents to enforce their way of doing things.


But I will stick with LLVM for Shi, since it is still an open source project.  Yes, I am still working on it, albeit very slowly.  I'm currently alternating between 3 books: a book on compiler design, the SICP book, and a book on comparative programming languages.  Not to mention getting back up to speed on the Linux kernel and familiarizing myself with Gambit Scheme.  Right now though, I am focusing on lexical analysis, or the ability to discern tokens from a text stream.  I decided to go all in on C++ for the lexer, so I've also been looking at using Boost's Regex library to help me do this.  The little tutorial that the LLVM project gives is just way too trivial, so I'm just going to plow through the Basics of Compiler Design book.
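To make "discerning tokens from a text stream" a bit more concrete, here is a minimal sketch of a regex-driven tokenizer for a Lisp-like input.  It is only an illustration under some loud assumptions: it uses std::regex from C++11 as a stand-in for Boost's Regex library, and the token kinds, the Token struct, and the tokenize function are all hypothetical names, not Shi's actual lexer.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical token categories for a tiny Lisp-like language.
enum class TokenKind { LParen, RParen, Integer, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits the input into tokens. Whitespace never matches the pattern, so
// the iterator simply steps over it between matches.
std::vector<Token> tokenize(const std::string& input) {
    // Capture groups: 1 = "(", 2 = ")", 3 = integer, 4 = symbol.
    // Alternatives are tried left to right, so "-5" is an integer while a
    // lone "-" falls through to the symbol alternative.
    static const std::regex pattern(R"((\()|(\))|(-?[0-9]+)|([^\s()]+))");

    std::vector<Token> tokens;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), pattern);
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {
            tokens.push_back({TokenKind::LParen, m.str()});
        } else if (m[2].matched) {
            tokens.push_back({TokenKind::RParen, m.str()});
        } else if (m[3].matched) {
            tokens.push_back({TokenKind::Integer, m.str()});
        } else {
            assert(m[4].matched);  // the only alternative left
            tokens.push_back({TokenKind::Symbol, m.str()});
        }
    }
    return tokens;
}
```

Calling tokenize("(+ 1 (* 20 3))") yields nine tokens: the parentheses, the symbols + and *, and the integers 1, 20 and 3.  A real lexer would also need strings, comments, source positions and error reporting, and this pattern is sloppy about input like 12abc, which it splits into an integer followed by a symbol.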

Fortunately, the R7RS draft already has a pseudo context-free grammar, so that will help me figure out what to tokenize.  Of course, Shi isn't going to be R7RS compliant; I just want to use that as a starting point.  I actually intend Shi to be more grammatically similar to Clojure, as I find that syntax easier to read than Scheme.  I also like the type hints (annotations) from Clojure better.  But of course, the biggest thing for me is going to be the ability to generate native code on the fly via LLVM/Clang, so that I can call libraries dynamically without having to do any weird data marshalling.  My thought is to be able to essentially #include header files, which is one of the reasons I am looking at Gambit Scheme right now: to see how it compiles Scheme code to C code.