Tuesday, May 28, 2013

C++11 multi-threaded data structures and algorithms using waf: Part 1

I decided to kill four birds with one stone.  I realized how rusty my data structures and algorithm analysis knowledge is, so I decided to start writing a project going over some basic and advanced data structures (I'll start with some advanced linked lists, binary search trees, heaps, hash maps, and move on to more advanced structures like RB trees and Tries). 

The thread-safeness is the second bird.  Even the "basic" data structures will be advanced though, because I intend to make all these data structures thread-safe. I bought the book "C++ Concurrency in Action" by Anthony Williams and am on Chapter 3 now.  Although I'm relatively familiar with the synchronization features for the linux kernel (mutexes, spinlocks, semaphores, softirqs, tasklets, workqueues, etc), I need to learn this from an application level.  Writing multi-threaded capable software is notoriously tricky, but it's good for thinking about how everything runs as a whole.

The third bird I am going to kill is the C++ language itself.  C++11 feels like a new language with lots of nice features now.  With the addition of lambdas, I can write in a more functional style, and the auto and using keywords will make using templates easier. I've been spoiled by dynamically typed languages, and even Java.  That doesn't mean I think "thinking like the machine" is good or makes you better.  Unfortunately, it's a conceit I see all too often in low-level programmers.  My motivation to (re)learn C++ is to use LLVM.  In fact, thinking too low-level can be bad.  Why?  Edsger Dijkstra once said, "Computer Science is no more about computers than Astronomy is about telescopes".  Knowing about low-level details means you understand a concrete implementation, but what if the implementation changes (for example Quantum Computers or Lisp Machines)?  Just like how encapsulation in OOP protects you from implementation details, going too deep distracts you from solving whatever your domain problem is.  But if you're writing in a native language, then the machine-level implementation details are often unavoidable (for example, in kernel land, you sometimes have to be concerned even with the Out of Order execution abilities of the processor and put in memory barries, and the new C++11 multi-threaded memory model can also require thinking about this).

The final bird to kill is to learn a new build system.  I've actually already started writing a blog post about it already, so I'll just briefly mention it here.  For me personally, one of the most confusing and irritating parts about writing native code is the build system.  I hate Makefiles.  They are hard to debug and unless you use cygwin or mingw, you can't port it to Windows.  That being said, the build system I am using can generate Visual Studio solution projects, but I think it's better to have everything built by the same compiler.  In this case, I'm going to use mingw.  Why?  For starters, Eclipse can work with mingw and gdb, and thus you need only one IDE (or if you're like me, you can use gdb integrated with emacs).  It also means you only have to learn one ABI (ELF as opposed to ELF and COFF for windows...or whatever it is they use nowadays in Visual Studio).  There are other advantages to the build system I'm using (over CMake) and I'll post them in another blog.


Check out my next post where I'll go over the build system and the preliminary code I have so far.

Ultimately, I will post this project up on my Bitbucket account.  As with my other projects, this is essentially a self-guided tutorial.  I put these projects up so that:

1) I can reference them for future use
2) Others can see what I did
3) Others can use it for their own needs

It'll be a BSD type license.  So feel free to do with it as you please.


No comments:

Post a Comment