Monday, November 11, 2013

Veterans and Memorial Day...a strange feeling

This holiday always stirs a strange feeling in me.  Memorial Day is about remembering the fallen, a somber day to reflect on the sacrifices human beings have made for their fellow man, but Veterans Day elicits a more mixed feeling in me.

My grandfather was a Captain in the Navy, and three of my great uncles served as well.  Even my great aunt Kaye served as a secretary in the Pentagon, and with the government before it was even built.  My other grandfather was not in the military, but he and my oldest uncles sometimes used to go Japanese head-hunting at night as part of the Filipino resistance.  When I was a younger boy, I always wanted to know my grandfather's war stories, but for some reason, I never asked him, even though I was very proud of him.  To this day, I measure myself not against my father, but against my grandfather.

But neither my grandfather nor my great uncles ever told me about WWII.  Perhaps they thought I was too young, or maybe they just didn't want to tell the stories.  My father never knew (or never told me) about their experiences either.  And that was the great mystery.

As a young boy and teen, I had my adolescent fantasies of war, but perhaps not like others.  Where others might have seen glory in war, I did not (as Robert E. Lee once famously said, "It is well that war is so terrible, otherwise we should grow too fond of it").  As a teen, I was fascinated with the Vietnam War.  The more I learned about it, the more ashamed I became of America.  How could we deny a people the right to a democratic process?  Did we honestly believe we were going to stop Communism and its "Domino Effect"?  How could we treat the returning soldiers so cruelly and coldly?

I remember as a teen reading the book "Once a Warrior King"; the author had both of his legs, and one of his arms below the elbow, blown off.  When he got back home with his devastating wounds, a man asked if he had served in 'Nam.  The author said yes, and the man said, "serves you right".  The amount of hatred and rage that boiled up inside me upon reading this was surprising, even to me.  I wanted to literally kick that man in the gonads until he could no longer produce offspring.

About this same time (mid to late 80s), I was into playing role-playing games.  I had already graduated from Dungeons and Dragons at this point, and was playing a post-apocalyptic game called Twilight 2000.  I was also running a Vietnam campaign using the Phoenix Command rules.  By this time, I had already read several books on the Vietnam War.  One of my "hippie" aunts and uncles even gave me "A Bright Shining Lie" by Neil Sheehan to read.  The movie Platoon had also come out around that time and fascinated me.

But my roleplaying sessions were not a youthful idolatry of wanting to be a soldier.  Despite my age, I tried to inject the senselessness of the war, of comrades dying, and to force the other players to face the moral dilemma of shooting children coming at them with grenades.  I remember a tear coming to my eye watching the scene in Platoon where Sgt. Barnes holds a gun to a little girl's head.

So if war seemed so terrible to me, why in the world would I want to re-enact it?  Why was I seriously considering going into the military (early 90s)?

Desire.

Yes, desire or at least infatuation.  I wanted to know the wisdom that these men like my grandfather and great uncles had learned.  As Aeschylus said in Agamemnon (taken from http://wist.info/aeschylus/6209/):

Wisdom comes through suffering.
Trouble, with its memories of pain,
Drips in our hearts as we try to sleep,
So men against their will
Learn to practice moderation.
Favours come to us from gods.

These men saw and experienced things that I will never truly comprehend.  I do not wish to see death or to have to kill others.  Not by a long shot.  But the idea that these men possessed a secret knowledge that I lacked used to consume me.  How could I be truly wise unless I experienced suffering like they did?

But as I got older, I realized that I could experience the pain and suffering of others without having to go through an awful war.  Perhaps that is why I was attracted to Buddhism.  Other religions attract you with promises of eternal life and salvation.  Not Buddhism.  Buddhism offered only one thing: to be free from suffering.

It is interesting that the only path to Buddhism is through suffering.  And once you have crossed the stream just a little bit, you will see that others' pain is your pain too.  It is immense and hard to bear at times.  But then I think about what these veterans have seen, and what little suffering I have experienced pales like a stubbed toe in comparison.

So here is my salute to all veterans, past and present.  It is a strange thing to be in a profession where you hope your skills are never used.


Sunday, October 27, 2013

What to study...real world pragmatism vs. haskell, scheme and CS fundamentals

So there's a lot of stuff to catch up on, mainly because I've been very busy.  I switched jobs back in August and moved to Salt Lake City, Utah.  I went back to being an automation/tools engineer rather than doing linux driver development.  Why?  I'll go over that in another post, as it deserves its own topic.

Currently, I'm mostly doing some heavy-duty and advanced python (metaclasses, lots of decorators, import machinery hooks, and an asynchronous socket-based message system), and pretty soon I'll be working on some C++ code as well.  I've also been working on some messaging frameworks (MOM).  And lastly, I'm still plugging away at some computer science fundamentals.  I haven't touched my data structures project in a while due to the busyness of moving and switching jobs, but I intend to get back on it.

In fact, I've been reinspired of late by reading a couple of books.  The first is the classic SICP.  Although I downloaded a pdf version, it's only in the last few days that I've actually started reading it.  I finally got to the part about normal order evaluation versus applicative order evaluation, something I had never considered before.  In addition to the data structures project I have in C++, I figured I'd start working through the SICP problems and solve them in either Racket or Clojure.
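To make the distinction concrete (my own sketch, not an SICP exercise), you can fake normal order in Python, which is strictly applicative, by wrapping arguments in thunks:

```python
# Applicative order evaluates arguments before the call; normal order
# delays them until (and unless) the body actually needs them.

def explode():
    raise RuntimeError("this argument was evaluated!")

def first_applicative(a, b):
    # By the time we get here, Python has already evaluated both a and b.
    return a

def first_normal(a, b):
    # a and b are zero-argument thunks; b is simply never forced.
    return a()

# Applicative order: first_applicative(1, explode()) would raise.
# Normal order, simulated with lambdas as thunks:
result = first_normal(lambda: 1, lambda: explode())
print(result)  # 1 -- the "dangerous" argument is never evaluated
```

This is essentially the same trick SICP uses to motivate why the two orders can give different answers when an argument diverges.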

While learning more about Scheme, I came across an interesting tutorial/book called "Write Yourself a Scheme in 48 Hours", which is a guide on how to write a Scheme in Haskell.  I started reading that after finding Real World Haskell and Learn You a Haskell online.  I just started getting into Haskell a few days ago, so the syntax is still a little odd to me.  Nevertheless, I found it less confusing than when I looked at it 2 or 3 years ago.  Haskell seemed to come up a lot while I was reading different forums and blogs on Clojure, as there were quite a few comparisons between the two.  I still don't quite understand Monads, but what I have seen so far is pretty interesting.

The idea of writing a Scheme in Haskell intrigued me more and more.  It seems like there is no language with the set of features that I'd like:

  • immutable data structures by default (not in Racket)
  • "native" continuations (not in haskell or clojure)
  • lazy evaluation by default (not in clojure, possible in Racket)
  • STM implementation (AFAIK, not in Racket)
  • lisp style macros and homoiconicity (not truly available in haskell)
  • C(++) ABI compatibility (none with C++; hardest to do in Clojure)
The last bullet point is, in all honesty, probably the hardest, but it should be achievable by using LLVM to build the language.  Building a Scheme in LLVM would be no small undertaking.  C++ is a hard enough language by itself, and learning compiler theory and the LLVM libraries on top of that would be quite a challenge.  I could in theory use Boost.Spirit (a C++ parser combinator library) to build the lexer/parser for the language, but there's a part of me that wants to learn how to make my own parsers.  If I'm willing to forego that last bullet point, though, I could make a Scheme variant written in Haskell, and it would have the best of both worlds (pure functional procedures and data structures, lazy evaluation, STM, and homoiconic lisp-style macros).

While all of this stuff is fascinating, in the back of my mind there's a nagging little voice that keeps saying, "but is this going to help your career?"  I do sometimes wonder if I wouldn't be better off learning Ruby, Hadoop, some web skills, or brushing up on my Java.  After all, that's what employers want.  As I've heard many managers say, they want a candidate who can "hit the ground running".  It is a sad and unfortunate failure of corporate culture to think this way.

Literally on my last day working at LSI (my old company), I got an email from a recruiter at Google saying he would like to set up an interview with me.  I was tremendously flattered (not to mention mad at the universe for the quirk of fate that the opportunity didn't present itself a month earlier).  About a week later, I got similar interview invitations from both Amazon and Facebook.  Although these opportunities were amazing, even had I still been searching for a job, I knew I was not ready for any of these technical giants.

I am not a bad software engineer, but I also believe I lack some strength in the fundamentals.  What do I mean by that?  Software engineering is about more than knowing the intricacies of some programming language, or being a guru at some popular framework or library.  IMHO, there's a vast gulf between being a programmer and being an engineer.  Programmers can churn out code.  It can even be well thought out and planned code.  But it takes a different mindset to know _how_ to tackle a problem in the most efficient and maintainable manner possible.

For example, what are the runtime or space constraints of some algorithm?  Given the choice of some data structure, how will multiple concurrent threads access it?  How do you deal with constantly changing inputs to a program: do you write a regex, a parser, or just rewrite your wrappers every time?

There's a reason Computer Science majors study the basics of computation.  While studying data structures and algorithm analysis is interesting and useful in the real world, there's still far more to it than that.  For example, graph theory is heavily used to solve many problems, from version control systems to finding paths in a complex data structure.  It may seem a waste of time to learn about finite automata, but without them, how do you build regular expressions?

Some might argue that the hard work has already been done.  Just grab a PCRE library, or some built-in library or data structure in your language, and do some real work.  I've even heard some managers complain of a "research project" mentality.  The essence being: if you're not coding, designing, scoping or documenting, you're not working (i.e., there's no room for research).  Quite a few engineers have told me I have a much different philosophy about solving problems than they do.  My philosophy is this: if I already know a way to solve some problem that would take one hour to implement, but I've heard of a more elegant or more efficient approach that would take a day to learn and implement, I'd much rather take the day and do it the more efficient or elegant way.

Now, the trick is, where's the cutoff between practicality/pragmatism and engineering elegance?  Management would usually say it's better to implement the rough way first, and then improve it later.  After all, that's the Agile mantra, isn't it?  Get something working first, then make small modifications over time to improve it.  The problem is, that's a false promise.  The truth is, you rarely ever have time to go back and improve upon a solution, so the old crufty but usable implementation is what stays forevermore.  As my mom and grandfather used to drill into me, "if you are going to do something, do it right the first time".  But that doesn't mean it's a grand cathedral or nothing at all.  You can still spend the extra time to learn the better algorithm but implement it in a simple style.

As a simple example, you might be tasked with searching for a value in some data structure.  The simplest solution is to create an array and search element by element.  Simple, right?  But why not store the values in a binary search tree instead?  Just a tiny bit more difficulty, but instead of O(n) search time, you now have O(log n) search time.  A self-taught programmer might not have any clue about binary search trees (I know I didn't when I was teaching myself C++).  For a more real-world example, if you have a relatively complicated piece of formally structured data, do you solve the problem with a regex or with some other parser?  Take the typical example of getopt-style long options in C (and its variants in other languages).  In python, for example, you might have "-f 10", "--foo 10" or "--foo=10" (and the option might even have a default value, so it may be missing entirely).  So how do you come up with a way to verify that all the command inputs are valid, or to programmatically generate all of them?
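As a rough sketch of that search trade-off, here Python's bisect module stands in for a hand-rolled balanced BST, since both give logarithmic search over ordered data:

```python
# O(n) linear scan over an unsorted array vs. O(log n) binary search
# over the same data kept sorted.
import bisect

data = [17, 3, 44, 8, 29, 51, 12]

def linear_contains(seq, value):
    """O(n): check element by element."""
    for item in seq:
        if item == value:
            return True
    return False

sorted_data = sorted(data)

def binary_contains(seq, value):
    """O(log n): seq must be sorted; bisect finds the insertion point."""
    i = bisect.bisect_left(seq, value)
    return i < len(seq) and seq[i] == value

print(linear_contains(data, 29))         # True
print(binary_contains(sorted_data, 29))  # True
print(binary_contains(sorted_data, 30))  # False
```

The point isn't that you'd write this by hand in production; it's that knowing the second form exists is exactly the kind of fundamentals gap I'm talking about.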

There is perhaps a darker side to that nagging little voice in the back of my head that tells me I should be studying some popular web framework.  On the one hand, I would eventually like to move back closer to my family in Florida.  The problem is, Florida's job market is pretty much either web work (back or front end), databases, or something in the military-industrial complex.  While none of those is particularly bad, I don't have the skills in web or database technologies, and the military-industrial complex is very fickle from what I have heard.  The other insidious aspect of this annoying voice is that learning frameworks and libraries is the easy part.  From my experience, it just takes time and some practice to learn a framework.  But learning the fundamentals of CS is _hard_.  In fact, I've never been as challenged in my work career as I was in school.  I've often found that an interesting conundrum.  Why isn't work as hard as school?

So despite that little voice, I will continue training my fundamentals.  I remember many years ago, I was taking a class in Choy Lay Fut, a "modern" kung fu style that was amalgamated from several other styles during the mid 1800s.  For two months, literally all I did was stances: horse stance, bow stance, cat stance, etc.  But I knew why the sifu was doing that.  Without a solid foundation, it is easy for your opponent to take advantage of you.  Everyone wants to get to the kicks and punches and blocks, but without a solid grounding, it's all useless.  As another martial arts analogy, I remember while taking Aikido, we were told to breathe in, relax, and take a slightly low posture, and the sensei would come by and gently push on the center of our sternum.  It was easy to resist.  He did this several times, and then one time, he very lightly tapped us on the forehead first, and a featherweight touch to our sternum toppled us easily.  Lesson learned: your foundation is all important, and don't let anything distract your mind.

Saturday, October 26, 2013

Something like OSGi for python

Ok, so I have been knee-deep in python again for my new job as an automation engineer at Fusion-io.  Although python is a nice language for small to medium-sized projects, when you start getting into the mid tens of thousands of lines of code, python's "convention over bondage" philosophy starts to wear a little thin.

One problem I have been thinking of tackling is creating an OSGi-like component-based framework for python.  If you aren't familiar with OSGi for Java, it's a modularity system that, among other things, helps resolve CLASSPATH issues and versioning problems between jars, manages the dynamic lifecycle of modules, and helps you design more modular and reusable code.  Perhaps you are thinking that python already has modules and packages, and this isn't necessary.  But hold on, let's examine this little problem in more detail.

It's quite common in the Java world to hear the expression, "program to an interface, not an implementation".  Java, lacking multiple inheritance, has Interfaces to help solve that problem.  But an interface's true strength is that it enforces the set of behavior that something should expose.  The idea is that the implementation should be a black box to the client programmer, who just wants the functionality exposed by the service.

You might, for example, have many different ways of retrieving an object over the net: FTP, SFTP, SCP, HTTP, and so on.  Even amongst these choices there are subchoices.  Do you use python's built-in ftp library?  Or do you have some ftp client utility that you want to wrap using subprocess?  Do you see the problem now?

The implementations change, but they all have a set of common functionality; the interface.  That is what you program to...the interface...not the module or class that exposes the concrete implementation of some functionality.  Being able to code against an interface allows you to swap out the underlying implementation.  Perhaps you are wondering why you would need to do that?

It is a sad fact that requirements are always changing and software breaks.  Perhaps some tool whose output you were screen-scraping changed its output, or the function definition of some remote API you were using changed.  Decoupling the interface from the implementation eases these kinds of problems.  It also allows you to create an extension or plugin based system.

I can already hear some people say that python has no need for a plugin system.  Its dynamic nature makes plugins a "built in" feature of the language.  That's true, but it's also like the wild west.  Even python itself recognized this with the creation of abstract base classes.  By creating an ABC, a python developer can specify that a subclass must override and implement a set of methods.  If you use monkey patching to "plugin" new functionality, there's no way to guarantee that whatever got monkey patched honors the interface's contract.
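Here is a minimal sketch of that point using an ABC; the Fetcher interface and the classes below are hypothetical names of my own, tied to the earlier "retrieve an object over the net" example:

```python
# An abstract base class pins down the interface, so an incomplete
# implementation fails loudly at instantiation time instead of silently
# at call time.
from abc import ABC, abstractmethod

class Fetcher(ABC):
    """Interface: retrieve the bytes behind a location, somehow."""
    @abstractmethod
    def fetch(self, url):
        """Return the bytes at url; the 'how' (ftp lib, subprocess,
        http...) is hidden from the client programmer."""

class HttpFetcher(Fetcher):
    def fetch(self, url):
        # Stand-in for a real HTTP implementation.
        return b"payload from " + url.encode()

class BrokenFetcher(Fetcher):
    pass  # forgot to implement fetch()

f = HttpFetcher()
print(f.fetch("example.com"))   # works

try:
    BrokenFetcher()             # refuses to instantiate
except TypeError as err:
    print("rejected:", err)
```

A monkey-patched "plugin" gets no such check; the ABC machinery is what turns the interface's contract into something python can actually enforce.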

Let's consider a more concrete problem.  Look at the example dependency diagram below:


So we have a module Work.py that has a dependency on Command.py and Logger.py.  The problem is that Command.py also has a dependency on Logger.py, but on a different version.  At some point, Logger went through some kind of API-breaking change, and Work uses the older version, but Command uses the newer one.  How can you load both versions of the Logger module in python?

PYTHONPATH won't work (nor will a .pth file).  Neither will locally installing Logger.py to different locations (instead of site-packages).  You can't, for example, sys.path.extend(["/path/to/1.1/Logger.py", "/path/to/2.0/Logger.py"]).  Neither pip nor setuptools will help you, because they will simply overwrite the module in site-packages (at least, AFAIK).  And finally, virtualenv and python3's new venv won't help either.

Why will none of these "standard" solutions work?  The problem lies in how __import__ works and how it loads module objects into sys.modules.  I'll be speaking here from a python3 perspective, as that is where I have done my research.  It will be highly beneficial for readers to look at the python docs for the import system and importlib.  When a program calls import, three basic operations are performed:

1. Find the module
2. Create the module object
3. Bind the object into a global module namespace

The first two are handled by the builtin __import__(), and the last is done by the import statement itself.  The first thing python does is check whether the name is already defined in the global module namespace; if it finds it, it returns it.  This global module namespace can be seen by printing out sys.modules.  The key to the problem is in #3: python has only one global namespace for modules.  If you have two modules named Logger, the second time you try to load Logger, you will in fact get the first one.  Aliasing the module with:

    import Logger as Logger_ver2

will not work.  So is the problem solvable?  Fortunately, it is.  You can use either the importlib or imp module to get around the problem, though it is a bit odd.  The key to understanding how this works is knowing that the import system uses Finders and Loaders.  A Finder determines where to look for a module and checks whether a file really is a module (or package).  If the find succeeds, the Finder returns a Loader object, and the Loader creates and returns the actual module object so that it can be used.  Using this method, you can register the module under a name other than its filename.  You could, for example, load the module into sys.modules["Logger2"]:

import importlib

# Load version 1.1 under the name "Logger1"
loader = importlib.find_loader("Logger", path=["/path/to/1.1"])
loader.name = "Logger1"  # necessary since sys.modules is a flat namespace
logger_v1 = loader.load_module()
logger = logger_v1.get_logger(__name__, "DEBUG")

# Load version 2.0 under the name "Logger2"
loader = importlib.find_loader("Logger", path=["/path/to/2.0"])
loader.name = "Logger2"  # change the module name here too
logger_v2 = loader.load_module()
logger2 = logger_v2.get_logger(__name__, "DEBUG", "/path/to/logfiles")


The solution is a bit ugly, but it is possible. A more elegant solution would be to derive your own PathFinder and Loader classes.
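The flat-namespace caching at the root of all this is easy to demonstrate with a standard library module:

```python
# Importing the same name twice just returns the cached module object
# from sys.modules; only evicting the cache entry forces a fresh load.
import sys
import json

first = sys.modules["json"]
import json as json_again       # aliasing does not create a new module

print(json_again is first)      # True -- same cached object

del sys.modules["json"]         # evict the cache entry
import json as fresh_json       # now the import machinery reloads it

print(fresh_json is first)      # False -- a genuinely new module object
```

This is exactly why two versions of Logger can't coexist under one name: the second import never even reaches the filesystem.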

However, this is just one small part of what a component based framework would have to provide. For example, how do you encourage programming to an interface and not an implementation? Isn't duck-typing enough? Sadly, no, it's not. Remember, an interface doesn't just specify a function signature; it also encapsulates the "what". An interface is supposed to hide the "how": just because two classes both have a run() method, one class might mean launching a new process or thread, while for another it might mean something totally different, like telling the current process to start. In other words, a good interface doesn't just mimic the same function name, arguments and return type. It also exhibits the same behavior. How that behavior is achieved is irrelevant, but we need to make sure that everyone is saying the same thing.

Then there's the matter of dependency injection. Fortunately, this is much easier to do in a dynamically typed language like python, but it still requires some thought, and depends on what exactly a person means by "dependency injection". While decorators are powerful, they lack one crucial feature. A decorator can modify what happens before or after a function call. It can modify the arguments to the call. It can even replace the wrapped function entirely, or not run it at all. It can change the return value of a function. But what it can not do is modify what happens _inside_ the function. Unless you have the luxury of developing in a lisp dialect with its uber-powerful macros, this will remain out of reach. The closest you can come is to alter the AST of the function, which requires some pretty deep knowledge of python internals. So how can you "truly" insert a dependency into a function? Fortunately, one thing decorators can do is change the arguments and even the environment of a wrapped function.
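As a sketch of that last point (the inject decorator below is a made-up name of mine, not an existing library), a decorator can't rewrite a function's body, but it can supply the dependency as an argument, so the body works against whatever implementation we wire in:

```python
# Decorator-based injection: bind named dependencies as default
# keyword arguments that the caller may still override.
import functools

def inject(**deps):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for name, dep in deps.items():
                kwargs.setdefault(name, dep)  # caller's value wins if given
            return fn(*args, **kwargs)
        return wrapper
    return decorator

messages = []

@inject(logger=messages.append)   # 'logger' is the injected dependency
def work(x, logger):
    logger("working on %r" % x)
    return x * 2

print(work(21))    # 42 -- logger was injected, not passed by the caller
print(messages)    # ['working on 21']
```

Swapping the injected logger for a real logging call, or a mock in tests, requires no change to work() itself, which is the whole point.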

Another aspect of many component based frameworks is the ability to dynamically add or remove components during the lifetime of a program. Any plugin system worthy of the name must be able to insert a plugin and provide the features the plugin exposes. A more difficult matter is how to remove a plugin, as other plugins may have a dependency on it, so a graceful removal must be considered. Here again, though, the module system of python leaves a few things to be desired. sys.modules acts as a cache, and while it is possible to reload a module, entirely removing a module from the namespace is trickier than just deleting its entry (after all, sys.modules is just a dict).
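A quick sketch of why deleting the entry alone isn't enough: sys.modules is a plain dict keyed by module name, and evicting an entry does nothing for code that already holds a reference to the module.

```python
# Deleting a sys.modules entry does not unload the module for anyone
# who already imported it -- part of why graceful plugin removal is hard.
import sys
import json

held = json                  # another "plugin" holds a reference
del sys.modules["json"]      # evicted from the cache...

print(held.dumps([1, 2]))    # [1, 2] -- ...but still alive and usable
```

A real framework would have to track who depends on what and tear modules down in reverse dependency order, which python gives you no help with.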

Frankly, I'm a little surprised at the lack of these "enterprise" features in python. While it's possible to do all of the above, it requires some work. Indeed, there have been attempts by other python projects to provide the basis for these kinds of frameworks. Perhaps over time, I can contribute myself.

Sunday, June 30, 2013

So, how do you say "kowtow" in Mandarin?

So, on Thursday my order for Fluenz Mandarin 1+2+3 came in, and I started it yesterday.  So far it's pretty interesting, but I'm only on lesson 3.  This is the first long (2 hr) lesson; the first two were relatively short.  It's a bit expensive at $358, but I purchased it thanks to a rather large reward I got at work.  I had to do a bit of deciding between it and Rosetta Stone, but the reviews were hands down in Fluenz's favor.

One rather big difference between Fluenz's approach and Rosetta Stone's is that Rosetta Stone teaches by the immersion method, which seems to be good for children.  Fluenz, however, believes (based on indications from neuroscience) that adults lose the "plasticity" of the brain that makes children such rapid language learners.  Unfortunately, when I was growing up, the prevailing attitude was that it was harmful to teach a child multiple languages, as it might confuse them.  This is entirely backwards, and the best time to learn a language is as a child.  So Fluenz took a different approach, which for lack of a better term is the "explanation" approach.  Rather than lots of memorization or lots of pictures (the immersion method), things are basically explained to you as you go along.  So far, I've found their approach fairly easy.

The biggest criticism of the Fluenz program is its lack of teaching of Chinese ideograms.  I remember when I took Japanese in college, by far the hardest part was learning the Kanji (the Japanese version of Chinese ideograms).  Learning the stroke order and radicals was no fun and could literally take up 2+ hrs a day in homework.  In Fluenz, everything is in Pinyin, which is a romanization and phoneticization of the characters.  In other words, it is teaching you to speak and understand, but not really to read or write.  But I think that's ok for now.

I probably should have learned Japanese or even Tagalog, but I think Mandarin is going to be a very important language in the next few decades.  And who knows, maybe I can find a job in Hong Kong or Singapore.  You never know, maybe I could even work in Shanghai or somewhere in Taiwan (yeah yeah, I know, Taiwanese mostly speak Hokkien, and most people in HK speak Cantonese).  Regardless, knowing Chinese will open up some opportunities in my career.

It may even open up some possibilities in my understanding of Buddhism.  Many Buddhist sutras were only preserved in Chinese after Buddhism was essentially chased out of India.  It's unfortunate that many people don't understand how much the meaning of their religious scriptures has changed due to elements being lost in translation.  This is especially bad for the Bible, which went from Hebrew and Aramaic to Greek to Latin to English.  You almost have to wonder how much of the original is left.  For Buddhism, I'd like to re-read some of the sutras in their Chinese form and see what they are like.  Of course, that will require reading comprehension, which this program doesn't teach.  Still, it'd be cool if I could read Buddhist sutras in Chinese (and from what I understand, even though Cantonese and Mandarin are pronounced differently, speakers of each read and write the same characters).


I consider myself pretty good at learning new computer languages, so let's see if that applies to human languages as well.  The one thing that worries me is that I don't consider myself all that good at memorization.  In fact, I have a pretty horrible (short term) memory.  I've always said I was extremely grateful that almost all the classes in my Computer Science curriculum were open note.  I think I am pretty decent at learning new concepts, but my memory is not so hot.  And of course, learning new vocabulary is essentially 100% memorization.  The thing with human languages is that you can't think the way you do in a programming language.  In a human language, you shouldn't translate from one language to another.  I always used to say in French, "Je dois traduire les mots français dans les mots anglais. Je ne pense pas vraiment en français" ("I have to translate the French words into English words. I don't really think in French").  The words have to come naturally, without an internal translation.  That requires more than memorization; it requires making the language a part of yourself...and for me, that's difficult.

The other thing I am a little fearful of is pronunciation.  Mandarin is a tonal language with four tones, which means the meaning of a word changes with how it is pronounced.  I have been complimented on my French pronunciation before, so I am hoping I have a good ear.  However, speaking the proper tones back might be a little harder.

Sunday, June 9, 2013

A little foray into induction and persistent data structures.

Part of the reason I am working on the C++ Data Structures and Algorithms project is to get back to the basics and master them.  Some recent setbacks have caused me to re-evaluate my knowledge of Computer Science fundamentals.  My two previous posts on C++11 and data structures are part of an ongoing commitment from me in this regard.

Unfortunately, working at the hardware level has eroded a lot of my computer science fundamentals.  In the linux kernel, for example, you almost exclusively work with linked lists.  The scheduler uses red-black trees for fairness, but I have never had to touch the scheduler.  And somewhat surprisingly, there is nary an associative array (hash, dictionary, map, etc.) to be found unless you're working with the filesystem.  And in firmware, good luck.  You're pretty much stuck with arrays (used as stacks, queues or ring buffers) and maybe some binary search trees or a priority queue.  The only time in my 6.5-year career where I've extensively worked with more advanced data structures was in creating a dependency resolver for installing software for test automation.

While my basics in Computer Science might have eroded, I see many otherwise smart people who lack curiosity.   It is my curiosity that drives me to learn new things. I think curiosity is something you either have or don't.  Smart people might pick up a skill more quickly, but few I think have the drive to learn more and dig deeper.  I am constantly wanting to learn new skills, and perhaps that is my downfall as it tends to distract my attention.  That is why I am going to stick to getting these data structures and algorithms down.

I recently had two little confidence boosters.  The first was solving a problem while reading a Python book on algorithms by Magnus Lie Hetland.  At the end of a section on induction, it asked the reader to prove Euler's formula relating vertices, edges and faces using induction.  The equation is:

V - E + F = 2

V and E are the numbers of vertices and edges respectively, while F is the number of faces (or fields).  You can think of a field as "space" captured by edges, plus the "universe" outside all the vertices and edges.  Take for example this simple graph:



It has V = 2, E = 2 and F = 2 (the "universe" field, and the field captured by the space partitioned by the edges/vertices).  And as you can see, Euler's equation holds (2 - 2 + 2 = 2).  But how do you prove it inductively?  The equation V - E + F = 2 doesn't look like your typical induction target, since there are 3 variables!

Or are there?  If you increase the vertex count by one, you must also increase the edge count by one.  Adding one vertex and one edge does not increase the field count because it is not creating a new one.  Therefore we are adding one vertex but canceling it with a new edge:

V - E + F = (V+1) - (E+1) + F = 2



What if we add a new edge instead of a new vertex?  If a new edge is added between existing vertices, it must also create a new field.  Again, the new edge is canceled by the new field:

V - E + F = V - (E+1) + (F+1) = 2


And since there is no way to add a field without adding an edge, we have all our bases covered.  I was racking my brain on this one for about 30 minutes and decided to let it rest.  I was cooking dinner when I had that eureka moment: "oh yeah, that's the answer".  I don't know of any engineer or scientist who has not had one of these Archimedean eureka moments.  It's kind of freaky when you think about it...the answer just comes out of nowhere.  While these "out of the blue" answers are ultimately a good thing, it sucks during job interviews when you can't think of an answer and it only comes to you later.
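To convince myself, here's a quick sketch (in Python 3, unlike the Python 2 snippets later in this post) that starts from the base case and applies the two inductive steps in random order, checking the invariant each time:

```python
# A numeric sanity check of the two inductive steps for Euler's formula
# V - E + F = 2 on a connected planar graph.  Base case: a single vertex
# with no edges and just the outer "universe" field.
import random

def euler_invariant(v, e, f):
    return v - e + f == 2

v, e, f = 1, 0, 1          # one vertex, no edges, the outer field
assert euler_invariant(v, e, f)

for _ in range(1000):
    if random.random() < 0.5:
        v, e = v + 1, e + 1    # dangling edge to a new vertex: no new field
    else:
        e, f = e + 1, f + 1    # edge between existing vertices closes a field
    assert euler_invariant(v, e, f)

print("V - E + F = 2 held through all steps")
```

Since every reachable graph is built by some sequence of these two operations from the base case, the invariant holds for all of them.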

My second confidence-boosting moment was when I realized one way to create a persistent tree structure.  Persistent data structures are more common than you may think.  Take for example a revision control system like git.  You can envision your code edits as nodes in a tree.  When you create a new branch, you aren't just creating a "branch on a tree".  What you are really doing is replicating the old tree and sticking a new branch on the replica.  Why do that?

If you mutate your original tree, how do you go back?  What if you delete a branch far up the "arm" of the tree, but later you want a twig that was hanging off that branch?  With a mutable tree it can't be done, because the branch containing the "twig" is gone.  So how do you solve this?  You create a "new" tree, and it is this new tree that has branches added or pruned.  When you do your merge, you "graft" or "prune" the branches from this new tree onto the original.  I had a lot of trouble grokking functional programming and immutable data structures until I read an analogy in the book The Joy of Clojure by Michael Fogus and Chris Houser.  They compared functional programming languages with immutability to the flip books that create the illusion of movement.  In immutable languages, every page is a "version" of the data.  In mutable languages, you have one piece of paper, and every time you want to change something, you have to erase the picture and start over.  There's no "versioning".

Now, that might sound like a huge performance hit in terms of both speed and size.  Creating a duplicate tree wastes space, not to mention all the copying.  But that's a naive approach.  If you think about it, you don't have to duplicate the entire tree.  To insert a node into a tree, what do you first have to do?  You have to traverse the tree to find the insertion point.  By doing this, you have already walked one branch of the tree (which is, for all intents and purposes, a linked list of nodes)...and that branch is the only part you need to duplicate.  Everything off the path can simply be pointed to, not copied.

As you walk down to the insertion point, you create a copy of each node along the path; each copy keeps (shared) pointers to the same untouched subtrees as the original, so nothing off the path is duplicated.  Once you reach the insertion point, you copy it as well and attach the brand new node.

This requires about log n copied path nodes (which share pointers to the existing subtrees) plus one new node, versus duplicating an entire tree of n nodes.  You now have two trees: the original, and a new tree which is exactly the same as the original with the exception of the new node.  And now you get all the advantages of an immutable and persistent tree data structure.  Look at this graphical example:




Basically, what you are "duplicating" is a linked list that leads to the newly inserted node.  The green circles represent new nodes that point to the already existing objects.  Now you have two "heads", one in blue and one in green.  The blue head does not have a node with a value of 11, but the green head does.  Since this was a small tree, the number of new nodes is fairly large in proportion.  On average, you will only need about log n + 1 new nodes.  I have read that there is a way to implement this with O(1) new nodes, but I have not studied that implementation.
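As a sketch of the idea (not the actual code from my project, and in Python 3), here is a minimal persistent binary search tree using path copying; `insert` returns the root of a new tree while sharing every untouched subtree with the old version:

```python
# A minimal persistent (immutable) binary search tree via path copying.
# Insertion copies only the nodes along the search path; every untouched
# subtree is shared between the old and new versions.
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return the root of a NEW tree containing key; root is untouched."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    else:
        return Node(root.key, root.left, insert(root.right, key))

def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

old = None
for k in (8, 4, 12, 2, 6, 10, 14):
    old = insert(old, k)

new = insert(old, 11)           # the "green head" from the diagram
assert contains(new, 11) and not contains(old, 11)
assert new.left is old.left     # the untouched left subtree is shared
```

Only the three nodes on the path 8 → 12 → 10 are copied, plus the new node 11; the old root still sees the original tree.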

C++11 multi-threaded data structures and algorithms using waf: Part 2, waf build system

This is part 2 of my C++11 data structures and algorithms blog series.  You can see the first part here.  I have also finally uploaded the code to my bitbucket account which you can see here:

https://bitbucket.org/Dauntless/shi

So as I mentioned in the previous post, I have been writing a small C++ library for data structures and algorithms.  The only C++ I've done recently was a little ioctl library for one of the drivers at work, and it wasn't much code.  Also, I really want to get into the habit of doing Test Driven Development.  I went out and bought two books on TDD, both from the Pragmatic Bookshelf:

Modern C++ Programming with Test-Driven Development: Code Better, Sleep Better by Jeff Langr
Test Driven Development for Embedded C by James W. Grenning



One of the first things I've been tackling is the build system.  I think one of the most confusing parts of building native apps is the build system.  Makefiles aren't really taught all that well in school, nor for that matter is the idea of logically breaking a large project into modules.  No matter which build system you pick, it is either ugly and super hard to debug when you get it wrong, or a mysterious black box where you have no clue what it's doing.  And there are a ton of build solutions out there, so which one should you use anyway?  Do you care about cross-platform capability?  Ease of use?  Ability to debug?

I considered learning CMake as it seems to be pretty popular nowadays, but I decided on learning waf instead.  Why waf?  Because waf is just a set of python libraries, and since it's python, that means it is Turing complete.  Having a build system that is also a library offers several advantages.

For example, my project requires the Google Mock framework.  I am writing the waf build script so that if it doesn't find the Google Mock library on your system, it can download and install it for you.  Try doing that with a Makefile, Visual Studio solution, Eclipse project, or CMake.  I could also programmatically run the GMock tests when a build finishes.  In my opinion, it's probably the closest a C(++) build system can get to a Java maven build.  Admittedly, more elbow grease is required, but at least it's possible.

The waf build system is actually relatively easy to pick up...for the basics at least.  Like most build systems, it breaks up a software project into different tasks.  For example, there's a configuration phase where you can set up various compiler options or dependency checks, a build phase to actually generate the binaries, and an install phase where you can install the binary to the user's system.  You can also generate your own commands, and this is what I will do to hook the build with running the GMock tests.

The key concept to understand is that waf uses a wscript file as the build script.  This is a python script but without a .py extension.  Because the script is being called by waf, you don't even have to import anything.  

My actual project directory looks like this:

/home/sean/Projects
  /shi
      /src
          /gui
          /algos
          wscript
      /include
      /templates
  wscript


Wait!  Why are there two wscripts?  Normally, you only want one entry point for your build script, but it may not make sense to have to change into a particular directory to run it.  For example, the configuration doesn't need to live in a source directory.  Notice that the top-level wscript shown below has a function called build(), and this function calls bld.recurse().  That's where the second wscript comes in.  The parameter to recurse() is a directory, and waf will call the same function in that directory's wscript as the one from which recurse() was called.  Since recurse() is being called from the build() function, it will look for

./src/wscript

and call the build() method defined in _that_ wscript.  So that being said, let's look at the wscript in the top-level project directory.

'''
This is the waf build script for shi
'''
import os
import urllib2   ## will be used to download gmock later


top = '.'
out = "build"


def notImpl(fn):
    def wrapper(*args, **kwds):
        print "TODO: {0} is not yet implemented".format(fn.__name__)
    return wrapper


def find_gmock(ctx):
    '''
    Find the gmock/gtest library
    '''
    print "trying to find Google gmock"
    if "GMOCK_HOME" in ctx.env:
        print "GMOCK_HOME is in env"
        return True

    has_gmock = False

    if ctx.options.gmock and \
       os.path.exists(ctx.options.gmock):
        has_gmock = True
        ctx.env['GMOCK_HOME'] = ctx.options.gmock
    else:
        print "ctx.options.gmock is ", ctx.options.gmock

    if not has_gmock:
        getGmock()
        ctx.fatal("Could not find gmock/gmock.h")

    return has_gmock


@notImpl
def getGmock(version=None):
    '''
    Will retrieve the google gmock source
    '''
    pass


@notImpl
def find_boost(ctx):
    '''
    Searches for boost library
    '''
    pass


def configure(ctx):
    HOME = os.environ['HOME']
    has_gmock = find_gmock(ctx)
    if has_gmock:
        ctx.env.GMOCK_INC = ctx.env.GMOCK_HOME + "/include"
        ctx.env.GMOCK_LIB = ctx.env.GMOCK_HOME + "/lib"
        ctx.env.GTEST_HOME = ctx.env.GMOCK_HOME + "/gtest"
        ctx.env.GTEST_INC = ctx.env.GTEST_HOME + "/include"
        ctx.env.GTEST_LIB = ctx.env.GTEST_HOME + "/lib"

    ctx.find_program(ctx.options.compiler, var="CLANG", mandatory=True)
    ctx.env['CXX'] = [ctx.options.compiler]
    ctx.load("compiler_cxx")


def options(conf):
    conf.add_option("--compiler",
                    action="store",
                    default="clang++",
                    help="compiler to use (clang++ or g++)")
    conf.add_option("--gmock",
                    type="string",
                    dest="gmock",
                    help="location of google gmock library or GMOCK_HOME env var")

    conf.load("compiler_cxx")


def build(bld):
    bld.recurse('src')  ## call build from ./src/wscript


Hopefully all of the above seems reasonable.  But if not, in essence the waf system needs to configure the environment in which to do the build, including checking for dependencies, and then it has to actually perform the build.  The configuration of your project is done in the configure function.  To run this phase, you call it like this:

    ./waf configure

Or, if you do not have gmock in a standard library location, you could run the configure stage like this:

  ./waf configure --gmock=/home/sean/Downloads/gmock-1.6.0  --compiler=g++

Currently, my build doesn't download and install gmock if you don't have it, but that's one of the nice things about waf: if you wanted to, you could.  There are some other things it doesn't check for, like CMake (needed to build gmock), or actual verification of the directory passed in via --gmock.  Creating a dependency system is no small task (trust me, I've done it before), but it's an interesting one involving some fun computer science skills (graph traversals, transactions, and cycle detection, for example).
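To give a flavor of those skills, here is a toy dependency resolver (the package names are made up for illustration): a depth-first topological sort that detects dependency cycles, sketched in Python 3:

```python
# Toy dependency resolution: depth-first topological sort with cycle
# detection, using the classic white/gray/black node coloring.
def install_order(deps):
    """deps maps package -> list of packages it depends on.
    Returns an install order, or raises ValueError on a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2        # unvisited / in progress / done
    color = {}
    order = []

    def visit(pkg):
        state = color.get(pkg, WHITE)
        if state == GRAY:               # back edge: we're inside this node
            raise ValueError("dependency cycle involving " + pkg)
        if state == BLACK:              # already resolved
            return
        color[pkg] = GRAY
        for dep in deps.get(pkg, []):
            visit(dep)
        color[pkg] = BLACK
        order.append(pkg)               # dependencies are appended first

    for pkg in deps:
        visit(pkg)
    return order

deps = {"app": ["libfoo", "libbar"], "libfoo": ["libc"],
        "libbar": ["libc"], "libc": []}
order = install_order(deps)
assert order.index("libc") < order.index("libfoo") < order.index("app")
```

A real resolver also needs version constraints and rollback (the "transactions" part), but the graph walk above is the core of it.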

The configuration is very similar to running ./configure in a typical autotools C(++) program.  It makes sure that your build environment has all the required development tools, and sets up any environment variables required for building.

Once your system has been configured, you'll obviously want to do a build.  This is the step that actually generates your binaries.  Technically waf can differentiate between debug builds and release builds, but I'm doing a release build here.  In order to do a build, you just run the command like this:

    ./waf -v build

The -v is just a verbose flag, but it's useful if a compile fails.  The actual binary gets generated in a folder that you specify in your wscript from a variable named out.  It will generate sub-folders depending on the location of the recursively called wscript.  For example, the first level wscript calls ./src/wscript.  So the actual binary gets put into ./build/src.  If you had your second wscript in ./source, the binary would be in ./build/source for example.

So speaking of the second wscript, what does it look like?


top = '.'

import os

def build(bld):
    '''
    Generates the node_test executable
    '''
    PRJ_DIR = os.getcwd()
    INCLUDES = [".", "..",
                PRJ_DIR + "/includes",
                PRJ_DIR + "/templates",
                bld.env.GMOCK_INC,
                bld.env.GTEST_INC]

    LIBPATH = ["/usr/local/lib",
               bld.env.GMOCK_LIB,
               bld.env.GTEST_LIB]

    bld.program(
        source = 'gtests/main.cpp',
        target = 'node_test',
        includes = INCLUDES,
        lib = ['pthread'],
        libpath = LIBPATH,
        stlib = ['gtest', 'gmock'],
        stlibpath = ['/usr/local/lib'],
        cxxflags = ['-Wall', '-std=c++11'],
        dflags = ['-g'])

    bld.program(
        source = 'gtests/bintree_test.cpp',
        target = 'bintree_test',
        includes = INCLUDES,
        lib = ['pthread'],
        libpath = LIBPATH,
        stlib = ['gtest', 'gmock'],
        stlibpath = ['/usr/local/lib'],
        cxxflags = ['-g', '-Wall', '-std=c++11'],
        dflags = ['-g'])

    bld.program(
        source = 'gui/firstSFML.cpp',
        target = 'ogl-gui',
        includes = [PRJ_DIR,
                    "/usr/local/include"],
        lib = ["sfml-graphics", "sfml-window", "sfml-system"],
        libpath = ["/usr/local/lib"],
        cxxflags = ['-Wall', '-std=c++11'],
        dflags = ['-g'])


Hopefully, that doesn't look too bad if you're familiar with makefiles.  Currently, none of the programs require other object files, mainly because I am using templates.  Because I am using templates, I have to include the source rather than generate an object file for another object to link against.  It's just one of the limitations of templates, and perhaps somewhat unintuitively, I have made my templates .hpp files rather than .cpp files (since they get included into other source rather than compiled and linked against other objects).

If you do have objects to compile, you can use bld.object() instead of bld.program(), and another object that depends on it declares a keyword argument called uses whose value is the first object's target.  For example:

bld.object(
    source = "some_file.cpp",
    target = "some_obj",  ## generates some_obj.o
    includes = INCLUDES + ["includes"],  ## include paths are directories
    lib = ['m'],          ## the C math library links as -lm
    cxxflags = ['-Wall', '-std=c++11'],
    dflags = ['-g'])

bld.object(
    source = "another_file.cpp",
    uses = ["some_obj"],  ## it links against some_obj.o
    cxxflags = ['-Wall', '-std=c++11'],
    dflags = ['-g'])
   

Just as with make, sometimes you want to clean everything.  In waf, you just specify:

  ./waf clean

And it will delete the contents (recursively) of your build folder.  There's also a distclean command, which additionally forces you to run configure again.  This is because waf caches some configuration data to avoid rebuilding everything.

So that's it for now on waf.  As I make improvements to waf, I'll cover them in a future blog post.

Thursday, May 30, 2013

The way of test

Here's another gem I found in my posts just in a draft state.  Not sure why I never released this one before, as it was basically complete.



Here are some of my own personal beliefs about testing and the way of test.  From talking with others about their former companies, I have learned that everyone has different ideas about how to test, what to test, and what constitutes "automation".


1.  Ad-hoc is superior to automation in finding bugs
2.  Don't let developers tell you how to come up with tests (fox guarding hen house?)
3.  Don't get lost in testing types (unit, integration, functional, etc).  The end goal is to find problems
4.  Test to fail, don't test to pass
5.  Development driven testing is the flip-side to Test Driven Development and just as important
6.  Test programs must follow a protocol
7.  Test programs must be versioned
8.  Test programs must be code reviewed
9.  Test Engineers shouldn't throw failures over the wall to developers without a first triage
10.  Test Cases must be repeatable
11.  Don't get hung up on writing tests before code
12.  Don't treat Test Engineers like testers

1.  Ad-hoc is superior to automation in finding bugs
Unfortunately, test automation has become a buzzword, and managers think it will be a panacea for all their testing problems.  The reality is that automation only applies to a fraction of a test plan, and that automating really only helps you find regressions.  It does save time, however, and that time savings should be used to do more ad-hoc testing.  Trying to automate everything is often a waste of time.  Instead, it is often more advantageous to create a framework that allows rapid exploratory and ad-hoc testing in which what was done can be recorded (and thus repeated).

2.  Don't let developers tell you how to come up with tests (fox guarding hen house?)
Sadly, many Test Engineers don't truly understand how the software (firmware or hardware) is supposed to work, and they rely too much on technical information from the developer.  What should happen is that there is a specification that both the developers and test engineers read.  The spec is your bible.  By knowing the spec, you know the inputs and outputs, states and transitions of the system.  From this, you don't need a developer to tell you how to test something.  When a test engineer simply parrots back what a developer said his code is doing, that's not testing.

3.  Don't get lost in testing types (unit, integration, functional, etc).  The end goal is to find problems

I see higher-level types (managers or architects) falling into this trap because they see testing in the abstract rather than "in the trenches".  Generally speaking, everybody should unit test.  Anyone who writes code needs to unit test their stuff (including Test Engineers, SDETs and Automation Engineers).  While some QAEs do black box integration tests and developers should work on acceptance tests with the customers, Test Engineers tend to be grey or white box functional/integration testers.  But the bottom line is that a test organization tries to produce higher quality, more robust and efficient systems by minimizing the number of defects.

4.  Test to fail, don't test to pass
Unfortunately, there is often pressure within a Test department to make sure that a test passes.  Many organizations have rules where only so many defects of a certain severity can exist before launching or moving to a new phase.  And rather than risk the wrath of management by introducing another bug, little issues are swept under the rug by not testing certain things.  This can also manifest itself through simple laziness.  When a test passes, the tester or Test Engineer is done.  If the test fails, however, he will be given many dev drops to try out, which takes time.  It's far easier to just "pass" a test than to fail it.  The solution, of course, is to reward finding bugs and defects.

5.  Development Driven Testing is the flip-side to Test Driven Development and just as important
This is perhaps a new concept and might need explaining.  In TDD (Test Driven Development), the tests are written before the actual feature is implemented.  In Development Driven Testing, the code (or at least its behavior) should be understood before the test is written.  While validating against a spec is important, if that is all you go by, you will not catch any exceptions.  Specs are not code, and are thus informal by nature.  Only code is "written in stone", so to speak, and thus provable or testable.  You can't "test" a spec (at least not directly).  Without understanding the code, you won't be able to find weak points, like invalid inputs or invalid access to globals and shared data structures between threads.

Also, if you don't understand how something works, how can you verify it?  Some tests are simple, but some can be very complex.  For example, the validity of a result might rely on more than a single return value (a C/C++ function might have a return value, but it might also store data through a pointer or reference which must also be examined).  A program that updates a database might be expected to insert one record, but it could insert two.  Checking for this requires in-depth knowledge that simply looking at a return value cannot provide.
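As a tiny illustration (a hypothetical function, with an in-memory list standing in for the database table), a good test checks the side effects as well as the return value:

```python
# Hypothetical example: checking the return value alone would miss a
# duplicate insert, so the test also inspects the side effect.
def save_user(db, name):
    """Insert exactly one record and return its id."""
    record = {"id": len(db) + 1, "name": name}
    db.append(record)
    return record["id"]

db = []
uid = save_user(db, "sean")

assert uid == 1                                       # the return value...
assert len([r for r in db if r["id"] == uid]) == 1    # ...AND exactly one row
```

If save_user had a double-insert bug, the first assertion would still pass; only the second one would catch it.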

6.  Test programs must follow a protocol
What I mean by this is that if your Test Engineers, Automation Engineers and SDET's all write test programs in wildly different ways, the company will pay for it.  For example, does your test program simply use command line arguments?  Is it a GUI, and thus can't easily be automated?  Does your test program read a configuration file, and if so, is it in XML, YAML, or a plain old INI style file?  Test Programs must likewise report success and failure in a standardized way.  Without standardizing on a way to find, install and run a script, your Test organization will be in pure chaos.
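One hypothetical way to standardize such a protocol: every test program emits a single JSON result object on stdout and uses the exit code strictly for pass/fail, so any harness can discover, run, and parse any test the same way:

```python
# Sketch of a standardized test-result protocol: one JSON object on
# stdout, exit code 0 for pass and 1 for fail.  The function and field
# names here are illustrative, not from any real framework.
import json

def report(name, passed, details=""):
    result = {"test": name,
              "result": "PASS" if passed else "FAIL",
              "details": details}
    print(json.dumps(result))
    return 0 if passed else 1       # would become sys.exit(rc) in a real tool

rc = report("smoke_test", True, "all 12 checks passed")
assert rc == 0
```

The exact format matters far less than every test program in the organization agreeing on one.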


7.  Test programs must be versioned
If your test programs are not versioned, how will you ever be able to do regression tests?  Or what if an OEM or customer wants you to reproduce an issue they are seeing with older software?  Test programs are software, and fall under the same software engineering principles as the actual product code.  Furthermore, versioning should not be an afterthought.  Many headaches are caused by poor forethought about how to version a product.  How will you deal with "forks" of the code?  What about customer-specific versions?  How do you distinguish release versions from debug versions?

8.  Test programs must be code reviewed
Most enlightened companies understand the benefit of code reviews.  Code reviews catch bugs early, and the earlier you catch them, the better off you are.  They also help familiarize all the developers with everyone else's work.  And finally, I have noticed that they tend to help evolve a "group style".  All programmers have their own style, but sometimes styles are so widely divergent that reading the code becomes harder (which is why having coding guidelines is helpful, though I don't consider them mandatory).

9.  Test Engineers shouldn't throw failures over the wall to developers without a first triage
Unfortunately, there seems to be no common standard for the difference between a QA Engineer, a Test Engineer, or a Software Engineer in Test.  However, most companies tend to have a black box test group vs. a white box test group.  For the pure black box group, they are in effect an internal customer.  The internal test group however should do both black box integration, functional testing AND white box dev testing.  When a defect occurs, at the very least, a Test Engineer should ensure that it wasn't a low-level issue (bad hard drive, intermittent network, invalid configuration or environment variables, etc), or their own script that caused the problem.  Better yet, they should dig deeper into the code.  This will help by giving the defect to the correct development team (for example, at my work, it might be a driver, controller firmware, or expander firmware problem).  

Unfortunately, there are people who believe that all the Test department has to do is check test cases off of a test plan.  Debugging an issue takes time and thus in their opinion, is not the problem for the Test Engineer.  For white box (internal) test groups, this is a waste.  I posit that you can't truly know how to test something unless you know how it is supposed to work.  Black box testing alone is not sufficient, because they will never see bad or wasteful code paths, nor will they have deeper insight into how to test what was coded.


10.  Test cases must be repeatable
Normally, this is a requirement for automation, but it should apply to ad-hoc testing too.  An ad-hoc test where you can't remember the steps or the parameters you passed into a program or function is kind of useless.  Ideally, ad-hoc tests become regular test cases, and a program should be written to cover them.  If you design your testing framework with repeatability in mind, you are halfway there.  An even better solution is a kind of macro recorder.  This is where using a language with a shell (REPL) is awesome.  If you could record the commands issued from the shell, even ad-hoc testing would become automatable.
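As a sketch of that macro-recorder idea (the command names are made up for illustration), a decorator can journal every call made during an ad-hoc session so the session can be replayed later:

```python
# Sketch of a "macro recorder" for ad-hoc testing: every decorated
# command logs its name and arguments to a journal, turning an
# exploratory REPL session into a replayable test case.
import functools

journal = []   # every command issued gets recorded here

def recorded(fn):
    """Decorator: journal each call so the session can be replayed."""
    @functools.wraps(fn)
    def wrapper(*args, **kwds):
        journal.append((fn.__name__, args, kwds))
        return fn(*args, **kwds)
    return wrapper

@recorded
def set_voltage(volts):           # made-up lab command
    return "voltage set to %s" % volts

@recorded
def read_sensor(channel=0):       # made-up lab command
    return 42

# an "ad-hoc" session at the REPL
set_voltage(3.3)
read_sensor(channel=2)

def replay():
    """Re-run the journaled session as a repeatable test case."""
    fns = {"set_voltage": set_voltage, "read_sensor": read_sensor}
    # snapshot first: replaying appends new journal entries of its own
    return [fns[name](*args, **kwds) for name, args, kwds in list(journal)]

print(replay())   # same commands, same order, repeatable
```

A real version would serialize the journal to disk so the session survives the REPL, but the principle is the same.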

11.  Don't get hung up on writing tests before code
Just as many Test Engineers don't truly understand software engineering, many software engineers don't understand testing.  One area that I still struggle with is writing tests before code.  I understand the reasoning: if you write your unit tests first, you have effectively created formal requirements and a specification; plus, how else do you know that what you built works?  But this presupposes that you know exactly what it is you are designing and building.  In my experience, software is an exploratory affair.  How can you test an experiment?  In science, you perform an experiment and then try to explain it with a theory.  I don't see some types of software design as being too different.

That being said, when applicable by all means do TDD.  It can help you better design your interface, because unless you write a test you may not even realize you need to expose something to validate the result.  It also guarantees that you will have unit tests at least to some degree.   This is better than the "I'll just write unit tests when I have time", because it usually becomes very unlikely that time will become available later.


12.  Don't treat Test Engineers like Testers
Not to denigrate testers, but if you use your Test Engineers simply to execute Test Cases and don't give them the time to debug and investigate the issue, or the time to write a script to automate their test cases, you may as well have hired a tester for half the salary or less.  Some Test Engineers like that though, and you need to discover which Test Engineers just like to manually execute tests, and which ones prefer to automate test cases, write test tools, or dig into the code to figure out what's going on.

Tuesday, May 28, 2013

C++11 multi-threaded data structures and algorithms using waf: Part 1

I decided to kill four birds with one stone.  I realized how rusty my data structures and algorithm analysis knowledge is, so I decided to start writing a project going over some basic and advanced data structures (I'll start with some advanced linked lists, binary search trees, heaps, hash maps, and move on to more advanced structures like RB trees and Tries). 

The thread-safety is the second bird.  Even the "basic" data structures will be advanced, though, because I intend to make all of them thread-safe.  I bought the book "C++ Concurrency in Action" by Anthony Williams and am on Chapter 3 now.  Although I'm relatively familiar with the synchronization features of the Linux kernel (mutexes, spinlocks, semaphores, softirqs, tasklets, workqueues, etc.), I need to learn this from the application level.  Writing multi-threaded software is notoriously tricky, but it's good for thinking about how everything runs as a whole.

The third bird I am going to kill is the C++ language itself.  C++11 feels like a new language with lots of nice features.  With the addition of lambdas, I can write in a more functional style, and the auto and using keywords will make using templates easier.  I've been spoiled by dynamically typed languages, and even Java.  That doesn't mean I think "thinking like the machine" is good or makes you better; unfortunately, it's a conceit I see all too often in low-level programmers.  My motivation to (re)learn C++ is to use LLVM.  In fact, thinking too low-level can be bad.  Why?  Edsger Dijkstra once said, "Computer Science is no more about computers than Astronomy is about telescopes".  Knowing low-level details means you understand a concrete implementation, but what if the implementation changes (for example, quantum computers or Lisp machines)?  Just as encapsulation in OOP protects you from implementation details, going too deep distracts you from solving whatever your domain problem is.  But if you're writing in a native language, the machine-level details are often unavoidable (in kernel land, for example, you sometimes have to be concerned even with the out-of-order execution abilities of the processor and put in memory barriers, and the new C++11 multi-threaded memory model can also require thinking about this).

The final bird to kill is learning a new build system.  I've already started writing a blog post about it, so I'll just briefly mention it here.  For me personally, one of the most confusing and irritating parts of writing native code is the build system.  I hate Makefiles.  They are hard to debug, and unless you use cygwin or mingw, you can't port them to Windows.  That being said, the build system I am using can generate Visual Studio solution projects, but I think it's better to have everything built by the same compiler.  In this case, I'm going to use mingw.  Why?  For starters, Eclipse can work with mingw and gdb, so you need only one IDE (or if you're like me, you can use gdb integrated with emacs).  It also means you only have to learn one object file format (ELF, as opposed to ELF plus COFF for Windows...or whatever it is they use nowadays in Visual Studio).  There are other advantages to the build system I'm using (over CMake), and I'll cover them in another post.


Check out my next post where I'll go over the build system and the preliminary code I have so far.

Ultimately, I will post this project up on my Bitbucket account.  As with my other projects, this is essentially a self-guided tutorial.  I put these projects up so that:

1) I can reference them for future use
2) Others can see what I did
3) Others can use it for their own needs

It'll be a BSD type license.  So feel free to do with it as you please.


Thursday, May 23, 2013

More functional style python: decorators

Although I'm no longer the Test Scripting Lead at my job, I still get a lot of questions about python, especially since many of the engineers at my job are new to python.  I thought it would be a good idea to write this code down so that there's a permanent repository of it.

I also decided that I wanted to get better at functional style programming, so I'm still learning (and teaching to others) a more functional approach to programming.  I still have a long way to go, but hopefully this will help others who are also trying to learn a functional style from a multi-paradigm language like python.

Decorators:
Decorators are a nifty concept in python: in a nutshell, they are functions which take a function as a parameter and return a modified version of that function.  It's kind of a fancy wrapper with syntactic sugar thrown on top.  I'm only going to show function decorators, though class decorators are possible too.

271 def assertKwds( key_reqs ):
272     '''
273     Tests that the passed in keyword args match what is required
274     
275     For example, if the function definition is (age, employed=False)
276         then key_reqs would be { "employed" : bool }
277     
278     *args*
279         key_reqs(dict)- is a dictionary of keyword to type.  
280         
281     *usage*::
282         
283         @assertKwds( { "employed" : bool } )
284         def somefunc(name, employed=False):
285             if employed:
286                 print "{0} is employed".format(name)
287                 
288     '''
289     def wrap( fn ):
290         def wrapper(*args, **kwds):
291             ## check keyword args.  Also check that we didn't make a 
292             ## faulty assertion error with a bad key_req
293             failed = False
294            
295             try:        
296                 for k,v in key_reqs.items():
297                     if type(kwds[k]) != v:
298                         msg = "Invalid type {0} for keyword {1}. Should be {2}"
299                         print msg.format(type(kwds[k]), k, str(v))
300                         failed = True
301                 if failed:
302                     return None          
303             except KeyError as ke:
304                 msg = "Faulty assertion. keyword arg {0} does not exist"
305                 print msg.format(ke.args[0])
306                 return None
307             
308             return fn(*args, **kwds)
309         return wrapper
310     return wrap

So this is a decorator that can be used to check keyword args.  Let's see how you would use this function:


624     @assertKwds( { "name" : str, "company" : str, "years" : int } )
625     def showWorkInfo( self, name="Sean", company="Wonderland", years=0):
626         msg = "{0} has worked at {1} for {2} years"
627         self.logger.info(msg.format(name, company, years))
628         return 1

Here we have defined a function showWorkInfo, but what's that funny @assertKwds on top of it?  That's the special decorator syntax.  Let's look at the example above.

On line 624, assertKwds is called with the type requirements, and the showWorkInfo function is passed to the wrap function it returns.
On line 290, the arguments passed to showWorkInfo are captured by wrapper.
On line 296, the requirements passed to assertKwds are compared against the keyword args given to showWorkInfo.

If any of the keyword arguments don't match the type requirements, then we never call the function at all and return None.  Otherwise showWorkInfo is called and its return value passed through (line 308).
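To see that flow in action, here's a condensed, standalone version of the decorator (it skips the missing-key check and logging from the listing above, purely for brevity) applied to a plain function instead of a method:

```python
def assertKwds(key_reqs):
    ## condensed version of the decorator above: only the type check
    def wrap(fn):
        def wrapper(*args, **kwds):
            for k, v in key_reqs.items():
                if k in kwds and type(kwds[k]) != v:
                    return None   # bad type: fn is never called
            return fn(*args, **kwds)
        return wrapper
    return wrap

@assertKwds({"years": int})
def showWorkInfo(name="Sean", years=0):
    return "{0}: {1} years".format(name, years)

showWorkInfo(years=3)        # types match -> "Sean: 3 years"
showWorkInfo(years="three")  # wrong type -> None, fn never runs
```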


People tend to think of passing in functions to functions as something you do for a callback.  But you can do other things than callbacks.  In functional programming, it is not uncommon for a function to modify a function.  For example, partial application can be done so that if you have a function that takes 3 arguments, but you only have two arguments ready, you can return a function that requires only that one other argument, with the other two fixed.  You can also perform currying, where a function that takes several arguments can be morphed into a chain of functions, each taking only one argument.

Decorators are kind of a poor-man's macro.  They allow you to inspect arguments, modify arguments, modify functions (or even create new ones), modify return values, or handle returns.  This example showed how to check the arguments and then call the function.  But you could also (just a small list):

1. Check args, then conditionally call function
2. Check args, and conditionally modify args, then call function
3. Check args, conditionally modify args or modify the passed in function itself
4. Generate a new function dynamically based on args
5. Call function, and depending on return value, modify the return value
6. Call function and trap exceptions

Perhaps this last one caught your attention?


252 def genericExceptCatch( extype, handler=None ):
253     '''
254     This is a very handy function that will wrap the exception handling 
255     here instead of the function itself.  This makes for much cleaner
256     code, and the decorator makes it obvious what kind of exception
257     might get thrown
258     '''
259     def wrap( fn ):
260         def wrapper(*args, **kwds):
261             try:
262                 return fn(*args, **kwds)
263             except extype as ex:
264                 declogger.info("Error: {0}".format(str(ex)))
265                 if handler:
266                     return handler(*args, **kwds)
267                 else: return None
268         return wrapper
269     return wrap

And here is an example of how to use it (declogger is a module-level logger defined elsewhere in the file).

656     @genericExceptCatch( KeyError )
657     @genericExceptCatch( AttributeError )
658     def twoExceptions(self, mydict, myobj ):
659         print mydict["somekey"]
660         myobj.nofunc()

Can you see what this is doing?  If the wrapped function raises the given exception type, the decorator logs it and then calls the handler, which takes the same arguments as the wrapped function (or returns None if no handler was supplied).  This allows you to move exception handling outside of the function that can throw the exception.
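Here's a sketch of the handler path (this repeats the decorator above minus the logging, just so the snippet stands alone; useDefault and lookup are made-up names):

```python
def genericExceptCatch(extype, handler=None):
    ## same decorator as above, minus the logging, for brevity
    def wrap(fn):
        def wrapper(*args, **kwds):
            try:
                return fn(*args, **kwds)
            except extype:
                if handler:
                    return handler(*args, **kwds)
                return None
        return wrapper
    return wrap

def useDefault(mydict):
    ## fallback handler: note it has the same signature as lookup
    return mydict.get("somekey", "default")

@genericExceptCatch(KeyError, handler=useDefault)
def lookup(mydict):
    return mydict["somekey"]

lookup({"somekey": "found"})   # -> "found"
lookup({})                     # KeyError trapped -> "default"
```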

When I first started writing decorators, I worried about methods in a class versus regular functions.  But I realized that the code above will work with either.  The only trick is that for member functions, you may sometimes want to look at args[0].  Remember, self is the first thing passed to a member function, so it shows up as args[0] in *args.
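A quick sketch of that self-is-args[0] trick (logCaller and Worker are made-up names for illustration):

```python
def logCaller(fn):
    ## works on plain functions and methods alike; when decorating
    ## a method, args[0] is the instance (self)
    def wrapper(*args, **kwds):
        if args and hasattr(args[0], "name"):
            wrapper.last_caller = args[0].name   # peek at self
        return fn(*args, **kwds)
    wrapper.last_caller = None
    return wrapper

class Worker(object):
    def __init__(self, name):
        self.name = name

    @logCaller
    def work(self):
        return "{0} is working".format(self.name)

w = Worker("Sean")
w.work()   # -> "Sean is working"; wrapper saw self via args[0]
```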