Monday, August 15, 2011

Training others

I've taken on the responsibility of training some new, and some not so new, people to program.  Right now I am teaching them the basics of the language we are (mostly) using, but I am also trying to teach them some of the finer points of software engineering that I had to learn from experience.  Some of the people I am training don't have Computer Science or Software Engineering degrees, but do have Electrical or Computer Engineering degrees.  So I'm trying to impart some general guidelines on writing decent code.


Working as a team-
Many of the other points I will cover below have this as a root element to consider.  When I went to school, I had a grand total of two group projects, only one of which actually had any code to it.  That's totally unrealistic in the real world.  The fact is, your code and your work do not live in isolation.  Your code should be readable by others, and they should know where to obtain it.  You should not rewrite from scratch a library that someone else has already built (though you can make enhancements or improvements to it), and you should document your code so that others know how to install and use what you created.

Revision Control-
I have found it amusing that at two different workplaces, the Electrical Engineers were somewhat up in arms over having to learn a revision control system, and yet the CS people were more fascinated by it.  Unfortunately, when I went to school, they weren't teaching anything about revision control systems, much less why you would need one.  And sometimes you do have to explain to someone why you would need one.  But without revision control, how do you experiment with your code?  How do you tag your code so that you can replicate an issue a customer is seeing?  How do you distribute your code so that others can see it and possibly make enhancements to it?  Many engineers are frightened when they first attempt to use a revision control system, because they are afraid they will jack up someone's code base.  Also, some revision control systems are easier to learn than others (I personally am finding git far harder to learn than mercurial).  But these are small drawbacks compared to what a revision control system provides.

Code Reviews-
Many engineers are scared of code reviews when they first start.  Throughout school, it is ingrained in you not to share your code with others, and as a consequence you don't have to worry about what your code looks like.  But once a new engineer accepts the fact that many eyes will be seeing his code, this alone changes how he writes (or at least it will once several comments come in).  Code reviews are necessary because they are the next line of defense for finding bugs after your unit tests.  I also remind reviewers that they should actually try to understand the code, rather than look only for superficial things like coding standards.  This takes more time, but I believe that it improves your own coding as well as helping the one being reviewed.

Unit testing-
Usually in the madhouse rush to get something working, testing falls by the wayside.  I am NOT a proponent of TDD, where you write your tests before you actually write your feature code, but one should eventually write tests for their code.  When using a dynamic language, it is often necessary to check that the types of the arguments passed in are correct.  Make sure you write lots of negative tests too, because it's rather embarrassing to discover that invalid inputs make your function return a supposedly valid result.
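Here is a minimal sketch of what I mean by type checks and negative tests.  The function parse_port and the helper raises are names I made up for illustration, and plain asserts stand in for whatever test framework you actually use.

```python
def parse_port(value):
    """Return value as a TCP port number, or raise if it is not a valid one."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"port must be an int, got {type(value).__name__}")
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value


def raises(expected, func, *args):
    """Return True if func(*args) raises the expected exception type."""
    try:
        func(*args)
    except expected:
        return True
    return False


# Positive test: a valid input comes back unchanged.
assert parse_port(8080) == 8080

# Negative tests: invalid inputs must fail loudly, not return a "valid" result.
assert raises(TypeError, parse_port, "8080")
assert raises(TypeError, parse_port, None)
assert raises(ValueError, parse_port, 0)
assert raises(ValueError, parse_port, 70000)
```

Without the isinstance check, the string "8080" would blow up with a less helpful error at the range comparison, or worse, slip through in code that never compares it.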

Reusability-
I usually give an anecdote for this.  Imagine that you write a script that performs some functionality for a test you have.  Later, you are tasked with a very similar problem, and so you write a 2nd, albeit slightly different script.  And then you do so later for a 3rd and 4th script.  But then some functionality in a library your scripts use changes.  Perhaps a new product is made which requires a different parameter to be passed in.  Now, you have 4 scripts, and you have to go in and change all 4 of them.  Always try to isolate code that could possibly vary and keep it in a library, class or module of some sort.  I try to stress writing functions over writing scripts so that I have only one or two scripts, whose behavior changes depending on the arguments that get passed in.
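As a rough sketch of the "one script, many behaviors" idea, the part that varies lives in a single function and the script only parses arguments.  Everything here (build_test_command, the run_test command, the option names) is hypothetical and only there to show the shape.

```python
import argparse


def build_test_command(product, iterations=1, timeout=None):
    """Build the command line for one test run (hypothetical example)."""
    command = ["run_test", "--product", product, "--iterations", str(iterations)]
    # A new product needing an extra parameter is a change in ONE place.
    if timeout is not None:
        command += ["--timeout", str(timeout)]
    return command


def main(argv=None):
    """Parse arguments and return the command the script would run."""
    parser = argparse.ArgumentParser(description="One script, many behaviors.")
    parser.add_argument("product")
    parser.add_argument("--iterations", type=int, default=1)
    parser.add_argument("--timeout", type=int, default=None)
    args = parser.parse_args(argv)
    return build_test_command(args.product, args.iterations, args.timeout)


# What used to be four nearly identical scripts becomes four invocations.
print(main(["widget_a"]))
print(main(["widget_b", "--iterations", "3"]))
print(main(["widget_c", "--timeout", "30"]))
```

Because build_test_command is an ordinary function, other modules can import it and tests can call it directly without going through the command line.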


DON'T copy and paste-
This is the DRY (don't repeat yourself) principle: copying and pasting code is BAD.  Why is it bad?  Because when you copy and paste code, you copy and paste bugs.  And when you need to make an enhancement to your code, every place you copied and pasted now has to be fixed as well.  As obvious as this one sounds, I am amazed in code reviews how many people simply copy and paste functions or, worse, parts of functions into other functions.


Keep it simple stupid-
General Patton once said, "Don't give great orders.  Give orders that can be understood".  Code is read more than it is written.  If your code tries to get too fancy, you might want to make it easier for others to understand.  Of course this has its limits.  If the most efficient code is complex, don't be afraid to write it that way, just comment the heck out of what your code is doing.


Avoid functions that return void-
This sort of goes along with unit testing, or perhaps testing in general.  Functions that return void are usually either mutating the state of some object (either of itself, if this is a method, or of some argument that is passed in), or they are impure, and only have validity for some side effect (for example, updating a database, or printing to a log file).  The trouble is, how do you test this?  If the function is a method of an object, and it mutates some field in the object, now you need a second function that has to be called to make sure the field in that object is correct.  But what if this is a multi-threaded program?  It is entirely possible that another function can change the state of the object before your test function gets a chance to run.  Now you have to write some locks to make sure this is correct.  All of this can be avoided if you simply return some values and then check those values (which, although they might be stale, were valid at the time of the original function call).
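To make the difference concrete, here is a small sketch with a made-up Counter class.  Both methods mutate the object, but only one hands back the value it produced.  The internal lock is my own assumption about how you would keep the update itself safe across threads.

```python
import threading


class Counter:
    """Hypothetical example: a running total shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.total = 0

    def add_void(self, amount):
        """Mutates state and returns nothing.

        A test has to read self.total afterwards, and another thread may
        have changed it in between.
        """
        with self._lock:
            self.total += amount

    def add(self, amount):
        """Mutates state and returns the total as of this call.

        The returned value may be stale a moment later, but it is exactly
        what this call produced, so a test can check it directly.
        """
        with self._lock:
            self.total += amount
            return self.total


counter = Counter()
assert counter.add(5) == 5
assert counter.add(2) == 7

counter.add_void(3)
# Single-threaded this holds; with other writers running it might not.
assert counter.total == 10
```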

Document, document, document-
One of the reasons that Python has such elegant syntax is that Guido van Rossum had the insight that code is read far more often than it is written.  An engineer should make it even easier for people to understand his code by writing copious documentation.  Now, one shouldn't comment the obvious, but if anything might be even remotely unclear, it's a good idea to comment for others (and yourself!!) on what your code is trying to accomplish.  Also, learn the documentation tool of choice for your language (doxygen, sphinx, doxia, javadoc, etc), as being able to publish a pdf or to have the documentation in html format is really really nice.

Use the debugger as a last resort-
This is where a lot of people may disagree with me, but to me, debuggers are the big guns of troubleshooting. Prefer loggers to debuggers when possible.  For example, in C or C++, debugging macros or templates is very difficult.  Loggers on the other hand can expand the macro for you, and you can also print out any genericized object.  An exception to this is when you are learning someone else's code, and you want to figure out what is going on.  Very complex code almost requires this.
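The paragraph above talks about C and C++, but the idea carries over to any language.  Here is a minimal sketch using Python's standard logging module.  The function average and the logger name are made up for illustration.

```python
import logging

# Show everything from DEBUG up; in production you would raise this level.
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("averages")


def average(values):
    """Return the mean of values, logging inputs and outputs along the way."""
    log.debug("average() called with %r", values)
    if not values:
        log.warning("empty input, returning 0.0")
        return 0.0
    result = sum(values) / len(values)
    log.debug("average() -> %r", result)
    return result


average([2, 4, 6])
average([])
```

The log statements stay in the code, so the next person to chase a bug here gets the same trail without having to set a single breakpoint.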

Optimize AFTER your code works-
Unless you know a good algorithm right from the beginning, make something work first then make it faster.  However, do keep in mind the following:

1.  Nested for loops are almost always a bad sign (n^k runtime efficiency where k is # of nested loops)
2.  Sorting a data structure (n log n) is usually more efficient than trying to find something randomly
3.  When using recursion, watch out for potentially huge values being passed in (which will blow your call stack)
4.  When using recursion, watch out for a function calling itself more than once (eg, the naive fibonacci, which calls itself twice per invocation and so has exponential runtime, roughly O(2^n)).
5.  Don't be afraid of recursion.  Yes, it pushes a new frame on the call stack for every call and thus is slower, but often, recursive solutions are easier to understand than an equivalent for or while loop.
6.  Be wary of cyclic data structures or potential ones (eg, a linked list where one node points to a previous node).  Your code might work on a non-cyclic data structure, but a cyclic one might make you spin forever or blow your call stack away.
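
Two of the points above are easy to show in a short sketch: caching the results of a doubly recursive function, and checking a linked list for a cycle before walking it.  The Node class and function names are made up for illustration.  The cycle check is Floyd's "tortoise and hare" technique, which is my choice here and only one of several ways to do it.

```python
from functools import lru_cache


def fib_naive(n):
    """Calls itself twice per invocation, so the runtime grows exponentially."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion, but each value is computed once and then cached."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


class Node:
    """Hypothetical singly linked list node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def has_cycle(head):
    """Floyd's check: a fast pointer catches a slow one only if there is a loop."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


assert fib_naive(20) == fib_memo(20) == 6765

# Build a -> b -> c, then point c back at a to create a cycle.
c = Node("c")
b = Node("b", c)
a = Node("a", b)
assert not has_cycle(a)
c.next = a
assert has_cycle(a)
```

Note that the cached version still recurses n levels deep on its first call, so point 3 about huge inputs still applies.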

