Sunday, April 1, 2012

A Testing Manifesto for Hardware Companies, Part 1

I was looking through some of my old posts and noticed a few "drafts" that I never published.  I thought I had published this one earlier, but apparently not.  I wrote this draft about a year ago, but I thought it should see the light of day :)

...

I've been an SDET at my company for about 3.5 years now, and in that time I've either seen other companies' and divisions' test strategies firsthand, or talked to SDETs and Test Engineers from other companies and gotten an idea of what their companies' old test strategies were like.  I have since come to several conclusions about how testing is done, and how it should be done.  What I write here is primarily of interest to managers of testing departments in hardware-oriented companies, as well as Test Architects, but engineers in the test department should also find some use in it.

First off, let me begin with the principles I believe a testing organization should follow:

  1. Even Hardware-centric organizations require enterprise techniques
  2. Hardware-centric organizations need to use fundamental tenets of good software engineering
  3. Use new but mature technologies suited for the task at hand
  4. Test Engineers are "true" engineers and should be treated as such
  5. Managers (Test and Development) need to understand what testing requires
  6. Don't mix up white box, black box, and acceptance testing
  7. Test Engineers and Developers have to work hand in hand
  8. Unit tests should be written by the developers (no "dev test")
  9. Requirements gathering should be an ongoing process
  10. Continuous Integration and Deployment is a must


Is your department not exhibiting some or all of these?  Perhaps you don't understand some of what I am talking about?  Or maybe (gasp) you think that even if your department isn't exhibiting one or more of these traits, it isn't important?  Let me go into a little more detail on each of these issues and explain why not following the above is harmful to your organization.  Then I will discuss a few ideas on how to make sure your group is following them.

Hardware-oriented organizations don't understand enterprise-level computing

Ok, I know, "enterprise" computing itself doesn't have a definition that's exactly set in stone.  But if your SCM or Process group doesn't understand "software as a service", "distributed computing", or web service technologies (or doesn't even understand remote services or remote procedure calls), then I submit that your organization doesn't understand enterprise-level computing.  When I say enterprise, I don't necessarily mean high-volume, high-transaction computing environments, but I do mean remote, distributed computing, with at least some level of persistence and data tracking/mining/relationships.

Even if you understand what enterprise computing is about, how would it benefit the test department?  Think about what a test group does.  It creates tests designed to expose defects in the hardware (or in the software that controls the hardware).  There are many hidden assumptions in this seemingly simple responsibility.  You should immediately think of the following aspects of it:

  1. How are you reporting the results? (are you able to do data mining on results of tests?)
  2. How are users finding the test tools they need? (given a test case, is there a test tool for it?)
  3. How are users installing the test tool? (if they found the test tool, how do they install it?)
  4. How is a user supposed to know how to run the tool? (Do you maintain elaborate documentation?  What arguments are supposed to be passed in for Test Case A versus Test Case B?)
  5. How are you discovering systems that can run tests? (Can you find systems not in use programmatically, so that you are executing 24/7?)

If your organization isn't linking test cases to test tools, then I submit that you are in chaos.  If your organization isn't automatically storing the results of test runs somewhere (hopefully a database of some sort), then you are missing great opportunities.  Being able to know which features in the hardware or software are associated with which test tool (and any other metadata required) is absolutely essential.  Think about what happens if you don't have this linkage.
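
Concretely, that linkage can be as simple as a couple of related tables.  Below is a minimal sketch in Python using the standard-library sqlite3 module; the table and column names are entirely hypothetical, and the point is the relationship, not this particular schema.  Without something like it, you end up with conversations like the one that follows.

import sqlite3

# Hypothetical schema: test cases, test tools, and the mapping between them.
conn = sqlite3.connect("test_tracking.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS test_case (
    id          INTEGER PRIMARY KEY,
    external_id TEXT UNIQUE,     -- the ID a tester quotes, e.g. '00716459'
    description TEXT,
    feature     TEXT             -- which hardware/software feature it covers
);
CREATE TABLE IF NOT EXISTS test_tool (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    version      TEXT,
    location     TEXT,           -- where to fetch or install it from
    default_args TEXT            -- arguments to pass in by default
);
CREATE TABLE IF NOT EXISTS case_tool_map (
    case_id INTEGER REFERENCES test_case(id),
    tool_id INTEGER REFERENCES test_tool(id)
);
""")

def tool_for_case(external_id):
    """Given a test case ID, return the tool name, version, location, and args to run it."""
    return conn.execute("""
        SELECT t.name, t.version, t.location, t.default_args
        FROM test_case c
        JOIN case_tool_map m ON m.case_id = c.id
        JOIN test_tool t     ON t.id = m.tool_id
        WHERE c.external_id = ?""", (external_id,)).fetchone()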

Tester: "Hey Sean, is there a script for this test case I got assigned?"
Test Engineer: "What test case is that?"
Tester: "Ummm, let me see, it's ID 00716459."
Test Engineer: "Oh, that's the one to make sure the ioctl in the driver doesn't time out, right?"
Tester: "Yeah, but is there a program or tool for that?"
Test Engineer: "Yeah there is, let me go find the script on the common share drive."
Tester: "I already kind of looked there..."
Test Engineer: "Did you look under the Sean folder?"
Tester: "Yeah, but there were a couple of scripts that had similar names."
Test Engineer: "Oh yeah...you have to use the one with -version.1.3.5 in it."
Tester: "Oh, OK."
Test Engineer: "And did you make sure you installed all the prerequisites on your test machine?"
Tester: "Such as?"
Test Engineer: "Well, first you have to install..."

If your engineers and testers are having conversations like this, you need to join the 21st century.  Your organization needs to do several things:

  1. God forbid you are storing test cases in an Excel spreadsheet.  Use a database to hold test projects, their associated test cases, and the test runs that testers execute
  2. Relate each test case to a test tool, including the arguments to pass in and its dependencies
  3. Have your test tools report failures, investigations, passes, or anything else you want back to the database, including the tool used, the arguments passed in, and the system configuration/state (see the sketch after this list)
  4. Make installation of tools as pain-free as possible (avoid thick-client tools like GUIs)
  5. Make test tools remotely accessible
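
For the reporting piece, every test tool can share one small helper.  Here's a hedged sketch using only the Python standard library that POSTs a JSON result to a results service; the URL and field names are invented for illustration, so substitute whatever your database front end actually expects.

import json
import platform
import sys
import urllib.request

RESULTS_URL = "http://test-results.example.com/api/runs"  # hypothetical results service

def report_result(case_id, tool, args, outcome, details=""):
    """POST a single test run (tool, arguments, outcome, and system state) as JSON."""
    payload = {
        "case_id": case_id,
        "tool": tool,
        "args": args,
        "outcome": outcome,            # e.g. "pass", "fail", "investigate"
        "details": details,
        "system": {                    # the configuration/state the run actually saw
            "hostname": platform.node(),
            "os": platform.platform(),
            "python": sys.version.split()[0],
        },
    }
    req = urllib.request.Request(
        RESULTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status             # 2xx means the record made it to the database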

Hardware-centric organizations don't understand good software engineering practices

Ok, this is the one that will ruffle the most feathers.  I am not knocking Electrical Engineers, because they know lots of things I don't.  Don't ask me how a MOSFET works, for example.  But time and time again, when I review code that EEs submit, I see many violations.  If you are seeing this too, please try to educate your EEs on good software engineering principles:

  1. No copying and pasting of code.  Not just whole functions, but sections of functions
  2. No ginormous functions (500+ line functions)
  3. No brute-force algorithmic approaches
  4. Minimize tight coupling and low cohesion in modules (for you C/C++ folks, that's a compilation or translation unit)
  5. If using C++, don't just write C with a C++ compiler (no namespaces, no inheritance or polymorphism, no encapsulation, no templates, lambdas, type traits, range-based for, etc.)
  6. Branch your source properly; don't do all development on the mainline (with maybe a rebaseline when a new phase starts)
  7. Don't let versioning be haphazard or an afterthought

To be fair, I see a lot of CS grads and people with software engineering degrees make some of the same mistakes.  But in my opinion, the most egregious violation is #1 in the list above.  I know, your boss wants some feature done in an hour or two, and it's mostly already done somewhere else.  So the temptation is to just copy and paste the matching code into a new function.  The trouble, of course, is that you also copy and paste bugs into the code.

I have gotten push back on #2.  Some developers, thinking that test engineers are just a step above neanderthals, tried to cow me into submission by claiming that if they broke up their functions, it would hurt performance.  I told them that, yes, splitting a huge function into sub-functions means pushing new calls onto the stack, but for one, they could inline their code, and if they didn't want to do that (because inlining makes debugging harder), the benefits of logically breaking up the functionality outweigh the relatively minor performance hit.  For example, if I am forced to find your buggy function, I can do a backtrace and find the buggy 1000-line monstrosity, but now I have to step through potentially 1000 lines of code to find the problem (and god forbid this is multi-threaded code).  I won the argument when the Software Architect joined my side.

I find it ironic that the developers are so focused on performance (witness their excuse above), but when I look at the algorithms in their code, they use brute-force solutions much of the time.  One time, I even had to explain what Red-Black and AVL trees were to a developer.  And though some EEs have heard of Big-O notation, none I have met has ever taken an algorithm analysis class.  The algorithm is the first place you look for optimizations, not clever compiler tricks (inline assembly or making variables volatile, for example).
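
As a toy illustration of the brute-force habit (a contrived Python example, not anything from real code): checking whether a list of register values contains a duplicate.  The first version is the kind of thing I see in reviews; the second is the same check at O(n).

def has_duplicates_brute_force(values):
    """O(n^2): compare every element to every other element."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicates(values):
    """O(n): a set remembers what we have already seen."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False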

The 4th item is a bit harder to explain.  I have, however, seen driver utilities that are just spaghetti code.  For example, if you have a GUI program, it's a good idea to use the MVC design pattern so that it's relatively easy to swap out the GUI for a CLI or even a web-based View (the part the user interacts with).  Instead, everything is so hard-wired together that you can't usefully lift out one class or shared library because of all the entangling dependencies.  Another symptom of this problem: if your device or software runs on multiple platforms, you have code that only works on some of them.  Granted, this usually isn't the firmware or driver developers' problem; it's usually some other development group that creates a tool to interface with the driver or firmware.
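
Here's a rough Python sketch of the separation I mean, with made-up class names: the model knows nothing about how it is displayed, so swapping the CLI view for a GUI or web view touches only view code.

class DeviceStatus:
    """Model: holds state about a (hypothetical) device; no presentation logic."""
    def __init__(self, name, temperature_c, firmware):
        self.name = name
        self.temperature_c = temperature_c
        self.firmware = firmware

class CliView:
    """One view: plain text for a terminal."""
    def render(self, status):
        return f"{status.name}: fw {status.firmware}, {status.temperature_c} C"

class HtmlView:
    """Another view: the same model rendered as an HTML table row."""
    def render(self, status):
        return (f"<tr><td>{status.name}</td><td>{status.firmware}</td>"
                f"<td>{status.temperature_c} C</td></tr>")

status = DeviceStatus("controller0", 47, "1.3.5")
print(CliView().render(status))   # swap in HtmlView() without touching the model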

I can't tell you how many times I've seen C++ code that really should have just been C code.  It looks like all they did was use a C++ compiler to compile C code (right down to wrapping the header files in extern "C" { }).  When developers do this, they are missing an opportunity to leverage some of the features that C++ brings to the table.  Granted, C++ is a beast of a language, and I don't blame developers for not being thrilled about making everything a templated class...nevertheless, namespaces and the basic OOP aspects of C++ do come in handy.  On the flip side of the coin, if you don't need the added features of C++, don't use it.  Perhaps the biggest minus of C++ (and it's a biggie) is that making other languages (like Python or Ruby, for example) use it is nearly impossible due to the lack of a standard ABI.  However, if you're going to do interop with Java, writing JNI in C++ is a natural fit.

Using source revision control correctly is a bit of an art, and I admit, I am guilty of not always using it effectively myself.  But far too often, I see code that never has any branches, with all development done on the main stream/trunk/branch/view or whatever your VCS tool calls it.  This is of course BAD.  If you never properly branch, users may inadvertently check out the wrong code.  Also, what if some OEM wants special or specific code?  This kind of segues into another misuse...

Versioning as an afterthought can be crippling.  In some places, there seems to be no rhyme or reason to why a version changes, or even what the version numbers mean.  Versioning should not be something you think about after you write code.  It should be an integral part of the code itself.  For example, imagine you add a new function to an interface or a class.  If a user just calls the new function, then your change didn't break the API.  But if the user has to implement your interface, you've now made a breaking API change.  How do you let the user know that breaking changes have been made?
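
Here's a concrete sketch of that distinction in Python, with an invented interface name; the comments show how I'd expect the version number to advertise each kind of change (the 1.x numbers are just examples).

from abc import ABC, abstractmethod

class Flasher(ABC):
    """Hypothetical interface that plugins implement to flash firmware."""

    @abstractmethod
    def flash(self, image_path):
        ...

    # Non-breaking addition: existing implementers simply inherit this default,
    # so bump the minor version (e.g. 1.2.0 -> 1.3.0) to advertise the new feature.
    def verify(self, image_path):
        return True

    # Breaking addition: if this were declared @abstractmethod, every existing
    # implementer would fail to instantiate until it adds the method, so the
    # major version has to change (e.g. 1.3.0 -> 2.0.0).
    # @abstractmethod
    # def rollback(self): ...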


Test departments in hardware-centric companies are still clinging to circa 1998-2003 technologies 

Ah, this topic is my favorite.  Let me say right off the bat: if your company is still using Perl for scripting, please do yourself a favor and switch to something like Python, Ruby, or hell, even C or Java.  The second most important thing to do is to switch from thick-client to distributed computing.  In a thick-client model, all the programs reside locally on the system under test, and they cannot or do not interact with any non-local resource (this includes databases or even FTP servers).  In a distributed computing model, you can request a service from a machine other than the one you're sitting on.

Think about a scenario where your script or program has to control other machines, or even just report results.  In the circa-2000 model, you use SSH to control another machine, and you write your results to a text file on some system.  In the 2012 approach, you use web services or remote procedure calls of some sort (SOAP, XML-RPC, JSON-RPC, RMI, etc.) to obtain services.

"Web services?  To test the driver or firmware of a hardware product?  Keep smoking whatever that is buddy...."

Yeah, I know what you are thinking.  But why do you think many hardware appliances have embedded web servers in them?  And even if your hardware doesn't embed a web server, you can create layers above it that do this for you (at least on PC systems or beefier embedded systems-on-a-chip, including, for example, smartphones or tablets).
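
And building one of those layers can be almost trivial: Python ships XML-RPC in the standard library.  Here's a bare-bones sketch (the hostname, port, and run_tool method are all invented) of an agent running on or next to the system under test:

# On the system under test: a tiny XML-RPC agent.
import subprocess
from xmlrpc.server import SimpleXMLRPCServer

def run_tool(command):
    """Run a test tool locally and hand the results back to the remote caller."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return {"returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr}

server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(run_tool)
server.serve_forever()

And the controller, sitting on a completely different machine, just calls it:

# On the controller machine: drive the remote agent and collect results.
import xmlrpc.client

agent = xmlrpc.client.ServerProxy("http://test-machine-42:8000")  # hypothetical host
result = agent.run_tool(["uname", "-a"])
print(result["returncode"], result["stdout"])

No shared drives, no SSH-and-scrape; the results come back as structured data you can push straight into the database.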

The first challenge test departments usually face is picking a language for their automation.  My first piece of advice: don't pick a single language.  Second, don't get so caught up in automation that you forget ad-hoc or exploratory testing (where the real bugs are found).  A language like Perl is a horrible choice if only because your only two real options for interfacing with C shared libraries are SWIG or XS (neither of which I really recommend).  Writing in C for the test group can, believe it or not, be a viable option.  But I recommend using Python (with its wonderfully easy ctypes), Ruby (for its ease in creating DSLs), or Java to hook into C programs.
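
To show what I mean by "wonderfully easy", here's a small ctypes sketch that calls straight into libc; in real life you'd point it at your own driver-utility shared library, whose name and functions I'm obviously making up in the comments below.

import ctypes
import ctypes.util

# Load the standard C library as a stand-in for a vendor .so
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes marshals arguments and returns correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"firmware"))   # -> 8

# A hypothetical hardware-test library would look exactly the same:
#   lib = ctypes.CDLL("./libhwtest.so")
#   lib.read_register.argtypes = [ctypes.c_uint32]
#   lib.read_register.restype = ctypes.c_uint32
#   value = lib.read_register(0x1000)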

Java??  Yes, Java, and I recommend it mostly because of all the enterprise-level support that you get.  Perhaps you are thinking that Java stinks at low-level native OS operations...and for the most part you are right.  Using JNI is very difficult and time consuming, but it does allow C-to-Java callbacks.  Wrapper implementations like SWIG, HawtJNI, JNA or BridJ remove some of the complexity, but they become a bit of a black box, and it's hard to call from C into Java with them.  Python, with ctypes, has a much nicer FFI than Java, but it lacks some of the enterprise features that Java has.

I will discuss bullets 4-10 in the next installment.
