Never Test Before 4

Kind of a silly thought, I know, but it keeps coming back.

I work in a small agile development environment. Development works in 2-week cycles, each of which completes a chunk of code. I keep noticing that anything before Cycle 4 or 5 is usually too incomplete and unstable to test. The first several cycles are usually when all the foundational architectural changes happen.

So we can never really test before (cycle) 4. That's fine. I've got these Ruby scripts to keep me busy in the meantime. =)

Observation on the Proofreader Effect

I've been working on some performance test scripts using Ruby (Watir, actually) over the last few weeks, and have been happily rewriting the scripts I first wrote a year ago. (Programmers call this activity refactoring.) I've learnt a lot about Ruby and scripting web apps over the last year. One of the biggest helps was Brian Marick's new book "Everyday Scripting with Ruby". Thanks to that book, my performance test scripts are really slick now and look more like a programmer wrote them. But I digress...

The thing that I've been thinking about over the last few days is the problem of testing the scripts that I've written. Any good tester would never trust a programmer to write error-free code, so why should I trust myself to? But then who should test my scripts? Well, there really isn't anyone else around who can right now so I have to do it myself. Is that a problem? I don't think so.

I'm the biggest stakeholder who cares about these scripts working correctly, while my boss is mostly interested in the numbers and my analysis. So I ran the scripts and worked out the kinks one section at a time until I was able to run them straight through several times without error.

Is that good enough testing? Well, I got the coverage and measurements I wanted, so I guess so. The scripts don't have to be perfect; they just need to give me the data I need. So, it's all good.

Right. I completed the analysis for this run and then started to compare the numbers against the benchmark numbers from last year. It wasn't until several hours later that I noticed a typo in the output. Eek!

I'll just sneak back into the code and fix that. No one saw that. I'll just re-run the scripts and make sure the output looks "clean" this time. Great! Looks fine now.

So how did I miss that typo? I thought about this for a while. I think the proofreader effect is like a FIFO buffer. That is, I don't think I could have seen this bug until I got the other, bigger bugs out of the way... you know, like the ones that prevented the script from completing or from collecting the data I needed in the first place.

First in, First out. Get the big ones out of the way and then I can see the smaller ones that were hiding underneath. The typo was always there but I was just temporarily blinded to it because my attention was so focussed on the bigger fish.

So was I unqualified to test my own code? I don't think so. I caught all the bugs I cared about. It just took me a few days to find them. Would a separate tester have found the typo before me? Maybe, maybe not. The FIFO effect only affected *my* ability to see the little things until the bigger ones were out of the way because I was the one who wrote the scripts. A separate tester would have a different perspective and shouldn't be affected by this FIFO/Proofreader Effect in the same way.

We do Exploratory Testing almost exclusively on our products. When I test, I don't see the same effect happening to me. It's just a matter of time until I get to a feature or page, and then I hit it like a whirlwind and move on. It's quite cool and effective. Defect-finding rate starting to slow down? Switch to another Test Technique and voilà! More bugs. All the Risks addressed? Move on.

I've seen a number of conversations happening on some of the message boards questioning whether or not a programmer is able to test his or her own code. After this recent experience, I think if the desire is there and there is enough time, then yes, she should be able to find all the bugs that matter.

Once again, a separate pair of eyes not constrained by the FIFO effect would likely speed up the process. Nothing we didn't already know. A Tester helps you to find the bugs that matter sooner rather than later. Well, a good one will anyway.

Sometimes "Good Enough" isn't good enough

I've been a big fan of the idea of "Good Enough" software testing over the last decade. Rather than thinking of good Software Testing as akin to "Digital" technology, with its complete, precise values, I've thought of it more like "Analog" technology, with its big dials and reasonable, approximate (and cheaper) signals.

This past week, I've watched my seven-year-old son play with a new LEGO set that he got for Christmas. It's a neat mechanical LEGO set that lets him build a motorised helicopter, a cool car, or an attack crab thingy. (ASIDE: I can't begin to imagine what the Marketing team's conversation was like when they thought up that last one!) When he completed the helicopter and turned on the motor, I noticed that it didn't sound right to me. So I went over and took a close look at his creation. It looked correct. There didn't seem to be any missing pieces, but when he turned it on again, I noticed that not all of the gears turned together consistently. I picked it up and took a really good look at it. Not knowing much about how it was built, I just squeezed together any LEGO pieces that weren't tightly packed whenever I came across them.

There was one set of LEGO pieces that had a gap of about a millimetre. When I squeezed them together, they made a (good) snap sound. I asked my son to turn on the motor again, and this time it not only sounded correct, but the gears also worked together in perfect sync. Voilà!

I thought about this for a few moments afterwards. Up until then, my son had worked on the premise that if the LEGO pieces were reasonably attached, it was "good enough". He didn't need a tight fit between every single piece to see the finished product. I mean, it looked like the picture of the completed helicopter in the instruction manual, so what difference would a small gap between a few pieces make?

In this case it made a big difference. If it needs to work like clockwork, then "good enough" is probably not enough.

So what's the tie in to Software Testing? Well, just how scalable is the "Good Enough" approach? For me, it's always been about testing to the most important Risks and using whatever tools and techniques seem appropriate to the situation at hand. It's always seemed kind of foolproof to me.

Maybe my Digital/Analog analogy is a flawed one. I mean, Analog technology has its limits and is not very scalable. Digital technology is more precise and can handle more information. Is there a point when a Digital solution gets so large that it requires an Analog approach again? (I think the answer here is 'yes.')

Is there a time when "good enough" needs to be replaced with a more complete, structured or methodical approach to software testing? I can't think of any situations like that right now, but that doesn't mean there aren't any. That is, I can't think of a time when good software testing wouldn't have to strike a balance between economics, quality and time to market for a product or system. Shipping with bugs is okay if you know that they aren't critical or life-threatening.

So perhaps "good enough" doesn't always apply when we're dealing with real-world objects like LEGO creations, automobiles, watches, et cetera. I think it still holds pretty well in the virtual world of software testing. Until someone can give me a good example or two of when "good enough" wouldn't be good enough for testing software, I think I'll chalk this up as another distinction between testing software and testing hardware.