Defect Detection Effectiveness
In early 1999 I gave my first conference presentation (to a BCS SIGiST in London) and its subject was metrics for test management. I’d been inspired by Bill Hetzel’s 1993 book “Making Software Measurement Work” and one of the things I’d found there was a way to measure how good we are at finding bugs before the software goes live. Well, if that isn’t the most important thing that any test team lead should know, it’s certainly right up there. Hetzel called it “defect detection efficiency” but, as it doesn’t take resource consumption into account, that’s not accurate so I called it “defect detection effectiveness (DDE)” (ok, I won’t start ranting about US vs. British English here, but it doesn’t take much to get me started about that).
At the same time, Mark Fewster and Dorothy Graham were writing a book called “Software Test Automation” (published in 1999 by Addison-Wesley, ISBN 0-201-33140-3) in which they independently identified this metric as being of great importance and called it “defect detection percentage”. It is a percentage, as you’ll see. Mark has since been kind enough to admit that my more descriptive name for it might be an improvement, but much more important is what it does and how it works.
DDE is the quantity of defects found by a test activity as a percentage of all defects that could and should have been found by that activity.
It can be measured for the test effort as a whole or for any part of it, e.g. a test level (system or acceptance testing) or a test type (integration or performance testing).
For example, the DDE calculation for the test effort as a whole is:
DDE = (qty of bugs found in test * 100) / (qty of bugs found in test + qty of bugs found in live)
Now, after looking at that formula you’ll have realised that it can be calculated only in retrospect, after later activities have found the defects that escaped from the one being measured. For a project / product as a whole, it can be calculated only after deployment to production, so we have to wait a while before we can see the result.
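As a quick illustration of the formula for the test effort as a whole, here is a minimal Python sketch. The function name `dde` and the sample counts are my own assumptions for the example, not something taken from a real project.

```python
def dde(found_in_test: int, found_in_live: int) -> float:
    """Defect detection effectiveness of the whole test effort, as a percentage.

    found_in_test: defects found by testing before go-live
    found_in_live: defects that escaped and were found in production
    """
    total = found_in_test + found_in_live
    if total == 0:
        # With no defects found anywhere, the percentage is undefined.
        raise ValueError("DDE is undefined when no defects have been found")
    return found_in_test * 100 / total


# Hypothetical figures: 90 bugs found in test, 10 more found in live.
print(dde(90, 10))  # 90.0
```

Because the live count only grows after release, calling this again later with the same test count can only give the same or a lower figure.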
You’ll also have realised that it will change over time, as more bugs are found, and some people have used this as an argument against it. However, studies have shown that most of the defects that will ever be found in a release of a software product have been seen by the time it has been live for 6 months, and I’m happy with that, so I take an interim measurement after 3 months and a final one after 6 months. If the release’s life is shorter than that, then I take the measure on the last day before it is replaced; this can be done even when a Scrum team is releasing every week.
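The measurement schedule described above can be sketched in a few lines of Python. The helper names `add_months` and `measurement_dates` are hypothetical. Skipping the interim measurement when a release is replaced before it has been live for 3 months is my assumption; the rule above only says what to do about the final one.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`, clamped to month end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def measurement_dates(go_live: date, replaced_on: Optional[date] = None) -> dict:
    """Interim (3 months) and final (6 months) DDE measurement dates for a release."""
    interim: Optional[date] = add_months(go_live, 3)
    final = add_months(go_live, 6)
    if replaced_on is not None:
        last_day = replaced_on - timedelta(days=1)
        if last_day < final:
            # Short-lived release: measure on the last day before replacement.
            final = last_day
        if last_day < interim:
            # Assumption: no separate interim measurement for such a release.
            interim = None
    return {"interim": interim, "final": final}


# A release that stays live for the full period.
print(measurement_dates(date(2024, 1, 15)))
# A weekly release: replaced seven days after go-live.
print(measurement_dates(date(2024, 3, 4), replaced_on=date(2024, 3, 11)))
```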
When calculating DDE for only part of the test effort, it’s important to consider only those types of defect which those test activities should have found. Don’t beat yourself up for not finding usability bugs during systems integration testing, because you weren’t supposed to!
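To make the scoping rule concrete, here is a rough Python sketch of DDE for a single test activity. The `Defect` record, the activity names and the defect types are all invented for the illustration. It assumes each defect is tagged with its type and with the activity that found it, and that you can say which activities come later.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Defect:
    defect_type: str  # e.g. "interface" or "usability" (hypothetical categories)
    found_by: str     # the activity that found it, or "live"


def activity_dde(defects: Iterable[Defect], activity: str,
                 in_scope_types: Set[str], later_activities: Set[str]) -> float:
    """DDE of one activity, counting only the defect types it should have found."""
    in_scope = [d for d in defects if d.defect_type in in_scope_types]
    found = sum(1 for d in in_scope if d.found_by == activity)
    # Escapes are in-scope defects that a later activity (or live use) found.
    escaped = sum(1 for d in in_scope if d.found_by in later_activities)
    if found + escaped == 0:
        raise ValueError("no in-scope defects recorded for this activity")
    return found * 100 / (found + escaped)


defects = (
    [Defect("interface", "systems integration test")] * 3
    + [Defect("interface", "live")]
    + [Defect("usability", "acceptance test")]
    + [Defect("usability", "live")] * 2
)

# The usability bugs are out of scope here, so they don't count against the activity.
print(activity_dde(defects, "systems integration test",
                   in_scope_types={"interface"},
                   later_activities={"acceptance test", "live"}))  # 75.0
```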
This is definitely a metric for which the total ‘amount of badness’ is more significant than a breakdown by severity, because the ‘amount of badness’ that escaped is as near as we can get to a measure of the resulting user experience, and the ‘amount of badness’ that we found is a simple and direct measure of how good we are entitled to feel about the job we are doing. So I always use weighted defect quantity for this. And I always measure it, for at least three reasons:
Publishing the results promotes healthy competition between teams.
Measuring it on a regular basis shows whether the performance of a test team / Scrum team / security test vendor / whatever is getting better or worse, and by how much.
It’s a metric for which a measurable target can be set, against which the performance of teams or vendors can be judged.
And a fourth reason: if you’re measuring COQ (cost of quality) then it’s very interesting to compare the way that these two metrics change over time. As DDE rises, COQ should fall; if it doesn’t, then although you are becoming more effective you’re delivering less value for money.
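For anyone who wants to try the weighted version of DDE mentioned earlier, here is a small Python sketch. The severity names and weights are purely hypothetical; use whatever weighting scheme your organisation has agreed. The sample counts are invented too.

```python
from typing import Dict

# Hypothetical severity weights: an assumption for this example only.
SEVERITY_WEIGHTS = {"critical": 8, "major": 4, "minor": 1}


def weighted_quantity(counts: Dict[str, int]) -> int:
    """Total 'amount of badness': each defect count multiplied by its severity weight."""
    return sum(SEVERITY_WEIGHTS[severity] * n for severity, n in counts.items())


def weighted_dde(found_in_test: Dict[str, int], found_in_live: Dict[str, int]) -> float:
    """DDE as a percentage, using weighted defect quantities instead of raw counts."""
    found = weighted_quantity(found_in_test)
    escaped = weighted_quantity(found_in_live)
    if found + escaped == 0:
        raise ValueError("DDE is undefined when no defects have been found")
    return found * 100 / (found + escaped)


in_test = {"critical": 2, "major": 10, "minor": 40}  # weighted: 16 + 40 + 40 = 96
in_live = {"critical": 1, "major": 2, "minor": 8}    # weighted: 8 + 8 + 8 = 24

print(weighted_dde(in_test, in_live))  # 80.0
```

With these made-up numbers the unweighted figure would be 52 out of 63, roughly 82.5%, so the weighting pulls the result down because a critical defect escaped.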
DDE is such a useful metric that I recommend it as a vital key performance indicator for testing. I’ve seen that many people use the terms metric and KPI interchangeably but, strictly speaking, they are different. How? Well, that’s a topic for another time.
This article is a part of a series. You can check the next one right now.
Author: Richard Taylor
Richard has dedicated more than 40 years of his professional career to IT business. He has been involved in programming, systems analysis and business analysis. Since 1992 he has specialised in test management. He was one of the first members of ISEB (Information Systems Examination Board). At present he is actively involved in the activities of the ISTQB (International Software Testing Qualifications Board), where he mainly contributes with numerous improvements to training materials. Richard is also a very popular lecturer at our training sessions and a regular speaker at international conferences.