How many levels of Severity do you like?

In ‘A Bug Like This’, when explaining how to get a reliable idea of how much a particular type of bug might cost to fix if it went live, I argued that Severity isn’t a reliable basis for deriving the cost of a defect.  It is, however, one of the most important ways that we classify anything in our work as testers because so much can be learnt from analyses of defect reports.  The GIGO principle – Garbage In, Garbage Out – applies strongly here.  If we don’t get our severity classification right, analyses based on it will be misleading and the decisions that we then make may take us in the wrong direction.

Of course, it’s important to have each severity level clearly defined so that people can make the right choices.  Before you can do that, you’ll need to decide how many levels you will have.

When my IT career began, most organisations seemed to think that three levels sufficed.  The main exceptions were for safety-critical applications, where more granularity was needed.  Norman Goodkin, in his ground-breaking work on defect-based acceptance metrics (which, unfortunately, no longer seems to be publicly available), proposed nine (yes, 9!) levels, with acceptance criteria based on a maximum permissible number of unresolved defects at each level.  In this scheme there’s no need to consider the nature of any individual defect, because the definition of its severity is precise enough to tell you all you need to know about the kind of impact that it will have.  Whilst I like defect-based acceptance criteria, a nine-level severity scheme would be overkill for most of us.  Is there an ideal number of severity levels?
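As an aside, the idea of a defect-based acceptance criterion can be sketched in a few lines of Python.  This is only an illustration of the principle described above, not a reconstruction of Goodkin's scheme: the function name `meets_acceptance` and every number in `HYPOTHETICAL_LIMITS` are invented for the example.

```python
# Hypothetical limits: the maximum number of unresolved defects permitted
# at each of nine severity levels (1 = most severe). These figures are
# invented for illustration and do not come from any published scheme.
HYPOTHETICAL_LIMITS = {1: 0, 2: 0, 3: 1, 4: 2, 5: 5, 6: 10, 7: 20, 8: 40, 9: 80}


def meets_acceptance(unresolved: dict[int, int], limits: dict[int, int]) -> bool:
    """Return True when no level has more unresolved defects than its limit.

    A level that has no entry in `limits` is assumed to permit zero
    unresolved defects (an assumption made for this sketch).
    """
    return all(count <= limits.get(level, 0) for level, count in unresolved.items())


# Example: one unresolved level-3 defect and ten at level 9.
print(meets_acceptance({3: 1, 9: 10}, HYPOTHETICAL_LIMITS))
```

Note that the check looks only at counts per level, never at an individual defect, which is why such a scheme depends on very precise severity definitions.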

The best classification schemes are the ones that make it easy to choose the right option.  That’s partly a matter of clear definitions but psychology also plays a part.  If the scheme has many levels then there’ll be only a small difference between any two neighbouring ones and then:

  • the most conscientious people will spend a long time considering the subtleties before feeling confident about their choice

    •  good classification but poor productivity

  • less conscientious people, especially at the end of a difficult day, will make a quick but arbitrary choice

    •  better productivity but poor classification

  • others will spend time trying to make a good choice but will then give up the struggle and make an arbitrary one

    •  worst of both worlds

If the scheme has only a few choices, like the classic High – Medium – Low one, there’ll be a big difference between any two neighbouring levels, each of which will have broad scope.  High, for example, could cover anything from total unavailability to one feature not working.  Then:

  • when a defect seems to be near the border between two levels, most people will prefer to be safe rather than sorry and will choose the higher level, so many defects will be over-rated

  • analyses by severity won’t be very informative.

Is there a good compromise?  Five levels have always worked well for me: enough to allow confident choices and informative analysis but not so many that people lose the will to live!  The JIRA default level names of Blocker, Critical, Major, Minor and Trivial work well for this because their alphabetical order matches their order of severity, so nobody has to remember what level numbers mean.  As for definitions, how about these, at least as a starting point for thought:

  1. Blocker: entire system unusable

  2. Critical: a business-critical function is entirely unusable, or, loss or corruption of business-critical data

  3. Major: a business-critical feature is unusable with no acceptable work-around, or, loss or corruption of non-critical data, or, a non-critical function is entirely unusable

  4. Minor: a business-critical feature is unusable but there is an acceptable work-around, or, a non-critical feature is unusable, or, a cosmetic issue that may cause difficulty in use or reputational damage.

  5. Trivial: none of the above.
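For teams that record defects in a tool, the definitions above could be encoded roughly as follows.  This is a minimal sketch under stated assumptions: the `Impact` fields and the `classify` function are hypothetical names invented here, and real definitions would need agreement on what counts as "business-critical" and "acceptable".  The rules are tested from the most severe level downwards, so a defect always receives the highest level whose definition it meets.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """The five levels, numbered so that 1 is the most severe."""
    BLOCKER = 1
    CRITICAL = 2
    MAJOR = 3
    MINOR = 4
    TRIVIAL = 5


@dataclass
class Impact:
    """Observed facts about a defect (hypothetical field names)."""
    system_unusable: bool = False
    critical_function_unusable: bool = False
    critical_data_lost: bool = False
    critical_feature_unusable: bool = False
    acceptable_workaround: bool = False
    noncritical_data_lost: bool = False
    noncritical_function_unusable: bool = False
    noncritical_feature_unusable: bool = False
    cosmetic_with_impact: bool = False


def classify(i: Impact) -> Severity:
    """Apply the definitions in order, most severe first."""
    if i.system_unusable:
        return Severity.BLOCKER
    if i.critical_function_unusable or i.critical_data_lost:
        return Severity.CRITICAL
    if ((i.critical_feature_unusable and not i.acceptable_workaround)
            or i.noncritical_data_lost
            or i.noncritical_function_unusable):
        return Severity.MAJOR
    # A critical feature reaching this point must have a work-around,
    # otherwise it would already have been classified as Major.
    if (i.critical_feature_unusable
            or i.noncritical_feature_unusable
            or i.cosmetic_with_impact):
        return Severity.MINOR
    return Severity.TRIVIAL


# The level names are already in alphabetical order, most severe first.
names = [s.name.title() for s in Severity]
assert sorted(names) == names
```

For example, `classify(Impact(critical_feature_unusable=True, acceptable_workaround=True))` yields `Severity.MINOR`, while the same defect without a work-around yields `Severity.MAJOR`.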

If you’re used to working with fewer levels you might be worried about the extra time needed for analysis.  If you want to analyse defect distribution or detection activity, for example, do you now have to repeat the process with five sets of numbers to see each level separately?  Not necessarily, because in many circumstances the use of a trick called weighting severity can be more efficient … but that’s the subject of a future article.

This article is part of a series.

Author: Richard Taylor

Richard has spent more than 40 years of his career in IT, working in programming, systems analysis and business analysis. Since 1992 he has specialised in test management. He was one of the first members of ISEB (Information Systems Examinations Board). He is currently active in the ISTQB (International Software Testing Qualifications Board), where he contributes mainly through improvements to training materials. Richard is also a very popular lecturer at our training sessions and a regular speaker at international conferences.