Software Quality Metrics

A great deal has been written about this topic, in books and in articles. Why write another piece? To save you time and to share my 5 years of experience in developing Business Intelligence tools and quality metrics.
This article explains the simplest and most efficient software quality metrics, how they drive software quality, how to read them and how they can be manipulated to hide or show the true picture of software quality.

Topics
1. Definition Of Defect
2. Cumulative Metrics
3. Defects Backlog Metric
4. Release To Release Comparison (R2R) Metric
5. Mean Time To Resolution (MTTR) Metric
6. Running Total Metrics
7. Phase Containment Metrics
8. Conclusion

1. Definition Of Defect

A defect (bug) is an expression of an opinion that something is broken. It may or may not actually be broken, but either way each defect consumes some amount of engineering, support, test or QA time, which, as we know, is money.

There are lots of defect tracking systems. The good ones make sure that defects have the following attributes:

- state (the defect’s life cycle indicator: it may be open, assigned, postponed, waiting for information, resolved, verified, junked, duplicated, etc.; the possible values depend on the development / testing process and may vary);

- severity (how serious the impact is: catastrophic, severe, moderate, etc.);

- priority (how soon it needs to be fixed; depends strongly on whether a workaround exists);

- found (the product life cycle stage in which it was actually found: development, testing, early field trial, customer use, etc.);

- product, version (sometimes a project codename, a branch name or any other way to accurately indicate which piece of code has the defect);

- headline, description, attachments, etc. (everything that actually explains what the issue is, how it’s being fixed, etc.);

- context-related attributes (depending on the nature of the business or the product line structure, these can be anything that helps to understand and fix the defects).
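
The attributes above could be captured in a record like the one below. This is only a minimal Python sketch: the class name Defect, its field names and the is_open helper are illustrative assumptions, not the schema of any particular defect tracking system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Defect:
    """One defect record; all field names here are hypothetical."""
    defect_id: str
    headline: str
    state: str              # e.g. "open", "assigned", "resolved", "verified"
    severity: str           # e.g. "catastrophic", "severe", "moderate"
    priority: str           # how soon it needs to be fixed
    found_in: str           # e.g. "development", "testing", "field_trial", "customer"
    product: str
    version: str
    submitted_on: date
    resolved_on: Optional[date] = None
    extra: dict = field(default_factory=dict)  # context-related attributes

    def is_open(self, on: date) -> bool:
        """True if the defect had been submitted by `on` and was not yet resolved."""
        if self.submitted_on > on:
            return False
        return self.resolved_on is None or self.resolved_on > on
```

Keeping the submission and resolution dates on the record is what makes the time-based metrics in the following sections computable.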

Defects can go by names like “tickets”, “incidents”, “customer calls”, “issues”, etc. The bottom line of all these things is: something is broken and needs to be fixed. Even a feature request is a defect: the lack of the feature means the product is not exactly what the customer wants.

Now that we have defined the term “defect”, it’s time to explore quality metrics. In my experience, no single metric is able to provide a complete picture of quality. Several metrics always need to be considered together. Which ones are the best? Well, depending on the development phase, there are different sets of metrics with specific filters suitable for the context.

I’m going to describe the simplest and most efficient metrics and explain how they can help to improve software quality.

2. Cumulative Metrics

Cumulative metrics are needed to estimate two things: a) the total number of defects (submitted, resolved or changed in state) during a specific period of time and b) the dynamics of the measured process (the shape of the slope).

The example chart displays two trends: cumulative incoming and cumulative resolved defects. Looking at this chart, a few defects are still unresolved at the end of the given period.

When you see a “knee” on the cumulative incoming trend, it usually means that there are not many defects left to discover (usually about 1/3 to 1/2 of the number already found) and that each further defect will take more time and effort from the QA team to find.

Cumulative metrics are simple and, hence, cannot be easily manipulated. They usually can be verified with minimum effort.
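
As a rough illustration of how the two cumulative trends could be computed, here is a small Python sketch. The function name cumulative_counts and the sample dates are made up for the example; it assumes you can export one date per submitted defect and one date per resolved defect.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate


def cumulative_counts(event_dates, start, end):
    """Return [(day, cumulative count)] for each day from start to end inclusive.

    Events dated before `start` or after `end` are ignored.
    """
    per_day = Counter(event_dates)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    totals = accumulate(per_day.get(d, 0) for d in days)
    return list(zip(days, totals))


# Made-up sample data: dates on which defects were submitted and resolved.
submitted = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
resolved = [date(2024, 1, 2), date(2024, 1, 4)]

incoming_trend = cumulative_counts(submitted, date(2024, 1, 1), date(2024, 1, 4))
resolved_trend = cumulative_counts(resolved, date(2024, 1, 1), date(2024, 1, 4))

# The gap between the two trends on the last day is the number still unresolved.
unresolved_at_end = incoming_trend[-1][1] - resolved_trend[-1][1]
```

With this sample data the incoming trend is 2, 3, 3, 4 and the resolved trend is 0, 1, 1, 2, so two defects remain unresolved at the end of the period.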

3. Defects Backlog Metric

The backlog metric indicates the number of defects of a given kind that are in the open (not fixed) state at any point in time.

Interpretation of the backlog metric depends strongly on the development phase. Some people may think that a lower backlog indicates better product quality. Not always. For example, during the test period more defects are expected to be found, which drives the backlog trend up. There is more: if the backlog is not going up during testing, something is wrong with the test methodology or the QA process. Lots of defects are not being found in this phase, which means customers will be finding them instead of the QA team.

So, an increase in backlog is not always a bad thing, just as a decrease is not always a good one. The picture above represents an example of a healthy release getting ready for production: the backlog increased during development and testing and, in the late testing stages, when the majority of issues had been identified and were being fixed, the backlog went down.

However, after the product is shipped to customers, it’s better to drive the defects backlog to zero. Ideally, you should not ship a release with open defects. However, the price of a delay may exceed the price of the defects customers may encounter, so releasing a product with known defects should be a conscious business decision.
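
A backlog trend like the one described can be sketched in a few lines of Python. The function name backlog_trend and the sample data are assumptions made for illustration; each defect is reduced to a pair of dates (submitted, resolved), with None meaning it is still open.

```python
from datetime import date, timedelta


def backlog_trend(defects, start, end):
    """Return [(day, open count)] for each day from start to end inclusive.

    `defects` is a list of (submitted_on, resolved_on) pairs; resolved_on is
    None for defects that are still open. A defect counts as open on a day if
    it was submitted on or before that day and not resolved by that day.
    """
    trend = []
    day = start
    while day <= end:
        open_count = sum(
            1
            for submitted, resolved in defects
            if submitted <= day and (resolved is None or resolved > day)
        )
        trend.append((day, open_count))
        day += timedelta(days=1)
    return trend


# Made-up sample: three defects, one of them never resolved.
sample = [
    (date(2024, 1, 1), date(2024, 1, 3)),
    (date(2024, 1, 1), None),
    (date(2024, 1, 2), date(2024, 1, 4)),
]
trend = backlog_trend(sample, date(2024, 1, 1), date(2024, 1, 4))
```

For this sample the backlog goes 2, 3, 2, 1: it rises while defects are still being found and falls once fixes outpace new submissions.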

How can this metric be manipulated? Suppose somebody would like to present a “general development quality improvement” (a decreasing backlog). They would include in the analysis a collection of releases that are about to ship or have recently shipped. Why? These kinds of releases naturally have a decreasing backlog (unless they are in really terrible shape), so the composition of these trends will always produce a nice decreasing slope.

Another way to manipulate this metric is to set filters for one context and draw conclusions for another. For example, if you consider only defects found by testers and exclude defects found by developers and in early field trials, it’s hard to say whether development is going in the right or the wrong direction.

4. Release To Release Comparison (R2R) Metric

How do we know if the quality of the next release is better or worse than that of the previous one? How can we compare them?

The Release To Release Comparison metric is a side-by-side comparison of the cumulative number of defects submitted by customers after each release was shipped. Duplicate defects are better excluded from consideration.
The chart represents an example of cumulative defect trends for two releases, plotted relative to each release’s first customer shipment date. The X-axis may represent days, weeks or months. The chart shows that by, say, week #2, release 2.0 had about 8 defects in total, whereas release 3.0 had about 18. During the next week about 7 new defects were submitted against 2.0 (so the total is about 15) and 3 defects were submitted against 3.0 (21 total).
Looking at the chart, it seems obvious that release 3.0 is causing more customer pain than release 2.0. It doesn’t matter whether the customer install base is bigger or release 3.0 has more features. It still has worse quality compared to the previous release.
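
The key step in this comparison is aligning each release to its own first customer shipment date. A minimal Python sketch of that alignment follows; the function name r2r_cumulative, the ship dates and the defect dates are all invented for the example, and duplicates are assumed to have been filtered out beforehand.

```python
from datetime import date


def r2r_cumulative(ship_date, submitted_dates, weeks):
    """Cumulative customer-found defects per week since ship_date.

    Index 0 covers the first 7 days after shipment, index 1 the next 7, etc.
    Defects dated before ship_date or after the last week are ignored.
    """
    per_week = [0] * weeks
    for d in submitted_dates:
        offset = (d - ship_date).days
        if 0 <= offset < weeks * 7:
            per_week[offset // 7] += 1
    totals, running = [], 0
    for count in per_week:
        running += count
        totals.append(running)
    return totals


# Invented sample data for two releases shipped more than a year apart.
older = r2r_cumulative(
    date(2023, 1, 2),
    [date(2023, 1, 3), date(2023, 1, 5), date(2023, 1, 10),
     date(2023, 1, 11), date(2023, 1, 17)],
    weeks=3,
)
newer = r2r_cumulative(
    date(2024, 3, 4),
    [date(2024, 3, 4), date(2024, 3, 6), date(2024, 3, 8),
     date(2024, 3, 10), date(2024, 3, 14)],
    weeks=3,
)
```

Because both lists are indexed by weeks since shipment rather than by calendar date, they can be plotted on the same axes and compared point by point.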

Here is a good example: when you buy a new car, you expect it to be more reliable (to have fewer defects) than older models. If it breaks down more often than the previous model, would you accept “because of more features” as a reason to declare the new one better in terms of quality? Very unlikely. I’d say, no way. So why should this kind of argument fly in quality reviews? No good reason.

5. Mean Time To Resolution (MTTR) Metric

This metric is needed to estimate the efficiency of the team that fixes defects.

MTTR is the time needed to resolve all outstanding (open) defects, assuming that no new defects are submitted and that the team keeps the same productivity it has shown over the last X days (or weeks).

Formula:

MTTR(t) = #OPEN(t) / ( #RESOLVED(t) / X )

where

#OPEN(t) is the number of open defects on a specific day t,

#RESOLVED(t) is the cumulative number of defects resolved during the X days (weeks) prior to t.

MTTR is measured in the units defined by X. If X is defined in days, then MTTR is also in days.
Usually X is tied to the development process and should be big enough to ensure that the trend does not fluctuate a lot. 3 to 6 months may be a good number.

MTTR depends on the following things:

1) Defects backlog (more unresolved defects mean greater MTTR),

2) Efforts of the team to resolve defects during period X,

3) Efforts of the team to resolve defects prior to X, because defects resolved more than X days ago drop out of the window. For example, suppose X = 60 days, the current backlog is 50 defects, a total of 120 defects were fixed during the last X days (2 defects/day on average), and 20 of those were resolved exactly X days ago. MTTR for today is 50 / ( 120 / 60 ) = 25 days. Tomorrow, assuming the same backlog and no new resolutions, those 20 defects fall out of the window and MTTR jumps to 50 / ( 100 / 60 ) = 30 days.
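
The formula and the worked example above can be checked with a few lines of Python. This is only a sketch: the function name mttr and its parameter names are made up, and returning infinity when nothing was resolved in the window is an assumption about how to handle that edge case, which the formula itself leaves undefined.

```python
def mttr(open_count, resolved_in_window, window):
    """MTTR = #OPEN / (#RESOLVED / X), in the same units as `window` (X).

    Written as open_count * window / resolved_in_window, which is the same
    formula rearranged to avoid an intermediate rounding step.
    """
    if resolved_in_window == 0:
        # Assumption: with no resolutions in the window the backlog never clears.
        return float("inf")
    return open_count * window / resolved_in_window


# Numbers from the example: backlog of 50, X = 60 days.
mttr_today = mttr(open_count=50, resolved_in_window=120, window=60)
# Tomorrow the 20 defects resolved X days ago fall out of the window.
mttr_tomorrow = mttr(open_count=50, resolved_in_window=100, window=60)
```

Here mttr_today comes out at 25 days and mttr_tomorrow at 30 days, with no change in the backlog at all, which is the sensitivity the metric is meant to have.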

MTTR is a very sensitive metric and is intended to catch a process slowdown before it affects product quality.

Many people tend to think that the Average Age of defects is the same thing. It is not. The Average Age metric does not react as fast to an immediate change in the process, and it has so much inertia that it is hard to use.

6. Running Total Metrics

One more simple and efficient metric is the one that counts the total number of certain events that occurred during a specific time frame.

For example, Customer Found Incoming Defects 2 Months Running Total (CF2RT).

CF2RT(t) = #INCOMING[t-2m,t]

So, each point of the CF2RT metric is the cumulative number of customer-found defects submitted during the 2-month period prior to that point.

The simple formula makes it easy to verify and, at the same time, pretty efficient at indicating what’s going on.
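
A running total of this kind might be computed as in the Python sketch below. The function name running_total is made up, and approximating “2 months” with a fixed 60-day window is an assumption for simplicity; a calendar-month window would work just as well.

```python
from datetime import date, timedelta


def running_total(event_dates, day, window_days=60):
    """Count events that fall in the window (day - window_days, day].

    window_days=60 is a simplifying stand-in for "2 months".
    """
    window_start = day - timedelta(days=window_days)
    return sum(1 for d in event_dates if window_start < d <= day)


# Invented submission dates of customer-found defects.
incoming = [date(2024, 1, 10), date(2024, 2, 20), date(2024, 3, 5), date(2024, 4, 1)]

cf2rt_end_of_march = running_total(incoming, date(2024, 3, 31))
```

Evaluating the function for every day in a period gives the whole CF2RT trend; for 31 March 2024 in this sample only the defects from 20 February and 5 March fall inside the window, so the value is 2.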

7. Phase Containment Metrics

Phase Containment metrics are used to find out how many defects were found in each development phase.

We know that the cost of a defect depends on the phase in which it is found. During the development phase, one defect may cost anywhere from a few dollars to a few hundred dollars to find and fix. When the release is being tested, the cost goes up several times (x5 to x10) because more people from different groups are involved. When the product goes to early trial, it’s again several times more expensive than in the previous phase (testing), and when the product is finally shipped, each defect costs 5 to 10 times more than in the early trial phase. So, the cost of the same defect may go from a few hundred dollars during the development phase to a few hundred thousand dollars (or even a few million, depending on the scale of the organization, the customer base and the commercial popularity of the product) during the production phase.

Let’s consider an example of our hypothetical product, E-Note.

A green rectangle represents the cumulative number of defects found in the development phase, blue those found during internal testing, orange those found during the early field trial (beta testing) and red those found by customers.

Version 1.0 was done in a hurry, and the team made a conscious decision to launch it a little earlier (maybe they wanted to be the first to offer this kind of product).

Version 2.0 was done with a focus on quality (more defects were found during the development and internal testing stages), so customers found far fewer defects.

Then something happened. Version 3.0 was a disaster. What went wrong? The team lost focus and didn’t find as many defects in the development phase as they should have. Internal testing was also not very good. The result was a lot of customer pain and lots of defects found by customers.

Version 4.0 is still under development, but from the chart it seems that the E-Note management team did learn their lesson: a much bigger number of defects has been found during the development phase compared to all previous releases.
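
The per-phase counts behind such a chart can be derived from the “found” attribute of each defect. The Python sketch below shows one way to do it; the phase labels, the function names and the sample numbers are all invented for illustration and are not E-Note data from the chart.

```python
from collections import Counter

# Hypothetical phase labels, in life cycle order.
PHASES = ("development", "testing", "field_trial", "customer")


def phase_counts(found_phases):
    """Return {phase: number of defects found in that phase} for one release."""
    counts = Counter(found_phases)
    return {phase: counts[phase] for phase in PHASES}


def contained_share(found_phases):
    """Fraction of a release's defects that were found before customers saw them."""
    counts = phase_counts(found_phases)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts["customer"]) / total


# Invented sample: the "found" attribute of ten defects from one release.
release_sample = ["development"] * 6 + ["testing"] * 3 + ["customer"] * 1

counts = phase_counts(release_sample)
share = contained_share(release_sample)
```

For this sample, 9 of the 10 defects were caught before shipment, a share of 0.9. Comparing that share across releases gives a one-number summary of what the stacked chart shows.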

8. Conclusion

There are many more metrics and measurements than those listed above. Maybe a hundred times more. Every software quality metrics expert will be able to name at least 10. My goal was to explain simple and efficient ways to measure software quality. Simple things tend to work better and be more reliable. They can also save development a lot of money, at times producing much better results than sophisticated ones.

Why?

The cost of Business Intelligence is not defined only by the dollars paid to consultants or for BI tool licenses. It is also the time spent educating managers, engineers and executives to recognize, interpret and act on metrics according to their purpose. It also includes the price of mistakes and misinterpretations that affect business decisions, not to mention the everyday effort of working with metrics. Complex metrics tend to create more confusion.

At the end of the day, the purpose of Business Intelligence and metrics is to let executives, managers and engineers spend more time on their actual work, instead of getting stuck on metrics dashboards, trying to understand what the hell all these charts and numbers mean.
