Industry Software Cost, Quality and Productivity Benchmarks
By J. Reifer, Reifer Consultants, Inc.
Abstract: This article provides software cost, quality and productivity benchmarks in twelve application-oriented domains that readers can use to determine how well their organizations are performing relative to industry averages. In addition to answering common questions about these benchmarks, the article summarizes the returns on investment that firms are realizing as they harness new technology for a variety of reasons.
Introduction
For years, I have heard my friends complain about the lack of software cost and productivity benchmarks in the open literature. They wanted to use the benchmarks to see how they stacked up against others in their industry. In March 2002, I tried to do something about this lack of data by publishing an article that put cost and productivity numbers I had been compiling for more than two decades into the public domain, along with a call for others to join me in publishing numbers [1]. Unfortunately, nobody has answered the call. As a consequence, most cost and productivity numbers remain confidential.
This paper tries to rectify the current state of affairs by providing readers with a revision of my original paper. I have updated my cost and productivity numbers and added quality benchmarks. The new data is important because I believe that cost and productivity should be measured from a quality point of view. It seems counterproductive to me to increase productivity by cutting quality, yet some firms will do anything to improve their numbers.
Needless to say, numbers are important. Besides giving you insight into the costs and benefits of a given alternative, numbers can be used to establish industry benchmarks that firms can use to compare their performance with that of others. When push comes to shove, such numbers are what really matter when you are seeking management approval for funds. “If there isn’t some benefit, why spend money?” management asks. Questions like the following permeate the atmosphere whenever management is asked to spend money on software improvements:
· What are the costs and what are the benefits?
· What are the returns on this investment?
· Why should I spend money on this investment rather than on alternatives?
· What would happen if I did nothing (actually my boss’s favorite question)?
Getting management to spend money isn’t easy. These same managers will use rules of thumb that they have developed over the years to determine if the numbers you are pitching are reasonable. For example, they might surprise you by asking the following question if your numbers don’t make sense: “How can you suggest that we will more than triple our productivity if your proposal is accepted?” You need to be prepared to respond to such a question. Having benchmark data that you can use to answer their question could make or break your proposal.
While helpful, such benchmarks are often dangerous to use. This is primarily because people take numbers out of context to justify what they are after. Management often sees right through such tactics and turns down the request. To pass the “numbers must make sense” test, the numbers must be thoroughly defined and used properly.
Looking at Software Productivity Numbers
Software productivity refers to the ability of an organization to generate outputs (software systems, documentation, etc.) using the inputs or resources it has at its disposal (people, money, equipment, tools, etc.). Using this definition, an organization can increase its productivity by either focusing on reducing the input or increasing the output.
To improve productivity using my definition, organizations could focus on either the input or output strategy. For an input-based productivity improvement strategy, they would focus on increasing workforce productivity through efficiencies gained by inserting better methods, tools, processes, facilities and equipment into the production mix. In contrast, an output-based productivity improvement strategy would place emphasis on increasing the amount of output generated under equivalent circumstances by using components, architecture-centric reuse and product line tactics. Both strategies strive to produce more output using less input.
Within many industries, productivity is commonly expressed as either equivalent source lines of code (ESLOC) per staff month (SM) or function points (FP) per SM. While other measures may be used, these two predominate, especially for medium- to large-scale projects. Of course, ESLOC, FP and SM must be carefully scoped and consistently defined for these metrics to convey a common meaning (see the notes under the tables for applicable definitions). In addition, the many factors that cause these measures to vary must be identified. These factors, called cost drivers, must be normalized when defining the terms. Because my firm’s databases are primarily ESLOC-based, I use this metric as the basis for my analysis. For those interested in how we handle other measures, we backfire FP data to convert them to ESLOC using the language conversion factors supplied by the International Function Point Users Group (e.g., one FP is expressed as so many lines of C or Java using such backfiring tables).
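As a rough illustration of the ESLOC/SM metric and of backfiring, the sketch below converts a function point count into SLOC with a per-language factor and divides by effort in staff months. The factor values are hypothetical placeholders chosen for round arithmetic, not the IFPUG-published figures; substitute the current backfiring table for your language. The hours-to-staff-month conversion assumes the 152 hours per staff month reported in the notes for Table 1, and all function names are illustrative.

```python
# Minimal sketch: backfire function points to SLOC and compute ESLOC/SM.
# The conversion factors below are HYPOTHETICAL placeholders, not IFPUG values.

HOURS_PER_STAFF_MONTH = 152  # average reported in the notes for Table 1

# Hypothetical SLOC-per-function-point factors; replace with published values.
BACKFIRE_SLOC_PER_FP = {
    "C": 100.0,
    "Java": 50.0,
}


def backfire(function_points: float, language: str) -> float:
    """Convert a function point count to SLOC for one language."""
    try:
        factor = BACKFIRE_SLOC_PER_FP[language]
    except KeyError:
        raise ValueError(f"no backfiring factor for {language!r}") from None
    return function_points * factor


def staff_months(labor_hours: float) -> float:
    """Convert chargeable labor hours into staff months."""
    return labor_hours / HOURS_PER_STAFF_MONTH


def productivity(esloc: float, effort_sm: float) -> float:
    """Productivity expressed as ESLOC per staff month."""
    if effort_sm <= 0:
        raise ValueError("effort must be positive")
    return esloc / effort_sm


if __name__ == "__main__":
    size = backfire(500, "Java")   # 500 FP -> 25,000 SLOC with the placeholder factor
    effort = staff_months(15_200)  # 15,200 hours -> 100 SM
    print(f"{productivity(size, effort):.0f} ESLOC/SM")  # prints "250 ESLOC/SM"
```

Keeping the factor table separate from the arithmetic makes it easy to swap in whichever published backfiring table your organization has standardized on.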
Table 1 summarizes the results of our analysis of twelve application domains, among those for which my firm has collected data, that seem to be of interest to readers. The numbers in Table 1 were derived by taking a 600-project subset of our 2,000+ project experience database and performing statistical analysis on it. The database used for Table 1 contains no foreign projects that could distort the conclusions. The average productivity in our foreign databases for Europe (Belgium, England, Finland, France, Germany, Ireland, Italy, Spain and Sweden) and Asia (India, Korea, Japan and Singapore) is summarized in Table 2. That table shows how the average productivity for these regions compares with that of the United States and tries to explain why they differ. Australia, Canada and South America are not included in our analysis because the number of completed projects from them in our databases is too small to be statistically significant. The box under each table provides notes about its contents. Data were validated using standard statistical methods. When anomalies in the data were observed, site visits were made to address concerns. An acronym list is also provided below to define terms used in the tables.
Acronym List for Tables

| Acronym | Definition |
|---|---|
| IOC | Initial Operational Capability |
| IPT | Integrated Product Team |
| IRR | Internal Requirements Review |
| KESLOC | Thousand Equivalent Source Lines of Code |
| LCA | Life Cycle Architecture (review) |
| LCO | Life Cycle Objectives (review) |
| MBASE | Model-Based (System) Architecting and Software Engineering |
| PDR | Preliminary Design Review |
| PRR | Product Readiness Review |
| SAR | Software Acceptance Review |
| SDR | System Design Review |
| SETD | Systems Engineering & Technical Direction |
| SM | Staff Months |
| SRR | Software Requirements Review |
| STR | Software Test Review |
| UTC | Unit Test Review |

| Application Domain | Number of Projects | Size Range (KESLOC) | Avg. Productivity (ESLOC/SM) | Productivity Range (ESLOC/SM) | Example Application |
|---|---|---|---|---|---|
| Automation | 55 | 25 to 650 | 245 | 120 to 445 | Factory automation |
| Banking | 30 | 55 to 1,000 | 270 | 155 to 550 | Loan processing, ATM |
| Command & Control | 45 | 35 to 4,500 | 225 | 95 to 350 | Command centers |
| Data Processing | 35 | 20 to 780 | 330 | 165 to 500 | DB-intensive systems |
| Environment/Tools | 75 | 15 to 1,200 | 260 | 143 to 630 | CASE, compilers, etc. |
| Military (all) | 125 | 15 to 2,125 | 145 | 45 to 300 | See subcategories |
| Military: Airborne | 40 | 20 to 1,350 | 105 | 65 to 250 | Embedded sensors |
| Military: Ground | 52 | 25 to 2,125 | 195 | 80 to 300 | Combat center |
| Military: Missile | 15 | 22 to 125 | 85 | 52 to 175 | GNC system |
| Military: Space | 18 | 15 to 465 | 90 | 45 to 175 | Attitude control system |
| Scientific | 35 | 28 to 790 | 195 | 130 to 360 | Seismic processing |
| Telecommunications | 50 | 15 to 1,800 | 250 | 175 to 440 | Digital switches |
| Test | 35 | 20 to 800 | 210 | 100 to 440 | Test equipment, devices |
| Trainers/Simulations | 25 | 200 to 900 | 225 | 143 to 780 | Virtual reality simulator |
| Web Business | 65 | 10 to 270 | 275 | 190 to 985 | Client/server sites |
| Other | 25 | 5 to 1,000 | 182 | 65 to 481 | All others |
| Totals | 600 | 10 to 4,500 | | 45 to 985 | |

Table 1: Software Productivity (ESLOC/SM) by Application Domains
Notes for Table 1
· The 600 projects are the most recent projects taken from my database of more than 1,800 projects. These projects were completed within the last seven years by 40 organizations (each organization is kept anonymous due to the confidentiality of the data). A project is defined as the delivery of software to system integration. Projects include builds and products that are delivered externally, not internally. Both delivery of a product to market and a build to integration fit this definition.
· The scope of all projects starts with software requirements analysis and finishes with completion of software testing.
- For military systems, the scope extends from software requirements review until handoff to the system test bed for hardware and software integration and testing.
- For banking and ADP systems, the scope extends from approval of project startup until sell-off.
- For web systems, the scope extends from product conception to customer sell-off.
· Projects employ a variety of methods and practices ranging from simple to sophisticated.
· The numbers include all chargeable engineering, management and support labor.
- They include programming, task management and support personnel who normally charge to the project.
- It does not include quality assurance, system or operational test, and beta test personnel.
· The average number of hours per staff month was 152 (this figure accounts for holidays, vacation, etc.).
· SLOC is defined by Florac and Carleton [2] as logical source lines of code, using the conventions published by the Software Engineering Institute in 1993. ESLOC is defined by Boehm to take reworked and reused code into account [3]. All SLOC counts adhere to the SEI counting conventions.
· Function point sizes are defined using current International Function Point Users Group (IFPUG) standards, http://www.ifpug.org/.
· Function point sizes were converted to SLOC using backfiring factors published by IFPUG in 2000, as available on their web site.
· Projects used many different life cycle models and methodologies. For example, web projects typically followed a Rapid Application Development process and used lightweight methods, while military projects used more classical processes and methods. Commercial projects predominantly used object-oriented methodology, while military projects used a broader mix of conventional and object-oriented approaches.
· Projects used a variety of languages. For example, web projects employed Java, Perl and Visual C, while military projects predominantly used C/C++.
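To compare one of your own projects with Table 1, its size and effort first have to be normalized roughly the way these notes describe. The sketch below assumes the basic COCOMO adaptation adjustment factor, AAF = 0.4·DM + 0.3·CM + 0.3·IM (percent of design modified, code modified, and integration redone), as a stand-in for Boehm’s ESLOC definition; the article does not say which COCOMO variant its data use, and COCOMO II adds assessment and software-understanding terms that are omitted here. The benchmark rows are copied from Table 1; the function names and the sample project are hypothetical.

```python
# Minimal sketch: normalize a project's size to ESLOC and compare it with
# Table 1. The AAF formula is the basic COCOMO adaptation adjustment and is
# an ASSUMPTION, not the article's stated counting method.
from dataclasses import dataclass

HOURS_PER_STAFF_MONTH = 152  # from the notes for Table 1


@dataclass(frozen=True)
class Benchmark:
    average: float  # average productivity, ESLOC/SM
    low: float      # bottom of the observed range, ESLOC/SM
    high: float     # top of the observed range, ESLOC/SM


# A subset of rows from Table 1 (average and range, ESLOC/SM).
TABLE_1 = {
    "Banking": Benchmark(270, 155, 550),
    "Command & Control": Benchmark(225, 95, 350),
    "Military: Airborne": Benchmark(105, 65, 250),
    "Telecommunications": Benchmark(250, 175, 440),
    "Web Business": Benchmark(275, 190, 985),
}


def equivalent_sloc(new_sloc: float, adapted_sloc: float,
                    dm: float, cm: float, im: float) -> float:
    """New code counts fully; adapted code is scaled by AAF/100.

    dm, cm and im are percentages (0-100) of design modified, code
    modified, and integration and test redone for the adapted code.
    """
    aaf = 0.4 * dm + 0.3 * cm + 0.3 * im
    return new_sloc + adapted_sloc * aaf / 100.0


def compare(domain: str, esloc: float, labor_hours: float) -> str:
    """Describe where a project's productivity falls relative to Table 1."""
    bench = TABLE_1[domain]
    prod = esloc / (labor_hours / HOURS_PER_STAFF_MONTH)
    if prod < bench.low:
        position = "below the observed range"
    elif prod > bench.high:
        position = "above the observed range"
    elif prod >= bench.average:
        position = "at or above the domain average"
    else:
        position = "below the domain average"
    return f"{prod:.0f} ESLOC/SM: {position} ({bench.average:.0f} avg)"


if __name__ == "__main__":
    # Hypothetical project: 40,000 new SLOC plus 20,000 lightly modified reused SLOC.
    size = equivalent_sloc(40_000, 20_000, dm=10, cm=20, im=30)
    print(compare("Banking", size, labor_hours=30_400))
```

A result outside the observed range is a prompt to recheck scope, counting rules and labor accounting against these notes before drawing conclusions, since mismatched definitions are the usual reason numbers fail the “numbers must make sense” test.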