Estimation Models for Software Maintenance Based on Functional Size
By Alain Abran, École de Technologie Supérieure - Université du Québec
Abstract
This paper illustrates the use of functional size measures in building estimation models for maintenance projects implementing small functional enhancements in existing software. More specifically, it reports on estimation models built with 19 maintenance projects on a single real-time embedded software application in the defence industry. Functional size measures were collected using the COSMIC-FFP functional size measurement method, and the maintenance projects were classified into two classes of project difficulty to identify subsets of projects with greater homogeneity in the relationship between project effort and functional size.
1. INTRODUCTION
Software maintenance costs usually exceed software development costs and Ferens [1] indicates that the attention given to these costs has not been commensurate with their importance: for instance, only a handful of estimation models have been proposed for software maintenance while a significant number have been proposed for development projects. By the early 1980s few attempts had been made at software maintenance estimation. There was the COCOMO-M(aintenance) model, with a single additional maintenance-unique input, annual change traffic[2]. In this model, it is the cost of the whole maintenance life cycle that is estimated, over a period of time; with such a model, no attempt is made to estimate one maintenance project at a time. Ferens [1] reports that SLIM has a single maintenance-unique input, while PRICE-S, SEER-SEM and CHECKPOINT have multiple inputs. However, Ferens points out that the accuracy of such maintenance estimation models in their early analyses was low and that, in general, the performance of maintenance estimation models has yet to be demonstrated [1].
Estimation models are based on the generic concept of productivity defined as the ratio of output to input. In software projects, productivity is defined as the ratio of the software product developed to the resources required to produce it. While ‘effort’ is the generally accepted measure of the input (often measured in person-hours, -days or -months), software size is recognized as a key factor in the construction of models estimating project effort.
In addition to size, it is recognized that other factors affect the effort needed for maintenance projects, such as the type of application, the programming language, the age of the software and the quality of the documentation available. In the context reported here, the maintenance projects measured and analysed were carried out on the same software application, in the same organization; these specific factors were therefore held constant and did not need to be taken into account in building the estimation models for the application under maintenance. Still, many other factors can influence maintenance project effort in addition to size: for instance, whether or not a full system test is required, severe constraints on resource availability, functional complexity, technical complexity, a low or high level of reuse, etc. In the study reported here, these individual factors were not investigated independently, but were rather represented through a single factor referred to as ‘project difficulty’.
Generally, factors such as ‘project difficulty’ are represented as categorical variables, which are not intended to represent quantitative values. Such categories are used to identify distinct groups of data to be analyzed. The variables, by contrast, are used when categorical factors can affect the positive relationship of variables in regression models. Therefore, they can be considered as explanatory variables in estimation models.
This paper reports on the use of this new generation of functional size measure (COSMIC-FFP) in building productivity and estimation models for the effort involved in small projects implementing functional enhancements, in the context of the maintenance of real-time software in the defence industry. The information reported here is a summary of the analysis of the data set from the defence industry reported in full by Abran et al. in [3]. The same reference also includes examples of estimation models built for a set of 15 maintenance projects on a web-based software application in the linguistic domain. Nagano [4] also presents estimation models for a set of maintenance projects on a telecom switching system, all measured with COSMIC-FFP.
This paper is organized as follows: the data set is presented in section 2, the simple regression models with functional size only in section 3, and the multiple regression models in section 4. A discussion is presented in section 5. While measurement experts will be interested in the details of the equations and of the statistical tests, managers should focus on the approach used and the graphical interpretations, as well as on the impact for estimation purposes in software maintenance.
2. DATA SET
2.1 Context
This data set comes from an organization which designs, develops and implements systems for the defence industry. It is a subsidiary of an international organization, and the software unit of this organization develops and maintains real-time embedded software. This organization did not have a measurement program in place, nor had it developed its own estimation models (either for development or for maintenance projects). All data collected were verified by subject matter experts from the industrial organization.
2.2 Functional size with COSMIC-FFP
The COSMIC-FFP functional sizing method [5,6] was designed to work equally well for ‘data-rich’ business/MIS software and for ‘control-rich’ or ‘real-time’ software, that is, the software typically found in telecoms, avionics and process control, and in embedded and operating systems. The method also makes it possible to size such software in any layer or peer item of a multi-layer and/or multi-tier architecture. In the COSMIC-FFP method, size is measured in Cfsu (COSMIC functional size units); one Cfsu is equivalent to one data movement of one data group.
In a maintenance context, the functional size of a change to the Functional User Requirements within each piece of software is calculated by aggregating the sizes of the corresponding impacted data movements according to the following formula:
Size_Cfsu(Change) = ∑ size(added data movement_i) + ∑ size(changed data movement_i) + ∑ size(deleted data movement_i)
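The aggregation above can be sketched in a few lines of Python. This is only an illustration: the class names, the helper functions and the four example data movements are hypothetical and do not come from the measured projects. The sketch assumes, as stated in the text, that each data movement of one data group counts as 1 Cfsu.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    ADDED = "added"
    CHANGED = "changed"
    DELETED = "deleted"


@dataclass(frozen=True)
class DataMovement:
    name: str             # hypothetical label, for readability only
    change: ChangeType    # how the maintenance request affects this movement
    size_cfsu: int = 1    # one data movement of one data group = 1 Cfsu


def change_size_cfsu(movements):
    """Size of a change: sum over added, changed and deleted data movements."""
    return sum(m.size_cfsu for m in movements)


def size_by_change_type(movements):
    """Break the total down into the three terms of the formula."""
    totals = {change_type: 0 for change_type in ChangeType}
    for m in movements:
        totals[m.change] += m.size_cfsu
    return totals


# Hypothetical enhancement request (not taken from the data set).
example_change = [
    DataMovement("read new sensor status", ChangeType.ADDED),
    DataMovement("write status to log", ChangeType.ADDED),
    DataMovement("display alarm message", ChangeType.CHANGED),
    DataMovement("send legacy heartbeat", ChangeType.DELETED),
]

if __name__ == "__main__":
    print(size_by_change_type(example_change))
    print("Change size (Cfsu):", change_size_cfsu(example_change))
```

Note that deleted data movements add to the size of the change rather than reducing it, because they also represent maintenance work.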
The COSMIC-FFP method was designed to enable it to discriminate among the sizes of small functions – it uses a unitary size unit rather than the stepwise functions of the first generation measurement methods. Therefore, this COSMIC-FFP method opens up the possibility of building estimation models for real-time software based on functional size, not only for development projects, but also for small maintenance projects. COSMIC-FFP is supported by the Common Software Measurement International Consortium (COSMIC) formed in 1998 to design and bring to market a new generation of functional size measurement methods. Overall, close to 40 people from 8 countries participated in the design of this measurement method, and the Measurement Manual describing the method is available for free download in English, French, Japanese and Spanish [5]. It was adopted in 2003 as an international standard by ISO (ISO 19761) [6].
For this study, functional size was measured afterwards on completed maintenance projects.
2.3 Effort
The work effort (in person-hours) was obtained from the organization’s time reporting system: this corporate time reporting system allowed us to identify the effort expended on each specific functional maintenance project. However, for some of the projects the effort expended for the analysis phase of a project was not noted; therefore, only the effort expended, excluding analysis, for all projects was taken into account.
2.4 Project difficulty
The project difficulty category for a maintenance project was assigned by the staff who had carried out the maintenance projects. They did this on the basis of project documentation and their own experience; in industry, this is referred to as assignment of values by subject matter experts. At measurement time, four levels of project difficulty were identified on the following scale: 1, 3, 5 and 7, corresponding to no difficulty, difficult, highly difficult and extremely difficult. The use of these four initial difficulty levels in a sample of this size (21 observations) was problematic from a statistical viewpoint, as there were insufficient observations for certain difficulty values (only 1 or 2 observations), which means that some difficulty categories would not be representative enough when building estimation models. A simpler classification of the categorical variable is therefore desirable. To do this, the difficulty variable with four levels was reclassified into a two-level classification (low and high).
2.5 Data sample
This industrial data set included information on 21 maintenance projects implementing functional enhancements to the software components of a defence system. Figure 1 presents a graphical representation of this data set, which contained the functional size and effort expended for each of the 21 projects.

Figure 1: Data set including outliers (number of projects: N = 21)
When there are good reasons to believe that some outliers are not representative of the data set under study, they should be taken out of the sample being analyzed. Using the statistical tests described in Box 1, two such outliers were identified and dropped from further analysis.
A visual analysis of this data set of 19 projects without outliers (Figure 2) suggests that there is a positive correlation between an increase in functional size and the increase in effort, even though this correlation is not necessarily strong. Here again, it can be observed that there is some degree of heteroskedasticity in this data set (the data are wedge-shaped), which suggests that regression models with a single variable only would not necessarily be very good models. Such a distribution shape provides a clue that there is, for this organization, at least one other important variable which has significant impact on maintenance project effort.

Figure 2: Data graph excluding 2 outliers (N = 19)
3. SIMPLE REGRESSION MODELS WITH FUNCTIONAL SIZE ONLY
3.1 Simple linear regression model with functional size only
The linear regression model with the functional size as the independent variable gives, for the sample of 19 observations, the following linear equation (Figure 3): Effort = 0.61 x Cfsu + 91 (R2= 0.12; n=19)
The linear relationship is weak: the coefficient of determination R2 is only 0.12, which means that only 12% of the total variability of effort in the projects is explained by the variation in functional size, as measured in Cfsu.
See Box 2 for information about criteria for assessing the reliability of estimation models.

Figure 3: Linear regression (N = 19)
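For readers who want to see how a model of the form Effort = a x Cfsu + b and its R2 are obtained, the following is a minimal ordinary least squares sketch in plain Python. The function names and the small data sample are hypothetical; the project data are not reproduced here, so the fitted values below will not match the paper's coefficients. Only the constants 0.61 and 91 are taken from the equation above.

```python
def fit_simple_linear(sizes, efforts):
    """Ordinary least squares fit of effort = slope * size + intercept."""
    n = len(sizes)
    if n != len(efforts) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(sizes) / n
    mean_y = sum(efforts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, efforts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(sizes, efforts, slope, intercept):
    """Share of the variance of effort explained by the fitted line."""
    mean_y = sum(efforts) / len(efforts)
    ss_total = sum((y - mean_y) ** 2 for y in efforts)
    if ss_total == 0:
        raise ValueError("efforts must not all be identical")
    ss_residual = sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(sizes, efforts)
    )
    return 1.0 - ss_residual / ss_total


def predict_effort_paper_model(size_cfsu):
    """Effort in person-hours from the reported model (R2 = 0.12, n = 19)."""
    return 0.61 * size_cfsu + 91.0


# Hypothetical observations, for illustration only (not the paper's data).
hypothetical_sizes = [10, 25, 40, 60, 90]
hypothetical_efforts = [80.0, 150.0, 95.0, 210.0, 130.0]

if __name__ == "__main__":
    a, b = fit_simple_linear(hypothetical_sizes, hypothetical_efforts)
    r2 = r_squared(hypothetical_sizes, hypothetical_efforts, a, b)
    print(f"slope={a:.2f}, intercept={b:.2f}, R2={r2:.2f}")
```

With an R2 as low as 0.12, a point estimate from `predict_effort_paper_model` should be read as a rough central tendency rather than a usable project estimate.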
3.2 Nonlinear regression models with functional size
Other forms of regression model were also investigated, and the results are presented in Table 1, where R is the coefficient of correlation between the actual values of Y and the values derived from the equation. High values of R (maximum = 1.0) indicate a high correlation, and R2 is the percentage of the variance of the dependent variable which can be explained by the given equation. From Table 1, it can be observed that none of the nonlinear regression models represents a significant improvement over the linear model.

Table 1: Nonlinear Regression Models (N = 19)
4. MULTIPLE REGRESSION MODELS WITH TWO INDEPENDENT VARIABLES
Regression models with multiple independent variables (functional size and another variable) were investigated next to analyze how this additional variable would contribute to the relationship between size and effort. The introduction of a second independent variable such as the total number of Cfsu for a project, the total number of lines of code, the number of lines of code modified and the total number of programs modified, did not bring about significant improvements to the explanatory power of the regression models of the form y = ax + bz + c. For example, the model with two independent variables (functional size and number of programs modified) is:
Y = a x Cfsu + b x (no. of programs modified) + c
Y = 0.78 x Cfsu – 3.62 x (no. of programs modified) + 98
This multiple regression model with the number of programs modified has the same value for its R2, 0.12, which is not an improvement over the simple linear regression model.
4.1 Multiple regression models – additive form
The next regression model constructed takes into account the categorical factor of the low-high level of the difficulty factor:
Difficulty = 1 if the project is in the “high” level
Difficulty = 0 if the project is in the “low” level
Then, an additive model with the low-high difficulty factor gives each level the same importance in the relationship between size and work effort. It takes the following form: y = ax + bz + c
where if z = 0 → y = ax + c, or, if z = 1 → y = ax + (b + c)
For this sample of 19 projects, the general form of the model is:
Effort = 0.92 x Cfsu + 126.12 x Difficulty + 26, with R2 = 0.46, standard error = 82.7 and n = 19
When taking the low-high difficulty level into account, the following two models, as illustrated in Figure 4, are obtained:
If Difficulty = 0 → Effort = 0.92 Cfsu + 26
If Difficulty = 1 → Effort = 0.92 Cfsu + 152
This model, with the difficulty level variable, has a coefficient of determination R2 = 0.46, which is better than the simple linear regression model, but still not good enough. It can be observed in Figure 4 that both regression lines have the same slope (0.92), and are represented by parallel lines with different intercepts, that is, different effort values at Cfsu = 0. This is typical of additive models.

Figure 4: Additive model (N = 19)
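The additive model can be written as a single function with a 0/1 dummy variable. The sketch below uses the coefficients reported above; the function name and the 80 Cfsu example size are hypothetical and chosen only for illustration.

```python
def effort_additive(size_cfsu, high_difficulty):
    """Additive model (N = 19): the dummy variable shifts only the intercept."""
    difficulty = 1 if high_difficulty else 0
    return 0.92 * size_cfsu + 126.12 * difficulty + 26.0


if __name__ == "__main__":
    size = 80  # hypothetical enhancement size in Cfsu
    low = effort_additive(size, high_difficulty=False)
    high = effort_additive(size, high_difficulty=True)
    # The gap between the two classes is the same at every size,
    # which is why the two lines in the additive model are parallel.
    print(f"low: {low:.1f} h, high: {high:.1f} h, gap: {high - low:.2f} h")
```

Whatever the size, the estimated difference between a high-difficulty and a low-difficulty project stays at the difficulty coefficient (126.12 person-hours), which is the limitation the multiplicative form addresses.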
4.2 Multiple regression model: multiplicative form
With the additive model, the size effect is not impacted by the values of the low-high difficulty class variable, and hence the impact of the increase in both variables is determined by constant additive values; that is, the impact of the size variable is measured independently of the low-high difficulty variable. To improve this regression model, as recommended in [7], a new variable is added, that is, the interaction of difficulty and size, as represented by the multiplication of the two variables (Difficulty x Cfsu). The inclusion of the variable in the model makes it possible to recognize the multiplicative impact of these two variables on the positive relation between size and effort. Of course, it eliminates the parallelism of the two lines in the additive model. The general form of the multiplicative model is the following:
Y = α X + β Z + γ (X x Z) + μ, that is
Effort = α Cfsu + β Difficulty + γ (Cfsu x Difficulty) + μ
If Difficulty = 0 → Effort = α Cfsu + μ
If Difficulty = 1 → Effort = (α + γ) Cfsu + (μ + β)
Through the interaction term, with coefficient γ, the difficulty variable has an influence on the behavior of the size variable Cfsu, so that both the slope and the constant of the line change between its values of 0 and 1. The general multiple linear regression equation obtained is:
Effort = 0.64 Cfsu + 41.94 Difficulty + 3.85 (Difficulty x Cfsu ) + 41
with R2 = 0.75, standard error = 57.83 and n = 19
This multiplicative model has a coefficient of determination R2 = 0.75, which is a significant improvement over both the model with one variable and the additive model with two independent variables. Furthermore, the coefficient of the interaction term linking difficulty and size (Difficulty x Cfsu) is statistically significant, that is, it has a p value < 0.05. The specific equations for each difficulty level are as follows:
If Difficulty = 0 → Effort = 0.64 Cfsu + 41 with R2 = 0.47 and n = 8
If Difficulty = 1 → Effort = 4.49 Cfsu + 82.94 with R2 = 0.78 and n = 11
These equations are presented in Figure 5 and clearly illustrate the fact that the effort level of projects depends both on their functional size and on their difficulty level as significant variables to be taken into account during estimation.

Figure 5: Multiplicative Model (N = 19)
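The way the interaction term changes both slope and constant can be sketched as follows. The class and method names are hypothetical; the coefficients are those of the N = 19 multiplicative model reported above.

```python
from typing import NamedTuple


class InteractionModel(NamedTuple):
    alpha: float  # coefficient of size (Cfsu)
    beta: float   # coefficient of the difficulty dummy
    gamma: float  # coefficient of the interaction (Cfsu x Difficulty)
    mu: float     # constant

    def predict(self, size_cfsu, difficulty):
        """Effort from the general multiplicative form; difficulty is 0 or 1."""
        if difficulty not in (0, 1):
            raise ValueError("difficulty must be 0 (low) or 1 (high)")
        return (
            self.alpha * size_cfsu
            + self.beta * difficulty
            + self.gamma * size_cfsu * difficulty
            + self.mu
        )

    def class_line(self, difficulty):
        """Slope and intercept of the line for one difficulty class."""
        if difficulty not in (0, 1):
            raise ValueError("difficulty must be 0 (low) or 1 (high)")
        slope = self.alpha + self.gamma * difficulty
        intercept = self.mu + self.beta * difficulty
        return slope, intercept


# Coefficients of the multiplicative model fitted on the 19 projects.
MODEL_N19 = InteractionModel(alpha=0.64, beta=41.94, gamma=3.85, mu=41.0)

if __name__ == "__main__":
    for level in (0, 1):
        slope, intercept = MODEL_N19.class_line(level)
        print(f"Difficulty = {level}: Effort = {slope:.2f} Cfsu + {intercept:.2f}")
```

Substituting the difficulty value into the general form reproduces the two class-specific equations, which confirms that the slope for high-difficulty projects is α + γ and the constant is μ + β.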
In addition, the graphical analysis in Figures 4 and 5 shows that the largest project in terms of size (for project no. 1, size = 216 Cfsu), is classified in the low difficulty category, and has a much lower level of effort than projects of smaller size, which is not consistent with the balance of the data set. The behavior of project no. 1 could be considered to be significantly different from the others in terms of some unspecified factor. It could therefore be excluded from the sample and reserved for further analysis to verify its impact on the multiplicative model. On this reduced sample of 18 projects, we apply the general multiplicative model given by (Y = α X + β Z + γ ( X x Z ) + μ), which gives:
Effort = 1.25 Cfsu + 56 Difficulty + 3.24 (Difficulty x Cfsu ) + 27 with R2 = 0.84 and n = 18
The specific models for each difficulty class are then as follows:
If Difficulty = 0 → Effort = 1.25 Cfsu + 27 with R2 = 0.87 and n = 8
If Difficulty = 1 → Effort = 4.49 Cfsu + 83 with R2 = 0.78 and n = 10
With a coefficient of determination R2 = 0.84, this model is better than the previous one, and the Cfsu variable is statistically significant with a p value < 0.05, as is the multiplicative term of both variables (Figure 6). Box 3 presents a comparison of the results of the multiplicative models for both sample N=19 and sample N=18.

Figure 6: Multiplicative Model (N = 18)
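As a check on the reduced-sample model, the class-specific lines can be recovered by substituting Difficulty = 0 and Difficulty = 1 into the general equation. The size of 50 Cfsu used in the last two lines is an arbitrary illustration and is not a project from the data set.

```latex
\begin{align*}
\text{Effort} &= 1.25\,\text{Cfsu} + 56\,\text{Difficulty}
               + 3.24\,(\text{Difficulty} \times \text{Cfsu}) + 27 \\[4pt]
\text{Difficulty} = 0:\quad \text{Effort} &= 1.25\,\text{Cfsu} + 27 \\
\text{Difficulty} = 1:\quad \text{Effort} &= (1.25 + 3.24)\,\text{Cfsu} + (27 + 56)
                                           = 4.49\,\text{Cfsu} + 83 \\[4pt]
\text{Cfsu} = 50,\ \text{Difficulty} = 0:\quad \text{Effort} &= 1.25 \times 50 + 27 = 89.5 \\
\text{Cfsu} = 50,\ \text{Difficulty} = 1:\quad \text{Effort} &= 4.49 \times 50 + 83 = 307.5
\end{align*}
```

At this illustrative size the high-difficulty estimate is more than three times the low-difficulty one, and the gap widens as size grows, which is the behavior the non-parallel lines in the multiplicative model express.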
5. DISCUSSION
Software size is recognized as a key factor in the construction of models to estimate project effort. Software size can be measured from either a technical perspective, with Lines of Code for example, or from a functional perspective. While functional size measures have been available for software of the MIS type for the past twenty years, the industry has not considered them appropriate for real-time and embedded software. The COSMIC-FFP new generation of functional size measures has addressed this domain issue. In addition, the COSMIC-FFP method was designed with a much more granular scale, which enables it to discriminate among the various sizes of small functions by using a unitary size unit rather than the stepwise functions of the first generation measurement methods. The COSMIC-FFP method thus makes it possible to build estimation models for real-time software based on functional size, and to do this not only for development projects, but for small maintenance projects as well.

This data set included maintenance projects modifying a single software application. The functional size has been measured with the COSMIC-FFP standard.
When the data set is graphically represented (x = functional size, y = effort), there is an indication of a positive relationship between size and effort, but this relationship is not strong; the pattern has a wedge shape, which indicates that, for this organization, at least one other factor has a significant influence on project effort.
Estimation models built with either an average unit cost or a simple regression model provided interesting models for the organization, but they did not model the relationship between size and effort well for the individual projects. For this data set, even though there is a positive relationship between functional size and project effort, this relationship was not strong enough to derive good estimation models using this single independent variable, with either the average unit cost models or the linear and nonlinear forms of simple regression models: the data set is wedge-shaped when plotted on the two axes, illustrating that the single size variable is not enough for building a good estimation model.
Two forms of multiple regression models were then investigated: additive models and multiplicative multiple regression models. While regular multiple regression gives a single model, both the additive and the multiplicative models lead to two models, one for each class of the second independent variable. In the additive model, the two submodels are represented by two parallel lines, with the same slope and a different point of origin; in the multiplicative model, the two submodels do not need to be parallel lines. The multiplicative models graphically mapped the two classes of projects (low-high) much better with respect to the other independent variable, functional size, and they performed much better against the criteria for good models of the relationship between size and work effort.
These results are of particular interest in the development of estimation models for maintenance projects: the functional size of a maintenance enhancement can be measured very early on from the requirements themselves, and therefore this size can be obtained fairly accurately before any programming needs to be done (i.e. functional size does not need to be estimated, with an associated level of uncertainty, as input to the estimation model).
These various analyses have provided insights into maintenance work, which consists of small functional enhancements and their relationships with work effort. A process for deriving estimation models has also been illustrated, with both an objectively derived quantitative variable (functional size), obtainable early on in the project life cycle, and a categorical variable, difficulty. Once such models are derived, and their quality analyzed, then they can be quite useful in estimating subsequent maintenance projects involving adding or modifying functions to these software applications.
6. REFERENCES
[1] Ferens, D.V. (1999) “The Conundrum of Software Estimation Models”, IEEE AES Systems Magazine, March, 1999, 23-29.
[2] Boehm, B. W. (1981) “Software Engineering Economics”, Englewood Cliffs, NJ, Prentice Hall, 1981.
[3] Abran, A., Silva, I., Primera, L. (2002) “Field Studies Using Functional Size Measurement in Building Estimation Models for Software Maintenance”, in Journal of Software Maintenance and Evolution: Research and Practice, Vol 14, 2002, pp. 31-64.
[4] Nagano, Shin-ichi; Mase, Ken-ichi; Watanabe, Yasuo; Watahiki, Takaichi; Nishiyama, Shigeru (2001) ‘Validation of Application Results of COSMIC-FFP’, in Australian Software Conference on Measurement (ASCOM), Australia.
[5] Abran, A., Desharnais, J.M., Oligny, S., St-Pierre, D. and Symons, C. (2003) “Measurement Manual – The COSMIC Implementation Guide for ISO/IEC 19761: 2003, version 2.2”, Software Engineering Research Laboratory, École de technologie supérieure, Université du Québec, Montreal, Canada, 2003. Downloadable at http://www.gelog.etsmtl.ca/cosmic-ffp
[6] ISO/IEC 19761:2003, ‘Software engineering - COSMIC-FFP - A functional size measurement method’, International Organization for Standardization, Geneva.
[7] Neter, J., Wasserman, W., Kutner, M.H. Applied Linear Regression Models, Irwin Inc.: Homewood IL., 1983: 338-339.
[8] Conte S.D., Dunsmore H.E. and Shen V.Y. (1986) “Software Engineering Metrics and Models”, Menlo Park: The Benjamin/Cummings Publishing Company, Inc., 1986.
About the Author
Dr. Alain Abran is a Professor and the Director of the Software Engineering Research Laboratory at the École de Technologie Supérieure (ETS) - Université du Québec.
He is currently Co-executive editor of the Guide to the Software Engineering Body of Knowledge project. He is also actively involved in international software engineering standards and is Co-chair of the Common Software Measurement International Consortium (COSMIC).
Dr. Abran has more than 20 years of industry experience in information systems development and software engineering. The maintenance measurement program he developed and implemented at Montreal Trust, Canada, received one of the 1993 Best of the Best awards from the Quality Assurance Institute.
About Contact Information
Professor
Department of Software and IT Engineering
École de technologie supérieure – Université du Québec
1100, Notre-Dame Street West
Montréal, Québec
Canada H3C 1K3
Phone: +1 (514) 396-8632
Fax: +1 (514) 396-8684
e-mail: Alain.Abran@etsmtl.ca
URL: www.gelog.etsmtl.ca