Though everyone has a favorite theory as to why software failures occur, my work at the MITRE Corporation and Booz-Allen & Hamilton has taught me that more projects are doomed by poor cost and schedule estimates than by technical, political or team problems. Capers Jones's extensive research, found in his book Estimating Software Costs (McGraw-Hill, 1998), makes a similar claim. It's no surprise, therefore, that so few companies and individuals understand that software estimating can be a science, not just an art. It is possible to accurately and consistently predict development life cycle costs and schedules for a wide range of projects. In a series of four articles, I will provide a step-by-step tutorial in estimating the cost and schedule for your projects. You will be able to implement the concepts in these articles using nothing more complicated than a spreadsheet.
I'll start by covering the various methods of estimating the size, or volume, of a program. The two traditional measures for this are lines of code and function points, although there are others. At the end of this article, I'll show you how to prepare a preliminary, unadjusted estimate using this information.
In next month's installment, entitled "Project Cost Adjustments," I'll explain how to adjust project costs for variations in the project environment. At the end of the article, you will be able to create an accurate estimate of the time and cost required to develop a new application.
Part three, "Dealing with Reuse," explains how to quantify the impact of software reuse and commercial components or libraries on your estimate.
Finally, part four, "Creating the Project Plan," describes how to use your newfound insight into project cost and schedule to create a complete project plan.
The Estimating Life Cycle
Before discussing specific size measures, I must point out the limitations of software cost estimating at the macro level. As shown in Figure 1, the typical accuracy of cost estimates varies based on the software development stage. Early uncertainty is largely based on variances in the input parameters to the estimate. Later uncertainty in the estimate is based on the variances to the estimating models.
Figure 1. Cost Estimate Accuracy by Development Stage
Early uncertainty in cost estimates is due to variances in the input parameters to the estimate. Later uncertainty can be traced back to variances in the estimating models. At the concept stage, requirements may be hazy, but the general purpose of the new software should be clear. At this point, estimates using informal techniques such as historical comparisons or group consensus should have an accuracy of plus or minus 50 percent. By the time the detailed design is complete, an implementation-oriented estimate will be accurate within plus or minus 10 percent.
Initially, at the concept stage, you may be presented with a vague definition of the project. Though the requirements may not yet be fully understood, the general purpose of the new software can be recognized. At this point, estimates with an accuracy of plus or minus 50 percent are typical for an experienced estimator using informal techniques such as historical comparisons or group consensus.
The key to accuracy lies in making periodic reestimates throughout the project life cycle, thereby identifying problems early enough to take corrective action.
Estimating Program Volume
The first step in preparing an estimate is to characterize the project volume. One measure is the number of source lines of code, or SLOC. A SLOC is a human-written line of code that is not a blank line or comment. Do not count the same line more than once, even if the code appears several times in an application. We typically work with a related number, thousands of SLOC, or KSLOC, when estimating. SLOC as an estimating metric was popularized by Barry Boehm's Constructive Cost Model, or COCOMO, found in his book Software Engineering Economics (Prentice Hall, 1981). The basic COCOMO model and the new COCOMO II model remain the most common estimating approaches.
I'll discuss approaches to estimating KSLOC in more detail, but first, how do you convert from the number of KSLOC to an estimate for the project?
Let's begin with the simplest estimate. If you know the number of KSLOC your developers must write, and you know the effort required per KSLOC, then you can multiply these two numbers together to arrive at the person-months of effort required for your project. This concept is at the heart of all of the estimating models. Table 1 shows some common values that researchers have found for this linear productivity factor. Note that although language affects productivity in terms of functionality per hour, effort per line of code is language-independent. The values in the table are derived from work by Barry Boehm (COCOMO), Raymond Kyle and the U.S. Air Force Cost Analysis Agency's revised COCOMO (REVIC), and firms or organizations working directly with the Cost Xpert Group.
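The SLOC definition above can be sketched as a small counter. This is a minimal illustration, not a standard counting tool: the function name count_sloc and the comment_prefix parameter are hypothetical, it recognizes only whole-line comments that start with a single prefix, and it makes no attempt to detect code that is repeated elsewhere in an application, so the "count each line once" rule still has to be applied by hand.

```python
def count_sloc(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments.

    Assumption: comments are single-line and start with comment_prefix.
    Block comments and duplicated code are not handled.
    """
    count = 0
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank line
        if line.startswith(comment_prefix):
            continue  # whole-line comment
        count += 1
    return count


sample = """\
# Compute a total.
total = 0

for n in (1, 2, 3):
    total += n  # a trailing comment does not stop the line from counting
"""
print(count_sloc(sample))  # 3
```

Dividing the result by 1,000 gives the KSLOC figure used in the rest of the article.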
Table 1. Common Values for the Linear Productivity Factor
If you know how many thousands of lines of code (KSLOC) your developers must write and you know the effort required per KSLOC, you can multiply these two numbers together to arrive at the person-months of effort required for your project. This concept is at the heart of all of the estimating models.
OK, let's apply this approach. Suppose we were going to build an e-commerce system consisting of 15,000 lines of code. How many person-months of effort would this take using just this equation?
The answer is computed as follows:
$$\text{Effort} = \text{Productivity} \times \text{KSLOC} = 3.60 \times 15 = 54 \text{ person-months}$$
If all of your projects are small, then you can use this basic equation. Researchers have found, however, that productivity does vary with project size. In fact, large projects are significantly less productive than small projects, probably because they require increased coordination and communication time, plus more rework due to misunderstandings.
This productivity decrease with increasing project size is factored in by raising the number of KSLOC to a power greater than 1.0. This exponent then penalizes large projects for decreased efficiency. Table 2 shows some typical size penalty factors for various project types.
Table 2. Typical Size Penalty Factors for Various Project Types
Productivity does vary with project size. In fact, large projects are significantly less productive than small projects, probably because they require increased coordination and communication time, plus more rework due to misunderstandings. The exponential factors above penalize large projects for decreased efficiency.
So, after we do a size penalty adjustment, how many person-months of effort would our 15,000-line e-commerce system require? The answer is computed as follows:
$$\text{Effort} = \text{Productivity} \times \text{KSLOC}^{\text{Penalty}} = 3.60 \times 15^{1.030} = 3.60 \times 16.27 = 58.6 \text{ person-months}$$
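The two calculations above can be reproduced with a few lines of code. The sketch below is only an illustration of the arithmetic: the function name effort_person_months is hypothetical, and the values 3.60 and 1.030 are the e-commerce figures used in the worked example, not a general recommendation.

```python
def effort_person_months(ksloc: float, productivity: float,
                         penalty: float = 1.0) -> float:
    """Return effort as productivity * ksloc ** penalty.

    With the default penalty of 1.0 this is the simple linear estimate;
    a penalty above 1.0 applies the size adjustment.
    """
    if ksloc < 0:
        raise ValueError("ksloc must be non-negative")
    return productivity * ksloc ** penalty


# Values from the worked example: 15 KSLOC, e-commerce project.
linear = effort_person_months(15, 3.60)
adjusted = effort_person_months(15, 3.60, penalty=1.030)

print(f"{linear:.1f}")    # 54.0
print(f"{adjusted:.1f}")  # 58.6
```

The same formula works in a spreadsheet cell; the code form simply makes it easy to try several sizes and see how quickly the penalty grows.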
All of this is pretty straightforward. The next logical question is, "How do I know my project will end up as 15,000 SLOC?"
There are two main approaches to answering this question: direct estimation and function points with backfiring. Using either approach, the fundamental input variables are determined through expert opinion, often with your developers as the experts. The Delphi technique, described in Karl Wiegers's article, "Stop Promising Miracles" (Feb. 2000), is a good way to cross-check the input variables.
Normally, the first step in estimating the number of lines of code is to break the project down into modules or some other logical grouping. For example, a very high level breakdown might be front-end processes, middle-tier processes and database code. Your developers then use their experience building similar systems to estimate the number of lines of code required.
We strongly recommend that you obtain three estimates for each input variable: a best case estimate, a worst case estimate and an expected case estimate. With these three inputs, you can then calculate the mean and standard deviation as
The standard deviation is a measure of how much deviation can be expected in the final number. For example, an estimate equal to the mean plus three times the standard deviation gives you roughly a 99 percent probability that your project will come in under your estimate.
For more information, refer to Barry Boehm's Software Engineering and Project Management (IEEE Press, 1987).
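The formulas for the mean and standard deviation are not reproduced in the text above. One common choice for three-point estimates, assumed here rather than taken from this article, is the PERT approximation: the mean is (best + 4 × expected + worst) / 6 and the standard deviation is (worst - best) / 6. The sketch below applies that assumption; the class name and the sample values of 12, 15 and 21 KSLOC are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePointEstimate:
    """Best, expected and worst case values for one input variable."""
    best: float
    expected: float
    worst: float

    @property
    def mean(self) -> float:
        # PERT weighting (an assumption): the expected case counts four times.
        return (self.best + 4 * self.expected + self.worst) / 6

    @property
    def std_dev(self) -> float:
        # PERT approximation (an assumption): one sixth of the range.
        return (self.worst - self.best) / 6


# Hypothetical size estimate in KSLOC.
size = ThreePointEstimate(best=12, expected=15, worst=21)

print(size.mean)                     # 15.5
print(size.std_dev)                  # 1.5
print(size.mean + 3 * size.std_dev)  # 20.0
```

The last figure is the mean plus three standard deviations, the conservative size you would plan against if you wanted very little chance of an overrun.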
Estimating Function Points
An alternative to direct SLOC estimating is to start with function points, then use a process called backfiring to convert them to SLOC. Function points were first utilized by IBM Corp. as a measure of program volume. The idea is simple: The program's delivered functionality (and hence, cost) is measured by the number of ways it must interact with the users.
To determine the number of function points, start by estimating the number of external inputs, external interface files, external outputs, external queries and logical internal tables.
External inputs are largely your data entry screens. If a screen contains a tabbed notebook or similar metaphor, each tab counts as a separate external input. External interface files are file-based inputs or outputs. Each record format within the file, or, in the case of XML, each data object type, counts as a separate interface file even if it resides in the same physical file. External outputs are your reports. External queries are message or external function-based communication into or out of your application. Finally, logical internal tables are the number of tables in the database, assuming the database is in third normal form or better.
To convert from these raw values into an actual count of function points, you multiply the raw numbers by a conversion factor from Table 3.
Table 3. Factors for Converting Raw Values to Function Points
To determine the number of function points, start by estimating the number of external inputs, external interface files, external outputs, external queries and logical internal tables. To convert from these raw values into an actual count of function points, you multiply the raw numbers by the conversion factors above.
So, if we had a system consisting of 25 data entry screens, 5 interface files, 15 reports, 10 external queries and 20 logical internal tables, how many function points would we have?
The answer is computed as follows:
$$(25 \times 4) + (5 \times 7) + (15 \times 5) + (10 \times 4) + (20 \times 10) = 450 \text{ function points}$$
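The same count can be expressed as a weighted sum. In the sketch below, the conversion factors are the ones implied by the worked example (4 for external inputs, 7 for external interface files, 5 for external outputs, 4 for external queries and 10 for logical internal tables); the dictionary keys and the function name are hypothetical.

```python
# Conversion factors as implied by the worked example.
CONVERSION_FACTORS = {
    "external_inputs": 4,
    "external_interface_files": 7,
    "external_outputs": 5,
    "external_queries": 4,
    "logical_internal_tables": 10,
}


def function_points(raw_counts: dict[str, int]) -> int:
    """Multiply each raw count by its conversion factor and sum the results."""
    unknown = set(raw_counts) - set(CONVERSION_FACTORS)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    return sum(CONVERSION_FACTORS[name] * count
               for name, count in raw_counts.items())


example_system = {
    "external_inputs": 25,           # data entry screens
    "external_interface_files": 5,
    "external_outputs": 15,          # reports
    "external_queries": 10,
    "logical_internal_tables": 20,
}
print(function_points(example_system))  # 450
```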
The only remaining step is to use backfiring to convert from function points to an equivalent number of SLOC. This can be done using a table of language equivalencies. Capers Jones was a pioneer in this area, and his work still makes up approximately 70 percent of the published language efficiency values. Many of the values are published in his book Estimating Software Costs. See Table 4 for some common values.
Table 4. Lines of Code Per Function Point by Programming Language
A table of language equivalencies lists a standard number of source lines of code (SLOC) per function point in a given programming language.
So, to implement the above project (450 function points) using Java 2 would require approximately the following number of SLOC:
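$$450 \times 46 = 20{,}700 \text{ SLOC}$$
Assuming that this is an e-commerce system, implementing it would require the following effort:
$$\text{Effort} = \text{Productivity} \times \text{KSLOC}^{\text{Penalty}} = 3.60 \times 20.7^{1.030} = 3.60 \times 22.67 = 81.61 \text{ person-months}$$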
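The backfiring step and the effort calculation chain together naturally. The sketch below uses only the figures from the worked example (450 function points, 46 SLOC per function point for Java 2, productivity 3.60 and penalty 1.030 for an e-commerce system); the function names are hypothetical.

```python
def backfire_ksloc(function_points: float, sloc_per_function_point: float) -> float:
    """Convert function points to thousands of source lines of code."""
    return function_points * sloc_per_function_point / 1000


def effort_person_months(ksloc: float, productivity: float, penalty: float) -> float:
    """Size-adjusted effort: productivity * ksloc ** penalty."""
    return productivity * ksloc ** penalty


# Figures from the worked example above.
ksloc = backfire_ksloc(450, 46)
effort = effort_person_months(ksloc, productivity=3.60, penalty=1.030)

print(f"{ksloc:.1f} KSLOC, {effort:.1f} person-months")
# 20.7 KSLOC, 81.6 person-months
```

Keeping the two steps as separate functions makes it easy to swap in a different language factor and see how the choice of language changes the estimate.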
As I discussed in my article "Estimating Internet Development" (E-development and Security, Aug. 2000), there are other approaches to calculating equivalent SLOC from a higher level input value. These include Internet points, Domino points and class-method points to name just a few. All of them work in a fashion analogous to function points.
In the next installment, I'll cover the concept of project cost adjustments for variations in the project environment.