COCOMO - appropriate for languages like Python?

James J. Besemer jb at cascade-sys.com
Thu Jul 11 00:20:25 EDT 2002


Mike Brenner wrote:

> > JB wrote:
> > From a schedule estimation standpoint, LOC (however you choose to count
> > them) appears to be a pretty good estimator to use for a fixed staff and
> > fixed project sizes. How the prediction varies as you change staff or
> > project sizes is something you'll have to measure or guess at yourself.
>
> IMO, the following measurements apply to most Python efforts more than lines of code:

The issues you raise are valid, though I think most can be restated in terms of a LOC productivity factor.

>         - Most software has achieved the status of "maintenance" rather than "development". Thus, millions of lines of code might require, say, a one-line change. Some of those one-line changes take a month of research and testing, while others take a few seconds. The lines of code changed (and the lines of code in the whole project) only correlate to new code, not to code under maintenance.

To my mind, this is simply saying that the average cost per new LOC is rather high in these circumstances and that the standard deviation is also high.  IIRC, COCOMO has a way to take this into account.

>         - Software Maintenance time primarily increases as backlog increases (e.g. programmer overtime which the company intends to not pay, other work awaiting action, inadequate functional testing, deferred regression testing which detects reappearance of prior bugs, incomplete impact analysis of past and present changes, incorrect documentation, and the age of parallel development paths). For example, parallel development paths that last more than a few days (fractured baselines) start to take on lives of their own, and become more expensive to merge into a single "golden" baseline as they age.

Increased backlog = increased project size.  Per COCOMO, this => increased cost per new LOC or (more likely) decreased LOC per unit cost.
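That diseconomy of scale is what the basic COCOMO effort equation captures: effort grows faster than linearly in size, so cost per LOC rises as the project gets bigger.  A minimal sketch, using the published COCOMO 81 "organic mode" coefficients (a = 2.4, b = 1.05) purely for illustration, not as a calibration for any real organization:

```python
# Basic COCOMO: effort (person-months) = a * KLOC ** b, with b > 1,
# so effort per KLOC increases with project size.
def cocomo_effort(kloc, a=2.4, b=1.05):
    """Estimated effort in person-months for a project of `kloc` KLOC."""
    return a * kloc ** b

for kloc in (10, 100, 1000):
    effort = cocomo_effort(kloc)
    print(f"{kloc:>5} KLOC -> {effort:9.1f} person-months "
          f"({effort / kloc:.2f} PM per KLOC)")
```

Even with the mild organic-mode exponent, the per-KLOC cost at 1000 KLOC is noticeably higher than at 10 KLOC; the "semidetached" and "embedded" modes use larger exponents and diverge faster.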

However, IIRC, COCOMO does not have a parameter to calibrate for this particular criterion.

Ha!  Reminds me of one place I worked.  The company lumped telephone tech support and software QA into a single group.  QA schedules were pretty much fixed, and when the phone rang it had to be answered, so this created a feedback loop in ongoing software quality: if quality started to dip, phone calls went up, leaving even less QA resource and causing quality to spiral further downward.

>         - Software Maintenance time decreases as the technology factor (the part of the COCOMO model that applies to software maintenance rather than to new development) increase. Thus, to save time, get better tools for:

Agreed.   Per COCOMO, better tools => lower cost per new LOC.
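Concretely, intermediate COCOMO folds "better tools" in as a cost-driver multiplier on the nominal effort.  A sketch using the published COCOMO 81 ratings for the "use of software tools" (TOOL) driver, with every other driver left at nominal (1.0):

```python
# "Use of software tools" (TOOL) cost-driver multipliers from COCOMO 81.
TOOL_RATINGS = {
    "very low": 1.24,
    "low": 1.10,
    "nominal": 1.00,
    "high": 0.91,
    "very high": 0.83,
}

def adjusted_effort(nominal_pm, tool_rating="nominal"):
    """Nominal person-months scaled by the tools cost driver."""
    return nominal_pm * TOOL_RATINGS[tool_rating]

print(adjusted_effort(100, "very low"))   # poor tools inflate the estimate
print(adjusted_effort(100, "very high"))  # good tools shrink it
```

In the full model the effort adjustment factor is the product of fifteen such multipliers, so a few unfavorable drivers compound quickly.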

>         - Software Maintenance time increases with the number of data integrity problems possible in the languages and tools used (coding bugs, data design flaws, data flaws, network design flaws, rifts, and sabotage).

tools, problem domain complexity => increased cost per new LOC

>         - Software Maintenance time positively correlates to the amount of time that maintenance organization last maintained that type of code (after subtracting the time it takes to open up the configuration, which relates to how long since a project last opened that code).

staff experience, staff domain experience => lower cost per new LOC

>         - Software Maintenance time negatively correlates with syntactic standards.

"Good" standards can help even without a tool, e.g., by improving readability.  "Bad" standards hurt no matter what.  Good vs. bad is somewhat subjective.

Whether a good standard is worth the cost is debatable, though I've seen so many horror stories in code formats that I suspect some minimum standard is always in order.  So I'll say "good" standards => lower cost per new LOC, though the difference may be marginal.

If you're going to enforce a standard, having a pretty print tool is best.

> For example, a rule like "indent each level 3 spaces" slows down software maintenance and development in two ways.

I agree.  This is an example from one of the worst standards I ever encountered.

>         - When management decides to use a metric, and the programmers become aware of that metric, then the programmers take whatever action required to make that metric reach 100%. For example, if management pays the programmers by the line of code, the lines of code will increase.

Having all the underlying metrics lying around raises the question of why not use them to judge programmer performance.  Some organizations have tried to do this and, yes, programmers immediately respond by inflating the metrics.

Most promoters of cost modeling strongly recommend against doing this, precisely because it interferes with and distorts the result of future predictions.

Furthermore, while measuring lines of code may be useful in predicting future lines of code, it is absolutely no measure of a person's contribution to the organization.  Also, the metrics work because they measure and predict the gross, aggregate performance of a group on a project, not details about individuals.

The primary intended use of models such as COCOMO is to predict future project costs.  That is, you calibrate the model with existing data so that your predictions about future projects will be more accurate.  For this to work you generally need to assure programmers that the metrics specifically will NOT be used for performance measurement.  The theory relies on people NOT changing their way of doing things, and you generally need buy-in and full cooperation from the developers for it to work.  Thus you need to be open, honest, and consistent about how the metrics will and will not be used.
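What "calibrating the model with existing data" amounts to is fitting the coefficients a and b to your own completed projects, e.g. by linear regression in log space (log E = log a + b log KLOC).  A sketch; the project data below is made up purely for illustration:

```python
import math

# Hypothetical completed projects: (KLOC, actual person-months).
history = [(5, 14), (12, 38), (40, 130), (90, 310)]

# Least-squares fit of log E = log a + b * log KLOC.
xs = [math.log(k) for k, _ in history]
ys = [math.log(e) for _, e in history]
n = len(history)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
den = sum((x - mean_x) ** 2 for x in xs)
b = num / den
a = math.exp(mean_y - b * mean_x)

print(f"calibrated: effort = {a:.2f} * KLOC^{b:.2f}")
print(f"prediction for 60 KLOC: {a * 60 ** b:.0f} person-months")
```

The fitted a and b are then used only for forecasting the next project, which is exactly why the historical data has to be untainted by metric gaming.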

Your concern here is valid but nobody I know of advocates abusing metrics this way.

>         - When management (personal, government, corporate, academic) requests non-applicable metrics like lines of code (a metric which makes sense only at development time) or McCabe Cyclomatic Complexity (a metric which makes sense only at testing time, since it counts test paths and dings programmers for good stuff like nested SWITCH/CASE statements and BREAKs) to describe maintenance effort, consider getting rid of that entire level of management.

I rather think LOC is a valid metric in all circumstances (although I agree it may be less reliable and thus less useful in some extreme cases).

The Cyclomatic Complexity metric was devised to abstract the code's underlying complexity away from purely lexical issues such as LOC, which are more easily perverted by the developer.  To my knowledge, none of the more complex metrics have shown themselves to be any better than simply counting LOC.  And they're much harder to measure.  So, IMHO, anything more complex than LOC is not worth the bother.

>         - Without an accepted standard definition of "lines of code", one cannot know whether to count every line in every IMPORTed module, or just those the INVOKED lines, or just those lines that do the invoking. For example, do we count every line in the python runtime module STRING.PY or just the lines in STRING.PY that our modules call, or just the lines in our module that call STRING.PY? These different counting strategies differ by orders of magnitude.

I personally would count the import statements but not any of the imported lines.  My view is that you have a list of modules written by a programmer and Linux' "wc" command tells you the lines of code.  You don't get credit for an imported module unless you write it.  More generally, I'd envision a CVS system that tracks lines changed, as in larger projects, different people ultimately work on the same modules.  What about comments and blank lines?  I say count 'em.  Comments certainly CAN be as valuable as code.  I also think blank lines as a form of code paragraphing are very important.  If it was being abused, I might count multiple consecutive blank lines as a single one.
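The counting scheme above can be sketched in a few lines.  This is only one possible definition of "LOC", per the point below about consistency: count every physical line in the modules you wrote (as `wc -l` would), including comments and single blank lines, but fold runs of consecutive blank lines into one so blank-line padding can't be abused.  Imported modules are simply never passed in, so they are never counted:

```python
def count_loc(lines):
    """Count lines, folding consecutive blank lines into a single line."""
    total = 0
    prev_blank = False
    for line in lines:
        blank = not line.strip()
        if not (blank and prev_blank):
            total += 1
        prev_blank = blank
    return total

sample = [
    "import string",           # the import counts; string.py itself does not
    "",
    "",                        # second consecutive blank: folded away
    "# a comment counts too",
    "x = 1",
]
print(count_loc(sample))  # -> 4
```

Any other convention works just as well, provided the same function is run over every project in the data set.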

While it is true that the definition of "LOC" can vary widely it is only important that you use the same definition across the domain of projects you wish to analyze (typically your own company or organization).  Different definitions can produce consistent results so long as the definition is consistently applied to all the data.

Of course, if you're trying to compare your projects to Boehm's or somebody else's, then it's probably apples and oranges.  Even if you count LOC precisely the same way as the other org, numerous other variations are likely to account for even bigger differences.  The interesting thing about Boehm's work is that he has data for so many projects that the results likely apply to your project at a more general level, in that the shapes of the curves will likely be similar, even if the baselines and rates vary.

How do *I* use all this?  Really.

First off, I'm certainly not a big COCOMO fan.  I've never actually used the full model for anything and don't rely on metrics or modeling much in real life.  I read the book and it had some interesting insights.  Some of the most important results from Boehm are the general ones, which apply to pretty much all projects: e.g., the vital importance of hiring only the best people and keeping project sizes manageable (overall cost increases extra-linearly with size).  Boehm's not the first to say this, but what's interesting is he says ALL the other factors are decidedly less important.

Beyond that, what I take away from this entire subject is that LOC is one easily computed, moderately useful, crude measurement of programmer productivity.  It's better than nothing, and I am unconvinced that anything more complicated is at all worth the extra effort.  I usually use LOC to double-check my own progress.  Typically a project seems to be taking longer than it should, I'll do a wc, and sure enough, it's working out to be two or three times the code I expected.  It has on occasion proved helpful with clients.  E.g., I had a client complain about my bills, and I was able to show that my company's contribution to his project was approximately twice the LOC of the original application; thus we were actually a bargain, since he paid many times as much to get that first 1/3.  Counting lines is not the solution to all problems, but it is yet another tool that has proven to be of some use to me on some occasions.

Regards

--jb

--
James J. Besemer  503-280-0838 voice
http://cascade-sys.com  503-280-0375 fax
mailto:jb at cascade-sys.com







More information about the Python-list mailing list