Web Frameworks Excessive Complexity

Tue Nov 20 20:43:57 EST 2012

On Tue, 20 Nov 2012 20:07:54 +0000, Robert Kern wrote:

> The source of bugs is not excessive complexity in a method, just
> excessive lines of code.

Taken literally, that cannot possibly be the case.

# Version 1: one statement per line
def method(self, a, b, c):
    do_this(a)
    do_that(b)
    do_something_else(c)

# Version 2: the same three statements joined with semicolons
def method(self, a, b, c):
    do_this(a); do_that(b); do_something_else(c)

It *simply isn't credible* that version 1 is statistically likely to have 
twice as many bugs as version 2. Over-reliance on LOC is easily gamed, 
especially in semicolon languages.
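
To make the gaming concrete, here is a rough sketch that measures both
versions two ways. The helper names physical_lines and statement_count
are my own, invented for illustration, and "statements in the AST" is
just one assumed stand-in for "amount of code", not a metric taken from
the paper.

```python
import ast

VERSION_1 = """\
def method(self, a, b, c):
    do_this(a)
    do_that(b)
    do_something_else(c)
"""

VERSION_2 = """\
def method(self, a, b, c):
    do_this(a); do_that(b); do_something_else(c)
"""

def physical_lines(source):
    """Count non-blank physical lines (a naive LOC measure)."""
    return sum(1 for line in source.splitlines() if line.strip())

def statement_count(source):
    """Count statement nodes in the parsed source, ignoring layout."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))

for name, src in [("version 1", VERSION_1), ("version 2", VERSION_2)]:
    print(name, physical_lines(src), statement_count(src))
```

The physical line count halves from version 1 to version 2, while the
statement count is identical, which is the point: the layout changed and
the code did not.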

Besides, I think you have the cause and effect backwards. I would rather 
say:

The source of bugs is not lines of code in a method, but excessive 
complexity. It merely happens that counting complexity is hard, counting 
lines of code is easy, and the two are strongly correlated, so why count 
complexity when you can just count lines of code?
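
As an illustration of why the two are correlated but not identical, here
is a minimal sketch of a complexity counter. It is only an approximation
of McCabe's measure: the choice of which AST nodes count as decision
points (BRANCH_NODES) is my assumption, and approx_complexity and
line_count are hypothetical helper names, not anything from the paper or
from an existing tool.

```python
import ast

# Node types treated as decision points. This list is an assumption of
# the sketch; real tools make more careful choices.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def approx_complexity(source):
    """Return decision points + 1, a crude cyclomatic-style count."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths, not one
            decisions += len(node.values) - 1
    return decisions + 1

def line_count(source):
    """Count non-blank physical lines."""
    return sum(1 for line in source.splitlines() if line.strip())

STRAIGHT = """\
def report(a, b, c):
    total = a + b
    total = total + c
    label = str(total)
    print(label)
    return total
"""

BRANCHY = """\
def classify(a, b, c):
    if a and b:
        return 1
    for x in c:
        if x:
            return 2
"""

for src in (STRAIGHT, BRANCHY):
    print(line_count(src), approx_complexity(src))
```

Both functions are six lines long, yet the first has a single path
through it and the second has several. Counting the lines took one
expression; counting the paths took a parser and a judgement call about
what a "decision" is.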

Keep in mind that something like 70-80% of published scientific papers 
are never replicated, or cannot be replicated. Just because one paper 
concludes that LOC alone is a better metric than CC doesn't necessarily 
make it so. But even if we assume that the paper is valid, it is 
important to understand just what it says, and not extrapolate too far.

The paper makes various assumptions, takes statistical samples, and uses 
models. (Which of course *any* such study must.) I'm not able to comment 
on whether those models and assumptions are valid, but assuming that they 
are, the conclusion of the paper is no stronger than the models and 
assumptions. We should not really conclude that "CC has no more 
predictive power than LOC". The right conclusion is that one specific 
model of cyclomatic complexity, McCabe's CC, has no more predictive power 
than LOC for projects written in C, C++ and Java.

How does that apply to Python code? Well, it's certainly suggestive, but 
it isn't definitive.

It's also important to note that the authors point out that in their 
samples of code, they found very high variance and large numbers of 
outliers:

[quote]
Modules where LOC does not predict CC (or vice-versa) may indicate an 
overly-complex module with a high density of decision points or an overly-
simple module that may need to be refactored.
[end quote]

So *even by the terms of this paper*, it isn't true that CC has no 
predictive value over LOC -- if the CC is radically high or low for the 
LOC, that is valuable to know.
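
One way to act on that, sketched under heavy assumptions: compute the
ratio of CC to LOC per module and flag the modules whose ratio is far
from the typical one. The module names and numbers below are invented
purely for illustration, and flag_outliers, with its "factor of two
either side of the median" rule, is my own hypothetical choice and not
something the paper proposes.

```python
from statistics import median

# Hypothetical (module, LOC, CC) triples, made up for this example.
MODULES = [
    ("parser", 400, 80),
    ("utils", 200, 42),
    ("models", 300, 58),
    ("dispatch", 100, 70),
    ("constants", 500, 5),
]

def flag_outliers(modules, factor=2.0):
    """Flag modules whose CC/LOC density is far from the median density."""
    densities = {name: cc / loc for name, loc, cc in modules}
    typical = median(densities.values())
    flagged = {}
    for name, density in densities.items():
        if density > typical * factor:
            flagged[name] = "dense"    # many decision points per line
        elif density < typical / factor:
            flagged[name] = "sparse"   # very few decision points per line
    return flagged

print(flag_outliers(MODULES))
```

With these made-up numbers, "dispatch" is flagged as dense and
"constants" as sparse, while the other three sit close to the median.
Those flags are exactly the information that LOC alone cannot give you.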

> LoC is much simpler, easier to understand, and
> easier to correct than CC.

Well, sure, but do you really think Perl one-liners are the paragon of 
bug-free code we ought to be aiming for? *wink*

-- 
Steven