Web Frameworks Excessive Complexity

Wed Nov 21 06:47:44 EST 2012

On 21/11/2012 01:43, Steven D'Aprano wrote:
> On Tue, 20 Nov 2012 20:07:54 +0000, Robert Kern wrote:
>
>> The source of bugs is not excessive complexity in a method, just
>> excessive lines of code.
>
> Taken literally, that cannot possibly be the case.
>
> def method(self, a, b, c):
>      do_this(a)
>      do_that(b)
>      do_something_else(c)
>
>
> def method(self, a, b, c):
>      do_this(a); do_that(b); do_something_else(c)
>
>
> It *simply isn't credible* that version 1 is statistically likely to have
> twice as many bugs as version 2. Over-reliance on LOC is easily gamed,
> especially in semicolon languages.

Logical LoC (executable LoC, number of statements, etc.) is a better measure 
than Physical LoC, I agree. That's not the same thing as cyclomatic complexity, 
though. Also, the relationship between LoC (of either type) and bugs is not 
linear (at least not in the small-LoC regime), so you are certainly correct that 
it isn't credible that version 1 is likely to have twice as many bugs as version 
2. No one is saying that it is.
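For concreteness, here is a rough sketch of the physical vs. logical distinction applied to your two versions. Physical LoC is taken here as non-blank lines, and logical LoC as the number of statement nodes in Python's parse tree. Both definitions are my own simplifications for illustration, not the ones used in the paper.

```python
import ast

VERSION_1 = """
def method(self, a, b, c):
    do_this(a)
    do_that(b)
    do_something_else(c)
"""

VERSION_2 = """
def method(self, a, b, c):
    do_this(a); do_that(b); do_something_else(c)
"""

def physical_loc(source: str) -> int:
    """Count non-blank source lines."""
    return sum(1 for line in source.splitlines() if line.strip())

def logical_loc(source: str) -> int:
    """Count statement nodes (the def plus each call statement)."""
    tree = ast.parse(source)  # parsing only; do_this() etc. need not exist
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))

for name, src in [("version 1", VERSION_1), ("version 2", VERSION_2)]:
    print(name, physical_loc(src), logical_loc(src))
# version 1 4 4
# version 2 2 4
```

The semicolons halve the physical count but leave the logical count unchanged, which is why the logical measure is harder to game.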

> Besides, I think you have the cause and effect backwards. I would rather
> say:
>
> The source of bugs is not lines of code in a method, but excessive
> complexity. It merely happens that counting complexity is hard, counting
> lines of code is easy, and the two are strongly correlated, so why count
> complexity when you can just count lines of code?

No, that is not the takeaway of the research. More code correlates with more 
bugs. More cyclomatic complexity also correlates with more bugs. You want to 
find out what causes bugs. What the research shows is that cyclomatic complexity 
is so correlated with LoC that it is going to be very difficult, or impossible, 
to establish a causal relationship between cyclomatic complexity and bugs. The 
previous research that just correlated cyclomatic complexity to bugs without 
controlling for LoC does not establish the causal relationship.
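To see why the two are so hard to disentangle, recall that McCabe's number is essentially 1 plus the count of decision points, and in Python each `if`, loop, or `except` normally brings at least one extra statement (its body) with it. The sketch below is a simplified approximation of my own; real tools such as the mccabe package handle more constructs and may count slightly differently.

```python
import ast

# Decision points counted in this simplified sketch.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1  # `elif` appears as a nested ast.If, so it counts too
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` can short-circuit at each extra operand.
            complexity += len(node.values) - 1
    return complexity

# Hypothetical function used only to exercise the counter.
SAMPLE = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for d in (2, 3, 5):
        if x % d == 0 and x != d:
            return "composite"
    return "other"
"""

print(cyclomatic_complexity(SAMPLE))  # 6: if, elif, for, inner if, `and`
```

Every increment above arrives together with new lines of code, so a study that does not control for size will see CC "predict" bugs whether or not it adds anything beyond LoC.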

> Keep in mind that something like 70-80% of published scientific papers
> are never replicated, or cannot be replicated. Just because one paper
> concludes that LOC alone is a better metric than CC doesn't necessarily
> make it so. But even if we assume that the paper is valid, it is
> important to understand just what it says, and not extrapolate too far.

This paper is actually a replication. It is notable for how comprehensive it is.

> The paper makes various assumptions, takes statistical samples, and uses
> models. (Which of course *any* such study must.) I'm not able to comment
> on whether those models and assumptions are valid, but assuming that they
> are, the conclusion of the paper is no stronger than the models and
> assumptions. We should not really conclude that "CC has no more
> predictive power than LOC". The right conclusion is that one specific
> model of cyclomatic complexity, McCabe's CC, has no more predictive power
> than LOC for projects written in C, C++ and Java.
>
> How does that apply to Python code? Well, it's certainly suggestive, but
> it isn't definitive.

Still, it is stronger than the evidence that CC is a worthwhile measure, for Python or any
other language.

> It's also important to note that the authors point out that in their
> samples of code, they found very high variance and large numbers of
> outliers:
>
> [quote]
> Modules where LOC does not predict CC (or vice-versa) may indicate an
> overly-complex module with a high density of decision points or an overly-
> simple module that may need to be refactored.
> [end quote]
>
> So *even by the terms of this paper*, it isn't true that CC has no
> predictive value over LOC -- if the CC is radically high or low for the
> LOC, that is valuable to know.

Is it? What is the evidence that excess, unpredicted-by-LoC CC causes (or even 
correlates with) bugs? The paper points that out as a target for future research 
because no one has studied it yet. It may turn out to be a valid metric, but one 
that has a very specific utility: identifying a particular hotspot. Running CC 
over whole projects to compare their "quality", as the OP has done, is not a 
valid use of even that.
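If one did want to hunt for those hotspots, one plausible approach (my assumption; the paper only names it as future work and does not prescribe a method) is to fit CC against LoC across a project's modules and look at the residuals. The module names and (LoC, CC) numbers below are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cc_residuals(modules):
    """Map module name -> observed CC minus the CC predicted from its LoC."""
    locs = [loc for loc, _ in modules.values()]
    ccs = [cc for _, cc in modules.values()]
    slope, intercept = fit_line(locs, ccs)
    return {name: cc - (slope * loc + intercept)
            for name, (loc, cc) in modules.items()}

# Hypothetical (LoC, CC) pairs, not real measurements.
MODULES = {
    "a": (100, 10),
    "b": (200, 20),
    "c": (300, 30),
    "d": (400, 40),
    "e": (150, 45),
}

residuals = cc_residuals(MODULES)
hotspot = max(residuals, key=residuals.get)  # "e": far more CC than its size predicts
```

Note that this only flags a module as unusual for its size; whether such modules actually carry more bugs is exactly the question nobody has studied yet.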

>> LoC is much simpler, easier to understand, and
>> easier to correct than CC.
>
> Well, sure, but do you really think Perl one-liners are the paragon of
> bug-free code we ought to be aiming for? *wink*

No, but introducing more statements and method calls to avoid if statements 
isn't either.
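A hypothetical illustration of that trade-off: the two functions below behave identically, but the second scores lower on CC only because it swaps the `if` statements for an extra module-level table and a method call. The names and rates are made up for the example.

```python
def shipping_cost_branches(region):
    # Two `if` tests: McCabe CC of 3.
    if region == "domestic":
        return 5
    elif region == "europe":
        return 15
    else:
        return 25

_RATES = {"domestic": 5, "europe": 15}

def shipping_cost_lookup(region):
    # No branches in the source: McCabe CC of 1, yet the same
    # decision is still made, just inside dict.get().
    return _RATES.get(region, 25)
```

The decision has not gone away; it has only moved somewhere the metric does not look.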

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco