Python vs Ruby

Alex Martelli aleaxit at yahoo.com
Sun Oct 23 12:36:33 EDT 2005


Mike Meyer <mwm at mired.org> wrote:
   ...
> > Of course, these results only apply where the "complexity" (e.g., the
> > number of operators) in a single line of code is constant.
> 
> I'm not sure what you're trying to say here. The tests ranged over
> things from PL/I to assembler. Are you saying that those two languages
> have the same "complexity in a single line"?

Not necessarily, since PL/I, for example, is quite capable of being
used at extremes of operator density per line.  So it doesn't even have
"the same complexity as itself" when used in widely different layout
styles.

If the studies imply otherwise, then I'm reminded of the fact that both
Galileo and Newton published alleged experimental data which can be
shown to be "too good to be true" (fits the theories too well, according
to chi-square tests etc)...


> > for item in sequence: blaap(item)
> >
> > or
> >
> > for item in sequence:
> >     blaap(item)
> >
> > are EXACTLY as easy (or hard) to write, maintain, and document -- it's
> > totally irrelevant that the number of lines of code has "doubled" in the
> > second (more standard) layout of the code!-)
> 
> The studies didn't deal with maintenance. They only dealt with
> documentation in so far as code was commented.
> 
> On the other hand, studies of reading comprehension have shown that
> people can read and comprehend faster if the line lengths fall within
> certain ranges. While it's a stretch to assume those studies apply to
> code, I'd personally be hesitant to assume they don't apply without
> some research. If they do apply, then your claims about the difficulty
> of maintaining and documenting being independent of the textual line
> lengths are wrong. And since writing code inevitably involves
> debugging it - and the studies specified debugged lines - then the
> line length could affect how hard the code is to write as well.

If time to code depends on textual line lengths, then it cannot solely
depend on number of lines at the same time.  If, as you say, the studies
"prove" that speed of delivering debugged code depends strictly on the
LOCs in the delivered code, then those studies would also be showing
that the textual length of the lines is irrelevant to that speed (since,
depending on coding styles, in most languages one can trade off
textually longer lines for fewer lines).

OTOH, the following "thought experiment" shows that the purported
deterministic connection of coding time to LOC can't really hold:

say that two programmers, Able and Baker, are given exactly the same
task to accomplish in (say) language C, and end up with exactly the same
correct source code for the resulting function;

Baker, being an honest toiling artisan, codes and debugs his code in
"expansive" style, with lots of line breaks (as lots of programming
shops practice), so his final code looks like:
    while (foo())
      {
        bar();
        baz();
      }
(etc), he's coding 5 lines for each such loop;

Able, being able, codes and debugs extremely crammed code, so the same
final code looks, when Able is working on it, like:
    while (foo()) { bar(); baz(); }
so, Able is coding 1 line for each such loop, one fifth as many as
Baker (thus, by hypothesis, Able must be done 5 times faster);

when Able's done coding and debugging, he runs a "code beautifier"
utility which runs in negligible time (compared to the time it takes to
code and debug the program) and easily produces the same "expansively"
laid-out code as Baker worked with all the time.

So, Able is 5 times faster than Baker yet delivers identical final code,
based, please note, not on any substantial difference in skill, but
strictly on a trivial trick hinging on a popular and widely used kind of
code-reformatting utility.
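
Just to make the trick's arithmetic explicit, here is a tiny Python
sketch (an illustration only) counting the physical lines of the two
equivalent layouts shown above:

    # the same C loop in Baker's and Able's layouts, held as text
    baker_style = ("while (foo())\n"
                   "  {\n"
                   "    bar();\n"
                   "    baz();\n"
                   "  }\n")
    able_style = "while (foo()) { bar(); baz(); }\n"

    # identical code, yet a 5:1 ratio in physical lines
    print(len(baker_style.splitlines()))  # -> 5
    print(len(able_style.splitlines()))   # -> 1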


Real-life observation suggests that working with extremely crammed code
(to minimize number of lines) and beautifying it at the end is in fact
not a sensible coding strategy and cannot deliver such huge increases in
coding (and debugging) speed.  Thus, either those studies or your
reading of them must be fatally flawed in this respect (most likely,
some "putting hands forward" footnote about coding styles and tools in
use was omitted from the summaries, or neglected in the reading).

Such misunderstandings have seriously damaged the practice (and
management) of programming in the past.  For example, shops evaluating
coders' productivity in terms of lines of code have induced their
coders to distort their style and emit more lines of code, so as to
be measured as more productive -- which is generally trivial to do in
many cases, e.g.
    for i in range(100):
        a[i] = i*i
can easily become 100 lines, "a[0] = 0" and so on (produced by copy
and paste, editor macros, or other similarly trivial means).  At the
other extreme, some coders (particularly in languages suited to extreme
density, such as Perl) delight in producing unreadable "very clever"
one-liner equivalents of straightforward loops that would take up a few
lines if written in the obvious way instead.
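
To see the whole gamut concretely, here are three Python renderings of
that very same computation (a toy illustration, of course) -- the
delivered functionality is identical while the line count varies by two
orders of magnitude:

    a = [0] * 100

    # the obvious way: a two-line loop
    for i in range(100):
        a[i] = i*i

    # the "clever one-liner" extreme: one line
    a = [i*i for i in range(100)]

    # the LOC-padding extreme: 100 separate assignments, trivially
    # generated, e.g. by
    #   print("\n".join("a[%d] = %d" % (i, i*i) for i in range(100)))
    a[0] = 0
    a[1] = 1
    # ...and so on, up to a[99] = 9801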

The textual measure of lines of code is extremely easy to obtain, and
pretty easy to adjust to account for some obvious first-order effects
(e.g., ignoring comments and whitespace, counting logical lines rather
than physical ones, etc), and that, no doubt, accounts for its undying
popularity -- but it IS really a terrible measure of "actual program
size and complexity".
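
As a toy illustration of such first-order adjustments (a sketch only,
not the counter any actual study used), an "adjusted LOC" count for
Python source might skip blank and comment-only lines:

    def adjusted_loc(lines):
        """Count lines that are neither blank nor comment-only."""
        count = 0
        for line in lines:
            stripped = line.strip()
            if stripped and not stripped.startswith('#'):
                count += 1
        return count

    print(adjusted_loc(open('some_module.py')))  # hypothetical file name

Even this much is crude: counting *logical* lines properly (handling
continuations and ;-joined statements) already requires something like
the tokenize module, which hints at how much convention is buried in
any published LOC figure.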

Moreover, even if you normalized "program size" by suitable language
specific factors (number of operators, decision points, cyclomatic
complexity, etc), the correlation between program size and time to code
it would still only hold within broadly defined areas, not across the
board.  I believe "The Mythical Man-Month" was the first widely read
work to point out how much harder it is to debug programs that use
unrestrained concurrency (in today's terms, think of multithreading
without any of the modern theory and helpers for it), which Brooks
called "system programs", compared to "ordinary" sequential code (which
Brooks called "application programs" -- the terminology is quite dated,
but the deep distinction remains valid).  Also: one huge monolithic
program using global variables for everything is going to have
complexity (and time to delivery of debugged code) that grows way more
than linearly with program size; to keep a relation that's close to
linear (though in no case can exact linearity be repeatably achieved for
sufficiently large programming systems, I fear), we employ a huge
variety of techniques to make our software systems more modular.
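
For instance (a rough sketch only, nothing like a real complexity
tool), Python's ast module makes it easy to count decision points, one
ingredient of cyclomatic complexity:

    import ast

    def decision_points(source):
        """Crudely count branching constructs in Python source text."""
        branchy = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)
        return sum(isinstance(node, branchy)
                   for node in ast.walk(ast.parse(source)))

    print(decision_points("for i in range(100):\n    a[i] = i*i\n"))  # -> 1

Note how this count stays the same however the source is laid out,
which is precisely what a normalization should achieve.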


It IS important to realize that higher-level languages, by making
programs of equivalent functionality (and with comparable intrinsic
difficulty, modularity, etc) "textually smaller" (and thereby
"conceptually" smaller), raise programmer productivity.  But using
"lines of code" for these measurements, without all the appropriate
qualifications, is misleading.  Even the definition of a language's
level in terms of LOCs per function point is too "rough and ready" and
thus suffers from this issue (function points, as a language-independent
measure of a coding task's "size", have their own issues, but much
smaller ones than LOCs as a measure of delivered code's size).
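
A toy example of the "language level" arithmetic (the figures below are
purely hypothetical, made up for illustration -- not measured data):

    # hypothetical LOC-per-function-point levels, illustration only
    loc_per_fp = {'assembler-ish': 320, 'C-ish': 130, 'Python-ish': 25}
    task_size_fp = 10  # an assumed task size, in function points

    for lang in sorted(loc_per_fp):
        print("%-13s ~%4d LOC" % (lang, loc_per_fp[lang] * task_size_fp))

The task's "size" stays fixed at 10 function points while the delivered
LOC swings by more than a factor of ten across languages.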


Consider the analogy of measuring a writing task (in, say, English) by
number of delivered words -- a very popular measure, too.  No doubt, all
other things being equal, it may take a writer about twice as long to
deliver 2000 copy-edited words as to deliver 1000.  But... all other
things are rarely equal.  To me, for example, it often comes most
naturally to take about 500 words to explain and illustrate one concept;
but when I need to be concise, I will then take a lot of time to edit
and re-edit that text until just about all of the same issues are put
across in 200 words or fewer.  It may take me twice as long to rewrite
the original 500 words into 200 as it took to put down the 500 words in
the first place -- which helps explain why many of my posts are so
long, as I don't take all the time to re-edit them, and why it takes so
long to write a "Nutshell" series book, where conciseness is crucial.

Nor is it only my own issue... remember Pascal's "Lettres Provinciales",
and the famous apology about "I am sorry that this letter is so long,
but I did not have the time to write a shorter one"!-)


Alex


