Programming language productivity

Sun May 21 17:43:31 EDT 2006

Peter Maas wrote:

> I think that a LOC comparison between a language that enforces line breaks
> and another language that enables putting lots of code in one line
> doesn't make much sense. I wonder why comparisons aren't made in terms of
> word count. Word count would include literals, constants, variables,
> keywords, operators, bracket- and block delimiter pairs. Python
> indent/unindent would of course also count as block delimiters. I think
> this would be a more precise measure for software size.

I don't disagree, but "word" counts aren't so simple, either to define or to
implement.  What counts as a word?  Parser tokens?  That counts a.b (or
a::b, or a->b, depending on language) as 3 words.  Block delimiters?  After
a month, you don't even notice them in properly formatted code, which is
why Python doesn't have them in the first place.  Operators?  Then e.g.
a = b + c + d + e counts more than a = add (b, c, d, e).  The complexity of
an expression seems determined by the number of operands; counting
operators as well arguably overcounts.
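
To make the ambiguity concrete, here is a rough sketch using Python's
standard tokenize module.  The count_words helper and the DELIMITERS set
are my own assumptions for illustration, not an agreed definition of a
"word"; moving a token from one bucket to another changes the totals, which
is the point.

```python
import io
import tokenize

# Assumption: these punctuation tokens are "delimiters", not "operators".
# Python's tokenizer itself reports all of them as OP tokens.
DELIMITERS = {"(", ")", "[", "]", "{", "}", ",", ":", ";"}

def count_words(source):
    """Return (operands, operators, delimiters) for a snippet of Python.

    "Operands" here means names, keywords, and number/string literals.
    """
    operands = operators = delimiters = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands += 1
        elif tok.type == tokenize.OP:
            if tok.string in DELIMITERS:
                delimiters += 1
            else:
                operators += 1
    return operands, operators, delimiters

print(count_words("a.b"))                   # (2, 1, 0): three parser tokens
print(count_words("a = b + c + d + e"))     # (5, 4, 0)
print(count_words("a = add (b, c, d, e)"))  # (6, 1, 5)
```

Counting operands plus operators gives 9 for the sum and 7 for the call.
Count every token and the call wins, 12 to 9.  Count operands alone and the
two are nearly equal at 5 and 6.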

Regardless of the above choices, you still need a parser (or at least a
lexer) to count anything.  Whitespace separation won't cut it - what
happens with 'for (i=0;i<5;i++)' or 's = "foo bar baz"'?  If you toss out
operators, you could almost get away with regular expressions for counting
the identifiers, keywords, and literals.  But there's still the problem of
overcounting string literals.
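
A quick sketch of both failure modes, using the two examples above.  The
NAIVE and WITH_STRINGS patterns are hypothetical ones I made up for this
post; they only handle ASCII identifiers, integer literals, and
double-quoted strings, so treat them as an illustration rather than a
usable counter.

```python
import re

c_loop = "for (i=0;i<5;i++)"
assignment = 's = "foo bar baz"'

# Whitespace separation: the loop header collapses into two "words",
# and the string literal is split into three pieces.
print(c_loop.split())      # ['for', '(i=0;i<5;i++)']
print(assignment.split())  # ['s', '=', '"foo', 'bar', 'baz"']

# Assumption: a deliberately naive pattern for identifiers, keywords,
# and integer literals, with operators tossed out.
NAIVE = re.compile(r"[A-Za-z_]\w*|\d+")
print(NAIVE.findall(c_loop))      # ['for', 'i', '0', 'i', '5', 'i']
print(NAIVE.findall(assignment))  # ['s', 'foo', 'bar', 'baz']

# Assumption: try a double-quoted string literal first so that its
# contents are not counted as separate words.
WITH_STRINGS = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+')
print(WITH_STRINGS.findall(assignment))  # ['s', '"foo bar baz"']
```

The naive pattern does fine on the loop but counts the words inside the
string as if they were identifiers.  Patching the regex to recognize string
literals helps, but each such patch (other quote styles, comments, floats)
moves it one step closer to being a lexer.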

Line counts are simple to compute and it's easier to agree on which lines to
count.  Thus their popularity.

-- 
Edward Elliott
UC Berkeley School of Law (Boalt Hall)
complangpython at eddeye dot net