Python is readable

Wed Mar 21 00:55:51 EDT 2012

>> One example is performing a series of transformations on a collection of
>> data, with the intent of finding an element of that collection that
>> satisfies a particular criterion.  If you separate out the individual
>> transformations, you need to understand generators or you will waste
>> space and perform many unnecessary calculations.  If you only ever do a
>> single transformation with a clear conceptual meaning, you could create
>> a "master transformation function," but what if you have a large number
>> of potential permutations of that function?
>
> I'm sorry, that is far too abstract for me. Do you have a *concrete*
> example, even a trivial one?

How about a hypothetical log analyzer that parses a log file
aggregated from multiple event sources with disparate record
structures?  You will need to perform a series of transformations on
the data to convert record elements from text to specific formats, and
your function for identifying "alarm" records also depends on record
structure (and possibly on system state, if you imagine an intrusion
detection system).  Of course, you could go to a lot of trouble to
dispatch and detect alarms over 6-7 statements.  However, given the
description "for each log record you receive, convert text elements to
native data types based on the value of the first three fields of the
record, then trigger an alert if that record meets defined
requirements", and assuming you already have maps from record values
to conversion functions for record elements and a map from record
types to alert criteria functions, it seems like a one-liner to me.
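
A minimal sketch of the idea, in Python.  Everything named here
(CONVERTERS, ALERT_CRITERIA, the comma-separated record layout) is
made up for illustration, and to keep it short I dispatch on the first
field only rather than the first three:

```python
# Hypothetical maps, assumed to exist already:
# record type -> one converter per field, and
# record type -> predicate deciding whether a record is an alarm.
CONVERTERS = {
    "auth": (str, str, int),    # type, user, failed login attempts
    "disk": (str, str, float),  # type, mount point, fraction used
}
ALERT_CRITERIA = {
    "auth": lambda rec: rec[2] >= 3,
    "disk": lambda rec: rec[2] > 0.9,
}


def parse(lines):
    """Lazily split each line and convert its fields by record type."""
    for line in lines:
        fields = line.strip().split(",")
        converters = CONVERTERS[fields[0]]
        yield tuple(conv(f) for conv, f in zip(converters, fields))


def alarms(lines):
    """The 'one liner': records that meet their type's alert criteria."""
    return (rec for rec in parse(lines) if ALERT_CRITERIA[rec[0]](rec))


log = ["auth,alice,1", "disk,/var,0.95", "auth,bob,5"]

# Nothing is parsed until a record is requested, and parsing stops
# as soon as the first alarm is found.
first_alarm = next(alarms(log), None)
```

Because both stages are generators, asking for the first alarm never
converts the records that come after it, which is the space and
computation saving I had in mind.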

>> What if you are composing
>> three or four functions, each of which is conditional on the data?  If
>> you extract things from a statement and assign them somewhat arbitrary
>> names, you've just traded horizontal bloat for vertical bloat (with a
>> net increase in volume), while forcing a reader to scan back and forth
>> to different statements to understand what is happening.
>
> First off, vertical bloat is easier to cope with than horizontal bloat,
> at least for people used to reading left-to-right rather than vertically.
> There are few anti-patterns worse than horizontal scrolling, especially
> for text.

I agree that if a line runs into horizontal scrolling, you have a
problem.  Of course, I often rail against the parenthesized
function-taking-arguments expression structure because it forces you
to read inside out and right to left, and I'd prefer not to conflate
the two issues here.  My assertion is that, given an expression
structure that reads naturally regardless, horizontal bloat is better
than a larger amount of vertical bloat, particularly when the vertical
bloat does not fall along clean semantic boundaries.

> Secondly, the human brain can only deal with a limited number of tokens
> at any one time. It's easier to remember large numbers when they are
> broken up into chunks:
>
> 824-791-259-401 versus 824791259401
>
> (four tokens, versus twelve)
>
> Likewise for reading code. Chunking code into multiple lines with
> temporary variables, instead of one long expression, makes things
> easier to understand, not harder.

This is true when the tokens are an abstraction.  I read some of the
research on chunking; basically, it came down to people being able to
remember multiple numbers efficiently in an auditory fashion using
phonemes.  Words versus random letter combinations have the same
effect, only with visual images (which is why I think Paul Graham is
full of shit with regard to his "shorter is better than descriptive"
mantra in old essays).  Chunking doesn't really help if storing the
elements in batches doesn't provide a more efficient representation.
Of course, if you can get your statements to read like sensible
English sentences, there is definitely a reduction in cognitive load.

> And thirdly, you're not "forcing" the reader to scan back and forth -- or
> at least if you are, then you've chosen your functions badly. Functions
> should have descriptive names and should perform a meaningful task, not
> just an arbitrary collection of code.

This is why I quoted Einstein.  I support breaking compound logical
statements down to simple statements, then combining those simple
statements.  The problem arises when your compound statement still
looks like "A B C D E F G H I J K L M N", and portions of that
compound statement don't have a lot of meaning outside the larger
statement.  You could say X = A B C D E, Y = F G H I J, Z = K L M N,
then say X Y Z, but now you've created bloat and forced the reader to
backtrack.
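
To make that concrete, here is a contrived Python sketch.  The data and
the names x and y are invented for illustration; the point is only
that the intermediate names carry no meaning outside the larger
statement:

```python
words = ["delta", "Alpha", "charlie", "Bravo", "alpha"]

# One compound statement, read as a single thought:
# the sorted, distinct, lower-cased five-letter words.
one_statement = sorted(set(w.lower() for w in words if len(w) == 5))

# The same computation split along arbitrary boundaries.  A reader
# has to look back at x and y to know what the last line sorts.
x = (w for w in words if len(w) == 5)
y = set(w.lower() for w in x)
split_up = sorted(y)
```

Both forms compute the same list; the second just spends three
statements and two throwaway names to do it.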

>
> When you read:
>
> x = range(3, len(sequence), 5)
>
> you're not forced to scan back and forth between that line and the code
> for range and len to understand it, because range and len are good
> abstractions that make sensible functions.
>
> There is a lot of badly written code in existence. Don't blame poor
> execution of design principles on the design principle itself.

I like to be fair and even handed, and I recognize that tool and
language creators don't control users.  At the same time, it is a
fundamental truth that people are much more inclined to laziness and
ignorance than their complements. Any exceptional design will
recognize this and make doing the right thing the intuitive, expedient
choice.  From this perspective I feel morally obligated to lay some
blame at the feet of the language or tool creator when a person misuses
their creation in a way easily predicted given his or her nature.

> [...]
>> Also, because of Sphinx, it is very common in the Python community to
>> weave documents and code together in a way that is convenient for
>> authors but irritating for readers.
>
> I don't know about "very common". I suspect, given the general paucity of
> documentation in the average software package, it is more like "very
> rare, but memorable when it does happen".

Well, as a textbook example, the Pocoo projects tend to do this.
FormAlchemy is another offender.  This is just off the top of my head.

>> I personally would prefer not to have to scroll
>> past 100 lines of a tutorial with examples, tests and what not in order
>> to go from one function to another.
>
> Agreed. Docstrings should use a minimal number of examples and tests.
> Tutorials and extensive tests should be extracted into external documents.

I don't even mind having tutorials and doctests in the same file, I
just don't like them to be interleaved.  I really like module level
docstrings, and class docstrings are sometimes useful; it is function
docstrings that I usually find irritating.
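
As a rough sketch of the layout I prefer (the module and its two
functions are hypothetical, and I'm assuming Python 3 division), the
examples and doctests sit in the module docstring, and the functions
themselves stay adjacent:

```python
"""Hypothetical module of temperature helpers.

All examples and doctests live here, up front, so nothing is
interleaved with the function definitions below.

>>> c_to_f(100)
212.0
>>> f_to_c(32)
0.0
"""
import doctest


def c_to_f(c):
    return c * 9 / 5 + 32


def f_to_c(f):
    return (f - 32) * 5 / 9


if __name__ == "__main__":
    # Runs the doctests found in the module docstring above.
    doctest.testmod()
```

The doctests still run via doctest.testmod(), but you can go from one
function to the next without scrolling past a tutorial.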