Python arrays and string formatting options

Wed Oct 1 05:35:03 EDT 2008

On Wed, 01 Oct 2008 06:58:11 +0000, Marc 'BlackJack' Rintsch wrote:

>> I would weaken that claim a tad... I'd say it is "usual" to write
>> something like this:
>> 
>> alist = []
>> for x in some_values:
>>     alist.append(something_from_x)
>> 
>> 
>> but it is not uncommon (at least not in my code) to write something
>> like this equivalent code instead:
>> 
>> alist = [None]*len(some_values)
>> for i, x in enumerate(some_values):
>>     alist[i] = something_from_x
> 
> I have never done this, except when I was starting out with Python, and --
> maybe more importantly -- I've never seen this in others' code.  It really
> looks like a construct from someone who is still programming in some
> other language(s).

It occurs at least twice in the Python 2.5 standard library, once in 
sre_parse.py:

    i = 0
    groups = []
    groupsappend = groups.append
    literals = [None] * len(p)
    for c, s in p:
        if c is MARK:
            groupsappend((i, s))
            # literal[i] is already None
        else:
            literals[i] = s
        i = i + 1

and another time in xdrlib.py:

    succeedlist = [1] * len(packtest)
    count = 0
    for method, args in packtest:
        print 'pack test', count,
        try:
            method(*args)
            print 'succeeded'
        except ConversionError, var:
            print 'ConversionError:', var.msg
            succeedlist[count] = 0
        count = count + 1
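For readers on Python 3, here is a rough sketch of the same pre-allocated flag list, with enumerate replacing the hand-maintained count. The ConversionError class and the ok/bad test functions are hypothetical stand-ins, not xdrlib's own.

```python
# Python 3 sketch of the xdrlib-style loop: pre-allocate one success flag
# per test, then clear the flag of any test that fails.
# ConversionError, ok and bad are hypothetical stand-ins.

class ConversionError(Exception):
    """Hypothetical stand-in for xdrlib's ConversionError."""


def ok(x):
    return x


def bad(x):
    raise ConversionError("cannot convert %r" % (x,))


packtest = [(ok, (1,)), (bad, (2,)), (ok, (3,))]

succeedlist = [1] * len(packtest)      # one pigeonhole per test
for count, (method, args) in enumerate(packtest):
    try:
        method(*args)
    except ConversionError:
        succeedlist[count] = 0         # mark this slot as failed
```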

>> Most often the first way is most natural, but the second way is
>> sometimes more natural.
> 
> When will it be more natural to introduce an unnecessary index?

We can agree that the two idioms are functionally equivalent. Appending 
is marginally less efficient, because the Python runtime engine has to 
periodically resize the list as it grows, and that can in principle take 
an arbitrary amount of time if it causes virtual memory paging. But 
that's unlikely to be a significant factor for any but the biggest lists.
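A rough, CPython-specific way to see how rarely those resizes happen is to watch sys.getsizeof while appending: CPython over-allocates spare capacity, so the reported size only changes when the list is actually reallocated. The count_resizes helper is hypothetical, and the exact counts vary between Python versions.

```python
import sys


def count_resizes(n):
    """Append n items and count how often the list's allocated size changed.

    CPython-specific: relies on sys.getsizeof reflecting the list's
    allocated capacity, which grows in occasional jumps, not per append.
    """
    alist = []
    resizes = 0
    last_size = sys.getsizeof(alist)
    for x in range(n):
        alist.append(x)
        size = sys.getsizeof(alist)
        if size != last_size:      # capacity grew: a reallocation happened
            resizes += 1
            last_size = size
    return resizes
```

On CPython, count_resizes(10000) comes out far smaller than 10000, which is why appending is usually only marginally slower than filling a pre-allocated list.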

So, just as any while-loop can be rewritten as a recursive function and 
vice versa, these two idioms can be trivially rewritten from one form to 
the other. When should you use one or the other?
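As a sketch of that rewrite, with something_from_x as a hypothetical placeholder for whatever work is done per item:

```python
def something_from_x(x):
    """Hypothetical per-item transformation standing in for the real work."""
    return x * x


def build_by_appending(some_values):
    # Grow the list one element at a time.
    alist = []
    for x in some_values:
        alist.append(something_from_x(x))
    return alist


def build_by_preallocating(some_values):
    # Create every slot up front, then fill each one by index.
    alist = [None] * len(some_values)
    for i, x in enumerate(some_values):
        alist[i] = something_from_x(x)
    return alist


values = [3, 1, 4, 1, 5]
assert build_by_appending(values) == build_by_preallocating(values)
```

One practical difference: the pre-allocating version calls len(), so it works on sequences but not on arbitrary iterators such as generators, while the appending version accepts any iterable.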

When the algorithm you have is conceptually about growing a list by 
appending to the end, then you should grow the list by appending to the 
end. And when the algorithm is conceptually about dropping values into 
pre-existing pigeon holes, then you should initialize the list and then 
walk it, modifying the values in place.
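To make the distinction concrete, here are two hypothetical helpers: one drops values into slots whose positions are known up front, the other grows a list whose final length isn't known until the loop finishes.

```python
def place_in_slots(pairs, size):
    """Pigeonholes: each value arrives with its target position."""
    slots = [None] * size              # the holes exist before any value does
    for position, value in pairs:
        slots[position] = value        # unfilled slots stay None
    return slots


def keep_evens(values):
    """Growing: the output length depends on the data, so append."""
    evens = []
    for v in values:
        if v % 2 == 0:
            evens.append(v)
    return evens
```

For example, place_in_slots([(2, "c"), (0, "a")], 3) gives ["a", None, "c"], while keep_evens([1, 2, 3, 4]) gives [2, 4].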

And if the algorithm is indifferent to which idiom you use, then you 
should use whichever idiom you are most comfortable with, and not claim 
there's Only One True Way to build a list.

>> And Marc, I think you're being a little unfair to the OP, who is
>> clearly unfamiliar with Python. I've been using Python for perhaps ten
>> years, and I still find your code above dense and hard to comprehend.
>> It uses a number of "advanced Python concepts" that a newbie is going
>> to have trouble with:
>> 
>> - the with statement acts by magic; if you don't know what it does,
>> it's an opaque black box.
> 
> Everything acts by magic unless you know what it does.  The Fortran
> 
>   read(*,*)(a(i,j,k),j=1,3)
> 
> in the OP's first post looks like magic too.  

It sure does. My memories of Fortran aren't good enough to remember what 
that does.

But I think you do Python a disservice. One of my Perl coders was writing 
some Python code the other day, and he was amazed at how guessable Python 
was. You can often guess the right way to do something. He wanted a set 
with all the elements of another set removed, so he guessed that s1-s2
would do the job -- and it did. A lot of Python is amazingly readable to 
people with no Python experience at all. But not everything.
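For example (set literal syntax needs Python 2.7 or later):

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5}

# Set difference: the elements of s1 that are not also in s2.
# Elements of s2 that aren't in s1 (here, 5) are simply ignored.
diff = s1 - s2                 # {1, 2}
same = s1.difference(s2)       # method form, same result
```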

> I admit that my code shows
> off advanced Python features but I don't think ``with`` is one of them.
> It makes it easier to write robust code, and it is maybe even understandable
> without documentation, just by reading it as "English text".

The first problem with "with" is that it looks like the Pascal "with" 
statement, but acts nothing like it. That may confuse anyone with Pascal 
experience, and there are a lot of us out there.

The second difficulty is that:

    with open('test.txt') as lines:

binds the result of open() to the name "lines". How is that different 
from "lines = open('test.txt')"? I know the answer, but we shouldn't 
expect newbies coming across it to be anything but perplexed.
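A small Python 3 sketch of the difference, using a throwaway temporary file: the name after "as" is bound to whatever the context manager's __enter__ returns, which for a file object is the file itself, but the with statement also closes the file when the block exits, even if an exception is raised.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("alpha\n\nbeta\n")

# Plain assignment: closing the file is our responsibility.
lines = open(path)
first = lines.readline()
lines.close()

# with-statement: same kind of file object, closed automatically on exit.
with open(path) as lines:
    content = lines.read()
closed_after_with = lines.closed   # True: the block closed it for us

os.remove(path)
```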

Now that the newbie has determined that lines is a file object, the very 
next thing you do is assign something completely different to 'lines':

        lines = (line for line in lines if line.strip())

So the reader needs to know that brackets aren't just for grouping, as 
in most other languages, but also that (expr for x in xs) can be 
equivalent to a for-loop. They need to know, or guess, that iterating over 
a file object yields the lines of the file, and they have to keep the two 
different bindings of "lines" straight in their head in a piece of code 
that uses "lines" twice and "line" three times.
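A Python 3 sketch showing that the generator expression and an explicit for-loop select the same lines. Here io.StringIO stands in for a real file, since iterating over either one yields a line at a time.

```python
import io

text = "first\n\n   \nsecond\nthird\n"

# Generator expression: lazily yields only lines that aren't blank.
lines = io.StringIO(text)
from_genexp = list(line for line in lines if line.strip())

# The same filtering written out as an explicit for-loop.
lines = io.StringIO(text)
from_loop = []
for line in lines:
    if line.strip():               # "" and whitespace-only lines are falsy
        from_loop.append(line)
```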

And then they hit the next line, which includes a function called 
"partial", which has a technical meaning borrowed from functional 
languages, and I am sure it will mean nothing whatsoever to anyone 
unfamiliar with it. It's not something that is guessable, unlike open() 
or len() or append().
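For what it's worth, functools.partial just pre-fills some of a function's arguments and returns a new callable. A generic sketch, unrelated to the specific code under discussion (parse_binary and parse_binary_def are hypothetical names):

```python
from functools import partial

# partial(func, *args, **kwargs) returns a callable with those arguments
# already supplied; the rest are passed at call time.
parse_binary = partial(int, base=2)


# The equivalent hand-written wrapper, for comparison.
def parse_binary_def(s):
    return int(s, 2)


value = parse_binary("1011")       # same as int("1011", base=2), i.e. 11
```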

>> - you re-use the same name for different uses, which can cause
>> confusion.
> 
> Do you mean `lines`?  Then I disagree because the (duck) type is always
> "iterable over lines".  I just changed the content by filtering.

Nevertheless, for people coming from less dynamic languages than Python 
(such as Fortran), it is a common idiom to never use the same variable 
for two different things. It's not a bad choice really: imagine reading a 
function where the name "lines" started off as an integer number of 
lines, then became a template string, then was used for a list of 
character positions... 

Of course I'm not suggesting that your code was that bad. But rebinding a 
name does make code harder to understand.
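To illustrate, here are two hypothetical Python 3 functions modelled on the with/generator-expression lines quoted above, one rebinding "lines" and one giving each stage its own name. Both return the same result, and in both the with statement still closes the file, because it keeps its own reference to the context manager rather than relying on the name.

```python
import io


def nonblank_rebinding(f):
    # "lines" first names the file, then the filtered generator.
    with f as lines:
        lines = (line for line in lines if line.strip())
        return list(lines)             # consume before the block closes f


def nonblank_distinct(f):
    # Each name keeps a single meaning throughout.
    with f as raw_lines:
        nonblank_lines = (line for line in raw_lines if line.strip())
        return list(nonblank_lines)
```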

-- 
Steven