Array design question

Thu May 29 11:12:55 EDT 2003

Quoth Peter Slizik:
  [...]
> Let's have a simple example. Suppose we have the file in format
> 7 "Text 1"
> 5 "Text 2"
> 3 "Text 3"
> ...
> ...
> 
> Our task is to read the content of the file into the array.
> 
> Simple PHP code
> 
> while( !eof() ) {
>      line = readline();
>      (number, text) = split(line);
>      array[number] = text;
> }
> 
> seems untranslatable into Python. The first attempt to store an object 
> with index 7 into array before other 6 objects are stored there, will 
> cause an exception. But the numbers in the input file aren't in any 
> special order.

If they're really just out of order, but otherwise form the sequence
0, 1, ..., n-1 (for example), then it's simple in Python too:

    f = open('whatever')
    try:
        data = []
        for line in f:
            number, text = line.rstrip().split()
            data.append((int(number), text))
        data.sort()     # order the (number, text) pairs by number
        # the numbers must be exactly 0, 1, ..., n-1
        assert [number for number, text in data] == list(range(len(data)))
        data = [text for number, text in data]
    finally:
        f.close()

I haven't ever used PHP, but I suppose your example code above
does a bit of pseudo-error-recovery, using the assumptions that
    1. if duplicate numbers occur, only the last entry matters, and
    2. if numbers are missing, some default value is acceptable.
With these assumptions, you could in Python write, say,

    f = open('whatever')
    try:
        data = {}
        for line in f:
            number, text = line.rstrip().split()
            data[int(number)] = text     # smash any previous value
        maxindex = max(data.keys())
        datalist = ['']*(maxindex+1)     # use empty string as default value
        for number, text in data.items():   # items(), not the bare dict
            datalist[number] = text
    finally:
        f.close()

More verbose, to be sure, but also more explicit -- which is good.
The assumptions above are not, imho, so commonly applicable that
they deserve default status.  (This varies by problem domain, of
course; there's been a wee discussion lately over on the
ll1-discuss mailing list about this exact point.)
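
If all that's wanted is lookup by number, a plain dict may already be
close to the PHP behaviour, with the default for missing entries chosen
at the point of use.  A minimal sketch, assuming the line format quoted
above; the name read_sparse and the sample data are made up for
illustration:

```python
import io

def read_sparse(lines):
    """Map each line's leading number to the rest of the line."""
    table = {}
    for line in lines:
        # split only on the first run of whitespace, so the text may contain spaces
        number, text = line.rstrip().split(None, 1)
        table[int(number)] = text   # later duplicates overwrite earlier ones
    return table

# Hypothetical sample input, following the format quoted above.
sample = io.StringIO('7 "Text 1"\n5 "Text 2"\n3 "Text 3"\n')
table = read_sparse(sample)

print(table[7])           # "Text 1"  (the quotes are part of the stored text)
print(table.get(4, ''))   # missing index: the caller picks the default
```

The two assumptions are still made here, but each is visible in the
code: the overwrite in the assignment, the default in the get() call.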

The relevant Python Zen is "in the face of ambiguity, refuse the
temptation to guess".  Better to make the programmer state what
should happen in erroneous situations.  *Especially* when parsing
data files.  (Indeed, the above code is badly deficient in error
handling; it just dies horribly if, e.g., the text field contains
whitespace or the number field isn't a number.)
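
One way to make those decisions explicit is sketched below.  It assumes
each line is a non-negative integer followed by a double-quoted text
with no embedded quotes, and it treats duplicates as errors rather than
overwriting; both are guesses about the file format, and the names
LINE_RE and parse_lines are invented for this example:

```python
import re

# Assumed line format: a non-negative integer, whitespace, a double-quoted text.
LINE_RE = re.compile(r'^(\d+)\s+"([^"]*)"$')

def parse_lines(lines):
    """Return {number: text}; raise ValueError on malformed or duplicate lines."""
    data = {}
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue   # skipping blank lines is a choice, stated here explicitly
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError('line %d: cannot parse %r' % (lineno, line))
        number = int(match.group(1))
        if number in data:
            raise ValueError('line %d: duplicate index %d' % (lineno, number))
        data[number] = match.group(2)   # text without the surrounding quotes
    return data
```

Called as parse_lines(open('whatever')), it either returns a complete
dict or stops with a message naming the offending line.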

> Is there any reason why Python designers chose this concept? Wouldn't it 
>   be more convenient to have PHP-like arrays in Python too?

For certain tasks, perhaps.  But for the kinds of tasks I usually
program, absolutely not -- I find it much more convenient that
assigning outside the bounds of an array is an error.
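
A short illustration of that behaviour with a plain list (the variable
name and values are arbitrary):

```python
items = ['a', 'b', 'c']

try:
    items[7] = 'h'           # index 7 does not exist yet
except IndexError as exc:
    print('refused:', exc)   # refused: list assignment index out of range

# Growing the list has to be requested explicitly, default value included.
items.extend([''] * (7 - len(items) + 1))
items[7] = 'h'
print(len(items))            # 8
```

A mistyped index thus shows up at the assignment, instead of silently
producing a longer array.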

-- 
Steven Taschuk             "The world will end if you get this wrong."
staschuk at telusplanet.net     -- "Typesetting Mathematics -- User's Guide",
                                 Brian Kernighan and Lorinda Cherry