Elementary string-parsing

Odysseus odysseus1479-at at yahoo-dot.ca
Mon Feb 4 04:43:04 EST 2008


In article <60nunqF1ro06iU4 at mid.uni-berlin.de>,
 Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:

<snip>

> Here and in later code you use a ``while`` loop although it is known at
> loop start how many times the loop body will be executed.  That's a job
> for a ``for`` loop.  If possible, loop not over an integer that is used
> later only as an index into a list, but over the list itself.  Here you
> need both the index and the objects from `names`.  There's the
> `enumerate()` function for creating an iterable of (index, name) pairs
> from `names`.

Thanks, that will be very useful. I was casting about for a replacement 
for PostScript's "for" loop, and the "while" loop (which PS lacks -- and 
which I've never missed there) was all I could come up with.
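
For comparison, a minimal sketch of the two loop styles side by side. The contents of `names` are made-up sample data, not taken from the original program.

```python
# Hypothetical sample data; the real `names` comes from the parsed input.
names = ['Name: Alpha', 'Name: Beta', 'Name: Gamma']

# Index-driven ``while`` loop: the counter is managed by hand.
pairs_while = []
i = 0
while i < len(names):
    pairs_while.append((i, names[i]))
    i += 1

# Equivalent ``for`` loop: enumerate() yields (index, item) pairs.
pairs_for = []
for i, name in enumerate(names):
    pairs_for.append((i, name))
```

Both loops build the same list of pairs; the second has no counter to initialize or increment.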

> I'd put all the relevant information that describes a field of the
> dictionary that is put into `found` into tuples and loop over them.  Each
> tuple holds the cell name, the index of the cell, and the function that
> converts the string from that cell into an object that is stored in the
> dictionary.  This leads to (untested):
> 
> def extract_data(names, na, cells):
>     found = dict()

The problem with initializing the 'super-dictionary' within this 
function is that I want to be able to add to it in further passes, with 
a new set of "names" & "cells" each time.

BTW what's the difference between the above and "found = {}"?
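
A rough sketch touching both points, under stated assumptions: `dict()` and `{}` each produce a new, empty dictionary, and one possible way to keep adding to the same dictionary across passes is to hand it in as an optional argument. The function `add_entries` and its `found` parameter are hypothetical stand-ins for illustration, not part of the quoted code.

```python
# Both spellings create a new, empty dictionary.
a = dict()
b = {}

def add_entries(names, found=None):
    """Hypothetical stand-in for extract_data(): add one entry per name.

    If no dictionary is passed in, start a fresh one; otherwise extend
    the one supplied by the caller.
    """
    if found is None:
        found = {}
    for name in names:
        found[name] = {}
    return found

# First pass creates the dictionary, the second pass adds to it.
found = add_entries(['A', 'B'])
found = add_entries(['C'], found)
```

The default is `None` rather than `{}` because a mutable default value would be shared between all calls that omit the argument.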

>     for i, name in enumerate(names):
>         data = dict()
>         cells_index = 10 * i + na
>         for cell_name, index, parse in (('epoch1', 0, parse_date),
>                                         ('epoch2', 1, parse_date),
>                                         ('time', 5, parse_number),
>                                         ('score1', 6, parse_number),
>                                         ('score2', 7, parse_number)):
>             data[cell_name] = parse(cells[cells_index + index])

This looks a lot more efficient than my version, but what about the 
strings that don't need parsing? Would it be better to define a 
'pass-through' function that just returns its input, so they can be 
handled by the same loop, or to handle them separately with another loop?
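
One way the pass-through idea might look, as a sketch only. The name `identity`, the field name 'title', and the two-cell layout are assumptions made for this illustration; `parse_number` repeats the quoted suggestion so that the example is self-contained.

```python
def identity(value):
    """Pass-through 'parser' for cells that need no conversion."""
    return value

def parse_number(string):
    """Convert '1,234.5' to a float; return the string unchanged on failure."""
    try:
        return float(string.replace(',', ''))
    except ValueError:
        return string

# Hypothetical cell contents and field layout.
cells = ['Some title', '1,234.5']
fields = (('title', 0, identity),
          ('score1', 1, parse_number))

data = {}
for cell_name, index, parse in fields:
    data[cell_name] = parse(cells[index])
```

With this arrangement the plain strings and the parsed values go through the same loop, at the cost of one trivial extra function.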

>         assert name.startswith('Name: ')

I looked up "assert", but all I could find relates to debugging. Not 
that I think debugging is something I can do without ;) but I don't 
understand what this line does.
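
For reference, a small sketch of what an `assert` statement does: it evaluates the condition and raises `AssertionError` if the condition is false, otherwise it does nothing. The helper `strip_prefix` and the error message are made up for this illustration.

```python
def strip_prefix(name):
    # Stop with AssertionError if the cell does not look as expected.
    assert name.startswith('Name: '), 'unexpected cell: %r' % name
    # 'Name: ' is six characters long, so slice it off.
    return name[6:]

ok = strip_prefix('Name: Odysseus')

try:
    strip_prefix('Odysseus')
    failed = False
except AssertionError:
    failed = True
```

Note that assertions are skipped when Python runs with the `-O` option, so they suit sanity checks during development rather than validation the program depends on.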

>         found[name[6:]] = data
>     return found
> 
> The `parse_number()` function could look like this:
> 
> def parse_number(string):
>     try:
>         return float(string.replace(',', ''))
>     except ValueError:
>         return string
> 
> Indeed the commas can be replaced a bit more elegantly.  :-)

Nice, but I'm somewhat intimidated by the whole concept of 
exception-handling (among others). How do you know to expect a 
"ValueError" if the string isn't a representation of a number? Is there 
a list of common exceptions somewhere? (Searching for "ValueError" 
turned up hundreds of passing mentions, but I couldn't find a definition 
or explanation.)
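
A short sketch of one way to find out which exception a call raises: try it on bad input and look at the exception's type. The built-in exceptions are listed in the "Built-in Exceptions" section of the Python Library Reference. The helper name `describe_failure` is made up for this illustration, and the `except ... as` spelling assumes Python 2.6 or later.

```python
def describe_failure(text):
    """Return the name of the exception float() raises for `text`, or None."""
    try:
        float(text)
    except ValueError as exc:
        return type(exc).__name__
    return None

bad = describe_failure('abc')       # not a number
commas = describe_failure('1,234')  # float() does not accept commas
good = describe_failure('1.5')      # converts without error
```

Typing `float('abc')` at the interactive prompt shows the same information in the last line of the traceback.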

<snip>
> 
> As already said, that ``while`` loop should be a ``for`` loop.  But if you
> put `m_abbrevs` into a `list` you can replace the loop with a single call
> to its `index()` method: ``dlist[1] = m_abbrevs.index(dlist[1]) + 1``.

I had gathered that lists shouldn't be used for storing constants. Is 
that more of a suggestion than a rule? I take it tuples don't have an 
"index()" method.
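
A sketch of the suggested lookup using a tuple, with a caveat: tuples gained an `index()` method in Python 2.6, so on older versions this would need a list instead. The contents of `dlist` are an assumed example of a split date, not taken from the original program.

```python
# Month abbreviations as a constant tuple.
m_abbrevs = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
             'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

# Hypothetical date pieces: day, month abbreviation, year.
dlist = ['4', 'Feb', '2008']

# index() returns the zero-based position; add 1 for the month number.
dlist[1] = m_abbrevs.index(dlist[1]) + 1
```

If the abbreviation is not in the sequence, `index()` raises `ValueError`, which the caller may want to catch.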

Thanks for the detailed advice. I'll post back if I have any trouble 
implementing your suggestions.

-- 
Odysseus


