Elementary string-parsing
Odysseus
odysseus1479-at at yahoo-dot.ca
Mon Feb 4 04:43:04 EST 2008
In article <60nunqF1ro06iU4 at mid.uni-berlin.de>,
Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
<snip>
> Here and in later code you use a ``while`` loop although it is known at
> loop start how many times the loop body will be executed. That's a job
> for a ``for`` loop. If possible not over an integer that is used later
> just as index into list, but the list itself. Here you need both, index
> and objects from `names`. There's the `enumerate()` function for creating
> an iterable of (index, name) from `names`.
Thanks, that will be very useful. I was casting about for a replacement
for PostScript's "for" loop, and the "while" loop (which PS lacks -- and
which I've never missed there) was all I could come up with.
> I'd put all the relevant information that describes a field of the
> dictionary that is put into `found` into tuples and loop over it. There
> is the cell name, the index of the cell and function that converts the
> string from that cell into an object that is stored in the dictionary.
> This leads to (untestet):
>
> def extract_data(names, na, cells):
> found = dict()
The problem with initializing the 'super-dictionary' within this
function is that I want to be able to add to it in further passes, with
a new set of "names" & "cells" each time.
BTW what's the difference between the above and "found = {}"?
> for i, name in enumerate(names):
> data = dict()
> cells_index = 10 * i + na
> for cell_name, index, parse in (('epoch1', 0, parse_date),
> ('epoch2', 1, parse_date),
> ('time', 5, parse_number),
> ('score1', 6, parse_number),
> ('score2', 7, parse_number)):
> data[cell_name] = parse(cells[cells_index + index])
This looks a lot more efficient than my version, but what about the
strings that don't need parsing? Would it be better to define a
'pass-through' function that just returns its input, so they can be
handled by the same loop, or to handle them separately with another loop?
> assert name.startswith('Name: ')
I looked up "assert", but all I could find relates to debugging. Not
that I think debugging is something I can do without ;) but I don't
understand what this line does.
> found[name[6:]] = data
> return found
>
> The `parse_number()` function could look like this:
>
> def parse_number(string):
> try:
> return float(string.replace(',', ''))
> except ValueError:
> return string
>
> Indeed the commas can be replaced a bit more elegant. :-)
Nice, but I'm somewhat intimidated by the whole concept of
exception-handling (among others). How do you know to expect a
"ValueError" if the string isn't a representation of a number? Is there
a list of common exceptions somewhere? (Searching for "ValueError"
turned up hundreds of passing mentions, but I couldn't find a definition
or explanation.)
<snip>
>
> As already said, that ``while`` loop should be a ``for`` loop. But if you
> put `m_abbrevs` into a `list` you can replace the loop with a single call
> to its `index()` method: ``dlist[1] = m_abbrevs.index(dlist[1]) + 1``.
I had gathered that lists shouldn't be used for storing constants. Is
that more of a suggestion than a rule? I take it tuples don't have an
"index()" method.
Thanks for the detailed advice. I'll post back if I have any trouble
implementing your suggestions.
--
Odysseus
More information about the Python-list
mailing list