Elementary string-parsing
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Mon Feb 4 12:01:50 EST 2008
On Mon, 04 Feb 2008 09:43:04 +0000, Odysseus wrote:
> In article <60nunqF1ro06iU4 at mid.uni-berlin.de>,
> Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
>
>> def extract_data(names, na, cells):
>>     found = dict()
>
> The problem with initializing the 'super-dictionary' within this
> function is that I want to be able to add to it in further passes, with
> a new set of "names" & "cells" each time.
Then you can either pass in `found` as an argument instead of creating it
here, or collect the results of the passes in the calling code with the
`update()` method of `dict`. Something like this:
found = dict()
for current_pass in passes:
    # ...
    found.update(extract_data(names, na, cells))
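A minimal sketch of both options side by side. The `extract_data()` here is a simplified, hypothetical stand-in: it drops `na` and the parsing and just maps names to cells, and the `passes` list of `(names, cells)` pairs is invented for illustration.

```python
def extract_data(names, cells, found=None):
    """Hypothetical, simplified stand-in for the thread's extract_data()."""
    if found is None:       # create a fresh dict per call; avoids a shared mutable default
        found = dict()
    for name, cell in zip(names, cells):
        found[name] = cell
    return found

# Hypothetical input: one (names, cells) pair per pass.
passes = [(['a', 'b'], [1, 2]), (['c'], [3])]

# Option 1: pass the accumulating dictionary in as an argument.
found1 = dict()
for names, cells in passes:
    extract_data(names, cells, found1)

# Option 2: let each pass build its own dictionary and merge with update().
found2 = dict()
for names, cells in passes:
    found2.update(extract_data(names, cells))
```

Both end up with the same contents. With either option a name that shows up again in a later pass overwrites the earlier entry.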
> BTW what's the difference between the above and "found = {}"?
Both create a new, empty dictionary; I just find ``dict()`` more
"explicit". ``dict`` and ``list`` are easier to distinguish than ``{}``
and ``[]`` after a loooong coding session or when printed/displayed in a
small font. It's just a matter of taste.
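A quick check that the two spellings give the same result. ``{}`` is a literal, while ``dict()`` looks up the name `dict` and calls it, so it is marginally slower. In return it also accepts keyword arguments. The pre-filled keys below are only illustrative.

```python
# Both spellings build a new, empty dictionary.
empty_a = dict()
empty_b = {}

# dict() can also pre-fill entries from keyword arguments (illustrative keys).
prefilled = dict(epoch1=0, time=5)
```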
>>     for i, name in enumerate(names):
>>         data = dict()
>>         cells_index = 10 * i + na
>>         for cell_name, index, parse in (('epoch1', 0, parse_date),
>>                                         ('epoch2', 1, parse_date),
>>                                         ('time', 5, parse_number),
>>                                         ('score1', 6, parse_number),
>>                                         ('score2', 7, parse_number)):
>>             data[cell_name] = parse(cells[cells_index + index])
>
> This looks a lot more efficient than my version, but what about the
> strings that don't need parsing? Would it be better to define a
> 'pass-through' function that just returns its input, so they can be
> handled by the same loop, or to handle them separately with another loop?
I'd handle them in the same loop. A "pass-through" function for strings
already exists:
In [255]: str('hello')
Out[255]: 'hello'
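A sketch of a string field going through the same loop as the parsed ones, with `str` as its "parser". The `'label'` field at offset 2, the `FIELDS` table and the `parse_row()` helper are hypothetical. `parse_number()` is the version quoted further down in this thread.

```python
def parse_number(string):
    """Turn '1,234.5' into 1234.5; return the input unchanged if it is no number."""
    try:
        return float(string.replace(',', ''))
    except ValueError:
        return string

# Hypothetical layout: (field name, offset within the record, parse function).
# `str` acts as the pass-through for fields that need no parsing.
FIELDS = (('label', 2, str),
          ('time', 5, parse_number))

def parse_row(cells, start):
    """Hypothetical helper: parse one record that begins at cells[start]."""
    return {name: parse(cells[start + offset]) for name, offset, parse in FIELDS}
```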
>>         assert name.startswith('Name: ')
>
> I looked up "assert", but all I could find relates to debugging. Not
> that I think debugging is something I can do without ;) but I don't
> understand what this line does.
It checks that `name` really starts with 'Name: ' and raises an
`AssertionError` if it doesn't. This way I turned the comment into code
that checks the assumption stated in the comment.
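A small sketch of the same idea in a hypothetical helper, `strip_name_prefix()`. Python skips ``assert`` statements when it runs with the `-O` option, so they are for checking the programmer's own assumptions, not for validating input that may legitimately be wrong.

```python
PREFIX = 'Name: '

def strip_name_prefix(name):
    """Hypothetical helper: return the part of `name` after 'Name: '."""
    # Like a comment saying "name always starts with 'Name: '", except
    # Python actually checks it (unless run with -O).
    assert name.startswith(PREFIX), 'unexpected cell: %r' % (name,)
    return name[len(PREFIX):]
```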
>> The `parse_number()` function could look like this:
>>
>> def parse_number(string):
>>     try:
>>         return float(string.replace(',', ''))
>>     except ValueError:
>>         return string
>>
>> Indeed the commas can be replaced a bit more elegantly. :-)
>
> Nice, but I'm somewhat intimidated by the whole concept of
> exception-handling (among others). How do you know to expect a
> "ValueError" if the string isn't a representation of a number?
Experience. I just tried what happens if I feed `float()` a string that
is not a number:
In [256]: float('abc')
---------------------------------------------------------------------------
<type 'exceptions.ValueError'> Traceback (most recent call last)
/home/bj/<ipython console> in <module>()
<type 'exceptions.ValueError'>: invalid literal for float(): abc
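The same "just try it" experiment can be run over several inputs at once. `which_exception()` is a hypothetical helper, not something from the standard library. It is written for Python 3, where the traceback above reads slightly differently.

```python
def which_exception(func, *args):
    """Call func(*args); return the name of the exception it raises, or None."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None

# Probe float() with a few kinds of input and see what it complains about.
for value in ['1.5', '1,234', 'abc', None]:
    print(repr(value), which_exception(float, value))
```

A string that is not a number gives `ValueError`, while `None` gives `TypeError` because it is the wrong type altogether.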
> Is there a list of common exceptions somewhere? (Searching for
> "ValueError" turned up hundreds of passing mentions, but I couldn't find
> a definition or explanation.)
The definition is quite vague: the type of an argument is correct, but
there's something wrong with its value.
See http://docs.python.org/lib/module-exceptions.html for an overview of
the built-in exceptions.
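Besides the documentation, the interpreter can list its own built-in exception classes. The sketch below assumes Python 3, where they live in the `builtins` module. The list also includes the warning categories, because those are exception classes too.

```python
import builtins

# Collect every class in `builtins` that derives from BaseException.
exception_names = sorted(
    name for name, obj in vars(builtins).items()
    if isinstance(obj, type) and issubclass(obj, BaseException)
)

print(exception_names)
# Each class carries a one-line description in its docstring.
print(ValueError.__doc__)
print(TypeError.__doc__)
```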
>> As already said, that ``while`` loop should be a ``for`` loop. But if
>> you put `m_abbrevs` into a `list` you can replace the loop with a
>> single call to its `index()` method: ``dlist[1] =
>> m_abbrevs.index(dlist[1]) + 1``.
>
> I had gathered that lists shouldn't be used for storing constants. Is
> that more of a suggestion than a rule?
Some suggest this. Others say tuples are for data where the position of
an element has a "meaning", and lists are for elements that all have the
same "meaning", for some definition of meaning. As an example, ('John',
'Doe', 'Dr.') vs. ['Peter', 'Paul', 'Mary']. In the first example we have
first name, surname, and title; in the second example all elements are
just names, unless the second example models a relation like child,
father, mother, or something like that. Anyway, if you can make the
source simpler and easier to understand by using the `index()` method,
use a list. :-)
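A sketch of the `index()` lookup. The contents of `m_abbrevs` and the layout of `dlist` are assumptions, since the original data isn't shown here. Tuples gained an `index()` method in Python 2.6, so on current versions a tuple of abbreviations would work the same way.

```python
# Hypothetical month abbreviations, in calendar order.
m_abbrevs = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
             'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def month_number(abbrev):
    """Map 'Jan'..'Dec' to 1..12; raises ValueError for anything else."""
    return m_abbrevs.index(abbrev) + 1

dlist = ['04', 'Feb', '2008']      # hypothetical split-up date: day, month, year
dlist[1] = month_number(dlist[1])  # replaces the hand-written search loop
```

An unknown abbreviation makes `index()` raise `ValueError`, which fits the definition above: the argument is a string, but not one of the expected values.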
Ciao,
Marc 'BlackJack' Rintsch
More information about the Python-list mailing list