Elementary string-parsing

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Mon Feb 4 02:55:38 EST 2008


On Mon, 04 Feb 2008 03:21:18 +0000, Odysseus wrote:

> def extract_data():
>     i = 0
>     while i < len(names):
>         name = names[i][6:] # strip off "Name: "
>         found[name] = {'epoch1': cells[10 * i + na],
>                        'epoch2': cells[10 * i + na + 1],
>                        'time': cells[10 * i + na + 5],
>                        'score1': cells[10 * i + na + 6],
>                        'score2': cells[10 * i + na + 7]}

Here and in later code you use a ``while`` loop although the number of
iterations is known when the loop starts.  That's a job for a ``for``
loop.  If possible, iterate over the list itself rather than over an
integer that is only used as an index into the list.  Here you need both
the index and the objects from `names`.  The `enumerate()` function
creates an iterable of (index, name) pairs from `names`.
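
As a small illustration of `enumerate()` on its own, here is a minimal
sketch.  The sample `names` list and the helper `indexed_names()` are made
up for this example and are not part of the original code:

```python
# Hypothetical sample data, only for illustration.
names = ['Name: Alpha', 'Name: Beta', 'Name: Gamma']


def indexed_names(names):
    """Return (index, stripped name) pairs without a manual counter."""
    result = []
    # enumerate() yields (0, names[0]), (1, names[1]), ... so there is
    # no ``i = 0`` / ``i += 1`` bookkeeping as in the ``while`` version.
    for i, name in enumerate(names):
        result.append((i, name[6:]))  # strip off "Name: "
    return result


pairs = indexed_names(names)
```

The index is still available for computing offsets such as
``10 * i + na``, while the loop itself runs over the list.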

I'd put all the relevant information that describes one field of the
dictionary stored in `found` into tuples and loop over them.  Each tuple
holds the cell name, the index of the cell, and a function that converts
the string from that cell into the object that is stored in the
dictionary.  This leads to (untested):

def extract_data(names, na, cells):
    found = dict()
    for i, name in enumerate(names):
        data = dict()
        cells_index = 10 * i + na
        for cell_name, index, parse in (('epoch1', 0, parse_date),
                                        ('epoch2', 1, parse_date),
                                        ('time', 5, parse_number),
                                        ('score1', 6, parse_number),
                                        ('score2', 7, parse_number)):
            data[cell_name] = parse(cells[cells_index + index])
        assert name.startswith('Name: ')
        found[name[6:]] = data
    return found

The `parse_number()` function could look like this:

def parse_number(string):
    try:
        return float(string.replace(',', ''))
    except ValueError:
        return string

Indeed the commas can be replaced a bit more elegantly.  :-)

`parse_date()` is left as an exercise for the reader.

>         for k in ('epoch1', 'epoch2'):
>             dlist = found[name][k].split(" ")
>             m = 0
>             while m < 12:
>                 if m_abbrevs[m] == dlist[1]:
>                     dlist[1] = m + 1
>                     break
>                 m += 1
>             tlist = dlist[3].split(":")
>             found[name][k] = timegm((int(dlist[2]), int(dlist[1]),
>                                      int(dlist[0]), int(tlist[0]),
>                                      int(tlist[1]), int(tlist[2]),
>                                      -1, -1, 0))
>         i += 1
> 
> The function appears to be working OK as is, but I would welcome any & 
> all suggestions for improving it or making it more idiomatic.

As already said, that ``while`` loop should be a ``for`` loop.  But if you
put `m_abbrevs` into a `list` you can replace the loop with a single call
to its `index()` method: ``dlist[1] = m_abbrevs.index(dlist[1]) + 1``.
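
Put together with the date handling from the quoted code, a
`parse_date()` built on `index()` might look like the sketch below.  It
assumes that `m_abbrevs` holds the English three-letter month
abbreviations, that `timegm` is `calendar.timegm`, and that the cell text
has the layout "DD Mon YYYY HH:MM:SS" implied by the ``split()`` calls
above.  None of that is confirmed by the original post:

```python
from calendar import timegm

# Assumed contents; the original post does not show `m_abbrevs`.
m_abbrevs = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
             'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def parse_date(string):
    """Convert e.g. '04 Feb 2008 03:21:18' to seconds since the epoch (UTC)."""
    # Assumed layout: day, month abbreviation, year, clock time.
    day, month_name, year, clock = string.split(' ')
    # index() replaces the manual search loop; it raises ValueError
    # if the abbreviation is not in the list.
    month = m_abbrevs.index(month_name) + 1
    hour, minute, second = clock.split(':')
    return timegm((int(year), month, int(day),
                   int(hour), int(minute), int(second),
                   -1, -1, 0))
```

Unlike `parse_number()`, this sketch lets a ``ValueError`` propagate on
malformed input.  Whether that is the right behaviour depends on how
clean the cell data is.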

Ciao,
	Marc 'BlackJack' Rintsch


