Stripping non-numbers from a file parse without nested lists?

andrew cooke andrew at acooke.org
Tue Mar 31 21:58:07 EDT 2009


Rhodri James wrote:
> On Tue, 31 Mar 2009 06:51:33 +0100, <daku9999 at gmail.com> wrote:
>
>> There has got to be a better way of doing this:
>>
>> I'm reading in a file that has a lot of garbage, but eventually has
>> something that looks similar to:
>> (some lines of garbage)
>> dip/dir.
>> (some more lines of garbage)
>> 55/158
>> (some more lines of garbage)
>> 33/156
>> etc.
>>
>> and I'm stripping out the 55/158 values (with error checking
>> removed):
>> ------
>> def read_data(filename):
>>        fh = open(filename, "r", encoding="ascii")
>>
>>        for line in fh:
>>            for word in line.lower().split():
>>                if "/" in word and "dip" not in word:
>>                    temp = word.partition("/")
>>                    dip.append(temp[0])
>>                    dir.append(temp[2])
>> -----
>>
>> I can't figure out a nicer way of doing it without turning the thing
>> into a nested list (non-ideal).  I could put the entire tuple inside
>> of a list, but that gets ugly with retrieval.  I'm sure there is an
>> easier way to store this.  I was having trouble with dictionaries due
>> to non-unique keys when I tried that route.
>>
>> Any ideas for a better way to store it?  This ultimately parses a
>> giant amount of data (ascii dxf's) and spits the information into a
>> csv, and I find the writing of nested lists cumbersome and I'm sure
>> I'm missing something as I'm quite new to Python.

i don't follow exactly what the problem is, but the mention of nested
lists makes me think maybe you need a generator.  you can define this
function:

def tokens(filename):
    with open(filename, "r", encoding="ascii") as fh:
        for line in fh:
            for word in line.lower().split():
                if "/" in word and "dip" not in word:
                    temp = word.partition("/")
                    yield temp[0], temp[2]

and then elsewhere do:

for (val1, val2) in tokens(filename):
    .... stuff here ...

which is a very common pattern for avoiding constructing lists of things
that you want to use elsewhere.
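
since the end goal mentioned above is a csv, here is a rough sketch of how the generator could feed the csv module directly, so no intermediate list is ever built.  the helper name write_csv, the header names and the file names are my own assumptions, not anything from the original code; tokens() is repeated only so the snippet runs on its own.

```python
import csv


def tokens(filename):
    # Same generator as above: yields (dip, dir) pairs as strings.
    with open(filename, "r", encoding="ascii") as fh:
        for line in fh:
            for word in line.lower().split():
                if "/" in word and "dip" not in word:
                    dip, _, direction = word.partition("/")
                    yield dip, direction


def write_csv(in_name, out_name):
    # Hypothetical helper: stream each pair into one CSV row.
    # newline="" is what the csv module docs recommend for output files.
    with open(out_name, "w", newline="", encoding="ascii") as out:
        writer = csv.writer(out)
        writer.writerow(["dip", "dir"])      # assumed header names
        writer.writerows(tokens(in_name))    # consumes the generator lazily
```

calling write_csv("input.dxf", "output.csv") (both names made up) would then write one row per dip/dir pair.  if you really do need two separate sequences, dips, dirs = zip(*tokens(filename)) gives them back as tuples, provided the file contains at least one match.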

andrew






