Parse each line by character location
Arnaud Delobelle
arnodel at googlemail.com
Tue Nov 4 14:16:31 EST 2008
claird at lairds.us (Cameron Laird) writes:
> In article <mailman.3467.1225819771.3487.python-list at python.org>,
> Tim Chase <python.list at tim.thechases.com> wrote:
>>> I hope this is the right place to ask, but I am trying to come up with
>>> a way to parse each line of a file. Unfortunately, the file is neither
>>> comma, nor tab, nor space delimited. Rather, the character locations
>>> imply what field it is.
>>>
>>> For example:
>>>
>>> The first ten characters would be the record number, the next
>>> character is the client type, the next ten characters are a volume,
>>> and the next three are order type, and the last character would be an
>>> optional type depending on the order type.
>>
>>Sounds like you could do something like
>>
>> recno_idx = slice(0,10)
>> client_idx = slice(10, 11)
>> volume_idx = slice(11,11+10)
>> order_type_idx = slice(11+10, 11+10+3)
> .
> .
> .
> !? That seems to me confusingly far from a working solution,
> at least in comparison to
>
> recno_idx = the_line[0:10]
> client_idx = the_line[10:11]
> ...
>
> What am I missing?
I suppose in your case the slice objects will be re-created for each
line, whereas with Tim's solution they are created once and for all
before the parsing starts and then reused on every line, e.g.
the_line[recno_idx]. This may result in speedier parsing.
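
For what it's worth, here is a minimal sketch of how Tim's approach might
look once it is fleshed out. The field names, the FIELDS dict, parse_line
and the sample line are my own illustration, not anything from the
original posts, and the fifth slice assumes the optional type is a single
character right after the order type.

```python
# Field layout taken from the original question: 10-char record number,
# 1-char client type, 10-char volume, 3-char order type, then an
# optional 1-char type. The slice objects are built once, up front.
FIELDS = {
    "recno": slice(0, 10),
    "client": slice(10, 11),
    "volume": slice(11, 21),
    "order_type": slice(21, 24),
    "optional_type": slice(24, 25),  # assumed position of the optional field
}


def parse_line(line):
    """Return a dict mapping each field name to its stripped substring."""
    return {name: line[idx].strip() for name, idx in FIELDS.items()}


# Hypothetical sample record, made up for illustration.
sample = "0000000042" + "A" + "0000001500" + "BUY" + "X"
record = parse_line(sample)
print(record)
```

Slicing past the end of a string just gives an empty string, so a line
without the optional last character still parses. Whether this is really
faster than writing the_line[0:10] inline is something you would have to
measure, for instance with the timeit module.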
--
Arnaud