Parse each line by character location

Arnaud Delobelle arnodel at googlemail.com
Tue Nov 4 14:16:31 EST 2008


claird at lairds.us (Cameron Laird) writes:

> In article <mailman.3467.1225819771.3487.python-list at python.org>,
> Tim Chase  <python.list at tim.thechases.com> wrote:
>>> I hope this is the right place to ask, but I am trying to come up with
>>> a way to parse each line of a file. Unfortunately, the file is neither
>>> comma, nor tab, nor space delimited. Rather, the character locations
>>> imply what field it is.
>>> 
>>> For example:
>>> 
>>> The first ten characters would be the record number, the next
>>> character is the client type, the next ten characters are a volume,
>>> and the next three are order type, and the last character would be an
>>> optional type depending on the order type.
>>
>>Sounds like you could do something like
>>
>>   recno_idx = slice(0,10)
>>   client_idx = slice(10, 11)
>>   volume_idx = slice(11,11+10)
>>   order_type_idx = slice(11+10, 11+10+3)
> 			.
> 			.
> 			.
> !?  That seems to me confusingly far from a working solution,
> at least in comparison to
>
>     recno_idx = the_line[0:10]
>     client_idx = the_line[10:11]
> 	...
>
> What am I missing?

I suppose in your case the slice objects will be re-created for each
line, whereas with Tim's solution they are created once and for all
before the parsing starts.  This may result in speedier parsing.
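
To make the comparison concrete, here is a minimal sketch of the
slice-object approach. It assumes the field widths from the original
question (10, 1, 10, 3, plus an optional trailing character); the names
FIELDS and parse_line and the sample line are made up for illustration.
Whether it is measurably faster than literal slicing is something to
time on real data rather than assume.

```python
# Sketch only: field names and widths are assumed from the original question.
# The slice objects are built once, up front, and reused for every line.
FIELDS = {
    "recno": slice(0, 10),
    "client_type": slice(10, 11),
    "volume": slice(11, 21),
    "order_type": slice(21, 24),
    "option": slice(24, 25),  # optional field; empty when the line is short
}


def parse_line(line):
    """Return a dict mapping each field name to its stripped substring."""
    line = line.rstrip("\n")
    return {name: line[idx].strip() for name, idx in FIELDS.items()}


# Hypothetical sample record laid out according to the widths above.
record = parse_line("0000000042C0000012345BUYX\n")
```

Slicing past the end of a string returns an empty string rather than
raising, so a line without the optional last character simply yields an
empty "option" field.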

-- 
Arnaud


