Parsing a file with iterators

Fri Oct 17 18:32:18 EDT 2008

On 17 Oct, 16:42, Luis Zarrabeitia <ky... at uh.cu> wrote:
> I need to parse a text file. The format is something like this:
>
> TYPE1 metadata
> data line 1
> data line 2
> ...
> data line N
> TYPE2 metadata
> data line 1
> ...
> TYPE3 metadata
> ...
>
> And so on. The type and metadata determine how to parse the following data
> lines. When the parser fails to parse one of the lines, the next parser is
> chosen (or if there is no 'TYPE metadata' line there, an exception is thrown).
>
> This doesn't work:
>
> ===
> for line in input:
>     parser = parser_from_string(line)
>     parser(input)
> ===
>
> because when the parser iterates over the input, it can't know that it finished
> processing the section until it reads the next "TYPE" line (actually, until it
> reads the first line that it cannot parse, which if everything went well, should
> be the 'TYPE'), but once it reads it, it is no longer available to the outer
> loop. I wouldn't like to leak the internals of the parsers to the outside.
>
> What could I do?
> (to the curious: the format is a dialect of the E00 used in GIS)

The main issue seems to be that when a parser decides it doesn't
understand a line, that 'current' line has to be kept so it can still
be used to select the next parser. The for loop in your example calls
the iterator's next() method, which always advances to the following
line and has no way to hand the current one back. There are two easy
options, though:

1. Wrap the input file in your own object that keeps track of the
   current line.
2. Use the linecache module and maintain a line number yourself.
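
For option 1, here is a minimal sketch of one possible wrapper. The
names PushbackLines, push_back, parse_section and parse_all are made up
for illustration, and the "starts with TYPE" test is only a stand-in
for "the parser could not parse this line"; your real parsers would
come from parser_from_string().

```python
class PushbackLines:
    """Iterator wrapper that lets a consumer hand back a line it could not use."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._pending = []              # lines handed back, most recent last

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending:
            return self._pending.pop()  # serve a handed-back line first
        return next(self._lines)

    next = __next__                     # Python 2 spelling of the same method

    def push_back(self, line):
        self._pending.append(line)


def parse_section(lines):
    """Hypothetical section parser: collect lines until one starts with 'TYPE'."""
    data = []
    for line in lines:
        if line.startswith("TYPE"):
            lines.push_back(line)       # not ours; leave it for the outer loop
            break
        data.append(line.rstrip("\n"))
    return data


def parse_all(raw_lines):
    """Outer loop: read a TYPE header, then let a parser consume the section."""
    lines = PushbackLines(raw_lines)
    sections = []
    for header in lines:
        if not header.startswith("TYPE"):
            raise ValueError("expected a TYPE line, got %r" % header)
        sections.append((header.rstrip("\n"), parse_section(lines)))
    return sections
```

The parsers only need to know that they can call push_back() on the
line they failed on, so nothing about their internals leaks into the
outer loop.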

  http://blog.doughellmann.com/2007/04/pymotw-linecache.html
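
For option 2, a rough sketch of the line-number approach could look
like the following. linecache.getline() is the real standard-library
call (line numbers start at 1, and it returns '' past the end of the
file); parse_section and parse_file are hypothetical names, and the
"starts with TYPE" check again stands in for a parser rejecting a line.

```python
import linecache


def parse_section(filename, lineno):
    """Hypothetical parser: returns (data, number of the first line not consumed)."""
    data = []
    while True:
        line = linecache.getline(filename, lineno)
        if not line or line.startswith("TYPE"):
            # End of file, or a line this parser does not understand.
            return data, lineno
        data.append(line.rstrip("\n"))
        lineno += 1


def parse_file(filename):
    """Outer loop: the line number is the only state shared with the parsers."""
    sections = []
    lineno = 1
    while True:
        header = linecache.getline(filename, lineno)
        if not header:                  # '' means we ran past the last line
            break
        if not header.startswith("TYPE"):
            raise ValueError("line %d: expected a TYPE line" % lineno)
        data, lineno = parse_section(filename, lineno + 1)
        sections.append((header.rstrip("\n"), data))
    return sections
```

Because each parser returns the number of the first line it did not
consume, the outer loop can re-read that same line to pick the next
parser. Note that linecache keeps the whole file in memory, which may
matter for very large inputs.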

--
HTH,
James