Controlling a generator the pythonic way

Thomas Lotze thomas at thomas-lotze.de
Sat Jun 11 10:10:32 EDT 2005


Hi,

I'm trying to figure out what the most pythonic way to interact with a
generator is.

The task I'm trying to accomplish is writing a PDF tokenizer, and I want
to implement it as a Python generator. Suppose all the ugly details of
tokenizing PDF can be handled (such as embedded streams of arbitrary
binary content). There remains one problem, though: in order to allow
random file access, the tokenizer should not simply spit out a series of
tokens read from the file sequentially; it should rather be possible to
point it at arbitrary places in the file.

I can see two ways to do this: either the current file position
has to be read from somewhere (say, a mutable object passed to the
generator) after each yield, or a new generator needs to be instantiated
every time the tokenizer is pointed to a new file position.

The first approach has two disadvantages: the pointer value is exposed,
and, because of the complex rules for chopping a PDF into tokens, there
will be a lot of yield statements in the generator code, each of which
needs its own pointer assignment. This seems ugly to me.
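
To make the first approach concrete, here is a minimal sketch. It assumes
a toy tokenizer that only splits on whitespace (real PDF rules are far
more involved), and the names Cursor and tokenize are made up for
illustration:

```python
class Cursor:
    """Hypothetical mutable holder for the current read position."""

    def __init__(self, pos=0):
        self.pos = pos


def tokenize(data, cursor):
    """Yield whitespace-separated tokens from a string.

    The position is re-read from the cursor after every yield, so the
    caller can move the tokenizer by assigning to cursor.pos.
    """
    while True:
        pos = cursor.pos
        # Skip whitespace in front of the next token.
        while pos < len(data) and data[pos].isspace():
            pos += 1
        if pos >= len(data):
            return
        end = pos
        while end < len(data) and not data[end].isspace():
            end += 1
        # Publish the new position before yielding; the caller may
        # overwrite it before asking for the next token.
        cursor.pos = end
        yield data[pos:end]


data = "1 0 obj << /Type /Catalog >> endobj"
cursor = Cursor()
tokens = tokenize(data, cursor)

first = next(tokens)   # token at the start of the data
cursor.pos = 8         # the caller repositions the tokenizer
second = next(tokens)  # token starting at offset 8
```

With only one yield the bookkeeping is harmless, but every additional
yield would need the same write-then-read dance around it.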

The second approach is cleaner in that respect, but pointing the
tokenizer to some place now has the added semantics of creating a whole
new generator instance. The programmer using the tokenizer now needs to
remember to throw away any references to the old generator each time the
pointer is reset, which is also ugly.
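
A minimal sketch of the second approach, again assuming a toy whitespace
tokenizer and a made-up name, tokenize_from, shows the stale reference
problem:

```python
def tokenize_from(data, start=0):
    """Yield whitespace-separated tokens from a string, beginning at start."""
    pos = start
    while True:
        # Skip whitespace in front of the next token.
        while pos < len(data) and data[pos].isspace():
            pos += 1
        if pos >= len(data):
            return
        end = pos
        while end < len(data) and not data[end].isspace():
            end += 1
        yield data[pos:end]
        pos = end


data = "1 0 obj << /Type /Catalog >> endobj"
tokens = tokenize_from(data)
first = next(tokens)

# Some other part of the program still holds the old generator.
stale = tokens

# Repositioning means creating a new generator and rebinding the name.
tokens = tokenize_from(data, 8)
second = next(tokens)

# The stale reference carries on from the old position, unaware of the reset.
stale_next = next(stale)
```

Here the generator body stays free of pointer bookkeeping, but nothing
stops a caller from continuing to use the old generator by accident.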

Does anybody here have a third way of dealing with this? Otherwise,
which ugliness is the more pythonic one?

Thanks a lot for any ideas.

-- 
Thomas


