Tokenizer inconsistency wrt new lines in comments

George Sakkis george.sakkis at gmail.com
Fri Apr 4 17:08:18 EDT 2008


On Apr 4, 4:38 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> George Sakkis wrote:
> >> If it was a bug it has to violate a functional requirement. I can't
> >> see which one.
>
> > Perhaps it's not a functional requirement but it came up as a real
> > problem on a source colorizer I use. I count on newlines generating
> > token.NEWLINE or tokenize.NL tokens in order to produce <br> tags. It
> > took me some time and head scratching to find out why some comments
> > were joined together with the following line. Now I have to check
> > whether a comment ends in new line and if it does output an extra <br>
> > tag.. it works but it's a kludge.
>
> well, the real kludge here is of course that you're writing your own
> colorizer, when you can just go and grab Pygments:
>
>    http://pygments.org/
>
> or, if you prefer something tiny and self-contained, something like the
> colorizer module in this directory:
>
>    http://svn.effbot.org/public/stuff/sandbox/pythondoc/
>
> (the element_colorizer module in the same directory gives you XHTML in
> an ElementTree instead of raw HTML, if you want to postprocess things)
>
> </F>

First off, I didn't write it from scratch; I just tweaked a
single-module colorizer I had found online. Second, whether I or someone
else had to deal with it is irrelevant; the point is that
generate_tokens() is not consistent with respect to newlines after
comments.
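
For anyone who wants to check this on their own interpreter, here is a
minimal sketch. It assumes Python 3 syntax (io.StringIO, print as a
function); SAMPLE and comment_tokens are names made up for this
illustration. It lists every COMMENT token together with the type of the
token that follows it, so you can see whether a comment on its own line
and a comment after code are treated the same way. What it prints depends
on the Python version, so no expected output is shown.

```python
import io
import tokenize

# Hypothetical sample: one comment on its own line, one after code.
SAMPLE = (
    "# a comment on its own line\n"
    "x = 1  # a trailing comment\n"
)

def comment_tokens(source):
    """Return (comment text, name of the following token type) pairs."""
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    pairs = []
    for i, tok in enumerate(toks):
        if tok[0] == tokenize.COMMENT:
            # An ENDMARKER always closes the stream, so i + 1 is valid.
            following = toks[i + 1]
            pairs.append((tok[1], tokenize.tok_name[following[0]]))
    return pairs

for text, following in comment_tokens(SAMPLE):
    # repr() makes a trailing "\n" inside the comment text visible.
    print(repr(text), "->", following)
```

If a comment's repr ends in '\n' and the following token is not NL or
NEWLINE, that interpreter shows the behaviour described above, and a
colorizer would have to emit the line break itself.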

George


