[Edu-sig] whitespace and newlines and separators ... oh my

Fri Feb 3 16:02:11 CET 2006

More (semi-IMO) irrelevancy ....

Trying to get to a presentable, announceable alpha release of PyGeo... 
and ending up confronting new complexities of the kind I want to just 
wish away.

Alan Kay hasn't gotten there yet, so I am stuck with the prospect of 
understanding things and thinking things through.

OK - watch my directory separators in HTML.  HTML that works on my local 
Windows box breaks on the server, which does not accept backslashes in 
paths.  Think Unix separators (forward slashes). Got it.
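For anyone hitting the same thing, a minimal sketch of converting a 
Windows-style relative path to forward slashes before it goes into an 
href.  The helper name to_url_path and the sample path are made up for 
illustration; PureWindowsPath is from the standard library pathlib.

```python
from pathlib import PureWindowsPath


def to_url_path(windows_path):
    """Return the path with forward slashes, suitable for an HTML link.

    Hypothetical helper, not part of PyGeo or pudge.
    """
    # PureWindowsPath understands both "\\" and "/" as separators and
    # as_posix() always writes the result with "/".
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    print(to_url_path(r"docs\images\logo.png"))
```

Paths already written with forward slashes pass through unchanged, so 
it should be safe to run over every link.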

Then there is of course the cross platform newline issue - CRLF, CR, LF - 
which gets me doing search and replace on the raw bytes in a hex editor 
(geeky) and screwing up working code. 

What's the standard way to deal with this issue in a cross platform 
distribution?  Still don't know.
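Not claiming this is the standard way, but one sketch of the approach 
that avoids hand editing hex: read the file as bytes and rewrite every 
line ending as LF.  The function name normalize_newlines is my own 
invention here.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Rewrite CRLF and lone CR line endings as LF.

    Hypothetical helper for illustration only.
    """
    # Order matters: collapse CRLF first, otherwise the CR pass would
    # turn each CRLF into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    mixed = b"line one\r\nline two\rline three\n"
    print(normalize_newlines(mixed))
```

Working on bytes (open the file with mode "rb" / "wb") keeps Python 
from translating newlines behind your back while you are trying to 
fix them.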

But then it gets hairy, and Python specific:

Trying to use the pre-alpha pudge document generator for the portion of 
PyGeo I consider to be the underlying "framework".

http://pudge.lesscode.org/

Choosing pudge because it supports embedded reStructuredText, and it 
supports "code as text" - which I see as part of PyGeo at a basic level 
- by automating the linking of documentation to colorized, HTML versions 
of the actual code.  Very cool.

There seems to be a group of heavyweights behind it.

And the code base is small, so I can follow it - when it breaks on 
confronting a Boost-wrapped cvisual function I can find the workaround.

First bigger problem is that pudge chokes on what I think is valid 
reStructuredText - tables that pass reStructuredText scrutiny in 
stand-alone files are reported as malformed when embedded in 
triple-quoted docstrings.  I am assuming this is some kind of whitespace 
parsing issue, but haven't dug into the code far enough to verify.  I am 
hoping it is something I can solve and feed back into the pudge project. 
Remains to be seen.
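If the whitespace guess is right, one candidate cause is the 
indentation a docstring carries when it sits inside an indented method.  
A minimal sketch of stripping it with inspect.cleandoc from the standard 
library before the text reaches a reStructuredText parser.  The sample 
text is made up, and I have not checked whether pudge already does 
something like this.

```python
import inspect

# Mimics the raw text of a docstring written inside an indented method:
# every line after the first carries four leading spaces.
raw_doc = (
    "Summary line.\n"
    "\n"
    "    =====  =====\n"
    "    A      B\n"
    "    =====  =====\n"
    "    1      2\n"
    "    =====  =====\n"
    "    "
)

# cleandoc removes the indentation common to all lines after the first
# and drops leading and trailing blank lines.
clean_doc = inspect.cleandoc(raw_doc)

if __name__ == "__main__":
    print(clean_doc)
```

After the call the table borders start in column zero, which is how the 
same table looks in a stand-alone file.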

More surprising was the HTML colorizing problem.  The colorizing code 
relies on tokenize.py from the standard library - which keeps choking on 
code that Python compiles and runs fine.  So I go to tabnanny.py, 
which seems to be there exactly to diagnose these kinds of issues.  
But one of the symptoms of the problem is that tabnanny (i.e. tokenize) 
is parsing the file in such a way that it is reporting back line numbers 
that don't correspond to the code when viewed in a text editor.

So it's hard to pinpoint the problem.
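One thing worth ruling out, and this is only a guess on my part: if a 
file mixes line-ending styles, an editor and a tokenizer may not agree 
on where lines break, which would throw the reported line numbers off.  
A small sketch that counts each style in a file's bytes; the name 
count_line_endings is hypothetical.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone CR, and lone LF line endings in raw file bytes."""
    return {
        "CRLF": data.count(b"\r\n"),
        # A CR that is not followed by LF (old Mac style).
        "CR": len(re.findall(rb"\r(?!\n)", data)),
        # An LF that is not preceded by CR (Unix style).
        "LF": len(re.findall(rb"(?<!\r)\n", data)),
    }


if __name__ == "__main__":
    sample = b"a\r\nb\rc\nd"
    print(count_line_endings(sample))
```

More than one nonzero count means the file is mixed and is a candidate 
for normalizing before pointing tabnanny at it again.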

Turns out (I think) - this took me a while - that tokenize seems to be 
trying to parse things inside triple-quoted strings, and since there is 
a lot of code intended to output to POV-Ray SDL and formatted for that 
purpose - it is choking on whitespace issues (that IMO shouldn't be 
issues) in that code.
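A way to test that theory in isolation: feed tokenize a small source 
string holding a triple-quoted block with deliberately ragged tabs and 
spaces, and see what tokens come back.  This is a sketch against a 
current Python 3 tokenize (I have not verified how older releases 
behave), and the SDL-like text is made up.

```python
import io
import tokenize

# A triple-quoted string whose contents mix tabs and spaces, followed
# by one ordinary statement.
source = (
    'sdl = """\n'
    "\tsphere {\n"
    "   <0, 0, 0>, 1\n"
    "\t}\n"
    '"""\n'
    "x = 1\n"
)

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
strings = [t for t in tokens if t.type == tokenize.STRING]
indents = [t for t in tokens if t.type == tokenize.INDENT]

if __name__ == "__main__":
    for tok in tokens:
        print(tokenize.tok_name[tok.type], tok.start, tok.end)
```

If the whole block comes back as a single STRING token spanning several 
lines, with no INDENT tokens generated from its contents, then the 
tokenizer is not looking inside the quotes and the choke is more likely 
somewhere outside them, for instance an unbalanced quote or odd line 
endings earlier in the file.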

Is it fair to think that all rules should be off between """ ...and... 
""" - and, if I am right that this is where the choke is, that I 
should file a bug report?

OTOH, seems unlikely that this has not been confronted before.

Any clues as to what I may be missing are appreciated.

Art