Parsing Indented Text (like parsing Python)

Mike Schinkel mikeschinkel at gmail.com
Sun Mar 11 05:34:03 EDT 2007


Hi,

I'm relatively new to Python but have lots of prior programming experience
as a developer, instructor, and author (ASP/VBScript/SQL Server and
Clipper).

I'm trying to write an app that parses a text file containing an outline
using essentially the same indentation rules as Python source code, i.e.
the first level has no indent, the second level has one indent, the third
level has two indents, and so on. However, I can't know for sure whether the
indentation uses tabs, spaces, or a mix of tabs and spaces. What's more, if
the person saved the file with spaces, I can't know what tab width their
editor was using, i.e. one tab = N spaces where N could be 2, 3, 4, 5, 6, 7,
8, ...
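One way to approach the unknown "N spaces per level" is a heuristic: expand leading tabs to a fixed tab size, then take the greatest common divisor of all non-zero indentation widths as the indent unit. This is a minimal sketch, not a known standard; the function name `guess_indent_unit` and the default `tabsize=8` are assumptions for illustration.

```python
from functools import reduce
from math import gcd


def guess_indent_unit(lines, tabsize=8):
    """Heuristically guess how many columns make up one indent level.

    Assumption: leading tabs are expanded to multiples of `tabsize`
    (8 is a guess, matching Python's own rule). The result is the GCD of
    all non-zero indentation widths; 0 means no line is indented.
    """
    widths = []
    for line in lines:
        if not line.strip():
            continue  # blank or whitespace-only lines carry no structure
        stripped = line.lstrip(" \t")
        leading = line[: len(line) - len(stripped)]
        width = len(leading.expandtabs(tabsize))
        if width:
            widths.append(width)
    # gcd(0, w) == w, so 0 is a safe starting value for the fold
    return reduce(gcd, widths, 0)
```

For example, lines indented by 3 and 6 spaces yield a unit of 3. The heuristic is fragile: a single line indented by an odd amount (say, a continuation aligned by hand) drags the GCD down to 1.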

Clearly the source code for Python has to figure this out, but I wonder if
anyone knows how to do this in Python. Frankly, I'm stumped on how to find an
elegant algorithm that doesn't require multipass parsing and lots of code.
Any help would be appreciated.
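For reference, the Python Language Reference describes indentation in terms of a stack of indentation widths, with tabs expanded to the next multiple of 8. A single-pass outline parser can mirror that idea. The sketch below is an illustration modeled on that description, not CPython's actual tokenizer. The function name `outline_depths`, the `tabsize` default, and raising `ValueError` on a bad dedent are assumptions.

```python
def outline_depths(lines, tabsize=8):
    """Assign a nesting depth to each non-blank line in one pass.

    Keeps a stack of indentation widths (modeled on Python's rule):
    deeper than the top of the stack -> push (one level deeper);
    shallower -> pop until a matching width is found. A width that
    matches no enclosing level is rejected, much as Python raises
    IndentationError for an inconsistent dedent.
    """
    stack = [0]  # widths of the currently open levels; 0 is the top level
    result = []
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            continue  # ignore blank lines
        stripped = line.lstrip(" \t")
        leading = line[: len(line) - len(stripped)]
        width = len(leading.expandtabs(tabsize))  # assumed tab size
        if width > stack[-1]:
            stack.append(width)
        else:
            while width < stack[-1]:
                stack.pop()
            if width != stack[-1]:
                raise ValueError(
                    f"line {lineno}: dedent does not match any outer level")
        result.append((len(stack) - 1, stripped.rstrip("\r\n")))
    return result
```

Because each level is defined only relative to its parent, the number of spaces per indent never has to be known up front. It only has to be consistent within a block. The assumed tab size matters only for files that mix tabs and spaces.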

-- 
-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org
http://atlanta-web.org - http://t.oolicio.us
