Parsing Indented Text (like parsing Python)

Mike Schinkel mikeschinkel at gmail.com
Sun Mar 11 09:09:57 EDT 2007


Gabriel Genellina wrote:
> Start with IC = Previous IC = 0, and a stack with a single 0 element.
> For each line in the file:
>    compute the indentation column IC (that is, count the number of
>    leading whitespace characters; perhaps replacing tabs as 8 spaces)
>    compare IC with the Previous IC:
>        same: continue with next line
>        IC > previous ("indent"): push IC onto indent stack
>        IC < previous ("dedent"):
>            discard top of stack
>            look at the new top of stack (but don't discard it);
>            if not the same, indentation error.
>    Previous IC = IC
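
For reference, the quoted procedure could be sketched in Python roughly as
below. This is only my reading of it, not Gabriel's code: the name
indent_events, the tabsize parameter, and the choice to skip blank lines are
assumptions. It also pops repeatedly on a dedent so that a line can close
several levels at once, which goes slightly beyond the single pop described
in the quote. Note that str.expandtabs advances to the next tab stop rather
than literally substituting 8 spaces.

```python
def indent_events(lines, tabsize=8):
    """Return a list of (event, line_number) pairs for the given lines.

    event is one of "same", "indent", or "dedent". Raises ValueError when a
    dedent does not return to a column already on the indent stack.
    """
    stack = [0]          # indent stack, starts with a single 0 element
    events = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue     # assumption: blank lines do not affect indentation
        expanded = raw.expandtabs(tabsize)
        ic = len(expanded) - len(expanded.lstrip(" "))  # indentation column
        if ic > stack[-1]:
            stack.append(ic)
            events.append(("indent", lineno))
        elif ic < stack[-1]:
            # Pop once per closed level (generalises the quoted single pop).
            while ic < stack[-1]:
                stack.pop()
                events.append(("dedent", lineno))
            if ic != stack[-1]:
                raise ValueError(f"line {lineno}: inconsistent dedent")
        else:
            events.append(("same", lineno))
    return events
```

Calling indent_events(["a", "    b", "        c", "d"]) reports one indent
each for lines 2 and 3 and two dedents for line 4. The tabsize argument is
exactly the value I don't know how to pick.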

I went away and reviewed this, but it doesn't seem to tackle the difficult
part, which is what made me ask the question in the first place.

The problem is: how do I figure out how many spaces a tab represents? One
user could have their editor configured so that a tab is 3 columns wide and
intermingle tabs and spaces. Another user might have their editor configured
so that a tab equals 8 spaces and also intermingle tabs and spaces. When a
human looks at the document the setting is obvious, but how can I make it
obvious to my program?
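
To make the ambiguity concrete, here is a small sketch (the helper name
indent_column and the sample lines are made up for illustration). It
measures the same three lines under two assumed tab widths, using
str.expandtabs to move each tab to the next tab stop.

```python
def indent_column(line, tabsize):
    """Return the column of the first non-blank character in line."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(" "))


# Hypothetical input: one line indented with a tab, the next with 3 spaces.
sample = [
    "if x:",
    "\tfoo()",
    "   bar()",
]

for tabsize in (3, 8):
    columns = [indent_column(line, tabsize) for line in sample]
    print(f"tab width {tabsize}: {columns}")
# tab width 3: [0, 3, 3]
# tab width 8: [0, 8, 3]
```

With a tab width of 3 the two indented lines are siblings in the same block.
With a tab width of 8 the last line dedents to column 3, which matches no
enclosing level, so the same bytes are either valid or an indentation error
depending on a setting the file never states.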

I could force the user to specify the tab width at the top of the file, but
I'd rather not. And since Python doesn't require that either, I know it is
possible to write a parser to do this. I just don't know how.

-- 
-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org
http://atlanta-web.org - http://t.oolicio.us



