Parsing Indented Text (like parsing Python)

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sun Mar 11 06:44:12 EDT 2007


On Sun, 11 Mar 2007 06:34:03 -0300, Mike Schinkel <mikeschinkel at gmail.com>
wrote:

> I'm trying to write an app that parses a text file containing an outline
> using essentially the same indentation rules as Python source code, i.e.
> the first level has no indent, the second level has one indent, third

You can get some ideas from tabnanny.py or reindent.py, but they may be
too specific to parsing Python code.

I hope this informal description may help:

Start with IC = Previous IC = 0, and a stack with a single 0 element
For each line in the file:
   compute the indentation column IC (that is, count the number of leading
   whitespace characters, perhaps replacing each tab with 8 spaces)
   compare IC with the Previous IC:
      same: continue with next line
      IC > previous ("indent"): push IC onto the indent stack
      IC < previous ("dedent"):
         discard entries from the top of the stack while they are greater
         than IC (a single line may close several levels at once)
         look at the new top of the stack (but don't discard it); if it is
         not the same as IC, indentation error.
   Previous IC = IC
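
The steps above could be sketched in Python roughly as follows. This is
only an illustration, not tested against real outline files: the function
name check_indentation, the tabsize parameter, the choice to skip blank
lines and the use of ValueError are all assumptions of mine. On a dedent
it keeps popping while the top of the stack is larger than IC, so a single
line can close several levels at once.

```python
def check_indentation(lines, tabsize=8):
    """Walk the lines, maintain the indent stack, return it at the end.

    Raises ValueError when a dedent does not match any enclosing level.
    """
    stack = [0]          # indent stack, starts with a single 0 element
    previous_ic = 0
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            continue     # assumption: blank lines do not affect indentation
        # expandtabs advances to the next multiple of tabsize, which for
        # purely leading tabs is the same as 8 spaces per tab.
        expanded = line.expandtabs(tabsize)
        ic = len(expanded) - len(expanded.lstrip())
        if ic > previous_ic:                 # "indent"
            stack.append(ic)
        elif ic < previous_ic:               # "dedent"
            while stack[-1] > ic:            # discard deeper levels
                stack.pop()
            if stack[-1] != ic:              # must land on a known level
                raise ValueError("indentation error on line %d" % lineno)
        previous_ic = ic
    return stack
```

For example, check_indentation(open("outline.txt")) would either return
the final stack or complain about the first inconsistent dedent (the file
name is just a placeholder).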
	
Note: You can rewrite the above without using Previous IC, only the  
stack... left to the reader :)
Note 2: At each stage, the "indentation level" is the current stack size.
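
One possible way to do the exercise from the first note, using only the
stack: compare IC against the top of the stack instead of a separate
Previous IC. The sketch below also reports the level from Note 2 (the
stack size) for every line. The generator name outline_levels, the
tabsize parameter, skipping blank lines and raising ValueError are my own
assumptions, so treat it as a starting point rather than a finished parser.

```python
def outline_levels(lines, tabsize=8):
    """Yield (level, text) pairs; level is the current indent stack size."""
    stack = [0]                              # top of stack = current indent
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            continue                         # assumption: ignore blank lines
        expanded = line.expandtabs(tabsize)  # tabs go to multiples of tabsize
        ic = len(expanded) - len(expanded.lstrip())
        if ic > stack[-1]:                   # "indent": open a new level
            stack.append(ic)
        else:
            while stack[-1] > ic:            # "dedent": close deeper levels
                stack.pop()
            if stack[-1] != ic:              # no enclosing level matches
                raise ValueError("indentation error on line %d" % lineno)
        yield len(stack), expanded.strip()
```

With this numbering the unindented lines come out as level 1, lines
indented once as level 2, and so on.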

-- 
Gabriel Genellina



