Parsing Indented Text (like parsing Python)

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sun Mar 11 09:37:06 EDT 2007


On Sun, 11 Mar 2007 10:09:57 -0300, Mike Schinkel <mikeschinkel at gmail.com>
wrote:

> The problem is, how do I figure out how many spaces represent a tab?

You can't, unless you have more context.

> In one case, someone could have their editor configured to allow tabs to
> use 3 spaces and the user could intermingle tabs and spaces. In other
> cases, a user might have their editor configured to have a tab equal 8
> spaces yet also intermingle tabs and spaces. When a human looks at the
> document it is obvious the setting but how can I make it obvious to my
> program?

"it is obvious the setting?"
How do you infer that? From other properties of the document, such as its
semantics? From the content alone, just the number of tabs and spaces, you
can't infer anything.
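
To make that concrete, here is a minimal Python sketch (the sample lines and
the `indent_col` helper are hypothetical, written only for illustration). The
same two lines start at the same column under one tab width and at different
columns under another, and nothing in the text itself says which reading the
author meant.

```python
# Hypothetical sample: one line indented with a tab, one with four spaces.
lines = ["\tx = 1", "    y = 2"]


def indent_col(line: str, tabsize: int) -> int:
    """Column of the first non-blank character once tabs are expanded."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(" "))


for tabsize in (4, 8):
    print(tabsize, [indent_col(line, tabsize) for line in lines])
# prints:
# 4 [4, 4]
# 8 [8, 4]
```

Both results are self-consistent; choosing between them needs information
that is not in the bytes of the file.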

> I could force the user to specify tabwidth at the top of the file, but
> I'd rather not. And since Python doesn't either, I know it is possible to
> write a parser to do this. I just don't know how.

Python simply assumes tab stops every 8 columns.
If your Python source uses ONLY tabs, or ONLY spaces, it doesn't matter.
If you mix tabs and spaces, Python replaces each tab with 1 to 8 spaces,
just enough to reach the next multiple of 8 columns. If you edited the file
with your editor set to 4 columns per tab, Python will get it wrong. That's
why everyone says "never mix tabs and spaces".
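
A minimal sketch of that rule, plus the indentation stack a Python-like
parser uses to turn columns into block levels (push on a deeper column, pop
on a shallower one until it matches an enclosing column). `python_indent_col`
and `indent_levels` are hypothetical helpers written for illustration; the
sketch assumes tab stops every 8 columns and ignores form feeds and the
stricter tab/space consistency checks of later Python versions.

```python
def python_indent_col(prefix: str, tabsize: int = 8) -> int:
    """Column reached by the leading spaces/tabs of `prefix`.

    A tab advances to the next multiple of `tabsize`; it does not simply
    add `tabsize` spaces. Form feeds are ignored in this simplified sketch.
    """
    col = 0
    for ch in prefix:
        if ch == "\t":
            col = (col // tabsize + 1) * tabsize
        elif ch == " ":
            col += 1
        else:
            break  # first non-blank character: indentation ends here
    return col


def indent_levels(lines, tabsize=8):
    """Yield (level, text) for each non-blank line using an indentation stack.

    A deeper column pushes a new level; a shallower one pops levels until it
    matches an enclosing column, otherwise the dedent is inconsistent.
    """
    stack = [0]
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines do not affect indentation
        col = python_indent_col(line, tabsize)
        if col > stack[-1]:
            stack.append(col)
        else:
            while col < stack[-1]:
                stack.pop()
            if col != stack[-1]:
                raise IndentationError(f"inconsistent dedent: {line!r}")
        yield len(stack) - 1, text


# Hypothetical sample: tabs on lines 2-3, eight spaces on the last line.
src = "if a:\n\tif b:\n\t\tc()\n        d()\n"

print(python_indent_col("  \t"))  # 8, not 10
print(list(indent_levels(src.splitlines(), tabsize=8)))
# [(0, 'if a:'), (1, 'if b:'), (2, 'c()'), (1, 'd()')]
print(list(indent_levels(src.splitlines(), tabsize=4)))
# [(0, 'if a:'), (1, 'if b:'), (2, 'c()'), (2, 'd()')]
```

With the 8-column rule `d()` belongs to the `if a:` block; someone whose
editor shows tabs as 4 columns sees it inside `if b:`, which is exactly the
mismatch described above.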

If you somehow know that a certain block indented with tabs has the same
indentation as another block indented with spaces, you could infer the
number of spaces per tab.
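
A minimal Python sketch of that inference. `candidate_tab_widths` is a
hypothetical helper, and it assumes the caller already knows (by some outside
means) that each pair of lines sits at the same indentation level; it cannot
discover that by itself. It tries every tab width up to an arbitrary limit
and keeps those that make every pair line up.

```python
def candidate_tab_widths(pairs, max_width=16):
    """Tab widths (1..max_width) under which every (tab_line, space_line)
    pair starts at the same column.

    Assumes each pair is already known to share an indentation level.
    """
    def col(line, tabsize):
        expanded = line.expandtabs(tabsize)
        return len(expanded) - len(expanded.lstrip(" "))

    return [
        w for w in range(1, max_width + 1)
        if all(col(t, w) == col(s, w) for t, s in pairs)
    ]


# Hypothetical pairs of lines known to sit at the same indentation level.
one_pair = [("  \tfoo()", "    bar()")]
print(candidate_tab_widths(one_pair))  # [2, 4]: still ambiguous

two_pairs = one_pair + [("\t\tbaz()", "        qux()")]
print(candidate_tab_widths(two_pairs))  # [4]
```

A single known pair may leave several candidates; each additional pair can
only narrow the set, so the more known correspondences you have, the better
the guess.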

-- 
Gabriel Genellina



