code indentation

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Mon Jul 23 23:20:52 EDT 2007


On Mon, 23 Jul 2007 16:53:01 -0300, ...:::JA:::...
<vedrandekovic at v-programs.com> wrote:

>> If you are using the tokenize module as suggested some time ago, try to
>> analyze the token sequence you get using { } (or perhaps begin/end pairs
>> in your own language, that are easier to distinguish from a dictionary
>> display) and the sequence you get from the "real" python code. Then write
>> a script to transform one into another:
>>
>> from tokenize import generate_tokens
>> from token import tok_name
>> from cStringIO import StringIO
>>
>> def analyze(source):
>>     g = generate_tokens(StringIO(source).readline)
>>     for toknum, tokval, _, _, _ in g:
>>         print tok_name[toknum], repr(tokval)
>>
>> I think you basically will have to ignore INDENT, DEDENT, and replace
>> NAME+"begin" with INDENT, NAME+"end" with DEDENT.
>
> So......how can I do this?????????????
> I will appreciate any help!!!!!

Try with a simple example. Let's say you want to convert this:

for x in range(10):
begin
print x
end

into this:

for x in range(10):
   print x

Using the analyze() function above, the former block (pseudo-python) gives  
this sequence of tokens:

NAME 'for'
NAME 'x'
NAME 'in'
NAME 'range'
OP '('
NUMBER '10'
OP ')'
OP ':'
NEWLINE '\n'
NAME 'begin'
NEWLINE '\n'
NAME 'print'
NAME 'x'
NEWLINE '\n'
NAME 'end'
ENDMARKER ''

The latter block ("real" python) gives this sequence:

NAME 'for'
NAME 'x'
NAME 'in'
NAME 'range'
OP '('
NUMBER '10'
OP ')'
OP ':'
NEWLINE '\n'
INDENT '  '
NAME 'print'
NAME 'x'
DEDENT ''
ENDMARKER ''
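
For anyone trying this on a current Python 3 interpreter, here is a minimal
sketch of the same analysis. The helper name token_pairs and the variables
pseudo and real are mine, not part of the tokenize module. Since the source is
only tokenized and never executed, the Python 2 "print x" statement is
harmless here. Newer tokenizer versions may emit a few extra NEWLINE tokens
compared with the listings above.

```python
import io
from token import tok_name
from tokenize import generate_tokens


def token_pairs(source):
    """Return (token name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [(tok_name[tok.type], tok.string) for tok in generate_tokens(readline)]


# The two blocks from the example above, as plain strings.
pseudo = "for x in range(10):\nbegin\nprint x\nend\n"
real = "for x in range(10):\n   print x\n"

for label, source in (("pseudo", pseudo), ("real", real)):
    print("---", label)
    for name, text in token_pairs(source):
        print(name, repr(text))
```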

If you feed this token sequence into untokenize, you get back source code
equivalent to the "real" python example above. So, to convert your "pseudo"
python into "real" python, it's enough to convert the first token sequence
into the second; from that, you can reconstruct the "real" python code.
Converting one sequence into the other is a programming exercise that has
nothing to do with the details of the tokenize module, nor is it very
Python-specific: looking at both sequences, you should be able to figure out
how to convert one into the other. (Hint: a few additional newlines are not
important.)

It is even simpler than the example given in the tokenize documentation,
<http://docs.python.org/lib/module-tokenize.html>, which, for example,
transforms 3.1416 into Decimal("3.1416").
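
One possible shape for that conversion, written for Python 3, is sketched
below. It is only an illustration under a few assumptions: the function name
pseudo_to_real and the four-space INDENT_UNIT are my own choices, "begin" and
"end" are treated as reserved words everywhere, and no attempt is made to
check that the pairs are balanced. It drops any INDENT/DEDENT tokens that come
from the pseudo source, replaces NAME 'begin' with INDENT and NAME 'end' with
DEDENT, and leaves the extra NEWLINE tokens in place, since blank lines do no
harm.

```python
import io
from tokenize import DEDENT, INDENT, NAME, generate_tokens, untokenize

INDENT_UNIT = "    "  # assumption: four spaces per nesting level


def pseudo_to_real(source):
    """Translate begin/end blocks into indentation-based Python source."""
    result = []
    depth = 0
    for tok in generate_tokens(io.StringIO(source).readline):
        if tok.type in (INDENT, DEDENT):
            # Indentation in the pseudo source carries no meaning; ignore it.
            continue
        if tok.type == NAME and tok.string == "begin":
            depth += 1
            # untokenize expects the full leading whitespace of the block.
            result.append((INDENT, INDENT_UNIT * depth))
        elif tok.type == NAME and tok.string == "end":
            depth -= 1
            result.append((DEDENT, ""))
        else:
            result.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize chooses the spacing itself.
    return untokenize(result)


print(pseudo_to_real("for x in range(10):\nbegin\nprint x\nend\n"))
```

The spacing of the regenerated source is not pretty (untokenize adds a blank
after every name and number), but it should tokenize to the same sequence as
the hand-written version.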

Once you get this simple case working, see what happens with this:

for x in range(10):
   begin
     print x
   end

and this:

for x in range(10): begin
   print x
end

and later this:

for x in range(10):
   begin
     print x
end

You are now using explicit begin/end pairs to group statements, so
indentation is no longer significant. You may want to preprocess the
pseudo-python source, stripping any leading blanks, before using tokenize;
otherwise you'll get indentation errors (which are bogus in your
pseudo-python dialect).
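
A rough sketch of such a preprocessing step follows; the function name
strip_leading_blanks is just an illustration. Note that it works line by line
without looking at the content, so it would also alter the text inside
multi-line (triple-quoted) strings. Treat it as a starting point only.

```python
def strip_leading_blanks(source):
    """Remove leading spaces and tabs from every line of the source."""
    lines = source.splitlines(keepends=True)
    return "".join(line.lstrip(" \t") for line in lines)


messy = "for x in range(10):\n   begin\n     print x\nend\n"
print(strip_leading_blanks(messy))
```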

Since this will be your own Python dialect, don't expect someone else to do
the work for you; you'll have to do it yourself. But it's not too difficult
if you do things in small steps. If you get stuck at any stage and have
specific questions, feel free to ask.

-- 
Gabriel Genellina



