Removing comments... tokenize error

Tue Apr 12 12:57:53 EDT 2005

While analysing a very large application (PySol), made up of almost
100 source files, I needed to remove the comments.

Removing comments that occupy a whole line is straightforward.

For comments embedded in a line of code, I used the tokenize module instead.
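
As an illustration of tokenize-based comment removal, here is a minimal sketch. It assumes Python 3 (not the Python 2.3 used in this post), and the function name strip_comments is made up for this example. It cuts each comment out using the token's start position rather than rewriting whole lines, so text inside triple-quoted strings is left alone.

```python
import io
import tokenize


def strip_comments(source):
    """Return source with every comment removed (hypothetical helper)."""
    lines = source.splitlines(keepends=True)
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start              # row is 1-based, col is 0-based
        line = lines[row - 1]
        # Remember the line ending so it can be put back after the cut.
        newline = line[len(line.rstrip("\r\n")):]
        kept = line[:col].rstrip()        # code that precedes the comment
        # A line holding only a comment is dropped entirely.
        lines[row - 1] = kept + newline if kept else ""
    return "".join(lines)
```

A "#" inside a string literal is part of a STRING token, not a COMMENT token, so it survives.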

To my surprise, the output differs from the input
(the last element of each token tuple should exactly replicate the input line).
The error appears wherever there is a triple-quoted string.
I don't know whether this has already been fixed (I use Python 2.3)
or whether it is a mistake on my part...
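
To see what tokenize reports for a string that spans several lines, a small probe like the following can help. It is only a sketch, assuming Python 3; SAMPLE and string_tokens are names invented here. The token's start and end rows differ, and its text holds every physical line of the string, while the contents of the line element may depend on the Python version, so it is printed rather than relied upon.

```python
import io
import tokenize

# Hypothetical input: one call with a four-line triple-quoted string.
SAMPLE = 'img = makeImage(data="""\nAAAA\nBBBB\nCCCC""")\n'


def string_tokens(source):
    """Yield every STRING token found in source."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            yield tok


for tok in string_tokens(SAMPLE):
    print("start:", tok.start, "end:", tok.end)
    print("string:", repr(tok.string))
    print("line:  ", repr(tok.line))
```

Since all tokens inside the string share a single start row, a loop that writes one line element per new start row emits the string only once, whatever that element happens to contain.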

Here is the script I use to reproduce the strange behaviour:

import tokenize

Input = "pippo1"
Output = "pippo2"

f = open(Input)
fOut = open(Output, "w")

nLastLine = 0
for i in tokenize.generate_tokens(f.readline):
    # i[2] is the token's (startingRow, startingCol); i[4] is its source line
    if nLastLine != i[2][0]:
        # first token on a new starting row: write that line once
        nLastLine = i[2][0]
        fOut.write(i[4])

f.close()
fOut.close()

The input file (pippo1) contains this extract:

class SelectDialogTreeData:
    img = None
    def __init__(self):
        self.tree_xview = (0.0, 1.0)
        self.tree_yview = (0.0, 1.0)
        if self.img is None:
            SelectDialogTreeData.img = (makeImage(dither=0, data="""
R0lGODlhEAAOAPIFAAAAAICAgMDAwP//AP///4AAAAAAAAAAACH5BAEAAAUALAAAAAAQAA4AAAOL
WLrcGxA6FoYYYoRZwhCDMAhDFCkBoa6sGgBFQAzCIAzCIAzCEACFAEEwEAwEA8FAMBAEAIUAYSAY
CAaCgWAgGAQAhQBBMBAMBAPBQDAQBACFAGEgGAgGgoFgIBgEAAUBBAIDAgMCAwIDAgMCAQAFAQQD
AgMCAwIDAgMCAwEABSaiogAKAKeoqakFCQA7"""), makeImage(dither=0, data="""
R0lGODlhEAAOAPIFAAAAAICAgMDAwP//AP///4AAAAAAAAAAACH5BAEAAAUALAAAAAAQAA4AAAN3
WLrcHBA6Foi1YZZAxBCDQESREhCDMAiDcFkBUASEMAiDMAiDMAgBAGlIGgQAgZeSEAAIAoAAQTAQ
DAQDwUAwAEAAhQBBMBAMBAPBQBAABACFAGEgGAgGgoFgIAAEAAoBBAMCAwIDAgMCAwEAAApERI4L
jpWWlgkAOw=="""), makeImage(dither=0, data="""
R0lGODdhEAAOAPIAAAAAAAAAgICAgMDAwP///wAAAAAAAAAAACwAAAAAEAAOAAADTii63DowyiiA
GCHrnQUQAxcQAAEQgAAIg+MCwkDMdD0LgDDUQG8LAMGg1gPYBADBgFbs1QQAwYDWBNQEAMHABrAR
BADBwOsVAFzoqlqdAAA7"""), makeImage(dither=0, data="""
R0lGODdhEAAOAPIAAAAAAAAAgICAgMDAwP8AAP///wAAAAAAACwAAAAAEAAOAAADVCi63DowyiiA
GCHrnQUQAxcUQAEUgAAIg+MCwlDMdD0LgDDQBE3UAoBgUCMUCDYBQDCwEWwFAUAwqBEKBJsAIBjQ
CDRCTQAQDKBQAcDFBrjf8Lg7AQA7"""))   

The output written by the script (pippo2) is instead:

class SelectDialogTreeData:
    img = None
    def __init__(self):
        self.tree_xview = (0.0, 1.0)
        self.tree_yview = (0.0, 1.0)
        if self.img is None:
            SelectDialogTreeData.img = (makeImage(dither=0, data="""
AgMCAwIDAgMCAwEABSaiogAKAKeoqakFCQA7"""), makeImage(dither=0, data="""
jpWWlgkAOw=="""), makeImage(dither=0, data="""
BADBwOsVAFzoqlqdAAA7"""), makeImage(dither=0, data="""
CDRCTQAQDKBQAcDFBrjf8Lg7AQA7"""))

... with a big difference! Why?
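
One way to pin down exactly which lines are lost is to compare the two files with difflib. This is a minimal sketch, assuming Python 3; missing_lines is a name made up for this example, and the file names are the ones used in this post.

```python
import difflib


def missing_lines(original, rewritten):
    """Return the lines that the diff marks as removed from original."""
    diff = difflib.ndiff(original.splitlines(), rewritten.splitlines())
    # ndiff prefixes lines found only in the first sequence with "- ".
    return [entry[2:] for entry in diff if entry.startswith("- ")]
```

It could be used as missing_lines(open("pippo1").read(), open("pippo2").read()) to list the lines that went missing from the output.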