remove strings from source

M.E.Farmer mefjr75 at hotmail.com
Sat Feb 26 18:59:44 EST 2005


qwweeeit wrote:
> Thank you for your suggestion, but it is too complicated for me...
> I decided to proceed in steps:
> 1. Take away all commented lines
> 2. Rebuild the multi-lines as single lines
Ummm,
OK, all I can say is: did you try this?
If not, save it as a module, then import it into the interpreter and try
it.
This is a dead simple module that does *exactly* what you asked for :)
Like I said, I have done this before, so I will restate: *I HAVE FAILED AT
THIS BEFORE, MANY TIMES*. Now I have a solution.
It writes to stdout by default but can write to a file-like object if you
give it one.
It already handles continued lines, so there is no need to futz around
with a separate solution for them.
Here is an example:
Py> filein = """
... class Stripper:
...     '''python comment and whitespace stripper
...     '''
...     def __init__(self, raw):
...         ''' Store the source text & set some flags.
...         '''
...         self.raw = raw
...
...     def format(self, out=sys.stdout, comments=0,
...                      spaces=1, untabify=1,eol='unix'):
...         '''Parse and send the colored source.'''
...         # Store line offsets in self.lines
...         self.lines = [0, 0]
...         pos = 0
...         # Strips the first blank line if 1
...         self.lasttoken = 1
...         self.temp = StringIO.StringIO()
...         self.spaces = spaces
...         self.comments = comments
...
...         if untabify:
...            self.raw = self.raw.expandtabs()
...         self.raw = self.raw.rstrip()+' '
...         self.out = out
...     """
Py> replacer = ReplaceParser(filein, out=sys.stdout)
Py> replacer.format()
class Stripper:
    s000001
    def __init__(self, raw):
        s000002
        self.raw = raw

    def format(self, out=sys.stdout, comments=0,
                     spaces=1, untabify=1,eol=s000003):
        s000004
        # Store line offsets in self.lines
        self.lines = [0, 0]
        pos = 0
        # Strips the first blank line if 1
        self.lasttoken = 1
        self.temp = StringIO.StringIO()
        self.spaces = spaces
        self.comments = comments

        if untabify:
           self.raw = self.raw.expandtabs()
        self.raw = self.raw.rstrip()+s000005
        self.out = out
Py> replacer.StringMap
{'s000004': "'''Parse and send the colored source.'''",
 's000005': "' '",
 's000001': "'''python comment and whitespace stripper\n    '''",
 's000002': "''' Store the source text & set some flags.\n        '''",
 's000003': "'unix'"}
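
For anyone who wants to see the idea without the full module, here is a minimal standalone sketch. It is not the ReplaceParser code (which is not shown in this message); it is written for Python 3, and the function name replace_strings and the way the placeholder counter is built are made up for illustration. It assumes the source tokenizes cleanly and does not handle f-strings, which newer Python versions tokenize differently.

```python
import io
import tokenize


def replace_strings(source):
    """Replace each string literal with a placeholder such as s000001.

    Returns (new_source, string_map), where string_map maps every
    placeholder to the original literal text. Everything that is not a
    string literal is copied through unchanged, so the layout survives.
    """
    # Absolute offset of the start of each line; tokenize rows are 1-based.
    offsets = [0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    string_map = {}
    pieces = []
    pos = 0  # offset in source up to which text has been copied
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        start = offsets[tok.start[0] - 1] + tok.start[1]
        end = offsets[tok.end[0] - 1] + tok.end[1]
        name = "s%06d" % (len(string_map) + 1)
        string_map[name] = tok.string
        pieces.append(source[pos:start])  # text before the literal
        pieces.append(name)               # placeholder instead of the literal
        pos = end
    pieces.append(source[pos:])           # whatever follows the last literal
    return "".join(pieces), string_map
```

Because the tokenizer decides what counts as a string, quotes inside comments are left alone and a triple-quoted string spanning several lines becomes a single placeholder.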

You can also strip out comments with a few lines.
It can easily get single comments or doubles.
Add this to your __call__ method:
[snip]
            self.pos = newpos
            return
        # kills comments
        if (toktype == tokenize.COMMENT):
            return
        if (toktype == token.STRING):
            sname = self.StringName.next()
[snip]
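
The same COMMENT check can be tried on its own. The following is only a sketch for Python 3, not part of the module above: strip_comments is a hypothetical name, and the choice to leave a blank line where a comment-only line used to be is mine.

```python
import io
import tokenize


def strip_comments(source):
    """Return source with every comment removed.

    Comment-only lines become blank lines; whitespace between code and a
    trailing comment is dropped along with the comment.
    """
    # Absolute offset of the start of each line; tokenize rows are 1-based.
    offsets = [0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    pieces = []
    pos = 0  # offset in source up to which text has been copied
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        start = offsets[tok.start[0] - 1] + tok.start[1]
        end = offsets[tok.end[0] - 1] + tok.end[1]
        # Copy the text before the comment, minus the spaces leading up to it.
        pieces.append(source[pos:start].rstrip(" \t"))
        pos = end
    pieces.append(source[pos:])
    return "".join(pieces)
```

A '#' inside a string literal is untouched, since the tokenizer reports it as part of a STRING token rather than as a COMMENT.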

If you insist on writing something go ahead.
Let me know what your solution is, I am curious.
M.E.Farmer



