regular expressions, substituting and adding in one step?

Paul McGuire ptmcg at austin.rr._bogus_.com
Mon May 8 12:20:48 EDT 2006


"John Salerno" <johnjsal at NOSPAMgmail.com> wrote in message
news:yEI7g.2056$No6.45379 at news.tufts.edu...
> Ok, this might look familiar. I'd like to use regular expressions to
> change this line:
>
> self.source += '<p>' + paragraph + '</p>\n\n'
>
> to read:
>
> self.source += '<p>%s</p>\n\n' % paragraph
>
John -

You've been asking for re-based responses, so I apologize in advance for
this digression.  Pyparsing is an add-on Python module that can provide a
number of features beyond just text matching and parsing.  Pyparsing allows
you to define callbacks (or "parse actions") that get invoked during the
parsing process, and these callbacks can modify the matched text.
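
As a rough analogy in terms of the re module you have been using: re.sub
also accepts a function as the replacement argument, and that function plays
much the same role as a parse action.  Here is a minimal sketch, in Python 3
syntax.  The pattern and the names CONCAT and to_interpolation are my own
illustrative assumptions, not taken from your code, and the pattern only
handles the single-variable case.

```python
import re

# Assumed pattern:  'literal' + name + 'literal'  (one variable only)
CONCAT = re.compile(r"'([^']*)'\s*\+\s*(\w+)\s*\+\s*'([^']*)'")

def to_interpolation(match):
    # Called once per match; the return value replaces the matched text.
    head, var, tail = match.groups()
    return "'%s%%s%s' %% %s" % (head, tail, var)

line = r"self.source += '<p>' + paragraph + '</p>\n\n'"
print(CONCAT.sub(to_interpolation, line))
```

This should print the target line from your question.  It does not cope with
a variable number of terms, which is where the examples further down come in.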

Since your re approach seems to be on a fairly convergent path, I felt I
needed to come up with more demanding examples to justify a pyparsing
solution.  So I contrived these additional cases:

self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'

The following code processes these expressions.  Admittedly, it is not as
terse as your re-based code samples have been, but it may give you another
data point in your pursuit of a solution.  (The pyparsing home wiki is at
http://pyparsing.wikispaces.com.)

The purpose of the intermediate classes is to convert the individual terms
of the string expression into a list of string terms, either variable
references or quoted literals.  This conversion is done in the term-specific
parse actions created by makeTermParseAction.  Then the overall string
expression gets its own parse action, which processes the list of term
objects, and creates the modified string expression.  Two different string
expression conversion functions are shown, one generating string
interpolation expressions, and one generating "".join() expressions.
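
To show the conversion step on its own, here is a sketch with the parsing
left out: the term list is built by hand and then passed to the two
converters.  It uses Python 3 syntax and no pyparsing.  The function names
interpolate and list_join are hypothetical stand-ins for the interpolateTerms
and createListJoin parse actions in the full listing.

```python
class StringExprTerm:
    def __init__(self, content):
        self.content = content

class VarRef(StringExprTerm):
    pass

class QuotedLit(StringExprTerm):
    pass

def interpolate(terms):
    # Literals are copied into the template; each variable becomes a %s slot.
    template = "".join("%s" if isinstance(t, VarRef) else t.content
                       for t in terms)
    refs = [t.content for t in terms if isinstance(t, VarRef)]
    # More than one variable needs a parenthesized tuple of arguments.
    args = "(" + ",".join(refs) + ")" if len(refs) > 1 else ",".join(refs)
    return "'" + template + "' % " + args

def list_join(terms):
    # Literals get their quotes back; variable names are emitted bare.
    parts = [t.content if isinstance(t, VarRef) else "'" + t.content + "'"
             for t in terms]
    return "''.join([" + ",".join(parts) + "])"

# Hand-built term list for:  '<li>' + someText + '</li>'
terms = [QuotedLit("<li>"), VarRef("someText"), QuotedLit("</li>")]
print(interpolate(terms))
print(list_join(terms))
```

In the pyparsing version, the term-specific parse actions build this list
during parsing, so the converters never have to look at the raw text.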

Hope this helps, or is at least mildly entertaining,
-- Paul


================
from pyparsing import *

testLines = r"""
self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'
"""

# define some classes to use during parsing
class StringExprTerm(object):
    def __init__(self,content):
        self.content = content

class VarRef(StringExprTerm):
    pass

class QuotedLit(StringExprTerm):
    pass

def makeTermParseAction(cls):
    def parseAction(s,l,tokens):
        return cls(tokens[0])
    return parseAction

# define parts we want to recognize as terms in a string expression
varName = Word(alphas+"_", alphanums+"_")
varName.setParseAction( makeTermParseAction( VarRef ) )
quotedString.setParseAction( removeQuotes,
                             makeTermParseAction( QuotedLit ) )
stringTerm = varName | quotedString

# define a string expression in terms of term expressions
PLUS = Suppress("+")
EQUALS = Suppress("=")
stringExpr = EQUALS + stringTerm + ZeroOrMore( PLUS + stringTerm )

# define a parse action, to be invoked every time a string expression
# is found
def interpolateTerms(originalString,locn,tokens):
    out = []
    refs = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( term.content )
        elif isinstance(term,VarRef):
            out.append( "%s" )
            refs.append( term.content )
        else:
            print "hey! this is impossible!"

    # generate string to be interpolated, and interp operator
    outstr = "'" + "".join(out) + "' % "

    # generate interpolation argument tuple
    if len(refs) > 1:
        outstr += "(" + ",".join(refs) + ")"
    else:
        outstr += ",".join(refs)

    # return generated string (don't forget leading = sign)
    return "= " + outstr

stringExpr.setParseAction( interpolateTerms )

print "Original:",
print testLines
print
print "Modified:",
print stringExpr.transformString( testLines )

# define slightly different parse action, to use list join instead of
# string interp
def createListJoin(originalString,locn,tokens):
    out = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( "'" + term.content + "'" )
        elif isinstance(term,VarRef):
            out.append( term.content )
        else:
            print "hey! this is impossible!"

    # generate list literal containing the terms, to be passed to join
    outstr = "[" + ",".join(out) + "]"

    # return generated string (don't forget leading = sign)
    return "= ''.join(" + outstr + ")"

del stringExpr.parseAction[:]
stringExpr.setParseAction( createListJoin )

print
print "Modified (2):",
print stringExpr.transformString( testLines )

================
Prints out:
Original:
self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'

Modified:
self.source += '<p>%s</p>\n\n' % paragraph
listItem1 = '<li>%s</li>' % someText
listItem2 = '<li>%s</li>' % someMoreText
self.source += '<ul>%s\n%s\n</ul>\n\n' % (listItem1,listItem2)

Modified (2):
self.source += ''.join(['<p>',paragraph,'</p>\n\n'])
listItem1 = ''.join(['<li>',someText,'</li>'])
listItem2 = ''.join(['<li>',someMoreText,'</li>'])
self.source += ''.join(['<ul>',listItem1,'\n',listItem2,'\n','</ul>\n\n'])
================
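
As a sanity check, the three forms should evaluate to the same string.  Here
is a small sketch that confirms this for two of the test lines, with sample
values that I made up for the variables:

```python
# Made-up sample values for the variables named in the test lines
paragraph = "Hello"
listItem1 = "<li>one</li>"
listItem2 = "<li>two</li>"

# Single-variable case
original_p = '<p>' + paragraph + '</p>\n\n'
interp_p = '<p>%s</p>\n\n' % paragraph
joined_p = ''.join(['<p>', paragraph, '</p>\n\n'])
assert original_p == interp_p == joined_p

# Multi-variable case
original_ul = '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'
interp_ul = '<ul>%s\n%s\n</ul>\n\n' % (listItem1, listItem2)
joined_ul = ''.join(['<ul>', listItem1, '\n', listItem2, '\n', '</ul>\n\n'])
assert original_ul == interp_ul == joined_ul

print("all three forms agree")
```

Two caveats apply to the interpolation form: a literal that contains a %
character would need it doubled to %%, and a lone variable that happens to
hold a tuple would be unpacked by the % operator.  The parse actions above
handle neither case.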




