Where can be a problem?

Paul McGuire ptmcg at austin.rr.com
Fri Aug 12 11:21:40 EDT 2005


Try this, it's a bit more readable than your regular expression.

from pyparsing import Word,nums,Literal,replaceWith

data1='''<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'''

# a number is a word composed of nums, that is, the digits 0-9
# your search string looks for a number between an '=' and an '&'
EQUALS = Literal("=")
AMPER = Literal("&")
number = Word(nums)
hrefNumber = EQUALS + number + AMPER

# scanString is a generator that yields the matching tokens and the
# start and end locations of each occurrence in the input string; we
# only care about the second token of each match (the number itself)
print([tokens[1] for tokens, s, e in hrefNumber.scanString(data1)])

# just for grins, here is how to convert the numbers to the
# string "###"
number.setParseAction( replaceWith("###") )
print(number.transformString(data1))


Prints:

['15015', '15016', '15017']
<a href="detailaspxmember=###&mode=advert" </a><a
href="detailaspxmember=###&mode=advert" </a><a
href="detailaspxmember=###&mode=advert" </a>
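For comparison, here is a rough equivalent of both steps using only the standard re module. This is an illustrative sketch, not part of the original reply, and the names href_number, matches and masked are made up for the example. re.finditer plays the role of scanString (a lazy iterator over matches, from which we build (number, start, end) tuples), and re.sub plays the role of transformString with a replaceWith("###") parse action.

```python
import re

data1 = '''<a href="detailaspxmember=15015&mode=advert" </a><a
href="detailaspxmember=15016&mode=advert" </a><a
href="detailaspxmember=15017&mode=advert" </a>'''

# Like hrefNumber: an '=' followed by one or more digits followed by '&';
# the parentheses capture just the digits.
href_number = re.compile(r"=(\d+)&")

# finditer yields match objects lazily, roughly as scanString yields
# (tokens, start, end); here we keep the number plus the match span.
matches = [(m.group(1), m.start(), m.end()) for m in href_number.finditer(data1)]
print([num for num, start, end in matches])

# Like number.setParseAction(replaceWith("###")) followed by
# transformString: replace every run of digits with "###".
masked = re.sub(r"\d+", "###", data1)
print(masked)
```

The pyparsing version names each piece of the pattern (EQUALS, number, AMPER), which is what makes it easier to read and extend; the regex packs the same structure into one string.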

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul




More information about the Python-list mailing list