Problem with regex: how to exclude more than one character

tecspring at gmail.com tecspring at gmail.com
Fri Nov 7 02:08:38 EST 2008


On Nov 7, 3:06 pm, tecspr... at gmail.com wrote:
> I never know how to express "exclude an entire word" in a regexp,
> and while using Python I ran into this problem again...
>
> For example, if I want to match "string" in "test a string",
> re.findall(r"[^a]* (\w+)","test a string") will work. But what if
> the text contains "an" instead of "a" ("test an string")? [^an] fails
> because it is a character class: it stops at the first "a" (or "n")
> it meets, not at the whole word "an".
>
> I guess people don't usually filter words this way?
> Here is the real problem I ran into:
> I want to extract both the text inside the "<td>" blocks and the
> title attribute of the "<span>".
> ###################### code #############################
> import re
>
> # Adjacent string literals are joined, so no line break ends up
> # inside the HTML or inside the pattern.
> content = (
>     '<tr align="center" valign="middle" class="CellCss">'
>     '<td valign="middle">LA</td>'
>     '<td valign="middle">11/10/2008</td>'
>     '<td valign="middle">1340/1430</td>'
>     '<td valign="middle">PF1/5</td>'
>     '<td valign="middle"><span title="Understanding the stock market" '
>     'class="MouseCursor">Understand....</span></td>'
>     '<td title="Charisma" valign="middle">Charisma</td>'
>     '<td valign="middle">Booked</td>'
>     '<td valign="middle">'
> )
>
> re.findall(
>     r'<td valign="middle">([^<]+)</td>'
>     r'<td valign="middle">([^<]+)</td>'
>     r'<td valign="middle">([^<]+)</td>'
>     r'<td valign="middle">([^<]+)</td>'
>     r'<td valign="middle"><span title="([^"]*)"',
>     content)
>
> #################### code end ############################
> As you can see above, the result I get is
> "LA, 11/10/2008, 1340/1430, PF1/5, Understanding the stock market".
> There are two "<span>" blocks, but with this regexp I can only get
> the "title" attribute of the first one.
> To reach the second, which should be "Charisma", I need something
> like [^</td>]* to skip over
> "class="MouseCursor">Understand....</span></td>",
> so that I can go on to match the second "<span>" block.
>
> If I didn't describe this clearly, feel free to tell me :)
> Thanks in advance for any reply!

By the way, I've tried both (!</td>) and (?:!</td>), among many other
attempts, and none of them work... so sad...
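
For reference, (!</td>) and (?:!</td>) are ordinary groups that match the
literal text "!</td>". The construct that seems closest to what is wanted
here is the negative lookahead, (?!...). Below is a minimal sketch, not a
definitive solution. It assumes the sample row exactly as posted above (so
the second title sits on the "<td>" itself), with the wrapped lines joined
into one string. The names word_match, cell, row_pattern and rows are made
up for this illustration.

```python
import re

# Excluding a whole word: (?!an) is a negative lookahead, so
# (?:(?!an).)* consumes one character at a time only while the text
# at the current position does not start with "an".
m = re.match(r"((?:(?!an).)*)an (\w+)", "test an string")
word_match = m.groups()

# The sample row from the post, joined into one string (assumption:
# the line breaks in the mail were only caused by wrapping).
content = (
    '<tr align="center" valign="middle" class="CellCss">'
    '<td valign="middle">LA</td>'
    '<td valign="middle">11/10/2008</td>'
    '<td valign="middle">1340/1430</td>'
    '<td valign="middle">PF1/5</td>'
    '<td valign="middle"><span title="Understanding the stock market" '
    'class="MouseCursor">Understand....</span></td>'
    '<td title="Charisma" valign="middle">Charisma</td>'
    '<td valign="middle">Booked</td>'
    '<td valign="middle">'
)

# One plain cell, as in the original pattern (hypothetical helper name).
cell = r'<td valign="middle">([^<]+)</td>'

row_pattern = re.compile(
    cell * 4                                        # the four plain cells
    + r'<td valign="middle"><span title="([^"]*)"'  # first title
    + r'(?:(?!</td>).)*</td>'                       # skip up to and including the first </td>
    + r'<td title="([^"]*)"'                        # second title, on the <td> in this sample
)
rows = row_pattern.findall(content)

print(word_match)
print(rows)
```

The point of (?:(?!</td>).)* is that it does what [^</td>]* cannot: the
character class only excludes the single characters "<", "/", "t", "d" and
">", while the lookahead version stops only where the whole string "</td>"
begins. Note that "." does not match a newline unless re.DOTALL is passed.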


