[Tutor] RE troubles (fwd)

Mon Aug 16 22:14:14 CEST 2004

[Followup to tutor at python.org; Øyvind figured out a good regex that does
nongreedy matching for the problem.]

---------- Forwarded message ----------
Date: Mon, 16 Aug 2004 20:58:42 +0200 (CEST)
From: Øyvind <python at kapitalisten.no>
To: Danny Yoo <dyoo at hkn.eecs.berkeley.edu>
Subject: Re: [Tutor] RE troubles

>> I know that the word is following "target="_top">" and is before
>> "</a><a href=javascript". So the document will contain five instances of:
>>
>> target="_top"> word1 </a><a href=javascript
>> target="_top">sentence 2</a><a href=javascript
>> and so forth....
>>
>> How do I get them out?
>
>
> You can probably get what you want by doing something like this:
>
> ###
>>>> regex = re.compile(r"""\|
> ...                        (.*?)
> ...                        \|""", re.VERBOSE)
>>>>
> ###

Hello and thanks for the help.

  I got the following to work. Hopefully it is a good way of doing it...

rawstr = r"""target="_top">(.*?)\n</a
"""
compile_obj = re.compile(rawstr,  re.IGNORECASE| re.MULTILINE| re.VERBOSE)
matchstr = side.read()
liste = compile_obj.findall(matchstr)
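
For anyone who wants to try this without the original page, here is a small
self-contained sketch of the same idea. The sample text and the href values
are made up for illustration, and it assumes the closing tag may follow on
the same line, so it drops the literal \n and re.VERBOSE from the pattern
above. It also shows what a greedy (.*) would do instead.

```python
import re

# Hypothetical sample, modelled on the fragment quoted earlier in this message.
sample = (
    'x target="_top"> word1 </a><a href=javascript:foo()>\n'
    'y target="_top">sentence 2</a><a href=javascript:bar()>\n'
)

# Non-greedy (.*?) stops at the first "</a" after each target="_top"> marker.
# re.DOTALL lets "." cross newlines in case a phrase is wrapped.
nongreedy = re.compile(r'target="_top">(.*?)</a', re.IGNORECASE | re.DOTALL)

# Greedy (.*) runs on to the LAST "</a", swallowing everything in between.
greedy = re.compile(r'target="_top">(.*)</a', re.IGNORECASE | re.DOTALL)

# strip() removes padding spaces or newlines around each captured phrase.
liste = [m.strip() for m in nongreedy.findall(sample)]
greedy_result = greedy.findall(sample)

print(liste)               # ['word1', 'sentence 2']
print(len(greedy_result))  # 1
```

With the non-greedy group each phrase comes out as its own list item; the
greedy version returns one long string that runs from the first marker to
the last closing tag.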

Have a great day,
Øyvind