Regular expression problem

jim.vickroy jim.vickroy at noaa.gov
Thu May 10 10:47:22 EDT 2001


Hello Martin,

This is what I tried:

>>> import re
>>> parser = re.compile(r"<token>([0-9A-Ea-e]+-[0-9A-Ea-e]+-[0-9A-Ea-e]+-[0-9A-Ea-e]+-[0-9A-Ea-e]+)</token>")

>>> result = parser.search('<token>4CA064D0-5653-45E4-9581-9EE9A9B12A49</token>')
>>> result
<SRE_Match object at 013C4F38>
>>> result.start()
0
>>> result.end()
51
>>>

I do not think you need to "escape" hyphens outside the [] expressions.
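
A quick way to check this is to run the three forms discussed in this
thread side by side: hyphens unescaped between the groups, hyphens
escaped between the groups (v0), and an escaped hyphen inside the
character class (v1). This is only a sketch written for a current
Python 3 interpreter, not the Python 2.0 from the original question.
The variable names are made up for illustration, and the class is
widened to A-F on the assumption that the fields are hexadecimal; the
A-E class used above happens to cover this token, but it would not
match a value containing an F, such as the response_id in the sample
document.

```python
import re

GUID = "4CA064D0-5653-45E4-9581-9EE9A9B12A49"
text = "<token>%s</token>" % GUID

# One run of hex digits; A-F is an assumption (the thread uses A-E).
group = "[0-9A-Fa-f]+"

# Hyphens left unescaped between the five groups.
plain = re.compile("<token>(" + "-".join([group] * 5) + ")</token>")

# Hyphens escaped between the groups, as in v0.
escaped = re.compile("<token>(" + r"\-".join([group] * 5) + ")</token>")

# Escaped hyphen inside the character class, as in v1.
in_class = re.compile(r"<token>([0-9A-Fa-f\-]+)</token>")

matches = [p.search(text).group(1) for p in (plain, escaped, in_class)]
print(matches)
```

If all three entries come back as the GUID, the hyphens themselves are
not what breaks the match, at least on the interpreter you run this on.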

BTW, have you looked at xml.sax?  If all you want to do is parse an XML
document, the SAX interface may be easier, as it does this sort of parsing
for you.
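
Something along these lines might do it with SAX. Treat it as a rough
sketch for a current Python 3 interpreter rather than tested Python 2.0
code: TokenHandler is a hypothetical class name, the sample string is a
cut-down version of the document in the question, and ValueError stands
in for the string exception used in the original function.

```python
import xml.sax


class TokenHandler(xml.sax.ContentHandler):
    """Collects the character data found inside <token> elements."""

    def __init__(self):
        super().__init__()
        self.in_token = False
        self.parts = []

    def startElement(self, name, attrs):
        if name == "token":
            self.in_token = True

    def endElement(self, name):
        if name == "token":
            self.in_token = False

    def characters(self, content):
        # SAX may deliver the text in several chunks, so accumulate them.
        if self.in_token:
            self.parts.append(content)


def parse_token(output_xml):
    handler = TokenHandler()
    xml.sax.parseString(output_xml.encode("utf-8"), handler)
    token = "".join(handler.parts).strip()
    if not token:
        raise ValueError("unable to parse token out of output xml")
    return token


# Cut-down sample, assumed to have the same shape as the real response.
sample = (
    '<response version="1.0" server="test" code="0000" result_msg="success">'
    '<signon_user request_id="123">'
    '<token>4CA064D0-5653-45E4-9581-9EE9A9B12A49</token>'
    '</signon_user></response>'
)
print(parse_token(sample))
```

The handler does not care what characters the token contains, so the
character-class question goes away entirely.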




Martin Bertolino wrote:

> I'm using Python 2.0 to try to parse out some GUIDs from an XML document
> returned from an HTTP server, but I'm having some problems with some regular
> expressions that I believe should match what I'm looking for.
>
> Consider the following input to the parse_token function:
>
> <response version="1.0" server="test" code="0000" result_msg="success">
>     <signon_user request_id="123" response_id="E95FA6E0-3944-4945-93CD-4E37EDB807E3"
>             code="0000" result_msg="success">
>                 <token>4CA064D0-5653-45E4-9581-9EE9A9B12A49</token>
>                 <pin_attributes pin_set_needed="N" grace_logins_left="3"/>
>                 <pin_control pin_set_enabled="N" pin_reset_enabled="N"/>
>                 <last_signon date_time="2001-05-09T15:30:06Z"/>
>                 <identification ssn="222222222" employee_id="222222222"/>
>                 <name title="sir" first="jokey" middle="dorcs" last="smokey" suffix="sr."/>
>                 <division_id>central</division_id>
>     </signon_user>
> </response>
>
> and the parse_token function, which attempts to get the value of the <token>
> node, implemented as follows:
>
> import re
>
> def parse_token(output_xml):
>     # v0: token_re = re.search(r"<token>([0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+)</token>", output_xml)
>     # v1: token_re = re.search(r"<token>([0-9A-Ea-e\-]+)</token>", output_xml)
>     token_re = re.search(r"<token>(.+)</token>", output_xml)
>     if token_re:
>         return token_re.group(1)
>     else:
>         raise "unable to parse token out of output xml"
>
> The re.search that is not commented correctly matches the node so it returns
> 4CA064D0-5653-45E4-9581-9EE9A9B12A49. The other two (v0 and v1) do not. I
> tried those two RE on a perl script, and I do get the expected output.
>
> Am I missing something? It seems as if the compile is getting thrown by the
> '-'s in the RE.
>
> Thanks for the help.
>
> Martin Bertolino



