Regular expression problem

Martin Bertolino moc.xlat at bxm
Thu May 10 12:02:48 EDT 2001


Thanks. After trying it in the interpreter for a while I got it to work; no
escaping was needed.

Martin

"jim.vickroy" <jim.vickroy at noaa.gov> wrote in message
news:3AFAA9FA.97CF9717 at noaa.gov...
> Hello Martin,
>
> This is what I tried:
>
> >>> import re
> >>> parser = re.compile(r"<token>([0-9A-Ea-e]+-[0-9A-Ea-e]+-[0-9A-Ea-e]+-[0-9A-Ea-e]+-[0-9A-Ea-e]+)</token>")
>
> >>> result = parser.search('<token>4CA064D0-5653-45E4-9581-9EE9A9B12A49</token>')
> >>> result
> <SRE_Match object at 013C4F38>
> >>> result.start()
> 0
> >>> result.end()
> 51
> >>>
>
> I do not think you need to "escape" hyphens outside the [] expressions.
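
The point about hyphens can be checked directly. The following is a minimal sketch, assuming a current Python 3 interpreter rather than the Python 2.0 discussed in this thread; the names guid, plain and escaped are illustrative only, and the character class is copied from the thread as written.

```python
import re

guid = "4CA064D0-5653-45E4-9581-9EE9A9B12A49"

# Outside a character class, "-" is an ordinary literal; "\-" means the same.
plain = re.compile(r"[0-9A-Ea-e]+-[0-9A-Ea-e]+")
escaped = re.compile(r"[0-9A-Ea-e]+\-[0-9A-Ea-e]+")
assert plain.match(guid).group(0) == escaped.match(guid).group(0)

# Inside a class, "-" between two characters denotes a range ...
assert re.findall(r"[a-c]", "a-b-c") == ["a", "b", "c"]
# ... while an escaped or trailing "-" is a literal hyphen.
assert re.findall(r"[a\-c]", "a-b-c") == ["a", "-", "-", "c"]
assert re.findall(r"[ac-]", "a-b-c") == ["a", "-", "-", "c"]
```

So escaping the hyphen is harmless outside brackets, and only inside brackets does it change the meaning.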
>
> BTW, have you looked at xml.sax?  If all you want to do is parse an XML
> document, the SAX interface may be easier, as it does this sort of parsing
> for you.
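
For readers who want to try the xml.sax suggestion, here is one possible sketch, assuming Python 3. TokenHandler and parse_token_sax are hypothetical names invented for this illustration, and the sample document is a trimmed version of the one quoted further down.

```python
import xml.sax


class TokenHandler(xml.sax.ContentHandler):
    """Collects the text of the first <token> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_token = False
        self.token = None
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "token" and self.token is None:
            self.in_token = True

    def characters(self, content):
        # SAX may deliver element text in several pieces, so accumulate it.
        if self.in_token:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "token" and self.in_token:
            self.token = "".join(self._chunks)
            self.in_token = False


def parse_token_sax(output_xml):
    """Return the text of the first <token> element, or raise ValueError."""
    handler = TokenHandler()
    xml.sax.parseString(output_xml.encode("utf-8"), handler)
    if handler.token is None:
        raise ValueError("no <token> element found")
    return handler.token


sample = (
    '<response version="1.0" code="0000">'
    '<signon_user request_id="123">'
    '<token>4CA064D0-5653-45E4-9581-9EE9A9B12A49</token>'
    '</signon_user>'
    '</response>'
)
print(parse_token_sax(sample))
```

Unlike the regular expression, this approach does not care how the element is laid out or what characters the token contains, at the cost of a little more code.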
>
>
>
>
> Martin Bertolino wrote:
>
> > I'm using Python 2.0 to try to parse out some GUIDs from an XML document
> > returned from an HTTP server, but I'm having some problems with some
> > regular expressions that I believe should match what I'm looking for.
> >
> > Consider the following input to the parse_token function:
> >
> > <response version="1.0" server="test" code="0000" result_msg="success">
> >     <signon_user request_id="123"
> > response_id="E95FA6E0-3944-4945-93CD-4E37EDB807E3"
> >             code="0000" result_msg="success">
> >                 <token>4CA064D0-5653-45E4-9581-9EE9A9B12A49</token>
> >                 <pin_attributes pin_set_needed="N" grace_logins_left="3"/>
> >                 <pin_control pin_set_enabled="N" pin_reset_enabled="N"/>
> >                 <last_signon date_time="2001-05-09T15:30:06Z"/>
> >                 <identification ssn="222222222" employee_id="222222222"/>
> >                 <name title="sir" first="jokey" middle="dorcs" last="smokey" suffix="sr."/>
> >                 <division_id>central</division_id>
> >     </signon_user>
> > </response>
> >
> > and the parse_token function, which attempts to get the value of the
> > <token> node, implemented as follows:
> >
> > import re
> >
> > def parse_token(output_xml):
> >     # v0: token_re = re.search(r"<token>([0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+)</token>", output_xml)
> >     # v1: token_re = re.search(r"<token>([0-9A-Ea-e\-]+)</token>", output_xml)
> >     token_re = re.search(r"<token>(.+)</token>", output_xml)
> >     if token_re:
> >         return token_re.group(1)
> >     else:
> >         raise ValueError("unable to parse token out of output xml")
> >
> > The re.search that is not commented out correctly matches the node, so it
> > returns 4CA064D0-5653-45E4-9581-9EE9A9B12A49. The other two (v0 and v1) do
> > not. I tried those two REs in a Perl script, and I do get the expected output.
> >
> > Am I missing something? It seems as if the compile is getting thrown by the
> > '-'s in the RE.
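
The three variants can be compared side by side. This is a sketch only, assuming Python 3: it shows how the patterns behave on a current interpreter and is not a diagnosis of what happened under Python 2.0. The dictionary keys and the names xml_text, results and response_id are illustrative. It also checks one thing the thread does not discuss: the class A-E stops short of F, so a GUID containing F (such as the response_id in the sample) would not match, and A-F is probably what a hexadecimal pattern wants.

```python
import re

xml_text = """<response version="1.0" server="test" code="0000" result_msg="success">
    <signon_user request_id="123" response_id="E95FA6E0-3944-4945-93CD-4E37EDB807E3"
            code="0000" result_msg="success">
                <token>4CA064D0-5653-45E4-9581-9EE9A9B12A49</token>
    </signon_user>
</response>"""

# The three patterns from the post, copied unchanged.
patterns = {
    "v0": r"<token>([0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+\-[0-9A-Ea-e]+)</token>",
    "v1": r"<token>([0-9A-Ea-e\-]+)</token>",
    "active": r"<token>(.+)</token>",
}

results = {}
for name, pattern in patterns.items():
    match = re.search(pattern, xml_text)
    results[name] = match.group(1) if match else None
    print(name, results[name])

# Separate observation: "F" is a hex digit but lies outside the class A-E.
response_id = "E95FA6E0-3944-4945-93CD-4E37EDB807E3"
print(re.fullmatch(r"[0-9A-Ea-e\-]+", response_id) is None)
print(re.fullmatch(r"[0-9A-Fa-f\-]+", response_id) is not None)
```

On a current interpreter all three patterns return the same token for this input, which agrees with the follow-up at the top of the thread that no escaping was needed.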
> >
> > Thanks for the help.
> >
> > Martin Bertolino
>




