simple regular expression problem

Jason Drew jasondrew72 at gmail.com
Mon Sep 17 09:50:22 EDT 2007


You're welcome!

Also, of course, parsing XML is a very common task and you might be
interested in using one of the standard modules for that, e.g.
http://docs.python.org/lib/module-xml.parsers.expat.html

Then all the tricky parsing work has been done for you.
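
As a rough sketch of what that could look like with xml.parsers.expat
(Python 3 syntax): the sample from the original question has several
top-level elements, so I wrap it in a made-up <root> element to get a
well-formed document, and extract_profile_ids is only an illustrative
helper name, not part of any library.

```python
import xml.parsers.expat


def extract_profile_ids(xml_text):
    """Return the text of every <Profiel_Id> element in xml_text."""
    ids = []
    chunks = None  # list of text pieces while inside <Profiel_Id>, else None

    def start_element(name, attrs):
        nonlocal chunks
        if name == "Profiel_Id":
            chunks = []

    def char_data(data):
        # expat may deliver an element's text in several pieces
        if chunks is not None:
            chunks.append(data)

    def end_element(name):
        nonlocal chunks
        if name == "Profiel_Id" and chunks is not None:
            ids.append("".join(chunks))
            chunks = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return ids


# Sample data from the original question.
s = ("\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>"
     "\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>")

# Assumption: a hypothetical <root> wrapper, since XML allows only one
# top-level element.
profile_ids = extract_profile_ids("<root>" + s + "</root>")
print(profile_ids)  # ['28996', '28997']
```

Without the wrapper, expat would reject the second top-level
<organisatie> element, so for multi-root fragments like this one some
wrapping step of that kind is probably needed.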

Jason


On Sep 17, 9:31 am, duikboot <dijkstra.ar... at gmail.com> wrote:
> Thank you very much, it works. I guess I didn't read it right.
>
> Arjen
>
> On Sep 17, 3:22 pm, Jason Drew <jasondre... at gmail.com> wrote:
>
> > You just need a one-character addition to your regex:
>
> > regex = re.compile(r'<organisatie.*?</organisatie>', re.S)
>
> > Note, there is now a question mark (?) after the .*
>
> > By default, quantifiers such as * are "greedy" and will grab as much
> > text as possible when making a match. So your original expression was
> > grabbing everything between the first opening tag and the last closing
> > tag. The question mark after .* makes it non-greedy, so each match
> > stops at the first closing tag, and you get the behaviour you need.
>
> > This is covered in the documentation for the re module: http://docs.python.org/lib/module-re.html
>
> > Jason
>
> > On Sep 17, 9:00 am, duikboot <dijkstra.ar... at gmail.com> wrote:
>
> > > Hello,
>
> > > I am trying to extract a list of strings from a text. I have been
> > > looking at it for hours now, and googling didn't help either.
> > > Could you please help me?
>
> > > >>> import re
> > > >>> s = """ \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
> > > >>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
> > > >>> L = regex.findall(s)
> > > >>> print(L)
>
> > > ['<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>']
>
> > > I expected:
> > > ['<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>',
> > >  '<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>']
>
> > > I must be missing something very obvious.
>
> > > Greetings Arjen




