simple regular expression problem

duikboot dijkstra.arjen at gmail.com
Mon Sep 17 09:31:56 EDT 2007


Thank you very much, it works. I guess I didn't read it right.

Arjen

On Sep 17, 3:22 pm, Jason Drew <jasondre... at gmail.com> wrote:
> You just need a one-character addition to your regex:
>
> regex = re.compile(r'<organisatie.*?</organisatie>', re.S)
>
> Note, there is now a question mark (?) after the .*
>
> By default, regular expressions are "greedy" and will grab as much
> text as possible when making a match. So your original expression was
> grabbing everything between the first opening tag and the last closing
> tag. The question mark says, don't be greedy, and you get the
> behaviour you need.
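
The difference between the greedy and non-greedy patterns described above can be checked with a minimal sketch like the one below. It reuses the sample string and the two patterns from this thread; the variable names (`greedy`, `lazy`, `greedy_matches`, `lazy_matches`) are illustrative assumptions, and `print()` is written in the function form that newer Python versions expect.

```python
import re

# Sample text from the original question: two <organisatie> blocks.
s = (" \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>"
     "\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>")

# Greedy: .* runs on to the LAST closing tag, so both blocks end up in one match.
greedy = re.compile(r'<organisatie.*</organisatie>', re.S)

# Non-greedy: .*? stops at the FIRST closing tag it can reach.
lazy = re.compile(r'<organisatie.*?</organisatie>', re.S)

greedy_matches = greedy.findall(s)
lazy_matches = lazy.findall(s)

print(len(greedy_matches))  # 1
print(len(lazy_matches))    # 2
```

With the non-greedy pattern, each element of `lazy_matches` is a single `<organisatie>...</organisatie>` block.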
>
> This is covered in the documentation for the re module: http://docs.python.org/lib/module-re.html
>
> Jason
>
> On Sep 17, 9:00 am, duikboot <dijkstra.ar... at gmail.com> wrote:
>
> > Hello,
>
> > I am trying to extract a list of strings from a text. I have been
> > looking at it for hours now, and googling didn't help either.
> > Could you please help me?
>
> > >>> s = """ \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
> > >>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
> > >>> L = regex.findall(s)
> > >>> print L
>
> > ['<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>']
>
> > I expected:
> > ['<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>',
> >  '<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>']
>
> > I must be missing something very obvious.
>
> > Greetings Arjen



