simple regular expression problem
Bruno Desthuilliers
bruno.42.desthuilliers at wtf.websiteburo.oops.com
Mon Sep 17 09:50:32 EDT 2007
duikboot wrote:
> Hello,
>
> I am trying to extract a list of strings from a text. I have been looking
> at it for hours now, and googling didn't help either.
> Could you please help me?
>
>>>> import re
>>>> s = """ \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
>>>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
>>>> L = regex.findall(s)
>>>> print(L)
> ['<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>']
>
> I expected:
> ['<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>',
> '<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>']
>
> I must be missing something very obvious.
Regarding the regexp, Jason already gave you the answer. Another point is
that, when dealing with XML, it's sometimes better to use an XML parser.
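For the regexp itself, the usual remedy (not necessarily word for word what Jason posted) is to make the quantifier non-greedy: `.*` grabs as much as it can, so it runs to the *last* `</organisatie>`, while `.*?` stops at the *first* one after each opening tag. A minimal Python 3 sketch reusing the sample string from the question; the names `greedy`, `lazy` and `ids` are just for illustration:

```python
import re

s = """ \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""

# Greedy: .* runs to the LAST </organisatie>, so the whole text is one match.
greedy = re.compile(r'<organisatie.*</organisatie>', re.S)

# Non-greedy: .*? stops at the FIRST </organisatie> after each opening tag.
lazy = re.compile(r'<organisatie.*?</organisatie>', re.S)

# With a capturing group, findall() returns only the captured text.
ids = re.compile(r'<Profiel_Id>(.*?)</Profiel_Id>')

print(len(greedy.findall(s)))  # 1
print(lazy.findall(s))         # two separate <organisatie> blocks
print(ids.findall(s))          # ['28996', '28997']
```

This is still pattern matching, not parsing: nested `organisatie` elements, for instance, would make the non-greedy match stop too early.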
A quick-and-dirty version using ElementTree:
>>> from xml.etree import ElementTree as ET
>>> s = "<root>" + s + "</root>"
>>> tree = ET.fromstring(s)
>>> tree
<Element root at b795b2ac>
>>> tree.findall("organisatie/Profiel_Id")
[<Element Profiel_Id at b795b32c>, <Element Profiel_Id at b795b3ec>]
>>> _[0].text
'28996'
>>> [it.text for it in tree.findall("organisatie/Profiel_Id")]
['28996', '28997']
>>>
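The same ElementTree approach, packaged as a small Python 3 function instead of an interactive session. This is only a sketch; `profile_ids` is a hypothetical helper name. The fragment has several top-level elements and is therefore not a well-formed XML document on its own, which is why it is wrapped in a dummy `<root>` element first, as above:

```python
from xml.etree import ElementTree as ET

s = """ \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""


def profile_ids(fragment):
    """Return the Profiel_Id text of every top-level organisatie element.

    The fragment is wrapped in a dummy <root> element so that it parses
    as a single XML document.
    """
    tree = ET.fromstring("<root>" + fragment + "</root>")
    # findtext() returns None if an organisatie has no Profiel_Id child.
    return [org.findtext("Profiel_Id") for org in tree.findall("organisatie")]


print(profile_ids(s))  # ['28996', '28997']
```

Iterating over the `organisatie` elements, rather than going straight to `organisatie/Profiel_Id`, keeps one result per organisation, so one with a missing id shows up as `None` instead of silently disappearing.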
HTH