regex help

Gabriel Rossetti gabriel.rossetti at arimaz.com
Wed Dec 16 12:16:45 EST 2009


Hello everyone,

I'm going nuts over a regex; could someone please show me what I'm
doing wrong?

I have an XMPP message:

<message xmlns='jabber:client' to='node@host.com'>
    <mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
        <parameters>
            <param1>123</param1>
            <param2>456</param2>
        </parameters>
        <payload type='plain'>...</payload>
    </mynode>
    <x xmlns='jabber:x:expire' seconds='15'/>
</message>

The <parameters> node may be absent or empty (<parameters/>), and the <x>
node may be absent. I'd like to grab everything except the <payload> node
and build a new message from it using a regex. With the XMPP message
above, I'd get this:

<message xmlns='jabber:client' to='node@host.com'>
    <mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
        <parameters>
            <param1>123</param1>
            <param2>456</param2>
        </parameters>
    </mynode>
    <x xmlns='jabber:x:expire' seconds='15'/>
</message>
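If the only goal is to drop the <payload> element, a single regex substitution may be simpler than capturing every other piece and reassembling them. A minimal sketch (Python 3), assuming <payload> is never nested and always has an explicit closing tag; `PAYLOAD_RE` and `strip_payload` are hypothetical names, not part of any library:

```python
import re

# Hypothetical pattern: optional leading whitespace, then a complete
# <payload ...>...</payload> element. Assumes no nesting and no <payload/>.
PAYLOAD_RE = re.compile(r"\s*<payload\b[^>]*>.*?</payload>", re.DOTALL)


def strip_payload(xml_text):
    """Return xml_text with each <payload>...</payload> element removed,
    together with the whitespace that precedes it."""
    return PAYLOAD_RE.sub("", xml_text)


# The pretty-printed message from above.
msg = (
    "<message xmlns='jabber:client' to='node@host.com'>\n"
    "    <mynode xmlns='myprotocol:core' version='1.0' type='mytype'>\n"
    "        <parameters>\n"
    "            <param1>123</param1>\n"
    "            <param2>456</param2>\n"
    "        </parameters>\n"
    "        <payload type='plain'>...</payload>\n"
    "    </mynode>\n"
    "    <x xmlns='jabber:x:expire' seconds='15'/>\n"
    "</message>"
)

print(strip_payload(msg))
```

Because the removed span includes the newline and indentation in front of <payload>, the remaining lines keep their original layout, which gives the desired output shown above.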

Here is my regex:

r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"
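The pattern ends with a lazy `.*?` followed by an optional group and nothing else. The same behaviour can be reproduced with a much smaller pattern of that shape (a sketch in Python 3; the string `axxb` and both patterns are made up for illustration). Since re.match() only has to match a prefix, the engine appears to stop at the first way the pattern can succeed, which leaves the trailing optional group empty:

```python
import re

# A made-up string and a pattern with the same shape as the end of the
# regex above: a lazy ".*?" followed by an optional group, then nothing.
text = "axxb"

# re.match() only needs to match a prefix of the string, so ".*?" first
# matches nothing, "(b)?" cannot match "x" and falls back to the empty
# string, and the match is already complete.
unanchored = re.match(r"a.*?(b)?", text)
print(unanchored.group(0), unanchored.groups())  # → a (None,)

# With "$" the match must reach the end of the string, so ".*?" keeps
# growing until "(b)?" can consume the final "b".
anchored = re.match(r"a.*?(b)?$", text)
print(anchored.group(0), anchored.groups())  # → axxb ('b',)
```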

I capture the opening <message> tag and the opening <mynode> tag; I also
capture the <parameters> node if it is present and not empty, and the <x>
node if it is present. For some reason this doesn't work correctly:

>>> import re
>>> s1 = "<message xmlns='jabber:client' to='node@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s2 = "<message xmlns='jabber:client' to='node@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s3 = "<message xmlns='jabber:client' to='node@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s4 = "<message xmlns='jabber:client' to='node@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode></message>"
>>> s5 = "<message xmlns='jabber:client' to='node@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode></message>"
>>> s6 = "<message xmlns='jabber:client' to='node@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode></message>"
>>> exp = r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"
>>>
>>> re.match(exp, s1).groups()
("<message xmlns='jabber:client' to='node@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s2).groups()
("<message xmlns='jabber:client' to='node@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s3).groups()
("<message xmlns='jabber:client' to='node@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s4).groups()
("<message xmlns='jabber:client' to='node@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s5).groups()
("<message xmlns='jabber:client' to='node@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s6).groups()
("<message xmlns='jabber:client' to='node@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
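One possible rewrite, sketched in Python 3 under the assumption that every message has the same shape as s1 to s6. Each optional element is wrapped as `(?:.*?(...))?`, so the greedy `?` first tries to find the element further along and only skips it when it does not occur at all, and the pattern ends with `</message>$` so the match has to cover the whole message. `EXP` and the string constants are hypothetical names; the six samples are rebuilt from the strings above:

```python
import re

# Hypothetical rewrite of the pattern. re.DOTALL lets "." cross newlines,
# in case a message arrives pretty-printed.
EXP = re.compile(
    r"(<message [^>]*>)"                        # 1: opening <message> tag
    r".*?(<mynode [^>]*>)"                      # 2: opening <mynode> tag
    r"(?:.*?(<parameters>.*?</parameters>))?"   # 3: non-empty <parameters>
    r"(?:.*?(<x [^>]*/>))?"                     # 4: <x .../> element
    r".*?</message>$",                          # must reach the end
    re.DOTALL,
)

# Building blocks of the six test messages from the transcript.
HEAD = ("<message xmlns='jabber:client' to='node@host.com'>"
        "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>")
PARAMS = "<parameters><param1>123</param1><param2>456</param2></parameters>"
PAYLOAD_AND_CLOSE = "<payload type='plain'>...</payload></mynode>"
X = "<x xmlns='jabber:x:expire' seconds='15'/>"
TAIL = "</message>"

samples = [
    HEAD + PARAMS + PAYLOAD_AND_CLOSE + X + TAIL,           # s1
    HEAD + "<parameters/>" + PAYLOAD_AND_CLOSE + X + TAIL,  # s2
    HEAD + PAYLOAD_AND_CLOSE + X + TAIL,                    # s3
    HEAD + PARAMS + PAYLOAD_AND_CLOSE + TAIL,               # s4
    HEAD + "<parameters/>" + PAYLOAD_AND_CLOSE + TAIL,      # s5
    HEAD + PAYLOAD_AND_CLOSE + TAIL,                        # s6
]

for s in samples:
    print(EXP.match(s).groups())
```

With this pattern the fourth group holds the <x> node for s1 to s3, and the third group is None for s2, s3, s5 and s6 (the empty <parameters/> is deliberately not captured). One caveat: the `.*?` in front of <parameters> can scan past </mynode>, so a literal <parameters> inside the payload would also be picked up.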


Does anyone know what is wrong with my expression?

Thank you,
Gabriel


