Regex Question

Jussi Piitulainen jpiitula at ling.helsinki.fi
Sat Aug 18 12:22:37 EDT 2012


Frank Koshti writes:

> not always placed in HTML, and even in HTML, they may appear in
> strange places, such as <h1 $foo(x=3)>Hello</h1>. My specific issue
> is I need to match, process and replace $foo(x=3), knowing that
> (x=3) is optional, and the token might appear simply as $foo.
> 
> To do this, I decided to use:
> 
> re.compile('\$\w*\(?.*?\)').findall(mystring)
> 
> the issue with this is it doesn't match $foo by itself, and requires
> there to be () at the end.

Adding a ? after the part that is meant to be optional tells the
regex engine that it may be absent. You can also put the mandatory
and the optional parts in separate groups, so that findall returns
pairs. The test program below prints this:

>$foo()$foo(bar=3)$$$foo($)$foo($bar(v=0))etc</htm
('$foo', '')
('$foo', '(bar=3)')
('$foo', '($)')
('$foo', '')
('$bar', '(v=0)')
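
As a side note, one way to read the first suggestion is sketched here:
the original pattern with the parenthesised part wrapped in a group and
followed by ?. The \w+ (instead of \w*) and the non-capturing (?:...)
are my assumptions, chosen so that a lone $ is skipped and findall
returns whole tokens. Because .*? accepts any characters, this version
can run past a nested ( to the next ), which the [^()]+ in the test
program avoids.

```python
import re

# Original pattern, with the "(...)" part grouped and made optional by ?.
# Assumptions: \w+ instead of \w* so a bare "$" is not a token, and a
# non-capturing group so findall returns the whole match.
pattern = re.compile(r'\$\w+(?:\(.*?\))?')

print(pattern.findall('<h1 $foo(x=3)>Hello</h1> and $foo alone'))
# prints ['$foo(x=3)', '$foo']
```
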

Here is the test program:

import re

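def grab(text):
    # Group 1: a $ followed by a name. Group 2, optional: a non-empty
    # parenthesised part that contains no nested parentheses.
    p = re.compile(r'([$]\w+)([(][^()]+[)])?')
    return p.findall(text)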

def test(html):
    print(html)
    for hit in grab(html):
        print(hit)

if __name__ == '__main__':
    test('>$foo()$foo(bar=3)$$$foo($)$foo($bar(v=0))etc</htm')
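
Since the question also asks about processing and replacing the tokens,
here is a minimal sketch of that step using re.sub with a function as
the replacement. The names TOKEN, expand and replace_tokens and the
bracketed output format are made up for illustration; the real
processing would go where expand builds its return value. Note that a
match object gives None for an optional group that did not take part
in the match, where findall gives ''.

```python
import re

# Same two-group pattern as in the test program.
TOKEN = re.compile(r'([$]\w+)([(][^()]+[)])?')

def expand(match):
    """Hypothetical processing step: render the token in brackets."""
    name = match.group(1)[1:]      # drop the leading $
    args = match.group(2) or ''    # group(2) is None when "(...)" is absent
    return '[{}{}]'.format(name, args)

def replace_tokens(text):
    # re.sub calls expand once per match and inserts its return value.
    return TOKEN.sub(expand, text)

print(replace_tokens('<h1 $foo(x=3)>Hello</h1> $foo'))
# prints <h1 [foo(x=3)]>Hello</h1> [foo]
```
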


