Text Suffix to Prefix Conversion

Peter Otten __peter__ at web.de
Thu Apr 19 03:11:28 EDT 2007


7stud wrote:

> On Apr 18, 11:08 pm, Steven Bethard <steven.beth... at gmail.com> wrote:
>> EMC ROY wrote:
>> > Original Sentence: An apple for you.
>> > Present:           An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
>> > Desire:            <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
>> >>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>> >>> import re
>> >>> re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
>>
>> '<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
> 
> If you end up calling re.sub() repeatedly, e.g. for each line in your
> file, then you should "compile" the regular expression so that python
> doesn't have to recompile it for every call:
> 
> import re
> 
> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')

re.compile() doesn't accept a replacement pattern; its second argument is
flags, hence the TypeError in your traceback:

"""
Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
"""
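
To make the signature concrete, here is a minimal sketch (the variable names
are mine, and I am assuming a reasonably recent Python) showing that the
second argument is treated as flags, so a replacement string passed there
fails before any substitution is attempted:

```python
import re

pattern = r'(\S+)(<[^>]+>)(\s*)'

# The second positional argument of re.compile() is flags.
compiled = re.compile(pattern, re.IGNORECASE)

# A replacement string in that position is combined with integer flag
# constants during compilation, which raises a TypeError.
error = None
try:
    re.compile(pattern, r'\2\1\3')
except TypeError as exc:
    error = exc

print(type(error).__name__)
```

The replacement belongs in the call to sub(), not in compile().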

> re.sub(myR, r'\2\1\3', text)
> 
> 
> Unfortunately, I must be doing something wrong because I can't get
> that code to work.  When I run it, I get the error:
> 
> Traceback (most recent call last):
>   File "2pythontest.py", line 3, in ?
>     myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/
> python2.4/sre.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/
> python2.4/sre.py", line 225, in _compile
>     p = sre_compile.compile(pattern, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/
> python2.4/sre_compile.py", line 496, in compile
>     p = sre_parse.parse(p, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/
> python2.4/sre_parse.py", line 668, in parse
>     p = _parse_sub(source, pattern, 0)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/
> python2.4/sre_parse.py", line 308, in _parse_sub
>     itemsappend(_parse(source, state))
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/
> python2.4/sre_parse.py", line 396, in _parse
>     if state.flags & SRE_FLAG_VERBOSE:
> TypeError: unsupported operand type(s) for &: 'str' and 'int'
> 
> 
> Yet, these two examples work without error:
> 
> ------
> import re
> 
> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> #myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
> print re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
> 
> myR = re.compile(r'(hello)')
> text = "hello world"
> print re.sub(myR, r"\1XXX", text)
> 
> ---------output:
> <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
> helloXXX world
> 
> 
> Can anyone help?

You can precompile the regular expression like this:

>>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>>> r = re.compile(r'(\S+)(<[^>]+>)(\s*)')
>>> r.sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'

or even

>>> sub = re.compile(r'(\S+)(<[^>]+>)(\s*)').sub
>>> sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'

Note that this saves less time than you might expect, since re.sub() and
the other module-level re functions look up already compiled regexps in a
cache.
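
For the one-call-per-line case mentioned earlier in the thread, a sketch
along these lines should do. The names tagged and lines are mine, and the
second sample line is made up purely for illustration:

```python
import re

# Compile once, outside the loop.
tagged = re.compile(r'(\S+)(<[^>]+>)(\s*)')

# Stand-in for the lines of a file; the second entry is a made-up example.
lines = [
    'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>',
    'A<AT0> pear<NN1>',
]

# Move each <TAG> suffix in front of the word it follows.
converted = [tagged.sub(r'\2\1\3', line) for line in lines]

for line in converted:
    print(line)
```

Calling re.sub() with the pattern string inside the loop gives the same
result; the compiled object mainly saves the cache lookup and keeps the
pattern in one place.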

Peter


