regexp strangeness
MRAB
google at mrabarnett.plus.com
Thu Apr 9 17:54:40 EDT 2009
Peter Otten wrote:
> Dale Amon wrote:
>
>> This finds nothing:
>>
>> import re
>> import string
>> card = "abcdef"
>> DEC029 = re.compile("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]")
The regular expression you're actually providing is:
>>> print "[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]"
[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!);\\]%_>?]
                                ^^^
The backslash is escaped (the "\\") and the set ends at the first "]".
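A minimal sketch of how that first pattern parses. It is written for Python 3 and uses a raw string that holds exactly what re received after Python's own escape processing. The constant name FIRST and the sample inputs are illustrative only. The class closes at the first "]", so "%_>?]" becomes a literal tail that every match has to end with:

```python
import re

# The pattern exactly as re sees it, after Python turned "\\\]" into "\\]".
# A raw triple-quoted string avoids a second round of escape processing.
FIRST = r"""[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!);\\]%_>?]"""

# "\\" inside the class is one literal backslash; the next "]" closes the class.
# The rest ("%_>?]") is matched literally, with the ">" made optional by "?".
print(re.findall(FIRST, "abcdef"))  # [] : no "%_" tail anywhere in the input
print(re.findall(FIRST, "a%_]"))    # ['a%_]'] : disallowed char plus the tail
```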
>> errs = DEC029.findall(card.strip("\n\r"))
>> print errs
>>
>> This works correctly:
>>
>> import re
>> import string
>> card = "abcdef"
>> DEC029 = re.compile("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!)\\;\]%_>?]")
The regular expression you're actually providing is:
>>> print "[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!)\\;\]%_>?]"
[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!)\;\]%_>?]
                                 ^^    ^
The first "]" is escaped (the "\]") and the set ends at the second "]".
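The same check for the second pattern, again as a Python 3 sketch with a raw string standing in for what re actually received. The name SECOND and the inputs are illustrative. With the class now running to the final "]", every lowercase letter is reported, which is why it looked right. Note also that "\;" is only an escaped ";", so no backslash is in the class and a backslash on the card would be reported as well:

```python
import re

# The second pattern as re sees it: "\\;" in the source became "\;"
# (an escaped ";"), and "\]" keeps the first "]" inside the class.
SECOND = r"""[^&0-9A-Z/ $*,.\-:#@'="[<(+\^!)\;\]%_>?]"""

# The class now ends at the final "]", so each lowercase letter is flagged.
print(re.findall(SECOND, "abcdef"))  # ['a', 'b', 'c', 'd', 'e', 'f']
# No literal backslash is in the class, so a backslash is flagged too.
print(re.findall(SECOND, "A\\B"))    # ['\\']
```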
>> errs = DEC029.findall(card.strip("\n\r"))
>> print errs
>>
>> They differ only in the positioning of the quoted backslash.
>>
>> Just in case it is of interest to anyone.
>
> You have to escape twice; once for Python and once for the regular
> expression. Or use raw strings, denoted by an r"..." prefix:
>
>>>> re.findall("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]", "abc")
> []
>>>> re.findall("[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\\\\]%_>?]", "abc")
> ['a', 'b', 'c']
>>>> re.findall(r"[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]", "abc")
> ['a', 'b', 'c']
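A small Python 3 sketch of the two spellings Peter describes. One is an ordinary string with each backslash the regex needs written twice for Python. The other is a raw string. The "\-" and "\^" escapes are also doubled in the ordinary string here so that recent Python 3 versions do not warn about invalid escape sequences. The names ESCAPED and RAW and the sample inputs are illustrative:

```python
import re

# Ordinary string: every backslash the regex needs is written twice for Python.
ESCAPED = "[^&0-9A-Z/ $*,.\\-:#@'=\"[<(+\\^!);\\\\\\]%_>?]"
# Raw string: Python leaves the backslashes alone, so re sees them directly.
RAW = r"[^&0-9A-Z/ $*,.\-:#@'=\"[<(+\^!);\\\]%_>?]"

for pattern in (ESCAPED, RAW):
    # Both classes now contain a literal backslash ("\\") and a literal "]" ("\]").
    print(re.findall(pattern, "abc"))    # ['a', 'b', 'c']
    print(re.findall(pattern, "AB\\]"))  # [] : backslash and "]" are allowed
```

The two strings are not identical (the raw one keeps the backslash in \"), but they compile to the same character class.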
>
More information about the Python-list mailing list