[issue14045] In regex pattern long unicode character isn't recognized by repetition characters +, * and {}

py.user report at bugs.python.org
Sat Feb 18 04:13:38 CET 2012


New submission from py.user <port139 at yandex.ru>:

>>> import re
>>> '\U00000061'
'a'
>>> '\U00100061'
'\U00100061'
>>> re.search('\U00100061', '\U00100061' * 10).group()
'\U00100061'
>>> re.search('\U00100061+', '\U00100061' * 10).group()
'\U00100061'
>>> re.search('(\U00100061)+', '\U00100061' * 10).group()
'\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061'
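>>> # The same happens with a counted repetition: {3} finds no match
>>> # unless the character is wrapped in a group.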
>>> re.search('\U00100061{3}', '\U00100061' * 10)
>>> re.search('(\U00100061){3}', '\U00100061' * 10).group()
'\U00100061\U00100061\U00100061'
>>>
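
A possible explanation, offered as an assumption rather than a confirmed diagnosis: on a "narrow" build of Python 3.2, a character above U+FFFF is stored as two UTF-16 code units, so a quantifier written directly after it may bind only to the second unit. The sketch below checks the build type through sys.maxunicode and applies the grouping workaround already shown in the session above, using a non-capturing group (?:...) so that no extra capture group is created. The variable names (ch, text, is_narrow, plus_match, exact_match) are illustrative only.

```python
import re
import sys

ch = '\U00100061'
text = ch * 10

# sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF otherwise.
is_narrow = sys.maxunicode == 0xFFFF

# Wrapping the character in a non-capturing group makes the quantifier
# apply to the whole character, however many code units it occupies.
plus_match = re.search('(?:' + ch + ')+', text).group()
exact_match = re.search('(?:' + ch + '){3}', text).group()

print('narrow build:', is_narrow)
print('+ matched all repetitions:', plus_match == text)
print('{3} matched three repetitions:', exact_match == ch * 3)
```

With the group in place, both comparisons should hold regardless of build type; without it, the results on a narrow build would be expected to match the session above.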

----------
components: Library (Lib), Regular Expressions
messages: 153629
nosy: ezio.melotti, py.user
priority: normal
severity: normal
status: open
title: In regex pattern long unicode character isn't recognized by repetition characters +, * and {}
type: behavior
versions: Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14045>
_______________________________________

