[Python-Dev] Misc re.match() complaint

Guido van Rossum guido at python.org
Tue Jul 16 01:37:27 CEST 2013


Ok, created http://bugs.python.org/issue18468.

On Mon, Jul 15, 2013 at 4:30 PM, Gregory P. Smith <greg at krypto.org> wrote:
>
> On Mon, Jul 15, 2013 at 4:14 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> In a discussion about mypy I discovered that the Python 3 version of
>> the re module's Match object behaves subtly differently from the Python
>> 2 version when the target string (i.e. the haystack, not the needle)
>> is a buffer object.
>>
>> In Python 2, the type of the return value of group() is always either
>> a Unicode string or an 8-bit string, and the type is determined by
>> looking at the target string -- if the target is unicode, group()
>> returns a unicode string, otherwise, group() returns an 8-bit string.
>> In particular, if the target is a buffer object, group() returns an
>> 8-bit string. I think this is the appropriate behavior: otherwise
>> using regular expression matching to extract a small substring from a
>> large target string would unnecessarily keep the large target string
>> alive as long as the substring is alive.
>>
>> But in Python 3, the behavior of group() has changed so that its
>> return type always matches that of the target string. I think this is
>> bad -- apart from the lifetime concern, it means that if your target
>> happens to be a bytearray, the return value isn't even hashable!
>>
>> Does anyone remember whether this was a conscious decision? Is it too
>> late to fix?
>
>
> Hmm, that is not what I'd expect either. I would never expect it to return a
> bytearray; I'd normally assume that .group() returned a bytes object if the
> input was binary data and a str object if the input was unicode data (str)
> regardless of specific types containing the input target data.
>
> I'm going to hazard a guess that not much, if anything, would be depending
> on getting a bytearray out of that. Fix this in 3.4? 3.3 and earlier users
> are stuck with an extra bytes() call and data copy in these cases I guess.
>
> -gps
>
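
For reference, a minimal sketch of the "extra bytes() call" workaround
mentioned above. The helper name group_as_bytes is hypothetical and not part
of the re module. It assumes only that group() may hand back a bytearray when
the target is one (the Python 3 behavior described in this thread) and copies
the result into an immutable, hashable bytes object.

```python
import re


def group_as_bytes(match, group=0):
    """Return a match group as str or bytes, never a mutable buffer type.

    Hypothetical helper: copies the group into a bytes object whenever
    group() returned something other than str or bytes (e.g. a bytearray).
    """
    value = match.group(group)
    if value is None or isinstance(value, (bytes, str)):
        return value
    return bytes(value)  # the extra copy discussed in the thread


# A bytearray itself cannot be used as a dict key or set member.
try:
    hash(bytearray(b"key"))
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False

target = bytearray(b"key=value")
m = re.match(br"(\w+)=(\w+)", target)

key = group_as_bytes(m, 1)
value = group_as_bytes(m, 2)

# Safe regardless of what group() returned for the bytearray target.
lookup = {key: value}
```

Because the helper returns a copy, the extracted substring also no longer
depends on the original target buffer.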



-- 
--Guido van Rossum (python.org/~guido)

