[issue9873] urllib.parse: Allow bytes in some APIs that use string literals internally

Tue Oct 5 12:32:20 CEST 2010

Nick Coghlan <ncoghlan at gmail.com> added the comment:

On Tue, Oct 5, 2010 at 5:32 PM, STINNER Victor <report at bugs.python.org> wrote:
>
> STINNER Victor <victor.stinner at haypocalc.com> added the comment:
>
>> If you were worried about performance, then surrogateescape is certainly
>> much slower than latin1.
>
> If you were really worried about performance, the bytes type is maybe faster
> than: decode bytes to str using latin-1, process str strings, encode str to
> bytes using latin-1.

I'm fairly resigned to the fact that I'm going to need some kind of
micro-benchmark to compare the different approaches. For example, the
bytes-based approach has a lot of extra assignments to local variables
that the str-based approach doesn't need.
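
As a rough illustration of what such a micro-benchmark might look like,
here is a minimal sketch. It is not taken from either patch: the
split_* helpers, the bench() function and the sample URL are all
hypothetical stand-ins for the real urllib.parse code, and the toy
"split" is far simpler than what urlsplit actually does. It compares
three candidates: operating on bytes directly, a latin-1 round trip
through str, and an ascii/surrogateescape round trip through str.

```python
import timeit

# Hypothetical sample input containing one non-ASCII byte (0xE9).
RAW = b"http://example.com/path/to/page?query=caf\xe9&x=1#frag"


def split_bytes(url):
    # Bytes-based approach: every literal has to be a bytes literal.
    base, _, fragment = url.partition(b"#")
    base, _, query = base.partition(b"?")
    return base, query, fragment


def split_str(url):
    # Str-based approach: the same logic with ordinary str literals.
    base, _, fragment = url.partition("#")
    base, _, query = base.partition("?")
    return base, query, fragment


def split_via_codec(url, encoding="latin-1", errors="strict"):
    # Decode bytes to str, reuse the str code path, encode the results back.
    parts = split_str(url.decode(encoding, errors))
    return tuple(part.encode(encoding, errors) for part in parts)


def bench(number=100000):
    # Return total seconds per candidate for `number` calls each.
    candidates = {
        "bytes": lambda: split_bytes(RAW),
        "latin-1": lambda: split_via_codec(RAW),
        "surrogateescape": lambda: split_via_codec(
            RAW, "ascii", "surrogateescape"),
    }
    return {name: timeit.timeit(func, number=number)
            for name, func in candidates.items()}


if __name__ == "__main__":
    for name, seconds in bench().items():
        print("%-16s %.3f s" % (name, seconds))
```

All three candidates should return identical bytes for the same input,
so only the timings differ. The numbers will depend on the interpreter
version and on the input, so they are only a starting point for
comparing the real patches.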

The first step is to actually have a str-based patch to compare to the
existing bytes-based patch. If the code ends up significantly clearer
(as I expect it will), we can probably sacrifice a certain amount of
speed for that benefit.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9873>
_______________________________________