[issue3300] urllib.quote and unquote - Unicode issues

Thu Aug 14 14:18:51 CEST 2008

Matt Giuca <matt.giuca at gmail.com> added the comment:

OK, I implemented the defaultdict solution. I got curious, so I ran some
rough speed tests using the following code:

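import random
import urllib.parse

# Quote 100,000 random strings of 50 characters each.
for _ in range(100000):
    # 's' rather than 'str', to avoid shadowing the built-in type.
    s = ''.join(chr(random.randint(0, 0x10ffff)) for _ in range(50))
    quoted = urllib.parse.quote(s)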

Time to quote 100,000 random strings of 50 characters.
(Ran each test twice, worst case printed)

HEAD,   chars in range(0,0x110000): 1m44.8s
HEAD,   chars in range(0,256):      25.0s
patch9, chars in range(0,0x110000): 35.3s
patch9, chars in range(0,256):      27.4s
New,    chars in range(0,0x110000): 31.4s
New,    chars in range(0,256):      25.3s

HEAD is the current Py3k head, patch9 is my previous patch (before
implementing defaultdict), and New is the code after implementing
defaultdict.
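
For anyone wanting to repeat a measurement like this, here is a rough
sketch of a harness. It is not the script that produced the numbers
above. It builds the strings first so that only the quote() calls are
timed, and it skips lone surrogates, since (as far as I know) current
Python 3 releases refuse to encode them as UTF-8 with the default error
handler. The names random_string and time_quote, and the defaults
count=1000 and seed=0, are made up for illustration.

```python
import random
import time
import urllib.parse


def random_string(length, limit, rng):
    """Return `length` random characters with code points below `limit`.

    Code points in the surrogate range are skipped (an assumption of this
    sketch, so that quote() can always encode the result as UTF-8).
    """
    chars = []
    while len(chars) < length:
        cp = rng.randrange(limit)
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogate, not encodable as strict UTF-8
        chars.append(chr(cp))
    return "".join(chars)


def time_quote(limit, count=1000, length=50, seed=0):
    """Time urllib.parse.quote over `count` pre-built random strings."""
    rng = random.Random(seed)
    strings = [random_string(length, limit, rng) for _ in range(count)]
    start = time.perf_counter()
    for s in strings:
        urllib.parse.quote(s)
    return time.perf_counter() - start


if __name__ == "__main__":
    for limit in (0x110000, 256):
        elapsed = time_quote(limit)
        print("chars in range(0, {:#x}): {:.3f}s".format(limit, elapsed))
```

Raising count to 100000 would match the size of the test described
above, though absolute times will of course differ by machine and
Python version.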

Interesting: defaultdict didn't make much of an improvement (31.4s
versus 35.3s on the full range). You can see how much the cache itself
helps, though. My code caches all chars, whereas HEAD only caches ASCII
chars, which is why HEAD is so slow on the full-repertoire test (about
1m45s). Other than that, the differences are fairly negligible.

However, I'll keep the defaultdict code. I quite like it, speedy or not
(and it is slightly faster).

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3300>
_______________________________________