"More About Unicode in Python 2 and 3"

Chris Angelico rosuav at gmail.com
Wed Jan 8 18:45:37 EST 2014


On Thu, Jan 9, 2014 at 10:34 AM,  <rdsteph at mac.com> wrote:
> I just meant to say that internet programming using ASCII urls is so common and important that it hurts that Python 3 makes it so much harder. It sure would be great if Python 3 could be improved to allow such programming to be done using ASCII urls without requiring all the unicode overhead.
>
> Armin is right. Calling his post a rant doesn't help.

There's one big problem with that theory. We've been looking, on this
list and on python-ideas, at some practical suggestions for adding
something to Py3 that will help. So far, lots of people have suggested
things, and the complainers haven't attempted to explain what they
actually need. Hard facts and examples would help enormously.

Incidentally, before referring to "all the Unicode overhead", it would
help to actually measure the overhead of encoding and decoding.

Python 2.7:
>>> import timeit
>>> timeit.timeit("a.encode().decode()","a=u'a'*1000",number=500000)
8.787162614242874

Python 3.4:
>>> import timeit
>>> timeit.timeit("a.encode().decode()","a=u'a'*1000",number=500000)
1.7354552045022515

Since Python 3.3, the cost of UTF-8 encoding/decoding an all-ASCII
string is extremely low. So the real cost isn't in run-time performance
but in code complexity. Would it be easier to work with ASCII URLs if
there were a helper function with a one-letter name? I never got an
answer to that question.
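
To make that question concrete, here is a minimal sketch of what such a
helper might look like. The name a() and its behaviour are purely
illustrative assumptions, not an existing builtin or a concrete
proposal; it only wraps str.encode("ascii").

```python
def a(text):
    """Hypothetical one-letter helper: turn an all-ASCII str into bytes.

    Raises UnicodeEncodeError if the text contains non-ASCII characters,
    so mistakes surface at the boundary rather than later on.
    """
    return text.encode("ascii")


# Example: working with an ASCII URL as bytes.
url = a("http://example.com/path?q=1")
parts = url.split(a("/"))
print(parts)
```

Whether that is actually less painful than sprinkling b"..." literals
and .encode() calls through the code is exactly what I'd like to hear
from the people doing this kind of work.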

ChrisA


