flaming vs accuracy [was Re: Performance of int/long in Python 3]

Chris Angelico rosuav at gmail.com
Thu Mar 28 06:30:27 EDT 2013


On Thu, Mar 28, 2013 at 8:03 PM, jmfauth <wxjmfauth at gmail.com> wrote:
> Example of a good Unicode understanding.
> If you wish 1) to preserve memory, 2) to cover the whole range
> of Unicode, 3) to keep maximum performance while preserving the
> good work Unicode.org has done (normalization, sorting), there
> is only one solution: utf-8. For this you have to understand
> what a "unicode transformation format" really is.

You really REALLY need to sort out in your head the difference between
correctness and performance. I still haven't seen one single piece of
evidence from you that Python 3.3 fails on any point of Unicode
correctness. Covering the whole range of Unicode has never been a
problem.
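For anyone who wants to check that claim themselves: on CPython 3.3+, characters outside the Basic Multilingual Plane are single code points, not surrogate pairs as on the old "narrow" builds. A quick sanity check using only the standard library (U+1F600 is just an arbitrary astral-plane example):

```python
import unicodedata

# U+1F600 is outside the BMP; on 3.3+ it is one code point, not a
# surrogate pair (as it was on pre-3.3 "narrow" builds).
s = "\U0001F600"
assert len(s) == 1
assert ord(s) == 0x1F600
assert unicodedata.name(s) == "GRINNING FACE"

# Mixing astral and ASCII characters doesn't break indexing or slicing.
t = "a" + s + "b"
assert t[1] == s
assert t[::-1] == "b" + s + "a"
print("full-range handling OK")
```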

In terms of memory usage and performance, though, there's one obvious
solution. Fork CPython 3.3 (or the current branch head[1]), change the
internal representation of a string to be UTF-8 (by the way, that's
the official spelling), and run the string benchmarks. Then post your
code and benchmark figures so other people can replicate your results.
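Before anyone runs off to fork anything, it's worth seeing what PEP 393 already does with memory. The interpreter picks the narrowest fixed width (1, 2, or 4 bytes per code point) that fits every character in the string, which you can observe with sys.getsizeof. A minimal sketch (exact byte counts vary by build, so only the relative sizes matter here):

```python
import sys

# PEP 393: CPython 3.3+ stores each string at the narrowest fixed
# width (1, 2, or 4 bytes per code point) that can represent it.
ascii_s  = "a" * 1000           # all code points < 0x100   -> 1 byte each
bmp_s    = "\u0101" * 1000      # all code points < 0x10000 -> 2 bytes each
astral_s = "\U0001F600" * 1000  # full Unicode range        -> 4 bytes each

for s in (ascii_s, bmp_s, astral_s):
    print(len(s), sys.getsizeof(s))

# The fixed width keeps indexing O(1) at any plane, unlike a UTF-8
# internal representation, where the byte offset of the i-th
# character depends on everything before it.
assert sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s)
```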

> Python has certainly and definitively not "revolutionized"
> Unicode.

This is one place where you're actually correct, though, because PEP
393 isn't the first instance of this kind of format - Pike's had it
for years. Funny though, I don't think that was your point :)

[1] Apologies if my terminology is wrong, I'm a git user and did one
quick Google search to see if hg uses the same term.

ChrisA
