How do I display unicode value stored in a string variable using ord()

Terry Reedy tjreedy at udel.edu
Sun Aug 19 17:03:46 EDT 2012


On 8/19/2012 1:03 PM, Blind Anagram wrote:

> Running Python from a Windows command prompt,  I got the following on
> Python 3.2.3 and 3.3 beta 2:
>
> "python33\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 39.3 usec per loop
> "python33\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 51.8 usec per loop
> "python33\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 52 usec per loop
> "python33\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 50.3 usec per loop
> "python33\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 51.6 usec per loop
> "python33\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 38.3 usec per loop
> "python33\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 50.3 usec per loop
>
> "python32\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 24.5 usec per loop
> "python32\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 24.7 usec per loop
> "python32\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 24.8 usec per loop
> "python32\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 24 usec per loop
> "python32\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 24.1 usec per loop
> "python32\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 24.4 usec per loop
> "python32\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 24.3 usec per loop

This is one test repeated 7 times with essentially irrelevant 
variations. The difference is smaller on my system (about 50%). Others 
report seeing 3.3 as faster. When I asked on pydev, the answer was: 
don't bother opening a tracker issue unless I was personally interested 
in investigating why search is relatively slow in 3.3 on Windows. Any 
change would have to avoid slowing other operations or severely 
impacting search on other systems. I suggest the same answer to you.
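For anyone who wants to rerun the comparison from inside Python rather 
than from a shell, timeit's repeat() gives the same best-of-N numbers as 
the command-line runs above (a minimal sketch; timings will of course 
differ per machine):

```python
from timeit import repeat

# Two of the expressions timed above: an all-ASCII replace and one
# involving a non-ASCII (BMP) character.
exprs = [
    "('abc' * 1000).replace('c', 'de')",
    "('ab\u2026' * 1000).replace('\u2026', 'x\u2026')",
]
for expr in exprs:
    # repeat() returns one total time per run; take the best,
    # as the timeit command line does.
    best = min(repeat(expr, number=10000)) / 10000
    print('%s: %.2f usec per loop' % (expr, best * 1e6))
```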

If you seriously want to compare old and new unicode, go to
http://hg.python.org/cpython/file/tip/Tools/stringbench/stringbench.py
and click raw to download. Run on 3.2 and 3.3, ignoring the bytes times.

Here is a version of the first comparison from stringbench (with the 
needed import, and rewrapped so it actually runs):

from timeit import timeit
print(timeit("('NOW IS THE TIME FOR ALL GOOD PEOPLE "
             "TO COME TO THE AID OF PYTHON' * 10).lower()"))

Results are 5.6 for 3.2 and 0.8 for 3.3. WOW! 3.3 is 7 times faster!

OK, not fair. I cherry-picked. The 7x speedup in 3.3 is likely at 
least partly independent of the PEP 393 unicode change. The same test in 
stringbench for bytes is twice as fast in 3.3 as in 3.2, but only 2x, not 
7x. In fact, it may have been the bytes/unicode comparison in 3.2 that 
suggested that unicode case conversion of ASCII chars might be made faster.
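The PEP 393 change makes string storage depend on the widest code point 
present, which is easy to see with sys.getsizeof (a sketch assuming 
CPython 3.3 or later; exact byte counts vary by version):

```python
import sys

ascii_s = 'a' * 1000            # all code points < 256: 1 byte per char
bmp_s = '\u2026' * 1000         # ellipsis, in the BMP: 2 bytes per char
astral_s = '\U0001F600' * 1000  # emoji, beyond the BMP: 4 bytes per char

# Same length, but roughly 1x / 2x / 4x the storage.
for s in (ascii_s, bmp_s, astral_s):
    print(len(s), sys.getsizeof(s))
```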

The sum of the 3.3 unicode times is 109 versus 110 for 3.3 bytes and 125 
for 3.2 unicode. This unweighted sum is not really fair, since the raw 
times vary by a factor of at least 100. But it does suggest that anyone 
claiming that 3.3 unicode is overall 'slower' than 3.2 unicode has some 
work to do.

There is also this. On my machine, the lowest bytes-time/unicode-time 
for 3.3 is .71. This suggests that there is not a lot of fluff left in 
the unicode code, and that not much is lost by the bytes to unicode 
switch for strings.
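That bytes-time/unicode-time ratio is easy to spot-check for a single 
operation (a sketch, not stringbench itself; the ratio will vary by 
machine and by operation, and for ASCII data the two are typically close 
under 3.3):

```python
from timeit import timeit

u = 'NOW IS THE TIME FOR ALL GOOD PEOPLE TO COME TO THE AID OF PYTHON' * 10
b = u.encode('ascii')

# timeit accepts a callable; time the same method on str and bytes.
t_str = timeit(u.lower, number=100000)
t_bytes = timeit(b.lower, number=100000)
print('bytes-time/unicode-time: %.2f' % (t_bytes / t_str))
```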

-- 
Terry Jan Reedy

More information about the Python-list mailing list