Unicode 7

wxjmfauth at gmail.com wxjmfauth at gmail.com
Wed Apr 30 03:06:41 EDT 2014


@ Tim Chase

I'm perfectly aware of what I'm doing.


@ MRAB

"...Although the third example is the fastest, it's also the wrong
way to handle Unicode: ..."

Maybe it is exactly the opposite. It illustrates very well
the quality of the coding schemes endorsed by Unicode.org.
I deliberately chose utf-8.


>>> import sys
>>> sys.getsizeof('\u0fce')
40
>>> sys.getsizeof('\u0fce'.encode('utf-8'))
20
>>> sys.getsizeof('\u0fce'.encode('utf-16-be'))
19
>>> sys.getsizeof('\u0fce'.encode('utf-32-be'))
21
>>> 
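To separate the encoded payload from the per-object overhead that sys.getsizeof() reports, here is a minimal sketch. The names ch, codecs_to_check and payload are illustrative only, and the getsizeof() values depend on the Python build, so none are asserted here. What can be stated with certainty is that U+0FCE takes 3 bytes in utf-8, 2 in utf-16-be and 4 in utf-32-be.

```python
import sys

ch = '\u0fce'  # U+0FCE, the code point used in the session above
codecs_to_check = ('utf-8', 'utf-16-be', 'utf-32-be')  # illustrative name

payload = {}
for codec in codecs_to_check:
    encoded = ch.encode(codec)
    # len() counts only the encoded bytes; getsizeof() also counts
    # the bytes object header, which varies between Python builds.
    payload[codec] = len(encoded)
    print(codec, len(encoded), sys.getsizeof(encoded))
```

The differences between the getsizeof() figures in the session above (20, 19, 21) match these payload lengths.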

Q. How do you save memory without wasting time on encoding?
By using products that natively use the Unicode coding schemes?

Do you understand Unicode? Or do you understand
Unicode via Python?

---

A Tibetan monk [*] using Py32:

>>> import timeit
>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'")
[2.3394840182882186, 2.3145832750782653, 2.3207231951529685]
>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = '\u0fce'")
[2.328517624800078, 2.3169403900011076, 2.317586282812048]
>>>
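For anyone who wants to repeat the comparison on another interpreter, the session above can be wrapped in a small script. This is only a sketch: time_slice and its number and repeat defaults are made-up names and values chosen for a short run, and the timings will differ from one Python version and machine to another.

```python
import timeit

def time_slice(y, number=20000, repeat=3):
    """Time (x*1000 + y)[:-1] for a one-character string y (hypothetical helper)."""
    # ascii() writes y as an escaped literal, so the setup string stays pure ASCII.
    setup = "x = 'abc'; y = " + ascii(y)
    return timeit.repeat("(x*1000 + y)[:-1]", setup=setup,
                         number=number, repeat=repeat)

ascii_times = time_slice('z')          # all-ASCII operands
tibetan_times = time_slice('\u0fce')   # one BMP code point appended

print('ascii  :', ascii_times)
print('U+0FCE :', tibetan_times)
```

Running it under several Python versions shows whether the two cases stay as close as they do in the Py32 figures above.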

[*] Your curiosity has certainly shown you what this code point means.
For the others:
U+0FCE TIBETAN SIGN RDEL NAG RDEL DKAR
signifies good luck earlier, bad luck later


(My comment: Good luck with Python or bad luck with Python)

jmf


