How do I display unicode value stored in a string variable using ord()

Paul Rubin no.email at nospam.invalid
Mon Aug 20 02:24:55 EDT 2012


Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
> Paul Rubin already told you about his experience using OCR to generate 
> multiple terabytes of text, and how he would not be happy if that was 
> stored in UCS-4.

That particular text was stored on disk as compressed XML that had UTF-8
in the data fields, but I think Roy is right that it would have
compressed to around the same size in UCS-4.  Converting it to UCS-4 on
input would have bloated up the memory footprint and that was the issue
of concern to me.
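
To make the in-memory difference concrete, here is a minimal sketch that compares the encoded size of a string in UTF-8, UTF-16 and UCS-4 (UTF-32). The helper name `encoded_sizes` and the sample string are made up for illustration; real OCR output will have a different mix of characters, so the ratio will vary.

```python
# Illustrative sketch only: the helper name and sample text are hypothetical.
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UCS-4 (UTF-32)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The explicit -le variants do not prepend a byte order mark.
        "utf-16": len(text.encode("utf-16-le")),
        "ucs-4": len(text.encode("utf-32-le")),
    }

# Mostly-ASCII text with one accented character (U+00E9).
sample = "mostly ASCII text from OCR, plus one accent: caf\u00e9"
sizes = encoded_sizes(sample)
print(sizes)
```

For mostly-ASCII text the UCS-4 size comes out close to four times the UTF-8 size, which is the bloat in question once the data is decompressed and held in memory.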

> Pittance or not, I do not believe that people will widely abandon compact 
> storage formats like UTF-8 and Latin-1 for UCS-4 any time soon.

Looking at http://www.icu-project.org/ the C++ classes seem to use
UTF-16, sort of like Python 3.2 :(.  I'm not certain of this though.
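
For context on why UTF-16 is awkward: code points above U+FFFF do not fit in one 16-bit unit and are stored as a surrogate pair. The sketch below shows this using only Python's standard codecs; it says nothing about how ICU behaves internally, and the helper name `utf16_code_units` is invented for this example.

```python
# Illustrative sketch only: utf16_code_units is a hypothetical helper.
def utf16_code_units(ch):
    """Return the list of 16-bit code units that encode ch in UTF-16."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp = utf16_code_units("A")              # inside the Basic Multilingual Plane
astral = utf16_code_units("\U0001F600")  # outside the BMP

print([hex(u) for u in bmp])     # one code unit
print([hex(u) for u in astral])  # two code units (a surrogate pair)
print(hex(ord("\U0001F600")))    # ord() on Python 3.3+ gives the full code point
```

With a UTF-16 string representation, indexing and length count code units rather than characters, so the second character above would occupy two positions.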
