How do I display unicode value stored in a string variable using ord()

Paul Rubin no.email at nospam.invalid
Sun Aug 19 01:58:14 EDT 2012


Chris Angelico <rosuav at gmail.com> writes:
> Generally, I'm working with pure ASCII, but port those same algorithms
> to Python and you'll easily be able to read in a file in some known
> encoding and manipulate it as Unicode.

If it's pure ASCII, you can use the bytes or bytearray type.  
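As a rough sketch of what that looks like: with pure ASCII held in a bytes object, one byte is one character, so skipping over a literal of known length is plain offset arithmetic. The "<length>:<payload>" field layout and the skip_literal name below are hypothetical, made up only for this illustration.

```python
def skip_literal(buf, pos):
    """Skip one '<length>:<payload>' field starting at pos; return the new offset.

    Hypothetical format for illustration: ASCII decimal length, a colon,
    then exactly that many payload bytes.
    """
    colon = buf.index(b":", pos)      # find the end of the length prefix
    length = int(buf[pos:colon])      # int() accepts ASCII digits given as bytes
    return colon + 1 + length         # jump past the payload without reading it


buf = b"5:hello3:abcTAIL"
pos = skip_literal(buf, 0)    # now just past b"5:hello"
pos = skip_literal(buf, pos)  # now just past b"3:abc"
rest = buf[pos:]              # the unparsed remainder
```

The payload bytes are never inspected, so the cost of a skip does not depend on the payload length.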

> It's not so much 'random access to the nth character' as an efficient
> way of jumping forward. For instance, if I know that the next thing is
> a literal string of n characters (that I don't care about), I want to
> skip over that and keep parsing.

I don't understand how this is supposed to work.  You're going to read a
large Unicode text file (let's say it's UTF-8) into a single big string?
So the runtime library has to scan the encoded contents to find the
highest-numbered code point (let's say it's mostly ASCII but has a few
characters outside the BMP), expand it all (in this case) to UCS-4,
giving 4x memory bloat and requiring decoding all the UTF-8 regardless,
and now we should worry about the efficiency of skipping n characters?
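The memory effect can be checked roughly with sys.getsizeof. This is only a sketch and assumes CPython 3.3 or later, where the string representation is chosen from the widest code point present; other implementations may store strings differently, and the exact sizes vary by platform.

```python
import sys

n = 100000
ascii_text = "a" * n                      # every code point fits in one byte
mixed_text = ascii_text + "\U0001F600"    # one code point outside the BMP

ascii_size = sys.getsizeof(ascii_text)
mixed_size = sys.getsizeof(mixed_text)

# On CPython with flexible string storage, the single wide character
# forces 4 bytes per character for the whole string.
ratio = mixed_size / ascii_size
print(ascii_size, mixed_size, round(ratio, 2))
```

Under those assumptions the ratio should come out close to 4, even though only one character in the string needs the wider storage.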

Since you have to decode the n characters regardless, I'd think this
skipping part should only be an issue if you have to do it a lot of
times.
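To make the cost concrete, here is a minimal sketch of skipping n characters directly in UTF-8 encoded bytes. It assumes the input is valid UTF-8 and that pos already sits at the start of a character; skip_code_points is a name invented for this example. Every byte of the skipped region still has to be examined, which is the decoding work mentioned above.

```python
def skip_code_points(data, pos, n):
    """Return the byte offset reached after skipping n code points from pos.

    Assumes data is valid UTF-8 and pos is at the start of a code point.
    """
    end = len(data)
    while n > 0 and pos < end:
        pos += 1  # step over the lead byte
        # Continuation bytes look like 10xxxxxx; step over those too.
        while pos < end and (data[pos] & 0xC0) == 0x80:
            pos += 1
        n -= 1
    return pos


data = "a\u00e9\U0001F600z".encode("utf-8")  # 1 + 2 + 4 + 1 bytes
offset = skip_code_points(data, 0, 3)        # skip 'a', U+00E9, U+1F600
tail = data[offset:]
```

The work is linear in the number of bytes skipped, so it only matters if the parser skips large spans many times.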


