"More About Unicode in Python 2 and 3"

Chris Angelico rosuav at gmail.com
Sun Jan 5 23:59:34 EST 2014


On Mon, Jan 6, 2014 at 3:49 PM, Roy Smith <roy at panix.com> wrote:
> Thanks.  But, I see I didn't formulate my problem statement well.  I was
> (naively) assuming there wouldn't be a built-in codec for rot-13.  Let's
> assume there isn't; I was trying to find a case where you had to treat
> the data as integers in one place and text in another.  How would you do
> that?

I assumed that you would have checked that one, and answered
accordingly :) Though I did dig into the EBCDIC part of the question.

My thinking is that, if you're working with integers, you probably
mean either bytes (so encode it before you do stuff - typical for
crypto) or codepoints / Unicode ordinals (so use ord()/chr()). In
other languages there are ways to treat strings as though they were
arrays of integers (lots of C-derived languages treat 'a' as 97 and
"a"[0] as 97 also; some extend this to the full Unicode range), and
even there, I rarely make much use of that identity. There's
only one case that I can think of where I did a lot of
string<->integer-array transmutation, and that was using a diff
function that expected an integer array - if the transformation to and
from strings hadn't been really easy, that function would probably
have been written to take strings.
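To make the two integer views concrete, here is a rough Python 3 sketch that does rot-13 by hand twice: once on Unicode codepoints via ord()/chr(), and once on encoded bytes. The function names are hypothetical, rot-13 is just the example transform from the question, and UTF-8 is an assumed default encoding.

```python
def rot13_codepoints(text: str) -> str:
    """Hypothetical helper: rotate ASCII letters, treating text as codepoints."""
    out = []
    for ch in text:
        n = ord(ch)  # character -> integer codepoint
        if ord("a") <= n <= ord("z"):
            n = (n - ord("a") + 13) % 26 + ord("a")
        elif ord("A") <= n <= ord("Z"):
            n = (n - ord("A") + 13) % 26 + ord("A")
        out.append(chr(n))  # integer codepoint -> one-character string
    return "".join(out)


def rot13_bytes(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: same transform on the encoded bytes (the crypto-style route)."""
    data = bytearray(text.encode(encoding))  # each element is an int 0..255
    for i, b in enumerate(data):
        # In UTF-8, bytes below 0x80 only ever encode ASCII characters,
        # so multi-byte (non-ASCII) characters pass through untouched.
        if 0x61 <= b <= 0x7A:  # 'a'..'z'
            data[i] = (b - 0x61 + 13) % 26 + 0x61
        elif 0x41 <= b <= 0x5A:  # 'A'..'Z'
            data[i] = (b - 0x41 + 13) % 26 + 0x41
    return data.decode(encoding)


print(rot13_codepoints("Hello, world"))  # Uryyb, jbeyq
print(rot13_bytes("café"))  # pnsé
```

The same ord()/chr() pair is what a string-to-integer-array round trip for something like that diff function would use: `[ord(c) for c in s]` one way, `"".join(map(chr, codes))` the other.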

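The Py2 str.translate() method was a little clunky to use, but
presumably fast to execute: you build up a 256-entry lookup table and
translate through that. The Py3 equivalent takes a dict that maps
codepoints (as integers) to their replacements, which can be
codepoints, strings, or None to delete the character; str.maketrans()
will build that dict from plain strings for you. Pretty easy to use,
and it lets you work with codepoints or strings, as you please.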
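A short Python 3 sketch of that dict-based str.translate(), assuming the ASCII-only rot-13 from the question plus an arbitrary hand-written table (the table names ROT13 and CUSTOM are just illustrative):

```python
import string

lower = string.ascii_lowercase
upper = string.ascii_uppercase

# str.maketrans() with two equal-length strings returns a dict keyed by
# codepoint, e.g. {97: 110, ...} for 'a' -> 'n'.
ROT13 = str.maketrans(
    lower + upper,
    lower[13:] + lower[:13] + upper[13:] + upper[:13],
)

# A hand-built table works too: keys are codepoints (ints); values may be
# codepoints, strings of any length, or None to delete the character.
CUSTOM = {ord("&"): "and", ord("!"): None, 0x2019: "'"}

print("Hello, world".translate(ROT13))  # Uryyb, jbeyq
print("Tom & Jerry\u2019s show!".translate(CUSTOM))  # Tom and Jerry's show
```

For rot-13 specifically, Python 3 also ships a `rot_13` text transform, reachable as `codecs.encode(text, "rot_13")`.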

ChrisA
