Python 2.x or 3.x, which is faster?

Chris Angelico rosuav at gmail.com
Thu Mar 10 15:07:09 EST 2016


On Fri, Mar 11, 2016 at 1:22 AM, BartC <bc at freeuk.com> wrote:
>>
>> 1) Unicode support, intrinsic to the language, is crucial, even if
>> BartC refuses to recognize this. Anything released beyond the confines
>> of his personal workspace will need full Unicode support, otherwise it
>> is a problem to the rest of the world, and should be destroyed with
>> fire. Thanks.
>
>
> I don't agree. If I distribute some text in the form of a series of ASCII
> byte values (eg. classic TXT format, with either kind of line separator),
> then that same data can be directly interpreted as UTF-8.

What you call "classic TXT format" is still an encoding, which means
you're acknowledging the difference between characters and bytes -
that's the first step. But you have to be certain that you are
interpreting it as UTF-8, in which case ASCII ceases to be
significant, and what you've done is declare that your file consists
of a stream of UTF-8-encoded Unicode characters, divided into lines
with either U+000D U+000A or just U+000A. That's a nice clear encoding
definition.
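A minimal Python 3 sketch of that encoding definition, for illustration only: decode the bytes strictly as UTF-8, split on U+000A, and drop a U+000D left at the end of a line. The helper name `decode_txt` is hypothetical. It deliberately avoids `str.splitlines()`, which also breaks lines on a lone U+000D and on characters such as U+000B, U+000C and U+2028, and so would not match the definition exactly.

```python
from typing import List


def decode_txt(data: bytes) -> List[str]:
    """Decode bytes declared as UTF-8 text with CRLF or LF line endings.

    Hypothetical helper; it implements only the definition given above.
    """
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError instead of
    # silently producing mojibake.
    text = data.decode("utf-8")
    lines = text.split("\n")          # U+000A ends a line
    if lines and lines[-1] == "":
        lines.pop()                   # a final terminator doesn't start a new line
    # A line ended by U+000D U+000A keeps the U+000D after the split; drop it.
    return [line[:-1] if line.endswith("\r") else line for line in lines]


ascii_bytes = b"plain ASCII text"
# Pure-ASCII bytes decode to the same characters under either codec.
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

print(decode_txt(b"first\r\nsecond\nna\xc3\xafve\n"))  # ['first', 'second', 'naïve']
```

That shared decoding of pure-ASCII input is the sense in which ASCII data "is" UTF-8; the point is that you have decided on UTF-8, and a strict decode will refuse bytes that aren't valid UTF-8 rather than guess.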

And the difference between characters and bytes is only the first step
(albeit the biggest and most important step). You _need_ to make sure
that you're thinking about text as text, and that means being aware of
RTL vs LTR, combining characters, case conversions, collations, etc,
etc, etc, all in terms of Unicode rather than as eight-bit or
seven-bit characters. (For example, a naïve MUD client might assume
that one byte is one character is 8 pixels of width. I know this,
because some years ago I wrote one exactly like that (well, the figure
"8" came from measuring the current font, but other than at font
changes, it was fixed). An intelligent Unicode-aware MUD client has to
cope not only with variable width, but also with characters that have
no width at all, with those that occupy the same space as their base
character, and with those that are placed to the left of the preceding
character.) You can't ignore this, although you might be able to leave
full support for later - but it's a bug until you do.
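To make the bytes/characters/columns distinction concrete, here is a rough Python 3 sketch using only the standard `unicodedata` module. The helper `display_width` is hypothetical and only an approximation, assuming a terminal-style fixed-width grid; a graphical client like the one described would measure real glyphs. It also ignores reordering (right-to-left text, or vowel signs drawn to the left of the preceding consonant) and collation entirely.

```python
import unicodedata


def display_width(text: str) -> int:
    """Rough count of fixed-width columns needed to show `text`.

    Hypothetical helper and a deliberate simplification: nonspacing and
    enclosing marks (categories Mn, Me) and format characters (Cf) count as
    0 columns, East Asian Wide/Fullwidth characters as 2, everything else as 1.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # drawn over the preceding base character, or invisible
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
print(len(decomposed.encode("utf-8")))                # bytes: 6
print(len(decomposed))                                # code points: 5
print(display_width(decomposed))                      # columns: 4
print(len(unicodedata.normalize("NFC", decomposed)))  # 4: "e" + U+0301 becomes U+00E9
print(display_width("日本語"))                         # 6: wide characters take 2 columns
print("straße".upper())                               # STRASSE
```

The same string needs 6 bytes, 5 code points and 4 columns, which is exactly the gap a "one byte is one character is 8 pixels" client falls into. The last line shows that case conversion isn't a length-preserving, per-character operation either.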

ChrisA



More information about the Python-list mailing list