Python 3 is killing Python
Marko Rauhamaa
marko at pacujo.net
Wed Jul 16 09:11:26 EDT 2014
Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
> With a few exceptions, /etc is filled with text files, not binary
> files, and half the executables on the system are text (Python, Perl,
> bash, sh, awk, etc.).
Our debate seems to stem from a different idea of what text is. To me,
text in the Python sense is a sequence of UCS-4 character code points.
The opposite of text is not necessarily binary.
Most of those "text" files under /etc expect ASCII. In many contexts,
they tolerate UTF-8 or Latin-3 or whatever, but it's a bit iffy (how are
extra-ASCII passwords encoded in the /etc/shadow?). Also, the files
under /etc, /var/log etc should not depend on the locale since they are
typically interpreted by daemons, which typically don't possess locales.
> Relatively rare. Like, um, email, news, html, Unix config files,
> Windows ini files, source code in just about every language ever,
> SMSes, XML, JSON, YAML, instant messenger apps,
I would be especially wary of letting Python 3 interpret those files for
me. Python's [text] strings could be a wonderful tool on the inside of
my program, but I definitely would like to micromanage the I/O. Do I
obey the locale or not? That's too big (and painful) a question for
Python to answer on its own (and pretend like everything's under
control).
> word processors... even *graphic* applications invariably have a text
> tool.
Thing is, the serious text utilities like word processors probably need
lots of ancillary information so Python's [text] strings might be too
naive to represent even a single character.
>> More often, len(b'λ') is what I want.
>
> Oh really? Are you sure? What exactly is b'λ'?
That's something that ought to work in the UTF-8 paradise.
Unfortunately, Python only allows ASCII in bytes. ASCII only! In this
day and age! Even C is not so picky:
#include <stdio.h>
int main()
{
printf("Hyvää yötä\n");
return 0;
}
Marko
More information about the Python-list
mailing list