[Python-Dev] Python-3.0, unicode, and os.environ

Mon Dec 8 15:54:44 CET 2008

On 2008-12-06 01:48, Nick Coghlan wrote:
> You can't display a non-decodable filename to the user, hence the user
> will have no idea what they're working on. Non-filesystem related apps
> have no business trying to deal with insane filenames.

This is not entirely true: OSes, shells, and applications will
typically represent the file names using either ?-replacements or
some form of hex or decimal escapes for the characters they can't
decode. Since humans are usually very good at pattern recognition,
this goes a long way.
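
As a rough illustration of the two display strategies mentioned above, here is a minimal sketch. The file name and the helper names are made up for the example, and it relies on the "backslashreplace" handler being usable for decoding, which is only true in Python 3 releases much later than 3.0 (3.5 and up):

```python
# Hypothetical raw file name: Latin-1 bytes that are not valid UTF-8.
raw_name = b"caf\xe9.txt"


def display_with_replacement(name: bytes, encoding: str = "utf-8") -> str:
    """Decode for display, showing '?' for every undecodable byte."""
    # errors="replace" yields U+FFFD; map it to a plain '?' for display.
    return name.decode(encoding, errors="replace").replace("\ufffd", "?")


def display_with_hex_escapes(name: bytes, encoding: str = "utf-8") -> str:
    """Decode for display, showing undecodable bytes as \\xNN escapes."""
    return name.decode(encoding, errors="backslashreplace")


print(display_with_replacement(raw_name))   # caf?.txt
print(display_with_hex_escapes(raw_name))   # caf\xe9.txt
```

Either form is lossy or at least not the real name, but it is enough for a human to recognize the file.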

Of course, how the application maps that partially converted file name
back to the real thing is another issue, and that's something
Python should not make harder than it needs to be.
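
One possible way to keep that mapping lossless is sketched below. It assumes the "surrogateescape" error handler, which did not exist in Python 3.0 and was only added later (PEP 383, Python 3.1), so take it as an illustration of the idea rather than something available at the time of writing. The file name is hypothetical:

```python
# Hypothetical raw file name: Latin-1 bytes that are not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Decoding: each undecodable byte is smuggled through as a lone
# surrogate code point (0xE9 becomes U+DCE9) instead of raising.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogates back into the
# original bytes, so the real file name can be recovered exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

print(round_tripped == raw_name)
```

The str value is not safe to print as-is, but it can be passed around and converted back to the exact bytes the OS handed out.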

> Linux is moving towards a standard of UTF-8 for filenames, and once we
> get to the point where the idea of encoding filenames and environment
> variables any other way is seen as crazy, then the Python 3 approach
> will work seamlessly.

It's going to take a long time before file names, environment variables
and command line parameters are all encoded using UTF-8, so "practicality
beats purity" will have to get more attention in this thread.

Python APIs should work out of the box most of the time.

Currently, if you live in a non-ASCII and non-pure-UTF-8 environment,
you have to deal with different and mixed encodings on a regular
basis.

Whether it's a USB stick you're trying to read, a ZIP file
you're trying to open, or a mounted network drive, the problem
pops up in many different areas.

If I write "do_something.py *" I expect Python to indeed work on
all the files in my directory, not just the ones that happen to
fit a particular encoding.
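
To make the expectation concrete, here is a minimal sketch of a script that sees every directory entry by asking for bytes names and decoding only for display. The helper names are hypothetical, and os.fsencode() as well as decoding with "backslashreplace" are assumptions that only hold for later Python 3 releases, not 3.0:

```python
import os
import tempfile


def list_all_names(directory: str) -> list:
    """Return every entry in `directory` as raw bytes, nothing skipped."""
    # Passing a bytes path makes os.listdir() return bytes names,
    # so no entry depends on being decodable.
    return sorted(os.listdir(os.fsencode(directory)))


def printable(name: bytes) -> str:
    """Decode only for display; undecodable bytes show up as \\xNN."""
    return name.decode("utf-8", errors="backslashreplace")


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    for name in list_all_names(tmp):
        print(printable(name))
```

The work itself is done on the bytes names; the decoded form is used for messages only.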

If I hook up a CGI script written in Python with a web server,
I expect all data to be received by the script, not just data
that happens to be UTF-8 encoded.

> In the meantime, raw bytes APIs will provide an alternative for those
> that disagree with that philosophy.

I think that's the wrong way to put it: the problems are not made
up by people who disagree with the one-encoding-for-everything
strategy.

The problems occur in real-life IT processing all the time - maybe
not so much in places where English scripts dominate, but certainly
in most other places with non-English scripts.

-- 
Marc-Andre Lemburg
eGenix.com