[Python-Dev] Python-3.0, unicode, and os.environ
Terry Reedy
tjreedy at udel.edu
Tue Dec 9 00:58:09 CET 2008
M.-A. Lemburg wrote:
>> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>> try:
>>>     files = os.listdir(somedir, errors='strict')
>>> except OSError as e:
>>>     log(<verbose error message that includes somedir and e>)
>>>     files = os.listdir(somedir)
> If that error parameter is the same as in unicode(value, errors),
> then this would be a useful feature:
Except that unicode becomes str in 3.0, that is exactly my intention.
> People could then choose among the already existing error handlers
> ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register
> their own ones via the codecs module.
These could be passed through from listdir or getenv to str.
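To make the pass-through idea concrete, here is a minimal sketch of what the existing handlers do when decoding a byte string, plus a handler registered through the codecs module. The name 'hexescape' and the function hex_escape are made up for illustration; nothing here is a proposed listdir or getenv signature. Note that 'xmlcharrefreplace' only applies when encoding, so it is left out.

```python
import codecs

raw = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8

# 'strict' raises on the undecodable byte.
try:
    raw.decode("utf-8", "strict")
    strict_result = None
except UnicodeDecodeError as exc:
    strict_result = exc

# 'ignore' drops the bad byte, 'replace' substitutes U+FFFD.
ignored = raw.decode("utf-8", "ignore")
replaced = raw.decode("utf-8", "replace")


def hex_escape(exc):
    """Hypothetical handler: show each undecodable byte as <XX>."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position to resume decoding at.
    return "".join("<%02X>" % b for b in bad), exc.end


codecs.register_error("hexescape", hex_escape)
escaped = raw.decode("utf-8", "hexescape")
```

Once registered, 'hexescape' is looked up by name exactly like the built-in handlers, which is why a plain string argument would be enough to pass it through.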
[Side questions:
1. 'xmlcharrefreplace' is not in the 3.0 LibRef doc or doc string.
Should it be, or is 'xmlcharrefreplace' an addition for a later version?
2. A garbage value for errors (such as 'blah') is silently ignored (so I
cannot test the above). Intended or a bug?]
Someone else proposed a new option 'warn', which Guido has accepted as
the default instead of the current 'ignore'. It could not be passed
through (unless str were changed or a handler were registered). I believe
the implementation of that would be to call str with 'strict' but catch
errors and warn instead. Whether there should be 1 warning for each
problematic byte string encountered or 1 for each listdir (or whatever)
call, possibly with the number of problems, I leave to others to decide.
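A rough sketch of that 'warn' behavior, assuming it is built on a strict decode plus the warnings module. The function decode_names_warn and its per_name flag are hypothetical; the flag only exists to show both warning policies side by side.

```python
import warnings


def decode_names_warn(raw_names, encoding="utf-8", per_name=True):
    """Decode byte names strictly; skip failures and warn about them.

    per_name=True  -> one warning per undecodable name
    per_name=False -> one summary warning per call, with a count
    """
    decoded = []
    skipped = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError as exc:
            skipped.append(raw)
            if per_name:
                warnings.warn(
                    "cannot decode %r: %s" % (raw, exc),
                    UnicodeWarning,
                    stacklevel=2,
                )
    if skipped and not per_name:
        warnings.warn(
            "%d name(s) could not be decoded" % len(skipped),
            UnicodeWarning,
            stacklevel=2,
        )
    return decoded
```

Either way the undecodable names are omitted from the result, as with the current 'ignore'; the only difference is that the caller is told.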
> Such application specific error handlers could then also apply
> whatever fancy round-trip safe encoding of non-decodable bytes
> to Unicode escapes, private code points, etc. as seen fit by the
> application.
>
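As an illustration of such an application-specific handler, here is one possible sketch that maps each undecodable byte to a private-use code point and back. The offset PUA_BASE, the handler name 'puaescape', and the helpers decode_name and encode_name are all assumptions made for this example.

```python
import codecs

PUA_BASE = 0xF700  # assumption: arbitrary offset inside the private use area


def pua_escape(exc):
    """Hypothetical handler: map each bad byte b to chr(PUA_BASE + b)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("puaescape", pua_escape)


def decode_name(raw, encoding="utf-8"):
    """Decode bytes, escaping undecodable bytes into private code points."""
    return raw.decode(encoding, "puaescape")


def encode_name(name, encoding="utf-8"):
    """Reverse decode_name: turn escaped code points back into raw bytes."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode(encoding)
    return bytes(out)
```

The round trip is only safe if the original names never contain validly encoded characters in the chosen private range, since those would be turned into single bytes on the way back.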
> Perhaps we should also add an 'encoding' parameter that can be
> set on a per directory basis (if necessary) and defaults to the
> global file system encoding.
That could also be passed through, but I will let others make the
argument for it.
>
> If an application hits a directory that is known to cause problems,
> it could then choose to receive the file names in a different,
> more suitable encoding. This allows implementing fallback
> mechanisms with a list of common encodings for a locale.
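The fallback mechanism described above could be sketched along these lines, working on byte names such as those returned by os.listdir when it is given a bytes path. The function decode_with_fallbacks and the default encoding list are assumptions for the example, not a proposed API.

```python
def decode_with_fallbacks(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order; return (text, encoding_used).

    If every strict attempt fails, decode with the last encoding and
    'replace' so the caller always gets a string back.
    """
    for enc in encodings:
        try:
            return raw.decode(enc, "strict"), enc
        except UnicodeDecodeError:
            continue
    return raw.decode(encodings[-1], "replace"), encodings[-1]
```

The order matters: a permissive codec such as latin-1 accepts every byte, so it belongs at the end of the list, after the encodings that can actually reject bad input.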
Terry Jan Reedy