[Python-Dev] Python-3.0, unicode, and os.environ

Adam Olsen rhamph at gmail.com
Mon Dec 8 23:25:03 CET 2008


On Mon, Dec 8, 2008 at 2:44 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-12-08 22:32, Adam Olsen wrote:
>> On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 2008-12-08 21:45, Antoine Pitrou wrote:
>>>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>>>> Such application specific error handlers could then also apply
>>>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>>>> to Unicode escapes, private code points, etc. as seen fit by the
>>>>> application.
>>>> I'd argue that such fancy round-trip safe error handler should be provided by
>>>> Python. It's not reasonable to expect application coders to come up with their
>>>> own codec variation based on subtle details of the unicode spec.
>>> Fair enough. We could add some e.g.
>>>
>>>  * a round-trip safe escape error handler that uses a Unicode private
>>>   code point area which we officially reserve for the Python
>>>   interpreter
>>
>> This would of course alter the behaviour of those private code points,
>> preventing them from round-tripping properly.
>>
>> I don't think round-tripping can be done from an error handler.  You
>> need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
>> bytes.  This has long since gotten repetitive..
>
> The error handler would just map the problem bytes to the private
> area. The application would then have to decide what to do with
> them, ie. the error handler only provides one half of the round-
> tripping.

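By that point it's already too late.  Once the handler has run, a PUA
code point produced from an undecodable byte is indistinguishable from
a PUA code point that was legitimately present in the input.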
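
A small sketch of the conflation, assuming a handler that maps each bad
byte to U+E000 plus the byte value.  Both PUA_BASE and the handler name
"pua_escape_demo" are invented for this illustration; they are not part
of any actual proposal:

```python
import codecs

PUA_BASE = 0xE000  # assumption: start of the BMP Private Use Area


def pua_escape(exc):
    """Map each undecodable byte to a private-use code point."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua_escape_demo", pua_escape)

garbage = b"\x80"                   # lone continuation byte, invalid UTF-8
legit = "\ue080".encode("utf-8")    # a genuine U+E080 from the input

from_garbage = garbage.decode("utf-8", "pua_escape_demo")
from_legit = legit.decode("utf-8", "pua_escape_demo")

# Two different byte strings decode to the same text, so no encoder can
# know which bytes to write back.
print(from_garbage == from_legit)
```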

To make it work you need to treat those legitimate PUA scalars as
errors too, and transform them as well.  Ordinary escaping does the
same thing: a single '\' in the input becomes '\\', so the escape
character itself never appears unescaped.
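
Roughly, that could look like the following.  This is only a sketch
under my own assumptions: ESC (U+E000) is an arbitrarily chosen escape
scalar, and decode_escaped/encode_escaped are hypothetical names.  Note
that it is a full decode/encode pair rather than an error handler,
because the valid input has to be rewritten too:

```python
ESC = "\ue000"  # assumption: one PUA scalar reserved as the escape character


def decode_escaped(raw: bytes) -> str:
    """Decode UTF-8; bad bytes become ESC + chr(byte), real ESC is doubled."""
    out = []
    pos = 0
    while pos < len(raw):
        rest = raw[pos:]
        try:
            good, bad = rest.decode("utf-8"), b""
            pos = len(raw)
        except UnicodeDecodeError as exc:
            # Everything before exc.start is valid; exc.start:exc.end is not.
            good = rest[:exc.start].decode("utf-8")
            bad = rest[exc.start:exc.end]
            pos += exc.end
        out.append(good.replace(ESC, ESC + ESC))
        out.extend(ESC + chr(b) for b in bad)
    return "".join(out)


def encode_escaped(text: str) -> bytes:
    """Inverse of decode_escaped."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != ESC:
            out += ch.encode("utf-8")
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling escape character")
        nxt = text[i + 1]
        if nxt == ESC:
            out += ESC.encode("utf-8")   # doubled ESC: a literal U+E000
        elif ord(nxt) <= 0xFF:
            out.append(ord(nxt))         # ESC + chr(b): the raw byte b
        else:
            raise ValueError("invalid escape sequence")
        i += 2
    return bytes(out)


raw = b"ok\x80" + "\ue000".encode("utf-8") + b"\xff"
print(encode_escaped(decode_escaped(raw)) == raw)
```

The cost is the same as with backslashes: a legitimate U+E000 no
longer looks like itself in the decoded string.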

Hrm.  NUL-escaping should work, since filesystem names cannot contain
a NUL.  Obviously it can't be used outside the filesystem though, as
other data may contain a legitimate NUL.
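
A minimal sketch of what I mean, again with made-up names (fs_decode,
fs_encode and the "nul_escape_demo" handler are hypothetical).  Since
NUL can't occur in the input, the decode side gets away with a plain
error handler; the encode side still needs its own function:

```python
import codecs


def nul_escape(exc):
    """Replace each undecodable byte with NUL followed by chr(byte)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\x00" + chr(b) for b in bad), exc.end


codecs.register_error("nul_escape_demo", nul_escape)


def fs_decode(raw: bytes) -> str:
    # The scheme relies on NUL never appearing legitimately in the input.
    if b"\x00" in raw:
        raise ValueError("input contains NUL; NUL-escaping is not safe here")
    return raw.decode("utf-8", "nul_escape_demo")


def fs_encode(text: str) -> bytes:
    out = bytearray()
    chars = iter(text)
    for ch in chars:
        if ch == "\x00":
            nxt = next(chars, None)
            if nxt is None or ord(nxt) > 0xFF:
                raise ValueError("invalid NUL escape")
            out.append(ord(nxt))         # restore the original raw byte
        else:
            out += ch.encode("utf-8")
    return bytes(out)


name = b"caf\xe9.txt"                    # Latin-1 bytes, not valid UTF-8
text = fs_decode(name)
print(fs_encode(text) == name)
```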


-- 
Adam Olsen, aka Rhamphoryncus

