[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

"Martin v. Löwis" martin at v.loewis.de
Wed Apr 29 07:52:23 CEST 2009


>>> C. File on disk with the invalid surrogate code, accessed via the str
>>> interface, no decoding happens, matches in memory the file on disk with
>>> the byte that translates to the same surrogate, accessed via the bytes
>>> interface.  Ambiguity.
>>
>> Is that an alternative to A and B?
> 
> I guess it is an adjunct to case B, the current PEP.
> 
> It is what happens when using the PEP on a system that provides both
> bytes and str interfaces, and both get used.

Your formulation is a bit too terse for me, but please trust me
that there is *no* ambiguity in the case you construct.

By "accessed via the str interface", I assume you do something like

  fn = "some string"
  open(fn)

You are wrong in assuming that "no decoding happens" and that the name
"matches in memory the file on disk" (whatever that means - how do I
match a file on disk in memory?). What happens instead is that fn
gets *encoded* with the file system encoding and the python-escape
error handler. This will *not* produce an ambiguity.
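
A minimal sketch of that round trip, under two assumptions that are not
part of the message above: the file system encoding is UTF-8, and the
handler is spelled "surrogateescape", the name it carries in released
Python (3.1 and later) rather than the draft name "python-escape". The
byte strings are made-up examples.

```python
# Assumption: UTF-8 file system encoding; "surrogateescape" is the
# released name of the PEP 383 error handler.
ENCODING = "utf-8"
HANDLER = "surrogateescape"

# A file name containing a byte that is not valid UTF-8 on its own.
raw = b"caf\xe9.txt"

# bytes -> str: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode(ENCODING, HANDLER)

# str -> bytes: the step applied to a str file name before it reaches the OS.
encoded = name.encode(ENCODING, HANDLER)

# A different file whose name holds the three bytes that a lax encoder
# would emit for U+DCE9. Strict UTF-8 rejects them, so they are escaped
# byte by byte and yield a different str than `name`.
other_raw = b"caf\xed\xb3\xa9.txt"
other_name = other_raw.decode(ENCODING, HANDLER)
other_encoded = other_name.encode(ENCODING, HANDLER)

print(ascii(name), ascii(other_name))
```

Both names survive the trip unchanged and map to distinct strings, which
is why the surrogate in memory cannot be confused with a file on disk.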

If you think there is an ambiguity in that you can use both the
bytes interface and the str interface to access the same file:
this would be a ridiculous interpretation. *Of course* you can
access /etc/passwd both as "/etc/passwd" and b"/etc/passwd";
there is nothing ambiguous about that.
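
As an illustration only, the sketch below opens one file through both
interfaces. It uses a temporary file instead of /etc/passwd so that it
runs anywhere; the helper name and the file name "passwd-demo" are
hypothetical, and os.fsencode is from released Python (3.2 and later),
not from this thread.

```python
import os
import tempfile


def open_both_ways():
    """Open one file via a str path and a bytes path; compare the results."""
    with tempfile.TemporaryDirectory() as directory:
        path_str = os.path.join(directory, "passwd-demo")  # hypothetical name
        with open(path_str, "w") as f:
            f.write("root:x:0:0\n")

        # The same path, encoded with the file system encoding.
        path_bytes = os.fsencode(path_str)

        with open(path_str) as via_str, open(path_bytes) as via_bytes:
            same_content = via_str.read() == via_bytes.read()

        # Both spellings resolve to the same file on disk.
        same_file = os.path.samefile(path_str, path_bytes)
        return same_content, same_file


print(open_both_ways())
```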

Regards,
Martin

