[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

Steven D'Aprano steve at pearwood.info
Wed Oct 1 02:04:39 CEST 2008


On Wed, 1 Oct 2008 09:21:37 am you wrote:
> On Tue, Sep 30, 2008 at 4:08 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> > On Wed, 1 Oct 2008 07:40:01 am Martin v. Löwis wrote:
> >> >> On Windows, we might reject bytes filenames for all file
> >> >> operations: open(), unlink(), os.path.join(), etc. (raise a
> >> >> TypeError or UnicodeError)
> >> >
> >> > Since I've seen no objections to this yet: please no. If we
> >> > offer a "lower-level" bytes filename API, it should work for all
> >> > platforms.
> >>
> >> Unfortunately, it can't. You cannot represent all possible file
> >> names in a byte string in Windows (just as you can't do so in a
> >> Unicode string on Unix).
> >
> > Sorry, maybe I'm just being thick here, but I don't understand how
> > that is possible. On the physical disk, each Windows file name must
> > be represented by a byte string, yes? So how is it possible that
> > there are Windows files with names that can't be represented as a
> > byte string? What have I missed?
>
> I believe on disk it uses UTF-16.

Which is made up of bytes. There may be byte sequences that are illegal 
UTF-16, but that's not what Martin said. I don't understand how there 
can be UTF-16 sequences which don't correspond to some sequence of 
bytes. How would they be represented in memory? Is this to do with the 
endianness of the UTF-16 sequence?
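
A minimal sketch of the distinction as I read it, assuming Python 3. Here "byte string" is taken to mean bytes interpreted through a legacy 8-bit code page, as the Windows ANSI file APIs do. cp1252 stands in for that code page and UTF-8 for the Unix filesystem encoding. The helper names and sample file names are hypothetical, chosen only for illustration:

```python
from typing import Tuple

# Hypothetical helpers: "cp1252" stands in for an assumed Windows ANSI
# code page, "utf-8" for an assumed Unix filesystem encoding.

def utf16_bytes(name: str) -> Tuple[bytes, bytes]:
    """Encode a name as UTF-16 in little- and big-endian byte order."""
    return name.encode("utf-16-le"), name.encode("utf-16-be")


def fits_code_page(name: str, codepage: str = "cp1252") -> bool:
    """Return True if the name can be encoded losslessly in the code page."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes form valid UTF-8 text."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "\u03a9mega.txt"  # starts with Greek capital omega, absent from cp1252

# Windows side: the UTF-16 form is always some sequence of bytes,
# and either byte order decodes back to the same name.
le, be = utf16_bytes(name)
print(le != be)                                                   # True
print(le.decode("utf-16-le") == name == be.decode("utf-16-be"))  # True

# ...but the same name has no byte-string form in the assumed code page.
print(fits_code_page(name))                                       # False

# Unix side: a byte name need not be valid text in the assumed encoding.
print(decodes_as_utf8(b"caf\xe9.txt"))                            # False
```

Under these assumptions, endianness only changes the order of the bytes, not whether a name can be represented. The failure comes from the step that maps the name into a particular 8-bit encoding.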


-- 
Steven

