[Distutils] PEP 517: Open questions around artifact export directories

Nathaniel Smith njs at pobox.com
Tue Jun 13 04:22:26 EDT 2017


On Tue, Jun 13, 2017 at 12:59 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>
> On 12 June 2017 at 23:42, Donald Stufft <donald at stufft.io> wrote:
> >
> > As always, it’s complicated —
> > https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/
>
> Lol. But I still consider falling over on normalisation rules an
> improvement over falling over on unrepresentable characters ;-) And I
> assume that the stdlib zipfile module at least does the right thing
> with Unicode filenames (UTF-8 encoding, recorded properly in the zip).

Yeah, it looks like the stdlib (at least in master) does something like:

- if the filename is valid ascii, then leave the utf-8 flag unset (which
  readers take to mean cp437)
- otherwise, use utf-8 and flag it as such

I guess the idea is that this way, the resulting zip files can be read
correctly by well-behaved readers that pay attention to the flags, and by
readers that ignore the flags and assume everything is utf-8, and (if
the filenames happen to fit in ascii) by ancient readers that only
know cp437 and blow up when they see the utf-8 flag.
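
A quick way to check the flag behaviour described above is to write a
couple of members to an in-memory zip and read the flags back. This is
only a sketch: the helper name utf8_flag_by_name and the sample
filenames are made up for illustration, and 0x800 is general-purpose
bit 11, the "filename is UTF-8" flag in the zip format.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: "filename is UTF-8"


def utf8_flag_by_name(names):
    """Write empty members with the given names, then report each one's UTF-8 flag.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    # Re-open the archive so the flags come from what was actually written.
    with zipfile.ZipFile(buf) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


if __name__ == "__main__":
    sample = ["ascii.txt", "caf\u00e9.txt"]  # made-up filenames
    for name, flagged in utf8_flag_by_name(sample).items():
        print(repr(name), "utf-8 flag set" if flagged else "utf-8 flag not set")
```

On a Python 3 stdlib the ascii-only name should come back without the
flag and the non-ascii name with it.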

Though - we should probably mandate sometime/somewhere that the
filenames in a wheel are always in UTF-8 and also that they follow
some particular normalization (NFC?). And likewise for sdists, though
actually this is a place where zip is technically superior to tar:
AFAIK tar has no way to indicate the filename encoding at all. I guess
this could be an argument for making zip the standard sdist format as
well, but I really don't care either way so I'm not going to fuss
about it.
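
To make the normalization point concrete, here is a rough sketch of
what a check for such a rule could look like, assuming the rule ends up
being NFC. Nothing like this is specified anywhere yet; the helper name
non_nfc_names and the sample filenames are hypothetical.

```python
import unicodedata


def non_nfc_names(names):
    """Return the names that are not already in NFC form.

    Hypothetical helper: assumes the eventual rule would be
    "filenames must be NFC-normalized".
    """
    return [n for n in names if unicodedata.normalize("NFC", n) != n]


if __name__ == "__main__":
    names = [
        "cafe\u0301.py",  # 'e' followed by a combining acute accent (not NFC)
        "caf\u00e9.py",   # precomposed e-acute (already NFC)
        "plain.py",
    ]
    for bad in non_nfc_names(names):
        print("not NFC:", repr(bad))
```

The two accented names render identically but encode to different
UTF-8 byte sequences, which is exactly the kind of mismatch a
normalization rule would rule out.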

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

