[Python-Dev] Filename as byte string in python 2.6 or 3.0?

glyph at divmod.com glyph at divmod.com
Mon Sep 29 16:01:33 CEST 2008


On 11:59 am, eckhardt at satorlaser.com wrote:
>Sorry, I wasn't clear enough. I'll try to explain further...
>
>Let's assume we have a filename like this:
>
>  0xc2 0xa9 0x2f 0x7f
>
>The first two bytes are the copyright sign encoded in UTF-8, followed by a
>slash (0x2f, path separator) and a character encoded in an unknown codepage
>(0x7f is not ASCII!).

Originally I thought that this was a valid idea, but then it became 
clear that it could be a problem.  Consider a filename which already 
includes a UTF-8 encoding of a PUA code point: once decoded, it would 
be indistinguishable from a byte that had been escaped into the PUA.
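
To make the collision concrete, here is a minimal sketch.  It assumes a purely hypothetical escaping scheme in which each undecodable byte b is mapped to the code point U+E000 + b; the base value, the handler name "pua_escape_demo" and the helper decode_filename are all invented for illustration and are not part of any actual proposal.  It uses the byte 0xff, which can never appear in valid UTF-8.

```python
import codecs

# Hypothetical base for escaped bytes; chosen only for this illustration.
PUA_BASE = 0xE000


def _escape_to_pua(exc):
    """Error handler: map each undecodable byte b to the code point PUA_BASE + b."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    raise exc


# Register the handler under an invented name so bytes.decode() can use it.
codecs.register_error("pua_escape_demo", _escape_to_pua)


def decode_filename(raw):
    """Decode a byte filename as UTF-8, escaping invalid bytes into the PUA."""
    return raw.decode("utf-8", "pua_escape_demo")


# Filename 1: a single byte that is invalid in UTF-8.
escaped_byte = b"\xff"
# Filename 2: valid UTF-8 for the PUA code point the scheme would use for 0xff.
genuine_pua = chr(PUA_BASE + 0xFF).encode("utf-8")

# Two different byte strings on disk decode to the same unicode string,
# so the original bytes cannot be recovered unambiguously.
collision = decode_filename(escaped_byte) == decode_filename(genuine_pua)
```

Under these assumptions `collision` is true even though the two byte strings differ, which is the ambiguity described above.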

>I'm not sure if the use I proposed is correct according to the intended use of
>the PUA. I know that ideally no such string would escape from Python, i.e. it
>should only be visible internally. I would guess that that is something the
>PUA was intended for.

Viewing the PUA with GNOME charmap, I can see that many code points 
there have character renderings on my Ubuntu system.  I have to assume, 
therefore, that there are other (and potentially conflicting) uses for 
this Unicode feature.
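
As a rough illustration of why that matters, the sketch below lists the private-use characters in a string.  The helper name private_use_chars is made up for this example.  It relies on the "Co" general category, which covers the private-use ranges; note that it can only say that a character is private-use, not which of the competing uses put it there.

```python
import unicodedata


def private_use_chars(name):
    """Return the characters in name whose general category is Co (private use)."""
    return [ch for ch in name if unicodedata.category(ch) == "Co"]


# A name containing one BMP private-use character and one from plane 15.
sample = "a\ue000b\U000f0000"
found = private_use_chars(sample)
```

A character found this way might be an escaped byte, a vendor glyph from some font, or anything else an application has agreed on privately.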

