[Python-Dev] PEP 383 and GUI libraries

Fri May 1 11:06:08 CEST 2009

Zooko O'Whielacronx wrote:
> [snip...]
> Would it be possible for Python unicode objects to have a flag
> indicating whether the 'python-escape' error handler was present?  That
> would serve the same purpose as my "failed_decode" flag above, and would
> basically allow me to use the Python APIs directly and make all this
> work-around code disappear.
>
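For illustration, the per-name flag described here can be approximated in Python 3 without changing the built-in str type. This is a minimal sketch that assumes a hypothetical `FName` subclass of `str` carrying a `failed_decode` attribute; Tahoe's real `FName` class is not shown in the message and may differ.

```python
class FName(str):
    """Hypothetical filename type: a str that remembers whether decoding failed.

    Assumption: this only mirrors the 'failed_decode' idea from the message;
    it is not Tahoe's actual class.
    """

    def __new__(cls, value, failed_decode=False):
        obj = super().__new__(cls, value)
        obj.failed_decode = failed_decode  # str subclasses get a __dict__
        return obj


good = FName("notes.txt")
bad = FName(b"caf\xe9.txt".decode("latin-1"), failed_decode=True)
print(good.failed_decode, bad.failed_decode)  # False True
print(bad == "caf\u00e9.txt")                 # True: still compares as a str
```

One caveat with this design: operations such as concatenation or slicing return plain `str` objects, so the flag does not survive string manipulation. That hints at why a flag on every unicode object would be a larger change than it first appears.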
> Failing that, I can't see any way to use the os.listdir() in its
> unicode-oriented mode to satisfy Tahoe's requirements.
>
> If you take the above code and then add the fact that you want to use
> the failed_decode flag when *encoding* the d argument to os.listdir(),
> then you get this code: [2].
>
> Oh, I just realized that I *could* use the PEP 383 os.listdir(), like
> this:
>
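> import os
> import sys
>
> def listdir(d):
>     fse = sys.getfilesystemencoding()
>     if fse == 'utf-8b':
>         fse = 'utf-8'
>     ns = []
>     for fn in os.listdir(d):
>         # Recover the original bytes of the name.
>         raw = fn.encode(fse, 'python-escape')
>         try:
>             # Succeeds only for names valid in the filesystem encoding.
>             ns.append(FName(raw.decode(fse, 'strict')))
>         except UnicodeDecodeError:
>             # Fall back to lossless escaping and flag the name.
>             ns.append(FName(raw.decode('utf-8', 'python-escape'),
>                             failed_decode=True))
>     return ns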
>
> (And I guess I could define listdir() like this only on the
> non-unicode-safe platforms, as above.)
>
> However, that strikes me as even more horrible than the previous
> "listdir()" work-around, in part because it means decoding, re-encoding,
> and re-decoding every name, so I think I would stick with the previous
> version.
>   
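For comparison, here is a rough Python 3 sketch of the same work-around that avoids the decode, re-encode, re-decode cycle by asking `os.listdir()` for raw bytes names. It assumes the PEP 383 handler as it finally shipped (named `'surrogateescape'` rather than `'python-escape'`) and a hypothetical `FName` str subclass with a `failed_decode` attribute, defined inline so the block stands alone.

```python
import os
import sys


class FName(str):
    # Hypothetical flagged-filename type (assumption, not Tahoe's real class).
    def __new__(cls, value, failed_decode=False):
        obj = super().__new__(cls, value)
        obj.failed_decode = failed_decode
        return obj


def listdir(d):
    """List directory d, flagging names that are invalid in the FS encoding."""
    fse = sys.getfilesystemencoding()
    names = []
    # Passing a bytes path makes os.listdir() return the raw bytes names.
    for raw in os.listdir(os.fsencode(d)):
        try:
            names.append(FName(raw.decode(fse, "strict")))
        except UnicodeDecodeError:
            # Lossless fallback: undecodable bytes become lone surrogates.
            names.append(FName(raw.decode("utf-8", "surrogateescape"),
                               failed_decode=True))
    return names
```

Called as `listdir(".")`, it returns ordinary-looking strings whose `failed_decode` attribute is True only for names that did not decode strictly in the filesystem encoding.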

The current unicode mode would skip the filenames you are interested in
(those that fail to decode correctly), so you would have been forced to
use the bytes mode. If you need access to the original bytes then you
should continue to do this. PEP 383 is entirely neutral for your use
case as far as I can see.
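The point that the original bytes stay reachable can be illustrated with the error handler as it shipped in Python 3.1, where PEP 383's `'python-escape'` became `'surrogateescape'`. This sketch uses an explicit UTF-8 codec (an assumption, chosen so the result does not depend on the platform's filesystem encoding) and an invented example name.

```python
# A name that is not valid UTF-8 (0xE9 is Latin-1 'e' with acute accent).
raw = b"caf\xe9.txt"

# PEP 383 decoding: each undecodable byte becomes a lone surrogate U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Re-encoding with the same handler restores the original bytes exactly.
print(name.encode("utf-8", "surrogateescape") == raw)  # True
```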

Michael

> Oh, one more note: for Tahoe's purposes you can, in all of the code
> above, replace ".decode('utf-8', 'python-escape')" with
> ".decode('windows-1252')" and it works just as well.  While UTF-8b seems
> like a really cool hack, and it would produce more legible results if
> utf-8-encoded strings were partially corrupted, I guess I should just
> use 'windows-1252' which is already implemented in Python 2 (as well as
> in all other software in the world).
>
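One caveat worth checking before relying on `'windows-1252'` as a fallback: Python's cp1252 codec leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) unmapped, so a strict decode can still fail, whereas `'latin-1'` maps all 256 byte values and always round-trips. The small helper below is hypothetical and exists only to show the difference.

```python
def decode_fallback(raw, codec):
    """Hypothetical helper: decode raw with codec, or return None on failure."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


all_bytes = bytes(range(256))

# cp1252 has undefined positions, so decoding every byte value fails.
print(decode_fallback(all_bytes, "windows-1252"))  # None

# latin-1 maps every byte to a code point, so it always round-trips.
text = decode_fallback(all_bytes, "latin-1")
print(text.encode("latin-1") == all_bytes)  # True
```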
> I guess this means that PEP 383, which I have approved of and liked so
> far in this discussion, would actually not help Tahoe at all and would
> in fact harm Tahoe -- I would have to remember to detect and work-around
> the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python
> 3.
>
> If anyone else has a concrete, real use case which would be helped by
> PEP 383, I would like to hear about it.  Perhaps Tahoe can learn
> something from it.
>
> Oh, if this PEP could be extended to add a flag to each unicode object
> indicating whether it was created with the python-escape handler or not,
> then it would be useful to me.
>
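Without a built-in flag, one possible approximation in Python 3 is to look for the code points the escape handler produces: `'surrogateescape'` maps each undecodable byte 0x80 to 0xFF onto a lone surrogate in U+DC80..U+DCFF. This is a heuristic sketch with a hypothetical helper name; a string built from some other source could contain those code points too.

```python
def was_escaped(name):
    """Hypothetical helper: True if name holds PEP 383 escape code points."""
    return any("\udc80" <= ch <= "\udcff" for ch in name)


ok = b"caf\xc3\xa9".decode("utf-8", "surrogateescape")  # valid UTF-8
bad = b"caf\xe9".decode("utf-8", "surrogateescape")     # invalid UTF-8
print(was_escaped(ok), was_escaped(bad))  # False True
```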
> Regards,
>
> Zooko
>
> [1] http://mail.python.org/pipermail/python-dev/2009-April/089020.html
> [2] http://allmydata.org/trac/tahoe/attachment/ticket/534/fsencode.3.py
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
>   

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog