Newbie question about text encoding

Marko Rauhamaa marko at pacujo.net
Sat Mar 7 12:14:28 EST 2015


Chris Angelico <rosuav at gmail.com>:

> If you really REALLY can't use the bytes() type to work with something
> that is, yaknow, bytes, then you could use an alternative encoding
> that has a value for every byte. It's still not Unicode text, so it
> doesn't much matter which encoding you use. But it's much better to
> use the bytes type to work with bytes. It is not text, so don't treat
> it as text.

See:

   $ mkdir /tmp/xyz
   $ touch /tmp/xyz/$'\x80'
   $ python3
   Python 3.3.2 (default, Dec  4 2014, 12:49:00) 
   [GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import os
   >>> os.listdir('/tmp/xyz')
   ['\udc80']
   >>> open(os.listdir('/tmp/xyz')[0])
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   FileNotFoundError: [Errno 2] No such file or directory: '\udc80'
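The traceback above most likely has a simpler cause than the encoding. os.listdir() returns bare names, so open() looks for '\udc80' in the current working directory rather than in /tmp/xyz. Below is a minimal sketch, assuming Linux (or another POSIX system whose file system accepts arbitrary bytes in names) and a temporary directory in place of /tmp/xyz. It shows that the surrogate-escaped str name round-trips to the original byte via os.fsencode() and opens fine once it is joined with its directory:

```python
import os
import tempfile

# Assumption: a POSIX file system that accepts arbitrary bytes in names.
# A temporary directory stands in for /tmp/xyz from the session above.
with tempfile.TemporaryDirectory() as tmpdir:
    # Create a file whose name is the single byte 0x80 (not valid UTF-8).
    with open(os.path.join(os.fsencode(tmpdir), b'\x80'), 'wb'):
        pass

    name = os.listdir(tmpdir)[0]
    print(repr(name))         # the undecodable byte surfaces as a lone surrogate
    print(os.fsencode(name))  # surrogateescape turns it back into b'\x80'

    # listdir() returns bare names; open(name) alone would search the current
    # directory, so join the name with the directory it came from.
    with open(os.path.join(tmpdir, name), 'rb') as f:
        data = f.read()
    print(data)               # the file exists and is empty
```

The str name is not real text, but it is still a faithful stand-in for the bytes on disk as long as it is only handed back to os functions.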

File names encoded in a Latin-X (ISO 8859-x) charset are quite commonplace,
even in UTF-8 locales.
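Chris's fallback of "an alternative encoding that has a value for every byte" usually means Latin-1, whose codec maps each of the 256 byte values to a code point and so never fails. Here is a minimal sketch, assuming a POSIX file system, that combines both of his suggestions. It keeps names as bytes by passing a bytes path to os.listdir(), and it decodes only for display through a hypothetical display_name() helper that tries UTF-8 first and falls back to Latin-1:

```python
import os
import tempfile

def display_name(raw: bytes) -> str:
    """Hypothetical helper: decode a raw file name for display only.

    Tries UTF-8 first, then falls back to Latin-1, which decodes any byte
    sequence. The fallback is a guess (the name might really be Latin-2,
    CP1252, ...), so file system calls should keep using the raw bytes.
    """
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')

with tempfile.TemporaryDirectory() as tmpdir:
    bdir = os.fsencode(tmpdir)
    # The same word twice: once UTF-8 encoded, once Latin-1 encoded (0xE9).
    for raw in (b'caf\xc3\xa9', b'caf\xe9'):
        with open(os.path.join(bdir, raw), 'wb'):
            pass

    # A bytes argument makes os.listdir() return the names as raw bytes.
    shown = {}
    for raw in sorted(os.listdir(bdir)):
        shown[raw] = display_name(raw)
        print(raw, '->', ascii(shown[raw]))
```

Note that the two distinct files end up with the same display string, which is one more reason to treat the bytes, not the decoded text, as the file's identity.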


Marko