Newbie question about text encoding
Marko Rauhamaa
marko at pacujo.net
Sat Mar 7 12:14:28 EST 2015
Chris Angelico <rosuav at gmail.com>:
> If you really REALLY can't use the bytes() type to work with something
> that is, yaknow, bytes, then you could use an alternative encoding
> that has a value for every byte. It's still not Unicode text, so it
> doesn't much matter which encoding you use. But it's much better to
> use the bytes type to work with bytes. It is not text, so don't treat
> it as text.
See:
$ mkdir /tmp/xyz
$ touch /tmp/xyz/$'\x80'
$ python3
Python 3.3.2 (default, Dec 4 2014, 12:49:00)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir('/tmp/xyz')
['\udc80']
>>> open(os.listdir('/tmp/xyz')[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '\udc80'
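For anyone wondering where '\udc80' comes from: as far as I understand it, this is the surrogateescape error handler from PEP 383, which Python 3 applies to file names on POSIX systems. The following is only a minimal sketch of that mapping using explicit decode/encode calls, so it does not touch the file system; the variable names are made up for illustration.

```python
# Bytes of the file name as stored on disk (not valid UTF-8 on its own).
raw_name = b"\x80"

# surrogateescape maps each undecodable byte 0x80..0xFF to a lone
# surrogate in the range U+DC80..U+DCFF instead of raising an error.
as_str = raw_name.decode("utf-8", errors="surrogateescape")
print(ascii(as_str))

# The mapping is reversible, so the original bytes can be recovered.
restored = as_str.encode("utf-8", errors="surrogateescape")
print(restored == raw_name)

# A strict encode rejects the lone surrogate, so the str is not
# ordinary Unicode text even though its type is str.
try:
    as_str.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)
```

So the str returned by os.listdir() is really the bytes in disguise, which is why it can be handed back to the OS but not safely printed or written out as UTF-8.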
File names encoded with Latin-X are quite commonplace even in UTF-8
locales.
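One way to cope with such names is to keep them as bytes (for example by passing a bytes path to os.listdir()) and decode only for display. The sketch below is a heuristic, not a recommendation: display_name is a hypothetical helper, and falling back to Latin-1 specifically is an assumption, since the actual legacy encoding of a given name cannot be detected reliably.

```python
def display_name(raw: bytes) -> str:
    """Return a printable form of a raw file name (hypothetical helper).

    Tries UTF-8 first; if the bytes are not valid UTF-8, assumes
    Latin-1, which has a character for every byte value.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# Example raw names, as os.listdir(b"...") might return them.
utf8_name = "café.txt".encode("utf-8")
latin1_name = "café.txt".encode("latin-1")

print(display_name(utf8_name))
print(display_name(latin1_name))
```

The raw bytes stay the authoritative name for open() and friends; the decoded str is used only for showing the name to a person.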
Marko