Everything you did not want to know about Unicode in Python 3

Mark Lawrence breamoreboy at yahoo.co.uk
Mon May 12 22:33:02 EDT 2014


On 13/05/2014 02:18, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
>
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>>
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>>
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>>
>> Surely those example programs are not the Pythonic way to do things, or
>> am I missing something?
>
> Feel free to show us your version of "cat" for Python then. Feel free to
> target any version you like. Don't forget to test it against files with
> names and content that:
>
> - aren't valid UTF-8;
>
> - are valid UTF-8, but not valid in the local encoding.
>
>
>
>> If those code samples are anything to go by, this guy makes JMF look
>> sensible.
>
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
>
> Unicode is hard, not because Unicode is hard, but because of legacy
> problems. I can create a file on a machine that uses ISO-8859-7 for the
> file name, put Shift-JIS encoded text inside it, transfer it to a
> machine that uses Windows-1251 as the file system encoding, then SSH into
> that machine from a system using Big5, and try to make sense of it. If
> everybody used UTF-8 any time data touched a disk or network, we'd be
> laughing. It would all be so simple.
>
> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
>    the file names from that;
>
> - have a simple way to write bytes to stdout and stderr.
>
> Most programs won't need either of those, but file system utilities will.
>

I think http://bugs.python.org/issue8776 and 
http://bugs.python.org/issue8775 are relevant but both were placed in 
the small round filing cabinet.
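
For what it's worth, both of the things Steven lists can be approximated
in Python 3 as it stands, at least on POSIX. The sketch below is only an
illustration, not anything taken from Armin's post: argv_as_bytes() and
cat() are names I made up. It assumes the interpreter decoded sys.argv
with the filesystem encoding and the surrogateescape error handler, so
that os.fsencode() recovers the original bytes, and it writes through
sys.stdout.buffer so that no decoding of file content ever happens.

```python
import os
import shutil
import sys


def argv_as_bytes(args):
    """Return the raw bytes behind each str command-line argument.

    Assumption: the arguments were decoded with the filesystem encoding
    and surrogateescape (the POSIX behaviour), which os.fsencode() reverses.
    """
    return [os.fsencode(arg) for arg in args]


def cat(filenames, out):
    """Copy each named file to the binary stream `out`, byte for byte."""
    for name in filenames:
        # open() accepts bytes paths, so undecodable names are not a problem.
        with open(name, "rb") as f:
            shutil.copyfileobj(f, out)


if __name__ == "__main__":
    # sys.stdout.buffer is the underlying binary stream of the text-mode stdout.
    cat(argv_as_bytes(sys.argv[1:]), sys.stdout.buffer)
```

Error messages that need to mention an undecodable file name could go to
sys.stderr.buffer in the same way. None of this helps on Windows, where
the command line is not bytes to begin with.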

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence
