Python unicode and Windows cmd.exe

Neil Hodgson nyamatongwe+thunder at gmail.com
Sun Mar 14 17:05:59 EDT 2010


Guillermo:

> I then open the file m.txt with notepad, and I see "mañana" normally.
> I save (again, no actual modifications), go back to the dos prompt, do
> type m.txt and this time it works! I get "mañana". When notepad opens
> the file, the encoding is already UTF-8, so short of a UTF-8 bom being
> added to the file, 

   That is what happens: the file now starts with a BOM \xEF\xBB\xBF as
you can see with a hex editor.
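
   If a hex editor is not at hand, a few lines of Python can do the
same check. This is only a sketch, assuming Python 3; the function name
starts_with_utf8_bom is made up for illustration, while codecs.BOM_UTF8
is the standard library constant for the bytes EF BB BF.

```python
import codecs


def starts_with_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM.

    Hypothetical helper name; the file is opened in binary mode so that
    no decoding takes place before the first three bytes are compared.
    """
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

   Running it on m.txt before and after the Notepad save should show
the difference, assuming Notepad added the BOM as described.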

> I don't know what happens when I save the
> unmodified file. Also, I would think that the python script should
> save a valid utf-8 file in the first place...

   It's just as valid UTF-8 without a BOM. People have different opinions
on this, but for compatibility I think it is best to always start UTF-8
files with a BOM.
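
   One way to get the BOM written from the script itself, sketched
here under the assumption of Python 3, is the standard "utf-8-sig"
codec: it emits the BOM when encoding and skips one, if present, when
decoding. The two function names are hypothetical.

```python
def write_with_bom(path, text):
    """Write `text` as UTF-8 preceded by a BOM (hypothetical helper)."""
    # "utf-8-sig" prepends the bytes EF BB BF once at the start of the file.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


def read_ignoring_bom(path):
    """Read UTF-8 text, dropping a leading BOM if there is one."""
    # On decoding, "utf-8-sig" accepts files both with and without a BOM.
    with open(path, encoding="utf-8-sig") as f:
        return f.read()
```

   For example, write_with_bom("m.txt", "mañana") would produce a file
that begins with the BOM, and read_ignoring_bom("m.txt") would give the
text back without it.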

   Neil


