The "loop and a half"

eryk sun eryksun at gmail.com
Fri Oct 6 10:34:44 EDT 2017


On Fri, Oct 6, 2017 at 2:35 PM, Paul Moore <p.f.moore at gmail.com> wrote:
>
> cmd:
>     for /f %i in ('gcc -E program.c') do ...

Note that CMD processes the output as decoded Unicode text instead of
encoded bytes, which is often a source of mojibake. CMD runs the above
command with stdout redirected to a pipe and decodes the output
line by line using the console's output codepage from
GetConsoleOutputCP(). The console defaults to the system locale's OEM
codepage. If CMD is run without a console (detached), it defaults to
the system locale's ANSI codepage. The program being run might write
its output as OEM, ANSI, or UTF-8 text. For example, Python defaults
to ANSI on Windows when stdout is a pipe, but %PYTHONIOENCODING% can
override this, e.g. to UTF-8.
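
To make the mismatch concrete, here is a rough sketch that simulates it
in Python without running CMD at all. It assumes a US English system,
where the OEM codepage is 437 and the ANSI codepage is 1252; the helper
name decode_as_cmd_would is made up for illustration. It only models
"the program encodes, CMD decodes", nothing more.

```python
def decode_as_cmd_would(text, program_encoding, console_codepage):
    """Encode text as the child program might, then decode it the way
    CMD would, using the console's output codepage (assumed values)."""
    raw = text.encode(program_encoding)
    return raw.decode(console_codepage, errors="replace")


if __name__ == "__main__":
    original = "caf\u00e9"
    # Assumption: the console's output codepage is OEM 437.
    for program_encoding in ("cp437", "cp1252", "utf-8"):
        seen = decode_as_cmd_would(original, program_encoding, "cp437")
        status = "ok" if seen == original else "mojibake"
        # ascii() prints escapes, so the terminal's own encoding
        # doesn't interfere with the demonstration.
        print(program_encoding, "->", ascii(seen), status)
```

Only the case where the program's encoding matches the console's output
codepage round-trips; the ANSI and UTF-8 cases come back as different
characters.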

You can change the console's input and output codepages via chcp.com,
which sets both to the same codepage (it can't set them to separate
values). For example, `chcp 65001` sets them to UTF-8. But don't change
the codepage permanently, and certainly don't leave the console in its
(buggy) UTF-8 mode. Instead, a batch script should save the previous
codepage and switch back when it's done.
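
The same save-and-restore idea can be sketched from Python via ctypes,
as a rough analogue of what a batch script would do. The context
manager name temporary_console_codepage and its kernel32 parameter are
made up for illustration; GetConsoleCP, GetConsoleOutputCP,
SetConsoleCP and SetConsoleOutputCP are the real kernel32 functions.
Treat it as a minimal sketch, Windows only, with little error handling.

```python
import contextlib
import ctypes


@contextlib.contextmanager
def temporary_console_codepage(codepage, kernel32=None):
    """Switch the console's input and output codepages, and restore
    the previous values on exit, even if the body raises."""
    if kernel32 is None:
        # Windows only: ctypes.WinDLL doesn't exist elsewhere.
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    old_input = kernel32.GetConsoleCP()
    old_output = kernel32.GetConsoleOutputCP()
    if not old_input or not old_output:
        # Both calls return 0 on failure, e.g. no console attached.
        raise OSError("could not query the console codepage")
    try:
        if not kernel32.SetConsoleCP(codepage):
            raise OSError("SetConsoleCP failed")
        if not kernel32.SetConsoleOutputCP(codepage):
            raise OSError("SetConsoleOutputCP failed")
        yield old_output
    finally:
        # Switch back, as a well-behaved script should.
        kernel32.SetConsoleCP(old_input)
        kernel32.SetConsoleOutputCP(old_output)
```

Usage would look like `with temporary_console_codepage(65001): ...`,
running the child process inside the block. Unlike chcp.com, this
saves the input and output codepages separately, so it restores them
correctly even if they differed to begin with.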


