First two bytes of 'stdout' are lost

Thu Apr 11 17:55:55 EDT 2024

On 11Apr2024 14:42, Olivier B. <perso.olivier.barthelemy at gmail.com> wrote:
>I am trying to use StringIO to capture stdout, in code that looks like this:
>
>import sys
>from io import StringIO
>old_stdout = sys.stdout
>sys.stdout = mystdout = StringIO()
>print( "patate")
>mystdout.seek(0)
>sys.stdout = old_stdout
>print(mystdout.read())
>
>Well, it is not exactly like this, since this version works properly.

Aye, I just tried that. All good.
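
For comparison, here is a minimal sketch of the same capture written with `contextlib.redirect_stdout`, which puts the old `sys.stdout` back even if the body raises, and with `StringIO.getvalue()`, which returns the whole buffer regardless of the current stream position, so no `seek(0)` is needed. It is plain Python only. I haven't tried it under an embedded interpreter, and the helper name `capture_stdout` is made up for illustration.

```python
import contextlib
import io


def capture_stdout(func) -> str:
    """Run func() and return everything it printed (hypothetical helper)."""
    buf = io.StringIO()
    # redirect_stdout swaps sys.stdout for buf and restores it on exit,
    # even if func() raises.
    with contextlib.redirect_stdout(buf):
        func()
    # getvalue() returns the entire buffer, independent of the stream position.
    return buf.getvalue()


captured = capture_stdout(lambda: print("patate"))
print(repr(captured))
```

If this version also loses two characters when run from the C++ side, that would at least rule out the `seek(0)`/`read()` step.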

>This code is actually run from C++ using the C Python API.
>This worked quite well, so the code was right at some point. But now,
>two things changed:
> - Now using Python 3.11.7 instead of 3.7.12
> - Now using only the Python limited C API

Maybe you should post the code then: the exact Python code and the exact 
C++ code.

>And it seems that now, mystdout.read() always misses the first two
>characters that have been written to stdout.
>
>My first idea was something related to the BOM being improperly truncated
>at some point, but I am manipulating UTF-8, so the BOM would be 3
>bytes, not 2.

I didn't think UTF-8 needed a BOM. Someone will doubtless correct me.
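
For what it's worth, the standard `codecs` module has the BOMs as constants, which makes the byte counts easy to check. The UTF-8 BOM is indeed 3 bytes, and the plain `"utf-8"` codec never writes one; only the `"utf-8-sig"` variant does. Note also that if a UTF-8 BOM did leak through and got decoded with plain `"utf-8"`, you'd get one *extra* U+FEFF character at the front rather than two missing ones.

```python
import codecs

# The UTF-8 BOM as defined in the standard library.
print(len(codecs.BOM_UTF8), codecs.BOM_UTF8)  # 3 b'\xef\xbb\xbf'

# Plain "utf-8" writes no BOM; "utf-8-sig" prepends the 3-byte one.
plain = "patate".encode("utf-8")
signed = "patate".encode("utf-8-sig")
print(plain)   # b'patate'
print(signed)  # b'\xef\xbb\xbfpatate'

# "utf-8-sig" strips a leading BOM when decoding; plain "utf-8" keeps it
# as a single U+FEFF character.
print(repr(signed.decode("utf-8-sig")))  # 'patate'
print(repr(signed.decode("utf-8")))      # '\ufeffpatate'
```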

However, does the `mystdout.read()` code _know_ you're using UTF-8? I 
have the vague impression that e.g. some Windows systems default to UTF-16 
of some flavour, possibly _with_ a BOM.
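
A UTF-16 BOM, on the other hand, is exactly 2 bytes, which would happen to match the size of what's going missing. Whether that is what's actually happening here I can't say. A quick sketch of what the `"utf-16"` codec produces (it uses the machine's native byte order, so the BOM is either `b'\xff\xfe'` or `b'\xfe\xff'`):

```python
import codecs

data = "patate".encode("utf-16")
# The "utf-16" codec prepends a 2-byte BOM in native byte order.
bom = data[:2]
print(len(data))  # 14: 2 bytes of BOM + 6 characters * 2 bytes
print(bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# Decoding with "utf-16" consumes the BOM; decoding with an explicit byte
# order ("utf-16-le"/"utf-16-be") does not, and leaves U+FEFF in the text.
print(repr(data.decode("utf-16")))  # 'patate'
explicit = "utf-16-le" if bom == codecs.BOM_UTF16_LE else "utf-16-be"
print(repr(data.decode(explicit)))  # '\ufeffpatate'
```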

I'm suggesting that you rigorously check that the bytes->text bits know 
what text encoding they're using. If you've left an encoding out 
anywhere, put it in explicitly.
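
In Python terms that mostly means not letting a text layer pick its encoding for you: `io.TextIOWrapper`, for example, falls back to a locale-dependent default when `encoding` is omitted. A small sketch that names the encoding on both the writing and the reading side (the `roundtrip` helper is made up for illustration, and this is not the code path your C++ embedding uses):

```python
import io


def roundtrip(text: str, encoding: str) -> str:
    """Write text through a bytes buffer and read it back (hypothetical helper)."""
    raw = io.BytesIO()
    # Name the encoding explicitly rather than relying on the locale default;
    # newline="" disables newline translation so the bytes match the text.
    writer = io.TextIOWrapper(raw, encoding=encoding, newline="")
    writer.write(text)
    writer.flush()
    data = raw.getvalue()
    writer.detach()  # leave raw open; we already have the bytes

    # Read back with the *same* explicit encoding.
    reader = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return reader.read()


print(repr(roundtrip("patate\n", "utf-8")))   # 'patate\n'
print(repr(roundtrip("patate\n", "utf-16")))  # 'patate\n'
```

With matching encodings on both sides the text comes back intact, BOM handling included; the interesting case is wherever one side has no explicit encoding.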

>Hopefully someone has a clue as to what would have changed in Python for
>this to stop working compared to Python 3.7?

None at all, alas. My experience with the Python C API is very limited.