tail

Dennis Lee Bieber wlfraed at ix.netcom.com
Fri May 6 17:10:12 EDT 2022


On Fri, 6 May 2022 21:19:48 +0100, MRAB <python at mrabarnett.plus.com>
declaimed the following:

>Is the file UTF-8? That's a variable-width encoding, so are any of the 
>characters > U+007F?
>
>Which OS? On Windows, it's common/normal for UTF-8 files to start with a 
>BOM/signature, which is 3 bytes/1 codepoint.

	Windows also uses <cr><lf> as the EOL marker, but Python's I/O system
condenses that to just <lf> internally (in TEXT mode) -- so a file position
computed from the length of a string read that way may be off by one for
each EOL in the string.

https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
"""
In text mode, the default when reading is to convert platform-specific line
endings (\n on Unix, \r\n on Windows) to just \n. When writing in text
mode, the default is to convert occurrences of \n back to platform-specific
line endings. This behind-the-scenes modification to file data is fine for
text files, but will corrupt binary data like that in JPEG or EXE files. Be
very careful to use binary mode when reading and writing such files.
"""
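
	A minimal sketch of the effect, for illustration only: the helper name
compare_lengths and the sample data are made up here, not taken from the
original poster's code. It writes <cr><lf>-terminated lines in binary mode,
reads them back in text mode with the default newline handling, and compares
the size on disk with the length of the string.

```python
import os
import tempfile


def compare_lengths(raw):
    """Return (bytes on disk, characters seen in text mode) for raw bytes."""
    fd, path = tempfile.mkstemp()
    try:
        # Binary mode: the bytes land on disk exactly as given.
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Text mode with the default newline=None: \r\n is read back as \n.
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        return os.path.getsize(path), len(text)
    finally:
        os.remove(path)


on_disk, in_memory = compare_lengths(b"one\r\ntwo\r\nthree\r\n")
print(on_disk, in_memory)   # 17 14
print(on_disk - in_memory)  # 3, one per EOL
```

	Opening the file in binary mode ("rb") instead should keep len() in
step with the byte offsets, at the cost of having to decode the data
yourself.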



-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/

