[issue30073] binary compressed file reading corrupts newlines (lzma, gzip, bz2)
Julian Taylor
report at bugs.python.org
Fri Apr 14 10:18:56 EDT 2017
New submission from Julian Taylor:
Probably a case of 'don't do that', but reading lines from a compressed file in binary mode produces bytes with invalid newlines for encodings where '\n' is encoded as something other than the single byte b'\n':
import lzma
with lzma.open("test.xz", "wt", encoding="UTF-32-LE") as f:
    f.write('0 1 2\n3 4 5')
lzma.open("test.xz", "rb").readlines()[0].decode('UTF-32-LE')
Fails with:
UnicodeDecodeError: 'utf-32-le' codec can't decode byte 0x0a in position 20: truncated data
as readlines() produces:
b'0\x00\x00\x00 \x00\x00\x001\x00\x00\x00 \x00\x00\x002\x00\x00\x00\n'
The last newline should be '\n'.encode('UTF-32-LE') == b'\n\x00\x00\x00'
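For comparison, here is a minimal sketch of two ways to get correctly decoded lines from the same data. It assumes the caller knows the encoding up front. The temporary path and the variable names are illustrative and not part of the report, and newline="\n" is passed on write only to keep the bytes identical across platforms.

```python
import lzma
import os
import tempfile

# Illustrative scratch location; the report itself uses "test.xz" in the cwd.
path = os.path.join(tempfile.mkdtemp(), "test.xz")

# newline="\n" disables platform newline translation on write (an addition
# for reproducibility, not something the report does).
with lzma.open(path, "wt", encoding="UTF-32-LE", newline="\n") as f:
    f.write("0 1 2\n3 4 5")

# Binary mode splits after the single byte b"\n", which lands in the middle
# of the 4-byte UTF-32-LE code unit for the newline character.
with lzma.open(path, "rb") as f:
    raw_lines = f.readlines()

# Option 1: open in text mode so decoding happens before line splitting.
with lzma.open(path, "rt", encoding="UTF-32-LE") as f:
    text_lines = f.readlines()

# Option 2: read all bytes, decode once, then split the resulting str.
with lzma.open(path, "rb") as f:
    decoded_lines = f.read().decode("UTF-32-LE").splitlines(keepends=True)

print(raw_lines[0])
print(text_lines)
print(decoded_lines)
```

In binary mode the first element of raw_lines is cut one byte into the newline code unit, so it cannot be decoded on its own, while both alternatives split only after decoding.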
----------
components: Library (Lib)
messages: 291661
nosy: jtaylor
priority: normal
severity: normal
status: open
title: binary compressed file reading corrupts newlines (lzma, gzip, bz2)
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30073>
_______________________________________