Python 3.0 automatic decoding of UTF16

MRAB google at mrabarnett.plus.com
Fri Dec 5 20:05:17 EST 2008


John Machin wrote:
> On Dec 6, 10:35 am, Steven D'Aprano <st... at REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Fri, 05 Dec 2008 12:00:59 -0700, Joe Strout wrote:
>>>> So UTF-16 has an explicit EOF marker within the text?
>>> No, it does not.  I don't know what Terry's thinking of there, but text
>>> files do not have any EOF marker.  They start at the beginning
>>> (sometimes including a byte-order mark), and go till the end of the
>>> file, period.
>> Windows text files still interpret ctrl-Z as EOF, or at least Windows XP
>> does. Vista, who knows?
> 
> This applies only to files being read in an 8-bit text mode. It is
> inherited from MS-DOS, which followed the CP/M convention, which was
> necessary because CP/M's file system recorded only the physical file
> length in 128-byte sectors, not the logical length. It is likely to
> continue in perpetuity, just as standard railway gauge is (allegedly)
> based on the axle-length of Roman chariots.
> 
The chariots in question were drawn by 2 horses, so the gauge is based
on the width of a horse. :-)

> None of this is relevant to the OP's problem; his file appears to have
> been truncated rather than having spurious data appended to it.
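
For anyone unfamiliar with the Ctrl-Z convention described in the quoted
text above, here is a minimal sketch of the idea in Python. It is only a
simulation under stated assumptions, not how any real file system or C
runtime is implemented: the names SECTOR_SIZE, pad_to_sector and
logical_text are made up for illustration, and the padding byte is
assumed to be Ctrl-Z (0x1A) throughout.

```python
SECTOR_SIZE = 128   # assumed: length is recorded only in whole 128-byte sectors
CTRL_Z = b"\x1a"    # the byte treated as the logical end of text


def pad_to_sector(data: bytes) -> bytes:
    """Pad data with Ctrl-Z up to the next sector boundary (hypothetical writer)."""
    remainder = len(data) % SECTOR_SIZE
    if remainder == 0:
        return data
    return data + CTRL_Z * (SECTOR_SIZE - remainder)


def logical_text(raw: bytes) -> bytes:
    """Return the bytes before the first Ctrl-Z, as an 8-bit text-mode reader would."""
    end = raw.find(CTRL_Z)
    return raw if end == -1 else raw[:end]


stored = pad_to_sector(b"hello\r\n")   # what ends up on disk: one full sector
recovered = logical_text(stored)       # what a text-mode reader hands back
```

The reader cannot tell from the stored length alone where the text ends,
so it has to stop at the first Ctrl-Z. That is the behaviour an 8-bit
text mode still carries today.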


