Python 3.0 automatic decoding of UTF16

John Machin sjmachin at lexicon.net
Fri Dec 5 19:26:36 EST 2008


On Dec 6, 10:35 am, Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au> wrote:
> On Fri, 05 Dec 2008 12:00:59 -0700, Joe Strout wrote:
> >> So UTF-16 has an explicit EOF marker within the text?
>
> > No, it does not.  I don't know what Terry's thinking of there, but text
> > files do not have any EOF marker.  They start at the beginning
> > (sometimes including a byte-order mark), and go till the end of the
> > file, period.
>
> Windows text files still interpret ctrl-Z as EOF, or at least Windows XP
> does. Vista, who knows?

This applies only to files being read in an 8-bit text mode. It is
inherited from MS-DOS, which followed the CP/M convention, which was
necessary because CP/M's file system recorded only the physical file
length in 128-byte sectors, not the logical length. It is likely to
continue in perpetuity, just as standard railway gauge is (allegedly)
based on the axle-length of Roman chariots.
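The CP/M convention above can be emulated in a few lines of Python. This is a minimal sketch, not how CP/M or the MS-DOS C runtime actually implemented it. The helper names `pad_to_records` and `read_logical_text` are hypothetical, and the sketch assumes the unused tail of the last 128-byte record is filled with Ctrl-Z (0x1A). The last two lines show why the convention belongs only to 8-bit text: in UTF-16 a 0x1A byte can be half of an ordinary character (U+011A encodes as `1A 01` in little-endian).

```python
# Hypothetical emulation of CP/M-style text storage: the file system knows
# only how many 128-byte records a file occupies, so the logical end of the
# text is marked by padding the last record with Ctrl-Z (0x1A).

CTRL_Z = b"\x1a"
RECORD_SIZE = 128  # CP/M recorded length in whole 128-byte records only


def pad_to_records(text: bytes) -> bytes:
    """Pad text to a whole number of records, as a CP/M program would."""
    remainder = len(text) % RECORD_SIZE
    if remainder == 0:
        return text  # already fills its last record; no marker needed
    return text + CTRL_Z * (RECORD_SIZE - remainder)


def read_logical_text(stored: bytes) -> bytes:
    """Recover the logical text: everything before the first Ctrl-Z."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]


stored = pad_to_records(b"HELLO, WORLD\r\n")
print(len(stored))                # 128
print(read_logical_text(stored))  # b'HELLO, WORLD\r\n'

# Applying the same rule to UTF-16 bytes is wrong: U+011A encodes as
# b'\x1a\x01' in UTF-16-LE, so the "text" is cut off before it starts.
utf16 = "\u011a".encode("utf-16-le")
print(read_logical_text(utf16))   # b''
```

This is why such files need to be opened in binary mode (or via a decoder) rather than in an 8-bit text mode.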

None of this is relevant to the OP's problem; his file appears to have
been truncated rather than having spurious data appended to it.
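Truncation of a UTF-16 file can often be spotted from the raw bytes. The following is a heuristic sketch only; `looks_truncated_utf16` is a hypothetical helper. It cannot detect a cut that happens to fall between two complete characters, and a decode error can also come from other corruption (for example an unpaired surrogate in the middle of the data), not only from truncation.

```python
# Hypothetical heuristic: does this UTF-16 byte string look cut off?


def looks_truncated_utf16(data: bytes) -> bool:
    # UTF-16 code units are 2 bytes each, so an odd length means the data
    # stops in the middle of a code unit.
    if len(data) % 2:
        return True
    try:
        # Strict decoding fails if, e.g., a high surrogate is left dangling
        # at the end (a character outside the BMP cut in half).
        data.decode("utf-16")
    except UnicodeDecodeError:
        return True  # truncated, or otherwise malformed
    return False


complete = "Hi".encode("utf-16")                # BOM + 2 code units
print(looks_truncated_utf16(complete))          # False
print(looks_truncated_utf16(complete[:-1]))     # True: odd length

astral = "\U0001F600".encode("utf-16")          # BOM + surrogate pair
print(looks_truncated_utf16(astral[:-2]))       # True: lone high surrogate
```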


