Python 3.0 automatic decoding of UTF16
John Machin
sjmachin at lexicon.net
Sun Dec 7 17:20:03 EST 2008
On Dec 8, 2:05 am, Johannes Bauer <dfnsonfsdu... at gmx.de> wrote:
> John Machin schrieb:
>
> > He did. Ugly stuff using readline() :-) Should still work, though.
>
> Well, well, I'm a C kinda guy used to while (fgets(b, sizeof(b), f))
> kinda loops :-)
>
> But, seriously - I find that whole "while True:" and "if line == """
> construct ugly as hell, too. How can reading a file line by line be
> achieved in a more pythonic kind of way?

By using

    for line in open(.....)

as mentioned in (1) my message that you were replying to, and (2) the
tutorial:
http://docs.python.org/3.0/tutorial/inputoutput.html#reading-and-writing-files
... skip the stuff on readline() and readlines() this time :-)

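For the record, a minimal sketch of that idiom, assuming a file whose
encoding the text layer decodes correctly (UTF-8 here, not the UTF-16
case that trips the bug). The name count_nonblank_lines and its
parameters are made up for illustration:

```python
def count_nonblank_lines(fname, encoding="utf-8"):
    """Hypothetical helper: walk a text file one line at a time."""
    count = 0
    # The file object is its own iterator; no readline() loop is needed.
    with open(fname, encoding=encoding) as f:
        for line in f:
            # Each line still carries its trailing newline, if it had one.
            if line.strip():
                count += 1
    return count
```

The with block closes the file even if the loop raises, which is the
part the C-style loop leaves to you.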
While waiting for the bug to be fixed, you'll need something like the
following:
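
def utf16_getlines(fname, newline_terminated=True):
    # Read the raw bytes in one go, bypassing the text layer entirely.
    f = open(fname, 'rb')
    raw_bytes = f.read()
    f.close()
    decoded = raw_bytes.decode('utf16')
    if newline_terminated:
        # Normalise Windows line endings, then keep the '\n' on each line.
        normalised = decoded.replace('\r\n', '\n')
        lines = normalised.splitlines(True)
    else:
        # splitlines() with no argument drops the line terminators.
        lines = decoded.splitlines()
    return lines
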
That avoids the chunk-reading problem by reading the whole file in one
go. In fact, given the way I've written it, there can be 4 copies of
the file contents in memory at once (raw_bytes, decoded, normalised
and lines). Fortunately your files are tiny.
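
If the files were not tiny, one possible alternative is to decode in
chunks and hand lines back one at a time. This is only a sketch: it
assumes the incremental UTF-16 decoder in the codecs module behaves
correctly on your interpreter (I have not checked it against the 3.0
bug), and utf16_iterlines and chunk_size are names invented here:

```python
import codecs

def utf16_iterlines(fname, chunk_size=4096):
    """Hypothetical generator: yield '\\n'-terminated lines from a UTF-16 file."""
    decoder = codecs.getincrementaldecoder('utf-16')()
    pending = ''
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            # An empty chunk means EOF; final=True flushes the decoder.
            pending += decoder.decode(chunk, final=not chunk)
            pieces = pending.split('\n')
            # The last piece may be an incomplete line; hold it back.
            pending = pieces.pop()
            for line in pieces:
                # Turn a '\r\n' ending into a plain '\n'.
                if line.endswith('\r'):
                    line = line[:-1]
                yield line + '\n'
            if not chunk:
                break
    # A final line without a terminator is yielded as-is.
    if pending:
        yield pending
```

Unlike splitlines(), this splits only on '\n' (after dropping a
preceding '\r'), so a lone '\r' or other Unicode line separators are
left inside the line.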
HTH,
John