How do you read Unicode files?
Matt Gerrans
mgerrans at mindspring.com
Fri Jun 7 00:35:47 EDT 2002
How do you read in a Unicode file and convert it to a standard string?
It seems that when you open a file and read it, what you get is a string of
single-byte characters. I've tried all kinds of permutations of calls to
unicode(), decode(), encode(), etc. with different flavors of encoding
('utf-8', 'utf-16' and so on).
I could parse the data myself (skipping the initial two bytes and then every
other one -- I'm only working with ASCII in double-byte format, so the
high-order byte is always 0), but I imagine there must be a way to get the
existing tools to work.
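
For reference, here is a minimal sketch of how the built-in codecs can do this
without skipping bytes by hand. It assumes Python 3 (this post predates it) and
that the initial two bytes are a UTF-16 byte order mark; the helper name
decode_utf16_bytes and the sample data are made up for illustration.

```python
import codecs


def decode_utf16_bytes(raw):
    """Decode UTF-16 bytes that start with a byte order mark (BOM).

    The "utf-16" codec reads the two-byte BOM, picks the matching byte
    order, and leaves the BOM out of the returned string.
    """
    return raw.decode("utf-16")


# Hypothetical sample: the ASCII text "Hi" stored as little-endian UTF-16,
# i.e. BOM, then each character followed by a zero high-order byte.
sample = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
text = decode_utf16_bytes(sample)
```

To apply this to a file, read it in binary mode (open(path, "rb").read()) and
pass the resulting bytes to the helper.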
What I want to be able to do is write a search and replace tool that will
work equally well on ANSI and Unicode (or double-byte) text files (without
changing the file type, of course)...
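
One way such a tool might look is sketched below, again assuming Python 3. The
function names are hypothetical. The sketch treats any file that starts with a
UTF-16 byte order mark as UTF-16 and everything else as single-byte text,
decoded as latin-1 only because that codec round-trips every byte value
unchanged; a real "ANSI" code page or a BOM-less UTF-16 file would need
different handling.

```python
import codecs


def detect_encoding(raw, fallback="latin-1"):
    """Guess the encoding from a leading UTF-16 BOM (assumption: no other BOMs)."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return fallback


def replace_in_bytes(raw, old, new, fallback="latin-1"):
    """Replace old with new in raw file contents, keeping the original encoding."""
    encoding = detect_encoding(raw, fallback)
    if encoding == "utf-16-le":
        bom = codecs.BOM_UTF16_LE
    elif encoding == "utf-16-be":
        bom = codecs.BOM_UTF16_BE
    else:
        bom = b""
    # Strip the BOM (if any), work on real text, then put the BOM back.
    text = raw[len(bom):].decode(encoding)
    return bom + text.replace(old, new).encode(encoding)


def replace_in_file(path, old, new):
    """Rewrite the file at path in place without changing its file type."""
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, "wb") as f:
        f.write(replace_in_bytes(raw, old, new))
```

Working on bytes read in binary mode keeps the byte order mark and byte order
under the tool's control, so a UTF-16 file stays UTF-16 and a single-byte file
stays single-byte.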