How do you read unicode files?

Matt Gerrans mgerrans at mindspring.com
Fri Jun 7 00:35:47 EDT 2002


How do you read in a Unicode file and convert it to a standard string?

It seems that when you open a file and read it, what you get is a string of
single-byte characters. I've tried all kinds of permutations of calls to
unicode(), decode(), encode(), etc. with different flavors of encoding
('utf-8', 'utf-16' and so on), without getting a usable string back.

I could parse the data myself (skipping the initial two bytes and then every
other one -- I'm only working with ASCII in double-byte format, so the
high-order byte is always 0), but I imagine there must be a way to get the
existing tools to work.
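
For illustration only, here is a minimal sketch of how the existing tools
can do this decoding, assuming a current Python 3 interpreter (newer than
anything available when this was posted) and assuming the file is UTF-16
with a leading byte-order mark, which is what the "initial two bytes"
usually are. The read_text name and the sample file are hypothetical.

```python
import os
import tempfile


def read_text(path, encoding="utf-16"):
    """Return the decoded contents of a text file (hypothetical helper)."""
    # The "utf-16" codec consumes the leading byte-order mark and uses it
    # to choose the byte order, so no bytes have to be skipped by hand.
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()


# Build a small double-byte sample file so the sketch can be run as-is.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "wb") as f:
    f.write("hello world\r\n".encode("utf-16"))  # writes a BOM first

text = read_text(sample_path)
```

The result is an ordinary str, so the usual string methods apply to it.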

What I want to be able to do is write a search and replace tool that will
work equally well on ANSI and Unicode (or double-byte) text files (without
changing the file type, of course)...
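
One possible shape for such a tool, sketched under several assumptions:
Python 3, Unicode files that start with a byte-order mark, and latin-1 as
a stand-in for "ANSI" (chosen only because it maps every byte to a
character and back without loss). The sniff and replace_in_file names are
hypothetical, and UTF-32 files are not handled.

```python
import codecs
import os
import tempfile

# Byte-order marks checked by this sketch, with the codec each one implies.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff(raw, default="latin-1"):
    """Guess (bom, codec) from the first bytes of a file (hypothetical)."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            return bom, codec
    # No BOM found: assume a single-byte file. latin-1 is an assumption.
    return b"", default


def replace_in_file(path, old, new):
    """Replace old with new in place, keeping the file's BOM and codec."""
    with open(path, "rb") as f:
        raw = f.read()
    bom, codec = sniff(raw)
    text = raw[len(bom):].decode(codec)
    updated = text.replace(old, new)
    # Write the same BOM and codec back so the file type does not change.
    with open(path, "wb") as f:
        f.write(bom + updated.encode(codec))
    return codec


# Demonstration on one single-byte file and one double-byte file.
workdir = tempfile.mkdtemp()
ansi_path = os.path.join(workdir, "ansi.txt")
wide_path = os.path.join(workdir, "wide.txt")
with open(ansi_path, "wb") as f:
    f.write(b"spam and eggs\r\n")
with open(wide_path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + "spam and eggs\r\n".encode("utf-16-le"))

ansi_codec = replace_in_file(ansi_path, "spam", "ham")
wide_codec = replace_in_file(wide_path, "spam", "ham")
```

Working on raw bytes and decoding explicitly keeps the byte order and line
endings of the original file; a replacement string that the file's codec
cannot represent raises UnicodeEncodeError rather than changing the type.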




