Newbie problem with codecs

Andrew Dalke adalke at mindspring.com
Thu Aug 21 06:06:53 EDT 2003


derek / nul:
> Andrew, this what I was expecting, but my system does not do it.
>
> codecs.lookup("utf-16-le")
>
> this is the code cut from my program, but there is NO output from my
> program.

I barely know what I am doing here, and since you don't say
what you mean by "does not do it" or "NO output from my program",
it's hard to track down the problem.
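
One guess, and it is only a guess since I can't see your program: at the
interactive prompt the result of an expression is echoed automatically, but
in a script it is silently discarded unless you print it. The sketch below
assumes a reasonably recent Python, where codecs.lookup returns an object
with a "name" attribute (older versions return a plain tuple instead).

```python
import codecs

# In a script, a bare codecs.lookup(...) line produces no output;
# the result has to be printed (or otherwise used) explicitly.
info = codecs.lookup("utf-16-le")
print(info.name)
```

If the lookup fails you get a LookupError rather than silence, so "no
output" by itself does not mean the codec is missing.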

Still, this might help.  Suppose you wanted to read from a utf-16-le
encoded file and write to a utf-8 encoded file.  You can do

import codecs

reader_factory = codecs.getreader("utf-16-le")
writer_factory = codecs.getwriter("utf-8")

# open both files in binary mode; the codecs do the decoding/encoding
reader = reader_factory(open("input.utf16", "rb"))
writer = writer_factory(open("output.utf8", "wb"))
while 1:
    s = reader.read(100000)
    if not s:
        break
    writer.write(s)
writer.close()

I've not actually tested this, but it looks like it should
work given the API and my limited experimentation.
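
If you want something you can run as-is to check the idea, here is a
self-contained sketch of the same reader/writer approach. It creates its
own small utf-16-le input in a temporary directory first. The function name
convert_stream, the sample text and the temporary paths are all made up
for illustration; they are not part of the codecs module.

```python
import codecs
import os
import tempfile

def convert_stream(src_path, dst_path, chunk_size=100000):
    """Copy a utf-16-le file to a utf-8 file, one chunk at a time."""
    reader_factory = codecs.getreader("utf-16-le")
    writer_factory = codecs.getwriter("utf-8")
    # Both files are opened in binary mode; the codec wrappers
    # take care of decoding on read and encoding on write.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        reader = reader_factory(src)
        writer = writer_factory(dst)
        while True:
            s = reader.read(chunk_size)
            if not s:
                break
            writer.write(s)

# Build a tiny test input (hypothetical sample text).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "input.utf16")
dst_path = os.path.join(tmp_dir, "output.utf8")
sample = u"caf\u00e9 \u20ac\n"
with open(src_path, "wb") as f:
    f.write(sample.encode("utf-16-le"))

convert_stream(src_path, dst_path)

with open(dst_path, "rb") as f:
    result = f.read()
print(repr(result))
```

Reading in chunks keeps memory use bounded, which only matters if the
input file is large.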

I've not worked with this before, so if things fail, please
repost (with more details) and hopefully someone with
better knowledge can help you out.

The other option is to do the conversion through strings
instead of through files.

# s = "....some set of bytes with your utf-16 in it .."
s = open("input.utf16", "rb").read() # the whole file

# convert to unicode, given the encoding
t = s.decode("utf-16-le")

# convert to utf-8 encoding
s2 = t.encode("utf-8")

open("output.utf8", "wb").write(s2)


Again, untested.
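
A quick way to convince yourself the string route does what you expect is
to try it on a small value in memory, with no files involved. The sample
text here is made up for illustration.

```python
# Start from known text and produce utf-16-le bytes, standing in
# for what would be read from the input file.
sample = u"hello \u00e9"
s = sample.encode("utf-16-le")

# Decode the bytes to text, then encode the text as utf-8.
t = s.decode("utf-16-le")
s2 = t.encode("utf-8")

print(len(s), len(s2))
```

The utf-16-le bytes take two bytes per character here, while the utf-8
result uses one byte for each ASCII character and two for the accented one.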

                    Andrew
                    dalke at dalkescientific.com





