Newbie problem with codecs

Andrew Dalke adalke at mindspring.com
Fri Aug 22 03:41:34 EDT 2003


derek / nul
> My code so far
   ...
> t = unicode(eng_file, "utf-16-le")
> print t
> -----------------------------------------------------
>
> The print fails (as expected) with a non printing char  '\ufeff'  which is of
> course the BOM.
> Is there a nice way to strip off the BOM?

How does it fail?  It may be because print tries to encode the
data as appropriate for your IDE or terminal, and fails.  E.g., the
default encoding is ASCII, which cannot represent '\ufeff'.  See

http://www.python.org/cgi-bin/faqw.py?req=show&file=faq04.102.htp

As a guess, since you're on MS Windows, your terminal might
expect mbcs.  Try

print t.encode('mbcs')
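
To make the failure mode concrete, here is a minimal sketch written for
Python 3 (an assumption; the code above is Python 2).  The sample bytes
and the helper name encode_for_terminal are made up for illustration,
and since the mbcs codec only exists on Windows the sketch uses ASCII
with errors="replace" instead.

```python
# Python 3 sketch; the sample bytes are invented for illustration.
raw = b"\xff\xfeh\x00i\x00"      # UTF-16-LE BOM followed by "hi"
t = raw.decode("utf-16-le")      # the BOM survives as the character U+FEFF


def encode_for_terminal(text, encoding="ascii"):
    """Hypothetical helper: encode text, replacing unencodable characters."""
    return text.encode(encoding, errors="replace")


# A strict ASCII encode fails on the BOM character, which is the kind of
# error print can hit when the output encoding is ASCII.
try:
    t.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="replace" the BOM becomes "?" instead of raising.
safe = encode_for_terminal(t)
```

Replacing characters hides the problem rather than fixing it, so this is
mainly useful for finding out which character the terminal cannot show.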

If you really want to strip it off, do t[1:].  After decoding, the BOM
is the single character u'\ufeff' at the start of the string (it only
takes 2 bytes in the undecoded file data).  But I doubt that's the
correct solution.
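
A sketch of two ways to drop the BOM, again assuming Python 3 and
invented sample data.  Once the bytes are decoded, the BOM is one
character (U+FEFF), so the slice is one character long regardless of
how many bytes the BOM took in the file.  Alternatively, the plain
"utf-16" codec reads the BOM to pick the byte order and does not
return it.

```python
import codecs

# Invented sample data: a UTF-16-LE BOM followed by "hi".
raw = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# Option 1: decode with the explicit byte order, then drop the BOM
# character if it is present.
with_bom = raw.decode("utf-16-le")
stripped = with_bom[1:] if with_bom.startswith("\ufeff") else with_bom

# Option 2: let the BOM-aware "utf-16" codec consume the BOM itself.
auto = raw.decode("utf-16")
```

Option 2 is usually the less fragile choice when the file is known to
start with a BOM, because nothing has to be sliced by hand.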

                    Andrew
                    dalke at dalkescientific.com





