Newbie problem with codecs
Andrew Dalke
adalke at mindspring.com
Fri Aug 22 03:41:34 EDT 2003
derek / nul wrote:
> My code so far
...
> t = unicode(eng_file, "utf-16-le")
> print t
> -----------------------------------------------------
>
> The print fails (as expected) with a non printing char '\ufeff' which is
> of course the BOM.
> Is there a nice way to strip off the BOM?
How does it fail? It may be because print tries to encode the
data as appropriate for your IDE or terminal, and fails. E.g., the
default encoding is ASCII, which cannot represent '\ufeff'. See
http://www.python.org/cgi-bin/faqw.py?req=show&file=faq04.102.htp
As a guess, since you're on MS Windows, your terminal might
expect mbcs. Try
    print t.encode('mbcs')
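A more portable variant of the same idea, as a rough sketch: encode the
text yourself for whatever encoding the terminal reports, and replace
any character it cannot show. The helper name encode_for_terminal and
the ASCII fallback are assumptions made for illustration; they are not
part of any library.

```python
import sys

def encode_for_terminal(text, encoding=None):
    """Encode text for display, replacing unencodable characters with '?'.

    encode_for_terminal is a hypothetical helper. If no encoding is given,
    it uses whatever sys.stdout reports and falls back to ASCII.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # "replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, "replace")

# Hypothetical sample: a decoded string that still starts with the BOM.
sample = u"\ufeffhi"
ascii_bytes = encode_for_terminal(sample, "ascii")
```

With an ASCII target the BOM comes out as a '?' rather than raising an
error, which at least shows where the problem character sits.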
If you really want to strip it off, do t[1:] to get the string after
the first character. Once decoded, the BOM is the single character
u'\ufeff'; it takes up 2 bytes only in the raw UTF-16 data. But
I doubt that's the correct solution.
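For what it's worth, here is a small sketch of two ways to drop the
BOM. The sample bytes are made up for illustration. The first way
checks for the BOM before slicing; the second uses the generic "utf-16"
codec, which reads the BOM to pick the byte order and does not include
it in the result.

```python
import codecs

# Hypothetical sample data: a UTF-16-LE BOM followed by the text "hi".
raw = codecs.BOM_UTF16_LE + u"hi".encode("utf-16-le")

# "utf-16-le" fixes the byte order up front, so the BOM is decoded
# like any other character and shows up as a leading U+FEFF.
t = raw.decode("utf-16-le")
had_bom = t.startswith(u"\ufeff")

# Slice only when the BOM is really there, so BOM-less input is unharmed.
if had_bom:
    t = t[1:]

# The generic "utf-16" codec consumes the BOM while decoding.
t2 = raw.decode("utf-16")
```

Both t and t2 end up holding just the text, without the BOM.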
Andrew
dalke at dalkescientific.com