unicode compare errors

Mon Dec 13 10:33:07 EST 2010

On Dec 10, 4:09 pm, Nobody <nob... at nowhere.com> wrote:
> On Fri, 10 Dec 2010 11:51:44 -0800, Ross wrote:
> > Since I can't control the encoding of the input file that users
> > submit, how do I get past this?  How do I make such comparisons be
> > True?
> On Fri, 10 Dec 2010 12:07:19 -0800, Ross wrote:
> > I found I could import codecs that allow me to read the file with my
> > desired encoding. Huzzah!
> > If I'm off-base and kludgey here and should be doing something
>
> Er, do you know the file's encoding or don't you? Using:
>
>     aFile = codecs.open(thisFile, encoding='utf-8')
>
> is telling Python that the file /is/ in utf-8. If it isn't in utf-8,
> you'll get decoding errors.
>
> If you are given a file with no known encoding, then you can't reliably
> determine what /characters/ it contains, and thus can't reliably compare
> the contents of the file against strings of characters, only against
> strings of bytes.
>
> About the best you can do is to use an autodetection library such as:
>
>        http://chardet.feedparser.org/
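A minimal sketch of the point above, assuming Python 3 `bytes`/`str` semantics (the thread itself predates that). It shows that the same characters encode to different bytes, that declaring the wrong encoding can either raise a decoding error or silently give the wrong characters, and that byte-string comparisons still work when the encoding is unknown. `try_decode` is a hypothetical helper for illustration, not a library API.

```python
# Minimal sketch, assuming Python 3 bytes/str semantics.
# try_decode is a hypothetical helper for illustration, not a library API.

def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if `data` is not valid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


text = "caf\u00e9"                      # the characters c, a, f, e-acute
utf8_bytes = text.encode("utf-8")       # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")   # b'caf\xe9'

# Same characters, different bytes: a byte-level equality test fails.
print(utf8_bytes == latin1_bytes)                  # False

# Declaring the wrong encoding can raise a decoding error...
print(try_decode(latin1_bytes, "utf-8"))           # None
# ...or silently produce the wrong characters (mojibake).
print(try_decode(utf8_bytes, "latin-1") == text)   # False

# Without a known encoding, only byte-string comparisons are reliable.
print(b"caf" in latin1_bytes, b"caf" in utf8_bytes)  # True True
```

The Latin-1 case matters most in practice: Latin-1 maps every byte to some character, so decoding with it never fails, which makes "no error" a poor sign that the guess was right.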

That's right, I don't know what encoding the user will have used. The
use of autodetection sounds good - I'll look into that. Thx.
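One way the autodetection route might look, as a sketch assuming Python 3. It calls the third-party chardet package's `chardet.detect(data)` (which returns a dict with an `"encoding"` key) only if chardet is installed, and otherwise falls back to trying a short list of strict decoders. `guess_decode`, its `candidates` list (`utf-8`, then `cp1252`), and the final Latin-1 fallback are hypothetical choices for illustration, and chardet's `"confidence"` value is ignored for brevity. As noted earlier in the thread, the result is a guess, not a reliable answer.

```python
# Sketch: guess a file's encoding, assuming Python 3.
# guess_decode and its candidate list are hypothetical; the result is a
# guess, not a guarantee that the characters are what the user meant.

def guess_decode(data: bytes, candidates=("utf-8", "cp1252"), use_chardet=True):
    """Return (text, encoding_used) for `data`, trying chardet first if available."""
    if use_chardet:
        try:
            import chardet  # third-party package; optional here
        except ImportError:
            chardet = None
        if chardet is not None:
            # chardet.detect returns a dict; "encoding" may be None.
            guess = chardet.detect(data).get("encoding")
            if guess:
                try:
                    return data.decode(guess), guess
                except (UnicodeDecodeError, LookupError):
                    pass  # bad or unknown guess: fall through to candidates
    # Try strict decoders in order; UTF-8 goes first because non-UTF-8
    # text with accented bytes rarely happens to form valid UTF-8.
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this never fails,
    # but the characters may not be the ones the user intended.
    return data.decode("latin-1"), "latin-1"
```

To use it, read the uploaded file in binary mode (`open(path, "rb")`), pass the bytes in, and compare the returned text against your own strings only after decoding. Returning the encoding alongside the text lets you log or report which guess was taken.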

R.