elementtree w/utf8
Tim Arnold
tim.arnold at sas.com
Fri Oct 26 13:15:30 EDT 2007
"Marc 'BlackJack' Rintsch" <bj_666 at gmx.net> wrote in message
news:5ocgedFm1hl5U5 at mid.uni-berlin.de...
> On Thu, 25 Oct 2007 17:15:36 -0400, Tim Arnold wrote:
>
>> Hi, I'm getting the by-now-familiar error:
>> return codecs.charmap_decode(input,errors,decoding_map)
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 4615: ordinal not in range(128)
>>
>> The HTML file I'm working with is in UTF-8. I open it with codecs and try to
>> feed it to TidyHTMLTreeBuilder, but no luck. Here's my code:
>> import codecs
>> from elementtree import ElementTree as ET
>> from elementtidy import TidyHTMLTreeBuilder
>>
>> fd = codecs.open(htmfile,encoding='utf-8')
>> tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder(encoding='utf-8')
>> tidyTree.feed(fd.read())
>> self.tree = tidyTree.close()
>> fd.close()
>>
>> What am I doing wrong? Thanks in advance.
>
> You feed decoded data to `TidyHTMLTreeBuilder`. As the `encoding`
> argument suggests, this class wants bytes, not unicode. Decoding twice
> doesn't work.
>
> Ciao,
> Marc 'BlackJack' Rintsch
Well, now that you say it, it seems so obvious...
Some day I will get the hang of this encode/decode stuff. When I read about
it, I'm fine: it makes sense, and maybe even seems a little boring. And then I
write stuff like the above!
Thanks to you and Diez for straightening me out.
--Tim
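
For the archive, here is a minimal sketch of the fix Marc describes. It is written for Python 3 and uses the standard library's xml.etree.ElementTree.XMLParser as a stand-in for TidyHTMLTreeBuilder. That is an assumption made so the example runs without elementtidy; unlike Tidy, XMLParser needs well-formed markup. The helper name parse_bytes and the sample file are hypothetical. The point is only that the file is opened in binary mode, so the parser, which has been told the encoding, performs the one and only decode:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_bytes(path, encoding="utf-8"):
    """Feed raw bytes to a parser that was told the encoding (hypothetical helper)."""
    parser = ET.XMLParser(encoding=encoding)
    # Binary mode: no decoding happens here, the parser does it once.
    with open(path, "rb") as fd:
        parser.feed(fd.read())
    return parser.close()


# Write a small UTF-8 sample containing the copyright sign (u'\xa9').
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<html><body><p>\u00a9 2007</p></body></html>".encode("utf-8"))
    sample_path = tmp.name

root = parse_bytes(sample_path)
text = root.find("body/p").text
os.remove(sample_path)

# Why the original failed: Python 2, asked to decode an already-decoded
# unicode object, first encoded it with the ASCII codec behind the scenes.
# The explicit equivalent fails on the same character:
decoded = "\u00a9 2007"
try:
    decoded.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

With the original elementtidy code, the equivalent change would presumably be to drop codecs.open(htmfile, encoding='utf-8') in favour of a plain open(htmfile, 'rb') and feed the result to the builder unchanged.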