Unicode error
John Machin
sjmachin at lexicon.net
Sat Jul 24 18:37:26 EDT 2010
dirknbr <dirknbr <at> gmail.com> writes:
> I have kind of developed this but obviously it's not nice, any better
> ideas?
>
> try:
>     text=texts[i]
>     text=text.encode('latin-1')
>     text=text.encode('utf-8')
> except:
>     text=' '
As Steven has pointed out, if the .encode('latin-1') works, the result is thrown
away. This would be very fortunate.
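Running the quoted snippet makes Steven's point visible. The sketch below reproduces it under Python 3, which is an assumption: under Python 2 (current when this thread was written) pure-ASCII input survives the double encode, but any non-ASCII text still ends up as ' '. `quoted_snippet` is a hypothetical wrapper name, and `except Exception` stands in for the bare except.

```python
def quoted_snippet(text):
    # Reproduction of the quoted loop body, run under Python 3.
    try:
        text = text.encode('latin-1')  # str -> bytes; fails for chars outside latin-1
        text = text.encode('utf-8')    # bytes has no .encode() in Python 3 -> AttributeError
    except Exception:                  # the original used a bare except
        text = ' '
    return text

for sample in ['abc', 'caf\u00e9', '\u20ac100']:
    # Every sample falls through to the except branch.
    print(repr(sample), '->', repr(quoted_snippet(sample)))
```

Whichever branch fails, the latin-1 result never reaches the output.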
It appears that your goal was to encode the text in latin1 if possible,
otherwise in UTF-8, with no indication of which encoding was used. Your second
posting confirmed that you were doing this in a loop, ending up with the
possibility that your output file would have records with mixed encodings.
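To make the mixed-encoding problem concrete, here is a minimal Python 3 sketch of the strategy as described (latin-1 if possible, otherwise UTF-8). `encode_record` is a hypothetical helper, not the poster's code.

```python
def encode_record(text):
    # Hypothetical helper: latin-1 if every character fits, otherwise UTF-8.
    try:
        return text.encode('latin-1')
    except UnicodeEncodeError:
        return text.encode('utf-8')

records = ['caf\u00e9', '\u20ac100']
encoded = [encode_record(r) for r in records]
print(encoded)  # [b'caf\xe9', b'\xe2\x82\xac100']
```

The first record comes out as latin-1 and the second as UTF-8, and nothing in the bytes records which branch was taken.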
Did you consider what a programmer writing code to READ your output file would
need to do, e.g. attempt to decode each record as UTF-8 with a fall-back to
latin1??? Did you consider what would be the result of sending a stream of
mixed-encoding text to a display device?
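The reader-side heuristic mentioned above can be sketched as follows (Python 3; `decode_record` is hypothetical). It has two weaknesses. Decoding as latin-1 never fails, so the fallback always "succeeds". Some latin-1 byte sequences are also valid UTF-8, so they are silently misread.

```python
def decode_record(raw):
    # Hypothetical reader: try UTF-8 first, fall back to latin-1.
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')

# Suppose the writer stored the two-character string 'Ã©' as latin-1.
original = '\u00c3\u00a9'          # 'Ã©'
raw = original.encode('latin-1')   # b'\xc3\xa9', which is also valid UTF-8
print(decode_record(raw))          # prints é, not Ã©
```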
As already advised, the short answer, which avoids all of that hassle, is to just
encode in UTF-8.
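Here is a minimal sketch of that advice, assuming Python 3 and a hypothetical one-record-per-line text file. Every record is written as UTF-8, so the reader needs only one decode rule and the round trip is exact, including the 'Ã©' case that fooled the fallback reader.

```python
import os
import tempfile

records = ['caf\u00e9', '\u20ac100', '\u00c3\u00a9']

# Write one record per line, always as UTF-8.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w', encoding='utf-8') as f:
    for r in records:
        f.write(r + '\n')

# The reader needs to know only one thing: the file is UTF-8.
with open(path, encoding='utf-8') as f:
    read_back = [line.rstrip('\n') for line in f]
os.remove(path)

print(read_back == records)  # True
```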