Is str/unicode.encode supposed to work with replace/ignore?

Matt Nordhoff mnordhoff at mattnordhoff.com
Tue Jan 15 22:47:02 EST 2008


BerlinBrown wrote:
> With this code, 'ignore' and 'replace' still generate an error:
> 
>     # Encode to simple ASCII format.
>     field.full_content = field.full_content.encode('ascii', 'replace')
> 
> Error:
> 
> [0/1] 'ascii' codec can't decode byte 0xe2 in position 14317: ordinal not in range(128)
> 
> The document in question is a Wikipedia document. I believe they use
> latin-1 or a similar encoding. I thought 'replace' and 'ignore'
> were supposed to replace and ignore?
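The message says "can't decode" even though the code only calls encode. A minimal Python 3 sketch of the step that actually fails: a strict ASCII decode of bytes containing 0xe2. The byte string is a hypothetical stand-in for the Wikipedia text (b"\xe2\x80\x99" is the UTF-8 encoding of U+2019, a curly apostrophe), and strict_ascii_decode is an illustrative helper, not part of any library.

```python
# Python 3 sketch: reproduce the failing step with an explicit strict ASCII decode.
# Hypothetical input: b"\xe2\x80\x99" is UTF-8 for U+2019 (right single quotation mark).
raw = b"Wikipedia\xe2\x80\x99s article"

def strict_ascii_decode(data: bytes) -> str:
    """Hypothetical helper: decode as ASCII with errors='strict' (the default)."""
    return data.decode("ascii")

try:
    strict_ascii_decode(raw)
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
    print(exc)
```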

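Is field.full_content a str or a unicode? You probably haven't decoded
it from a byte string yet. In Python 2, calling .encode() on a str first
decodes it implicitly with the default 'ascii' codec in strict mode. The
'replace' argument only applies to the encode step, not to that implicit
decode, which is why the error says "can't decode". Wikipedia text is
UTF-8 (0xe2 is a common UTF-8 lead byte for curly quotes and dashes), so
decode it explicitly first and then encode: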

>>> field.full_content = field.full_content.decode('utf8', 'replace')
>>> field.full_content = field.full_content.encode('ascii', 'replace')
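The same two-step fix, sketched in Python 3 under the assumption that the input is UTF-8 bytes (the sample bytes are hypothetical). In Python 3, bytes objects have no .encode method, so the implicit decode cannot happen and the decode step must be written out; this also makes it visible where each errors argument applies.

```python
# Python 3 sketch of the two-step fix: decode bytes to text, then encode text.
raw = b"Wikipedia\xe2\x80\x99s article"  # hypothetical UTF-8 input

text = raw.decode("utf-8", "replace")       # bytes -> str; invalid bytes become U+FFFD
replaced = text.encode("ascii", "replace")  # str -> bytes; non-ASCII becomes b"?"
ignored = text.encode("ascii", "ignore")    # str -> bytes; non-ASCII is dropped

print(replaced)  # b'Wikipedia?s article'
print(ignored)   # b'Wikipedias article'
```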

Why do you want to use ASCII? UTF-8 is great. :-)
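A short Python 3 sketch of why UTF-8 is the safer target, assuming the same hypothetical sample bytes: encoding the decoded text back to UTF-8 reproduces every character, while ASCII with 'replace' loses the apostrophe for good.

```python
# Python 3 sketch: UTF-8 round-trips every character; ASCII 'replace' does not.
raw = b"Wikipedia\xe2\x80\x99s article"  # hypothetical UTF-8 input

text = raw.decode("utf-8")                  # "Wikipedia’s article"
as_utf8 = text.encode("utf-8")              # identical to the input bytes
as_ascii = text.encode("ascii", "replace")  # b"Wikipedia?s article"

print(as_utf8.decode("utf-8") == text)   # True: lossless
print(as_ascii.decode("ascii") == text)  # False: the apostrophe became "?"
```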
More information about the Python-list mailing list