how to remove 'FFFD' character
Carsten Haese
carsten.haese at gmail.com
Fri Jan 9 14:12:44 EST 2009
webcomm wrote:
> I don't know what the character encoding of this data is and don't
> know what the 'FFFD' represents.
The codepoint 0xFFFD is the so-called 'REPLACEMENT CHARACTER'. It is
used to replace an incoming character whose value is unknown or
unrepresentable in Unicode. The browser might display these if, for
example, a page is encoded in latin-1 but claims to be utf-8, so that
the byte stream contains byte sequences that can't be decoded into
Unicode code points.
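To illustrate how such a character can end up in your data, here is a small sketch. The sample bytes and the variable names are made up for the example, and I am assuming the decoder was asked to substitute rather than raise an error, which is what the 'replace' error handler does:

```python
# Hypothetical sample: the word "cafe" with an accented e, encoded as latin-1.
# The byte 0xE9 on its own is not a valid utf-8 sequence.
raw = b'caf\xe9'

# Decoding as utf-8 with the 'replace' error handler substitutes U+FFFD
# for the undecodable byte instead of raising UnicodeDecodeError.
mislabeled = raw.decode('utf-8', 'replace')

# Decoding with the encoding the data was actually written in gives
# the intended character.
correct = raw.decode('latin-1')
```

If that is what happened to your data, the original character is already lost by the time you see U+FFFD, so decoding with the right encoding in the first place is preferable to scrubbing afterwards.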
> I just
> want to scrub it out. I tried this...
>
> clean = txt.encode('ascii','ignore')
>
> ...but the 'FFFD' still comes through.
You must be doing something wrong, then:
py> u'Hello,\ufffd World'.encode('ascii', 'ignore')
'Hello, World'
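Note that encode('ascii', 'ignore') throws away every non-ASCII character, not just U+FFFD. If you only want to drop the replacement character, something along these lines might do. This is a sketch that assumes txt is already a unicode string and not a byte string; strip_replacement_chars is a name I made up for the example:

```python
def strip_replacement_chars(text):
    """Return text with every U+FFFD REPLACEMENT CHARACTER removed.

    Assumes text is a unicode string; other non-ASCII characters are kept.
    """
    return text.replace(u'\ufffd', u'')


# Same sample string as above, plus one with a legitimate non-ASCII character.
cleaned = strip_replacement_chars(u'Hello,\ufffd World')
kept = strip_replacement_chars(u'caf\xe9\ufffd')
```

If the 'FFFD' still comes through after this, txt probably does not contain the code point U+FFFD at all, and it is worth looking at repr(txt) to see what is really in there.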
HTH,
--
Carsten Haese
http://informixdb.sourceforge.net