unicode and strings
Diez B. Roggisch
deetsNOSPAM at web.de
Wed Nov 3 05:29:59 EST 2004
Jacob Friis wrote:
> I'm trying to learn Python via Mark's Feedparser.
>
> <snip src="http://feedparser.org/docs/character-encoding.html">
> If the character encoding can not be determined, Universal Feed Parser
> sets the bozo bit to 1 and sets bozo_exception to
> feedparser.CharacterEncodingUnknown. In this case, parsed values will be
> strings, not Unicode strings.
> </snip>
>
> I guess this means that all data will be unicode, and to put in a
> database I could use my mycode function. Correct?
No. It means that when the encoding can't be determined you don't get
unicode objects, but plain strings, which are basically sequences of bytes,
and there is no way to be sure what encoding they are in.
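To make the ambiguity concrete, here is a small sketch in modern Python 3
(where `bytes` plays the role of the old `str` and `str` the role of
`unicode`). The same bytes decode to different text depending on which
encoding you assume, and nothing in the bytes themselves tells you which
guess is right. The sample word is an arbitrary illustration.

```python
# Python 3 sketch: a byte string carries no record of its encoding.
raw = "café".encode("utf-8")  # b'caf\xc3\xa9'

as_utf8 = raw.decode("utf-8")      # 'café', the intended text
as_latin1 = raw.decode("latin-1")  # 'cafÃ©'; wrong, yet no error is raised

print(as_utf8, as_latin1)
```

Latin-1 maps every byte to some character, so decoding with it never fails;
a successful decode is therefore no proof that the guess was correct.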
>
> def mycode(value):
>     if isinstance(value, unicode):
>         value = value.encode('utf-8')
>     return value
This will yield either a string in UTF-8 encoding (if value was a unicode
object) or a string in an unknown encoding (if feedparser already handed
you a plain string).
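If the goal is to get everything into the database as UTF-8, one option is
to decode plain byte strings yourself before encoding. The sketch below is
written for Python 3; the helper name `to_utf8` and the fallback order (try
UTF-8, then Latin-1) are assumptions for illustration, not something
feedparser does. Because Latin-1 accepts any byte sequence, the fallback
guarantees valid UTF-8 output but not correct text.

```python
def to_utf8(value, fallback="latin-1"):
    """Return value as UTF-8 encoded bytes (hypothetical helper).

    str (Python 3's equivalent of Python 2's unicode) is encoded directly.
    bytes of unknown encoding are first checked as UTF-8; if that fails,
    they are decoded with the assumed fallback encoding. This is a guess,
    not detection: the result is valid UTF-8 but may be mojibake.
    """
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        try:
            value.decode("utf-8")  # already valid UTF-8: pass through as-is
            return value
        except UnicodeDecodeError:
            # Assumed fallback: reinterpret the bytes, then re-encode.
            return value.decode(fallback).encode("utf-8")
    # Unlike mycode, other types are rejected instead of passed through.
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

For example, `to_utf8(b'caf\xe9')` is not valid UTF-8, so it falls back to
Latin-1 and returns `b'caf\xc3\xa9'`, the UTF-8 encoding of "café".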
--
Regards,
Diez B. Roggisch
More information about the Python-list mailing list