[OT] does the charset lie?
Robert Brewer
fumanchu at amor.org
Sun May 2 17:14:03 EDT 2004
David Goodger wrote:
> Skip Montanaro wrote:
> > OTOH, this means if I need the raw content of the page (after
> > expanding any entities), I need to do something like (assuming
> > the raw bytes are already in data):
> >
> > data = unicode(data, "iso-8859-1").encode("utf-8")
> > data = map_entities_to_utf_8(data)
> > data = unicode(data, "utf-8")
>
> Or, even simpler, skip the intermediate step:
>
> data = unicode(data, "iso-8859-1")
> data = map_entities_to_unicode(data)
>
> map_entities_to_unicode() could use htmlentitydefs.name2codepoint from
> the stdlib. This must have already been done somewhere.
On average, I'd guess at least once per Python web app. ;)
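
For what it's worth, here is a rough sketch of what David's two-step version might look like. Only the name map_entities_to_unicode() comes from the thread; the body, the regex, the _replace helper and the sample bytes are my own assumptions. I've written it for Python 3, where htmlentitydefs lives at html.entities and unicode(data, "iso-8859-1") is spelled data.decode("iso-8859-1").

```python
import re
from html.entities import name2codepoint  # "htmlentitydefs" in Python 2

# Matches named (&eacute;), decimal (&#233;) and hex (&#xE9;) references.
# The pattern is an assumption for this sketch, not a full HTML tokenizer.
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def _replace(match):
    """Return the character for one reference, or the reference unchanged."""
    ref = match.group(1)
    try:
        if ref[:2].lower() == "#x":
            return chr(int(ref[2:], 16))
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        return chr(name2codepoint[ref])
    except (KeyError, ValueError, OverflowError):
        # Unknown name or out-of-range code point: leave the text alone.
        return match.group(0)


def map_entities_to_unicode(text):
    """Expand entity references in an already-decoded string (single pass)."""
    return _ENTITY_RE.sub(_replace, text)


# Hypothetical sample: raw bytes that we assume really are ISO-8859-1.
raw = b"caf\xe9 &amp; cr&egrave;me &#169; &bogus;"
data = raw.decode("iso-8859-1")       # bytes -> text, the first step above
data = map_entities_to_unicode(data)  # entities -> characters
```

Decoding first means the entity step only ever sees text, so there is no intermediate UTF-8 round trip. If you are on Python 3.4 or later, html.unescape() in the stdlib covers the same ground and is probably the better choice; the hand-rolled version is only here to show the name2codepoint lookup.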
FuManChu