[Baypiggies] HTML code sets

Paul McNett p at ulmcnett.com
Fri Oct 26 20:46:22 CEST 2007


Hi Max,

> I am reading some raw HTML that contains things like: 
> 
> "enforcing the nation\xe2\x80\x99s laws" 
> 
> and I need to know what incantation to apply to translate the xe2,x80,x99
> into some kind of apostrophe char. I can initialize this string as str or
> unicode.
> 
> The headers are:
> '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html
> xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<meta
> http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />\n


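ISO-8859-1 is also known as latin-1. Note that \xe2\x80\x99 is the UTF-8
encoding of U+2019 (right single quotation mark), so the page body is really
UTF-8 even though the header claims ISO-8859-1. Encoding the unicode string as
latin-1 maps each code point back to its original byte, and a UTF-8 terminal
then displays the apostrophe: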

 >>> s = u"enforcing the nation\xe2\x80\x99s laws"
 >>> print s.encode("latin-1")
enforcing the nation’s laws
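
If you want a real unicode apostrophe in the string rather than relying on the
terminal to interpret the bytes, something along these lines may help. This is
only a sketch in Python 3 syntax, and it assumes the page bytes are actually
UTF-8 despite the declared charset; the variable names (raw, text, mangled,
repaired) are just for illustration.

```python
# Raw bytes as read from the page. Assumption: they are UTF-8,
# even though the meta tag declares ISO-8859-1.
raw = b"enforcing the nation\xe2\x80\x99s laws"

# Decoding as UTF-8 turns the three bytes into one character, U+2019.
text = raw.decode("utf-8")

# If the data was already decoded as latin-1 (one code point per byte),
# round-trip through latin-1 to get the bytes back, then decode as UTF-8.
mangled = raw.decode("latin-1")
repaired = mangled.encode("latin-1").decode("utf-8")

print(text)
print(repaired == text)
```

If the bytes turn out not to be valid UTF-8, decode("utf-8") raises
UnicodeDecodeError, which is a useful signal that the assumption is wrong.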

-- 
pkm ~ http://paulmcnett.com

