[Baypiggies] HTML code sets

Max Slimmer max at theslimmers.net
Fri Oct 26 21:35:02 CEST 2007


Interesting, I tried this (I was aware that ISO-8859-1 is latin-1), and it doesn't work on my system. But treating it as
a utf-8 string, as suggested by Chris Clark, did work. I wonder if you have some encoding set on your machine that differs from mine. Cutting and pasting the lines you indicated returns:

>>> print s.encode("latin-1")
enforcing the nation’s laws
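
For anyone following along, here is a rough sketch of why the utf-8 route works. It is written in Python 3 syntax (my assumption, the session above is Python 2), and the variable names are only illustrative. The bytes e2 80 99 are the utf-8 encoding of U+2019, the right single quotation mark, so the page appears to contain utf-8 data even though its meta tag says ISO-8859-1.

```python
# Bytes as they appear in the raw HTML (the sequence quoted in this thread).
raw = b"enforcing the nation\xe2\x80\x99s laws"

# Decoding as utf-8 collapses the three bytes into one character, U+2019.
text = raw.decode("utf-8")

# If the bytes were already read into a unicode string as latin-1
# (one character per byte), encoding back to latin-1 recovers the
# original bytes, which can then be decoded as utf-8.
misread = raw.decode("latin-1")
repaired = misread.encode("latin-1").decode("utf-8")

print(repr(text))
print(repaired == text)
```

Whether the result then prints as an apostrophe still depends on the encoding the terminal expects, which may be why the same lines behave differently on our two machines.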


 

> -----Original Message-----
> From: Paul McNett [mailto:p at ulmcnett.com] 
> Sent: Friday, October 26, 2007 11:46 AM
> To: Max Slimmer
> Cc: 'Python'
> Subject: Re: [Baypiggies] HTML code sets
> 
> Hi Max,
> 
> > I am reading some raw HTML that contains things like: 
> > 
> > "enforcing the nation\xe2\x80\x99s laws" 
> > 
> > and I need to know what incantation to apply to translate the 
> > xe2,x80,x99 into some kind of apostrophe char. I can initialize this
> > string as str or unicode.
> > 
> > The headers are:
> > '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html
> > xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<meta
> > http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />\n
> 
> 
> ISO-8859-1 is also known as latin-1.
> 
>  >>> s = u"enforcing the nation\xe2\x80\x99s laws"
>  >>> print s.encode("latin-1")
> enforcing the nation’s laws
> 
> --
> pkm ~ http://paulmcnett.com
> 


