[omaha] Parsing bad html
Matthew Nuzum
newz at bearfruit.org
Wed Dec 12 16:38:43 CET 2007
On Dec 12, 2007 7:57 AM, Mike Hostetler <mike at hostetlerhome.com> wrote:
> I was going to say the same thing. If it looks something like HTML, then
> BeautifulSoup can parse it. It's really a killer library for Python
> (although now there is a Ruby version of it).
>
> BeautifulSoup also has one of my favorite-named classes of all time:
>
> class UnicodeDammit
>  |  A class for detecting the encoding of a *ML document and
>  |  converting it to a Unicode string. If the source encoding is
>  |  windows-1252, can replace MS smart quotes with their HTML or XML
>  |  equivalents.
>
>
Oh, that is so beautiful. Nice tip, cp1252 is a curse.
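
For anyone wondering what the smart-quote replacement amounts to, here is a rough stdlib-only sketch in Python. It is not BeautifulSoup's code, it skips the encoding detection entirely and simply assumes the input is windows-1252, and the names SMART_QUOTES and decode_cp1252_to_html are made up for illustration.

```python
# Stdlib-only sketch of the idea; NOT BeautifulSoup's implementation.
# Assumption: the input bytes are known to be windows-1252 (cp1252).
# SMART_QUOTES and decode_cp1252_to_html are hypothetical names.

SMART_QUOTES = {
    "\u2018": "&lsquo;",  # byte 0x91 in windows-1252
    "\u2019": "&rsquo;",  # byte 0x92
    "\u201c": "&ldquo;",  # byte 0x93
    "\u201d": "&rdquo;",  # byte 0x94
}


def decode_cp1252_to_html(raw: bytes) -> str:
    """Decode cp1252 bytes and swap MS smart quotes for HTML entities."""
    text = raw.decode("cp1252")
    for char, entity in SMART_QUOTES.items():
        text = text.replace(char, entity)
    return text


raw = b"\x93It\x92s bad HTML\x94"
print(decode_cp1252_to_html(raw))  # &ldquo;It&rsquo;s bad HTML&rdquo;
```

The real class does considerably more, since it has to guess the encoding first; this only shows why the 0x91-0x94 bytes cause trouble when a page claims to be something other than cp1252.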
--
Matthew Nuzum
newz2000 on freenode