HTMLparsing abnormal html pages

Aahz Maruch aahz at panix.com
Fri Mar 16 19:14:11 EST 2001


In article <98pvp1$15t$1 at news.netmar.com>,  <asle at spam.com> wrote:
>
>Consider the small program below. Running it will show that the HTMLparser
>is truncating URLs in the HTML page. Now, most of you will probably say that
>the page, and in particular its URLs, are not valid according to
>RFC 1738 -- bad luck. But surely there must be a work-around for this?

For this specific case, Mark's solution may well work (haven't tested it
myself).  But you cannot easily find a generic solution because of all
the different ways to mangle HTML.
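Neither the original program nor Mark's fix is reproduced here. The sketch below is only one possible approach, and it assumes a modern interpreter: it uses Python 3's html.parser, not the 2001-era htmllib/sgmllib parser the poster was probably using. Current html.parser takes an unquoted attribute value to be everything up to the next whitespace or ">", and it does not check URLs against RFC 1738. The class and function names and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, tolerating sloppy markup.

    Hypothetical helper for illustration; assumes Python 3's html.parser.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value is None for a bare "<a href>".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    """Return every href found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()  # flush anything still buffered
    return parser.links


if __name__ == "__main__":
    # Made-up page: unquoted URLs containing characters RFC 1738
    # treats as unsafe, plus a quoted URL with a space and an unclosed <a>.
    page = (
        "<html><body>"
        "<A HREF=http://example.com/a|b{c}>one</A>"
        "<a href='http://example.com/x y'>two</a>"
        "<a href=http://example.com/~user/p^q>three"
        "</body></html>"
    )
    for link in collect_links(page):
        print(link)
```

Because this only helps with the "invalid characters in an unquoted URL" kind of breakage, other ways of mangling a page (an unquoted URL containing a space, for example) will still defeat it.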
-- 
                      --- Aahz  <*>  (Copyright 2001 by aahz at pobox.com)

Androgynous poly kinky vanilla queer het Pythonista   http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6

Three sins: BJ, B&J, B&J


