Urllib2, problems with a webserver

John J. Lee jjl at pobox.com
Tue Aug 31 16:04:58 EDT 2004


Erling Ringen Elvsrud <ere.lists at killozapHALLO.com.invalid> writes:
[...]
> HTMLParser.HTMLParseError: malformed start tag, at line 2, column 1365
[...]
> How come I get this error? 
[...]

Bad HTML.  (OK, I haven't actually looked at the HTML, but it's 100/1
that HTMLParser is at fault.)

I hope eventually to rewrite mechanize to use htmllib.HTMLParser
everywhere, and not use HTMLParser.HTMLParser.  The former is less
fussy.  That just means rewriting pullparser to support both classes,
I think.  Not too hard (see ClientForm for how to do it -- why not
write a patch?-).

In the meantime, the best thing to do is to pre-process the HTML.
Inconvenient, I know.  Also a bit inconvenient is that the only way to
do this ATM with mechanize is to write a tiny urllib2 handler class
(.http_response() is the handler method you want, which only exists in
the as-yet-unreleased Python 2.4, and in ClientCookie, which has a
near-identical interface to urllib2; mechanize uses ClientCookie, not
urllib2).  See posts to the wwwsearch-general mailing list for sample
code.
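For readers on a current Python, here is a rough sketch of the kind of handler meant above. It uses urllib.request, the Python 3 successor to urllib2. The ClientCookie/urllib2 versions of this era differ in detail, so treat it as an outline of the idea, not drop-in code. The names HTMLCleanupProcessor and fix_html are made up for illustration. The regex "fix" (collapsing a doubled closing quote such as href="x"") is likewise only an example rule; the real clean-up depends on what is actually malformed in your pages.

```python
import io
import re
import urllib.request
from urllib.response import addinfourl

# Hypothetical clean-up rule (illustration only): turn attr="x"" into attr="x".
# Replace with whatever actually upsets the parser in your HTML.
_DOUBLED_QUOTE = re.compile(rb'="([^"]*)""')


def fix_html(body):
    """Return a cleaned-up copy of the raw HTML bytes (hypothetical helper)."""
    return _DOUBLED_QUOTE.sub(rb'="\1"', body)


class HTMLCleanupProcessor(urllib.request.BaseHandler):
    """Response processor that rewrites HTML bodies before anything parses them."""

    def http_response(self, request, response):
        content_type = response.headers.get("Content-Type", "")
        if "html" not in content_type:
            return response  # leave non-HTML responses untouched
        body = fix_html(response.read())
        response.close()
        # Wrap the cleaned body in a new response object carrying the same
        # headers, URL and status code as the original.
        new_response = addinfourl(io.BytesIO(body), response.headers,
                                  response.url, response.code)
        # Later processors (e.g. HTTPErrorProcessor) read .msg, so carry it over.
        new_response.msg = getattr(response, "msg", "")
        return new_response

    # Apply the same clean-up to HTTPS responses.
    https_response = http_response


opener = urllib.request.build_opener(HTMLCleanupProcessor())
```

Responses fetched with opener.open(url) then reach your parser already cleaned. Doing the fix in a response processor, not at each call site, means every page goes through the same pre-processing.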

Don't mix urllib2 and ClientCookie, BTW (with the exception of classes
that exist in urllib2 but not in ClientCookie: you can use those
urllib2 classes with ClientCookie).

HTH


John



More information about the Python-list mailing list