HTMLParser error
alex23
wuwei23 at gmail.com
Wed May 21 06:04:37 EDT 2008
On May 21, 6:58 pm, jonbutle... at googlemail.com wrote:
> It's not a variable I set, it's one of HTMLParser's inbuilt variables. I
> am using it with urlopen to get the source of a website and feed it to
> HTMLParser.
>
> def parse(self, page):
>     try:
>         self.feed(urlopen('http://' + page).read())
>     except HTTPError:
>         print 'Error getting page source'
>
> This is the code I am using. I have tested the other modules and they
> work fine, but I haven't got a clue how to fix this one.
You're not providing enough information. Try to post a minimal code
fragment that demonstrates your error; it gives us all a common basis
for discussion.
Is your Spider class a subclass of HTMLParser? Is it overriding
__init__? If so, is it also calling the parent initializer, with
something like:

    super(Spider, self).__init__()

If that call is missing, looking at the HTMLParser code you could get
away with just doing the following in __init__:

    self.reset()

This appears to be the method that adds the .rawdata attribute.
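Ideally, you should call the parent initializer rather than reset();
you're less reliant on the implementation of HTMLParser that way. One
caveat: in Python 2, HTMLParser is an old-style class, so super()
raises a TypeError there, and the explicit form
HTMLParser.__init__(self) is the one that works.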
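
Here is a minimal sketch of the failure and the fix. It is written for
Python 3 as an assumption on my part (there the module is html.parser,
and super() works on HTMLParser). The names BrokenSpider, Spider, links
and try_feed are hypothetical, not taken from your code.

```python
# Python 3 module name; the Python 2 code in this thread imports from HTMLParser.
from html.parser import HTMLParser


class BrokenSpider(HTMLParser):
    """Overrides __init__ without initialising the parent parser."""

    def __init__(self):
        # The parent __init__ never runs, so .rawdata is never created.
        self.links = []


class Spider(HTMLParser):
    """Same idea, but the parent initialiser is called first."""

    def __init__(self):
        super().__init__()  # runs HTMLParser's own setup, which creates .rawdata
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect the href of every anchor tag seen.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def try_feed(parser, html):
    """Feed html to parser; return the AttributeError message, or None on success."""
    try:
        parser.feed(html)
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    page = '<a href="http://example.com/">example</a>'
    print(try_feed(BrokenSpider(), page))  # message names the missing attribute
    spider = Spider()
    print(try_feed(spider, page))
    print(spider.links)
```

If your error message mentions rawdata the way the first call does, a
missing parent __init__ call is the likely cause.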
- alex23