python+libxml2+scrapy AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

Sat Aug 18 13:56:42 EDT 2012

Dmitry Arsentiev, 15.08.2012 14:49:
> Has anybody already met a problem like this?
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
> 
> When I run scrapy, I get
> 
>   File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py",
> line 14, in <module>
>     libxml2.HTML_PARSE_NOERROR + \
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
> 
> 
> When I run
>  python -c 'import libxml2; libxml2.HTML_PARSE_RECOVER'
> 
> I get
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
> 
> How can I cure it?
> 
> Python 2.7
> libxml2-python 2.6.9
> 2.6.11-gentoo-r6

That version of libxml2 is far too old and doesn't support parsing
real-world HTML, which is why the HTML_PARSE_RECOVER constant is missing
from the module. IIRC, that support started with 2.6.21 and was improved
a bit in later releases.

Install 2.8.0, as someone already pointed out.
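
If you want to check an installation before running scrapy, something
like the following might help. It is only a rough sketch: the helper
names (parse_version, is_new_enough, has_recover_flag) are made up for
illustration, and the 2.6.21 threshold is just the "IIRC" figure from
above, not a verified cutoff. The version string has to be supplied by
hand (e.g. from your package manager).

```python
from __future__ import print_function

import re

# Assumption: threshold taken from the "IIRC ... 2.6.21" remark above.
MIN_VERSION = (2, 6, 21)


def parse_version(text):
    """Turn '2.6.9' into (2, 6, 9); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_new_enough(version_text, minimum=MIN_VERSION):
    """Compare a dotted version string against the assumed minimum."""
    return parse_version(version_text) >= minimum


def has_recover_flag(module):
    """Probe for the constant directly instead of trusting the version."""
    return hasattr(module, "HTML_PARSE_RECOVER")


if __name__ == "__main__":
    print("2.6.9 new enough:", is_new_enough("2.6.9"))
    print("2.8.0 new enough:", is_new_enough("2.8.0"))
    try:
        import libxml2
    except ImportError:
        print("libxml2 Python bindings are not installed")
    else:
        print("HTML_PARSE_RECOVER available:", has_recover_flag(libxml2))
```

Probing for the attribute is the more reliable of the two checks, since
it tests exactly what scrapy needs rather than a remembered version
number.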

Stefan