Installing Parsers/Tree Builders to, and accessing these packages from Python2.7

Simon Evans musicalhacksaw at yahoo.co.uk
Sun Nov 2 16:59:25 EST 2014


Oh, I don't mind quoting console output; I just thought I'd be sparing you
unnecessary detail.

The output was going nicely as I typed in the code from my 'Getting Started
with Beautiful Soup' book. Even where the author reckoned things would go
wrong because lxml was not installed, things went right, because I had
already installed it, re:
----------------------------------------------------------------------------
page 17
----------------------------------------------------------------------------
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url = "http://www.packtpub.com/books"
>>> page = urllib2.urlopen(url)
>>> soup_packtpage = BeautifulSoup(page)
>>> with open("foo.html","r") as foo_file:
... soup_foo = Soup(foo_file)
  File "<stdin>", line 2
    soup_foo = Soup(foo_file)
           ^
IndentationError: expected an indented block
>>> soup_foo= BeautifulSoup("foo.html")
----------------------------------------------------------------------------
page 18
----------------------------------------------------------------------------
>>> print(soup_foo)
<html><body><p>foo.html</p></body></html>
>>> soup_url = BeautifulSoup("http://www.packtpub.com/books")
>>> print(soup_url)
<html><body><p>http://www.packtpub.com/books</p></body></html>
>>> helloworld = "<p>Hello World</p>"
>>> soup_string = BeautifulSoup(helloworld)
>>> print(soup_string)
<html><body><p>Hello World</p></body></html>
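
Two things stand out in the page 17 and 18 output above. The body of the
with-block was typed without indentation (and as Soup rather than
BeautifulSoup), hence the IndentationError, and BeautifulSoup("foo.html")
evidently parsed the string itself as markup rather than opening the file,
as the printed <p>foo.html</p> shows. A minimal sketch of what I take to be
the intended steps follows. It assumes a stand-in foo.html written to a
temporary directory (the path and contents are hypothetical, not the book's
file) and uses the built-in 'html.parser' so that no extra install is needed:

```python
import os
import tempfile

# Hypothetical stand-in for the book's foo.html, written to a temp directory.
path = os.path.join(tempfile.mkdtemp(), "foo.html")
with open(path, "w") as out_file:
    out_file.write("<p>Hello World</p>")

# The body of a with-block must be indented, at the interactive prompt too.
with open(path, "r") as foo_file:
    markup = foo_file.read()

try:
    from bs4 import BeautifulSoup
except ImportError:
    BeautifulSoup = None  # bs4 not installed; the file handling above still runs

if BeautifulSoup is not None:
    # Parses the contents that were read from the file.
    soup_foo = BeautifulSoup(markup, "html.parser")
    # Parses the literal string "foo.html", not the file of that name.
    soup_name = BeautifulSoup("foo.html", "html.parser")
    print(soup_foo.p.string)
    print(soup_name.get_text())
```

The same applies to the URL on page 18: passing the address as a string only
parses the address text, whereas page 17 passes the object returned by
urllib2.urlopen(url).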
----------------------------------------------------------------------------
page 19: no code in text on this page
----------------------------------------------------------------------------
page 20
----------------------------------------------------------------------------
>>> soup_xml = BeautifulSoup(helloworld,features= "xml")
>>> soup_xml = BeautifulSoup(helloworld,"xml")
>>> print(soup_xml)
<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>
>>> soup_xml = BeautifulSoup(helloworld,features = "xml")
>>> print(soup_xml)
<?xml version="1.0" encoding="utf-8"?>
<p>Hello World</p>
>>>
----------------------------------------------------------------------------
Then at the bottom of page 20 it says 'we should install the required parsers
using easy-install,pip or setup.py install', but as I can't get the downloads
of the html or html5 parsers, the code from the text halfway down returns the
standard response about the requisite parser needing to be installed, re:
----------------------------------------------------------------------------
page 21
----------------------------------------------------------------------------
>>> invalid_html = '<a invalid content'
>>> soup_invalid_html = BeautifulSoup(invalid_html,'lxml')
>>> print(soup_invalid_html)
<html><body><a content="" invalid=""></a></body></html>
>>> soup_invalid_html = BeautifulSoup(invalid_html,'html5lib')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 155, in __init__
    % ",".join(features))
ValueError: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?
>>>
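
The ValueError above suggests that html5lib cannot be imported by this
Python, while lxml can. A rough sketch of checking which parser libraries
are importable before asking Beautiful Soup for one is below. The helper
names importable and pick_parser are my own, not part of bs4, and the
fallback assumes the built-in 'html.parser' is acceptable:

```python
def importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


def pick_parser(preferred=("html5lib", "lxml")):
    """Return the first importable parser name, else the built-in 'html.parser'.

    Hypothetical helper: the names are the strings passed to BeautifulSoup
    as its second argument, which for these two match the module names.
    """
    for name in preferred:
        if importable(name):
            return name
    return "html.parser"


parser = pick_parser()
print("Using parser: " + parser)
```

The returned name can then be passed on, as in
BeautifulSoup(invalid_html, parser), so the call does not fail when one of
the optional libraries is missing.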


