Why doesn't the input code return 'plants' as in the 'Getting Started with Beautiful Soup' text (on page 30)?

Simon Evans musicalhacksaw at yahoo.co.uk
Sun Jul 12 04:59:57 EDT 2015


Dear Mark Lawrence, thank you for your advice.
I take it that I should use the input you suggest in place of the line:

soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html","lxml")

Seeing as I have to give the file's full path, I therefore have to modify your:

soup = BeautifulSoup(ecological_pyramid,"lxml")

to :

soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")

otherwise I get:


>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as ecological_pyramid:
>>> soup = BeautifulSoup(ecological_pyramid,"lxml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ecological_pyramid' is not defined


So anyway, with the input therefore as:

>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as ecological_pyramid: 
>>> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid,","lxml")
>>> producer_entries = soup.find("ul")
>>> print(producer_entries.li.div.string)

I still get the following output from the console:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'li'
>>>

What problem does Python have with finding the required HTML within the 'ecologicalpyramid' HTML file, or more specifically, why does it respond that the HTML file has no such attribute as 'li'?
Incidentally, I have installed all the xml, lxml, html, and html5 TreeBuilders/parsers. I am using lxml as that is the parser specified in the text.
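
For reference, here is a minimal sketch of what may be going on. BeautifulSoup treats a plain string as the markup itself, not as a path to open, so a path string contains no <ul> tag and find("ul") returns None; that would account for the AttributeError on 'li'. The temporary file, the variable names other than those quoted above, and the use of "html.parser" (chosen so the sketch runs without lxml installed) are assumptions made for illustration only.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Stand-in for ecological_pyramid.html (hypothetical content, based on the
# snippet quoted from the book).
html = """<ul id="producers">
<li class="producerlist">
<div class="name">plants</div>
<div class="name">100000</div>
</li>
</ul>"""

# Write the markup to a temporary file so the sketch does not depend on
# any particular folder existing on disk.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "ecological_pyramid.html")
with open(path, "w") as f:
    f.write(html)

# Case 1: the path is passed as a string. The path text itself is parsed
# as markup, so there is no <ul> tag to find. (Recent bs4 versions may
# emit a warning that the input looks like a filename.)
soup_from_path = BeautifulSoup(path, "html.parser")
ul_from_path = soup_from_path.find("ul")
print(ul_from_path)  # None

# Case 2: the open file object is passed, and the parsing happens inside
# the indented body of the with statement.
with open(path, "r") as ecological_pyramid:
    soup_from_file = BeautifulSoup(ecological_pyramid, "html.parser")

producer_entries = soup_from_file.find("ul")
first_producer = producer_entries.li.div.string
print(first_producer)  # plants
```

On Windows, writing the path as a raw string such as r"C:\Beautiful Soup\ecological_pyramid.html" avoids any surprises from backslash escapes.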

I may as well quote the text on the page in question in 'Getting Started with Beautiful Soup':

'Since producers come as the first entry for the <ul> tag, we can use the find() method, which normally searches for only the first occurrence of a particular tag in a BeautifulSoup object. We store this in producer_entries. The next line prints the name of the first producer. From the previous HTML diagram we can understand that the first producer is stored inside the first <div> tag of the first <li> tag that immediately follows the first <ul> tag, as shown in the following code:

<ul id = "producers">
<li class= "producerlist">
<div class= "name">plants</div>
<div class="name">100000</div>
</li>
</ul>

So after running the preceding code, we will get plants, which is the first producer, as the output.'

(page 30)
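
The navigation described in the quoted passage can be sketched directly on the quoted snippet, without any file involved. This is only an illustration: the markup is passed as a string (which is what BeautifulSoup expects when given a string), "html.parser" is used instead of lxml so that nothing extra needs installing, and the names html_markup, first_producer and all_values are made up for the example.

```python
from bs4 import BeautifulSoup

# The snippet quoted from page 30, passed to BeautifulSoup as markup.
html_markup = """<ul id="producers">
<li class="producerlist">
<div class="name">plants</div>
<div class="name">100000</div>
</li>
</ul>"""

soup = BeautifulSoup(html_markup, "html.parser")

# find() returns only the first matching tag (or None if there is none).
producer_entries = soup.find("ul")

# .li and .div each step to the first tag of that name below the current
# one; .string gives the text inside that tag.
first_producer = producer_entries.li.div.string
print(first_producer)  # plants

# find_all() returns every match, here both <div> tags under the <ul>.
all_values = [div.string for div in producer_entries.find_all("div")]
print(all_values)
```

If producer_entries were None at this point, the .li access would raise the same AttributeError as in the traceback above, so checking the result of find() before navigating further is a reasonable precaution.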


