[Tutor] beautifulsoup - getting an error when NavigableString object is returned

Clay Wiedemann clay.wiedemann at gmail.com
Sun Mar 4 04:22:05 CET 2007


I wanted to scrape the quotes from IMDB quote pages, just to start
learning Python. Quotes are not nested, so I find the anchor links that
precede them. I thought I could walk along the siblings until I hit an
HR tag, meanwhile grabbing people and quotes via hits on <b> and <br>.
But once I tried to step from the anchor link to its next sibling and
pull the name, I found I kept getting a NavigableString instead of a
Tag, so asking for the .name attribute raised an AttributeError.

Any idea why this might happen?


This is the relevant chunk of IMDB code:

<a name="qt0210620"></a>

<b><a href="/name/nm0629454/">Bill</a></b>:
You're supposed to wear the blue dress when I wear this.
<br>

<b><a href="/name/nm0707043/">Mary</a></b>:
I don't want to dress like twins anymore.
<br>

<b><a href="/name/nm0629454/">Bill</a></b>:
We're not twins. We're a trio.
<br>
<hr width="30%">
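
One likely explanation is that the blank line between the anchor and the
first <b> is itself a piece of text, and BeautifulSoup keeps such text in
the tree as a NavigableString sibling. The sketch below illustrates this
with the standard library's HTML parser rather than BeautifulSoup, so it
only shows what a parser sees in the markup, not BeautifulSoup's own tree.
EventLogger and SNIPPET are names made up for this illustration.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2, as used in this post

# First few lines of the IMDB chunk quoted above.
SNIPPET = """<a name="qt0210620"></a>

<b><a href="/name/nm0629454/">Bill</a></b>:
You're supposed to wear the blue dress when I wear this.
<br>
"""


class EventLogger(HTMLParser):
    """Record every start tag and every run of text, in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("tag", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


logger = EventLogger()
logger.feed(SNIPPET)
logger.close()

for kind, value in logger.events:
    print("%s %r" % (kind, value))
```

The second event recorded is the text '\n\n', which sits between the
anchor and the first <b>. That is the node a call to nextSibling on the
anchor would be expected to land on.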


---


And this is what I wrote (and if there are other awful things about
this, I would be happy to know):


#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup
import re


# stubs --------------------------

movietitle_stub = "Nashville"                       # later: search and pull first result (if movie?)
movieurl_stub = "http://imdb.com/title/tt0073440/"  # and get this



def soupifyPage(target):
	"""
	grab html from a page
	probably need real method of checking for failure, huh
	"""
	codeReq = urllib2.Request(target)
	response = urllib2.urlopen(codeReq)
	soupyhtml = BeautifulSoup(response)
	return soupyhtml


def pullQuote(curTag):
	# character is in bold
	print curTag.nextSibling.name
	'''
	if curTag.nextSibling.name == 'hr':
		#are done
		return quoteBlock
	print "seeing" + curTag.nextSibling.name
	quoteBlock = quoteBlock + " - " + curTag.nextSibling.name
	curTag = curTag.nextSibling
	'''




quotepage = movieurl_stub + "quotes"
print "Getting this:" + quotepage
print "---------------"
quotebag = soupifyPage(quotepage)


# each quote is preceded by an anchor link whose name begins with "qt",
# for example <a name="qt0229419"></a>
# they end with an HR tag
# they are not nested

quotations = quotebag.findAll(attrs = {'name' : re.compile("^qt")})

for q in quotations:
	#pullQuote(q)
	# raises AttributeError: 'NavigableString' object has no attribute 'name'
	print q.nextSibling.name
	print "next!"
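
A possible way around the error, sketched here under stated assumptions,
is to skip text nodes while walking the siblings. The helpers is_tag,
next_tag and tags_until are hypothetical names, not part of BeautifulSoup.
They assume only that nodes expose nextSibling as in the loop above, and
that text nodes either lack .name or have it set to None.

```python
def is_tag(node):
    """True for tag-like nodes, False for text nodes such as NavigableString.

    Assumption: text nodes either lack a .name attribute (as the error in
    this post shows) or have it set to None.
    """
    return getattr(node, "name", None) is not None


def next_tag(node):
    """Return the first following sibling that is a tag, or None."""
    sibling = node.nextSibling
    while sibling is not None and not is_tag(sibling):
        sibling = sibling.nextSibling
    return sibling


def tags_until(node, stop_name):
    """Yield tag siblings after node, stopping before the first stop_name tag."""
    tag = next_tag(node)
    while tag is not None and tag.name != stop_name:
        yield tag
        tag = next_tag(tag)
```

With these, the loop body could call next_tag(q) and check the result
for None before reading .name, and tags_until(q, 'hr') would yield the
<b> and <br> tags of one quote. The spoken lines themselves live in the
text nodes that these helpers skip, so collecting the quote text would
need a walk that keeps the strings as well.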
		



Thanks,
Clay

- - - - - - -

Clay S. Wiedemann

