[Tutor] Beautiful soup
paul brian
paul1brian at gmail.com
Tue Oct 4 19:13:45 CEST 2005
How did you change it to look at the file on your PC?
You appear to have told urllib2 to use "FRE_word_list.htm", it cannot
find that online so tried to look for it on your local disk at
'\\C:\\Python24\\FRE_word_list.htm
I would suggest that you either put your local html on a web server
and send in that local URL or replace html =
urllib2.urlopen(url).read() with
html = open(r'c:\myfolder\myfile.html').read() and see where that takes you.
cheers
On 10/4/05, David Holland <davholla2002 at yahoo.co.uk> wrote:
> I tried to use this script which I found on the web :-
> import urllib2, pprint
> from BeautifulSoup import BeautifulSoup
>
>
> def cellToWord(cell):
> """Given a table cell, return the word in that
> cell."""
> # Some words are in bold.
> if cell('b'):
> return cell.first('b').string.strip() #
> Return the bold piece.
> else:
> return cell.string.split('.')[1].strip() #
> Remove the number.
>
>
> def parse(url):
> """Parse the given URL and return a dictionary
> mapping US words to
> foreign words."""
>
>
> # Read the URL and pass it to BeautifulSoup.
> html = urllib2.urlopen(url).read()
> soup = BeautifulSoup()
> soup.feed(html)
>
>
> # Read the main table, extracting the words from
> the table cells.
> USToForeign = {}
> mainTable = soup.first('table')
> rows = mainTable('tr')
> for row in rows[1:]: # Exclude the first
> (headings) row.
> cells = row('td')
> if len(cells) == 3: # Some rows have a
> single colspan="3" cell.
> US = cellToWord(cells[0])
> foreign = cellToWord(cells[1])
> USToForeign[US] = foreign
>
>
> return USToForeign
>
>
> if __name__ == '__main__':
> url =
> 'http://msdn.microsoft.com/library/en-us/dnwue/html/FRE_word_list.htm'
>
> USToForeign = parse(url)
> pairs = USToForeign.items()
> pairs.sort(lambda a, b: cmp(a[0].lower(),
> b[0].lower())) # Web page order
> pprint.pprint(pairs)
>
> and it works well. However I change it to get it to
> look at a file on my PC, then I get this message :-
> Traceback (most recent call last):
> File "C:\Python24\beaexp2", line 43, in -toplevel-
> USToForeign = parse(url)
> File "C:\Python24\beaexp2", line 20, in parse
> html = urllib2.urlopen(url).read()
> File "C:\Python24\lib\urllib2.py", line 130, in
> urlopen
> return _opener.open(url, data)
> File "C:\Python24\lib\urllib2.py", line 358, in open
> response = self._open(req, data)
> File "C:\Python24\lib\urllib2.py", line 376, in
> _open
> '_open', req)
> File "C:\Python24\lib\urllib2.py", line 337, in
> _call_chain
> result = func(*args)
> File "C:\Python24\lib\urllib2.py", line 1119, in
> file_open
> return self.open_local_file(req)
> File "C:\Python24\lib\urllib2.py", line 1135, in
> open_local_file
> stats = os.stat(localfile)
> OSError: [Errno 2] No such file or directory:
> '\\C:\\Python24\\FRE_word_list.htm
> Any idea how to solve it ? The file is on my PC.
>
> I am using Python 2.4 on Win XP.
>
> Thanks in advance.
>
> David
>
>
>
> ___________________________________________________________
> How much free photo storage do you get? Store your holiday
> snaps for FREE with Yahoo! Photos http://uk.photos.yahoo.com
> _______________________________________________
> Tutor maillist - Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>
--
--------------------------
Paul Brian
m. 07875 074 534
t. 0208 352 1741
More information about the Tutor
mailing list