[Tutor] list iteration question for writing to a file on disk

sacha rook sacharook at hotmail.co.uk
Fri Sep 14 11:56:57 CEST 2007


Hi
 
Can someone help with this, please?
 
I got to this point with help from the list:
 
from BeautifulSoup import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '<a href="http://www.google.co.uk"></a>',
       '<a href="http://www.bbc.co.uk"></a>',
       '<a href="http://www.amazon.co.uk"></a>',
       '<a href="http://www.redhat.co.uk"></a>',
       '</html>']
soup = BeautifulSoup(''.join(doc))
alist = soup.findAll('a')    # every <a> tag in the document

import urlparse
for a in alist:
    href = a['href']
    print urlparse.urlparse(href)[1]    # [1] is the network location (the hostname)
 
So BeautifulSoup is used to find the <a> tags, urlparse extracts the fully qualified domain name, and print gives a nice list of hosts, one per line. Here:
www.google.co.uk
www.bbc.co.uk
www.amazon.co.uk
www.redhat.co.uk
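For readers on Python 3, here is a minimal sketch of the same hostname extraction using only the standard library. It swaps in `html.parser.HTMLParser` for BeautifulSoup (a different parser, not the one used in the post) and uses `urllib.parse.urlparse`, which is where Python 2's `urlparse` module lives in Python 3. `LinkCollector` and `hosts_from_html` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hosts_from_html(html):
    """Return the hostname of each link, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # .netloc is the named form of the [1] index used in the post
    return [urlparse(href).netloc for href in parser.hrefs]


doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '<a href="http://www.google.co.uk"></a>',
       '<a href="http://www.bbc.co.uk"></a>',
       '<a href="http://www.amazon.co.uk"></a>',
       '<a href="http://www.redhat.co.uk"></a>',
       '</html>']

for host in hosts_from_html(''.join(doc)):
    print(host)
```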
 
Nice, so I think I'll write them out to a file. I change the program to this, to write them to disk and read them back to see what's been done:
 
from BeautifulSoup import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '<a href="http://www.google.co.uk"></a>',
       '<a href="http://www.bbc.co.uk"></a>',
       '<a href="http://www.amazon.co.uk"></a>',
       '<a href="http://www.redhat.co.uk"></a>',
       '</html>']
soup = BeautifulSoup(''.join(doc))
alist = soup.findAll('a')

import urlparse
output = open("fqdns.txt", "w")
for a in alist:
    href = a['href']
    output.write(urlparse.urlparse(href)[1])    # hostname only; write() adds no newline
output.close()
 
 
This writes out (all on one line):

www.google.co.ukwww.bbc.co.ukwww.amazon.co.ukwww.redhat.co.uk
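A quick way to confirm what happened, as a minimal Python 3 sketch (the post itself uses Python 2): a file's `write()` method writes exactly the string it is given and adds no line separator, so the hostnames end up run together. A throwaway temporary path stands in for fqdns.txt so the sketch doesn't touch the working directory.

```python
import os
import tempfile

hosts = ["www.google.co.uk", "www.bbc.co.uk", "www.amazon.co.uk", "www.redhat.co.uk"]

# Throwaway file in a fresh temporary directory, standing in for fqdns.txt
path = os.path.join(tempfile.mkdtemp(), "fqdns.txt")

with open(path, "w") as output:
    for host in hosts:
        output.write(host)  # nothing is added after each hostname

with open(path) as f:
    contents = f.read()

print(contents)  # one long line with every hostname run together
```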
 
So I look in Alan's tutorial PDF for the issue and read page 120, where it suggests doing this:

    outp.write(line + '\n')  # \n is a newline
 
So I change my line from this:
    output.write(urlparse.urlparse(href)[1])
to this:
    output.write(urlparse.urlparse(href)[1] + "\n")
 
I look at the output file and get this:
 
www.google.co.uk
www.bbc.co.uk
www.amazon.co.uk
www.redhat.co.uk
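An alternative to adding "\n" by hand, sketched in Python 3: `print()` can write to a file via its `file` argument and supplies the newline itself, so the result matches `write(host + "\n")`. In Python 2 the rough equivalent is `print >>output, host`. The temporary path again stands in for fqdns.txt.

```python
import os
import tempfile

hosts = ["www.google.co.uk", "www.bbc.co.uk", "www.amazon.co.uk", "www.redhat.co.uk"]

# Throwaway file standing in for fqdns.txt
path = os.path.join(tempfile.mkdtemp(), "fqdns.txt")

with open(path, "w") as output:
    for host in hosts:
        # print() appends "\n" itself, just like write(host + "\n")
        print(host, file=output)

with open(path) as f:
    contents = f.read()

print(contents, end="")  # one hostname per line
```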
 
Hooray, I think. So then I open the file in the program to read each line and do something with it. I pop this in after the last output.close():
 
input = open("fqdns.txt", "r")
for j in input:
    print j
input.close()
 
But this prints out:
 
www.google.co.uk
 
www.bbc.co.uk
 
www.amazon.co.uk
 
www.redhat.co.uk
 
 
Why do I get each record followed by an extra blank line? Am I writing out the records incorrectly, or am I handling them incorrectly when I open the file and print them? Do I have to strip out the newlines as I process each line?
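A minimal Python 3 sketch of what seems to be going on (the post's code is Python 2): iterating over a file yields each line with its trailing "\n" still attached, and `print()` then adds a newline of its own, so every hostname is followed by a blank line. Stripping the line ending with `rstrip("\n")` before printing gives one hostname per line. The temporary file stands in for fqdns.txt, and stdout is captured with `contextlib.redirect_stdout` only so the two outputs can be compared.

```python
import contextlib
import io
import os
import tempfile

hosts = ["www.google.co.uk", "www.bbc.co.uk", "www.amazon.co.uk", "www.redhat.co.uk"]
path = os.path.join(tempfile.mkdtemp(), "fqdns.txt")  # throwaway stand-in for fqdns.txt

with open(path, "w") as output:
    for host in hosts:
        output.write(host + "\n")

# Cause: each line read from the file still ends with "\n",
# and print() then adds a newline of its own.
with open(path) as f:
    raw_lines = list(f)

doubled = io.StringIO()
with contextlib.redirect_stdout(doubled):
    for line in raw_lines:
        print(line)

# One possible fix: strip the trailing newline before using the line.
with open(path) as f:
    clean = [line.rstrip("\n") for line in f]

fixed = io.StringIO()
with contextlib.redirect_stdout(fixed):
    for host in clean:
        print(host)

print(repr(doubled.getvalue()))
print(repr(fixed.getvalue()))
```

In the post's Python 2 code the corresponding change would be `print j.rstrip('\n')`. `rstrip("\n")` is used here rather than `strip()` so that only the line ending is removed; `strip()` would also drop leading and trailing spaces, which may or may not be wanted.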
 
Any help would be great.
 
s
 