Writing multiple files at a time
Denis McMahon
denismfmcmahon at gmail.com
Sun Jun 29 15:21:39 EDT 2014
On Sun, 29 Jun 2014 10:32:00 -0700, subhabangalore wrote:
> I am opening multiple URLs with urllib.open, and each one has a huge
> html source file. As these files are read I am trying to concatenate
> them into one txt file as a string.
> From this big txt file I am trying to take out the html body of each
> URL and write and store them separately.
OK, let me clarify what I think you said.
First you concatenate all the web pages into a single file.
Then you extract all the page bodies from the single file and save them
as separate files.
This seems a silly way to do things; why not just save each html body
section as you receive it?
This sounds like it should be something as simple as:
from BeautifulSoup import BeautifulSoup
import requests

urlList = [
    "http://something/",
    "http://something/",
    "http://something/",
    # ....... as many urls as you need
    ]

for n, url in enumerate( urlList ):    # n numbers the output files from 0
    r = requests.get( url )            # fetch the page
    soup = BeautifulSoup( r.content )  # parse it
    body = soup.find( "body" )         # pull out the body element
    fp = open( "scraped/body{:0>5d}.htm".format( n ), "w" )
    fp.write( body.prettify() )
    fp.close()                         # close is a method, it needs the ()
will give you:
scraped/body00000.htm
scraped/body00001.htm
scraped/body00002.htm
........
for as many urls as you have in your url list. (make sure the target
directory exists!)
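If you'd rather not create the directory by hand, something like this at
the top of the script should do it. A minimal sketch, assuming the
"scraped" directory name used above; os.makedirs raises an error if the
directory already exists, hence the check:

import os

outDir = "scraped"
if not os.path.isdir( outDir ):    # only create it if it isn't there yet
    os.makedirs( outDir )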
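(Aside: the import above is for the old BeautifulSoup 3. If you have the
newer bs4 package instead, the loop body would look more like the sketch
below; the with statement also closes the file for you, so the fp.close()
call goes away. Untested against your setup, treat it as a sketch.)

from bs4 import BeautifulSoup    # BeautifulSoup 4 lives in the bs4 package
import requests

r = requests.get( url )
soup = BeautifulSoup( r.content, "html.parser" )  # bs4 wants an explicit parser
body = soup.find( "body" )
with open( "scraped/body{:0>5d}.htm".format( n ), "w" ) as fp:
    fp.write( body.prettify() )  # file is closed automatically on exit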
--
Denis McMahon, denismfmcmahon at gmail.com