Opening Multiple files at one time

Dave Angel davea at davea.name
Tue Apr 21 06:17:35 EDT 2015


On 04/21/2015 03:56 AM, subhabrata.banerji at gmail.com wrote:

>
> Yes. They do not. They are opened one by one.
> I have a big chunk of data that I am getting by crawling etc.;
> as I run the code, it fetches the data from various sites.
> The contents of each site get stored in a separate file.
> For example, if I open the site "http://www.theguardian.com/international", the result may be stored in "file1.txt", and the contents of the site "http://edition.cnn.com/" may be stored in "file2.txt".
>
> But the contents of each site change every day. So every day, as you open these sites and store the results, you should store them in different text files. I want these text files to be created on their own, as you do not know in advance how many you will need.
>
> I am trying to do something with datetime, as datetime.datetime.now()
> changes every time; I am trying to put it in the name of the file. But you may suggest something better.
>

To get the text version of today's date, use something like:

import datetime
import itertools
SUFFIX = datetime.datetime.now().strftime("%Y%m%d")
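
On the date of this message, for example, that gives the string "20150421".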

To write a generator that produces filenames sequentially (untested):

def filenames(suffix=SUFFIX):
    # yield "0001-<suffix>", "0002-<suffix>", ... without end
    for integer in itertools.count(1):
        yield "{0:04d}".format(integer) + "-" + suffix


for filename in filenames():
    f = open(filename, "w")
    # ... do some work here which writes to the file, and "break"
    # ... if we don't need any more files
    f.close()

Note that this is literally open-ended.  If you don't put some logic in 
that loop which will break, it'll write files till your OS stops you, 
either because of disk full, or directory too large, or whatever.

I suggest you test the loop out by using a print() before using it to 
actually create the files.
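
A minimal sketch of such a dry run; itertools.islice() caps the
otherwise endless generator:

import itertools

# print the first five names the generator would produce,
# e.g. 0001-20150421, 0002-20150421, ...
for filename in itertools.islice(filenames(), 5):
    print(filename)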

In the format above, I used 4 digits, on the assumption that that is 
usually enough.  If you need more than that on the occasional day, 
nothing will break, but the names won't be nicely sorted when you view 
them: "10000-..." sorts lexically before "9999-...", for example.


If this were my problem, I'd also use a generator for the web page 
names.  If you write that generator, then you could do something like 
(untested):

for filename, url in zip(filenames(), urlnames()):
    f = open(filename, "w")
    # ... process the url, writing to file f
    f.close()

In this loop, it'll end automatically when you run out of urlnames, 
since zip() stops at the shorter of the two iterables.
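
That urlnames() generator can come from wherever suits your crawler.  
A minimal sketch, assuming the URLs simply live in a Python list (the 
two sites here are just the ones from your example):

def urlnames():
    # hypothetical source of URLs; in practice these might come
    # from a file, a database, or the crawler itself
    sites = [
        "http://www.theguardian.com/international",
        "http://edition.cnn.com/",
    ]
    for url in sites:
        yield url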


I also think you should consider making the date the name of the 
directory you use, rather than putting many days' files in a single 
directory.  This mainly affects the way you combine the parts: you'd 
use os.path.join() rather than "+".
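
A minimal sketch of that layout; the "%Y%m%d" directory name and the 
example filename are my own choices, not anything from your code:

import os
import datetime

day = datetime.datetime.now().strftime("%Y%m%d")
if not os.path.exists(day):
    os.makedirs(day)                      # one directory per day
path = os.path.join(day, "0001.txt")      # e.g. "20150421/0001.txt"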


-- 
DaveA


