[Tutor] A further question about opening and closing files

Steven D'Aprano steve at pearwood.info
Wed Sep 9 20:46:53 CEST 2015


On Wed, Sep 09, 2015 at 10:24:57AM -0400, richard kappler wrote:
> Under a different subject line (More Pythonic?) Steven D'Aprano commented:
> 
> > And this will repeatedly open the file, append one line, then close it
> > again. Almost certainly not what you want -- it's wasteful and
> > potentially expensive.
> 
> And I get that. It does bring up another question though. When using
> 
> with open(somefile, 'r') as f:
>     with open(filename, 'a') as f1:
>         for line in f:
>             f1.write(line)
> 
> the file being appended is opened and stays open while the loop iterates,
> then the file closes when exiting the loop, yes? 

The file closes when exiting the *with block*, not necessarily the loop. 
Consider:

import time

with open(somefile) as f:
    for line in f:
        pass
    time.sleep(120)
# file isn't closed until we get here

Even if the file is empty, and there are no lines, it will be held open 
for two minutes.
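
You can check this for yourself with the file object's .closed
attribute (opening in "a" mode here so the file is created if it
doesn't already exist):

with open("output/test.log", "a") as f:
    print(f.closed)   # False: we're still inside the with block
print(f.closed)       # True: the block has exited, the file is closed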


> Does this not have the
> potential to be expensive as well if you are writing a lot of data to the
> file?

Er, expensive in what way?

Yes, I suppose it is more expensive to write 1 gigabyte of data to a 
file than to write 1 byte. What's your point? If you want to write 1 GB, 
then you have to write 1 GB, and it will take as long as it takes.

Look at it this way: suppose you have to hammer 1000 nails into a fence. 
You can grab your hammer out of your tool box, hammer one nail, put the 
hammer back in the tool box and close the lid, open the lid, take the 
hammer out again, hammer one nail, put the hammer back in the tool box, 
close the lid, open the lid again, take out the hammer...

Or you take the hammer out, hammer 1000 nails, then put the hammer away. 
Sure, while you are hammering those 1000 nails, you're not mowing the 
lawn, painting the porch, walking the dog or any of the dozen other jobs 
you have to do, but you have to hammer those nails eventually.
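
In Python terms, here is a minimal sketch of the two approaches (the
file name and data are made up for illustration):

lines = ["alpha\n", "beta\n", "gamma\n"]   # some data to append

# Wasteful: open and close the file once per line.
for line in lines:
    with open("output/test.log", "a") as f1:
        f1.write(line)

# Better: open once, write everything, close once.
with open("output/test.log", "a") as f1:
    for line in lines:
        f1.write(line)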

> I did a little experiment:
> 
> >>> f1 = open("output/test.log", 'a')
> >>> f1.write("this is a test")
> >>> f1.write("this is a test")
> >>> f1.write('why isn\'t this writing????')
> >>> f1.close()
> 
> monitoring test.log as I went. Nothing was written to the file until I
> closed it, or at least that's the way it appeared to the text editor in
> which I had test.log open (gedit). In gedit, when a file changes it tells
> you and gives you the option to reload the file. This didn't happen until I
> closed the file. So I'm presuming all the writes sat in a buffer in memory
> until the file was closed, at which time they were written to the file.

Correct. Python's file objects buffer writes in memory, and all 
modern operating systems add a layer of caching of their own on top. 
Writing to disk is slow, *hundreds of thousands of times slower* than 
writing to memory, so a reasonable amount of data gets queued up 
before anything is actually forced out to the disk drive.
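
You can watch Python's layer of that buffering directly. A small
sketch, reusing the hypothetical path from your experiment;
os.path.getsize reports what the OS sees on disk:

import os

f = open("output/test.log", "a")
before = os.path.getsize("output/test.log")
f.write("this is a test\n")
# The bytes are most likely still sitting in Python's buffer:
print(os.path.getsize("output/test.log") == before)  # usually True
f.close()  # close() flushes the buffer
print(os.path.getsize("output/test.log") == before)  # now False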

 
> Is that actually how it happens, and if so does that not also have the
> potential to cause problems if memory is a concern?

No. The operating system is not stupid enough to queue up gigabytes 
of data. Typically the buffer is something like 128 KB of data (I 
think), or maybe a MB or so. Writing a couple of short lines of text 
won't fill it, which is why you don't see any change until you 
actually close the file. Try writing a million lines, and you'll see 
something different. The buffer is flushed when it is full, or when 
you close the file, whichever happens first.
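
If you do want individual writes to land promptly, Python exposes the
knob directly: the buffering argument to open(). As a sketch,
buffering=1 requests line buffering in text mode, so each completed
line is pushed out as it is written:

# line-buffered: flushed at each newline (text mode only)
with open("output/test.log", "a", buffering=1) as f:
    f.write("this line reaches the OS as soon as it's written\n")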

If you know that you're going to take a long time to fill the buffer, 
say you're performing a really slow calculation, and your data is 
trickling in really slowly, then you might do a file.flush() every few 
seconds or so. Or if you're writing an ACID database. But for normal 
use, don't try to out-smart the OS, because you will fail. This is 
really specialised know-how.
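
As a sketch of that slow-trickle case (slow_calculation here is a
made-up stand-in for whatever your real work is):

import time

def slow_calculation():
    # stand-in for a genuinely slow computation
    for i in range(5):
        time.sleep(2)
        yield i * i

with open("output/results.log", "a") as f:
    for value in slow_calculation():
        f.write("%s\n" % value)
        f.flush()  # push each result through to the OS as it arrives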

Have you noticed how slow gedit is to save files? That's because the 
gedit programmers thought they were smarter than the OS, so every time 
they write a file, they call flush() and sync(). Possibly multiple 
times. All that happens is that they slow the writing down greatly. 
Other text editors let the OS manage this process, and saving is 
effectively instantaneous. With gedit, there's a visible pause when it 
saves. (At least in all the versions of gedit I've used.)

And the data is no safer than with other text editors, because even 
when the OS has written to the hard drive, there is no guarantee that 
the data has hit the platter yet. Hard drives themselves contain 
buffers, and they won't actually write data to the platter until they 
are good and ready.
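
For completeness, the gedit-style belt-and-braces approach looks
something like this in Python; note the last comment, because even
this doesn't defeat the drive's own cache:

import os

with open("output/test.log", "a") as f:
    f.write("precious data\n")
    f.flush()              # empty Python's buffer into the OS
    os.fsync(f.fileno())   # ask the OS to push its buffers to the drive
# the drive's onboard cache may still delay the actual platter write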

-- 
Steve

