read and write the same text file

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Mar 9 22:45:19 EST 2013


On Sat, 09 Mar 2013 10:47:34 -0500, Roy Smith wrote:

> In article <c41028f7-e457-4a2b-99f4-c473eaadd128 at googlegroups.com>,
>  iMath <redstone-cold at 163.com> wrote:

>> def text_process(file):
>>     with open(file,'r') as f:
>>         text = f.read()
>>     with open(file,'w') as f:
>>         f.write(replace_pattern.sub('',text))
> 
> At a minimum, you need to close the file after you read it and before
> you re-open it for writing.

The "with" statement automatically does that. When the indented block -- 
in this case, the single line "text = f.read()" -- completes, f.close() 
is called automatically.

Technically, this is a property of file objects, not the with statement. 
The with statement merely calls the magic __exit__ method, which for file 
objects closes the file.
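
You can see this for yourself by checking the file's "closed" attribute
(the file name here is just illustrative):

```python
# A minimal sketch showing that the file is closed the moment the
# "with" block ends, because __exit__ calls f.close() for us.
with open("example.txt", "w") as f:
    f.write("hello\n")
    inside = f.closed   # still False while the block is running
after = f.closed        # True once the block has exited
```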


> There's a variety of ways you could achieve the same effect.  You might
> open the file once, in read-write mode, read the contents, rewind to the
> beginning with seek(), then write the new contents.  You might also
> write the modified data out to a new file, close it, and then rename it.
> But, open, read, close, open, write, close is the most straight-forward.
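
The open-once, seek-and-rewrite technique Roy describes might look like
this (the pattern is hypothetical; note the truncate() call, needed in
case the new text is shorter than the old):

```python
import re

# Hypothetical pattern -- substitute whatever you actually want removed.
replace_pattern = re.compile(r"\bfoo\b")

def text_process(path):
    # Open once in read/write mode, read everything, rewind to the
    # start, overwrite, then truncate any leftover old bytes.
    with open(path, "r+") as f:
        text = f.read()
        f.seek(0)
        f.write(replace_pattern.sub("", text))
        f.truncate()
```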

All of those techniques are acceptable for quick and dirty scripts. But 
for professional applications, you want something which is resistant to 
data loss. The problem happens when you do this:

1. Read file.
2. Modify data.
3. Open file for writing. # This deletes contents of the file!
4. Write data.
5. Flush data to the hard drive.

Notice that there is a brief time frame where the old data is destroyed, 
but the new data hasn't been saved to the hard drive yet. This is where 
you can get data loss if, say, the power goes off.
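
In code, that risky sequence is the straightforward version (the helper
name and transform argument are my own, for illustration):

```python
def rewrite_naive(path, transform):
    # 1. Read file.
    with open(path, "r") as f:
        text = f.read()
    # 2. Modify data.
    new_text = transform(text)
    # 3. Danger zone: open(..., "w") truncates the file the instant it
    #    returns, so the old data is gone before the new data is safe.
    with open(path, "w") as f:
        # 4-5. Write data; it is flushed to disk when the block exits.
        f.write(new_text)
```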

So a professional-quality application should do something like this:

1. Read file.
2. Modify data.
3. Open a new file for writing.
4. Write data to this new file.
5. Flush data to the hard drive.
6. Atomically replace the original file with the new file.

Since sensible operating systems promise that this replacement happens 
as a single atomic step, guaranteed either to fully succeed or not to 
occur at all, this significantly reduces the risk of data corruption or 
data loss.
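
A sketch of that write-then-rename approach (the helper name is my own; 
os.replace needs Python 3.3+, and the temporary file must live in the 
same directory so the final rename stays on one filesystem):

```python
import os
import tempfile

def rewrite_atomic(path, transform):
    # 1-2. Read the file and modify the data.
    with open(path, "r") as f:
        text = f.read()
    new_text = transform(text)
    # 3-4. Write the new data to a temporary file in the same directory.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
            # 5. Force the data out of OS buffers onto the disk.
            tmp.flush()
            os.fsync(tmp.fileno())
        # 6. Atomically replace the original with the new file.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)   # clean up the temp file on failure
        raise
```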

Even if you're using an OS that doesn't guarantee atomic replacement 
*cough* Windows *cough*, it still decreases the risk of data loss, just 
not as much as we would like.

But having said all that, for "normal" use, a simple read-write cycle is 
still pretty safe, and it's *much* simpler to get right.



-- 
Steven
