Secure delete with Python

Duncan Booth duncan.booth at invalid.invalid
Tue Sep 7 04:30:14 EDT 2004


Ville Vainio <ville at spammers.com> wrote in 
news:du74qmb9mzs.fsf at amadeus.cc.tut.fi:

> Seriously? What OSen are known for doing this? I'd have thought that if
> the file size is unchanged, the data is always written over the old
> data...

I don't know for certain, but I think it is a pretty safe bet that NTFS 
allocates new disc blocks when a file is overwritten, rather than updating 
the existing ones in place.
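
For reference, the kind of in-place overwrite being discussed would look 
something like this in Python (a minimal sketch; whether the zeros 
actually land on the original disc blocks is entirely up to the 
filesystem):

import os

def overwrite_in_place(path):
    # Overwrite the file byte-for-byte without changing its size.
    # On a filesystem that reallocates blocks on write (as NTFS
    # appears to), the old data may well survive elsewhere on disc.
    length = os.path.getsize(path)
    with open(path, 'r+b') as f:
        f.write(b'\0' * length)
        f.flush()
        os.fsync(f.fileno())   # push the write past the OS cache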

NTFS is a transaction-based file system, i.e. it guarantees that any 
particular disc operation either completes or doesn't: you can never get 
file-system corruption from a power loss part way through updating a 
file. Transactions are written to two transaction logs (in case one is 
corrupted on failure), and every few seconds the outstanding transactions 
are committed. Once a transaction is committed, the log holds enough 
information to complete it even if power is lost; likewise, any 
transaction that has not yet been committed carries enough information to 
be rolled back.
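
The general write-ahead pattern is easy to sketch in Python. This is 
purely illustrative: the log file name and record format below are my own 
invention, nothing like NTFS's real on-disc format (its log lives in the 
hidden $LogFile metafile):

import json
import os

LOGFILE = 'txn.log'   # hypothetical log location

def logged_write(path, offset, data):
    # 1. Record the intended change (data is bytes) and force it
    #    to disc *before* touching the target file.
    with open(LOGFILE, 'w') as log:
        json.dump({'path': path, 'offset': offset,
                   'data': data.hex(), 'state': 'pending'}, log)
        log.flush()
        os.fsync(log.fileno())
    # 2. Apply the change to the real file.
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # 3. Only now retire the log entry.  Crash before this point
    #    and recovery can replay the logged change; crash before
    #    step 1 finished and there is nothing to roll forward.
    os.remove(LOGFILE)

A real log is of course append-only and batched; the point is only the 
ordering: log first, data second, commit last.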

There isn't very much published information on NTFS internals (any useful 
references gratefully received), but as far as I can see, writing updates 
to a fresh disc block is the only realistic way to implement this 
(otherwise you would need to write the data three times: once to each 
transaction log and then again to the actual file). If the data is 
written to a fresh block, the transaction log only needs to record the 
location of the new data (so it can be wiped if the transaction is rolled 
back), and the file's pointers are updated when the transaction commits.
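
The same "write elsewhere, then flip a pointer" idea shows up at user 
level as the standard trick for atomic file updates: write a complete new 
file and rename it over the old one. A sketch (os.replace is the atomic 
rename in modern Python):

import os
import tempfile

def atomic_rewrite(path, new_bytes):
    # Write the new contents to a fresh file in the same directory,
    # then atomically swap the directory entry over.  The original
    # blocks are never overwritten -- just unlinked.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(new_bytes)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)   # "commit": flip the pointer
    except BaseException:
        os.remove(tmp)          # "rollback": original untouched
        raise

Note that this, like the NTFS behaviour described above, leaves the old 
blocks intact on disc -- which is exactly the problem for secure deletion.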

The other reason I'm sure that overwriting an existing file must allocate 
new disc blocks is that NTFS supports per-file compression: start with a 
compressed file containing essentially random data, overwrite it with 
highly repetitive data (e.g. nulls), and it will occupy less disc space 
afterwards. Since the on-disc allocation changes size, the original 
blocks cannot simply have been rewritten in place.
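
On Windows this is easy to verify: GetCompressedFileSizeW reports the 
on-disc (allocated, post-compression) size rather than the logical 
length. A quick Windows-only sketch using ctypes (error handling 
omitted; only meaningful on an NTFS volume with compression enabled):

import ctypes
import os

def on_disc_size(path):
    # On-disc size of a (possibly compressed) file; compare with
    # the logical size from os.path.getsize(path).
    high = ctypes.c_ulong(0)
    low = ctypes.windll.kernel32.GetCompressedFileSizeW(
        ctypes.c_wchar_p(path), ctypes.byref(high))
    return (high.value << 32) | low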


