Script to Remove Attachments in Exchange Mailbox

Tim Golden mail at timgolden.me.uk
Fri Oct 12 06:38:55 EDT 2007


Tim Golden wrote:
> [KDawg44]
> 
>>>> I am frustrated with my users who send large files around the office
>>>> instead of using the network shares.
> 
> [Tim Golden]
>>> I have something v. similar, ...
>>> It's a while since they were last run so they're probably quite
>>> dusty but it sounds like the kind of thing you're after.
> 
> Well, here's[1] the first part -- running through folders and replacing
> big attachments with links. 

... and here's [1] the second part -- running through folders on disk
and replacing duplicate files with a link to the first one found. For
obvious reasons, be careful with this: it deletes files and replaces
them with hardlinks, but there's no rollback. Nor is it so well
commented as the other, I'm afraid, but I think it is still fairly
clear what's going on.

You call it with a space-separated list of directories to search
(usually just one, in my own experience) and it walks down those
directory trees looking for files whose type is listed in
INCLUDE_EXTENSIONS, skipping any files named in IGNORE_FILES. Its
comparison is very simple: it compares md5 hashes with no
optimisations; obviously a lot of work could be done there if
performance were an issue.
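
For readers who don't want to open the script, the walk-and-hash step
described above might look roughly like the sketch below. It is not
the code from the linked script: the values given to
INCLUDE_EXTENSIONS and IGNORE_FILES, the function names
candidate_files and md5_of, and the case-insensitive matching are all
assumptions made for illustration.

```python
import hashlib
import os

# Hypothetical settings: the real values live in the linked script.
INCLUDE_EXTENSIONS = {".doc", ".xls", ".pdf"}
IGNORE_FILES = {"template.doc"}


def candidate_files(roots):
    """Yield the path of every file under roots that has a wanted extension."""
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            for filename in filenames:
                # Assumption: names are matched case-insensitively.
                if filename.lower() in IGNORE_FILES:
                    continue
                extension = os.path.splitext(filename)[1].lower()
                if extension in INCLUDE_EXTENSIONS:
                    yield os.path.join(dirpath, filename)


def md5_of(path, chunk_size=1024 * 1024):
    """Return the md5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files would then be treated as duplicates when md5_of returns the
same digest for both. Hashing only files that already match on size is
one of the obvious optimisations.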

It ends up with nested dictionaries, the outer one keyed by filename,
the inner one by size and containing a list of directories where
files of that name/size are found. The first in the list is considered
the master and all the others are deleted and replaced by NTFS
hardlinks. Probably safer to hardlink first to an alias, *then*
delete and then rename the alias, but I leave that up to you if
you want to use the script at all.
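
As a rough illustration of the structure and of the safer replacement
order suggested above, here is a minimal sketch. Again, it is not
taken from the linked script: group_by_name_and_size,
replace_with_hardlink and the ".hardlink-tmp" alias suffix are
hypothetical names, and the md5 comparison is left out.

```python
import os
from collections import defaultdict


def group_by_name_and_size(paths):
    """Build {filename: {size: [directory, ...]}} from a list of file paths."""
    groups = defaultdict(lambda: defaultdict(list))
    for path in paths:
        dirpath, filename = os.path.split(path)
        groups[filename][os.path.getsize(path)].append(dirpath)
    return groups


def replace_with_hardlink(master, duplicate):
    """Replace duplicate with a hardlink to master, going via an alias."""
    if os.path.samefile(master, duplicate):
        return  # already the same file on disk; nothing to do
    alias = duplicate + ".hardlink-tmp"  # hypothetical alias name
    os.link(master, alias)  # 1. create the hardlink under the alias
    # 2. swap the alias into place; os.replace overwrites the duplicate,
    #    which combines the "delete" and "rename" steps into one call.
    os.replace(alias, duplicate)
```

If os.link fails, the duplicate has not been touched, so the worst
case is a stray alias file rather than a lost document. Both paths
must be on the same volume for a hardlink to be possible at all.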

TJG

[1] http://timgolden.me.uk/python/downloads/find_duplicate_files.py


