Oh no, my code is being published ... help!

Thu Nov 29 14:59:58 EST 2007

There is a Linux forum that I frequent from time to time on which I
mentioned a couple of scripts that I wrote.  The editors of a small
Linux magazine heard and found them interesting enough to ask me to
write an article about them.  I accepted gladly, of course.  I wrote
the article and submitted it, and I was told to look for it in the
January issue.  Sounds good, right?

The thing is I am starting to get a little nervous about it.  You see,
programming is not my full time job.  I dabble in it from time to
time, mostly to scratch my own itches, as they say.  But, I know that
my code is probably far from being of professional quality.  So, I was
wondering if some of you would be interested in taking a peek and
offering some suggestions for improving the quality and safety of the
code.  Let me try to explain what they do.

Let's say, for example, that you have, as I do, a large directory tree
that you want to compress containing data that you hardly ever use,
but that you want to have easy access to from time to time.  In my
case, that directory tree contains the RAW image files that come from
my DSLR camera.  Each of those files is about 10 MB.  The total size
of that directory tree is about 45 GB, and it is constantly growing.
(Note: I store my finished, "processed", images in a different
directory tree.  They are stored as JPEG files, so they are already
compressed.)  How would you go about using compression to reclaim some
disk space in a situation like this one?

Well, one way I came up with was to write my own tool to do this job.
I created a program called 7sqz (7Squeeze) that can take care of this
task with ease.  It is a Python script that navigates through a
directory tree compressing its contents only, not the actual
directories.  As it enters each directory in the tree it saves all the
files in that directory to an archive in that same directory, giving
the archive the name of the directory itself.  If it finds that the
directory already has an archive file with the correct name it leaves
it alone and goes to the next directory, unless it also finds an
uncompressed file there.  When that happens it simply moves that file
into the existing archive, updating the copy inside the archive if it
was already there.
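
The walk described above could be sketched roughly as follows.  This is
not the actual 7sqz source, only a minimal illustration: the function
names plan_squeeze and build_command are made up for this example, and
it assumes each p7zip command would be run with the directory itself
as the working directory.  Removing the loose files after a successful
run is left out on purpose.

```python
import os


def plan_squeeze(root, ext="7z"):
    """Yield (directory, archive_name, loose_files) for each directory
    under root that still contains uncompressed files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # The archive is named after the directory that holds it.
        base = os.path.basename(os.path.abspath(dirpath))
        archive_name = base + "." + ext
        loose = sorted(f for f in filenames if f != archive_name)
        if not loose:
            continue  # empty, or already squeezed: leave it alone
        yield dirpath, archive_name, loose


def build_command(archive_name, files, fmt="7z"):
    """Build a p7zip 'add' command; 'a' also replaces entries that are
    already in the archive under the same name."""
    return ["7z", "a", "-t" + fmt, archive_name] + list(files)
```

Each planned entry could then be handed to something like
subprocess.call(build_command(name, files), cwd=directory).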

I also created 7usqz, which is the counterpart of 7sqz.  It will
simply go through a specified directory tree looking for archive
files named after the directory that holds them and will uncompress
them, essentially leaving the directory as it was before being squeezed.
Both 7sqz and 7usqz use p7zip for the actual compression, so you need
to have p7zip already installed.
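
The search that 7usqz performs could look something like the sketch
below.  Again, this is only an illustration and not the real script:
find_squeezed, extract_command and ARCHIVE_EXTS are hypothetical names,
and the extract command is assumed to run with the holding directory as
the working directory.  Deleting the archive afterwards is not shown.

```python
import os

# Assumption: only these two archive formats are looked for.
ARCHIVE_EXTS = ("7z", "zip")


def find_squeezed(root):
    """Yield (directory, archive_name) for every archive that carries
    the name of the directory holding it."""
    for dirpath, _dirnames, filenames in os.walk(root):
        base = os.path.basename(os.path.abspath(dirpath))
        for ext in ARCHIVE_EXTS:
            name = base + "." + ext
            if name in filenames:
                yield dirpath, name


def extract_command(archive_name):
    """Build a p7zip 'extract with full paths' command."""
    return ["7z", "x", archive_name]
```

Archives with any other name are ignored, so a stray backup.7z sitting
in the same directory would be left untouched.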

You can obtain 7sqz from here:
http://rmcorrespond.googlepages.com/7sqz

And you can get 7usqz from here:
http://rmcorrespond.googlepages.com/7usqz

After downloading them, save them in a place like /usr/bin and make
sure they are executable.

To use 7sqz you could just give it a target directory as a parameter,
like this:

7sqz /home/some_directory

By default it will use the 7z format (which gives better compression
than zip), but you can use the zip format if you prefer by using the
-m option like this:

7sqz -m zip /home/some_directory

By default it will use Normal as the level of compression, but you can
use Extra or Max if you prefer by using the -l option like this:

7sqz -l Extra /home/some_directory

By default it will just skip any file if it finds an error during
compression and will log the error, but you can tell it to "Halt on
Error" with the -e option like this:

7sqz -e /home/some_directory

And of course, you can combine options as you please like this:

7sqz -m zip -l Max -e /home/some_directory
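
For anyone curious how options like these can be handled in Python,
here is one possible sketch using the standard argparse module.  It is
not taken from 7sqz itself.  In particular, the LEVELS table that maps
Normal, Extra and Max to p7zip -mx switches is purely an assumption
made for this example.

```python
import argparse

# Assumption: how the three level names might map to p7zip switches.
LEVELS = {"normal": "-mx=5", "extra": "-mx=7", "max": "-mx=9"}


def parse_args(argv):
    """Parse 7sqz-style options: -m format, -l level, -e halt on error."""
    parser = argparse.ArgumentParser(prog="7sqz")
    parser.add_argument("-m", dest="method", choices=["7z", "zip"],
                        default="7z", help="archive format")
    # str.lower lets the user type Max, MAX or max.
    parser.add_argument("-l", dest="level", type=str.lower,
                        choices=sorted(LEVELS), default="normal",
                        help="compression level")
    parser.add_argument("-e", dest="halt_on_error", action="store_true",
                        help="halt on the first error")
    parser.add_argument("target", help="directory tree to squeeze")
    return parser.parse_args(argv)
```

Calling parse_args(["-m", "zip", "-l", "Max", "-e", "/home/some_directory"])
gives an object whose method is "zip", whose level is "max" and whose
halt_on_error is True.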

As I said, 7usqz is the counterpart of 7sqz.  To use it you
could just give it a target directory as a parameter, like this:

7usqz /home/some_directory

By default it will just skip any file if it finds an error during
decompression and will log the error, but you can tell it to "Halt on
Error" with the -e option like this:

7usqz -e /home/some_directory
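
The "skip and log" versus "Halt on Error" behaviour that both tools
share boils down to a small loop.  A minimal sketch, assuming a
hypothetical process_all helper and an arbitrary action callable
standing in for the real compression or decompression step:

```python
import logging


def process_all(items, action, halt_on_error=False,
                log=logging.getLogger("7sqz")):
    """Run action on every item.  On failure either stop at once
    (halt_on_error) or log the problem and carry on.  Returns the
    list of items that were skipped."""
    skipped = []
    for item in items:
        try:
            action(item)
        except Exception as exc:
            if halt_on_error:
                raise  # corresponds to the -e option
            log.error("skipping %s: %s", item, exc)
            skipped.append(item)
    return skipped
```

Returning the skipped items makes it easy to print a summary at the
end of a long run instead of having to dig through the log.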

Please do a few, or better yet a lot of tests, before using it on a
directory that you cannot afford to lose.  I believe it has all the
necessary safety precautions to protect your data, but I can't
guarantee it.  That is why I'm asking for your help.  All I can say is
that I have never lost any data with it and that it works great for
me.  What do you think?