creating size-limited tar files

andrea crotti andrea.crotti.0 at gmail.com
Tue Nov 13 05:31:32 EST 2012


2012/11/9 andrea crotti <andrea.crotti.0 at gmail.com>:
> Anyway, in the meantime I implemented the tar-and-split as shown below.
> It works very well and it's probably much faster, but the downside is that
> I give away control to tar and split...
>
> from glob import glob
> from os import remove
> import logging
> import subprocess
>
> logger = logging.getLogger(__name__)
>
> def tar_and_split(inputfile, output, bytes_size=None):
>     """Take a file listing the files to compress, the desired size in
>     bytes of each split, and the base name of the output files.
>     """
>     # cleanup first
>     for fname in glob(output + "*"):
>         logger.debug("Removing old file %s" % fname)
>         remove(fname)
>
>     # stream to stdout when splitting, otherwise write a single .tar.gz
>     out = '-' if bytes_size else (output + '.tar.gz')
>     cmd = "tar czpf {} $(cat {})".format(out, inputfile)
>     if bytes_size:
>         cmd += " | split -b {} -d - {}".format(bytes_size, output)
>
>     logger.info("Running command %s" % cmd)
>
>     # shell=True is needed for the $(cat ...) substitution and the pipe
>     proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
>                             stderr=subprocess.PIPE)
>     out, err = proc.communicate()
>     if err:
>         logger.error("Got error messages %s" % err)
>
>     logger.info("Output %s" % out)
>
>     if proc.returncode != 0:
>         logger.error("Something failed running %s, need to re-run" % cmd)
>         return False
>
>     return True


There is another problem with this solution: if I run something like
this with Popen:

    cmd = ("tar {bigc} -czpf - --files-from {inputfile}"
           " | split -b {bytes_size} -d - {output}")

    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)

then proc.returncode will only be the one from "split", so I lose the
ability to check whether tar failed...
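
One shell-level way around that, assuming bash is available, would be to
run the same pipeline with pipefail set, so the exit status reflects the
first failing stage rather than just split's:

    # sketch: with pipefail, the pipeline fails if any stage fails
    proc = subprocess.Popen(["bash", "-o", "pipefail", "-c", cmd],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)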

Another solution would be something like this:
{ ls -dlkfjdsl; echo $? > tar.status; } | split

but it's a bit ugly.  I wonder if I can use subprocess PIPEs to do
the same thing; would that be as fast and work the same way?
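
Something like this, based on the "replacing shell pipeline" example in
the subprocess docs, would avoid the shell entirely (a rough sketch, the
function name is just for illustration):

    import subprocess

    def tar_and_split_pipe(inputfile, output, bytes_size):
        # tar writes the archive to stdout, split reads it from stdin
        tar = subprocess.Popen(
            ["tar", "czpf", "-", "--files-from", inputfile],
            stdout=subprocess.PIPE)
        split = subprocess.Popen(
            ["split", "-b", str(bytes_size), "-d", "-", output],
            stdin=tar.stdout)
        tar.stdout.close()  # let tar get SIGPIPE if split exits early
        split.wait()
        tar.wait()
        # unlike with shell=True, both exit statuses are visible here
        return tar.returncode == 0 and split.returncode == 0

The data still flows through an OS pipe straight from tar to split, so it
should be about as fast as the shell pipeline, and both return codes stay
checkable.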


