creating size-limited tar files
andrea crotti
andrea.crotti.0 at gmail.com
Tue Nov 13 05:31:32 EST 2012
2012/11/9 andrea crotti <andrea.crotti.0 at gmail.com>:
> Anyway in the meanwhile I implemented this tar and split in this way below.
> It works very well and it's probably much faster, but the downside is that
> I give away control to tar and split..
>
> def tar_and_split(inputfile, output, bytes_size=None):
>     """Take the file containing all the files to compress, the bytes
>     desired for the split and the base name of the output file
>     """
>     # cleanup first
>     for fname in glob(output + "*"):
>         logger.debug("Removing old file %s" % fname)
>         remove(fname)
>
>     out = '-' if bytes_size else (output + '.tar.gz')
>     cmd = "tar czpf {} $(cat {})".format(out, inputfile)
>     if bytes_size:
>         cmd += " | split -b {} -d - {}".format(bytes_size, output)
>
>     logger.info("Running command %s" % cmd)
>
>     proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
>                             stderr=subprocess.PIPE)
>     out, err = proc.communicate()
>     if err:
>         logger.error("Got error messages %s" % err)
>
>     logger.info("Output %s" % out)
>
>     if proc.returncode != 0:
>         logger.error("Something failed running %s, need to re-run" % cmd)
>         return False
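(A sketch of a verification step, not from the post: once the chunks exist, you can check that they reassemble into a valid archive by concatenating them into `tar tzf -`. The helper name `verify_chunks` is hypothetical; it assumes GNU tar on the path and an archive small enough that the file listing fits in the pipe buffer without deadlocking.)

```python
import glob
import subprocess

def verify_chunks(output):
    """Concatenate the split chunks (sorted by suffix) into 'tar tzf -'
    and return (ok, member_names). 'output' is the same base name that
    was passed to tar_and_split. Hypothetical helper, for small archives."""
    chunks = sorted(glob.glob(output + "*"))
    proc = subprocess.Popen(["tar", "tzf", "-"],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    for chunk in chunks:
        with open(chunk, "rb") as f:
            proc.stdin.write(f.read())
    proc.stdin.close()
    names = proc.stdout.read().decode().splitlines()
    proc.wait()
    return proc.returncode == 0, names
```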
There is another problem with this solution, if I run something like
this with Popen:
cmd = ("tar {bigc} -czpf - --files-from {inputfile} | "
       "split -b {bytes_size} -d - {output}")
proc = subprocess.Popen(to_run, shell=True,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
the proc.returncode will only be the one from "split", so I lose the
ability to check if tar failed..
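(Not from the thread, but one way around this: run the pipeline under bash with `-o pipefail`, which makes the pipeline's exit status non-zero if any stage fails, so a tar failure is no longer masked by a successful split. The helper name `run_pipeline` is mine; it assumes bash is available.)

```python
import subprocess

def run_pipeline(cmd):
    """Run a shell pipeline under bash with pipefail, so a failure in
    any stage (e.g. tar) makes the overall return code non-zero."""
    proc = subprocess.Popen(["bash", "-o", "pipefail", "-c", cmd],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err

# With pipefail, the failing first stage is no longer hidden:
rc, out, err = run_pipeline("false | cat")
assert rc != 0
```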
A solution would be something like this:
{ ls -dlkfjdsl; echo $? > tar.status; } | split
but it's a bit ugly. I wonder if I can use subprocess PIPEs to do
the same thing; would it be as fast and work the same way?
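(Yes, subprocess pipes can do this: the sketch below, my own, wires tar's stdout directly into split's stdin with two Popen objects, so each process has its own returncode. The data still flows through an OS pipe, not through Python, so it should be about as fast as the shell pipeline. It assumes GNU tar and GNU split, since `-d` is a GNU split option; `tar_then_split` is a hypothetical name.)

```python
import subprocess

def tar_then_split(inputfile, output, bytes_size):
    """Pipe tar into split with two Popen objects so both return
    codes can be checked independently (no shell involved)."""
    tar = subprocess.Popen(
        ["tar", "czpf", "-", "--files-from", inputfile],
        stdout=subprocess.PIPE)
    split = subprocess.Popen(
        ["split", "-b", str(bytes_size), "-d", "-", output],
        stdin=tar.stdout)
    # Close our copy of the read end so tar gets SIGPIPE if split dies.
    tar.stdout.close()
    split.communicate()
    tar.wait()
    return tar.returncode, split.returncode
```

Checking `tar.returncode` separately is exactly what the shell pipeline's single exit status cannot give you.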