creating size-limited tar files

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Nov 7 18:15:14 EST 2012


On 7 November 2012 21:52, Andrea Crotti <andrea.crotti.0 at gmail.com> wrote:
> On 11/07/2012 08:32 PM, Roy Smith wrote:
>>
>> In article <509ab0fa$0$6636$9b4e6d93 at newsspool2.arcor-online.net>,
>>   Alexander Blinne <news at blinne.net> wrote:
>>
>>> I don't know the best way to find the current size, I only have a
>>> general remark.
>>> This solution is not so good if you have to impose a hard limit on the
>>> resulting file size. You could end up having a tar file of size "limit +
>>> size of biggest file - 1 + overhead" in the worst case if the tar is at
>>> limit - 1 and the next file is the biggest file. Of course that may be
>>> acceptable in many cases or it may be acceptable to do something about
>>> it by adjusting the limit.
>
> But the other problem is that at the moment the people who get our chunks
> reassemble the file with a simple:
>
> cat file1.tar.gz file2.tar.gz > file.tar.gz
>
> which I suppose is not going to work if I create 2 different tar files,
> since it would recreate the header in all of them, right?

Correct. But if you read the rest of Alexander's post you'll find a
suggestion that would work in this case and that guarantees files of
the desired size.

You just need to define your own class that implements a write()
method which distributes any data it receives across separate files.
You can then pass an instance of it as the fileobj argument to the
tarfile.open function:
http://docs.python.org/2/library/tarfile.html#tarfile.open
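
Something along these lines should do it (untested sketch; the class
name, the piece-naming scheme and the chunk size are just for
illustration, not anything from the thread):

import tarfile

class SplitFileWriter(object):
    """File-like object that spreads a byte stream over a series of
    fixed-size chunk files: piece.001, piece.002, ...

    Hypothetical helper for illustration only; tarfile only needs the
    write() method when opened in stream mode.
    """

    def __init__(self, basename, chunk_size):
        self.basename = basename
        self.chunk_size = chunk_size
        self.index = 0
        self.current = None
        self.written = 0
        self._open_next()

    def _open_next(self):
        # Close the current piece (if any) and start the next one.
        if self.current is not None:
            self.current.close()
        self.index += 1
        self.current = open('%s.%03d' % (self.basename, self.index), 'wb')
        self.written = 0

    def write(self, data):
        # Split incoming data across piece boundaries so that no piece
        # ever exceeds chunk_size bytes.
        while data:
            room = self.chunk_size - self.written
            self.current.write(data[:room])
            self.written += min(len(data), room)
            if self.written >= self.chunk_size:
                self._open_next()
            data = data[room:]

    def close(self):
        self.current.close()

# Stream a gzipped tar into 10 MB pieces.
splitter = SplitFileWriter('piece', 10 * 1024 * 1024)
tar = tarfile.open(fileobj=splitter, mode='w|gz')
tar.add('some_directory')
tar.close()
splitter.close()

Because this splits one continuous gzip stream rather than creating
separate archives, the pieces really can be reassembled with

cat piece.001 piece.002 ... > whole.tar.gz

which is exactly the behaviour your users rely on, and every piece
except the last is exactly the requested size.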


Oscar


