[ python-Bugs-1330039 ] tarfile.add() produces hard links instead of normal files

SourceForge.net noreply at sourceforge.net
Wed Oct 19 14:41:43 CEST 2005


Bugs item #1330039, was opened at 2005-10-18 22:27
Message generated for change (Comment added) made by gustaebel
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1330039&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Martin Pitt (mpitt)
Assigned to: Nobody/Anonymous (nobody)
Summary: tarfile.add() produces hard links instead of normal files

Initial Comment:
When opening a tarfile for writing and adding several
files, some files end up being a hardlink to a
previously added tar member instead of being a proper
file member.

I attach a demo that demonstrates the problem. It
basically does:

import tarfile
tar = tarfile.open('tarfile-bug.tar', 'w')
tar.add('tarfile-bug-f1')
tar.add('tarfile-bug-f2')
tar.close()

In the resulting tar, "tarfile-bug-f2" is a hard link
to "tarfile-bug-f1", although both entries should be
proper files.

It works when the tarfile is close()d and opened again
in append mode between the two add()s, but that slows
down the process dramatically and is certainly not the
intended way.

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2005-10-19 14:41

Message:
Logged In: YES 
user_id=642936

I just submitted patch #1331635 which ought to fix your
problem. Thank you for your report.

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2005-10-19 11:31

Message:
Logged In: YES 
user_id=642936

This is a feature ;-)
tarfile.py records the inode and device number (st_ino,
st_dev) for each added file in a list (TarFile.inodes). When
a new file is added and its inode and device number are found
in this list, it will be added as a hardlink member,
otherwise as a regular file.
Because your test script adds and immediately removes each
file, both files are assigned the same inode number. If you
had another process creating a file in the meantime, the
problem would not occur, because it would take over the
inode number before the second file has the chance.

Your problem shows that the way tarfile.py handles hardlinks
is too sloppy. It must take the stat.st_nlink field into
account. I will create a fix for this.
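
The bookkeeping described above can be sketched as a small model. This is
only an illustration, not the tarfile.py source: the function classify(),
the list `seen`, and the flag `check_nlink` are hypothetical names, and the
inode numbers are made up. It shows why a reused inode is mistaken for a
hard link, and how also testing st_nlink avoids that.

```python
# Simplified model of the inode bookkeeping; NOT the actual tarfile.py code.
# `classify`, `seen`, and `check_nlink` are hypothetical names for illustration.

def classify(st_ino, st_dev, st_nlink, seen, dereference=False, check_nlink=False):
    """Return 'hardlink' or 'regular' and record newly seen inodes in `seen`."""
    inode = (st_ino, st_dev)
    known = inode in seen
    if check_nlink:
        # Proposed fix: only a file with more than one link can be a hard link.
        known = known and st_nlink > 1
    if known and not dereference:
        return "hardlink"
    seen.append(inode)
    return "regular"

# Two unrelated files: the second reuses inode 1234 after the first was removed.
seen = []
first = classify(1234, 5, 1, seen)
second = classify(1234, 5, 1, seen)

# Same sequence with the st_nlink check enabled.
seen = []
fixed_first = classify(1234, 5, 1, seen, check_nlink=True)
fixed_second = classify(1234, 5, 1, seen, check_nlink=True)

# A genuine hard link: two names for one inode, so st_nlink is 2.
seen = []
link_a = classify(42, 5, 2, seen, check_nlink=True)
link_b = classify(42, 5, 2, seen, check_nlink=True)

print(first, second)              # regular hardlink
print(fixed_first, fixed_second)  # regular regular
print(link_a, link_b)             # regular hardlink
```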

As a workaround you have several options:
- Do not remove the files after adding them, but after the
TarFile is closed.
- Set TarFile.dereference to True before adding files, so
files with several links would always be added as regular
files (see the documentation). Disadvantage: symbolic links
would be added as regular files as well.
- Tamper with the source code. Edit TarFile.gettarinfo().
Change the line that says "if inode in self.inodes and not
self.dereference:" to "if statres.st_nlink > 1 and inode in
self.inodes and not self.dereference:".
- Empty the TarFile.inodes list after each file. Ugh!
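
A minimal sketch of the first two workarounds, written for a current
Python 3 rather than the 2.4 release discussed here. The file names and the
temporary directory are made up for the example. Either workaround should be
enough on its own; they are combined only to keep the example short. Note
that the quoted condition contains "not self.dereference", so the attribute
has to be true to suppress link members.

```python
import os
import tarfile
import tempfile

# Hypothetical scratch directory and file names, used only for this sketch.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.tar")

tar = tarfile.open(archive, "w")
# Workaround 2: with dereference set, members are never stored as links
# (symbolic links are then stored as the files they point to).
tar.dereference = True

created = []
for name in ("f1", "f2"):
    path = os.path.join(workdir, name)
    with open(path, "w") as fh:
        fh.write(name)
    tar.add(path, arcname=name)
    # Workaround 1: do not remove the file yet, so its inode cannot be reused.
    created.append(path)

tar.close()

# Only now, after the archive is closed, remove the source files.
for path in created:
    os.remove(path)

# Check that both members were stored as regular files.
with tarfile.open(archive) as check:
    member_types = {m.name: m.isreg() for m in check.getmembers()}
print(member_types)
```

Deferring the removal costs some temporary disk space, but it leaves the
archive contents unchanged, whereas setting dereference also changes how
symbolic links are stored.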



----------------------------------------------------------------------
