[Python-checkins] r88529 - in python/branches/release32-maint: Lib/tarfile.py Lib/test/test_tarfile.py Misc/NEWS
lars.gustaebel
python-checkins at python.org
Wed Feb 23 12:52:38 CET 2011
Author: lars.gustaebel
Date: Wed Feb 23 12:52:31 2011
New Revision: 88529
Log:
Merged revisions 88528 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r88528 | lars.gustaebel | 2011-02-23 12:42:22 +0100 (Wed, 23 Feb 2011) | 16 lines
Issue #11224: Improved sparse file read support (r85916) introduced a
regression in _FileInFile which is used in file-like objects returned
by TarFile.extractfile(). The inefficient design of the
_FileInFile.read() method causes various dramatic side-effects and
errors:
  - The data segment of a file member is read completely into memory
    every(!) time a small block is accessed. This is not only slow
    but may cause unexpected MemoryErrors with very large files.
  - Reading members from compressed tar archives is even slower
    because of the excessive backwards seeking which is done when the
    same data segment is read over and over again.
  - As a backwards seek on a TarFile opened in stream mode is not
    possible, using extractfile() fails with a StreamError.
........
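
The difference between the two read strategies described in the log can be sketched outside of tarfile. This is only an illustration under stated assumptions: CountingFile, read_old and read_new are hypothetical helpers, not part of tarfile, and the arguments stand in for the offset/start/stop/position values that _FileInFile tracks internally. The segment starts at 0 here, as it would for a non-sparse member.

```python
import io


class CountingFile(io.BytesIO):
    """Hypothetical helper: a BytesIO that counts the bytes it hands out."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def read_old(fileobj, offset, start, stop, position, length):
    # Strategy of the removed lines: load the whole data segment,
    # then slice out the requested block.
    fileobj.seek(offset)
    block = fileobj.read(stop - start)
    return block[position - start:position + length]


def read_new(fileobj, offset, start, position, length):
    # Strategy of the added lines: seek straight to the block
    # and read only what was asked for.
    fileobj.seek(offset + (position - start))
    return fileobj.read(length)


payload = bytes(range(256)) * 4096  # one 1 MiB data segment
old_f = CountingFile(payload)
new_f = CountingFile(payload)

a = read_old(old_f, 0, 0, len(payload), 1000, 512)
b = read_new(new_f, 0, 0, 1000, 512)

print(a == b)                              # same 512-byte block either way
print(old_f.bytes_read, new_f.bytes_read)  # whole segment vs. one block
```

Both calls return the same block, but the old strategy pulls the entire segment through the underlying file object for every small read, which is where the memory use and the repeated backwards seeks come from.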
Modified:
python/branches/release32-maint/ (props changed)
python/branches/release32-maint/Lib/tarfile.py
python/branches/release32-maint/Lib/test/test_tarfile.py
python/branches/release32-maint/Misc/NEWS
Modified: python/branches/release32-maint/Lib/tarfile.py
==============================================================================
--- python/branches/release32-maint/Lib/tarfile.py (original)
+++ python/branches/release32-maint/Lib/tarfile.py Wed Feb 23 12:52:31 2011
@@ -760,9 +760,8 @@
                         self.map_index = 0
             length = min(size, stop - self.position)
             if data:
-                self.fileobj.seek(offset)
-                block = self.fileobj.read(stop - start)
-                buf += block[self.position - start:self.position + length]
+                self.fileobj.seek(offset + (self.position - start))
+                buf += self.fileobj.read(length)
             else:
                 buf += NUL * length
             size -= length
Modified: python/branches/release32-maint/Lib/test/test_tarfile.py
==============================================================================
--- python/branches/release32-maint/Lib/test/test_tarfile.py (original)
+++ python/branches/release32-maint/Lib/test/test_tarfile.py Wed Feb 23 12:52:31 2011
@@ -419,6 +419,22 @@
     mode="r|"
+    def test_read_through(self):
+        # Issue #11224: A poorly designed _FileInFile.read() method
+        # caused seeking errors with stream tar files.
+        for tarinfo in self.tar:
+            if not tarinfo.isreg():
+                continue
+            fobj = self.tar.extractfile(tarinfo)
+            while True:
+                try:
+                    buf = fobj.read(512)
+                except tarfile.StreamError:
+                    self.fail("simple read-through using TarFile.extractfile() failed")
+                if not buf:
+                    break
+            fobj.close()
+
     def test_fileobj_regular_file(self):
         tarinfo = self.tar.next() # get "regtype" (can't use getmember)
         fobj = self.tar.extractfile(tarinfo)
Modified: python/branches/release32-maint/Misc/NEWS
==============================================================================
--- python/branches/release32-maint/Misc/NEWS (original)
+++ python/branches/release32-maint/Misc/NEWS Wed Feb 23 12:52:31 2011
@@ -15,6 +15,10 @@
Library
-------
+- Issue #11224: Fixed a regression in tarfile that affected the file-like
+ objects returned by TarFile.extractfile() regarding performance, memory
+ consumption and failures with the stream interface.
+
- Issue #11074: Make 'tokenize' so it can be reloaded.
- Issue #4681: Allow mmap() to work on file sizes and offsets larger than
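
For readers who want to try the stream-mode case from the new test outside the test suite, the following is a minimal sketch. It assumes a Python version that contains this fix; the in-memory archive, the member name "member.bin" and the 5000-byte payload are made up for illustration.

```python
import io
import tarfile

# Build a small uncompressed archive in memory (illustrative data only).
payload = b"x" * 5000
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w") as tar:
    info = tarfile.TarInfo(name="member.bin")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
raw.seek(0)

# Read every regular member in 512-byte blocks through the stream
# interface ("r|"), which cannot seek backwards.
total = 0
with tarfile.open(fileobj=raw, mode="r|") as tar:
    for tarinfo in tar:
        if not tarinfo.isreg():
            continue
        fobj = tar.extractfile(tarinfo)
        while True:
            buf = fobj.read(512)
            if not buf:
                break
            total += len(buf)
        fobj.close()

print(total)
```

With the regression described in the log, a loop like this is the kind of read-through that raised StreamError; with the fix it simply consumes each member front to back.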