[New-bugs-announce] [issue10045] poor cStringIO.StringO seek performance

Patrick Strawderman report at bugs.python.org
Thu Oct 7 19:37:49 CEST 2010


New submission from Patrick Strawderman <patrick at zope.com>:

cStringIO.StringO's seek method has O(n) characteristics in certain,
albeit pathological, cases, while the pure Python implementation and
cStringIO.StringI's seek methods both execute in constant time in all cases.

When the file offset is set n bytes beyond the end of actual data,
the gap is filled in with n bytes in cStringIO.StringO's seek method.

however, POSIX states that reads of data in the gap will return null bytes
only if a subsequent write has taken place, so filling in the gap is not
required at the time of the seek.

This patch for 2.7 corrects the behavior by unifying StringO and StringI's
seek methods, and moving the writing of null bytes to StringO's write
method.  There may be a more elegant way to write this, I don't know.
I believe this issue affects Python 3 as well, though I have yet to
test it.

NOTE: Perhaps this seems like an extreme edge case not worthy of a fix, but
this actually caused problems for us when parsing images with malformed
EXIF data; a web request for uploading such a photo was taking on the order
of 15 minutes.  When we stopped using cStringIO.StringO, it took seconds.

----------
files: cStringIO.diff
keywords: patch
messages: 118123
nosy: boogenhagn
priority: normal
severity: normal
status: open
title: poor cStringIO.StringO seek performance
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3
Added file: http://bugs.python.org/file19150/cStringIO.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10045>
_______________________________________


More information about the New-bugs-announce mailing list