Improving cStringIO API (Request For Comments)

Michael D. Marchionna mdm at corp.eCircles.com
Thu Jun 22 19:03:50 EDT 2000


I was messing around trying to convert my Java app to Python, and ran into
some difficulty with the cStringIO module.  In my app I transmit a segmented
byte stream over a socket connection.  I read the data section by section
while laying the message out in a single StringIO buffer.  I then create a
series of section objects that reference the segments in the StringIO
buffer.  I want each section object to contain no data except an offset from
the start of the buffer, so that when I query the contents of a section the
data is actually pulled from the StringIO object.

This led me to a bunch of code like the following:

class MsgSect:
    <<< code deleted >>>
    def getAttrText(self):
        value = None
        if self.type == ATTR and self.size > 9 and self.atype == 'C':
            oldpos = self.buffer.tell()
            self.buffer.seek(self.offset+9)
            value = self.buffer.read(self.size-9)
            self.buffer.seek(oldpos)
        return value

The point is not to disturb the current read/write pointer.  You might ask
why I don't just extract the string and add it as a member of the MsgSect
object.  Well, I have my reasons.  Anyway, this led me to making some
modifications to the cStringIO module.
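
The save/seek/read/restore pattern in getAttrText can be factored out
without touching the module at all.  The following is only a minimal sketch:
it uses io.BytesIO from current Python as a stand-in for cStringIO, and the
names preserved_position and read_at are hypothetical helpers, not part of
any library.

```python
import contextlib
import io


@contextlib.contextmanager
def preserved_position(buf):
    """Restore buf's read/write pointer when the block exits."""
    oldpos = buf.tell()
    try:
        yield buf
    finally:
        buf.seek(oldpos)


def read_at(buf, offset, size):
    # Hypothetical helper: read size bytes at offset, leaving tell() unchanged.
    with preserved_position(buf):
        buf.seek(offset)
        return buf.read(size)


buf = io.BytesIO(b'0123456789ABCDEF')
buf.seek(0, io.SEEK_END)
print(read_at(buf, 4, 3))  # the three bytes starting at offset 4
print(buf.tell())          # the pointer is still at the end of the buffer
```

This still moves the pointer internally and puts it back, so it is a
convenience rather than a true peek.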

At first I just added a peek method so I could read sections without
modifying the read/write pointer.  For example:

    >>> from cStringIO import  StringIO
    >>> so = StringIO()
    >>> so.write('0123456789ABCDEF')
    >>> so.tell()
    16
    >>> so.peek(4,3)
    '456'
    >>> so.tell()
    16

I also added a poke method to do what you might think:

    >>> so.poke(4,'***')
    >>> so.getvalue()
    '0123***789ABCDEF'
    >>> so.tell()
    16
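
For comparison, the same peek/poke behaviour can be approximated in pure
Python.  This is a sketch under assumptions, not the C patch: it subclasses
io.BytesIO from current Python (so it works on bytes, not str), the class
name PeekPokeBuffer is hypothetical, and it relies on BytesIO.getbuffer() to
reach the underlying storage without seeking.

```python
import io


class PeekPokeBuffer(io.BytesIO):
    """Hypothetical BytesIO subclass with position-preserving peek/poke."""

    def peek(self, offset, size):
        # Copy bytes out of the underlying buffer without calling seek().
        with self.getbuffer() as view:
            return bytes(view[offset:offset + size])

    def poke(self, offset, value):
        # Overwrite in place; this sketch refuses to grow the buffer.
        with self.getbuffer() as view:
            if offset < 0 or offset + len(value) > len(view):
                raise IndexError("poke would run past the end of the buffer")
            view[offset:offset + len(value)] = value


so = PeekPokeBuffer()
so.write(b'0123456789ABCDEF')
print(so.peek(4, 3))
so.poke(4, b'***')
print(so.getvalue())
print(so.tell())       # unchanged by peek and poke
```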

Then I got to thinking: these are still strings, so why don't they behave a
little more like them?  So I added a set of sequence methods to the StringI
and StringO object types.  So now you can do:

    >>> si = StringIO('0123456789ABCDEF')
    >>> len(si)
    16
    >>> si[4]
    '4'
    >>> si[-4]
    'C'
    >>> si[4:7]
    '456'
    >>> si[-6:-2]
    'ABCD'

Item assignment works like you would expect for StringO objects:

    >>> so = StringIO()
    >>> so.write('ABCD')
    >>> so[2] = 'c'
    >>> so[:]
    'ABcD'

Repeat operations work as well:

    >>> soso = so * 2
    >>> soso[:]
    'ABcDABcD'
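
The sequence behaviour shown in these sessions can be sketched in pure
Python as well.  The assumptions here are loud ones: io.BytesIO from current
Python stands in for the StringI/StringO types, the class name SeqBuffer is
hypothetical, and single items come back as length-1 bytes objects to mirror
the one-character strings above.

```python
import io


class SeqBuffer(io.BytesIO):
    """Hypothetical BytesIO subclass with string-like sequence methods."""

    def __len__(self):
        with self.getbuffer() as view:
            return view.nbytes

    def __getitem__(self, index):
        data = self.getvalue()
        if isinstance(index, slice):
            return data[index]
        if index < 0:
            index += len(data)
        if not 0 <= index < len(data):
            raise IndexError("buffer index out of range")
        # Return a length-1 bytes object rather than an int.
        return data[index:index + 1]

    def __setitem__(self, index, value):
        if isinstance(index, slice) or len(value) != 1:
            raise TypeError("only single-byte item assignment is sketched")
        with self.getbuffer() as view:
            view[index] = value[0]

    def __mul__(self, count):
        # Repetition builds a new, independent buffer.
        return SeqBuffer(self.getvalue() * count)


so = SeqBuffer()
so.write(b'ABCD')
so[2] = b'c'
soso = so * 2
print(len(so), so[:], soso[:], soso[-4])
```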

Things get a little more squirrelly with the remaining concat and setslice
methods.  A concat of StringI objects works fine, but a concat of StringO
objects leaves the problem of what to do with the softspace attribute.
Currently I just copy the value from the self object (the left-hand
operand).

    >>> so1 = StringIO()
    >>> so1.write('ABC')
    >>> so2 = StringIO()
    >>> so2.write('123')
    >>> so1.softspace
    0
    >>> so2.softspace = 1
    >>> so3 = so1 + so2
    >>> so3.getvalue()
    'ABC123'
    >>> so3.softspace
    0
    >>> so4 = so2 + so1
    >>> so4.getvalue()
    '123ABC'
    >>> so4.softspace
    1

More variations can be dealt with in the concat methods.  For example, it is
possible to mix StringI and StringO types, but what should the resultant
type be?  It also seems reasonable to allow string objects in the concat as
well, e.g. so5 = so1 + 'XYZ'.  Unfortunately so5 = 'XYZ' + so1 won't work.
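
To make the concat semantics concrete, here is a pure-Python sketch.  It
says nothing about what the C-level sequence methods can do; it only shows
that a Python class can cover the reversed case through __radd__.  The class
name ConcatBuffer is hypothetical, io.BytesIO from current Python stands in
for StringO, and softspace is modelled as a plain attribute copied from
self, as described above.

```python
import io


def _payload(obj):
    # Accept another in-memory buffer or a plain bytes-like object.
    if isinstance(obj, io.BytesIO):
        return obj.getvalue()
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj)
    return None


class ConcatBuffer(io.BytesIO):
    """Hypothetical buffer supporting + with buffers and bytes."""

    softspace = 0  # stand-in for the file softspace attribute

    def __add__(self, other):
        data = _payload(other)
        if data is None:
            return NotImplemented
        result = ConcatBuffer(self.getvalue() + data)
        result.softspace = self.softspace
        return result

    def __radd__(self, other):
        # Called for b'XYZ' + buf, where bytes cannot handle the right operand.
        data = _payload(other)
        if data is None:
            return NotImplemented
        result = ConcatBuffer(data + self.getvalue())
        result.softspace = self.softspace
        return result


so1 = ConcatBuffer(b'ABC')
so2 = ConcatBuffer(b'123')
so2.softspace = 1
print((so1 + so2).getvalue(), (so2 + so1).softspace)
print((b'XYZ' + so1).getvalue())
```

The result type is fixed to ConcatBuffer here, which sidesteps rather than
answers the question of what a mixed StringI/StringO concat should return.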

Slice assignments obviously only work on StringO objects.  In some sense the
setslice should overwrite the contents of the StringO buffer rather than
produce a new StringO object.  The poke method already handles most of the
setslice operation, except that it currently doesn't permit extending the
buffer.  It could do that, though, perhaps optionally via a third argument:

    def poke(offset, value, extend=0):
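
One possible reading of that signature, sketched as a free function over
io.BytesIO from current Python.  The buf parameter and the choice to raise
IndexError when extend is false are assumptions of this sketch, not
behaviour of the patch.

```python
import io


def poke(buf, offset, value, extend=0):
    """Write value at offset without moving buf's read/write pointer."""
    with buf.getbuffer() as view:
        size = view.nbytes
    if not 0 <= offset <= size:
        raise IndexError("offset outside the buffer")
    if offset + len(value) > size and not extend:
        raise IndexError("poke would extend the buffer; pass extend=1")
    oldpos = buf.tell()
    try:
        buf.seek(offset)
        buf.write(value)   # BytesIO grows as needed when writing past the end
    finally:
        buf.seek(oldpos)


so = io.BytesIO()
so.write(b'0123456789')
poke(so, 8, b'ABCD', extend=1)
print(so.getvalue())   # the buffer has grown by two bytes
print(so.tell())       # the pointer is where write() left it
```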

My other concern about the cStringIO module is its fixed reallocation
scheme.  Currently a StringO object starts out at a capacity of 128 and will
double its size each time a write goes beyond the current buffer space.  It
would be nice to allow this scheme to be modified, perhaps via optional
keyword arguments to the StringIO factory function.  For example:

        so = StringIO(capacity=4096, increment=8192)

This would create a StringO object that would start out with a buffer space
of 4096 bytes, and add 8192 each time a write went off the edge.  Or
possibly something even like this:

        so = StringIO(capacity=1024*1024, upscale=0.1)

This would create a 1MB buffer and increase the size by 10 percent each
time a write overruns it.  Options could also be added for decreasing the
size of the buffer when it is truncated, e.g. decrement=65536,
downscale=0.5.
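
The growth arithmetic behind these options can be sketched as a small
function.  The keyword names capacity, increment and upscale come from the
examples above; the function name next_capacity and the rule that increment
takes precedence over upscale are assumptions of this sketch.

```python
def next_capacity(capacity, needed, increment=0, upscale=0.0):
    """Grow capacity step by step until it can hold needed bytes.

    With no options the capacity doubles, matching the current scheme
    described above.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    while capacity < needed:
        if increment:
            capacity += increment            # fixed-size steps
        elif upscale:
            capacity += max(1, int(capacity * upscale))  # proportional steps
        else:
            capacity *= 2                    # default: doubling
    return capacity


print(next_capacity(128, 129))                    # one doubling
print(next_capacity(4096, 5000, increment=8192))  # one fixed step
```

The shrinking options (decrement, downscale) would need a matching rule for
truncation, which is left out here.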

Another feature that would be nice is to allow shifting of the contents of
a StringO buffer, similar to what memmove does.
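
As an illustration of the intended semantics only, here is a memmove-style
shift on a bytearray, which stands in for the StringO buffer.  The function
name shift and its argument order are hypothetical.

```python
def shift(buf, src, dst, count):
    """Copy count bytes from src to dst inside buf; ranges may overlap."""
    if count < 0 or min(src, dst) < 0 or max(src, dst) + count > len(buf):
        raise IndexError("shift range outside the buffer")
    # The right-hand slice is copied before assignment, so overlapping
    # source and destination ranges are safe, as with memmove.
    buf[dst:dst + count] = buf[src:src + count]


data = bytearray(b'0123456789')
shift(data, 0, 2, 4)
print(data)
```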

Further on, I would like to support the struct.pack and struct.unpack
methods so that they operate directly on the contents of the StringIO
buffer.
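
In current Python something close to this is possible with
struct.pack_into and struct.unpack_from, which operate on any writable
buffer.  The sketch below assumes io.BytesIO in place of cStringIO; the
format string and the field names kind and size are made up for the
example.

```python
import io
import struct

buf = io.BytesIO(bytes(16))      # sixteen zero bytes
buf.seek(0, io.SEEK_END)

with buf.getbuffer() as view:
    # Pack a 2-byte and a 4-byte big-endian field in place at offset 4.
    struct.pack_into('>HI', view, 4, 0x1234, 0xDEADBEEF)
    # Read them back from the same offset without an intermediate string.
    kind, size = struct.unpack_from('>HI', view, 4)

print(hex(kind), hex(size))
print(buf.tell())                # the read/write pointer has not moved
```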

In the end I was wondering what people think of adding some, or all, of
these features to the cStringIO module, or perhaps to a completely new
module.

Comments and suggestions?

--MDM




