[Python-ideas] String-like methods on StringIO objects?

dw+python-ideas at hmmz.org
Thu Jun 5 18:39:25 CEST 2014


On Fri, Jun 06, 2014 at 02:05:35AM +1000, Nick Coghlan wrote:
> From the "idle speculation" files (inspired by the recent thread on
> python-dev): has anyone ever experimented with offering string methods like
> find() on StringIO objects?

> I don't work in any sufficiently memory constrained environments these days
> that that style of API would be worth the hassle relative to a normal string,
> it just struck me as a potentially interesting approach to the notion of a
> string manipulation type that didn't generally copy data around and could use
> different code point sizes internally for different parts of the text data.

I have thought about this quite a bit. There are a few ways
StringIO, BytesIO and buffers in general could improve, though I am not
sure which approaches are interesting.

1) I am not sure whether this is the case in Python 3.x (I am pretty
sure it isn't in 2.x), but cStringIO could optimize for the case where
the IO object is discarded after building a single string, by using the
CPython APIs intended for that (e.g. _PyString_Resize).

In that case, getvalue() returns the built string, and sets an internal
flag to cause it to be copied to a new private string if any further IO
is invoked. This inverts the current behaviour where the normal case of
build-and-discard causes a copy.
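
To make the flag idea concrete, here is a minimal pure-Python sketch. It
is an illustration only, not how cStringIO or io.BytesIO is implemented;
the class name CopyOnWriteBuffer and its attributes are hypothetical, and
a bytearray stands in for the C-level string being built.

```python
class CopyOnWriteBuffer:
    """Hypothetical sketch of 'hand out the buffer, copy only on later IO'."""

    def __init__(self):
        self._buf = bytearray()
        self._shared = False  # True once getvalue() has handed out self._buf

    def write(self, data):
        if self._shared:
            # Further IO after getvalue(): switch to a private copy first,
            # so the object already returned to the caller is not changed.
            self._buf = bytearray(self._buf)
            self._shared = False
        self._buf += data
        return len(data)

    def getvalue(self):
        # The build-and-discard case pays no copy: the caller receives the
        # buffer itself and we only remember that it has been handed out.
        self._shared = True
        return self._buf


out = CopyOnWriteBuffer()
out.write(b"spam")
first = out.getvalue()    # no copy here
out.write(b" eggs")       # the copy happens here, and only here
second = out.getvalue()
```

In this sketch the caller could still mutate the returned bytearray; a C
implementation returning an immutable string would not have that problem.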


2) Rather than implement string methods on the StringIO, it might be
nicer if those methods could apply to a memoryview, and then make it
possible e.g. for BytesIO to be exposed as a memoryview. Right now
Python doesn't have much in the way of generic "type safe / memory safe"
APIs for doing things to regular memory without first invoking
copies/conversions of various sorts. This might be the more useful thing
to fix.
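
For what it's worth, Python 3's io.BytesIO already goes part of the way
with getbuffer(), which returns a memoryview onto the internal buffer
without copying. A small sketch of what that allows today (the variable
names are mine):

```python
import io

bio = io.BytesIO(b"hello world")

# getbuffer() exposes the internal buffer as a memoryview: no copy is made.
with bio.getbuffer() as view:
    view[0:5] = b"HELLO"        # write through the view, in place
    prefix = bytes(view[0:5])   # copy out only the bytes actually needed

# The view is released when the with block exits, so the BytesIO object
# can be written to and resized again.
result = bio.getvalue()
```

What is missing is the other half of the idea: searching and similar
string-style operations still mean copying the data out to bytes first.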

We have plenty of special cases, like bytearray(), array.array(),
StringIO (to some degree), and so on, and various ways to manipulate
that memory (the ctypes and struct modules, for example), but they
overlap in a hodge-podge fashion and lack any "one way to do it".
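
As a rough illustration of the overlap, the same four bytes can be
handled through struct, ctypes and array.array, each with its own
conventions and its own answer to "is this a copy?". The variable names
below are mine; only standard library calls are used.

```python
import array
import ctypes
import struct

raw = bytearray(4)

# struct: explicit format string, packs into the existing buffer.
struct.pack_into("<HH", raw, 0, 1, 2)
first, second = struct.unpack_from("<HH", memoryview(raw), 0)

# ctypes: a typed array that shares the bytearray's memory (no copy).
shared = (ctypes.c_ubyte * 4).from_buffer(raw)

# array.array: built from the bytes, so this one is an independent copy.
copied = array.array("B", raw)

# Changing the original shows which of the two still aliases it.
raw[0] = 9
shares_memory = shared[0] == 9
is_copy = copied[0] == 1
```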


I had looked at building some kind of unified 'memory slice' type last
year, since I keep bumping into the need for better Python-level support
for this stuff when working on 'bit twiddling' projects of various
kinds.

It's mostly thinking aloud, but here is a rough sketch of the kind
of module I had been considering last year, mostly while working with
Python 2: https://github.com/dw/memsink/wiki/Memory-Module . The idea
was to provide a common 'Slice' adaptor type whose memory could be
interpreted using a couple of different abstractions (Vector and File
being the obvious ones).


David

> 
> Cheers,
> Nick.
> 



