[Python-Dev] Mini-Pep: An Empty String ABC

Raymond Hettinger python at rcn.com
Sun Jun 1 08:59:00 CEST 2008


Mini-Pep:  An Empty String ABC
Target:  Py2.6 and Py3.0
Author:  Raymond Hettinger

Proposal
--------

Add a new collections ABC specified as:

    class String(Sequence):
        pass

Motivation
----------

Having an ABC for strings allows string look-alike classes to declare
themselves as sequences that contain text.  Client code (such as a flatten
operation or tree searching tool) may use that ABC to usefully differentiate
strings from other sequences (i.e. containers vs containees).  And in code
that only relies on sequence behavior, isinstance(x,str) may be usefully
replaced by isinstance(x,String) so that look-alikes can be substituted in
calling code.
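
As a rough sketch of the flatten use case (not part of the proposal itself),
the following assumes the String ABC defined above, with str registered
against it.  The Rope class and the flatten() helper are hypothetical names
invented for illustration, and the import path is the one used by current
Python 3 releases.

```python
from collections.abc import Sequence


class String(Sequence):
    """The empty ABC from the proposal: a marker for text-like sequences."""
    pass


# Assumption: the built-in str type would be registered with the ABC.
String.register(str)


class Rope(String):
    """Hypothetical string look-alike that only supplies sequence behavior."""

    def __init__(self, *pieces):
        self._text = "".join(pieces)

    def __getitem__(self, index):
        return self._text[index]

    def __len__(self):
        return len(self._text)


def flatten(items):
    """Yield leaves of a nested sequence, treating Strings as containees."""
    for item in items:
        if isinstance(item, Sequence) and not isinstance(item, String):
            # A real container: descend into it.
            yield from flatten(item)
        else:
            # A String (or a non-sequence) is a leaf and is not split up.
            yield item


rope = Rope("c", "d")
result = list(flatten([1, ["ab", [2, rope]], "ef"]))
```

Without the String check, flatten() would recurse forever on "ab", since each
character of a str is itself a one-character str.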

A natural temptation is to add other methods to the String ABC, but strings are a
tough case.  Beyond simple sequence manipulation, the string methods get very
complex.  An ABC that included those methods would make it tough to write a
compliant class that could be registered as a String.  The split(), rsplit(),
partition(), and rpartition() methods are examples of methods that would be
difficult to emulate correctly.  Also, starting with Py3.0, strings are
essentially abstract sequences of code points, meaning that an encode() method
is essential to being able to usefully transform them back into concrete data.
Unfortunately, the encode method is so complex that it cannot be readily
emulated by an aspiring string look-alike.

Besides complexity, another problem with the concrete str API is the extensive
number of methods.  If string look-alikes were required to emulate the likes
of zfill(), ljust(), title(), translate(), join(), etc., it would
significantly add to the burden of writing a class complying with the String
ABC.

The fundamental problem is that of balancing a client function's desire to
rely on a broad number of behaviors against the difficulty of writing a
compliant look-alike class.  For other ABCs, the balance is more easily struck
because the behaviors are fewer in number, because they are easier to
implement correctly, and because some methods can be provided as mixins.  For
a String ABC, the balance should lean toward minimalism due to the large
number of methods and how difficult it is to implement some of them correctly.
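
To illustrate how little a minimal ABC asks of a look-alike, here is a sketch
assuming the String ABC from the proposal.  The Letters class is a
hypothetical example; it defines only __getitem__() and __len__(), and the
remaining sequence methods come from the Sequence mixins.

```python
from collections.abc import Sequence


class String(Sequence):
    """The empty ABC from the proposal."""
    pass


class Letters(String):
    """Hypothetical look-alike: only the two abstract methods are written."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        return self._text[index]

    def __len__(self):
        return len(self._text)


word = Letters("banana")

# These all come from Sequence mixin methods, not from Letters itself.
has_n = "n" in word                    # __contains__ (element-wise test)
first_n = word.index("n")              # index()
a_count = word.count("a")              # count()
backwards = "".join(reversed(word))    # __reversed__

# The str-specific extras are simply absent, yet the class still complies.
has_title = hasattr(word, "title")
```

Note that the mixin __contains__() tests single elements, so it does not
reproduce the substring search that str performs; that is one of the
behaviors the minimal ABC deliberately leaves unspecified.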

A last reason to avoid expanding the String API is that almost none of the
candidate methods characterize the notion of "stringiness".  With something
calling itself an integer, an __add__() method would be expected as it is
fundamental to the notion of "integeriness".  In contrast, methods like
startswith() and title() are non-essential extras -- we would not discount
something as being not stringlike if those methods were not present.





