[Python-ideas] Easily remove characters from a string.

Steven D'Aprano steve at pearwood.info
Sun Oct 23 11:37:34 EDT 2016


On Sat, Oct 22, 2016 at 03:34:23PM +0700, Simon Mark Holland wrote:
> Having researched this as heavily as I am capable with limited experience,
> I would like to suggest a Python 3 equivalent to string.translate() that
> doesn't require a table as input.  Maybe in the form of str.stripall() or
> str.replaceall().

stripall() would not be appropriate: "strip" refers to removing from the 
front and end of the string, not the middle, and str.strip() already 
implements a "strip all" functionality:

py> '+--+*abcd+-*xyz-*+-'.strip('*+-')
'abcd+-*xyz'


But instead of a new method, why not fix translate() to be more 
user-friendly? Currently, it takes two method calls to delete characters 
using translate:

table = str.maketrans('', '', '*+-.!?')
newstring = mystring.translate(table)

That's appropriate when you have a big translation table which you are 
intending to use many times, but it's a bit clunky for single, one-off 
uses.
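
For a one-off deletion today, the two calls can be folded into a single 
expression or hidden behind a small helper. A minimal sketch, assuming a 
helper name (delete_chars) that is purely illustrative and not part of 
the str API:

```python
def delete_chars(text, unwanted):
    """Return text with every character in unwanted removed.

    Hypothetical helper for illustration; not a str method.
    """
    # The third argument to str.maketrans lists the characters that map
    # to None, and str.translate deletes anything mapped to None.
    return text.translate(str.maketrans('', '', unwanted))


cleaned = delete_chars('Hello, world! How are you?', '*+-.!?')
# cleaned == 'Hello, world How are you'
```

This still builds a fresh table on every call, so it only makes sense 
for occasional use; for repeated use, build the table once and keep it.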

Maybe we could change the API of translate to something like this:

def translate(self, *args):
    if len(args) == 1:
        # Same as the existing behaviour.
        table = args[0]
    elif len(args) == 3:
        table = type(self).maketrans(*args)
    else:
        raise TypeError('too many or not enough arguments')
    ...


Then we could write:

newstring = mystring.translate('', '', '1234567890')

to delete the digits. 
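
To try the idea out without touching the built-in type, the signature 
above can be sketched on a str subclass. The class name FlexStr and the 
wording of the error message are assumptions made for this example; 
nothing like this exists in the standard library:

```python
class FlexStr(str):
    """Hypothetical str subclass used only to experiment with the API."""

    def translate(self, *args):
        if len(args) == 1:
            # Same as the existing behaviour: a ready-made table.
            table = args[0]
        elif len(args) == 3:
            # Build the table on the fly from maketrans-style arguments.
            table = type(self).maketrans(*args)
        else:
            raise TypeError(
                'translate() takes 1 or 3 arguments (%d given)' % len(args)
            )
        # Delegate the actual work to the built-in implementation.
        return str.translate(self, table)


newstring = FlexStr('call 555-0199 now').translate('', '', '1234567890')
# newstring == 'call - now'
```

The one-argument form keeps working unchanged, so existing code that 
passes a table would not be affected by this variant.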

So we could fix this... but should we? Is this *actually* a problem that 
needs fixing, or are we just adding unnecessary complexity?


> My reasoning is that while it is currently possible to easily strip()
> preceding and trailing characters, and even replace() individual characters
> from a string, 

Stripping from the front and back is a very common operation; in my 
experience, replacing is probably half as common, maybe even less, and 
deleting is rarer still.


> My proposal is that if strip() and replace() are important enough to
> receive modules, then the arguably more common operation (in terms of
> programming tutorials, if not mainstream development) of just removing all
> instances of specified numbers, punctuation, or even letters etc from a
> list of characters should also.

I think the reason that deleting characters is common in tutorials is 
that it is a simple, easy, obvious task that can be programmed by a 
beginner in just a few lines. I don't think it is actually something 
that people need to do very often, outside of exercises.
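
As an illustration of how little code the exercise takes, here is the 
sort of thing a beginner might write. The function name remove_chars is 
made up for this example:

```python
def remove_chars(text, unwanted):
    """Return text without any of the characters found in unwanted."""
    kept = []
    for ch in text:
        # Keep only the characters that are not in the unwanted set.
        if ch not in unwanted:
            kept.append(ch)
    return ''.join(kept)


result = remove_chars('a1b2c3', '0123456789')
# result == 'abc'
```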



-- 
Steve

