[Python-ideas] New explicit methods to trim strings

Rhodri James rhodri at kynesim.co.uk
Tue Apr 2 07:06:33 EDT 2019


On 02/04/2019 01:52, Steven D'Aprano wrote:
> Here's a partial list of English prefixes that somebody doing text
> processing might want to remove to get at the root word:
> 
>      a an ante anti auto circum co com con contra contro de dis
>      en ex extra hyper il im in ir inter intra intro macro micro
>      mono non omni post pre pro sub sym syn tele un uni up
> 
> I count fourteen clashes:
> 
>      a: an ante anti
>      an: ante anti
>      co: com con contra contro
>      ex: extra
>      in: inter intra intro
>      un: uni
> 
> (That's over a third of this admittedly incomplete list of prefixes.)
> 
> I can think of at least one English suffix pair that clash: -ify, -fy.
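
The clash count quoted above can be checked mechanically. The following is only a rough sketch, not anything proposed in the thread: the function name find_clashes is made up for illustration, and a "clash" is assumed to mean that one prefix in the list is itself the start of a longer one.

```python
# The prefix list is copied from the quoted message above.
PREFIXES = """
a an ante anti auto circum co com con contra contro de dis
en ex extra hyper il im in ir inter intra intro macro micro
mono non omni post pre pro sub sym syn tele un uni up
""".split()


def find_clashes(prefixes):
    """Map each prefix to the longer prefixes in the list that start with it.

    Hypothetical helper: "clash" here means that cutting the shorter
    prefix would also match words carrying the longer one.
    """
    clashes = {}
    for short in prefixes:
        longer = [p for p in prefixes if p != short and p.startswith(short)]
        if longer:
            clashes[short] = longer
    return clashes


clashes = find_clashes(PREFIXES)
for short, longer in clashes.items():
    print(f"{short}: {' '.join(longer)}")
```

Under that definition the check also pairs "a" with "auto" and "con" with "contra" and "contro", which the hand count above does not list, so if anything the quoted figure understates the problem.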

You're beginning to persuade me that cut/trim methods/functions aren't a 
good idea :-)

So far we have three use-cases, the first two of them slightly dubious.

1. Stripping file extensions.  Personally I find that treating filenames 
like filenames (i.e. using os.path or (nowadays) pathlib) results in me 
thinking more appropriately about what I'm doing.

2. Stripping prefixes and suffixes to get to root words.  Python has 
been used for natural language work for over a decade, and I don't think 
I've heard any great call from linguists for the functionality.  English 
isn't a girl who puts out like that on a first date :-)  There are too 
many common exception cases for such a straightforward approach not to 
cause confusion.

3. My most common use case (not very common at that) is for stripping 
annoying prompts off text-based APIs.  I'm happy using .startswith() and 
string slicing for that, though your point about the repeated use of the 
string to be stripped off (or worse, hard-coding its length) is well made.
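
For concreteness, cases 1 and 3 might look roughly like the sketch below. The file name, the prompt string and the helper name strip_prompt are all invented for the example; only pathlib and str.startswith() are real.

```python
from pathlib import PurePosixPath

# Case 1: treat the filename as a filename and let pathlib find the
# extension, rather than cutting a literal suffix off the string.
path = PurePosixPath("reports/summary.tar.gz")
print(path.suffix)           # only the last extension
print(path.stem)             # final component without that extension
print(path.with_suffix(""))  # whole path with the last extension dropped


# Case 3: the startswith-and-slice idiom for dropping a prompt.
def strip_prompt(line, prompt):
    """Return line without a leading prompt (hypothetical helper).

    Wrapping the idiom in a function means the prompt is written once,
    and the slice uses len(prompt) rather than a hard-coded length.
    """
    if line.startswith(prompt):
        return line[len(prompt):]
    return line


PROMPT = ">>> "  # made-up prompt, standing in for whatever the API prints
print(strip_prompt(">>> status ok", PROMPT))
print(strip_prompt("status ok", PROMPT))  # no prompt, returned unchanged
```

Written inline rather than as a helper, the same idiom needs the prompt twice (once in startswith(), once in len()) or a magic number such as line[4:], which is the repetition objected to above.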

I am beginning to worry slightly that actually there are usually more 
appropriate things to do than simply cutting off affixes, and that in 
providing these particular batteries we might be encouraging poor practice.

-- 
Rhodri James *-* Kynesim Ltd

