[Python-ideas] New explicit methods to trim strings
Rhodri James
rhodri at kynesim.co.uk
Tue Apr 2 07:06:33 EDT 2019
On 02/04/2019 01:52, Steven D'Aprano wrote:
> Here's a partial list of English prefixes that somebody doing text
> processing might want to remove to get at the root word:
>
> a an ante anti auto circum co com con contra contro de dis
> en ex extra hyper il im in ir inter intra intro macro micro
> mono non omni post pre pro sub sym syn tele un uni up
>
> I count fourteen clashes:
>
> a: an ante anti
> an: ante anti
> co: com con contra contro
> ex: extra
> in: inter intra intro
> un: uni
>
> (That's over a third of this admittedly incomplete list of prefixes.)
>
> I can think of at least one English suffix pair that clash: -ify, -fy.
You're beginning to persuade me that cut/trim methods/functions aren't a
good idea :-)
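To make the clash problem concrete, here is a minimal sketch. The helper names (strip_first_prefix, strip_longest_prefix) and the shortened prefix list are hypothetical, made up for illustration; they are not existing or proposed str methods. The sketch shows that the result depends on the order in which prefixes are tried, and that even longest-match-first still mangles words that only look prefixed.

```python
# Hypothetical helpers for illustration only; not part of str or any library.
PREFIXES = ["a", "an", "ante", "anti", "un", "uni"]

def strip_first_prefix(word, prefixes):
    """Remove the first prefix in `prefixes` that `word` starts with."""
    for prefix in prefixes:
        if word.startswith(prefix):
            return word[len(prefix):]
    return word

def strip_longest_prefix(word, prefixes):
    """Same, but try longer prefixes first so "anti" wins over "a"."""
    by_length = sorted(prefixes, key=len, reverse=True)
    return strip_first_prefix(word, by_length)

# List order matters: "a" matches before "anti" gets a chance.
naive = strip_first_prefix("antisocial", PREFIXES)      # "ntisocial"
longest = strip_longest_prefix("antisocial", PREFIXES)  # "social"

# Longest-match does not rescue words that merely look prefixed.
not_a_root = strip_longest_prefix("under", PREFIXES)    # "der"
```

Trying the longest prefix first resolves the clashes in the list, but "under" shows why that still isn't enough for finding root words.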
So far we have two slightly dubious use-cases, plus a third of my own.
1. Stripping file extensions. Personally I find that treating filenames
like filenames (i.e. using os.path or (nowadays) pathlib) results in me
thinking more appropriately about what I'm doing.
2. Stripping prefixes and suffixes to get to root words. Python has
been used for natural language work for over a decade, and I don't think
I've heard any great call from linguists for the functionality. English
isn't a girl who puts out like that on a first date :-) There are too
many common exception cases for such a straightforward approach not to
cause confusion.
3. My most common use case (not very common at that) is for stripping
annoying prompts off text-based APIs. I'm happy using .startswith() and
string slicing for that, though your point about the repeated use of the
string to be stripped off (or worse, hard-coding its length) is well made.
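For reference, the .startswith()-and-slicing idiom from point 3 looks roughly like this. The prompt string "ok> " and the function name strip_prompt are assumptions for the example, not taken from any real API. Wrapping the idiom in a function means the prompt is named once and its length is never hard-coded.

```python
# Assumed prompt text; a real text-based API will have its own.
PROMPT = "ok> "

def strip_prompt(line, prompt=PROMPT):
    """Remove `prompt` from the start of `line` if present, exactly once."""
    if line.startswith(prompt):
        return line[len(prompt):]
    return line

cleaned = strip_prompt("ok> okay")        # "okay"
untouched = strip_prompt("no prompt")     # "no prompt"

# str.lstrip() is not a substitute: its argument is a set of characters,
# so it keeps eating any leading 'o', 'k', '>' or ' '.
over_stripped = "ok> okay".lstrip(PROMPT)  # "ay"
```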
I am beginning to worry slightly that actually there are usually more
appropriate things to do than simply cutting off affixes, and that in
providing these particular batteries we might be encouraging poor practice.
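As one example of a "more appropriate thing", here is a small sketch of the file-extension case from point 1 done with pathlib and os.path instead of string trimming. The filename is made up; PurePosixPath is used only so the sketch behaves the same on every platform.

```python
import os.path
from pathlib import PurePosixPath

# Made-up filename for illustration.
path = PurePosixPath("backups/archive.tar.gz")

last_suffix = path.suffix        # ".gz"
all_suffixes = path.suffixes     # [".tar", ".gz"]
stem = path.stem                 # "archive.tar"
without_ext = path.with_suffix("")  # PurePosixPath("backups/archive.tar")

# The older os.path spelling splits off the last extension only.
root, ext = os.path.splitext("archive.tar.gz")  # ("archive.tar", ".gz")
```

Treating the name as a path forces the question of what to do with a double extension such as ".tar.gz", which a plain string cut would quietly gloss over.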
--
Rhodri James *-* Kynesim Ltd