[Python-ideas] Complicate str methods

INADA Naoki songofacandy at gmail.com
Thu Feb 8 12:14:50 EST 2018


I don't think these should be str methods.

They should live in a separate class so that the internal tree can be reused.

There are some Aho-Corasick implementations on PyPI.
As far as I know, AC implementations use longest-match semantics.
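
A minimal sketch of what such a separate class could look like. The name
MultiReplacer is hypothetical, and a compiled regex stands in for the trie
or Aho-Corasick automaton; the point is only that the matcher is built once
and then reused across many strings.

```python
import re


class MultiReplacer:
    """Hypothetical helper: build the matcher once, reuse it for many strings."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)
        if not self._mapping or "" in self._mapping:
            raise ValueError("need at least one non-empty key")
        # Longest keys first, so the alternation prefers the longest match
        # at a given position. A compiled regex stands in for the trie here.
        keys = sorted(self._mapping, key=len, reverse=True)
        self._pattern = re.compile("|".join(re.escape(k) for k in keys))

    def replace(self, s):
        # Single left-to-right pass; replaced text is never rescanned.
        return self._pattern.sub(lambda m: self._mapping[m.group(0)], s)


swap = MultiReplacer({"a": "b", "b": "a"})
print(swap.replace("abba"))  # baab
```

Because the replaced text is never rescanned, swapping "a" and "b" works in
one call, which chained str.replace calls cannot do.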

On the other hand, Go's Replacer (which is also trie-based) is documented as:

> Replacements are performed in order, without overlapping matches.
https://golang.org/pkg/strings/#NewReplacer
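
To illustrate how this can differ from longest match, here is a rough Python
sketch. It assumes that "in order" means earlier-listed old strings take
priority when several match at the same position; that is a reading of the
Go docs, not something checked against Go's implementation. The function
name replace_in_order is hypothetical, and a regex alternation is used
instead of a trie.

```python
import re


def replace_in_order(s, *pairs):
    """Hypothetical helper taking old1, new1, old2, new2, ... like NewReplacer."""
    if len(pairs) % 2:
        raise ValueError("expected old/new pairs")
    olds, news = pairs[0::2], pairs[1::2]
    if not olds or "" in olds:
        raise ValueError("need at least one non-empty old string")
    table = {}
    for old, new in zip(olds, news):
        table.setdefault(old, new)  # the first pair wins for a repeated old
    # Alternatives are tried in argument order at each position,
    # so an earlier old string beats a longer one listed later.
    pattern = re.compile("|".join(re.escape(old) for old in olds))
    return pattern.sub(lambda m: table[m.group(0)], s)


print(replace_in_order("abc", "a", "1", "ab", "2"))  # 1bc
print(replace_in_order("abc", "ab", "2", "a", "1"))  # 2c
```

With longest-match semantics both calls would give "2c", so the choice of
rule is visible to users and needs to be part of the API contract.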


On Sun, Feb 4, 2018 at 7:04 AM, Franklin Lee
<leewangzhong+python at gmail.com> wrote:
> Let s be a str. I propose to allow these existing str methods to take params
> in new forms.
>
> s.replace(old, new):
>     Allow passing in a collection of olds.
>     Allow passing in a single argument, a mapping of olds to news.
>     Allow the olds in the mapping to be tuples of strings.
>
> s.split(sep), s.rsplit, s.partition:
>     Allow sep to be a collection of separators.
>
> s.startswith, s.endswith:
>     Allow argument to be a collection of strings.
>
> s.find, s.index, s.count, x in s:
>     Similar.
>     These methods are also in `list`, which can't distinguish between items,
> subsequences, and subsets. However, `str` is already inconsistent with
> `list` here: list.M looks for an item, while str.M looks for a subsequence.
>
> s.[r|l]strip:
>     Sadly, these functions already interpret their str arguments as
> collections of characters.
>
> These new forms can be optimized internally, as a search for multiple
> candidate substrings can be more efficient than searching for one at a time.
> See
> https://stackoverflow.com/questions/3260962/algorithm-to-find-multiple-string-matches
>
> The most significant change is on .replace. The others are simple enough to
> simulate with a loop or something. It is harder to make multiple
> simultaneous replacements using one .replace at a time, because previous
> replacements can form new things that look like replaceables. The easiest
> Python solution is to use regex or install some package, which uses (if
> you're lucky) regex or (if unlucky) doesn't simulate simultaneous
> replacements. (If possible, just use str.translate.)
>
> I suppose .split on multiple separators is also annoying to simulate. The
> two-argument form of .split may be even more of a burden, though I don't
> know when a limited multiple-separator split is useful. The current best
> solution is, like before, to use regex, or install a package and hope for
> the best.



-- 
INADA Naoki  <songofacandy at gmail.com>

