[Python-ideas] More user-friendly version for string.translate()

Chris Barker chris.barker at noaa.gov
Wed Oct 26 18:17:42 EDT 2016


I've lost track of what (if anything) is actually being proposed here... so
I'm going to try a quick summary:


1) an easy way to spell "remove all the characters other than these"

I think that's a good idea. What with unicode having an enormous number of
code points, it really does make sense to have a way to specify only what
you want, rather than what you don't want.

Back in the good old days of 1-byte chars, it wasn't hard to build up a
full 256 element translate table -- not so much anymore. And one of the
whole points of str.translate() is good performance.

 a) a new method:

   str.remove_all_but(sequence_of_chars)
  (naming TBD)

 b) a new flag in translate (kind of like the decode keywords):

  str.translate(table, missing='ignore'|'remove')


(b) has the advantage of handling translation and removal in one fell swoop
-- but if you only want to remove, then you have to make a translation
table of 1:1 mappings -- not hard, but a bit annoying:

table = {c:c for c in sequence_of_chars}

I'm on the fence about what I personally prefer.
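
For what it's worth, something close to missing='remove' can be approximated
today. This is only a minimal sketch: the names KeepOnly and remove_all_but
are made up for illustration and are not an existing or proposed API. It
relies on str.translate looking characters up by ordinal, so the table is
keyed by ord(c) rather than by c, and on a returned None meaning "delete".

```python
class KeepOnly(dict):
    """Hypothetical translate table: any code point not listed is deleted."""

    def __missing__(self, key):
        # str.translate treats a None result as "remove this character".
        return None


def remove_all_but(s, keep):
    """Return s with every character not in `keep` removed (illustrative name)."""
    # str.translate looks up ordinals, so key the 1:1 table by ord(c).
    table = KeepOnly({ord(c): c for c in keep})
    return s.translate(table)


print(remove_all_but("hello, world! 123", "lo"))  # prints "llool"
```

Unlike a defaultdict, this table does not grow as unseen characters are
looked up, which may matter for long and varied input.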

2) (in another thread, but similar enough) being able to pass in more than
one string to replace:

str.replace( old=seq_of_strings, new=seq_of_strings )

I know I've wanted this a lot, and certainly from a performance
perspective, it could be a nice bonus.

But: it overlaps a lot with str.translate -- at least for single-character
replacements, so really, why? It would really only make sense if it
supported multi-char strings:

str.replace(old=("aword", "another_word"), new=("something", "something else"))

However: a string IS a sequence of strings, so we'd have confusion about
that:

str.replace("this", "four")

Does the user want the word "this" replaced with the word "four" -- or do
they want each character replaced?
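
A small illustration of the two readings, using only what exists today
(str.replace for the whole-word reading, str.maketrans plus str.translate
for the per-character reading). The sample text and variable names are
arbitrary choices for the example:

```python
old, new = "this", "four"
text = "this is it"

# Reading 1: replace the whole substring, which is what str.replace does now.
whole_word = text.replace(old, new)

# Reading 2: treat each string as a sequence of 1-char strings and pair them
# up: t->f, h->o, i->u, s->r.
pairs = list(zip(old, new))
per_char = text.translate(str.maketrans(old, new))

print(whole_word)  # prints "four is it"
print(per_char)    # prints "four ur uf"
```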

Maybe we'd need a .replace_many() method? ugh!

There are also other issues with what to do with repeated / overlapping
characters:

str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )

and all sorts of other complications!

THAT I think could be nailed down by defining the "order of operations":
does it loop through the entire string for each item, or through each item
for each point in the string? Note that if you loop through the entire
string for each item, you might as well have written the loop yourself:

for old, new in zip(old_list, new_list):
    s = s.replace(old, new)

and at least if the length of the string is long-ish, and the number of
replacements short-ish -- performance would be fine.
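
To make the "order of operations" question concrete, here is a rough sketch
of the two readings. Both function names are hypothetical. The single-pass
version uses an re alternation and tries longer patterns first at each
position; that tie-breaking rule is an assumption made for the example, not
something settled in this thread. It also assumes at least one non-empty
pattern.

```python
import re


def replace_sequential(s, olds, news):
    """Loop through the entire string once per (old, new) pair."""
    for old, new in zip(olds, news):
        s = s.replace(old, new)
    return s


def replace_single_pass(s, olds, news):
    """Scan the string once; at each position try every pattern."""
    mapping = dict(zip(olds, news))
    # Assumed rule: longer patterns win, so "aaa" is tried before "a".
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(o) for o in ordered))
    # Replaced text is never rescanned, so later pairs cannot undo earlier ones.
    return pattern.sub(lambda m: mapping[m.group(0)], s)


olds = ("aaa", "a", "b")
news = ("b", "bbb", "a")
print(replace_sequential("aaab", olds, news))   # prints "aa"
print(replace_single_pass("aaab", olds, news))  # prints "ba"
```

The two results differ because the sequential loop feeds the output of one
replacement into the next, while the single pass never revisits text it has
already replaced.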


*** So the question is -- is there support for these enhancements? If so,
then it would be worth hashing out the details.

But the next question is -- does anyone care enough to manage that process
-- it'll be a lot of work!

NOTE: there has also been a fair bit of discussion in this thread about
ordinals vs characters, and unicode itself -- I don't think any of that
resulted in any possible proposals...

-CHB



On Wed, Oct 26, 2016 at 2:48 PM, Mikhail V <mikhailwas at gmail.com> wrote:

> On 26 October 2016 at 20:58, Stephen J. Turnbull
> <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> >import collections
> >def translate_or_drop(string, table):
> >    """
> >    string: a string to process
> >    table: a dict as accepted by str.translate
> >    """
> >    return string.translate(
> >        collections.defaultdict(lambda: None, **table))
>
> >All OK now?
>
> Not really. I tried with a simple example
> intab = "ae"
> outtab = "XM"
> table = string.maketrans(intab, outtab)
> collections.defaultdict(lambda: None, **table)
>
> and this gives me
> TypeError: type object argument after ** must be a mapping, not str
>
> But I probably misunderstood the idea. Anyway, this code does not make
> much sense to me; I would never in my life understand what is meant here.
> And in my not so big, but not so small, Python experience I *never* had
> occasion to use collections or lambda.
>
> >sets as a single, universal character set.  As it happens, although
> >there are differences of opinion over how to handle Unicode in Python,
> >there is consensus that Python does have to handle Unicode flexibly,
> >effectively and efficiently.
> >
>
> I was merely talking about syntax and the source file standard, not about
> unicode strings. No doubt one needs some way to store different glyph sets.
>
> So I was saying that if one defines a syntax and has good intentions
> for readability in mind, there is not much rationale to adapt the syntax
> to the current "hybrid" system: the 7-bit and/or multibyte paradigm.
> Again, this is going too far afield, but one should probably not
> look too far ahead on those. The situation is not so good in the sense that
> most standard software is attached to this strange paradigm
> (even software which does not have anything
> to do with multi-lingual typography).
> So IMO something has gone wrong with those standard characters.
>
> >If you insist on bucking it, you'll
> >have to do it pretty much alone, perhaps even maintaining your own
> >fork of Python.
>
> As for me, I would take the path of developing my own IDE which will enable
> typographic-quality rendering and of course all useful glyphs, such as
> curly quotes,
> bullets, etc., which are all fundamental to any possible improvement of
> the cognitive qualities of code. And I'll stay within 8-bit boundaries,
> that's for sure.
> So if Python takes the path of "unicode" code input (e.g. for some
> punctuation characters)
> this would only add a minor issue for generating valid Python source
> files in this case.
>
>
> Mikhail



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov