Find and Replace Simplification

Joshua Landau joshua at landau.ws
Sat Jul 20 18:33:23 EDT 2013


On 20 July 2013 22:56, Dave Angel <davea at davea.name> wrote:
> On 07/20/2013 02:37 PM, Joshua Landau wrote:
>>
>> The problem can be solved, I'd imagine, for builtin types. Just build
>> an internal representation upon calling .translate that's faster. It's
>> especially easy in the list case
>
> What "list case"?  list doesn't have a replace() method or translate()
> method.

I mean some_str.translate(some_list).

>> -- just build a C array¹ at the start
>> mapping int -> int and then have really fast C mapping speeds.
>
>
> As long as you can afford to have a list with a billion or so entries in it.
> We are talking about strings and version 3.3, aren't we?  Of course, one
> could always examine the mapping object (table) and see what the max value
> was, and only build a "C array" if it was smaller than say 50,000.

When talking about some_str.translate(some_list), this doesn't apply
very much -- they've already gotten a much bigger Python list.

In the dict case² I don't actually want to jump to the conclusion that
one should do array-based mappings because I can see the obvious
downsides and it's obviously not good to have 100 cases in there,
*but* I still think that there's a solution.

Here are some ideas:
· Latin and ASCII can obviously be done with a C array, and I imagine
that covers at least a fair portion of use-cases.
· If you only have a few characters in the mapping (so sys.getsizeof
is small) then it'll be a lot faster to just iterate through a C list
instead of checking the dict.
· Other cases are:
    · Full-character-set or equiv. mappings, which are already faster
than .replace(). Those should really be re-made into lists so that the
list optimisation can take place, and lists are much faster even in
versions without these hypothetical optimizations, too.
    · Custom objects. There's nothing we can do here.

I realise that this is a lot more code, so it's not something I'm
going to try to force. However, I think it's useful if it stops people
using .replace in a loop ;).

² some_str.translate(some_dict)



More information about the Python-list mailing list