Delete all not allowed characters

Michal Bozon bozonm at vscht.cz
Thu Oct 25 17:23:37 EDT 2007


> 
>> The list comprehension does not allow "else", but it can be used in a
>> similar form:
>> 

(I was wrong, as Tim Chase has shown.)
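For the record: since Python 2.5 a conditional expression can go in the element position of a list comprehension. Only the trailing filter `if` cannot take an `else`. A minimal sketch, assuming `allowed` is a string of permitted characters and that everything else is replaced by a space, as in the loop quoted next; the function name `replace_disallowed` is hypothetical:

```python
def replace_disallowed(s1, allowed):
    # Conditional expression in the element position: keep allowed
    # characters, map everything else to a single space.
    return "".join([ch if ch in allowed else " " for ch in s1])


print(replace_disallowed("abc-xyz", "abcxyz"))  # prints "abc xyz"
```

Writing `[ch for ch in s1 if ch in allowed else " "]` is a syntax error; the `if ... else` has to come before the `for`.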

>> s2 = ""
>> for ch in s1:
>>     s2 += ch if ch in allowed else " "
>> 
>> (maybe this could be written more nicely)
> 
> Repeatedly adding strings together in this way is about the most 
> inefficient, slow way of building up a long string. (Although I'm sure 
> somebody can come up with a worse way if they try hard enough.)
> 
> Even though recent versions of CPython have a local optimization that 
> reduces the performance hit of string concatenation somewhat, it is 
> better to use ''.join() rather than add many strings together:
> 

String appending is not dramatically slower:
for strings tens of MB long, the speed
difference I see is a few tens of percent,
so it is not several times slower or anything like that.
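Anyone who wants to check this on their own interpreter can use a rough comparison like the one below. It is only a sketch: the function names and the input size are my own choices, and the outcome depends heavily on the Python version and on whether CPython's in-place concatenation trick applies, so no particular numbers are claimed here.

```python
import timeit


def concat_version(s1, allowed):
    # Repeated += on a str. CPython can sometimes resize the string in
    # place, but that is an implementation detail, not a guarantee.
    s2 = ""
    for ch in s1:
        s2 += ch if ch in allowed else " "
    return s2


def join_version(s1, allowed):
    # Collect the pieces in a list and join them once at the end.
    parts = []
    for ch in s1:
        parts.append(ch if ch in allowed else " ")
    return "".join(parts)


if __name__ == "__main__":
    allowed = "abcdef"
    s1 = "abcdefghij" * 10000  # about 100 KB; scale up for larger inputs
    # Both versions must produce identical output before timing them.
    assert concat_version(s1, allowed) == join_version(s1, allowed)
    for func in (concat_version, join_version):
        seconds = timeit.timeit(lambda: func(s1, allowed), number=3)
        print(func.__name__, round(seconds, 3))
```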

> s2 = []
> for ch in s1:
>     s2.append(ch if (ch in allowed) else " ")
> s2 = ''.join(s2)
> 
> Although even that doesn't come close to the efficiency and speed of 
> string.translate() and string.maketrans(). Try to find a way to use them.
> 
> Here is one way, for ASCII characters.
> 
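> import string  # Python 2 string module
> 
> allowed = "abcdef"
> # Identity table: all 256 byte values, in order.
> all_chars = string.maketrans('', '')
> not_allowed = ''.join(c for c in all_chars if c not in allowed)
> # Map every disallowed byte to a space.
> table = string.maketrans(not_allowed, ' '*len(not_allowed))
> new_string = string.translate(old_string, table)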

Nice, I did not know that string translation existed. But
Abandoned has defined the allowed characters, so building
a translation table for the disallowed characters,
which would cover nearly the complete Unicode character table,
would be inefficient.
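That said, with Unicode strings the table does not have to list every disallowed character. In Python 3, `str.translate` accepts any object that supports indexing by code point, so a `dict` subclass can store only the allowed characters and use the `__missing__` hook to map everything else to a space. A minimal sketch, assuming Python 3; the class name `KeepAllowed` and its `replacement` parameter are hypothetical, and I have not benchmarked it against the join approach:

```python
class KeepAllowed(dict):
    """Translation table for str.translate that keeps only allowed characters.

    Only the allowed code points are stored as keys; any other code point
    falls through to __missing__, so no table covering the whole Unicode
    range is ever built.
    """

    def __init__(self, allowed, replacement=" "):
        # Map each allowed code point to itself.
        super().__init__((ord(c), ord(c)) for c in allowed)
        self.replacement = replacement

    def __missing__(self, codepoint):
        # Called by dict.__getitem__ for any code point not in the table.
        return self.replacement


table = KeepAllowed("abcdef")
print("caf\u00e9 bad!".translate(table))  # prints "caf  bad "
```

Passing `replacement=""` deletes the disallowed characters instead of replacing them. For a small, fixed set of allowed characters, `re.sub` with a negated character class such as `[^abcdef]` is another option.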




More information about the Python-list mailing list