[Tutor] clean text

Kent Johnson kent37 at tds.net
Tue May 19 20:31:18 CEST 2009


On Tue, May 19, 2009 at 1:19 PM, spir <denis.spir at free.fr> wrote:
> Thank you Albert, Kent, Sanders, Lie, Malcolm.
>
> This time regex wins! Thought it wouldn't because of the additional func call (too bad we cannot pass a mapping to re.sub). Actually the difference is very small ;-) The relevant change is indeed using a dict.

The substChar() function is only called when a control character is
found, so the relative time between the regex version and the next
best will depend on the character mix. Your random strings seem a bit
heavy on control chars.

My guess is that regex wins because it gets rid of the explicit
Python-coded loop.
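
To make the comparison concrete, here is a minimal sketch of the two
approaches. I don't have your exact code in front of me, so the SUBSTITUTES
mapping, the "?" fallback and the names clean_regex() and clean_loop() are
made up for illustration; only substChar() is a name from this thread, and
its body here is a guess.

```python
import re

# Hypothetical mapping from control characters to their replacements;
# the real table from the thread is not shown, so this one is invented.
SUBSTITUTES = {
    "\t": " ",
    "\n": " ",
    "\r": "",
}

# Matches a single ASCII control character (0x00-0x1f or 0x7f).
CONTROL_RE = re.compile(r"[\x00-\x1f\x7f]")


def substChar(match):
    # re.sub calls this only for matched control characters,
    # never for ordinary text.
    return SUBSTITUTES.get(match.group(), "?")


def clean_regex(text):
    # The scan over the string happens inside the regex engine.
    return CONTROL_RE.sub(substChar, text)


def clean_loop(text):
    # Same result, but every character goes through Python bytecode.
    result = ""
    for ch in text:
        if ch < " " or ch == "\x7f":
            result += SUBSTITUTES.get(ch, "?")
        else:
            result += ch
    return result
```

With few control characters in the input, clean_regex() makes few calls
to substChar(), while clean_loop() still runs its body once per character.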

> Replacing string concat with ''.join() is slower (tested with 10 times and 100 times bigger strings too). Strange...
> Membership test in a set is only very slightly faster than in dict keys.

String concatenation has been optimized for this use case (building a
string with s += piece in a loop) in recent versions of CPython, so it
can match or beat ''.join().
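
If you want to check this on your own machine, something like the following
should do. It is only a sketch: build_concat() and build_join() are
hypothetical names, the piece list is arbitrary, and the timings will vary
with the Python version and implementation.

```python
import timeit


def build_concat(pieces):
    # Repeated += on a local string; CPython can often extend it in place.
    result = ""
    for piece in pieces:
        result += piece
    return result


def build_join(pieces):
    # Collect the pieces in a list, then join once at the end.
    out = []
    for piece in pieces:
        out.append(piece)
    return "".join(out)


if __name__ == "__main__":
    pieces = ["ab", "c", "def"] * 1000  # arbitrary test data
    for func in (build_concat, build_join):
        seconds = timeit.timeit(lambda: func(pieces), number=100)
        print(func.__name__, seconds)
```

Both functions return the same string, so any difference you see is purely
in how the result is assembled.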

Kent

