string substitutions

John Machin sjmachin at lexicon.net
Mon Feb 25 17:26:22 EST 2002


gerson.kurz at t-online.de (Gerson Kurz) wrote in message news:<3c787c1f.2523859 at news.t-online.de>...
> On 23 Feb 2002 11:52:10 -0800, bobnotbob at byu.edu (Bob Roberts) wrote:
> 
> >What would be a good way to replace every one or more spaces (" ") in
> >a string with just one space?  Or replace any number of newlines with
> >just one?
> 
> Let's see. There were four solutions mentioned:
> 
> -------------(cut here)---------------
> def test1(newstring):
>     while newstring.find('  ') > -1:
>         newstring = newstring.replace('  ', ' ')
>     return newstring
> 
> def test2(newstring):
>     return " ".join(filter(None,newstring.split(' ')))

This "solution" removes leading and trailing spaces. That may or may
not be what the OP was asking for.

> 
> def test3(newstring):
>     return re.sub(' +', ' ', newstring)

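Coded extremely suboptimally: the pattern ' +' also matches every lone
space and "replaces" it with an identical space, and the pattern string
is looked up again on every call. Try this: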
import re

def test3o(newstring, subber=re.compile('  +').sub):
    # Note *two* spaces in pattern!!
    # The pattern is compiled once, when the def statement runs.
    return subber(' ', newstring)

> 
> def test4(newstring):
>     return ' '.join(newstring.split())
> -------------(cut here)---------------
> 
> Note that test4() also splits on newlines (and tabs); this is why
> test2() explicitly splits on *blanks only*. Given these, you can test
> which one is fastest:

>         func("This  is a    test\n isn't it?")

Only one piece of test data? Try these pieces and then try to explain
the timings:

[("a" + " " * n) * (100 / (n+1)) for n in range(5)]

Just for fun, try a line with, say, 10000 spaces and nothing else. Try
to explain why solution 2, which is superficially O(N), goes off the
planet [it does on my platform, YMMV], even worse than solution 1,
which is superficially O(N*log(N)), and why solutions 3, 3o & 4 stay
sane.

Then repeat all of the above with Python 2.1 or earlier, for the
benefit of those not yet upgraded.

Then there's the possibility that the results may vary by platform ...

> 
> So, the simple "while"-solution (test1) is actually the fastest if you
> really only want to replace spaces (and not newlines and tabs, also).
> I find that quite interesting because the other three solutions seem
> so much more sophisticated.

Understandable ... "sophisticated" can also mean "deceptive" or
"misleading" according to my dictionary.


