Fastest way to calculate leading whitespace

Patrick Maupin pmaupin at gmail.com
Sat May 8 13:59:03 EDT 2010


On May 8, 12:19 pm, dasacc22 <dasac... at gmail.com> wrote:
> Hi
>
> This is a simple question. I'm looking for the fastest way to
> calculate the leading whitespace (as a string, ie '    ').
>
> Here are some different methods I have tried so far
> --- solution 1
>
> a = '    some content\n'
> b = a.strip()
> c = ' '*(len(a)-len(b))
>
> --- solution 2
>
> a = '    some content\n'
> b = a.strip()
> c = a.partition(b[0])[0]
>
> --- solution 3
>
> def get_leading_whitespace(s):
>     def _get():
>         for x in s:
>             if x != ' ':
>                 break
>             yield x
>     return ''.join(_get())
>
> ---
>
> Solution 1 seems to be about as fast as solution 2 except in certain
> circumstances where the value of b has already been determined for
> other purposes. Solution 3 is slower due to the function overhead.
>
> Curious to see what other types of solutions people might have.
>
> Thanks,
> Daniel

Well, you could try a solution using re, but that's only likely to be
faster if you can run it once over multiple concatenated lines.
I usually use something like your solution #1.  One thing to be aware
of, though, is that strip() with no parameters will strip *any*
whitespace, from both ends, not just spaces, so the implicit
assumption in your code that what you have stripped is spaces may not
be justified (depending on the source data).  In your example it also
removes the trailing '\n', so c comes out one character too long.
OTOH, depending on how you use that whitespace information, it may
not really matter.  But if it does matter, you can use strip(' '),
which removes only spaces.
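
To make the difference concrete, here is a minimal sketch of the
length-difference idea from solution #1.  It swaps in lstrip() for
strip(), which is my substitution and not something from the original
code, so that only the left side of the line is touched.  The function
names are made up for illustration.

```python
def leading_spaces(line):
    """Return the run of leading ' ' characters (spaces only, no tabs)."""
    # lstrip(' ') removes spaces from the left end only, so the length
    # difference is exactly the number of leading spaces.
    return line[:len(line) - len(line.lstrip(' '))]


def leading_whitespace(line):
    """Return the run of leading whitespace of any kind (spaces, tabs, ...)."""
    # lstrip() with no argument removes every kind of leading whitespace.
    return line[:len(line) - len(line.lstrip())]
```

Slicing the original line, rather than building ' ' * n, returns the
characters that were actually there, so a leading tab stays a tab.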

If speed is really an issue for you, you could also investigate
mxTextTools, but, like re, it might only perform better if the source
consists of several lines batched together.
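
For the batched case, a rough sketch of what I mean with re is below.
It assumes the lines are already joined into one string, and the
pattern and function name are only illustrative.  Whether it actually
beats the per-line approach on your data is something you would have
to measure (e.g. with timeit).

```python
import re

# With re.MULTILINE, ^ matches at the start of each line, so a single
# findall() call collects the leading spaces of every line in the text.
# "[ ]*" matches spaces only; tabs are deliberately not included.
_LEADING_SPACES = re.compile(r'^[ ]*', re.MULTILINE)


def leading_spaces_per_line(text):
    """Return a list with the leading-space string of each line in text."""
    return _LEADING_SPACES.findall(text)
```

Usage would look like leading_spaces_per_line('a\n  b\n    c'), which
gives one entry per line.  Text that ends in a newline may yield one
extra empty entry for the empty final line, so strip that off first if
it matters.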

Regards,
Pat


