Fastest way to calculate leading whitespace

Patrick Maupin pmaupin at gmail.com
Sat May 8 18:18:00 EDT 2010


On May 8, 1:16 pm, dasacc22 <dasac... at gmail.com> wrote:
> On May 8, 12:59 pm, Patrick Maupin <pmau... at gmail.com> wrote:
>
>
>
> > On May 8, 12:19 pm, dasacc22 <dasac... at gmail.com> wrote:
>
> > > Hi
>
> > > This is a simple question. I'm looking for the fastest way to
> > > calculate the leading whitespace (as a string, i.e. '    ').
>
> > > Here are some different methods I have tried so far
> > > --- solution 1
>
> > > a = '    some content\n'
> > > b = a.strip()
> > > c = ' '*(len(a)-len(b))
>
> > > --- solution 2
>
> > > a = '    some content\n'
> > > b = a.strip()
> > > c = a.partition(b[0])[0]
>
> > > --- solution 3
>
> > > def get_leading_whitespace(s):
> > >     def _get():
> > >         for x in s:
> > >             if x != ' ':
> > >                 break
> > >             yield x
> > >     return ''.join(_get())
>
> > > ---
>
> > > Solution 1 seems to be about as fast as solution 2 except in certain
> > > circumstances where the value of b has already been determined for
> > > other purposes. Solution 3 is slower due to the function overhead.
>
> > > Curious to see what other types of solutions people might have.
>
> > > Thanks,
> > > Daniel
>
> > Well, you could try a solution using re, but that's only likely
> > to be faster if you can use it on multiple concatenated lines.
> > I usually use something like your solution #1.  One thing to be aware
> > of, though, is that strip() with no parameters will strip *any*
> > whitespace, not just spaces, so the implicit assumption in your code
> > that what you have stripped is spaces may not be justified (depending
> > on the source data).  OTOH, depending on how you use that whitespace
> > information, it may not really matter.  But if it does matter, you can
> > use strip(' ')
>
> > If speed is really an issue for you, you could also investigate
> > mxtexttools, but, like re, it might perform better if the source
> > consists of several batched lines.
>
> > Regards,
> > Pat
>
> Hi,
>
> Thanks for the info. Using .strip() to remove all whitespace in
> solution 1 is a must. If you only stripped ' ' spaces then line
> endings would get counted in the len() call and, when multiplied
> against ' ', would produce an inaccurate result. Regex is
> significantly slower for my purposes, but I've never heard of
> mxtexttools. Even if it proves slow, it's spurred my curiosity as to
> what functionality it provides (on an unrelated note).
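
One aside on the len() arithmetic in solution 1, offered as a sketch
rather than a benchmarked recommendation: a.strip() removes the trailing
newline from b but not from a, so the newline is counted in the
difference. Stripping only the left side avoids that, because the line
ending then appears in both lengths and cancels out. The function name
leading_whitespace below is hypothetical, chosen only for illustration.

```python
def leading_whitespace(line):
    """Return the leading whitespace of line as a string."""
    # lstrip() removes whitespace on the left only, so any trailing
    # newline is counted in both lengths and cancels out.
    return line[:len(line) - len(line.lstrip())]


a = '    some content\n'
print(repr(leading_whitespace(a)))   # '    '
print(len(a) - len(a.strip()))       # 5: four spaces plus the newline
print(len(a) - len(a.strip(' ')))    # 4: the newline stays in both strings
```

Slicing the original line also preserves tabs if the indentation
contains any, which multiplying ' ' by a count would not. Note that a
line consisting only of whitespace comes back whole, newline included;
use lstrip(' ') instead if only spaces should count.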

Could you reorganize your code to do multiple lines at a time?  That
might make regex competitive.
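
To make that concrete, here is a minimal sketch of what I mean, assuming
the lines can be joined into (or already arrive as) one string. The names
_LEADING and leading_whitespace_all are made up for this example, and I
have not timed it against your data, so treat it as something to
benchmark rather than a known win.

```python
import re

# With re.MULTILINE, ^ matches at the start of every line.
# [ \t]* matches spaces and tabs only, so a match never crosses a newline.
_LEADING = re.compile(r'^[ \t]*', re.MULTILINE)


def leading_whitespace_all(text):
    """Return the leading whitespace of each line in text, in order."""
    return _LEADING.findall(text)


text = '    first\n\tsecond\nthird'
print(leading_whitespace_all(text))   # ['    ', '\t', '']
```

The point is that the per-line Python overhead is paid once per batch
instead of once per line. If the text ends with a newline, an extra empty
entry may appear at the end of the result, so check for that before
pairing the results back up with the lines.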

Regards,
Pat


