Equivalent of Perl chomp?

Steve Holden sholden at holdenweb.com
Thu Jan 31 18:24:16 EST 2002


"Mark McEahern" <marklists at mceahern.com> wrote in message
news:mailman.1012513005.17840.python-list at python.org...
> Tim Peters wrote:
> > Well, you're in for some fun of a kind I didn't have in mind <wink>.  Do a
> > sanity check here:  what's the clock rate on your box?  How many cycles go
> > by in 0.07 seconds (let alone 0.36!)?  Can you think of *any* non-insane
> > way to implement these operations that would take so bloody long?
> >
> > [...] you might want to
> > contemplate the sensibility of using a 100MB string.
>
> I admit to being perfectly clueless about these subtleties, which I'm sure
> my code only demonstrated all too abundantly.  ;-)
>
Indeed, but it also demonstrated you are the kind of person who can generate
100 MB strings without having to sit there and listen to the hardware
wailing and gnashing its teeth, so I'm jealous already.

Besides which, you show a laudable propensity to satisfy your curiosity by
experiment, which makes a pleasant change from people asking questions they
should be answering for themselves with web searches :-) Well done!

> Anyway, switching the order and trying the test again only leads to
> further confusion:
>
>   <function sliceIt at 0x100f9510> : 0.00 seconds.
>   <function endsWith at 0x100f94d0> : 0.00 seconds.
>
> And, when I switch the order back, I get the same thing:
>
>   <function endsWith at 0x100f94d0> : 0.00 seconds.
>   <function sliceIt at 0x100f9510> : 0.00 seconds.
>
So they both take exactly the same length of time, then? Perhaps not ...
maybe you need a few more significant figures?

> Is it:
>
>   1) That using time.time() for timing is brain dead?

No, using time.time() is fairly well-accepted, with the proviso that its
behavior in the large is much more reliable than its behavior in the small.
On Windows, for example, the clock tends to tick around eighteen times a
second, so a naively timed event can appear to take either zero or about 55
milliseconds!

So, it's usually a good idea to repeat your test many times, and average out
the timings.
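[As a present-day aside: the standard library's timeit module, added after this post was written, does exactly this repeat-and-average dance for you, and its repeat()/min() idiom helps smooth over coarse clock ticks. A minimal sketch in modern Python:]

```python
# Sketch of the repeat-and-average idea using the timeit module.
# The string contents and counts here are arbitrary choices.
import timeit

s = "s" * 1000 + "\n"

n = 100000
# Run the expression n times per trial, take the best of 3 trials,
# and divide by n to get a per-call average.  Taking the minimum
# filters out timing noise from other processes.
per_call = min(timeit.repeat("s.endswith('\\n')",
                             globals={"s": s},
                             number=n, repeat=3)) / n
print("endswith: %.9f seconds per call" % per_call)
```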

>   2) I'm using an unrealistically large string?
>
Infeasibly large for me and my little laptop, I'm afraid, but the length of the
string isn't really the point either. Your next exercise will be to see if
it makes any difference...
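[A hint for that exercise, as a quick sketch of my own rather than anything from the thread: both spellings look only at the final character, so they agree at every length, and neither should get slower as the string grows.]

```python
# Both checks examine only the last character of the string.
def ends_with_newline(s):
    return s.endswith('\n')

def slice_check(s):
    return s[-1:] == '\n'

results = []
for n in (1, 1000, 1000000):
    s = "s" * n + "\n"
    results.append((ends_with_newline(s), slice_check(s)))

# The two tests agree at every length.  Note that s[-1:] is also
# safe on the empty string, where s[-1] would raise IndexError.
print(results)  # [(True, True), (True, True), (True, True)]
```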

> Tim, is your point that the performance difference in normal cases is
> negligible?  If so, I would guess the further point would be to choose
> endswith() because it's more legible.
>
I think Tim is pointing out that the things you are measuring had better
happen in rather less than ten milliseconds if Python users aren't to take
up cudgels and beat the development team to death with the floating-point
unit from a Cray 1. From your latest timings the difference certainly
appears negligible, but things are often not what they seem.

Another thing to take into account is that testing invariably has to spend
some time in "framework" code, whose duration should really be subtracted
out if comparisons are to be meaningful.

> I generally prefer the more legible approach and don't worry too much
> about what's faster, but I thought I'd try to see whether these two
> different approaches to endswith had a significant performance difference.
>
Here's a rewrite of your program (I added the comments after I got it
working).

#! /usr/bin/env python

import time

def getString(n):
    return "s" * n + '\n'

def endsWith(s):
    return s.endswith('\n')

def sliceIt(s):
    return s[-1:] == '\n'

# A "do nothing" for comparison
def nullFunc(s):
    return s

# Now we do a million tests and average
# them for a more sensible result
def timeIt(func):
    n = 1000000        # length of the test string
    count = 1000000    # number of calls to average over
    s = getString(n)
    timeIn = time.time()
    for i in xrange(count):
        func(s)
    timeOut = time.time()
    # Divide by count, not the loop variable, which ends at count - 1
    return (timeOut - timeIn) / float(count)

# How long does it take to do nothing?
nullTime = timeIt(nullFunc)

# The reported times have the null time removed
print "Endswith %1.8f" % (timeIt(endsWith) - nullTime)
print "sliceIt  %1.8f" % (timeIt(sliceIt) - nullTime)

My results follow. You may find you need to add a couple more decimal places
to get sensible results, my Thinkpad is a bit of a cronker ...

D:\Steve\Projects\Python>python time1.py
Endswith 0.00000511
sliceIt  0.00000434

D:\Steve\Projects\Python>python time1.py
Endswith 0.00000511
sliceIt  0.00000186

D:\Steve\Projects\Python>python time1.py
Endswith 0.00000505
sliceIt  0.00000209

As you can see, there is still some variability, but slicing looks like a
clear winner. Increasing the iteration count might produce convergence in
the timings. You might also ponder why sliceIt appears more variable than
endsWith (tell me if you think of a good reason, I have no clue - this isn't
a trick question).
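[Another present-day aside: the whole comparison above can be redone with the timeit module, which handles the repetition, uses the most precise clock available, and keeps the loop overhead out of your own code. The counts below are arbitrary; the conclusion may differ on your machine.]

```python
# Re-running the thread's comparison with timeit.
import timeit

setup = "s = 's' * 1000000 + '\\n'"
n = 100000

# timeit returns the total time for n calls; divide for per-call cost.
t_endswith = timeit.timeit("s.endswith('\\n')", setup=setup, number=n) / n
t_slice    = timeit.timeit("s[-1:] == '\\n'",  setup=setup, number=n) / n

print("endswith: %.9f s/call" % t_endswith)
print("slice:    %.9f s/call" % t_slice)
```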

Tim actually dealt with you very kindly. Perhaps we were lucky he's busy.
From what I've been able to gather, in a former incarnation he was
responsible for all kinds of hairy floating point stuff, and would go over
his benchmark programs with a sharp chisel to shave off the last few
microseconds. So he knows more about performance measurement than both of us
put together.

and-averaged-out-over-a-million-lifetimes-ly y'rs  - steve
--
Consulting, training, speaking: http://www.holdenweb.com/
Python Web Programming: http://pydish.holdenweb.com/pwp/
