[Python-Dev] Re: [Python-checkins] Using string methods in stdlib

Guido van Rossum guido@python.org
Tue, 09 Jul 2002 16:04:13 -0400


> (moved to python-dev and changed title)

The move didn't succeed.  But I'm moving this response there.

[Ping]
> > > >
> > > > Update of /cvsroot/python/python/dist/src/Lib
> > > >
> > > > Modified Files:
> > > >         cgitb.py
> > > >
> > > > +             if name[:1] == '_': continue

[Neal]
> > > Any reason not to use:
> > >
> > > if name.startswith('_'): continue
> > >
> > > ?

[Fredrik]
> > tried benchmarking?

[Neal again]
> I wasn't asking because of speed.  I don't know
> which version is faster and I couldn't care less.
> I think using the method is clearer.

startswith() was added because it was observed that there were
(relatively) frequent bugs involving tests for s[:I] == C where len(C)
!= I, either due to miscounting or due to an edit of C without a
matching edit of I.  startswith() avoids all that.
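A minimal sketch of the miscounting bug described above. The helper names and the `'__init'` prefix are hypothetical, chosen only for illustration. The slice test carries a hand-typed length that must match the literal, while startswith() takes only the prefix.

```python
def has_prefix_slice(s):
    # Hypothetical miscount: '__init' is 6 characters, not 5,
    # so this slice can never equal the literal.
    return s[:5] == '__init'

def has_prefix_method(s):
    # startswith() derives the length from the prefix itself,
    # so there is no second number to keep in sync.
    return s.startswith('__init')

name = '__init__'
print(has_prefix_slice(name))   # False (the bug)
print(has_prefix_method(name))  # True
```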

> > and figuring out that "_" is exactly one character long isn't
> > that hard, really.
> 
> I agree that for a single character either way is clear.

Agreed too.  The startswith() use case is for strings long enough that
you don't "see" the length immediately.  Probably that means anything
longer than 4.  But in order to create good habits I think it's fine
to use it in all cases.

> > (can we please cut this python newspeak enforcement crap
> > now, btw.  even if slicing hadn't been much faster, there's
> > nothing wrong with using an idiom that has worked perfectly
> > fine for the last decade...)
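For readers who want to answer the "tried benchmarking?" question themselves, here is a rough modern-Python sketch using the standard `timeit` module (which postdates this thread). It makes no claim about which spelling wins; the numbers depend heavily on the interpreter version and machine. The test string `'_private'` is an arbitrary choice.

```python
import timeit

# Same test, two spellings. Only relative numbers on one machine mean anything.
setup = "name = '_private'"
slice_stmt = "name[:1] == '_'"
method_stmt = "name.startswith('_')"

def best_of(stmt, repeat=5, number=100_000):
    # timeit.repeat returns one total time per run; the minimum is
    # the least noisy estimate of the statement's cost.
    return min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))

slice_time = best_of(slice_stmt)
method_time = best_of(method_stmt)
print(f"slice:      {slice_time:.4f} s")
print(f"startswith: {method_time:.4f} s")
```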

Maybe Neal is showing a bit too much of youthful enthusiasm for the
new way.  But I don't see it as enforcement crap.  When I see Python
code I wrote 10 years ago that works fine, I usually still think, "um,
I wouldn't have written it that way now."  If I think that about code
that I feel is important as an example for later generations I like to
fix it.

We're trying to stay out of the modules that need to remain 1.5.2
compatible.

> I thought the stdlib used startswith/endswith.  But I did 
> a simple grep just to find where startswith could be used and 
> was surprised to find about 150 cases.  Many are 1 char,
> but there are many others of 5+ chars which make it harder
> to determine immediately if the code is correct.
> 
> I also see several cases of code like this in mimify:
> 
> 	line[:len(prefix)] == prefix
> 
> and other places (rlcompleter) where the length is calculated
> elsewhere, making it even harder to verify correctness.
> 
> Part of the reason to prefer the methods is for defensive programming.
> There is duplicate information by using slicing (str & length) and 
> it's possible to change half the information and not the other, 
> leading to bugs.  That's not possible with the methods.
> 
> I don't think the stdlib should use every new feature.  But I
> do think it should reflect the best programming practices and
> should be programmed defensively in order to try to avoid future bugs.
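A sketch of the two situations Neal describes, assuming illustrative inputs that are not taken from mimify or rlcompleter. The `line[:len(prefix)] == prefix` idiom is equivalent to `line.startswith(prefix)` but spells the prefix twice. When the length is stored separately (the hypothetical `QUOTE_PREFIX` and `QUOTE_LEN` below), editing the prefix alone silently breaks the test.

```python
def matches_len_slice(line, prefix):
    # Idiom quoted from mimify: the length is derived from the prefix,
    # so it cannot drift, but the prefix appears twice.
    return line[:len(prefix)] == prefix

def matches_method(line, prefix):
    return line.startswith(prefix)

# Illustrative inputs: the two tests agree, including for an empty
# prefix and a prefix longer than the line.
CASES = [("Content-Type: text/plain", "Content-"), ("abc", ""), ("ab", "abc"), ("", "x")]
for line, prefix in CASES:
    assert matches_len_slice(line, prefix) == matches_method(line, prefix)

# Hypothetical case of a length kept elsewhere: the prefix was later
# edited from "From " to ">From " without updating the length.
QUOTE_PREFIX = ">From "
QUOTE_LEN = 5  # stale: len(">From ") is 6

def is_quoted_from_slice(line):
    return line[:QUOTE_LEN] == QUOTE_PREFIX

def is_quoted_from_method(line):
    return line.startswith(QUOTE_PREFIX)

print(is_quoted_from_slice(">From alice"))   # False (half-edited)
print(is_quoted_from_method(">From alice"))  # True
```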

I agree for new code, but I think we should be conservative in
migrating existing code to use new idioms.  It's better only to do
that as part of a general overhaul of a module.  As I've remarked
before, I'm no big fan of "peephole" changes, where lots of modules
are changed to implement one particular style change (e.g. string
methods).  Historically, such peephole changes have always introduced
bugs because it's 99% boring work, and then you start making mistakes.
Also, it leads to anachronisms where ancient code suddenly makes use
of a modern feature but otherwise still looks ancient.

--Guido van Rossum (home page: http://www.python.org/~guido/)