how to avoid leading white spaces

rurpy at yahoo.com
Fri Jun 3 15:29:52 EDT 2011


On 06/03/2011 08:25 AM, Steven D'Aprano wrote:
> On Fri, 03 Jun 2011 05:51:18 -0700, rurpy at yahoo.com wrote:
>
>> On 06/02/2011 07:21 AM, Neil Cerutti wrote:
>
>>> > Python's str methods, when they're sufficient, are usually more
>>> > efficient.
>>
>> Unfortunately, except for the very simplest cases, they are often not
>> sufficient.
>
> Maybe so, but the very simplest cases occur very frequently.

Right, and I stated that.

>> I often find myself changing, for example, a startswith() to
>> a RE when I realize that the input can contain mixed case
>
> Why wouldn't you just normalise the case?

Because some of the text may be case-sensitive.
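As a minimal sketch of that situation (the "subject:" keyword and the sample line are made up for illustration, not taken from any real program), both versions below match a prefix without regard to case while leaving the rest of the line untouched:

```python
import re

# Hypothetical keyword; only this prefix is compared case-insensitively.
KEYWORD = "subject:"
KEYWORD_RE = re.compile(re.escape(KEYWORD), re.IGNORECASE)

def has_prefix_str(line):
    # String-method version: lower-case only the slice being compared,
    # so the case-sensitive remainder of the line is left alone.
    return line[:len(KEYWORD)].lower() == KEYWORD

def has_prefix_re(line):
    # Regex version: match() only matches at the start of the string.
    return KEYWORD_RE.match(line) is not None

line = "SUBJECT: Re: CamelCase matters"
print(has_prefix_str(line), has_prefix_re(line))
```

Either works here; the regex version is the one that extends with less rewriting if the prefix later needs alternatives or optional whitespace.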

>[...]
>> or that I have
>> to treat commas as well as spaces as delimiters.
>
> source.replace(",", " ").split(" ")

Ugh. Create a whole new string just so you can split it on
one rather than two characters?  Sorry, but I find

    re.split('[ ,]', source)

states much more clearly exactly what is being done with no
obfuscation.  Obviously this is a simple enough case that the
difference is minor but when the pattern gets only a little
more complex, the clarity difference becomes greater.
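For concreteness, a small sketch of the two approaches side by side. The sample string is invented for illustration. The last line assumes one would usually want runs of delimiters collapsed, which is a requirement not stated in the thread:

```python
import re

source = "a,b c, d"  # made-up sample input

# String-method version: turn commas into spaces, then split on a space.
via_str = source.replace(",", " ").split(" ")

# Regex version: split on either a space or a comma.
via_re = re.split("[ ,]", source)

# Both give the same list, including an empty string for ", ".
print(via_str)  # ['a', 'b', 'c', '', 'd']
print(via_re)   # ['a', 'b', 'c', '', 'd']

# Assumed extra requirement: treat a run of delimiters as one.
collapsed = re.split("[ ,]+", source)
print(collapsed)  # ['a', 'b', 'c', 'd']
```

With the regex, collapsing runs is a one-character change to the pattern. The string-method version needs an extra filtering step to get the same result.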

>[...]
> re.split is about four times slower than the simple solution.

If this processing is a bottleneck, by all means use a more
complex hard-coded replacement for a regex.  In most cases
that won't be necessary.
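Whether the difference matters for a given program is easy to measure. A rough sketch using timeit follows; the input string and repeat count are arbitrary choices, and the ratio will vary with the input, the Python version, and the machine:

```python
import re
import timeit

source = "alpha,beta gamma,delta " * 100  # made-up input
splitter = re.compile("[ ,]")             # compile once, outside the timed code

def split_str():
    return source.replace(",", " ").split(" ")

def split_re():
    return splitter.split(source)

# Time 1000 calls of each version.
t_str = timeit.timeit(split_str, number=1000)
t_re = timeit.timeit(split_re, number=1000)
print("str methods: %.4fs  re.split: %.4fs" % (t_str, t_re))
```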

>> After doing this a
>> number of times, one starts to use an RE right from the get go unless
>> one is VERY sure that there will be no requirements creep.
>
> YAGNI.

IAHNI. (I actually have needed it.)

> There's no need to use a regex just because you think that you *might*,
> someday, possibly need a regex. That's just silly. If and when
> requirements change, then use a regex. Until then, write the simplest
> code that will solve the problem you have to solve now, not the problem
> you think you might have to solve later.

I would not recommend you use a regex instead of a string method
solely because you might need a regex later.  But when you have
to spend 10 minutes writing a half-dozen lines of python versus
1 minute writing a regex, your evaluation of the possibility of
requirements changing should factor into your decision.

> [...]
>> In short, although your observations are true to some extent, they
>> are not sufficient to justify the anti-RE attitude often seen here.
>
> I don't think that there's really an *anti* RE attitude here. It's more a
> skeptical, cautious attitude to them, as a reaction to the Perl "when all
> you have is a hammer, everything looks like a nail" love affair with
> regexes.

Yes, as I said, the regex attitude here seems in large part to
be a reaction to their frequent use in Perl.  It seems anti- to
me in that I often see cautions about their use but seldom see
anyone pointing out that they are often a better solution than
a mass of twisty little string methods and associated plumbing.

> There are a few problems with regexes:
>
> - they are another language to learn, a very cryptic and terse language;

Chinese is cryptic too but there are a few billion people who
don't seem to be bothered by that.

> - hence code using many regexes tends to be obfuscated and brittle;

No.  With regexes the code is likely to be less brittle than
a dozen or more lines of mixed string functions, indexes, and
conditionals.

> - they're over-kill for many simple tasks;
> - and underpowered for complex jobs, and even some simple ones;

Right, like all tools (including Python itself) they are suited
best for a specific range of problems.  That range is quite wide.

> - debugging regexes is a nightmare;

Very complex ones, perhaps.  "Nightmare" seems an overstatement.

> - they're relatively slow;

So is Python.  In both cases, if it is a bottleneck then
choosing another tool is appropriate.

> - and thanks in part to Perl's over-reliance on them, there's a tendency
> among many coders (especially those coming from Perl) to abuse and/or
> misuse regexes; people react to that misuse by treating any use of
> regexes with suspicion.

So you claim.  I have seen more postings here where
REs were not used when they would have simplified the code
than I have seen regexes used when a string method or two
would have done the same thing.

> But they have their role to play as a tool in the programmer's toolbox.

We agree.

> Regarding their syntax, I'd like to point out that even Larry Wall is
> dissatisfied with regex culture in the Perl community:
>
> http://www.perl.com/pub/2002/06/04/apo5.html

You did see the very first sentence in this, right?

  "Editor's Note: this Apocalypse is out of date and remains here
  for historic reasons. See Synopsis 05 for the latest information."

(Note that "Apocalypse" is referring to a series of Perl design
documents and has nothing to do with regexes in particular.)

Synopsis 05 is (AFAICT with a quick scan) a proposal for revising
regex syntax.  I didn't see anything about de-emphasizing them in
Perl.  (But I have no idea what is going on for Perl 6 so I could
be wrong about that.)

As for the original reference, Wall points out a number of
problems with regexes, mostly details of their syntax.  For
example, the more frequently used non-capturing groups require
more characters than the less frequently used capturing groups.
Most of these criticisms seem irrelevant to the question of
whether hard-wired string manipulation code or regexes should
be preferred in a Python program.
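For readers who have not met the syntax in question, Python's re module uses the same notation for groups, so the point about group length can be sketched there (the pattern and input are made up for illustration):

```python
import re

text = "ababc"  # made-up input

# Capturing group: "(ab)".  findall() returns what the group captured,
# which is the last repetition of the group.
capturing = re.findall(r"(ab)+c", text)

# Non-capturing group: "(?:ab)" needs two extra characters.  With no
# capturing group in the pattern, findall() returns the whole match.
non_capturing = re.findall(r"(?:ab)+c", text)

print(capturing)      # ['ab']
print(non_capturing)  # ['ababc']
```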

And for the few criticisms that are relevant, nobody ever said
regexes were perfect.  The problems are well known, especially on
this list where we've all been told about them a million times.

The fact that REs are not perfect does not make them not useful.
We also know about Python's problems (slow, the GIL, excessively
terse and poorly organized documentation, etc) but that hardly
makes Python not useful.

Finally, he is talking about *revising* regex syntax (in part by
replacing some magic character sequences with other "better" ones)
beyond the core CS textbook forms.  He was *not*, AFAICT, advocating
using hard-wired string manipulation code in place of regexes.
So it is hardly a condemnation of the concept of regexes; rather,
just the opposite.

Perhaps you stopped reading after seeing his "regular expression
culture is a mess" comment without trying to see what he meant
by "culture" or "mess"?


