Whittle it on down

Stephen Hansen me+python at ixokai.io
Thu May 5 03:34:29 EDT 2016


On Thu, May 5, 2016, at 12:04 AM, Steven D'Aprano wrote:
> On Thursday 05 May 2016 16:46, Stephen Hansen wrote:
> > > On Wed, May 4, 2016, at 11:04 PM, Steven D'Aprano wrote:
> >> Start by writing a function or a regex that will distinguish strings that
> >> match your conditions from those that don't. A regex might be faster, but
> >> here's a function version.
> >> ... snip ...
> > 
> > Yikes. I'm all for the idea that one shouldn't go to regex when Python's
> > powerful string type can answer the problem more clearly, but this seems
> > to go out of its way to do otherwise.
> > 
> > I don't even care about faster: Its overly complicated. Sometimes a
> > regular expression really is the clearest way to solve a problem.
> 
> You're probably right, but I find it easier to reason about matching in 
> Python rather than the overly terse, cryptic regular expression mini-
> language.
> 
> I haven't tested my function version, but I'm 95% sure that it is
> correct. 
> It trickiest part of it is the logic about splitting around ampersands.
> And 
> I'll cheerfully admit that it isn't easy to extend to (say) "ampersand,
> or 
> at signs". But your regex solution:
> 
> r"^[A-Z\s&]+$"
> 
> is much smaller and more compact, but *wrong*. For instance, your regex 
> wrongly accepts both "&&&&&" and "      " as valid strings, and wrongly 
> rejects "ΔΣΘΛ". Your Greek customers will be sad...

Meh. You have a pedantic definition of wrong. Given the inputs, it
produced right output. Very often that's enough. Perfect is the enemy of
good, it's said. 

There's no situation where "&&&&&" and "     " will exist in the given
dataset, and recognizing that is important. You don't have to account
for every bit of nonsense. 

If the OP needs a unicode-aware solution that redefines "A-Z" as perhaps
"\w" with an isupper call. Its still far simpler then you're suggesting.

-- 
Stephen Hansen
  m e @ i x o k a i . i o



More information about the Python-list mailing list