Regular expressions

rurpy at yahoo.com
Wed Nov 4 19:13:58 EST 2015


On 11/04/2015 07:52 AM, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>> I'm afraid you are making a category error but perhaps that's in
>> part because I wasn't clear.  I was not talking about computer
>> science.  I was talking about human beings learning about computers.
>> Most people I know consider programming to be a higher level activity
>> than "using" a computer: editing, sending email etc.  Many computer
>> users (not programmers) learn to use regular expressions as part
>> of using a computer without knowing anything about programming.
>> It was on that basis I called them more fundamental -- something
>> learned earlier which is expanded on and added to later.  But you
>> have a bit of a point, perhaps "fundamental" was not the best choice
>> of word to communicate that.
>
> The "fundamentals" of something are its most basic functions, not its
> most basic uses. The most common use of a computer might be to browse
> the web, but the fundamental functionality is arithmetic and logic.

If one accepted that, then one would have to reject the term "fundamental
use" as meaningless.  A quick trip to Google shows the term is in common
use.

> Setting aside the choice of word, though, I still don't think regular
> expressions are a more basic use of computing than loops and
> conditionals. A regex can't be used for anything other than string
> matching; they exist for one purpose, and one purpose only: to answer
> the question "Does this string match this pattern?". 

But string matching *is* a fundamental problem that arises frequently
in many aspects of CS, programming and, as I mentioned, day-to-day
computer use.  Saying it's "only" for pattern matching is like saying
floating point numbers are "only" for doing non-integer arithmetic,
or Unicode is "only" for representing text.  (Neither of those is a
good analogy because both lack the important theoretical underpinnings 
that regular expressions have [*]).
There would be far fewer computer languages, and they would be much
more primitive if regular expressions (and the fundamental concepts
that they express) did not exist.

To be sure, I did gloss over Michael Torrie's point that there are
other concepts that are more basic in the context of learning
programming; he was correct about that.

But that does not negate the fact that regexes are important and 
fundamental.  They are both very useful in a practical sense (they 
are even available in Microsoft Excel) and important in a theoretical 
sense.  You are not well rounded as a programmer if you decline to 
learn about regular expressions because "they are too cryptic", or 
"I can do in code anything they do".  

I think the constant negative reception the posters receive here when
they ask about regexes does them a great disservice.

By all means point out that Python offers a number of functions that
can avoid the need for using regexes in simple cases.  Even point out
that you (the plural you) don't like them and prefer other solutions
(like writing code that does the same thing in a more half-assed,
bug-ridden way; the posts in this thread are a case in point).
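
As a rough sketch of that "simple cases" point (the function names and
sample strings here are invented for illustration, not taken from this
thread): a fixed-prefix test needs nothing more than a string method,
while pulling dates out of free text is the kind of job where a regex
starts to pay for itself.

```python
import re

def has_prefix_plain(line, prefix):
    # Simple case: a plain string method is enough.
    return line.startswith(prefix)

def has_prefix_regex(line, prefix):
    # Same test with a regex; re.escape keeps characters such as "."
    # or ":" in the prefix from being read as pattern syntax.
    return re.match(re.escape(prefix), line) is not None

# Less simple case: find YYYY-MM-DD style dates anywhere in the text.
# This checks only the shape of the date, not whether it is valid.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def find_dates(text):
    return DATE.findall(text)
```

Doing the second job by hand with split() and isdigit() is possible, but
it is easy to get the edge cases wrong.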

But I really wish every mention of regexes here wasn't reflexively 
greeted with a barrage of negative comments and that lame "two problems"
quote, especially without an answer to the poster's regex question.

> Sure, you can
> abuse that into a primality check and other forms of crazy arithmetic,
> but it's not what they truly do. I also would not teach regexes to
> people as part of an "introduction to computing" course, any more than
> I would teach the use of Microsoft Excel, which some such courses have
> been known to do. (And no, it's not because of the Microsoftness. I
> wouldn't teach LibreOffice Calc either.) You don't need to know how to
> work a spreadsheet as part of the basics of computer usage, and you
> definitely don't need an advanced form of text search.

Seems to me that clearly depends on the intent of the class, the students'
goals, what they'll be studying after the class, what their current
level of knowledge is, etc.  Your scenario seems way too under-specified
to say anything definitive.  And further, the pedagogy of CS (or of any 
subject of education) is not "settled science" and that kind of question
almost never has a clear right/wrong answer.

This list is not a class.  If someone comes here with a question about
Python's regexes, they deserve an answer, not to be bombarded with reasons
why they shouldn't be using regexes, beyond a mention of some of the
alternatives in an "oh, by the way" way.  (And yes, I recognize in this
case the OP did get a good answer from MRAB early on.)

----
[*] Yes, I know there is a lot of CS theory underlying floating point.
I don't think it is as deep or as important as that underlying regexes,
automata and language.


