[Tutor] regexp

Terry Carroll carroll at tjc.com
Sun Nov 6 20:21:26 CET 2011


On Sat, 5 Nov 2011, Dinara Vakhitova wrote:

> I need to find the words in a corpus, which letters are in the alphabetical
> order ("almost", "my" etc.)
> I started with matching two consecutive letters in a word, which are in
> the alphabetical order, and tried to use this expression: ([a-z])[\1-z], but
> it won't work, it's matching any sequence of two letters. I can't figure out
> why... Evidently I can't refer to a group like this, can I? But how in this
> case can I achieve what I need?

First, I agree with the others that this is a lousy task for regular 
expressions.  It's not the tool I would use.  But, I do think it's doable, 
provided the requirement is not to check with a single regular expression. 
For simplicity's sake, I'll construe the problem as determining whether a 
given string consists entirely of lower-case alphabetic characters, 
arranged in alphabetical order.

What I would do is set a variable to the lowest permissible character,
i.e., "a", and another to the highest permissible character, i.e., "z"
(actually, you could just use a constant for the highest, but I like the
symmetry).

Then construct a regex to see if a character is within the 
lowest-permissible to highest-permissible range.
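
A minimal sketch of that step in Python, assuming both bounds are single
lower-case letters (so nothing in the character class needs escaping); the
helper name range_pattern is made up for illustration:

```python
import re

def range_pattern(lowest, highest):
    # Build a one-character class such as [a-z] or [m-z].
    # Assumes both arguments are single lower-case letters.
    return re.compile("[%s-%s]" % (lowest, highest))

lowest = "a"
highest = "z"
pattern = range_pattern(lowest, highest)

print(bool(pattern.fullmatch("m")))   # True: "m" is within a..z
print(bool(pattern.fullmatch("M")))   # False: upper case is outside a..z
```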

Now, iterate through the string, processing one character at a time.  On 
each iteration:

  - test whether your character matches the regexp; if not, your answer is
    "false"; on pass one, this means it's not lower-case alphabetic; on
    subsequent passes, it means either that, or that it's not in sorted
    order.
  - if it passes, update your lowest permissible character with the
    character you just processed.
  - regenerate your regexp using the updated lowest permissible character.
  - iterate; if you run out of characters without a failure, your answer
    is "true".
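
The steps above could be sketched roughly like this in Python.  The
function name is_sorted_lowercase is made up for illustration, and two
assumptions are baked in: repeated letters (the "ll" in "all") count as
being in order, because the range includes the lowest permissible
character itself, and an empty string comes out "true" because nothing in
it fails the test.

```python
import re

def is_sorted_lowercase(word):
    """Return True if word is all lower-case letters in alphabetical order."""
    lowest = "a"    # lowest permissible character so far
    highest = "z"   # highest permissible character (never changes)
    pattern = re.compile("[%s-%s]" % (lowest, highest))
    for ch in word:
        # Fails if ch is not a lower-case letter, or sorts before `lowest`.
        if not pattern.fullmatch(ch):
            return False
        # ch passed, so it becomes the new lower bound; it is known to be
        # a lower-case letter here, so it is safe to put in the class.
        lowest = ch
        pattern = re.compile("[%s-%s]" % (lowest, highest))
    return True

for w in ["almost", "my", "regexp", "Almost"]:
    print(w, is_sorted_lowercase(w))
```

With these inputs, "almost" and "my" pass, "regexp" fails at the "e"
(it sorts before "r"), and "Almost" fails on the first pass because "A"
is not lower-case.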

I assumed lower-case alphabetic for simplicity, but you could adapt this
basic approach to handle mixed case (e.g., by first making an
all-lower-case copy of the string) or other complications.

I don't think there's a problem with asking for help with homework on this 
list; but you should identify it as homework, so the responders know not 
to just give you a solution to your homework, but instead provide you with 
hints to help you solve it.

