Simple regex with whitespaces

Paul McGuire ptmcg at austin.rr._bogus_.com
Mon Sep 11 10:36:35 EDT 2006


<mathieu.malaterre at gmail.com> wrote in message 
news:1157945804.805624.32620 at p79g2000cwp.googlegroups.com...
> Hello,
>
>  I cannot figure out a way to find a regular expression that would
> match one and only one of these two strings:
>
> s1 = '                              how are you'
> s2 = '    hello world               how are you'
>
>  All I could come up with was:
> patt = re.compile('^[ ]*([A-Za-z]+)[  ]+([A-Za-z]+)$')
>
>  Which of course does not work. I cannot express the fact: sentence
> have 0 or 1 whitespace, separation of group have two or more
> whitespaces.
>
> Any suggestion ? Thanks a bunch !
> Mathieu
>
A pyparsing approach is not as terse as a regexp, but it's not terribly long 
either.  Taking John Machin's submission as a model:

s1 = '                              how are you'
s2 = '    hello world               how are you'

from pyparsing import Word, White, delimitedList, printables

wd = Word(printables)
# this is necessary to suppress pyparsing's built-in whitespace skipping
wd.leaveWhitespace()
sentence = delimitedList(wd, delim=White(' ',exact=1))

for test in (s1,s2):
    print(sentence.searchString(test))


Pyparsing returns data as ParseResults objects, which can be accessed as 
lists or dicts.  From this first cut, we get:
[['how', 'are', 'you']]
[['hello', 'world'], ['how', 'are', 'you']]

These aren't really sentences any more, but we can have pyparsing put them 
back into sentences, by adding a parse action to sentence.

sentence.setParseAction(lambda toks: " ".join(toks))

Now our results are:
[['how are you']]
[['hello world'], ['how are you']]
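
For comparison, here is a minimal stdlib-only sketch of the same grouping. 
It is not John's pattern, just an illustration that assumes groups are 
separated by two or more spaces and that words inside a group are separated 
by exactly one.  The helper name split_groups is made up for this example.

```python
import re

s1 = '                              how are you'
s2 = '    hello world               how are you'

def split_groups(line):
    # hypothetical helper: strip the leading/trailing padding, then treat
    # any run of 2 or more spaces as a separator between groups
    return [g for g in re.split(r' {2,}', line.strip()) if g]

for test in (s1, s2):
    print(split_groups(test))
```

A single space never matches ' {2,}', so 'hello world' stays together as 
one group.  Unlike the pyparsing version, this gives you plain strings with 
no hook for per-sentence parse actions.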


If you really want to get fancy, and clean up some of that capitalization 
and lack of punctuation, you can add a more elaborate parse action instead:
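# true if the character s is one of these punctuation marks
ispunc = lambda s: s in ".!?;:,"
# leading words that mark the sentence as a question
sixLoyalServingMen = ('What','Why','When','How','Where','Who')
def cleanup(t):
    # capitalize the first word of the sentence
    t[0] = t[0].title()
    # only add punctuation if the last word doesn't already end with some
    if not ispunc( t[-1][-1] ):
        if t[0] in sixLoyalServingMen:
            punc = "?"
        else:
            punc = "."
    else:
        punc = ""
    return " ".join(t) + punc
# this replaces the earlier join-only parse action
sentence.setParseAction(cleanup)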

This time we get:
[['How are you?']]
[['Hello world.'], ['How are you?']]


The pyparsing home page is at pyparsing.wikispaces.com.

-- Paul