trailing space in RE
Harvey Thomas
hst at empolis.co.uk
Fri Aug 2 12:06:14 EDT 2002
Doru-Catalin Togea [mailto:doru-cat at ifi.uio.no] wrote:
>
> Hi all!
>
> I have written a little script to parse some Bible text, and to this
> purpose I defined the following re:
>
> bibleRef = r'(\w+) (\d+):(\d+) (.+)'
>
> I use it to match Bible references of the kind 'gen 1:1' or '1Co 10:12',
> together with the text of the verse.
>
> My re looks for the following:
> - (\w+) # first 3 characters, either 3 letters, or 1 digit and 2 letters: 'gen', '1co'
> - a space
> - (\d+) # a number
> - : # a colon
> - (\d+) # another number
> - (.+) # the rest of the text
>
> Everything works fine, but I have a problem in that "the rest of the
> text" always has a trailing space, like this:
>
> "Gen 1:1 In the beginning God created the heavens and the earth. "
> "1Co 10:12 Therefore let him who thinks he stands take heed lest he fall. "
>
> So my question is: how do I match "the rest of the text" but not the
> last character (which is a space)?
>
> I guess I could strip or slice my text before matching it, but I would
> like to know how to write the re as described above.
>
> I appreciate your help.
>
> Best regards,
> Catalin
>
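Use the following:
bibleRef = r'(\w+) (\d+):(\d+) (.+?)(?=\s*$)'
The (?=\s*$) is a lookahead assertion: it requires that only zero or more whitespace characters remain before the end of the string, without consuming them. The ? after .+ makes the group non-greedy, so (.+?) stops at the last non-space character. Note that a greedy (.+) would not work here: it swallows the trailing space, and the lookahead still succeeds by matching zero whitespace at the very end.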
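A quick way to see the difference between a greedy (.+) and a non-greedy (.+?) in front of the lookahead. This is a minimal sketch assuming Python's standard re module; the helper parse_ref and the names GREEDY and LAZY are hypothetical, introduced only for illustration.

```python
import re

# Greedy (.+): consumes the trailing space too, because the lookahead
# (?=\s*$) can also succeed with zero whitespace at the end of the string.
GREEDY = re.compile(r'(\w+) (\d+):(\d+) (.+)(?=\s*$)')

# Non-greedy (.+?): grows one character at a time and stops at the first
# point where only whitespace remains before the end of the string.
LAZY = re.compile(r'(\w+) (\d+):(\d+) (.+?)(?=\s*$)')

def parse_ref(line, pattern=LAZY):
    """Hypothetical helper: return (book, chapter, verse, text), or None."""
    m = pattern.match(line)
    if m is None:
        return None
    book, chapter, verse, text = m.groups()
    return book, int(chapter), int(verse), text

if __name__ == "__main__":
    line = "Gen 1:1 In the beginning God created the heavens and the earth. "
    print(repr(parse_ref(line, GREEDY)[3]))  # text still ends with a space
    print(repr(parse_ref(line)[3]))          # trailing space is not captured
```

If the regex itself doesn't matter, calling .rstrip() on the captured group gives the same result with the original pattern. Another regex-only option is (.*\S), which must end on a non-whitespace character.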
HTH
Harvey
More information about the Python-list mailing list