trailing space in RE

Harvey Thomas hst at empolis.co.uk
Fri Aug 2 12:06:14 EDT 2002


 Doru-Catalin Togea [mailto:doru-cat at ifi.uio.no] wrote:
> 
> Hi all!
> 
> I have written a little script to parse some Bible text, and to this
> purpose I defined the following re:
> 	
> 	bibleRef = r'(\w+) (\d+):(\d+) (.+)'
> 
> I use it to match Bible references of the kind 'gen 1:1' or '1Co 10:12',
> and the accompanying text.
> 
> My re looks for the following:
> - (\w+) # the book abbreviation: 3 letters, or 1 digit and 2 letters
>           ('gen', '1co')
> - a space
> - (\d+) # a number
> - :     # a colon
> - (\d+) # another number
> - (.+)  # the rest of the text
> 
> Everything works fine, but I have a problem in that "the rest of the
> text" always has a trailing space like this:
> 	
> "Gen 1:1 In the beginning God created the heavens and the earth. "
> "1Co 10:12 Therefore let him who thinks he stands take heed lest he fall. "
>  
> So my question is, how do I match "the rest of the text" but not the
> last character (which is a space)?
> 
> I guess I could strip or slice my text before matching it, but I would
> like to know how to write the re, as described above.
> 
> I appreciate your help.
> 
> Best regards,
> Catalin
> 
> 
> 
> 	<<<< ================================== >>>>
> 	<<     We are what we repeatedly do.      >>
> 	<<  Excellence, therefore, is not an act  >>
> 	<<             but a habit.               >>
> 	<<<< ================================== >>>>
> 
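Use the following:

bibleRef = r'(\w+) (\d+):(\d+) (.+?)\s*$'

The (.+?) is non-greedy, so it stops at the last non-space character; the \s*$ that follows then consumes zero or more trailing whitespace characters up to the end of the string without including them in the group. A greedy (.+) followed by a lookahead such as (?=\s*$) does not help here: (.+) first consumes the trailing space itself, and the lookahead then succeeds with zero whitespace characters left to match.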
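A quick way to check what the last group really captures is to run the candidate patterns against one of the sample lines. The following is only a minimal sketch; the names GREEDY, LAZY, greedy_text and lazy_text are made up for illustration. It compares a greedy (.+) followed by the (?=\s*$) lookahead with a non-greedy (.+?) followed by \s*$.

```python
import re

# Greedy group followed by a lookahead: (.+) grabs everything, including
# the trailing space, and the lookahead then matches zero whitespace.
GREEDY = re.compile(r'(\w+) (\d+):(\d+) (.+)(?=\s*$)')

# Non-greedy group followed by \s*$: the group grows only as far as it
# must, so the trailing whitespace is left for \s* to consume.
LAZY = re.compile(r'(\w+) (\d+):(\d+) (.+?)\s*$')

line = "Gen 1:1 In the beginning God created the heavens and the earth. "

greedy_text = GREEDY.match(line).group(4)
lazy_text = LAZY.match(line).group(4)

print(repr(greedy_text))  # still ends with a space
print(repr(lazy_text))    # trailing space is gone
```

An equivalent spelling is (.*\S)\s*$, which forces the group to end on a non-whitespace character. Calling .rstrip() on the captured text also works if changing the pattern is not required.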

HTH

Harvey
