[Tutor] regexp help

Kent Johnson kent_johnson at skillsoft.com
Thu Sep 23 17:26:45 CEST 2004


OK. I suggest you use re.sub() on the matched data to remove the blank lines.
So you would use one re to get the data and another one to remove the blank
lines. For example:

 >>> import re
 >>> data = '   \n   \nhere\nis\n  \nsome data\n\nwith\nblank lines   \n   \n   '
 >>> rx = re.compile(r'(^|\n)\s*(\n|$)')
 >>> rx.sub(r'\1', data)
'here\nis\nsome data\nwith\nblank lines   \n'
 >>> rx.sub(r'\1', '\n\n\ntest\n\n\n')
'test\n'
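
A minimal sketch of the two-step idea follows. One regex pulls out everything
between the first and last delimiter, and the blank-line pattern from the
session above cleans the result. It assumes the delimiter is the literal
--==--==--==--==-- line mentioned in the quoted message; the name
useful_lines is hypothetical.

```python
import re

# Assumed delimiter, taken from the quoted message; adjust to your data.
DELIM = '--==--==--==--==--'

# Step 1: grab everything between the first and the last delimiter.
# re.S lets . match newlines, and the greedy .* runs to the last delimiter.
extract_rx = re.compile(re.escape(DELIM) + r'(.*)' + re.escape(DELIM), re.S)

# Step 2: the blank-line pattern from the interactive session.
blank_rx = re.compile(r'(^|\n)\s*(\n|$)')


def useful_lines(text):
    """Return the text between the delimiters with blank lines removed."""
    m = extract_rx.search(text)
    if m is None:
        # Fewer than two delimiters: nothing to extract.
        return ''
    return blank_rx.sub(r'\1', m.group(1))
```

To run it over standard input as in the original question, import sys and
call sys.stdout.write(useful_lines(sys.stdin.read())). The greedy .* runs to
the last delimiter; .*? would stop at the next delimiter instead.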

The re to match blank lines is a little tricky because it has to match all
three of these:
'   \n' - a blank line at the beginning
'\n  \n' - a blank line in the middle
'   ' - a blank line at the end

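The solution I used was to allow either the beginning of the string (^) or a
newline (\n) before the blank line, and either a newline or the end of the
string ($) after it. Then I replace each match with whatever group 1 matched
(\1), so a blank line at the very start is removed entirely instead of being
replaced by a newline. Because \s also matches \n, a single match can swallow
a whole run of consecutive blank lines.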
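
To see why the replacement is \1 rather than a plain newline, here is a small
comparison over one made-up string per case from the list above (beginning,
middle, end).

```python
import re

rx = re.compile(r'(^|\n)\s*(\n|$)')

# One short string per kind of blank line: beginning, middle, end.
cases = ['   \nfoo', 'foo\n  \nbar', 'foo\n   ']

# For each case, pair the \1 replacement with a naive '\n' replacement.
results = [(rx.sub(r'\1', text), rx.sub('\n', text)) for text in cases]

for text, (with_group, with_newline) in zip(cases, results):
    print(repr(text), '->', repr(with_group), 'vs', repr(with_newline))
```

Only the first case differs: the '\n' replacement leaves a stray leading
newline ('\nfoo'), while \1 puts back nothing because group 1 matched the
empty ^.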

If your data uses \r\n for newlines, you will have to modify the regex a
little bit.
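
One possible adjustment, sketched here as an assumption rather than Kent's own
fix, is to allow an optional \r before each \n. Since \s also matches \r, the
unmodified pattern may already cope with many \r\n inputs; spelling out \r?
mainly makes the intent explicit. The sample data is the example above with
\n replaced by \r\n.

```python
import re

# One possible \r\n-aware variant (an assumption, not necessarily Kent's fix):
# every newline in the pattern may be preceded by an optional \r.
rx_crlf = re.compile(r'(^|\r?\n)\s*(\r?\n|$)')

data = ('   \r\n   \r\nhere\r\nis\r\n  \r\nsome data\r\n\r\n'
        'with\r\nblank lines   \r\n   \r\n   ')

# Replace each blank-line match with whatever group 1 matched ('' or '\r\n').
cleaned = rx_crlf.sub(r'\1', data)
print(repr(cleaned))
```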

Kent

At 04:25 PM 9/23/2004 +0200, Botykai Zsolt wrote:
>On Thursday 23 September 2004 16:14, Kent Johnson wrote:
> > The code you have will read all input from stdin and process it at once.
> > The regex will match everything from the first --==--==--==--==-- to the
> > last, even spanning many lines. Is that what you want?
>
>Yes. These are not very big text files, with contents like this:
>
><some useless row>
><delimiter>
><some useful rows or empty lines (empty lines contain only \n, or spaces
>and/or tabs + \n)>
><some useful rows>
><some useful rows or empty lines (empty lines contain only \n, or spaces
>and/or tabs + \n)>
><some useful rows>
><some useful rows or empty lines (empty lines contain only \n, or spaces
>and/or tabs + \n)>
><delimiter>
><useless rows>
>
>AFAIK the first solution that comes to mind is to read STDIN in a loop and
>process and print it, as you suggested, but I wanted to solve this with a
>regexp and without loops. So you are right: I wanted to match all the lines
>between the --==--==--==--==-- delimiters except the empty lines defined above.
>
>Zsoltik@


