regexp: extracting multiple multiline groups

Steven Bethard bediviere at hotmail.com
Thu Oct 3 21:35:59 EDT 2002


I have an input file that looks something like:

--- 1
A description that
could be multiple lines
--- 2
Another description
...

I'd like to extract both the number and the corresponding description for each entry.  Right now, I do this in two passes:

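 import re

 # Pass 1: collect the entry numbers from the "--- N" header lines.
 docNumbersMatcher = re.compile(r"^--- (\d+)$", re.MULTILINE)
 docNumbers = docNumbersMatcher.findall(output)

 # Pass 2: split the text on the same header lines to get the descriptions.
 docBoundaryMatcher = re.compile(r"^--- \d+$", re.MULTILINE)
 docs = docBoundaryMatcher.split(output)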
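
To make the two-pass approach above concrete, here is a minimal runnable sketch. The sample `output` string is an assumption modelled on the input format shown earlier, and `entries` is a hypothetical name for the paired result. Note that `split` returns a leading empty string for the text before the first header, so the descriptions start at index 1.

```python
import re

# Sample input modelled on the format shown in the post (an assumption).
output = (
    "--- 1\n"
    "A description that\n"
    "could be multiple lines\n"
    "--- 2\n"
    "Another description\n"
)

# Pass 1: the numbers from the header lines.
docNumbers = re.compile(r"^--- (\d+)$", re.MULTILINE).findall(output)

# Pass 2: the text between header lines. The first element is the
# (empty) text that precedes the first header.
docs = re.compile(r"^--- \d+$", re.MULTILINE).split(output)

# Pair each number with its description, dropping the leading empty piece
# and the newlines that surround each description.
entries = list(zip(docNumbers, [d.strip() for d in docs[1:]]))

for number, description in entries:
    print(number, repr(description))
```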

However, it seems a waste to run through the same document twice with essentially the same expression.  Is there a way to do this in a single pass?  I've tried a few things, but they typically match too much or too little.  For example:

  m = re.compile(r"^--- (\d+)\n(.*)", re.MULTILINE)
  m.findall(output)

gets the correct number but only extracts the description up to the first newline, and

  m = re.compile(r"^--- (\d+)\n(.*)", re.MULTILINE | re.DOTALL)
  m.findall(output)

gets the first number, and then swallows everything else as the first description.
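
For comparison, here is a hedged sketch of two possible single-pass variants. Neither comes from the original post: the first makes the description group non-greedy and stops it with a lookahead at the next header line (or the end of the input), and the second relies on the documented behaviour of `re.split` to keep captured groups in its result. The sample `output` and the names `entryMatcher`, `pairs`, `parts` and `splitPairs` are assumptions for illustration only.

```python
import re

# Sample input modelled on the format shown in the post (an assumption).
output = (
    "--- 1\n"
    "A description that\n"
    "could be multiple lines\n"
    "--- 2\n"
    "Another description\n"
)

# Option 1: lazy body, stopped by a lookahead at the next "--- N" header
# line or at the very end of the input.
entryMatcher = re.compile(
    r"^--- (\d+)\n(.*?)(?=^--- \d+$|\Z)",
    re.MULTILINE | re.DOTALL,
)
pairs = entryMatcher.findall(output)

# Option 2: split on the header with the number in a capturing group;
# re.split then keeps the captured numbers between the text pieces.
parts = re.split(r"^--- (\d+)$", output, flags=re.MULTILINE)
# parts[0] is the text before the first header; after that, numbers and
# descriptions alternate.
splitPairs = list(zip(parts[1::2], parts[2::2]))

for number, description in pairs:
    print(number, repr(description.strip()))
```

With the first option each description keeps its trailing newline; with the second it keeps both the leading and the trailing newline, so a `strip()` is likely wanted in either case.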

Any help would be appreciated.  Thanks in advance,

Steve


--------------------------------------------------------------------------------
      Most wierdos want to be. 
     - Jimmie's Chicken Shack 
