regexp: extracting multiple multiline groups
Steven Bethard
bediviere at hotmail.com
Thu Oct 3 21:35:59 EDT 2002
I have an input file that looks something like:
--- 1
A description that
could be multiple lines
--- 2
Another description
...
I'd like to extract both the number and the corresponding description for each entry. Right now, I do this by:
docNumbersMatcher = re.compile(r"^--- (\d+)$", re.MULTILINE)
docNumbers = docNumbersMatcher.findall(output)
docBoundaryMatcher = re.compile(r"^--- \d+$", re.MULTILINE)
docs = docBoundaryMatcher.split(output)
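For anyone who wants to try this, here is a self-contained sketch of the two-pass version. The sample string and the final zip pairing are assumptions added for illustration; only the two compiled patterns come from my actual code.

```python
import re

# Sample input standing in for the real file contents (an assumption).
output = (
    "--- 1\n"
    "A description that\n"
    "could be multiple lines\n"
    "--- 2\n"
    "Another description\n"
)

# Pass 1: collect the entry numbers.
docNumbersMatcher = re.compile(r"^--- (\d+)$", re.MULTILINE)
docNumbers = docNumbersMatcher.findall(output)

# Pass 2: split on the same boundary lines to get the descriptions.
docBoundaryMatcher = re.compile(r"^--- \d+$", re.MULTILINE)
docs = docBoundaryMatcher.split(output)

# split() returns the (empty) text before the first boundary as docs[0],
# so the descriptions start at docs[1].
entries = list(zip(docNumbers, (d.strip() for d in docs[1:])))
print(entries)
```

Note that the pairing only lines up because the first element of the split result is skipped.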
However, it seems a waste to run through the same document twice with essentially the same expression. Is there a way to do this in a single pass? I've tried a few things, but they typically match too much or too little. For example:
m = re.compile(r"^--- (\d+)\n(.*)", re.MULTILINE)
m.findall(output)
gets the correct number but only extracts the description up to the first newline, and
m = re.compile(r"^--- (\d+)\n(.*)", re.MULTILINE | re.DOTALL)
m.findall(output)
gets the first digit, and then includes everything else as the first description.
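The sketch below reproduces both attempts on a small sample string, which is an assumption standing in for my real input. The third pattern is only one possible direction, not necessarily the best one: a non-greedy group that stops at a lookahead for the next boundary line or the end of the input. The names first, second, lazy and third are made up for this example.

```python
import re

# Sample input standing in for the real file contents (an assumption).
output = (
    "--- 1\n"
    "A description that\n"
    "could be multiple lines\n"
    "--- 2\n"
    "Another description\n"
)

# Attempt 1: without DOTALL, "." stops at the first newline,
# so each description is cut off after its first line.
first = re.compile(r"^--- (\d+)\n(.*)", re.MULTILINE).findall(output)

# Attempt 2: with DOTALL, the greedy ".*" runs to the end of the input,
# so the first description swallows every later entry.
second = re.compile(r"^--- (\d+)\n(.*)", re.MULTILINE | re.DOTALL).findall(output)

# Possible direction: make the group non-greedy and end it just before
# the next boundary line, or at the end of the input (\Z).
lazy = re.compile(r"^--- (\d+)\n(.*?)(?=^--- \d+$|\Z)", re.MULTILINE | re.DOTALL)
third = lazy.findall(output)

for number, description in third:
    print(number, repr(description))
```

With this pattern each description keeps its trailing newline, so it may need a strip() afterwards.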
Any help would be appreciated. Thanks in advance,
Steve
--------------------------------------------------------------------------------
Most wierdos want to be.
- Jimmie's Chicken Shack