negative lookahead question

Skip Montanaro skip at pobox.com
Mon Apr 21 12:24:34 EDT 2003


This re.sub call lives in Lib/smtplib.py as a way to make line endings
canonical: 

    re.sub(r'(?:\r\n|\n|\r(?!\n))', CRLF, data)

This certainly seems to do what's desired; however, it looks overly complex
to me.  First, the non-grouping parens are unnecessary.  Second, I don't
think the negative lookahead assertion is required.  This simpler function
call seems to do the trick:

    re.sub(r'\r\n|\n|\r', CRLF, data)

A simple test case containing a combination of different line endings seems
to yield identical results:

    >>> import re
    >>> CRLF = '\r\n'
    >>> data = 'line 0\r\nline 1\nline 2\rline 3\r\r\nline 4\n'
    >>> re.sub(r'\r\n|\n|\r(?!\n)', CRLF, data) == re.sub(r'\r\n|\n|\r', CRLF, data)
    True
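
A single hand-written string only covers a few combinations, so a broader
check can be sketched by brute force. The snippet below is an illustration
only: find_mismatches, its max_len and alphabet parameters, and the two
compiled pattern names are hypothetical names introduced here, and CRLF is
assumed to be '\r\n'. It runs both patterns over every string built from
'a', '\r' and '\n' up to a small length and collects any input on which
the two substitutions differ.

```python
import itertools
import re

CRLF = "\r\n"  # assumed value of the CRLF constant used in the post

# The pattern as quoted from the library, and the proposed simplification.
WITH_LOOKAHEAD = re.compile(r"(?:\r\n|\n|\r(?!\n))")
WITHOUT_LOOKAHEAD = re.compile(r"\r\n|\n|\r")


def find_mismatches(max_len=6, alphabet="a\r\n"):
    """Return every string of length <= max_len over `alphabet` on which
    the two patterns give different substitution results."""
    mismatches = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            data = "".join(chars)
            if WITH_LOOKAHEAD.sub(CRLF, data) != WITHOUT_LOOKAHEAD.sub(CRLF, data):
                mismatches.append(data)
    return mismatches


if __name__ == "__main__":
    # Print any inputs where the two patterns disagree.
    print(find_mismatches())
```

An empty list would mean no disagreement among the inputs tried. That is
evidence for short strings, not a proof for all inputs.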

Is there a case where the negative lookahead assertion will produce correct
results but the simpler regular expression won't?

(FYI, this isn't a performance question, but a readability question.)

Thx,

Skip




