Detecting line endings

Fuzzyman fuzzyman at gmail.com
Mon Feb 6 16:56:08 EST 2006


Sybren Stuvel wrote:
> Fuzzyman enlightened us with:
> > My worry is that if '\n' *doesn't* signify a line break on the Mac,
> > then it may exist in the body of the text - and trigger ``ending =
> > '\n'`` prematurely ?
>
> I'd count the number of occurrences of '\r\n', '\n' without a preceding
> '\r' and '\r' without following '\n', and let the majority decide.
>

This is what I came up with. As you can see from the docstring, it
attempts to do sensible(-ish) things in the event of a tie, or when
there are no line endings at all.

Comments/corrections welcomed. I know the tests aren't very useful
(they make no *assertions*, so they won't tell you if it breaks),
but you can see what's going on:

import re
import os

rn = re.compile(r'\r\n')      # CR LF pair (Windows)
r = re.compile(r'\r(?!\n)')   # CR not followed by LF (old Mac)
n = re.compile(r'(?<!\r)\n')  # LF not preceded by CR (Unix)

# Sequence of (regex, literal, priority) for each line ending
line_ending = [(n, '\n', 3), (rn, '\r\n', 2), (r, '\r', 1)]


def find_ending(text, default=os.linesep):
    r"""
    Given a piece of text, use a simple heuristic to determine the line
    ending in use.

    Returns the value assigned to default if no line endings are found.
    This defaults to ``os.linesep``, the native line ending for the
    machine.

    If there is a tie between two endings, the priority chain is
    ``'\n', '\r\n', '\r'``.
    """
    results = [(len(exp.findall(text)), priority, literal) for
        exp, literal, priority in line_ending]
    results.sort()
    print(results)  # debug output: sorted (count, priority, literal) tuples
    if not sum([m[0] for m in results]):
        return default
    else:
        return results[-1][-1]

if __name__ == '__main__':
    tests = [
        'hello\ngoodbye\nmy fish\n',
        'hello\r\ngoodbye\r\nmy fish\r\n',
        'hello\rgoodbye\rmy fish\r',
        'hello\rgoodbye\n',
        '',
        '\r\r\r \n\n',
        '\n\n \r\n\r\n',
        '\n\n\r \r\r\n',
        '\n\r \n\r \n\r',
        ]
    for entry in tests:
        print(repr(entry))
        print(repr(find_ending(entry)))
        print('')
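
Since the tests above make no assertions, here is a rough sketch of a
version that does. It restates the same heuristic in Python 3 syntax
so that it runs on its own. The names LINE_ENDINGS and CASES are made
up for this illustration, and the expected values were worked out by
hand from the counts and the priority chain, so treat it as a starting
point rather than a definitive test suite.

```python
import os
import re

# Sketch only: same heuristic as above, written for Python 3.
# LINE_ENDINGS and CASES are illustrative names, not from the original post.
LINE_ENDINGS = [
    (re.compile(r'(?<!\r)\n'), '\n', 3),   # LF not preceded by CR
    (re.compile(r'\r\n'), '\r\n', 2),      # CR LF pair
    (re.compile(r'\r(?!\n)'), '\r', 1),    # CR not followed by LF
]


def find_ending(text, default=os.linesep):
    """Return the most frequent line ending in text, or default if none."""
    results = [(len(exp.findall(text)), priority, literal)
               for exp, literal, priority in LINE_ENDINGS]
    # max() compares the count first, then the priority, so ties follow
    # the priority chain.
    count, _, literal = max(results)
    return literal if count else default


CASES = [
    ('hello\ngoodbye\nmy fish\n', '\n'),
    ('hello\r\ngoodbye\r\nmy fish\r\n', '\r\n'),
    ('hello\rgoodbye\rmy fish\r', '\r'),
    ('hello\rgoodbye\n', '\n'),        # 1-1 tie, '\n' wins on priority
    ('', os.linesep),                  # no endings, falls back to default
    ('\r\r\r \n\n', '\r'),             # three '\r' beat two '\n'
    ('\n\n \r\n\r\n', '\n'),           # 2-2 tie with '\r\n'
    ('\n\n\r \r\r\n', '\n'),           # 2-2 tie with '\r'
    ('\n\r \n\r \n\r', '\n'),          # 3-3 tie with '\r'
]

for text, expected in CASES:
    assert find_ending(text) == expected, (text, find_ending(text))
```

If any case fails, the assertion message shows the offending text and
the ending that was actually returned.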

All the best,


Fuzzyman
http://www.voidspace.org.uk/python/index.shtml




More information about the Python-list mailing list