Detecting line endings

Bengt Richter bokr at oz.net
Tue Feb 7 10:42:32 EST 2006


On 6 Feb 2006 06:35:14 -0800, "Fuzzyman" <fuzzyman at gmail.com> wrote:

>Hello all,
>
>I'm trying to detect line endings used in text files. I *might* be
>decoding the files into unicode first (which may be encoded using
>multi-byte encodings) - which is why I'm not letting Python handle the
>line endings.
>
>Is the following safe and sane :
>
>text = open('test.txt', 'rb').read()
>if encoding:
>    text = text.decode(encoding)
>ending = '\n' # default
>if '\r\n' in text:
>    text = text.replace('\r\n', '\n')
>    ending = '\r\n'
>elif '\n' in text:
>    ending = '\n'
>elif '\r' in text:
>    text = text.replace('\r', '\n')
>    ending = '\r'
>
>
>My worry is that if '\n' *doesn't* signify a line break on the Mac,
>then it may exist in the body of the text - and trigger ``ending =
>'\n'`` prematurely ?
>
Are you guaranteed that the text bodies don't contain escape or quoting
mechanisms for binary data, where it would be a mistake to convert
or delete a '\r'? (E.g., I think XML CDATA might be an example.)
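
If so, one option is to detect the ending without rewriting the text at
all. Below is a minimal sketch of that idea, assuming the text has already
been decoded to a string. The names detect_line_ending and _LINE_END are
made up for illustration. It counts each kind of ending and picks the most
frequent one, so a stray '\n' inside an otherwise '\r'-terminated Mac file
should not decide the result on its own.

```python
import re

# '\r\n' is listed first so a CRLF pair is counted once,
# never as a lone '\r' followed by a lone '\n'.
_LINE_END = re.compile(r'\r\n|\r|\n')

def detect_line_ending(text, default='\n'):
    """Return the most frequent line ending in text; text is not modified.

    Assumes text is a decoded string, not raw bytes.
    """
    counts = {'\r\n': 0, '\n': 0, '\r': 0}
    for match in _LINE_END.finditer(text):
        counts[match.group()] += 1
    if not any(counts.values()):
        # No line endings at all: fall back to the caller's default.
        return default
    # On a tie, max() returns the first key in insertion order.
    return max(counts, key=counts.get)
```

Because nothing is replaced, any '\r' that is really data stays intact, and
whether to normalise afterwards remains a separate decision.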

Regards,
Bengt Richter


