Detecting line endings

Fuzzyman fuzzyman at gmail.com
Tue Feb 7 10:57:04 EST 2006


Bengt Richter wrote:
> On 6 Feb 2006 06:35:14 -0800, "Fuzzyman" <fuzzyman at gmail.com> wrote:
>
> >Hello all,
> >
> >I'm trying to detect line endings used in text files. I *might* be
> >decoding the files into unicode first (which may be encoded using
> >multi-byte encodings) - which is why I'm not letting Python handle the
> >line endings.
> >
> >Is the following safe and sane:
> >
> >text = open('test.txt', 'rb').read()
> >if encoding:
> >    text = text.decode(encoding)
> >ending = '\n' # default
> >if '\r\n' in text:
> >    text = text.replace('\r\n', '\n')
> >    ending = '\r\n'
> >elif '\n' in text:
> >    ending = '\n'
> >elif '\r' in text:
> >    text = text.replace('\r', '\n')
> >    ending = '\r'
> >
> >
> >My worry is that if '\n' *doesn't* signify a line break on the Mac,
> >then it may exist in the body of the text and trigger ``ending =
> >'\n'`` prematurely?
> >
> Are you guaranteed that text bodies don't contain escape or quoting
> mechanisms for binary data where it would be a mistake to convert
> or delete an '\r' ? (E.g., I think XML CDATA might be an example).
>

My personal use case is for reading config files in arbitrary encodings
(so it's not an issue).
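
For that config-file case, one way to sidestep the order-of-tests worry
in the quoted snippet is to count each kind of ending after decoding and
pick the most common one. This is only a sketch: it assumes Python 3,
`detect_line_ending` is a hypothetical name, and majority counting is a
swapped-in technique rather than what the quoted code does. Decoding
first matters because in multi-byte encodings such as UTF-16 the bytes
0x0A and 0x0D can appear inside other characters.

```python
def detect_line_ending(text, default="\n"):
    """Return the most common line ending in text, or default if none.

    Hypothetical helper: expects an already-decoded str, not bytes.
    """
    crlf = text.count("\r\n")
    # Subtract the CRLF pairs so bare CR and bare LF are counted on their own.
    cr = text.count("\r") - crlf
    lf = text.count("\n") - crlf
    counts = {"\r\n": crlf, "\n": lf, "\r": cr}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


# Example data standing in for a config file read in binary mode.
raw = b"key = value\r\nother = 1\r\n"
text = raw.decode("utf-8")  # decode first, then inspect the str
ending = detect_line_ending(text)

# Normalize to '\n' for processing; 'ending' can be reused when writing back.
normalized = text.replace("\r\n", "\n").replace("\r", "\n")
```

With this approach a stray '\n' inside an otherwise '\r'-terminated file
is outvoted instead of deciding the result.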

How would Python handle opening such files when not in binary mode?
That may be an issue even on Linux - if you open a Windows file and
use splitlines, does Python convert '\r\n' to '\n'? (Or does it leave
the extra '\r's in place, which is *different* from the behaviour under
Windows?)
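
The question can be checked with a small experiment. The sketch below
assumes Python 3, where text mode uses universal newlines by default; in
the Python 2 releases current at the time of this thread, plain text
mode on Linux did no translation unless the file was opened in 'U' mode.
The file name and sample data are made up for illustration.

```python
import os
import tempfile

# Sample data with Windows-style line endings.
data = b"first\r\nsecond\r\n"
decoded = data.decode("ascii")

# str.splitlines() treats '\r\n', '\r' and '\n' as line boundaries
# on every platform, so no '\r' is left behind.
lines = decoded.splitlines()

# Splitting on '\n' alone leaves the '\r' characters in place.
parts = decoded.split("\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(data)

    # Default text mode (newline=None) translates '\r\n' to '\n' on read.
    with open(path, encoding="ascii") as f:
        translated = f.read()

    # newline="" turns translation off, so the original endings survive.
    with open(path, encoding="ascii", newline="") as f:
        untranslated = f.read()
```

Under those assumptions, reading in binary mode (or with newline="") is
what keeps the original endings available for detection.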

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

> Regards,
> Bengt Richter



