[issue12855] linebreak sequences should be better documented

Tue Aug 30 06:45:19 CEST 2011

Matthew Boehm <boehm.matthew at gmail.com> added the comment:

I've attached a patch for Python 2.7 that adds a small note to library/stdtypes.html#str.splitlines explaining which sequences are treated as line breaks:

"""
Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.

In addition to these, Unicode strings can have line boundaries of u"\x0b", u"\x0c", u"\x85", u"\u2028", and u"\u2029"
"""
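
For illustration, here is a minimal sketch of how the behaviour described in the note could be checked. It is written for Python 3 rather than 2.7, on the assumption that bytes plays the role of the 2.7 str type and str the role of the 2.7 unicode type. The helper name count_lines is hypothetical and not part of the patch.

```python
# Sketch for Python 3: bytes stands in for the 2.7 byte string (str),
# and str stands in for the 2.7 unicode type (an assumption of this example).

# Sequences listed in the proposed note.
BASIC = ["\r", "\n", "\r\n"]
UNICODE_ONLY = ["\x0b", "\x0c", "\x85", "\u2028", "\u2029"]


def count_lines(value):
    """Hypothetical helper: number of pieces splitlines() produces."""
    return len(value.splitlines())


for sep in BASIC:
    # Both text and byte strings split on these sequences.
    assert count_lines("a" + sep + "b") == 2
    assert count_lines(("a" + sep + "b").encode("ascii")) == 2

for sep in UNICODE_ONLY:
    # Text (Unicode) strings also split on these characters.
    assert count_lines("a" + sep + "b") == 2

# Byte strings leave vertical tab and form feed alone.
assert count_lines(b"a\x0bb") == 1
assert count_lines(b"a\x0cb") == 1
```

If the assertions pass silently, the interpreter in use agrees with the note for the sequences it lists. The sketch does not try to show that the list is exhaustive.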

Additional thoughts:

* Would it be better to put this note in a different place?

* It looks like \x0b and \x0c (vertical tab and form feed) were first considered line breaks in Python 2.7, probably related to this note from "What's New in 2.7": "The Unicode database provided by the unicodedata module is now used internally to determine which characters are numeric, whitespace, or represent line breaks." It might be worth putting a "changed in 2.7" note somewhere in the docs.

Please let me know of any thoughts you have and I'll be glad to make any desired changes and submit a new patch.

----------
keywords: +patch
title: open() and codecs.open() treat form-feed differently -> linebreak sequences should be better documented
Added file: http://bugs.python.org/file23069/linebreakdoc.py27.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12855>
_______________________________________