[Python-checkins] cpython (2.7): Issue #12855: Document what exactly unicode.splitlines() splits on

Tue Jun 14 21:43:40 EDT 2016

https://hg.python.org/cpython/rev/2e6fda267a20
changeset:   102047:2e6fda267a20
branch:      2.7
parent:      102043:e9d023b7d676
user:        Martin Panter <vadmium+py at gmail.com>
date:        Wed Jun 01 09:39:46 2016 +0000
summary:
  Issue #12855: Document what exactly unicode.splitlines() splits on

Also synchronize with Python 3 examples. Based on patches by Matthew Boehm
and Alexander Schrijver.

files:
  Doc/library/stdtypes.rst |  63 ++++++++++++++++++++++++++-
  1 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -1217,13 +1217,68 @@
    Line breaks are not included in the resulting list unless *keepends* is
    given and true.
 
-   For example, ``'ab c\n\nde fg\rkl\r\n'.splitlines()`` returns
-   ``['ab c', '', 'de fg', 'kl']``, while the same call with ``splitlines(True)``
-   returns ``['ab c\n', '\n', 'de fg\r', 'kl\r\n']``.
+   Python recognizes ``"\r"``, ``"\n"``, and ``"\r\n"`` as line boundaries for
+   8-bit strings.
+
+   For example::
+
+      >>> 'ab c\n\nde fg\rkl\r\n'.splitlines()
+      ['ab c', '', 'de fg', 'kl']
+      >>> 'ab c\n\nde fg\rkl\r\n'.splitlines(True)
+      ['ab c\n', '\n', 'de fg\r', 'kl\r\n']
 
    Unlike :meth:`~str.split` when a delimiter string *sep* is given, this
    method returns an empty list for the empty string, and a terminal line
-   break does not result in an extra line.
+   break does not result in an extra line::
+
+      >>> "".splitlines()
+      []
+      >>> "One line\n".splitlines()
+      ['One line']
+
+   For comparison, ``split('\n')`` gives::
+
+      >>> ''.split('\n')
+      ['']
+      >>> 'Two lines\n'.split('\n')
+      ['Two lines', '']
+
+.. method:: unicode.splitlines([keepends])
+
+   Return a list of the lines in the string, like :meth:`str.splitlines`.
+   However, the Unicode method splits on the following line boundaries,
+   which are a superset of the :term:`universal newlines` recognized for
+   8-bit strings.
+
+   +-----------------------+-----------------------------+
+   | Representation        | Description                 |
+   +=======================+=============================+
+   | ``\n``                | Line Feed                   |
+   +-----------------------+-----------------------------+
+   | ``\r``                | Carriage Return             |
+   +-----------------------+-----------------------------+
+   | ``\r\n``              | Carriage Return + Line Feed |
+   +-----------------------+-----------------------------+
+   | ``\v`` or ``\x0b``    | Line Tabulation             |
+   +-----------------------+-----------------------------+
+   | ``\f`` or ``\x0c``    | Form Feed                   |
+   +-----------------------+-----------------------------+
+   | ``\x1c``              | File Separator              |
+   +-----------------------+-----------------------------+
+   | ``\x1d``              | Group Separator             |
+   +-----------------------+-----------------------------+
+   | ``\x1e``              | Record Separator            |
+   +-----------------------+-----------------------------+
+   | ``\x85``              | Next Line (C1 Control Code) |
+   +-----------------------+-----------------------------+
+   | ``\u2028``            | Line Separator              |
+   +-----------------------+-----------------------------+
+   | ``\u2029``            | Paragraph Separator         |
+   +-----------------------+-----------------------------+
+
+   .. versionchanged:: 2.7
+
+      ``\v`` and ``\f`` added to list of line boundaries.
 
 
 .. method:: str.startswith(prefix[, start[, end]])
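
As a rough illustration of the distinction this changeset documents, the sketch below contrasts the two methods. It is placed after the diff and is not part of the patch. It assumes explicit ``b''`` and ``u''`` prefixes so that it behaves the same on Python 2.7 and Python 3; the variable names are illustrative only.

```python
# 8-bit (byte) strings split only on \n, \r and \r\n.
byte_text = b'one\x0ctwo\x1cthree\nfour'
byte_lines = byte_text.splitlines()

# Unicode strings also split on the extra boundaries in the table,
# here form feed (\x0c) and file separator (\x1c).
unicode_text = u'one\x0ctwo\x1cthree\nfour'
unicode_lines = unicode_text.splitlines()

# With keepends true, each boundary stays attached to its line.
kept = u'a\u2028b\x85c'.splitlines(True)
```

Here ``byte_lines`` has two items because the form feed and file separator are left in place, while ``unicode_lines`` has four.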

-- 
Repository URL: https://hg.python.org/cpython