[Python-checkins] bpo-36502: Correct documentation of str.isspace() (GH-15019) (GH-15296)

Miss Islington (bot) webhook-mailer at python.org
Mon Aug 19 06:10:24 EDT 2019


https://github.com/python/cpython/commit/0fcdd8d6d67f57733203fc79e6a07a89b924a390
commit: 0fcdd8d6d67f57733203fc79e6a07a89b924a390
branch: 3.7
author: Miss Islington (bot) <31488909+miss-islington at users.noreply.github.com>
committer: GitHub <noreply at github.com>
date: 2019-08-19T03:10:14-07:00
summary:

bpo-36502: Correct documentation of str.isspace() (GH-15019) (GH-15296)


The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.
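To make the mismatch concrete, here is a small sketch that encodes the old wording. It reads "Other" as any general category starting with C and "Separator" as any starting with Z, which is an interpretation of the old text. It then shows characters that the old wording admits but str.isspace() rejects. The helper name `old_doc_says_space` is hypothetical.

```python
import unicodedata

def old_doc_says_space(ch: str) -> bool:
    # Hypothetical helper: the pre-fix documented rule, reading "Other" as
    # any C* general category and "Separator" as any Z* category.
    return (unicodedata.category(ch)[0] in ("C", "Z")
            or unicodedata.bidirectional(ch) in ("WS", "B", "S"))

# NUL, DEL (Cc), ZERO WIDTH SPACE (Cf), and a private-use code point (Co)
# all have a C* category, so the old wording would call them whitespace.
examples = ["\x00", "\x7f", "\u200b", "\ue000"]
for ch in examples:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch),
          "old rule:", old_doc_says_space(ch),
          "isspace():", ch.isspace())
```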

Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.
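A minimal sketch of the corrected rule as a Python predicate, for readers who want to test it interactively. It assumes the interpreter's `unicodedata` tables and its str.isspace() come from the same Unicode version, which should hold because makeunicodedata.py generates both. The helper name `is_whitespace_per_docs` is hypothetical.

```python
import unicodedata

def is_whitespace_per_docs(ch: str) -> bool:
    """Hypothetical helper mirroring the corrected definition:
    general category Zs, or bidirectional class WS, B, or S."""
    return (unicodedata.category(ch) == "Zs"
            or unicodedata.bidirectional(ch) in ("WS", "B", "S"))

# Spot-check a few characters against the built-in str.isspace().
for ch in ["\t", "\n", " ", "\xa0", "\u2000", "\u2014", "\u3000", "x"]:
    assert is_whitespace_per_docs(ch) == ch.isspace(), hex(ord(ch))
```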

Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so.  The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.
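For readers following those breadcrumbs, here is a small sketch that prints the two properties the definition relies on, general category and bidirectional class, next to str.isspace(). The `describe` helper is hypothetical. The output shows why the rule needs both clauses: TAB is a control character (category `Cc`) that qualifies through its bidirectional class `S`. NO-BREAK SPACE has bidirectional class `CS` and qualifies only through its category `Zs`.

```python
import unicodedata

def describe(ch: str) -> tuple:
    # Hypothetical helper: collect the UCD properties isspace() depends on.
    return (f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<no name>"),  # control chars have no name
            unicodedata.category(ch),           # general category, e.g. "Zs"
            unicodedata.bidirectional(ch),      # bidirectional class, e.g. "WS"
            ch.isspace())

# TAB, LF, UNIT SEPARATOR, SPACE, NO-BREAK SPACE, PARAGRAPH SEPARATOR
for ch in ["\t", "\n", "\x1f", " ", "\xa0", "\u2029"]:
    print(*describe(ch))
```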

Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
(cherry picked from commit 8c1c426a631ba02357112657193f82c58d3e08b4)

Co-authored-by: Greg Price <gnprice at gmail.com>

files:
M Doc/library/stdtypes.rst
M Lib/test/test_unicode.py

diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index d35c171aba39..b9581ce1c9ae 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -1731,9 +1731,13 @@ expression support in the :mod:`re` module).
 .. method:: str.isspace()
 
    Return true if there are only whitespace characters in the string and there is
-   at least one character, false otherwise.  Whitespace characters  are those
-   characters defined in the Unicode character database as "Other" or "Separator"
-   and those with bidirectional property being one of "WS", "B", or "S".
+   at least one character, false otherwise.
+
+   A character is *whitespace* if in the Unicode character database
+   (see :mod:`unicodedata`), either its general category is ``Zs``
+   ("Separator, space"), or its bidirectional class is one of ``WS``,
+   ``B``, or ``S``.
+
 
 .. method:: str.istitle()
 
diff --git a/Lib/test/test_unicode.py b/Lib/test/test_unicode.py
index 1aad9334074c..4ebd82d3e0c2 100644
--- a/Lib/test/test_unicode.py
+++ b/Lib/test/test_unicode.py
@@ -11,6 +11,7 @@
 import operator
 import struct
 import sys
+import unicodedata
 import unittest
 import warnings
 from test import support, string_tests
@@ -615,11 +616,21 @@ def test_isspace(self):
         self.checkequalnofix(True, '\u2000', 'isspace')
         self.checkequalnofix(True, '\u200a', 'isspace')
         self.checkequalnofix(False, '\u2014', 'isspace')
-        # apparently there are no non-BMP spaces chars in Unicode 6
+        # There are no non-BMP whitespace chars as of Unicode 12.
         for ch in ['\U00010401', '\U00010427', '\U00010429', '\U0001044E',
                    '\U0001F40D', '\U0001F46F']:
             self.assertFalse(ch.isspace(), '{!a} is not space.'.format(ch))
 
+    @support.requires_resource('cpu')
+    def test_isspace_invariant(self):
+        for codepoint in range(sys.maxunicode + 1):
+            char = chr(codepoint)
+            bidirectional = unicodedata.bidirectional(char)
+            category = unicodedata.category(char)
+            self.assertEqual(char.isspace(),
+                             (bidirectional in ('WS', 'B', 'S')
+                              or category == 'Zs'))
+
     def test_isalnum(self):
         super().test_isalnum()
         for ch in ['\U00010401', '\U00010427', '\U00010429', '\U0001044E',

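The updated test comment states that there are no non-BMP whitespace characters as of Unicode 12. Here is a quick sketch for re-checking that claim against whatever Unicode version the running interpreter ships (reported by `unicodedata.unidata_version`). The helper name `non_bmp_whitespace` is hypothetical.

```python
import sys
import unicodedata

def non_bmp_whitespace() -> list:
    # Hypothetical helper: scan every supplementary-plane code point
    # (U+10000 and up) and keep those that str.isspace() accepts.
    return [chr(cp) for cp in range(0x10000, sys.maxunicode + 1)
            if chr(cp).isspace()]

found = non_bmp_whitespace()
print("Unicode", unicodedata.unidata_version, "non-BMP whitespace:", len(found))
```

The scan covers roughly a million code points, so it takes noticeably longer than the hard-coded spot checks in test_isspace. The same cost is why the new test_isspace_invariant is gated behind the `cpu` resource.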