[Python-checkins] [3.11] gh-93575: Use correct way to calculate PyUnicode struct sizes (GH-93602) (GH-93613)

tiran webhook-mailer at python.org
Wed Jun 8 16:21:25 EDT 2022


https://github.com/python/cpython/commit/47a7855f416affceb8d6b33f131f3f44457ac589
commit: 47a7855f416affceb8d6b33f131f3f44457ac589
branch: 3.11
author: Christian Heimes <christian at python.org>
committer: tiran <christian at python.org>
date: 2022-06-08T22:21:20+02:00
summary:

[3.11] gh-93575: Use correct way to calculate PyUnicode struct sizes (GH-93602) (GH-93613)

* gh-93575: Use correct way to calculate PyUnicode struct sizes

* Add comment to keep test_sys and test_unicode in sync

* Fix case code < 256.
(cherry picked from commit 5442561c1a094b68900198bade616da9ed509ac8)

Co-authored-by: Christian Heimes <christian at python.org>

files:
A Misc/NEWS.d/next/Tests/2022-06-08-14-17-59.gh-issue-93575.Xb2LNB.rst
M Lib/test/test_sys.py
M Lib/test/test_unicode.py

diff --git a/Lib/test/test_sys.py b/Lib/test/test_sys.py
index 8aaf23272607b..87ff4a2deb593 100644
--- a/Lib/test/test_sys.py
+++ b/Lib/test/test_sys.py
@@ -1538,6 +1538,7 @@ class newstyleclass(object): pass
         samples = ['1'*100, '\xff'*50,
                    '\u0100'*40, '\uffff'*100,
                    '\U00010000'*30, '\U0010ffff'*100]
+        # also update field definitions in test_unicode.test_raiseMemError
         asciifields = "nnbP"
         compactfields = asciifields + "nPn"
         unicodefields = compactfields + "P"
diff --git a/Lib/test/test_unicode.py b/Lib/test/test_unicode.py
index c98fabf8bc9b5..90bd75f550dff 100644
--- a/Lib/test/test_unicode.py
+++ b/Lib/test/test_unicode.py
@@ -2370,20 +2370,19 @@ def test_expandtabs_optimization(self):
         self.assertIs(s.expandtabs(), s)
 
     def test_raiseMemError(self):
-        if struct.calcsize('P') == 8:
-            # 64 bits pointers
-            ascii_struct_size = 48
-            compact_struct_size = 72
-        else:
-            # 32 bits pointers
-            ascii_struct_size = 24
-            compact_struct_size = 36
+        asciifields = "nnbP"
+        compactfields = asciifields + "nPn"
+        ascii_struct_size = support.calcobjsize(asciifields)
+        compact_struct_size = support.calcobjsize(compactfields)
 
         for char in ('a', '\xe9', '\u20ac', '\U0010ffff'):
             code = ord(char)
-            if code < 0x100:
+            if code < 0x80:
                 char_size = 1  # sizeof(Py_UCS1)
                 struct_size = ascii_struct_size
+            elif code < 0x100:
+                char_size = 1  # sizeof(Py_UCS1)
+                struct_size = compact_struct_size
             elif code < 0x10000:
                 char_size = 2  # sizeof(Py_UCS2)
                 struct_size = compact_struct_size
@@ -2395,8 +2394,18 @@ def test_raiseMemError(self):
             # be allocatable, given enough memory.
             maxlen = ((sys.maxsize - struct_size) // char_size)
             alloc = lambda: char * maxlen
-            self.assertRaises(MemoryError, alloc)
-            self.assertRaises(MemoryError, alloc)
+            with self.subTest(
+                char=char,
+                struct_size=struct_size,
+                char_size=char_size
+            ):
+                # self-check
+                self.assertEqual(
+                    sys.getsizeof(char * 42),
+                    struct_size + (char_size * (42 + 1))
+                )
+                self.assertRaises(MemoryError, alloc)
+                self.assertRaises(MemoryError, alloc)
 
     def test_format_subclass(self):
         class S(str):
diff --git a/Misc/NEWS.d/next/Tests/2022-06-08-14-17-59.gh-issue-93575.Xb2LNB.rst b/Misc/NEWS.d/next/Tests/2022-06-08-14-17-59.gh-issue-93575.Xb2LNB.rst
new file mode 100644
index 0000000000000..98d15328a087a
--- /dev/null
+++ b/Misc/NEWS.d/next/Tests/2022-06-08-14-17-59.gh-issue-93575.Xb2LNB.rst
@@ -0,0 +1,4 @@
+Fix issue with test_unicode test_raiseMemError. The test case now uses
+``test.support.calcobjsize`` to calculate the size of PyUnicode structs.
+:func:`sys.getsizeof` may return a different size when a string has UTF-8
+memory.
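
The self-check added to test_raiseMemError relies on the identity
sys.getsizeof(s) == struct_size + char_size * (len(s) + 1) for freshly
built compact strings. The sketch below is not part of the commit; it
inverts that identity to infer the struct size from sys.getsizeof, which
shows why the ASCII case (code < 0x80) must be separated from the Latin-1
case (code < 0x100). The helper name inferred_struct_size is hypothetical,
and the sketch assumes CPython, whose string layout follows PEP 393.

```python
import sys


def inferred_struct_size(char, n=42):
    """Infer the header size of a compact str built from `char` (hypothetical helper).

    Assumes CPython, where a freshly created compact string occupies
    struct_size + char_size * (len + 1) bytes (the +1 is the trailing NUL).
    """
    code = ord(char)
    if code < 0x100:
        char_size = 1  # sizeof(Py_UCS1), used for both ASCII and Latin-1
    elif code < 0x10000:
        char_size = 2  # sizeof(Py_UCS2)
    else:
        char_size = 4  # sizeof(Py_UCS4)
    return sys.getsizeof(char * n) - char_size * (n + 1)


# Same sample characters as the test in the diff above.
sizes = {
    char: inferred_struct_size(char)
    for char in ('a', '\xe9', '\u20ac', '\U0010ffff')
}

for char, size in sizes.items():
    print(f"U+{ord(char):04X}: inferred struct size = {size}")
```

On CPython the value for 'a' should come out smaller than the value for
'\xe9', even though both use one byte per character. That gap is the case
the old `code < 0x100` branch missed.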


