[New-bugs-announce] [issue37758] unicodedata checksum-tests only test 1/17th of Unicode's codepoints

Sun Aug 4 21:06:02 EDT 2019

New submission from Greg Price <gnprice at gmail.com>:

The unicodedata module has two test cases which run through the database and make a hash of its visible outputs for all codepoints, comparing the hash against a checksum.  These are helpful regression tests for making sure the behavior isn't changed by patches that didn't intend to change it.

But Unicode has grown since Python first gained support for it, when Unicode itself was still rather new.  These test cases were added in commit 6a20ee7de back in 2000 and haven't needed to change much since then, but they only look at the Basic Multilingual Plane (`range(0x10000)`).  They should be changed to look beyond it and cover all 17 planes of Unicode's final form (`range(0x110000)`).
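
To illustrate the gap, here is a minimal sketch of a checksum over the database's visible outputs.  It is not the actual test code: the function name `category_checksum`, the choice of SHA-1, and the two properties hashed are assumptions made for illustration only.

```python
import hashlib
import sys
import unicodedata

BMP_END = 0x10000                    # end of plane 0, the Basic Multilingual Plane
ALL_PLANES_END = sys.maxunicode + 1  # 0x110000, i.e. 17 planes of 0x10000 each


def category_checksum(stop):
    """Hash two unicodedata properties for every codepoint in range(stop).

    Hypothetical helper; the real tests hash more properties than this.
    """
    h = hashlib.sha1()
    for cp in range(stop):
        ch = chr(cp)
        # Both functions return short ASCII strings (possibly empty).
        data = unicodedata.category(ch) + unicodedata.bidirectional(ch)
        h.update(data.encode("ascii"))
    return h.hexdigest()


if __name__ == "__main__":
    print("BMP only:  ", category_checksum(BMP_END))
    print("all planes:", category_checksum(ALL_PLANES_END))
```

A checksum computed with `BMP_END` can't notice a change to any codepoint at or above U+10000, which is 16 of the 17 planes.  Passing `ALL_PLANES_END` instead covers them at the cost of a longer loop.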

Spotted in discussion on GH-15019 (https://github.com/python/cpython/pull/15019#discussion_r308947884 ).  I have a patch for this which I'll send shortly.

----------
components: Tests
messages: 349014
nosy: Greg Price
priority: normal
severity: normal
status: open
title: unicodedata checksum-tests only test 1/17th of Unicode's codepoints
type: enhancement
versions: Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37758>
_______________________________________