[New-bugs-announce] [issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass
Greg Price
report at bugs.python.org
Sun Aug 4 23:55:35 EDT 2019
New submission from Greg Price <gnprice at gmail.com>:
I spent some time yesterday on #18236, and I have a patch for it.
Most of that work happens in the script Tools/unicode/makeunicodedata.py, and along the way I made several changes there that made it somewhat nicer to work on, and that I think will help other people reading that script too. I'd like to try to merge those improvements first.
The main changes are:
* As the script has grown over the years, it's gained many copies and reimplementations of logic to parse the standard format of the Unicode character database. I factored those out into a single place, which makes the parsing code shorter and the interesting parts stand out more easily.
* The main per-character record type in the script's data structures is a length-18 tuple. Using the magic of dataclasses, I converted this so that e.g. the code says `record.numeric_value` instead of `record[8]`.
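
As a rough illustration of the two changes above (not the actual patch), here is a minimal sketch. The names `parse_ucd_lines` and `CharRecord` are hypothetical, the record keeps only four fields rather than the full set the script tracks, and the comment handling is a simplification of the semicolon-separated layout used by the Unicode character database files:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


def parse_ucd_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the stripped, semicolon-separated fields of each data line.

    Hypothetical helper: one shared parser instead of many ad hoc copies.
    """
    for line in lines:
        # Simplifying assumption: text after '#' is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank and comment-only lines
        yield [field.strip() for field in line.split(";")]


@dataclass
class CharRecord:
    # Hypothetical subset of fields, for illustration only.
    codepoint: int
    name: str
    general_category: str
    numeric_value: str


# Two lines in the UnicodeData.txt layout, plus a comment line.
SAMPLE = """\
# illustrative excerpt
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<compat> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
"""

records = [
    # Field 8 of UnicodeData.txt is the numeric value.
    CharRecord(int(f[0], 16), f[1], f[2], f[8])
    for f in parse_ucd_lines(SAMPLE.splitlines())
]

for record in records:
    # Named attribute access instead of record[8].
    print(f"U+{record.codepoint:04X} {record.name}: {record.numeric_value}")
```

With a record type like this, a reader no longer has to remember which tuple index holds which property, and every file in the database can go through the same parsing helper.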
There's no radical restructuring or rewrite here; this script has served us well. I've kept these changes focused on places where the value, in future ease of development, is high relative to the cost, in a reviewer's effort as well as mine.
I'll send PRs of my changes shortly.
----------
components: Unicode
messages: 349020
nosy: Greg Price, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: Refactor makeunicodedata.py: dedupe parsing, use dataclass
type: enhancement
versions: Python 3.9
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37760>
_______________________________________