[Patches] [ python-Patches-626485 ] Support Unicode normalization
noreply@sourceforge.net
Sat, 23 Nov 2002 14:08:53 -0800
Patches item #626485, was opened at 2002-10-21 21:02
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=626485&group_id=5470
Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Martin v. Löwis (loewis)
Summary: Support Unicode normalization
Initial Comment:
This patch adds support for the normalization forms
NFC, NFKC, NFD, NFKD. It passes the
NormalizationTest-3.2.0.txt tests.
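The four forms differ along two axes. NFD and NFKD only decompose; NFC and NFKC decompose and then recompose canonically. The K (compatibility) forms also replace compatibility characters such as ligatures. Below is a minimal Python sketch that assumes the unicodedata.normalize(form, unistr) function documented for current Python releases. all_forms is a hypothetical helper and is not part of the patch:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def all_forms(s):
    """Return a dict mapping each normalization form name to normalize(form, s)."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}

# U+00E9 LATIN SMALL LETTER E WITH ACUTE: canonical decomposition is e + U+0301.
for form, value in all_forms("\u00e9").items():
    print(form, ascii(value))

# U+FB01 LATIN SMALL LIGATURE FI: has only a compatibility decomposition (f + i).
for form, value in all_forms("\ufb01").items():
    print(form, ascii(value))
```

For U+00E9 the canonical and compatibility forms agree. For the U+FB01 ligature, only NFKC and NFKD change the string.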
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2002-11-23 23:08
Message:
Logged In: YES
user_id=21627
Thanks! Committed as
libunicodedata.tex 1.4
test_normalization.py 1.1
NEWS 1.541
unicodedata.c 2.24
unicodedata_db.h 1.7
makeunicodedata.py 1.15
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-11-23 22:50
Message:
Logged In: YES
user_id=38388
Looks good (I don't have time to review the patch
in detail, though). Please check it in.
Thanks.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-11-23 16:19
Message:
Logged In: YES
user_id=21627
This version changes the indentation to 4 spaces. Are any
further changes needed?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-10-25 17:03
Message:
Logged In: YES
user_id=21627
This patch addresses your issues in the following way:
- single API: done.
- add _getrecord_ex: done. Rename to getunicoderecord: not
done; since this is a static function in unicodedata.c,
renaming it would not add much information.
- #ifdef Py_UNICODE_WIDE: not done; I could not spot any
place where this is necessary.
- Drop -Latest: done.
- adjust skip message: done.
- reformat to 4 spaces: not done; I think PEP 7 should be
followed.
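With the single API, the form is an ordinary string argument, so callers can choose it at run time. Here is a small sketch, again assuming current Python's unicodedata.normalize, which raises ValueError for a form name it does not recognize. normalize_or_none is a hypothetical wrapper used only for illustration:

```python
import unicodedata

def normalize_or_none(form, s):
    """Normalize s with the named form, or return None if normalize() raises ValueError.

    Hypothetical helper: ValueError is what unicodedata.normalize raises
    for an unrecognized form name such as "nfc" (names are case-sensitive).
    """
    try:
        return unicodedata.normalize(form, s)
    except ValueError:
        return None

# The form is selected by name at run time instead of calling a per-form function.
for form in ("NFC", "NFKC", "nfc"):
    print(form, ascii(normalize_or_none(form, "\u2460")))
```

U+2460 CIRCLED DIGIT ONE has only a compatibility decomposition, so NFC leaves it unchanged while NFKC maps it to "1".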
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-10-23 12:36
Message:
Logged In: YES
user_id=38388
One more minor nit: the indentation in the C file is 4
chars; please reindent your code accordingly.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-10-23 12:27
Message:
Logged In: YES
user_id=38388
The patch looks OK except for a few nits:
* I'd prefer a single API normalize(form) which takes
the form as a string argument instead of separate
functions NFKD, etc.
* __getrecord should be renamed to _getrecord_ex;
perhaps both should use a different name altogether,
e.g. getunicoderecord.
* I think you have to add some #ifdef Py_UNICODE_WIDE guards
in the code to avoid compiler warnings on narrow builds
about non-constant if expressions always being true due to
type size limits.
* The filenames you are using should not include the '-Latest'
suffix. If you download the files from unicode.org via FTP
they don't have this extension.
* The skip test message should include a reference to where
to get the test file, i.e.
ftp://ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt
Thanks for working on this !
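Each data line of NormalizationTest.txt holds five columns, c1 through c5, of hex code points. The file header states invariants such as c2 == NFC(c1) and c5 == NFKD(c1). The sketch below shows a per-line check in the spirit of test_normalization.py and assumes current Python's unicodedata.normalize. parse_columns, check_line, and the sample line are illustrative and are not taken from the committed test. A real driver would also need to skip comment lines and @Part lines:

```python
import functools
import unicodedata

nfc = functools.partial(unicodedata.normalize, "NFC")
nfd = functools.partial(unicodedata.normalize, "NFD")
nfkc = functools.partial(unicodedata.normalize, "NFKC")
nfkd = functools.partial(unicodedata.normalize, "NFKD")

def parse_columns(line):
    """Split one data line into five strings c1..c5.

    Columns are ';'-separated lists of space-separated hex code points;
    anything after '#' is a comment and is ignored.
    """
    data = line.split("#", 1)[0]
    fields = data.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]

def check_line(line):
    """Return True if the line satisfies the NFC/NFD/NFKC/NFKD invariants."""
    c1, c2, c3, c4, c5 = parse_columns(line)
    cols = (c1, c2, c3, c4, c5)
    return (
        c2 == nfc(c1) == nfc(c2) == nfc(c3)
        and c4 == nfc(c4) == nfc(c5)
        and c3 == nfd(c1) == nfd(c2) == nfd(c3)
        and c5 == nfd(c4) == nfd(c5)
        and all(c4 == nfkc(c) for c in cols)
        and all(c5 == nfkd(c) for c in cols)
    )

# Illustrative line in the file's format: U+1E0A LATIN CAPITAL LETTER D WITH DOT ABOVE.
sample = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"
print(check_line(sample))  # prints True
```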
----------------------------------------------------------------------