[Python-3000] String comparison

Rauli Ruohonen rauli.ruohonen at gmail.com
Fri Jun 8 02:26:41 CEST 2007


On 6/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > FWIW, I don't buy that normalization is expensive, as most strings are
> > in NFC form anyway, and there are fast checks for that (see UAX#15,
> > "Detecting Normalization Forms"). Python does not currently have
> > a fast path for this, but if it's added, then normalizing everything
> > to NFC should be fast.
>
> That would be useful to have, anyway. Would you like to contribute it?

I implemented it for all normalization forms in the most straightforward way I
could think of, which was adding a field to _PyUnicode_DatabaseRecord,
generating data for it in makeunicodedata.py from
DerivedNormalizationProps.txt of UCD 4.1, and writing a function
is_normalized which uses it. The function is called from
unicodedata_normalized. I made the modifications against py3k-struni.
Does this sound reasonable?
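
For anyone who wants to see the shape of the check without reading C, here
is a rough Python sketch of the quick-check loop described in UAX#15
("Detecting Normalization Forms"). It is not the patch itself. The names
quick_check, is_nfc and nfc_qc_stand_in are made up for illustration, and
nfc_qc_stand_in is only a conservative stand-in for the real NFC_Quick_Check
data from DerivedNormalizationProps.txt: it answers "yes" for code points
below U+0300 (which are all NFC_QC=Yes with combining class 0) and "maybe"
for everything else, so anything outside that range falls back to a full
normalization.

```python
import unicodedata

YES, NO, MAYBE = "Y", "N", "M"


def nfc_qc_stand_in(ch):
    # Assumption: placeholder for a per-character NFC_Quick_Check lookup.
    # Code points below U+0300 are all NFC_QC=Yes; for the rest we answer
    # MAYBE, which is always safe but gives up the fast path.
    return YES if ord(ch) < 0x300 else MAYBE


def quick_check(s, qc=nfc_qc_stand_in):
    """Return YES, NO or MAYBE for "is s already normalized?"."""
    last_ccc = 0
    result = YES
    for ch in s:
        ccc = unicodedata.combining(ch)
        # Combining marks out of canonical order: definitely not normalized.
        if ccc != 0 and last_ccc > ccc:
            return NO
        check = qc(ch)
        if check == NO:
            return NO
        if check == MAYBE:
            result = MAYBE
        last_ccc = ccc
    return result


def is_nfc(s):
    """Use the quick check first; normalize only when it is inconclusive."""
    verdict = quick_check(s)
    if verdict == MAYBE:
        return unicodedata.normalize("NFC", s) == s
    return verdict == YES
```

With a real quick-check table behind qc, most strings would be answered by
the loop alone, and only the "maybe" cases would pay for normalization.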

I haven't made any contributions to Python before, but I heard attempting
such hazardous activity involves lots of hard knocks :-) Where should I
send the patch? I saw some patches posted here in other threads, but then again
http://www.python.org/dev/patches/ says to use SourceForge.

