[Python-checkins] r42954 - in python/trunk: Doc/lib/libunicodedata.tex Include/ucnhash.h Lib/encodings/idna.py Lib/stringprep.py Modules/unicodedata.c

Fri Mar 17 11:09:04 CET 2006

Martin v. Löwis wrote:
> [as Thomas points out, this is on python-checkins, so continuing in
>  English]
> 
>> Wrong, because the patch is considerably more complex than would
>> have been necessary to solve the problem, and from now on several
>> versions of the database will always have to be kept around, instead
>> of simply providing several modules for them that can be loaded
>> as needed.
> 
> Well, "the problem" to be solved was not merely to provide two versions
> of the database, but to do so in a space-efficient way. All the effort
> spent squeezing the size of the data would be wasted if that size
> then doubled just because two versions of the database must
> be provided.

Since the big tables of the database are static C data,
only the portions needed would ever get swapped into
memory, so this argument is rather weak.

Also, most users won't ever use the IDNA codec, so they'd
benefit from not having the extra complexity around.

>> It will also not be possible to split off the old versions without
>> problems, so that when the database is extended with further
>> fields or information, problems with keeping the database in sync
>> will arise.
> 
> There is no need to strip the old version. Parts of the library
> rely on the old version specifically, and these parts are not going
> to go away for the foreseeable future, nor is their need for
> version 3.2 of the Unicode database going to go away.
> IDNA is simply not going to change in that respect, for several
> years to come.
> 
> *If* there is a need to strip off 3.2 at some point, this is
> very easily done through a slight modification to
> makeunicodedata.py.

You're missing the point:

With the old version available in a separate module, users who
still need the old version could continue to compile it for
themselves.

If you change makeunicodedata.py, then there's no way back
for these users.

Given that the stringprep RFC has started out by pointing
to a specific Unicode version, it is likely that such
strong bindings to specific versions are going to happen
again in the future.

This makes it nearly impossible to remove the old database
version support, since there will always be some users who
have to rely on the availability of the old database
versions.

>>> That is exactly the trick: they don't have to. Supporting
>>> Unicode 3.2 costs only 18kB.
>>
>> That is indeed little.
> 
> That's because only the changed records are collected, plus a list
> of characters that were unassigned in 3.2 but are defined in 4.1.
> 
> In principle, there should not be a single changed record. In practice,
> a few records have changed - mostly changes to the character category.
> As a matter of principle, the names of a character never change in
> Unicode (this is a promise the consortium and ISO make), and, as a
> similar principle, the normalization never changes except for clear
> errors.
> 
> There are only five characters for which normalization changed
> between 3.2 and 4.1; I generate a C function for these.
> Interestingly enough, these changes are one of the primary reasons
> why some people in IETF despise the notion of updating IDNA:
> This would be a change in wire protocol, with potential
> security implications (i.e. it might allow for phishing). In
> these cases, the potential for phishing is really minimal -
> but it exists, which means proposals to update IDNA will meet
> strong resistance.
> 
> It might be possible to reduce the table of changes even further,
> using a three-level trie, if desired.
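
The delta scheme described in the quoted text (store only the changed
records, plus the set of characters that were unassigned in 3.2) can be
sketched roughly as follows. This is an illustration only, not the code
that makeunicodedata.py generates: the table names, the `category`
helper and all code points and categories below are made-up toy values.

```python
# Toy sketch of a "current table plus delta" lookup.
# All names and data here are hypothetical, not real Unicode data.

UNASSIGNED = "Cn"  # general category used for unassigned code points

# Full table for the current version: code point -> category.
CURRENT = {0x10: "Lu", 0x11: "Ll", 0x12: "Po", 0x13: "Nd"}

# Delta for the old version: only the records that differ.
CHANGED_IN_OLD = {0x12: "So"}

# Code points defined in the current version but unassigned in the old one.
UNASSIGNED_IN_OLD = {0x13}


def category(cp, old=False):
    """Return the category of cp, consulting the delta when old=True."""
    if old:
        if cp in UNASSIGNED_IN_OLD:
            return UNASSIGNED
        if cp in CHANGED_IN_OLD:
            return CHANGED_IN_OLD[cp]
    # Unchanged records are shared between both versions.
    return CURRENT.get(cp, UNASSIGNED)
```

With this layout the old version costs only the size of the two small
delta structures, at the price of one or two extra lookups per query
against the old database.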

As you've pointed out, the size is really irrelevant.

What about access speed?
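
One rough way to look at that question is to time the same lookup against
both databases. The sketch below assumes a Python build in which the 3.2
database is exposed as `unicodedata.ucd_3_2_0`; the `time_lookup` helper
and the choice of test character are mine, and the numbers it prints
depend on the machine and build.

```python
import timeit
import unicodedata


def time_lookup(db, ch="a", number=100_000):
    """Time `number` category lookups of ch against the given database."""
    return timeit.timeit(lambda: db.category(ch), number=number)


# Current database versus the 3.2 view of it.
current_seconds = time_lookup(unicodedata)
old_seconds = time_lookup(unicodedata.ucd_3_2_0)

print("current:", current_seconds)
print("3.2    :", old_seconds)
```

A single character is not a representative workload, so this only gives
a first impression of the per-call overhead.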

-- 
Marc-Andre Lemburg
eGenix.com
