[I18n-sig] codec aliases

M.-A. Lemburg mal@lemburg.com
Tue, 12 Dec 2000 10:59:24 +0100


Tamito KAJIYAMA wrote:
> 
> M.-A. Lemburg wrote:
> |
> | > One problem is that the alias "japanese.jis-7" does not work
> | > unless the corresponding original name "japanese.iso-2022-jp"
> | > has been looked up at least once.  This is because the alias is
> | > defined by means of getaliases() in japanese/iso_2022_jp.py, and
> | > this module is not imported until the original name is first
> | > looked up.  Is there a work-around for this problem?
> |
> | The only "work-around" I know of (which doesn't involve some
> | kind of boot code) is to define aliases via almost empty
> | modules which redirect the search function to the correct
> | codec, e.g.
> |
> | codec_alias.py:
> | ---------------
> | from codec_alias_target import *
> 
> I'm not sure how your work-around works.  How is codec_alias.py
> used?  Is that intended to be imported in site.py?
> 
> I also think that aliases cannot be defined only by importing
> a codec module, since the aliases are defined by means of
> getaliases(), and this function is not invoked until the
> original name corresponding to the aliases is looked up first.
> 
> I wonder if I need to put a call of codecs.register() somewhere
> in the modularized codecs...

The above scenario should enable you to write one codec,
say "main_codec.py", which provides the Real Thing, and then
add aliases to this codec by adding any number of additional
redirection codec modules, e.g. "codec_alias_1.py" and
"codec_alias_2.py", which all contain just one line:

from main_codec import *
 
Now, when the search function is queried for e.g.
"codec-alias-1" it will import codec_alias_1.py and then
apply the usual processing (even register the additional
aliases). However, the functionality is provided by
main_codec.py.

There's no need to call any registration function prior
to using one of the codec aliases in this setup. The import
mechanism will play the part of the aliasing engine in this
case.
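
To make the redirection idea concrete, here is a minimal, self-contained
sketch. It is not the encodings package itself: the search function below
is a simplified stand-in written for this example, the codec in
main_codec.py merely wraps UTF-8, and the module names are the
hypothetical ones used above. It assumes a current Python 3.

```python
import codecs
import importlib
import os
import sys
import tempfile

# Hypothetical codec module: it provides the Real Thing (here just UTF-8).
MAIN_CODEC_SOURCE = '''
import codecs

def getregentry():
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        name="main_codec",
        encode=utf8.encode,
        decode=utf8.decode,
    )
'''

# The redirection module consists of this single line.
ALIAS_SOURCE = "from main_codec import *\n"

# Write both modules to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
for filename, source in [("main_codec.py", MAIN_CODEC_SOURCE),
                         ("codec_alias_1.py", ALIAS_SOURCE)]:
    with open(os.path.join(module_dir, filename), "w") as f:
        f.write(source)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

KNOWN_MODULES = ("main_codec", "codec_alias_1")

def search_function(encoding):
    # Simplified stand-in for the standard search function: map the
    # encoding name to a module name and import that module.
    modname = encoding.lower().replace("-", "_")
    if modname not in KNOWN_MODULES:
        return None
    mod = importlib.import_module(modname)
    return mod.getregentry()

codecs.register(search_function)

# No registration call for the alias is needed; the import does the work.
encoded = "abc".encode("codec-alias-1")
alias_info = codecs.lookup("codec-alias-1")
```

Looking up "codec-alias-1" imports codec_alias_1.py, which pulls in
getregentry() from main_codec.py, so the alias ends up served by the main
codec without main_codec having been looked up beforehand.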

> | > The other problem is that hyphens and underscores are
> | > significant in an alias, although they are not in an original
> | > name.  A work-around is to define all combinations of hyphens
> | > and underscores for an alias (e.g. defining both
> | > "japanese.jis-7" and "japanese.jis_7"), but this seems not a
> | > good idea for me.
> |
> | Codec aliases returned by codec.getaliases() must always use
> | the underscore naming scheme.
> |
> | The standard search function will convert hyphens to underscores
> | *before* applying the alias mapping, so there's no need to worry
> | about different combinations of hyphens and underscores in
> | the alias names (unless I've overlooked something here).
> 
> Returning names with underscores in getaliases() seems not
> sufficient.  In encodings/__init__.py:
> 
> def search_function(encoding):
>     ...
>     # Cache the encoding and its aliases
>     _cache[encoding] = entry
>     try:
>         codecaliases = mod.getaliases()
>     except AttributeError:
>         pass
>     else:
>         for alias in codecaliases:
>             _cache[alias] = entry
>     return entry
> 
> The names returned by mod.getaliases() are put into _cache as it
> is, so equivalent names with hyphens will not be defined.

So I have indeed overlooked something. Thanks for pointing me
at it (I don't currently have time to test what I write here,
so please bear with me). The aliases should really be added to
the aliases.aliases dictionary instead of _cache, and the code
should also prevent overwrites of already existing aliases (since
these would cause strange and unwanted effects).

I'll think about this some more and check in a patch to implement
the above scheme.
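
As a rough sketch of that scheme (not the eventual patch): the dictionary
below is a stand-in for aliases.aliases, and normalize(),
add_codec_aliases() and resolve() are hypothetical helper names made up
for this example. The point is only that aliases are stored in normalized
form, looked up after normalization, and never overwrite an existing
entry.

```python
# Stand-in for the aliases.aliases dictionary (alias -> codec module name).
aliases = {"latin_1": "iso8859_1"}

def normalize(name):
    # Lowercase and convert hyphens to underscores, so that hyphens and
    # underscores are interchangeable in lookups.
    return name.lower().replace("-", "_")

def add_codec_aliases(alias_map, modname, codecaliases):
    # Register the names a codec reports via getaliases(); entries that
    # already exist are left untouched.
    for alias in codecaliases:
        alias_map.setdefault(normalize(alias), modname)

def resolve(alias_map, encoding):
    # Normalize first, then apply the alias mapping.
    name = normalize(encoding)
    return alias_map.get(name, name)

add_codec_aliases(aliases, "japanese.iso_2022_jp", ["japanese.jis_7"])

# An attempt to redefine an existing alias is ignored.
add_codec_aliases(aliases, "some_other_codec", ["latin-1"])

jis7_hyphen = resolve(aliases, "japanese.jis-7")
jis7_underscore = resolve(aliases, "japanese.jis_7")
```

With this, both "japanese.jis-7" and "japanese.jis_7" resolve to the same
codec module, and the pre-existing "latin_1" entry keeps its original
target.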

Thanks again,
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/


BTW: Your email occasionally bounces -- e.g. the last message
I sent you came back to me (fortunately, you still seem to get the
i18n-sig messages).