[Python-Dev] default encodings (was: Internationalization Toolkit)

Greg Stein gstein@lyra.org
Thu, 11 Nov 1999 15:14:55 -0800 (PST)


On Thu, 11 Nov 1999, Mark Hammond wrote:
> Marc writes:
> > > modes are evil.  python is not perl.  etc.
> >
> > But a requirement by the customer... they want to be able to
> > set the locale
> > on a per thread basis. Not exactly my preference (I think all locale
> > settings should be passed as parameters, not via globals).
> 
> Sure - that is what this customer wants, but we need to be clear about
> the "best thing" for Python generally versus what this particular
> client wants.

Ha! I was getting ready to say exactly the same thing. Are we building
Python for a particular customer, or are we building it to Do The Right
Thing?

I've been getting increasingly annoyed at "well, HP says this" or "HP
wants that." I'm ecstatic that they are a Consortium member and are
helping to fund the development of Python. However, if that means we are
selling Python's soul to corporate wishes rather than programming and
design ideals... well, it reduces my enthusiasm :-)

>...
> I agree that having a default encoding that can be changed is a bad
> idea.  It may make 3 line scripts that need to print something easier
> to work with, but at the cost of reliability in large systems.  Kinda
> like the existing "locale" support, which is thread specific, and is
> well known to cause these sorts of problems.  The end result is that
> in your app, you find _someone_ has changed the default encoding, and
> some code no longer works.  So the solution is to change the default
> encoding back, so _your_ code works again.  You just know that whoever
> it was that changed the default encoding in the first place is now
> going to break - but what else can you do?

Yes! Yes! Example #2.

My first example (import hooks) was shrugged off by some as "well, nobody
uses those." Okay, maybe people don't use them (but I believe that is
*because* of this kind of problem).

In Mark's example, however... this is a definite problem. I ran into this
when I was building some code for Microsoft Site Server. IIS was setting a
different locale on my thread -- one that I definitely was not expecting.
All of a sudden, strlwr() no longer worked as I expected -- certain
characters didn't get lower-cased, so my dictionary lookups failed because
the keys were not all lower-cased.

Solution? Before passing control from C++ into Python, I set the locale to
the default locale. Restored it on the way back out. Extreme measures, and
costly to do, but it had to be done.
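
For illustration only, here is a rough sketch of that save/restore pattern
written in modern Python rather than the C++ used in the case above. The
helper name default_locale is hypothetical, and the sketch assumes that
LC_CTYPE is the category that matters for case conversion and that "C" is
the locale the callee expects.

```python
import locale
from contextlib import contextmanager


@contextmanager
def default_locale(category=locale.LC_CTYPE, name="C"):
    """Temporarily force a known locale, then restore the caller's.

    Hypothetical helper: the name and defaults are assumptions made
    for this sketch, not part of any existing API.
    """
    # Calling setlocale() with no second argument only queries the
    # current setting; it does not change anything.
    saved = locale.setlocale(category)
    locale.setlocale(category, name)
    try:
        yield
    finally:
        # Put back whatever the host had installed.
        locale.setlocale(category, saved)


def lookup(table, key):
    # Code in here can rely on the known locale being in effect.
    with default_locale():
        return table.get(key.lower())
```

Note that setlocale() changes process-wide state in CPython, so the wrapper
is itself a mode switch and is not safe if other threads depend on the
locale at the same time. That is rather the point being made here.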

I think I'll pick up Fredrik's phrase here...

(chanting) "Modes Are Evil!"  "Modes Are Evil!"  "Down with Modes!"

:-)

> Having a fixed, default encoding may make life slightly more difficult
> when you want to work primarily in a different encoding, but at least
> your system is predictable and reliable.

*bing*

I'm with Mark on this one. Global modes and state are a serious pain when
it comes to developing a system.

Python is very amenable to utility functions and classes. Any "customer"
can use a utility function to manually do the encoding according to a
per-thread setting stashed in some module-global dictionary (map thread-id
to default-encoding). Done. Keep it out of the interpreter...
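
A minimal sketch of what such a utility module could look like, in modern
Python. Every name in it (set_thread_encoding, get_thread_encoding, encode,
FALLBACK_ENCODING) is hypothetical, and the UTF-8 fallback is an assumption
made for the example, not something proposed in this thread.

```python
import threading

# Module-global map: thread id -> default encoding name.
_default_encodings = {}
_lock = threading.Lock()

# Assumed fallback for threads that never registered a preference.
FALLBACK_ENCODING = "utf-8"


def set_thread_encoding(name):
    """Record the calling thread's preferred encoding."""
    with _lock:
        _default_encodings[threading.get_ident()] = name


def get_thread_encoding():
    """Return the calling thread's encoding, or the fallback."""
    with _lock:
        return _default_encodings.get(threading.get_ident(),
                                      FALLBACK_ENCODING)


def encode(text, encoding=None):
    """Encode text explicitly; the per-thread value is only a default."""
    return text.encode(encoding or get_thread_encoding())
```

A thread that wants Latin-1 calls set_thread_encoding("latin-1") once and
then uses encode(text); other threads are unaffected, and nothing inside the
interpreter changes behaviour. Thread ids can be reused after a thread
exits, so a real version would want to clear stale entries.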

Cheers,
-g

--
Greg Stein, http://www.lyra.org/