Reliably getting/setting the (current) encoding name?

Mike C. Fletcher mcfletch at rogers.com
Fri Jan 31 00:06:36 EST 2003


I've just discovered a bug in one of my libraries where I bomb out when 
a unicode value is passed which uses characters > 127 in the local 
code-page (e.g. u"á", a-accent-aigu; sorry for the spelling, it's been a 
while since French class).  These values are being returned by the GUI 
library as unicode objects (under certain versions of the GUI library), 
but the application needs the values in local-code-page str objects 
(yes, eventually it will all move to Unicode, but that's a much bigger 
project than I can tackle just at the moment).

So, thinks I, just call value.encode() on the objects.
    UnicodeError: ASCII encoding error: ordinal not in range(128)
Weird, thinks I, why isn't the platform default encoding available?  Oh 
well, let's use locale to set the default locale and then move on with 
our lives:
    import locale
    locale.setlocale(locale.LC_ALL, '')
but value.encode() still gives:
    UnicodeError: ASCII encoding error: ordinal not in range(128)

Well, thinks I, getting a little exasperated, let's just find out what 
the name of the default encoding is and pass that explicitly to encode 
every place we do a string-formatting print or log or attribute-set (ick):
    locale.getdefaultlocale()[1]
seems to be what I'm after.  It's a Windows-specific name for the 
encoding (cp1252), but I hope in my heart that it'll be platform 
appropriate and == an encoding-name somewhere.  
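Rather than just hoping, the name could be checked against the codec 
registry before it is used.  A minimal sketch, assuming that 
codecs.lookup() raises LookupError for names it does not know; the 
helper name usable_codec_name and the "ascii" fallback are my own 
inventions for illustration, not anything the locale or codecs modules 
provide:

```python
import codecs


def usable_codec_name(name, fallback="ascii"):
    """Return `name` if Python has a codec registered for it, else `fallback`.

    `name` is whatever the locale machinery reported, e.g. the second
    element of locale.getdefaultlocale(); it may be None or a name the
    codec registry does not recognise.
    """
    if not name:
        return fallback
    try:
        # lookup() only succeeds for names (or aliases) of real codecs
        codecs.lookup(name)
    except LookupError:
        return fallback
    return name
```

With that, value.encode(usable_codec_name(reported_name)) would at 
least never fail on an unknown encoding name, though it can still fail 
on characters the chosen codec cannot represent.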

But wait, this library is supposed to be reused in other applications.  
Shouldn't I be using the current locale instead? No problem, locale has 
a handy-dandy getlocale() function.  Unfortunately, what that returns is 
a bare number, rather than an encoding-name.  Sure, I could just add 
"cp" to everything, but I'm pretty sure that'd break for non-MS platforms.

The codecs module doesn't appear to have a "get default codec name" 
function.  So I'm asking the question:

    How does one reliably get the currently-active default codec name 
(i.e. the one that reflects the currently-set locale) as suitable for 
use with unicode.encode( ... )?

and ideally:

    How does one set the default encoding to be used by:
        str( unicode )
    and
        "My log message: value = %s"%( unicode )
    ?

to avoid ASCII-encoding errors everywhere, without having to add 
conversion code all over the joint?
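For reference, the fallback I'd like to avoid looks roughly like the 
sketch below: one helper that every print/log/attribute-set site would 
have to call.  The helper name to_local and the choice of the "replace" 
error handler are assumptions for illustration only, and the sketch 
also assumes locale.getpreferredencoding() is available (it is not in 
every Python version) and returns a name the codec machinery accepts.

```python
import locale

# unicode on Python 2, str on Python 3
text_type = type(u"")


def to_local(value, encoding=None, errors="replace"):
    """Encode a unicode value to a local-code-page byte string.

    Non-unicode values are returned unchanged.  If no encoding is
    given, the locale's preferred encoding is used, with "ascii" as a
    last resort.  Characters the codec cannot represent are replaced
    rather than raising UnicodeError.
    """
    if not isinstance(value, text_type):
        return value
    if encoding is None:
        encoding = locale.getpreferredencoding() or "ascii"
    return value.encode(encoding, errors)
```

Usage would then be "My log message: value = %s" % (to_local(value),) 
at every call site, which is exactly the sort of clutter I'm hoping 
there is a way around.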

Thoughts appreciated,
Mike

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/







