I like Unicode more than I used to...

SUZUKI Hisao suzuki611 at oki.com
Mon Feb 24 22:47:48 EST 2003


In message <DHo6a.224849$0v.6336159 at news1.tin.it>, Alex Martelli wrote:
> gabor wrote:
>    ...
> > is there a way to specify in python some kind of default-encoding? i
> > mean can i somehow tell him that when printing unicode strings, i always
> > want to use utf-8, so that .encode('utf-8') isn't necessary?
> 
> Yes, on a site-wide basis: see the file site.py in your site-packages
> directory.  You can either change the blocks guarded by "if 0:" in
> that file itself, or add a sitecustomize.py file in the same
> directory that calls sys.setdefaultencoding('utf-8') with the
> same effect.

To be precise, "site.py" itself is in the parent directory of
"site-packages", while you can put your "sitecustomize.py" in
"site-packages".

> Of course, if you do that you're likely to write programs that
> will only run on your site (or other similarly customized) and
> not be suitable for general distribution to other sites.  But
> if that's OK with you, Python lets you do it.

If you don't mind my asking, I wonder whether you say that from
actual experience.  My own experience suggests otherwise.

For example, what if I write my program as follows?

   u = s.decode('euc-jp')
   ...(some work)...
   print u.encode('euc-jp')

You probably cannot use it, unless your data happens to be in EUC-JP.
What if I write it as follows?

   u = s.decode()
   ...(some work)...
   print u.encode()

Now you probably can use it for your own data, provided you set the
default encoding appropriately in your "site.py".

If you are going to distribute your program to other sites, it is
important not to guess which encoding your users will choose.  Relying
on the default encoding is fairly good practice for many applications.
(Of course, you should at least test your program with the default
encoding set to 'ascii' before you distribute it.)
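
To illustrate the kind of failure such a test catches, here is a
minimal sketch.  It names the codec explicitly instead of changing the
site default, and the helper encodable() and the sample text are made
up for illustration.

```python
def encodable(u, encoding):
    """Return True if the Unicode string u can be encoded with the codec."""
    try:
        u.encode(encoding)
        return True
    except UnicodeError:
        return False

sample = u'\u65e5\u672c\u8a9e'        # three kanji, written as escapes

print(encodable(sample, 'euc-jp'))    # the case of a site set to 'euc-jp'
print(encodable(sample, 'ascii'))     # the case of a site left at 'ascii'
print(encodable(u'plain text', 'ascii'))
```

A program that relies on the default should at least fail cleanly, or
not at all, in the second case.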

The next best way is to provide an ad hoc way to customize the
encoding.  For example, start your program as follows:

   ENCODING = 'euc-jp'   # Please replace 'euc-jp' by your encoding
   def dec(s): return s.decode(ENCODING)
   def enc(u): return u.encode(ENCODING)
   ...
   u = dec(s)
   ...(some work)...
   print enc(u)   

This is sometimes necessary if you allow users to use ASCII-incompatible
encodings.  Its drawback is that it is ad hoc, so there is generally no
consistency across different programs.
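
A small sketch of what "ASCII-incompatible" means in practice: with
such an encoding even plain ASCII text comes out as different bytes.
I picked 'utf-16-le' here only as an assumed example of such an
encoding; the helpers are the same dec/enc pair as above.

```python
ENCODING = 'utf-16-le'   # assumption: an ASCII-incompatible encoding

def dec(s):
    """Decode a byte string into a Unicode string."""
    return s.decode(ENCODING)

def enc(u):
    """Encode a Unicode string into a byte string."""
    return u.encode(ENCODING)

text = u'abc'
raw = enc(text)

print(raw != b'abc')      # the bytes are not the ASCII bytes
print(len(raw))           # two bytes per character here
print(dec(raw) == text)   # the helpers still round-trip the text
```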

-- SUZUKI Hisao




