Unicode and Zipfile problems

Martin v. Löwis martin at v.loewis.de
Fri Nov 7 16:05:15 EST 2003


gerson.kurz at t-online.de (Gerson Kurz) writes:

> Thanks, but could you please elaborate a little on that? Are you
> suggesting I write my own import hook to filter the warning there? 

import warnings
warnings.filterwarnings('ignore', 'Non-ASCII character .*/peps/pep-0263',
    DeprecationWarning)

(see http://groups.google.com/groups?selm=ZUAeb.149502%24hE5.5065101%40news1.tin.it)
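Here is a minimal sketch for checking that such a filter does what is intended. The message argument of filterwarnings is a regular expression that must match the start of the warning text, case-insensitively. Python 3 reads source files as UTF-8 by default, so the PEP 263 warning no longer appears there. The sketch therefore issues a look-alike DeprecationWarning by hand. The sample message text and the helper name surfaced() are illustrative assumptions, not the exact wording Python 2.3 used.

```python
import warnings

PEP263_PATTERN = "Non-ASCII character .*/peps/pep-0263"

def surfaced(message):
    """Hypothetical helper: True if a DeprecationWarning with this text gets through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # baseline: show every warning
        # Added last, so it sits at the front of the filter list and wins.
        warnings.filterwarnings("ignore", PEP263_PATTERN, DeprecationWarning)
        warnings.warn(message, DeprecationWarning)
    return bool(caught)

# Illustrative text modeled on the old warning (an approximation).
old_style = ("Non-ASCII character '\\xe4' in file foo.py on line 3, but no "
             "encoding declared; see http://www.python.org/peps/pep-0263.html "
             "for details")

print(surfaced(old_style))                 # False: matched and ignored
print(surfaced("some other deprecation"))  # True: other warnings still show
```

Because the filter is tied to the PEP 263 URL in the message, unrelated deprecation warnings are not hidden along with it.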

> Basically, what is annoying about the way python handles unicode is
> this:

We are back to square one, then. What, specifically, do you suggest
changing?

> a) you get warnings when you do stuff you've been doing for years
> without ever getting any warning. 

Yes, but that happens not only for Unicode strings; try importing
regex for another example.

> b) it forces you to be correct - even when you don't care. 

Yes, but it does so all over the place:

x = 3.14

is different, in Python, from

x = 3,14

You have to know whether you mean a decimal point, or a tuple comma,
and you have to know how to spell either.
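A two-line illustration of the difference:

```python
x = 3.14   # decimal point: a float
y = 3,14   # comma: a two-element tuple

print(type(x).__name__, x)   # float 3.14
print(type(y).__name__, y)   # tuple (3, 14)
```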

Python does not automatically correct mistakes that you make.

> Maybe it's time for a "UNICODE for dummies" section in the python
> manual. But maybe it's also time for a more relaxed way of handling all
> that?

Contributions are welcome.

> So, back to the two ways in which the Python unicode handling is
> annoying: it would be fine if you could easily change "strict"
> encoding to "relaxed". (I'm not sure about that, but toying with my
> dontcare.py I see that there is a parameter to the encoding/decoding
> functions.) So maybe one could set default encoding = OS locale
> encoding (see below) and disable exceptions when something goes wrong.

You can replace "strict" with "replace" in Unicode error handling, if
this is what you are suggesting.

Of course, that would not have helped in your original problem, as it
would have replaced the header bytes of the zip header with question
marks, when converting the header to a Unicode object.
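To make the effect concrete, here is a sketch in Python 3 terms, where bytes and str are separate types. The byte string starts with the real zip local file header signature PK\x03\x04, but the remaining bytes are made up. They are a stand-in, not output from zipfile. The point is only what "strict" and "replace" do to bytes outside ASCII. Decoding with "replace" yields U+FFFD, and encoding that back to ASCII with "replace" gives question marks, so the original header bytes are lost without any exception.

```python
# Made-up stand-in for part of a zip header; the last two bytes are >= 0x80.
header = b"PK\x03\x04\x14\x00\x00\x00\x08\x00\xa5\x8c"

try:
    header.decode("ascii")            # "strict" is the default error handler
    strict_reason = None
except UnicodeDecodeError as exc:
    strict_reason = exc.reason        # "ordinal not in range(128)"
print("strict:", strict_reason)

text = header.decode("ascii", "replace")      # bad bytes become U+FFFD
roundtrip = text.encode("ascii", "replace")   # U+FFFD is not ASCII, becomes b"?"
print(roundtrip)             # b'PK\x03\x04\x14\x00\x00\x00\x08\x00??'
print(roundtrip == header)   # False: the header was silently corrupted
```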

But let's assume we support a way of setting the default error
handling to "replace".

> a) you DONT get warnings when you do stuff you've been doing for years
> without ever getting any warning. 

Right.

> b) it DOES NOT force you to be correct - even when you don't care. 

Wrong. The code that you had been using for years would still stop
working; you just would not get an exception. Instead, you would get a
corrupted zip file.

> so I at least would be happy with that. 

I don't believe you would.

> >Yes, please do. What is the difference between the C implementation
> >and the OS implementation?
> 
> a) Last time I checked, strftime gives you a date and time
> representation for the current locale. As in: one date and time
> representation ("%x %X"). However, you have like long and short dates.
> Ask the simple question: do you put the time before the date or after?

Time first. What does that have to do with C implementation and OS
implementation?

> The OS version is a set of API functions called the "National
> Language Support Functions", which contains the functions
> GetDateFormat and GetTimeFormat which have a completely different
> syntax and are used by other applications (such as, yuk, VB).

You are apparently talking about Microsoft Windows here. Yes, that
particular OS has an API that is different from standard C. Fortunately,
the C library that comes with Microsoft's compilers uses the underlying
API to expose OS functionality. So by calling the C library, you call OS
routines.

> c) I run an English version of Windows 2000, but I have German locale
> settings. Windows distinguishes between "system locale" and "user
> locale". Many applications, virtually all of them, use the user locale
> settings (that is, German). Python uses the C default, which is, well,
> I'm not really sure whether or not it's English, but it certainly isn't
> German by (OS) default.

Yes, on startup, Python is not locale-aware.
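To see what this means in practice, you can query each locale category without changing it. This is a sketch assuming current Python 3. The locale module documentation notes that a program starts in the C locale, with one exception: LC_CTYPE is set from the user's preferences at startup. So LC_CTYPE may not read "C", while LC_TIME and LC_NUMERIC normally do. The category list below is just a sample.

```python
import locale
import time

# Calling setlocale() with only a category queries the current setting
# without changing it.
current = {name: locale.setlocale(getattr(locale, name))
           for name in ("LC_CTYPE", "LC_TIME", "LC_NUMERIC")}
print(current)

# Until the program calls setlocale() itself, %x and %X use the C locale.
print(time.strftime("%x %X"))
```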

> d) The documentation for the locale format says you should set "de" or
> "de_DE", but "GERMAN" is the actual locale name for "german". But how do
> you know? And how do you add functionality to your application to
> always use the user's locale (i.e. German on my English system, as any
> other app, including stupid MFC apps, can do)?

Don't give any locale name at all. If you want the user's settings to
be used, pass an empty string:

locale.setlocale(locale.LC_ALL, "")

That gives you what I think you mean by "OS functions".
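Here is a minimal sketch of that approach. The fallback to "C" is an assumption added for robustness. setlocale raises locale.Error when the environment names a locale that is not installed. The printed locale name and date format depend entirely on the machine the code runs on.

```python
import locale
import time

try:
    # An empty string asks the C library for the user's settings
    # (environment variables on POSIX, the user locale on Windows).
    chosen = locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # Assumed fallback: keep the portable C locale if the user's is unusable.
    chosen = locale.setlocale(locale.LC_ALL, "C")

print("locale in effect:", chosen)

# %x and %X now follow the user's date and time conventions.
stamp = time.strftime("%x %X")
print(stamp)
```

Note that setlocale changes process-wide state, so an application would typically call it once, early, rather than from library code.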

Regards,
Martin



