Unicode and Zipfile problems

Gerson Kurz gerson.kurz at t-online.de
Fri Nov 7 00:41:29 EST 2003


>   No, these applications link to what Microsoft calls the "ANSI" versions
>of the Windows API. That is important because your initial problem would not
>have occurred if all your file names were ASCII. Instead they contained
>character 0x88 which was probably a circumflex modifier although it could
>have been a Euro symbol.

The filename was DENOMALG.INI. The problem was not the filename; it
was that the binary header in front of it (built with struct.pack)
contained the byte 0x88. The problem occurred because Python assumes
that if one tiny part of an expression is unicode (even if every
character in that tiny part is below 0x80), everything in the
expression has to be promoted to unicode, BUT woe betide you if the
byte strings contain anything above 0x7F, because they are implicitly
decoded as ASCII. You can get an exception simply by writing
chr(0xFF)+u"", where the unicode part is an empty string!
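A minimal Python 3 sketch of the coercion described above. In Python 2,
str + unicode implicitly decodes the byte string with the default
encoding (normally ASCII); Python 3 refuses to mix bytes and str at all
(it raises TypeError), so the helper py2_style_concat emulates the old
behaviour explicitly. The helper names and the 16-bit header layout are
hypothetical, chosen only so that the header contains the byte 0x88, and
latin-1 is just one example of an explicit codec.

```python
import struct


def py2_style_concat(data: bytes, text: str) -> str:
    """Emulate Python 2's str + unicode: the byte string is decoded
    implicitly with the default 'ascii' codec before concatenation."""
    return data.decode("ascii") + text


def concat_fails(data: bytes, text: str) -> bool:
    """Return True if the Python 2 style concatenation would raise."""
    try:
        py2_style_concat(data, text)
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical record layout: a little-endian 16-bit field whose value
# produces the byte 0x88, followed by the pure-ASCII file name.
header = struct.pack("<H", 0x88)
record = header + b"DENOMALG.INI"

print(concat_fails(b"DENOMALG.INI", ""))  # False: the name alone is ASCII
print(concat_fails(record, ""))           # True: the 0x88 header byte
print(concat_fails(bytes([0xFF]), ""))    # True: chr(0xFF) + u"" in Python 2

# Decoding explicitly with a codec that maps every byte value
# (latin-1 is only an example) sidesteps the implicit ASCII step.
fixed = record.decode("latin-1") + ""
```

The empty unicode string makes no difference: the failure happens while
decoding the byte string, before any concatenation takes place.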

>If you are interested in enhancing Python then produce a concrete
>proposal.

I am going to think about that. Here are some ideas that immediately
come to mind.

There should be two modes, easily configurable: "dontcare" and
"international". 

"dontcare" can be implemented easily enough by changing site.py (I
only have to find out how to get rid of that s***** DeprecationWarning
that 2.3 introduced for source files with German comments, for
Christ's sake! Comments!).
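For reference, that warning comes from PEP 263 (implemented in Python
2.3): a source file containing non-ASCII bytes is expected to declare its
encoding in a magic comment on its first or second line. The sketch below
uses Python 3's tokenize.detect_encoding to show how the declaration is
read; the German sample comment and the helper name source_encoding are
made up for illustration. Note that Python 3 is stricter than 2.3: it
assumes UTF-8 by default, so undeclared Latin-1 bytes are rejected rather
than merely warned about.

```python
import io
import tokenize

# A German comment encoded in Latin-1 ("ä" becomes the single byte 0xE4)
# with no encoding declaration.
undeclared = "# Beträge in Pfennig\nx = 1\n".encode("latin-1")

# PEP 263: a magic comment on line 1 or 2 names the source encoding.
declared = b"# -*- coding: latin-1 -*-\n" + undeclared


def source_encoding(source: bytes):
    """Return the encoding the tokenizer would use, or None if rejected."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError:
        # Undeclared bytes that are not valid UTF-8 are refused.
        return None
    return encoding


print(source_encoding(undeclared))  # None
print(source_encoding(declared))    # iso-8859-1 (normalized "latin-1")

# With the declaration in place, the source compiles and runs normally.
namespace = {}
exec(compile(declared, "<example>", "exec"), namespace)
```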

For "international" mode, it's OK to be as strict as now. But I also
think there should be a more elaborate concept: you don't
internationalize your application simply by using unicode strings.

- there is no monetary data type in Python (I see this was discussed
on the list a few days ago)
- in my "day job" C++ application, which is used across Europe and
America, I store amounts in SCU (smallest currency unit, e.g. cent or
pfennig), but I need different ways of displaying that information.
For example, take the amount 100000 (in cents). In the standard
output, you want to see 1.000,00 Euro. If you edit a field that
contains this value, you want to edit 1000,00 (that is, without
"Euro" and without the "." thousands separator). If a value
represents banknotes, you want to see 200 Euro and not 200,00 Euro
(nobody talks about notes that way).
- the locale module relies on the C library's locale implementation
rather than the OS settings (which should take priority), has
documentation errors, and is generally, well, ugly. strftime, need I
say more?
- the gettext API sucks, especially if you need to change text along
the way. Python is an interpreted language, so why not simply use one
or more modules like "strings.py" that contain the strings, referred
to by identifiers?
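A rough sketch of the three display modes from the SCU bullet, assuming
amounts are stored as integer cents and hardcoding the German/Euro
conventions ("." for thousands, "," for decimals). The function name
format_scu and the mode names are made up for illustration; this is not
the C++ implementation mentioned above, and a real application would pick
separators per country instead of hardcoding them.

```python
def format_scu(amount: int, mode: str = "display", currency: str = "Euro") -> str:
    """Format an amount given in the smallest currency unit (cents).

    Hypothetical modes:
      "display": "." thousands separator, decimal comma, currency name
      "edit":    decimal comma only, no separator, no currency name
      "notes":   whole units only, plus currency name
    """
    sign = "-" if amount < 0 else ""
    units, cents = divmod(abs(amount), 100)
    if mode == "display":
        # Python's "," grouping option, then swap in the German separator.
        grouped = f"{units:,}".replace(",", ".")
        return f"{sign}{grouped},{cents:02d} {currency}"
    if mode == "edit":
        return f"{sign}{units},{cents:02d}"
    if mode == "notes":
        if cents:
            raise ValueError("banknote amounts must be whole units")
        return f"{sign}{units} {currency}"
    raise ValueError(f"unknown mode: {mode!r}")


print(format_scu(100000))               # 1.000,00 Euro
print(format_scu(100000, mode="edit"))  # 1000,00
print(format_scu(20000, mode="notes"))  # 200 Euro
```

Keeping the stored value as an integer number of cents avoids
floating-point rounding; only the formatting step differs between modes.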

But I digress.

You know, Python is a great language; it is a lot better than C or
C++, and I love it and everything. And the core developers are doing a
really good job; I am very thankful for that, and I mean it.

It is just that sometimes I get carried away because things turn out
not to be as intuitive as they should be. Python spoils you, it really
does: for *most* things it is as intuitive as it can get, so the
things that are perhaps too complex to be intuitive suddenly start to
annoy you.
