Unicode and Zipfile problems

Neil Hodgson nhodgson at bigpond.net.au
Thu Nov 6 15:54:18 EST 2003


Gerson Kurz:

> Of course I don't have the source for these, but
> the Dependency Viewer (from the Microsoft SDK) will show you that all
> of these link with the ASCII-Versions of the Windows API.

   No, these applications link to what Microsoft calls the "ANSI" versions
of the Windows API, which interpret 8-bit strings in the system's active code
page rather than as 7-bit ASCII. That distinction matters because your initial
problem would not have occurred if all your file names had been ASCII. Instead
they contained character 0x88, which was probably a circumflex modifier
(U+02C6 in code page 1252), although it could have been a Euro symbol (U+20AC
in code page 1251).
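   To make the code page point concrete, here is a small sketch (written for
modern Python 3, which post-dates this message) that decodes the single byte
0x88 under two Windows code pages and under ASCII. The helper name
decode_under is hypothetical; cp1252, cp1251 and ascii are standard Python
codec names.

```python
# The byte 0x88 lies outside 7-bit ASCII, so its meaning depends on
# which "ANSI" code page is used to interpret it.
RAW = b"\x88"


def decode_under(codec):
    """Decode RAW with the given codec (hypothetical helper for illustration)."""
    return RAW.decode(codec)


for codec in ("cp1252", "cp1251"):
    ch = decode_under(codec)
    print(codec, "U+%04X" % ord(ch))

try:
    decode_under("ascii")
except UnicodeDecodeError as exc:
    # ASCII defines no character at 0x88, so decoding fails outright.
    print("ascii:", exc.reason)
```

   The same byte comes out as U+02C6 under code page 1252 and U+20AC under
code page 1251, while the ASCII codec rejects it, which is why a file name
that round-trips on one machine can be mangled on another.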

> Seems like
> there are a lot of broken apps out there! And the most shocking of all
> - this holds true even for python23.dll: ShellExecuteA,
> RegQueryValueExA, LoadStringA, LoadLibraryExA - it's all ASCII!
> Somebody better call for a major unicode cleanup!

   We will gradually extend Unicode support to more of Python. For example,
os.popen would be a good candidate for Unicode support.

> But OK, I agree, the subject is somewhat boring - even though every
> week somebody else runs into problems with this (see the thread
> "Strange problem with encoding" from today), there will probably be no
> change introduced in Python at this point on this subject.

   If you are interested in enhancing Python, then produce a concrete
proposal.

   From my point of view, Unicode has been a great source of simplification,
as it has reduced the need for code page conversion and the potential for
loss of information. In the future, more of the software infrastructure will
be able to handle Unicode. ZIP files produced by some tools can already store
Unicode file names, although there is no published standard for this.
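   For comparison, later revisions of the PKWARE APPNOTE did standardise
this: general purpose flag bit 11 (0x800) marks a file name stored as UTF-8,
and Python 3's zipfile module sets that bit when a name is not pure ASCII.
The sketch below (the helper write_and_inspect is hypothetical) writes a
one-entry archive in memory and reports whether the bit was set on read-back.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def write_and_inspect(name, data=b"hello"):
    """Store one entry named `name`, then return (name read back, UTF-8 flag set?)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_NAME_FLAG)


print(write_and_inspect("plain.txt"))              # ('plain.txt', False)
print(write_and_inspect("r\u00e9sum\u00e9.txt"))   # ('résumé.txt', True)
```

   ASCII names are written without the flag (and read back as code page
437), so only archives containing non-ASCII names depend on the reader
honouring bit 11.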

   Example product:
http://www.componentsource.com/Catalog/XceedZipCompressionLibrary_505440.htm
   "Stores and retrieves the latest zip file format extensions, allowing
Unicode filenames and NT file attributes, extra time stamps and security
permissions to be stored in the zip file"

ZIP format definition:
http://www.pkware.com/products/enterprise/white_papers/appnote.html

   Neil