[Python-checkins] CVS: python/nondist/peps pep-0277.txt,NONE,1.1 pep-0000.txt,1.150,1.151

Barry Warsaw bwarsaw@users.sourceforge.net
Sat, 12 Jan 2002 16:13:40 -0800


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv31340

Modified Files:
	pep-0000.txt 
Added Files:
	pep-0277.txt 
Log Message:
PEP 277, Unicode file name support for Windows NT, Neil Hodgson


--- NEW FILE: pep-0277.txt ---
PEP: 277
Title: Unicode file name support for Windows NT
Version: $Revision: 1.1 $
Last-Modified: $Date: 2002/01/13 00:13:38 $
Author: neilh@scintilla.org (Neil Hodgson)
Status: Draft
Type: Standards Track
Created: 11-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

    This PEP discusses supporting access to all files possible on
    Windows NT by passing Unicode file names directly to the system's
    wide-character functions.


Rationale

    Python 2.2 on Win32 platforms converts Unicode file names passed
    to open and to functions in the os module into the 'mbcs' encoding
    before passing the result to the operating system.  This usually
    succeeds in the common case where the script runs with the locale
    set to the same value as when the file was created.  Most machines
    are set up with one locale that is rarely, if ever, changed.  Some
    users change locale more often, though, and servers often hold
    files saved by users working in different locales.
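
    As a rough illustration of why a locale-bound conversion can lose
    a file name, the sketch below encodes one Unicode name with two
    different code pages.  It is written for current Python 3, and it
    makes two assumptions: cp1251 and cp1252 stand in for the 'mbcs'
    codec (which exists only on Windows), and errors="replace" is used
    only to make the loss visible.  The helper name is hypothetical.

```python
# Illustration only: fixed code pages stand in for 'mbcs', which maps to
# whatever the current Windows ANSI code page happens to be.
name = "\u043e\u0442\u0447\u0451\u0442.txt"  # a Cyrillic file name

def to_locale_bytes(filename, codepage):
    """Hypothetical helper: encode a name as a locale-bound API sees it."""
    # errors="replace" is an assumption chosen to make the loss visible.
    return filename.encode(codepage, errors="replace")

same_locale = to_locale_bytes(name, "cp1251")   # Cyrillic code page
other_locale = to_locale_bytes(name, "cp1252")  # Western European code page

# The name survives a round trip only under the matching code page.
print(same_locale.decode("cp1251") == name)
print(other_locale)
```

    Under the non-matching code page the Cyrillic letters cannot be
    represented, so the resulting byte string no longer identifies the
    original file.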

    On Windows NT and descendent operating systems, including Windows
    2000 and Windows XP, wide-character APIs are available that
    provide direct access to all file names, including those that are
    not representable using the current locale.  The purpose of this
    proposal is to provide access to these wide-character APIs through
    the standard Python file object and posix module and so provide
    access to all files on Windows NT.


Specification

    On Windows platforms which provide wide-character file APIs, when
    Unicode arguments are provided to file APIs, wide-character calls
    are made instead of the standard C library and posix calls.

    The Python file object is extended to use a Unicode file name
    argument directly rather than converting it.  This affects the
    file object constructor file(filename[, mode[, bufsize]]) and also
    the open function, which is an alias of this constructor.  When a
    Unicode filename argument is used here, the name attribute of the
    file object will be Unicode.  The representation of a file object,
    repr(f), will display Unicode file names as an escaped string in a
    similar manner to the representation of Unicode strings.
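
    A minimal sketch of the name attribute behaviour follows.  It is
    written for current Python 3, where every str is Unicode text, so
    it illustrates the intent rather than the 2.x implementation.  The
    helper name and the file name are hypothetical, and the sketch
    assumes the file system can store the non-ASCII name.

```python
import os
import tempfile

def open_with_unicode_name(directory, filename):
    """Hypothetical helper: create a file and report its name attribute."""
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write("data")
        return f.name  # the name comes back as Unicode text

with tempfile.TemporaryDirectory() as tmp:
    name = open_with_unicode_name(tmp, "caf\u00e9.txt")

print(type(name).__name__)
```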

    The posix module contains functions that take file or directory
    names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat,
    and _getfullpathname.  These will use Unicode arguments directly
    rather than converting them.  For the rename function, this
    behaviour is triggered when either of the arguments is Unicode;
    the other argument is then converted to Unicode using the default
    encoding.
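
    The rule for rename can be sketched as a small pure function.
    This is not the posix module's code: coerce_rename_args is a
    hypothetical helper, written for current Python 3, where bytes
    plays the role of the 2.x byte string and
    sys.getdefaultencoding() supplies the default encoding.

```python
import sys

def coerce_rename_args(src, dst, encoding=None):
    """Hypothetical helper modelling the rename rule described above.

    Returns (src, dst, use_wide), where use_wide is True when the
    wide-character path would be taken.
    """
    encoding = encoding or sys.getdefaultencoding()
    if isinstance(src, str) or isinstance(dst, str):
        # One argument is Unicode: decode the other with the default
        # encoding so both can be passed to a wide-character call.
        if not isinstance(src, str):
            src = src.decode(encoding)
        if not isinstance(dst, str):
            dst = dst.decode(encoding)
        return src, dst, True
    # Both are byte strings: behaviour is unchanged.
    return src, dst, False

print(coerce_rename_args(b"old.txt", "new.txt"))
```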

    The listdir function currently returns a list of strings.  Under
    this proposal, it will return a list of Unicode strings when its
    path argument is Unicode.
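
    The idea that the result type follows the argument type can be
    seen in current Python 3, where os.listdir returns str names for a
    str path and bytes names for a bytes path.  The sketch below uses
    that behaviour as an analogy for the Unicode and byte string cases
    described here; the file name is an arbitrary example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    text_names = os.listdir(tmp)               # text path, text names
    byte_names = os.listdir(os.fsencode(tmp))  # bytes path, bytes names

print(text_names, byte_names)
```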

    To allow client code to determine that these features are
    implemented, the unicodefilenames function is provided.  This
    function returns true when the underlying system supports file
    names containing most Unicode characters and any valid file name
    may be passed to open as a Unicode string.
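
    Client code might guard Unicode-specific behaviour along the lines
    of the sketch below.  This PEP names the function
    unicodefilenames; the sketch instead reads
    os.path.supports_unicode_filenames, the attribute that later
    Python releases expose, on the assumption that it is the closest
    available equivalent.  The wrapper name is hypothetical.

```python
import os.path

def unicode_filenames_supported():
    """Hypothetical wrapper around the capability flag.

    Falls back to False when the attribute is missing.
    """
    return bool(getattr(os.path, "supports_unicode_filenames", False))

if unicode_filenames_supported():
    print("arbitrary Unicode file names can be used")
else:
    print("file names are limited by the file system encoding")
```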


Restrictions

    On the consumer Windows operating systems, Windows 95, Windows 98,
    and Windows ME, there are no wide-character file APIs so behaviour
    is unchanged under this proposal.  It may be possible in the
    future to extend this proposal to cover these operating systems,
    as the VFAT-32 file system they use does support Unicode file
    names.  Access is difficult, however, so implementing this would
    require much work.  The "Microsoft Layer for Unicode" could be a
    starting point for implementing this.

    Python can be compiled with the size of Unicode characters set to
    4 bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4-byte
    type and Py_UNICODE_SIZE to be 4.  As the Windows API does not
    accept 4-byte characters, the features described in this proposal
    will not work in this mode, so the implementation falls back to
    the current 'mbcs' encoding technique.
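
    Taken together, the conditions in the Specification and
    Restrictions sections amount to a three-way choice.  The sketch
    below models that choice as a plain function; choose_file_api and
    its parameters are hypothetical names for illustration and do not
    correspond to anything in the reference implementation.

```python
def choose_file_api(name_is_unicode, has_wide_api, py_unicode_size):
    """Hypothetical model of which code path a file call would take."""
    if not name_is_unicode:
        # Byte string names go to the standard C library as before.
        return "narrow"
    if has_wide_api and py_unicode_size == 2:
        # Windows NT family with 2-byte Py_UNICODE: pass the name through.
        return "wide"
    # Windows 9x/ME, or a 4-byte Py_UNICODE build: convert via 'mbcs'.
    return "mbcs"

print(choose_file_api(True, True, 2))
print(choose_file_api(True, True, 4))
print(choose_file_api(True, False, 2))
```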


Reference Implementation

    An experimental implementation is available from
    http://scintilla.sourceforge.net/winunichanges.zip


References

    [1] Microsoft Windows APIs
        http://msdn.microsoft.com/


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End:


Index: pep-0000.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v
retrieving revision 1.150
retrieving revision 1.151
diff -C2 -d -r1.150 -r1.151
*** pep-0000.txt	2002/01/10 16:10:32	1.150
--- pep-0000.txt	2002/01/13 00:13:38	1.151
***************
*** 88,91 ****
--- 88,92 ----
   S   275  Switching on Multiple Values                 Lemburg
   S   276  Simple Iterator for ints                     Althoff
+  S   277  Unicode file name support for Windows NT     Hodgson
  
   Finished PEPs (done, implemented in CVS)
***************
*** 236,239 ****
--- 237,241 ----
   S   275  Switching on Multiple Values                 Lemburg
   S   276  Simple Iterator for ints                     Althoff
+  S   277  Unicode file name support for Windows NT     Hodgson
   SR  666  Reject Foolish Indentation                   Creighton
  
***************
*** 266,269 ****
--- 268,272 ----
      Goodger, David           dgoodger@bigfoot.com
      Griffin, Grant           g2@iowegian.com
+     Hodgson, Neil            neilh@scintilla.org
      Hudson, Michael          mwh@python.net
      Hylton, Jeremy           jeremy@zope.com