python 2.7.12 on Linux behaving differently than on Windows

eryk sun eryksun at gmail.com
Wed Dec 7 15:17:02 EST 2016


On Wed, Dec 7, 2016 at 12:39 PM, Chris Angelico <rosuav at gmail.com> wrote:
> Note that two of the Beauty Stone tracks include quotes as well as
> question marks. How do you identify those? Let's say you want to play
> one of these in VLC, and then maybe you decide that the track in
> Pirates of Penzance/MusicOnly is slightly mis-cropped, so you rebuild
> it from the one in the parent directory. How does that work on
> Windows? If you say "it doesn't", then (a) you have taken away choice
> on a fundamental level, (b) you have your head in the sand, and (c)
> you still haven't solved the problem of percent signs, carets, and so
> on, which are perfectly legal in file names, but have meaning to the
> shell.

The five wildcard characters ("<>*?) aren't allowed in the names of
files and directories -- at least not by any Windows filesystem that
I've used. This makes it easy for a filesystem to support globbing in
its implementation of NtQueryDirectoryFile. Filenames also can't
contain control characters, slash, backslash, pipe, or colon (the
last of which delimits a fully-qualified NTFS name, e.g.
filename:streamname:streamtype). NTFS stream names are less limited.
They only disallow NUL, slash, backslash, and colon.
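
As a rough illustration of the filename rules just described, here is a
minimal Python sketch that reports which characters in a proposed name
would be rejected. The function and constant names are made up for this
example, and it only encodes the character classes listed above. It is
not a complete Windows name validator.

```python
# Characters the text above lists as disallowed in file and directory names.
WILDCARDS = set('"<>*?')      # the five wildcard characters
RESERVED = set('/\\|:')       # slash, backslash, pipe, colon


def invalid_filename_chars(name):
    """Return a sorted list of the characters in `name` that are disallowed.

    Hypothetical helper: control characters (below 0x20), the wildcards,
    and the reserved separators are flagged. Nothing else is checked.
    """
    return sorted({ch for ch in name
                   if ord(ch) < 0x20 or ch in WILDCARDS or ch in RESERVED})


if __name__ == '__main__':
    # A track title with quotes and a question mark, as in the quoted example.
    print(invalid_filename_chars('Who is "he"?.ogg'))
    # Percent signs and carets are legal in names, so nothing is reported.
    print(invalid_filename_chars('100%^.txt'))
```
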

The filesystem runtime library provides the macro
FsRtlIsAnsiCharacterLegal [1], among other related macros, which
allows filesystem drivers to be consistent with FAT or NTFS. To my
knowledge this is voluntary, but going against the grain is only
asking for headaches.

[1]: https://msdn.microsoft.com/en-us/library/ff546731

This macro depends on the array FsRtlLegalAnsiCharacterArray, which
indicates whether each ASCII character is valid for a fixed set of
filesystems. The flag values are as follows:

    0x01 - FAT
    0x02 - OS/2 HPFS
    0x04 - NTFS/Stream
    0x08 - Wildcard
    0x10 - Stream

Here's the array dumped from the kernel debugger. For convenience I've
added the printable ASCII characters above each line.

    lkd> db poi(nt!FsRtlLegalAnsiCharacterArray)

    fffff801`fc0e8550  00 10 10 10 10 10 10 10-10 10 10 10 10 10 10 10

    fffff801`fc0e8560  10 10 10 10 10 10 10 10-10 10 10 10 10 10 10 10
                           !  "  #  $  %  &  '  (  )  *  +  ,  -  .  /
    fffff801`fc0e8570  17 07 18 17 17 17 17 17-17 17 18 16 16 17 07 00
                        0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ?
    fffff801`fc0e8580  17 17 17 17 17 17 17 17-17 17 04 16 18 16 18 18
                        @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
    fffff801`fc0e8590  17 17 17 17 17 17 17 17-17 17 17 17 17 17 17 17
                        P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _
    fffff801`fc0e85a0  17 17 17 17 17 17 17 17-17 17 17 16 00 16 17 17
                        `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o
    fffff801`fc0e85b0  17 17 17 17 17 17 17 17-17 17 17 17 17 17 17 17
                        p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~
    fffff801`fc0e85c0  17 17 17 17 17 17 17 17-17 17 17 17 10 17 17 17
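
To make the dump easier to read, here is a small Python sketch that
loads the 128 bytes shown above and tests flag bits per character. The
helper name is_legal is hypothetical, and treating a stream name as
checked against 0x04 | 0x10 is my assumption based on the "NTFS/Stream"
label. The real macro may differ in detail.

```python
# Flag bits as listed above.
FAT, HPFS, NTFS, WILDCARD, STREAM = 0x01, 0x02, 0x04, 0x08, 0x10

# The 128 bytes transcribed from the debugger dump, 16 characters per row.
DUMP = """
00 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
17 07 18 17 17 17 17 17 17 17 18 16 16 17 07 00
17 17 17 17 17 17 17 17 17 17 04 16 18 16 18 18
17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17
17 17 17 17 17 17 17 17 17 17 17 16 00 16 17 17
17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17
17 17 17 17 17 17 17 17 17 17 17 17 10 17 17 17
"""
TABLE = [int(byte, 16) for byte in DUMP.split()]


def is_legal(ch, flags):
    """Hypothetical helper: true if any bit in `flags` is set for `ch`."""
    code = ord(ch)
    if code >= len(TABLE):
        raise ValueError('the table only covers ASCII')
    return bool(TABLE[code] & flags)


# Characters carrying the wildcard bit.
wildcards = ''.join(chr(i) for i, v in enumerate(TABLE) if v & WILDCARD)

# Characters with neither the 0x04 nor the 0x10 bit (assumed stream mask).
stream_illegal = [chr(i) for i, v in enumerate(TABLE)
                  if not v & (NTFS | STREAM)]

if __name__ == '__main__':
    print(repr(wildcards))
    print(stream_illegal)
    print(is_legal('+', FAT), is_legal('+', NTFS))
```
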

By this table, NTFS stream names are the least restricted, allowing
everything except NUL, slash, and backslash. (Colon is also excluded in
practice, since it delimits the name components.) For example:

    >>> open('test:\x01|*?<>"', 'w').close()
    >>> win32file.FindStreams('test')
    [(0, '::$DATA'), (0, ':\x01|*?<>":$DATA')]

The first stream listed above is the anonymous data stream
"test::$DATA", which is the same as simply opening "test".

Technically by the above table ":" is allowed in NTFS names, but its
use is reserved as the delimiter of the fully-qualified name. For
example, the fully-qualified name of a directory is
"dirname:$I30:$INDEX_ALLOCATION". The stream name is "$I30" and the
stream type is "$INDEX_ALLOCATION". A directory can also have multiple
named $DATA streams, because NTFS is weird like that.
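
The fully-qualified form can be pulled apart with a few lines of
Python. This is only a sketch: split_ntfs_name is a made-up helper, it
expects a bare name rather than a path (a drive-letter colon would
confuse it), and it returns empty strings for missing parts rather than
guessing the defaults.

```python
def split_ntfs_name(name):
    """Split 'filename:streamname:streamtype' into a 3-tuple.

    Hypothetical helper. Components that are absent come back as ''.
    """
    parts = name.split(':')
    if len(parts) > 3:
        raise ValueError('too many colons in %r' % name)
    # Pad so the result always has exactly three components.
    parts += [''] * (3 - len(parts))
    return tuple(parts)


if __name__ == '__main__':
    print(split_ntfs_name('dirname:$I30:$INDEX_ALLOCATION'))
    print(split_ntfs_name('test::$DATA'))
    print(split_ntfs_name('test'))
```
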


