[Python-Dev] Memory woes under Windows

Fri, 26 May 2000 11:41:57 -0400

Just polishing part of this off, for the curious:

> ...
> Dragon's Win98 woes appear due to something else:  right after a Win98
> system w/ 64Mb RAM is booted, about half the memory is already locked (not
> just committed)!  Dragon's product needs more than the remaining 32Mb to
> avoid thrashing.  Even stranger, killing every process after booting
> releases an insignificant amount of that locked memory. ...

That turned out to be (mostly) irrelevant, and even if it were relevant it
turns out you can reduce the locked memory (to what appears to be an
undocumented minimum) and the file-cache size (to what is a documented
minimum) just by malloc'ing, zero'ing and free'ing a few giant arrays
(Windows malloc()-- unlike Linux's --returns a pointer to committed memory;
Windows has other calls if you really want memory you can't trust <0.5
wink>).
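The post doesn't show the trick itself, which was done at the C level. A rough Python analog follows, assuming a hypothetical helper named `squeeze_memory` and an arbitrary 64Mb chunk size. CPython passes requests this large straight to the platform's C malloc rather than to its small-object allocator, so allocating big zero-filled bytearrays, touching every page, and then dropping them follows roughly the same path. Whether this reproduces the locked-memory and file-cache effect described above is untested.

```python
import mmap

def squeeze_memory(total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Hypothetical helper: allocate, touch, then free total_bytes of memory.

    Returns the number of bytes that were allocated and touched.
    """
    page = mmap.PAGESIZE  # OS page size, e.g. 4096 on x86
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        chunks.append(bytearray(size))  # zero-filled allocation
        npages = len(range(0, size, page))
        # Write one byte per page so every page really has to be backed by RAM.
        chunks[-1][::page] = b"\x01" * npages
        remaining -= size
    touched = sum(len(c) for c in chunks)
    # Everything is held at once before release, so the OS must find room for all of it.
    # Dropping the last references frees the buffers immediately in CPython;
    # whether that memory goes back to the OS is up to the C runtime.
    chunks.clear()
    return touched
```

Something like `squeeze_memory(256 * 1024 * 1024)` would be the Python-level equivalent of "a few giant arrays"; pick the total to suit the machine's RAM.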

The next red herring was much funnier:  we couldn't reproduce the problem
when running the recognizer by hand (from a DOS box cmdline)!  But, run it
as Research did, system()'ed from a small Perl script, and it magically ran
3x slower, with monstrous disk thrashing.  So I had a great time besmirching
Perl's reputation <wink>.

Alas, it turned out the *real* trigger was something else entirely, one
we've known about for years but have never understood:  from inside the Perl
script, people used UNC paths to various network locations.  Like

    \\earwig\research2\data5\natspeak\testk\big55.voc

Exactly the same locations were referenced when people ran it "by hand", but
when people do it by hand, they naturally map a drive letter first, in order
to reduce typing.  Like

    net use N: \\earwig\research2\data5\natspeak

once and then

    N:\testk\big55.voc

in their command lines.
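In practice the workaround amounts to spelling paths through a drive letter instead of a UNC prefix. If a driver script builds command lines from UNC paths, a small rewrite step can substitute the mapped drive. Here is a Python sketch of that step; the helper name `to_mapped_drive` and the hardcoded mapping table are hypothetical, and the table has to match whatever `net use` mappings actually exist on the machine. It only changes how the path is spelled; why that spelling matters for speed remains unexplained.

```python
import ntpath

def to_mapped_drive(path, mappings):
    """Hypothetical helper: rewrite a UNC path to use a mapped drive letter.

    `mappings` maps drive letters (e.g. "N:") to the UNC prefix each was
    mapped to with `net use`.  Paths no mapping covers come back unchanged.
    """
    norm = ntpath.normcase(path)  # Windows paths are case-insensitive
    for drive, unc_root in mappings.items():
        root = ntpath.normcase(unc_root.rstrip("\\"))
        # Match only at a path-component boundary, so ...\natspeak2 doesn't match ...\natspeak.
        if norm == root or norm.startswith(root + "\\"):
            return drive + (path[len(root):] or "\\")
    return path

MAPPINGS = {"N:": r"\\earwig\research2\data5\natspeak"}
```

For example, `to_mapped_drive(r"\\earwig\research2\data5\natspeak\testk\big55.voc", MAPPINGS)` yields `N:\testk\big55.voc`, the same file people typed by hand.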

This alone can make a *huge* timing difference!  Like I said,
we've never understood why.  Could simply be a bug in Dragon's
out-of-control network setup, or a bug in MS's networking code, or a bug in
Novell's server code -- I don't think we'll ever know.  The number of
IQ-hours that have gone into *trying* to figure this out over the years
could probably have carried several startups to successful IPOs <0.9 wink>.

One last useless clue:  do all this on a Win98 box with 128Mb RAM, and the
timing difference goes away.  Ditto Win95, but much less RAM is needed.  It
sometimes acts like a UNC path consumes 32Mb of dedicated RAM!

Apart from this UNC-vs-mapped-drive issue, over many hours of dead-end
scenarios I was pleased to see that Win98 appears to do a good job of
reallocating physical RAM in response to changing demands, & in particular
better than Win95.  There's no problem here at all!

The original test case I posted-- showing massive heap fragmentation under
Win95, Win98, and W2K (but not NT), when growing a large Python list one
element at a time --remains an as-yet unstudied mystery.  I can easily make
*that* problem go away by, e.g., doing

    a = [1]*3000000
    del a

from time to time, apparently just to convince the Windows malloc that it
would be wise to allocate a lot more than it thinks it needs.  This
suggests (untested) that it *could* be a huge win under Windows to
overallocate huge lists by more than Python does today.  I'll look into
that "someday".
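One way to see why more aggressive overallocation might help: the number of times a growing array must be resized depends heavily on its growth policy. The sketch below counts resizes while appending one element at a time under two hypothetical policies. The names `fixed_step` and `proportional` and their constants are illustrative assumptions, not Python's actual list code: one rounds capacity up to the next multiple of 100, the other overallocates by about 1/8 of the requested size. Fewer resizes means fewer realloc calls for the platform malloc to deal with; whether that actually cures the Windows fragmentation remains untested.

```python
def count_resizes(n, grow):
    """Count capacity increases while appending n items one at a time."""
    capacity = 0
    resizes = 0
    for size in range(1, n + 1):
        if size > capacity:          # out of room: ask the policy for a new capacity
            capacity = grow(size)
            resizes += 1
    return resizes

def fixed_step(size, step=100):
    # Hypothetical policy: round up to the next multiple of `step`.
    return ((size + step - 1) // step) * step

def proportional(size):
    # Hypothetical policy: overallocate by ~12.5% plus a small constant.
    return size + (size >> 3) + (3 if size < 9 else 6)
```

For a million appends, `count_resizes(1_000_000, fixed_step)` is exactly 10,000, while the proportional policy needs fewer than a hundred, because its capacity grows geometrically rather than linearly.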