Newbie question about text encoding

Dave Angel davea at davea.name
Fri Feb 27 02:30:46 EST 2015


On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
> Dave Angel wrote:
>
>> (Although I believe Seymour Cray was quoted as saying that virtual
>> memory is a crock, because "you can't fake what you ain't got.")
>
> If I recall correctly, disk access is about 10000 times slower than RAM, so
> virtual memory is *at least* that much slower than real memory.
>

It's so much more complicated than that, that I hardly know where to 
start.  I'll describe a generic processor/OS/memory/disk architecture; 
there will be huge differences between processor models even from a 
single manufacturer.

First, as soon as you add swapping logic to your 
processor/memory-system, you theoretically slow it down.  And in the 
days of that quote, Cray's memory was maybe 50 times as fast as the 
memory used by us mortals.  So adding swapping logic would have slowed 
it down quite substantially, even when it was not swapping.  But that 
logic is inside the CPU chip these days, and presumably thoroughly 
optimized.

Next, statistically, a program uses a small subset of its total program 
& data space in its working set, and the working set should reside in 
real memory.  But when the program greatly increases that working set, 
and it approaches the amount of physical memory, then swapping becomes 
more frenzied, and we say the program is thrashing.  A simple example: 
try sorting an array that's about the size of available physical memory.
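
If you want to watch that happen, here's a rough sketch (POSIX-specific; 
the sysconf names are an assumption that holds on Linux and most Unixes) 
that times sorting lists of increasing size.  Only push n toward physical 
memory on a machine you don't mind thrashing:

import os
import random
import time

# Total physical memory, via sysconf (works on Linux and most Unixes).
phys_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
print("physical memory: {:.1f} GiB".format(phys_bytes / 2.0**30))

def time_sort(n):
    """Time sorting a list of n floats; report nanoseconds per element."""
    data = [random.random() for _ in range(n)]
    t0 = time.perf_counter()
    data.sort()
    return (time.perf_counter() - t0) / n * 1e9

# Modest sizes by default.  Note that a CPython float object costs far
# more than 8 bytes, so the working set grows faster than n * 8 suggests.
for n in (10**5, 10**6, 10**7):
    print("n={:>11,}  {:6.1f} ns/element".format(n, time_sort(n)))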

Next, even physical memory is divided into a few levels of caching, some 
on-chip and some off.  And the caching is done in what I call strips 
(usually called cache lines), where accessing just one byte causes the 
whole strip to be loaded from non-cached memory.  The typical line size 
these days is 64 bytes, though some caches use larger units.
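
A quick way to glimpse that from Python is a strided sum with numpy 
(assuming numpy is installed; the exact numbers are machine-dependent).  
Each 64-byte strip holds eight float64 values, so a stride of 8 touches 
every strip while using only one value from each, and on typical hardware 
the time drops far less than the element count does:

import time
import numpy as np

N = 1 << 25                 # 32 Mi float64 values = 256 MiB, beyond cache
a = np.ones(N)

def bench(stride):
    view = a[::stride]      # touch only every stride-th element
    t0 = time.perf_counter()
    view.sum()
    return time.perf_counter() - t0

for stride in (1, 2, 4, 8):
    print("stride {:2d}: {:.3f} s".format(stride, bench(stride)))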

If there are multiple processors (not multicore, but actual separate 
processors), then each one has its own such caches, and a write on one 
processor may have to trigger invalidations or flushes in every other 
processor's cache that happens to have the same strip loaded.
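
That's also why heavily written per-processor data often gets padded out 
to its own strip.  A purely arithmetic sketch, assuming a 64-byte line 
and 8-byte counters (real sizes vary by CPU):

LINE = 64           # assumed cache line ("strip") size, in bytes
COUNTER_SIZE = 8    # assumed size of each per-processor counter

def line_of(offset):
    return offset // LINE

# Packed: four counters side by side all land on line 0, so a write by
# any processor invalidates that line in every other processor's cache.
packed = [i * COUNTER_SIZE for i in range(4)]
print("packed:", [line_of(off) for off in packed])   # [0, 0, 0, 0]

# Padded: each counter gets its own line, so writes don't interfere.
padded = [i * LINE for i in range(4)]
print("padded:", [line_of(off) for off in padded])   # [0, 1, 2, 3]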

The processor not only prefetches the next few instructions, but decodes 
and tentatively executes them, subject to being discarded if a 
conditional branch doesn't go the way the processor predicted.  So some 
instructions execute in zero time, some of the time.

Every address of an instruction fetch, or of a data fetch or store, goes 
through a couple of layers of translation.  Segment register plus offset 
gives the linear address.  Look that up in the page tables to get the 
physical address, and if the relevant table entry happens not to be in 
the on-chip cache (the TLB), it has to be fetched first.  If the page 
isn't present in physical memory, a processor exception causes the OS to 
potentially swap something out, and something else in.
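
As a simplified sketch of that lookup (classic 32-bit x86, non-PAE, with 
4 KiB pages; 64-bit CPUs use more table levels, but the idea is the same):

def split_linear(addr):
    """Split a 32-bit linear address (segment base already added in)
    into (page-directory index, page-table index, byte offset)."""
    offset   = addr & 0xFFF           # low 12 bits: byte within 4 KiB page
    pt_index = (addr >> 12) & 0x3FF   # next 10 bits: page-table entry
    pd_index = (addr >> 22) & 0x3FF   # top 10 bits: page-directory entry
    return pd_index, pt_index, offset

addr = 0x00401F2A                     # an arbitrary example address
pd, pt, off = split_linear(addr)
print("linear 0x{:08X} -> directory {}, table {}, offset 0x{:03X}".format(
    addr, pd, pt, off))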

Once we're paging from the swapfile, the unit of the read is a page, 
typically 4 KiB.  And that read happens regardless of whether we're only 
going to use one byte of it or all of it.
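
You can see the granularity from Python; mmap.PAGESIZE reports the page 
size (typically 4096 bytes on x86 and most ARM systems):

import mmap

page = mmap.PAGESIZE
addr = 0x00401F2A                 # the single byte we actually want
page_start = addr & ~(page - 1)   # round down to the page boundary
print("page size: {} bytes".format(page))
print("reading 1 byte at 0x{:X} pulls in 0x{:X}..0x{:X}".format(
    addr, page_start, page_start + page - 1))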

The ratio between an access which was in the L1 cache and one which 
required a page to be swapped in from disk?  Much bigger than your 
10,000 figure.  But hopefully it doesn't happen a big percentage of the 
time.
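
Back-of-the-envelope, using commonly cited ballpark latencies (they vary 
a lot by hardware; only the orders of magnitude matter here):

L1_HIT_NS   = 1          # ~1 ns: L1 cache hit
DRAM_NS     = 100        # ~100 ns: main memory access
SSD_READ_NS = 100000     # ~100 us: SSD random read
HDD_SEEK_NS = 10000000   # ~10 ms: spinning-disk seek + read

print("DRAM vs L1:", DRAM_NS // L1_HIT_NS)         # ~100x
print("SSD  vs L1:", SSD_READ_NS // L1_HIT_NS)     # ~100,000x
print("HDD  vs L1:", HDD_SEEK_NS // L1_HIT_NS)     # ~10,000,000x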

There are many, many other variables, like the fact that RAM chips are 
not directly addressable by bytes, but are instead organized into rows 
and columns.  So if you access many bytes in the same row, it can be 
much quicker than random access.  Simple access-time specifications 
therefore don't mean as much as they would seem; the memory controller 
has to balance the RAM spec with the various cache requirements.
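
Here's a sketch of that with numpy (again assuming it's installed).  Both 
runs read exactly the same data; only the order differs, and the shuffled 
order defeats both the caches and the open-row behaviour:

import time
import numpy as np

N = 1 << 24                          # 128 MiB of float64, bigger than cache
a = np.random.rand(N)
in_order = np.arange(N)
shuffled = np.random.permutation(N)

def bench(index):
    t0 = time.perf_counter()
    a[index].sum()                   # gather in the given order, then reduce
    return time.perf_counter() - t0

print("sequential: {:.3f} s".format(bench(in_order)))
print("random    : {:.3f} s".format(bench(shuffled)))
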
-- 
DaveA


