python: ascii read

Roel Schroeven rschroev_nospam_ml at fastmail.fm
Thu Sep 16 16:08:50 EDT 2004


Brian van den Broek wrote:

> But I don't really feel I've a handle on the significance of saying it 
> maps the file into memory versus reading the file. The naive thought is 
> that since the data gets into memory, the file must be read. But this 
> makes me sure I'm missing a distinction in the terminology. Explanations 
> and pointers for what to read gratefully received.

Eventually the file is read, of course (or at least parts thereof). Mmap 
is a feature of the virtual memory system in modern operating systems, 
so you need a basic understanding of virtual memory in order to 
understand mmap. All details can be found e.g. in Modern Operating 
Systems by Andrew Tanenbaum. 
http://mirrors.kernel.org/LDP/LDP/tlk/tlk.html does a good job 
explaining how Linux handles it, but I'll try to explain the general 
basics here in short.

With virtual memory systems, the addresses that are used by application 
programs don't refer directly to memory locations. Instead the addresses 
are split into two parts: the first part is a page number, the second is 
the offset of the memory location within the page (pages are blocks of 
memory, typically 4-8 kB). The system keeps a list of all pages. When an 
address is referenced, its page is looked up in that list. There are two 
possibilities:
- The page is already in memory. In that case, the list contains the 
real physical address of the page in memory. That address is combined 
with the offset to form the physical address of the memory location.
- The page is not in memory. The virtual memory system loads it into 
memory and stores the physical address in the list. Processing then 
continues as in the other case. Note that it may be necessary to remove 
another page from memory in order to load a new one; in that case, the 
other page is paged to disk if it is still needed so that it can be read 
again later.
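
As a rough illustration of the lookup described above, here is a toy 
sketch in Python. The names (split_address, ToyPageTable) and the 4 kB 
page size are made up for this example, and the sketch ignores eviction 
of pages to disk entirely; a real system does this translation in 
hardware and in the kernel, not in application code.

```python
PAGE_SIZE = 4096  # assumed page size (4 kB); real systems vary


def split_address(addr, page_size=PAGE_SIZE):
    """Split a virtual address into (page number, offset within the page)."""
    return divmod(addr, page_size)


class ToyPageTable:
    """Hypothetical model of the 'list of all pages'; not a real OS API."""

    def __init__(self):
        self.frames = {}     # page number -> physical base address of the page
        self.next_frame = 0  # next free physical frame (never evicted here)
        self.faults = 0      # how many times a page had to be "loaded"

    def translate(self, addr):
        page, offset = split_address(addr)
        if page not in self.frames:
            # Page is not in memory: pretend to load it, then record
            # its physical address in the list.
            self.faults += 1
            self.frames[page] = self.next_frame * PAGE_SIZE
            self.next_frame += 1
        # Page is in memory: combine its physical address with the offset.
        return self.frames[page] + offset
```

Translating two addresses that fall in the same page causes only one 
"load"; the second lookup finds the page already in the list.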

This behind-the-scenes translation and paging to and from disk is what 
allows modern operating systems to use much more memory than what's 
physically available in the system.

mmap creates an entry in the list that says the page is not in memory, 
but tells the system what file to load it from: a range of addresses is 
'mapped' to the data in the file. It also returns the logical address of 
the data. When an address in the range is referenced, the virtual memory 
system loads the appropriate page from disk (or possibly more than one 
page at a time, for efficiency reasons) into memory and stores its 
(or their) location in the list. An application program can access the 
mapped data in exactly the same way as any other part of memory.
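
To make this concrete, here is a minimal sketch using Python's mmap 
module. The function name first_line_via_mmap is just something I made 
up for illustration. Note that no explicit read() is issued: the code 
indexes and searches the mapping like a bytes object, and the operating 
system pulls in pages of the file as they are touched. It assumes the 
file is not empty, since mapping an empty file raises an error.

```python
import mmap


def first_line_via_mmap(path):
    """Return the first line of a non-empty file (without the newline)."""
    with open(path, "rb") as f:
        # Length 0 means "map the whole file"; ACCESS_READ is read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            end = m.find(b"\n")   # search the mapped bytes directly
            if end == -1:
                end = len(m)      # no newline: the whole file is one line
            return m[:end]        # slicing copies the bytes out of the mapping
```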

> And, since mmap behave differently on different platforms, I'm mostly a 
> win32 user looking to transition to Linux.

I think Python hides most of the differences between the Windows and 
Unix implementations of mmap (Windows doesn't really have mmap; instead 
you use CreateFileMapping and MapViewOfFile).
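
For what it's worth, here is a small sketch of code that should run 
unchanged on both platforms, because it sticks to the access= keyword 
of mmap.mmap rather than the platform-specific arguments. The function 
name uppercase_in_place is hypothetical, and the sketch assumes a 
non-empty file whose contents are plain ASCII, so that upper-casing 
does not change the length.

```python
import mmap


def uppercase_in_place(path):
    """Upper-case a non-empty ASCII file through a writable mapping."""
    with open(path, "r+b") as f:
        # ACCESS_WRITE: changes to the mapping are written through to the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[:] = m[:].upper()  # slice assignment must keep the same length
            m.flush()            # ask the OS to write modified pages back
```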

-- 
"Codito ergo sum"
Roel Schroeven


