python: ascii read
Roel Schroeven
rschroev_nospam_ml at fastmail.fm
Thu Sep 16 16:08:50 EDT 2004
Brian van den Broek wrote:
> But I don't really feel I've a handle on the significance of saying it
> maps the file into memory versus reading the file. The naive thought is
> that since the data gets into memory, the file must be read. But this
> makes me sure I'm missing a distinction in the terminology. Explanations
> and pointers for what to read gratefully received.
Eventually the file is read, of course (or at least parts thereof). Mmap
is a feature of the virtual memory system in modern operating systems,
so you need a basic understanding of virtual memory in order to
understand mmap. All details can be found e.g. in Modern Operating
Systems by Andrew Tanenbaum.
http://mirrors.kernel.org/LDP/LDP/tlk/tlk.html does a good job
explaining how Linux handles it, but I'll try to explain the general
basics here in short.
With virtual memory systems, the addresses that are used by application
programs don't refer directly to memory locations. Instead each address
is split into two parts: the first part is a page number, the second is
the offset of the memory location within the page. (Pages are blocks of
memory, typically 4-8 kB.) The system keeps a list of all pages, usually
called the page table. When an address is referenced, the page is looked
up in that list. There are two possibilities:
- The page is already in memory. In that case, the list contains the
real physical address of the page in memory. That address is combined
with the offset to form the physical address of the memory location.
- The page is not in memory. The virtual memory system loads it into
memory and stores the physical address in the list. Processing then
continues as in the other case. Note that it may be necessary to remove
another page from memory in order to load a new one; in that case, the
other page is written out to disk if it is still needed, so that it can
be read again later.
This behind-the-scenes translation and paging to and from disk is what
allows modern operating systems to use much more memory than what's
physically available in the system.
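
The lookup described above can be sketched as a toy model in Python. This is only an illustration, not how any real kernel does it: the PageTable class, the 4 kB PAGE_SIZE and the "next free frame" counter are all made up for the example, and removing pages from memory is not modelled at all.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages


class PageTable:
    """Toy page list: maps page numbers to physical frame addresses."""

    def __init__(self):
        self.frames = {}      # page number -> physical base address
        self.next_frame = 0   # hypothetical: index of the next free frame
        self.faults = 0       # how many times a page had to be "loaded"

    def translate(self, virtual_address):
        # Split the address into page number and offset within the page.
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.frames:
            # Page is not in memory: "load" it and remember where it went.
            self.faults += 1
            self.frames[page] = self.next_frame * PAGE_SIZE
            self.next_frame += 1
        # Combine the physical page address with the offset.
        return self.frames[page] + offset


table = PageTable()
first = table.translate(5 * PAGE_SIZE + 12)    # page 5 gets loaded
second = table.translate(5 * PAGE_SIZE + 100)  # page 5 is already there
third = table.translate(2 * PAGE_SIZE + 7)     # page 2 gets loaded
```

The second lookup hits the same page as the first, so only two "loads" happen for the three translations.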
mmap creates an entry in the list that says the page is not in memory,
but tells the system what file to load it from: a range of addresses is
'mapped' to the data in the file. It also returns the logical address of
the data. When an address in the range is referenced, the virtual memory
system loads the appropriate page from disk (or possibly more than one
page at a time, for efficiency reasons) into memory and stores its
(or their) location in the list. An application program can access the
mapped data in exactly the same way as any other part of memory.
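
From Python, this looks roughly like the following minimal sketch using the standard mmap module. The temporary file and its contents are invented just to make the example self-contained; the point is that the mapped file is sliced and searched like a bytes object, without an explicit read() call.

```python
import mmap
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello, mapped world\n" * 1000)

with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ gives a read-only mapping.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        size = len(m)            # size of the mapped range in bytes
        head = m[:5]             # slicing works as on a bytes object
        pos = m.find(b"world")   # searching works too

os.remove(path)
```

Which pages are actually brought into memory, and when, is up to the operating system; the program just indexes into the mapping.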
> And, since mmap behave differently on different platforms, I'm mostly a
> win32 user looking to transition to Linux.
I think Python hides many of the differences between the Windows and
Unix implementations of mmap (Windows doesn't really have mmap; instead
you use CreateFileMapping and MapViewOfFile).
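
As a sketch of what that means in practice: if you stick to the access keyword of mmap.mmap and leave out the platform-specific arguments (tagname on Windows; flags and prot on Unix), the same code should run on both. The temporary file below is again only there for illustration.

```python
import mmap
import os
import tempfile

# Throwaway file with known contents.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"0123456789")

with open(path, "r+b") as f:
    # ACCESS_WRITE: changes made through the mapping go to the file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
        m[0:4] = b"ABCD"   # assign to a slice; the length must not change
        m.flush()          # ask for the changes to be written out

with open(path, "rb") as f:
    contents = f.read()

os.remove(path)
```

Note that a slice assignment cannot change the size of the mapping, so the replacement must have the same length as the slice it replaces.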
--
"Codito ergo sum"
Roel Schroeven