dynamic allocation file buffer

Fredrik Lundh fredrik at pythonware.com
Thu Sep 11 04:34:07 EDT 2008


Steven D'Aprano wrote:

> I'm no longer *claiming* anything, I'm *asking* whether random access to 
> a 4GB XML file is something that is credible or useful. It is my 
> understanding that XML is particularly ill-suited to random access once 
> the amount of data is too large to fit in RAM.

An XML file doesn't contain any indexing information, so random access 
to a large XML file is very inefficient: to find a given element, you 
have to scan everything that precedes it.  You can build (or precompute) 
index information and store it in a separate file, of course, but that's 
hardly something that's useful in the general case.
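Here is a minimal sketch of what such a separate index might look like. It assumes a hypothetical, very regular layout: a flat sequence of `<record id="...">` elements under one root, plain UTF-8, no namespaces, and no comments or CDATA sections containing the literal text `<record`. One sequential scan (over an `mmap`, so the file isn't read into RAM in one go) records the byte range of each record. The index is saved as JSON, and a lookup then seeks straight to one record and parses only that fragment. The function names are illustrative only.

```python
import json
import mmap
import re
import xml.etree.ElementTree as ET

# Assumed record layout (hypothetical): <record id="..."> ... </record>, not nested.
RECORD_START = re.compile(rb'<record id="([^"]+)"')
RECORD_END = b"</record>"

def build_index(path):
    """Scan the file once and map each record id to its (start, end) byte range."""
    index = {}
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for match in RECORD_START.finditer(m):
            start = match.start()
            end = m.find(RECORD_END, start)
            if end == -1:
                raise ValueError("unterminated record at offset %d" % start)
            index[match.group(1).decode()] = (start, end + len(RECORD_END))
    return index

def save_index(index, index_path):
    """Store the index in its own file, next to (not inside) the XML file."""
    with open(index_path, "w") as f:
        json.dump(index, f)

def load_index(index_path):
    with open(index_path) as f:
        return {key: tuple(span) for key, span in json.load(f).items()}

def load_record(path, index, record_id):
    """Seek directly to one record and parse just that fragment."""
    start, end = index[record_id]
    with open(path, "rb") as f:
        f.seek(start)
        return ET.fromstring(f.read(end - start))
```

Note that the index is only valid as long as the XML file is never modified; any edit that shifts byte offsets invalidates it.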

And as I said before, the only use case for *huge* XML files I've ever 
seen used in practice is to store large streams of record-style data; 
data that's intended to be consumed by sequential processes (and you can 
do a lot with sequential processing these days; for those interested in 
this, digging up a few review papers on "data stream processing" might 
be a good way to waste some time).
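For record-style data, the usual way to consume a huge file sequentially in Python is `xml.etree.ElementTree.iterparse`. The sketch below yields each record as soon as its end tag has been seen, then clears the root element so already-processed records can be freed and memory use stays roughly flat. The `record` and `amount` tag names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def iter_records(source, tag="record"):
    """Yield each complete <tag> element, discarding it once the caller is done."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()  # detach processed records from the tree

def total(source):
    """Example consumer: sum a hypothetical <amount> child over all records."""
    return sum(int(rec.findtext("amount")) for rec in iter_records(source))
```

`source` can be a filename or any binary file object, so the same code works on a local file, a pipe, or a decompressing stream.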

Document-style XML usually fits into memory on modern machines; 
structures larger than that are usually split into different parts (e.g. 
using XInclude) and stored in a container file.
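As a rough illustration of the XInclude approach, the standard library's `xml.etree.ElementInclude.include` can reassemble a master document from its parts. In this sketch the parts live in a dictionary standing in for separate files or container members; `PARTS`, `MASTER`, and `part_loader` are hypothetical, and the custom loader replaces the default file-system loader.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical parts; in practice each would be a separate file or container member.
PARTS = {
    "chapter1.xml": "<chapter><title>One</title></chapter>",
    "chapter2.xml": "<chapter><title>Two</title></chapter>",
}

MASTER = f"""<book xmlns:xi="{XI}">
  <xi:include href="chapter1.xml"/>
  <xi:include href="chapter2.xml"/>
</book>"""

def part_loader(href, parse, encoding=None):
    """Resolve an xi:include href against PARTS instead of the file system."""
    if parse == "xml":
        return ET.fromstring(PARTS[href])
    return PARTS[href]  # parse="text" includes are returned as plain strings

def assemble():
    """Parse the master document and replace each xi:include with its part."""
    root = ET.fromstring(MASTER)
    ElementInclude.include(root, loader=part_loader)
    return root
```

Each part stays small enough to parse (and rewrite) on its own; only code that really needs the whole document pays for assembling it.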

Random *modifications* to an arbitrary XML file cannot be done in place, 
as long as you store the file in a standard file system: any change that 
alters the length of the data means rewriting everything after it.  And 
if you invent your own format, it's no longer an XML file.
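A small sketch of why this is so: an ordinary file offers overwrite, append, and truncate, but no "insert" or "delete in the middle". The hypothetical helper below can patch a same-length span in place, but for anything else it has to move every byte after the edit, which for an edit near the start of a 4GB file means rewriting nearly all of it. (The tail is read into memory here for brevity; a real tool would stream through a temporary file instead.)

```python
def patch_bytes(path, offset, old_length, new_bytes):
    """Replace old_length bytes at offset with new_bytes; return bytes written."""
    with open(path, "r+b") as f:
        if len(new_bytes) == old_length:
            # Same length: the only case that is a true in-place edit.
            f.seek(offset)
            f.write(new_bytes)
            return len(new_bytes)
        # Different length: everything after the edited span must be moved.
        f.seek(offset + old_length)
        tail = f.read()
        f.seek(offset)
        f.write(new_bytes)
        f.write(tail)
        f.truncate()  # drop leftover bytes if the file got shorter
        return len(new_bytes) + len(tail)
```

Since most edits to XML content (changing a text value, adding an attribute or element) change the length, nearly every modification ends up in the expensive branch.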

</F>




More information about the Python-list mailing list