[SciPy-User] format for chunked file save and read ?

Bruce Southey bsouthey at gmail.com
Wed Sep 22 14:29:49 EDT 2010


On 09/22/2010 01:09 PM, josef.pktd at gmail.com wrote:
> On Wed, Sep 22, 2010 at 1:35 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Wed, Sep 22, 2010 at 9:40 AM, Robert Kern <robert.kern at gmail.com> wrote:
>>> On Wed, Sep 22, 2010 at 11:29, Nathaniel Smith <njs at pobox.com> wrote:
>>>> On Wed, Sep 22, 2010 at 7:18 AM, <josef.pktd at gmail.com> wrote:
>>>>> What is the best file format for storing temporary data, for chunked
>>>>> saving and loading, that only uses numpy and scipy?
>>>>> I would like a file format that could be shared cross-platform and
>>>>> across python/numpy versions if needed.
>>>> Why not just use pickle? Mmap isn't giving you any advantages here
>>>> that I can see, and pickles are much easier to handle when you want to
>>>> write things out incrementally.
>>> Large arrays are not written or read incrementally in a pickle. We
>>> have some tricks in order to not duplicate memory, but they don't
>>> always work.
>> Oh, I see, we're talking past each other. I don't think Josef's
>> problem is how to save a large array where you need to avoid copies; I
>> think the problem is to compute and save one array, then compute and
>> save another array, then compute and save another array, etc. Pickles
>> can handle that sort of incrementality just fine :-).
> No, Stata appends to the file (as it is described, I don't know what
> they are doing internally.)
> If you save to a new file, then several files would need to be pieced
> together, or the previous data needs to be loaded and saved again.
>
> For example, when they use an optimal stopping rule (estimating the
> error of a given number of bootstrap samples), they have to do it in
> at least two steps: draw an initial number of samples, then update the
> error estimate, then sample more given the new estimate.
With Python 2.5+ you also have the option of sqlite (the sqlite3 module
in the standard library), which can append rows to a single file. Sure,
it has various issues, but it is built in.
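
A minimal sketch of what appending bootstrap samples to sqlite could
look like, assuming each sample is a row of floats with a fixed number
of columns. The table name `samples`, the column names `c0, c1, ...`,
and the helper functions are made up for illustration; only the sqlite3
calls themselves come from the standard library.

```python
import sqlite3


def open_store(path, n_cols):
    """Open (or create) a store with n_cols REAL columns per row."""
    conn = sqlite3.connect(path)
    cols = ", ".join("c%d REAL" % i for i in range(n_cols))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples (id INTEGER PRIMARY KEY, %s)" % cols
    )
    conn.commit()
    return conn


def append_rows(conn, rows):
    """Append a chunk of rows; earlier chunks are left untouched."""
    rows = [tuple(r) for r in rows]
    if not rows:
        return
    placeholders = ", ".join("?" * len(rows[0]))
    # Passing NULL for id lets SQLite assign the next integer key.
    conn.executemany("INSERT INTO samples VALUES (NULL, %s)" % placeholders, rows)
    conn.commit()


def load_rows(conn):
    """Return all stored rows in insertion order, without the id column."""
    cur = conn.execute("SELECT * FROM samples ORDER BY id")
    return [row[1:] for row in cur]


# Two-step use: an initial batch, then more samples appended later.
conn = open_store(":memory:", n_cols=2)
append_rows(conn, [(0.1, 1.5), (0.2, 1.4)])
append_rows(conn, [(0.3, 1.6)])
rows = load_rows(conn)
```

With numpy, a 2-D array could be passed as `arr.tolist()` and the result
turned back into an array with `numpy.asarray(load_rows(conn))`. Use a
file name instead of ":memory:" to keep the data between runs.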


Bruce




