[SciPy-User] format for chunked file save and read ?

Nathaniel Smith njs at pobox.com
Wed Sep 22 13:35:45 EDT 2010


On Wed, Sep 22, 2010 at 9:40 AM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Sep 22, 2010 at 11:29, Nathaniel Smith <njs at pobox.com> wrote:
>> On Wed, Sep 22, 2010 at 7:18 AM,  <josef.pktd at gmail.com> wrote:
>>> What is the best file format for storing temporary data, for chunked
>>> saving and loading, that only uses numpy and scipy?
>>> I would like a file format that could be shared cross-platform and
>>> across python/numpy versions if needed.
>>
>> Why not just use pickle? Mmap isn't giving you any advantages here
>> that I can see, and pickles are much easier to handle when you want to
>> write things out incrementally.
>
> Large arrays are not written or read incrementally in a pickle. We
> have some tricks in order to not duplicate memory, but they don't
> always work.

Oh, I see, we're talking past each other. I don't think Josef's
problem is how to save a large array where you need to avoid copies; I
think the problem is to compute and save one array, then compute and
save another array, then compute and save another array, etc. Pickles
can handle that sort of incrementality just fine :-).
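
A minimal sketch of that pattern, assuming each chunk is an ndarray small enough to hold in memory on its own. The names save_chunks, load_chunks and compute_chunks are hypothetical and only there for illustration; the pickle calls are standard library. Each array is written as its own pickle record in one file, and reading simply loads records until the end of the file:

```python
import os
import pickle
import tempfile

import numpy as np


def save_chunks(path, chunks):
    """Write each array in `chunks` to `path` as a separate pickle record."""
    with open(path, "wb") as f:
        for chunk in chunks:
            # One dump per chunk; only the current chunk needs to be in memory.
            pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_chunks(path):
    """Yield the arrays back one at a time, in the order they were written."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # No more records in the file.
                return


def compute_chunks(n_chunks, size):
    """Hypothetical stand-in for the real per-chunk computation."""
    for i in range(n_chunks):
        yield np.full(size, float(i))


path = os.path.join(tempfile.mkdtemp(), "chunks.pkl")
save_chunks(path, compute_chunks(3, 4))
loaded = list(load_chunks(path))
```

Because compute_chunks is a generator, save_chunks never holds more than one chunk at a time; the reader can likewise process the chunks one by one instead of collecting them in a list.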

For bootstrapping, if we can construct the whole array in memory even
once, then there's no need to save it out to a file at all -- the
bootstrap routine can just return that array and let the user decide
what they want to do with it!
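
As a rough illustration of that design, assuming the statistic of interest is the mean and the result fits in memory (bootstrap_means is a hypothetical name, not an existing numpy or scipy function):

```python
import numpy as np


def bootstrap_means(data, n_resamples, seed=0):
    """Return the mean of each bootstrap resample as one 1-D array."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Draw all resample indices at once: shape (n_resamples, len(data)).
    idx = rng.integers(0, len(data), size=(n_resamples, len(data)))
    # The routine does no file I/O; it just hands the array back.
    return data[idx].mean(axis=1)


means = bootstrap_means(np.arange(10.0), n_resamples=200)
```

The caller can then keep means in memory, save it, or discard it; the routine itself does not need to know about any file format.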

On Wed, Sep 22, 2010 at 10:04 AM,  <josef.pktd at gmail.com> wrote:
> I don't like pickles much for anything that needs to be stored for
> more than 5 minutes, because several times I wasn't able to read them
> anymore after some version or code changes.

Sure, if you pickle some object, and then change the in-memory
definition of the object, the pickle system won't magically know how
to translate the old version of the object into the new version -- I
assume that's the issue you ran into? (I often use pickles for quick
ad-hoc storage, but restrict myself to built-in types like tuples and
dicts for just this reason.) But here you're just talking about
ndarrays, where pickle compatibility *is* guaranteed (right?).
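
A small sketch of the built-in-types habit mentioned above. FitResult and to_plain are hypothetical names used only for illustration: rather than pickling an instance of a class whose definition may change later, the data is flattened to a dict and a tuple first, so unpickling does not depend on the class at all:

```python
import pickle


class FitResult:
    """Hypothetical user-defined class whose definition may change later."""

    def __init__(self, params, n_iter):
        self.params = params
        self.n_iter = n_iter


def to_plain(result):
    """Flatten the object to built-in types before pickling."""
    return {"params": tuple(result.params), "n_iter": result.n_iter}


payload = pickle.dumps(to_plain(FitResult([1.5, -0.25], 12)))

# Loading needs no FitResult class; the pickle holds only a dict and a tuple.
restored = pickle.loads(payload)
```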

-- Nathaniel


