Newbie Q: (NumPy) how to write an array in a file ?

Alex Martelli aleaxit at yahoo.com
Wed Oct 11 06:25:32 EDT 2000


"Nicolas Decoster" <Nicolas.Decoster at Noveltis.fr> wrote in message
news:39E433C9.D847265C at Noveltis.fr...
> Alex Martelli wrote:
> >
> > "binary format" is *NOT* portable.  If you write binary data on a
> > big-endian machine, you can't just transparently re-read it on a
> > little-endian machine -- it's even worse for floating-point data
> > in different formats!
    [snip]
> I'm sorry. I have misused the word "portable". What I need is not a
> portable format between machines, but between apps. And it must be a
> binary format. Pickle is not useful for me.
>
> And as far as I know little-endian is a quite portable format as well as
> big-endian. Of course on a little-endian machine it is "longer" to read
> big-endian data and vice-versa.

If you're willing to write an arbitrary amount of code to translate
the format, then, and only then, can binary formats be said to be
"portable" (among applications, machines, etc).  It's not just a
question of big-endian vs little-endian -- when the binary data
includes floating-point numbers, you're in real trouble (unless all
of your machines use essentially-identical floating-point formats).
It's probably easier to run Python's pickle from inside your apps,
than to perform the arbitrary translations that may be needed to
handle binary-format floating-point...:-)
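
To make the byte-order point concrete, here is a minimal sketch using the
standard struct module (not something from the original exchange).  It packs
the same double with an explicit little-endian and an explicit big-endian
layout.  It assumes IEEE 754 doubles, which is what struct's '<' and '>'
modes produce; machines with a different floating-point format would need
real conversion code on top of this.

```python
import struct

value = 1.0

# '<d' forces little-endian, '>d' forces big-endian; both are 8-byte IEEE 754.
little = struct.pack('<d', value)
big = struct.pack('>d', value)

# Same number, but the two byte sequences are mirror images of each other.
assert little == big[::-1]

# Reading with the matching byte order recovers the value...
right = struct.unpack('<d', little)[0]
# ...while reading with the wrong one silently yields a different number.
wrong = struct.unpack('>d', little)[0]

print(right == value, wrong == value)
```

The wrong-order read raises no error, which is why a reader and a writer
have to agree on the byte order in advance.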

Anyway, back to your request.  This should work...:

>>> import Numeric
>>> x=Numeric.array(range(4),'d')
>>> x
array([ 0.,  1.,  2.,  3.])
>>> fou=open("c:/feep.dat", "wb")
>>> x.tostring()
'\000\000\000\000\000\000\000\000\000\000\000\000\000\000\360?\000\000\000\000\000\000\000@\000\000\000\000\000\000\010@'
>>> fou.write(x.tostring())
>>> fou.close()
>>>

I'm displaying x.tostring() just to show that the format is,
indeed, a binary one (8 bytes for each double-precision
number contained in the array, in machine-format).  Note,
also, that the .tostring method returns just the *data* --
no information about the *shape* of the array is preserved
(you'll need to save that separately if you also need it!).
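
One way to keep the shape with the data is to write a small header in front
of the raw doubles.  The sketch below is only an illustration: the header
layout (a 4-byte rank followed by one 4-byte integer per dimension, all
little-endian) and the names write_with_shape and read_with_shape are made
up for this example, and it uses only the standard struct module with a flat
list of values rather than a Numeric array.

```python
import os
import struct
import tempfile

def write_with_shape(path, shape, values):
    """Write a header (rank, then each dimension) followed by raw doubles."""
    with open(path, 'wb') as f:
        f.write(struct.pack('<i', len(shape)))               # number of dimensions
        f.write(struct.pack('<%di' % len(shape), *shape))    # size of each dimension
        f.write(struct.pack('<%dd' % len(values), *values))  # the data itself

def read_with_shape(path):
    """Read back a file written by write_with_shape; returns (shape, values)."""
    with open(path, 'rb') as f:
        (rank,) = struct.unpack('<i', f.read(4))
        shape = struct.unpack('<%di' % rank, f.read(4 * rank))
        count = 1
        for dim in shape:
            count *= dim
        values = struct.unpack('<%dd' % count, f.read(8 * count))
    return shape, list(values)

# Hypothetical file name in the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), 'feep_with_shape.dat')
write_with_shape(path, (2, 2), [0.0, 1.0, 2.0, 3.0])
shape, values = read_with_shape(path)
print(shape, values)
os.remove(path)
```

Any other application reading the file has to know this header layout too,
so it is worth writing it down next to the data format.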


Now you can read the file back from, say, a C application:

#include <stdio.h>

int main(int argc, char* argv[])
{
    double x;
    FILE* fin = fopen("c:/feep.dat","rb");
    if(!fin) {
        perror("Can't open c:/feep.dat");
        return 1;
    }
    /* read one machine-format double at a time until end of file */
    while(fread(&x, sizeof(x), 1, fin)) {
        printf("%8.2f ", x);
    }
    printf("\n");
    fclose(fin);
    return 0;
}

this will print:
    0.00     1.00     2.00     3.00


I suspect that calling x.tostring() makes an extra copy of the array's
data (does PyString_FromStringAndSize copy the data
it's given...?  I think it does...).  Unfortunately, I
think that, to avoid it, the array-object should implement
the "buffer-object interface", and the current version of
Numeric, I believe, does not (there is a small commented
block about "these should be added", but the methods
needed are not in fact defined and added to the object).
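
For comparison, here is a minimal sketch of what a buffer interface makes
possible.  It does not use Numeric at all: it uses the standard library's
array.array, which in present-day Python does implement the buffer protocol,
together with memoryview.  The file name is hypothetical.  The point is only
that the file object can write straight from the array's own memory, with no
intermediate string built first.

```python
import os
import tempfile
from array import array

data = array('d', [0.0, 1.0, 2.0, 3.0])

# memoryview exposes the array's existing buffer; no new bytes object is built.
view = memoryview(data)
nbytes = view.nbytes

path = os.path.join(tempfile.gettempdir(), 'feep_view.dat')
with open(path, 'wb') as f:
    f.write(view)  # binary write() accepts any bytes-like object

# The view shares memory with the array: a change through one shows in the other.
view[0] = 42.0
shared = data[0]

view.release()
size_on_disk = os.path.getsize(path)
os.remove(path)
print(nbytes == size_on_disk, shared)
```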


Alex





