[SciPy-dev] [patch] read/write v5 .mat files with structs, cell arrays, objects, or function handles
Vebjorn Ljosa
ljosa at broad.mit.edu
Thu Oct 2 10:14:14 EDT 2008
Stéfan van der Walt <stefan at sun.ac.za> writes:
>
> We should certainly look at applying the non-API-changing parts,
> though. I'm not sure what the best way is to represent these
> structures on the Python side.
>
> Thouis, you've thought about this a lot: could you tell us the pros
> and cons of switching to the new representation?
The reason Ray and I changed some of the representations is that we
wanted the mapping from Matlab to Python to be symmetric: anything read
from a MAT-file should be represented in a way that allows the writer
code to write it back in its original form. This requires that the
original Matlab type be deducible from the Python representation.
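As an illustration of the round-trip requirement, here is a minimal sketch of how writer code could deduce the Matlab class from the Python value alone. It assumes the representations proposed in this message: record arrays for structs, 'U' string arrays for char data, and object arrays only for cells. `guess_matlab_class` is a hypothetical helper, not part of scipy.io. The last lines show why the old scheme could not tell an empty cell array from an empty struct array.

```python
import numpy as np

def guess_matlab_class(value):
    """Guess which Matlab class a loaded value should be written back as.

    Hypothetical helper. Assumes the proposed mapping: record arrays for
    structs, unicode ('U') arrays for char data, object arrays for cells.
    """
    arr = np.asarray(value)
    if arr.dtype.names is not None:
        return "struct"   # structured dtype: one field per struct field
    if arr.dtype.kind == "U":
        return "char"     # fixed-width unicode strings
    if arr.dtype.kind == "O":
        return "cell"     # object dtype is reserved for cell arrays
    return "other"        # numeric, logical, ...

# New scheme: empty arrays still carry their class in the dtype.
empty_cell = np.empty((0, 0), dtype=object)
empty_struct = np.empty((0, 0), dtype=[("a", object), ("b", object)])
print(guess_matlab_class(empty_cell))           # cell
print(guess_matlab_class(empty_struct))         # struct
print(guess_matlab_class(np.array(["hello"])))  # char

# Old scheme: a 0x0 struct array was also a 0x0 object array, with no
# mat_struct elements left to inspect, so the two cases look identical.
old_struct = np.empty((0, 0), dtype=object)
print(old_struct.dtype == empty_cell.dtype and old_struct.shape == empty_cell.shape)  # True
```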
* Struct arrays: Matlab struct arrays were previously represented as
numpy arrays of dtype=object filled with instances of mat_struct.
The problem is that Matlab cell arrays were also represented as numpy
arrays of dtype=object. In most cases the writer code could have
identified structs by inspecting the contents (instances of
mat_struct), but an empty array has no contents to inspect, so there
was no way to distinguish a 0x0 cell array from a 0x0 struct array.
We therefore opted to represent struct arrays as numpy record arrays.
To avoid breaking existing code, we could introduce a keyword
argument to loadmat that selects the old or new representation,
similar to numpy.histogram's "new" argument. In 0.7, omitting the
argument would select the old behavior (False) but issue a
deprecation warning. Later versions could first change the default to
True and then remove the old behavior entirely. The best name I can
think of for this keyword argument is "struct_as_record".
* Char arrays/strings: Same story. At the lowest level, the code
represents char arrays as numpy arrays of dtype='U1', which is
fine. However, a very useful "processor function" (in miobase) turns
them into arrays of strings, and it created an array of dtype=object,
which looks just like a cell array. We changed this to 'U...' so that
the array can be distinguished from a cell array. I think this is
unlikely to break any code; do you agree?
* Objects: This change in representation was purely for our
convenience, and we should be able to fix our patch to keep the old
representation.
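The struct_as_record transition proposed above could look roughly like the following sketch. `convert_struct_array` and `MatStruct` are hypothetical stand-ins, not scipy.io's loadmat or mat_struct, and the input format (a dict of flat, column-major value lists per field) is an assumption made for illustration. Omitting the keyword warns and keeps the old object-array form; struct_as_record=True yields a record array with one field per struct field.

```python
import warnings
import numpy as np

class MatStruct:
    """Hypothetical stand-in for the old mat_struct: fields become attributes."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

def convert_struct_array(fields, shape, struct_as_record=None):
    """Hypothetical converter for one decoded Matlab struct array.

    fields maps each field name to a flat list with one value per element,
    assumed to be in column-major (Matlab) order.
    """
    if struct_as_record is None:
        # Proposed 0.7 policy: keep the old behavior, but warn.
        warnings.warn(
            "struct_as_record will default to True in a future release; "
            "pass it explicitly to silence this warning",
            DeprecationWarning,
            stacklevel=2,
        )
        struct_as_record = False
    names = list(fields)
    n = int(np.prod(shape))
    if struct_as_record:
        # New form: a record array with one object-dtype field per struct field.
        out = np.empty(n, dtype=[(name, object) for name in names])
        for name in names:
            for i in range(n):
                out[name][i] = fields[name][i]  # field access is a view into out
    else:
        # Old form: an object array holding one MatStruct per element.
        out = np.empty(n, dtype=object)
        for i in range(n):
            out[i] = MatStruct(**{name: fields[name][i] for name in names})
    return out.reshape(shape, order="F")

new = convert_struct_array({"a": [1, 2], "b": ["x", "y"]}, (1, 2), struct_as_record=True)
print(new.dtype.names)  # ('a', 'b')
```

Using dtype=object for every field keeps the sketch simple, since a Matlab struct field can hold a value of any class and size in each element.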
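The char-array change comes down to the dtype chosen for the joined strings. The sketch below uses a hypothetical `chars_to_strings` helper (not the actual miobase processor function) that joins each row of a 2-D 'U1' char array and returns either an object array, as before, or a fixed-width 'U' array, as proposed. Only the latter can be told apart from a cell array by its dtype.

```python
import numpy as np

def chars_to_strings(chars, as_object=False):
    """Join each row of a 2-D 'U1' char array into one string.

    Hypothetical helper. as_object=True mimics the old dtype=object
    result; the default gives the proposed fixed-width 'U' result.
    """
    chars = np.asarray(chars, dtype="U1")
    rows = ["".join(row) for row in chars]
    if as_object:
        out = np.empty(len(rows), dtype=object)
        out[:] = rows
        return out
    # dtype=str lets numpy pick the width (U3 for 3-column rows)
    return np.array(rows, dtype=str)

chars = np.array([list("cat"), list("dog")])  # shape (2, 3), dtype U1
print(chars_to_strings(chars).dtype.kind)             # U
print(chars_to_strings(chars, as_object=True).dtype)  # object
```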
Vebjorn