[Numpy-discussion] loadtxt ndmin option

Derek Homeier derek at astro.physik.uni-goettingen.de
Sun Jun 19 13:11:13 EDT 2011


On 31 May 2011, at 21:28, Pierre GM wrote:

> On May 31, 2011, at 6:37 PM, Derek Homeier wrote:
>
>> On 31 May 2011, at 18:25, Pierre GM wrote:
>>
>>> On May 31, 2011, at 5:52 PM, Derek Homeier wrote:
>>>>
>>>> I think stuff like multiple delimiters should have been dealt with
>>>> before, as the right place to insert the ndmin code (which includes
>>>> the decision to squeeze or not to squeeze as well as to add
>>>> additional dimensions, if required) would be right at the end before
>>>> the 'unpack' switch, or rather replacing the bit:
>>>>
>>>>  if usemask:
>>>>      output = output.view(MaskedArray)
>>>>      output._mask = outputmask
>>>>  if unpack:
>>>>      return output.squeeze().T
>>>>  return output.squeeze()
>>>>
>>>> But there it's already not clear to me how to deal with the
>>>> MaskedArray case...
>>>
>>> Oh, easy.
>>> You need to replace only the last three lines of genfromtxt with the
>>> ones from loadtxt  (808-833). Then, if usemask is True, you need to
>>> use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
>>> Comments:
>>> * I would raise an exception if ndmin isn't correct *before* trying
>>> to read the file...
>>> * You could define a `collapse_function` that would be
>>> `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on
>>> the values of `usemask` and `ndmin`...

Thanks, that helped to clean up a little bit.

>>> If you have any question about numpy.ma, don't hesitate to contact
>>> me directly.
>>
>> Thanks for the directions! I was not sure about the usemask case
>> because it presently does not invoke .squeeze() either...
>
> The idea is that if `usemask` is True, you build a second array (the  
> mask), that you attach to your main array at the very end (in the  
> `output=output.view(MaskedArray), output._mask = mask` combo...).  
> Afterwards, it's a regular MaskedArray that supports the .squeeze()  
> method...
>
OK, in both cases output.squeeze() is now used if ndim>ndmin and  
usemask is False - at least it does not break any tests, so it seems  
to work with MaskedArrays as well.
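
To illustrate the squeeze-then-pad logic discussed above, here is a minimal
sketch. It is not the code from the patch: `apply_ndmin` is a hypothetical
helper name, and the order of the steps is an assumption modelled on how I
understand loadtxt to handle ndmin. The np/ma switch follows Pierre's
suggestion of picking the atleast_Xd function depending on `usemask`.

```python
import numpy as np
import numpy.ma as ma


def apply_ndmin(output, ndmin=0, usemask=False):
    """Hypothetical helper: squeeze `output`, then ensure >= ndmin dimensions."""
    # Validate ndmin up front (ideally this check happens before reading the file).
    if ndmin not in (0, 1, 2):
        raise ValueError("Illegal value of ndmin keyword: %s" % ndmin)
    # Remove singleton dimensions only if there are more than requested.
    if output.ndim > ndmin:
        output = output.squeeze()
    # Pick the matching family of atleast_Xd functions.
    mod = ma if usemask else np
    # Pad in this order to cover the odd case ndmin=1 with a 0-d squeezed result.
    if output.ndim == 0 and ndmin >= 1:
        output = mod.atleast_1d(output)
    elif output.ndim == 1 and ndmin == 2:
        # A 1-d result is treated as a single column here (an assumption).
        output = mod.atleast_2d(output).T
    return output
```

With this sketch, a single-row input of shape (1, 3) comes back with shape
(3,) for ndmin=0 or 1 and keeps shape (1, 3) for ndmin=2.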
>
>> On a
>> possibly related note, genfromtxt also treats the 'unpack'ing of
>> structured arrays differently from loadtxt (which returns a list of
>> arrays in that case) - do you know if this is on purpose, or also
>> rather missing functionality (I guess it might break  
>> recfromtxt()...)?
>
> Keep in mind that I haven't touched genfromtxt for 8-10 months or
> so. I wouldn't be surprised if it were lagging a bit behind
> loadtxt in terms of development. Yes, there'll be some tweaking to
> do for recfromtxt (it's OK for now if `ndmin` and `unpack` are the
> defaults) and others, but nothing major.

Well, at long last I got around to implementing the above and added the
corresponding tests for genfromtxt, with the exception of the
dimension-0 cases, since genfromtxt raises an error on empty files.
There is already a comment suggesting it should perhaps return an empty
array instead, so I am putting that idea up for discussion here again.
I tried to devise a very basic test with masked arrays, which for now is
just added to test_withmissing.
I also implemented the same unpacking behaviour for structured arrays as
in loadtxt and simply made recfromtxt set unpack=False so that it keeps
working (or should it issue a warning?).
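
For reference, the unpacking behaviour meant here (a list of arrays, one per
field, for structured arrays) can be sketched as follows. `unpack_output` is
a hypothetical name used only for illustration, not a function from the
patch or from numpy.

```python
import numpy as np


def unpack_output(output):
    """Hypothetical helper mimicking unpack=True on an already-parsed array."""
    names = output.dtype.names
    if names is not None:
        # Structured array: return one array per field.
        return [output[name] for name in names]
    # Plain array: transposing lets the columns be unpacked as x, y, z = ...
    return output.T


rec = np.array([(1, 2.5), (2, 3.5)], dtype=[("a", int), ("b", float)])
a, b = unpack_output(rec)
```

A record array cannot be handed back as such a list, which is why
recfromtxt has to force unpack=False in this scheme.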

The patches are up for review as commit 8ac01636 in my iocnv-wildcard  
branch:
https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard

Cheers,
							Derek



