[Numpy-discussion] Bug in genfromtxt with usecols and converters

Derek Homeier derek at astro.physik.uni-goettingen.de
Tue Aug 26 12:56:47 EDT 2014


Hi Adrian,

>> not sure whether to call it a bug; the error seems to arise before reading any actual data
>> (even on reading from an empty string); when genfromtxt is checking the filling_values used
>> to substitute missing or invalid data it is apparently testing on default testing values of 1 or -1
>> which your conversion scheme does not know about. Although I think it is rather the user’s
>> responsibility to provide valid converters, probably the documentation should at least be
>> updated to make them aware of this requirement.
>> I see two possible fixes/workarounds:
>> 
>> provide a keyword argument filling_values=[0,0,'1:1']
> This workaround seems to work, but I doubt that the actual problem is
> the converter function I pass. The '-1', which is used as the testing
> value is the first_values from the 3rd column (line 1574 in npyio.py),
> but the converter is defined for column 4. By setting the filling_values
> to an array of length 3, this obviously makes the problem disappear. But
> I think if the first row is used, it should also use the values from the
> column for which the converter is defined.

It is certainly related to the converter function, because a KeyError is raised for the dictionary you provide:
File "test.py", line 13, in <module>
    3: lambda rel: relEnum[rel.decode()]})
  File "/sw/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1581, in genfromtxt
    missing_values=missing_values[i],)
  File "/sw/lib/python3.4/site-packages/numpy/lib/_iotools.py", line 784, in update
    tester = func(testing_value or asbytes('1'))
  File "test.py", line 13, in <lambda>
    3: lambda rel: relEnum[rel.decode()]})
KeyError: '-1'

But you are right that the problem with the first_values, which should of course be valid,
stems from the use of usecols. It seems that in the loop

    for (i, conv) in user_converters.items():

the index i into user_converters and the index into usecols get out of sync. This certainly looks like a bug;
modifying i inside the loop looks rather dangerous to me. I’ll have a look at whether I can make this safer.

As long as your data don’t actually contain any missing values you might also simply use np.loadtxt.
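
A minimal sketch of what that could look like, assuming whitespace-separated data with the relation string in the fourth column; the sample data, the rel_enum mapping and the rel_to_int helper are invented for the example:

```python
import io

import numpy as np

# Hypothetical mapping, standing in for the relEnum dictionary.
rel_enum = {"1:1": 0, "1:n": 1, "n:1": 2}


def rel_to_int(field):
    """Map a relation string to its integer code.

    Depending on the NumPy version, the converter may be handed bytes
    or str, so accept both.
    """
    if isinstance(field, bytes):
        field = field.decode()
    return rel_enum[field.strip()]


# Hypothetical sample data: four columns, the third one is skipped.
sample = io.StringIO("1 2 -1 1:1\n3 4 -1 1:n\n5 6 -1 n:1\n")

# The converter key refers to the original column number (3),
# not to the position within usecols.
data = np.loadtxt(sample, dtype=int, usecols=(0, 1, 3),
                  converters={3: rel_to_int})
print(data)
```

Note that loadtxt has no handling of missing values, so this is only an option if every field is present.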

Cheers,
						Derek



