[Numpy-discussion] More loadtxt() changes

Wed Nov 26 18:16:04 EST 2008

On Nov 26, 2008, at 5:55 PM, Ryan May wrote:

> Manuel Metz wrote:
>> Ryan May wrote:
>>> 3) Better support for missing values.  The docstring mentions a way of
>>> handling missing values by passing in a converter.  The problem with this is
>>> that you have to pass in a converter for *every column* that will contain
>>> missing values.  If you have a text file with 50 columns, writing this
>>> dictionary of converters seems like ugly and needless boilerplate.  I'm
>>> unsure of how best to pass in both what values indicate missing values and
>>> what values to fill in their place.  I'd love suggestions
>>
>> Hi Ryan,
>>   this would be a great feature to have !!!

About missing values:

* I don't think missing values should be supported in np.loadtxt. That
should go into a specific np.ma.io.loadtxt function, a preview of
which I posted earlier. I'll modify it to take into account Ryan's new
function and Christopher's suggestion (defining a dictionary {column
name : missing values}).

* StringConverter already defines some default filling values for each
dtype. In np.ma.io.loadtxt, these values can be overridden. Note
that you should also be able to define a filling value by specifying a
converter (think float(x or 0), for example).

* Missing values in space-separated fields are very tricky to handle:
take a line like "a,,,d". With a comma as separator, it's clear that
the 2nd and 3rd fields are missing.
Now, imagine that the commas are actually spaces ("a     d"): 'd' is now
seen as the 2nd field of a 2-field record, not as the 4th field of a
4-field record with 2 missing values. I thought about it, and kicked it
into touch.

* That said, there should be a way to deal with fixed-length fields,  
probably by taking consecutive slices of the initial string. That way,  
we should be able to keep track of missing data...
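
A minimal sketch of the points above, in plain Python rather than NumPy.
The helper names (fill_float, split_fixed_width) and the field widths are
made up for illustration; they are not part of np.loadtxt or of the
proposed np.ma.io.loadtxt.

```python
# Illustrative sketch only: the helpers below are hypothetical, not NumPy API.

def fill_float(field, default=0.0):
    """Converter in the spirit of float(x or 0): an empty field gets a default."""
    return float(field.strip() or default)

def split_fixed_width(line, widths):
    """Cut a line into consecutive slices of the given widths."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

# Comma-delimited: empty strings mark the missing 2nd and 3rd fields.
comma_fields = "a,,,d".split(",")

# Whitespace-delimited: the run of spaces collapses into one separator,
# so the information about the two missing fields is lost.
space_fields = "a     d".split()

# Fixed-width: four fields of three characters each, the middle two blank.
line = "1.5" + " " * 6 + "2.0"
fixed_fields = split_fixed_width(line, [3, 3, 3, 3])

# Blank slices are still there, so they can be filled (or masked) afterwards.
values = [fill_float(f) for f in fixed_fields]

print(comma_fields)
print(space_fields)
print(fixed_fields)
print(values)
```

With slices, a blank field stays in its position as an empty string, so
it can be replaced by a filling value or flagged as missing, which is
exactly what whitespace splitting cannot do.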