[Numpy-discussion] More loadtxt() changes

Manuel Metz mmetz at astro.uni-bonn.de
Thu Nov 27 03:08:41 EST 2008


Pierre GM wrote:
> On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
> 
>> Manuel Metz wrote:
>>> Ryan May wrote:
>>>> 3) Better support for missing values.  The docstring mentions a way of
>>>> handling missing values by passing in a converter.  The problem with this is
>>>> that you have to pass in a converter for *every column* that will contain
>>>> missing values.  If you have a text file with 50 columns, writing this
>>>> dictionary of converters seems like ugly and needless boilerplate.  I'm
>>>> unsure of how best to pass in both what values indicate missing values and
>>>> what values to fill in their place.  I'd love suggestions
>>> Hi Ryan,
>>>   this would be a great feature to have !!!
> 
> About missing values:
> 
> * I don't think missing values should be supported in np.loadtxt. That  
> should go into a specific np.ma.io.loadtxt function, a preview of  
> which I posted earlier. I'll modify it taking Ryan's new function into  
> account, and Christopher's suggestion (defining a dictionary {column
> name : missing values}).
> 
> * StringConverter already defines some default filling values for each  
> dtype. In np.ma.io.loadtxt, these values can be overridden. Note
> that you should also be able to define a filling value by specifying a
> converter (think float(x or 0) for example).
> 
> * Missing values on space-separated fields are very tricky to handle:
> take a line like "a,,,d". With a comma as separator, it's clear that  
> the 2nd and 3rd fields are missing.
> Now, imagine that the commas are actually spaces ("a     d"): 'd' is now
> seen as the 2nd field of a 2-field record, not as the 4th field of a
> 4-field record with 2 missing values. I thought about it, and kicked it
> into touch.
> 
> * That said, there should be a way to deal with fixed-length fields,  
> probably by taking consecutive slices of the initial string. That way,  
> we should be able to keep track of missing data...
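
To make the two points above concrete, here is a minimal sketch in plain
Python (no loadtxt involved). The helper name fill_float is made up for
illustration; only str.split and float are real calls. It shows why empty
fields survive a comma split but vanish in a whitespace split, and how a
converter in the spirit of float(x or 0) fills an empty field.

```python
def fill_float(field, default=0.0):
    """Hypothetical converter: an empty field becomes `default`."""
    # "" is falsy, so `field or default` falls back to the default value.
    return float(field or default)

comma_line = "a,,,d"
space_line = "a     d"

# Splitting on an explicit comma keeps the empty 2nd and 3rd fields.
comma_fields = comma_line.split(",")

# Splitting on runs of whitespace collapses the gap, so the two
# missing fields are lost and 'd' becomes the 2nd field.
space_fields = space_line.split()

print(comma_fields)
print(space_fields)
print(fill_float(""), fill_float("2.5"))
```

The comma split yields four fields and the whitespace split only two, which
is exactly the ambiguity described in the quoted text.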

Certainly, yes! Dealing with fixed-length fields would be necessary. The 
case I had in mind had both -- a separator ("|") __and__ fixed-length 
fields -- and is probably very special in that sense. But such
data files do exist out there...
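
A rough sketch of the "consecutive slices" idea for such a file, assuming
every field has a known width and the "|" separator always occupies exactly
one character. The function name split_fixed_width and its parameters are
invented for this example and are not part of loadtxt or any NumPy API.

```python
def split_fixed_width(line, widths, separator_width=0):
    """Cut `line` into consecutive slices of the given widths.

    Blank slices are returned as None so that missing values keep
    their position in the record. Illustrative helper only.
    """
    fields = []
    start = 0
    for width in widths:
        raw = line[start:start + width]
        # A slice holding only blanks is treated as a missing value.
        fields.append(raw.strip() or None)
        # Skip over the separator (if any) before the next field.
        start += width + separator_width
    return fields

# Four 3-character fields separated by "|"; the 2nd and 3rd are blank.
record = "a  |   |   |d  "
print(split_fixed_width(record, (3, 3, 3, 3), separator_width=1))
```

Because the slices are taken by position rather than by splitting on
whitespace, the blank 2nd and 3rd fields stay in place as missing values.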

mm


