[Numpy-discussion] More loadtxt() changes

Thu Nov 27 03:20:56 EST 2008

On Thu, 27 Nov 2008 09:08:41 +0100
  Manuel Metz <mmetz at astro.uni-bonn.de> wrote:
> Pierre GM wrote:
>> On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
>> 
>>> Manuel Metz wrote:
>>>> Ryan May wrote:
>>>>> 3) Better support for missing values.  The docstring mentions a
>>>>> way of handling missing values by passing in a converter.  The
>>>>> problem with this is that you have to pass in a converter for
>>>>> *every column* that will contain missing values.  If you have a
>>>>> text file with 50 columns, writing this dictionary of converters
>>>>> seems like ugly and needless boilerplate.  I'm unsure of how best
>>>>> to pass in both what values indicate missing values and what
>>>>> values to fill in their place.  I'd love suggestions
>>>> Hi Ryan,
>>>>   this would be a great feature to have !!!
>> 
>> About missing values:
>> 
>> * I don't think missing values should be supported in np.loadtxt.
>> That should go into a specific np.ma.io.loadtxt function, a preview
>> of which I posted earlier. I'll modify it taking Ryan's new function
>> into account, and Christopher's suggestion (defining a dictionary
>> {column name : missing values}).
>> 
>> * StringConverter already defines some default filling values for
>> each dtype. In np.ma.io.loadtxt, these values can be overwritten.
>> Note that you should also be able to define a filling value by
>> specifying a converter (think float(x or 0) for example).
>> 
>> * Missing values on space-separated fields are very tricky to
>> handle: take a line like "a,,,d". With a comma as separator, it's
>> clear that the 2nd and 3rd fields are missing.
>> Now, imagine that commas are actually spaces ("a   d"): 'd' is now
>> seen as the 2nd field of a 2-field record, not as the 4th field of
>> a 4-field record with 2 missing values. I thought about it, and
>> kicked it into touch.
>> 
>> * That said, there should be a way to deal with fixed-length
>> fields, probably by taking consecutive slices of the initial
>> string. That way, we should be able to keep track of missing data...
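
To make the converter and separator points above concrete, here is a
minimal sketch in plain Python. The helper name fill_float and its
default of 0.0 are my own assumptions for illustration; this is not
StringConverter or anything that exists in np.loadtxt, just the
float(x or 0) idea written out, plus the comma-versus-space ambiguity.

```python
def fill_float(field, default=0.0):
    """Convert a text field to float; an empty field yields `default`.

    Hypothetical helper: same idea as float(x or 0), but it also
    tolerates fields that contain only whitespace.
    """
    field = field.strip()
    return float(field) if field else default


comma_line = "a,,,d"
space_line = "a   d"

# With an explicit separator, empty strings mark the missing fields.
comma_fields = comma_line.split(",")   # ['a', '', '', 'd']

# With whitespace as separator, runs of spaces collapse into one
# separator, so the two missing fields are silently lost.
space_fields = space_line.split()      # ['a', 'd']

# Applying the converter to a numeric row with gaps.
numeric_row = [fill_float(f) for f in "1.5,,,4".split(",")]
```

A function like this could be handed to loadtxt as a converter, but only
column by column, which is exactly the boilerplate Ryan complains about.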
> 
> Certainly, yes! Dealing with fixed-length fields would be necessary.
> The case I had in mind had both -- a separator ("|") __and__
> fixed-length fields -- and is probably very special in that sense.
> But such data files exist out there...
> 
See pages 9 and 10 (bulk data input deck):
http://www.zonatech.com/Documentation/zndalusersmanual2.0.pdf
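
For fixed-length fields, the "consecutive slices" idea could look
roughly like the sketch below. The function name split_fixed_width and
the field width of 8 are assumptions for illustration only; they are
not taken from the manual above or from any existing numpy function.

```python
def split_fixed_width(line, widths):
    """Cut `line` into consecutive slices of the given widths.

    Slices that contain only blanks are returned as None, so the
    position of each missing field is preserved.
    """
    fields = []
    start = 0
    for width in widths:
        chunk = line[start:start + width].strip()
        fields.append(chunk if chunk else None)
        start += width
    return fields


# A 4-field record with 8-character fields; fields 2 and 3 are blank.
record = "a".ljust(8) + " " * 16 + "d".ljust(8)
fields = split_fixed_width(record, [8, 8, 8, 8])   # ['a', None, None, 'd']
```

Because the slicing is purely positional, 'd' stays the 4th field even
though everything between 'a' and 'd' is whitespace.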

Nils