[Numpy-discussion] More loadtxt() changes
Nils Wagner
nwagner at iam.uni-stuttgart.de
Thu Nov 27 03:20:56 EST 2008
On Thu, 27 Nov 2008 09:08:41 +0100
Manuel Metz <mmetz at astro.uni-bonn.de> wrote:
> Pierre GM wrote:
>> On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
>>
>>> Manuel Metz wrote:
>>>> Ryan May wrote:
>>>>> 3) Better support for missing values. The docstring mentions a way
>>>>> of handling missing values by passing in a converter. The problem
>>>>> with this is that you have to pass in a converter for *every
>>>>> column* that will contain missing values. If you have a text file
>>>>> with 50 columns, writing this dictionary of converters seems like
>>>>> ugly and needless boilerplate. I'm unsure of how best to pass in
>>>>> both what values indicate missing values and what values to fill
>>>>> in their place. I'd love suggestions.
>>>> Hi Ryan,
>>>> this would be a great feature to have !!!
>>
>> About missing values:
>>
>> * I don't think missing values should be supported in np.loadtxt.
>> That should go into a specific np.ma.io.loadtxt function, a preview
>> of which I posted earlier. I'll modify it taking Ryan's new function
>> into account, and Christopher's suggestion (defining a dictionary
>> {column name : missing values}).
>>
>> * StringConverter already defines some default filling values for
>> each dtype. In np.ma.io.loadtxt, these values can be overwritten.
>> Note that you should also be able to define a filling value by
>> specifying a converter (think float(x or 0) for example).
>>
>> * Missing values on space-separated fields are very tricky to
>> handle: take a line like "a,,,d". With a comma as separator, it's
>> clear that the 2nd and 3rd fields are missing.
>> Now, imagine that commas are actually spaces ("a   d"): 'd' is now
>> seen as the 2nd field of a 2-field record, not as the 4th field of
>> a 4-field record with 2 missing values. I thought about it, and
>> kicked it into touch.
>>
>> * That said, there should be a way to deal with fixed-length fields,
>> probably by taking consecutive slices of the initial string. That
>> way, we should be able to keep track of missing data...
>
> Certainly, yes! Dealing with fixed-length fields would be necessary.
> The case I had in mind had both -- a separator ("|") __and__
> fixed-length fields -- and is probably very special in that sense.
> But such data files exist out there...
>
See pages 9 and 10 (Bulk data input deck):
http://www.zonatech.com/Documentation/zndalusersmanual2.0.pdf
Nils
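
The points raised in this thread (a fill-value converter along the
lines of float(x or 0), the ambiguity of missing values in
whitespace-separated data, and fixed-length fields read by taking
consecutive slices of the line) can be sketched in plain Python. This
is only a minimal illustration, not how np.loadtxt or the proposed
np.ma.io.loadtxt is implemented. The names fill_float and
split_fixed_width, the sample lines, and the field width of 4 are all
assumptions made up for this example.

```python
def fill_float(field, fill=0.0):
    """Hypothetical converter in the spirit of float(x or 0):
    an empty (or all-blank) field is replaced by the fill value."""
    return float(field.strip() or fill)


def split_fixed_width(line, widths):
    """Hypothetical helper: cut a line into consecutive slices,
    one slice per field width, so blank fields keep their position."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


# Comma-separated: the two empty fields survive the split.
comma_fields = "1.5,,,4.0".split(",")
comma_values = [fill_float(f) for f in comma_fields]

# Whitespace-separated: the same record collapses to two fields,
# so the missing values can no longer be located.
space_fields = "1.5   4.0".split()
space_values = [fill_float(f) for f in space_fields]

# Fixed-width (assumed width of 4 per field): slicing by position
# recovers all four fields, including the blank ones.
fixed_line = "".join(s.ljust(4) for s in ["1.5", "", "", "4.0"])
fixed_fields = split_fixed_width(fixed_line, [4, 4, 4, 4])
fixed_values = [fill_float(f) for f in fixed_fields]

# Separator ("|") combined with fixed-length fields: splitting on
# the separator is enough, the padding is stripped by the converter.
pipe_fields = "1.5 |    |    |4.0 ".split("|")
pipe_values = [fill_float(f) for f in pipe_fields]
```

With the comma and "|" separators, and with slicing by width, the
record keeps four fields and the two blank ones receive the fill
value. With a plain whitespace split only two fields remain, which is
the ambiguity described above.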