[Numpy-discussion] More loadtxt() changes
Pierre GM
pgmdevlist at gmail.com
Wed Nov 26 18:16:04 EST 2008
On Nov 26, 2008, at 5:55 PM, Ryan May wrote:
> Manuel Metz wrote:
>> Ryan May wrote:
>>> 3) Better support for missing values. The docstring mentions a way of
>>> handling missing values by passing in a converter. The problem with this
>>> is that you have to pass in a converter for *every column* that will
>>> contain missing values. If you have a text file with 50 columns, writing
>>> this dictionary of converters seems like ugly and needless boilerplate.
>>> I'm unsure of how best to pass in both what values indicate missing
>>> values and what values to fill in their place. I'd love suggestions
>>
>> Hi Ryan,
>> this would be a great feature to have !!!
About missing values:
* I don't think missing values should be supported in np.loadtxt. That
should go into a specific np.ma.io.loadtxt function, a preview of
which I posted earlier. I'll modify it to take into account Ryan's new
function and Christopher's suggestion (defining a dictionary {column
name : missing values}).
* StringConverter already defines some default filling values for each
dtype. In np.ma.io.loadtxt, these values can be overridden. Note
that you should also be able to define a filling value by specifying a
converter (think float(x or 0), for example).
* Missing values on space-separated fields are very tricky to handle:
take a line like "a,,,d". With a comma as separator, it's clear that
the 2nd and 3rd fields are missing.
Now, imagine that the commas are actually spaces ("a   d"): 'd' is now
seen as the 2nd field of a 2-field record, not as the 4th field of a
4-field record with 2 missing values. I thought about it, and kicked it
into touch.
* That said, there should be a way to deal with fixed-length fields,
probably by taking consecutive slices of the initial string. That way,
we should be able to keep track of missing data...
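
To illustrate the converter idea from the second point, here is a minimal sketch in plain Python. The name fill_float and the 50-column count are hypothetical, chosen only for illustration. It assumes that converters are passed as a dictionary mapping a column index to a callable, as the loadtxt docstring describes:

```python
def fill_float(field, default=0.0):
    """Convert a text field to float, using `default` for an empty field."""
    # An empty string is falsy, so `field or default` falls back to the default.
    return float(field or default)


# One line of comma-separated text whose 2nd field is missing.
fields = "1.5,,3.0".split(",")
values = [fill_float(f) for f in fields]
print(values)  # [1.5, 0.0, 3.0]

# Assumption: a 50-column file where any column may have gaps.
# Building the dictionary in one expression avoids writing 50 entries by hand.
converters = {col: fill_float for col in range(50)}
print(len(converters))  # 50
```

Note that a field holding only whitespace is not falsy, so this sketch would raise ValueError on it; stripping the field first would be one way around that.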
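
The separator problem and the fixed-length idea from the last two points can be sketched the same way. The helper split_fixed_width and the 4-character column width are assumptions made up for this example, not part of any existing function:

```python
def split_fixed_width(line, widths):
    """Cut `line` into consecutive slices of the given widths."""
    fields = []
    start = 0
    for width in widths:
        # Take the next slice and drop the padding around the value.
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# With commas, the two empty fields are preserved.
comma_fields = "a,,,d".split(",")
print(comma_fields)  # ['a', '', '', 'd']

# With whitespace as separator, runs of spaces collapse and the
# missing fields disappear: 'd' becomes the 2nd field.
space_fields = "a   d".split()
print(space_fields)  # ['a', 'd']

# Fixed-length record: 4 columns, each 4 characters wide (assumed layout).
record = "a".ljust(4) + "".ljust(4) + "".ljust(4) + "d".ljust(4)
fixed_fields = split_fixed_width(record, [4, 4, 4, 4])
print(fixed_fields)  # ['a', '', '', 'd']
```

Because each slice has a known position, an all-blank slice can be recognised as a missing value instead of being swallowed by the separator.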