[Numpy-discussion] genfromtxt - the return

Tue Oct 6 22:08:58 EDT 2009

On Tue, Oct 6, 2009 at 4:04 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>
> On Oct 6, 2009, at 4:43 PM, Christopher Barker wrote:
>
>> Pierre GM wrote:
>>>> I think that the default invalid_raise should be True.
>>>
>>> Mmh, OK, that's a +1/) for invalid_raise=True. Anybody else?
>>
>> yup -- make it +2 -- ignoring errors and losing data by default is a
>> "bad idea"!
>
> OK then, that's enough for me: I'll put invalid_raise as True by
> default. Note that a warning was emitted no matter what.
>
>
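For readers following the invalid_raise discussion above, here is a
minimal sketch of the two behaviours. The sample data is made up for
illustration; the second line deliberately has too few columns.

```python
import io
import warnings

import numpy as np

# Hypothetical sample: line 2 has two columns instead of three.
text = "1 2 3\n4 5\n7 8 9\n"

# With invalid_raise=True, the inconsistent line raises an exception.
try:
    np.genfromtxt(io.StringIO(text), invalid_raise=True)
    raised = False
except ValueError:
    raised = True

# With invalid_raise=False, the bad line is skipped and a warning is
# emitted; the warning is silenced here only to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    kept = np.genfromtxt(io.StringIO(text), invalid_raise=False)

print(raised)
print(kept)
```

Skipping a line silently drops data, which is the concern raised above
about making that the default.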
>>
>>>> One 'feature' is that there is no way to indicate multiple
>>>> delimiters
>>>> when the delimiter is whitespace.
>>>> A B C D
>>>> 1 2 3 4
>>>> 1     4 5
>>
>> I'd say someone has made a very poor choice of file formats!

No, just seeing what sort of problems I can create. This case comes
partly from tab-delimited files: if someone is using one, they need to
set delimiter='\t', otherwise it gives an error. Also, I often parse
text files so, yes, you have to be careful with the delimiters. It
also arises because certain programs, like spreadsheets, have an
option to merge consecutive delimiters. In SAS that is actually the
default (you need to specify the DSD option to change it).
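As a rough illustration of the tab-delimited case, assuming a small
made-up file with one empty field: passing delimiter='\t' keeps the
empty field as a missing value, whereas the default whitespace
splitting would merge the two tabs and see too few columns.

```python
import io

import numpy as np

# Hypothetical tab-delimited sample; the middle field of the last
# row is empty (two consecutive tabs).
text = "A\tB\tC\n1\t2\t3\n4\t\t6\n"

# An explicit tab delimiter treats each tab as a field boundary, so
# the empty field becomes nan instead of shifting the columns.
data = np.genfromtxt(io.StringIO(text), delimiter="\t", skip_header=1)

print(data)
```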

>>
>> Unless this is a fixed-width file, in which case it should be processed
>> as such, rather than as a delimited one. I suppose it wouldn't hurt to
>> add that feature to genfromtxt... or is it there already? Perhaps
>> that's what this means:
>>
>>> Have you tried using a sequence of integers for the delimiter ?
>
> Yes, if you give a sequence of integers as delimiter, it is
> interpreted as the length of each field. At least, should be.
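A small sketch of the fixed-width reading described above, assuming
made-up data in which every field is two characters wide. Blank
fields are expected to come back as nan with the default float dtype.

```python
import io

import numpy as np

# Hypothetical fixed-width sample: four fields, two characters each.
# In the second row the two middle fields are blank.
text = "1 2 3 4\n1     4\n"

# A sequence of integers is read as the width of each field, not as
# a delimiter character.
fixed = np.genfromtxt(io.StringIO(text), delimiter=(2, 2, 2, 2))

print(fixed)
```

The same data read with the default whitespace delimiter would see
only two values on the second row, which is the problem raised earlier
in the thread.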

More to learn and test.

Anyhow, I am really impressed by how well this function works.

Bruce