[Numpy-discussion] loadtxt/savetxt tickets

Ralf Gommers ralf.gommers at googlemail.com
Thu Mar 31 10:42:13 EDT 2011


On Thu, Mar 31, 2011 at 4:53 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
> <paul.anton.letnes at gmail.com> wrote:
>>
>> On 26 March 2011, at 21:44, Derek Homeier wrote:
>>
>> > Hi Paul,
>> >
>> > having had a look at the other tickets you dug up,
>> >
>> >> My opinions are my own, and in detail, they are:
>> >> 1752:
>> >>   I attach a possible patch. FWIW, I agree with the request. The
>> >> patch is written to be compatible with the fix in ticket #1562, but
>> >> I did not test that yet.
>> >
>> > Tested, see also my comments on Trac.
>>
>> Great!
>>
>> >> 1731:
>> >>   This seems like a rather trivial feature enhancement. I attach a
>> >> possible patch.
>> >
>> > Agreed. Haven't tested it though.
>>
>> Great!
>>
>> >> 1616:
>> >>   The suggested patch seems reasonable to me, but I do not have a
>> >> full list of what objects loadtxt supports today as opposed to what
>> >> this patch will support.
>>
>> Looks like you got this one. Just remember to make it compatible with
>> #1752. Should be easy.
>>
>> >> 1562:
>> >>   I attach a possible patch. This could also be the default
>> >> behavior to my mind, since the function caller can simply call
>> >> numpy.squeeze if needed. Changing default behavior would probably
>> >> break old code, however.
>> >
>> > See comments on Trac as well.
>>
>> Your patch is better, but there is one thing I disagree with.
>> 808    if X.ndim < ndmin:
>> 809        if ndmin == 1:
>> 810            X.shape = (X.size, )
>> 811        elif ndmin == 2:
>> 812            X.shape = (X.size, 1)
>> The last line should be:
>> 812            X.shape = (1, X.size)
>> If someone wants a 2D array out, they would most likely expect a one-row
>> file to come out as a one-row array, not the other way around. IMHO.
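>>
>> As a quick illustration of the shape I would expect (just a sketch with a
>> made-up one-row "file", not the actual patch):
>>
>>     import numpy as np
>>     from io import StringIO
>>
>>     one_row = StringIO(u"1.0 2.0 3.0")   # file containing a single row
>>     X = np.loadtxt(one_row)              # 1-D result: shape (3,)
>>     if X.ndim < 2:                       # emulate ndmin=2 by hand
>>         X.shape = (1, X.size)            # one row in -> one-row array out
>>     print(X.shape)                       # (1, 3), not (3, 1)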
>>
>> >> 1458:
>> >>   The fix suggested in the ticket seems reasonable, but I have
>> >> never used record arrays, so I am not sure of this.
>> >
>> > There were some issues with Python3, and I also had some general
>> > reservations as noted on Trac - basically, it makes 'unpack' equivalent
>> > to transposing for 2D-arrays, but to splitting into fields for
>> > 1D-recarrays. My question was, what's going to happen when you get to
>> > 2D-recarrays? Currently this is not an issue since loadtxt can only
>> > read 2D regular or 1D structured arrays. But this might change if the
>> > data block functionality (see below) were to be implemented - data
>> > could then be returned as 3D arrays or 2D structured arrays... Still,
>> > it would probably make most sense (or at least give the widest
>> > functionality) to have 'unpack=True' always return a list or iterator
>> > over columns.
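>> >
>> > For concreteness, roughly the difference I mean (made-up data, just a
>> > sketch, not part of any patch):
>> >
>> >     import numpy as np
>> >
>> >     # regular 2-D array: 'unpack=True' amounts to a transpose
>> >     a = np.array([[1., 2.], [3., 4.], [5., 6.]])
>> >     x, y = a.T
>> >
>> >     # 1-D structured array: 'unpack' would instead split into fields
>> >     r = np.array([(1, 2.0), (3, 4.0)],
>> >                  dtype=[('i', int), ('f', float)])
>> >     cols = [r[name] for name in r.dtype.names]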
>>
>> OK, I don't know recarrays, as I said.
>>
>> >> 1445:
>> >>   Adding this functionality could break old code, as some old
>> >> datafiles may have empty lines which are now simply ignored. I do
>> >> not think the feature is a good idea. It could rather be implemented
>> >> as a separate function.
>> >> 1107:
>> >>   I do not see the need for this enhancement. In my eyes, the
>> >> usecols kwarg does this and more. Perhaps I am misunderstanding
>> >> something here.
>> >
>> > Agree about #1445, and the bit about 'usecols' - 'numcols' would just
>> > provide a shorter call to e.g. read the first 20 columns of a file
>> > (well, not even that much over 'usecols=range(20)'...); I don't think
>> > that justifies an extra argument. But the 'datablocks' provides
>> > something new, that a number of people seem to miss from e.g. gnuplot
>> > (including me, actually ;-). And it would also satisfy the request from
>> > #1445 without breaking backwards compatibility.
>> > I've been wondering if one could instead specify the separator lines
>> > through the parameter, e.g. "blocksep=['None', 'blank','invalid']",
>> > not sure if that would make it more useful...
>>
>> What about writing a separate function, e.g. loadblocktxt, and having it
>> separate the chunks and call loadtxt for each chunk? Just a thought. Another
>> possibility would be to write a function that would let you load a set of
>> text files in a directory, and return a dict of datasets, one per file. One
>> could write a similar save-function, too. They would just need to call
>> loadtxt/savetxt on a per-file basis.
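>>
>> Something along these lines is what I have in mind ('loadblocktxt' is only
>> a suggested name, and the sketch below is untested):
>>
>>     import numpy as np
>>     from io import StringIO
>>
>>     def loadblocktxt(fname, **kwargs):
>>         # split the file into blocks at blank lines and run np.loadtxt
>>         # on each block; returns a list of arrays, one per block
>>         with open(fname) as f:
>>             text = f.read()
>>         blocks = [b for b in text.split('\n\n') if b.strip()]
>>         return [np.loadtxt(StringIO(b), **kwargs) for b in blocks]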
>>
>> >> 1071:
>> >>      It is not clear to me whether loadtxt is supposed to support
>> >> missing values in the fashion indicated in the ticket.
>> >
>> > In principle it should at least allow you to, by the use of converters
>> > as described there. The problem is, the default delimiter is described
>> > as 'any whitespace', which in the present implementation obviously
>> > includes any number of blanks or tabs. These are therefore treated
>> > differently from delimiters like ',' or '&'. I'd reckon there are too
>> > many people actually relying on this behaviour to silently change it
>> > (e.g. I know plenty of tables with columns separated by either one or
>> > several tabs depending on the length of the previous entry). But the
>> > tab is apparently also treated differently if explicitly specified with
>> > "delimiter='\t'" - and in that case a converter à la
>> > {2: lambda s: float(s or 'NaN')} works for fields in the middle of the
>> > line, but not at the end - which clearly warrants improvement. I've
>> > prepared a patch working for Python3 as well.
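>> >
>> > For reference, the kind of call I mean looks roughly like this (the
>> > data here are invented just for the example):
>> >
>> >     import numpy as np
>> >     from io import StringIO
>> >
>> >     data = StringIO(u"1\t2\t3\n4\t\t6\n")   # empty field in column 1
>> >     conv = {1: lambda s: float(s or 'NaN')}
>> >     a = np.loadtxt(data, delimiter='\t', converters=conv)
>> >     # the empty middle field comes back as nan; an empty field at the
>> >     # end of a line still fails, as described above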
>>
>> Great!
>>
>> >> 1163:
>> >> 1565:
>> >>   These tickets seem to stem from the same underlying problem. I
>> >> attach one possible patch. The previously suggested patches that
>> >> I've seen will not correctly convert floats to ints, which I believe
>> >> my patch will.
>> >
>> > +1, though I am a bit concerned that raising and catching a ValueError
>> > for every element could impede performance. I'd probably still enclose
>> > it in an
>> > if issubclass(typ, np.uint64) or issubclass(typ, np.int64):
>> > just like in npio.patch. I also thought one might switch to
>> > int(float128(x)) in that case, but at least for the given examples
>> > float128 cannot convert with more accuracy than float64 (even on
>> > PowerPC ;-).
>> > There were some dissenting opinions that trying to read a float into
>> > an int should generally throw an exception though...
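>> >
>> > Roughly what I have in mind, as a sketch only (not the actual patch):
>> >
>> >     import numpy as np
>> >
>> >     def _int_conv(x):
>> >         # try the exact integer conversion first and only fall back to
>> >         # going through float when it fails, so '1.5e3' becomes 1500;
>> >         # the fallback is what costs a ValueError per float-valued entry
>> >         try:
>> >             return int(x)
>> >         except ValueError:
>> >             return int(float(x))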
>> >
>> > And Chuck just beat me...
>>
>> I am sure someone has been using this functionality to convert floats to
>> ints. Changing it will break their code. I am not sure how big a deal that
>> would be. Also, I am of the opinion that one should _first_ write a program
>> that works _correctly_, and only afterwards worry about performance. Even
>> so, using numpy.int64 would be better, because it would give a sensible
>> error message.
>>
>> > On 26 Mar 2011, at 21:25, Charles R Harris wrote:
>> >
>> >> I put all these patches together at
>> >> https://github.com/charris/numpy/tree/loadtxt-savetxt
>> >> . Please pull from there to continue work on loadtxt/savetxt so as
>> >> to avoid conflicts in the patches. One of the numpy tests is
>> >> failing, I assume from patch conflicts, and more tests for the
>> >> tickets are needed in any case. Also, new keywords should be added
>> >> to the end, not put in the middle of existing keywords.
>> >>
>> >> I haven't reviewed the patches, just tried to get them organized.
>> >> Also, I have Derek as the author on all of them; that can be changed
>> >> if it is decided the credit should go elsewhere ;) Thanks for the
>> >> work you all have been doing on these tickets.
>> >
>> > Thanks, I'll have a look at the new ticket and try to get that
>> > organized!
>>
>> With some luck, all the loadtxt tickets should be closed in short order :-)
>>
>
> Can someone bring me up to date on which tickets have been dealt with?
> Also, how do they divide into bug fixes and enhancements?

If you look in Trac under "All Tickets by Milestone" you'll find all
nine tickets together under 1.6.0. Five are bug fixes, four are
enhancements. There are some missing tests, but all tickets have
proposed patches.

Cheers,
Ralf


