[SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers?

Skipper Seabold jsseabold at gmail.com
Thu Jun 3 11:06:40 EDT 2010


On Wed, Jun 2, 2010 at 1:14 PM, Stefan <stefan.czesla at hs.uni-hamburg.de> wrote:
>
>> Not that I am complaining; rather, I am trying to understand what is
>> expected to happen.
>> Under the patch, it is very much user beware.  The header argument can
>> be anything or nothing. There is no check for the contents or if the
>> delimiter used is the same as the rest of the output. Further with the
>> newline option there is no guarantee that the lines in the header will
>> have the same line endings throughout the file.
>> So what should a user be allowed to use as a header?
>> You could write a whole program there or an explanation of the
>> following output - which is very appealing. You could force a list of
>> strings so that you print out newline.join(header) - okay not quite
>> because it should include the comment argument.
>> Should savetxt be restricted to something that loadtxt can read?
>> This is potentially problematic if you want a header line. Although it
>> could return the number of header lines.
>> [savetxt should also be updated to allow bz2 as loadtxt handles those
>> now - not that I have used it]
>>
>>
>>
>>
>> Also note that since that patch was written, savetxt takes a user
>> supplied newline keyword, so you can just append that to the header
>> string.
>>
>>
>>
>>   True, we were not aware of this, but this does not help much for the
>> comment/header.
>>
>>
>>
>> Entered ~3 months ago: http://projects.scipy.org/numpy/changeset/8180
>> Should this be forced to check for valid options for new lines?
>> Otherwise, from 'np.savetxt('junk.text', [1,2,3,4,5],
>> newline='what')' you get:
>>
>> 1.000000000000000000e+00what2.000000000000000000e+00what3.000000000000000000e+00what4.000000000000000000e+00what5.000000000000000000e+00what
>> Which is not going to be read back by loadtxt.
>>
>>
>>
>> As numpy.loadtxt has a default comment character ('#'), the same may be
>> implemented for numpy.savetxt. In this case, numpy.savetxt would get two
>> additional keywords (e.g. header, comment(character)), which bloats the
>> interface, but potentially provides more safety.
>>
>>
>>
>>
>> FWIW, I ended up rolling my own using the most recent pre-Python 3
>> changes for savetxt that accepts a list of names instead of one string
>> or if the provided array has the attribute dtype.names (non-nested rec
>> or structured arrays) it uses those.  Whatever is done I think the
>> support for structured arrays is nice, and I think having this
>> functionality is a no-brainer.  I need it quite often.
>>
>>
>>
>>   Although we have not been using record arrays very often, we see their
>> advantages and agree that it should be possible to use them as you
>> described.
>> We also thought about a solution using the __str__ method for the 'header
>> object'. In this vein, an arbitrary header class (including a plain string)
>> providing an __str__ method may be handed to numpy.savetxt,
>> which can use it to write the header.
>>
>
>
> So let us briefly summarize what is on the table. It appears to us that
> there are basically three open issues:
> (1) a CSV-like header for savetxt-written files (first line contains column
>    names)
> (2) comments (introduced by comment character e.g. '#') at the beginning
>    of the file (preceding the data)
> (3) the role of the 'newline' option
>
> As was noted, the patch (ticket 1079) makes it possible to write both a
> CSV-like header (1) and comment line(s) introduced by a comment character
> (e.g. '#') (2). Nonetheless, this solution is quite unsatisfactory in our
> opinion, because it is error prone: the user is in charge of the entire
> formatting. We still think that it should be up to the user how much
> information is put at the top of the file, but the format should be checked
> as far as possible.
>
> Using either a string or a list/tuple of strings, as proposed by Bruce,
> seems to be a reasonable possibility to implement the desired functionality.
> Maybe two individual keywords ('header' and 'comment') should exist to
> distinguish whether the user requests case (1) or (2). As with loadtxt,
> the default comment character should be '#', but the user may change it.
>
> We think that savetxt should not be restricted to output that can be read
> by loadtxt. It should, however, be possible to add comments to the output
> file such that it remains readable by loadtxt (without tweaking it,
> e.g. with the skiprows keyword).
>

Thanks.  This does clear up my confusion and I think having both a
header and a comments keyword makes sense.  For the form, as I said, I
went with a list of strings, as I encounter this more often than one
string, but in the end it's all the same to me.
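
For concreteness, here is a minimal sketch of how the two keywords could
fit together. The helper name build_header and all of its parameters are
hypothetical; this is not the patch in ticket 1079 and not part of numpy.
It assumes comment lines get the comment character as a prefix, column
names are joined with the data delimiter, and every line ends with the
same newline string as the data rows.

```python
def build_header(names=None, comment_lines=(), comment_char="#",
                 delimiter=" ", newline="\n"):
    """Return the text to write before the data rows (hypothetical helper)."""
    lines = []
    for text in comment_lines:
        # Split embedded line breaks so that every physical line
        # carries the comment prefix.
        for part in str(text).splitlines() or [""]:
            lines.append((comment_char + " " + part).rstrip())
    if names is not None:
        # CSV-like header: column names joined with the data delimiter.
        lines.append(delimiter.join(str(name) for name in names))
    # Terminate every line with the same newline string used for the data.
    return "".join(line + newline for line in lines)


if __name__ == "__main__":
    print(build_header(names=["x", "y"],
                       comment_lines=["written by savetxt"],
                       delimiter=","), end="")
```

The returned string could be written to an open file handle, and the same
handle then passed to np.savetxt with matching delimiter and newline
arguments. With comment lines only (names=None), the file should stay
readable by loadtxt without skiprows; an uncommented names line would
still need to be skipped.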

Glad this is getting some attention.

> We agree that the newline keyword may cause inconsistencies in the file
> (if ticket 1079 were applied),
> and possibly strange behavior such as when newline='what' is specified.
> Yet, this question does not only concern the header/comments.
>
> Stefan & Christian
>
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>


