[SciPy-Dev] np.savetxt: apply patch in enhancement ticket 1079 to add headers?

Bruce Southey bsouthey at gmail.com
Thu Jun 3 09:27:26 EDT 2010


On 06/02/2010 12:14 PM, Stefan wrote:
>    
>> Not that I am complaining; rather, I am trying to understand what is
>> expected to happen.
>> Under the patch, it is very much user beware.  The header argument can
>> be anything or nothing. There is no check for the contents or if the
>> delimiter used is the same as the rest of the output. Further with the
>> newline option there is no guarantee that the lines in the header will
>> have the same line endings throughout the file.
>> So what should a user be allowed to use as a header?
>> You could write a whole program there or an explanation of the
>> following output - which is very appealing. You could force a list of
>> strings so that you print out newline.join(header) - okay not quite
>> because it should include the comment argument.
>> Should savetxt be restricted to something that loadtxt can read?
>> This is potentially problematic if you want a header line. Although it
>> could return the number of header lines.
>> [savetxt should also be updated to allow bz2 as loadtxt handles those
>> now - not that I have used it]
>>
>>
>>
>>
>> Also note that since that patch was written, savetxt takes a user
>> supplied newline keyword, so you can just append that to the header
>> string.
>>
>>
>>
>>    True, we were not aware of this, but this does not help much for the
>> comment/header.
>>
>>
>>
>> Entered ~3 months ago: http://projects.scipy.org/numpy/changeset/8180
>> Should this be forced to check for valid options for new lines?
>> Otherwise, from "np.savetxt('junk.text', [1,2,3,4,5], newline='what')"
>> you get:
>>
>>
>> 1.000000000000000000e+00what2.000000000000000000e+00what3.000000000000000000e+00what4.000000000000000000e+00what5.000000000000000000e+00what
>>
>> Which is not going to be read back by loadtxt.
>>
>>
>>
>> As numpy.loadtxt has a default comment character ('#'), the same may be
>> implemented for numpy.savetxt. In this case, numpy.savetxt would get two
>> additional keywords (e.g. header, comment(character)), which bloats the
>> interface, but potentially provides more safety.
>>
>>
>>
>>
>> FWIW, I ended up rolling my own using the most recent pre-Python 3
>> changes for savetxt that accepts a list of names instead of one string
>> or if the provided array has the attribute dtype.names (non-nested rec
>> or structured arrays) it uses those.  Whatever is done I think the
>> support for structured arrays is nice, and I think having this
>> functionality is a no-brainer.  I need it quite often.
>>
>>
>>
>>    Although we have not used record arrays very often, we see their
>> advantages and agree that it should be possible to use them as you described.
>> We also thought about a solution using the __str__ method of the 'header
>> object'. In this vein, an arbitrary header class (including a plain string)
>> providing a __str__ method may be handed to numpy.savetxt,
>> which can use it to write the header.
>>
>>      
>
> So let us briefly summarize what is on the table. It appears to us that
> there are basically three open issues:
> (1) a csv like header for savetxt written files (first line contains column
>      names)
> (2) comments (introduced by comment character e.g. '#') at the beginning
>      of the file (preceding the data)
> (3) the role of the 'newline' option
>
> As was noted, the patch (ticket 1079) makes it possible to write both a
> csv-like header (1) and comment line(s) introduced by a comment character
> (e.g. '#'). Nonetheless, this solution is quite unsatisfactory
> in our opinion, because it is error prone:
> the user is in charge of the entire formatting. We still think
> that it should be up to the user how much information is put
> at the top of the file, but the format should be checked as far as possible.
>
> Using either a string or a list/tuple of strings, as proposed by Bruce,
> seems to be a reasonable possibility to implement the desired functionality.
> Maybe two individual keywords ('header' and 'comment') should exist to
> distinguish whether the user requests case (1) or (2). As with loadtxt,
> the default comment character should be '#', but it may be changed by the
> user.
>
> We think that savetxt should not be restricted to output that can be read
> by loadtxt. It should, however, be possible to add comments to the output
> file so that it remains readable by loadtxt (without tweaking it,
> e.g. with the skiprows keyword).
>
> We agree that the newline keyword may cause inconsistencies in the file
> (if ticket 1079 were applied),
> and possibly strange behavior such as when newline='what' is specified.
> Yet, this question does not only concern the header/comments.
>
> Stefan & Christian
>
>
>
>
I agree with what you suggest, so please post a patch. :-)
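
To make the discussion concrete, here is a minimal sketch of how I read the
proposal: separate 'header' and 'comment' keywords, where every header line
gets the comment character as a prefix and the same newline as the data, so
loadtxt can read the file back without skiprows. The name
savetxt_with_header and its signature are hypothetical; this is neither the
ticket 1079 patch nor a numpy function, and it assumes an already open text
file object.

```python
import io
import numpy as np


def savetxt_with_header(fh, X, header=None, comment='#', fmt='%.18e',
                        delimiter=' ', newline='\n'):
    """Write comment-prefixed header lines to an open text file, then the data.

    Hypothetical helper for illustration only; not part of numpy.
    """
    if header is None:
        header = []
    elif isinstance(header, str):
        header = [header]
    # str() lets any object with a __str__ method serve as a header entry;
    # splitlines() ensures every physical line gets the comment prefix.
    for entry in header:
        for line in str(entry).splitlines():
            fh.write(comment + ' ' + line + newline)
    np.savetxt(fh, X, fmt=fmt, delimiter=delimiter, newline=newline)


# Usage: write to an in-memory buffer and read it back with loadtxt.
buf = io.StringIO()
savetxt_with_header(buf, np.array([[1.0, 2.0], [3.0, 4.0]]),
                    header=['x y', 'two columns of test data'])
buf.seek(0)
roundtrip = np.loadtxt(buf)  # lines starting with '#' are skipped
```

Because the helper owns the prefix and the line ending, the user only
decides what the header says, not how it is formatted.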

Some of what I suggested was overthinking what can really be done while 
keeping the function relatively simple and easy to use.

My wish list would be:
1) If a header is added, it should allow the names from structured/record 
arrays to be used, and perhaps autogenerated names otherwise (such as var1, 
var2, ..., varn).
2) The dtype of the array_like input should determine the fmt when fmt is 
not provided.
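
A rough sketch of both wishes, in case it helps whoever writes the patch.
The helpers column_names, _fmt_for and default_fmt are hypothetical names of
my own, and the dtype-to-format mapping is only an assumption covering
integer, float and unicode string columns; anything else raises.

```python
import io
import numpy as np


def column_names(X):
    """Names from a structured/record dtype, else var1, var2, ..., varn."""
    X = np.asarray(X)
    if X.dtype.names is not None:
        return list(X.dtype.names)
    # savetxt writes plain 1-D input as a single column
    ncols = X.shape[1] if X.ndim == 2 else 1
    return ['var%d' % (i + 1) for i in range(ncols)]


def _fmt_for(dtype):
    """Assumed default format for one column, chosen by dtype kind."""
    if dtype.kind in 'iu':
        return '%d'
    if dtype.kind == 'f':
        return '%.18e'
    if dtype.kind == 'U':
        return '%s'
    raise TypeError('no default format for dtype %r' % (dtype,))


def default_fmt(X):
    """One format per field for structured arrays, else a single format."""
    X = np.asarray(X)
    if X.dtype.names is not None:
        return [_fmt_for(X.dtype[name]) for name in X.dtype.names]
    return _fmt_for(X.dtype)


# Usage: a record array keeps its integer column as an integer on output.
rec = np.array([(1, 2.5), (2, 3.5)], dtype=[('id', 'i4'), ('val', 'f8')])
buf = io.StringIO()
buf.write('# ' + ' '.join(column_names(rec)) + '\n')
np.savetxt(buf, rec, fmt=default_fmt(rec))
```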


Bruce


