[Numpy-discussion] Proposed change in genfromtxt(..., comments='#', names=True) behaviour

Paul Natsuo Kishimoto mail at paul.kishimoto.name
Mon Jul 16 15:06:15 EDT 2012


I've implemented this feature with skip_header=-1 as suggested by
Pierre, and in doing so removed the regression. TravisBot seems to like
it: https://github.com/numpy/numpy/pull/351
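
To make the intended rule concrete, here is a minimal sketch in plain
Python. It is not the code from the pull request: the helper name
skip_commented_header and the sample lines are made up for illustration,
and it only models "skip any number of consecutive empty OR commented
lines at the top of the file", assuming a commented line is one whose
first non-space character is the comments character.

```python
def skip_commented_header(lines, comments="#"):
    """Return `lines` without the leading run of empty or commented lines.

    Hypothetical helper for illustration only; not genfromtxt's actual code.
    """
    lines = list(lines)
    for index, line in enumerate(lines):
        stripped = line.strip()
        # Stop at the first line that has content and is not a comment.
        if stripped and not stripped.startswith(comments):
            return lines[index:]
    # Every line was empty or commented, so nothing is left.
    return []


# Made-up sample input: a commented header of arbitrary length, then data.
sample = [
    "# here is a descriptive header\n",
    "\n",
    "  # more commentary\n",
    "a b c\n",
    "1 2 3\n",
    "# a later comment is left alone\n",
]
remaining = skip_commented_header(sample)
print(remaining[0].split())
```

Only the leading block is dropped; a comment that appears after the
first content line is left for the normal comment handling.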

On Mon, 2012-07-16 at 16:12 +0200, Pierre GM wrote:
>         To be ultra clear (since I want to code this), you are
>         suggesting that 'first_commented_line' be a *new* accepted
>         value for the kwarg 'names', to invoke the behaviour you
>         suggest?
>         
>         
> 
> Nope, I was just referring to some hypothetical variable name. I meant
> that:
> 
> first_values = None
> try:
>     while not first_values:
>         first_line = fhd.next()
>         if names is True:
>             parsed = [m for m in first_line.split(comments) if m.strip()]
>             if parsed:
>                 first_values = split_line(parsed[0])
>         else:
>             ...
> 
> (it's not tested, I'm writing it as it comes. And I didn't even use
> the `first_commented_line` name, sorry)
> 
> 
>         If this IS what you mean, I'd counter-propose something in the
>         same spirit, but a bit simpler…we let the kwarg 'skip_header'
>         take some additional value, say int(0), int(-1), str('auto'),
>         or True.
>         
>         
> 
> 
>         In this case, instead of skipping a fixed number of lines, it
>         will skip any number of consecutive empty OR commented lines;
>         
>         
> 
> 
> I really like the idea of having `skip_header=-1` skip all the empty
> or commented lines (that is, lines whose first non-space character is
> the `comments` character). That'd be rather convenient.
> 
>  
> 
> 
>         The semantics of this are more intuitive, because this is what
>         I am really after: to *skip* a commented *header* of arbitrary
>         length. So my four examples below could be parsed with:
>         
>         1. genfromtxt(..., names=True)
>         2. genfromtxt(..., names=True, skip_header=True)
>         3. genfromtxt(..., names=True)
>         4. genfromtxt(..., names=True, skip_header=True)
>         
>         …crucially #1 avoids the regression.
>         
>         
>         Does this seem good to everyone?
>         
>         
> 
> 
> Sounds good w/ `skip_header=-1`
> 
> 
>         But if this is NOT what you mean, then what you say does not
>         actually work with the simple use-case of my Example #2 below.
>         The first commented line is "# here is a..." with # as the
>         first non-space character, so the part after becomes the names
>         'here', 'is', 'a' etc.
>         
>         
> 
> 
> In that case, you could always use `skip_header=2`
> 
>         In short, the code can't resolve the ambiguity without some
>         extra information from the user.
>         
>         
> 
> 
> It's always best not to let the code guess too much anyway...
> 
> Well, no regression, and you have a nice plan. I'm for it.
> Anybody else?
> 
> 

-- 
Paul Natsuo Kishimoto

SM candidate, Technology & Policy Program (2012)
Research assistant,  http://globalchange.mit.edu
https://paul.kishimoto.name      +1 617 302 6105