[Numpy-discussion] Proposed change in genfromtxt(..., comments='#', names=True) behaviour
Paul Natsuo Kishimoto
mail at paul.kishimoto.name
Mon Jul 16 15:06:15 EDT 2012
I've implemented this feature with skip_header=-1 as suggested by
Pierre, and in doing so removed the regression. TravisBot seems to like
it: https://github.com/numpy/numpy/pull/351
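
For anyone who wants the gist without reading the diff, here is a minimal
sketch of the skipping rule discussed below: drop every leading line that is
empty or whose first non-space character is the `comments` character. It is
not the code in the pull request. The helper name `skip_commented_header` and
the sample file contents are made up for illustration, and it assumes a
single-character comment marker.

```python
from io import StringIO


def skip_commented_header(lines, comments="#"):
    """Hypothetical helper: drop leading lines that are empty or whose
    first non-space character starts a comment; return the rest."""
    lines = list(lines)
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Stop at the first line that carries real content.
        if stripped and not stripped.startswith(comments):
            return lines[i:]
    return []  # every line was empty or commented


# Assumed sample input: a commented header of arbitrary length,
# followed by a line of names and one row of data.
sample = StringIO(
    "# here is a general description\n"
    "\n"
    "   # of the file contents\n"
    "a b c\n"
    "1 2 3\n"
)
remaining = skip_commented_header(sample)
print(remaining[0].split())  # the first surviving line holds the names
```

Only the leading run of lines is skipped; a commented line that appears after
the first line of content is left alone.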
On Mon, 2012-07-16 at 16:12 +0200, Pierre GM wrote:
> To be ultra clear (since I want to code this), you are suggesting
> that 'first_commented_line' be a *new* accepted value for the kwarg
> 'names', to invoke the behaviour you suggest?
>
>
>
> Nope, I was just referring to some hypothetical variable name. I meant
> that:
>
> first_values = None
> try:
>     while not first_values:
>         first_line = fhd.next()
>         if names is True:
>             parsed = [m for m in first_line.split(comments) if m.strip()]
>             if parsed:
>                 first_values = split_line(parsed[0])
>         else:
>             ...
>
> (it's not tested, I'm writing it as it comes. And I didn't even use
> the `first_commented_line` name, sorry)
>
>
> If this IS what you mean, I'd counter-propose something in the
> same spirit, but a bit simpler…we let the kwarg 'skip_header'
> take some additional value, say int(0), int(-1), str('auto'),
> or True.
>
>
>
>
> In this case, instead of skipping a fixed number of lines, it
> will skip any number of consecutive empty OR commented lines;
>
>
>
>
> I really like the idea of having `skip_header=-1` skip all the empty
> or commented lines (that is, lines whose first non-space character is
> the `comments` character). That'd be rather convenient.
>
>
>
>
> The semantics of this are more intuitive, because this is what I am
> really after: to *skip* a commented *header* of arbitrary
> length. So my four examples below could be parsed with:
>
> 1. genfromtxt(..., names=True)
> 2. genfromtxt(..., names=True, skip_header=True)
> 3. genfromtxt(..., names=True)
> 4. genfromtxt(..., names=True, skip_header=True)
>
> …crucially #1 avoids the regression.
>
>
> Does this seem good to everyone?
>
>
>
>
> Sounds good w/ `skip_header=-1`
>
>
> But if this is NOT what you mean, then what you say does not
> actually work with the simple use-case of my Example #2 below.
> The first commented line is "# here is a..." with # as the
> first non-space character, so the part after becomes the names
> 'here', 'is', 'a' etc.
>
>
>
>
> In that case, you could always use `skip_header=2`
>
> In short, the code can't resolve the ambiguity without some extra
> information from the user.
>
>
>
>
> It's always best not to let the code guess too much anyway...
>
> Well, no regression, and you have a nice plan. I'm for it.
> Anybody else?
>
>
--
Paul Natsuo Kishimoto
SM candidate, Technology & Policy Program (2012)
Research assistant, http://globalchange.mit.edu
https://paul.kishimoto.name +1 617 302 6105