[Pandas-dev] Colon available everywhere

Stephan Hoyer shoyer at gmail.com
Thu Jul 19 12:33:41 EDT 2018


I'm pretty sure this has been proposed before on Python-ideas. Definitely
search through the archives first.

Another option I liked, which involved no changes to Python syntax, would be
to make indexing the built-in slice class return a slice object, e.g.,
slice[:5] -> slice(None, 5, None). But if I recall correctly, that was
shot down, too.
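The slice[:5] idea can be emulated today without changing the language. This is a minimal sketch that uses an object whose __getitem__ returns its key unchanged, so the interpreter's own colon handling does the work. The names _SliceMaker and S are hypothetical. numpy.s_ and pandas.IndexSlice are existing helpers that rely on the same trick.

```python
class _SliceMaker:
    """Return the key passed to [] unchanged (hypothetical helper)."""

    def __getitem__(self, key):
        # By the time this runs, Python has already turned a:b:c into
        # slice(a, b, c) and comma-separated items into a tuple.
        return key


S = _SliceMaker()

print(S[:5])        # slice(None, 5, None)
print(S[:, "sum"])  # (slice(None, None, None), 'sum')
```

This works with an ordinary object, so it does not need any indexing support on the built-in slice type itself.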
On Thu, Jul 19, 2018 at 8:17 AM Pietro Battiston <me at pietrobattiston.it>
wrote:

> On Wed, 18/07/2018 at 09.01 +0200, Pietro Battiston wrote:
> > On Tue, 17/07/2018 at 16.10 -0700, William Ayd wrote:
> > > > - if, after creating all my columns, I want to e.g. select all
> > > > columns that contain sums, I need to do some sort of
> > > > "df[[col for col in df.columns if col.startswith("Sum of")]]".
> > > > Compare to "df.loc[:, ('Sum',)]".
> > >
> > > Unless I am mistaken, you would have to do something like
> > > "df.groupby('a').agg([sum]).loc[:, slice(None, 'sum')]" to get that
> > > to work.
> >
> > Yeah, I had swapped the levels; it is
> >
> > df.groupby('a').agg([sum]).loc[:, (slice(None), 'sum')]
> >
> >
> > > I don’t think that syntax really is that clean
> >
> > In my code I always start by defining
> >
> > WE = slice(None) # WhatEver
> >
> > and we could advertise this as a way to make the syntax shorter, but
> > regardless of that, it definitely is cleaner than any string
> > manipulation.
>
>
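The following toy model makes the comparison above concrete without requiring pandas. It is not pandas' implementation. Column labels are tuples, like the two-level columns that groupby(...).agg([...]) produces, and a key element equal to slice(None) matches any value at that level. The helper select_columns and the sample labels are hypothetical.

```python
WE = slice(None)  # "WhatEver": the full slice, as defined in the thread


def select_columns(columns, key):
    """Toy model of level-wise selection on tuple column labels.

    Each element of `key` is compared with the label value at the same
    level; an element equal to slice(None) matches any value.
    """

    def matches(label):
        return all(k == WE or k == v for k, v in zip(key, label))

    return [label for label in columns if matches(label)]


columns = [("b", "sum"), ("b", "mean"), ("c", "sum")]

# Level-based selection, mirroring df.loc[:, (WE, 'sum')]
print(select_columns(columns, (WE, "sum")))  # [('b', 'sum'), ('c', 'sum')]

# With flat string labels, the same selection needs string matching.
flat = ["Sum of b", "Mean of b", "Sum of c"]
print([col for col in flat if col.startswith("Sum of")])  # ['Sum of b', 'Sum of c']
```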
> Related to this, I'd like to hear the pandas devs' opinion on an idea
> which I think would simplify our users' lives (and by that, I don't
> only mean current users of the current pandas API) at (almost) no cost.
>
> The colon in Python is meant for:
>
> 1) logical blocks:
>   if True:
>
> 2) separating args and body of a lambda:
>   lambda x : x**2
>
> 3) assignment expressions (since 3.8):
>   if (a := True):
>
> 4) separating key and value in dict:
>   {1 : 'a'}
>
> 5) define slices:
>   a_series.loc['2018-06-01':'2018-07-03']
>
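The parser already tells these colons apart by context. One way to see this is to parse each example above with the standard ast module and report which node owns the colon. This is an illustrative sketch. colon_owner is a hypothetical helper, and ast.NamedExpr requires Python 3.8 or later.

```python
import ast

# One snippet per colon use listed above; names such as a_series are only
# parsed, never evaluated.
snippets = {
    "block": "if True:\n    pass",
    "lambda": "lambda x: x**2",
    "walrus": "if (a := True):\n    pass",
    "dict": "{1: 'a'}",
    "slice": "a_series.loc['2018-06-01':'2018-07-03']",
}

# Checked from most to least specific, so the `if` wrapped around the
# walrus example does not shadow the NamedExpr inside it.
COLON_NODES = (ast.Slice, ast.Dict, ast.NamedExpr, ast.Lambda, ast.If)


def colon_owner(source):
    """Return the name of the first COLON_NODES type present in `source`."""
    found = {type(node) for node in ast.walk(ast.parse(source))}
    for node_type in COLON_NODES:
        if node_type in found:
            return node_type.__name__
    return None


for use, source in snippets.items():
    print(f"{use:<7} {colon_owner(source)}")
```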
> The last example is entirely indistinguishable from
> a_series.loc[slice('2018-06-01','2018-07-03')]
> ... but unfortunately, only works inside __getitem__ calls.
>
> My idea is that there is no obvious reason for this restriction, that
> is, no reason why
>
> '2018-06-01':'2018-07-03'
>
> couldn't just be parsed as slice('2018-06-01','2018-07-03').
>
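The current restriction is easy to check with compile(), which only parses the source, so the names below are never evaluated. This is a minimal sketch; is_valid_expression is a hypothetical helper.

```python
def is_valid_expression(source):
    """Return True if `source` parses as a Python expression today."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Inside [] the colon is accepted and builds a slice object...
print(is_valid_expression("a_series.loc['2018-06-01':'2018-07-03']"))         # True
print(is_valid_expression("a_series.loc[slice('2018-06-01', '2018-07-03')]"))  # True
# ...but on its own, or nested in a tuple inside [], it is a syntax error.
print(is_valid_expression("'2018-06-01':'2018-07-03'"))                        # False
print(is_valid_expression("df.loc[:, (:, 'sum')]"))                            # False
```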
> The other uses 1)-4) of the colon mean that some care must be taken,
> but:
>
> 1) should not create ambiguity, as the ":" always ends the header of a
> compound statement (if, for, while, def, class, ...)
>
> 2) should not create ambiguity, as the ":" is always matched with the
> "lambda" keyword
>
> 3) should not create ambiguity, as the ":" is always present close to
> "=", while the "slice interpretation" of ":" would never appear (unless
> nested) in the left part of an assignment
>
> 4) is the only potentially problematic case, as
>   {2 : 3}
> could be interpreted as
>   {slice(2, 3)}
> but is currently interpreted as
>   dict([(2, 3)])
>
> However, the solution could be to just prioritize the current
> interpretation, and use
>   {(2 : 3)}
> to force the second.
>
>
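For reference, here is how today's Python treats the braces in question. This is a sketch only. The {(2 : 3)} spelling is the proposal and cannot be written today, so slice(2, 3) stands in for it. Building a set of slices requires hashable slice objects, which, to our knowledge, only arrived in Python 3.12.

```python
import sys

# Current interpretation: {2: 3} is a one-item dict.
d = {2: 3}
print(d == dict([(2, 3)]))     # True

# Precedent for resolving a similar ambiguity: {} is a dict, not a set.
print(type({}).__name__)       # dict

# What the proposed {(2 : 3)} would denote, spelled with today's syntax.
# This raises TypeError where slice objects are unhashable.
try:
    s = {slice(2, 3)}
    print(s)
except TypeError as exc:
    print(f"Python {sys.version.split()[0]}: {exc}")
```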
> If this proposal were implemented,
>
>   df.loc[:, (slice(None), 'sum')]
>
> would finally just become
>
>   df.loc[:, (:, 'sum')]
>
> at the cost of a minimal ambiguity (in the case shown above), which is
> easy to resolve (and no more serious, I guess, than the fact that {} is
> an empty dict and not an empty set).
>
> For Python beginners, it would probably even simplify the understanding
> of slices (today, it is not trivial, I think, to understand that obj[:]
> is exactly equivalent to obj[slice(None)], and that ":" has no meaning
> on its own).
> Moreover, it would mimic "...", which, by contrast, is also available
> outside of __getitem__ calls.
>
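The contrast with "..." can be seen directly. The ellipsis literal is an ordinary expression, while a slice must be spelled slice(...) anywhere outside []. This is a minimal sketch with a hypothetical Probe class whose __getitem__ returns its key.

```python
class Probe:
    """Hypothetical helper: __getitem__ returns the key it receives."""

    def __getitem__(self, key):
        return key


p = Probe()

# "..." works anywhere, e.g. bound to a name, and is the Ellipsis singleton.
anything = ...
print(anything is Ellipsis, p[...] is Ellipsis)   # True True

# ":" has no standalone spelling; outside [] one writes slice(None) instead.
everything = slice(None)
print(p[:] == everything)                         # True
print(p["2018-06-01":"2018-07-03"] == slice("2018-06-01", "2018-07-03"))  # True
```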
> Would it be crazy to write a PEP proposing this?
>
> A milder form would be to allow ":" only inside __getitem__ calls, but
> also when nested there (e.g. inside a tuple); however, I think this
> would be more confusing and probably difficult to implement.
>
> Thoughts?
>
> Pietro
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>