[Pandas-dev] Colon available everywhere

Stephan Hoyer shoyer at gmail.com
Thu Jul 19 14:46:03 EDT 2018


Yes, I'd love to see *args and **kwargs for __getitem__, but that's a much
bigger change. See also https://www.python.org/dev/peps/pep-0472/


On Thu, Jul 19, 2018 at 11:31 AM Chris Bartak <cbartak at gmail.com> wrote:

> I think this has also been discussed in forums, but another syntax
> possibility would be expanding what is accepted inside __getitem__.
> Ignoring backwards compatibility for a second (it's a can of worms how
> `*args` and the existing tuple-key behavior would interact), I could
> envision something roughly like this, which could also solve the named
> indexer problem.
>
> class A:
>     def __getitem__(self, *args, **kwargs): print(args, kwargs)
> a = A()
>
> a[1, 2]
> # (1, 2), {}
>
> a[1, 2, b=3]
> # (1, 2), {'b': 3}
>
> a[1, 2, (:, 2), c=3]
> # (1, 2, (slice(None), 2)), {'c': 3}
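For comparison with the hypothetical outputs above: the same method signature is already legal today, but Python passes exactly one positional key to __getitem__ (a tuple when the subscript contains commas, slice objects for colons). Keyword arguments and bare colons inside parentheses are rejected by the parser. A minimal sketch; the class name Probe is hypothetical:

```python
class Probe:
    # Hypothetical class; same signature as in the example above, legal in current Python.
    def __getitem__(self, *args, **kwargs):
        # Return exactly what Python handed us, to make the calling convention visible.
        return args, kwargs


p = Probe()

print(p[1, 2])    # (((1, 2),), {}): the whole tuple arrives as ONE argument
print(p[1, 2:4])  # (((1, slice(2, 4, None)),), {}): the colon became a slice object
# p[1, 2, b=3] and p[1, 2, (:, 2)] are both SyntaxErrors in current Python.
```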
>
>
>
> On Thu, Jul 19, 2018 at 11:34 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> I'm pretty sure this has been proposed before on Python-ideas. Definitely
>> search through the archives first.
>>
>> Another option I liked that involved no changes to Python syntax would be
>> to make indexing the built-in slice class return a slice object, e.g.,
>> slice[:5] -> slice(None, 5, None). But if I recall correctly that had been
>> shot down, too.
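The slice[:5] idea can be approximated without any syntax change by an object whose __getitem__ simply returns its key, so the brackets do the work of building the slice. A minimal sketch, assuming nothing beyond the standard subscription protocol; SliceMaker and S are hypothetical names:

```python
class SliceMaker:
    """Hypothetical helper: indexing it returns the key the brackets build."""

    def __getitem__(self, key):
        return key


S = SliceMaker()

first_five = S[:5]           # slice(None, 5, None), now a plain value
any_then_sum = S[:, "sum"]   # (slice(None, None, None), 'sum')

data = list(range(10))
print(data[first_five])      # [0, 1, 2, 3, 4]
```

numpy.s_ and pandas' pd.IndexSlice are existing objects of this kind, e.g. df.loc[:, pd.IndexSlice[:, 'sum']].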
>> On Thu, Jul 19, 2018 at 8:17 AM Pietro Battiston <me at pietrobattiston.it>
>> wrote:
>>
>>> On Wed, 18/07/2018 at 09.01 +0200, Pietro Battiston wrote:
>>> > On Tue, 17/07/2018 at 16.10 -0700, William Ayd wrote:
>>> > > > - if, after creating all my columns, I want to e.g. select all
>>> > > > columns that contain sums, I need to do something like
>>> > > > "df[[col for col in df.columns if col.startswith("Sum of")]]".
>>> > > > Compare to "df.loc[:, ('Sum',)]".
>>> > >
>>> > > Unless I am mistaken you would have to do something like
>>> > > "df.groupby('a').agg([sum]).loc[:, slice(None, 'sum')]" to get that
>>> > > to work.
>>> >
>>> > Yeah, I had swapped the levels, it is
>>> >
>>> > df.groupby('a').agg([sum]).loc[:, (slice(None), 'sum')]
>>> >
>>> >
>>> > > I don’t think that syntax really is that clean
>>> >
>>> > In my code I always start by defining
>>> >
>>> > WE = slice(None) # WhatEver
>>> >
>>> > and we could advertise this as a way to make the syntax shorter, but
>>> > regardless of that, it definitely is cleaner than any string
>>> > manipulation.
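To make concrete why a tuple key with a full slice on one level is cleaner than string matching, here is a toy model of level-by-level selection on MultiIndex-style column labels. It is a hypothetical illustration, not pandas' implementation: select is an invented helper, a full slice (WE) matches any value on its level, every other component must be equal, and partial slices are not handled.

```python
WE = slice(None)  # "WhatEver", as defined in the message above


def select(labels, key):
    """Hypothetical toy model of tuple-key column selection (not pandas' code).

    A full slice (WE) matches any value on its level; any other key component
    must equal the label's value. Partial slices like slice('a', 'c') are not supported.
    """
    return [
        label
        for label in labels
        if all(part == WE or value == part for value, part in zip(label, key))
    ]


# Hierarchical labels: (value column, aggregation)
hier = [("b", "sum"), ("b", "mean"), ("c", "sum")]
print(select(hier, (WE, "sum")))  # [('b', 'sum'), ('c', 'sum')]

# With flat names, the same selection needs string manipulation instead.
flat = ["Sum of b", "Mean of b", "Sum of c"]
print([col for col in flat if col.startswith("Sum of")])  # ['Sum of b', 'Sum of c']
```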
>>>
>>>
>>> Related to this, I'm curious to hear pandas devs' opinions on an
>>> idea which I think would simplify our users' lives (and by that I don't
>>> only mean current users of the current pandas API) at (almost) no cost.
>>>
>>> The colon in Python is meant for:
>>>
>>> 1) logical blocks:
>>>   if True:
>>>
>>> 2) separating args and body of a lambda:
>>>   lambda x : x**2
>>>
>>> 3) assignment expressions (since 3.8):
>>>   if (a := True):
>>>
>>> 4) separating key and value in dict:
>>>   {1 : 'a'}
>>>
>>> 5) define slices:
>>>   a_series.loc['2018-06-01':'2018-07-03']
>>>
>>> The last example is exactly equivalent to
>>> a_series.loc[slice('2018-06-01', '2018-07-03')]
>>> ... but, unfortunately, the colon form only works inside __getitem__ calls.
>>>
>>> My idea is: there is no obvious reason why it should be so, that is,
>>> why
>>>
>>> '2018-06-01':'2018-07-03'
>>>
>>> couldn't just be parsed as slice('2018-06-01','2018-07-03').
>>>
>>> The other uses 1)-4) of the colon mean that some care must
>>> be taken, but:
>>>
>>> 1) should not create ambiguity, as the ":" is always paired with a
>>> compound-statement keyword (if, for, while, def, class, ...)
>>>
>>> 2) should not create ambiguity, as the ":" is always paired with the
>>> "lambda" keyword
>>>
>>> 3) should not create ambiguity, as the ":" is always immediately followed
>>> by "=" (forming ":="), while the "slice interpretation" of ":" would never
>>> appear (unless nested) on the left-hand side of an assignment
>>>
>>> 4) is the only potentially problematic case, as
>>>   {2 : 3}
>>> could be interpreted as
>>>   {slice(2, 3)}
>>> but is currently interpreted as
>>>   dict([(2, 3)])
>>>
>>> However, the solution could be to simply give priority to the current
>>> interpretation, and use
>>>   {(2 : 3)}
>>> to force the slice interpretation.
>>>
>>>
>>> If this proposal was implemented,
>>>
>>>   df.loc[:, (slice(None), 'sum')]
>>>
>>> would finally just become
>>>
>>>   df.loc[:, (:, 'sum')]
>>>
>>> at the cost of a minimal ambiguity (in the case shown above), which is
>>> easy to resolve (and no more serious, I guess, than the fact that {} is an
>>> empty dict and not an empty set).
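The dict/set point can be checked against today's behavior: a colon inside braces always produces a dict display, and empty braces are an empty dict, so an empty set must be written set(). What the proposed {(2 : 3)} would denote, a set holding slice(2, 3), can already be spelled out explicitly, but only on interpreters where slice objects are hashable (Python 3.12 and later). A small sketch:

```python
pair = {2: 3}      # a colon inside braces is always a dict display today
empty = {}         # empty braces give an empty dict...
empty_set = set()  # ...so an empty set has to be spelled set()

print(type(pair).__name__, pair == dict([(2, 3)]))     # dict True
print(type(empty).__name__, type(empty_set).__name__)  # dict set

# Explicit spelling of what the proposed {(2 : 3)} would mean: a set containing a slice.
try:
    slice_set = {slice(2, 3)}
except TypeError:  # slice objects are unhashable before Python 3.12
    slice_set = None
print(slice_set)
```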
>>>
>>> For Python beginners, it would probably even simplify the understanding
>>> of slices (today, I think it is not trivial to understand that obj[:]
>>> is exactly equivalent to obj[slice(None)], while ":" on its own does
>>> not mean anything).
>>> Moreover, it would mirror "...", which, by contrast, is available
>>> outside of __getitem__ calls too.
>>>
>>> Would it be crazy to propose a PEP for this?
>>>
>>> A milder form would be to allow ":" only inside __getitem__ calls, but
>>> also when nested; however, I think this would be more confusing and
>>> probably harder to implement.
>>>
>>> Thoughts?
>>>
>>> Pietro
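Which colon forms the parser accepts today can be checked with the standard library's ast module. A minimal sketch (the helper name parses is hypothetical): a colon slice is only valid directly inside square brackets, not as a standalone expression and not nested inside parentheses within the brackets, whereas ... is an ordinary expression anywhere.

```python
import ast


def parses(source: str) -> bool:
    """Hypothetical helper: True if `source` is valid syntax for this interpreter."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


cases = [
    "x = ...",                                # Ellipsis is an expression anywhere
    "x = slice('2018-06-01', '2018-07-03')",  # explicit slice object
    "obj['2018-06-01':'2018-07-03']",         # colon directly inside a subscript
    "obj[:, ::-1]",                           # several slices in one subscript
    "x = '2018-06-01':'2018-07-03'",          # proposed: bare colon outside brackets
    "obj[:, (:, 'sum')]",                     # proposed: colon nested in a tuple
]
for src in cases:
    print(parses(src), src)  # True for the first four, False for the last two
```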
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>
>>
>

