[Pandas-dev] pandas types

Joris Van den Bossche jorisvandenbossche at gmail.com
Wed Aug 29 04:35:46 EDT 2018


On Sat, Aug 18, 2018 at 21:39, Marc Garcia <garcia.marc at gmail.com> wrote:

> Sorry for the lack of context in my first email. A wrapper around numpy,
> arrow (and possibly others) is what I had in mind. As well as a way to
> hide from the user whether the type has a direct physical representation
> (int, float) or not (category...).
>

I am not fully sure how feasible this is in practice with current numpy.
E.g., a custom dtype class can never be compared to a numpy dtype (it will
always raise a TypeError if numpy does not recognize it) due to the way
numpy has implemented dtype comparisons.
To actually write a dtype object that is compatible with numpy, I think
this can currently only be done in C by writing an actual new numpy dtype
(but I might be wrong here). So I am not sure that a simple system that
wraps numpy's dtypes is actually possible.
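
To make the comparison issue concrete, here is a minimal sketch. MyDtype
and compare_with_numpy are hypothetical names used only for illustration,
and the outcome may depend on the numpy version, so the sketch catches the
TypeError rather than assuming it is raised:

```python
import numpy as np


class MyDtype:
    """Hypothetical custom dtype class that numpy knows nothing about."""

    name = "mydtype"


def compare_with_numpy(custom):
    """Compare a numpy dtype with a custom object and report what happens."""
    try:
        # numpy tries to interpret `custom` as a dtype before comparing.
        return np.dtype("int64") == custom
    except TypeError as exc:
        # The behaviour described above: numpy does not recognize the object.
        return f"TypeError: {exc}"


result = compare_with_numpy(MyDtype())
print(result)
```

Either way, the comparison never succeeds for an object numpy cannot
interpret as a dtype.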

I agree with the points you raise for why we would want our own dtype
objects, and I also think we should do this in the long term.
But I doubt that it can currently be done without a big backwards
compatibility break (even the light wrapping to provide a consistent
experience to our users). And if that is the case, I don't think we should
consider that for pandas 1.0.
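
For reference, the light wrapping under discussion (type classes extending
a base class, each carrying the dtype used today, plus a lookup and a
deprecation warning wherever a dtype is accepted) could be sketched roughly
as below. Every name here (PandasType, UInt8, Category, resolve_dtype) is
hypothetical and not an existing pandas API; this only illustrates the
idea, not how pandas does or would implement it:

```python
import warnings

import numpy as np


class PandasType:
    """Hypothetical base class for pandas-level types."""

    # The dtype currently used under the hood for this type.
    current = None


class UInt8(PandasType):
    current = np.dtype("uint8")


class Category(PandasType):
    current = "category"


def resolve_dtype(dtype):
    """Map a pandas-level type to the dtype used today.

    Anything that is not a PandasType subclass is passed through
    unchanged, with a deprecation warning.
    """
    if isinstance(dtype, type) and issubclass(dtype, PandasType):
        return dtype.current
    warnings.warn(
        "passing a non-pandas dtype is deprecated (hypothetical message)",
        DeprecationWarning,
        stacklevel=2,
    )
    return dtype


print(resolve_dtype(UInt8))
print(resolve_dtype(Category))
```

A method such as astype would call resolve_dtype first and then proceed as
it does today. Note that this only covers dtypes passed in by the user; it
does not address what dtype objects are handed back to the user, which is
where the compatibility concern above comes in.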

Joris


> This document (I guess Wes wrote it) is why I was assuming this was
> already on the agenda:
> https://pandas-dev.github.io/pandas2/internal-architecture.html#high-level-logical-type-proposal
>
> My proposal wasn't anything else besides what the document says. I was
> just proposing to make the change (at least the API part) sooner rather
> than later. IMO ideally before pandas 1.0, for the reasons I mentioned.
>
> On Sat, Aug 18, 2018 at 8:25 PM Tom Augspurger <tom.augspurger88 at gmail.com>
> wrote:
>
>> Your third advantage is the most compelling to me.
>>
>> I don't think we really have the developer bandwidth or expertise to
>> develop our own type system. And I don't think it'd be good
>> from an ecosystem perspective either, as we want fundamental things like
>> dtypes to be shared across projects. Currently that's
>> NumPy's dtype system. But I could maybe see the advantage of a very
>> simple system that wraps NumPy's (or someday Arrow's
>> or some other library).
>>
>> Wasn't there a dtypes BoF at SciPy this year? Did anything come of that?
>>
>>
>>
>>
>> On Fri, Aug 17, 2018 at 10:21 AM Marc Garcia <garcia.marc at gmail.com>
>> wrote:
>>
>>> I was thinking that it could be a good idea to start using pandas types
>>> before pandas 1.0 (I think this change was assumed to happen sooner or
>>> later).
>>>
>>> Meaning that instead of something like `df.astype(numpy.uint8)` or
>>> `df.astype('category')` users would have to use `df.astype(pandas.uint8)`
>>> or `df.astype(pandas.category)`.
>>>
>>> I see 3 advantages to doing it before 1.0:
>>> - The API would be clearer and more consistent for users (and creating
>>> new extension types will be more controlled).
>>> - IMO users will be excited about migrating to pandas 1.0, and as the
>>> change will be quite trivial for them, I think the adoption of the new
>>> syntax will be faster than if left until later.
>>> - I think it should allow us to make some internal changes transparently
>>> (e.g. replacing numpy).
>>>
>>> I think as a first version, the change could be almost as simple as
>>> implementing the pandas types as classes extending a base class, with an
>>> attribute that maps the current type. And then in every function/method
>>> that receives a dtype, check if the type is a pandas type, do the lookup if
>>> it is, and show a deprecation warning if it's not.
>>>
>>> Does this make sense? Am I missing something?
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>