From achabot at enthought.com  Sun Feb 2 15:53:24 2020
From: achabot at enthought.com (Alexandre Chabot-Leclerc)
Date: Sun, 2 Feb 2020 14:53:24 -0600
Subject: [Pandas-dev] SciPy 2020 Tutorials - Call for Papers
Message-ID:

Dear pandas developers,

The SciPy tutorial committee is interested in including a tutorial on advanced pandas this year, and we would like to encourage you to submit for SciPy 2020. Tutorials will be presented July 6-7 and, as usual, the conference is in Austin.

In recognition of the effort required to plan and prepare a high-quality tutorial, we pay a stipend of $1,000 to each instructor (or team of instructors) for each half-day session they lead.

You may find additional information at the link below; the deadline for submission is February 11.

https://www.scipy2020.scipy.org/tutorials

You can find examples of successful submissions on GitHub:

https://github.com/scipy-conference/scipy-conference

Please feel free to email with any questions.

Kind regards,
SciPy 2020 Tutorial Chairs
Serah Njambi Rono
Alexandre Chabot-Leclerc
Mike Hearne


From jorisvandenbossche at gmail.com  Mon Feb 10 12:43:21 2020
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 10 Feb 2020 18:43:21 +0100
Subject: [Pandas-dev] What could a pandas 2.0 look like?
Message-ID:

pandas 1.0 is out, so time to start thinking about 2.0 ;)

In principle, pandas 2.0 will just be one of the next releases, cut when we decide we want to clean up deprecations or make a few changes that are hard to deprecate (following our new versioning policy). But nonetheless, I think it is worth asking whether it can be something more than that, with more specific goals in mind*.

Last year I made the pd.NA proposal, which resulted in using that for the nullable integer, boolean and string dtypes. In the proposal, pd.NA was described as something that "can be used consistently across all data types". For me, the aspirational end goal of the proposal is to *actually* have this for *all* dtypes, but we never really discussed this aspect explicitly.

So, for me, a possible future pandas 2.0:

- Uses "nullable dtypes" by default (i.e. dtypes that use pd.NA as the missing value indicator). That means we add a nullable version of all other dtypes (as we already did for int, boolean, string). End goal: a single missing value indicator with the same behavior for all dtypes.
- If we add such nullable dtypes using the extension dtypes/array mechanism (so they can first be opt-in in 1.x), this could "automatically" lead to a simplification of the internals / BlockManager (another aspirational goal that has been discussed before but never became concrete). With all columns backed by extension dtypes, we would only be using 1D blocks (avoiding the thorny 1D/2D cases in the internals), which simplifies the memory model, consolidation, etc.

Do you think this is a desirable goal? And realistic? Other aspirational goals?

Best,
Joris

*Agreeing on goals doesn't mean it will happen; that's open source (or at least community-based open source). But I think goals can still be useful to guide efforts where possible, or to get traction for certain issues from contributors. And then we can still see if it gets done in 2.0, 3.0, 4.0 or never ;)
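For context, a minimal example of the nullable dtypes that already exist in 1.0 and of how pd.NA behaves (Int64 shown; the boolean and string dtypes work the same way):

```python
import pandas as pd

# Nullable extension dtype: pd.NA is the missing value indicator,
# regardless of the underlying type (pandas >= 1.0).
s = pd.Series([1, None, 3], dtype="Int64")
print(s)
# 0       1
# 1    <NA>
# 2       3
# dtype: Int64

print(s[1] is pd.NA)   # True
print(pd.NA + 1)       # <NA> -- NA propagates through arithmetic
print(pd.NA == pd.NA)  # <NA> -- and through comparisons
```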
From tom.augspurger88 at gmail.com  Tue Feb 11 10:02:57 2020
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Tue, 11 Feb 2020 09:02:57 -0600
Subject: [Pandas-dev] Monthly Dev Meeting - February 2020
Message-ID:

Hi all,

The next monthly dev call is tomorrow (Wednesday, February 12th) at 18:00 UTC. All are welcome.

https://dev.pandas.io/docs/development/meeting.html

We'll use this Zoom link: https://zoom.us/j/942410248

The agenda is at https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?usp=sharing if you have topics to discuss.

Tom


From irv at princeton.com  Tue Feb 11 12:13:35 2020
From: irv at princeton.com (Irv Lustig)
Date: Tue, 11 Feb 2020 12:13:35 -0500
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

Joris:

Another aspirational goal for pandas 2.0 would be to clean up the API so that index names and column names are treated equivalently throughout. I created a meta-issue for this 6+ months ago:

https://github.com/pandas-dev/pandas/issues/27652

Dr-Irv

On Mon, Feb 10, 2020 at 12:43 PM Joris Van den Bossche <jorisvandenbossche at gmail.com> wrote:

> [quoted message trimmed]
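One concrete example of the kind of asymmetry the meta-issue collects (an illustration chosen here, not taken from the issue itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}).set_index("a")

# The same labeled data needs different spellings depending on
# whether it lives in a column or in the index:
df["b"]                         # select by column name
df.index.get_level_values("a")  # select by index name
```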
From tom.augspurger88 at gmail.com  Wed Feb 12 18:28:48 2020
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Wed, 12 Feb 2020 17:28:48 -0600
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

Thanks Joris.

This was discussed on the call today: https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?usp=sharing. I'll try to summarize the discussion here.

On NA by default, there were a few concerns, none of which is likely a blocker. Things like the memory overhead of masks can be improved by making them optional (relatively easy) and possibly by using a bitmask (probably harder).

I wondered if this was blocked by the BlockManager being written in Python. This change would imply that blockwise ops become columnwise ops, so we'll have more overhead for some operations. Joris cited a bit of work he did to make this not too bad, at least for tables that are not too wide.

I also wondered whether this would be inappropriate as long as NA lives only in pandas, rather than being something understood by the entire scientific Python ecosystem. It's worth thinking about and seeing how the community reacts to NA. Probably not a blocker.

This would also imply creating a nullable float dtype and making our datelikes use NA rather than NaT too. That seemed to be generally OK, but wasn't discussed too much.

---

Other "2.0" topics included rethinking our dependencies. It's possible Arrow could be added. Going nullable by default would make Arrow a pretty attractive option for storing arrays, but we would need to consult our downstream dependencies (like xarray) and users about that.

Fixing __getitem__ was also discussed. That will take someone writing up a detailed proposal about specific changes and possible deprecation paths.

Tom

On Mon, Feb 10, 2020 at 11:43 AM Joris Van den Bossche <jorisvandenbossche at gmail.com> wrote:

> [quoted message trimmed]
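On the memory-overhead point, a sketch of the masked layout under discussion. The `_data`/`_mask` attributes are pandas 1.0 internals, shown only for illustration; they are private and subject to change:

```python
import pandas as pd

arr = pd.array([1, None, 3], dtype="Int64")

# A masked array stores the values and the missing-value mask separately:
arr._data  # int64 values (the slot behind <NA> holds a placeholder)
arr._mask  # array([False,  True, False]) -- a NumPy bool mask,
           # one full byte per element; a bitmask (as in Arrow)
           # would cut that to one bit per element
```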
From jbrockmendel at gmail.com  Thu Feb 13 17:23:20 2020
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Thu, 13 Feb 2020 14:23:20 -0800
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

> This would also imply creating a nullable float dtype and making our
> datelikes use NA rather than NaT too. That seemed to be generally OK, but
> wasn't discussed too much.

My understanding of the discussion is that using a mask on top of datetimelike arrays would not _replace_ NaT, but supplement it with something semantically different. Replacing NaT with NA breaks arithmetic consistency, as has been discussed ad nauseam.

On Wed, Feb 12, 2020 at 3:29 PM Tom Augspurger wrote:

> [quoted message trimmed]
From tom.augspurger88 at gmail.com  Fri Feb 14 16:02:19 2020
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Fri, 14 Feb 2020 15:02:19 -0600
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

> Replacing NaT with NA breaks arithmetic consistency

This means the result dtype of a Series-and-scalar op, right? If so, it's worth deciding whether that's more valuable than consistency in the behavior of missing values in arithmetic and comparison operations.

On Thu, Feb 13, 2020 at 4:23 PM Brock Mendel wrote:

> [quoted message trimmed]
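A sketch of the dtype question being raised. The NaT line reflects current (pandas 1.0) behavior; the NA line is the open design question, not released behavior:

```python
import pandas as pd

dti = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02"]))

# NaT is typed as datetime-like, so the result dtype is determined:
(dti - pd.NaT).dtype  # timedelta64[ns]

# A typeless NA gives no such signal: what should `dti - pd.NA` be?
# datetime64, timedelta64, or an error? The scalar alone cannot say.
```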
From garcia.marc at gmail.com  Mon Feb 17 05:12:50 2020
From: garcia.marc at gmail.com (Marc Garcia)
Date: Mon, 17 Feb 2020 10:12:50 +0000
Subject: [Pandas-dev] Domain and hosting
Message-ID:

We've got a bit of a mess at the moment with the pandas domain and hosting. I'll try to leave things in a more reasonable state, but there are some decisions pending.

My understanding from a thread on this list was that everybody was happy with using pandas.io, so we set up that domain for the new hosting, plus dev.pandas.io for the development version of the website (and the blog and anything published in our GitHub organization pages).

From the discussion in https://github.com/pandas-dev/pandas/issues/28528 it seems the preference is to keep the old pandas.pydata.org instead. Given that, I think we can get rid of dev.pandas.io and point pandas.pydata.org to the new server once it's ready.

For the website, I think the agreement is to update it with the latest version from master. So, no dev.pandas.io. The development (master) documentation can live at pandas.pydata.org/docs/dev/.

For the blog, I think the best option is to have the posts as pages on the website, in a blog/ directory, so we don't need to maintain it separately, and it has the look and feel of the website.

For the new hosting, I'll move everything (all old documentation versions) from the current server to the new one, and then set up the website and the development docs to update automatically. Tom, can you give me access to the current web server so I can fetch the data, please?

Once everything is working on the new server, we'll be able to see it at pandas.io, and when we're happy we can change the domain pandas.pydata.org to point to it and disable pandas.io.

Please let me know if there are objections to any of the above; otherwise I'll move forward.

Cheers!


From jorisvandenbossche at gmail.com  Mon Feb 17 06:36:54 2020
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 17 Feb 2020 12:36:54 +0100
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

> > This would also imply creating a nullable float dtype and making our
> > datelikes use NA rather than NaT too. That seemed to be generally OK,
> > but wasn't discussed too much.
> My understanding of the discussion is that using a mask on top of
> datetimelike arrays would not _replace_ NaT, but supplement it with
> something semantically different.

Yes. If we see it similarly to NaN for floats (where NaN is a specific float value in the data array, while NAs are tracked in the mask array), then we can do something similar for datetimelike arrays. And the same discussions we are going to have for float dtypes (to what extent to distinguish NaN from NA, and whether we need to provide options) will also be relevant for datetimelike dtypes (but then for NaT and NA).

But note that in practice, I *think* the big majority of use cases will mostly have NA and not NaT in the data (e.g. when reading from files that have missing data).

> Replacing NaT with NA breaks arithmetic consistency, as has been
> discussed ad nauseam.

It's not fully clear to me what you want to say with this, so a more detailed clarification is welcome (I mean, I understand the sentence and remember the discussion, but I don't fully understand the point being made in context, or in what direction you think more discussion is needed).

Assume we introduce a new "nullable datetime" dtype that uses a mask to track NAs and can still have NaT in the values. In practice, this still means that we "replace NaT with NA" (because even though NaT is still possible, I think you would mostly get NAs, as mentioned above; e.g. reading a file would now give NA instead of NaT).

So do you mean "in my opinion, we should not do this" (what I just described above), because in practice it would mean breaking arithmetic consistency? Or that if we want to start using NA for datetimelike dtypes, you think "dtype-parametrized" NA values are necessary (so you can distinguish NA[datetime] and NA[timedelta])?

Joris
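A sketch of the data/mask layout described above. pandas 1.0 has no nullable float or datetime dtype, so the arrays below are a hypothetical illustration of where each sentinel would live:

```python
import numpy as np

# Hypothetical masked float column, illustrating the proposal:
data = np.array([1.5, np.nan, 3.0])    # NaN is a real float *value* in the data
mask = np.array([False, False, True])  # NA lives here, not in the data

# data[1] stays a visible NaN (e.g. the result of 0/0), while
# mask[2] marks position 2 as missing (NA), whatever data[2] holds.
```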
From andy.terrel at gmail.com  Mon Feb 17 08:09:16 2020
From: andy.terrel at gmail.com (Andy Ray Terrel)
Date: Mon, 17 Feb 2020 07:09:16 -0600
Subject: [Pandas-dev] Domain and hosting
In-Reply-To:
References:
Message-ID:

On Mon, Feb 17, 2020 at 4:13 AM Marc Garcia wrote:

> [quoted message trimmed]
> For the new hosting, I'll move everything (all old documentation
> versions) from the current server to the new one, and then set up the
> website and the development docs to update automatically. Tom, can you
> give me access to the current web server so I can fetch the data, please?

Marc, send me your ssh key and preferred login, and I can get you access. NumFOCUS just got a deal with AWS, and we have the Rackspace servers for another two years, so unless you are just dying to pay fees, let me get you free servers.


From jbrockmendel at gmail.com  Mon Feb 17 11:25:26 2020
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Mon, 17 Feb 2020 08:25:26 -0800
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

> It's not fully clear to me what you want to say with this, so a more
> detailed clarification is welcome (I mean, I understand the sentence and
> remember the discussion, but I don't fully understand the point being
> made in context, or in what direction you think more discussion is
> needed).

I don't particularly think more discussion is needed, as this is a rehash of #28095, where this horse has already been beaten to death.

As Tom noted there, using pd.NA in places where we currently use NaT breaks the usual identity (one that we rely on A LOT):

```(array + array)[0].dtype <=> (array + array[0]).dtype```

(Yes, this holds only imperfectly for NaT, because NaT serves as both NotADatetime and NotATimedelta, and I'll refer you to the decomposing horse in #28095.)

Also from #28095: ```Series[timedelta64] * pd.NaT``` unambiguously raises, but ```Series[timedelta64] * pd.NA``` could be timedelta64.

> Assume we introduce a new "nullable datetime" dtype that uses a mask to
> track NAs and can still have NaT in the values. In practice, this still
> means that we "replace NaT with NA"

This strikes me as contradictory.

> So do you mean "in my opinion, we should not do this" [...] Or that if
> we want to start using NA for datetimelike dtypes, you think
> "dtype-parametrized" NA values are necessary (so you can distinguish
> NA[datetime] and NA[timedelta])?

I think:

1) pd.NA solves an _actual_ problem, which is that we used to use np.nan in places (categorical, object) where np.nan was semantically misleading.
   a) What these have in common is that they are in general non-arithmetic dtypes.
   b) This is an improvement, and I'm glad you put in the effort to make it happen.
   c) Trying to shoe-horn pd.NA into cases where it is semantically misleading, based on the Highlander Principle, is counter-productive.

2) The "only one NA value is simpler" argument strikes me as a solution in search of a problem.
   a) All the more so if you want this to supplement np.nan/pd.NaT instead of replacing them.
   b) *The idea of replacing vs. supplementing needs to be made much more explicit/clear.*

3) The "dtype-parametrized" NA did come up in #28095, but I never advocated it.
   a) I am open to separating out a NaTimedelta (xref #24983) from pd.NaT, and don't particularly care what it is called.
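To make the consistency argument concrete, a sketch of the two behaviors cited above. The NaT lines reflect pandas 1.0 behavior as described in the thread; the pd.NA lines state the open question, not released behavior:

```python
import pandas as pd

td = pd.Series(pd.to_timedelta(["1 day", "2 days"]))

# The identity being relied on: operate-then-index and index-then-operate
# agree on the result dtype.
(td + td)[0]   # Timedelta scalar taken from a timedelta64[ns] result
td + td[0]     # timedelta64[ns] Series: same dtype either way

# NaT carries type information (datetime-or-timedelta), so
# timedelta * NaT is a well-defined error:
# td * pd.NaT  # raises TypeError

# A typeless NA gives no such signal: should td * pd.NA be a
# timedelta64 series, or an error? The scalar alone cannot say.
```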
From tom.augspurger88 at gmail.com  Mon Feb 17 11:33:52 2020
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Mon, 17 Feb 2020 10:33:52 -0600
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

> 2) The "only one NA value is simpler" argument strikes me as a solution
> in search of a problem.

I don't think that's correct. I think consistently propagating NA in comparison operations is a worthwhile goal.

On Mon, Feb 17, 2020 at 10:25 AM Brock Mendel wrote:

> [quoted message trimmed]
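For reference, the propagation Tom points to is already the released pd.NA behavior in 1.0: comparisons return NA instead of False, and boolean operations follow three-valued (Kleene) logic, so "unknown" only propagates when the known operand cannot decide the answer:

```python
import pandas as pd

pd.NA > 1      # <NA> -- comparisons propagate, unlike np.nan (False)

pd.NA | True   # True  -- True wins regardless of the unknown operand
pd.NA | False  # <NA>
pd.NA & False  # False -- False wins regardless of the unknown operand
pd.NA & True   # <NA>
```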
From jbrockmendel at gmail.com  Mon Feb 17 12:50:38 2020
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Mon, 17 Feb 2020 09:50:38 -0800
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

> I think consistently propagating NA in comparison operations is a
> worthwhile goal.

That's an argument for having a three-valued bool-dtype, not for replacing all other NA-like values.

On Mon, Feb 17, 2020 at 8:34 AM Tom Augspurger <tom.augspurger88 at gmail.com> wrote:

> [quoted message trimmed]
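The three-valued bool-dtype Brock mentions already exists as the nullable "boolean" dtype in 1.0; a quick illustration:

```python
import pandas as pd

s = pd.Series([True, False, None], dtype="boolean")  # three-valued bool dtype
s & True
# 0     True
# 1    False
# 2     <NA>
# dtype: boolean
```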
From tom.augspurger88 at gmail.com  Mon Feb 17 12:55:23 2020
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Mon, 17 Feb 2020 11:55:23 -0600
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

Is NaT defined to be unequal in all comparisons, just like NaN? I think the goal of propagating NA requires either using NA or changing the behavior of NaT in comparisons to be like NA.

On Mon, Feb 17, 2020 at 11:50 AM Brock Mendel wrote:

> [quoted message trimmed]
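On the question above: NaN's comparison rule comes from IEEE 754, and NaT currently mimics it, while pd.NA propagates instead. The current behavior, side by side:

```python
import numpy as np
import pandas as pd

np.nan == np.nan  # False -- IEEE 754: NaN is unequal to everything
pd.NaT == pd.NaT  # False -- NaT currently follows the NaN rule
pd.NaT >= pd.NaT  # False

pd.NA == pd.NA    # <NA> -- NA propagates instead of answering False
```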
From jbrockmendel at gmail.com  Mon Feb 17 12:58:28 2020
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Mon, 17 Feb 2020 09:58:28 -0800
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

> or changing the behavior of NaT in comparisons to be like NA.

Pending the kinks being worked out of pd.NA, I have no problem with that.

On Mon, Feb 17, 2020 at 9:55 AM Tom Augspurger wrote:

> [quoted message trimmed]
From tom.augspurger88 at gmail.com  Mon Feb 17 13:06:27 2020
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Mon, 17 Feb 2020 12:06:27 -0600
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To:
References:
Message-ID:

On Mon, Feb 17, 2020 at 11:58 AM Brock Mendel wrote:

> > or changing the behavior of NaT in comparisons to be like NA.
> > Pending the kinks being worked out of pd.NA, I have no problem with that. > You have no problem with changing the behavior of NaT, or changing to use pd.NA? Is changing the defined behavior of NaT even an option? Is it defined in a spec like NaN, or did NumPy just choose that behavior? Assuming NaT had NA-like behavior in comparisons, what's remaining arguments for keeping NaT? Preserving dtypes in scalar - array ops? Anything else? On Mon, Feb 17, 2020 at 9:55 AM Tom Augspurger > wrote: > >> Is NaT defined to be unequal in all comparisons, just like NaN? I think >> the goal of propagating NA >> requires either using NA or changing the behavior of NaT in comparisons >> to be like NA. >> >> On Mon, Feb 17, 2020 at 11:50 AM Brock Mendel >> wrote: >> >>> > I think consistently propagating NA in comparison operations is a >>> worthwhile goal. >>> >>> That's an argument for having a three-valued bool-dtype, not for >>> replacing all other NA-like values. >>> >>> On Mon, Feb 17, 2020 at 8:34 AM Tom Augspurger < >>> tom.augspurger88 at gmail.com> wrote: >>> >>>> > 2) The "only one NA value is simpler" argument strikes me as a >>>> solution in search of a problem. >>>> >>>> I don't think that's correct. I think consistently propagating NA in >>>> comparison operations is a worthwhile goal. >>>> >>>> On Mon, Feb 17, 2020 at 10:25 AM Brock Mendel >>>> wrote: >>>> >>>>> > It's not fully clear to me what you want to say with this, so a more >>>>> detailed clarification is welcome (I mean, I understand the sentence and >>>>> remember the discussion, but don't fully understand the point being made in >>>>> context, or in what direction you think more discussion is needed). >>>>> >>>>> I don't particularly think more discussion is needed, as this is a >>>>> rehash of #28095, where this horse has already been beaten to death. >>>>> >>>>> As Tom noted here >>>>> , >>>>> using pd.NA in places where we currently use NaT breaks the usual identity >>>>> (that we rely on A LOT) >>>>> >>>>> ```(array + array)[0].dtype <=> (array + array[0]).dtype``` >>>>> >>>>> (Yes, this holds only imperfectly for NaT because NaT serves as both >>>>> NotADatetime and NotATimedelta, and I'll refer you to the decomposing horse >>>>> in #28095.) >>>>> >>>>> Also from #28095: >>>>> >>>>> ```Series[timedelta64] * pd.NaT``` unambiguously raises, but >>>>> ```Series[timedelta64] * pd.NA``` could be timedelta64 >>>>> >>>>> > Assume we introduce a new "nullable datetime" dtype that uses a mask >>>>> to track NAs, and can still have NaT in the values. In practice, this still >>>>> means that we "replace NaT with NA" >>>>> >>>>> This strikes me as contradictory. >>>>> >>>>> > So do you mean: "in my opinion, we should not do this" (what I just >>>>> described above), because in practice that would mean breaking arithmetic >>>>> consistency? Or that if we want to start using NA for datetimelike dtypes, >>>>> you think "dtype-parametrized" NA values are necessary (so you can >>>>> distinguish NA[datetime] and NA[timedelta] ?) >>>>> >>>>> I think: >>>>> >>>>> 1) pd.NA solves an _actual_ problem which is that we used to use >>>>> np.nan in places (categorical, object) where np.nan was semantically >>>>> misleading. >>>>> a) What these have in common is that they are in general >>>>> non-arithmetic dtypes. >>>>> b) This is an improvement, and I'm glad you put in the effort to >>>>> make it happen. >>>>> c) Trying to shoe-horn pd.NA into cases where it is semantically >>>>> misleading based on the Highlander Principle is counter-productive. 
>>>>> >>>>> 2) The "only one NA value is simpler" argument strikes me as a >>>>> solution in search of a problem. >>>>> a) All the more so if you want this to supplement np.nan/pd.NaT >>>>> instead of replace them. >>>>> b) *the idea of replacing vs supplementing needs to be made much >>>>> more explicit/clear* >>>>> >>>>> 3) The "dtype-parametrized" NA did come up in #28095, but I never >>>>> advocated it. >>>>> a) I am open to separating out a NaTimedelta (xref #24983) from >>>>> pd.NaT, and don't particularly care what it is called. >>>>> >>>>> >>>>> On Mon, Feb 17, 2020 at 3:37 AM Joris Van den Bossche < >>>>> jorisvandenbossche at gmail.com> wrote: >>>>> >>>>>> > This would also imply creating a nullable float dtype and making >>>>>>> our datelikes use NA rather than NaT too. That seemed to be generally OK, >>>>>>> but wasn't discussed too much. >>>>>>> >>>>>>> My understanding of the discussion is that using a mask on top of >>>>>>> datetimelike arrays would not _replace_ NaT, but supplement it with >>>>>>> something semantically different. >>>>>>> >>>>>> >>>>>> Yes, if we see it similar as NaNs for floats (where NaN is a specific >>>>>> float value in the data array, while NAs are tracked in the mask array), >>>>>> then for datetimelike arrays we can do something similar. And the same >>>>>> discussions about to what extent to distinguish NaN and NA or whether we >>>>>> need to provide options that we are going to have for float dtypes, will >>>>>> also be relevant for datetimelike dtypes (but then for NaT and NA). >>>>>> >>>>>> But note that in practice, I *think* that the big majority of use >>>>>> cases will mostly use NA and not NaT in the data (eg when reading from >>>>>> files that have missing data). >>>>>> >>>>>> Replacing NaT with NA breaks arithmetic consistency, as has been >>>>>>> discussed ad nauseum. >>>>>>> >>>>>> >>>>>> It's not fully clear to me what you want to say with this, so a more >>>>>> detailed clarification is welcome (I mean, I understand the sentence and >>>>>> remember the discussion, but don't fully understand the point being made in >>>>>> context, or in what direction you think more discussion is needed). >>>>>> >>>>>> Assume we introduce a new "nullable datetime" dtype that uses a mask >>>>>> to track NAs, and can still have NaT in the values. In practice, this still >>>>>> means that we "replace NaT with NA" (because even though NaT is still >>>>>> possible, I think you would mostly get NAs as mentioned above; eg reading a >>>>>> file would now give NA instaed of NaT). >>>>>> So do you mean: "in my opinion, we should not do this" (what I just >>>>>> described above), because in practice that would mean breaking arithmetic >>>>>> consistency? Or that if we want to start using NA for datetimelike dtypes, >>>>>> you think "dtype-parametrized" NA values are necessary (so you can >>>>>> distinguish NA[datetime] and NA[timedelta] ?) >>>>>> >>>>>> Joris >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Pandas-dev mailing list >>>>>> Pandas-dev at python.org >>>>>> https://mail.python.org/mailman/listinfo/pandas-dev >>>>>> >>>>> _______________________________________________ >>>>> Pandas-dev mailing list >>>>> Pandas-dev at python.org >>>>> https://mail.python.org/mailman/listinfo/pandas-dev >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... 
From garcia.marc at gmail.com Mon Feb 17 13:24:34 2020
From: garcia.marc at gmail.com (Marc Garcia)
Date: Mon, 17 Feb 2020 18:24:34 +0000
Subject: [Pandas-dev] Domain and hosting
In-Reply-To: References: Message-ID:

The OVH hosting is also free. I thought Rackspace had started sending
invoices now; I saw comments on Twitter from some other projects about it.

Part of the idea of moving to OVH was that they would provide us with the
Binder infrastructure if we make the examples in the docs runnable.

Not sure how we should move forward then. Should we simply disable or
redirect pandas.io, and set up the CI to update the website and the dev
docs there? The current server seems to work well enough, and it is
probably not worth moving things for now if we still have it for two more
years. Does anyone have a different idea or a preference?

On Mon, Feb 17, 2020 at 1:09 PM Andy Ray Terrel wrote:

> On Mon, Feb 17, 2020 at 4:13 AM Marc Garcia wrote:
>
>> We've got a bit of a mess at the moment with the pandas domain and
>> hosting. I'll try to leave things in a more reasonable way, but there
>> are some decisions pending.
>>
>> My understanding from a thread in this list was that everybody was happy
>> with using pandas.io, and we set up the domain for the new hosting, and
>> also dev.pandas.io for the development version of the website (and the
>> blog and anything published in our GitHub organization pages).
>>
>> From the discussion in https://github.com/pandas-dev/pandas/issues/28528
>> it seems like the preference is to keep the old pandas.pydata.org
>> instead. Given that, I think we can get rid of dev.pandas.io, and point
>> pandas.pydata.org to the new server once it's ready.
>>
>> For the website, I think the agreement is to update it with the latest
>> version from master. So, no dev.pandas.io. For the development (master)
>> documentation, I think it can live in pandas.pydata.org/docs/dev/.
>>
>> For the blog, I think the best option is to have the posts as pages on
>> the website, in a directory blog/, so we don't need to maintain it
>> separately, and it has the look and feel of the website.
>>
>> For the new hosting, I'll move everything (all old documentation
>> versions) from the current server to the new one, and then set up that
>> the website and the development docs are automatically updated. Tom, can
>> you give me access to the current web server so I can fetch the data
>> please?
>
> Marc, send me your ssh-key and preferred login, I can get you access.
> NumFOCUS just got a deal with AWS and we have the Rackspace servers for
> another two years, so unless you are just dying to pay fees, let me get
> you free servers.
>
>> Once everything is working in the new server, we'll be able to see it in
>> pandas.io, and when we're happy we can change the domain
>> pandas.pydata.org to point to it, and disable pandas.io.
>>
>> Please let me know if there are objections to any of the above,
>> otherwise I'll move forward.
>>
>> Cheers!

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From andy.terrel at gmail.com Mon Feb 17 14:16:35 2020
From: andy.terrel at gmail.com (Andy Ray Terrel)
Date: Mon, 17 Feb 2020 13:16:35 -0600
Subject: [Pandas-dev] Domain and hosting
In-Reply-To: References: Message-ID:

On Mon, Feb 17, 2020 at 12:24 PM Marc Garcia wrote:

> The OVH hosting is also free. I thought Rackspace had started sending
> invoices now; I saw comments on Twitter from some other projects about it.

NumFOCUS has negotiated a 2-year extension. The Twitter comments seem to be
mostly tied to personal accounts.

> Part of the idea of moving to OVH was that they would provide us with the
> Binder infrastructure if we make the examples in the docs runnable.

OVH is great, but as far as I know there is no formal relationship. Sorry
if there are details elsewhere that I've missed. Anywho, I just wanted to
make sure you know I had resources to help if needed.

> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From tom.augspurger88 at gmail.com Tue Feb 18 12:20:01 2020
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Tue, 18 Feb 2020 11:20:01 -0600
Subject: [Pandas-dev] Fwd: What could a pandas 2.0 look like?
In-Reply-To: References: Message-ID:

(Accidentally dropped the mailing list)

On Mon, Feb 17, 2020 at 7:17 PM Brock Mendel wrote:

> > You have no problem with changing the behavior of NaT, or changing to
> > use pd.NA?
>
> If/when we get to a point where we propagate NAs in all other comparisons,
> I would have no problem with editing `NaT.__richcmp__` to match that
> convention.

What are the advantages of a NaT with NA-like comparison semantics over
using NA (or NA[datetime])?

1. Retain dtype in array - scalar ops with a scalar NA
2. ...
3. Less disruptive than changing to NA

My ... could include things like `isinstance(NaT, Timestamp)` being true
and `NaT.<attr>` for Timestamp attributes. But those don't strike me as
necessarily good things. They seem sometimes useful and sometimes harmful.

The downsides of changing NaT in comparison operations are

1. We're diverging from `np.NaT`. I don't know how problematic this
   actually is.
2. It's a special case. Should users need to know that datelikes use their
   own NA value because the underlying storage is able to store them
   "in-band" rather than as a mask? My gut reaction is "no, users shouldn't
   be exposed to this."
3. Changing NaT would leave just NaN with the "always unequal in
   comparisons" behavior.

Thus far, I see three options going forward:

1. Use NaN for floats, NaT for datelikes, NA for other.
   1-a: Leave NaT with always unequal
   1-b: Change NaT to have NA-like comparison behavior
2. Use NA everywhere (no NaN for float, no NaT for datelike)
3. Implement a typed `NA`, where we have an `NA` per dtype.

Option 3 I think solves the array - scalar op issue. It's more complex for
users, though hopefully not too complex? My biggest worry is that it makes
the implementation much more complex, though perhaps I'm being pessimistic.

On balance, I'm not sure where I come down yet. Good news: we can take time
to figure this out :)

On Mon, Feb 17, 2020 at 10:06 AM Tom Augspurger wrote:

> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jorisvandenbossche at gmail.com Wed Feb 19 17:46:37 2020
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 19 Feb 2020 23:46:37 +0100
Subject: [Pandas-dev] What could a pandas 2.0 look like?
In-Reply-To: References: Message-ID:

Some answers to previous mails first:

On Mon, 17 Feb 2020 at 17:34, Tom Augspurger wrote:

> > 2) The "only one NA value is simpler" argument strikes me as a solution
> > in search of a problem.
>
> I don't think that's correct. I think consistently propagating NA in
> comparison operations is a worthwhile goal.

Having a single, consistent missing value indicator across all dtypes is
*for me* one of the main drivers that led me to make the pd.NA proposal.
From my personal experience (eg when teaching pandas to beginners), this is
an existing problem that complicates things, not one that is being
invented.

On Mon, Feb 17, 2020 at 10:25 AM Brock Mendel wrote:

> > Assume we introduce a new "nullable datetime" dtype that uses a mask to
> > track NAs, and can still have NaT in the values. In practice, this
> > still means that we "replace NaT with NA"
>
> This strikes me as contradictory.

I tried to explain this in the next sentence from the original text:
"(because even though NaT is still possible, I think you would mostly get
NAs as mentioned above; eg reading a file would now give NA instead of
NaT)."

So assuming you have a masked-array approach for datetimes, you can have
NaT as a valid datetime value in the values part, or NA due to the mask
part of the array. In such a case (but this is only an assumption about how
the extension array *could* work!), it's the NA that is the main missing
value indicator. So if you are creating such a masked datetime-like array
with missing values (eg from reading a file), you will get NAs as missing
values, in contrast to NaTs right now. Hence, in practice we would "replace
NaT with NA", although you can still have NaT in the values.

Note I only started to explain this in response to your initial "using a
mask on top of datetimelike arrays would not _replace_ NaT, but supplement
it with something semantically different", but maybe I misunderstood your
initial comment.

> I think:
>
> 1) pd.NA solves an _actual_ problem which is that we used to use np.nan
> in places (categorical, object) where np.nan was semantically misleading.
>     a) What these have in common is that they are in general
>        non-arithmetic dtypes.
>     b) This is an improvement, and I'm glad you put in the effort to make
>        it happen.
>     c) Trying to shoe-horn pd.NA into cases where it is semantically
>        misleading based on the Highlander Principle is counter-productive.

With "semantically misleading", I suppose you mean that "Series[Timedelta]
+ pd.NA" could result in either timedelta or datetime64? Personally, I
don't think this is a big problem (or at least I think a single pd.NA
brings bigger benefits), but this has indeed already been discussed.

> 2) The "only one NA value is simpler" argument strikes me as a solution
> in search of a problem.
>     a) All the more so if you want this to supplement np.nan/pd.NaT
>        instead of replace them.
>     b) *the idea of replacing vs supplementing needs to be made much more
>        explicit/clear*

I thought you were actually advocating for supplementing instead of
replacing in your first email ;) (but maybe you were rather trying to
summarize the discussion, and not giving an opinion?)

Anyway, I will open a GitHub issue for float dtypes to further discuss this
(I think the float case is easier to discuss, while the issues are similar
to datetime with NaT).

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
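To make the values-plus-mask layout concrete: the first half of the sketch below peeks at the existing nullable integer dtype (`_data` and `_mask` are private internals, shown purely to illustrate the layout); the second half is a hypothetical mock-up of the masked datetime idea, not an implemented pandas API.

```python
import numpy as np
import pandas as pd

# Existing nullable dtypes already store a payload array plus a mask:
arr = pd.array([1, None], dtype="Int64")
print(arr._data)  # payload; the value behind an NA slot is arbitrary
print(arr._mask)  # [False  True] -- True marks a missing (NA) entry

# A hypothetical masked datetime array would use the same layout. NaT could
# still appear in the values (a "real" not-a-time), while NA comes from the
# mask, so the two stay distinguishable:
values = np.array(["2020-01-01", "NaT", "2020-01-03"], dtype="datetime64[ns]")
mask = np.array([False, False, True])  # only the last entry is missing

is_na = mask                       # missing, in the masked-array sense
is_nat = np.isnat(values) & ~mask  # an actual NaT stored in the values
print(is_na)   # [False False  True]
print(is_nat)  # [False  True False]

# Ingestion paths like read_csv would set the mask, so in practice users
# would see NA where they see NaT today: "replace in practice", even
# though NaT remains representable.
```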
From jorisvandenbossche at gmail.com Wed Feb 19 17:55:27 2020
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 19 Feb 2020 23:55:27 +0100
Subject: [Pandas-dev] Fwd: What could a pandas 2.0 look like?
In-Reply-To: References: Message-ID:

On Tue, 18 Feb 2020 at 18:20, Tom Augspurger wrote:

> [...]
>
> The downsides of changing NaT in comparison operations are
>
> 1. We're diverging from `np.NaT`. I don't know how problematic this
>    actually is.
> 2. It's a special case. Should users need to know that datelikes use
>    their own NA value because the underlying storage is able to store
>    them "in-band" rather than as a mask? My gut reaction is "no, users
>    shouldn't be exposed to this."
> 3. Changing NaT would leave just NaN with the "always unequal in
>    comparisons" behavior.

Personally, I think changing the behaviour of NaT in pandas, and thus
deviating from the behaviour of the same value in numpy, is not a good
idea. For me, that seems more confusing than having a clearly distinct
value (pd.NA) that has the different behaviour.

> Thus far, I see three options going forward:
>
> 1. Use NaN for floats, NaT for datelikes, NA for other.
>    1-a: Leave NaT with always unequal
>    1-b: Change NaT to have NA-like comparison behavior
> 2. Use NA everywhere (no NaN for float, no NaT for datelike)
> 3. Implement a typed `NA`, where we have an `NA` per dtype.
>
> Option 3 I think solves the array - scalar op issue. It's more complex
> for users, though hopefully not too complex? My biggest worry is that it
> makes the implementation much more complex, though perhaps I'm being
> pessimistic.
>
> On balance, I'm not sure where I come down yet. Good news: we can take
> time to figure this out :)

Thanks for the summary!
Personally, I don't like the first option *long term*, as it keeps
different missing values (eg NaN) with different behaviours for some dtypes
as the default, while I would like to see us moving to a consistent missing
value indicator.
And I think we can take a similar approach as we somewhat decided on in the
original discussion on pd.NA: let's start with a single pd.NA, and we can
see later if there is a need to make it typed.

Joris

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jbrockmendel at gmail.com Wed Feb 19 18:52:10 2020
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Wed, 19 Feb 2020 15:52:10 -0800
Subject: [Pandas-dev] Fwd: What could a pandas 2.0 look like?
In-Reply-To: References: Message-ID:

Pivoting: Joris, on the call you mentioned a TimestampArray. Can you expand
on that a bit?

On Wed, Feb 19, 2020 at 2:55 PM Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jorisvandenbossche at gmail.com Wed Feb 26 05:40:46 2020
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 26 Feb 2020 11:40:46 +0100
Subject: [Pandas-dev] Fwd: What could a pandas 2.0 look like?
In-Reply-To: References: Message-ID:

On Thu, 20 Feb 2020 at 00:52, Brock Mendel wrote:

> Pivoting: Joris, on the call you mentioned a TimestampArray. Can you
> expand on that a bit?
Basically what I mentioned before in this thread: a new ExtensionArray that
uses pd.NA as the missing value indicator instead of pd.NaT, and where the
NAs are potentially tracked in a mask (as is done for the nullable integer
dtypes). There are more things to discuss about it (like allowing more
resolutions? a single dtype for tz-naive/tz-aware? ...), but I will try to
open a separate discussion going more in depth about this shortly.

But the issue of NA vs NaT is somewhat similar to NA vs NaN, for which I
just opened an issue to discuss in more detail:
https://github.com/pandas-dev/pandas/issues/32265

> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
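To give the TimestampArray idea described above a rough shape, here is a minimal, self-contained sketch: a datetime64 payload plus a boolean mask, surfacing pd.NA (not pd.NaT) as the missing-value scalar. The class name and methods are invented for illustration; a real implementation would be a full pandas ExtensionArray with the complete interface.

```python
import numpy as np
import pandas as pd

class MaskedTimestampArray:
    """Sketch of the concept only, NOT the actual pandas proposal:
    a datetime64 payload plus a boolean mask, where missing entries
    surface as pd.NA rather than pd.NaT."""

    def __init__(self, values, mask):
        self.values = np.asarray(values, dtype="datetime64[ns]")
        self.mask = np.asarray(mask, dtype=bool)

    @classmethod
    def from_sequence(cls, data):
        # None becomes NA (mask=True); everything else lives in the values,
        # so an explicit NaT would remain representable alongside NA.
        values = np.array(
            [d if d is not None else "NaT" for d in data],
            dtype="datetime64[ns]",
        )
        mask = np.array([d is None for d in data])
        return cls(values, mask)

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        if self.mask[i]:
            return pd.NA                     # NA comes from the mask...
        return pd.Timestamp(self.values[i])  # ...everything else from values

    def isna(self):
        return self.mask.copy()

arr = MaskedTimestampArray.from_sequence(["2020-01-01", None, "2020-01-03"])
print(arr[1])      # <NA>
print(arr.isna())  # [False  True False]
```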