From tom.augspurger88 at gmail.com  Sun Feb  3 22:10:05 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Sun, 3 Feb 2019 21:10:05 -0600
Subject: [Pandas-dev] ANN: Pandas 0.24.1 Released
Message-ID: <CAE1aY-nY_3u-YzOJLC-gBXsMhpfbc1t_6-F6PeP6J7N1aFKO2A@mail.gmail.com>

This is a minor bug-fix release in the 0.24.x series and includes some
regression fixes and bug fixes. We recommend that all users upgrade to this
version.

See the full whatsnew
<http://pandas.pydata.org/pandas-docs/version/0.24.2/whatsnew/v0.24.2.html> for
a list of all the changes.

The release can be installed with conda from the defaults and conda-forge
channels:

conda install pandas

Or via PyPI:

python -m pip install --upgrade pandas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20190203/7e922d45/attachment.html>

From xhochy at gmail.com  Mon Feb  4 11:20:00 2019
From: xhochy at gmail.com (Uwe L. Korn)
Date: Mon, 4 Feb 2019 17:20:00 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
References: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
Message-ID: <CAGSNw=Bc4WPff=NvqT--UWYHfhDVX3hDGneNn03gciy-V8gjyQ@mail.gmail.com>

Hello Tom,

overall I really like the concept of ExtensionArrays, but for more advanced
usage I think there is still a lot to do. At the moment, an implementer is
quite well off when the ExtensionArray can be coerced into a NumPy array.
Once you have data that is not well represented by a NumPy array, you need
to implement many more algorithms yourself. For fletcher this has been a
major hurdle (and is why I haven't implemented that much yet). This might
also just be because my backing library (Apache Arrow) is still missing a
lot of numerical operations. I hope to have some time in the next months to
work more on this, and then we can see how many issues pop up. In the end,
though, I would like to avoid coercing to NumPy arrays as much as possible,
as converting arrays with nulls adds some computational overhead.
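To make the null-conversion cost concrete, here is a minimal sketch that uses pandas' own nullable `Int64` extension array as a stand-in for an Arrow-backed array (an assumption: fletcher and pyarrow are not used here). Because NumPy's int64 has no missing-value marker, producing a plain NumPy array means changing the dtype and writing a fill value into every null slot, i.e. an extra copy of the data.

```python
import numpy as np
import pandas as pd

# Nullable integer extension array: integer values plus a separate validity mask
# (used here only as a stand-in for an Arrow-backed array with nulls).
arr = pd.array([1, None, 3], dtype="Int64")

# int64 cannot hold a missing value, so the conversion switches to float64 and
# writes NaN into each null position, producing a new array.
as_float = arr.to_numpy(dtype="float64", na_value=np.nan)

print(as_float.dtype)      # float64
print(np.isnan(as_float))  # [False  True False]
```

The explicit `dtype` and `na_value` arguments are used because the default conversion of a nullable array with missing values has changed between pandas versions.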

More comments inline.

On Wed, 16 Jan 2019 at 18:16, Tom Augspurger <
tom.augspurger88 at gmail.com> wrote:

> This is something I've been mulling over the past few days: how much do we
> want ExtensionArrays to change pandas?
>
> They've been great so far at addressing some of the shortcomings of NumPy's
> type system, but I imagine that users will be interested in pushing things
> even further. For example, users have been asking for proper support for
> nested data. Now that we have ExtensionArrays, things are essentially solved
> at the memory level (by e.g. Apache Arrow). But, I imagine that the set of
> APIs typically used for nested data is quite different from those used for
> the flat, tabular data pandas handles thus far. If we want to properly
> support nested data, what tolerance do we have for it "cluttering" the
> existing API?
>

Do we already have an example use case for nested data? For me it is hard
to imagine intuitive APIs for nested data without really good example use
cases.


>
> Finally (and this may be a topic for another day) have people thought about
> how 3rd-party EAs fit in with the potential block manager rewrite? IIUC, one
> of the goals there was a stable C API to the memory inside a DataFrame. Does
> anyone know how that would work with an array that doesn't (or can't)
> implement the buffer protocol?
>

As an example, no Arrow array can implement the buffer protocol, as each
Array has at least 2 buffers (the validity bitmap and the actual data). In
fletcher I have also used a ChunkedArray as the backing object of a Series.
This allows us to do operations like concat in constant time, but it also
comes with the cognitive overhead that the data may not be stored as a
single contiguous memory array.
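The chunked-storage trade-off can be illustrated without Arrow. The following is a hypothetical, pure-Python sketch (`ChunkedSketch` is an illustrative name, not fletcher's or pyarrow's API): concatenation only joins the lists of chunk references, so its cost grows with the number of chunks rather than the number of elements, while positional access first has to locate the chunk holding a position because the data is not one contiguous buffer.

```python
from bisect import bisect_right
from itertools import accumulate


class ChunkedSketch:
    """Hypothetical chunked container: chunks are referenced, never copied."""

    def __init__(self, chunks):
        # Keep references to the chunks; the elements themselves are not copied.
        self.chunks = list(chunks)
        # Cumulative end offsets, e.g. chunk lengths [2, 3] -> [2, 5].
        self._ends = list(accumulate(len(c) for c in self.chunks))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def concat(self, other):
        # Cost depends on the number of chunks, not on the number of elements.
        return ChunkedSketch(self.chunks + other.chunks)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Data is not contiguous: find the chunk that holds position i first.
        k = bisect_right(self._ends, i)
        start = self._ends[k - 1] if k > 0 else 0
        return self.chunks[k][i - start]


a = ChunkedSketch([[1, 2], [3, 4, 5]])
b = ChunkedSketch([[6]])
c = a.concat(b)
print(len(c), c[2], c[5])          # 6 3 6
print(c.chunks[0] is a.chunks[0])  # True: the first chunk was not copied
```

Every positional lookup pays for a binary search over chunk offsets; that is the "cognitive overhead" in code form, since algorithms can no longer assume a single flat buffer.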



From julien.marrec at gmail.com  Tue Feb  5 02:35:08 2019
From: julien.marrec at gmail.com (Julien Marrec)
Date: Tue, 05 Feb 2019 08:35:08 +0100
Subject: [Pandas-dev] [pydata] ANN: Pandas 0.24.1 Released
In-Reply-To: <CAE1aY-nY_3u-YzOJLC-gBXsMhpfbc1t_6-F6PeP6J7N1aFKO2A@mail.gmail.com>
References: <CAE1aY-nY_3u-YzOJLC-gBXsMhpfbc1t_6-F6PeP6J7N1aFKO2A@mail.gmail.com>
Message-ID: <cf9d2cc1-b783-4b5b-8d0a-db404b50cce9@gmail.com>

Hi,

For some reason the link points to 0.24.2, which cannot be found. Please find the correct link here:
http://pandas.pydata.org/pandas-docs/version/0.24.1/whatsnew/v0.24.1.html


Thanks for this release and all the good work!

Julien
--
Sent from a mobile device, please excuse the brevity.

Julien Marrec, EBCP, BPI MFBA
Owner

Direct: +33 6 95 14 42 13
Website: www.effibem.com
LinkedIn (en) | (fr)

On Feb 4, 2019, 04:10, at 04:10, Tom Augspurger <tom.augspurger88 at gmail.com> wrote:
>This is a minor bug-fix release in the 0.24.x series and includes some
>regression fixes
>and bug fixes. We recommend that all users upgrade to this version.
>
>See the full whatsnew
><http://pandas.pydata.org/pandas-docs/version/0.24.2/whatsnew/v0.24.2.html>
>for
>a list of all the changes.
>
>The release can be installed with conda from the defaults and
>conda-forge
>channels:
>
>conda install pandas
>
>Or via PyPI:
>
>python -m pip install --upgrade pandas

From tom.augspurger88 at gmail.com  Thu Feb  7 15:07:54 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Thu, 7 Feb 2019 14:07:54 -0600
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To: <1549569777.8043.3.camel@pietrobattiston.it>
References: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
 <1549569777.8043.3.camel@pietrobattiston.it>
Message-ID: <CAE1aY-mOQZxwxdc-2FKbGx9d3JHYk090163sxfN6bto6vh3p0Q@mail.gmail.com>

Third party accessors work just fine today:
http://pandas-docs.github.io/pandas-docs-travis/development/extending.html#extending-register-accessors

There are likely thorny internal issues where we expect "scalars" but
actually get an array in the nested data case.
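For reference, a third-party accessor is registered with `pandas.api.extensions.register_series_accessor`. The sketch below is hypothetical: the accessor name `nested` and its methods are illustrative only, and the data is plain Python lists in an object-dtype Series rather than a real nested ExtensionArray.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("nested")
class NestedAccessor:
    """Hypothetical accessor for a Series whose elements are Python lists."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def lengths(self):
        # Number of items in each element, as a new Series with the same index.
        return self._obj.map(len)

    def totals(self):
        # Sum of the items in each element; the sum of an empty list is 0.
        return self._obj.map(sum)


s = pd.Series([[1, 2], [3], []])
print(s.nested.lengths().tolist())  # [2, 1, 0]
print(s.nested.totals().tolist())   # [3, 3, 0]
```

The accessor only adds a namespace; anything inside pandas that treats each list element as "many values" rather than one is untouched by it.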

On Thu, Feb 7, 2019 at 2:02 PM Pietro Battiston <me at pietrobattiston.it>
wrote:

> Quick question on one of your points:
>
> On Wed, 16/01/2019 at 11:15 -0600, Tom Augspurger wrote:
> > [...]
> > They've been great so far at addressing some of the shortcomings of
> > NumPy's type system, but I imagine that users will be interested in
> > pushing things even further. For example, users have been asking for
> > proper support for nested data.
>
> How difficult would it be, in your opinion, to allow users to
> provide third-party accessors?
>
> My impression is that "proper support for nested data" is way beyond
> the amount of complexity that we want to introduce in the code base, but
> that with EAs we should be very close to allowing users to do it
> themselves.
>
> (Not commenting on the other points because I don't know enough about
> what was discussed, e.g. on the block manager rewrite)
>
> Pietro
>
>

From me at pietrobattiston.it  Thu Feb  7 15:02:57 2019
From: me at pietrobattiston.it (Pietro Battiston)
Date: Thu, 07 Feb 2019 21:02:57 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
References: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
Message-ID: <1549569777.8043.3.camel@pietrobattiston.it>

Quick question on one of your points:

On Wed, 16/01/2019 at 11:15 -0600, Tom Augspurger wrote:
> [...]
> They've been great so far at addressing some of the shortcomings of
> NumPy's type system, but I imagine that users will be interested in
> pushing things even further. For example, users have been asking for
> proper support for nested data.

How difficult would it be, in your opinion, to allow users to
provide third-party accessors?

My impression is that "proper support for nested data" is way beyond
the amount of complexity that we want to introduce in the code base, but
that with EAs we should be very close to allowing users to do it
themselves.

(Not commenting on the other points because I don't know enough about
what was discussed, e.g. on the block manager rewrite)

Pietro


From jorisvandenbossche at gmail.com  Thu Feb  7 16:59:24 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Thu, 7 Feb 2019 22:59:24 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To: <CAE1aY-mOQZxwxdc-2FKbGx9d3JHYk090163sxfN6bto6vh3p0Q@mail.gmail.com>
References: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
 <1549569777.8043.3.camel@pietrobattiston.it>
 <CAE1aY-mOQZxwxdc-2FKbGx9d3JHYk090163sxfN6bto6vh3p0Q@mail.gmail.com>
Message-ID: <CALQtMBZnSoeDTxLxke=0i1bVseWCx-0iPKyZk8Lp+YMBjUYGZA@mail.gmail.com>

On Thu, 7 Feb 2019 at 21:08, Tom Augspurger <
tom.augspurger88 at gmail.com> wrote:

> Third party accessors work just fine today:
> http://pandas-docs.github.io/pandas-docs-travis/development/extending.html#extending-register-accessors
>
> There are likely thorny internal issues where we expect "scalars" but
> actually get an array in the nested data case.
>

I indeed think that's the main problem right now if you want to implement a
proper nested (e.g. list/dict) EA, and I think we already had some issues
about that for the dummy JSON test case. That might involve a mechanism that
defers to the ExtensionArray to determine whether an object is a scalar or
an array-like of scalars of a certain length?
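The ambiguity shows up directly in pandas' own type checks: one element of a list-valued column is itself list-like. Below is a hypothetical sketch of the kind of hook described above, where the array type rather than pandas decides what counts as a single element. `ListArraySketch`, `is_element` and `is_sequence_of_elements` are illustrative names, not a pandas API.

```python
from pandas.api.types import is_list_like, is_scalar

element = [1, 2]               # one element of a hypothetical list-valued column
print(is_scalar(element))      # False
print(is_list_like(element))   # True: looks like "many values", not one


class ListArraySketch:
    """Hypothetical nested array that decides what counts as one element."""

    @staticmethod
    def is_element(obj):
        # For a list-of-scalars column, one element is a list of scalars.
        return isinstance(obj, list) and all(is_scalar(x) for x in obj)

    def is_sequence_of_elements(self, obj, length):
        # An array-like of elements with a given length (e.g. for setitem on a slice).
        return (
            is_list_like(obj)
            and len(obj) == length
            and all(self.is_element(x) for x in obj)
        )


arr = ListArraySketch()
print(arr.is_element([1, 2]))                          # True
print(arr.is_sequence_of_elements([[1], [2, 3]], 2))   # True
```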

From jorisvandenbossche at gmail.com  Fri Feb  8 07:26:40 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Fri, 8 Feb 2019 13:26:40 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
References: <CAE1aY-kuyyX_SN=KJeMV2+=REt_LSsfwjtRyBVJb3h=+RNztEw@mail.gmail.com>
Message-ID: <CALQtMBaDU65JVuDnPob636kz7O4H-JfFdLL6c0dE=kN45K3YXQ@mail.gmail.com>

On Wed, 16 Jan 2019 at 18:16, Tom Augspurger <
tom.augspurger88 at gmail.com> wrote:

> This is something I've been mulling over the past few days: how much do we
> want ExtensionArrays to change pandas?
>
> [...]
>
> As another semi-example, users may be interested in storing some or all of
> their data on a GPU in an ExtensionArray or arrays backed by GPU memory. I
> suspect that some things work quite well currently, e.g. `Series.sort_values`
> will dispatch to `ExtensionArray.argsort`, which can use a GPU-accelerated
> sorting algorithm. But other parts of pandas (anything in Cython, for
> example) won't necessarily work. How much are we willing to refactor
> pandas' internals to support something that's going to live outside pandas
> (as a GPU extension array likely would)?
>

To give a practical example: for a groupby operation, we dispatch to the
ExtensionArray for the factorization step, but the actual computation of
grouped reductions is still done in Cython. Is that the kind of thing you
were thinking about?
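A simplified sketch of that split: the factorization step is delegated to the array (here `ExtensionArray.factorize` on a pandas string array), after which the grouped reduction only needs integer codes, which is what makes it easy to run in compiled code. `np.bincount` stands in for pandas' Cython group-sum; this illustrates the idea and is not the actual internal code path.

```python
import numpy as np
import pandas as pd

keys = pd.array(["a", "b", "a", "c"], dtype="string")  # an ExtensionArray
values = np.array([1.0, 2.0, 3.0, 4.0])

# Step 1: dispatch to the ExtensionArray to turn keys into integer codes.
codes, uniques = keys.factorize()   # codes -> [0, 1, 0, 2]

# Step 2: the reduction only sees integer codes (stand-in for the Cython group-sum).
sums = np.bincount(codes, weights=values, minlength=len(uniques))

print(dict(zip(list(uniques), sums.tolist())))  # {'a': 4.0, 'b': 2.0, 'c': 4.0}
```

Only step 1 knows anything about the key type; step 2 is where a GPU- or Arrow-backed array would still fall back to pandas' NumPy-based code.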


> Finally (and this may be a topic for another day) have people thought about
> how 3rd-party EAs fit in with the potential block manager rewrite? IIUC, one
> of the goals there was a stable C API to the memory inside a DataFrame. Does
> anyone know how that would work with an array that doesn't (or can't)
> implement the buffer protocol?
>

If the idea is to get rid of blocks and have just 1D arrays, it certainly
fits, I would say (we could extend the current NumPy-backed PandasArrays
that are now only used in `.array`). But if the idea is to rewrite the
block manager in C/Cython, that might be more difficult. However, if a
future version of pandas were backed by Arrow, we wouldn't necessarily need
our own C API, as a reference to the Arrow table / arrays might be
sufficient? Of course, that depends on how tightly we want to depend on
Arrow, as that might limit the extensibility with other backends.
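For readers unfamiliar with the `.array` attribute mentioned here: for a NumPy-backed Series it returns a thin ExtensionArray wrapper around the ndarray (`PandasArray` in 0.24, renamed `NumpyExtensionArray` in pandas 2.1, so the sketch only checks the ExtensionArray base class). The `columns` dict at the end is a hypothetical illustration of a block-free layout with one 1D array per column, not a pandas data structure.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

wrapped = s.array    # thin ExtensionArray wrapper around the NumPy data
raw = s.to_numpy()   # the plain ndarray

print(isinstance(wrapped, pd.api.extensions.ExtensionArray))  # True
print(isinstance(raw, np.ndarray))                            # True

# Hypothetical block-free layout: one 1D array per column label,
# mixing a NumPy-backed wrapper and a masked extension array.
columns = {
    "x": s.array,
    "y": pd.array([1, None, 3], dtype="Int64"),
}
print({name: len(arr) for name, arr in columns.items()})  # {'x': 3, 'y': 3}
```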

On Mon, 4 Feb 2019 at 17:26, Uwe L. Korn <xhochy at gmail.com> wrote:

> Hello Tom,
>
> overall I really like the concept of ExtensionArrays, but for more advanced
> usage I think there is still a lot to do. At the moment, an implementer is
> quite well off when the ExtensionArray can be coerced into a NumPy array.
> Once you have data that is not well represented by a NumPy array, you need
> to implement many more algorithms yourself.
>

I think that is more or less to be expected. All our internal algorithms
are based on NumPy arrays. I think it would be an interesting idea to see
if we can/want to expose some of our algorithms to external users (e.g.
external ExtensionArray implementors). But even if we do that, it wouldn't
really help for the fletcher case, given the different memory layout.
A shorter-term idea that I would find interesting is to see to what extent
xtensor could be used for the algorithms.

Joris


> For fletcher this has been a major hurdle (and is why I haven't
> implemented that much yet). This might also just be because my backing
> library (Apache Arrow) is still missing a lot of numerical operations. I
> hope to have some time in the next months to work more on this, and then
> we can see how many issues pop up. In the end, though, I would like to
> avoid coercing to NumPy arrays as much as possible, as converting arrays
> with nulls adds some computational overhead.
>

From jorisvandenbossche at gmail.com  Sat Feb  9 04:32:26 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Sat, 9 Feb 2019 10:32:26 +0100
Subject: [Pandas-dev] Remaining API changes / deprecations to do before
 1.0 (wishlist)
In-Reply-To: <CAE1aY-=VoTrG4Lz0cxP1v6_-3kXk63XMGVAF_odnAieFXbguFw@mail.gmail.com>
References: <CALQtMBb2mwyWsgiHOqizA-BxFxuGCLqRJ1Xf9zrughDhOi9+uQ@mail.gmail.com>
 <CAE1aY-=VoTrG4Lz0cxP1v6_-3kXk63XMGVAF_odnAieFXbguFw@mail.gmail.com>
Message-ID: <CALQtMBYzydyGbyvEceCcaJwMnf2eDgxAYL25Ca_yjC7j4bEdJA@mail.gmail.com>

On Thu, 17 Jan 2019 at 16:20, Tom Augspurger <
tom.augspurger88 at gmail.com> wrote:

>
> My main one is deprecation of SparseSeries and SparseDataFrame, since I
> don't want to carry that around for all of 1.x.
> I didn't quite get there for 0.24, but all the really disruptive changes
> are done now (no method in pandas should return a SparseDataFrame anymore,
> aside from DataFrame.to_sparse()).
>

Yes, the SparseSeries / SparseDataFrame is also on my list.
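As a sketch of what takes the place of a SparseSeries once it is deprecated: a regular Series backed by a sparse extension array, with sparse-specific methods under the `.sparse` accessor. This uses the API as it exists in current pandas releases (`pd.arrays.SparseArray`, `Series.sparse`); details may differ from 0.24.

```python
import pandas as pd

# A plain Series whose values are a SparseArray; the fill value 0 is not stored.
s = pd.Series(pd.arrays.SparseArray([0, 0, 1, 0, 2]))

print(s.dtype)               # Sparse[int64, 0]
print(s.sparse.density)      # 0.4 (2 of the 5 values are stored)
dense = s.sparse.to_dense()  # an ordinary int64 Series
print(dense.tolist())        # [0, 0, 1, 0, 2]
```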
Some others:

- I think it would be good to do the planned breaking changes to
IntervalIndex indexing before 1.0.
- The `inplace` deprecation (GH16529
<https://github.com/pandas-dev/pandas/issues/16529>) (or at least a decision
on it). However, a possible deprecation + removal would have a huge impact
and need some time, so this might be something for after 1.0 instead.

Other people's thoughts on the original question?

Joris

From brunox.leitao at gmail.com  Mon Feb 11 14:57:09 2019
From: brunox.leitao at gmail.com (Bruno Xavier)
Date: Mon, 11 Feb 2019 17:57:09 -0200
Subject: [Pandas-dev] DataFrame Object creation
Message-ID: <CAFMzLkS1gcPLr4vcej6UKwKTF0NrkqM8aw6LQq+MtyVX=vBaQA@mail.gmail.com>

I'm new to the pandas library and I have a question about DataFrame
construction.

When building from a list, the shape of the DataFrame is 1 column with the
row count equaling the number of list elements.

In contrast, if we build it from a dictionary, its shape is 1 row with the
column count equaling the number of key-value pairs.

My question is: why are they different? Why not make both follow the same
pattern?
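A short sketch of the behaviour being asked about. The convention appears to be that a dict maps column labels to column data, while a flat list is treated as the values of a single column. With all-scalar dict values there is no length to infer rows from, so pandas requires an explicit index, and `DataFrame.from_dict(..., orient="index")` uses the keys as row labels instead.

```python
import pandas as pd

from_list = pd.DataFrame([10, 20, 30])
print(from_list.shape)             # (3, 1): a flat list fills a single column

# A dict maps column labels to column values.
from_dict_of_lists = pd.DataFrame({"a": [10], "b": [20], "c": [30]})
print(from_dict_of_lists.shape)    # (1, 3)

# All-scalar values give no row count to infer, so an index must be passed.
from_dict_of_scalars = pd.DataFrame({"a": 10, "b": 20, "c": 30}, index=[0])
print(from_dict_of_scalars.shape)  # (1, 3)

# Keys as row labels instead: same orientation as the list case.
rows = pd.DataFrame.from_dict({"a": 10, "b": 20, "c": 30}, orient="index")
print(rows.shape)                  # (3, 1)
```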

-- 
*Regards,*

*Bruno Xavier Leitão*