From tom.augspurger88 at gmail.com  Sun Feb  3 22:10:05 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Sun, 3 Feb 2019 21:10:05 -0600
Subject: [Pandas-dev] ANN: Pandas 0.24.1 Released
Message-ID:

This is a minor bug-fix release in the 0.24.x series and includes some
regression fixes and bug fixes. We recommend that all users upgrade to
this version.

See the full whatsnew for a list of all the changes.

The release can be installed with conda from the defaults and conda-forge
channels:

conda install pandas

Or via PyPI:

python -m pip install --upgrade pandas

From xhochy at gmail.com  Mon Feb  4 11:20:00 2019
From: xhochy at gmail.com (Uwe L. Korn)
Date: Mon, 4 Feb 2019 17:20:00 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To:
References:
Message-ID:

Hello Tom,

overall I really like the concept of ExtensionArrays, but for more
advanced usage I think there is still a lot to do. At the moment, an
implementer is quite well off when the ExtensionArray can be coerced into
a NumPy array. Once you have data that is not well represented by a NumPy
array, you need to develop many more algorithms.

For fletcher this has been a major hurdle for me (and is why I'm not
implementing so much). This might also just be because my backing library
(Apache Arrow) is still missing a lot of numerical operations. I hope to
have some time in the next months to work more on this, and then we can
see how many issues pop up. In the end, though, I would like to avoid
coercing to NumPy arrays as much as possible, as the conversion of arrays
with nulls adds some computational overhead.

More comments inline.

On Wed, 16 Jan 2019 at 18:16, Tom Augspurger <tom.augspurger88 at gmail.com> wrote:

> This is something I've been mulling over the past few days: how much do
> we want ExtensionArrays to change pandas?
>
> They've been great so far at addressing some of the shortcomings of
> NumPy's type system, but I imagine that users will be interested in
> pushing things even further. For example, users have been asking for
> proper support for nested data. Now that we have ExtensionArrays, things
> are essentially solved at the memory level (by e.g. Apache Arrow). But I
> imagine that the set of APIs typically used for nested data is quite
> different from those used for the flat, tabular data pandas handles thus
> far. If we want to properly support nested data, what tolerance do we
> have for it "cluttering" the existing API?

Do we already have an example use case for nested data? For me it is hard
to imagine intuitive APIs for nested data without really good example use
cases.

> Finally (and this may be a topic for another day) have people thought
> about how 3rd-party EAs fit in with the potential block manager rewrite?
> IIUC, one of the goals there was a stable C API to the memory inside a
> DataFrame. Does anyone know how that would work with an array that
> doesn't (or can't) implement the buffer protocol?

As an example, no Arrow array can implement the buffer protocol, as each
Array has at least 2 buffers (the validity bitmap and the actual data). In
fletcher I have also used ChunkedArray as the backing object of a Series.
This allows us to do operations like concat in constant time, but it also
comes with the cognitive overhead that the data may not be stored as a
single contiguous memory array.
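
To make the two-buffer layout mentioned above concrete, here is a minimal
NumPy-only sketch. It does not use the pyarrow API: the names `values`,
`bitmap` and `to_numpy_with_nan` are made up for illustration, and the
layout (one data buffer plus one bit-packed validity bitmap) is only a
simplified model of an Arrow-style nullable integer array. It also hints
at why coercing nulls to a single NumPy array costs a copy.

```python
import numpy as np

# Hypothetical illustration, not the pyarrow API: a nullable int64 array
# kept as two separate buffers.
values = np.array([10, 0, 30, 40], dtype=np.int64)  # data buffer; slot 1 is a placeholder
valid = np.array([True, False, True, True])         # one validity flag per element
bitmap = np.packbits(valid, bitorder="little")      # validity buffer, 1 bit per element


def to_numpy_with_nan(values, bitmap):
    """Coerce both buffers into one NumPy array, using NaN for nulls."""
    mask = np.unpackbits(bitmap, count=len(values), bitorder="little").astype(bool)
    out = values.astype(np.float64)  # forces a copy: int64 has no missing-value marker
    out[~mask] = np.nan
    return out


result = to_numpy_with_nan(values, bitmap)
print(result)
```

A single buffer-protocol view can expose `values` or `bitmap`, but not
both at once, which is the difficulty described above.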

From julien.marrec at gmail.com  Tue Feb  5 02:35:08 2019
From: julien.marrec at gmail.com (Julien Marrec)
Date: Tue, 05 Feb 2019 08:35:08 +0100
Subject: [Pandas-dev] [pydata] ANN: Pandas 0.24.1 Released
In-Reply-To:
References:
Message-ID:

Hi,

For some reason the link points to 24.2 and cannot be found. Please find
the correct link here:
http://pandas.pydata.org/pandas-docs/version/0.24.1/whatsnew/v0.24.1.html

Thanks for this release and all the good work!

Julien

--
Sent from a mobile device, please excuse the brevity.

Julien Marrec, EBCP, BPI
MFBA Owner
Direct: +33 6 95 14 42 13
Website: www.effibem.com

On Feb 4, 2019, at 04:10, Tom Augspurger wrote:
> This is a minor bug-fix release in the 0.24.x series and includes some
> regression fixes and bug fixes. We recommend that all users upgrade to
> this version.
>
> See the full whatsnew for a list of all the changes.
>
> The release can be installed with conda from the defaults and
> conda-forge channels:
>
> conda install pandas
>
> Or via PyPI:
>
> python -m pip install --upgrade pandas

From tom.augspurger88 at gmail.com  Thu Feb  7 15:07:54 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Thu, 7 Feb 2019 14:07:54 -0600
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To: <1549569777.8043.3.camel@pietrobattiston.it>
References: <1549569777.8043.3.camel@pietrobattiston.it>
Message-ID:

Third-party accessors work just fine today:
http://pandas-docs.github.io/pandas-docs-travis/development/extending.html#extending-register-accessors

There are likely thorny internal issues where we expect "scalars" but
actually get an array in the nested data case.

On Thu, Feb 7, 2019 at 2:02 PM Pietro Battiston wrote:

> Quick question on one of your points,
>
> On Wed, 16/01/2019 at 11:15 -0600, Tom Augspurger wrote:
> > [...]
> > They've been great so far at addressing some of the shortcomings of
> > NumPy's type system, but I imagine that users will be interested in
> > pushing things even further. For example, users have been asking for
> > proper support for nested data.
>
> How difficult would it be, in your opinion, to allow users to provide
> third-party accessors?
>
> My impression is that "proper support for nested data" is way beyond the
> amount of complexity that we want to introduce in the code base, but that
> with EA we should be very close to allowing them to do it themselves.
>
> (Not commenting on the other points because I don't know enough about
> what was discussed, e.g. on the block manager rewrite)
>
> Pietro

From me at pietrobattiston.it  Thu Feb  7 15:02:57 2019
From: me at pietrobattiston.it (Pietro Battiston)
Date: Thu, 07 Feb 2019 21:02:57 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To:
References:
Message-ID: <1549569777.8043.3.camel@pietrobattiston.it>

Quick question on one of your points,

On Wed, 16/01/2019 at 11:15 -0600, Tom Augspurger wrote:
> [...]
> They've been great so far at addressing some of the shortcomings of
> NumPy's type system, but I imagine that users will be interested in
> pushing things even further.
> For example, users have been asking for proper support for nested data.

How difficult would it be, in your opinion, to allow users to provide
third-party accessors?

My impression is that "proper support for nested data" is way beyond the
amount of complexity that we want to introduce in the code base, but that
with EA we should be very close to allowing them to do it themselves.

(Not commenting on the other points because I don't know enough about what
was discussed, e.g. on the block manager rewrite)

Pietro

From jorisvandenbossche at gmail.com  Thu Feb  7 16:59:24 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Thu, 7 Feb 2019 22:59:24 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To:
References: <1549569777.8043.3.camel@pietrobattiston.it>
Message-ID:

On Thu, 7 Feb 2019 at 21:08, Tom Augspurger <tom.augspurger88 at gmail.com> wrote:

> Third-party accessors work just fine today:
> http://pandas-docs.github.io/pandas-docs-travis/development/extending.html#extending-register-accessors
>
> There are likely thorny internal issues where we expect "scalars" but
> actually get an array in the nested data case.

I indeed think that's the main problem right now if you want to implement
a proper nested (e.g. list/dict) EA, and I think we already had some
issues about that for the dummy JSON test case. Solving that might involve
a mechanism that defers to the ExtensionArray to determine whether an
object is a scalar or an array-like of scalars of a certain length?

From jorisvandenbossche at gmail.com  Fri Feb  8 07:26:40 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Fri, 8 Feb 2019 13:26:40 +0100
Subject: [Pandas-dev] How Far do we take ExtensionArrays?
In-Reply-To:
References:
Message-ID:

Op wo 16 jan.
2019 om 18:16 schreef Tom Augspurger <tom.augspurger88 at gmail.com>:

> This is something I've been mulling over the past few days: how much do
> we want ExtensionArrays to change pandas?
>
> [...]
>
> As another semi-example, users may be interested in storing some or all
> of their data on a GPU in an ExtensionArray or arrays backed by GPU
> memory. I suspect that some things work quite well currently, e.g.
> `Series.sort_values` will dispatch to `ExtensionArray.argsort`, which can
> use a GPU-accelerated sorting algorithm. But other parts of pandas
> (anything in Cython, for example) won't necessarily work. How much are we
> willing to refactor pandas' internals to support something that's going
> to live outside pandas (as a GPU extension array likely would)?

To give a practical example: for a groupby operation, we dispatch to the
ExtensionArray for the factorization step, but the actual computation of
the grouped reductions is still done in Cython. Is that the kind of thing
you were thinking about?

> Finally (and this may be a topic for another day) have people thought
> about how 3rd-party EAs fit in with the potential block manager rewrite?
> IIUC, one of the goals there was a stable C API to the memory inside a
> DataFrame. Does anyone know how that would work with an array that
> doesn't (or can't) implement the buffer protocol?

With the idea of getting rid of blocks and having just 1D arrays, it
certainly fits, I would say (we could extend the current NumPy-backed
PandasArrays that are now only used in `.array`). But if the idea is to
rewrite the block manager in C/Cython, that might be more difficult.
However, if a future version of pandas were backed by Arrow, we wouldn't
necessarily need our own C API, as a reference to the Arrow table / arrays
might be sufficient? Of course that depends on how tightly we want to
depend on Arrow, as that might limit the extensibility with other
backends.

Op ma 4 feb.
2019 om 17:26 schreef Uwe L. Korn:

> Hello Tom,
>
> overall I really like the concept of ExtensionArrays, but for more
> advanced usage I think there is still a lot to do. At the moment, an
> implementer is quite well off when the ExtensionArray can be coerced into
> a NumPy array. Once you have data that is not well represented by a NumPy
> array, you need to develop many more algorithms.

I think that is more or less to be expected. All our internal algorithms
are based on NumPy arrays. I think it would be an interesting idea to see
if we can/want to expose some of our algorithms to external users (e.g.
external ExtensionArray implementors). But even if we do that, it wouldn't
really help in the fletcher case, given the different memory layout.
A shorter-term idea that I would find interesting is to see to what extent
xtensor could be used for the algorithms.

Joris

> For fletcher this has been a major hurdle for me (and is why I'm not
> implementing so much). This might also just be because my backing library
> (Apache Arrow) is still missing a lot of numerical operations. I hope to
> have some time in the next months to work more on this, and then we can
> see how many issues pop up. In the end, though, I would like to avoid
> coercing to NumPy arrays as much as possible, as the conversion of arrays
> with nulls adds some computational overhead.

From jorisvandenbossche at gmail.com  Sat Feb  9 04:32:26 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Sat, 9 Feb 2019 10:32:26 +0100
Subject: [Pandas-dev] Remaining API changes / deprecations to do before 1.0 (wishlist)
In-Reply-To:
References:
Message-ID:

On Thu, 17 Jan 2019 at 16:20, Tom Augspurger <tom.augspurger88 at gmail.com> wrote:

> My main one is deprecation of SparseSeries and SparseDataFrame, since I
> don't want to carry that around for all of 1.x.
> I didn't quite get there for 0.24, but all the really disruptive changes
> are done now (no method in pandas should return a SparseDataFrame
> anymore, aside from DataFrame.to_sparse()).

Yes, the SparseSeries / SparseDataFrame deprecation is also on my list.

Some others:

- I think it would be good to do the planned breaking changes to
  IntervalIndex indexing before 1.0.
- The `inplace` deprecation (GH16529) (or at least decide on it). Although
  a possible deprecation + removal will have a huge impact and need some
  time, so this might rather be something for after 1.0.

Other people's thoughts on the original question?

Joris

From brunox.leitao at gmail.com  Mon Feb 11 14:57:09 2019
From: brunox.leitao at gmail.com (Bruno Xavier)
Date: Mon, 11 Feb 2019 17:57:09 -0200
Subject: [Pandas-dev] DataFrame Object creation
Message-ID:

I'm new to the pandas library and I have a question about DataFrame object
construction.

When building from a list, the shape of the DataFrame is 1 column, with the
row count equal to the number of list elements. By contrast, if we build
it from a dictionary, its shape is 1 row, with the column count equal to
the number of key-value pairs.

My question is why they are different: why not make both follow the same
pattern?

--
Kind regards,
Bruno Xavier Leitão
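
The two construction paths described in the question can be sketched as
follows. This is only an illustration under stated assumptions: the
variable names are made up, and the dict values are wrapped in length-1
lists, because a dict of bare scalars needs an explicit index. The last
line shows one possible way to get the list-like orientation from a dict,
by going through a Series.

```python
import pandas as pd

# A flat list is read as the values of a single column: one row per element.
from_list = pd.DataFrame([1, 2, 3])

# A dict is read as {column label: column values}: one column per key.
# Assumption: each value is a length-1 list, giving exactly one row.
from_dict = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# A Series treats dict keys as row labels, so converting it to a frame
# gives one row per key, matching the orientation of the list case.
from_series = pd.Series({"a": 1, "b": 2, "c": 3}).to_frame()

print(from_list.shape)    # (3, 1)
print(from_dict.shape)    # (1, 3)
print(from_series.shape)  # (3, 1)
```

Seen this way, the difference comes from what each input is taken to
describe: a list carries only values, while a dict carries labels that
pandas interprets as column names.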