From stu.45075 at prakan.ac.th  Tue Oct  5 09:39:48 2021
From: stu.45075 at prakan.ac.th (=?UTF-8?B?MTLguKjguLjguKDguLLguIHguKMg4LmA4LiK4Li04LiU4LiK4Li54LiY4Lij4Lij4Lih?=)
Date: Tue, 5 Oct 2021 20:39:48 +0700
Subject: [Pandas-dev] Pandas package installation error
Message-ID: <CAM449iLSb+B9-SY9xWrWZdvaqddLKNhPr04EDi4ijrfxchms+g@mail.gmail.com>

Hi Python dev,

I can't install the pandas package (see the attached screenshot). Could you
suggest how to install it?

Thank you,
Get
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/pandas-dev/attachments/20211005/305c6ccd/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pandas error.png
Type: image/png
Size: 101615 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/pandas-dev/attachments/20211005/305c6ccd/attachment-0001.png>

From whdgns4195 at gmail.com  Mon Oct 11 02:19:07 2021
From: whdgns4195 at gmail.com (=?UTF-8?B?6rmA7KKF7ZuI?=)
Date: Mon, 11 Oct 2021 15:19:07 +0900
Subject: [Pandas-dev] Allow me to make page(Korean Wikipedia)
Message-ID: <CAA20YancyW6wXFg3+M-qOfhZgDa_o+=N+eoimz0UuZsN1xG5Pw@mail.gmail.com>

Hi, I'm a student who dreams of becoming a programmer in Korea.

I've been making good use of the pandas library.

I found the page "https://en.wikipedia.org/wiki/Pandas_(software)".

But there is no corresponding page on Korean Wikipedia.

So, I would like to create the pandas Wikipedia page myself.

Would you allow me to do that?

Thanks for reading my mail.

Reply please :)

From outlook_F7B1812D1D1BFAC6 at outlook.com  Mon Oct 11 02:45:54 2021
From: outlook_F7B1812D1D1BFAC6 at outlook.com (=?ks_c_5601-1987?B?uc68riCxuA==?=)
Date: Mon, 11 Oct 2021 06:45:54 +0000
Subject: [Pandas-dev] Contribution requests and methods(pandas)
Message-ID: <VI1PR10MB1791CF324505B78E2DA398E3E3B59@VI1PR10MB1791.EURPRD10.PROD.OUTLOOK.COM>



Hello, I am studying at a university in Korea.
I am a student who dreams of becoming a programmer.
I wonder if I can contribute to the project by translating the pandas project documents into Korean.
So, I want to help many Korean students understand Pandas.
And if I can contribute in this way, I'd appreciate it if you could let me know which documents would be helpful to translate.

Please reply to this mail.

Thank you for reading this long post amidst your busy schedule.

From jorisvandenbossche at gmail.com  Mon Oct 11 17:06:18 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 11 Oct 2021 23:06:18 +0200
Subject: [Pandas-dev] Proposal for consistent,
 clear copy/view semantics in pandas with Copy-on-Write
In-Reply-To: <CAKf8g9R3XNmkwLiYsCW57_kfV_T++=pDWxVAgubN8eMxSVG_dg@mail.gmail.com>
References: <CALQtMBYqKg+HHb5pJZmYG6hbUg9M5f0b1ycPWYda_EvsWyHe=A@mail.gmail.com>
 <CALQtMBbY7t7F3eddhT5DN4UhNdr7GVzyKTWr4FxcdFJPiNw2PA@mail.gmail.com>
 <CAKf8g9RjaC8b2XHDELHa452pQ+JUVNg4aDDXTKPOJANBTWQZjQ@mail.gmail.com>
 <CAEQ_TvcZ55BXfh9J--6Su_iLrdy2BL=juzrkFZGhvf1tQCcrbQ@mail.gmail.com>
 <CALQtMBY=yCMvdUNE4yi_3Z1FG0vrzZT2CxSQYYN-9B+dHy2ruQ@mail.gmail.com>
 <CAEk5N5uY0i+9vyGST62tpXZOmr5nJ0_UtaFZ4Ey0mpxQmq2zEA@mail.gmail.com>
 <CAEOrW4-=p6aB-A3oH4RFg=cd8xT5LqhnDxW5ENr9dCvjqwBqOA@mail.gmail.com>
 <CALQtMBYCf+mtvdq-SnMct2Yzc-pYhRHMct6JoS+Y81Aeggtamw@mail.gmail.com>
 <CAKf8g9TJNdHHihSKP8CzwuXrsONPBEkjybOzdFNAsmFBZiu_WA@mail.gmail.com>
 <CALQtMBat5CekxE9CHgqs2U_EKik2FPY=E+H9mbEi9oEqhqNn=g@mail.gmail.com>
 <CAEOrW49RzmrRN7mdxC7prBnUiC33LXQxK-_mj278kFK4YayhUw@mail.gmail.com>
 <CALQtMBYe2B4mFoYBV7arsqqMfy4H01FjYArZZKN5_TPpMTQyvg@mail.gmail.com>
 <CAKf8g9R3XNmkwLiYsCW57_kfV_T++=pDWxVAgubN8eMxSVG_dg@mail.gmail.com>
Message-ID: <CALQtMBbQ6M9+OQSmQ+KiwwbGZquw9u0jGSrC9YoxJfkCchbC1Q@mail.gmail.com>

(trying to revive this discussion)

Some assorted comments on the last emails in this thread / comments on the
google doc (and I will follow-up with a separate email about the
single-Series-from-DataFrame-as-view issue).

- A small note about "users' expectations": I am not going to say this is easy
(on the contrary, this is one of the hardest parts of being a library author,
IMO), but we are creating tools to be used by users. So while designing
those tools, I think it is essential to think about how users will
use your library / how they think something works / what they need / what
they find intuitive / etc (thus, related to their expectations).
And because this is a hard problem (and subjective), it would be good to
get some more feedback from others on the proposed semantics from the usage
point of view. I think the current proposal will be simpler to grasp and
reason about, especially for new users, but I certainly don't hold the truth
on this aspect (and there are different options that are all simpler than the
current situation).

- On the google doc, Adrin made an interesting comment, quoting a part of
that:

> I understand a slice and a mask are fundamentally different, but I don't
> think from the perspective of a user they're different. The user is
> selecting a subset of the original data.
> ...
> Reading through this document I understand why users (and I occasionally)
> would get the pandas warnings telling us we're modifying something which is
> not the original object, but it always puzzled me since I didn't expect a
> slice or a mask to create a copy.
>

This is an interesting point, and I think one of the crucial aspects that
the proposal tries to address.

In short: while using a slice or a mask are both ways to select a subset
of your original data, when it comes to copy/view semantics they *are*
fundamentally different for numpy arrays (a slice gives a view, a mask
gives a copy). Currently, those numpy rules "leak" through to pandas,
although not in exactly the same way and not fully consistently. So we expect
a pandas user to know those numpy concepts (views / fancy indexing), and to
know how the pandas rules differ. If we want pandas users not to have
to know this, I think the most sensible option is to make them both behave
as a copy (which is what the copy-on-write proposal does).
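
To make the numpy rule mentioned above concrete, here is a minimal sketch
(the array values and variable names are only illustrative, not taken from
the proposal): a basic slice shares memory with the original array, while a
boolean mask returns an independent copy.

```python
import numpy as np

arr = np.arange(6)

# Basic slicing returns a view: it shares memory with `arr`.
sliced = arr[1:4]
# Boolean-mask (fancy) indexing returns a copy with its own memory.
masked = arr[arr > 2]

slice_is_view = np.shares_memory(arr, sliced)
mask_is_view = np.shares_memory(arr, masked)

# Writing through the view changes the original array ...
sliced[0] = 100
# ... while writing to the masked copy leaves the original untouched.
masked[0] = -1

print(slice_is_view, mask_is_view)
print(arr)
```

The point is only that the two selection styles already differ at the numpy
level, which is the knowledge a pandas user is currently expected to have.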

I added a new section about this (relation with numpy views and
differences) in the google doc:
https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit#heading=h.yud4azltfua5

On Thu, 12 Aug 2021 at 01:45, Brock Mendel <jbrockmendel at gmail.com> wrote:

>
> 2) I find the case for CoW more compelling for the chained methods usage
> `frame.rename(...).reset_index(...).set_index(...)`.  If we had a viable
> way to implement CoW for these independently of the indexing, that would be
> a slam dunk.  Alternatively, we could get a lot of the benefits from a
> `copy` keyword in the pertinent methods (explicit, better than implicit).
>

Based on my intuition from implementing the POC, I don't think it would be
feasible to have both CoW in some cases, and normal views (eg when
selecting columns from a DataFrame) in other cases (but you are certainly
welcome to experiment with it as well).

Personally I think adding keywords alone would not be a
sufficient/satisfying solution, as I would like those methods not to
copy by default, while keeping the behaviour of returning a new object
(that doesn't modify the parent one if mutated).

In addition, there are also methods that do indexing-like operations
(reindex on columns, filter), and I think it would be surprising if those
behaved differently from the indexing operations (getitem).

On Thu, 12 Aug 2021 at 01:45, Brock Mendel <jbrockmendel at gmail.com> wrote:

> A couple of thoughts from the discussion on today's call:
>
> 1) A lot of the discussion about the indexing behavior revolved around
> "users expect X".  I fundamentally do *not* want to be in the business of
> speculating about this.
>
> 2) I find the case for CoW more compelling for the chained methods usage
> `frame.rename(...).reset_index(...).set_index(...)`.  If we had a viable
> way to implement CoW for these independently of the indexing, that would be
> a slam dunk.  Alternatively, we could get a lot of the benefits from a
> `copy` keyword in the pertinent methods (explicit, better than implicit).
>
>

From jorisvandenbossche at gmail.com  Mon Oct 11 17:22:34 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 11 Oct 2021 23:22:34 +0200
Subject: [Pandas-dev] Proposal for consistent,
 clear copy/view semantics in pandas with Copy-on-Write
In-Reply-To: <CALQtMBbQ6M9+OQSmQ+KiwwbGZquw9u0jGSrC9YoxJfkCchbC1Q@mail.gmail.com>
References: <CALQtMBYqKg+HHb5pJZmYG6hbUg9M5f0b1ycPWYda_EvsWyHe=A@mail.gmail.com>
 <CALQtMBbY7t7F3eddhT5DN4UhNdr7GVzyKTWr4FxcdFJPiNw2PA@mail.gmail.com>
 <CAKf8g9RjaC8b2XHDELHa452pQ+JUVNg4aDDXTKPOJANBTWQZjQ@mail.gmail.com>
 <CAEQ_TvcZ55BXfh9J--6Su_iLrdy2BL=juzrkFZGhvf1tQCcrbQ@mail.gmail.com>
 <CALQtMBY=yCMvdUNE4yi_3Z1FG0vrzZT2CxSQYYN-9B+dHy2ruQ@mail.gmail.com>
 <CAEk5N5uY0i+9vyGST62tpXZOmr5nJ0_UtaFZ4Ey0mpxQmq2zEA@mail.gmail.com>
 <CAEOrW4-=p6aB-A3oH4RFg=cd8xT5LqhnDxW5ENr9dCvjqwBqOA@mail.gmail.com>
 <CALQtMBYCf+mtvdq-SnMct2Yzc-pYhRHMct6JoS+Y81Aeggtamw@mail.gmail.com>
 <CAKf8g9TJNdHHihSKP8CzwuXrsONPBEkjybOzdFNAsmFBZiu_WA@mail.gmail.com>
 <CALQtMBat5CekxE9CHgqs2U_EKik2FPY=E+H9mbEi9oEqhqNn=g@mail.gmail.com>
 <CAEOrW49RzmrRN7mdxC7prBnUiC33LXQxK-_mj278kFK4YayhUw@mail.gmail.com>
 <CALQtMBYe2B4mFoYBV7arsqqMfy4H01FjYArZZKN5_TPpMTQyvg@mail.gmail.com>
 <CAKf8g9R3XNmkwLiYsCW57_kfV_T++=pDWxVAgubN8eMxSVG_dg@mail.gmail.com>
 <CALQtMBbQ6M9+OQSmQ+KiwwbGZquw9u0jGSrC9YoxJfkCchbC1Q@mail.gmail.com>
Message-ID: <CALQtMBa-zt8ii6Wn6rqE_BMcOw30ymVzgDhJ2AwFP-Jp4gxjuw@mail.gmail.com>

I would like to highlight a comment that Stephan made earlier in this
thread about accessing a DataFrame column as a Series:

> A simpler variant would be to make indexing out a single Series from a
> DataFrame return a view, with everything else doing copy on write. Then
> the existing pattern df.column_one[:] = ... would still work.
>

In the old issue about this, Stephan also mentioned this option (see eg
https://github.com/pandas-dev/pandas/issues/10954#issuecomment-136521398
and https://github.com/pandas-dev/pandas/issues/10954#issuecomment-136816312
).

For me, this is one of the aspects of the proposal I am the least sure
about.
On the one hand, it would certainly help the transition ("df[col][..] = .."
is a case we currently don't warn about and would stop working with a pure
CoW, but would keep working with this modification). It also makes sense in
the idea of seeing a DataFrame as a "dict of Series" objects.
On the other hand, it also adds complication because it inherently adds a
special case to the rules. It might also result in some confusing corner
cases (see eg the example I gave earlier in this thread at
https://mail.python.org/pipermail/pandas-dev/2021-July/001368.html).
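
For readers following along, the pattern in question and a single-step
alternative can be sketched as below. This is only an illustration with
made-up column names. It deliberately does not assert what the chained form
does, since that is exactly what is being debated, and shows only the `.loc`
form, which updates the DataFrame in one step and does not rely on the
intermediate Series being a view.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Chained form under discussion (left commented out because whether it
# updates `df` depends on the copy/view rules in force):
# df["b"][df["a"] > 1] = 0

# Single-step form: one setitem call on `df` itself, no intermediate object.
df.loc[df["a"] > 1, "b"] = 0

print(df)
```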

What are people's thoughts on this aspect?

This would also complicate the implementation, but I now think it might be
possible to do this, if we preferred this behaviour (eg by turning a
SingleBlockManager into a wrapper around the parent DataFrame BlockManager,
so it's actually referencing directly the original DataFrame's data instead
of an independent array).

From jbrockmendel at gmail.com  Mon Oct 11 19:02:55 2021
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Mon, 11 Oct 2021 16:02:55 -0700
Subject: [Pandas-dev] Import time/size optimization - how much do people
 care?
Message-ID: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>

I've spent some time looking at our import time and the memory footprint at
import and I _think_ we can cut another 20-30% by e.g. lazifying imports.
The last 5-10% of that is pretty hairy though.

My question for the community is: is this worth optimizing?  Is there
anyone (dask maybe?) for whom import time and memory footprint is a pain
point?

From mrocklin at gmail.com  Tue Oct 12 09:41:33 2021
From: mrocklin at gmail.com (Matthew Rocklin)
Date: Tue, 12 Oct 2021 08:41:33 -0500
Subject: [Pandas-dev] Import time/size optimization - how much do people
 care?
In-Reply-To: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>
References: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>
Message-ID: <CAJ8oX-HQiEhbAcxF31HYyFoY0qRB6ewA7rzznDK7e5pU+pzt3Q@mail.gmail.com>

From my perspective it's a mild pain point, but not in our top ten today.


From garcia.marc at gmail.com  Tue Oct 12 12:45:53 2021
From: garcia.marc at gmail.com (Marc Garcia)
Date: Tue, 12 Oct 2021 11:45:53 -0500
Subject: [Pandas-dev] Import time/size optimization - how much do people
 care?
In-Reply-To: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>
References: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>
Message-ID: <CAEk5N5uLV7JJ3J5T1zDyQPjxzxOBVFCwNixYtdQQAgg4oF0Zqg@mail.gmail.com>

Hi Brock, thanks for having a look at this.

Just a question: do you have in mind moving imports from the top
of the file into the functions that use them in our code base? Or would it
be more a matter of not loading components of pandas until the user uses them
(components like plotting, timeseries, IO connectors...)? The main
difference is that in the latter case, most Python files would keep the
imports at the top, but we'd avoid loading pandas modules until needed.

Feels like the latter, where it makes sense, could be a nice thing, and not
only for the loading time and the base memory footprint.
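
As a rough illustration of the first option (moving an import into the
function that uses it), here is a minimal sketch. It uses the stdlib module
`colorsys` purely as a stand-in for a heavier dependency, and the function
name is hypothetical.

```python
import sys


def rgb_to_hls_lazy(r, g, b):
    # The import runs on the first call rather than when this file is imported,
    # so code paths that never call this function never pay for it.
    import colorsys

    return colorsys.rgb_to_hls(r, g, b)


# Whether the module was already loaded depends on what else has run.
loaded_before = "colorsys" in sys.modules
result = rgb_to_hls_lazy(1.0, 0.0, 0.0)
# After the first call the module is cached in sys.modules.
loaded_after = "colorsys" in sys.modules

print(loaded_before, loaded_after, result)
```

Repeated calls are cheap because the `import` statement finds the module in
`sys.modules`, but the imports are then scattered through function bodies
rather than sitting at the top of the file.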


From jbrockmendel at gmail.com  Tue Oct 12 13:21:37 2021
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Tue, 12 Oct 2021 10:21:37 -0700
Subject: [Pandas-dev] Import time/size optimization - how much do people
 care?
In-Reply-To: <CAEk5N5uLV7JJ3J5T1zDyQPjxzxOBVFCwNixYtdQQAgg4oF0Zqg@mail.gmail.com>
References: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>
 <CAEk5N5uLV7JJ3J5T1zDyQPjxzxOBVFCwNixYtdQQAgg4oF0Zqg@mail.gmail.com>
Message-ID: <CAKf8g9QDWRm7YywEkoRV1CyAiPXOdtEDuqAEVgYzpb=h9ynrPQ@mail.gmail.com>

> For this do you have in mind moving imports from the top of the file into
> the functions that use them in our code base. Or would it be more not
> loading components of pandas until the user uses them (components like
> plotting, timeseries, IO connectors...)

Some of each.  The main candidates I've looked at recently:

1) make pyarrow import lazy (~15%
https://github.com/pandas-dev/pandas/issues/41432#issuecomment-939083050)
2) make pandas.io.api imports (into pd namespace) lazy (4-5%)
3) avoid @doc/@Appender/@Substitution at runtime (~4-5%, but a PITA; I think
not worth it)
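
One general way to expose names in a package namespace lazily is a
module-level `__getattr__` (PEP 562). The sketch below is not how pandas
does or would implement item 2. The `fakepkg` module, the `_LAZY_ATTRS`
mapping and the use of `json` as the deferred dependency are all
hypothetical stand-ins so that the example runs on its own.

```python
import importlib
import types

# Hypothetical mapping: public name -> (module to import, attribute to fetch).
_LAZY_ATTRS = {"loads": ("json", "loads"), "dumps": ("json", "dumps")}


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    try:
        module_name, attr = _LAZY_ATTRS[name]
    except KeyError:
        raise AttributeError(f"module 'fakepkg' has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), attr)
    # Cache on the module so later lookups bypass __getattr__ entirely.
    setattr(fakepkg, name, value)
    return value


# Stand-in for a package; in a real package, `__getattr__` would simply be
# defined at the top level of `__init__.py`.
fakepkg = types.ModuleType("fakepkg")
fakepkg.__getattr__ = _lazy_getattr

cached_before = "loads" in vars(fakepkg)
parsed = fakepkg.loads("[1, 2, 3]")
cached_after = "loads" in vars(fakepkg)

print(cached_before, parsed, cached_after)
```

With this approach most files keep their imports at the top, and only the
package namespace decides when a submodule actually gets loaded.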


From garcia.marc at gmail.com  Tue Oct 12 13:24:34 2021
From: garcia.marc at gmail.com (Marc Garcia)
Date: Tue, 12 Oct 2021 12:24:34 -0500
Subject: [Pandas-dev] Import time/size optimization - how much do people
 care?
In-Reply-To: <CAKf8g9QDWRm7YywEkoRV1CyAiPXOdtEDuqAEVgYzpb=h9ynrPQ@mail.gmail.com>
References: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>
 <CAEk5N5uLV7JJ3J5T1zDyQPjxzxOBVFCwNixYtdQQAgg4oF0Zqg@mail.gmail.com>
 <CAKf8g9QDWRm7YywEkoRV1CyAiPXOdtEDuqAEVgYzpb=h9ynrPQ@mail.gmail.com>
Message-ID: <CAEk5N5u0eFLybt3hYve=Nr6z2dxGkHq2k0RMi6cpo=NvL72xJA@mail.gmail.com>

+1 on all of them

I don't think 3 should be that complex, but I might be wrong.


From jorisvandenbossche at gmail.com  Tue Oct 12 19:22:00 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 13 Oct 2021 01:22:00 +0200
Subject: [Pandas-dev] Import time/size optimization - how much do people
 care?
In-Reply-To: <CAEk5N5u0eFLybt3hYve=Nr6z2dxGkHq2k0RMi6cpo=NvL72xJA@mail.gmail.com>
References: <CAKf8g9QrFBNK6_xEjPGwc1McGnhO9yc-6n4whHrHs9BX_OZpQw@mail.gmail.com>
 <CAEk5N5uLV7JJ3J5T1zDyQPjxzxOBVFCwNixYtdQQAgg4oF0Zqg@mail.gmail.com>
 <CAKf8g9QDWRm7YywEkoRV1CyAiPXOdtEDuqAEVgYzpb=h9ynrPQ@mail.gmail.com>
 <CAEk5N5u0eFLybt3hYve=Nr6z2dxGkHq2k0RMi6cpo=NvL72xJA@mail.gmail.com>
Message-ID: <CALQtMBaViRdQTduMZztiSsvEd5+LaeZqVqgzhbOFV8qYEkO5aA@mail.gmail.com>

On Tue, 12 Oct 2021 at 19:24, Marc Garcia <garcia.marc at gmail.com> wrote:

> +1 on all them
>
> I don't think 3 should be that complex, I might be wrong.
>
> On Tue, Oct 12, 2021 at 12:21 PM Brock Mendel <jbrockmendel at gmail.com>
> wrote:
>
>> > For this do you have in mind moving imports from the top of the file
>> into the functions that use them in our code base. Or would it be more not
>> loading components of pandas until the user uses them (components like
>> plotting, timeseries, IO connectors...)
>>
>> Some of each.  The main candidates I've looked at recently
>>
>> 1) make pyarrow import lazy (~15%
>> https://github.com/pandas-dev/pandas/issues/41432#issuecomment-939083050)
>>
>
You are linking to an issue that is explicitly about *not* having the
pyarrow import lazy (because we need to register extension types). For the
reasons mentioned in the issue, I would prefer to keep pyarrow as a
non-lazy import.

Joris



From jorisvandenbossche at gmail.com  Tue Oct 12 19:29:53 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 13 Oct 2021 01:29:53 +0200
Subject: [Pandas-dev] October 2021 monthly community meeting (Wednesday
 October 13, UTC 18:00)
Message-ID: <CALQtMBbf3ctg6gTtCMq0kVfpX5AzO-hPMLiW9Sd7B-iXFFgT0A@mail.gmail.com>

Hi all,

A reminder that the next monthly dev call is tomorrow (Wednesday, October
13th) at 18:00 UTC (1 pm Central). Our calendar is at
https://pandas.pydata.org/docs/development/meeting.html#calendar to
check your local time.
All are welcome to attend!

Video Call:
https://us06web.zoom.us/j/84484803210?pwd=TjUxNmcyNHcvcG9SNGJvbE53Y21GZz09
Minutes:
https://docs.google.com/document/u/1/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?ouid=102771015311436394588&usp=docs_home&ths=true

Joris

From simonjayhawkins at gmail.com  Mon Oct 18 07:00:00 2021
From: simonjayhawkins at gmail.com (Simon Hawkins)
Date: Mon, 18 Oct 2021 12:00:00 +0100
Subject: [Pandas-dev] ANN: pandas v1.3.4
Message-ID: <CAFs9qGunw7BLwN-kBHPnbPZ_ZKA+J=icPB_eG0MjE9D3RzjhjA@mail.gmail.com>

Hi all,

I'm pleased to announce the release of pandas v1.3.4.

This is a patch release in the 1.3.x series and includes some regression
fixes and bug fixes. We recommend that all users upgrade to this version.

See the release notes
<https://pandas.pydata.org/pandas-docs/version/1.3/whatsnew/v1.3.4.html> for
a list of all the changes.


The release can be installed from PyPI

    python -m pip install --upgrade pandas==1.3.4

Or from conda-forge

    conda install -c conda-forge pandas==1.3.4


Please report any issues with the release on the pandas issue tracker
<https://github.com/pandas-dev/pandas/issues>.

Thanks to all the contributors who made this release possible.

From darren.frimponglebrun at nomura.com  Mon Oct 18 09:56:36 2021
From: darren.frimponglebrun at nomura.com (darren.frimponglebrun at nomura.com)
Date: Mon, 18 Oct 2021 13:56:36 +0000
Subject: [Pandas-dev] Python 3.10 Wheel for Windows
Message-ID: <1cb0182197fe43c08cea9c5423c48c48@nomura.com>

Hi Development Team

We are looking to deploy Python 3.10 soon. Is there a scheduled date for the availability on PyPI of wheels for win32 and win64?

Thanks,

Darren

Fingal Technology
Nomura


From jorisvandenbossche at gmail.com  Sat Oct 23 15:32:02 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Sat, 23 Oct 2021 21:32:02 +0200
Subject: [Pandas-dev] Python 3.10 Wheel for Windows
In-Reply-To: <1cb0182197fe43c08cea9c5423c48c48@nomura.com>
References: <1cb0182197fe43c08cea9c5423c48c48@nomura.com>
Message-ID: <CALQtMBaOV-5eJuAD6v2gSmq-RZPaV-81u6WNv6+Xw-mQpmSYJw@mail.gmail.com>

Hi Darren,

There is some discussion about this on the following issue:
https://github.com/pandas-dev/pandas/issues/44136
And it seems the work to build those wheels was merged earlier today:
https://github.com/MacPython/pandas-wheels/pull/156

Best,
Joris


From jorisvandenbossche at gmail.com  Sat Oct 23 15:42:41 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Sat, 23 Oct 2021 21:42:41 +0200
Subject: [Pandas-dev] Allow me to make page(Korean Wikipedia)
In-Reply-To: <CAA20YancyW6wXFg3+M-qOfhZgDa_o+=N+eoimz0UuZsN1xG5Pw@mail.gmail.com>
References: <CAA20YancyW6wXFg3+M-qOfhZgDa_o+=N+eoimz0UuZsN1xG5Pw@mail.gmail.com>
Message-ID: <CALQtMBZWs8yiF3k+cAJ9tD3YCEp73-SY2-gQppo02OKTDo-=yA@mail.gmail.com>

I don't think you need permission from us, so feel free to do so! (and
thanks for contributing to wikipedia / pandas!)

Best,
Joris

On Mon, 11 Oct 2021 at 14:19, 김종훈 <whdgns4195 at gmail.com> wrote:

> Hi, I'm a student who dreams of becoming a programmer in Korea.
>
> I'm using the Pandas library well.
>
> I found the page "https://en.wikipedia.org/wiki/Pandas_(software)".
>
> But there is no page in Korean Wikipedia.
>
> So, I hope to make Pandas Wikipedia page for myself.
>
> Would you allow me to do that?
>
> Thanks for reading my mail.
>
> Reply please :)

From aryanagarwal6222 at gmail.com  Sun Oct 24 02:21:46 2021
From: aryanagarwal6222 at gmail.com (Aryan Agarwal)
Date: Sun, 24 Oct 2021 11:51:46 +0530
Subject: [Pandas-dev] (no subject)
Message-ID: <CAGWa3=AMh8Xm1GLECGR9UmUYoJBGwvX3iYrKXiNxztALxsgV7A@mail.gmail.com>

I am not able to install the pandas package in PyCharm. The installation
fails while building wheels.

From jbrockmendel at gmail.com  Wed Oct 27 00:38:10 2021
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Tue, 26 Oct 2021 21:38:10 -0700
Subject: [Pandas-dev] API: Make silent casting behavior consistent by
 deprecating silent _object_-dtype casting
Message-ID: <CAKf8g9RN6ZOWpFnSgg4vLP0FBy-P-SRGtmAsScepS3pPkMx6_Q@mail.gmail.com>

TLDR
----
We have inconsistent silent-casting vs raising logic for numpy vs EA dtypes
(and inconsistencies within EA dtypes).  By deprecating silently casting to
*object* dtype, we can *mostly* make the behaviors match.


Background
----------
A number of Series/DataFrame methods will silently cast when dealing with
mismatched values.  With a numpy dtype, each of the following silently
casts to float64 (except where noted):

    ser = pd.Series([1, 2, 3], dtype="i8")

    ser.shift(1, fill_value=1.5)
    ser.mask([True, False, False], 1.5)
    ser.where([False, True, True], 1.5)
    ser.replace(1, 1.5)
    ser[0] = 1.5
    ser.fillna(1.5)  # <- this one doesn't cast as it is a no-op

If we were to pass "foo" or a pd.Period, these would coerce to object
instead of float.

By contrast, similar mixed-type operations with an ExtensionDtype Series
_mostly_ raise:

    ser2 = pd.Series(pd.period_range("2016-01-01", periods=3, freq="D"))

    ser2.shift(1, fill_value=1.5)         # <- ValueError
    ser2.mask([True, False, False], 1.5)  # <- ValueError
    ser2.where([False, True, True], 1.5)  # <- ValueError
    ser2.fillna(1.5)                      # <- TypeError
    ser2.replace(ser2[0], 1.5)            # <- coerces to object
    ser2[0] = 1.5                         # <- coerces to object

    ser3 = pd.Series([pd.NA, 2, 3], dtype="Int64")

    ser3.shift(1, fill_value=1.5)         # <- TypeError
    ser3.mask([True, False, False], 1.5)  # <- TypeError
    ser3.where([False, True, True], 1.5)  # <- TypeError
    ser3.fillna(1.5)                      # <- TypeError
    ser3.replace(ser3[0], 1.5)            # <- TypeError
    ser3[0] = 1.5                         # <- TypeError

timedelta64, datetime64, and datetime64tz mostly behave like the numpy
dtypes, with a few exceptions:

    - shift raises on mismatch
    - fillna raises on mismatch for timedelta64, casts for the others

Categorical mostly behaves like other ExtensionDtypes, except for replace,
which has special logic.

Goals
-----
- Have matching behavior across dtypes.
- Share code.

Options
-------
1) Change EA (and dt64/td64) behavior to match non-EA behavior
2) Change non-EA behavior to match EA behavior (or stricter; xref
https://github.com/pandas-dev/pandas/issues/39584)
3) Deprecate (and eventually raise on) silent casting to _object_ dtype,
allowing silent casting otherwise.


Here I am advocating for option 3).  The advantages as I see them:

A) For numpy dtypes, we retain the most useful cases (int->float)
B) Deprecates cases most likely to be unintentional (e.g. typo "2016-01-01"
-> "2p16-01-01" causing a datetime64 Series to silently cast)
C) For td64/dt64/dt64tz/period, the *only* silent casting is to object, so
this completely gets rid of special-casing among that code
D) For IntegerArray, FloatingArray, and IntervalArray, leaves open the
option of allowing e.g. Integer->Floating casting (xref
https://github.com/pandas-dev/pandas/issues/25288#issuecomment-941762174)
E) Does not preclude later deciding on the stricter options in 2)
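
To make option 3) concrete, here is a toy model in plain Python. It is not
pandas code: dtypes are represented by plain strings, and `required_dtype`
and `cast_under_option_3` are hypothetical names invented for this sketch.
It models the end state in which a cast to object raises; during the
deprecation period the same cases would warn instead.

```python
import datetime as dt


def required_dtype(current, value):
    """Toy model: dtype name needed to hold `value` in a `current` column."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if current == "int64":
        if isinstance(value, int) and is_number:
            return "int64"
        return "float64" if is_number else "object"
    if current == "float64":
        return "float64" if is_number else "object"
    if current == "datetime64":
        if isinstance(value, dt.datetime):
            return "datetime64"
        if isinstance(value, str):
            try:
                # A string that parses as a date fits without casting.
                dt.datetime.fromisoformat(value)
                return "datetime64"
            except ValueError:
                return "object"
        return "object"
    raise ValueError(f"dtype not covered by this sketch: {current!r}")


def cast_under_option_3(current, value):
    """Allow silent casting unless the result would be object dtype."""
    target = required_dtype(current, value)
    if target == "object":
        raise TypeError(f"{value!r} would cast {current} to object")
    return target


print(cast_under_option_3("int64", 1.5))
print(cast_under_option_3("datetime64", "2016-01-01"))
```

In this model `cast_under_option_3("datetime64", "2p16-01-01")` raises
TypeError, which is the typo case from B), while the int->float case from A)
still succeeds.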

From tiger1472999 at naver.com  Fri Oct 29 11:59:01 2021
From: tiger1472999 at naver.com (구민석)
Date: Sat, 30 Oct 2021 00:59:01 +0900
Subject: [Pandas-dev] (Help)Contribution requests and methods
Message-ID: <bb90493b08f4a76ea9eaf82a9adb9c8@cweb011.nm.nfra.io>

Hello, I am studying at a university in Korea.
I am a student who dreams of becoming a programmer.
I would like to know whether I can contribute to the project by translating the pandas documentation into Korean.
I want to help many Korean students understand pandas.
If I can contribute in this way, I'd appreciate it if you could let me know which documents would be most helpful to translate.

Please reply to this mail.

Thank you for reading this long post amidst your busy schedule.

From tiger147299 at gmail.com  Sun Oct 31 10:01:48 2021
From: tiger147299 at gmail.com (구민석)
Date: Sun, 31 Oct 2021 23:01:48 +0900
Subject: [Pandas-dev] (Help)Contribution requests and methods
Message-ID: <CAO4oRiNefEozfc4WNJyjLk4Wng6PvQfziksHQpW8pb0q4r7gQw@mail.gmail.com>

Hello, I am studying at a university in Korea.

I am a student who dreams of becoming a programmer.

I'm using the Pandas library well.

I wonder if I can contribute to the project by translating the pandas
project documents into Korean.

So, I want to help many Korean students understand Pandas.

And if I can contribute in this way, I'd appreciate it if you could let me
know which documents would be helpful to translate.


Please reply to this mail.


Thank you for reading this long post amidst your busy schedule.

From shishaozhong at gmail.com  Sun Oct 31 12:03:43 2021
From: shishaozhong at gmail.com (Shaozhong SHI)
Date: Sun, 31 Oct 2021 16:03:43 +0000
Subject: [Pandas-dev] How to apply a self defined function in Pandas
Message-ID: <CA+i5Jwa597U8rst0AzMyksqHYUPO98KHXyB2emnGb7W466cvAQ@mail.gmail.com>

I defined a function and applied it to a column in pandas, but it does not
return the correct values.

I am trying to check each URL in a column of URLs to see whether it can
be connected to:

def connect(url):
    try:
        urllib.request.urlopen(url)
        return True
    except:
        return False

df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1)

It ran without any error, but did not return any True values.

I could not find anything wrong with it.

Can anyone try it and find out why?


Regards,

David
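
A possible explanation, offered as a sketch rather than something tested
against the original data: inside the lambda, `connect(df['URL'])` passes
the whole column to `urlopen` on every row instead of that row's URL, and
the bare `except` hides the resulting exception, so every row comes back
False. The version below takes a single URL string and narrows the
exception handling; the `timeout` parameter and the sample URLs are
assumptions added for illustration.

```python
import urllib.request


def connect(url, timeout=5):
    """Return True if `url` can be opened, False otherwise."""
    try:
        # Close the response as soon as the connection has succeeded.
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers
        # strings that urllib does not recognise as URLs at all.
        return False


# Illustrative inputs only: a data: URL opens without network access,
# and a string with no scheme is rejected by urllib.
urls = ["data:text/plain,hello", "not-a-url"]
results = [connect(u) for u in urls]
print(results)
```

With a DataFrame, the same function can be applied one value at a time
with `df['URL'].apply(connect)`, ideally assigned to a new column (for
example `df['reachable']`, a hypothetical name) so that the URLs themselves
are not overwritten.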