From stu.45075 at prakan.ac.th Tue Oct 5 09:39:48 2021
From: stu.45075 at prakan.ac.th (=?UTF-8?B?MTLguKjguLjguKDguLLguIHguKMg4LmA4LiK4Li04LiU4LiK4Li54LiY4Lij4Lij4Lih?=)
Date: Tue, 5 Oct 2021 20:39:48 +0700
Subject: [Pandas-dev] Pandas package installation error
Message-ID:

hi Python dev,

I can't install pandas package, see attached snap shot file, could you suggest how to install it.

thank you,
Get

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pandas error.png
Type: image/png
Size: 101615 bytes
Desc: not available
URL:

From whdgns4195 at gmail.com Mon Oct 11 02:19:07 2021
From: whdgns4195 at gmail.com (=?UTF-8?B?6rmA7KKF7ZuI?=)
Date: Mon, 11 Oct 2021 15:19:07 +0900
Subject: [Pandas-dev] Allow me to make page(Korean Wikipedia)
Message-ID:

Hi, I'm a student who dreams of becoming a programmer in Korea.

I'm using the Pandas library well.

I found the page "https://en.wikipedia.org/wiki/Pandas_(software)".

But there is no page in Korean Wikipedia.

So, I hope to make Pandas Wikipedia page for myself.

Would you allow me to do that?

Thanks for reading my mail.

Reply please :)

From outlook_F7B1812D1D1BFAC6 at outlook.com Mon Oct 11 02:45:54 2021
From: outlook_F7B1812D1D1BFAC6 at outlook.com (=?ks_c_5601-1987?B?uc68riCxuA==?=)
Date: Mon, 11 Oct 2021 06:45:54 +0000
Subject: [Pandas-dev] Contribution requests and methods(pandas)
Message-ID:

Hello, I am studying at a university in Korea. I am a student who dreams of becoming a programmer.

I wonder if I can contribute to the project by translating the pandas project documents into Korean. So, I want to help many Korean students understand Pandas.

And if I can contribute in this way, I'd appreciate it if you could let me know which documents would be helpful to translate.
Please reply to this mail.

Thank you for reading this long post amidst your busy schedule.

From jorisvandenbossche at gmail.com Mon Oct 11 17:06:18 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 11 Oct 2021 23:06:18 +0200
Subject: [Pandas-dev] Proposal for consistent, clear copy/view semantics in pandas with Copy-on-Write
In-Reply-To:
References:
Message-ID:

(trying to revive this discussion)

Some assorted comments on the last emails in this thread / comments on the google doc (and I will follow-up with a separate email about the single-Series-from-DataFrame-as-view issue).

- A small note about "users' expectations": I am not going to say this is easy (in contrast, this is one of the hardest parts of being a library author, IMO), but we are creating tools to be used by users. So while designing those tools, I think it is essential to think about how users will use your library / how they think something works / what they need / what they find intuitive / etc (thus, related to their expectations). And because this is a hard problem (and subjective), it would be good to get some more feedback from others on the proposed semantics from the usage point of view. I think the current proposal will be simpler to grasp and reason about, especially for new users, but I certainly don't hold the truth on this aspect (and there are different options that are all simpler than the current situation).

- On the google doc, Adrin made an interesting comment, quoting a part of that:

> I understand a slice and a mask are fundamentally different, but I don't
> think from the perspective of a user they're different. The user is
> selecting a subset of the original data.
> ...
> Reading through this document I understand why users (and I occasionally)
> would get the pandas warnings telling us we're modifying something which is
> not the original object, but it always puzzled me since I didn't expect a
> slice or a mask to create a copy.
>

This is an interesting point, and I think one of the crucial aspects that the proposal tries to address. In short: while using a slice or a mask are both methods to select a subset of your original data, when it comes to copy/view semantics they *are* fundamentally different for numpy arrays (a slice gives a view, a mask gives a copy). Currently, those numpy rules "leak" through to pandas, although not in exactly the same way nor fully consistently. So we expect a pandas user to know those numpy concepts (views / fancy indexing), and know the differences in rules with pandas. If we want pandas users not to have to know this, I think the most sensible option is to make them both behave as a copy (which is what the copy-on-write proposal does).

I added a new section about this (relation with numpy views and differences) in the google doc: https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit#heading=h.yud4azltfua5

On Thu, 12 Aug 2021 at 01:45, Brock Mendel wrote:

>
> 2) I find the case for CoW more compelling for the chained methods usage
> `frame.rename(...).reset_index(...).set_index(...)`. If we had a viable
> way to implement CoW for these independently of the indexing, that would be
> a slam dunk. Alternatively, we could get a lot of the benefits from a
> `copy` keyword in the pertinent methods (explicit, better than implicit).
>

Based on my intuition from implementing the POC, I don't think it would be feasible to have both CoW in some cases, and normal views (eg when selecting columns from a DataFrame) in other cases (but you are certainly welcome to experiment with it as well).
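A small numpy sketch of the slice-versus-mask distinction discussed in this message (a slice gives a view, a mask gives a copy). It is only an illustration of the numpy rule, not of pandas behaviour, and the variable names are made up for the example:

```python
import numpy as np

arr = np.arange(5)          # array([0, 1, 2, 3, 4])

sliced = arr[1:4]           # basic slicing: a view on arr's memory
masked = arr[arr > 1]       # boolean mask (fancy indexing): an independent copy

sliced[0] = 100             # writes through to arr
masked[0] = -1              # only changes the copy

print(arr)                              # [  0 100   2   3   4]
print(np.shares_memory(arr, sliced))    # True
print(np.shares_memory(arr, masked))    # False
```

Under the copy-on-write proposal as described in this thread, a pandas user would not need to know this difference, since both kinds of selection would behave as a copy.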
Personally I think adding keywords alone would not be a sufficient/satisfying solution, as I would like to see those methods not copy by default, while keeping the behaviour of returning a new object (that doesn't modify the parent one if mutated). In addition, there are also methods that do indexing-like operations (reindex on columns, filter), and I think it would be surprising if those behaved differently from the indexing operations (getitem).

On Thu, 12 Aug 2021 at 01:45, Brock Mendel wrote:

> A couple of thoughts from the discussion on today's call:
>
> 1) A lot of the discussion about the indexing behavior revolved around
> "users expect X". I fundamentally do *not* want to be in the business of
> speculating about this.
>
> 2) I find the case for CoW more compelling for the chained methods usage
> `frame.rename(...).reset_index(...).set_index(...)`. If we had a viable
> way to implement CoW for these independently of the indexing, that would be
> a slam dunk. Alternatively, we could get a lot of the benefits from a
> `copy` keyword in the pertinent methods (explicit, better than implicit).
>
>

From jorisvandenbossche at gmail.com Mon Oct 11 17:22:34 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 11 Oct 2021 23:22:34 +0200
Subject: [Pandas-dev] Proposal for consistent, clear copy/view semantics in pandas with Copy-on-Write
In-Reply-To:
References:
Message-ID:

I would like to highlight a comment that Stephan made earlier in this thread about accessing a DataFrame column as a Series:

> A simpler variant would be to make indexing out a single Series from a
> DataFrame return a view, with everything else doing copy on write. Then
> the existing pattern df.column_one[:] = ... would still work.
> In the old issue about this, Stephan also mentioned this option (see eg https://github.com/pandas-dev/pandas/issues/10954#issuecomment-136521398 and https://github.com/pandas-dev/pandas/issues/10954#issuecomment-136816312 ). For me, this is one of the main aspects of the proposal I am the least sure about. On the one hand, it would certainly help the transition ("df[col][..] = .." is a case we currently don't warn about and would stop working with a pure CoW, but would keep working with this modification). It also makes sense in the idea of seeing a DataFrame as a "dict of Series" objects. On the other hand, it also adds complication because it inherently adds a special case to the rules. It might also result in some confusing corner cases (see eg the example I gave earlier in this thread at https://mail.python.org/pipermail/pandas-dev/2021-July/001368.html). What are people's thoughts on this aspect? This would also complicate the implementation, but I now think it might be possible to do this, if we preferred this behaviour (eg by turning a SingleBlockManager into a wrapper around the parent DataFrame BlockManager, so it's actually referencing directly the original DataFrame's data instead of an independent array). -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrockmendel at gmail.com Mon Oct 11 19:02:55 2021 From: jbrockmendel at gmail.com (Brock Mendel) Date: Mon, 11 Oct 2021 16:02:55 -0700 Subject: [Pandas-dev] Import time/size optimization - how much do people care? Message-ID: I've spent some time looking at our import time and the memory footprint at import and I _think_ we can cut another 20-30% by e.g. lazifying imports. The last 5-10% of that is pretty hairy though. My question for the community is: is this worth optimizing? Is there anyone (dask maybe?) for whom import time and memory footprint is a pain point? -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From mrocklin at gmail.com Tue Oct 12 09:41:33 2021
From: mrocklin at gmail.com (Matthew Rocklin)
Date: Tue, 12 Oct 2021 08:41:33 -0500
Subject: [Pandas-dev] Import time/size optimization - how much do people care?
In-Reply-To:
References:
Message-ID:

From my perspective it's a mild pain point, but not in our top ten today.

On Mon, Oct 11, 2021 at 6:03 PM Brock Mendel wrote:

> I've spent some time looking at our import time and the memory footprint
> at import and I _think_ we can cut another 20-30% by e.g. lazifying
> imports. The last 5-10% of that is pretty hairy though.
>
> My question for the community is: is this worth optimizing? Is there
> anyone (dask maybe?) for whom import time and memory footprint is a pain
> point?
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>

From garcia.marc at gmail.com Tue Oct 12 12:45:53 2021
From: garcia.marc at gmail.com (Marc Garcia)
Date: Tue, 12 Oct 2021 11:45:53 -0500
Subject: [Pandas-dev] Import time/size optimization - how much do people care?
In-Reply-To:
References:
Message-ID:

Hi Brock, thanks for having a look at this.

Just a question. For this do you have in mind moving imports from the top of the file into the functions that use them in our code base. Or would it be more not loading components of pandas until the user uses them (components like plotting, timeseries, IO connectors...). The main difference being that in the latter case, most Python files would keep the imports at the top, but we'd avoid loading pandas modules until needed.

Feels like the latter, where it makes sense, could be a nice thing not only for the loading time and the base memory footprint.
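The two approaches contrasted above can be sketched roughly as follows. This is only an illustration, not how pandas is or would be implemented: `toy_pkg`, `_LAZY_SUBMODULES`, `_lazy_getattr`, and `dumps_lazily` are hypothetical names, and the standard-library `json` module stands in for a heavy optional component. The second approach relies on module-level `__getattr__` (PEP 562, Python 3.7+).

```python
import importlib
import types


# Approach 1: move the import from the top of the file into the function
# that needs it, so the cost is paid on first call rather than at import.
def dumps_lazily(obj):
    import json  # deferred import; cached in sys.modules after the first call
    return json.dumps(obj)


# Approach 2: keep imports at the top of each file, but have the package
# namespace load a component only when the user first touches it.
_LAZY_SUBMODULES = {"json": "json"}  # attribute name -> module to import

toy_pkg = types.ModuleType("toy_pkg")  # hypothetical stand-in for a package


def _lazy_getattr(name):
    # Called only when normal attribute lookup on toy_pkg fails (PEP 562).
    if name in _LAZY_SUBMODULES:
        module = importlib.import_module(_LAZY_SUBMODULES[name])
        setattr(toy_pkg, name, module)  # cache so later lookups are direct
        return module
    raise AttributeError(f"module 'toy_pkg' has no attribute {name!r}")


toy_pkg.__getattr__ = _lazy_getattr

loaded_before = "json" in vars(toy_pkg)   # nothing in the namespace yet
text = toy_pkg.json.dumps({"a": 1})       # first access triggers the import
loaded_after = "json" in vars(toy_pkg)    # now cached on the module
```

In the first style each call site changes; in the second, individual files keep their top-level imports and only the package namespace decides when a component is loaded.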
On Mon, Oct 11, 2021 at 6:03 PM Brock Mendel wrote: > I've spent some time looking at our import time and the memory footprint > at import and I _think_ we can cut another 20-30% by e.g. lazifying > imports. The last 5-10% of that is pretty hairy though. > > My question for the community is: is this worth optimizing? Is there > anyone (dask maybe?) for whom import time and memory footprint is a pain > point? > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrockmendel at gmail.com Tue Oct 12 13:21:37 2021 From: jbrockmendel at gmail.com (Brock Mendel) Date: Tue, 12 Oct 2021 10:21:37 -0700 Subject: [Pandas-dev] Import time/size optimization - how much do people care? In-Reply-To: References: Message-ID: > For this do you have in mind moving imports from the top of the file into the functions that use them in our code base. Or would it be more not loading components of pandas until the user uses them (components like plotting, timeseries, IO connectors...) Some of each. The main candidates I've looked at recently 1) make pyarrow import lazy (~15% https://github.com/pandas-dev/pandas/issues/41432#issuecomment-939083050) 2) make pandas.io.api imports (into pd namespace) lazy (4-5%) 3) avoid @doc/@Appender/@Substitution at runtime (~4-5% but a PITA i think not worth it) On Tue, Oct 12, 2021 at 9:46 AM Marc Garcia wrote: > Hi Brock, thanks for having a look at this. > > Just a question. For this do you have in mind moving imports from the top > of the file into the functions that use them in our code base. Or would it > be more not loading components of pandas until the user uses them > (components like plotting, timeseries, IO connectors...). 
The main > difference being that in the latter case, most Python files would keep the > imports at the top, but we'd avoid loading pandas modules until needed. > > Feels like the latter, where it makes sense, could be a nice thing not > only for the loading time and the base memory footprint. > > On Mon, Oct 11, 2021 at 6:03 PM Brock Mendel > wrote: > >> I've spent some time looking at our import time and the memory footprint >> at import and I _think_ we can cut another 20-30% by e.g. lazifying >> imports. The last 5-10% of that is pretty hairy though. >> >> My question for the community is: is this worth optimizing? Is there >> anyone (dask maybe?) for whom import time and memory footprint is a pain >> point? >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> https://mail.python.org/mailman/listinfo/pandas-dev >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcia.marc at gmail.com Tue Oct 12 13:24:34 2021 From: garcia.marc at gmail.com (Marc Garcia) Date: Tue, 12 Oct 2021 12:24:34 -0500 Subject: [Pandas-dev] Import time/size optimization - how much do people care? In-Reply-To: References: Message-ID: +1 on all them I don't think 3 should be that complex, I might be wrong. On Tue, Oct 12, 2021 at 12:21 PM Brock Mendel wrote: > > For this do you have in mind moving imports from the top of the file > into the functions that use them in our code base. Or would it be more not > loading components of pandas until the user uses them (components like > plotting, timeseries, IO connectors...) > > Some of each. 
The main candidates I've looked at recently > > 1) make pyarrow import lazy (~15% > https://github.com/pandas-dev/pandas/issues/41432#issuecomment-939083050) > 2) make pandas.io.api imports (into pd namespace) lazy (4-5%) > 3) avoid @doc/@Appender/@Substitution at runtime (~4-5% but a PITA i think > not worth it) > > On Tue, Oct 12, 2021 at 9:46 AM Marc Garcia wrote: > >> Hi Brock, thanks for having a look at this. >> >> Just a question. For this do you have in mind moving imports from the top >> of the file into the functions that use them in our code base. Or would it >> be more not loading components of pandas until the user uses them >> (components like plotting, timeseries, IO connectors...). The main >> difference being that in the latter case, most Python files would keep the >> imports at the top, but we'd avoid loading pandas modules until needed. >> >> Feels like the latter, where it makes sense, could be a nice thing not >> only for the loading time and the base memory footprint. >> >> On Mon, Oct 11, 2021 at 6:03 PM Brock Mendel >> wrote: >> >>> I've spent some time looking at our import time and the memory footprint >>> at import and I _think_ we can cut another 20-30% by e.g. lazifying >>> imports. The last 5-10% of that is pretty hairy though. >>> >>> My question for the community is: is this worth optimizing? Is there >>> anyone (dask maybe?) for whom import time and memory footprint is a pain >>> point? >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> https://mail.python.org/mailman/listinfo/pandas-dev >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorisvandenbossche at gmail.com Tue Oct 12 19:22:00 2021 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Wed, 13 Oct 2021 01:22:00 +0200 Subject: [Pandas-dev] Import time/size optimization - how much do people care? 
In-Reply-To: References: Message-ID: On Tue, 12 Oct 2021 at 19:24, Marc Garcia wrote: > +1 on all them > > I don't think 3 should be that complex, I might be wrong. > > On Tue, Oct 12, 2021 at 12:21 PM Brock Mendel > wrote: > >> > For this do you have in mind moving imports from the top of the file >> into the functions that use them in our code base. Or would it be more not >> loading components of pandas until the user uses them (components like >> plotting, timeseries, IO connectors...) >> >> Some of each. The main candidates I've looked at recently >> >> 1) make pyarrow import lazy (~15% >> https://github.com/pandas-dev/pandas/issues/41432#issuecomment-939083050) >> > You are linking to an issue that is explicitly about *not* having the pyarrow import lazy (because we need to register extension types). For the reasons mentioned in the issue, I would prefer to keep pyarrow as a non-lazy import. Joris > 2) make pandas.io.api imports (into pd namespace) lazy (4-5%) >> 3) avoid @doc/@Appender/@Substitution at runtime (~4-5% but a PITA i >> think not worth it) >> >> On Tue, Oct 12, 2021 at 9:46 AM Marc Garcia >> wrote: >> >>> Hi Brock, thanks for having a look at this. >>> >>> Just a question. For this do you have in mind moving imports from the >>> top of the file into the functions that use them in our code base. Or would >>> it be more not loading components of pandas until the user uses them >>> (components like plotting, timeseries, IO connectors...). The main >>> difference being that in the latter case, most Python files would keep the >>> imports at the top, but we'd avoid loading pandas modules until needed. >>> >>> Feels like the latter, where it makes sense, could be a nice thing not >>> only for the loading time and the base memory footprint. >>> >>> On Mon, Oct 11, 2021 at 6:03 PM Brock Mendel >>> wrote: >>> >>>> I've spent some time looking at our import time and the memory >>>> footprint at import and I _think_ we can cut another 20-30% by e.g. 
>>>> lazifying imports. The last 5-10% of that is pretty hairy though. >>>> >>>> My question for the community is: is this worth optimizing? Is there >>>> anyone (dask maybe?) for whom import time and memory footprint is a pain >>>> point? >>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> https://mail.python.org/mailman/listinfo/pandas-dev >>>> >>> _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorisvandenbossche at gmail.com Tue Oct 12 19:29:53 2021 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Wed, 13 Oct 2021 01:29:53 +0200 Subject: [Pandas-dev] October 2021 monthly community meeting (Wednesday October 13, UTC 18:00) Message-ID: Hi all, A reminder that the next monthly dev call is tomorrow (Wednesday, October 13th) at 18:00 UTC (1 pm Central). Our calendar is at https://pandas.pydata.org/docs/development/meeting.html#calendar to check your local time. All are welcome to attend! Video Call: https://us06web.zoom.us/j/84484803210?pwd=TjUxNmcyNHcvcG9SNGJvbE53Y21GZz09 Minutes: https://docs.google.com/document/u/1/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?ouid=102771015311436394588&usp=docs_home&ths=true Joris -------------- next part -------------- An HTML attachment was scrubbed... URL: From simonjayhawkins at gmail.com Mon Oct 18 07:00:00 2021 From: simonjayhawkins at gmail.com (Simon Hawkins) Date: Mon, 18 Oct 2021 12:00:00 +0100 Subject: [Pandas-dev] ANN: pandas v1.3.4 Message-ID: Hi all, I'm pleased to announce the release of pandas v1.3.4. This is a patch release in the 1.3.x series and includes some regression fixes and bug fixes. We recommend that all users upgrade to this version. See the release notes for a list of all the changes. 
The release can be installed from PyPI

    python -m pip install --upgrade pandas==1.3.4

Or from conda-forge

    conda install -c conda-forge pandas==1.3.4

Please report any issues with the release on the pandas issue tracker.

Thanks to all the contributors who made this release possible.

From darren.frimponglebrun at nomura.com Mon Oct 18 09:56:36 2021
From: darren.frimponglebrun at nomura.com (darren.frimponglebrun at nomura.com)
Date: Mon, 18 Oct 2021 13:56:36 +0000
Subject: [Pandas-dev] Python 3.10 Wheel for Windows
Message-ID: <1cb0182197fe43c08cea9c5423c48c48@nomura.com>

Hi Development Team

We are looking to deploy Python 3.10 soon. Is there a scheduled date for the pypi availability for Wheels for win32 and win64?

Thanks,

Darren

Fingal Technology
Nomura

This e-mail (including any attachments) is private and confidential, may contain proprietary or privileged information and is intended for the named recipient(s) only. Unintended recipients are strictly prohibited from taking action on the basis of information in this e-mail and must contact the sender immediately, delete this e-mail (and all attachments) and destroy any hard copies. Nomura will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in, this e-mail. If verification is sought please request a hard copy. Any reference to the terms of executed transactions should be treated as preliminary only and subject to formal written confirmation by Nomura. Nomura reserves the right to retain, monitor and intercept e-mail communications through its networks (subject to and in accordance with applicable laws). No confidentiality or privilege is waived or lost by Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is a reference to any entity in the Nomura Holdings, Inc. group.
Please read our Electronic Communications Legal Notice which forms part of this e-mail: http://www.Nomura.com/email_disclaimer.htm -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorisvandenbossche at gmail.com Sat Oct 23 15:32:02 2021 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Sat, 23 Oct 2021 21:32:02 +0200 Subject: [Pandas-dev] Python 3.10 Wheel for Windows In-Reply-To: <1cb0182197fe43c08cea9c5423c48c48@nomura.com> References: <1cb0182197fe43c08cea9c5423c48c48@nomura.com> Message-ID: Hi Darren, There is some discussion about this on the following issue: https://github.com/pandas-dev/pandas/issues/44136 And it seems the work to build those wheels was merged earlier today: https://github.com/MacPython/pandas-wheels/pull/156 Best, Joris On Mon, 18 Oct 2021 at 20:15, darren.frimponglebrun--- via Pandas-dev < pandas-dev at python.org> wrote: > Hi Development Team > > > > We are looking to deploy Python 3.10 soon. Is there a scheduled date for > the pypi availability for Wheels for win32 and win64? > > > > Thanks, > > > > Darren > > > > Fingal Technology > > *Nomura* > This e-mail (including any attachments) is private and confidential, may > contain proprietary or privileged information and is intended for the named > recipient(s) only. Unintended recipients are strictly prohibited from > taking action on the basis of information in this e-mail and must contact > the sender immediately, delete this e-mail (and all attachments) and > destroy any hard copies. Nomura will not accept responsibility or liability > for the accuracy or completeness of, or the presence of any virus or > disabling code in, this e-mail. If verification is sought please request a > hard copy. Any reference to the terms of executed transactions should be > treated as preliminary only and subject to formal written confirmation by > Nomura. 
Nomura reserves the right to retain, monitor and intercept e-mail > communications through its networks (subject to and in accordance with > applicable laws). No confidentiality or privilege is waived or lost by > Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is > a reference to any entity in the Nomura Holdings, Inc. group. Please read > our Electronic Communications Legal Notice which forms part of this e-mail: > http://www.Nomura.com/email_disclaimer.htm > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorisvandenbossche at gmail.com Sat Oct 23 15:42:41 2021 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Sat, 23 Oct 2021 21:42:41 +0200 Subject: [Pandas-dev] Allow me to make page(Korean Wikipedia) In-Reply-To: References: Message-ID: I don't think you need permission from us, so feel free to do so! (and thanks for contributing to wikipedia / pandas!) Best, Joris On Mon, 11 Oct 2021 at 14:19, ??? wrote: > Hi, I'm a student who dreams of becoming a programmer in Korea. > > I'm using the Pandas library well. > > I found the page "https://en.wikipedia.org/wiki/Pandas_(software)". > > But there is no page in Korean Wikipedia. > > So, I hope to make Pandas Wikipedia page for myself. > > Would you allow me to do that? > > Thanks for reading my mail. > > Reply please :) > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From aryanagarwal6222 at gmail.com Sun Oct 24 02:21:46 2021
From: aryanagarwal6222 at gmail.com (Aryan Agarwal)
Date: Sun, 24 Oct 2021 11:51:46 +0530
Subject: [Pandas-dev] (no subject)
Message-ID:

I am not able to install pandas package in my pycharm. It shows failed building wheels

From jbrockmendel at gmail.com Wed Oct 27 00:38:10 2021
From: jbrockmendel at gmail.com (Brock Mendel)
Date: Tue, 26 Oct 2021 21:38:10 -0700
Subject: [Pandas-dev] API: Make silent casting behavior consistent by deprecating silent _object_-dtype casting
Message-ID:

TLDR
----
We have inconsistent silent-casting vs raising logic for numpy vs EA dtypes (and inconsistencies within EA dtypes). By deprecating silently casting to *object* dtype, we can *mostly* make the behaviors match.

Background
----------
A number of Series/DataFrame methods will silently cast when dealing with mismatched values. With a numpy dtype, each of the following silently cast to float64:

    ser = pd.Series([1, 2, 3], dtype="i8")

    ser.shift(1, fill_value=1.5)
    ser.mask([True, False, False], 1.5)
    ser.where([False, True, True], 1.5)
    ser.replace(1, 1.5)
    ser[0] = 1.5
    ser.fillna(1.5)  # <- this one doesn't cast as it is a no-op

If we were to pass "foo" or a pd.Period, these would coerce to object instead of float.
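A quick way to observe the casting described above is to inspect the result dtypes. This is a minimal sketch, assuming a pandas version that behaves as described in this message (other releases may differ); `cond`, `as_float`, and `as_object` are names introduced only for the example:

```python
import pandas as pd

ser = pd.Series([1, 2, 3], dtype="i8")
cond = [False, True, True]

# Replacing a value with a float silently upcasts int64 -> float64.
as_float = ser.where(cond, 1.5)

# Replacing a value with a string silently falls back to object dtype.
as_object = ser.where(cond, "foo")

print(ser.dtype, as_float.dtype, as_object.dtype)
```

The original Series is left untouched here because `where` returns a new object; only the dtype of the result changes.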
By contrast, similar mixed-type operations with an ExtensionDtype Series _mostly_ raise:

    ser2 = pd.Series(pd.period_range("2016-01-01", periods=3, freq="D"))

    ser2.shift(1, fill_value=1.5)         # <- ValueError
    ser2.mask([True, False, False], 1.5)  # <- ValueError
    ser2.where([False, True, True], 1.5)  # <- ValueError
    ser2.fillna(1.5)                      # <- TypeError
    ser2.replace(ser2[0], 1.5)            # <- coerces to object
    ser2[0] = 1.5                         # <- coerces to object

    ser3 = pd.Series([pd.NA, 2, 3], dtype="Int64")

    ser3.shift(1, fill_value=1.5)         # <- TypeError
    ser3.mask([True, False, False], 1.5)  # <- TypeError
    ser3.where([False, True, True], 1.5)  # <- TypeError
    ser3.fillna(1.5)                      # <- TypeError
    ser3.replace(ser3[0], 1.5)            # <- TypeError
    ser3[0] = 1.5                         # <- TypeError

timedelta64, datetime64, and datetime64tz mostly behave like the numpy dtypes, with a few exceptions:

- shift raises on mismatch
- fillna raises on mismatch for timedelta64, casts for the others

Categorical mostly behaves like other ExtensionDtypes, except for replace which has special logic.

Goals
-----
- Have matching behavior across dtypes.
- Share code.

Options
-------
1) Change EA (and dt64/td64) behavior to match non-EA behavior
2) Change non-EA behavior to match EA behavior (or stricter xref https://github.com/pandas-dev/pandas/issues/39584)
3) Deprecate (and eventually raise on) silent casting to _object_ dtype, allowing silent casting otherwise.

Here I am advocating for option 3). The advantages as I see them:

A) For numpy dtypes, we retain the most useful cases (int->float)
B) Deprecates cases most likely to be unintentional (e.g. typo "2016-01-01" -> "2p16-01-01" causing a datetime64 Series to silently cast)
C) For td64/dt64/dt64tz/period, the *only* silent casting is to object, so this completely gets rid of special-casing among that code
D) For IntegerArray, FloatingArray, IntervalArray leaves open the option of allowing e.g.
Integer->Floating casting (xref https://github.com/pandas-dev/pandas/issues/25288#issuecomment-941762174)
E) Does not preclude later deciding on the stricter options in 2)

From tiger1472999 at naver.com Fri Oct 29 11:59:01 2021
From: tiger1472999 at naver.com (=?utf-8?B?6rWs66+87ISd?=)
Date: Sat, 30 Oct 2021 00:59:01 +0900
Subject: [Pandas-dev] (Help)Contribution requests and methods
Message-ID:

Hello, I am studying at a university in Korea. I am a student who dreams of becoming a programmer.

I wonder if I can contribute to the project by translating the pandas project documents into Korean. So, I want to help many Korean students understand Pandas.

And if I can contribute in this way, I'd appreciate it if you could let me know which documents would be helpful to translate.

Please reply this mail

Thank you for reading this long post amidst your busy schedule.

From tiger147299 at gmail.com Sun Oct 31 10:01:48 2021
From: tiger147299 at gmail.com (=?UTF-8?B?6rWs66+87ISd?=)
Date: Sun, 31 Oct 2021 23:01:48 +0900
Subject: [Pandas-dev] (Help)Contribution requests and methods
Message-ID:

Hello, I am studying at a university in Korea. I am a student who dreams of becoming a programmer. I'm using the Pandas library well.

I wonder if I can contribute to the project by translating the pandas project documents into Korean. So, I want to help many Korean students understand Pandas.

And if I can contribute in this way, I'd appreciate it if you could let me know which documents would be helpful to translate.

Please reply this mail

Thank you for reading this long post amidst your busy schedule.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shishaozhong at gmail.com Sun Oct 31 12:03:43 2021
From: shishaozhong at gmail.com (Shaozhong SHI)
Date: Sun, 31 Oct 2021 16:03:43 +0000
Subject: [Pandas-dev] How to apply a self defined function in Pandas
Message-ID:

I defined a function and applied it to a column in pandas, but it does not return the correct values.

I am trying to test which URLs in a column full of URLs can be connected to:

    def connect(url):
        try:
            urllib.request.urlopen(url)
            return True
        except:
            return False

    df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1)

It ran without any error, but did not return any True.

I just could not find any error in it.

Can anyone try and find out why?

Regards,

David
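One likely explanation, offered as a sketch and not a confirmed diagnosis: inside the lambda, `connect(df['URL'])` receives the whole column instead of the current row's value, `urlopen` raises on it, and the bare `except:` turns every failure into False. The version below passes one string at a time. The `timeout` argument, the narrower exception tuple, and the `reachable` column names are assumptions added for the example, and the sample URLs are deliberately malformed so that nothing is fetched over the network.

```python
import urllib.request

import pandas as pd


def connect(url, timeout=5):
    """Return True if `url` can be opened, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers strings that are not valid URLs.
        return False


# Deliberately malformed entries, so this demo needs no network access.
df = pd.DataFrame({"URL": ["not-a-url", "still not a url"]})

# Apply the function to each value of the column (one string at a time)
# and store the result in a new column instead of overwriting the URLs.
df["reachable"] = df["URL"].apply(connect)

# Row-wise equivalent: use the row `x`, not the whole frame `df`.
df["reachable_rowwise"] = df.apply(lambda x: connect(x["URL"]), axis=1)
```

Catching only the expected exception types also means that a programming mistake, such as passing a Series, surfaces as an error instead of silently becoming False.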