From adrin.jalali at gmail.com  Wed Jun  2 05:16:33 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Wed, 2 Jun 2021 11:16:33 +0200
Subject: [scikit-learn] Understanding Our Contributors - NumFOCUS survey
Message-ID: <CAEOrW49q8q3srMwx7gL8o4LP9TAe=og+_gk2z87QzX03h50ZxA@mail.gmail.com>

Hi all,

NumFOCUS <https://numfocus.org/>, our fiscal sponsorship organization, is
conducting a research project looking into understanding the diversity,
inclusion and barriers to participation within NumFOCUS-sponsored projects
and the wider open source community.

The survey <https://numfocus.typeform.com/to/W6Bax8eq> will take 15-20 min
to complete. We'd appreciate your contribution.

The results of this survey will help NumFOCUS work closely with projects,
including scikit-learn, to develop practices that will lead to project
success around diversity, inclusion and sustainability.

Click here to participate in the survey
<https://numfocus.typeform.com/to/W6Bax8eq>

Thank you for your participation!

From adrin.jalali at gmail.com  Wed Jun  2 05:24:18 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Wed, 2 Jun 2021 11:24:18 +0200
Subject: [scikit-learn] custom scorer needs group information: how?
In-Reply-To: <CA+4vAE=4j7bVr4B6yqbeE7OgcV=SiJiNNPYQLwhqTfhE6dTQyQ@mail.gmail.com>
References: <CA+4vAEnWavFvLfryve8_H10fQGy8e3VMN2av0DN7bX_mrPxbCg@mail.gmail.com>
 <CADeotZpZep96Q_HvZtzn7fuUYGgzYqR1d=1kEKs8469+w7rNUg@mail.gmail.com>
 <CA+4vAE=4j7bVr4B6yqbeE7OgcV=SiJiNNPYQLwhqTfhE6dTQyQ@mail.gmail.com>
Message-ID: <CAEOrW4_GWA+TD4EUH332-85tG8pzJKbP0DCdUE3fuPerE5fGqQ@mail.gmail.com>

Hi Emanuele,

In the meantime, you could also try the hack I have written here:
https://stackoverflow.com/questions/49581104/sklearn-gridsearchcv-not-using-sample-weight-in-score-function/49598597#49598597

Cheers,
Adrin

On Sat, May 22, 2021 at 7:54 PM Emanuele Olivetti <
emanuele.olivetti at gmail.com> wrote:

> Hi Alex,
>
> Thank you for the quick response. That SLEP looks very interesting! Indeed
> I had the impression that there was no easy way around the issue of
> automatically passing additional (meta)data to scorers. Irrespective of my
> issue, I hope the SLEP will get the green light soon.
>
> Best,
>
> Emanuele
>
> On Sat, May 22, 2021 at 10:27 AM Alexandre Gramfort <
> alexandre.gramfort at inria.fr> wrote:
>
>> Hi Emanuele,
>>
>> I would suggest you have a look at
>> https://github.com/scikit-learn/enhancement_proposals/pull/55
>>
>> it's work in progress though
>>
>> Alex
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>

From emanuele.olivetti at gmail.com  Thu Jun  3 03:04:10 2021
From: emanuele.olivetti at gmail.com (Emanuele Olivetti)
Date: Thu, 3 Jun 2021 09:04:10 +0200
Subject: [scikit-learn] custom scorer needs group information: how?
In-Reply-To: <CAEOrW4_GWA+TD4EUH332-85tG8pzJKbP0DCdUE3fuPerE5fGqQ@mail.gmail.com>
References: <CA+4vAEnWavFvLfryve8_H10fQGy8e3VMN2av0DN7bX_mrPxbCg@mail.gmail.com>
 <CADeotZpZep96Q_HvZtzn7fuUYGgzYqR1d=1kEKs8469+w7rNUg@mail.gmail.com>
 <CA+4vAE=4j7bVr4B6yqbeE7OgcV=SiJiNNPYQLwhqTfhE6dTQyQ@mail.gmail.com>
 <CAEOrW4_GWA+TD4EUH332-85tG8pzJKbP0DCdUE3fuPerE5fGqQ@mail.gmail.com>
Message-ID: <CA+4vAEnDuqR+WLoFK4RrEeEHzSqF0e=ZMqFt5Jpvb4124pB-3Q@mail.gmail.com>

Thank you Adrin,

Your solution, which uses pandas DataFrames and leverages the indexing
that comes with them, is pretty ingenious. Moreover, the whole StackOverflow
page is quite interesting. I'll also try your suggestion.
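
A minimal sketch of the DataFrame-index idea applied to this thread's question (group information inside a scorer). It assumes the groups are stored in a pandas Series that shares X's index; `groups`, `mean_per_group_accuracy` and the synthetic data are illustrative, and this is not necessarily the exact code of the linked answer. It relies on cross-validation subsetting a DataFrame by position while keeping its index labels:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

X_arr, y_arr = make_classification(n_samples=120, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])
y = pd.Series(y_arr, index=X.index)
# Hypothetical group labels (12 groups of 10 rows), indexed exactly like X.
groups = pd.Series(np.repeat(np.arange(12), 10), index=X.index)


def mean_per_group_accuracy(estimator, X_test, y_test):
    """Scorer with the (estimator, X, y) signature scikit-learn calls.

    The CV split keeps the original DataFrame index on X_test, so
    X_test.index can be used to look up the group of every test row.
    """
    test_groups = groups.loc[X_test.index]
    correct = pd.Series(
        estimator.predict(X_test) == np.asarray(y_test), index=X_test.index
    )
    # Accuracy within each group, then the unweighted mean over groups.
    return correct.groupby(test_groups).mean().mean()


scores = cross_val_score(
    LogisticRegression(),
    X,
    y,
    groups=groups,
    cv=GroupKFold(n_splits=4),
    scoring=mean_per_group_accuracy,
)
print(scores)
```

The scorer reads from a global lookup table, which is the "hack" part: it only works as long as the index survives intact between the splitter and the scorer.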

Best,

Emanuele

From reshama.stat at gmail.com  Fri Jun  4 13:33:42 2021
From: reshama.stat at gmail.com (Reshama Shaikh)
Date: Fri, 4 Jun 2021 13:33:42 -0400
Subject: [scikit-learn] [Data Umbrella] 3 Components for Reviewing a
 Pull Request (PR)
In-Reply-To: <CAKPCsuhY1+ruShYBHvL4kfH1vFM6=a0YTeBsi3TjrxiroRyrnw@mail.gmail.com>
References: <CAKPCsuhY1+ruShYBHvL4kfH1vFM6=a0YTeBsi3TjrxiroRyrnw@mail.gmail.com>
Message-ID: <CAKPCsuguTZj31_+SYSNhcM69o8UVEYtDLAoEJYxqEKSQBDnCGg@mail.gmail.com>

Hello,

The video is up for Thomas Fan's talk, "3 Components of Reviewing a Pull
Request":
https://youtu.be/dyxS9KKCNzA

It's 75 minutes long, with a nice Q&A at the end. We both agreed that each of
the topics discussed in the talk could be a separate talk. There are lots of
good points in the video, especially if you contribute to scikit-learn or
would like to understand the process better.

---
Reshama Shaikh
she/her
Blog <https://reshamas.github.io> | Twitter <https://twitter.com/reshamas>
| LinkedIn <https://www.linkedin.com/in/reshamas/> | GitHub
<https://github.com/reshamas>

Data Umbrella <https://www.dataumbrella.org>
NYC PyLadies
<https://meet.meetup.com/wf/click?upn=pEEcc35imY7Cq0tG1vyTt6zEs68RbcMfjPcajNHTKtn9NmwqQbJhe15mAZ1gz2La_s50GiGgQPBz9c9AKCDbbu2LRERFOLQHDZ3rAVGAkUEIFdmeKWgLQ1JD-2FBfVxXpI86J1oyur7RYRzToaqco1fWUx-2FWPOn-2FLCyCICxwu5bjlHJvtSvVekt71L43UiQL8dMjr0HfGP-2FSeiGQFG0QQxzS-2FX5o4Q8Ch-2BHrlA5hsa9VyPXC5FvBn1cNbkmil3SgwH7HWFmXsKFJ7RYrzZR0EwWFIMarRA8-2BTgd8yXJYlfxogk-3D>


On Sun, May 23, 2021 at 10:38 AM Reshama Shaikh <reshama.stat at gmail.com>
wrote:

> Hello,
> Thomas Fan, a core contributor to scikit-learn, will be presenting on
> "Reviewing a Pull Request."  This live webinar is scheduled for Wednesday,
> June 2 at 6pm EDT.
>
> Sign-up info is here:
> https://www.meetup.com/data-umbrella/events/278045166/
>
> This presentation will be recorded and shared on YouTube about a day after
> the event.  You can look for it here:
> https://www.youtube.com/c/DataUmbrella/featured
>
> Best,
> Reshama
> ---
> Reshama Shaikh
> she/her
> Blog <https://reshamas.github.io> | Twitter <https://twitter.com/reshamas>
> | LinkedIn <https://www.linkedin.com/in/reshamas/> | GitHub
> <https://github.com/reshamas>
>
> Data Umbrella <https://www.dataumbrella.org>
> NYC PyLadies
> <https://meet.meetup.com/wf/click?upn=pEEcc35imY7Cq0tG1vyTt6zEs68RbcMfjPcajNHTKtn9NmwqQbJhe15mAZ1gz2La_s50GiGgQPBz9c9AKCDbbu2LRERFOLQHDZ3rAVGAkUEIFdmeKWgLQ1JD-2FBfVxXpI86J1oyur7RYRzToaqco1fWUx-2FWPOn-2FLCyCICxwu5bjlHJvtSvVekt71L43UiQL8dMjr0HfGP-2FSeiGQFG0QQxzS-2FX5o4Q8Ch-2BHrlA5hsa9VyPXC5FvBn1cNbkmil3SgwH7HWFmXsKFJ7RYrzZR0EwWFIMarRA8-2BTgd8yXJYlfxogk-3D>
>

From mlists at ligand.eu  Tue Jun  8 03:22:14 2021
From: mlists at ligand.eu (Francois Berenger)
Date: Tue, 08 Jun 2021 16:22:14 +0900
Subject: [scikit-learn] Is there a model for truncated regression in sklearn?
Message-ID: <ccf9bc2da4ae990ade1e1adcff942f3c@ligand.eu>

Hello,

https://en.wikipedia.org/wiki/Truncated_regression_model

Sometimes, data have missing samples when the target variable
is above or below a threshold value.
This is very often the case for biochemical data (e.g. when the target
variable is outside the detection range of some lab equipment).

I strongly suspect that specialized models could handle such datasets
better than generic methods (i.e. train better models).
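
For reference, a minimal sketch of the classical parametric version of the model, assuming Gaussian noise, a linear mean and a single known lower threshold (rows exist only when y > lower). It maximises the truncated-normal likelihood with SciPy; it is neither the local-likelihood method of the paper cited below nor the truncreg implementation, and `fit_truncated_regression` is a hypothetical helper, not a scikit-learn API:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_truncated_regression(X, y, lower):
    """Maximum-likelihood fit of y = b0 + X @ b + eps, eps ~ N(0, sigma^2),
    when rows are only observed if y > lower (truncation from below)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column

    def neg_log_lik(params):
        beta, sigma = params[:-1], np.exp(params[-1])  # log-parametrised sigma > 0
        mu = X1 @ beta
        # Normal log-density renormalised by P(y > lower | x) = 1 - Phi((lower - mu) / sigma)
        ll = norm.logpdf(y, loc=mu, scale=sigma) - norm.logsf(lower, loc=mu, scale=sigma)
        return -ll.sum()

    # Start from ordinary least squares on the (biased) truncated sample.
    beta0, *_ = np.linalg.lstsq(X1, y, rcond=None)
    x0 = np.append(beta0, np.log(np.std(y - X1 @ beta0)))
    res = minimize(neg_log_lik, x0, method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])


# Simulated example: true intercept 1, slope 2, sigma 1, threshold 2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=10_000)
y_full = 1.0 + 2.0 * x + rng.normal(size=x.size)
keep = y_full > 2.0  # rows below the threshold are never recorded
X_obs, y_obs = x[keep].reshape(-1, 1), y_full[keep]

beta_hat, sigma_hat = fit_truncated_regression(X_obs, y_obs, lower=2.0)
ols_slope = np.polyfit(X_obs.ravel(), y_obs, 1)[0]
print(beta_hat, sigma_hat, ols_slope)
```

On such data, plain least squares on the truncated sample underestimates the slope, because low-x rows survive only when their noise is large; the `logsf` term in the likelihood corrects for that selection.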

Some points of entry, if that might help:

- R has a truncreg package
   https://cran.r-project.org/web/packages/truncreg/index.html
- a related paper from the Wikipedia page:
   "Local likelihood estimation of truncated regression and
   its partial derivatives: Theory and application"
   https://hal.archives-ouvertes.fr/hal-00520650/file/PEER_stage2_10.1016%252Fj.jeconom.2008.08.007.pdf

I can provide a cleaned public regression dataset for tests, if someone is
interested (there are many such datasets in ChEMBL and PubChem, by the way,
but you need to know how to "featurize"/encode molecules).

Regards,
F.

From gael.varoquaux at normalesup.org  Tue Jun  8 03:31:03 2021
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Tue, 8 Jun 2021 09:31:03 +0200
Subject: [scikit-learn] Is there a model for truncated regression in
 sklearn?
In-Reply-To: <ccf9bc2da4ae990ade1e1adcff942f3c@ligand.eu>
References: <ccf9bc2da4ae990ade1e1adcff942f3c@ligand.eu>
Message-ID: <20210608073103.yig66zd4zaohgpy3@phare.normalesup.org>

Hi,

Scikit-learn does not cover this problem.

I think that it relates to what is called survival analysis. You'll find
a survival analysis package in Python at
https://lifelines.readthedocs.io/en/latest/

Best,

Gaël

-- 
    Gael Varoquaux
    Research Director, INRIA		  Visiting professor, McGill 
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux

From g.lemaitre58 at gmail.com  Thu Jun 10 03:25:08 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Thu, 10 Jun 2021 09:25:08 +0200
Subject: [scikit-learn] New member of the triage team: Julien
Message-ID: <CACDxx9gJ7yNj53gZgF1thx_02D+khA5LPUznujMOoh+BBdotSg@mail.gmail.com>

We are excited to welcome a new member of the triage team:

* Julien Jerphanion https://github.com/jjerphan

The thorough work of the triage team on helping the community is much
appreciated.

Cheers,
-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

From adrin.jalali at gmail.com  Fri Jun 11 05:44:55 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Fri, 11 Jun 2021 11:44:55 +0200
Subject: [scikit-learn] New member of the triage team: Julien
In-Reply-To: <CACDxx9gJ7yNj53gZgF1thx_02D+khA5LPUznujMOoh+BBdotSg@mail.gmail.com>
References: <CACDxx9gJ7yNj53gZgF1thx_02D+khA5LPUznujMOoh+BBdotSg@mail.gmail.com>
Message-ID: <CAEOrW49=Mj8jt1kiwzTfUY6fQazADYE-Vqr03a7-w1ZaK6RuAg@mail.gmail.com>

Congratulations Julien. Happy to have you in the team :)

From g.lemaitre58 at gmail.com  Thu Jun 17 05:33:19 2021
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Thu, 17 Jun 2021 11:33:19 +0200
Subject: [scikit-learn] New member of the triage team: Norbert
Message-ID: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>

We are excited to welcome a new member of the triage team:

* Norbert Preining https://github.com/norbusan

The thorough work of the triage team in helping the scikit-learn
community, by triaging issues and PRs, organizing sprints and responding
to discussions, is extremely valuable and helpful in the development
and use of scikit-learn.

Cheers,
-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

From adrin.jalali at gmail.com  Thu Jun 17 05:38:15 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Thu, 17 Jun 2021 11:38:15 +0200
Subject: [scikit-learn] New member of the triage team: Norbert
In-Reply-To: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
References: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
Message-ID: <CAEOrW48hQkAfB+1gqL50RLYSw-=JA-qp6sFN_cdwnq=s839u0w@mail.gmail.com>

Welcome to the team Norbert!

From bodor_sati at hotmail.com  Thu Jun 17 06:35:09 2021
From: bodor_sati at hotmail.com (bodor sati)
Date: Thu, 17 Jun 2021 10:35:09 +0000
Subject: [scikit-learn] New member of the triage team: Norbert
In-Reply-To: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
References: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
Message-ID: <VE1P191MB1069359E9ADAC2F9CE372B09EB0E9@VE1P191MB1069.EURP191.PROD.OUTLOOK.COM>

Hi,
I have only one question related to scikit-learn:
how do I compute the topic coherence of LDA models in scikit-learn? I can't find any function that calculates a coherence value.
Please reply.
Thanks


-----------------------------------------------
Bodor Ali Bashir Sati

PhD Student

Sudan University of Science and Technology


From manpritsinghece at gmail.com  Fri Jun 18 06:45:10 2021
From: manpritsinghece at gmail.com (Manprit Singh)
Date: Fri, 18 Jun 2021 16:15:10 +0530
Subject: [scikit-learn] function transformer
Message-ID: <CAO1OCwanV3K26evmfPx3ZPydnseGO7qHig9EwMAaEJYtGWVEDg@mail.gmail.com>

Dear Sir,

I would just like to know whether I can use a FunctionTransformer to generate
new columns in the dataset.

Please see the pipeline below:

num_pipeline = Pipeline([('imputer', SimpleImputer(strategy="median")),
                         ('attribs_adder', column_adder),
                         ('std_scaler', StandardScaler()),
                        ])
This pipeline is for the numerical attributes in the dataset. First, it
imputes all missing values in the dataset using SimpleImputer. Then it adds
three more columns to the existing data: I wrote a function that computes
them and wrapped it in a FunctionTransformer. Finally, it applies
StandardScaler.

The columns being added are generated from existing columns (by element-wise
division of two columns). Is using a FunctionTransformer OK for this?
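
A minimal sketch of the kind of pipeline described above, assuming a single ratio feature built from the first two columns (`add_ratio_column` is a hypothetical stand-in for the poster's `column_adder` function). Because the function learns nothing from the data, FunctionTransformer fits trivially and can sit between the imputer and the scaler:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def add_ratio_column(X):
    """Append column 0 divided element-wise by column 1 (hypothetical feature)."""
    X = np.asarray(X, dtype=float)
    return np.c_[X, X[:, 0] / X[:, 1]]


num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),              # fill NaNs first
    ("attribs_adder", FunctionTransformer(add_ratio_column)),   # stateless step
    ("std_scaler", StandardScaler()),                           # scales all 3 columns
])

X = np.array([[2.0, 7.0], [4.0, np.nan], [3.0, 5.0], [np.nan, 8.0]])
Xt = num_pipeline.fit_transform(X)
print(Xt.shape)
```

Placing the imputer first means the ratio is computed on imputed values, so no NaN propagates into the new column; a zero in the denominator would still produce `inf`.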

From solegalli at protonmail.com  Mon Jun 21 02:43:30 2021
From: solegalli at protonmail.com (Sole Galli)
Date: Mon, 21 Jun 2021 06:43:30 +0000
Subject: [scikit-learn] function transformer
In-Reply-To: <CAO1OCwanV3K26evmfPx3ZPydnseGO7qHig9EwMAaEJYtGWVEDg@mail.gmail.com>
References: <CAO1OCwanV3K26evmfPx3ZPydnseGO7qHig9EwMAaEJYtGWVEDg@mail.gmail.com>
Message-ID: <M1bY5mgzuLn39_hEZCLNhT-PuWxa7Vx8yJ6g_2hl4OM56rZwGgsLmWUigCICucgwyrWI66Ve-5uJIE6hueiXIgICLYdjSwg0XWPOh4wP5gg=@protonmail.com>

The FunctionTransformer will apply the transformation coded in your function to the entire dataset passed to the transform() method.

I find it hard to see how this could work to add additional columns to the dataset, but I guess it might depend on how you designed your function.

Did you try passing your function to the FunctionTransformer, then calling its transform() method on your data to see the result?

Alternatively, you could create your own class to add additional columns to your data and pass that class within the pipeline.
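
A minimal sketch of such a class, assuming the same sum, product and quotient features that Manprit computes with FunctionTransformer in this thread (`ColumnCombiner` and its `left`/`right` parameters are hypothetical names). Inheriting from `BaseEstimator` and `TransformerMixin` provides `get_params`/`set_params` and `fit_transform`, so the column indices could even be searched over with GridSearchCV:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnCombiner(BaseEstimator, TransformerMixin):
    """Hypothetical transformer appending sum, product and ratio of two columns."""

    def __init__(self, left=0, right=1):
        self.left = left
        self.right = right

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the training data.
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        a, b = X[:, self.left], X[:, self.right]
        # Original columns followed by the three derived ones.
        return np.column_stack([X, a + b, a * b, a / b])


out = ColumnCombiner().fit_transform(np.array([[2, 7], [4, 9], [3, 5]]))
print(out)
```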

Or, easier, use the [CombineWithReferenceFeature](https://feature-engine.readthedocs.io/en/latest/creation/CombineWithReferenceFeature.html) transformer from Feature-engine, another open-source package for feature engineering, which does exactly what you want to do.

Hope this helps

Soledad Galli
https://www.trainindata.com/

From manpritsinghece at gmail.com  Mon Jun 21 04:18:12 2021
From: manpritsinghece at gmail.com (Manprit Singh)
Date: Mon, 21 Jun 2021 13:48:12 +0530
Subject: [scikit-learn] function transformer
In-Reply-To: <CAO1OCwanV3K26evmfPx3ZPydnseGO7qHig9EwMAaEJYtGWVEDg@mail.gmail.com>
References: <CAO1OCwanV3K26evmfPx3ZPydnseGO7qHig9EwMAaEJYtGWVEDg@mail.gmail.com>
Message-ID: <CAO1OCwbV-LdxBSqT3gSZtMBT=ccWoyBoMWKaSLBJw-nQHdG2Zw@mail.gmail.com>

Dear Sir,
I have made such a transformer. Below is an example that generates 3 new
columns from 2 existing columns of a NumPy array: the first new column is the
element-wise sum, the second is the element-wise product and the third is the
element-wise quotient.

>>> import numpy as np
>>> from sklearn.preprocessing import FunctionTransformer
>>> def col_add(x):
...     x1 = x[:, 0] + x[:, 1]  # element-wise sum
...     x2 = x[:, 0] * x[:, 1]  # element-wise product
...     x3 = x[:, 0] / x[:, 1]  # element-wise quotient
...     return np.c_[x, x1, x2, x3]  # original columns + 3 new ones
...
>>> col_adder = FunctionTransformer(col_add)
>>> arr = np.array([[2, 7], [4, 9], [3, 5]])
>>> arr
array([[2, 7],
       [4, 9],
       [3, 5]])
>>> col_adder.transform(arr)  # adds 3 columns
array([[ 2.        ,  7.        ,  9.        , 14.        ,  0.28571429],
       [ 4.        ,  9.        , 13.        , 36.        ,  0.44444444],
       [ 3.        ,  5.        ,  8.        , 15.        ,  0.6       ]])

So can a FunctionTransformer be used in this way to add new features
generated from existing columns?

From olivier.grisel at ensta.org  Mon Jun 21 10:46:13 2021
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Mon, 21 Jun 2021 16:46:13 +0200
Subject: [scikit-learn] New member of the triage team: Norbert
In-Reply-To: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
References: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
Message-ID: <CAFvE7K7FKL5AU3HFz2Q3SsQTKUCj0eG_BBYRKG-6tcXBqV5vzQ@mail.gmail.com>

I am a bit late but I am very happy to see Norbert joining the triage
team! Welcome!

From olivier.grisel at ensta.org  Mon Jun 21 11:11:33 2021
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Mon, 21 Jun 2021 17:11:33 +0200
Subject: [scikit-learn] New member of the triage team: Norbert
In-Reply-To: <VE1P191MB1069359E9ADAC2F9CE372B09EB0E9@VE1P191MB1069.EURP191.PROD.OUTLOOK.COM>
References: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
 <VE1P191MB1069359E9ADAC2F9CE372B09EB0E9@VE1P191MB1069.EURP191.PROD.OUTLOOK.COM>
Message-ID: <CAFvE7K4NHyOxcXKjTzGH_V9EqH5FJ+1be4ZGJv-bmWtNahciFA@mail.gmail.com>

> I have only one question related to scikit-learn:
> how do I compute the topic coherence of LDA models in scikit-learn? I can't find any function that calculates a coherence value.
> Please reply.

We don't have such a metric in scikit-learn. I assume you are referring to:
http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf

which is implemented in Gensim as:
https://radimrehurek.com/gensim/models/coherencemodel.html

If I understand correctly this metric needs to compute relative
frequencies of occurrences and co-occurrences of words in the
documents of the training set. This feels very domain specific
compared to the more domain agnostic metrics that we have in
scikit-learn.
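
For readers who need a number anyway, a minimal sketch of one simple co-occurrence-based measure, UMass coherence (Mimno et al., 2011), computed from a fitted scikit-learn LatentDirichletAllocation and its CountVectorizer document-term matrix. For each topic it sums log((D(w_m, w_k) + 1) / D(w_k)) over pairs of top words where w_k ranks above w_m, with D counting the documents that contain the word(s). This is not the C_V measure from the paper above nor Gensim's implementation; `umass_coherence` and the toy corpus are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def umass_coherence(lda, doc_term, top_n=5):
    """UMass coherence of each topic of a fitted LatentDirichletAllocation.

    doc_term is the sparse document-term count matrix the model was fit on.
    """
    present = (doc_term > 0).astype(np.int64)  # 1 if the word occurs in the document
    doc_freq = np.asarray(present.sum(axis=0)).ravel()  # D(w)
    # D(w_i, w_j): documents containing both words (dense, so small vocabularies only)
    co_doc_freq = (present.T @ present).toarray()
    scores = []
    for topic in lda.components_:
        top = np.argsort(topic)[::-1][:top_n]  # top words, most probable first
        score = 0.0
        for m in range(1, len(top)):
            for k in range(m):  # w_k ranks above w_m
                score += np.log((co_doc_freq[top[m], top[k]] + 1) / doc_freq[top[k]])
        scores.append(score)
    return np.array(scores)


docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "the stock market fell today",
    "investors sold shares in the market",
    "my dog chased the cat",
    "shares and stocks rose on the market",
]
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(doc_term)
coherence = umass_coherence(lda, doc_term, top_n=4)
print(coherence)  # one value per topic; higher means more coherent
```

Gensim's CoherenceModel, linked above, offers this measure as well as C_V if a standard implementation is preferred.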

From norbert at preining.info  Mon Jun 21 22:37:31 2021
From: norbert at preining.info (Norbert Preining)
Date: Tue, 22 Jun 2021 11:37:31 +0900
Subject: [scikit-learn] New member of the triage team: Norbert
In-Reply-To: <CAFvE7K7FKL5AU3HFz2Q3SsQTKUCj0eG_BBYRKG-6tcXBqV5vzQ@mail.gmail.com>
References: <CACDxx9ixyq+heQqXVKUj6w9ccz5g1ziGDgUA_dJ9pBs+6Ptypw@mail.gmail.com>
 <CAFvE7K7FKL5AU3HFz2Q3SsQTKUCj0eG_BBYRKG-6tcXBqV5vzQ@mail.gmail.com>
Message-ID: <YNFM62Jq1AREc+A6@bulldog.preining.info>

Hi everyone,

On Mon, 21 Jun 2021, Olivier Grisel wrote:
> I am a bit late but I am very happy to see Norbert joining the triage

Thanks everyone for the welcome and I am looking forward to our
collaboration.

Norbert

--
PREINING Norbert                              https://www.preining.info
Fujitsu Research  +  IFMGA Guide  +  TU Wien  +  TeX Live  + Debian Dev
GPG: 0x860CDC13   fp: F7D8 A928 26E3 16A1 9FA0 ACF0 6CAC A448 860C DC13

From adrin.jalali at gmail.com  Wed Jun 23 11:10:50 2021
From: adrin.jalali at gmail.com (Adrin)
Date: Wed, 23 Jun 2021 17:10:50 +0200
Subject: [scikit-learn] HOWTO fix your merge conflicts after we've applied
 `black`
Message-ID: <CAEOrW48fN5Q8rM0KeRme0np0j2J=BRFFnCaCLZuzvHA9y67-rw@mail.gmail.com>

Hi,

This is to let you know that if you have an open PR, and you have merge
conflicts due to the fact that now we have applied `black` to the repo,
please refer to this issue
<https://github.com/scikit-learn/scikit-learn/issues/20301> which explains
how you can fix your merge conflicts.

Best,
Adrin

From olivier.grisel at ensta.org  Fri Jun 25 05:55:54 2021
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Fri, 25 Jun 2021 11:55:54 +0200
Subject: [scikit-learn] scikit-learn monthly developer meeting: Monday June
 28 2021
Message-ID: <CAFvE7K5t-ddhjyyxmphz2bhLZ0KkDbogUoKYgNryibBL8fG7Aw@mail.gmail.com>

Dear all,

The scikit-learn developer monthly meeting will take place on Monday,
June 28th at 3PM UTC.

- Video call link: https://meet.google.com/qbg-ucpe-ngz
- Meeting notes / agenda: https://hackmd.io/0yokz72CTZSny8y3Re648Q
- Local times:
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2021&month=6&day=28&hour=15&min=0&sec=0&p1=1440&p2=240&p3=248&p4=195&p5=179&p6=224

The goal of this meeting is to discuss ongoing development topics for
the project. Everybody is welcome.

As usual, please follow the code of conduct of the project:
https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md

Regards,

-- 
Olivier