[scikit-learn] Using logistic regression with count proportions data

Sean Violante sean.violante at gmail.com
Mon Oct 10 07:22:11 EDT 2016


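No (but please check!).

y stays an array of 0s and 1s, and the sample weights should be the counts
for the respective label. That is, each input row appears once with y = 1,
weighted by its number of successes, and once with y = 0, weighted by its
number of failures.

[I am actually puzzled by the glm help file: a bare proportion loses how
often an input data 'row' was present relative to the others, though you
could recover this by repeating the row 'n' times.]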
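
A minimal sketch of the counts-as-weights idea, in case it helps. The data
and the helper name expand_counts are made up for illustration, and I am
assuming a scikit-learn version whose LogisticRegression.fit accepts
sample_weight with the solver in use. The large C only weakens the default
L2 penalty, so the fit is close to, but not exactly, an unpenalised glm fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def expand_counts(X, successes, failures):
    """Duplicate every row: one copy labelled 1, one copy labelled 0.

    The weight of each copy is the count observed for that label.
    (Hypothetical helper, not part of scikit-learn.)
    """
    X = np.asarray(X, dtype=float)
    successes = np.asarray(successes, dtype=float)
    failures = np.asarray(failures, dtype=float)
    n = len(X)
    X2 = np.vstack([X, X])
    y2 = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    w2 = np.concatenate([successes, failures])
    keep = w2 > 0  # rows with zero weight carry no information
    return X2[keep], y2[keep], w2[keep]


# Made-up example: one feature, four groups, ten trials per group.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
successes = np.array([1, 3, 6, 9])
failures = np.array([9, 7, 4, 1])

X2, y2, w2 = expand_counts(X, successes, failures)

model = LogisticRegression(C=1e6)  # large C: only weak regularisation
model.fit(X2, y2, sample_weight=w2)

# Fitted proportion of successes for each original row.
predicted_proportion = model.predict_proba(X)[:, 1]
```

Note that the weights here are raw counts, not proportions: a row seen 100
times should pull on the fit ten times harder than a row seen 10 times.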

On Mon, Oct 10, 2016 at 1:15 PM, Raphael C <drraph at gmail.com> wrote:

> How do I use sample_weight for my use case?
>
> In my case, is "y" an array of 0s and 1s, and sample_weight then an
> array of real numbers between 0 and 1, where I should make sure to set
> sample_weight[i] = 0 when y[i] = 0?
>
> Raphael
>
> On 10 October 2016 at 12:08, Sean Violante <sean.violante at gmail.com>
> wrote:
> > It should be the sample_weight argument of fit:
> >
> > http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
> >
> > On Mon, Oct 10, 2016 at 1:03 PM, Raphael C <drraph at gmail.com> wrote:
> >>
> >> I just noticed this about the glm package in R.
> >> http://stats.stackexchange.com/a/26779/53128
> >>
> >> "
> >> The glm function in R allows 3 ways to specify the formula for a
> >> logistic regression model.
> >>
> >> The most common is that each row of the data frame represents a single
> >> observation and the response variable is either 0 or 1 (or a factor
> >> with 2 levels, or other variable with only 2 unique values).
> >>
> >> Another option is to use a 2 column matrix as the response variable
> >> with the first column being the counts of 'successes' and the second
> >> column being the counts of 'failures'.
> >>
> >> You can also specify the response as a proportion between 0 and 1,
> >> then specify another column as the 'weight' that gives the total
> >> number that the proportion is from (so a response of 0.3 and a weight
> >> of 10 is the same as 3 'successes' and 7 'failures')."
> >>
> >> Either of the last two options would do for me.  Does scikit-learn
> >> support either of these last two options?
> >>
> >> Raphael
> >>
> >> On 10 October 2016 at 11:55, Raphael C <drraph at gmail.com> wrote:
> >> > I am trying to perform regression where my dependent variable is
> >> > constrained to be between 0 and 1. This constraint comes from the fact
> >> > that it represents a count proportion, that is, counts in some
> >> > category divided by a total count.
> >> >
> >> > In the literature it seems that one common way to tackle this is to
> >> > use logistic regression. However, it appears that in scikit learn
> >> > logistic regression is only available as a classifier
> >> >
> >> > (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).
> >> > Is that right?
> >> >
> >> > Is there another way to perform regression using scikit learn where
> >> > the dependent variable is a count proportion?
> >> >
> >> > Thanks for any help.
> >> >
> >> > Raphael
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn