From randy.heiland at gmail.com  Tue Oct  1 19:33:32 2019
From: randy.heiland at gmail.com (Randy Heiland)
Date: Tue, 1 Oct 2019 19:33:32 -0400
Subject: [scikit-learn] AffinityProp to classify 2D points
Message-ID:

This is surely a well-studied problem, but I'm enjoying just playing with it for now. I have a bunch of 2D points (they are actually circles with possibly varying radii... later) and I'd like to devise a metric of sorts to quantify their arrangement. At first I was thinking K-means, but I don't know how many clusters there might be. So I began playing with AffinityPropagation (for the first time). The results weren't exactly what I was expecting, and I was wondering which parameters I should tweak to get different results.

In the 2 sample datasets/outcomes at https://github.com/rheiland/PhysiCell_tools/tree/master/cell_metrics, I have what I call "uniform" and "clumpy". Can someone offer a general explanation of why they both have ~25 clusters? I'm probably making false assumptions about the AP algorithm. Thanks for any insights. Next, I'll probably explore some image processing and graph algorithms, but I'd welcome other ideas.

Randy

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tmrsg11 at gmail.com  Fri Oct  4 12:48:24 2019
From: tmrsg11 at gmail.com (C W)
Date: Fri, 4 Oct 2019 12:48:24 -0400
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
In-Reply-To:
References:
Message-ID:

I'm getting some funny results. I am doing a regression decision tree, and the categorical predictors are assigned to integer levels.

The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not categories. The tree splits at 0.5 and 1.5. Am I doing one-hot encoding wrong? How does sklearn know internally that 0 vs. 1 is categorical, not numerical?

In R, for instance, you do as.factor(), which explicitly states the data type.

Thank you!
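Regarding Randy's AffinityPropagation question above: the number of exemplars AP finds is driven largely by the `preference` parameter, which defaults to the median of the pairwise similarities and so tends to produce many clusters for both "uniform" and "clumpy" layouts. A minimal sketch on synthetic 2D points (stand-ins for the linked datasets, which are not reproduced here):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Synthetic stand-in for the 2D cell centers; the real data is in the linked repo.
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.60, random_state=0)

# `preference` controls how many exemplars (clusters) emerge:
# more negative -> fewer clusters. None means "median similarity" (the default).
results = {}
for pref in (None, -100, -1000):
    ap = AffinityPropagation(preference=pref, damping=0.9, random_state=0).fit(X)
    results[pref] = len(ap.cluster_centers_indices_)
    print(f"preference={pref}: {results[pref]} clusters")
```

Sweeping `preference` (and raising `damping` if the messages oscillate) is the usual first thing to try when AP returns many more clusters than expected.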
On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote:
>
> On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>
> On Sat, 14 Sep 2019 at 20:59, C W wrote:
>
>> Thanks, Guillaume.
>> Column transformer looks pretty neat. I've also heard, though, that this
>> pipeline can be tedious to set up? Specifying what you want for every
>> feature is a pain.
>
> It would be interesting for us to know which part of the pipeline is
> tedious to set up, so we can see if we can improve something there.
> Do you mean that you would like to automatically detect the type of each
> feature (categorical/numerical) and apply a default encoder/scaling, such
> as discussed there:
> https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>
> IMO, from a user perspective, it would be cleaner in some cases, at the
> cost of blindly applying a black box, which might be dangerous.
>
> Also see
> https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
> which basically does that.
>
>> Javier,
>> Actually, you guessed right. My real data has only one numerical
>> variable; it looks more like this:
>>
>> Gender   Date        Income   Car      Attendance
>> Male     2019/3/01   10000    BMW      Yes
>> Female   2019/5/02   9000     Toyota   No
>> Male     2019/7/15   12000    Audi     Yes
>>
>> I am predicting income using all other categorical variables. Maybe it is
>> catboost!
>>
>> Thanks,
>>
>> M
>>
>> On Sat, Sep 14, 2019 at 9:25 AM Javier López wrote:
>>
>>> If you have datasets with many categorical features, and perhaps many
>>> categories, the tools in sklearn are quite limited, but there are
>>> alternative implementations of boosted trees that are designed with
>>> categorical features in mind. Take a look at catboost [1], which has an
>>> sklearn-compatible API.
>>>
>>> J
>>>
>>> [1] https://catboost.ai/
>>>
>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote:
>>>
>>>> Hello all,
>>>> I'm very confused. Can the decision tree module handle both continuous
>>>> and categorical features in the dataset? In this case, it's just CART
>>>> (Classification and Regression Trees).
>>>>
>>>> For example,
>>>> Gender   Age   Income   Car      Attendance
>>>> Male     30    10000    BMW      Yes
>>>> Female   35    9000     Toyota   No
>>>> Male     50    12000    Audi     Yes
>>>>
>>>> According to the documentation
>>>> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
>>>> it cannot!
>>>>
>>>> It says: "scikit-learn implementation does not support categorical
>>>> variables for now".
>>>>
>>>> Is this true? If not, can someone point me to an example? If yes, what
>>>> do people do?
>>>>
>>>> Thank you very much!
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mail at sebastianraschka.com  Fri Oct  4 13:03:17 2019
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Fri, 4 Oct 2019 12:03:17 -0500
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
In-Reply-To:
References:
Message-ID:

Hi,

> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1,
> Audi=2) as numerical values, not category. The tree splits at 0.5 and 1.5

that's not a one-hot encoding then. For an Audi data point, it should be

BMW=0
Toyota=0
Audi=1

for BMW

BMW=1
Toyota=0
Audi=0

and for Toyota

BMW=0
Toyota=1
Audi=0

The split threshold should then be at 0.5 for any of these features.

Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is an ordinal variable, so you have to do the one-hot encoding before you give the data to the decision tree.

Best,
Sebastian

> On Oct 4, 2019, at 11:48 AM, C W wrote:
>
>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tmrsg11 at gmail.com  Fri Oct  4 14:01:23 2019
From: tmrsg11 at gmail.com (C W)
Date: Fri, 4 Oct 2019 14:01:23 -0400
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
In-Reply-To:
References:
Message-ID:

Yes, you are right. It was 0.5 and 0.5 for the splits, not 1.5. So, typo on my part.

Looks like I did the one-hot encoding correctly. My new variable names are: car_Audi, car_BMW, etc.

But the decision tree is still mistaking the one-hot encoding for numerical input and splitting at 0.5. This is not right. Perhaps I'm doing something wrong?

Is there a good toy example on the sklearn website?
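There is no categorical-input tree example in the sklearn docs (trees there are numeric-only), but an end-to-end sketch is short. The column names follow the thread's toy table, and the ColumnTransformer + OneHotEncoder setup is the pipeline Guillaume suggested earlier, not an official example:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy data from the thread: predict Income from the categorical columns.
df = pd.DataFrame({
    "Gender":     ["Male", "Female", "Male"],
    "Car":        ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
    "Income":     [10000, 9000, 12000],
})
X, y = df[["Gender", "Car", "Attendance"]], df["Income"]

# One-hot encode the categoricals before they reach the tree; the tree
# itself only ever sees 0/1 columns and will split them at 0.5.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["Gender", "Car", "Attendance"])]
)
model = make_pipeline(pre, DecisionTreeRegressor(random_state=0))
model.fit(X, y)
print(model.predict(X))  # a fully grown tree fits these 3 distinct rows exactly
```

With real data you would of course hold out a test set; the point here is only that the encoder, not the tree, is what makes the categories categorical.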
I can only see this:
https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html

Thanks!

On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote:
>
> that's not a one-hot encoding then. [...] The split threshold should then
> be at 0.5 for any of these features. Based on your email, I think you
> were assuming that the DT does the one-hot encoding internally, which it
> doesn't.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From niourf at gmail.com  Fri Oct  4 14:44:04 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Fri, 4 Oct 2019 14:44:04 -0400
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
In-Reply-To:
References:
Message-ID: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com>

> But the decision tree is still mistaking the one-hot encoding for
> numerical input and splitting at 0.5. This is not right. Perhaps I'm
> doing something wrong?

You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical.

This is why we do one-hot encoding: so that a set of numerical (one-hot encoded) features can be treated as if they were just one categorical feature.

Nicolas

On 10/4/19 2:01 PM, C W wrote:
> Yes, you are right. It was 0.5 and 0.5 for the splits, not 1.5. So, typo
> on my part.
>
> Looks like I did the one-hot encoding correctly. My new variable names
> are: car_Audi, car_BMW, etc.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mail at sebastianraschka.com  Fri Oct  4 15:35:42 2019
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Fri, 4 Oct 2019 14:35:42 -0500
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
In-Reply-To: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> Message-ID: <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> Like Nicolas said, the 0.5 is just a workaround but will do the right thing on the one-hot encoded variables, here. You will find that the threshold is always at 0.5 for these variables. I.e., what it will do is to use the following conversion: treat as car_Audi=1 if car_Audi >= 0.5 treat as car_Audi=0 if car_Audi < 0.5 or, it may be treat as car_Audi=1 if car_Audi > 0.5 treat as car_Audi=0 if car_Audi <= 0.5 (Forgot which one sklearn is using, but either way. it will be fine.) Best, Sebastian > On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: > > >> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? > > You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical. > > This is why we do one-hot-encoding: so that a set of numerical (one hot encoded) features can be treated as if they were just one categorical feature. > > > > Nicolas > > On 10/4/19 2:01 PM, C W wrote: >> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on my part. >> >> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc. >> >> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >> >> Is there a good toy example on the sklearn website? I am only see this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html . >> >> Thanks! 
>> >> >> >> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka > wrote: >> Hi, >> >>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 and 1.5 >> >> that's not a onehot encoding then. >> >> For an Audi datapoint, it should be >> >> BMW=0 >> Toyota=0 >> Audi=1 >> >> for BMW >> >> BMW=1 >> Toyota=0 >> Audi=0 >> >> and for Toyota >> >> BMW=0 >> Toyota=1 >> Audi=0 >> >> The split threshold should then be at 0.5 for any of these features. >> >> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is a ordinal variable, so you have to do the onehot encoding before you give the data to the decision tree. >> >> Best, >> Sebastian >> >>> On Oct 4, 2019, at 11:48 AM, C W > wrote: >>> >>> I'm getting some funny results. I am doing a regression decision tree, the response variables are assigned to levels. >>> >>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. >>> >>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does the sklearn know internally 0 vs. 1 is categorical, not numerical? >>> >>> In R for instance, you do as.factor(), which explicitly states the data type. >>> >>> Thank you! >>> >>> >>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller > wrote: >>> >>> >>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: >>>> >>>> >>>> On Sat, 14 Sep 2019 at 20:59, C W > wrote: >>>> Thanks, Guillaume. >>>> Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? Specifying what you want for every feature is a pain. >>>> >>>> It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. 
>>>> Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a >>>> default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 >>>> >>>> IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box >>>> which might be dangerous. >>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor >>> Which basically does that. >>> >>> >>>> >>>> >>>> Jaiver, >>>> Actually, you guessed right. My real data has only one numerical variable, looks more like this: >>>> >>>> Gender Date Income Car Attendance >>>> Male 2019/3/01 10000 BMW Yes >>>> Female 2019/5/02 9000 Toyota No >>>> Male 2019/7/15 12000 Audi Yes >>>> >>>> I am predicting income using all other categorical variables. Maybe it is catboost! >>>> >>>> Thanks, >>>> >>>> M >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: >>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, >>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look >>>> at catboost [1], which has an sklearn-compatible API. >>>> >>>> J >>>> >>>> [1] https://catboost.ai/ >>>> On Sat, Sep 14, 2019 at 3:40 AM C W > wrote: >>>> Hello all, >>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). >>>> >>>> For example, >>>> Gender Age Income Car Attendance >>>> Male 30 10000 BMW Yes >>>> Female 35 9000 Toyota No >>>> Male 50 12000 Audi Yes >>>> >>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart , it can not! 
>>>> >>>> It says: "scikit-learn implementation does not support categorical variables for now". >>>> >>>> Is this true? If not, can someone point me to an example? If yes, what do people do? >>>> >>>> Thank you very much! >>>> >>>> >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>> >>>> >>>> -- >>>> Guillaume Lemaitre >>>> INRIA Saclay - Parietal team >>>> Center for Data Science Paris-Saclay >>>> https://glemaitre.github.io/ >>>> >>>> _______________________________________________ >>>> scikit-learn mailing list >>>> scikit-learn at python.org >>>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML 
attachment was scrubbed... URL: From tmrsg11 at gmail.com Fri Oct 4 18:34:50 2019 From: tmrsg11 at gmail.com (C W) Date: Fri, 4 Oct 2019 18:34:50 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> Message-ID:

I don't understand your answer.

Why, after one-hot-encoding, does the tree still split on greater than or less than 0.5? Does the sklearn website have a working example of categorical input?

Thanks!

On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote:
> Like Nicolas said, the 0.5 is just a workaround, but it will do the right
> thing on the one-hot encoded variables here. You will find that the
> threshold is always at 0.5 for these variables. I.e., what it will do is
> use the following conversion:
>
> treat as car_Audi=1 if car_Audi >= 0.5
> treat as car_Audi=0 if car_Audi < 0.5
>
> or, it may be
>
> treat as car_Audi=1 if car_Audi > 0.5
> treat as car_Audi=0 if car_Audi <= 0.5
>
> (I forgot which one sklearn uses, but either way it will be fine.)
>
> Best,
> Sebastian
>
> > On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote:
> >
> > > But the decision tree is still treating the one-hot encoding as numerical
> > > input and splitting at 0.5. This is not right. Perhaps I'm doing something wrong?
> >
> > You're not doing anything wrong, and neither is the tree. Trees don't
> > support categorical variables in sklearn, so everything is treated as
> > numerical.
> >
> > This is why we do one-hot-encoding: so that a set of numerical (one-hot
> > encoded) features can be treated as if they were just one categorical
> > feature.
> >
> > Nicolas
> > On 10/4/19 2:01 PM, C W wrote:
> > Yes, you are right. It was 0.5 and 0.5 for the split, not 1.5. So, a typo on my
> > part.
> >
> > Looks like I did the one-hot-encoding correctly. My new variable names are:
> > car_Audi, car_BMW, etc.
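For reference, the behavior being discussed is easy to reproduce. Below is a minimal sketch with made-up toy data mirroring the car_* columns (assumes pandas and scikit-learn are installed; the data values are invented for illustration):

```python
# Sketch: one-hot encode a toy Car column and fit a regression tree.
# Every split the tree makes on a 0/1 dummy column lands at 0.5.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "car": ["BMW", "Toyota", "Audi", "BMW", "Audi", "Toyota"],
    "income": [10000, 9000, 12000, 10500, 11800, 9200],
})

# get_dummies produces car_Audi, car_BMW, car_Toyota columns of 0s and 1s.
X = pd.get_dummies(df["car"], prefix="car", dtype=int)
tree = DecisionTreeRegressor(random_state=0).fit(X, df["income"])

# tree_.feature is negative for leaves; internal nodes carry a threshold.
thresholds = [t for f, t in zip(tree.tree_.feature, tree.tree_.threshold) if f >= 0]
print(thresholds)  # every entry is 0.5
```

Every threshold comes out at 0.5 because each dummy column only takes the values 0 and 1, and sklearn places the cut midway between adjacent observed values.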
> > But the decision tree is still treating the one-hot encoding as numerical input
> and splitting at 0.5. This is not right. Perhaps I'm doing something wrong?
>
> Is there a good toy example on the sklearn website? I only see this:
> https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
>
> Thanks!
>
> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka <
> mail at sebastianraschka.com> wrote:
>
>> Hi,
>>
>>> The funny part is: the tree is taking the one-hot-encoding (BMW=0, Toyota=1,
>>> Audi=2) as numerical values, not categories. The tree splits at 0.5 and 1.5
>>
>> that's not a one-hot encoding then.
>>
>> For an Audi datapoint, it should be
>>
>> BMW=0
>> Toyota=0
>> Audi=1
>>
>> for BMW
>>
>> BMW=1
>> Toyota=0
>> Audi=0
>>
>> and for Toyota
>>
>> BMW=0
>> Toyota=1
>> Audi=0
>>
>> The split threshold should then be at 0.5 for any of these features.
>>
>> Based on your email, I think you were assuming that the DT does the
>> one-hot encoding internally, which it doesn't. In practice, it is hard to
>> guess what is a nominal and what is an ordinal variable, so you have to do
>> the one-hot encoding before you give the data to the decision tree.
>>
>> Best,
>> Sebastian
>>
>> On Oct 4, 2019, at 11:48 AM, C W wrote:
>>
>> I'm getting some funny results. I am doing a regression decision tree;
>> the response variables are assigned to levels.
>>
>> The funny part is: the tree is taking the one-hot-encoding (BMW=0, Toyota=1,
>> Audi=2) as numerical values, not categories.
>>
>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How
>> does sklearn know internally that 0 vs. 1 is categorical, not numerical?
>>
>> In R for instance, you do as.factor(), which explicitly states the data
>> type.
>>
>> Thank you!
>>
>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller
>> wrote:
>>
>>> On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>>>
>>> On Sat, 14 Sep 2019 at 20:59, C W wrote:
>>>
>>>> Thanks, Guillaume.
>>>> Column transformer looks pretty neat. I've also heard, though, that this
>>>> pipeline can be tedious to set up? Specifying what you want for every
>>>> feature is a pain.
>>>>
>>> It would be interesting for us to know which part of the pipeline is tedious to
>>> set up, so we can see if we can improve something there.
>>> Do you mean that you would like to automatically detect which type
>>> of feature (categorical/numerical) you have and apply a
>>> default encoder/scaling such as discussed there:
>>> https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>>>
>>> IMO, from a user perspective, it would be cleaner in some cases at the
>>> cost of blindly applying a black box
>>> which might be dangerous.
>>>
>>> Also see
>>> https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
>>> Which basically does that.
>>>
>>>> Javier,
>>>> Actually, you guessed right. My real data has only one numerical
>>>> variable; it looks more like this:
>>>>
>>>> Gender Date Income Car Attendance
>>>> Male 2019/3/01 10000 BMW Yes
>>>> Female 2019/5/02 9000 Toyota No
>>>> Male 2019/7/15 12000 Audi Yes
>>>>
>>>> I am predicting income using all other categorical variables. Maybe the
>>>> answer is catboost!
>>>>
>>>> Thanks,
>>>>
>>>> M
>>>>
>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier López
>>>> wrote:
>>>>
>>>>> If you have datasets with many categorical features, and perhaps many
>>>>> categories, the tools in sklearn are quite limited,
>>>>> but there are alternative implementations of boosted trees that are
>>>>> designed with categorical features in mind. Take a look
>>>>> at catboost [1], which has an sklearn-compatible API.
>>>>>
>>>>> J
>>>>>
>>>>> [1] https://catboost.ai/
>>>>>
>>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote:
>>>>>
>>>>>> Hello all,
>>>>>> I'm very confused. Can the decision tree module handle both
>>>>>> continuous and categorical features in the dataset?
In this case, it's just
>>>>>> CART (Classification and Regression Trees).
>>>>>>
>>>>>> For example,
>>>>>> Gender Age Income Car Attendance
>>>>>> Male 30 10000 BMW Yes
>>>>>> Female 35 9000 Toyota No
>>>>>> Male 50 12000 Audi Yes
>>>>>>
>>>>>> According to the documentation
>>>>>> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
>>>>>> it cannot!
>>>>>>
>>>>>> It says: "scikit-learn implementation does not support categorical
>>>>>> variables for now".
>>>>>>
>>>>>> Is this true? If not, can someone point me to an example? If yes,
>>>>>> what do people do?
>>>>>>
>>>>>> Thank you very much!
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>> --
>>> Guillaume Lemaitre
>>> INRIA Saclay - Parietal team
>>> Center for Data Science Paris-Saclay
>>> https://glemaitre.github.io/
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> 
https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Fri Oct 4 18:50:41 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 4 Oct 2019 17:50:41 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> Message-ID:

Not sure if there's a website for that. In any case, to explain this differently: as discussed earlier, sklearn assumes continuous features for decision trees. So, it will use a binary threshold for splitting along a feature attribute. In other words, it cannot do something like

if x == 1 then right child node
else left child node

Instead, what it does is

if x >= 0.5 then right child node
else left child node

These are basically equivalent, as you can see when you just plug in the values 0 and 1 for x.

Best,
Sebastian

> On Oct 4, 2019, at 5:34 PM, C W wrote:
>
> I don't understand your answer.
>
> Why, after one-hot-encoding, does the tree still split on greater than or less
> than 0.5? Does the sklearn website have a working example of categorical input?
>
> Thanks!
>
> On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote:
> Like Nicolas said, the 0.5 is just a workaround but will do the right thing on the one-hot encoded variables, here.
You will find that the threshold is always at 0.5 for these variables. I.e., what it will do is to use the following conversion: > > treat as car_Audi=1 if car_Audi >= 0.5 > treat as car_Audi=0 if car_Audi < 0.5 > > or, it may be > > treat as car_Audi=1 if car_Audi > 0.5 > treat as car_Audi=0 if car_Audi <= 0.5 > > (Forgot which one sklearn is using, but either way. it will be fine.) > > Best, > Sebastian > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: >> >> >>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >> >> You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical. >> >> This is why we do one-hot-encoding: so that a set of numerical (one hot encoded) features can be treated as if they were just one categorical feature. >> >> >> >> Nicolas >> >> On 10/4/19 2:01 PM, C W wrote: >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on my part. >>> >>> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc. >>> >>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >>> >>> Is there a good toy example on the sklearn website? I am only see this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html. >>> >>> Thanks! >>> >>> >>> >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote: >>> Hi, >>> >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 and 1.5 >>> >>> that's not a onehot encoding then. 
>>> >>> For an Audi datapoint, it should be >>> >>> BMW=0 >>> Toyota=0 >>> Audi=1 >>> >>> for BMW >>> >>> BMW=1 >>> Toyota=0 >>> Audi=0 >>> >>> and for Toyota >>> >>> BMW=0 >>> Toyota=1 >>> Audi=0 >>> >>> The split threshold should then be at 0.5 for any of these features. >>> >>> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is a ordinal variable, so you have to do the onehot encoding before you give the data to the decision tree. >>> >>> Best, >>> Sebastian >>> >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: >>>> >>>> I'm getting some funny results. I am doing a regression decision tree, the response variables are assigned to levels. >>>> >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. >>>> >>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does the sklearn know internally 0 vs. 1 is categorical, not numerical? >>>> >>>> In R for instance, you do as.factor(), which explicitly states the data type. >>>> >>>> Thank you! >>>> >>>> >>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote: >>>> >>>> >>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: >>>>> >>>>> >>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: >>>>> Thanks, Guillaume. >>>>> Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? Specifying what you want for every feature is a pain. >>>>> >>>>> It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. 
>>>>> Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a >>>>> default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 >>>>> >>>>> IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box >>>>> which might be dangerous. >>>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor >>>> Which basically does that. >>>> >>>> >>>>> >>>>> >>>>> Jaiver, >>>>> Actually, you guessed right. My real data has only one numerical variable, looks more like this: >>>>> >>>>> Gender Date Income Car Attendance >>>>> Male 2019/3/01 10000 BMW Yes >>>>> Female 2019/5/02 9000 Toyota No >>>>> Male 2019/7/15 12000 Audi Yes >>>>> >>>>> I am predicting income using all other categorical variables. Maybe it is catboost! >>>>> >>>>> Thanks, >>>>> >>>>> M >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: >>>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, >>>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look >>>>> at catboost [1], which has an sklearn-compatible API. >>>>> >>>>> J >>>>> >>>>> [1] https://catboost.ai/ >>>>> >>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: >>>>> Hello all, >>>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). >>>>> >>>>> For example, >>>>> Gender Age Income Car Attendance >>>>> Male 30 10000 BMW Yes >>>>> Female 35 9000 Toyota No >>>>> Male 50 12000 Audi Yes >>>>> >>>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! 
> >>>>> It says: "scikit-learn implementation does not support categorical
> >>>>> variables for now".
> >>>>>
> >>>>> Is this true? If not, can someone point me to an example? If yes,
> >>>>> what do people do?
> >>>>>
> >>>>> Thank you very much!
> >>>>>
> >>>>> _______________________________________________
> >>>>> scikit-learn mailing list
> >>>>> scikit-learn at python.org
> >>>>> https://mail.python.org/mailman/listinfo/scikit-learn
> >>>>>
> >>>>> --
> >>>>> Guillaume Lemaitre
> >>>>> INRIA Saclay - Parietal team
> >>>>> Center for Data Science Paris-Saclay
> >>>>> https://glemaitre.github.io/
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> 
https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Fri Oct 4 19:33:15 2019 From: tmrsg11 at gmail.com (C W) Date: Fri, 4 Oct 2019 19:33:15 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> Message-ID:

Thanks Sebastian, I think I get it.

It's just that I have never seen it this way. Quite different from what I'm used to in Elements of Statistical Learning.

On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka wrote:
> Not sure if there's a website for that. In any case, to explain this
> differently: as discussed earlier, sklearn assumes continuous features for
> decision trees. So, it will use a binary threshold for splitting along a
> feature attribute. In other words, it cannot do something like
>
> if x == 1 then right child node
> else left child node
>
> Instead, what it does is
>
> if x >= 0.5 then right child node
> else left child node
>
> These are basically equivalent, as you can see when you just plug in the values
> 0 and 1 for x.
>
> Best,
> Sebastian
>
> > On Oct 4, 2019, at 5:34 PM, C W wrote:
> >
> > I don't understand your answer.
> >
> > Why, after one-hot-encoding, does the tree still split on greater than or less
> than 0.5? Does the sklearn website have a working example of categorical input?
> >
> > Thanks!
> >
> > On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka <
> mail at sebastianraschka.com> wrote:
> > Like Nicolas said, the 0.5 is just a workaround but will do the right
> thing on the one-hot encoded variables, here.
You will find that the > threshold is always at 0.5 for these variables. I.e., what it will do is to > use the following conversion: > > > > treat as car_Audi=1 if car_Audi >= 0.5 > > treat as car_Audi=0 if car_Audi < 0.5 > > > > or, it may be > > > > treat as car_Audi=1 if car_Audi > 0.5 > > treat as car_Audi=0 if car_Audi <= 0.5 > > > > (Forgot which one sklearn is using, but either way. it will be fine.) > > > > Best, > > Sebastian > > > > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: > >> > >> > >>> But, decision tree is still mistaking one-hot-encoding as numerical > input and split at 0.5. This is not right. Perhaps, I'm doing something > wrong? > >> > >> You're not doing anything wrong, and neither is the tree. Trees don't > support categorical variables in sklearn, so everything is treated as > numerical. > >> > >> This is why we do one-hot-encoding: so that a set of numerical (one hot > encoded) features can be treated as if they were just one categorical > feature. > >> > >> > >> > >> Nicolas > >> > >> On 10/4/19 2:01 PM, C W wrote: > >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on > my part. > >>> > >>> Looks like I did one-hot-encoding correctly. My new variable names > are: car_Audi, car_BMW, etc. > >>> > >>> But, decision tree is still mistaking one-hot-encoding as numerical > input and split at 0.5. This is not right. Perhaps, I'm doing something > wrong? > >>> > >>> Is there a good toy example on the sklearn website? I am only see > this: > https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html > . > >>> > >>> Thanks! > >>> > >>> > >>> > >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > >>> Hi, > >>> > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 > and 1.5 > >>> > >>> that's not a onehot encoding then. 
> >>> > >>> For an Audi datapoint, it should be > >>> > >>> BMW=0 > >>> Toyota=0 > >>> Audi=1 > >>> > >>> for BMW > >>> > >>> BMW=1 > >>> Toyota=0 > >>> Audi=0 > >>> > >>> and for Toyota > >>> > >>> BMW=0 > >>> Toyota=1 > >>> Audi=0 > >>> > >>> The split threshold should then be at 0.5 for any of these features. > >>> > >>> Based on your email, I think you were assuming that the DT does the > one-hot encoding internally, which it doesn't. In practice, it is hard to > guess what is a nominal and what is a ordinal variable, so you have to do > the onehot encoding before you give the data to the decision tree. > >>> > >>> Best, > >>> Sebastian > >>> > >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: > >>>> > >>>> I'm getting some funny results. I am doing a regression decision > tree, the response variables are assigned to levels. > >>>> > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category. > >>>> > >>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? > How does the sklearn know internally 0 vs. 1 is categorical, not numerical? > >>>> > >>>> In R for instance, you do as.factor(), which explicitly states the > data type. > >>>> > >>>> Thank you! > >>>> > >>>> > >>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller > wrote: > >>>> > >>>> > >>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: > >>>>> > >>>>> > >>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: > >>>>> Thanks, Guillaume. > >>>>> Column transformer looks pretty neat. I've also heard though, this > pipeline can be tedious to set up? Specifying what you want for every > feature is a pain. > >>>>> > >>>>> It would be interesting for us which part of the pipeline is tedious > to set up to know if we can improve something there. 
> >>>>> Do you mean, that you would like to automatically detect of which > type of feature (categorical/numerical) and apply a > >>>>> default encoder/scaling such as discuss there: > https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 > >>>>> > >>>>> IMO, one a user perspective, it would be cleaner in some cases at > the cost of applying blindly a black box > >>>>> which might be dangerous. > >>>> Also see > https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor > >>>> Which basically does that. > >>>> > >>>> > >>>>> > >>>>> > >>>>> Jaiver, > >>>>> Actually, you guessed right. My real data has only one numerical > variable, looks more like this: > >>>>> > >>>>> Gender Date Income Car Attendance > >>>>> Male 2019/3/01 10000 BMW Yes > >>>>> Female 2019/5/02 9000 Toyota No > >>>>> Male 2019/7/15 12000 Audi Yes > >>>>> > >>>>> I am predicting income using all other categorical variables. Maybe > it is catboost! > >>>>> > >>>>> Thanks, > >>>>> > >>>>> M > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: > >>>>> If you have datasets with many categorical features, and perhaps > many categories, the tools in sklearn are quite limited, > >>>>> but there are alternative implementations of boosted trees that are > designed with categorical features in mind. Take a look > >>>>> at catboost [1], which has an sklearn-compatible API. > >>>>> > >>>>> J > >>>>> > >>>>> [1] https://catboost.ai/ > >>>>> > >>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > >>>>> Hello all, > >>>>> I'm very confused. Can the decision tree module handle both > continuous and categorical features in the dataset? In this case, it's just > CART (Classification and Regression Trees). 
> >>>>> For example,
> >>>>> Gender Age Income Car Attendance
> >>>>> Male 30 10000 BMW Yes
> >>>>> Female 35 9000 Toyota No
> >>>>> Male 50 12000 Audi Yes
> >>>>>
> >>>>> According to the documentation
> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
> it cannot!
> >>>>>
> >>>>> It says: "scikit-learn implementation does not support categorical
> variables for now".
> >>>>>
> >>>>> Is this true? If not, can someone point me to an example? If yes,
> what do people do?
> >>>>>
> >>>>> Thank you very much!
> >>>>>
> >>>>> _______________________________________________
> >>>>> scikit-learn mailing list
> >>>>> scikit-learn at python.org
> >>>>> https://mail.python.org/mailman/listinfo/scikit-learn
> >>>>>
> >>>>> --
> >>>>> Guillaume Lemaitre
> >>>>> INRIA Saclay - Parietal team
> >>>>> Center for Data Science Paris-Saclay
> >>>>> https://glemaitre.github.io/
-------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Fri Oct 4 21:17:54 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 4 Oct 2019 20:17:54 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> Message-ID: <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com>

Yeah, think of it more as a computational workaround for achieving the same thing more efficiently (although it looks inelegant/weird); something like that wouldn't be mentioned in textbooks.

Best,
Sebastian

> On Oct 4, 2019, at 6:33 PM, C W wrote:
>
> Thanks Sebastian, I think I get it.
>
> It's just that I have never seen it this way. Quite different from what I'm used to in Elements of Statistical Learning.
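The equivalence Sebastian describes can be checked directly: on features restricted to {0, 1}, the categorical test x == 1 and the numeric threshold x >= 0.5 route every sample to the same child node. A tiny sketch (illustrative toy functions, not sklearn's internals):

```python
# For 0/1 one-hot values, `x == 1` and `x >= 0.5` make identical routing
# decisions, which is why the numeric workaround is harmless here.
def route_categorical(x):
    return "right" if x == 1 else "left"

def route_threshold(x):
    return "right" if x >= 0.5 else "left"

for x in (0, 1):
    assert route_categorical(x) == route_threshold(x)
print("identical routing on {0, 1}")
```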
> > On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka wrote: > Not sure if there's a website for that. In any case, to explain this differently, as discussed earlier sklearn assumes continuous features for decision trees. So, it will use a binary threshold for splitting along a feature attribute. In other words, it cannot do sth like > > if x == 1 then right child node > else left child node > > Instead, what it does is > > if x >= 0.5 then right child node > else left child node > > These are basically equivalent as you can see when you just plug in values 0 and 1 for x. > > Best, > Sebastian > > > On Oct 4, 2019, at 5:34 PM, C W wrote: > > > > I don't understand your answer. > > > > Why after one-hot-encoding it still outputs greater than 0.5 or less than? Does sklearn website have a working example on categorical input? > > > > Thanks! > > > > On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote: > > Like Nicolas said, the 0.5 is just a workaround but will do the right thing on the one-hot encoded variables, here. You will find that the threshold is always at 0.5 for these variables. I.e., what it will do is to use the following conversion: > > > > treat as car_Audi=1 if car_Audi >= 0.5 > > treat as car_Audi=0 if car_Audi < 0.5 > > > > or, it may be > > > > treat as car_Audi=1 if car_Audi > 0.5 > > treat as car_Audi=0 if car_Audi <= 0.5 > > > > (Forgot which one sklearn is using, but either way. it will be fine.) > > > > Best, > > Sebastian > > > > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: > >> > >> > >>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? > >> > >> You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical. 
> >> > >> This is why we do one-hot-encoding: so that a set of numerical (one hot encoded) features can be treated as if they were just one categorical feature. > >> > >> > >> > >> Nicolas > >> > >> On 10/4/19 2:01 PM, C W wrote: > >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on my part. > >>> > >>> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc. > >>> > >>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? > >>> > >>> Is there a good toy example on the sklearn website? I am only see this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html. > >>> > >>> Thanks! > >>> > >>> > >>> > >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote: > >>> Hi, > >>> > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 and 1.5 > >>> > >>> that's not a onehot encoding then. > >>> > >>> For an Audi datapoint, it should be > >>> > >>> BMW=0 > >>> Toyota=0 > >>> Audi=1 > >>> > >>> for BMW > >>> > >>> BMW=1 > >>> Toyota=0 > >>> Audi=0 > >>> > >>> and for Toyota > >>> > >>> BMW=0 > >>> Toyota=1 > >>> Audi=0 > >>> > >>> The split threshold should then be at 0.5 for any of these features. > >>> > >>> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is a ordinal variable, so you have to do the onehot encoding before you give the data to the decision tree. > >>> > >>> Best, > >>> Sebastian > >>> > >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: > >>>> > >>>> I'm getting some funny results. I am doing a regression decision tree, the response variables are assigned to levels. 
> >>>> > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. > >>>> > >>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does the sklearn know internally 0 vs. 1 is categorical, not numerical? > >>>> > >>>> In R for instance, you do as.factor(), which explicitly states the data type. > >>>> > >>>> Thank you! > >>>> > >>>> > >>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote: > >>>> > >>>> > >>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: > >>>>> > >>>>> > >>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: > >>>>> Thanks, Guillaume. > >>>>> Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? Specifying what you want for every feature is a pain. > >>>>> > >>>>> It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. > >>>>> Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a > >>>>> default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 > >>>>> > >>>>> IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box > >>>>> which might be dangerous. > >>>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor > >>>> Which basically does that. > >>>> > >>>> > >>>>> > >>>>> > >>>>> Jaiver, > >>>>> Actually, you guessed right. My real data has only one numerical variable, looks more like this: > >>>>> > >>>>> Gender Date Income Car Attendance > >>>>> Male 2019/3/01 10000 BMW Yes > >>>>> Female 2019/5/02 9000 Toyota No > >>>>> Male 2019/7/15 12000 Audi Yes > >>>>> > >>>>> I am predicting income using all other categorical variables. Maybe it is catboost! 
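For what it's worth, the ColumnTransformer setup discussed above need not be verbose. A sketch on a toy frame mirroring the example data (all values invented):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
    "Income": [10000, 9000, 12000],
})

# One-hot encode the listed categorical columns; any remaining
# (numeric) columns are passed through unchanged.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["Gender", "Car", "Attendance"])],
    remainder="passthrough",
)
model = make_pipeline(pre, DecisionTreeRegressor(random_state=0))
model.fit(df.drop(columns="Income"), df["Income"])

# Three distinct rows -> the unpruned tree fits them exactly.
print(model.predict(df.drop(columns="Income")).tolist())
```

So the per-feature work reduces to maintaining one list of categorical column names; the transformer handles the rest.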
> >>>>> > >>>>> Thanks, > >>>>> > >>>>> M > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: > >>>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, > >>>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look > >>>>> at catboost [1], which has an sklearn-compatible API. > >>>>> > >>>>> J > >>>>> > >>>>> [1] https://catboost.ai/ > >>>>> > >>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > >>>>> Hello all, > >>>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). > >>>>> > >>>>> For example, > >>>>> Gender Age Income Car Attendance > >>>>> Male 30 10000 BMW Yes > >>>>> Female 35 9000 Toyota No > >>>>> Male 50 12000 Audi Yes > >>>>> > >>>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! > >>>>> > >>>>> It says: "scikit-learn implementation does not support categorical variables for now". > >>>>> > >>>>> Is this true? If not, can someone point me to an example? If yes, what do people do? > >>>>> > >>>>> Thank you very much! 
> >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> scikit-learn mailing list > >>>>> scikit-learn at python.org > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>>>> _______________________________________________ > >>>>> scikit-learn mailing list > >>>>> scikit-learn at python.org > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>>>> _______________________________________________ > >>>>> scikit-learn mailing list > >>>>> scikit-learn at python.org > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>>>> > >>>>> > >>>>> -- > >>>>> Guillaume Lemaitre > >>>>> INRIA Saclay - Parietal team > >>>>> Center for Data Science Paris-Saclay > >>>>> https://glemaitre.github.io/ > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> scikit-learn mailing list > >>>>> > >>>>> scikit-learn at python.org > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>>> > >>>> _______________________________________________ > >>>> scikit-learn mailing list > >>>> scikit-learn at python.org > >>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>>> _______________________________________________ > >>>> scikit-learn mailing list > >>>> scikit-learn at python.org > >>>> https://mail.python.org/mailman/listinfo/scikit-learn > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >>> > >>> > >>> _______________________________________________ > >>> scikit-learn mailing list > >>> > >>> scikit-learn at python.org > >>> https://mail.python.org/mailman/listinfo/scikit-learn > >> _______________________________________________ > >> scikit-learn mailing list > >> scikit-learn at python.org > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > 
scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn From tmrsg11 at gmail.com Fri Oct 4 23:09:11 2019 From: tmrsg11 at gmail.com (C W) Date: Fri, 4 Oct 2019 23:09:11 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com> References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com> Message-ID: On a separate note, what do you use for plotting? I found graphviz, but you have to first save it as a png on your computer. That's a lot of work for just one plot. Is there something like matplotlib? Thanks! On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka wrote: > Yeah, think of it more as a computational workaround for achieving the > same thing more efficiently (although it looks inelegant/weird) -- something > like that wouldn't be mentioned in textbooks. > > Best, > Sebastian > > > On Oct 4, 2019, at 6:33 PM, C W wrote: > > > > Thanks Sebastian, I think I get it. > > > > It's just that I have never seen it this way. Quite different from what I'm > used to in The Elements of Statistical Learning. > > > > On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka < mail at sebastianraschka.com> wrote: > > Not sure if there's a website for that. 
In any case, to explain this > differently, as discussed earlier sklearn assumes continuous features for > decision trees. So, it will use a binary threshold for splitting along a > feature attribute. In other words, it cannot do sth like > > > > if x == 1 then right child node > > else left child node > > > > Instead, what it does is > > > > if x >= 0.5 then right child node > > else left child node > > > > These are basically equivalent as you can see when you just plug in > values 0 and 1 for x. > > > > Best, > > Sebastian > > > > > On Oct 4, 2019, at 5:34 PM, C W wrote: > > > > > > I don't understand your answer. > > > > > > Why after one-hot-encoding it still outputs greater than 0.5 or less > than? Does sklearn website have a working example on categorical input? > > > > > > Thanks! > > > > > > On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > Like Nicolas said, the 0.5 is just a workaround but will do the right > thing on the one-hot encoded variables, here. You will find that the > threshold is always at 0.5 for these variables. I.e., what it will do is to > use the following conversion: > > > > > > treat as car_Audi=1 if car_Audi >= 0.5 > > > treat as car_Audi=0 if car_Audi < 0.5 > > > > > > or, it may be > > > > > > treat as car_Audi=1 if car_Audi > 0.5 > > > treat as car_Audi=0 if car_Audi <= 0.5 > > > > > > (Forgot which one sklearn is using, but either way. it will be fine.) > > > > > > Best, > > > Sebastian > > > > > > > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: > > >> > > >> > > >>> But, decision tree is still mistaking one-hot-encoding as numerical > input and split at 0.5. This is not right. Perhaps, I'm doing something > wrong? > > >> > > >> You're not doing anything wrong, and neither is the tree. Trees don't > support categorical variables in sklearn, so everything is treated as > numerical. 
> > >> > > >> This is why we do one-hot-encoding: so that a set of numerical (one > hot encoded) features can be treated as if they were just one categorical > feature. > > >> > > >> > > >> > > >> Nicolas > > >> > > >> On 10/4/19 2:01 PM, C W wrote: > > >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo > on my part. > > >>> > > >>> Looks like I did one-hot-encoding correctly. My new variable names > are: car_Audi, car_BMW, etc. > > >>> > > >>> But, decision tree is still mistaking one-hot-encoding as numerical > input and split at 0.5. This is not right. Perhaps, I'm doing something > wrong? > > >>> > > >>> Is there a good toy example on the sklearn website? I am only see > this: > https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html > . > > >>> > > >>> Thanks! > > >>> > > >>> > > >>> > > >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > >>> Hi, > > >>> > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 > and 1.5 > > >>> > > >>> that's not a onehot encoding then. > > >>> > > >>> For an Audi datapoint, it should be > > >>> > > >>> BMW=0 > > >>> Toyota=0 > > >>> Audi=1 > > >>> > > >>> for BMW > > >>> > > >>> BMW=1 > > >>> Toyota=0 > > >>> Audi=0 > > >>> > > >>> and for Toyota > > >>> > > >>> BMW=0 > > >>> Toyota=1 > > >>> Audi=0 > > >>> > > >>> The split threshold should then be at 0.5 for any of these features. > > >>> > > >>> Based on your email, I think you were assuming that the DT does the > one-hot encoding internally, which it doesn't. In practice, it is hard to > guess what is a nominal and what is a ordinal variable, so you have to do > the onehot encoding before you give the data to the decision tree. > > >>> > > >>> Best, > > >>> Sebastian > > >>> > > >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: > > >>>> > > >>>> I'm getting some funny results. 
I am doing a regression decision > tree, the response variables are assigned to levels. > > >>>> > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category. > > >>>> > > >>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? > How does the sklearn know internally 0 vs. 1 is categorical, not numerical? > > >>>> > > >>>> In R for instance, you do as.factor(), which explicitly states the > data type. > > >>>> > > >>>> Thank you! > > >>>> > > >>>> > > >>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller > wrote: > > >>>> > > >>>> > > >>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: > > >>>>> > > >>>>> > > >>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: > > >>>>> Thanks, Guillaume. > > >>>>> Column transformer looks pretty neat. I've also heard though, this > pipeline can be tedious to set up? Specifying what you want for every > feature is a pain. > > >>>>> > > >>>>> It would be interesting for us which part of the pipeline is > tedious to set up to know if we can improve something there. > > >>>>> Do you mean, that you would like to automatically detect of which > type of feature (categorical/numerical) and apply a > > >>>>> default encoder/scaling such as discuss there: > https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 > > >>>>> > > >>>>> IMO, one a user perspective, it would be cleaner in some cases at > the cost of applying blindly a black box > > >>>>> which might be dangerous. > > >>>> Also see > https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor > > >>>> Which basically does that. > > >>>> > > >>>> > > >>>>> > > >>>>> > > >>>>> Jaiver, > > >>>>> Actually, you guessed right. 
My real data has only one numerical > variable, looks more like this: > > >>>>> > > >>>>> Gender Date Income Car Attendance > > >>>>> Male 2019/3/01 10000 BMW Yes > > >>>>> Female 2019/5/02 9000 Toyota No > > >>>>> Male 2019/7/15 12000 Audi Yes > > >>>>> > > >>>>> I am predicting income using all other categorical variables. > Maybe it is catboost! > > >>>>> > > >>>>> Thanks, > > >>>>> > > >>>>> M > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez > wrote: > > >>>>> If you have datasets with many categorical features, and perhaps > many categories, the tools in sklearn are quite limited, > > >>>>> but there are alternative implementations of boosted trees that > are designed with categorical features in mind. Take a look > > >>>>> at catboost [1], which has an sklearn-compatible API. > > >>>>> > > >>>>> J > > >>>>> > > >>>>> [1] https://catboost.ai/ > > >>>>> > > >>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > > >>>>> Hello all, > > >>>>> I'm very confused. Can the decision tree module handle both > continuous and categorical features in the dataset? In this case, it's just > CART (Classification and Regression Trees). > > >>>>> > > >>>>> For example, > > >>>>> Gender Age Income Car Attendance > > >>>>> Male 30 10000 BMW Yes > > >>>>> Female 35 9000 Toyota No > > >>>>> Male 50 12000 Audi Yes > > >>>>> > > >>>>> According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > >>>>> > > >>>>> It says: "scikit-learn implementation does not support categorical > variables for now". > > >>>>> > > >>>>> Is this true? If not, can someone point me to an example? If yes, > what do people do? > > >>>>> > > >>>>> Thank you very much! 
https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at sebastianraschka.com Fri Oct 4 23:28:46 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Fri, 4 Oct 2019 22:28:46 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com> Message-ID: <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com> The docs show a way such that you don't need to write it as png file using tree.plot_tree: https://scikit-learn.org/stable/modules/tree.html#classification I don't remember why, but I think I had problems with that in the past (I think it didn't look so nice visually, but don't remember), which is why I still stick to graphviz. 
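Since scikit-learn 0.21, the matplotlib route is built in as sklearn.tree.plot_tree: it draws onto a regular Axes, so no graphviz install and no intermediate .dot or .png file is needed. A minimal sketch (iris used only as stand-in data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this when plotting interactively
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# plot_tree renders onto a matplotlib Axes and returns one
# annotation artist per drawn node, so it composes with
# subplots, titles, and savefig like any other figure.
fig, ax = plt.subplots(figsize=(8, 5))
annotations = plot_tree(clf, filled=True, ax=ax)
fig.savefig("tree.png")  # optional; plt.show() in an interactive session
```

Saving to a file here is a choice, not a requirement of the API.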
For my use cases, it's not much hassle -- it used to be a bit of a hassle to get GraphViz working, but now you can do conda install pydotplus conda install graphviz Coincidentally, I just made an example for a lecture I was teaching on Tue: https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb Best, Sebastian > On Oct 4, 2019, at 10:09 PM, C W wrote: > > On a separate note, what do you use for plotting? > > I found graphviz, but you have to first save it as a png on your computer. That's a lot work for just one plot. Is there something like a matplotlib? > > Thanks! > > On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka wrote: > Yeah, think of it more as a computational workaround for achieving the same thing more efficiently (although it looks inelegant/weird)-- something like that wouldn't be mentioned in textbooks. > > Best, > Sebastian > > > On Oct 4, 2019, at 6:33 PM, C W wrote: > > > > Thanks Sebastian, I think I get it. > > > > It's just have never seen it this way. Quite different from what I'm used in Elements of Statistical Learning. > > > > On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka wrote: > > Not sure if there's a website for that. In any case, to explain this differently, as discussed earlier sklearn assumes continuous features for decision trees. So, it will use a binary threshold for splitting along a feature attribute. In other words, it cannot do sth like > > > > if x == 1 then right child node > > else left child node > > > > Instead, what it does is > > > > if x >= 0.5 then right child node > > else left child node > > > > These are basically equivalent as you can see when you just plug in values 0 and 1 for x. > > > > Best, > > Sebastian > > > > > On Oct 4, 2019, at 5:34 PM, C W wrote: > > > > > > I don't understand your answer. > > > > > > Why after one-hot-encoding it still outputs greater than 0.5 or less than? Does sklearn website have a working example on categorical input? 
> > > > > > Thanks! > > > > > > On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote: > > > Like Nicolas said, the 0.5 is just a workaround but will do the right thing on the one-hot encoded variables, here. You will find that the threshold is always at 0.5 for these variables. I.e., what it will do is to use the following conversion: > > > > > > treat as car_Audi=1 if car_Audi >= 0.5 > > > treat as car_Audi=0 if car_Audi < 0.5 > > > > > > or, it may be > > > > > > treat as car_Audi=1 if car_Audi > 0.5 > > > treat as car_Audi=0 if car_Audi <= 0.5 > > > > > > (Forgot which one sklearn is using, but either way. it will be fine.) > > > > > > Best, > > > Sebastian > > > > > > > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: > > >> > > >> > > >>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? > > >> > > >> You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical. > > >> > > >> This is why we do one-hot-encoding: so that a set of numerical (one hot encoded) features can be treated as if they were just one categorical feature. > > >> > > >> > > >> > > >> Nicolas > > >> > > >> On 10/4/19 2:01 PM, C W wrote: > > >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on my part. > > >>> > > >>> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc. > > >>> > > >>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? > > >>> > > >>> Is there a good toy example on the sklearn website? I am only see this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html. > > >>> > > >>> Thanks! 
> > >>> > > >>> > > >>> > > >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote: > > >>> Hi, > > >>> > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 and 1.5 > > >>> > > >>> that's not a onehot encoding then. > > >>> > > >>> For an Audi datapoint, it should be > > >>> > > >>> BMW=0 > > >>> Toyota=0 > > >>> Audi=1 > > >>> > > >>> for BMW > > >>> > > >>> BMW=1 > > >>> Toyota=0 > > >>> Audi=0 > > >>> > > >>> and for Toyota > > >>> > > >>> BMW=0 > > >>> Toyota=1 > > >>> Audi=0 > > >>> > > >>> The split threshold should then be at 0.5 for any of these features. > > >>> > > >>> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is a ordinal variable, so you have to do the onehot encoding before you give the data to the decision tree. > > >>> > > >>> Best, > > >>> Sebastian > > >>> > > >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: > > >>>> > > >>>> I'm getting some funny results. I am doing a regression decision tree, the response variables are assigned to levels. > > >>>> > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. > > >>>> > > >>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does the sklearn know internally 0 vs. 1 is categorical, not numerical? > > >>>> > > >>>> In R for instance, you do as.factor(), which explicitly states the data type. > > >>>> > > >>>> Thank you! > > >>>> > > >>>> > > >>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote: > > >>>> > > >>>> > > >>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: > > >>>>> > > >>>>> > > >>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: > > >>>>> Thanks, Guillaume. > > >>>>> Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? 
Specifying what you want for every feature is a pain. > > >>>>> > > >>>>> It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. > > >>>>> Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a > > >>>>> default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 > > >>>>> > > >>>>> IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box > > >>>>> which might be dangerous. > > >>>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor > > >>>> Which basically does that. > > >>>> > > >>>> > > >>>>> > > >>>>> > > >>>>> Jaiver, > > >>>>> Actually, you guessed right. My real data has only one numerical variable, looks more like this: > > >>>>> > > >>>>> Gender Date Income Car Attendance > > >>>>> Male 2019/3/01 10000 BMW Yes > > >>>>> Female 2019/5/02 9000 Toyota No > > >>>>> Male 2019/7/15 12000 Audi Yes > > >>>>> > > >>>>> I am predicting income using all other categorical variables. Maybe it is catboost! > > >>>>> > > >>>>> Thanks, > > >>>>> > > >>>>> M > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: > > >>>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, > > >>>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look > > >>>>> at catboost [1], which has an sklearn-compatible API. > > >>>>> > > >>>>> J > > >>>>> > > >>>>> [1] https://catboost.ai/ > > >>>>> > > >>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > > >>>>> Hello all, > > >>>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? 
In this case, it's just CART (Classification and Regression Trees). > > >>>>> > > >>>>> For example, > > >>>>> Gender Age Income Car Attendance > > >>>>> Male 30 10000 BMW Yes > > >>>>> Female 35 9000 Toyota No > > >>>>> Male 50 12000 Audi Yes > > >>>>> > > >>>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! > > >>>>> > > >>>>> It says: "scikit-learn implementation does not support categorical variables for now". > > >>>>> > > >>>>> Is this true? If not, can someone point me to an example? If yes, what do people do? > > >>>>> > > >>>>> Thank you very much! > > >>>>> > > >>>>> > > >>>>> > > >>>>> _______________________________________________ > > >>>>> scikit-learn mailing list > > >>>>> scikit-learn at python.org > > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>>>> _______________________________________________ > > >>>>> scikit-learn mailing list > > >>>>> scikit-learn at python.org > > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>>>> _______________________________________________ > > >>>>> scikit-learn mailing list > > >>>>> scikit-learn at python.org > > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>>>> > > >>>>> > > >>>>> -- > > >>>>> Guillaume Lemaitre > > >>>>> INRIA Saclay - Parietal team > > >>>>> Center for Data Science Paris-Saclay > > >>>>> https://glemaitre.github.io/ > > >>>>> > > >>>>> > > >>>>> _______________________________________________ > > >>>>> scikit-learn mailing list > > >>>>> > > >>>>> scikit-learn at python.org > > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>>> > > >>>> _______________________________________________ > > >>>> scikit-learn mailing list > > >>>> scikit-learn at python.org > > >>>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>>> _______________________________________________ > > >>>> scikit-learn mailing list > > >>>> scikit-learn at 
python.org > > >>>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>> > > >>> _______________________________________________ > > >>> scikit-learn mailing list > > >>> scikit-learn at python.org > > >>> https://mail.python.org/mailman/listinfo/scikit-learn > > >>> > > >>> > > >>> _______________________________________________ > > >>> scikit-learn mailing list > > >>> > > >>> scikit-learn at python.org > > >>> https://mail.python.org/mailman/listinfo/scikit-learn > > >> _______________________________________________ > > >> scikit-learn mailing list > > >> scikit-learn at python.org > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn From javaeurusd at gmail.com Sat Oct 5 12:20:37 2019 From: javaeurusd at gmail.com (Mike Smith) Date: Sat, 5 Oct 2019 09:20:37 -0700 Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 8 In-Reply-To: References: Message-ID: Are Nearest Neighbor models better than decision trees for 
Adaboost? On Sat, Oct 5, 2019 at 9:02 AM wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: Can Scikit-learn decision tree (CART) have both > continuous and categorical features? (Sebastian Raschka) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 4 Oct 2019 22:28:46 -0500 > From: Sebastian Raschka > To: Scikit-learn mailing list > Subject: Re: [scikit-learn] Can Scikit-learn decision tree (CART) have > both continuous and categorical features? > Message-ID: > <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20 at sebastianraschka.com> > Content-Type: text/plain; charset=utf-8 > > The docs show a way such that you don't need to write it as png file using > tree.plot_tree: > https://scikit-learn.org/stable/modules/tree.html#classification > > I don't remember why, but I think I had problems with that in the past (I > think it didn't look so nice visually, but don't remember), which is why I > still stick to graphviz. For my use cases, it's not much hassle -- it used > to be a bit of a hassle to get GraphViz working, but now you can do > > conda install pydotplus > conda install graphviz > > Coincidentally, I just made an example for a lecture I was teaching on > Tue: > https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb > > Best, > Sebastian > > > > On Oct 4, 2019, at 10:09 PM, C W wrote: > > > > On a separate note, what do you use for plotting? 
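On the AdaBoost question above: shallow decision trees are the conventional base learner, and, as far as I can tell, nearest-neighbor models cannot be plugged in directly, because AdaBoost reweights the training samples each round and KNeighborsClassifier.fit accepts no sample_weight. A minimal sketch using the default depth-1 tree ("stump") base learner on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# AdaBoostClassifier's default base learner is
# DecisionTreeClassifier(max_depth=1): fast, sample-weight aware,
# and weak enough that boosting has room to improve on it.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
score = cross_val_score(clf, X, y, cv=5).mean()
print(round(score, 3))
```

Any estimator whose fit supports sample_weight could in principle be boosted instead; trees are just the choice that reliably works well.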
> > > > I found graphviz, but you have to first save it as a png on your > computer. That's a lot work for just one plot. Is there something like a > matplotlib? > > > > Thanks! > > > > On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > Yeah, think of it more as a computational workaround for achieving the > same thing more efficiently (although it looks inelegant/weird)-- something > like that wouldn't be mentioned in textbooks. > > > > Best, > > Sebastian > > > > > On Oct 4, 2019, at 6:33 PM, C W wrote: > > > > > > Thanks Sebastian, I think I get it. > > > > > > It's just have never seen it this way. Quite different from what I'm > used in Elements of Statistical Learning. > > > > > > On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > Not sure if there's a website for that. In any case, to explain this > differently, as discussed earlier sklearn assumes continuous features for > decision trees. So, it will use a binary threshold for splitting along a > feature attribute. In other words, it cannot do sth like > > > > > > if x == 1 then right child node > > > else left child node > > > > > > Instead, what it does is > > > > > > if x >= 0.5 then right child node > > > else left child node > > > > > > These are basically equivalent as you can see when you just plug in > values 0 and 1 for x. > > > > > > Best, > > > Sebastian > > > > > > > On Oct 4, 2019, at 5:34 PM, C W wrote: > > > > > > > > I don't understand your answer. > > > > > > > > Why after one-hot-encoding it still outputs greater than 0.5 or less > than? Does sklearn website have a working example on categorical input? > > > > > > > > Thanks! > > > > > > > > On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > > Like Nicolas said, the 0.5 is just a workaround but will do the > right thing on the one-hot encoded variables, here. 
You will find that the > threshold is always at 0.5 for these variables. I.e., what it will do is to > use the following conversion: > > > > > > > > treat as car_Audi=1 if car_Audi >= 0.5 > > > > treat as car_Audi=0 if car_Audi < 0.5 > > > > > > > > or, it may be > > > > > > > > treat as car_Audi=1 if car_Audi > 0.5 > > > > treat as car_Audi=0 if car_Audi <= 0.5 > > > > > > > > (Forgot which one sklearn is using, but either way. it will be fine.) > > > > > > > > Best, > > > > Sebastian > > > > > > > > > > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: > > > >> > > > >> > > > >>> But, decision tree is still mistaking one-hot-encoding as > numerical input and split at 0.5. This is not right. Perhaps, I'm doing > something wrong? > > > >> > > > >> You're not doing anything wrong, and neither is the tree. Trees > don't support categorical variables in sklearn, so everything is treated as > numerical. > > > >> > > > >> This is why we do one-hot-encoding: so that a set of numerical (one > hot encoded) features can be treated as if they were just one categorical > feature. > > > >> > > > >> > > > >> > > > >> Nicolas > > > >> > > > >> On 10/4/19 2:01 PM, C W wrote: > > > >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, > typo on my part. > > > >>> > > > >>> Looks like I did one-hot-encoding correctly. My new variable names > are: car_Audi, car_BMW, etc. > > > >>> > > > >>> But, decision tree is still mistaking one-hot-encoding as > numerical input and split at 0.5. This is not right. Perhaps, I'm doing > something wrong? > > > >>> > > > >>> Is there a good toy example on the sklearn website? I am only see > this: > https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html > . > > > >>> > > > >>> Thanks! 
> > > >>> > > > >>> > > > >>> > > > >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > >>> Hi, > > > >>> > > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 > and 1.5 > > > >>> > > > >>> that's not a onehot encoding then. > > > >>> > > > >>> For an Audi datapoint, it should be > > > >>> > > > >>> BMW=0 > > > >>> Toyota=0 > > > >>> Audi=1 > > > >>> > > > >>> for BMW > > > >>> > > > >>> BMW=1 > > > >>> Toyota=0 > > > >>> Audi=0 > > > >>> > > > >>> and for Toyota > > > >>> > > > >>> BMW=0 > > > >>> Toyota=1 > > > >>> Audi=0 > > > >>> > > > >>> The split threshold should then be at 0.5 for any of these > features. > > > >>> > > > >>> Based on your email, I think you were assuming that the DT does > the one-hot encoding internally, which it doesn't. In practice, it is hard > to guess what is a nominal and what is a ordinal variable, so you have to > do the onehot encoding before you give the data to the decision tree. > > > >>> > > > >>> Best, > > > >>> Sebastian > > > >>> > > > >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: > > > >>>> > > > >>>> I'm getting some funny results. I am doing a regression decision > tree, the response variables are assigned to levels. > > > >>>> > > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category. > > > >>>> > > > >>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding > wrong? How does the sklearn know internally 0 vs. 1 is categorical, not > numerical? > > > >>>> > > > >>>> In R for instance, you do as.factor(), which explicitly states > the data type. > > > >>>> > > > >>>> Thank you! 
> > > >>>> > > > >>>> > > > >>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller < > t3kcit at gmail.com> wrote: > > > >>>> > > > >>>> > > > >>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: > > > >>>>> > > > >>>>> > > > >>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: > > > >>>>> Thanks, Guillaume. > > > >>>>> Column transformer looks pretty neat. I've also heard though, > this pipeline can be tedious to set up? Specifying what you want for every > feature is a pain. > > > >>>>> > > > >>>>> It would be interesting for us which part of the pipeline is > tedious to set up to know if we can improve something there. > > > >>>>> Do you mean, that you would like to automatically detect of > which type of feature (categorical/numerical) and apply a > > > >>>>> default encoder/scaling such as discuss there: > https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 > > > >>>>> > > > >>>>> IMO, one a user perspective, it would be cleaner in some cases > at the cost of applying blindly a black box > > > >>>>> which might be dangerous. > > > >>>> Also see > https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor > > > >>>> Which basically does that. > > > >>>> > > > >>>> > > > >>>>> > > > >>>>> > > > >>>>> Jaiver, > > > >>>>> Actually, you guessed right. My real data has only one numerical > variable, looks more like this: > > > >>>>> > > > >>>>> Gender Date Income Car Attendance > > > >>>>> Male 2019/3/01 10000 BMW Yes > > > >>>>> Female 2019/5/02 9000 Toyota No > > > >>>>> Male 2019/7/15 12000 Audi Yes > > > >>>>> > > > >>>>> I am predicting income using all other categorical variables. > Maybe it is catboost! 
> > > >>>>> > > > >>>>> Thanks, > > > >>>>> > > > >>>>> M > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez > wrote: > > > >>>>> If you have datasets with many categorical features, and perhaps > many categories, the tools in sklearn are quite limited, > > > >>>>> but there are alternative implementations of boosted trees that > are designed with categorical features in mind. Take a look > > > >>>>> at catboost [1], which has an sklearn-compatible API. > > > >>>>> > > > >>>>> J > > > >>>>> > > > >>>>> [1] https://catboost.ai/ > > > >>>>> > > > >>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > > > >>>>> Hello all, > > > >>>>> I'm very confused. Can the decision tree module handle both > continuous and categorical features in the dataset? In this case, it's just > CART (Classification and Regression Trees). > > > >>>>> > > > >>>>> For example, > > > >>>>> Gender Age Income Car Attendance > > > >>>>> Male 30 10000 BMW Yes > > > >>>>> Female 35 9000 Toyota No > > > >>>>> Male 50 12000 Audi Yes > > > >>>>> > > > >>>>> According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > > >>>>> > > > >>>>> It says: "scikit-learn implementation does not support > categorical variables for now". > > > >>>>> > > > >>>>> Is this true? If not, can someone point me to an example? If > yes, what do people do? > > > >>>>> > > > >>>>> Thank you very much! 
> > > >>>>> _______________________________________________ > > > >>>>> scikit-learn mailing list > > > >>>>> scikit-learn at python.org > > > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > > > >>>>> -- > > > >>>>> Guillaume Lemaitre > > > >>>>> INRIA Saclay - Parietal team > > > >>>>> Center for Data Science Paris-Saclay > > > >>>>> https://glemaitre.github.io/ > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > ------------------------------ > > End of scikit-learn Digest, Vol 43, Issue 8 > ******************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmrsg11 at gmail.com Sat Oct 5 14:50:09 2019 From: tmrsg11 at gmail.com (C W) Date: Sat, 5 Oct 2019 14:50:09 -0400 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? 
In-Reply-To: <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com> References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com> <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com> Message-ID: Thanks, great material! I got pydotplus with graphviz to work. Using the code on sklean website [1], tree.plot_tree(clf.fit(iris.data, iris.target)) gives an error: AttributeError: module 'sklearn.tree' has no attribute 'plot_tree' Both my colleague and I got the same error message. Per this post https://github.com/Microsoft/LightGBM/issues/1844, a PyPI update is needed. [1] sklearn link: https://scikit-learn.org/stable/modules/tree.html#classification On Fri, Oct 4, 2019 at 11:52 PM Sebastian Raschka wrote: > The docs show a way such that you don't need to write it as png file using > tree.plot_tree: > https://scikit-learn.org/stable/modules/tree.html#classification > > I don't remember why, but I think I had problems with that in the past (I > think it didn't look so nice visually, but don't remember), which is why I > still stick to graphviz. For my use cases, it's not much hassle -- it used > to be a bit of a hassle to get GraphViz working, but now you can do > > conda install pydotplus > conda install graphviz > > Coincidentally, I just made an example for a lecture I was teaching on > Tue: > https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb > > Best, > Sebastian > > > > On Oct 4, 2019, at 10:09 PM, C W wrote: > > > > On a separate note, what do you use for plotting? > > > > I found graphviz, but you have to first save it as a png on your > computer. That's a lot work for just one plot. Is there something like a > matplotlib? > > > > Thanks! 
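[Editor's note on the AttributeError reported above: `tree.plot_tree` was added in scikit-learn 0.21, so the error usually means an older install and upgrading resolves it. A minimal sketch, assuming scikit-learn >= 0.21 and matplotlib are available:]

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree  # plot_tree: sklearn >= 0.21

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Draw directly with matplotlib -- no graphviz/pydotplus/png round-trip needed.
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(clf, feature_names=iris.feature_names, filled=True, ax=ax)
fig.savefig("iris_tree.png")
```

[This sidesteps the graphviz install entirely; on older versions, `pip install -U scikit-learn` first.]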
> > > > On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > Yeah, think of it more as a computational workaround for achieving the > same thing more efficiently (although it looks inelegant/weird)-- something > like that wouldn't be mentioned in textbooks. > > > > Best, > > Sebastian > > > > > On Oct 4, 2019, at 6:33 PM, C W wrote: > > > > > > Thanks Sebastian, I think I get it. > > > > > > It's just have never seen it this way. Quite different from what I'm > used in Elements of Statistical Learning. > > > > > > On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > Not sure if there's a website for that. In any case, to explain this > differently, as discussed earlier sklearn assumes continuous features for > decision trees. So, it will use a binary threshold for splitting along a > feature attribute. In other words, it cannot do sth like > > > > > > if x == 1 then right child node > > > else left child node > > > > > > Instead, what it does is > > > > > > if x >= 0.5 then right child node > > > else left child node > > > > > > These are basically equivalent as you can see when you just plug in > values 0 and 1 for x. > > > > > > Best, > > > Sebastian > > > > > > > On Oct 4, 2019, at 5:34 PM, C W wrote: > > > > > > > > I don't understand your answer. > > > > > > > > Why after one-hot-encoding it still outputs greater than 0.5 or less > than? Does sklearn website have a working example on categorical input? > > > > > > > > Thanks! > > > > > > > > On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > > Like Nicolas said, the 0.5 is just a workaround but will do the > right thing on the one-hot encoded variables, here. You will find that the > threshold is always at 0.5 for these variables. 
I.e., what it will do is to > use the following conversion: > > > > > > > > treat as car_Audi=1 if car_Audi >= 0.5 > > > > treat as car_Audi=0 if car_Audi < 0.5 > > > > > > > > or, it may be > > > > > > > > treat as car_Audi=1 if car_Audi > 0.5 > > > > treat as car_Audi=0 if car_Audi <= 0.5 > > > > > > > > (Forgot which one sklearn is using, but either way. it will be fine.) > > > > > > > > Best, > > > > Sebastian > > > > > > > > > > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: > > > >> > > > >> > > > >>> But, decision tree is still mistaking one-hot-encoding as > numerical input and split at 0.5. This is not right. Perhaps, I'm doing > something wrong? > > > >> > > > >> You're not doing anything wrong, and neither is the tree. Trees > don't support categorical variables in sklearn, so everything is treated as > numerical. > > > >> > > > >> This is why we do one-hot-encoding: so that a set of numerical (one > hot encoded) features can be treated as if they were just one categorical > feature. > > > >> > > > >> > > > >> > > > >> Nicolas > > > >> > > > >> On 10/4/19 2:01 PM, C W wrote: > > > >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, > typo on my part. > > > >>> > > > >>> Looks like I did one-hot-encoding correctly. My new variable names > are: car_Audi, car_BMW, etc. > > > >>> > > > >>> But, decision tree is still mistaking one-hot-encoding as > numerical input and split at 0.5. This is not right. Perhaps, I'm doing > something wrong? > > > >>> > > > >>> Is there a good toy example on the sklearn website? I am only see > this: > https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html > . > > > >>> > > > >>> Thanks! 
> > > >>> > > > >>> > > > >>> > > > >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka < > mail at sebastianraschka.com> wrote: > > > >>> Hi, > > > >>> > > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 > and 1.5 > > > >>> > > > >>> that's not a onehot encoding then. > > > >>> > > > >>> For an Audi datapoint, it should be > > > >>> > > > >>> BMW=0 > > > >>> Toyota=0 > > > >>> Audi=1 > > > >>> > > > >>> for BMW > > > >>> > > > >>> BMW=1 > > > >>> Toyota=0 > > > >>> Audi=0 > > > >>> > > > >>> and for Toyota > > > >>> > > > >>> BMW=0 > > > >>> Toyota=1 > > > >>> Audi=0 > > > >>> > > > >>> The split threshold should then be at 0.5 for any of these > features. > > > >>> > > > >>> Based on your email, I think you were assuming that the DT does > the one-hot encoding internally, which it doesn't. In practice, it is hard > to guess what is a nominal and what is a ordinal variable, so you have to > do the onehot encoding before you give the data to the decision tree. > > > >>> > > > >>> Best, > > > >>> Sebastian > > > >>> > > > >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: > > > >>>> > > > >>>> I'm getting some funny results. I am doing a regression decision > tree, the response variables are assigned to levels. > > > >>>> > > > >>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, > Toyota=1, Audi=2) as numerical values, not category. > > > >>>> > > > >>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding > wrong? How does the sklearn know internally 0 vs. 1 is categorical, not > numerical? > > > >>>> > > > >>>> In R for instance, you do as.factor(), which explicitly states > the data type. > > > >>>> > > > >>>> Thank you! 
> > > >>>> > > > >>>> > > > >>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller < > t3kcit at gmail.com> wrote: > > > >>>> > > > >>>> > > > >>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: > > > >>>>> > > > >>>>> > > > >>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: > > > >>>>> Thanks, Guillaume. > > > >>>>> Column transformer looks pretty neat. I've also heard though, > this pipeline can be tedious to set up? Specifying what you want for every > feature is a pain. > > > >>>>> > > > >>>>> It would be interesting for us which part of the pipeline is > tedious to set up to know if we can improve something there. > > > >>>>> Do you mean, that you would like to automatically detect of > which type of feature (categorical/numerical) and apply a > > > >>>>> default encoder/scaling such as discuss there: > https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 > > > >>>>> > > > >>>>> IMO, one a user perspective, it would be cleaner in some cases > at the cost of applying blindly a black box > > > >>>>> which might be dangerous. > > > >>>> Also see > https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor > > > >>>> Which basically does that. > > > >>>> > > > >>>> > > > >>>>> > > > >>>>> > > > >>>>> Jaiver, > > > >>>>> Actually, you guessed right. My real data has only one numerical > variable, looks more like this: > > > >>>>> > > > >>>>> Gender Date Income Car Attendance > > > >>>>> Male 2019/3/01 10000 BMW Yes > > > >>>>> Female 2019/5/02 9000 Toyota No > > > >>>>> Male 2019/7/15 12000 Audi Yes > > > >>>>> > > > >>>>> I am predicting income using all other categorical variables. > Maybe it is catboost! 
> > > >>>>> > > > >>>>> Thanks, > > > >>>>> > > > >>>>> M > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez > wrote: > > > >>>>> If you have datasets with many categorical features, and perhaps > many categories, the tools in sklearn are quite limited, > > > >>>>> but there are alternative implementations of boosted trees that > are designed with categorical features in mind. Take a look > > > >>>>> at catboost [1], which has an sklearn-compatible API. > > > >>>>> > > > >>>>> J > > > >>>>> > > > >>>>> [1] https://catboost.ai/ > > > >>>>> > > > >>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: > > > >>>>> Hello all, > > > >>>>> I'm very confused. Can the decision tree module handle both > continuous and categorical features in the dataset? In this case, it's just > CART (Classification and Regression Trees). > > > >>>>> > > > >>>>> For example, > > > >>>>> Gender Age Income Car Attendance > > > >>>>> Male 30 10000 BMW Yes > > > >>>>> Female 35 9000 Toyota No > > > >>>>> Male 50 12000 Audi Yes > > > >>>>> > > > >>>>> According to the documentation > https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, > it can not! > > > >>>>> > > > >>>>> It says: "scikit-learn implementation does not support > categorical variables for now". > > > >>>>> > > > >>>>> Is this true? If not, can someone point me to an example? If > yes, what do people do? > > > >>>>> > > > >>>>> Thank you very much! 
> > > >>>>> _______________________________________________ > > > >>>>> scikit-learn mailing list > > > >>>>> scikit-learn at python.org > > > >>>>> https://mail.python.org/mailman/listinfo/scikit-learn > > > >>>>> -- > > > >>>>> Guillaume Lemaitre > > > >>>>> INRIA Saclay - Parietal team > > > >>>>> Center for Data Science Paris-Saclay > > > >>>>> https://glemaitre.github.io/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From javaeurusd at gmail.com Sat Oct 5 14:55:33 2019 From: javaeurusd at gmail.com (Mike Smith) Date: Sat, 5 Oct 2019 11:55:33 -0700 Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 10 In-Reply-To: References: Message-ID: 1. Re: Can Scikit-learn decision tree (CART) have both continuous and categorical features? (C W) What I'd ask in reply to this is if regression and classification module results can be entered into an input for one resultant output. 
On Sat, Oct 5, 2019, 11:50 AM , wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > Today's Topics: > > 1. Re: Can Scikit-learn decision tree (CART) have both > continuous and categorical features? (C W) > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 5 Oct 2019 14:50:09 -0400 > From: C W > To: Scikit-learn mailing list > Subject: Re: [scikit-learn] Can Scikit-learn decision tree (CART) have > both continuous and categorical features? > Message-ID: > < > CAE2FW2nHDJGNky2VWk-U8fU3gqwBqWEgidzTAWnUq+NzAK68VA at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Thanks, great material! I got pydotplus with graphviz to work. > > Using the code on sklean website [1], tree.plot_tree(clf.fit(iris.data, > iris.target)) gives an error: > AttributeError: module 'sklearn.tree' has no attribute 'plot_tree' > > Both my colleague and I got the same error message. Per this post > https://github.com/Microsoft/LightGBM/issues/1844, a PyPI update is > needed. 
> > [1] sklearn link: > https://scikit-learn.org/stable/modules/tree.html#classification > > > On Fri, Oct 4, 2019 at 11:52 PM Sebastian Raschka < > mail at sebastianraschka.com> > wrote: > > > The docs show a way such that you don't need to write it as png file > using > > tree.plot_tree: > > https://scikit-learn.org/stable/modules/tree.html#classification > > > > I don't remember why, but I think I had problems with that in the past (I > > think it didn't look so nice visually, but don't remember), which is why > I > > still stick to graphviz. For my use cases, it's not much hassle -- it > used > > to be a bit of a hassle to get GraphViz working, but now you can do > > > > conda install pydotplus > > conda install graphviz > > > > Coincidentally, I just made an example for a lecture I was teaching on > > Tue: > > > https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb > > > > Best, > > Sebastian > > > > > > > On Oct 4, 2019, at 10:09 PM, C W wrote: > > > > > > On a separate note, what do you use for plotting? > > > > > > I found graphviz, but you have to first save it as a png on your > > computer. That's a lot work for just one plot. Is there something like a > > matplotlib? > > > > > > Thanks! > > > > > > On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka < > > mail at sebastianraschka.com> wrote: > > > Yeah, think of it more as a computational workaround for achieving the > > same thing more efficiently (although it looks inelegant/weird)-- > something > > like that wouldn't be mentioned in textbooks. > > > > > > Best, > > > Sebastian > > > > > > > On Oct 4, 2019, at 6:33 PM, C W wrote: > > > > > > > > Thanks Sebastian, I think I get it. > > > > > > > > It's just have never seen it this way. Quite different from what I'm > > used in Elements of Statistical Learning. 
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 43, Issue 10
> ********************************************

From javaeurusd at gmail.com  Sun Oct  6 04:55:28 2019
From: javaeurusd at gmail.com (Mike Smith)
Date: Sun, 6 Oct 2019 01:55:28 -0700
Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 11
In-Reply-To:
References:
Message-ID:

Can I call an MS Excel cell range in a function such as model.predict(),
instead of typing the data in for each element?

On Sat, Oct 5, 2019 at 11:58 AM wrote:

> Send scikit-learn mailing list submissions to
>         scikit-learn at python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://mail.python.org/mailman/listinfo/scikit-learn
> or, via email, send a message with subject or body 'help' to
>         scikit-learn-request at python.org
>
> You can reach the person managing the list at
>         scikit-learn-owner at python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of scikit-learn digest..."
>
> Today's Topics:
>
>    1. Re: scikit-learn Digest, Vol 43, Issue 10 (Mike Smith)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 5 Oct 2019 11:55:33 -0700
> From: Mike Smith
> To: scikit-learn at python.org
> Subject: Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 10
> Content-Type: text/plain; charset="utf-8"
>
> 1. Re: Can Scikit-learn decision tree (CART) have both
>    continuous and categorical features? (C W)
>
> What I'd ask in reply to this is if regression and classification module
> results can be entered into an input for one resultant output.
>
> On Sat, Oct 5, 2019, 11:50 AM wrote:
>
> > Today's Topics:
> >
> >    1. Re: Can Scikit-learn decision tree (CART) have both
> >       continuous and categorical features? (C W)
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Sat, 5 Oct 2019 14:50:09 -0400
> > From: C W
> > To: Scikit-learn mailing list
> > Subject: Re: [scikit-learn] Can Scikit-learn decision tree (CART) have
> >         both continuous and categorical features?
> > Message-ID:
> >         <CAE2FW2nHDJGNky2VWk-U8fU3gqwBqWEgidzTAWnUq+NzAK68VA at mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Thanks, great material! I got pydotplus with graphviz to work.
> >
> > Using the code on the sklearn website [1], tree.plot_tree(clf.fit(iris.data,
> > iris.target)) gives an error:
> > AttributeError: module 'sklearn.tree' has no attribute 'plot_tree'
> >
> > Both my colleague and I got the same error message. Per this post
> > https://github.com/Microsoft/LightGBM/issues/1844, a PyPI update is
> > needed.
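[For context: `tree.plot_tree` was added in scikit-learn 0.21, so the AttributeError above usually just means an older version is installed; upgrading scikit-learn resolves it. A minimal sketch of the matplotlib-based route, guarding for older versions and for a missing matplotlib (the `iris_tree.png` filename is only illustrative):]

```python
import sklearn
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# tree.plot_tree exists only in scikit-learn >= 0.21; older versions raise
# AttributeError: module 'sklearn.tree' has no attribute 'plot_tree'
if hasattr(tree, "plot_tree"):
    try:
        import matplotlib
        matplotlib.use("Agg")  # headless backend, so this runs without a display
        import matplotlib.pyplot as plt

        tree.plot_tree(clf, feature_names=iris.feature_names, filled=True)
        plt.savefig("iris_tree.png")  # rendered by matplotlib; no graphviz needed
    except ImportError:
        print("matplotlib is required for tree.plot_tree")
else:
    print("scikit-learn", sklearn.__version__, "predates tree.plot_tree; upgrade")
```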
> >
> > [1] sklearn link:
> > https://scikit-learn.org/stable/modules/tree.html#classification
> ------------------------------
>
> End of scikit-learn Digest, Vol 43, Issue 11
> ********************************************

From t3kcit at gmail.com  Sun Oct  6 10:10:31 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Sun, 6 Oct 2019 16:10:31 +0200
Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both
	continuous and categorical features?
In-Reply-To: <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com>
References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com>
	<7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com>
	<7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com>
	<4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com>
Message-ID: <3d6e9116-43bf-77d3-dfeb-ec6c91041748@gmail.com>

On 10/4/19 11:28 PM, Sebastian Raschka wrote:
> The docs show a way such that you don't need to write it as a png file
> using tree.plot_tree:
> https://scikit-learn.org/stable/modules/tree.html#classification
>
> I don't remember why, but I think I had problems with that in the past (I
> think it didn't look so nice visually, but don't remember), which is why I
> still stick to graphviz.

Can you give me examples that don't look as nice? I would love to improve it.

> For my use cases, it's not much hassle -- it used to be a bit of a hassle
> to get GraphViz working, but now you can do
>
> conda install pydotplus
> conda install graphviz
>
> Coincidentally, I just made an example for a lecture I was teaching on Tue:
> https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb
>
> Best,
> Sebastian
>
>> On Oct 4, 2019, at 10:09 PM, C W wrote:
>>
>> On a separate note, what do you use for plotting?
>>
>> I found graphviz, but you have to first save it as a png on your
>> computer. That's a lot of work for just one plot. Is there something
>> like matplotlib?
>>
>> Thanks!
>>
>> On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka wrote:
>> Yeah, think of it more as a computational workaround for achieving the
>> same thing more efficiently (although it looks inelegant/weird) --
>> something like that wouldn't be mentioned in textbooks.
>>
>> Best,
>> Sebastian
>>
>>> On Oct 4, 2019, at 6:33 PM, C W wrote:
>>>
>>> Thanks Sebastian, I think I get it.
>>>
>>> It's just that I have never seen it this way. Quite different from what
>>> I'm used to in Elements of Statistical Learning.
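[The graphviz route Sebastian describes can be sketched as follows. Only `export_graphviz` is part of scikit-learn; rendering the DOT text to a PNG needs the optional `pydotplus` package plus the graphviz binaries (the `iris_tree.png` filename is just an example):]

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_graphviz produces DOT source text; rendering it to an image is a
# separate step that requires graphviz (via pydotplus here).
dot_data = export_graphviz(clf, out_file=None,
                           feature_names=iris.feature_names,
                           class_names=list(iris.target_names),
                           filled=True, rounded=True)

try:
    import pydotplus  # optional; installable via the conda commands above
    pydotplus.graph_from_dot_data(dot_data).write_png("iris_tree.png")
except ImportError:
    # Without pydotplus/graphviz you still have the DOT text itself.
    print(dot_data.splitlines()[0])
```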
>>>
>>> On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka wrote:
>>> Not sure if there's a website for that. In any case, to explain this
>>> differently: as discussed earlier, sklearn assumes continuous features
>>> for decision trees, so it will use a binary threshold for splitting
>>> along a feature attribute. In other words, it cannot do something like
>>>
>>> if x == 1 then right child node
>>> else left child node
>>>
>>> Instead, what it does is
>>>
>>> if x >= 0.5 then right child node
>>> else left child node
>>>
>>> These are basically equivalent, as you can see when you just plug in
>>> values 0 and 1 for x.
>>>
>>> Best,
>>> Sebastian
>>>
>>>> On Oct 4, 2019, at 5:34 PM, C W wrote:
>>>>
>>>> I don't understand your answer.
>>>>
>>>> Why, after one-hot-encoding, does it still split at greater or less
>>>> than 0.5? Does the sklearn website have a working example on
>>>> categorical input?
>>>>
>>>> Thanks!
>>>>
>>>> On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote:
>>>> Like Nicolas said, the 0.5 is just a workaround, but it will do the
>>>> right thing on the one-hot encoded variables here. You will find that
>>>> the threshold is always at 0.5 for these variables. I.e., what it will
>>>> do is use the following conversion:
>>>>
>>>> treat as car_Audi=1 if car_Audi >= 0.5
>>>> treat as car_Audi=0 if car_Audi < 0.5
>>>>
>>>> or, it may be
>>>>
>>>> treat as car_Audi=1 if car_Audi > 0.5
>>>> treat as car_Audi=0 if car_Audi <= 0.5
>>>>
>>>> (I forget which one sklearn is using, but either way it will be fine.)
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>>> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote:
>>>>>
>>>>>> But, decision tree is still mistaking one-hot-encoding as numerical
>>>>>> input and splitting at 0.5. This is not right. Perhaps I'm doing
>>>>>> something wrong?
>>>>>
>>>>> You're not doing anything wrong, and neither is the tree. Trees don't
>>>>> support categorical variables in sklearn, so everything is treated as
>>>>> numerical.
>>>>> >>>>> This is why we do one-hot-encoding: so that a set of numerical (one hot encoded) features can be treated as if they were just one categorical feature. >>>>> >>>>> >>>>> >>>>> Nicolas >>>>> >>>>> On 10/4/19 2:01 PM, C W wrote: >>>>>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on my part. >>>>>> >>>>>> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc. >>>>>> >>>>>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >>>>>> >>>>>> Is there a good toy example on the sklearn website? I am only see this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote: >>>>>> Hi, >>>>>> >>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 and 1.5 >>>>>> that's not a onehot encoding then. >>>>>> >>>>>> For an Audi datapoint, it should be >>>>>> >>>>>> BMW=0 >>>>>> Toyota=0 >>>>>> Audi=1 >>>>>> >>>>>> for BMW >>>>>> >>>>>> BMW=1 >>>>>> Toyota=0 >>>>>> Audi=0 >>>>>> >>>>>> and for Toyota >>>>>> >>>>>> BMW=0 >>>>>> Toyota=1 >>>>>> Audi=0 >>>>>> >>>>>> The split threshold should then be at 0.5 for any of these features. >>>>>> >>>>>> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is a ordinal variable, so you have to do the onehot encoding before you give the data to the decision tree. >>>>>> >>>>>> Best, >>>>>> Sebastian >>>>>> >>>>>>> On Oct 4, 2019, at 11:48 AM, C W wrote: >>>>>>> >>>>>>> I'm getting some funny results. I am doing a regression decision tree, the response variables are assigned to levels. 
>>>>>>> >>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. >>>>>>> >>>>>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does the sklearn know internally 0 vs. 1 is categorical, not numerical? >>>>>>> >>>>>>> In R for instance, you do as.factor(), which explicitly states the data type. >>>>>>> >>>>>>> Thank you! >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote: >>>>>>> >>>>>>> >>>>>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: >>>>>>>> >>>>>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: >>>>>>>> Thanks, Guillaume. >>>>>>>> Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? Specifying what you want for every feature is a pain. >>>>>>>> >>>>>>>> It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. >>>>>>>> Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a >>>>>>>> default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 >>>>>>>> >>>>>>>> IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box >>>>>>>> which might be dangerous. >>>>>>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor >>>>>>> Which basically does that. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Jaiver, >>>>>>>> Actually, you guessed right. My real data has only one numerical variable, looks more like this: >>>>>>>> >>>>>>>> Gender Date Income Car Attendance >>>>>>>> Male 2019/3/01 10000 BMW Yes >>>>>>>> Female 2019/5/02 9000 Toyota No >>>>>>>> Male 2019/7/15 12000 Audi Yes >>>>>>>> >>>>>>>> I am predicting income using all other categorical variables. Maybe it is catboost! 
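[Editor's note: the ColumnTransformer setup discussed above can be sketched roughly as follows. The column names come from the toy table in the thread; everything else (dropping the Date column, the passthrough remainder) is an illustrative choice, not the poster's actual code.]

```python
# Hedged sketch of a ColumnTransformer pipeline for the thread's toy data:
# one-hot encode the categorical columns, then fit a regression tree on Income.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Car": ["BMW", "Toyota", "Audi"],
    "Attendance": ["Yes", "No", "Yes"],
    "Income": [10000, 9000, 12000],
})
X, y = df.drop(columns="Income"), df["Income"]

preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"),
      ["Gender", "Car", "Attendance"])],
    remainder="passthrough",  # any remaining columns pass through unchanged
)
model = make_pipeline(preprocess, DecisionTreeRegressor(random_state=0))
model.fit(X, y)
print(model.predict(X))
```

Listing the categorical columns explicitly is the "tedious" part being discussed; the upside is that nothing is guessed about which columns are categorical.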
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> M >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier López wrote: >>>>>>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, >>>>>>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look >>>>>>>> at catboost [1], which has an sklearn-compatible API. >>>>>>>> >>>>>>>> J >>>>>>>> >>>>>>>> [1] https://catboost.ai/ >>>>>>>> >>>>>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: >>>>>>>> Hello all, >>>>>>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). >>>>>>>> >>>>>>>> For example, >>>>>>>> Gender Age Income Car Attendance >>>>>>>> Male 30 10000 BMW Yes >>>>>>>> Female 35 9000 Toyota No >>>>>>>> Male 50 12000 Audi Yes >>>>>>>> >>>>>>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! >>>>>>>> >>>>>>>> It says: "scikit-learn implementation does not support categorical variables for now". >>>>>>>> >>>>>>>> Is this true? If not, can someone point me to an example? If yes, what do people do? >>>>>>>> >>>>>>>> Thank you very much! 
>>>>>>>> _______________________________________________ >>>>>>>> scikit-learn mailing list >>>>>>>> scikit-learn at python.org >>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>> >>>>>>>> -- >>>>>>>> Guillaume Lemaitre >>>>>>>> INRIA Saclay - Parietal team >>>>>>>> Center for Data Science Paris-Saclay >>>>>>>> https://glemaitre.github.io/
From mail at sebastianraschka.com Sun Oct 6 10:40:09 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Sun, 6 Oct 2019 09:40:09 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: <3d6e9116-43bf-77d3-dfeb-ec6c91041748@gmail.com> References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com> <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com> <3d6e9116-43bf-77d3-dfeb-ec6c91041748@gmail.com> Message-ID: Sure, I just ran an example I made with graphviz via plot_tree, and it looks like there's an issue with overlapping boxes if you use class (and/or feature) names. 
I made a reproducible example here so that you can take a look: https://github.com/rasbt/bugreport/blob/master/scikit-learn/plot_tree/tree-demo-1.ipynb Happy to add this to the sklearn issue list if there's no issue filed for that yet. Best, Sebastian > On Oct 6, 2019, at 9:10 AM, Andreas Mueller wrote: > > > > On 10/4/19 11:28 PM, Sebastian Raschka wrote: >> The docs show a way such that you don't need to write it as png file using tree.plot_tree: >> https://scikit-learn.org/stable/modules/tree.html#classification >> >> I don't remember why, but I think I had problems with that in the past (I think it didn't look so nice visually, but don't remember), which is why I still stick to graphviz. > Can you give me examples that don't look as nice? I would love to improve it. > >> For my use cases, it's not much hassle -- it used to be a bit of a hassle to get GraphViz working, but now you can do >> >> conda install pydotplus >> conda install graphviz >> >> Coincidentally, I just made an example for a lecture I was teaching on Tue: https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb >> >> Best, >> Sebastian >> >> >>> On Oct 4, 2019, at 10:09 PM, C W wrote: >>> >>> On a separate note, what do you use for plotting? >>> >>> I found graphviz, but you have to first save it as a png on your computer. That's a lot work for just one plot. Is there something like a matplotlib? >>> >>> Thanks! >>> >>> On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka wrote: >>> Yeah, think of it more as a computational workaround for achieving the same thing more efficiently (although it looks inelegant/weird)-- something like that wouldn't be mentioned in textbooks. >>> >>> Best, >>> Sebastian >>> >>>> On Oct 4, 2019, at 6:33 PM, C W wrote: >>>> >>>> Thanks Sebastian, I think I get it. >>>> >>>> It's just have never seen it this way. Quite different from what I'm used in Elements of Statistical Learning. 
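[Editor's note: for reference, a hedged sketch of the graphviz route mentioned above, using the iris data as a stand-in for the notebook's example. The final render step is commented out because it needs the graphviz package/binaries, e.g. from conda as described in the thread.]

```python
# Sketch of the graphviz workflow: export_graphviz emits DOT source text;
# turning that into an image is what requires graphviz to be installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

dot = export_graphviz(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
)
# import graphviz; graphviz.Source(dot).render("tree")  # writes tree.pdf
print(dot.splitlines()[0])  # the DOT source starts with a digraph declaration
```

The extra file/render step here is exactly the friction `tree.plot_tree` avoids, since it draws directly onto a matplotlib figure.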
>>>> >>>> On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka wrote: >>>> Not sure if there's a website for that. In any case, to explain this differently, as discussed earlier sklearn assumes continuous features for decision trees. So, it will use a binary threshold for splitting along a feature attribute. In other words, it cannot do sth like >>>> >>>> if x == 1 then right child node >>>> else left child node >>>> >>>> Instead, what it does is >>>> >>>> if x >= 0.5 then right child node >>>> else left child node >>>> >>>> These are basically equivalent as you can see when you just plug in values 0 and 1 for x. >>>> >>>> Best, >>>> Sebastian >>>> >>>>> On Oct 4, 2019, at 5:34 PM, C W wrote: >>>>> >>>>> I don't understand your answer. >>>>> >>>>> Why after one-hot-encoding it still outputs greater than 0.5 or less than? Does sklearn website have a working example on categorical input? >>>>> >>>>> Thanks! >>>>> >>>>> On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote: >>>>> Like Nicolas said, the 0.5 is just a workaround but will do the right thing on the one-hot encoded variables, here. You will find that the threshold is always at 0.5 for these variables. I.e., what it will do is to use the following conversion: >>>>> >>>>> treat as car_Audi=1 if car_Audi >= 0.5 >>>>> treat as car_Audi=0 if car_Audi < 0.5 >>>>> >>>>> or, it may be >>>>> >>>>> treat as car_Audi=1 if car_Audi > 0.5 >>>>> treat as car_Audi=0 if car_Audi <= 0.5 >>>>> >>>>> (Forgot which one sklearn is using, but either way. it will be fine.) >>>>> >>>>> Best, >>>>> Sebastian >>>>> >>>>> >>>>>> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: >>>>>> >>>>>> >>>>>>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >>>>>> You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical. 
>>>>>> >>>>>> This is why we do one-hot-encoding: so that a set of numerical (one hot encoded) features can be treated as if they were just one categorical feature. >>>>>> >>>>>> >>>>>> >>>>>> Nicolas >>>>>> >>>>>> On 10/4/19 2:01 PM, C W wrote: >>>>>>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on my part. >>>>>>> >>>>>>> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc. >>>>>>> >>>>>>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >>>>>>> >>>>>>> Is there a good toy example on the sklearn website? I am only see this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote: >>>>>>> Hi, >>>>>>> >>>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 and 1.5 >>>>>>> that's not a onehot encoding then. >>>>>>> >>>>>>> For an Audi datapoint, it should be >>>>>>> >>>>>>> BMW=0 >>>>>>> Toyota=0 >>>>>>> Audi=1 >>>>>>> >>>>>>> for BMW >>>>>>> >>>>>>> BMW=1 >>>>>>> Toyota=0 >>>>>>> Audi=0 >>>>>>> >>>>>>> and for Toyota >>>>>>> >>>>>>> BMW=0 >>>>>>> Toyota=1 >>>>>>> Audi=0 >>>>>>> >>>>>>> The split threshold should then be at 0.5 for any of these features. >>>>>>> >>>>>>> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is a ordinal variable, so you have to do the onehot encoding before you give the data to the decision tree. >>>>>>> >>>>>>> Best, >>>>>>> Sebastian >>>>>>> >>>>>>>> On Oct 4, 2019, at 11:48 AM, C W wrote: >>>>>>>> >>>>>>>> I'm getting some funny results. I am doing a regression decision tree, the response variables are assigned to levels. 
>>>>>>>> >>>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. >>>>>>>> >>>>>>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does the sklearn know internally 0 vs. 1 is categorical, not numerical? >>>>>>>> >>>>>>>> In R for instance, you do as.factor(), which explicitly states the data type. >>>>>>>> >>>>>>>> Thank you! >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: >>>>>>>>> >>>>>>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: >>>>>>>>> Thanks, Guillaume. >>>>>>>>> Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? Specifying what you want for every feature is a pain. >>>>>>>>> >>>>>>>>> It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. >>>>>>>>> Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a >>>>>>>>> default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 >>>>>>>>> >>>>>>>>> IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box >>>>>>>>> which might be dangerous. >>>>>>>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor >>>>>>>> Which basically does that. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Jaiver, >>>>>>>>> Actually, you guessed right. My real data has only one numerical variable, looks more like this: >>>>>>>>> >>>>>>>>> Gender Date Income Car Attendance >>>>>>>>> Male 2019/3/01 10000 BMW Yes >>>>>>>>> Female 2019/5/02 9000 Toyota No >>>>>>>>> Male 2019/7/15 12000 Audi Yes >>>>>>>>> >>>>>>>>> I am predicting income using all other categorical variables. Maybe it is catboost! 
>>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> M >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier López wrote: >>>>>>>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, >>>>>>>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look >>>>>>>>> at catboost [1], which has an sklearn-compatible API. >>>>>>>>> >>>>>>>>> J >>>>>>>>> >>>>>>>>> [1] https://catboost.ai/ >>>>>>>>> >>>>>>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: >>>>>>>>> Hello all, >>>>>>>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). >>>>>>>>> >>>>>>>>> For example, >>>>>>>>> Gender Age Income Car Attendance >>>>>>>>> Male 30 10000 BMW Yes >>>>>>>>> Female 35 9000 Toyota No >>>>>>>>> Male 50 12000 Audi Yes >>>>>>>>> >>>>>>>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! >>>>>>>>> >>>>>>>>> It says: "scikit-learn implementation does not support categorical variables for now". >>>>>>>>> >>>>>>>>> Is this true? If not, can someone point me to an example? If yes, what do people do? >>>>>>>>> >>>>>>>>> Thank you very much! 
From t3kcit at gmail.com Sun Oct 6 10:55:49 2019 From: t3kcit at gmail.com (Andreas Mueller) Date: Sun, 6 Oct 2019 16:55:49 +0200 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com> <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com> <3d6e9116-43bf-77d3-dfeb-ec6c91041748@gmail.com> Message-ID: <9c0d0591-f631-bb94-9018-955f24d189d0@gmail.com> Thanks! 
I'll double check that issue. Generally you have to set the figure size to get good results. We should probably add some code to set the figure size automatically (if we create a figure?). On 10/6/19 10:40 AM, Sebastian Raschka wrote: > Sure, I just ran an example I made with graphviz via plot_tree, and it looks like there's an issue with overlapping boxes if you use class (and/or feature) names. I made a reproducible example here so that you can take a look: > https://github.com/rasbt/bugreport/blob/master/scikit-learn/plot_tree/tree-demo-1.ipynb > > Happy to add this to the sklearn issue list if there's no issue filed for that yet. > > Best, > Sebastian > >> On Oct 6, 2019, at 9:10 AM, Andreas Mueller wrote: >> >> >> >> On 10/4/19 11:28 PM, Sebastian Raschka wrote: >>> The docs show a way such that you don't need to write it as png file using tree.plot_tree: >>> https://scikit-learn.org/stable/modules/tree.html#classification >>> >>> I don't remember why, but I think I had problems with that in the past (I think it didn't look so nice visually, but don't remember), which is why I still stick to graphviz. >> Can you give me examples that don't look as nice? I would love to improve it. >> >>> For my use cases, it's not much hassle -- it used to be a bit of a hassle to get GraphViz working, but now you can do >>> >>> conda install pydotplus >>> conda install graphviz >>> >>> Coincidentally, I just made an example for a lecture I was teaching on Tue: https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb >>> >>> Best, >>> Sebastian >>> >>> >>>> On Oct 4, 2019, at 10:09 PM, C W wrote: >>>> >>>> On a separate note, what do you use for plotting? >>>> >>>> I found graphviz, but you have to first save it as a png on your computer. That's a lot work for just one plot. Is there something like a matplotlib? >>>> >>>> Thanks! 
>>>> >>>> On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka wrote: >>>> Yeah, think of it more as a computational workaround for achieving the same thing more efficiently (although it looks inelegant/weird)-- something like that wouldn't be mentioned in textbooks. >>>> >>>> Best, >>>> Sebastian >>>> >>>>> On Oct 4, 2019, at 6:33 PM, C W wrote: >>>>> >>>>> Thanks Sebastian, I think I get it. >>>>> >>>>> It's just have never seen it this way. Quite different from what I'm used in Elements of Statistical Learning. >>>>> >>>>> On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka wrote: >>>>> Not sure if there's a website for that. In any case, to explain this differently, as discussed earlier sklearn assumes continuous features for decision trees. So, it will use a binary threshold for splitting along a feature attribute. In other words, it cannot do sth like >>>>> >>>>> if x == 1 then right child node >>>>> else left child node >>>>> >>>>> Instead, what it does is >>>>> >>>>> if x >= 0.5 then right child node >>>>> else left child node >>>>> >>>>> These are basically equivalent as you can see when you just plug in values 0 and 1 for x. >>>>> >>>>> Best, >>>>> Sebastian >>>>> >>>>>> On Oct 4, 2019, at 5:34 PM, C W wrote: >>>>>> >>>>>> I don't understand your answer. >>>>>> >>>>>> Why after one-hot-encoding it still outputs greater than 0.5 or less than? Does sklearn website have a working example on categorical input? >>>>>> >>>>>> Thanks! >>>>>> >>>>>> On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote: >>>>>> Like Nicolas said, the 0.5 is just a workaround but will do the right thing on the one-hot encoded variables, here. You will find that the threshold is always at 0.5 for these variables. 
I.e., what it will do is to use the following conversion: >>>>>> >>>>>> treat as car_Audi=1 if car_Audi >= 0.5 >>>>>> treat as car_Audi=0 if car_Audi < 0.5 >>>>>> >>>>>> or, it may be >>>>>> >>>>>> treat as car_Audi=1 if car_Audi > 0.5 >>>>>> treat as car_Audi=0 if car_Audi <= 0.5 >>>>>> >>>>>> (Forgot which one sklearn is using, but either way. it will be fine.) >>>>>> >>>>>> Best, >>>>>> Sebastian >>>>>> >>>>>> >>>>>>> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote: >>>>>>> >>>>>>> >>>>>>>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >>>>>>> You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical. >>>>>>> >>>>>>> This is why we do one-hot-encoding: so that a set of numerical (one hot encoded) features can be treated as if they were just one categorical feature. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Nicolas >>>>>>> >>>>>>> On 10/4/19 2:01 PM, C W wrote: >>>>>>>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, typo on my part. >>>>>>>> >>>>>>>> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc. >>>>>>>> >>>>>>>> But, decision tree is still mistaking one-hot-encoding as numerical input and split at 0.5. This is not right. Perhaps, I'm doing something wrong? >>>>>>>> >>>>>>>> Is there a good toy example on the sklearn website? I am only see this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html. >>>>>>>> >>>>>>>> Thanks! >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.The tree splits at 0.5 and 1.5 >>>>>>>> that's not a onehot encoding then. 
>>>>>>>> >>>>>>>> For an Audi datapoint, it should be >>>>>>>> >>>>>>>> BMW=0 >>>>>>>> Toyota=0 >>>>>>>> Audi=1 >>>>>>>> >>>>>>>> for BMW >>>>>>>> >>>>>>>> BMW=1 >>>>>>>> Toyota=0 >>>>>>>> Audi=0 >>>>>>>> >>>>>>>> and for Toyota >>>>>>>> >>>>>>>> BMW=0 >>>>>>>> Toyota=1 >>>>>>>> Audi=0 >>>>>>>> >>>>>>>> The split threshold should then be at 0.5 for any of these features. >>>>>>>> >>>>>>>> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is a ordinal variable, so you have to do the onehot encoding before you give the data to the decision tree. >>>>>>>> >>>>>>>> Best, >>>>>>>> Sebastian >>>>>>>> >>>>>>>>> On Oct 4, 2019, at 11:48 AM, C W wrote: >>>>>>>>> >>>>>>>>> I'm getting some funny results. I am doing a regression decision tree, the response variables are assigned to levels. >>>>>>>>> >>>>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. >>>>>>>>> >>>>>>>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does the sklearn know internally 0 vs. 1 is categorical, not numerical? >>>>>>>>> >>>>>>>>> In R for instance, you do as.factor(), which explicitly states the data type. >>>>>>>>> >>>>>>>>> Thank you! >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9/15/19 8:16 AM, Guillaume Lema?tre wrote: >>>>>>>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote: >>>>>>>>>> Thanks, Guillaume. >>>>>>>>>> Column transformer looks pretty neat. I've also heard though, this pipeline can be tedious to set up? Specifying what you want for every feature is a pain. >>>>>>>>>> >>>>>>>>>> It would be interesting for us which part of the pipeline is tedious to set up to know if we can improve something there. 
>>>>>>>>>> Do you mean, that you would like to automatically detect of which type of feature (categorical/numerical) and apply a >>>>>>>>>> default encoder/scaling such as discuss there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127 >>>>>>>>>> >>>>>>>>>> IMO, one a user perspective, it would be cleaner in some cases at the cost of applying blindly a black box >>>>>>>>>> which might be dangerous. >>>>>>>>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor >>>>>>>>> Which basically does that. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Jaiver, >>>>>>>>>> Actually, you guessed right. My real data has only one numerical variable, looks more like this: >>>>>>>>>> >>>>>>>>>> Gender Date Income Car Attendance >>>>>>>>>> Male 2019/3/01 10000 BMW Yes >>>>>>>>>> Female 2019/5/02 9000 Toyota No >>>>>>>>>> Male 2019/7/15 12000 Audi Yes >>>>>>>>>> >>>>>>>>>> I am predicting income using all other categorical variables. Maybe it is catboost! >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> M >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier L?pez wrote: >>>>>>>>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited, >>>>>>>>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look >>>>>>>>>> at catboost [1], which has an sklearn-compatible API. >>>>>>>>>> >>>>>>>>>> J >>>>>>>>>> >>>>>>>>>> [1] https://catboost.ai/ >>>>>>>>>> >>>>>>>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote: >>>>>>>>>> Hello all, >>>>>>>>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees). 
>>>>>>>>>> >>>>>>>>>> For example, >>>>>>>>>> Gender Age Income Car Attendance >>>>>>>>>> Male 30 10000 BMW Yes >>>>>>>>>> Female 35 9000 Toyota No >>>>>>>>>> Male 50 12000 Audi Yes >>>>>>>>>> >>>>>>>>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not! >>>>>>>>>> >>>>>>>>>> It says: "scikit-learn implementation does not support categorical variables for now". >>>>>>>>>> >>>>>>>>>> Is this true? If not, can someone point me to an example? If yes, what do people do? >>>>>>>>>> >>>>>>>>>> Thank you very much! >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> scikit-learn mailing list >>>>>>>>>> scikit-learn at python.org >>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>>> _______________________________________________ >>>>>>>>>> scikit-learn mailing list >>>>>>>>>> scikit-learn at python.org >>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>>> _______________________________________________ >>>>>>>>>> scikit-learn mailing list >>>>>>>>>> scikit-learn at python.org >>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Guillaume Lemaitre >>>>>>>>>> INRIA Saclay - Parietal team >>>>>>>>>> Center for Data Science Paris-Saclay >>>>>>>>>> https://glemaitre.github.io/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> scikit-learn mailing list >>>>>>>>>> >>>>>>>>>> scikit-learn at python.org >>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn >>>>>>>>> _______________________________________________ >>>>>>>>> scikit-learn mailing list >>>>>>>>> scikit-learn at python.org >>>>>>>>> 
From mail at sebastianraschka.com Sun Oct 6 11:11:24 2019 From: mail at sebastianraschka.com (Sebastian Raschka) Date: Sun, 6 Oct 2019 10:11:24 -0500 Subject: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features? In-Reply-To: <9c0d0591-f631-bb94-9018-955f24d189d0@gmail.com> References: <5e9661ff-dfb2-cc2e-b71f-ba18024374a1@gmail.com> <7E3EE86D-4B8A-438A-B03A-8DFC8E1D8AB4@sebastianraschka.com> <7A0589D1-D990-4FD6-9D11-AA804E34F3BC@sebastianraschka.com> <4FC33890-94D3-4AA8-8FA9-EF1FADFD4C20@sebastianraschka.com> <3d6e9116-43bf-77d3-dfeb-ec6c91041748@gmail.com> <9c0d0591-f631-bb94-9018-955f24d189d0@gmail.com> Message-ID: <32449B7F-691B-4FF9-A7CC-A784B19A3852@sebastianraschka.com>

You are right, changing the figure size would fix the issue (updated the notebook). In practice, I think the issue becomes choosing a good aspect ratio such that a) the general proportions of the plot look ok and b) the proportions of the boxes wrt the arrows look ok. It's all possible for a user to do, but for my use cases (e.g., making a quick graphic for a presentation / meeting) it was just quicker with graphviz. On the other hand, I would prefer/recommend the plot_tree func just because it is based on matplotlib ...

In any case, I haven't had a chance to look at the plot_tree func, but I guess this could potentially be relatively easy to address. I guess it would just require finding and setting a good default value for

a) the XOR case, where a user provides either feature names or class label names;
b) the AND case, where a user provides both feature names and class label names.

> On Oct 6, 2019, at 9:55 AM, Andreas Mueller wrote:
>
> Thanks!
> I'll double check that issue.
Generally you have to set the figure size to get good results.
> We should probably add some code to set the figure size automatically (if we create a figure?).
>
> On 10/6/19 10:40 AM, Sebastian Raschka wrote:
>> Sure, I just ran an example I made with graphviz via plot_tree, and it looks like there's an issue with overlapping boxes if you use class (and/or feature) names. I made a reproducible example here so that you can take a look:
>> https://github.com/rasbt/bugreport/blob/master/scikit-learn/plot_tree/tree-demo-1.ipynb
>>
>> Happy to add this to the sklearn issue list if there's no issue filed for that yet.
>>
>> Best,
>> Sebastian
>>
>>> On Oct 6, 2019, at 9:10 AM, Andreas Mueller wrote:
>>>
>>> On 10/4/19 11:28 PM, Sebastian Raschka wrote:
>>>> The docs show a way such that you don't need to write it as a png file, using tree.plot_tree:
>>>> https://scikit-learn.org/stable/modules/tree.html#classification
>>>>
>>>> I don't remember why, but I think I had problems with that in the past (I think it didn't look so nice visually, but I don't remember), which is why I still stick to graphviz.
>>> Can you give me examples that don't look as nice? I would love to improve it.
>>>
>>>> For my use cases, it's not much hassle -- it used to be a bit of a hassle to get GraphViz working, but now you can do
>>>>
>>>> conda install pydotplus
>>>> conda install graphviz
>>>>
>>>> Coincidentally, I just made an example for a lecture I was teaching on Tue: https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>>> On Oct 4, 2019, at 10:09 PM, C W wrote:
>>>>>
>>>>> On a separate note, what do you use for plotting?
>>>>>
>>>>> I found graphviz, but you have to first save it as a png on your computer. That's a lot of work for just one plot. Is there something like matplotlib?
>>>>>
>>>>> Thanks!
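A minimal sketch of the figure-size workaround discussed above (assuming scikit-learn >= 0.21, which provides sklearn.tree.plot_tree, and matplotlib; the iris example and the particular figsize are illustrative, not from the thread): passing an explicit, generously sized Axes avoids the overlapping-box problem without going through graphviz.

```python
# Render a fitted tree with plot_tree on an explicitly sized figure.
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# A generously sized figure keeps the node boxes from overlapping when
# both feature names and class names are shown.
fig, ax = plt.subplots(figsize=(16, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True, ax=ax)
fig.savefig("tree.png")
```

No intermediate .dot/.png round-trip through graphviz is needed; the tree is drawn directly on the matplotlib Axes.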
>>>>>
>>>>> On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka wrote:
>>>>> Yeah, think of it more as a computational workaround for achieving the same thing more efficiently (although it looks inelegant/weird) -- something like that wouldn't be mentioned in textbooks.
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>>> On Oct 4, 2019, at 6:33 PM, C W wrote:
>>>>>>
>>>>>> Thanks Sebastian, I think I get it.
>>>>>>
>>>>>> It's just that I have never seen it this way. Quite different from what I'm used to from Elements of Statistical Learning.
>>>>>>
>>>>>> On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka wrote:
>>>>>> Not sure if there's a website for that. In any case, to explain this differently: as discussed earlier, sklearn assumes continuous features for decision trees. So, it will use a binary threshold for splitting along a feature attribute. In other words, it cannot do something like
>>>>>>
>>>>>> if x == 1 then right child node
>>>>>> else left child node
>>>>>>
>>>>>> Instead, what it does is
>>>>>>
>>>>>> if x >= 0.5 then right child node
>>>>>> else left child node
>>>>>>
>>>>>> These are basically equivalent, as you can see when you just plug in the values 0 and 1 for x.
>>>>>>
>>>>>> Best,
>>>>>> Sebastian
>>>>>>
>>>>>>> On Oct 4, 2019, at 5:34 PM, C W wrote:
>>>>>>>
>>>>>>> I don't understand your answer.
>>>>>>>
>>>>>>> Why, after one-hot-encoding, does it still output greater than 0.5 or less than? Does the sklearn website have a working example on categorical input?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka wrote:
>>>>>>> Like Nicolas said, the 0.5 is just a workaround, but it will do the right thing on the one-hot encoded variables here. You will find that the threshold is always at 0.5 for these variables.
I.e., what it will do is to use the following conversion:
>>>>>>>
>>>>>>> treat as car_Audi=1 if car_Audi >= 0.5
>>>>>>> treat as car_Audi=0 if car_Audi < 0.5
>>>>>>>
>>>>>>> or, it may be
>>>>>>>
>>>>>>> treat as car_Audi=1 if car_Audi > 0.5
>>>>>>> treat as car_Audi=0 if car_Audi <= 0.5
>>>>>>>
>>>>>>> (Forgot which one sklearn is using, but either way, it will be fine.)
>>>>>>>
>>>>>>> Best,
>>>>>>> Sebastian
>>>>>>>
>>>>>>>> On Oct 4, 2019, at 1:44 PM, Nicolas Hug wrote:
>>>>>>>>
>>>>>>>>> But, decision tree is still mistaking one-hot-encoding as numerical input and splitting at 0.5. This is not right. Perhaps I'm doing something wrong?
>>>>>>>> You're not doing anything wrong, and neither is the tree. Trees don't support categorical variables in sklearn, so everything is treated as numerical.
>>>>>>>>
>>>>>>>> This is why we do one-hot-encoding: so that a set of numerical (one-hot encoded) features can be treated as if they were just one categorical feature.
>>>>>>>>
>>>>>>>> Nicolas
>>>>>>>>
>>>>>>>> On 10/4/19 2:01 PM, C W wrote:
>>>>>>>>> Yes, you are right. It was 0.5 and 0.5 for the split, not 1.5. So, typo on my part.
>>>>>>>>>
>>>>>>>>> Looks like I did one-hot-encoding correctly. My new variable names are: car_Audi, car_BMW, etc.
>>>>>>>>>
>>>>>>>>> But, the decision tree is still mistaking one-hot-encoding as numerical input and splitting at 0.5. This is not right. Perhaps I'm doing something wrong?
>>>>>>>>>
>>>>>>>>> Is there a good toy example on the sklearn website? I am only seeing this: https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category. The tree splits at 0.5 and 1.5
>>>>>>>>> that's not a one-hot encoding then.
>>>>>>>>>
>>>>>>>>> For an Audi datapoint, it should be
>>>>>>>>>
>>>>>>>>> BMW=0
>>>>>>>>> Toyota=0
>>>>>>>>> Audi=1
>>>>>>>>>
>>>>>>>>> for BMW
>>>>>>>>>
>>>>>>>>> BMW=1
>>>>>>>>> Toyota=0
>>>>>>>>> Audi=0
>>>>>>>>>
>>>>>>>>> and for Toyota
>>>>>>>>>
>>>>>>>>> BMW=0
>>>>>>>>> Toyota=1
>>>>>>>>> Audi=0
>>>>>>>>>
>>>>>>>>> The split threshold should then be at 0.5 for any of these features.
>>>>>>>>>
>>>>>>>>> Based on your email, I think you were assuming that the DT does the one-hot encoding internally, which it doesn't. In practice, it is hard to guess what is a nominal and what is an ordinal variable, so you have to do the one-hot encoding before you give the data to the decision tree.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>> On Oct 4, 2019, at 11:48 AM, C W wrote:
>>>>>>>>>>
>>>>>>>>>> I'm getting some funny results. I am doing a regression decision tree; the response variables are assigned to levels.
>>>>>>>>>>
>>>>>>>>>> The funny part is: the tree is taking one-hot-encoding (BMW=0, Toyota=1, Audi=2) as numerical values, not category.
>>>>>>>>>>
>>>>>>>>>> The tree splits at 0.5 and 1.5. Am I doing one-hot-encoding wrong? How does sklearn know internally that 0 vs. 1 is categorical, not numerical?
>>>>>>>>>>
>>>>>>>>>> In R, for instance, you do as.factor(), which explicitly states the data type.
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller wrote:
>>>>>>>>>>
>>>>>>>>>> On 9/15/19 8:16 AM, Guillaume Lemaître wrote:
>>>>>>>>>>> On Sat, 14 Sep 2019 at 20:59, C W wrote:
>>>>>>>>>>> Thanks, Guillaume.
>>>>>>>>>>> Column transformer looks pretty neat. I've also heard, though, that this pipeline can be tedious to set up? Specifying what you want for every feature is a pain.
>>>>>>>>>>>
>>>>>>>>>>> It would be interesting for us to know which part of the pipeline is tedious to set up, so we can see if we can improve something there.
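The 0.5 thresholds Sebastian describes are easy to verify. A small sketch with made-up toy data (the car/income columns are illustrative, not from the thread): one-hot encode a categorical column with pandas, fit a regression tree, and inspect the learned split thresholds.

```python
# Every split the tree learns on a 0/1 dummy column lands at 0.5,
# the midpoint between the only two observed values.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "car": ["BMW", "Toyota", "Audi", "BMW", "Audi", "Toyota"],
    "income": [10000, 9000, 12000, 10500, 11800, 9200],
})

# One column per category: car_Audi, car_BMW, car_Toyota
X = pd.get_dummies(df[["car"]])
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X, df["income"])

# Internal nodes have a real left child; leaves are marked with -1.
internal = tree.tree_.children_left != -1
print(sorted(set(tree.tree_.threshold[internal])))  # -> [0.5]
```

The tree treats each dummy strictly as a number, but since the column only ever holds 0 or 1, "x >= 0.5" is exactly the category-membership test.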
>>>>>>>>>>> Do you mean that you would like to automatically detect which type of feature (categorical/numerical) and apply a
>>>>>>>>>>> default encoder/scaling, such as discussed there: https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127
>>>>>>>>>>>
>>>>>>>>>>> IMO, from a user perspective, it would be cleaner in some cases, at the cost of blindly applying a black box,
>>>>>>>>>>> which might be dangerous.
>>>>>>>>>> Also see https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
>>>>>>>>>> which basically does that.
>>>>>>>>>>
>>>>>>>>>>> Javier,
>>>>>>>>>>> Actually, you guessed right. My real data has only one numerical variable, and looks more like this:
>>>>>>>>>>>
>>>>>>>>>>> Gender Date Income Car Attendance
>>>>>>>>>>> Male 2019/3/01 10000 BMW Yes
>>>>>>>>>>> Female 2019/5/02 9000 Toyota No
>>>>>>>>>>> Male 2019/7/15 12000 Audi Yes
>>>>>>>>>>>
>>>>>>>>>>> I am predicting income using all other categorical variables. Maybe it is catboost!
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> M
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Sep 14, 2019 at 9:25 AM Javier López wrote:
>>>>>>>>>>> If you have datasets with many categorical features, and perhaps many categories, the tools in sklearn are quite limited,
>>>>>>>>>>> but there are alternative implementations of boosted trees that are designed with categorical features in mind. Take a look
>>>>>>>>>>> at catboost [1], which has an sklearn-compatible API.
>>>>>>>>>>>
>>>>>>>>>>> J
>>>>>>>>>>>
>>>>>>>>>>> [1] https://catboost.ai/
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Sep 14, 2019 at 3:40 AM C W wrote:
>>>>>>>>>>> Hello all,
>>>>>>>>>>> I'm very confused. Can the decision tree module handle both continuous and categorical features in the dataset? In this case, it's just CART (Classification and Regression Trees).
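For a mixed-type table like the one in the question, the usual sklearn recipe is the ColumnTransformer mentioned earlier in the thread. A sketch with made-up data and column names (one possible setup, not the thread's exact code): one-hot encode the categorical columns, pass any numerical ones through, and feed the result to the tree.

```python
# One-hot encode categorical columns, pass numeric ones through,
# then fit a tree on the combined matrix.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Female"],
    "car": ["BMW", "Toyota", "Audi", "BMW"],
    "attendance": ["Yes", "No", "Yes", "Yes"],
    "age": [30, 35, 50, 41],
    "income": [10000, 9000, 12000, 9500],
})

categorical = ["gender", "car", "attendance"]
numerical = ["age"]

preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",  # age is forwarded unchanged
)
model = make_pipeline(preprocess, DecisionTreeRegressor(random_state=0))
model.fit(df[categorical + numerical], df["income"])
print(model.predict(df[categorical + numerical]))
```

This is the per-feature wiring C W found tedious; tools like dabl's EasyPreprocessor (linked above) try to infer the column types instead.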
>>>>>>>>>>>
>>>>>>>>>>> For example,
>>>>>>>>>>> Gender Age Income Car Attendance
>>>>>>>>>>> Male 30 10000 BMW Yes
>>>>>>>>>>> Female 35 9000 Toyota No
>>>>>>>>>>> Male 50 12000 Audi Yes
>>>>>>>>>>>
>>>>>>>>>>> According to the documentation https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart, it can not!
>>>>>>>>>>>
>>>>>>>>>>> It says: "scikit-learn implementation does not support categorical variables for now".
>>>>>>>>>>>
>>>>>>>>>>> Is this true? If not, can someone point me to an example? If yes, what do people do?
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much!
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> scikit-learn mailing list
>>>>>>>>>>> scikit-learn at python.org
>>>>>>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Lemaitre
>>>>>>>>>>> INRIA Saclay - Parietal team
>>>>>>>>>>> Center for Data Science Paris-Saclay
>>>>>>>>>>> https://glemaitre.github.io/
From stuart at stuartreynolds.net Sun Oct 6 18:16:59 2019 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Sun, 6 Oct 2019 15:16:59 -0700 Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 11 In-Reply-To: References: Message-ID:

Pandas has a read_excel function that can load data from an excel spreadsheet:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

On Sun, Oct 6, 2019 at 1:57 AM Mike Smith wrote:
> Can I call an MSExcel cell range in a function such as model.predict(), instead of typing the data in for each element?
>
> On Sat, Oct 5, 2019 at 11:58 AM wrote:
>>
>> Today's Topics:
>>
>> 1.
Re: scikit-learn Digest, Vol 43, Issue 10 (Mike Smith)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sat, 5 Oct 2019 11:55:33 -0700
>> From: Mike Smith
>> To: scikit-learn at python.org
>> Subject: Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 10
>> Content-Type: text/plain; charset="utf-8"
>>
>> 1. Re: Can Scikit-learn decision tree (CART) have both continuous and categorical features? (C W)
>>
>> What I'd ask in reply to this is whether regression and classification module results can be entered into an input for one resultant output.
>>
>> On Sat, Oct 5, 2019, 11:50 AM, wrote:
>> >
>> > Message: 1
>> > Date: Sat, 5 Oct 2019 14:50:09 -0400
>> > From: C W
>> > To: Scikit-learn mailing list
>> > Subject: Re: [scikit-learn] Can Scikit-learn decision tree (CART) have both continuous and categorical features?
>> > Content-Type: text/plain; charset="utf-8"
>> >
>> > Thanks, great material! I got pydotplus with graphviz to work.
>> > >> > Using the code on the sklearn website [1], tree.plot_tree(clf.fit(iris.data, iris.target)) gives an error:
>> > AttributeError: module 'sklearn.tree' has no attribute 'plot_tree'
>> >
>> > Both my colleague and I got the same error message. Per this post
>> > https://github.com/Microsoft/LightGBM/issues/1844, a PyPI update is needed.
>> >
>> > [1] sklearn link:
>> > https://scikit-learn.org/stable/modules/tree.html#classification
>> >
>> > On Fri, Oct 4, 2019 at 11:52 PM Sebastian Raschka <mail at sebastianraschka.com> wrote:
>> > > The docs show a way such that you don't need to write it as a png file, using tree.plot_tree:
>> > > https://scikit-learn.org/stable/modules/tree.html#classification
>> > >
>> > > I don't remember why, but I think I had problems with that in the past (I think it didn't look so nice visually, but I don't remember), which is why I still stick to graphviz. For my use cases, it's not much hassle -- it used to be a bit of a hassle to get GraphViz working, but now you can do
>> > >
>> > > conda install pydotplus
>> > > conda install graphviz
>> > >
>> > > Coincidentally, I just made an example for a lecture I was teaching on Tue:
>> > > https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb
>> > >
>> > > Best,
>> > > Sebastian
>> > >
>> > > > On Oct 4, 2019, at 10:09 PM, C W wrote:
>> > > >
>> > > > On a separate note, what do you use for plotting?
>> > > >
>> > > > I found graphviz, but you have to first save it as a png on your computer. That's a lot of work for just one plot. Is there something like matplotlib?
>> > > >
>> > > > Thanks!
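That AttributeError usually just means the installed scikit-learn predates 0.21, the release that added sklearn.tree.plot_tree. A quick sanity check (the pip command in the comment is one way to upgrade; conda users would use conda update instead):

```python
# sklearn.tree.plot_tree exists only in scikit-learn >= 0.21; on older
# releases the import below raises an ImportError and an upgrade is
# needed, e.g. via:  pip install --upgrade scikit-learn
import sklearn
print(sklearn.__version__)

from sklearn.tree import plot_tree  # fails on scikit-learn < 0.21
```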
>> > > > >> > > > On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka < >> > > mail at sebastianraschka.com> wrote: >> > > > Yeah, think of it more as a computational workaround for achieving >> the >> > > same thing more efficiently (although it looks inelegant/weird)-- >> > something >> > > like that wouldn't be mentioned in textbooks. >> > > > >> > > > Best, >> > > > Sebastian >> > > > >> > > > > On Oct 4, 2019, at 6:33 PM, C W wrote: >> > > > > >> > > > > Thanks Sebastian, I think I get it. >> > > > > >> > > > > It's just have never seen it this way. Quite different from what >> I'm >> > > used in Elements of Statistical Learning. >> > > > > >> > > > > On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka < >> > > mail at sebastianraschka.com> wrote: >> > > > > Not sure if there's a website for that. In any case, to explain >> this >> > > differently, as discussed earlier sklearn assumes continuous features >> for >> > > decision trees. So, it will use a binary threshold for splitting >> along a >> > > feature attribute. In other words, it cannot do sth like >> > > > > >> > > > > if x == 1 then right child node >> > > > > else left child node >> > > > > >> > > > > Instead, what it does is >> > > > > >> > > > > if x >= 0.5 then right child node >> > > > > else left child node >> > > > > >> > > > > These are basically equivalent as you can see when you just plug >> in >> > > values 0 and 1 for x. >> > > > > >> > > > > Best, >> > > > > Sebastian >> > > > > >> > > > > > On Oct 4, 2019, at 5:34 PM, C W wrote: >> > > > > > >> > > > > > I don't understand your answer. >> > > > > > >> > > > > > Why after one-hot-encoding it still outputs greater than 0.5 or >> > less >> > > than? Does sklearn website have a working example on categorical >> input? >> > > > > > >> > > > > > Thanks! 
>> > > > > > >> > > > > > On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka < >> > > mail at sebastianraschka.com> wrote: >> > > > > > Like Nicolas said, the 0.5 is just a workaround but will do the >> > > right thing on the one-hot encoded variables, here. You will find that >> > the >> > > threshold is always at 0.5 for these variables. I.e., what it will do >> is >> > to >> > > use the following conversion: >> > > > > > >> > > > > > treat as car_Audi=1 if car_Audi >= 0.5 >> > > > > > treat as car_Audi=0 if car_Audi < 0.5 >> > > > > > >> > > > > > or, it may be >> > > > > > >> > > > > > treat as car_Audi=1 if car_Audi > 0.5 >> > > > > > treat as car_Audi=0 if car_Audi <= 0.5 >> > > > > > >> > > > > > (Forgot which one sklearn is using, but either way. it will be >> > fine.) >> > > > > > >> > > > > > Best, >> > > > > > Sebastian >> > > > > > >> > > > > > >> > > > > >> On Oct 4, 2019, at 1:44 PM, Nicolas Hug >> wrote: >> > > > > >> >> > > > > >> >> > > > > >>> But, decision tree is still mistaking one-hot-encoding as >> > > numerical input and split at 0.5. This is not right. Perhaps, I'm >> doing >> > > something wrong? >> > > > > >> >> > > > > >> You're not doing anything wrong, and neither is the tree. Trees >> > > don't support categorical variables in sklearn, so everything is >> treated >> > as >> > > numerical. >> > > > > >> >> > > > > >> This is why we do one-hot-encoding: so that a set of numerical >> > (one >> > > hot encoded) features can be treated as if they were just one >> categorical >> > > feature. >> > > > > >> >> > > > > >> >> > > > > >> >> > > > > >> Nicolas >> > > > > >> >> > > > > >> On 10/4/19 2:01 PM, C W wrote: >> > > > > >>> Yes, you are right. it was 0.5 and 0.5 for split, not 1.5. So, >> > > typo on my part. >> > > > > >>> >> > > > > >>> Looks like I did one-hot-encoding correctly. My new variable >> > names >> > > are: car_Audi, car_BMW, etc. 
>> > > > > >>> >> > > > > >>> But, decision tree is still mistaking one-hot-encoding as >> > > numerical input and split at 0.5. This is not right. Perhaps, I'm >> doing >> > > something wrong? >> > > > > >>> >> > > > > >>> Is there a good toy example on the sklearn website? I am only >> see >> > > this: >> > > >> > >> https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html >> > > . >> > > > > >>> >> > > > > >>> Thanks! >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka < >> > > mail at sebastianraschka.com> wrote: >> > > > > >>> Hi, >> > > > > >>> >> > > > > >>>> The funny part is: the tree is taking one-hot-encoding >> (BMW=0, >> > > Toyota=1, Audi=2) as numerical values, not category.The tree splits at >> > 0.5 >> > > and 1.5 >> > > > > >>> >> > > > > >>> that's not a onehot encoding then. >> > > > > >>> >> > > > > >>> For an Audi datapoint, it should be >> > > > > >>> >> > > > > >>> BMW=0 >> > > > > >>> Toyota=0 >> > > > > >>> Audi=1 >> > > > > >>> >> > > > > >>> for BMW >> > > > > >>> >> > > > > >>> BMW=1 >> > > > > >>> Toyota=0 >> > > > > >>> Audi=0 >> > > > > >>> >> > > > > >>> and for Toyota >> > > > > >>> >> > > > > >>> BMW=0 >> > > > > >>> Toyota=1 >> > > > > >>> Audi=0 >> > > > > >>> >> > > > > >>> The split threshold should then be at 0.5 for any of these >> > > features. >> > > > > >>> >> > > > > >>> Based on your email, I think you were assuming that the DT >> does >> > > the one-hot encoding internally, which it doesn't. In practice, it is >> > hard >> > > to guess what is a nominal and what is a ordinal variable, so you >> have to >> > > do the onehot encoding before you give the data to the decision tree. >> > > > > >>> >> > > > > >>> Best, >> > > > > >>> Sebastian >> > > > > >>> >> > > > > >>>> On Oct 4, 2019, at 11:48 AM, C W wrote: >> > > > > >>>> >> > > > > >>>> I'm getting some funny results. 
>>>> Can the decision tree module handle both
>>>> continuous and categorical features in the dataset? In this case,
>>>> it's just CART (Classification and Regression Trees).
>>>>
>>>> For example,
>>>> Gender Age Income Car Attendance
>>>> Male 30 10000 BMW Yes
>>>> Female 35 9000 Toyota No
>>>> Male 50 12000 Audi Yes
>>>>
>>>> According to the documentation
>>>> https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart ,
>>>> it can not!
>>>>
>>>> It says: "scikit-learn implementation does not support categorical
>>>> variables for now".
>>>>
>>>> Is this true? If not, can someone point me to an example? If yes,
>>>> what do people do?
>>>>
>>>> Thank you very much!
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

------------------------------

End of scikit-learn Digest, Vol 43, Issue 10
********************************************

------------------------------

End of scikit-learn Digest, Vol 43, Issue 11
********************************************

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hemanth.genie at gmail.com  Tue Oct 8 07:47:14 2019
From: hemanth.genie at gmail.com (Hemanth Kota)
Date: Tue, 8 Oct 2019 17:17:14 +0530
Subject: [scikit-learn] Regarding design decision for putting Data Scaler and Feature Transformers under same module
Message-ID:

Hi Team,

I'm a beginner with the sklearn library. I have a doubt regarding the
reason for putting data scalers like StandardScaler, RobustScaler, etc.
and feature transformers like QuantileTransformer and PolynomialFeatures
in the same preprocessing module. What relationship made them go
together?

Thanks
Hemanth
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From g.lemaitre58 at gmail.com  Tue Oct 8 07:54:28 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Tue, 08 Oct 2019 13:54:28 +0200
Subject: [scikit-learn] Regarding design decision for putting Data Scaler and Feature Transformers under same module
In-Reply-To:
Message-ID:

An HTML attachment was scrubbed...
URL:

From hemanth.genie at gmail.com  Tue Oct 8 07:59:29 2019
From: hemanth.genie at gmail.com (Hemanth Kota)
Date: Tue, 8 Oct 2019 17:29:29 +0530
Subject: [scikit-learn] Regarding design decision for putting Data Scaler and Feature Transformers under same module
In-Reply-To:
References:
Message-ID:

Simple reason. Thanks

Hemanth

On Tue, Oct 8, 2019, 5:26 PM Guillaume Lemaître wrote:

> You apply them all before using any machine learning algorithm. They are
> preprocessing methods.
>
> Sent from my phone - sorry to be brief and potential misspell.
> *From:* hemanth.genie at gmail.com
> *Sent:* 8 October 2019 14:49
> *To:* scikit-learn at python.org
> *Reply to:* scikit-learn at python.org
> *Subject:* [scikit-learn] Regarding design decision for putting Data
> Scaler and Feature Transformers under same module
>
> Hi Team,
>
> I'm a beginner with the sklearn library.
> I have a doubt regarding the reason for putting data scalers like
> StandardScaler, RobustScaler, etc. and feature transformers like
> QuantileTransformer and PolynomialFeatures in the same preprocessing
> module. What relationship made them go together?
>
> Thanks
> Hemanth
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From benoit.presles at u-bourgogne.fr  Tue Oct 8 13:19:57 2019
From: benoit.presles at u-bourgogne.fr (Benoît Presles)
Date: Tue, 8 Oct 2019 19:19:57 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
Message-ID:

Dear scikit-learn users,

I am using logistic regression to make some predictions. On my own data,
I do not get the same results between solvers. I managed to reproduce
this issue on synthetic data (see the code below).
All solvers seem to converge (n_iter_ < max_iter), so why do I get
different results?
If results between solvers are not stable, which one to choose?

Best regards,
Ben

------------------------------------------

Here is the code I used to generate synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
#
RANDOM_SEED = 2
#
X_sim, y_sim = make_classification(n_samples=200,
                                   n_features=45,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=1, random_state=RANDOM_SEED,
                                          C=1e9, solver='lbfgs')
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=1, random_state=RANDOM_SEED,
                                         C=1e9, solver='saga')
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    #
    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)
    #
    if (y_pred_lbfgs==y_pred_saga).all() == False:
        print('lbfgs does not give the same results as saga :-( !')
        exit()

From t3kcit at gmail.com  Tue Oct 8 13:51:22 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Tue, 8 Oct 2019 19:51:22 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To:
References:
Message-ID:

I'm pretty sure SAGA is not converging. Unless you scale the data, SAGA
is very slow to converge.

On 10/8/19 7:19 PM, Benoît Presles wrote:
> Dear scikit-learn users,
>
> I am using logistic regression to make some predictions. On my own
> data, I do not get the same results between solvers. I managed to
> reproduce this issue on synthetic data (see the code below).
> All solvers seem to converge (n_iter_ < max_iter), so why do I get
> different results?
> If results between solvers are not stable, which one to choose?
>
> Best regards,
> Ben
>
> [...]
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From benoit.presles at u-bourgogne.fr  Tue Oct 8 14:19:50 2019
From: benoit.presles at u-bourgogne.fr (Benoît Presles)
Date: Tue, 8 Oct 2019 20:19:50 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To:
References:
Message-ID: <1F1286A9-D63A-4E78-8474-B24C7FCB8B4B@u-bourgogne.fr>

As you can notice in the code below, I do scale the data. I do not get
any convergence warning, and moreover I always have n_iter_ < max_iter.

> Le 8 oct. 2019 à 19:51, Andreas Mueller a écrit :
>
> I'm pretty sure SAGA is not converging. Unless you scale the data,
> SAGA is very slow to converge.
>
>> On 10/8/19 7:19 PM, Benoît Presles wrote:
>> Dear scikit-learn users,
>>
>> I am using logistic regression to make some predictions. On my own
>> data, I do not get the same results between solvers. I managed to
>> reproduce this issue on synthetic data (see the code below).
>>
>> Best regards,
>> Ben
>>
>> [...]
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
_______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From benoit.presles at u-bourgogne.fr  Wed Oct 9 13:21:53 2019
From: benoit.presles at u-bourgogne.fr (Benoît Presles)
Date: Wed, 9 Oct 2019 19:21:53 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To: <1F1286A9-D63A-4E78-8474-B24C7FCB8B4B@u-bourgogne.fr>
References: <1F1286A9-D63A-4E78-8474-B24C7FCB8B4B@u-bourgogne.fr>
Message-ID:

Dear scikit-learn users,

Do you think it is a bug in scikit-learn?

Best regards,
Ben

Le 08/10/2019 à 20:19, Benoît Presles a écrit :
> As you can notice in the code below, I do scale the data. I do not get
> any convergence warning, and moreover I always have n_iter_ < max_iter.
>
>> Le 8 oct. 2019 à 19:51, Andreas Mueller a écrit :
>>
>> I'm pretty sure SAGA is not converging. Unless you scale the data,
>> SAGA is very slow to converge.
>>
>>> On 10/8/19 7:19 PM, Benoît Presles wrote:
>>> Dear scikit-learn users,
>>>
>>> I am using logistic regression to make some predictions. On my own
>>> data, I do not get the same results between solvers. I managed to
>>> reproduce this issue on synthetic data (see the code below).
>>> All solvers seem to converge (n_iter_ < max_iter), so why do I get
>>> different results?
>>> If results between solvers are not stable, which one to choose?
>>>
>>> Best regards,
>>> Ben
>>>
>>> [...]
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>>
https://mail.python.org/mailman/listinfo/scikit-learn
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From g.lemaitre58 at gmail.com  Wed Oct 9 14:25:11 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 09 Oct 2019 21:25:11 +0300
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To:
Message-ID:

Could you generate more samples, set penalty to none, reduce the
tolerance, and check the coefficients instead of the predictions? This
is just to be sure that this is not only a numerical error.

Sent from my phone - sorry to be brief and potential misspell.

Original Message

From: benoit.presles at u-bourgogne.fr
Sent: 8 October 2019 20:27
To: scikit-learn at python.org
Reply to: scikit-learn at python.org
Subject: [scikit-learn] logistic regression results are not stable between solvers

Dear scikit-learn users,

I am using logistic regression to make some predictions. On my own data,
I do not get the same results between solvers. I managed to reproduce
this issue on synthetic data (see the code below).
All solvers seem to converge (n_iter_ < max_iter), so why do I get
different results?
If results between solvers are not stable, which one to choose?

Best regards,
Ben

------------------------------------------

[...]

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

From benoit.presles at u-bourgogne.fr  Wed Oct 9 15:44:46 2019
From: benoit.presles at u-bourgogne.fr (Benoît Presles)
Date: Wed, 9 Oct 2019 21:44:46 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To:
References:
Message-ID: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr>

Dear scikit-learn users,

I did what you suggested (see the code below) and I still do not get the
same results between solvers. I do not have the same predictions and I
do not have the same coefficients.

Best regards,
Ben

Here is the new source code:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
#
RANDOM_SEED = 2
#
X_sim, y_sim = make_classification(n_samples=400,
                                   n_features=45,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)
#
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)
    #
    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=0, random_state=RANDOM_SEED,
                                          C=1e9, solver='lbfgs',
                                          penalty='none', tol=1e-6)
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)
    print(classifier_lbfgs.coef_)
    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=0, random_state=RANDOM_SEED,
                                         C=1e9, solver='saga',
                                         penalty='none', tol=1e-6)
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)
    print(classifier_saga.coef_)
    #
    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)
    #
    if (y_pred_lbfgs==y_pred_saga).all() == False:
        print('lbfgs does not give the same results as saga :-( !')
        exit(1)

Le 09/10/2019 à 20:25, Guillaume Lemaître a écrit :
> Could you generate more samples, set penalty to none, reduce the
> tolerance, and check the coefficients instead of the predictions? This
> is just to be sure that this is not only a numerical error.
>
> Sent from my phone - sorry to be brief and potential misspell.
>
> Original Message
>
> From: benoit.presles at u-bourgogne.fr
> Sent: 8 October 2019 20:27
> To: scikit-learn at python.org
> Reply to: scikit-learn at python.org
> Subject: [scikit-learn] logistic regression results are not stable between solvers
>
> Dear scikit-learn users,
>
> I am using logistic regression to make some predictions. On my own data,
> I do not get the same results between solvers. I managed to reproduce
> this issue on synthetic data (see the code below).
> All solvers seem to converge (n_iter_ < max_iter), so why do I get
> different results?
> If results between solvers are not stable, which one to choose?
>
> Best regards,
> Ben
>
> [...]
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From seralouk at hotmail.com  Wed Oct 9 16:10:44 2019
From: seralouk at hotmail.com (serafim loukas)
Date: Wed, 9 Oct 2019 20:10:44 +0000
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr>
References: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr>
Message-ID: <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com>

The predictions across solvers are exactly the same when I run the code.
I am using version 0.21.3. What is yours?

In [13]: import sklearn

In [14]: sklearn.__version__
Out[14]: '0.21.3'

Serafeim

On 9 Oct 2019, at 21:44, Benoît Presles wrote:

> (y_pred_lbfgs==y_pred_saga).all() == False

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rth.yurchak at gmail.com  Wed Oct 9 17:20:46 2019
From: rth.yurchak at gmail.com (Roman Yurchak)
Date: Wed, 9 Oct 2019 23:20:46 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To: <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com>
References: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr> <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com>
Message-ID: <586c6024-9bef-3ab8-513d-547913808039@gmail.com>

Ben,

I can confirm your results with penalty='none' and C=1e9. In both cases,
you are running a mostly unpenalized logistic regression.
Usually that's less numerically stable than with a small regularization,
depending on the data collinearity.

Running that same code with
- larger penalty (smaller C values)
- or larger number of samples
yields for me the same coefficients (up to some tolerance).

You can also see that SAGA convergence is not good by the fact that it
needs 196000 epochs/iterations to converge.

Actually, I have often seen convergence issues with SAG on small
datasets (in unit tests), not fully sure why.

--
Roman

On 09/10/2019 22:10, serafim loukas wrote:
> The predictions across solvers are exactly the same when I run the code.
> I am using version 0.21.3. What is yours?
>
> In [13]: import sklearn
>
> In [14]: sklearn.__version__
> Out[14]: '0.21.3'
>
> Serafeim
>
>> On 9 Oct 2019, at 21:44, Benoît Presles wrote:
>>
>> (y_pred_lbfgs==y_pred_saga).all() == False
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From g.lemaitre58 at gmail.com  Wed Oct 9 17:36:07 2019
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 9 Oct 2019 23:36:07 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To: <586c6024-9bef-3ab8-513d-547913808039@gmail.com>
References: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr> <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com> <586c6024-9bef-3ab8-513d-547913808039@gmail.com>
Message-ID:

I slightly changed the bench such that it uses a pipeline, and plotted
the coefficients:

https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386

I only see one of the 10 splits where SAGA is not converging; otherwise
the coefficients look very close (I don't attach the figure here, but
they can be plotted using the snippet).
So apart from this second split, the other differences seem to be
numerical instability.
Where I do have some concern is the convergence rate of SAGA, but I have no intuition as to whether this is normal or not.

On Wed, 9 Oct 2019 at 23:22, Roman Yurchak wrote:
> Ben,
>
> I can confirm your results with penalty='none' and C=1e9. In both cases,
> you are running a mostly unpenalized logisitic regression. Usually
> that's less numerically stable than with a small regularization,
> depending on the data collinearity.
>
> Running that same code with
> - larger penalty (smaller C values)
> - or larger number of samples
> yields for me the same coefficients (up to some tolerance).
>
> You can also see that SAGA convergence is not good by the fact that it
> needs 196000 epochs/iterations to converge.
>
> Actually, I have often seen convergence issues with SAG on small
> datasets (in unit tests), not fully sure why.
>
> --
> Roman
>
> On 09/10/2019 22:10, serafim loukas wrote:
> > The predictions across solver are exactly the same when I run the code.
> > I am using 0.21.3 version. What is yours?
> >
> > In [13]: import sklearn
> >
> > In [14]: sklearn.__version__
> > Out[14]: '0.21.3'
> >
> > Serafeim
> >
> >> On 9 Oct 2019, at 21:44, Benoît Presles >> > wrote:
> >>
> >> (y_pred_lbfgs==y_pred_saga).all() == False
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> >
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From g.lemaitre58 at gmail.com Wed Oct 9 17:37:57 2019
From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=)
Date: Wed, 9 Oct 2019 23:37:57 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To: 
References: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr> <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com> <586c6024-9bef-3ab8-513d-547913808039@gmail.com>
Message-ID: 

Uhm, actually, increasing to 10000 samples solves the convergence issue. SAGA is most probably not designed to work with such a small sample size.

On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître wrote:
> I slightly change the bench such that it uses pipeline and plotted the
> coefficient:
>
> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
>
> I only see one of the 10 splits where SAGA is not converging, otherwise
> the coefficients look very close (I don't attach the figure here but
> they can be plotted using the snippet).
> So apart from this second split, the other differences seems to be
> numerical instability.
>
> Where I have some concern is regarding the convergence rate of SAGA but
> I have no intuition to know if this is normal or not.
>
> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak wrote:
>
>> Ben,
>>
>> I can confirm your results with penalty='none' and C=1e9. In both cases,
>> you are running a mostly unpenalized logisitic regression. Usually
>> that's less numerically stable than with a small regularization,
>> depending on the data collinearity.
>>
>> Running that same code with
>> - larger penalty (smaller C values)
>> - or larger number of samples
>> yields for me the same coefficients (up to some tolerance).
>>
>> You can also see that SAGA convergence is not good by the fact that it
>> needs 196000 epochs/iterations to converge.
>>
>> Actually, I have often seen convergence issues with SAG on small
>> datasets (in unit tests), not fully sure why.
>> >> -- >> Roman >> >> On 09/10/2019 22:10, serafim loukas wrote: >> > The predictions across solver are exactly the same when I run the code. >> > I am using 0.21.3 version. What is yours? >> > >> > >> > In [13]: import sklearn >> > >> > In [14]: sklearn.__version__ >> > Out[14]: '0.21.3' >> > >> > >> > Serafeim >> > >> > >> > >> >> On 9 Oct 2019, at 21:44, Beno?t Presles > >> > wrote: >> >> >> >> (y_pred_lbfgs==y_pred_saga).all() == False >> > >> > >> > _______________________________________________ >> > scikit-learn mailing list >> > scikit-learn at python.org >> > https://mail.python.org/mailman/listinfo/scikit-learn >> > >> >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.lemaitre58 at gmail.com Wed Oct 9 17:39:05 2019 From: g.lemaitre58 at gmail.com (=?UTF-8?Q?Guillaume_Lema=C3=AEtre?=) Date: Wed, 9 Oct 2019 23:39:05 +0200 Subject: [scikit-learn] logistic regression results are not stable between solvers In-Reply-To: References: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr> <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com> <586c6024-9bef-3ab8-513d-547913808039@gmail.com> Message-ID: Ups I did not see the answer of Roman. Sorry about that. It is coming back to the same conclusion :) On Wed, 9 Oct 2019 at 23:37, Guillaume Lema?tre wrote: > Uhm actually increasing to 10000 samples solve the convergence issue. > SAGA is not designed to work with a so small sample size most probably. 
> > On Wed, 9 Oct 2019 at 23:36, Guillaume Lema?tre > wrote: > >> I slightly change the bench such that it uses pipeline and plotted the >> coefficient: >> >> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386 >> >> I only see one of the 10 splits where SAGA is not converging, otherwise >> the coefficients >> look very close (I don't attach the figure here but they can be plotted >> using the snippet). >> So apart from this second split, the other differences seems to be >> numerical instability. >> >> Where I have some concern is regarding the convergence rate of SAGA but I >> have no >> intuition to know if this is normal or not. >> >> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak wrote: >> >>> Ben, >>> >>> I can confirm your results with penalty='none' and C=1e9. In both cases, >>> you are running a mostly unpenalized logisitic regression. Usually >>> that's less numerically stable than with a small regularization, >>> depending on the data collinearity. >>> >>> Running that same code with >>> - larger penalty ( smaller C values) >>> - or larger number of samples >>> yields for me the same coefficients (up to some tolerance). >>> >>> You can also see that SAGA convergence is not good by the fact that it >>> needs 196000 epochs/iterations to converge. >>> >>> Actually, I have often seen convergence issues with SAG on small >>> datasets (in unit tests), not fully sure why. >>> >>> -- >>> Roman >>> >>> On 09/10/2019 22:10, serafim loukas wrote: >>> > The predictions across solver are exactly the same when I run the code. >>> > I am using 0.21.3 version. What is yours? 
>>> > >>> > >>> > In [13]: import sklearn >>> > >>> > In [14]: sklearn.__version__ >>> > Out[14]: '0.21.3' >>> > >>> > >>> > Serafeim >>> > >>> > >>> > >>> >> On 9 Oct 2019, at 21:44, Beno?t Presles < >>> benoit.presles at u-bourgogne.fr >>> >> > wrote: >>> >> >>> >> (y_pred_lbfgs==y_pred_saga).all() == False >>> > >>> > >>> > _______________________________________________ >>> > scikit-learn mailing list >>> > scikit-learn at python.org >>> > https://mail.python.org/mailman/listinfo/scikit-learn >>> > >>> >>> _______________________________________________ >>> scikit-learn mailing list >>> scikit-learn at python.org >>> https://mail.python.org/mailman/listinfo/scikit-learn >>> >> >> >> -- >> Guillaume Lemaitre >> Scikit-learn @ Inria Foundation >> https://glemaitre.github.io/ >> > > > -- > Guillaume Lemaitre > Scikit-learn @ Inria Foundation > https://glemaitre.github.io/ > -- Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From benoit.presles at u-bourgogne.fr Thu Oct 10 07:14:49 2019 From: benoit.presles at u-bourgogne.fr (=?UTF-8?Q?Beno=c3=aet_Presles?=) Date: Thu, 10 Oct 2019 13:14:49 +0200 Subject: [scikit-learn] logistic regression results are not stable between solvers In-Reply-To: References: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr> <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com> <586c6024-9bef-3ab8-513d-547913808039@gmail.com> Message-ID: <4d4dc37d-ed57-b512-fcdf-45693ff9e489@u-bourgogne.fr> An HTML attachment was scrubbed... 
URL: 

From t3kcit at gmail.com Fri Oct 11 09:42:58 2019
From: t3kcit at gmail.com (Andreas Mueller)
Date: Fri, 11 Oct 2019 15:42:58 +0200
Subject: [scikit-learn] logistic regression results are not stable between solvers
In-Reply-To: <4d4dc37d-ed57-b512-fcdf-45693ff9e489@u-bourgogne.fr>
References: <5591ab4c-6a15-2910-c592-0c019b1a6600@u-bourgogne.fr> <44B72247-308C-42A4-B4E1-DFD1BDFC5058@hotmail.com> <586c6024-9bef-3ab8-513d-547913808039@gmail.com> <4d4dc37d-ed57-b512-fcdf-45693ff9e489@u-bourgogne.fr>
Message-ID: 

On 10/10/19 1:14 PM, Benoît Presles wrote:
>
> Thanks for your answers.
>
> On my real data, I do not have so many samples. I have a bit more than
> 200 samples in total and I also would like to get some results with
> unpenalized logisitic regression.
> What do you suggest? Should I switch to the lbfgs solver?
Yes.
> Am I sure that with this solver I will not have any convergence issue
> and always get the good result? Indeed, I did not get any convergence
> warning with saga, so I thought everything was fine. I noticed some
> issues only when I decided to test several solvers. Without comparing
> the results across solvers, how to be sure that the optimisation goes
> well? Shouldn't scikit-learn warn the user somehow if it is not the case?
We should attempt to warn in the SAGA solver if it doesn't converge.
That it doesn't raise a convergence warning should probably be
considered a bug.
It uses the maximum weight change as a stopping criterion right now.
We could probably compute the dual objective once in the end to see if
we converged, right? Or is that not possible with SAGA? If not, we might
want to caution that no convergence warning will be raised.
>
> At last, I was using saga because I also wanted to do some feature
> selection by using l1 penalty which is not supported by lbfgs...
You can use liblinear then.
>
> Best regards,
> Ben
>
> Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
>> Ups I did not see the answer of Roman.
Sorry about that. It is coming >> back to the same conclusion :) >> >> On Wed, 9 Oct 2019 at 23:37, Guillaume Lema?tre >> > wrote: >> >> Uhm actually increasing to 10000 samples solve the convergence issue. >> SAGA is not designed to work with a so small sample size most >> probably. >> >> On Wed, 9 Oct 2019 at 23:36, Guillaume Lema?tre >> > wrote: >> >> I slightly change the bench such that it uses pipeline and >> plotted the coefficient: >> >> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386 >> >> I only see one of the 10 splits where SAGA is not converging, >> otherwise the coefficients >> look very close (I don't attach the figure here but they can >> be plotted using the snippet). >> So apart from this second split, the other differences seems >> to be numerical instability. >> >> Where I have some concern is regarding the convergence rate >> of SAGA but I have no >> intuition to know if this is normal or not. >> >> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak >> > wrote: >> >> Ben, >> >> I can confirm your results with penalty='none' and C=1e9. >> In both cases, >> you are running a mostly unpenalized logisitic >> regression. Usually >> that's less numerically stable than with a small >> regularization, >> depending on the data collinearity. >> >> Running that same code with >> ? - larger penalty ( smaller C values) >> ? - or larger number of samples >> ? yields for me the same coefficients (up to some tolerance). >> >> You can also see that SAGA convergence is not good by the >> fact that it >> needs 196000 epochs/iterations to converge. >> >> Actually, I have often seen convergence issues with SAG >> on small >> datasets (in unit tests), not fully sure why. >> >> -- >> Roman >> >> On 09/10/2019 22:10, serafim loukas wrote: >> > The predictions across solver are exactly the same when >> I run the code. >> > I am using 0.21.3 version. What is yours? 
>> >
>> >
>> > In [13]: import sklearn
>> >
>> > In [14]: sklearn.__version__
>> > Out[14]: '0.21.3'
>> >
>> >
>> > Serafeim
>> >
>> >
>> >> On 9 Oct 2019, at 21:44, Benoît Presles < >> benoit.presles at u-bourgogne.fr >> >> > wrote:
>> >>
>> >> (y_pred_lbfgs==y_pred_saga).all() == False
>> >
>> >
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn at python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>> >
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> --
>> Guillaume Lemaitre
>> Scikit-learn @ Inria Foundation
>> https://glemaitre.github.io/
>>
>> --
>> Guillaume Lemaitre
>> Scikit-learn @ Inria Foundation
>> https://glemaitre.github.io/
>>
>> --
>> Guillaume Lemaitre
>> Scikit-learn @ Inria Foundation
>> https://glemaitre.github.io/
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From javaeurusd at gmail.com Fri Oct 11 13:04:55 2019
From: javaeurusd at gmail.com (Mike Smith)
Date: Fri, 11 Oct 2019 10:04:55 -0700
Subject: [scikit-learn] scikit supervised learning order
In-Reply-To: 
References: 
Message-ID: 

I see that the list of regressors at https://scikit-learn.org/stable/supervised_learning.html#supervised-learning seems to be ordered from simplest to most complex. For example, 1.6 K-Nearest Neighbors and 1.10 Decision Trees are clearly more basic models than 1.11 Ensemble methods, and worse results are expected from 1.6 and 1.10 than from 1.11.

So, does this mean that the ordering 1.1-1.17 signifies that the models in 1.1 are weaker than those in 1.17, in that order? And that 1.17 Neural Networks is presumed to give the best results?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From javaeurusd at gmail.com Fri Oct 11 13:10:32 2019
From: javaeurusd at gmail.com (Mike Smith)
Date: Fri, 11 Oct 2019 10:10:32 -0700
Subject: [scikit-learn] Is scikit-learn implying neural nets are the best regressor?
In-Reply-To: 
References: 
Message-ID: 

In other words, according to that arrangement, is scikit-learn implying that section 1.17 is the best regressor of those listed, 1.1 to 1.17? If yes, I'd like to know whether I can run the model on a standard PC's CPU and RAM and still expect good results, or whether I need cloud hardware. If I should expect good results on a PC, then scikit-learn is saying that GPU power is obsolete, since certain scikit-learn models that are not designed for GPUs perform better than ML that is designed for GPUs. Is this true? How much hardware is a practical expectation for running the best scikit-learn models and getting the best results?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gael.varoquaux at normalesup.org Fri Oct 11 13:34:33 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Fri, 11 Oct 2019 13:34:33 -0400
Subject: [scikit-learn] Is scikit-learn implying neural nets are the best regressor?
In-Reply-To: 
References: 
Message-ID: <20191011173433.bbywiqnwjjpvsi4r@phare.normalesup.org>

On Fri, Oct 11, 2019 at 10:10:32AM -0700, Mike Smith wrote:
> In other words, according to that arrangement, is scikit-learn implying that
> section 1.17 is the best regressor out of the listed, 1.1 to 1.17?

No. First, they are not ordered by complexity (Naive Bayes is arguably simpler than Gaussian Processes). Second, complexity does not imply better prediction.

> If I should expect good results on a pc, scikit says that needing gpu power is
> obsolete, since certain scikit models perform better (than ml designed for gpu)
> that are not designed for gpu, for that reason. Is this true?

Where do you see this written? I think that you are looking for overly simple stories that are not true.

> How much hardware is a practical expectation for running the best
> scikit models and getting the best results?

This is too vague a question for which there is no answer.
Ga?l > On Fri, Oct 11, 2019 at 9:02 AM wrote: > Send scikit-learn mailing list submissions to > ? ? ? ? scikit-learn at python.org > To subscribe or unsubscribe via the World Wide Web, visit > ? ? ? ? https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > ? ? ? ? scikit-learn-request at python.org > You can reach the person managing the list at > ? ? ? ? scikit-learn-owner at python.org > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > Today's Topics: > ? ?1. Re: logistic regression results are not stable between > ? ? ? solvers (Andreas Mueller) > ---------------------------------------------------------------------- > Message: 1 > Date: Fri, 11 Oct 2019 15:42:58 +0200 > From: Andreas Mueller > To: scikit-learn at python.org > Subject: Re: [scikit-learn] logistic regression results are not stable > ? ? ? ? between solvers > Message-ID: > Content-Type: text/plain; charset="utf-8"; Format="flowed" > On 10/10/19 1:14 PM, Beno?t Presles wrote: > > Thanks for your answers. > > On my real data, I do not have so many samples. I have a bit more than > > 200 samples in total and I also would like to get some results with > > unpenalized logisitic regression. > > What do you suggest? Should I switch to the lbfgs solver? > Yes. > > Am I sure that with this solver I will not have any convergence issue > > and always get the good result? Indeed, I did not get any convergence > > warning with saga, so I thought everything was fine. I noticed some > > issues only when I decided to test several solvers. Without comparing > > the results across solvers, how to be sure that the optimisation goes > > well? Shouldn't scikit-learn warn the user somehow if it is not the case? > We should attempt to warn in the SAGA solver if it doesn't converge. > That it doesn't raise a convergence warning should probably be > considered a bug. 
> It uses the maximum weight change as a stopping criterion right now.
> We could probably compute the dual objective once in the end to see if
> we converged, right? Or is that not possible with SAGA? If not, we might
> want to caution that no convergence warning will be raised.
> > At last, I was using saga because I also wanted to do some feature
> > selection by using l1 penalty which is not supported by lbfgs...
> You can use liblinear then.
> > Best regards,
> > Ben
> > Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
> >> Ups, I did not see the answer of Roman. Sorry about that. It is coming
> >> back to the same conclusion :)
> >> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître wrote:
> >>     Uhm, actually increasing to 10000 samples solves the convergence issue.
> >>     SAGA is most probably not designed to work with such a small sample size.
> >>     On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître wrote:
> >>         I slightly changed the bench such that it uses a pipeline and
> >>         plotted the coefficients:
> >>         https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
> >>         I only see one of the 10 splits where SAGA is not converging;
> >>         otherwise the coefficients look very close (I don't attach the
> >>         figure here but they can be plotted using the snippet).
> >>         So apart from this second split, the other differences seem
> >>         to be numerical instability.
> >>         Where I have some concern is regarding the convergence rate
> >>         of SAGA, but I have no intuition to know if this is normal or not.
> >>         On Wed, 9 Oct 2019 at 23:22, Roman Yurchak wrote:
> >>             Ben,
> >>             I can confirm your results with penalty='none' and C=1e9.
> >>             In both cases, you are running a mostly unpenalized logistic
> >>             regression. Usually
> >>             that's less numerically stable than with a small
> >>             regularization, depending on the data collinearity.
> >>             Running that same code with
> >>               - a larger penalty (smaller C values)
> >>               - or a larger number of samples
> >>             yields for me the same coefficients (up to some tolerance).
> >>             You can also see that SAGA convergence is not good by the
> >>             fact that it needs 196000 epochs/iterations to converge.
> >>             Actually, I have often seen convergence issues with SAG
> >>             on small datasets (in unit tests), not fully sure why.
> >>             --
> >>             Roman
> >>             On 09/10/2019 22:10, serafim loukas wrote:
> >>             > The predictions across solvers are exactly the same when
> >>             > I run the code.
> >>             > I am using the 0.21.3 version. What is yours?
> >>             >
> >>             > In [13]: import sklearn
> >>             > In [14]: sklearn.__version__
> >>             > Out[14]: '0.21.3'
> >>             >
> >>             > Serafeim
> >>             >
> >>             >> On 9 Oct 2019, at 21:44, Benoît Presles wrote:
> >>             >>
> >>             >> (y_pred_lbfgs==y_pred_saga).all() == False
> >>             >
> >>             > _______________________________________________
> >>             > scikit-learn mailing list
> >>             > scikit-learn at python.org
> >>             > https://mail.python.org/mailman/listinfo/scikit-learn
> >>             _______________________________________________
> >>             scikit-learn mailing list
> >>             scikit-learn at python.org
> >>             https://mail.python.org/mailman/listinfo/scikit-learn
> >>         --
> >>         Guillaume Lemaitre
> >>         Scikit-learn @ Inria Foundation
> >>         https://glemaitre.github.io/
> >>     --
> >>     Guillaume Lemaitre
> >>     Scikit-learn @ Inria Foundation
> >>     https://glemaitre.github.io/
> >> --
> >> Guillaume Lemaitre
> >> Scikit-learn @ Inria Foundation
> >> https://glemaitre.github.io/
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn at python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: a7052cd9/attachment-0001.html>
> ------------------------------
> Subject: Digest Footer
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
> ------------------------------
> End of scikit-learn Digest, Vol 43, Issue 21
> ********************************************
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Gael Varoquaux
Research Director, INRIA    Visiting professor, McGill
http://gael-varoquaux.info  http://twitter.com/GaelVaroquaux

From javaeurusd at gmail.com Sat Oct 12 17:04:12 2019
From: javaeurusd at gmail.com (Mike Smith)
Date: Sat, 12 Oct 2019 14:04:12 -0700
Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 24
In-Reply-To: References:
Message-ID:

"... > If I should expect good results on a pc, scikit says that needing gpu power is
> obsolete, since certain scikit models perform better (than ml designed for gpu)
> that are not designed for gpu, for that reason. Is this true?"
Where do you see this written? I think that you are looking for overly
simple stories that are not true."

Gael, see the below from the scikit-learn FAQ. You can also find this
yourself at the main FAQ:

[image: 2019-10-12 14_00_05-Frequently Asked Questions — scikit-learn
0.21.3 documentation.png]

On Sat, Oct 12, 2019 at 9:03 AM wrote:

> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2019-10-12 14_00_05-Frequently Asked Questions — scikit-learn 0.21.3 documentation.png
Type: image/png
Size: 26245 bytes
Desc: not available
URL:

From javaeurusd at gmail.com Sat Oct 12 17:06:51 2019
From: javaeurusd at gmail.com (Mike Smith)
Date: Sat, 12 Oct 2019 14:06:51 -0700
Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 24
In-Reply-To: References:
Message-ID:

Gael, simply because you're not able to or willing to answer the question
doesn't mean there is no practical answer for it.

"... > How much hardware is a practical expectation for running the best
> scikit models and getting the best results?
This is too vague a question for which there is no answer.

Gaël"

On Sat, Oct 12, 2019 at 9:03 AM wrote:

> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From javaeurusd at gmail.com Sat Oct 12 17:08:14 2019
From: javaeurusd at gmail.com (Mike Smith)
Date: Sat, 12 Oct 2019 14:08:14 -0700
Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 25
In-Reply-To: References:
Message-ID:

"Second, complexity does not
> imply better prediction."

Complexity doesn't imply prediction? Perhaps you're having a translation error.
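The quoted claim is not a translation error; it is easy to check empirically that a more complex model is not automatically a better predictor. A minimal sketch, assuming an illustrative (nearly linear) synthetic dataset and arbitrary hyperparameters:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)

# 100 samples of an almost perfectly linear relationship.
X = rng.randn(100, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.randn(100)

simple = LinearRegression()
complex_model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=5000,
                             random_state=0)

# Mean cross-validated R^2 for each model.
r2_simple = cross_val_score(simple, X, y, cv=5).mean()
r2_complex = cross_val_score(complex_model, X, y, cv=5).mean()
print(f"LinearRegression mean R^2: {r2_simple:.3f}")
print(f"MLPRegressor mean R^2:     {r2_complex:.3f}")
# On data like this, the simpler model typically matches or beats the
# far more complex neural network.
```

Which model predicts best depends on the data, which is why the documentation's ordering of regressors carries no ranking.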
On Sat, Oct 12, 2019 at 2:04 PM wrote: > Send scikit-learn mailing list submissions to > scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > scikit-learn-request at python.org > > You can reach the person managing the list at > scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > > 1. Re: scikit-learn Digest, Vol 43, Issue 24 (Mike Smith) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 12 Oct 2019 14:04:12 -0700 > From: Mike Smith > To: scikit-learn at python.org > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 24 > Message-ID: > 4LRy2NJvjwvVr4RgobQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > "... > If I should expect good results on a pc, scikit says that needing > gpu power is > > obsolete, since certain scikit models perform better (than ml designed > for gpu) > > that are not designed for gpu, for that reason. Is this true?" > > Where do you see this written? I think that you are looking for overly > simple stories that you are not true." > > Gael, see the below from the scikit-learn FAQ. You can also find this > yourself at the main FAQ: > > [image: 2019-10-12 14_00_05-Frequently Asked Questions ? 
scikit-learn > 0.21.3 documentation.png] > > > On Sat, Oct 12, 2019 at 9:03 AM wrote: > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > 1. Re: Is scikit-learn implying neural nets are the best > > regressor? (Gael Varoquaux) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Fri, 11 Oct 2019 13:34:33 -0400 > > From: Gael Varoquaux > > To: Scikit-learn mailing list > > Subject: Re: [scikit-learn] Is scikit-learn implying neural nets are > > the best regressor? > > Message-ID: <20191011173433.bbywiqnwjjpvsi4r at phare.normalesup.org> > > Content-Type: text/plain; charset=iso-8859-1 > > > > On Fri, Oct 11, 2019 at 10:10:32AM -0700, Mike Smith wrote: > > > In other words, according to that arrangement, is scikit-learn implying > > that > > > section 1.17 is the best regressor out of the listed, 1.1 to 1.17? > > > > No. > > > > First they are not ordered in order of complexity (Naive Bayes is > > arguably simpler than Gaussian Processes). Second complexity does not > > imply better prediction. > > > > > If I should expect good results on a pc, scikit says that needing gpu > > power is > > > obsolete, since certain scikit models perform better (than ml designed > > for gpu) > > > that are not designed for gpu, for that reason. Is this true? > > > > Where do you see this written? I think that you are looking for overly > > simple stories that you are not true. 
> > > How much hardware is a practical expectation for running the best
> > > scikit models and getting the best results?
> >
> > This is too vague a question for which there is no answer.
> >
> > Gaël
> >
> > > On Fri, Oct 11, 2019 at 9:02 AM > wrote:
> > > Send scikit-learn mailing list submissions to
> > > scikit-learn at python.org
> > > To subscribe or unsubscribe via the World Wide Web, visit
> > > https://mail.python.org/mailman/listinfo/scikit-learn
> > > or, via email, send a message with subject or body 'help' to
> > > scikit-learn-request at python.org
> > > You can reach the person managing the list at
> > > scikit-learn-owner at python.org
> > > When replying, please edit your Subject line so it is more specific
> > > than "Re: Contents of scikit-learn digest..."
> > >
> > > Today's Topics:
> > > 1. Re: logistic regression results are not stable between
> > > solvers (Andreas Mueller)
> > >
> > > ----------------------------------------------------------------------
> > >
> > > Message: 1
> > > Date: Fri, 11 Oct 2019 15:42:58 +0200
> > > From: Andreas Mueller
> > > To: scikit-learn at python.org
> > > Subject: Re: [scikit-learn] logistic regression results are not stable
> > > between solvers
> > > Message-ID:
> > > Content-Type: text/plain; charset="utf-8"; Format="flowed"
> > >
> > > On 10/10/19 1:14 PM, Benoît Presles wrote:
> > > > Thanks for your answers.
> > > > On my real data, I do not have so many samples. I have a bit more than
> > > > 200 samples in total and I also would like to get some results with
> > > > unpenalized logistic regression.
> > > > What do you suggest? Should I switch to the lbfgs solver?
> > > Yes.
> > > > Am I sure that with this solver I will not have any convergence issue
> > > > and always get the correct result? Indeed, I did not get any convergence
> > > > warning with saga, so I thought everything was fine. I noticed some
> > > > issues only when I decided to test several solvers. Without comparing
> > > > the results across solvers, how can I be sure that the optimisation goes
> > > > well? Shouldn't scikit-learn warn the user somehow if it is not the case?
> > > We should attempt to warn in the SAGA solver if it doesn't converge.
> > > That it doesn't raise a convergence warning should probably be
> > > considered a bug.
> > > It uses the maximum weight change as a stopping criterion right now.
> > > We could probably compute the dual objective once in the end to see if
> > > we converged, right? Or is that not possible with SAGA? If not, we might
> > > want to caution that no convergence warning will be raised.
> > > > At last, I was using saga because I also wanted to do some feature
> > > > selection by using the l1 penalty, which is not supported by lbfgs...
> > > You can use liblinear then.
> > > >
> > > > Best regards,
> > > > Ben
> > > >
> > > > Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit :
> > > >> Oops, I did not see the answer of Roman. Sorry about that. It is coming
> > > >> back to the same conclusion :)
> > > >>
> > > >> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître wrote:
> > > >> Uhm, actually increasing to 10000 samples solves the convergence issue.
> > > >> SAGA is most probably not designed to work with such a small sample size.
> > > >>
> > > >> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître wrote:
> > > >> I slightly changed the bench such that it uses a pipeline and
> > > >> plotted the coefficients:
> > > >> https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
> > > >> I only see one of the 10 splits where SAGA is not converging;
> > > >> otherwise the coefficients look very close (I don't attach the figure
> > > >> here but they can be plotted using the snippet).
> > > >> So apart from this second split, the other differences seem
> > > >> to be numerical instability.
> > > >> Where I have some concern is regarding the convergence rate
> > > >> of SAGA, but I have no intuition to know if this is normal or not.
> > > >>
> > > >> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak wrote:
> > > >> Ben,
> > > >> I can confirm your results with penalty='none' and C=1e9. In both cases,
> > > >> you are running a mostly unpenalized logistic regression. Usually
> > > >> that's less numerically stable than with a small regularization,
> > > >> depending on the data collinearity.
> > > >> Running that same code with
> > > >> - larger penalty (smaller C values)
> > > >> - or larger number of samples
> > > >> yields for me the same coefficients (up to some tolerance).
> > > >> You can also see that SAGA convergence is not good by the fact that it
> > > >> needs 196000 epochs/iterations to converge.
> > > >> Actually, I have often seen convergence issues with SAG on small
> > > >> datasets (in unit tests), not fully sure why.
> > > >> --
> > > >> Roman
> > > >>
> > > >> On 09/10/2019 22:10, serafim loukas wrote:
> > > >> > The predictions across solvers are exactly the same when I run the code.
> > > >> > I am using the 0.21.3 version. What is yours?
> > > >> >
> > > >> > In [13]: import sklearn
> > > >> > In [14]: sklearn.__version__
> > > >> > Out[14]: '0.21.3'
> > > >> >
> > > >> > Serafeim
> > > >> >
> > > >> >> On 9 Oct 2019, at 21:44, Benoît Presles wrote:
> > > >> >> (y_pred_lbfgs==y_pred_saga).all() == False
> > > >> >
> > > >> > _______________________________________________
> > > >> > scikit-learn mailing list
> > > >> > scikit-learn at python.org
> > > >> > https://mail.python.org/mailman/listinfo/scikit-learn
> > > >>
> > > >> --
> > > >> Guillaume Lemaitre
> > > >> Scikit-learn @ Inria Foundation
> > > >> https://glemaitre.github.io/
> > > >
> > > > _______________________________________________
> > > > scikit-learn mailing list
> > > > scikit-learn at python.org
> > > > https://mail.python.org/mailman/listinfo/scikit-learn
> > >
> > > -------------- next part --------------
> > > An HTML attachment was scrubbed...
> > > URL: < > > http://mail.python.org/pipermail/scikit-learn/attachments/20191011/ > > > a7052cd9/attachment-0001.html> > > > > > ------------------------------ > > > > > Subject: Digest Footer > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > > ------------------------------ > > > > > End of scikit-learn Digest, Vol 43, Issue 21 > > > ******************************************** > > > > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > -- > > Gael Varoquaux > > Research Director, INRIA Visiting professor, McGill > > http://gael-varoquaux.info > http://twitter.com/GaelVaroquaux > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 43, Issue 24 > > ******************************************** > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/scikit-learn/attachments/20191012/6959d075/attachment.html > > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: 2019-10-12 14_00_05-Frequently Asked Questions ? 
scikit-learn 0.21.3
> documentation.png
> Type: image/png
> Size: 26245 bytes
> Desc: not available
> URL: <
> http://mail.python.org/pipermail/scikit-learn/attachments/20191012/6959d075/attachment.png
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 43, Issue 25
> ********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jbbrown at kuhp.kyoto-u.ac.jp Sun Oct 13 06:40:11 2019
From: jbbrown at kuhp.kyoto-u.ac.jp (Brown J.B.)
Date: Sun, 13 Oct 2019 19:40:11 +0900
Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 25
In-Reply-To: References: Message-ID: 

Please show respect and refinement when addressing the contributors and users of scikit-learn.

Gael's statement is perfect -- complexity does not imply better prediction. The choice of estimator (and algorithm) depends on the structure of the model desired for the data presented. Estimator superiority cannot be proven in a context- and/or data-agnostic fashion.

J.B.

On Sun, 13 Oct 2019 at 06:13, Mike Smith wrote:

> "Second, complexity does not imply better prediction."
>
> Complexity doesn't imply prediction? Perhaps you're having a translation error.
>
> On Sat, Oct 12, 2019 at 2:04 PM wrote:
>
>> [...]
>
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From nelle.varoquaux at gmail.com Mon Oct 14 04:36:26 2019
From: nelle.varoquaux at gmail.com (Nelle Varoquaux)
Date: Mon, 14 Oct 2019 10:36:26 +0200
Subject: [scikit-learn] Announcement -- scikit-image 0.16.1 released
Message-ID: 

Hi All,

On behalf of the scikit-image team, I am pleased to announce that
scikit-image 0.16.1 has been released (0.16.0 was never released due to
necessary last-minute fixes). This release contains many bug fixes and new
features! Please note that we have dropped support for Python 3.5.

Announcement: scikit-image 0.16.1
=================================

We're happy to announce the release of scikit-image v0.16.1!

scikit-image is an image processing toolbox for SciPy that includes
algorithms for segmentation, geometric transformations, color space
manipulation, analysis, filtering, morphology, feature detection, and more.
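As a taste of what such a pipeline looks like in practice, here is a minimal sketch using long-standing scikit-image API (the sample image and Otsu threshold predate this release; this is illustration, not part of the announcement itself):

```python
# Illustrative sketch: segment a grayscale sample image with Otsu
# thresholding using scikit-image's bundled test data.
from skimage import data, filters

image = data.camera()                   # bundled 8-bit grayscale photo
thresh = filters.threshold_otsu(image)  # global threshold value
mask = image > thresh                   # boolean foreground mask
print(image.shape, thresh, mask.mean())
```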
For more information, examples, and documentation, please visit our
website: https://scikit-image.org

Starting from this release, scikit-image will follow the recently
introduced NumPy deprecation policy, `NEP 29
<https://github.com/numpy/numpy/blob/master/doc/neps/nep-0029-deprecation_policy.rst>`__.
Accordingly, scikit-image 0.16 drops support for Python 3.5. This release
of scikit-image officially supports Python 3.6 and 3.7.

Special thanks to Matthias Bussonnier for `Frappuccino `__, which helped us
catch all API changes and nail down the APIs for new features.

New Features
------------
- New `skimage.metrics` module containing simple metrics (MSE, NRMSE,
  PSNR) and segmentation metrics (adapted Rand error, variation of
  information) (#4025)
- n-dimensional TV-L1 optical flow algorithm for registration --
  `skimage.registration.optical_flow_tvl1` (#3983)
- Draw a line in an n-dimensional array -- `skimage.draw.line_nd` (#2043)
- 2D Farid & Simoncelli edge filters -- `skimage.filters.farid`,
  `skimage.filters.farid_h`, and `skimage.filters.farid_v` (#3775)
- 2D majority voting filter assigning to each pixel the most commonly
  occurring value within its neighborhood -- `skimage.filters.majority`
  (#3836, #3839)
- Multi-level threshold "multi-Otsu" method, a thresholding algorithm used
  to separate the pixels of an input image into several classes by
  maximizing the variances between classes --
  `skimage.filters.threshold_multiotsu` (#3872, #4174)
- New example data -- `skimage.data.shepp_logan_phantom`,
  `skimage.data.colorwheel`, `skimage.data.brick`, `skimage.data.grass`,
  `skimage.data.roughwall`, `skimage.data.cell` (#3958, #3966)
- Compute and format image region properties as a table --
  `skimage.measure.regionprops_table` (#3959)
- Convert a polygon into a mask -- `skimage.draw.poly2mask` (#3971, #3977)
- Visual image comparison helper `skimage.util.compare_images`, which
  returns an image showing the difference between two input images (#4089)
- `skimage.transform.warp_polar` to remap an image into polar or log-polar
  coordinates (#4097)

Improvements
------------
- RANSAC: new option to set initial samples selected for initialization (#2992)
- Better repr and str for `skimage.transform.ProjectiveTransform` (#3525,
  #3967)
- Better error messages and data type stability in
  `skimage.segmentation.relabel_sequential` (#3740)
- Improved compatibility with dask arrays in some image thresholding
  methods (#3823)
- `skimage.io.ImageCollection` can now receive lists of patterns (#3928)
- Speed up `skimage.feature.peak_local_max` (#3984)
- Better error message for an incorrect value of the keyword argument
  ``kind`` in `skimage.color.label2rgb` (#4055)
- All functions from `skimage.draw` now support multi-channel 2D images
  (#4134)

API Changes
-----------
- Deprecated subpackage ``skimage.novice`` has been removed.
- Default value of ``multichannel`` parameters has been set to False in
  `skimage.transform.rescale`, `skimage.transform.pyramid_reduce`,
  `skimage.transform.pyramid_laplacian`,
  `skimage.transform.pyramid_gaussian`, and
  `skimage.transform.pyramid_expand`. Guessing is no longer performed for
  3D arrays.
- Deprecated argument ``visualise`` has been removed from
  `skimage.feature.hog`. Use ``visualize`` instead.
- `skimage.transform.seam_carve` has been completely removed from the
  library due to licensing restrictions.
- Parameter ``as_grey`` has been removed from `skimage.data.load` and
  `skimage.io.imread`. Use ``as_gray`` instead.
- Parameter ``min_size`` has been removed from
  `skimage.morphology.remove_small_holes`. Use ``area_threshold`` instead.
- Deprecated ``correct_mesh_orientation`` in `skimage.measure` has been
  removed.
- `skimage.measure._regionprops` has been completely switched to using
  row-column coordinates. The old x-y interface is no longer available.
- Default value of the ``behavior`` parameter has been set to ``ndimage``
  in `skimage.filters.median`.
- Parameter ``flatten`` in `skimage.io.imread` has been removed in favor
  of ``as_gray``.
- Parameters ``Hxx, Hxy, Hyy`` have been removed from
  `skimage.feature.corner.hessian_matrix_eigvals` in favor of ``H_elems``.
- Default value of the ``order`` parameter has been set to ``rc`` in
  `skimage.feature.hessian_matrix`.
- ``skimage.util.img_as_*`` functions no longer raise precision and/or
  loss warnings.

Bugfixes
--------
- Corrected error with scales attribute in ORB.detect_and_extract (#2835).
  The scales attribute wasn't taking into account the mask, and thus was
  using an incorrect array size.
- Correct for bias in the inverse Radon transform
  (`skimage.transform.iradon`) (#3067). Fixed by using the ramp filter
  equation in the spatial domain as described in the reference.
- Fix a rounding issue that caused a rotated image to have a different
  size than the input (`skimage.transform.rotate`) (#3173)
- RANSAC uses random subsets of the original data and not bootstraps.
  (#3901, #3915)
- Canny now produces the same output regardless of dtype (#3919)
- Geometry Transforms: avoid division by zero & some degenerate cases
  (#3926)
- Fixed float32 support in denoise_bilateral and denoise_tv_bregman (#3936)
- Fixed computation of Meijering filter and avoid ZeroDivisionError (#3957)
- Fixed `skimage.filters.threshold_li` to prevent being stuck on
  stationary points, and thus at local minima or maxima (#3966)
- Edited `skimage.exposure.rescale_intensity` to return the input image
  instead of NaNs when all values are 0 (#4015)
- Fixed `skimage.morphology.medial_axis`. A wrong indentation in Cython
  caused the function to not behave as intended. (#4060)
- Fixed `skimage.restoration.denoise_bilateral` by correcting the padding
  in the Gaussian filter (#4080)
- Fixed `skimage.measure.find_contours` when the input image contains NaN.
  Contours intersecting NaN will be left open (#4150)
- Fixed `skimage.feature.blob_log` and `skimage.feature.blob_dog` for 3D
  images and anisotropic data (#4162)
- Fixed `skimage.exposure.adjust_gamma`, `skimage.exposure.adjust_log`,
  and `skimage.exposure.adjust_sigmoid` such that when provided with a
  1 by 1 ndarray, they return 1 by 1 ndarrays and not single-number floats
  (#4169)

Deprecations
------------
- Parameter ``neighbors`` in `skimage.measure.convex_hull_object` has been
  deprecated in favor of ``connectivity`` and will be removed in version
  0.18.0.
- The following functions are deprecated in favor of the
  `skimage.metrics` module (#4025):
  - `skimage.measure.compare_mse`
  - `skimage.measure.compare_nrmse`
  - `skimage.measure.compare_psnr`
  - `skimage.measure.compare_ssim`
- The function `skimage.color.guess_spatial_dimensions` is deprecated and
  will be removed in 0.18 (#4031)
- The argument ``bc`` in `skimage.segmentation.active_contour` is
  deprecated.
- The function `skimage.data.load` is deprecated and will be removed in
  0.18 (#4061)
- The function `skimage.transform.match_histograms` is deprecated in favor
  of `skimage.exposure.match_histograms` (#4107)
- The parameter ``neighbors`` of `skimage.morphology.convex_hull_object`
  is deprecated.
- The `skimage.transform.radon` function will convert input images of
  integer type to float by default in 0.18. To preserve the current
  behaviour, set the new argument ``preserve_range`` to True. (#4131)

Documentation improvements
--------------------------
- DOC: Improve the documentation of transform.resize with respect to the
  anti_aliasing_sigma parameter (#3911)
- Fix URL for stain deconvolution reference (#3862)
- Fix doc for denoise gaussian (#3869)
- DOC: various enhancements (cross links, gallery, ref...), mainly for
  corner detection (#3996)
- [DOC] clarify that the inertia_tensor may be nD in documentation (#4013)
- [DOC] How to test and write benchmarks (#4016)
- Spellcheck @CONTRIBUTING.txt (#4008)
- Spellcheck @doc/examples/segmentation/plot_watershed.py (#4009)
- Spellcheck @doc/examples/segmentation/plot_thresholding.py (#4010)
- Spellcheck @skimage/morphology/binary.py (#4011)
- Spellcheck @skimage/morphology/extrema.py (#4012)
- docs update for downscale_local_mean and N-dimensional images (#4079)
- Remove fancy language from 0.15 release notes (#3827)
- Documentation formatting / compilation fixes (#3838)
- Remove duplicated section in INSTALL.txt. (#3876)
- ENH: doc of ridge functions (#3933)
- Fix docstring for Threshold Niblack (#3917)
- adding docs to circle_perimeter_aa (#4155)
- Update link to NumPy docstring standard in Contribution Guide (replaces
  #4191) (#4192)
- DOC: Improve downscale_local_mean() docstring (#4180)
- DOC: enhance the result display in ransac gallery example (#4109)
- Gallery: use fstrings for better readability (#4110)
- MNT: Document stacklevel parameter in contribution guide (#4066)
- Fix minor typo (#3988)
- MIN: docstring improvements in canny functions (#3920)
- Minor docstring fixes for #4150 (#4184)
- Fix `full` parameter description in compare_ssim (#3860)
- State Bradley threshold equivalence in Niblack docstring (#3891)
- Add plt.show() to example code for consistency. (#3908)
- CC0 is not equivalent to public domain. Fix the note of the horse image
  (#3931)
- Update the joblib link in tutorial_parallelization.rst (#3943)
- Fix plot_edge_filter.py references (#3946)
- Add missing argument to docstring of PaintTool (#3970)
- Improving documentation and tests for directional filters (#3956)
- Added new thorough examples on the inner workings of
  ``skimage.filters.threshold_li`` (#3966)
- matplotlib: remove interpolation=nearest, none in our examples (#4002)
- fix URL encoding for wikipedia references in filters.rank.entropy and
  filters.rank.shannon_entropy docstring (#4007)
- Fixup integer division in examples (#4032)
- Update the links in the installation guide (#4118)
- Gallery hough line transform (#4124)
- Cross-linking between function documentation should now be much
  improved! (#4188)
- Better documentation of the ``num_peaks`` of
  `skimage.feature.corner_peaks` (#4195)

Other Pull Requests
-------------------
- Add benchmark suite for exposure module (#3312)
- Remove precision and sign loss warnings from ``skimage.util.img_as_*``
  (#3575)
- Propose SKIPs and add mission/vision/values, governance (#3585)
- Use user-installed tifffile if available (#3650)
- Simplify benchmarks pinnings (#3711)
- Add project_urls to setup for PyPI and other services (#3834)
- Address deprecations for 0.16 release (#3841)
- Followup deprecations for 0.16 (#3851)
- Build and test the docs in Azure (#3873)
- Pin numpydoc to pre-0.8 to fix dev docs formatting (#3893)
- Change all HTTP links to HTTPS (#3896)
- Skip extra deps on OSX (#3898)
- Add location for Sphinx 2.0.1 search results; clean up templates (#3899)
- Fix CSS styling of Sphinx 2.0.1 + numpydoc 0.9 rendered docs (#3900)
- Travis CI: The sudo: tag is deprecated in Travis (#4164)
- MNT Preparing the 0.16 release (#4204)
- FIX generate_release_note when contributor_set contains None (#4205)
- Specify that travis should use Ubuntu xenial (16.04) not trusty (14.04)
  (#4082)
- MNT: set stack level accordingly in lab2xyz (#4067)
- MNT: fixup stack level
for filters ridges (#4068) - MNT: remove unused import `deprecated` from filters.thresholding (#4069) - MNT: Set stacklevel correctly in io matplotlib plugin (#4070) - MNT: set stacklevel accordingly in felzenszwalb_cython (#4071) - MNT: Set stacklevel accordingly in img_as_* (convert) (#4072) - MNT: set stacklevel accordingly in util.shape (#4073) - MNT: remove extreneous matplotlib warning (#4074) - Suppress warnings in tests for viewer (#4017) - Suppress warnings in test suite regarding measure.label (#4018) - Suppress warnings in test_rank due to type conversion (#4019) - Add todo item for imread plugin testing (#3907) - Remove matplotlib agg warning when using the sphinx gallery. (#3897) - Forward-port release notes for 0.14.4 (#4137) - Add tests for pathological arrays in threshold_li (#4143) - setup.py: Fail gracefully when NumPy is not installed (#4181) - Drop Python 3.5 support (#4102) - Force imageio reader to return NumPy arrays (#3837) - Fixing connecting to GitHub with SSH info. (#3875) - Small fix to an error message of `skimage.measure.regionprops` (#3884) - Unify skeletonize and skeletonize 3D APIs (#3904) - Add location for Sphinx 2.0.1 search results; clean up templates (#3910) - Pin numpy version forward (#3925) - Replacing pyfits with Astropy to read FITS (#3930) - Add warning for future dtype kwarg removal (#3932) - MAINT: cleanup regionprop add PYTHONOPTIMIZE=2 to travis array (#3934) - Adding complexity and new tests for filters.threshold_multiotsu (#3935) - Fixup dtype kwarg warning in certain image plugins (#3948) - don't cast integer to float before using it as integer in numpy logspace (#3949) - avoid low contrast image save in a doctest. 
(#3953) - MAINT: Remove unused _convert_input from filters._gaussian (#4001) - Set minimum version for imread so that it compiles from source on linux in test builds (#3960) - Cleanup plugin utilization in data.load and testsuite (#3961) - Select minimum imageio such that it is compatible with pathlib (#3969) - Remove pytest-faulthandler from test dependencies (#3987) - Fix tifffile and __array_function__ failures in our CI (#3992) - MAINT: Do not use assert in code, raise an exception instead. (#4006) - Enable packagers to disable failures on warnings. (#4021) - Fix numpy 117 rc and dask in thresholding filters (#4022) - silence r,c warnings when property does not depend on r,c (#4027) - remove warning filter, fix doc wrt r,c (#4028) - Import Iterable from collections.abc (#4033) - Import Iterable from collections.abc in vendored tifffile code (#4034) - Correction of typos after #4025 (#4036) - Rename internal function called assert_* -> check_* (#4037) - Improve import time (#4039) - Remove .meeseeksdev.yml (#4045) - Fix mpl deprecation on grid() (#4049) - Fix gallery after deprecation from #4025 (#4050) - fix mpl future deprecation normed -> density (#4053) - Add shape= to circle perimeter in hough_circle example (#4047) - Critical: address internal warnings in test suite related to metrics 4025 (#4063) - Use functools instead of a real function for the internal warn function (#4062) - Test rank capture warnings in threadsafe manner (#4064) - Make use of FFTs more consistent across the library (#4084) - Fixup region props test (#4099) - Turn single backquotes to double backquotes in filters (#4127) - Refactor radon transform module (#4136) - Fix broken import of rgb2gray in benchmark suite (#4176) - Fix doc building issues with SKIPs (#4182) - Remove several __future__ imports (#4198) - Restore deprecated coordinates arg to regionprops (#4144) - Refactor/optimize threshold_multiotsu (#4167) - Remove Python2-specific code (#4170) - `view_as_windows` incorrectly 
assumes that a contiguous array is needed (#4171) - Handle case in which NamedTemporaryFile fails (#4172) - Fix incorrect resolution date on SKIP1 (#4183) - API updates before 0.16 (#4187) - Fix conversion to float32 dtype (#4193) Contributors to this release ---------------------------- - Abhishek Arya - Alexandre de Siqueira - Alexis Mignon - Anthony Carapetis - Bastian Eichenberger - Bharat Raghunathan - Christian Clauss - Clement Ng - David Breuer - David Haberth?r - Dominik Kutra - Dominik Straub - Egor Panfilov - Emmanuelle Gouillart - Etienne Landur? - Fran?ois Boulogne - Genevieve Buckley - Gregory R. Lee - Hadrien Mary - Hamdi Sahloul - Holly Gibbs - Huang-Wei Chang - i3v (i3v) - Jarrod Millman - Jirka Borovec - Johan Jeppsson - Johannes Sch?nberger - Jon Crall - Josh Warner - Juan Nunez-Iglesias - Kaligule (Kaligule) - kczimm (kczimm) - Lars Grueter - Shachar Ben Harim - Luis F. de Figueiredo - Mark Harfouche - Mars Huang - Dave Mellert - Nelle Varoquaux - Ollin Boer Bohan - Patrick J Zager - Riadh Fezzani - Ryan Avery - Srinath Kailasa - Stefan van der Walt - Stuart Berg - Uwe Schmidt Reviewers for this release -------------------------- - Alexandre de Siqueira - Anthony Carapetis - Bastian Eichenberger - Clement Ng - David Breuer - Egor Panfilov - Emmanuelle Gouillart - Etienne Landur? - Fran?ois Boulogne - Genevieve Buckley - Gregory R. Lee - Hadrien Mary - Hamdi Sahloul - Holly Gibbs - Jarrod Millman - Jirka Borovec - Johan Jeppsson - Johannes Sch?nberger - Jon Crall - Josh Warner - jrmarsha - Juan Nunez-Iglesias - kczimm - Lars Grueter - leGIT-bot - Mark Harfouche - Mars Huang - Dave Mellert - Paul M?ller - Phil Starkey - Ralf Gommers - Riadh Fezzani - Ryan Avery - Sebastian Berg - Stefan van der Walt - Uwe Schmidt -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From glennmschultz at me.com Mon Oct 14 13:55:02 2019
From: glennmschultz at me.com (Glenn Schultz)
Date: Mon, 14 Oct 2019 17:55:02 -0000
Subject: [scikit-learn] using numpy repeat
Message-ID: <119372af-8543-4f9c-ae56-4c6552e04f1c@me.com>

I am trying to repeat an array 3 times using the following:

    numpy.repeat(numpy.linspace(-.5, 3, 8), 3, axis=0)

although this repeats each element 3 times sequentially. I am trying to repeat the whole array:
-.5 0 .5 ... -.5 0 .5

Any suggestions to accomplish this are appreciated. I am relatively sure I can do this with numpy, I just can't put my finger on how this is done by reviewing the numpy docs.

Thanks,
Glenn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From niourf at gmail.com Mon Oct 14 13:56:41 2019
From: niourf at gmail.com (Nicolas Hug)
Date: Mon, 14 Oct 2019 13:56:41 -0400
Subject: [scikit-learn] using numpy repeat
In-Reply-To: <119372af-8543-4f9c-ae56-4c6552e04f1c@me.com>
References: <119372af-8543-4f9c-ae56-4c6552e04f1c@me.com>
Message-ID: <517bd570-a5d2-513e-98b4-1e149cd6145d@gmail.com>

You're looking for np.tile. It's one of the first google results and it's also linked in the doc of np.repeat.

This mailing-list is for questions related to scikit-learn. I think your question would be more appropriate for e.g. stack-overflow.

On 10/14/19 1:55 PM, Glenn Schultz via scikit-learn wrote:
> I am trying to repeat an array 3 times using the following
>
> numpy.repeat(numpy.linspace(-.5, 3, 8), 3, axis=0)
>
> although this repeats each element 3 times sequentially. I am trying to
> repeat the array
> -.5 0 .5 ... -.5 0 .5
>
> any suggestions to accomplish this are appreciated. I am relatively
> sure I can do this with numpy I just can't put my finger on how this
> is done by reviewing the numpy docs.
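To make the distinction concrete, here is a quick sketch (illustrative, not part of the original thread) contrasting the two calls:

```python
import numpy as np

a = np.linspace(-.5, 3, 8)  # array([-0.5, 0., 0.5, 1., 1.5, 2., 2.5, 3.])

# np.repeat duplicates each element in place:
# -0.5, -0.5, -0.5, 0.0, 0.0, 0.0, ...
elementwise = np.repeat(a, 3)

# np.tile concatenates copies of the whole array, which is what was asked for:
# -0.5, 0.0, 0.5, ..., 3.0, -0.5, 0.0, 0.5, ..., 3.0
whole_array = np.tile(a, 3)

print(elementwise[:6])
print(whole_array[:10])
```

Both results have 24 elements; the difference is only the order in which the copies are interleaved.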
>
> Thanks,
> Glenn
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From daniele at grinta.net Mon Oct 14 19:12:42 2019
From: daniele at grinta.net (Daniele Nicolodi)
Date: Mon, 14 Oct 2019 17:12:42 -0600
Subject: [scikit-learn] Text categorization with prediction likelihood
Message-ID: <6245523b-9f22-c4b4-5ed5-eca561713e57@grinta.net>

Hello,

I don't have any formal education on predictive models, thus I hope my questions are not too naive and that the terminology I use is correct enough to make me understood.

I'm trying to implement simple text categorization of phrases of a few words (the specific application is categorization of bank transactions from payee names). Following the documentation I easily implemented a solution based on the TF-IDF vectorizer and C-Support Vector Machine classification. However, for some input phrases the classification prediction does not work that well. I have a couple of (probably very basic) questions:

- are my choices of algorithms the best to target this problem? Is there something else I can try to experiment with to see if I can get better results?

- is there a way to obtain the prediction likelihood such that I could mark "bad" predictions for further inspection? I haven't found an (easy) way to do that in the documentation.

Thank you in advance.

Cheers,
Dan

From gael.varoquaux at normalesup.org Wed Oct 16 10:02:40 2019
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Wed, 16 Oct 2019 16:02:40 +0200
Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 25
In-Reply-To:
References:
Message-ID: <20191016140240.3c227ljcvdplzvmb@phare.normalesup.org>

On Sun, Oct 13, 2019 at 07:40:11PM +0900, Brown J.B.
via scikit-learn wrote: > Please, respect and refinement when addressing the contributors and users of > scikit-learn. I believe that Mike simply misread. It's something that happens (it happens a lot to me). No harm on my side, and thanks for clarifying my overly short reply. G > Gael's statement is perfect -- complexity does not imply better prediction. > The choice of estimator (and algorithm) depends on the structure of the model > desired for the data presented. > Estimator superiority cannot be proven in a context- and/or data-agnostic > fashion. > J.B. > 2019?10?13?(?) 6:13 Mike Smith : > "Second complexity does not > > imply better prediction.?"? > Complexity doesn't imply prediction? Perhaps you're having a translation > error. > On Sat, Oct 12, 2019 at 2:04 PM wrote: > Send scikit-learn mailing list submissions to > ? ? ? ? scikit-learn at python.org > To subscribe or unsubscribe via the World Wide Web, visit > ? ? ? ? https://mail.python.org/mailman/listinfo/scikit-learn > or, via email, send a message with subject or body 'help' to > ? ? ? ? scikit-learn-request at python.org > You can reach the person managing the list at > ? ? ? ? scikit-learn-owner at python.org > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > Today's Topics: > ? ?1. Re: scikit-learn Digest, Vol 43, Issue 24 (Mike Smith) > ---------------------------------------------------------------------- > Message: 1 > Date: Sat, 12 Oct 2019 14:04:12 -0700 > From: Mike Smith > To: scikit-learn at python.org > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 24 > Message-ID: > ? ? ? ? 4LRy2NJvjwvVr4RgobQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > "...? > If I should expect good results on a pc, scikit says that > needing > gpu power is > > obsolete, since certain scikit models perform better (than ml > designed > for gpu) > > that are not designed for gpu, for that reason. Is this true?" 
> Where do you see this written? I think that you are looking for overly > simple stories that you are not true." > Gael, see the below from the scikit-learn FAQ. You can also find this > yourself at the main FAQ: > [image: 2019-10-12 14_00_05-Frequently Asked Questions ? scikit-learn > 0.21.3 documentation.png] > On Sat, Oct 12, 2019 at 9:03 AM > wrote: > > Send scikit-learn mailing list submissions to > >? ? ? ? ?scikit-learn at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > >? ? ? ? ?https://mail.python.org/mailman/listinfo/scikit-learn > > or, via email, send a message with subject or body 'help' to > >? ? ? ? ?scikit-learn-request at python.org > > You can reach the person managing the list at > >? ? ? ? ?scikit-learn-owner at python.org > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of scikit-learn digest..." > > Today's Topics: > >? ? 1. Re: Is scikit-learn implying neural nets are the best > >? ? ? ?regressor? (Gael Varoquaux) > ---------------------------------------------------------------------- > > Message: 1 > > Date: Fri, 11 Oct 2019 13:34:33 -0400 > > From: Gael Varoquaux > > To: Scikit-learn mailing list > > Subject: Re: [scikit-learn] Is scikit-learn implying neural nets are > >? ? ? ? ?the best regressor? > > Message-ID: <20191011173433.bbywiqnwjjpvsi4r at phare.normalesup.org> > > Content-Type: text/plain; charset=iso-8859-1 > > On Fri, Oct 11, 2019 at 10:10:32AM -0700, Mike Smith wrote: > > > In other words, according to that arrangement, is scikit-learn > implying > > that > > > section 1.17 is the best regressor out of the listed, 1.1 to 1.17? > > No. > > First they are not ordered in order of complexity (Naive Bayes is > > arguably simpler than Gaussian Processes). Second complexity does not > > imply better prediction. 
> > > If I should expect good results on a pc, scikit says that needing > gpu > > power is > > > obsolete, since certain scikit models perform better (than ml > designed > > for gpu) > > > that are not designed for gpu, for that reason. Is this true? > > Where do you see this written? I think that you are looking for > overly > > simple stories that you are not true. > > > How much hardware is a practical expectation for running the best > > > scikit models and getting the best results? > > This is too vague a question for which there is no answer. > > Ga?l > > > On Fri, Oct 11, 2019 at 9:02 AM > wrote: > > >? ? ?Send scikit-learn mailing list submissions to > > >? ? ?? ? ? ? scikit-learn at python.org > > >? ? ?To subscribe or unsubscribe via the World Wide Web, visit > > >? ? ?? ? ? ? https://mail.python.org/mailman/listinfo/scikit-learn > > >? ? ?or, via email, send a message with subject or body 'help' to > > >? ? ?? ? ? ? scikit-learn-request at python.org > > >? ? ?You can reach the person managing the list at > > >? ? ?? ? ? ? scikit-learn-owner at python.org > > >? ? ?When replying, please edit your Subject line so it is more > specific > > >? ? ?than "Re: Contents of scikit-learn digest..." > > >? ? ?Today's Topics: > > >? ? ?? ?1. Re: logistic regression results are not stable between > > >? ? ?? ? ? solvers (Andreas Mueller) > >? > ---------------------------------------------------------------------- > > >? ? ?Message: 1 > > >? ? ?Date: Fri, 11 Oct 2019 15:42:58 +0200 > > >? ? ?From: Andreas Mueller > > >? ? ?To: scikit-learn at python.org > > >? ? ?Subject: Re: [scikit-learn] logistic regression results are not > > stable > > >? ? ?? ? ? ? between solvers > > >? ? ?Message-ID: > > >? ? ?Content-Type: text/plain; charset="utf-8"; Format="flowed" > > >? ? ?On 10/10/19 1:14 PM, Beno?t Presles wrote: > > >? ? ?> Thanks for your answers. > > >? ? ?> On my real data, I do not have so many samples. I have a bit > more > > than > > >? ? 
?> 200 samples in total and I also would like to get some > results with > > >? ? ?> unpenalized logisitic regression. > > >? ? ?> What do you suggest? Should I switch to the lbfgs solver? > > >? ? ?Yes. > > >? ? ?> Am I sure that with this solver I will not have any > convergence > > issue > > >? ? ?> and always get the good result? Indeed, I did not get any > > convergence > > >? ? ?> warning with saga, so I thought everything was fine. I > noticed some > > >? ? ?> issues only when I decided to test several solvers. Without > > comparing > > >? ? ?> the results across solvers, how to be sure that the > optimisation > > goes > > >? ? ?> well? Shouldn't scikit-learn warn the user somehow if it is > not > > the case? > > >? ? ?We should attempt to warn in the SAGA solver if it doesn't > converge. > > >? ? ?That it doesn't raise a convergence warning should probably be > > >? ? ?considered a bug. > > >? ? ?It uses the maximum weight change as a stopping criterion right > now. > > >? ? ?We could probably compute the dual objective once in the end to > see > > if > > >? ? ?we converged, right? Or is that not possible with SAGA? If not, > we > > might > > >? ? ?want to caution that no convergence warning will be raised. > > >? ? ?> At last, I was using saga because I also wanted to do some > feature > > >? ? ?> selection by using l1 penalty which is not supported by > lbfgs... > > >? ? ?You can use liblinear then. > > >? ? ?> Best regards, > > >? ? ?> Ben > > >? ? ?> Le 09/10/2019 ? 23:39, Guillaume Lema?tre a ?crit?: > > >? ? ?>> Ups I did not see the answer of Roman. Sorry about that. It > is > > coming > > >? ? ?>> back to the same conclusion :) > > >? ? ?>> On Wed, 9 Oct 2019 at 23:37, Guillaume Lema?tre > > >? ? ?>> > > wrote: > > >? ? ?>>? ? ?Uhm actually increasing to 10000 samples solve the > convergence > > >? ? ?issue. > > >? ? ?>>? ? ?SAGA is not designed to work with a so small sample size > most > > >? ? ?>>? ? ?probably. > > >? ? ?>>? ? 
?On Wed, 9 Oct 2019 at 23:36, Guillaume Lema?tre > > >? ? ?>>? ? ?> > > wrote: > > >? ? ?>>? ? ? ? ?I slightly change the bench such that it uses > pipeline and > > >? ? ?>>? ? ? ? ?plotted the coefficient: > > >? ? ?>>? ? ? ? ?https://gist.github.com/glemaitre/ > > >? ? ?8fcc24bdfc7dc38ca0c09c56e26b9386 > > >? ? ?>>? ? ? ? ?I only see one of the 10 splits where SAGA is not > > converging, > > >? ? ?>>? ? ? ? ?otherwise the coefficients > > >? ? ?>>? ? ? ? ?look very close (I don't attach the figure here but > they > > can > > >? ? ?>>? ? ? ? ?be plotted using the snippet). > > >? ? ?>>? ? ? ? ?So apart from this second split, the other > differences > > seems > > >? ? ?>>? ? ? ? ?to be numerical instability. > > >? ? ?>>? ? ? ? ?Where I have some concern is regarding the > convergence > > rate > > >? ? ?>>? ? ? ? ?of SAGA but I have no > > >? ? ?>>? ? ? ? ?intuition to know if this is normal or not. > > >? ? ?>>? ? ? ? ?On Wed, 9 Oct 2019 at 23:22, Roman Yurchak > > >? ? ?>>? ? ? ? ? > wrote: > > >? ? ?>>? ? ? ? ? ? ?Ben, > > >? ? ?>>? ? ? ? ? ? ?I can confirm your results with penalty='none' > and > > C=1e9. > > >? ? ?>>? ? ? ? ? ? ?In both cases, > > >? ? ?>>? ? ? ? ? ? ?you are running a mostly unpenalized logisitic > > >? ? ?>>? ? ? ? ? ? ?regression. Usually > > >? ? ?>>? ? ? ? ? ? ?that's less numerically stable than with a small > > >? ? ?>>? ? ? ? ? ? ?regularization, > > >? ? ?>>? ? ? ? ? ? ?depending on the data collinearity. > > >? ? ?>>? ? ? ? ? ? ?Running that same code with > > >? ? ?>>? ? ? ? ? ? ?? - larger penalty ( smaller C values) > > >? ? ?>>? ? ? ? ? ? ?? - or larger number of samples > > >? ? ?>>? ? ? ? ? ? ?? yields for me the same coefficients (up to > some > > >? ? ?tolerance). > > >? ? ?>>? ? ? ? ? ? ?You can also see that SAGA convergence is not > good by > > the > > >? ? ?>>? ? ? ? ? ? ?fact that it > > >? ? ?>>? ? ? ? ? ? ?needs 196000 epochs/iterations to converge. > > >? ? ?>>? ? ? ? ? ? 
?Actually, I have often seen convergence issues > with > > SAG > > >? ? ?>>? ? ? ? ? ? ?on small > > >? ? ?>>? ? ? ? ? ? ?datasets (in unit tests), not fully sure why. > > >? ? ?>>? ? ? ? ? ? ?-- > > >? ? ?>>? ? ? ? ? ? ?Roman > > >? ? ?>>? ? ? ? ? ? ?On 09/10/2019 22:10, serafim loukas wrote: > > >? ? ?>>? ? ? ? ? ? ?> The predictions across solver are exactly the > same > > when > > >? ? ?>>? ? ? ? ? ? ?I run the code. > > >? ? ?>>? ? ? ? ? ? ?> I am using 0.21.3 version. What is yours? > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> In [13]: import sklearn > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> In [14]: sklearn.__version__ > > >? ? ?>>? ? ? ? ? ? ?> Out[14]: '0.21.3' > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> Serafeim > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?>> On 9 Oct 2019, at 21:44, Beno?t Presles > > >? ? ?>>? ? ? ? ? ? ? > >? ? ?>>? ? ? ? ? ? ? > > >? ? ?>>? ? ? ? ? ? ?>> > >? ? ?>>? ? ? ? ? ? ?>> wrote: > > >? ? ?>>? ? ? ? ? ? ?>> > > >? ? ?>>? ? ? ? ? ? ?>> (y_pred_lbfgs==y_pred_saga).all() == False > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?> > _______________________________________________ > > >? ? ?>>? ? ? ? ? ? ?> scikit-learn mailing list > > >? ? ?>>? ? ? ? ? ? ?> scikit-learn at python.org > scikit-learn at python.org> > > >? ? ?>>? ? ? ? ? ? ?> > > https://mail.python.org/mailman/listinfo/scikit-learn > > >? ? ?>>? ? ? ? ? ? ?> > > >? ? ?>>? ? ? ? ? ? ?_______________________________________________ > > >? ? ?>>? ? ? ? ? ? ?scikit-learn mailing list > > >? ? ?>>? ? ? ? ? ? ?scikit-learn at python.org > scikit-learn at python.org> > > >? ? ?>>? ? ? ? ? ? ?https://mail.python.org/mailman/listinfo/ > scikit-learn > > >? ? ?>>? ? ? ? ?-- > > >? ? ?>>? ? ? ? ?Guillaume Lemaitre > > >? ? ?>>? ? ? ? ?Scikit-learn @ Inria Foundation > > >? ? ?>>? ? ? ? 
?https://glemaitre.github.io/ > > >? ? ?>>? ? ?-- > > >? ? ?>>? ? ?Guillaume Lemaitre > > >? ? ?>>? ? ?Scikit-learn @ Inria Foundation > > >? ? ?>>? ? ?https://glemaitre.github.io/ > > >? ? ?>> -- > > >? ? ?>> Guillaume Lemaitre > > >? ? ?>> Scikit-learn @ Inria Foundation > > >? ? ?>> https://glemaitre.github.io/ > > >? ? ?>> _______________________________________________ > > >? ? ?>> scikit-learn mailing list > > >? ? ?>> scikit-learn at python.org > > >? ? ?>> https://mail.python.org/mailman/listinfo/scikit-learn > > >? ? ?> _______________________________________________ > > >? ? ?> scikit-learn mailing list > > >? ? ?> scikit-learn at python.org > > >? ? ?> https://mail.python.org/mailman/listinfo/scikit-learn > > >? ? ?-------------- next part -------------- > > >? ? ?An HTML attachment was scrubbed... > > >? ? ?URL: < > > http://mail.python.org/pipermail/scikit-learn/attachments/20191011/ > > >? ? ?a7052cd9/attachment-0001.html> > > >? ? ?------------------------------ > > >? ? ?Subject: Digest Footer > > >? ? ?_______________________________________________ > > >? ? ?scikit-learn mailing list > > >? ? ?scikit-learn at python.org > > >? ? ?https://mail.python.org/mailman/listinfo/scikit-learn > > >? ? ?------------------------------ > > >? ? ?End of scikit-learn Digest, Vol 43, Issue 21 > > >? ? ?******************************************** > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > -- > >? ? ?Gael Varoquaux > >? ? ?Research Director, INRIA? ? ? ? ? ? ? Visiting professor, McGill > >? ? ?http://gael-varoquaux.info? ? ? ? ? ? 
http://twitter.com/ > GaelVaroquaux > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > ------------------------------ > > End of scikit-learn Digest, Vol 43, Issue 24 > > ******************************************** > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: 20191012/6959d075/attachment.html> > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: 2019-10-12 14_00_05-Frequently Asked Questions ? scikit-learn > 0.21.3 documentation.png > Type: image/png > Size: 26245 bytes > Desc: not available > URL: 20191012/6959d075/attachment.png> > ------------------------------ > Subject: Digest Footer > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > ------------------------------ > End of scikit-learn Digest, Vol 43, Issue 25 > ******************************************** > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -- Gael Varoquaux Research Director, INRIA Visiting professor, McGill http://gael-varoquaux.info http://twitter.com/GaelVaroquaux From jandre at lnec.pt Wed Oct 16 10:16:50 2019 From: jandre at lnec.pt (=?UTF-8?B?Sm/Do28gQW5kcsOp?=) Date: Wed, 16 Oct 2019 15:16:50 +0100 Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 25 In-Reply-To: <20191016140240.3c227ljcvdplzvmb@phare.normalesup.org> References: <20191016140240.3c227ljcvdplzvmb@phare.normalesup.org> Message-ID: Dear Scikit-learn, This is my first 
message in this community!

I make it because I think "model complexity" and "model prediction" are two separate "properties", which cannot in principle be directly compared. This is because one variable is missing, which is the data.

If the initial data set corresponds to the entire true range of possible data, then I would say complex models will "model" the variable being studied with a prediction accuracy equal to or better than any other "less complex" model. If the data set is not representative, then you might overfit with more complex models and there is a chance that simpler models will predict better for unseen sets of data. Therefore, the quality of the data is critical to judge how good your model will be.

Hope this helps.

João

João André
Civil Engineer, M.Sc., Ph.D.
Structures Department
National Laboratory for Civil Engineering
LNEC, Av. Brasil 101, 1700-066 Lisbon, Portugal
Web: http://www.lnec.pt/
Skype ID: jpcgandre
Phone: (+351) 218 443 355

On Wed, 16 Oct 2019 at 15:05, Gael Varoquaux wrote:

> On Sun, Oct 13, 2019 at 07:40:11PM +0900, Brown J.B. via scikit-learn wrote:
> > Please, respect and refinement when addressing the contributors and
> > users of scikit-learn.
>
> I believe that Mike simply misread. It's something that happens (it
> happens a lot to me).
>
> No harm on my side, and thanks for clarifying my overly short reply.
>
> G
>
> > Gael's statement is perfect -- complexity does not imply better prediction.
> > The choice of estimator (and algorithm) depends on the structure of the
> > model desired for the data presented.
> > Estimator superiority cannot be proven in a context- and/or data-agnostic
> > fashion.
>
> > J.B.
>
> > 2019年10月13日(日) 6:13 Mike Smith :
>
> > > "Second complexity does not
> > > imply better prediction."
>
> > > Complexity doesn't imply prediction? Perhaps you're having a
> > > translation error.
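João's point about unrepresentative data can be demonstrated with a toy, standard-library sketch (illustrative only, not from this thread): on a small sample of pure noise, a 1-nearest-neighbour "model" that memorises the training set achieves a perfect training error, yet generalises worse than the simplest possible model, a constant mean predictor.

```python
import random

random.seed(0)

def make_data(n):
    # y is pure noise: there is no signal for any model to learn
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    return xs, ys

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def knn_predict(train_x, train_y, xs):
    # "Complex" model: 1-nearest neighbour, memorises the training set
    return [train_y[min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))]
            for x in xs]

def mean_predict(train_y, xs):
    # "Simple" model: always predict the training mean
    m = sum(train_y) / len(train_y)
    return [m] * len(xs)

train_x, train_y = make_data(30)    # small, possibly unrepresentative training set
test_x, test_y = make_data(1000)    # "unseen" data

print("1-NN train MSE:", mse(knn_predict(train_x, train_y, train_x), train_y))
print("1-NN test  MSE:", mse(knn_predict(train_x, train_y, test_x), test_y))
print("mean test  MSE:", mse(mean_predict(train_y, test_x), test_y))
```

The memorising model's training error is exactly zero, yet its test error is roughly twice that of the mean predictor: with independent noise of variance σ², the expected squared error of 1-NN approaches 2σ² while that of the constant mean approaches σ².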
> > > On Sat, Oct 12, 2019 at 2:04 PM > wrote: > > > Send scikit-learn mailing list submissions to > > scikit-learn at python.org > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/scikit-learn > > or, via email, send a message with subject or body 'help' to > > scikit-learn-request at python.org > > > You can reach the person managing the list at > > scikit-learn-owner at python.org > > > When replying, please edit your Subject line so it is more > specific > > than "Re: Contents of scikit-learn digest..." > > > > Today's Topics: > > > 1. Re: scikit-learn Digest, Vol 43, Issue 24 (Mike Smith) > > > > > ---------------------------------------------------------------------- > > > Message: 1 > > Date: Sat, 12 Oct 2019 14:04:12 -0700 > > From: Mike Smith > > To: scikit-learn at python.org > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 24 > > Message-ID: > > > 4LRy2NJvjwvVr4RgobQ at mail.gmail.com> > > Content-Type: text/plain; charset="utf-8" > > > "... > If I should expect good results on a pc, scikit says that > > needing > > gpu power is > > > obsolete, since certain scikit models perform better (than ml > > designed > > for gpu) > > > that are not designed for gpu, for that reason. Is this true?" > > > Where do you see this written? I think that you are looking for > overly > > simple stories that you are not true." > > > Gael, see the below from the scikit-learn FAQ. You can also find > this > > yourself at the main FAQ: > > > [image: 2019-10-12 14_00_05-Frequently Asked Questions ? 
> scikit-learn > > 0.21.3 documentation.png] > > > > On Sat, Oct 12, 2019 at 9:03 AM > > > wrote: > > > > Send scikit-learn mailing list submissions to > > > scikit-learn at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > or, via email, send a message with subject or body 'help' to > > > scikit-learn-request at python.org > > > > You can reach the person managing the list at > > > scikit-learn-owner at python.org > > > > When replying, please edit your Subject line so it is more > specific > > > than "Re: Contents of scikit-learn digest..." > > > > > Today's Topics: > > > > 1. Re: Is scikit-learn implying neural nets are the best > > > regressor? (Gael Varoquaux) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > > Date: Fri, 11 Oct 2019 13:34:33 -0400 > > > From: Gael Varoquaux > > > To: Scikit-learn mailing list > > > Subject: Re: [scikit-learn] Is scikit-learn implying neural > nets are > > > the best regressor? > > > Message-ID: < > 20191011173433.bbywiqnwjjpvsi4r at phare.normalesup.org> > > > Content-Type: text/plain; charset=iso-8859-1 > > > > On Fri, Oct 11, 2019 at 10:10:32AM -0700, Mike Smith wrote: > > > > In other words, according to that arrangement, is > scikit-learn > > implying > > > that > > > > section 1.17 is the best regressor out of the listed, 1.1 to > 1.17? > > > > No. > > > > First they are not ordered in order of complexity (Naive Bayes > is > > > arguably simpler than Gaussian Processes). Second complexity > does not > > > imply better prediction. > > > > > If I should expect good results on a pc, scikit says that > needing > > gpu > > > power is > > > > obsolete, since certain scikit models perform better (than ml > > designed > > > for gpu) > > > > that are not designed for gpu, for that reason. Is this true? > > > > Where do you see this written? 
I think that you are looking for > > overly > > > simple stories that are not true. > > > > > How much hardware is a practical expectation for running the > best > > > > scikit models and getting the best results? > > > > This is too vague a question for which there is no answer. > > > > Gaël > > > > > On Fri, Oct 11, 2019 at 9:02 AM < > scikit-learn-request at python.org> > > wrote: > > > > > Send scikit-learn mailing list submissions to > > > > scikit-learn at python.org > > > > > To subscribe or unsubscribe via the World Wide Web, visit > > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > or, via email, send a message with subject or body > 'help' to > > > > scikit-learn-request at python.org > > > > > You can reach the person managing the list at > > > > scikit-learn-owner at python.org > > > > > When replying, please edit your Subject line so it is > more > > specific > > > > than "Re: Contents of scikit-learn digest..." > > > > > > Today's Topics: > > > > > 1. Re: logistic regression results are not stable > between > > > > solvers (Andreas Mueller) > > > > > > > > > ---------------------------------------------------------------------- > > > > > Message: 1 > > > > Date: Fri, 11 Oct 2019 15:42:58 +0200 > > > > From: Andreas Mueller > > > > To: scikit-learn at python.org > > > > Subject: Re: [scikit-learn] logistic regression results > are not > > > stable > > > > between solvers > > > > Message-ID: < > d55949d6-3355-f892-f6b3-030edf1c7947 at gmail.com> > > > > Content-Type: text/plain; charset="utf-8"; > Format="flowed" > > > > > > > On 10/10/19 1:14 PM, Benoît Presles wrote: > > > > > > Thanks for your answers. > > > > > > On my real data, I do not have so many samples. I have > a bit > > more > > > than > > > > > 200 samples in total and I also would like to get some > > results with > > > > > unpenalized logistic regression. > > > > > What do you suggest?
Should I switch to the lbfgs > solver? > > > > Yes. > > > > > Am I sure that with this solver I will not have any > > convergence > > > issue > > > > > and always get the good result? Indeed, I did not get > any > > > convergence > > > > > warning with saga, so I thought everything was fine. I > > noticed some > > > > > issues only when I decided to test several solvers. > Without > > > comparing > > > > > the results across solvers, how to be sure that the > > optimisation > > > goes > > > > > well? Shouldn't scikit-learn warn the user somehow if > it is > > not > > > the case? > > > > We should attempt to warn in the SAGA solver if it > doesn't > > converge. > > > > That it doesn't raise a convergence warning should > probably be > > > > considered a bug. > > > > It uses the maximum weight change as a stopping > criterion right > > now. > > > > We could probably compute the dual objective once in the > end to > > see > > > if > > > > we converged, right? Or is that not possible with SAGA? > If not, > > we > > > might > > > > want to caution that no convergence warning will be > raised. > > > > > > > At last, I was using saga because I also wanted to do > some > > feature > > > > > selection by using l1 penalty which is not supported by > > lbfgs... > > > > You can use liblinear then. > > > > > > > > Best regards, > > > > > Ben > > > > > > > Le 09/10/2019 à 23:39, Guillaume Lemaître a écrit : > > > > >> Ups I did not see the answer of Roman. Sorry about > that. It > > is > > > coming > > > > >> back to the same conclusion :) > > > > > >> On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître > > > > >> g.lemaitre58 at gmail.com>> > > wrote: > > > > > >> Uhm actually increasing to 10000 samples solves the > > convergence > > > > issue. > > > > >> SAGA is not designed to work with such a small > sample size > > most > > > > >> probably. > > > > > >> On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître > > > > >>
g.lemaitre58 at gmail.com>> > > > wrote: > > > > > >> I slightly changed the bench such that it uses > pipeline and > > > > >> plotted the coefficients: > > > > > >> https://gist.github.com/glemaitre/ > > > > 8fcc24bdfc7dc38ca0c09c56e26b9386 > > > > > >> I only see one of the 10 splits where SAGA is > not > > > converging, > > > > >> otherwise the coefficients > > > > >> look very close (I don't attach the figure > here but > > they > > > can > > > > >> be plotted using the snippet). > > > > >> So apart from this second split, the other > > differences > > > seem > > > > >> to be numerical instability. > > > > > >> Where I have some concern is regarding the > > convergence > > > rate > > > > >> of SAGA but I have no > > > > >> intuition to know if this is normal or not. > > > > > >> On Wed, 9 Oct 2019 at 23:22, Roman Yurchak > > > > >> rth.yurchak at gmail.com > > > > wrote: > > > > > >> Ben, > > > > > >> I can confirm your results with > penalty='none' > > and > > > C=1e9. > > > > >> In both cases, > > > > >> you are running a mostly unpenalized > logistic > > > > >> regression. Usually > > > > >> that's less numerically stable than with > a small > > > > >> regularization, > > > > >> depending on the data collinearity. > > > > > >> Running that same code with > > > > >> - larger penalty (smaller C values) > > > > >> - or larger number of samples > > > > >> yields for me the same coefficients (up > to > > some > > > > tolerance). > > > > > >> You can also see that SAGA convergence is > not > > good by > > > the > > > > >> fact that it > > > > >> needs 196000 epochs/iterations to > converge. > > > > > >>
Actually, I have often seen convergence > issues > > with > > > SAG > > > > >> on small > > > > >> datasets (in unit tests), not fully sure > why. > > > > > >> -- > > > > >> Roman > > > > > >> On 09/10/2019 22:10, serafim loukas wrote: > > > > >> > The predictions across solvers are > exactly the > > same > > > when > > > > >> I run the code. > > > > >> > I am using 0.21.3 version. What is > yours? > > > > >> > > > > > >> > > > > > >> > In [13]: import sklearn > > > > >> > > > > > >> > In [14]: sklearn.__version__ > > > > >> > Out[14]: '0.21.3' > > > > >> > > > > > >> > > > > > >> > Serafeim > > > > >> > > > > > >> > > > > > >> > > > > > >> >> On 9 Oct 2019, at 21:44, Benoît Presles > > > > >> > > > >> > > > > >> >> > > > >> >> > wrote: > > > > >> >> > > > > >> >> (y_pred_lbfgs==y_pred_saga).all() == > False > > > > >> > > > > > >> > > > > > >> > > > _______________________________________________ > > > > >> > scikit-learn mailing list > > > > >> > scikit-learn at python.org > > scikit-learn at python.org> > > > > >> > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > >> > > > > > > >> > _______________________________________________ > > > > >> scikit-learn mailing list > > > > >> scikit-learn at python.org > > scikit-learn at python.org> > > > > >> https://mail.python.org/mailman/listinfo/ > > scikit-learn > > > > > > > >> -- > > > > >> Guillaume Lemaitre > > > > >> Scikit-learn @ Inria Foundation > > > > >> https://glemaitre.github.io/ > > > > > > > >>
?-- > > > > >>? ? ?Guillaume Lemaitre > > > > >>? ? ?Scikit-learn @ Inria Foundation > > > > >>? ? ?https://glemaitre.github.io/ > > > > > > > >> -- > > > > >> Guillaume Lemaitre > > > > >> Scikit-learn @ Inria Foundation > > > > >> https://glemaitre.github.io/ > > > > > >> _______________________________________________ > > > > >> scikit-learn mailing list > > > > >> scikit-learn at python.org > > > > >> https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > _______________________________________________ > > > > > scikit-learn mailing list > > > > > scikit-learn at python.org > > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -------------- next part -------------- > > > > An HTML attachment was scrubbed... > > > > URL: < > > > > http://mail.python.org/pipermail/scikit-learn/attachments/20191011/ > > > > a7052cd9/attachment-0001.html> > > > > > ------------------------------ > > > > > Subject: Digest Footer > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > > ------------------------------ > > > > > End of scikit-learn Digest, Vol 43, Issue 21 > > > > ******************************************** > > > > > > _______________________________________________ > > > > scikit-learn mailing list > > > > scikit-learn at python.org > > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > -- > > > Gael Varoquaux > > > Research Director, INRIA Visiting professor, > McGill > > > http://gael-varoquaux.info http://twitter.com/ > > GaelVaroquaux > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > > scikit-learn mailing list > > > scikit-learn at python.org > > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > > ------------------------------ > > > > End of scikit-learn Digest, Vol 43, Issue 24 
> > > ******************************************** > > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: > 20191012/6959d075/attachment.html> > > -------------- next part -------------- > > A non-text attachment was scrubbed... > > Name: 2019-10-12 14_00_05-Frequently Asked Questions ? > scikit-learn > > 0.21.3 documentation.png > > Type: image/png > > Size: 26245 bytes > > Desc: not available > > URL: > 20191012/6959d075/attachment.png> > > > ------------------------------ > > > Subject: Digest Footer > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > ------------------------------ > > > End of scikit-learn Digest, Vol 43, Issue 25 > > ******************************************** > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > > _______________________________________________ > > scikit-learn mailing list > > scikit-learn at python.org > > https://mail.python.org/mailman/listinfo/scikit-learn > > > -- > Gael Varoquaux > Research Director, INRIA Visiting professor, McGill > http://gael-varoquaux.info http://twitter.com/GaelVaroquaux > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaltenb at stanford.edu Wed Oct 16 19:34:59 2019 From: kaltenb at stanford.edu (Kristen M. Altenburger) Date: Wed, 16 Oct 2019 23:34:59 +0000 Subject: [scikit-learn] Weighted Random Forest vs. 
"class_weight" in RandomForestClassifier Message-ID: Hi All, Posted the same question on StackExchange [link] but also circulating here to see if someone knows :) I am confused whether the "class_weight" parameter in Python's sklearn's Random Forest Classifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) is equivalent to Chen/Breiman's notion of "Weighted Random Forest" described in Section 2.3 (https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf). In short, "Weighted Random Forest" will "...assign a weight to each class, with the minority class given larger weight (i.e., higher misclassification cost). The class weights are incorporated into the RF algorithm in two places. In the tree induction procedure, class weights are used to weight the Gini criterion for finding splits. In the terminal nodes of each tree, class weights are again taken into consideration. The class prediction of each terminal node is determined by ?weighted majority vote?; i.e., the weighted vote of a class is the weight for that class times the number of cases for that class at the terminal node. The final class prediction for RF is then determined by aggregatting the weighted vote from each individual tree, where the weights are average weights in the terminal nodes." Question: I can't tell from the Python source code for RandomForestClassifier, is class_weight used to weight the Gini criterion for finding splits? And if not, can anyone recommend code that implements Weighted Random Forest? Thanks! Thanks! Kristen http://kaltenburger.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahmadqassemi at gmail.com Sat Oct 19 14:43:45 2019 From: ahmadqassemi at gmail.com (ahmad qassemi) Date: Sat, 19 Oct 2019 11:43:45 -0700 Subject: [scikit-learn] question Message-ID: Dear Mr/Mrs, I'm a PhD student in DS. 
I'm trying to use your provided code on *Spectral CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix ( https://scikit-learn.org/stable/modules/biclustering.html). Since my data has complex values, i.e., matrix elements are complex, your modules don't work on my data. It seems that the reason is your K-means' code doesn't work with complex numbers. I will really appreciate it if you take some time and tell me how should I apply your codes on my complex data. Thanks a lot in advance. Sincerely, Ahmad Qassemi -------------- next part -------------- An HTML attachment was scrubbed... URL: From vaggi.federico at gmail.com Sat Oct 19 18:48:40 2019 From: vaggi.federico at gmail.com (federico vaggi) Date: Sat, 19 Oct 2019 15:48:40 -0700 Subject: [scikit-learn] question In-Reply-To: References: Message-ID: Your options are to either pick a clustering algorithm that supports a pre-computed distance matrix, or, find some kind of projection from C -> R, embed your data in R, then cluster your embedded data and transfer the labels back to C. On Sat, Oct 19, 2019 at 11:44 AM ahmad qassemi wrote: > Dear Mr/Mrs, > > I'm a PhD student in DS. I'm trying to use your provided code on *Spectral > CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix ( > https://scikit-learn.org/stable/modules/biclustering.html). Since my data > has complex values, i.e., matrix elements are complex, your modules don't > work on my data. It seems that the reason is your K-means' code doesn't > work with complex numbers. I will really appreciate it if you take some > time and tell me how should I apply your codes on my complex data. Thanks a > lot in advance. > > Sincerely, > Ahmad Qassemi > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fernando.wittmann at gmail.com Sun Oct 20 09:54:58 2019 From: fernando.wittmann at gmail.com (Fernando Marcos Wittmann) Date: Sun, 20 Oct 2019 10:54:58 -0300 Subject: [scikit-learn] question In-Reply-To: References: Message-ID: What about converting into two columns? One with the real projection and the other with the complex projection? On Sat, Oct 19, 2019, 3:44 PM ahmad qassemi wrote: > Dear Mr/Mrs, > > I'm a PhD student in DS. I'm trying to use your provided code on *Spectral > CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix ( > https://scikit-learn.org/stable/modules/biclustering.html). Since my data > has complex values, i.e., matrix elements are complex, your modules don't > work on my data. It seems that the reason is your K-means' code doesn't > work with complex numbers. I will really appreciate it if you take some > time and tell me how should I apply your codes on my complex data. Thanks a > lot in advance. > > Sincerely, > Ahmad Qassemi > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From seralouk at hotmail.com Sun Oct 20 10:07:37 2019 From: seralouk at hotmail.com (serafim loukas) Date: Sun, 20 Oct 2019 14:07:37 +0000 Subject: [scikit-learn] question In-Reply-To: References: Message-ID: I would take the magnitude. Otherwise you will have to modify the source code to make it work with complex values. Bests, Makis On Oct 20, 2019, at 15:55, Fernando Marcos Wittmann wrote: ? What about converting into two columns? One with the real projection and the other with the complex projection? On Sat, Oct 19, 2019, 3:44 PM ahmad qassemi > wrote: Dear Mr/Mrs, I'm a PhD student in DS. 
I'm trying to use your provided code on Spectral CoClustering and Spectral Biclustering to bi-cluster my data matrix (https://scikit-learn.org/stable/modules/biclustering.html). Since my data has complex values, i.e., matrix elements are complex, your modules don't work on my data. It seems that the reason is your K-means' code doesn't work with complex numbers. I will really appreciate it if you take some time and tell me how should I apply your codes on my complex data. Thanks a lot in advance. Sincerely, Ahmad Qassemi _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn _______________________________________________ scikit-learn mailing list scikit-learn at python.org https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... URL: From ahmadqassemi at gmail.com Sun Oct 20 11:08:03 2019 From: ahmadqassemi at gmail.com (ahmad qassemi) Date: Sun, 20 Oct 2019 11:08:03 -0400 Subject: [scikit-learn] question In-Reply-To: References: Message-ID:

Thanks a lot guys for your great hints. I've tried using only the magnitude or only the phase, but neither works in my case; I need to consider both simultaneously to get a correct result.

I've also considered converting into two columns (imaginary + real columns). But the problem is that after bi-clustering, imaginary columns and their corresponding real columns can end up in different clusters, and the question arises of how to assign them to the same cluster. In other words, for each complex value the real and imaginary parts would most likely land in different clusters, and it's not easy to bring them back into the same cluster.

What do you think? Is it possible to modify the scikit-learn code to work with complex values? Or ...?
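One way to keep each real/imaginary pair together by construction, sketched below with plain KMeans rather than the spectral co-/bi-clustering algorithms themselves (an assumption; the spectral variants would need their own adaptation): embed the complex matrix into the reals by stacking real and imaginary parts, and when clustering columns, describe each original complex column by the concatenation of its real and imaginary values, so the pair is clustered as a single object and can never be split:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)) + 1j * rng.normal(size=(20, 5))  # toy complex data

# Embed C^5 into R^10: each complex feature contributes (real, imag).
X_real = np.hstack([X.real, X.imag])

# Row clustering: every row keeps both parts of every feature, so the
# real/imag pairing cannot be separated across clusters.
row_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_real)

# Column clustering: represent each ORIGINAL complex column by the
# concatenation of its real and imaginary parts, then cluster those
# representations -- one label per complex column, pair kept intact.
col_features = np.hstack([X.real.T, X.imag.T])  # shape (5, 40)
col_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(col_features)

print(row_labels.shape, col_labels.shape)  # (20,) (5,)
```

This sidesteps the reassignment problem entirely: there is only one label per complex column, so nothing needs to be stitched back together afterwards.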
On Sun, 20 Oct 2019 at 10:09, serafim loukas wrote: > I would take the magnitude. > Otherwise you will have to modify the source code to make it work with > complex values. > > Bests, > Makis > > On Oct 20, 2019, at 15:55, Fernando Marcos Wittmann < > fernando.wittmann at gmail.com> wrote: > > > What about converting into two columns? One with the real projection and > the other with the complex projection? > > On Sat, Oct 19, 2019, 3:44 PM ahmad qassemi > wrote: > >> Dear Mr/Mrs, >> >> I'm a PhD student in DS. I'm trying to use your provided code on *Spectral >> CoClustering *and *Spectral Biclustering* to bi-cluster my data matrix ( >> https://scikit-learn.org/stable/modules/biclustering.html). Since my >> data has complex values, i.e., matrix elements are complex, your modules >> don't work on my data. It seems that the reason is your K-means' code >> doesn't work with complex numbers. I will really appreciate it if you take >> some time and tell me how should I apply your codes on my complex data. >> Thanks a lot in advance. >> >> Sincerely, >> Ahmad Qassemi >> _______________________________________________ >> scikit-learn mailing list >> scikit-learn at python.org >> https://mail.python.org/mailman/listinfo/scikit-learn >> > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason at refinerynet.com Mon Oct 21 20:27:38 2019 From: jason at refinerynet.com (Jason Wolosonovich) Date: Mon, 21 Oct 2019 17:27:38 -0700 Subject: [scikit-learn] Sparse Input for HistGradientBoostingClassifier Message-ID: Hi!
I'm getting an error when trying to use the HistGradientBoostingClassifier by feeding it the output from CountVectorizer and then TfidfTransformer. The error is: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. I haven't opened an issue yet because I wanted to get more clarification on whether this just isn't implemented yet or if there is some reason inherent to histogram based boosting that prevents sparse inputs from being used. Making the array dense in my case causes me to run out of memory. Thanks in advance! -Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasjpfan at gmail.com Mon Oct 21 21:50:14 2019 From: thomasjpfan at gmail.com (thomasjpfan at gmail.com) Date: Mon, 21 Oct 2019 21:50:14 -0400 Subject: [scikit-learn] Sparse Input for HistGradientBoostingClassifier In-Reply-To: References: Message-ID: Currently, it is not implemented. Feel free to open an issue regarding sparse support for HistGradientBoosting. Thomas > On Oct 21, 2019, at 9:00 PM, Jason Wolosonovich wrote: > > ? > Hi! > > I'm getting an error when trying to use the HistGradientBoostingClassifier by feeding it the output from CountVectorizer and then TfidfTransformer. The error is: > > TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. > > I haven't opened an issue yet because I wanted to get more clarification on whether this just isn't implemented yet or if there is some reason inherent to histogram based boosting that prevents sparse inputs from being used. > > Making the array dense in my case causes me to run out of memory. Thanks in advance! > > -Jason > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From geoffrey.bolmier at gmail.com Tue Oct 22 05:32:43 2019 From: geoffrey.bolmier at gmail.com (Geoffrey Bolmier) Date: Tue, 22 Oct 2019 11:32:43 +0200 Subject: [scikit-learn] Decision tree results sometimes different with scaled data Message-ID: <04CA54E6-6894-4104-BB66-9A9FE89EED7F@getmailspring.com>

Hi all,

First, let me thank you for the great job you guys are doing developing and maintaining such a popular library!

As we all know, decision trees are not affected by scaled data, because splits don't take into account distances between two values within a feature.

However, I experienced a strange behavior using sklearn's decision tree algorithm: sometimes the results of the model differ depending on whether the input data has been scaled or not.

To illustrate my point I ran experiments on the iris dataset consisting of:

- perform a train/test split
- fit the training set and predict the test set
- fit and predict again with standardized inputs (removing the mean and scaling to unit variance)
- compare both models' predictions

Experiments have been run 10,000 times with different random seeds (cf. traceback and code to reproduce it at the end). Results showed that a bit more than 10% of the time there is at least one different prediction. Fortunately, when that is the case, only a few predictions differ, 1 or 2 most of the time. I checked the inputs causing different predictions and they are not the same from run to run.

I'm worried the rate of different predictions could be larger for other datasets... Do you have an idea where this comes from, maybe floating point errors, or am I doing something wrong?
Cheers,
Geoffrey

------------------------------------------------------------
Traceback:
------------------------------------------------------------
Error rate: 12.22%

Seed: 241862
All pred equal: False
Not scale data confusion matrix:
[[16 0 0]
 [ 0 17 0]
 [ 0 4 13]]
Scale data confusion matrix:
[[16 0 0]
 [ 0 15 2]
 [ 0 4 13]]
------------------------------------------------------------
Code:
------------------------------------------------------------
import numpy as np

from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


X, y = load_iris(return_X_y=True)


def run_experiment(X, y, seed):
    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        stratify=y,
        test_size=0.33,
        random_state=seed
    )

    scaler = StandardScaler()

    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    clf = DecisionTreeClassifier(random_state=seed)
    clf_scaled = DecisionTreeClassifier(random_state=seed)

    clf.fit(X_train, y_train)
    clf_scaled.fit(X_train_scaled, y_train)

    pred = clf.predict(X_test)
    pred_scaled = clf_scaled.predict(X_test_scaled)

    err = 0 if all(pred == pred_scaled) else 1

    return err, y_test, pred, pred_scaled


n_err, n_run, seed_err = 0, 10000, None

for _ in range(n_run):
    seed = np.random.randint(10000000)
    err, _, _, _ = run_experiment(X, y, seed)
    n_err += err

    # keep aside last seed causing an error
    seed_err = seed if err == 1 else seed_err


print(f'Error rate: {round(n_err / n_run * 100, 2)}%', end='\n\n')

_, y_test, pred, pred_scaled = run_experiment(X, y, seed_err)

print(f'Seed: {seed_err}')
print(f'All pred equal: {all(pred == pred_scaled)}')
print(f'Not scale data confusion matrix:\n{confusion_matrix(y_test, pred)}')
print(f'Scale data confusion matrix:\n{confusion_matrix(y_test, pred_scaled)}')
-------------- next part -------------- An HTML attachment was scrubbed...
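The scale dependence is easy to observe directly on the fitted trees: scikit-learn places each split threshold at the midpoint between two neighbouring sample values, so the threshold lives in the units of the (possibly scaled) feature, and points near a boundary can end up on different sides once floating-point rounding is involved. A minimal sketch inspecting the learned thresholds (tree_.threshold is public scikit-learn API):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)

X_scaled = StandardScaler().fit_transform(X)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# The root split threshold is the midpoint between the two straddling
# sample values, so it moves with any rescaling of the feature.
t_raw = tree_raw.tree_.threshold[0]      # midpoint of 2.0 and 3.0 -> 2.5
t_scaled = tree_scaled.tree_.threshold[0]  # midpoint of the standardized values
print(t_raw, t_scaled)
```

Here the two trees still predict identically, but because the thresholds are recomputed in the scaled units rather than mapped exactly from the unscaled ones, samples lying extremely close to a threshold can flip sides after scaling, which is consistent with the 1-2 differing predictions observed above.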
URL: From g.lemaitre58 at gmail.com Tue Oct 22 05:49:55 2019 From: g.lemaitre58 at gmail.com (=?ISO-8859-1?Q?Guillaume_Lema=EEtre?=) Date: Tue, 22 Oct 2019 11:49:55 +0200 Subject: [scikit-learn] Decision tree results sometimes different with scaled data In-Reply-To: <04CA54E6-6894-4104-BB66-9A9FE89EED7F@getmailspring.com> Message-ID: <94p1c1979ovfis13811raa5n.1571737795840@gmail.com> An HTML attachment was scrubbed... URL: From podkanowicz.bartosz at gmail.com Wed Oct 23 04:53:46 2019 From: podkanowicz.bartosz at gmail.com (Bartosz Podkanowicz) Date: Wed, 23 Oct 2019 10:53:46 +0200 Subject: [scikit-learn] New contribution Message-ID: Hi all,

I am not sure if this is the right place for asking this question.

In the next 3 months I would like to contribute to scikit-learn (some code, bug fixes, tests). Unfortunately I have found that many issues labelled as "good first issue" and "help wanted" have people already working on them or open pull requests, like #15076, #14781, #14934.

I am not sure if starting to contribute by reviewing pull requests is a good idea.

Can someone guide me on how to find issues/enhancements/fixes that I can start working on? Or maybe I should start with reviewing some pull requests?

I suppose that I missed something when reading the contributing guide.

Thank you for your answers.

Kind Regards,
Bartosz Podkanowicz -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Wed Oct 23 11:41:15 2019 From: adrin.jalali at gmail.com (Adrin) Date: Wed, 23 Oct 2019 17:41:15 +0200 Subject: [scikit-learn] New contribution In-Reply-To: References: Message-ID: Hi Bartosz, Glad to hear you're interested in contributing to scikit-learn. As you've observed, our labels are not up to date and we're still working on it. However, there are a few hints I can give for you to find places to start with: - Some old "good first issue"s require many separate PRs, and they're still open.
Those are the ones which usually touch many files/classes. You can try to start from there, which also would help you since you'll have other accepted PRs as a template. - Some older PRs are labeled as "stalled", and you can find some easy ones and address the comments and continue their work. You can also mix your search to find the ones which are "stalled" and labelled as "sprint", which are usually easy ones, but abandoned by the original author. - "Watch" the repo for a while, and see how the activity goes on the repo, and you may be able to find a recently reported issue interesting before others claim it. I hope these hints help, and hope to see your contributions soon :) Best, Adrin. On Wed, Oct 23, 2019 at 10:55 AM Bartosz Podkanowicz < podkanowicz.bartosz at gmail.com> wrote: > Hi all, > > I am not sure if it is right place for asking this question. > > In the next 3 months I would like to contribute to scikit-learn (some > code, bug fixs, tests). Unfortunately I have found that many issues > labelled as "good first issue" and "help wanted" has people already working > on it or some pull requests like #15076, #14781, #14934. > > I am not sure if starting contibuting with reviewing pull requests is good > idea. > > Can someone guide me how to find issues/enhancements/fixes that I can > start working on? or maybe I should start with reviewing some pull requests? > > I suppose that I missed something when reading contributing guide. > > Thank you for answers. > > Kind Regards, > Bartosz Podkanowicz > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexandre.gramfort at inria.fr Thu Oct 24 08:09:01 2019 From: alexandre.gramfort at inria.fr (Alexandre Gramfort) Date: Thu, 24 Oct 2019 14:09:01 +0200 Subject: [scikit-learn] Decision tree results sometimes different with scaled data In-Reply-To: <94p1c1979ovfis13811raa5n.1571737795840@gmail.com> References: <04CA54E6-6894-4104-BB66-9A9FE89EED7F@getmailspring.com> <94p1c1979ovfis13811raa5n.1571737795840@gmail.com> Message-ID: Another reason is that we take as threshold the midpoint between sample values, which is not invariant to arbitrary scaling of the features. Alex On Tue, Oct 22, 2019 at 11:56 AM Guillaume Lemaître wrote: > Even with the same random state, it can happen that several features will > lead to a best split and this split is chosen randomly (even with the seed > fixed - this is reported as an issue I think). Therefore, the rest of the > tree could be different, leading to different predictions. > > Another possibility is that we compute the difference between the current > threshold and the next to be tried and only check the entropy if it is > larger than a specific value (I would need to check the source code). After > scaling, it could happen that 2 feature values become too close to be > considered as a potential split, which will make a difference between scaled > and unscaled features. But this diff should be really small. > > This is what I can think of off the top of my head. > > Sent from my phone - sorry to be brief and potential misspell. > *From:* geoffrey.bolmier at gmail.com > *Sent:* 22 October 2019 11:34 > *To:* scikit-learn at python.org > *Reply to:* scikit-learn at python.org > *Subject:* [scikit-learn] Decision tree results sometimes different with > scaled data > > Hi all, > > First, let me thank you for the great job your guys are doing developing > and maintaining such a popular library!
> > As we all know, decision trees are not affected by scaled data, because > splits don't take into account distances between two values within a > feature. > > However, I experienced a strange behavior using the sklearn decision tree > algorithm. Sometimes the results of the model differ depending on whether the input > data has been scaled or not. > > To illustrate my point I ran experiments on the iris dataset consisting of: > > - perform a train/test split > - fit the training set and predict the test set > - fit and predict again with standardized inputs (removing the mean > and scaling to unit variance) > - compare both models' predictions > > Experiments were run 10,000 times with different random seeds (cf. > traceback and code to reproduce it at the end). > Results showed that in a bit more than 10% of runs we find at least > one different prediction. Fortunately, when that happens, only a few > predictions differ, 1 or 2 most of the time. I checked the inputs causing > different predictions, and they are not the same from run to run. > > I'm worried that the rate of different predictions could be larger for other > datasets... > Do you have an idea where this comes from, maybe due to floating point errors, > or am I doing something wrong? 
> > Cheers, > Geoffrey > > > ------------------------------------------------------------ > Traceback: > ------------------------------------------------------------ > Error rate: 12.22% > > Seed: 241862 > All pred equal: False > Not scale data confusion matrix: > [[16 0 0] > [ 0 17 0] > [ 0 4 13]] > Scale data confusion matrix: > [[16 0 0] > [ 0 15 2] > [ 0 4 13]] > ------------------------------------------------------------ > Code: > ------------------------------------------------------------ > import numpy as np > > from sklearn.datasets import load_iris > from sklearn.metrics import confusion_matrix > from sklearn.model_selection import train_test_split > from sklearn.preprocessing import StandardScaler > from sklearn.tree import DecisionTreeClassifier > > > X, y = load_iris(return_X_y=True) > > def run_experiment(X, y, seed): > X_train, X_test, y_train, y_test = train_test_split( > X, > y, > stratify=y, > test_size=0.33, > random_state=seed > ) > > scaler = StandardScaler() > > X_train_scaled = scaler.fit_transform(X_train) > X_test_scaled = scaler.transform(X_test) > > clf = DecisionTreeClassifier(random_state=seed) > clf_scaled = DecisionTreeClassifier(random_state=seed) > > clf.fit(X_train, y_train) > clf_scaled.fit(X_train_scaled, y_train) > > pred = clf.predict(X_test) > pred_scaled = clf_scaled.predict(X_test_scaled) > > err = 0 if all(pred == pred_scaled) else 1 > > return err, y_test, pred, pred_scaled > > > n_err, n_run, seed_err = 0, 10000, None > > for _ in range(n_run): > seed = np.random.randint(10000000) > err, _, _, _ = run_experiment(X, y, seed) > n_err += err > > # keep aside last seed causing an error > seed_err = seed if err == 1 else seed_err > > > print(f'Error rate: {round(n_err / n_run * 100, 2)}%', end='\n\n') > > _, y_test, pred, pred_scaled = run_experiment(X, y, seed_err) > > print(f'Seed: {seed_err}') > print(f'All pred equal: {all(pred == pred_scaled)}') > print(f'Not scale data confusion matrix:\n{confusion_matrix(y_test, > 
pred)}') > print(f'Scale data confusion matrix:\n{confusion_matrix(y_test, > pred_scaled)}') > [image: Sent from Mailspring] > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adrin.jalali at gmail.com Thu Oct 24 11:10:26 2019 From: adrin.jalali at gmail.com (Adrin) Date: Thu, 24 Oct 2019 17:10:26 +0200 Subject: [scikit-learn] Reminder: Monday October 28th meeting Message-ID: Hi Scikit-learn people, This is a reminder that we'll be having our monthly call on Monday. Please put your thoughts and important topics you have in mind on the project board: https://github.com/scikit-learn/scikit-learn/projects/15 We'll be meeting on https://appear.in/amueller As usual, it'd be nice to have them on the board before the weekend :) See you on Monday, Adrin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Wong.WingMei at UOBgroup.com Thu Oct 24 22:01:19 2019 From: Wong.WingMei at UOBgroup.com (WONG Wing Mei) Date: Fri, 25 Oct 2019 02:01:19 +0000 Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 38 In-Reply-To: References: Message-ID: <132529746EA4F64D8BBA524718094D5B9970034B@ntxmbpsg02.SG.UOBNET.COM> Can I ask whether we can use sample weight in gradient boosting? And how to do it? 
-----Original Message----- From: scikit-learn [mailto:scikit-learn-bounces+wong.wingmei=uobgroup.com at python.org] On Behalf Of scikit-learn-request at python.org Sent: Friday, October 25, 2019 12:00 AM To: scikit-learn at python.org Subject: scikit-learn Digest, Vol 43, Issue 38 [...] UOB EMAIL DISCLAIMER Any person receiving this email and any attachment(s) contained, shall treat the information as confidential and not misuse, copy, disclose, distribute or retain the information in any way that amounts to a breach of confidentiality. If you are not the intended recipient, please delete all copies of this email from your computer system. As the integrity of this message cannot be guaranteed, neither UOB nor any entity in the UOB Group shall be responsible for the contents. Any opinion in this email may not necessarily represent the opinion of UOB or any entity in the UOB Group. From adrin.jalali at gmail.com Fri Oct 25 03:39:09 2019 From: adrin.jalali at gmail.com (Adrin) Date: Fri, 25 Oct 2019 09:39:09 +0200 Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 38 In-Reply-To: <132529746EA4F64D8BBA524718094D5B9970034B@ntxmbpsg02.SG.UOBNET.COM> References: <132529746EA4F64D8BBA524718094D5B9970034B@ntxmbpsg02.SG.UOBNET.COM> Message-ID: Hi, it's in the making: https://github.com/scikit-learn/scikit-learn/pull/14696 On Fri, Oct 25, 2019 at 4:23 AM WONG Wing Mei wrote: > Can I ask whether we can use sample weight in gradient boosting? And how > to do it? > > [...] > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn > -------------- next part -------------- An HTML attachment was scrubbed... URL: From niourf at gmail.com Fri Oct 25 07:31:38 2019 From: niourf at gmail.com (Nicolas Hug) Date: Fri, 25 Oct 2019 07:31:38 -0400 Subject: [scikit-learn] scikit-learn Digest, Vol 43, Issue 38 In-Reply-To: References: <132529746EA4F64D8BBA524718094D5B9970034B@ntxmbpsg02.SG.UOBNET.COM> Message-ID: It's in the making for the new histogram-based GB estimators, but the other GB estimators like GradientBoostingRegressor and GradientBoostingClassifier already support sample_weight. 
Just pass the weights in the fit method: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier.fit On 10/25/19 3:39 AM, Adrin wrote: > Hi, > > it's in the making: > https://github.com/scikit-learn/scikit-learn/pull/14696 > > On Fri, Oct 25, 2019 at 4:23 AM WONG Wing Mei wrote: > > Can I ask whether we can use sample weight in gradient boosting? > And how to do it? > > [...] > > _______________________________________________ > scikit-learn mailing list > scikit-learn at python.org > https://mail.python.org/mailman/listinfo/scikit-learn -------------- next part -------------- An HTML attachment was scrubbed... 
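Nicolas's answer - that GradientBoostingClassifier and GradientBoostingRegressor already accept sample_weight in fit - can be sketched as follows (the iris data and the weighting scheme here are made up purely for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical weighting: make class 2 count twice as much as the others.
sample_weight = np.where(y == 2, 2.0, 1.0)

# Weights go straight into fit(); nothing else changes.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)

print(clf.score(X, y))
```

The same pattern works for GradientBoostingRegressor; at the time of this thread, sample_weight support for the new histogram-based estimators was still being added in PR #14696.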
URL:

From joel.nothman at gmail.com  Sat Oct 26 08:17:14 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Sat, 26 Oct 2019 23:17:14 +1100
Subject: [scikit-learn] Reminder: Monday October 28th meeting
In-Reply-To:
References:
Message-ID:

Reminder: time is 12:00Z.
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2019&month=10&day=28&hour=12&min=0&sec=0&p1=240&p2=33&p3=37&p4=179&p5=195

On Fri., 25 Oct. 2019, 2:15 am Adrin, wrote:

> Hi Scikit-learn people,
>
> This is a reminder that we'll be having our monthly call on Monday.
>
> Please put your thoughts and important topics you have in mind on
> the project board:
> https://github.com/scikit-learn/scikit-learn/projects/15
>
> We'll be meeting on https://appear.in/amueller
>
> As usual, it'd be nice to have them on the board before the weekend :)
>
> See you on Monday,
> Adrin.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From geobgeo at yahoo.com  Mon Oct 28 17:00:53 2019
From: geobgeo at yahoo.com (Bulbul Ahmmed)
Date: Mon, 28 Oct 2019 21:00:53 +0000 (UTC)
Subject: [scikit-learn] Can we say stochastic gradient descent as an ML model?
References: <563232411.3485155.1572296453964.ref@mail.yahoo.com>
Message-ID: <563232411.3485155.1572296453964@mail.yahoo.com>

Dear Scikit Learn Community!

Scikit-learn puts stochastic gradient descent (SGD) as an ML model under
the umbrella of linear models. I know SGD is an optimization algorithm. My
question is: can we say SGD is an ML model? Thanks,

Best Regards,
Bulbul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vaggi.federico at gmail.com  Mon Oct 28 17:05:07 2019
From: vaggi.federico at gmail.com (federico vaggi)
Date: Mon, 28 Oct 2019 14:05:07 -0700
Subject: [scikit-learn] Can we say stochastic gradient descent as an ML model?
In-Reply-To: <563232411.3485155.1572296453964@mail.yahoo.com>
References: <563232411.3485155.1572296453964.ref@mail.yahoo.com>
 <563232411.3485155.1572296453964@mail.yahoo.com>
Message-ID:

In this case, SGD just means a linear model that is fit using stochastic
gradient descent instead of batch gradient methods.

If you want more control over the combination of model / loss function /
optimization algorithm, http://contrib.scikit-learn.org/lightning/ is
better oriented toward that specific use case.

On Mon, Oct 28, 2019 at 2:01 PM Bulbul Ahmmed via scikit-learn <
scikit-learn at python.org> wrote:

> Dear Scikit Learn Community!
>
> Scikit-learn puts stochastic gradient descent (SGD) as an ML model under
> the umbrella of linear models. I know SGD is an optimization algorithm. My
> question is: can we say SGD is an ML model? Thanks,
>
> Best Regards,
> Bulbul
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mail at sebastianraschka.com  Mon Oct 28 17:07:28 2019
From: mail at sebastianraschka.com (Sebastian Raschka)
Date: Mon, 28 Oct 2019 16:07:28 -0500
Subject: [scikit-learn] Can we say stochastic gradient descent as an ML model?
In-Reply-To: <563232411.3485155.1572296453964@mail.yahoo.com>
References: <563232411.3485155.1572296453964.ref@mail.yahoo.com>
 <563232411.3485155.1572296453964@mail.yahoo.com>
Message-ID: <6DE5D65B-E210-4332-A1F8-35BC6AD30886@sebastianraschka.com>

Hi Bulbul,

I would rather say SGD is a method for optimizing the objective function of
certain ML models, i.e., for minimizing their loss function and thereby
learning their parameters.

Best,
Sebastian

> On Oct 28, 2019, at 4:00 PM, Bulbul Ahmmed via scikit-learn wrote:
>
> Dear Scikit Learn Community!
>
> Scikit-learn puts stochastic gradient descent (SGD) as an ML model under
> the umbrella of linear models. I know SGD is an optimization algorithm. My
> question is: can we say SGD is an ML model? Thanks,
>
> Best Regards,
> Bulbul
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

From geobgeo at yahoo.com  Mon Oct 28 17:11:11 2019
From: geobgeo at yahoo.com (Bulbul Ahmmed)
Date: Mon, 28 Oct 2019 21:11:11 +0000 (UTC)
Subject: [scikit-learn] Can we say stochastic gradient descent as an ML model?
In-Reply-To:
References: <563232411.3485155.1572296453964.ref@mail.yahoo.com>
 <563232411.3485155.1572296453964@mail.yahoo.com>
Message-ID: <1228510236.3478546.1572297071046@mail.yahoo.com>

Thanks, Federico.

Bulbul Ahmmed
Graduate Teaching Assistant | Geology
Baylor University, Waco, TX 76706

On Monday, October 28, 2019, 03:06:15 PM MDT, federico vaggi wrote:

In this case, SGD just means a linear model that is fit using stochastic
gradient descent instead of batch gradient methods.

If you want more control over the combination of model / loss function /
optimization algorithm, http://contrib.scikit-learn.org/lightning/ is
better oriented toward that specific use case.

On Mon, Oct 28, 2019 at 2:01 PM Bulbul Ahmmed via scikit-learn wrote:

Dear Scikit Learn Community!
Scikit-learn puts stochastic gradient descent (SGD) as an ML model under
the umbrella of linear models. I know SGD is an optimization algorithm. My
question is: can we say SGD is an ML model? Thanks,

Best Regards,
Bulbul
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pahome.chen at mirlab.org  Thu Oct 31 04:57:39 2019
From: pahome.chen at mirlab.org (lampahome)
Date: Thu, 31 Oct 2019 16:57:39 +0800
Subject: [scikit-learn] Is there possible to combine multiple patterns in one regression model?
Message-ID:

I have an idea to predict the usage of every block of one disk, and I
found that the pattern of blocks is related to time.

Ex: block indexes 0~100 have high access times at 00:00, 12:00, and
18:00, each for 10 minutes. Other block indexes 1000~1100 have high
access times at 05:00, 14:00, and 20:00, each for 10 minutes.

From the above examples, I assume that some blocks follow one pattern,
other blocks follow another pattern, etc.

But I have 100,000 blocks, and I can get features like access_times,
blk_ID, and timestamp (e.g. 00:00~23:59). Is it possible to feed them
all into one regression model and still predict well?

thx
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ayoub.abozer at gmail.com  Thu Oct 31 19:32:25 2019
From: ayoub.abozer at gmail.com (Ayoub Abozer)
Date: Fri, 1 Nov 2019 01:32:25 +0200
Subject: [scikit-learn] Fwd: CutEncoder - simple suggestion for sklearn.preprocessing
In-Reply-To:
References:
Message-ID:

---------- Forwarded message ---------
From: Ayoub Abozer
Date: Fri, Nov 1, 2019 at 1:27 AM
Subject: CutEncoder - simple suggestion for sklearn.preprocessing
To:

Hello.
Please take a look at my kaggle notebook. I have a simple suggestion for
a new encoder for sklearn.preprocessing.

https://www.kaggle.com/ayoubabozer/cutencoder/notebook?scriptVersionId=22834623

Thanks,
Ayoub
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ayoub.abozer at gmail.com  Thu Oct 31 19:33:48 2019
From: ayoub.abozer at gmail.com (Ayoub Abozer)
Date: Fri, 1 Nov 2019 01:33:48 +0200
Subject: [scikit-learn] CutEncoder - simple suggestion for sklearn.preprocessing
Message-ID:

Hello.

Please take a look at my kaggle notebook. I have a simple suggestion for
a new encoder for sklearn.preprocessing.

https://www.kaggle.com/ayoubabozer/cutencoder/notebook?scriptVersionId=22834623

Thanks,
Ayoub
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.nothman at gmail.com  Thu Oct 31 19:44:57 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Fri, 1 Nov 2019 10:44:57 +1100
Subject: [scikit-learn] CutEncoder - simple suggestion for sklearn.preprocessing
In-Reply-To:
References:
Message-ID:

Why is this preferable to KBinsDiscretizer?

Where the bin edges are fixed, FunctionTransformer can be used with
pandas.cut.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ayoub.abozer at gmail.com  Thu Oct 31 19:57:07 2019
From: ayoub.abozer at gmail.com (Ayoub Abozer)
Date: Fri, 1 Nov 2019 01:57:07 +0200
Subject: [scikit-learn] CutEncoder - simple suggestion for sklearn.preprocessing
In-Reply-To:
References:
Message-ID:

Sorry, I did not know it before :(.

Thanks.

On Fri, Nov 1, 2019 at 1:46 AM Joel Nothman wrote:

>
> Why is this preferable to KBinsDiscretizer?
>
> Where the bin edges are fixed, FunctionTransformer can be used with
> pandas.cut.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ayoub.abozer at gmail.com  Thu Oct 31 19:59:39 2019
From: ayoub.abozer at gmail.com (Ayoub Abozer)
Date: Fri, 1 Nov 2019 01:59:39 +0200
Subject: [scikit-learn] CutEncoder - simple suggestion for sklearn.preprocessing
In-Reply-To:
References:
Message-ID:

I thought I would finally add something to scikit-learn :)

On Fri, Nov 1, 2019 at 1:57 AM Ayoub Abozer wrote:

> Sorry, I did not know it before :(.
>
> Thanks.
>
> On Fri, Nov 1, 2019 at 1:46 AM Joel Nothman wrote:
>
>>
>> Why is this preferable to KBinsDiscretizer?
>>
>> Where the bin edges are fixed, FunctionTransformer can be used with
>> pandas.cut.
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joel.nothman at gmail.com  Thu Oct 31 22:28:46 2019
From: joel.nothman at gmail.com (Joel Nothman)
Date: Fri, 1 Nov 2019 13:28:46 +1100
Subject: [scikit-learn] CutEncoder - simple suggestion for sklearn.preprocessing
In-Reply-To:
References:
Message-ID:

There is plenty to be contributed! But this one was solved a couple of
years ago ;)

On Fri, 1 Nov 2019 at 11:01, Ayoub Abozer wrote:

> I thought I would finally add something to scikit-learn :)
>
> On Fri, Nov 1, 2019 at 1:57 AM Ayoub Abozer wrote:
>
>> Sorry, I did not know it before :(.
>>
>> Thanks.
>>
>> On Fri, Nov 1, 2019 at 1:46 AM Joel Nothman wrote:
>>
>>>
>>> Why is this preferable to KBinsDiscretizer?
>>>
>>> Where the bin edges are fixed, FunctionTransformer can be used with
>>> pandas.cut.
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
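[Editor's sketch] Joel's suggestion in the CutEncoder thread above, wrapping pandas.cut in a FunctionTransformer when the bin edges are fixed, can be sketched as follows. The bin edges, column values, and the helper name cut_column are made-up for illustration; none of them come from the original thread.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Fixed, hand-chosen bin edges (hypothetical values for illustration).
EDGES = [0, 18, 35, 65, 120]

def cut_column(X):
    """Map a single numeric column to integer bin codes via pandas.cut."""
    codes = pd.cut(np.asarray(X).ravel(), bins=EDGES, labels=False)
    return np.asarray(codes).reshape(-1, 1)

# Wrapping pandas.cut in FunctionTransformer lets the fixed-edge binning
# sit inside a scikit-learn Pipeline like any other transformer.
binner = FunctionTransformer(cut_column)

ages = np.array([[4.0], [22.0], [40.0], [90.0]])
print(binner.fit_transform(ages).ravel())  # one bin index per row
```

Where the edges should instead be learned from the data (uniform or quantile), KBinsDiscretizer with encode='ordinal' gives a comparable result without the custom function.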