From mk1853387 at gmail.com Tue Jan 9 12:39:16 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Tue, 9 Jan 2024 18:39:16 +0100
Subject: [scikit-learn] How to extract subtree from a RegressionTree using the tree attribute?

I want to extract a subtree from the RegressionTree obtained by training the associated model, given two inputs: rootNode and depth.

Here's my buggy code, which I would like checked for errors:

    def extract_tree_depth_first_traversal(tree, root_start, t_depth):
        depth = 1
        sub_tree = []
        stack = Queue()
        stack.put(root_start)
        while stack:
            current_node = stack.get(0)
            sub_tree.append(current_node)
            left_child = tree.children_left[current_node]
            if left_child >= 0:
                stack.put(left_child)
            right_child = tree.children_right[current_node]
            if right_child >= 0:
                stack.put(right_child)
            children_current_node = [left_child, right_child]
            for child in children_current_node:
                sub_tree.append(child)
            if depth >= t_depth:
                break
            depth = depth + 1
        return sub_tree

Could somebody spot the error for me?

From christian.braune79 at gmail.com Tue Jan 9 13:34:49 2024
From: christian.braune79 at gmail.com (Christian Braune)
Date: Tue, 9 Jan 2024 19:34:49 +0100
Subject: [scikit-learn] How to extract subtree from a RegressionTree using the tree attribute?

Hi Marc,

a first observation: stack.get(0) returns but does NOT remove the first element from a list (even if you name it stack). If you want a stack, you need to use the pop method. See also here:
https://docs.python.org/3/tutorial/datastructures.html#using-lists-as-stacks

Best regards
Christian

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

From mk1853387 at gmail.com Sun Jan 14 16:15:38 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Sun, 14 Jan 2024 22:15:38 +0100
Subject: [scikit-learn] level search traversal on binary decision regression tree with recursive calls returning wrong node order

Hi all,

Suppose I have this binary tree that I want to traverse level by level using a recursive algorithm:

    .
    └── 1/
        ├── 2/
        │   ├── 3/
        │   │   ├── 4
        │   │   └── 9
        │   └── 30
        └── 71/
            ├── 72
            └── 99

I wrote this algorithm, inspired by level-order tree traversal, which stops at a given input depth:

    def get_subtree_from_rt(subtree, root_start, max_depth):
        if max_depth == 0:
            return []
        nodes = [root_start]
        if root_start == -1:
            return []
        else:
            nodes.extend([subtree.children_left[root_start], subtree.children_right[root_start]])
            print(nodes)
            nodes.extend(child for child in get_subtree_from_rt(subtree, subtree.children_left[root_start], max_depth - 1)
                         if child not in list(filter(lambda a: a != -1, nodes)))
            nodes.extend(child for child in get_subtree_from_rt(subtree, subtree.children_right[root_start], max_depth - 1)
                         if child not in list(filter(lambda a: a != -1, nodes)))
        return nodes

The algorithm does traverse the tree, but in an unwanted order. The returned result for the tree above was:

    [1, 2, 71, 3, 30, 4, 9]

While the right one should have been:

    [1, 2, 71, 3, 30, 72, 99]

Indeed the root_start is not the same for both recursive calls, since the first recursive call alters its value.

My question is how to obtain the expected result while avoiding that the second recursive call runs on a different root_start value?

To reproduce, pass tree_stucture as the subtree input:

    import pandas as pd
    import numpy as np
    from sklearn import *
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn import tree

    dataset = pd.read_csv("anydatasetPath")
    x = dataset.drop(dataset.columns[9], axis=1)
    y = dataset.iloc[:, 9]

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=28)

    model = DecisionTreeRegressor(random_state=0)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)

    tree_stucture = model.tree_

    print(get_subtree_from_rt(tree_stucture, 1, 3))

with many thanks
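The level order asked for above can be sketched with an iterative traversal that keeps a queue of (node, level) pairs. This is only a minimal sketch under stated assumptions: the function name `level_order_subtree` and the stand-in object `example` are hypothetical, and the only things assumed about the tree are the `children_left` and `children_right` arrays used in the message, with a negative id marking a missing child.

```python
from collections import deque
from types import SimpleNamespace


def level_order_subtree(tree, root, max_depth):
    """Return node ids reachable from `root` within `max_depth` levels, level by level."""
    if max_depth <= 0 or root < 0:
        return []
    order = []
    queue = deque([(root, 1)])  # (node id, level); the start node counts as level 1
    while queue:
        node, level = queue.popleft()  # popleft() removes the oldest entry (FIFO)
        order.append(node)
        if level == max_depth:
            continue  # do not descend below the requested depth
        for child in (tree.children_left[node], tree.children_right[node]):
            if child >= 0:  # a negative id marks a missing child
                queue.append((int(child), level + 1))
    return order


# Hypothetical stand-in for the example tree drawn in the message.
left = [-1] * 100
right = [-1] * 100
for parent, (l, r) in {1: (2, 71), 2: (3, 30), 3: (4, 9), 71: (72, 99)}.items():
    left[parent], right[parent] = l, r
example = SimpleNamespace(children_left=left, children_right=right)

print(level_order_subtree(example, 1, 3))  # [1, 2, 71, 3, 30, 72, 99]
```

The same function should also cover the subtree extraction from the first thread when given a fitted model's `tree_` attribute. A `collections.deque` is used rather than `queue.Queue` because a `Queue` object is always truthy, so a loop written as `while stack:` would never end on its own.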
From christian.braune79 at gmail.com Mon Jan 15 02:48:31 2024
From: christian.braune79 at gmail.com (Christian Braune)
Date: Mon, 15 Jan 2024 08:48:31 +0100
Subject: [scikit-learn] level search traversal on binary decision regression tree with recursive calls returning wrong node order

Hello Marc,

you might want to look at the intro to algorithms and data structures course from Sedgewick (your specific problem is discussed here:
https://www.cs.princeton.edu/courses/archive/spring15/cos226/lectures/31ElementarySymbolTables+32BinarySearchTrees.pdf, p50/51 (slide 22 specifically)).
In short: Level-order traversal is better solved using an iterative approach.
I also believe that your problem is not specific to sklearn, right?

Best regards
Christian

From mk1853387 at gmail.com Mon Jan 15 13:07:22 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Mon, 15 Jan 2024 19:07:22 +0100
Subject: [scikit-learn] level search traversal on binary decision regression tree with recursive calls returning wrong node order

Thanks for the reply. No, it is not specific to scikit-learn, but one application is about scikit-learn.

From jeremie.du-boisberranger at inria.fr Fri Jan 19 06:15:25 2024
From: jeremie.du-boisberranger at inria.fr (Jeremie du Boisberranger)
Date: Fri, 19 Jan 2024 12:15:25 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.4.0 release
Message-ID: <478b3e81-3b41-45dc-acd2-2e09ec4db5b1@inria.fr>

Hi everyone,

We're happy to announce the 1.4.0 release which you can install via pip or conda:

    pip install -U scikit-learn

or

    conda install -c conda-forge scikit-learn

You can read the release highlights under
https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_4_0.html
and the long list of the changes under
https://scikit-learn.org/stable/whats_new/v1.4.html

This version supports Python versions 3.9 to 3.12.

Thanks to all contributors who helped on this release!

Jérémie,
On behalf of the scikit-learn maintainers team.

From lorentzen.ch at gmail.com Fri Jan 19 11:32:36 2024
From: lorentzen.ch at gmail.com (Christian Lorentzen)
Date: Fri, 19 Jan 2024 17:32:36 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.4.0 release
In-Reply-To: <478b3e81-3b41-45dc-acd2-2e09ec4db5b1@inria.fr>
References: <478b3e81-3b41-45dc-acd2-2e09ec4db5b1@inria.fr>
Message-ID: <822B1F6A-BB14-4B39-8454-17D93B5AF139@gmail.com>

Thank you very much, Jérémie, for taking care of this release. I'm excited to use the new features and improvements.

Christian

From apoorva.kulkarni at rwth-aachen.de Fri Jan 26 13:47:05 2024
From: apoorva.kulkarni at rwth-aachen.de (Kulkarni, Apoorva)
Date: Fri, 26 Jan 2024 18:47:05 +0000
Subject: [scikit-learn] Decsion tree Visualization

Hello,

For an academic project I have used a decision tree with a depth of 70.

To document the data I need a visual tree representation only up to a depth of 5. Is there any way to do that? Please suggest.

Apoorva

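For a request like the one above, `sklearn.tree.plot_tree` accepts a `max_depth` argument that limits how many levels are drawn without changing the fitted model. The following is a minimal sketch, not the poster's setup: the synthetic data from `make_regression`, the figure size, and the output file name are assumptions chosen only to make the example self-contained.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Assumed synthetic data; a fully grown tree on it is much deeper than 5 levels.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
model = DecisionTreeRegressor(random_state=0).fit(X, y)

fig, ax = plt.subplots(figsize=(24, 10))
# max_depth only truncates the drawing; the model itself keeps its full depth.
annotations = plot_tree(model, max_depth=5, filled=True, fontsize=6, ax=ax)

# fig.savefig("tree_depth5.png", dpi=150)  # uncomment to write the figure to disk
print("fitted depth:", model.get_depth(), "| drawn boxes:", len(annotations))
```

Nodes below the cut-off are drawn as placeholder boxes, so the figure stays readable even when the fitted tree is very deep.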
From christian.braune79 at gmail.com Fri Jan 26 13:53:34 2024
From: christian.braune79 at gmail.com (Christian Braune)
Date: Fri, 26 Jan 2024 19:53:34 +0100
Subject: [scikit-learn] Decsion tree Visualization

Hello Apoorva,

have you tried this function:
https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html ?
It has a max_depth parameter, which might do just what you need.

Have a nice weekend!

From apoorva.kulkarni at rwth-aachen.de Fri Jan 26 14:06:26 2024
From: apoorva.kulkarni at rwth-aachen.de (Kulkarni, Apoorva)
Date: Fri, 26 Jan 2024 19:06:26 +0000
Subject: [scikit-learn] Decsion tree Visualization

Hello,

The suggested solution worked. We are beginners in this domain, hence grateful for your valuable input.

Thank you so much for your prompt help.

Apoorva

From mk1853387 at gmail.com Sun Jan 28 13:16:10 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Sun, 28 Jan 2024 19:16:10 +0100
Subject: [scikit-learn] How to create a binary tree hierarchy given a list of elements as its leaves

So I am trying to build a binary tree hierarchy given numerical elements that serve as its leaves (the last level of the tree to build). Starting from the leaves, I want to randomly create a name for each node of the next higher level of the hierarchy and assign it to its child elements.

For example: if the input elements are `0,1,2,3`, then I would first create 4 elements (say, by giving each a random label composed of a letter and a number); then, for the second level (iteration), I assign `0,1` to a random label (e.g. `b1`) and `2,3` to another label (`b2`); then, for the third level, I assign a parent label `c1` to `b1` and `b2`.

An illustration of the example is the following tree:

[image: tree_exp.PNG]

For this I use numpy's `array_split()` to get the chunks of arrays needed at each iteration. For example, to get the first iteration's arrays I use `np.array_split(input, (input.size // k))`, where `k` is an even number. To assign a parent node to its children, the parent's array range should enclose the children's ranges. For example, to assign the parent node with label `a1` to children `b1` and `b2` with ranges [0,1] and [2,3] respectively, the parent should have the range [0,3].

All is fine until a certain iteration (k=4) returns a parent with range [0,8], which overlaps the children's ranges and therefore cannot be their parent.

My question is how to partition such arrays evenly in a binary way and build the binary tree so that, for k=4, the first range is [0,7] instead of [0,8]?

My code is the following:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import string
    import random
    import numpy as np


    def generate_numbers_list_until_number(stop_number):
        if str(stop_number).isnumeric():
            return np.arange(stop_number)
        else:
            raise TypeError('Input should be a number!')


    def generate_node_label():
        return random.choice(string.ascii_lowercase) \
            + str(random.randint(0, 10))


    def main():
        data = generate_numbers_list_until_number(100)
        k = 1
        hierarchies = []
        cells_arrays = np.array_split(data, data.size // k)
        print(cells_arrays)
        used_node_hierarchy_name = []
        node_hierarchy_name = [generate_node_label() for _ in range(0, len(cells_arrays))]
        used_node_hierarchy_name.extend(node_hierarchy_name)
        while len(node_hierarchy_name) > 1:
            k = k * 2
            # bug here in the following line
            cells_arrays = list(map(lambda x: [x[0], x[-1]], np.array_split(data, data.size // k)))
            print(cells_arrays)
            node_hierarchy_name = []
            # node hierarchy names should not be redundant in another level
            for _ in range(0, len(cells_arrays)):
                node_name = generate_node_label()
                while node_name in used_node_hierarchy_name:
                    node_name = generate_node_label()
                node_hierarchy_name.append(node_name)
            used_node_hierarchy_name.extend(node_hierarchy_name)
            print(used_node_hierarchy_name)
            hierarchies.append(list(zip(node_hierarchy_name, cells_arrays)))

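One possible way around the overlapping ranges is to stop re-splitting the leaf array at every level and to build each level from the one below it by pairing adjacent nodes, so that a parent's range is by construction the union of its children's ranges. `np.array_split` spreads any remainder over the first chunks (100 elements in 12 chunks gives four chunks of 9), which would explain a first range of [0,8]. The sketch below is an assumption-laden illustration, not the poster's method: `build_levels` is a hypothetical helper, random labels are left out, and an unpaired last node is simply carried up as a single-child parent.

```python
def build_levels(n_leaves):
    """Build a binary hierarchy bottom-up; each level is a list of (start, end) ranges."""
    levels = [[(i, i) for i in range(n_leaves)]]  # level 0: one range per leaf
    while len(levels[-1]) > 1:
        below = levels[-1]
        above = []
        for i in range(0, len(below), 2):
            pair = below[i:i + 2]  # two adjacent children, or one if the count is odd
            above.append((pair[0][0], pair[-1][1]))  # parent spans first start to last end
        levels.append(above)
    return levels


levels = build_levels(100)
for depth, level in enumerate(levels):
    print(depth, len(level), level[:3])
```

With 100 leaves the third level above the leaves starts with (0, 7), (8, 15), and the final level holds the single range (0, 99). Labels could then be attached per level with the existing `generate_node_label()`.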
From mdiramali at yahoo.com Mon Jan 29 08:53:24 2024
From: mdiramali at yahoo.com (Murat DIRAMALI)
Date: Mon, 29 Jan 2024 13:53:24 +0000 (UTC)
Subject: [scikit-learn] Data Analysis Advice
References: <587408342.1192495.1706536404241.ref@mail.yahoo.com>
Message-ID: <587408342.1192495.1706536404241@mail.yahoo.com>

Hello,

I need advice on the usage of K-Fold cross-validation for the master's thesis I'm supervising. As far as I know, we run it with the best parameters, but do we use the train or the test dataset? I'm sharing the Python code that I'm working on. I would appreciate it if you could correct my mistakes.

Yours sincerely,
Murat Dıramalı

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Code Snippet.py
Type: text/x-python
Size: 2449 bytes
Desc: not available
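The attached script is not available in the archive, so the following is only a generic sketch of one common pattern, not a correction of that code: hold out a test set first, run the K-fold search on the training portion only, and score the refitted best model once on the held-out test set. The synthetic regression data, the `DecisionTreeRegressor` estimator, and the parameter grid are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumed synthetic data standing in for the thesis dataset.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=28
)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
search.fit(X_train, y_train)  # the folds are drawn from the training split only

print("best parameters:", search.best_params_)
print("mean cross-validated score:", search.best_score_)
# The test split is used once, after model selection is finished.
print("held-out test score:", search.score(X_test, y_test))
```

Keeping the test split out of the cross-validation loop means the final score is not influenced by the parameter search.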