From sepand.haghighi at yahoo.com  Fri Dec  7 09:27:28 2018
From: sepand.haghighi at yahoo.com (Sepand Haghighi)
Date: Fri, 7 Dec 2018 14:27:28 +0000 (UTC)
Subject: [SciPy-User] Confusion matrix statistical analysis
References: <414704346.419417.1544192848931.ref@mail.yahoo.com>
Message-ID: <414704346.419417.1544192848931@mail.yahoo.com>

Dear All,

I would like to introduce PyCM, an open-source Python library. PyCM is a
machine learning library providing statistical analysis of confusion
matrices through a large variety of parameters, such as AUC, confusion
entropy, and information-theoretic measures. It can be used to evaluate
the performance of different machine learning algorithms by computing a
range of evaluation parameters from their resulting confusion matrices.

PyCM is a multi-class confusion matrix library written in Python that
accepts both input data vectors and a direct matrix, and it is a tool for
post-classification model evaluation that covers most class and overall
statistics. PyCM is the Swiss Army knife of confusion matrices, targeted
mainly at data scientists who need a broad array of metrics for predictive
models and an accurate evaluation of a large variety of classifiers.

Do not hesitate to contact us about this library and to help us develop it
with your suggestions. You can find us at
https://github.com/sepandhaghighi/pycm
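For readers who want a quick look, a minimal usage sketch (not part of the
announcement; the label vectors are made up, and the calls follow PyCM's
documented ConfusionMatrix interface):

# pip install pycm
from pycm import ConfusionMatrix

actual = [0, 1, 1, 0, 1, 1, 0, 1]      # illustrative ground-truth labels
predicted = [0, 1, 0, 0, 1, 1, 1, 1]   # illustrative classifier output

cm = ConfusionMatrix(actual_vector=actual, predict_vector=predicted)
print(cm)              # prints the matrix plus class and overall statistics
print(cm.Overall_ACC)  # individual statistics are exposed as attributes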
From diallobakary4 at gmail.com  Fri Dec  7 11:53:10 2018
From: diallobakary4 at gmail.com (Bakary N'tji Diallo)
Date: Fri, 7 Dec 2018 18:53:10 +0200
Subject: [SciPy-User] Are the scores normally distributed?
Message-ID:

Dear all,

Hope you are doing very well.

I am trying to apply a statistical normalization which requires the values
to be normally distributed. I have prepared a short notebook with all the
details:
https://nbviewer.jupyter.org/github/diallobakary4/bioinformatics/blob/master/Normatily_test.ipynb

It would be great if someone could help me out.

Thanks
Best regards
--
Bakary N'tji DIALLO
PhD Student (Bioinformatics), Research Unit in Bioinformatics (RUBi)
Mail: diallobakary4 at gmail.com | Skype: diallobakary4
Tel: +27798233845 | +223 74 56 57 22 | +223 97 39 77 14

From pmhobson at gmail.com  Fri Dec  7 12:02:16 2018
From: pmhobson at gmail.com (Paul Hobson)
Date: Fri, 7 Dec 2018 09:02:16 -0800
Subject: [SciPy-User] Are the scores normally distributed?
In-Reply-To:
References:
Message-ID:

I think you misunderstand the null hypothesis.

The null hypothesis for this test is that the data are *not* normally
distributed. Since the p-value in your example is 0.0003 (i.e., less than
0.001), you can reject the null hypothesis, suggesting that your data are
normally distributed.
-Paul

On Fri, Dec 7, 2018 at 8:54 AM Bakary N'tji Diallo wrote:
> Dear all,
> Hope you are doing very well.
> [snip]

From denis.akhiyarov at gmail.com  Fri Dec  7 12:51:34 2018
From: denis.akhiyarov at gmail.com (Denis Akhiyarov)
Date: Fri, 7 Dec 2018 11:51:34 -0600
Subject: [SciPy-User] Confusion matrix statistical analysis
In-Reply-To: <414704346.419417.1544192848931@mail.yahoo.com>
References: <414704346.419417.1544192848931.ref@mail.yahoo.com>
 <414704346.419417.1544192848931@mail.yahoo.com>
Message-ID:

Well done! I'm using Scikit-Learn metrics for classifiers; what is missing
there that you bring to the table?

Thanks,
Denis

On Fri, Dec 7, 2018, 8:28 AM Sepand Haghighi wrote:
> Dear All,
>
> I would like to introduce PyCM, an open-source Python library.
> [snip]
> You can find us at https://github.com/sepandhaghighi/pycm

From gerrit.holl at gmail.com  Fri Dec  7 12:59:23 2018
From: gerrit.holl at gmail.com (Gerrit Holl)
Date: Fri, 7 Dec 2018 17:59:23 +0000
Subject: [SciPy-User] slides for python3 in science advocacy
Message-ID:

Hi,

does anyone have a presentation, in any format (bibtex or pdf or
otherwise), advocating why scientists should care about Python 3? A bit
like
https://python-3-for-scientists.readthedocs.io/en/latest/python3_features.html
but in presentation form. I find that frighteningly many colleagues are
still using Python 2, even when they write new code from scratch. I could
use pandoc to convert python-3-for-scientists to tex and then manually
turn it into a collection of beamer frames, but maybe someone has already
done something similar?

regards,
Gerrit
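As a taste of what such slides could show, a tiny sketch of two
Python-3-only conveniences of the kind the linked page collects (the
choice of the @ operator and f-strings here is illustrative):

import numpy as np

A = np.eye(3)
b = np.ones(3)

x = A @ b                # PEP 465 matrix-multiplication operator, Python 3.5+
print(f"solution: {x}")  # PEP 498 f-string, Python 3.6+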
From josef.pktd at gmail.com  Fri Dec  7 15:05:04 2018
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 7 Dec 2018 15:05:04 -0500
Subject: [SciPy-User] Are the scores normally distributed?
In-Reply-To:
References:
Message-ID:

On Fri, Dec 7, 2018 at 12:08 PM Paul Hobson wrote:
> I think you misunderstand the null hypothesis.
>
> The null hypothesis for this test is that the data are *not* normally
> distributed.

That's not correct. The null hypothesis is that the data come from a
normal distribution.

My guess is that because of the relatively large sample size, the power is
quite large and the test detects relatively small deviations from
normality.

len(x)
Out[8]: 1444

stats.skewtest(x)
Out[9]: SkewtestResult(statistic=1.79241121722139, pvalue=0.073067119279312559)

stats.kurtosistest(x)
Out[10]: KurtosistestResult(statistic=3.5348152259352097, pvalue=0.00040806039300234271)

According to the two separate tests that are combined in the normal test,
the data have heavier tails, i.e. larger kurtosis, than the normal
distribution.

(Using kstest as a distance measure, however, shows that the normal
distribution matches the data better than a t distribution with smaller
df. Note that the p-values for kstest don't apply here because loc and
scale are estimated.)

Josef

> Since the p-value in your example is 0.0003 (i.e., less than 0.001), you
> can reject the null hypothesis, suggesting that your data are normally
> distributed.
> -Paul
> [snip]

From diallobakary4 at gmail.com  Fri Dec  7 23:52:44 2018
From: diallobakary4 at gmail.com (Bakary N'tji Diallo)
Date: Sat, 8 Dec 2018 06:52:44 +0200
Subject: [SciPy-User] Are the scores normally distributed?
In-Reply-To:
References:
Message-ID:

Thank you for your replies.

About the large sample size, just for clarification: this is not a sample,
these are all the scores. Should I do a random sampling?

Another approach I tried was to normalize the data as follows:

x = x - 2*x           # equivalent to x = -x: flips the negative scores to positive values
log_data = np.log(x)  # so that the log function can be applied

The log_data was also found to be not normally distributed.

On Fri, Dec 7, 2018 at 22:05, josef.pktd at gmail.com wrote:
> That's not correct. The null hypothesis is that the data come from a
> normal distribution.
> [snip]

--
Bakary N'tji DIALLO
PhD Student (Bioinformatics), Research Unit in Bioinformatics (RUBi)
Mail: diallobakary4 at gmail.com | Skype: diallobakary4
Tel: +27798233845 | +223 74 56 57 22 | +223 97 39 77 14
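A visual check complements the tests discussed above: a normal Q-Q plot
makes heavy tails visible directly. A minimal sketch, not part of the
thread; the synthetic scores merely stand in for the real data:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.RandomState(0)
scores = rng.standard_t(df=10, size=1444)  # placeholder, heavier-tailed than normal

# Points bending away from the straight line in the tails indicate excess kurtosis.
stats.probplot(scores, dist="norm", plot=plt)
plt.title("Normal Q-Q plot of the scores")
plt.show()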
From elastica at laposte.net  Sat Dec  8 01:25:52 2018
From: elastica at laposte.net (elastica at laposte.net)
Date: Sat, 8 Dec 2018 07:25:52 +0100 (CET)
Subject: [SciPy-User] Conversion to graph adjacency list
In-Reply-To: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
Message-ID: <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>

Hi,

SciPy provides C implementations of some classical graph algorithms, such
as Kruskal's minimum spanning tree and connected-component finding.
Nevertheless, there is an efficiency problem: sparse graphs are usually
given as a list of edges or an adjacency list, not as an adjacency matrix.
To run the SciPy routines, we have to convert our graphs to the compressed
sparse row (CSR) format (a dense array is not well suited to graphs with
many vertices, say more than 20,000), and sometimes we have to convert
back after execution. This can make a routine spend most of its execution
time on conversion and back-conversion.

So my question is: does SciPy provide a C routine for converting to and
back from the edge-list/adjacency-list format?
From robert.kern at gmail.com  Sat Dec  8 02:17:45 2018
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 7 Dec 2018 23:17:45 -0800
Subject: [SciPy-User] Conversion to graph adjacency list
In-Reply-To: <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
References: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
 <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
Message-ID:

On Fri, Dec 7, 2018 at 10:51 PM elastica at laposte.net wrote:
>
> Hi,
> [snip]
> So my question is: does SciPy provide a C routine for converting to and
> back from the edge-list/adjacency-list format?

How are you currently doing the conversion? The most efficient method is
probably to convert from the CSR format to COO, which provides two
parallel arrays giving the row and column indices of the edges. These can
simply be zip()ed together to get a list of edge tuples.

|30> import numpy as np
|31> Nnodes = 20000
|32> edges = []
|33> for i in range(Nnodes):
...>     js = np.random.permutation(Nnodes)[:int(np.sqrt(Nnodes))]
...>     edges.extend([(i, j) for j in js])
...>

# Construct a CSR matrix of the graph, reasonably efficiently.
|35> from scipy import sparse
|36> row_ind, col_ind = np.transpose(edges)
|37> Gcsr = sparse.csr_matrix((np.ones_like(row_ind), (row_ind, col_ind)),
...>                          shape=(Nnodes, Nnodes))

# Now convert back to a list of edge tuples.
|39> Gcoo = Gcsr.tocoo()
|40> coo_edges = zip(Gcoo.row, Gcoo.col)
|41> set(edges) == set(coo_edges)
True

--
Robert Kern

From elastica at laposte.net  Sat Dec  8 09:26:28 2018
From: elastica at laposte.net (elastica at laposte.net)
Date: Sat, 8 Dec 2018 15:26:28 +0100 (CET)
Subject: [SciPy-User] Conversion to graph adjacency list
In-Reply-To:
References: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
 <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
Message-ID: <540997379.13977248.1544279188282.JavaMail.zimbra@laposte.net>

Thanks Robert for your helpful response; now I better understand how the
CSR conversion works.

I'm benchmarking some graph libraries. My edges-to-CSR conversion was
hand-written in Python, so it was very slow! Now, with your code,
execution time is 7 times faster!
Here is the code; perhaps we can optimize it more:

# =============================================
from time import clock
from sys import stderr, stdin

from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.sparse import csr_matrix
import numpy as np


# ------------ Scipy Code ---------------------------

def wedges2adj(edges, n):
    # build an adjacency list from weighted edges
    G = [[] for _ in range(n)]
    for a, b, w in edges:
        G[a].append((b, w))
        G[b].append((a, w))
    return G

def wedges2scr(edges, n):
    # build the CSR arrays (data, indptr, indices) by hand
    G = wedges2adj(edges, n)
    indptr = [0]
    cnt = 0
    for line in G:
        cnt += len(line)
        indptr.append(cnt)
    data = []
    indices = []
    for i in range(n):
        for (j, w) in G[i]:
            data.append(w)
            indices.append(j)
    return [data, indptr, indices]

def csr2wedges(Mcsr, shape):
    n, p = shape
    k = 0
    edges = []
    data, cumul, cols = Mcsr.data, Mcsr.indptr, Mcsr.indices
    for i in range(n):
        for j in range(cumul[i+1] - cumul[i]):
            edges.append((i, cols[k], data[k]))
            k += 1
    return edges

def kruskal_scipy(edges, n):
    data, indptr, indices = wedges2scr(edges, n)
    csr = csr_matrix((data, indices, indptr), shape=(n, n))
    tree = minimum_spanning_tree(csr)
    edges = csr2wedges(tree, (n, n))
    return int(sum(round(w) for (a, b, w) in edges))

# ---------------------------------------------------

def wedges2scr_FAST(wedges, n):
    row_ind, col_ind, costs = np.transpose(wedges)
    return csr_matrix((costs, (row_ind, col_ind)), shape=(n, n))

def kruskal_scipy_FAST(wedges, n):
    csr = wedges2scr_FAST(wedges, n)
    tree = minimum_spanning_tree(csr).tocoo()
    return int(round(sum(w for (i, j, w) in zip(tree.row, tree.col, tree.data))))


# ------------ Benchmark ---------------------------

def go(solver, L):
    global duration

    N = len(L)
    k = 1
    solution = []
    while k < N:
        edges = []
        n = L[k]
        k += 1
        for a in range(n):
            d = L[k]
            k += 1
            for j in range(d):
                b = L[k]
                k += 1
                w = L[k]
                k += 1
                if a < b - 1:
                    edges.append([a, b - 1, w])
        if solver == kruskal_scipy:
            data, indptr, indices = wedges2scr(edges, n)
            start = clock()
            csr = csr_matrix((data, indices, indptr), shape=(n, n))
            tree = minimum_spanning_tree(csr)
            edges = csr2wedges(tree, (n, n))
            sol = solver(edges, n)
            duration += clock() - start
            solution.append(sol)
        else:
            start = clock()
            sol = solver(edges, n)
            duration += clock() - start
            solution.append(sol)
    return solution

# data
L = list(int(z) for z in stdin.read().split() if z.isdigit())
output = []
solvers = [kruskal_scipy, kruskal_scipy_FAST]
for solver in solvers:
    duration = 0
    costs = go(solver, L)
    output.append(costs)
    print("%-20s : %.3f" % (solver.__name__, duration), file=stderr)

if all(output[i] == output[0] for i in range(len(solvers))):
    print("NO error detected", file=stderr)
else:
    print("ERROR detected", file=stderr)

# =============================================

Link to the data file (100 graphs with at most 30,000 vertices each):

https://drive.google.com/file/d/1BKxRAzJ9jeowbrFPyLZ23EomKn1eqohs/view?usp=sharing

The above code is faster than the Networkit library code and a little
slower than the Graph-Tool library. I have two observations:

1. The Python code is only 2 times slower.
2. Pure C code is about 9 times faster.

In each case, even for SciPy (cf. _min_spanning_tree.pyx), the algorithm
is the same: Kruskal with union-find.

Subject: [SciPy-User] Conversion to graph adjacency list
References: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
 <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
 <540997379.13977248.1544279188282.JavaMail.zimbra@laposte.net>
Message-ID:

Hmm. I'm not an expert here, but did you try using the "networkx" package?

On Sat, Dec 8, 2018 at 7:36 AM elastica at laposte.net wrote:
>
> Thanks Robert for your helpful response; now I better understand how the
> CSR conversion works.
>
> I'm benchmarking some graph libraries.
> [snip]
From elastica at laposte.net  Sat Dec  8 11:49:14 2018
From: elastica at laposte.net (elastica at laposte.net)
Date: Sat, 8 Dec 2018 17:49:14 +0100 (CET)
Subject: [SciPy-User] Conversion to graph adjacency list
In-Reply-To:
References: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
 <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
 <540997379.13977248.1544279188282.JavaMail.zimbra@laposte.net>
Message-ID: <1674266595.14738071.1544287754394.JavaMail.zimbra@laposte.net>

> Hmm. I'm not an expert here, but did you try using the "networkx"
> package?

Of course I did. NetworkX has a very clean interface and clean code; it is
nicely documented, well maintained, and provides many features.
But so slow :(

Here are the results against the data file from my previous post (times in
seconds):

kruskal_networkX   : 29.411
kruskal_python     :  3.021
kruskal_gt         :  1.384
kruskal_networkit  :  2.501
kruskal_igraph     :  7.530
kruskal_scipy      : 10.744
kruskal_scipy_FAST :  1.509
NO error detected

[gt = Graph-tool]

From guillaume at damcb.com  Sat Dec  8 12:31:11 2018
From: guillaume at damcb.com (Guillaume Gay)
Date: Sat, 08 Dec 2018 18:31:11 +0100
Subject: [SciPy-User] Conversion to graph adjacency list
In-Reply-To: <1674266595.14738071.1544287754394.JavaMail.zimbra@laposte.net>
References: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
 <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
 <540997379.13977248.1544279188282.JavaMail.zimbra@laposte.net>
 <1674266595.14738071.1544287754394.JavaMail.zimbra@laposte.net>
Message-ID: <20D97E88-8646-4196-9582-EAC81FDB9FDF@damcb.com>

Have you looked at graph-tool, which is fast?

On 8 December 2018 17:49:14 GMT+01:00, elastica at laposte.net wrote:
> Of course I did. NetworkX has a very clean interface and clean code; it
> is nicely documented, well maintained, and provides many features. But
> so slow :(
> [snip]

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From elastica at laposte.net  Sat Dec  8 12:45:13 2018
From: elastica at laposte.net (elastica at laposte.net)
Date: Sat, 8 Dec 2018 18:45:13 +0100 (CET)
Subject: [SciPy-User] Conversion to graph adjacency list
In-Reply-To: <20D97E88-8646-4196-9582-EAC81FDB9FDF@damcb.com>
References: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
 <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
 <540997379.13977248.1544279188282.JavaMail.zimbra@laposte.net>
 <1674266595.14738071.1544287754394.JavaMail.zimbra@laposte.net>
 <20D97E88-8646-4196-9582-EAC81FDB9FDF@damcb.com>
Message-ID: <2113275939.15047554.1544291113156.JavaMail.zimbra@laposte.net>

> Have you looked at graph-tool, which is fast?

Yes, I did; cf. the previous post:

> kruskal_gt : 1.384
[snip]
> [gt=Graph-tool]

Graph-tool performs well, but I was expecting better timing: it is only
about 2 times faster than pure Python code in which Kruskal is implemented
with a basic union-find. We are far from genuine C/C++ performance; graph
implementations are usually about 20 times faster in C/C++ than in pure
Python.

From guillaume at damcb.com  Sat Dec  8 12:31:57 2018
From: guillaume at damcb.com (Guillaume Gay)
Date: Sat, 08 Dec 2018 18:31:57 +0100
Subject: [SciPy-User] Conversion to graph adjacency list
In-Reply-To: <1674266595.14738071.1544287754394.JavaMail.zimbra@laposte.net>
References: <2056612712.11005819.1544250184316.JavaMail.zimbra@laposte.net>
 <105739704.11014979.1544250352965.JavaMail.zimbra@laposte.net>
 <540997379.13977248.1544279188282.JavaMail.zimbra@laposte.net>
 <1674266595.14738071.1544287754394.JavaMail.zimbra@laposte.net>
Message-ID:

Sorry, I saw your benchmark too late!
On 8 December 2018 17:49:14 GMT+01:00, elastica at laposte.net wrote:
> Here are the results against the data file from my previous post (times
> in seconds):
>
> kruskal_networkX   : 29.411
> [snip]
> kruskal_scipy_FAST :  1.509
> NO error detected

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From charlesr.harris at gmail.com  Thu Dec 20 11:16:50 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 20 Dec 2018 09:16:50 -0700
Subject: [SciPy-User] NumPy 1.16.0rc1 released
Message-ID:

Hi All,

On behalf of the NumPy team I'm pleased to announce the release of NumPy
1.16.0rc1. This is the last NumPy release to support Python 2.7, and it
will be maintained as a long-term release with bug fixes until 2020. This
release has seen a lot of refactoring and features many bug fixes,
improved code organization, and better cross-platform compatibility. Not
all of these improvements will be visible to users, but they should help
make maintenance easier going forward. Highlights are

- Experimental support for overriding numpy functions in downstream
  projects.
- The matmul function is now a ufunc and can be overridden using
  __array_ufunc__ (see the sketch after this list).
- Improved support for the ARM and POWER architectures.
- Improved support for AIX and PyPy.
- Improved interoperation with ctypes.
- Improved support for PEP 3118.
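Not from the announcement: a minimal sketch of the matmul-as-ufunc change.
LoggingArray is a made-up subclass, and intercepting @ this way requires
numpy >= 1.16:

import numpy as np

class LoggingArray(np.ndarray):
    # Reports every ufunc applied to it, then defers to the default.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        print("intercepted ufunc:", ufunc.__name__)
        inputs = tuple(np.asarray(x) for x in inputs)  # unwrap to plain ndarrays
        return getattr(ufunc, method)(*inputs, **kwargs)

a = np.eye(2).view(LoggingArray)
b = np.ones((2, 2))
a @ b  # prints "intercepted ufunc: matmul", since matmul is now a ufunc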
The supported Python versions are 2.7 and 3.5-3.7; support for 3.4 has
been dropped. The wheels on PyPI are linked with OpenBLAS v0.3.4+, which
should fix the known threading issues found in previous OpenBLAS versions.
Downstream developers building this release should use Cython >= 0.29 and,
if linking OpenBLAS, OpenBLAS > v0.3.4.

Wheels for this release can be downloaded from PyPI, and source archives
are available from GitHub.

*Contributors*

A total of 111 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

- Alan Fontenot +
- Allan Haldane
- Alon Hershenhorn +
- Alyssa Quek +
- Andreas Nussbaumer +
- Anner +
- Anthony Sottile +
- Antony Lee
- Ayappan P +
- Bas van Schaik +
- C.A.M. Gerlach +
- Charles Harris
- Chris Billington
- Christian Clauss
- Christoph Gohlke
- Christopher Pezley +
- Daniel B Allan +
- Daniel Smith
- Dawid Zych +
- Derek Kim +
- Dima Pasechnik +
- Edgar Giovanni Lepe +
- Elena Mokeeva +
- Elliott Sales de Andrade +
- Emil Hessman +
- Eric Schles +
- Eric Wieser
- Giulio Benetti +
- Guillaume Gautier +
- Guo Ci
- Heath Henley +
- Isuru Fernando +
- J. Lewis Muir +
- Jack Vreeken +
- Jaime Fernandez
- James Bourbeau
- Jeff VanOss
- Jeffrey Yancey +
- Jeremy Chen +
- Jeremy Manning +
- Jeroen Demeyer
- John Darbyshire +
- John Zwinck
- Jonas Jensen +
- Joscha Reimer +
- Juan Azcarreta +
- Julian Taylor
- Kevin Sheppard
- Krzysztof Chomski +
- Kyle Sunden
- Lars Grüter
- Lilian Besson +
- MSeifert04
- Mark Harfouche
- Marten van Kerkwijk
- Martin Thoma
- Matt Harrigan +
- Matthew Bowden +
- Matthew Brett
- Matthias Bussonnier
- Matti Picus
- Max Aifer +
- Michael Hirsch, Ph.D +
- Michael James Jamie Schnaitter +
- MichaelSaah +
- Mike Toews
- Minkyu Lee +
- Mircea Akos Bruma +
- Mircea-Akos Brumă +
- Moshe Looks +
- Muhammad Kasim +
- Nathaniel J. Smith
- Nikita Titov +
- Paul Müller +
- Paul van Mulbregt
- Pauli Virtanen
- Pierre Glaser +
- Pim de Haan
- Ralf Gommers
- Robert Kern
- Robin Aggleton +
- Rohit Pandey +
- Roman Yurchak +
- Ryan Soklaski
- Sebastian Berg
- Sho Nakamura +
- Simon Gibbons
- Stan Seibert +
- Stefan Otte
- Stefan van der Walt
- Stephan Hoyer
- Stuart Archibald
- Taylor Smith +
- Tim Felgentreff +
- Tim Swast +
- Tim Teichmann +
- Toshiki Kataoka
- Travis Oliphant
- Tyler Reddy
- Uddeshya Singh +
- Warren Weckesser
- Weitang Li +
- Wenjamin Petrenko +
- William D. Irons
- Yannick Jadoul +
- Yaroslav Halchenko
- Yug Khanna +
- Yuji Kanagawa +
- Yukun Guo +
- lerbuke +
- @ankokumoyashi +

Cheers,
Charles Harris

From diallobakary4 at gmail.com  Fri Dec 21 03:55:41 2018
From: diallobakary4 at gmail.com (Bakary N'tji Diallo)
Date: Fri, 21 Dec 2018 10:55:41 +0200
Subject: [SciPy-User] Are the scores normally distributed?
In-Reply-To:
References:
Message-ID:

I am reading this: "With large enough sample sizes (> 30 or 40), the
violation of the normality assumption should not cause major problems (4);
this implies that we can use parametric procedures even when the data are
not normally distributed (8)." from this paper:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3693611/

Can I then use the normalization procedure, given the large sample size?
The normalization is simply calculating the z-score as in (here), using
the mean and standard deviation.

On Sat, Dec 8, 2018 at 06:52, Bakary N'tji Diallo wrote:
> Thank you for your replies.
> About the large sample size, just for clarification: this is not a
> sample, these are all the scores. Should I do a random sampling?
> [snip]

--
Bakary N'tji DIALLO
PhD Student (Bioinformatics), Research Unit in Bioinformatics (RUBi)
Mail: diallobakary4 at gmail.com | Skype: diallobakary4
Tel: +27798233845 | +223 74 56 57 22 | +223 97 39 77 14
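A minimal sketch of the z-score normalization described above (not part of
the email; the scores array is made up):

import numpy as np
from scipy import stats

scores = np.array([-12.3, -8.1, -9.7, -11.0, -7.4])  # illustrative values

# z = (x - mean) / std, computed over the whole set of scores
z_manual = (scores - scores.mean()) / scores.std()
z_scipy = stats.zscore(scores)  # scipy.stats.zscore does the same computation

assert np.allclose(z_manual, z_scipy)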
From cimrman3 at ntc.zcu.cz  Wed Dec 26 19:31:19 2018
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Thu, 27 Dec 2018 01:31:19 +0100
Subject: [SciPy-User] ANN: SfePy 2018.4
Message-ID:

I am pleased to announce release 2018.4 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is software for solving systems
of coupled partial differential equations by the finite element method or
by isogeometric analysis (limited support). It is distributed under the
new BSD license.
Home page: http://sfepy.org
Mailing list: https://mail.python.org/mm3/mailman3/lists/sfepy.python.org/
Git (source) repository, issue tracker: https://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- better support for eigenvalue problems
- improved MUMPS solver interface
- support for logging and plotting of complex values

For full release notes see [1].

Cheers,
Robert Cimrman

[1] http://docs.sfepy.org/doc/release_notes.html#id1

---

Contributors to this release in alphabetical order:

Robert Cimrman
Vladimir Lukes
Matyas Novak
Jan Heczko
Lubos Kejzlar