From markflorisson88 at gmail.com Sun Dec 11 01:02:09 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 11 Dec 2011 00:02:09 +0000
Subject: [Cython] [cython-users] malloc vs PyMem_malloc
In-Reply-To: <4EE3D66F.5040505@behnel.de>
References: <4EE3D66F.5040505@behnel.de>
Message-ID:

On 10 December 2011 22:00, Stefan Behnel wrote:
> mark florisson, 10.12.2011 21:44:
>> On 10 December 2011 20:39, mark florisson wrote:
>>> On 10 December 2011 19:16, Robert Bradshaw wrote:
>>>> On this note, a useful pattern is
>>>>
>>>> try:
>>>>     x = malloc(...)
>>>> finally:
>>>>     free(x)
>>>>
>>>> It could be nice to encapsulate this in a context manager.
>
> Absolutely.
>
>>> I think I'd prefer variable-sized arrays that would always get
>>> deallocated on exit of the function
>
> Why? A context manager is much clearer

That is highly subjective, I think it would be harder to read and
introduce more code blocks and nesting.

> and gives users total control over
> the lifetime of the memory.

Yes, but very often you don't need it. And if Cython would support
declarations in blocks you'd get it for free. Supporting that
(disregarding the difficulties of that) would also be helpful in
identifying the scope and privatization rules in parallel blocks.

The thing is that a context manager would be very Cython-specific,
whereas most people are already familiar with arrays of variable size
from C or Java. Let's compare the following statements and decide which
is more aesthetically pleasing:

    cdef int array1[m]
    cdef double array2[n]

vs

    cdef int *array1
    cdef double *array2

    with cython.malloc(sizeof(int) * m), cython.malloc(sizeof(double) * n) as array1, array2:
        ...

>>> (which could be implemented as C99
>>> variable sized arrays, with alloca or with malloc, depending on the
>>> size of the array and the availability of the respective
>>> functionalities).
>
> That could still be done for a context manager, just like we do with
> gil/nogil blocks today.

Sure (it was more of an observation than an argument).

>> That wouldn't tackle every use case, such as for instance mallocing
>> stuff in a parallel section (until we get declarations in blocks!),
>> but special cases can still just malloc and use try blocks, as
>> demonstrated.
>
> I would consider the usage of memory over the whole lifetime of a function
> the special case, not the other way round.

Yes, but the point is not where to deallocate the memory, the point is
that you very often don't care. You need it somewhere in the function,
and deallocation on return is fine (or, "at the end of the block").
Analogously, you don't 'del' your variables once you have stopped
using them.

I also gave this functionality some thought for memoryviews, e.g.

    cdef int[:m, :n] myslice # this gets you a view on a cython.array of size m * n

> Stefan

From stefan_ml at behnel.de Sun Dec 11 17:51:59 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 11 Dec 2011 17:51:59 +0100
Subject: [Cython] [cython-users] malloc vs PyMem_malloc
In-Reply-To:
References: <4EE3D66F.5040505@behnel.de>
Message-ID: <4EE4DFAF.8080303@behnel.de>

mark florisson, 11.12.2011 01:02:
> On 10 December 2011 22:00, Stefan Behnel wrote:
>> mark florisson, 10.12.2011 21:44:
>>> On 10 December 2011 20:39, mark florisson wrote:
>>>> On 10 December 2011 19:16, Robert Bradshaw wrote:
>>>>> On this note, a useful pattern is
>>>>>
>>>>> try:
>>>>>     x = malloc(...)
>>>>> finally:
>>>>>     free(x)
>>>>>
>>>>> It could be nice to encapsulate this in a context manager.
>>
>> Absolutely.
>>
>>>> I think I'd prefer variable-sized arrays that would always get
>>>> deallocated on exit of the function
>>
>> Why? A context manager is much clearer
>
> That is highly subjective, I think it would be harder to read and
> introduce more code blocks and nesting.
>
>> and gives users total control over
>> the lifetime of the memory.
>
> Yes, but very often you don't need it. And if Cython would support
> declarations in blocks you'd get it for free. Supporting that
> (disregarding the difficulties of that) would also be helpful in
> identifying the scope and privatization rules in parallel blocks.
>
> The thing is that a context manager would be very Cython-specific

Not at all. It's the One Way To Do It in Python.

Stefan

From wesmckinn at gmail.com Mon Dec 12 21:09:18 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 12 Dec 2011 15:09:18 -0500
Subject: [Cython] Distributing Windows binary using OpenMP / cython.parallel
Message-ID:

I'm interested in using the Cython OpenMP extensions in pandas for
various calculations, but I'm concerned about cross-platform issues,
especially distributing built binaries of the extensions to Windows
users. Is there a clean way to bundle the relevant OpenMP DLLs in
distutils?

thanks,
Wes

From markflorisson88 at gmail.com Mon Dec 12 22:37:33 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 12 Dec 2011 21:37:33 +0000
Subject: [Cython] Distributing Windows binary using OpenMP / cython.parallel
In-Reply-To:
References:
Message-ID:

On 12 December 2011 20:09, Wes McKinney wrote:
> I'm interested in using the Cython OpenMP extensions in pandas for
> various calculations, but I'm concerned about cross-platform issues,
> especially distributing built binaries of the extensions to Windows
> users. Is there a clean way to bundle the relevant OpenMP DLLs in
> distutils?
>
> thanks,
> Wes
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

As a non-Windows user I'm not quite sure, but I think users will need
to install the C runtime:
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=15336
. Maybe you can ship vcomp.dll as well, no idea.
Maybe the distutils mailing list could be more helpful.

From sturla at molden.no Tue Dec 13 12:47:33 2011
From: sturla at molden.no (Sturla Molden)
Date: Tue, 13 Dec 2011 12:47:33 +0100
Subject: [Cython] Distributing Windows binary using OpenMP / cython.parallel
In-Reply-To:
References:
Message-ID: <4EE73B55.9060807@molden.no>

On 12.12.2011 21:09, Wes McKinney wrote:
> I'm interested in using the Cython OpenMP extensions in pandas for
> various calculations, but I'm concerned about cross-platform issues,
> especially distributing built binaries of the extensions to Windows
> users. Is there a clean way to bundle the relevant OpenMP DLLs in
> distutils?
>
> thanks,
> Wes

Are you using MSVC or MinGW compiler?

If you use MinGW, beware of licensing issues for the required pthreads
library (pthreadsGC2.dll, I think it's LGPL). It is not a part of MinGW
or GCC/GNU. So linking it statically can be a problem.

As for the MSVC compiler, IIRC the OpenMP runtime is a part of the MSVC
runtime DLLs which the user must install anyway.

Sturla

From wesmckinn at gmail.com Tue Dec 13 17:32:11 2011
From: wesmckinn at gmail.com (Wes McKinney)
Date: Tue, 13 Dec 2011 11:32:11 -0500
Subject: [Cython] Distributing Windows binary using OpenMP / cython.parallel
In-Reply-To: <4EE73B55.9060807@molden.no>
References: <4EE73B55.9060807@molden.no>
Message-ID:

On Tue, Dec 13, 2011 at 6:47 AM, Sturla Molden wrote:
> On 12.12.2011 21:09, Wes McKinney wrote:
>
>> I'm interested in using the Cython OpenMP extensions in pandas for
>> various calculations, but I'm concerned about cross-platform issues,
>> especially distributing built binaries of the extensions to Windows
>> users. Is there a clean way to bundle the relevant OpenMP DLLs in
>> distutils?
>>
>> thanks,
>> Wes
>
> Are you using MSVC or MinGW compiler?
>
> If you use MinGW, beware of licensing issues for the required pthreads
> library (pthreadsGC2.dll, I think it's LGPL). It is not a part of MinGW or
> GCC/GNU.
> So linking it statically can be a problem.
>
> As for the MSVC compiler, IIRC the OpenMP runtime is a part of the MSVC
> runtime DLLs which the user must install anyway.
>
> Sturla

Using MSVC compiler. For regular binaries the user doesn't need to
install anything. I guess I'll cross the bridge when I come to it.

- Wes

From L.J.Buitinck at uva.nl Sat Dec 17 13:16:00 2011
From: L.J.Buitinck at uva.nl (Lars Buitinck)
Date: Sat, 17 Dec 2011 13:16:00 +0100
Subject: [Cython] Bug report: building an extension with --pyrex-gdb fails
Message-ID:

Hello all,

I was trying to build a C++ extension with debugging support as
described in http://docs.cython.org/src/userguide/debugging.html, but
I got an error from the Cython compiler:

lars at schoothond:~/src/sortedcollections$ python setup.py build_ext --inplace --pyrex-gdb
running build_ext
cythoning sortedcollections/set.pyx to sortedcollections/set.cpp
Traceback (most recent call last):
  File "setup.py", line 9, in <module>
    language="c++")]
  File "/usr/lib/python2.7/distutils/core.py", line 152, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python2.7/dist-packages/Cython/Distutils/build_ext.py", line 125, in run
    _build_ext.build_ext.run(self)
  File "/usr/lib/python2.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/usr/local/lib/python2.7/dist-packages/Cython/Distutils/build_ext.py", line 132, in build_extensions
    ext.sources = self.cython_sources(ext.sources, ext)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Distutils/build_ext.py", line 275, in cython_sources
    full_module_name=module_name)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Main.py", line 625, in compile
    return compile_single(source, options, full_module_name)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Main.py", line 570, in compile_single
    return run_pipeline(source, options, full_module_name)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Main.py", line 462, in run_pipeline
    err, enddata = Pipeline.run_pipeline(pipeline, source)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Pipeline.py", line 313, in run_pipeline
    data = phase(data)
  File "Visitor.py", line 276, in Cython.Compiler.Visitor.CythonTransform.__call__ (Cython/Compiler/Visitor.c:5173)
  File "Visitor.py", line 259, in Cython.Compiler.Visitor.VisitorTransform.__call__ (Cython/Compiler/Visitor.c:4917)
  File "Visitor.py", line 165, in Cython.Compiler.Visitor.TreeVisitor._visit (Cython/Compiler/Visitor.c:3401)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Compiler/ParseTreeTransforms.py", line 2613, in visit_ModuleNode
    self.visit_FuncDefNode(nested_funcdef)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Compiler/ParseTreeTransforms.py", line 2660, in visit_FuncDefNode
    self.tb.start('Function', attrs=attrs)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Debugger/DebugWriter.py", line 52, in start
    self.tb.start(name, attrs or {})
  File "saxparser.pxi", line 398, in lxml.etree.TreeBuilder.start (src/lxml/lxml.etree.c:83668)
  File "saxparser.pxi", line 430, in lxml.etree.TreeBuilder._handleSaxStart (src/lxml/lxml.etree.c:84030)
  File "apihelpers.pxi", line 224, in lxml.etree._makeSubElement (src/lxml/lxml.etree.c:12831)
  File "apihelpers.pxi", line 219, in lxml.etree._makeSubElement (src/lxml/lxml.etree.c:12756)
  File "apihelpers.pxi", line 299, in lxml.etree._initNodeAttributes (src/lxml/lxml.etree.c:13615)
  File "apihelpers.pxi", line 1364, in lxml.etree._utf8 (src/lxml/lxml.etree.c:22190)
TypeError: Argument must be bytes or unicode, got 'NoneType'

My setup.py is:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("sortedcollections.set",
                             ["sortedcollections/set.pyx"],
                             language="c++")]
)

Importing Extension from Cython.Distutils.extension and adding
pyrex_gdb=True to the Extension gives the same error.

Regards,

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

From L.J.Buitinck at uva.nl Sat Dec 17 13:40:12 2011
From: L.J.Buitinck at uva.nl (Lars Buitinck)
Date: Sat, 17 Dec 2011 13:40:12 +0100
Subject: [Cython] Bug report: building an extension with --pyrex-gdb fails
In-Reply-To:
References:
Message-ID:

2011/12/17 Lars Buitinck :
> I was trying to build a C++ extension with debugging support as
> described in http://docs.cython.org/src/userguide/debugging.html, but
> I got an error from the Cython compiler:

Forgot to mention: I was using the very latest Git version of Cython,
a04c0f4eccc96ad475360cf3844b5792bd077836.

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

From markflorisson88 at gmail.com Sat Dec 17 13:55:05 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 17 Dec 2011 12:55:05 +0000
Subject: [Cython] Bug report: building an extension with --pyrex-gdb fails
In-Reply-To:
References:
Message-ID:

On 17 December 2011 12:40, Lars Buitinck wrote:
> 2011/12/17 Lars Buitinck :
>> I was trying to build a C++ extension with debugging support as
>> described in http://docs.cython.org/src/userguide/debugging.html, but
>> I got an error from the Cython compiler:
>
> Forgot to mention: I was using the very latest Git version of Cython,
> a04c0f4eccc96ad475360cf3844b5792bd077836.
>
> --
> Lars Buitinck
> Scientific programmer, ILPS
> University of Amsterdam

Thanks for your report, could you paste/attach the code of your
extension module?

From L.J.Buitinck at uva.nl Sat Dec 17 13:57:31 2011
From: L.J.Buitinck at uva.nl (Lars Buitinck)
Date: Sat, 17 Dec 2011 13:57:31 +0100
Subject: [Cython] Bug report: building an extension with --pyrex-gdb fails
In-Reply-To:
References:
Message-ID:

2011/12/17 mark florisson :
> Thanks for your report, could you paste/attach the code of your
> extension module?

The code is online at https://github.com/larsmans/sortedcollection

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

From markflorisson88 at gmail.com Sat Dec 17 14:13:11 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 17 Dec 2011 13:13:11 +0000
Subject: [Cython] Bug report: building an extension with --pyrex-gdb fails
In-Reply-To:
References:
Message-ID:

On 17 December 2011 12:57, Lars Buitinck wrote:
> 2011/12/17 mark florisson :
>> Thanks for your report, could you paste/attach the code of your
>> extension module?
>
> The code is online at https://github.com/larsmans/sortedcollection
>
> --
> Lars Buitinck
> Scientific programmer, ILPS
> University of Amsterdam

Thanks, that was very helpful. The problem was that generator
functions apparently don't fill out a name in the symbol table. You
can try again from this branch:
https://github.com/markflorisson88/cython/commit/ed648d932f3922f77e140a6292d65f56f4899090
which will be pushed to cython's master branch later.
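
The failure mode in this thread (a None attribute value reaching an XML
tree builder) can be illustrated in isolation. The sketch below is not the
fix from the commit linked above, which addresses the missing name at its
source; it only shows one defensive way a debug-info writer could guard
its attributes. It uses the standard library's xml.etree.ElementTree
rather than lxml, and the names safe_attrs, write_function_entry and
"__pyx_gb_demo" are hypothetical.

```python
import xml.etree.ElementTree as ET


def safe_attrs(attrs):
    """Return a copy of attrs with None values replaced by empty strings."""
    return {key: "" if value is None else str(value)
            for key, value in attrs.items()}


def write_function_entry(name, cname):
    """Build a <Function> element roughly the way a debug-info writer might.

    Hypothetical helper: the real Cython DebugWriter is not reproduced here.
    """
    builder = ET.TreeBuilder()
    builder.start("Function", safe_attrs({"name": name, "cname": cname}))
    builder.end("Function")
    return builder.close()


# A function entry whose name was never recorded still serialises cleanly.
element = write_function_entry(None, "__pyx_gb_demo")
print(ET.tostring(element).decode())
```

Substituting an empty string hides the missing name instead of explaining
it, so a guard like this is at most a stopgap next to fixing the symbol
table entry itself.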

From romain.py at gmail.com Sat Dec 17 20:48:35 2011
From: romain.py at gmail.com (Romain Guillebert)
Date: Sat, 17 Dec 2011 20:48:35 +0100
Subject: [Cython] Merging of the ctypes backend branch
Message-ID: <20111217194835.GA10421@hardshooter>

Hi everyone

I rebased the ctypes backend branch to the last cython commit, and I
wondered how the branch should be merged with the main cython
repository. I see 3 options: a) upload the branch without merging it b)
merge the branch but not run the test suite on ctypes by default or c)
merge the branch and run the test suite.

I think it's better to do c) but the test results shouldn't mix with the
results of the other backends.

What do you think?

Romain

From markflorisson88 at gmail.com Sun Dec 18 12:45:49 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 18 Dec 2011 11:45:49 +0000
Subject: [Cython] Merging of the ctypes backend branch
In-Reply-To: <20111217194835.GA10421@hardshooter>
References: <20111217194835.GA10421@hardshooter>
Message-ID:

On 17 December 2011 19:48, Romain Guillebert wrote:
> Hi everyone
>
> I rebased the ctypes backend branch to the last cython commit, and I
> wondered how the branch should be merged with the main cython
> repository. I see 3 options : a) upload the branch without merging it b)
> merge the branch but not run the test suite on ctypes by default or c)
> merge the branch and run the test suite.
>
> I think it's better to do c) but the test results shouldn't mix with the
> results of the other backends.
>
> What do you think ?
>
> Romain

I think before we try merging anything, we should assess what
functionality is there and how well it works. Could you explain what
the ctypes backend currently supports, and which tests it
passes/doesn't pass?

I checked out your branch from github
(https://github.com/hardshooter/CythonCTypesBackend) but when I run
cython on a script that uses any Cython utility code the compiler
crashes (see below for details).

##### test.pyx

cdef int[:] slice

##### output:

Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from *:
    cdef object __pyx_test_dep(object)
                               ^
------------------------------------------------------------

TestClass:5:31: Compiler crash in ControlFlowAnalysis

ModuleNode.body = StatListNode(TestClass:4:0)
CDefExternNode.body = StatListNode(TestClass:5:9)
CVarDefNode.declarators[0] = CFuncDeclaratorNode(TestClass:5:30, calling_convention = '')
CFuncDeclaratorNode.args[0] = CArgDeclNode(TestClass:5:31, is_generic = 1)

Compiler crash traceback from this point on:
  File "/Users/mark/cy/Cython/Compiler/Visitor.py", line 176, in _visitchild
    result = handler_method(child)
  File "/Users/mark/cy/Cython/Compiler/FlowControl.py", line 735, in visit_CArgDeclNode
    entry = self.env.lookup(node.name)
AttributeError: 'CArgDeclNode' object has no attribute 'name'
Traceback (most recent call last):
  File "/Users/mark/code/cython/bin/cython", line 8, in <module>
    main(command_line = 1)
  File "/Users/mark/cy/Cython/Compiler/Main.py", line 670, in main
    result = compile(sources, options)
  File "/Users/mark/cy/Cython/Compiler/Main.py", line 645, in compile
    return compile_multiple(source, options)
  File "/Users/mark/cy/Cython/Compiler/Main.py", line 617, in compile_multiple
    result = run_pipeline(source, options)
  File "/Users/mark/cy/Cython/Compiler/Main.py", line 476, in run_pipeline
    err, enddata = Pipeline.run_pipeline(pipeline, source)
  File "/Users/mark/cy/Cython/Compiler/Pipeline.py", line 375, in run_pipeline
    data = phase(data)
  File "/Users/mark/cy/Cython/Compiler/ParseTreeTransforms.py", line 1401, in __call__
    return super(AnalyseDeclarationsTransform, self).__call__(root)
  File "/Users/mark/cy/Cython/Compiler/Visitor.py", line 276, in __call__
    return super(CythonTransform, self).__call__(node)
  File "/Users/mark/cy/Cython/Compiler/Visitor.py", line 259, in __call__
    return self._visit(root)
  File "/Users/mark/cy/Cython/Compiler/Visitor.py", line 165, in _visit
    return handler_method(obj)
  File "/Users/mark/cy/Cython/Compiler/ParseTreeTransforms.py", line 1409, in visit_ModuleNode
    node.analyse_declarations(self.env_stack[-1])
  File "/Users/mark/cy/Cython/Compiler/ModuleNode.py", line 96, in analyse_declarations
    self.body.analyse_declarations(env)
  File "/Users/mark/cy/Cython/Compiler/Nodes.py", line 337, in analyse_declarations
    stat.analyse_declarations(env)
  File "/Users/mark/cy/Cython/Compiler/Nodes.py", line 1047, in analyse_declarations
    base_type = self.base_type.analyse(env)
  File "/Users/mark/cy/Cython/Compiler/Nodes.py", line 858, in analyse
    axes_specs = MemoryView.get_axes_specs(env, self.axes)
  File "/Users/mark/cy/Cython/Compiler/MemoryView.py", line 696, in get_axes_specs
    cythonscope.load_cythonscope()
  File "/Users/mark/cy/Cython/Compiler/CythonScope.py", line 104, in load_cythonscope
    self, cython_scope=self)
  File "/Users/mark/cy/Cython/Compiler/UtilityCode.py", line 147, in declare_in_scope
    tree = self.get_tree(entries_only=True, cython_scope=cython_scope)
  File "/Users/mark/cy/Cython/Compiler/UtilityCode.py", line 127, in get_tree
    assert not err, err
AssertionError:
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from *:
    cdef object __pyx_test_dep(object)
                               ^
------------------------------------------------------------

TestClass:5:31: Compiler crash in ControlFlowAnalysis

ModuleNode.body = StatListNode(TestClass:4:0)
CDefExternNode.body = StatListNode(TestClass:5:9)
CVarDefNode.declarators[0] = CFuncDeclaratorNode(TestClass:5:30, calling_convention = '')
CFuncDeclaratorNode.args[0] = CArgDeclNode(TestClass:5:31, is_generic = 1)

Compiler crash traceback from this point on:
  File "/Users/mark/cy/Cython/Compiler/Visitor.py", line 176, in _visitchild
    result = handler_method(child)
  File "/Users/mark/cy/Cython/Compiler/FlowControl.py", line 735, in visit_CArgDeclNode
    entry = self.env.lookup(node.name)
AttributeError: 'CArgDeclNode' object has no attribute 'name'

From dtcaciuc at gmail.com Mon Dec 19 05:17:31 2011
From: dtcaciuc at gmail.com (Dimitri Tcaciuc)
Date: Sun, 18 Dec 2011 20:17:31 -0800
Subject: [Cython] Cannot assign type 'set &' to 'set'
Message-ID:

Hello everyone,

Here's a small test case I'm trying to compile. I'm trying to pass a
STL set reference to a method in a template class.

x.pyx:

    from libcpp.set cimport set as cpp_set

    cdef extern from "x.hh":

        cdef cppclass Foo [T]:
            Foo()
            void set_x(cpp_set[size_t] & x)

    cpdef func():
        cdef Foo[int] foo
        cdef cpp_set[size_t] x
        cdef cpp_set[size_t] & xref = x
        foo.set_x(xref)

x.hh:

    #include <set>

    template <typename T>
    struct Foo {
        void set_x(const std::set<size_t> & x) { /* do nothing */ }
    };

To compile,

    bash $ cython --cplus x.pyx

Which results in

        foo.set_x(xref)
                 ^
    ------------------------------------------------------------

    x.pyx:15:18: Cannot assign type 'set &' to 'set'

However, if I remove the template parameter from Foo, everything works.

y.pyx:

    from libcpp.set cimport set as cpp_set

    cdef extern from "y.hh":

        cdef cppclass Foo:
            Foo()
            void set_x(cpp_set[size_t] & x)

    cpdef func():
        cdef Foo foo
        cdef cpp_set[size_t] x
        cdef cpp_set[size_t] & xref = x
        foo.set_x(xref)

y.hh:

    #include <set>

    struct Foo {
        void set_x(const std::set<size_t> & x) { /* do nothing */ }
    };

From what I can tell, the CppClassType instance the CReferenceType is
pointing to has the correct name "set", however it's a different class
instance. The particular failing expression is in `ExprNode.coerce_to`

    if not (str(src.type) == str(dst_type) or dst_type.assignable_from(src_type))

I wish I could suggest a patch, but unfortunately I'm a complete
newbie to Cython internals. Perhaps someone could give a few pointers
as to what should be done to fix this?

Thanks,

Dimitri

From L.J.Buitinck at uva.nl Tue Dec 20 19:31:15 2011
From: L.J.Buitinck at uva.nl (Lars Buitinck)
Date: Tue, 20 Dec 2011 19:31:15 +0100
Subject: [Cython] Bug report: building an extension with --pyrex-gdb fails
In-Reply-To:
References:
Message-ID:

2011/12/17 mark florisson :
> Thanks, that was very helpful. The problem was that generator
> functions apparently don't fill out a name in the symbol table. You
> can try again from this branch:
> https://github.com/markflorisson88/cython/commit/ed648d932f3922f77e140a6292d65f56f4899090
> which will be pushed to cython's master branch later.

It works, thanks!

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

From markflorisson88 at gmail.com Tue Dec 20 21:18:10 2011
From: markflorisson88 at gmail.com (mark florisson)
Date: Tue, 20 Dec 2011 20:18:10 +0000
Subject: [Cython] [cython-users] memoryviews & parameter passing
In-Reply-To: <9f56606c-b69a-4158-9fe2-f816265aa6e0@i6g2000vbh.googlegroups.com>
References: <9f56606c-b69a-4158-9fe2-f816265aa6e0@i6g2000vbh.googlegroups.com>
Message-ID:

On 20 December 2011 18:57, Dirk Rothe wrote:
> Hello Cython-Devs,
>
> I thought I'd check out the memoryview syntax from cython-trunk to
> refactor some tight loops on numpy arrays into smaller functions. But
> either I'm doing something wrong or the call-overhead (of dostuff() )
> is still very large. Am I missing something?
>
> @cython.boundscheck(False)
> cdef inline int dostuff(np.int_t[:] data, int i, int j) nogil:
>     return data[j] + i + j
>
> @cython.boundscheck(False)
> def test():
>     cdef np.int_t[:, :] data = np.zeros((3000, 20000), dtype=np.int)
>     cdef int i, j
>     with nogil:
>         for i in range(3000):
>             for j in range(20000):
>                 # try to be as fast
>                 data[i, j] = dostuff(data[i], i, j)
>                 # as direct array access
>                 #~ data[i, j] = data[i, j] + i + j
>
> thnx, dirk

The performance difference is indeed quite large. There are several
problems with the implementation of slices:

1) the overhead of PyThread_acquire_lock() is quite large, we should
   resort to an atomic approach
2) the slices support up to 32 dimensions by default (configurable as a
   compiler option). This is a lot of memory to copy around all the
   time. I think a default of 8 would be more sensible and the compiler
   option should be documented well (who uses 32 dimensions anyway?)
3) the slice function has a generic approach and could be somewhat
   faster if the slice is direct and strided

Addressing these problems by tweaking the generated code brings it
down from ~16 seconds to ~2.4 seconds. The direct indexing approach
without function call takes ~0.35 seconds. Slicing will never be as
fast, so if you really want to write that code you'd move the
data[i] slicing to the outer loop, as in:

    for i in range(3000):
        dataslice = data[i]
        for j in range(...):
            ...

Now Cython could do that optimization itself as the 'data' slice does
not change in the inner loop, but it doesn't. But at least it should
not be more than 10 times slower (so this will be worked on).

@cython-dev

How should atomic operations be supported? Should this use something like
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html , or
something like libatomic? Or should we "just" implement a garbage
collector for pure-Cython level stuff (like memoryview slices),
thereby avoiding the need for acquisition counting?

From vitja.makarov at gmail.com Wed Dec 21 19:48:48 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Wed, 21 Dec 2011 22:48:48 +0400
Subject: [Cython] Generators & closure optimization
Message-ID:

Some time ago we were talking about generators optimization by copying
local variables from closure into local scope.

Now I think that should be a good idea to implement this for both
generators and regular closure functions. So local var will be used
for reference and assignment should be made to both copies. Of course
there are some variables that shouldn't be copied: non-local vars,
arrays, C++ classes and structures.

Also it may be a good idea to move outer scope pointer into local variable.

So I'm wondering what is a good test to measure actual speedup?

--
vitja.
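
The shape of the transformation proposed above (read from a local copy,
write to both copies) can be sketched by hand. This is plain Python written
purely for illustration; it is not the code Cython generates, it will not
be faster under CPython, and the function names are hypothetical. In the
generated C the closure variable lives in a heap-allocated scope struct,
which is where a local copy could let the C compiler keep the value in a
register.

```python
def make_summer_via_closure(values):
    """Baseline: every access to `total` goes through the closure."""
    total = 0

    def run():
        nonlocal total
        for value in values:
            total += value
        return total

    return run


def make_summer_with_local_copy(values):
    """Hand-written version of the proposed transformation."""
    total = 0

    def run():
        nonlocal total
        local_total = total          # copy the closure variable on entry
        for value in values:
            local_total += value     # reads use the local copy only
            total = local_total      # assignments go to both copies
        return local_total

    return run


baseline = make_summer_via_closure([1, 2, 3])
copied = make_summer_with_local_copy([1, 2, 3])
print(baseline(), copied())
```

Because every assignment is written through, the closure copy stays
current for any other function sharing the variable. A read-side copy is
only safe when no other code can modify the variable in between, which is
the restriction the proposal makes for non-local variables.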

From stefan_ml at behnel.de Wed Dec 21 21:17:23 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 21 Dec 2011 21:17:23 +0100
Subject: [Cython] Generators & closure optimization
In-Reply-To:
References:
Message-ID: <4EF23ED3.6090302@behnel.de>

Vitja Makarov, 21.12.2011 19:48:
> Some time ago we were talking about generators optimization by copying
> local variables from closure into local scope.

Yes, I think that will make it easier for the C compiler to make optimistic
assumptions about external values.

> Now I think that should be a good idea to implement this for both
> generators and regular closure functions. So local var will be used
> for reference and assignment should be made to both copies. Of course
> there are some variables that shouldn't be copied: non-local vars,
> arrays, C++ classes and structures.

Basically, anything that external code can modify. That makes it a bit
tricky to do it also for 'normal' closure functions - the whole idea is
that there is more than one function that can refer to a variable.

> Also it may be a good idea to move outer scope pointer into local variable.
>
> So I'm wondering what is a good test to measure actual speedup?

http://blog.behnel.de/index.php?p=163

Just take the plain Python versions of the iterparse functions and compare
them before and after the change. The raw C implementation in CPython gives
a good baseline.

Actually, it would be generally interesting to run the Cython versions
through callgrind to see where the time is actually being spent.

Stefan

From stefan_ml at behnel.de Sun Dec 25 11:47:57 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 25 Dec 2011 11:47:57 +0100
Subject: [Cython] Generators & closure optimization
In-Reply-To: <4EF23ED3.6090302@behnel.de>
References: <4EF23ED3.6090302@behnel.de>
Message-ID: <4EF6FF5D.1040307@behnel.de>

Stefan Behnel, 21.12.2011 21:17:
> Vitja Makarov, 21.12.2011 19:48:
>> Some time ago we were talking about generators optimization by copying
>> local variables from closure into local scope.
>
> Yes, I think that will make it easier for the C compiler to make optimistic
> assumptions about external values.
>
>> Now I think that should be a good idea to implement this for both
>> generators and regular closure functions. So local var will be used
>> for reference and assignment should be made to both copies. Of course
>> there are some variables that shouldn't be copied: non-local vars,
>> arrays, C++ classes and structures.
>
> Basically, anything that external code can modify. That makes it a bit
> tricky to do it also for 'normal' closure functions - the whole idea is
> that there is more than one function that can refer to a variable.
>
>> Also it may be a good idea to move outer scope pointer into local variable.
>>
>> So I'm wondering what is a good test to measure actual speedup?
>
> http://blog.behnel.de/index.php?p=163
>
> Just take the plain Python versions of the iterparse functions and compare
> them before and after the change. The raw C implementation in CPython gives
> a good baseline.
>
> Actually, it would be generally interesting to run the Cython versions
> through callgrind to see where the time is actually being spent.

Another useful test is the "nqueens" benchmark from the CPython test suite.
It's regularly run on Jenkins in Py2.7 and 3.3.

https://sage.math.washington.edu:8091/hudson/view/bench/

Note that it mostly uses generator expressions, which could easily benefit
from a couple of further optimisations by specialising them, e.g. by
providing a length hint.

http://trac.cython.org/cython_trac/ticket/756

Stefan

From vitja.makarov at gmail.com Mon Dec 26 20:07:58 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Mon, 26 Dec 2011 23:07:58 +0400
Subject: [Cython] Generators & closure optimization
In-Reply-To: <4EF6FF5D.1040307@behnel.de>
References: <4EF23ED3.6090302@behnel.de> <4EF6FF5D.1040307@behnel.de>
Message-ID:

2011/12/25 Stefan Behnel :
> Stefan Behnel, 21.12.2011 21:17:
>
>> Vitja Makarov, 21.12.2011 19:48:
>>>
>>> Some time ago we were talking about generators optimization by copying
>>> local variables from closure into local scope.
>>
>> Yes, I think that will make it easier for the C compiler to make
>> optimistic assumptions about external values.
>>
>>> Now I think that should be a good idea to implement this for both
>>> generators and regular closure functions. So local var will be used
>>> for reference and assignment should be made to both copies. Of course
>>> there are some variables that shouldn't be copied: non-local vars,
>>> arrays, C++ classes and structures.
>>
>> Basically, anything that external code can modify. That makes it a bit
>> tricky to do it also for 'normal' closure functions - the whole idea is
>> that there is more than one function that can refer to a variable.
>>
>>> Also it may be a good idea to move outer scope pointer into local
>>> variable.
>>>
>>> So I'm wondering what is a good test to measure actual speedup?
>>
>> http://blog.behnel.de/index.php?p=163
>>
>> Just take the plain Python versions of the iterparse functions and compare
>> them before and after the change. The raw C implementation in CPython
>> gives a good baseline.
>>
>> Actually, it would be generally interesting to run the Cython versions
>> through callgrind to see where the time is actually being spent.
>
>
> Another useful test is the "nqueens" benchmark from the CPython test suite.
> It's regularly run on Jenkins in Py2.7 and 3.3.
>
> https://sage.math.washington.edu:8091/hudson/view/bench/
>
> Note that it mostly uses generator expressions, which could easily benefit
> from a couple of further optimisations by specialising them, e.g. by
> providing a length hint.
>
> http://trac.cython.org/cython_trac/ticket/756
>

I have implemented local variable copying; it can be found here:

https://github.com/vitek/cython/tree/_copy_closure

I didn't notice a significant speedup running the nqueens test. Indeed, I'm
not sure it's a speedup at all. It's all within about +-2%. Perhaps a better
test is required.

Btw, I got ~8% speedup for this dummy test:

def foo():
    cdef int i, r
    cdef list o
    o = []
    def bar():
        return len(o)
    for i in range(10000000):
        bar()
        r += len(o)
    return r

What's iterparse()?

--
vitja.

From stefan_ml at behnel.de Mon Dec 26 21:05:49 2011
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 26 Dec 2011 21:05:49 +0100
Subject: [Cython] Generators & closure optimization
In-Reply-To:
References: <4EF23ED3.6090302@behnel.de> <4EF6FF5D.1040307@behnel.de>
Message-ID: <4EF8D39D.2030407@behnel.de>

Vitja Makarov, 26.12.2011 20:07:
> 2011/12/25 Stefan Behnel:
>> Stefan Behnel, 21.12.2011 21:17:
>>> Vitja Makarov, 21.12.2011 19:48:
>>>>
>>>> Some time ago we were talking about generators optimization by copying
>>>> local variables from closure into local scope.
>>>
>>> Yes, I think that will make it easier for the C compiler to make
>>> optimistic assumptions about external values.
>>>
>>>> Now I think that should be a good idea to implement this for both
>>>> generators and regular closure functions. So local var will be used
>>>> for reference and assignment should be made to both copies.
Of course
>>>> there are some variables that shouldn't be copied: non-local vars,
>>>> arrays, C++ classes and structures.
>>>
>>>
>>> Basically, anything that external code can modify. That makes it a bit
>>> tricky to do it also for 'normal' closure functions - the whole idea is
>>> that there is more than one function that can refer to a variable.
>>>
>>>> Also it may be a good idea to move outer scope pointer into local
>>>> variable.
>>>>
>>>> So I'm wondering what is a good test to measure actual speedup?
>>>
>>>
>>> http://blog.behnel.de/index.php?p=163
>>>
>>> Just take the plain Python versions of the iterparse functions and compare
>>> them before and after the change. The raw C implementation in CPython
>>> gives a good baseline.
>>>
>>> Actually, it would be generally interesting to run the Cython versions
>>> through callgrind to see where the time is actually being spent.
>>
>> Another useful test is the "nqueens" benchmark from the CPython test suite.
>> It's regularly run on Jenkins in Py2.7 and 3.3.
>>
>> https://sage.math.washington.edu:8091/hudson/view/bench/
>>
>> Note that it mostly uses generator expressions, which could easily benefit
>> from a couple of further optimisations by specialising them, e.g. by
>> providing a length hint.
>>
>> http://trac.cython.org/cython_trac/ticket/756
>>
>
> I have implemented local variable copying; it can be found here:
>
> https://github.com/vitek/cython/tree/_copy_closure

Cool.

> I didn't notice a significant speedup running the nqueens test. Indeed, I'm
> not sure it's a speedup at all.
> It's all within about +-2%. Perhaps a better test is required.

Well, yes, I'd guess that generator expressions are really not the target of
this optimisation. They simply don't do enough in between two iterations.

> Btw, I got ~8% speedup for this dummy test:
> def foo():
>     cdef int i, r
>     cdef list o
>     o = []
>     def bar():
>         return len(o)
>     for i in range(10000000):
>         bar()
>         r += len(o)
>     return r

Any speedup will help someone, I guess.
And it's good to know that it makes
a difference if the closure code isn't plain trivial.

> What's iterparse()?

Ah, sorry, my bad. Too much lxml stuff lately, I guess. I really meant
"itertools". See the blog link above.

Also, consider using callgrind for performance analysis. For one, it'll
print the number of executed instructions after a test run. For Cython
generated code, it's usually a good sign if that decreases after a change.
For a deeper analysis of such a profiling run, you can then use kcachegrind.

Stefan

From vitja.makarov at gmail.com Tue Dec 27 19:34:52 2011
From: vitja.makarov at gmail.com (Vitja Makarov)
Date: Tue, 27 Dec 2011 22:34:52 +0400
Subject: [Cython] Generators & closure optimization
In-Reply-To: <4EF8D39D.2030407@behnel.de>
References: <4EF23ED3.6090302@behnel.de> <4EF6FF5D.1040307@behnel.de> <4EF8D39D.2030407@behnel.de>
Message-ID:

2011/12/27 Stefan Behnel:
> Vitja Makarov, 26.12.2011 20:07:
>>
>> 2011/12/25 Stefan Behnel:
>>
>>> Stefan Behnel, 21.12.2011 21:17:
>>>>
>>>> Vitja Makarov, 21.12.2011 19:48:
>>>>>
>>>>>
>>>>> Some time ago we were talking about generators optimization by copying
>>>>> local variables from closure into local scope.
>>>>
>>>>
>>>> Yes, I think that will make it easier for the C compiler to make
>>>> optimistic assumptions about external values.
>>>>
>>>>> Now I think that should be a good idea to implement this for both
>>>>> generators and regular closure functions. So local var will be used
>>>>> for reference and assignment should be made to both copies. Of course
>>>>> there are some variables that shouldn't be copied: non-local vars,
>>>>> arrays, C++ classes and structures.
>>>>
>>>>
>>>>
>>>> Basically, anything that external code can modify. That makes it a bit
>>>> tricky to do it also for 'normal' closure functions - the whole idea is
>>>> that there is more than one function that can refer to a variable.
>>>>
>>>>> Also it may be a good idea to move outer scope pointer into local
>>>>> variable.
>>>>>
>>>>> So I'm wondering what is a good test to measure actual speedup?
>>>>
>>>>
>>>>
>>>> http://blog.behnel.de/index.php?p=163
>>>>
>>>> Just take the plain Python versions of the iterparse functions and
>>>> compare
>>>> them before and after the change. The raw C implementation in CPython
>>>> gives a good baseline.
>>>>
>>>> Actually, it would be generally interesting to run the Cython versions
>>>> through callgrind to see where the time is actually being spent.
>>>
>>>
>>> Another useful test is the "nqueens" benchmark from the CPython test
>>> suite.
>>> It's regularly run on Jenkins in Py2.7 and 3.3.
>>>
>>> https://sage.math.washington.edu:8091/hudson/view/bench/
>>>
>>> Note that it mostly uses generator expressions, which could easily
>>> benefit
>>> from a couple of further optimisations by specialising them, e.g. by
>>> providing a length hint.
>>>
>>> http://trac.cython.org/cython_trac/ticket/756
>>>
>>
>> I have implemented local variable copying; it can be found here:
>>
>> https://github.com/vitek/cython/tree/_copy_closure
>
>
> Cool.
>
>
>
>> I didn't notice a significant speedup running the nqueens test. Indeed, I'm
>> not sure it's a speedup at all.
>> It's all within about +-2%. Perhaps a better test is required.
>
>
> Well, yes, I'd guess that generator expressions are really not the target of
> this optimisation. They simply don't do enough in between two iterations.
>
>
>
>> Btw, I got ~8% speedup for this dummy test:
>> def foo():
>>     cdef int i, r
>>     cdef list o
>>     o = []
>>     def bar():
>>         return len(o)
>>     for i in range(10000000):
>>         bar()
>>         r += len(o)
>>     return r
>
>
> Any speedup will help someone, I guess. And it's good to know that it makes
> a difference if the closure code isn't plain trivial.
>
>
>> What's iterparse()?
>
>
> Ah, sorry, my bad. Too much lxml stuff lately, I guess. I really meant
> "itertools". See the blog link above.
>
> Also, consider using callgrind for performance analysis.
For one, it'll
> print the number of executed instructions after a test run. For Cython
> generated code, it's usually a good sign if that decreases after a change.
> For a deeper analysis of such a profiling run, you can then use kcachegrind.
>
>

I tried the count() generator from your blog: the optimized version is about 1%
slower. I think this optimization would make sense for regular closures and
for generators that do some real work inside.

--
vitja.
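
For readers following the thread, the scheme discussed here (read a shared variable from a local copy, and make every assignment to both the local and the closure scope) can be modelled in plain Python. This is only a rough sketch of the idea, not the code in the _copy_closure branch and not what Cython generates in C; the names Scope, sum_via_scope and sum_with_local_copy are made up for illustration.

```python
class Scope:
    """Hypothetical stand-in for the closure scope object holding shared variables."""

    def __init__(self, total=0):
        self.total = total


def sum_via_scope(scope, values):
    # Baseline: every read and every write goes through the scope object.
    for value in values:
        scope.total = scope.total + value
    return scope.total


def sum_with_local_copy(scope, values):
    # Copy the shared variable into a local once, on entry.
    total = scope.total
    for value in values:
        total = total + value  # reads use the local copy
        scope.total = total    # each assignment is made to both copies
    return total


if __name__ == "__main__":
    a, b = Scope(), Scope()
    # Both variants must agree on the result and on the final scope state.
    assert sum_via_scope(a, range(10)) == sum_with_local_copy(b, range(10))
    assert a.total == b.total
```

The local copy is only trustworthy while nothing else can change scope.total during the call, which matches the reason given earlier in the thread for not copying non-local variables or anything external code can modify.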