From mark.dufour at gmail.com Sun Sep 11 00:36:41 2005 From: mark.dufour at gmail.com (Mark Dufour) Date: Sun, 11 Sep 2005 00:36:41 +0200 Subject: [summerofcode] First release of Shed Skin, a Python-to-C++ compiler. Message-ID: <8180ef6905091015361d3ffdf7@mail.gmail.com> After nine months of hard work, I am proud to introduce my baby to the world: an experimental Python-to-C++ compiler. It can convert many Python programs into optimized C++ code, without any user intervention such as adding type declarations. It uses rather advanced static type inference techniques to deduce type information by itself. In addition, it determines whether deduced types may be parameterized, and if so, it generates corresponding C++ generics. Based on deduced type information, it also attempts to convert heap allocation into stack and static preallocation (falling back to libgc in case this fails.) The compiler was motivated by the belief that in many cases it should be possible to automatically deduce C++ versions of Python programs, enabling users to enjoy both the productivity of Python and the efficiency of C++. It works best for Python programs written in a relatively static C++-style, in essence enabling users to specify C++ programs at a higher level. At the moment the compiler correctly handles 124 unit tests, six of which are serious programs of between 100 and 200 lines: -an othello player -two satisfiability solvers -a japanese puzzle solver -a sudoku solver -a neural network simulator Unfortunately I am just a single person, and much work remains to be done. At the moment, there are several limitations to the type of Python programs that the compiler accepts. Even so, there is enough of Python left to be able to remain highly productive in many cases. However, for most larger programs, there are probably some minor problems that need to be fixed first, and some external dependencies to be implemented/bridged in C++. With this initial release, I hope to attract other people to help me locate remaining problems, help implement external dependencies, and in the end hopefully even to contribute to the compiler itself. I would be very happy to receive small programs that the compiler does or should be able to handle. If you are a C++ template wizard, and you would be interested in working on the C++ implementation of builtin types, I would also love to get in contact with you. Actually, I'd like to talk to anyone even slightly interested in the compiler, as this would be highly motivating to me. The source code is available at the following site. Please check the README for simple installation/usage instructions. Let me know if you would like to create ebuild/debian packages. Sourceforge site: http://shedskin.sourceforge.net Shed Skin blog: http://shed-skin.blogspot.com Should you reply to this mail, please also reply to me directly. Thanks! Credits Parts of the compiler have been sponsored by Google, via its Summer of Code program. I am very grateful to them for keeping me motivated during a difficult period. I am also grateful to the Python Software Foundation for chosing my project for the Summer of Code. Finally, I would like to thank my university advisor Koen Langendoen for guiding this project. Details The following describes in a bit more detail various aspects of the compiler. Before seriously using the compiler, please make sure to understand especially its limitations. Main Features -very precise, efficient static type inference (iterative object contour splitting, where each iteration performs the cartesian product algorithm) -stack and static pre-allocation (libgc is used as a fall-back) -support for list comprehensions, tuple assignments, anonymous funcs -generation of arbitrarily complex class and function templates (even member templates, or generic, nested list comprehensions) -binary tuples are internally analyzed -some understanding of inheritance (e.g. list(dict/list) becomes list>) -hierarchical project support: generation of corresponding C++ hierarchy, including (nested) Makefiles; C++ namespaces -annotation of source code with deduced types -builtin classes, functions (enumerate, sum, min, max, range, zip..) -polymorphic inline caches or virtual vars/calls (not well tested) -always unbox scalars (compiler bails out with error if scalars are mixed with pointer types) -full source code available under the MIT license Main Limitations/TODO's -Windows support (I don't have Windows, sorry) -reflection (getattr, hasattr), dynamic inheritance, eval, .. -mixing scalars with pointer types (e.g. int and None in a single variable) -mixing unrelated types in single container instance variable other than tuple-2 -holding different types of objects in tuples with length >2; builtin 'zip' can only take 2 arguments. -exceptions, generators, nested functions, operator overloading -recursive types (e.g. a = []; a.append(a)) -expect some problems when mixing floats and ints together -varargs (*x) are not very well supported; keyword args are not supported yet -arbitrary-size arithmetic -possible non-termination ('recursive customization', have not encountered it yet) -profiling will be required for scaling to very large programs -combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,)) -unboxing of small tuples (should form a nice speedup) -foreign code has to be modeled and implemented/bridged in C++ -some builtins are not implemented yet, e.g. 'reduce' and 'map'