From oliphant.travis at ieee.org Wed Nov 1 00:03:01 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue, 31 Oct 2006 16:03:01 -0700
Subject: [Python-Dev] PEP: Adding data-type objects to Python
In-Reply-To: <79990c6b0610311312y2a749b4bw617f0cf18ae9d660@mail.gmail.com>
References: <20061028135415.GA13049@code0.codespeak.net> <4547007D.30404@v.loewis.de> <45478C71.2010600@v.loewis.de> <79990c6b0610311312y2a749b4bw617f0cf18ae9d660@mail.gmail.com>
Message-ID: 

Paul Moore wrote:
> On 10/31/06, Travis Oliphant wrote:
>
>> Martin v. Löwis wrote:
>>
>>> [...] because I still don't quite understand what the PEP
>>> wants to achieve.
>>
>> Are you saying you still don't understand after having read the
>> extended buffer protocol PEP, yet?
>
> I can't speak for Martin, but I don't understand how I, as a Python
> programmer, might use the data type objects specified in the PEP. I
> have skimmed the extended buffer protocol PEP, but I'm conscious that
> no objects I currently use support the extended buffer protocol (and
> the PEP doesn't mention adding support to existing objects), so I
> don't see that as too relevant to me.

Do you use the PIL? The PIL supports the array interface. CVXOPT supports the array interface. Numarray, Numeric, and NumPy all support the array interface.

> I have also installed numpy, and looked at the help for numpy.dtype,
> but that doesn't add much to the PEP.

The source code is available.

> The freely available chapters of the numpy book explain how dtypes
> describe data structures, but not how to use them. The freely
> available Numeric documentation doesn't refer to dtypes, as far as I
> can tell.

It kind of does; they are PyArray_Descr * structures in Numeric. They just aren't Python objects.

> Is there any documentation on how to use dtypes, independently of
> other features of numpy?

There are examples and other help pages at http://www.scipy.org

> If not, can you clarify where the benefit lies for a Python user of
> this proposal?
> (I understand the benefits of a common language for extensions to
> communicate datatype information, but why expose it to Python? How do
> Python users use it?)

The only benefit I imagine would be for an extension module library writer and for users of the struct and array modules. But, other than that, I don't know. It actually doesn't have to be exposed to Python. I used Python notation in the PEP to explain what is basically a C structure. I don't care if the object ever gets exposed to Python.

Maybe that's part of the communication problem.

> This is probably all self-evident to the numpy community, but I think
> that as the PEP is aimed at a wider audience it needs a little more
> background.

It's hard to write that background because most of what I understand is from the NumPy community. I can't give you all the examples, but my concern is that you have all these third-party libraries out there describing what is essentially binary data and using either string copies or the buffer protocol + extra information obtained by some method or attribute that varies across the implementations. There should really be a standard for describing this data. There are attempts at it in the struct and array modules. There is the approach of ctypes, but I claim that using Python type objects is overkill for the purposes of describing data-formats.

-Travis

From pj at place.org Wed Nov 1 00:05:57 2006
From: pj at place.org (Paul Jimenez)
Date: Tue, 31 Oct 2006 17:05:57 -0600
Subject: [Python-Dev] patch 1462525 or similar solution?
Message-ID: <20061031230557.656049036@place.org>

I submitted patch 1462525 awhile back to solve the problem described even longer ago in http://mail.python.org/pipermail/python-dev/2005-November/058301.html and I'm wondering what my appropriate next steps are.
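The scheme-dependent parsing problem the patch addresses is easiest to see against the generic RFC 3986 rules. As a sketch only (this uses the modern `urllib.parse` from the Python 3 stdlib for illustration; it is not the patch's own API), generic parsing gives every scheme the same netloc/path treatment:

```python
from urllib.parse import urlsplit

# Generic parsing: any scheme followed by "//" gets a netloc, which is
# what RFC 3986 prescribes and what scheme-specific parsers historically
# got wrong for less common schemes like svn+ssh.
parts = urlsplit("svn+ssh://user@example.org:2222/repo/trunk")

print(parts.scheme)    # svn+ssh
print(parts.netloc)    # user@example.org:2222
print(parts.hostname)  # example.org
print(parts.port)      # 2222
print(parts.path)      # /repo/trunk
```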
Honestly, I don't care if you take my patch or someone else's proposed solution, but I'd like to see something go into the stdlib so that I can eventually stop having to ship custom code for what is really a standard problem.

--pj

From rrr at ronadam.com Wed Nov 1 00:58:39 2006
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 31 Oct 2006 17:58:39 -0600
Subject: [Python-Dev] PEP: Adding data-type objects to Python
In-Reply-To: 
References: <20061028135415.GA13049@code0.codespeak.net> <4547007D.30404@v.loewis.de> <45478C71.2010600@v.loewis.de> <79990c6b0610311312y2a749b4bw617f0cf18ae9d660@mail.gmail.com>
Message-ID: 

> The only benefit I imagine would be for an extension module library
> writer and for users of the struct and array modules. But, other than
> that, I don't know. It actually doesn't have to be exposed to Python.
> I used Python notation in the PEP to explain what is basically a
> C-structure. I don't care if the object ever gets exposed to Python.
>
> Maybe that's part of the communication problem.

I get the impression that where ctypes is good for accessing native C libraries from within Python, the data-type object is meant to add a more direct way to share native Python objects' *data* with C (or other languages) in a more efficient way. For data that can be represented well at contiguous memory addresses, it lightens the load: instead of a list of Python objects you get an "array of data for n python_type objects" without the duplication of the Python type for every element.

I think maybe some more complete examples demonstrating how it is to be used from both Python and C would be good.

Cheers, Ron

From oliphant.travis at ieee.org Wed Nov 1 01:13:37 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue, 31 Oct 2006 17:13:37 -0700
Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information.
In-Reply-To: <4547BF86.6070806@v.loewis.de>
References: <4547BF86.6070806@v.loewis.de>
Message-ID: 

Martin v.
Löwis wrote:
> Travis E. Oliphant schrieb:
>
>> Several extensions to Python utilize the buffer protocol to share
>> the location of a data-buffer that is really an N-dimensional
>> array. However, there is no standard way to exchange the
>> additional N-dimensional array information so that the data-buffer
>> is interpreted correctly. The NumPy project introduced an array
>> interface (http://numpy.scipy.org/array_interface.shtml) through a
>> set of attributes on the object itself. While this approach
>> works, it requires attribute lookups which can be expensive when
>> sharing many small arrays.
>
> Can you please give examples for real-world applications of this
> interface, preferably examples involving multiple
> independently-developed libraries?
> ("this" being the current interface in NumPy - I understand that
> the PEP's interface isn't implemented, yet)

Examples of Need

1) Suppose you have an image in *.jpg format that came from a camera and you want to apply Fourier-based image recovery to try to de-blur the image using modified Wiener filtering. Then you want to save the result in *.png format. The PIL provides an easy way to read *.jpg files into Python and write the result to *.png, and NumPy provides the FFT and the array math needed to implement the algorithm. Rather than have to dig into the details of how NumPy and the PIL interpret chunks of memory in order to write a "converter" between NumPy arrays and PIL arrays, there should be support in the buffer protocol so that one could write something like:

# Read the image
a = numpy.frombuffer(Image.open('myimage.jpg'))
# Process the image.
A = numpy.fft.fft2(a)
B = A*inv_filter
b = numpy.fft.ifft2(B).real
# Write it out
Image.frombuffer(b).save('filtered.png')

Currently, without this proposal, you have to worry about the "mode" the image is in and get its shape using a specific method call (this method call is different for every object you might want to interface with).
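The shape-and-format metadata argued for in this example is essentially what later grew into the revised buffer protocol (PEP 3118). As a rough, stdlib-only sketch of the idea using today's `memoryview` (the pixel values below are made up for illustration), N-dimensional shape information can travel with a block of memory without copying it:

```python
# A tiny 2x2 RGB "image" as raw bytes: 12 bytes, one per channel.
raw = bytes([255, 0, 0,   0, 255, 0,
             0, 0, 255,   255, 255, 255])

# Attach shape and format information to the buffer, zero-copy.
view = memoryview(raw).cast("B", shape=[2, 2, 3])

print(view.shape)     # (2, 2, 3)
print(view.format)    # B (unsigned byte)
print(view[0, 1, 1])  # 255 -- green channel of pixel (0, 1)
```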
2) The same argument applies to libraries that read and write audio or video formats.

3) You want to blit images onto a GUI image buffer for rapid updates, but need to do math processing on the image values themselves, or you want to read the images from files supported by the PIL. If the PIL supported the extended buffer protocol, then you would not need to worry about the "mode" and the "shape" of the Image. What's more, you would also be able to accept images from any object (like NumPy arrays or ctypes arrays) that supported the extended buffer protocol without having to learn how it shares information like shape and data-format.

I could have also included examples from PyGame, OpenGL, etc. I thought people were more aware of this argument as we've made it several times over the years. It's just taken this long to get to a point to start asking for something to get into Python.

> Paul Moore (IIRC) gave the example of equalising the green values
> and maximizing the red values in a PIL image by passing it to NumPy:
> Is that a realistic (even though not-yet real-world) example?

I think so, but I've never done something like that.

> If so, what algorithms of NumPy would I use to perform this image
> manipulation (and why would I use NumPy for it if I could just
> write a for loop that does that in pure Python, given PIL's
> getpixel/setdata)?

Basically you would use array math operations and reductions (ufuncs and their methods, which are included in NumPy). You would do it this way for speed. It's going to be a lot slower doing those loops in Python. NumPy provides the ability to do them at close-to-C speeds.

-Travis

From tjreedy at udel.edu Wed Nov 1 01:24:34 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 31 Oct 2006 19:24:34 -0500
Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information.
References: <4547BF86.6070806@v.loewis.de>
Message-ID: 

"Martin v. Löwis" wrote in message news:4547BF86.6070806 at v.loewis.de...
> Paul Moore (IIRC) gave the example of equalising the green values
> and maximizing the red values in a PIL image by passing it to NumPy:
> Is that a realistic (even though not-yet real-world) example? If
> so, what algorithms of NumPy would I use to perform this image
> manipulation

The use of surfarrays manipulated by Numeric has been an optional but important part of PyGame for years. http://www.pygame.org/docs/ says:

    Surfarray Introduction
    Pygame uses the Numeric python module to allow efficient per pixel
    effects on images. Using the surface arrays is an advanced feature
    that allows custom effects and filters. This also examines some of
    the simple effects from the Pygame example, arraydemo.py.

The Examples section of the linked page http://www.pygame.org/docs/tut/surfarray/SurfarrayIntro.html has code snippets for generating, resizing, recoloring, filtering, and cross-fading images.

> (and why would I use NumPy for it if I could just
> write a for loop that does that in pure Python, given PIL's
> getpixel/setdata)?

Why does anyone use Numeric/NumArray/NumPy? Faster, easier coding and much faster execution, which is especially important when straining for an acceptable framerate.

----

I believe that at present PyGame can only work with external images that it is programmed to know how to import. My guess is that if image source program X (such as PIL) described its data layout in a way that NumPy could read and act on, the import/copy step could be eliminated. But perhaps Travis can clarify this.

Terry Jan Reedy

From wbaxter at gmail.com Wed Nov 1 02:58:41 2006
From: wbaxter at gmail.com (Bill Baxter)
Date: Wed, 1 Nov 2006 01:58:41 +0000 (UTC)
Subject: [Python-Dev] PEP: Adding data-type objects to Python
References: 
Message-ID: 

One thing I'm curious about in the ctypes vs this PEP debate is the following. How do the approaches differ in practice if I'm developing a library that wants to accept various image formats that all describe the same thing: rgb data.
Let's say for now all I want to support is two different image formats whose pixels are described in C structs by:

struct rgb565 {
    unsigned short r:5;
    unsigned short g:6;
    unsigned short b:5;
};

struct rgb101210 {
    unsigned int r:10;
    unsigned int g:12;
    unsigned int b:10;
};

Basically in my code I want to be able to take the binary data descriptor and say "give me the 'r' field of this pixel as an integer". Is either one (the PEP or ctypes) clearly easier to use in this case? What would the code look like for handling both formats generically?

--bb

From sluggoster at gmail.com Wed Nov 1 04:14:13 2006
From: sluggoster at gmail.com (Mike Orr)
Date: Tue, 31 Oct 2006 19:14:13 -0800
Subject: [Python-Dev] Path object design
Message-ID: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com>

I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying that the first object-oriented proposal was rejected. I'm in favor of the "directory tuple" approach which wasn't mentioned in the thread. This was proposed by Noam Raphael several months ago: a Path object that's a sequence of components (a la os.path.split) rather than a string. The beauty of this approach is that slicing and joining are expressed naturally using the [] and + operators, eliminating several methods.

Introduction: http://wiki.python.org/moin/AlternativePathClass
Feature discussion: http://wiki.python.org/moin/AlternativePathDiscussion
Reference implementation: http://wiki.python.org/moin/AlternativePathModule

(There's a link to the introduction at the end of PEP 355.) Right now I'm working on a test suite, then I want to add the features marked "Mike" in the discussion -- in a way that people can compare the feature alternatives in real code -- and write a PEP. But it's a big job for one person, and there are unresolved issues on the discussion page, not to mention things brought up in the "PEP 355 status" thread.
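The []-and-+ path algebra described above can be sketched in a few lines. This `TuplePath` class is hypothetical — one possible reading of the directory-tuple idea, not the wiki's reference implementation:

```python
class TuplePath(tuple):
    """Hypothetical sketch: a path is a sequence of components, so
    slicing and joining are just the [] and + operators."""

    def __new__(cls, *parts):
        return super().__new__(cls, parts)

    def __add__(self, other):
        # Joining: path + "name" or path + other_path.
        if isinstance(other, str):
            other = (other,)
        return TuplePath(*self, *other)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            return TuplePath(*result)
        return result

    def __str__(self):
        # posix-style separator for the demo; a real class would use os.sep
        return "/".join(self)

p = TuplePath("toplevel", "app1", "bin", "main_program.py")
lib = p[:-2] + "lib"           # "../lib" relative to the script
print(str(lib))                # toplevel/app1/lib
```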
We had three people working on the discussion page but development seems to have ground to a halt.

One thing is sure -- we urgently need something better than os.path. It functions well but it makes hard-to-read and unpythonic code. For instance, I have an application that has to add its libraries to the Python path, relative to the executable's location.

/toplevel
    app1/
        bin/
            main_program.py
            utility1.py
            init_app.py
        lib/
            app_module.py
    shared/
        lib/
            shared_module.py

The solution I've found is an init_app module in every application that sets up the paths. Conceptually it needs "../lib" and "../../shared/lib", but I want the absolute paths without hardcoding them, in a platform-neutral way. With os.path, "../lib" is:

os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")

YUK! Compare to PEP 355:

Path(__file__).parent.parent.join("lib")

Much easier to read and debug. Under Noam's proposal it would be:

Path(__file__)[:-2] + "lib"

I'd also like to see the methods more intelligent: don't raise an error if an operation is already done (e.g., a directory exists or a file is already removed). There's no reason to clutter one's code with extra if's when the methods can easily encapsulate this. This was considered a too radical departure from os.path for some, but I have in mind even more radical convenience methods which I'd put in a third-party subclass if they're not accepted into the standard library, the way 'datetime' has third-party subclasses.

In my application I started using Orendorff's path module, expecting the standard path object would be close to it. When PEP 355 started getting more changes and the directory-based alternative took off, I took path.py out and rewrote my code for os.path until an alternative becomes more stable. Now it looks like it will be several months and possibly several third-party packages until one makes it into the standard library. This is unfortunate.
Not only does it mean ugly code in applications, but it means packages can't accept or return Path objects and expect them to be compatible with other packages. The reasons PEP 355 was rejected also sound strange. Nick Coghlan wrote (Oct 1): > Things the PEP 355 path object lumps together: > - string manipulation operations > - abstract path manipulation operations (work for non-existent filesystems) > - read-only traversal of a concrete filesystem (dir, stat, glob, etc) > - addition & removal of files/directories/links within a concrete filesystem > Dumping all of these into a single class is certainly practical from a utility > point of view, but it's about as far away from beautiful as you can get, which > creates problems from a learnability point of view, and from a > capability-based security point of view. What about the convenience of the users and the beauty of users' code? That's what matters to me. And I consider one class *easier* to learn. I'm tired of memorizing that 'split' is in os.path while 'remove' and 'stat' are in os. This seems arbitrary: you're statting a path, aren't you? Also, if you have four classes (abstract path, file, directory, symlink), *each* of those will have 3+ platform-specific versions. Then if you want to make an enhancement subclass you'll have to make 12 of them, one for each of the 3*4 combinations of superclasses. Encapsulation can help with this, but it strays from the two-line convenience for the user: from path import Path p = Path("ABC") # Works the same for files/directories on any platform. Nevertheless, I'm open to seeing a multi-class API, though hopefully less verbose than Talin's preliminary one (Oct 26). Is it necessary to support path.parent(), pathobj.parent(), io.dir.listdir(), *and* io.dir.Directory(). 
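For comparison, the one-class approach argued for here is roughly the shape the stdlib eventually took with pathlib (Python 3.4, long after this thread). A sketch using its pure-path flavor, which does only name manipulation:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/toplevel/app1/bin/init_app.py")

# "../lib" relative to the script, without hardcoding:
lib = p.parent.parent / "lib"
print(lib)            # /toplevel/app1/lib

# Name manipulation lives on the same object as the path itself.
print(p.name)         # init_app.py
print(p.suffix)       # .py
```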
That's four different namespaces to memorize which function/method is where, and if a function/method belongs to multiple ones it'll be duplicated, and you'll have to remember that some methods are duplicated and others aren't... Plus, constructors like io.dir.Directory() look too verbose. io.Directory() might be acceptable, with the functions as class methods. I agree that supporting non-filesystem directories (zip files, CSV/Subversion sandboxes, URLs) would be nice, but we already have a big enough project without that. What constraints should a Path object keep in mind in order to be forward-compatible with this? If anyone has design ideas/concerns about a new Path class(es), please post them. If anyone would like to work on a directory-based spec/implementation, please email me. -- Mike Orr From talin at acm.org Wed Nov 1 04:20:50 2006 From: talin at acm.org (Talin) Date: Tue, 31 Oct 2006 19:20:50 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> References: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> Message-ID: <45481292.7040403@acm.org> I'm right in the middle of typing up a largish post to go on the Python-3000 mailing list about this issue. Maybe we should move it over there, since its likely that any path reform will have to be targeted at Py3K...? Mike Orr wrote: > I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying > that the first object-oriented proposal was rejected. I'm in favor of > the "directory tuple" approach which wasn't mentioned in the thread. > This was proposed by Noal Raphael several months ago: a Path object > that's a sequence of components (a la os.path.split) rather than a > string. The beauty of this approach is that slicing and joining are > expressed naturally using the [] and + operators, eliminating several > methods. 
> > Introduction: http://wiki.python.org/moin/AlternativePathClass > Feature discussion: http://wiki.python.org/moin/AlternativePathDiscussion > Reference implementation: http://wiki.python.org/moin/AlternativePathModule > > (There's a link to the introduction at the end of PEP 355.) Right now > I'm working on a test suite, then I want to add the features marked > "Mike" in the discussion -- in a way that people can compare the > feature alternatives in real code -- and write a PEP. But it's a big > job for one person, and there are unresolved issues on the discussion > page, not to mention things brought up in the "PEP 355 status" thread. > We had three people working on the discussion page but development > seems to have ground to a halt. > > One thing is sure -- we urgently need something better than os.path. > It functions well but it makes hard-to-read and unpythonic code. For > instance, I have an application that has to add its libraries to the > Python path, relative to the executable's location. > > /toplevel > app1/ > bin/ > main_progam.py > utility1.py > init_app.py > lib/ > app_module.py > shared/ > lib/ > shared_module.py > > The solution I've found is an init_app module in every application > that sets up the paths. Conceptually it needs "../lib" and > "../../shared/lib", but I want the absolute paths without hardcoding > them, in a platform-neutral way. With os.path, "../lib" is: > > os.path.join(os.path.dirname(os.path.dirname(__FILE__)), "lib") > > YUK! Compare to PEP 355: > > Path(__FILE__).parent.parent.join("lib") > > Much easier to read and debug. Under Noam's proposal it would be: > > Path(__FILE__)[:-2] + "lib" > > I'd also like to see the methods more intelligent: don't raise an > error if an operation is already done (e.g., a directory exists or a > file is already removed). There's no reason to clutter one's code > with extra if's when the methods can easily encapsulate this. 
This was > considered a too radical departure from os.path for some, but I have > in mind even more radical convenience methods which I'd put in a > third-party subclass if they're not accepted into the standard > library, the way 'datetime' has third-party subclasses. > > In my application I started using Orendorff's path module, expecting > the standard path object would be close to it. When PEP 355 started > getting more changes and the directory-based alternative took off, I > took path.py out and rewrote my code for os.path until an alternative > becomes more stable. Now it looks like it will be several months and > possibly several third-party packages until one makes it into the > standard library. This is unfortunate. Not only does it mean ugly > code in applications, but it means packages can't accept or return > Path objects and expect them to be compatible with other packages. > > The reasons PEP 355 was rejected also sound strange. Nick Coghlan > wrote (Oct 1): > >> Things the PEP 355 path object lumps together: >> - string manipulation operations >> - abstract path manipulation operations (work for non-existent filesystems) >> - read-only traversal of a concrete filesystem (dir, stat, glob, etc) >> - addition & removal of files/directories/links within a concrete filesystem > >> Dumping all of these into a single class is certainly practical from a utility >> point of view, but it's about as far away from beautiful as you can get, which >> creates problems from a learnability point of view, and from a >> capability-based security point of view. > > What about the convenience of the users and the beauty of users' code? > That's what matters to me. And I consider one class *easier* to > learn. I'm tired of memorizing that 'split' is in os.path while > 'remove' and 'stat' are in os. This seems arbitrary: you're statting > a path, aren't you? 
Also, if you have four classes (abstract path, > file, directory, symlink), *each* of those will have 3+ > platform-specific versions. Then if you want to make an enhancement > subclass you'll have to make 12 of them, one for each of the 3*4 > combinations of superclasses. Encapsulation can help with this, but > it strays from the two-line convenience for the user: > > from path import Path > p = Path("ABC") # Works the same for files/directories on any platform. > > Nevertheless, I'm open to seeing a multi-class API, though hopefully > less verbose than Talin's preliminary one (Oct 26). Is it necessary > to support path.parent(), pathobj.parent(), io.dir.listdir(), *and* > io.dir.Directory(). That's four different namespaces to memorize > which function/method is where, and if a function/method belongs to > multiple ones it'll be duplicated, and you'll have to remember that > some methods are duplicated and others aren't... Plus, constructors > like io.dir.Directory() look too verbose. io.Directory() might be > acceptable, with the functions as class methods. > > I agree that supporting non-filesystem directories (zip files, > CSV/Subversion sandboxes, URLs) would be nice, but we already have a > big enough project without that. What constraints should a Path > object keep in mind in order to be forward-compatible with this? > > If anyone has design ideas/concerns about a new Path class(es), please > post them. If anyone would like to work on a directory-based > spec/implementation, please email me. > From tjreedy at udel.edu Wed Nov 1 05:01:27 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 31 Oct 2006 23:01:27 -0500 Subject: [Python-Dev] PEP: Extending the buffer protocol to share arrayinformation. References: <4547BF86.6070806@v.loewis.de> Message-ID: "Travis Oliphant" wrote in message news:ei8ors$7m4$1 at sea.gmane.org... >Examples of Need [snip] < I could have also included examples from PyGame, OpenGL, etc. 
I thought
>people were more aware of this argument as we've made it several times
>over the years. It's just taken this long to get to a point to start
>asking for something to get into Python.

The problem of data format definition and sharing of data between applications has been a bugaboo of computer science for decades. But some have butted their heads against it more than others. Something which made a noticeable dent in the problem, by making sharing 'just work' more easily, would, to me, be a real plus for Python.

tjr

From ronaldoussoren at mac.com Wed Nov 1 07:53:27 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 1 Nov 2006 07:53:27 +0100
Subject: [Python-Dev] PEP: Adding data-type objects to Python
In-Reply-To: <454789F9.7050808@ctypes.org>
References: <45468C8E.1000203@canterbury.ac.nz> <454789F9.7050808@ctypes.org>
Message-ID: 

On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote:
>
> This mechanism is probably a hack because it's not possible to add
> C accessible fields to type objects, on the other hand it is
> extensible (in principle, at least).

I better start rewriting PyObjC then :-). PyObjC stores some additional information in the type objects that are used to describe Objective-C classes (such as a reference to the proxied class). IIRC this has been possible from Python 2.3.

Ronald

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3562 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20061101/660e4844/attachment.bin

From fredrik at pythonware.com Wed Nov 1 08:45:06 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 01 Nov 2006 08:45:06 +0100
Subject: [Python-Dev] Path object design
In-Reply-To: <45481292.7040403@acm.org>
References: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> <45481292.7040403@acm.org>
Message-ID: 

Talin wrote:
> I'm right in the middle of typing up a largish post to go on the
> Python-3000 mailing list about this issue. Maybe we should move it over
> there, since it's likely that any path reform will have to be targeted at
> Py3K...?

I'd say that any proposal that cannot be fit into the current 2.X design is simply too disruptive to go into 3.0. So here's my proposal for 2.6 (reposted from the 3K list). This is fully backwards compatible, can go right into 2.6 without breaking anything, allows people to update their code as they go, and can be incrementally improved in future releases:

1) Add a pathname wrapper to "os.path", which lets you do basic path "algebra". This should probably be a subclass of unicode, and should *only* contain operations on names.

2) Make selected "shutil" operations available via the "os" namespace; the old POSIX API vs. POSIX SHELL distinction is pretty irrelevant. Also make the os.path predicates available via the "os" namespace.

This gives a very simple conceptual model for the user; to manipulate path *names*, use "os.path.<op>(string)" functions or the "<path>" wrapper. To manipulate *objects* identified by a path, given either as a string or a path wrapper, use "os.<op>(path)". This can be taught in less than a minute.

With this in place in 2.6 and 2.7, all that needs to be done for 3.0 is to remove (some of) the old cruft.
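Item (1) above, a wrapper that does only name algebra, might look something like this sketch (`PathName` and its methods are hypothetical names, not Fredrik's actual proposal; posixpath is used so the example is platform-independent):

```python
import posixpath

class PathName(str):
    """Hypothetical name-algebra wrapper: manipulates path *names*
    only and never touches the filesystem."""

    def joinpath(self, *parts):
        return PathName(posixpath.join(self, *parts))

    @property
    def parent(self):
        return PathName(posixpath.dirname(self))

    @property
    def basename(self):
        return PathName(posixpath.basename(self))

p = PathName("/usr/local/lib/python2.6")
print(p.parent)                              # /usr/local/lib
print(p.joinpath("site-packages").basename)  # site-packages
```

Because it subclasses str, such a wrapper stays fully backwards compatible: it can be passed to every existing os function unchanged.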
From fredrik at pythonware.com Wed Nov 1 08:53:04 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 08:53:04 +0100 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Terry Reedy wrote: > I believe that at present PyGame can only work with external images that it > is programmed to know how to import. My guess is that if image source > program X (such as PIL) described its data layout in a way that NumPy could > read and act on, the import/copy step could be eliminated. I wish you all stopped using PIL as an example in this discussion; for PIL 2, I'm moving towards an entirely opaque data model, with a "data view"-style client API. From martin at v.loewis.de Wed Nov 1 09:16:25 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 09:16:25 +0100 Subject: [Python-Dev] patch 1462525 or similar solution? In-Reply-To: <20061031230557.656049036@place.org> References: <20061031230557.656049036@place.org> Message-ID: <454857D9.2040008@v.loewis.de> Paul Jimenez schrieb: > I submitted patch 1462525 awhile back to > solve the problem described even longer ago in > http://mail.python.org/pipermail/python-dev/2005-November/058301.html > and I'm wondering what my appropriate next steps are. Honestly, I don't > care if you take my patch or someone else's proposed solution, but I'd > like to see something go into the stdlib so that I can eventually stop > having to ship custom code for what is really a standard problem. The problem, as I see it, is that we cannot afford to include an "incorrect" library *again*. urllib may be ill-designed, but can't be changed for backwards compatibility reasons. The same should not happen to urilib: it has to be "right" from the start. So the question is: are you willing to work on it until it is right? 
I just reviewed it a bit, and have a number of questions: - Can you please sign a contributor form, from http://www.python.org/psf/contrib/ and then add the magic words ("Licensed to PSF under a Contributor Agreement.") to this code? - I notice there is no documentation. Can you please come up with a patch to Doc/lib? - Also, there are no test cases. Can you please come up with a test suite? - Is this library also meant to support creation of URIs? If so, shouldn't it also do percent-encoding, if the input contains reserved characters. Also, shouldn't it perform percent-undecoding when the URI contains unreserved characters? - Should this library support RFC 3987 also? - Why does the code still name things "URL"? The RFC avoids this name throughout (except for explaining that the fact that the URI is a locator is really irrelevant) Regards, Martin From martin at v.loewis.de Wed Nov 1 09:24:06 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 09:24:06 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: <454859A6.4050904@v.loewis.de> Bill Baxter schrieb: > Basically in my code I want to be able to take the binary data descriptor and > say "give me the 'r' field of this pixel as an integer". > > Is either one (the PEP or c-types) clearly easier to use in this case? What > would the code look like for handling both formats generically? The PEP, as specified, does not support accessing individual fields from Python. OTOH, ctypes, as implemented, does. This comparison is not fair, though: an *implementation* of the PEP (say, NumPy) might also give you Python-level access to the fields. With the PEP, you can get access to the 'r' field from C code. Performing this access is quite tedious; as I'm uncertain whether you actually wanted to see C code, I refrain from trying to formulate it. 
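The Python-level field access Martin refers to can be sketched with ctypes bit fields. This is only an illustration of the ctypes route for Bill Baxter's rgb565 example, not code from the PEP; note that bit-field layout ultimately follows the platform C compiler (LSB-first within the unit on common little-endian targets):

```python
import ctypes

class RGB565(ctypes.LittleEndianStructure):
    # Mirrors Bill Baxter's struct rgb565: three bit fields in one
    # 16-bit unit.  Layout assumption: LSB-first allocation.
    _fields_ = [("r", ctypes.c_ushort, 5),
                ("g", ctypes.c_ushort, 6),
                ("b", ctypes.c_ushort, 5)]

# 0b11111 in the low five bits: a pure-red pixel on LSB-first layouts.
pixel = RGB565.from_buffer_copy(b"\x1f\x00")

# "give me the 'r' field of this pixel as an integer", generically:
for field_name, _, _ in RGB565._fields_:
    print(field_name, getattr(pixel, field_name))
```

Handling rgb101210 generically is then just a second Structure subclass with different widths; the `getattr` loop does not change.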
Regards, Martin From glyph at divmod.com Wed Nov 1 09:36:11 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 08:36:11 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> On 03:14 am, sluggoster at gmail.com wrote: >One thing is sure -- we urgently need something better than os.path. >It functions well but it makes hard-to-read and unpythonic code. I'm not so sure. The need is not any more "urgent" today than it was 5 years ago, when os.path was equally "unpythonic" and unreadable. The problem is real but there is absolutely no reason to hurry to a premature solution. I've already recommended Twisted's twisted.python.filepath module as a possible basis for the implementation of this feature. I'm sorry I don't have the time to pursue that. I'm also sad that nobody else seems to have noticed. Twisted's implementation has an advantage that these new proposals don't seem to have, an advantage I would really like to see in whatever gets seriously considered for adoption: *It is already used in a large body of real, working code, and therefore its limitations are known.* If I'm wrong about this, and I can't claim to really know about the relative levels of usage of all of these various projects when they're not mentioned, please cite actual experiences using them vs. using os.path. Proposals for extending the language are contentious and it is very difficult to do experimentation with non-trivial projects because nobody wants to do that and then end up with a bunch of code written in a language that is no longer supported when the experiment fails. I understand, therefore, that language-change proposals are going to be very contentious no matter what. However, there is no reason that library changes need to follow this same path.
It is perfectly feasible to write a library, develop some substantial applications with it, tweak it based on that experience, and *THEN* propose it for inclusion in the standard library. Users of the library can happily continue using the library, whether it is accepted or not, and users of the language and standard library get a new feature for free. For example, I plan to continue using FilePath regardless of the outcome of this discussion, although perhaps some conversion methods or adapters will be in order if a new path object makes it into the standard library. I specifically say "library" and not "recipe". This is not a useful exercise if every user of the library has a subtly incompatible and manually tweaked version for their particular application. Path representation is a bike shed. Nobody would have proposed writing an entirely new embedded database engine for Python: python 2.5 simply included SQLite because its utility was already proven. I also believe it is important to get this issue right. It might be a bike shed, but it's a *very important* bike shed. Google for "web server url filesystem path vulnerability" and you'll see what I mean. Getting it wrong (or passing strings around everywhere) means potential security gotchas lurking around every corner. Even Twisted, with no C code at all, got its only known arbitrary-code-execution vulnerability from a path manipulation bug. That was even after we'd switched to an OO path-manipulation layer specifically to avoid bugs like this! I am not addressing this message to the py3k list because its general message of extreme conservatism on new features is more applicable to python-dev. However, py3k designers might also take note: if py3k is going to do something in this area and drop support for the "legacy" os.path, it would be good to choose something that is known to work and have few gotchas, rather than just choosing the devil we don't know over the devil we do.
The weaknesses of os.path are at least well-understood. From fredrik at pythonware.com Wed Nov 1 10:11:12 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 10:11:12 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > I am not addressing this message to the py3k list because its general > message of extreme conservatism on new features is more applicable to > python-dev. However, py3k designers might also take note: if py3k is > going to do something in this area and drop support for the "legacy" > os.path, it would be good to choose something that is known to work and > have few gotchas, rather than just choosing the devil we don't know over > the devil we do. The weaknesses of os.path are at least well-understood. that's another reason why a new design might as well be defined in terms of the old design -- especially if the main goal is call-site convenience, rather than fancy new algorithms. From ncoghlan at gmail.com Wed Nov 1 10:41:41 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Nov 2006 19:41:41 +1000 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <45468C8E.1000203@canterbury.ac.nz> <4547452A.5040501@gmail.com> Message-ID: <45486BD5.9060000@gmail.com> Travis Oliphant wrote: > Nick Coghlan wrote: >> In fact, it may make sense to just use the lists/strings directly as the data >> exchange format definitions, and let the various libraries do their own >> translation into their private format descriptions instead of creating a new >> one-type-to-describe-them-all. > > Yes, I'm open to this possibility.
I basically want two things in the > object passed through the extended buffer protocol: > > 1) It's fast on the C-level > 2) It covers all the use-cases. > > If just a particular string or list structure were passed, then I would > drop the data-format PEP and just have the dataformat argument of the > extended buffer protocol be that thing. > > Then, something that converts ctypes objects to that special format > would be very nice indeed. It may make sense to have a couple distinct sections in the datatype PEP: a. describing data formats with basic Python types b. a lightweight class for parsing these data format descriptions It's most of the way there already - part A would just be the various styles of arguments accepted by the datatype constructor, and part B would be the datatype object itself. I personally think it makes the most sense to do both, but separating the two would make it clear that the descriptions can be standardised without *necessarily* defining a new class. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From g.brandl at gmx.net Wed Nov 1 11:02:39 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 01 Nov 2006 11:02:39 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> <45481292.7040403@acm.org> Message-ID: Fredrik Lundh wrote: > Talin wrote: > >> I'm right in the middle of typing up a largish post to go on the >> Python-3000 mailing list about this issue. Maybe we should move it over >> there, since its likely that any path reform will have to be targeted at >> Py3K...? > > I'd say that any proposal that cannot be fit into the current 2.X design > is simply too disruptive to go into 3.0. So here's my proposal for 2.6 > (reposted from the 3K list). 
> > This is fully backwards compatible, can go right into 2.6 without > breaking anything, allows people to update their code as they go, > and can be incrementally improved in future releases: > > 1) Add a pathname wrapper to "os.path", which lets you do basic > path "algebra". This should probably be a subclass of unicode, > and should *only* contain operations on names. > > 2) Make selected "shutil" operations available via the "os" name- > space; the old POSIX API vs. POSIX SHELL distinction is pretty > irrelevant. Also make the os.path predicates available via the > "os" namespace. > > This gives a very simple conceptual model for the user; to manipulate > path *names*, use "os.path.<op>(string)" functions or the "<path>" > wrapper. To manipulate *objects* identified by a path, given either as > a string or a path wrapper, use "os.<op>(path)". This can be taught in > less than a minute. +1. This is really straightforward and easy to learn. I have been a supporter of the full-blown Path object in the past, but the recent discussions have convinced me that it is just too big and too confusing, and that you can't kill too many birds with one stone in this respect. Most of the ugliness really lies in the path name manipulation functions, which nicely map to methods on a path name object. Georg From g.brandl at gmx.net Wed Nov 1 11:06:14 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 01 Nov 2006 11:06:14 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > On 03:14 am, sluggoster at gmail.com wrote: > > >One thing is sure -- we urgently need something better than os.path. > >It functions well but it makes hard-to-read and unpythonic code. > > I'm not so sure.
The need is not any more "urgent" today than it was 5 > years ago, when os.path was equally "unpythonic" and unreadable. The > problem is real but there is absolutely no reason to hurry to a > premature solution. > > I've already recommended Twisted's twisted.python.filepath module as a > possible basis for the implementation of this feature. I'm sorry I > don't have the time to pursue that. I'm also sad that nobody else seems > to have noticed. Twisted's implementation has an advantage that it > doesn't seem that these new proposals do, an advantage I would really > like to see in whatever gets seriously considered for adoption: Looking at , it seems as if FilePath was made to serve a different purpose than what we're trying to discuss here: """ I am a path on the filesystem that only permits 'downwards' access. Instantiate me with a pathname (for example, FilePath('/home/myuser/public_html')) and I will attempt to only provide access to files which reside inside that path. [...] The correct way to use me is to instantiate me, and then do ALL filesystem access through me. """ What a successor to os.path needs is not security, it's a better (more pythonic, if you like) interface to the old functionality. Georg From ncoghlan at gmail.com Wed Nov 1 11:16:02 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Nov 2006 20:16:02 +1000 Subject: [Python-Dev] patch 1462525 or similar solution? In-Reply-To: <20061031230557.656049036@place.org> References: <20061031230557.656049036@place.org> Message-ID: <454873E2.3060308@gmail.com> Paul Jimenez wrote: > I submitted patch 1462525 awhile back to > solve the problem described even longer ago in > http://mail.python.org/pipermail/python-dev/2005-November/058301.html > and I'm wondering what my appropriate next steps are.
Honestly, I don't > care if you take my patch or someone else's proposed solution, but I'd > like to see something go into the stdlib so that I can eventually stop > having to ship custom code for what is really a standard problem. Something that has been lurking on my to-do list for the past year(!) is to get the urischemes module I wrote based on your uriparse module off the Python patch tracker [1] and into the cheese shop somewhere. It already has limited documentation in the form of docstrings with doctest examples (although the non-doctest examples in the module docstring still need to be fixed), and there are a whole barrel of tests in the _test() function which could be converted to unittest fairly easily. The reason I'd like to see something in the cheese shop rather than going straight into the standard library is that: 1. It may help people now, rather than in 18-24 months when 2.6 comes out 2. The module can see some real world usage to firm up the API before we commit to it for the standard lib (if it gets added at all) That said, I don't see myself finding the roundtuits to publish and promote this anytime soon :( Cheers, Nick. [1] http://sourceforge.net/tracker/?func=detail&aid=1500504&group_id=5470&atid=305470 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From anthony at python.org Wed Nov 1 11:50:32 2006 From: anthony at python.org (Anthony Baxter) Date: Wed, 1 Nov 2006 21:50:32 +1100 Subject: [Python-Dev] RELEASED Python 2.3.6, FINAL Message-ID: <200611012150.44644.anthony@python.org> On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.3.6 (FINAL). Python 2.3.6 is a security bug-fix release. While Python 2.5 is the latest version of Python, we're making this release for people who are still running Python 2.3.
Unlike the recently released 2.4.4, this release only contains a small handful of security-related bugfixes. See the website for more. * Python 2.3.6 contains a fix for PSF-2006-001, a buffer overrun * in repr() of unicode strings in wide unicode (UCS-4) builds. * See http://www.python.org/news/security/PSF-2006-001/ for more. This is a **source only** release. The Windows and Mac binaries of 2.3.5 were built with UCS-2 unicode, and are therefore not vulnerable to the problem outlined in PSF-2006-001. The PCRE fix is for a long-deprecated module (you should use the 're' module instead) and the email fix can be obtained by downloading the standalone version of the email package. Most vendors who ship Python should have already released a patched version of 2.3.5 with the above fixes; this release is for people who need or want to build their own release, but don't want to mess around with patch or svn. There have been no changes (apart from the version number) since the release candidate of 2.3.6. Python 2.3.6 will complete python.org's response to PSF-2006-001. If you're still on Python 2.2 for some reason and need to work with UCS-4 unicode strings, please obtain the patch from the PSF-2006-001 security advisory page. Python 2.4.4 and Python 2.5 have both already been released and contain the fix for this security problem. For more information on Python 2.3.6, including download links for source archives, release notes, and known issues, please see: http://www.python.org/2.3.6 Highlights of this new release include: - A fix for PSF-2006-001, a bug in repr() for unicode strings on UCS-4 (wide unicode) builds. - Two other, less critical, security fixes. Enjoy this release, Anthony Anthony Baxter anthony at python.org Python Release Manager (on behalf of the entire python-dev team)
From jml at mumak.net Wed Nov 1 12:57:43 2006 From: jml at mumak.net (Jonathan Lange) Date: Wed, 1 Nov 2006 22:57:43 +1100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: On 11/1/06, Georg Brandl wrote: > glyph at divmod.com wrote: > > On 03:14 am, sluggoster at gmail.com wrote: > > > > >One thing is sure -- we urgently need something better than os.path. > > >It functions well but it makes hard-to-read and unpythonic code. > > > > I'm not so sure. The need is not any more "urgent" today than it was 5 > > years ago, when os.path was equally "unpythonic" and unreadable. The > > problem is real but there is absolutely no reason to hurry to a > > premature solution. > > > > I've already recommended Twisted's twisted.python.filepath module as a > > possible basis for the implementation of this feature. I'm sorry I > > don't have the time to pursue that. I'm also sad that nobody else seems > > to have noticed. Twisted's implementation has an advantage that it > > doesn't seem that these new proposals do, an advantage I would really > > like to see in whatever gets seriously considered for adoption: > > Looking at > , > it seems as if FilePath was made to serve a different purpose than what we're > trying to discuss here: > > """ > I am a path on the filesystem that only permits 'downwards' access. > > Instantiate me with a pathname (for example, > FilePath('/home/myuser/public_html')) and I will attempt to only provide access > to files which reside inside that path. [...] > > The correct way to use me is to instantiate me, and then do ALL filesystem > access through me.
> """ > > What a successor to os.path needs is not security, it's a better (more pythonic, > if you like) interface to the old functionality. > Then let us discuss that. Is FilePath actually a better interface to the old functionality? Even if it was designed to solve a security problem, it might prove to be an extremely useful general interface. jml From fredrik at pythonware.com Wed Nov 1 13:10:21 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 13:10:21 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: Jonathan Lange wrote: > Then let us discuss that. Glyph's references to bike sheds went right over your head, right? From exarkun at divmod.com Wed Nov 1 15:09:48 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Wed, 1 Nov 2006 09:09:48 -0500 Subject: [Python-Dev] Path object design In-Reply-To: Message-ID: <20061101140948.20948.1876757841.divmod.quotient.8952@ohm> On Wed, 01 Nov 2006 11:06:14 +0100, Georg Brandl wrote: >glyph at divmod.com wrote: >> On 03:14 am, sluggoster at gmail.com wrote: >> >> >One thing is sure -- we urgently need something better than os.path. >> >It functions well but it makes hard-to-read and unpythonic code. >> >> I'm not so sure. The need is not any more "urgent" today than it was 5 >> years ago, when os.path was equally "unpythonic" and unreadable. The >> problem is real but there is absolutely no reason to hurry to a >> premature solution. >> >> I've already recommended Twisted's twisted.python.filepath module as a >> possible basis for the implementation of this feature. I'm sorry I >> don't have the time to pursue that. I'm also sad that nobody else seems >> to have noticed. 
Twisted's implementation has an advantage that it >> doesn't seem that these new proposals do, an advantage I would really >> like to see in whatever gets seriously considered for adoption: > >Looking at >, >it seems as if FilePath was made to serve a different purpose than what we're >trying to discuss here: > >""" >I am a path on the filesystem that only permits 'downwards' access. > >Instantiate me with a pathname (for example, >FilePath('/home/myuser/public_html')) and I will attempt to only provide access >to files which reside inside that path. [...] > >The correct way to use me is to instantiate me, and then do ALL filesystem >access through me. >""" > >What a successor to os.path needs is not security, it's a better (more pythonic, >if you like) interface to the old functionality. No. You've misunderstood the code you looked at. FilePath serves exactly the purpose being discussed here. Take a closer look. Jean-Paul From wbaxter at gmail.com Wed Nov 1 16:41:06 2006 From: wbaxter at gmail.com (Bill Baxter) Date: Wed, 1 Nov 2006 15:41:06 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <454859A6.4050904@v.loewis.de> Message-ID: Martin v. Löwis <martin at v.loewis.de> writes: > > Bill Baxter schrieb: > > Basically in my code I want to be able to take the binary data descriptor and > > say "give me the 'r' field of this pixel as an integer". > > > > Is either one (the PEP or c-types) clearly easier to use in this case? > > What > > would the code look like for handling both formats generically? > > The PEP, as specified, does not support accessing individual fields from > Python. OTOH, ctypes, as implemented, does. This comparison is not fair, > though: an *implementation* of the PEP (say, NumPy) might also give you > Python-level access to the fields. I see. So at the Python-user convenience level it's pretty much a wash. Are there significant differences in memory usage and/or performance?
ctypes sounds to be more heavyweight from the discussion. If I have a lot of image formats I want to support, is that going to mean lots of overhead with ctypes? Do I pay for it whether or not I actually end up having to handle an image in a given format? > With the PEP, you can get access to the 'r' field from C code. > Performing this access is quite tedious; as I'm uncertain whether you > actually wanted to see C code, I refrain from trying to formulate it. Actually this is more what I was after. I've written C code to interface with Numpy arrays and found it to be not so bad. But the data I was passing around was just a plain N-dimensional array of doubles. Very basic. It *sounds* like what Travis is saying is that handling a less simple case, like the one above of supporting a variety of RGB image formats, would be easier with the PEP than with ctypes. Or maybe it's generating the data in my C code that's trickier, as opposed to consuming it? I'm just trying to understand what the deal is, and at the same time perhaps inject a more concrete example into the discussion. Travis has said several times that working with ctypes, which requires a Python type per 'element', is more complicated from the C side, and I'd like to see more concretely how so, as someone who may end up needing to write such code. And I'm ok without seeing the actual code if someone can actually answer my question. The question is not whether it is tedious or not -- everything about the Python C API is tedious from what I've seen. The question is which is *more* tedious, and how significant is the difference in tediousness to the guy whose job it is to actually write the code. --bb From oliphant.travis at ieee.org Wed Nov 1 17:06:00 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 09:06:00 -0700 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information.
In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Terry Reedy wrote: > >> I believe that at present PyGame can only work with external images that it >> is programmed to know how to import. My guess is that if image source >> program X (such as PIL) described its data layout in a way that NumPy could >> read and act on, the import/copy step could be eliminated. > > I wish you all stopped using PIL as an example in this discussion; > for PIL 2, I'm moving towards an entirely opaque data model, with a > "data view"-style client API. That's an unreasonable request. The point of the buffer protocol is to allow people to represent their data in whatever way they like internally but still share it in a standard way. The extended buffer protocol allows sharing of the shape of the data and its format in a standard way as well. We just want to be able to convert the data in PIL objects to other Python objects without having to write special "converter" functions. It's not important how PIL or PIL 2 stores the data as long as it participates in the buffer protocol. Of course if the memory layout were compatible with the model of NumPy, then data-copies would not be required, but that is really secondary. -Travis From glyph at divmod.com Wed Nov 1 17:09:10 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 16:09:10 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> On 10:06 am, g.brandl at gmx.net wrote: >What a successor to os.path needs is not security, it's a better (more pythonic, >if you like) interface to the old functionality. Why? I assert that it needs a better[1] interface because the current interface can lead to a variety of bugs through idiomatic, apparently correct usage. All the more because many of those bugs are related to critical errors such as security and data integrity.
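The sort of bug glyph means can be sketched in a few lines; the document root and the request paths below are invented for illustration:

```python
import os.path

WEB_ROOT = "/var/www/public"   # hypothetical document root

def resolve(user_supplied):
    # Idiomatic, apparently correct -- and wrong: os.path.join happily
    # lets ".." components (or an absolute path) escape the root.
    return os.path.normpath(os.path.join(WEB_ROOT, user_supplied))

print(resolve("index.html"))           # /var/www/public/index.html
print(resolve("../../../etc/passwd"))  # /etc/passwd -- escaped the root!

def safe_resolve(user_supplied):
    # One common repair: resolve first, then verify the result is
    # still inside the root before touching the filesystem.
    candidate = os.path.normpath(os.path.join(WEB_ROOT, user_supplied))
    if not candidate.startswith(WEB_ROOT + os.sep):
        raise ValueError("path escapes document root")
    return candidate
```

This is exactly the check that an OO layer like FilePath's 'downwards-only' child() performs for you, instead of leaving it to every call site.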
If I felt the current interface did a good job at doing the right thing in the right situation, but was cumbersome to use, I would strenuously object to _any_ work taking place to change it. This is a hard API to get right. [1]: I am rather explicitly avoiding the word "pythonic" here. It seems to have grown into a shibboleth (and its counterpart, "unpythonic", into an expletive). I have the impression it used to mean something a bit more specific, maybe adherence to Tim Peters' "Zen" (although that was certainly vague enough by itself and not always as self-evidently true as some seem to believe). More and more, now, though, I hear it used to mean 'stuff should be more betterer!' and then everyone nods sagely because we know that no filthy *java* programmer wants things to be more betterer; *we* know *they* want everything to be horrible. Words like this are a pet peeve of mine though, so perhaps I am overstating the case. Anyway, moving on... as long as I brought up the Zen, perhaps a particular couplet is appropriate here: Now is better than never. Although never is often better than *right* now. Rushing to a solution to a non-problem, e.g. the "pythonicness" of the interface, could exacerbate a very real problem, e.g. the security and data-integrity implications of idiomatic usage. Granted, it would be hard to do worse than os.path, but it is by no means impossible (just look at any C program!), and I can think of a couple of kinds of API which would initially appear more convenient but actually prove more problematic over time. That brings me back to my original point: the underlying issue here is too important a problem to get wrong *again* on the basis of a superficial "need" for an API that is "better" in some unspecified way. os.path is at least possible to get right if you know what you're doing, which is no mean feat; there are many path-manipulation libraries in many languages which cannot make that claim (especially portably). 
Its replacement might not be. Getting this wrong outside the standard library might create problems for some people, but making it worse _in_ the standard library could create a total disaster for everyone. I do believe that this wouldn't get past the dev team (least of all the release manager) but it would waste a lot less of everyone's time if we focused the inevitable continuing bike-shed discussion along the lines of discussing the known merits of widely deployed alternative path libraries, or at least an approach to *get* that data on some new code if there is consensus that existing alternatives are in some way inadequate. If for some reason it _is_ deemed necessary to go with an untried approach, I can appreciate the benefits that /F has proposed of trying to base the new interface entirely and explicitly off the old one. At least that way it will still definitely be possible to get right. There are problems with that too, but they are less severe. From oliphant.travis at ieee.org Wed Nov 1 17:44:27 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 09:44:27 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP Message-ID: Thanks for all the comments that have been given on the data-type (data-format) PEP. I'd like opinions on an idea for revising the PEP I have. What if we look at this from the angle of trying to communicate data-formats between different libraries (not change the way anybody internally deals with data-formats). For example, ctypes has one way to internally deal with data-formats (using type objects). NumPy/Numeric has a way to internally deal with data-formats (using PyArray_Descr * structure -- in Numeric it's just a C-structure but in NumPy it's fleshed out further and also a Python object called the data-type).
Numarray has a way to internally deal with data-formats (using type objects). The array module has a way to internally deal with data-formats (using a PyArray_Descr * structure -- and character codes to select one). The struct module deals with data-formats using character codes. The PIL deals with data-formats using image modes. PyVTK deals with data-formats using its own internal objects. MPI deals with data-formats using its own MPI_DataType structures. This list goes on and on. What I claim is needed in Python (to make it better glue) is to have a standard way to communicate data-format information between these extensions. Then, you don't have to build in support for all the different ways data-formats are represented by different libraries. Each library only has to be able to translate its representation to the standard way that Python uses to represent data-format. How is this goal going to be achieved? That is the real purpose of the data-type object I previously proposed. Nick showed that there are two (non-orthogonal) ways to think about this goal. 1) We could define a special string-syntax (or list syntax) that covers every special case. The array interface specification goes this direction and it requires no new Python types. This could also be seen as an extension of the "struct" module to allow for nested structures, etc. 2) We could define a Python object that specifically carries data-format information. There is also a third way (or really 2b) that has been mentioned: take one of the extensions and use what it does to communicate data-format between objects and require all other extensions to conform to that standard. The problem with 2b is that what works inside an extension module may not be the best option when it comes to communicating across multiple extension modules. Certainly none of the extension modules have argued that case effectively. Does that explain the goal of what I'm trying to do better?
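To make the two options concrete, here is a small sketch of what each might look like for an RGB pixel; both the list syntax and the descriptor class are illustrative inventions, not anything the pre-PEP specifies:

```python
# Option 1: a plain list/string syntax, in the spirit of the struct
# module and the array interface -- no new Python type required.
pixel_format = [('r', 'u1'), ('g', 'u1'), ('b', 'u1')]

# Option 2: a dedicated Python object carrying the same information.
class DataFormat:
    """Hypothetical descriptor object for a packed record."""
    def __init__(self, fields):
        self.fields = list(fields)            # [(name, typecode), ...]
        sizes = {'u1': 1, 'i4': 4, 'f8': 8}   # assumed typecode sizes
        self.itemsize = sum(sizes[code] for _name, code in self.fields)

fmt = DataFormat(pixel_format)
print(fmt.itemsize)   # -> 3
```

The trade-off sketched here is the one discussed in the thread: with option 1, every consuming library parses the description itself; with option 2, the parsing and derived quantities (like itemsize) live in one shared object.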
From oliphant.travis at ieee.org Wed Nov 1 17:58:05 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 09:58:05 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: Travis E. Oliphant wrote: > Thanks for all the comments that have been given on the data-type > (data-format) PEP. I'd like opinions on an idea for revising the PEP I > have. > > 1) We could define a special string-syntax (or list syntax) that covers > every special case. The array interface specification goes this > direction and it requires no new Python types. This could also be seen > as an extension of the "struct" module to allow for nested structures, etc. > > 2) We could define a Python object that specifically carries data-format > information. > > > Does that explain the goal of what I'm trying to do better? In other words, what I'm saying is I really want a PEP that does this. Could we have a discussion about what the best way to communicate data-format information across multiple extension modules would look like? I'm not saying my (pre-)PEP is best. The point of putting it in its infant state out there is to get the discussion rolling, not to claim I've got all the answers. It seems like there are enough people who have dealt with this issue that we ought to be able to put something very useful together that would make Python much better glue. -Travis From martin at v.loewis.de Wed Nov 1 18:48:45 2006 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 01 Nov 2006 18:48:45 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <4548DDFD.5030604@v.loewis.de> Travis E. Oliphant schrieb: > What if we look at this from the angle of trying to communicate > data-formats between different libraries (not change the way anybody > internally deals with data-formats). ISTM that this is not the right approach.
If the purpose of the datatype object is just to communicate the layout in the extended buffer interface, then it should be specified in that PEP, rather than being stand-alone, and it should not pretend to serve any other purpose. Or, if it does have uses independent of the buffer extension: what are those uses? > 1) We could define a special string-syntax (or list syntax) that covers > every special case. The array interface specification goes this > direction and it requires no new Python types. This could also be seen > as an extension of the "struct" module to allow for nested structures, etc. > > 2) We could define a Python object that specifically carries data-format > information. To distinguish between these, convenience of usage (and of construction) should be taken into account. At least for the preferred alternative, but better for the runners-up, too, there should be a demonstration of how existing modules have to be changed to support it (e.g. for the struct and array modules as producers; not sure what good consumer code would be). Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B), what would the C code look like that does that? (it seems now that the primary purpose of this machinery is image manipulation) > The problem with 2b is that what works inside an extension module may > not be the best option when it comes to communicating across multiple > extension modules. Certainly none of the extension modules have argued > that case effectively. I think there are two ways in which one option could be "better" than the other: it might be more expressive, and it might be easier to use. For the second aspect (ease of use), there are two sub-aspects: it might be easier to produce, or it might be easier to consume.
Regards, Martin From g.brandl at gmx.net Wed Nov 1 19:04:27 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 01 Nov 2006 19:04:27 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> References: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > On 10:06 am, g.brandl at gmx.net wrote: > >What a successor to os.path needs is not security, it's a better (more > pythonic, > >if you like) interface to the old functionality. > > Why? > > I assert that it needs a better[1] interface because the current > interface can lead to a variety of bugs through idiomatic, apparently > correct usage. All the more because many of those bugs are related to > critical errors such as security and data integrity. AFAICS, people just want an interface that is easier to use and feels more... err... (trying to avoid the p-word). I've never seen security arguments being made in this discussion. > If I felt the current interface did a good job at doing the right thing > in the right situation, but was cumbersome to use, I would strenuously > object to _any_ work taking place to change it. This is a hard API to > get right. Well, it's hard to change any running system with that attitude. It doesn't have to be changed if nobody comes up with something that's agreed (*) to be better. 
(*) agreed in the c.l.py sense, of course Georg From fredrik at pythonware.com Wed Nov 1 19:14:15 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 19:14:15 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> References: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > I assert that it needs a better[1] interface because the current > interface can lead to a variety of bugs through idiomatic, apparently > correct usage. All the more because many of those bugs are related to > critical errors such as security and data integrity. instead of referring to some esoteric knowledge about file systems that us non-twisted-using mere mortals may not be evolved enough to understand, maybe you could just make a list of common bugs that may arise due to idiomatic use of the existing primitives? I promise to make a nice FAQ entry out of it, with proper attribution. From jimjjewett at gmail.com Wed Nov 1 19:17:42 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 1 Nov 2006 13:17:42 -0500 Subject: [Python-Dev] PEP: Adding data-type objects to Python Message-ID: I'm still not sure exactly what is missing from ctypes. To make this concrete: You have an array of 500 elements meeting

    struct { int simple;
             struct nested { char name[30];
                             char addr[45];
                             int amount;
                           } nested;
           };

ctypes can describe this as

    class nested(Structure):
        _fields_ = [("name", c_char*30), ("addr", c_char*45), ("amount", c_long)]

    class struct(Structure):
        _fields_ = [("simple", c_int), ("nested", nested)]

    desc = struct * 500

You have said that creating whole classes is too much overhead, and the description should only be an instance. To me, that particular class (arrays of 500 structs) still looks pretty lightweight. So please clarify when it starts to be a problem.
(1) For simple types -- mapping char name[30]; ==> ("name", c_char*30) Do you object to using the c_char type? Do you object to the array-of-length-30 class, instead of just having a repeat or shape attribute? Do you object to naming the field? (2) For the complex types, nested and struct Do you object to creating these two classes even once? For example, are you expecting to need different classes for each buffer, and to have many buffers created quickly? Is creating that new class a royal pain, but frequent (and slow) enough that you can't just make a call into python (or ctypes)? (3) Given that you will describe X, is X*500 (==> a type describing an array of 500 Xs) a royal pain in C? If so, are you expecting to have to do it dynamically for many sizes, and quickly enough that you can't just let ctypes do it for you? -jJ From oliphant.travis at ieee.org Wed Nov 1 19:30:07 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 11:30:07 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <4548DDFD.5030604@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Travis E. Oliphant schrieb: >> What if we look at this from the angle of trying to communicate >> data-formats between different libraries (not change the way anybody >> internally deals with data-formats). > > ISTM that this is not the right approach. If the purpose of the datatype > object is just to communicate the layout in the extended buffer > interface, then it should be specified in that PEP, rather than being > stand-alone, and it should not pretend to serve any other purpose. I'm actually quite fine with that. If that is the consensus, then I will just go that direction. ISTM though that since we are putting forth the trouble inside the extended buffer protocol we might as well be as complete as we know how to be. > Or, if it does have uses independent of the buffer extension: what > are those uses? 
So that NumPy and ctypes and audio libraries and video libraries and database libraries and image-file format libraries can communicate about data-formats using the same expressions (in Python). Maybe we decide that ctypes-based expressions are a very good way to communicate about those things in Python for all other packages. If that is the case, then I argue that we ought to change the array module, and the struct module to conform (of course keeping the old ways for backward compatibility) and set the standard for other packages to follow. What problem do you have in defining a standard way to communicate about binary data-formats (not just images)? I still can't figure out why you are so resistant to the idea. MPI had to do it. > >> 1) We could define a special string-syntax (or list syntax) that covers >> every special case. The array interface specification goes this >> direction and it requires no new Python types. This could also be seen >> as an extension of the "struct" module to allow for nested structures, etc. >> >> 2) We could define a Python object that specifically carries data-format >> information. > > To distinguish between these, convenience of usage (and of construction) > should have to be taken into account. At least for the preferred > alternative, but better for the runners-up, too, there should be a > demonstration on how existing modules have to be changed to support it > (e.g. for the struct and array modules as producers; not sure what > good consumer code would be). Absolutely --- if something is to be made useful across packages and from Python. This is where the discussion should take place. The struct module and array modules would both be consumers as well, so that in the struct module you could specify your structure in terms of the standard data-representation and in the array module you could specify your array in terms of the standard representation instead of using "character codes".
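The duplication Travis is pointing at is visible in today's stdlib: the array and struct modules each have their own mini-language for the very same layout. A small illustration (modern Python spellings):

```python
import array
import struct

# The same data-format, spelled two different ways today:
a = array.array("d", [1.0, 2.0])     # array module: typecode "d" (C double)
s = struct.pack("2d", 1.0, 2.0)      # struct module: format string "2d"

# Identical bytes, two unrelated description languages -- exactly the
# redundancy a shared data-format object would remove.
assert a.tobytes() == s
```

A shared data-format object would let both constructors accept one description instead of each defining its own codes.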
> > Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B), > what would the C code look like that does that? (it seems now that the > primary purpose of this machinery is image manipulation) > For me it is definitely not image manipulation that is the only purpose (or even the primary purpose). It's just an easy one to explain --- (most people understand images). But, I think this question is actually irrelevant (IMHO). To me, how you change all RGB values to gray would depend on the library you are using, not on how data-formats are expressed. Maybe we are still misunderstanding each other. If you really want to know: in NumPy it might look like this. Python code:

    img['r'] = img['g']
    img['b'] = img['g']

C-code: use the Python C-API to do essentially the same thing as above, or to do img['r'] = img['g']:

    dtype = img->descr;
    r_field = PyDict_GetItemString(dtype, "r");
    g_field = PyDict_GetItemString(dtype, "g");
    r_field_dtype = PyTuple_GET_ITEM(r_field, 0);
    r_field_offset = PyTuple_GET_ITEM(r_field, 1);
    g_field_dtype = PyTuple_GET_ITEM(g_field, 0);
    g_field_offset = PyTuple_GET_ITEM(g_field, 1);
    obj = PyArray_GetField(img, g_field_dtype, g_field_offset);
    Py_INCREF(r_field_dtype);
    PyArray_SetField(img, r_field_dtype, r_field_offset, obj);

But, I still don't see how that is relevant to the question of how to represent the data-format to share that information across two extensions. >> The problem with 2b is that what works inside an extension module may >> not be the best option when it comes to communicating across multiple >> extension modules. Certainly none of the extension modules have argued >> that case effectively. > > I think there are two ways in which one option could be "better" than > the other: it might be more expressive, and it might be easier to use. > For the second aspect (ease of use), there are two sub-aspects: it might > be easier to produce, or it might be easier to consume. I like this as a means to judge a data-format representation.
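Martin's RGB-to-gray question can also be answered at the level the extended buffer protocol works at: raw memory plus a field description. Here is a stdlib-only sketch of what a consumer holding just the bytes and a {name: (format, offset)} table would do; the helper name and the table layout are invented for illustration, not any library's API:

```python
import struct

# A tiny pixel buffer: 4 RGB pixels, one unsigned byte per channel.
buf = bytearray(b"\x10\x80\x30" * 4)

# A field description a consumer might receive: name -> (struct format, offset).
fields = {"r": ("B", 0), "g": ("B", 1), "b": ("B", 2)}
record_size = 3

def copy_field(buf, dst, src):
    """Copy one field onto another across every record -- the moral
    equivalent of NumPy's img[dst] = img[src], done by hand."""
    (dfmt, doff), (sfmt, soff) = fields[dst], fields[src]
    for base in range(0, len(buf), record_size):
        val = struct.unpack_from(sfmt, buf, base + soff)[0]
        struct.pack_into(dfmt, buf, base + doff, val)

copy_field(buf, "r", "g")
copy_field(buf, "b", "g")
# every pixel is now gray: r == g == b
```

The point of a shared data-format object is that the `fields` table above could arrive from any producer, not be hard-coded per library.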
Let me summarize to see if I understand: 1) Expressive (does it express every data-format you might want or need) 2) Ease of use a) Production: How easy is it to create the representation. b) Consumption: How easy is it to interpret the representation. -Travis From brett at python.org Wed Nov 1 20:17:56 2006 From: brett at python.org (Brett Cannon) Date: Wed, 1 Nov 2006 11:17:56 -0800 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: <4548F1FD.5010505@sympatico.ca> References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> <4548F1FD.5010505@sympatico.ca> Message-ID: On 11/1/06, Stefan Seefeld wrote: > > Brett Cannon wrote: > > On 11/1/06, Stefan Seefeld wrote: > > >> Right. Brett, do we need accounts on python.org for this ? > > > > > > Yep. It just requires SSH 2 keys from each of you. You can then email > > python-dev with those keys and your first.last name and someone there > will > > install the keys for you. > > My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld. > > Thanks ! Just to clarify, this is not for pydotorg but the svn.python.org. The admins for our future Roundup instance are going to keep their Roundup code in svn so they need commit access. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/e2947c52/attachment.html From oliphant.travis at ieee.org Wed Nov 1 19:50:16 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 11:50:16 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: <4548EC68.1020505@ieee.org> Jim Jewett wrote: > I'm still not sure exactly what is missing from ctypes. 
> To make this concrete: I think the only thing missing from ctypes "expressiveness" as far as I can tell in terms of what you "can" do is the byte-order representation. What is missing is ease of use for producers and consumers in interpreting the data-type. When I speak of producers and consumers, I'm largely talking about C-code (or Java or .NET) code writers. Producers must basically use Python code to create classes of various types. This is going to be slow in 'C'. Probably slower than the array interface (which is what we have now, informally). Consumers are going to have a hard time interpreting the result. I'm not even sure how to do that, in fact. I'd like NumPy to be able to understand ctypes as a means to specify data. Would I have to check against all the sub-types of CDataType, pull out the fields, check the tp_name of the type object? I'm not sure. It seems like a string with the C-structure would be better as a data-representation, but then a third-party library would want to parse that, so Python might as well have its own parser for data-types. So, Python might as well have its own way to describe data. My claim is this default way should *not* be overloaded by using Python type-objects (the ctypes way). I'm making the claim that the NumPy way of using a different Python object to describe data-types is the right one. I'm not saying the NumPy object should be used. I'm saying we should come up with a single DataFormatType whose instances express the data formats in ways that other packages can produce and consume (or even use internally). It would be easy for NumPy to "use" the default Python object in its PyArray_Descr * structure. It would also be easy for ctypes to "use" the default Python object in its StgDict object that is the tp_dict of every ctypes type object. It would be easy for the struct module to allow for this data-format object (instead of just strings) in its methods.
It would be easy for the array module to accept this data-format object (instead of just typecodes) in its constructor. Lots of things would suddenly be more consistent throughout both the Python and C-Python user space. Perhaps after discussion, it becomes clear that the ctypes approach is sufficient to be "that thing" that all modules use to share data-format information. It's definitely expressive enough. But, my argument is that NumPy data-type objects are also "pretty close," so why should they be rejected? We could also make a "string-syntax" do it. > > You have said that creating whole classes is too much overhead, and > the description should only be an instance. To me, that particular > class (arrays of 500 structs) still looks pretty lightweight. So > please clarify when it starts to be a problem. > > (1) For simple types -- mapping > char name[30]; ==> ("name", c_char*30) > > Do you object to using the c_char type? > Do you object to the array-of-length-30 class, instead of just having > a repeat or shape attribute? > Do you object to naming the field? > > (2) For the complex types, nested and struct > > Do you object to creating these two classes even once? For example, > are you expecting to need different classes for each buffer, and to > have many buffers created quickly? I object to the way I "consume" and "produce" the ctypes interface. It's much too slow to be used on the C-level for sharing many small buffers quickly. > > Is creating that new class a royal pain, but frequent (and slow) > enough that you can't just make a call into python (or ctypes)? > > (3) Given that you will describe X, is X*500 (==> a type describing > an array of 500 Xs) a royal pain in C? If so, are you expecting to > have to do it dynamically for many sizes, and quickly enough that you > can't just let ctypes do it for you? That pretty much sums it up (plus the pain of having to basically write Python code from "C").
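For reference, Jim's ctypes description runs essentially as written; a hedged, runnable version follows (class names adjusted so the struct module is not shadowed). The last two lines also show what "writing Python code from C" amounts to for a producer: invoking the class machinery dynamically.

```python
import ctypes

class Nested(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char * 30),
                ("addr", ctypes.c_char * 45),
                ("amount", ctypes.c_long)]

class Record(ctypes.Structure):            # "struct" in Jim's sketch, renamed
    _fields_ = [("simple", ctypes.c_int),
                ("nested", Nested)]

desc = Record * 500                        # a type describing 500 records
assert ctypes.sizeof(desc) == 500 * ctypes.sizeof(Record)

# A producer without source-level classes has to build the description
# dynamically -- i.e. call the type machinery, which is what a C extension
# would have to do through the Python C-API:
Dyn = type("Dyn", (ctypes.Structure,),
           {"_fields_": [("amount", ctypes.c_long)]})
assert ctypes.sizeof(Dyn) == ctypes.sizeof(ctypes.c_long)
```

Note that the total size is platform-dependent because of alignment padding, which is part of what a data-format description has to capture.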
-Travis From jimjjewett at gmail.com Wed Nov 1 20:35:33 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 1 Nov 2006 14:35:33 -0500 Subject: [Python-Dev] Path object design Message-ID: On 10:06 am, g.brandl at gmx.net wrote: >> What a successor to os.path needs is not security, it's a better (more pythonic, >> if you like) interface to the old functionality. Glyph: > Why? > Rushing ... could exacerbate a very real problem, e.g. > the security and data-integrity implications of idiomatic usage. The proposed Path object (or new path module) is intended to replace os.path. If it can't do the equivalent of "cd ..", then it isn't a replacement; it is just another similar alternative to confuse beginners. If you're saying that a webserver should use a more restricted subclass (or even the existing FilePath alternative), then I agree. I'll even agree that a restricted version would ideally be available out of the box. I don't think it should be the only option. -jJ From brett at python.org Wed Nov 1 20:36:56 2006 From: brett at python.org (Brett Cannon) Date: Wed, 1 Nov 2006 11:36:56 -0800 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: <87slh3vuk0.fsf@uterus.efod.se> References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> <4548F1FD.5010505@sympatico.ca> <87slh3vuk0.fsf@uterus.efod.se> Message-ID: On 11/1/06, Erik Forsberg wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > "Brett Cannon" writes: > > > On 11/1/06, Stefan Seefeld wrote: > >> > >> Brett Cannon wrote: > >> > On 11/1/06, Stefan Seefeld wrote: > >> > >> >> Right. Brett, do we need accounts on python.org for this ? > >> > > >> > > >> > Yep. It just requires SSH 2 keys from each of you. 
You can then > email > >> > python-dev with those keys and your first.last name and someone there > >> will > >> > install the keys for you. > >> > >> My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan > Seefeld. > >> > >> Thanks ! > > > > > > Just to clarify, this is not for pydotorg but the svn.python.org. The > > admins for our future Roundup instance are going to keep their Roundup > code > > in svn so they need commit access. > > Now when that's clarified, here's my data: > > Public SSH key: http://efod.se/about/ptkey.pub > First.Lastname: erik.forsberg > > I'd appreciate if someone with good taste could tell us where in the > tree we should add our code :-). Right at the root: ``svn+ssh://pythondev at svn.python.org/tracker`` (or replace "tracker" without whatever name you guys want to go with). This is because the tracker code is conceptually its own project. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/95b687e6/attachment.htm From oliphant.travis at ieee.org Wed Nov 1 20:38:01 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 12:38:01 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: Jim Jewett wrote: > I'm still not sure exactly what is missing from ctypes. To make this concrete: I was too hasty. There are some things actually missing from ctypes: 1) long double (this is not the same across platforms, but it is a data-type). 2) complex-valued types (you might argue that it's just a 2-array of floats, but you could say the same thing about int as an array of bytes). The point is how do people interpret the data. Complex-valued data-types are very common. It is one reason Fortran is still used by scientists. 3) Unicode characters (there is w_char support but I mean a way to describe what kind of unicode characters you have in a cross-platform way). 
I actually think we have a way to describe encodings in the data-format representation as well. 4) What about floating-point representations that are not IEEE 754 4-byte or 8-byte? There should be a way to at least express the data-format in these cases (this is actually how long double should be handled as well, since what is actually done with the extra bits varies across platforms). So, we can't "just use ctypes" as a complete data-format representation because it's also missing some things. What we need is a standard way for libraries that deal with data-formats to communicate with each other. I need help with a PEP like this and that's what I'm asking for. It's all I've really been after all along. A couple of points: * One reason to support the idea of the Python object approach (versus a string-syntax) is that it "is already parsed". A list-syntax approach (perhaps built from strings for fundamental data-types) might also be considered "already parsed" as well. * One advantage of using "kind" versus a character for every type (like struct and array do) is that it helps consumers and producers speed up the parser (a fuller branching tree). -Travis From martin at v.loewis.de Wed Nov 1 20:49:44 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 20:49:44 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: <4548DDFD.5030604@v.loewis.de> Message-ID: <4548FA58.4050702@v.loewis.de> Travis E. Oliphant schrieb: >> Or, if it does have uses independent of the buffer extension: what >> are those uses? > > So that NumPy and ctypes and audio libraries and video libraries and > database libraries and image-file format libraries can communicate about > data-formats using the same expressions (in Python). I find that puzzling. In what way can the specification of a data type enable communication? Don't you need some kind of protocol for it (i.e. operations to be invoked)?
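The kind of communication Travis seems to have in mind needs only a naming convention, not new operations, much like the array interface mentioned earlier in the thread. A stdlib-only sketch (every name here is invented for illustration):

```python
import struct

class Producer:
    """Hypothetical library A: owns some binary records."""
    def __init__(self):
        self._buf = struct.pack("<ii", 1, 2) * 3

    @property
    def data_interface(self):
        # The agreed convention: expose the bytes plus their format.
        # "data_interface" and the dict keys are made-up names.
        return {"format": "<ii", "data": self._buf}

def consume(obj):
    """Hypothetical library B: knows nothing about A except the convention."""
    spec = obj.data_interface
    size = struct.calcsize(spec["format"])
    buf = spec["data"]
    return [struct.unpack_from(spec["format"], buf, off)
            for off in range(0, len(buf), size)]

print(consume(Producer()))  # [(1, 2), (1, 2), (1, 2)]
```

Once both sides agree on how a data-format is *spelled*, no further protocol is needed for B to interpret A's memory.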
Also, do you mean that these libraries can communicate with each other? Or with somebody else? If so, with whom? > What problem do you have in defining a standard way to communicate about > binary data-formats (not just images)? I still can't figure out why you > are so resistant to the idea. MPI had to do it. I'm afraid of "dead" specifications, things whose only motivation is that they look nice. They are just clutter. There are a few examples of this already in Python, like the character buffer interface or the multi-segment buffers. As for MPI: It didn't just independently define a data types system. Instead, it did that, *and* specified the usage of the data types in operations such as MPI_SEND. It is very clear what the scope of this data description is, and what the intended usage is. Without specifying an intended usage, it is impossible to evaluate whether the specification meets its goals. > Absolutely --- if something is to be made useful across packages and > from Python. This is where the discussion should take place. The > struct module and array modules would both be consumers also so that in > the struct module you could specify your structure in terms of the > standard data-represenation and in the array module you could specify > your array in terms of the standard representation instead of using > "character codes". Ok, that would be a new usage: I expected that datatype instances always come in pairs with memory allocated and filled according to the description. If you are proposing to modify/extend the API of the struct and array modules, you should say so somewhere (in a PEP). >> Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B), >> what would the C code look like that does that? (it seems now that the >> primary purpose of this machinery is image manipulation) >> > > For me it is definitely not image manipulation that is the only purpose > (or even the primary purpose). 
It's just an easy one to explain --- > most people understand images). But, I think this question is actually > irrelevant (IMHO). To me, how you change all RGB values to gray would > depend on the library you are using not on how data-formats are expressed. > > Maybe we are still mis-understanding each other. I expect that the primary readers/users of the PEP would be people who have to write libraries: i.e. people implementing NumPy, struct, array, and people who implement algorithms that operate on data. So usability of the specification is a matter of how easy it is to *write* a library that does perform the image manipulation. > If you really want to know. In NumPy it might look like this: > > Python code: > > img['r'] = img['g'] > img['b'] = img['g'] That's not what I'm asking. Instead, what does the NumPy code look like that gets invoked on these read-and-write operations? Does it only use the void* pointing to the start of the data, and the datatype object? If not, how would C code look like that only has the void* and the datatype object? > dtype = img->descr; In this code, is descr a datatype object? ... > r_field = PyDict_GetItemString(dtype,'r'); ... I guess not, because apparently, it is a dictionary, not a datatype object. > But, I still don't see how that is relevant to the question of how to > represent the data-format to share that information across two extensions. Well, if NumPy gets the data from a different module, it can't assume there is a descr object that is a dictionary. Instead, it must perform these operations just by using the datatype object. What else is the purpose of sharing the information, if not to use it to access the data? 
Regards, Martin From martin at v.loewis.de Wed Nov 1 21:05:28 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 21:05:28 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: <4548FE08.7070402@v.loewis.de> Travis E. Oliphant schrieb: > I was too hasty. There are some things actually missing from ctypes: I think Thomas can correct me if I'm wrong: I think endianness is supported (although this support seems undocumented). There seems to be code that checks for the presence of a _byteswapped_ attribute on fields of a struct; presence of this field is then interpreted as data having the "other" endianness. > 1) long double (this is not the same across platforms, but it is a > data-type). That's indeed missing. > 2) complex-valued types (you might argue that it's just a 2-array of > floats, but you could say the same thing about int as an array of > bytes). The point is how do people interpret the data. Complex-valued > data-types are very common. It is one reason Fortran is still used by > scientists. Well, by the same reasoning, you could argue that pixel values (RGBA) are missing in the PEP. It's a convenience, sure, and it may also help interfacing with the platform's FORTRAN implementation - however, are you sure that NumPy's complex layout is consistent with the platform's C99 _Complex definition? > 3) Unicode characters > > 4) What about floating-point representations that are not IEEE 754 > 4-byte or 8-byte. Both of these are available in a platform-dependent way: if the platform uses non-IEEE754 formats for C float and C double, ctypes will interface with that just fine. It is actually vice versa: IEEE-754 4-byte and 8-byte is not supported in ctypes. Same for Unicode: the platform's wchar_t is supported (as you said), but not a platform-independent (say) 4-byte little-endian. 
Regards, Martin From sluggoster at gmail.com Wed Nov 1 21:14:53 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 12:14:53 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20611011011m3b04225ao3b51b015accfa0a7@mail.gmail.com> References: <6e9196d20611011011m3b04225ao3b51b015accfa0a7@mail.gmail.com> Message-ID: <6e9196d20611011214g5bf63839j24ee976a0a0d4c67@mail.gmail.com> Argh, it's difficult to respond to one topic that's now spiraling into two conversations on two lists. glyph at divmod.com wrote: > On 03:14 am, sluggoster at gmail.com wrote: > > >One thing is sure -- we urgently need something better than os.path. > >It functions well but it makes hard-to-read and unpythonic code. > > I'm not so sure. The need is not any more "urgent" today than it was > 5 years ago, when os.path was equally "unpythonic" and unreadable. > The problem is real but there is absolutely no reason to hurry to a > premature solution. Except that people have had to spend five years putting hard-to-read os.path functions in their code, or reinventing the wheel with their own libraries that they're not sure they can trust. I started to use path.py last year when it looked like it was emerging as the basis of a new standard, but yanked it out again when it was clear the API would be different by the time it's accepted. I've gone back to os.path for now until something stable emerges but I really wish I didn't have to. > I've already recommended Twisted's twisted.python.filepath module as a > possible basis for the implementation of this feature.... > *It is already used in a large body of real, working code, and > therefore its limitations are known.* This is an important consideration. However, to me a clean API is more important. Since we haven't agreed on an API there is no widely-used module that implements it... it's a chicken-and-egg problem since it takes significant time to write and test an implementation.
So I'd like to start from the standpoint of an ideal API rather than just taking the API of the most widely-used implementation. os.path is clearly the most widely-used implementation, but that doesn't mean that OOizing it as-is would be my favorite choice. I took a quick look at filepath. It looks similar in concept to PEP 355. Four concerns:

- unfamiliar method names (createDirectory vs mkdir, child vs join)
- basename/dirname/parent are methods rather than properties: leads to () overproliferation in user code.
- the "secure" features may not be necessary. If they are, this should be a separate discussion, and perhaps implemented as a subclass.
- stylistic objection to verbose camelCase names like createDirectory

> Proposals for extending the language are contentious and it is very > difficult to do experimentation with non-trivial projects because > nobody wants to do that and then end up with a bunch of code written > in a language that is no longer supported when the experiment fails. True. > Path representation is a bike shed. Nobody would have proposed > writing an entirely new embedded database engine for Python: python > 2.5 simply included SQLite because its utility was already proven. There's a quantum level of difference between path/file manipulation -- which has long been considered a requirement for any full-featured programming language -- and a database engine which is much more complex. Georg Brandl wrote: > I have been a supporter of the full-blown Path object in the past, but the > recent discussions have convinced me that it is just too big and too confusing, > and that you can't kill too many birds with one stone in this respect. > Most of the ugliness really lies in the path name manipulation functions, which > nicely map to methods on a path name object. Fredrik has convinced me that it's more urgent to OOize the pathname conversions than the filesystem operations.
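The pathname-only wrapper under discussion can be sketched in a few lines: a str subclass whose methods just delegate to the os.path functions, so it mixes freely with plain strings. The method names here are illustrative, not any PEP's actual API; posixpath is used explicitly so the example behaves the same on every platform:

```python
import posixpath

class Path(str):
    """Minimal sketch of a pathname-algebra wrapper (hypothetical API)."""
    def join(self, *parts):
        return Path(posixpath.join(self, *parts))

    def dirname(self):
        return Path(posixpath.dirname(self))

    def basename(self):
        return Path(posixpath.basename(self))

    def splitext(self):
        root, ext = posixpath.splitext(self)
        return Path(root), ext

p = Path("/usr/local").join("lib", "site.py")
assert p == "/usr/local/lib/site.py"   # still usable anywhere a str is
assert p.basename() == "site.py"
assert p.splitext()[1] == ".py"
```

Because it subclasses str, nothing else in a program has to know the wrapper exists, which is what makes this kind of change backward compatible.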
Pathname conversions are the ones that frequently get nested or chained, whereas filesystem operations are usually done at the top level of a program statement, or return a different "kind" of value (stat, true/false, etc). However, it's interesting that all the proposals I've seen in the past three years have been a "monolithic" OO class. Clearly there are a lot of people who prefer this way, or at least have never heard of anything different. Where have all the proponents of non-OO or limited-OO strategies been? The first proposal of that sort I've seen was Nick Coghlan's, on October 1. Have y'all just been ignoring the monolithic OO efforts without offering any alternatives? Fredrik Lundh wrote: > > This is fully backwards compatible, can go right into 2.6 without > > breaking anything, allows people to update their code as they go, > > and can be incrementally improved in future releases: > > > > 1) Add a pathname wrapper to "os.path", which lets you do basic > > path "algebra". This should probably be a subclass of unicode, > > and should *only* contain operations on names. > > > > 2) Make selected "shutil" operations available via the "os" name- > > space; the old POSIX API vs. POSIX SHELL distinction is pretty > > irrelevant. Also make the os.path predicates available via the > > "os" namespace. > > > > This gives a very simple conceptual model for the user; to manipulate > > path *names*, use "os.path.(string)" functions or the "" > > wrapper. To manipulate *objects* identified by a path, given either as > > a string or a path wrapper, use "os.(path)". This can be taught in > > less than a minute. Making this more concrete, I think Fredrik is suggesting: - Make (os.path) abspath, basename, commonprefix, dirname, expanduser, expandvars, isabs, join, normcase, normpath, split, splitdrive, splitext, splitunc methods of a Path object.
- Copy functions into os: (os.path) exists, lexists, get{atime,mtime,ctime,size}, is{file,dir,link,mount}, realpath, samefile, sameopenfile, samestat, (shutil) copy, copy2, copy{file,fileobj,mode,stat,tree}, rmtree, move. - Deprecate the old functions to remove in 3.0. - Abandon os.path.walk because os.walk is better. This is worth considering as a start. It does mean moving a lot of functions that may be moved again at some point in the future. If we do move shutil functions into os, I'd at least like to make some tiny improvements in them. Adding four lines to the beginning of rmtree would make it behave like my purge() function without detracting from its existing use:

    if not os.exists(p):
        return
    if not os.isdir(p):
        p.remove()

Also, do we really need six copy methods? copy2 can be handled by a third argument, etc. -- Mike Orr From oliphant.travis at ieee.org Wed Nov 1 21:18:23 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 13:18:23 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <4548FA58.4050702@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> Message-ID: <4549010F.6090200@ieee.org> Martin v. L?wis wrote: > Travis E. Oliphant schrieb: > >>> Or, if it does have uses independent of the buffer extension: what >>> are those uses? >>> >> So that NumPy and ctypes and audio libraries and video libraries and >> database libraries and image-file format libraries can communicate about >> data-formats using the same expressions (in Python). >> > > I find that puzzling. In what way can the specification of a data type > enable communication? Don't you need some kind of protocol for it > (i.e. operations to be invoked)? Also, do you mean that these libraries > can communicate with each other? Or with somebody else? If so, with > whom? > What is puzzling? I've just specified the extended buffer protocol as something concrete that data-format objects are shared through.
That's on the C-level. I gave several examples of where such sharing would be useful. Then, I gave examples in Python of how sharing data-formats would also be useful so that modules could support the same means to construct data-formats (instead of struct using strings, array using typecodes, ctypes using its type-objects, and NumPy using dtype objects). > >> What problem do you have in defining a standard way to communicate about >> binary data-formats (not just images)? I still can't figure out why you >> are so resistant to the idea. MPI had to do it. >> > > I'm afraid of "dead" specifications, things whose only motivation is > that they look nice. They are just clutter. There are a few examples > of this already in Python, like the character buffer interface or > the multi-segment buffers. > O.K. I can understand that concern. But, all you do is make struct, array, and ctypes support the same data-format specification (by support I mean have a way to "consume" and "produce" the data-format object to the natural representation that they have internally) and you are guaranteed it won't "die." In fact, what would be ideal is for the PIL, NumPy, CVXOpt, PyMedia, PyGame, pyre, pympi, PyVoxel, etc., etc. (there really are many modules that should be able to talk to each other more easily) to all support the same data-format representations. Then, you don't have to learn everybody's re-invention of the same concept whenever you encounter a new library that does something with binary data. How much time do you actually spend with binary data (sound, video, images, just plain numbers from a scientific experiment) and trying to use multiple Python modules to manipulate it? If you don't spend much time, then I can understand why you don't understand the need. > As for MPI: It didn't just independently define a data types system. > Instead, it did that, *and* specified the usage of the data types > in operations such as MPI_SEND.
It is very clear what the scope of > this data description is, and what the intended usage is. > > Without specifying an intended usage, it is impossible to evaluate > whether the specification meets its goals. > What is not understood about the intended usage in the extended buffer protocol? What is not understood about the intended usage of giving the array and struct modules a uniform way to represent binary data? > Ok, that would be a new usage: I expected that datatype instances > always come in pairs with memory allocated and filled according to > the description. To me that is the most important usage, but it's not the *only* one. > If you are proposing to modify/extend the API > of the struct and array modules, you should say so somewhere (in > a PEP). > Sure, I understand that. But, if there is no data-format object, then there is no PEP to "extend the struct and array modules" to support it. Chicken before the egg, and all that. > I expect that the primary readers/users of the PEP would be people who > have to write libraries: i.e. people implementing NumPy, struct, array, > and people who implement algorithms that operate on data. Yes, but not only them. If it's a default way to represent data, then *users* of those libraries that "consume" the representation would also benefit by learning a standard. > So usability > of the specification is a matter of how easy it is to *write* a library > that does perform the image manipulation. > > >> If you really want to know. In NumPy it might look like this: >> >> Python code: >> >> img['r'] = img['g'] >> img['b'] = img['g'] >> > > That's not what I'm asking. Instead, what does the NumPy code look > like that gets invoked on these read-and-write operations? Does it > only use the void* pointing to the start of the data, and the > datatype object? If not, how would C code look like that only has > the void* and the datatype object? > > >> dtype = img->descr; >> > > In this code, is descr a datatype object? ...
> Yes. But, I have a mistake later... > >> r_field = PyDict_GetItemString(dtype,'r'); >> Actually it should read PyDict_GetItemString(dtype->fields, "r"). The r_field is a tuple (data-type object, offset). The fields attribute is (currently) a Python dictionary. > > ... I guess not, because apparently, it is a dictionary, not > > a datatype object. > Sorry for the confusion. > >> But, I still don't see how that is relevant to the question of how to >> represent the data-format to share that information across two extensions. >> > > Well, if NumPy gets the data from a different module, it can't assume > there is a descr object that is a dictionary. Instead, it must > perform these operations just by using the datatype object. Right. I see. Again, I made a mistake in the code. img->descr is a data-type object in NumPy. img->descr->fields is a dictionary of fields keyed by 'name' and returning a tuple (data-type object, offset). But, the other option (especially for code already written) would be to just convert the data-format specification into its own internal representation. This is the case that I was thinking about when I said it didn't matter how the library operated on the data. If new code wanted to use the data-format object as *the* internal representation, then it would matter. > What else is the purpose of sharing the information, if not to use it > to access the data? > Of course. I'm sorry my example was incorrect. I guess this falls under the category of "ease of use". If the data-type format can *be* the internal representation, then ease of use is *optimal* because no translation is required. In my ideal world that's the way it would be. But, even if we can't get there immediately, we can at least define a standard for communication.
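[For readers following along at the Python level: the fields mapping Travis describes can be seen in NumPy itself. A minimal sketch -- the field names and single-byte layout below are just an illustration, not anything specified by the PEP:]

```python
import numpy as np

# A pixel record with three single-byte fields, as in the img example.
pixel = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1')])

# The fields attribute is a mapping keyed by field name; each value
# is a (data-type object, byte offset) tuple, as described above.
r_dtype, r_offset = pixel.fields['r']
g_dtype, g_offset = pixel.fields['g']

# The Python-level operations from the earlier example:
img = np.zeros((4, 4), dtype=pixel)
img['g'] = 7
img['r'] = img['g']
img['b'] = img['g']
```

[Here r_offset is 0 and g_offset is 1: exactly the (data-type, offset) pairs the C code pulls out of dtype->fields.]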
From alexander.belopolsky at gmail.com Wed Nov 1 21:52:43 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 20:52:43 +0000 (UTC) Subject: [Python-Dev] idea for data-type (data-format) PEP References: Message-ID: Travis E. Oliphant ieee.org> writes: > What if we look at this from the angle of trying to communicate > data-formats between different libraries (not change the way anybody > internally deals with data-formats). > > For example, ctypes has one way to internally deal with data-formats > (using type objects). > > NumPy/Numeric has a way to internally deal with data-formats (using > PyArray_Descr * structure -- in Numeric it's just a C-structure but in > NumPy it's fleshed out further and also a Python object called the > data-type). > Ctypes and NumPy's Array Interface address two different needs. When using ctypes, producers of type information are at the Python level, but Array Interface information is produced in C code. It is very convenient to write c_int*2*3 to specify a 2x3 integer matrix in Python, but it is much easier to set type code to 'i' and populate the shape array with integers in C. Consumers of type information are at the C level in both ctypes and Array Interface applications, but in the case of ctypes, users are not expected to write C code. It is typical for an array interface consumer to switch on the type code. Single character (or numeric) type codes are much more convenient than verbose type names in this case. I have used Array Interface extensively, but only for simple types and I have studied ctypes from Python level, but not from C level. I think the standard data type description object should build on the strengths of both approaches. I believe the first step should be to agree on a representation of simple types. Just an agreement on the standard type codes that every module could use would be a great improvement. (Personally, I don't need anything else from array interface.) 
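[The contrast drawn here can be made concrete. A sketch of the two styles of producing the same element-type information -- the shapes and values are illustrative only:]

```python
import ctypes
from array import array

# ctypes style: type information is built up at the Python level.
# (c_int * 2) * 3 describes a block of 3 rows of 2 ints; the text
# above writes this as c_int*2*3.
Matrix = ctypes.c_int * 2 * 3
m = Matrix()
m[0][1] = 42

# array-module style: the element type is the one-letter code 'i',
# the kind of tag a C-level consumer can cheaply switch on.
a = array('i', [1, 2, 3])
```

[Both describe "C int" elements; the ctypes spelling is convenient to produce in Python, while the typecode is convenient to consume in C.]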
I don't like letter codes, however. I would prefer to use an enum at the C level and verbose names at Python level. I would also like to mention one more difference between NumPy datatypes and ctypes that I did not see discussed. In ctypes arrays of different shapes are represented using different types. As a result, if the object exporting its buffer is resized, the datatype object cannot be reused; it has to be replaced. From martin at v.loewis.de Wed Nov 1 21:54:33 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 21:54:33 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <4549010F.6090200@ieee.org> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <4549010F.6090200@ieee.org> Message-ID: <45490989.9010603@v.loewis.de> Travis Oliphant schrieb: >>> r_field = PyDict_GetItemString(dtype,'r'); >>> > Actually it should read PyDict_GetItemString(dtype->fields). The > r_field is a tuple (data-type object, offset). The fields attribute is > (currently) a Python dictionary. Ok. This seems to be missing in the PEP. The section titled "Attributes" seems to talk about Python-level attributes. Apparently, you are suggesting that there is also a C-level API, lower than PyObject_GetAttrString, so that you can write dtype->fields, instead of having to write PyObject_GetAttrString(dtype, "fields"). If it is indeed the intent that this kind of access is available for datatype objects, then the PEP should specify it. Notice that it would be uncommon for a type in Python: Most types have getter functions (such as PyComplex_RealAsDouble, rather than specifying direct access through obj->cval.real).
Going now back to your original code (and assuming proper adjustments):

    dtype = img->descr;
    r_field = PyDict_GetItemString(dtype, "r");
    g_field = PyDict_GetItemString(dtype, "g");
    r_field_dtype = PyTuple_GET_ITEM(r_field, 0);
    r_field_offset = PyTuple_GET_ITEM(r_field, 1);
    g_field_dtype = PyTuple_GET_ITEM(g_field, 0);
    g_field_offset = PyTuple_GET_ITEM(g_field, 1);
    obj = PyArray_GetField(img, g_field, g_field_offset);
    Py_INCREF(r_field);
    PyArray_SetField(img, r_field, r_field_offset, obj);

In this code, where is PyArray_GetField coming from? What does it do? If I wanted to write this code from scratch, what should I write instead? Since this is all about a flat memory block, I'm surprised I need "true" Python objects for the field values in there. > But, the other option (especially for code already written) would be to > just convert the data-format specification into its own internal > representation. Ok, so your assumption is that consumers already have their own machinery, in which case ease-of-use would be the question of how difficult it is to convert datatype objects into the internal representation. Regards, Martin From alexander.belopolsky at gmail.com Wed Nov 1 22:05:17 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 21:05:17 +0000 (UTC) Subject: [Python-Dev] idea for data-type (data-format) PEP References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > > I'm afraid of "dead" specifications, things whose only motivation is > that they look nice. They are just clutter. There are a few examples > of this already in Python, like the character buffer interface or > the multi-segment buffers. > Multi-segment buffers are only dead because standard library modules do not support them. I often work with text data that is represented as an array of strings.
I would love to implement a multi-segment buffer interface on top of that data and be able to do a full text regular expression search without having to concatenate into one big string, but Python's re module would not take a multi-segment buffer. From anthony at python.org Wed Nov 1 11:50:32 2006 From: anthony at python.org (Anthony Baxter) Date: Wed, 1 Nov 2006 21:50:32 +1100 Subject: [Python-Dev] RELEASED Python 2.3.6, FINAL Message-ID: <200611012150.44644.anthony@python.org> On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.3.6 (FINAL). Python 2.3.6 is a security bug-fix release. While Python 2.5 is the latest version of Python, we're making this release for people who are still running Python 2.3. Unlike the recently released 2.4.4, this release only contains a small handful of security-related bugfixes. See the website for more. * Python 2.3.6 contains a fix for PSF-2006-001, a buffer overrun * in repr() of unicode strings in wide unicode (UCS-4) builds. * See http://www.python.org/news/security/PSF-2006-001/ for more. This is a **source only** release. The Windows and Mac binaries of 2.3.5 were built with UCS-2 unicode, and are therefore not vulnerable to the problem outlined in PSF-2006-001. The PCRE fix is for a long-deprecated module (you should use the 're' module instead) and the email fix can be obtained by downloading the standalone version of the email package. Most vendors who ship Python should have already released a patched version of 2.3.5 with the above fixes; this release is for people who need or want to build their own release, but don't want to mess around with patch or svn. There have been no changes (apart from the version number) since the release candidate of 2.3.6. Python 2.3.6 will complete python.org's response to PSF-2006-001.
If you're still on Python 2.2 for some reason and need to work with UCS-4 unicode strings, please obtain the patch from the PSF-2006-001 security advisory page. Python 2.4.4 and Python 2.5 have both already been released and contain the fix for this security problem. For more information on Python 2.3.6, including download links for source archives, release notes, and known issues, please see: http://www.python.org/2.3.6 Highlights of this new release include: - A fix for PSF-2006-001, a bug in repr() for unicode strings on UCS-4 (wide unicode) builds. - Two other, less critical, security fixes. Enjoy this release, Anthony Anthony Baxter anthony at python.org Python Release Manager (on behalf of the entire python-dev team) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061101/cb5f29ef/attachment-0002.pgp -------------- next part -------------- -- http://mail.python.org/mailman/listinfo/python-announce-list Support the Python Software Foundation: http://www.python.org/psf/donations.html From martin at v.loewis.de Wed Nov 1 22:13:29 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 22:13:29 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <45490DF9.4070500@v.loewis.de> Alexander Belopolsky schrieb: > I would also like to mention one more difference between NumPy datatypes > and ctypes that I did not see discussed. In ctypes arrays of different > shapes are represented using different types. As a result, if the object > exporting its buffer is resized, the datatype object cannot be reused, it > has to be replaced. That's also an interesting issue for the datatypes PEP: are datatype objects meant to be immutable? 
This is particularly interesting for the extended buffer protocol: how long can one keep the data you get from bt_getarrayinfo? Also, how does the memory management work for the results? Regards, Martin From Chris.Barker at noaa.gov Wed Nov 1 20:20:47 2006 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed, 1 Nov 2006 19:20:47 +0000 (UTC) Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. References: <4547BF86.6070806@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > Can you please give examples for real-world applications of this > interface, preferably examples involving multiple > independently-developed libraries? OK -- here's one I haven't seen in this thread yet: wxPython has a lot of code to translate between various Python data types and wx data types. An example is PointListHelper. This code examines the input Python data, and translates it to a wxList of wxPoints. It is used in a bunch of the drawing functions, for instance. It has some nifty optimizations so that if a Python list of (x,y) tuples is passed in, then the code uses PyList_GetItem() to access the tuples, for instance. If an Nx2 numpy array is passed in, it defaults to PySequence_GetItem() to get the (x,y) pair, and then again to get the values, which are converted to Python numbers, then checked and converted again to C ints. The result is an awful lot of processing, even though the data in the numpy array already exists in a C array that could be exactly the same as the wxList of wxPoints (in fact, many of the drawing methods take a pointer to a correctly formatted C array of data). Right now, it is faster to convert your numpy array of points to a Python list of tuples first, then pass it in to wx. However, were there a standard way to describe a buffer (pointer to a C array of data), then the PointListHelper code could look to see if the data is already correctly formatted, and pass the pointer right through.
If it was not, it could still do the translation (like from doubles to ints, for instance) far more efficiently. When I get the chance, I do intend to contribute code to support this in wxPython, using the numpy array interface. However, wouldn't it be better for it to support a generic interface that was in the standard lib, rather than only numpy? While /F suggested we get off the PIL bandwagon, I do have code that has to pass data around between numpy, PIL and wx.Images ( and matplotlib AGG buffers, and GDAL geo-referenced image buffers, and ...). Most do support the current buffer protocol, so it can be done, but I'd be much happier if there was a little more checking going on, rather than my python code having to make sure the data is all arranged in memory the right way. Oh, there is also the Python Cartographic Library, which can take a Python list of tuples as coordinates, and do a Projection on them, but which can't take a numpy array holding that same data. -Chris From fredrik at pythonware.com Wed Nov 1 22:46:29 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 22:46:29 +0100 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Chris Barker wrote: > While /F suggested we get off the PIL bandwagon I suggest we drop the obsession with pointers to memory areas that are supposed to have a specific format; modern data access API:s don't work that way for good reasons, so I don't see why Python should grow a standard based on that kind of model. the "right solution" for things like this is an *API* that lets you do things like:

    view = object.acquire_view(region, supported formats)
    ... access data in view ...
    view.release()

and, for advanced users

    format = object.query_format(constraints)

From glyph at divmod.com Wed Nov 1 22:57:24 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 21:57:24 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> On 06:14 pm, fredrik at pythonware.com wrote: >glyph at divmod.com wrote: > >> I assert that it needs a better[1] interface because the current >> interface can lead to a variety of bugs through idiomatic, apparently >> correct usage. All the more because many of those bugs are related to >> critical errors such as security and data integrity. >instead of referring to some esoteric knowledge about file systems that >us non-twisted-using mere mortals may not be evolved enough to under- >stand, On the contrary, twisted users understand even less, because (A) we've been demonstrated to get it wrong on numerous occasions in highly public and embarrassing ways and (B) we already have this class that does it all for us and we can't remember how it works :-). >maybe you could just make a list of common bugs that may arise >due to idiomatic use of the existing primitives? Here are some common gotchas that I can think of off the top of my head. Not all of these are resolved by Twisted's path class: Path manipulation: * This is confusing as heck:

    >>> os.path.join("hello", "/world")
    '/world'
    >>> os.path.join("hello", "slash/world")
    'hello/slash/world'
    >>> os.path.join("hello", "slash//world")
    'hello/slash//world'

Trying to formulate a general rule for what the arguments to os.path.join are supposed to be is really hard. I can't really figure out what it would be like on a non-POSIX/non-win32 platform. * it seems like slashes should be more aggressively converted to backslashes on windows, because it's near impossible to do anything with os.sep in the current situation. * "C:blah" does not mean what you think it means on Windows.
Regardless of what you think it means, it is not that. I thought I understood it once as the current process having a current directory on every mapped drive, but then I had to learn about UNC paths of network mapped drives and it stopped making sense again. * There are special files on windows such as "CON" and "NUL" which exist in _every_ directory. Twisted does get around this, by looking at the result of abspath:

    >>> os.path.abspath("c:/foo/bar/nul")
    '\\\\nul'

* Sometimes a path isn't a path; the zip "paths" in sys.path are a good example. This is why I'm a big fan of including a polymorphic interface of some kind: this information is *already* being persisted in an ad-hoc and broken way now, so it needs to be represented; it would be good if it were actually represented properly. URL manipulation-as-path-manipulation is another; the recent perforce use-case mentioned here is a special case of that, I think. * paths can have spaces in them and there's no convenient, correct way to quote them if you want to pass them to some gross function like os.system - and a lot of the code that manipulates paths is shell-script-replacement crud which wants to call gross functions like os.system. Maybe this isn't really the path manipulation code's fault, but it's where people start looking when they want properly quoted path arguments. * you have to care about unicode sometimes. rarely enough that none of your tests will ever account for it, but often enough that _some_ users will notice breakage if your code is ever widely distributed. this is an even more obscure example, but pygtk always reports pathnames in utf8-encoded *byte* strings, regardless of your filesystem encoding. If you forget to decode/encode it, hilarity ensues. There's no consistent error reporting (as far as I can tell, I have encountered this rarely) and no real way to detect this until you have an actual insanely-configured system with an insanely-named file on it to test with.
(Polymorphic interfaces might help a *bit* here. At worst, they would at least make it possible to develop a canonical "insanely encoded filesystem" test-case backend. At best, you'd absolutely have to work in terms of unicode all the time, and no implicit encoding issues would leak through to application code.) Twisted's thing doesn't deal with this at all, and it really should. * also *sort* of an encoding issue, although basically only for webservers or other network-accessible paths: thanks to some of these earlier issues as well as %2e%2e, there are effectively multiple ways to spell "..". Checking for all of them is impossible, you need to use the os.path APIs to determine if the paths you've got really relate in the ways you think they do. * os.pathsep can be, and actually sometimes is, embedded in a path. (again, more of a general path problem, not really python's fault) * relative path manipulation is difficult. ever tried to write the function to iterate two separate trees of files in parallel? shutil re-implements this twice completely differently via recursion, and it's harder to do with a generator (which is what you really want). you can't really split on os.sep and have it be correct due to the aforementioned windows-path issue, but that's what everybody does anyway. * os.path.split doesn't work anything like str.split. FS manipulation: * although individual operations are atomic, shutil.copytree and friends aren't. I've often seen python programs confused by partially-copied trees of files. This isn't even really an atomicity issue; it's often due to a traceback in the middle of a running python program which leaves the tree half-broken. * the documentation really can't emphasize enough how bad using 'os.path.exists/isfile/isdir', and then assuming the file continues to exist when it is a contended resource, is. It can be handy, but it is _always_ a race condition. >I promise to make a nice FAQ entry out of it, with proper attribution. Thanks. 
The list here is just a brain dump, I'm not sure it's all appropriate for a FAQ, but I hope some of it is useful. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/b6052fc2/attachment.html From alexander.belopolsky at gmail.com Wed Nov 1 22:58:42 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 16:58:42 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <45490DF9.4070500@v.loewis.de> References: <45490DF9.4070500@v.loewis.de> Message-ID: On 11/1/06, "Martin v. L?wis" wrote: > That's also an interesting issue for the datatypes PEP: are datatype > objects meant to be immutable? > That's a question for Travis, but I would think that they would be immutable at the Python level, but mutable at the C level. In Travis' approach array size is not stored in the datatype, so I don't see much need to modify datatype objects in-place. It may be reasonable to allow adding fields to a record, but I don't have enough experience with that to comment. > This is particularly interesting for the extended buffer protocol: > how long can one keep the data you get from bt_getarrayinfo? > I think your question is limited to shape and strides outputs because dataformat is a reference counted PyObject (and PEP should specify whether it is a borrowed reference). And the answer is the same as for the data from bf_getreadbuffer/bf_getwritebuffer . AFAIK, existing buffer protocol does not answer this question delegating it to the extension module writers who provide objects exporting their buffers. > Also, how does the memory management work for the results? I think it is implied that all pointers are borrowed references. I could not find any discussion of memory management in the current buffer protocol documentation. This is a good question. 
It may be the case that the shape or stride information is not available as Py_intptr_t array inside the object that wants to export its memory buffer. This is not theoretical; I have a 64-bit application that uses objects that keep their size information in a 32-bit int. BTW, I think the memory management issues with the buffer objects have been resolved at some point. Any lessons to learn from that? From oliphant.travis at ieee.org Wed Nov 1 23:22:52 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 15:22:52 -0700 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Chris Barker wrote: > > >>While /F suggested we get off the PIL bandwagon > > > I suggest we drop the obsession with pointers to memory areas that are > supposed to have a specific format; modern data access API:s don't work > that way for good reasons, so I don't see why Python should grow a > standard based on that kind of model. > Please give us an example of a modern data-access API (i.e. an application that uses one)? I presume you are not fundamentally opposed to sharing memory given the example you gave. > the "right solution" for things like this is an *API* that lets you do > things like: > > view = object.acquire_view(region, supported formats) > ... access data in view ... > view.release() > > and, for advanced users > > format = object.query_format(constraints) > It sounds like you are concerned about the memory-area-not-current problem. Yeah, it can be a problem (but not an unsolvable one). Objects that share memory through the buffer protocol just have to be careful about resizing themselves or eliminating memory. Anyway, it's a problem not solved by the buffer protocol. I have no problem with trying to fix that in the buffer protocol, either. It's all completely separate from what I'm talking about as far as I can tell.
-Travis From glyph at divmod.com Wed Nov 1 23:29:03 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 22:29:03 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> On 08:14 pm, sluggoster at gmail.com wrote: >Argh, it's difficult to respond to one topic that's now spiraling into >two conversations on two lists. >glyph at divmod.com wrote: >(...) people have had to spend five years putting hard-to-read >os.path functions in the code, or reinventing the wheel with their own >libraries that they're not sure they can trust. I started to use >path.py last year when it looked like it was emerging as the basis of >a new standard, but yanked it out again when it was clear the API >would be different by the time it's accepted. I've gone back to >os.path for now until something stable emerges but I really wish I >didn't have to. You *don't* have to. This is a weird attitude I've encountered over and over again in the Python community, although sometimes it masquerades as resistance to Twisted or Zope or whatever. It's OK to use libraries. It's OK even to use libraries that Guido doesn't like! I'm pretty sure the first person to tell you that would be Guido himself. (Well, second, since I just told you.) If you like path.py and it solves your problems, use path.py. You don't have to cram it into the standard library to do that. It won't be any harder to migrate from an old path object to a new path object than from os.path to a new path object, and in fact it would likely be considerably easier. >> *It is already used in a large body of real, working code, and >> therefore its limitations are known.* > >This is an important consideration. However, to me a clean API is more >important. It's not that I don't think a "clean" API is important.
It's that I think that "clean" is a subjective assessment that is hard to back up, and it helps to have some data saying "we think this is clean because there are very few bugs in this 100,000 line program written using it". Any code that is really easy to use right will tend to have *some* aesthetic appeal. >I took a quick look at filepath. It looks similar in concept to PEP >355. Four concerns: > - unfamiliar method names (createDirectory vs mkdir, child vs join) Fair enough, but "child" really means child, not join. It is explicitly for joining one additional segment, with no slashes in it. > - basename/dirname/parent are methods rather than properties: >leads to () overproliferation in user code. The () is there because every invocation returns a _new_ object. I think that this is correct behavior but I also would prefer that it remain explicit. > - the "secure" features may not be necessary. If they are, this >should be a separate discussion, and perhaps implemented as a >subclass. The main "secure" feature is "child" and it is, in my opinion, the best part about the whole class. Some of the other stuff (rummaging around for siblings with extensions, for example) is probably extraneous. child, however, lets you take a string from arbitrary user input and map it into a path segment, both securely and quietly. Here's a good example (and this actually happened, this is how I know about that crazy windows 'special files' thing I wrote in my other recent message): you have a decision-making program that makes two files to store information about a process: "pro" and "con". It turns out that "con" is shorthand for "fall in a well and die" in win32-ese. A "secure" path manipulation library would alert you to this problem with a traceback rather than having it inexplicably freeze. Obscure, sure, but less obscure would be getting deterministic errors from a user entering slashes into a text field that shouldn't accept them. 
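The hazard a child-style API guards against is easy to demonstrate with plain path functions: a single crafted "segment" silently escapes the base directory, while a child-style check turns the same input into a deterministic error. A sketch (the `child` helper below is a hypothetical reconstruction of the check being described, not FilePath's actual code; posixpath pins the POSIX rules on any platform):

```python
import posixpath

def child(base, segment):
    # Hypothetical sketch of a FilePath.child-style check: refuse any
    # "segment" that is really more than one path segment.
    if "/" in segment or segment in (".", ".."):
        raise ValueError("invalid path segment: %r" % (segment,))
    return posixpath.join(base, segment)

# Plain join lets untrusted input walk out of the base directory:
escaped = posixpath.normpath(posixpath.join("/srv/uploads", "../../etc/passwd"))

# child() raises instead of producing a path outside the base:
try:
    child("/srv/uploads", "../../etc/passwd")
    rejected = False
except ValueError:
    rejected = True
```
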
> - stylistic objection to verbose camelCase names like createDirectory There is no accounting for taste, I suppose. Obviously if it violates the stdlib's naming conventions it would have to be adjusted. >> Path representation is a bike shed. Nobody would have proposed >> writing an entirely new embedded database engine for Python: python >> 2.5 simply included SQLite because its utility was already proven. > >There's a quantum level of difference between path/file manipulation >-- which has long been considered a requirement for any full-featured >programming language -- and a database engine which is much more >complex. "quantum" means "the smallest possible amount", although I don't think you're using it like that, so I think I agree with you. No, it's not as hard as writing a database engine. Nevertheless it is a non-trivial problem, one worthy of having its own library and clearly capable of generating a fair amount of its own discussion. >Fredrik has convinced me that it's more urgent to OOize the pathname >conversions than the filesystem operations. I agree on the relative values. I am still unconvinced that either is "urgent" in the sense that it needs to be in the standard library. >Where have all the proponents of non-OO or limited-OO strategies been? This continuum doesn't make any sense to me. Where would you place Twisted's solution on it? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/8fff62db/attachment.htm From oliphant.travis at ieee.org Wed Nov 1 23:49:08 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 15:49:08 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <4548FE08.7070402@v.loewis.de> References: <4548FE08.7070402@v.loewis.de> Message-ID: Martin v. Löwis wrote: > Travis E.
Oliphant schrieb: > >>2) complex-valued types (you might argue that it's just a 2-array of >>floats, but you could say the same thing about int as an array of >>bytes). The point is how do people interpret the data. Complex-valued >>data-types are very common. It is one reason Fortran is still used by >>scientists. > > > Well, by the same reasoning, you could argue that pixel values (RGBA) > are missing in the PEP. It's a convenience, sure, and it may also help > interfacing with the platform's FORTRAN implementation - however, are > you sure that NumPy's complex layout is consistent with the platform's > C99 _Complex definition? > I think so (it is on gcc). And yes, where you draw the line between fundamental and "derived" data-type is somewhat arbitrary. I'd rather include complex-numbers than not given their prevalence in the data-streams I'm trying to make compatible with each other. > >>3) Unicode characters >> >>4) What about floating-point representations that are not IEEE 754 >>4-byte or 8-byte. > > > Both of these are available in a platform-dependent way: if the > platform uses non-IEEE754 formats for C float and C double, ctypes > will interface with that just fine. It is actually vice versa: > IEEE-754 4-byte and 8-byte is not supported in ctypes. That's what I meant. The 'f' kind in the data-type description is also intended to mean "platform float" whatever that is. But, a complete data-format representation would have a way to describe other bit-layouts for floating point representation. Even if you can't actually calculate directly with them without conversion. > Same for Unicode: the platform's wchar_t is supported (as you said), > but not a platform-independent (say) 4-byte little-endian. Right. It's a matter of scope. Frankly, I'd be happy enough to start with "typecodes" in the extended buffer protocol (that's where the array module is now) and then move up to something more complete later. 
But, since we already have an array interface for record-arrays to share information and data with each other, and ctypes is showing all of its power, why not be more complete? -Travis From alexander.belopolsky at gmail.com Thu Nov 2 00:42:25 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 23:42:25 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <4548FE08.7070402@v.loewis.de> Message-ID: Travis Oliphant ieee.org> writes: > Frankly, I'd be happy enough to start with > "typecodes" in the extended buffer protocol (that's where the array > module is now) and then move up to something more complete later. > Let's just start with that. The way I see the problem is that the buffer protocol is fine as long as your data is an array of bytes, but if it is an array of doubles, you are out of luck. So, while I can do >>> b = buffer(array('d', [1,2,3])) there is not much that I can do with b. For example, if I want to pass it to numpy, I will have to provide the type and shape information myself: >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b) array([ 1., 2., 3.]) With the extended buffer protocol, I should be able to do >>> numpy.array(b) So let's start by solving this problem and limit it to data that can be found in a standard library array. This way we can postpone the discussion of shapes, strides and nested structs. I propose a simple bf_gettypeinfo(PyObject *obj, int* type, int* bitsize) method that would return a type code and the size of the data item. I believe it is better to have type codes free from size information for several reasons: 1. Generic code can use size information directly without having to know that int is 32 and double is 64 bits. 2. Odd sizes can be easily described without having to add a new type code. 3. I assume that the existing bf_ functions would still return size in bytes, so having item size available as an int will help to get the number of items.
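At the Python level, the pairing proposed above (a size-free type code plus an item size in bytes) already exists on the standard library's array objects, which makes the split easy to sketch. A pure-Python illustration only: `bf_gettypeinfo` itself is a proposed C slot, not an existing API.

```python
import struct
from array import array

a = array('d', [1.0, 2.0, 3.0])

typecode = a.typecode   # the primitive type, free of size information
itemsize = a.itemsize   # the size, carried separately, in bytes

# Generic code can use the size directly (reason 1)...
assert struct.calcsize(typecode) == itemsize
# ...and total byte length divided by item size gives the item count (reason 3)
nitems = len(a.tobytes()) // itemsize
```
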
If we manage to agree on the standard way to pass primitive type information, it will be a big achievement and immediately useful because simple arrays are already in the standard library. From p.f.moore at gmail.com Thu Nov 2 01:01:40 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Nov 2006 00:01:40 +0000 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <79990c6b0611011601h37b1c805rda5bbee22127ce18@mail.gmail.com> On 11/1/06, Alexander Belopolsky wrote: > Let's just start with that. The way I see the problem is that buffer protocol > is fine as long as your data is an array of bytes, but if it is an array of > doubles, you are out of luck. So, while I can do > > >>> b = buffer(array('d', [1,2,3])) > > there is not much that I can do with b. For example, if I want to pass it to > numpy, I will have to provide the type and shape information myself: > > >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b) > array([ 1., 2., 3.]) > > With the extended buffer protocol, I should be able to do > > >>> numpy.array(b) As a data point, this is the first posting that has clearly explained to me what the two PEPs are attempting to achieve. That may be my blindness to what others find self-evident, but equally, I may not be the only one who needed this example... Paul. From oliphant.travis at ieee.org Thu Nov 2 01:46:48 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 17:46:48 -0700 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. 
In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Chris Barker wrote: > > >>While /F suggested we get off the PIL bandwagon > > > I suggest we drop the obsession with pointers to memory areas that are > supposed to have a specific format; modern data access APIs don't work > that way for good reasons, so I don't see why Python should grow a > standard based on that kind of model. > > the "right solution" for things like this is an *API* that lets you do > things like: > > view = object.acquire_view(region, supported formats) > ... access data in view ... > view.release() > > and, for advanced users > > format = object.query_format(constraints) So, if the extended buffer protocol were enhanced to enforce this kind of viewing and release, then would you support it? Basically, the extended buffer protocol would, at the same time as providing *more* information about the "view", require the implementer to understand the idea of "holding" and "releasing" the view. Would this basically require the object supporting the extended buffer protocol to keep some kind of list of who has views (or at least a number indicating how many views there are)? -Travis From oliphant.travis at ieee.org Thu Nov 2 01:58:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 17:58:01 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: Alexander Belopolsky wrote: > Travis Oliphant ieee.org> writes: > > >>>>b = buffer(array('d', [1,2,3])) > > > there is not much that I can do with b. For example, if I want to pass it to > numpy, I will have to provide the type and shape information myself: > > >>>>numpy.ndarray(shape=(3,), dtype=float, buffer=b) > > array([ 1., 2., 3.]) > > With the extended buffer protocol, I should be able to do > > >>>>numpy.array(b) or just numpy.array(array.array('d',[1,2,3])) and leave out the buffer object altogether.
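The gap Alexander describes (a buffer that has forgotten it holds doubles) is exactly what an extended protocol would close: the type and shape information travels with the memory, so the consumer needs no side channel. This is the design that later reached the stdlib as memoryview; a sketch with those later pieces, as an illustration of the goal rather than the API under discussion:

```python
from array import array

m = memoryview(array('d', [1.0, 2.0, 3.0]))

fmt = m.format     # 'd': the type information a plain buffer() loses
shape = m.shape    # (3,): enough for a consumer to reconstruct the array
values = m.tolist()
```
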
> > > So let's start by solving this problem and limit it to data that can be found > in a standard library array. This way we can postpone the discussion of shapes, > strides and nested structs. Don't lump those ideas together. Shapes and strides are necessary for N-dimensional arrays (they are essentially what *defines* an N-dimensional array). I really don't want to sacrifice those in the extended buffer protocol. If you want to separate them into different functions then that is a possibility. > > If we manage to agree on the standard way to pass primitive type information, > it will be a big achievement and immediately useful because simple arrays are > already in the standard library. > We could start there, I suppose. Especially if it helps us all get on the same page. But, we already see the applications beyond this simple case, so I would like to have at least an "eye" for the more difficult case, which we already have a working solution for in the "array interface". -Travis From oliphant.travis at ieee.org Thu Nov 2 02:08:41 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 18:08:41 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> Message-ID: <45494519.4020501@ieee.org> Paul Moore wrote: > > > Enough of the abstract. As a concrete example, suppose I have a (byte) > string in my program containing some binary data - an ID3 header, or a > TCP packet, or whatever. It doesn't really matter. Does your proposal > offer anything to me in how I might manipulate that data (assuming I'm > not using NumPy)? (I'm not insisting that it should, I'm just trying > to understand the scope of the PEP). > What do you mean by "manipulate the data"?
The proposal for a data-format object would help you describe that data in a standard way and therefore share that data between several libraries that would be able to understand the data (because they all use and/or understand the default Python way to handle data-formats). It would be up to the other packages to "manipulate" the data. So, what you would be able to do is take your byte-string and create a buffer object which you could then share with other packages: Example: b = buffer(bytestr, format=data_format_object) Now. a = numpy.frombuffer(b) a['field1'] # prints data stored in the field named "field1" etc. Or. cobj = ctypes.frombuffer(b) # Now, cobj is a ctypes object that is basically a "structure" that can be passed # directly to your C-code. Does this help? -Travis From sluggoster at gmail.com Thu Nov 2 02:46:49 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 17:46:49 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> On 11/1/06, glyph at divmod.com wrote: > > On 06:14 pm, fredrik at pythonware.com wrote: > >glyph at divmod.com wrote: > > > >> I assert that it needs a better[1] interface because the current > >> interface can lead to a variety of bugs through idiomatic, apparently > >> correct usage. All the more because many of those bugs are related to > >> critical errors such as security and data integrity.
> >instead of referring to some esoteric knowledge about file systems that > >us non-twisted-using mere mortals may not be evolved enough to understand, > > On the contrary, twisted users understand even less, because (A) we've been > demonstrated to get it wrong on numerous occasions in highly public and > embarrassing ways and (B) we already have this class that does it all for us > and we can't remember how it works :-). This is ironic coming from one of Python's celebrity geniuses. "We made this class but we don't know how it works." Actually, it's downright alarming coming from someone who knows Twisted inside and out yet still can't make sense of path platform oddities. > * This is confusing as heck: > >>> os.path.join("hello", "/world") > '/world' That's in the documentation. I'm not sure it's "wrong". What should it do in this situation? Pretend the slash isn't there? This came up in the directory-tuple proposal. I said there was no reason to change the existing behavior of join. Noam favored an exception. > >>> os.path.join("hello", "slash/world") > 'hello/slash/world' That has always been a loophole in the function, and many programs depend on it. Again, is it "wrong"? Should an embedded separator in an argument be an error? Obviously this depends on the user's knowledge that the separator happens to be slash. > >>> os.path.join("hello", "slash//world") > 'hello/slash//world' Again a case of what "should" it do? The filesystem treats it as a single slash. The user didn't call normpath, so should we normalize it anyway? > * Sometimes a path isn't a path; the zip "paths" in sys.path are a good > example. This is why I'm a big fan of including a polymorphic interface of > some kind: this information is *already* being persisted in an ad-hoc and > broken way now, so it needs to be represented; it would be good if it were > actually represented properly.
URL > manipulation-as-path-manipulation is another; the recent > perforce use-case mentioned here is a special case of that, I think. Good point, but exactly what functionality do you want to see for zip files and URLs? Just pathname manipulation? Or the ability to see whether a file exists and extract it, copy it, etc? > * you have to care about unicode sometimes. rarely enough that none of > your tests will ever account for it, but often enough that _some_ users will > notice breakage if your code is ever widely distributed. This is a Python-wide problem. The move to universal unicode will lessen this, or at least move the problem to *one* place (creating the unicode object), where every Python programmer will get bitten by it and we'll develop a few standard strategies to deal with it. (The problem is that if str and unicode are mixed in expressions, Python will promote the str to unicode and you'll get a UnicodeDecodeError if it contains non-ASCII characters. Figuring out all the ways such strings can slip into a program is difficult if you're dealing with user strings from an unknown charset, or your MySQL server is configured differently than you thought it was, or the string contains Windows curly quotes et al which are undefined in Latin-1.) > * the documentation really can't emphasize enough how bad using > 'os.path.exists/isfile/isdir', and then assuming the file continues to exist > when it is a contended resource, is. It can be handy, but it is _always_ a > race condition. What else can you do? It's either os.path.exists()/os.remove() or "do it anyway and catch the exception". And sometimes you have to check the filetype in order to determine *what* to do. 
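One standard answer to "what else can you do?" is the second option named above, often called EAFP: perform the operation and handle the failure, so there is no gap between checking and acting for another process to exploit. A minimal sketch:

```python
import errno
import os
import tempfile

# A path that does not exist: exists() followed by remove() would race;
# instead, attempt the removal and handle the failure directly.
path = os.path.join(tempfile.mkdtemp(), "transient.txt")
try:
    os.remove(path)
    removed = True
except OSError as e:   # FileNotFoundError in later Pythons
    assert e.errno == errno.ENOENT
    removed = False
```
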
-- Mike Orr
From sluggoster at gmail.com Thu Nov 2 03:36:31 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 18:36:31 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> References: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> Message-ID: <6e9196d20611011836k5e62990pf8851d066ea120b2@mail.gmail.com> On 11/1/06, glyph at divmod.com wrote: > On 08:14 pm, sluggoster at gmail.com wrote: > >(...) people have had to spend five years putting hard-to-read > >os.path functions in the code, or reinventing the wheel with their own > >libraries that they're not sure they can trust. I started to use > >path.py last year when it looked like it was emerging as the basis of > >a new standard, but yanked it out again when it was clear the API > >would be different by the time it's accepted. I've gone back to > >os.path for now until something stable emerges but I really wish I > >didn't have to. > > You *don't* have to. This is a weird attitude I've encountered over and > over again in the Python community, although sometimes it masquerades as > resistance to Twisted or Zope or whatever. It's OK to use libraries. It's > OK even to use libraries that Guido doesn't like! I'm pretty sure the first > person to tell you that would be Guido himself. (Well, second, since I just > told you.) If you like path.py and it solves your problems, use path.py. > You don't have to cram it into the standard library to do that. It won't be > any harder to migrate from an old path object to a new path object than from > os.path to a new path object, and in fact it would likely be considerably > easier. Oh, I understand it's OK to use libraries. It's just that a path library needs to be widely tested and well supported so you know it won't scramble your files. A bug in a date library affects only datetimes. A bug in a database library affects only that database.
A bug in a template library affects only the page being output. But a bug in a path library could ruin your whole day. "Um, remember those important files in that other project directory you weren't working in? They were just overwritten." Also, I train several programmers new to Python at work. I want to make them learn *one* path library that we'll be sure to stick with for several years. Every path library has subtle quirks, and switching from one to another may not be just a matter of renaming methods. > > - the "secure" features may not be necessary. If they are, this > >should be a separate discussion, and perhaps implemented as a > >subclass. > > The main "secure" feature is "child" and it is, in my opinion, the best part > about the whole class. Some of the other stuff (rummaging around for > siblings with extensions, for example) is probably extraneous. child, > however, lets you take a string from arbitrary user input and map it into a > path segment, both securely and quietly. Here's a good example (and this > actually happened, this is how I know about that crazy windows 'special > files' thing I wrote in my other recent message): you have a decision-making > program that makes two files to store information about a process: "pro" and > "con". It turns out that "con" is shorthand for "fall in a well and die" in > win32-ese. A "secure" path manipulation library would alert you to this > problem with a traceback rather than having it inexplicably freeze. > Obscure, sure, but less obscure would be getting deterministic errors from a > user entering slashes into a text field that shouldn't accept them. Perhaps you're right. I'm not saying it *should not* be a basic feature, just that unless the Python community as a whole is ready for this, users should have a choice to use it or not. I learned about DOS device files from the manuals back in the 80s. 
But I had completely forgotten them when I made several "aux" directories in a Subversion repository on Linux. People tried to check it out on Windows and... got some kind of error. "CON" means console: its input comes from the keyboard and its output goes to the screen. Since this is a device file, I'm not sure a path library has any responsibility to treat it specially. We don't treat "/dev/stdout" specially unless the user specifically calls a device function. I have no idea why Microsoft thought it was a good idea to put the seven-odd device files in every directory. Why not force people to type the colon ("CON:")? If they've memorized what CON means they should have no trouble with the colon, especially since it's required with "A:" and "C:" anyway. For trivia, these are the ones I remember:

CON         console (keyboard input, screen output)
KBRD        keyboard input
???         screen output
LPT1/2/3    parallel ports
COM1/2/3/4  serial ports
PRN         alias for the default printer port (normally LPT1)
NUL         bit bucket
AUX         game port?

COPY CON FILENAME.TXT   # Unix: "cat >filename.txt".
COPY FILENAME.TXT PRN   # Unix: "lp filename.txt" or "cat filename.txt | lp".
TYPE FILENAME.TXT       # Unix: "cat filename.txt".

> >Where have all the proponents of non-OO or limited-OO strategies been? > > This continuum doesn't make any sense to me. Where would you place > Twisted's solution on it? In the "let's create a brilliant library and put a dark box around it so nobody knows it's there" position. Although you say you've been trying to spread the word about it. For whatever reason, I haven't heard about it till now. Not sure what this means. But what I meant is, we OO proponents have been trying to promote path.py and/or get a similar module into the stdlib for years, and all we got was... not even hostility... just indifference and silence. People like to complain about os.path but not do anything about fixing it, or even to say which approach they *would* support.
Talin started a great thread on the python-3000 list, going back to the beginning and saying "What is wrong with os.path, how much does it need fixing, and is consensus on an API possible?" Maybe he did what the rest of us (including me) should have done long ago. -- Mike Orr From glyph at divmod.com Thu Nov 2 04:18:27 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 02 Nov 2006 03:18:27 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> On 01:46 am, sluggoster at gmail.com wrote: >On 11/1/06, glyph at divmod.com wrote: >This is ironic coming from one of Python's celebrity geniuses. "We >made this class but we don't know how it works." Actually, it's >downright alarming coming from someone who knows Twisted inside and >out yet still can't make sense of path platform oddities. Man, it is going to be hard being ironically self-deprecating if people keep going around calling me a "celebrity genius". My ego doesn't need any help, you know? :) In some sense I was being serious; part of the point of abstraction is embedding some of your knowledge in your code so you don't have to keep it around in your brain all the time. I'm sure that my analysis of path-based problems wasn't exhaustive because I don't really use os.path for path manipulation. I use static.File and it _works_, I only remember these os.path flaws from the process of writing it, not daily use. >> * This is confusing as heck: >> >>> os.path.join("hello", "/world") >> '/world' > >That's in the documentation. I'm not sure it's "wrong". What should >it do in this situation? Pretend the slash isn't there? You can document anything. That doesn't really make it a good idea. The point I was trying to make wasn't really that os.path is *wrong*. Far from it, in fact, it defines some useful operations and they are basically always correct. I didn't even say "wrong", I said "confusing".
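The behavior being called "confusing" here is easy to pin down; using posixpath directly gives the POSIX rules regardless of the platform the snippet runs on:

```python
import posixpath

# Ordinary join:
assert posixpath.join("hello", "world") == "hello/world"
# The documented-but-surprising rule: an absolute component
# discards everything before it.
assert posixpath.join("hello", "/world") == "/world"
# Embedded and doubled separators pass through untouched;
# join does not normalize.
assert posixpath.join("hello", "slash//world") == "hello/slash//world"

result = posixpath.join("hello", "/world")
```
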
FilePath is implemented strictly in terms of os.path because it _does_ do the right thing with its inputs. The question is, how hard is it to remember what its inputs should be? >> >>> os.path.join("hello", "slash/world") >> 'hello/slash/world' > >That has always been a loophole in the function, and many programs >depend on it. If you ever think I'm suggesting breaking something in Python, you're misinterpreting me ;). I am as cagey as they come about this. No matter what else happens, the behavior of os.path should not really change. >The user didn't call normpath, so should we normalize it anyway? That's really the main point here. What is a path that hasn't been "normalized"? Is it a path at all, or is it some random garbage with slashes (or maybe other things) in it? os.path performs correct path algebra on correct inputs, and it's correct (as far as one can be correct) on inputs that have weird junk in them. In the strings-and-functions model of paths, this all makes perfect sense, and there's no particular sensibility associated with defining ideas like "equivalency" for paths, unless that's yet another function you pass some strings to. I definitely prefer this: path1 == path2 to this: os.path.abspath(pathstr1) == os.path.abspath(pathstr2) though. You'll notice I used abspath instead of normpath. As a side note, I've found interpreting relative paths as always relative to the current directory is a bad idea. You can see this when you have a daemon that daemonizes and then opens files: the user thinks they're specifying relative paths from wherever they were when they ran the program, the program thinks they're relative paths from /var/run/whatever. Relative paths, if they should exist at all, should have to be explicitly linked as relative to something *else* (e.g. made absolute) before they can be used. I think that sequences of strings might be sufficient though. >Good point, but exactly what functionality do you want to see for zip >files and URLs? 
Just pathname manipulation? Or the ability to see >whether a file exists and extract it, copy it, etc? The latter. See http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py This is still _really_ raw functionality though. I can't claim that it has the same "it's been used in real code" endorsement as the rest of the FilePath stuff I've been talking about. I've never even tried to hook this up to a Twisted webserver, and I've only used it in one environment. >> * you have to care about unicode sometimes. >This is a Python-wide problem. I completely agree, and this isn't the thread to try to solve it. The absence of a path object, however, and the path module's reliance on strings, exacerbates the problem. The fact that FilePath doesn't deal with this either, however, is a fairly good indication that the problem is deeper than that. >> * the documentation really can't emphasize enough how bad using >> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist >> when it is a contended resource, is. It can be handy, but it is _always_ a >> race condition. > >What else can you do? It's either os.path.exists()/os.remove() or "do >it anyway and catch the exception". And sometimes you have to check >the filetype in order to determine *what* to do. You have to catch the exception anyway in many cases. I probably shouldn't have mentioned it though, it's starting to get a bit far afield of even this ridiculously far-ranging discussion. A more accurate criticism might be that "the absence of a file locking system in the stdlib means that there are lots outside it, and many are broken". Different issue though; if it's related, it's a different method that can be added later. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061102/6d496b43/attachment.html From sluggoster at gmail.com Thu Nov 2 04:42:44 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 19:42:44 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> References: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> Message-ID: <6e9196d20611011942x7d09789ah428f833f623029@mail.gmail.com> On 11/1/06, glyph at divmod.com wrote: > On 01:46 am, sluggoster at gmail.com wrote: > >On 11/1/06, glyph at divmod.com wrote: > > >This is ironic coming from one of Python's celebrity geniuses. "We >made this class but we don't know how it works." Actually, it's >downright alarming coming from someone who knows Twisted inside and >out yet still can't make sense of path platform oddities. > > Man, it is going to be hard being ironically self-deprecating if people keep > going around calling me a "celebrity genius". My ego doesn't need any help, > you know? :) I respect Twisted in the same way I respect a loaded gun. It's powerful, but approach with caution. > If you ever think I'm suggesting breaking something in Python, you're > misinterpreting me ;). I am as cagey as they come about this. No matter > what else happens, the behavior of os.path should not really change. The point is, what *should* a join-like method do in a future improved path module? os.path.join should not change because too many programs depend on its current behavior, in ways we can't necessarily predict. But a new function/method is not bound by these constraints, as long as the boundary cases are well documented. All the os.path and file-related os/shutil functions need to be reexamined in this context. Maybe the existing behavior is best, maybe we'll keep it even if it's sub-optimal, but we should document why we're making these choices. 
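For reference, the boundary cases a new join-like method would have to document are the ones os.path.join exhibits today (posixpath is used below so the results are the same on any platform):

```python
import posixpath

# Embedded separators pass straight through -- the "loophole" mentioned above:
assert posixpath.join("hello", "slash/world") == "hello/slash/world"

# An absolute second argument discards everything before it:
assert posixpath.join("hello", "/world") == "/world"

# Redundant separators survive join() and are only collapsed by normpath():
assert posixpath.join("hello", "slash//world") == "hello/slash//world"
assert posixpath.normpath("hello/slash//world") == "hello/slash/world"
```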
> >The user didn't call normpath, so should we normalize it anyway? > > That's really the main point here. > > What is a path that hasn't been "normalized"? Is it a path at all, or is it > some random garbage with slashes (or maybe other things) in it? os.path > performs correct path algebra on correct inputs, and it's correct (as far as > one can be correct) on inputs that have weird junk in them. I'm tempted to say Path("/a/b").join("c", "d") should do the same thing your .child method does, but allow multiple levels in one step. But on the other hand, there will always be people with prebuilt "path/fragments" to join to other fragments, and I'm not sure we should force them to split the fragment just to rejoin it again. Maybe we need a .join_unsafe method for this, haha. -- Mike Orr From alexander.belopolsky at gmail.com Thu Nov 2 05:42:14 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 04:42:14 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <4548FE08.7070402@v.loewis.de> Message-ID: Travis Oliphant ieee.org> writes: > > Don't lump those ideas together. Shapes and strides are necessary for > N-dimensional array's (it's essentially what *defines* the N-dimensional > array). I really don't want to sacrifice those in the extended buffer > protocol. If you want to separate them into different functions then > that is a possibility. > I don't understand. Do you want to discuss shapes and strides separately from the datatype or not? Note that in ctypes shape is a property of datatype (as in c_int*2*3). In your proposal, shapes and strides are communicated separately. This presents a unique memory management challenge: if the object does not contain shape information in a ready to be pointed to form, who is responsible for deallocating the shape array? 
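The ctypes half of that contrast is visible directly from Python: the shape is baked into the type object itself, so describing a resized buffer means manufacturing a new type rather than reusing the old one.

```python
import ctypes

# "c_int*2*3" reads left to right: an array of 3 arrays of 2 c_ints.
IntArr = ctypes.c_int * 2 * 3
print(IntArr.__name__)   # e.g. c_int_Array_2_Array_3
assert ctypes.sizeof(IntArr) == 6 * ctypes.sizeof(ctypes.c_int)

# Grow the buffer and the old datatype object cannot be reused; a new
# type, carrying its own shape, has to be created:
Bigger = ctypes.c_int * 2 * 4
assert Bigger is not IntArr
assert ctypes.sizeof(Bigger) == 8 * ctypes.sizeof(ctypes.c_int)
```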
> > > > If we manage to agree on the standard way to pass primitive type information, > > it will be a big achievement and immediately useful because simple arrays are > > already in the standard library. > > > > We could start there, I suppose. Especially if it helps us all get on > the same page. Let's start: 1. Should primitive types be associated with simple type codes (short, int, long, float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), (float, 64)]? - I prefer pairs 2. Should primitive type codes be characters or integers (from an enum) at C level? - I prefer integers 3. Should size be expressed in bits or bytes? - I prefer bits From oliphant.travis at ieee.org Thu Nov 2 06:01:50 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 22:01:50 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: Alexander Belopolsky wrote: > Travis Oliphant ieee.org> writes: >> Don't lump those ideas together. Shapes and strides are necessary for >> N-dimensional array's (it's essentially what *defines* the N-dimensional >> array). I really don't want to sacrifice those in the extended buffer >> protocol. If you want to separate them into different functions then >> that is a possibility. >> > > I don't understand. Do you want to discuss shapes and strides separately > from the datatype or not? Note that in ctypes shape is a property of > datatype (as in c_int*2*3). In your proposal, shapes and strides are > communicated separately. This presents a unique memory management > challenge: if the object does not contain shape information in a ready to > be pointed to form, who is responsible for deallocating the shape array? > Perhaps a "view object" should be returned like /F suggests and it manages the shape, strides, and data-format. 
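A Python-level caricature of that view-object suggestion — all names here are hypothetical, not part of any proposal — shows how a view can own the shape and strides, so their lifetime is tied to the view and the deallocation question goes away:

```python
import struct

class ArrayView(object):
    """Hypothetical 'view object': bundles a buffer with the metadata
    (format, shape, strides) needed to interpret it. Because the view
    owns the shape/strides tuples, nobody else has to free them."""

    def __init__(self, data, format, shape):
        self.data = data              # the raw bytes being described
        self.format = format          # struct-style code for one element
        self.shape = tuple(shape)     # owned by the view
        itemsize = struct.calcsize(format)
        # C-contiguous strides, computed once and kept alive with the view:
        strides = []
        stride = itemsize
        for dim in reversed(self.shape):
            strides.append(stride)
            stride *= dim
        self.strides = tuple(reversed(strides))

    def __getitem__(self, index):     # index is a full tuple of ints
        offset = sum(i * s for i, s in zip(index, self.strides))
        return struct.unpack_from(self.format, self.data, offset)[0]

# A 2x3 array of little-endian doubles:
data = struct.pack('<6d', *range(6))
v = ArrayView(data, '<d', (2, 3))
print(v.strides)    # (24, 8)
print(v[1, 2])      # 5.0
```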
>>> If we manage to agree on the standard way to pass primitive type information, >>> it will be a big achievement and immediately useful because simple arrays are >>> already in the standard library. >>> >> We could start there, I suppose. Especially if it helps us all get on >> the same page. > > Let's start: > > 1. Should primitive types be associated with simple type codes (short, int, long, > float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), > (float, 64)]? > - I prefer pairs > > 2. Should primitive type codes be characters or integers (from an enum) at > C level? > - I prefer integers Are these orthogonal? > > 3. Should size be expressed in bits or bytes? > - I prefer bits > So, you want an integer enum for the "kind" and an integer for the bitsize? That's fine with me. One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined in structmember.h already. Should we just re-use those #defines while adding to them to make an easy to use interface for primitive types? -Travis From alexander.belopolsky at gmail.com Thu Nov 2 06:42:26 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 05:42:26 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <4548FE08.7070402@v.loewis.de> Message-ID: Travis E. Oliphant <oliphant.travis at ieee.org> writes: > > Alexander Belopolsky wrote: > > ... > > 1. Should primitive types be associated with simple type codes (short, int, long, > > float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), > > (float, 64)]? > > - I prefer pairs > > > > 2. Should primitive type codes be characters or integers (from an enum) at > > C level? > > - I prefer integers > > Are these orthogonal? > Do you mean are my questions 1 and 2 orthogonal? I guess they are. > > > > 3. Should size be expressed in bits or bytes? > > - I prefer bits > > > > So, you want an integer enum for the "kind" and an integer for the > bitsize? 
That's fine with me. > > One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined > in structmember.h already. Should we just re-use those #defines while > adding to them to make an easy to use interface for primitive types? > I was thinking about using something like NPY_TYPES enum, but T_* codes would work as well. Let me just present both options for the record: --- numpy/ndarrayobject.h --- enum NPY_TYPES { NPY_BOOL=0, NPY_BYTE, NPY_UBYTE, NPY_SHORT, NPY_USHORT, NPY_INT, NPY_UINT, NPY_LONG, NPY_ULONG, NPY_LONGLONG, NPY_ULONGLONG, NPY_FLOAT, NPY_DOUBLE, NPY_LONGDOUBLE, NPY_CFLOAT, NPY_CDOUBLE, NPY_CLONGDOUBLE, NPY_OBJECT=17, NPY_STRING, NPY_UNICODE, NPY_VOID, NPY_NTYPES, NPY_NOTYPE, NPY_CHAR, /* special flag */ NPY_USERDEF=256 /* leave room for characters */ }; --- structmember.h --- /* Types */ #define T_SHORT 0 #define T_INT 1 #define T_LONG 2 #define T_FLOAT 3 #define T_DOUBLE 4 #define T_STRING 5 #define T_OBJECT 6 /* XXX the ordering here is weird for binary compatibility */ #define T_CHAR 7 /* 1-character string */ #define T_BYTE 8 /* 8-bit signed int */ /* unsigned variants: */ #define T_UBYTE 9 #define T_USHORT 10 #define T_UINT 11 #define T_ULONG 12 /* Added by Jack: strings contained in the structure */ #define T_STRING_INPLACE 13 #define T_OBJECT_EX 16 /* Like T_OBJECT, but raises AttributeError when the value is NULL, instead of converting to None. */ #ifdef HAVE_LONG_LONG #define T_LONGLONG 17 #define T_ULONGLONG 18 #endif /* HAVE_LONG_LONG */ From martin at v.loewis.de Thu Nov 2 07:09:12 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 02 Nov 2006 07:09:12 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <45498B88.1000306@v.loewis.de> Travis E. Oliphant schrieb: >> 2. Should primitive type codes be characters or integers (from an enum) at >> C level? >> - I prefer integers > >> 3. 
Should size be expressed in bits or bytes? >> - I prefer bits >> > > So, you want an integer enum for the "kind" and an integer for the > bitsize? That's fine with me. > > One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined > in structmember.h already. Should we just re-use those #defines while > adding to them to make an easy to use interface for primitive types? Notice that those type codes imply sizes, namely the platform sizes (where "platform" always means "what the C compiler does"). So if you want to have platform-independent codes as well, you shouldn't use the T_ codes. Regards, Martin From sluggoster at gmail.com Thu Nov 2 07:47:54 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 22:47:54 -0800 Subject: [Python-Dev] Mini Path object Message-ID: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> Posted to python-dev and python-3000. Follow-ups to python-dev only please. On 10/31/06, Fredrik Lundh wrote: > here's mine; it's fully backwards compatible, can go right into 2.6, > and can be incrementally improved in future releases: > > 1) add a pathname wrapper to "os.path", which lets you do basic > path "algebra". this should probably be a subclass of unicode, > and should *only* contain operations on names. > > 2) make selected "shutil" operations available via the "os" name- > space; the old POSIX API vs. POSIX SHELL distinction is pretty > irrelevant. also make the os.path predicates available via the > "os" namespace. > > this gives a very simple conceptual model for the user; to manipulate > path *names*, use "os.path.(string)" functions or the "" > wrapper. to manipulate *objects* identified by a path, given either as > a string or a path wrapper, use "os.(path)". this can be taught in > less than a minute. 
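The two-part model quoted above — os.path.*(string) functions for pure name algebra, os.*(path) operations for the objects those names denote — is essentially the split that exists today, shutil aside; a quick illustration:

```python
import os
import os.path
import tempfile

# Pure name algebra: nothing here touches the filesystem.
name = os.path.join("data", "raw", "input.txt")
assert os.path.dirname(name) == os.path.join("data", "raw")
assert os.path.splitext(name)[1] == ".txt"

# Operations on the *object* a name identifies: these do touch the filesystem.
root = tempfile.mkdtemp()
target = os.path.join(root, "sub")
os.mkdir(target)               # manipulate the object behind the name
assert os.path.isdir(target)   # a predicate that would move to os.* in this model
os.rmdir(target)
os.rmdir(root)
```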
Given the widely-diverging views on what, if anything, should be done to os.path, how about we make a PEP and a standalone implementation of (1) for now, and leave (2) and everything else for a later PEP. This will make people who want a reasonably forward-compatable object NOW for their Python 2.4/2.5 programs happy, provide a common seed for more elaborate libraries that may be proposed for the standard library later (and eliminate the possibility of moving the other functions and later deprecating them), and provide a module that will be well tested by the time 2.6 is ready for finalization. There's already a reference implementation in PEP 355, we'd just have to strip out the non-pathname features. There's a copy here (http://wiki.python.org/moin/PathModule) that looks reasonably recent (constructors are self.__class__() to make it subclassable), although I wonder why the class is called path instead of Path. There was another copy in the Python CVS although I can't find it now; was it deleted in the move to Subversion? (I thought it was in /sandbox/trunk/: http://svn.python.org/view/sandbox/trunk/). So, let's say we strip this Path class to: class Path(unicode): Path("foo") Path( Path("directory"), "subdirectory", "file") # Replaces .joinpath(). Path() Path.cwd() Path("ab") + "c" => Path("abc") .abspath() .normcase() .normpath() .realpath() .expanduser() .expandvars() .expand() .parent .name # Full filename without path .namebase # Filename without extension .ext .drive .splitpath() .stripext() .splitunc() .uncshare .splitall() .relpath() .relpathto() Would this offend anyone? Are there any attribute renames or method enhancements people just can't live without? 'namebase' is the only name I hate but I could live with it. The multi-argument constructor is a replacement for joining paths. (The PEP says .joinpath was "problematic" without saying why.) 
This could theoretically go either way, doing either the same thing as os.path.join, getting a little smarter, or doing "safe" joins by disallowing "/" embedded in string arguments. I would say that a directory-tuple Path object with these features could be maintained in parallel, but since the remaining functions require string arguments you'd have to use unicode() a lot. -- Mike Orr From p.f.moore at gmail.com Thu Nov 2 09:46:09 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Nov 2006 08:46:09 +0000 Subject: [Python-Dev] [Python-3000] Mini Path object In-Reply-To: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> References: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> Message-ID: <79990c6b0611020046j9d95781i378b65a55ea016c3@mail.gmail.com> On 11/2/06, Mike Orr wrote: > Given the widely-diverging views on what, if anything, should be done > to os.path, how about we make a PEP and a standalone implementation of > (1) for now, and leave (2) and everything else for a later PEP. Why write a PEP at this stage? Just release your proposal as a module, and see if people use it. If they do, write a PEP to include it in the stdlib. (That's basically what happened with the original PEP - it started off proposing Jason Orendorff's path module IIRC). >From what you're proposing, I may well use such a module, if it helps :-) (But I'm not sure I'd vote for it in to go the stdlib without having survived as an external module first) Paul. 
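To make the stripped-down class proposed upthread concrete, here is a minimal sketch — a str subclass (standing in for 2.x unicode) with the multi-argument constructor and a few of the listed properties. This is only an illustration of the shape of the API, not the PEP 355 reference implementation.

```python
import os
import os.path

class Path(str):
    def __new__(cls, *parts):
        # Path() -> current directory; Path(a, b, c) replaces .joinpath()
        joined = os.path.join(*parts) if parts else os.curdir
        return super(Path, cls).__new__(cls, joined)

    @classmethod
    def cwd(cls):
        return cls(os.getcwd())

    @property
    def parent(self):
        return self.__class__(os.path.dirname(self))

    @property
    def name(self):
        # full filename without the leading directories
        return os.path.basename(self)

    @property
    def ext(self):
        return os.path.splitext(self)[1]

p = Path("directory", "subdirectory", "file.tar.gz")
assert p == os.path.join("directory", "subdirectory", "file.tar.gz")
assert p.name == "file.tar.gz"
assert p.ext == ".gz"
assert isinstance(p.parent, Path)
```

Because Path subclasses str, every existing os.path and os function accepts it unchanged, which is the forward-compatibility property Fredrik's proposal is after.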
From p.f.moore at gmail.com Thu Nov 2 09:53:56 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Nov 2006 08:53:56 +0000 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <45494519.4020501@ieee.org> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> Message-ID: <79990c6b0611020053w6a11c424yfc6d329ab48e4a90@mail.gmail.com> On 11/2/06, Travis Oliphant wrote: > What do you mean by "manipulate the data." The proposal for a > data-format object would help you describe that data in a standard way > and therefore share that data between several library that would be able > to understand the data (because they all use and/or understand the > default Python way to handle data-formats). > > It would be up to the other packages to "manipulate" the data. Yes, some other messages I read since I posted this clarified it for me. Essentially, as a Python programmer, there's nothing in the PEP for me - it's for extension writers (and maybe writers of some lower-level Python modules? I'm not sure about this). So as I'm not really the target audience, I won't comment further. > So, what you would be able to do is take your byte-string and create a > buffer object which you could then share with other packages: > > Example: > > b = buffer(bytestr, format=data_format_object) > > Now. > > a = numpy.frombuffer(b) > a['field1'] # prints data stored in the field named "field1" > > etc. > > Or. > > cobj = ctypes.frombuffer(b) > > # Now, cobj is a ctypes object that is basically a "structure" that can > be passed # directly to your C-code. > > Does this help? Somewhat. My understanding is that the python-level buffer object is frowned upon as not good practice, and is scheduled for removal at some point (Py3K, quite possibly?) Hence, any code that uses buffer() feels like it "needs" to be replaced by something "more acceptable". 
So although I understand the use you suggest, it's not compelling to me because I am left with the feeling that I wish I knew "the way to do it that didn't need the buffer object" (even though I realise intellectually that such a way may not exist). Paul. From oliphant.travis at ieee.org Thu Nov 2 16:59:14 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu, 02 Nov 2006 08:59:14 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <45498B88.1000306@v.loewis.de> References: <4548FE08.7070402@v.loewis.de> <45498B88.1000306@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Travis E. Oliphant schrieb: > >>>2. Should primitive type codes be characters or integers (from an enum) at >>>C level? >>> - I prefer integers >> >>>3. Should size be expressed in bits or bytes? >>> - I prefer bits >>> >> >>So, you want an integer enum for the "kind" and an integer for the >>bitsize? That's fine with me. >> >>One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined >>in structmember.h already. Should we just re-use those #defines while >>adding to them to make an easy to use interface for primitive types? > > > Notice that those type codes imply sizes, namely the platform sizes > (where "platform" always means "what the C compiler does"). So if > you want to have platform-independent codes as well, you shouldn't > use the T_ codes. > In NumPy we've found it convenient to use both. Basically, we've set up a header file that "does the translation" using #defines and typedefs to create things like (on a 32-bit platform) typedef npy_int32 int #define NPY_INT32 NPY_INT So, that either the T_code-like enum or the bit-width can be used interchangable. Typically people want to specify bit-widths (and see their data-types in bit-widths) but in C-code that implements something you need to use one of the platform integers. I don't know if we really need to bring all of that over. 
-Travis From theller at ctypes.org Thu Nov 2 21:15:19 2006 From: theller at ctypes.org (Thomas Heller) Date: Thu, 02 Nov 2006 21:15:19 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <454A51D7.90505@ctypes.org> Travis E. Oliphant schrieb: > Travis E. Oliphant wrote: >> Thanks for all the comments that have been given on the data-type >> (data-format) PEP. I'd like opinions on an idea for revising the PEP I >> have. > >> >> 1) We could define a special string-syntax (or list syntax) that covers >> every special case. The array interface specification goes this >> direction and it requires no new Python types. This could also be seen >> as an extension of the "struct" module to allow for nested structures, etc. >> >> 2) We could define a Python object that specifically carries data-format >> information. >> >> >> Does that explain the goal of what I'm trying to do better? > > In other-words, what I'm saying is I really want a PEP that does this. > Could we have a discussion about what the best way to communicate > data-format information across multiple extension modules would look > like. I'm not saying my (pre-)PEP is best. The point of putting it in > it's infant state out there is to get the discussion rolling, not to > claim I've got all the answers. IIUC, so far the 'data-object' carries information about the structure of the data it describes. Couldn't it go a step further and have also some functionality? Converting the data into a Python object and back? This is what the ctypes SETFUNC and GETFUNC functions do, and what also is implemented in the struct module... 
Thomas From theller at ctypes.org Thu Nov 2 21:35:32 2006 From: theller at ctypes.org (Thomas Heller) Date: Thu, 02 Nov 2006 21:35:32 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <45468C8E.1000203@canterbury.ac.nz> <454789F9.7050808@ctypes.org> Message-ID: <454A5694.1020809@ctypes.org> Ronald Oussoren schrieb: > On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote: > >> >> This mechanism is probably a hack because it'n not possible to add >> C accessible >> fields to type objects, on the other hand it is extensible (in >> principle, at least). > > I better start rewriting PyObjC then :-). PyObjC stores some addition > information in the type objects that are used to describe Objective-C > classes (such as a reference to the proxied class). > > IIRC This has been possible from Python 2.3. I assume you are referring to the code in pyobjc/Modules/objc/objc-class.h ? If this really is reliable I should better start rewriting ctypes then ;-). Hm, I always thought there was some additional magic going on with type objects, fields appended dynamically at the end or whatever. Thomas From ronaldoussoren at mac.com Thu Nov 2 22:38:08 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 2 Nov 2006 22:38:08 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <454A5694.1020809@ctypes.org> References: <45468C8E.1000203@canterbury.ac.nz> <454789F9.7050808@ctypes.org> <454A5694.1020809@ctypes.org> Message-ID: On Nov 2, 2006, at 9:35 PM, Thomas Heller wrote: > Ronald Oussoren schrieb: >> On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote: >> >>> >>> This mechanism is probably a hack because it'n not possible to add >>> C accessible >>> fields to type objects, on the other hand it is extensible (in >>> principle, at least). >> >> I better start rewriting PyObjC then :-). 
PyObjC stores some addition >> information in the type objects that are used to describe Objective-C >> classes (such as a reference to the proxied class). >> >> IIRC This has been possible from Python 2.3. > > I assume you are referring to the code in pyobjc/Modules/objc/objc- > class.h Yes. > > If this really is reliable I should better start rewriting ctypes > then ;-). > > Hm, I always thought there was some additional magic going on with > type > objects, fields appended dynamically at the end or whatever. There is such magic, but that magic was updated in Python 2.3 to allow type-object extensions like this. Ronald -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3562 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061102/fc916cd2/attachment.bin From oliphant.travis at ieee.org Thu Nov 2 23:30:51 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu, 02 Nov 2006 15:30:51 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454A51D7.90505@ctypes.org> References: <454A51D7.90505@ctypes.org> Message-ID: T > > IIUC, so far the 'data-object' carries information about the structure > of the data it describes. > > Couldn't it go a step further and have also some functionality? > Converting the data into a Python object and back? > Yes, I had considered it to do that. That's why the setfunc and getfunc functions were written the way they were. -teo From greg.ewing at canterbury.ac.nz Fri Nov 3 00:52:44 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 12:52:44 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <454A84CC.9000905@canterbury.ac.nz> Alexander Belopolsky wrote: > In ctypes arrays of different > shapes are represented using different types. 
As a result, if the object > exporting its buffer is resized, the datatype object cannot be reused, it > has to be replaced. I was thinking about that myself the other day. I was thinking that both ctypes and NumPy arrays + proposed_type_descriptor provide a way of describing an array of binary data and providing Python-level access to that data. So a NumPy array and an instance of a ctypes type that happens to describe an array are very similar things. I was wondering whether they could be unified somehow. But then I realised that the ctypes array is a fixed-size array, whereas NumPy's notion of an array is rather more flexible. So they're not really the same thing after all. However, the *elements* of the array are fixed size in both cases, so the respective descriptions of the element type could potentially have something in common. My current take on the situation is that Travis is probably right about ctypes types being too cumbersome for what he has in mind. The next best thing would be to make them interoperate: have an easy way of getting a ctypes type corresponding to a given data layout description and vice versa. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 00:57:21 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 12:57:21 +1300 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: <454A85E1.90902@canterbury.ac.nz> Fredrik Lundh wrote: > the "right solution" for things like this is an *API* that lets you do > things like: > > view = object.acquire_view(region, supported formats) And how do you describe the "supported formats"? That's where Travis's proposal comes in, as far as I can see. 
-- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:07 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:07 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> Message-ID: <454A9587.6030806@canterbury.ac.nz> Mike Orr wrote: >> * This is confusing as heck: >> >>> os.path.join("hello", "/world") >> '/world' It's only confusing if you're not thinking of pathnames as abstract entities. There's a reason for this behaviour -- it's so you can do things like full_path = os.path.join(default_dir, filename_from_user) where filename_from_user can be either a relative or absolute path at his discretion. In other words, os.path.join doesn't just mean "join these two paths together", it means "interpret the second path in the context of the first". Having said that, I can see there could be an element of confusion in calling it "join". -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:13 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:13 +1300 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <454A958D.9070002@canterbury.ac.nz> Travis Oliphant wrote: > or just > > numpy.array(array.array('d',[1,2,3])) > > and leave-out the buffer object all together. I think the buffer object in his example was just a placeholder for "some arbitrary object that supports the buffer interface", not necessarily another NumPy array. 
-- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:19 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:19 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: <45490DF9.4070500@v.loewis.de> Message-ID: <454A9593.5000807@canterbury.ac.nz> Alexander Belopolsky wrote: > That's a question for Travis, but I would think that they would be > immutable at the Python level, but mutable at the C level. Well, anything's mutable at the C level -- the question is whether you *should* be mutating it. I think the datatype object should almost certainly be immutable. Since it's separated from the data it's describing, it's possible for one datatype object to describe multiple chunks of data. So you wouldn't want to mutate one in case it's being used for something else that you don't know about. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:23 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:23 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: <454A9597.7090102@canterbury.ac.nz> glyph at divmod.com wrote: > >>> os.path.join("hello", "slash/world") > 'hello/slash/world' > >>> os.path.join("hello", "slash//world") > 'hello/slash//world' > Trying to formulate a general rule for what the arguments to > os.path.join are supposed to be is really hard. If you're serious about writing platform-agnostic pathname code, you don't put slashes in the arguments at all. Instead you do os.path.join("hello", "slash", "world") Many of the other things you mention are also a result of not treating pathnames as properly opaque objects. If you're saying that the fact they're strings makes it easy to forget that you're supposed to be treating them opaquely, there may be merit in that view. 
It would be an argument for making path objects a truly opaque type instead of a subclass of string or tuple. > * although individual operations are atomic, shutil.copytree and > friends aren't. I've often seen python programs confused by > partially-copied trees of files. I can't see how this can be even remotely regarded as a pathname issue, or even a filesystem interface issue. It's no different to any other situation where a piece of code can fall over and leave a partial result behind. As always, the cure is defensive coding (clean up a partial result on error, or be prepared to tolerate the presence of a previous partial result when re-trying). It could be argued that shutil.copytree should clean up after itself if there is an error, but that might not be what you want -- e.g. you might want to find out how far it got, and maybe carry on from there next time. It's probably better to leave things like that to the caller. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:11:54 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:11:54 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20611011836k5e62990pf8851d066ea120b2@mail.gmail.com> References: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> <6e9196d20611011836k5e62990pf8851d066ea120b2@mail.gmail.com> Message-ID: <454A975A.1050708@canterbury.ac.nz> Mike Orr wrote: > I have no idea why Microsoft thought it was a good idea to > put the seven-odd device files in every directory. Why not force > people to type the colon ("CON:"). Yes, this is a particularly stupid piece of braindamage on the part of the designers of MS-DOS. As far as I remember, even CP/M (which was itself a severely warped and twisted version of RT11) had the good sense to put colons on the end of such things. But maybe "design" is too strong a word to apply to MS-DOS... Anyhow, I think I agree that there's really nothing a path library can do about this. 
Whatever it tries to do, the fact will remain that it's impossible to have a regular file called "con", and users will have to live with that. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:16:23 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:16:23 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> References: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> Message-ID: <454A9867.8030807@canterbury.ac.nz> glyph at divmod.com wrote: > Relative > paths, if they should exist at all, should have to be explicitly linked > as relative to something *else* (e.g. made absolute) before they can be > used. If paths were opaque objects, this could be enforced by not having any way of constructing a path that wasn't rooted in some existing absolute path. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:39:41 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:39:41 +1300 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <454A9DDD.8040000@canterbury.ac.nz> Travis E. Oliphant wrote: > We have T_UBYTE and T_BYTE, etc. defined > in structmember.h already. Should we just re-use those #defines while > adding to them to make an easy to use interface for primitive types? They're mixed up with size information, though, which we don't want to do. -- Greg From glyph at divmod.com Fri Nov 3 02:54:48 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 03 Nov 2006 01:54:48 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061103015448.14394.1229016541.divmod.xquotient.630@joule.divmod.com> On 01:04 am, greg.ewing at canterbury.ac.nz wrote: >glyph at divmod.com wrote: >If you're serious about writing platform-agnostic >pathname code, you don't put slashes in the arguments >at all. 
Instead you do > > os.path.join("hello", "slash", "world") > >Many of the other things you mention are also a >result of not treating pathnames as properly opaque >objects. Of course nobody who cares about these issues is going to put constant forward slashes into pathnames. The point is not that you'll forget you're supposed to be dealing with pathnames; the point is that you're going to get input from some source that you've got very little control over, and *especially* if that source is untrusted (although sometimes just due to mistakes) there are all kinds of ways it can trip you up. Did you accidentally pass it through something that doubles or undoubles all backslashes, etc. Sometimes these will result in harmless errors anyway, sometimes it's a critical error that will end up trying to delete /usr instead of /home/user/installer-build/ROOT/usr. If you have the path library catching these problems for you then a far greater percentage fall into the former category. >If you're saying that the fact they're strings makes >it easy to forget that you're supposed to be treating >them opaquely, That's exactly what I'm saying. >> * although individual operations are atomic, shutil.copytree and friends >>aren't. I've often seen python programs confused by partially-copied trees >>of files. >I can't see how this can be even remotely regarded >as a pathname issue, or even a filesystem interface >issue. It's no different to any other situation >where a piece of code can fall over and leave a >partial result behind. It is a bit of a stretch, I'll admit, but I included it because it is a weakness of the path library that it is difficult to do the kind of parallel iteration required to implement tree-copying yourself. If that were trivial, then you could write your own file-copying loop and cope with errors yourself. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061103/e2c92bb2/attachment.html From alexander.belopolsky at gmail.com Fri Nov 3 02:55:41 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 20:55:41 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454A9593.5000807@canterbury.ac.nz> References: <45490DF9.4070500@v.loewis.de> <454A9593.5000807@canterbury.ac.nz> Message-ID: <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> On Nov 2, 2006, at 8:04 PM, Greg Ewing wrote: > > I think the datatype object should almost certainly > be immutable. Since it's separated from the data > it's describing, it's possible for one datatype > object to describe multiple chunks of data. So > you wouldn't want to mutate one in case it's being > used for something else that you don't know about. I only mentioned that the datatype object would be mutable at C level because changing the object instead of deleting and creating a new one could be a valid optimization in situations where the object is known not to be shared. My main concern was that in ctypes the size of an array is a part of the datatype object and this seems to be redundant if used for the buffer protocol. Buffer protocol already reports the size of the buffer as a return value of bf_get*buffer methods. In another post, Greg Ewing wrote: > > numpy.array(array.array('d',[1,2,3])) > > > > and leave-out the buffer object all together. > I think the buffer object in his example was just a > placeholder for "some arbitrary object that supports > the buffer interface", not necessarily another NumPy > array. Yes, thanks. In fact numpy.array(array.array('d',[1,2,3])) already works in numpy (I think because numpy knows about the standard library array type). In my example, I wanted to use an object that supports buffer protocol and little else.
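The type-information gap being discussed here can be shown with nothing but the standard library. The sketch below is illustrative only: it contrasts a bytes-only view, where the consumer must know the element format out of band, with a format-carrying view. The `memoryview` API used here is the eventual product of this extended-buffer-protocol work (PEP 3118) and did not exist when this thread was written.

```python
# Illustrative sketch: a bytes-only view of an array versus a view
# that carries data-format information along with the memory.
import array
import struct

a = array.array('d', [1.0, 2.0, 3.0])

# Bytes-only: the 'd' (double) format must be known out of band.
raw = a.tobytes()
decoded = struct.unpack('3d', raw)      # consumer supplies the format

# Format-aware: the view itself reports typecode and item size, so a
# consumer such as numpy.array() could reconstruct the element type.
view = memoryview(a)
print(view.format, view.itemsize)       # d 8
print(view.tolist())                    # [1.0, 2.0, 3.0]
```

With a view like this, a consumer no longer has to fall back on the sequence protocol just to learn that the elements are doubles.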
From greg.ewing at canterbury.ac.nz Fri Nov 3 03:25:15 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 15:25:15 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> References: <45490DF9.4070500@v.loewis.de> <454A9593.5000807@canterbury.ac.nz> <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> Message-ID: <454AA88B.2070900@canterbury.ac.nz> Alexander Belopolsky wrote: > My main concern was that in ctypes the size of an array is a part of > the datatype object and this seems to be redundant if used for the > buffer protocol. Buffer protocol already reports the size of the > buffer as a return value of bf_get*buffer methods. I think what would happen if you were interoperating with ctypes is that you would get a datatype describing one element of the array, together with the shape information, and construct a ctypes array type from that. And going the other way, from a ctypes array type you would extract an element datatype and a shape. -- Greg From steve at holdenweb.com Fri Nov 3 03:55:28 2006 From: steve at holdenweb.com (Steve Holden) Date: Fri, 03 Nov 2006 02:55:28 +0000 Subject: [Python-Dev] Path object design In-Reply-To: <454A9587.6030806@canterbury.ac.nz> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> <454A9587.6030806@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Mike Orr wrote: > > >>>* This is confusing as heck: >>> >>> os.path.join("hello", "/world") >>> '/world' > > > It's only confusing if you're not thinking of > pathnames as abstract entities. > > There's a reason for this behaviour -- it's > so you can do things like > > full_path = os.path.join(default_dir, filename_from_user) > > where filename_from_user can be either a relative > or absolute path at his discretion. 
> > In other words, os.path.join doesn't just mean "join > these two paths together", it means "interpret the > second path in the context of the first". > > Having said that, I can see there could be an > element of confusion in calling it "join". > Good point. "relativise" might be appropriate, though something shorter would be better. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From alexander.belopolsky at gmail.com Fri Nov 3 04:20:22 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 3 Nov 2006 03:20:22 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> <79990c6b0611020053w6a11c424yfc6d329ab48e4a90@mail.gmail.com> Message-ID: Paul Moore gmail.com> writes: > Somewhat. My understanding is that the python-level buffer object is > frowned upon as not good practice, and is scheduled for removal at > some point (Py3K, quite possibly?) Hence, any code that uses buffer() > feels like it "needs" to be replaced by something "more acceptable". Python 2.x buffer object serves two distinct purposes. First, it is a "mutable string" object and this is definitely not going away being replaced by the bytes object. (Interestingly, this functionality is not exposed to python, but C extension modules can call PyBuffer_New(size) to create a buffer.) Second, it is a "view" into any object supporting buffer protocol. For a while this usage was indeed frowned upon because buffer objects held the pointer obtained from bf_get*buffer for too long causing memory errors in situations like this: >>> a = array('c', "x"*10) >>> b = buffer(a, 5, 2) >>> a.extend('x'*1000) >>> str(b) 'xx' This problem was fixed more than two years ago. 
------ r35400 | nascheme | 2004-03-10 Make buffer objects based on mutable objects (like array) safe. ------ Even though it was suggested in the past that buffer *object* should be deprecated as unsafe, I don't remember seeing a call to deprecate the buffer protocol. > So although I understand the use you suggest, it's not compelling to > me because I am left with the feeling that I wish I knew "the way to > do it that didn't need the buffer object" (even though I realise > intellectually that such a way may not exist). > As I explained in another post, I used buffer object as an example of an object that supports buffer protocol, but does not export type information in the form usable by numpy. Here is another way to illustrate the problem: >>> a = numpy.array(array.array('H', [1,2,3])) >>> b = numpy.array([1,2,3],dtype='H') >>> a.dtype == b.dtype False With the extended buffer protocol it will be possible for numpy.array(..) to realize that array.array('H', [1,2,3]) is a sequence of unsigned short integers and convert it accordingly. Currently numpy has to go through the sequence protocol to create a numpy.array from an array.array and lose the type information. From alexander.belopolsky at gmail.com Fri Nov 3 04:36:59 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 22:36:59 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454AA88B.2070900@canterbury.ac.nz> References: <45490DF9.4070500@v.loewis.de> <454A9593.5000807@canterbury.ac.nz> <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> <454AA88B.2070900@canterbury.ac.nz> Message-ID: On Nov 2, 2006, at 9:25 PM, Greg Ewing wrote: > > I think what would happen if you were interoperating with > ctypes is that you would get a datatype describing one > element of the array, together with the shape information, > and construct a ctypes array type from that.
And going > the other way, from a ctypes array type you would extract > an element datatype and a shape. Correct, assuming Travis' approach is accepted. However I understood that Martin was suggesting that ctypes types should be used to describe the structure of the buffer. Thus a buffer containing 10 integers would report its datatype as c_int*10. I was probably mistaken and Martin was suggesting the same as you. In this case extended buffer protocol would still use a different model from ctype and "don't reinvent the wheel" argument goes away. From greg.ewing at canterbury.ac.nz Fri Nov 3 07:31:57 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 19:31:57 +1300 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> <454A9587.6030806@canterbury.ac.nz> Message-ID: <454AE25D.9090507@canterbury.ac.nz> Steve Holden wrote: > Greg Ewing wrote: > >>Having said that, I can see there could be an >>element of confusion in calling it "join". >> > > Good point. "relativise" might be appropriate, Sounds like something to make my computer go at warp speed, which would be nice, but I won't be expecting a patch any time soon. :-) -- Greg From talin at acm.org Fri Nov 3 07:35:11 2006 From: talin at acm.org (Talin) Date: Thu, 02 Nov 2006 22:35:11 -0800 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> <454A9587.6030806@canterbury.ac.nz> Message-ID: <454AE31F.1050300@acm.org> Steve Holden wrote: > Greg Ewing wrote: >> Mike Orr wrote: >> Having said that, I can see there could be an >> element of confusion in calling it "join". >> > Good point. "relativise" might be appropriate, though something shorter > would be better. 
> > regards > Steve The term used in many languages for this sort of operation is "combine". (See .Net System.IO.Path for an example.) I kind of like the term - it implies that you are mixing two paths together, but it doesn't imply that the combination will be additive. - Talin From dalke at dalkescientific.com Fri Nov 3 17:58:54 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 3 Nov 2006 17:58:54 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: glyph: > Path manipulation: > > * This is confusing as heck: > >>> os.path.join("hello", "/world") > '/world' > >>> os.path.join("hello", "slash/world") > 'hello/slash/world' > >>> os.path.join("hello", "slash//world") > 'hello/slash//world' > Trying to formulate a general rule for what the arguments to os.path.join > are supposed to be is really hard. I can't really figure out what it would > be like on a non-POSIX/non-win32 platform. Made trickier by the similar yet different behaviour of urlparse.urljoin. >>> import urlparse >>> urlparse.urljoin("hello", "/world") '/world' >>> urlparse.urljoin("hello", "slash/world") 'slash/world' >>> urlparse.urljoin("hello", "slash//world") 'slash//world' >>> It does not make sense to me that these should be different. Andrew dalke at dalkescientific.com [Apologies to glyph for the dup; mixed up the reply-to. Still getting used to gmail.] 
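The divergence between the two "join" operations is easy to reproduce side by side. The snippet below uses the modern module names (urlparse became urllib.parse in Python 3) and posixpath rather than os.path so the results are the same on any platform; the outputs match the interpreter transcripts quoted in this thread.

```python
# Side-by-side comparison of the two "join" semantics discussed here.
import posixpath                  # os.path on POSIX, for a stable demo
from urllib.parse import urljoin  # urlparse.urljoin in Python 2

# os.path.join treats the base as a directory and appends to it...
print(posixpath.join("hello", "slash/world"))  # hello/slash/world

# ...while urljoin first strips the last segment of the base (per the
# RFC, a base of "hello" names a document, not a directory).
print(urljoin("hello", "slash/world"))         # slash/world

# The two agree only when the second argument is absolute.
print(posixpath.join("hello", "/world"))       # /world
print(urljoin("hello", "/world"))              # /world
```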
From steve at holdenweb.com Fri Nov 3 19:38:21 2006 From: steve at holdenweb.com (Steve Holden) Date: Fri, 03 Nov 2006 18:38:21 +0000 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: Andrew Dalke wrote: > glyph: > >>Path manipulation: >> >> * This is confusing as heck: >> >>> os.path.join("hello", "/world") >> '/world' >> >>> os.path.join("hello", "slash/world") >> 'hello/slash/world' >> >>> os.path.join("hello", "slash//world") >> 'hello/slash//world' >> Trying to formulate a general rule for what the arguments to os.path.join >>are supposed to be is really hard. I can't really figure out what it would >>be like on a non-POSIX/non-win32 platform. > > > Made trickier by the similar yet different behaviour of urlparse.urljoin. > > >>> import urlparse > >>> urlparse.urljoin("hello", "/world") > '/world' > >>> urlparse.urljoin("hello", "slash/world") > 'slash/world' > >>> urlparse.urljoin("hello", "slash//world") > 'slash//world' > >>> > > It does not make sense to me that these should be different. > Although the last two smell like bugs, the point of urljoin is to make an absolute URL from an absolute ("current page") URL and a relative (link) one. As we see: >>> urljoin("/hello", "slash/world") '/slash/world' and >>> urljoin("http://localhost/hello", "slash/world") 'http://localhost/slash/world' but >>> urljoin("http://localhost/hello/", "slash/world") 'http://localhost/hello/slash/world' >>> urljoin("http://localhost/hello/index.html", "slash/world") 'http://localhost/hello/slash/world' >>> I think we can probably conclude that this is what's supposed to happen. In the case of urljoin the first argument is interpreted as referencing an existing resource and the second as a link such as might appear in that resource. 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From fredrik at pythonware.com Fri Nov 3 20:04:40 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 03 Nov 2006 20:04:40 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: Steve Holden wrote: > Although the last two smell like bugs, the point of urljoin is to make > an absolute URL from an absolute ("current page") URL also known as a base URL: http://www.w3.org/TR/html4/struct/links.html#h-12.4.1 (os.path.join's behaviour is also well-defined, btw; if any component is an absolute path, all preceding components are ignored.) From martin at v.loewis.de Sat Nov 4 00:32:57 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 00:32:57 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: <454BD1A9.8080508@v.loewis.de> Andrew Dalke schrieb: > >>> import urlparse > >>> urlparse.urljoin("hello", "/world") > '/world' > >>> urlparse.urljoin("hello", "slash/world") > 'slash/world' > >>> urlparse.urljoin("hello", "slash//world") > 'slash//world' > >>> > > It does not make sense to me that these should be different. Just in case this isn't clear from Steve's and Fredrik's post: The behaviour of this function is (or should be) specified by an IETF RFC. If somebody finds that non-intuitive, that's likely because their mental model of relative URIs deviates from the RFC's model. Of course, there is also the chance that the implementation deviates from the RFC; that would be a bug.
Regards, Martin From scott+python-dev at scottdial.com Sat Nov 4 01:07:35 2006 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 03 Nov 2006 19:07:35 -0500 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <45494519.4020501@ieee.org> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> Message-ID: <454BD9C7.9050001@scottdial.com> Travis Oliphant wrote: > Paul Moore wrote: >> Enough of the abstract. As a concrete example, suppose I have a (byte) >> string in my program containing some binary data - an ID3 header, or a >> TCP packet, or whatever. It doesn't really matter. Does your proposal >> offer anything to me in how I might manipulate that data (assuming I'm >> not using NumPy)? (I'm not insisting that it should, I'm just trying >> to understand the scope of the PEP). >> > > What do you mean by "manipulate the data." The proposal for a > data-format object would help you describe that data in a standard way > and therefore share that data between several library that would be able > to understand the data (because they all use and/or understand the > default Python way to handle data-formats). > Perhaps the most relevant thing to pull from this conversation is back to what Martin has asked about before: "flexible array members". A TCP packet has no defined length (there isn't even a header field in the packet for this, so in fairness we can talk about IP packets which do). There is no way for me to describe this with the pre-PEP data-formats. I feel like it is misleading of you to say "it's up to the package to do manipulations," because you glanced over the fact that you can't even describe this type of data. ISTM, that you're only interested in describing repetitious fixed-structure arrays. If we are going to have a "default Python way to handle data-formats", then don't you feel like this falls short of the mark? 
I fear that you speak about this in too grandiose terms and are now trapped by people asking, "well, can I do this?" I think for a lot of folks the answer is: "nope." With respect to the network packets, this PEP doesn't do anything to fix the communication barrier. Is this not in the scope of "a consistent and standard way to discuss the format of binary data" (which is what your PEP's abstract sets out as the task)? -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From dalke at dalkescientific.com Sat Nov 4 01:56:39 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 4 Nov 2006 01:56:39 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454BD1A9.8080508@v.loewis.de> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> Message-ID: Martin: > Just in case this isn't clear from Steve's and Fredrik's > post: The behaviour of this function is (or should be) > specified, by an IETF RFC. If somebody finds that non-intuitive, > that's likely because their mental model of relative URIs > deviate's from the RFC's model. While I didn't realize that urljoin is only supposed to be used with a base URL, where "base URL" (used in the docstring) has a specific requirement that it be absolute. I instead saw the word "join" and figured it's should do roughly the same things as os.path.join. >>> import urlparse >>> urlparse.urljoin("file:///path/to/hello", "slash/world") 'file:///path/to/slash/world' >>> urlparse.urljoin("file:///path/to/hello", "/slash/world") 'file:///slash/world' >>> import os >>> os.path.join("/path/to/hello", "slash/world") '/path/to/hello/slash/world' >>> It does not. 
My intuition, nowadays highly influenced by URLs, is that with a couple of hypothetical functions for going between filenames and URLs: os.path.join(absolute_filename, filename) == file_url_to_filename(urlparse.urljoin( filename_to_file_url(absolute_filename), filename_to_file_url(filename))) which is not the case. os.join assumes the base is a directory name when used in a join: "inserting '/' as needed" while RFC 1808 says The last segment of the base URL's path (anything following the rightmost slash "/", or the entire path if no slash is present) is removed Is my intuition wrong in thinking those should be the same? I suspect it is. I've been very glad that when I ask for a directory name that I don't need to check that it ends with a "/". Urljoin's behaviour is correct for what it's doing. os.path.join is better for what it's doing. (And about once a year I manually verify the difference because I get unsure.) I think these should not share the "join" in the name. If urljoin is not meant for relative base URLs, should it raise an exception when misused? Hmm, though the RFC algorithm does not have a failure mode and the result may be a relative URL. Consider >>> urlparse.urljoin("http://blah.com/a/b/c", "..") 'http://blah.com/a/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../") 'http://blah.com/a/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../..") 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../") 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../..") 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../") 'http://blah.com/../' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..") # What?! 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../") 'http://blah.com/../../' >>> > Of course, there is also the chance that the implementation > deviates from the RFC; that would be a bug. The comment in urlparse # XXX The stuff below is bogus in various ways... 
is ever so reassuring. I suspect there's a bug given the previous code. Or I've a bad mental model. ;) Andrew dalke at dalkescientific.com From oliphant.travis at ieee.org Sat Nov 4 02:44:19 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri, 03 Nov 2006 18:44:19 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <454BD9C7.9050001@scottdial.com> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> <454BD9C7.9050001@scottdial.com> Message-ID: <454BF073.2050402@ieee.org> > > Perhaps the most relevant thing to pull from this conversation is back > to what Martin has asked about before: "flexible array members". A TCP > packet has no defined length (there isn't even a header field in the > packet for this, so in fairness we can talk about IP packets which > do). There is no way for me to describe this with the pre-PEP > data-formats. > > I feel like it is misleading of you to say "it's up to the package to > do manipulations," because you glanced over the fact that you can't > even describe this type of data. ISTM, that you're only interested in > describing repetitious fixed-structure arrays. Yes, that's right. I'm only interested in describing binary data with a fixed length. Others can help push it farther than that (if they even care). > If we are going to have a "default Python way to handle data-formats", > then don't you feel like this falls short of the mark? Not for me. We can fix what needs fixing, but not if we can't get out of the gate. > > I fear that you speak about this in too grandiose terms and are now > trapped by people asking, "well, can I do this?" I think for a lot of > folks the answer is: "nope." With respect to the network packets, this > PEP doesn't do anything to fix the communication barrier. Yes it could if you were interested in pushing it there. 
No, I didn't solve that particular problem with the PEP (because I can only solve the problems I'm aware of), but I do think the problem could be solved. We have far too many nay-sayers on this list, I think. Right now, I don't have time to push this further. My real interest is the extended buffer protocol. I want something that works for that. When I do have time again to discuss it again, I might come back and push some more. But, not now. -Travis From pje at telecommunity.com Sat Nov 4 03:09:47 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 03 Nov 2006 21:09:47 -0500 Subject: [Python-Dev] Path object design In-Reply-To: References: <454BD1A9.8080508@v.loewis.de> <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> Message-ID: <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote: >os.join assumes the base is a directory >name when used in a join: "inserting '/' as needed" while RFC >1808 says > > The last segment of the base URL's path (anything > following the rightmost slash "/", or the entire path if no > slash is present) is removed > >Is my intuition wrong in thinking those should be the same? Yes. :) Path combining and URL absolutization(?) are inherently different operations with only superficial similarities. One reason for this is that a trailing / on a URL has an actual meaning, whereas in filesystem paths a trailing / is an aberration and likely an actual error. The path combining operation says, "treat the following as a subpath of the base path, unless it is absolute". The URL normalization operation says, "treat the following as a subpath of the location the base URL is *contained in*". Because of this, os.path.join assumes a path with a trailing separator is equivalent to a path without one, since that is the only reasonable way to interpret treating the joined path as a subpath of the base path. 
But for a URL join, the path /foo and the path /foo/ are not only *different paths* referring to distinct objects, but the operation wants to refer to the *container* of the referenced object. /foo might refer to a directory, while /foo/ refers to some default content (e.g. index.html). This is actually why Apache normally redirects you from /foo to /foo/ before it serves up the index.html; relative URLs based on a base URL of /foo won't work right. The URL approach is designed to make peer-to-peer linking in a given directory convenient. Instead of referring to './foo.html' (as one would have to do with filenames, you can simply refer to 'foo.html'. But the cost of saving those characters in every link is that joining always takes place on the parent, never the tail-end. Thus directory URLs normally end in a trailing /, and most tools tend to automatically redirect when somebody leaves it off. (Because otherwise the links would be wrong.) From steve at holdenweb.com Sat Nov 4 05:34:12 2006 From: steve at holdenweb.com (Steve Holden) Date: Sat, 04 Nov 2006 04:34:12 +0000 Subject: [Python-Dev] Path object design In-Reply-To: <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> References: <454BD1A9.8080508@v.loewis.de> <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: Phillip J. Eby wrote: > At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote: > >>os.join assumes the base is a directory >>name when used in a join: "inserting '/' as needed" while RFC >>1808 says >> >> The last segment of the base URL's path (anything >> following the rightmost slash "/", or the entire path if no >> slash is present) is removed >> >>Is my intuition wrong in thinking those should be the same? > > > Yes. :) > > Path combining and URL absolutization(?) are inherently different > operations with only superficial similarities. 
One reason for this is that > a trailing / on a URL has an actual meaning, whereas in filesystem paths a > trailing / is an aberration and likely an actual error. > > The path combining operation says, "treat the following as a subpath of the > base path, unless it is absolute". The URL normalization operation says, > "treat the following as a subpath of the location the base URL is > *contained in*". > > Because of this, os.path.join assumes a path with a trailing separator is > equivalent to a path without one, since that is the only reasonable way to > interpret treating the joined path as a subpath of the base path. > > But for a URL join, the path /foo and the path /foo/ are not only > *different paths* referring to distinct objects, but the operation wants to > refer to the *container* of the referenced object. /foo might refer to a > directory, while /foo/ refers to some default content (e.g. > index.html). This is actually why Apache normally redirects you from /foo > to /foo/ before it serves up the index.html; relative URLs based on a base > URL of /foo won't work right. > > The URL approach is designed to make peer-to-peer linking in a given > directory convenient. Instead of referring to './foo.html' (as one would > have to do with filenames, you can simply refer to 'foo.html'. But the > cost of saving those characters in every link is that joining always takes > place on the parent, never the tail-end. Thus directory URLs normally end > in a trailing /, and most tools tend to automatically redirect when > somebody leaves it off. (Because otherwise the links would be wrong.) > Having said this, Andrew *did* demonstrate quite convincingly that the current urljoin has some fairly egregious directory traversal glitches. Is it really right to punt obvious gotchas like >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") 'http://blah.com/../../' >>> to the server? 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From ncoghlan at gmail.com Sat Nov 4 05:38:53 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 04 Nov 2006 14:38:53 +1000 Subject: [Python-Dev] Path object design In-Reply-To: References: <454BD1A9.8080508@v.loewis.de> <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: <454C195D.2060901@gmail.com> Steve Holden wrote: > Having said this, Andrew *did* demonstrate quite convincingly that the > current urljoin has some fairly egregious directory traversal glitches. > Is it really right to punt obvious gotchas like > > >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") > > 'http://blah.com/../../' > > >>> > > to the server? See Paul Jimenez's thread about replacing urlparse with something better. The current module has some serious issues :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From seefeld at sympatico.ca Wed Nov 1 20:14:05 2006 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 01 Nov 2006 14:14:05 -0500 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> Message-ID: <4548F1FD.5010505@sympatico.ca> Brett Cannon wrote: > On 11/1/06, Stefan Seefeld wrote: >> Right. Brett, do we need accounts on python.org for this ? > > > Yep. It just requires SSH 2 keys from each of you. 
You can then email > python-dev with those keys and your first.last name and someone there will > install the keys for you. My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld. Thanks ! Stefan -- ...ich hab' noch einen Koffer in Berlin... From forsberg at efod.se Wed Nov 1 20:25:03 2006 From: forsberg at efod.se (Erik Forsberg) Date: Wed, 01 Nov 2006 20:25:03 +0100 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: (Brett Cannon's message of "Wed, 1 Nov 2006 11:17:56 -0800") References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> <4548F1FD.5010505@sympatico.ca> Message-ID: <87slh3vuk0.fsf@uterus.efod.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 "Brett Cannon" writes: > On 11/1/06, Stefan Seefeld wrote: >> >> Brett Cannon wrote: >> > On 11/1/06, Stefan Seefeld wrote: >> >> >> Right. Brett, do we need accounts on python.org for this ? >> > >> > >> > Yep. It just requires SSH 2 keys from each of you. You can then email >> > python-dev with those keys and your first.last name and someone there >> will >> > install the keys for you. >> >> My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld. >> >> Thanks ! > > > Just to clarify, this is not for pydotorg but the svn.python.org. The > admins for our future Roundup instance are going to keep their Roundup code > in svn so they need commit access. Now when that's clarified, here's my data: Public SSH key: http://efod.se/about/ptkey.pub First.Lastname: erik.forsberg I'd appreciate if someone with good taste could tell us where in the tree we should add our code :-). 
Thanks, \EF - -- Erik Forsberg http://efod.se GPG/PGP Key: 1024D/0BAC89D9 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Processed by Mailcrypt 3.5.8+ iD8DBQFFSPSOrJurFAusidkRAucqAKDWdlq6dkI1nNt5caSyJ+gFviSeJACg4gNJ ItRUEsEI3/4ZN154Znw4jEQ= =o+Iy -----END PGP SIGNATURE----- From oliphant at ee.byu.edu Wed Nov 1 22:41:38 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed, 01 Nov 2006 14:41:38 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <45490989.9010603@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <4549010F.6090200@ieee.org> <45490989.9010603@v.loewis.de> Message-ID: <45491492.9060208@ee.byu.edu> Martin v. L?wis wrote: >Travis Oliphant schrieb: > > >>>>r_field = PyDict_GetItemString(dtype,'r'); >>>> >>>> >>>> >>Actually it should read PyDict_GetItemString(dtype->fields). The >>r_field is a tuple (data-type object, offset). The fields attribute is >>(currently) a Python dictionary. >> >> > >Ok. This seems to be missing in the PEP. > Yeah, actually quite a bit is missing. Because I wanted to float the idea for discussion before "getting the details perfect" (which of course they wouldn't be if it was just my input producing them). >In this code, where is PyArray_GetField coming from? > This is a NumPy Specific C-API. That's why I was confused about why you wanted me to show how I would do it. But, what you are actually asking is how would another application use the data-type information to do the same thing using the data-type object and a pointer to memory. Is that correct? This is a reasonable thing to request. And your example is a good one. I will use the PEP to explain it. Ultimately, the code you are asking for will have to have some kind of dispatch table for different binary code depending on the actual data-types being shared (unless all that is needed is a copy in which case just the size of the element area can be used). 
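[The field-offset lookup Travis describes — find the field's (data-type, offset) pair, then read at that offset within each element — can be illustrated in pure Python with the struct module. This is a hypothetical miniature of what a consumer would do with the data-format object, not NumPy's actual C code; the record layout and the 'x' field are invented for the example, while 'r' echoes the r_field from Martin's question:]

```python
import struct

# A 12-byte "record": int32 'x' at offset 0, float64 'r' at offset 4.
# This mirrors the PEP's fields mapping: name -> (data-type, offset).
fields = {'x': ('<i', 0), 'r': ('<d', 4)}
itemsize = 12

def get_field(buf, name):
    """Extract one named field from every element of a flat memory block."""
    code, offset = fields[name]
    return [struct.unpack_from(code, buf, i * itemsize + offset)[0]
            for i in range(len(buf) // itemsize)]

# Pack three records and pull out the 'r' column.
data = b''.join(struct.pack('<id', i, i * 1.5) for i in range(3))
print(get_field(data, 'r'))   # the float64 field of each record
```

[For a strided block, the `i * itemsize` stepping would simply use the stride instead of the packed item size.]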
In my experience, the dispatch table must be present for at least the "simple" data-types. The data-types built up from there can depend on those. In NumPy, the data-type objects have function pointers to accomplish all the things NumPy does quickly. So, each data-type object in NumPy points to a function-pointer table and the NumPy code defers to it to actually accomplish the task (much like Python really). Not all libraries will support working with all data-types. If they don't support it, they just raise an error indicating that it's not possible to share that kind of data. > What does >it do? If I wanted to write this code from scratch, what >should I write instead? Since this is all about a flat >memory block, I'm surprised I need "true" Python objects >for the field values in there. > > Well, actually, the block could be "strided" as well. So, you would write something that gets the pointer to the memory and then gets the extended information (dimensionality, shape, and strides, and data-format object). Then, you would get the offset of the field you are interested in from the start of the element (it's stored in the data-format representation). Then do a memory copy from the right place (using the array iterator in NumPy you can actually do it without getting the shape and strides information first but I'm holding off on that PEP until an N-d array is proposed for Python). I'll write something like that as an example and put it in the PEP for the extended buffer protocol. -Travis > > >>But, the other option (especially for code already written) would be to >>just convert the data-format specification into it's own internal >>representation. >> >> > >Ok, so your assumption is that consumers already have their own >machinery, in which case ease-of-use would be the question how >difficult it is to convert datatype objects into the internal >representation. 
> >Regards, >Martin > > From micktwomey at gmail.com Thu Nov 2 12:09:05 2006 From: micktwomey at gmail.com (Michael Twomey) Date: Thu, 2 Nov 2006 11:09:05 +0000 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> Message-ID: <50a522ca0611020309i5be21d99t8c39bbeb323289ed@mail.gmail.com> On 11/1/06, Brett Cannon wrote: > > > > Right. Brett, do we need accounts on python.org for this ? > > Yep. It just requires SSH 2 keys from each of you. You can then email > python-dev with those keys and your first.last name and someone there will > install the keys for you. > I'll need svn access to svn.python.org too for the roundup tracker. My key is over at http://translucentcode.org/mick/ssh_key.txt firstname.lastname: michael.twomey cheers, Michael From kxroberto at googlemail.com Fri Nov 3 11:50:05 2006 From: kxroberto at googlemail.com (Robert) Date: Fri, 03 Nov 2006 11:50:05 +0100 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) Message-ID: <454B1EDD.9050908@googlemail.com> repeated from c.l.p : "Feature Request: Py_NewInterpreter to create separate GIL (branch)" Daniel Dittmar wrote: > robert wrote: >> I'd like to use multiple CPU cores for selected time consuming Python >> computations (incl. numpy/scipy) in a frictionless manner. >> >> Interprocess communication is tedious and out of question, so I >> thought about simply using a more Python interpreter instances >> (Py_NewInterpreter) with extra GIL in the same process. > > If I understand Python/ceval.c, the GIL is really global, not specific > to an interpreter instance: > static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */ > Thats the show stopper as of now. There are only a handfull funcs in ceval.c to use that very global lock. 
The rest uses that funcs around thread states. Would it be a possibilty in next Python to have the lock separate for each Interpreter instance. Thus: have *interpreter_lock separate in each PyThreadState instance and only threads of same Interpreter have same GIL? Separation between Interpreters seems to be enough. The Interpreter runs mainly on the stack. Possibly only very few global C-level resources would require individual extra locks. Sooner or later Python will have to answer the multi-processor question. A per-interpreter GIL and a nice module for tunneling Python-Objects directly between Interpreters inside one process might be the answer at the right border-line ? Existing extension code base would remain compatible, as far as there is already decent locking on module globals, which is the the usual case. Robert From larry at hastings.org Sat Nov 4 07:38:45 2006 From: larry at hastings.org (Larry Hastings) Date: Fri, 03 Nov 2006 22:38:45 -0800 Subject: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] In-Reply-To: <453985ED.7050303@hastings.org> References: <4523F890.9060804@hastings.org> <453985ED.7050303@hastings.org> Message-ID: <454C3575.1070807@hastings.org> On 2006/10/20, Larry Hastings wrote: > I'm ready to post the patch. Sheesh! Where does the time go. I've finally found the time to re-validate and post the patch. It's SF.net patch #1590352: http://sourceforge.net/tracker/index.php?func=detail&aid=1590352&group_id=5470&atid=305470 I've attached both the patch itself (against the current 2.6 revision, 52618) and a lengthy treatise on the patch and its ramifications as I understand them. I've also added one more experimental change: a new string method, str.simplify(). All it does is force a lazy concatenation / lazy slice to render. (If the string isn't a lazy string, or it's already been rendered, str.simplify() is a no-op.) 
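[The patch itself lives in the C string implementation, but the idea can be sketched in pure Python — an illustrative toy, not the actual code; `simplify` here mimics the proposed str.simplify(), and the class name is invented:]

```python
class LazyConcat:
    """Toy model of a lazy string: defer joining until the value is needed."""
    def __init__(self, *parts):
        self._parts = list(parts)   # unrendered pieces
        self._rendered = None

    def __add__(self, other):
        # O(1): record a reference to the new piece, copy nothing yet.
        return LazyConcat(*self._parts, other)

    def simplify(self):
        """Force rendering and drop the per-piece references (a no-op
        once rendered, like the proposed str.simplify())."""
        if self._rendered is None:
            self._rendered = ''.join(self._parts)
            self._parts = [self._rendered]
        return self._rendered

    def __str__(self):
        return self.simplify()

s = LazyConcat('spam')
for piece in ('', 'and', '', 'eggs'):
    s = s + piece           # builds up references, not new strings
print(str(s))               # the single join happens here
```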
The idea is, if you know these consarned "lazy slices" are giving you the oft-cited horrible memory usage scenario, you can tune your app by forcing the slices to render and drop their references. 99% of the time you don't care, and you enjoy the minor speedup. The other 1% of the time, you call .simplify() and your code behaves as it did under 2.5. Is this the right approach? I dunno. So far I like it better than the alternatives. But I'm open to suggestions, on this or any other aspect of the patch. Cheers, /larry/ From brett at python.org Sat Nov 4 08:20:23 2006 From: brett at python.org (Brett Cannon) Date: Fri, 3 Nov 2006 23:20:23 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454B1EDD.9050908@googlemail.com> References: <454B1EDD.9050908@googlemail.com> Message-ID: On 11/3/06, Robert wrote: > > repeated from c.l.p : "Feature Request: Py_NewInterpreter to create > separate GIL (branch)" > > Daniel Dittmar wrote: > > robert wrote: > >> I'd like to use multiple CPU cores for selected time consuming Python > >> computations (incl. numpy/scipy) in a frictionless manner. > >> > >> Interprocess communication is tedious and out of question, so I > >> thought about simply using a more Python interpreter instances > >> (Py_NewInterpreter) with extra GIL in the same process. > > > > If I understand Python/ceval.c, the GIL is really global, not specific > > to an interpreter instance: > > static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */ > > > > Thats the show stopper as of now. > There are only a handfull funcs in ceval.c to use that very global lock. > The rest uses that funcs around thread states. > > Would it be a possibilty in next Python to have the lock separate for > each Interpreter instance. > Thus: have *interpreter_lock separate in each PyThreadState instance and > only threads of same Interpreter have same GIL? > Separation between Interpreters seems to be enough. 
The Interpreter runs > mainly on the stack. Possibly only very few global C-level resources > would require individual extra locks. Right, but that's the trick. For instance extension modules are shared between interpreters. Also look at the sys module and basically anything that is set by a function call is a process-level setting that would also need protection. Then you get into the fun stuff of the possibility of sharing objects created in one interpreter and then passed to another that is not necessarily known ahead of time (whether it be directly through C code or through process-level objects such as an attribute in an extension module). It is not as simple, unfortunately, as a few locks. Sooner or later Python will have to answer the multi-processor question. > A per-interpreter GIL and a nice module for tunneling Python-Objects > directly between Interpreters inside one process might be the answer at > the right border-line ? Existing extension code base would remain > compatible, as far as there is already decent locking on module globals, > which is the the usual case. This is not true (see above). From my viewpoint the only way for this to work would be to come up with a way to wrap all access to module objects in extension modules so that they are not trampled on because of separate locks per-interpreter, or have to force all extension modules to be coded so that they are instantiated individually per interpreter. And of course deal with all other process-level objects somehow. The SMP issue for Python will most likely not happen until someone cares enough to write code to do it and this take on it is no exception. There is no simple solution or else someone would have done it by now. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061103/9ca05403/attachment.html From jcarlson at uci.edu Sat Nov 4 08:27:03 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 03 Nov 2006 23:27:03 -0800 Subject: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] In-Reply-To: <454C3575.1070807@hastings.org> References: <453985ED.7050303@hastings.org> <454C3575.1070807@hastings.org> Message-ID: <20061103231558.81F0.JCARLSON@uci.edu> Larry Hastings wrote: > But I'm open > to suggestions, on this or any other aspect of the patch. As Martin, I, and others have suggested, direct the patch towards Python 3.x unicode text. Also, don't be surprised if Guido says no... http://mail.python.org/pipermail/python-3000/2006-August/003334.html In that message he talks about why view+string or string+view or view+view should return strings. Some are not quite applicable in this case because with your implementation all additions can return a 'view'. However, he also states the following with regards to strings vs. views (an earlier variant of the "lazy strings" you propose), "Because they can have such different performance and memory usage characteristics, it's not right to treat them as the same type." - GvR This suggests (at least to me) that unifying the 'lazy string' with the 2.x string is basically out of the question, which brings me back to my earlier suggestion; make it into a wrapper that could be used with 3.x bytes, 3.x text, and perhaps others. 
- Josiah From martin at v.loewis.de Sat Nov 4 09:15:44 2006 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 04 Nov 2006 09:15:44 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> Message-ID: <454C4C30.6080209@v.loewis.de> Alexander Belopolsky schrieb: > Multi-segment buffers are only dead because standard library modules > do not support them. That, in turn, is because nobody has contributed code to make that work. My guess is that people either don't need it, or find it too difficult to implement. In any case, it is an important point that such a specification is likely dead if the standard library doesn't support it throughout, from start. So for this PEP, the same criterion likely applies: it's not sufficient to specify an interface, one also has to specify (and then implement) how that affects modules and types of the standard library. > I often work with text data that is represented > as an array of strings. I would love to implement a multi-segment > buffer interface on top of that data and be able to do a full text > regular expression search without having to concatenate into one big > string, but python's re module would not take a multi-segment buffer. If you are curious, try adding such a feature to re some time. I expect that implementing it would be quite involved. I wonder what Fredrik Lundh thinks about providing such a feature. Regards, Martin From martin at v.loewis.de Sat Nov 4 09:37:32 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 09:37:32 +0100 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: References: Message-ID: <454C514C.9000602@v.loewis.de> Paul Chiusano schrieb: > I was looking for a good pairing_heap implementation and came across > one that had apparently been checked in a couple years ago (!). Have you looked at the heapq module? 
What application do you have for a pairing heap that you can't do readily with the heapq module? Anyway, the immediate author of this code is Dan Stutzbach (as Raymond Hettinger's checkin message says); you probably should contact him to find out whether the project is still alive. Regards, Martin From martin at v.loewis.de Sat Nov 4 09:49:53 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 09:49:53 +0100 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454B1EDD.9050908@googlemail.com> References: <454B1EDD.9050908@googlemail.com> Message-ID: <454C5431.7080609@v.loewis.de> Robert schrieb: > Would it be a possibilty in next Python to have the lock separate for > each Interpreter instance. Thus: have *interpreter_lock separate in > each PyThreadState instance and only threads of same Interpreter have > same GIL? Separation between Interpreters seems to be enough. The > Interpreter runs mainly on the stack. Possibly only very few global > C-level resources would require individual extra locks. Notice that at least the following objects are shared between interpreters, as they are singletons: - None, True, False, (), "", u"" - strings of length 1, Unicode strings of length 1 with ord < 256 - integers between -5 and 256 How do you deal with the reference counters of these objects? Also, type objects (in particular exception types) are shared between interpreters. These are mutable objects, so you have actually dictionaries shared between interpreters. How would you deal with these? Also, the current thread state is a global variable, currently (_PyThreadState_Current). How would you provide access to the current thread state if there are multiple simultaneous threads? 
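[The singleton sharing behind the first question is easy to observe from Python — these are CPython implementation details, not language guarantees, which is exactly why their refcounts are a problem for a per-interpreter GIL:]

```python
# CPython caches the integers in [-5, 256] as process-wide singletons;
# int() calls defeat compile-time constant folding so the cache is what
# we actually observe.
a = int("100")
b = int("100")
print(a is b)        # True: both names refer to the one cached object

c = int("1000")
d = int("1000")
print(c is d)        # False in CPython: each call allocates a fresh int

# None is likewise a single object shared by every interpreter in the
# process, so its reference count is touched from all of them.
print(None is type(None)())
```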
Regards, Martin From martin at v.loewis.de Sat Nov 4 16:47:37 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 16:47:37 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa Message-ID: <454CB619.7010804@v.loewis.de> Patch #1346572 proposes to also search for .pyc when OptimizeFlag is set, and for .pyo when it is not set. The author argues this is for consistency, as the zipimporter already does that. This reasoning is somewhat flawed, of course: to achieve consistency, one could also change the zipimporter instead. However, I find the proposed behaviour reasonable: Python already automatically imports the .pyc file if .py is not given and vice versa. So why not look for .pyo if the .pyc file is not present? What do you think? Regards, Martin From murman at gmail.com Sat Nov 4 17:09:11 2006 From: murman at gmail.com (Michael Urman) Date: Sat, 4 Nov 2006 10:09:11 -0600 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: On 11/3/06, Steve Holden wrote: > Having said this, Andrew *did* demonstrate quite convincingly that the > current urljoin has some fairly egregious directory traversal glitches. > Is it really right to punt obvious gotchas like > > >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") > > 'http://blah.com/../../' Ah, but how do you know when that's wrong? At least under ftp:// your root is often a mid-level directory until you change up out of it. http:// will tend to treat the targets as roots, but I don't know that there's any requirement for a /.. to be meaningless (even if it often is). 
-- Michael Urman http://www.tortall.net/../mu/blog ;) From phd at phd.pp.ru Sat Nov 4 17:47:37 2006 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 4 Nov 2006 19:47:37 +0300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CB619.7010804@v.loewis.de> References: <454CB619.7010804@v.loewis.de> Message-ID: <20061104164737.GB29309@phd.pp.ru> On Sat, Nov 04, 2006 at 04:47:37PM +0100, "Martin v. Löwis" wrote: > Patch #1346572 proposes to also search for .pyc when OptimizeFlag > is set, and for .pyo when it is not set. The author argues this is > for consistency, as the zipimporter already does that. > > This reasoning is somewhat flawed, of course: to achieve consistency, > one could also change the zipimporter instead. > > However, I find the proposed behaviour reasonable: Python already > automatically imports the .pyc file if .py is not given and vice > versa. So why not look for .pyo if the .pyc file is not present? > > What do you think? +1 from me. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From fredrik at pythonware.com Sat Nov 4 17:52:09 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 04 Nov 2006 17:52:09 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CB619.7010804@v.loewis.de> References: <454CB619.7010804@v.loewis.de> Message-ID: Martin v. Löwis wrote: > However, I find the proposed behaviour reasonable: Python already > automatically imports the .pyc file if .py is not given and vice > versa. So why not look for .pyo if the .pyc file is not present? well, from a performance perspective, it would be nice if Python looked for *fewer* things, not more things. (wouldn't transparent import of PYO files mean that you end up with a program where some assertions apply, and others don't? could be confusing...)
From alexander.belopolsky at gmail.com Sat Nov 4 18:13:28 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 4 Nov 2006 12:13:28 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454C4C30.6080209@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <454C4C30.6080209@v.loewis.de> Message-ID: <6BD06AFE-6BA4-494F-BB68-B8EF651783EA@gmail.com> On Nov 4, 2006, at 3:15 AM, Martin v. Löwis wrote: > Alexander Belopolsky schrieb: >> Multi-segment buffers are only dead because standard library modules >> do not support them. > > That, in turn, is because nobody has contributed code to make that > work. > My guess is that people either don't need it, or find it too difficult > to implement. Last time I tried to contribute code related to buffer protocol, it was rejected with little discussion http://sourceforge.net/tracker/index.php?func=detail&aid=1539381&group_id=5470&atid=305470 that patch implemented two features: enabled creation of read-write buffer objects and added readinto method to StringIO. The resolution was: """ The file object's readinto method is not meant for public use, so adding the method to StringIO is not a good idea. """ The read-write buffer part was not discussed, but I guess the resolution would be that buffer objects are deprecated, so adding features to them is not a good idea. > > If you are curious, try adding such a feature to re some time. I > expect that implementing it would be quite involved. I wonder what > Fredrik Lundh thinks about providing such a feature. I would certainly invest some time into that if that feature had a chance of being accepted. At the moment I feel that anything related to buffers or buffer protocol is met with strong opposition. I think the opposition is mostly fueled by the belief that buffer objects are "unsafe" and buffer protocol is deprecated. None of these premises is correct AFAIK.
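[Part of what makes the re feature involved is matching across segment boundaries. As a sketch of the idea — a hypothetical pure-Python workaround, not a proposal for the re module itself — a fixed-length pattern can be found in a sequence of chunks by carrying only a small overlap forward, instead of concatenating everything into one big string:]

```python
def find_across_segments(segments, pattern):
    """Find `pattern` (bytes) in a sequence of byte chunks without
    joining them all; only len(pattern)-1 bytes are carried over."""
    carry = b''
    base = 0                          # absolute offset where `carry` starts
    for seg in segments:
        window = carry + seg
        hit = window.find(pattern)
        if hit != -1:
            return base + hit         # offset into the virtual whole string
        keep = min(len(pattern) - 1, len(window))
        base += len(window) - keep
        carry = window[-keep:] if keep else b''
    return -1

chunks = [b'abcd', b'efgh', b'ijkl']
print(find_across_segments(chunks, b'ghi'))   # match spans two chunks
```

[A real regex engine cannot bound the match length this way, which is one reason supporting multi-segment buffers in re would be far more work than this sketch suggests.]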
From steve at holdenweb.com Sat Nov 4 18:16:51 2006 From: steve at holdenweb.com (Steve Holden) Date: Sat, 04 Nov 2006 17:16:51 +0000 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: <454CCB03.1030806@holdenweb.com> Michael Urman wrote: > On 11/3/06, Steve Holden wrote: > >> Having said this, Andrew *did* demonstrate quite convincingly that the >> current urljoin has some fairly egregious directory traversal glitches. >> Is it really right to punt obvious gotchas like >> >> >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") >> >> 'http://blah.com/../../' > > > Ah, but how do you know when that's wrong? At least under ftp:// your > root is often a mid-level directory until you change up out of it. > http:// will tend to treat the targets as roots, but I don't know that > there's any requirement for a /.. to be meaningless (even if it often > is). > I'm darned if I know. I simply know that it isn't right for http resources. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From martin at v.loewis.de Sat Nov 4 19:23:44 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 19:23:44 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454CB619.7010804@v.loewis.de> Message-ID: <454CDAB0.5040409@v.loewis.de> Fredrik Lundh schrieb: >> However, I find the proposed behaviour reasonable: Python already >> automatically imports the .pyc file if .py is not given and vice >> versa. So why not look for .pyo if the .pyc file is not present? 
> > well, from a performance perspective, it would be nice if Python looked > for *fewer* things, not more things. > > That's true. > > > (wouldn't transparent import of PYO files mean that you end up with a > > program where some assertions apply, and others don't? could be confusing...) > > That's also true, however, it might still be better to do that instead > of raising an ImportError. > > I'm not sure whether a scenario where you have only .pyo files for > some modules and only .pyc files for others is really likely, though, > and the performance hit of another system call doesn't sound attractive. > > So I guess that zipimport should stop importing .pyo files if > OptimizeFlag is false, then? Regards, Martin From fredrik at pythonware.com Sat Nov 4 19:33:10 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 04 Nov 2006 19:33:10 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454CCB03.1030806@holdenweb.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: Steve Holden wrote: >> Ah, but how do you know when that's wrong? At least under ftp:// your >> root is often a mid-level directory until you change up out of it. >> http:// will tend to treat the targets as roots, but I don't know that >> there's any requirement for a /.. to be meaningless (even if it often >> is). >> > I'm darned if I know. I simply know that it isn't right for http resources. the URI specification disagrees; an URI that starts with "../" is perfectly legal, and the specification explicitly states how it should be interpreted. (it's important to realize that "urijoin" produces equivalent URI:s, not file names) From martin at v.loewis.de Sat Nov 4 20:00:55 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 20:00:55 +0100 Subject: [Python-Dev] Status of pairing_heap.py?
In-Reply-To: References: <454C514C.9000602@v.loewis.de> Message-ID: <454CE367.7000604@v.loewis.de> Paul Chiusano schrieb: > To support this, the insert method needs to return a reference to an > object which I can then pass to adjust_key() and delete() methods. > It's extremely difficult to have this functionality with array-based > heaps because the index of an item in the array changes as items are > inserted and removed. I see. > Okay, I'll do that. What needs to be done to move the project along > and possibly get a pairing heap incorporated into a future version of > python? As a starting point, I think the implementation should get packaged as an independent library, and be listed in the Cheeseshop for a few years. If then there's wide interest in including it into Python, it should be reconsidered. At that point, the then-authors of the package will have to sign a contributor form. Regards, Martin From osantana at gmail.com Sat Nov 4 20:38:52 2006 From: osantana at gmail.com (Osvaldo Santana) Date: Sat, 4 Nov 2006 16:38:52 -0300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CDAB0.5040409@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454CDAB0.5040409@v.loewis.de> Message-ID: Hi, I'm the author of this patch and we are already using it in Python port for Maemo platform. We are using .pyo modules mainly to remove docstrings from the modules. We've discussed about this patch here[1] before. Now, I agree that the zipimport behaviour is incorrect but I don't have other option to remove docstrings of a .pyc file. I'm planning to send a patch that adds a "--remove-docs" to the Python interpreter to replace the "-OO" option that create only .pyo files. [1] http://mail.python.org/pipermail/python-dev/2005-November/057959.html On 11/4/06, "Martin v. 
L?wis" wrote: > Fredrik Lundh schrieb: > >> However, I find the proposed behaviour reasonable: Python already > >> automatically imports the .pyc file if .py is not given and vice > >> versa. So why not look for .pyo if the .pyc file is not present? > > > > well, from a performance perspective, it would be nice if Python looked > > for *fewer* things, not more things. > > That's true. [cut] -- Osvaldo Santana Neto (aCiDBaSe) http://www.pythonologia.org From brett at python.org Sat Nov 4 21:33:40 2006 From: brett at python.org (Brett Cannon) Date: Sat, 4 Nov 2006 12:33:40 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CDAB0.5040409@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454CDAB0.5040409@v.loewis.de> Message-ID: On 11/4/06, "Martin v. L?wis" wrote: > > Fredrik Lundh schrieb: > >> However, I find the proposed behaviour reasonable: Python already > >> automatically imports the .pyc file if .py is not given and vice > >> versa. So why not look for .pyo if the .pyc file is not present? > > > > well, from a performance perspective, it would be nice if Python looked > > for *fewer* things, not more things. > > That's true. > > > (wouldn't transparent import of PYO files mean that you end up with a > > program where some assertions apply, and others don't? could be con- > > fusing...) > > That's also true, however, it might still be better to do that instead > of raising an ImportError. > > I'm not sure whether a scenario were you have only .pyo files for > some modules and only .pyc files for others is really likely, though, > and the performance hit of another system call doesn't sound attractive. > > So I guess that zipimport should stop importing .pyo files if > OptimizeFlag is false, then? Yes, I think it should. When I get around to rewriting zipimport for my import rewrite it will do this by default. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061104/40d3c521/attachment.html From brett at python.org Sat Nov 4 21:40:23 2006 From: brett at python.org (Brett Cannon) Date: Sat, 4 Nov 2006 12:40:23 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454CB619.7010804@v.loewis.de> <454CDAB0.5040409@v.loewis.de> Message-ID: On 11/4/06, Osvaldo Santana wrote: > > Hi, > > I'm the author of this patch and we are already using it in Python > port for Maemo platform. > > We are using .pyo modules mainly to remove docstrings from the > modules. We've discussed about this patch here[1] before. > > Now, I agree that the zipimport behaviour is incorrect but I don't > have other option to remove docstrings of a .pyc file. > > I'm planning to send a patch that adds a "--remove-docs" to the Python > interpreter to replace the "-OO" option that create only .pyo files. > > [1] http://mail.python.org/pipermail/python-dev/2005-November/057959.html The other option is to do away with .pyo files: http://www.python.org/dev/summary/2005-11-01_2005-11-15/#importing-pyc-and-pyo-files Guido has said he wouldn't mind it, but then .pyc files need to grow a field or so to be able to store what optimizations were used. While this would lead to more bytecode regeneration, it would help deal with this case and allow for more optimizations on the bytecode. -Brett On 11/4/06, "Martin v. L?wis" wrote: > > Fredrik Lundh schrieb: > > >> However, I find the proposed behaviour reasonable: Python already > > >> automatically imports the .pyc file if .py is not given and vice > > >> versa. So why not look for .pyo if the .pyc file is not present? > > > > > > well, from a performance perspective, it would be nice if Python > looked > > > for *fewer* things, not more things. > > > > That's true. 
> [cut] > > -- > Osvaldo Santana Neto (aCiDBaSe) > http://www.pythonologia.org > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061104/5347f0af/attachment.htm From jcarlson at uci.edu Sat Nov 4 21:50:51 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 04 Nov 2006 12:50:51 -0800 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: <454CE367.7000604@v.loewis.de> References: <454CE367.7000604@v.loewis.de> Message-ID: <20061104122150.81FF.JCARLSON@uci.edu> "Martin v. L?wis" wrote: > Paul Chiusano schrieb: > > To support this, the insert method needs to return a reference to an > > object which I can then pass to adjust_key() and delete() methods. > > It's extremely difficult to have this functionality with array-based > > heaps because the index of an item in the array changes as items are > > inserted and removed. > > I see. It is not required. If you are careful, you can implement a pairing heap with a structure combining a dictionary and list. It requires that all values be unique and hashable, but it is possible (I developed one for a commercial project). If other people find the need for it, I could rewrite it (can't release the closed source). It would use far less memory than the pairing heap implementation provided in the sandbox, and could be converted to C if desired and/or required. On the other hand, I've found the pure Python version to be fast enough for most things I've needed it for. 
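The closed-source implementation Josiah mentions isn't shown anywhere in the thread, but the dictionary-plus-list idea he describes can be sketched roughly as follows. This is an illustrative reconstruction, not his code; the class name, method names, and API are invented here:

```python
class IndexedHeap:
    """Min-heap over unique, hashable values.

    Entries live in a plain list in heap order; a dict maps each value
    to its current index, so adjust_key() and delete() can find an
    entry in O(1) and restore the heap invariant in O(log n).
    """

    def __init__(self):
        self.heap = []       # [key, value] entries, heap-ordered by key
        self.position = {}   # value -> index of its entry in self.heap

    def insert(self, key, value):
        self.heap.append([key, value])
        self.position[value] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def peek(self):
        return tuple(self.heap[0])

    def extract(self):
        key, value = self.heap[0]
        self.delete(value)
        return key, value

    def adjust_key(self, value, new_key):
        i = self.position[value]
        old_key = self.heap[i][0]
        self.heap[i][0] = new_key
        if new_key < old_key:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def delete(self, value):
        i = self.position.pop(value)
        last = self.heap.pop()
        if i < len(self.heap):       # deleted entry was not the tail
            self.heap[i] = last
            self.position[last[1]] = i
            self._sift_down(i)
            self._sift_up(i)

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.position[self.heap[i][1]] = i
        self.position[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            child = 2 * i + 1
            if child >= n:
                return
            if child + 1 < n and self.heap[child + 1][0] < self.heap[child][0]:
                child += 1
            if self.heap[child][0] >= self.heap[i][0]:
                return
            self._swap(i, child)
            i = child
```

Because every swap updates the position dict, adjust_key() and delete() never have to search the list for an entry, which is exactly the property array-based heaps were said to lack.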
- Josiah From greg.ewing at canterbury.ac.nz Sun Nov 5 02:21:34 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Nov 2006 14:21:34 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454CB619.7010804@v.loewis.de> Message-ID: <454D3C9E.5030505@canterbury.ac.nz> Fredrik Lundh wrote: > well, from a performance perspective, it would be nice if Python looked > for *fewer* things, not more things. Instead of searching for things by doing a stat call for each possible file name, would it perhaps be faster to read the contents of all the directories along sys.path into memory and then go searching through that? -- Greg From exarkun at divmod.com Sun Nov 5 02:37:32 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Sat, 4 Nov 2006 20:37:32 -0500 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D3C9E.5030505@canterbury.ac.nz> Message-ID: <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> On Sun, 05 Nov 2006 14:21:34 +1300, Greg Ewing wrote: >Fredrik Lundh wrote: > >> well, from a performance perspective, it would be nice if Python looked >> for *fewer* things, not more things. > >Instead of searching for things by doing a stat call >for each possible file name, would it perhaps be >faster to read the contents of all the directories >along sys.path into memory and then go searching >through that? Bad for large directories. There's a cross-over at some number of entries. Maybe Python should have a runtime-tuned heuristic for selecting a filesystem traversal mechanism. 
Jean-Paul From martin at v.loewis.de Sun Nov 5 04:14:11 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Nov 2006 04:14:11 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D3C9E.5030505@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> Message-ID: <454D5703.5070509@v.loewis.de> Greg Ewing schrieb: > Fredrik Lundh wrote: > >> well, from a performance perspective, it would be nice if Python looked >> for *fewer* things, not more things. > > Instead of searching for things by doing a stat call > for each possible file name, would it perhaps be > faster to read the contents of all the directories > along sys.path into memory and then go searching > through that? That should never be better: the system will cache the directory blocks, also, and it will do a better job than Python will. Regards, Martin From brett at python.org Sun Nov 5 08:28:59 2006 From: brett at python.org (Brett Cannon) Date: Sat, 4 Nov 2006 23:28:59 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> References: <454D3C9E.5030505@canterbury.ac.nz> <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> Message-ID: On 11/4/06, Jean-Paul Calderone wrote: > > On Sun, 05 Nov 2006 14:21:34 +1300, Greg Ewing < > greg.ewing at canterbury.ac.nz> wrote: > >Fredrik Lundh wrote: > > > >> well, from a performance perspective, it would be nice if Python looked > >> for *fewer* things, not more things. > > > >Instead of searching for things by doing a stat call > >for each possible file name, would it perhaps be > >faster to read the contents of all the directories > >along sys.path into memory and then go searching > >through that? > > Bad for large directories. There's a cross-over at some number > of entries. 
Maybe Python should have a runtime-tuned heuristic > for selecting a filesystem traversal mechanism. Hopefully my import rewrite is flexible enough that people will be able to plug in their own importer/loader for the filesystem so that they can tune how things like this are handled (e.g., caching what files are in a directory, skipping bytecode files, etc.). -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061104/7a45ac89/attachment.html From steve at holdenweb.com Sun Nov 5 10:13:38 2006 From: steve at holdenweb.com (Steve Holden) Date: Sun, 05 Nov 2006 09:13:38 +0000 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454D3C9E.5030505@canterbury.ac.nz> <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> Message-ID: [Off-list] Brett Cannon wrote: [...] > > Hopefully my import rewrite is flexible enough that people will be able > to plug in their own importer/loader for the filesystem so that they can > tune how things like this are handled (e.g., caching what files are in a > directory, skipping bytecode files, etc.). > I just wondered whether you plan to support other importers of the PEP 302 style? I have been experimenting with import from database, and would like to see that work migrate to your rewrite if possible. 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden

From stephen at xemacs.org Sun Nov 5 10:10:44 2006 From: stephen at xemacs.org (stephen at xemacs.org) Date: Sun, 05 Nov 2006 18:10:44 +0900 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: <87y7qqqmwb.fsf@uwakimon.sk.tsukuba.ac.jp> Michael Urman writes: > Ah, but how do you know when that's wrong? At least under ftp:// your > root is often a mid-level directory until you change up out of it. > http:// will tend to treat the targets as roots, but I don't know that > there's any requirement for a /.. to be meaningless (even if it often > is). ftp and http schemes both have authority ("host") components, so the meaning of ".." path components is defined in the same way for both by section 5 of RFC 3986. Of course an FTP server is not bound to interpret the protocol so as to mimic URL semantics. But that's a different question.

From dalke at dalkescientific.com Sun Nov 5 12:23:25 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 5 Nov 2006 12:23:25 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: Steve: > > I'm darned if I know. I simply know that it isn't right for http resources. /F: > the URI specification disagrees; an URI that starts with "../" is perfectly legal, and the specification explicitly states how it should be interpreted.
I have looked at the spec, and can't figure out how its explanation matches the observed urljoin results. Steve's excerpt trimmed out the strangest example.

>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
'http://blah.com/../../'
>>>

> (it's important to realize that "urijoin" produces equivalent URI:s, not > file names)

Both, though, are "paths". The OP, Mike Orr, wrote: I agree that supporting non-filesystem directories (zip files, CSV/Subversion sandboxes, URLs) would be nice, but we already have a big enough project without that. What constraints should a Path object keep in mind in order to be forward-compatible with this? Is the answer therefore that URLs and URI behaviour should not place constraints on a Path object because they are sufficiently dissimilar from file-system paths? Do these other non-FS hierarchical structures have similar differences causing a semantic mismatch? Andrew dalke at dalkescientific.com

From paul.chiusano at gmail.com Sat Nov 4 19:18:02 2006 From: paul.chiusano at gmail.com (Paul Chiusano) Date: Sat, 4 Nov 2006 13:18:02 -0500 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: <454C514C.9000602@v.loewis.de> References: <454C514C.9000602@v.loewis.de> Message-ID: Hi Martin, Yes, I'm familiar with the heapq module, but it doesn't do all that I'd like. The main functionality I am looking for is the ability to adjust the value of an item in the heap and delete items from the heap. There's a lot of heap applications where this is useful. (I might even say most heap applications!) To support this, the insert method needs to return a reference to an object which I can then pass to adjust_key() and delete() methods.
It's extremely difficult to have this functionality with array-based heaps because the index of an item in the array changes as items are inserted and removed. I guess I don't need a pairing heap, but of the pointer-based heaps I've looked at, pairing heaps seem to be the simplest while still having good complexity guarantees. > Anyway, the immediate author of this code is Dan Stutzbach (as > Raymond Hettinger's checkin message says); you probably should > contact him to find out whether the project is still alive. Okay, I'll do that. What needs to be done to move the project along and possibly get a pairing heap incorporated into a future version of python? Best, Paul On 11/4/06, "Martin v. L?wis" wrote: > Paul Chiusano schrieb: > > I was looking for a good pairing_heap implementation and came across > > one that had apparently been checked in a couple years ago (!). > > Have you looked at the heapq module? What application do you have > for a pairing heap that you can't do readily with the heapq module? > > Anyway, the immediate author of this code is Dan Stutzbach (as > Raymond Hettinger's checkin message says); you probably should > contact him to find out whether the project is still alive. > > Regards, > Martin > From aahz at pythoncraft.com Sun Nov 5 17:24:58 2006 From: aahz at pythoncraft.com (Aahz) Date: Sun, 5 Nov 2006 08:24:58 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D5703.5070509@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> Message-ID: <20061105162458.GA23812@panix.com> On Sun, Nov 05, 2006, "Martin v. L?wis" wrote: > Greg Ewing schrieb: >> Fredrik Lundh wrote: >>> >>> well, from a performance perspective, it would be nice if Python looked >>> for *fewer* things, not more things. 
>> >> Instead of searching for things by doing a stat call for each >> possible file name, would it perhaps be faster to read the contents >> of all the directories along sys.path into memory and then go >> searching through that? > > That should never be better: the system will cache the directory > blocks, also, and it will do a better job than Python will. Maybe so, but I recently dealt with a painful bottleneck in Python code caused by excessive stat() calls on a directory with thousands of files, while the os.listdir() function was bogging things down hardly at all. Granted, Python bytecode was almost certainly the cause of much of the overhead, but I still suspect that a simple listing will be faster in C code because of fewer system calls. It should be a matter of profiling before this suggestion is rejected rather than making assertions about what "should" be happening. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 1993 From sluggoster at gmail.com Sun Nov 5 17:59:33 2006 From: sluggoster at gmail.com (Mike Orr) Date: Sun, 5 Nov 2006 08:59:33 -0800 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: <6e9196d20611050859x7b39410eyb7de882d52713631@mail.gmail.com> On 11/5/06, Andrew Dalke wrote: > > I agree that supporting non-filesystem directories (zip files, > CSV/Subversion sandboxes, URLs) would be nice, but we already have a > big enough project without that. What constraints should a Path > object keep in mind in order to be forward-compatible with this? 
> > Is the answer therefore that URLs and URI behaviour should not > place constraints on a Path object because they are sufficiently > dissimilar from file-system paths? Do these other non-FS hierarchical > structures have similar differences causing a semantic mismatch?

This discussion has reinforced my belief that os.path.join's behavior is correct with non-initial absolute args: os.path.join('/usr/bin', '/usr/local/bin/python') I've used that in applications and haven't found it a burden. Its behavior with '..' seems justifiable too, and Talin's trick of wrapping everything in os.path.normpath is a great one.

I do think join should take more care to avoid multiple slashes together in the middle of a path, although this is really the responsibility of the platform library, not a generic function/method. Join is true to its documentation of only adding separators and never deleting them, but that seems like a bit of sloppiness. On the other hand, the filesystems don't care; I don't think anybody has mentioned a case where it actually creates a path the filesystem can't handle.

urljoin clearly has a different job. When we talked about extending path to URLs, I was thinking more in terms of opening files, fetching resources, deleting, renaming, etc. rather than split-modify-rejoin. A hypothetical urlpath module would clearly have to follow the URL rules. I don't see a contradiction in supporting both URL joining rules and having a non-initial absolute argument, just to avoid cross-"platform" surprises. But urlpath would also need methods to parse the scheme and host on demand, query strings, #fragments, a class method for building a URL from the smallest parts, etc.

As for supporting path fragments and '..' in join arguments (for filesystem paths), it's clearly too widely used to eliminate. Users can voluntarily refrain from passing arguments containing separators.
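The join semantics defended here are easy to check. The snippet below uses posixpath (the POSIX flavor of os.path) so the results don't depend on the host platform; safe_join() is a hypothetical helper of the kind this discussion alludes to, not an existing stdlib function:

```python
import posixpath  # POSIX flavor of os.path, for platform-independent results

# A non-initial absolute argument discards everything before it:
print(posixpath.join('/usr/bin', '/usr/local/bin/python'))  # /usr/local/bin/python

# join() only inserts separators; collapsing '..' is left to normpath()
# (the "wrap everything in os.path.normpath" trick):
p = posixpath.join('/usr/local/lib', '../bin')
print(p)                      # /usr/local/lib/../bin
print(posixpath.normpath(p))  # /usr/local/bin

def safe_join(base, *parts):
    """Hypothetical helper for untrusted arguments: reject absolute,
    empty, separator-bearing, and '.'/'..' components outright."""
    for part in parts:
        if not part or posixpath.isabs(part) or '/' in part or part in ('.', '..'):
            raise ValueError('unsafe path component: %r' % (part,))
    return posixpath.join(base, *parts)
```

With a helper like this, a hostile argument such as '../etc/passwd' raises ValueError instead of silently escaping the base directory.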
For cases involving a user-supplied -- possibly hostile -- path, either a separate method (safe_join, child) could achieve this, or a subclass implementation that allows only safe arguments.

Regarding pathname-manipulation methods and filesystem-access methods, I'm not sure how workable it is to have separate objects for them. os.mkdir( Path("/usr/local/lib/python/Cheetah/Template.py").parent ) Path("/usr/local/lib/python/Cheetah/Template.py").parent.mkdir() FileAccess( Path("/usr/local/lib/python/Cheetah/Template.py").parent ).mkdir() The first two are reasonable. The third... who would want to do this for every path? How often would you reuse the FileAccess object? I typically create Path objects from configuration values and keep them around for the entire application; e.g., data_dir. Then I create derived paths as necessary. I suppose if the FileAccess object has a .path attribute, it could do double-duty so you wouldn't have to store the path separately. Is this what the advocates of two classes have in mind? With usage like this? my_file = FileAccess( file_access_obj.path.joinpath("my_file") ) my_file = FileAccess( Path(file_access_obj.path, "my_file") )

Working on my Path implementation. (Yes it's necessary, Glyph, at least to me.) It's going slow because I just got a Macintosh laptop and am still rounding up packages to install. -- Mike Orr

From jcarlson at uci.edu Sun Nov 5 19:24:45 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 05 Nov 2006 10:24:45 -0800 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: References: <20061104122150.81FF.JCARLSON@uci.edu> Message-ID: <20061105095621.8212.JCARLSON@uci.edu> "Paul Chiusano" wrote: > > > It is not required. If you are careful, you can implement a pairing > > heap with a structure combining a dictionary and list. > > That's interesting. Can you give an overview of how you can do that? I > can't really picture it.
You can support all the pairing heap > operations with the same complexity guarantees? Do you mean a linked > list here or an array?

I mean a Python list. The trick is to implement a sequence API that keeps track of the position of any 'pair'. That is, ph[posn] will return a 'pair' object, but when you perform ph[posn] = pair, you also update a mapping; ph.mapping[pair.value] = posn . With a few other bits, one can use heapq directly and get all of the features of the pairing heap API without keeping an explicit tree with links, etc.

In terms of running time, adjust_key, delete, and extract(0) are all O(log n), meld is O(min(n+m, m log(n+m))), empty and peek are O(1), values is O(n), and extract_all is O(n log n) but uses list.sort() rather than repeatedly pulling from the heap (heapq's documentation suggests this is faster in terms of comparisons, but likely very much faster in terms of actual running time).

Attached is a sample implementation using this method with a small test example. It may or may not use less memory than the sandbox pairing_heap.py, and using bare lists rather than pairs may result in less memory overall (if there exists a list "free list"), but this should give you something to start with. - Josiah

> Paul > > On 11/4/06, Josiah Carlson wrote: > > > > "Martin v. Löwis" wrote: > > > Paul Chiusano schrieb: > > > > To support this, the insert method needs to return a reference to an > > > > object which I can then pass to adjust_key() and delete() methods. > > > > It's extremely difficult to have this functionality with array-based > > > > heaps because the index of an item in the array changes as items are
> > > > If other people find the need for it, I could rewrite it (can't release > > the closed source). It would use far less memory than the pairing heap > > implementation provided in the sandbox, and could be converted to C if > > desired and/or required. On the other hand, I've found the pure Python > > version to be fast enough for most things I've needed it for. > > > > - Josiah > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: pair_heap.py Type: application/octet-stream Size: 5377 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061105/62f57d3a/attachment.obj From martin at v.loewis.de Sun Nov 5 20:22:13 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Nov 2006 20:22:13 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: <454E39E5.8040604@v.loewis.de> Andrew Dalke schrieb: > I have looked at the spec, and can't figure out how its explanation > matches the observed urljoin results. Steve's excerpt trimmed out > the strangest example. Unfortunately, you didn't say which of these you want explained. As it is tedious to write down even a single one, I restrain to the one with the What?! remark. >>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..") # What?! > 'http://blah.com/' Please follow me through section 5 of http://www.ietf.org/rfc/rfc3986.txt 5.2.1: Pre-parse the Base URI B.scheme = "http" B.authority = "blah.com" B.path = "/a/b/c" B.query = undefined B.fragment = undefined 5.2.2: Transform References parse("../../../..") R.scheme = R.authority = R.query = R.fragment = undefined R.path = "../../../.." 
(strictness not relevant, R.scheme is already undefined) R.scheme is not defined R.authority is not defined R.path is not "" R.path does not start with / T.path = merge("/a/b/c", "../../../..") T.path = remove_dot_segments(T.path) T.authority = "blah.com" T.scheme = "http" T.fragment = undefined 5.2.3 Merge paths merge("/a/b/c", "../../../..") = (base URI does have path) "/a/b/../../../.." 5.2.4 Remove Dot Segments remove_dot_segments("/a/b/../../../..") 1. I = "/a/b/../../../.." O = "" 2. A (does not apply) B (does not apply) C (does not apply) D (does not apply) E O="/a" I="/b/../../../.." 2. E O="/a/b" I="/../../../.." 2. C O="/a" I="/../../.." 2. C O="" I="/../.." 2. C O="" I="/.." 2. C O="" I="/" 2. E O="/" I="" 3. Result: "/" 5.3 Component Recomposition result = "" (scheme is defined) result = "http:" (authority is defined) result = "http://blah.com" (append path) result = "http://blah.com/" HTH, Martin From brett at python.org Sun Nov 5 21:07:03 2006 From: brett at python.org (Brett Cannon) Date: Sun, 5 Nov 2006 12:07:03 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454D3C9E.5030505@canterbury.ac.nz> <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> Message-ID: On 11/5/06, Steve Holden wrote: > > [Off-list] > Brett Cannon wrote: > [...] > > > > Hopefully my import rewrite is flexible enough that people will be able > > to plug in their own importer/loader for the filesystem so that they can > > tune how things like this are handled (e.g., caching what files are in a > > directory, skipping bytecode files, etc.). > > > I just wondered whether you plan to support other importers of the PEP > 302 style? I have been experimenting with import from database, and > would like to see that work migrate to your rewrite if possible. Yep. The main point of this rewrite is to refactor the built-in importers to be PEP 302 importers so that they can easily be left out to protect imports. 
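The PEP 302 protocol being discussed here is small enough to sketch directly. The DictImporter below is a hypothetical stand-in for a database-backed importer — the dict maps module names to source text, and find_module()/load_module() follow the PEP 302 loader contract (reuse sys.modules, register the module before executing its code, clean up on failure). The class name and sample module are invented for illustration:

```python
import sys
import types

class DictImporter:
    """Hypothetical PEP 302 finder/loader; self.sources stands in for a
    database table mapping module name -> source text."""

    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        # Return a loader (here: ourselves) if we can handle this module.
        return self if fullname in self.sources else None

    def load_module(self, fullname):
        if fullname in sys.modules:        # PEP 302: reuse a cached module
            return sys.modules[fullname]
        mod = types.ModuleType(fullname)
        mod.__loader__ = self
        sys.modules[fullname] = mod        # register *before* executing
        try:
            exec(self.sources[fullname], mod.__dict__)
        except BaseException:
            del sys.modules[fullname]      # PEP 302: remove on failure
            raise
        return mod

finder = DictImporter({'dbmod': 'answer = 42\n'})
loader = finder.find_module('dbmod')
mod = loader.load_module('dbmod')
print(mod.answer)  # 42
```

Appending the finder to sys.meta_path would let plain import statements reach it on interpreters that still honor the legacy find_module protocol; the sketch calls the protocol methods directly to stay version-independent.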
Plus I have made sure that doing something like .ptl files off the filesystem is simple (a subclass with a single method overloaded) or introducing a DB as a back-end store (should only require the importer/loader part; can even use an existing class to handle whether bytecode should be recreated or not). Since a DB back-end is a specific use-case I even have notes in the module docstring stating how I would go about doing it. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061105/2931109c/attachment.html From martin at v.loewis.de Sun Nov 5 21:36:51 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Nov 2006 21:36:51 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061105162458.GA23812@panix.com> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <20061105162458.GA23812@panix.com> Message-ID: <454E4B63.2020603@v.loewis.de> Aahz schrieb: > Maybe so, but I recently dealt with a painful bottleneck in Python code > caused by excessive stat() calls on a directory with thousands of files, > while the os.listdir() function was bogging things down hardly at all. > Granted, Python bytecode was almost certainly the cause of much of the > overhead, but I still suspect that a simple listing will be faster in C > code because of fewer system calls. It should be a matter of profiling > before this suggestion is rejected rather than making assertions about > what "should" be happening. That works both ways, of course: whoever implements such a patch should also provide profiling information. Last time I changed the importing code to reduce the number of stat calls, I could hardly demonstrate a speedup. 
Regards, Martin

From dalke at dalkescientific.com Sun Nov 5 23:29:13 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 5 Nov 2006 23:29:13 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454E39E5.8040604@v.loewis.de> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> Message-ID:

Martin: > Unfortunately, you didn't say which of these you want explained. > As it is tedious to write down even a single one, I restrain to the > one with the What?! remark. > > >>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..") # What?! > > 'http://blah.com/'

The "What?!" is in context with the previous and next entries. I've reduced it to a simpler case

>>> urlparse.urljoin("http://blah.com/", "..")
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/", "../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/", "../..")
'http://blah.com/'

Does the result make sense to you? Does it make sense that the last of these is shorter than the middle one? It sure doesn't to me. I thought it was obvious that there was an error; obvious enough that I didn't bother to track down why - especially as my main point was to argue there are different ways to deal with hierarchical/path-like schemes, each correct for its given domain.

> Please follow me through section 5 of > > http://www.ietf.org/rfc/rfc3986.txt

The core algorithm causing the "what?!" comes from "remove_dot_segments", section 5.2.4. In parallel my 3 cases should give:

5.2.4 Remove Dot Segments

remove_dot_segments("/.."):
  1.  I="/..", O=""
  2A. (does not apply)
  2B. (does not apply)
  2C. O="", I="/"
  2A. (does not apply)
  2B. (does not apply)
  2C. (does not apply)
  2D. (does not apply)
  2E. O="/", I=""
  3.  Result: "/"

remove_dot_segments("/../"):
  1.  I="/../", O=""
  2A. (does not apply)
  2B. (does not apply)
  2C. O="", I="/"
  2A. (does not apply)
  2B. (does not apply)
  2C. (does not apply)
  2D. (does not apply)
  2E. O="/", I=""
  3.  Result: "/"

remove_dot_segments("/../.."):
  1.  I="/../..", O=""
  2A. (does not apply)
  2B. (does not apply)
  2C. O="", I="/.."
      .. which reduces to remove_dot_segments("/..")
  3.  Result: "/"

My reading of the RFC 3986 says all three examples should produce the same result. The fact that my "what?!" comment happens to be correct according to that RFC is purely coincidental. Then again, urlparse.py does *not* claim to be RFC 3986 compliant. The module docstring is """Parse (absolute and relative) URLs. See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, UC Irvine, June 1995. """

I tried the same code with 4Suite, which does claim compliance, and get

>>> import Ft
>>> from Ft.Lib import Uri
>>> Uri.Absolutize("..", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../..", "http://blah.com/")
'http://blah.com/'
>>>

The text of its Uri.py says: This function is similar to urlparse.urljoin() and urllib.basejoin(). Those functions, however, are (as of Python 2.3) outdated, buggy, and/or designed to produce results acceptable for use with other core Python libraries, rather than being earnest implementations of the relevant specs. Their problems are most noticeable in their handling of same-document references and 'file:' URIs, both being situations that come up far too often to consider the functions reliable enough for general use. """

# Reasons to avoid using urllib.basejoin() and urlparse.urljoin():
# - Both are partial implementations of long-obsolete specs.
# - Both accept relative URLs as the base, which no spec allows.
# - urllib.basejoin() mishandles the '' and '..' references.
# - If the base URL uses a non-hierarchical or relative path,
#   or if the URL scheme is unrecognized, the result is not
#   always as expected (partly due to issues in RFC 1808).
# - If the authority component of a 'file' URI is empty,
#   the authority component is removed altogether. If it was
#   not present, an empty authority component is in the result.
# - '.' and '..' segments are not always collapsed as well as they
#   should be (partly due to issues in RFC 1808).
# - Effective Python 2.4, urllib.basejoin() *is* urlparse.urljoin(),
#   but urlparse.urljoin() is still based on RFC 1808.

In searching the archives http://mail.python.org/pipermail/python-dev/2005-September/056152.html Fabien Schwob: > I'm using the module urlparse and I think I've found a bug in the > urlparse module. When you merge an url and a link > like "../../../page.html" with urljoin, the new url created keep some > "../" in it. Here is an example : > > >>> import urlparse > >>> begin = "http://www.example.com/folder/page.html" > >>> end = "../../../otherpage.html" > >>> urlparse.urljoin(begin, end) > 'http://www.example.com/../../otherpage.html' Guido: > You shouldn't be giving more "../" sequences than are possible. I find > the current behavior acceptable.

(Apparently for RFC 1808 that's a valid answer; it was an implementation choice in how to handle that case.)

While not directly relevant, postings like John J Lee's http://mail.python.org/pipermail/python-bugs-list/2006-February/031875.html > The urlparse.urlparse() code should not be changed, for > backwards compatibility reasons. strongly suggest a desire to not change that code.

The last definitive statement on this topic that I could find was mentioned in http://www.python.org/dev/summary/2005-11-16_2005-11-30/#updating-urlparse-to-support-rfc-3986 > Guido pointed out that the main purpose of urlparse is to be RFC-compliant. > Paul explained that the current code is valid according to RFC 1808 > (1995-1998), but that this was superseded by RFC 2396 (1998-2004) > and RFC 3986 (2005-). Guido was convinced, and asked for a new API > (for backwards compatibility) and a patch to be submitted via sourceforge.
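For reference, the RFC 3986 section 5.2.4 algorithm that both traces in this thread walk through is short enough to transcribe into Python. This is an illustrative sketch of the spec's pseudocode, not the stdlib's code; with it, all three disputed relative references reduce to the same path:

```python
def remove_dot_segments(path):
    """Direct transcription of RFC 3986, section 5.2.4.

    The step labels (2A-2E) match the lettered cases in the RFC and in
    the traces quoted in this thread.
    """
    output = []                        # completed segments (the RFC's buffer O)
    while path:                        # the RFC's input buffer I
        if path.startswith('../'):     # 2A
            path = path[3:]
        elif path.startswith('./'):    # 2A
            path = path[2:]
        elif path.startswith('/./'):   # 2B
            path = '/' + path[3:]
        elif path == '/.':             # 2B
            path = '/'
        elif path.startswith('/../'):  # 2C: drop the last output segment
            path = '/' + path[4:]
            if output:
                output.pop()
        elif path == '/..':            # 2C
            path = '/'
            if output:
                output.pop()
        elif path in ('.', '..'):      # 2D
            path = ''
        else:                          # 2E: move first segment to output
            cut = path.find('/', 1)
            if cut == -1:
                output.append(path)
                path = ''
            else:
                output.append(path[:cut])
                path = path[cut:]
    return ''.join(output)

for ref in ('/..', '/../', '/../..'):
    print(ref, '->', remove_dot_segments(ref))  # all three give '/'
```

Merging any of the three references against "http://blah.com/" therefore recomposes to 'http://blah.com/', matching 4Suite's output rather than urlparse's.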
As this is not a bug, I have added the feature request 1591035 to SF
titled "update urlparse to RFC 3986". Nothing else appeared to exist on
that specific topic.

Andrew
dalke at dalkescientific.com

From martin at v.loewis.de  Mon Nov  6 00:06:07 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Nov 2006 00:06:07 +0100
Subject: [Python-Dev] Path object design
In-Reply-To:
References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com>
	<454BD1A9.8080508@v.loewis.de>
	<5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com>
	<454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de>
Message-ID: <454E6E5F.7070800@v.loewis.de>

Andrew Dalke schrieb:
>>>> urlparse.urljoin("http://blah.com/", "..")
> 'http://blah.com/'
>>>> urlparse.urljoin("http://blah.com/", "../")
> 'http://blah.com/../'
>>>> urlparse.urljoin("http://blah.com/", "../..")
> 'http://blah.com/'
>
> Does the result make sense to you? Does it make
> sense that the last of these is shorter than the middle
> one? It sure doesn't to me. I thought it was obvious
> that there was an error;

That wasn't obvious at all to me. Now looking at the examples, I agree
there is an error. The middle one is incorrect;

urlparse.urljoin("http://blah.com/", "../")

should also give 'http://blah.com/'.

>> You shouldn't be giving more "../" sequences than are possible. I find
>> the current behavior acceptable.
>
> (Apparently for RFC 1808 that's a valid answer; it was an implementation
> choice in how to handle that case.)

There is still some text to that effect in section 5.4.2 of RFC 3986.

> While not directly relevant, postings like John J Lee's
> http://mail.python.org/pipermail/python-bugs-list/2006-February/031875.html
>> The urlparse.urlparse() code should not be changed, for
>> backwards compatibility reasons.
>
> strongly suggest a desire to not change that code.

This is John J Lee's opinion, of course.
I don't see a reason not to fix such bugs, or to update the implementation to the current RFCs. > As this is not a bug, I have added the feature request 1591035 to SF > titled "update urlparse to RFC 3986". Nothing else appeared to exist > on that specific topic. Thanks. It always helps to be more specific; being less specific often hurts. I find there is a difference between "urllib behaves non-intuitively" and "urllib gives result A for parameters B and C, but should give result D instead". Can you please add specific examples to your report that demonstrate the difference between implemented and expected behavior? Regards, Martin From greg.ewing at canterbury.ac.nz Mon Nov 6 00:21:26 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Nov 2006 12:21:26 +1300 Subject: [Python-Dev] [Python-3000] Mini Path object In-Reply-To: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> References: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> Message-ID: <454E71F6.7090103@canterbury.ac.nz> Mike Orr wrote: > .abspath() > .normpath() > .realpath() > .splitpath() > .relpath() > .relpathto() Seeing as the whole class is about paths, having "path" in the method names seems redundant. I'd prefer to see terser method names without any noise characters in them. -- Greg From greg.ewing at canterbury.ac.nz Mon Nov 6 00:34:05 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Nov 2006 12:34:05 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D5703.5070509@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> Message-ID: <454E74ED.8070706@canterbury.ac.nz> Martin v. L?wis wrote: > That should never be better: the system will cache the directory > blocks, also, and it will do a better job than Python will. 
If that's really the case, then why do discussions of how to improve
Python startup speeds seem to focus on the number of stat calls made?

Also, caching isn't the only thing to consider. Last time I looked at
the implementation of unix file systems, they mostly seemed to do
directory lookups by linear search. Unless that's changed a lot, I have
a hard time seeing how that's going to beat Python's highly-tuned
dictionaries.

--
Greg

From dalke at dalkescientific.com  Mon Nov  6 00:43:42 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Nov 2006 00:43:42 +0100
Subject: [Python-Dev] Path object design
In-Reply-To: <454E6E5F.7070800@v.loewis.de>
References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com>
	<5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com>
	<454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de>
	<454E6E5F.7070800@v.loewis.de>
Message-ID:

Me [Andrew]:
> > As this is not a bug, I have added the feature request 1591035 to SF
> > titled "update urlparse to RFC 3986". Nothing else appeared to exist
> > on that specific topic.

Martin:
> Thanks. It always helps to be more specific; being less specific often
> hurts.

So does being more specific. I wasn't trying to report a bug in
urlparse. I figured everyone knew the problems existed. The code
comments say so, and various back discussions on this list say so.

All I wanted to do was point out that two seemingly similar problems -
path traversal of hierarchical structures - had two different expected
behaviors. Now I've spent entirely too much time on specifics I didn't
care about and didn't think were important. I've also been known to do
the full report and have people ignore what I wrote because it was too
long.

> I find there is a difference between "urllib behaves
> non-intuitively" and "urllib gives result A for parameters B and C,
> but should give result D instead".
Can you please add specific examples > to your report that demonstrate the difference between implemented > and expected behavior? No. I consider the "../" cases to be unimportant edge cases and I would rather people fixed the other problems highlighted in the text I copied from 4Suite's Uri.py -- like improperly allowing a relative URL as the base url, which I incorrectly assumed was legit - and that others have reported on python-dev, easily found with Google. If I only add test cases for "../" then I believe that that's all that will be fixed. Given the back history of this problem and lack of followup I also believe it won't be fixed unless someone develops a brand new module, from scratch, which will be added to some future Python version. There's probably a compliance suite out there to use for this sort of task. I hadn't bothered to look as I am no more proficient than others here at Google. Finally, I see that my report is a dup. SF search is poor. As Nick Coghlan reported, Paul Jimenez has a replacement for urlparse. Summarized in http://www.python.org/dev/summary/2006-04-01_2006-04-15/ It was submitted in spring as a patch - SF# 1462525 at http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470 which I didn't find in my earlier searching. Andrew dalke at dalkescientific.com From greg.ewing at canterbury.ac.nz Mon Nov 6 01:07:55 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Nov 2006 13:07:55 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <45491492.9060208@ee.byu.edu> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <4549010F.6090200@ieee.org> <45490989.9010603@v.loewis.de> <45491492.9060208@ee.byu.edu> Message-ID: <454E7CDB.2000402@canterbury.ac.nz> Travis Oliphant wrote: > In NumPy, the data-type objects have function pointers to accomplish all > the things NumPy does quickly. 
If the datatype object is to be extracted and made a stand-alone feature, that might need to be refactored. Perhaps there could be a facility for traversing a datatype with a user-supplied dispatch table? -- Greg From foom at fuhm.net Mon Nov 6 04:08:42 2006 From: foom at fuhm.net (James Y Knight) Date: Sun, 5 Nov 2006 22:08:42 -0500 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454C5431.7080609@v.loewis.de> References: <454B1EDD.9050908@googlemail.com> <454C5431.7080609@v.loewis.de> Message-ID: On Nov 4, 2006, at 3:49 AM, Martin v. L?wis wrote: > Notice that at least the following objects are shared between > interpreters, as they are singletons: > - None, True, False, (), "", u"" > - strings of length 1, Unicode strings of length 1 with ord < 256 > - integers between -5 and 256 > How do you deal with the reference counters of these objects? > > Also, type objects (in particular exception types) are shared between > interpreters. These are mutable objects, so you have actually > dictionaries shared between interpreters. How would you deal with > these? All these should be dealt with by making them per-interpreter singletons, not per address space. That should be simple enough, unfortunately the margins of this email are too small to describe how. ;) Also it'd be backwards incompatible with current extension modules. James From guido at python.org Mon Nov 6 05:52:46 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Nov 2006 20:52:46 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: References: <454B1EDD.9050908@googlemail.com> <454C5431.7080609@v.loewis.de> Message-ID: On 11/5/06, James Y Knight wrote: > > On Nov 4, 2006, at 3:49 AM, Martin v. 
L?wis wrote: > > > Notice that at least the following objects are shared between > > interpreters, as they are singletons: > > - None, True, False, (), "", u"" > > - strings of length 1, Unicode strings of length 1 with ord < 256 > > - integers between -5 and 256 > > How do you deal with the reference counters of these objects? > > > > Also, type objects (in particular exception types) are shared between > > interpreters. These are mutable objects, so you have actually > > dictionaries shared between interpreters. How would you deal with > > these? > > All these should be dealt with by making them per-interpreter > singletons, not per address space. That should be simple enough, > unfortunately the margins of this email are too small to describe > how. ;) Also it'd be backwards incompatible with current extension > modules. I don't know how you define simple. In order to be able to have separate GILs you have to remove *all* sharing of objects between interpreters. And all other data structures, too. It would probably kill performance too, because currently obmalloc relies on the GIL. So I don't see much point in continuing this thread. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Mon Nov 6 07:27:52 2006 From: talin at acm.org (Talin) Date: Sun, 05 Nov 2006 22:27:52 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: References: <454B1EDD.9050908@googlemail.com> <454C5431.7080609@v.loewis.de> Message-ID: <454ED5E8.4010709@acm.org> Guido van Rossum wrote: > I don't know how you define simple. In order to be able to have > separate GILs you have to remove *all* sharing of objects between > interpreters. And all other data structures, too. It would probably > kill performance too, because currently obmalloc relies on the GIL. Nitpick: You have to remove all sharing of *mutable* objects. 
One day, when we get "pure" GC with no refcounting, that will be a
meaningful distinction. :)

-- Talin

From martin at v.loewis.de  Mon Nov  6 07:49:14 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Nov 2006 07:49:14 +0100
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454E74ED.8070706@canterbury.ac.nz>
References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz>
	<454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz>
Message-ID: <454EDAEA.7050501@v.loewis.de>

Greg Ewing schrieb:
>> That should never be better: the system will cache the directory
>> blocks, also, and it will do a better job than Python will.
>
> If that's really the case, then why do discussions
> of how to improve Python startup speeds seem to focus
> on the number of stat calls made?

A stat call will not only look at the directory entry, but also look at
the inode. This will require another disk access, as the inode is at a
different location on the disk.

> Also, caching isn't the only thing to consider.
> Last time I looked at the implementation of unix
> file systems, they mostly seemed to do directory
> lookups by linear search. Unless that's changed
> a lot, I have a hard time seeing how that's
> going to beat Python's highly-tuned dictionaries.

It depends on the file system you are using. An NTFS directory lookup is
a B-tree search; NT has not been doing linear search since its
introduction 15 years ago. Linux only recently started doing tree-based
directories with the introduction of ext4. However, Linux's in-memory
directory cache (the dcache) doesn't need to scan over the directory
block structure; I'm not sure whether it still uses linear search.

For a small directory, the difference is likely negligible. For a large
directory, the cost of reading in the entire directory might be higher
than the savings gained from not having to search it.
Also, if we do our own directory caching, the question is when to invalidate the cache. Regards, Martin From martin at v.loewis.de Mon Nov 6 08:03:45 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 08:03:45 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: <454EDE51.7000307@v.loewis.de> Andrew Dalke schrieb: >> I find there is a difference between "urllib behaves >> non-intuitively" and "urllib gives result A for parameters B and C, >> but should give result D instead". Can you please add specific examples >> to your report that demonstrate the difference between implemented >> and expected behavior? > > No. > > I consider the "../" cases to be unimportant edge cases and > I would rather people fixed the other problems highlighted in the > text I copied from 4Suite's Uri.py -- like improperly allowing a > relative URL as the base url, which I incorrectly assumed was > legit - and that others have reported on python-dev, easily found > with Google. It still should be possible to come up with examples for these as well, no? For example, if you pass a relative URI as the base URI, what would you like to see happen? > If I only add test cases for "../" then I believe that that's all that > will be fixed. That's true. Actually, it's probably not true; it will only get fixed if some volunteer contributes a fix. > Finally, I see that my report is a dup. SF search is poor. As > Nick Coghlan reported, Paul Jimenez has a replacement for urlparse. 
> Summarized in > http://www.python.org/dev/summary/2006-04-01_2006-04-15/ > It was submitted in spring as a patch - SF# 1462525 at > http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470 > which I didn't find in my earlier searching. So do you think this patch meets your requirements? This topic (URL parsing) is not only inherently difficult to implement, it is just as tedious to review. Without anybody reviewing the contributed code, it's certain that it will never be incorporated. Regards, Martin From dalke at dalkescientific.com Mon Nov 6 12:04:28 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Nov 2006 12:04:28 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454EDE51.7000307@v.loewis.de> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> <454EDE51.7000307@v.loewis.de> Message-ID: Martin: > It still should be possible to come up with examples for these as > well, no? For example, if you pass a relative URI as the base > URI, what would you like to see happen? Until two days ago I didn't even realize that was an incorrect use of urljoin. I can't be the only one. Hence, raise an exception - just like 4Suite's Uri.py does. > That's true. Actually, it's probably not true; it will only get fixed > if some volunteer contributes a fix. And it's not I. A true fix is a lot of work. I would rather use Uri.py, now that I see it handles everything I care about, and then some. Eg, file name <-> URI conversion. > So do you think this patch meets your requirements? # new >>> uriparse.urljoin("http://spam/", "foo/bar") 'http://spam//foo/bar' >>> # existing >>> urlparse.urljoin("http://spam/", "foo/bar") 'http://spam/foo/bar' >>> No. That was the first thing I tried. 
Also found

>>> urlparse.urljoin("http://blah", "/spam/")
'http://blah/spam/'
>>> uriparse.urljoin("http://blah", "/spam/")
'http://blah/spam'
>>>

I reported these on the patch page. Nothing else strange came up, but I
did only try http urls and not the others.

My "requirements", meaning my vague, spur-of-the-moment thoughts without
any research or experimentation to determine their validity, are
different from those for Python. My real requirements are met by the
existing code. My imagined ones include support for edge cases, the idna
codec, unicode, and real-world use on a variety of OSes. 4Suite's Uri.py
seems to have this. Eg, lots of edge-case code like

    # On Windows, ensure that '|', not ':', is used in a drivespec.
    if os.name == 'nt' and scheme == 'file':
        path = path.replace(':','|',1)

Hence the uriparse.py patch does not meet my hypothetical requirements.

Python's requirements are probably to get closer to the spec. In which
case yes, it's at least as good as and likely generally better than the
existing module, modulo a few API naming debates and perhaps some rough
edges which will be found when put into use. And perhaps various
arguments about how bug compatible it should be and if the old code
should be available as well as the new one, for those who depend on the
existing 1808-allowed implementation dependent behavior. For those I
have not the experience to guide me and no care to push the debate.

I've decided I'm going to experiment using 4Suite's Uri.py for my code
because it handles things I want which are outside of the scope of
uriparse.py.

> This topic (URL parsing) is not only inherently difficult to
> implement, it is just as tedious to review. Without anybody
> reviewing the contributed code, it's certain that it will never
> be incorporated.

I have a different opinion. Python's url manipulation code is a mess.
urlparse, urllib, urllib2. Why is "urlencode" part of urllib and not
urllib2?
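The 'http://spam//foo/bar' result reported against the patch is the classic failure mode in the merge step of reference resolution: RFC 3986, section 5.3, says to keep the base path only up to and including its last "/", then append the reference — adding an extra "/" of your own doubles the slash. A from-scratch sketch of that step ("merge" is the RFC's name for it; the signature here is my own):

```python
def merge(base_path, ref_path, base_has_authority=True):
    # RFC 3986, section 5.3: merge a relative-path reference with the base path.
    if base_has_authority and not base_path:
        # Base has an authority component but an empty path: resolve against "/".
        return "/" + ref_path
    # Keep the base path up to AND INCLUDING its last "/", then append the
    # reference. No extra "/" is inserted, so a base path of "/" stays "/".
    return base_path[:base_path.rfind("/") + 1] + ref_path
```

With a base path of "/", `base_path[:rfind("/") + 1]` is just "/", so merging "foo/bar" yields "/foo/bar" rather than "//foo/bar".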
For that matter, urllib is labeled 'Open an arbitrary URL' and not 'and
also do manipulations on parts of URLs.'

I don't want to start fixing code because doing it the way I want to
requires a new API and a much better understanding of the RFCs than I
care about, especially since 4Suite and others have already done this.
Hence I would say to just grab their library. And perhaps update the
naming scheme.

Also, urlgrabber and pycURL are better for downloading arbitrary URIs.
For some definitions of "better".

Andrew
dalke at dalkescientific.com

From tds333+pydev at gmail.com  Mon Nov  6 13:18:37 2006
From: tds333+pydev at gmail.com (Wolfgang Langner)
Date: Mon, 6 Nov 2006 13:18:37 +0100
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454CB619.7010804@v.loewis.de>
References: <454CB619.7010804@v.loewis.de>
Message-ID: <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com>

Why not only import *.pyc files and no longer use *.pyo files. It is
simpler to have one compiled python file extension. PYC files can
contain optimized python byte code and normal byte code.

--
bye by Wolfgang

From arigo at tunes.org  Mon Nov  6 14:57:51 2006
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 6 Nov 2006 14:57:51 +0100
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454CB619.7010804@v.loewis.de>
References: <454CB619.7010804@v.loewis.de>
Message-ID: <20061106135751.GA29592@code0.codespeak.net>

Hi Martin,

On Sat, Nov 04, 2006 at 04:47:37PM +0100, "Martin v. Löwis" wrote:
> Patch #1346572 proposes to also search for .pyc when OptimizeFlag
> is set, and for .pyo when it is not set. The author argues this is
> for consistency, as the zipimporter already does that.

My strong opinion on the matter is that importing a .pyc file if the .py
file is not present is wrong in the first place. It caused many
headaches in several projects I worked on.
Additionally, trying to import .pyo files looks like complete semantic
nonsense, but I can't really argue from experience, as I never run
python -O.

Typical example: someone in the project removes a .py file, and checks
in this change; someone else does an 'svn up', which kills the .py in
his working copy, but not the .pyc. These stale .pyc's cause pain, e.g.
by shadowing the real module (further down sys.path), or simply by
preventing the project's developers from realizing that they forgot to
fix some imports. We regularly had obscure problems that went away as
soon as we deleted all .pyc files around, but I cannot comment more on
that because we never really investigated.

I know it's a discussion that comes up and dies out regularly. My two
cents is that it would be saner to have two separate concepts: cache
files used internally by the interpreter for speed reasons only, and
bytecode files that can be shipped and imported. This could e.g. be done
with different file extensions (then you just rename the files if you
want to ship them as bytecode without source), or with a temporary cache
directory (from where you can fish bytecode files if you want to ship
them).

Experience suggests I should not be holding my breath until something is
decided about this, though. If I were asked to come up with a patch I'd
simply propose one that removes importing of stale .pyc files (I'm
always running a version of Python with such a patch, to avoid the
above-mentioned troubles).
A bientot,

Armin

From fredrik at pythonware.com  Mon Nov  6 14:59:06 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 06 Nov 2006 14:59:06 +0100
Subject: [Python-Dev] Path object design
In-Reply-To: <454E6E5F.7070800@v.loewis.de>
References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com>
	<454BD1A9.8080508@v.loewis.de>
	<5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com>
	<454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de>
	<454E6E5F.7070800@v.loewis.de>
Message-ID:

Martin v. Löwis wrote:
> Andrew Dalke schrieb:
>>>>> urlparse.urljoin("http://blah.com/", "..")
>> 'http://blah.com/'
>>>>> urlparse.urljoin("http://blah.com/", "../")
>> 'http://blah.com/../'
>>>>> urlparse.urljoin("http://blah.com/", "../..")
>> 'http://blah.com/'
>>
>> Does the result make sense to you? Does it make
>> sense that the last of these is shorter than the middle
>> one? It sure doesn't to me. I thought it was obvious
>> that there was an error;
>
> That wasn't obvious at all to me. Now looking at the
> examples, I agree there is an error. The middle one
> is incorrect;
>
> urlparse.urljoin("http://blah.com/", "../")
>
> should also give 'http://blah.com/'.

make that: could also give 'http://blah.com/'.

as I said, today's urljoin doesn't guarantee that the output is the
*shortest* possible way to represent the resulting URI.
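The invariant being debated here can be checked mechanically. For what it's worth, the RFC 3986 resolution rules — which `urllib.parse.urljoin` follows in Python 3.5 and later — collapse all three of Andrew's examples to the same minimal result:

```python
from urllib.parse import urljoin  # Python 3.5+ urljoin follows RFC 3986

base = "http://blah.com/"
for rel in ("..", "../", "../.."):
    # Each excess ".." is clamped at the root, per RFC 3986 section 5.2.4.
    print("%-6s -> %s" % (rel, urljoin(base, rel)))
```

Under those rules the middle case no longer produces 'http://blah.com/../', so the shorter-than-the-middle-one anomaly Andrew points out disappears.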
From dalke at dalkescientific.com Mon Nov 6 15:57:59 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Nov 2006 15:57:59 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: Andrew: > >>> urlparse.urljoin("http://blah.com/", "..") > 'http://blah.com/' > >>> urlparse.urljoin("http://blah.com/", "../") > 'http://blah.com/../' > >>> urlparse.urljoin("http://blah.com/", "../..") > 'http://blah.com/' /F: > as I said, today's urljoin doesn't guarantee that the output is > the *shortest* possible way to represent the resulting URI. I didn't think anyone was making that claim. The module claims RFC 1808 compliance. From the docstring: DESCRIPTION See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, UC Irvine, June 1995. Now quoting from RFC 1808: 5.2. Abnormal Examples Although the following abnormal examples are unlikely to occur in normal practice, all URL parsers should be capable of resolving them consistently. Each example uses the same base as above. An empty reference resolves to the complete base URL: <> = Parsers must be careful in handling the case where there are more relative path ".." segments than there are hierarchical levels in the base URL's path. My claim is that "consistent" implies "in the spirit of the rest of the RFC" and "to a human trying to make sense of the results" and not only mean "does the same thing each time." Else >>> urljoin("http://blah.com/", "../../..") 'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url' would be equally consistent. >>> for rel in ".. ../ ../.. ../../ ../../.. ../../../ ../../../..".split(): ... print repr(rel), repr(urlparse.urljoin("http://blah.com/", rel)) ... '..' 'http://blah.com/' '../' 'http://blah.com/../' '../..' 
'http://blah.com/' '../../' 'http://blah.com/../../' '../../..' 'http://blah.com/../' '../../../' 'http://blah.com/../../../' '../../../..' 'http://blah.com/../../' I grant there is a consistency there. It's not one most would have predicted beforehand. Then again, "should" is that wishy-washy "unless you've got a good reason to do it a different way" sort of constraint. Andrew dalke at dalkescientific.com From tomerfiliba at gmail.com Mon Nov 6 16:02:51 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 6 Nov 2006 17:02:51 +0200 Subject: [Python-Dev] __dir__, part 2 Message-ID: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> so, if you remember, i suggested adding __dir__ to objects, so as to make dir() customizable, remove the deprecated __methods__ and __members__, and make it symmetrical to other built-in functions. you can see the original post here: http://mail.python.org/pipermail/python-dev/2006-July/067095.html which was generally accepted by the forum: http://mail.python.org/pipermail/python-dev/2006-July/067139.html so i went on, now that i have some spare time, to research the issue. the current dir() works as follows: (*) builtin_dir calls PyObject_Dir to do the trick (*) if the object is NULL (dir with no argument), return the frame's locals (*) if the object is a *module*, we're just using it's __dict__ (*) if the object is a *type*, we're using it's __dict__ and __bases__, but not __class__ (so as not to show the metaclass) (*) otherwise, it's a "normal object", so we take it's __dict__, along with __methods__, __members__, and dir(__class__) (*) create a list of keys from the dict, sort, return we'll have to change that if we were to introduce __dir__. 
my design is: (*) builtin_dir, if called without an argument, returns the frame's locals (*) otherwise, it calls PyObject_Dir(self), which would dispatch self.__dir__() (*) if `self` doesn't have __dir__, default to object.__dir__(self) (*) the default object.__dir__ implementation would do the same as today: collect __dict__, __members__, __methods__, and dir(__class__). by py3k, we'll remove looking into __methods__ and __members__. (*) type objects and module objects would implement __dir__ to their liking (as PyObject_Dir does today) (*) builtin_dir would take care of sorting the list returned by PyObject_Dir so first i'd want you people to react on my design, maybe you'd find flaws whatever. also, should this become a PEP? and last, how do i add a new method slot? does it mean i need to change all type-object definitions throughout the codebase? do i add it to some protocol? or directly to the "object protocol"? -tomer From jcarlson at uci.edu Mon Nov 6 16:53:45 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 06 Nov 2006 07:53:45 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454ED5E8.4010709@acm.org> References: <454ED5E8.4010709@acm.org> Message-ID: <20061106075222.8221.JCARLSON@uci.edu> Talin wrote: > > Guido van Rossum wrote: > > I don't know how you define simple. In order to be able to have > > separate GILs you have to remove *all* sharing of objects between > > interpreters. And all other data structures, too. It would probably > > kill performance too, because currently obmalloc relies on the GIL. > > Nitpick: You have to remove all sharing of *mutable* objects. One day, > when we get "pure" GC with no refcounting, that will be a meaningful > distinction. :) Python already grew that feature a couple years back, but it never became mainline. 
Search google (I don't know the magic incantation off the top of my
head), but if I remember correctly, it wasn't a significant win if any
at all.

- Josiah

From guido at python.org  Mon Nov  6 17:07:07 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Nov 2006 08:07:07 -0800
Subject: [Python-Dev] __dir__, part 2
In-Reply-To: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
Message-ID:

Sounds like a good plan, though I'm not sure if it's worth doing in 2.6
-- I'd be happy with doing this just in 3k.

I'm not sure what you mean by "adding a method slot" -- certainly it's
possible to define a method __foo__ and call it directly without having
a special tp_foo in the type object, and I recommend doing it that way
since the tp_foo slots are just there to make things fast; in this case
I don't see a need for dir() to be fast.

--Guido

On 11/6/06, tomer filiba wrote:
> so, if you remember, i suggested adding __dir__ to objects, so as to make
> dir() customizable, remove the deprecated __methods__ and __members__,
> and make it symmetrical to other built-in functions.
>
> you can see the original post here:
> http://mail.python.org/pipermail/python-dev/2006-July/067095.html
> which was generally accepted by the forum:
> http://mail.python.org/pipermail/python-dev/2006-July/067139.html
>
> so i went on, now that i have some spare time, to research the issue.
> the current dir() works as follows: > (*) builtin_dir calls PyObject_Dir to do the trick > (*) if the object is NULL (dir with no argument), return the frame's locals > (*) if the object is a *module*, we're just using it's __dict__ > (*) if the object is a *type*, we're using it's __dict__ and __bases__, > but not __class__ (so as not to show the metaclass) > (*) otherwise, it's a "normal object", so we take it's __dict__, along with > __methods__, __members__, and dir(__class__) > (*) create a list of keys from the dict, sort, return > > we'll have to change that if we were to introduce __dir__. my design is: > (*) builtin_dir, if called without an argument, returns the frame's locals > (*) otherwise, it calls PyObject_Dir(self), which would dispatch self.__dir__() > (*) if `self` doesn't have __dir__, default to object.__dir__(self) > (*) the default object.__dir__ implementation would do the same as > today: collect __dict__, __members__, __methods__, and dir(__class__). > by py3k, we'll remove looking into __methods__ and __members__. > (*) type objects and module objects would implement __dir__ to their > liking (as PyObject_Dir does today) > (*) builtin_dir would take care of sorting the list returned by PyObject_Dir > > so first i'd want you people to react on my design, maybe you'd find > flaws whatever. also, should this become a PEP? > > and last, how do i add a new method slot? does it mean i need to > change all type-object definitions throughout the codebase? > do i add it to some protocol? or directly to the "object protocol"? 
> > > -tomer > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Mon Nov 6 16:48:45 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 06 Nov 2006 16:48:45 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: Andrew Dalke wrote: >> as I said, today's urljoin doesn't guarantee that the output is >> the *shortest* possible way to represent the resulting URI. > > I didn't think anyone was making that claim. The module claims > RFC 1808 compliance. From the docstring: > > DESCRIPTION > See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, > UC Irvine, June 1995. > > Now quoting from RFC 1808: > > 5.2. Abnormal Examples > > Although the following abnormal examples are unlikely to occur in > normal practice, all URL parsers should be capable of resolving them > consistently. > My claim is that "consistent" implies "in the spirit of the rest of the RFC" > and "to a human trying to make sense of the results" and not only > mean "does the same thing each time." Else > >>>> urljoin("http://blah.com/", "../../..") > 'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url' > > would be equally consistent. perhaps, but such an urljoin wouldn't pass the minimize(base + relative) == minimize(urljoin(base, relative)) test that today's urljoin passes (where "minimize" is defined as "create the shortest possible URI that identifies the same target, according to the relevant RFC"). 
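[Editor's sketch: the resolution cases being debated can be exercised directly. This assumes a modern Python 3, whose urllib.parse.urljoin follows RFC 3986 section 5 rather than RFC 1808, so the abnormal examples already come out in minimized form:]

```python
from urllib.parse import urljoin

# Normal example (RFC 3986, section 5.4.1): resolve a relative reference.
print(urljoin("http://a/b/c/d;p?q", "../g"))         # http://a/b/g

# Abnormal examples (section 5.4.2): excess ".." segments are discarded
# by the remove_dot_segments algorithm rather than kept literally.
print(urljoin("http://blah.com/", "../"))            # http://blah.com/
print(urljoin("http://a/b/c/d;p?q", "../../../g"))   # http://a/g
```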
isn't the real issue in this subthread whether urljoin should be expected to pass the minimize(base + relative) == urljoin(base, relative) test? From jcarlson at uci.edu Mon Nov 6 17:36:14 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 06 Nov 2006 08:36:14 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061106135751.GA29592@code0.codespeak.net> References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: <20061106082638.822F.JCARLSON@uci.edu> Armin Rigo wrote: > Hi Martin, > On Sat, Nov 04, 2006 at 04:47:37PM +0100, "Martin v. Löwis" wrote: > > Patch #1346572 proposes to also search for .pyc when OptimizeFlag > > is set, and for .pyo when it is not set. The author argues this is > > for consistency, as the zipimporter already does that. > > My strong opinion on the matter is that importing a .pyc file if the .py > file is not present is wrong in the first place. It caused many > headaches in several projects I worked on. > > Typical example: someone in the project removes a .py file, and checks > in this change; someone else does an 'svn up', which kills the .py in > his working copy, but not the .pyc. These stale .pyc's cause pain, e.g. > by shadowing the real module (further down sys.path), or simply by > preventing the project's developers from realizing that they forgot to > fix some imports. We regularly had obscure problems that went away as > soon as we deleted all .pyc files around, but I cannot comment more on > that because we never really investigated. I had a very similar problem the other week when mucking about with a patch to ntpath. I had it in a somewhat small temporary projects folder and needed to run another project. It picked up the local ntpath.py when importing path.py, but then failed because I was working on a 2.5 derived ntpath, but I was using 2.3 to run the other project.
After renaming the local ntpath, I continued to get the error until I realized "damn pyc" and was halfway through a filesystem-wide search for the problem code (10 minutes elapsed). About the only place where I have found the need for pyc-without-py importing is for zipimports, specifically as used by py2exe and other freezing applications. I don't know if we want to add a new command line option, or a __future__ import, or something, but I think there should be some method of warning people that an import was performed without source code. - Josiah From martin at v.loewis.de Mon Nov 6 18:55:04 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 18:55:04 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> <454EDE51.7000307@v.loewis.de> Message-ID: <454F76F8.5090805@v.loewis.de> Andrew Dalke schrieb: > Hence I would say to just grab their library. And perhaps update the > naming scheme. Unfortunately, this is not an option. *You* can just grab their library; the Python distribution can't. Doing so would mean to fork, and history tells that forks cause problems in the long run. OTOH, if the 4Suite people would contribute the library, integrating it would be an option. Regards, Martin From martin at v.loewis.de Mon Nov 6 19:00:22 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 19:00:22 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061106135751.GA29592@code0.codespeak.net> References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: <454F7836.9030807@v.loewis.de> Armin Rigo schrieb: > My strong opinion on the matter is that importing a .pyc file if the .py > file is not present is wrong in the first place.
There is, of course, an important use case (which you are addressing with a different approach): people want to ship only byte code, not source code, because they feel it protects their IP better, and also for space reasons. So outright ignoring pyc files is not really an option. > I know it's a discussion that comes up and dies out regularly. My two > cents is that it would be saner to have two separate concepts: cache > files used internally by the interpreter for speed reasons only, and > bytecode files that can be shipped and imported. There once was a PEP to better control byte code file generation; it died because it wasn't implemented. I don't think there is a strong opposition to changing the status quo - it's just that you need a well-designed specification before you start, a serious, all-singing-all-dancing implementation, and a lot of test cases. I believe it is these constraints which have prevented any progress here. Regards, Martin From martin at v.loewis.de Mon Nov 6 19:02:13 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 19:02:13 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: <454F78A5.6000400@v.loewis.de> Fredrik Lundh schrieb: >> urlparse.urljoin("http://blah.com/", "../") >> >> should also give 'http://blah.com/'. > > make that: could also give 'http://blah.com/'. How so? If that would implement RFC 3986, you can get only a single outcome, if urljoin is meant to implement section 5 of that RFC. 
Regards, Martin From martin at v.loewis.de Mon Nov 6 19:03:57 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 19:03:57 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> References: <454CB619.7010804@v.loewis.de> <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> Message-ID: <454F790D.6090508@v.loewis.de> Wolfgang Langner schrieb: > Why not only import *.pyc files and no longer use *.pyo files. > > It is simpler to have one compiled python file extension. > PYC files can contain optimized python byte code and normal byte code. So what would you do with the -O option of the interpreter? Regards, Martin From rasky at develer.com Mon Nov 6 20:54:35 2006 From: rasky at develer.com (Giovanni Bajo) Date: Mon, 6 Nov 2006 20:54:35 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: <06d201c701dd$62aa8110$c003030a@trilan> Armin Rigo wrote: > Typical example: someone in the project removes a .py file, and checks > in this change; someone else does an 'svn up', which kills the .py in > his working copy, but not the .pyc. These stale .pyc's cause pain, > e.g. > by shadowing the real module (further down sys.path), or simply by > preventing the project's developers from realizing that they forgot to > fix some imports. We regularly had obscure problems that went away as > soon as we deleted all .pyc files around, but I cannot comment more on > that because we never really investigated. 
This is exactly why I always use this module:

================== nobarepyc.py ============================
#!/usr/bin/env python
#-*- coding: utf-8 -*-
import ihooks
import os

class _NoBarePycHooks(ihooks.Hooks):
    def load_compiled(self, name, filename, *args, **kwargs):
        sourcefn = os.path.splitext(filename)[0] + ".py"
        if not os.path.isfile(sourcefn):
            raise ImportError('forbidden import of bare .pyc file: %r' % filename)
        return ihooks.Hooks.load_compiled(self, name, filename, *args, **kwargs)

ihooks.ModuleImporter(ihooks.ModuleLoader(_NoBarePycHooks())).install()
================== /nobarepyc.py ============================

Just import it before importing anything else (or in site.py if you prefer) and you'll be done. Ah, it doesn't work with zipimports... -- Giovanni Bajo From steve at holdenweb.com Mon Nov 6 20:48:55 2006 From: steve at holdenweb.com (Steve Holden) Date: Mon, 06 Nov 2006 14:48:55 -0500 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Andrew Dalke wrote: > > >>>as I said, today's urljoin doesn't guarantee that the output is >>>the *shortest* possible way to represent the resulting URI. >> >>I didn't think anyone was making that claim. The module claims >>RFC 1808 compliance. From the docstring: >> >> DESCRIPTION >> See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, >> UC Irvine, June 1995. >> >>Now quoting from RFC 1808: >> >> 5.2. Abnormal Examples >> >> Although the following abnormal examples are unlikely to occur in >> normal practice, all URL parsers should be capable of resolving them >> consistently. > > >>My claim is that "consistent" implies "in the spirit of the rest of the RFC" >>and "to a human trying to make sense of the results" and not only >>mean "does the same thing each time."
Else >> >> >>>>>urljoin("http://blah.com/", "../../..") >> >>'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url' >> >>would be equally consistent. > > > perhaps, but such an urljoin wouldn't pass the > > minimize(base + relative) == minimize(urljoin(base, relative)) > > test that today's urljoin passes (where "minimize" is defined as "create > the shortest possible URI that identifies the same target, according to > the relevant RFC"). > > isn't the real issue in this subthread whether urljoin should be > expected to pass the > > minimize(base + relative) == urljoin(base, relative) > > test? > I should hope that *is* the issue, and I should further hope that the general wish would be for it to pass that test. Of course web systems have been riddled with canonicalization errors in the past, so it'd be best if you and/or Andrew could provide a minimize() implementation :-) regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From rasky at develer.com Mon Nov 6 21:01:19 2006 From: rasky at develer.com (Giovanni Bajo) Date: Mon, 6 Nov 2006 21:01:19 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa References: <454CB619.7010804@v.loewis.de><4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> <454F790D.6090508@v.loewis.de> Message-ID: <071a01c701de$533a0fb0$c003030a@trilan> Martin v. Löwis wrote: >> Why not only import *.pyc files and no longer use *.pyo files. >> >> It is simpler to have one compiled python file extension. >> PYC files can contain optimized python byte >> code. > > So what would you do with the -O option of the interpreter? I just had an idea: we could have only pyc files, and *no* way to identify whether specific "optimizations" (-O, -OO --only-strip-docstrings, whatever) were performed on them or not.
So, if you regularly run different python applications with different optimization settings, you'll end up with .pyc files containing bytecode that was generated with mixed optimization settings. It doesn't really matter in most cases, after all. Then, we add a single command line option (eg: "-I") which is: "ignore *every* .pyc file out there, and regenerate them as needed". So, the few times that you really care that a certain application is run with a specific setting, you can use "python -I -OO app.py". And that's all. -- Giovanni Bajo From brett at python.org Mon Nov 6 21:17:58 2006 From: brett at python.org (Brett Cannon) Date: Mon, 6 Nov 2006 12:17:58 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <071a01c701de$533a0fb0$c003030a@trilan> References: <454CB619.7010804@v.loewis.de> <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> <454F790D.6090508@v.loewis.de> <071a01c701de$533a0fb0$c003030a@trilan> Message-ID: On 11/6/06, Giovanni Bajo wrote: > > Martin v. Löwis wrote: > > >> Why not only import *.pyc files and no longer use *.pyo files. > >> > >> It is simpler to have one compiled python file extension. > >> PYC files can contain optimized python byte > >> code. > > > > So what would you do with the -O option of the interpreter? > > I just had an idea: we could have only pyc files, and *no* way to identify > whether specific "optimizations" (-O, -OO --only-strip-docstrings, > whatever) > were performed on them or not. So, if you regularly run different python > applications with different optimization settings, you'll end up with .pyc > files containing bytecode that was generated with mixed optimization > settings. It doesn't really matter in most cases, after all. I don't know about that. If you suspected that a failure could be because of some bytecode optimization you were trying wouldn't you like to be able to tell easily that fact?
Granted our situation is not as bad as gcc in terms of the impact of having to regenerate a compiled version, but it still would be nice to be able to make sure that every .pyc file is the same. We would need to make it easy to blast out every .pyc file found if we did allow mixing of optimizations (as you suggest below). Then, we add a single command line option (eg: "-I") which is: "ignore > *every* .pyc file out there, and regenerate them as needed". So, the few > times that you really care that a certain application is run with a > specific > setting, you can use "python -I -OO app.py". That might work. -Brett From tomerfiliba at gmail.com Mon Nov 6 22:55:11 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 6 Nov 2006 23:55:11 +0200 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> Message-ID: <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> cool. first time i build the entire interpreter, 'twas fun :) currently i "retained" support for __members__ and __methods__, so it doesn't break anything and is compatible with 2.6. i really hope it will be included in 2.6 as today i'm using ugly hacks in RPyC to make remote objects appear like local ones. having __dir__ solves all of my problems. besides, it makes a lot of sense to define __dir__ for classes that define __getattr__. i don't think it should be pushed back to py3k. here's the patch: http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 here's a demo: >>> class foo(object): ... def __dir__(self): ... return ["kan", "ga", "roo"] ...
>>> f = foo() >>> f <__main__.foo object at 0x00A90C78> >>> dir() ['__builtins__', '__doc__', '__name__', 'f', 'foo'] >>> dir(f) ['ga', 'kan', 'roo'] >>> dir(foo) ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__'] >>> class bar(object): ... __members__ = ["bow", "wow"] ... >>> b=bar() >>> dir(b) ['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__', '__members__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__', 'bow', 'wow'] -tomer On 11/6/06, Guido van Rossum wrote: > Sounds like a good plan, though I'm not sure if it's worth doing in > 2.6 -- I'd be happy with doing this just in 3k. > > I'm not sure what you mean by "adding a method slot" -- certainly it's > possible to define a method __foo__ and call it directly without > having a special tp_foo in the type object, and I recommend doing it > that way since the tp_foo slots are just there to make things fast; in > this case I don't see a need for dir() to be fast. > > --Guido > > On 11/6/06, tomer filiba wrote: > > so, if you remember, i suggested adding __dir__ to objects, so as to make > > dir() customizable, remove the deprecated __methods__ and __members__, > > and make it symmetrical to other built-in functions. > > > > you can see the original post here: > > http://mail.python.org/pipermail/python-dev/2006-July/067095.html > > which was generally accepted by the forum: > > http://mail.python.org/pipermail/python-dev/2006-July/067139.html > > > > so i went on, now that i have some spare time, to research the issue.
> > the current dir() works as follows: > > (*) builtin_dir calls PyObject_Dir to do the trick > > (*) if the object is NULL (dir with no argument), return the frame's locals > > (*) if the object is a *module*, we're just using it's __dict__ > > (*) if the object is a *type*, we're using it's __dict__ and __bases__, > > but not __class__ (so as not to show the metaclass) > > (*) otherwise, it's a "normal object", so we take it's __dict__, along with > > __methods__, __members__, and dir(__class__) > > (*) create a list of keys from the dict, sort, return > > > > we'll have to change that if we were to introduce __dir__. my design is: > > (*) builtin_dir, if called without an argument, returns the frame's locals > > (*) otherwise, it calls PyObject_Dir(self), which would dispatch self.__dir__() > > (*) if `self` doesn't have __dir__, default to object.__dir__(self) > > (*) the default object.__dir__ implementation would do the same as > > today: collect __dict__, __members__, __methods__, and dir(__class__). > > by py3k, we'll remove looking into __methods__ and __members__. > > (*) type objects and module objects would implement __dir__ to their > > liking (as PyObject_Dir does today) > > (*) builtin_dir would take care of sorting the list returned by PyObject_Dir > > > > so first i'd want you people to react on my design, maybe you'd find > > flaws whatever. also, should this become a PEP? > > > > and last, how do i add a new method slot? does it mean i need to > > change all type-object definitions throughout the codebase? > > do i add it to some protocol? or directly to the "object protocol"? 
> > > > > > -tomer > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > http://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From ncoghlan at gmail.com Mon Nov 6 23:57:07 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 07 Nov 2006 08:57:07 +1000 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> Message-ID: <454FBDC3.3060100@gmail.com> tomer filiba wrote: > cool. first time i build the entire interpreter, 'twas fun :) > currently i "retained" support for __members__ and __methods__, > so it doesn't break anything and is compatible with 2.6. > > i really hope it will be included in 2.6 as today i'm using ugly hacks > in RPyC to make remote objects appear like local ones. > having __dir__ solves all of my problems. > > besides, it makes a lot of sense of define __dir__ for classes that > define __getattr__. i don't think it should be pushed back to py3k. > > here's the patch: > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 As I noted on the tracker, PyObject_Dir is a public C API function, so it's behaviour needs to be preserved as well as the behaviour of calling dir() from Python code. So the final form of the patch will likely need to include stronger tests for that section of the API, as well as updating the documentation in various places (the dir and PyObject_Dir documentation, obviously, but also the list of magic methods in the language reference). +1 on targeting 2.6, too. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Tue Nov 7 00:20:00 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Nov 2006 12:20:00 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454EDAEA.7050501@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> Message-ID: <454FC320.9050604@canterbury.ac.nz> Martin v. Löwis wrote: > A stat call will not only look at the directory entry, but also > look at the inode. This will require another disk access, as the > inode is at a different location of the disk. That should be in favour of the directory-reading approach, since e.g. to find out which if any of x.py/x.pyc/x.pyo exists, you only need to look for the names. > It depends on the file system you are using. An NTFS directory > lookup is a B-Tree search; ... Yes, I know that some file systems are smarter; MacOS HFS is another one that uses b-trees. However it still seems to me that looking up a path in a file system is a much heavier operation than looking up a Python dict, even if everything is in memory. You have to parse the path, and look up each component separately in a different directory tree or whatever. The way I envisage it, you would read all the directories and build a single dictionary mapping fully-qualified module names to pathnames. Any given import then takes at most one dict lookup and one access of a known-to-exist file. > For > a large directory, the cost of reading in the entire directory > might be higher than the savings gained from not having to > search it. Possibly. I guess we'd need some timings to assess the meaning of "large".
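[Editor's sketch: Greg's single-dictionary scheme might look roughly like this in pure Python. A toy model with invented names; a real importer would also need to handle packages, extension modules, path precedence within a directory, and explicit cache invalidation:]

```python
import os
import sys
import tempfile

def build_module_cache(directories):
    # Read each directory once, mapping module name -> file path.
    # Earlier entries win, matching normal sys.path precedence.
    cache = {}
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        try:
            entries = sorted(os.listdir(directory))
        except OSError:
            continue
        for name in entries:
            base, ext = os.path.splitext(name)
            if ext in ('.py', '.pyc', '.pyo') and base not in cache:
                cache[base] = os.path.join(directory, name)
    return cache

# An import then costs one dict lookup plus one open of a file that is
# known to exist; the cache must be rebuilt explicitly when files change.
demo = tempfile.mkdtemp()
open(os.path.join(demo, 'spam.py'), 'w').close()
cache = build_module_cache([demo] + sys.path)
print(cache['spam'])  # path to spam.py in the temporary directory
```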
> Also, if we do our own directory caching, the question > is when to invalidate the cache. I think I'd be happy with having to do that explicitly. I expect the vast majority of Python programs don't need to track changes to the set of importable modules during execution. The exceptions would be things like IDEs, and they could do a cache flush before reloading a module, etc. -- Greg From exarkun at divmod.com Tue Nov 7 00:33:36 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 6 Nov 2006 18:33:36 -0500 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454FC320.9050604@canterbury.ac.nz> Message-ID: <20061106233336.20948.1525550504.divmod.quotient.16145@ohm> On Tue, 07 Nov 2006 12:20:00 +1300, Greg Ewing wrote: > >I think I'd be happy with having to do that explicitly. >I expect the vast majority of Python programs don't >need to track changes to the set of importable modules >during execution. The exceptions would be things like >IDEs, and they could do a cache flush before reloading >a module, etc. Another questionable optimization which changes application-level semantics. No, please? Jean-Paul From martin at v.loewis.de Tue Nov 7 00:38:57 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Nov 2006 00:38:57 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454FC320.9050604@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> Message-ID: <454FC791.7080106@v.loewis.de> Greg Ewing schrieb: > I think I'd be happy with having to do that explicitly. > I expect the vast majority of Python programs don't > need to track changes to the set of importable modules > during execution.
The exceptions would be things like > IDEs, and they could do a cache flush before reloading > a module, etc. That would be a change in behavior, of course. Currently, you can put a file on disk and import it immediately; that will stop working. I'm pretty sure that there are a number of applications that rely on this specific detail of the current implementation (and not only IDEs). It still might be worthwhile to make such a change, but I'd like to see practical advantages demonstrated first. Regards, Martin From tdelaney at avaya.com Tue Nov 7 00:53:26 2006 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Tue, 7 Nov 2006 10:53:26 +1100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1EB56@au3010avexu1.global.avaya.com> "Martin v. Löwis" wrote: > Greg Ewing schrieb: >> I think I'd be happy with having to do that explicitly. >> I expect the vast majority of Python programs don't >> need to track changes to the set of importable modules >> during execution. The exceptions would be things like >> IDEs, and they could do a cache flush before reloading >> a module, etc. > > That would be a change in behavior, of course. > > Currently, you can put a file on disk and import it > immediately; that will stop working. I'm pretty sure > that there are a number of applications that rely > on this specific detail of the current implementation > (and not only IDEs). Would it be reasonable to always do a stat() on the directory, reloading if there's been a change? Would this be reliable across platforms? Tim Delaney From hg211 at hszk.bme.hu Tue Nov 7 03:11:31 2006 From: hg211 at hszk.bme.hu (Herman Geza) Date: Tue, 7 Nov 2006 03:11:31 +0100 (MET) Subject: [Python-Dev] valgrind Message-ID: Hi! I've embedded python into my application. Using valgrind I got a lot of errors. I understand that "Conditional jump or move depends on uninitialised value(s)" errors are completely ok (from Misc/README.valgrind).
However, I don't understand why "Invalid read"'s are legal, like this: ==21737== Invalid read of size 4 ==21737== at 0x408DDDF: PyObject_Free (in /usr/lib/libpython2.4.so.1.0) ==21737== by 0x4096F67: (within /usr/lib/libpython2.4.so.1.0) ==21737== by 0x408A5AC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0) ==21737== by 0x40C65F8: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0) ==21737== Address 0xC02E010 is 32 bytes inside a block of size 40 free'd ==21737== at 0x401D139: free (vg_replace_malloc.c:233) ==21737== by 0x408DE00: PyObject_Free (in /usr/lib/libpython2.4.so.1.0) ==21737== by 0x407BB4D: (within /usr/lib/libpython2.4.so.1.0) ==21737== by 0x407A3D6: (within /usr/lib/libpython2.4.so.1.0) Here python reads from an already-freed memory area, right? (I don't think that Misc/README.valgrind answers this question). Or is it a false alarm? Thanks, Geza Herman From martin at v.loewis.de Tue Nov 7 07:19:24 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Nov 2006 07:19:24 +0100 Subject: [Python-Dev] valgrind In-Reply-To: References: Message-ID: <4550256C.1020109@v.loewis.de> Herman Geza schrieb: > Here python reads from an already-freed memory area, right? It looks like it, yes. Of course, it could be a flaw in valgrind, too. To find out, one would have to understand what the memory block is, and what part of PyObject_Free accesses it. Regards, Martin From nnorwitz at gmail.com Tue Nov 7 08:02:22 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 6 Nov 2006 23:02:22 -0800 Subject: [Python-Dev] valgrind In-Reply-To: <4550256C.1020109@v.loewis.de> References: <4550256C.1020109@v.loewis.de> Message-ID: On 11/6/06, "Martin v. Löwis" wrote: > Herman Geza schrieb: > > Here python reads from an already-freed memory area, right? > > It looks like it, yes. Of course, it could be a flaw in valgrind, too. > To find out, one would have to understand what the memory block is, > and what part of PyObject_Free accesses it.
I'm a bit confused. I ran with valgrind ./python -c pass which returns 23 invalid read problems (some are the same chunk of memory). This is with 2.5 (more or less). Valgrind 3.2.1 on amd64. Every address ended with 0x5...020. That seems odd. I looked through the valgrind bug reports and didn't see anything. The first problem reported was: Invalid read of size 4 at 0x44FA06: Py_ADDRESS_IN_RANGE (obmalloc.c:1741) by 0x44E225: PyObject_Free (obmalloc.c:920) by 0x44EB90: _PyObject_DebugFree (obmalloc.c:1361) by 0x444A28: dictresize (dictobject.c:546) by 0x444D5B: PyDict_SetItem (dictobject.c:655) by 0x462533: PyString_InternInPlace (stringobject.c:4920) by 0x448450: PyDict_SetItemString (dictobject.c:2120) by 0x4C240A: PyModule_AddObject (modsupport.c:615) by 0x428B00: _PyExc_Init (exceptions.c:2117) by 0x4C449A: Py_InitializeEx (pythonrun.c:225) by 0x4C4827: Py_Initialize (pythonrun.c:315) by 0x41270A: Py_Main (main.c:449) Address 0x52AE020 is 4,392 bytes inside a block of size 5,544 free'd at 0x4A1A828: free (vg_replace_malloc.c:233) by 0x5071635: qsort (in /lib/libc-2.3.5.so) by 0x474E4B: init_slotdefs (typeobject.c:5368) by 0x47522E: add_operators (typeobject.c:5511) by 0x46E3A1: PyType_Ready (typeobject.c:3209) by 0x46E2D4: PyType_Ready (typeobject.c:3173) by 0x44D13E: _Py_ReadyTypes (object.c:1864) by 0x4C4362: Py_InitializeEx (pythonrun.c:183) by 0x4C4827: Py_Initialize (pythonrun.c:315) by 0x41270A: Py_Main (main.c:449) by 0x411CD2: main (python.c:23) Note that the free is inside qsort. The memory freed under qsort should definitely not be the bases which we allocated under PyType_Ready. I'll file a bug report with valgrind to help determine if this is a problem in Python or valgrind. http://bugs.kde.org/show_bug.cgi?id=136989 One other thing that is weird is that the complaint is about 4 bytes which should not be possible. All pointers should be 8 bytes AFAIK since this is amd64. I also ran this on x86. 
There were 32 errors and all of their addresses were 0x4...010.

n

From tim.peters at gmail.com  Tue Nov  7 08:20:14 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 7 Nov 2006 02:20:14 -0500
Subject: [Python-Dev] valgrind
In-Reply-To: <4550256C.1020109@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
Message-ID: <1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>

[Herman Geza]
>> Here python reads from an already-freed memory area, right?

[Martin v. Löwis]
> It looks like it, yes. Of course, it could be a flaw in valgrind, too.
> To find out, one would have to understand what the memory block is,
> and what part of PyObject_Free accesses it.

When PyObject_Free is handed an address it doesn't control, the "arena
base address" it derives from that address may point at anything the
system malloc controls, including uninitialized memory, memory the
system malloc has allocated to something, memory the system malloc has
freed, or internal system malloc bookkeeping bytes.  The
Py_ADDRESS_IN_RANGE macro has no way to know before reading it up.

So figure out which line of code valgrind is complaining about (doesn't
valgrind usually produce that?).  If it's coming from the expansion of
Py_ADDRESS_IN_RANGE, it's not worth more thought.

From guido at python.org  Tue Nov  7 09:00:14 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Nov 2006 00:00:14 -0800
Subject: [Python-Dev] __dir__, part 2
In-Reply-To: <454FBDC3.3060100@gmail.com>
References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
	<1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com>
	<454FBDC3.3060100@gmail.com>
Message-ID: 

No objection on targeting 2.6 if other developers agree.  Seems this is
well under way.  Good work!

On 11/6/06, Nick Coghlan wrote:
> tomer filiba wrote:
> > cool.
first time i build the entire interpreter, 'twas fun :) > > currently i "retained" support for __members__ and __methods__, > > so it doesn't break anything and is compatible with 2.6. > > > > i really hope it will be included in 2.6 as today i'm using ugly hacks > > in RPyC to make remote objects appear like local ones. > > having __dir__ solves all of my problems. > > > > besides, it makes a lot of sense of define __dir__ for classes that > > define __getattr__. i don't think it should be pushed back to py3k. > > > > here's the patch: > > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 > > As I noted on the tracker, PyObject_Dir is a public C API function, so it's > behaviour needs to be preserved as well as the behaviour of calling dir() from > Python code. > > So the final form of the patch will likely need to include stronger tests for > that section of the API, as well as updating the documentation in various > places (the dir and PyObject_Dir documentation, obviously, but also the list > of magic methods in the language reference). > > +1 on targeting 2.6, too. > > Cheers, > Nick. 
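To make the use case concrete: a proxy that serves attributes out of __getattr__ is invisible to dir() unless it also implements __dir__, which is exactly what the patch enables. A minimal sketch of the pattern tomer describes — the class and its attribute list are hypothetical, not RPyC's actual API:

```python
class RemoteProxy:
    """Toy stand-in for an RPyC-style proxy whose attributes live remotely."""

    def __init__(self, remote_names):
        self._remote_names = list(remote_names)

    def __getattr__(self, name):
        # Called only when normal lookup fails; pretend to fetch remotely.
        if name in self._remote_names:
            return "<remote value of %s>" % name
        raise AttributeError(name)

    def __dir__(self):
        # Advertise the remote attributes so dir() and tab-completion see
        # them, instead of the empty picture __getattr__ alone would give.
        return sorted(set(self._remote_names) | set(type(self).__dict__))


p = RemoteProxy(["ping", "pong"])
assert "ping" in dir(p) and p.ping == "<remote value of ping>"
```

Without the __dir__ method, dir(p) would show only the class machinery; with it, a remote object "appears like a local one" to introspection tools.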
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ronaldoussoren at mac.com Tue Nov 7 09:57:31 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 7 Nov 2006 09:57:31 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454FC320.9050604@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> Message-ID: <85B44252-6E88-434C-A126-745018C66B25@mac.com> On 7Nov 2006, at 12:20 AM, Greg Ewing wrote: > >> Also, if we do our own directory caching, the question >> is when to invalidate the cache. > > I think I'd be happy with having to do that explicitly. > I expect the vast majority of Python programs don't > need to track changes to the set of importable modules > during execution. The exceptions would be things like > IDEs, and they could do a cache flush before reloading > a module, etc. Not only IDE's, also the interactive prompt. It is very convenient that you can currently install an additional module when an import fails and then try the import again (at the python prompt). Ronald -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3562 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20061107/9dd3183f/attachment.bin

From anthony at interlink.com.au  Tue Nov  7 14:19:46 2006
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed, 8 Nov 2006 00:19:46 +1100
Subject: [Python-Dev] __dir__, part 2
In-Reply-To: 
References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
	<454FBDC3.3060100@gmail.com>
Message-ID: <200611080019.47931.anthony@interlink.com.au>

On Tuesday 07 November 2006 19:00, Guido van Rossum wrote:
> No objection on targeting 2.6 if other developers agree.  Seems this
> is well under way.  Good work!

Sounds fine to me! Less magic under the hood is less magic, and that's
always a good thing. The use case for it seems completely appropriate,
too.

Anthony
-- 
Anthony Baxter
It's never too late to have a happy childhood.

From kristjan at ccpgames.com  Tue Nov  7 15:05:14 2006
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_V=2E_J=F3nsson?=)
Date: Tue, 7 Nov 2006 14:05:14 -0000
Subject: [Python-Dev] valgrind
Message-ID: <129CEF95A523704B9D46959C922A2800047D98E3@nemesis.central.ccp.cc>

You want to disable the obmalloc module when using valgrind, as I have
when using Rational Purify.  obmalloc does some evil stuff to recognize
its memory.  You also want to disable it so that you get verification
on a per-block level.

Actually, obmalloc could be improved in this aspect.  Similar code that
I once wrote computed the block base address, but then looked in its
tables to see if it was actually a known block before accessing it.
That way you can have blocks that are larger than the virtual memory
block of the process.

K

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com at python.org
> [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org]
> On Behalf Of Herman Geza
> Sent: 7. nóvember 2006 02:12
> To: python-dev at python.org
> Subject: [Python-Dev] valgrind
>
> Hi!
>
> I've embedded python into my application. Using valgrind I
> got a lot of errors. I understand that "Conditional jump or
> move depends on uninitialised value(s)" errors are completely
> ok (from Misc/README.valgrind). However, I don't understand
> why "Invalid read"'s are legal, like this:
>
> ==21737== Invalid read of size 4
> ==21737==    at 0x408DDDF: PyObject_Free (in /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x4096F67: (within /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x408A5AC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x40C65F8: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==21737==  Address 0xC02E010 is 32 bytes inside a block of size 40 free'd
> ==21737==    at 0x401D139: free (vg_replace_malloc.c:233)
> ==21737==    by 0x408DE00: PyObject_Free (in /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x407BB4D: (within /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x407A3D6: (within /usr/lib/libpython2.4.so.1.0)
>
> Here python reads from an already-freed memory area, right?
> (I don't think that Misc/README.valgrind answers this
> question). Or is it a false alarm?
>
> Thanks,
> Geza Herman
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/kristjan%40ccpgames.com
>

From martin at v.loewis.de  Tue Nov  7 15:31:18 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Nov 2006 15:31:18 +0100
Subject: [Python-Dev] valgrind
In-Reply-To: 
References: <4550256C.1020109@v.loewis.de>
Message-ID: <455098B6.3020903@v.loewis.de>

Neal Norwitz schrieb:
>    at 0x44FA06: Py_ADDRESS_IN_RANGE (obmalloc.c:1741)
>
> Note that the free is inside qsort.
The memory freed under qsort
> should definitely not be the bases which we allocated under
> PyType_Ready.  I'll file a bug report with valgrind to help determine
> if this is a problem in Python or valgrind.
> http://bugs.kde.org/show_bug.cgi?id=136989

As Tim explains, a read from Py_ADDRESS_IN_RANGE is fine, and by design.
If p is the pointer, we do

  pool = ((poolp)((Py_uintptr_t)(p) & ~(Py_uintptr_t)((4 * 1024) - 1)));

i.e. round down p to the start of the page, to obtain "pool". Then we do

  if (((pool)->arenaindex < maxarenas &&
       (Py_uintptr_t)(p) - arenas[(pool)->arenaindex].address < (Py_uintptr_t)(256 << 10) &&
       arenas[(pool)->arenaindex].address != 0))

i.e. access pool->arenaindex. If this is our own memory, we really find
a valid arena index there. If this is malloc'ed memory, we read garbage -
due to the page size, we are guaranteed to read successfully, still. To
determine whether it's garbage, we look it up in the arenas array.

> One other thing that is weird is that the complaint is about 4 bytes
> which should not be possible.  All pointers should be 8 bytes AFAIK
> since this is amd64.

That's because the arenaindex is an unsigned int. We could widen it to
size_t; if we don't, PyMalloc can "only" manage 1 PiB (with an arena
being 256kiB, and 4Gi arena indices being available).

> I also ran this on x86.  There were 32 errors and all of their
> addresses were 0x4...010.

That's because we round down to the beginning of the page.
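The two steps above — mask the pointer down to a (presumed) page boundary, then validate the arena index stored there against the arenas table — can be modelled with plain integer arithmetic. This is an illustrative sketch of the obmalloc check, not the real C code; the addresses and the arenas list are invented:

```python
POOL_SIZE = 4 * 1024       # SYSTEM_PAGE_SIZE assumed by obmalloc
ARENA_SIZE = 256 * 1024    # each arena is 256 KiB

def pool_base(addr):
    # pool = (poolp)((uptr)p & ~(uptr)(POOL_SIZE - 1)): round down to page.
    return addr & ~(POOL_SIZE - 1)

def address_in_range(addr, arenaindex_at_pool, arenas):
    # Model of Py_ADDRESS_IN_RANGE: the index read from the pool header is
    # trusted only if the arenas table confirms addr lies inside that arena.
    if arenaindex_at_pool >= len(arenas):
        return False
    base = arenas[arenaindex_at_pool]
    return base != 0 and 0 <= addr - base < ARENA_SIZE


arenas = [0x500000]                     # one fictitious arena base address
p = 0x512345                            # a pointer inside that arena
assert pool_base(p) == 0x512000
assert address_in_range(p, 0, arenas)
assert not address_in_range(0x900000, 0, arenas)
```

The point of the design is visible in the sketch: reading the index is always safe (same page as a live allocation), and whether it is garbage is decided afterwards by the table lookup.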
Regards,
Martin

From hg211 at hszk.bme.hu  Tue Nov  7 15:54:54 2006
From: hg211 at hszk.bme.hu (Herman Geza)
Date: Tue, 7 Nov 2006 15:54:54 +0100 (MET)
Subject: [Python-Dev] valgrind
In-Reply-To: <1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
Message-ID: 

On Tue, 7 Nov 2006, Tim Peters wrote:
> When PyObject_Free is handed an address it doesn't control, the "arena
> base address" it derives from that address may point at anything the
> system malloc controls, including uninitialized memory, memory the
> system malloc has allocated to something, memory the system malloc has
> freed, or internal system malloc bookkeeping bytes.  The
> Py_ADDRESS_IN_RANGE macro has no way to know before reading it up.
>
> So figure out which line of code valgrind is complaining about
> (doesn't valgrind usually produce that?).  If it's coming from the
> expansion of Py_ADDRESS_IN_RANGE, it's not worth more thought.

Hmm. I don't think that way. What if free() does other things? For
example, if free(addr) sees that the memory block at addr is the last
block, then it may call brk with a decreased end_data_segment. Or the
last block in an mmap'd area - it calls munmap. So when
Py_ADDRESS_IN_RANGE tries to read from this freed memory block it gets
SIGSEGV. However, I've never got SIGSEGV from python.

I don't really think that reading from an already-freed block is ever
legal. I asked my original question because I saw that I'm not the only
one who sees "Illegal reads" from python. Is valgrind wrong in this
case? I just want to be sure that I'll never get SIGSEGV from python.

Note that Misc/valgrind-python.supp contains suppressions for "Invalid
read"'s at Py_ADDRESS_IN_RANGE.
Geza Herman From tomerfiliba at gmail.com Tue Nov 7 16:41:53 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 7 Nov 2006 17:41:53 +0200 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <454FBDC3.3060100@gmail.com> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: <1d85506f0611070741u5aeb5507u4277f80fa325821d@mail.gmail.com> okay, everything's fixed. i updated the patch and added a small test to: Lib/test/test_builtins.py::test_dir -tomer On 11/7/06, Nick Coghlan wrote: > tomer filiba wrote: > > cool. first time i build the entire interpreter, 'twas fun :) > > currently i "retained" support for __members__ and __methods__, > > so it doesn't break anything and is compatible with 2.6. > > > > i really hope it will be included in 2.6 as today i'm using ugly hacks > > in RPyC to make remote objects appear like local ones. > > having __dir__ solves all of my problems. > > > > besides, it makes a lot of sense of define __dir__ for classes that > > define __getattr__. i don't think it should be pushed back to py3k. > > > > here's the patch: > > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 > > As I noted on the tracker, PyObject_Dir is a public C API function, so it's > behaviour needs to be preserved as well as the behaviour of calling dir() from > Python code. > > So the final form of the patch will likely need to include stronger tests for > that section of the API, as well as updating the documentation in various > places (the dir and PyObject_Dir documentation, obviously, but also the list > of magic methods in the language reference). > > +1 on targeting 2.6, too. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > From tomerfiliba at gmail.com Tue Nov 7 16:43:45 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 7 Nov 2006 17:43:45 +0200 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <1d85506f0611070741u5aeb5507u4277f80fa325821d@mail.gmail.com> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> <1d85506f0611070741u5aeb5507u4277f80fa325821d@mail.gmail.com> Message-ID: <1d85506f0611070743j2d400f14hcb3d7802172f93d@mail.gmail.com> > as well as updating the documentation in various > places (the dir and PyObject_Dir documentation, obviously, but also the list > of magic methods in the language reference). oops, i meant everything except that -tomer On 11/7/06, tomer filiba wrote: > okay, everything's fixed. > i updated the patch and added a small test to: > Lib/test/test_builtins.py::test_dir > > > -tomer > > On 11/7/06, Nick Coghlan wrote: > > tomer filiba wrote: > > > cool. first time i build the entire interpreter, 'twas fun :) > > > currently i "retained" support for __members__ and __methods__, > > > so it doesn't break anything and is compatible with 2.6. > > > > > > i really hope it will be included in 2.6 as today i'm using ugly hacks > > > in RPyC to make remote objects appear like local ones. > > > having __dir__ solves all of my problems. > > > > > > besides, it makes a lot of sense of define __dir__ for classes that > > > define __getattr__. i don't think it should be pushed back to py3k. 
> > >
> > > here's the patch:
> > > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470
> >
> > As I noted on the tracker, PyObject_Dir is a public C API function, so its
> > behaviour needs to be preserved as well as the behaviour of calling dir() from
> > Python code.
> >
> > So the final form of the patch will likely need to include stronger tests for
> > that section of the API, as well as updating the documentation in various
> > places (the dir and PyObject_Dir documentation, obviously, but also the list
> > of magic methods in the language reference).
> >
> > +1 on targeting 2.6, too.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> > ---------------------------------------------------------------
> > http://www.boredomandlaziness.org
> >

From ronaldoussoren at mac.com  Tue Nov  7 16:45:37 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 7 Nov 2006 16:45:37 +0100
Subject: [Python-Dev] Inconvenient filename in sandbox/decimal-c/new_dt
Message-ID: <74DDCAB6-7DB0-4943-AB0D-532F3E36FBE6@mac.com>

Hi,

I'm having problems with updating the sandbox.

ilithien:~/Python/sandbox-trunk ronald$ svn cleanup
ilithien:~/Python/sandbox-trunk ronald$ svn up
A    import_in_py/mock_importer.py
U    import_in_py/test_importer.py
U    import_in_py/importer.py
svn: Failed to add file 'decimal-c/new_dt/rounding.decTest': object of the same name already exists

This is on a 10.4.8 box with a recent version of subversion. It turns
out this is caused by a testcase file: decimal-c/new_dt contains both
remainderNear.decTest and remaindernear.decTest (the filenames differ
by case only). Is this intentional? This makes it impossible to do a
checkout on a system with a case insensitive filesystem.

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 3562 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061107/5ab7d1c8/attachment.bin From martin at v.loewis.de Tue Nov 7 17:33:44 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Nov 2006 17:33:44 +0100 Subject: [Python-Dev] valgrind In-Reply-To: References: <4550256C.1020109@v.loewis.de> <1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com> Message-ID: <4550B568.6000805@v.loewis.de> Herman Geza schrieb: >> So figure out which line of code valgrind is complaining about >> (doesn't valgrind usually produce that?). If it's coming from the >> expansion of Py_ADDRESS_IN_RANGE, it's not worth more thought. > > Hmm. I don't think that way. What if free() does other things? It can't, as the hardware won't support it. > For example > if free(addr) sees that the memory block at addr is the last block then it > may call brk with a decreased end_data_segment. It can't. In brk, you can only manage memory in chunks of "one page" (i.e. 4kiB on x86). Since we only access memory on the same page, access is guaranteed to succeed. > Or the last block > in an mmap'd area - it calls unmap. So when Py_ADDRESS_IN_RANGE tries > to read from this freed memory block it gets SIGSEGV. However, I've never > got SIGSEGV from python. Likewise. This is guaranteed to work, by the processor manufacturers. > I don't really think that reading from an already-freed block is ever > legal. Define "legal". There is no law against it; you don't go to jail for doing it. What other penalties would you expect (other than valgrind spitting out error messages, and users complaining from time to time that it's "illegal")? > I asked my original question because I saw that I'm not the only > one who sees "Illegal reads" from python. Is valgrind wrong in this case? If it is this case, then no, valgrind is right. 
Notice that valgrind doesn't call them "illegal"; it calls them "invalid".

> I just want to be sure that I'll never get SIGSEGV from python.

You at least won't get SIGSEGVs from that part of the code.

> Note that Misc/valgrind-python.supp contains suppressions for "Invalid read"'s
> at Py_ADDRESS_IN_RANGE.

Right. This is to tell valgrind that these reads are known to work
as designed.

Regards,
Martin

From martin at v.loewis.de  Tue Nov  7 17:42:23 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Nov 2006 17:42:23 +0100
Subject: [Python-Dev] Inconvenient filename in sandbox/decimal-c/new_dt
In-Reply-To: <74DDCAB6-7DB0-4943-AB0D-532F3E36FBE6@mac.com>
References: <74DDCAB6-7DB0-4943-AB0D-532F3E36FBE6@mac.com>
Message-ID: <4550B76F.1090800@v.loewis.de>

Ronald Oussoren schrieb:
> This is on a 10.4.8 box with a recent version of subversion. It turns
> out this is caused by a testcase file: decimal-c/new_dt contains both
> remainderNear.decTest and remaindernear.decTest (the filenames differ by
> case only). Is this intentional?

I don't think so. The files differed only in the version: field, and
remainderNear.decTest is the same as the Python trunk, so I removed
remaindernear.decTest as bogus.

Regards,
Martin

From hg211 at hszk.bme.hu  Tue Nov  7 18:09:46 2006
From: hg211 at hszk.bme.hu (Herman Geza)
Date: Tue, 7 Nov 2006 18:09:46 +0100 (MET)
Subject: [Python-Dev] valgrind
In-Reply-To: <4550B568.6000805@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
Message-ID: 

> > For example
> > if free(addr) sees that the memory block at addr is the last block then it
> > may call brk with a decreased end_data_segment.
>
> It can't. In brk, you can only manage memory in chunks of "one page"
> (i.e. 4kiB on x86). Since we only access memory on the same page,
> access is guaranteed to succeed.

Yes, I'm aware of it. But logically, it is possible, isn't it?
At malloc(), libc recognizes that brk is needed, it calls sbrk(4096).
Suppose that python releases this very same block immediately. At free(),
libc recognizes that sbrk(-4096) could be executed, so the freed block is
not available anymore (even for reading)

> > Or the last block
> > in an mmap'd area - it calls munmap. So when Py_ADDRESS_IN_RANGE tries
> > to read from this freed memory block it gets SIGSEGV. However, I've never
> > got SIGSEGV from python.
>
> Likewise. This is guaranteed to work, by the processor manufacturers.

The same: if the freed block is the last one in the mmap'd area, libc may
unmap it, doesn't it?

> > I don't really think that reading from an already-freed block is ever
> > legal.
>
> Define "legal". There is no law against it; you don't go to jail for
> doing it. What other penalties would you expect (other than valgrind
> spitting out error messages, and users complaining from time to time
> that it's "illegal")?

Ok, sorry about the strong word "legal".

> > I asked my original question because I saw that I'm not the only
> > one who sees "Illegal reads" from python. Is valgrind wrong in this case?
>
> If it is this case, then no, valgrind is right. Notice that valgrind
> doesn't call them "illegal"; it calls them "invalid".

> > I just want to be sure that I'll never get SIGSEGV from python.
>
> You at least won't get SIGSEGVs from that part of the code.

That's what I still don't understand. If valgrind is right then how can
python be sure that it can still reach a freed block?

> > Note that Misc/valgrind-python.supp contains suppressions for "Invalid read"'s
> > at Py_ADDRESS_IN_RANGE.
>
> Right. This is to tell valgrind that these reads are known to work
> as designed.

Does this mean that python strongly depends on libc? If I want to port
python to another platform which uses a totally different malloc, is
Py_ADDRESS_IN_RANGE guaranteed to work or do I have to make some changes?
(actually I'm porting python to another platform; that's why I'm asking
these questions, not because I'm finical or something)

Thanks,
Geza Herman

From martin at v.loewis.de  Tue Nov  7 18:50:04 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Nov 2006 18:50:04 +0100
Subject: [Python-Dev] valgrind
In-Reply-To: 
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
Message-ID: <4550C74C.1060402@v.loewis.de>

Herman Geza schrieb:
>> It can't. In brk, you can only manage memory in chunks of "one page"
>> (i.e. 4kiB on x86). Since we only access memory on the same page,
>> access is guaranteed to succeed.
> Yes, I'm aware of it. But logically, it is possible, isn't it?

No, it isn't.

> At malloc(), libc recognizes that brk is needed, it calls sbrk(4096).
> Suppose that python releases this very same block immediately. At free(),
> libc recognizes that sbrk(-4096) could be executed, so the freed block is
> not available anymore (even for reading)

That can't happen for a different reason. When this access occurs, we
still have a pointer to allocated memory (either allocated through
malloc, or obmalloc - we don't know at the pointer where the access is
made). The access is "invalid" only if the memory was allocated through
malloc. So when the access is made, we have a pointer p, which is
allocated through malloc, and access p-3000 (say, assuming that p-3000
is a page boundary). Since p is still allocated, libc *cannot* have made
sbrk(p-3000), since that would have released the still-allocated block.

>>> Or the last block
>>> in an mmap'd area - it calls munmap. So when Py_ADDRESS_IN_RANGE tries
>>> to read from this freed memory block it gets SIGSEGV. However, I've never
>>> got SIGSEGV from python.
>> Likewise. This is guaranteed to work, by the processor manufacturers.
> The same: if the freed block is the last one in the mmap'd area, libc may > unmap it, doesn't it? But it isn't. We still have an allocated block of memory on the same page. The C library can't have released it. >>> I just want to be sure that I'll never get SIGSEGV from python. >> You least won't get SIGSEGVs from that part of the code. > That's what I still don't understand. If valgrind is right then how can > python be sure that it can still reach a freed block? valgrind knows the block is released. We know that the block is still "mapped" to memory by the operating system. These are different properties. To write to memory, you better have allocated it. To read from memory, it ought to be mapped (in most applications, it is also an error to read from released memory, even if the read operation succeeds; valgrind reports this error as "invalid read"). >>> Note that Misc/valgrind-python.supp contains suppressions "Invalid read"'s >>> at Py_ADDRESS_IN_RANGE. >> Right. This is to tell valgrind that these reads are known to work >> as designed. > Does this mean that python strongly depends on libc? No. It strongly depends on a lower estimate of the page size, and that memory is mapped on page boundaries. > If I want to port > python to another platform which uses a totally different malloc, is > Py_ADDRESS_IN_RANGE guaranteed to work or do I have to make some changes? It's rather unimportant how malloc is implemented. The real question is whether you have a flat address space (Python likely won't work at all if you don't have a flat address space), and whether the system either doesn't have virtual memory, or, if it does, whether obmalloc's guess of the page size is either right or an underestimation. If some constraints fail, you can't use obmalloc (you could still port Python, to not use obmalloc). Notice that on a system with limited memory, you probably don't want to use obmalloc, even if it worked. 
obmalloc uses arenas of 256kiB, which might be expensive on the target
system.

Out of curiosity: what is your target system?

Regards,
Martin

From hg211 at hszk.bme.hu  Tue Nov  7 19:56:31 2006
From: hg211 at hszk.bme.hu (Herman Geza)
Date: Tue, 7 Nov 2006 19:56:31 +0100 (MET)
Subject: [Python-Dev] valgrind
In-Reply-To: <4550C74C.1060402@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
	<4550C74C.1060402@v.loewis.de>
Message-ID: 

Thanks Martin, now everything is clear. Python always reads from the
page where the about-to-be-freed block is located (that was the
information that I missed) - as such, it never causes a SIGSEGV.

Geza Herman

From tim.peters at gmail.com  Tue Nov  7 21:34:45 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 7 Nov 2006 15:34:45 -0500
Subject: [Python-Dev] valgrind
In-Reply-To: <129CEF95A523704B9D46959C922A2800047D98E3@nemesis.central.ccp.cc>
References: <129CEF95A523704B9D46959C922A2800047D98E3@nemesis.central.ccp.cc>
Message-ID: <1f7befae0611071234q468cd8ccn46290ebadc26ae16@mail.gmail.com>

[Kristján V. Jónsson]
> ...
> Actually, obmalloc could be improved in this aspect.  Similar code that I once wrote
> computed the block base address, but then looked in its tables to see if it was
> actually a known block before accessing it.

Several such schemes were tried (based on, e.g., binary search and
splay trees), but discarded due to measurable sloth.  The overwhelming
advantage of the current scheme is that it does the check in constant
time, independent of how many distinct arenas (whether one or thousands
makes no difference) pymalloc is managing.

> That way you can have blocks that are larger than the virtual memory block
> of the process.

If you have a way to do the check in constant time, that would be good.
Otherwise speed rules here.
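To put the trade-off above in code: the table scheme Kristján describes needs a search over the known arenas, while reading the index stored in the pool header is one array access regardless of how many arenas exist. A toy model of both checks — the arena addresses are made up, and this is not the actual obmalloc code:

```python
import bisect

ARENA_SIZE = 256 * 1024  # 256 KiB per arena, as in obmalloc

def owned_by_table(addr, sorted_bases):
    # Table-lookup scheme: binary-search sorted arena base addresses.
    # Cost grows as O(log n) with the number of arenas.
    i = bisect.bisect_right(sorted_bases, addr) - 1
    return i >= 0 and addr - sorted_bases[i] < ARENA_SIZE

def owned_by_index(addr, idx, arenas):
    # obmalloc-style scheme: 'idx' is read from the pool header; one
    # bounds check and one array access, constant time for any arena count.
    return idx < len(arenas) and 0 <= addr - arenas[idx] < ARENA_SIZE


bases = sorted(0x100000 + k * 0x100000 for k in range(1000))
assert owned_by_table(0x100010, bases)       # inside the first arena
assert not owned_by_table(0x150000, bases)   # in the gap between arenas
assert owned_by_index(0x100010, 0, bases)
```

The table scheme's compensating advantage, as Kristján notes, is that it never reads memory it doesn't own, so tools like Purify and valgrind stay quiet.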
From tim.peters at gmail.com  Tue Nov  7 22:01:01 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 7 Nov 2006 16:01:01 -0500
Subject: [Python-Dev] valgrind
In-Reply-To: <4550C74C.1060402@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
	<4550C74C.1060402@v.loewis.de>
Message-ID: <1f7befae0611071301oaf34eebma466c7128bcf3e5a@mail.gmail.com>

[Martin v. Löwis]

Thanks for explaining all this!  One counterpoint:

> Notice that on a system with limited memory, you probably don't
> want to use obmalloc, even if it worked. obmalloc uses arenas
> of 256kiB, which might be expensive on the target system.

OTOH, Python allocates a lot of small objects, and one of the reasons
for obmalloc's existence is that it typically uses memory more
efficiently (less bookkeeping space overhead and less fragmentation)
for mounds of small objects than the all-purpose system malloc.

In a current (trunk) debug build, simply starting Python hits an arena
highwater mark of 9, and doing "python -S" instead hits a highwater
mark of 2.  Given how much memory Python needs to do nothing ;-), it's
doubtful that the system malloc would be doing better.

From grig.gheorghiu at gmail.com  Tue Nov  7 23:33:04 2006
From: grig.gheorghiu at gmail.com (Grig Gheorghiu)
Date: Tue, 7 Nov 2006 14:33:04 -0800
Subject: [Python-Dev] test_ucn fails for trunk on x86 Ubuntu Edgy
Message-ID: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com>

One of the Pybots buildslaves running x86 Ubuntu Edgy has been failing
the unit test step for the trunk, specifically the test_ucn test.
Here's the error:

test_ucn
test test_ucn failed -- Traceback (most recent call last):
  File "/home/pybot/pybot/trunk.bear-x86/build/Lib/test/test_ucn.py", line 102, in test_bmp_characters
    self.assertEqual(unicodedata.lookup(name), char)
KeyError: "undefined character name 'EIGHT PETALLED OUTLINED BLACK FLORETTE'"

Here's the entire log for the failed step:

http://www.python.org/dev/buildbot/community/all/x86%20Ubuntu%20Edgy%20trunk/builds/142/step-test/0

Note that this test passes on all the other platforms running in the
Pybots farm, including an amd64 Ubuntu Edgy machine.

Looks like the failure started to happen after this checkin:
http://svn.python.org/view?rev=52621&view=rev

Grig

-- 
http://agiletesting.blogspot.com

From greg.ewing at canterbury.ac.nz  Wed Nov  8 02:38:02 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Nov 2006 14:38:02 +1300
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <20061106135751.GA29592@code0.codespeak.net>
References: <454CB619.7010804@v.loewis.de>
	<20061106135751.GA29592@code0.codespeak.net>
Message-ID: <455134FA.9000001@canterbury.ac.nz>

Armin Rigo wrote:

It would seem good practice to remove all .pycs after checking out a
new version of the source, just in case there are other problems such
as mismatched timestamps, which can cause the same trouble.

> My two
> cents is that it would be saner to have two separate concepts: cache
> files used internally by the interpreter for speed reasons only, and
> bytecode files that can be shipped and imported.

That's a possibility.
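The "cache files used internally for speed reasons only" half of Armin's split could be as simple as keying compiled code on the source file's stat data and silently recompiling on any mismatch. A sketch of the idea only — the function and cache layout are invented here, not anything Python actually implements:

```python
import os

# Hypothetical in-process cache: source path -> ((mtime, size), code object)
_code_cache = {}

def load_compiled(path):
    """Return a code object for path, recompiling whenever the source changed."""
    st = os.stat(path)
    key = (st.st_mtime, st.st_size)
    cached = _code_cache.get(path)
    if cached is None or cached[0] != key:
        # Stale or missing: fall back to the source, then refresh the cache.
        with open(path) as f:
            cached = (key, compile(f.read(), path, "exec"))
        _code_cache[path] = cached
    return cached[1]
```

Because a stale or missing entry silently falls back to the source, a cache like this can never be shipped in place of it — which is exactly the separation from shippable bytecode files being argued for.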
-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed Nov  8 02:38:18 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Nov 2006 14:38:18 +1300
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454FC791.7080106@v.loewis.de>
References: <454CB619.7010804@v.loewis.de>
	<454D3C9E.5030505@canterbury.ac.nz>
	<454D5703.5070509@v.loewis.de>
	<454E74ED.8070706@canterbury.ac.nz>
	<454EDAEA.7050501@v.loewis.de>
	<454FC320.9050604@canterbury.ac.nz>
	<454FC791.7080106@v.loewis.de>
Message-ID: <4551350A.60607@canterbury.ac.nz>

Martin v. Löwis wrote:
> Currently, you can put a file on disk and import it
> immediately; that will stop working.

One thing I should add is that if you try to import a module that
wasn't there before, the interpreter will notice this and has the
opportunity to update its idea of what's on the disk. Likewise, if you
delete a module, the interpreter will notice when it tries to open a
file that no longer exists.

The only change would be if you added a module that shadowed something
formerly visible further along sys.path -- in between starting the
program and attempting to import it for the first time. So I don't
think there would be any visible change as far as most people could
tell.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed Nov  8 02:38:28 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Nov 2006 14:38:28 +1300
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1EB56@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1EB56@au3010avexu1.global.avaya.com>
Message-ID: <45513514.6090400@canterbury.ac.nz>

Delaney, Timothy (Tim) wrote:
> Would it be reasonable to always do a stat() on the directory,
> reloading if there's been a change? Would this be reliable across
> platforms?

To detect a new shadowing you'd have to stat all the directories along
sys.path, not just the one you think the file is in.
That might wipe out most of the advantage. It would be different on platforms which provide a way of "watching" a directory and getting notified of changes. I think MacOSX, Linux and Windows all provide some way of doing that nowadays, although I'm not familiar with the details. -- Greg From python-dev at zesty.ca Wed Nov 8 03:20:52 2006 From: python-dev at zesty.ca (Ka-Ping Yee) Date: Tue, 7 Nov 2006 20:20:52 -0600 (CST) Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061106135751.GA29592@code0.codespeak.net> References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: On Mon, 6 Nov 2006, Armin Rigo wrote: > I know it's a discussion that comes up and dies out regularly. My two > cents is that it would be saner to have two separate concepts: cache > files used internally by the interpreter for speed reasons only, and > bytecode files that can be shipped and imported. I like this approach. Bringing source code and program behaviour closer together makes debugging easier, and if someone wants to run Python programs without source code, then EIBTI. -- ?!ng From kbk at shore.net Wed Nov 8 05:31:22 2006 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Tue, 7 Nov 2006 23:31:22 -0500 (EST) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200611080431.kA84VM59025651@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 430 open ( -4) / 3447 closed (+17) / 3877 total (+13) Bugs : 922 open ( -7) / 6316 closed (+31) / 7238 total (+24) RFE : 245 open ( +0) / 241 closed ( +1) / 486 total ( +1) New / Reopened Patches ______________________ modulefinder changes for py3k (2006-10-27) CLOSED http://python.org/sf/1585966 opened by Thomas Heller no wraparound for enumerate() (2006-10-28) CLOSED http://python.org/sf/1586315 opened by Georg Brandl missing imports ctypes in documentation examples (2006-09-13) CLOSED http://python.org/sf/1557890 reopened by theller better error msgs for some TypeErrors (2006-10-29) http://python.org/sf/1586791 opened by Georg Brandl cookielib: lock acquire/release try..finally protected (2006-10-30) http://python.org/sf/1587139 opened by kxroberto Patch for #1586414 to avoid fragmentation on Windows (2006-10-31) http://python.org/sf/1587674 opened by Enoch Julias Typo in Mac installer image name (2006-11-01) CLOSED http://python.org/sf/1589013 opened by Humberto Di?genes Typo in Mac image name (2006-11-01) CLOSED http://python.org/sf/1589014 opened by Humberto Di?genes MacPython Build Installer - Typos and Style corrections (2006-11-02) CLOSED http://python.org/sf/1589070 opened by Humberto Di?genes bdist_sunpkg distutils command (2006-11-02) http://python.org/sf/1589266 opened by Holger The "lazy strings" patch (2006-11-04) http://python.org/sf/1590352 opened by Larry Hastings adding __dir__ (2006-11-06) http://python.org/sf/1591665 opened by ganges master `in` for classic object causes segfault (2006-11-07) http://python.org/sf/1591996 opened by Hirokazu Yamamoto PyErr_CheckSignals returns -1 on error, not 1 (2006-11-07) http://python.org/sf/1592072 opened by Gustavo J. A. M. 
Carneiro Add missing elide argument to Text.search (2006-11-07) http://python.org/sf/1592250 opened by Russell Owen Patches Closed ______________ Fix for structmember conversion issues (2006-08-30) http://python.org/sf/1549049 closed by loewis Enable SSL for smtplib (2006-09-28) http://python.org/sf/1567274 closed by loewis Mailbox will not lock properly after flush() (2006-10-11) http://python.org/sf/1575506 closed by akuchling urllib2 - Fix line breaks in authorization headers (2006-10-09) http://python.org/sf/1574068 closed by akuchling Tiny patch to stop make spam (2006-06-09) http://python.org/sf/1503717 closed by akuchling modulefinder changes for py3k (2006-10-27) http://python.org/sf/1585966 closed by gvanrossum unparse.py decorator support (2006-09-04) http://python.org/sf/1552024 closed by gbrandl no wraparound for enumerate() (2006-10-28) http://python.org/sf/1586315 closed by rhettinger missing imports ctypes in documentation examples (2006-09-13) http://python.org/sf/1557890 closed by theller missing imports ctypes in documentation examples (2006-09-13) http://python.org/sf/1557890 closed by nnorwitz tarfile.py: better use of TarInfo objects with longnames (2006-10-24) http://python.org/sf/1583880 closed by gbrandl tarfile depends on undocumented behaviour (2006-09-25) http://python.org/sf/1564981 closed by gbrandl Typo in Mac installer image name (2006-11-02) http://python.org/sf/1589013 closed by ronaldoussoren Typo in Mac image name (2006-11-01) http://python.org/sf/1589014 deleted by virtualspirit MacPython Build Installer - Typos and Style corrections (2006-11-02) http://python.org/sf/1589070 closed by ronaldoussoren bdist_rpm not able to compile multiple rpm packages (2004-11-04) http://python.org/sf/1060577 closed by loewis Remove inconsistent behavior between import and zipimport (2005-11-03) http://python.org/sf/1346572 closed by loewis Rational Reference Implementation (2002-10-02) http://python.org/sf/617779 closed by loewis Problem at the 
end of misformed mailbox (2002-11-03) http://python.org/sf/632934 closed by loewis New / Reopened Bugs ___________________ csv.reader.line_num missing 'new in 2.5' (2006-10-27) CLOSED http://python.org/sf/1585690 opened by Kent Johnson tarfile.extract() may cause file fragmentation on Windows XP (2006-10-28) http://python.org/sf/1586414 opened by Enoch Julias compiler module dont emit LIST_APPEND w/ list comprehension (2006-10-29) CLOSED http://python.org/sf/1586448 opened by sebastien Martini codecs.open problem with "with" statement (2006-10-28) CLOSED http://python.org/sf/1586513 opened by Shaun Cutts zlib/bz2_codec doesn't support incremental decoding (2006-10-29) CLOSED http://python.org/sf/1586613 opened by Topia hashlib documentation is insuficient (2006-10-29) CLOSED http://python.org/sf/1586773 opened by Marcos Daniel Marado Torres scipy gammaincinv gives incorrect answers (2006-10-31) CLOSED http://python.org/sf/1587679 opened by David J.C. MacKay quoted printable parse the sequence '= ' incorrectly (2006-10-31) http://python.org/sf/1588217 opened by Wai Yip Tung string subscripting not working on a specific string (2006-11-02) CLOSED http://python.org/sf/1588975 opened by Dan Aronson Unneeded constants left during optimization (2006-11-02) CLOSED http://python.org/sf/1589074 opened by Daniel ctypes XXX - add a crossref, at least (2006-11-02) CLOSED http://python.org/sf/1589328 opened by Jim Jewett urllib2 does local import of tokenize.py (2006-11-02) http://python.org/sf/1589480 reopened by drfarina urllib2 does local import of tokenize.py (2006-11-02) http://python.org/sf/1589480 opened by Daniel Farina __getattr__ = getattr crash (2006-11-03) CLOSED http://python.org/sf/1590036 opened by Brian Harring Error piping output between scripts on Windows (2006-11-03) http://python.org/sf/1590068 opened by Andrei where is zlib??? 
(2006-11-04) http://python.org/sf/1590592 opened by AKap mail message parsing glitch (2006-11-05) http://python.org/sf/1590744 opened by Mike python: Python/ast.c:541: seq_for_testlist: Assertion fails (2006-10-31) CLOSED http://python.org/sf/1588287 opened by Tom Epperly python: Python/ast.c:541: seq_for_testlist: Assertion (2006-11-05) CLOSED http://python.org/sf/1590804 opened by Jay T Miller subprocess deadlock (2006-11-05) http://python.org/sf/1590864 opened by Michael Tsai random.randrange don't return correct value for big number (2006-11-06) http://python.org/sf/1590891 opened by MATSUI Tetsushi update urlparse to RFC 3986 (2006-11-05) http://python.org/sf/1591035 opened by Andrew Dalke problem building python in vs8express (2006-11-05) http://python.org/sf/1591122 reopened by thomashsouthern problem building python in vs8express (2006-11-05) http://python.org/sf/1591122 opened by Thomas Southern replace groups doesn't work in this special case (2006-11-06) CLOSED http://python.org/sf/1591319 opened by Thomas K. 
Urllib2.urlopen() raises OSError w/bad HTTP Location header (2006-11-07) http://python.org/sf/1591774 opened by nikitathespider Undocumented implicit strip() in split(None) string method (2005-01-19) http://python.org/sf/1105286 reopened by yohell Stepping into a generator throw does not work (2006-11-07) http://python.org/sf/1592241 opened by Bernhard Mulder Bugs Closed ___________ python_d python (2006-09-21) http://python.org/sf/1563243 closed by sf-robot glob.glob("c:\\[ ]\*) doesn't work (2006-10-19) http://python.org/sf/1580472 closed by gbrandl structmember T_LONG won't accept a python long (2006-08-24) http://python.org/sf/1545696 closed by loewis T_ULONG -> double rounding in PyMember_GetOne() (2006-09-27) http://python.org/sf/1566140 closed by loewis Different behavior when stepping through code w/ pdb (2006-10-24) http://python.org/sf/1583276 closed by jpe csv.reader.line_num missing 'new in 2.5' (2006-10-27) http://python.org/sf/1585690 closed by akuchling asyncore.dispatcher.set_reuse_addr not documented. 
(2006-09-20) http://python.org/sf/1562583 closed by akuchling inconsistency in PCALL conditional code in ceval.c (2006-08-17) http://python.org/sf/1542016 closed by akuchling functools.wraps fails on builtins (2006-10-12) http://python.org/sf/1576241 closed by akuchling str(WindowsError) wrong (2006-10-12) http://python.org/sf/1576174 closed by theller does not raise SystemError on too many nested blocks (2006-09-25) http://python.org/sf/1565514 closed by nnorwitz curses module segfaults on invalid tparm arguments (2006-08-28) http://python.org/sf/1548092 closed by nnorwitz "from __future__ import foobar;" causes wrong SyntaxError (2006-08-19) http://python.org/sf/1543306 closed by nnorwitz compiler module dont emit LIST_APPEND w/ list comprehension (2006-10-28) http://python.org/sf/1586448 closed by gbrandl distutils adds (unwanted) -xcode=pic32 in the compile comman (2006-05-19) http://python.org/sf/1491574 closed by nnorwitz codecs.open problem with "with" statement (2006-10-29) http://python.org/sf/1586513 closed by gbrandl suprocess cannot handle shell arguments (2005-11-16) http://python.org/sf/1357915 closed by gbrandl zlib/bz2_codec doesn't support incremental decoding (2006-10-29) http://python.org/sf/1586613 closed by gbrandl missing __enter__ + __getattr__ forwarding (2006-10-20) http://python.org/sf/1581357 closed by gbrandl hashlib documentation is insuficient (2006-10-29) http://python.org/sf/1586773 closed by gbrandl scipy gammaincinv gives incorrect answers (2006-10-31) http://python.org/sf/1587679 closed by loewis string subscripting not working on a specific string (2006-11-02) http://python.org/sf/1588975 closed by gbrandl ctypes XXX - add a crossref, at least (2006-11-02) http://python.org/sf/1589328 closed by theller dict keyerror formatting and tuples (2006-10-13) http://python.org/sf/1576657 closed by gbrandl __getattr__ = getattr crash (2006-11-03) http://python.org/sf/1590036 closed by arigo potential buffer overflow in complexobject.c 
(2006-10-13) http://python.org/sf/1576861 closed by sf-robot inspect.py imports local "tokenize.py" file (2006-11-02) http://python.org/sf/1589480 closed by loewis python: Python/ast.c:541: seq_for_testlist: Assertion fails (2006-10-31) http://python.org/sf/1588287 closed by nnorwitz python: Python/ast.c:541: seq_for_testlist: Assertion (2006-11-05) http://python.org/sf/1590804 closed by loewis TypeError message on bad iteration is misleading (2005-04-21) http://python.org/sf/1187437 closed by gbrandl replace groups doesn't work in this special case (2006-11-06) http://python.org/sf/1591319 closed by niemeyer unchecked metaclass mro (2006-09-28) http://python.org/sf/1567234 closed by akuchling curses getkey() crash in raw mode (2004-02-09) http://python.org/sf/893250 closed by akuchling RFE Closed __________ Unneeded constants left during optimization (2006-11-02) http://python.org/sf/1589074 closed by loewis From martin at v.loewis.de Wed Nov 8 06:18:43 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Nov 2006 06:18:43 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <4551350A.60607@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> <454FC791.7080106@v.loewis.de> <4551350A.60607@canterbury.ac.nz> Message-ID: <455168B3.3040809@v.loewis.de> Greg Ewing schrieb: > One thing I should add is that if you try to import > a module that wasn't there before, the interpreter will > notice this and has the opportunity to update its idea > of what's on the disk. How will it notice that it wasn't there before? 
The interpreter will see that it hasn't imported the module; it can't know whether it was there before while trying to resolve the import: when looking at a directory in sys.path, it needs to decide whether to use the directory cache or not. If the directory is not in the cache, it might be one of three things: a) the directory cache is out of date, and you should re-read the directory b) the module still isn't there, but is available in a later directory on sys.path (which hasn't yet been visited) c) the module isn't there at all, and the import will eventually fail. How can the interpreter determine which of these it is? Regards, Martin From martin at v.loewis.de Wed Nov 8 06:54:26 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Nov 2006 06:54:26 +0100 Subject: [Python-Dev] test_ucn fails for trunk on x86 Ubuntu Edgy In-Reply-To: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com> References: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com> Message-ID: <45517112.3060509@v.loewis.de> Grig Gheorghiu schrieb: > One of the Pybots buildslaves running x86 Ubuntu Edgy has been failing > the unit test step for the trunk, specifically the test_ucn test. Something is wrong with the machine. I forced a clean rebuild, and now it crashes in test_doctest2: http://www.python.org/dev/buildbot/community/all/x86%20Ubuntu%20Edgy%20trunk/builds/145/step-test/0 So either the compiler or some library has been updated in a strange way, or there is a hardware problem. One would need access to the machine to find out (and analyzing it is likely time-consuming). Regards, Martin From nnorwitz at gmail.com Wed Nov 8 07:04:10 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 7 Nov 2006 22:04:10 -0800 Subject: [Python-Dev] valgrind In-Reply-To: <455098B6.3020903@v.loewis.de> References: <4550256C.1020109@v.loewis.de> <455098B6.3020903@v.loewis.de> Message-ID: On 11/7/06, "Martin v. 
L?wis" wrote: > Neal Norwitz schrieb: > > at 0x44FA06: Py_ADDRESS_IN_RANGE (obmalloc.c:1741) > > > > Note that the free is inside qsort. The memory freed under qsort > > should definitely not be the bases which we allocated under > > PyType_Ready. I'll file a bug report with valgrind to help determine > > if this is a problem in Python or valgrind. > > http://bugs.kde.org/show_bug.cgi?id=136989 > > As Tim explains, a read from Py_ADDRESS_IN_RANGE is fine, and by design. > If p is the pointer, we do Yeah, thanks for going over it again. I was tired and only half paying attention last night. Tonight isn't going much better. :-( I wonder if we can capture any of these exchanges and put into README.valgrind. I'm not about to do it tonight though. n From mwh at python.net Wed Nov 8 16:07:41 2006 From: mwh at python.net (Michael Hudson) Date: Wed, 08 Nov 2006 16:07:41 +0100 Subject: [Python-Dev] Last chance to join the Summer of PyPy! Message-ID: <87fycurn7m.fsf@starship.python.net> Hopefully by now you have heard of the "Summer of PyPy", our program for funding the expenses of attending a sprint for students. If not, you've just read the essence of the idea :-) However, the PyPy EU funding period is drawing to an end and there is now only one sprint left where we can sponsor the travel costs of interested students within our program. This sprint will probably take place in Leysin, Switzerland from 8th-14th of January 2007. So, as explained in more detail at: http://codespeak.net/pypy/dist/pypy/doc/summer-of-pypy.html we would encourage any interested students to submit a proposal in the next month or so. If you're stuck for ideas, you can find some at http://codespeak.net/pypy/dist/pypy/doc/project-ideas.html but please do not feel limited in any way by this list! Cheers, mwh ... 
and the PyPy team -- the highest calling of technical book writers is to destroy the sun -- from Twisted.Quotes From grig.gheorghiu at gmail.com Wed Nov 8 16:27:44 2006 From: grig.gheorghiu at gmail.com (Grig Gheorghiu) Date: Wed, 8 Nov 2006 07:27:44 -0800 Subject: [Python-Dev] test_ucn fails for trunk on x86 Ubuntu Edgy In-Reply-To: <45517112.3060509@v.loewis.de> References: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com> <45517112.3060509@v.loewis.de> Message-ID: <3f09d5a00611080727k4e300871h2af920acd3867b8a@mail.gmail.com> On 11/7/06, "Martin v. L?wis" wrote: > Grig Gheorghiu schrieb: > > One of the Pybots buildslaves running x86 Ubuntu Edgy has been failing > > the unit test step for the trunk, specifically the test_ucn test. > > Something is wrong with the machine. I forced a clean rebuild, and > now it crashes in test_doctest2: > > http://www.python.org/dev/buildbot/community/all/x86%20Ubuntu%20Edgy%20trunk/builds/145/step-test/0 > > So either the compiler or some library has been updated in a strange > way, or there is a hardware problem. One would need access to the > machine to find out (and analyzing it is likely time-consuming). > > Regards, > Martin > Thanks for looking into it. I'll contact the owner of that machine and we'll try to figure out what's going on. Grig From greg.ewing at canterbury.ac.nz Thu Nov 9 00:29:43 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 09 Nov 2006 12:29:43 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <455168B3.3040809@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> <454FC791.7080106@v.loewis.de> <4551350A.60607@canterbury.ac.nz> <455168B3.3040809@v.loewis.de> Message-ID: <45526867.70004@canterbury.ac.nz> Martin v. 
L?wis wrote: > a) the directory cache is out of date, and you should > re-read the directory > b) the module still isn't there, but is available in > a later directory on sys.path (which hasn't yet > been visited) > c) the module isn't there at all, and the import will > eventually fail. > > How can the interpreter determine which of these it > is? It doesn't need to - if there is no file for the module in the cache, it assumes that the cache could be out of date and rebuilds it. If that turns up a file, then fine, else the module doesn't exist. BTW, I'm not thinking of cacheing individual directories, but scanning all the directories and building a single qualified_module_name -> pathname mapping. If the cache gets invalidated, all the directories along the path are re-scanned, so a new module will be picked up wherever it is on the path. -- Greg From martin at v.loewis.de Thu Nov 9 06:11:13 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 06:11:13 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <45526867.70004@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> <454FC791.7080106@v.loewis.de> <4551350A.60607@canterbury.ac.nz> <455168B3.3040809@v.loewis.de> <45526867.70004@canterbury.ac.nz> Message-ID: <4552B871.2010303@v.loewis.de> Greg Ewing schrieb: > Martin v. L?wis wrote: > >> a) the directory cache is out of date, and you should >> re-read the directory >> b) the module still isn't there, but is available in >> a later directory on sys.path (which hasn't yet >> been visited) >> c) the module isn't there at all, and the import will >> eventually fail. >> >> How can the interpreter determine which of these it >> is? 
> > It doesn't need to - if there is no file for the module > in the cache, it assumes that the cache could be out > of date and rebuilds it. If that turns up a file, then > fine, else the module doesn't exist. I lost track. I thought we were talking about creating a cache of directory listings, not a stat cache? If you invalidate the cache when a file name is not listed, you will invalidate it on nearly every import, and multiple times, too: Python looks for foo.py, foo.pyc, foo.so, foomodule.so. At most one of them is found, the others aren't. So if foo.so would be found, are you invalidating the cache because foo.py isn't? > BTW, I'm not thinking of cacheing individual directories, > but scanning all the directories and building a single > qualified_module_name -> pathname mapping. If the cache > gets invalidated, all the directories along the path > are re-scanned, so a new module will be picked up > wherever it is on the path. That won't work well with path import objects. You have to observe the order in which sys.path is scanned, for correct semantics. Regards, Martin From martin at v.loewis.de Thu Nov 9 06:30:42 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 06:30:42 +0100 Subject: [Python-Dev] Using SCons for cross-compilation Message-ID: <4552BD02.2090808@v.loewis.de> Patch #841454 takes a stab at cross-compilation (for MingW32 on a Linux system, in this case), and proposes to use SCons instead of setup.py to compile extension modules. Usage of SCons would be restricted to cross-compilation (for the moment). What do you think? 
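For list readers who haven't seen SCons: its build descriptions are ordinary Python scripts named SConstruct, which is part of the appeal for cross-builds — the toolchain is just data in an Environment. A generic illustration only (the MinGW compiler prefix and module name below are made up for the example, not taken from the patch):

```python
# SConstruct -- SCons build files are plain Python.
# Cross-compiling means overriding the toolchain in the Environment;
# the compiler name here is a hypothetical MinGW cross-compiler.
env = Environment(CC='i586-mingw32msvc-gcc')
env.SharedLibrary('spam', ['spammodule.c'])
```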
Regards, Martin From anthony at interlink.com.au Thu Nov 9 07:45:30 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu, 9 Nov 2006 17:45:30 +1100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <4552BD02.2090808@v.loewis.de> References: <4552BD02.2090808@v.loewis.de> Message-ID: <200611091745.31443.anthony@interlink.com.au> On Thursday 09 November 2006 16:30, Martin v. L?wis wrote: > Patch #841454 takes a stab at cross-compilation > (for MingW32 on a Linux system, in this case), > and proposes to use SCons instead of setup.py > to compile extension modules. Usage of SCons > would be restricted to cross-compilation (for > the moment). > > What do you think? So we'd now have 3 places to update when things change (setup.py, PCbuild area, SCons)? How does this deal with the problems that autoconf has with cross-compilation? It would seem to me that just fixing the extension module building is a tiny part of the problem... or am I missing something? Anthony -- Anthony Baxter It's never too late to have a happy childhood. From amk at amk.ca Thu Nov 9 15:01:46 2006 From: amk at amk.ca (A.M. Kuchling) Date: Thu, 9 Nov 2006 09:01:46 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061109135115.15FA81E4006@bag.python.org> References: <20061109135115.15FA81E4006@bag.python.org> Message-ID: <20061109140146.GB8808@localhost.localdomain> On Thu, Nov 09, 2006 at 02:51:15PM +0100, andrew.kuchling wrote: > Author: andrew.kuchling > Date: Thu Nov 9 14:51:14 2006 > New Revision: 52692 > > [Patch #1514544 by David Watson] use fsync() to ensure data is really on disk Should I backport this change to 2.5.1? Con: The patch adds two new internal functions, _sync_flush() and _sync_close(), so it's an internal API change. Pro: it's a patch that should reduce chances of data loss, which is important to people processing mailboxes. 
Because it fixes a small chance of potential data loss and the new functions are prefixed with _, my personal inclination would be to backport this change. Comments? Anthony, do you want to pronounce on this issue? --amk From barry at python.org Thu Nov 9 16:07:23 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Nov 2006 10:07:23 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061109140146.GB8808@localhost.localdomain> References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 9, 2006, at 9:01 AM, A.M. Kuchling wrote: > Should I backport this change to 2.5.1? Con: The patch adds two new > internal functions, _sync_flush() and _sync_close(), so it's an > internal API change. Pro: it's a patch that should reduce chances of > data loss, which is important to people processing mailboxes. > > Because it fixes a small chance of potential data loss and the new > functions are prefixed with _, my personal inclination would be to > backport this change. I agree. _ is a hint as to its non-publicness and I don't have a problem in principle adding such methods. In this particular case, it seems the patch improves reliability, so +1. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRVNEL3EjvBPtnXfVAQIb0QP+Nmd6XPKQeeXaHrAG/fAFjrVHFn4SFkhH PtJqnLVOAQeSDonDdBQKluypGdWktcpGM/r1mz51cpJhxytYnAbwqeu1LyWJ/maX ABxG6zrkd7YCjZ5VyK2VQNs2dSLVWYYH24V/xwP5E5D2sEQ80sII3mnydSO+KLVI HBg9jztsc70= =Sj0Q -----END PGP SIGNATURE----- From skip at pobox.com Thu Nov 9 16:12:15 2006 From: skip at pobox.com (skip at pobox.com) Date: Thu, 9 Nov 2006 09:12:15 -0600 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091745.31443.anthony@interlink.com.au> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> Message-ID: <17747.17743.429254.590319@montanaro.dyndns.org> Anthony> So we'd now have 3 places to update when things change Anthony> (setup.py, PCbuild area, SCons)? Four. You forgot Modules/Setup... Skip From david at boddie.org.uk Thu Nov 9 16:42:48 2006 From: david at boddie.org.uk (David Boddie) Date: Thu, 9 Nov 2006 16:42:48 +0100 Subject: [Python-Dev] Using SCons for cross-compilation Message-ID: <200611091642.48998.david@boddie.org.uk> On Thu Nov 9 07:45:30 CET 2006, Anthony Baxter wrote: > On Thursday 09 November 2006 16:30, Martin v. L?wis wrote: > > Patch #841454 takes a stab at cross-compilation > > (for MingW32 on a Linux system, in this case), > > and proposes to use SCons instead of setup.py > > to compile extension modules. Usage of SCons > > would be restricted to cross-compilation (for > > the moment). > > > > What do you think? > > So we'd now have 3 places to update when things change (setup.py, PCbuild > area, SCons)? How does this deal with the problems that autoconf has with > cross-compilation? It would seem to me that just fixing the extension module > building is a tiny part of the problem... or am I missing something? I've been working on adding cross-compiling support to Python's build system, too, though I've had the luxury of building on Linux for a target platform that also runs Linux. 
Since the build system originally came from the GCC project, it shouldn't surprise anyone that there's already a certain level of support for cross-compilation built in. Simply setting the --build and --host options is a good start, for example. It seems that Martin's patch solves some problems I encountered more cleanly (in certain respects) than the solutions I came up with. Here are some issues I encountered (from memory): * The current system assumes that Parser/pgen will be built using the compiler being used for the rest of the build. This obviously isn't going to work when the executable is meant for the target platform. At the same time, the object files for pgen need to be compiled for the interpreter for the target platform. * The newly-compiled interpreter is used to compile the standard library, run tests and execute the setup.py file. Some of these things should be done by the interpreter, but it won't work on the host platform. On the other hand, the setup.py script should be run by the host's Python interpreter, but using information about the target interpreter's configuration. * There are various extensions defined in the setup.py file that are found and erroneously included if you execute it using the host's interpreter. Ideally, it would be possible to use the target's configuration to disable extensions, but a more configurable build process would also be welcome. I'll try to look at Martin's patch at some point. I hope these observations and suggestions help explain the current issues with the build system when cross-compiling. 
David From chris at kateandchris.net Thu Nov 9 17:29:37 2006 From: chris at kateandchris.net (Chris Lambacher) Date: Thu, 9 Nov 2006 11:29:37 -0500 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091642.48998.david@boddie.org.uk> References: <200611091642.48998.david@boddie.org.uk> Message-ID: <20061109162937.GA3812@kateandchris.net> On Thu, Nov 09, 2006 at 04:42:48PM +0100, David Boddie wrote: > On Thu Nov 9 07:45:30 CET 2006, Anthony Baxter wrote: > > > On Thursday 09 November 2006 16:30, Martin v. L?wis wrote: > > > Patch #841454 takes a stab at cross-compilation > > > (for MingW32 on a Linux system, in this case), > > > and proposes to use SCons instead of setup.py > > > to compile extension modules. Usage of SCons > > > would be restricted to cross-compilation (for > > > the moment). > > > > > > What do you think? > > > > So we'd now have 3 places to update when things change (setup.py, PCbuild > > area, SCons)? How does this deal with the problems that autoconf has with > > cross-compilation? It would seem to me that just fixing the extension module > > building is a tiny part of the problem... or am I missing something? > > I've been working on adding cross-compiling support to Python's build system, > too, though I've had the luxury of building on Linux for a target platform > that also runs Linux. Since the build system originally came from the GCC > project, it shouldn't surprise anyone that there's already a certain level > of support for cross-compilation built in. Simply setting the --build and > --host options is a good start, for example. > > It seems that Martin's patch solves some problems I encountered more cleanly > (in certain respects) than the solutions I came up with. Here are some > issues I encountered (from memory): > > * The current system assumes that Parser/pgen will be built using the > compiler being used for the rest of the build. 
This obviously isn't > going to work when the executable is meant for the target platform. > At the same time, the object files for pgen need to be compiled for > the interpreter for the target platform. > > * The newly-compiled interpreter is used to compile the standard library, > run tests and execute the setup.py file. Some of these things should > be done by the interpreter, but it won't work on the host platform. > On the other hand, the setup.py script should be run by the host's > Python interpreter, but using information about the target interpreter's > configuration. > > * There are various extensions defined in the setup.py file that are > found and erroneously included if you execute it using the host's > interpreter. Ideally, it would be possible to use the target's > configuration to disable extensions, but a more configurable build > process would also be welcome. > This pretty much covers the difficulties I encountered. For what it's worth, my experiences with Python 2.5 are documented here: I am also interested in pursuing solutions that make it easier to both build python and third party extensions in cross compile environment. > I'll try to look at Martin's patch at some point. I hope these observations > and suggestions help explain the current issues with the build system when > cross-compiling. > > David > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/chris%40kateandchris.net From tjreedy at udel.edu Thu Nov 9 19:54:00 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 9 Nov 2006 13:54:00 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk:Lib/mailbox.py Misc/NEWS References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> Message-ID: "A.M. 
Kuchling" wrote in message news:20061109140146.GB8808 at localhost.localdomain... > On Thu, Nov 09, 2006 at 02:51:15PM +0100, andrew.kuchling wrote: >> Author: andrew.kuchling >> Date: Thu Nov 9 14:51:14 2006 >> New Revision: 52692 >> >> [Patch #1514544 by David Watson] use fsync() to ensure data is really on >> disk > > Should I backport this change to 2.5.1? Con: The patch adds two new > internal functions, _sync_flush() and _sync_close(), so it's an > internal API change. Pro: it's a patch that should reduce chances of > data loss, which is important to people processing mailboxes. I am not familiar with the context but I would naively think of data loss as a bug. The new functions' code could be preceded by a comment that they were added in 2.5.1 for internal use only and that external use would make code incompatible with 2.5 -- and of course, not documented elsewhere. tjr From martin at v.loewis.de Thu Nov 9 20:02:12 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:02:12 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091745.31443.anthony@interlink.com.au> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> Message-ID: <45537B34.9010100@v.loewis.de> Anthony Baxter schrieb: > So we'd now have 3 places to update when things change (setup.py, PCbuild > area, SCons)? How does this deal with the problems that autoconf has with > cross-compilation? It would seem to me that just fixing the extension module > building is a tiny part of the problem... or am I missing something? I'm not quite sure. I believe distutils is too smart to support cross-compilation. It has its own notion of where to look for header files and how to invoke the compiler; these builtin assumptions break for cross-compilation. In any case, the patch being contributed uses SCons. If people think this is unmaintainable, this is a reason to reject the patch. 
Regards, Martin From skip at pobox.com Thu Nov 9 20:15:15 2006 From: skip at pobox.com (skip at pobox.com) Date: Thu, 9 Nov 2006 13:15:15 -0600 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <45537B34.9010100@v.loewis.de> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> Message-ID: <17747.32323.67824.681099@montanaro.dyndns.org> Martin> In any case, the patch being contributed uses SCons. If people Martin> think this is unmaintainable, this is a reason to reject the Martin> patch. Could SCons replace distutils? Skip From chris at kateandchris.net Thu Nov 9 20:27:17 2006 From: chris at kateandchris.net (Chris Lambacher) Date: Thu, 9 Nov 2006 14:27:17 -0500 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <17747.32323.67824.681099@montanaro.dyndns.org> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> Message-ID: <20061109192717.GA4353@kateandchris.net> On Thu, Nov 09, 2006 at 01:15:15PM -0600, skip at pobox.com wrote: > > Martin> In any case, the patch being contributed uses SCons. If people > Martin> think this is unmaintainable, this is a reason to reject the > Martin> patch. > > Could SCons replace distutils? If SCons replaced Distutils, would SCons have to become part of Python? Is SCons ready for that? What do you do about the existing body of 3rd-party extensions that are already using Distutils? Think of the resistance to the, relatively minor, changes that Setuptools made to the way Distutils works. I think a better question is what about Distutils hinders cross-compiler scenarios and how do we fix those deficiencies?
-Chris From martin at v.loewis.de Thu Nov 9 20:50:51 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:50:51 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <20061109192717.GA4353@kateandchris.net> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> <20061109192717.GA4353@kateandchris.net> Message-ID: <4553869B.8060203@v.loewis.de> Chris Lambacher schrieb: > I think a better question is what about Distutils hinders cross-compiler > scenarios and how to we fix those deficiencies? It's primarily the lack of contributions. Somebody would have to define a cross-compilation scenario (where "use Cygwin on Linux" is one that might be available to many people), and try to make it work. I believe it wouldn't work out of the box because distutils issues the wrong commands with the wrong command line options. But I don't know for sure; I haven't tried myself. Regards, Martin From skip at pobox.com Thu Nov 9 20:54:37 2006 From: skip at pobox.com (skip at pobox.com) Date: Thu, 9 Nov 2006 13:54:37 -0600 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <20061109192717.GA4353@kateandchris.net> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> <20061109192717.GA4353@kateandchris.net> Message-ID: <17747.34685.755754.32319@montanaro.dyndns.org> >> Could SCons replace distutils? Chris> If SCons replaced Distutils would SCons have to become part of Chris> Python? Is SCons ready for that? What do you do about the Chris> existing body 3rd party extensions that are already using Chris> Distutils? Sorry, my question was ambiguous. Let me rephrase it: Could SCons replace distutils as the way to build extension modules delivered with Python proper? 
In answer to your questions: * Yes, I believe so. * I have no idea what SCons is ready for. * I assume distutils would continue to ship with Python, so existing distutils-based setup.py install scripts should continue to work. Someone (I don't know who) submitted a patch to use SCons for building modules in cross-compilation contexts. Either the author tried to shoehorn this into distutils and failed or never tried (maybe because using SCons for such tasks is much easier - who knows?). I assume that if the patch is accepted, SCons would have to be bundled with Python. I don't see that as a big problem as long as there's someone to support it and it meets the basic requirements for inclusion (significant user base, documentation, test cases, release form). Given that SCons can apparently be coaxed into cross-compiling extension modules, I presume it should be relatively simple to do the same in a normal compilation environment. If that's the case, then why use distutils to build Python's core extension modules at all? Skip From martin at v.loewis.de Thu Nov 9 20:56:04 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:56:04 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091642.48998.david@boddie.org.uk> References: <200611091642.48998.david@boddie.org.uk> Message-ID: <455387D4.20206@v.loewis.de> David Boddie schrieb: > It seems that Martin's patch solves some problems I encountered more cleanly > (in certain respects) than the solutions I came up with. Here are some > issues I encountered (from memory): Just let me point out that it is not my patch: http://python.org/sf/841454 was contributed by Andreas Ames. I performed triage on it (as it is about to reach its 3rd anniversary), and view SCons usage as the biggest obstacle.
Regards, Martin From martin at v.loewis.de Thu Nov 9 20:59:23 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:59:23 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <17747.34685.755754.32319@montanaro.dyndns.org> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> <20061109192717.GA4353@kateandchris.net> <17747.34685.755754.32319@montanaro.dyndns.org> Message-ID: <4553889B.7050404@v.loewis.de> skip at pobox.com schrieb: > Someone (I don't know who) submitted a patch to use SCons for building > modules in cross-compilation contexts. Either the author tried to shoehorn > this into distutils and failed or never tried (maybe because using SCons for > such takss is much easier - who knows?). I assume that if the patch is > accepted that SCons would have to be bundled with Python. I don't see that as a requirement. People cross-compiling Python could be required to install SCons - they are used to install all kinds of things for a cross-compilation environment. In particular, to run SCons, they need a host python. The just-built python is unsuitable, as it only runs on the target. Regards, Martin From barry at python.org Thu Nov 9 22:19:19 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Nov 2006 16:19:19 -0500 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <17747.32323.67824.681099@montanaro.dyndns.org> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 9, 2006, at 2:15 PM, skip at pobox.com wrote: > > Martin> In any case, the patch being contributed uses SCons. If > people > Martin> think this is unmaintainable, this is a reason to > reject the > Martin> patch. 
> > Could SCons replace distutils? I'm not so sure. I love SCons, but it has some unpythonic aspects to it, which (IMO) make sense as a standalone build tool, but not so much as a standard library module. I'd probably want to see some of those things improved if we were to use it to replace distutils. There does seem to be overlap between the two tools though, and it might make for an interesting sprint/project to find and refactor the commonality. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRVObXHEjvBPtnXfVAQIhQQP/esS6o+7NX/JenJcuEdvb7rWIVxRgzVEh rfZGSOO2mp6b0PgrvXjAnZQHYJFpQO5JXpWJVqLBPxbucbBwvWaA0+tgTrpnBpj9 Cs/vwlMsmk55CwSYjvl7eM0uW9aIuT9QcZxuf4j+T7dzQOL0LL2Id4/876Azcfo0 7A0dtc2oJ+U= =H1w2 -----END PGP SIGNATURE----- From pedronis at strakt.com Thu Nov 9 22:29:14 2006 From: pedronis at strakt.com (Samuele Pedroni) Date: Thu, 09 Nov 2006 22:29:14 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> Message-ID: <45539DAA.7070701@strakt.com> Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Nov 9, 2006, at 2:15 PM, skip at pobox.com wrote: > > >> Martin> In any case, the patch being contributed uses SCons. If >> people >> Martin> think this is unmaintainable, this is a reason to >> reject the >> Martin> patch. >> >> Could SCons replace distutils? >> > > I'm not so sure. I love SCons, but it has some unpythonic aspects to > it, which (IMO) make sense as a standalone build tool, but not so > much as a standard library module. I'd probably want to see some of > those things improved if we were to use it to replace distutils. 
> > In PyPy we explored at some point using SCons instead of abusing distutils for our building needs. It seems to have a library part, but a lot of its high-level dependency logic seems to be coded into its main invocation script in a monolithic way, with a lot of global state. We didn't feel like trying to untangle that or explore more. > There does seem to be overlap between the two tools though, and it > might make for an interesting sprint/project to find and refactor the > commonality. > > - -Barry From anthony at interlink.com.au Fri Nov 10 01:56:25 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri, 10 Nov 2006 11:56:25 +1100 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061109140146.GB8808@localhost.localdomain> References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> Message-ID: <200611101156.29325.anthony@interlink.com.au> On Friday 10 November 2006 01:01, A.M. Kuchling wrote: > On Thu, Nov 09, 2006 at 02:51:15PM +0100, andrew.kuchling wrote: > > Author: andrew.kuchling > > Date: Thu Nov 9 14:51:14 2006 > > New Revision: 52692 > > > > [Patch #1514544 by David Watson] use fsync() to ensure data is really on > > disk > > Should I backport this change to 2.5.1?
Con: The patch adds two new > internal functions, _sync_flush() and _sync_close(), so it's an > internal API change. Pro: it's a patch that should reduce chances of > data loss, which is important to people processing mailboxes. > > Because it fixes a small chance of potential data loss and the new > functions are prefixed with _, my personal inclination would be to > backport this change. Looking at the patch, the functions are pretty clearly internal implementation details. I'm happy for it to go into release25-maint (particularly because the consequences of the bug are so dire). Anthony -- Anthony Baxter It's never too late to have a happy childhood. From amk at amk.ca Fri Nov 10 03:45:13 2006 From: amk at amk.ca (A.M. Kuchling) Date: Thu, 9 Nov 2006 21:45:13 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <200611101156.29325.anthony@interlink.com.au> References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> <200611101156.29325.anthony@interlink.com.au> Message-ID: <20061110024513.GB1739@Andrew-iBook2.local> On Fri, Nov 10, 2006 at 11:56:25AM +1100, Anthony Baxter wrote: > Looking at the patch, the functions are pretty clearly internal implementation > details. I'm happy for it to go into release25-maint (particularly because > the consequences of the bug are so dire). OK, I'll backport it; thanks! (It's not fixing a frequent data-loss problem -- the patch just assures that when flush() or close() returns, data is more likely to have been written to disk and be safe after a subsequent system crash.) 
--amk From anthony at interlink.com.au Fri Nov 10 04:36:52 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri, 10 Nov 2006 14:36:52 +1100 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061110024513.GB1739@Andrew-iBook2.local> References: <20061109135115.15FA81E4006@bag.python.org> <200611101156.29325.anthony@interlink.com.au> <20061110024513.GB1739@Andrew-iBook2.local> Message-ID: <200611101436.52930.anthony@interlink.com.au> On Friday 10 November 2006 13:45, A.M. Kuchling wrote: > OK, I'll backport it; thanks! > > (It's not fixing a frequent data-loss problem -- the patch just > assures that when flush() or close() returns, data is more likely to > have been written to disk and be safe after a subsequent system > crash.) Sure - it's a potential bug waiting to happen, though. And it's not a fun one :) From paul.chiusano at gmail.com Sun Nov 5 18:36:35 2006 From: paul.chiusano at gmail.com (Paul Chiusano) Date: Sun, 5 Nov 2006 12:36:35 -0500 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: <20061104122150.81FF.JCARLSON@uci.edu> References: <454CE367.7000604@v.loewis.de> <20061104122150.81FF.JCARLSON@uci.edu> Message-ID: > It is not required. If you are careful, you can implement a pairing > heap with a structure combining a dictionary and list. That's interesting. Can you give an overview of how you can do that? I can't really picture it. You can support all the pairing heap operations with the same complexity guarantees? Do you mean a linked list here or an array? Paul On 11/4/06, Josiah Carlson wrote: > > "Martin v. L?wis" wrote: > > Paul Chiusano schrieb: > > > To support this, the insert method needs to return a reference to an > > > object which I can then pass to adjust_key() and delete() methods. > > > It's extremely difficult to have this functionality with array-based > > > heaps because the index of an item in the array changes as items are > > > inserted and removed. 
> > > > I see. > > It is not required. If you are careful, you can implement a pairing > heap with a structure combining a dictionary and list. It requires that > all values be unique and hashable, but it is possible (I developed one > for a commercial project). > > If other people find the need for it, I could rewrite it (can't release > the closed source). It would use far less memory than the pairing heap > implementation provided in the sandbox, and could be converted to C if > desired and/or required. On the other hand, I've found the pure Python > version to be fast enough for most things I've needed it for. > > - Josiah > > From kxroberto at googlemail.com Mon Nov 6 17:56:02 2006 From: kxroberto at googlemail.com (Robert) Date: Mon, 06 Nov 2006 17:56:02 +0100 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) References: ca471dc20611052052s2cfe3461l7265b7a2aeae5b3@mail.gmail.com Message-ID: <454F6922.20503@googlemail.com> Talin wrote: >>/ I don't know how you define simple. In order to be able to have />>/ separate GILs you have to remove *all* sharing of objects between />>/ interpreters. And all other data structures, too. It would probably />>/ kill performance too, because currently obmalloc relies on the GIL. / > Nitpick: You have to remove all sharing of *mutable* objects. One day, > when we get "pure" GC with no refcounting, that will be a meaningful > distinction. :) Is it mad?: It could be a distinction now: immutables/singletons refcount could be held ~fix around MAXINT easily (by a loose periodic GC scheme, or by Py_INC/DEFREF to be like { if ob.refcount!=MAXINT ... ) dicty things like Exception.x=5 could either be disabled or Exception.refcount=MAXINT/.__dict__=lockingdict ... or exceptions could be doubled as they don't have to cross the bridge (weren't they in an ordinary python module once ?). 
obmalloc.c/LOCK() could be something fast like: _retry: __asm LOCK INC malloc_lock if (malloc_lock!=1) { LOCK DEC malloc_lock; /*yield();*/ goto _retry; } To know the final speed costs ( http://groups.google.de/group/comp.lang.python/msg/01cef42159fd1712 ) would require an experiment. Cheap signal processors (<1%) don't need to be supported for free threading interpreters. Builtin/Extension modules global __dict__ to become a lockingdict. Yet a speedy LOCK INC lock method may possibly lead to general free threading threads (for most CPUs) at all. Almost all Python objects have static/uncritical attributes/require only few locks. A full blown LOCK INC lock method on dict & list accesses, (avoidable for fastlocals?) & defaulty Py_INC/DECREF (as far as there is still refcounting in Py3K). Py_FASTINCREF could be fast for known immutables (mainly Py_None) with MAXINT method, and for fresh creations etc. PyThreadState_GET(): A ts(PyThread_get_thread_ident())/*TlsGetValue() would become necessary. Is there a fast thread_ID register in today's CPUs?* Robert From fredrik at pythonware.com Fri Nov 10 18:21:35 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 10 Nov 2006 18:21:35 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Guido van Rossum wrote: > No objection on targeting 2.6 if other developers agree. Seems this > is well under way. good work! given that dir() is used extensively by introspection tools, I'm not sure I'm positive to a __dir__ that *overrides* the standard dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. (what about vars(), btw?)
From guido at python.org Fri Nov 10 20:30:57 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Nov 2006 11:30:57 -0800 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: On 11/10/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > No objection on targeting 2.6 if other developers agree. Seems this > > is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. I think that ought to go into the guidelines for what's an acceptable __dir__ implementation. We don't try to stop people from overriding __add__ as subtraction either. > (what about vars(), btw?) Interesting question! Right now vars() and dir() don't seem to use the same set of keys; e.g.: >>> class C: pass ... >>> c = C() >>> c.foo = 42 >>> vars(c) {'foo': 42} >>> dir(c) ['__doc__', '__module__', 'foo'] >>> It makes some sense for vars(x) to return something like dict((name, getattr(x, name)) for name in dir(x) if hasattr(x, name)) and for the following equivalence to hold between vars() and dir() without args: dir() == sorted(vars().keys()) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Fri Nov 10 20:41:28 2006 From: theller at ctypes.org (Thomas Heller) Date: Fri, 10 Nov 2006 20:41:28 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: <4554D5E8.3020600@ctypes.org> Fredrik Lundh schrieb: > Guido van Rossum wrote: > >> No objection on targeting 2.6 if other developers agree.
Seems this >> is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. One part that *I* would like about a complete overridable __dir__ implementation is that it would be nice to customize what help(something) prints. Thomas From fredrik at pythonware.com Fri Nov 10 21:25:02 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 10 Nov 2006 21:25:02 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <4554D5E8.3020600@ctypes.org> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> <4554D5E8.3020600@ctypes.org> Message-ID: Thomas Heller wrote: >>> No objection on targeting 2.6 if other developers agree. Seems this >>> is well under way. good work! >> >> given that dir() is used extensively by introspection tools, I'm >> not sure I'm positive to a __dir__ that *overrides* the standard >> dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. > > One part that *I* would like about a complete overridable __dir__ implementation > is that it would be nice to customize what help(something) prints. I don't think you should confuse reliable introspection with the help system, though. introspection is used for a lot more than implementing help().
From fredrik at pythonware.com Fri Nov 10 21:26:34 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 10 Nov 2006 21:26:34 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Guido van Rossum wrote: > I think that ought to go into the guidelines for what's an acceptable > __dir__ implementation. We don't try to stop people from overriding > __add__ as subtraction either. to me, overriding dir() is a lot more like overriding id() than overriding "+". I don't think an object should be allowed to lie to the introspection mechanisms. From guido at python.org Fri Nov 10 22:12:19 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Nov 2006 13:12:19 -0800 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: On 11/10/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > I think that ought to go into the guidelines for what's an acceptable > > __dir__ implementation. We don't try to stop people from overriding > > __add__ as subtraction either. > > to me, overriding dir() is a lot more like overriding id() than overriding "+". > I don't think an object should be allowed to lie to the > introspection mechanisms. Why not? You can override __class__ already. With a metaclass you can probably override inspection of the class, too.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Sat Nov 11 11:20:41 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Nov 2006 11:20:41 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Guido van Rossum wrote: >> (what about vars(), btw?) > > Interesting question! Right now vars() and dir() don't seem to use the > same set of keys; e.g.: > >>>> class C: pass > ... >>>> c = C() >>>> c.foo = 42 >>>> vars(c) > {'foo': 42} >>>> dir(c) > ['__doc__', '__module__', 'foo'] >>>> > > It makes some sense for vars(x) to return something like > > dict((name, getattr(x, name)) for name in dir(x) if hasattr(x, name)) > > and for the following equivalence to hold between vars() and dir() without args: > > dir() == sorted(vars().keys()) +1. This is easy and straightforward to explain, better than "With a module, class or class instance object as argument (or anything else that has a __dict__ attribute), returns a dictionary corresponding to the object's symbol table." Georg From g.brandl at gmx.net Sat Nov 11 11:21:08 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Nov 2006 11:21:08 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Fredrik Lundh wrote: > Guido van Rossum wrote: > >> No objection on targeting 2.6 if other developers agree. Seems this >> is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable.
If the new default __dir__ implementation only yields the same set of attributes (or more), there should be no problem. If somebody overrides __dir__, he knows what he's doing. He will most likely do something like "return super.__dir__() + [my, custom, attributes]". regards, Georg From ncoghlan at gmail.com Sun Nov 12 04:34:29 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Nov 2006 13:34:29 +1000 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: <45569645.1080206@gmail.com> Fredrik Lundh wrote: > Guido van Rossum wrote: > >> No objection on targeting 2.6 if other developers agree. Seems this >> is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. If a class only overrides __getattr__, then I agree it should only add to __dir__ (most likely by using a super call as Georg suggests). If it overrides __getattribute__, however, then it can actually deliberately block access to attributes that would otherwise be accessible, so it may make sense for it to alter the basic result of dir() instead of just adding more attributes to the end. Cheers, Nick.
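On an interpreter that actually honours the hook under discussion (it landed in a later release), the cooperative style Georg and Nick describe might look like the following sketch. The `Proxy` class is a made-up example, not code from the thread:

```python
class Proxy:
    """Delegate attribute access to a wrapped object, and advertise the
    wrapped object's attributes through dir() as well."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Called only for attributes not found the normal way;
        # forward them to the wrapped object.
        return getattr(self._target, name)

    def __dir__(self):
        # Extend the default listing rather than replace it, as the
        # thread recommends for classes that only add attributes.
        return sorted(set(super().__dir__()) | set(dir(self._target)))
```

dir(Proxy([])) then reports both the proxy's own attributes and list methods such as append, so introspection tools see the same names that __getattr__ will actually serve.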
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From martin at v.loewis.de Sun Nov 12 12:01:20 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Nov 2006 12:01:20 +0100 Subject: [Python-Dev] Passing floats to file.seek Message-ID: <4556FF00.3070108@v.loewis.de> Patch #1067760 deals with passing of float values to file.seek; the original version tries to fix the current implementation by converting floats to long long, rather than plain C long (thus supporting files larger than 2GiB). I propose a different approach: passing floats to seek should be an error. My version of the patch uses the index API, this will automatically give an error. Two questions: a) should floats be supported as parameters to file.seek b) if not, should Python 2.6 just deprecate such usage, or outright reject it? Regards, Martin From fredrik at pythonware.com Sun Nov 12 12:09:49 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 12 Nov 2006 12:09:49 +0100 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: <4556FF00.3070108@v.loewis.de> References: <4556FF00.3070108@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Patch #1067760 deals with passing of float values to file.seek; > the original version tries to fix the current implementation > by converting floats to long long, rather than plain C long > (thus supporting files larger than 2GiB). > > I propose a different approach: passing floats to seek should > be an error. My version of the patch uses the index API, this > will automatically give an error. > > Two questions: > a) should floats be supported as parameters to file.seek I don't really see why. > b) if not, should Python 2.6 just deprecate such usage, > or outright reject it? Python 2.5 silently accepts (and truncates) a float that's within range, so a warning sounds like the right thing to do for 2.6. 
note that read already produces such a warning: >>> f = open("hello.txt") >>> f.seek(1.5) >>> f.read(1.5) __main__:1: DeprecationWarning: integer argument expected, got float 'e' From anthony at interlink.com.au Sun Nov 12 16:23:08 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon, 13 Nov 2006 02:23:08 +1100 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> Message-ID: <200611130223.11460.anthony@interlink.com.au> On Sunday 12 November 2006 22:09, Fredrik Lundh wrote: > Martin v. L?wis wrote: > > Patch #1067760 deals with passing of float values to file.seek; > > the original version tries to fix the current implementation > > by converting floats to long long, rather than plain C long > > (thus supporting files larger than 2GiB). > > b) if not, should Python 2.6 just deprecate such usage, > > or outright reject it? > > Python 2.5 silently accepts (and truncates) a float that's within range, > so a warning sounds like the right thing to do for 2.6. note that read I agree that a warning seems best. If someone (for whatever reason) is flinging floats around where they actually meant to have ints, going straight to an error from silently truncating and accepting it seems a little bit harsh. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From fredrik at pythonware.com Sun Nov 12 18:47:16 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 12 Nov 2006 18:47:16 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module Message-ID: I guess I should remember, but what's the rationale for not including even a single concrete "tzinfo" implementation in the standard library? not even a UTC class? or am I missing something? 
From martin at v.loewis.de Sun Nov 12 20:14:47 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Nov 2006 20:14:47 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: <455772A7.6090800@v.loewis.de> Fredrik Lundh schrieb: > I guess I should remember, but what's the rationale for not including > even a single concrete "tzinfo" implementation in the standard library? > > not even a UTC class? > > or am I missing something? If you are asking for a time-zone database, such as pytz (http://sourceforge.net/projects/pytz/), then I think there are two reasons for why no such code is included: a) such a database is not available in standard C, or even in POSIX. So it is not possible to provide this functionality by wrapping a widely-available library. b) no code to provide such functionality has been contributed. Normally, b) would be the bigger issue. In this case, I think there might also be resistance to including a large database (as usual when inclusion of some database is proposed). Regards, Martin From fredrik at pythonware.com Sun Nov 12 21:55:57 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 12 Nov 2006 21:55:57 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: <455772A7.6090800@v.loewis.de> References: <455772A7.6090800@v.loewis.de> Message-ID: Martin v. L?wis wrote: >> I guess I should remember, but what's the rationale for not including >> even a single concrete "tzinfo" implementation in the standard library? >> >> not even a UTC class? >> >> or am I missing something? 
> > If you are asking for a time-zone database I was more thinking of basic stuff like the UTC, FixedOffset and LocalTimezone classes from the library reference: http://docs.python.org/lib/datetime-tzinfo.html I just wrote a small RSS generator; it took more time to sort out how to get strftime("%z") to print something meaningful than it took to write the rest of the code. would anyone mind if I added the above classes to the datetime module ? From barry at python.org Sun Nov 12 22:16:23 2006 From: barry at python.org (Barry Warsaw) Date: Sun, 12 Nov 2006 16:16:23 -0500 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: <455772A7.6090800@v.loewis.de> Message-ID: <57828D04-F0DD-497A-AE11-BB7BC5FD675F@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 12, 2006, at 3:55 PM, Fredrik Lundh wrote: > would anyone mind if I added the above classes to the datetime > module ? +1. I mean, we have an example of UTC in the docs, so, er, why not include it in the stdlib?! - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRVePLHEjvBPtnXfVAQIyGAQAi18TdI55P1vDp7sTuHS7eQMZmXMAr4+M 8i2RpWZrxtgi4c21J/qiwEIoY3KdANiUyzb8PbScf8LuFzZZTiDPsuMuTDC8IhBR w6bvU/AOpsmWpkuSKyjPaVdgZlOQ8IsHOJUQtYAVDsfMCh4D0Y65jMHENi1gYzud JJky5a6DifM= =BxZL -----END PGP SIGNATURE----- From rasky at develer.com Mon Nov 13 00:09:40 2006 From: rasky at develer.com (Giovanni Bajo) Date: Mon, 13 Nov 2006 00:09:40 +0100 Subject: [Python-Dev] Summer of Code: zipfile? Message-ID: <099a01c706af$a21f1d20$ce09f01b@bagio> Hello, wasn't there a project about the zipfile module in the Summer of Code? How did it go? Giovanni Bajo From guido at python.org Mon Nov 13 02:23:57 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Nov 2006 17:23:57 -0800 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: IMO it was an oversight. Or we were all exhausted. 
I keep copying those three classes from the docs, which is silly. :-) On 11/12/06, Fredrik Lundh wrote: > I guess I should remember, but what's the rationale for not including > even a single concrete "tzinfo" implementation in the standard library? > > not even a UTC class? > > or am I missing something? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Nov 13 02:25:32 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Nov 2006 17:25:32 -0800 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: <200611130223.11460.anthony@interlink.com.au> References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: On 11/12/06, Anthony Baxter wrote: > On Sunday 12 November 2006 22:09, Fredrik Lundh wrote: > > Martin v. L?wis wrote: > > > Patch #1067760 deals with passing of float values to file.seek; > > > the original version tries to fix the current implementation > > > by converting floats to long long, rather than plain C long > > > (thus supporting files larger than 2GiB). > > > > b) if not, should Python 2.6 just deprecate such usage, > > > or outright reject it? > > > > Python 2.5 silently accepts (and truncates) a float that's within range, > > so a warning sounds like the right thing to do for 2.6. note that read > > I agree that a warning seems best. If someone (for whatever reason) is > flinging floats around where they actually meant to have ints, going straight > to an error from silently truncating and accepting it seems a little bit > harsh. Right. There seem to be people who believe that 1e6 is an int. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Mon Nov 13 05:18:00 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 12 Nov 2006 20:18:00 -0800 Subject: [Python-Dev] Summer of Code: zipfile? 
In-Reply-To: <099a01c706af$a21f1d20$ce09f01b@bagio> References: <099a01c706af$a21f1d20$ce09f01b@bagio> Message-ID: You probably need to contact the authors for more info: https://svn.sourceforge.net/svnroot/ziparchive/ziparchive/trunk/ http://wiki.python.org/moin/SummerOfCode n -- On 11/12/06, Giovanni Bajo wrote: > Hello, > > wasn't there a project about the zipfile module in the Summer of Code? How did > it go? > > Giovanni Bajo > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/nnorwitz%40gmail.com > From fredrik at pythonware.com Mon Nov 13 08:48:46 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 13 Nov 2006 08:48:46 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: Guido van Rossum wrote: > IMO it was an oversight. Or we were all exhausted. I keep copying > those three classes from the docs, which is silly. :-) I'll whip up a patch. would the "embedded python module" approach I'm using for _elementtree be okay, or should this go into a support library ? From steve at holdenweb.com Mon Nov 13 11:08:17 2006 From: steve at holdenweb.com (Steve Holden) Date: Mon, 13 Nov 2006 04:08:17 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: Guido van Rossum wrote: > On 11/12/06, Anthony Baxter wrote: >> On Sunday 12 November 2006 22:09, Fredrik Lundh wrote: >>> Martin v. L?wis wrote: >>>> Patch #1067760 deals with passing of float values to file.seek; >>>> the original version tries to fix the current implementation >>>> by converting floats to long long, rather than plain C long >>>> (thus supporting files larger than 2GiB). 
>>>> b) if not, should Python 2.6 just deprecate such usage, >>>> or outright reject it? >>> Python 2.5 silently accepts (and truncates) a float that's within range, >>> so a warning sounds like the right thing to do for 2.6. note that read >> I agree that a warning seems best. If someone (for whatever reason) is >> flinging floats around where they actually meant to have ints, going straight >> to an error from silently truncating and accepting it seems a little bit >> harsh. > > Right. There seem to be people who believe that 1e6 is an int. > In which case an immediate transition to error status would seem to offer a way of providing an effective education. Deprecation may well be the best way to go for customer-friendliness, but anyone who believes 1e6 is an int should be hit with a stick. Next thing you know some damned fool is going to suggest that 1e6 gets parsed into a long integer. There, I feel better now. thank-you-for-listening-ly y'rs - steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From murman at gmail.com Mon Nov 13 15:23:43 2006 From: murman at gmail.com (Michael Urman) Date: Mon, 13 Nov 2006 08:23:43 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: On 11/13/06, Steve Holden wrote: > In which case an immediate transition to error status would seem to > offer a way of providing an effective education. Deprecation may well be > the best way to go for customer-friendliness, but anyone who believes > 1e6 is an int should be hit with a stick. Right, but what about those people who just didn't examine it? I consider myself a pretty good programmer, and was surprised by Guido's remark. A little quick self-education later, I understood. 
Still I find the implication that anyone using 1e6 for an integer should be (have all their users) beaten absurd in the context of backwards compatibility. Especially when they were using one of the less apparent floats in a place that accepted floats. Perhaps it would be a fine change for py3k. > Next thing you know some damned fool is going to suggest that 1e6 gets > parsed into a long integer. I can guess why it isn't, but it seems more a matter of ease than a matter of doing what's right. I had expected it to be an int because I thought of 1e6 as a shorthand for (1 * 10 ** 6), which is an int. 1e-6 would be (1 * 10 ** -6) which is a float. 1.0e6 would be (1.0 * 10 ** 6) which would also be a float. Clearly instead the e wins out as the format specifier. I'm not going to argue for it to be turned into an int, or even suggest it, after all compatibility with obscure realities of C is important. I'm just going to say that it makes more sense to me than your reaction indicates. -- Michael Urman http://www.tortall.net/mu/blog From skip at pobox.com Mon Nov 13 15:49:26 2006 From: skip at pobox.com (skip at pobox.com) Date: Mon, 13 Nov 2006 08:49:26 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: <17752.34294.638353.574267@montanaro.dyndns.org> >> Right. There seem to be people who believe that 1e6 is an int. ... Steve> Next thing you know some damned fool is going to suggest that 1e6 Steve> gets parsed into a long integer. Maybe in Py3k a decimal point should be required in floats using exponential notation - 1.e6 or 1.0e6 - with suitable deprecation warnings in 2.6+ about 1e6. 
Skip From steve at holdenweb.com Mon Nov 13 18:16:11 2006 From: steve at holdenweb.com (Steve Holden) Date: Mon, 13 Nov 2006 11:16:11 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: <17752.34294.638353.574267@montanaro.dyndns.org> References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> <17752.34294.638353.574267@montanaro.dyndns.org> Message-ID: <4558A85B.8020304@holdenweb.com> skip at pobox.com wrote: > >> Right. There seem to be people who believe that 1e6 is an int. > ... > Steve> Next thing you know some damned fool is going to suggest that 1e6 > Steve> gets parsed into a long integer. > > Maybe in Py3k a decimal point should be required in floats using exponential > notation - 1.e6 or 1.0e6 - with suitable deprecation warnings in 2.6+ about > 1e6. > My remarks weren't entirely tongue in cheek. Once you have long integers seamlessly integrated there is a case to be made that if the mantissa is integral then the literal should have an integral value. Then, of course, we'll get people complaining about the length of time it takes to compute expressions containing huge integers. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From guido at python.org Mon Nov 13 18:31:51 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Nov 2006 09:31:51 -0800 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: On 11/12/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > IMO it was an oversight. Or we were all exhausted. I keep copying > > those three classes from the docs, which is silly. :-) > > I'll whip up a patch. would the "embedded python module" approach I'm > using for _elementtree be okay, or should this go into a support library ? 
I'll leave that to the 2.6 management; I don't know what you're talking about and would rather keep it that way. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Tue Nov 14 22:51:23 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 14 Nov 2006 22:51:23 +0100 Subject: [Python-Dev] PyFAQ: help wanted with thread article Message-ID: (reposted from c.l.py) the following FAQ item talks about using sleep to make sure that threads run properly: http://effbot.org/pyfaq/none-of-my-threads-seem-to-run-why.htm I suspect it was originally written for the "thread" module, since as far as I know, the "threading" module takes care of the issues described here all by itself. so, should this item be removed? or can anyone suggest a rewrite that's more relevant for "threading" users? From amk at amk.ca Wed Nov 15 16:08:31 2006 From: amk at amk.ca (A.M. Kuchling) Date: Wed, 15 Nov 2006 10:08:31 -0500 Subject: [Python-Dev] Arlington sprint this Saturday Message-ID: <20061115150831.GA6153@rogue.amk.ca> The monthly Arlington VA sprint is this Saturday, November 18 2006, 9 AM - 6 PM. Please see http://wiki.python.org/moin/ArlingtonSprint for directions. --amk From martin at v.loewis.de Wed Nov 15 22:20:12 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 15 Nov 2006 22:20:12 +0100 Subject: [Python-Dev] 2.5 portability problems Message-ID: <455B848C.9070909@v.loewis.de> I'd like to share an observation on portability of extension modules to Python 2.5: python-ldap would crash on Solaris, see http://groups.google.com/group/comp.lang.python/msg/a678a969c90f21ab?dmode=source&hl=en It turns out that this was caused by a mismatch in malloc "families" (PyMem_Del vs. PyObject_Del): http://sourceforge.net/tracker/index.php?func=detail&aid=1575329&group_id=2072&atid=102072 So if Python 2.5 crashes in malloc/free, it's probably a good guess that some extension module failed to use correct APIs. 
There is probably not much we can do about this: it's already mentioned in "Porting to 2.5" of whatsnew25. It would be good if people were aware of this issue (and the other changes to the C API); thus I hope that this message/thread makes it to the python-dev summary :-) Regards, Martin From g.brandl at gmx.net Wed Nov 15 23:15:07 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 15 Nov 2006 23:15:07 +0100 Subject: [Python-Dev] Results of the SOC projects Message-ID: Hi, this might seem a bit late, and perhaps I was just blind, but I miss something like a summary how the Python summer of code projects went, and what the status of the ones that were meant to improve the standard library, e.g. the C decimal implementation, is. cheers, Georg From nilton.volpato at gmail.com Thu Nov 16 20:37:10 2006 From: nilton.volpato at gmail.com (Nilton Volpato) Date: Thu, 16 Nov 2006 17:37:10 -0200 Subject: [Python-Dev] Summer of Code: zipfile? In-Reply-To: <099a01c706af$a21f1d20$ce09f01b@bagio> References: <099a01c706af$a21f1d20$ce09f01b@bagio> Message-ID: <27fef5640611161137y29ee8eb5g6cfa42c80195b1c@mail.gmail.com> Hi Giovanni, I'm the author of the new zipfile module, which has come to be named ziparchive. The SoC project was mentored by Ilya Etingof. It's available through sourceforge page [1,2], were you can download a package for it, and also through svn [3]. The current implementation is working nicely, and includes the initially proposed features, which includes: file-like access to zip members; support for BZIP2 compression; support for member file removal; and support for encryption. However, I'm not fully satisfied with the current API niceness (and some of its limitations), and I'm working on a somewhat new design, which will start within the next version. So, it would be very nice to get suggestions, ideas and criticism about the current version so that the next one can be better still. So, I encourage whoever is interested to download and try it. 
There are some examples in the code and in the project home page [2]. And, please, send some feedback, which will help make this the ultimate zip library for python. :-) [1] http://sourceforge.net/projects/ziparchive [2] http://ziparchive.sourceforge.net/ [3] https://svn.sourceforge.net/svnroot/ziparchive/ziparchive/ Cheers, -- Nilton On 11/12/06, Giovanni Bajo wrote: > Hello, > > wasn't there a project about the zipfile module in the Summer of Code? How did > it go? > > Giovanni Bajo From brett at python.org Thu Nov 16 20:41:16 2006 From: brett at python.org (Brett Cannon) Date: Thu, 16 Nov 2006 11:41:16 -0800 Subject: [Python-Dev] Results of the SOC projects In-Reply-To: References: Message-ID: On 11/15/06, Georg Brandl wrote: > > Hi, > > this might seem a bit late, and perhaps I was just blind, > but I miss something like a summary how the Python > summer of code projects went, and what the status of the ones > that were meant to improve the standard library, e.g. the > C decimal implementation, is. There was never a formal one to my knowledge. Part of the problem is that the PSF acted as a blanket organization this year so we just basically helped dole out slots to various Python projects. This meant it was not under very centralized control and thus not easy to track. Anyway, as for the python-dev projects, there is an email in another thread about the zip work. As for the adding of logging to the stdlib modules or the decimal in C, we need the mentors to step forward and say something about that. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061116/89a372cb/attachment.htm From fredrik at pythonware.com Thu Nov 16 21:49:34 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 16 Nov 2006 21:49:34 +0100 Subject: [Python-Dev] 2.5 portability problems In-Reply-To: <455B848C.9070909@v.loewis.de> References: <455B848C.9070909@v.loewis.de> Message-ID: Martin v. L?wis wrote: > I'd like to share an observation on portability of extension > modules to Python 2.5: python-ldap would crash on Solaris, see > > http://groups.google.com/group/comp.lang.python/msg/a678a969c90f21ab?dmode=source&hl=en > > It turns out that this was caused by a mismatch in malloc > "families" (PyMem_Del vs. PyObject_Del): I was just hit *hard* by this issue (in an extension that worked perfectly well under all test cases, and all but one demo script, which happened to be the only one that happened to do a certain trivial operation more than 222 times), so I added a FAQ entry: http://effbot.org/pyfaq/why-does-my-c-extension-suddenly-crash-under-2.5.htm feel free to add symptoms or other observations for other platforms and/or extensions. cheers /F From turnbull at sk.tsukuba.ac.jp Fri Nov 17 02:49:04 2006 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 17 Nov 2006 10:49:04 +0900 Subject: [Python-Dev] Results of the SOC projects In-Reply-To: References: Message-ID: <878xiaj10v.fsf@uwakimon.sk.tsukuba.ac.jp> Brett Cannon writes: > There was never a formal one to my knowledge. Part of the problem is that > the PSF acted as a blanket organization this year so we just basically > helped dole out slots to various Python projects. This meant it was not > under very centralized control and thus not easy to track. I don't think you need "centralization" or "control"; the Python mentors are all public spirited and responsible folks, right? 
It's just that report-writing is kind of unrewarding work, especially if you don't know what the report is supposed to be like (and haven't even been asked for them!) Why not have a wiki page for reports, and hand out a T-shirt or something like that to *mentors* who file their reports? Somebody at the PSF should sit down, think about what the report really needs to say from their point of view, and buy a pizza (as well as the T-shirt!) for somebody trusted to write a good but *minimal* report. Then point to that: "Here's the quality of prose and citation you need to aspire to, here's the minimum length and content you *must* include." Report-writing of this kind is for the *mentors*: you want to know who supervises well, and eventually do meta-mentoring. Of course the participants should be writing reports too, but this page should link to those reports. You'll get them; the mentor's T-shirt ("Somebody participated in the Summer of Code and all I got is this lousy T-shirt") is at stake! From kbk at shore.net Fri Nov 17 05:38:56 2006 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Thu, 16 Nov 2006 23:38:56 -0500 (EST) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200611170438.kAH4cu59022987@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 416 open (-14) / 3463 closed (+16) / 3879 total ( +2) Bugs : 930 open ( +8) / 6333 closed (+17) / 7263 total (+25) RFE : 244 open ( -1) / 244 closed ( +3) / 488 total ( +2) New / Reopened Patches ______________________ tkSimpleDialog freezes when apply raises exception (2006-11-11) http://python.org/sf/1594554 opened by Hirokazu Yamamoto Iterating closed StringIO.StringIO (2005-11-18) http://python.org/sf/1359365 reopened by doerwalter Cross compiling patches for MINGW (2006-11-16) http://python.org/sf/1597850 opened by Han-Wen Nienhuys Patches Closed ______________ `in` for classic object causes segfault (2006-11-07) http://python.org/sf/1591996 closed by loewis askyesnocancel helper for tkMessageBox (2005-11-08) http://python.org/sf/1351744 closed by loewis PyErr_CheckSignals returns -1 on error, not 1 (2006-11-07) http://python.org/sf/1592072 closed by gbrandl make pty.fork() allocate a controlling tty (2003-11-08) http://python.org/sf/838546 closed by loewis Add missing elide argument to Text.search (2006-11-07) http://python.org/sf/1592250 closed by loewis mailbox: use fsync() to ensure data is really on disk (2006-06-29) http://python.org/sf/1514544 closed by akuchling mailbox (Maildir): avoid losing messages on name clash (2006-06-29) http://python.org/sf/1514543 closed by akuchling Fix struct.pack on 64-bit archs (broken on 2.*) (2004-10-02) http://python.org/sf/1038854 closed by loewis Cross building python for mingw32 (2003-11-13) http://python.org/sf/841454 closed by loewis httplib: allowing stream-type body part in requests (2004-11-12) http://python.org/sf/1065257 closed by loewis support whence argument for GzipFile.seek (bug #1316069) (2005-11-12) http://python.org/sf/1355023 closed by loewis fix for 1067728: Better handling of float 
arguments to seek (2004-11-17) http://python.org/sf/1067760 closed by loewis ftplib transfer problem with certain servers (2005-11-17) http://python.org/sf/1359217 closed by loewis bdist_rpm still can't handle dashes in versions (2005-11-18) http://python.org/sf/1360200 closed by loewis Fix the vc8 solution files (2006-08-19) http://python.org/sf/1542946 closed by krisvale Practical ctypes example (2006-09-15) http://python.org/sf/1559219 closed by theller New / Reopened Bugs ___________________ Unfortunate naming of variable in heapq example (2006-11-08) CLOSED http://python.org/sf/1592533 opened by Martin Thorsen Ranang gettext has problems with .mo files that use non-ASCII chars (2006-11-08) CLOSED http://python.org/sf/1592627 opened by Russell Phillips replace groups doesn't work in this special case (2006-11-06) http://python.org/sf/1591319 reopened by tomek74 readline problem on ia64-unknown-linux-gnu (2006-11-08) http://python.org/sf/1593035 opened by Kate Minola No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593384 opened by A_V_I No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593407 opened by A_V_I No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593442 opened by A_V_I site-packages isn't created before install_egg_info (2006-09-28) CLOSED http://python.org/sf/1566719 reopened by loewis Modules/unicodedata.c contains C++-style comment (2006-11-09) CLOSED http://python.org/sf/1593525 opened by Mike Kent No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593634 opened by A_V_I poor urllib error handling (2006-11-09) http://python.org/sf/1593751 opened by Guido van Rossum small problem with description (2006-11-09) CLOSED http://python.org/sf/1593829 opened by Atlas Word should be changed on page 3.6.1 (2006-11-11) CLOSED http://python.org/sf/1594742 opened by jikanter Make docu for dict.update more clear (2006-11-11) CLOSED http://python.org/sf/1594758 opened by Christoph Zwerschke make install fails, various 
modules do not work (2006-11-11) CLOSED http://python.org/sf/1594809 opened by Evan doctest simple usage recipe is misleading (2006-11-12) http://python.org/sf/1594966 opened by Ken Rimey smtplib.SMTP.sendmail() does not provide transparency (2006-11-12) CLOSED http://python.org/sf/1595045 opened by Avi Kivity texinfo library documentation fails to build (2006-11-12) http://python.org/sf/1595164 opened by Mark Diekhans User-agent header added by an opener is "frozen" (2006-11-13) http://python.org/sf/1595365 opened by Bj?rn Steinbrink parser module bug for nested try...except statements (2006-11-13) CLOSED http://python.org/sf/1595594 opened by Kay Schluehr SocketServer allow_reuse_address checked in constructor (2006-11-13) http://python.org/sf/1595742 opened by Peter Parente read() in windows stops on chr(26) (2006-11-13) http://python.org/sf/1595822 opened by reson5 KeyError at exit after 'import threading' in other thread (2006-11-14) http://python.org/sf/1596321 opened by Christian Walther HTTP headers (2006-11-15) http://python.org/sf/1597000 opened by Hugo Leisink Reading with bz2.BZ2File() returns one garbage character (2006-11-15) http://python.org/sf/1597011 opened by Clodoaldo Pinto Neto Can't exclude words before capture group (2006-11-15) CLOSED http://python.org/sf/1597014 opened by Cees Timmerman sqlite timestamp converter bug (floating point) (2006-11-15) http://python.org/sf/1597404 opened by Michael Salib "Report website bug" -> Forbidden :( (2006-11-16) CLOSED http://python.org/sf/1597570 opened by Jens Diemer Modules/readline.c fails to compile on AIX 4.2 (2006-11-16) http://python.org/sf/1597798 opened by Mike Kent atexit.register does not return the registered function. 
(2006-11-16) CLOSED http://python.org/sf/1597824 opened by Pierre Rouleau Python/ast.c:541: seq_for_testlist: Assertion fails (2006-11-16) CLOSED http://python.org/sf/1597930 opened by Darrell Schiebel Top-level exception handler writes to stdout unsafely (2006-11-16) http://python.org/sf/1598083 opened by Jp Calderone Bugs Closed ___________ Unfortunate naming of variable in heapq example (2006-11-08) http://python.org/sf/1592533 closed by gbrandl gettext has problems with .mo files that use non-ASCII chars (2006-11-08) http://python.org/sf/1592627 closed by avantman42 problem building python in vs8express (2006-11-06) http://python.org/sf/1591122 closed by loewis No IDLE in Windows (2006-11-09) http://python.org/sf/1593384 closed by loewis No IDLE in Windows (2006-11-09) http://python.org/sf/1593407 deleted by akuchling mailbox.Maildir.get_folder() loses factory information (2006-10-03) http://python.org/sf/1569790 closed by akuchling No IDLE in Windows (2006-11-09) http://python.org/sf/1593442 deleted by gbrandl site-packages isn't created before install_egg_info (2006-09-28) http://python.org/sf/1566719 closed by pje Modules/unicodedata.c contains C++-style comment (2006-11-09) http://python.org/sf/1593525 closed by doerwalter No IDLE in Windows (2006-11-09) http://python.org/sf/1593634 deleted by gbrandl small problem with description (2006-11-09) http://python.org/sf/1593829 deleted by bauersj Word should be changed on page 3.6.1 (2006-11-11) http://python.org/sf/1594742 closed by gbrandl Make docu for dict.update more clear (2006-11-11) http://python.org/sf/1594758 closed by gbrandl make install fails, various modules do not work (2006-11-11) http://python.org/sf/1594809 closed by loewis gzip.GzipFile.seek missing second argument (2005-10-07) http://python.org/sf/1316069 closed by loewis smtplib.SMTP.sendmail() does not provide transparency (2006-11-12) http://python.org/sf/1595045 deleted by avik parser module bug for nested try...except statements 
(2006-11-13) http://python.org/sf/1595594 closed by gbrandl Can't exclude words before capture group (2006-11-15) http://python.org/sf/1597014 closed by gbrandl "Report website bug" -> Forbidden :( (2006-11-16) http://python.org/sf/1597570 closed by gbrandl atexit.register does not return the registered function. (2006-11-16) http://python.org/sf/1597824 closed by gbrandl quoted printable parse the sequence '= ' incorrectly (2006-10-31) http://python.org/sf/1588217 closed by gbrandl Python/ast.c:541: seq_for_testlist: Assertion fails (2006-11-16) http://python.org/sf/1597930 closed by gbrandl New / Reopened RFE __________________ "".translate() docs should mention string.maketrans() (2006-11-08) http://python.org/sf/1592899 opened by Ori Avtalion base64 doc Python 2.3 <-> 2.4 (2006-11-16) CLOSED http://python.org/sf/1597576 opened by Jens Diemer RFE Closed __________ wsgi.org link in wsgiref (2006-08-18) http://python.org/sf/1542920 closed by akuchling Move gmtime function from calendar to time module (2003-03-05) http://python.org/sf/697985 closed by akuchling base64 doc Python 2.3 <-> 2.4 (2006-11-16) http://python.org/sf/1597576 closed by gbrandl From python-dev at zesty.ca Sat Nov 18 18:28:23 2006 From: python-dev at zesty.ca (Ka-Ping Yee) Date: Sat, 18 Nov 2006 11:28:23 -0600 (CST) Subject: [Python-Dev] Python in first-year MIT core curriculum Message-ID: Wow. Did you catch this news? http://www-tech.mit.edu/V125/N65/coursevi.html The first four weeks of C1 will be a lot like the first four weeks of 6.001, Abelson said. The difference is that programming will be done in Python and not Scheme. -- ?!ng From fredrik at pythonware.com Sat Nov 18 19:05:05 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 18 Nov 2006 19:05:05 +0100 Subject: [Python-Dev] Python in first-year MIT core curriculum In-Reply-To: References: Message-ID: Ka-Ping Yee wrote: > Wow. Did you catch this news? 
> > http://www-tech.mit.edu/V125/N65/coursevi.html > > The first four weeks of C1 will be a lot like the first > four weeks of 6.001, Abelson said. The difference is > that programming will be done in Python and not Scheme. "This story was published on Wednesday, February 1, 2006." ;-) From brett at python.org Sat Nov 18 22:02:16 2006 From: brett at python.org (Brett Cannon) Date: Sat, 18 Nov 2006 13:02:16 -0800 Subject: [Python-Dev] discussion of schema for new issue tracker starting Message-ID: Discussion of what we want in terms of the schema for the new issue tracker has begun. If you wish to give feedback on what you would like each issue to have in terms of data then please file an issue in the meta tracker at http://psf.upfronthosting.co.za/roundup/meta/ . You can see the current test tracker at http://psf.upfronthosting.co.za/roundup/tracker/ . And the tracker-discuss mailing list is at http://mail.python.org/mailman/listinfo/tracker-discuss (although you can bypass the list and use the meta tracker to your ideas relating to the schema). If you do participate through the meta tracker please sign up for an account so that it is not anonymous. I really hope that Anthony and Neal can participate so that we can make sure the tracker does what they need to make their lives easier during a release. And obviously everyone who still works with bugs and patches should participate as well. We can change the schema even after we launch to the new tracker, but it would be nice to minimize the amount of feature churn once the tracker is up and going. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061118/95531b46/attachment.html From martin at v.loewis.de Sun Nov 19 11:58:37 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 19 Nov 2006 11:58:37 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook Message-ID: <456038DD.4040304@v.loewis.de> Patch #849407 proposes to change the meaning of the urllib reporthook so that it takes the amount of the data read instead of the block size as its second argument. While this is a behavior change (and even for explicitly-documented behavior), I still propose to apply the change: - in many cases, the number of bytes read will equal to the block size, so no change should occur - the signature (number of parameters) does not change, so applications shouldn't crash because of that change - applications that do use the parameter to estimate total download time now get a better chance to estimate since they learn about short reads. What do you think? Regards, Martin From phd at phd.pp.ru Mon Nov 20 09:44:57 2006 From: phd at phd.pp.ru (Oleg Broytmann) Date: Mon, 20 Nov 2006 11:44:57 +0300 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <456038DD.4040304@v.loewis.de> References: <456038DD.4040304@v.loewis.de> Message-ID: <20061120084457.GF32570@phd.pp.ru> On Sun, Nov 19, 2006 at 11:58:37AM +0100, "Martin v. L?wis" wrote: > - the signature (number of parameters) does not > change, so applications shouldn't crash because > of that change I am slightly worried about the change in semantics. > - applications that do use the parameter to > estimate total download time now get a better > chance to estimate since they learn about > short reads. +1 Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
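[Summary note: the thread above debates whether the reporthook's second argument should carry the nominal block size or the actual bytes read. A minimal sketch of the difference, not taken from any message in the thread — the hook name and simulated byte counts are illustrative only:]

```python
def make_reporthook():
    # A urlretrieve-style progress hook: urllib calls
    # hook(count, block_size, total_size) once per block.  The classic
    # count * block_size estimate over-counts on a short final read;
    # under Martin's proposed semantics the second argument is the
    # actual number of bytes read, so simply summing it stays exact.
    state = {"read": 0}
    def hook(count, block_size, total_size):
        state["read"] += block_size
        if total_size > 0:
            print("%d%%" % (100 * state["read"] // total_size))
    return hook, state

hook, state = make_reporthook()
# Simulate a 25600-byte transfer: three full 8192-byte blocks,
# then one short read of 1024 bytes at the end.
for i, nbytes in enumerate((8192, 8192, 8192, 1024), 1):
    hook(i, nbytes, 25600)
print(state["read"])  # 25600 -- exact, even with the short last read
```

With count * block_size the same transfer would report 4 * 8192 = 32768 bytes, which is the breakage Fredrik describes below.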
From fredrik at pythonware.com Mon Nov 20 13:20:01 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 20 Nov 2006 13:20:01 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook References: <456038DD.4040304@v.loewis.de> Message-ID: Martin v. Löwis wrote: > While this is a behavior change (and even for > explicitly-documented behavior), I still propose > to apply the change: > - in many cases, the number of bytes read will > equal to the block size, so no change should > occur > - the signature (number of parameters) does not > change, so applications shouldn't crash because > of that change > - applications that do use the parameter to > estimate total download time now get a better > chance to estimate since they learn about > short reads. haven't used the reporthook, but my reading of the documentation would have led me to believe that I should do count*blocksize to determine how much data I've gotten this far. changing the blocksize without setting the count to zero would break such code. From jimjjewett at gmail.com Mon Nov 20 17:57:31 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 20 Nov 2006 11:57:31 -0500 Subject: [Python-Dev] Results of the SOC projects Message-ID: Brett: > As for the adding of logging to the stdlib modules ... we need the mentors > to step forward and say something about that. The logging additions are not ready for stdlib inclusion at this time. Some modules are closer than others, but whether it makes sense to add them piecemeal is a different question. -jJ From fredrik at pythonware.com Mon Nov 20 20:34:10 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 20 Nov 2006 20:34:10 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations Message-ID: the FAQ contains a list of "atomic" operations, and someone recently asked whether += belongs to this group. 
can anyone who knows the answer perhaps add a comment to: http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm ? (other comments on that page are of course also welcome) From martin at v.loewis.de Mon Nov 20 23:45:05 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 20 Nov 2006 23:45:05 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: References: <456038DD.4040304@v.loewis.de> Message-ID: <45622FF1.4030202@v.loewis.de> Fredrik Lundh schrieb: > haven't used the reporthook, but my reading of the documentation would have led me > to believe that I should do count*blocksize to determine how much data I've gotten this > far. changing the blocksize without setting the count to zero would break such code. Right - such code would break. I believe the code would also break when the count is set to zero; I can't see how this would help. The question is whether this breakage is a strong enough reason not to change the code. Regards, Martin From martin at v.loewis.de Mon Nov 20 23:55:42 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 20 Nov 2006 23:55:42 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: References: Message-ID: <4562326E.904@v.loewis.de> Fredrik Lundh schrieb: > the FAQ contains a list of "atomic" operations, and someone recently > asked whether += belongs to this group. In general, += isn't atomic: it may invoke __add__ or __iadd__ on the left-hand side, or __radd__ on the right-hand side. From your list, I agree with Josiah Carlson's observation that the examples you give involve separate name lookups (e.g. L.append(x) loads L, then fetches L.append, then loads x, then calls append, each in a single opcode); the actual operation is atomic. 
If you only look at the actual operation, then these aren't atomic: x.field = y # may invoke __setattr__, may also be a property D[x] = y # may invoke x.__hash__, and x.__eq__ I'm uncertain whether D1.update(D2) will invoke callbacks (it probably will). Regards, Martin From scott+python-dev at scottdial.com Tue Nov 21 00:23:59 2006 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 20 Nov 2006 18:23:59 -0500 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: References: <456038DD.4040304@v.loewis.de> Message-ID: <4562390F.8090703@scottdial.com> Fredrik Lundh wrote: > haven't used the reporthook, but my reading of the documentation would have led me > to believe that I should do count*blocksize to determine how much data I've gotten this > far. changing the blocksize without setting the count to zero would break such code. > > > I'm not sure where the error in your reading happened, but I read the docs and got the same thing out of it except that there is no problem with Martin's change. This API doesn't seem to make much sense anyways because who is going to be interested in the count? Fixing the count to one always and setting blocksize to the actual amount of data makes the most sense in recovering this API. The only potential problem is if there is a non-null answer to "who is going to be interested in the count?" 
-- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From julvar at tamu.edu Mon Nov 13 06:27:49 2006 From: julvar at tamu.edu (Julian) Date: Sun, 12 Nov 2006 23:27:49 -0600 Subject: [Python-Dev] Suggestion/ feature request Message-ID: <014701c706e4$75612780$24b75ba5@aero.ad.tamu.edu> Hello, I am using python with swig and I get a lot of macro redefinition warnings like so: warning C4005: '_CRT_SECURE_NO_DEPRECATE' : macro redefinition In the file - pyconfig.h - rather than the following lines, I was wondering if it would be more reasonable to use #ifdef statements as shown in the bottom of the email... #define _CRT_SECURE_NO_DEPRECATE 1 #define _CRT_NONSTDC_NO_DEPRECATE 1 #if !defined(_CRT_SECURE_NO_DEPRECATE) # define _CRT_SECURE_NO_DEPRECATE #endif #if !defined(_CRT_NONSTDC_NO_DEPRECATE) # define _CRT_NONSTDC_NO_DEPRECATE #endif Just a suggestion... Thanks for reading! Julian. From matt.kern at undue.org Fri Nov 17 13:15:22 2006 From: matt.kern at undue.org (Matt Kern) Date: Fri, 17 Nov 2006 12:15:22 +0000 Subject: [Python-Dev] POSIX Capabilities Message-ID: <20061117121522.GA13677@pling.qwghlm.org> I was looking around for an interface to POSIX capabilities from Python under Linux. I couldn't find anything that did the job, so I wrote the attached PosixCapabilities module. It has a number of shortcomings: * it is written using ctypes to interface directly to libcap; * it assumes the sizes/types of various POSIX defined types; * it only gets/sets process capabilities; * it can test/set/clear capability flags. Despite the downsides, I think it would be good to get the package out there. If anyone wishes to adopt it, update it, rewrite it and/or put it into the distribution, then feel free. Regards, Matt -- Matt Kern http://www.undue.org/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PosixCapabilities.py Type: text/x-python Size: 7374 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061117/4db99f3b/attachment-0001.py From kate01123 at gmail.com Sat Nov 18 20:40:45 2006 From: kate01123 at gmail.com (Kate Minola) Date: Sat, 18 Nov 2006 14:40:45 -0500 Subject: [Python-Dev] [1593035] Re: readline problem with python-2.5 Message-ID: <9c27041b0611181140y2ac41d89m2815225db3b42cd9@mail.gmail.com> I have a fix to my bug report 1593035 regarding python-2.5 not working with readline on ia64-Linux. This bug was found while trying to port SAGE (http://modular.math.washington.edu/sage/) to ia64-Linux. The problem is caused by the line of Modules/readline.c in flex_complete() return completion_matches(text, *on_completion); In readline-5.2, completion_matches() is defined in compat.c as char ** completion_matches(const char *,rl_compentry_func_t *); But in Modules/readline.c completion_matches() by default is assumed to return an int, and on_completion() is defined as char *. To fix the problem, both the function itself and the second argument need to be cast to the correct types in Modules/readline.c/flex_complete() return (char **) completion_matches(text, (rl_compentry_func_t *)*on_completion); and completion_matches needs to be defined as an external function. 
I added the following else clause to the ifdef at the top of Modules/readline.c/flex_complete() #ifdef HAVE_RL_COMPLETION_MATCHES #define completion_matches(x, y) \ rl_completion_matches((x), ((rl_compentry_func_t *)(y))) #else extern char ** completion_matches(const char *,rl_compentry_func_t *); #endif Kate Minola University of Maryland, College Park From guido at python.org Tue Nov 21 02:24:21 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Nov 2006 17:24:21 -0800 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <45622FF1.4030202@v.loewis.de> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> Message-ID: Is there any reason to assume the data size is ever less than the block size except for the last data block? It's reading from a pseudo-file tied to a socket, but Python files tend to have the property that read(n) returns exactly n bytes unless at EOF. BTW I left a longer comment at SF earlier. On 11/20/06, "Martin v. L?wis" wrote: > Fredrik Lundh schrieb: > > haven't used the reporthook, but my reading of the documentation would have led me > > to believe that I should do count*blocksize to determine how much data I've gotten this > > far. changing the blocksize without setting the count to zero would break such code. > > Right - such code would break. I believe the code would also break when > the count is set to zero; I can't see how this would help. > > The question is whether this breakage is a strong enough reason not to > change the code. 
> > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From exarkun at divmod.com Tue Nov 21 03:53:04 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 20 Nov 2006 21:53:04 -0500 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4562326E.904@v.loewis.de> Message-ID: <20061121025304.20948.524767534.divmod.quotient.36726@ohm> On Mon, 20 Nov 2006 23:55:42 +0100, "\"Martin v. L?wis\"" wrote: >Fredrik Lundh schrieb: >> the FAQ contains a list of "atomic" operations, and someone recently >> asked whether += belongs to this group. > >In general, += isn't atomic: it may invoke __add__ or __iadd__ on the >left-hand side, or __radd__ on the right-hand side. > >From your list, I agree with Josiah Carlson's observation that the >examples you give involve separate name lookups (e.g. L.append(x) >loads L, then fetches L.append, then loads x, then calls append, >each in a single opcode); the actual operation is atomic. > >If you only look at the actual operation, then these aren't atomic: > >x.field = y # may invoke __setattr__, may also be a property >D[x] = y # may invoke x.__hash__, and x.__eq__ > >I'm uncertain whether D1.update(D2) will invoke callbacks (it >probably will). Quite so: >>> class X: ... def __del__(self): ... print 'X.__del__' ... >>> a = {1: X()} >>> b = {1: 2} >>> a.update(b) X.__del__ >>> Jean-Paul From steven.bethard at gmail.com Tue Nov 21 05:05:22 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 20 Nov 2006 21:05:22 -0700 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-10-01 to 2006-10-15 Message-ID: Here's the summary for the first half of October. As always, comments and corrections are greatly appreciated. 
============= Announcements ============= ----------------------------- QOTF: Quotes of the Fortnight ----------------------------- Martin v. Löwis on a small change to Python that wouldn't affect many applications: I'm pretty sure someone will notice, though; someone always notices. Contributing thread: - `Caching float(0.0) `__ Steve Holden reminds us that patch submissions are dramatically preferred to verbose thread discussions: This thread has disappeared down a rat-hole, never to re-emerge with anything of significant benefit to users. C'mon, guys, implement a patch or leave it alone :-) Contributing thread: - `Caching float(0.0) `__ ========= Summaries ========= -------------- Caching floats -------------- Nick Craig-Wood discovered that he could save 7MB in his application by adding the following simple code:: if age == 0.0: age = 0.0 A large number of his calculations were producing the value 0.0, which meant that many copies of 0.0 were being stored. Since all 0.0 literals refer to the same object, the code above was removing all the duplicate copies of 0.0. Skip Montanaro played around a bit with floatobject.c, and found that Python's test suite allocated a large number of small integral floats (though only a couple hundred were generally allocated at the same time). Kristján V. Jónsson played around with caching for float values between -10.0 and 10.0 with the EVE server and got a 25% savings in allocations. There was some concern that for systems with both +0.0 and -0.0, the cache might cause problems, since determining which zero you have seemed difficult. However, James Y Knight showed how to do this fairly easily in C with a double/uint64_t union. Eventually, people agreed that it should be fine to just cache +0.0. Kristján V. Jónsson and Josiah Carlson proposed patches, but nothing was posted to SourceForge. 
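[Summary note: the union trick referenced above has a straightforward Python analogue; the following sketch is illustrative only and is not from the thread — it compares raw IEEE-754 bytes with ``struct``, mirroring the C double/uint64_t comparison.]

```python
import struct

def is_plus_zero(x):
    # -0.0 == 0.0 is True, so an equality test alone cannot decide
    # which zero a value is.  Comparing the raw IEEE-754 bytes can:
    # the two zeros differ only in the sign bit.
    return x == 0.0 and struct.pack("<d", x) == struct.pack("<d", 0.0)

print(-0.0 == 0.0)         # True: why == alone is not enough
print(is_plus_zero(0.0))   # True: safe to return the cached object
print(is_plus_zero(-0.0))  # False: sign bit set, don't cache it
```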
Contributing threads: - `Caching float(0.0) `__ - `Caching float(0.0) `__ -------------------------------------------- Buffer overrun in repr() and Python releases -------------------------------------------- The implications of PSF-2006-001_, a buffer overrun problem in repr(), were considered for the various Python releases. The bug had been fixed before Python 2.5 was released, and had been applied to the Python 2.4 branch shortly before Python 2.4.4 was released. The security advisory provided patches for both Python 2.3 and 2.4, but to make sure that full source releases were available for all major versions of Python still in use, it looked like there would be a source-only 2.3.6 release (source-only because neither Mac nor Windows builds were affected). .. _PSF-2006-001: http://www.python.org/news/security/PSF-2006-001/ Contributing threads: - `Security Advisory for unicode repr() bug? `__ - `2.3.6 for the unicode buffer overrun `__ --------------------------- Build system for python.org --------------------------- Anthony, Barry Warsaw, Georg Brandl and others indicated that the current website build system was making releases and other updates more difficult than they should be. Most people didn't have enough cycles to spare for this, but Michael Foord said he could help with a transition to rest2web_ if that was desirable. Fredrik Lundh also suggested a few options, including his older proposal to `use Django`_. No definite plans were made though. .. _rest2web: http://www.voidspace.org.uk/python/rest2web/ .. _use Django: http://effbot.org/zone/pydotorg-cache.htm Contributing thread: - `2.3.6 for the unicode buffer overrun `__ -------------------- String concatenation -------------------- Larry Hastings posted a `patch for string concatenation`_ that delays the creation of a new string until someone asks for the string's value. 
As a result, the following code would be about as fast as the ``''.join(strings)`` idiom:: result = '' for s in strings: result += s To achieve this, he had to change ``PyStringObject.ob_sval`` from a ``char[1]`` array, to a ``char *``. Reaction was mixed -- some people really disliked using ``join()``, while others didn't see the need for such a change. .. _patch for string concatenation: http://bugs.python.org/1569040 Contributing thread: - `PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom `__ ---------------------------- PEP 315: Enhanced While Loop ---------------------------- Hans Polak revived the discussion about `PEP 315`_, which proposes a do-while loop for Python that would allow the current code:: while True: <setup code> if not <condition>: break to be written instead as:: do: <setup code> while <condition>: Hans was hoping to simplify the situation where there is no ``<loop body>`` following the ``<condition>`` test and a number of syntax suggestions were proposed to this end. In the end, Guido indicated that none of the suggestions were acceptable, and Raymond Hettinger offered to withdraw the PEP. .. _PEP 315: http://www.python.org/dev/peps/pep-0315/ Contributing threads: - `PEP 351 - do while `__ - `PEP 351 - do while `__ - `PEP 315 - do while `__ ------------------------------ PEP 355: path objects rejected ------------------------------ Luis P Caamano asked about the status of `PEP 355`_, which aimed to introduce an object-oriented reorganization of Python's path-related functions. Guido indicated that the current "amalgam of unrelated functionality" was unacceptable and pronounced it dead. 
Nick Coghlan elaborated the "amalgam" point, explaining that `PEP 355`_ lumped together all the following: - string manipulation operations - abstract path manipulation operations - read-only traversal of a concrete filesystem - addition and removal of files/directories/links within a concrete filesystem Jason Orendorff pointed out some other problems with the PEP: - the motivation was weak - the API had too many methods - it didn't fix all the perceived problems with the existing APIs - it would have introduced a Second Way To Do It without being clearly better than the current way There were some rumors of a new PEP based on Twisted's filepath_ module, but nothing concrete at the time of this summary. .. _PEP 355: http://www.python.org/dev/peps/pep-0355/ .. _filepath: http://twistedmatrix.com/trac/browser/trunk/twisted/python/filepath.py Contributing threads: - `PEP 355 status `__ - `PEP 355 status `__ ---------------------------------- Processes and threading module API ---------------------------------- Richard Oudkerk proposed a module that would make processes usable with an API like that of the threading module. People seemed unsure as to whether it would be better to have a threading-style or XML-RPC-style API. A few other relevant modules were identified, including PyXMLRPC_ and POSH_. No clear winner emerged. .. _PyXMLRPC: http://sourceforge.net/projects/py-xmlrpc/ .. _POSH: http://poshmodule.sourceforge.net/ Contributing thread: - `Cloning threading.py using proccesses `__ ------------------------------------ Python 3.0: registering methods in C ------------------------------------ After a brief exchange about which ``tp_flags`` implied which others, there was some discussion on how to simplify ``tp_flags`` for Python 3000. Raymond Hettinger suggested that the NULL or non-NULL status of a slot should be enough to indicate its presence. Martin v. 
L?wis pointed out that this would require recompiling extension modules for every release, since if a new slot is added, extension modules from earlier releases wouldn't even *have* the slot. Fredrik Lundh suggested a `dynamic registration method`_ instead, which would look something like:: static PyTypeObject *NoddyType; NoddyType = PyType_Setup("noddy.Noddy", sizeof(Noddy)); PyType_Register(NoddyType, PY_TP_DEALLOC, Noddy_dealloc); PyType_Register(NoddyType, PY_TP_DOC, "Noddy objects"); ... PyType_Register(NoddyType, PY_TP_NEW, Noddy_new); if (PyType_Ready(&NoddyType) < 0) return; People thought this looked like a good idea, and Fredrik Lundh planned to look into it seriously for Python 3000. .. _dynamic registration method: http://effbot.org/zone/idea-register-type.htm Contributing thread: - `2.4.4: backport classobject.c HAVE_WEAKREFS? `__ ----------------------------------------- Tracker Recommendations: JIRA and Roundup ----------------------------------------- The PSF Infrastructure Committee announced their recommendations for trackers to replace SourceForge. Both JIRA and Roundup were definite improvements over SourceForge, though the Infrastructure Committee was leaning towards JIRA since Atlassian had offered to host it for them. Roundup was still under consideration if 6-10 admins could volunteer to maintain the installation. (More updates on this in the next summary.) Contributing thread: - `PSF Infrastructure Committee's recommendation for a new issue tracker `__ ----------------------------- PEP 302: import hooks phase 2 ----------------------------- Brett Cannon announced that he'd be working on a C implementation of phase 2 of `PEP 302`_. Phillip J. Eby pointed out that phase 2 could not be implemented in a backwards-compatible way, and so the code should be targeted at the p3yk branch. 
He also suggested that rewriting the import mechanisms in Python was probably going to be easier than trying to do it in C, particularly since some of the pieces were already available in the pkgutil module. Neal Norwitz strongly agreed, pointing out that string and list manipulation, which is necessary in a variety of places in the import mechanisms, is much easier in Python than in C. Brett promised a Python implementation as part of his research work. .. _PEP 302: http://www.python.org/dev/peps/pep-0302/ Contributing thread: - `Created branch for PEP 302 phase 2 work (in C) `__ ------------------------------------------ Web crawlers and development documentation ------------------------------------------ Fredrik Lundh noticed that Google was sometimes finding the `development documentation`_ instead of the `current release documentation`_. A.M. Kuchling added a ``robots.txt`` to keep crawlers out of the development area. .. _development documentation: http://docs.python.org/dev/ .. _current release documentation: http://docs.python.org/ Contributing thread: - `what's really new in python 2.5 ? `__ ----------------------------------------- Buildbots, compile errors and batch files ----------------------------------------- Tim Peters noticed that bsddb was getting compile errors on Windows but the buildbots were not reporting anything. Because some additional commands were added after the call to ``devenv.com`` in the ``build.bat`` script, the error status was not getting propagated appropriately. After Tim and Martin v. L?wis figured out how to repair this, the buildbots were again able to report compile errors. Contributing thread: - `2.4 vs Windows vs bsddb `__ --------------------------------- Python 2.5 and Visual Studio 2005 --------------------------------- Kristj?n V. J?nsson showed that using Visual Studio 2005 instead of Visual Studio 2003 gave a 7% gain in speed, and a 10% gain when performance guided optimization (PGO) was enabled. 
While the "official" compiler can't get changed at a point release, everyone agreed that making the PCBuild8 directory work out of the box and adding an appropriate buildslave was a good idea. Kristj?n promised to look into setting up a buildslave. Contributing thread: - `Python 2.5 performance `__ ---------------------------------- Distributing debug build of Python ---------------------------------- David Abrahams asked if python.org would be willing to post links to the ActiveState debug builds of Python to make it easier for Boost.Python_ users to obtain a debug build. People seemed to think that Boost.Python_ users should be able to create a debug build of Python themselves if necessary. .. _Boost.Python: http://www.boost.org/libs/python/doc/index.html Contributing thread: - `Plea to distribute debugging lib `__ ----------------------------------------- Unmarshalling/Unpickling multiple objects ----------------------------------------- Tim Lesher proposed adding a generator to marshal and pickle so that instead of:: while True: try: obj = marshal.load(fobj) # or pickle.load(fobj) except EOFError: break ... do something with obj ... you could write something like:: for obj in marshal.loaditer(fobj): # or pickle.loaditer(fobj) ... do something with obj ... when you wanted to load multiple objects in sequence from the same file. Both Perforce and Mailman store objects in a way that would benefit from such a function, so it seemed like such an API might be reasonable. No patch had been submitted at the time of this summary. Contributing thread: - `Iterating over marshal/pickle `__ ------------------------------- spawnvp and spawnvpe on Windows ------------------------------- Alexey Borzenkov asked why spawnvp and spawnvpe weren't available in Python on Windows even though they were implemented in the CRT. He got the usual answer, that no one had submitted an appropriate patch, but that such a patch would be a reasonable addition for Python 2.6. 
Fredrik Lundh pointed out that the subprocess module was probably a better choice than spawnvp and spawnvpe anyway. Contributing thread: - `Why spawnvp not implemented on Windows? `__ ================== Previous Summaries ================== - `difficulty of implementing phase 2 of PEP 302 in Python source `__ - `Python Doc problems `__ - `Signals, threads, blocking C functions `__ =============== Skipped Threads =============== - `Removing __del__ `__ - `Tix not included in 2.5 for Windows `__ - `Weekly Python Patch/Bug Summary `__ - `HAVE_UINTPTR_T test in configure.in `__ - `OT: How many other people got this spam? `__ - `2.4.4 fixes `__ - `2.4.4 fix: Socketmodule Ctl-C patch `__ - `[Python-checkins] r51862 - python/branches/release25-maint/Tools/msi/msi.py `__ - `Fwd: [ python-Feature Requests-1567948 ] poplib.py list interface `__ - `Can't check in on release25-maint branch `__ - `if __debug__: except Exception, e: pdb.set_trace() `__ - `2.5, 64 bit `__ - `BUG (urllib2) Authentication request header is broken on long usernames and passwords `__ - `[Python-3000] Sky pie: a "var" keyword `__ - `Proprietary code in python? `__ - `DRAFT: python-dev summary for 2006-08-16 to 2006-08-31 `__ - `BRANCH FREEZE, release24-maint for 2.4.4c1. 00:00UTC, 11 October 2006 `__ - `2.4 vs Windows vs bsddb [correction] `__ - `RELEASED Python 2.4.4, release candidate 1 `__ - `ConfigParser: whitespace leading comment lines `__ - `Exceptions and slicing `__ - `Proposal: No more standard library additions `__ - `[py3k] Re: Proposal: No more standard library additions `__ - `Modulefinder `__ - `VC6 support on release25-maint `__ - `os.utime on directories: bug fix or new feature? 
`__ - `Problem building module against Mac Python 2.4 and Python 2.5 `__ From aahz at pythoncraft.com Tue Nov 21 05:12:32 2006 From: aahz at pythoncraft.com (Aahz) Date: Mon, 20 Nov 2006 20:12:32 -0800 Subject: [Python-Dev] POSIX Capabilities In-Reply-To: <20061117121522.GA13677@pling.qwghlm.org> References: <20061117121522.GA13677@pling.qwghlm.org> Message-ID: <20061121041232.GB25517@panix.com> On Fri, Nov 17, 2006, Matt Kern wrote: > > I was looking around for an interface to POSIX capabilities from Python > under Linux. I couldn't find anything that did the job, so I wrote the > attached PosixCapabilities module. It has a number of shortcomings: Please upload it to the Cheeseshop; optional is making an announcement on c.l.py.announce. python-dev really is not the right place. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 1993 From martin at v.loewis.de Tue Nov 21 06:56:20 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 06:56:20 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> Message-ID: <45629504.1070002@v.loewis.de> Guido van Rossum schrieb: > Is there any reason to assume the data size is ever less than the > block size except for the last data block? It's reading from a > pseudo-file tied to a socket, but Python files tend to have the > property that read(n) returns exactly n bytes unless at EOF. Right: socket._fileobject will invoke recv as many times as necessary to read the requested amount of data. 
I was somehow assuming that it maps read() to read(2), which, in turn, would directly map to recv(2), which could return less data. So it's a semantic change only for the last block. Regards, Martin From martin at v.loewis.de Tue Nov 21 07:01:26 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 07:01:26 +0100 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <014701c706e4$75612780$24b75ba5@aero.ad.tamu.edu> References: <014701c706e4$75612780$24b75ba5@aero.ad.tamu.edu> Message-ID: <45629636.6090407@v.loewis.de> Julian schrieb: > I am using python with swig and I get a lot of macro redefinition warnings > like so: > warning C4005: '_CRT_SECURE_NO_DEPRECATE' : macro redefinition > > In the file - pyconfig.h - rather than the following lines, I was wondering > if it would be more reasonable to use #ifdef statements as shown in the > bottom of the email... While I agree that would be reasonable, I also wonder why you are getting these errors. Where is the first definition of these macros, and how is the macro defined at the first definition? Regards, Martin From martin at v.loewis.de Tue Nov 21 07:07:58 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 07:07:58 +0100 Subject: [Python-Dev] POSIX Capabilities In-Reply-To: <20061117121522.GA13677@pling.qwghlm.org> References: <20061117121522.GA13677@pling.qwghlm.org> Message-ID: <456297BE.6040409@v.loewis.de> Matt Kern schrieb: > I was looking around for an interface to POSIX capabilities from Python > under Linux. I couldn't find anything that did the job, so I wrote the > attached PosixCapabilities module. It has a number of shortcomings: > > * it is written using ctypes to interface directly to libcap; > * it assumes the sizes/types of various POSIX defined types; > * it only gets/sets process capabilities; > * it can test/set/clear capability flags. 
> > Despite the downsides, I think it would be good to get the package out > there. If anyone wishes to adopt it, update it, rewrite it and/or put > it into the distribution, then feel free. As Aahz says: make a distutils package out of it, and upload it to the Cheeseshop. For inclusion into Python, I would rather prefer to see the traditional route: make an autoconf test for presence of these functions, then edit Modules/posixmodule.c to conditionally expose these APIs from posix/os (they are POSIX functions, after all). The standard library should expose them as-is, without providing a convenience wrapper. I believe your implementation has limited portability, due to its usage of hard-coded symbolic values for the capabilites (I guess this is the Linux numbering, right?). Unfortunately, a ctypes-based implementation can't really do much better. Regards, Martin From martin at v.loewis.de Tue Nov 21 07:09:25 2006 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 21 Nov 2006 07:09:25 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061121025304.20948.524767534.divmod.quotient.36726@ohm> References: <20061121025304.20948.524767534.divmod.quotient.36726@ohm> Message-ID: <45629815.3060903@v.loewis.de> Jean-Paul Calderone schrieb: >> I'm uncertain whether D1.update(D2) will invoke callbacks (it >> probably will). > > Quite so: > > >>> class X: > ... def __del__(self): > ... print 'X.__del__' > ... > >>> a = {1: X()} > >>> b = {1: 2} > >>> a.update(b) > X.__del__ > >>> Ah, right: that's true for any assignment, then. 
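The same effect can be shown without update(): in CPython, which reclaims objects by reference counting, any plain rebinding that drops the last reference invokes __del__ on the spot. A minimal sketch (the immediate timing is a CPython implementation detail, as the thread goes on to discuss):

```python
deleted = []

class X:
    def __del__(self):
        deleted.append('X.__del__')

a = X()
a = None   # rebinding drops the last reference; in CPython the
           # refcount hits zero and __del__ runs immediately
print(deleted)   # ['X.__del__'] (in CPython)
```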
Regards, Martin From arigo at tunes.org Tue Nov 21 12:51:15 2006 From: arigo at tunes.org (Armin Rigo) Date: Tue, 21 Nov 2006 12:51:15 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <45629504.1070002@v.loewis.de> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> <45629504.1070002@v.loewis.de> Message-ID: <20061121115115.GA24321@code0.codespeak.net> Hi Martin, On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. L?wis" wrote: > Right: socket._fileobject will invoke recv as many times as > necessary to read the requested amount of data. I was somehow > assuming that it maps read() to read(2), which, in turn, would > directly map to recv(2), which could return less data. > > So it's a semantic change only for the last block. That means that it would be rather pointless to make the change, right? The original poster's motivation is to get accurate progress during the transfer - but he missed that he already gets that. The proposed change only appears to be relevant together with a hypothetical rewrite of the underlying code, one that would use recv() instead of read(). A bientot, Armin From arigo at tunes.org Tue Nov 21 13:08:37 2006 From: arigo at tunes.org (Armin Rigo) Date: Tue, 21 Nov 2006 13:08:37 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4562326E.904@v.loewis.de> References: <4562326E.904@v.loewis.de> Message-ID: <20061121120837.GB24321@code0.codespeak.net> Hi Martin, On Mon, Nov 20, 2006 at 11:55:42PM +0100, "Martin v. L?wis" wrote: > In general, += isn't atomic: it may invoke __add__ or __iadd__ on the > left-hand side, or __radd__ on the right-hand side. > If you only look at the actual operation, the these aren't atomic: > > x.field = y # may invoke __setattr__, may also be a property > D[x] = y # may invoke x.__hash__, and x.__eq__ I think this list of examples isn't meant to be read that way. 
Half of them can invoke custom methods, not just the two you mention here. I think the idea is that provided only "built-in enough" objects are involved, the core operation described by each line works atomically, in the sense e.g. that if two threads do 'L.append(x)' you really add two items to the list (only the order is unspecified), and if two threads perform x.field = y roughly at the same time, and the type of x doesn't override the default __setattr__ logic, then you know that the object x will end up with a 'field' that is present and has exactly one of the two values that the threads tried to put in. Python programs rely on these kind of properties, and they are probably a good thing - at least, much better IMHO than having to put locks everywhere. I would even say that the distinction between "preventing the interpreter from crashing" and "getting sane results" is not really relevant. If your program doesn't crash the interpreter, but loose some append()s or produce similar nonsense if you forget a lock, then we get the drawbacks of the GIL without its benefits... In practice, the list of operations that is atomic should (ideally) be documented more precisely -- one way to do that is to specify it at the level of built-in methods instead of syntax, e.g. saying that the method list.append() works atomically, and so does dict.setdefault() as long as all keys are "built-in enough" objects. A bientot, Armin From guido at python.org Tue Nov 21 17:10:03 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 21 Nov 2006 08:10:03 -0800 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <20061121115115.GA24321@code0.codespeak.net> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> <45629504.1070002@v.loewis.de> <20061121115115.GA24321@code0.codespeak.net> Message-ID: OK, so let's reject the change. On 11/21/06, Armin Rigo wrote: > Hi Martin, > > On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. 
L?wis" wrote: > > Right: socket._fileobject will invoke recv as many times as > > necessary to read the requested amount of data. I was somehow > > assuming that it maps read() to read(2), which, in turn, would > > directly map to recv(2), which could return less data. > > > > So it's a semantic change only for the last block. > > That means that it would be rather pointless to make the change, right? > The original poster's motivation is to get accurate progress during the > transfer - but he missed that he already gets that. > > The proposed change only appears to be relevant together with a > hypothetical rewrite of the underlying code, one that would use recv() > instead of read(). > > > A bientot, > > Armin > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From facundobatista at gmail.com Tue Nov 21 17:42:40 2006 From: facundobatista at gmail.com (Facundo Batista) Date: Tue, 21 Nov 2006 13:42:40 -0300 Subject: [Python-Dev] Results of the SOC projects In-Reply-To: References: Message-ID: 2006/11/15, Georg Brandl : > this might seem a bit late, and perhaps I was just blind, > but I miss something like a summary how the Python > summer of code projects went, and what the status of the ones > that were meant to improve the standard library, e.g. the > C decimal implementation, is. The C decimal implementation is quite finished, but really not ready for production usage. Actually, what this work proved is that is not enough to translate decimal.py, there should be a redesign of the structure. There's a lot of mails from Raymond H. about this. And he's right. Regarding the SOC, I approved Matheusz's work, because he finished the task, and even if we need to recode it, we learned in the process. You're free to look at it, the C decimal implementation is in the sandbox. Regards, -- . 
Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From martin at v.loewis.de Tue Nov 21 19:22:27 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 19:22:27 +0100 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <009201c70d36$714dd5f0$24b75ba5@aero.ad.tamu.edu> References: <009201c70d36$714dd5f0$24b75ba5@aero.ad.tamu.edu> Message-ID: <456343E3.4000203@v.loewis.de> Julian schrieb: > SWIG seems to have done it properly by checking to see if it has been > defined already (which, I think, is how python should do it as well) > Now, even if I am not using SWIG, I could imagine these being defined > elsewhere (by other headers/libraries) or even by setting them in the VS2005 > IDE project settings (which I actually do sometimes). While these are *just* > warnings and not errors, it would look cleaner if pyconfig.h would check if > they were defined already. Sure; I have fixed this now in r52817 and r52818 I just wondered why you get the warning: you shouldn't get one if the redefinition is the same as the original one. In this case, it wasn't the same redefinition, as SWIG was merely defining them, and Python was defining them to 1. Regards, Martin From martin at v.loewis.de Tue Nov 21 19:30:24 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 19:30:24 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <20061121115115.GA24321@code0.codespeak.net> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> <45629504.1070002@v.loewis.de> <20061121115115.GA24321@code0.codespeak.net> Message-ID: <456345C0.6010204@v.loewis.de> Armin Rigo schrieb: > Hi Martin, > > On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. L?wis" wrote: >> Right: socket._fileobject will invoke recv as many times as >> necessary to read the requested amount of data. 
I was somehow >> assuming that it maps read() to read(2), which, in turn, would >> directly map to recv(2), which could return less data. >> >> So it's a semantic change only for the last block. > > That means that it would be rather pointless to make the change, right? Right; I rejected the patch. Thanks for all your input. Regards, Martin From martin at v.loewis.de Tue Nov 21 19:41:50 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 19:41:50 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061121120837.GB24321@code0.codespeak.net> References: <4562326E.904@v.loewis.de> <20061121120837.GB24321@code0.codespeak.net> Message-ID: <4563486E.30104@v.loewis.de> Armin Rigo schrieb: > I think this list of examples isn't meant to be read that way. Half of > them can invoke custom methods, not just the two you mention here. I > think the idea is that provided only "built-in enough" objects are > involved, the core operation described by each line works atomically, in > the sense e.g. that if two threads do 'L.append(x)' you really add two > items to the list (only the order is unspecified), and if two threads > perform x.field = y roughly at the same time, and the type of x > doesn't override the default __setattr__ logic, then you know that the > object x will end up with a 'field' that is present and has exactly > one of the two values that the threads tried to put in. Ah, so it's more about Consistency (lists not being corrupted) than about Atomicity (operations either succeeding completely or failing completely). Perhaps it's also about Isolation (no intermediate results visible), but I'm not so sure which of these operations are isolated (given the callbacks). > Python programs rely on these kind of properties, and they are probably > a good thing - at least, much better IMHO than having to put locks > everywhere. 
I would even say that the distinction between "preventing > the interpreter from crashing" and "getting sane results" is not really > relevant. If your program doesn't crash the interpreter, but loose some > append()s or produce similar nonsense if you forget a lock, then we get > the drawbacks of the GIL without its benefits... So again, I think it's consistency you are after here (of the ACID properties). > In practice, the list of operations that is atomic should (ideally) be > documented more precisely -- one way to do that is to specify it at the > level of built-in methods instead of syntax, e.g. saying that the method > list.append() works atomically, and so does dict.setdefault() as long as > all keys are "built-in enough" objects. But many of these operations don't work atomically! (although .append does) For example, x = y may cause __del__ for the old value of x to be invoked, which may fail with an exception. If it fails, the assignment is still carried out, instead of being rolled back. Regards, Martin From fumanchu at amor.org Tue Nov 21 20:29:22 2006 From: fumanchu at amor.org (Robert Brewer) Date: Tue, 21 Nov 2006 11:29:22 -0800 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations Message-ID: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> Martin v. L?wis wrote: > Armin Rigo schrieb: > > I think this list of examples isn't meant to be read that > > way. Half of them can invoke custom methods, not just the > > two you mention here. > > Ah, so it's more about Consistency (lists not being corrupted) > than about Atomicity (operations either succeeding completely > or failing completely). Perhaps it's also about Isolation (no > intermediate results visible), but I'm not so sure which of > these operations are isolated (given the callbacks). It's not "about" any of those things, because we're not discussing transactional models. 
The FAQ entry is trying to list statements which can be considered a single operation due to being implemented via a single bytecode. By eliminating statements which use multiple VM instructions, one minimizes overlapping operations; there are other ways, but this is easy and common, and is an important "first step" toward making a container "thread-safe". You're bringing in other, larger issues, which is fine and should be addressed in a larger context. But the FAQ isn't trying to address those. The confusion arises because transactional theory uses "atomic transaction" in a much narrower sense than language design uses the phrase "atomic operation" (see http://en.wikipedia.org/wiki/Atomic_operation for example--it includes isolation and consistency). And the FAQ entry is only addressing the "isolation" concern; whether or not a given operation can be interrupted/overlapped. Those who design thread-safe containers benefit from such a list. Yes, they must also make sure no Python code (like __del__ or __setattr_, etc) is invoked during the operation; others have already pointed out that by using builtins, this can be minimized. But that "second step" doesn't negate the benefit of the "first step", eliminating statements which require multiple VM instructions. Robert Brewer System Architect Amor Ministries fumanchu at amor.org From martin at v.loewis.de Tue Nov 21 21:01:35 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 21:01:35 +0100 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <000001c70da1$15260f20$24b75ba5@aero.ad.tamu.edu> References: <000001c70da1$15260f20$24b75ba5@aero.ad.tamu.edu> Message-ID: <45635B1F.4090401@v.loewis.de> Julian schrieb: > I have two questions though... Is there any reason why Python is defining > them to 1? No particular reason, except that it's C tradition to give macros a value of 1 when you define them. 
> And then later on in the same file: > /* Turn off warnings about deprecated C runtime functions in > VisualStudio .NET 2005 */ > #if _MSC_VER >= 1400 && !defined _CRT_SECURE_NO_DEPRECATE > #define _CRT_SECURE_NO_DEPRECATE > #endif > > Isn't that redundant? It is indeed. > I don't think that second block will ever get > executed. Moreover, in the second block, it is not being defined to 1. why > is that ? Different people have contributed this; the first one came from r41563 | martin.v.loewis | 2005-11-29 18:09:13 +0100 (Di, 29 Nov 2005) Silence VS2005 warnings about deprecated functions. and the second one from r46778 | kristjan.jonsson | 2006-06-09 18:28:01 +0200 (Fr, 09 Jun 2006) | 2 lines Turn off warning about deprecated CRT functions on for VisualStudio .NET 2005. Make the definition #ARRAYSIZE conditional. VisualStudio .NET 2005 already has it defined using a better gimmick. Regards, Martin From martin at v.loewis.de Tue Nov 21 21:24:07 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 21:24:07 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> Message-ID: <45636067.7040305@v.loewis.de> Robert Brewer schrieb: > The confusion arises because transactional theory uses "atomic > transaction" in a much narrower sense than language design uses the > phrase "atomic operation" (see > http://en.wikipedia.org/wiki/Atomic_operation for example--it > includes isolation and consistency). And the FAQ entry is only > addressing the "isolation" concern; whether or not a given operation > can be interrupted/overlapped. Those who design thread-safe > containers benefit from such a list. 
Yes, they must also make sure no > Python code (like __del__ or __setattr_, etc) is invoked during the > operation; others have already pointed out that by using builtins, > this can be minimized. But that "second step" doesn't negate the > benefit of the "first step", eliminating statements which require > multiple VM instructions. Ok. I think I would have understood that FAQ entry better if it had said: "These operations are represented in a single byte-code operation", instead of saying "they are thread-safe", or "they are atomic". Of course, Josiah Carlson's remark then still applies: all statements listed there take multiple byte codes, because you have to put the parameters onto the stack first. This is more than hypothetical. If two threads do simultaneously thread1 thread2 x = y y = x then, if these were "atomic", you would expect that afterwards, both variables have the same value: Either thread1 executes first, which means that x has the value of y (and thread2's operation has no effect), or thread2 executes first, in which case both variables get x's original value. However, in Python, it may happen that afterwards, the values get swapped: thread1 loads y onto the stack, then a context switch occurs, then thread2 sets y = x (so y gets x's value), later thread1 becomes active again, and x gets y's original value (from the thread1 stack). If you were looking for actions where the "core" operation is a single opcode, then this list could be much longer: all of the following are "atomic" or "thread-safe", too: unary operations (+x, -x, not x, ~x) binary operations (a+b,a-b,a*b,a/b,a//b,a[b]) exec "string" del x del x.field del x[i] As for the original questions: "x+=1" is two "atomic" operations, not one. Or, more precisely, it's 4 opcodes, not 2. 
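The opcode count is easy to check with the dis module (a sketch; the exact opcode names vary between CPython versions, but the load and the store are always separate instructions, so a thread switch can fall in between):

```python
import dis

def incr(x):
    x += 1
    return x

# The augmented assignment compiles to separate instructions:
# load x, load the constant 1, add, store the result back into x.
names = [ins.opname for ins in dis.get_instructions(incr)]
print(names)
```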
Regards, Martin From arigo at tunes.org Tue Nov 21 22:56:20 2006 From: arigo at tunes.org (Armin Rigo) Date: Tue, 21 Nov 2006 22:56:20 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <45636067.7040305@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> Message-ID: <20061121215620.GA24206@code0.codespeak.net> Hi Martin, On Tue, Nov 21, 2006 at 09:24:07PM +0100, "Martin v. L?wis" wrote: > As for the original questions: "x+=1" is two "atomic" > operations, not one. Or, more precisely, it's 4 opcodes, > not 2. Or, more interestingly, the same is true for constructs like 'd[x]+=1': they are a sequence of three bytecodes that may overlap other threads (read d[x], add 1, store the result back in d[x]) so it's not a thread-safe way to increment a counter. (More generally it's very easy to forget that expr1[expr2] += expr3 really means x = expr1; y = expr2; x[y] = x[y] + expr3 using a '+' that is special only in that it invokes the __iadd__ instead of the __add__ method, if there is one.) A bientot, Armin From martin at v.loewis.de Tue Nov 21 23:24:21 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 23:24:21 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061121215620.GA24206@code0.codespeak.net> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> Message-ID: <45637C95.6050907@v.loewis.de> Armin Rigo schrieb: > Or, more interestingly, the same is true for constructs like 'd[x]+=1': > they are a sequence of three bytecodes that may overlap other threads > (read d[x], add 1, store the result back in d[x]) so it's not a > thread-safe way to increment a counter. 
> > (More generally it's very easy to forget that expr1[expr2] += expr3 > really means > > x = expr1; y = expr2; x[y] = x[y] + expr3 > > using a '+' that is special only in that it invokes the __iadd__ instead > of the __add__ method, if there is one.) OTOH, using += is "thread-safe" if the object is mutable (e.g. a list), and all modifications use +=. In that case, __iadd__ will be invoked, which may (for lists) or may not (for other types) be thread-safe. Since the same object gets assigned to the original slot in all threads, execution order does not really matter. I personally consider it "good style" to rely on implementation details of CPython; if you do, you have to know precisely what these details are, and document why you think a specific fragment of code is correct. Regards, Martin From ncoghlan at gmail.com Wed Nov 22 10:32:23 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Nov 2006 19:32:23 +1000 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <45637C95.6050907@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> Message-ID: <45641927.7080501@gmail.com> Martin v. L?wis wrote: > I personally consider it "good style" to rely on implementation details > of CPython; Is there a 'do not' missing somewhere in there? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From steven.bethard at gmail.com Wed Nov 22 20:48:48 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 22 Nov 2006 12:48:48 -0700 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-10-16 to 2006-10-31 Message-ID: Here's the summary for the second half of October. 
Comments and corrections welcome as always, especially on that extended buffer protocol / binary format specifier discussion which was a little overwhelming. ;-) ============= Announcements ============= -------------------------------------- Roundup to replace SourceForge tracker -------------------------------------- Roundup_ has been named as the official replacement for the SourceForge_ issue tracker. Thanks go out to the new volunteer admins, Paul DuBois, Michael Twomey, Stefan Seefeld, and Erik Forsberg, and also to `Upfront Systems`_ who will be hosting the tracker. If you'd like to provide input on what the new tracker should do, please join the `tracker-discuss mailing list`_. .. _SourceForge: http://www.sourceforge.net/ .. _Roundup: http://roundup.sourceforge.net/ .. _Upfront Systems: http://www.upfrontsystems.co.za/ .. _tracker-discuss mailing list: http://mail.python.org/mailman/listinfo/tracker-discuss Contributing threads: - `PSF Infrastructure has chosen Roundup as the issue tracker for Python development `__ - `Status of new issue tracker `__ ========= Summaries ========= --------------------------------------------------------------- The buffer protocol and communicating binary format information --------------------------------------------------------------- Travis E. Oliphant presented a pre-PEP for adding a standard way to describe the shape and intended types of binary-formatted data. It was accompanied by a pre-PEP for extending the buffer protocol to handle such shapes and types. Under the proposal, a new ``datatype`` object would describe binary-formatted data with an API like:: datatype((float, (3,2))) # describes a 3*2*8=48 byte block of memory that should be interpreted # as 6 doubles laid out as arr[0,0], arr[0,1], ... arr[2,0], arr[2,1] datatype([( ([1,2],'coords'), 'f4', (3,6)), ('address', 'S30')]) # describes the structure # float coords[3*6] /* Has [1,2] associated with this field */ # char address[30] Alexander Belopolsky provided a nice example of why you might want to extend the buffer protocol along these lines. Currently, there's not much you can do with a basic buffer object. If you want to pass it to numpy_, you have to provide the type and shape information yourself:: >>> b = buffer(array('d', [1,2,3])) >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b) array([ 1., 2., 3.]) By extending the buffer protocol appropriately so that the necessary information can be provided, you should be able to pass the buffer directly to numpy_ and have it understand the format itself:: >>> numpy.array(b) People were uncomfortable with the many ``datatype`` variants -- the constructor accepted types, strings, lists or dicts, each of which could specify the structure in a different way. Also, a number of people questioned why the existing ``ctypes`` mechanisms for describing binary data couldn't be used instead, particularly since ``ctypes`` could already describe things like function pointers and recursive types, which the pre-PEP could not. Travis said he was looking for a way to unify the data formats of all the ``array``, ``struct``, ``numpy`` and ``ctypes`` modules, and felt like using the ``ctypes`` approach was too verbose for use in the other modules. In particular, he felt like the ``ctypes`` use of type objects as binary-format specifiers was problematic because type objects were harder to manipulate at the C level. The discussion continued on into the next fortnight. .. _numpy: Contributing threads: - `PEP: Adding data-type objects to Python `__ - `PEP: Extending the buffer protocol to share array information.
`__ ------------------------ The "lazy strings" patch ------------------------ Discussion continued on Larry Hastings `lazy strings patch`_ that would have delayed until necessary the evaluation of some string operations, like concatenation and slicing. With his patch, repeated string concatenation could be used instead of the standard ``.join()`` idiom, and slices which were never used would never be rendered. Discussions of the patch showed that people were concerned about memory increases when a small slice of a very large string kept the large string around in memory. People also felt like a stronger motivation was necessary to justify complicating the string representation so much. Larry was pointed to some `code that his patch would break`_, which was using ``ob_sval`` directly instead of calling ``PyString_AS_STRING()`` like it was supposed to. He was also referred to the `Python 3000 list`_ where the recent discussions of `string views`_ would be relevant, and his proposal might have a better chance of acceptance. .. _lazy strings patch: http://bugs.python.org/1569040 .. _code that his patch would break: http://www.google.com/codesearch?hl=en&lr=&q=ob_sval+-stringobject.%5Bhc%5D&btnG=Search .. _Python 3000 list: http://mail.python.org/mailman/listinfo/python-3000 .. _string views: http://mail.python.org/pipermail/python-3000/2006-August/003280.html Contributing threads: - `PATCH submitted: Speed up + for string Re: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom `__ - `Python-Dev Digest, Vol 39, Issue 54 `__ - `Python-Dev Digest, Vol 39, Issue 55 `__ - `The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] `__ - `The "lazy strings" patch `__ -------------- PEP 355 status -------------- BJ?rn Lindqvist wanted to wrap up the loose ends of `PEP 355`_ and asked whether the problem was the specific path object of `PEP 355`_ or path objects in general. 
A number of people felt that some reorganization of the path-related functions could be helpful, but that trying to put everything into a single object was a mistake. Some important requirements for a reorganization of the path-related functions: * should divide the functions into coherent groups * should allow you to manipulate paths foreign to your OS There were a few suggestions of possible new APIs, but no concrete implementations. People seemed hopeful that the issue could be resurrected for Python 3K, but no one appeared to be taking the lead. .. _PEP 355: http://www.python.org/dev/peps/pep-0355/ Contributing thread: - `PEP 355 status `__ -------------------------------------------------- Buildbots, configure changes and extension modules -------------------------------------------------- Grig Gheorghiu, who's been taking care of the `Python Community Buildbots`_, noticed that the buildbots started failing after a checkin that made changes to ``configure``. Martin v. L?wis explained that even though a plain ``make`` will trigger a re-run of ``configure`` if it has changed, there is an issue with distutils not rebuilding when header files change, and so extension modules are sometimes not rebuilt. Contributions to fix that deficiency in distutils are welcome. Martin also pointed out a handy way of forcing a buildbot to start with a clean build: ask the buildbot to build a non-existing branch. This causes the checkouts to be deleted and the build to fail. The next regular build will then start from scratch. .. _Python Community Buildbots: http://www.pybots.org/ Contributing thread: - `Python unit tests failing on Pybots farm `__ --------------- Sqlite versions --------------- Skip Montanaro ran into some problems running ``test_sqlite`` on OSX where he was getting a bunch of ``ProgrammingError: library routine called out of sequence`` errors. These errors appeared reliably when ``test_sqlite`` was run immediately after ctypes' ``test_find``. 
When he started linking to sqlite 3.1.3 instead of sqlite 3.3.8, the problems went away. Barry Warsaw mentioned that he had run into similar troubles when he tried to upgrade from 3.2.1 to 3.2.8. Contributing thread: - `Massive test_sqlite failure on Mac OSX ... sometimes `__ --------------------------------------------- Threads, generators, exceptions and segfaults --------------------------------------------- Mike Klaas managed to `provoke a segfault`_ in Python 2.5 using threads, generators and exceptions. Tim Peters was able to whittle Mike's problem down to a relatively simple test case, where a generator was created within a thread, and then the thread vanished before the generator had exited. The segfault was a result of Python's attempt to clean up the abandoned generator, during which it tried to access the generator's already free()'d thread state. No clear solution to this problem had been decided on at the time of this summary. .. _provoke a segfault: http://bugs.python.org/1579370 Contributing thread: - `Segfault in python 2.5 `__ ---------------- ctypes and win64 ---------------- Previously, Thomas Heller had asked that ctypes be removed from the Python 2.5 win64 MSI installers since it did not work for that platform at the time. Since then, Thomas integrated some patches in the trunk so that _ctypes could be built for win64/AMD64. Backporting these fixes to Python 2.5 would have meant that, while the MSI installer would still not include it, _ctypes could be built from a source distribution on win64/AMD64. It was unclear whether this would constitute a bugfix (in which case the backport would be okay) or a feature (in which case it wouldn't). Contributing thread: - `ctypes and win64 `__ ------------------------------ Python 2.3.X and 2.4.X retired ------------------------------ Anthony Baxter pushed out a Python 2.4.4 release and was pushing out the Python 2.3.6 source release as well. 
He indicated that once 2.3.6 was out, both of these branches could be officially retired. Contributing thread: - `state of the maintenance branches `__ --------------------------------------- Producing bytecode from Python 2.5 ASTs --------------------------------------- Michael Spencer offered up his compiler2_ module, a rewrite of the compiler module which allows bytecode to be produced from ``_ast.AST`` objects. Currently, it produces almost identical output to ``__builtin__.compile`` for all the stdlib modules and their tests. He asked for feedback on what would be necessary to get it stdlib ready, but had no responses. .. _compiler2: http://svn.brownspencer.com/pycompiler/branches/new_ast/ Contributing thread: - `Fwd: Re: ANN compiler2 : Produce bytecode from Python 2.5 AST `__ ================== Previous Summaries ================== - `Python 2.5 performance `__ - `Promoting PCbuild8 (Was: Python 2.5 performance) `__ - `2.3.6 for the unicode buffer overrun `__ - `2.4.4: backport classobject.c HAVE_WEAKREFS? `__ =============== Skipped Threads =============== - `Weekly Python Patch/Bug Summary `__ - `Problem building module against Mac Python 2.4 and Python 2.5 `__ - `svn.python.org down `__ - `BRANCH FREEZE release24-maint, Wed 18th Oct, 00:00UTC `__ - `who is interested on being on a python-dev panel at PyCon? `__ - `RELEASED Python 2.4.4, Final. `__ - `Nondeterministic long-to-float coercion `__ - `Promoting PCbuild8 `__ - `OT: fdopen on Windows question `__ - `Modulefinder `__ - `Optional type checking/pluggable type systems for Python `__ - `readlink and unicode strings (SF:1580674) Patch http://www.python.org/sf/1580674 fixes readlink's behaviour w.r.t. Unicode strings: without this patch this function uses the system default encoding instead of the filesystem encoding to convert Unicode objects to plain strings. Like os.listdir, os.readlink will now return a Unicode object when the argument is a Unicode object. 
What I'd like to know is if this can be backported to the 2.5 branch.
The first part of this patch (use filesystem encoding instead of the
system encoding) is IMHO a bugfix, the second part might break existing
applications (that might not expect a unicode result from os.readlink).

The reason I did this patch is that os.path.realpath currently breaks
when the path is a unicode string with non-ascii characters and at
least one element of the path is a symlink.

Ronald `__

- `readlink and unicode strings (SF:1580674) `__
- `RELEASED Python 2.3.6, release candidate 1 `__
- `__str__ bug? `__
- `Hunting down configure script error `__
- `Python 2.4.4 docs? `__
- `DRAFT: python-dev summary for 2006-09-01 to 2006-09-15 `__
- `DRAFT: python-dev summary for 2006-09-16 to 2006-09-30 `__
- `[Python-checkins] r52482 - in python/branches/release25-maint: Lib/urllib.py Lib/urllib2.py Misc/NEWS `__
- `Typo.pl scan of Python 2.5 source code `__
- `build bots, log output `__
- `PyCon: proposals due by Tuesday 10/31 `__
- `test_codecs failures `__

From martin at v.loewis.de Wed Nov 22 22:49:26 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 22 Nov 2006 22:49:26 +0100
Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations
In-Reply-To: <45641927.7080501@gmail.com>
References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com>
Message-ID: <4564C5E6.8070605@v.loewis.de>

Nick Coghlan schrieb:
> Martin v. Löwis wrote:
>> I personally consider it "good style" to rely on implementation details
>> of CPython;
>
> Is there a 'do not' missing somewhere in there?

No - I really mean it. I can find nothing wrong with people relying on
reference counting to close files, for example.
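Concretely, the pattern in question is the implicit-close idiom sketched below (the helper names are illustrative, and the timing claims in the comments hold for CPython only):

```python
import os
import tempfile

def first_line_cpython_style(path):
    # In CPython the file object's reference count drops to zero as soon
    # as readline() returns, so the file is closed immediately. Other
    # implementations (Jython, IronPython, PyPy) only guarantee closure
    # at some later garbage-collection point.
    return open(path).readline()

def first_line_portable(path):
    # The portable spelling makes the close explicit.
    f = open(path)
    try:
        return f.readline()
    finally:
        f.close()

# Tiny self-contained demo: write a scratch file, read its first line back.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"first\nsecond\n")
os.close(fd)
line1 = first_line_cpython_style(demo_path)
line2 = first_line_portable(demo_path)
os.remove(demo_path)
```

Both helpers behave identically on CPython; the difference is only in when the underlying file descriptor is released on other implementations.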
It's a property of CPython, and not guaranteed in other Python implementations - yet it works in a well-defined way in CPython. Code that relies on that feature is not portable, but portability is only one goal in software development, and may be irrelevant for some projects. Likewise, I see nothing wrong with people relying on .append on a list working "correctly" when used from two threads, even though the language specification does not guarantee that property. Similarly, it's fine when people rely on the C type "int" to have 32-bits when used with gcc on x86 Linux. The C standard makes that implementation-defined, but this specific implementation made a choice that you can rely on. Regards, Martin From kbk at shore.net Thu Nov 23 04:36:13 2006 From: kbk at shore.net (Kurt B. Kaiser) Date: Wed, 22 Nov 2006 22:36:13 -0500 (EST) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200611230336.kAN3aD59005113@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 406 open (-10) / 3479 closed (+16) / 3885 total ( +6) Bugs : 931 open ( +1) / 6349 closed (+16) / 7280 total (+17) RFE : 245 open ( +1) / 244 closed ( +0) / 489 total ( +1) New / Reopened Patches ______________________ Logging Module - followfile patch (2006-11-17) http://python.org/sf/1598415 reopened by cjschr Logging Module - followfile patch (2006-11-17) http://python.org/sf/1598415 opened by chads Logging Module - followfile patch (2006-11-17) CLOSED http://python.org/sf/1598426 opened by chads mailbox.py: check that os.fsync is available before using it (2006-11-19) http://python.org/sf/1599256 opened by David Watson CodeContext - Improved text indentation (2005-11-21) CLOSED http://python.org/sf/1362975 reopened by taleinat TCPServer option to bind and activate (2006-11-20) http://python.org/sf/1599845 opened by Peter Parente __bool__ instead of __nonzero__ (2006-11-21) http://python.org/sf/1600346 opened by ganges master 1572210 doc patch (2006-11-21) 
http://python.org/sf/1600491 opened by Jim Jewett Patches Closed ______________ Logging Module - followfile patch (2006-11-17) http://python.org/sf/1598426 closed by gbrandl tkSimpleDialog.askstring() Tcl/Tk-8.4 lockup (2006-08-11) http://python.org/sf/1538878 closed by loewis tkSimpleDialog freezes when apply raises exception (2006-11-11) http://python.org/sf/1594554 closed by loewis Tix: subwidget names (bug #1472877) (2006-10-25) http://python.org/sf/1584712 closed by loewis better error msgs for some TypeErrors (2006-10-29) http://python.org/sf/1586791 closed by gbrandl Auto Complete module for IDLE (2005-11-19) http://python.org/sf/1361016 closed by loewis Add BLANK_LINE to token.py (2004-11-20) http://python.org/sf/1070218 closed by loewis improve embeddability of python (2003-11-25) http://python.org/sf/849278 closed by loewis Extend struct.unpack to produce nested tuples (2003-11-23) http://python.org/sf/847857 closed by loewis Iterating closed StringIO.StringIO (2005-11-18) http://python.org/sf/1359365 closed by loewis urllib reporthook could be more informative (2003-11-26) http://python.org/sf/849407 closed by loewis xmlrpclib - marshalling new-style classes. (2004-11-20) http://python.org/sf/1070046 closed by loewis CodeContext - Improved text indentation (2005-11-21) http://python.org/sf/1362975 closed by loewis Implementation of PEP 3102 Keyword Only Argument (2006-08-30) http://python.org/sf/1549670 closed by gvanrossum readline does not need termcap (2004-12-01) http://python.org/sf/1076826 closed by loewis Make cgi.py use logging module (2004-12-06) http://python.org/sf/1079729 closed by loewis New / Reopened Bugs ___________________ The empty set is a subset of the empty set (2006-11-17) CLOSED http://python.org/sf/1598166 opened by Andreas Kloeckner subprocess.py: O(N**2) bottleneck (2006-11-16) http://python.org/sf/1598181 opened by Ralf W. 
Grosse-Kunstleve import curses fails (2006-11-17) CLOSED http://python.org/sf/1598357 opened by thorvinrhuebarb Misspelled submodule names for email module. (2006-11-17) CLOSED http://python.org/sf/1598361 opened by Dmytro O. Redchuk ctypes Structure allows recursive definition (2006-11-17) http://python.org/sf/1598620 opened by Lenard Lindstrom csv library does not handle '\x00' (2006-11-18) CLOSED http://python.org/sf/1599055 opened by Stephen Day --disable-sunaudiodev --disable-tk does not work (2006-10-17) CLOSED http://python.org/sf/1579029 reopened by thurnerrupert mailbox: other programs' messages can vanish without trace (2006-11-19) http://python.org/sf/1599254 opened by David Watson htmlentitydefs.entitydefs assumes Latin-1 encoding (2006-11-19) CLOSED http://python.org/sf/1599325 opened by Erik Demaine SSL-ed sockets don't close correct? (2004-06-24) http://python.org/sf/978833 reopened by arigo Segfault on bsddb.db.DB().type() (2006-11-20) CLOSED http://python.org/sf/1599782 opened by Rob Sanderson problem with socket.gethostname documentation (2006-11-20) CLOSED http://python.org/sf/1599879 opened by Malte Helmert Immediate Crash on Open (2006-11-20) http://python.org/sf/1599931 opened by Farhymn mailbox: Maildir.get_folder does not inherit factory (2006-11-21) http://python.org/sf/1600152 opened by Tetsuya Takatsuru [PATCH] Quitting The Interpreter (2006-11-20) CLOSED http://python.org/sf/1600157 opened by Chris Carter Tix ComboBox entry is blank when not editable (2006-11-21) http://python.org/sf/1600182 opened by Tim Wegener --enable-shared links extensions to libpython statically (2006-11-22) http://python.org/sf/1600860 opened by Marien Zwart urllib2 does not close sockets properly (2006-11-23) http://python.org/sf/1601399 opened by Brendan Jurd utf_8_sig decode fails with buffer input (2006-11-23) http://python.org/sf/1601501 opened by bazwal Bugs Closed ___________ The empty set should be a subset of the empty set (2006-11-17) 
http://python.org/sf/1598166 closed by gbrandl import curses fails (2006-11-17) http://python.org/sf/1598357 closed by akuchling Misspelled submodule names for email module. (2006-11-17) http://python.org/sf/1598361 closed by gbrandl Tix: Subwidget names (2006-04-19) http://python.org/sf/1472877 closed by loewis replace groups doesn't work in this special case (2006-11-06) http://python.org/sf/1591319 closed by gbrandl csv module does not handle '\x00' (2006-11-19) http://python.org/sf/1599055 closed by gbrandl --disable-sunaudiodev --disable-tk does not work (2006-10-17) http://python.org/sf/1579029 closed by loewis htmlentitydefs.entitydefs assumes Latin-1 encoding (2006-11-19) http://python.org/sf/1599325 closed by loewis where is zlib??? (2006-11-04) http://python.org/sf/1590592 closed by sf-robot Segfault on bsddb.db.DB().type() (2006-11-20) http://python.org/sf/1599782 closed by nnorwitz problem with socket.gethostname documentation (2006-11-20) http://python.org/sf/1599879 closed by nnorwitz [PATCH] Quitting The Interpreter (2006-11-21) http://python.org/sf/1600157 closed by mwh os.popen w/o using the shell (2002-04-25) http://python.org/sf/548661 closed by nnorwitz memory leaks when importing posix module (2002-09-23) http://python.org/sf/613222 closed by nnorwitz docs missing 'trace' module (2003-07-29) http://python.org/sf/779976 closed by nnorwitz infinite __str__ recursion in thread causes seg fault (2003-07-31) http://python.org/sf/780714 closed by nnorwitz python and lithuanian locales (2003-11-02) http://python.org/sf/834452 closed by nnorwitz Bus error in extension with gcc 3.3 (2005-06-29) http://python.org/sf/1229788 closed by nnorwitz New / Reopened RFE __________________ urllib(2) should allow automatic decoding by charset (2006-11-19) http://python.org/sf/1599329 opened by Erik Demaine From steven.bethard at gmail.com Thu Nov 23 07:48:44 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 22 Nov 2006 23:48:44 -0700 Subject: 
[Python-Dev] DRAFT: python-dev summary for 2006-11-01 to 2006-11-15
Message-ID:

Here's the summary for the first half of November. Try not to spend it
all in one place! ;-) As always, corrections and comments are greatly
appreciated.

=============
Announcements
=============

--------------------------
Python 2.5 malloc families
--------------------------

Just a reminder that if you find your extension module is crashing with
Python 2.5 in malloc/free, there is a high chance that you have a
mismatch in malloc "families". Unlike previous versions, Python 2.5 no
longer allows sloppiness here -- if you allocate with the ``PyMem_*``
functions, you must free with the ``PyMem_*`` functions, and similarly,
if you allocate with the ``PyObject_*`` functions, you must free with
the ``PyObject_*`` functions.

Contributing thread:

- `2.5 portability problems `__

=========
Summaries
=========

----------------------------------
Path algebra and related functions
----------------------------------

Mike Orr started work on a replacement for `PEP 355`_ that would better
group the path-related functions currently in ``os``, ``os.path``,
``shutil`` and other modules. He proposed to start with a
`directory-tuple Path class`_ that would have allowed code like::

    # equivalent to
    # os.path.join(os.path.dirname(os.path.dirname(__FILE__)), "lib")
    os.path.Path(__FILE__)[:-2] + "lib"

where a Path object would act like a tuple of directories, and could be
easily sliced and reordered as such. As an alternative, glyph proposed
using `Twisted's filepath module`_ which was already being used in a
large body of code. He showed some common pitfalls, like that the
existence on Windows of "CON" and "NUL" in *every* directory can make
paths invalid, and indicated how FilePath solved these problems.

Fredrik Lundh suggested a reorganization where functions that
manipulate path *names* would reside in ``os.path``, and functions that
manipulate *objects* identified by a path would reside in ``os``.
The ``os.path`` module would gain a path wrapper object, which would
allow "path algebra" manipulations, e.g. ``path1 + path2``. The ``os``
module would gain some of the ``os.path`` and ``shutil`` functions that
were manipulating real filesystem objects and not just the path names.
Most people seemed to like this approach, because it correctly targeted
the "algebraic" features at the areas where chained operations were
most common: path name operations, not filesystem operations. Some of
the conversation moved on to the `Python 3000 list`_.

.. _PEP 355: http://www.python.org/dev/peps/pep-0355/
.. _directory-tuple Path class: http://wiki.python.org/moin/AlternativePathClass
.. _Twisted's filepath module: http://twistedmatrix.com/trac/browser/trunk/twisted/python/filepath.py
.. _Python 3000 list: http://mail.python.org/mailman/listinfo/python-3000

Contributing threads:

- `Path object design `__
- `Mini Path object `__
- `[Python-3000] Mini Path object `__

------------------
Replacing urlparse
------------------

A few more bugs in ``urlparse`` were turned up, and `earlier
discussions about replacing urlparse`_ were briefly revisited. Paul
Jimenez asked about `uriparse module`_ and was told that due to the
constant problems with ``urlparse``, people were concerned about
including the "incorrect" library again, so requirements were a little
stringent. Martin v. Löwis gave him some guidance on a few specific
points, and Nick Coghlan promised to try to post his `urischemes
module`_ (a derivative of Paul's `uriparse module`_) to the `Python
Package Index`_.

.. _earlier discussions about replacing urlparse: http://www.python.org/dev/summary/2006-06-01_2006-06-15/#rfc-3986-uniform-resource-identifiers-uris
.. _uriparse module: http://bugs.python.org/1462525
.. _urischemes module: http://bugs.python.org/1500504
.. _Python Package Index: http://www.python.org/pypi

Contributing threads:

- `patch 1462525 or similar solution?
`__
- `Path object design `__

----------------------------------
Importing .py, .pyc and .pyo files
----------------------------------

Martin v. Löwis brought up `Osvaldo Santana's patch`_ which would have
made Python search for both .pyc and .pyo files regardless of whether
or not the optimize flag, "-OO", was set (like zipimporter does).
Without this patch, when "-OO" was given, Python never looked for .pyc
files. Some people thought that an extra ``stat()`` call or directory
listing to check for the other file would be too expensive, but no one
profiled the various versions of the code so the cost was unclear.
People were leaning towards removing the extra functionality from
zipimporter so that at least it was consistent with the rest of Python.

Giovanni Bajo suggested that .pyo file support should be dropped
completely, with .pyc files being compiled at various levels of
optimization depending on the command line flags. To make sure all your
.pyc files were compiled at the same level of optimization, you'd use a
new "-I" flag to indicate that all files should be recompiled, e.g.
``python -I -OO app.py``.

Armin Rigo suggested only loading files with a .py extension. Python
would still generate .pyc files as a means of caching bytecode for
speed reasons, but it would never import them without a corresponding
.py file around. For people wanting to ship just bytecode, the cached
.pyc files could be renamed to .py files and then those could be
shipped and imported. There was some support for Armin's solution, but
it was not overwhelming.

.. _Osvaldo Santana's patch: http://bugs.python.org/1346572

Contributing thread:

- `Importing .pyc in -O mode and vice versa `__

---------------------------------------------------------------
The buffer protocol and communicating binary format information
---------------------------------------------------------------

The discussion of extending the buffer protocol to more binary formats
continued this fortnight.
Though the PIL_ had been used as an example of a library that could
benefit from an extended buffer protocol, Fredrik Lundh indicated that
future versions of the PIL_ would make the binary data model completely
opaque, and instead provide a view-style API like::

    view = object.acquire_view(region, supported formats)
    ... access data in view ...
    view.release()

Along these lines, the discussion turned away from the particular C
formats used in ``ctypes``, ``numpy``, ``array``, etc. and more towards
the best way to communicate format information between these modules.
Though it seemed like people were not completely happy with the
proposed API of the new buffer protocol, the discussion seemed to skirt
around any concrete suggestions for better APIs. In the end, the only
thing that seemed certain was that a new buffer protocol could only be
successful if it were implemented on all of the appropriate stdlib
modules: ``ctypes``, ``array``, ``struct``, etc.

.. _PIL: http://www.pythonware.com/products/pil/

Contributing threads:

- `PEP: Adding data-type objects to Python `__
- `PEP: Extending the buffer protocol to share array information. `__
- `idea for data-type (data-format) PEP `__

---------------
__dir__, part 2
---------------

Tomer Filiba continued his `previous investigations`_ into adding a
``__dir__()`` method to allow customization of the ``dir()`` builtin.
He moved most of the current ``dir()`` logic into ``object.__dir__()``,
with some additional logic necessary for modules and types being moved
to ``ModuleType.__dir__()`` and ``type.__dir__()`` respectively. He
posted a `patch for his implementation`_ and it got approval for Python
2.6. There was a brief discussion about whether or not it was okay for
an object to lie about its members, with Fredrik Lundh suggesting that
you should only be allowed to *add* to the result that ``dir()``
produces.
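The approved hook makes customizations like the following possible on Python 2.6 and later (a sketch; the classes here are illustrative, not from the patch):

```python
# A proxy that advertises its wrapped object's attributes through dir(),
# the "adding members" case discussed in the thread.

class Proxy(object):
    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Fall back to the wrapped object for unknown attributes.
        return getattr(self._target, name)

    def __dir__(self):
        # Advertise the wrapped object's attributes alongside our own.
        return sorted(set(dir(type(self)) + list(self.__dict__) +
                          dir(self._target)))

class Point(object):
    x = 0
    y = 0

p = Proxy(Point())
assert 'x' in dir(p) and 'y' in dir(p)
```

Without the ``__dir__`` override, ``dir(p)`` would list only ``Proxy``'s own members and miss everything reachable through ``__getattr__``.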
Nick Coghlan pointed out that when a class overrides
``__getattribute__()``, attributes that the default ``dir()``
implementation sees can be blocked, in which case removing members from
the result of ``dir()`` might be quite appropriate.

.. _previous investigations: http://www.python.org/dev/summary/2006-07-01_2006-07-15/#adding-a-dir-magic-method
.. _patch for his implementation: http://bugs.python.org/1591665

Contributing thread:

- `__dir__, part 2 `__

--------------------------------
Invalid read errors and valgrind
--------------------------------

Using valgrind, Herman Geza found that he was getting some "Invalid
read" errors in PyObject_Free which weren't identified as acceptable in
Misc/README.valgrind. Tim Peters and Martin v. Löwis explained that
these are okay if they are reads from Py_ADDRESS_IN_RANGE. If the
address given is Python's own memory, a valid arena index is read.
Otherwise, garbage is read (though this read will never fail since
Python always reads from the page where the about-to-be-freed block is
located). The arenas are then checked to see whether the result was
garbage or not. Neal Norwitz promised to try to update
Misc/README.valgrind with this information.

Contributing thread:

- `valgrind `__

---------------------------
SCons and cross-compilation
---------------------------

Martin v. Löwis reviewed a `patch for cross-compilation`_ which
proposed to use SCons_ instead of distutils because updating distutils
to work for cross-compilation would have involved some fairly major
changes. Distutils had certain notions of where to look for header
files and how to invoke the compiler which were incorrect for
cross-compilation, and which were difficult to change. While accepting
the patch would not have required SCons_ to be added to Python proper
(which a number of people opposed), people didn't like the idea of
having to update SCons configuration in addition to already having to
update setup.py, Modules/Setup and the PCbuild area.
The patch was therefore rejected.

.. _patch for cross-compilation: http://bugs.python.org/841454
.. _SCons: http://www.scons.org/

Contributing thread:

- `Using SCons for cross-compilation `__

----------------------------
Individual interpreter locks
----------------------------

Robert asked about having a separate lock for each interpreter instance
instead of the global interpreter lock (GIL). Brett Cannon and Martin
v. Löwis explained that a variety of objects are shared between
interpreters, including:

* extension modules
* type objects (including exception types)
* singletons like ``None``, ``True``, ``()``, strings of length 1, etc.
* many things in the sys module

A single lock for each interpreter would not be sufficient for handling
access to such shared objects.

Contributing thread:

- `Feature Request: Py_NewInterpreter to create separate GIL (branch) `__

---------------------------
Passing floats to file.seek
---------------------------

Python's implementation of ``file.seek`` was converting floats to ints.
`Robert Church suggested a patch`_ that would convert floats to long
longs and thus support files larger than 2GiB. Martin v. Löwis proposed
instead to use the ``__index__()`` API to support the large files and
to raise an exception for float arguments. Martin's approach was
approved, with a warning instead of an exception for Python 2.6.

.. _Robert Church suggested a patch: http://bugs.python.org/1067760

Contributing thread:

- `Passing floats to file.seek `__

----------------------------------------
The datetime module and timezone objects
----------------------------------------

Fredrik Lundh asked about including a ``tzinfo`` object implementation
for the ``datetime`` module, along the lines of the ``UTC``,
``FixedOffset`` and ``LocalTimezone`` classes from the `library
reference`_. A number of people reported having copied those classes
into their own code repeatedly, and so Fredrik got the go-ahead to put
them into Python 2.6.

..
_library reference: http://docs.python.org/lib/datetime-tzinfo.html Contributing thread: - `ready-made timezones for the datetime module `__ ================ Deferred Threads ================ - `Summer of Code: zipfile? `__ - `Results of the SOC projects `__ ================== Previous Summaries ================== - `The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] `__ =============== Skipped Threads =============== - `RELEASED Python 2.3.6, FINAL `__ - `[Tracker-discuss] Getting Started `__ - `Status of pairing_heap.py? `__ - `Inconvenient filename in sandbox/decimal-c/new_dt `__ - `test_ucn fails for trunk on x86 Ubuntu Edgy `__ - `Weekly Python Patch/Bug Summary `__ - `Last chance to join the Summer of PyPy! `__ - `[Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS `__ - `PyFAQ: help wanted with thread article `__ - `Arlington sprint this Saturday `__ - `Suggestion/ feature request `__ From ncoghlan at gmail.com Thu Nov 23 10:59:09 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Nov 2006 19:59:09 +1000 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4564C5E6.8070605@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> Message-ID: <456570ED.8020706@gmail.com> Martin v. L?wis wrote: > Nick Coghlan schrieb: >> Martin v. L?wis wrote: >>> I personally consider it "good style" to rely on implementation details >>> of CPython; >> Is there a 'do not' missing somewhere in there? > > No - I really mean it. I can find nothing wrong with people relying on > reference counting to close files, for example. It's a property of > CPython, and not guaranteed in other Python implementations - yet it > works in a well-defined way in CPython. 
Code that relies on that feature > is not portable, but portability is only one goal in software > development, and may be irrelevant for some projects. Cool, that's what I thought you meant (and it's a point I actually agree with). I was uncertain enough about your intent that I felt it was worth asking the question, though :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From arigo at tunes.org Thu Nov 23 12:45:09 2006 From: arigo at tunes.org (Armin Rigo) Date: Thu, 23 Nov 2006 12:45:09 +0100 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-11-01 to 2006-11-15 In-Reply-To: References: Message-ID: <20061123114509.GA7900@code0.codespeak.net> Hi Steven, On Wed, Nov 22, 2006 at 11:48:44PM -0700, Steven Bethard wrote: > (... pyc files ...) > For people wanting to ship just bytecode, the cached > .pyc files could be renamed to .py files and then those could be > shipped and imported. Yuk! Not renamed to .py files. Distributing .py files that are actually bytecode looks like a new funny way to create confusion. No, I was half-heartedly musing about introducing Yet Another file extension (.pyc for caching and .pyX for importable bytecode, or possibly the other way around). A bientot, Armin From fredrik at pythonware.com Thu Nov 23 18:06:45 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 23 Nov 2006 18:06:45 +0100 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-11-01 to 2006-11-15 In-Reply-To: <20061123114509.GA7900@code0.codespeak.net> References: <20061123114509.GA7900@code0.codespeak.net> Message-ID: Armin Rigo wrote: > Yuk! Not renamed to .py files. Distributing .py files that are > actually bytecode looks like a new funny way to create confusion. 
No, I
> was half-heartedly musing about introducing Yet Another file extension
> (.pyc for caching and .pyX for importable bytecode, or possibly the
> other way around).

an alternative would be to only support source-less PYC import from ZIP
archives (or other non-filesystem importers).

From theller at ctypes.org Fri Nov 24 20:19:54 2006
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 24 Nov 2006 20:19:54 +0100
Subject: [Python-Dev] ctypes and powerpc
Message-ID: <456745DA.3010903@ctypes.org>

I'd like to ask for help with an issue which I do not know how to
solve.

Please see this bug http://python.org/sf/1563807
"ctypes built with GCC on AIX 5.3 fails with ld ffi error"

Apparently this is a powerpc machine, ctypes builds but cannot be
imported because of undefined symbols like 'ffi_call',
'ffi_prep_closure'.

These symbols are defined in file
Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c.
The whole contents of this file is enclosed within a

#ifdef __ppc__
...
#endif

block. IIRC, this block has been added by Ronald for the Mac universal
build. Now, it seems that on the AIX machine the __ppc__ symbol is not
defined; removing the #ifdef/#endif makes the build successful.

We have asked (in the SF bug tracker) for the symbols that are defined;
one guy has executed 'gcc -v -c empty.c' and posted the output, as far
as I see these are the symbols defined in gcc:

-D__GNUC__=2 -D__GNUC_MINOR__=9 -D_IBMR2 -D_POWER -D_AIX -D_AIX32
-D_AIX41 -D_AIX43 -D_AIX51 -D_LONG_LONG -D_IBMR2 -D_POWER -D_AIX
-D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_LONG_LONG -Asystem(unix)
-Asystem(aix) -D__CHAR_UNSIGNED__ -D_ARCH_COM

What should we do now? Should the conditional be changed to

#if defined(__ppc__) || defined(_POWER)

or should we suggest to add '-D__ppc__' to the CFLAGS env var, or what?
Any suggestions?
Thanks, Thomas From theller at ctypes.org Fri Nov 24 20:59:41 2006 From: theller at ctypes.org (Thomas Heller) Date: Fri, 24 Nov 2006 20:59:41 +0100 Subject: [Python-Dev] ctypes and powerpc In-Reply-To: <456745DA.3010903@ctypes.org> References: <456745DA.3010903@ctypes.org> Message-ID: <45674F2D.7050203@ctypes.org> Thomas Heller schrieb: > I'd like to ask for help with an issue which I do not know > how to solve. > > Please see this bug http://python.org/sf/1563807 > "ctypes built with GCC on AIX 5.3 fails with ld ffi error" > > Apparently this is a powerpc machine, ctypes builds but cannot be imported > because of undefined symbols like 'ffi_call', 'ffi_prep_closure'. > > These symbols are defined in file > Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c. > The whole contents of this file is enclosed within a > > #ifdef __ppc__ > ... > #endif > > block. IIRC, this block has been added by Ronald for the > Mac universal build. Now, it seems that on the AIX machine > the __ppc__ symbols is not defined; removing the #ifdef/#endif > makes the built successful. Of course, the simple solution would be to change it to: #ifndef __i386__ ... #endif Thomas From martin at v.loewis.de Sat Nov 25 08:23:21 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Nov 2006 08:23:21 +0100 Subject: [Python-Dev] ctypes and powerpc In-Reply-To: <456745DA.3010903@ctypes.org> References: <456745DA.3010903@ctypes.org> Message-ID: <4567EF69.9080601@v.loewis.de> Thomas Heller schrieb: > What should we do now? Should the conditional be changed to > > #if defined(__ppc__) || defined(_POWER) > This would be the right test, if you want to test for "power-pc like". POWER and PowerPC are different processor architectures, IBM pSeries machines (now System p) have POWER processors; this is the predecessor of the PowerPC architecture (where PowerPC omitted some POWER features, and added new ones). 
Recent POWER processors (POWER3 and later, since 1997) are apparently
PowerPC-compatible. Still, AIX probably continues to define _POWER for
backwards-compatibility (back to RS/6000 times).

Regards,
Martin

From ronaldoussoren at mac.com Sat Nov 25 08:24:07 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 24 Nov 2006 23:24:07 -0800
Subject: [Python-Dev] ctypes and powerpc
In-Reply-To: <456745DA.3010903@ctypes.org>
References: <456745DA.3010903@ctypes.org>
Message-ID:

On Friday, November 24, 2006, at 08:21PM, "Thomas Heller" wrote:

>I'd like to ask for help with an issue which I do not know
>how to solve.
>
>Please see this bug http://python.org/sf/1563807
>"ctypes built with GCC on AIX 5.3 fails with ld ffi error"
>
>Apparently this is a powerpc machine, ctypes builds but cannot be imported
>because of undefined symbols like 'ffi_call', 'ffi_prep_closure'.
>
>These symbols are defined in file
> Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c.
>The whole contents of this file is enclosed within a
>
>#ifdef __ppc__
>...
>#endif
>
>block. IIRC, this block has been added by Ronald for the
>Mac universal build. Now, it seems that on the AIX machine
>the __ppc__ symbol is not defined; removing the #ifdef/#endif
>makes the build successful.

The defines were indeed added for the universal build and I completely
overlooked the fact that ffi_darwin.c is also used for AIX.

One way to fix this is

#if ! (defined(__APPLE__) && !defined(__ppc__))
...
#endif

That is, compile the file unless __APPLE__ is defined but __ppc__
isn't. This more clearly documents the intent.
> >We have asked (in the SF bug tracker) for the symbols that are defined;
>one guy has executed 'gcc -v -c empty.c' and posted the output, as far as I
>see these are the symbols defined in gcc:
>
>-D__GNUC__=2
>-D__GNUC_MINOR__=9 -D_IBMR2 -D_POWER -D_AIX -D_AIX32 -D_AIX41 -D_AIX43
>-D_AIX51 -D_LONG_LONG -D_IBMR2 -D_POWER -D_AIX -D_AIX32 -D_AIX41 -D_AIX43
>-D_AIX51 -D_LONG_LONG -Asystem(unix) -Asystem(aix) -D__CHAR_UNSIGNED__
>-D_ARCH_COM
>
>What should we do now? Should the conditional be changed to
>
>#if defined(__ppc__) || defined(_POWER)
>
>or should we suggest to add '-D__ppc__' to the CFLAGS env var, or what?
>Any suggestions?
>
>Thanks,
>Thomas
>
>_______________________________________________
>Python-Dev mailing list
>Python-Dev at python.org
>http://mail.python.org/mailman/listinfo/python-dev
>Unsubscribe: http://mail.python.org/mailman/options/python-dev/ronaldoussoren%40mac.com
>

From tomerfiliba at gmail.com Sun Nov 26 16:40:52 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sun, 26 Nov 2006 17:40:52 +0200
Subject: [Python-Dev] infinities
Message-ID: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com>

i found several places in my code where i use positive infinity
(posinf) for various things, i.e.,

def readline(self, limit = -1):
    if limit < 0:
        limit = 1e10000 # posinf
    chars = []
    while limit > 0:
        ch = self.read(1)
        chars.append(ch)
        if not ch or ch == "\n":
            break
        limit -= 1
    return "".join(chars)

i like the concept, but i hate the "1e10000" stuff... why not add
posinf, neginf, and nan to the float type? i find it much more readable
as:

if limit < 0:
    limit = float.posinf

posinf, neginf and nan are singletons, so there's no problem with
adding as members to the type.
-tomer From bob at redivi.com Sun Nov 26 16:52:24 2006 From: bob at redivi.com (Bob Ippolito) Date: Sun, 26 Nov 2006 10:52:24 -0500 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> Message-ID: <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> On 11/26/06, tomer filiba wrote: > i found several places in my code where i use positive infinity > (posinf) for various things, i.e., > > def readline(self, limit = -1): > if limit < 0: > limit = 1e10000 # posinf > chars = [] > while limit > 0: > ch = self.read(1) > chars.append(ch) > if not ch or ch == "\n": > break > limit -= 1 > return "".join(chars) > > i like the concept, but i hate the "1e10000" stuff... why not add > posint, neginf, and nan to the float type? i find it much more readable as: > > if limit < 0: > limit = float.posinf > > posinf, neginf and nan are singletons, so there's no problem with > adding as members to the type. sys.maxint makes more sense there. Or you could change it to "while limit != 0" and set it to -1 (though I probably wouldn't actually do that)... There is already a PEP 754 for float constants, which is implemented in the fpconst module (see CheeseShop). It's not (yet) part of the stdlib though. -bob From tomerfiliba at gmail.com Sun Nov 26 18:07:08 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 26 Nov 2006 19:07:08 +0200 Subject: [Python-Dev] infinities In-Reply-To: <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> Message-ID: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> > sys.maxint makes more sense there. 
no, it requires *infinity* to accomplish x - y == x; y != 0, for example: while limit > 0: limit -= len(chunk) with limit = posinf, the above code should be equivalent to "while True". > There is already a PEP 754 for float constants okay, that would suffice. but why isn't it part of stdlib already? the pep is three years old... it should either be rejected or accepted. meanwhile, there are lots of missing API functions in the floating-point implementation... besides, all the suggested APIs should be part of the float type, not a separate module. here's what i want: >>> f = 5.0 >>> f.is_infinity() False >>> float.PosInf 1.#INF -tomer On 11/26/06, Bob Ippolito wrote: > On 11/26/06, tomer filiba wrote: > > i found several places in my code where i use positive infinity > > (posinf) for various things, i.e., > > > > def readline(self, limit = -1): > > if limit < 0: > > limit = 1e10000 # posinf > > chars = [] > > while limit > 0: > > ch = self.read(1) > > chars.append(ch) > > if not ch or ch == "\n": > > break > > limit -= 1 > > return "".join(chars) > > > > i like the concept, but i hate the "1e10000" stuff... why not add > > posint, neginf, and nan to the float type? i find it much more readable as: > > > > if limit < 0: > > limit = float.posinf > > > > posinf, neginf and nan are singletons, so there's no problem with > > adding as members to the type. > > sys.maxint makes more sense there. Or you could change it to "while > limit != 0" and set it to -1 (though I probably wouldn't actually do > that)... > > There is already a PEP 754 for float constants, which is implemented > in the fpconst module (see CheeseShop). It's not (yet) part of the > stdlib though. 
> > -bob > From fredrik at pythonware.com Sun Nov 26 18:13:16 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 26 Nov 2006 18:13:16 +0100 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> Message-ID: tomer filiba wrote: > no, it requires *infinity* to accomplish x - y == x; y != 0, for example: > > while limit > 0: > limit -= len(chunk) > > with limit = posinf, the above code should be equivalent to "while True". that's a remarkably stupid way to count bytes. if you want to argue for additions to the language, you could at least bother to come up with a sane use case. From pje at telecommunity.com Sun Nov 26 18:59:13 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 26 Nov 2006 12:59:13 -0500 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.co m> References: <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> Message-ID: <5.1.1.6.0.20061126125541.027f7198@sparrow.telecommunity.com> At 07:07 PM 11/26/2006 +0200, tomer filiba wrote: > > sys.maxint makes more sense there. >no, it requires *infinity* to accomplish x - y == x; y != 0, for example: > >while limit > 0: > limit -= len(chunk) Um, you do realize that you're not going to be able to fit sys.maxint strings into a list, right? That's over 2 billion *pointers* worth of memory, so at least 8 gigabytes on a 32-bit machine... that probably can't address more than 4 gigabytes of memory to start with. The code will fail with MemoryError long before you exhaust sys.maxint, even in the case where you're using only 1-character strings. 
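The fpconst module Bob mentions (PEP 754) avoids overflowing literals entirely by packing the IEEE-754 bit patterns directly, which is why it only works where doubles are IEEE-754. A minimal sketch of that approach — the helper names below are illustrative, not fpconst's exact API:

```python
import struct

# IEEE-754 double-precision bit patterns, big-endian:
# sign bit, 11 exponent bits (all ones for inf/nan), 52 fraction bits.
POSINF = struct.unpack(">d", b"\x7f\xf0\x00\x00\x00\x00\x00\x00")[0]
NEGINF = struct.unpack(">d", b"\xff\xf0\x00\x00\x00\x00\x00\x00")[0]
NAN    = struct.unpack(">d", b"\x7f\xf8\x00\x00\x00\x00\x00\x00")[0]

def is_inf(x):
    return x == POSINF or x == NEGINF

def is_nan(x):
    return x != x   # NaN is the only float unequal to itself

assert is_inf(POSINF) and is_inf(NEGINF)
assert not is_inf(1.5)
assert is_nan(NAN) and not is_nan(POSINF)
```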
From julvar at tamu.edu Tue Nov 21 07:29:45 2006 From: julvar at tamu.edu (Julian) Date: Tue, 21 Nov 2006 00:29:45 -0600 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <45629636.6090407@v.loewis.de> Message-ID: <009201c70d36$714dd5f0$24b75ba5@aero.ad.tamu.edu> > -----Original Message----- > From: "Martin v. L?wis" [mailto:martin at v.loewis.de] > Sent: Tuesday, November 21, 2006 12:01 AM > To: Julian > Cc: python-dev at python.org > Subject: Re: [Python-Dev] Suggestion/ feature request > > Julian schrieb: > > I am using python with swig and I get a lot of macro redefinition > > warnings like so: > > warning C4005: '_CRT_SECURE_NO_DEPRECATE' : macro redefinition > > > > In the file - pyconfig.h - rather than the following lines, I was > > wondering if it would be more reasonable to use #ifdef > statements as > > shown in the bottom of the email... > > While I agree that would be reasonable, I also wonder why you > are getting these errors. Where is the first definition of > these macros, and how is the macro defined at the first definition? > > Regards, > Martin In my specific case, the order of the definitions in any wrapper file created by SWIG (I am using Version 1.3.30) looks like this: //example_wrap.cxx //snipped code /* Deal with Microsoft's attempt at deprecating C standard runtime functions */ #if !defined(SWIG_NO_CRT_SECURE_NO_DEPRECATE) && defined(_MSC_VER) && !defined(_CRT_SECURE_NO_DEPRECATE) # define _CRT_SECURE_NO_DEPRECATE #endif /* Python.h has to appear first */ #include //snipped code SWIG seems to have done it properly by checking to see if it has been defined already (which, I think, is how python should do it as well) Now, even if I am not using SWIG, I could imagine these being defined elsewhere (by other headers/libraries) or even by setting them in the VS2005 IDE project settings (which I actually do sometimes). 
While these are *just* warnings and not errors, it would look cleaner if pyconfig.h would check if they were defined already. Julian. From julvar at tamu.edu Tue Nov 21 20:13:09 2006 From: julvar at tamu.edu (Julian) Date: Tue, 21 Nov 2006 13:13:09 -0600 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <456343E3.4000203@v.loewis.de> Message-ID: <000001c70da1$15260f20$24b75ba5@aero.ad.tamu.edu> > -----Original Message----- > From: "Martin v. L?wis" [mailto:martin at v.loewis.de] > Sent: Tuesday, November 21, 2006 12:22 PM > To: Julian > Cc: python-dev at python.org > Subject: Re: [Python-Dev] Suggestion/ feature request > > Julian schrieb: > > SWIG seems to have done it properly by checking to see if > it has been > > defined already (which, I think, is how python should do it > as well) > > Now, even if I am not using SWIG, I could imagine these > being defined > > elsewhere (by other headers/libraries) or even by setting > them in the > > VS2005 IDE project settings (which I actually do sometimes). While > > these are *just* warnings and not errors, it would look cleaner if > > pyconfig.h would check if they were defined already. > > Sure; I have fixed this now in r52817 and r52818 > > I just wondered why you get the warning: you shouldn't get > one if the redefinition is the same as the original one. In > this case, it wasn't the same redefinition, as SWIG was > merely defining them, and Python was defining them to 1. > > Regards, > Martin > Thanks! you are right... I didn't know that ! I have two questions though... Is there any reason why Python is defining them to 1? In pyconfig.h, there is: #ifndef _CRT_SECURE_NO_DEPRECATE #define _CRT_SECURE_NO_DEPRECATE 1 #endif And then later on in the same file: /* Turn off warnings about deprecated C runtime functions in VisualStudio .NET 2005 */ #if _MSC_VER >= 1400 && !defined _CRT_SECURE_NO_DEPRECATE #define _CRT_SECURE_NO_DEPRECATE #endif Isn't that redundant? 
I don't think that second block will ever get executed. Moreover, in the second block, it is not being defined to 1. why is that ? Julian. From imurdock at imurdock.com Wed Nov 22 17:09:35 2006 From: imurdock at imurdock.com (Ian Murdock) Date: Wed, 22 Nov 2006 11:09:35 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: Hi everyone, Guido van Rossum suggested I send this email here. I'm CTO of the Free Standards Group and chair of the Linux Standard Base (LSB), the interoperability standard for the Linux distributions. We're wanting to add Python to the next version of the LSB (LSB 3.2) [1] and are looking for someone (or, better, a few folks) in the Python community to help us lead the effort to do that. The basic goal is to standardize the Python environment compliant Linux distributions (Red Hat, SUSE, Debian, Ubuntu, etc.) provide so that application developers can write LSB compliant applications in Python. [1] http://www.freestandards.org/en/LSB_Roadmap The first question we have to answer is: What does it mean to "add Python to the LSB"? Is it enough to say that Python is present at a certain version and above, or do we need to do more than that (e.g., many distros ship numerous Python add-ons which apps may or may not rely on--do we need to specific some of these too)? What would be the least common denominator version? Answering this question will require us to look at the major Linux distros (RHEL, SLES, Debian, Ubuntu, etc.) to see what versions they ship. And so on. Once we've decided how best to specify that Python is present, how do we test that it is indeed present? Of course, there's the existing Python test suites, so there shouldn't be a lot of work to do here. Another question is how to handle binary modules. The LSB provides strict backward compatibility at the binary level, even across major versions, and that may or may not be appropriate for Python. 
The LSB is mostly concerned with backward compatibility from an application developer's point of view, and this would seem to mean largely 100% Python, whereas C extensions would seem to be largely the domain of component developers, such as Python access to Gtk or other OS services (here, we'd probably look to add those components to the LSB directly rather than specifying the Python ABI so they can be maintained separately). Of course I could be wrong about this. Anyway, as you can see, there are numerous issues to work out here. If anyone is interested in getting involved, please drop me a line, and I'd be happy to answer any questions (discussion on any of the topics above would be welcomed as well). Finally, for any Python developers in and around Berlin, the LSB is holding its next face to face meeting in Berlin December 4-6, where the LSB 3.2 roadmap will be finalized. If you could find some time to stop by and talk with us, we would deeply appreciate it: http://www.freestandards.org/en/LSB_face-to-face_%28December_2006%29 Thanks, -ian -- Ian Murdock 317-863-2590 http://ianmurdock.com/ "Don't look back--something might be gaining on you." --Satchel Paige From cfarwell at mac.com Sun Nov 26 13:45:03 2006 From: cfarwell at mac.com (Chris Farwell) Date: Sun, 26 Nov 2006 12:45:03 +0000 Subject: [Python-Dev] (no subject) Message-ID: <3471ECA0-BDC2-4830-8901-4E274F0EF802@mac.com> Mr. Rossum, I saw an old post you made about the Google Internships (Jan 25,2006). As a prospective for next summer, you mention that it would be in my best interest to contact brett Cannon. I have many questions I'd love to have answered, how do I go about contacting him? I look forward to your reply. Chris Farwell cfarwell at mac.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061126/03848bea/attachment-0001.html From martin at v.loewis.de Sun Nov 26 19:48:29 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 26 Nov 2006 19:48:29 +0100 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> Message-ID: <4569E17D.3070608@v.loewis.de> tomer filiba schrieb: > okay, that would suffice. but why isn't it part of stdlib already? > the pep is three years old... it should either be rejected or accepted. > meanwhile, there are lots of missing API functions in the floating-point > implementation... It's not rejected because people keep requesting the feature, and not accepted because it's not implementable in general (i.e. it is difficult to implement on platforms where the double type is not IEEE-754). Regards, Martin From martin at v.loewis.de Sun Nov 26 20:10:12 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 26 Nov 2006 20:10:12 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: Message-ID: <4569E694.7040901@v.loewis.de> Ian Murdock schrieb: > I'm CTO of the Free Standards Group and chair of the Linux Standard > Base (LSB), the interoperability standard for the Linux distributions. > We're wanting to add Python to the next version of the LSB (LSB 3.2) [1] > and are looking for someone (or, better, a few folks) in the Python > community to help us lead the effort to do that. The basic goal > is to standardize the Python environment compliant Linux distributions > (Red Hat, SUSE, Debian, Ubuntu, etc.) provide so that > application developers can write LSB compliant applications in Python. 
I wrote to Ian that I would be interested; participating in the meeting in Berlin is quite convenient. I can try to keep python-dev updated. Regards, Martin From aahz at pythoncraft.com Sun Nov 26 20:20:27 2006 From: aahz at pythoncraft.com (Aahz) Date: Sun, 26 Nov 2006 11:20:27 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <4569E694.7040901@v.loewis.de> References: <4569E694.7040901@v.loewis.de> Message-ID: <20061126192026.GA5909@panix.com> On Sun, Nov 26, 2006, "Martin v. L?wis" wrote: > > I wrote to Ian that I would be interested; participating in the meeting > in Berlin is quite convenient. I can try to keep python-dev updated. Please do -- it's not something I have a lot of cycles for but am interested in. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Usenet is not a democracy. It is a weird cross between an anarchy and a dictatorship. From pje at telecommunity.com Sun Nov 26 20:41:16 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 26 Nov 2006 14:41:16 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: Message-ID: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> At 11:09 AM 11/22/2006 -0500, Ian Murdock wrote: >The first question we have to answer is: What does it mean to "add >Python to the LSB"? Is it enough to say that Python is present >at a certain version and above, or do we need to do more than that >(e.g., many distros ship numerous Python add-ons which apps >may or may not rely on--do we need to specific some of these too)? Just a suggestion, but one issue that I think needs addressing is the FHS language that leads some Linux distros to believe that they should change Python's normal installation layout (sometimes in bizarre ways) or that they should remove and separately package different portions of the standard library. 
Other vendors apparently also patch Python in various ways to support their FHS-based theories of how Python should install files. These changes are detrimental to compatibility. Another issue is specifying dependencies. The existence of the Cheeseshop as a central registry of Python project names has not been taken into account in vendor packaging practices, for example. (Python 2.5 also introduced the ability to install metadata alongside installed Python packages, supporting runtime checking for package presence and versions.) I don't know how closely these issues tie into what the LSB is tying to do, as I've only observed these issues in the breach, where certain distribution policies require e.g. that project names be replaced with internal package names, demand separation of package data files from their packages, or other procrustean chopping that makes mincemeat of any attempt at multi-distribution compatibility for an application or multi-dependency library. Some clarification at the LSB level of what is actually considered standard for Python might perhaps be helpful in motivating updates to some of these policies. From tomerfiliba at gmail.com Sun Nov 26 20:57:08 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 26 Nov 2006 21:57:08 +0200 Subject: [Python-Dev] infinities In-Reply-To: <5.1.1.6.0.20061126125541.027f7198@sparrow.telecommunity.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <5.1.1.6.0.20061126125541.027f7198@sparrow.telecommunity.com> Message-ID: <1d85506f0611261157m15bcf761vdc5e1f57960f19f8@mail.gmail.com> > Um, you do realize that you're not going to be able to fit sys.maxint > strings into a list, right? i can multiply by four, thank you. of course i don't expect anyone to read a string *that* long. besides, this *particular example* isn't important, it was just meant to show why someone might want to use it. 
why are people being so picky about the details of an example code? first of all, a "while True" loop is not limited by sys.maxint, so i see no reason why i couldn't get the same result by subtracting from infinity. that may seem blunt, but it's a good way have the same code handle both cases (limited and unlimited reading). all i was asking for was a better way to express and handle infinity (and nan), instead of the poor-man's version of "nan = 2e2222/3e3333". float.posinf or float.isinf(5.0) seem the right way to me. for some reference, it seemed the right way to other people too: http://msdn2.microsoft.com/en-gb/library/system.double_methods.aspx http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Float.html the third-party fp module is nice, but it ought to be part of the float type, or at least part of stdlib. - - - - - - if it were up to me, *literals* producing infinity would be a syntax error (of course i would allow computations to result in infinity). for the reason why, consider this: >>> 1e11111 == 2e22222 True -tomer From talin at acm.org Sun Nov 26 21:24:03 2006 From: talin at acm.org (Talin) Date: Sun, 26 Nov 2006 12:24:03 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see Message-ID: <4569F7E3.9040004@acm.org> I've been looking once again over the docs for distutils and setuptools, and thinking to myself "this seems a lot more complicated than it ought to be". Before I get into detail, however, I want to explain carefully the scope of my critique - in particular, why I am talking about setuptools on the python-dev list. You see, in my mind, the process of assembling, distributing, and downloading a package is, or at least ought to be, a unified process. It ought to be a fundamental part of the system, and not split into separate tools with separate docs that have to be mentally assembled in order to understand it. 
Moreover, setuptools is the defacto standard these days - a novice programmer who googles for 'python install tools' will encounter setuptools long before they learn about distutils; and if you read the various mailing lists and blogs, you'll sense a subtle aura of deprecation and decay that surrounds distutils. I would claim, then, that regardless of whether setuptools is officially blessed or not, it is an intrinstic part of the "Python experience". (I'd also like to put forward the disclaimer that there are probably factual errors in this post, or errors of misunderstanding; All I can claim as an excuse is that it's not for lack of trying, and corrections are welcome as always.) Think about the idea of module distribution from a pedagogical standpoint - when does a newbie Python programmer start learning about module distribution and what do they learn first? A novice Python user will begin by writing scripts for themselves, and not thinking about distribution at all. However, once they reach the point where they begin to think about packaging up their module, the Python documentation ought to be able to lead them, step by step, towards a goal of making a distributable package: -- It should teach them how to organize their code into packages and modules -- It should show them how to write the proper setup scripts -- If there is C code involved, it should explain how that fits into the picture. -- It should explain how to write unit tests and where they should go. So how does the current system fail in this regard? The docs for each component - distutils, setuptools, unit test frameworks, and so on, only talk about that specific module - not how it all fits together. For example, the docs for distutils start by telling you how to build a setup script. It never explains why you need a setup script, or why Python programs need to be "installed" in the first place. [1] The distutils docs never describe how your directory structure ought to look. 
In fact, they never tell you how to *write* a distributable package; rather, it seems to be more oriented towards taking an already-working package and modifying it to be distributable. The setuptools docs are even worse in this regard. If you look carefully at the docs for setuptools, you'll notice that each subsection is effectively a 'diff', describing how setuputils is different from distutils. One section talks about the "new and changed keywords", without explaining what the old keywords were or how to find them. Thus, for the novice programmer, learning how to write a setup script ends up being a process of flipping back and forth between the distutils and setuptools docs, trying to hold in their minds enough of each to be able to achieve some sort of understanding. What we have now does a good job of explaining how the individual tools work, but it doesn't do a good job of answering the question "Starting from an empty directory, how do I create a distributable Python package?" A novice programmer wants to know what to create first, what to create next, and so on. This is especially true if the novice programmer is creating an extension module. Suppose I have a C library that I need to wrap. In order to even compile and test it, I'm going to need a setup script. That means I need to understand distutils before I even think about distribution, before I even begin writing the code! (Sure, I could write a Makefile, but I'd only end up throwing it away later -- so why not cut to the chase and *start* with a setup script? Ans: Because it's too hard!) But it isn't just the docs that are at fault here - otherwise, I'd be posting this on a different mailing list. It seems like the whole architecture is 'diff'-based, a series of patches on top of patches, which are in need of some serious refactoring. Except that nobody can do this refactoring, because there's no formal list of requirements. 
I look at distutils, and while some parts are obvious, there are other parts where I go "what problem were they trying to solve here?" In my experience, you *don't* go mucking with someone's code and trying to fix it unless you understand what problem they were trying to solve - otherwise you'll botch it and make a mess. Since few people ever bother to write down what problem they were trying to solve (although they tend to be better at describing their clever solution), usually this ends up being done through a process of reverse engineering the requirements from the code, unless you are lucky enough to have someone around who knows the history of the thing. Admittedly, I'm somewhat in ignorance here. My perspective is that of an 'end-user developer', someone who uses these tools but does not write them. I don't know the internals of these tools, nor do I particularly want to - I've got bigger fish to fry. I'm posting this here because what I'd like folks to think about is the whole process of Python development, not just the documentation. What is the smoothest path from empty directory to a finished package on PyPI? What can be changed about the current standard libraries that will ease this process? [1] The answer, AFAICT, is that 'setup' is really a Makefile - in other words, its a platform-independent way of describing how to construct a compiled module from sources, and making it available to all programs on that system. Although this gets confusing when we start talking about "pure python" modules that have no C component - because we have all this language that talks about compiling and installing and such, when all that is really going on underneath is a plain old file copy. 
-- Talin From fredrik at pythonware.com Sun Nov 26 21:35:03 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 26 Nov 2006 21:35:03 +0100 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <4569F7E3.9040004@acm.org> References: <4569F7E3.9040004@acm.org> Message-ID: Talin wrote: > But it isn't just the docs that are at fault here - otherwise, I'd be > posting this on a different mailing list. It seems like the whole > architecture is 'diff'-based, a series of patches on top of patches, > which are in need of some serious refactoring. so to summarize, you want someone to rewrite the code and write new documentation, and since you didn't even have time to make your post shorter, that someone will obviously not be you ? From talin at acm.org Sun Nov 26 21:48:05 2006 From: talin at acm.org (Talin) Date: Sun, 26 Nov 2006 12:48:05 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: References: <4569F7E3.9040004@acm.org> Message-ID: <4569FD85.4010006@acm.org> Fredrik Lundh wrote: > Talin wrote: > >> But it isn't just the docs that are at fault here - otherwise, I'd be >> posting this on a different mailing list. It seems like the whole >> architecture is 'diff'-based, a series of patches on top of patches, >> which are in need of some serious refactoring. > > so to summarize, you want someone to rewrite the code and write new > documentation, and since you didn't even have time to make your post > shorter, that someone will obviously not be you ? Oh, it was a lot longer when I started :) As far as rewriting it goes - I can only rewrite things that I understand. 
> From sluggoster at gmail.com Sun Nov 26 22:21:55 2006 From: sluggoster at gmail.com (Mike Orr) Date: Sun, 26 Nov 2006 13:21:55 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <4569F7E3.9040004@acm.org> References: <4569F7E3.9040004@acm.org> Message-ID: <6e9196d20611261321m5142989yef4c9180ebc9427e@mail.gmail.com> On 11/26/06, Talin wrote: > I've been looking once again over the docs for distutils and setuptools, > and thinking to myself "this seems a lot more complicated than it ought > to be". > > Before I get into detail, however, I want to explain carefully the scope > of my critique - in particular, why I am talking about setuptools on the > python-dev list. You see, in my mind, the process of assembling, > distributing, and downloading a package is, or at least ought to be, a > unified process. It ought to be a fundamental part of the system, and > not split into separate tools with separate docs that have to be > mentally assembled in order to understand it. > > Moreover, setuptools is the defacto standard these days - a novice > programmer who googles for 'python install tools' will encounter > setuptools long before they learn about distutils; and if you read the > various mailing lists and blogs, you'll sense a subtle aura of > deprecation and decay that surrounds distutils. Look at the current situation as more of an evoluntionary point than a finished product. There's widespread support for integrating setuptools into Python as you suggest. I've heard it discussed at Pycon the past two years. The reason it hasn't been done yet is technical, from what I've heard. Distutils is apparently difficult to patch correctly and could stand a rewrite. I'm currently studying the Pylons implementation and thus having to learn more about entry points, resources, ini files used by eggs, etc. This requires studying three different pages on the peak.telecommunity.com site -- exactly the problem you're describing. 
A comprehensive third-party manual that integrates the documentation would be a good place to start. Even the outline of such a manual would be a good. That would give a common baseline of understanding for package users, package developers, and core developers. I wonder if one of the Python books already has this written down somewhere. >From the manual one could then distill a spec for "what's needed in a package manager, what features a distutils upgrade would provide, and what a package should/may contain". That would be a basis for one or more PEPs. The "diff" approach is understandable at the beginning, because that's how the developers think of it, and how most users will approach it initially. We also needed real-world experience to see if the setuptools approach was even feasable large-scale or whether it needed major changes. Now we have more experience, and more Pythoneers are appearing who are unfamiliar with the "distutils-only" approach. So requests like Talin's will become more frequent. It's such a big job and Python 2.6 is slated as "minimal features" release, so it may be better to target this for Python 3 and backport it if possible. -- Mike Orr From pje at telecommunity.com Sun Nov 26 23:36:27 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 26 Nov 2006 17:36:27 -0500 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <6e9196d20611261321m5142989yef4c9180ebc9427e@mail.gmail.com > References: <4569F7E3.9040004@acm.org> <4569F7E3.9040004@acm.org> Message-ID: <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> At 01:21 PM 11/26/2006 -0800, Mike Orr wrote: >A comprehensive third-party manual that integrates the documentation >would be a good place to start. Even the outline of such a manual >would be a good. That would give a common baseline of understanding >for package users, package developers, and core developers. 
A number of people have written quick-start or how-to guides for setuptools, although I haven't been keeping track. I have noticed, however, that a signficant number of help requests for setuptools can be answered by internal links to one of its manuals -- and when a topic comes up that isn't in the manual, I usually add it. The "diff" issue is certainly there, of course, as is the fact that there are multiple manuals. However, I don't think the answer is fewer manuals, in fact it's likely to be having *more*. What exists right now is a developer's guide and reference for setuptools, a reference for the pkg_resources API, and an all-purpose handbook for easy_install. Each of these could use beginner's introductions or tutorials that are deliberately short on details, but which provide links to the relevant sections of the comprehensive manuals. My emphasis on the existing manuals was aimed at early adopters, who were likely to be familiar with at least some of distutils' hazards and difficulties, and thus would learn most quickly (and be most motivated) by seeing what was different. Obviously, nearly everybody in that camp has either already switched or decided they're not switching due to investment in other distutils-wrapping technologies and/or incompatible philosophies. So, the manuals are no longer adequate for the next wave of developers. Anyway, I would be happy to link from the manuals and Cheeseshop page to quality tutorials that focus on one or more aspects of developing, packaging, or distributing Python projects using setuptools. From guido at python.org Mon Nov 27 01:07:59 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Nov 2006 16:07:59 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <4569E694.7040901@v.loewis.de> References: <4569E694.7040901@v.loewis.de> Message-ID: Excellent! Like Aahz, I have no cycles, but I think it's a worthy goal. --Guido On 11/26/06, "Martin v. 
Löwis" wrote: > Ian Murdock schrieb: > > I'm CTO of the Free Standards Group and chair of the Linux Standard > > Base (LSB), the interoperability standard for the Linux distributions. > > We're wanting to add Python to the next version of the LSB (LSB 3.2) [1] > > and are looking for someone (or, better, a few folks) in the Python > > community to help us lead the effort to do that. The basic goal > > is to standardize the Python environment compliant Linux distributions > > (Red Hat, SUSE, Debian, Ubuntu, etc.) provide so that > > application developers can write LSB compliant applications in Python. > > I wrote to Ian that I would be interested; participating in the meeting > in Berlin is quite convenient. I can try to keep python-dev updated. > > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From sluggoster at gmail.com Mon Nov 27 04:05:06 2006 From: sluggoster at gmail.com (Mike Orr) Date: Sun, 26 Nov 2006 19:05:06 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> References: <4569F7E3.9040004@acm.org> <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> Message-ID: <6e9196d20611261905u2939eacal3313b2d3d192420f@mail.gmail.com> On 11/26/06, Phillip J. Eby wrote: > I have noticed, however, that a signficant number of help requests for > setuptools can be answered by internal links to one of its manuals -- and > when a topic comes up that isn't in the manual, I usually add it. Hmm, I may have a couple topics for you after I check my notes. > The "diff" issue is certainly there, of course, as is the fact that there > are multiple manuals. 
However, I don't think the answer is fewer manuals, > in fact it's likely to be having *more*. What exists right now is a > developer's guide and reference for setuptools, a reference for the > pkg_resources API, and an all-purpose handbook for easy_install. Each of > these could use beginner's introductions or tutorials that are deliberately > short on details, but which provide links to the relevant sections of the > comprehensive manuals. I could see a comprehensive manual running forty pages, and most readers only caring about a small fraction of it. So you have a point. Maybe more important than one book is having "one place to go", a TOC of articles that are all independent yet written to complement each other. But Talin's point is still valid. Users have questions like, "How do I structure my package so it takes advantage of all the gee-whiz cheeseshop features? Where do I put my tests? Should I use unittest, py.test, or nose? How will users see my README and my docs if they easy_install my package? What are all those files in the EGG-INFO directory? What's that word 'distribution' in some of the function signatures? How do I use entry points, they look pretty complicated?" Some of these questions are multi-tool or are outside the scope of setuptools; some span both the Peak docs and the Python docs. People need an answer that starts with their question, rather than an answer that's a section in a manual describing a particular tool. -- Mike Orr From talin at acm.org Mon Nov 27 04:11:18 2006 From: talin at acm.org (Talin) Date: Sun, 26 Nov 2006 19:11:18 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <6e9196d20611261905u2939eacal3313b2d3d192420f@mail.gmail.com> References: <4569F7E3.9040004@acm.org> <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> <6e9196d20611261905u2939eacal3313b2d3d192420f@mail.gmail.com> Message-ID: <456A5756.80406@acm.org> Mike Orr wrote: > On 11/26/06, Phillip J. 
Eby wrote: >> I have noticed, however, that a signficant number of help requests for >> setuptools can be answered by internal links to one of its manuals -- and >> when a topic comes up that isn't in the manual, I usually add it. > > Hmm, I may have a couple topics for you after I check my notes. > >> The "diff" issue is certainly there, of course, as is the fact that there >> are multiple manuals. However, I don't think the answer is fewer manuals, >> in fact it's likely to be having *more*. What exists right now is a >> developer's guide and reference for setuptools, a reference for the >> pkg_resources API, and an all-purpose handbook for easy_install. Each of >> these could use beginner's introductions or tutorials that are deliberately >> short on details, but which provide links to the relevant sections of the >> comprehensive manuals. > > I could see a comprehensive manual running forty pages, and most > readers only caring about a small fraction of it. So you have a > point. Maybe more impotant than one book is having "one place to go", > a TOC of articles that are all independent yet written to complement > each other. > > But Talin's point is still valid. Users have questions like, "How do > I structure my package so it takes advantage of all the gee-whiz > cheeseshop features? Where do I put my tests? Should I use unittest, > py.test, or nose? How will users see my README and my docs if they > easy_install my package? What are all those files in the EGG-INFO > directory? What's that word 'distribution' in some of the function > signatures? How do I use entry points, they look pretty complicated?" > Some of these questions are multi-tool or are outside the scope of > setuptools; some span both the Peak docs and the Python docs. People > need an answer that starts with their question, rather than an answer > that's a section in a manual describing a particular tool. 
You said it way better than I did - I feel totally validated now :) -- Talin From rhamph at gmail.com Mon Nov 27 10:17:10 2006 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 27 Nov 2006 02:17:10 -0700 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> Message-ID: On 11/26/06, tomer filiba wrote: > i found several places in my code where i use positive infinity > (posinf) for various things, i.e., > > > i like the concept, but i hate the "1e10000" stuff... why not add > posint, neginf, and nan to the float type? i find it much more readable as: > > if limit < 0: > limit = float.posinf > > posinf, neginf and nan are singletons, so there's no problem with > adding as members to the type. There's no reason this has to be part of the float type. Just define your own PosInf/NegInf singletons and PosInfType/NegInfType classes, giving them the appropriate special methods. NaN is a bit iffier, but in your case it's sufficient to raise an exception whenever it would be created. Consider submitting it to the Python Cookbook when you're done. ;) -- Adam Olsen, aka Rhamphoryncus From jmatejek at suse.cz Mon Nov 27 14:38:13 2006 From: jmatejek at suse.cz (Jan Matejek) Date: Mon, 27 Nov 2006 14:38:13 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> Message-ID: <456AEA45.7060209@suse.cz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Phillip J. Eby napsal(a): > Just a suggestion, but one issue that I think needs addressing is the FHS > language that leads some Linux distros to believe that they should change > Python's normal installation layout (sometimes in bizarre ways) (...) 
> Other vendors apparently also patch Python in various > ways to support their FHS-based theories of how Python should install > files. +1 on that. There should be a clear (and clearly presented) idea of how Python is supposed to be laid out in the distribution-provided /usr hierarchy. And it would be nice if this idea complied with the FHS. It would also be nice if somebody finally admitted the existence of /usr/lib64 and made Python aware of it ;e) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFFaupFjBrWA+AvBr8RArJcAKCGbeoih7TwKp2tBHtV3RMoY4JqvQCeJq87 +RgREnCI7DM/G5MNtjqmdVI= =WHpB -----END PGP SIGNATURE----- From pje at telecommunity.com Mon Nov 27 15:09:35 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 27 Nov 2006 09:09:35 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456AEA45.7060209@suse.cz> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> At 02:38 PM 11/27/2006 +0100, Jan Matejek wrote: >-----BEGIN PGP SIGNED MESSAGE----- >Hash: SHA1 > >Phillip J. Eby napsal(a): > > Just a suggestion, but one issue that I think needs addressing is the FHS > > language that leads some Linux distros to believe that they should change > > Python's normal installation layout (sometimes in bizarre ways) (...) > > Other vendors apparently also patch Python in various > > ways to support their FHS-based theories of how Python should install > > files. > >+1 on that. There should be a clear (and clearly presented) idea of how >Python is supposed to be laid out in the distribution-provided /usr >hierarchy. And it would be nice if this idea complied to FHS. 
> >It would also be nice if somebody finally admitted the existence of >/usr/lib64 and made Python aware of it ;e) Actually, I meant that (among other things) it should be clarified that it's alright to e.g. put .pyc and data files inside Python library directories, and NOT okay to split them up. From jason.orendorff at gmail.com Mon Nov 27 17:00:57 2006 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 27 Nov 2006 11:00:57 -0500 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4564C5E6.8070605@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> Message-ID: Way back on 11/22/06, "Martin v. Löwis" wrote: > Nick Coghlan schrieb: > > Martin v. L?wis wrote: > >> I personally consider it "good style" to rely on implementation details > >> of CPython; > > > > Is there a 'do not' missing somewhere in there? > > No - I really mean it. I can find nothing wrong with people relying on > reference counting to close files, for example. It's a property of > CPython, and not guaranteed in other Python implementations - yet it > works in a well-defined way in CPython. Code that relies on that feature > is not portable, but portability is only one goal in software > development, and may be irrelevant for some projects. It's not necessarily future-portable either. Having your software not randomly break over time is relevant for most nontrivial projects. > Similarly, it's fine when people rely on the C type "int" to have > 32-bits when used with gcc on x86 Linux. Relying on behavior that's implementation-defined in a particular way for a reason (like int being 32 bits on 32-bit hardware) is one thing. 
Relying on behavior that even the implementors might not be consciously aware of (or consider important to retain across versions) is another. -j From aahz at pythoncraft.com Mon Nov 27 17:43:23 2006 From: aahz at pythoncraft.com (Aahz) Date: Mon, 27 Nov 2006 08:43:23 -0800 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> Message-ID: <20061127164323.GA21272@panix.com> On Mon, Nov 27, 2006, Jason Orendorff wrote: > Way back on 11/22/06, "Martin v. L?wis" wrote: >> Nick Coghlan schrieb: >>> Martin v. L?wis wrote: >>>> >>>> I personally consider it "good style" to rely on implementation details >>>> of CPython; >>> >>> Is there a 'do not' missing somewhere in there? >> >> No - I really mean it. I can find nothing wrong with people relying on >> reference counting to close files, for example. It's a property of >> CPython, and not guaranteed in other Python implementations - yet it >> works in a well-defined way in CPython. Code that relies on that feature >> is not portable, but portability is only one goal in software >> development, and may be irrelevant for some projects. > > It's not necessarily future-portable either. Having your software not > randomly break over time is relevant for most nontrivial projects. We recently had this discussion at my day job. We ended up agreeing that using close() was an encouraged but not required style, because to really avoid breakage we'd have to go with a full-bore try/except style for file handling, and that would require too many changes (especially without upgrading to 2.5, and we're still using 2.2/2.3). -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Usenet is not a democracy. It is a weird cross between an anarchy and a dictatorship. 
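The contrast Martin and Aahz are discussing can be sketched in a few lines. This is an illustration written for this archive, not code from the thread: the first style leans on CPython's reference counting, while the second is the portable explicit-close idiom (which Python 2.5's `with` statement later shortened).

```python
# A minimal sketch of the two file-closing styles under discussion.
# Style 1 relies on CPython's reference counting -- an implementation
# detail, as Martin notes; style 2 is the portable try/finally form
# Aahz describes.

import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Style 1: refcount-reliant. On CPython the file object's refcount
# drops to zero right after this statement, which flushes and closes
# the file. On Jython or IronPython the close may happen much later.
open(path, 'w').write('data')

# Style 2: explicit close, guaranteed on any implementation.
f = open(path)
try:
    contents = f.read()
finally:
    f.close()

assert contents == 'data'
assert f.closed
os.remove(path)
```

The full-bore version Aahz mentions wraps every open/use/close in such a try/finally; the cost in boilerplate is exactly why many projects settled for the refcount-reliant style on CPython.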
From jason.orendorff at gmail.com Mon Nov 27 20:47:52 2006 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 27 Nov 2006 14:47:52 -0500 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061127164323.GA21272@panix.com> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> <20061127164323.GA21272@panix.com> Message-ID: On 11/27/06, Aahz wrote: > On Mon, Nov 27, 2006, Jason Orendorff wrote: > > Way back on 11/22/06, "Martin v. L?wis" wrote: > >> [...] I can find nothing wrong with people relying on > >> reference counting to close files, for example. It's a property of > >> CPython, and not guaranteed in other Python implementations - yet it > >> works in a well-defined way in CPython. [...] > > > > [Feh.] > > We recently had this discussion at my day job. We ended up agreeing > that using close() was an encouraged but not required style, because to > really avoid breakage we'd have to go with a full-bore try/except style > for file handling, and that would require too many changes (especially > without upgrading to 2.5, and we're still using 2.2/2.3). Well, CPython's refcounting is something Python-dev is (understatement) very conscious of. I think I've even heard assurances that it won't change Any Time Soon. But this isn't the case for every CPython implementation detail. Remember what brought all this up. If it's obscure enough that Fredrik Lundh has to ask around, I wouldn't bet the ranch on it. 
-j From r.m.oudkerk at googlemail.com Mon Nov 27 21:36:21 2006 From: r.m.oudkerk at googlemail.com (Richard Oudkerk) Date: Mon, 27 Nov 2006 20:36:21 +0000 Subject: [Python-Dev] Cloning threading.py using processes Message-ID: Version 0.10 of the 'processing' package is available at the cheeseshop: http://cheeseshop.python.org/processing It is intended to make writing programs using processes almost the same as writing programs using threads. (By importing from 'processing.dummy' instead of 'processing' one can use threads with the same API.) It has been tested on both Windows and Unix. Shared objects are created on a 'manager' which runs in its own process. Communication with it happens using sockets or (on windows) named pipes. An example where integers are sent through a shared queue from a child process to its parent: . from processing import Process, Manager . . def f(q): . for i in range(10): . q.put(i*i) . q.put('STOP') . . if __name__ == '__main__': . manager = Manager() . queue = manager.Queue(maxsize=10) . . p = Process(target=f, args=[queue]) . p.start() . . result = None . while result != 'STOP': . result = queue.get() . print result . . p.join() It has had some changes since the version I posted last month: 1) The use of tokens to identify shared objects is now hidden, so now the API of 'processing' really is very similar to that of 'threading'. 2) It is much faster than before: on both Windows XP and Linux a manager serves roughly 20,000 requests/second on a 2.5 GHz Pentium 4. (Though it is not a fair comparison, that is 50-100 times faster than using SimpleXMLRPCServer/xmlrpclib.) 3) The manager process just reuses the standard synchronization types from threading.py, Queue.py and spawns a new thread to serve each process/thread which owns a proxy. (The old version was single threaded and had a select loop.) 4) Registering new shared types is straightforward, for instance . from processing.manager import ProcessBaseManager . . class Foo(object): . 
def bar(self): . print 'BAR' . . class NewManager(ProcessBaseManager): . pass . . NewManager.register('Foo', Foo, exposed=['bar']) . . if __name__ == '__main__': . manager = NewManager() . foo = manager.Foo() . foo.bar() # => prints 'BAR' Cheers Richard From martin at v.loewis.de Tue Nov 28 00:34:28 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 00:34:28 +0100 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <4569FD85.4010006@acm.org> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> Message-ID: <456B7604.6050809@v.loewis.de> Talin schrieb: > As far as rewriting it goes - I can only rewrite things that I understand. So if you want this to change, you obviously need to understand the entire distutils. It's possible to do that; some people have done it (the "understanding" part) - just go ahead and start reading source code. Regards, Martin From martin at v.loewis.de Tue Nov 28 00:39:06 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 00:39:06 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: <456B771A.8090300@v.loewis.de> Phillip J. Eby schrieb: > Actually, I meant that (among other things) it should be clarified that > it's alright to e.g. put .pyc and data files inside Python library > directories, and NOT okay to split them up. My gut feeling is that this is out of scope for the LSB. The LSB would only specify what a conforming distribution should do, not what conforming applications need to do. But we will see. 
Regards, Martin From martin at v.loewis.de Tue Nov 28 01:06:43 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 01:06:43 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456AEA45.7060209@suse.cz> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> Message-ID: <456B7D93.2090004@v.loewis.de> Jan Matejek schrieb: > +1 on that. There should be a clear (and clearly presented) idea of how > Python is supposed to be laid out in the distribution-provided /usr > hierarchy. And it would be nice if this idea complied to FHS. The LSB refers to the FHS, so it is clear that LSB support for Python will have to follow the FHS. Specifically, LSB 3.1 includes FHS 2.3 as a normative reference. > It would also be nice if somebody finally admitted the existence of > /usr/lib64 and made Python aware of it ;e) I don't think this is really relevant for Python. The FHS specifies that 64-bit libraries must be in /lib64 on AMD64-Linux. It is silent on where to put Python source files and .pyc files, and, indeed, putting them into /usr/lib/pythonX.Y seems to be FHS-conforming: # /usr/lib includes object files, libraries, and internal binaries that # are not intended to be executed directly by users or shell scripts. In any case, changing Python is certainly out of the scope of the LSB committee: they might put requirements on Python installations, but it's not their job to "fix" Python. Regards, Martin From sluggoster at gmail.com Tue Nov 28 02:44:54 2006 From: sluggoster at gmail.com (Mike Orr) Date: Mon, 27 Nov 2006 17:44:54 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <456B7604.6050809@v.loewis.de> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> Message-ID: <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> On 11/27/06, "Martin v. 
Löwis" wrote: > Talin schrieb: > > As far as rewriting it goes - I can only rewrite things that I understand. > > So if you want this to change, you obviously need to understand the > entire distutils. It's possible to do that; some people have done > it (the "understanding" part) - just go ahead and start reading source > code. You (and Fredrik) are being a little harsh on Talin. I understand the need to encourage people to fix things themselves rather than just complaining about stuff they don't like. But people don't have an unlimited amount of time and expertise to work on several Python projects simultaneously. Nevertheless, they should be able to offer an "It would be good if..." suggestion without being stomped on. The suggestion itself can be a contribution if it focuses people's attention on a problem and a potential solution. Just because somebody can't learn a big subsystem and write code or docs for it *at this moment* doesn't mean they never will. And even if they don't, it's possible to make contributions in one area of Python and suggestions in another... or does the karma account not work that way? I don't see Talin saying, "You should fix this for me." He's saying, "I'd like this improved and I'm working on it, but it's a big job and I need help, ideally from someone with more expertise in distutils." Ultimately for Python the question isn't, "Does Talin want this done?" but, "Does this dovetail with the direction Python generally wants to go?" From what I've seen of setuptools/distutils evolution, yes, it's consistent with what many people want for Python. So instead of saying, "You (Talin) should take on this task alone because you want it" as if nobody else did, it would be better to say, "Thank you, Talin, for moving this important Python issue along." I've privately offered Talin some (unfinished) material I've been working on anyway that relates to his vision. 
When I get some other projects cleared away I'd like to put together that TOC of links I mentioned and perhaps collaborate on a Guide with whoever wants to. But I also need to learn more about setuptools before I can do that. As it happens I need the information anyway because I'm about to package an egg.... -- Mike Orr From talin at acm.org Tue Nov 28 08:10:08 2006 From: talin at acm.org (Talin) Date: Mon, 27 Nov 2006 23:10:08 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> Message-ID: <456BE0D0.9010507@acm.org> Mike Orr wrote: > On 11/27/06, "Martin v. L?wis" wrote: >> Talin schrieb: >>> As far as rewriting it goes - I can only rewrite things that I understand. >> So if you want this to change, you obviously need to understand the >> entire distutils. It's possible to do that; some people have done >> it (the "understanding" part) - just go ahead and start reading source >> code. > > You (and Fredrik) are being a little harsh on Talin. I understand the > need to encourage people to fix things themselves rather than just > complaining about stuff they don't like. But people don't have an > unlimited amount of time and expertise to work on several Python > projects simultaneously. Nevertheless, they should be able to offer > an "It would be good if..." suggestion without being stomped on. The > suggestion itself can be a contribution if it focuses people's > attention on a problem and a potential solution. Just because > somebody can't learn a big subsystem and write code or docs for it *at > this moment* doesn't mean they never will. And even if they don't, > it's possible to make contributions in one area of Python and > suggestions in another... or does the karma account not work that way? 
> > I don't see Talin saying, "You should fix this for me." He's saying, > "I'd like this improved and I'm working on it, but it's a big job and > I need help, ideally from someone with more expertise in distutils." > Ultimately for Python the question isn't, "Does Talin want this done?" > but, "Does this dovetail with the direction Python generally wants to > go?" From what I've seen of setuptools/distutils evolution, yes, it's > consistent with what many people want for Python. So instead of > saying, "You (Talin) should take on this task alone because you want > it" as if nobody else did, it would be better to say, "Thank you, > Talin, for moving this important Python issue along." > > I've privately offered Talin some (unfinished) material I've been > working on anyway that relates to his vision. When I get some other > projects cleared away I'd like to put together that TOC of links I > mentioned and perhaps collaborate on a Guide with whoever wants to. > But I also need to learn more about setuptools before I can do that. > As it happens I need the information anyway because I'm about to > package an egg.... > What you are saying is basically correct, although I have a slightly different spin on it. I've written a lot of documentation over the years, and I know that one of the hardest parts of writing documentation is trying to identify your own assumptions. To someone who already knows how the system works, it's hard to understand the mindset of someone who is just learning it. You tend to unconsciously assume knowledge of certain things which a new user might not know. To that extent, it can be useful sometimes to have someone who is in the process of learning how to use the system, and who is willing to carefully analyze and write down their own experiences while doing so. Most of the time people are too busy to do this - they want to get their immediate problem solved, and they aren't interested in how difficult it will be for the next person. 
This is especially true in cases where the problem that is holding them up is three levels down from the level where their real goal is - they want to be able to "pop the stack" of problems as quickly as possible, so that they can get back to solving their *real* problem. So what I am offering, in this case, is my ignorance -- but a carefully described ignorance :) I don't demand that anyone do anything - I'm merely pointing out some things that people may or may not care about. Now, in this particular case, I have actually used distutils before. But distutils is one of those systems (like Perl) which tends to leak out of your brain if you don't use it regularly - that is, if you only use it once every 6 months, at the end of 6 months you have forgotten most of what you have learned, and you have to start the learning curve all over again. And I am in the middle of that re-learning process right now. What I am doing right now is creating a new extension project using setuptools, and keeping notes on what I do. So for example, I start by creating the directory structure: mkdir myproject cd myproject mkdir src mkdir test Next, create a minimal setup.py script. I won't include that here, but it's in the notes. Next, create the myproject.c file for the module in src/, and write the 'init' function for the module. (again, content omitted but it's in my notes). Create a projectname_unittest.py file in test. Add both of these to the setup.py file. At this point, you ought to be able to do a "python setup.py test" and have it succeed. At this point, you can start adding types and methods, with a unit test for each one, testing each one as it is added. Now, I realize that all of this is "baby steps" to you folks, but it took me a day or so to figure out. And it's interesting that even these few steps cut across a number of tools and libraries - setuptools, distutils, unittest, the "extending Python" doc and the "Python C API" doc. 
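[Editor's note: since the minimal setup.py stays in Talin's notes, here is a hypothetical sketch of what a script for the layout above might look like. Every name in it (myproject, src/myproject.c, test.myproject_unittest) is illustrative only, not taken from those notes; it is a configuration sketch, not Talin's actual script.]

```python
# Hypothetical minimal setup.py for the layout sketched above:
#   myproject/setup.py
#   myproject/src/myproject.c
#   myproject/test/myproject_unittest.py
# All names are illustrative; this is a sketch, not the script from
# Talin's notes.

from setuptools import setup, Extension

setup(
    name='myproject',
    version='0.1',
    # The C module lives in src/; setuptools builds it in place for
    # the 'test' command.
    ext_modules=[Extension('myproject', ['src/myproject.c'])],
    # What 'python setup.py test' runs (the setuptools test_suite hook).
    test_suite='test.myproject_unittest',
)
```

With this in place, the "python setup.py test" step in the walkthrough builds the extension and runs the unit tests in one go.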
(BTW, I realized another thing that would be really handy is if the "extending Python" doc contained hyperlink references to the "Python C API" doc, so that when it talks about, say, PyArg_ParseTuple, you could go straight to the reference doc for it.) -- Talin From robinbryce at gmail.com Tue Nov 28 13:19:50 2006 From: robinbryce at gmail.com (Robin Bryce) Date: Tue, 28 Nov 2006 12:19:50 +0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: > Actually, I meant that (among other things) it should be clarified that > it's alright to e.g. put .pyc and data files inside Python library > directories, and NOT okay to split them up. Phillip, Just to be clear: I understand you are not in favour of re-packaging data from python projects (projects in the distutils sense), separately and I strongly agree with this view. Are you opposed to developers choosing to *not* bundle data as python package data ? How much, if any, of the setuptools / distutils conventions do you think could sensibly percolate up to the LSB ? There are a couple of cases in ubuntu/debian (as of 6.10 edgy) that I think are worth considering: python2.4 profile (pstats) etc. was removed due to licensing issues rather than FHS. Should not be an issue for python2.5 but what, in general, can a vendor do except break python if their licensing policy can't accommodate all of Python's batteries ? python2.4 distutils is excluded by default. This totally blows in my view but I appreciate this one is a minefield of vendor packaging politics. It has to be legitimate for Python / setuptools to provide packaging infrastructure and conventions that are viable on more than Linux. 
Is it unreasonable for a particular vendor to decide that, on their platform, they will disable Python's packaging conventions ? Is there any way to keep the peace on this one ? Cheers, Robin On 27/11/06, Phillip J. Eby wrote: > At 02:38 PM 11/27/2006 +0100, Jan Matejek wrote: > >-----BEGIN PGP SIGNED MESSAGE----- > >Hash: SHA1 > > > >Phillip J. Eby napsal(a): > > > Just a suggestion, but one issue that I think needs addressing is the FHS > > > language that leads some Linux distros to believe that they should change > > > Python's normal installation layout (sometimes in bizarre ways) (...) > > > Other vendors apparently also patch Python in various > > > ways to support their FHS-based theories of how Python should install > > > files. > > > >+1 on that. There should be a clear (and clearly presented) idea of how > >Python is supposed to be laid out in the distribution-provided /usr > >hierarchy. And it would be nice if this idea complied to FHS. 
> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/robinbryce%40gmail.com > From anthony at interlink.com.au Tue Nov 28 14:53:14 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed, 29 Nov 2006 00:53:14 +1100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: <200611290053.17199.anthony@interlink.com.au> On Tuesday 28 November 2006 23:19, Robin Bryce wrote: > python2.4 profile (pstats) etc. was removed due to licensing > issues rather than the FHS. Should not be an issue for python2.5 but > what, in general, can a vendor do except break Python if their > licensing policy can't accommodate all of Python's batteries? That's a historical case, and as far as I know, unique. I can't imagine we'd accept any new standard library contributions (no matter how compelling) without the proper licensing work being done. > python2.4 distutils is excluded by default. This totally blows in > my view but I appreciate this one is a minefield of vendor > packaging politics. It has to be legitimate for Python / > setuptools to provide packaging infrastructure and conventions > that are viable on more than Linux. Is it unreasonable for a > particular vendor to decide that, on their platform, they will > disable Python's packaging conventions? Is there any way to keep > the peace on this one? I still have no idea why this was done - I was also one of the people who jumped up and down asking Debian/Ubuntu to fix this idiotic decision. Personally, I consider any distribution that breaks the standard library into non-required pieces to be shipping a _broken_ Python. As someone who writes and releases software, this is a complete pain.
I can't tell you how many times through the years I'd get user complaints because they didn't get distutils installed as part of the standard library. (The only other packaging thing like this that I'm aware of is python-minimal in Ubuntu. This is done for installation purposes and wacky dependency issues that occur when a fair chunk of the O/S is actually written in Python. It's worth noting that the entirety of the Python stdlib is a required package, so it doesn't cause issues.) Anthony -- Anthony Baxter It's never too late to have a happy childhood. From barry at python.org Tue Nov 28 16:26:53 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 10:26:53 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <200611290053.17199.anthony@interlink.com.au> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 8:53 AM, Anthony Baxter wrote: > (The only other packaging thing like this that I'm aware of is > python-minimal in Ubuntu. This is done for installation purposes > and wacky dependency issues that occur when a fair chunk of the O/S > is actually written in Python. It's worth noting that the entirety > of the Python stdlib is a required package, so it doesn't cause > issues.) There's a related issue that may or may not be in scope for this thread. For distros like Gentoo or Ubuntu that rely heavily on their own system Python for the OS to work properly, I'm quite loathe to install Cheeseshop packages into the system site-packages. I've had Gentoo break occasionally when I did this for example (though I don't remember the details now), so I always end up installing my own /usr/local/bin/python and installing my 3rd party packages into there.
Even though site-packages is last on sys.path, installing 3rd party packages can still break the OS if the system itself installs incompatible versions of such packages into its site-packages. Mailman's philosophy is to install the 3rd party packages it requires into its own 'pythonlib' directory that gets put first on sys.path. It does this for several reasons: I want to be able to override stdlib packages such as email with newer versions, I don't want to have to mess around at all with the system's site-packages, and I don't want updates to the system Python to break my application. I question whether a distro built on Python can even afford to allow 3rd party packages to be installed in their system's site-packages. Maybe Python needs to extend its system-centric view of site-packages with an application-centric and/or user-centric view of extensions? The only reason I can think of for Mailman /not/ using its own pythonlib is to save on disk space, and really, who cares about that any more? I submit that most applications of any size will have way more application data than duplicated Python libraries. 
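The private-pythonlib pattern Barry describes can be sketched in a few lines. The directory name below is illustrative, not Mailman's actual layout:

```python
import sys

# Sketch of the private-library pattern described above: the application
# ships its own copies of the packages it depends on (including overrides
# of stdlib packages such as a newer email package) and puts that
# directory *first* on sys.path, so its bundled versions win over both
# the stdlib and the system's site-packages.  PYTHONLIB is hypothetical.
PYTHONLIB = "/usr/local/myapp/pythonlib"

def prepend_private_lib(path=PYTHONLIB):
    """Ensure `path` is the very first sys.path entry."""
    while path in sys.path:
        sys.path.remove(path)
    sys.path.insert(0, path)

prepend_private_lib()
print(sys.path[0])  # the private directory now shadows everything else
```

Because the private directory sits ahead of everything else, updates to the system Python's site-packages cannot change which versions the application imports.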
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWxVQ3EjvBPtnXfVAQIuMAQAkciyaHCwLnkN+8GwbhUro+vJuna+JObP AZaNzPKYABITqu5fKPl3aEvQz+9pNUvjM2c/q5p1m/9n34ZBURfgpHa3yk7QcbW0 sud8utdW6wMHMuWVw/1lQNaZ2GeJz9E4CgO93btfgiMLFIrcnBxr6uw5NqTrMwOc 4iIupbjYfUg= =Nxff -----END PGP SIGNATURE----- From martin at v.loewis.de Tue Nov 28 19:08:17 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 19:08:17 +0100 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <456BE0D0.9010507@acm.org> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> <456BE0D0.9010507@acm.org> Message-ID: <456C7B11.1010409@v.loewis.de> Talin schrieb: > To that extent, it can be useful sometimes to have someone who is in the > process of learning how to use the system, and who is willing to > carefully analyze and write down their own experiences while doing so. I readily agree that the documentation can be improved, and applaud efforts to do so. And I have no doubts that distutils is difficult to learn for a beginner. In Talin's remarks, there was also the suggestion that distutils is "in need of some serious refactoring". It is such remarks that get me started: it seems useless to me to make such a statement if they are not accompanied with concrete proposals what specifically to change. It also gets me upset because it suggests that all prior contributors weren't serious. 
Regards, Martin From martin at v.loewis.de Tue Nov 28 19:17:04 2006 From: martin at v.loewis.de (Martin v. Löwis) Date: Tue, 28 Nov 2006 19:17:04 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: <456C7D20.1080604@v.loewis.de> Robin Bryce schrieb: > python2.4 profile (pstats) etc. was removed due to licensing issues > rather than the FHS. Should not be an issue for python2.5 but what, in > general, can a vendor do except break Python if their licensing policy > can't accommodate all of Python's batteries? If some vendor has a valid concern about the licensing of a certain piece of Python, they should bring that up while the LSB is being defined. > python2.4 distutils is excluded by default. This totally blows in my > view but I appreciate this one is a minefield of vendor packaging > politics. It has to be legitimate for Python / setuptools to provide > packaging infrastructure and conventions that are viable on more than > Linux. Again, that's a decision for the LSB standard to make. If the LSB defines that distutils is part of the LSB (notice the *if*: this is all theoretical; the LSB doesn't yet define anything for Python), then each vendor can still choose to include distutils or not; if they don't, they won't comply with that version of the LSB. So it is *always* their choice what standard to follow. OTOH, certain customers demand LSB conformance, so a vendor that chooses not to follow the LSB may lose customers. I personally agree that "Linux standards" should specify a standard layout for a Python installation, and that it should be the one that "make install" generates (perhaps after "make install" is adjusted). Whether or not it is the *LSB* that needs to specify that, I don't know, because the LSB does not specify a file system layout.
Instead, it incorporates the FHS - which might be the right place to define the layout of a Python installation. For the LSB, it's more important that "import httplib" gives you something working, no matter where httplib.py comes from (or whether it comes from httplib.py at all). Regards, Martin From talin at acm.org Tue Nov 28 19:33:08 2006 From: talin at acm.org (Talin) Date: Tue, 28 Nov 2006 10:33:08 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <456C7B11.1010409@v.loewis.de> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> <456BE0D0.9010507@acm.org> <456C7B11.1010409@v.loewis.de> Message-ID: <456C80E4.5060000@acm.org> Martin v. Löwis wrote: > Talin schrieb: >> To that extent, it can be useful sometimes to have someone who is in the >> process of learning how to use the system, and who is willing to >> carefully analyze and write down their own experiences while doing so. > > I readily agree that the documentation can be improved, and applaud > efforts to do so. And I have no doubts that distutils is difficult to > learn for a beginner. > > In Talin's remarks, there was also the suggestion that distutils is > "in need of some serious refactoring". It is such remarks that get > me started: it seems useless to me to make such a statement if they > are not accompanied with concrete proposals what specifically to > change. It also gets me upset because it suggests that all prior > contributors weren't serious. I'm sorry if I implied that distutils was 'misdesigned'; that wasn't what I meant. Refactoring is usually desirable when a body of code has accumulated a lot of additional baggage as a result of maintenance and feature additions, accompanied by the observation that if the baggage had been present when the system was originally created, the design of the system would have been substantially different.
Refactoring is merely an attempt to discover what that original design might have been, if the requirements had been known at the time. What I was reacting to, I think, is that it seemed like in some ways the 'diffness' of setuptools wasn't just in the documentation, but in the code itself, and if both setuptools and distutils had been co-developed, then distutils might have been somewhat different as a result. Also, I admit that some of this is hearsay, so maybe I should just back off on this one. > Regards, > Martin From sluggoster at gmail.com Tue Nov 28 20:41:48 2006 From: sluggoster at gmail.com (Mike Orr) Date: Tue, 28 Nov 2006 11:41:48 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <6e9196d20611281141h428478c8v5329f9ca5433a7bf@mail.gmail.com> On 11/28/06, Barry Warsaw wrote: > For distros like Gentoo or Ubuntu that rely heavily on their > own system Python for the OS to work properly, I'm quite loathe to > install Cheeseshop packages into the system site-packages. I've had > Gentoo break occasionally when I did this for example (though I don't > remember the details now), so I always end up installing my own > /usr/local/bin/python and installing my 3rd party packages into there. > Even though site-packages is last on sys.path, installing 3rd party > packages can still break the OS if the system itself installs > incompatible versions of such packages into its site-packages. One wishes distro vendors would install a separate copy of Python for their internal OS stuff so that broken-library or version issues wouldn't affect the system. That would be worth putting into the standard.
-- Mike Orr From barry at python.org Tue Nov 28 21:11:57 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 15:11:57 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <6e9196d20611281141h428478c8v5329f9ca5433a7bf@mail.gmail.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <6e9196d20611281141h428478c8v5329f9ca5433a7bf@mail.gmail.com> Message-ID: <4EB09625-4338-4537-9DDD-088359E2E33A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 2:41 PM, Mike Orr wrote: > On 11/28/06, Barry Warsaw wrote: >> For distros like Gentoo or Ubuntu that rely heavily on their >> own system Python for the OS to work properly, I'm quite loathe to >> install Cheeseshop packages into the system site-packages. I've had >> Gentoo break occasionally when I did this for example (though I don't >> remember the details now), so I always end up installing my own /usr/ >> local/bin/python and installing my 3rd party packages into there. >> Even though site-packages is last on sys.path, installing 3rd party >> packages can still break the OS if the system itself installs >> incompatible versions of such packages into its site-packages. > > One wishes distro vendors would install a separate copy of Python for > their internal OS stuff so that broken-library or version issues > wouldn't affect the system. That would be worth putting into the > standard. Agreed. But that would just eliminate one potential source of "application" conflict (defining the OS itself as just another application). 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWyYEnEjvBPtnXfVAQJCKQP7BXVOYUIvbEBgFK7nWHieBqRGXzohhKNZ SN5qV4P6uZGnCtjp1Z4W8U82X8TH+X3Ovx02mS+GN+nrlyF7AVhDr/mSLXI90Kan 1dqOhAIz5rBeT03/k0SpAPSiBhonl4zF4ZmezGaz3lif2CjsH6PT9153Mv7wXb1N ut2QIhXnejA= =jhbd -----END PGP SIGNATURE----- From theller at ctypes.org Tue Nov 28 21:26:54 2006 From: theller at ctypes.org (Thomas Heller) Date: Tue, 28 Nov 2006 21:26:54 +0100 Subject: [Python-Dev] ctypes and powerpc In-Reply-To: References: <456745DA.3010903@ctypes.org> Message-ID: Ronald Oussoren schrieb: > > On Friday, November 24, 2006, at 08:21PM, "Thomas Heller" wrote: >>I'd like to ask for help with an issue which I do not know >>how to solve. >> >>Please see this bug http://python.org/sf/1563807 >>"ctypes built with GCC on AIX 5.3 fails with ld ffi error" >> >>Apparently this is a powerpc machine, ctypes builds but cannot be imported >>because of undefined symbols like 'ffi_call', 'ffi_prep_closure'. >> >>These symbols are defined in the file >> Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c. >>The whole contents of this file is enclosed within a >> >>#ifdef __ppc__ >>... >>#endif >> >>block. IIRC, this block was added by Ronald for the >>Mac universal build. Now, it seems that on the AIX machine >>the __ppc__ symbol is not defined; removing the #ifdef/#endif >>makes the build successful. > > The defines were indeed added for the universal build and I completely overlooked the fact that ffi_darwin.c is also used for AIX. One way to fix this is > > #if ! (defined(__APPLE__) && !defined(__ppc__)) > ... > #endif > > That is, compile the file unless __APPLE__ is defined but __ppc__ isn't. This more clearly documents the intent. Yes, this makes the most sense. I've taken this approach.
Thanks, Thomas From guido at python.org Tue Nov 28 22:05:20 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Nov 2006 13:05:20 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: On 11/28/06, Barry Warsaw wrote: > There's a related issue that may or may not be in scope for this > thread. For distros like Gentoo or Ubuntu that rely heavily on their > own system Python for the OS to work properly, I'm quite loathe to > install Cheeseshop packages into the system site-packages. I wonder if it would help if we were to add a vendor-packages directory where distros can put their own selection of 3rd party stuff they depend on, to be searched before site-packages, and a command-line switch that ignores site-packages but still searches vendor-packages. (-S would almost do it but probably suppresses too much.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue Nov 28 22:19:47 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 28 Nov 2006 16:19:47 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> At 01:05 PM 11/28/2006 -0800, Guido van Rossum wrote: >On 11/28/06, Barry Warsaw wrote: > > There's a related issue that may or may not be in scope for this > > thread. For distros like Gentoo or Ubuntu that rely heavily on their > > own system Python for the OS to work properly, I'm quite loathe to > > install Cheeseshop packages into the system site-packages.
> >I wonder if would help if we were to add a vendor-packages directory >where distros can put their own selection of 3rd party stuff they >depend on, to be searched before site-packages, and a command-line >switch that ignores site-package but still searches vendor-package. >(-S would almost do it but probably suppresses too much.) They could also use -S and then explicitly insert the vendor-packages directory into sys.path at the beginning of their scripts. And a .pth in site-packages could add vendor-packages at the *end* of sys.path, so that scripts not using -S would pick it up. This would be backward compatible except for the vendor scripts that want to use this approach. From barry at python.org Wed Nov 29 00:45:04 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 18:45:04 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <7D58A2FE-DDB3-4104-B9FB-F13FD483FF83@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 4:05 PM, Guido van Rossum wrote: > On 11/28/06, Barry Warsaw wrote: >> There's a related issue that may or may not be in scope for this >> thread. For distros like Gentoo or Ubuntu that rely heavily on their >> own system Python for the OS to work properly, I'm quite loathe to >> install Cheeseshop packages into the system site-packages. > > I wonder if would help if we were to add a vendor-packages directory > where distros can put their own selection of 3rd party stuff they > depend on, to be searched before site-packages, and a command-line > switch that ignores site-package but still searches vendor-package. > (-S would almost do it but probably suppresses too much.) 
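The vendor-packages arrangement sketched above — a distro-owned directory wired into the path through a .pth file — can be demonstrated with the stdlib site module. Directory names here are illustrative; a real distro would register the .pth file in its default site directory:

```python
import os
import site
import sys
import tempfile

# Emulate a "vendor-packages" directory registered through a .pth file,
# as discussed in the thread.  site.addsitedir() processes any *.pth
# files it finds in the given directory and appends each listed path to
# sys.path -- so vendor packages end up *after* everything already on
# the path, matching the backward-compatible ordering described above.
root = tempfile.mkdtemp()
vendor = os.path.join(root, "vendor-packages")
os.makedirs(vendor)
with open(os.path.join(root, "vendor.pth"), "w") as f:
    f.write(vendor + "\n")

site.addsitedir(root)  # reads vendor.pth and adds vendor-packages
print(vendor in sys.path)  # → True
```

A script run with -S would skip this processing entirely, which is why the thread pairs the .pth idea with explicit sys.path setup for vendor scripts.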
I keep thinking I'd like to treat the OS as just another application, so that there's nothing special about it and the same infrastructure could be used for other applications with lots of entry level scripts. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWzKAHEjvBPtnXfVAQK9AAQAsJS2Ag9yBO+dLGiZdJlaWAj64zWcd9oi zqaE95/y53iXBvMBynglROApDEdOsnv/1/XSx1+2gZVIkuFvHLplbqZWVCsZ56r+ nAcTzFXsM2zPBSECKWuSfxBUILKalRdaIXKOUjgd0iZTrCbt3EeTmZlxMTKq9sGU 1Scr8sHSpIE= =rNjl -----END PGP SIGNATURE----- From greg at electricrain.com Wed Nov 29 00:25:49 2006 From: greg at electricrain.com (Gregory P. Smith) Date: Tue, 28 Nov 2006 15:25:49 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <20061128232549.GB17224@electricrain.com> > I question whether a distro built on Python can even afford to allow > 3rd party packages to be installed in their system's site-packages. > Maybe Python needs to extend its system-centric view of site-packages > with an application-centric and/or user-centric view of extensions? Agreed, I do not think that should be allowed. A system site-packages directory for a python install is a convenient band-aid but not a good idea for real-world deployment of anything third party. It suffers from the same classic DLL Hell problem that Windows has suffered with for eons, with applications all including the "same" DLLs and putting them in the system directory. I'm fine if an OS distro wants to use site-packages for things the OS depends on in its use of python. I'm fine with the OS offering its own packages (debs or rpms or whatnot) that install additional python libraries under site-packages for use system-wide or to satisfy dependencies from other system packages.
Those are all managed properly for compatibility at the OS distro level. What's bad is for third-party (non-os-distro-packaged) applications to touch site-packages. -greg From glyph at divmod.com Wed Nov 29 01:01:30 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 29 Nov 2006 00:01:30 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061129000130.11053.1150542058.divmod.xquotient.111@joule.divmod.com> On 11:45 pm, barry at python.org wrote: >I keep thinking I'd like to treat the OS as just another application, >so that there's nothing special about it and the same infrastructure >could be used for other applications with lots of entry level scripts. I agree. The motivation here is that the "OS" application keeps itself separate so that incorrect changes to configuration or installation of incompatible versions of dependencies don't break it. There are other applications which also don't want to break. This is a general problem with Python, one that should be solved with a comprehensive parallel installation or "linker" which explicitly describes dependencies and allows for different versions of packages. I definitely don't think that this sort of problem should be solved during the *standardization* process - that should just describe the existing conventions for packaging Python stuff, and the OS can insulate itself in terms of that. Definitely it shouldn't be changed as part of standardization unless the distributors are asking for it loudly. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061129/c168fb09/attachment.htm From pje at telecommunity.com Wed Nov 29 01:10:15 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Tue, 28 Nov 2006 19:10:15 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20061128190845.02865cd0@sparrow.telecommunity.com> At 06:41 PM 11/28/2006 -0500, Barry Warsaw wrote: >On Nov 28, 2006, at 4:19 PM, Phillip J. Eby wrote: >>At 01:05 PM 11/28/2006 -0800, Guido van Rossum wrote: >>>On 11/28/06, Barry Warsaw wrote: >>> > There's a related issue that may or may not be in scope for this >>> > thread. For distros like Gentoo or Ubuntu that rely heavily on >>>their >>> > own system Python for the OS to work properly, I'm quite loathe to >>> > install Cheeseshop packages into the system site-packages. >>> >>>I wonder if would help if we were to add a vendor-packages directory >>>where distros can put their own selection of 3rd party stuff they >>>depend on, to be searched before site-packages, and a command-line >>>switch that ignores site-package but still searches vendor-package. >>>(-S would almost do it but probably suppresses too much.) >> >>They could also use -S and then explicitly insert the vendor- packages >>directory into sys.path at the beginning of their scripts. > >Possibly, but stuff like this can be a pain because your dependent >app must build in the infrastructure itself to get the right paths >set up for its scripts. >... >Maybe there's no better way of doing this and applications are best >left to their own devices. But in the back of my mind, I keep >thinking there should be a better way. ;) Well, you can always use setuptools, which generates script wrappers that import the desired module and call a function, after first setting up sys.path. 
:) From barry at python.org Wed Nov 29 00:41:37 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 18:41:37 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 4:19 PM, Phillip J. Eby wrote: > At 01:05 PM 11/28/2006 -0800, Guido van Rossum wrote: >> On 11/28/06, Barry Warsaw wrote: >> > There's a related issue that may or may not be in scope for this >> > thread. For distros like Gentoo or Ubuntu that rely heavily on >> their >> > own system Python for the OS to work properly, I'm quite loathe to >> > install Cheeseshop packages into the system site-packages. >> >> I wonder if would help if we were to add a vendor-packages directory >> where distros can put their own selection of 3rd party stuff they >> depend on, to be searched before site-packages, and a command-line >> switch that ignores site-package but still searches vendor-package. >> (-S would almost do it but probably suppresses too much.) > > They could also use -S and then explicitly insert the vendor- > packages directory into sys.path at the beginning of their scripts. Possibly, but stuff like this can be a pain because your dependent app must build in the infrastructure itself to get the right paths set up for its scripts. An approach I've used in the past is to put a paths.py file in the bin directory and force every script to "import paths" before it imports anything it doesn't want to get from the stdlib (including overrides). paths.py is actually generated though because the user could specify an alternative Python with a configure switch. 
What I'm moving to now though is a sort of 'shell' or driver script which does that path setup once, then imports a module based on argv[0], sniffing out a main() and then calling that. The trick then of course is that you symlink all the top-level user scripts to this shell. Works fine if all you care about is *nix, but it does mean an application with lots of entry-level scripts has to build all this infrastructure itself. Maybe there's no better way of doing this and applications are best left to their own devices. But in the back of my mind, I keep thinking there should be a better way. ;) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWzJMXEjvBPtnXfVAQJHmAP/UhGUv1Wxt2AzGT08dM9/M0J4pahGnrF3 VwbrdRTF6Jt32iAKAJolrnTE+XlMaTGitYv+mu8v3SgJLWwe+aeJwpg8AdOn5jBL bSjBpE9UeqUSiMhaJmBbx/z5ISv4OioJLX+vzBv6u0yBTYv4uoYZPKoeMcCe6Afw 7e1gIL1WHL4= =scvm -----END PGP SIGNATURE----- From barry at python.org Wed Nov 29 01:26:29 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 19:26:29 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061128190845.02865cd0@sparrow.telecommunity.com> References: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> <5.1.1.6.0.20061128190845.02865cd0@sparrow.telecommunity.com> Message-ID: <91BBC5A5-00C1-4063-B10F-6BDA8BD19E59@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 7:10 PM, Phillip J. Eby wrote: > Well, you can always use setuptools, which generates script > wrappers that import the desired module and call a function, after > first setting up sys.path. :) That's so 21st Century! Where was setuptools back in 1996? :) Seriously though, that does sound cool, and thanks for the tip.
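The symlinked driver-script scheme Barry describes above can be sketched roughly as follows. The application layout is simulated with a temporary directory; a real application would point at its private library directory, and every user-visible command would be a symlink to this one script:

```python
import os
import sys
import tempfile

# Rough sketch of the driver-script scheme described above: every
# user-visible command is a symlink to a single driver script, which
# does the sys.path setup once, imports the module named after the
# symlink, then "sniffs out" and calls its main().
def dispatch(argv, app_lib):
    if app_lib not in sys.path:
        sys.path.insert(0, app_lib)       # path setup happens once, here
    command = os.path.basename(argv[0])   # e.g. a symlink named "newlist"
    module = __import__(command)          # imports <command>.py from app_lib
    main = getattr(module, "main", None)
    if main is None:
        raise SystemExit("%s: no main() function found" % command)
    return main(argv[1:])

# Simulate an installed entry-point module called "hello":
app_lib = tempfile.mkdtemp()
with open(os.path.join(app_lib, "hello.py"), "w") as f:
    f.write("def main(args):\n    return 'hello ' + ' '.join(args)\n")

result = dispatch(["/usr/local/myapp/bin/hello", "world"], app_lib)
print(result)  # → hello world
```

This is exactly the shape of wrapper that setuptools generates per script, as Phillip notes; the driver-script variant just shares one wrapper across all entry points via symlinks.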
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWzTtXEjvBPtnXfVAQKDwgP+N/nGkHm7e9ZK+DmTEx+gOxPkeQnpKcA2 AHLg9WLJhLHlrxlekftm3F1+YNQv9R6tthRKu6Zgz5fJTPs57MluJ4qAzPapDymT oGX5Y3HxCdaqrw0HWviuJeUr8euN7NIghUAsEbe51pppfbTs80dGnDrDRL4AfXGm 4/C9DW2URkQ= =pt5u -----END PGP SIGNATURE----- From Daniel.Trstenjak at science-computing.de Wed Nov 29 10:06:03 2006 From: Daniel.Trstenjak at science-computing.de (Daniel Trstenjak) Date: Wed, 29 Nov 2006 09:06:03 +0000 (UTC) Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode Message-ID: <20061129090611.GA19856@bug.science-computing.de> Hi all, I would like to know the definition of the 'locals' object given to PyEval_EvalCode. Does 'locals' have to be a python dictionary or a subtype of a python dictionary, or is it enough if the object implements the necessary protocols? The python implementation behaves differently for the two following code lines: from modul import symbol from modul import * In the case of the first one, it's enough if the object 'locals' implements the necessary protocols. The second one only works if the object 'locals' is the dictionary type or a subtype of it. The problem lies in Python-2.5/Python/ceval.c: static int import_all_from(PyObject *locals, PyObject *v) { ... 4046 value = PyObject_GetAttr(v, name); 4047 if (value == NULL) 4048 err = -1; 4049 else >>> 4050 err = PyDict_SetItem(locals, name, value); 4051 Py_DECREF(name); ... } Changing PyDict_SetItem in line 4050 to PyObject_SetItem could fix it.
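The asymmetry Daniel describes — name binding goes through the generic object protocol only when the namespace is not an exact dict — can be observed from pure Python. This sketch relies on a CPython implementation detail: the name-store opcode falls back to PyObject_SetItem for anything that is not exactly a dict, so a dict subclass sees every store:

```python
# A namespace that records every store made into it.  Because it is a
# dict *subclass* rather than an exact dict, CPython's STORE_NAME opcode
# takes the generic PyObject_SetItem path, so __setitem__ is called.
class LoggingNamespace(dict):
    def __init__(self):
        super().__init__()
        self.stored = []

    def __setitem__(self, key, value):
        self.stored.append(key)
        super().__setitem__(key, value)

ns = LoggingNamespace()
exec("x = 1\ny = x + 1", {}, ns)
print(ns.stored)  # → ['x', 'y']
print(ns["y"])    # → 2
```

The direct PyDict_SetItem call in import_all_from bypasses exactly this hook, which is why `from modul import *` behaved differently from a plain `from modul import symbol`.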
Best Regards, Daniel From Jack.Jansen at cwi.nl Wed Nov 29 10:34:26 2006 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Wed, 29 Nov 2006 10:34:26 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: On 28-nov-2006, at 22:05, Guido van Rossum wrote: > On 11/28/06, Barry Warsaw wrote: >> There's a related issue that may or may not be in scope for this >> thread. For distros like Gentoo or Ubuntu that rely heavily on their >> own system Python for the OS to work properly, I'm quite loathe to >> install Cheeseshop packages into the system site-packages. > > I wonder if it would help if we were to add a vendor-packages directory > where distros can put their own selection of 3rd party stuff they > depend on, to be searched before site-packages, and a command-line > switch that ignores site-packages but still searches vendor-packages. > (-S would almost do it but probably suppresses too much.) +1. We've been running into this problem on the Mac since Apple started shipping Python. There's another standard place that is searched on MacOS: a per-user package directory ~/Library/Python/2.5/site-packages (the name "site-packages" is a misnomer, really). Standardising something here is less important than for vendor-packages (as the effect can easily be gotten by adding things to PYTHONPATH) but it has one advantage: distutils and such could be taught about it and provide an option to install either systemwide or for the current user only.
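A per-user directory of the kind Jack describes can be computed portably. The sketch below uses the ~/.local layout proposed later in this thread (and eventually standardized for Unix by PEP 370); the exact path is illustrative, and a real implementation would live in site.py and also handle platform variants such as ~/Library/Python/X.Y/site-packages on the Mac:

```python
import os
import sys

# Compute a per-user site-packages path in the style discussed in this
# thread: ~/.local/lib/pythonX.Y/site-packages on Unix.  The layout is
# an assumption for illustration, not what 2006-era site.py did.
user_site = os.path.expanduser(os.path.join(
    "~", ".local", "lib",
    "python%d.%d" % sys.version_info[:2],
    "site-packages",
))

# Put it ahead of the system-wide directories so per-user installs win:
if user_site not in sys.path:
    sys.path.insert(0, user_site)

print(user_site)
```

Exposing the computed path as a module-level name (rather than "just some random entry on sys.path") is exactly what glyph asks for below in the form of a hypothetical site.userinstdir.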
-- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From glyph at divmod.com Wed Nov 29 11:18:26 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 29 Nov 2006 10:18:26 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> On 09:34 am, jack.jansen at cwi.nl wrote: >There's another standard place that is searched on MacOS: a per-user >package directory ~/Library/Python/2.5/site-packages (the name "site- >packages" is a misnomer, really). Standardising something here is >less important than for vendor-packages (as the effect can easily be >gotten by adding things to PYTHONPATH) but it has one advantage: >distutils and such could be taught about it and provide an option to >install either systemwide or for the current user only. Yes, let's do that, please. I've long been annoyed that site.py sets up a local user installation directory, a very useful feature, but _only_ on OS X. I've long since promoted my personal hack to add a local user installation directory into a public project -- divmod's "Combinator" -- but it would definitely be preferable for Python to do something sane by default (and have setuptools et. al. support it). I'd suggest using "~/.local/lib/pythonX.X/site-packages" for the "official" UNIX installation location, since it's what we're already using, and ~/.local seems like a convention being slowly adopted by GNOME and the like. I don't know the cultural equivalent in Windows - "%USERPROFILE%\Application Data\PythonXX" maybe? It would be nice if site.py would do this in the same place as it sets up the "darwin"-specific path, and to set that path as a module global, so packaging tools could use "site.userinstdir" or something. Right now, if it's present, it's just some random entry on sys.path. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061129/63ad3d9a/attachment.html From arigo at tunes.org Wed Nov 29 12:23:54 2006 From: arigo at tunes.org (Armin Rigo) Date: Wed, 29 Nov 2006 12:23:54 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <200611290053.17199.anthony@interlink.com.au> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <20061129112354.GA30665@code0.codespeak.net> Hi Anthony, On Wed, Nov 29, 2006 at 12:53:14AM +1100, Anthony Baxter wrote: > > python2.4 distutils is excluded by default. > > I still have no idea why this was one - I was also one of the people > who jumped up and down asking Debian/Ubuntu to fix this idiotic > decision. I could not agree more. Nowadays, whenever I get an account on a new Linux machine, the first thing I have to do is reinstall Python correctly in my home dir because the system Python lacks distutils. Wasteful. (There are some applications and libraries that use distutils at run-time to compile things, and I'm using such applications and libraries on a daily basis.) Armin From guido at python.org Wed Nov 29 16:39:25 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Nov 2006 07:39:25 -0800 Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode In-Reply-To: <20061129090611.GA19856@bug.science-computing.de> References: <20061129090611.GA19856@bug.science-computing.de> Message-ID: This seems a bug. In revision 36714 by Raymond Hettinger, the restriction that locals be a dict was relaxed to allow any mapping. On 11/29/06, Daniel Trstenjak wrote: > > Hi all, > > I would like to know the definition of the 'locals' object given to > PyEval_EvalCode. Has 'locals' to be a python dictionary or a subtype > of a python dictionary, or is it enough if the object implements the > necessary protocols? 
> > The python implementation behaves different for the two following code > lines: > > from modul import symbol > from modul import * > > In the case of the first one, it's enough if the object 'locals' implements > the necessary protocols. The second one only works if the object 'locals' > is a type or subtype of dictionary. > > The problem lies in Python-2.5/Python/ceval.c: > > static int > import_all_from(PyObject *locals, PyObject *v) > { > ... > 4046 value = PyObject_GetAttr(v, name); > 4047 if (value == NULL) > 4048 err = -1; > 4049 else > >>> 4050 err = PyDict_SetItem(locals, name, value); > 4051 Py_DECREF(name); > ... > } > > Changing PyDict_SetItem in line 4050 with PyObject_SetAttr could fix it. > > > Best Regards, > Daniel > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Nov 29 22:05:55 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Nov 2006 22:05:55 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <456DF633.5070103@v.loewis.de> Guido van Rossum schrieb: > I wonder if would help if we were to add a vendor-packages directory > where distros can put their own selection of 3rd party stuff they > depend on, to be searched before site-packages, and a command-line > switch that ignores site-package but still searches vendor-package. > (-S would almost do it but probably suppresses too much.) Patch #1298835 implements such a vendor-packages directory. I have reopened the patch to reconsider it. 
I take your message as a +1 for that feature. Regards, Martin From arigo at tunes.org Wed Nov 29 23:10:11 2006 From: arigo at tunes.org (Armin Rigo) Date: Wed, 29 Nov 2006 23:10:11 +0100 Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode In-Reply-To: References: <20061129090611.GA19856@bug.science-computing.de> Message-ID: <20061129221011.GA28156@code0.codespeak.net> Hi, On Wed, Nov 29, 2006 at 07:39:25AM -0800, Guido van Rossum wrote: > This seems a bug. In revision 36714 by Raymond Hettinger, the > restriction that locals be a dict was relaxed to allow any mapping. Mea culpa, I thought I reviewed this patch at the time. Fixed in r52862-52863. A bientot, Armin From barry at python.org Thu Nov 30 00:49:52 2006 From: barry at python.org (Barry Warsaw) Date: Wed, 29 Nov 2006 18:49:52 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> Message-ID: <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 29, 2006, at 5:18 AM, glyph at divmod.com wrote: > Yes, let's do that, please. I've long been annoyed that site.py > sets up a local user installation directory, a very useful feature, > but _only_ on OS X. I've long since promoted my personal hack to > add a local user installation directory into a public project -- > divmod's "Combinator" -- but it would definitely be preferable for > Python to do something sane by default (and have setuptools et. al. > support it). > > I'd suggest using "~/.local/lib/pythonX.X/site-packages" for the > "official" UNIX installation location, since it's what we're > already using, and ~/.local seems like a convention being slowly > adopted by GNOME and the like. I don't know the cultural > equivalent in Windows - "%USERPROFILE%\Application Data\PythonXX" > maybe? 
> > It would be nice if site.py would do this in the same place as it > sets up the "darwin"-specific path, and to set that path as a > module global, so packaging tools could use "site.userinstdir" or > something. Right now, if it's present, it's just some random entry > on sys.path. +1 from me also for the concept. I'm not sure I like ~/.local though - -- it seems counter to the app-specific dot-file approach old schoolers like me are used to. OTOH, if that's a convention being promoted by GNOME and other frameworks, then I don't have too much objection. I also think that setuptools has the potential to be a big improvement here because it's much easier to install and use egg files than it is to get distutils to DTRT with setup.py. (I still detest the command name 'easy_install' but hey that's still fixable right? :). What might be nice would be to build a little more infrastructure into Python to support eggs, by say adding a default PEP 302 style importer that knows how to search for eggs in 'nests' (a directory containing a bunch of eggs). What if then that importer were general enough, or had a subclass that implemented a policy for applications where /lib/ pythonX.X/app-packages/ became a nest directory. All my app would have to do would be to drop an instance of one of those in the right place on sys.path and Python would pick up all the eggs in my app-package directory. Further, easy_install could then grow an -- install-app switch or somesuch that would install the egg in the app- package directory. I haven't really thought this through so maybe it's a stupid idea, but ISTM that would make management, installation, and use in an application about as simple as possible. (Oh yeah, add an -- uninstall switch too :). 
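[Archive note] Barry's "nest" importer is straightforward to prototype with today's importlib. The sketch below is not setuptools API and every name in it is invented; it only handles top-level modules stored as plain .py files inside *.egg directories of a single nest directory:

```python
import importlib.abc
import importlib.util
import os

class NestFinder(importlib.abc.MetaPathFinder):
    """Toy meta-path finder: look for fullname.py inside every
    '*.egg' subdirectory of one 'nest' directory."""

    def __init__(self, nest):
        self.nest = nest

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:               # top-level modules only
            return None
        for entry in sorted(os.listdir(self.nest)):
            if not entry.endswith(".egg"):
                continue
            candidate = os.path.join(self.nest, entry, fullname + ".py")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(fullname, candidate)
        return None
```

Activating it is one line, e.g. sys.meta_path.append(NestFinder("/path/to/nest")), after which any egg dropped into the nest becomes importable.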
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRW4cpnEjvBPtnXfVAQK/7wP/fS/MnVm6Msq6kB3qJce5BOK4NFo0ewGG uephuUfux+AWKMhl6KIIe7xeT6yO4yS/U/DF0sZ35JoOK8ebyH0JO/pup+lCfA3r ODQL45s+G1yycZDjUh3/a9+RakdhpfBRvjU3V/IFH7ayiM9PIHxKjTIzjXo3m1Pq 1hxb5BHS/8I= =kPE7 -----END PGP SIGNATURE----- From greg.ewing at canterbury.ac.nz Thu Nov 30 01:34:03 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 30 Nov 2006 13:34:03 +1300 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> Message-ID: <456E26FB.2020502@canterbury.ac.nz> Barry Warsaw wrote: > I'm not sure I like ~/.local though > - -- it seems counter to the app-specific dot-file approach old > schoolers like me are used to. Problems with that are starting to show, though. There's a particular Unix account that I've had for quite a number of years, accumulating much stuff. Nowadays when I do ls -a ~, I get a directory listing several screens long... The whole concept of "hidden" files seems ill- considered to me, anyway. It's too easy to forget that they're there. Putting infrequently-referenced stuff in a non-hidden location such as ~/local seems just as good and less magical to me. -- Greg From python at rcn.com Thu Nov 30 03:00:50 2006 From: python at rcn.com (python at rcn.com) Date: Wed, 29 Nov 2006 21:00:50 -0500 (EST) Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode Message-ID: <20061129210050.AOV97481@ms09.lnh.mail.rcn.net> [Guido van Rossum] > This seems a bug. In revision 36714 by Raymond Hettinger, > the restriction that locals be a dict was relaxed to allow > any mapping. [Armin Rigo] > Mea culpa, I thought I reviewed this patch at the time. > Fixed in r52862-52863. Armin, thanks for the check-ins. 
Daniel, thanks for finding one of the cases I missed. Will load a unittest for this one when I get a chance. Raymond From robinbryce at gmail.com Thu Nov 30 03:01:56 2006 From: robinbryce at gmail.com (Robin Bryce) Date: Thu, 30 Nov 2006 02:01:56 +0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456C7D20.1080604@v.loewis.de> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <456C7D20.1080604@v.loewis.de> Message-ID: On 28/11/06, "Martin v. L?wis" wrote: > I personally agree that "Linux standards" should specify a standard > layout for a Python installation, and that it should be the one that > "make install" generates (perhaps after "make install" is adjusted). > Whether or not it is the *LSB* that needs to specify that, I don't > know, because the LSB does not specify a file system layout. Instead, > it incorporates the FHS - which might be the right place to define > the layout of a Python installation. For the LSB, it's more import > that "import httplib" gives you something working, no matter where > httplib.py comes from (or whether it comes from httplib.py at all). Yes, especially with the regard to the level you pitch for LSB. I would go as far as to say that if this "contract in spirit" is broken by vendor repackaging they should: * Call the binaries something else because it is NOT python any more. * Setup the installation layout so that it does NOT conflict or overlap with the standard layout. * Call the whole package something else. But I can't see that happening. Is it a bad idea to suggest that: Python grows a vendor_variant attribute somewhere in the standard lib; That its content is completely dictated by a new ./configure argument which is the empty string by default; And, request that it is left empty by re-packagers if the installation is 'reasonably standard' ? 
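[Archive note] Robin's attribute is purely a proposal, so any sketch has to guard the name; the early-failure pattern he has in mind would look roughly like this (sys.vendor_variant does not exist in any released CPython):

```python
import sys

# 'vendor_variant' is the *hypothetical* attribute from Robin's
# proposal; getattr() with a default keeps this safe on real builds.
variant = getattr(sys, "vendor_variant", "")
if variant:
    raise SystemExit("this tool needs a stock CPython, got: " + variant)
print(repr(variant))   # '' on an unmodified interpreter
```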
I would strongly prefer _not_ write code that is conditional on such an attribute. However if there was a clear way for a vendor to communicate "This is not a standard python runtime" to the python run time, early failure (in the application) with informative error messages becomes much more viable. Eg sys.vendor_variant would be orthogonal to sys.version and sys.version_info Given: python -c "import sys; print sys.version" GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5) A regex on sys.version does not seem like a good way to get positive confirmation I'm using the "Canonical" variant (pun intended) python -c "from distutils.util import get_platform; print get_platform()" Tells me nothing about the vendor of my linux distribution. Except, ironically, when it says ImportError Cheers, Robin From glyph at divmod.com Thu Nov 30 04:20:36 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 03:20:36 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> On 29 Nov, 11:49 pm, barry at python.org wrote: >On Nov 29, 2006, at 5:18 AM, glyph at divmod.com wrote: >> I'd suggest using "~/.local/lib/pythonX.X/site-packages" for the >> "official" UNIX installation location, ... >+1 from me also for the concept. I'm not sure I like ~/.local though >- -- it seems counter to the app-specific dot-file approach old >schoolers like me are used to. OTOH, if that's a convention being >promoted by GNOME and other frameworks, then I don't have too much >objection. Thanks. I just had a look at the code in Combinator which sets this up and it turns out it's horribly inconsistent and buggy. It doesn't really work on any platform other than Linux. I'll try to clean it up in the next few days so it can serve as an example. GNOME et. al. aren't promoting the concept too hard. It's just the first convention I came across. 
(Pardon the lack of references here, but it's very hard to google for "~/.local" - I just know that I was looking for a convention when I wrote combinator, and this is the one I found.) The major advantage ~/.local has for *nix systems is the ability to have a parallel *bin* directory, which provides the user one location to set their $PATH to, so that installed scripts work as expected, rather than having to edit a bunch of .foorc files to add to your environment with each additional package. After all, what's the point of a per-user "install" if the software isn't actually installed in any meaningful way, and you have to manually edit your shell startup scripts, log out and log in again anyway? Another nice feature there is that it uses a pre-existing layout convention (bin lib share etc ...) rather than attempting to build a new one, so the only thing that has to change about the package installation is the root. Finally, I know there are quite a few Python developers out there already using Combinator, so at least there it's an established convention :). >I also think that setuptools has the potential to be a big >improvement here because it's much easier to install and use egg >files than it is to get distutils to DTRT with setup.py. (I still >detest the command name 'easy_install' but hey that's still fixable >right? :). What might be nice would be to build a little more >infrastructure into Python to support eggs, by say adding a default >PEP 302 style importer that knows how to search for eggs in >'nests' (a directory containing a bunch of eggs). One of the things that combinator hacks is where distutils thinks it should install to - when *I* type "python setup.py install" nothing tries to insert itself into system directories (those are for Ubuntu, not me) - ~/.local is the *default* install location. I haven't managed to make this feature work with eggs yet, but I haven't done a lot of work with setuptools. 
On the "easy_install" naming front, how about "layegg"? >What if then that importer were general enough (...) These all sound like interesting ideas, but they're starting to get pretty far afield - I wish I had more time to share ideas about packaging, but I know too well that I'm not going to be able to back them up with any implementation effort. I'd really like Python to use the ~/.local/bin / ~/.local/lib convention for installing packages, though. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/82573734/attachment.html From glyph at divmod.com Thu Nov 30 04:32:35 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 03:32:35 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130033235.11053.1425332311.divmod.xquotient.907@joule.divmod.com> On 12:34 am, greg.ewing at canterbury.ac.nz wrote: >The whole concept of "hidden" files seems ill- >considered to me, anyway. It's too easy to forget >that they're there. Putting infrequently-referenced >stuff in a non-hidden location such as ~/local >seems just as good and less magical to me. Something like "~/.local" is an implementation detail, not something that should be exposed to non-savvy users. It's easy enough for an expert to "show" it if they want to - "ln -s .local local" - but impossible for someone more naive to hide if they don't understand what it is or what it's for. (And if they try, by clicking a checkbox in Nautilus or somesuch, *all* their installed software breaks.) This approach doesn't really work unless you have good support from the OS, so it can warn you you're about to do something crazy. UI designers tend to get adamant about this sort of thing, but I'll admit they go both ways, some saying that everything should be exposed to the user, some saying that all details should be hidden by default. 
Still, in the more recent UNIX desktops, the "let's hide the things that the user shouldn't see and just work really hard to make them work right all the time" camp seems to be winning. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/6118882b/attachment.htm From fdrake at acm.org Thu Nov 30 05:11:58 2006 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Nov 2006 23:11:58 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: <200611292311.59096.fdrake@acm.org> On Wednesday 29 November 2006 22:20, glyph at divmod.com wrote: > GNOME et. al. aren't promoting the concept too hard. It's just the first > convention I came across. (Pardon the lack of references here, but it's > very hard to google for "~/.local" - I just know that I was looking for a > convention when I wrote combinator, and this is the one I found.) ~/.local/ is described in the "XDG Base Directory Specification": http://standards.freedesktop.org/basedir-spec/latest/ > On the "easy_install" naming front, how about "layegg"? Actually, why not just "egg"? That's parallel to "rpm" at least, and there isn't such a command installed on my Ubuntu box already. (Using synaptic to search for "egg" resulted in little that actually had "egg" in the name or short description; there was wnn7egg (a Wnn7 input method), but that's really it.) -Fred -- Fred L. Drake, Jr. From pje at telecommunity.com Thu Nov 30 05:36:19 2006 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed, 29 Nov 2006 23:36:19 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule .divmod.com> Message-ID: <5.1.1.6.0.20061129233347.03a1d458@sparrow.telecommunity.com> At 03:20 AM 11/30/2006 +0000, glyph at divmod.com wrote: >One of the things that combinator hacks is where distutils thinks it >should install to - when *I* type "python setup.py install" nothing tries >to insert itself into system directories (those are for Ubuntu, not me) - >~/.local is the *default* install location. I haven't managed to make >this feature work with eggs yet, but I haven't done a lot of work with >setuptools. easy_install uses the standard distutils configuration system, which means that you can do e.g. [install] prefix = ~/.local in ./setup.cfg, ~/.pydistutils.cfg, or /usr/lib/python2.x/distutils/distutils.cfg to set the default installation prefix. Setuptools (and distutils!) will then install libraries to ~/.local/lib/python2.x/site-packages and scripts to ~/.local/bin. From pje at telecommunity.com Thu Nov 30 05:45:06 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 29 Nov 2006 23:45:06 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> Message-ID: <5.1.1.6.0.20061129233752.042aabe8@sparrow.telecommunity.com> At 06:49 PM 11/29/2006 -0500, Barry Warsaw wrote: >What might be nice would be to build a little more >infrastructure into Python to support eggs, by say adding a default >PEP 302 style importer that knows how to search for eggs in >'nests' (a directory containing a bunch of eggs). If you have setuptools generate your scripts, the eggs are searched for and added to sys.path automatically, with no need for a separate importer. 
If you write standalone scripts (not using "setup.py develop" or "setup.py install"), you can use pkg_resources.require() to find eggs and add them to sys.path manually. If you want eggs available when you start Python, easy_install puts them on sys.path using .pth files by default. So, I'm not clear on what use case you have in mind for this importer, or how you think it would work. (Any .egg file in a sys.path directory is already automatically discoverable by the means described above.) >What if then that importer were general enough, or had a subclass >that implemented a policy for applications where /lib/ >pythonX.X/app-packages/ became a nest directory. Simply installing your scripts to the same directory as the eggs they require, is sufficient to ensure this. Also, since eggs are versioned, nothing stops you from having one giant systemwide egg directory. Setuptools-generated scripts automatically adjust their sys.path to include the specific eggs they need - and "need" can be specified to an exact version if desired (e.g. for system admin tools). >I haven't really thought this through so maybe it's a stupid idea, >but ISTM that would make management, installation, and use in an >application about as simple as possible. (Oh yeah, add an -- >uninstall switch too :). Yeah, that's targeted for the "nest" package management tool, which I may have some time to work on someday, in my copious free time. :) In the meantime, 'easy_install -Nm eggname; rm -rf /path/to/the.egg' takes care of everything but the scripts. From glyph at divmod.com Thu Nov 30 06:06:18 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 05:06:18 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130050618.11053.1949245277.divmod.xquotient.913@joule.divmod.com> On 04:11 am, fdrake at acm.org wrote: >On Wednesday 29 November 2006 22:20, glyph at divmod.com wrote: > > GNOME et. al. aren't promoting the concept too hard. 
It's just the first > > convention I came across. (Pardon the lack of references here, but it's > > very hard to google for "~/.local" - I just know that I was looking for a > > convention when I wrote combinator, and this is the one I found.) > >~/.local/ is described in the "XDG Base Directory Specification": > > http://standards.freedesktop.org/basedir-spec/latest/ Thanks for digging that up! Not a whole lot of meat there, but at least it gives me some env vars to set / check... > > On the "easy_install" naming front, how about "layegg"? > >Actually, why not just "egg"? That works for me. I assumed there was some other reason the obvious answer hadn't been chosen :). -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/5ecc1583/attachment.htm From barry at python.org Thu Nov 30 06:09:30 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 00:09:30 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 29, 2006, at 10:20 PM, glyph at divmod.com wrote: > Another nice feature there is that it uses a pre-existing layout > convention (bin lib share etc ...) rather than attempting to build > a new one, so the only thing that has to change about the package > installation is the root. That's an excellent point, because in configure-speak I guess you could just use --prefix=/.local and everything would lay out correctly. (I guess that's the whole point, eh? 
:) > One of the things that combinator hacks is where distutils thinks > it should install to - when *I* type "python setup.py install" > nothing tries to insert itself into system directories (those are > for Ubuntu, not me) - ~/.local is the *default* install location. > I haven't managed to make this feature work with eggs yet, but I > haven't done a lot of work with setuptools. That's really nice. So if I "sudo python setup.py install" it'll see uid 0 and install in the system location? > On the "easy_install" naming front, how about "layegg"? I think I once proposed "hatch" but that may not be quite the right word (where's Ken M when you need him? :). > >What if then that importer were general enough (...) > > These all sound like interesting ideas, but they're starting to get > pretty far afield - I wish I had more time to share ideas about > packaging, but I know too well that I'm not going to be able to > back them up with any implementation effort. Yeah, same here, so I'll shut up now. > I'd really like Python to use the ~/.local/bin / ~/.local/lib > convention for installing packages, though. I'm sold. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRW5ninEjvBPtnXfVAQKZBgP+MC1p3ipJbJn8ayhYyO73hdeWHpeHWd82 F4pFwkAuiXMWZ9/le1XW61+ODfSSti0RbBEiJeuul5dHP7+DlhXHyXrCf6Zzab4e PTerySTgc8AtI8L2VZzAaVU9PlzmKw0dp4s2pigNbGb3FRbH/m/ZwhSSYfeQTA3U gdA5YQq7CD0= =CJ9T -----END PGP SIGNATURE----- From glyph at divmod.com Thu Nov 30 06:10:46 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 05:10:46 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130051046.11053.1680534272.divmod.xquotient.922@joule.divmod.com> On 04:36 am, pje at telecommunity.com wrote: >easy_install uses the standard distutils configuration system, which means >that you can do e.g. Hmm. I thought I knew quite a lot about distutils, but this particular nugget had evaded me. Thanks! 
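[Archive note] The convention Barry is sold on here was standardized shortly afterwards as PEP 370 (per-user site-packages, Python 2.6/3.0). On a POSIX build, a modern interpreter reports the per-user locations directly:

```python
import sysconfig

# The 'posix_user' scheme resolves to the user base (~/.local unless
# PYTHONUSERBASE overrides it), exactly the layout argued for above.
print(sysconfig.get_path("purelib", "posix_user"))  # .../lib/pythonX.Y/site-packages
print(sysconfig.get_path("scripts", "posix_user"))  # .../bin
```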
I see that it's mentioned in the documentation, but I never thought to look in that section. I have an aversion to .ini files; I tend to assume there's always an equivalent Python expression, and it's better. Is there an equivalent Python API in this case? I don't know if this is a personal quirk of mine, or a reinforcement of Talin's point about the audience for documentation documentation. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/a3da98c6/attachment.html From barry at python.org Thu Nov 30 06:17:58 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 00:17:58 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061129233752.042aabe8@sparrow.telecommunity.com> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <5.1.1.6.0.20061129233752.042aabe8@sparrow.telecommunity.com> Message-ID: <04EB4315-008A-460F-8DE2-D322634ED9BA@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 29, 2006, at 11:45 PM, Phillip J. Eby wrote: [Phillip describes a bunch of things I didn't know about setuptools] As is often the case, maybe everything I want is already there and I've just been looking in the wrong places. :) Thanks! I'll read up on that stuff. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRW5phnEjvBPtnXfVAQKwLgP/doK7aF5zGknK4JCv+rjO4xXKWRwjB0Vk B08Ee2HlSTcqSe8YIqMOSCRa8LcW86hEFipJmIi8vzcPv0Tr6y+i6yMTq0zhYeyh lvc7E7wdMY+U78/+ffeDLBNESXkZRzaiv0aH4ZkBf3xOebj58vCNBHlmzfT0WeFj EMnJut6jOnM= =mlIW -----END PGP SIGNATURE----- From pje at telecommunity.com Thu Nov 30 06:34:33 2006 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu, 30 Nov 2006 00:34:33 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130051046.11053.1680534272.divmod.xquotient.922@joule .divmod.com> Message-ID: <5.1.1.6.0.20061130003332.04636428@sparrow.telecommunity.com> At 05:10 AM 11/30/2006 +0000, glyph at divmod.com wrote: >On 04:36 am, pje at telecommunity.com wrote: > > >easy_install uses the standard distutils configuration system, which means > >that you can do e.g. > >Hmm. I thought I knew quite a lot about distutils, but this particular >nugget had evaded me. Thanks! I see that it's mentioned in the >documentation, but I never thought to look in that section. I have an >aversion to .ini files; I tend to assume there's always an equivalent >Python expression, and it's better. Is there an equivalent Python API in >this case? Well, in a setup.py there's an options or some such that can be used to provide effective command-line option overrides in-line, but that doesn't help for systemwide default configurations, like the files I mentioned. It's effectively only a substitute for setup.cfg. From martin at v.loewis.de Thu Nov 30 07:12:33 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Nov 2006 07:12:33 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <456C7D20.1080604@v.loewis.de> Message-ID: <456E7651.7070100@v.loewis.de> Robin Bryce schrieb: > Yes, especially with the regard to the level you pitch for LSB. I > would go as far as to say that if this "contract in spirit" is broken > by vendor repackaging they should: > * Call the binaries something else because it is NOT python any more. > * Setup the installation layout so that it does NOT conflict or > overlap with the standard layout. > * Call the whole package something else. 
I think that would be counter-productive. If applied in a strict sense, you couldn't call it Python anymore if it isn't in /usr/local. I see no point to that. It shouldn't be called Python anymore if it doesn't implement the Python language specification. No vendor is modifying it in such a way that print "Hello" stops working. > Is it a bad idea to suggest that: Python grows a vendor_variant > attribute somewhere in the standard lib; That its content is > completely dictated by a new ./configure argument which is the empty > string by default; And, request that it is left empty by re-packagers > if the installation is 'reasonably standard' ? I'm not sure in what applications that would be useful. > I would strongly prefer _not_ to write code that is conditional on such > an attribute. However if there was a clear way for a vendor to > communicate "This is not a standard python runtime" to the python run > time, early failure (in the application) with informative error > messages becomes much more viable. Again: none of the vendors modifies Python in a way that what you get is "not a standard Python runtime". They *all* are "standard Python runtimes". Regards, Martin From talin at acm.org Thu Nov 30 15:40:55 2006 From: talin at acm.org (Talin) Date: Thu, 30 Nov 2006 06:40:55 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456E26FB.2020502@canterbury.ac.nz> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> <456E26FB.2020502@canterbury.ac.nz> Message-ID: <456EED77.5060009@acm.org> Greg Ewing wrote: > Barry Warsaw wrote: >> I'm not sure I like ~/.local though >> -- it seems counter to the app-specific dot-file approach old >> schoolers like me are used to. > > Problems with that are starting to show, though. > There's a particular Unix account that I've had for > quite a number of years, accumulating much stuff.
> Nowadays when I do ls -a ~, I get a directory > listing several screens long... > > The whole concept of "hidden" files seems ill- > considered to me, anyway. It's too easy to forget > that they're there. Putting infrequently-referenced > stuff in a non-hidden location such as ~/local > seems just as good and less magical to me. On OS X, you of course have ~/Library. I suppose the Linux equivalent would be something like ~/lib. Maybe this is something that we should be asking the LSB folks for advice on? > > -- > Greg > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/talin%40acm.org > From talin at acm.org Thu Nov 30 15:49:16 2006 From: talin at acm.org (Talin) Date: Thu, 30 Nov 2006 06:49:16 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: <456EEF6C.40409@acm.org> Barry Warsaw wrote: >> On the "easy_install" naming front, how about "layegg"? > > I think I once proposed "hatch" but that may not be quite the right > word (where's Ken M when you need him? :). I really don't like all these "cute" names, simply because they are obscure. Names that only make sense once you've gotten the joke may be self-gratifying but not good HCI. How about: python -M install Or maybe we could even lobby to get: python --install as a synonym of the above? 
-- Talin From ronaldoussoren at mac.com Thu Nov 30 15:55:02 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 30 Nov 2006 06:55:02 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456EEF6C.40409@acm.org> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> <456EEF6C.40409@acm.org> Message-ID: On Thursday, November 30, 2006, at 03:49PM, "Talin" wrote: >Barry Warsaw wrote: >>> On the "easy_install" naming front, how about "layegg"? >> >> I think I once proposed "hatch" but that may not be quite the right >> word (where's Ken M when you need him? :). >I really don't like all these "cute" names, simply because they are >obscure. Names that only make sense once you've gotten the joke may be >self-gratifying but not good HCI. > >How about: > > python -M install > >Or maybe we could even lobby to get: > > python --install > >as a synonym of the above? Maybe because 'install' is just one of the actions? I'd also like to see 'uninstall', 'list' and 'upgrade' actions (and have some very crude code to do this). Ronald From barry at python.org Thu Nov 30 16:20:25 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 10:20:25 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456EED77.5060009@acm.org> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> <456E26FB.2020502@canterbury.ac.nz> <456EED77.5060009@acm.org> Message-ID: <671B99D2-3174-4A5E-B12B-D2241EA22959@python.org> On Nov 30, 2006, at 9:40 AM, Talin wrote: > Greg Ewing wrote: >> Barry Warsaw wrote: >>> I'm not sure I like ~/.local though -- it seems counter to the >>> app-specific dot-file approach old schoolers like me are used to. >> Problems with that are starting to show, though.
>> There's a particular Unix account that I've had for >> quite a number of years, accumulating much stuff. >> Nowadays when I do ls -a ~, I get a directory >> listing several screens long... >> The whole concept of "hidden" files seems ill- >> considered to me, anyway. It's too easy to forget >> that they're there. Putting infrequently-referenced >> stuff in a non-hidden location such as ~/local >> seems just as good and less magical to me. > > On OS X, you of course have ~/Library. I suppose the Linux > equivalent would be something like ~/lib. I forgot to add in my previous follow-up why I'd prefer ~/.local over ~/local. It's a namespace thing. Dot-files in my home directory are like __names__ in Python -- they don't belong to me. Non-dot-names are my namespace, so things like ~/local constrain what I can call my own files. When I switched to OS X for most of my desktops, I had several collisions in this namespace. I keep all my homedir files under subversion and could not check out my environment on my new Mac until I renamed a few directories (this was exacerbated by the case-insensitive file system). I think in general OS X has less philosophical problem with colliding in the non-dot namespace because most OS X users don't ever /see/ their home directory. They see ~/Desktop. Maybe that's what all the kids are into these days, but I still think dot-names are better to use for a wider acceptance.
-Barry From barry at python.org Thu Nov 30 16:24:02 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 10:24:02 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456EEF6C.40409@acm.org> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> <456EEF6C.40409@acm.org> Message-ID: <715A197A-4B46-40BD-AB8D-706C810D81C1@python.org> On Nov 30, 2006, at 9:49 AM, Talin wrote: > I really don't like all these "cute" names, simply because they are > obscure. Names that only make sense once you've gotten the joke may > be self-gratifying but not good HCI. Warsaw's Fifth Law :) > How about: > > python -M install > > Or maybe we could even lobby to get: > > python --install > > as a synonym of the above? As Ronald points out, installing is only one action, and then you have to handle all of its options too. Maybe that means python -M install --install-dir foo --other-setuptools-options would work, but I don't think just bare --install does. I'm also not sure "python -M install" is a big improvement over "egg" or whatever ("egg" actually isn't bad).
-Barry From janssen at parc.com Thu Nov 30 18:37:43 2006 From: janssen at parc.com (Bill Janssen) Date: Thu, 30 Nov 2006 09:37:43 PST Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <715A197A-4B46-40BD-AB8D-706C810D81C1@python.org> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> <456EEF6C.40409@acm.org> <715A197A-4B46-40BD-AB8D-706C810D81C1@python.org> Message-ID: <06Nov30.093750pst."58648"@synergy1.parc.xerox.com> Perhaps "pyinstall"? Bill > On Nov 30, 2006, at 9:49 AM, Talin wrote: > > > I really don't like all these "cute" names, simply because they are > > obscure. Names that only make sense once you've gotten the joke may > > be self-gratifying but not good HCI. > > Warsaw's Fifth Law :) > > > How about: > > > > python -M install > > > > Or maybe we could even lobby to get: > > > > python --install > > > > as a synonym of the above? > > As Ronald points out, installing is only one action, and then you > have to handle all of its options too. Maybe that means > > python -M install --install-dir foo --other-setuptools-options > > would work, but I don't think just bare --install does. I'm also not > sure "python -M install" is a big improvement over "egg" or whatever > ("egg" actually isn't bad). > > -Barry From guido at python.org Thu Nov 30 18:49:25 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 09:49:25 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? Message-ID: I've got a small tweak to tokenize.py that I'd like to run by folks here.
I'm working on a refactoring tool for Python 2.x-to-3.x conversion, and my approach is to build a full parse tree with annotations that show where the whitespace and comments go. I use the tokenize module to scan the input. This is nearly perfect (I can render code from the parse tree and it will be an exact match of the input) except for continuation lines -- while the tokenize gives me pseudo-tokens for comments and "ignored" newlines, it doesn't give me the backslashes at all (while it does give me the newline following the backslash). It would be trivial to add another yield to tokenize.py when the backslash is detected:

--- tokenize.py (revision 52865)
+++ tokenize.py (working copy)
@@ -370,6 +370,8 @@
                 elif initial in namechars:      # ordinary name
                     yield (NAME, token, spos, epos, line)
                 elif initial == '\\':           # continued stmt
+                    # This yield is new; needed for better idempotency:
+                    yield (NL, initial, spos, (spos[0], spos[1]+1), line)
                     continued = 1
                 else:
                     if initial in '([{': parenlev = parenlev + 1

(Though I think that it should probably yield a single NL pseudo-token whose value is a backslash followed by a newline; or perhaps it should yield the backslash as a comment token, or as a new token. Thoughts?) This wouldn't be 100% backwards compatible, so I'm not dreaming of adding this to 2.5.1, but what about 2.6? (There's another issue with tokenize.py too -- when you use it to parse Python-like source code containing non-Python operators, e.g. '?', it does something bogus.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From glyph at divmod.com Thu Nov 30 19:02:47 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 18:02:47 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130180247.11053.492404401.divmod.xquotient.954@joule.divmod.com> On 05:37 pm, janssen at parc.com wrote: >Perhaps "pyinstall"?
Keep in mind that Python packages will still generally be *system*-installed with other tools, like dpkg (or apt) and rpm, on systems which have them. The name of the packaging system we're talking about is called either "eggs" or "setuptools" depending on the context. "pyinstall" invites confusion with "the Python installer", which is a different program, used to install Python itself on Windows. It's just a brand. If users can understand that "Excel" means "Spreadsheet", "Outlook" means "E-Mail", and "GIMP" means "Image Editor", then I think we should give them some credit on being able to figure out what the installer program is called. (I don't really care that much in this particular case, but this was one of my pet peeves with GNOME a while back. There was a brief change to the names of everything in the menus to remove all brand-names: "Firefox" became "Web Browser", "Evolution" became "E-Mail", "Rhythmbox" became "Music Player". I remember looking at my applications menu and wondering which of the 3 "music players" that I had installed the menu would run. Thankfully this nonsense stopped and they compromised on names like "Firefox Web Browser" and "GIMP Image Editor".) From pje at telecommunity.com Thu Nov 30 19:11:10 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 30 Nov 2006 13:11:10 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130180247.11053.492404401.divmod.xquotient.954@joule.divmod.com> Message-ID: <5.1.1.6.0.20061130130533.02880c78@sparrow.telecommunity.com> At 06:02 PM 11/30/2006 +0000, glyph at divmod.com wrote: >On 05:37 pm, janssen at parc.com wrote: > >Perhaps "pyinstall"?
> >Keep in mind that Python packages will still generally be >*system*-installed with other tools, like dpkg (or apt) and rpm, on >systems which have them. The name of the packaging system we're talking >about is called either "eggs" or "setuptools" depending on the context. Just as an FYI, the (planned) name of the packaging program for setuptools is "nest". It doesn't exist yet, however, except for a whole lot of design notes in my outlining program. You'll be able to use commands like "nest list" to show installed projects, "nest source" to fetch a project's source, "nest rm" or "nest uninstall" to uninstall, etc. It's all 100% vaporware at the moment, but that's the plan. I actually looked at other system package managers written in Python (i.e. yum and smart) to use as a possible base for implementing "nest", but unfortunately these are all GPL'd and thus not compatible with the setuptools or Python licenses, so I didn't actually get very far in my evaluation. From pje at telecommunity.com Thu Nov 30 19:22:57 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 30 Nov 2006 13:22:57 -0500 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: Message-ID: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote: >I've got a small tweak to tokenize.py that I'd like to run by folks here. > >I'm working on a refactoring tool for Python 2.x-to-3.x conversion, >and my approach is to build a full parse tree with annotations that >show where the whitespace and comments go. I use the tokenize module >to scan the input. This is nearly perfect (I can render code from the >parse tree and it will be an exact match of the input) except for >continuation lines -- while the tokenize gives me pseudo-tokens for >comments and "ignored" newlines, it doesn't give me the backslashes at >all (while it does give me the newline following the backslash). 
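The gap Guido describes is easy to reproduce with a present-day Python 3 tokenize module, which still behaves this way; this is a minimal sketch for illustration, not code from the thread:

```python
import io
import tokenize

# Tokenize a statement that uses a backslash continuation.
source = "x = 1 + \\\n    2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# No token carries the backslash, so naively joining the token strings
# cannot reproduce the original source text.
assert not any("\\" in tok.string for tok in tokens)
assert "".join(tok.string for tok in tokens) == "x=1+2\n"
```

The intraline spaces are also absent from the joined text, but those can at least be recovered from the token positions; the backslash itself never appears in any token, which is the behavior the proposed patch changes.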
The following routine will render a token stream, and it automatically restores the missing \'s. I don't know if it'll work with your patch, but perhaps you could use it instead of changing tokenize. For the documentation and examples, see: http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text

def detokenize(tokens, indent=0):
    """Convert `tokens` iterable back to a string."""
    out = []; add = out.append
    lr,lc,last = 0,0,''
    baseindent = None
    for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens):
        # Insert trailing line continuation and blanks for skipped lines
        lr = lr or sr    # first line of input is first line of output
        if sr>lr:
            if last:
                if len(last)>lc:
                    add(last[lc:])
                lr+=1
            if sr>lr:
                add(' '*indent + '\\\n'*(sr-lr))   # blank continuation lines
            lc = 0

        # Re-indent first token on line
        if lc==0:
            if tok==INDENT:
                continue    # we want to dedent first actual token
            else:
                curindent = len(line[:sc].expandtabs())
                if baseindent is None and tok not in WHITESPACE:
                    baseindent = curindent
                elif baseindent is not None and curindent>=baseindent:
                    add(' ' * (curindent-baseindent))
            if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
                add(' ' * indent)

        # Not at start of line, handle intraline whitespace by retaining it
        elif sc>lc:
            add(line[lc:sc])

        if val:
            add(val)

        lr,lc,last = er,ec,line

    return ''.join(out)

From guido at python.org Thu Nov 30 19:28:25 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 10:28:25 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: Are you opposed changing tokenize? If so, why (apart from compatibility)? ISTM that it would be a good thing if it reported everything except horizontal whitespace. On 11/30/06, Phillip J.
Eby wrote: > At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote: > >I've got a small tweak to tokenize.py that I'd like to run by folks here. > > > >I'm working on a refactoring tool for Python 2.x-to-3.x conversion, > >and my approach is to build a full parse tree with annotations that > >show where the whitespace and comments go. I use the tokenize module > >to scan the input. This is nearly perfect (I can render code from the > >parse tree and it will be an exact match of the input) except for > >continuation lines -- while the tokenize gives me pseudo-tokens for > >comments and "ignored" newlines, it doesn't give me the backslashes at > >all (while it does give me the newline following the backslash). > > The following routine will render a token stream, and it automatically > restores the missing \'s. I don't know if it'll work with your patch, but > perhaps you could use it instead of changing tokenize. For the > documentation and examples, see: > > http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text > > > def detokenize(tokens, indent=0): > """Convert `tokens` iterable back to a string.""" > out = []; add = out.append > lr,lc,last = 0,0,'' > baseindent = None > for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens): > # Insert trailing line continuation and blanks for skipped lines > lr = lr or sr # first line of input is first line of output > if sr>lr: > if last: > if len(last)>lc: > add(last[lc:]) > lr+=1 > if sr>lr: > add(' '*indent + '\\\n'*(sr-lr)) # blank continuation lines > lc = 0 > > # Re-indent first token on line > if lc==0: > if tok==INDENT: > continue # we want to dedent first actual token > else: > curindent = len(line[:sc].expandtabs()) > if baseindent is None and tok not in WHITESPACE: > baseindent = curindent > elif baseindent is not None and curindent>=baseindent: > add(' ' * (curindent-baseindent)) > if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE): > add(' ' * indent) > > # Not at start of 
line, handle intraline whitespace by retaining it > elif sc>lc: > add(line[lc:sc]) > > if val: > add(val) > > lr,lc,last = er,ec,line > > return ''.join(out) > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Thu Nov 30 19:34:41 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 30 Nov 2006 19:34:41 +0100 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: Guido van Rossum wrote: > Are you opposed changing tokenize? If so, why (apart from > compatibility)? ISTM that it would be a good thing if it reported > everything except horizontal whitespace. it would be a good thing if it could, optionally, be made to report horizontal whitespace as well. From pje at telecommunity.com Thu Nov 30 19:55:44 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 30 Nov 2006 13:55:44 -0500 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20061130135127.0287fa38@sparrow.telecommunity.com> At 10:28 AM 11/30/2006 -0800, Guido van Rossum wrote: >Are you opposed changing tokenize? If so, why (apart from >compatibility)? Nothing apart from compatibility. I think you should have to explicitly request the new behavior(s), since tools (like detokenize) written to work around the old behavior might behave oddly with the change. Mainly, though, I thought you might find the code useful, given the nature of your project. (Although I suppose you've probably already written something similar.) From python at rcn.com Thu Nov 30 20:12:16 2006 From: python at rcn.com (python at rcn.com) Date: Thu, 30 Nov 2006 14:12:16 -0500 (EST) Subject: [Python-Dev] Small tweak to tokenize.py? 
Message-ID: <20061130141216.AOY93108@ms09.lnh.mail.rcn.net> > It would be trivial to add another yield to tokenize.py when > the backslash is detected +1 > I think that it should probably yield a single NL pseudo-token > whose value is a backslash followed by a newline; or perhaps it > should yield the backslash as a comment token, or as a new token. The first option is likely the most compatible with existing uses of tokenize. If a comment token were emitted, an existing colorizer or pretty-printer would mark up the continuation as a comment (possibly not what the tool author intended). If a new token were created, it might break if-elif-else chains in tools that thought they knew the universe of possible token types. Raymond From lists at janc.be Thu Nov 30 21:05:49 2006 From: lists at janc.be (Jan Claeys) Date: Thu, 30 Nov 2006 21:05:49 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061129112354.GA30665@code0.codespeak.net> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <20061129112354.GA30665@code0.codespeak.net> Message-ID: <1164917149.31269.344.camel@localhost> On Wednesday 2006-11-29 at 12:23 [timezone +0100], Armin Rigo wrote: > I could not agree more. Nowadays, whenever I get an account on a new > Linux machine, the first thing I have to do is reinstall Python > correctly in my home dir because the system Python lacks distutils. > Wasteful. (There are some applications and libraries that use > distutils at run-time to compile things, and I'm using such > applications and libraries on a daily basis.) I think you should blame the sysadmins, and kick them to install python properly for use by a developer, because every distro I know provides distutils...
;-) -- Jan Claeys From guido at python.org Thu Nov 30 22:46:01 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 13:46:01 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: <5.1.1.6.0.20061130135127.0287fa38@sparrow.telecommunity.com> References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> <5.1.1.6.0.20061130135127.0287fa38@sparrow.telecommunity.com> Message-ID: On 11/30/06, Phillip J. Eby wrote: > At 10:28 AM 11/30/2006 -0800, Guido van Rossum wrote: > >Are you opposed changing tokenize? If so, why (apart from > >compatibility)? > > Nothing apart from compatibility. I think you should have to explicitly > request the new behavior(s), since tools (like detokenize) written to work > around the old behavior might behave oddly with the change. Can you test it with this new change (slightly different from before)? It reports an NL pseudo-token with '\\\n' as its text value (or '\\\r\n' if the line ends in \r\n).

@@ -370,6 +370,8 @@
                 elif initial in namechars:      # ordinary name
                     yield (NAME, token, spos, epos, line)
                 elif initial == '\\':           # continued stmt
+                    # This yield is new; needed for better idempotency:
+                    yield (NL, token, spos, (lnum, pos), line)
                     continued = 1
                 else:
                     if initial in '([{': parenlev = parenlev + 1

> Mainly, though, I thought you might find the code useful, given the nature > of your project. (Although I suppose you've probably already written > something similar.) Indeed.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From steve at holdenweb.com Thu Nov 30 22:48:53 2006 From: steve at holdenweb.com (Steve Holden) Date: Thu, 30 Nov 2006 21:48:53 +0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <1164917149.31269.344.camel@localhost> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <20061129112354.GA30665@code0.codespeak.net> <1164917149.31269.344.camel@localhost> Message-ID: <456F51C5.8080500@holdenweb.com> Jan Claeys wrote: > Op woensdag 29-11-2006 om 12:23 uur [tijdzone +0100], schreef Armin > Rigo: >> I could not agree more. Nowadays, whenever I get an account on a new >> Linux machine, the first thing I have to do is reinstall Python >> correctly in my home dir because the system Python lacks distutils. >> Wasteful. (There are some applications and libraries that use >> distutils at run-time to compile things, and I'm using such >> applications and libraries on a daily basis.) > > I think you should blame the sysadmins, and kick them to install python > properly for use by a developer, because every distro I know provides > distutils... ;-) > > I think the point is that some distros (Debian is the one that springs to mind most readily, but I'm not a distro archivist) require a separate install for distutils even though it's been a part of the standard *Python* distro since 2.3 (2.2?) So, it isn't that you can't get distutils, it's that you have to take an extra step over and above installing Python. 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From guido at python.org Thu Nov 30 22:49:30 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 13:49:30 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: On 11/30/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > Are you opposed changing tokenize? If so, why (apart from > > compatibility)? ISTM that it would be a good thing if it reported > > everything except horizontal whitespace. > > it would be a good thing if it could, optionally, be made to report > horizontal whitespace as well. It's remarkably easy to get this out of the existing API; keep track of the end position returned by the previous call, and if it's different from the start position returned by the next call, slice the line text from the column positions, assuming the line numbers are the same. If the line numbers differ, something has been eating \n tokens; this shouldn't happen any more with my patch. 
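Guido's recipe (remember the previous token's end position; when the next token starts further along on the same row, slice the missing whitespace out of the line text) can be sketched against today's tokenize module. A minimal illustration, not code from the thread:

```python
import io
import tokenize

def rebuild(source):
    """Reconstruct source text from its token stream, recovering
    horizontal whitespace by comparing adjacent token positions."""
    out = []
    prev_row, prev_col = 1, 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        srow, scol = tok.start
        # Same physical line with a gap before this token: the gap is
        # horizontal whitespace, recoverable by slicing the line text.
        if srow == prev_row and scol > prev_col:
            out.append(tok.line[prev_col:scol])
        out.append(tok.string)
        prev_row, prev_col = tok.end
    return "".join(out)

assert rebuild("x  =  1 + 2  # spaced out\n") == "x  =  1 + 2  # spaced out\n"
```

Note that this round-trips comments, indentation, and intraline spacing, but still cannot recover a backslash continuation (the rows differ, and no token carries the backslash), which is exactly the gap the patch addresses.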
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From sluggoster at gmail.com Thu Nov 30 23:46:22 2006 From: sluggoster at gmail.com (Mike Orr) Date: Thu, 30 Nov 2006 14:46:22 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: <6e9196d20611301446y629c04cdn6c8215dfe065d006@mail.gmail.com> On 11/29/06, glyph at divmod.com wrote: > The major advantage ~/.local has for *nix systems is the ability to have a > parallel *bin* directory, which provides the user one location to set their > $PATH to, so that installed scripts work as expected, rather than having to > edit a bunch of .foorc files to add to your environment with each additional > package. After all, what's the point of a per-user "install" if the > software isn't actually installed in any meaningful way, and you have to > manually edit your shell startup scripts, log out and log in again anyway? > Another nice feature there is that it uses a pre-existing layout convention > (bin lib share etc ...) rather than attempting to build a new one, so the > only thing that has to change about the package installation is the root. Putting programs and libraries in a hidden directory? Things the user intends to run or inspect? Putting a hidden directory on $PATH? I'm... stunned. It sounds like a very bad idea. Dotfiles are for a program's internal state: "black box" stuff. Not programs the user will run, and not Python modules he may want to inspect or subclass. ~/bin and ~/lib already work well with both Virtual Python and ./configure, and it's what many users are already doing. On the other hand, the freedesktop link says ~/.local can be overridden with environment variables. That may be an acceptable compromise between the two. 
Speaking of Virtual Python [1], I've heard some people recommending it as a general solution to the "this library breaks that other application" problem and "this app needs a different version of X library than that other app does". I've started using it off and on but haven't come to any general conclusion on it. Is it becoming pretty widespread among Python users? Would it be worth mentioning in the LSB/FHS? It only works on *nix systems currently, but Linux is a *nix system anyway. [1] http://peak.telecommunity.com/dist/virtual-python.py (It installs a pseudo-copy of Python symlinked to the system one, so that you have your own site-packages directory independent of others.) -- Mike Orr
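The idea behind virtual-python.py can be shown in a few shell commands. This is only a conceptual sketch of the mechanism (a private prefix whose bin/python is a symlink to the system interpreter, giving you your own package directory), not the actual script, and the paths are assumptions:

```shell
# Hypothetical prefix for the private environment (assumption, not from the thread)
PREFIX="${TMPDIR:-/tmp}/py-sandbox"

# Create the conventional layout under the prefix
mkdir -p "$PREFIX/bin" "$PREFIX/lib/python/site-packages"

# Symlink the system interpreter into the private bin directory
ln -sf "$(command -v python3 || command -v python)" "$PREFIX/bin/python"

# The symlinked interpreter runs normally
"$PREFIX/bin/python" -c "import sys; print(sys.version_info[0])"
```

The real script also arranges for the private site-packages to take precedence on sys.path; putting $PREFIX/bin on $PATH then gives the per-user install experience discussed earlier in the thread.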