From greg at krypto.org Fri May 1 00:44:12 2009 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 30 Apr 2009 15:44:12 -0700 Subject: [Python-Dev] Proposed: a new function-based C API for declaring Python types In-Reply-To: <49F7C37C.5090305@hastings.org> References: <49F7C37C.5090305@hastings.org> Message-ID: <52dc1c820904301544m649b78acr6238b66d9a63be61@mail.gmail.com> On Tue, Apr 28, 2009 at 8:03 PM, Larry Hastings wrote: > > EXECUTIVE SUMMARY > > I've written a patch against py3k trunk creating a new function-based > API for creating extension types in C. This allows PyTypeObject to > become a (mostly) private structure. > > > THE PROBLEM > > Here's how you create an extension type using the current API. > > * First, find some code that already has a working type declaration. > Copy and paste their fifty-line PyTypeObject declaration, then > hack it up until it looks like what you need. > > * Next--hey! There *is* no next, you're done. You can immediately > create an object using your type and pass it into the Python > interpreter and it would work fine. You are encouraged to call > PyType_Ready(), but this isn't required and it's often skipped. > > This approach causes two problems. > > 1) The Python interpreter *must support* and *cannot change* > the PyTypeObject structure, forever. Any meaningful change to > the structure will break every extension. This has many > consequences: > a) Fields that are no longer used must be left in place, > forever, as ignored placeholders if need be. Py3k cleaned > up a lot of these, but it's already picked up a new one > ("tp_compare" is now "tp_reserved"). > b) Internal implementation details of the type system must > be public. > c) The interpreter can't even use a different structure > internally, because extensions are free to pass in objects > using PyTypeObjects the interpreter has never seen before. > > 2) As a programming interface this lacks a certain gentility. It > clearly *works*, but it requires programmers to copy and paste > with a large structure mostly containing NULLs, which they must > pick carefully through to change just a few fields. > > > THE SOLUTION > > My patch creates a new function-based extension type definition API. > You create a type by calling PyType_New(), then call various accessor > functions on the type (PyType_SetString and the like), and when your > type has been completely populated you must call PyType_Activate() > to enable it for use. > > With this API available, extension authors no longer need to directly > see the innards of the PyTypeObject structure. Well, most of the > fields anyway. There are a few shortcut macros in CPython that need > to continue working for performance reasons, so the "tp_flags" and > "tp_dealloc" fields need to remain publically visible. > > One feature worth mentioning is that the API is type-safe. Many such > APIs would have had one generic "PyType_SetPointer", taking an > identifier for the field and a void * for its value, but this would > have lost type safety. Another approach would have been to have one > accessor per field ("PyType_SetAddFunction"), but this would have > exploded the number of functions in the API. My API splits the > difference: each distinct *type* has its own set of accessors > ("PyType_GetSSizeT") which takes an identifier specifying which > field you wish to get or set. > > > SIDE-EFFECTS OF THE API > > The major change resulting from this API: all PyTypeObjects must now > be *pointers* rather than static instances. 
For example, the external > declaration of PyType_Type itself changes from this: > PyAPI_DATA(PyTypeObject) PyType_Type; > to this: > PyAPI_DATA(PyTypeObject *) PyType_Type; > > This gives rise to the first headache caused by the API: type casts > on type objects. It took me a day and a half to realize that this, > from Modules/_weakref.c: > PyModule_AddObject(m, "ref", > (PyObject *) &_PyWeakref_RefType); > really needed to be this: > PyModule_AddObject(m, "ref", > (PyObject *) _PyWeakref_RefType); > > Hopefully I've already found most of these in CPython itself, but > this sort of code surely lurks in extensions yet to be touched. > > (Pro-tip: if you're working with this patch, and you see a crash, > and gdb shows you something like this at the top of the stack: > #0 0x081056d8 in visit_decref (op=0x8247aa0, data=0x0) > at Modules/gcmodule.c:323 > 323 if (PyObject_IS_GC(op)) { > your problem is an errant &, likely on a type object you're passing > in to the interpreter. Think--what did you touch recently? Or debug > it by salting your code with calls to collect(NUM_GENERATIONS-1).) > > > Another irksome side-effect of the API: because of "tp_flags" and > "tp_dealloc", I now have two declarations of PyTypeObject. There's > the externally-visible one in Include/object.h, which lets external > parties see "tp_dealloc" and "tp_flags". Then there's the internal > one in Objects/typeprivate.h which is the real structure. Since > declaring a type twice is a no-no, the external one is gated on > #ifndef PY_TYPEPRIVATE > If you're a normal Python extension programmer, you'd include Python.h > as normal: > #include "Python.h" > Python implementation files that need to see the real PyTypeObject > structure now look like this: > #define PY_TYPEPRIVATE > #include "Python.h" > #include "../Objects/typeprivate.h" > > Also, since the structure of PyTypeObject hasn't yet changed, there > are a bunch of fields in PyTypeObject that are externally visible that > I don't want to be visible. To ensure no one was using them, I renamed > them to "mysterious_object_0" and "mysterious_object_1" and the like. > Before this patch gets accepted, I want to reorder the fields in > PyTypeObject (which we can! because it's private!) so that these public > fields are at the top of the both the external and internal structures. > > > THE UPGRADE PATH > > Python internally declares a great many types, and I haven't attempted > to convert them all. Instead there's an conversion header file that > does most of the work for you. Here's how one would apply it to an > existing type. > > 1. Where your file currently has this: > #include "Python.h" > change it to this: > #define PY_TYPEPRIVATE > #include "Python.h" > #include "pytypeconvert.h" > > 2. Whenever you declare a type, change it from this: > static PyTypeObject YourExtension_Type = { > to this: > static PyTypeObject *YourExtension_Type; > static PyTypeObject _YourExtension_Type = { > > Use NULL for your metaclass. For example, change this: > PyObject_HEAD_INIT(&PyType_Type), > to this: > PyObject_HEAD_INIT(NULL), > > Also use NULL for your baseclass. For example, change this: > &PyDict_Type, /* tp_base */ > to this: > NULL, /* tp_base */ > setting it to NULL instead. > > 3. In your module's init function, add this: > CONVERT_TYPE(YourExtension_Type, > metaclass, baseclass, "description of type"); > "metaclass" and "baseclass" should be the metaclass and baseclass > for your type, the ones you just set to NULL in step 3. 
If you > had NULL before the baseclass, use NULL here too. > > 4. If you have any static object declarations, set their ob_type to > NULL in the static declaration, then set it explicitly in your > init function. If your object uses a locally-defined type, > be sure to do this *after* the CONVERT_TYPE line for that type. > (See _Py_EllipsisObject for an example.) > > 5. Anywhere you're using existing Python type declarations > you must remove the & from the front. > > The conversion header file *also* redefines PyTypeObject. But this > time it redefines it to the existing definition, and that definition > will stay the same forever. That's the whole point: if you have an > existing Python 3.0 extension, it won't have to change if we change > the internal definition of PyTypeObject. > > (Why bother with this conversion process, with few py3k extensions > in the wild? This patch was started quite a while ago, when it > seemed plausible the API would get backported to 2.x. Now I'm not > so sure that will happen.) > > > > > THE CURRENT PATCH > > I've uploaded a patch to the tracker: > http://bugs.python.org/issue5872 > It applies cleanly to py3k/trunk (r72081). But the code is awfully > grubby. > > * I haven't dealt with any types I can't build, and I can't build > a lot of the extensions. I'm using Linux, and I don't have the > dev headers for many libraries on my laptop, and I haven't touched > Windows or Mac stuff. > > * I created some new build warnings which should obviously be fixed. > > * With the patch installed, py3k trunk builds and installs. It does > *not* pass the regression test suite. (It crashes.) I don't think > this'll be too bad, it's just taken me this long to get it as far > as I have. > > * There are some internal scaffolds and hacks that should be purged > by the final patch. > > * There's no documentation. If you'd like to see how you'd use the > new API, currently the best way to learn is to read > Include/pytypeconvert.h. > > * I don't like the PY_TYPEPRIVATE hack. I only used it 'cause it > sucks less than the other approaches I've thought of. I welcome > your suggestions. > > The second-best approach I've come up with: make PyTypeObject > genuinely private, and declare a different structure containing just > the head of PyTypeObject. Let's call it PyTypeObjectHead. Then, > for the convenience macros that use "dealloc" and "flags", cast the > object to PyTypeObjectHead before dereferencing. This abandons type > safety, and given my longing for type safety while developing this > patch I'd prefer to not make loss of type safety an official API. > > THE FEEDBACK I SEEK > > My understanding is that the feature-freeze for Python 3.1 is in a > little over a week. Given the current stability level and untestedness > of the patch, and the lateness of the hour... is there any chance this > would be accepted into Python 3.1? If so, I'll need to act fast. If > not, I might as well take it relax, huh. > > > My thanks to Neal Norwitz for suggesting this project, and Brett Cannon > for some recent encouragement. (And another person who I discussed it > with so long ago I forgot who it was... maybe Fredik Lundh?) > > > /larry/ +1 I haven't looked at your code so I can't comment on the API itself... But awesome. I like the general idea. Exposing structures has hampered us for quite a while with forwards API compatability. 
I predict not enough people are available to drive this to adoption and use for Python 3.1 given the time frame (the beta feature freeze happens this Saturday I believe?) but we should make this happen for 3.2 and get it stable and into in trunk soon after release-31maint branch is created. Whats needed? Perhaps a PEP describing a lot of what you started to write up in this email: the new extension module API with sections on the upgrade path and backwards compatibillity story. Extension modules are often maintained such that they work on all versions of Python from 2.3 or 2.4 on up to 3.x. We should provide a decent way to do that. Could some of these API functions be provided as a rarely changing add on .c/.h file for extension module authors to bundle as part of their extension modules for use with older versions of python to avoid big #ifdefs around structure definitions vs initialization API calls? -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From skippy.hammond at gmail.com Fri May 1 02:20:52 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Fri, 01 May 2009 10:20:52 +1000 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <49F9FCD0.80208@hastings.org> References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk> <49F8DBCD.6050504@trueblade.com> <49F9FCD0.80208@hastings.org> Message-ID: <49FA4064.5000508@gmail.com> Larry Hastings wrote: > > > Counting the votes for http://bugs.python.org/issue5799 : > > +1 from Mark Hammond (via private mail) > +1 from Paul Moore (via the tracker) > +1 from Tim Golden (in Python-ideas, though what he literally said > was "I'm up for it") > +1 from Michael Foord > +1 from Eric Smith > > There have been no other votes. > > Is that enough consensus for it to go in? If so, are there any core > developers who could help me get it in before the 3.1 feature freeze? > The patch should be in good shape; it has unit tests and updated > documentation. I've taken the liberty of explicitly CCing Martin just incase he missed the thread with all the noise regarding PEP383. If there are no objections from Martin or anyone else here, please feel free to assign it to me (and mail if I haven't taken action by the day before the beta freeze...) Cheers, Mark From steve at pearwood.info Fri May 1 04:40:14 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 1 May 2009 12:40:14 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com> References: <20090427211447.GA4291@cskk.homeip.net> <7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com> Message-ID: <200905011240.14428.steve@pearwood.info> On Fri, 1 May 2009 06:55:48 am Thomas Breuel wrote: > You can get the same error on Linux: > > $ python > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) > [GCC 4.3.3] on linux2 > Type "help", "copyright", "credits" or "license" for more > information. > > >>> f=open(chr(255),'w') > > Traceback (most recent call last): > File "", line 1, in > IOError: [Errno 22] invalid mode ('w') or filename: '\xff' Works for me under Fedora using ext3 as the file system. $ python2.6 Python 2.6.1 (r261:67515, Dec 24 2008, 00:33:13) [GCC 4.1.2 20070502 (Red Hat 4.1.2-12)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> f=open(chr(255),'w') >>> f.close() >>> import os >>> os.remove(chr(255)) >>> Given that chr(255) is a valid filename on my file system, I would consider it a bug if Python couldn't deal with a file with that name. -- Steven D'Aprano From ronaldoussoren at mac.com Fri May 1 07:41:16 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 01 May 2009 07:41:16 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> <49F6F09E.2020506@voidspace.org.uk> <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> Message-ID: <67A75595-8D07-4D65-A234-301A8B45FB29@mac.com> On 30 Apr, 2009, at 21:33, Piet van Oostrum wrote: >>>>>> Ronald Oussoren (RO) wrote: > >> RO> For what it's worth, the OSX API's seem to behave as follows: >> RO> * If you create a file with an non-UTF8 name on a HFS+ >> filesystem the >> RO> system automaticly encodes the name. > >> RO> That is, open(chr(255), 'w') will silently create a file named >> '%FF' >> RO> instead of the name you'd expect on a unix system. > > Not for me (I am using Python 2.6.2). > >>>> f = open(chr(255), 'w') > Traceback (most recent call last): > File "", line 1, in > IOError: [Errno 22] invalid mode ('w') or filename: '\xff' >>>> That's odd. Which version of OSX do you use? ronald at Rivendell-2[0]$ sw_vers ProductName: Mac OS X ProductVersion: 10.5.6 BuildVersion: 9G55 [~/testdir] ronald at Rivendell-2[0]$ /usr/bin/python Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> os.listdir('.') [] >>> open(chr(255), 'w').write('x') >>> os.listdir('.') ['%FF'] >>> And likewise with python 2.6.1+ (after cleaning the directory): [~/testdir] ronald at Rivendell-2[0]$ python2.6 Python 2.6.1+ (release26-maint:70603, Mar 26 2009, 08:38:03) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> os.listdir('.') [] >>> open(chr(255), 'w').write('x') >>> os.listdir('.') ['%FF'] >>> > > I once got a tar file from a Linux system which contained a file > with a > non-ASCII, ISO-8859-1 encoded filename. The tar file refused to be > unpacked on a HFS+ filesystem. > -- > Piet van Oostrum > URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] > Private email: piet at vanoostrum.org -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2224 bytes Desc: not available URL: From zookog at gmail.com Fri May 1 07:44:36 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Thu, 30 Apr 2009 23:44:36 -0600 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: Folks: My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary binary names from the filesystem and store them so that I can regenerate the same byte string later, but it also requires that I *know* whether what I got was a valid string in the expected encoding (which might be utf-8) or whether it was not and I need to fall back to storing the bytes. 
So far, it looks like PEP 383 doesn't provide both of these requirements, so I am going to have to continue working-around the Python API even after PEP 383. In fact, it might actually increase the amount of working-around that I have to do. If I understand correctly, .decode(encoding, 'strict') will not be changed by PEP 383. A new error handler is added, so .decode('utf-8', 'python-escape') performs the utf-8b decoding. Am I right so far? Therefore if I have a string of bytes, I can attempt to decode it with 'strict', and if that fails I can set the flag showing that it was not a valid byte string in the expected encoding, and then I can invoke .decode('utf-8', 'python-escape') on it. So far, so good. (Note that I never want to do .decode(expected_encoding, 'python-escape') -- if it wasn't a valid bytestring in the expected_encoding, then I want to decode it with utf-8b, regardless of what the expected encoding was.) Anyway, I can use it like this: class FName: def __init__(self, name, failed_decode=False): self.name = name self.failed_decode = failed_decode def fs_to_unicode(bytes): try: return FName(bytes.decode(sys.getfilesystemencoding(), 'strict')) except UnicodeDecodeError: return FName(fn.decode('utf-8', 'python-escape'), failed_decode=True) And what about unicode-oriented APIs such as os.listdir()? Uh-oh, the PEP says that on systems with locale 'utf-8', it will automatically be changed to 'utf-8b'. This means I can't reliably find out whether the entries in the directory *were* named with valid encodings in utf-8? That's not acceptable for my use case. I would have to refrain from using the unicode-oriented os.listdir() on POSIX, and instead do something like this: if platform.system() in ('Windows', 'Darwin'): def listdir(d): return [FName(n) for n in os.listdir(d)] elif platform.system() in ('Linux', 'SunOs'): def listdir(d): bytesd = d.encode(sys.getfilesystemencoding()) return [fs_to_unicode(n) for n in os.listdir(bytesd)] else: raise NotImplementedError("Please classify platform.system() == %s \ as either unicode-safe or unicode-unsafe." % platform.system()) In fact, if 'utf-8' gets automatically converted to 'utf-8b' when *decoding* as well as encoding, then I would have to change my fs_to_unicode() function to check for that and make sure to use strict utf-8 in the first attempt: def fs_to_unicode(bytes): fse = sys.getfilesystemencoding() if fse == 'utf-8b': fse = 'utf-8' try: return FName(bytes.decode(fse, 'strict')) except UnicodeDecodeError: return FName(fn.decode('utf-8', 'python-escape'), failed_decode=True) Would it be possible for Python unicode objects to have a flag indicating whether the 'python-escape' error handler was present? That would serve the same purpose as my "failed_decode" flag above, and would basically allow me to use the Python APIs directory and make all this work-around code disappear. Failing that, I can't see any way to use the os.listdir() in its unicode-oriented mode to satisfy Tahoe's requirements. If you take the above code and then add the fact that you want to use the failed_decode flag when *encoding* the d argument to os.listdir(), then you get this code: [2]. 
Oh, I just realized that I *could* use the PEP 383 os.listdir(), like this: def listdir(d): fse = sys.getfilesystemencoding() if fse == 'utf-8b': fse = 'utf-8' ns = [] for fn in os.listdir(d): bytes = fn.encode(fse, 'python-escape') try: ns.append(FName(bytes.decode(fse, 'strict'))) except UnicodeDecodeError: ns.append(FName(fn.decode('utf-8', 'python-escape'), failed_decode=True)) return ns (And I guess I could define listdir() like this only on the non-unicode-safe platforms, as above.) However, that strikes me as even more horrible than the previous "listdir()" work-around, in part because it means decoding, re-encoding, and re-decoding every name, so I think I would stick with the previous version. Oh, one more note: for Tahoe's purposes you can, in all of the code above, replace ".decode('utf-8', 'python-replace')" with ".decode('windows-1252')" and it works just as well. While UTF-8b seems like a really cool hack, and it would produce more legible results if utf-8-encoded strings were partially corrupted, I guess I should just use 'windows-1252' which is already implemented in Python 2 (as well as in all other software in the world). I guess this means that PEP 383, which I have approved of and liked so far in this discussion, would actually not help Tahoe at all and would in fact harm Tahoe -- I would have to remember to detect and work-around the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python 3. If anyone else has a concrete, real use case which would be helped by PEP 383, I would like to hear about it. Perhaps Tahoe can learn something from it. Oh, if this PEP could be extended to add a flag to each unicode object indicating whether it was created with the python-escape handler or not, then it would be useful to me. Regards, Zooko [1] http://mail.python.org/pipermail/python-dev/2009-April/089020.html [2] http://allmydata.org/trac/tahoe/attachment/ticket/534/fsencode.3.py From martin at v.loewis.de Fri May 1 08:25:34 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 01 May 2009 08:25:34 +0200 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <49FA4064.5000508@gmail.com> References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk> <49F8DBCD.6050504@trueblade.com> <49F9FCD0.80208@hastings.org> <49FA4064.5000508@gmail.com> Message-ID: <49FA95DE.8060409@v.loewis.de> > I've taken the liberty of explicitly CCing Martin just incase he missed > the thread with all the noise regarding PEP383. > > If there are no objections from Martin It's fine with me - I just won't have time to look into the details of that change. Regards, Martin From fuzzyman at voidspace.org.uk Fri May 1 11:06:08 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 01 May 2009 10:06:08 +0100 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: <49FABB80.8050301@voidspace.org.uk> Zooko O'Whielacronx wrote: > [snip...] > Would it be possible for Python unicode objects to have a flag > indicating whether the 'python-escape' error handler was present? That > would serve the same purpose as my "failed_decode" flag above, and would > basically allow me to use the Python APIs directory and make all this > work-around code disappear. > > Failing that, I can't see any way to use the os.listdir() in its > unicode-oriented mode to satisfy Tahoe's requirements. 
> > If you take the above code and then add the fact that you want to use > the failed_decode flag when *encoding* the d argument to os.listdir(), > then you get this code: [2]. > > Oh, I just realized that I *could* use the PEP 383 os.listdir(), like > this: > > def listdir(d): > fse = sys.getfilesystemencoding() > if fse == 'utf-8b': > fse = 'utf-8' > ns = [] > for fn in os.listdir(d): > bytes = fn.encode(fse, 'python-escape') > try: > ns.append(FName(bytes.decode(fse, 'strict'))) > except UnicodeDecodeError: > ns.append(FName(fn.decode('utf-8', 'python-escape'), > failed_decode=True)) > return ns > > (And I guess I could define listdir() like this only on the > non-unicode-safe platforms, as above.) > > However, that strikes me as even more horrible than the previous > "listdir()" work-around, in part because it means decoding, re-encoding, > and re-decoding every name, so I think I would stick with the previous > version. > The current unicode mode would skip the filenames you are interested (those that fail to decode correctly) - so you would have been forced to use the bytes mode. If you need access to the original bytes then you should continue to do this. PEP-383 is entirely neutral for your use case as far as I can see. Michael > Oh, one more note: for Tahoe's purposes you can, in all of the code > above, replace ".decode('utf-8', 'python-replace')" with > ".decode('windows-1252')" and it works just as well. While UTF-8b seems > like a really cool hack, and it would produce more legible results if > utf-8-encoded strings were partially corrupted, I guess I should just > use 'windows-1252' which is already implemented in Python 2 (as well as > in all other software in the world). > > I guess this means that PEP 383, which I have approved of and liked so > far in this discussion, would actually not help Tahoe at all and would > in fact harm Tahoe -- I would have to remember to detect and work-around > the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python > 3. > > If anyone else has a concrete, real use case which would be helped by > PEP 383, I would like to hear about it. Perhaps Tahoe can learn > something from it. > > Oh, if this PEP could be extended to add a flag to each unicode object > indicating whether it was created with the python-escape handler or not, > then it would be useful to me. > > Regards, > > Zooko > > [1] http://mail.python.org/pipermail/python-dev/2009-April/089020.html > [2] http://allmydata.org/trac/tahoe/attachment/ticket/534/fsencode.3.py > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From rdmurray at bitdance.com Fri May 1 13:13:24 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 1 May 2009 07:13:24 -0400 (EDT) Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: On Thu, 30 Apr 2009 at 23:44, Zooko O'Whielacronx wrote: > Would it be possible for Python unicode objects to have a flag > indicating whether the 'python-escape' error handler was present? That Unless I'm misunderstanding something, couldn't you implement what you need by looking in a given string for the half surrogates? 
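A rough sketch of such a check (assuming PEP 383's scheme of smuggling each
undecodable byte as a lone surrogate in the range U+DC80..U+DCFF; the helper
name here is made up for illustration):

    def was_escaped(name):
        # True if the string contains any of the half surrogates that the
        # 'python-escape' handler produces for undecodable bytes.
        return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)
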
If you find one, you have a string python-escape modified, if you don't, it didn't. What does Tahoe do on Windows when it gets a filename that is not valid Unicode? You might not even have to conditionalize the above code on platform (ie: instead you have a generalized is_valid_unicode test function that you always use). --David From martin at v.loewis.de Fri May 1 17:16:16 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 01 May 2009 17:16:16 +0200 Subject: [Python-Dev] Deferring PEP 382 Message-ID: <49FB1240.50403@v.loewis.de> During Guido's review, we discovered that PEP 382 doesn't deal with PEP 302 loaders; I believe that it should, though. Rather than coming up with an ad-hoc design, I propose to defer the PEP to Python 3.2 - unless somebody can propose a straight-forward design with not too many new interfaces. FWIW, my own approach would be to add two new interfaces to loaders: 1. extend the package path according to .pth files available to the loader (alternatively, provide the contents of the .pth files of the package in question) 2. search for and execute a package initialization module. Regards, Martin From stephen at xemacs.org Fri May 1 17:36:39 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 02 May 2009 00:36:39 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F97275.3010307@v.loewis.de> <36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net> Message-ID: <87skjoj0mw.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > in python. It seems like the most common reason why people want to use > SJIS is to make old pre-unicode apps work right in WINE -- in which > case it doesn't actually affect unix python at all. Mounting external drives, especially USB memory sticks which tend to be FAT-initialized by the manufacturers, is another common case. But I don't understand why PEP 383 needs to care at all. From zookog at gmail.com Fri May 1 17:31:01 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Fri, 1 May 2009 09:31:01 -0600 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: Following-up to my own post to correct a major error: On Thu, Apr 30, 2009 at 11:44 PM, Zooko O'Whielacronx wrote: > Folks: > > My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary > binary names from the filesystem and store them so that I can regenerate > the same byte string later, but it also requires that I *know* whether > what I got was a valid string in the expected encoding (which might be > utf-8) or whether it was not and I need to fall back to storing the > bytes. Okay, I am wrong about this. Having a flag to remember whether I had to fall back to the utf-8b trick is one method to implement my requirement, but my actual requirement is this: Requirement: either the unicode string or the bytes are faithfully transmitted from one system to another. 
That is: if you read a filename from the filesystem, and transmit that filename to another system and use it, then there are two cases: Requirement 1: the byte string was valid in the encoding of source system, in which case the unicode name is faithfully transmitted (i.e. the bytes that finally land on the target system are the result of sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding). Requirement 2: the byte string was not valid in the encoding of source system, in which case the bytes are faithfully transmitted (i.e. the bytes that finally land on the target system are the same as the bytes that originated in the source system). Now I finally understand how fiendishly clever MvL's PEP 383 generalization of Markus Kuhn's utf-8b trick is! The only thing necessary to achieve both of those requirements above is that the 'python-escape' error handler is used on the target system .encode() as well as on the source system .decode()! Well, I'm going to have to let this sink in and maybe write some code to see if I really understand it. But if this is right, then I can do away with some of the mechanism that I've built up, and instead: Backport PEP 383 to Python 2. And, document the PEP 383 trick in some generic, widely respected format such as an Internet Draft so that I can explain to other users of the Tahoe data (many of whom use other languages than Python) what they have to do if they find invalid utf-8 in the data. Oh good, I just realized that Tahoe emits only utf-8, so all I have to do is point them to the utf-8b documents (such as they are) and explain that to read filenames produced by Tahoe they have to implement utf-8b. That's really good that they don't have to implement MvL's generalization of that trick to other encodings, since utf-8b is already understood by some folks. Okay, I find it surprisingly easy to make subtle errors in this encoding stuff, so please let me know if you spot one. Is it true that srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', 'python-escape') will always produce srcbytes ? That is my Requirement 2. Regards, Zooko From google at mrabarnett.plus.com Fri May 1 17:33:47 2009 From: google at mrabarnett.plus.com (MRAB) Date: Fri, 01 May 2009 16:33:47 +0100 Subject: [Python-Dev] Oddity PEP 0 key Message-ID: <49FB165B.9070909@mrabarnett.plus.com> I've just noticed an oddity in the key in PEP 0. Most letters are used more than once. Wouldn't it be clearer if different letters were used for "Accepted" and "Active" instead of them both being 'A', for example? 
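A small sketch of the round-trip that does hold -- decode and then re-encode
with the same codec and the same error handler (shown with the
'surrogateescape' spelling the handler eventually took in CPython 3.1; the PEP
draft calls it 'python-escape'):

    src = b'ok \xc3\xa9 then junk \xff'              # not valid UTF-8 as a whole
    text = src.decode('utf-8', 'surrogateescape')    # the \xff byte becomes U+DCFF
    assert text.encode('utf-8', 'surrogateescape') == src   # original bytes come back
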
-> A - Accepted proposal -> R - Rejected proposal W - Withdrawn proposal -> D - Deferred proposal F - Final proposal -> A - Active proposal -> D - Draft proposal -> R - Replaced proposal From google at mrabarnett.plus.com Fri May 1 17:52:50 2009 From: google at mrabarnett.plus.com (MRAB) Date: Fri, 01 May 2009 16:52:50 +0100 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: <49FB1AD2.9010704@mrabarnett.plus.com> Zooko O'Whielacronx wrote: > Following-up to my own post to correct a major error: > > > On Thu, Apr 30, 2009 at 11:44 PM, Zooko O'Whielacronx wrote: >> Folks: >> >> My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary >> binary names from the filesystem and store them so that I can regenerate >> the same byte string later, but it also requires that I *know* whether >> what I got was a valid string in the expected encoding (which might be >> utf-8) or whether it was not and I need to fall back to storing the >> bytes. > > Okay, I am wrong about this. Having a flag to remember whether I had to > fall back to the utf-8b trick is one method to implement my requirement, > but my actual requirement is this: > > Requirement: either the unicode string or the bytes are faithfully > transmitted from one system to another. > > That is: if you read a filename from the filesystem, and transmit that > filename to another system and use it, then there are two cases: > > Requirement 1: the byte string was valid in the encoding of source > system, in which case the unicode name is faithfully transmitted > (i.e. the bytes that finally land on the target system are the result of > sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding). > > Requirement 2: the byte string was not valid in the encoding of source > system, in which case the bytes are faithfully transmitted (i.e. the > bytes that finally land on the target system are the same as the bytes > that originated in the source system). > > Now I finally understand how fiendishly clever MvL's PEP 383 > generalization of Markus Kuhn's utf-8b trick is! The only thing > necessary to achieve both of those requirements above is that the > 'python-escape' error handler is used on the target system .encode() as > well as on the source system .decode()! > > Well, I'm going to have to let this sink in and maybe write some code to > see if I really understand it. > > But if this is right, then I can do away with some of the mechanism that > I've built up, and instead: > > Backport PEP 383 to Python 2. > > And, document the PEP 383 trick in some generic, widely respected format > such as an Internet Draft so that I can explain to other users of the > Tahoe data (many of whom use other languages than Python) what they have > to do if they find invalid utf-8 in the data. Oh good, I just realized > that Tahoe emits only utf-8, so all I have to do is point them to the > utf-8b documents (such as they are) and explain that to read filenames > produced by Tahoe they have to implement utf-8b. That's really good > that they don't have to implement MvL's generalization of that trick to > other encodings, since utf-8b is already understood by some folks. > > > Okay, I find it surprisingly easy to make subtle errors in this encoding > stuff, so please let me know if you spot one. Is it true that > srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', > 'python-escape') will always produce srcbytes ? 
That is my Requirement > 2. > No, but srcbytes.encode('utf-8', 'python-escape').decode('utf-8', 'python-escape') == srcbytes. The encodings on both ends need to be the same. For example: >>> b'\x80'.decode('windows-1252') u'\u20ac' >>> u'\u20ac'.encode('utf-8') '\xe2\x82\xac' Currently: >>> b'\x80'.decode('utf-8') Traceback (most recent call last): File "", line 1, in b'\x80'.decode('utf-8') File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte But under this PEP: >>> b'x80'.decode('utf-8', 'python-escape') u'\xdc80' >>> u'\xdc80'.encode('utf-8', 'python-escape') '\x80' From status at bugs.python.org Fri May 1 18:07:30 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 1 May 2009 18:07:30 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090501160730.695547822F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (04/24/09 - 05/01/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2190 open (+34) / 15527 closed (+29) / 17717 total (+63) Open issues with patches: 861 Average duration of open issues: 645 days. Median duration of open issues: 394 days. Open Issues Breakdown open 2156 (+33) pending 33 ( +1) Issues Created Or Reopened (63) _______________________________ os.path.walk fails to descend into a directory whose name ends w 04/24/09 CLOSED http://bugs.python.org/issue5832 created linuxelf readline update 04/24/09 http://bugs.python.org/issue5833 created jrevans1 patch The word "error" used instead of "failure" 04/25/09 CLOSED http://bugs.python.org/issue5834 created kurtmckee Deprecate PyOS_ascii_formatd 04/25/09 CLOSED http://bugs.python.org/issue5835 created eric.smith Clean up float parsing code for nans and infs 04/25/09 CLOSED http://bugs.python.org/issue5836 created marketdickinson support.EnvironmentVarGuard broken 04/25/09 CLOSED http://bugs.python.org/issue5837 created doerwalter easy Test issue 04/25/09 CLOSED http://bugs.python.org/issue5838 created ajaksu2 RegOpenKeyEx key failed on Vista 64Bit with return 2 04/25/09 http://bugs.python.org/issue5839 created makursi "Thread State and the Global Interpreter Lock" section of the do 04/25/09 http://bugs.python.org/issue5840 created exarkun patch add py3k warnings to commands 04/25/09 CLOSED http://bugs.python.org/issue5841 created dsm001 patch Move test outside of urlparse module 04/25/09 http://bugs.python.org/issue5842 created Merwok Possible normalization error in urlparse.urlunparse 04/25/09 http://bugs.python.org/issue5843 created Merwok internal error on write while reading 04/25/09 http://bugs.python.org/issue5844 created dsm001 patch rlcompleter should be enabled automatically 04/25/09 http://bugs.python.org/issue5845 created cben Deprecate obsolete functions in unittest 04/25/09 http://bugs.python.org/issue5846 created michael.foord IDLE/Win Installer: drop -n switch for 2.7/3.1; install 3.1 as i 04/26/09 http://bugs.python.org/issue5847 created kbk Minor unittest doc patch 04/26/09 CLOSED http://bugs.python.org/issue5848 created michael.foord patch, patch, easy, needs review Idle 3.01 - invalid syntec error 04/26/09 CLOSED http://bugs.python.org/issue5849 created r2d2floyd Full example for emulating a container type 04/27/09 CLOSED http://bugs.python.org/issue5850 created yaneurabeya Add a stream parameter to gc.set_debug 
04/27/09 http://bugs.python.org/issue5851 created nicdumz can't use "glog" to find the path with square bracket 04/27/09 CLOSED http://bugs.python.org/issue5852 created winterTTr mimetypes.guess_type() hits recursion limit 04/27/09 CLOSED http://bugs.python.org/issue5853 created djc logging module's __all__ attribute not in sync with documentatio 04/27/09 CLOSED http://bugs.python.org/issue5854 created flub easy Perhaps exponential performance of sum(listoflists, []) 04/27/09 CLOSED http://bugs.python.org/issue5855 created sjohn Minor typo in traceback example 04/27/09 CLOSED http://bugs.python.org/issue5856 created nielsdevos patch Return namedtuples from tokenize token generator 04/27/09 CLOSED http://bugs.python.org/issue5857 created mallyvai needs review Make complex repr and str more like float repr and str 04/27/09 http://bugs.python.org/issue5858 created marketdickinson Remove implicit '%f' -> '%g' switch from float formatting. 04/27/09 CLOSED http://bugs.python.org/issue5859 created marketdickinson patch TextIOWrapper: bad error reporting when write() is forbidden 04/27/09 CLOSED http://bugs.python.org/issue5860 created pitrou test_urllib fails on windows 04/28/09 http://bugs.python.org/issue5861 created ocean-city multiprocessing 'using a remote manager' example errors and poss 04/28/09 http://bugs.python.org/issue5862 created r.david.murray bz2.BZ2File should accept other file-like objects. 04/28/09 http://bugs.python.org/issue5863 created MizardX format(1234.5, '.4') gives misleading result 04/28/09 http://bugs.python.org/issue5864 created marketdickinson patch mathmodule.c fails to compile due to missing math_log1p() functi 04/28/09 CLOSED http://bugs.python.org/issue5865 created alanh cPickle defect with tuples and different from pickle output 04/28/09 http://bugs.python.org/issue5866 created jelle No way to create an abstract classmethod 04/28/09 http://bugs.python.org/issue5867 created della mimetypes.MAGIC_FUNCTION initialization not thread-safe in Pytho 04/28/09 CLOSED http://bugs.python.org/issue5868 created apoirier 100th character truncation in 2.4 tarfile.py 04/28/09 CLOSED http://bugs.python.org/issue5869 created neville.bagnall patch subprocess.DEVNULL 04/28/09 http://bugs.python.org/issue5870 created MrJean1 email.header.Header allow to embed raw newlines into a message 04/28/09 http://bugs.python.org/issue5871 created jwilk New C API for declaring Python types 04/29/09 http://bugs.python.org/issue5872 created larry patch Minidom: parsestring() error 04/29/09 CLOSED http://bugs.python.org/issue5873 created naf305 distutils.tests.test_config_cmd is locale-sensitive 04/29/09 CLOSED http://bugs.python.org/issue5874 created georg.brandl test_distutils failing on OpenSUSE 10.3, Py3k 04/29/09 http://bugs.python.org/issue5875 created ShuaibKhan __repr__ returning unicode doesn't work when called implicitly 04/29/09 http://bugs.python.org/issue5876 created liori Add a function for updating URL query parameters 04/29/09 http://bugs.python.org/issue5877 created mrts Regular Expression instances 04/29/09 CLOSED http://bugs.python.org/issue5878 created ecasbas multiprocessing - example "pool of http servers " fails on windo 04/29/09 http://bugs.python.org/issue5879 created ghum Remove unneeded "context" pointer from getters and setters 04/29/09 http://bugs.python.org/issue5880 created larry patch Remove extraneous backwards-compatibility attributes from some m 04/29/09 http://bugs.python.org/issue5881 created larry patch __repr__ is ignored when formatting exceptions 04/29/09 
CLOSED http://bugs.python.org/issue5882 created ellisj detach() implementation 04/29/09 http://bugs.python.org/issue5883 created benjamin.peterson patch pydoc to return error status code 04/30/09 http://bugs.python.org/issue5884 created mixmastamyk uuid.uuid1() is too slow 04/30/09 http://bugs.python.org/issue5885 created wangchun curses/__init__.py: global name '_os' is not defined 04/30/09 CLOSED http://bugs.python.org/issue5886 created andrix patch mmap.write_byte out of bounds - no error, position gets screwed 04/30/09 http://bugs.python.org/issue5887 created bmearns mmap ehancement - resize with sequence notation 04/30/09 http://bugs.python.org/issue5888 created bmearns Extra comma in enum - fails on AIX 04/30/09 CLOSED http://bugs.python.org/issue5889 created srid Subclassing property doesn't preserve the auto __doc__ behavior 04/30/09 http://bugs.python.org/issue5890 created gsakkis strange list.sort() behavior on import, del and inport again 05/01/09 CLOSED http://bugs.python.org/issue5891 created dstemmer strange list.sort() behavior on import, del and inport again 05/01/09 CLOSED http://bugs.python.org/issue5892 created dstemmer Add support to pydoc to output .rst restructured text 05/01/09 http://bugs.python.org/issue5893 created gregory.p.smith Lookup of localised language name by ISO 639 language code and r 05/01/09 http://bugs.python.org/issue5894 created pander Issues Now Closed (104) _______________________ pyvm module patch 515 days http://bugs.python.org/issue1522 benjamin.peterson patch Bad OOB data management when using asyncore with select.poll() 514 days http://bugs.python.org/issue1541 georg.brandl patch str.format() wrongly formats complex() numbers (Py30a2) 505 days http://bugs.python.org/issue1588 eric.smith patch sqlite3 docs should mention utf8 requirement 434 days http://bugs.python.org/issue2127 georg.brandl patch, easy aifc cannot handle unrecognised chunk type "CHAN" 419 days http://bugs.python.org/issue2245 r.david.murray easy float compared to decimal is silently incorrect. 34 days http://bugs.python.org/issue2531 jdunck patch 3.0 pickle docs -- what about old-style classes? 
385 days http://bugs.python.org/issue2572 georg.brandl PyString_FromStringAndSize() to be considered unsafe 384 days http://bugs.python.org/issue2587 iankko Python does not accept unicode keywords 375 days http://bugs.python.org/issue2646 ajaksu2 26backport ctypes defines global symbols 316 days http://bugs.python.org/issue3102 theller patch Wish: disable tests in unittest 304 days http://bugs.python.org/issue3202 benjamin.peterson patch various doc typos 291 days http://bugs.python.org/issue3320 georg.brandl patch file.readline: bad exception recovery 260 days http://bugs.python.org/issue3521 benjamin.peterson patch, easy Tuple comparison masking exception 226 days http://bugs.python.org/issue3829 rhettinger idle should be installed as idle3.0 220 days http://bugs.python.org/issue3896 ajaksu2 smtplib cannot sendmail over TLS 217 days http://bugs.python.org/issue3921 ajaksu2 patch, easy Python 2.6 Doc/tools folder bigger than in 2.6rc2 205 days http://bugs.python.org/issue4013 georg.brandl C/API documentation: request for documentation of change to Py_s 196 days http://bugs.python.org/issue4129 asmodai patch Email example should use SMTP.quit() rather than SMTP.close() 181 days http://bugs.python.org/issue4239 asmodai ctypes could include data type limits 145 days http://bugs.python.org/issue4538 theller Need to rework the dbm lib/include selection process 144 days http://bugs.python.org/issue4587 doko patch, needs review Idle for Python 3.0 is default even without doing make fullinsta 129 days http://bugs.python.org/issue4693 ajaksu2 failure in test_httpservers 101 days http://bugs.python.org/issue4951 tarek patch Incorrect title case 98 days http://bugs.python.org/issue4971 loewis Specifying common controls DLL in manifest 97 days http://bugs.python.org/issue5019 robind ctypes unwilling to allow pickling wide character 90 days http://bugs.python.org/issue5049 theller patch Inadequate documentation of the built-in function open 91 days http://bugs.python.org/issue5061 georg.brandl IDLE improve Subprocess Startup Error message 91 days http://bugs.python.org/issue5065 ajaksu2 Avoid redundant call to FormatError() 88 days http://bugs.python.org/issue5078 theller patch indentation in IDLE 2.6 different from IDLE 2.5, 2.4 or vim 82 days http://bugs.python.org/issue5129 kbk patch, 26backport wrong paths for ctypes cleanup 78 days http://bugs.python.org/issue5161 theller setting __class__ in __del__ is bad. mmkay. negative ref count! 67 days http://bugs.python.org/issue5283 benjamin.peterson patch email/base64mime.py cannot work 67 days http://bugs.python.org/issue5304 ajaksu2 easy ctypes configuration fails on mips-linux (and probably Irix) 41 days http://bugs.python.org/issue5507 theller test_math.testFsum failure on release30-maint 26 days http://bugs.python.org/issue5593 marketdickinson file "" on disk creates garbage output in stack trace 26 days http://bugs.python.org/issue5668 ajaksu2 shutils test fails on ZFS (on FUSE, on Linux) 27 days http://bugs.python.org/issue5676 benjamin.peterson patch inspect.findsource() should look only for sources 13 days http://bugs.python.org/issue5742 ajaksu2 patch idle pydoc et al removed from 3.1 without versioned replacements 11 days http://bugs.python.org/issue5756 kbk IDLE cannot find windows chm file 8 days http://bugs.python.org/issue5783 kbk patch, 26backport Rationalize isdigit / isalpha / tolower / ... 
uses throughout Py 8 days http://bugs.python.org/issue5793 eric.smith easy test_distutils fails - sysconfig._config_vars is None 3 days http://bugs.python.org/issue5810 tarek Fix five small bugs in the bininstall and altbininstall pseudota 3 days http://bugs.python.org/issue5818 benjamin.peterson patch Documentation: mention 'close' and iteration for tarfile.TarFile 2 days http://bugs.python.org/issue5821 georg.brandl patch new unittest function listed as assertIsNotNot() instead of asse 2 days http://bugs.python.org/issue5826 michael.foord Invalid behavior of unicode.lower 1 days http://bugs.python.org/issue5828 loewis patch heapq item comparison problematic with sched's events 0 days http://bugs.python.org/issue5830 rhettinger os.path.walk fails to descend into a directory whose name ends w 0 days http://bugs.python.org/issue5832 potten The word "error" used instead of "failure" 0 days http://bugs.python.org/issue5834 georg.brandl Deprecate PyOS_ascii_formatd 2 days http://bugs.python.org/issue5835 eric.smith Clean up float parsing code for nans and infs 2 days http://bugs.python.org/issue5836 marketdickinson support.EnvironmentVarGuard broken 0 days http://bugs.python.org/issue5837 doerwalter easy Test issue 0 days http://bugs.python.org/issue5838 marketdickinson add py3k warnings to commands 0 days http://bugs.python.org/issue5841 georg.brandl patch Minor unittest doc patch 1 days http://bugs.python.org/issue5848 georg.brandl patch, patch, easy, needs review Idle 3.01 - invalid syntec error 0 days http://bugs.python.org/issue5849 doerwalter Full example for emulating a container type 2 days http://bugs.python.org/issue5850 rhettinger can't use "glog" to find the path with square bracket 0 days http://bugs.python.org/issue5852 amaury.forgeotdarc mimetypes.guess_type() hits recursion limit 1 days http://bugs.python.org/issue5853 pitrou logging module's __all__ attribute not in sync with documentatio 0 days http://bugs.python.org/issue5854 vsajip easy Perhaps exponential performance of sum(listoflists, []) 0 days http://bugs.python.org/issue5855 pitrou Minor typo in traceback example 0 days http://bugs.python.org/issue5856 georg.brandl patch Return namedtuples from tokenize token generator 1 days http://bugs.python.org/issue5857 rhettinger needs review Remove implicit '%f' -> '%g' switch from float formatting. 
4 days http://bugs.python.org/issue5859 marketdickinson patch TextIOWrapper: bad error reporting when write() is forbidden 0 days http://bugs.python.org/issue5860 benjamin.peterson mathmodule.c fails to compile due to missing math_log1p() functi 0 days http://bugs.python.org/issue5865 marketdickinson mimetypes.MAGIC_FUNCTION initialization not thread-safe in Pytho 0 days http://bugs.python.org/issue5868 pitrou 100th character truncation in 2.4 tarfile.py 0 days http://bugs.python.org/issue5869 neville.bagnall patch Minidom: parsestring() error 0 days http://bugs.python.org/issue5873 georg.brandl distutils.tests.test_config_cmd is locale-sensitive 0 days http://bugs.python.org/issue5874 tarek Regular Expression instances 0 days http://bugs.python.org/issue5878 georg.brandl __repr__ is ignored when formatting exceptions 0 days http://bugs.python.org/issue5882 benjamin.peterson curses/__init__.py: global name '_os' is not defined 0 days http://bugs.python.org/issue5886 amaury.forgeotdarc patch Extra comma in enum - fails on AIX 1 days http://bugs.python.org/issue5889 srid strange list.sort() behavior on import, del and inport again 0 days http://bugs.python.org/issue5891 loewis strange list.sort() behavior on import, del and inport again 0 days http://bugs.python.org/issue5892 loewis Fix for bugs relating to ntpath.expanduser() 210 days http://bugs.python.org/issue957650 gjb1002 patch urllib2 http auth 1689 days http://bugs.python.org/issue1025540 gregory.p.smith endianness detection fails on IRIX 5.3 1617 days http://bugs.python.org/issue1070140 ajaksu2 proposed patch for tls wrapped ssl support added to smtplib 1417 days http://bugs.python.org/issue1217246 ajaksu2 patch MSI installer does not pass values as SecureProperty from UI 1311 days http://bugs.python.org/issue1298962 ajaksu2 Integer bit operations performance improvement. 1073 days http://bugs.python.org/issue1492860 marketdickinson easy test_float segfaults with SIGFPE on FreeBSD 6.0 / Alpha 1066 days http://bugs.python.org/issue1496032 marketdickinson Use dynload_shlib on newer HP-UX versions 1026 days http://bugs.python.org/issue1516897 ajaksu2 Allowing multiple instances of IDLE with sub-processes 1004 days http://bugs.python.org/issue1529142 kbk patch Tracing and profiling functions can cause hangs in threads 999 days http://bugs.python.org/issue1531859 ajaksu2 patch Tru64 make install failure 954 days http://bugs.python.org/issue1558802 ajaksu2 Install on WinXP always goes to C:\ 943 days http://bugs.python.org/issue1565468 ajaksu2 Modules/readline.c fails to compile on AIX 4.2 891 days http://bugs.python.org/issue1597798 ajaksu2 Would you mind renaming object.h to pyobject.h? 
844 days http://bugs.python.org/issue1626545 ajaksu2 patch Python 2.5 gets curses.h warning on HPUX 824 days http://bugs.python.org/issue1642054 ajaksu2 proxy_bypass in urllib handling of macro 821 days http://bugs.python.org/issue1648102 orsenthil patch, easy HP-UX: compiler warnings: alignment 815 days http://bugs.python.org/issue1649011 ajaksu2 Python package support not properly documented 715 days http://bugs.python.org/issue1719423 georg.brandl Document effects of PY_SSIZE_T_CLEAN on argument parsing 693 days http://bugs.python.org/issue1729742 loewis Solaris 64 bit LD_LIBRARY_PATH_64 needs to be set 687 days http://bugs.python.org/issue1733484 ajaksu2 Modules/ld_so_aix needs to strip path off of whichcc call 687 days http://bugs.python.org/issue1733509 ajaksu2 zlib configure behaves differently than main configure 687 days http://bugs.python.org/issue1733513 ajaksu2 HP shared object option 687 days http://bugs.python.org/issue1733523 ajaksu2 HP automatic build of zlib 687 days http://bugs.python.org/issue1733532 ajaksu2 HP 64 bit does not run 687 days http://bugs.python.org/issue1733544 ajaksu2 AIX shared object build of python 2.5 does not work 687 days http://bugs.python.org/issue1733546 ajaksu2 Fast path for unicodedata.normalize() 688 days http://bugs.python.org/issue1734234 pitrou patch Python - Operation time out problem 628 days http://bugs.python.org/issue1768858 ajaksu2 Top Issues Most Discussed (10) ______________________________ 28 str.format() wrongly formats complex() numbers (Py30a2) 505 days closed http://bugs.python.org/issue1588 12 support.EnvironmentVarGuard broken 0 days closed http://bugs.python.org/issue5837 10 mathmodule.c fails to compile due to missing math_log1p() funct 0 days closed http://bugs.python.org/issue5865 10 format(1234.5, '.4') gives misleading result 3 days open http://bugs.python.org/issue5864 9 Invalid behavior of unicode.lower 1 days closed http://bugs.python.org/issue5828 9 IDLE cannot find windows chm file 8 days closed http://bugs.python.org/issue5783 8 failure in test_httpservers 101 days closed http://bugs.python.org/issue4951 7 mimetypes.guess_type() hits recursion limit 1 days closed http://bugs.python.org/issue5853 7 C/API documentation: request for documentation of change to Py_ 196 days closed http://bugs.python.org/issue4129 6 detach() implementation 2 days open http://bugs.python.org/issue5883 From chris at simplistix.co.uk Fri May 1 18:26:29 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 May 2009 17:26:29 +0100 Subject: [Python-Dev] .pth files are evil In-Reply-To: <49E60832.8030806@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> Message-ID: <49FB22B5.3040507@simplistix.co.uk> M.-A. Lemburg wrote: > """ > If the package really requires adding one or more directories on sys.path (e.g. > because it has not yet been structured to support dotted-name import), a "path > configuration file" named package.pth can be placed in either the site-python or > site-packages directory. > ... 
> A typical installation should have no or very few .pth files or something is > wrong, and if you need to play with the search order, something is very wrong. > """ I'll say! I think .pth files are absolute evil and I wish they could just be banned. +1 on anything that makes them closer to going away or reduces the possibility of yet another similar feature from hurting the comprehensibility of a python setup. Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri May 1 18:30:16 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 May 2009 17:30:16 +0100 Subject: [Python-Dev] PEP 382: little help for stupid people? In-Reply-To: <49E60832.8030806@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> Message-ID: <49FB2398.5000708@simplistix.co.uk> M.-A. Lemburg wrote: > The much more common use case is that of wanting to have a base package > installation which optional add-ons that live in the same logical > package namespace. > > The PEP provides a way to solve this use case by giving both developers > and users a standard at hand which they can follow without having to > rely on some non-standard helpers and across Python implementations. > > My proposal tries to solve this without adding yet another .pth > file like mechanism - hopefully in the spirit of the original Python > package idea. Okay, I need to issue a plea for a little help. I think I kinda get what this PEP is about now, and as someone who wants to ship a base package with several add-ons that live in the same logical package namespace, I'm very interested. However, despite trying to follow this thread *and* having tried to read the PEP a couple of times, I still don't know how I'd go about doing this. I did give some examples from what I'd be looking to do much earlier. I'll ask again in the vague hope of you or someone else explaining things to me like I'm a 5 year old - something I'm mentally equipped to be well ;-) In either of the proposals on the table, what code would I write and where to have a base package with a set of add-on packages? Simple examples would be greatly appreciated, and might bring things into focus for some of the less mentally able bystanders - like myself! 
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri May 1 18:32:14 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 May 2009 17:32:14 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415175704.966B13A4100@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> Message-ID: <49FB240E.8030905@simplistix.co.uk> P.J. Eby wrote: > At 06:15 PM 4/15/2009 +0200, M.-A. Lemburg wrote: >> The much more common use case is that of wanting to have a base package >> installation which optional add-ons that live in the same logical >> package namespace. > > Please see the large number of Zope and PEAK distributions on PyPI as > minimal examples that disprove this being the common use case. If you mean "the common use case as opposed to having code in the __init__.py of the namespace package", I think you'll find that's because people (especially me!) don't know how to do this, not because we don't want to! Chris - who would actually like to know how to do this, with or without the PEP, and how to indicate interdependencies in situations like this to setuptools... -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri May 1 18:35:43 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 May 2009 17:35:43 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415192021.558E53A4119@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> Message-ID: <49FB24DF.2020701@simplistix.co.uk> P.J. Eby wrote: > It's unclear, however, who is using base packages besides mx.* and ll.*, > although I'd guess from the PyPI listings that perhaps Django is. (It > seems that "base" packages are more likely to use a 'base-extension' > naming pattern, vs. the 'namespace.project' pattern used by "pure" > packages.) I'll stress it again in case you missed it the first time: I think the main reason people use "pure namespace" versus "base namespace" packages is because hardly anyone know how to do the latter, not because there is no desire to do so! I, for one, have been trying to figure out how to do "base namespace" packages for years... 
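For illustration, a minimal sketch of the closest thing that works without PEP 382: ship a real __init__.py in the base distribution and let it extend its own __path__ with pkgutil. The package name "simplistix", the module names and the constant are made-up examples, and this only approximates a base package, since just one __init__.py ever gets executed.

    # simplistix/__init__.py as shipped by the hypothetical *base* distribution.
    # Because this __init__ holds real code, it acts as a "base" package
    # rather than a pure namespace package.
    from pkgutil import extend_path

    # Pull in any other 'simplistix' directories found along sys.path, so
    # add-on distributions can contribute simplistix.feature1, feature2, ...
    __path__ = extend_path(__path__, __name__)

    DEFAULT_TIMEOUT = 10  # example of base-package code living next to the add-ons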
Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From martin at v.loewis.de Fri May 1 18:38:46 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 01 May 2009 18:38:46 +0200 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: <49FB2596.1090706@v.loewis.de> > Okay, I am wrong about this. Having a flag to remember whether I had to > fall back to the utf-8b trick is one method to implement my requirement, > but my actual requirement is this: > > Requirement: either the unicode string or the bytes are faithfully > transmitted from one system to another. I don't understand this requirement very well, in particular not the "faithfully" part. > That is: if you read a filename from the filesystem, and transmit that > filename to another system and use it, then there are two cases: What do you mean by "use it"? Things like opening files? How does that work? In general, a file name valid on one system is invalid on a different system - or, at least, refers to a different file over there. This is independent of encodings. > Requirement 1: the byte string was valid in the encoding of source > system, in which case the unicode name is faithfully transmitted > (i.e. the bytes that finally land on the target system are the result of > sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding). In all your descriptions, I'm puzzled as to where exactly you get the source bytes from. If you use the PEP 383 interfaces, you will start with character strings, not byte strings, always. > Okay, I find it surprisingly easy to make subtle errors in this encoding > stuff, so please let me know if you spot one. Is it true that > srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', > 'python-escape') will always produce srcbytes ? I think you mixed up bytes and unicode here: if srcbytes is indeed a bytes object, then you can't apply .encode to it. Regards, Martin From martin at v.loewis.de Fri May 1 18:41:03 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 01 May 2009 18:41:03 +0200 Subject: [Python-Dev] PEP 382: little help for stupid people? In-Reply-To: <49FB2398.5000708@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> Message-ID: <49FB261F.9080306@v.loewis.de> > In either of the proposals on the table, what code would I write and > where to have a base package with a set of add-on packages? I don't quite understand the question. Why would you want to write code (except for the code that actually is in the packages)? PEP 382 is completely declarative - no need to write code. Regards, Martin From chris at simplistix.co.uk Fri May 1 18:58:18 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 May 2009 17:58:18 +0100 Subject: [Python-Dev] PEP 382: little help for stupid people? 
In-Reply-To: <49FB261F.9080306@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> Message-ID: <49FB2A2A.4090606@simplistix.co.uk> Martin v. L?wis wrote: >> In either of the proposals on the table, what code would I write and >> where to have a base package with a set of add-on packages? > > I don't quite understand the question. Why would you want to write code > (except for the code that actually is in the packages)? > > PEP 382 is completely declarative - no need to write code. "code" is anything I need to write to make this work... So, what do I need to do? Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri May 1 19:14:12 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 May 2009 18:14:12 +0100 Subject: [Python-Dev] headers api for email package In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <49E08F8C.5030205@simplistix.co.uk> Message-ID: <49FB2DE4.10008@simplistix.co.uk> >>> Where you just want "a damned valid email and stop making my life >>> hard!": >>> >>> Message['Subject']='Some text' >> >> Yes. In which case I propose we guess the encoding as 1) ascii, 2) >> utf-8, 3) wtf? Well, we're talking about Python 3 here right? In which case the above involves only unicode, so why do we need to guess anything? Just use utf-8 and be done with it... > However, it's not supposed to be used by mail composers, who are > expected to know the encoding. It's for mail gateways that are > transforming something and don't know the encoding. I'm not > sure what this means for the email module, which certainly > will be used in a mail gateways....maybe it's the responsibility > of the application code to explicitly say 'unknown encoding'? Indeed, surely this happens when you have bytes and need to do something with it? That's not what my example above is about... >>> Where you care about what encoding is used: >>> >>> Message['Subject']=Header('Some text',encoding='utf-8') >> >> Yes. ...it's covered by this. >>> If you have bytes, for whatever reason: >>> >>> Message['Subject']=b'some bytes'.decode('utf-8') >>> >>> ...because only you know what encoding those bytes use! >> >> So you're saying that __setitem__() should not accept raw bytes? Indeed :-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Fri May 1 19:18:35 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 May 2009 18:18:35 +0100 Subject: [Python-Dev] [Email-SIG] headers api for email package In-Reply-To: <873accv5jr.fsf@xemacs.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <49E08F8C.5030205@simplistix.co.uk> <873accv5jr.fsf@xemacs.org> Message-ID: <49FB2EEB.1000400@simplistix.co.uk> Stephen J. 
Turnbull wrote: > > > str(message['Subject']) > > > > Yes for unstructured headers like Subject. For structured headers... > > hmm. > > Well, suppose we get really radical here. *People* see email as > (rich-)text. So ... message['Subject'] returns an object, partly to > be consistent with more complex headers' APIs, but partly to remind us > that nothing in email is as simple as it seems. Now, > str(message['Subject']) is really for presentation to the user, right? > OK, so let's make it a presentation function! Decode the MIME-words, > optionally unfold folded lines, optionally compress spaces, etc. This > by default returns the subject field as a single, possibly quite long, > line. Then a higher-level API can rewrap it, add fonts etc, for fancy > presentation. This also suggests that we don't the field tag (ie, > "Subject") to be part of this value. > > Of course a *really* smart higher-level API would access structured > headers based on their structure, not on the one-size-fits-all str() > conversion. All sounds good to me. > Then MTAs see email as a string of octets. So guess what: > > > > bytes(message['Subject']) > > gives wire format. Yow! I think I'm just joking. Right? Why? That also sounds fine to me and "feels right"... > > > Where you just want "a damned valid email and stop making my life > > > hard!": > > -1 I mean, yeah, Brother, I feel your pain but it just isn't that > easy. If that were feasible, it would be *criminal* to have a > .set_header() method at all! In fact, Don't agree... > > > Message['Subject']='Some text' > > is going to (a) need to take *only* unicodes, or (b) raise Exceptions > at the slightest provocation when handed bytes. It should only take unicodes and bitch profusely about anything else. > And things only get worse if you try to provide this interface for say > "From" (let alone "Content-Type"). Is it really worth doing the > mapping interface if it's only usable with free-form headers (ie, only > Subject among the commonly used headers)? Sure, for other headers it might *not* accept unicodes... > How do you distinguish "raw" bytes from "encoded bytes"? > __setitem__() shouldn't accept bytes at all. Right on :-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From martin at v.loewis.de Fri May 1 19:38:12 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 01 May 2009 19:38:12 +0200 Subject: [Python-Dev] PEP 382: little help for stupid people? In-Reply-To: <49FB2A2A.4090606@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> <49FB2A2A.4090606@simplistix.co.uk> Message-ID: <49FB3384.1030106@v.loewis.de> >>> In either of the proposals on the table, what code would I write and >>> where to have a base package with a set of add-on packages? >> >> I don't quite understand the question. Why would you want to write code >> (except for the code that actually is in the packages)? 
>> >> PEP 382 is completely declarative - no need to write code. > > "code" is anything I need to write to make this work... > > So, what do I need to do? Ok, so create three tar files: 1. base.tar, containing simplistix/ simplistix/__init__.py 2. addon1.tar, containing simplistix/addon1.pth (containing a single "*") simplistix/feature1.py 3. addon2.tar, containing simplistix/addon2.pth simplistix/feature2.py Unpack each of them anywhere on sys.path, in any order. Regards, Martin From martin at v.loewis.de Fri May 1 19:41:39 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 01 May 2009 19:41:39 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49FB24DF.2020701@simplistix.co.uk> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> Message-ID: <49FB3453.4060906@v.loewis.de> >> It's unclear, however, who is using base packages besides mx.* and >> ll.*, although I'd guess from the PyPI listings that perhaps Django >> is. (It seems that "base" packages are more likely to use a >> 'base-extension' naming pattern, vs. the 'namespace.project' pattern >> used by "pure" packages.) > > I'll stress it again in case you missed it the first time: I think the > main reason people use "pure namespace" versus "base namespace" packages > is because hardly anyone know how to do the latter, not because there is > no desire to do so! > > I, for one, have been trying to figure out how to do "base namespace" > packages for years... You mean, without PEP 382? That won't be possible, unless you can coordinate all addon packages. Base packages are a feature solely of PEP 382. Regards, Martin From pje at telecommunity.com Fri May 1 20:49:40 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 01 May 2009 14:49:40 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49FB24DF.2020701@simplistix.co.uk> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> Message-ID: <20090501184706.66ED13A4070@sparrow.telecommunity.com> At 05:35 PM 5/1/2009 +0100, Chris Withers wrote: >P.J. Eby wrote: >>It's unclear, however, who is using base packages besides mx.* and >>ll.*, although I'd guess from the PyPI listings that perhaps Django >>is. (It seems that "base" packages are more likely to use a >>'base-extension' naming pattern, vs. the 'namespace.project' >>pattern used by "pure" packages.) 
> >I'll stress it again in case you missed it the first time: I think >the main reason people use "pure namespace" versus "base namespace" >packages is because hardly anyone know how to do the latter, not >because there is no desire to do so! I didn't say there's *no* desire, however IIRC the only person who *ever* asked on distutils-sig how to do a base package with setuptools was the author of the ll.* packages. And in the case of at least the zope.* peak.* and osaf.* namespace packages it was specifically *not* the intention to have a base __init__. From pje at telecommunity.com Fri May 1 20:51:20 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 01 May 2009 14:51:20 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49FB3453.4060906@v.loewis.de> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> Message-ID: <20090501184843.D08E43A4070@sparrow.telecommunity.com> At 07:41 PM 5/1/2009 +0200, Martin v. L?wis wrote: > >> It's unclear, however, who is using base packages besides mx.* and > >> ll.*, although I'd guess from the PyPI listings that perhaps Django > >> is. (It seems that "base" packages are more likely to use a > >> 'base-extension' naming pattern, vs. the 'namespace.project' pattern > >> used by "pure" packages.) > > > > I'll stress it again in case you missed it the first time: I think the > > main reason people use "pure namespace" versus "base namespace" packages > > is because hardly anyone know how to do the latter, not because there is > > no desire to do so! > > > > I, for one, have been trying to figure out how to do "base namespace" > > packages for years... > >You mean, without PEP 382? > >That won't be possible, unless you can coordinate all addon packages. >Base packages are a feature solely of PEP 382. Actually, if you are using only the distutils, you can do this by listing only modules in the addon projects; this is how the ll.* tools are doing it. That only works if the packages are all being installed in the same directory, though, not as eggs. 
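To make the intent of Martin's tar-file example above concrete, here is a sketch of how the unpacked portions are meant to combine under PEP 382 as proposed; this describes the proposal's intended behaviour, not anything current Python already does.

    # After unpacking base.tar, addon1.tar and addon2.tar anywhere on sys.path,
    # the simplistix/*.pth files in the add-on portions tell the import system
    # to treat every simplistix directory on sys.path as part of one package:

    import simplistix            # __init__.py comes from the base portion
    import simplistix.feature1   # feature1.py comes from the addon1 portion
    import simplistix.feature2   # feature2.py comes from the addon2 portion

    # As P.J. Eby notes, plain distutils can only achieve this when every
    # portion installs into one and the same simplistix/ directory.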
From martin at v.loewis.de Fri May 1 20:58:28 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 01 May 2009 20:58:28 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090501184843.D08E43A4070@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> <20090501184843.D08E43A4070@sparrow.telecommunity.com> Message-ID: <49FB4654.9000408@v.loewis.de> > Actually, if you are using only the distutils, you can do this by > listing only modules in the addon projects; this is how the ll.* tools > are doing it. That only works if the packages are all being installed > in the same directory, though, not as eggs. Right: if all portions install into the same directory, you can have base packages already. Regards, Martin From benjamin at python.org Fri May 1 21:32:18 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 1 May 2009 14:32:18 -0500 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <49FB165B.9070909@mrabarnett.plus.com> References: <49FB165B.9070909@mrabarnett.plus.com> Message-ID: <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> 2009/5/1 MRAB : > I've just noticed an oddity in the key in PEP 0. Most letters are used > more than once. Wouldn't it be clearer if different letters were used > for "Accepted" and "Active" instead of them both being 'A', for example? > > -> A - Accepted proposal > -> R - Rejected proposal > ? W - Withdrawn proposal > -> D - Deferred proposal > ? F - Final proposal > -> A - Active proposal > -> D - Draft proposal > -> R - Replaced proposal Yes, that makes more sense. Would you like to submit a patch against the PEP 0 generator? (It's in peps/pep0) -- Regards, Benjamin From tjreedy at udel.edu Fri May 1 22:21:36 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 01 May 2009 16:21:36 -0400 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: Zooko O'Whielacronx wrote: > Following-up to my own post to correct a major error: > Is it true that > srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', > 'python-escape') will always produce srcbytes ? That is my Requirement If you start with bytes, decode with utf-8b to unicode (possibly 'invalid'), and encode the result back to bytes with utf-8b, you should get the original bytes, regardless of what they were. That is the point of PEP 383 -- to reliably roundtrip file 'names' that start as bytes and must end as the same bytes but which may not otherwise have a unicode decoding. If you start with invalid unicode text, encode to bytes with utf-8b, and decode back to unicode, you might instead get a different and valid unicode text. An example was given in the discussion. I believe this would be hard to avoid. An any case, it does not matter for the use case of starting with bytes that one wants to temporarily but surely work with as text. 
Terry Jan Reedy From cs at zip.com.au Fri May 1 23:39:28 2009 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 2 May 2009 07:39:28 +1000 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: <49FB2596.1090706@v.loewis.de> Message-ID: <20090501213928.GA15679@cskk.homeip.net> On 01May2009 18:38, Martin v. L?wis wrote: | > Okay, I am wrong about this. Having a flag to remember whether I had to | > fall back to the utf-8b trick is one method to implement my requirement, | > but my actual requirement is this: | > | > Requirement: either the unicode string or the bytes are faithfully | > transmitted from one system to another. | | I don't understand this requirement very well, in particular not | the "faithfully" part. | | > That is: if you read a filename from the filesystem, and transmit that | > filename to another system and use it, then there are two cases: | | What do you mean by "use it"? Things like opening files? How does | that work? In general, a file name valid on one system is invalid | on a different system - or, at least, refers to a different file | over there. This is independent of encodings. I think he's doing a file transfer of some kind and needs to preserve the names. Or I would guess the two systems are not both UNIX or there is some subtlety not yet mentioned, or he'd just use tar or some other byte-level UNIX tool. | > Requirement 1: the byte string was valid in the encoding of source | > system, in which case the unicode name is faithfully transmitted | > (i.e. the bytes that finally land on the target system are the result of | > sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding). | | In all your descriptions, I'm puzzled as to where exactly you get | the source bytes from. If you use the PEP 383 interfaces, you will | start with character strings, not byte strings, always. But if both system do present POSIX layers, it's bytes underneath and the system tools will natively use bytes. He wants to ensure that he can read using python, using listdir, and elsewhere when he writing using python, preserve the bytes layer. I think. In fact it sounds like he may be translating valid unicode and carefully not altering byte names that don't decode. That in turn implies that the codec may be different on the two systems. | > Okay, I find it surprisingly easy to make subtle errors in this encoding | > stuff, so please let me know if you spot one. Is it true that | > srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', | > 'python-escape') will always produce srcbytes ? | | I think you mixed up bytes and unicode here: if srcbytes is indeed | a bytes object, then you can't apply .encode to it. I think he has encode/decode swapped (I did too back in the uber-thread; if your mapping is one-to-one the distinction is almost arbitrary). However, his assertion/hope is true only if srcencoding == 'utf-8'. The PEP itself says that it works if the decode and encode use the same mapping. -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ "How do you know I'm Mad?" asked Alice. "You must be," said the Cat, "or you wouldn't have come here." 
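And a two-line illustration of Cameron's closing point, that the decode/encode round trip only preserves the bytes when the same mapping is used on both sides (the byte value is an arbitrary example):

    src = b'\xf6'
    assert src.decode('latin-1').encode('latin-1') == src        # same codec: round-trips
    assert src.decode('latin-1').encode('utf-8') == b'\xc3\xb6'  # different codec: bytes change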
From google at mrabarnett.plus.com Fri May 1 23:52:02 2009 From: google at mrabarnett.plus.com (MRAB) Date: Fri, 01 May 2009 22:52:02 +0100 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> Message-ID: <49FB6F02.7050204@mrabarnett.plus.com> Benjamin Peterson wrote: > 2009/5/1 MRAB : >> I've just noticed an oddity in the key in PEP 0. Most letters are used >> more than once. Wouldn't it be clearer if different letters were used >> for "Accepted" and "Active" instead of them both being 'A', for example? >> >> -> A - Accepted proposal >> -> R - Rejected proposal >> W - Withdrawn proposal >> -> D - Deferred proposal >> F - Final proposal >> -> A - Active proposal >> -> D - Draft proposal >> -> R - Replaced proposal > > Yes, that makes more sense. Would you like to submit a patch against > the PEP 0 generator? (It's in peps/pep0) > I'm still trying to think which letters to use! From fuzzyman at voidspace.org.uk Fri May 1 23:55:16 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 01 May 2009 22:55:16 +0100 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <49FB6F02.7050204@mrabarnett.plus.com> References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> Message-ID: <49FB6FC4.1030800@voidspace.org.uk> MRAB wrote: > Benjamin Peterson wrote: >> 2009/5/1 MRAB : >>> I've just noticed an oddity in the key in PEP 0. Most letters are used >>> more than once. Wouldn't it be clearer if different letters were used >>> for "Accepted" and "Active" instead of them both being 'A', for >>> example? >>> >>> -> A - Accepted proposal >>> -> R - Rejected proposal >>> W - Withdrawn proposal >>> -> D - Deferred proposal >>> F - Final proposal >>> -> A - Active proposal >>> -> D - Draft proposal >>> -> R - Replaced proposal >> >> Yes, that makes more sense. Would you like to submit a patch against >> the PEP 0 generator? (It's in peps/pep0) >> > I'm still trying to think which letters to use! P for Proposal (to replace Active Proposal)? Every active PEP is a proposal... Michael > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From barry at python.org Fri May 1 23:59:49 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 1 May 2009 17:59:49 -0400 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <49FB6FC4.1030800@voidspace.org.uk> References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> Message-ID: On May 1, 2009, at 5:55 PM, Michael Foord wrote: > P for Proposal (to replace Active Proposal)? Every active PEP is a > proposal... +1 Maybe even s/Active/Proposed/g ? -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From google at mrabarnett.plus.com Sat May 2 00:24:32 2009 From: google at mrabarnett.plus.com (MRAB) Date: Fri, 01 May 2009 23:24:32 +0100 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <49FB6FC4.1030800@voidspace.org.uk> References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> Message-ID: <49FB76A0.7030909@mrabarnett.plus.com> Michael Foord wrote: > MRAB wrote: >> Benjamin Peterson wrote: >>> 2009/5/1 MRAB : >>>> I've just noticed an oddity in the key in PEP 0. Most letters are used >>>> more than once. Wouldn't it be clearer if different letters were used >>>> for "Accepted" and "Active" instead of them both being 'A', for >>>> example? >>>> >>>> -> A - Accepted proposal >>>> -> R - Rejected proposal >>>> W - Withdrawn proposal >>>> -> D - Deferred proposal >>>> F - Final proposal >>>> -> A - Active proposal >>>> -> D - Draft proposal >>>> -> R - Replaced proposal >>> >>> Yes, that makes more sense. Would you like to submit a patch against >>> the PEP 0 generator? (It's in peps/pep0) >>> >> I'm still trying to think which letters to use! > > P for Proposal (to replace Active Proposal)? Every active PEP is a > proposal... > The full list is: S - Standards Track PEP I - Informational PEP P - Process PEP A - Accepted proposal R - Rejected proposal W - Withdrawn proposal D - Deferred proposal F - Final proposal A - Active proposal D - Draft proposal R - Replaced proposal using one letter from each set. From looking more closely at the code: Only 'Informational' or 'Process' PEPs can be 'Active'. 'Draft' and 'Active' are shown as a single space instead of 'D' or 'A'. Therefore: S - Standards Track PEP I - Informational PEP P - Process PEP A - Accepted proposal R - Rejected proposal W - Withdrawn proposal D - Deferred proposal F - Final proposal [A - Active proposal # blank, so can be omitted from key] [D - Draft proposal # blank, so can be omitted from key] R - Replaced proposal leaving just 'Rejected' and 'Replaced' to be disambiguated. From eric at trueblade.com Sat May 2 00:55:04 2009 From: eric at trueblade.com (Eric Smith) Date: Fri, 01 May 2009 18:55:04 -0400 Subject: [Python-Dev] svn down? Message-ID: <49FB7DC8.9060508@trueblade.com> When checking in, I get: Transmitting file data .svn: Commit failed (details follow): svn: Can't create directory '/data/repos/projects/db/transactions/72186-1.txn': Read-only file system With 'svn up', I get: svn: Can't find a temporary directory: Internal error From benjamin at python.org Sat May 2 01:12:23 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 1 May 2009 18:12:23 -0500 Subject: [Python-Dev] svn down? In-Reply-To: <49FB7DC8.9060508@trueblade.com> References: <49FB7DC8.9060508@trueblade.com> Message-ID: <1afaf6160905011612n22ccf803hde0b02deb1e6ef57@mail.gmail.com> 2009/5/1 Eric Smith : > When checking in, I get: > > Transmitting file data .svn: Commit failed (details follow): > svn: Can't create directory > '/data/repos/projects/db/transactions/72186-1.txn': Read-only file system > > With 'svn up', I get: > > svn: Can't find a temporary directory: Internal error I get that, too. In addition, I can't ssh to dinsdale. 
-- Regards, Benjamin From benjamin at python.org Sat May 2 03:27:48 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 1 May 2009 20:27:48 -0500 Subject: [Python-Dev] yield from? Message-ID: <1afaf6160905011827l132a0014o6b1032e20a08552c@mail.gmail.com> What's the status of yield from? There's still a small window open for a patch to be checked into 3.1's branch. I haven't been following the python-ideas threads, so I'm not sure if it's ready yet. -- Regards, Benjamin From zookog at gmail.com Sat May 2 03:42:47 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Fri, 1 May 2009 19:42:47 -0600 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: <49FB2596.1090706@v.loewis.de> References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> <49FB2596.1090706@v.loewis.de> Message-ID: Folks: Being new to the use of gmail, I accidentally sent the following only to MvL and not to the list. He promptly replied with a helpful counterexample showing that my design can suffer collisions. :-) Regards, Zooko On Fri, May 1, 2009 at 10:38 AM, "Martin v. L?wis" wrote: >> >> Requirement: either the unicode string or the bytes are faithfully >> transmitted from one system to another. > > I don't understand this requirement very well, in particular not > the "faithfully" part. > >> That is: if you read a filename from the filesystem, and transmit that >> filename to another system and use it, then there are two cases: > > What do you mean by "use it"? Things like opening files? How does > that work? In general, a file name valid on one system is invalid > on a different system - or, at least, refers to a different file > over there. This is independent of encodings. Tahoe is a backup and filesharing program, so you might for example, execute "tahoe cp -r Mot?rhead tahoe:" to copy all the contents of your "Mot?rhead" directory to your Tahoe filesystem. Later you or a friend, might execute "tahoe cp -r tahoe:Mot?rhead ." to copy everything from that directory within your Tahoe filesystem to your local filesystem. So in this case the flow of information is local_system_1 -> Tahoe -> local_system_2. The Requirement 1 is that for each filename encountered which is a valid encoding in local_system_1, then the resulting (unicode) name is transmitted through the Tahoe filesystem and then written out into local_system_2 in the expected way (i.e. just by using the Python unicode APIs and passing the unicode object to them). Requirement 2 is that for each filename encountered which is not a valid encoding in local_system_1, then the original bytes are transmitted through the Tahoe filesystem and then, if the target system is a byte-oriented system such as Linux, the original bytes are written into the target filesystem. (If the target is not Linux then mojibake! but we don't have to go into that now.) Does that make sense? > In all your descriptions, I'm puzzled as to where exactly you get > the source bytes from. If you use the PEP 383 interfaces, you will > start with character strings, not byte strings, always. On Mac and Windows, we use the Python unicode APIs e.g. os.listdir(u"Mot?rhead"). On Linux and Solaris, we use the Python bytestring APIs e.g. os.listdir("Mot?rhead".encode(sys.getfilesystemencoding())). >> Okay, I find it surprisingly easy to make subtle errors in this encoding >> stuff, so please let me know if you spot one. 
Is it true that >> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8', >> 'python-escape') will always produce srcbytes ? > > I think you mixed up bytes and unicode here: if srcbytes is indeed > a bytes object, then you can't apply .encode to it. Yep, I reversed the order of encode() and decode(). However, my whole statement was utterly wrong and shows that I still didn't fully get it yet. I have flip-flopped again and currently think that PEP 383 is useless for this use case and that my original plan [1] is still the way to go. Please let me know if you spot a flaw in my plan or a ridiculousity in my requirements, or if you see a way that PEP 383 can help me. Thank you very much. Regards, Zooko [1] http://allmydata.org/trac/tahoe/ticket/534#comment:47 From guido at python.org Sat May 2 04:10:47 2009 From: guido at python.org (Guido van Rossum) Date: Fri, 1 May 2009 19:10:47 -0700 Subject: [Python-Dev] yield from? In-Reply-To: <1afaf6160905011827l132a0014o6b1032e20a08552c@mail.gmail.com> References: <1afaf6160905011827l132a0014o6b1032e20a08552c@mail.gmail.com> Message-ID: Alas, I haven't been following it either recently. Too bad, really, because before I left (now three weeks ago) it was already pretty close. We could perhaps even check in Greg's patch (which I tried and looked like a solid implementation of his proposal at the time) and finagle it for b2. One problem though is that Greg's code is based on 2.6... On Fri, May 1, 2009 at 6:27 PM, Benjamin Peterson wrote: > What's the status of yield from? There's still a small window open for > a patch to be checked into 3.1's branch. I haven't been following the > python-ideas threads, so I'm not sure if it's ready yet. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Sat May 2 04:12:15 2009 From: foom at fuhm.net (James Y Knight) Date: Fri, 1 May 2009 22:12:15 -0400 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> <49FB2596.1090706@v.loewis.de> Message-ID: <51167066-A162-4AAF-B40D-52C1918032D8@fuhm.net> On May 1, 2009, at 9:42 PM, Zooko O'Whielacronx wrote: > Yep, I reversed the order of encode() and decode(). However, my whole > statement was utterly wrong and shows that I still didn't fully get it > yet. I have flip-flopped again and currently think that PEP 383 is > useless for this use case and that my original plan [1] is still the > way to go. Please let me know if you spot a flaw in my plan or a > ridiculousity in my requirements, or if you see a way that PEP 383 can > help me. If I were designing a new system such as this, I'd probably just go for utf8b *always*. That is, set the filesystem encoding to utf-8b. The end. All files always keep the same bytes transferring between unix systems. Thus, for the 99% of the world that uses either windows or a utf-8 locale, they get useful filenames inside tahoe. The other 1% of the world that uses something like latin-1, EUC_JP, etc. on their local system sees mojibake filenames in tahoe, but will see the same filename that they put in when they take it back out. Gnome already uses only utf-8 for filename displays for a few years now, for example, so this isn't exactly an unheard-of position to take... But if you don't do that, then, I still don't see what purpose your requirements serve. 
If I have two systems: one with a UTF-8 locale, and one with a Latin-1 locale, why should transmitting filenames from system 1 to system 2 through tahoe preserve the raw bytes, but doing the reverse *not* preserve the raw bytes? (all byte-sequences are valid in latin-1, remember, so they'll all decode into unicode without error, and then be reencoded in utf-8...). This seems rather a useless behavior to me. James From alexander.belopolsky at gmail.com Sat May 2 04:46:00 2009 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 1 May 2009 22:46:00 -0400 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <49FB76A0.7030909@mrabarnett.plus.com> References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> <49FB76A0.7030909@mrabarnett.plus.com> Message-ID: .. > leaving just 'Rejected' and 'Replaced' to be disambiguated. 'X' or 'Z' for "Rejected"? Looks like a perfect start for a bikeshed discussion. :-) From stephen at xemacs.org Sat May 2 07:34:15 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 02 May 2009 14:34:15 +0900 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> Message-ID: <87ljpghxuw.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > On May 1, 2009, at 5:55 PM, Michael Foord wrote: > > > P for Proposal (to replace Active Proposal)? Every active PEP is a > > proposal... > > +1 > > Maybe even s/Active/Proposed/g ? Shouldn't that be s/Active/Proposed/ From stephen at xemacs.org Sat May 2 07:49:34 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 02 May 2009 14:49:34 +0900 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> <49FB76A0.7030909@mrabarnett.plus.com> Message-ID: <87k550hx5d.fsf@uwakimon.sk.tsukuba.ac.jp> Alexander Belopolsky writes: > .. > > leaving just 'Rejected' and 'Replaced' to be disambiguated. > > 'X' or 'Z' for "Rejected"? Looks like a perfect start for a bikeshed > discussion. :-) The Japanese contingent suggests O (UPPERCASE LATIN LETTER O) for accepted and X for rejected. (Actually these should be U+25EF and U+00D7, respectively.) From arfrever.fta at gmail.com Sat May 2 12:34:05 2009 From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis) Date: Sat, 2 May 2009 12:34:05 +0200 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <87ljpghxuw.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FB165B.9070909@mrabarnett.plus.com> <87ljpghxuw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <200905021234.08766.Arfrever.FTA@gmail.com> 2009-05-02 07:34:15 Stephen J. Turnbull napisa?(a): > Barry Warsaw writes: > > On May 1, 2009, at 5:55 PM, Michael Foord wrote: > > > > > P for Proposal (to replace Active Proposal)? Every active PEP is a > > > proposal... > > > > +1 > > > > Maybe even s/Active/Proposed/g ? > > Shouldn't that be > > s/Active/Proposed/ No. From `info sed 'sed Programs' 'The "s" Command'`: > The `s' Command > =============== > > The syntax of the `s' (as in substitute) command is > `s/REGEXP/REPLACEMENT/FLAGS'. 
The `/' characters may be uniformly > replaced by any other single character within any given `s' command. > The `/' character (or whatever other character is used in its stead) > can appear in the REGEXP or REPLACEMENT only if it is preceded by a `\' > character. > ... > The `s' command can be followed by zero or more of the following > FLAGS: > > `g' > Apply the replacement to _all_ matches to the REGEXP, not just the > first. -- Arfrever Frehtes Taifersar Arahesis -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From aahz at pythoncraft.com Sat May 2 14:34:04 2009 From: aahz at pythoncraft.com (Aahz) Date: Sat, 2 May 2009 05:34:04 -0700 Subject: [Python-Dev] FWD: svn down? Message-ID: <20090502123404.GA27305@panix.com> ----- Forwarded message from "\"Martin v. L?wis\"" ----- > Date: Sat, 02 May 2009 08:18:56 +0200 > From: "\"Martin v. L?wis\"" > To: Aahz > CC: pydotorg at python.org > Subject: Re: [Pydotorg] FWD: [Python-Dev] svn down? > >> Benjamin Peterson reports being unable to ssh to dinsdale > > I have rebooted the machine; it seems now to be working again. > > Regards, > Martin ----- End forwarded message ----- -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith From google at mrabarnett.plus.com Sat May 2 16:12:07 2009 From: google at mrabarnett.plus.com (MRAB) Date: Sat, 02 May 2009 15:12:07 +0100 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> <49FB76A0.7030909@mrabarnett.plus.com> Message-ID: <49FC54B7.8010807@mrabarnett.plus.com> Alexander Belopolsky wrote: > .. >> leaving just 'Rejected' and 'Replaced' to be disambiguated. > > 'X' or 'Z' for "Rejected"? Looks like a perfect start for a bikeshed > discussion. :-) > Are there Unicode codepoints for smilies? I'm thinking of :-) for 'Accepted' and :-( for 'Rejected'. :-) From ajaksu at gmail.com Sat May 2 17:11:49 2009 From: ajaksu at gmail.com (Daniel Diniz) Date: Sat, 2 May 2009 12:11:49 -0300 Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <49FC54B7.8010807@mrabarnett.plus.com> References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> <49FB76A0.7030909@mrabarnett.plus.com> <49FC54B7.8010807@mrabarnett.plus.com> Message-ID: <2d75d7660905020811p1bdd2b5k51030ef1f8ab046f@mail.gmail.com> MRAB wrote: > Are there Unicode codepoints for smilies? I'm thinking of :-) for > 'Accepted' and :-( for 'Rejected'. :-) Yes there are, but we'd need to set the font size to 'humongous' to see the smilies: ? ?. In py3k: print(chr(0x2639), chr(0x263a)) In trunk: print(unichr(0x2639), unichr(0x263a)) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smilies.png Type: image/png Size: 3574 bytes Desc: not available URL: From ijmorlan at uwaterloo.ca Sat May 2 17:04:22 2009 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Sat, 2 May 2009 11:04:22 -0400 (EDT) Subject: [Python-Dev] Oddity PEP 0 key In-Reply-To: <49FC54B7.8010807@mrabarnett.plus.com> References: <49FB165B.9070909@mrabarnett.plus.com> <1afaf6160905011232j2fee6103t1b25075733c39bf8@mail.gmail.com> <49FB6F02.7050204@mrabarnett.plus.com> <49FB6FC4.1030800@voidspace.org.uk> <49FB76A0.7030909@mrabarnett.plus.com> <49FC54B7.8010807@mrabarnett.plus.com> Message-ID: On Sat, 2 May 2009, MRAB wrote: > Alexander Belopolsky wrote: >> .. >>> leaving just 'Rejected' and 'Replaced' to be disambiguated. >> >> 'X' or 'Z' for "Rejected"? Looks like a perfect start for a bikeshed >> discussion. :-) >> > Are there Unicode codepoints for smilies? I'm thinking of :-) for > 'Accepted' and :-( for 'Rejected'. :-) U+2639 WHITE FROWNING FACE U+263A WHITE SMILING FACE Also, U+2694 CROSSED SWORDS for "vehement discussion on mailing list", U+2696 SCALES for "BDFL is considering", and U+2678 BLACK UNIVERSAL RECYCLING SYMBOL for "proposal previously rejected is being re-proposed due to changed circumstances". For code don't forget great math operator symbols like U+2264 LESS-THAN OR EQUAL TO and U+222A UNION. But I doubt if anybody would want to bake in an absolute requirement for Unicode support in order to be able to read or write Python code. Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From benjamin at python.org Sat May 2 20:41:51 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 2 May 2009 13:41:51 -0500 Subject: [Python-Dev] yield from? In-Reply-To: References: <1afaf6160905011827l132a0014o6b1032e20a08552c@mail.gmail.com> Message-ID: <1afaf6160905021141m68b4b25cm7e60aaf6f5dce4e3@mail.gmail.com> 2009/5/1 Guido van Rossum : > Alas, I haven't been following it either recently. Too bad, really, > because before I left (now three weeks ago) it was already pretty > close. We could perhaps even check in Greg's patch (which I tried and > looked like a solid implementation of his proposal at the time) and > finagle it for b2. One problem though is that Greg's code is based on > 2.6... I don't believe the compiler has changed between 2.6 and the trunk, so a patch against the trunk would probably not be too hard. I volunteer to review it if it is produced. -- Regards, Benjamin From g.brandl at gmx.net Sat May 2 21:01:28 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 02 May 2009 21:01:28 +0200 Subject: [Python-Dev] multi-with statement Message-ID: Hi, this is just a short notice that Mattias Br?ndstr?m and I have finished a patch to implement the previously discussed and mostly warmly welcomed extension to with's syntax, allowing with A() as a, B() as b: to be written instead of with A() as a: with B() as b: This syntax was chosen (over "with A(), B() as a, b:") because it has more syntactical similarity to the written-out version. Also, our current uses of "as" all have only one expression on the right. The patch implements it as a simple AST transformation, which guarantees semantic equivalence. It is at . If there is no strong opposition, I will commit it and port it to py3k before 3.1 enters beta stage. cheers, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. 
Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From fredrik.johansson at gmail.com Sat May 2 21:26:14 2009 From: fredrik.johansson at gmail.com (Fredrik Johansson) Date: Sat, 2 May 2009 21:26:14 +0200 Subject: [Python-Dev] multi-with statement In-Reply-To: References: Message-ID: <3d0cebfb0905021226y501a5990q5b3ccc016255cdef@mail.gmail.com> On Sat, May 2, 2009 at 9:01 PM, Georg Brandl wrote: > Hi, > > this is just a short notice that Mattias Br?ndstr?m and I have finished a > patch to implement the previously discussed and mostly warmly welcomed > extension to with's syntax, allowing > > ? with A() as a, B() as b: > > to be written instead of > > ? with A() as a: > ? ? ? with B() as b: > > This syntax was chosen (over "with A(), B() as a, b:") because it has more > syntactical similarity to the written-out version. ?Also, our current uses > of "as" all have only one expression on the right. > > The patch implements it as a simple AST transformation, which guarantees > semantic equivalence. ?It is at . > > If there is no strong opposition, I will commit it and port it to py3k > before 3.1 enters beta stage. > > cheers, > Georg I was hoping for the other syntax in order to be able to create a nested context in advance as a simple tuple: with A, B: pass context = A, B with context: pass (I.e. a tuple, or perhaps any iterable, would be a valid context manager.) With the syntax in the patch, I will still have to implement a custom nesting context manager to do this, which sort of defeats the purpose. Fredrik From aleaxit at gmail.com Sat May 2 21:44:06 2009 From: aleaxit at gmail.com (Alex Martelli) Date: Sat, 2 May 2009 12:44:06 -0700 Subject: [Python-Dev] multi-with statement In-Reply-To: <3d0cebfb0905021226y501a5990q5b3ccc016255cdef@mail.gmail.com> References: <3d0cebfb0905021226y501a5990q5b3ccc016255cdef@mail.gmail.com> Message-ID: FWIW, I prefer Fredrik's wish too. Alex On Sat, May 2, 2009 at 12:26 PM, Fredrik Johansson < fredrik.johansson at gmail.com> wrote: > On Sat, May 2, 2009 at 9:01 PM, Georg Brandl wrote: > > Hi, > > > > this is just a short notice that Mattias Br?ndstr?m and I have finished a > > patch to implement the previously discussed and mostly warmly welcomed > > extension to with's syntax, allowing > > > > with A() as a, B() as b: > > > > to be written instead of > > > > with A() as a: > > with B() as b: > > > > This syntax was chosen (over "with A(), B() as a, b:") because it has > more > > syntactical similarity to the written-out version. Also, our current > uses > > of "as" all have only one expression on the right. > > > > The patch implements it as a simple AST transformation, which guarantees > > semantic equivalence. It is at . > > > > If there is no strong opposition, I will commit it and port it to py3k > > before 3.1 enters beta stage. > > > > cheers, > > Georg > > I was hoping for the other syntax in order to be able to create a > nested context in advance as a simple tuple: > > with A, B: > pass > > context = A, B > with context: > pass > > (I.e. a tuple, or perhaps any iterable, would be a valid context manager.) > > With the syntax in the patch, I will still have to implement a custom > nesting context manager to do this, which sort of defeats the purpose. 
> > Fredrik > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/aleaxit%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat May 2 21:45:47 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 2 May 2009 19:45:47 +0000 (UTC) Subject: [Python-Dev] CVE-2008-5983 "untrusted python modules search path" Message-ID: Hello, I don't think it has already posted to the list, apologies if it has. Some Linux tools and vendors have been hit by an alleged "security hole" where an embedded Python interpreter will prepend the current working directory to sys.path as soon as PySys_SetArgv() is called by the embedding application. This means, for example, that a Python file in the working directory can break plugins or extensions written for that application if the Python file happens to shadow another module. Regardless of whether this is a security hole or not, it certainly can make things disturbingly surprising when the situation arises. In the bug report (http://bugs.python.org/issue5753), I suggested we add a new function PySys_SetArgvEx() which would take an additional parameter telling whether to touch sys.path or not (in the same spirit as Py_InitializeEx() providing a more flexible API than Py_Initialize()). On the other hand, I don't think we can change the default behaviour of PySys_SetArgv(), since there are probably tools and applications relying on it (the obvious use case which comes to my mind is a third-party interactive interpreter). Any opinions? Regards Antoine. From g.brandl at gmx.net Sat May 2 22:12:10 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 02 May 2009 22:12:10 +0200 Subject: [Python-Dev] multi-with statement In-Reply-To: <3d0cebfb0905021226y501a5990q5b3ccc016255cdef@mail.gmail.com> References: <3d0cebfb0905021226y501a5990q5b3ccc016255cdef@mail.gmail.com> Message-ID: Fredrik Johansson schrieb: > On Sat, May 2, 2009 at 9:01 PM, Georg Brandl wrote: >> Hi, >> >> this is just a short notice that Mattias Br?ndstr?m and I have finished a >> patch to implement the previously discussed and mostly warmly welcomed >> extension to with's syntax, allowing >> >> with A() as a, B() as b: >> >> to be written instead of >> >> with A() as a: >> with B() as b: > I was hoping for the other syntax in order to be able to create a > nested context in advance as a simple tuple: > > with A, B: > pass > > context = A, B > with context: > pass > > (I.e. a tuple, or perhaps any iterable, would be a valid context manager.) I see; you want to construct your context manager programmatically and pass it to "with" without knowing what is in there. While this would be possible, we have to be aware that with this we would effectively change the context manager protocol, rather like the iterator protocol's __getitem__ alternate realization. This muddies the definition of a context manager. (The interesting thing is that you could already implement *that* version without any new syntactic support, by giving tuples an __enter__/__exit__ method pair.) > With the syntax in the patch, I will still have to implement a custom > nesting context manager to do this, which sort of defeats the purpose. Not really. 
Having an unknown number of stacked context managers is not the purpose -- for that, I'd still say a custom nesting context manager is better, because it is also more explicit when created not at the "with" site. (You could even write it as a tuple subclass, if you like the tuple interface.) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From rdmurray at bitdance.com Sun May 3 00:33:15 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Sat, 2 May 2009 18:33:15 -0400 (EDT) Subject: [Python-Dev] multi-with statement In-Reply-To: References: <3d0cebfb0905021226y501a5990q5b3ccc016255cdef@mail.gmail.com> Message-ID: On Sat, 2 May 2009 at 22:12, Georg Brandl wrote: > I see; you want to construct your context manager programmatically and pass > it to "with" without knowing what is in there. > > While this would be possible, we have to be aware that with this we would > effectively change the context manager protocol, rather like the iterator > protocol's __getitem__ alternate realization. This muddies the definition > of a context manager. > > (The interesting thing is that you could already implement *that* version > without any new syntactic support, by giving tuples an __enter__/__exit__ > method pair.) > >> With the syntax in the patch, I will still have to implement a custom >> nesting context manager to do this, which sort of defeats the purpose. > > Not really. Having an unknown number of stacked context managers is not > the purpose -- for that, I'd still say a custom nesting context manager > is better, because it is also more explicit when created not at the "with" > site. (You could even write it as a tuple subclass, if you like the tuple > interface.) As I understand it, the primary problem the patch Georg is talking about solves is the fact that currently if you pass multiple contexts to contextlib.nested, and one of the later items in the argument list throws an error, the context(s) from the earlier context manager(s) does not get cleaned up properly. This patch solves that problem very neatly. I'm +1 on the patch, including preferring the syntax over the alternative. Georg, maybe you should post the link to the python-ideas discussion? --David From ben+python at benfinney.id.au Sun May 3 01:54:38 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 03 May 2009 09:54:38 +1000 Subject: [Python-Dev] Oddity PEP 0 key References: <49FB165B.9070909@mrabarnett.plus.com> <87ljpghxuw.fsf@uwakimon.sk.tsukuba.ac.jp> <200905021234.08766.Arfrever.FTA@gmail.com> Message-ID: <871vr7m56p.fsf@benfinney.id.au> Arfrever Frehtes Taifersar Arahesis writes: > 2009-05-02 07:34:15 Stephen J. Turnbull napisał(a): > > Barry Warsaw writes: > > > Maybe even s/Active/Proposed/g ? > > > > Shouldn't that be > > > > s/Active/Proposed/ > > No. > From `info sed 'sed Programs' 'The "s" Command'`: Stephen was, I suspect, feeling a little frisky when he wrote that, and attempted a joke (the shortcut ?? is often used in this forum for "insert a silly grin here"). Knowing him, I grade the joke "4 out of 10, could do better". -- \ "Think for yourselves and let others enjoy the privilege to do | `\ so too." --Voltaire, _Essay On Tolerance_ | _o__) | Ben Finney -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: From zookog at gmail.com Sun May 3 06:33:54 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Sat, 2 May 2009 22:33:54 -0600 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: <51167066-A162-4AAF-B40D-52C1918032D8@fuhm.net> References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> <49FB2596.1090706@v.loewis.de> <51167066-A162-4AAF-B40D-52C1918032D8@fuhm.net> Message-ID: [cross-posting to python-dev and tahoe-dev] On Fri, May 1, 2009 at 8:12 PM, James Y Knight wrote: > > If I were designing a new system such as this, I'd probably just go for > utf8b *always*. Ah, this would be a very tempting possibility -- abandon all unix users who are slow to embrace our utf-8b future! However, it is moot because Tahoe is not a new system. It is currently at v1.4.1, has a strong policy of backwards-compatibility, and already has lots of data, lots of users, and programmers building on top of it. It currently uses utf-8 for its internal storage (note: nothing to do with reading or writing files from external sources -- only for storing filenames in the decentralized storage system which is accessed by Tahoe clients), and we can't start putting non-utf-8-valid sequences in the "filename" slot because other Tahoe clients would then get a UnicodeDecodeError exception when trying to read those directories. We *could* create a new metadata entry to hold things other than utf-8. Current Tahoe clients would never look at that entry (the metadata is a JSON-serialized dictionary, so we can add a new key name into it without disturbing the existing clients), but future Tahoe clients could look for that new key. That is where it is possible that future versions of Tahoe might be able to benefit from utf-8b or PEP 383, although what PEP 383 offers for this use case remains unclear to me. > But if you don't do that, then, I still don't see what purpose your > requirements serve. If I have two systems: one with a UTF-8 locale, and one > with a Latin-1 locale, why should transmitting filenames from system 1 to > system 2 through tahoe preserve the raw bytes, but doing the reverse *not* > preserve the raw bytes? (all byte-sequences are valid in latin-1, remember, > so they'll all decode into unicode without error, and then be reencoded in > utf-8...). This seems rather a useless behavior to me. I see I'm not explaining the Tahoe requirements clearly. It's probably that I'm not understanding them clearly myself. Hopefully the following will help. There are two different things stored in Tahoe for each directory entry: the filename and the metadata. Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux system and then you inspect the files in the Tahoe filesystem, such as by examining the web interface [1] or by running "tahoe ls", either of which you could do either from the same machine where you ran "tahoe cp" or from a different machine (which could be using any operating system). We have the following requirements about what ends up in your Tahoe directory after that cp -r. Requirement 1 (unicode): Each filename that you see needs to be valid unicode (it is stored internally in utf-8). This eliminates utf-8b and PEP 383 from being directly applicable to the filename part, although perhaps they could be useful for the metadata part (about which more below). 
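(A minimal illustration of that point, not Tahoe code -- it assumes the PEP 383 escape mechanism is available as a codec error handler under the name 'surrogateescape':

    raw = b'\xff.txt'                              # not valid utf-8
    name = raw.decode('utf-8', 'surrogateescape')  # '\udcff.txt' -- a str containing a lone surrogate
    try:
        name.encode('utf-8')                       # strict utf-8 refuses the lone surrogate, so this
    except UnicodeEncodeError:                     # name cannot be stored in a slot that must hold
        pass                                       # strictly-encodable unicode
    assert name.encode('utf-8', 'surrogateescape') == raw   # it only round-trips with the handler

)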
Requirement 2 (faithful if unicode): For each filename (byte string) in your myfiles directory, if that bytestring is the valid encoding of some string in your stated locale, then the resulting filename in Tahoe is that (unicode) string. Nobody ever doesn't want this, right? Well, maybe some people don't want this sometimes, because it could be that the locale was wrong for this byte string and the resulting successfully-decoded unicode name is gibberish. This is especially acute if the locale is an 8-bit encoding such as latin-1 or windows-1252. However, what's the alternative? Guessing that their locale shouldn't be set to latin-1 and instead decoding their bytes some other way? It seems like we're not going to do better than requirement 2 (faithful if unicode). Requirement 3 (no file left behind): For each filename (byte string) in your myfiles directory, whether or not that byte string is the valid encoding of anything in your stated locale, then that file will be added into the Tahoe filesystem under *some* name (a good candidate would be mojibake, e.g. decode the bytes with latin-1, but that is not the only possibility). I have heard some developers say that they don't want to support this requirement and would rather tell the users to fix their filenames before they can back up or share those files through Tahoe. On the other hand, users have said that they require this and they are not going to go mucking about with all their filenames just so that they can use my backup and filesharing tool. Now already we can say that these three requirements mean that there can be collisions -- for example a directory could have two entries, one of which is not a valid encoding in the locale, and whatever unicode string we invent to name it with in order to satisfy requirements 3 (no file left behind) and 1 (unicode) might happen to be the same as the (correctly-encoded) name of the other file. Therefore these three requirements imply that we have to detect such collisions and deal with them somehow. (Thanks to Martin v. L?wis for reminding me of this.) Possible Requirement 4 (faithful bytes if not unicode, a.k.a. "round-tripping"): Suppose you have a directory with some files with Japanese names, encoded using shift-jis, and some files with Russian names, encoded using koi8-r. Suppose your locale is set to shift-jis, and then you do "tahoe cp -r myfiles/ tahoe:". Then suppose you or someone else does "tahoe cp -r tahoe: copy_of_myfiles/". The "round-tripping" feature is that the files with Russian names that did not accidentally decode cleanly with shift-jis still have the same bytes in their names as they did in the original myfiles directory. As I write this, I am becoming skeptical of this (faithful bytes if not unicode, a.k.a. "round-tripping"), thanks in part to criticism from James Knight, MvL, Thomas Breuel, and others. One reason to be skeptical is that about a third of the Russian files will happen to decode cleanly as shift-jis anyway, and will therefore come out as something entirely different if the target filesystem's encoding is something other than shift-jis. But an even worse problem -- the show-stopper for me -- is that I don't want what Tahoe shows when you do "tahoe ls" or view it in a web browser to differ from what it writes out when you do "tahoe cp -r tahoe: newfiles/". So I'm ready to reject this one. Now about the "metadata" part which is separate from the filename itself. 
I have another requirement: Requirement 5 (no loss of information): I don't want Tahoe to destroy information -- every transformation should be (in principle) reversible by some future computer-augmented archaeologist. For example, if a bytestring decodes cleanly with the locale's suggested encoding, and we use the resulting unicode as the filename, then we also store the original byte string in the metadata since we don't know if the locale's suggested encoding was good. This allows the later invention of a tool which shows the user what the filename would have been with other encodings and let the user choose one that makes sense. It is important to note that this does not impose any requirement on the *filename* itself -- all such information can be stored in the metadata. Okay, in light of the above four requirements and the rejection of #4, I hereby propose to change from the previous Tahoe design [2] to the following: To copy an entry from a local filesystem into Tahoe: 1. On Windows or Mac read the filename with the unicode APIs. Normalize the string with filename = unicodedata.normalize('NFC', filename). Leave the "original_bytes" key and the "failed_decode" flag out of the metadata. 2. On Linux or Solaris read the filename with the string APIs, and store the result in the "original_bytes" part of the metadata. Call sys.getfilesystemencoding() to get an alleged_encoding. Then, call bytes.decode(alleged_encoding, 'strict') to try to get a unicode object. 2.a. If this decoding succeeds then normalize the unicode filename with filename = unicodedata.normalize('NFC', filename), store the resulting filename and leave the "failed_decode" flag out of the metadata. 2.b. If this decoding fails, then we decode it again with bytes.decode('latin-1', 'strict'). Do not normalize it. Store the resulting unicode object into the "filename" part, set the "failed_decode" flag to True. This is mojibake! 3. (handling collisions) In either case 2.a or 2.b the resulting unicode string may already be present in the directory. If so, check the failed_decode flags on the current entry and the new entry. If they are both set or both unset then the new entry overwrites the old entry -- they had the same name. If the failed_decode flags differ then this is a case of collision -- the old entry and the new entry had (as far as we are concerned) different names that accidentally generated the same unicode. Alter the new entry's name, for example by appending "~1" and then trying again and incrementing the number until it doesn't match any extant entry. To copy an entry from Tahoe into a local filesystem: Always use the Python unicode API. The original_bytes field and the failed_decode field in the metadata are not consulted. Now a question for python-dev people: could utf-8b or PEP 383 be useful for requirements like the four requirements listed above? If not, what requirements does PEP 383 help with? I'm sure that if can help with the use case of "I'm doing os.listdir() and then I'm going to turn around and use the resulting unicode objects on the same local filesystem in the same Python process". I'm not sure that it can help if you are going to store the results of your os.listdir() persistently or if you are going to transmit them over a network. Indeed, using the results that way could lead to unpleasant surprises. Does that sound right to you? Perhaps this could be documented somehow to help other programmers along the way. Thanks very much for your help, everyone. 
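For concreteness, here is a rough sketch of steps 2 and 3 above -- illustrative only, not Tahoe's actual implementation, and the helper names are invented:

    import sys, unicodedata

    def name_and_flag(raw_bytes):
        """Steps 2.a/2.b: return (unicode name, failed_decode flag)."""
        alleged_encoding = sys.getfilesystemencoding()
        try:
            name = raw_bytes.decode(alleged_encoding, 'strict')
        except UnicodeDecodeError:
            # 2.b: mojibake fallback, deliberately not normalized
            return raw_bytes.decode('latin-1', 'strict'), True
        # 2.a: successful decode, normalize
        return unicodedata.normalize('NFC', name), False

    def add_entry(directory, raw_bytes):
        """Step 3: collision handling; 'directory' is a plain dict standing in for a Tahoe directory."""
        name, failed = name_and_flag(raw_bytes)
        candidate = name
        if candidate in directory and directory[candidate]['failed_decode'] != failed:
            # two different names accidentally produced the same unicode
            n = 1
            candidate = '%s~%d' % (name, n)
            while candidate in directory:
                n += 1
                candidate = '%s~%d' % (name, n)
        directory[candidate] = {'original_bytes': raw_bytes, 'failed_decode': failed}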
Regards, Zooko [1] http://testgrid.allmydata.org:3567/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/ [2] http://allmydata.org/trac/tahoe/ticket/534#comment:47 From greg.ewing at canterbury.ac.nz Sun May 3 09:47:17 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 03 May 2009 19:47:17 +1200 Subject: [Python-Dev] yield from? In-Reply-To: <1afaf6160905011827l132a0014o6b1032e20a08552c@mail.gmail.com> References: <1afaf6160905011827l132a0014o6b1032e20a08552c@mail.gmail.com> Message-ID: <49FD4C05.2020301@canterbury.ac.nz> Benjamin Peterson wrote: > What's the status of yield from? There's still a small window open for > a patch to be checked into 3.1's branch. I haven't been following the > python-ideas threads, so I'm not sure if it's ready yet. The PEP itself seems to have settle down, and is awaiting a verdict from Guido. The prototype implementation doesn't quite match the PEP in some of the fine details yet. Also it's for 2.6 rather than 3.x; someone with more knowledge of 3.x internals would be better placed than me to convert it. -- Greg From martin at v.loewis.de Sun May 3 10:17:04 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 03 May 2009 10:17:04 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler Message-ID: <49FD5300.6010906@v.loewis.de> With issue 3672 resolved, it is now unnecessary to introduce an utf-8b codec, since the utf-8 codec will properly report errors for all byte sequences invalid in UTF-8, including lone surrogates. Therefore, utf-8b can be implemented solely through the error handler. Glenn Linderman suggested that the name "python-escape" is not very descriptive, so I've changed the name to "utf8b". I've updated the PEP accordingly. Regards, Martin From stephen at xemacs.org Sun May 3 11:32:38 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 03 May 2009 18:32:38 +0900 Subject: [Python-Dev] PEP 383 and Tahoe [was: GUI libraries] In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> <49FB2596.1090706@v.loewis.de> <51167066-A162-4AAF-B40D-52C1918032D8@fuhm.net> Message-ID: <877i0yilah.fsf@uwakimon.sk.tsukuba.ac.jp> Zooko O'Whielacronx writes: > However, it is moot because Tahoe is not a new system. It is currently > at v1.4.1, has a strong policy of backwards-compatibility, and already > has lots of data, lots of users, and programmers building on top of > it. Cool! Question: is there a way to negotiate versions, or better yet, features? > I see I'm not explaining the Tahoe requirements clearly. It's probably > that I'm not understanding them clearly myself. Well, it's a high-dimensional problem. Keeping track of all the variables is hard. That's why something like PEP 383 can be important to you even though it's only a partial solution; it eliminates one variable. > Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux system > and then you inspect the files in the Tahoe filesystem, such as by > examining the web interface [1] or by running "tahoe ls", either of > which you could do either from the same machine where you ran "tahoe > cp" or from a different machine (which could be using any operating > system). We have the following requirements about what ends up in your > Tahoe directory after that cp -r. Whoa! Slow down! Where's "my" "Tahoe directory"? Do you mean the directory listing? A copy to whatever system I'm on? 
The bytes that the Tahoe host has just loaded into a network card buffer to tell me about it? The bytes on disk at the Tahoe host? You'll find it a lot easier to explain things if you adopt a precise, consistent terminology. > Requirement 1 (unicode): Each filename that you see needs to be valid > unicode What does "see" mean? In directory listings? Under what circumstances, if any, can what I see be different from what I get? > Requirement 2 (faithful if unicode): For each filename (byte string) > in your myfiles directory, My local myfiles directory, or my Tahoe myfiles directory? > if that bytestring is the valid encoding of some string in your > stated locale, Who stated the locale? How? Are you referring to what getfilesystemencoding returns? This is a "(unicode) string", right? > then the resulting filename in Tahoe is that (unicode) > string. Nobody ever doesn't want this, right? Well, maybe some > people don't want this sometimes, [...]. However, what's the > alternative? Guessing that their locale shouldn't be set to > latin-1 and instead decoding their bytes some other way? Sure. Emacsen do that, you know. Of course it's hard to guess something else if ISO-8859/1 is the preferred encoding, but it does happen. This probably cannot be done accurately enough for Tahoe, though. > It seems like we're not going to do better than > requirement 2 (faithful if unicode). > > Requirement 3 (no file left behind): For each filename (byte string) > in your myfiles directory, whether or not that byte string is the > valid encoding of anything in your stated locale, then that file will > be added into the Tahoe filesystem under *some* name (a good candidate > would be mojibake, e.g. decode the bytes with latin-1, but that is not > the only possibility). That's not even a possibility, actually. Technically, Latin-1 has a "hole" from U+0080 to U+009F. You need to add the C1 controls to fill in that gap. (I don't think it actually matters in practice, everybody seems to implement ISO-8859/1 as though it contained the control characters ... except when detecting encodings ... but it pays to be precise in these things ....) > Now already we can say that these three requirements mean that there > can be collisions -- for example a directory could have two entries, > one of which is not a valid encoding in the locale, and whatever > unicode string we invent to name it with in order to satisfy > requirements 3 (no file left behind) and 1 (unicode) might happen to > be the same as the (correctly-encoded) name of the other file. This is false with rather high probability, but you need some extra structure to deal with it. First, claim the Unicode private planes for Tahoe. Then allocate characters from the private planes on demand as encountered, *including* such characters encountered in external file names to be stored in Tahoe *and* the surrogates used by PEP 383. "Display names" using these private characters would be valid Unicode, but not very useful. However, an algorithmically generated font (like the 4-hex-digit-square used to give a glyph to unknown code points in the BMP) could be used by those who care. Also store mappings from (system encoding, UTF-8b representation) to private char and back. For simplicity, that could be global on your server (IIRC, there are at least two private planes up there, so you'd need to run into almost 128Ki *unique* such characters to run out). 
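A rough sketch of such a registry (purely illustrative; the class and its names are invented, and a real one would have to be persistent and shared):

    class PrivateCharRegistry:
        # Private-use code points of planes 15 and 16, excluding the
        # noncharacters at the end of each plane: 2 * 65534, i.e. almost 128Ki.
        PLANES = ((0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))

        def __init__(self):
            self.by_key = {}    # (encoding, original bytes) -> private-use character
            self.by_char = {}   # private-use character -> (encoding, original bytes)

        def char_for(self, encoding, raw_bytes):
            key = (encoding, raw_bytes)
            if key not in self.by_key:
                n = len(self.by_key)
                for lo, hi in self.PLANES:
                    if n <= hi - lo:
                        ch = chr(lo + n)
                        break
                    n -= hi - lo + 1
                else:
                    raise RuntimeError('private-use planes exhausted')
                self.by_key[key] = ch
                self.by_char[ch] = key
            return self.by_key[key]

        def bytes_for(self, ch):
            """Reverse mapping, e.g. for writing the original bytes back out."""
            return self.by_char[ch]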
I guess you'd be subject to a DOS attack where somebody decided to map all of 80000-odd CNS characters into private space, and then write 80000 files, each with a different 1-character name .... Note that Martin does *not* do this in PEP 383 because PEP 383 only cares about the semantics that a filename read from a directory can be used to access the file associated with it in that directory. For that, a private, non-Unicode encoding is perfectly acceptable. But you want valid Unicode. This scheme gives it to you. The registry of characters is somewhat unpleasant, but it does allow you to detect filenames that are the same reliably. > Possible Requirement 4 (faithful bytes if not unicode, a.k.a. > "round-tripping"): PEP 383 gives you this, but you must store the encoding used for each such file name. > One reason to be skeptical is that about a third of the Russian > files will happen to decode cleanly as shift-jis anyway, and will > therefore come out as something entirely different if the target > filesystem's encoding is something other than shift-jis. The only way to handle this is to store the encoding used to convert to Unicode as part of *every* file's metadata. This could be also used in Tahoe to warn the user that the current system encoding does not match the alleged_encoding used to make the backup. Some users might prefer to use the alleged_encoding on restore. > But an even worse problem -- the show-stopper for me -- is that I > don't want what Tahoe shows when you do "tahoe ls" or view it in a > web browser to differ from what it writes out when you do "tahoe cp > -r tahoe: newfiles/". But as a requirement, that's incoherent. What you are "seeing" is Unicode, what it will write out is bytes. That means that if multiple locales are in use on both the backup and restore systems, and the nominal system encodings are different, people whose personal default locales are not the same as the system's will see what they expect on the backup system (using system ls), mojibake on Tahoe (using tahoe ls), and *different* mojibake on the restore system (system ls, again). Note that "use Tahoe, not system, ls" doesn't help at all (unless the weirdo has learned to read mojibake, which actually does happen, but it's not worth betting on). How likely is that? Hate to tell you this: if you need the "unknown bytes scheme at all, this scenerio is *extremely* likely. How do you think that KOI8-R got into a directory on a Shift-JIS system in the first place? Yup, a Russian visiting professor in Tokyo who set his personal locale to ru_RU.KOI8-R wrote it there. And he's very likely to have the same personal locale on a very up-to-date system with a UTF-8 system encoding when he gets back to Moscow. Bingo! it's mojibake all the way to Moscow. > Now about the "metadata" part which is separate from the filename > itself. I have another requirement: > > Requirement 5 (no loss of information): I don't want Tahoe to destroy > information -- every transformation should be (in principle) > reversible by some future computer-augmented archaeologist. For > example, if a bytestring decodes cleanly with the locale's suggested > encoding, and we use the resulting unicode as the filename, then we > also store the original byte string in the metadata since we don't > know if the locale's suggested encoding was good. UTF-8b would be just as good for storing the original bytestring, as long as you keep the original encoding. 
It's actually probably preferable if PEP 383 can be assumed to be implemented in the versions of Python you use. > This allows the later invention of a tool It will be called "Emacs", by the way. > which shows the user what the filename would > have been with other encodings and let the user choose one that makes > sense. > To copy an entry from a local filesystem into Tahoe: > > 1. On Windows or Mac read the filename with the unicode APIs. > Normalize the string with filename = unicodedata.normalize('NFC', > filename). Leave the "original_bytes" key and the "failed_decode" flag > out of the metadata. NFD is probably better for fuzzy matching and display on legacy terminals. > 2. On Linux or Solaris read the filename with the string APIs, and > store the result in the "original_bytes" part of the metadata. Call > sys.getfilesystemencoding() to get an alleged_encoding. Then, call > bytes.decode(alleged_encoding, 'strict') to try to get a unicode > object. > > 2.a. If this decoding succeeds then normalize the unicode filename > with filename = unicodedata.normalize('NFC', filename), store the > resulting filename and leave the "failed_decode" flag out of the > metadata. Per the koi8-lucky example, you don't know if it succeeded for the right reason or the wrong reason. You really should store the alleged_encoding used in the metadata, always. Note that you should *also* store the failed_decode flag, because the presence of multiple fail_decodes is a very strong indication that some of the users had default encoding != system encoding. If you use the scheme I propose above, of course you have the same information by scanning the file name for Tahoe-only private use characters, but that would be relatively expensive. > 2.b. If this decoding fails, then we decode it again with > bytes.decode('latin-1', 'strict'). Do not normalize it. Store the > resulting unicode object into the "filename" part, set the > "failed_decode" flag to True. This is mojibake! Not necessarily. Most ISO-8859/X names will fail to decode if the alleged_encoding is UTF-8, for example, but many (even for X != 1) will be correctly readable because of the policy of trying to share code points across Latin-X encodings. Certainly ISO-8859/1 (and much ISO-8859/15) will be correct. > 3. (handling collisions) In either case 2.a or 2.b the resulting > unicode string may already be present in the directory. If so, check > the failed_decode flags on the current entry and the new entry. If > they are both set or both unset then the new entry overwrites the old > entry -- they had the same name. If both are set, you're OK, because you are forcing ISO-8859/1. If both are unset, however, you don't know for sure because alleged_encoding is not necessarily a constant. > To copy an entry from Tahoe into a local filesystem: > > Always use the Python unicode API. The original_bytes field and the > failed_decode field in the metadata are not consulted. > > Now a question for python-dev people: could utf-8b or PEP 383 be > useful for requirements like the four requirements listed above? If > not, what requirements does PEP 383 help with? By giving you a standard, invertible way to represent anything that the OS can throw at you, it helps with all of them. > I'm not sure that it can help if you are going to store the results > of your os.listdir() persistently or if you are going to transmit > them over a network. Indeed, using the results that way could lead > to unpleasant surprises. 
No more than any other system for giving a canonical Unicode spelling to the results of an OS call. From l.mastrodomenico at gmail.com Sun May 3 15:29:27 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Sun, 3 May 2009 15:29:27 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <49FD5300.6010906@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> Message-ID: 2009/5/3 "Martin v. Löwis" : > With issue 3672 resolved, it is now unnecessary to introduce > an utf-8b codec, since the utf-8 codec will properly report errors > for all byte sequences invalid in UTF-8, including lone surrogates. > Therefore, utf-8b can be implemented solely through the error handler. That's even nicer. One minor detail though, in the sentence: "non-decodable bytes >128 will be represented as lone half surrogate" ">" should be ">=". -- Lino Mastrodomenico From solipsis at pitrou.net Sun May 3 15:43:06 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 3 May 2009 13:43:06 +0000 (UTC) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler References: <49FD5300.6010906@v.loewis.de> Message-ID: Martin v. Löwis <martin at v.loewis.de> writes: > > Glenn Linderman suggested that the name "python-escape" is not very > descriptive, so I've changed the name to "utf8b". If the error handler is supposed to be used for codecs other than utf-8, perhaps it should renamed something more generic, e.g. "surrogate-escape"? Also, if utf8-b is not provided as a codec, will there be an easy way for user code to use the same encoding as the IO layer does? (e.g. os.fsdecode/os.fsencode)? From ncoghlan at gmail.com Sun May 3 17:09:47 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 4 May 2009 01:09:47 +1000 Subject: [Python-Dev] multi-with statement In-Reply-To: References: <3d0cebfb0905021226y501a5990q5b3ccc016255cdef@mail.gmail.com> Message-ID: <972FCC04-5F53-4098-8AFA-FC70CDF55BEB@gmail.com> (I still don't really have net access back after moving house - just chiming in briefly via my mobile) Anyway, I think there is one very good reason for NOT defining a multi-with statement in terms of an existing tuple: it gains us nothing except speed over contextlib.nested. The whole point of the new syntactic support is to execute each expression inside the context of the preceding managers. That requirement precludes the idea of using an intermediate tuple, since every expression would have to be evaluated before the tuple could be created. I'm still not 100% convinced the saving in indentation levels due to this change would be worth the increase in complexity and ambiguity though. -- Nick Coghlan, Brisbane, Australia On 03/05/2009, at 6:12 AM, Georg Brandl wrote: > Fredrik Johansson schrieb: >> On Sat, May 2, 2009 at 9:01 PM, Georg Brandl >> wrote: >>> Hi, >>> >>> this is just a short notice that Mattias Brändström and I have >>> finished a >>> patch to implement the previously discussed and mostly warmly >>> welcomed >>> extension to with's syntax, allowing >>> >>> with A() as a, B() as b: >>> >>> to be written instead of >>> >>> with A() as a: >>> with B() as b: > >> I was hoping for the other syntax in order to be able to create a >> nested context in advance as a simple tuple: >> >> with A, B: >> pass >> >> context = A, B >> with context: >> pass >> >> (I.e. a tuple, or perhaps any iterable, would be a valid context >> manager.) > > I see; you want to construct your context manager programmatically > and pass > it to "with" without knowing what is in there.
> > While this would be possible, we have to be aware that with this we > would > effectively change the context manager protocol, rather like the > iterator > protocol's __getitem__ alternate realization. This muddies the > definition > of a context manager. > > (The interesting thing is that you could already implement *that* > version > without any new syntactic support, by giving tuples an __enter__/ > __exit__ > method pair.) > >> With the syntax in the patch, I will still have to implement a custom >> nesting context manager to do this, which sort of defeats the >> purpose. > > Not really. Having an unknown number of stacked context managers is > not > the purpose -- for that, I'd still say a custom nesting context > manager > is better, because it is also more explicit when created not at the > "with" > site. (You could even write it as a tuple subclass, if you like the > tuple > interface.) > > Georg > > -- > Thus spake the Lord: Thou shalt indent with four spaces. No more, no > less. > Four shall be the number of spaces thou shalt indent, and the number > of thy > indenting shall be four. Eight shalt thou not indent, nor either > indent thou > two, excepting that thou then proceed to four. Tabs are right out. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com From murman at gmail.com Sun May 3 17:35:16 2009 From: murman at gmail.com (Michael Urman) Date: Sun, 3 May 2009 10:35:16 -0500 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> Message-ID: On Sun, May 3, 2009 at 08:43, Antoine Pitrou wrote: > Also, if utf8-b is not provided as a codec, will there be an easy way for user > code to use the same encoding as the IO layer does? (e.g. > os.fsdecode/os.fsencode)? I like the idea of fsencode/fsdecode functions, but we need to be careful deciding what they accept and produce on Windows. I'd expect them to be identity functions, but then the difference in platform behavior suggests perhaps they should be in os.path. Unicode to Unicode on Windows would further mean fsencode wouldn't be useful for sending filenames over sockets, and "utf8" will be prone to exceptions on the very names we're trying to support right now. Is there an advantage to not providing the the "utf8b" behavior as a registered codec? -- Michael Urman From martin at v.loewis.de Sun May 3 19:32:47 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 03 May 2009 19:32:47 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> Message-ID: <49FDD53F.9080101@v.loewis.de> > That's even nicer. One minor detail though, in the sentence: > > "non-decodable bytes >128 will be represented as lone half surrogate" > > ">" should be ">=". Thanks, fixed. Martin From martin at v.loewis.de Sun May 3 19:39:41 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 03 May 2009 19:39:41 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> Message-ID: <49FDD6DD.6050808@v.loewis.de> > If the error handler is supposed to be used for codecs other than utf-8, > perhaps it should renamed something more generic, e.g. "surrogate-escape"? Perhaps. 
However, utf-8b doesn't really have to do anything with utf-8 - it's an algorithm based on 16-bit or 32-bit code points. > Also, if utf8-b is not provided as a codec, will there be an easy way for user > code to use the same encoding as the IO layer does? s.encode(os.getfilesystemencoding(), "utf8b") will do just that (in fact, that's exactly what the IO layer does). Regards, Martin From greg at krypto.org Sun May 3 21:20:07 2009 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 3 May 2009 12:20:07 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <49FDD6DD.6050808@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <49FDD6DD.6050808@v.loewis.de> Message-ID: <52dc1c820905031220l2f0671b0u425660b85e20d12f@mail.gmail.com> On Sun, May 3, 2009 at 10:39 AM, "Martin v. L?wis" wrote: > > If the error handler is supposed to be used for codecs other than utf-8, > > perhaps it should renamed something more generic, e.g. > "surrogate-escape"? > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - > it's an algorithm based on 16-bit or 32-bit code points. To me that lack of relationship with utf8 suggests that it should not be called utf8b... But I don't have any good suggestions. > > > Also, if utf8-b is not provided as a codec, will there be an easy way for > user > > code to use the same encoding as the IO layer does? > > s.encode(os.getfilesystemencoding(), "utf8b") will do just that (in > fact, that's exactly what the IO layer does). > > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Sun May 3 22:27:59 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 03 May 2009 22:27:59 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <52dc1c820905031220l2f0671b0u425660b85e20d12f@mail.gmail.com> References: <49FD5300.6010906@v.loewis.de> <49FDD6DD.6050808@v.loewis.de> <52dc1c820905031220l2f0671b0u425660b85e20d12f@mail.gmail.com> Message-ID: <49FDFE4F.30200@v.loewis.de> > > If the error handler is supposed to be used for codecs other than > utf-8, > > perhaps it should renamed something more generic, e.g. > "surrogate-escape"? > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - > it's an algorithm based on 16-bit or 32-bit code points. > > > To me that lack of relationship with utf8 suggests that it should not be > called utf8b Perhaps. However, giving it that name was Markus Kuhn's choice - and while it may be confusing, it's (IMO) useful to be consistent with this background. Regards, Martin From greg at krypto.org Sun May 3 23:11:51 2009 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 3 May 2009 14:11:51 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <49FDFE4F.30200@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <49FDD6DD.6050808@v.loewis.de> <52dc1c820905031220l2f0671b0u425660b85e20d12f@mail.gmail.com> <49FDFE4F.30200@v.loewis.de> Message-ID: <52dc1c820905031411x488c7d51u4f068a9d419b0318@mail.gmail.com> On Sun, May 3, 2009 at 1:27 PM, "Martin v. 
L?wis" wrote: > > > If the error handler is supposed to be used for codecs other than > > utf-8, > > > perhaps it should renamed something more generic, e.g. > > "surrogate-escape"? > > > > Perhaps. However, utf-8b doesn't really have to do anything with > utf-8 - > > it's an algorithm based on 16-bit or 32-bit code points. > > > > > > To me that lack of relationship with utf8 suggests that it should not be > > called utf8b > > Perhaps. However, giving it that name was Markus Kuhn's choice - and > while it may be confusing, it's (IMO) useful to be consistent with this > background. > > Regards, > Martin > > Ah, right. My original searches for utf8b didn't turn up much but searching on his name turns some up. Good choice of name then. http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html http://bsittler.livejournal.com/10381.html http://hyperreal.org/~est/utf-8b/ -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Mon May 4 00:50:29 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 3 May 2009 17:50:29 -0500 Subject: [Python-Dev] yield from? In-Reply-To: <49FD4C05.2020301@canterbury.ac.nz> References: <1afaf6160905011827l132a0014o6b1032e20a08552c@mail.gmail.com> <49FD4C05.2020301@canterbury.ac.nz> Message-ID: <1afaf6160905031550h59af1bbaoc298b0f97f7c25c8@mail.gmail.com> 2009/5/3 Greg Ewing : > Benjamin Peterson wrote: >> >> What's the status of yield from? There's still a small window open for >> a patch to be checked into 3.1's branch. I haven't been following the >> python-ideas threads, so I'm not sure if it's ready yet. > > The PEP itself seems to have settle down, and is > awaiting a verdict from Guido. Guido is now on vacation until the 18th, so I think this will have to be deferred until 2.7/3.2. > > The prototype implementation doesn't quite match > the PEP in some of the fine details yet. Also > it's for 2.6 rather than 3.x; someone with more > knowledge of 3.x internals would be better placed > than me to convert it. -- Regards, Benjamin From jimjjewett at gmail.com Mon May 4 06:36:05 2009 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 4 May 2009 00:36:05 -0400 Subject: [Python-Dev] PEP 383 and GUI libraries Message-ID: (sent only to python-dev, as I am not a subscriber of tahoe-dev) Zooko wrote: > [Tahoe] currently uses utf-8 for its internal storage (note: nothing to > do with reading or writing files from external sources -- only for > storing filenames in the decentralized storage system which is > accessed by Tahoe clients), and we can't start putting non-utf-8-valid > sequences in the "filename" slot because other Tahoe clients would > then get a UnicodeDecodeError exception when trying to read those > directories. So what do you do when someone has an existing file whose name is supposed to be in utf-8, but whose actual bytes are not valid utf-8? If you have somehow solved that problem, then you're already done -- the PEP's encoding is a no-op on anything that isn't already invalid unicode. If you have not solved that problem, then those clients will already be getting a UnicodeDecodeError; all the PEP does is make it at least possible for them to recover. ... > Requirement 1 (unicode): Each filename that you see needs to be valid > unicode (it is stored internally in utf-8). (repeating) What does Tahoe do if this is violated? Do you throw an exception right there and not let them copy the file to tahoe? 
If so, then that same error correction means that utf8b will never differ from utf-8, and you have nothing to worry about. > Requirement 2 (faithful if unicode): Doesn't the PEP meet this? > Requirement 3 (no file left behind): Doesn't the PEP also meet this? I thought the concern was just that the name used would not be valid unicode, unless the original name was itself valid unicode. > Possible Requirement 4 (faithful bytes if not unicode, a.k.a. > "round-tripping"): Doesn't the PEP also support this? (Only) the invalid bytes get escaped and therefore must be unescaped, but the escapement is reversible. > 3. (handling collisions) In either case 2.a or 2.b the resulting > unicode string may already be present in the directory. This collision is what the use of half-surrogates (as the escape characters) avoids. Such collisions can't be present unless the data was invalid unicode, in which case it was the result of an escapement (unless something other than python is creating new invalid filenames). -jJ From larry at hastings.org Mon May 4 11:10:51 2009 From: larry at hastings.org (Larry Hastings) Date: Mon, 04 May 2009 02:10:51 -0700 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef Message-ID: <49FEB11B.2040304@hastings.org> I should have brought this up to python-dev before--sorry for being so slow. It's already in the tracker for a couple of days: http://bugs.python.org/issue5880 The idea: PyGetSetDef has this "void *closure" field that acts like a context pointer. You stick it in the PyGetSetDef, and it gets passed back to you when your getter or setter is called. It's a reasonable API design, but in practice you almost never need it. Meanwhile, it clutters up CPython, particularly typeobject.c; there are all these function calls that end with ", NULL);", just to satisfy the getter/setter prototype internally. Most of the time, the "closure" parameter is not only unused, it is skipped. PyGetSetDef definitions generally skip it, and often getter and setter implementations omit it. The "closure" was only actually *used* once in CPython, a silly use in Objects/longobject.c where it was abused as an integer value. And yes, I said "was": inspired by this discussion, Mark Dickinson removed this use in r72202 (trunk) and r72203 (py3k). So the "closure" field is now 100% unused in the python and py3k trunks. Mr. Dickinson also located an extension using the "closure" pointer, pyephem, which... *also* uses it to store an integer. Indeed, I have yet to see a use where someone stores a pointer in "closure". Anyone who needed functionality like this could roll it themselves with stub functions: PyObject *my_getter_with_context(PyObject *self, void *context) { /* ... */ } PyObject *my_getter_A(PyObject *self) { return my_getter_with_context(self, "A"); } PyObject *my_getter_B(PyObject *self) { return my_getter_with_context(self, "B"); } /* etc. */ (Although it'd make my example more realistic if "context" were an int!) So: you don't need it, it clutters up our code (particularly typeobject.c), and it adds overhead. The only good reason to keep it is backwards compatibility, which I admit is a fine reason. Whaddya think? To be honest I'd be surprised if you guys went for this. But I thought it was worth suggesting. 
/larry/ From eric at trueblade.com Mon May 4 13:37:33 2009 From: eric at trueblade.com (Eric Smith) Date: Mon, 04 May 2009 07:37:33 -0400 Subject: [Python-Dev] Changing float.__format__ Message-ID: <49FED37D.30906@trueblade.com> In issue 5920, Mark Dickinson raises an issue having to do with float.__format__ and how it handles the default format presentation type (that is, none of 'f', 'g', or 'e') versus how str() works on floats: http://bugs.python.org/issue5920 I agree with him that the current behavior is confusing and should be changed. I'm going to make this change, unless anyone objects. Please comment on the issue itself if you have any feedback. Eric. From dickinsm at gmail.com Mon May 4 14:13:25 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 4 May 2009 13:13:25 +0100 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <49FEB11B.2040304@hastings.org> References: <49FEB11B.2040304@hastings.org> Message-ID: <5c6f2a5d0905040513t42f167f9pf44d4a28d355df47@mail.gmail.com> On Mon, May 4, 2009 at 10:10 AM, Larry Hastings wrote: > So: you don't need it, it clutters up our code (particularly typeobject.c), > and it adds overhead. ?The only good reason to keep it is backwards > compatibility, which I admit is a fine reason. Presumably whoever added the context field had a reason for doing so. Does anyone remember what the intended use was? Trawling through the history, all I could find was this comment, attached to revision 23270: [Modified Thu Sep 20 21:45:26 2001 UTC (7 years, 7 months ago) by gvanrossum] """ Add optional docstrings to getset descriptors. Fortunately, there's no backwards compatibility to worry about, so I just pushed the 'closure' struct member to the back -- it's never used in the current code base (I may eliminate it, but that's more work because the getter and setter signatures would have to change.) """ Still, binary compatibility seems like a fairly strong reason not to remove the closure field. Mark From gregor.lingl at aon.at Mon May 4 16:33:58 2009 From: gregor.lingl at aon.at (Gregor Lingl) Date: Mon, 04 May 2009 16:33:58 +0200 Subject: [Python-Dev] turtle.py update for 3.1 Message-ID: <49FEFCD6.1040001@aon.at> Hi, Encouraged by a conversation with Martin at PyCon 2009 I've prepared a version 1.1b of the turtle module and I'd like to get some advice or assistance to get it into the beta as explained below. Thus I'd appreciate very much if also the release manager would take notice of this posting. python 2.0 had the version 1.0 and for now I'll give a terse summary of the changes I did: 1. a few bugfixes, with 1 - 5 lines of code changed for each; these concern bugs that prevented turtle to run correctly 2. I've added four methods to the class TurtleScreeenBase: _onkeypress(fun, key) (supplementing _onkeyrelease) mainloop() (which is now a Screen-method and a function) textinput(title, prompt) numinput(title, prompt, default, minval, maxval) the latter two remedy the complete lack of input methods _onkey, an internal method name is changed to _onkeyrelease 3. I've added one method to the class TurtleScreen: onkeypress(fun, key=None) implemented in analogy to the already present onkey() which got onkeyrelease as an alias. 4. I've changed several portions of the code that affect the representation of the turtleshape thus making it more compact (by removing some duplicated code) and more powerful, i. e. 
by adding the possibility to apply shearings to turtleshapes (in addition to the already present scaling and rotating transformations). Thus now the full range of (non singular) linear transformations is available. New methods in class RawTurtle: shearfactor(shear=None) set or get the shearfactor shapetransform(t11, t12, t21, t22) set or get the shape transform directly get_shapepoly() return the polygon of the current shape I've enhanced the functionality of tiltangle(angle=None) to contain also that of settiltangle and I propose to declare settiltangle as deprecated. 5. I've removed a lot of codelines that were commented out during the process of transferring the module from 2.6 to 3.0 6. I've implemented the bugfix for http://bugs.python.org/issue4117 according do my proposition there and I strongly recommend this change again, as the bug described is very annoying, the fix is easy and no one proposed a better solution. 7. I've tested the present version 1.1 extensivly. It runs all the demo scripts without problems and many others too (some of them significantly better than version 1.1). I'd like to add two additional scripts to the demo directory, one of them using new features so it only runs with this new version. I've *not* touched the issue of the Screen singleton, so that remains unchanged as it was as a result of Martins patch. Thus, as a summary, this update does some bugfixes and eliminates three deficiencies of the module: (1) accept keypress event, (2) provide user input functions and (3) complement scaling and rotating of turtleshapes by shearing, thus providing the full range of linear transforms. HOW TO PROCEED NOW? (1) Submit the new version as a single file (2) submit a unified diff containing all the changes (3) Divide the changes into several chunks of related changes and submit the according diffs separately That would pose the problems, that there are lines in the code that are affected by several changes, e. g. those lines that define __all__ And also: does the order of applying the patches matter? How do I have to account for this? (4) Some other approach? I'd appreciate to discuss open issues as needed and I'm prepared to give more elaborate explanations and rationales as wanted or as needed. Docs for the changes are (to a large extent) contained in the docstrings and I'm going to update the Documentation of the turtle module (on the basis of theses docstrings) now. Thanks in advance for your support Gregor From phd at phd.pp.ru Mon May 4 17:07:49 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Mon, 4 May 2009 19:07:49 +0400 Subject: [Python-Dev] PyPI copyright Message-ID: <20090504150749.GG16721@phd.pp.ru> http://pypi.python.org/pypi "Copyright ? 1990-2007, Python Software Foundation" :s/2007/2009/ Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From mail at apexo.de Mon May 4 17:28:54 2009 From: mail at apexo.de (Christian Schubert) Date: Mon, 4 May 2009 17:28:54 +0200 Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python Message-ID: <200905041728.55350.mail@apexo.de> Hi, Python ships with a profiler module which, unfortunately, is almost useless in a multi-threaded environment. * I've created an alternative profiler module which queries per-thread CPU usage via netlink/taskstats, which limits the applicability to Linux (which shouldn't be much of an issue, profiling is usually not done by end users). 
It implements two modes: a "sampling" (does CPU time accounting based on stack frames 100 times per second, by default) and a "deterministic" profiler (does CPU time accounting on each function call/return, based on the sys.setprofile interface). The deterministic profiler is currently implemented in pure python (except for taskstats interface) and much slower than the sampling profiler. Usage (don't forget to run make to build the C module): python >> from Profiler import * >> def f(): do_something() >> sampling_profiler(f) or >> deterministic_profiler(f) Output is currently in the form of annotated source code (xyz.py.html, in the same directory where xyz.py resides). Before the *_profiler function returns, it iterates over all code objects it encountered and annotates the source files with 2 columns in front: - 1st column: real time - 2nd column: CPU time numbers are log2(time_in_ns), colors are green-to-yellow for below-average and yellow-to-red for above-average metrics (relative to the average metric for all lines of the code object with a metric > 0). Is there common need for such a module? Is it possible to have this included in the standard CPython distribution? Which functional changes (besides a modification of the annotation output which shouldn't spread its result all over the FS) would be required to get this included? Which non-functional changes would be required to get this included? Please direct traffic regarding this subject to pyprof-devel at lists.sourceforge.net (no I'm not subscribed to python-dev). SF project page: https://sourceforge.net/projects/pyprof/ git repository: git://pyprof.git.sourceforge.net/gitroot/pyprof Regards, Christian *) to be more exact there are at least three profiler modules: profile, cProfile, and hotshot, while I only tried (and failed) to use profile in a multi-threaded environment (by manually setting threading.profile to the profiling function), glancing at the source, I'm pretty sure that cProfile behaves similarly; I didn't test the hotshot module, but it does some other trade-offs (space-for-time), so I think that "pyprof" still adds some value -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. URL: From aahz at pythoncraft.com Mon May 4 17:56:04 2009 From: aahz at pythoncraft.com (Aahz) Date: Mon, 4 May 2009 08:56:04 -0700 Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python In-Reply-To: <200905041728.55350.mail@apexo.de> References: <200905041728.55350.mail@apexo.de> Message-ID: <20090504155604.GA21330@panix.com> On Mon, May 04, 2009, Christian Schubert wrote: > > Python ships with a profiler module which, unfortunately, is almost > useless in a multi-threaded environment. * > > I've created an alternative profiler module which queries per-thread > CPU usage via netlink/taskstats, which limits the applicability to > Linux (which shouldn't be much of an issue, profiling is usually > not done by end users). It implements two modes: a "sampling" (does > CPU time accounting based on stack frames 100 times per second, by > default) and a "deterministic" profiler (does CPU time accounting > on each function call/return, based on the sys.setprofile interface). The > deterministic profiler is currently implemented in pure python (except > for taskstats interface) and much slower than the sampling profiler. If you want to discuss this, please subscribe to python-ideas and repost your message.
Generally speaking, in order to include modules like this, they need to prove themselves over time and may require PEP approval. If you choose to move the discussion to python-ideas, it would help if you mention known uses of your module. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From fumanchu at aminus.org Mon May 4 18:15:24 2009 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 4 May 2009 09:15:24 -0700 Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python In-Reply-To: <200905041728.55350.mail@apexo.de> References: <200905041728.55350.mail@apexo.de> Message-ID: Christian Schubert wrote: > I've created an alternative profiler module which queries per-thread > CPU usage via netlink/taskstats, which limits the applicability to > Linux (which shouldn't be much of an issue, profiling is usually not > done by end users). One of the uses for a profiling module is to compare runs on various platforms. And please, stop perpetuating the myth that only end-users use anything but Linux. Robert Brewer fumanchu at aminus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From janssen at parc.com Mon May 4 18:19:26 2009 From: janssen at parc.com (Bill Janssen) Date: Mon, 4 May 2009 09:19:26 PDT Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python In-Reply-To: <200905041728.55350.mail@apexo.de> References: <200905041728.55350.mail@apexo.de> Message-ID: <38623.1241453966@parc.com> Hi, Christian. Christian Schubert wrote: > I've created an alternative profiler module which queries per-thread > CPU usage via netlink/taskstats, which limits the applicability to > Linux (which shouldn't be much of an issue, profiling is usually not > done by end users). A surprisingly large # of developers are running on OS X these days, though. I suggest make it work there, too. Bill From larry at hastings.org Mon May 4 19:08:12 2009 From: larry at hastings.org (Larry Hastings) Date: Mon, 04 May 2009 10:08:12 -0700 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <5c6f2a5d0905040513t42f167f9pf44d4a28d355df47@mail.gmail.com> References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905040513t42f167f9pf44d4a28d355df47@mail.gmail.com> Message-ID: <49FF20FC.2060202@hastings.org> Mark Dickinson wrote: > Still, binary compatibility seems like a fairly strong reason not to > remove the closure field. My understanding is that there a) 2.x extension modules are not binary compatible with 3.x, and b) there are essentially no 3.x extension modules in the field. Is that accurate? If we don't have an installed base (yet) to worry about, now's the time to make this change. /larry/ From amauryfa at gmail.com Mon May 4 19:17:15 2009 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 4 May 2009 19:17:15 +0200 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <49FF20FC.2060202@hastings.org> References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905040513t42f167f9pf44d4a28d355df47@mail.gmail.com> <49FF20FC.2060202@hastings.org> Message-ID: Hi, Larry Hastings wrote: > > Mark Dickinson wrote: >> >> Still, binary compatibility seems like a fairly strong reason not to >> remove the closure field. > > My understanding is that there a) 2.x extension modules are not binary > compatible with 3.x, and b) there are essentially no 3.x extension modules > in the field. ?Is that accurate? 
?If we don't have an installed base (yet) > to worry about, now's the time to make this change. cx_Oracle at least uses this closure field, and has already been ported to 3.x: http://www.google.com/codesearch?q=Connection_SetOCIAttr+trunk -- Amaury Forgeot d'Arc From larry at hastings.org Mon May 4 21:04:55 2009 From: larry at hastings.org (Larry Hastings) Date: Mon, 04 May 2009 12:04:55 -0700 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905040513t42f167f9pf44d4a28d355df47@mail.gmail.com> <49FF20FC.2060202@hastings.org> Message-ID: <49FF3C57.6030106@hastings.org> Amaury Forgeot d'Arc wrote: > Larry Hastings wrote: > >> My understanding is that there a) 2.x extension modules are not binary >> compatible with 3.x, and b) there are essentially no 3.x extension modules >> in the field. Is that accurate? If we don't have an installed base (yet) >> to worry about, now's the time to make this change. >> > cx_Oracle at least uses this closure field, and has already been ported to 3.x: > http://www.google.com/codesearch?q=Connection_SetOCIAttr+trunk And they're using it as a pointer, too! Nice to see it not abused for once. If it helps, I volunteer to port cx_Oracle to the new PyGetSetDef if my patch is accepted. The resulting code would be backwards-compatible with Python 3.0, so it could be incorporated immediately. Given the lack of interest in the proposal so far, this is an easy vow to make! /larry/ From daniel at stutzbachenterprises.com Mon May 4 21:11:06 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 4 May 2009 14:11:06 -0500 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <49FEB11B.2040304@hastings.org> References: <49FEB11B.2040304@hastings.org> Message-ID: On Mon, May 4, 2009 at 4:10 AM, Larry Hastings wrote: > So: you don't need it, it clutters up our code (particularly typeobject.c), > and it adds overhead. The only good reason to keep it is backwards > compatibility, which I admit is a fine reason. > If you make the change, will 3rd party code that relies on it fail in unexpected ways, or will they just get a compile error? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon May 4 21:52:02 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 4 May 2009 20:52:02 +0100 Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python In-Reply-To: <38623.1241453966@parc.com> References: <200905041728.55350.mail@apexo.de> <38623.1241453966@parc.com> Message-ID: <79990c6b0905041252p42650f89s90a1fe1da284b556@mail.gmail.com> 2009/5/4 Bill Janssen : > Hi, Christian. > > Christian Schubert wrote: > >> I've created an alternative profiler module which queries per-thread >> CPU usage via netlink/taskstats, which limits the applicability to >> Linux (which shouldn't be much of an issue, profiling is usually not >> done by end users). > > A surprisingly large # of developers are running on OS X these days, > though. ?I suggest make it work there, too. And Windows. I doubt that the various Windows-specific modules available were developed on Linux. And I wouldn't assume that all of the platform-neutral modules are developed on Linux, or even that the developers have access to Linux. (I know I don't, short of building a brand new virtual machine...) Paul. 
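For what it's worth, the portability limit in Christian's module sits in the netlink/taskstats CPU accounting, not in the hook itself: the "deterministic" mode he describes is driven by the interpreter's per-call profiling hook (presumably sys.setprofile), which is available on every platform. A minimal sketch of just that mechanism, charging wall-clock rather than per-thread CPU time and using purely illustrative names:

    # Minimal sketch of a call/return-based ("deterministic") profiler built
    # on sys.setprofile.  This is not Christian's pyprof: it charges
    # wall-clock time per code object instead of per-thread CPU time from
    # taskstats, so it runs on any platform.
    import sys
    import time
    from collections import defaultdict

    class TinyDeterministicProfiler:
        def __init__(self):
            self.totals = defaultdict(float)   # code object -> inclusive seconds
            self._stack = []                   # (code, start_time) pairs

        def _hook(self, frame, event, arg):
            now = time.time()
            if event == "call":
                self._stack.append((frame.f_code, now))
            elif event == "return" and self._stack:
                code, started = self._stack.pop()
                self.totals[code] += now - started

        def run(self, func, *args, **kwargs):
            sys.setprofile(self._hook)
            try:
                return func(*args, **kwargs)
            finally:
                sys.setprofile(None)

    def do_something():
        return sum(i * i for i in range(100000))

    prof = TinyDeterministicProfiler()
    prof.run(do_something)
    for code, seconds in sorted(prof.totals.items(), key=lambda item: -item[1]):
        print("%-20s %.6fs" % (code.co_name, seconds))
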
From dickinsm at gmail.com Mon May 4 22:00:23 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 4 May 2009 21:00:23 +0100 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: References: <49FEB11B.2040304@hastings.org> Message-ID: <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> On Mon, May 4, 2009 at 8:11 PM, Daniel Stutzbach wrote: > If you make the change, will 3rd party code that relies on it fail in > unexpected ways, or will they just get a compile error? I *think* that third party code that's recompiled for 3.1 and that doesn't use the closure field will either just work, or will produce an easily-fixed compile error. Larry, does this sound right? But I guess the bigger issue is that extensions already compiled against 3.0 that use PyGetSetDef (even if they don't make use of the closure field) won't work with 3.1 without a recompile: they'll segfault, or otherwise behave unpredictably. If that's not considered a problem, then surely we ought to be getting rid of tp_reserved? Mark From daniel at stutzbachenterprises.com Mon May 4 22:07:50 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 4 May 2009 15:07:50 -0500 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> Message-ID: On Mon, May 4, 2009 at 3:00 PM, Mark Dickinson wrote: > But I guess the bigger issue is that extensions already compiled against > 3.0 > that use PyGetSetDef (even if they don't make use of the closure field) > won't work with 3.1 without a recompile: they'll segfault, or otherwise > behave > unpredictably. > I was under the impression that binary compatibility was only guaranteed within a minor revision (e.g., 2.6.1 must run code compiled for 2.6.0, but 2.7.0 doesn't have to). I've been wrong before, though. Certainly the C extension module I maintain is sprinkled with #ifdef's so it will compile under 2.5, 2.6, and 3.0. ;-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon May 4 22:15:21 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 4 May 2009 20:15:21 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Proposed=3A_drop_unnecessary_=22context=22?= =?utf-8?q?_pointer_from=09PyGetSetDef?= References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> Message-ID: Mark Dickinson gmail.com> writes: > > I *think* that third party code that's recompiled for 3.1 and that > doesn't use the closure field will either just work, or will produce an > easily-fixed compile error. Larry, does this sound right? This doesn't sound right. The functions in the third party code will get compiled with the wrong signature, so they can crash (or behave unexpectedly) when called by Python. 
From dickinsm at gmail.com Mon May 4 22:18:20 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 4 May 2009 21:18:20 +0100 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> Message-ID: <5c6f2a5d0905041318x504b83f5re90cafe5db099c89@mail.gmail.com> On Mon, May 4, 2009 at 9:15 PM, Antoine Pitrou wrote: > Mark Dickinson gmail.com> writes: >> >> I *think* that third party code that's recompiled for 3.1 and that >> doesn't use the closure field will either just work, or will produce an >> easily-fixed compile error. ?Larry, does this sound right? > > This doesn't sound right. The functions in the third party code will get > compiled with the wrong signature, so they can crash (or behave unexpectedly) > when called by Python. Yes, of course the signature of the getters and setters changes. Please ignore me. :-) Mark From larry at hastings.org Mon May 4 22:29:19 2009 From: larry at hastings.org (Larry Hastings) Date: Mon, 04 May 2009 13:29:19 -0700 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> Message-ID: <49FF501F.9040503@hastings.org> Mark Dickinson wrote: > I *think* that third party code that's recompiled for 3.1 and that > doesn't use the closure field will either just work, or will produce an > easily-fixed compile error. Larry, does this sound right? > Yep. > But I guess the bigger issue is that extensions already compiled against 3.0 > that use PyGetSetDef (even if they don't make use of the closure field) > won't work with 3.1 without a recompile: they'll segfault, or otherwise behave > unpredictably. > Well, I think they'd work if they didn't use the closure and they had only one entry in their array of PyGetSetDefs. But more than one, and yes it would behave unpredictably. Probably segfault. > If that's not considered a problem, then surely we ought to be getting rid of > tp_reserved? In principle they are equivalent, but in practice removing tp_reserved is a much bigger change. Removing the closure field would result in obvious compile errors, and plenty of folks wouldn't even experience those. Removing tp_reserved would affect everybody, with inscrutable compiler errors. Personally I'd be up for removing tp_reserved. But I lack the caution regarding backwards compatibility that has served Python so well, so you're ill-advised to listen to me. Daniel Stutzbach wrote: > I was under the impression that binary compatibility was only > guaranteed within a minor revision (e.g., 2.6.1 must run code compiled > for 2.6.0, but 2.7.0 doesn't have to). I've been wrong before, though. My understanding is that that's the explicit guarantee. However Python has been well-served by being much more cautious than that, a policy with which I cannot find fault. > Certainly the C extension module I maintain is sprinkled with #ifdef's > so it will compile under 2.5, 2.6, and 3.0. ;-) Happily this is one change where you could maintain backwards compatibility without #ifdefs. If you use the closure field, change your code to use stub functions and pass the closure data in yourself. /larry/ From greg at krypto.org Tue May 5 00:42:15 2009 From: greg at krypto.org (Gregory P. 
Smith) Date: Mon, 4 May 2009 15:42:15 -0700 Subject: [Python-Dev] turtle.py update for 3.1 In-Reply-To: <49FEFCD6.1040001@aon.at> References: <49FEFCD6.1040001@aon.at> Message-ID: <52dc1c820905041542k365221d8t41d324ee5a169724@mail.gmail.com> On Mon, May 4, 2009 at 7:33 AM, Gregor Lingl wrote: > Hi, > > Encouraged by a conversation with Martin at PyCon 2009 > I've prepared a version 1.1b of the turtle module and I'd like to > get some advice or assistance to get it into the beta as explained > below. Thus I'd appreciate very much if also the release manager > would take notice of this posting. > > python 2.0 had the version 1.0 and for now I'll give a terse > summary of the changes I did: > > 1. a few bugfixes, with 1 - 5 lines of code changed for each; > these concern bugs that prevented turtle to run correctly > > 2. I've added four methods to the class TurtleScreeenBase: > _onkeypress(fun, key) (supplementing _onkeyrelease) > mainloop() (which is now a Screen-method and a function) > textinput(title, prompt) > numinput(title, prompt, default, minval, maxval) > the latter two remedy the complete lack of input methods > > _onkey, an internal method name is changed to _onkeyrelease > > 3. I've added one method to the class TurtleScreen: > onkeypress(fun, key=None) implemented in analogy to the already > present onkey() > which got onkeyrelease as an alias. > > 4. I've changed several portions of the code that affect > the representation of the turtleshape thus making it > more compact (by removing some duplicated code) and more > powerful, i. e. by adding the possibility to apply > shearings to turtleshapes (in addition to the already present > scaling and rotating transformations). Thus now the full > range of (non singular) linear transformations is available. > > New methods in class RawTurtle: > shearfactor(shear=None) set or get the shearfactor > shapetransform(t11, t12, t21, t22) > set or get the shape transform directly > get_shapepoly() return the polygon of the current shape > > I've enhanced the functionality of tiltangle(angle=None) > to contain also that of settiltangle and I propose to > declare settiltangle as deprecated. > 5. I've removed a lot of codelines that were commented out > during the process of transferring the module from 2.6 > to 3.0 > > 6. I've implemented the bugfix for http://bugs.python.org/issue4117 > according do my proposition there and I strongly > recommend this change again, as the bug described is very > annoying, the fix is easy and no one proposed a better > solution. > > 7. I've tested the present version 1.1 extensivly. It runs > all the demo scripts without problems and many others > too (some of them significantly better than version 1.1). > I'd like to add two additional scripts to the demo > directory, one of them using new features so it only runs > with this new version. > > I've *not* touched the issue of the Screen singleton, so that > remains unchanged as it was as a result of Martins patch. > > Thus, as a summary, this update does some bugfixes and eliminates > three deficiencies of the module: (1) accept keypress event, > (2) provide user input functions and (3) complement scaling > and rotating of turtleshapes by shearing, thus providing > the full range of linear transforms. > > HOW TO PROCEED NOW? 
> > (1) Submit the new version as a single file > (2) submit a unified diff containing all the changes > (3) Divide the changes into several chunks of > related changes and submit the according diffs separately > That would pose the problems, that there are lines > in the code that are affected by several changes, > e. g. those lines that define __all__ > And also: does the order of applying the patches matter? > How do I have to account for this? > (4) Some other approach? I'm happy with option #1. If you find it reasonable to break things into mutliple changes, feel free to do it, but at this point the turtle module hasn't had a much love in ages so a large update in one commit is not a problem IMHO. > > > I'd appreciate to discuss open issues as needed and I'm > prepared to give more elaborate explanations and rationales > as wanted or as needed. > > Docs for the changes are (to a large extent) contained in the > docstrings and I'm going to update the Documentation of the > turtle module (on the basis of theses docstrings) now. > > Thanks in advance for your support > > Gregor > > > > > > > > > > > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue May 5 02:27:36 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 05 May 2009 12:27:36 +1200 Subject: [Python-Dev] Building types programmatically (was: drop unnecessary "context" pointer from PyGetSetDef) In-Reply-To: <49FF501F.9040503@hastings.org> References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> <49FF501F.9040503@hastings.org> Message-ID: <49FF87F8.7060201@canterbury.ac.nz> Larry Hastings wrote: > > Removing tp_reserved would affect everybody, with inscrutable > compiler errors. This would have to be considered in conjunction with the proposed programmatic type-building API, I think. I'd like to see a migration towards something like that, BTW. Recently I had occasion to do some work on a Ruby extension module, and I was struck by how much more pleasant it was to be able to create a class and add a few functions to it using calls, rather than having to wrestle with a huge static struct declaration. While I like the Python language better than Ruby, I think Ruby's extension API is ahead in this particular area. -- Greg From zookog at gmail.com Tue May 5 05:36:50 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Mon, 4 May 2009 21:36:50 -0600 Subject: [Python-Dev] PEP 383 and Tahoe [was: GUI libraries] In-Reply-To: <877i0yilah.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49F965DB.6050601@v.loewis.de> <49FB2596.1090706@v.loewis.de> <51167066-A162-4AAF-B40D-52C1918032D8@fuhm.net> <877i0yilah.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Thank you for sharing your extensive knowledge of these issues, SJT. On Sun, May 3, 2009 at 3:32 AM, Stephen J. Turnbull wrote: > Zooko O'Whielacronx writes: > > > However, it is moot because Tahoe is not a new system. It is > > currently at v1.4.1, has a strong policy of backwards- > > compatibility, and already has lots of data, lots of users, and > > programmers building on top of it. > > Cool! Thanks! 
Actually yes it is extremely cool that it really does this encryption, erasure-encoding, capability-based access control, and decentralized topology all in a fully functional, stable system. If you're interested in such stuff then you should definitely check it out! > Question: is there a way to negotiate versions, or better yet, > features? For the peer-to-peer protocol there is, but the persistent storage is an inherently one-way communication. A Tahoe client writes down information, and at a later point a Tahoe client, possibly of a different version, reads it. There is no way for the original writer to ask what versions or features the readers may eventually have. But, the writer can write down optional information which will be invisible to readers that don't know to look for it, but adding it into the "metadata" dictionary. For example: http://testgrid.allmydata.org:3567/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/?t=json renders the directory contents into json and results in this: "r\u00e9sum\u00e9.html": [ "filenode", { "mutable": false, "verify_uri": "URI:CHK-Verifier:63y4b5bziddi73jc6cmyngyqdq:5p7cxw7ofacblmctmjtgmhi6jq7g5wf77tx6befn2rjsfpedzkia:3:10:8328", "metadata": { "ctime": 1241365319.0695441, "mtime": 1241365319.0695441 }, "ro_uri": "URI:CHK:no2l46woyeri6xmhcrhhomgr5a:5p7cxw7ofacblmctmjtgmhi6jq7g5wf77tx6befn2rjsfpedzkia:3:10:8328", "size": 8328 } ], A new version of Tahoe writing entries like this is constrained to making the primary key (the filename) be a valid unicode string (if it wants older Tahoe clients to be able to read the directory at all). However, it is not constrained about what new keys it may add to the "metadata" dict, which is where we propose to add the "failed_decode" flag and the "original_bytes". > Well, it's a high-dimensional problem. Keeping track of all the > variables is hard. Well put. > That's why something like PEP 383 can be important > to you even though it's only a partial solution; it eliminates one > variable. Would that it were so! The possibility that PEP 383 could help me or other like me is why I am trying so hard to explain what kind of help I need. :-) > > Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux > > system and then you inspect the files in the Tahoe filesystem, > > such as by examining the web interface [1] or by running > > "tahoe ls", either of which you could do either from the same > > machine where you ran "tahoe cp" or from a different machine > > (which could be using any operating system). We have the > > following requirements about what ends up in your Tahoe directory > > after that cp -r. > > Whoa! Slow down! Where's "my" "Tahoe directory"? Do you mean the > directory listing? A copy to whatever system I'm on? The bytes that > the Tahoe host has just loaded into a network card buffer to tell me > about it? The bytes on disk at the Tahoe host? You'll find it a lot > easier to explain things if you adopt a precise, consistent > terminology. Okay here's some more detail. There exists a Tahoe directory, the bytes of which are encrypted, erasure-coded, and spread out over multiple Tahoe servers. (To the servers it is utterly opaque, since it is encrypted with a symmetric encryption key that they don't have.) A Tahoe client has the decryption key and it recovers the cleartext bytes. 
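To make the metadata idea concrete, here is a sketch of how a reader of the JSON rendering above might consume such entries. The "failed_decode" and "original_bytes" keys are the ones proposed in this thread; the base64 representation of the raw bytes and the placeholder child name are assumptions made here purely for illustration.

    # Sketch of a reader for the proposed metadata keys.  Assumes (for the
    # sake of the example) that original_bytes is carried as base64 in the
    # JSON rendering.
    import base64
    import json

    sample = json.loads("""
    {
      "r\\u00e9sum\\u00e9.html": [
        "filenode",
        {"metadata": {"ctime": 1241365319.0695441,
                      "mtime": 1241365319.0695441},
         "size": 8328}
      ],
      "badly_encoded_filename_#1": [
        "filenode",
        {"metadata": {"failed_decode": true,
                      "original_bytes": "culzdW3pLmh0bWw="},
         "size": 8328}
      ]
    }
    """)

    for childname, (_, info) in sample.items():
        meta = info.get("metadata", {})
        if meta.get("failed_decode"):
            raw = base64.b64decode(meta["original_bytes"])
            print("%-28s -> original bytes %r" % (childname, raw))
        else:
            print("%-28s -> decoded normally" % childname)
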
(Note: the internal storage format is not the json encoding shown above -- it is a custom format -- the json format above is what is produced to be exported through the API, and it serves as a useful example for e-mail discussions.) Then for each bytestring childname in the directory it decodes it with utf-8 to get the unicode childname. Does that all make sense? > > Requirement 1 (unicode): Each filename that you see needs to be valid > > unicode > > What does "see" mean? In directory listings? Yes, either with "tahoe ls", with a FUSE plugin, wht the web UI. Remove the trailing "?t=json" from the URL above to see an example. > Under what > circumstances, if any, can what I see be different from what I get? This a good question! In the previous iteration of the Tahoe design, you could sometimes get something from "tahoe cp" which is different from what you saw with "tahoe ls". In the current design -- http://allmydata.org/trac/tahoe/ticket/534#comment:66 , this is no longer the case, because we abandon the requirement to have "round-trip fidelity of bytes". > > Requirement 2 (faithful if unicode): For each filename (byte > > string) in your myfiles directory, > > My local myfiles directory, or my Tahoe myfiles directory? The local one. > > if that bytestring is the valid encoding of some string in your > > stated locale, > > Who stated the locale? How? Are you referring to what > getfilesystemencoding returns? This is a "(unicode) string", right? Yes, and yes. > > Requirement 3 (no file left behind): For each filename (byte > > string) in your myfiles directory, whether or not that byte > > string is the valid encoding of anything in your stated locale, > > then that file will be added into the Tahoe filesystem under > > *some* name (a good candidate would be mojibake, e.g. decode the > > bytes with latin-1, but that is not the only possibility). > > That's not even a possibility, actually. Technically, Latin-1 has a > "hole" from U+0080 to U+009F. You need to add the C1 controls to fill > in that gap. (I don't think it actually matters in practice, > everybody seems to implement ISO-8859/1 as though it contained the > control characters ... except when detecting encodings ... but it pays > to be precise in these things ....) Perhaps windows-1252 would be a better codec for this purpose? However it would be clearer for the purposes of this discussion, and also perhaps for actual users of Tahoe, if instead of decoding with windows-1252 in order to get a mojibake name, Tahoe would simply generate a name like "badly_encoded_filename_#1". Let's run with that. For clarity, assume that the arbitrary unicode filename that Tahoe comes up with is "badly_encoded_filename_#1". This doesn't change anything in this story. In particular it doesn't change the fact that there might already be an entry in the directory which is named "badly_encoded_filename_#1" even though it was *not* a badly encoded filename, but a correctly encoded one. > > Now already we can say that these three requirements mean that > > there can be collisions -- for example a directory could have two > > entries, one of which is not a valid encoding in the locale, and > > whatever unicode string we invent to name it with in order to > > satisfy requirements 3 (no file left behind) and 1 (unicode) > > might happen to be the same as the (correctly-encoded) name of > > the other file. > > This is false with rather high probability, but you need some extra > structure to deal with it. First, claim the Unicode private planes > for Tahoe. 
[snip on long and intriguin instructions to perform unicode magic that I don't understand] Wait, wait. What good would this do? The current plan is that if the filenames collide we increment the number at the end "#$NUMBER", if we are just naming them "badly_encoded_filename_#1", or that we append "~1" if we are naming them by mojibake. And the current plan is that the original bytes are saved in the metadata for future cyborg archaeologists. How would this complex unicode magic that I don't understand improve the current plan? Would it provide filenames that are more meaningful or useful to the users than the "badly_encoded_filename_#1" or the mojibake? > The registry of characters is somewhat unpleasant, but it does allow > you to detect filenames that are the same reliably. There is no server, so to implement such a registry we would probably have to include a copy of the registry inside each (encrypted, erasure-encoded) directory. > > Possible Requirement 4 (faithful bytes if not unicode, a.k.a. > > "round-tripping"): > > PEP 383 gives you this, but you must store the encoding used for each > such file name. Well, at this point this has become an anti-requirement because it causes the filename as displayed when examining the directory to be different from the filename that results when cp'ing the directory. Also I don't see why PEP 383's implementation of this would be better than the previous iteration of the design in which this was accomplished by simply storing the original bytes and then writing them back out again on demand, or the design before that in which this was accomplished by mojibake'ing the bytes (by decoding them with windows-1252) and setting a flag indicating that this has been done. I think I understand now that PEP 383 is better for the case that you can't store extra metadata (such as our failed_decode flag or our original_bytes), but you can ensure that the encoding that will be used later matches the one that was used for decoding now. Neither of these two criteria apply to Tahoe, and I suspect that neither of them apply to most uses other than the entirely local and non-persistent "for x in os.listdir(): open(x)". > > But an even worse problem -- the show-stopper for me -- is that I > > don't want what Tahoe shows when you do "tahoe ls" or view it in a > > web browser to differ from what it writes out when you do > > "tahoe cp -r tahoe: newfiles/". > > But as a requirement, that's incoherent. What you are "seeing" is > Unicode, what it will write out is bytes. In the new plan, we write the unicode filename out using Python's unicode filesystem APIs, so Python will attempt to encode it into the appropriate filesystem encoding (raising UnicodeEncodeError if it won't fit). > That means that if multiple > locales are in use on both the backup and restore systems, and the > nominal system encodings are different, people whose personal default > locales are not the same as the system's will see what they expect on > the backup system (using system ls), mojibake on Tahoe (using tahoe > ls), and *different* mojibake on the restore system (system ls, > again). Let's see... Tahoe is a user-space program and lets Python determine what the appropriate "sys.getfilesystemencoding()" is based on what the user's locale was at Python startup. So I don't think what you wrote above is correct. 
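A simplified model of the decode step being described, assuming the fallback is a numbered placeholder (one of the schemes floated in this thread) rather than mojibake; this is an illustration of the plan, not Tahoe's actual code, and the helper name is made up:

    # Try the local filesystem encoding first; fall back to a numbered
    # placeholder name when the bytes do not decode, bumping the counter if
    # the placeholder would collide with an existing entry.
    import sys

    def decode_child_name(raw, taken, encoding=None):
        encoding = encoding or sys.getfilesystemencoding()
        try:
            return raw.decode(encoding), False
        except UnicodeDecodeError:
            n = 1
            name = "badly_encoded_filename_#%d" % n
            while name in taken:
                n += 1
                name = "badly_encoded_filename_#%d" % n
            return name, True

    taken = set()
    # Encoding pinned to UTF-8 here only to make the demo deterministic.
    for raw in [b"r\xc3\xa9sum\xc3\xa9.html", b"r\xe9sum\xe9.html", b"\xff\xfe"]:
        name, failed = decode_child_name(raw, taken, encoding="utf-8")
        taken.add(name)
        print("%r -> %r%s" % (raw, name, "  (failed_decode)" if failed else ""))
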
I think that in the first transition, from source system to Tahoe, that either the name will be correctly transcoded (i.e., it looks the same to the user as long as the locale they are using to "look" at it, e.g. with "ls" or Nautilus or whatever is the same as the locale that was set when their Python process started up), or else it will be undecodable under their current locale and instead will be replaced with either mojibake or "badly_encoded_filename_#1". Hm, here is a good argument in favor of using mojibake to generate the arbitrary unicode name instead of naming it "badly_encoded_filename_#1": because that's probably what ls and Nautilus will show! Let me try that... Oh, cool, Nautilus and GNU ls both replace invalid chars with U+FFFD (like the 'replace' error handler does in Python's decode()) and append " (invalid encoding)" to the end. That sounds like an even better way to handle it than either mojibake or "badly_encoded_filename_#1", and it also means that it will look the same in Tahoe as it does in GNU ls and Nautilus. Excellent. On the next transition, from Tahoe to system, Tahoe uses the Python unicode API, which will attempt to encode the unicode filename into the local filesystem encoding and raise UnicodeEncodeError if it can't. > > Requirement 5 (no loss of information): I don't want Tahoe to > > destroy information -- every transformation should be (in > > principle) reversible by some future computer-augmented > > archaeologist. ... > UTF-8b would be just as good for storing the original bytestring, as > long as you keep the original encoding. It's actually probably > preferable if PEP 383 can be assumed to be implemented in the > versions of Python you use. It isn't -- Tahoe doesn't run on Python 3. Also Tahoe is increasingly interoperating with tools written in completely different languages. It is much easier for to tell all of those programmers (in my documentation) that in the filename slot is the (normal, valid, standard) unicode, and in the metadata slot there are the bytes than to tell them about utf-8b (which is not even implemented in their tools: JavaScript, JSON, C#, C, and Ruby). I imagine that it would be a deal-killer for many or most of them if I said they couldn't use Tahoe reliably without first implementing utf-8b for their toolsets. > > 1. On Windows or Mac read the filename with the unicode APIs. > > Normalize the string with filename = unicodedata.normalize('NFC', ... > NFD is probably better for fuzzy matching and display on legacy > terminals. I don't know anything about them, other than that Macintosh uses NFD and everything else uses NFC. Should I specify NFD? What are these "legacy terminals" of which you speak? Will NFD make it look better when I cat it to my vt102? (Just kidding -- I don't have one.) > Per the koi8-lucky example, you don't know if it succeeded for the > right reason or the wrong reason. You really should store the > alleged_encoding used in the metadata, always. Right -- got it. > > 2.b. If this decoding fails, then we decode it again with > > bytes.decode('latin-1', 'strict'). Do not normalize it. Store the > > resulting unicode object into the "filename" part, set the > > "failed_decode" flag to True. This is mojibake! > > Not necessarily. Most ISO-8859/X names will fail to decode if the > alleged_encoding is UTF-8, for example, but many (even for X != 1) > will be correctly readable because of the policy of trying to share > code points across Latin-X encodings. 
Certainly ISO-8859/1 (and > much ISO-8859/15) will be correct. Ah. What is the Japanese word for "word with some characters right and other characters mojibake!"? :-) > > Now a question for python-dev people: could utf-8b or PEP 383 be > > useful for requirements like the four requirements listed above? If > > not, what requirements does PEP 383 help with? > > By giving you a standard, invertible way to represent anything that > the OS can throw at you, it helps with all of them. So, it is invertible only if you can assume that the same encoding will be used on the second leg of the trip, right? Which you can do by writing down what encoding was used on this leg of the trip and forcing it to use the same encoding on the other leg. Except that we can't force that to happen on Windows at all as far as I understand, which is a show-stopper right there. But even if we could, this would require us to write down a bit of information and transmit it to the other side and use it to do the encoding. And if we are going to do that, why don't we just transmit the original bytes? Okay, maybe because that would roughly double the amount of data we have to transmit, and maybe we are stingy. But if we are stingy we could instead transmit a single added bit to indicate whether the name is normal or mojibake, and then use windows-1252 to stuff the bytes into the name. One of those options has the advantage of simplicity to the programmer ("There is the unicode, and there are the bytes."), and the other has the advantage of good compression. Both of them have the advantage that nobody involved has to understand and possibly implement a non-standard unicode hack. I'm trying not to be too pushy about this (heaven knows I've been completely wrong about things a dozen times in a row so far in this design process), but as far as I can understand it, PEP 383 can be used only when you can force the same encoding on both sides (the PEP says that encoding "only 'works' if the data get converted back to bytes with the python-escape error handler also"). That happens naturally when both sides are in the same Python process, so PEP 383 naturally looks good in that context. However, if the filenames are going to be stored persistently or transmitted over a network, then it seems simpler, easier, and more portable to use some other method than PEP 383 to handle badly encoded names. > > I'm not sure that it can help if you are going to store the results > > of your os.listdir() persistently or if you are going to transmit > > them over a network. Indeed, using the results that way could lead > > to unpleasant surprises. > > No more than any other system for giving a canonical Unicode spelling > to the results of an OS call. I think PEP 383 yields more surprises than the alternative of decoding with error handler 'replace' and then including the original bytes along with the unicode. During the course of this process I have also considered using two other mechanisms instead of decoding with error handler 'replace' -- mojibake using windows-1252 or a simple placeholder like "badly_encoded_filename_#1". Any of these three seem to be less surprising and similarly functional to PEP 383. I have to admit that they are not as elegant. Utf-8b is a really neat hack, and MvL's generalization of it to all unicode encodings is, too. I'm still being surprised by it after trying to understand it for many days now. 
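The handler discussed here as "utf-8b"/"python-escape" later shipped in Python 3.1 as the 'surrogateescape' error handler; assuming that handler, the round-trip property being weighed, and the way it breaks down once a strict encoder gets involved, look like this:

    # Round trip: undecodable bytes become lone low surrogates and come back
    # out unchanged only if the same handler is used on the way back out.
    raw = b"r\xe9sum\xe9.html"             # latin-1 bytes, not valid UTF-8

    name = raw.decode("utf-8", "surrogateescape")
    print(ascii(name))                      # 'r\udce9sum\udce9.html'

    assert name.encode("utf-8", "surrogateescape") == raw

    # A strict encoder refuses the same string, which is why such strings
    # must not leak into ordinary interchange.
    try:
        name.encode("utf-8")
    except UnicodeEncodeError as exc:
        print("strict encode fails:", exc.reason)
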
For example, what happens if you decode a filename with PEP 383, store that filename somewhere, and then later try to write a file under that name on Windows? If it only 'works' if the data get converted back to bytes with the python-escape error handler, then can you use the python-escape error handler when trying to, say, create a new file on Windows? Regards, Zooko From jmillikin at gmail.com Tue May 5 07:19:36 2009 From: jmillikin at gmail.com (John Millikin) Date: Mon, 4 May 2009 22:19:36 -0700 Subject: [Python-Dev] Undocumented change / bug in Python3's PyMapping_Check Message-ID: <3283f7fe0905042219r23113ca6ud6dd3840d7462f37@mail.gmail.com> In Python 2, PyMapping_Check will return 0 for list objects. In Python 3, it returns 1. Obviously, this makes it rather difficult to differentiate between mappings and other sized iterables. In addition, it differs from the behavior of the ``collections.Mapping`` ABC -- isinstance([], collections.Mapping) returns False. I believe the new behavior is erroneous, but would like to confirm that before filing a bug. The behavior can be seen from a C extension, or if you're lazy, using ctypes: Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import ctypes >>> ctypes.CDLL('libpython2.6.so').PyMapping_Check(ctypes.py_object([])) 0 Python 3.0.1+ (r301:69556, Apr 15 2009, 15:59:22) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import ctypes >>> ctypes.CDLL('libpython3.0.so').PyMapping_Check(ctypes.py_object([])) 1 From larry at hastings.org Tue May 5 09:24:38 2009 From: larry at hastings.org (Larry Hastings) Date: Tue, 05 May 2009 00:24:38 -0700 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <5c6f2a5d0905041318x504b83f5re90cafe5db099c89@mail.gmail.com> References: <49FEB11B.2040304@hastings.org> <5c6f2a5d0905041300qe500a21vc90b72382883236a@mail.gmail.com> <5c6f2a5d0905041318x504b83f5re90cafe5db099c89@mail.gmail.com> Message-ID: <49FFE9B6.4040609@hastings.org> Mark Dickinson wrote: >> This doesn't sound right. The functions in the third party code will get >> compiled with the wrong signature, so they can crash (or behave unexpectedly) >> when called by Python. >> > Yes, of course the signature of the getters and setters changes. Please > ignore me. :-) If they don't use the closure field, then either they won't compile due to type mismatches or they'll work fine. There's a lot of code in CPython that didn't need to be changed for my remove-closure patch; the functions didn't bother taking the "void * closure" that they were going to ignore anyway, and then they cast the function pointer in the PyGetSetDef to make the compiler shut up. Worked fine. And, in nearly all cases, the static PyGetSetDefs omit the closure member, which means C initializes them with a 0. /larry/ From mal at egenix.com Tue May 5 10:40:51 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 05 May 2009 10:40:51 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <49FDD6DD.6050808@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <49FDD6DD.6050808@v.loewis.de> Message-ID: <49FFFB93.7020105@egenix.com> On 2009-05-03 19:39, Martin v. L?wis wrote: >> If the error handler is supposed to be used for codecs other than utf-8, >> perhaps it should renamed something more generic, e.g. "surrogate-escape"? > > Perhaps. 
However, utf-8b doesn't really have to do anything with utf-8 - > it's an algorithm based on 16-bit or 32-bit code points. If the error handler doesn't have anything to do with UTF-8, then why do you use "utf8" in the name. Please use a more descriptive name for the handler which does not cause confusion with a existing codec. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 05 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 54 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tjreedy at udel.edu Tue May 5 10:57:03 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 05 May 2009 04:57:03 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <49FFFB93.7020105@egenix.com> References: <49FD5300.6010906@v.loewis.de> <49FDD6DD.6050808@v.loewis.de> <49FFFB93.7020105@egenix.com> Message-ID: M.-A. Lemburg wrote: > On 2009-05-03 19:39, Martin v. L?wis wrote: >>> If the error handler is supposed to be used for codecs other than utf-8, >>> perhaps it should renamed something more generic, e.g. "surrogate-escape"? >> Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - >> it's an algorithm based on 16-bit or 32-bit code points. > > If the error handler doesn't have anything to do with UTF-8, then why > do you use "utf8" in the name. > > Please use a more descriptive name for the handler which does not cause > confusion with a existing codec. Having already been confused, I agree. From eric at trueblade.com Tue May 5 11:13:58 2009 From: eric at trueblade.com (Eric Smith) Date: Tue, 05 May 2009 05:13:58 -0400 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <49FA4064.5000508@gmail.com> References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk> <49F8DBCD.6050504@trueblade.com> <49F9FCD0.80208@hastings.org> <49FA4064.5000508@gmail.com> Message-ID: <4A000356.30408@trueblade.com> Mark Hammond wrote: >> Is that enough consensus for it to go in? If so, are there any core >> developers who could help me get it in before the 3.1 feature freeze? >> The patch should be in good shape; it has unit tests and updated >> documentation. > > I've taken the liberty of explicitly CCing Martin just incase he missed > the thread with all the noise regarding PEP383. > > If there are no objections from Martin or anyone else here, please feel > free to assign it to me (and mail if I haven't taken action by the day > before the beta freeze...) Mark: I've reviewed this and it looks okay to me. It passes all the tests on Windows and Linux. But if you could take a look at it before the release tomorrow, I'd appreciate it. I feel good enough about it to check it in if no one else gets to it. Eric. From supreet.sethi at gmail.com Tue May 5 12:41:22 2009 From: supreet.sethi at gmail.com (s|s) Date: Tue, 5 May 2009 16:11:22 +0530 Subject: [Python-Dev] using help function in Py3k Message-ID: Hello, I Ran Python 3.0 for the first time. 
I used help() function and wrote "modules hash". It issues an error. Traceback (most recent call last): File "", line 1, in File "/home/ss/eproj/xapian/INST//lib/python3.0/site.py", line 427, in __call__ return pydoc.help(*args, **kwds) File "/home/ss/eproj/xapian/INST//lib/python3.0/pydoc.py", line 1675, in __call__ self.interact() File "/home/ss/eproj/xapian/INST//lib/python3.0/pydoc.py", line 1693, in interact self.help(request) File "/home/ss/eproj/xapian/INST//lib/python3.0/pydoc.py", line 1711, in help self.listmodules(request.split()[1]) File "/home/ss/eproj/xapian/INST//lib/python3.0/pydoc.py", line 1799, in listmodules apropos(key) File "/home/ss/eproj/xapian/INST//lib/python3.0/pydoc.py", line 1913, in apropos ModuleScanner().run(callback, key, onerror=onerror) File "/home/ss/eproj/xapian/INST//lib/python3.0/pydoc.py", line 1875, in run source = loader.get_source(modname) File "/home/ss/eproj/xapian/INST/lib/python3.0/pkgutil.py", line 293, in get_source self.source = self.file.read() File "/home/ss/eproj/xapian/INST//lib/python3.0/io.py", line 1720, in read decoder = self._decoder or self._get_decoder() File "/home/ss/eproj/xapian/INST//lib/python3.0/io.py", line 1506, in _get_decoder make_decoder = codecs.getincrementaldecoder(self._encoding) File "/home/ss/eproj/xapian/INST//lib/python3.0/codecs.py", line 960, in getincrementaldecoder decoder = lookup(encoding).incrementaldecoder LookupError: unknown encoding: uft-8 The reason for errors is test/ directory which has got tests for python parser are installed in Lib directory. I propose that these files should be installed by default in some other directory. Preferably in /share or /share/doc part of the tree. regards -- ~preet~ From aahz at pythoncraft.com Tue May 5 13:47:18 2009 From: aahz at pythoncraft.com (Aahz) Date: Tue, 5 May 2009 04:47:18 -0700 Subject: [Python-Dev] using help function in Py3k In-Reply-To: References: Message-ID: <20090505114718.GA16437@panix.com> On Tue, May 05, 2009, s|s wrote: > > I Ran Python 3.0 for the first time. I used help() function and wrote > "modules hash". It issues an error. Please file a report on bugs.python.org -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From stephen at xemacs.org Tue May 5 15:09:25 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 05 May 2009 22:09:25 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <49FFFB93.7020105@egenix.com> References: <49FD5300.6010906@v.loewis.de> <49FDD6DD.6050808@v.loewis.de> <49FFFB93.7020105@egenix.com> Message-ID: <87eiv3hf22.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > On 2009-05-03 19:39, Martin v. L?wis wrote: > >> If the error handler is supposed to be used for codecs other than utf-8, > >> perhaps it should renamed something more generic, e.g. "surrogate-escape"? > > > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - > > it's an algorithm based on 16-bit or 32-bit code points. I don't understand this phrasing. The algorithm is only applicable to ASCII-compatible octet streams. It results in code points by a simple displacement of octet -> octet + 0xDC00. It cannot be used on (say) UTF-32 to deal with embedded surrogates. Certainly, the computation requires (at least) 16 bit numbers, but the input must be restricted to a stream of 8-bit code points, while the output is 16- or 32-bit code points. 
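The octet -> octet + 0xDC00 displacement can also be written out by hand to see the arithmetic; the sketch below uses a naive longest-prefix decoder as a stand-in for a real codec and is only meant to illustrate the mapping, not to match the PEP's handler byte-for-byte:

    # Each byte that cannot be decoded is shifted into the low surrogates
    # (U+DC00 + byte) and shifted back on encoding.  With an ASCII-compatible
    # encoding only bytes >= 0x80 ever take this path.
    def escape_decode(raw, encoding="utf-8"):
        out = []
        i = 0
        while i < len(raw):
            for j in range(len(raw), i, -1):      # longest decodable prefix
                try:
                    out.append(raw[i:j].decode(encoding))
                except UnicodeDecodeError:
                    continue
                i = j
                break
            else:
                out.append(chr(0xDC00 + raw[i]))  # octet -> octet + 0xDC00
                i += 1
        return "".join(out)

    def escape_encode(text, encoding="utf-8"):
        out = bytearray()
        for ch in text:
            if 0xDC80 <= ord(ch) <= 0xDCFF:       # escaped byte: undo the shift
                out.append(ord(ch) - 0xDC00)
            else:
                out.extend(ch.encode(encoding))
        return bytes(out)

    raw = b"abc\xffdef\xe9"
    print(ascii(escape_decode(raw)))              # 'abc\udcffdef\udce9'
    assert escape_encode(escape_decode(raw)) == raw
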
> Please use a more descriptive name [than "utf-8b"] for the handler > which does not cause confusion with a existing codec. But please don't use "surrogate-escape" or (as in the current PEP) "python-escape"; it's not an escaping (quotation) mechanism. "surrogate-replace", "surrogate-substitute", or "surrogate-translate" would be better names. From daniel at stutzbachenterprises.com Tue May 5 15:43:57 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 5 May 2009 08:43:57 -0500 Subject: [Python-Dev] using help function in Py3k In-Reply-To: References: Message-ID: On Tue, May 5, 2009 at 5:41 AM, s|s wrote: > LookupError: unknown encoding: uft-8 > uft-8? Looks like a variation of Issue 4540 (or a duplicate? I can't tell) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue May 5 16:08:34 2009 From: eric at trueblade.com (Eric Smith) Date: Tue, 05 May 2009 10:08:34 -0400 Subject: [Python-Dev] [Fwd: [Python-checkins] r72331 - python/branches/py3k/Modules/posixmodule.c] Message-ID: <4A004862.5070605@trueblade.com> Modules/posixmodule.c now compiles for me, but I get a Bus Error in test_lchflags when running test_posixmodule on Mac OS X 10.5. I'll open a release blocker bug on this. -------- Original Message -------- Subject: [Python-checkins] r72331 - python/branches/py3k/Modules/posixmodule.c Date: Tue, 5 May 2009 15:07:31 +0200 (CEST) From: eric.smith To: python-checkins at python.org Author: eric.smith Date: Tue May 5 15:07:30 2009 New Revision: 72331 Log: Added missing semicolon. Modified: python/branches/py3k/Modules/posixmodule.c Modified: python/branches/py3k/Modules/posixmodule.c ============================================================================== --- python/branches/py3k/Modules/posixmodule.c (original) +++ python/branches/py3k/Modules/posixmodule.c Tue May 5 15:07:30 2009 @@ -1928,7 +1928,7 @@ if (!PyArg_ParseTuple(args, "O&i:lchmod", PyUnicode_FSConverter, &opath, &i)) return NULL; - path = bytes2str(opath, 1) + path = bytes2str(opath, 1); Py_BEGIN_ALLOW_THREADS res = lchmod(path, i); Py_END_ALLOW_THREADS _______________________________________________ Python-checkins mailing list Python-checkins at python.org http://mail.python.org/mailman/listinfo/python-checkins From stephen at xemacs.org Tue May 5 16:57:36 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 05 May 2009 23:57:36 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <49FD5300.6010906@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> Message-ID: <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > I've updated the PEP accordingly. I have three substantive comments. First, although consequences for Python 3 byte interfaces (ie, "none") are explicitly stated, as far as I can see this PEP could apply to Python 2 as well. I don't think it's intended that way. Either way, I think you should clarify that point. Second, I suggest "surrogate-replace" as the name of the error handler rather than "utf8b". (Elsewhere I've suggested others, but I think this is the best of the bunch.) Third, it is not clear to me why non-decodable ASCII should be an error. There are plenty of low surrogates for the purpose. Is there another technical reason? Stupid or not, Shift-JIS- and Big5-encoded file systems are quite common in Asia still (including non-rewritable media). 
I think surrogate-replacement of ASCII should at least be an option. I don't think "people shouldn't be using non-ASCII-compatible encodings for locale encodings" is a sufficient rationale for a hard error here. I mean, of course they *should* be using UTF-8. Maybe Python 3.1 should just go ahead and error on any other encoding on POSIX platforms? I have a number of nitpicking comments and technical clarifications on the PEP. Rationale is in footnotes. There were also a few typos I noticed. 1. There is no such thing as a "half-surrogate" in Unicode. "Lone surrogate" is clear enough. Or for somewhat fancier English, "isolated surrogate" or "non-syntactic surrogate". To emphasize that Python codecs will only produce them in contexts where a Unicode character or high surrogate (for UTF-16 Python) is syntactically required, "isolated low surrogate" or "isolated trailing surrogate" might be good.[1] 2. The specification should state, and the discussion emphasize, that strings which were produced by surrogate replacement *must not* be used in data interchange with systems that do not specifically accept such strings, and that this is the responsibility of the application.[2] Rather than saying that "dealing with such conflicts is out of scope of this PEP", I would say """Dealing with such conflicts is the responsibility of the application. Since this PEP's mechanism produces valid Unicode where possible, and produces *invalid* code points only via the error handler, one strategy is for the application to validate all other sources of strings as Unicode conforming. There may be other useful application-specific strategies, as well.""" 3. In the discussion, the transition from the example of alternative use of 'python-escape' to discussion of the error handler interface extension is a bit abrupt. I suggest rewriting as: """The extension to the encode error handler interface proposed by this PEP is necessary to implement the 'utf8b' error handler, because there are required byte sequences which cannot be generated from replacement Unicode. However, the encode error handler interface presently requires replacement Unicode to be provided in lieu of the non-encodable Unicode from the source string. Then it promptly encodes that replacement Unicode. In some error handlers, such as the 'utf8b' proposed here, it is also simpler and more efficient for the error handler to provide a pre-encoded replacement byte string, rather than forcing it to calculating Unicode from which the encoder would create the desired bytes.""" Typos (line references are to pep-0383.txt svn r72332): l. 86: "Byte-orientied" -> "Byte-oriented" l. 98, 118, 124, 127, 132, 136: "python-escape" -> "utf8b" l. 130: "provide" -> "provided" l. 134: "calculating" -> "calculate" Footnotes: [1] Unicode 5.0 uses the terms "high-half" and "low-half" at least once, in section 16.6, but the context is such that I take it to refer to "half of the surrogate area". Section 3.8 doesn't use these, instead noting that "leading" and "trailing" are sometimes used instead of "high" and "low". Better to avoid the word "half" in PEP 383, I think. [2] Since this error handler is going to be the default for POSIX I/O, of course people are going to mostly ignore that restriction. The point is, passing such strings to systems that don't expect them is a bug, and the PEP should make it clear that it's the app's bug, not the other system's. 
On the other hand, using those strings in a context of consenting adults (and I do mean double-opt-in here) is perfectly acceptable. I'm specifically thinking of use in the Tahoe protocol discussed by Zooko O'Whielacronx; it may not be usable there for backward compatibility reasons, but "Unicode conformance" is not an issue in principle. This does imply that programs that take advantage of the error handler specified in this PEP are on their own if they accept data from any sources that are not known to be Unicode-conforming. OTOH, as far as I can see if other sources are known to be Unicode conformant, it's reasonably (but not perfectly) safe to combine them with strings from this PEP (and of course use either 'utf8b' or 'strict', as appropriate, when passing data out of Python). From zookog at gmail.com Tue May 5 17:18:29 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Tue, 5 May 2009 09:18:29 -0600 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, May 5, 2009 at 8:57 AM, Stephen J. Turnbull wrote: > > 2. ?The specification should state, and the discussion emphasize, that > ? ?strings which were produced by surrogate replacement *must not* be > ? ?used in data interchange with systems that do not specifically > ? ?accept such strings, and that this is the responsibility of the > ? ?application.[2] That sounds like a useful statement to make. How would an application make sure that they were producing only valid unicode? How about add an option to os.listdir() named "errors" with default value 'utf8b' (or 'surrogate-replace', or whatever the name is)? Then applications which need to produce only valid unicode strings could pass errors=strict, errors=ignore, or errors=replace? (If anyone really wants behavior like Python 3.0 then we could perhaps also add a new one just for os.listdir() named errors=skipfilename.) My most recent plan for Tahoe, as of the letter that I sent last night, is to emulate the behavior of Nautilus and GNU ls by using the 'replace' error handler and (emulating Nautilus) to append " (invalid encoding)" to the end of the string. (screenshot: http://zooko.com/Nautilus_vs_invalid_encoding.png ) So if I could ask os.listdir to return filenames with U+FFFD in place of undecodable characters, then I could subsequently do something like: for f in os.listdir(d, errors='replace'): if u"\ufffd" in f: f += " (invalid encoding)" (On top of that I would have to check for collisions, but that's out of scope.) Regards, Zooko From google at mrabarnett.plus.com Tue May 5 17:25:46 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 05 May 2009 16:25:46 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A005A7A.7070501@mrabarnett.plus.com> Stephen J. Turnbull wrote: > "Martin v. L?wis" writes: > > > I've updated the PEP accordingly. > > I have three substantive comments. First, although consequences for > Python 3 byte interfaces (ie, "none") are explicitly stated, as far as > I can see this PEP could apply to Python 2 as well. I don't think > it's intended that way. Either way, I think you should clarify that > point. 
> > Second, I suggest "surrogate-replace" as the name of the error handler > rather than "utf8b". (Elsewhere I've suggested others, but I think > this is the best of the bunch.) > +1 > Third, it is not clear to me why non-decodable ASCII should be an > error. There are plenty of low surrogates for the purpose. Is there > another technical reason? Stupid or not, Shift-JIS- and Big5-encoded > file systems are quite common in Asia still (including non-rewritable > media). I think surrogate-replacement of ASCII should at least be an > option. > > I don't think "people shouldn't be using non-ASCII-compatible > encodings for locale encodings" is a sufficient rationale for a hard > error here. I mean, of course they *should* be using UTF-8. Maybe > Python 3.1 should just go ahead and error on any other encoding on > POSIX platforms? > I don't see why the error handler couldn't in principle be used with encodings other than UTF-8, although in that case all of the low surrogates should be open to use. > I have a number of nitpicking comments and technical clarifications on > the PEP. Rationale is in footnotes. There were also a few typos I > noticed. > > 1. There is no such thing as a "half-surrogate" in Unicode. "Lone > surrogate" is clear enough. Or for somewhat fancier English, > "isolated surrogate" or "non-syntactic surrogate". To emphasize > that Python codecs will only produce them in contexts where a > Unicode character or high surrogate (for UTF-16 Python) is > syntactically required, "isolated low surrogate" or "isolated > trailing surrogate" might be good.[1] > > 2. The specification should state, and the discussion emphasize, that > strings which were produced by surrogate replacement *must not* be > used in data interchange with systems that do not specifically > accept such strings, and that this is the responsibility of the > application.[2] > > Rather than saying that "dealing with such conflicts is out of > scope of this PEP", I would say > > """Dealing with such conflicts is the responsibility of the > application. Since this PEP's mechanism produces valid Unicode > where possible, and produces *invalid* code points only via the > error handler, one strategy is for the application to validate all > other sources of strings as Unicode conforming. There may be > other useful application-specific strategies, as well.""" > > 3. In the discussion, the transition from the example of alternative > use of 'python-escape' to discussion of the error handler > interface extension is a bit abrupt. I suggest rewriting as: > > """The extension to the encode error handler interface proposed by > this PEP is necessary to implement the 'utf8b' error handler, > because there are required byte sequences which cannot be > generated from replacement Unicode. However, the encode error > handler interface presently requires replacement Unicode to be > provided in lieu of the non-encodable Unicode from the source > string. Then it promptly encodes that replacement Unicode. In > some error handlers, such as the 'utf8b' proposed here, it is also > simpler and more efficient for the error handler to provide a > pre-encoded replacement byte string, rather than forcing it to > calculating Unicode from which the encoder would create the > desired bytes.""" > > Typos (line references are to pep-0383.txt svn r72332): > > l. 86: "Byte-orientied" -> "Byte-oriented" > l. 98, 118, 124, 127, 132, 136: "python-escape" -> "utf8b" > l. 130: "provide" -> "provided" > l. 
134: "calculating" -> "calculate" > > > Footnotes: > [1] Unicode 5.0 uses the terms "high-half" and "low-half" at least > once, in section 16.6, but the context is such that I take it to > refer to "half of the surrogate area". Section 3.8 doesn't use > these, instead noting that "leading" and "trailing" are sometimes > used instead of "high" and "low". Better to avoid the word "half" > in PEP 383, I think. > "Leading" and "trailing" simply state the order, not the set ("high" or "low"), so are not good terms to use. > [2] Since this error handler is going to be the default for POSIX I/O, > of course people are going to mostly ignore that restriction. The > point is, passing such strings to systems that don't expect them > is a bug, and the PEP should make it clear that it's the app's > bug, not the other system's. On the other hand, using those > strings in a context of consenting adults (and I do mean > double-opt-in here) is perfectly acceptable. I'm specifically > thinking of use in the Tahoe protocol discussed by Zooko > O'Whielacronx; it may not be usable there for backward > compatibility reasons, but "Unicode conformance" is not an issue > in principle. > > This does imply that programs that take advantage of the error > handler specified in this PEP are on their own if they accept data > from any sources that are not known to be Unicode-conforming. > OTOH, as far as I can see if other sources are known to be Unicode > conformant, it's reasonably (but not perfectly) safe to combine > them with strings from this PEP (and of course use either 'utf8b' > or 'strict', as appropriate, when passing data out of Python). > Should there be a function or method to check for conformance and lone surrogates? From stephen at xemacs.org Tue May 5 18:32:03 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2009 01:32:03 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8763gfh5oc.fsf@uwakimon.sk.tsukuba.ac.jp> Zooko O'Whielacronx writes: > How would an application make sure that they were producing only > valid unicode? That's very difficult. There are a couple of sources that I can think of, in Python: C modules, chr(), \u literals, and now codecs with the 'utf8b'. There may be others. You'd need to review your own code for all of them very carefully, and you'd have to validate all strings returned by non-validating APIs (which is all of them in Python now, although many of them can probably be trusted, such as codecs not using the 'utf8b' error handler). > How about add an option to os.listdir() named "errors" with default > value 'utf8b' Seems reasonable to me, but Martin's probably thought more carefully about it. I don't think its applicable to your use case, though, because you want to be able to *access* those files as well as display the names to the users, right? You won't be able to access those files if you receive the names already munged by the error handler. From stephen at xemacs.org Tue May 5 19:31:28 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 06 May 2009 02:31:28 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A005A7A.7070501@mrabarnett.plus.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> Message-ID: <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > > I don't think "people shouldn't be using non-ASCII-compatible > > encodings for locale encodings" is a sufficient rationale for a hard > > error here. I mean, of course they *should* be using UTF-8. Maybe > > Python 3.1 should just go ahead and error on any other encoding on > > POSIX platforms? > > > I don't see why the error handler couldn't in principle be used with > encodings other than UTF-8, although in that case all of the low > surrogates should be open to use. I should have been more clear here, I guess. The error handler *can*, and in the PEP *will be* by default, used with all "sane" locale encodings on POSIX. It occurs to me that the PEP maybe should say that it is an error to have your POSIX locale set to UTF-16 or something like that. What "sane" means in this context is 1. ASCII NUL is the bytearray terminator, and can't be used as a byte in a file name. This rules out UTF-16, UTF-32, and widechar EUC encodings, as well as some very rare ones. 2. An ASCII character always translates to the Unicode character with the same code (ie, "to itself"). It is not a part of other sequences (control sequences, or a trailing byte). This rules out EBCDIC, ISO-2022-*, Shift JIS, and Big5, among the encodings I'm familiar with. EBCDIC because only by accident will an EBCDIC character map to the same ASCII character with the same code. The ISO-2022-* encodings are out because ASCII characters are used in escape sequences. Shift JIS and Big5 because in those encodings, a high-bit-set octet signals the start of a multibyte sequence, and some of the trailing bytes may be in the ASCII range. What's left? Well, UTF-8, all of the ISO-8859 sets, several national standards (such as the KOI8 family for Cyrillic), IBM and Microsoft "code pages", and the "packed" EUC encodings used for Japanese, Chinese, and Korean. These all have the character that ASCII is ASCII, and all non-ASCII characters are encoded using only high-bit-set octets. In fact, in practice, on Unix these are invariably what you encounter. So what's the problem? Backward compatibility for Microsoft OSes, which not only used to use MBCS national character sets, but "cleverly" packed more characters into the encoding by using ASCII as trailing bytes. Ie, the aforementioned "insane" Shift JIS (which is mandated by the leading Japanese cellphone service provider even today) and Big5 (the leading encoding for Chinese until very recently). These are very commonly found on archival media, and even on USB keys and so on which tend to be FAT-formatted. This doesn't prevent usage of the Unicode APIs, but up to Windows 2000 most Japanese vendors' OEM version of Windows used FAT format and Shift JIS as the file system encoding, and I know of Japanese offices where Windows 98 systems were in use as recently as early 2007. It's the removable media which are the problem, because on Windows you just use the Unicode APIs. But they're not available on Unix, so you need the byte-oriented APIs. Is this a real problem? I don't know, I don't do Windows, I don't do computing with my cellphone, and I don't need to get Japanese (that might be mixed with Russian ones!!) 
filenames off of ancient media or CIFS fileshares using Shift JIS. I guess it's possible that cellphones do everything *except* add filenames to directories in Shift JIS, but the filenames are in UTF-16. OTOH, it seems to me that an *optional* extension to handling error on ASCII is technically feasible and would be nearly trivial to add to the PEP. The biggest cost would be adding the error argument to various functions (as Zooko requested) so that surrogate-replace-extended could be specified if needed. > > Footnotes: > > [1] Unicode 5.0 uses the terms "high-half" and "low-half" at least > > once, in section 16.6, but the context is such that I take it to > > refer to "half of the surrogate area". Section 3.8 doesn't use > > these, instead noting that "leading" and "trailing" are sometimes > > used instead of "high" and "low". Better to avoid the word "half" > > in PEP 383, I think. > > > "Leading" and "trailing" simply state the order, not the set ("high" or > "low"), so are not good terms to use. But it's the order that's important. If you've just finished reading a character, and encounter a trailing surrogate, then it was produced by the 'utf8b' error handler; nothing else in a Python codec can do that. If you've just finished reading a character, are in a UTF-16 Python, and encounter a leading surrogate, then you immediately gobble the following code, which must be a trailing surrogate, and combine them to produce a character. The remaining case is that you encounter a valid character. Anything else is an error, and (assuming no bugs), no Python codec will produce anything else. > > This does imply that programs that take advantage of the error > > handler specified in this PEP are on their own if they accept data > > from any sources that are not known to be Unicode-conforming. > > OTOH, as far as I can see if other sources are known to be Unicode > > conformant, it's reasonably (but not perfectly) safe to combine > > them with strings from this PEP (and of course use either 'utf8b' > > or 'strict', as appropriate, when passing data out of Python). > > > Should there be a function or method to check for conformance and > lone surrogates? string.encode('utf-8',errors=strict) will do for now. From google at mrabarnett.plus.com Tue May 5 19:45:45 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 05 May 2009 18:45:45 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A007B49.5000001@mrabarnett.plus.com> Stephen J. Turnbull wrote: > MRAB writes: > > > > I don't think "people shouldn't be using non-ASCII-compatible > > > encodings for locale encodings" is a sufficient rationale for a hard > > > error here. I mean, of course they *should* be using UTF-8. Maybe > > > Python 3.1 should just go ahead and error on any other encoding on > > > POSIX platforms? > > > > > I don't see why the error handler couldn't in principle be used with > > encodings other than UTF-8, although in that case all of the low > > surrogates should be open to use. > > I should have been more clear here, I guess. The error handler *can*, > and in the PEP *will be* by default, used with all "sane" locale > encodings on POSIX. 
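Picking up MRAB's question above about checking for conformance and lone surrogates: Stephen's str.encode('utf-8', errors='strict') answer can be wrapped into a small helper. A sketch, assuming Python 3.1 or later, where the strict UTF-8 codec refuses lone surrogates (the helper name is made up):

    def is_unicode_conforming(s):
        # Strict UTF-8 encoding fails on lone surrogates, so it doubles as
        # a cheap conformance check for strings produced by the new handler.
        try:
            s.encode("utf-8", errors="strict")
        except UnicodeEncodeError:
            return False
        return True

    assert is_unicode_conforming("abc")
    assert not is_unicode_conforming("abc\udc80")    # contains an escaped byte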
> > It occurs to me that the PEP maybe should say that it is an error > to have your POSIX locale set to UTF-16 or something like that. > > What "sane" means in this context is > > 1. ASCII NUL is the bytearray terminator, and can't be used as a byte > in a file name. This rules out UTF-16, UTF-32, and widechar EUC > encodings, as well as some very rare ones. > [snip] It might be slightly OT, but sometimes strict UTF-8 encoding is violated by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as a terminator. I think I read that Microsoft sometimes does this. From stephen at xemacs.org Tue May 5 20:09:54 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2009 03:09:54 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A007B49.5000001@mrabarnett.plus.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> <4A007B49.5000001@mrabarnett.plus.com> Message-ID: <87y6tbfmkt.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > [snip] > It might be slightly OT, but sometimes strict UTF-8 encoding is violated > by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as > a terminator. I think I read that Microsoft sometimes does this. Nice hack! as long as you don't let it escape. But if 'strict' errors on this, then PEP 383 'utf8b' will do the right thing, I think. From l.mastrodomenico at gmail.com Tue May 5 20:16:03 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Tue, 5 May 2009 20:16:03 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2009/5/5 Stephen J. Turnbull : > Third, it is not clear to me why non-decodable ASCII should be an > error. The PEP originally allowed the conversion to U+DCxx of bytes below 128 that cannot be decoded by the encoding used, but this creates potential security problems. See: -- Lino Mastrodomenico From martin at v.loewis.de Tue May 5 22:46:26 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 05 May 2009 22:46:26 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87eiv3hf22.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <49FDD6DD.6050808@v.loewis.de> <49FFFB93.7020105@egenix.com> <87eiv3hf22.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A00A5A2.2050400@v.loewis.de> > > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 - > > > it's an algorithm based on 16-bit or 32-bit code points. > > I don't understand this phrasing. The algorithm is only applicable to > ASCII-compatible octet streams. It results in code points by a simple > displacement of octet -> octet + 0xDC00. It cannot be used on (say) > UTF-32 to deal with embedded surrogates. > > Certainly, the computation requires (at least) 16 bit numbers, but the > input must be restricted to a stream of 8-bit code points, while the > output is 16- or 32-bit code points. Right - the algorithm maps between bytes and 16/32-bit code units. It works, in particular, for UTF-8, and was originally proposed to apply to UTF-8 - but it can work in any other place that converts bytes to 16/32-bit code units as well. 
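The byte/code-unit mapping Martin describes is easiest to see as a round trip. A short sketch using 'surrogateescape', the name under which the handler eventually shipped in Python 3.1 (the PEP draft under discussion still calls it 'utf8b'); MRAB's overlong-NUL bytes are included to show they are escaped rather than smuggled through as U+0000:

    raw = b"caf\xe9"                                    # not valid UTF-8
    s = raw.decode("utf-8", "surrogateescape")
    assert s == "caf\udce9"                             # 0xE9 -> U+DCE9
    assert s.encode("utf-8", "surrogateescape") == raw  # original bytes restored

    # MRAB's 0xC0 0x80 trick: the handler escapes both bytes instead of
    # letting them decode to a NUL character.
    assert b"\xc0\x80".decode("utf-8", "surrogateescape") == "\udcc0\udc80"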
Regards, Martin From martin at v.loewis.de Tue May 5 23:01:49 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 05 May 2009 23:01:49 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A00A93D.3030204@v.loewis.de> > I have three substantive comments. First, although consequences for > Python 3 byte interfaces (ie, "none") are explicitly stated, as far as > I can see this PEP could apply to Python 2 as well. I don't think > it's intended that way. Either way, I think you should clarify that > point. Done: the Python-Version header already clarifies that point. > Second, I suggest "surrogate-replace" as the name of the error handler > rather than "utf8b". I think this is bike-shedding. > Third, it is not clear to me why non-decodable ASCII should be an > error. There are plenty of low surrogates for the purpose. Is there > another technical reason? Stupid or not, Shift-JIS- and Big5-encoded > file systems are quite common in Asia still (including non-rewritable > media). I think surrogate-replacement of ASCII should at least be an > option. It's a security risk. If U+DCXX would map to \xXX, then somebody could embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets sanitized, nobody would expect that this will actually access ../ > 1. There is no such thing as a "half-surrogate" in Unicode. "Lone > surrogate" is clear enough. Or for somewhat fancier English, > "isolated surrogate" or "non-syntactic surrogate". To emphasize > that Python codecs will only produce them in contexts where a > Unicode character or high surrogate (for UTF-16 Python) is > syntactically required, "isolated low surrogate" or "isolated > trailing surrogate" might be good.[1] Fixed. I removed the world "half" everywhere. It really doesn't mean anything to me (it could have been called sunnygate instead, making no difference). I tried to understand "surrogate", and it was explained to me that "surrogate" is something that stands for something - but then I would argue that the two subsequence codes form a surrogate - they stand for something else. The individual surrogate code (in Unicode terminology) doesn't stand for anything. So don't you agree that it is the Unicode terminology that is in error, not the PEP? > 2. The specification should state, and the discussion emphasize, that > strings which were produced by surrogate replacement *must not* be > used in data interchange with systems that do not specifically > accept such strings, and that this is the responsibility of the > application.[2] No. The specification puts no requirements on applications whatsoever. So if you propose to use MUST NOT in the RFC 2119 sense, I strongly disagree. Applications that desire mojibake are free to produce it; we are consenting adults; and all that. > 3. In the discussion, the transition from the example of alternative > use of 'python-escape' to discussion of the error handler > interface extension is a bit abrupt. I suggest rewriting as: > > """The extension to the encode error handler interface proposed by > this PEP is necessary to implement the 'utf8b' error handler, > because there are required byte sequences which cannot be > generated from replacement Unicode. 
However, the encode error > handler interface presently requires replacement Unicode to be > provided in lieu of the non-encodable Unicode from the source > string. Then it promptly encodes that replacement Unicode. In > some error handlers, such as the 'utf8b' proposed here, it is also > simpler and more efficient for the error handler to provide a > pre-encoded replacement byte string, rather than forcing it to > calculating Unicode from which the encoder would create the > desired bytes.""" Unfortunately, I failed to understand where you want this text to go. What paragraphs should I remove, or (if none), after which paragraph should I insert this text? Regards, Martin From martin at v.loewis.de Tue May 5 23:44:25 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 05 May 2009 23:44:25 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A00B339.5050305@v.loewis.de> > It occurs to me that the PEP maybe should say that it is an error > to have your POSIX locale set to UTF-16 or something like that. No. It is *impossible* to have UTF-16 as the locale character set, not an error. Your statement is like saying "it is an error to breathe in the vacuum". In any case, the discussion says # Encodings that are not compatible with ASCII are not supported by # this specification; bytes in the ASCII range that fail to decode # will cause an exception. It is widely agreed that such encodings # should not be used as locale charsets. Regards, Martin From mal at egenix.com Wed May 6 02:26:31 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 06 May 2009 02:26:31 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A00A93D.3030204@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> Message-ID: <4A00D937.6080403@egenix.com> Martin v. L?wis wrote: >> I have three substantive comments. First, although consequences for >> Python 3 byte interfaces (ie, "none") are explicitly stated, as far as >> I can see this PEP could apply to Python 2 as well. I don't think >> it's intended that way. Either way, I think you should clarify that >> point. > > Done: the Python-Version header already clarifies that point. > >> Second, I suggest "surrogate-replace" as the name of the error handler >> rather than "utf8b". > > I think this is bike-shedding. The name "utf8b" suggested in the PEP is not in line with the codec design and causes confusion with an existing codec of a similar name. Error handlers and codecs are two different things, so the namespaces need to be clearly separate. Please change the name of the error handler to a different name that does not resemble or cause confusion with a codec name and fits the scheme of error handler names we already have in place in Python for replacing error handlers, i.e. "XYZreplace". Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 06 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 53 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From stephen at xemacs.org Wed May 6 07:10:41 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2009 14:10:41 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A00B339.5050305@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00B339.5050305@v.loewis.de> Message-ID: <87tz3yg6jy.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > > It occurs to me that the PEP maybe should say that it is an error > > to have your POSIX locale set to UTF-16 or something like that. > > No. It is *impossible* to have UTF-16 as the locale character set, > not an error. Your statement is like saying "it is an error to > breathe in the vacuum". I realize this is not useful, so maybe you don't need to mention it. However, it certainly is possible to set LANG with an absurd, or merely dangerous, encoding. > In any case, the discussion says > > # Encodings that are not compatible with ASCII are not supported by > # this specification; bytes in the ASCII range that fail to decode > # will cause an exception. It is widely agreed that such encodings > # should not be used as locale charsets. Which is your excuse for not supporting Shift JIS fully. It doesn't stop people from setting LC_ALL=ja_JP.shift_jis, or using Shift JIS as the default encoding for certain media. From stephen at xemacs.org Wed May 6 07:35:30 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2009 14:35:30 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> Lino Mastrodomenico writes: > 2009/5/5 Stephen J. Turnbull : > > Third, it is not clear to me why non-decodable ASCII should be an > > error. > > The PEP originally allowed the conversion to U+DCxx of bytes below 128 > that cannot be decoded by the encoding used, but this creates > potential security problems. > > See: Yeah, yeah, this is the same old same old from PEP 3131. Anything that handles the various attacks based on ASCII-alike characters should at least rule out invalid Unicode, too! And where is this U+DC2F supposed to be coming from, anyway? The user's *local* environment or the user's *local* filesystem! Codecs not using 'utf8b' can't produce it, so the only other cases are chr() and \u literals in the *local* process, or an already broken module in your code. I really can't imagine that any sane programmer these days would be using 'utf8b' on bytes received from the Internet! Of course I can't prove that there's no vector for an exploit here (in fact, I'm sure there is one with sufficiently careless handling of input), but I think "consenting adults" covers the Shift JIS use case. Make it an option, but it should be explicitly part of the PEP. From stephen at xemacs.org Wed May 6 08:06:07 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 06 May 2009 15:06:07 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A00A93D.3030204@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> Message-ID: <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > Done: the Python-Version header already clarifies that point. Ah, OK. I wish my day job required reading more PEPs so I'd be more familiar with these formalities. :-) > > Second, I suggest "surrogate-replace" as the name of the error handler > > rather than "utf8b". > > I think this is bike-shedding. I don't personally care (I already was aware of UTF-8B), but there are plenty of others who do. I think that's a good name to make Marc-Andre and Terry happier. You have to fix the existing uses of the obsolete "python-escape", anyway. > It's a security risk. If U+DCXX would map to \xXX, then somebody could > embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets > sanitized, nobody would expect that this will actually access ../ The odds that anybody will actually take notice of U+002E U+002E U+002F in a string are sufficiently small that any number of exploits have already been based on it. I agree that there is some additional risk from this if people make the check for "../" before they prepend "\ucd2e\udc2e\udc2f", but I think that risk is very small compared to the pain of having a error handler whose raison d'etre is to not raise exceptions go ahead and raise them anyway. See also my reply to Lino Mastrodomenico. Again, an option is good enough for my purposes as long as interfaces for os.listdir() and the like support setting the error handler (cf. Zooko's proposal), but I think the option should be available. > I tried to understand "surrogate", and it was explained to me that > "surrogate" is something that stands for something - but then I > would argue that the two subsequence codes form a surrogate - they > stand for something else. The individual surrogate code (in Unicode > terminology) doesn't stand for anything. So don't you agree that > it is the Unicode terminology that is in error, not the PEP? Plausibly so. Keep making comments like that and nobody will ever let you off the hook for being a non-native speaker! However, "surrogate" in English is typically used in situation that are too complex to be covered by simply "substitution." I've always read "surrogate" as "alternative form of encoding", and "surrogate code point" as "code point in that alternative form of encoding". Where it's an alternative to code-point-is-scalar-value. I think probably the authors of the terminology just made the best of a bad situation, I can't think of a better single word for this. > No. The specification puts no requirements on applications whatsoever. > So if you propose to use MUST NOT in the RFC 2119 sense, I strongly > disagree. I do propose that. But you're writing the PEP, so this battle will have to be deferred. Eventually Python will have to take a stand on Unicode conformance, but it's not urgent yet. > > 3. In the discussion, the transition from the example of alternative > > use of 'python-escape' to discussion of the error handler > > interface extension is a bit abrupt. 
I suggest rewriting as: > > > > """The extension to the encode error handler interface proposed by > > this PEP is necessary to implement the 'utf8b' error handler, > > because there are required byte sequences which cannot be > > generated from replacement Unicode. However, the encode error > > handler interface presently requires replacement Unicode to be > > provided in lieu of the non-encodable Unicode from the source > > string. Then it promptly encodes that replacement Unicode. In > > some error handlers, such as the 'utf8b' proposed here, it is also > > simpler and more efficient for the error handler to provide a > > pre-encoded replacement byte string, rather than forcing it to > > calculating Unicode from which the encoder would create the > > desired bytes.""" > > Unfortunately, I failed to understand where you want this text to > go. What paragraphs should I remove, or (if none), after which > paragraph should I insert this text? Sorry! I suggest substituting the paragraph above for the paragraph which begins "The encode error handler interface presentlyrequires..." at line 129. I think I forgot to do this before: "I hereby dedicate all text I suggest for inclusion in the PEP to the public domain." From martin at v.loewis.de Wed May 6 09:31:00 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 09:31:00 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A00D937.6080403@egenix.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> Message-ID: <4A013CB4.9010204@v.loewis.de> > The name "utf8b" suggested in the PEP is not in line with the codec > design Where is that design documented, and how exactly violates the name the design (chapter and verse, please). > Error handlers and codecs are two different things, so the namespaces > need to be clearly separate. They *are* separate naemspaces; that's guaranteed by the implementation. Regards, Martin From martin at v.loewis.de Wed May 6 09:36:01 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 09:36:01 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87tz3yg6jy.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00B339.5050305@v.loewis.de> <87tz3yg6jy.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A013DE1.5000401@v.loewis.de> Stephen J. Turnbull wrote: > "Martin v. L?wis" writes: > > > It occurs to me that the PEP maybe should say that it is an error > > > to have your POSIX locale set to UTF-16 or something like that. > > > > No. It is *impossible* to have UTF-16 as the locale character set, > > not an error. Your statement is like saying "it is an error to > > breathe in the vacuum". > > I realize this is not useful, so maybe you don't need to mention it. > However, it certainly is possible to set LANG with an absurd, or > merely dangerous, encoding. How so? The C library will filter it out. > > In any case, the discussion says > > > > # Encodings that are not compatible with ASCII are not supported by > > # this specification; bytes in the ASCII range that fail to decode > > # will cause an exception. It is widely agreed that such encodings > > # should not be used as locale charsets. 
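The restriction quoted above has a simple observable consequence with a UTF-8 file system encoding: every byte below 128 decodes to itself, and only bytes 0x80-0xFF are ever escaped, so the handler's output stays within U+DC80..U+DCFF. A sketch, again using the shipped name 'surrogateescape':

    for b in range(256):
        ch = bytes([b]).decode("utf-8", "surrogateescape")
        if b < 0x80:
            assert ch == chr(b)               # ASCII always decodes to itself
        else:
            assert ch == chr(0xDC00 + b)      # escaped into U+DC80..U+DCFF

This is what keeps code points such as U+DC2E and U+DC2F out of reach of the decoder, which is the basis of the security argument above.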
> > Which is your excuse for not supporting Shift JIS fully. It doesn't > stop people from setting LC_ALL=ja_JP.shift_jis, Well, it *does* stop them from doing so if their systems don't support the locale setting. In any case, if they do this, PEP 383 will not support them. > or using Shift JIS as the default encoding for certain media. I fail to see how this could ever matter. If, by "media", you mean things like removable disks, and the file name encoding used on them, it's fairly irrelevant for the PEP, since Python won't start using Shift JIS as its file system encoding just because that's the encoding used on the disk. Regards, Martin From martin at v.loewis.de Wed May 6 09:53:33 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 09:53:33 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A0141FD.2050307@v.loewis.de> > > > Second, I suggest "surrogate-replace" as the name of the error handler > > > rather than "utf8b". > > > > I think this is bike-shedding. > > I don't personally care (I already was aware of UTF-8B), but there are > plenty of others who do. I think it is a fairly bad name, because it is easy to confuse it with the "surrogates" error handler (unless you suggest to rename that also). > You have to fix the existing uses of > the obsolete "python-escape", anyway. Indeed - but only in the PEP. In the implementation, it's already utf8b throughout. Now it is also in the PEP; thanks for pointing that out. > > It's a security risk. If U+DCXX would map to \xXX, then somebody could > > embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets > > sanitized, nobody would expect that this will actually access ../ > > The odds that anybody will actually take notice of U+002E U+002E > U+002F in a string are sufficiently small that any number of exploits > have already been based on it. I agree that there is some additional > risk from this if people make the check for "../" before they prepend > "\ucd2e\udc2e\udc2f", but I think that risk is very small compared to > the pain of having a error handler whose raison d'etre is to not raise > exceptions go ahead and raise them anyway. The problem is that functions like normpath will recognize ../, and that applications rely on them for file name sanitation. If they could be tricked into writing outside of their target folders, this would be a huge security risk. OTOH, I don't care breaking applications on misconfigured systems. People using SJIS as their locale encodings have bigger problems than Python raising exceptions. > See also my reply to Lino Mastrodomenico. URL? > But you're writing the PEP, so this battle will have to be deferred. > Eventually Python will have to take a stand on Unicode conformance, > but it's not urgent yet. I think it's always applications that are conforming or not, rather than libraries. Libraries should allow to write conforming applications. They may refuse to write certain non-conforming applications (although users then replace the library with one that does allow them to do what they want). Libraries can never enforce that applications conform to some standard. > Sorry! 
I suggest substituting the paragraph above for the paragraph > which begins "The encode error handler interface presentlyrequires..." > at line 129. Ah, ok. This was Glen Linderman's text before - now it's yours :-) > I think I forgot to do this before: "I hereby dedicate all text > I suggest for inclusion in the PEP to the public domain." :-) Martin From martin at v.loewis.de Wed May 6 10:03:47 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 10:03:47 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A014463.4070109@v.loewis.de> > Yeah, yeah, this is the same old same old from PEP 3131. Anything > that handles the various attacks based on ASCII-alike characters > should at least rule out invalid Unicode, too! > > And where is this U+DC2F supposed to be coming from, anyway? The > user's *local* environment or the user's *local* filesystem! Why is that not a threat? Suppose you have a setuid application, and you pass some string on the command line that decodes to /../. Then the setuid application will be tricked into modifying files it didn't mean to modify. Likewise, it might come from a relational database. Use a relational database that supports unicode code units, or lone surrogates through utf-8, and fill in some bogus data. Then have the Python application (running as root) read it. > Of course I can't prove that there's no vector for an exploit here (in > fact, I'm sure there is one with sufficiently careless handling of > input), but I think "consenting adults" covers the Shift JIS use case. > Make it an option, but it should be explicitly part of the PEP. Nothing is lost at the moment. If users complain, we can still think of ways to enhance the experience. In any case, Python 3.1b1 may get released today, so it's way too late for new features in the PEP. They can wait for Python 3.2. Regards, Martin From ziade.tarek at gmail.com Wed May 6 11:01:14 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 6 May 2009 11:01:14 +0200 Subject: [Python-Dev] Help on issue 5941 Message-ID: <94bdd2610905060201s2590144dp386d33773338d923@mail.gmail.com> Hello, I need some help on http://bugs.python.org/issue5941 The bug is quite simple: the Distutils unixcompiler used to set the archiver command to "ar -rc". For quite a while now, this behavior has changed in order to be able to customize the compiler behavior from the environment. That introduced a regression because the mechanism in Distutils that looks for the AR variable in the environment also looks into the Makefile of Python. (in the Makefile then is os.environ) And as a matter of fact, AR is set to "ar" in there, so the -cr option is not set anymore. So my question is : should I make a change into the Makefile by adding for example a variable called AR_OPTIONS then build the ar command with AR + AR_OPTIONS *or* that doesn't make sense and I just need to change the behavior so it doesn't look for AR into the Makefile. (just in os.environ) Thanks Tarek -- Tarek Ziad? 
| http://ziade.org From solipsis at pitrou.net Wed May 6 11:17:43 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 May 2009 09:17:43 +0000 (UTC) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> <4A0141FD.2050307@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > > > I don't personally care (I already was aware of UTF-8B), but there are > > plenty of others who do. > > I think it is a fairly bad name, because it is easy to confuse it with > the "surrogates" error handler (unless you suggest to rename that also). I didn't bother to say it at the time, but I think "surrogates" is a pretty bad name. It should be more indicative of what it does, e.g. "surrogates-pass", or "surrogates-accept". > > > It's a security risk. If U+DCXX would map to \xXX, then somebody could > > > embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets > > > sanitized, nobody would expect that this will actually access ../ Agreed this is an annoying security breach. The whole point of the PEP is that application developers do not have to care about filename encoding issues, which is defeated is they have to check for strange (illegal) combinations of characters. By the way, what are the ASCII characters that are not suppported by Shift-JIS? Not many I suppose? (if I read the Wikipedia entry correctly, it's only the backslash and the tilde). Regards Antoine. From stephen at xemacs.org Wed May 6 11:39:02 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2009 18:39:02 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A013DE1.5000401@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00B339.5050305@v.loewis.de> <87tz3yg6jy.fsf@uwakimon.sk.tsukuba.ac.jp> <4A013DE1.5000401@v.loewis.de> Message-ID: <87my9qfu4p.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > I fail to see how this could ever matter. If, by "media", you mean > things like removable disks, and the file name encoding used on them, > it's fairly irrelevant for the PEP, since Python won't start using > Shift JIS as its file system encoding just because that's the encoding > used on the disk. I'm sorry for the lack of clarity of my posts, but somehow you're completely missing the point. The point is precisely that Python *won't* use Shift JIS as the file system encoding (if it did there would be no problem with reading Shift JIS), but the people who created the media *did*. Now, with Python's file system encoding == UTF-8 or any packed EUC, and more than a handful of Shift JIS or Big5 characters in file names, one is *almost certain* to encounter ASCII as the second byte of a multibyte sequence. PEP 383 can't handle this, but it is sure to be the most common use case for PEP 383 in East Asia. From mal at egenix.com Wed May 6 11:53:12 2009 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 06 May 2009 11:53:12 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A013CB4.9010204@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> Message-ID: <4A015E08.5000203@egenix.com> Martin v. L?wis wrote: >> The name "utf8b" suggested in the PEP is not in line with the codec >> design > > Where is that design documented, and how exactly violates the name > the design (chapter and verse, please). Martin, I designed the whole Python codec machinery, so even if this is not explicitly written down somewhere, you can take my word for it. I don't want users to be confused by such an error handler name, so please change it ! Here's a list of the currently available error handlers (taken from codecs.py): The .encode()/.decode() methods may use different error handling schemes by providing the errors argument. These string values are predefined: 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the builtin Unicode codecs on decoding and '?' on encoding. 'xmlcharrefreplace' - Replace with the appropriate XML character reference (only for encoding). 'backslashreplace' - Replace with backslashed escape sequences (only for encoding). The set of allowed values can be extended via register_error. >> Error handlers and codecs are two different things, so the namespaces >> need to be clearly separate. > > They *are* separate naemspaces; that's guaranteed by the implementation. In the implementation, yes, but not in the head of a typical user: the 'utf8b' looks more like a codec name than an error handler name. I want to avoid any such confusion with Python codecs and don't understand why you are making a problem out of this. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 06 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 53 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From google at mrabarnett.plus.com Wed May 6 12:08:45 2009 From: google at mrabarnett.plus.com (MRAB) Date: Wed, 06 May 2009 11:08:45 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A015E08.5000203@egenix.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> Message-ID: <4A0161AD.6000605@mrabarnett.plus.com> M.-A. Lemburg wrote: > Martin v. L?wis wrote: >>> The name "utf8b" suggested in the PEP is not in line with the codec >>> design >> Where is that design documented, and how exactly violates the name >> the design (chapter and verse, please). 
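The separation being argued about here is visible in the codecs module itself: codecs and error handlers are registered and looked up through different functions, so a handler name cannot shadow a codec name. A small sketch using register_error from the list Marc-Andre quotes (the handler name 'underscore' is made up):

    import codecs

    codecs.lookup("utf-8")              # codec registry
    codecs.lookup_error("replace")      # error-handler registry, separate from codecs

    def underscore(exc):
        # toy handler: replace each unencodable character with '_'
        return ("_" * (exc.end - exc.start), exc.end)

    codecs.register_error("underscore", underscore)
    assert "abc\u00e9".encode("ascii", "underscore") == b"abc_"

Whether the user-visible names are confusable is of course the point in dispute, not whether the registries are distinct.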
> > Martin, I designed the whole Python codec machinery, so even if > this is not explicitly written down somewhere, you can take my > word for it. > > I don't want users to be confused by such an error handler > name, so please change it ! > > Here's a list of the currently available error handlers (taken from > codecs.py): > > The .encode()/.decode() methods may use different error > handling schemes by providing the errors argument. These > string values are predefined: > > 'strict' - raise a ValueError error (or a subclass) > 'ignore' - ignore the character and continue with the next > 'replace' - replace with a suitable replacement character; > Python will use the official U+FFFD REPLACEMENT > CHARACTER for the builtin Unicode codecs on > decoding and '?' on encoding. > 'xmlcharrefreplace' - Replace with the appropriate XML > character reference (only for encoding). > 'backslashreplace' - Replace with backslashed escape sequences > (only for encoding). > > The set of allowed values can be extended via register_error. > >>> Error handlers and codecs are two different things, so the namespaces >>> need to be clearly separate. >> They *are* separate naemspaces; that's guaranteed by the implementation. > > In the implementation, yes, but not in the head of a typical user: > the 'utf8b' looks more like a codec name than an error handler > name. > Judging by the existing names, I think that 'surrogate' would be reasonable. It already contains the meaning of substitute, it's not too long, and the codes which act as replacements are already called surrogates. > I want to avoid any such confusion with Python codecs and don't > understand why you are making a problem out of this. > From solipsis at pitrou.net Wed May 6 12:11:56 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 May 2009 10:11:56 +0000 (UTC) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> Message-ID: MRAB mrabarnett.plus.com> writes: > > Judging by the existing names, I think that 'surrogate' would be > reasonable. It already contains the meaning of substitute, Only if you are a native English-speaker I suppose... For me it's just a technical term denoting a certain class of unicode code points (I'm not sure of the latter terminology ;-)). Regards Antoine. From l.mastrodomenico at gmail.com Wed May 6 12:22:50 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Wed, 6 May 2009 12:22:50 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> <4A0141FD.2050307@v.loewis.de> Message-ID: 2009/5/6 Antoine Pitrou : > By the way, what are the ASCII characters that are not suppported by Shift-JIS? > Not many I suppose? (if I read the Wikipedia entry correctly, it's only the > backslash and the tilde). The biggest problem with Shift-JIS is that a perfectly valid unicode character above 127 can be encoded to a byte sequence that includes bytes in range(128). E.g. the character ? (a.k.a. '\u639b') when encoded with Shift-JIS becomes the two bytes sequence b'\x8a|'. 
Notice that the second byte is 124, which on POSIX is usually interpreted as the pipe character and can have security implications. It's a know problem with Shift-JIS and was fixed in UTF-8. -- Lino Mastrodomenico From regebro at gmail.com Wed May 6 12:28:22 2009 From: regebro at gmail.com (Lennart Regebro) Date: Wed, 6 May 2009 12:28:22 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A013CB4.9010204@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> Message-ID: <319e029f0905060328s5f3446a1j92c52d7d6cc140ae@mail.gmail.com> On Wed, May 6, 2009 at 09:31, "Martin v. L?wis" wrote: > They *are* separate naemspaces; that's guaranteed by the implementation. Yes. But utf8b *sounds like* an encoding. When it isn't. I sure thought it was when it was first mentioned. I agree that it would be better to find another name. 'utf8-binary-replace'? Is it only usable with utf8 as an encoding? -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64 From stephen at xemacs.org Wed May 6 13:39:18 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2009 20:39:18 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> <4A0141FD.2050307@v.loewis.de> Message-ID: <87ljpafok9.fsf@uwakimon.sk.tsukuba.ac.jp> Lino Mastrodomenico writes: > It's a know problem with Shift-JIS and was fixed in UTF-8. It was fixed in EUC before Shift-JIS was invented by Microsoft or Big5 was invented by the Taiwanese clone makers. Guido's not the only language designer with a time machine.... From stephen at xemacs.org Wed May 6 15:33:17 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 06 May 2009 22:33:17 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A014463.4070109@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> Message-ID: <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > > Yeah, yeah, this is the same old same old from PEP 3131. Anything > > that handles the various attacks based on ASCII-alike characters > > should at least rule out invalid Unicode, too! > > > > And where is this U+DC2F supposed to be coming from, anyway? The > > user's *local* environment or the user's *local* filesystem! > > Why is that not a threat? Suppose you have a setuid application, and > you pass some string on the command line that decodes to /../. Then > the setuid application will be tricked into modifying files it didn't > mean to modify. Of course this is a threat, assuming that the application takes no precautions. But first, it should be stopped by any of several standard precautions. For example, applying os.path.realpath (come to think of it, PEP 383 should say something about realpath, shouldn't it?) and os.path.normpath (PEP 383 should definitely say something about this function; maybe PEP 3131 should, too) before checking access restrictions. If you're not running your paths through those, you're already vulnerable to symlink attacks, and maybe other forms of spoofing. 
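The normpath behaviour that Martin and Stephen keep returning to is easy to check directly. A sketch using posixpath (POSIX path semantics regardless of host OS): literal '..' components are collapsed, while escaped surrogates are just opaque characters that the path functions ignore:

    import posixpath

    assert posixpath.normpath("a/b/../c") == "a/c"   # real dots are recognized
    escaped = "a/b/\udc2e\udc2e\udc2fc"              # escaped '.', '.', '/'
    assert posixpath.normpath(escaped) == escaped    # left untouched

String-level sanitation, in other words, only protects against bytes it can actually see.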
Second, it's a threat already enabled by your restricted version of PEP 383. Access control applies to subdirectories as well as to parent directories. Since you can insert arbitrary non-ASCII bytes into the path using the current definition of 'utf8b', name-based access restrictions can be bypassed in exactly the same way for any directory whose name is not 100.00% ASCII, and the setuid application will be tricked into modifying files it didn't mean to modify. Also, on Mac OS X, system directories, including directories containing system libraries, frameworks, and executables, may be accessible via locale-specific names (I don't have a Japanese- localized Mac at hand to check, but I'm pretty sure in my old Mac the Japanese names appeared in ls in Terminal.app, which means it may be possible to access system directories containing libraries, frameworks, and executables this way). Those can be spoofed in exactly the same way. > Nothing is lost at the moment. Nothing is lost compared to 'strict', true, but under the PEP as it is a large fraction of Shift JIS and Big5 filenames cannot be read under ASCII-compatible file system encodings using 'utf8b'. Yet it is those users who are placed at risk by PEP 383. > In any case, Python 3.1b1 may get released today, so it's way too late > for new features in the PEP. They can wait for Python 3.2. You have convinced me that the PEP should wait as well. In its current form it is incomplete and dangerous. From solipsis at pitrou.net Wed May 6 15:40:16 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 May 2009 13:40:16 +0000 (UTC) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull xemacs.org> writes: > > Nothing is lost compared to 'strict', true, but under the PEP as it is > a large fraction of Shift JIS and Big5 filenames cannot be read under > ASCII-compatible file system encodings using 'utf8b'. You should really be more specific. I'm not sure about others, but I don't understand what filenames you are talking about. From rdmurray at bitdance.com Wed May 6 15:55:16 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 6 May 2009 09:55:16 -0400 (EDT) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, 6 May 2009 at 13:40, Antoine Pitrou wrote: > Stephen J. Turnbull xemacs.org> writes: >> >> Nothing is lost compared to 'strict', true, but under the PEP as it is >> a large fraction of Shift JIS and Big5 filenames cannot be read under >> ASCII-compatible file system encodings using 'utf8b'. > > You should really be more specific. I'm not sure about others, but I don't > understand what filenames you are talking about. Seems to me that the best thing to do would be to file a bug report with test cases that demonstrate the problems when run against the current py3k trunk. Especially the security issues you cite (which I don't understand). 
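A concrete test case along the lines Antoine and David are asking for, reusing the Shift JIS character Lino cites above (U+639B encodes to b'\x8a|', with an ASCII pipe as the trailing byte). Assuming a UTF-8 file system encoding and the handler as it shipped ('surrogateescape'), only the high byte is escaped and the name still round-trips:

    raw = "\u639b".encode("shift_jis")                  # b'\x8a|'
    s = raw.decode("utf-8", "surrogateescape")
    assert s == "\udc8a|"                               # the pipe survives as ASCII
    assert s.encode("utf-8", "surrogateescape") == raw  # bytes restored intact

So the name can be listed and re-opened, but its str form contains a real '|', which is the readability and safety trade-off being debated.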
--David From zooko at zooko.com Wed May 6 15:48:57 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Wed, 6 May 2009 07:48:57 -0600 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4D13A827-2FC4-43F8-99CD-7188F832EA2A@zooko.com> On May 6, 2009, at 7:33 AM, Stephen J. Turnbull wrote: > You have convinced me that the PEP should wait as well. > > In its current form it is incomplete and dangerous. +1 on delaying PEP 383 I think PEP 383 is a good idea in principle, but I'm still struggling to understand it myself, and it seems to offer new hazards for the unwary programmer. On the other hand, maybe the wary programmers are waiting for Python 3.2 anyway . On the gripping hand, if PEP 383 is released in Python 3.1, will that obligate python-dev to support it indefinitely, at least in backwards- compatibility mode? I'm not thinking of API compatibility as much as data compatibility -- someone used Python 3.1 to write down some filenames, and now a few years later they are trying to use the latest and greatest Python release to read those filenames... Regards, Zooko From foom at fuhm.net Wed May 6 16:41:53 2009 From: foom at fuhm.net (James Y Knight) Date: Wed, 6 May 2009 10:41:53 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87my9qfu4p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00B339.5050305@v.loewis.de> <87tz3yg6jy.fsf@uwakimon.sk.tsukuba.ac.jp> <4A013DE1.5000401@v.loewis.de> <87my9qfu4p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On May 6, 2009, at 5:39 AM, Stephen J. Turnbull wrote: > Now, with Python's file system encoding == UTF-8 or any packed EUC, > and more than a handful of Shift JIS or Big5 characters in file names, > one is *almost certain* to encounter ASCII as the second byte of a > multibyte sequence. PEP 383 can't handle this Hm, I haven't tried the implementation, but I thought that what would happen is: '\x85a'.decode('utf-8', 'utf8b/surrogate-replace/whateveritscalled') - > u'\uDC85a' If that indeed doesn't happen, that's certainly a defect and should be remedied. > , but it is sure to be > the most common use case for PEP 383 in East Asia. Yes. James From ncoghlan at gmail.com Wed May 6 16:59:30 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 07 May 2009 00:59:30 +1000 Subject: [Python-Dev] Undocumented change / bug in Python3's PyMapping_Check In-Reply-To: <3283f7fe0905042219r23113ca6ud6dd3840d7462f37@mail.gmail.com> References: <3283f7fe0905042219r23113ca6ud6dd3840d7462f37@mail.gmail.com> Message-ID: <4A01A5D2.4030803@gmail.com> John Millikin wrote: > In Python 2, PyMapping_Check will return 0 for list objects. In Python > 3, it returns 1. Obviously, this makes it rather difficult to > differentiate between mappings and other sized iterables. In addition, > it differs from the behavior of the ``collections.Mapping`` ABC -- > isinstance([], collections.Mapping) returns False. > > I believe the new behavior is erroneous, but would like to confirm > that before filing a bug. It's not a bug. 
PyMapping_Check just tells you if a type has an entry in the tp_as_mapping->mp_subscript slot. In 2.x, it used to have an additional condition that the tp_as_sequence->sq_slice slot be empty, but that has gone away in Py3k because the sq_slice slot has been removed. Even in 2.x that test wasn't a reliable way of telling if something was a mapping or a sequence - it happened to get it right for lists and tuples (since they define __getslice__ and __setslice__), but this is not the case for new-style user defined sequences: >>> from operator import isMappingType >>> class MySeq(object): ... def __getitem__(self, idx): ... # Is this a mapping or an unsliceable sequence? ... return idx*2 ... >>> isMappingType(MySeq()) True Using the new collections module ABCs to check for sequences and mappings. That's what they're for, and they will give you a much more reliable answer than the C level checks (which are really just an implementation detail). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Wed May 6 18:54:37 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 May 2009 16:54:37 +0000 (UTC) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> <4D13A827-2FC4-43F8-99CD-7188F832EA2A@zooko.com> Message-ID: Zooko Wilcox-O'Hearn zooko.com> writes: > > I'm not thinking of API compatibility as much as > data compatibility -- someone used Python 3.1 to write down some > filenames, and now a few years later they are trying to use the > latest and greatest Python release to read those filenames... Well, if the filenames are generated by Python (as opposed to read from an existing directory on disk), they should be regular unicode objects without any lone surrogates, so I don't see the compatibility problem. Regards Antoine. From v+python at g.nevcal.com Wed May 6 19:05:01 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 06 May 2009 10:05:01 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A01C33D.3030906@g.nevcal.com> On approximately 5/6/2009 6:33 AM, came the following characters from the keyboard of Stephen J. Turnbull: > "Martin v. L?wis" writes: > > In any case, Python 3.1b1 may get released today, so it's way too late > > for new features in the PEP. They can wait for Python 3.2. > > You have convinced me that the PEP should wait as well. > > In its current form it is incomplete and dangerous. I see nothing in this thread that suggests that the PEP is dangerous in its current form. While I (still) think that more readable transcodings could have been used, and while I had difficulty fully understanding the PEP at first, now that I think I do understand the PEP, and it has been somewhat clarified and amended, I cannot see how it could be dangerous. A specific case of danger should be included with such a statement. Regarding incomplete, I agree it won't brush my teeth for me, but I think it does solve the problem it sets out to solve. 
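For anyone still trying to pin down what the handler actually does, the escaping scheme fits in a few lines of pure Python. This is only an illustration: the registered name 'pep383-demo' is invented, and the encode half assumes the extended error-callback interface the PEP introduces (a handler may hand back bytes):

    import codecs

    def pep383_demo(exc):
        if isinstance(exc, UnicodeDecodeError):
            # Every byte the codec rejected becomes a lone surrogate.
            bad = exc.object[exc.start:exc.end]
            return ''.join(chr(0xDC00 + b) for b in bad), exc.end
        if isinstance(exc, UnicodeEncodeError):
            chars = exc.object[exc.start:exc.end]
            if all(0xDC80 <= ord(c) <= 0xDCFF for c in chars):
                # ...and each such surrogate turns back into its byte.
                return bytes(ord(c) - 0xDC00 for c in chars), exc.end
        raise exc

    codecs.register_error('pep383-demo', pep383_demo)

    assert b'ab\xff'.decode('utf-8', 'pep383-demo') == 'ab\udcff'
    assert 'ab\udcff'.encode('utf-8', 'pep383-demo') == b'ab\xff'

The round trip in the last two lines is the problem the PEP sets out to solve.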
-- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Wed May 6 19:08:22 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 06 May 2009 10:08:22 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0161AD.6000605@mrabarnett.plus.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> Message-ID: <4A01C406.3030004@g.nevcal.com> On approximately 5/6/2009 3:08 AM, came the following characters from the keyboard of MRAB: > M.-A. Lemburg wrote: >> Martin v. L?wis wrote: > Judging by the existing names, I think that 'surrogate' would be > reasonable. It already contains the meaning of substitute, it's not too > long, and the codes which act as replacements are already called > surrogates. > >> I want to avoid any such confusion with Python codecs and don't >> understand why you are making a problem out of this. +1 for "surrogate" as the name for the error handler. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Wed May 6 19:11:15 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 06 May 2009 10:11:15 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0141FD.2050307@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> <4A0141FD.2050307@v.loewis.de> Message-ID: <4A01C4B3.9050905@g.nevcal.com> On approximately 5/6/2009 12:53 AM, came the following characters from the keyboard of Martin v. L?wis: >> Sorry! I suggest substituting the paragraph above for the paragraph >> which begins "The encode error handler interface presentlyrequires..." >> at line 129. > > Ah, ok. This was Glen Linderman's text before - now it's yours :-) Which is fine by me. Stephen's is more explanatory than mine, but says the same thing. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From tjreedy at udel.edu Wed May 6 21:13:55 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 06 May 2009 15:13:55 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A01C406.3030004@g.nevcal.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> Message-ID: Glenn Linderman wrote: > On approximately 5/6/2009 3:08 AM, came the following characters from > the keyboard of MRAB: >> M.-A. Lemburg wrote: >>> Martin v. L?wis wrote: > >> Judging by the existing names, I think that 'surrogate' would be >> reasonable. It already contains the meaning of substitute, it's not too >> long, and the codes which act as replacements are already called >> surrogates. 
>> >>> I want to avoid any such confusion with Python codecs and don't >>> understand why you are making a problem out of this. > > > +1 for "surrogate" as the name for the error handler. > > +1 from me also From zooko at zooko.com Wed May 6 21:18:03 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Wed, 6 May 2009 13:18:03 -0600 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> <4D13A827-2FC4-43F8-99CD-7188F832EA2A@zooko.com> Message-ID: On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote: > Zooko Wilcox-O'Hearn zooko.com> writes: >> >> I'm not thinking of API compatibility as much as data >> compatibility -- someone used Python 3.1 to write down some >> filenames, and now a few years later they are trying to use the >> latest and greatest Python release to read those filenames... > > Well, if the filenames are generated by Python (as opposed to read > from an existing directory on disk), they should be regular unicode > objects without any lone surrogates, so I don't see the > compatibility problem. I meant that the application reads filenames from an existing directory on disk, saves those filenames, and then later, using a future version of Python, wants to read them and use them. I'm not saying that I know this would be a problem. I'm saying that I personally can't tell whether it would be a problem or not, and the extensive discussions so far have not convinced me that there is anyone who both understands PEP 383 and considers this use case. Many people who apparently understand encoding issues well have said something to the effect that there is no problem, but those people haven't yet managed to get through my thick skull how I would use PEP 383 safely for this sort of use case -- the one where data generated by os.listdir() travels forward in time or the one were that data travels sideways to other systems, including Windows or other systems that validate incoming unicode. That's why I am a bit uncomfortable about PEP 383 being quickly implemented and deployed in Python 3.1. By the way, much of the detailed discussion about what Tahoe requires and how that may or may not benefit from PEP 383 has now moved to the tahoe-dev mailing list: http://allmydata.org/cgi-bin/mailman/listinfo/ tahoe-dev . Regards, Zooko From v+python at g.nevcal.com Wed May 6 22:17:05 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 06 May 2009 13:17:05 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> <4D13A827-2FC4-43F8-99CD-7188F832EA2A@zooko.com> Message-ID: <4A01F041.9000709@g.nevcal.com> On approximately 5/6/2009 12:18 PM, came the following characters from the keyboard of Zooko Wilcox-O'Hearn: > On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote: > >> Zooko Wilcox-O'Hearn zooko.com> writes: >>> >>> I'm not thinking of API compatibility as much as data compatibility >>> -- someone used Python 3.1 to write down some filenames, and now a >>> few years later they are trying to use the latest and greatest Python >>> release to read those filenames... 
>> >> Well, if the filenames are generated by Python (as opposed to read >> from an existing directory on disk), they should be regular unicode >> objects without any lone surrogates, so I don't see the compatibility >> problem. > > I meant that the application reads filenames from an existing directory > on disk, saves those filenames, and then later, using a future version > of Python, wants to read them and use them. Regarding future versions of Python. In the worst case, even if Python's default behavior changes, the transcoding done by PEP 383 can be done in other software too... it is a straightforward, fully specified, 1-to-1, reversible transcoding process, affecting and generating only invalid byte encodings on one side, and invalid Unicode sequences on the other. So if Python's default behavior should change, the transcoding implemented by PEP 383 could be easily reimplemented to enable a future version of a Python application to manipulate the transcoded, saved, filenames. By easily, I mean that I could code it in a couple hours, max. > I'm not saying that I know this would be a problem. I'm saying that I > personally can't tell whether it would be a problem or not, and the > extensive discussions so far have not convinced me that there is anyone > who both understands PEP 383 and considers this use case. Does the above help? > Many people who apparently understand encoding issues well have said > something to the effect that there is no problem, but those people > haven't yet managed to get through my thick skull how I would use PEP > 383 safely for this sort of use case -- the one where data generated by > os.listdir() travels forward in time or the one were that data travels > sideways to other systems, including Windows or other systems that > validate incoming unicode. Regarding data traveling sideways, some comments: 1) PEP 383's effect could be recoded in other languages as easily as it is in Python (or the C in which Python is implmented). So that could be a solution. 2) You mention "Windows" and "other systems that validate incoming unicode" in the same phrase, as if you think that "Windows" qualifies as an "other systems that validate incoming unicode", but it does not (at least not universally). > That's why I am a bit uncomfortable about PEP 383 being quickly > implemented and deployed in Python 3.1. Does the above help? > By the way, much of the detailed discussion about what Tahoe requires > and how that may or may not benefit from PEP 383 has now moved to the > tahoe-dev mailing list: > http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev . I have no background with Tahoe, nor particular interest, although it sounds like a useful project... so I won't be joining that list. I have no idea if there is an installed base of existing Tahoe file systems, my suggestions below assume that there is not, and that you are presently inventing them. Therefore, I provide no migration path, although I could invent one, but it would take longer to describe. However, since I'm responding here, and have read what you have posted here, it seems like the following could be true. Assumptions from your emails: A) Tahoe wants to provide a UTF-8 file name system B) Tahoe wants to interface to POSIX systems that use (and do not validate) byte interfaces. C) Tahoe wants to interface to non-POSIX systems that use 16-bit file name interfaces, with no validation. D) Tahoe wants to interface to non-POSIX systems that use 16-bit file name interfaces, with validation. 
Uncertainties: I'm not clear on what your goals are for Tahoe filenames. There seem to be 2 possibilities: 1) you want to reject attempts to use non-validating Unicode, be it from a 16-bit interface, or a bytes interface. 2) you don't want to reject non-validating Unicode, but you want to convert it to valid Unicode for (D) systems. 3) Orthogonally, you might want to store only Valid Unicode in the names, or you might not care, if you can meet the other goals. Truisms: If you want to support (D), and (2), then you must transform names at some point, using some scheme, because not all names supplied by (B) systems will be acceptable to (D) systems. You can choose to do this transformation when a (B) system provides an invalid (per Unicode) name, or you can choose to do the transformation when a (D) system accesses a file with an invalid (per Unicode) name. If the (B) and (D) systems talk to each other outside of Tahoe, they will have to do similar transformations, or, if they both access the same Tahoe system, they will have to do the identical transformation, to be sure that they can access the same file. All transcoding schemes have the possibility of data puns between non-transcoded names and transcoded names. In order to successfully and properly manipulate a name, you must know whether or not it has been transcoded, and how. PEP 383 limits its transcoding to names that are invalid (per Unicode). Names that cannot be properly decoded to Unicode are decoded to invalid Unicode. Names that are invalid Unicode are encoded to invalid byte sequences (per the encoding scheme specified). For PEP 383 and Python, transcoded names can be distinguished by checking for the existence of lone surrogates in the str form of the filename, or by attempting to do a strict decoding of the bytes form of the filename, depending on what you have (generally, the former). For PEP 383 and Python, the names will round trip from the POSIX bytes interfaces to the program, and back to POSIX bytes interfaces, as long as only Python wrappers of system functions are used, and the filesystem encoding is not changed between calls (or is restored). Passing them to 3rd party libraries or other systems requires extra work, if there is a desire to manipulate files with names that are not decodeable to Unicode by the standard decoding algorithm for that encoding. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Wed May 6 22:40:13 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 22:40:13 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A015E08.5000203@egenix.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> Message-ID: <4A01F5AD.4000404@v.loewis.de> >>> The name "utf8b" suggested in the PEP is not in line with the codec >>> design >> Where is that design documented, and how exactly violates the name >> the design (chapter and verse, please). > > Martin, I designed the whole Python codec machinery Not true. PEP 293 was written and designed by Walter D?rwald. > so even if > this is not explicitly written down somewhere, you can take my > word for it. 
If the design was specified in writing somewhere, I would probably challenge it as obsolete. If it isn't described anywhere, I'll have to ignore it. > I want to avoid any such confusion with Python codecs and don't > understand why you are making a problem out of this. Because utf8b (or, perhaps "UTF-8b") is the official name for this algorithm: http://hyperreal.org/~est/utf-8b/ Regards, Martin From martin at v.loewis.de Wed May 6 22:34:53 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 22:34:53 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87my9qfu4p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00B339.5050305@v.loewis.de> <87tz3yg6jy.fsf@uwakimon.sk.tsukuba.ac.jp> <4A013DE1.5000401@v.loewis.de> <87my9qfu4p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A01F46D.50105@v.loewis.de> > I'm sorry for the lack of clarity of my posts, but somehow you're > completely missing the point. The point is precisely that Python > *won't* use Shift JIS as the file system encoding (if it did there > would be no problem with reading Shift JIS), but the people who > created the media *did*. > > Now, with Python's file system encoding == UTF-8 or any packed EUC, > and more than a handful of Shift JIS or Big5 characters in file names, > one is *almost certain* to encounter ASCII as the second byte of a > multibyte sequence. PEP 383 can't handle this Not true. PEP 383 handles this very example just fine, with no problems that I can see. Can you propose a specific example that you think might cause problems? By "specific", I mean: what file names (exact bytes, please), what locale charset, what API calls. Regards, Martin From martin at v.loewis.de Wed May 6 22:41:11 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 22:41:11 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0161AD.6000605@mrabarnett.plus.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> Message-ID: <4A01F5E7.7030401@v.loewis.de> > Judging by the existing names, I think that 'surrogate' would be > reasonable MAL's list of existing names is incomplete. "surrogates" is already an existing name, also, and it means something different (similar, but different). Regards, Martin From martin at v.loewis.de Wed May 6 22:42:03 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 22:42:03 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> Message-ID: <4A01F61B.1000203@v.loewis.de> Terry Reedy wrote: > Glenn Linderman wrote: >> On approximately 5/6/2009 3:08 AM, came the following characters from >> the keyboard of MRAB: >>> M.-A. Lemburg wrote: >>>> Martin v. L?wis wrote: >> >>> Judging by the existing names, I think that 'surrogate' would be >>> reasonable. 
It already contains the meaning of substitute, it's not too >>> long, and the codes which act as replacements are already called >>> surrogates. >>> >>>> I want to avoid any such confusion with Python codecs and don't >>>> understand why you are making a problem out of this. >> >> >> +1 for "surrogate" as the name for the error handler. >> >> > +1 from me also Despite there being also an error handler called "surrogates". Are you serious? Regards, Martin From martin at v.loewis.de Wed May 6 22:44:09 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 22:44:09 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <319e029f0905060328s5f3446a1j92c52d7d6cc140ae@mail.gmail.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <319e029f0905060328s5f3446a1j92c52d7d6cc140ae@mail.gmail.com> Message-ID: <4A01F699.6050408@v.loewis.de> > Is it only usable with utf8 as an encoding? No, it applies to any codec which potentially cannot decode all bytes >127. Regards, Martin From solipsis at pitrou.net Wed May 6 22:48:15 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 May 2009 20:48:15 +0000 (UTC) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > > Despite there being also an error handler called "surrogates". People, perhaps we could end all the bikeshedding and call one of those handlers "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful to what they actually /do/? Regards Antoine. From martin at v.loewis.de Wed May 6 22:48:34 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 06 May 2009 22:48:34 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <87skjig5el.fsf@uwakimon.sk.tsukuba.ac.jp> <4A014463.4070109@v.loewis.de> <87k54ufjaa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A01F7A2.5080603@v.loewis.de> > But first, it should be stopped by any of several > standard precautions. For example, applying os.path.realpath (come to > think of it, PEP 383 should say something about realpath, shouldn't > it?) Why do you think so? I think the existing documentation of realpath is correct and complete. > and os.path.normpath (PEP 383 should definitely say something > about this function Precisely what? > maybe PEP 3131 should, too) How can this be of relevance? > > Nothing is lost at the moment. > > Nothing is lost compared to 'strict', true, but under the PEP as it is > a large fraction of Shift JIS and Big5 filenames cannot be read under > ASCII-compatible file system encodings using 'utf8b'. Yet it is those > users who are placed at risk by PEP 383. I think this statement is incorrect. Those filenames *can* be read just fine. 
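A concrete case, using the katakana 'so' that gives Shift JIS its notorious trailing-backslash problem ('utf8b' below is simply the handler name the current patch registers; only the behavior matters here):

    # Shift JIS encodes the katakana 'so' as b'\x83\x5c'; the second
    # byte is the ASCII backslash.
    raw = b'\x83\x5c'

    # Under a UTF-8 file system encoding, 0x83 cannot start a valid
    # sequence, so it is escaped to a lone surrogate, while 0x5c still
    # decodes as plain ASCII.
    name = raw.decode('utf-8', 'utf8b')
    assert name == '\udc83\\'

    # Encoding the escaped string restores the original bytes, so the
    # file can be listed, opened and removed even though its name never
    # decodes cleanly.
    assert name.encode('utf-8', 'utf8b') == raw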
Regards, Martin From martin at v.loewis.de Wed May 6 22:56:34 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 06 May 2009 22:56:34 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: <4A01F982.2030205@v.loewis.de> Antoine Pitrou wrote: > Martin v. L?wis v.loewis.de> writes: >> Despite there being also an error handler called "surrogates". > > People, perhaps we could end all the bikeshedding and call one of those handlers > "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful > to what they actually /do/? The problem with these bike-shedding discussions is that you cannot stop them with a proposal. People will counter-propose. I would be willing to accept a ruling from someone who a) is a native speaker of English, and b) has demonstrated to fully understand what these do, and c) has understood why I insist on calling it utf8b. Regards, Martin From tjreedy at udel.edu Wed May 6 23:47:05 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 06 May 2009 17:47:05 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A01F61B.1000203@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: Martin v. L?wis wrote: >>> +1 for "surrogate" as the name for the error handler. >>> >>> >> +1 from me also > > Despite there being also an error handler called "surrogates". Given that additional information which MAL apparently omitted, I would revise. > Are you serious? Are you? ;-? You are the one naming a codec-agnostic error handler (if I understand correctly, and correct me if I do not) after a particular codec, and denying that that could cause confusion. See other message. Terry Jan Reedy From p.f.moore at gmail.com Thu May 7 00:01:23 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 6 May 2009 23:01:23 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: <79990c6b0905061501u753042c4y337b92605578020e@mail.gmail.com> 2009/5/6 Antoine Pitrou : > Martin v. L?wis v.loewis.de> writes: >> >> Despite there being also an error handler called "surrogates". > > People, perhaps we could end all the bikeshedding and call one of those handlers > "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful > to what they actually /do/? We could also stop the bikeshedding by sticking with the name utf8b. Martin's comment that it is the official name for this algorithm seems compelling to me (even if it is confusing because of its similarity with utf-8). Paul. 
From tjreedy at udel.edu Thu May 7 00:03:57 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 06 May 2009 18:03:57 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A01F5AD.4000404@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A01F5AD.4000404@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Because utf8b (or, perhaps "UTF-8b") is the official name for this > algorithm: > http://hyperreal.org/~est/utf-8b/ Thank you for the link. It starts: "This directory contains a C implementation of a UTF-8b codec. A Python codec based on it is provided as well." 'RTF-8b' consists, obviously, 'UTF-8' plus 'b', with the 'b' signifying a variation of or addition to UTF-8. The 'b', and only the 'b', refers to the innovative error-handler that was added to the existing 'UTF-8' codec/algorithm. The name of the combined whole is not the name of the part. If you were incorporating the Python-wrapped utf-8b *codec* as a codec, which is what I once thought *because you used that name*, then calling it 'utf-8b' would be fine. But you apparently instead proposed and implemented an *error-handler*, which seems to me to be something else, and which will not be specific to utf-8 but usable with any codec. Hence some of us think it should have a different name. I gather that you lifted the error-handler part of the algorithm and propose to use it with *any* ascii-respecting codec. I could claim that the 'official name' of that part is 'b', but I think we can find a better name. Terry Jan Reedy From tjreedy at udel.edu Thu May 7 00:33:11 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 06 May 2009 18:33:11 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A01F982.2030205@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Antoine Pitrou wrote: >> Martin v. L?wis v.loewis.de> writes: >>> Despite there being also an error handler called "surrogates". >> People, perhaps we could end all the bikeshedding and call one of those handlers >> "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful >> to what they actually /do/? > > The problem with these bike-shedding discussions is that you cannot stop > them with a proposal. People will counter-propose. > > I would be willing to accept a ruling from someone who a) is a native > speaker of English, and b) has demonstrated to fully understand what > these do, and c) has understood why I insist on calling it utf8b. I qualify with a). I believe I understand c) but, as explained in my other post, I do not think your reason applies. In fact, I think concern for naming rights might suggest that you *not* reuse the name for something different. I would have to learn more about the existing 'surrogates' handler to judge Antione's suggestion 'surrogates-pass'. 'Surrogates-escape' is pretty good for the new handler since, to my understanding, it 'escapes' 'bad bytes' by prefixing them with bits that push them to the surrogates plane. 
I have been supportive of the idea and, as well as I understood them, the particulars of your proposal, from the beginning. Reusing the name of a codec as the name of an error-handler confused me and I believe it will confuse others, even though, but also because, the error handler was extracted and generalized from the codec. Terry Jan Reedy From martin at v.loewis.de Thu May 7 00:59:18 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 07 May 2009 00:59:18 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: <4A021646.8030904@v.loewis.de> >> Are you serious? > > Are you? ;-? You are the one naming a codec-agnostic error handler (if > I understand correctly, and correct me if I do not) after a particular > codec, and denying that that could cause confusion. See other message. I can only repeat what I said before: I call it utf8b because that's the established name for the algorithm it implements. That algorithm was originally designed with UTF-8 in mind (and only meant to be applied for UTF-8), however, it remains the same algorithm even though PEP 383 widens its application. Regards, Martin From google at mrabarnett.plus.com Thu May 7 01:06:24 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 07 May 2009 00:06:24 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: <4A0217F0.4070004@mrabarnett.plus.com> Antoine Pitrou wrote: > Martin v. L?wis v.loewis.de> writes: >> Despite there being also an error handler called "surrogates". > > People, perhaps we could end all the bikeshedding and call one of those handlers > "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful > to what they actually /do/? > After having read about the existing error handler called "surrogates" and having thought about it, I've decided that calling one just "surrogates" isn't very helpful to the user; it has something to do with surrogates, but what? So +1 for Antoine's suggestion from me. From martin at v.loewis.de Thu May 7 01:16:18 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 07 May 2009 01:16:18 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> Message-ID: <4A021A42.4060509@v.loewis.de> > I qualify with a). I believe I understand c) but, as explained in my > other post, I do not think your reason applies. In fact, I think > concern for naming rights might suggest that you *not* reuse the name > for something different. 
I would have to learn more about the existing > 'surrogates' handler to judge Antione's suggestion 'surrogates-pass'. > 'Surrogates-escape' is pretty good for the new handler since, to my > understanding, it 'escapes' 'bad bytes' by prefixing them with bits that > push them to the surrogates plane. See issue 3672. In essence, in python 2.5: py> u"\ud800".encode("utf-8") '\xed\xa0\x80' py> '\xed\xa0\x80'.decode("utf-8") u'\ud800' In 3.1, py> "\ud800".encode("utf-8") Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed py> "\ud800".encode("utf-8","surrogates") b'\xed\xa0\x80' py> b'\xed\xa0\x80'.decode("utf-8") Traceback (most recent call last): File "", line 1, in UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: illegal encoding py> b'\xed\xa0\x80'.decode("utf-8","surrogates") '\ud800' Regards, Martin From solipsis at pitrou.net Thu May 7 01:27:00 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 6 May 2009 23:27:00 +0000 (UTC) Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > py> b'\xed\xa0\x80'.decode("utf-8","surrogates") > '\ud800' The point is, "surrogates" does not mean anything intuitive for an /error handler/. You seem to be the only one who finds this name explicit enough, perhaps because you chose it. Most other handlers' names have verbs in them ("ignore", "replace", "xmlcharrefreplace", etc.). Regards Antoine. From skippy.hammond at gmail.com Thu May 7 01:38:47 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Thu, 07 May 2009 09:38:47 +1000 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <4A000356.30408@trueblade.com> References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk> <49F8DBCD.6050504@trueblade.com> <49F9FCD0.80208@hastings.org> <49FA4064.5000508@gmail.com> <4A000356.30408@trueblade.com> Message-ID: <4A021F87.8030905@gmail.com> Eric Smith wrote: > Mark: I've reviewed this and it looks okay to me. Thanks Eric - I've now applied that patch. As you mentioned in a followup to the bug: | Thanks for looking at this, Mark. If we could only assign issues to | Python 3.2 and 3.3 to change the pending deprecation warning to a real | one, and to remove the function entirely, we'd be all set! I'm always | worried we'll forget these things. (for reference; the patch introduces a PendingDeprecationWarning for ntpath.uncpath) The bug tracker doesn't have these future versions available yet - is there some other way these things should be tracked? I fear simply opening a new bug without a reasonable 'trigger' will linger way beyond the next few versions... 
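For readers unfamiliar with the mechanism: a pending deprecation of this sort ordinarily amounts to a warnings call at the top of the function. A rough sketch, not the actual patch:

    import warnings

    def uncpath(path):
        # Illustrative only; the real code and message live in ntpath.py.
        warnings.warn("ntpath.uncpath() is deprecated",
                      PendingDeprecationWarning, stacklevel=2)
        ...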
Thanks, Mark From murman at gmail.com Thu May 7 03:05:42 2009 From: murman at gmail.com (Michael Urman) Date: Wed, 6 May 2009 20:05:42 -0500 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A01F61B.1000203@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: On Wed, May 6, 2009 at 15:42, "Martin v. L?wis" wrote: > Despite there being also an error handler called "surrogates". Not that I have to be, but I'm not sold on the previous UTF-8 codec behavior becoming an error handler of the name "surrogates" for two reasons (I do respect the obvious PBP argument for the implementation, and have no better name - "lenient"?). First, unless there's a way to stack error handlers, there's no way to access the old behavior combined with the "replace" handler. Second, errors="surrogates" reads like surrogates should be an error, not an additionally allowed pattern. Neither of these are deal breakers or hard to learn, but they are non-obvious. I think the utf8b behavior makes a lot more sense with the name "surrogates", through the mnemonic that errors become surrogates. The stacking argument also applies to the new utf8b behavior on encode (only, as it handles all errors on decode). This may be a YAGNI, but for a non-UTF-8 encode, it may be useful to allow "xmlcharrefreplace" handling for unavailable non-surrogate-escaped characters. But without stacking that's unmaintainable, as we clearly don't want ${codec}b for all current codecs. I'd be perfectly happy with utf8b or UTF-8b, as either a codec or an error handler (do we want both? YAGNI?). So what if it smells a little inaccurate as a handler when used with codecs other than UTF-8, no big deal. I could also see something like errors="roundtrip" which explains the intention of the handler rather than the algorithm, but is awkward on encode when it encounters unavailable Unicode characters. -- Michael Urman From mal at egenix.com Thu May 7 03:06:05 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 07 May 2009 03:06:05 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A01F5AD.4000404@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A01F5AD.4000404@v.loewis.de> Message-ID: <4A0233FD.6010509@egenix.com> Martin v. L?wis wrote: >>>> The name "utf8b" suggested in the PEP is not in line with the codec >>>> design >>> Where is that design documented, and how exactly violates the name >>> the design (chapter and verse, please). >> Martin, I designed the whole Python codec machinery > > Not true. PEP 293 was written and designed by Walter D?rwald. Walter added the generic error handler callback mechanism and we both worked on their design. I designed and wrote the codec implementation back in 2000, which included the whole idea of having codec error handlers in the first place. The original implementation only allowed per-codec error handlers. Walter extended this to build general-purpose handlers that could be used by many codecs. His original motivation was to be able to do XML character reference escaping. 
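That general-purpose handler is the one registered as "xmlcharrefreplace"; a one-line illustration:

    >>> 'price: \u20ac99'.encode('ascii', 'xmlcharrefreplace')
    b'price: &#8364;99'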
If you don't believe me, go look this up in the repository, the mailing list archives and the trackers. >> so even if >> this is not explicitly written down somewhere, you can take my >> word for it. > > If the design was specified in writing somewhere, I would probably > challenge it as obsolete. If it isn't described anywhere, I'll have > to ignore it. Ah, lovely attitude. >> I want to avoid any such confusion with Python codecs and don't >> understand why you are making a problem out of this. > > Because utf8b (or, perhaps "UTF-8b") is the official name for this > algorithm: > > http://hyperreal.org/~est/utf-8b/ That's a codec implementing the escaping idea proposed by Markus Kuhn, not an official reference. AFAIK, the term "UTF-8B" originated from a "UTF-8 + binary" codec written for iconv: http://mail.nl.linux.org/linux-utf8/2006-04/msg00002.html If it were the official name of an escape algorithm, as you are suggesting, the inventor Markus Kuhn would probably have chosen it, but he hasn't... the only reference to it is an email where it is described as option D for ways of dealing with malformed UTF-8 data in a decoder: http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html Note that this escape method is not applicable for data that you decode from UTF-8 and then e.g. encode as Latin-1. It only works as general purpose method if you are decoding and encoding using the same codec, since it is specifically designed to assure round-trip safety. Martin, please stop being silly and just change the name. Or drop the idea of using an error handler altogether and just let people use the utf-8b codec you referenced above to solve their problems whereever and if needed. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 07 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 52 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From benjamin at python.org Thu May 7 03:14:06 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 6 May 2009 20:14:06 -0500 Subject: [Python-Dev] test - please ignore Message-ID: <1afaf6160905061814t61b81148y68ccec09cfee1853@mail.gmail.com> Some of my messages appear not to have gotten through. -- Regards, Benjamin From benjamin at python.org Thu May 7 03:32:47 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 6 May 2009 20:32:47 -0500 Subject: [Python-Dev] [RELEASED] Python 3.1 beta 1 Message-ID: <1afaf6160905061832xfc295e3y881c7c8e81083ee6@mail.gmail.com> On behalf of the Python development team, I'm thrilled to announce the first and only beta release of Python 3.1. Python 3.1 focuses on the stabilization and optimization of features and changes Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File system APIs that use unicode strings now handle paths with undecodable bytes in them. [1] Other features include an ordered dictionary implementation and support for ttk Tile in Tkinter. 
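The ordered dictionary mentioned above is collections.OrderedDict, which remembers insertion order; a minimal example:

    >>> from collections import OrderedDict
    >>> list(OrderedDict([('b', 2), ('a', 1)]))
    ['b', 'a']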
For a more extensive list of changes in 3.1, see http://doc.python.org/dev/py3k/whatsnew/3.1.html or Misc/NEWS in the Python distribution. Please note that this is a beta release, and as such is not suitable for production environments. We continue to strive for a high degree of quality, but there are still some known problems and the feature sets have not been finalized. This beta is being released to solicit feedback and hopefully discover bugs, as well as allowing you to determine how changes in 3.1 might impact you. If you find things broken or incorrect, please submit a bug report at http://bugs.python.org For more information and downloadable distributions, see the Python 3.1 website: http://www.python.org/download/releases/3.1/ See PEP 375 for release schedule details: http://www.python.org/dev/peps/pep-0375/ Enjoy, -- Benjamin Benjamin Peterson benjamin at python.org Release Manager (on behalf of the entire python-dev team and 3.1's contributors) From stephen at xemacs.org Thu May 7 04:35:52 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 07 May 2009 11:35:52 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A01F46D.50105@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A005A7A.7070501@mrabarnett.plus.com> <874ovzh2xb.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00B339.5050305@v.loewis.de> <87tz3yg6jy.fsf@uwakimon.sk.tsukuba.ac.jp> <4A013DE1.5000401@v.loewis.de> <87my9qfu4p.fsf@uwakimon.sk.tsukuba.ac.jp> <4A01F46D.50105@v.loewis.de> Message-ID: <87iqkdfxmf.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > > Now, with Python's file system encoding == UTF-8 or any packed EUC, > > and more than a handful of Shift JIS or Big5 characters in file names, > > one is *almost certain* to encounter ASCII as the second byte of a > > multibyte sequence. PEP 383 can't handle this Ah, I see. Of course, the algorithm not only has to handle the ASCII octet which is erroneous because it can't be a trailing byte, but *also the leading byte that signalled to expect a trailing byte >127*. So the algorithm backs up to the character boundary (which is well-defined for all the "sane" encodings), encode the high byte(s) in the character with lone surrogates, and encode the ASCII as itself (promoted to a Unicode code point). Sorry, you're right, I was just confused. I withdraw the objection as completely mistaken, and apologize for not thinking more carefully in the first place. From tjreedy at udel.edu Thu May 7 05:48:38 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 06 May 2009 23:48:38 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A021646.8030904@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> Message-ID: Martin v. L?wis wrote: >>> Are you serious? >> Are you? ;-? You are the one naming a codec-agnostic error handler (if >> I understand correctly, and correct me if I do not) after a particular >> codec, and denying that that could cause confusion. See other message. > > I can only repeat what I said before: I call it What, specifically, is 'it'? > utf8b because that's > the established name for the algorithm Which algorithm? > it implements. 
Again, what is 'it'? As *I* read the sentence above, it is not true. I went to the site you referred to as the source of your reasoning and specifically http://hyperreal.org/~est/utf-8b/releases/utf-8b-20060413043934/utf_8b.c The algorithm called utf-8b *IS* utf-8 with the addition or replacement (of an error return) of essentially one line in each direction: # encode if 0xDC00 <= codepoint <= 0xDCFF: byte = codepoint - 0xDC00 #encode Note: for security concerns, you are increasing the lower limit to 0xDC80. The comment at the top of the utf_8b.c, suggests that that is what it should be and should have been in the file, with the other half of that surrogate area an error along with the other surrogate area. #decode if (0x80 <= byte <= 0xFF) and utf-8-invalid(byte): codepoint = byte + 0xDC00 # decode > That algorithm was originally designed with UTF-8 in mind (and only > meant to be applied for UTF-8), however, it remains the same algorithm > even though PEP 383 widens its application. The error handler designed with utf-8 in mind has no name in the encode direction and is called "utf_8b_decoder_invalid_bytes" in the decode direction. By your reasoning, *that* should be its name in Python. The encoding error handler would then be named analogously "utf_8b_encoder_invalid_codepoints". Even these, to me, would be better than confusing giving them the same name as the codec. Terry Jan Reedy From v+python at g.nevcal.com Thu May 7 06:16:02 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 06 May 2009 21:16:02 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0233FD.6010509@egenix.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A01F5AD.4000404@v.loewis.de> <4A0233FD.6010509@egenix.com> Message-ID: <4A026082.2030508@g.nevcal.com> On approximately 5/6/2009 6:06 PM, came the following characters from the keyboard of M.-A. Lemburg: > Martin, please stop being silly and just change the name. Yes, please. If indeed Marc-Andre invented the codec business as he claims, he would be an appropriate person to give a fiat name to the error handler. > Or drop the idea of using an error handler altogether and just let > people use the utf-8b codec you referenced above to solve their > problems whereever and if needed. The design as an error handler is clever in leveraging the same error handler for multiple codecs, which cannot be done by using utf-8b alone, if I understand correctly. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Thu May 7 07:43:30 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 07 May 2009 07:43:30 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> Message-ID: <4A027502.5000901@v.loewis.de> Michael Urman wrote: > On Wed, May 6, 2009 at 15:42, "Martin v. 
L?wis" wrote: >> Despite there being also an error handler called "surrogates". > > Not that I have to be, but I'm not sold on the previous UTF-8 codec > behavior becoming an error handler of the name "surrogates" for two > reasons (I do respect the obvious PBP argument for the implementation, > and have no better name - "lenient"?). PBP? > First, unless there's a way to stack error handlers, there's no way to > access the old behavior combined with the "replace" handler. Well, there is a way to stack error handlers, although it's not pretty: _surrogates = codecs.lookup_errors("surrogates") _replace = codecs.lookup_errors("replace") def surrogates_then_replace(exc): try: return _surrogates(exc) except UnicodeError: return _replace(exc) codecs.register_error("surrogates_then_replace", surrogates_then_replace) > The stacking argument also applies to the new utf8b behavior on encode > (only, as it handles all errors on decode). This may be a YAGNI Indeed - in particular, as, in the primary application of this error handler (i.e. file IO operations), there is no way of specifying an addition error handler anyway. Regards, Martin From martin at v.loewis.de Thu May 7 07:53:07 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 07 May 2009 07:53:07 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> Message-ID: <4A027743.2050500@v.loewis.de> > The error handler designed with utf-8 in mind has no name in the encode > direction and is called "utf_8b_decoder_invalid_bytes" in the decode > direction. By your reasoning, *that* should be its name in Python. The > encoding error handler would then be named analogously > "utf_8b_encoder_invalid_codepoints". Even these, to me, would be better > than confusing giving them the same name as the codec. So are you proposing that I should rename the PEP 383 handler to "utf_8b_encoder_invalid_codepoints"? Regards, Martin From martin at v.loewis.de Thu May 7 08:10:16 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 07 May 2009 08:10:16 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <87r5z2g3zk.fsf@uwakimon.sk.tsukuba.ac.jp> <4A0141FD.2050307@v.loewis.de> Message-ID: <4A027B48.5060208@v.loewis.de> > By the way, what are the ASCII characters that are not suppported by Shift-JIS? > Not many I suppose? (if I read the Wikipedia entry correctly, it's only the > backslash and the tilde). The problem with this encoding is that bytes below 128 appear as second bytes of a two-byte encoding: py> "\x81@".decode("shift-jis") u'\u3000' py> "\x81A".decode("shift-jis") u'\u3001' So in on decoding, it may be the second byte (i.e. 
the ASCII byte) that causes a problem: py> "\x81/".decode("shift-jis") Traceback (most recent call last): File "", line 1, in UnicodeDecodeError: 'shift_jis' codec can't decode bytes in position 0-1: illegal multibyte sequence For the shift-jis codec, that's actually not a problem, though: py> b"\x81/".decode("shift-jis","utf8b") '\udc81/' so the utf8b error handler will escape the first of the two bytes, and then pass the second byte to the codec again, which then decodes as ASCII. Regards, Martin From martin at v.loewis.de Thu May 7 08:16:11 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 07 May 2009 08:16:11 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A027904.7040602@g.nevcal.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> Message-ID: <4A027CAB.5070708@v.loewis.de> >> So are you proposing that I should rename the PEP 383 handler >> to "utf_8b_encoder_invalid_codepoints"? > > > No, he's saying that your algorithm for choosing the PEP 383 handler > should have come up with that name, rather than utf8b. But since PEP > 383 applies to other codecs besides UTF-8, it should have a different > name. And one that is less cumbersome than > "utf_8b_encoder_invalid_codepoints" I'm still at a loss what name to give it, though. I understand that I have to rename both error handlers, but I'm uncertain what I should rename them to. So proposals that rename only one of them aren't that helpful. It would be helpful if people would indicate support for Antoine's proposal. Regards, Martin From v+python at g.nevcal.com Thu May 7 08:00:36 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 06 May 2009 23:00:36 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A027743.2050500@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> Message-ID: <4A027904.7040602@g.nevcal.com> On approximately 5/6/2009 10:53 PM, came the following characters from the keyboard of Martin v. L?wis: >> The error handler designed with utf-8 in mind has no name in the encode >> direction and is called "utf_8b_decoder_invalid_bytes" in the decode >> direction. By your reasoning, *that* should be its name in Python. The >> encoding error handler would then be named analogously >> "utf_8b_encoder_invalid_codepoints". Even these, to me, would be better >> than confusing giving them the same name as the codec. > > So are you proposing that I should rename the PEP 383 handler > to "utf_8b_encoder_invalid_codepoints"? No, he's saying that your algorithm for choosing the PEP 383 handler should have come up with that name, rather than utf8b. But since PEP 383 applies to other codecs besides UTF-8, it should have a different name. 
And one that is less cumbersome than "utf_8b_encoder_invalid_codepoints" -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Thu May 7 08:37:36 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 07 May 2009 08:37:36 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A028090.6060405@g.nevcal.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> <4A028090.6060405@g.nevcal.com> Message-ID: <4A0281B0.9070303@v.loewis.de> > Wouldn't renaming the existing "surrogates" handler be an incompatible > change, and thus inappropriate? No - it's new in Python 3.1. So what do you think about Antoine's proposal? Regards, Martin From v+python at g.nevcal.com Thu May 7 08:32:48 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 06 May 2009 23:32:48 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A027CAB.5070708@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> Message-ID: <4A028090.6060405@g.nevcal.com> On approximately 5/6/2009 11:16 PM, came the following characters from the keyboard of Martin v. L?wis: >>> So are you proposing that I should rename the PEP 383 handler >>> to "utf_8b_encoder_invalid_codepoints"? >> >> No, he's saying that your algorithm for choosing the PEP 383 handler >> should have come up with that name, rather than utf8b. But since PEP >> 383 applies to other codecs besides UTF-8, it should have a different >> name. And one that is less cumbersome than >> "utf_8b_encoder_invalid_codepoints" > > I'm still at a loss what name to give it, though. I understand that > I have to rename both error handlers, but I'm uncertain what I should > rename them to. So proposals that rename only one of them aren't > that helpful. It would be helpful if people would indicate support > for Antoine's proposal. Wouldn't renaming the existing "surrogates" handler be an incompatible change, and thus inappropriate? I assume that is the second handler you are referring to? "bytes-as-lone-surrogates" That would be very descriptive of the decode case for PEP 383, but very long. One problem with the word "surrogates" is that anything you add to it makes it too long. "bytes-ls" This is short, but a meaningless as is -- however, adding the understanding via documentation that "ls" means "lone surrogates" would make it meaningful, and mnemonic. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From mal at egenix.com Thu May 7 11:21:28 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 07 May 2009 11:21:28 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> Message-ID: <4A02A818.4000204@egenix.com> Antoine Pitrou wrote: > Martin v. L?wis v.loewis.de> writes: >> py> b'\xed\xa0\x80'.decode("utf-8","surrogates") >> '\ud800' > > The point is, "surrogates" does not mean anything intuitive for an /error > handler/. You seem to be the only one who finds this name explicit enough, > perhaps because you chose it. > Most other handlers' names have verbs in them ("ignore", "replace", > "xmlcharrefreplace", etc.). Correct. The purpose of an error handler name is to indicate to the user what it does, hence the use of verbs. Walter started with "xmlcharrefreplace", ie. no space names, so "surrogatereplace" would be the logically correct name for the "replace with lone surrogates" scheme invented by Markus Kuhn. The error handler for undoing this operation (ie. when converting a Unicode string to some other encoding) should probably use the same name based on symmetry and the fact that the escaping scheme is meant to be used for enabling round-trip safety. BTW: It would also be appropriate to reference Markus Kuhn in the PEP as the inventor of the escaping scheme. Even if only to give the reader an idea of how that scheme works and why (the PEP on python.org currently doesn't explain this). It should also explain that the scheme is meant to assure round-trip safety and doesn't necessarily work when using transcoding, ie. reading using one encoding, writing using another. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 07 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 52 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From cournape at gmail.com Thu May 7 11:50:18 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 7 May 2009 18:50:18 +0900 Subject: [Python-Dev] Help on issue 5941 In-Reply-To: <94bdd2610905060201s2590144dp386d33773338d923@mail.gmail.com> References: <94bdd2610905060201s2590144dp386d33773338d923@mail.gmail.com> Message-ID: <5b8d13220905070250m694f62d1uf311fde0f5203e8d@mail.gmail.com> On Wed, May 6, 2009 at 6:01 PM, Tarek Ziad? wrote: > Hello, > > I need some help on http://bugs.python.org/issue5941 > > The bug is quite simple: the Distutils unixcompiler used to set the > archiver command to "ar -rc". 
> > For quite a while now, this behavior has changed in order to be able > to customize the compiler behavior from > the environment. That introduced a regression because the mechanism in > Distutils that looks for the > AR variable in the environment also looks into the Makefile of Python. > (in the Makefile then is os.environ) > > And as a matter of fact, AR is set to "ar" in there, so the -cr option > is not set anymore. > > So my question is : should I make a change into the Makefile by adding > for example a variable called AR_OPTIONS > then build the ar command with AR + AR_OPTIONS I think for consistency, it could be named ARFLAGS (this is the name usually used by configure scripts), and both should be overridable, like the other variables, in distutils.sysconfig.customize_compiler. Those flags should be used in Makefile.pre as well, instead of the hardcoded cr as currently used. Here is what I would try: - check for AR (already done in the configure script AFAICT) - if ARFLAGS is defined in the environment, use those, otherwise set ARFLAGS to cr - use ARFLAGS in the makefile Then, in the customize_compiler function, set archiver to $AR + $ARFLAGS. IOW, just copying the logic used for e.g. ldshared, I can prepare a patch if you want, cheers, David From ziade.tarek at gmail.com Thu May 7 12:07:01 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 7 May 2009 12:07:01 +0200 Subject: [Python-Dev] Help on issue 5941 In-Reply-To: <5b8d13220905070250m694f62d1uf311fde0f5203e8d@mail.gmail.com> References: <94bdd2610905060201s2590144dp386d33773338d923@mail.gmail.com> <5b8d13220905070250m694f62d1uf311fde0f5203e8d@mail.gmail.com> Message-ID: <94bdd2610905070307g5eec595cw9f3de6c296e70acc@mail.gmail.com> On Thu, May 7, 2009 at 11:50 AM, David Cournapeau wrote: > Then, in the customize_compiler function, set archiver to $AR + > $ARFLAGS. IOW, just copying the logic used for e.g. ldshared, > > I can prepare a patch if you want, I am ok on Distutils side, but I wouldn't mind some help on the makefile/configure side. Even if I could mimic what's in there, I am not confident enough yet. Please do so, by attaching your patch in the issue, Thanks Tarek -- Tarek Ziadé | http://ziade.org From ziade.tarek at gmail.com Thu May 7 13:49:36 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 7 May 2009 13:49:36 +0200 Subject: [Python-Dev] Help on issue 5941 In-Reply-To: <5b8d13220905070437x18bdf332m737e6a934d40566c@mail.gmail.com> References: <94bdd2610905060201s2590144dp386d33773338d923@mail.gmail.com> <5b8d13220905070250m694f62d1uf311fde0f5203e8d@mail.gmail.com> <94bdd2610905070307g5eec595cw9f3de6c296e70acc@mail.gmail.com> <5b8d13220905070437x18bdf332m737e6a934d40566c@mail.gmail.com> Message-ID: <94bdd2610905070449l5e565091ve2524f4d5e6522f1@mail.gmail.com> On Thu, May 7, 2009 at 1:37 PM, David Cournapeau wrote: > On Thu, May 7, 2009 at 7:07 PM, Tarek Ziadé wrote: >> On Thu, May 7, 2009 at 11:50 AM, David Cournapeau wrote: >>> Then, in the customize_compiler function, set archiver to $AR + >>> $ARFLAGS. IOW, just copying the logic used for e.g. ldshared, >>> >>> I can prepare a patch if you want, >> >> I am ok on Distutils side, but I wouldn't mind some help on the >> makefile/configure side > > Ok, I ended up making a patch for everything. I tested it on Linux, > where it fixed the issue while keeping the customization (both AR and > ARFLAGS can be customized through environment variables). 
> > numpy now builds under python 2.7, > > cheers, > > David > ok thanks David, I'll complete your patch with the test I have written for this issue and commit it so it's included in 2.7/3.1. Notice that from the beginning, the unixcompiler class options are never used if the option has been customized in distutils.sysconfig and present in the Makefile, so we need to clean this behavior as well at some point, and document the customization features. By the way, do you happen to have a buildbot or something that builds numpy ? If not it'll be very interesting: I wouldn't mind having one numpy track running on the Python trunk and receiving mails if something is broken. Regards Tarek -- Tarek Ziad? | http://ziade.org From cournape at gmail.com Thu May 7 14:11:46 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 7 May 2009 21:11:46 +0900 Subject: [Python-Dev] Help on issue 5941 In-Reply-To: <94bdd2610905070449l5e565091ve2524f4d5e6522f1@mail.gmail.com> References: <94bdd2610905060201s2590144dp386d33773338d923@mail.gmail.com> <5b8d13220905070250m694f62d1uf311fde0f5203e8d@mail.gmail.com> <94bdd2610905070307g5eec595cw9f3de6c296e70acc@mail.gmail.com> <5b8d13220905070437x18bdf332m737e6a934d40566c@mail.gmail.com> <94bdd2610905070449l5e565091ve2524f4d5e6522f1@mail.gmail.com> Message-ID: <5b8d13220905070511q1f9f5d61u136c34dabefc0ca4@mail.gmail.com> On Thu, May 7, 2009 at 8:49 PM, Tarek Ziad? wrote: > > Notice that from the beginning, the unixcompiler class options are > never used if the option has been customized > in distutils.sysconfig and present in the Makefile, so we need to > clean this behavior as well at some point, and document > the customization features. Indeed, I have never bothered much with this part, though. Flags customization with distutils is too awkward to be useful in general for something like numpy IMHO, I just use scons instead when I need fine grained control. > By the way, do you happen to have a buildbot or something that builds numpy ? We have a buildbot: http://buildbot.scipy.org/ But I don't know if that's easy to set up such as both python and numpy are built from sources. > If not it'll be very interesting: ?I wouldn't mind having one numpy > track running on the Python trunk and receiving > mails if something is broken. Well, I would not mind either :) David From ziade.tarek at gmail.com Thu May 7 14:25:01 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 7 May 2009 14:25:01 +0200 Subject: [Python-Dev] Help on issue 5941 In-Reply-To: <5b8d13220905070511q1f9f5d61u136c34dabefc0ca4@mail.gmail.com> References: <94bdd2610905060201s2590144dp386d33773338d923@mail.gmail.com> <5b8d13220905070250m694f62d1uf311fde0f5203e8d@mail.gmail.com> <94bdd2610905070307g5eec595cw9f3de6c296e70acc@mail.gmail.com> <5b8d13220905070437x18bdf332m737e6a934d40566c@mail.gmail.com> <94bdd2610905070449l5e565091ve2524f4d5e6522f1@mail.gmail.com> <5b8d13220905070511q1f9f5d61u136c34dabefc0ca4@mail.gmail.com> Message-ID: <94bdd2610905070525k2f8392ecm3cd3ba2225a8d461@mail.gmail.com> On Thu, May 7, 2009 at 2:11 PM, David Cournapeau wrote: > But I don't know if that's easy to set up such as both python and > numpy are built from sources. 
I don't know about the numpy part, but the PyBots project code could be a source of inspiration for the Python part http://code.google.com/p/pybots/source/browse/trunk/master/community.cfg From benjamin at python.org Thu May 7 01:01:25 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 6 May 2009 18:01:25 -0500 Subject: [Python-Dev] [RELEASED] Python 3.1 beta 1 Message-ID: <1afaf6160905061601l1fac114ei4ffd0f4f35826640@mail.gmail.com> On behalf of the Python development team, I'm thrilled to announce the first and only beta release of Python 3.1. Python 3.1 focuses on the stabilization and optimization of features and changes Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File system APIs that use unicode strings now handle paths with undecodable bytes in them. [1] Other features include an ordered dictionary implementation and support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1, see http://doc.python.org/dev/py3k/whatsnew/3.1.html or Misc/NEWS in the Python distribution. Please note that this is a beta release, and as such is not suitable for production environments. We continue to strive for a high degree of quality, but there are still some known problems and the feature sets have not been finalized. This beta is being released to solicit feedback and hopefully discover bugs, as well as allowing you to determine how changes in 3.1 might impact you. If you find things broken or incorrect, please submit a bug report at http://bugs.python.org For more information and downloadable distributions, see the Python 3.1 website: http://www.python.org/download/releases/3.1/ See PEP 375 for release schedule details: http://www.python.org/dev/peps/pep-0375/ Enjoy, -- Benjamin Benjamin Peterson benjamin at python.org Release Manager (on behalf of the entire python-dev team and 3.1's contributors) From walter at livinglogic.de Thu May 7 15:20:07 2009 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu, 07 May 2009 15:20:07 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A02A818.4000204@egenix.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> <4A02A818.4000204@egenix.com> Message-ID: <4A02E007.9070308@livinglogic.de> M.-A. Lemburg wrote: > Antoine Pitrou wrote: >> Martin v. L?wis v.loewis.de> writes: >>> py> b'\xed\xa0\x80'.decode("utf-8","surrogates") >>> '\ud800' >> The point is, "surrogates" does not mean anything intuitive for an /error >> handler/. You seem to be the only one who finds this name explicit enough, >> perhaps because you chose it. >> Most other handlers' names have verbs in them ("ignore", "replace", >> "xmlcharrefreplace", etc.). > > Correct. > > The purpose of an error handler name is to indicate to the user > what it does, hence the use of verbs. > > Walter started with "xmlcharrefreplace", ie. no space names, so > "surrogatereplace" would be the logically correct name for the > "replace with lone surrogates" scheme invented by Markus Kuhn. "surrogatepass" (for the "don't complain about lone half surrogates" handler) and "surrogatereplace" sound OK to me. 
However the other "...replace" handlers are destructive (i.e. when such a "...replace" handler is used for encoding, decoding will not produce the original unicode string). The purpose of the PEP 383 error handler however is to be roundtrip safe, so maybe we should choose a slightly different name? How about "surrogateescape"? > The error handler for undoing this operation (ie. when converting > a Unicode string to some other encoding) should probably use the > same name based on symmetry and the fact that the escaping > scheme is meant to be used for enabling round-trip safety. We have only one error handler registry, but we *can* have one error handler for both directions (encoding and decoding) as the error handler can simply check whether it got passed a UnicodeEncodeError or UnicodeDecodeError object. > BTW: It would also be appropriate to reference Markus Kuhn in the PEP > as the inventor of the escaping scheme. > > Even if only to give the reader an idea of how that scheme works and > why (the PEP on python.org currently doesn't explain this). > > It should also explain that the scheme is meant to assure round-trip > safety and doesn't necessarily work when using transcoding, ie. > reading using one encoding, writing using another. Servus, Walter From google at mrabarnett.plus.com Thu May 7 15:47:13 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 07 May 2009 14:47:13 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0281B0.9070303@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> <4A028090.6060405@g.nevcal.com> <4A0281B0.9070303@v.loewis.de> Message-ID: <4A02E661.9040306@mrabarnett.plus.com> Martin v. L?wis wrote: >> Wouldn't renaming the existing "surrogates" handler be an incompatible >> change, and thus inappropriate? > > No - it's new in Python 3.1. > > So what do you think about Antoine's proposal? > +1 Although it looks like it would be without the '-' for consistency with existing error handlers. From murman at gmail.com Thu May 7 16:18:31 2009 From: murman at gmail.com (Michael Urman) Date: Thu, 7 May 2009 09:18:31 -0500 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A027502.5000901@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A027502.5000901@v.loewis.de> Message-ID: On Thu, May 7, 2009 at 00:43, "Martin v. L?wis" wrote: > Michael Urman wrote: >> On Wed, May 6, 2009 at 15:42, "Martin v. L?wis" wrote: >>> Despite there being also an error handler called "surrogates". >> >> Not that I have to be, but I'm not sold on the previous UTF-8 codec >> behavior becoming an error handler of the name "surrogates" for two >> reasons (I do respect the obvious PBP argument for the implementation, >> and have no better name - "lenient"?). > > PBP? Practicality beats purity. From a purity standpoint, the legacy invalid utf-8 seems more like an encoding than an error handler to me. 
From a practicality standpoint, it's presumably much more convenient to implement it on top of the new valid UTF-8 codec's behavior. And then any error handler needs a name. > Well, there is a way to stack error handlers, although it's not pretty: > [...] > codecs.register_error("surrogates_then_replace", > surrogates_then_replace) That mitigates my arguments significantly, although I'd rather see something like errors=('surrogates', 'replace') chain the handlers without additional registrations. But that's a different PEP or arbitrary change. :) >> The stacking argument also applies to the new utf8b behavior on encode >> (only, as it handles all errors on decode). This may be a YAGNI > > Indeed - in particular, as, in the primary application of this error > handler (i.e. file IO operations), there is no way of specifying > an additional error handler anyway. Would it be useful to allow setting this somewhere? It'd be analogous to setfsencoding, perhaps a setfsencodingerrors. It's not hard to imagine an application working on Windows where all Unicode characters are valid, and constructing backup filenames by adding some arbitrary character, or receiving them from a user who doesn't understand encodings. When this application is taken to a non-Unicode filesystem, without the ability to say "I really want a valid filename: so replace", that could get messy. But it may still be a YAGNI, or a "don't do that." -- Michael Urman From murman at gmail.com Thu May 7 16:31:11 2009 From: murman at gmail.com (Michael Urman) Date: Thu, 7 May 2009 09:31:11 -0500 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A027CAB.5070708@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> Message-ID: On Thu, May 7, 2009 at 01:16, "Martin v. Löwis" wrote: > I'm still at a loss what name to give it, though. I understand that > I have to rename both error handlers, but I'm uncertain what I should > rename them to. So proposals that rename only one of them aren't > that helpful. It would be helpful if people would indicate support > for Antoine's proposal. Part of the problem is they both allow byte sequences to decode to invalid Unicode strings, and in particular they both affect the same byte subsequences, and that brought us to the crossroads where we wanted to name both of them "surrogates". So I'll offer a few more colors, and try to get out of the way of choosing between them or the other proposed ones. :) I haven't come up with anything I like better than errors="lenient" for the old utf8 behavior handler; would errors="nonvalidating" be correct? It still seems to me that a new codec, perhaps "utf8-lenient", reads better. For the utf8b error handler, I could see any of errors="roundtrip", errors="roundtripreplace", errors="tosurrogate", errors="surrogatereplace", errors="surrogateescape", errors="binaryreplace", errors="binaryescape". This includes Antoine's proposal (sans hyphen). 
-- Michael Urman From walter at livinglogic.de Thu May 7 16:33:21 2009 From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=) Date: Thu, 07 May 2009 16:33:21 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A027502.5000901@v.loewis.de> Message-ID: <4A02F131.7020408@livinglogic.de> Michael Urman wrote: > [...] >> Well, there is a way to stack error handlers, although it's not pretty: >> [...] >> codecs.register_error("surrogates_then_replace", >> surrogates_then_replace) > > That mitigates my arguments significantly, although I'd rather see > something like errors=('surrogates', 'replace') chain the handlers > without additional registrations. But that's a different PEP or > arbitrary change. :) The first version of PEP 293 changed the errors argument to be a string or callable. This would have simplified handler stacking somewhat (because you don't have to register or lookup handlers) but it had the disadvantage that many "char *" arguments in the C API would have had to changed to "PyObject *". Changing the errors argument to a list of strings would have the same problem. Servus, Walter From google at mrabarnett.plus.com Thu May 7 17:08:49 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 07 May 2009 16:08:49 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A02F131.7020408@livinglogic.de> References: <49FD5300.6010906@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A027502.5000901@v.loewis.de> <4A02F131.7020408@livinglogic.de> Message-ID: <4A02F981.2080504@mrabarnett.plus.com> Walter D?rwald wrote: > Michael Urman wrote: > >> [...] >>> Well, there is a way to stack error handlers, although it's not pretty: >>> [...] >>> codecs.register_error("surrogates_then_replace", >>> surrogates_then_replace) >> That mitigates my arguments significantly, although I'd rather see >> something like errors=('surrogates', 'replace') chain the handlers >> without additional registrations. But that's a different PEP or >> arbitrary change. :) > > The first version of PEP 293 changed the errors argument to be a string > or callable. This would have simplified handler stacking somewhat > (because you don't have to register or lookup handlers) but it had the > disadvantage that many "char *" arguments in the C API would have had to > changed to "PyObject *". Changing the errors argument to a list of > strings would have the same problem. > A comma-separated or space-separated string, eg 'surrogates replace' or 'surrogates,replace'? It could be treated as handler stacking internally. 
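A minimal sketch of the stacking idea under discussion, using only the existing codecs machinery (the combine_errors helper and the handler names in the example are illustrative, not taken from any posted patch):

    import codecs

    def combine_errors(*names):
        # Chain existing error handlers by name: try each one in turn and
        # let the first that does not re-raise a UnicodeError win.
        handlers = [codecs.lookup_error(name) for name in names]
        combined = "+".join(names)

        def handler(exc):
            for h in handlers[:-1]:
                try:
                    return h(exc)
                except UnicodeError:
                    pass
            return handlers[-1](exc)  # the last handler decides (or raises)

        codecs.register_error(combined, handler)
        return combined

    # e.g. data.decode("utf-8", combine_errors("surrogates", "replace"))

Such a helper can live outside the standard library; nothing in the codec machinery itself needs to change for it to work.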
From martin at v.loewis.de Thu May 7 19:21:58 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 07 May 2009 19:21:58 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A027502.5000901@v.loewis.de> Message-ID: <4A0318B6.6030808@v.loewis.de> >> Well, there is a way to stack error handlers, although it's not pretty: >> [...] >> codecs.register_error("surrogates_then_replace", >> surrogates_then_replace) > > That mitigates my arguments significantly, although I'd rather see > something like errors=('surrogates', 'replace') chain the handlers > without additional registrations. But that's a different PEP or > arbitrary change. :) I think you can provide something like errors=combine_errors('surrogates', 'replace') as a library function, and it doesn't have to be part of the standard library. >>> The stacking argument also applies to the new utf8b behavior on encode >>> (only, as it handles all errors on decode). This may be a YAGNI >> Indeed - in particular, as, in the primary application of this error >> handler (i.e. file IO operations), there is no way of specifying >> an addition error handler anyway. > > Would it be useful to allow setting this somewhere? I'm deliberately not proposing this as part of the PEP. First, it has enough features already, and is approved as-is; plus YAGNI. Regards, Martin From martin at v.loewis.de Thu May 7 19:23:57 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 07 May 2009 19:23:57 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> Message-ID: <4A03192D.9000101@v.loewis.de> > I haven't come up with anything I like better than errors="lenient" > for the old utf8 behavior handler; would errors="nonvalidating" be > correct? I think either is fairly unspecific. > For the utf8b error handler, I could see any of errors="roundtrip", > errors="roundtripreplace", errors="tosurrogate", > errors="surrogatereplace", errors="surrogateescape", > errors="binaryreplace", errors="binaryescape". This includes Antoine's > proposal (sans hyphen). Giving multiple choices does not exactly make this proposal readily implementable :-) Regards, Martin From martin at v.loewis.de Thu May 7 19:27:07 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 07 May 2009 19:27:07 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A02A818.4000204@egenix.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> <4A02A818.4000204@egenix.com> Message-ID: <4A0319EB.4040508@v.loewis.de> > The error handler for undoing this operation (ie. 
when converting > a Unicode string to some other encoding) should probably use the > same name based on symmetry and the fact that the escaping > scheme is meant to be used for enabling round-trip safety. Could you please familiarize yourself with the implementation before commenting further? Thanks, Martin From stephen at xemacs.org Thu May 7 20:20:59 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 08 May 2009 03:20:59 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A02E007.9070308@livinglogic.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> <4A02A818.4000204@egenix.com> <4A02E007.9070308@livinglogic.de> Message-ID: <87bpq4g4fo.fsf@uwakimon.sk.tsukuba.ac.jp> Walter D?rwald writes: > "surrogatepass" (for the "don't complain about lone half surrogates" > handler) and "surrogatereplace" sound OK to me. However the other > "...replace" handlers are destructive (i.e. when such a "...replace" > handler is used for encoding, decoding will not produce the original > unicode string). That doesn't bother me in the slightest. "Replace" does not connote "destructive" or "non-destructive" to me; it connotes "substitution". The fact that other error handlers happen to be destructive doesn't affect that at all for me. YMMV. > The purpose of the PEP 383 error handler however is to be roundtrip > safe, so maybe we should choose a slightly different name? How > about "surrogateescape"? To me, "escape" has a strong connotation of a multicharacter representation of a single character, and that's not true here. How about "surrogatetranslate"? I still prefer "surrogatereplace", as it's slightly easier for me to type. From ndbecker2 at gmail.com Thu May 7 20:42:29 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 07 May 2009 14:42:29 -0400 Subject: [Python-Dev] typo in 8.1.3.1. Format Specification Mini-Language? Message-ID: "format_spec ::= [[fill]align][sign][#][0][width][.precision][type]" "The precision is ignored for integer values." In [36]: '%3x' % 10 Out[36]: ' a' In [37]: '%.3x' % 10 Out[37]: '00a' Apparently, precision is _not_ ignored? From tjreedy at udel.edu Thu May 7 20:57:56 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 07 May 2009 14:57:56 -0400 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A027CAB.5070708@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> Message-ID: Martin v. L?wis wrote: >>> So are you proposing that I should rename the PEP 383 handler >>> to "utf_8b_encoder_invalid_codepoints"? >> >> No, he's saying that your algorithm for choosing the PEP 383 handler >> should have come up with that name, rather than utf8b. But since PEP >> 383 applies to other codecs besides UTF-8, it should have a different >> name. 
And one that is less cumbersome than >> "utf_8b_encoder_invalid_codepoints" Correct. Thank you Glenn. > > I'm still at a loss what name to give it, though. I understand that > I have to rename both error handlers, but I'm uncertain what I should > rename them to. So proposals that rename only one of them aren't > that helpful. It would be helpful if people would indicate support > for Antoine's proposal. Given your explanation of what the new 'surrogates' handler does (pass rather than reject erroneous surrogates), I think 'surrogates_pass' is fine. Thus, I considoer that and 'surrogates_excape' the best proposal the best so far and suggest that you make this pair the current status quo to be argued against and improved ... or not. tjr From eric at trueblade.com Thu May 7 21:25:50 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 07 May 2009 15:25:50 -0400 Subject: [Python-Dev] typo in 8.1.3.1. Format Specification Mini-Language? In-Reply-To: References: Message-ID: <4A0335BE.2020603@trueblade.com> Neal Becker wrote: > "format_spec ::= [[fill]align][sign][#][0][width][.precision][type]" > "The precision is ignored for integer values." > > In [36]: '%3x' % 10 > Out[36]: ' a' > > In [37]: '%.3x' % 10 > Out[37]: '00a' > > Apparently, precision is _not_ ignored? That section is talking about this: >>> format(10, '.3x') Traceback (most recent call last): File "", line 1, in ValueError: Precision not allowed in integer format specifier From eric at trueblade.com Thu May 7 21:27:27 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 07 May 2009 15:27:27 -0400 Subject: [Python-Dev] typo in 8.1.3.1. Format Specification Mini-Language? In-Reply-To: <4A0335BE.2020603@trueblade.com> References: <4A0335BE.2020603@trueblade.com> Message-ID: <4A03361F.9070204@trueblade.com> Eric Smith wrote: > Neal Becker wrote: >> "format_spec ::= [[fill]align][sign][#][0][width][.precision][type]" >> "The precision is ignored for integer values." >> >> In [36]: '%3x' % 10 >> Out[36]: ' a' >> >> In [37]: '%.3x' % 10 >> Out[37]: '00a' >> >> Apparently, precision is _not_ ignored? > > That section is talking about this: > > >>> format(10, '.3x') > Traceback (most recent call last): > File "", line 1, in > ValueError: Precision not allowed in integer format specifier So I guess it shouldn't say "is ignored", it should be "is not allowed". From tjreedy at udel.edu Thu May 7 21:35:11 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 07 May 2009 15:35:11 -0400 Subject: [Python-Dev] typo in 8.1.3.1. Format Specification Mini-Language? In-Reply-To: References: Message-ID: Neal Becker wrote: > "format_spec ::= [[fill]align][sign][#][0][width][.precision][type]" > "The precision is ignored for integer values." > > In [36]: '%3x' % 10 > Out[36]: ' a' > > In [37]: '%.3x' % 10 > Out[37]: '00a' > > Apparently, precision is _not_ ignored? Apparent typo reports should go to the tracker, along with version information. In this case, the Format Specification Mini-Language is for the new str.format() and format() facilities, not for % formatting, which is described in Old String Formatting Operations. Ironically, you report does point to a doc problem: precision is actually not allowed for integer types. 
3.0.1 >> format(10, '3x') ' a' >>> format(10, '.3x') Traceback (most recent call last): File "", line 1, in format(10, '.3x') ValueError: Precision not allowed in integer format specifier >>> '{0:3x}'.format(10) ' a' >>> '{0:.3x}'.format(10) Traceback (most recent call last): File "", line 1, in '{0:.3x}'.format(10) ValueError: Precision not allowed in integer format specifier http://bugs.python.org/issue5963 Terry Jan Reedy From martin at v.loewis.de Thu May 7 21:39:12 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 07 May 2009 21:39:12 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> Message-ID: <4A0338E0.6070202@v.loewis.de> > Given your explanation of what the new 'surrogates' handler does (pass > rather than reject erroneous surrogates), I think 'surrogates_pass' is > fine. Thus, I considoer that and 'surrogates_excape' the best proposal > the best so far and suggest that you make this pair the current status > quo to be argued against and improved ... or not. That's exactly what I want to avoid: more bike-shedding. If this is now changed, it cannot be possibly be argued against and improved - it would be final, end of discussion (please!!!). So I'm happy to make it "surrogatepass" and "surrogateescape" as proposed by Walter. I'm sure you didn't really mean the spelling of "excape" to be taken literally - whether or not you meant the plural and the underscore literally, I cannot tell. Stephen Turnbull approved singular, so that's good enough for me. Regards, Martin From greg at krypto.org Thu May 7 22:26:08 2009 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 7 May 2009 13:26:08 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0338E0.6070202@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> <4A0338E0.6070202@v.loewis.de> Message-ID: <52dc1c820905071326y208062bfv64d631dc2d7fadfa@mail.gmail.com> On Thu, May 7, 2009 at 12:39 PM, "Martin v. L?wis" wrote: >> Given your explanation of what the new 'surrogates' handler does (pass >> rather than reject erroneous surrogates), I think 'surrogates_pass' is >> fine. ?Thus, I considoer that and 'surrogates_excape' the best proposal >> the best so far and suggest that you make this pair the current status >> quo to be argued against and improved ... or not. > > That's exactly what I want to avoid: more bike-shedding. If this is now > changed, it cannot be possibly be argued against and improved - it would > be final, end of discussion (please!!!). > > So I'm happy to make it "surrogatepass" and "surrogateescape" as > proposed by Walter. I'm sure you didn't really mean the spelling of > "excape" to be taken literally - whether or not you meant the plural > and the underscore literally, I cannot tell. Stephen Turnbull approved > singular, so that's good enough for me. singular is good. +1 on these names. 
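For readers following the thread, this is roughly the behaviour that has just been given the names "surrogateescape" and "surrogatepass" (a sketch of the intended semantics as described above, not output from the committed patch):

    raw = b"abc\xff\xfe"                     # not valid UTF-8
    text = raw.decode("utf-8", "surrogateescape")
    # each undecodable byte becomes a lone low surrogate: 'abc\udcff\udcfe'
    assert text.encode("utf-8", "surrogateescape") == raw   # round-trips

    # "surrogatepass" simply lets lone surrogates through instead of raising:
    assert "\ud800".encode("utf-8", "surrogatepass") == b"\xed\xa0\x80"
    assert b"\xed\xa0\x80".decode("utf-8", "surrogatepass") == "\ud800"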
From eric at trueblade.com Thu May 7 23:36:08 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 07 May 2009 17:36:08 -0400 Subject: [Python-Dev] py3k build broken Message-ID: <4A035448.6010008@trueblade.com> Tarek: With your ARFLAGS change, I now get the following error on a 32 bit Fedora 6 box. I've done "make distclean" and "./configure": $ make ... gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE -I./Modules/_io -c ./Modules/_io/textio.c -o Modules/textio.o gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE -I./Modules/_io -c ./Modules/_io/stringio.c -o Modules/stringio.o gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE -c ./Modules/zipimport.c -o Modules/zipimport.o ./Modules/zipimport.c: In function ‘get_module_code’: ./Modules/zipimport.c:1132: warning: format ‘%c’ expects type ‘int’, but argument 3 has type ‘long int’ gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE -c ./Modules/symtablemodule.c -o Modules/symtablemodule.o gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE -c ./Modules/xxsubtype.c -o Modules/xxsubtype.o gcc -pthread -c -fno-strict-aliasing -g -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE -DSVNVERSION=\"`LC_ALL=C svnversion .`\" -o Modules/getbuildinfo.o ./Modules/getbuildinfo.c rm -f libpython3.1.a ar @ARFLAGS@ libpython3.1.a Modules/getbuildinfo.o ar: illegal option -- @ Usage: ar [emulation options] [-]{dmpqrstx}[abcfilNoPsSuvV] [member-name] [count] archive-file file... ar -M [ - read options from emulation options: No emulation specific options ar: supported targets: elf32-i386 a.out-i386-linux efi-app-ia32 elf32-little elf32-big srec symbolsrec tekhex binary ihex trad-core make: *** [libpython3.1.a] Error 1 
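The literal @ARFLAGS@ in the ar invocation is the giveaway: configure did not substitute the new variable into the generated Makefile. A quick sanity check from Python (illustrative only; get_config_var simply reports what ended up in the Makefile):

    from distutils import sysconfig
    print(sysconfig.get_config_var("AR"))       # e.g. 'ar'
    print(sysconfig.get_config_var("ARFLAGS"))  # a literal '@ARFLAGS@' here means
                                                # autoconf was not re-run after the change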
L?wis wrote: >> Given your explanation of what the new 'surrogates' handler does (pass >> rather than reject erroneous surrogates), I think 'surrogates_pass' is >> fine. Thus, I considoer that and 'surrogates_excape' the best proposal >> the best so far and suggest that you make this pair the current status >> quo to be argued against and improved ... or not. > > That's exactly what I want to avoid: more bike-shedding. If this is now > changed, it cannot be possibly be argued against and improved - it would > be final, end of discussion (please!!!). > > So I'm happy to make it "surrogatepass" and "surrogateescape" as > proposed by Walter. I'm sure you didn't really mean the spelling of > "excape" to be taken literally - whether or not you meant the plural > and the underscore literally, I cannot tell. Stephen Turnbull approved > singular, so that's good enough for me. Those minor tweaks for consistency with existing names are what I meant by 'improve' (with good arguments) and I approve of them also. +1 on stopping here. From eric at trueblade.com Thu May 7 23:51:32 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 07 May 2009 17:51:32 -0400 Subject: [Python-Dev] py3k build broken In-Reply-To: <94bdd2610905071446r6bc60c57j6784c8c34d268437@mail.gmail.com> References: <4A035448.6010008@trueblade.com> <94bdd2610905071446r6bc60c57j6784c8c34d268437@mail.gmail.com> Message-ID: <4A0357E4.4050209@trueblade.com> Tarek Ziad? wrote: > On Thu, May 7, 2009 at 11:36 PM, Eric Smith wrote: >> With you ARFLAGS change, I now get the following error on a 32 bit Fedora 6 >> box. I've done "make distclean" and "./configure": > > Sorry yes, I am on it now, the produced Makefile is broken, until then > you can change it ... No problem. I'll wait. From tjreedy at udel.edu Thu May 7 23:51:10 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 07 May 2009 17:51:10 -0400 Subject: [Python-Dev] typo in 8.1.3.1. Format Specification Mini-Language? In-Reply-To: <4A03361F.9070204@trueblade.com> References: <4A0335BE.2020603@trueblade.com> <4A03361F.9070204@trueblade.com> Message-ID: Eric Smith wrote: > Eric Smith wrote: >> Neal Becker wrote: >>> "format_spec ::= [[fill]align][sign][#][0][width][.precision][type]" >>> "The precision is ignored for integer values." >>> >>> In [36]: '%3x' % 10 >>> Out[36]: ' a' >>> >>> In [37]: '%.3x' % 10 >>> Out[37]: '00a' >>> >>> Apparently, precision is _not_ ignored? >> >> That section is talking about this: >> >> >>> format(10, '.3x') >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: Precision not allowed in integer format specifier > > So I guess it shouldn't say "is ignored", it should be "is not allowed". My exact suggestion in http://bugs.python.org/issue5963 From ziade.tarek at gmail.com Fri May 8 00:23:10 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 8 May 2009 00:23:10 +0200 Subject: [Python-Dev] py3k build broken In-Reply-To: <4A0357E4.4050209@trueblade.com> References: <4A035448.6010008@trueblade.com> <94bdd2610905071446r6bc60c57j6784c8c34d268437@mail.gmail.com> <4A0357E4.4050209@trueblade.com> Message-ID: <94bdd2610905071523h3c3f07o2e740c8051525f39@mail.gmail.com> On Thu, May 7, 2009 at 11:51 PM, Eric Smith wrote: > Tarek Ziad? wrote: >> >> On Thu, May 7, 2009 at 11:36 PM, Eric Smith wrote: >>> >>> With you ARFLAGS change, I now get the following error on a 32 bit Fedora >>> 6 >>> box. 
I've done "make distclean" and "./configure": >> >> Sorry yes, I am on it now, the produced Makefile is broken, until then >> you can change it > > ... > > No problem. I'll wait. I have fixed configure by runing autoconf, everything should be fine now Sorry for the inconvenience. Tarek From google at mrabarnett.plus.com Fri May 8 00:27:08 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 07 May 2009 23:27:08 +0100 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> <4A0338E0.6070202@v.loewis.de> Message-ID: <4A03603C.40008@mrabarnett.plus.com> Terry Reedy wrote: > Martin v. L?wis wrote: >>> Given your explanation of what the new 'surrogates' handler does (pass >>> rather than reject erroneous surrogates), I think 'surrogates_pass' is >>> fine. Thus, I considoer that and 'surrogates_excape' the best proposal >>> the best so far and suggest that you make this pair the current status >>> quo to be argued against and improved ... or not. >> >> That's exactly what I want to avoid: more bike-shedding. If this is now >> changed, it cannot be possibly be argued against and improved - it would >> be final, end of discussion (please!!!). >> >> So I'm happy to make it "surrogatepass" and "surrogateescape" as >> proposed by Walter. I'm sure you didn't really mean the spelling of >> "excape" to be taken literally - whether or not you meant the plural >> and the underscore literally, I cannot tell. Stephen Turnbull approved >> singular, so that's good enough for me. > > Those minor tweaks for consistency with existing names are what I meant > by 'improve' (with good arguments) and I approve of them also. +1 on > stopping here. > We argue because we care. :-) From eric at trueblade.com Fri May 8 00:49:21 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 07 May 2009 18:49:21 -0400 Subject: [Python-Dev] py3k build broken In-Reply-To: <94bdd2610905071523h3c3f07o2e740c8051525f39@mail.gmail.com> References: <4A035448.6010008@trueblade.com> <94bdd2610905071446r6bc60c57j6784c8c34d268437@mail.gmail.com> <4A0357E4.4050209@trueblade.com> <94bdd2610905071523h3c3f07o2e740c8051525f39@mail.gmail.com> Message-ID: <4A036571.1070101@trueblade.com> Tarek Ziad? wrote: > I have fixed configure by runing autoconf, everything should be fine now And indeed, it's working fine now, thanks. > Sorry for the inconvenience. Not a problem. Anyone who volunteers for autoconf work gets a free pass from me. Eric. From mal at egenix.com Fri May 8 00:50:21 2009 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 08 May 2009 00:50:21 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0319EB.4040508@v.loewis.de> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> <4A02A818.4000204@egenix.com> <4A0319EB.4040508@v.loewis.de> Message-ID: <4A0365AD.5070506@egenix.com> Martin v. L?wis wrote: >> The error handler for undoing this operation (ie. when converting >> a Unicode string to some other encoding) should probably use the >> same name based on symmetry and the fact that the escaping >> scheme is meant to be used for enabling round-trip safety. > > Could you please familiarize yourself with the implementation > before commenting further? I did and it already uses the same (wrong) name for both encoding and decoding handlers which is good. The reason for my above comment was that the thread mentions two different names for the handler depending on the direction, e.g. "surrogatereplace" and "surrogatepass". I guess that "surrogatepass" was just an attempt to find a new name for the "surrogates" error handler (which also doesn't match the naming scheme) and that got me confused. I'd use "allowlonesurrogates" as name for the "surrogates" error handler and "lonesurrogatereplace" for the "utf8b" one. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 51 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brett at python.org Fri May 8 01:29:51 2009 From: brett at python.org (Brett Cannon) Date: Thu, 7 May 2009 16:29:51 -0700 Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? Message-ID: [my python-dev sabbatical is still in effect, so make sure I am at least cc'ed on any replies to this email] I cannot be the only person who has a need to run tests conditionally based on whether the file system is case-sensitive or not, so I feel like I am re-inventing the wheel for issue 5442 to handle OS X with a case-sensitive filesystem. Is there a boolean somewhere that I can simply check or get to know whether the filesystem is case-sensitive? -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri May 8 01:39:41 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 07 May 2009 18:39:41 -0500 Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? 
In-Reply-To: References: Message-ID: <4A03713D.1020407@gmail.com> On 2009-05-07 18:29, Brett Cannon wrote: > [my python-dev sabbatical is still in effect, so make sure I am at least > cc'ed on any replies to this email] > > I cannot be the only person who has a need to run tests conditionally > based on whether the file system is case-sensitive or not, so I feel > like I am re-inventing the wheel for issue 5442 to handle OS X with a > case-sensitive filesystem. Is there a boolean somewhere that I can > simply check or get to know whether the filesystem is case-sensitive? Since one may have more than one filesystem side-by-side, this can't be just be a system-wide boolean somewhere. One would have to query the target directory for this information. I am not aware of the existence of code that does such a query, though. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From solipsis at pitrou.net Fri May 8 01:48:29 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 7 May 2009 23:48:29 +0000 (UTC) Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? References: <4A03713D.1020407@gmail.com> Message-ID: Robert Kern gmail.com> writes: > > Since one may have more than one filesystem side-by-side, this can't be just be > a system-wide boolean somewhere. One would have to query the target directory > for this information. I am not aware of the existence of code that does such a > query, though. Or you can just be practical and test for it. Create a file "foobar" and see if you can open "FOOBAR" in read mode... Regards Antoine. From ziade.tarek at gmail.com Fri May 8 02:36:51 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 8 May 2009 02:36:51 +0200 Subject: [Python-Dev] Adding a "sysconfig" module in the stdlib Message-ID: <94bdd2610905071736wa6a86awa1a7cb30a6f6e775@mail.gmail.com> Hello, I am trying to refactor distutils.log in order to use logging but I have been bugged by the fact that site.py uses distutils.util.get_platform() in "addbuilddir". The problem is the order of imports at initialization time : importing "logging" into distutils will make the initialization/build fail because site.py wil break when trying to import "logging", then "time". Anyways, So why site.py looks into distutils ? because distutils has a few functions to get some info about the platform and about the Makefile and some other header files like pyconfig.h etc. But I don't think it's the best place for this, and I have a proposal : let's create a dedicated "sysconfig" module in the standard library that will provide all the (refactored) functions located in distutils.sysconfig (but not customize_compiler) and disutils.util.get_platform. This module can be used by site.py, by distutils, and others, and will focus on this role. Regards Tarek -- Tarek Ziad? | http://ziade.org From andrew at bemusement.org Fri May 8 02:24:05 2009 From: andrew at bemusement.org (Andrew Bennetts) Date: Fri, 8 May 2009 10:24:05 +1000 Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? In-Reply-To: References: <4A03713D.1020407@gmail.com> Message-ID: <20090508002405.GI10211@steerpike.home.puzzling.org> Antoine Pitrou wrote: > Robert Kern gmail.com> writes: > > > > Since one may have more than one filesystem side-by-side, this can't be just > be > > a system-wide boolean somewhere. 
One would have to query the target directory > > for this information. I am not aware of the existence of code that does such > a > > query, though. > > Or you can just be practical and test for it. Create a file "foobar" and see if > you can open "FOOBAR" in read mode... Agreed. That is how Bazaar's test suite detects this, and it works well. -Andrew. From v+python at g.nevcal.com Fri May 8 02:33:02 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 07 May 2009 17:33:02 -0700 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A03603C.40008@mrabarnett.plus.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A021646.8030904@v.loewis.de> <4A027743.2050500@v.loewis.de> <4A027904.7040602@g.nevcal.com> <4A027CAB.5070708@v.loewis.de> <4A0338E0.6070202@v.loewis.de> <4A03603C.40008@mrabarnett.plus.com> Message-ID: <4A037DBE.6000701@g.nevcal.com> On approximately 5/7/2009 3:27 PM, came the following characters from the keyboard of MRAB: > Terry Reedy wrote: >> Martin v. L?wis wrote: >>> So I'm happy to make it "surrogatepass" and "surrogateescape" as These seem adequate. It is not what I would choose or suggest, but it is adequate, and it is unlikely you can delight everyone with your choice of names, or even someone else's choice of names. These at least have a logical justification for their meaning, and can be documented reasonably. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From aahz at pythoncraft.com Fri May 8 03:22:35 2009 From: aahz at pythoncraft.com (Aahz) Date: Thu, 7 May 2009 18:22:35 -0700 Subject: [Python-Dev] Adding a "sysconfig" module in the stdlib In-Reply-To: <94bdd2610905071736wa6a86awa1a7cb30a6f6e775@mail.gmail.com> References: <94bdd2610905071736wa6a86awa1a7cb30a6f6e775@mail.gmail.com> Message-ID: <20090508012235.GA25029@panix.com> On Fri, May 08, 2009, Tarek Ziad? wrote: > > This module can be used by site.py, by distutils, and others, and will > focus on this role. This should get kicked around on python-ideas; I don't think it will require a full-blown PEP unless there's disagreement about what it should contain. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From john.arbash.meinel at gmail.com Fri May 8 03:56:02 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Thu, 07 May 2009 20:56:02 -0500 Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? In-Reply-To: <20090508002405.GI10211@steerpike.home.puzzling.org> References: <4A03713D.1020407@gmail.com> <20090508002405.GI10211@steerpike.home.puzzling.org> Message-ID: <4A039132.2030006@gmail.com> Andrew Bennetts wrote: > Antoine Pitrou wrote: >> Robert Kern gmail.com> writes: >>> Since one may have more than one filesystem side-by-side, this can't be just >> be >>> a system-wide boolean somewhere. One would have to query the target directory >>> for this information. I am not aware of the existence of code that does such >> a >>> query, though. >> Or you can just be practical and test for it. 
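(A minimal sketch of that practical test, for readers following along -- it assumes the directory being probed is writable, and the helper and probe-file names here are invented:)

    import os

    def filesystem_is_case_sensitive(path):
        # Create a lower-case probe file, then look for an upper-case
        # spelling of the same name; a case-insensitive filesystem
        # will report that the upper-case name exists too.
        probe = os.path.join(path, 'case-probe.tmp')
        open(probe, 'w').close()
        try:
            return not os.path.exists(os.path.join(path, 'CASE-PROBE.TMP'))
        finally:
            os.remove(probe)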
Create a file "foobar" and see if >> you can open "FOOBAR" in read mode... > > Agreed. That is how Bazaar's test suite detects this, and it works well. > > -Andrew. Actually, I believe we do: open('format', 'wb').close() try: os.lstat('FoRmAt') except IOError, e: if e.errno == errno.ENOENT: ... I don't know that it really matters, just wanted to indicate we use 'lstat' rather than 'open()' to check. I could be wrong about the test suite, but I know that is what we do for 'live' files. (We always create a format file, so we know it is there to 'stat' it via a different name.) John =:-> From cournape at gmail.com Fri May 8 05:25:53 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 8 May 2009 12:25:53 +0900 Subject: [Python-Dev] Adding a "sysconfig" module in the stdlib In-Reply-To: <94bdd2610905071736wa6a86awa1a7cb30a6f6e775@mail.gmail.com> References: <94bdd2610905071736wa6a86awa1a7cb30a6f6e775@mail.gmail.com> Message-ID: <5b8d13220905072025m522ce6e5pbfad73ebe18e3f30@mail.gmail.com> On Fri, May 8, 2009 at 9:36 AM, Tarek Ziad? wrote: > Hello, > > I am trying to refactor distutils.log in order to use logging but I > have been bugged by the fact that site.py uses > distutils.util.get_platform() in "addbuilddir". > The problem is the order of imports at initialization time : importing > "logging" into distutils will make the initialization/build fail > because site.py wil break when > trying to import "logging", then "time". > > Anyways, > So why site.py looks into distutils ? ?because distutils has a few > functions to get some info about the platform and about the Makefile > and some > other header files like pyconfig.h etc. > > But I don't think it's the best place for this, and I have a proposal : > > let's create a dedicated "sysconfig" module in the standard library > that will provide all the (refactored) functions located in > distutils.sysconfig (but not customize_compiler) > and disutils.util.get_platform. If we are talking about putting this into the stdlib proper, I would suggest thinking about putting information for every platform in sysconfig, instead of just Unix. I understand it is not an easy problem (because windows builds are totally different than every other platform), but it would really help for interoperability with other build tools. If sysconfig is to become independent of distutils, it should be cross platform and not unix specific. cheers, David From turnbull at sk.tsukuba.ac.jp Fri May 8 09:04:34 2009 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 08 May 2009 16:04:34 +0900 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <4A0365AD.5070506@egenix.com> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> <4A02A818.4000204@egenix.com> <4A0319EB.4040508@v.loewis.de> <4A0365AD.5070506@egenix.com> Message-ID: <8763gcf531.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > I'd use "allowlonesurrogates" as name for the "surrogates" error > handler and "lonesurrogatereplace" for the "utf8b" one. 
+1 From cournape at gmail.com Fri May 8 10:31:33 2009 From: cournape at gmail.com (David Cournapeau) Date: Fri, 8 May 2009 17:31:33 +0900 Subject: [Python-Dev] py3k build broken In-Reply-To: <94bdd2610905071523h3c3f07o2e740c8051525f39@mail.gmail.com> References: <4A035448.6010008@trueblade.com> <94bdd2610905071446r6bc60c57j6784c8c34d268437@mail.gmail.com> <4A0357E4.4050209@trueblade.com> <94bdd2610905071523h3c3f07o2e740c8051525f39@mail.gmail.com> Message-ID: <5b8d13220905080131i552914bfn241a374b9b6c9d2f@mail.gmail.com> On Fri, May 8, 2009 at 7:23 AM, Tarek Ziad? wrote: > I have fixed configure by runing autoconf, everything should be fine now > > Sorry for the inconvenience. I am the one responsible for this - I did not realize that the generated configure/Makefile were also in the trunk, and my patch did not include the generated files. My apologies, cheers, David From walter at livinglogic.de Fri May 8 10:34:19 2009 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Fri, 08 May 2009 10:34:19 +0200 Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler In-Reply-To: <87bpq4g4fo.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49FD5300.6010906@v.loewis.de> <87d4anha1r.fsf@uwakimon.sk.tsukuba.ac.jp> <4A00A93D.3030204@v.loewis.de> <4A00D937.6080403@egenix.com> <4A013CB4.9010204@v.loewis.de> <4A015E08.5000203@egenix.com> <4A0161AD.6000605@mrabarnett.plus.com> <4A01C406.3030004@g.nevcal.com> <4A01F61B.1000203@v.loewis.de> <4A01F982.2030205@v.loewis.de> <4A021A42.4060509@v.loewis.de> <4A02A818.4000204@egenix.com> <4A02E007.9070308@livinglogic.de> <87bpq4g4fo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4A03EE8B.8040400@livinglogic.de> Stephen J. Turnbull wrote: > Walter D?rwald writes: > > > "surrogatepass" (for the "don't complain about lone half surrogates" > > handler) and "surrogatereplace" sound OK to me. However the other > > "...replace" handlers are destructive (i.e. when such a "...replace" > > handler is used for encoding, decoding will not produce the original > > unicode string). > > That doesn't bother me in the slightest. "Replace" does not connote > "destructive" or "non-destructive" to me; it connotes "substitution". > The fact that other error handlers happen to be destructive doesn't > affect that at all for me. YMMV. > > > The purpose of the PEP 383 error handler however is to be roundtrip > > safe, so maybe we should choose a slightly different name? How > > about "surrogateescape"? > > To me, "escape" has a strong connotation of a multicharacter > representation of a single character, and that's not true here. > > How about "surrogatetranslate"? I still prefer "surrogatereplace", as > it's slightly easier for me to type. I like "surrogatetranslate" better than "surrogateescape" better than "surrogatereplace". But I'll stop bikesheding now and let Martin decide. Servus, alter From kristjan at ccpgames.com Fri May 8 11:47:22 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Fri, 8 May 2009 09:47:22 +0000 Subject: [Python-Dev] feature request 5804 Message-ID: <930F189C8A437347B80DF2C156F7EC7F056E2D8E32@exchis.ccp.ad.local> Hello there. I have sumitted the following patch: Add an 'offset' argument to zlib.decompress http://bugs.python.org/issue5804 I'd be interested on getting some more feedback on it. Kristj?n -------------- next part -------------- An HTML attachment was scrubbed... 
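(For readers who have not opened the issue: the patch presumably targets the copy one pays for today when the compressed data sits at an offset inside a larger buffer -- a small sketch of the status quo, with invented framing:)

    import zlib

    header = b"hdr:"                      # made-up framing, for illustration only
    data = header + zlib.compress(b"payload")

    # Without an 'offset' argument the compressed part must first be
    # sliced out, which copies everything from the offset to the end:
    payload = zlib.decompress(data[len(header):])
    assert payload == b"payload"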
URL: From status at bugs.python.org Fri May 8 18:07:06 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 8 May 2009 18:07:06 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090508160706.B81A6785D3@psf.upfronthosting.co.za> ACTIVITY SUMMARY (05/01/09 - 05/08/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2188 open (+45) / 15604 closed (+30) / 17792 total (+75) Open issues with patches: 848 Average duration of open issues: 646 days. Median duration of open issues: 396 days. Open Issues Breakdown open 2153 (+45) pending 34 ( +0) Issues Created Or Reopened (75) _______________________________ socketmodule.c on HPUX ia64 without _XOPEN_SOURCE_EXTENDED comp 05/01/09 http://bugs.python.org/issue5895 created ntai timeit documentation 05/01/09 http://bugs.python.org/issue5896 created hrfeels No library reference tree in chm help file 05/01/09 CLOSED http://bugs.python.org/issue5897 created suraj Hang in Popen.wait() when another process has been created 05/01/09 http://bugs.python.org/issue5898 created farialima test_capi crashes when called more than once 05/01/09 CLOSED http://bugs.python.org/issue5899 created pitrou Ensure RUNPATH is added to extension modules with RPATH if GNU l 05/01/09 http://bugs.python.org/issue5900 created flub patch missing meta-info in documentation pdf 05/02/09 http://bugs.python.org/issue5901 created ZeD Stricter codec names 05/02/09 http://bugs.python.org/issue5902 created ezio.melotti strftime fails in non UTF-8 locale 05/02/09 http://bugs.python.org/issue5903 created barry-scott strftime docs do not explain locale affect on result string 05/02/09 http://bugs.python.org/issue5904 created barry-scott strptime fails in non-UTF locale 05/02/09 http://bugs.python.org/issue5905 created pitrou Risk of confusion in multiprocessing module - daemonic processes 05/02/09 http://bugs.python.org/issue5906 created pakal repr of time.struct_time type does not eval 05/02/09 http://bugs.python.org/issue5907 created jwm I need to import the module in the same thread 05/02/09 http://bugs.python.org/issue5908 created tyoc Segfault in typeobject.c 05/02/09 CLOSED http://bugs.python.org/issue5909 created gbritton kqueue for more than one event is broken. 05/02/09 http://bugs.python.org/issue5910 created Erik Gorset patch built-in compile() should take encoding option. 05/03/09 http://bugs.python.org/issue5911 created naoki import deadlocks when using fork 05/03/09 http://bugs.python.org/issue5912 created abaron On Windows os.listdir('') -> cwd and os.listdir(u'') -> C:\ 05/03/09 CLOSED http://bugs.python.org/issue5913 created ezio.melotti patch Add PyOS_string_to_double function to C API 05/03/09 CLOSED http://bugs.python.org/issue5914 created marketdickinson patch PEP 383 implementation 05/03/09 CLOSED http://bugs.python.org/issue5915 created loewis patch Wrong function referenced in documentation of socket.inet_aton 05/03/09 CLOSED http://bugs.python.org/issue5916 created phihag patch Reference platform-independent alternative in socket.inet_ntop d 05/03/09 CLOSED http://bugs.python.org/issue5917 created phihag patch test_parser crashes when run after some other tests 05/04/09 http://bugs.python.org/issue5918 created pitrou patch pygettext documentation 05/04/09 CLOSED http://bugs.python.org/issue5919 created efrerich Confusing float formatting for empty presentation type. 
05/04/09 CLOSED http://bugs.python.org/issue5920 created marketdickinson patch PEP 362 can be marked as finished? 05/04/09 CLOSED http://bugs.python.org/issue5921 created stutzbach Multi-with patch 05/04/09 http://bugs.python.org/issue5922 created georg.brandl patch turtle.py update: 1.0 --> 1.1 05/04/09 CLOSED http://bugs.python.org/issue5923 created gregorlingl patch When setting complete PYTHONPATH on Python 3.x, paths in the PYT 05/04/09 http://bugs.python.org/issue5924 created fabioz Odd formatting differences of keywords in reference 05/04/09 CLOSED http://bugs.python.org/issue5925 created MLModel bdist_msi - add support for minimum Python version for pure Pyth 05/04/09 http://bugs.python.org/issue5926 created atuining Typo in library on xmlrpc 05/04/09 CLOSED http://bugs.python.org/issue5927 created JonathansCorner.com Missing space after period in xmlrpc library documentation 05/04/09 CLOSED http://bugs.python.org/issue5928 created JonathansCorner.com warnings in unicodeobject.c 05/04/09 CLOSED http://bugs.python.org/issue5929 created pitrou patch Transient error in multiprocessing 05/04/09 http://bugs.python.org/issue5930 created pitrou Python runtime name hardcoded in wsgiref.simple_server 05/04/09 http://bugs.python.org/issue5931 created thijs _json: _convertPyInt_AsSsize_t() never raise any error 05/05/09 CLOSED http://bugs.python.org/issue5932 created haypo patch fix gcc -Wextra warnings (compare signed/unsigned) 05/05/09 http://bugs.python.org/issue5933 created haypo patch fix gcc warnings: explicit type conversion for uid/gid in posix 05/05/09 http://bugs.python.org/issue5934 created haypo patch Better documentation of use of BROWSER environment variable 05/05/09 http://bugs.python.org/issue5935 created Eddie E Add MSI suport for uninstalling individual versions 05/05/09 http://bugs.python.org/issue5936 created bethard Problems with dbm documentation 05/05/09 http://bugs.python.org/issue5937 created MLModel Noddy examples haven't been updated to match PEP 3123 05/05/09 CLOSED http://bugs.python.org/issue5938 created larry Ensure that PyCapsule_GetPointer calls in ctypes handle errors a 05/05/09 http://bugs.python.org/issue5939 created larry Wrong type check in check_library_list 05/05/09 CLOSED http://bugs.python.org/issue5940 created cdavid customize_compiler broken 05/05/09 CLOSED http://bugs.python.org/issue5941 created cdavid patch Ambiguity in dbm.open flag documentation 05/05/09 http://bugs.python.org/issue5942 created MLModel Bus error in test_posix on Mac OS 05/05/09 CLOSED http://bugs.python.org/issue5943 created eric.smith patch test_os failure on OS X, probably related to PEP 383 05/05/09 CLOSED http://bugs.python.org/issue5944 created marketdickinson patch PyMapping_Check returns 1 for lists 05/05/09 http://bugs.python.org/issue5945 created jmillikin Fix spelling error in Capsule docs 05/05/09 CLOSED http://bugs.python.org/issue5946 created larry patch Deprecate CObject 05/05/09 CLOSED http://bugs.python.org/issue5947 created larry patch setlocale regression 05/06/09 CLOSED http://bugs.python.org/issue5948 created Kerfred IMAP4_SSL spin because of SSLSocket.suppress_ragged_eofs 05/06/09 http://bugs.python.org/issue5949 created kevinwatters zimport doesn't work with zipfile containing comments 05/06/09 http://bugs.python.org/issue5950 created dsamersoff email.message : get_payload args's documentation is confusing 05/06/09 http://bugs.python.org/issue5951 created trolldbois AttributeError exception in urllib.urlopen 05/07/09 CLOSED http://bugs.python.org/issue5952 
created sprigogin Add to "whats new": range(n) != range(n) 05/07/09 http://bugs.python.org/issue5953 created MLModel PyFrame_GetLineNumber 05/07/09 http://bugs.python.org/issue5954 created jyasskin patch, needs review aifc: close() does not close the underlying file 05/07/09 CLOSED http://bugs.python.org/issue5955 reopened amaury.forgeotdarc test_distutils fails for Python 3.1b1 on MacOS X 05/07/09 http://bugs.python.org/issue5956 created MrJean1 Possible mistake regarding writeback in documentation of shelve. 05/07/09 http://bugs.python.org/issue5957 created MLModel Typo in documentation of shelve.sync 05/07/09 CLOSED http://bugs.python.org/issue5958 created MLModel PyCode_NewEmpty 05/07/09 http://bugs.python.org/issue5959 created jyasskin patch, needs review Windows Installer Error 1722 when opting for compilation at inst 05/07/09 http://bugs.python.org/issue5960 created keldonin Missing labelside option for Tix option menu (fix included) 05/07/09 http://bugs.python.org/issue5961 created caryr Ambiguity about the semantics of sys.exit() and os._exit() in mu 05/07/09 http://bugs.python.org/issue5962 created pakal Doc error: integer precision in formats 05/07/09 CLOSED http://bugs.python.org/issue5963 created tjreedy WeakSet cmp methods 05/08/09 http://bugs.python.org/issue5964 created schuppenies Format Specs: doc 's' and implicit conversions 05/08/09 http://bugs.python.org/issue5965 created tjreedy unnecessary hardlink 05/08/09 http://bugs.python.org/issue5966 created exe PyList_GetSlice does not indicate negative ranges dont work as i 05/08/09 http://bugs.python.org/issue5967 created ideasman42 patch Generator expression bug? 05/08/09 CLOSED http://bugs.python.org/issue5968 created svenrahmann setup build with Platform SDK, finding vcvarsall.bat 05/08/09 http://bugs.python.org/issue5969 created MarcMarc Issues Now Closed (80) ______________________ str.format() wrongly formats complex() numbers (Py30a2) 510 days http://bugs.python.org/issue1588 marketdickinson patch shutil.copyfile blocks indefinitely on named pipes 337 days http://bugs.python.org/issue3002 pitrou patch FD leak in urllib2 328 days http://bugs.python.org/issue3066 gregory.p.smith IDLE opens window too low on Windows 303 days http://bugs.python.org/issue3286 gpolo Option to not-exit on test 290 days http://bugs.python.org/issue3379 michael.foord patch Ill-formed surrogates not treated as errors during encoding/deco 251 days http://bugs.python.org/issue3672 benjamin.peterson patch unicode-internal encoder reports wrong length 249 days http://bugs.python.org/issue3739 haypo patch Add Google's ipaddr.py to the stdlib 220 days http://bugs.python.org/issue3959 gregory.p.smith merge json library with latest simplejson 2.0.x 46 days http://bugs.python.org/issue4136 benjamin.peterson patch ctypes fails to build on mipsel-linux-gnu (detects mips instead 172 days http://bugs.python.org/issue4305 theller patch [PATCH] Better stacklevel for GzipFile.filename DeprecationWarni 170 days http://bugs.python.org/issue4351 pjenvey patch UTF7 encoding of slash (character 47) is incorrect 160 days http://bugs.python.org/issue4425 haypo UTF7 decoding is far too strict 160 days http://bugs.python.org/issue4426 pitrou patch, needs review Patch for better thread support in hashlib 128 days http://bugs.python.org/issue4751 gregory.p.smith patch Curses Unicode Support 129 days http://bugs.python.org/issue4787 asmodai find_library can return directories instead of files 118 days http://bugs.python.org/issue4875 theller unpickling does not intern 
attribute names 95 days http://bugs.python.org/issue5084 pitrou patch Invalid UTF-8 ("%s") length in PyUnicode_FromFormatV() 94 days http://bugs.python.org/issue5108 haypo patch pdb feature request: Ability to skip standard lib modules and ot 91 days http://bugs.python.org/issue5142 georg.brandl patch bdist_msi generates version number for pure Python packages 75 days http://bugs.python.org/issue5311 bethard patch Multicast example mcast.py is outdated and ugly 66 days http://bugs.python.org/issue5379 gregory.p.smith patch msvcrt bytes cleanup 59 days http://bugs.python.org/issue5410 benjamin.peterson patch Create alternative CObject API that is safe and clean 35 days http://bugs.python.org/issue5630 benjamin.peterson patch test__locale fails with RADIXCHAR on Windows 34 days http://bugs.python.org/issue5643 benjamin.peterson patch cleanUp stack for unittest 29 days http://bugs.python.org/issue5679 yaneurabeya patch test_zipfile fails under Windows 30 days http://bugs.python.org/issue5692 pitrou patch os.getpwent returns unsigned 32bit value, os.setuid refuses it 28 days http://bugs.python.org/issue5705 gregory.p.smith 64bit msi.py still tries to copy non-existent test/README 28 days http://bugs.python.org/issue5721 loewis patch 2.6.2c1 fails to pass test_cmath on Solaris10 26 days http://bugs.python.org/issue5724 marketdickinson patch ld_so_aix does exit successfully even in case of failure 22 days http://bugs.python.org/issue5726 pitrou patch Support telling TestResult objects a test run has finished 23 days http://bugs.python.org/issue5728 michael.foord patch Change ntpath functions to implicitly support UNC paths 16 days http://bugs.python.org/issue5799 eric.smith patch, needs review IDLE/Win Installer: drop -n switch for 2.7/3.1; install 3.1 as i 11 days http://bugs.python.org/issue5847 benjamin.peterson Full example for emulating a container type 6 days http://bugs.python.org/issue5850 yaneurabeya Make complex repr and str more like float repr and str 5 days http://bugs.python.org/issue5858 marketdickinson test_urllib fails on windows 8 days http://bugs.python.org/issue5861 orsenthil Remove extraneous backwards-compatibility attributes from some m 5 days http://bugs.python.org/issue5881 benjamin.peterson patch detach() implementation 2 days http://bugs.python.org/issue5883 benjamin.peterson patch mmap.write_byte out of bounds - no error, position gets screwed 6 days http://bugs.python.org/issue5887 bmearns Extra comma in enum - fails on AIX 1 days http://bugs.python.org/issue5889 georg.brandl Subclassing property doesn't preserve the auto __doc__ behavior 4 days http://bugs.python.org/issue5890 r.david.murray patch, needs review Add support to pydoc to output .rst restructured text 0 days http://bugs.python.org/issue5893 georg.brandl No library reference tree in chm help file 0 days http://bugs.python.org/issue5897 georg.brandl test_capi crashes when called more than once 4 days http://bugs.python.org/issue5899 benjamin.peterson Segfault in typeobject.c 5 days http://bugs.python.org/issue5909 amaury.forgeotdarc On Windows os.listdir('') -> cwd and os.listdir(u'') -> C:\ 1 days http://bugs.python.org/issue5913 ezio.melotti patch Add PyOS_string_to_double function to C API 0 days http://bugs.python.org/issue5914 marketdickinson patch PEP 383 implementation 1 days http://bugs.python.org/issue5915 loewis patch Wrong function referenced in documentation of socket.inet_aton 1 days http://bugs.python.org/issue5916 georg.brandl patch Reference platform-independent alternative in 
socket.inet_ntop d 1 days http://bugs.python.org/issue5917 georg.brandl patch pygettext documentation 1 days http://bugs.python.org/issue5919 georg.brandl Confusing float formatting for empty presentation type. 1 days http://bugs.python.org/issue5920 eric.smith patch PEP 362 can be marked as finished? 0 days http://bugs.python.org/issue5921 georg.brandl turtle.py update: 1.0 --> 1.1 1 days http://bugs.python.org/issue5923 georg.brandl patch Odd formatting differences of keywords in reference 0 days http://bugs.python.org/issue5925 georg.brandl Typo in library on xmlrpc 0 days http://bugs.python.org/issue5927 georg.brandl Missing space after period in xmlrpc library documentation 0 days http://bugs.python.org/issue5928 georg.brandl warnings in unicodeobject.c 1 days http://bugs.python.org/issue5929 georg.brandl patch _json: _convertPyInt_AsSsize_t() never raise any error 0 days http://bugs.python.org/issue5932 georg.brandl patch Noddy examples haven't been updated to match PEP 3123 0 days http://bugs.python.org/issue5938 larry Wrong type check in check_library_list 1 days http://bugs.python.org/issue5940 tarek customize_compiler broken 3 days http://bugs.python.org/issue5941 tarek patch Bus error in test_posix on Mac OS 0 days http://bugs.python.org/issue5943 loewis patch test_os failure on OS X, probably related to PEP 383 0 days http://bugs.python.org/issue5944 marketdickinson patch Fix spelling error in Capsule docs 0 days http://bugs.python.org/issue5946 georg.brandl patch Deprecate CObject 0 days http://bugs.python.org/issue5947 georg.brandl patch setlocale regression 0 days http://bugs.python.org/issue5948 georg.brandl AttributeError exception in urllib.urlopen 1 days http://bugs.python.org/issue5952 amaury.forgeotdarc aifc: close() does not close the underlying file 1 days http://bugs.python.org/issue5955 georg.brandl Typo in documentation of shelve.sync 1 days http://bugs.python.org/issue5958 MLModel Doc error: integer precision in formats 1 days http://bugs.python.org/issue5963 eric.smith Generator expression bug? 
0 days http://bugs.python.org/issue5968 r.david.murray urllib doesn't correct server returned urls 1875 days http://bugs.python.org/issue918368 orsenthil patch linecache.py::updatecache strips directory info from files 1629 days http://bugs.python.org/issue1068477 georg.brandl http_error_302() crashes with 'HTTP/1.1 400 Bad Request 1528 days http://bugs.python.org/issue1153027 orsenthil easy PEP 349: allow str() to return unicode 1350 days http://bugs.python.org/issue1266570 haypo patch linecache module returns wrong results 1313 days http://bugs.python.org/issue1309567 georg.brandl patch locale.getpreferredencoding() dies when setlocale fails 1158 days http://bugs.python.org/issue1443504 asmodai patch mailbox.Maildir re-reads directory too often 881 days http://bugs.python.org/issue1607951 akuchling patch linecache package handling 659 days http://bugs.python.org/issue1754483 georg.brandl patch Top Issues Most Discussed (10) ______________________________ 20 CVE-2008-5983 python: untrusted python modules search path 24 days open http://bugs.python.org/issue5753 19 test_asynchat fails on Mac OSX 18 days open http://bugs.python.org/issue5798 18 locale.getpreferredencoding() dies when setlocale fails 1158 days closed http://bugs.python.org/issue1443504 17 Change ntpath functions to implicitly support UNC paths 16 days closed http://bugs.python.org/issue5799 13 bdist_msi generates version number for pure Python packages 75 days closed http://bugs.python.org/issue5311 10 turtle.py update: 1.0 --> 1.1 1 days closed http://bugs.python.org/issue5923 9 test_os failure on OS X, probably related to PEP 383 0 days closed http://bugs.python.org/issue5944 9 customize_compiler broken 3 days closed http://bugs.python.org/issue5941 9 Add Google's ipaddr.py to the stdlib 220 days closed http://bugs.python.org/issue3959 9 Ill-formed surrogates not treated as errors during encoding/dec 251 days closed http://bugs.python.org/issue3672 From brett at python.org Fri May 8 18:52:40 2009 From: brett at python.org (Brett Cannon) Date: Fri, 8 May 2009 09:52:40 -0700 Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? In-Reply-To: <4A039132.2030006@gmail.com> References: <4A03713D.1020407@gmail.com> <20090508002405.GI10211@steerpike.home.puzzling.org> <4A039132.2030006@gmail.com> Message-ID: On Thu, May 7, 2009 at 18:56, John Arbash Meinel < john.arbash.meinel at gmail.com> wrote: > Andrew Bennetts wrote: > > Antoine Pitrou wrote: > >> Robert Kern gmail.com> writes: > >>> Since one may have more than one filesystem side-by-side, this can't be > just > >> be > >>> a system-wide boolean somewhere. One would have to query the target > directory > >>> for this information. I am not aware of the existence of code that does > such > >> a > >>> query, though. > >> Or you can just be practical and test for it. Create a file "foobar" and > see if > >> you can open "FOOBAR" in read mode... > > > > Agreed. That is how Bazaar's test suite detects this, and it works well. > > > > -Andrew. > > > Actually, I believe we do: > > open('format', 'wb').close() > try: > os.lstat('FoRmAt') > except IOError, e: > if e.errno == errno.ENOENT: > ... > > I don't know that it really matters, just wanted to indicate we use > 'lstat' rather than 'open()' to check. I could be wrong about the test > suite, but I know that is what we do for 'live' files. (We always create > a format file, so we know it is there to 'stat' it via a different name.) Thanks for the help to everyone. 
I ended up simply taking __file__, making it all uppercase (or lowercase if it is already uppercase) and then doing os.path.exists() on the modified name. Seems to work. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From google at mrabarnett.plus.com Fri May 8 19:01:55 2009 From: google at mrabarnett.plus.com (MRAB) Date: Fri, 08 May 2009 18:01:55 +0100 Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? In-Reply-To: References: <4A03713D.1020407@gmail.com> <20090508002405.GI10211@steerpike.home.puzzling.org> <4A039132.2030006@gmail.com> Message-ID: <4A046583.7060504@mrabarnett.plus.com> Brett Cannon wrote: > > > On Thu, May 7, 2009 at 18:56, John Arbash Meinel > > wrote: > > Andrew Bennetts wrote: > > Antoine Pitrou wrote: > >> Robert Kern gmail.com > writes: > >>> Since one may have more than one filesystem side-by-side, this > can't be just > >> be > >>> a system-wide boolean somewhere. One would have to query the > target directory > >>> for this information. I am not aware of the existence of code > that does such > >> a > >>> query, though. > >> Or you can just be practical and test for it. Create a file > "foobar" and see if > >> you can open "FOOBAR" in read mode... > > > > Agreed. That is how Bazaar's test suite detects this, and it > works well. > > > > -Andrew. > > > Actually, I believe we do: > > open('format', 'wb').close() > try: > os.lstat('FoRmAt') > except IOError, e: > if e.errno == errno.ENOENT: > ... > > I don't know that it really matters, just wanted to indicate we use > 'lstat' rather than 'open()' to check. I could be wrong about the test > suite, but I know that is what we do for 'live' files. (We always create > a format file, so we know it is there to 'stat' it via a different > name.) > > > Thanks for the help to everyone. I ended up simply taking __file__, > making it all uppercase (or lowercase if it is already uppercase) and > then doing os.path.exists() on the modified name. Seems to work. > Alternatively, use swapcase() and then os.path.exists(). From phd at phd.pp.ru Fri May 8 19:17:15 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 8 May 2009 21:17:15 +0400 Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity? In-Reply-To: References: <4A03713D.1020407@gmail.com> <20090508002405.GI10211@steerpike.home.puzzling.org> <4A039132.2030006@gmail.com> Message-ID: <20090508171715.GC3920@phd.pp.ru> On Fri, May 08, 2009 at 09:52:40AM -0700, Brett Cannon wrote: > Thanks for the help to everyone. I ended up simply taking __file__, making > it all uppercase (or lowercase if it is already uppercase) and then doing > os.path.exists() on the modified name. Seems to work. What if __file__ is on a different filesystem with different rules (consider NFS, SMB/CIFS, etc.)? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From casey at pandora.com Fri May 8 19:19:24 2009 From: casey at pandora.com (Casey Duncan) Date: Fri, 8 May 2009 11:19:24 -0600 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <49FEB11B.2040304@hastings.org> References: <49FEB11B.2040304@hastings.org> Message-ID: <7323CF3F-FC5D-4C62-9C45-22E4FFBBA857@pandora.com> On May 4, 2009, at 3:10 AM, Larry Hastings wrote: > > I should have brought this up to python-dev before--sorry for being > so slow. 
It's already in the tracker for a couple of days: > > http://bugs.python.org/issue5880 > > The idea: PyGetSetDef has this "void *closure" field that acts like > a context pointer. You stick it in the PyGetSetDef, and it gets > passed back to you when your getter or setter is called. It's a > reasonable API design, but in practice you almost never need it. > Meanwhile, it clutters up CPython, particularly typeobject.c; there > are all these function calls that end with ", NULL);", just to > satisfy the getter/setter prototype internally. I think this is an important feature, which allows you to define generic, reusable getter and setter functions and pass static metadata to them at runtime. Admittedly I have never needed the full pointer, my typical usage is to pass in an offset. I think this should only be removed if a suitable mechanism replaces it, if not it will require some needless duplication of code in extensions that use it (in particular my own) 8^) -Casey From benjamin at python.org Fri May 8 20:09:56 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 8 May 2009 13:09:56 -0500 Subject: [Python-Dev] special method lookup: how much do we care? Message-ID: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> A while ago, Guido declared that all special method lookups on new-style classes bypass __getattr__ and __getattribute__. This almost completely consistent now, and I've been working on patching up a few incorrect cases. I've know hit __enter__ and __exit__. The compiler generates LOAD_ATTR instructions for these, so it uses the normal lookup. The only way I can see to fix this is add a new opcode which uses _PyObject_LookupSpecial, but I don't think we really care this much. Opinions? -- Regards, Benjamin From larry at hastings.org Fri May 8 21:43:06 2009 From: larry at hastings.org (Larry Hastings) Date: Fri, 08 May 2009 12:43:06 -0700 Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from PyGetSetDef In-Reply-To: <7323CF3F-FC5D-4C62-9C45-22E4FFBBA857@pandora.com> References: <49FEB11B.2040304@hastings.org> <7323CF3F-FC5D-4C62-9C45-22E4FFBBA857@pandora.com> Message-ID: <4A048B4A.608@hastings.org> Casey Duncan wrote: > I think this is an important feature, which allows you to define > generic, reusable getter and setter functions and pass static metadata > to them at runtime. Admittedly I have never needed the full pointer, > my typical usage is to pass in an offset. > > I think this should only be removed if a suitable mechanism replaces > it, if not it will require some needless duplication of code in > extensions that use it (in particular my own) 8^) I disagree; I think it is a minor convenience feature, and one which encourages a lack of type safety. A suitable replacement mechanism already exists in C: static PyObject *generic_getter(PyObject *o, int context) { /* your generic code goes here */ } static PyObject *getter_with_context_1(o) { return generic_getter(o, 1); } static PyObject *getter_with_context_2(o) { return generic_getter(o, 2); } static PyObject *getter_with_context_3(o) { return generic_getter(o, 3); } You would then use "getter_with_context_1" &c in your PyGetSetDef. With a clever optimizing compiler this should result in no detectable slowdown or code bloat. However, you will be happy to learn there wasn't much support for this change, so it didn't make it into Python 3.1. /larry/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Sat May 9 00:41:14 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 08 May 2009 18:41:14 -0400 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> Message-ID: Benjamin Peterson wrote: > A while ago, Guido declared that all special method lookups on > new-style classes bypass __getattr__ and __getattribute__. This almost > completely consistent now, and I've been working on patching up a few > incorrect cases. I've know hit __enter__ and __exit__. The compiler > generates LOAD_ATTR instructions for these, so it uses the normal > lookup. The only way I can see to fix this is add a new opcode which > uses _PyObject_LookupSpecial, but I don't think we really care this > much. Opinions? 1.More consistent attribute lookup is, to me, a feature of 3.x and I appreciate you working on this. 2. I am puzzled why those two methods should be extra special, but don't know enough to say more. 3. If there are only those two or a couple of other exceptions, I'd like them listed in the 'Special method lookup' ref doc section. tjr From benjamin at python.org Sat May 9 00:54:23 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 8 May 2009 17:54:23 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> Message-ID: <1afaf6160905081554p10aa7a63ue54744ea138689ef@mail.gmail.com> 2009/5/8 Terry Reedy : > 2. I am puzzled why those two methods should be extra special, but don't > know enough to say more. They're not supposed to be special, which is the reason for this message. :) Currently the interpreter will call __getattr__ when looking them up. This is not the way it should be. -- Regards, Benjamin From daniel at stutzbachenterprises.com Sat May 9 01:10:53 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Fri, 8 May 2009 18:10:53 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> Message-ID: On Fri, May 8, 2009 at 1:09 PM, Benjamin Peterson wrote: > I've know hit __enter__ and __exit__. The compiler > generates LOAD_ATTR instructions for these, so it uses the normal > lookup. The only way I can see to fix this is add a new opcode which > uses _PyObject_LookupSpecial, but I don't think we really care this > much. Opinions? > Why does this problem arise only with __enter__ and __exit__? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Sat May 9 01:14:12 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 8 May 2009 18:14:12 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> Message-ID: <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> 2009/5/8 Daniel Stutzbach : > On Fri, May 8, 2009 at 1:09 PM, Benjamin Peterson > wrote: >> >> I've know hit __enter__ and __exit__. The compiler >> generates LOAD_ATTR instructions for these, so it uses the normal >> lookup. 
The only way I can see to fix this is add a new opcode which >> uses _PyObject_LookupSpecial, but I don't think we really care this >> much. Opinions? > > Why does this problem arise only with __enter__ and __exit__? Normally special methods use slots of the PyTypeObject struct. typeobject.c looks up all those methods on Python classes correctly. In the case of __enter__ and __exit__, the compiler generates bytecode to look them up, and that bytecode use PyObject_Getattr. -- Regards, Benjamin From daniel at stutzbachenterprises.com Sat May 9 02:36:44 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Fri, 8 May 2009 19:36:44 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> Message-ID: On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson wrote: > Normally special methods use slots of the PyTypeObject struct. > typeobject.c looks up all those methods on Python classes correctly. > In the case of __enter__ and __exit__, the compiler generates bytecode > to look them up, and that bytecode use PyObject_Getattr. Would this problem apply to all special methods that don't use a slot in PyTypeObject, then? I know of several other examples: __reduce__ __setstate__ __reversed__ __length_hint__ __sizeof__ (unless I misunderstand the definition of "special methods", which is possible) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Sat May 9 02:37:45 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 8 May 2009 19:37:45 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> Message-ID: <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> 2009/5/8 Daniel Stutzbach : > On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson > wrote: >> >> Normally special methods use slots of the PyTypeObject struct. >> typeobject.c looks up all those methods on Python classes correctly. >> In the case of __enter__ and __exit__, the compiler generates bytecode >> to look them up, and that bytecode use PyObject_Getattr. > > Would this problem apply to all special methods that don't use a slot in > PyTypeObject, then?? I know of several other examples: Yes. I didn't think of those. > > __reduce__ > __setstate__ > __reversed__ > __length_hint__ > __sizeof__ > > (unless I misunderstand the definition of "special methods", which is > possible) -- Regards, Benjamin From tjreedy at udel.edu Sat May 9 02:56:25 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 08 May 2009 20:56:25 -0400 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081554p10aa7a63ue54744ea138689ef@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081554p10aa7a63ue54744ea138689ef@mail.gmail.com> Message-ID: Benjamin Peterson wrote: > 2009/5/8 Terry Reedy : >> 2. I am puzzled why those two methods should be extra special, but don't >> know enough to say more. > > They're not supposed to be special, which is the reason for this > message. :) Currently the interpreter will call __getattr__ when > looking them up. 
This is not the way it should be. I was trying to ask the same question as Daniel did more clearly, and which you answered: they are special special methods because they are not in the PyTypeObject struct like the other special (name) methods. And that, I presume, is because they are specific to context manager objects, while all other 'special' methods (that I notice in 'Special method names') are more general in being applicable to multiple types. Since built-in functions are compiled to load_global, call_function and operations to various special op codes, I could imagine that .__enter__ and .__exit__ are currently the only implicitly invoked special names that explicitly appear in code objects. I can see why you ask before burning an opcode (with parameter) to avoid that. There are two issues: 1) bypass instance lookup; 2) bypass .__getattribute__() calling. I presume you have or can do at least the first with a custom .__getattribute__ method. Terry Jan Reedy From tjreedy at udel.edu Sat May 9 03:47:32 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 08 May 2009 21:47:32 -0400 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> Message-ID: Benjamin Peterson wrote: > 2009/5/8 Daniel Stutzbach : >> On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson >> wrote: >>> Normally special methods use slots of the PyTypeObject struct. >>> typeobject.c looks up all those methods on Python classes correctly. >>> In the case of __enter__ and __exit__, the compiler generates bytecode >>> to look them up, and that bytecode use PyObject_Getattr. >> Would this problem apply to all special methods that don't use a slot in >> PyTypeObject, then? I know of several other examples: > > Yes. I didn't think of those. > >> __reduce__ >> __setstate__ >> __reversed__ >> __length_hint__ >> __sizeof__ >> >> (unless I misunderstand the definition of "special methods", which is >> possible) __reversed__, at least, is called by the reversed() builtin, so there is no LOAD_ATTR k (__reversed__) byte code. So for that, the problem is reduced to accessing type(it).__reversed__ without going thru type(it).__getattribute__. I would think that a function that did that would work for the others on the list (all 4?) that also have no LOAD_ATTR bytecode. Would a modified version of object.__getattribute__ work? tjr From benjamin at python.org Sat May 9 03:52:24 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 8 May 2009 20:52:24 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> Message-ID: <1afaf6160905081852g5323d307g54148d01adc4faca@mail.gmail.com> 2009/5/8 Terry Reedy : > Benjamin Peterson wrote: >> >> 2009/5/8 Daniel Stutzbach : >>> >>> On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson >>> wrote: >>>> >>>> Normally special methods use slots of the PyTypeObject struct. >>>> typeobject.c looks up all those methods on Python classes correctly. >>>> In the case of __enter__ and __exit__, the compiler generates bytecode >>>> to look them up, and that bytecode use PyObject_Getattr. 
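(To make the distinction concrete, a small self-contained sketch of the lookup rule being discussed -- the class and values are invented, and the comment about 'with' describes the pre-fix behaviour covered in this thread:)

    class Demo(object):
        def __getattr__(self, name):
            # Pretend the instance can conjure up any special method.
            if name in ('__len__', '__enter__', '__exit__'):
                return lambda *args: 0
            raise AttributeError(name)

    d = Demo()
    print(d.__len__())        # 0: explicit attribute access does go through __getattr__
    try:
        len(d)                # implicit lookup consults type(d) only, so the
    except TypeError:         # __getattr__-supplied __len__ is never seen
        print("len() bypassed __getattr__, as intended")
    # By contrast, 'with d: pass' on the interpreter described above finds
    # the fake __enter__/__exit__ through __getattr__, because the compiler
    # emits plain LOAD_ATTR for them -- the inconsistency under discussion.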
>>> >>> Would this problem apply to all special methods that don't use a slot in >>> PyTypeObject, then? ?I know of several other examples: >> >> Yes. I didn't think of those. >> >>> __reduce__ >>> __setstate__ >>> __reversed__ >>> __length_hint__ >>> __sizeof__ >>> >>> (unless I misunderstand the definition of "special methods", which is >>> possible) > > __reversed__, at least, is called by the reversed() builtin, so there is no > LOAD_ATTR k (__reversed__) byte code. ?So for that, the problem is reduced > to accessing type(it).__reversed__ without going thru > type(it).__getattribute__. ?I would think that a function that did that > would work for the others on the list (all 4?) that also have no LOAD_ATTR > bytecode. ?Would a modified version of object.__getattribute__ work? No, it's easier to just use _PyObject_LookupSpecial there. -- Regards, Benjamin From tjreedy at udel.edu Sat May 9 08:26:58 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 09 May 2009 02:26:58 -0400 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081852g5323d307g54148d01adc4faca@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> <1afaf6160905081852g5323d307g54148d01adc4faca@mail.gmail.com> Message-ID: Benjamin Peterson wrote: >>>> __reduce__ >>>> __setstate__ >>>> __reversed__ >>>> __length_hint__ >>>> __sizeof__ > No, it's easier to just use _PyObject_LookupSpecial there. Does that mean that the above 5 'work correctly' (or can easily be made to do so)? Leaving just __entry__ and __exit__ as problems? From chris at simplistix.co.uk Sat May 9 11:02:21 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 09 May 2009 10:02:21 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090501184706.66ED13A4070@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <20090501184706.66ED13A4070@sparrow.telecommunity.com> Message-ID: <4A05469D.6090301@simplistix.co.uk> P.J. Eby wrote: > I didn't say there's *no* desire, however IIRC the only person who > *ever* asked on distutils-sig how to do a base package with setuptools > was the author of the ll.* packages. 
I've asked before ;-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Sat May 9 11:03:53 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 09 May 2009 10:03:53 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49FB3453.4060906@v.loewis.de> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> Message-ID: <4A0546F9.30108@simplistix.co.uk> Martin v. L?wis wrote: >> I, for one, have been trying to figure out how to do "base namespace" >> packages for years... > > You mean, without PEP 382? > > That won't be possible, unless you can coordinate all addon packages. > Base packages are a feature solely of PEP 382. Marc-Andre has achieved this, I think, without the PEP, but I never really understood how :-S Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Sat May 9 11:06:52 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 09 May 2009 10:06:52 +0100 Subject: [Python-Dev] PEP 382: little help for stupid people? In-Reply-To: <49FB3384.1030106@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> <49FB2A2A.4090606@simplistix.co.uk> <49FB3384.1030106@v.loewis.de> Message-ID: <4A0547AC.7060103@simplistix.co.uk> Martin v. L?wis wrote: > Ok, so create three tar files: > > 1. base.tar, containing > > simplistix/ > simplistix/__init__.py So this __init__.py can have code in it? And base.tar can have other modules and subpackages in it? What happens if the base and an addon both define a package called simplistix.somepackage? > 2. addon1.tar, containing > > simplistix/addon1.pth (containing a single "*") What does that * mean? I thought .pth files just had python in them? > Unpack each of them anywhere on sys.path, in any order. How would this work if base, addon1 and addon2 were eggs managed by buildout or setuptools? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From martin at v.loewis.de Sat May 9 11:27:22 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 09 May 2009 11:27:22 +0200 Subject: [Python-Dev] PEP 382: little help for stupid people? 
In-Reply-To: <4A0547AC.7060103@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> <49FB2A2A.4090606@simplistix.co.uk> <49FB3384.1030106@v.loewis.de> <4A0547AC.7060103@simplistix.co.uk> Message-ID: <4A054C7A.8020806@v.loewis.de> >> Ok, so create three tar files: >> >> 1. base.tar, containing >> >> simplistix/ >> simplistix/__init__.py > > So this __init__.py can have code in it? That's the point, yes. > And base.tar can have other modules and subpackages in it? Certainly, yes. > What happens if the base and an addon both define a package called > simplistix.somepackage? Depends on whether simplistix.somepackage is a namespace package (it should). If so, they get merged just as any other namespace package. >> 2. addon1.tar, containing >> >> simplistix/addon1.pth (containing a single "*") > > What does that * mean? See PEP 382 (search for "*"). > I thought .pth files just had python in them? Not at all - they never did. They have paths in them. >> Unpack each of them anywhere on sys.path, in any order. > > How would this work if base, addon1 and addon2 were eggs managed by > buildout or setuptools? What is a managed egg (i.e. what kind of management does buildout or setuptools apply to it)? Regards, Martin From asmodai at in-nomine.org Sat May 9 13:24:55 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Sat, 9 May 2009 13:24:55 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49FB4654.9000408@v.loewis.de> References: <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> <20090501184843.D08E43A4070@sparrow.telecommunity.com> <49FB4654.9000408@v.loewis.de> Message-ID: <20090509112455.GL24353@nexus.in-nomine.org> -On [20090501 20:59], "Martin v. L?wis" (martin at v.loewis.de) wrote: >Right: if all portions install into the same directory, you can have >base packages already. Speaking as a user of packages, this use case is one I hardly ever encounter with the Python software/modules/packages I use. The only ones that spring to mind are the mx.* and ll.* packages. The rest simply create their own namespace as .*, but there's nothing that uses that same namespace and installs separately from the base package that I know of. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Knowledge was inherent in all things. The world was a library... 
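(For reference, the way such a shared namespace is usually spelled today, without PEP 382: every portion ships the same one-line stub __init__.py, here reusing the "simplistix" name from Martin's example -- a sketch only:)

    # simplistix/__init__.py, shipped identically by *every* portion
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)

    # setuptools-based distributions use the equivalent:
    #     from pkg_resources import declare_namespace
    #     declare_namespace(__name__)

Because every portion has to ship the same stub, none of them can safely put real code in __init__.py -- which is exactly the "base package" limitation PEP 382 is trying to lift.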
From martin at v.loewis.de Sat May 9 13:40:48 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 09 May 2009 13:40:48 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090509112455.GL24353@nexus.in-nomine.org> References: <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> <20090501184843.D08E43A4070@sparrow.telecommunity.com> <49FB4654.9000408@v.loewis.de> <20090509112455.GL24353@nexus.in-nomine.org> Message-ID: <4A056BC0.60606@v.loewis.de> >> Right: if all portions install into the same directory, you can have >> base packages already. > > Speaking as a user of packages, this use case is one I hardly ever encounter > with the Python software/modules/packages I use. The only ones that spring > to mind are the mx.* and ll.* packages. The rest simply create their own > namespace as .*, but there's nothing that uses that same namespace > and installs separately from the base package that I know of. There are a few others, though: zope.*, repoze.*, redturtle.*, iw.*, plone.*, pycopia.*, p4a.*, plonehrm.*, plonetheme.*, pbp.*, lovely.*, xm.*, paste.*, Products.*, buildout.*, five.*, silva.*, tl.*, tw.*, themerubber.*, themetweaker.*, zc.*, z3c.*, zgeo.*, z3ext.*, etc. Regards, Martin From asmodai at in-nomine.org Sat May 9 13:50:37 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Sat, 9 May 2009 13:50:37 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <4A056BC0.60606@v.loewis.de> References: <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> <20090501184843.D08E43A4070@sparrow.telecommunity.com> <49FB4654.9000408@v.loewis.de> <20090509112455.GL24353@nexus.in-nomine.org> <4A056BC0.60606@v.loewis.de> Message-ID: <20090509115037.GM24353@nexus.in-nomine.org> -On [20090509 13:40], "Martin v. L?wis" (martin at v.loewis.de) wrote: >There are a few others, though: zope.*, repoze.*, redturtle.*, iw.*, >plone.*, pycopia.*, p4a.*, plonehrm.*, plonetheme.*, pbp.*, lovely.*, >xm.*, paste.*, Products.*, buildout.*, five.*, silva.*, tl.*, tw.*, >themerubber.*, themetweaker.*, zc.*, z3c.*, zgeo.*, z3ext.*, etc. Can be fairly said, though, that the majority of those you just named are related to Zope? That would explain why I won't know of them as I avoid Zope like the plague. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Hope is a letter that never arrives, delivered by the postman of my fear... 
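For readers who have not run into them, the distributions Martin lists share a top-level name through the setuptools namespace-package machinery, which looks roughly like this (a sketch, not code taken from any of the listed projects):

    # zc/__init__.py -- effectively the only content of the shared package
    __import__('pkg_resources').declare_namespace(__name__)

    # or, without setuptools, the stdlib equivalent:
    # from pkgutil import extend_path
    # __path__ = extend_path(__path__, __name__)

    # setup.py of each distribution contributing a zc.* subpackage
    from setuptools import setup, find_packages
    setup(
        name='zc.example',              # invented project name
        packages=find_packages(),
        namespace_packages=['zc'],      # marks 'zc' as a shared namespace
    )

The catch, discussed elsewhere in this thread, is that such an __init__.py must stay effectively empty -- which is exactly the "base package" capability that PEP 382 adds.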
From zookog at gmail.com Sat May 9 15:49:13 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Sat, 9 May 2009 07:49:13 -0600 Subject: [Python-Dev] .pth files are evil In-Reply-To: <49FB22B5.3040507@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> Message-ID: .pth files are why I can't easily use GNU stow with easy_install. If installing a Python package involved writing new files into the filesystem, but did not require reading, updating, and re-writing any extant files such as .pth files, then GNU stow would Just Work with easy_install the way it Just Works with most things. Regards, Zooko From chris at simplistix.co.uk Sat May 9 16:07:01 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 09 May 2009 15:07:01 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090509115037.GM24353@nexus.in-nomine.org> References: <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> <20090501184843.D08E43A4070@sparrow.telecommunity.com> <49FB4654.9000408@v.loewis.de> <20090509112455.GL24353@nexus.in-nomine.org> <4A056BC0.60606@v.loewis.de> <20090509115037.GM24353@nexus.in-nomine.org> Message-ID: <4A058E05.9070908@simplistix.co.uk> Jeroen Ruigrok van der Werven wrote: > -On [20090509 13:40], "Martin v. L?wis" (martin at v.loewis.de) wrote: >> There are a few others, though: zope.*, repoze.*, redturtle.*, iw.*, >> plone.*, pycopia.*, p4a.*, plonehrm.*, plonetheme.*, pbp.*, lovely.*, >> xm.*, paste.*, Products.*, buildout.*, five.*, silva.*, tl.*, tw.*, >> themerubber.*, themetweaker.*, zc.*, z3c.*, zgeo.*, z3ext.*, etc. > > Can be fairly said, though, that the majority of those you just named are > related to Zope? They're also all pure namespace packages rather than base + addons, which is what we've been discussing... > That would explain why I won't know of them as I avoid Zope like the plague. More fool you... Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Sat May 9 16:10:23 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 09 May 2009 15:10:23 +0100 Subject: [Python-Dev] PEP 382: little help for stupid people? 
In-Reply-To: <4A054C7A.8020806@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> <49FB2A2A.4090606@simplistix.co.uk> <49FB3384.1030106@v.loewis.de> <4A0547AC.7060103@simplistix.co.uk> <4A054C7A.8020806@v.loewis.de> Message-ID: <4A058ECF.6050203@simplistix.co.uk> Martin v. L?wis wrote: >> So this __init__.py can have code in it? > > That's the point, yes. > >> And base.tar can have other modules and subpackages in it? > > Certainly, yes. Great, when is the PEP due to land in 2.x? ;-) >> What happens if the base and an addon both define a package called >> simplistix.somepackage? > > Depends on whether simplistix.somepackage is a namespace package > (it should). If so, they get merged just as any other namespace > package. Sorry, I was looking at potential bug cases here. What happens if it's not a namespace package? > See PEP 382 (search for "*"). > >> I thought .pth files just had python in them? > > Not at all - they never did. They have paths in them. I've certainly seen them with python in, and that's what I hate about them... >>> Unpack each of them anywhere on sys.path, in any order. >> How would this work if base, addon1 and addon2 were eggs managed by >> buildout or setuptools? > > What is a managed egg (i.e. what kind of management does buildout > or setuptools apply to it)? Sorry, bad wording on my part... I guess I meant more how would buildout/setuptools go about installing/uninstalling/etc packages thatconform to PEP 382? Would setuptools/buildout need modification or would the changes take effect lower down in the stack? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From asmodai at in-nomine.org Sat May 9 16:14:34 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Sat, 9 May 2009 16:14:34 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <4A058E05.9070908@simplistix.co.uk> References: <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> <20090501184843.D08E43A4070@sparrow.telecommunity.com> <49FB4654.9000408@v.loewis.de> <20090509112455.GL24353@nexus.in-nomine.org> <4A056BC0.60606@v.loewis.de> <20090509115037.GM24353@nexus.in-nomine.org> <4A058E05.9070908@simplistix.co.uk> Message-ID: <20090509141434.GN24353@nexus.in-nomine.org> -On [20090509 16:07], Chris Withers (chris at simplistix.co.uk) wrote: >They're also all pure namespace packages rather than base + addons, >which is what we've been discussing... But from Martin's email I understood it more as being base packages. Unless I misunderstood, of course. If correct, which is it? >More fool you... Maybe, used/worked with it and don't care for it one iota. But that's a whole different discussion. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? 
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Naritai jibun wo surikaetemo egao wa itsudemo suteki desuka... From martin at v.loewis.de Sat May 9 16:18:44 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 09 May 2009 16:18:44 +0200 Subject: [Python-Dev] .pth files are evil In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> Message-ID: <4A0590C4.1020904@v.loewis.de> Zooko O'Whielacronx wrote: > .pth files are why I can't easily use GNU stow with easy_install. > If installing a Python package involved writing new files into the > filesystem, but did not require reading, updating, and re-writing any > extant files such as .pth files, then GNU stow would Just Work with > easy_install the way it Just Works with most things. Please understand that this is the fault of easy_install, not of .pth files. There is no technical need for easy_install to rewrite .pth files on installation. It could just as well have created new .pth files, rather than modifying existing ones. If you always use --single-version-externally-managed with easy_install, it will stop editing .pth files on installation. Regards, Martin From martin at v.loewis.de Sat May 9 16:32:39 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 09 May 2009 16:32:39 +0200 Subject: [Python-Dev] PEP 382: little help for stupid people? In-Reply-To: <4A058ECF.6050203@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> <49FB2A2A.4090606@simplistix.co.uk> <49FB3384.1030106@v.loewis.de> <4A0547AC.7060103@simplistix.co.uk> <4A054C7A.8020806@v.loewis.de> <4A058ECF.6050203@simplistix.co.uk> Message-ID: <4A059407.2060803@v.loewis.de> Chris Withers wrote: > Martin v. L?wis wrote: >>> So this __init__.py can have code in it? >> >> That's the point, yes. >> >>> And base.tar can have other modules and subpackages in it? >> >> Certainly, yes. > > Great, when is the PEP due to land in 2.x? ;-) Most likely, never - it probably will be implemented only after the last feature release of 2.x was made. >>> What happens if the base and an addon both define a package called >>> simplistix.somepackage? >> >> Depends on whether simplistix.somepackage is a namespace package >> (it should). If so, they get merged just as any other namespace >> package. > > Sorry, I was looking at potential bug cases here. What happens if it's > not a namespace package? Then it will be imported as a regular child package. >>>> Unpack each of them anywhere on sys.path, in any order. 
>>> How would this work if base, addon1 and addon2 were eggs managed by >>> buildout or setuptools? >> >> What is a managed egg (i.e. what kind of management does buildout >> or setuptools apply to it)? > > Sorry, bad wording on my part... I guess I meant more how would > buildout/setuptools go about installing/uninstalling/etc packages > thatconform to PEP 382? Would setuptools/buildout need modification or > would the changes take effect lower down in the stack? Unfortunately, I don't know precisely what they do, so I don't know whether any of it needs modification. All I can say is that if they want to install namespace packages using the mechanism of PEP 382, they will have to produce the file layout specified in the PEP. For distutils (which is the only library in that area that I do know), I think just installing any .pth files inside a package would be sufficient. Regards, Martin From martin at v.loewis.de Sat May 9 16:34:28 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 09 May 2009 16:34:28 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090509141434.GN24353@nexus.in-nomine.org> References: <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49FB24DF.2020701@simplistix.co.uk> <49FB3453.4060906@v.loewis.de> <20090501184843.D08E43A4070@sparrow.telecommunity.com> <49FB4654.9000408@v.loewis.de> <20090509112455.GL24353@nexus.in-nomine.org> <4A056BC0.60606@v.loewis.de> <20090509115037.GM24353@nexus.in-nomine.org> <4A058E05.9070908@simplistix.co.uk> <20090509141434.GN24353@nexus.in-nomine.org> Message-ID: <4A059474.4090704@v.loewis.de> Jeroen Ruigrok van der Werven wrote: > -On [20090509 16:07], Chris Withers (chris at simplistix.co.uk) wrote: >> They're also all pure namespace packages rather than base + addons, >> which is what we've been discussing... > > But from Martin's email I understood it more as being base packages. Unless > I misunderstood, of course. > > If correct, which is it? The list I gave you was a list of distributions that include namespace packages (using the setuptools mechanism). I don't think that any of them has the notion of a base package, as the setuptools mechanism doesn't support base packages. Regards, Martin From pje at telecommunity.com Sat May 9 16:41:02 2009 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 09 May 2009 10:41:02 -0400 Subject: [Python-Dev] .pth files are evil In-Reply-To: <4A0590C4.1020904@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> Message-ID: <20090509143829.17F293A4080@sparrow.telecommunity.com> At 04:18 PM 5/9/2009 +0200, Martin v. L??wis wrote: >Zooko O'Whielacronx wrote: > > .pth files are why I can't easily use GNU stow with easy_install. > > If installing a Python package involved writing new files into the > > filesystem, but did not require reading, updating, and re-writing any > > extant files such as .pth files, then GNU stow would Just Work with > > easy_install the way it Just Works with most things. > >Please understand that this is the fault of easy_install, not of .pth >files. 
There is no technical need for easy_install to rewrite .pth >files on installation. It could just as well have created new .pth >files, rather than modifying existing ones. > >If you always use --single-version-externally-managed with easy_install, >it will stop editing .pth files on installation. It's --multi-version (-m) that does that. --single-version-externally-managed is a "setup.py install" option. Both have the effect of not editing .pth files, but they do so in different ways. The "setup.py install" option causes it to install in a distutils-compatible layout, whereas --multi-version simply drops .egg files or directories in the target location and leaves it to the user (or the generated script wrappers) to add them to sys.path. From martin at v.loewis.de Sat May 9 16:42:01 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 09 May 2009 16:42:01 +0200 Subject: [Python-Dev] .pth files are evil In-Reply-To: <20090509143829.17F293A4080@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> Message-ID: <4A059639.7040505@v.loewis.de> >> If you always use --single-version-externally-managed with easy_install, >> it will stop editing .pth files on installation. > > It's --multi-version (-m) that does that. > --single-version-externally-managed is a "setup.py install" option. > > Both have the effect of not editing .pth files, but they do so in > different ways. The "setup.py install" option causes it to install in a > distutils-compatible layout, whereas --multi-version simply drops .egg > files or directories in the target location and leaves it to the user > (or the generated script wrappers) to add them to sys.path. Ah, ok. Is there also an easy_install invocation that unpacks the zip file into some location of sys.path (which then wouldn't require editing sys.path)? Regards, Martin From pje at telecommunity.com Sat May 9 17:39:52 2009 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 09 May 2009 11:39:52 -0400 Subject: [Python-Dev] .pth files are evil In-Reply-To: <4A059639.7040505@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> Message-ID: <20090509153716.D44633A4080@sparrow.telecommunity.com> At 04:42 PM 5/9/2009 +0200, Martin v. L?wis wrote: > >> If you always use --single-version-externally-managed with easy_install, > >> it will stop editing .pth files on installation. > > > > It's --multi-version (-m) that does that. > > --single-version-externally-managed is a "setup.py install" option. > > > > Both have the effect of not editing .pth files, but they do so in > > different ways. 
The "setup.py install" option causes it to install in a > > distutils-compatible layout, whereas --multi-version simply drops .egg > > files or directories in the target location and leaves it to the user > > (or the generated script wrappers) to add them to sys.path. > >Ah, ok. Is there also an easy_install invocation that unpacks the zip >file into some location of sys.path (which then wouldn't require >editing sys.path)? Not as yet. I'm sort of waiting to see what comes out of PEP 376 discussions re: an installation manifest... but then, if I actually had time to work on it right now, I'd probably just implement something. Currently, you can use pip to do that, though, as long as the packages you want are in source form. pip doesn't unzip eggs as yet. It would be really straightforward, though, for someone to implement an easy_install variant that does this. Just invoke "easy_install -Zmaxd /some/tmpdir packagelist" to get a full set of unpacked .egg directories in /some/tmpdir, and then move the contents of the resulting .egg subdirs to the target location, renaming EGG-INFO subdirs to projectname-version.egg-info subdirs. (Of course, this ignores the issue of uninstalling previous versions, or overwriting of conflicting files in the target -- does pip handle these?) From benjamin at python.org Sat May 9 17:52:11 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 9 May 2009 10:52:11 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> <1afaf6160905081852g5323d307g54148d01adc4faca@mail.gmail.com> Message-ID: <1afaf6160905090852y3af447a9ofb5b7840f44a8a97@mail.gmail.com> 2009/5/9 Terry Reedy : > Benjamin Peterson wrote: > >>>>> __reduce__ >>>>> __setstate__ >>>>> __reversed__ >>>>> __length_hint__ >>>>> __sizeof__ > >> No, it's easier to just use _PyObject_LookupSpecial there. > > Does that mean that the above 5 'work correctly' (or can easily be made to > do so)? ?Leaving just __entry__ and __exit__ as problems? Yes, __enter__ and __exit__ are the tricky ones. -- Regards, Benjamin From p.f.moore at gmail.com Sat May 9 18:03:20 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 9 May 2009 17:03:20 +0100 Subject: [Python-Dev] PEP 382: little help for stupid people? In-Reply-To: <4A058ECF.6050203@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> <49FB2A2A.4090606@simplistix.co.uk> <49FB3384.1030106@v.loewis.de> <4A0547AC.7060103@simplistix.co.uk> <4A054C7A.8020806@v.loewis.de> <4A058ECF.6050203@simplistix.co.uk> Message-ID: <79990c6b0905090903o19e11505w353cfe62f4f67071@mail.gmail.com> 2009/5/9 Chris Withers : > Martin v. L?wis wrote: >>> I thought .pth files just had python in them? >> >> Not at all - they never did. They have paths in them. > > I've certainly seen them with python in, and that's what I hate about > them... AIUI, there was a small special case that lines starting with "import" are executed (see the source of site.py for details). This exception has been exploited (some would say "abused", but I'm trying to be unbiased here) by setuptools, at least, to do path manipulations and such. 
PEP 382 does not provide the import exception: "Unlike .pth files on the top level, lines starting with "import" are not supported in per-package .pth files". It's not clear to me what impact this would have on setuptools (probably none, as top-level .pth files aren't changed). Paul. From g.brandl at gmx.net Sat May 9 19:16:55 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 09 May 2009 19:16:55 +0200 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> Message-ID: Benjamin Peterson schrieb: > A while ago, Guido declared that all special method lookups on > new-style classes bypass __getattr__ and __getattribute__. This almost > completely consistent now, and I've been working on patching up a few > incorrect cases. I've know hit __enter__ and __exit__. The compiler > generates LOAD_ATTR instructions for these, so it uses the normal > lookup. The only way I can see to fix this is add a new opcode which > uses _PyObject_LookupSpecial, but I don't think we really care this > much. Opinions? It's easier to introduce a separate opcode like SETUP_WITH; the compilation of a with statement produces quite a lot of bytecode which could be made more efficient that way. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From greg.ewing at canterbury.ac.nz Sun May 10 03:10:53 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 10 May 2009 13:10:53 +1200 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081852g5323d307g54148d01adc4faca@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> <1afaf6160905081852g5323d307g54148d01adc4faca@mail.gmail.com> Message-ID: <4A06299D.4030403@canterbury.ac.nz> Are we solving an actual problem by changing the behaviour here, or is it just a case of foolish consistency? Seems to me that trying to pin down exactly what constitutes a "special method" is a fool's errand, especially if you want it to include __enter__ and __exit__ but not __reduce__, etc. -- Greg From benjamin at python.org Sun May 10 03:25:28 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 9 May 2009 20:25:28 -0500 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <4A06299D.4030403@canterbury.ac.nz> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <1afaf6160905081614o33443c85v51d5574807ada8d7@mail.gmail.com> <1afaf6160905081737t5329e27ax6757892230b75ea0@mail.gmail.com> <1afaf6160905081852g5323d307g54148d01adc4faca@mail.gmail.com> <4A06299D.4030403@canterbury.ac.nz> Message-ID: <1afaf6160905091825q28006c33sf9fb09d5e8a40cc2@mail.gmail.com> 2009/5/9 Greg Ewing : > Are we solving an actual problem by changing the > behaviour here, or is it just a case of foolish > consistency? "No implementation detail is obscure enough." For example, Maciek Fijalkowski of PyPy told me that he cares about this because someone is bound to eventually rely on it, and PyPy will have to follow CPython. 
> > Seems to me that trying to pin down exactly what > constitutes a "special method" is a fool's errand, > especially if you want it to include __enter__ and > __exit__ but not __reduce__, etc. IMO, if it's a callable that begins with __ and ends with __, it's a special method. -- Regards, Benjamin From zooko at zooko.com Sun May 10 17:41:33 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sun, 10 May 2009 09:41:33 -0600 Subject: [Python-Dev] .pth files are evil In-Reply-To: <20090509153716.D44633A4080@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> Message-ID: <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> On May 9, 2009, at 9:39 AM, P.J. Eby wrote: > It would be really straightforward, though, for someone to > implement an easy_install variant that does this. Just invoke > "easy_install -Zmaxd /some/tmpdir packagelist" to get a full set of > unpacked .egg directories in /some/tmpdir, and then move the > contents of the resulting .egg subdirs to the target location, > renaming EGG-INFO subdirs to projectname-version.egg-info subdirs. Except for the renaming part, this is exactly what GNU stow does. > (Of course, this ignores the issue of uninstalling previous > versions, or overwriting of conflicting files in the target -- does > pip handle these?) GNU stow does handle these issues. Regards, Zooko From martin at v.loewis.de Sun May 10 19:18:16 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 10 May 2009 19:18:16 +0200 Subject: [Python-Dev] .pth files are evil In-Reply-To: <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> Message-ID: <4A070C58.4070003@v.loewis.de> > GNU stow does handle these issues. If GNU stow solves all your problems, why do you want to use easy_install in the first place? 
Regards, Martin From zooko at zooko.com Sun May 10 20:04:57 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sun, 10 May 2009 12:04:57 -0600 Subject: [Python-Dev] how GNU stow is complementary rather than alternative to distutils In-Reply-To: <4A070C58.4070003@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4A070C58.4070003@v.loewis.de> Message-ID: On May 10, 2009, at 11:18 AM, Martin v. L?wis wrote: > If GNU stow solves all your problems, why do you want to use > easy_install in the first place? That's a good question. The answer is that there are two separate jobs: building executables and putting them in a directory structure of the appropriate shape for your system is one job, and installing or uninstalling that tree into your system is another. GNU stow does only the latter. The input to GNU stow is a set of executables, library files, etc., in a directory tree that is of the right shape for your system. For example, if you are on a Linux system, then your scripts all need to be in $prefix/bin/, your shared libs should be in $prefix/lib, your Python packages ought to be in $prefix/lib/python$x.$y/site- packages/, etc. GNU stow is blissfully ignorant about all issues of building binaries, and choosing where to place files, etc. -- that's the job of the build system of the package, e.g. the "./configure -- prefix=foo && make && make install" for most C packages, or the "python ./setup.py install --prefix=foo" for Python packages using distutils (footnote 1). Once GNU stow has the well-shaped directory which is the output of the build process, then it follows a very dumb, completely reversible (uninstallable) process of symlinking those files into the system directory structure. It is a beautiful, elegant hack because it is sooo dumb. It is also very nice to use the same tool to manage packages written in any programming language, provided only that they can build a directory tree of the right shape and content. However, there are lots of things that it doesn't do, such as automatically acquiring and building dependencies, or producing executables for the target platform for each of your console scripts. Not to mention creating a directory named "$prefx/lib/python $x.$y/site-packages" and cp'ing your Python files into it. That's why you still need a build system even if you use GNU stow for an install-and-uninstall system. The thing that prevents this from working with setuptools is that setuptools creates a file named easy_install.pth during the "python ./ setup.py install --prefix=foo" if you build two different Python packages this way, they will each create an easy_install.pth file, and then when you ask GNU stow to link the two resulting packages into your system, it will say "You are asking me to install two different packages which both claim that they need to write a file named '/usr/local/lib/python2.5/site-packages/easy_install.pth'. 
I'm too dumb to deal with this conflict, so I give up.". If I understand correctly, your (MvL's) suggestion that easy_install create a .pth file named "easy_install-$PACKAGE-$VERSION.pth" instead of "easy_install.pth" would indeed make it work with GNU stow. Regards, Zooko footnote 1: Aside from the .pth file issue, the other reason that setuptools doesn't work for this use while distutils does is that setuptools tries to hard to save you from making a mistake: maybe you don't know what you are doing if you ask it to install into a previously non-existent prefix dir "foo". This one is easier to fix: http://bugs.python.org/setuptools/issue54 # "be more like distutils with regard to --prefix=" . From martin at v.loewis.de Sun May 10 20:21:48 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 10 May 2009 20:21:48 +0200 Subject: [Python-Dev] how GNU stow is complementary rather than alternative to distutils In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4A070C58.4070003@v.loewis.de> Message-ID: <4A071B3C.2060808@v.loewis.de> Zooko Wilcox-O'Hearn wrote: > On May 10, 2009, at 11:18 AM, Martin v. L?wis wrote: > >> If GNU stow solves all your problems, why do you want to use >> easy_install in the first place? > > That's a good question. The answer is that there are two separate jobs: > building executables and putting them in a directory structure of the > appropriate shape for your system is one job, and installing or > uninstalling that tree into your system is another. GNU stow does only > the latter. And so does easy_install - it's job is *not* to build the executables and to put them in a directory structure. Instead, it's distutils/setuptools which has this job. The primary purpose of easy_install is to download the files from PyPI (IIUC). > The thing that prevents this from working with setuptools is that > setuptools creates a file named easy_install.pth It will stop doing that if you ask nicely. That's why I recommended earlier that you do ask it not to edit .pth files. > If I understand correctly, > your (MvL's) suggestion that easy_install create a .pth file named > "easy_install-$PACKAGE-$VERSION.pth" instead of "easy_install.pth" would > indeed make it work with GNU stow. My recommendation is that you use the already existing flag to setup.py install that stops it from editing .pth files. 
Regards, Martin From zookog at gmail.com Sun May 10 20:21:57 2009 From: zookog at gmail.com (Zooko O'Whielacronx) Date: Sun, 10 May 2009 12:21:57 -0600 Subject: [Python-Dev] how GNU stow is complementary rather than alternative to distutils In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4A070C58.4070003@v.loewis.de> Message-ID: following-up to my own post to mention one very important reason why anyone cares: On Sun, May 10, 2009 at 12:04 PM, Zooko Wilcox-O'Hearn wrote: > It is a beautiful, elegant hack because it is sooo dumb. ?It is also very > nice to use the same tool to manage packages written in any programming > language, provided only that they can build a directory tree of the right > shape and content. And, you are not relying on the author of the package that you are installing to avoid accidentally or maliciously screwing up your system. You're not even relying on the authors of the *build system* (e.g. the authors of distutils or easy_install). You are relying *only* on GNU stow to avoid accidentally or maliciously screwing up your system, and GNU stow is very dumb, so it is easy to understand what it is going to do and why that isn't going to irreversibly screw up your system. That is: you don't run the "build yourself and install into $prefix" step as root. This is an important consideration for a lot of people, who absolutely refuse on principle to ever run "sudo python ./setup.py" on a system that they care about unless they wrote the "setup.py" script themselves. (Likewise they refuse to run "sudo make install" on packages written in C.) Regards, Zooko From pje at telecommunity.com Sun May 10 20:48:46 2009 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 10 May 2009 14:48:46 -0400 Subject: [Python-Dev] how GNU stow is complementary rather than alternative to distutils In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4A070C58.4070003@v.loewis.de> Message-ID: <20090510184609.D238E3A4061@sparrow.telecommunity.com> At 12:04 PM 5/10/2009 -0600, Zooko Wilcox-O'Hearn wrote: >The thing that prevents this from working with setuptools is that >setuptools creates a file named easy_install.pth during the "python >./ setup.py install --prefix=foo" if you build two different Python >packages this way, they will each create an easy_install.pth file, >and then when you ask GNU stow to link the two resulting packages >into your system, it will say "You are asking me to install two >different packages which both claim that they need to write a file >named '/usr/local/lib/python2.5/site-packages/easy_install.pth'. 
Adding --record and --single-version-externally-managed to that command line will prevent the .pth file from being used or needed, although I believe you already know this. (What that mode won't do is install dependencies automatically.) From ncoghlan at gmail.com Sun May 10 23:51:32 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 11 May 2009 07:51:32 +1000 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> Message-ID: <4A074C64.5070208@gmail.com> Benjamin Peterson wrote: > A while ago, Guido declared that all special method lookups on > new-style classes bypass __getattr__ and __getattribute__. This almost > completely consistent now, and I've been working on patching up a few > incorrect cases. I've know hit __enter__ and __exit__. The compiler > generates LOAD_ATTR instructions for these, so it uses the normal > lookup. The only way I can see to fix this is add a new opcode which > uses _PyObject_LookupSpecial, but I don't think we really care this > much. Opinions? As Georg pointed out, the expectation was that we would eventually add a SETUP_WITH opcode that used the special method lookup (and hopefully speed with statements up to a point where they're competitive with writing out the associated try statement directly). The current code is the way it is because there is no "LOAD_SPECIAL" opcode and adding type dereferencing logic to the expansion would have been difficult without a custom opcode. For other special methods that are looked up from Python code, the closest we can ever get is to bypass the instance (i.e. using "type(obj).__method__(obj, *args)") to avoid metaclass confusion. The type slots are even *more* special than that because they bypass __getattribute__ and __getattr__ even on the metaclass for speed reasons. There's a reason the docs already say that for a guaranteed override you *must* actually define the special method on the class rather than merely making it accessible via __getattr__ or even __getattribute__. The PyPy guys are right to think that some developer somewhere is going to rely on these implementation details in CPython at some point. However lots of developers rely on CPython ref counting as well, no matter how many times they're told not to do that if they want to support alternative interpreters. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From fuzzyman at voidspace.org.uk Mon May 11 00:20:01 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 10 May 2009 23:20:01 +0100 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <4A074C64.5070208@gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <4A074C64.5070208@gmail.com> Message-ID: <4A075311.2050208@voidspace.org.uk> Nick Coghlan wrote: > Benjamin Peterson wrote: > >> A while ago, Guido declared that all special method lookups on >> new-style classes bypass __getattr__ and __getattribute__. This almost >> completely consistent now, and I've been working on patching up a few >> incorrect cases. I've know hit __enter__ and __exit__. The compiler >> generates LOAD_ATTR instructions for these, so it uses the normal >> lookup. The only way I can see to fix this is add a new opcode which >> uses _PyObject_LookupSpecial, but I don't think we really care this >> much. 
Opinions? >> > > As Georg pointed out, the expectation was that we would eventually add a > SETUP_WITH opcode that used the special method lookup (and hopefully > speed with statements up to a point where they're competitive with > writing out the associated try statement directly). The current code is > the way it is because there is no "LOAD_SPECIAL" opcode and adding type > dereferencing logic to the expansion would have been difficult without a > custom opcode. > > For other special methods that are looked up from Python code, the > closest we can ever get is to bypass the instance (i.e. using > "type(obj).__method__(obj, *args)") to avoid metaclass confusion. The > type slots are even *more* special than that because they bypass > __getattribute__ and __getattr__ even on the metaclass for speed reasons. > > There's a reason the docs already say that for a guaranteed override you > *must* actually define the special method on the class rather than > merely making it accessible via __getattr__ or even __getattribute__. > > The PyPy guys are right to think that some developer somewhere is going > to rely on these implementation details in CPython at some point. > However lots of developers rely on CPython ref counting as well, no > matter how many times they're told not to do that if they want to > support alternative interpreters. > It's actually very annoying for things like writing Mock or proxy objects when this behaviour is inconsistent (sorry should have spoken up earlier). The Python interpreter bases some of its decisions on whether these methods exist at all - and when you have objects that provide methods through __getattr__ then you can accidentally get screwed if magic method lookup returns an object unexpectedly when it should have raised an AttributeError. Of course for proxy objects it might be more convenient if *all* attribute access did go through __getattr__ - but with that not the case it is much better for it to be consistent rather than have to put in specific workaround code. All the best, Michael > Cheers, > Nick. > > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From google at mrabarnett.plus.com Mon May 11 00:50:40 2009 From: google at mrabarnett.plus.com (MRAB) Date: Sun, 10 May 2009 23:50:40 +0100 Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <4A075311.2050208@voidspace.org.uk> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <4A074C64.5070208@gmail.com> <4A075311.2050208@voidspace.org.uk> Message-ID: <4A075A40.1080407@mrabarnett.plus.com> Michael Foord wrote: > Nick Coghlan wrote: >> Benjamin Peterson wrote: >> >>> A while ago, Guido declared that all special method lookups on >>> new-style classes bypass __getattr__ and __getattribute__. This almost >>> completely consistent now, and I've been working on patching up a few >>> incorrect cases. I've know hit __enter__ and __exit__. The compiler >>> generates LOAD_ATTR instructions for these, so it uses the normal >>> lookup. The only way I can see to fix this is add a new opcode which >>> uses _PyObject_LookupSpecial, but I don't think we really care this >>> much. Opinions? >>> >> >> As Georg pointed out, the expectation was that we would eventually add a >> SETUP_WITH opcode that used the special method lookup (and hopefully >> speed with statements up to a point where they're competitive with >> writing out the associated try statement directly). 
The current code is >> the way it is because there is no "LOAD_SPECIAL" opcode and adding type >> dereferencing logic to the expansion would have been difficult without a >> custom opcode. >> >> For other special methods that are looked up from Python code, the >> closest we can ever get is to bypass the instance (i.e. using >> "type(obj).__method__(obj, *args)") to avoid metaclass confusion. The >> type slots are even *more* special than that because they bypass >> __getattribute__ and __getattr__ even on the metaclass for speed reasons. >> >> There's a reason the docs already say that for a guaranteed override you >> *must* actually define the special method on the class rather than >> merely making it accessible via __getattr__ or even __getattribute__. >> >> The PyPy guys are right to think that some developer somewhere is going >> to rely on these implementation details in CPython at some point. >> However lots of developers rely on CPython ref counting as well, no >> matter how many times they're told not to do that if they want to >> support alternative interpreters. >> > > It's actually very annoying for things like writing Mock or proxy > objects when this behaviour is inconsistent (sorry should have spoken up > earlier). > > The Python interpreter bases some of its decisions on whether these > methods exist at all - and when you have objects that provide methods > through __getattr__ then you can accidentally get screwed if magic > method lookup returns an object unexpectedly when it should have raised > an AttributeError. > > Of course for proxy objects it might be more convenient if *all* > attribute access did go through __getattr__ - but with that not the case > it is much better for it to be consistent rather than have to put in > specific workaround code. > Suggestion: have something like "from __future__" but affecting compile-time behaviour (like pragmas in some other languages), such as causing Python to generate bytecodes which perform all attribute access through __getattr__. From david.lyon at preisshare.net Mon May 11 03:32:11 2009 From: david.lyon at preisshare.net (David Lyon) Date: Sun, 10 May 2009 21:32:11 -0400 Subject: [Python-Dev] .pth files are evil In-Reply-To: <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> Message-ID: <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> On Sun, 10 May 2009 09:41:33 -0600, Zooko Wilcox-O'Hearn wrote: >> (Of course, this ignores the issue of uninstalling previous >> versions, or overwriting of conflicting files in the target -- does >> pip handle these?) > > GNU stow does handle these issues. I'm not sure GNU stow will handle the .PTH when deinstalling packages. In easy_install.PTH there will be a list of all the packages installed. This list really needs to be edited once a package is removed. The .PTH files are a really good part of python. Definitely nothing evil about them. 
David From giuott at gmail.com Mon May 11 14:26:49 2009 From: giuott at gmail.com (Giuseppe Ottaviano) Date: Mon, 11 May 2009 14:26:49 +0200 Subject: [Python-Dev] how GNU stow is complementary rather than alternative to distutils In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4A070C58.4070003@v.loewis.de> Message-ID: <6AA82018-2DCA-4F58-BEF9-28D021553247@gmail.com> Talking of stow, I take advantage of this thread to do some shameless advertising :) Recently I uploaded to PyPI a software of mine, BPT [1], which does the same symlinking trick of stow, but it is written in Python (and with a simple api) and, more importantly, it allows with another trick the relocation of the installation directory (it creates a semi- isolated environment, similar to virtualenv). I find it very convenient when I have to switch between several versions of the same packages (for example during development), or I have to deploy on the same machine software that needs different versions of the dependencies. I am planning to write an integration layer with buildout and easy_install. It should be very easy, since BPT can handle directly tarballs (and directories, in trunk) which contain a setup.py. HTH, Giuseppe [1] http://pypi.python.org/pypi/bpt P.S. I was not aware of stow, I'll add it to the references and see if there are any features that I can steal From aahz at pythoncraft.com Mon May 11 14:46:44 2009 From: aahz at pythoncraft.com (Aahz) Date: Mon, 11 May 2009 05:46:44 -0700 Subject: [Python-Dev] Switchover: mail.python.org Message-ID: <20090511124644.GB19400@panix.com> On Monday 2009-05-11, mail.python.org will be switched to another machine starting roughly at 14:00 UTC. This should be invisible (expected downtime is less than ten minutes). -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From fumanchu at aminus.org Mon May 11 18:53:51 2009 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 11 May 2009 09:53:51 -0700 Subject: [Python-Dev] py3k, cgi, email, and form-data Message-ID: There's a major change in functionality in the cgi module between Python 2 and Python 3 which I've just run across: the behavior of FieldStorage.read_multi, specifically when an HTTP app accepts a file upload within a multipart/form-data payload. In Python 2, each part would be read in sequence within its own FieldStorage instance. 
This allowed file uploads to be shunted to a TemporaryFile (via make_file) as needed: klass = self.FieldStorageClass or self.__class__ part = klass(self.fp, {}, ib, environ, keep_blank_values, strict_parsing) # Throw first part away while not part.done: headers = rfc822.Message(self.fp) part = klass(self.fp, headers, ib, environ, keep_blank_values, strict_parsing) self.list.append(part) In Python 3 (svn revision 72466), the whole request body is read into memory first via fp.read(), and then broken into separate parts in a second step: klass = self.FieldStorageClass or self.__class__ parser = email.parser.FeedParser() # Create bogus content-type header for proper multipart parsing parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) parser.feed(self.fp.read()) full_msg = parser.close() # Get subparts msgs = full_msg.get_payload() for msg in msgs: fp = StringIO(msg.get_payload()) part = klass(fp, msg, ib, environ, keep_blank_values, strict_parsing) self.list.append(part) This makes the cgi module in Python 3 somewhat crippled for handling multipart/form-data file uploads of any significant size (and since the client is the one determining the size, opens a server up for an unexpected Denial of Service vector). I *think* the FeedParser is designed to accept incremental writes, but I haven't yet found a way to do any kind of incremental reads from it in order to shunt the fp.read out to a tempfile again. I'm secretly hoping Barry has a one-liner fix for this. ;) Robert Brewer fumanchu at aminus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From fumanchu at aminus.org Mon May 11 18:40:11 2009 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 11 May 2009 09:40:11 -0700 Subject: [Python-Dev] py3k, cgi, and form-data Message-ID: <1242060011.19084.20.camel@haku> There's a major change in functionality in the cgi module between Python 2 and Python 3 which I've just run across: the behavior of FieldStorage.read_multi, specifically when an HTTP app accepts a file upload within a multipart/form-data payload. In Python 2, each part would be read in sequence within its own FieldStorage instance. This allowed file uploads to be shunted to a TemporaryFile (via make_file) as needed: klass = self.FieldStorageClass or self.__class__ part = klass(self.fp, {}, ib, environ, keep_blank_values, strict_parsing) # Throw first part away while not part.done: headers = rfc822.Message(self.fp) part = klass(self.fp, headers, ib, environ, keep_blank_values, strict_parsing) self.list.append(part) In Python 3 (svn revision 72466), the whole request body is read into memory first via fp.read(), and then broken into separate parts in a second step: klass = self.FieldStorageClass or self.__class__ parser = email.parser.FeedParser() # Create bogus content-type header for proper multipart parsing parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) parser.feed(self.fp.read()) full_msg = parser.close() # Get subparts msgs = full_msg.get_payload() for msg in msgs: fp = StringIO(msg.get_payload()) part = klass(fp, msg, ib, environ, keep_blank_values, strict_parsing) self.list.append(part) This makes the cgi module in Python 3 somewhat crippled for handling multipart/form-data file uploads of any significant size (and since the client is the one determining the size, opens a server up for an unexpected Denial of Service vector). 
I *think* the FeedParser is designed to accept incremental writes, but I haven't yet found a way to do any kind of incremental reads from it in order to shunt the fp.read out to a tempfile again. I'm secretly hoping Barry has a one-liner fix for this. ;) Robert Brewer fumanchu at aminus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Mon May 11 18:35:58 2009 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 11 May 2009 12:35:58 -0400 Subject: [Python-Dev] .pth files are evil In-Reply-To: <4A059639.7040505@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> Message-ID: <20090511163321.984D53A4109@sparrow.telecommunity.com> At 04:42 PM 5/9/2009 +0200, Martin v. L?wis wrote: > >> If you always use --single-version-externally-managed with easy_install, > >> it will stop editing .pth files on installation. > > > > It's --multi-version (-m) that does that. > > --single-version-externally-managed is a "setup.py install" option. > > > > Both have the effect of not editing .pth files, but they do so in > > different ways. The "setup.py install" option causes it to install in a > > distutils-compatible layout, whereas --multi-version simply drops .egg > > files or directories in the target location and leaves it to the user > > (or the generated script wrappers) to add them to sys.path. > >Ah, ok. Is there also an easy_install invocation that unpacks the zip >file into some location of sys.path (which then wouldn't require >editing sys.path)? No; you'd have to use the -e option to easy_install to download and extract a source version of the package; then run that package's setup.py, e.g.: easy_install -eb /some/tmpdir SomeProject cd /some/tmpdir/someproject # subdir is always lowercased/normalized setup.py install --single-version-externally-managed --record=... I suspect that this is basically what pip is doing under the hood, as that would explain why it doesn't support .egg files. I previously posted code to the distutils-sig that was an .egg unpacker with appropriate renaming, though. It was untested, and assumes you already checked for collisions in the target directory, and that you're handling any uninstall manifest yourself. It could probably be modified to take a filter function, though, something like: def flatten_egg(egg_filename, extract_dir, filter=lambda s,d: d): eggbase = os.path.filename(egg_filename)+'-info' def file_filter(src, dst): if src.startswith('EGG-INFO/'): src = eggbase+s[8:] dst = os.path.join(extract_dir, *src.split('/')) return filter(src, dst) return unpack_archive(egg_filename, extract_dir, file_filter) Then you could pass in a None-returning filter function to check and accumulate collisions and generate a manifest. A second run with the default filter would do the unpacking. (This function should work with either .egg files or .egg directories as input, btw, since unpack_archive treats a directory input as if it were an archive.) 
Anyway, if you used "easy_install -mxd /some/tmpdir [specs]" to get your target eggs found/built, you could then run this flattening function (with appropriate filter functions) over the *.egg contents of /some/tmpdir to do the actual installation. (The reason for using -mxd instead of -Zmaxd or -zmaxd is that we don't care whether the eggs are zipped or not, and we leave out the -a so that dependencies already present on sys.path aren't copied or re-downloaded to the target; only dependencies we don't already have will get dropped in /some/tmpdir.) Of course, the devil of this is in the details; to handle conflicts and uninstalls properly you would need to know what namespace packages were in the eggs you are installing. But if you don't care about blindly overwriting things (as the distutils does not), then it's actually pretty easy to make such an unpacker. I mainly haven't made one myself because I *do* care about things being blindly overwritten. From asmodai at in-nomine.org Mon May 11 19:29:55 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Mon, 11 May 2009 19:29:55 +0200 Subject: [Python-Dev] Switchover: mail.python.org In-Reply-To: <20090511124644.GB19400@panix.com> References: <20090511124644.GB19400@panix.com> Message-ID: <20090511172955.GT24353@nexus.in-nomine.org> -On [20090511 14:47], Aahz (aahz at pythoncraft.com) wrote: >On Monday 2009-05-11, mail.python.org will be switched to another machine >starting roughly at 14:00 UTC. This should be invisible (expected >downtime is less than ten minutes). The headers for the python checkins mails are apparently different now. So people might want to adjust any filtering. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B The reverse side also has a reverse side... From cesare.dimauro at a-tono.com Mon May 11 20:00:16 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Mon, 11 May 2009 20:00:16 +0200 (CEST) Subject: [Python-Dev] A wordcode-based Python In-Reply-To: <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> Message-ID: <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> At the last PyCon3 at Italy I've presented a new Python implementation, which you'll find at http://code.google.com/p/wpython/ WPython is a re-implementation of (some parts of) Python, which drops support for bytecode in favour of a wordcode-based model (where a is word is 16 bits wide). It also implements an hybrid stack-register virtual machine, and adds a lot of other optimizations. The slides are available in the download area, and explain the concept of wordcode, showing also how work some optimizations, comparing them with the current Python (2.6.1). 
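To give a rough idea of what "wordcode" means here (a deliberately simplified illustration; the real encoding and opcode numbering are described in the slides): each instruction is built from one or more 16-bit words, so an opcode and a small argument travel together in a single word instead of CPython's one-byte or three-byte sequences.

    # Simplified illustration only -- not the actual wpython format.
    def pack_word(opcode, arg):
        assert 0 <= opcode < 256 and 0 <= arg < 256
        return opcode | (arg << 8)   # opcode in the low byte, argument in the high byte

    def unpack_word(word):
        return word & 0xFF, word >> 8

    print(unpack_word(pack_word(0x10, 3)))   # -> (16, 3)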
Unfortunately I had not time to make extensive benchmarks with real code, so I've included some that I made with PyStone, PyBench, and a couple of simple recoursive function calls (Fibonacci and Factorial). This is the first release, and another two are scheduled; the first one to make it possibile to select (almost) any optimization to be compiled (so fine grained tests will be possibile). The latter will be a rewrite of the constant folding code (specifically for tuples, lists and dicts), removing a current "hack" to the python type system to make them "hashable" for the constants dictionary used by compile.c. Then I'll start writing some documentation that will explain what parts of code are related to a specific optimization, so that it'll be easier to create patches for other Python implementations, if needed. You'll find a bit more informations in the "README FIRST!" file present into the project's repository. I made so many changes to the source of Python 2.6.1, so feel free to ask me for any information about them. Cheers Cesare From google at mrabarnett.plus.com Mon May 11 20:28:20 2009 From: google at mrabarnett.plus.com (MRAB) Date: Mon, 11 May 2009 19:28:20 +0100 Subject: [Python-Dev] py3k, cgi, email, and form-data In-Reply-To: References: Message-ID: <4A086E44.3020409@mrabarnett.plus.com> Robert Brewer wrote: > There's a major change in functionality in the cgi module between Python > 2 and Python 3 which I've just run across: the behavior of > FieldStorage.read_multi, specifically when an HTTP app accepts a file > upload within a multipart/form-data payload. > > In Python 2, each part would be read in sequence within its own > FieldStorage instance. This allowed file uploads to be shunted to a > TemporaryFile (via make_file) as needed: > > klass = self.FieldStorageClass or self.__class__ > part = klass(self.fp, {}, ib, > environ, keep_blank_values, strict_parsing) > # Throw first part away > while not part.done: > headers = rfc822.Message(self.fp) > part = klass(self.fp, headers, ib, > environ, keep_blank_values, strict_parsing) > self.list.append(part) > > In Python 3 (svn revision 72466), the whole request body is read into > memory first via fp.read(), and then broken into separate parts in a > second step: > > klass = self.FieldStorageClass or self.__class__ > parser = email.parser.FeedParser() > # Create bogus content-type header for proper multipart parsing > parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) > parser.feed(self.fp.read()) > full_msg = parser.close() > # Get subparts > msgs = full_msg.get_payload() > for msg in msgs: > fp = StringIO(msg.get_payload()) > part = klass(fp, msg, ib, environ, keep_blank_values, > strict_parsing) > self.list.append(part) > > This makes the cgi module in Python 3 somewhat crippled for handling > multipart/form-data file uploads of any significant size (and since > the client is the one determining the size, opens a server up for an > unexpected Denial of Service vector). > > I *think* the FeedParser is designed to accept incremental writes, > but I haven't yet found a way to do any kind of incremental reads > from it in order to shunt the fp.read out to a tempfile again. > I'm secretly hoping Barry has a one-liner fix for this. ;) > It think what it needs is for the email.parser.FeedParser class to have a feed_from_file() method, supported by the class BufferedSubFile. The BufferedSubFile class keeps an internal list of lines. 
Perhaps it could also have a list of files, so that when the list of lines becomes empty it can continue by reading lines from the files instead, dropping a file from the list when it reaches the end, something like this: [Module feedparser.py] ... class BufferedSubFile(object): ... def __init__(self): # The last partial line pushed into this object. self._partial = '' # The list of full, pushed lines, in reverse order self._lines = [] # The list of files. self._files = [] ... ... def readline(self): while not self._lines and self._files: data = self._files[0].read(MAX_DATA_SIZE) if data: self.push(data) else: del self._files[0] if not self._lines: if self._closed: return '' return NeedMoreData ... def push_file(self, data_file): """Push some new data from a file into this object.""" self._files.append(data_file) ... and then: ... class FeedParser: ... def feed(self, data): """Push more data into the parser.""" self._input.push(data) self._call_parse() def feed_from_file(self, data_file): """Push more data from a file into the parser.""" self._input.push_file(data_file) self._call_parse() ... From solipsis at pitrou.net Mon May 11 22:27:54 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 11 May 2009 20:27:54 +0000 (UTC) Subject: [Python-Dev] A wordcode-based Python References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> Message-ID: Hi, > WPython is a re-implementation of (some parts of) Python, which drops > support for bytecode in favour of a wordcode-based model (where a is word > is 16 bits wide). This is great! Have you planned to port in to the py3k branch? Or, at least, to trunk? Some opcode and VM optimizations have gone in after 2.6 was released, although nothing as invasive as you did. About the CISC-y instructions, have you tried merging the fast and const arrays in frame objects? That way, you need less opcode space (since e.g. BINARY_ADD_FAST_FAST will cater with constants as well as local variables). Regards Antoine. 
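To make the merged-array idea concrete, a toy sketch of a single operand index space, with the fast locals first and the constants following (purely illustrative; how a real frame would actually lay this out is a separate question):

    def load_operand(index, fast_locals, consts):
        nlocals = len(fast_locals)
        if index < nlocals:
            return fast_locals[index]        # a fast local slot
        return consts[index - nlocals]       # a constant

    print(load_operand(2, ['x', 'y', 'z'], (1, 2, 3)))   # -> 'z' (local #2)
    print(load_operand(4, ['x', 'y', 'z'], (1, 2, 3)))   # -> 2   (constant #1)

With one shared index space, an opcode such as BINARY_ADD_FAST_FAST can name either kind of operand without needing separate opcode variants for locals and constants.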
From collinw at gmail.com Mon May 11 23:14:44 2009 From: collinw at gmail.com (Collin Winter) Date: Mon, 11 May 2009 14:14:44 -0700 Subject: [Python-Dev] A wordcode-based Python In-Reply-To: <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> References: <49D4DA72.60401@v.loewis.de> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> Message-ID: <43aa6ff70905111414n62d20099r9bb2b3ebd52a26ec@mail.gmail.com> Hi Cesare, On Mon, May 11, 2009 at 11:00 AM, Cesare Di Mauro wrote: > At the last PyCon3 at Italy I've presented a new Python implementation, > which you'll find at http://code.google.com/p/wpython/ Good to see some more attention on Python performance! There's quite a bit going on in your changes; do you have an optimization-by-optimization breakdown, to give an idea about how much performance each optimization gives? Looking over the slides, I see that you still need to implement functionality to make test_trace pass, for example; do you have a notion of how much performance it will cost to implement the rest of Python's semantics in these areas? Also, I checked out wpython at head to run Unladen Swallow's benchmarks against it, but it refuses to compile with either gcc 4.0.1 or 4.3.1 on Linux (fails in Python/ast.c). I can send you the build failures off-list, if you're interested. Thanks, Collin Winter From martin at v.loewis.de Mon May 11 23:26:16 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 11 May 2009 23:26:16 +0200 Subject: [Python-Dev] albatross backup Message-ID: <4A0897F8.4080803@v.loewis.de> Hi Sean, Can you please setup backup for albatross? I gave sudo permissions to the "jafo" user, which has the key jafo at guin.tummy.com authorized. I think the policy now is that root logins to albatross are not allowed. So what might work is this: Create an rsyncbackup user, and give it sudo permission to run rsync (any command line arguments). Put your backup pubkey into rsyncbackup's authorized_keys. Could that actually work? albatross admins: would that be an acceptable setup? As for volumes to backup: I think /srv needs regular backup. Not sure about any of the others (and neither sure what your current strategy is wrt. volumes on the other machines). Compared to /srv, everything else is peanuts, anyway. Regards, Martin P.S. I have removed ~root/.ssh/authorized_keys. It only contained my key, and root logins are disallowed, anyway. P.P.S. You can stop doing regular backups to bag. I think we should keep the machine one for a little while, then turn it off and keep it around for a further while, and then return it to XS4ALL; making a complete dump before returning it. 
From martin at v.loewis.de Tue May 12 00:13:16 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 12 May 2009 00:13:16 +0200 Subject: [Python-Dev] albatross backup In-Reply-To: <4A0897F8.4080803@v.loewis.de> References: <4A0897F8.4080803@v.loewis.de> Message-ID: <4A08A2FC.2020102@v.loewis.de> [please ignore this message - I sent it to the wrong mailing list] Regards, Martin From skip at pobox.com Tue May 12 05:18:25 2009 From: skip at pobox.com (skip at pobox.com) Date: Mon, 11 May 2009 22:18:25 -0500 Subject: [Python-Dev] albatross backup In-Reply-To: <4A0897F8.4080803@v.loewis.de> References: <4A0897F8.4080803@v.loewis.de> Message-ID: <18952.60033.47955.293386@montanaro.dyndns.org> Martin> As for volumes to backup: I think /srv needs regular backup. Martin> Not sure about any of the others .... Backup of /usr/local/spambayes-corpus would be very helpful. Skip From supreet.sethi at gmail.com Tue May 12 08:27:25 2009 From: supreet.sethi at gmail.com (s|s) Date: Tue, 12 May 2009 11:57:25 +0530 Subject: [Python-Dev] using help function in Py3k In-Reply-To: References: Message-ID: On Tue, May 5, 2009 at 7:13 PM, Daniel Stutzbach wrote: > On Tue, May 5, 2009 at 5:41 AM, s|s wrote: >> >> LookupError: unknown encoding: uft-8 > > uft-8? > > Looks like a variation of Issue 4540 (or a duplicate?? I can't tell) > Yes. It is the same issue. I don't think pydoc should be modified. In my humble opinion tests should exist in /usr/share or /usr/share/doc. > -- > Daniel Stutzbach, Ph.D. > President, Stutzbach Enterprises, LLC -- ~preet~ From cesare.dimauro at a-tono.com Tue May 12 08:42:19 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 12 May 2009 08:42:19 +0200 (CEST) Subject: [Python-Dev] A wordcode-based Python In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> Message-ID: <4022.88.149.182.147.1242110539.squirrel@webmail5.pair.com> On Mon, May 11, 2009 10:27PM, Antoine Pitrou wrote: Hi Antoine > Hi, > >> WPython is a re-implementation of (some parts of) Python, which drops >> support for bytecode in favour of a wordcode-based model (where a is >> word >> is 16 bits wide). > > This is great! > Have you planned to port in to the py3k branch? Or, at least, to trunk? It was my idea too, but first I need to take a deep look at what parts of code are changed from 2.6 to 3.0. That's because I don't know how much work is required for this "forward" port. > Some opcode and VM optimizations have gone in after 2.6 was released, > although > nothing as invasive as you did. :-D Interesting. > About the CISC-y instructions, have you tried merging the fast and const > arrays > in frame objects? That way, you need less opcode space (since e.g. > BINARY_ADD_FAST_FAST will cater with constants as well as local > variables). > > Regards > > Antoine. 
It's an excellent idea, that needs exploration. Running my stats tools against all .py files found in Lib and Tools folders, I discovered that the maximum index used for fast/locals is 79, and 1853 for constants. So if I find a way to easily map locals first and constants following in the same array, your great idea can be implemented saving A LOT of opcodes and reducing ceval.c source code. I'll work on that after the two releases that I planned. Thanks for your precious suggestions! Cesare From cesare.dimauro at a-tono.com Tue May 12 08:54:01 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 12 May 2009 08:54:01 +0200 (CEST) Subject: [Python-Dev] A wordcode-based Python In-Reply-To: <43aa6ff70905111414n62d20099r9bb2b3ebd52a26ec@mail.gmail.com> References: <49D4DA72.60401@v.loewis.de> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> <43aa6ff70905111414n62d20099r9bb2b3ebd52a26ec@mail.gmail.com> Message-ID: <4213.88.149.182.147.1242111241.squirrel@webmail5.pair.com> Hi Collin On Mon, May 11, 2009 11:14PM, Collin Winter wrote: > Hi Cesare, > > On Mon, May 11, 2009 at 11:00 AM, Cesare Di Mauro > wrote: >> At the last PyCon3 at Italy I've presented a new Python implementation, >> which you'll find at http://code.google.com/p/wpython/ > > Good to see some more attention on Python performance! There's quite a > bit going on in your changes; do you have an > optimization-by-optimization breakdown, to give an idea about how much > performance each optimization gives? I planned it in the next release that will come may be next week. I'll introduce some #DEFINEs and #IFs in the code, so that only specific optimizations will be enabled. > Looking over the slides, I see that you still need to implement > functionality to make test_trace pass, for example; do you have a > notion of how much performance it will cost to implement the rest of > Python's semantics in these areas? Very little. That's because there are only two tests on test_trace that don't pass. I think that the reason stays in the changes that I made in the loops. With my code SETUP_LOOP and POP_BREAK are completely removed, so the code in settrace will failt to recognize the loop and the virtual machine crashes. I'll fix it in the second release that I have planned. > Also, I checked out wpython at head to run Unladen Swallow's > benchmarks against it, but it refuses to compile with either gcc 4.0.1 > or 4.3.1 on Linux (fails in Python/ast.c). I can send you the build > failures off-list, if you're interested. > > Thanks, > Collin Winter I'm very interested, thanks. That's because I worked only on Windows machines, so I definitely need to test and fix it to let it run on any other platform. 
Cesare From solipsis at pitrou.net Tue May 12 13:40:29 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 12 May 2009 11:40:29 +0000 (UTC) Subject: [Python-Dev] A wordcode-based Python References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> <4022.88.149.182.147.1242110539.squirrel@webmail5.pair.com> Message-ID: Hi Cesare, Cesare Di Mauro a-tono.com> writes: > > It was my idea too, but first I need to take a deep look at what parts > of code are changed from 2.6 to 3.0. > That's because I don't know how much work is required for this > "forward" port. If you have some questions or need some help, send me a message. Regards Antoine. From cesare.dimauro at a-tono.com Tue May 12 13:45:47 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 12 May 2009 13:45:47 +0200 (CEST) Subject: [Python-Dev] A wordcode-based Python In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB22B5.3040507@simplistix.co.uk> <4A0590C4.1020904@v.loewis.de> <20090509143829.17F293A4080@sparrow.telecommunity.com> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> <4022.88.149.182.147.1242110539.squirrel@webmail5.pair.com> Message-ID: <3846.88.149.182.147.1242128747.squirrel@webmail2.pair.com> On Thu, May 12, 2009 01:40PM, Antoine Pitrou wrote: > > Hi Cesare, > > Cesare Di Mauro a-tono.com> writes: >> >> It was my idea too, but first I need to take a deep look at what parts >> of code are changed from 2.6 to 3.0. >> That's because I don't know how much work is required for this >> "forward" port. > > If you have some questions or need some help, send me a message. > > Regards > > Antoine. OK, thanks. :) Another note. Fredrik Johansson let me note just few minutes ago that I've compiled my sources without PGO optimizations enabled. That's because I used Visual Studio Express Edition. So another gain in performances can be obtained. 
:) cheers Cesare From collinw at gmail.com Tue May 12 17:27:11 2009 From: collinw at gmail.com (Collin Winter) Date: Tue, 12 May 2009 08:27:11 -0700 Subject: [Python-Dev] A wordcode-based Python In-Reply-To: <3846.88.149.182.147.1242128747.squirrel@webmail2.pair.com> References: <49D4DA72.60401@v.loewis.de> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> <4022.88.149.182.147.1242110539.squirrel@webmail5.pair.com> <3846.88.149.182.147.1242128747.squirrel@webmail2.pair.com> Message-ID: <43aa6ff70905120827n15c08468jb5fca2a19aa620fb@mail.gmail.com> On Tue, May 12, 2009 at 4:45 AM, Cesare Di Mauro wrote: > Another note. Fredrik Johansson let me note just few minutes ago that I've > compiled my sources without PGO optimizations enabled. > > That's because I used Visual Studio Express Edition. > > So another gain in performances can be obtained. :) FWIW, Unladen Swallow experimented with gcc 4.4's FDO and got an additional 10-30% (depending on the benchmark). The training load is important, though: some training sets offered better performance than others. I'd be interested in how MSVC's PGO compares to gcc's FDO in terms of overall effectiveness. The results for gcc FDO with our 2009Q1 release are at the bottom of http://code.google.com/p/unladen-swallow/wiki/Releases. Collin Winter From cesare.dimauro at a-tono.com Tue May 12 18:41:45 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 12 May 2009 18:41:45 +0200 (CEST) Subject: [Python-Dev] A wordcode-based Python In-Reply-To: <43aa6ff70905120827n15c08468jb5fca2a19aa620fb@mail.gmail.com> References: <49D4DA72.60401@v.loewis.de> <4A059639.7040505@v.loewis.de> <20090509153716.D44633A4080@sparrow.telecommunity.com> <7FF9D9A9-211E-4E5D-BDD0-9C0315123975@zooko.com> <4c8bd6707712f01ccf3841c2c26169ef@preisshare.net> <1024.88.149.182.147.1242064816.squirrel@webmail5.pair.com> <4022.88.149.182.147.1242110539.squirrel@webmail5.pair.com> <3846.88.149.182.147.1242128747.squirrel@webmail2.pair.com> <43aa6ff70905120827n15c08468jb5fca2a19aa620fb@mail.gmail.com> Message-ID: <2743.88.149.182.147.1242146505.squirrel@webmail2.pair.com> On Tue, May 12, 2009 05:27 PM, Collin Winter wrote: > On Tue, May 12, 2009 at 4:45 AM, Cesare Di Mauro > wrote: >> Another note. Fredrik Johansson let me note just few minutes ago that >> I've >> compiled my sources without PGO optimizations enabled. >> >> That's because I used Visual Studio Express Edition. >> >> So another gain in performances can be obtained. :) > > FWIW, Unladen Swallow experimented with gcc 4.4's FDO and got an > additional 10-30% (depending on the benchmark). The training load is > important, though: some training sets offered better performance than > others. I'd be interested in how MSVC's PGO compares to gcc's FDO in > terms of overall effectiveness. The results for gcc FDO with our > 2009Q1 release are at the bottom of > http://code.google.com/p/unladen-swallow/wiki/Releases. > > Collin Winter Unfortunately I can't test PGO, since I use the Express Editions of VS. May be Martin or othe mainteners of the Windows versions can help here. However it'll be difficult to find a good enough profile for the binaries distributed for the official Python. FDO brings to quite different results based on the profile selected. 
cheers, Cesare From asmodai at in-nomine.org Tue May 12 18:43:55 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Tue, 12 May 2009 18:43:55 +0200 Subject: [Python-Dev] Switchover: mail.python.org In-Reply-To: <0B2CACDB-4C60-4038-91F1-235E7FBD5E37@python.org> References: <20090511124644.GB19400@panix.com> <20090511172955.GT24353@nexus.in-nomine.org> <0B2CACDB-4C60-4038-91F1-235E7FBD5E37@python.org> Message-ID: <20090512164355.GY24353@nexus.in-nomine.org> -On [20090512 18:41], Barry Warsaw (barry at python.org) wrote: >Somehow, personalization got turned off for python-checkins. This >disables VERPing of the headers. I've turned it back on, so please >let me know if that fixes the issue. This did not appear to happen >site-wide, just for python-checkins AFAICT. Yes, the current batches are arriving with personilization again. I don't mind either way, just thought a heads up was warranted. ;) Thanks Barry, -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B The Idea does not replace the work... From barry at python.org Tue May 12 18:41:19 2009 From: barry at python.org (Barry Warsaw) Date: Tue, 12 May 2009 12:41:19 -0400 Subject: [Python-Dev] Switchover: mail.python.org In-Reply-To: <20090511172955.GT24353@nexus.in-nomine.org> References: <20090511124644.GB19400@panix.com> <20090511172955.GT24353@nexus.in-nomine.org> Message-ID: <0B2CACDB-4C60-4038-91F1-235E7FBD5E37@python.org> On May 11, 2009, at 1:29 PM, Jeroen Ruigrok van der Werven wrote: > -On [20090511 14:47], Aahz (aahz at pythoncraft.com) wrote: >> On Monday 2009-05-11, mail.python.org will be switched to another >> machine >> starting roughly at 14:00 UTC. This should be invisible (expected >> downtime is less than ten minutes). > > The headers for the python checkins mails are apparently different > now. So > people might want to adjust any filtering. Somehow, personalization got turned off for python-checkins. This disables VERPing of the headers. I've turned it back on, so please let me know if that fixes the issue. This did not appear to happen site-wide, just for python-checkins AFAICT. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From jmatejek at suse.cz Tue May 12 20:42:52 2009 From: jmatejek at suse.cz (=?ISO-8859-1?Q?Jan_Mate=28jek?=) Date: Tue, 12 May 2009 20:42:52 +0200 Subject: [Python-Dev] CVE-2008-5983 "untrusted python modules search path" In-Reply-To: References: Message-ID: <4A09C32C.30200@suse.cz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Antoine Pitrou napsal(a): > Hello, > > I don't think it has already posted to the list, apologies if it has. > > Some Linux tools and vendors have been hit by an alleged "security hole" where > an embedded Python interpreter will prepend the current working directory to > sys.path as soon as PySys_SetArgv() is called by the embedding application. This > means, for example, that a Python file in the working directory can break > plugins or extensions written for that application if the Python file happens to > shadow another module. > > Regardless of whether this is a security hole or not, it certainly can make > things disturbingly surprising when the situation arises. 
In the bug report > (http://bugs.python.org/issue5753), I suggested we add a new function > PySys_SetArgvEx() which would take an additional parameter telling whether to > touch sys.path or not (in the same spirit as Py_InitializeEx() providing a more > flexible API than Py_Initialize()). > > On the other hand, I don't think we can change the default behaviour of > PySys_SetArgv(), since there are probably tools and applications relying on it > (the obvious use case which comes to my mind is a third-party interactive > interpreter). > > Any opinions? yes! Actually, i wanted to propose and implement something like this back when this vulnerability appeared, but i never got to it. I'd propose to create a whole new function, called, say, PySys_FillArgv() (no, i don't think that's a very good name) that would - -only- fill sys.argv and not touch sys.path. In addition to that, there would be a function like PySys_SetScriptPath() that would not fill sys.argv, but prepend the script's directory to sys.path Then i'd reimplement PySys_SetArgv as { PySys_FillArgv(); PySys_SetScriptPath(); } And as a final killing step, i would never ever mention PySys_SetArgv anywhere but in its own documentation ;e) And especially not in the first page of "Embedding Python". My rationale is that the only application deliberately using PySys_SetArgv the way it's written is a Python interpreter. For that, it's desirable to have '.' in sys.path _when no script is being executed_. For *all other applications*, this makes no sense ;e) regards m. > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/jmatejek%40suse.cz -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.11 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iEYEARECAAYFAkoJwywACgkQjBrWA+AvBr8UQwCgmLdu+aq9pYUxbSn/7i7hF1dK lw0AnRo0UCBbszxtzeXNcmmdO7d9sYx4 =0tU7 -----END PGP SIGNATURE----- From solipsis at pitrou.net Wed May 13 00:06:12 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 12 May 2009 22:06:12 +0000 (UTC) Subject: [Python-Dev] Shorter release schedule? Message-ID: Hello, Just food for thought here, but seeing how 3.1 is going to be a real featureful schedule despite being released shortly after 3.0, wouldn't it make sense to tighten future release planning a little? I was thinking something like doing a major release every 12 months (rather than 18 to 24 months as has been heuristically the case lately). This could also imply switching to some kind of loosely time-based release system. If I'm wildly off-base, you can either flame me, ignore me, or assign me annoying release blockers involving memoryviews and weird character encodings :-) Regards Antoine. From google at mrabarnett.plus.com Wed May 13 00:25:26 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 12 May 2009 23:25:26 +0100 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: Message-ID: <4A09F756.8000005@mrabarnett.plus.com> Antoine Pitrou wrote: > Hello, > > Just food for thought here, but seeing how 3.1 is going to be a real featureful > schedule despite being released shortly after 3.0, wouldn't it make sense to > tighten future release planning a little? I was thinking something like doing a > major release every 12 months (rather than 18 to 24 months as has been > heuristically the case lately). 
This could also imply switching to some kind of > loosely time-based release system. > > If I'm wildly off-base, you can either flame me, ignore me, or assign me > annoying release blockers involving memoryviews and weird character encodings :-) > Next you'll be saying that they should be named after years. Python 2010, anyone? :-) I think that releases should depend on whether there are enough changes for one. From solipsis at pitrou.net Wed May 13 00:29:23 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 12 May 2009 22:29:23 +0000 (UTC) Subject: [Python-Dev] Shorter release schedule? References: <4A09F756.8000005@mrabarnett.plus.com> Message-ID: MRAB mrabarnett.plus.com> writes: > Next you'll be saying that they should be named after years. Python > 2010, anyone? After py3k, that would be a regression ;) cheers Antoine. From barry at python.org Wed May 13 00:29:43 2009 From: barry at python.org (Barry Warsaw) Date: Tue, 12 May 2009 18:29:43 -0400 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: Message-ID: On May 12, 2009, at 6:06 PM, Antoine Pitrou wrote: > Just food for thought here, but seeing how 3.1 is going to be a real > featureful > schedule despite being released shortly after 3.0, wouldn't it make > sense to > tighten future release planning a little? I was thinking something > like doing a > major release every 12 months (rather than 18 to 24 months as has been > heuristically the case lately). This could also imply switching to > some kind of > loosely time-based release system. > > If I'm wildly off-base, you can either flame me, ignore me, or > assign me > annoying release blockers involving memoryviews and weird character > encodings :-) I've been in favor of that for a while now. With the move to a DVCS (how's that coming along?) I think we can have more solid, releasable trunks for longer in the cycle. Then, we'd have feature branches which wouldn't land in trunk until they too are solid and complete (with docs and tests). If a particular feature doesn't make it, it'll just wait until the next release, which would be only 12 months off instead of almost 2 years off. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From collinw at gmail.com Wed May 13 00:35:31 2009 From: collinw at gmail.com (Collin Winter) Date: Tue, 12 May 2009 15:35:31 -0700 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: Message-ID: <43aa6ff70905121535j3b930918gdb1ce2d9995427d3@mail.gmail.com> On Tue, May 12, 2009 at 3:06 PM, Antoine Pitrou wrote: > Hello, > > Just food for thought here, but seeing how 3.1 is going to be a real featureful > schedule despite being released shortly after 3.0, wouldn't it make sense to > tighten future release planning a little? I was thinking something like doing a > major release every 12 months (rather than 18 to 24 months as has been > heuristically the case lately). This could also imply switching to some kind of > loosely time-based release system. I'd be in favor of a shorter, 12-month release cycle. I think the limiting resource would be the time and energy of the release managers and the package builders for Windows, etc. Provided it's not a tax on the release staff, I think shorter release cycles would be a benefit to the community. 
My own experience with time-based releases at work is that it greatly helps focus energy and attention, knowing that you can't simply delay the release if you slack off on your features/bugs. Collin From martin at v.loewis.de Wed May 13 05:26:08 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 13 May 2009 05:26:08 +0200 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: Message-ID: <4A0A3DD0.8080806@v.loewis.de> > Just food for thought here, but seeing how 3.1 is going to be a real featureful > schedule despite being released shortly after 3.0, wouldn't it make sense to > tighten future release planning a little? Do you have any specific releases in mind that you would like to apply such a tightened schedule to? > I was thinking something like doing a > major release every 12 months (rather than 18 to 24 months as has been > heuristically the case lately). Such a schedule was initially used for the first 2.x releases. We then switched to 18 months because of user complaints: if releases come too frequently, the users are confused as to what release they should be using. Even 24 months is too frequently for some: some people are only starting to move to 2.5 right now - when we have stopped maintaining it already. One question is what would happen to the old releases: would we still maintain them? If so, how many of them? For how long? Regards, Martin From fumanchu at aminus.org Wed May 13 05:43:21 2009 From: fumanchu at aminus.org (Robert Brewer) Date: Tue, 12 May 2009 20:43:21 -0700 Subject: [Python-Dev] [Web-SIG] py3k, cgi, email, and form-data In-Reply-To: <88e286470905121933i6b9dcffj82446098990224cc@mail.gmail.com> References: <88e286470905121933i6b9dcffj82446098990224cc@mail.gmail.com> Message-ID: Graham Dumpleton wrote: > 2009/5/12 Robert Brewer : > > There's a major change in functionality in the cgi module between > Python > > 2 and Python 3 which I've just run across: the behavior of > > FieldStorage.read_multi, specifically when an HTTP app accepts a file > > upload within a multipart/form-data payload. > > > > In Python 2, each part would be read in sequence within its own > > FieldStorage instance. This allowed file uploads to be shunted to a > > TemporaryFile (via make_file) as needed: > > > > ??? klass = self.FieldStorageClass or self.__class__ > > ??? part = klass(self.fp, {}, ib, > > ???????????????? environ, keep_blank_values, strict_parsing) > > ??? # Throw first part away > > ??? while not part.done: > > ??????? headers = rfc822.Message(self.fp) > > ??????? part = klass(self.fp, headers, ib, > > ???????????????????? environ, keep_blank_values, strict_parsing) > > ??????? self.list.append(part) > > > > In Python 3 (svn revision 72466), the whole request body is read into > > memory first via fp.read(), and then broken into separate parts in a > > second step: > > > > ??? klass = self.FieldStorageClass or self.__class__ > > ??? parser = email.parser.FeedParser() > > ??? # Create bogus content-type header for proper multipart parsing > > ??? parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, > ib)) > > ??? parser.feed(self.fp.read()) > > ??? full_msg = parser.close() > > ??? # Get subparts > > ??? msgs = full_msg.get_payload() > > ??? for msg in msgs: > > ??????? fp = StringIO(msg.get_payload()) > > ??????? part = klass(fp, msg, ib, environ, keep_blank_values, > > ???????????????????? strict_parsing) > > ??????? 
self.list.append(part) > > > > This makes the cgi module in Python 3 somewhat crippled for handling > > multipart/form-data file uploads of any significant size (and since > > the client is the one determining the size, opens a server up for an > > unexpected Denial of Service vector). > > > > I *think* the FeedParser is designed to accept incremental writes, > > but I haven't yet found a way to do any kind of incremental reads > > from it in order to shunt the fp.read out to a tempfile again. > > I'm secretly hoping Barry has a one-liner fix for this. ;) > > FWIW, Werkzeug gave up on 'cgi' module for form passing and implements > its own. > > Not sure whether this issue in Python 3.0 was one of the reasons or > not. I know one of the reasons was because cgi.FieldStorage is not > WSGI 1.0 compliant. One of the main reasons that no one actually > adheres to WSGI 1.0 is because of the 'cgi' module. This still hasn't > been addressed by a proper amendment to WSGI 1.0 specification or a > new WSGI 1.1 specification to allow a hint to readline(). > > The Werkzeug form processing module is properly WSGI 1.0 compliant, > meaning that Wekzeug is possibly the only major WSGI framework to be > WSGI compliant. FWIW, I just added a replacement for the cgi module to CherryPy over the weekend for the same reasons. It's in the python3 branch but will get backported to CherryPy 3.2 for Python 2.x. Robert Brewer fumanchu at aminus.org From greg.ewing at canterbury.ac.nz Wed May 13 03:54:23 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 13 May 2009 13:54:23 +1200 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: <4A09F756.8000005@mrabarnett.plus.com> References: <4A09F756.8000005@mrabarnett.plus.com> Message-ID: <4A0A284F.8040907@canterbury.ac.nz> MRAB wrote: > Next you'll be saying that they should be named after years. Python > 2010, anyone? :-) To keep people on their toes, we should switch to a completely random new naming scheme with every release, like Microsoft has been doing with Windows. -- Greg From graham.dumpleton at gmail.com Wed May 13 04:33:02 2009 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Wed, 13 May 2009 12:33:02 +1000 Subject: [Python-Dev] [Web-SIG] py3k, cgi, email, and form-data In-Reply-To: References: Message-ID: <88e286470905121933i6b9dcffj82446098990224cc@mail.gmail.com> 2009/5/12 Robert Brewer : > There's a major change in functionality in the cgi module between Python > 2 and Python 3 which I've just run across: the behavior of > FieldStorage.read_multi, specifically when an HTTP app accepts a file > upload within a multipart/form-data payload. > > In Python 2, each part would be read in sequence within its own > FieldStorage instance. This allowed file uploads to be shunted to a > TemporaryFile (via make_file) as needed: > > ??? klass = self.FieldStorageClass or self.__class__ > ??? part = klass(self.fp, {}, ib, > ???????????????? environ, keep_blank_values, strict_parsing) > ??? # Throw first part away > ??? while not part.done: > ??????? headers = rfc822.Message(self.fp) > ??????? part = klass(self.fp, headers, ib, > ???????????????????? environ, keep_blank_values, strict_parsing) > ??????? self.list.append(part) > > In Python 3 (svn revision 72466), the whole request body is read into > memory first via fp.read(), and then broken into separate parts in a > second step: > > ??? klass = self.FieldStorageClass or self.__class__ > ??? parser = email.parser.FeedParser() > ??? 
# Create bogus content-type header for proper multipart parsing > ??? parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib)) > ??? parser.feed(self.fp.read()) > ??? full_msg = parser.close() > ??? # Get subparts > ??? msgs = full_msg.get_payload() > ??? for msg in msgs: > ??????? fp = StringIO(msg.get_payload()) > ??????? part = klass(fp, msg, ib, environ, keep_blank_values, > ???????????????????? strict_parsing) > ??????? self.list.append(part) > > This makes the cgi module in Python 3 somewhat crippled for handling > multipart/form-data file uploads of any significant size (and since > the client is the one determining the size, opens a server up for an > unexpected Denial of Service vector). > > I *think* the FeedParser is designed to accept incremental writes, > but I haven't yet found a way to do any kind of incremental reads > from it in order to shunt the fp.read out to a tempfile again. > I'm secretly hoping Barry has a one-liner fix for this. ;) FWIW, Werkzeug gave up on 'cgi' module for form passing and implements its own. Not sure whether this issue in Python 3.0 was one of the reasons or not. I know one of the reasons was because cgi.FieldStorage is not WSGI 1.0 compliant. One of the main reasons that no one actually adheres to WSGI 1.0 is because of the 'cgi' module. This still hasn't been addressed by a proper amendment to WSGI 1.0 specification or a new WSGI 1.1 specification to allow a hint to readline(). The Werkzeug form processing module is properly WSGI 1.0 compliant, meaning that Wekzeug is possibly the only major WSGI framework to be WSGI compliant. Graham From stephen at xemacs.org Wed May 13 06:12:27 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 13 May 2009 13:12:27 +0900 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: Message-ID: <87predbpzo.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > Just food for thought here, but seeing how 3.1 is going to be a > real featureful schedule despite being released shortly after 3.0, > wouldn't it make sense to tighten future release planning a little? With all due respect, it's easy and natural to have a short, featureful release schedule immediately after a major release (or should I say "complete rewrite"?) The discussion should focus on what happens as people become relatively satisfied with the core, and development shifts to optimizations, (smallish) bug fixes, and features that appeal to specialized audiences. That is when both the costs and the benefits of a tighter and/or time-based releases appear. > I was thinking something like doing a major release every 12 months > (rather than 18 to 24 months as has been heuristically the case > lately). This could also imply switching to some kind of loosely > time-based release system. I don't wish to express an opinion on either of these, as I'm not even in a position to help with release blockers. But I do hope discussion will focus on the implications for Python 3.7, not Python 3.3. From tleeuwenburg at gmail.com Wed May 13 06:44:52 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 13 May 2009 14:44:52 +1000 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: <4A0A3DD0.8080806@v.loewis.de> References: <4A0A3DD0.8080806@v.loewis.de> Message-ID: <43c8685c0905122144v41d43701jeee32a561a55f884@mail.gmail.com> On Wed, May 13, 2009 at 1:26 PM, "Martin v. 
L?wis" wrote: > > Just food for thought here, but seeing how 3.1 is going to be a real > featureful > > schedule despite being released shortly after 3.0, wouldn't it make sense > to > > tighten future release planning a little? > > Do you have any specific releases in mind that you would like to apply > such a tightened schedule to? > > > I was thinking something like doing a > > major release every 12 months (rather than 18 to 24 months as has been > > heuristically the case lately). > If I can just respond with a bit of feedback from my workplace, I'd say that slower is better. I'm grimacing as I write that :) because I personally love to be able to take advantage of the new capabilities in each release. Can I ask if there's any sense in pursuing a release schedule which is slow for whatever might be deemed the "most core modules" but faster for "less core modules"? This is really a response to my workplace environment. The pro of that is that it's a real example, but the con is that it may not be best practise :) Something else which would definitely be useful for me personally would be a kind of update egg which I could apply to, say, Python 3.0 to bring it up to 3.1 capabilities. Something that already happens now at work reasonably often is that on my PC I have Python 2.4, 2.5, 2.6 and 3.0 installed. I tend to develop under 2.6 from preference. However, server X only has 2.4 installed or worse, 2.3 which I don't even have. Recently I was bitten by this as my code was relying heavily on some functionality in datetime which had changed. I was faced with having to do some re-architecting that I really didn't want to do. Now, I don't know of course (I found another way around the issue), but suppose the changes to Python I needed were relatively cosmetic, i.e. the kind of thing I could maybe install into a virtualenv wrapper, then it would have been quite easy for me to run my scripts written for Python 2.6. To get to the point, I wonder if it would be possible to release new versions alongside a patch or egg which someone with only user-level privileges could use on a server to avoid being held back by a slower server update cycle. A more frequent update cycle would then be easier to deal with. More features would get out into use more quickly, while the pressures of the lowest-common-denominator would be eased. Just some thoughts... Regards, -Tennessee -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed May 13 08:10:48 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 13 May 2009 15:10:48 +0900 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: <43c8685c0905122144v41d43701jeee32a561a55f884@mail.gmail.com> References: <4A0A3DD0.8080806@v.loewis.de> <43c8685c0905122144v41d43701jeee32a561a55f884@mail.gmail.com> Message-ID: <87my9hbkif.fsf@uwakimon.sk.tsukuba.ac.jp> Tennessee Leeuwenburg writes: > Can I ask if there's any sense in pursuing a release schedule which > is slow for whatever might be deemed the "most core modules" but > faster for "less core modules"? I think you need to be more specific about how many levels of "fast" there should be, and why some modules might be deemed more or less "core". For example, this is part of why bsddb (sp?) was removed from the stdlib, because its cycle is different from the core (it's heavily torqued by whatever upstream chooses to throw at it, so it has been the devil to test). 
If you're not familiar with the history, you might try searching the list for "bsddb 'Jesus Cea' stdlib" which should bring up relevant threads. (Make sure you spell the package name right, sorry if I got it wrong!) In short, the answer is "the stuff on a different cycle is already on PyPI". > Something else which would definitely be useful for me personally > would be a kind of update egg which I could apply to, say, Python > 3.0 to bring it up to 3.1 capabilities. But this would have to be considered on a per-feature basis. If that's possible for an individual feature (ie, doesn't involve changes to the interpreter or compiler), almost surely the feature "did hard time" in PyPI. So you can probably get some version there. OTOH, such an egg would have to contain only a subset of features. If there are interdependencies between that subset and those that can't be included, in some sense you will be creating a completely new and *untested* version of Python. So I think that most server admins would really want you to restrict to features you actually need, and therefore the "best" update-egg will be very application-specific. With the new features being proposed for dist-utils, I suppose you (or anybody who feels like doing so) could create a "namespace package" for such updates, pulling in the relevant modules from PyPI. Do you think that could work for you? (See the PEP 382 threads for more info; I think current discussion has moved to distutils-SIG). From dirkjan at ochtman.nl Wed May 13 09:39:17 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 13 May 2009 09:39:17 +0200 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: Message-ID: On Wed, May 13, 2009 at 12:29 AM, Barry Warsaw wrote: > I've been in favor of that for a while now. ?With the move to a DVCS (how's > that coming along?) I've been rewriting PEP 374 about the Mercurial migration. Will post here once it's ready for review. Cheers, Dirkjan From larry.bugbee at boeing.com Wed May 13 10:01:33 2009 From: larry.bugbee at boeing.com (Bugbee, Larry) Date: Wed, 13 May 2009 01:01:33 -0700 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: Message-ID: <9418DB6C0B9D434190E54A78E931C3D10961D6C4@XCH-NW-7V1.nw.nos.boeing.com> >From the perspective of this application developer and prototyper... In general, releases should be more frequent when the language is less mature and perhaps lacking. With maturity one seeks stability and less frequency. Python is, for the most part, a mature language. I submit the issue is less a question of frequency, but more a question of the number and value of each of the new features. Too many new features added to a mature language begs the question of simplicity vs complexity. One of Python's original goals, if I recall correctly, was to keep life simple, to have executable psuedocode, be easy to learn and re-learn, and be able to quickly read and grok your code 6-12 months later. Ease of maintenance is a huge advantage of Python. From an application developer's perspective, too many confusing features and the language becomes more and more like C++ and APL. I submit Python is now at the point where new features must not be added just because they are cool, but because they indeed add significant value *without* compromising simplicity and the suite of "easy to" benefits. The alternative is to rethink the long-term goals for the language. That could have large unintended consequences. 
Larry From henning.vonbargen at arcor.de Wed May 13 10:34:55 2009 From: henning.vonbargen at arcor.de (henning.vonbargen at arcor.de) Date: Wed, 13 May 2009 10:34:55 +0200 (CEST) Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with thread support? Message-ID: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> How to build Python 2.6.2 on HP-UX Itanium with thread support? Note: I know that the first address to post this question is comp.lang.python, but I posted this question a week ago on comp.lang.python (http://groups.google.com/group/comp.lang.python/browse_thread/thread/c7006ad8e5cf81e8) and unfortunately, I didn't receive any answers. According to Patch 1225212, at least Peter Kropf was able to get Python running with threading support on this platform, though AFAIK he was not using GCC. But I guess it should be possible with GCC as well. Is anyone able to confirm that Python (built with GCC) does or does not work with multi-threading on HP-UX Itanium? Is HP-UX Itanium a supported platform at all? BTW: A search for "supported platforms" at www.python.org does not help! And if it does work, which steps need to be taken to build it, e.g. other libraries/packages, environment variables, configure options, manual modifications? Henning From solipsis at pitrou.net Wed May 13 10:39:03 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 13 May 2009 08:39:03 +0000 (UTC) Subject: [Python-Dev] Shorter release schedule? References: <4A0A3DD0.8080806@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > > Such a schedule was initially used for the first 2.x releases. We then > switched to 18 months because of user complaints: if releases come too > frequently, the users are confused as to what release they should be > using. Even 24 months is too frequently for some: some people are only > starting to move to 2.5 right now - when we have stopped maintaining > it already. Obviously, there are some users who value stability over everything else. While new language features are never critical and can easily be circumvented if you want your code to run on old Python versions, stdlib improvements can be more important for the average user. So perhaps the answer is the split that Brett proposed between core language and stdlib. > One question is what would happen to the old releases: would we still > maintain them? If so, how many of them? For how long? Yes, I realized that's one of the problems with this proposal. If we had to maintain more than one stable branch, it would become a burden. From eric at trueblade.com Wed May 13 10:57:29 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 13 May 2009 04:57:29 -0400 Subject: [Python-Dev] Shorter release schedule? In-Reply-To: References: <4A0A3DD0.8080806@v.loewis.de> Message-ID: <4A0A8B79.5070203@trueblade.com> Antoine Pitrou wrote: > Yes, I realized that's one of the problems with this proposal. If we had to > maintain more than one stable branch, it would become a burden. Agreed. And since we have 2.x and 3.x now, we already have that problem. I'd like to an acceleration of release schedules (if it even happens) come after 2.x is retired. From ncoghlan at gmail.com Wed May 13 14:31:17 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 13 May 2009 22:31:17 +1000 Subject: [Python-Dev] Shorter release schedule? 
In-Reply-To: References: Message-ID: <4A0ABD95.2060008@gmail.com> Antoine Pitrou wrote: > Hello, > > Just food for thought here, but seeing how 3.1 is going to be a real featureful > schedule despite being released shortly after 3.0, wouldn't it make sense to > tighten future release planning a little? I was thinking something like doing a > major release every 12 months (rather than 18 to 24 months as has been > heuristically the case lately). This could also imply switching to some kind of > loosely time-based release system. > > If I'm wildly off-base, you can either flame me, ignore me, or assign me > annoying release blockers involving memoryviews and weird character encodings :-) I don't think a shorter release cycle makes sense for a programming language. It's already the case that even with 18+ month release cycles some end users will leapfrog releases (e.g. 2.2-> 2.4 -> 2.6) for their environments (speaking from experience there, although the 2.6 part is mere wishful thinking at this stage). It also seems to takes 6-12 months for the complaints about Windows binary compatibility to die down after each release (although that appears to be less of an issue since MS released Visual Studio Express). That said, the 3.1 to 3.2 spacing will probably be shorter than normal (i.e. around 12 months), simply because 3.1 is an "extra" release to iron out some of the major issues with 3.0. This will give 'normal' 18 month spacing for the 2.6 -> 2.7 gap. The other big factor to consider here is the duration of bug fix support for releases. With our policy of "current release and previous release are supported with bug fixes" and the 18-24 month release cycle, that means each release typically receives bug fix updates for 3-4 years. That's a reasonably period of time (and gives plenty of time to shake out even fairly thorny issues). If we were to switch to yearly releases, then either the support policy would have to change to at least "current release and the previous two releases" or we'd have to accept the fact that the support period for each release would now be no more than 2 years. Since 2 years strikes me as an unacceptably short period for maintenance, shorter release cycles would then lead directly to having to maintain more parallel branches (which doesn't strike me as a good use of developer effort). Standardising a time frame for major releases is a fine idea, but I don't think shortening that time frame to 12 months would be wise. Settling on 18 months would probably work though - those that crave stability can then use every alternate version and only upgrade every 3 years, as each branch would be maintained with general bug fixes for at least 3 years and security fixes for a further 3 years after that. I think 24 months would lead to too slow an overall development tempo though - the year-and-a-half approach feels to me like it would strike a better balance. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From janssen at parc.com Wed May 13 19:08:41 2009 From: janssen at parc.com (Bill Janssen) Date: Wed, 13 May 2009 10:08:41 PDT Subject: [Python-Dev] Shorter release schedule? In-Reply-To: <4A0ABD95.2060008@gmail.com> References: <4A0ABD95.2060008@gmail.com> Message-ID: <4129.1242234521@parc.com> Nick Coghlan wrote: > Settling on 18 months would probably work though - those that crave > stability can then use every alternate version and only upgrade every 3 > years I wonder about that. 
Lots of people are forced to upgrade by new language features: decorators, list comprehensions, set literals, etc., that are required by external libraries that they use. One of the huge strengths of Python is the external library community. Interesting tension there... Bill From hagenf at CoLi.Uni-SB.DE Wed May 13 18:07:50 2009 From: hagenf at CoLi.Uni-SB.DE (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=) Date: Wed, 13 May 2009 18:07:50 +0200 Subject: [Python-Dev] Should collections.Counter check for int? Message-ID: <4A0AF056.4090303@coli.uni-saarland.de> I just noticed that while the docs say that "Counts are allowed to be any integer value including zero or negative counts", collections.Counter doesn't perform any check on the types of count values. Instead, non-numerical values will lead to strange behaviour or exceptions later on: >>> c = collections.Counter({'a':'3', 'b':'20', 'c':'100'}) >>> c.most_common(2) [('a', '3'), ('b', '20')] >>> c+c Traceback (most recent call last): File "", line 1, in File "/local/hagenf/lib/python3.1/collections.py", line 467, in __add__ if newcount > 0: TypeError: unorderable types: str() > int() I'd prefer Counter to refuse non-numerical values right away as the present behaviour may hide bugs (e.g. a forgotten string->int conversion). Any opinions? (And what about negative values or floats?) - Hagen From martin at v.loewis.de Wed May 13 21:35:21 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 13 May 2009 21:35:21 +0200 Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with thread support? In-Reply-To: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> References: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> Message-ID: <4A0B20F9.1060706@v.loewis.de> > How to build Python 2.6.2 on HP-UX Itanium with thread support? > Note: I know that the first address to post this question is comp.lang.python, but > I posted this question a week ago on comp.lang.python > (http://groups.google.com/group/comp.lang.python/browse_thread/thread/c7006ad8e5cf81e8) > and unfortunately, I didn't receive any answers. That isn't sufficient reason to post to python-dev, though. > Is HP-UX Itanium a supported platform at all? Python does not have a single supported platform (*), so: no. (*) in the sense that anybody is providing "support" for it, ie. guarantees help in case somebody has problems. (**) HP-UX is not a platform that any of the regular Python contributors uses or tests on at a regular basis. Python contributors mostly use Linux, Windows, and OS X; some also use Solaris and *BSD. > And if it does work, which steps need to be taken to build it, > e.g. other libraries/packages, environment variables, > configure options, manual modifications? This really is out of scope for python-dev. In scope would be a proposal to apply a certain patch that you had to write Python work on HP-UX, and discussion whether this patch is the appropriate solution. Regards, Martin (**) There is, of course, ActiveState, which does provide binaries, including for HP-UX, so I suppose they support it - at least if you buy commercial support. From ajaksu at gmail.com Wed May 13 21:29:07 2009 From: ajaksu at gmail.com (Daniel Diniz) Date: Wed, 13 May 2009 16:29:07 -0300 Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with thread support? 
In-Reply-To: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> References: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> Message-ID: <2d75d7660905131229q6cd92088n7710281deb17d0eb@mail.gmail.com> Hi Henning, henning.vonbargen wrote: > How to build Python 2.6.2 on HP-UX Itanium with thread support? [snip bit about python-list] I can't give you directions, but if you can describe your issues I might be able to help. I'll respond in python-list, as I think this is OT for python-dev. > Is HP-UX Itanium a supported platform at all? > BTW: A search for "supported platforms" at www.python.org does not help! Now, this looks like python-dev material. PEP 11[0], the information in README[1] and the notes in the downloads pages[2] could be improved and updated. If someone has time to invest in this, a compatibility matrix would be very nice to have. Regards, Daniel [0] http://www.python.org/dev/peps/pep-0011/ [1] http://svn.python.org/view/python/trunk/README?revision=72107&view=markup [2] http://www.python.org/download/source/ and http://www.python.org/download/other/ From martin at v.loewis.de Wed May 13 21:52:30 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 13 May 2009 21:52:30 +0200 Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with thread support? In-Reply-To: <2d75d7660905131229q6cd92088n7710281deb17d0eb@mail.gmail.com> References: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> <2d75d7660905131229q6cd92088n7710281deb17d0eb@mail.gmail.com> Message-ID: <4A0B24FE.6090907@v.loewis.de> > Now, this looks like python-dev material. PEP 11[0], the information > in README[1] and the notes in the downloads pages[2] could be > improved and updated. If someone has time to invest in this, a > compatibility matrix would be very nice to have. I don't think HP-UX is ready for PEP 11 yet. It may not work, but that's a bug that could be fixed if users would actually contribute fixes. Likewise, changes to README could be accepted if users contribute them. I'm not sure /download/source is really that useful - perhaps it would be best to remove it. As for /download/other - contributions are welcome. Regards, Martin From dripton at ripton.net Thu May 14 00:22:39 2009 From: dripton at ripton.net (David Ripton) Date: Wed, 13 May 2009 15:22:39 -0700 Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with thread support? In-Reply-To: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> References: <20497492.1242203695160.JavaMail.ngmail@webmail10.arcor-online.net> Message-ID: <20090513222239.GB14178@vidar.dreamhost.com> On 2009.05.13 10:34:55 +0200, henning.vonbargen at arcor.de wrote: > How to build Python 2.6.2 on HP-UX Itanium with thread support? > Note: I know that the first address to post this question is comp.lang.python, but > I posted this question a week ago on comp.lang.python > (http://groups.google.com/group/comp.lang.python/browse_thread/thread/c7006ad8e5cf81e8) > and unfortunately, I didn't receive any answers. > > According to Patch 1225212, > at least Peter Kropf was able to get Python running with threading support > on this platform, though AFAIK he was not using GCC. > > But I guess it should be possible with GCC as well. > > Is anyone able to confirm that Python (built with GCC) > does or does not work with multi-threading on HP-UX Itanium? The good news: I did get Python 2.4.x working on HP-UX Itanium, with threading. The compiler was gcc 4.0.x. 
(I also tried building Python with aCC, but failed.) I remember building both 32-bit and 64-bit versions. I don't remember it being that hard. Used the source for the package at hpux.connect.org.uk as a starting point, since it had a lot of good porting tweaks, but it needed some further tweaking. (The main one I remember that is that the shared library extension for Itanium should be .so not .sl There were also a bunch of paths that required appending 32 or 64.) We used that build of Python in production, for very heavily multithreaded code, on multi-CPU boxes. Worked fine. AFAIK they're still using it. I'm not sure why the binary available at hpux.connect.org.uk has threading disabled. I suspect that some older version of HP/UX had pthread bugs that got fixed somewhere along the line. The bad news: I did this about 3.5 years ago, and I don't work there anymore, so I don't have access to that HP-UX hardware anymore, or to the notes I made when I was doing the port. So I can give you encouragement but not step-by-step instructions. Sorry. -- David Ripton dripton at ripton.net From aahz at pythoncraft.com Thu May 14 04:22:35 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 13 May 2009 19:22:35 -0700 Subject: [Python-Dev] Should collections.Counter check for int? In-Reply-To: <4A0AF056.4090303@coli.uni-saarland.de> References: <4A0AF056.4090303@coli.uni-saarland.de> Message-ID: <20090514022235.GA28101@panix.com> On Wed, May 13, 2009, Hagen F?rstenau wrote: > > I'd prefer Counter to refuse non-numerical values right away as the > present behaviour may hide bugs (e.g. a forgotten string->int > conversion). Any opinions? (And what about negative values or floats?) Please file a report on bugs.python.org so that there's a record of this issue. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From cesare.dimauro at a-tono.com Thu May 14 08:27:10 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Thu, 14 May 2009 08:27:10 +0200 (CEST) Subject: [Python-Dev] special method lookup: how much do we care? In-Reply-To: <4A074C64.5070208@gmail.com> References: <1afaf6160905081109w50b71c7albc4da21965087fdb@mail.gmail.com> <4A074C64.5070208@gmail.com> Message-ID: <4713.88.149.182.147.1242282430.squirrel@webmail2.pair.com> On Sun, May 10, 2009 11:51PM, Nick Coghlan wrote: > However lots of developers rely on CPython ref counting as well, no > matter how many times they're told not to do that if they want to > support alternative interpreters. > > Cheers, > Nick. >From socket.py: # Wrapper around platform socket objects. This implements # a platform-independent dup() functionality. The # implementation currently relies on reference counting # to close the underlying socket object. class _socketobject(object): You don't know how much time I've spent trying to understand why test_httpserver.py hanged indefinitely when I was experimenting with new opcodes in my VM. Cheers, Cesare From rdmurray at bitdance.com Thu May 14 19:30:13 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 14 May 2009 13:30:13 -0400 (EDT) Subject: [Python-Dev] python -m test.regrtest should pass on an installed python Message-ID: For various reasons I happened to run 'python -m test.regrtest' on my Gentoo installed Python. For 2.5.4 only test_tarfile failed (it tries to write into the read-only installed test directory). On 2.6.2 test_tarfile passes, but other test suites, including test_distutils, do not. 
So this posting is a general reminder that the tests should not make assumptions about the writabilty of the test directory (or, for that matter, of the CWD). When I get time I'll file bugs on the particular failures I'm seeing, after I do an install from checkout. --David From ziade.tarek at gmail.com Fri May 15 00:21:43 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 15 May 2009 00:21:43 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure Message-ID: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> Hello I'm proposing this PEP, which has been discussed in Distutils-SIG, for inclusion in Python 2.7 and 3.2 http://www.python.org/dev/peps/pep-0376/ Please comment ! Tarek -- Tarek Ziad? | http://ziade.org From pje at telecommunity.com Fri May 15 07:00:55 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 15 May 2009 01:00:55 -0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure Message-ID: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> At 12:21 AM 5/15/2009 +0200, Tarek Ziad? wrote: >Hello > >I'm proposing this PEP, which has been discussed in Distutils-SIG, for >inclusion in Python 2.7 and 3.2 > >http://www.python.org/dev/peps/pep-0376/ > >Please comment ! I'd like to reiterate my suggestion that the uninstall record include size and checksum information, ala PEP 262's "FILES" section. This would allow the uninstall function to validate whether a file has been modified, and thus prevent uninstalling a locally-modified file, or a file installed in some other way. It may also be that providing an uninstall API that simply yields files to be uninstalled, with data about their existence/modification status, would be more useful than a blind uninstall operation with a filter function. Also, the PEP doesn't document what happens if a single file was installed by more than one package. Ideally, a file with identical size/checksum that belongs to more than one project should be silently left alone, and a file installed by more than one project with *different* size/checksum should be warned about and left alone. Next, the doc for the metadata API functions seems quite sparse. ISTR that I've previously commented on such issues as case- and punctuation-insensitivity of project names, and '/' separation in egg_info subpaths, but these don't seem to have been incorporated into the current version of the PEP. These are important considerations in general, btw, because project name and version canonicalization and escaping are an important part of both generating and parsing .egg-info filenemaes. At minimum, the relevant setuptools docs that define these standards should be cited. Finally, the "Definitions" section also claims that a project installs one or more packages, but a project may not contain *any* packages; it may have a standalone module, or just a script, data, or metadata. From asmodai at in-nomine.org Fri May 15 08:32:20 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Fri, 15 May 2009 08:32:20 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> References: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> Message-ID: <20090515063220.GD24353@nexus.in-nomine.org> -On [20090515 06:59], P.J. Eby (pje at telecommunity.com) wrote: >I'd like to reiterate my suggestion that the uninstall record include >size and checksum information, ala PEP 262's "FILES" section. 
This >would allow the uninstall function to validate whether a file has >been modified, and thus prevent uninstalling a locally-modified file, >or a file installed in some other way. Agreed. Within FreeBSD's ports the installed package registration gets a MD5 hash per file recorded. Size is less interesting though, since essentially this information is encapsulated within the hash. Remove one byte from the file and your hash is already different. And the case of a collision for this kind of registration is sufficiently small to need the size information. And if you're worried about the MD5 collision space, which for this use case ought to be large enough, you could always settle for SHA1. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B What's one man's poison, is another's meat or drink... From ziade.tarek at gmail.com Fri May 15 08:32:29 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 15 May 2009 08:32:29 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> References: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> Message-ID: <94bdd2610905142332k38595ff4l4e478faf2ca43d25@mail.gmail.com> 2009/5/15 P.J. Eby : > At 12:21 AM 5/15/2009 +0200, Tarek Ziad? wrote: >> >> Hello >> >> I'm proposing this PEP, which has been discussed in Distutils-SIG, for >> inclusion in Python 2.7 and 3.2 >> >> http://www.python.org/dev/peps/pep-0376/ >> >> Please comment ! > > I'd like to reiterate my suggestion that the uninstall record include size > and checksum information, ala PEP 262's "FILES" section. ?This would allow > the uninstall function to validate whether a file has been modified, and > thus prevent uninstalling a locally-modified file, or a file installed in > some other way. good point, I'll re-work that part > > It may also be that providing an uninstall API that simply yields files to > be uninstalled, with data about their existence/modification status, would > be more useful than a blind uninstall operation with a filter function. Sure we could have it in that shape, I'll work on this as well. > > Also, the PEP doesn't document what happens if a single file was installed > by more than one package. It does: "...as long as they are not mentioned in another RECORD file..." > ?Ideally, a file with identical size/checksum that > belongs to more than one project should be silently left alone, and a file > installed by more than one project with *different* size/checksum should be > warned about and left alone. I think the path is the info that should be looked at. And a warning could be raised like you said if a file was manually modified. But I don't think you want to leave alone a file with identical size/checksum that belongs to more than one project when it's not the same absolute path. Here's an example why : if two different packages includes the "feedparser.py" module (from the FeedParser project) for conveniency, and if you remove one package, you *do* want to remove its "feeparser.py" module even if it exists in the other project. So it's rather changing the PEP text like this: "...as long as they are not mentioned in another RECORD file, with the same size/checksum..." > > Next, the doc for the metadata API functions seems quite sparse. 
?ISTR that > I've previously commented on such issues as case- and > punctuation-insensitivity of project names, and '/' separation in egg_info > subpaths, but these don't seem to have been incorporated into the current > version of the PEP. > > These are important considerations in general, btw, because project name and > version canonicalization and escaping are an important part of both > generating and parsing .egg-info filenemaes. ?At minimum, the relevant > setuptools docs that define these standards should be cited. I'll add more info on that part accordingly then, > > Finally, the "Definitions" section also claims that a project installs one > or more packages, but a project may not contain *any* packages; it may have > a standalone module, or just a script, data, or metadata. > > ok Thanks for the feedbacks -- Tarek Ziad? | http://ziade.org From dirkjan at ochtman.nl Fri May 15 09:50:13 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 15 May 2009 09:50:13 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090515063220.GD24353@nexus.in-nomine.org> References: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> <20090515063220.GD24353@nexus.in-nomine.org> Message-ID: On Fri, May 15, 2009 at 8:32 AM, Jeroen Ruigrok van der Werven wrote: > Agreed. Within FreeBSD's ports the installed package registration gets a MD5 > hash per file recorded. Size is less interesting though, since essentially > this information is encapsulated within the hash. Remove one byte from the > file and your hash is already different. And the case of a collision for > this kind of registration is sufficiently small to need the size > information. Size is nice because it's much cheaper to check. I don't know if mass uninstalls will be so common that this is actually something we have to worry about, though. Cheers, Dirkjan From ncoghlan at gmail.com Fri May 15 12:34:35 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 15 May 2009 20:34:35 +1000 Subject: [Python-Dev] python -m test.regrtest should pass on an installed python In-Reply-To: References: Message-ID: <4A0D453B.1060907@gmail.com> R. David Murray wrote: > So this posting is a general reminder that the tests should not make > assumptions about the writabilty of the test directory (or, for that > matter, of the CWD). Indeed - the tempfile module is very helpful in that regard. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From status at bugs.python.org Fri May 15 18:07:15 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 15 May 2009 18:07:15 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090515160715.694947851B@psf.upfronthosting.co.za> ACTIVITY SUMMARY (05/08/09 - 05/15/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2194 open (+34) / 15658 closed (+26) / 17852 total (+60) Open issues with patches: 855 Average duration of open issues: 647 days. Median duration of open issues: 398 days. Open Issues Breakdown open 2165 (+34) pending 28 ( +0) Issues Created Or Reopened (61) _______________________________ Generator expression bug? 
05/08/09 CLOSED http://bugs.python.org/issue5968 reopened svenrahmann sys.exc_info leaks into a generator 05/08/09 http://bugs.python.org/issue5970 created jyasskin patch logging.Handler.handlerError() may raise IOError in traceback.pr 05/08/09 CLOSED http://bugs.python.org/issue5971 created ryles Failing test_signal.py on Redhat 4.1.2-44 05/08/09 http://bugs.python.org/issue5972 created dmauldin re-usable generators / generator expressions should return itera 05/08/09 CLOSED http://bugs.python.org/issue5973 created svenrahmann unicode decode error due to improperly entered text "Martin v. L 05/08/09 CLOSED http://bugs.python.org/issue5974 created srid patch csv unix file format ('\n' line terminator) 05/08/09 http://bugs.python.org/issue5975 created jtalbot test_os fails if run after test_distutils 05/09/09 CLOSED http://bugs.python.org/issue5976 created tarek distutils build_ext.get_outputs returns wrong result (patch) 05/12/09 http://bugs.python.org/issue5977 reopened ajaksu2 patch cProfile and profile don't work with pygtk/pyqt and sys.exit(0) 05/09/09 http://bugs.python.org/issue5978 created akkana strptime() gives inconsistent exceptions 05/09/09 http://bugs.python.org/issue5979 created ryles Add bug tracker tasks to PEP 101 05/09/09 http://bugs.python.org/issue5980 created ajaksu2 patch float.fromhex bugs 05/09/09 CLOSED http://bugs.python.org/issue5981 created marketdickinson patch classmethod, staticmethod: expose wrapped function 05/09/09 http://bugs.python.org/issue5982 created gsakkis boolean.so no more in _xmlplus/utils 05/09/09 CLOSED http://bugs.python.org/issue5983 created schmirrwurst distutils.command.build_ext.check_extensions_list broken checkin 05/10/09 CLOSED http://bugs.python.org/issue5984 created tarek Implement os.path.samefile and os.path.sameopenfile on Windows 05/10/09 http://bugs.python.org/issue5985 created sandberg Avoid reversed() in Random.shuffle() 05/10/09 CLOSED http://bugs.python.org/issue5986 created haypo patch Broken link to "Curses Programming with Python" 05/10/09 CLOSED http://bugs.python.org/issue5987 created ralph.corderoy Delete PyOS_ascii_formatd, PyOS_ascii_strtod, and PyOS_ascii_ato 05/10/09 http://bugs.python.org/issue5988 created eric.smith easy unittest.TestLoader.loadTestsFromNames should accept module / cl 05/10/09 CLOSED http://bugs.python.org/issue5989 created michael.foord Memory leak in os.rename() and other functions 05/10/09 CLOSED http://bugs.python.org/issue5990 created pitrou patch Add non-command help topics to help completion of cmd.Cmd 05/10/09 http://bugs.python.org/issue5991 created flub patch spurious space after opening parenthesis when auto-completing 05/10/09 http://bugs.python.org/issue5992 created pitrou python produces zombie in webbrowser.open 05/11/09 http://bugs.python.org/issue5993 created dontbugme help(marshal) just gives an outline; no help text provided. 
05/11/09 CLOSED http://bugs.python.org/issue5994 created orsenthil patch unittest command line behaviour 05/11/09 CLOSED http://bugs.python.org/issue5995 created michael.foord patch, patch, easy abstract class instantiable when subclassing dict 05/11/09 http://bugs.python.org/issue5996 created thet strftime is broken 05/11/09 CLOSED http://bugs.python.org/issue5997 created jonathan.cervidae Add __bool__ to threading.Event and multiprocessing.Event 05/11/09 http://bugs.python.org/issue5998 created flub patch compile error on HP-UX 11.22 ia64 - 'mbstate_t' is used as a typ 05/11/09 http://bugs.python.org/issue5999 created srid compile error - PyNumber_InPlaceOr(newfree, allfree) < 0 05/11/09 CLOSED http://bugs.python.org/issue6000 created srid easy Test discovery for unittest 05/11/09 http://bugs.python.org/issue6001 created michael.foord patch, patch, needs review test_urllib2_localnet DigestAuthHandler leaks nonces 05/11/09 http://bugs.python.org/issue6002 created r.david.murray easy ZipFile.writestr "compression_type" argument 05/12/09 http://bugs.python.org/issue6003 created ronaldoussoren ZipFile.writestr "compression_type" argument 05/12/09 CLOSED http://bugs.python.org/issue6004 created ronaldoussoren Bug in socket example 05/12/09 http://bugs.python.org/issue6005 created kiilerix ffi.c compile failures on AIX 5.3 with xlc 05/12/09 http://bugs.python.org/issue6006 created elyeshel distutils tricks you into thinking you can build extensions with 05/12/09 http://bugs.python.org/issue6007 created exarkun Idle should be installed as `idle3.1` and not `idle3` 05/13/09 http://bugs.python.org/issue6008 created srid optparse docs say 'default' keyword is deprecated but uses it in 05/13/09 http://bugs.python.org/issue6009 created mallyvai unable to retrieve latin-1 encoded data from sqlite3 05/13/09 CLOSED http://bugs.python.org/issue6010 created izarf python doesn't build if prefix contains non-ascii characters 05/13/09 http://bugs.python.org/issue6011 created zegreek patch enhance getargs O& to accept cleanup function 05/13/09 http://bugs.python.org/issue6012 created ocean-city patch json slower than simplejson 05/13/09 CLOSED http://bugs.python.org/issue6013 created theller No shell prompt when a graphics window that was started from IDL 05/13/09 http://bugs.python.org/issue6014 created chessweb Scrollbar in Idle os x 10.5 05/13/09 http://bugs.python.org/issue6015 created an is Use shipped zlib if the system version is bad 05/14/09 CLOSED http://bugs.python.org/issue6016 created ajaksu2 patch, patch Dict fails to notice addition and deletion of keys during iterat 05/14/09 CLOSED http://bugs.python.org/issue6017 created stevenjd Fix the output word from "ok" to "OK" when a testcase passes 05/14/09 http://bugs.python.org/issue6018 created Retro Minor typos in ctypes docs 05/14/09 CLOSED http://bugs.python.org/issue6019 created lehmannro patch Create a datetime.timedelta.totalseconds property 05/14/09 CLOSED http://bugs.python.org/issue6020 created mw44118 itertools.grouper 05/14/09 CLOSED http://bugs.python.org/issue6021 created lieryan test_distutils leaves a 'foo' file behind in the cwd 05/14/09 CLOSED http://bugs.python.org/issue6022 created r.david.murray Search does not intelligently handle module.function queries on 05/14/09 http://bugs.python.org/issue6023 created JonathansCorner.com regrtest says refleaks are "ok" 05/14/09 CLOSED http://bugs.python.org/issue6024 created collinwinter patch documentation of xml.dom.minidom.parse signature is wrong 05/14/09 
http://bugs.python.org/issue6025 created phihag patch test_(zipfile|zipimport|gzip|distutils) fail if zlib is not avai 05/15/09 http://bugs.python.org/issue6026 created ezio.melotti test_xmlrpc_net fails when the ISP returns "302 Found" 05/15/09 http://bugs.python.org/issue6027 created ezio.melotti Interpreter crashes when chaining an infinite number of exceptio 05/15/09 CLOSED http://bugs.python.org/issue6028 created yury FAIL: test_longdouble (ctypes.test.test_callbacks.Callbacks) [SP 05/15/09 http://bugs.python.org/issue6029 created illumino Issues Now Closed (58) ______________________ csv input converts \r\n to \n but csv output does not when a fie 531 days http://bugs.python.org/issue1511 ajaksu2 imaplib is not IPv6-capable 514 days http://bugs.python.org/issue1655 pitrou patch nntplib is not IPv6-capable 512 days http://bugs.python.org/issue1664 dmorr patch Cosmetic patch to supress compiler warning 475 days http://bugs.python.org/issue1932 ocean-city patch isinstance(anything, MetaclassThatDefinesInstancecheck) raises i 318 days http://bugs.python.org/issue2325 ajaksu2 patch Check implementation of new buffer interface for PyString in 2.6 413 days http://bugs.python.org/issue2492 pitrou 26backport 64 bit python memory leak usage 388 days http://bugs.python.org/issue2652 pitrou Py3k fails to parse a file with an iso-8859-1 string 384 days http://bugs.python.org/issue2660 benjamin.peterson patch update Lib/test/README 353 days http://bugs.python.org/issue2958 pitrou easy arguments and default path not set in site.py and sitecustomize. 351 days http://bugs.python.org/issue2972 haridsv Python 2.6rc2: Tix ComboBox error 239 days http://bugs.python.org/issue3872 loewis patch inspect.findsource() returns binary data for shared library modu 220 days http://bugs.python.org/issue4050 r.david.murray patch help("modules ftp") fails due to test modules 208 days http://bugs.python.org/issue4135 ajaksu2 Duplicate UTF-16 BOM if a file is open in append mode 115 days http://bugs.python.org/issue5006 pitrou patch backport distutils 3.x changes into 2.7 when appliabl 95 days http://bugs.python.org/issue5164 tarek StringIO can duplicate newlines in universal newlines mode 87 days http://bugs.python.org/issue5265 alexandre.vassalotti test_importlib fails on Mac OSX 10.5.6 w/ case-sensitive file sy 37 days http://bugs.python.org/issue5442 brett.cannon patch TextIOWrapper fails with SystemError when reading HTTPResponse 44 days http://bugs.python.org/issue5628 benjamin.peterson internal error on write while reading 19 days http://bugs.python.org/issue5844 benjamin.peterson patch Ensure RUNPATH is added to extension modules with RPATH if GNU l 8 days http://bugs.python.org/issue5900 tarek patch I need to import the module in the same thread 7 days http://bugs.python.org/issue5908 amaury.forgeotdarc import deadlocks when using fork 11 days http://bugs.python.org/issue5912 benjamin.peterson test_parser crashes when run after some other tests 11 days http://bugs.python.org/issue5918 pitrou patch fix gcc -Wextra warnings (compare signed/unsigned) 4 days http://bugs.python.org/issue5933 marketdickinson patch Add to "whats new": range(n) != range(n) 8 days http://bugs.python.org/issue5953 MLModel Possible mistake regarding writeback in documentation of shelve 4 days http://bugs.python.org/issue5957 r.david.murray patch unnecessary hardlink 1 days http://bugs.python.org/issue5966 orsenthil Generator expression bug? 
0 days http://bugs.python.org/issue5968 tjreedy logging.Handler.handlerError() may raise IOError in traceback.pr 1 days http://bugs.python.org/issue5971 vsajip re-usable generators / generator expressions should return itera 1 days http://bugs.python.org/issue5973 r.david.murray unicode decode error due to improperly entered text "Martin v. L 0 days http://bugs.python.org/issue5974 loewis patch test_os fails if run after test_distutils 0 days http://bugs.python.org/issue5976 tarek float.fromhex bugs 2 days http://bugs.python.org/issue5981 marketdickinson patch boolean.so no more in _xmlplus/utils 0 days http://bugs.python.org/issue5983 loewis distutils.command.build_ext.check_extensions_list broken checkin 0 days http://bugs.python.org/issue5984 tarek Avoid reversed() in Random.shuffle() 1 days http://bugs.python.org/issue5986 rhettinger patch Broken link to "Curses Programming with Python" 1 days http://bugs.python.org/issue5987 ralph.corderoy unittest.TestLoader.loadTestsFromNames should accept module / cl 0 days http://bugs.python.org/issue5989 michael.foord Memory leak in os.rename() and other functions 3 days http://bugs.python.org/issue5990 pitrou patch help(marshal) just gives an outline; no help text provided. 2 days http://bugs.python.org/issue5994 r.david.murray patch unittest command line behaviour 1 days http://bugs.python.org/issue5995 michael.foord patch, patch, easy strftime is broken 0 days http://bugs.python.org/issue5997 jonathan.cervidae compile error - PyNumber_InPlaceOr(newfree, allfree) < 0 1 days http://bugs.python.org/issue6000 benjamin.peterson easy ZipFile.writestr "compression_type" argument 0 days http://bugs.python.org/issue6004 ronaldoussoren unable to retrieve latin-1 encoded data from sqlite3 0 days http://bugs.python.org/issue6010 loewis json slower than simplejson 0 days http://bugs.python.org/issue6013 pitrou Use shipped zlib if the system version is bad 0 days http://bugs.python.org/issue6016 ajaksu2 patch, patch Dict fails to notice addition and deletion of keys during iterat 0 days http://bugs.python.org/issue6017 benjamin.peterson Minor typos in ctypes docs 1 days http://bugs.python.org/issue6019 georg.brandl patch Create a datetime.timedelta.totalseconds property 0 days http://bugs.python.org/issue6020 pitrou itertools.grouper 0 days http://bugs.python.org/issue6021 cvrebert test_distutils leaves a 'foo' file behind in the cwd 0 days http://bugs.python.org/issue6022 tarek regrtest says refleaks are "ok" 0 days http://bugs.python.org/issue6024 collinwinter patch Interpreter crashes when chaining an infinite number of exceptio 0 days http://bugs.python.org/issue6028 pitrou Solaris term.h needs curses.h 2023 days http://bugs.python.org/issue831574 ajaksu2 test_subprocess fails on cygwin 996 days http://bugs.python.org/issue1543469 ajaksu2 patch os.popen with os.close gives error message 945 days http://bugs.python.org/issue1574310 ajaksu2 Problem linking to readline lib on x86(64) Solaris 797 days http://bugs.python.org/issue1676121 ajaksu2 Top Issues Most Discussed (10) ______________________________ 14 WeakSet cmp methods 8 days open http://bugs.python.org/issue5964 12 distutils build_ext.get_outputs returns wrong result (patch) 3 days open http://bugs.python.org/issue5977 11 xml.dom.minidom does not escape CR, LF and TAB characters withi 31 days open http://bugs.python.org/issue5752 10 Add to "whats new": range(n) != range(n) 8 days closed http://bugs.python.org/issue5953 10 Do not assume signed integer overflow behavior 519 days open 
http://bugs.python.org/issue1621 10 Enhance file.readlines by making line separator selectable 443 days open http://bugs.python.org/issue1152248 8 test_asynchat fails on Mac OSX 25 days open http://bugs.python.org/issue5798 7 Add os.link() and os.symlink() and os.path.islink() support for 942 days open http://bugs.python.org/issue1578269 6 test_importlib fails on Mac OSX 10.5.6 w/ case-sensitive file s 37 days closed http://bugs.python.org/issue5442 5 ssl makefile never closes socket 92 days open http://bugs.python.org/issue5238 From python.leojay at gmail.com Fri May 15 18:18:42 2009 From: python.leojay at gmail.com (Leo Jay) Date: Sat, 16 May 2009 00:18:42 +0800 Subject: [Python-Dev] doc error in 2.6.2 Message-ID: <4e307e0f0905150918p722a47d6n11ca391070953db9@mail.gmail.com> There is a syntax error in the client side code of "SocketServer.UDPServer Example" in http://docs.python.org/library/socketserver.html: import socket import sys HOST, PORT = "localhost" data = " ".join(sys.argv[1:]) Obviously, it should be: HOST, PORT = "localhost", 9999 -- Leo Jay From aahz at pythoncraft.com Fri May 15 19:16:27 2009 From: aahz at pythoncraft.com (Aahz) Date: Fri, 15 May 2009 10:16:27 -0700 Subject: [Python-Dev] doc error in 2.6.2 In-Reply-To: <4e307e0f0905150918p722a47d6n11ca391070953db9@mail.gmail.com> References: <4e307e0f0905150918p722a47d6n11ca391070953db9@mail.gmail.com> Message-ID: <20090515171627.GC18871@panix.com> On Sat, May 16, 2009, Leo Jay wrote: > > There is a syntax error in the client side code of > "SocketServer.UDPServer Example" in > http://docs.python.org/library/socketserver.html: Please follow the directions in http://docs.python.org/bugs.html to report this on bugs.python.org -- that ensures that it won't get lost. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "In 1968 it took the computing power of 2 C-64's to fly a rocket to the moon. Now, in 1998 it takes the Power of a Pentium 200 to run Microsoft Windows 98. Something must have gone wrong." --/bin/fortune From pje at telecommunity.com Fri May 15 19:52:34 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 15 May 2009 13:52:34 -0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090515063220.GD24353@nexus.in-nomine.org> References: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> <20090515063220.GD24353@nexus.in-nomine.org> Message-ID: <20090515174953.F06B73A4061@sparrow.telecommunity.com> At 08:32 AM 5/15/2009 +0200, Jeroen Ruigrok van der Werven wrote: >Agreed. Within FreeBSD's ports the installed package registration >gets a MD5 hash per file recorded. Size is less interesting though, >since essentially this information is encapsulated within the hash. >Remove one byte from the file and your hash is already different. Which also means that in that case you can skip computing the MD5. The size allows you to easily notice an overwrite/corruption without further processing. From pje at telecommunity.com Fri May 15 19:56:36 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 15 May 2009 13:56:36 -0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <94bdd2610905142332k38595ff4l4e478faf2ca43d25@mail.gmail.co m> References: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> <94bdd2610905142332k38595ff4l4e478faf2ca43d25@mail.gmail.com> Message-ID: <20090515175355.B0AA53A4061@sparrow.telecommunity.com> At 08:32 AM 5/15/2009 +0200, Tarek Ziad? wrote: >2009/5/15 P.J. 
Eby : > > Ideally, a file with identical size/checksum that > > belongs to more than one project should be silently left alone, and a file > > installed by more than one project with *different* size/checksum should be > > warned about and left alone. > >I think the path is the info that should be looked at. By "a file that belongs to more than one project" I meant a single file on *disk* (i.e., one absolute path). >But I don't think you want to leave alone a file with identical >size/checksum that belongs to more than one project when it's not >the same absolute path. That wouldn't be "a file" then, would it? ;-) >Here's an example why : if two different packages includes the >"feedparser.py" module >(from the FeedParser project) for conveniency, and if you remove one package, >you *do* want to remove its "feeparser.py" module even if it exists >in the other >project. Right, that would be *two files*, though, not one file. From tonynelson at georgeanelson.com Sat May 16 00:03:14 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Fri, 15 May 2009 18:03:14 -0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090515174953.F06B73A4061@sparrow.telecommunity.com> References: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> <20090515063220.GD24353@nexus.in-nomine.org> <20090515174953.F06B73A4061@sparrow.telecommunity.com> Message-ID: At 13:52 -0400 05/15/2009, P.J. Eby wrote: >At 08:32 AM 5/15/2009 +0200, Jeroen Ruigrok van der Werven wrote: >>Agreed. Within FreeBSD's ports the installed package registration >>gets a MD5 hash per file recorded. Size is less interesting though, >>since essentially this information is encapsulated within the hash. >>Remove one byte from the file and your hash is already different. > >Which also means that in that case you can skip computing the >MD5. The size allows you to easily notice an overwrite/corruption >without further processing. In most cases the files will actually match, so the sizes and dates will be the same and the checksum must be computed to verify the match. RPM does this when asked to Verify a package. It is faster than Removing a package, and Verifying all installed packages takes a reasonable amount of time. I don't think Python would be any worse at verifying its own packages, and it would normally have less data to verify, so it should be fast enough. -- ____________________________________________________________________ TonyN.:' ' From hfuerstenau at gmx.net Sat May 16 11:58:16 2009 From: hfuerstenau at gmx.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=) Date: Sat, 16 May 2009 11:58:16 +0200 Subject: [Python-Dev] Should collections.Counter check for int? In-Reply-To: <20090514022235.GA28101@panix.com> References: <4A0AF056.4090303@coli.uni-saarland.de> <20090514022235.GA28101@panix.com> Message-ID: <4A0E8E38.4070801@gmx.net> >> I'd prefer Counter to refuse non-numerical values right away as the >> present behaviour may hide bugs (e.g. a forgotten string->int >> conversion). Any opinions? (And what about negative values or floats?) > > Please file a report on bugs.python.org so that there's a record of this > issue. 
Done: http://bugs.python.org/issue6038 - Hagen From chris at simplistix.co.uk Sat May 16 17:00:39 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 16 May 2009 16:00:39 +0100 Subject: [Python-Dev] .pth files should never contain python In-Reply-To: <79990c6b0905090903o19e11505w353cfe62f4f67071@mail.gmail.com> References: <49D4DA72.60401@v.loewis.de> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <49FB2398.5000708@simplistix.co.uk> <49FB261F.9080306@v.loewis.de> <49FB2A2A.4090606@simplistix.co.uk> <49FB3384.1030106@v.loewis.de> <4A0547AC.7060103@simplistix.co.uk> <4A054C7A.8020806@v.loewis.de> <4A058ECF.6050203@simplistix.co.uk> <79990c6b0905090903o19e11505w353cfe62f4f67071@mail.gmail.com> Message-ID: <4A0ED517.2050903@simplistix.co.uk> Paul Moore wrote: > 2009/5/9 Chris Withers : >> Martin v. L?wis wrote: >>>> I thought .pth files just had python in them? >>> Not at all - they never did. They have paths in them. >> I've certainly seen them with python in, and that's what I hate about >> them... > > AIUI, there was a small special case that lines starting with "import" > are executed (see the source of site.py for details). This exception > has been exploited (some would say "abused", but I'm trying to be > unbiased here) by setuptools, at least, to do path manipulations and > such. Abused is definitely the right word, I suppose it's too late to correct this bug? How about for Python 3? cheers, Chris From ziade.tarek at gmail.com Sat May 16 18:06:25 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sat, 16 May 2009 18:06:25 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> Message-ID: <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> Ok I've changed the PEP with all the points you mentioned, if you want to take a look. 2009/5/15 P.J. Eby : > Next, the doc for the metadata API functions seems quite sparse. ?ISTR that > I've previously commented on such issues as case- and > punctuation-insensitivity of project names, and '/' separation in egg_info > subpaths, but these don't seem to have been incorporated into the current > version of the PEP. > > These are important considerations in general, btw, because project name and > version canonicalization and escaping are an important part of both > generating and parsing .egg-info filenemaes. ?At minimum, the relevant > setuptools docs that define these standards should be cited. I need to find back your comments for this part, I must have missed them. That's the last part I didn't work out yet on the current PEP revision. Regards Tarek -- Tarek Ziad? | http://ziade.org From ziade.tarek at gmail.com Sat May 16 18:39:41 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sat, 16 May 2009 18:39:41 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: References: <20090515045815.DE1F23A4061@sparrow.telecommunity.com> <20090515063220.GD24353@nexus.in-nomine.org> Message-ID: <94bdd2610905160939s705181dbp2f88e16cf5f60434@mail.gmail.com> Yes, I don't think it's relevant to optimize install/uninstall code in Python. In the whole PEP 376 proposal, the only part that will need care will be the code that browses sys.path. 
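A minimal sketch, purely for illustration and not the API proposed in the PEP, of what that browsing of sys.path for .egg-info entries could look like:

    import os
    import sys

    def iter_egg_info_dirs():
        # Scan each directory on sys.path and yield any *.egg-info
        # entries found there; this only illustrates the sys.path scan
        # referred to above, nothing more.
        for entry in sys.path:
            if not os.path.isdir(entry):
                continue
            for name in sorted(os.listdir(entry)):
                if name.endswith('.egg-info'):
                    yield os.path.join(entry, name)
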
On Fri, May 15, 2009 at 9:50 AM, Dirkjan Ochtman wrote: > On Fri, May 15, 2009 at 8:32 AM, Jeroen Ruigrok van der Werven > wrote: >> Agreed. Within FreeBSD's ports the installed package registration gets a MD5 >> hash per file recorded. Size is less interesting though, since essentially >> this information is encapsulated within the hash. Remove one byte from the >> file and your hash is already different. And the case of a collision for >> this kind of registration is sufficiently small to need the size >> information. > > Size is nice because it's much cheaper to check. I don't know if mass > uninstalls will be so common that this is actually something we have > to worry about, though. > > Cheers, > > Dirkjan > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ziade.tarek%40gmail.com > -- Tarek Ziad? | http://ziade.org From pje at telecommunity.com Sat May 16 18:55:44 2009 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 16 May 2009 12:55:44 -0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.co m> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> Message-ID: <20090516165302.C71643A4061@sparrow.telecommunity.com> At 06:06 PM 5/16/2009 +0200, Tarek Ziad? wrote: >Ok I've changed the PEP with all the points you mentioned, if you want >to take a look. Some notes: 1. Why ';' separation, instead of tabs as in PEP 262? Aren't semicolons a valid character in filenames? 2. "if the installed file is located in a directory in site-packages" should refer not to site-packages but to the directory containing the .egg-info directory. 3. get_egg_info_file needs to be specified as using '/'-separated paths and converting to OS paths if appropriate. There's also the problem that the mode it opens the file in (binary or text) is unspecified. 4. There should probably be a way to iterate over the projects in a directory, since it's otherwise impossible for an installation tool to find out what project(s) "own" a file that conflicts with something being installed. Alternatively, reshaping the file API to allow querying by path as well as by project might work. 5. If any cache mechanisms are to be used by the API, the API *must* make it possible to bypass or explicitly manage that cache, as otherwise installation tools and tools that manipulate sys.path at runtime may end up using incorrect data. 6. get_files() doesn't document whether the yielded paths are absolute or relative, local or cross-platform, etc. >I need to find back your comments for this part, I must have missed >them. That's >the last part I didn't work out yet on the current PEP revision. Well, if you can't find them, the EggFormats doc explains how these file/dir structures are currently laid out by setuptools, easy_install, pip, etc., and the PEP should probably reference that. Technically, this PEP doesn't so much propose a change to the EggFormats standard, as simply add a RECORD file to it, and propose stdlib support for reading and writing it. So, the PEP really should reference (i.e. link to) the existing standard. 
The EggFormats doc in turn cites pkg_resources doc for lower-level format issues, such as name and version normalization, filename escaping, file parsing, etc. This PEP should also probably be framed as a replacement for PEP 262, proposing to extend the de-facto standard for an installation database with uninstall support, and blessing selected portions of the de facto standard as an official standard. (Since that's pretty much exactly what it is.) From v+python at g.nevcal.com Sat May 16 20:17:10 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 16 May 2009 11:17:10 -0700 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090516165302.C71643A4061@sparrow.telecommunity.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> Message-ID: <4A0F0326.7070501@g.nevcal.com> On approximately 5/16/2009 9:55 AM, came the following characters from the keyboard of P.J. Eby: > At 06:06 PM 5/16/2009 +0200, Tarek Ziad? wrote: >> Ok I've changed the PEP with all the points you mentioned, if you want >> to take a look. > > Some notes: > > 1. Why ';' separation, instead of tabs as in PEP 262? Aren't semicolons > a valid character in filenames? Why tabs? Aren't tabs a valid character in filenames? (hint: Both are valid in POSIX filenames, neither are valid in Windows filenames) -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From pje at telecommunity.com Sat May 16 20:58:35 2009 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 16 May 2009 14:58:35 -0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <4A0F0326.7070501@g.nevcal.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> Message-ID: <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> At 11:17 AM 5/16/2009 -0700, Glenn Linderman wrote: >On approximately 5/16/2009 9:55 AM, came the following characters >from the keyboard of P.J. Eby: >>At 06:06 PM 5/16/2009 +0200, Tarek Ziad? wrote: >>>Ok I've changed the PEP with all the points you mentioned, if you want >>>to take a look. >>Some notes: >>1. Why ';' separation, instead of tabs as in PEP 262? Aren't >>semicolons a valid character in filenames? > > >Why tabs? Aren't tabs a valid character in filenames? >(hint: Both are valid in POSIX filenames, neither are valid in >Windows filenames) ";" *is* valid in Windows filenames, actually. Tabs aresn't. 
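One way to make the separator question moot, sketched here only as an illustration and not as the format the PEP specifies, is to let the csv module quote fields, so that a ';', a tab or a comma inside a file name can never be confused with the field separator. The "path, checksum, size" layout below is an assumption for the example:

    import csv

    def write_record(stream, entries):
        # entries is an iterable of (path, checksum, size) tuples
        writer = csv.writer(stream)
        for path, checksum, size in entries:
            writer.writerow([path, checksum, size])

    def read_record(stream):
        # Yields (path, checksum, size) tuples; csv undoes any quoting,
        # so separators inside the path round-trip safely.
        for row in csv.reader(stream):
            if row:
                path, checksum, size = row
                yield path, checksum, int(size)
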
From v+python at g.nevcal.com Sat May 16 21:12:15 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 16 May 2009 12:12:15 -0700 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> Message-ID: <4A0F100F.3070406@g.nevcal.com> On approximately 5/16/2009 11:58 AM, came the following characters from the keyboard of P.J. Eby: > At 11:17 AM 5/16/2009 -0700, Glenn Linderman wrote: >> On approximately 5/16/2009 9:55 AM, came the following characters from >> the keyboard of P.J. Eby: >>> At 06:06 PM 5/16/2009 +0200, Tarek Ziad? wrote: >>>> Ok I've changed the PEP with all the points you mentioned, if you want >>>> to take a look. >>> Some notes: >>> 1. Why ';' separation, instead of tabs as in PEP 262? Aren't >>> semicolons a valid character in filenames? >> >> >> Why tabs? Aren't tabs a valid character in filenames? >> (hint: Both are valid in POSIX filenames, neither are valid in Windows >> filenames) > > ";" *is* valid in Windows filenames, actually. Tabs aresn't. Oops. Guess I got that crossed with valid email address characters... But I should probably have stated my point... that since there are no characters that are not illegal in file names on every platform, except "/" and NULL, that some mention should be made, that splitting the line on ; (or TAB) isn't necessarily the correct parsing technique... rather that the line should be parsed from the right end, and the remainder used as a the filename, as the numbers at the end would not have ; or TAB as legal characters within them. Or else some escaping mechanism needs to be defined. Or else the ; or TAB will be illegal in names used in the RECORD (which would be limiting, although not significantly so, in my opinion, but others may have other opinions). -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From shigin at rambler-co.ru Sat May 16 21:36:04 2009 From: shigin at rambler-co.ru (Alexander Shigin) Date: Sat, 16 May 2009 23:36:04 +0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> Message-ID: <1242502564.4478.3.camel@jenner> ? ???, 16/05/2009 ? 14:58 -0400, P.J. Eby ?????: > ";" *is* valid in Windows filenames, actually. Tabs aresn't. I was sure ';' is separator for PATH in Windows. Do I miss something? If I remember right os.path.pathsep is ';' under Windows. 
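Glenn's alternative above, parsing each line from the right end so that a separator inside the file name does no harm, could look roughly like this, again assuming a hypothetical "path;checksum;size" layout (the checksum and size themselves cannot contain ';'):

    def parse_record_line(line):
        # Split from the right: the last two fields are the checksum and
        # the size, everything before them is the path, ';' included.
        path, checksum, size = line.rstrip('\n').rsplit(';', 2)
        return path, checksum, int(size)

    # parse_record_line('odd;name.py;d41d8cd98f00b204e9800998ecf8427e;0')
    # returns ('odd;name.py', 'd41d8cd98f00b204e9800998ecf8427e', 0)
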
From martin at v.loewis.de Sat May 16 22:08:25 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 16 May 2009 22:08:25 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <1242502564.4478.3.camel@jenner> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <1242502564.4478.3.camel@jenner> Message-ID: <4A0F1D39.1070108@v.loewis.de> Alexander Shigin wrote: > ? ???, 16/05/2009 ? 14:58 -0400, P.J. Eby ?????: >> ";" *is* valid in Windows filenames, actually. Tabs aresn't. > > I was sure ';' is separator for PATH in Windows. Do I miss something? Yes, this: http://msdn.microsoft.com/en-us/library/aa365247.aspx Regards, Martin From v+python at g.nevcal.com Sat May 16 22:26:18 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sat, 16 May 2009 13:26:18 -0700 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <4A0F1D39.1070108@v.loewis.de> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <1242502564.4478.3.camel@jenner> <4A0F1D39.1070108@v.loewis.de> Message-ID: <4A0F216A.4050905@g.nevcal.com> On approximately 5/16/2009 1:08 PM, came the following characters from the keyboard of Martin v. L?wis: > Alexander Shigin wrote: >> ? ???, 16/05/2009 ? 14:58 -0400, P.J. Eby ?????: >>> ";" *is* valid in Windows filenames, actually. Tabs aresn't. >> I was sure ';' is separator for PATH in Windows. Do I miss something? > > Yes, this: > > http://msdn.microsoft.com/en-us/library/aa365247.aspx Well, maybe he was missing that, or maybe he was missing that each entry in the Windows PATH is allowed to be quoted, so that ; characters inside quotes are part of path names, and ; characters outside of quotes are separators. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From google at mrabarnett.plus.com Sun May 17 00:15:46 2009 From: google at mrabarnett.plus.com (MRAB) Date: Sat, 16 May 2009 23:15:46 +0100 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <4A0F100F.3070406@g.nevcal.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <4A0F100F.3070406@g.nevcal.com> Message-ID: <4A0F3B12.8080500@mrabarnett.plus.com> Glenn Linderman wrote: > On approximately 5/16/2009 11:58 AM, came the following characters from > the keyboard of P.J. Eby: >> At 11:17 AM 5/16/2009 -0700, Glenn Linderman wrote: >>> On approximately 5/16/2009 9:55 AM, came the following characters >>> from the keyboard of P.J. Eby: >>>> At 06:06 PM 5/16/2009 +0200, Tarek Ziad? 
wrote: >>>>> Ok I've changed the PEP with all the points you mentioned, if you want >>>>> to take a look. >>>> Some notes: >>>> 1. Why ';' separation, instead of tabs as in PEP 262? Aren't >>>> semicolons a valid character in filenames? >>> >>> >>> Why tabs? Aren't tabs a valid character in filenames? >>> (hint: Both are valid in POSIX filenames, neither are valid in >>> Windows filenames) >> >> ";" *is* valid in Windows filenames, actually. Tabs aresn't. > > > Oops. Guess I got that crossed with valid email address characters... > > But I should probably have stated my point... that since there are no > characters that are not illegal in file names on every platform, except > "/" and NULL, that some mention should be made, that splitting the line > on ; (or TAB) isn't necessarily the correct parsing technique... rather > that the line should be parsed from the right end, and the remainder > used as a the filename, as the numbers at the end would not have ; or > TAB as legal characters within them. Or else some escaping mechanism > needs to be defined. Or else the ; or TAB will be illegal in names used > in the RECORD (which would be limiting, although not significantly so, > in my opinion, but others may have other opinions). > FYI, on RISC OS '/' is a valid filename character and '.' is used as the directory separator. I'd probably say that TAB is s reasonable character to use, even though it's OK in POSIX; after all, should anyone really be using a control character in a filename? From solipsis at pitrou.net Sun May 17 00:29:58 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 16 May 2009 22:29:58 +0000 (UTC) Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <4A0F100F.3070406@g.nevcal.com> <4A0F3B12.8080500@mrabarnett.plus.com> Message-ID: MRAB mrabarnett.plus.com> writes: > > I'd probably say that TAB is s reasonable character to use, even though > it's OK in POSIX; after all, should anyone really be using a control > character in a filename? Even newline characters are valid characters in a filename. Why not go for the safe choice of encoding all filenames using e.g. urllib.quote()? (which has the advantage that usual filenames will stay perfectly readable) From shigin at rambler-co.ru Sun May 17 06:39:48 2009 From: shigin at rambler-co.ru (Alexander Shigin) Date: Sun, 17 May 2009 08:39:48 +0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <4A0F216A.4050905@g.nevcal.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <1242502564.4478.3.camel@jenner> <4A0F1D39.1070108@v.loewis.de> <4A0F216A.4050905@g.nevcal.com> Message-ID: <1242535188.4478.8.camel@jenner> ? ???, 16/05/2009 ? 13:26 -0700, Glenn Linderman ?????: > On approximately 5/16/2009 1:08 PM, came the following characters from > the keyboard of Martin v. 
Löwis: > > Yes, this: > > http://msdn.microsoft.com/en-us/library/aa365247.aspx > > Well, maybe he was missing that, or maybe he was missing that each entry > in the Windows PATH is allowed to be quoted, so that ; characters inside > quotes are part of path names, and ; characters outside of quotes are > separators. Yep, I hadn't thought about that. The MSDN entry makes it clear that ';' is valid in a file name. From shigin at rambler-co.ru Sun May 17 06:52:39 2009 From: shigin at rambler-co.ru (Alexander Shigin) Date: Sun, 17 May 2009 08:52:39 +0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <4A0F3B12.8080500@mrabarnett.plus.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <4A0F100F.3070406@g.nevcal.com> <4A0F3B12.8080500@mrabarnett.plus.com> Message-ID: <1242535959.4478.12.camel@jenner> On Sat, 16/05/2009 at 23:15 +0100, MRAB wrote: > FYI, on RISC OS '/' is a valid filename character and '.' is used as > the directory separator. > > I'd probably say that TAB is s reasonable character to use, even > though it's OK in POSIX; after all, should anyone really be using a > control character in a filename? The '\0' char is invalid on both Windows and POSIX. I don't know if it's valid on RISC OS. From martin at v.loewis.de Sun May 17 07:03:14 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 17 May 2009 07:03:14 +0200 Subject: [Python-Dev] Cleanup for O& Message-ID: <4A0F9A92.6000508@v.loewis.de> Issue 6012 proposes to add cleanup support for O& converters; a first client for this would be PyUnicode_FSConverter. Cleanup is always necessary if the conversion function allocates memory and a later argument converter fails. The memory allocated must then be released. There are three options currently to provide such a function: 1. Make a code O&& with two function pointers. I find that too tedious to use. 2. Introduce a new code O$, that takes an O&-style function which, in addition, can also be called with a NULL PyObject*, meaning that it should clean up. 3. Extend O& so that its function pointers also support the cleanup mode (NULL first argument). Conversion functions that need cleanup would have to return a special constant rather than the usual value of 1. In addition, there is also the approach introduced in issue 5990: 4. Users of a conversion function that requires cleanup need to initialize the output pointer to NULL, and then release memory explicitly when the argument conversion fails. Which of these do you like best? Regards, Martin From ziade.tarek at gmail.com Sun May 17 14:55:45 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sun, 17 May 2009 14:55:45 +0200 Subject: [Python-Dev] LZW support in tarfile ? Message-ID: <94bdd2610905170555y5faff2eav8708ec993d13259e@mail.gmail.com> Hello, I want to remove the usage of the "tar" command in Distutils in favor of the "tarfile" module. But, there's an option in Distutils.make_archive to create a tarball using the "compress" [1] program rather than gzip or bzip2. Using tar -Z, it will pipe it to the compress program if present. This program implements the LZW algorithm [2]. LZW used to be patented, but the patent seems to have expired in every country now [3].
On the Distutils side I can work things out so the tar archive created can be piped to an arbitrary compression program when it is not compressed using bzip2 or gzip. But I was wondering if we should we add a LZW support in tarinfo, besides gzip and bzip2 ? Although this compression standard doesn't seem very used these days, Regards Tarek [1] http://en.wikipedia.org/wiki/Compress [2] http://en.wikipedia.org/wiki/LZW [3] http://www.unisys.com/about__unisys/lzw -- Tarek Ziadé | http://ziade.org From google at mrabarnett.plus.com Sun May 17 15:04:06 2009 From: google at mrabarnett.plus.com (MRAB) Date: Sun, 17 May 2009 14:04:06 +0100 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <1242535959.4478.12.camel@jenner> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <4A0F100F.3070406@g.nevcal.com> <4A0F3B12.8080500@mrabarnett.plus.com> <1242535959.4478.12.camel@jenner> Message-ID: <4A100B46.2080003@mrabarnett.plus.com> Alexander Shigin wrote: > On Sat, 16/05/2009 at 23:15 +0100, MRAB wrote: >> FYI, on RISC OS '/' is a valid filename character and '.' is used as >> the directory separator. >> >> I'd probably say that TAB is s reasonable character to use, even >> though it's OK in POSIX; after all, should anyone really be using a >> control character in a filename? > > The '\0' char is invalid on both Windows and POSIX. I don't know if it's > valid on RISC OS. > '\0' isn't a valid filename character on RISC OS. From solipsis at pitrou.net Sun May 17 15:19:44 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 17 May 2009 13:19:44 +0000 (UTC) Subject: [Python-Dev] LZW support in tarfile ? References: <94bdd2610905170555y5faff2eav8708ec993d13259e@mail.gmail.com> Message-ID: Tarek Ziadé gmail.com> writes: > > But I was wondering if we should we add a LZW support in tarinfo, > besides gzip and bzip2 ? > > Although this compression standard doesn't seem very used these days, It would be more useful to add LZMA / xz support. I don't think compress is used anymore, except perhaps on old legacy systems. On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no .Z file. Regards Antoine. From fuzzyman at voidspace.org.uk Sun May 17 15:23:03 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 17 May 2009 14:23:03 +0100 Subject: [Python-Dev] LZW support in tarfile ? In-Reply-To: References: <94bdd2610905170555y5faff2eav8708ec993d13259e@mail.gmail.com> Message-ID: <4A100FB7.6020200@voidspace.org.uk> Antoine Pitrou wrote: > Tarek Ziadé gmail.com> writes: > >> But I was wondering if we should we add a LZW support in tarinfo, >> besides gzip and bzip2 ? >> >> Although this compression standard doesn't seem very used these days, >> > > It would be more useful to add LZMA / xz support. > I don't think compress is used anymore, except perhaps on old legacy systems. > On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no > .Z file. > I've seen the occasional .Z file in recent years, but never that I recall for a Python package. As plugging in external compression tools is less likely to work cross-platform wouldn't it be both easier and better to deprecate (and not replace) the compress support.
If there is a huge outcry adding LZW support to tarfile can be reconsidered. Michael Foord > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From martin at v.loewis.de Sun May 17 17:00:18 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 17 May 2009 17:00:18 +0200 Subject: [Python-Dev] LZW support in tarfile ? In-Reply-To: <94bdd2610905170555y5faff2eav8708ec993d13259e@mail.gmail.com> References: <94bdd2610905170555y5faff2eav8708ec993d13259e@mail.gmail.com> Message-ID: <4A102682.7020207@v.loewis.de> > But, there's an option in Distutils.make_archive to create a tarball > using the "compress" [1] program rather than gzip or bzip2. > Using tar -Z, it will pipe it to the compress program if present. This > program implements the LZW algorithm [2]. As everybody else says: it might be best to just remove that option. For compatibility, perhaps deprecate it in 2.7 and 3.1, and remove in in 3.2. Regards, Martin From piet at cs.uu.nl Sun May 17 21:47:16 2009 From: piet at cs.uu.nl (Piet van Oostrum) Date: Sun, 17 May 2009 21:47:16 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: (Ned Deily's message of "Thu\, 30 Apr 2009 12\:54\:50 -0700") References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> <49F6F09E.2020506@voidspace.org.uk> <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> Message-ID: >>>>> Ned Deily (ND) wrote: >ND> In article , Piet van Oostrum >ND> wrote: >>> >>>>> Ronald Oussoren (RO) wrote: >>> >RO> For what it's worth, the OSX API's seem to behave as follows: >>> >RO> * If you create a file with an non-UTF8 name on a HFS+ filesystem the >>> >RO> system automaticly encodes the name. >>> >>> >RO> That is, open(chr(255), 'w') will silently create a file named '%FF' >>> >RO> instead of the name you'd expect on a unix system. >>> >>> Not for me (I am using Python 2.6.2). >>> >>> >>> f = open(chr(255), 'w') >>> Traceback (most recent call last): >>> File "", line 1, in >>> IOError: [Errno 22] invalid mode ('w') or filename: '\xff' >>> >>> >ND> What version of OSX are you using? On Tiger 10.4.11 I see the failure >ND> you see but on Leopard 10.5.6 the behavior Ronald reports. Yes, I am using Tiger (10.4.11). Interesting that it has changed on Leopard. -- Piet van Oostrum URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: piet at vanoostrum.org From martin at v.loewis.de Sun May 17 22:54:32 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 17 May 2009 22:54:32 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI Message-ID: <4A107988.3020202@v.loewis.de> Thomas Wouters reminded me of a long-standing idea; I finally found the time to write it down. Please comment! Regards, Martin PEP: 384 Title: Defining a Stable ABI Version: $Revision: 72754 $ Last-Modified: $Date: 2009-05-17 21:14:52 +0200 (So, 17. Mai 2009) $ Author: Martin v. 
Löwis Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 17-May-2009 Python-Version: 3.2 Post-History: Abstract ======== Currently, each feature release introduces a new name for the Python DLL on Windows, and may cause incompatibilities for extension modules on Unix. This PEP proposes to define a stable set of API functions which are guaranteed to be available for the lifetime of Python 3, and which will also remain binary-compatible across versions. Extension modules and applications embedding Python can work with different feature releases as long as they restrict themselves to this stable ABI. Rationale ========= The primary source of ABI incompatibility is changes to the layout of in-memory structures. For example, the way in which string interning works, or the data type used to represent the size of an object, has changed during the life of Python 2.x. As a consequence, extension modules making direct access to fields of strings, lists, or tuples, would break if their code is loaded into a newer version of the interpreter without recompilation: offsets of other fields may have changed, making the extension modules access the wrong data. In some cases, the incompatibilities only affect internal objects of the interpreter, such as frame or code objects. For example, the way line numbers are represented has changed in the 2.x lifetime, as has the way in which local variables are stored (due to the introduction of closures). Even though most applications probably never used these objects, changing them still required changing the PYTHON_API_VERSION. On Linux, changes to the ABI are often not much of a problem: the system will provide a default Python installation, and many extension modules are already provided pre-compiled for that version. If additional modules are needed, or additional Python versions, users can typically compile them themselves on the system, resulting in modules that use the right ABI. On Windows, multiple simultaneous installations of different Python versions are common, and extension modules are compiled by their authors, not by end users. To reduce the risk of ABI incompatibilities, Python currently introduces a new DLL name pythonXY.dll for each feature release, whether or not ABI incompatibilities actually exist. With this PEP, it will be possible to reduce the dependency of binary extension modules on a specific Python feature release, and applications embedding Python can be made to work with different releases. Specification ============= The ABI specification falls into two parts: an API specification, specifying what function (groups) are available for use with the ABI, and a linkage specification specifying what libraries to link with. The actual ABI (layout of structures in memory, function calling conventions) is not specified, but implied by the compiler. A specific ABI is recommended for selected platforms. During the evolution of Python, new ABI functions will be added. Applications using them will then have a requirement on a minimum version of Python; this PEP provides no mechanism for such applications to fall back when the Python library is too old. Terminology ----------- Applications and extension modules that want to use this ABI are collectively referred to as "applications" from here on.
Header Files and Preprocessor Definitions ----------------------------------------- Applications shall only include the header file Python.h (before including any system headers), or, optionally, include pyconfig.h, and then Python.h. During the compilation of applications, the preprocessor macro Py_LIMITED_API must be defined. Doing so will hide all definitions that are not part of the ABI. Structures ---------- Only the following structures and structure fields are accessible to applications: - PyObject (ob_refcnt, ob_type) - PyVarObject (ob_base, ob_size) - Py_buffer (buf, obj, len, itemsize, readonly, ndim, shape, strides, suboffsets, smalltable, internal) - PyMethodDef (ml_name, ml_meth, ml_flags, ml_doc) - PyMemberDef (name, type, offset, flags, doc) - PyGetSetDef (name, get, set, doc, closure) The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE) are also available to applications. The following types are available, but opaque (i.e. incomplete): - PyThreadState - PyInterpreterState Type Objects ------------ The structure of type objects is not available to applications; declaration of "static" type objects is not possible anymore (for applications using this ABI). Instead, type objects get created dynamically. To allow an easy creation of types (in particular, to be able to fill out function pointers easily), the following structures and functions are available:: typedef struct{ int slot; /* slot id, see below */ void *pfunc; /* function pointer */ } PyType_Slot; struct{ const char* name; const char* doc; int basicsize; int itemsize; int flags; PyType_Slot *slots; /* terminated by slot==0. */ } PyType_Spec; PyObject* PyType_FromSpec(PyType_Spec*); To specify a slot, a unique slot id must be provided. New Python versions may introduce new slot ids, but slot ids will never be recycled. Slots may get deprecated, but continue to be supported throughout Python 3.x. The slot ids are named like the field names of the structures that hold the pointers in Python 3.1, with an added ``Py_`` prefix (i.e. Py_tp_dealloc instead of just tp_dealloc): - tp_dealloc, tp_print, tp_getattr, tp_setattr, tp_repr, tp_hash, tp_call, tp_str, tp_getattro, tp_setattro, tp_doc, tp_traverse, tp_clear, tp_richcompare, tp_iter, tp_iternext, tp_methods, tp_base, tp_descr_set, tp_descr_set, tp_init, tp_alloc, tp_new, tp_is_gc, tp_bases, tp_del - nb_add nb_subtract nb_multiply nb_remainder nb_divmod nb_power nb_negative nb_positive nb_absolute nb_bool nb_invert nb_lshift nb_rshift nb_and nb_xor nb_or nb_int nb_float nb_inplace_add nb_inplace_subtract nb_inplace_multiply nb_inplace_remainder nb_inplace_power nb_inplace_lshift nb_inplace_rshift nb_inplace_and nb_inplace_xor nb_inplace_or nb_floor_divide nb_true_divide nb_inplace_floor_divide nb_inplace_true_divide nb_index - sq_length sq_concat sq_repeat sq_item sq_ass_item was_sq_ass_slice sq_contains sq_inplace_concat sq_inplace_repeat - mp_length mp_subscript mp_ass_subscript - bf_getbuffer bf_releasebuffer XXX Not supported yet: tp_weaklistoffset, tp_dictoffset The following fields cannot be set during type definition: - tp_dict tp_mro tp_cache tp_subclasses tp_weaklist Functions and function-like Macros ---------------------------------- All functions starting with _Py are not available to applications. Also, all functions that expect parameter types that are unavailable to applications are excluded from the ABI, such as PyAST_FromNode (which expects a ``node*``). All other functions are available, unless excluded below. 
Function-like macros (in particular, field access macros) remain available to applications, but get replaced by function calls (unless their definition only refers to features of the ABI, such as the various _Check macros). ABI function declarations will not change their parameters or return types. If a change to the signature becomes necessary, a new function will be introduced. If the new function is source-compatible (e.g. if just the return type changes), an alias macro may get added to redirect calls to the new function when the application is recompiled. If continued provision of the old function is not possible, it may get deprecated, then removed, in accordance with PEP 7, causing applications that use that function to break. Excluded Functions ------------------ Functions declared in the following header files are not part of the ABI: - cellobject.h - classobject.h - code.h - frameobject.h - funcobject.h - genobject.h - pyarena.h - pydebug.h - symtable.h - token.h - traceback.h Global Variables ---------------- Global variables representing types and exceptions are available to applications. XXX provide a complete list. XXX should restrict list of globals to truly "builtin" stuff, excluding everything that can also be looked up through imports. XXX may specify access to predefined types and exceptions through the interpreter state, with appropriate Get macros. Other Macros ------------ All macros defining symbolic constants are available to applications; the numeric values will not change. In addition, the following macros are available: - Py_BEGIN_ALLOW_THREADS, Py_BLOCK_THREADS, Py_UNBLOCK_THREADS, Py_END_ALLOW_THREADS Linkage ------- On Windows, applications shall link with python3.dll; an import library python3.lib will be available. This DLL will redirect all of its API functions through /export linker options to the full interpreter DLL, i.e. python3y.dll. XXX is it possible to redirect global variables in the same way? If not, python3.dll would have to copy them, and we should verify that all available global variables are read-only. On Unix systems, the ABI is typically provided by the python executable itself. PyModule_Create is changed to pass ``3`` as the API version if the extension module was compiled with Py_LIMITED_API; the version check for the API version will accept either 3 or the current PYTHON_API_VERSION as conforming. If Python is compiled as a shared library, it is installed as both libpython3.so and libpython3.y.so; applications conforming to this PEP should then link to the former. XXX is it possible to make the soname libpython.so.3, and still have some applications link to libpython3.y.so? Implementation Strategy ======================= This PEP will be implemented in a branch, allowing users to check whether their modules conform to the ABI. To simplify this testing, an additional macro Py_LIMITED_API_WITH_TYPES will expose the existing type object layout, to let users postpone rewriting all types. When this branch is merged into the 3.2 code base, this macro will be removed. Copyright ========= This document has been placed in the public domain. From dirkjan at ochtman.nl Sun May 17 23:47:07 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 17 May 2009 23:47:07 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A107988.3020202@v.loewis.de> References: <4A107988.3020202@v.loewis.de> Message-ID: On Sun, May 17, 2009 at 10:54 PM, "Martin v.
L?wis" wrote: > Excluded Functions > ------------------ > > Functions declared in the following header files are not part > of the ABI: > - cellobject.h > - classobject.h > - code.h > - frameobject.h > - funcobject.h > - genobject.h > - pyarena.h > - pydebug.h > - symtable.h > - token.h > - traceback.h What kind of effect does this have on optimization efforts, for example all the stuff done by Antoine Pitrou over the last few months, and the first few results from unladen? Will it mean we won't get to the good optimizations until 4.0? Or does it just mean unladen swallow takes longer to come back to trunk (until 4.0) and every extension author who wants to be compatible with it will basically have the same burden as now? Cheers, Dirkjan From martin at v.loewis.de Mon May 18 00:07:59 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 18 May 2009 00:07:59 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: References: <4A107988.3020202@v.loewis.de> Message-ID: <4A108ABF.9060909@v.loewis.de> >> Functions declared in the following header files are not part >> of the ABI: >> - cellobject.h >> - classobject.h >> - code.h >> - frameobject.h >> - funcobject.h >> - genobject.h >> - pyarena.h >> - pydebug.h >> - symtable.h >> - token.h >> - traceback.h > > What kind of effect does this have on optimization efforts, for > example all the stuff done by Antoine Pitrou over the last few months, > and the first few results from unladen? I fail to see the relationship, so: no effect that I can see. Why do you think that optimization efforts could be related to the PEP 384 proposal? Regards, Martin From g.brandl at gmx.net Mon May 18 00:17:31 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 18 May 2009 00:17:31 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A107988.3020202@v.loewis.de> References: <4A107988.3020202@v.loewis.de> Message-ID: Martin v. L?wis schrieb: > Header Files and Preprocessor Definitions > ----------------------------------------- > > Applications shall only include the header file Python.h (before > including any system headers), or, optionally, include pyconfig.h, and > then Python.h. What about structmember.h? It's not yet included with Python.h AFAICS. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From dirkjan at ochtman.nl Mon May 18 00:34:52 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 18 May 2009 00:34:52 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A108ABF.9060909@v.loewis.de> References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> Message-ID: On Mon, May 18, 2009 at 12:07 AM, "Martin v. L?wis" wrote: > I fail to see the relationship, so: no effect that I can see. > > Why do you think that optimization efforts could be related to > the PEP 384 proposal? It would seem to me that optimizations are likely to require data structure changes, for exactly the kind of core data structures that you're talking about locking down. But that's just a high-level view, I might be wrong. 
Cheers, Dirkjan From martin at v.loewis.de Mon May 18 00:35:06 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 18 May 2009 00:35:06 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: References: <4A107988.3020202@v.loewis.de> Message-ID: <4A10911A.90704@v.loewis.de> >> Header Files and Preprocessor Definitions >> ----------------------------------------- >> >> Applications shall only include the header file Python.h (before >> including any system headers), or, optionally, include pyconfig.h, and >> then Python.h. > > What about structmember.h? It's not yet included with Python.h AFAICS. Right - I think it should be, though. Is there a reason why it's not included? The only reason I can see is that it isn't completely namespace-safe, e.g. it defines a constant READONLY. Not sure whether the T_ constants would need to be changed as well. So if that's the rationale, I would propose to make it namespace-safe under a different file name, and add alias #defines in structmember.h for compatibility. I also think this should happen independent of PEP 384. See also issue 2897 - perhaps we can even fix it for 3.1. Regards, Martin From solipsis at pitrou.net Mon May 18 00:43:30 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 17 May 2009 22:43:30 +0000 (UTC) Subject: [Python-Dev] PEP 384: Defining a Stable ABI References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> Message-ID: Dirkjan Ochtman ochtman.nl> writes: > > It would seem to me that optimizations are likely to require data > structure changes, for exactly the kind of core data structures that > you're talking about locking down. But that's just a high-level view, > I might be wrong. Unless I'm misunderstanding something, Martin doesn't advocate locking data structures down (except a couple of outliers such as Py_buffer). An ABI-compliant application mustn't tinker directly with Python's data structures, but use the ABI functions. Regards Antoine. From dirkjan at ochtman.nl Mon May 18 00:46:21 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 18 May 2009 00:46:21 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> Message-ID: On Mon, May 18, 2009 at 12:43 AM, Antoine Pitrou wrote: > Unless I'm misunderstanding something, Martin doesn't advocate locking data > structures down (except a couple of outliers such as Py_buffer). An > ABI-compliant application mustn't tinker directly with Python's data structures, > but use the ABI functions. Right. Sorry about the noise, then. Cheers, Dirkjan From martin at v.loewis.de Mon May 18 00:53:00 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 18 May 2009 00:53:00 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> Message-ID: <4A10954C.6070401@v.loewis.de> Dirkjan Ochtman wrote: > On Mon, May 18, 2009 at 12:07 AM, "Martin v. L?wis" wrote: >> I fail to see the relationship, so: no effect that I can see. >> >> Why do you think that optimization efforts could be related to >> the PEP 384 proposal? > > It would seem to me that optimizations are likely to require data > structure changes, for exactly the kind of core data structures that > you're talking about locking down. But that's just a high-level view, > I might be wrong. Ah. 
It's exactly the opposite: The purpose of the PEP is not to lock the data structures down, but to allow more flexible evolution of them - by completely hiding them from extension modules. Currently, any data structure change must be weighed for its impact on binary compatibility. With the PEP, changing structures can be done fairly freely - with the exception of the very few structures that do get locked down. In particular, the list of header files that you quoted precisely contains the structures that can be modified with no impact on the ABI. I'm not aware that any of the structures that I propose to lock would be relevant for optimization - but I might be wrong. If so, I'd like to know, and it would be possible to add accessor functions in cases where extension modules might still legitimately want to access certain fields. Certain changes to the VM would definitely be binary-incompatible, such as removal of reference counting. However, such a change would probably have a much wider effect, breaking not just binary compatibility, but also source compatibility. It would be justified to call a Python release that makes such a change 4.0. Regards, Martin From martin at v.loewis.de Mon May 18 01:04:21 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 18 May 2009 01:04:21 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <1A472770E042064698CB5ADC83A12ACD016E82D6@TK5EX14MBXC116.redmond.corp.microsoft.com> References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> <1A472770E042064698CB5ADC83A12ACD016E82D6@TK5EX14MBXC116.redmond.corp.microsoft.com> Message-ID: <4A1097F5.2050009@v.loewis.de> Dino Viehland wrote: > Dirkjan Ochtman wrote: >> It would seem to me that optimizations are likely to require data >> structure changes, for exactly the kind of core data structures that >> you're talking about locking down. But that's just a high-level view, >> I might be wrong. >> > > > In particular I would guess that ref counting is the biggest issue here. > I would think not directly exposing the field and having inc/dec ref > Functions (real methods, not macros) for it would give a lot more > ability to change the API in the future. In the context of optimization, I'm skeptical that introducing functions for the reference counting would be useful. Making the INCREF/DECREF macros functions just in case the reference counting goes away is IMO an unacceptable performance cost. Instead, such a change should go through the regular deprecation procedure and/or cause the release of Python 4.0. > It also might make it easier for alternate implementations to support > the same API so some modules could work cross implementation - but I > suspect that's a non-goal of this PEP :). Indeed :-) I'm also skeptical that this would actually allow cross-implementation modules to happen. The list of functions that an alternate implementation would have to provide is fairly long. The memory management APIs in particular also assume a certain layout of Python objects in general, namely that they start with a header whose size is a compile-time constant. Again, making this more flexible "just in case" would also impact performance, and probably fairly badly so. > Other fields directly accessed (via macros or otherwise) might have similar > problems but they don't seem as core as ref counting. Access to the type object reference is probably similar. All the other structs are used "directly" in C code, with no accessor macros. 
Regards, Martin From dinov at microsoft.com Mon May 18 00:48:05 2009 From: dinov at microsoft.com (Dino Viehland) Date: Sun, 17 May 2009 22:48:05 +0000 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> Message-ID: <1A472770E042064698CB5ADC83A12ACD016E82D6@TK5EX14MBXC116.redmond.corp.microsoft.com> Dirkjan Ochtman wrote: > > It would seem to me that optimizations are likely to require data > structure changes, for exactly the kind of core data structures that > you're talking about locking down. But that's just a high-level view, > I might be wrong. > In particular I would guess that ref counting is the biggest issue here. I would think not directly exposing the field and having inc/dec ref Functions (real methods, not macros) for it would give a lot more ability to change the API in the future. It also might make it easier for alternate implementations to support the same API so some modules could work cross implementation - but I suspect that's a non-goal of this PEP :). Other fields directly accessed (via macros or otherwise) might have similar problems but they don't seem as core as ref counting. From fuzzyman at voidspace.org.uk Mon May 18 01:53:12 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 18 May 2009 00:53:12 +0100 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1097F5.2050009@v.loewis.de> References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> <1A472770E042064698CB5ADC83A12ACD016E82D6@TK5EX14MBXC116.redmond.corp.microsoft.com> <4A1097F5.2050009@v.loewis.de> Message-ID: <4A10A368.1060406@voidspace.org.uk> Martin v. L?wis wrote: > Dino Viehland wrote: > >> Dirkjan Ochtman wrote: >> >>> It would seem to me that optimizations are likely to require data >>> structure changes, for exactly the kind of core data structures that >>> you're talking about locking down. But that's just a high-level view, >>> I might be wrong. >>> >>> >> In particular I would guess that ref counting is the biggest issue here. >> I would think not directly exposing the field and having inc/dec ref >> Functions (real methods, not macros) for it would give a lot more >> ability to change the API in the future. >> > > In the context of optimization, I'm skeptical that introducing functions > for the reference counting would be useful. Making the INCREF/DECREF > macros functions just in case the reference counting goes away is IMO > an unacceptable performance cost. > > Instead, such a change should go through the regular deprecation > procedure and/or cause the release of Python 4.0. > > >> It also might make it easier for alternate implementations to support >> the same API so some modules could work cross implementation - but I >> suspect that's a non-goal of this PEP :). >> > > Indeed :-) I'm also skeptical that this would actually allow > cross-implementation modules to happen. The list of functions that > an alternate implementation would have to provide is fairly long. > > Just in case you're unaware of it; the company I work for has an open source project called Ironclad. This *is* a reimplementation of the Python C API and gives us binary compatibility with [some subset of] Python C extensions for use from IronPython. http://www.resolversystems.com/documentation/index.php/Ironclad.html It's an ambitious project but it is now at the stage where 1000s of the Numpy and Scipy tests pass when run from IronPython. 
I don't think this PEP impacts the project, but it is not completely unfeasible for the alternative implementations to do this. In particular we have had to address the issue of the GIL and extensions (IronPython has no GIL) and reference counting (which IronPython also doesn't) use. Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From foom at fuhm.net Mon May 18 01:35:59 2009 From: foom at fuhm.net (James Y Knight) Date: Sun, 17 May 2009 19:35:59 -0400 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A107988.3020202@v.loewis.de> References: <4A107988.3020202@v.loewis.de> Message-ID: <64C7B6A8-2273-4A41-9CFA-72B7B1D05361@fuhm.net> On May 17, 2009, at 4:54 PM, Martin v. L?wis wrote: > Currently, each feature release introduces a new name for the > Python DLL on Windows, and may cause incompatibilities for extension > modules on Unix. This PEP proposes to define a stable set of API > functions which are guaranteed to be available for the lifetime > of Python 3, and which will also remain binary-compatible across > versions. Extension modules and applications embedding Python > can work with different feature releases as long as they restrict > themselves to this stable ABI. It seems like a good ideal to strive for. But I think this is too strong a promise. IMO it would be better to say that ABI compatibility across releases is a goal. If someone does make a change that breaks the ABI, I'd expect whomever is proposing it to put forth a fairly strong argument towards why it's a worthwhile change. But it should be possible and allowed, given the right circumstances. Because I think it's pretty much inevitable that it *will* need to happen, sometime. (of course there will need to be ABI tests, so that any potential ABI breakages are known about when they occur) Python is much more defined by its source language than its C extension API, so tying the python major version number to the C ABI might not be the best idea from a "marketing" standpoint. (I can see it now..."Python 4.0 major new features: we changed the C method definition struct layout incompatibly" :) James From martin at v.loewis.de Mon May 18 08:00:57 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 18 May 2009 08:00:57 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A10A368.1060406@voidspace.org.uk> References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> <1A472770E042064698CB5ADC83A12ACD016E82D6@TK5EX14MBXC116.redmond.corp.microsoft.com> <4A1097F5.2050009@v.loewis.de> <4A10A368.1060406@voidspace.org.uk> Message-ID: <4A10F999.5050106@v.loewis.de> >>> It also might make it easier for alternate implementations to support >>> the same API so some modules could work cross implementation - but I >>> suspect that's a non-goal of this PEP :). >>> >> >> Indeed :-) I'm also skeptical that this would actually allow >> cross-implementation modules to happen. The list of functions that >> an alternate implementation would have to provide is fairly long. >> >> > > Just in case you're unaware of it; the company I work for has an open > source project called Ironclad. I was unaware indeed; thanks for pointing this out. IIUC, it's not just an API emulation, but also an ABI emulation. > In particular we have had to address the issue of the GIL and extensions > (IronPython has no GIL) and reference counting (which IronPython also > doesn't) use. 
I think this somewhat strengthens the point I was trying to make: An alternate implementation that tries to be API compatible has to consider so many things that it is questionable whether making Py_INCREF/DECREF functions would be any simplification. So I just ask: a) Would it help IronClad if it could restrict itself to PEP 384 compatible modules? b) Would further restrictions in the PEP help that cause? Regards, Martin From nick at craig-wood.com Mon May 18 10:06:17 2009 From: nick at craig-wood.com (Nick Craig-Wood) Date: Mon, 18 May 2009 09:06:17 +0100 Subject: [Python-Dev] LZW support in tarfile ? In-Reply-To: <4A100FB7.6020200@voidspace.org.uk> References: <94bdd2610905170555y5faff2eav8708ec993d13259e@mail.gmail.com> <4A100FB7.6020200@voidspace.org.uk> Message-ID: <20090518080621.8A32C14C293@irishsea.home.craig-wood.com> Michael Foord wrote: > Antoine Pitrou wrote: > > Tarek Ziad? gmail.com> writes: > > > >> But I was wondering if we should we add a LZW support in tarinfo, > >> besides gzip and bzip2 ? > >> > >> Although this compression standard doesn't seem very used these days, > >> > > > > It would be more useful to add LZMA / xz support. > > I don't think compress is used anymore, except perhaps on old legacy systems. > > On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no > > .Z file. > > I've seen the occasional .Z file in recent years, but never that I > recall for a Python package. On my unix filesystem (which has files stretching back over 20 years) I find only two .Z files, one dated 1989 and one 2002. I think you can safely say that compress is gone! The worst you are doing by removing compress support is getting the user of some ancient platform to download one of the binaries here first. http://www.gzip.org/#exe > As plugging in external compression tools is less likely to work > cross-platform wouldn't it be both easier and better to deprecate (and > not replace) the compress support. Agreed. -- Nick Craig-Wood -- http://www.craig-wood.com/nick From ziade.tarek at gmail.com Mon May 18 10:27:58 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Mon, 18 May 2009 10:27:58 +0200 Subject: [Python-Dev] LZW support in tarfile ? In-Reply-To: <20090518080621.8A32C14C293@irishsea.home.craig-wood.com> References: <94bdd2610905170555y5faff2eav8708ec993d13259e@mail.gmail.com> <4A100FB7.6020200@voidspace.org.uk> <20090518080621.8A32C14C293@irishsea.home.craig-wood.com> Message-ID: <94bdd2610905180127x2ae625fbi8800c7a392ef0de5@mail.gmail.com> Ok thanks for all the feedback, I'll remove compress support Tarek On Mon, May 18, 2009 at 10:06 AM, Nick Craig-Wood wrote: > Michael Foord wrote: >> ?Antoine Pitrou wrote: >> > Tarek Ziad? gmail.com> writes: >> > >> >> But I was wondering if we should we add a LZW support in tarinfo, >> >> besides gzip and bzip2 ? >> >> >> >> Although this compression standard doesn't seem very used these days, >> >> >> > >> > It would be more useful to add LZMA / xz support. >> > I don't think compress is used anymore, except perhaps on old legacy systems. >> > On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no >> > .Z file. >> >> ?I've seen the occasional .Z file in recent years, but never that I >> ?recall for a Python package. > > On my unix filesystem (which has files stretching back over 20 years) > I find only two .Z files, one dated 1989 and one 2002. ?I think you > can safely say that compress is gone! 
> > The worst you are doing by removing compress support is getting the > user of some ancient platform to download one of the binaries here > first. > > ?http://www.gzip.org/#exe > >> ?As plugging in external compression tools is less likely to work >> ?cross-platform wouldn't it be both easier and better to deprecate (and >> ?not replace) the compress support. > > Agreed. > > -- > Nick Craig-Wood -- http://www.craig-wood.com/nick > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ziade.tarek%40gmail.com > -- Tarek Ziad? | http://ziade.org From fuzzyman at voidspace.org.uk Mon May 18 13:17:37 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 18 May 2009 12:17:37 +0100 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A10F999.5050106@v.loewis.de> References: <4A107988.3020202@v.loewis.de> <4A108ABF.9060909@v.loewis.de> <1A472770E042064698CB5ADC83A12ACD016E82D6@TK5EX14MBXC116.redmond.corp.microsoft.com> <4A1097F5.2050009@v.loewis.de> <4A10A368.1060406@voidspace.org.uk> <4A10F999.5050106@v.loewis.de> Message-ID: <4A1143D1.1050806@voidspace.org.uk> Martin v. L?wis wrote: >>>> It also might make it easier for alternate implementations to support >>>> the same API so some modules could work cross implementation - but I >>>> suspect that's a non-goal of this PEP :). >>>> >>>> >>> Indeed :-) I'm also skeptical that this would actually allow >>> cross-implementation modules to happen. The list of functions that >>> an alternate implementation would have to provide is fairly long. >>> >>> >>> >> Just in case you're unaware of it; the company I work for has an open >> source project called Ironclad. >> > > I was unaware indeed; thanks for pointing this out. > > IIUC, it's not just an API emulation, but also an ABI emulation. > > Correct. >> In particular we have had to address the issue of the GIL and extensions >> (IronPython has no GIL) and reference counting (which IronPython also >> doesn't) use. >> > > I think this somewhat strengthens the point I was trying to make: An > alternate implementation that tries to be API compatible has to consider > so many things that it is questionable whether making Py_INCREF/DECREF > functions would be any simplification. > It would actually have been helpful for us, but I understand that it would be a big performance hit. The Ironclad garbage collection mechanism is described here: http://www.voidspace.org.uk/python/weblog/arch_d7_2009_01_24.shtml#e1055 We artificially inflate the refcount of all objects that Ironclad creates to 2 and hold a reference to them on the .NET side to make them ineligible for garbage collection. Because we can't always know when objects have been decreffed back down to 1, there are some circumstances when we have to scan all the objects we are holding onto. If their refcount is only 1 then we no longer need to hold a reference them. When nothing is using them on the IronPython side either normal .NET garbage collection kicks in and the IronPython proxy object has a destructor that calls back into Ironclad and uses the CPython dealloc method. > So I just ask: > a) Would it help IronClad if it could restrict itself to PEP 384 > compatible modules? > b) Would further restrictions in the PEP help that cause? > I've forwarded these questions to the lead developer of Ironclad (William Reade) along with a link to the PEP. 
He isn't on Python-dev so I may have to be a proxy for him in discussion. His initial response was "looks pretty sweet". Michael > Regards, > Martin > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From william at resolversystems.com Tue May 19 11:09:58 2009 From: william at resolversystems.com (William Reade) Date: Tue, 19 May 2009 10:09:58 +0100 Subject: [Python-Dev] [Fwd: Re: PEP 384: Defining a Stable ABI] In-Reply-To: <4A1141B4.4090608@voidspace.org.uk> References: <4A1141B4.4090608@voidspace.org.uk> Message-ID: <4A127766.8000101@resolversystems.com> My perspective is as follows: 1) If PEP-384 had always been in place, my life would now be a lot easier. 2) Since it hasn't always been in place, its introduction won't help me in the short term: there are an awful lot of extension modules that use excluded functions (for example, all(?) PyCxx modules use PyCode_New and PyFrame_New to get nicer tracebacks), and I'll still have to handle all these cases until everyone is up-to-date with whatever version of Python this gets accepted into. 3) Regardless, this PEP makes me very happy, because I can now look forward to the glorious day when all extension modules are 384-compatible (and even *some* modules becoming compatible will make me pretty happy). However, I'm not sure exactly how we can get there from here; I suspect that certain features of certain extensions already depend critically upon implementation details which will become hidden. The most extreme illustrative example I know is from NumPy (in scalarmathmodule.c), and looks like this: PyInt_Type.tp_as_number = PyLongArrType_Type.tp_as_number; PyInt_Type.tp_compare = PyLongArrType_Type.tp_compare; PyInt_Type.tp_richcompare = PyLongArrType_Type.tp_richcompare; ...and I fear that many many similar (if perhaps less frightening) dependencies exist elsewhere. Regardless, in answer to the two specific questions you ask: a) We don't really have that option. However, I would have a much higher degree of confidence in running PEP-384-compatible modules under Ironclad than I do with current modules, simply because I would no longer need to worry about (say) edge cases in which extension writers suddenly try to directly access op->ob_type->tp_as_number->nb_power. b) I can't think of any more useful restrictions. The PEP would solve my biggest current worry, which is that my current implementation allows managed/unmanaged lists to fall out of sync in certain circumstances (but if every list mutation happened via an API call, it wouldn't be an issue). Best Regards William Michael Foord wrote: > The questions from Martin v. Lowis are in the email below. > > The PEP under discussion is: > > http://www.python.org/dev/peps/pep-0384/ > > I can proxy any replies you want to send, or you can join Python-dev. > > All the best, > > Michael > > -------- Original Message -------- > Subject: Re: [Python-Dev] PEP 384: Defining a Stable ABI > Date: Mon, 18 May 2009 08:00:57 +0200 > From: "Martin v. 
L?wis" > To: Michael Foord > CC: Dino Viehland , Python-Dev > , Unladen Swallow > , Python List > References: <4A107988.3020202 at v.loewis.de> > > <4A108ABF.9060909 at v.loewis.de> > > <1A472770E042064698CB5ADC83A12ACD016E82D6 at TK5EX14MBXC116.redmond.corp.microsoft.com> > <4A1097F5.2050009 at v.loewis.de> <4A10A368.1060406 at voidspace.org.uk> > > > > >>> It also might make it easier for alternate implementations to support > >>> the same API so some modules could work cross implementation - but I > >>> suspect that's a non-goal of this PEP :). > >>> > >> > >> Indeed :-) I'm also skeptical that this would actually allow > >> cross-implementation modules to happen. The list of functions that > >> an alternate implementation would have to provide is fairly long. > >> > >> > > > > Just in case you're unaware of it; the company I work for has an open > > source project called Ironclad. > > I was unaware indeed; thanks for pointing this out. > > IIUC, it's not just an API emulation, but also an ABI emulation. > > > In particular we have had to address the issue of the GIL and extensions > > (IronPython has no GIL) and reference counting (which IronPython also > > doesn't) use. > > I think this somewhat strengthens the point I was trying to make: An > alternate implementation that tries to be API compatible has to consider > so many things that it is questionable whether making Py_INCREF/DECREF > functions would be any simplification. > > So I just ask: > a) Would it help IronClad if it could restrict itself to PEP 384 > compatible modules? > b) Would further restrictions in the PEP help that cause? > > Regards, > Martin > > > -- > http://www.ironpythoninaction.com/ > http://www.voidspace.org.uk/blog > > From ronaldoussoren at mac.com Tue May 19 14:59:31 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 19 May 2009 14:59:31 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <4A100B46.2080003@mrabarnett.plus.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <4A0F100F.3070406@g.nevcal.com> <4A0F3B12.8080500@mrabarnett.plus.com> <1242535959.4478.12.camel@jenner> <4A100B46.2080003@mrabarnett.plus.com> Message-ID: <06B4F3E6-A8BD-4B4A-87C8-88B52B00F7EF@mac.com> On 17 May, 2009, at 15:04, MRAB wrote: > Alexander Shigin wrote: >> ? ???, 16/05/2009 ? 23:15 +0100, MRAB ?????: >>> FYI, on RISC OS '/' is a valid filename character and '.' is used as >>> the directory separator. >>> >>> I'd probably say that TAB is s reasonable character to use, even >>> though it's OK in POSIX; after all, should anyone really be using a >>> control character in a filename? >> The '\0' char is invalid in both windows and posix. I don't know if >> one >> valid on RISC OS. > '\0' isn't a valid filename character on RISC OS. Wouldn't it be possible to use a CSV file for this? That way we wouldn't have to invent yet another escaping mechanism and there's already good suppport for reading and writing CSV files in the standard library. Ronald -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2224 bytes Desc: not available URL: From solipsis at pitrou.net Tue May 19 16:03:12 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 19 May 2009 14:03:12 +0000 (UTC) Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <4A0F100F.3070406@g.nevcal.com> <4A0F3B12.8080500@mrabarnett.plus.com> <1242535959.4478.12.camel@jenner> <4A100B46.2080003@mrabarnett.plus.com> <06B4F3E6-A8BD-4B4A-87C8-88B52B00F7EF@mac.com> Message-ID: Ronald Oussoren mac.com> writes: > > Wouldn't it be possible to use a CSV file for this? That way we > wouldn't have to invent yet another escaping mechanism and there's > already good suppport for reading and writing CSV files in the > standard library. +1 We can even customize the delimiter if you want to make it more readable (or if there's a shortage of bikeshed material ;-)). cheers Antoine. From ziade.tarek at gmail.com Tue May 19 16:04:21 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 19 May 2009 16:04:21 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090516165302.C71643A4061@sparrow.telecommunity.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> Message-ID: <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> On Sat, May 16, 2009 at 6:55 PM, P.J. Eby wrote: > > 1. Why ';' separation, instead of tabs as in PEP 262? ?Aren't semicolons a > valid character in filenames? I am changing this into a . for now. What about Antoine's idea about doing a quote() on the names ? >From my point of view seems more simple to deal with, if 3rd-party tools want to work on these files without using pkgutil or Python. > > 4. There should probably be a way to iterate over the projects in a > directory, since it's otherwise impossible for an installation tool to find > out what project(s) "own" a file that conflicts with something being > installed. ?Alternatively, reshaping the file API to allow querying by path > as well as by project might work. I am adding a "get_projects" api: get_projects() -> iterator Provides an iterator that will return (name, path) tuples, where `name` is the name of a registered project and `path` the path to its `egg-info` directory. But for the use case you are mentioning, what about an explicit API: get_owners(paths) -> sequence of project names returns a sequence of tuple. For each path in the "paths" list, a tuple of project names is returned > > 5. If any cache mechanisms are to be used by the API, the API *must* make it > possible to bypass or explicitly manage that cache, as otherwise > installation tools and tools that manipulate sys.path at runtime may end up > using incorrect data. work in progress - (I am afraid I have to write an advanced prototype to be able to know exaclty how the cache might work, and so, what API we should have) > > 6. get_files() doesn't document whether the yielded paths are absolute or > relative, local or cross-platform, etc. 
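A short sketch of what a csv-based RECORD could look like with that default dialect; the (path, checksum, size) field layout is only an illustration, not the PEP's final field list:

    import csv

    entries = [
        ("docutils/__init__.py", "md5:a1b2c3d4", "1307"),
        ("docutils/odd,name.py", "md5:e5f6a7b8", "42"),   # ',' in the name
    ]

    # 'wb'/'rb' for the 2.x csv module; on py3k open with 'w'/'r' and newline=''
    with open("RECORD", "wb") as f:
        csv.writer(f).writerows(entries)

    with open("RECORD", "rb") as f:
        for path, checksum, size in csv.reader(f):
            print(path)

The writer quotes the second path automatically, so a ',' in a filename round-trips without any hand-rolled escaping.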
I am fixing this as well >> I need to find back your comments for this part, I must have missed >> them. That's >> the last part I didn't work out yet on the current PEP revision. > > Well, if you can't find them, the EggFormats doc explains how these file/dir > structures are currently laid out by setuptools, easy_install, pip, etc., > and the PEP should probably reference that. work in progress Tarek -- Tarek Ziad? | http://ziade.org From ziade.tarek at gmail.com Tue May 19 16:12:05 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 19 May 2009 16:12:05 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <4A0F0326.7070501@g.nevcal.com> <20090516185556.D8B5F3A4061@sparrow.telecommunity.com> <4A0F100F.3070406@g.nevcal.com> <4A0F3B12.8080500@mrabarnett.plus.com> <1242535959.4478.12.camel@jenner> <4A100B46.2080003@mrabarnett.plus.com> <06B4F3E6-A8BD-4B4A-87C8-88B52B00F7EF@mac.com> Message-ID: <94bdd2610905190712h2ac8883fs7fc85224a2fa3ff6@mail.gmail.com> On Tue, May 19, 2009 at 4:03 PM, Antoine Pitrou wrote: > Ronald Oussoren mac.com> writes: >> >> Wouldn't it be possible to use a CSV file for this? That way we >> wouldn't have to invent yet another escaping mechanism and there's >> already good suppport for reading and writing CSV files in the >> standard library. > > +1 > > We can even customize the delimiter if you want to make it more readable (or if > there's a shortage of bikeshed material ;-)). +1 and the default csv delimiter "," makes it perfectly readable From google at mrabarnett.plus.com Tue May 19 16:21:25 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 19 May 2009 15:21:25 +0100 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> Message-ID: <4A12C065.3000705@mrabarnett.plus.com> Tarek Ziad? wrote: > On Sat, May 16, 2009 at 6:55 PM, P.J. Eby wrote: >> 1. Why ';' separation, instead of tabs as in PEP 262? Aren't semicolons a >> valid character in filenames? > > I am changing this into a . for now. > > What about Antoine's idea about doing a quote() on the names ? > >>From my point of view seems more simple to deal with, if 3rd-party > tools want to work on these files without using pkgutil or Python. > >> 4. There should probably be a way to iterate over the projects in a >> directory, since it's otherwise impossible for an installation tool to find >> out what project(s) "own" a file that conflicts with something being >> installed. Alternatively, reshaping the file API to allow querying by path >> as well as by project might work. > > I am adding a "get_projects" api: > > get_projects() -> iterator > > Provides an iterator that will return (name, path) tuples, where `name` > is the name of a registered project and `path` the path to its `egg-info` > directory. > > But for the use case you are mentioning, what about an explicit API: > > get_owners(paths) -> sequence of project names > > returns a sequence of tuple. 
For each path in the "paths" list, a > tuple of project names > is returned > >> 5. If any cache mechanisms are to be used by the API, the API *must* make it >> possible to bypass or explicitly manage that cache, as otherwise >> installation tools and tools that manipulate sys.path at runtime may end up >> using incorrect data. > > work in progress - (I am afraid I have to write an advanced prototype > to be able to know > exaclty how the cache might work, and so, what API we should have) > >> 6. get_files() doesn't document whether the yielded paths are absolute or >> relative, local or cross-platform, etc. > > I am fixing this as well > > >>> I need to find back your comments for this part, I must have missed >>> them. That's >>> the last part I didn't work out yet on the current PEP revision. >> Well, if you can't find them, the EggFormats doc explains how these file/dir >> structures are currently laid out by setuptools, easy_install, pip, etc., >> and the PEP should probably reference that. > > work in progress > Is it Pythonic for the methods to starts with "get_", or should they be projects(), owners(), etc? From p.f.moore at gmail.com Tue May 19 21:33:50 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 19 May 2009 20:33:50 +0100 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> Message-ID: <79990c6b0905191233p7565929ft4bcc90ea29e88b2f@mail.gmail.com> 2009/5/19 Tarek Ziad? : > On Sat, May 16, 2009 at 6:55 PM, P.J. Eby wrote: >> >> 1. Why ';' separation, instead of tabs as in PEP 262? ?Aren't semicolons a >> valid character in filenames? > > I am changing this into a . for now. I'm not following this thread at all, but can I put a strong vote *against* tabs in, please. You're just asking for bug reports from people who edit the file and expand tabs to spaces (either deliberately, or via an automatic editor setting they forgot about) and then can't see why a file that looks the same works differently. OK, so it's not meant to be a human editable file, but that won't stop some people :-) Paul From pje at telecommunity.com Tue May 19 22:36:40 2009 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 19 May 2009 16:36:40 -0400 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.co m> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> Message-ID: <20090519203357.CE9C63A40D7@sparrow.telecommunity.com> At 04:04 PM 5/19/2009 +0200, Tarek Ziad? wrote: >On Sat, May 16, 2009 at 6:55 PM, P.J. Eby wrote: > > > > 1. Why ';' separation, instead of tabs as in PEP 262? Aren't semicolons a > > valid character in filenames? > >I am changing this into a . for now. > >What about Antoine's idea about doing a quote() on the names ? I like the CSV idea better, since the csv module is available in 2.3 and up. We should just pick a dialect with unambiguous quoting rules. 
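Something like this would be enough to pin the format down (a rough sketch only -- the RECORD file name and the (path, checksum, size) row layout are just illustrative here, not something the PEP specifies):

    import csv

    # Quote every field so a delimiter inside a file name can never be ambiguous.
    _CSV_OPTIONS = dict(delimiter=',', quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator='\n')

    def write_record(record_path, entries):
        # entries: an iterable of (path, checksum, size) tuples
        f = open(record_path, 'wb')
        try:
            writer = csv.writer(f, **_CSV_OPTIONS)
            for entry in entries:
                writer.writerow(entry)
        finally:
            f.close()

    def read_record(record_path):
        f = open(record_path, 'rb')
        try:
            return [tuple(row) for row in csv.reader(f, **_CSV_OPTIONS)]
        finally:
            f.close()

Any third-party tool, in any language, can then read the file back with a stock CSV parser, which is rather the point.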
> From my point of view seems more simple to deal with, if 3rd-party >tools want to work on these files without using pkgutil or Python. True, but then CSV files are still pretty common. One other possibility that might work is using a vertical bar as a separator. My preference rank at the moment is probably tabs, CSV, or vertical bar. But I don't really care all that much, so let the people who care decide. Personally, though, I don't see much point to cross-language manipulation of the file. System packaging tools have their own way of keeping track of this stuff. So unless somebody's using it to *build* system packages (e.g. making an RPM builder), they don't need this. Now, about the APIs... > > 4. There should probably be a way to iterate over the projects in a > > directory, since it's otherwise impossible for an installation tool to find > > out what project(s) "own" a file that conflicts with something being > > installed. Alternatively, reshaping the file API to allow querying by path > > as well as by project might work. > >I am adding a "get_projects" api: > > get_projects() -> iterator > > Provides an iterator that will return (name, path) tuples, where `name` > is the name of a registered project and `path` the path to its `egg-info` > directory. > >But for the use case you are mentioning, what about an explicit API: > > get_owners(paths) -> sequence of project names > > returns a sequence of tuple. For each path in the "paths" list, a >tuple of project names > is returned > > > > > 5. If any cache mechanisms are to be used by the API, the API > *must* make it > > possible to bypass or explicitly manage that cache, as otherwise > > installation tools and tools that manipulate sys.path at runtime may end up > > using incorrect data. > >work in progress - (I am afraid I have to write an advanced prototype >to be able to know >exaclty how the cache might work, and so, what API we should have) I think it would be simpler to have explicit object types representing things like a directory, a collection of directories, and individual projects, and these object types should be part of the API. Any function-oriented API should just be exposed as the methods of a default singleton. Other Python modules follow this pattern -- and it's what I copied for the pkg_resources design. It gives a nice tradeoff between keeping the simple things simple, and complex things possible, as well as keeping mechanism and policy separate. Right now, the API design you're trying to do is being burdened by using strings and tuples to represent things that could just as easily be objects with their own methods, instead of things you have to pass back into other APIs. This also makes caching more complex, because you can't just have one main object with stuff hanging off; you've got to have a bunch of dictionaries, tuples, lists, sets, etc. From fuzzyman at voidspace.org.uk Wed May 20 00:48:42 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 19 May 2009 23:48:42 +0100 Subject: [Python-Dev] IronPython specific code in inspect module Message-ID: <4A13374A.4060404@voidspace.org.uk> Hello all, The inspect module (inspect.get_argspec etc) work fine for Python functions and classes in IronPython, but they don't work on .NET types which don't have the Python function attributes like im_func etc. I have IronPython specific versions of several of these functions which use .NET reflection and inspect could fallback to if sys.platform == 'cli'. Would it be ok for me to add these to the inspect module? 
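The glue in inspect itself would be very small -- roughly along these lines (just a sketch: _argspec_from_reflection is a made-up name standing in for the .NET reflection code, not what my functions are actually called):

    import sys
    import inspect

    def _argspec_from_reflection(func):
        # Stand-in for the IronPython-specific code that walks the .NET
        # MethodInfo overloads; illustration only.
        raise NotImplementedError("needs IronPython / .NET reflection")

    def getargspec(func):
        try:
            # the existing pure-Python path
            return inspect.getargspec(func)
        except TypeError:
            if sys.platform == 'cli':
                return _argspec_from_reflection(func)
            raise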
Obviously the tests would only run on IronPython... The behaviour for CPython would be unaffected. All the best, Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From benjamin at python.org Wed May 20 03:26:47 2009 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 19 May 2009 20:26:47 -0500 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: <4A13374A.4060404@voidspace.org.uk> References: <4A13374A.4060404@voidspace.org.uk> Message-ID: <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> 2009/5/19 Michael Foord : > I have IronPython specific versions of several of these functions which use > .NET reflection and inspect could fallback to if sys.platform == 'cli'. > Would it be ok for me to add these to the inspect module? Obviously the > tests would only run on IronPython... The behaviour for CPython would be > unaffected. I wish we had more of a policy about this. There seems to be a long tradition of special casing other implementations in the stdlib. For example, see types.py and tests/test_support.py for remnants of Jython compatibility. However, I suspect this code has languished with out core-developers using the trunk stdlib with Jython. I suppose this is a good reason why we are going to split the stdlib out of the main repo. However that still leaves the question of how to handle putting code like this in. Should we ask that all code be implementation-independent as much as possible from the original authors? Do all all changes against the stdlib have to be run against several implementations? Should we sprinkle if switches all over the codebase for different implementations, or should new support files be added? -- Regards, Benjamin From fijall at gmail.com Wed May 20 04:09:03 2009 From: fijall at gmail.com (Maciej Fijalkowski) Date: Tue, 19 May 2009 20:09:03 -0600 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> References: <4A13374A.4060404@voidspace.org.uk> <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> Message-ID: <693bc9ab0905191909y1cb7183dna364cb8f3ee30626@mail.gmail.com> On Tue, May 19, 2009 at 7:26 PM, Benjamin Peterson wrote: > 2009/5/19 Michael Foord : >> I have IronPython specific versions of several of these functions which use >> .NET reflection and inspect could fallback to if sys.platform == 'cli'. >> Would it be ok for me to add these to the inspect module? Obviously the >> tests would only run on IronPython... The behaviour for CPython would be >> unaffected. > > I wish we had more of a policy about this. There seems to be a long > tradition of special casing other implementations in the stdlib. For > example, see types.py and tests/test_support.py for remnants of Jython > compatibility. However, I suspect this code has languished with out > core-developers using the trunk stdlib with Jython. I suppose this is > a good reason why we are going to split the stdlib out of the main > repo. > > However that still leaves the question of how to handle putting code > like this in. Should we ask that all code be > implementation-independent as much as possible from the original > authors? Do all all changes against the stdlib have to be run against > several implementations? Should we sprinkle if switches all over the > codebase for different implementations, or should new support files be > added? > >From my observation (mostly according to jython), such changes easily get out of sync. 
The net result is that you have one, outdated, version in stdlib and other implementation, like IronPython is maintaining it's own anyway. IMO it's easy enough to maintain clearly implementation-specific parts out of cpython's stdlib. What I would rather like to see is that stdlib does not contain impl specific parts, even for cpython and cpython maintains it's own things outside of stdlib. This would be in line with what we discussed at pycon I think, please correct me if I'm wrong. Cheers, fijal From dstanek at dstanek.com Wed May 20 04:21:27 2009 From: dstanek at dstanek.com (David Stanek) Date: Tue, 19 May 2009 22:21:27 -0400 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> References: <4A13374A.4060404@voidspace.org.uk> <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> Message-ID: On Tue, May 19, 2009 at 9:26 PM, Benjamin Peterson wrote: > 2009/5/19 Michael Foord : >> I have IronPython specific versions of several of these functions which use >> .NET reflection and inspect could fallback to if sys.platform == 'cli'. >> Would it be ok for me to add these to the inspect module? Obviously the >> tests would only run on IronPython... The behaviour for CPython would be >> unaffected. > > I wish we had more of a policy about this. There seems to be a long > tradition of special casing other implementations in the stdlib. For > example, see types.py and tests/test_support.py for remnants of Jython > compatibility. However, I suspect this code has languished with out > core-developers using the trunk stdlib with Jython. I suppose this is > a good reason why we are going to split the stdlib out of the main > repo. > > However that still leaves the question of how to handle putting code > like this in. Should we ask that all code be > implementation-independent as much as possible from the original > authors? Do all all changes against the stdlib have to be run against > several implementations? Should we sprinkle if switches all over the > codebase for different implementations, or should new support files be > added? > It seems that using a technique similar to dependency injection could provide some value. DI allows implementations conforming to some interface to be injected into a running application without the messy construction logic. The simple construction-by-hand pattern is to create the dependencies and pass them into the dependent objects. Frameworks build on top of this to allow the dependencies to be wired together without having any construction logic in code, like switch statements, to do the wiring. I think a similar pattern could be used in the standard library. When the interpreter goes through its normal bootstrapping process in can just execute a module provided by the vendor that specifies the platform specific implementations. Some defaults can be provided since Python already has a bunch of platform specific implementations. An over simplified design to make this happen may look like: 1. Create a simple configuration that allows a mapping of interfaces to implementations. This is where the vendor would say when using inspect you really should be using cli.inspect. 2. Add executing this new configuration to the bootstrapping process. 3. Add generic hooks into the library where needed to load the dependency instead of platform specific if statements. 4. 
Rip out the platform specific code that is hidden in the if statements and use that as the basis for the sane injected defaults. 5. Document the interfaces for each component that can be changed by the vendor. -- David blog: http://www.traceback.org twitter: http://twitter.com/dstanek From benjamin at python.org Wed May 20 04:26:55 2009 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 19 May 2009 21:26:55 -0500 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: <693bc9ab0905191909y1cb7183dna364cb8f3ee30626@mail.gmail.com> References: <4A13374A.4060404@voidspace.org.uk> <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> <693bc9ab0905191909y1cb7183dna364cb8f3ee30626@mail.gmail.com> Message-ID: <1afaf6160905191926l17e73e92ic29250d9407dd075@mail.gmail.com> 2009/5/19 Maciej Fijalkowski : > From my observation (mostly according to jython), such changes easily get out of > sync. The net result is that you have one, outdated, version in stdlib > and other implementation, like IronPython is maintaining it's own > anyway. IMO it's easy enough > to maintain clearly implementation-specific parts out of cpython's stdlib. Hopefully, it will be easier to visualize how this might work once the plan for hg migration is finalized. > > What I would rather like to see is that stdlib does not contain impl > specific parts, > even for cpython and cpython maintains it's own things outside of stdlib. This > would be in line with what we discussed at pycon I think, please correct me if > I'm wrong. I was not present, but that's my impression, too. -- Regards, Benjamin From dinov at microsoft.com Wed May 20 04:36:13 2009 From: dinov at microsoft.com (Dino Viehland) Date: Wed, 20 May 2009 02:36:13 +0000 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: <4A13374A.4060404@voidspace.org.uk> References: <4A13374A.4060404@voidspace.org.uk> Message-ID: <1A472770E042064698CB5ADC83A12ACD0287ED45@TK5EX14MBXC120.redmond.corp.microsoft.com> Michael Foord wrote: > I have IronPython specific versions of several of these functions which > use .NET reflection and inspect could fallback to if sys.platform == > 'cli'. Would it be ok for me to add these to the inspect module? > Obviously the tests would only run on IronPython... The behaviour for > CPython would be unaffected. What about instead defining __argspec__ for built-in functions/method objects and allowing all the implementations to implement it? We could all agree to return: [ (return_type, (arg_types,...)), (return_type, (arg_types,...)), ] Then inspect can check for that attribute and support introspection on built-ins. This would be an easy feature for us to implement and it may also be for Jython as well given that we both get the power of our platforms reflection capabilities. Any platform that implements it lights up w/o new platform specific code. 
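On the inspect side the consuming code would then be close to trivial -- something like this (sketch only: nothing defines __argspec__ today, and the helper name here is invented):

    import inspect

    def getbuiltinargspec(func):
        # __argspec__, if present, is the proposed list of
        # (return_type, (arg_types, ...)) tuples, one per overload.
        overloads = getattr(func, '__argspec__', None)
        if overloads is not None:
            return list(overloads)
        # otherwise fall back to the usual pure-Python introspection
        return inspect.getargspec(func)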
And maybe this needs to go to python-ideas now :)

From ajaksu at gmail.com Wed May 20 05:18:34 2009 From: ajaksu at gmail.com (Daniel Diniz) Date: Wed, 20 May 2009 00:18:34 -0300 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: <1A472770E042064698CB5ADC83A12ACD0287ED45@TK5EX14MBXC120.redmond.corp.microsoft.com> References: <4A13374A.4060404@voidspace.org.uk> <1A472770E042064698CB5ADC83A12ACD0287ED45@TK5EX14MBXC120.redmond.corp.microsoft.com> Message-ID: <2d75d7660905192018h209d27cfsc4a4e74ad9fd1ece@mail.gmail.com> Dino Viehland wrote: > What about instead defining __argspec__ for built-in functions/method > objects and allowing all the implementations to implement it? We could > all agree to return: > > [ > (return_type, (arg_types,...)), > (return_type, (arg_types,...)), > ] > > Then inspect can check for that attribute and support introspection on > built-ins. This would be an easy feature for us to implement and it > may also be for Jython as well given that we both get the power of our > platforms reflection capabilities. Any platform that implements it > lights up w/o new platform specific code. And maybe this needs to go > to python-ideas now :) Curiously, inspect limitations on CPython (can't inspect functools.partial, has issues with some descriptors and decorators) got us chatting about PEP 362: Function Signature Object[0] on #python-dev today. PEP 362 was also brought up in a recent thread where the executive summary was 'it just needs someone to guide it through the last steps'[1], and it would make this kind of introspection nice and clean[2]. It makes even more sense now we have PEP 3107: Function Annotations[3] in place. Cheers, Daniel [0] http://www.python.org/dev/peps/pep-0362/ [1] http://mail.python.org/pipermail/python-dev/2009-April/088517.html [2] http://mail.python.org/pipermail/python-dev/2009-April/088597.html [3] http://www.python.org/dev/peps/pep-3107/#parameters

From chrispl78 at yahoo.com Wed May 20 09:31:00 2009 From: chrispl78 at yahoo.com (Chris Plasun) Date: Wed, 20 May 2009 00:31:00 -0700 Subject: [Python-Dev] Python on PowerPC? Message-ID: <4A13B1B4.5090705@yahoo.com> Hi, I'm to develop console apps on a Linux embedded PowerPC board (Freescale MPC8313). Is there a Python release for the PowerPC platform? Thanks, Chris Plasun

From ziade.tarek at gmail.com Wed May 20 11:48:59 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 20 May 2009 11:48:59 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <20090519203357.CE9C63A40D7@sparrow.telecommunity.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> <94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> <20090519203357.CE9C63A40D7@sparrow.telecommunity.com> Message-ID: <94bdd2610905200248k79d15f2ahf5aa036ac1e7a80d@mail.gmail.com> On Tue, May 19, 2009 at 10:36 PM, P.J. Eby wrote: > > Now, about the APIs... > > I think it would be simpler to have explicit object types representing > things like a directory, a collection of directories, and individual > projects, and these object types should be part of the API. > > Any function-oriented API should just be exposed as the methods of a default > singleton. Other Python modules follow this pattern -- and it's what I > copied for the pkg_resources design. 
?It gives a nice tradeoff between > keeping the simple things simple, and complex things possible, as well as > keeping mechanism and policy separate. > > Right now, the API design you're trying to do is being burdened by using > strings and tuples to represent things that could just as easily be objects > with their own methods, instead of things you have to pass back into other > APIs. ?This also makes caching more complex, because you can't just have one > main object with stuff hanging off; you've got to have a bunch of > dictionaries, tuples, lists, sets, etc. I don't know how other people work on building APIs in PEPs, but at this stage I am unable to work them on the paper, without having a prototype to try things out. So I guess I'll start this prototype in bitbucket and come back with it for feedback in Distutils-SIG, for a new PEP 376 round. Tarek -- Tarek Ziad? | http://ziade.org From doug.hellmann at gmail.com Wed May 20 13:13:44 2009 From: doug.hellmann at gmail.com (Doug Hellmann) Date: Wed, 20 May 2009 07:13:44 -0400 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: References: <4A13374A.4060404@voidspace.org.uk> <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> Message-ID: <54265478-74A9-4105-B4AD-828C795729DD@gmail.com> On May 19, 2009, at 10:21 PM, David Stanek wrote: > On Tue, May 19, 2009 at 9:26 PM, Benjamin Peterson > wrote: >> 2009/5/19 Michael Foord : >>> I have IronPython specific versions of several of these functions >>> which use >>> .NET reflection and inspect could fallback to if sys.platform == >>> 'cli'. >>> Would it be ok for me to add these to the inspect module? >>> Obviously the >>> tests would only run on IronPython... The behaviour for CPython >>> would be >>> unaffected. [...] >> However that still leaves the question of how to handle putting code >> like this in. Should we ask that all code be >> implementation-independent as much as possible from the original >> authors? Do all all changes against the stdlib have to be run against >> several implementations? Should we sprinkle if switches all over the >> codebase for different implementations, or should new support files >> be >> added? >> > > It seems that using a technique similar to dependency injection could > provide some value. DI allows implementations conforming to some > interface to be injected into a running application without the messy > construction logic. The simple construction-by-hand pattern is to > create the dependencies and pass them into the dependent objects. > Frameworks build on top of this to allow the dependencies to be wired > together without having any construction logic in code, like switch > statements, to do the wiring. > > I think a similar pattern could be used in the standard library. When > the interpreter goes through its normal bootstrapping process in can > just execute a module provided by the vendor that specifies the > platform specific implementations. Some defaults can be provided since > Python already has a bunch of platform specific implementations. > > An over simplified design to make this happen may look like: > 1. Create a simple configuration that allows a mapping of interfaces > to implementations. This is where the vendor would say when using > inspect you really should be using cli.inspect. That sounds like a plugin and the "strategy" pattern. Tarek is doing some work on providing a standard plugin mechanism as part of the work he's doing on distutils, isn't he? > 2. 
Add executing this new configuration to the bootstrapping process. Maybe I misunderstand, but wouldn't it make more sense to initialize the platform-specific parts of a module when it is imported rather than bring in everything at startup? Are we only worried about interpreter-implementation-level dependencies, or should there be a way for all platform-specific features to be treated in the same way? There are quite a few checks for Windows that could be moved into the platform-specific modules if there was an easy/standard way to do it. Doug > 3. Add generic hooks into the library where needed to load the > dependency instead of platform specific if statements. > 4. Rip out the platform specific code that is hidden in the if > statements and use that as the basis for the sane injected defaults. > 5. Document the interfaces for each component that can be changed by > the vendor. > > -- > David > blog: http://www.traceback.org > twitter: http://twitter.com/dstanek > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/doug.hellmann%40gmail.com From doug.hellmann at gmail.com Wed May 20 13:14:53 2009 From: doug.hellmann at gmail.com (Doug Hellmann) Date: Wed, 20 May 2009 07:14:53 -0400 Subject: [Python-Dev] Python on PowerPC? In-Reply-To: <4A13B1B4.5090705@yahoo.com> References: <4A13B1B4.5090705@yahoo.com> Message-ID: On May 20, 2009, at 3:31 AM, Chris Plasun wrote: > Hi, > > I'm to develop console apps on a Linux embedded PowerPC board > (Freescale MPC8313). > > Is there a Python release for the PowerPC platform? We used to run a version of the interpreter on PPC for a microcontroller board we had, but we built it ourselves. Have you tried building from source? It might Just Work. Doug From eckhardt at satorlaser.com Wed May 20 13:17:10 2009 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Wed, 20 May 2009 13:17:10 +0200 Subject: [Python-Dev] Python on PowerPC? In-Reply-To: <4A13B1B4.5090705@yahoo.com> References: <4A13B1B4.5090705@yahoo.com> Message-ID: <200905201317.10235.eckhardt@satorlaser.com> On Wednesday 20 May 2009, Chris Plasun wrote: > I'm to develop console apps on a Linux embedded PowerPC board (Freescale > MPC8313). > > Is there a Python release for the PowerPC platform? This has pretty little to do with the development of the Python language itself, so it is rather off topic here. That said, Linux systems are barely thinkable without Python, even when running on PPC, so yes, Python runs on PPC, too, and is included in probably every Linux distro, e.g. Debian. Uli -- Sator Laser GmbH Gesch?ftsf?hrer: Thorsten F?cking, Amtsgericht Hamburg HR B62 932 ************************************************************************************** Sator Laser GmbH, Fangdieckstra?e 75a, 22547 Hamburg, Deutschland Gesch?ftsf?hrer: Thorsten F?cking, Amtsgericht Hamburg HR B62 932 ************************************************************************************** Visit our website at ************************************************************************************** Diese E-Mail einschlie?lich s?mtlicher Anh?nge ist nur f?r den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empf?nger sein sollten. 
Die E-Mail ist in diesem Fall zu l?schen und darf weder gelesen, weitergeleitet, ver?ffentlicht oder anderweitig benutzt werden. E-Mails k?nnen durch Dritte gelesen werden und Viren sowie nichtautorisierte ?nderungen enthalten. Sator Laser GmbH ist f?r diese Folgen nicht verantwortlich. ************************************************************************************** From dstanek at dstanek.com Wed May 20 13:54:56 2009 From: dstanek at dstanek.com (David Stanek) Date: Wed, 20 May 2009 07:54:56 -0400 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: <54265478-74A9-4105-B4AD-828C795729DD@gmail.com> References: <4A13374A.4060404@voidspace.org.uk> <1afaf6160905191826k61a97f31h3ebb1b9e31fe5258@mail.gmail.com> <54265478-74A9-4105-B4AD-828C795729DD@gmail.com> Message-ID: On Wed, May 20, 2009 at 7:13 AM, Doug Hellmann wrote: > > On May 19, 2009, at 10:21 PM, David Stanek wrote: > >> >> It seems that using a technique similar to dependency injection could >> provide some value. DI allows implementations conforming to some >> interface to be injected into a running application without the messy >> construction logic. The simple construction-by-hand pattern is to >> create the dependencies and pass them into the dependent objects. >> Frameworks build on top of this to allow the dependencies to be wired >> together without having any construction logic in code, like switch >> statements, to do the wiring. >> >> I think a similar pattern could be used in the standard library. When >> the interpreter goes through its normal bootstrapping process in can >> just execute a module provided by the vendor that specifies the >> platform specific implementations. Some defaults can be provided since >> Python already has a bunch of platform specific implementations. >> >> An over simplified design to make this happen may look like: >> 1. Create a simple configuration that allows a mapping of interfaces >> to implementations. This is where the vendor would say when using >> inspect you really should be using cli.inspect. > > That sounds like a plugin and the "strategy" pattern. ?Tarek is doing some > work on providing a standard plugin mechanism as part of the work he's doing > on distutils, isn't he? > Basically yes. What I proposed is more like a service locator with a pinch of DI. Where can I learn more about what Tarek is working on? Is there a branch somewhere? >> 2. Add executing this new configuration to the bootstrapping process. > > Maybe I misunderstand, but wouldn't it make more sense to initialize the > platform-specific parts of a module when it is imported rather than bring in > everything at startup? > By executing I mean figure out the mappings and necessarily create them. This enables errors to happen early if the dependencies are not met. This is really useful if the technique is used for more than just the platform specific code. > Are we only worried about interpreter-implementation-level dependencies, or > should there be a way for all platform-specific features to be treated in > the same way? ? There are quite a few checks for Windows that could be moved > into the platform-specific modules if there was an easy/standard way to do > it. 
> -- David blog: http://www.traceback.org twitter: http://twitter.com/dstanek From sven.schrader at gmail.com Wed May 20 14:31:18 2009 From: sven.schrader at gmail.com (Sven Schrader) Date: Wed, 20 May 2009 14:31:18 +0200 Subject: [Python-Dev] distutils.build_ext path comparison - python 2.5.2 Message-ID: <4A13F816.7050106@gmail.com> Hi, since our python installation is located on a symlink'ed directory, our variables "sys.exec_prefix" and "sys.executable" can have different paths. Therefore, the respective test in build_ext.py fails (line 202) and a wrong library directory is obtained. To fix this issue, I have attached a patch that uses "os.path.samefile" instead, to see whether two files are identical irrespective of its path. Greetings Sven Schrader ps: please CC answers to me, I'm not on the list :-) pps: I hope the attachment isn't inline... -------------- next part -------------- A non-text attachment was scrubbed... Name: python-2.5.2-build_ext-pathcompare.patch Type: text/x-patch Size: 1570 bytes Desc: not available URL: From seb.binet at gmail.com Wed May 20 15:33:43 2009 From: seb.binet at gmail.com (Sebastien Binet) Date: Wed, 20 May 2009 15:33:43 +0200 Subject: [Python-Dev] IronPython specific code in inspect module In-Reply-To: References: <4A13374A.4060404@voidspace.org.uk> <54265478-74A9-4105-B4AD-828C795729DD@gmail.com> Message-ID: <200905201533.43458.binet@cern.ch> On Wednesday 20 May 2009 13:54:56 David Stanek wrote: > On Wed, May 20, 2009 at 7:13 AM, Doug Hellmann wrote: > > On May 19, 2009, at 10:21 PM, David Stanek wrote: > >> It seems that using a technique similar to dependency injection could > >> provide some value. DI allows implementations conforming to some > >> interface to be injected into a running application without the messy > >> construction logic. The simple construction-by-hand pattern is to > >> create the dependencies and pass them into the dependent objects. > >> Frameworks build on top of this to allow the dependencies to be wired > >> together without having any construction logic in code, like switch > >> statements, to do the wiring. > >> > >> I think a similar pattern could be used in the standard library. When > >> the interpreter goes through its normal bootstrapping process in can > >> just execute a module provided by the vendor that specifies the > >> platform specific implementations. Some defaults can be provided since > >> Python already has a bunch of platform specific implementations. > >> > >> An over simplified design to make this happen may look like: > >> 1. Create a simple configuration that allows a mapping of interfaces > >> to implementations. This is where the vendor would say when using > >> inspect you really should be using cli.inspect. > > > > That sounds like a plugin and the "strategy" pattern. Tarek is doing > > some work on providing a standard plugin mechanism as part of the work > > he's doing on distutils, isn't he? > > Basically yes. What I proposed is more like a service locator with a > pinch of DI. Where can I learn more about what Tarek is working on? Is > there a branch somewhere? it is here: http://wiki.python.org/moin/Distutils/PluginSystem and there: http://pypi.python.org/pypi/extensions cheers, sebastien. -- ######################################### # Dr. 
Sebastien Binet # Laboratoire de l'Accelerateur Lineaire # Universite Paris-Sud XI # Batiment 200 # 91898 Orsay ######################################### From jyasskin at gmail.com Wed May 20 17:33:26 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 20 May 2009 08:33:26 -0700 Subject: [Python-Dev] Documenting lnotab Message-ID: <5d44f72f0905200833m7615dc64od48c309e8bcee5d6@mail.gmail.com> Hi all. I've got a patch to add some documentation for lnotab and its use in tracing at http://bugs.python.org/issue6042. I think it's correct, but it's complicated so I'm looking for someone who was around when it was designed to check. I'm also proposing a change to the semantics of PyCode_CheckLineNumber and want to know whether I should consider it public. Thanks to anyone who takes a look! Jeffrey From chrispl78 at yahoo.com Wed May 20 17:47:50 2009 From: chrispl78 at yahoo.com (Chris Plasun) Date: Wed, 20 May 2009 08:47:50 -0700 Subject: [Python-Dev] Python on PowerPC? In-Reply-To: <200905201317.10235.eckhardt@satorlaser.com> References: <4A13B1B4.5090705@yahoo.com> <200905201317.10235.eckhardt@satorlaser.com> Message-ID: <4A142626.10502@yahoo.com> Thanks for your reply. Ulrich Eckhardt wrote: > On Wednesday 20 May 2009, Chris Plasun wrote: >> I'm to develop console apps on a Linux embedded PowerPC board (Freescale >> MPC8313). >> >> Is there a Python release for the PowerPC platform? > > This has pretty little to do with the development of the Python language > itself, so it is rather off topic here. This group appeared to be relevant. > That said, Linux systems are barely thinkable without Python, even when > running on PPC, so yes, Python runs on PPC, too, and is included in probably > every Linux distro, e.g. Debian. hmmm, hopefully I can find something to run in an embedded box. Thanks, Chris From jyasskin at gmail.com Wed May 20 18:40:42 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 20 May 2009 09:40:42 -0700 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: <4A107988.3020202@v.loewis.de> References: <4A107988.3020202@v.loewis.de> Message-ID: <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> A couple thoughts: I'm with the people who think the refcount should be accessed through functions by apps that want ABI compatibility. In particular, GIL-removal efforts are guaranteed to change how the refcount is modified, but there's a good chance they wouldn't have to change the API. (We have some ideas for how to maintain source compatibility in the absence of a GIL: http://code.google.com/p/unladen-swallow/wiki/ExtensionModules#Reference_Counting) Over an 8-year lifetime for Python 3, Moore's law predicts that desktop systems will have up to 64 cores, at which point even the simplest GIL-removal strategy of making refcounts atomic will be a win, despite the 2x performance loss for a single thread. I wouldn't want an ABI to rule that out. I do think the refcounting macros should remain present in the API (not ABI) for apps that only need source compatibility and want the extra speed. I wonder if it makes sense to specify an API compatibility mode in this PEP too. "Py_LIMITED_API" may not be the right macro name?it didn't imply anything about an ABI when I first saw it. Might it make sense to use Py_ABI_COMPATIBILITY=### instead? (Where ### could be an ISO date like 20090520.) That would put "ABI" in the macro name and make it easier to define new versions later if necessary. 
(New versions would help people compile against a new version of Python and be confident they had something that would run against old versions.) If we never define a new version, defining it to a number instead of just anything doesn't really hurt. It's probably worth pointing out in the PEP that the fact that PyVarObject.ob_size is part of the ABI means that PyObject cannot change size, even by adding fields at the end. Right now, the globals representing types are defined like "PyAPI_DATA(PyTypeObject) PyList_Type;". To allow the core to use the new type creation functions, it might be useful to make the ABI type objects PyTypeObject* constants instead. In general, this looks really good. Thanks! Jeffrey On Sun, May 17, 2009 at 1:54 PM, "Martin v. L?wis" wrote: > > Thomas Wouters reminded me of a long-standing idea; I finally > found the time to write it down. > > Please comment! > > Regards, > Martin > > PEP: 384 > Title: Defining a Stable ABI > Version: $Revision: 72754 $ > Last-Modified: $Date: 2009-05-17 21:14:52 +0200 (So, 17. Mai 2009) $ > Author: Martin v. L?wis > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 17-May-2009 > Python-Version: 3.2 > Post-History: > > Abstract > ======== > > Currently, each feature release introduces a new name for the > Python DLL on Windows, and may cause incompatibilities for extension > modules on Unix. This PEP proposes to define a stable set of API > functions which are guaranteed to be available for the lifetime > of Python 3, and which will also remain binary-compatible across > versions. Extension modules and applications embedding Python > can work with different feature releases as long as they restrict > themselves to this stable ABI. > > Rationale > ========= > > The primary source of ABI incompatibility are changes to the lay-out > of in-memory structures. For example, the way in which string interning > works, or the data type used to represent the size of an object, have > changed during the life of Python 2.x. As a consequence, extension > modules making direct access to fields of strings, lists, or tuples, > would break if their code is loaded into a newer version of the > interpreter without recompilation: offsets of other fields may have > changed, making the extension modules access the wrong data. > > In some cases, the incompatibilities only affect internal objects of > the interpreter, such as frame or code objects. For example, the way > line numbers are represented has changed in the 2.x lifetime, as has > the way in which local variables are stored (due to the introduction > of closures). Even though most applications probably never used these > objects, changing them had required to change the PYTHON_API_VERSION. > > On Linux, changes to the ABI are often not much of a problem: the > system will provide a default Python installation, and many extension > modules are already provided pre-compiled for that version. If additional > modules are needed, or additional Python versions, users can typically > compile them themselves on the system, resulting in modules that use > the right ABI. > > On Windows, multiple simultaneous installations of different Python > versions are common, and extension modules are compiled by their > authors, not by end users. To reduce the risk of ABI incompatibilities, > Python currently introduces a new DLL name pythonXY.dll for each > feature release, whether or not ABI incompatibilities actually exist. 
> > With this PEP, it will be possible to reduce the dependency of binary > extension modules on a specific Python feature release, and applications > embedding Python can be made work with different releases. > > Specification > ============= > > The ABI specification falls into two parts: an API specification, > specifying what function (groups) are available for use with the > ABI, and a linkage specification specifying what libraries to link > with. The actual ABI (layout of structures in memory, function > calling conventions) is not specified, but implied by the > compiler. As a recommendation, a specific ABI is recommended for > selected platforms. > > During evolution of Python, new ABI functions will be added. > Applications using them will then have a requirement on a minimum > version of Python; this PEP provides no mechanism for such > applications to fall back when the Python library is too old. > > Terminology > ----------- > > Applications and extension modules that want to use this ABI > are collectively referred to as "applications" from here on. > > Header Files and Preprocessor Definitions > ----------------------------------------- > > Applications shall only include the header file Python.h (before > including any system headers), or, optionally, include pyconfig.h, and > then Python.h. > > During the compilation of applications, the preprocessor macro > Py_LIMITED_API must be defined. Doing so will hide all definitions > that are not part of the ABI. > > Structures > ---------- > > Only the following structures and structure fields are accessible to > applications: > > - PyObject (ob_refcnt, ob_type) > - PyVarObject (ob_base, ob_size) > - Py_buffer (buf, obj, len, itemsize, readonly, ndim, shape, > ?strides, suboffsets, smalltable, internal) > - PyMethodDef (ml_name, ml_meth, ml_flags, ml_doc) > - PyMemberDef (name, type, offset, flags, doc) > - PyGetSetDef (name, get, set, doc, closure) > > The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE) > are also available to applications. > > The following types are available, but opaque (i.e. incomplete): > > - PyThreadState > - PyInterpreterState > > Type Objects > ------------ > > The structure of type objects is not available to applications; > declaration of "static" type objects is not possible anymore > (for applications using this ABI). > Instead, type objects get created dynamically. To allow an > easy creation of types (in particular, to be able to fill out > function pointers easily), the following structures and functions > are available:: > > ?typedef struct{ > ? ?int slot; ? ?/* slot id, see below */ > ? ?void *pfunc; /* function pointer */ > ?} PyType_Slot; > > ?struct{ > ? ?const char* name; > ? ?const char* doc; > ? ?int basicsize; > ? ?int itemsize; > ? ?int flags; > ? ?PyType_Slot *slots; /* terminated by slot==0. */ > ?} PyType_Spec; > > ?PyObject* PyType_FromSpec(PyType_Spec*); > > To specify a slot, a unique slot id must be provided. New Python > versions may introduce new slot ids, but slot ids will never be > recycled. Slots may get deprecated, but continue to be supported > throughout Python 3.x. > > The slot ids are named like the field names of the structures that > hold the pointers in Python 3.1, with an added ``Py_`` prefix (i.e. 
> Py_tp_dealloc instead of just tp_dealloc): > > - tp_dealloc, tp_print, tp_getattr, tp_setattr, tp_repr, > ?tp_hash, tp_call, tp_str, tp_getattro, tp_setattro, > ?tp_doc, tp_traverse, tp_clear, tp_richcompare, tp_iter, > ?tp_iternext, tp_methods, tp_base, tp_descr_set, tp_descr_set, > ?tp_init, tp_alloc, tp_new, tp_is_gc, tp_bases, tp_del > - nb_add nb_subtract nb_multiply nb_remainder nb_divmod nb_power > ?nb_negative nb_positive nb_absolute nb_bool nb_invert nb_lshift > ?nb_rshift nb_and nb_xor nb_or nb_int nb_float nb_inplace_add > ?nb_inplace_subtract nb_inplace_multiply nb_inplace_remainder > ?nb_inplace_power nb_inplace_lshift nb_inplace_rshift nb_inplace_and > ?nb_inplace_xor nb_inplace_or nb_floor_divide nb_true_divide > ?nb_inplace_floor_divide nb_inplace_true_divide nb_index > - sq_length sq_concat sq_repeat sq_item sq_ass_item was_sq_ass_slice > ?sq_contains sq_inplace_concat sq_inplace_repeat > - mp_length mp_subscript mp_ass_subscript > - bf_getbuffer bf_releasebuffer > > XXX Not supported yet: tp_weaklistoffset, tp_dictoffset > > The following fields cannot be set during type definition: > - tp_dict tp_mro tp_cache tp_subclasses tp_weaklist > > Functions and function-like Macros > ---------------------------------- > > All functions starting with _Py are not available to applications. > Also, all functions that expect parameter types that are unavailable > to applications are excluded from the ABI, such as PyAST_FromNode > (which expects a ``node*``). > > All other functions are available, unless excluded below. > > Function-like macros (in particular, field access macros) remain > available to applications, but get replaced by function calls > (unless their definition only refers to features of the ABI, such > as the various _Check macros) > > ABI function declarations will not change their parameters or return > types. If a change to the signature becomes necessary, a new function > will be introduced. If the new function is source-compatible (e.g. if > just the return type changes), an alias macro may get added to > redirect calls to the new function when the applications is > recompiled. > > If continued provision of the old function is not possible, it may get > deprecated, then removed, in accordance with PEP 7, causing > applications that use that function to break. > > Excluded Functions > ------------------ > > Functions declared in the following header files are not part > of the ABI: > - cellobject.h > - classobject.h > - code.h > - frameobject.h > - funcobject.h > - genobject.h > - pyarena.h > - pydebug.h > - symtable.h > - token.h > - traceback.h > > Global Variables > ---------------- > > Global variables representing types and exceptions are available > to applications. > XXX provide a complete list. > > XXX should restrict list of globals to truly "builtin" stuff, > excluding everything that can also be looked up through imports. > > XXX may specify access to predefined types and exceptions through > the interpreter state, with appropriate Get macros. > > Other Macros > ------------ > > All macros defining symbolic constants are available to applications; > the numeric values will not change. > > In addition, the following macros are available: > > - Py_BEGIN_ALLOW_THREADS, Py_BLOCK_THREADS, Py_UNBLOCK_THREADS, > ?Py_END_ALLOW_THREADS > > Linkage > ------- > > On Windows, applications shall link with python3.dll; an import > library python3.lib will be available. 
This DLL will redirect all of > its API functions through /export linker options to the full > interpreter DLL, i.e. python3y.dll. > > XXX is it possible to redirect global variables in the same way? > If not, python3.dll would have to copy them, and we should verify > that all available global variables are read-only. > > On Unix systems, the ABI is typically provided by the python > executable itself. PyModule_Create is changed to pass ``3`` as the API > version if the extension module was compiled with Py_LIMITED_API; the > version check for the API version will accept either 3 or the current > PYTHON_API_VERSION as conforming. If Python is compiled as a shared > library, it is installed as both libpython3.so, and libpython3.y.so; > applications conforming to this PEP should then link to the former. > > XXX is it possible to make the soname libpython.so.3, and still > have some applications link to libpython3.y.so? > > Implementation Strategy > ======================= > > This PEP will be implemented in a branch, allowing users to check > whether their modules conform to the ABI. To simplify this testing, an > additional macro Py_LIMITED_API_WITH_TYPES will expose the existing > type object layout, to let users postpone rewriting all types. When > the this branch is merged into the 3.2 code base, this macro will > be removed. > > Copyright > ========= > > This document has been placed in the public domain. > From jyasskin at gmail.com Wed May 20 18:49:34 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 20 May 2009 09:49:34 -0700 Subject: [Python-Dev] [Fwd: Re: PEP 384: Defining a Stable ABI] In-Reply-To: <4A127766.8000101@resolversystems.com> References: <4A1141B4.4090608@voidspace.org.uk> <4A127766.8000101@resolversystems.com> Message-ID: <5d44f72f0905200949m2524f2d5t492baf24de6eed93@mail.gmail.com> On Tue, May 19, 2009 at 2:09 AM, William Reade wrote: > (for example, all(?) PyCxx modules use PyCode_New and > PyFrame_New to get nicer tracebacks) Specifically for this, I think it'd be nice to expose a function to do this directly. I recently added PyCode_NewEmpty (http://svn.python.org/view?view=rev&revision=72487) to go part of the way here. I didn't go farther because I didn't have a big enough picture. If most uses of PyFrame_New are really just to call into Python with a nice traceback, I think it'd be a good idea to add such a function to ceval.h next to PyEval_Call*(). We can only credibly tell people to use only the ABI functions when we have an ABI replacement for the (sane uses of) non-ABI calls. From theller at ctypes.org Wed May 20 18:52:46 2009 From: theller at ctypes.org (Thomas Heller) Date: Wed, 20 May 2009 18:52:46 +0200 Subject: [Python-Dev] Python on PowerPC? In-Reply-To: <4A142626.10502@yahoo.com> References: <4A13B1B4.5090705@yahoo.com> <200905201317.10235.eckhardt@satorlaser.com> <4A142626.10502@yahoo.com> Message-ID: Chris Plasun schrieb: > Thanks for your reply. > > Ulrich Eckhardt wrote: >> On Wednesday 20 May 2009, Chris Plasun wrote: >>> I'm to develop console apps on a Linux embedded PowerPC board (Freescale >>> MPC8313). >>> >>> Is there a Python release for the PowerPC platform? >> >> This has pretty little to do with the development of the Python language >> itself, so it is rather off topic here. > > This group appeared to be relevant. > >> That said, Linux systems are barely thinkable without Python, even when >> running on PPC, so yes, Python runs on PPC, too, and is included in probably >> every Linux distro, e.g. Debian. 
> > hmmm, hopefully I can find something to run in an embedded box. If you need to cross-compile, I have a build script and working patches to cross-build Python 2.6.2 for an ARM embedded system. Contact me by private mail if you want them. -- Thanks, Thomas From solipsis at pitrou.net Wed May 20 19:14:46 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 20 May 2009 17:14:46 +0000 (UTC) Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> Message-ID: Jeffrey Yasskin gmail.com> writes: > > Over an 8-year lifetime for Python 3, Moore's law predicts that > desktop systems will have up to 64 cores, at which point even the > simplest GIL-removal strategy of making refcounts atomic will be a > win, despite the 2x performance loss for a single thread. That's only if you think all workloads parallelize easily (and with little work from the average programmer), which sounds a bit presumptuous. When you have a GUI application and the perceived performance is driven by UI responsivity, spawning dozens of threads can little to improve the picture ("GUI application" here can mean a feature-rich Web application, too). As for desktop systems having 64 cores, that's unless the available die space gets used for something else instead, e.g. an integrated GPU. Or unless the desktop dies in favor of something else (e.g. laptops with small tightly integrated chips). The former is already in AMD's and Intel's plans. The latter could be happening right now. And we're not even talking about embedded platforms, or virtual machines where a 64-core server is partitioned into 64 "single-core" systems. (and then there's the whole threading vs processing debate ;-)) Endly, removing the GIL means you have to make all base types (especially containers) thread-safe without sacrificing their performance. I don't think it's just about reference-counting. That said, the Py_Incref() and Py_Decref() functions already exist. Perhaps they should be advertised a bit more in the documentation. The day a hypothetical Python implementation gets rid of reference-counting while remaining binary compatible with the rest of the API (which rules out PyPy), and gets much faster in the process, I think people will happily suffer a small recompile. Regards Antoine. From jyasskin at gmail.com Wed May 20 19:26:35 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 20 May 2009 10:26:35 -0700 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> Message-ID: <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> On Wed, May 20, 2009 at 10:14 AM, Antoine Pitrou wrote: > Jeffrey Yasskin gmail.com> writes: >> >> Over an 8-year lifetime for Python 3, Moore's law predicts that >> desktop systems will have up to 64 cores, at which point even the >> simplest GIL-removal strategy of making refcounts atomic will be a >> win, despite the 2x performance loss for a single thread. > > That's only if you think all workloads parallelize easily (and with little work > from the average programmer), which sounds a bit presumptuous. When you have a > GUI application and the perceived performance is driven by UI responsivity, > spawning dozens of threads can little to improve the picture ("GUI application" > here can mean a feature-rich Web application, too). 
> > As for desktop systems having 64 cores, that's unless the available die space > gets used for something else instead, e.g. an integrated GPU. Or unless the > desktop dies in favor of something else (e.g. laptops with small tightly > integrated chips). The former is already in AMD's and Intel's plans. The latter > could be happening right now. > > And we're not even talking about embedded platforms, or virtual machines where a > 64-core server is partitioned into 64 "single-core" systems. > > (and then there's the whole threading vs processing debate ;-)) > > Endly, removing the GIL means you have to make all base types (especially > containers) thread-safe without sacrificing their performance. I don't think > it's just about reference-counting. > > > That said, the Py_Incref() and Py_Decref() functions already exist. Perhaps they > should be advertised a bit more in the documentation. The day a hypothetical > Python implementation gets rid of reference-counting while remaining binary > compatible with the rest of the API (which rules out PyPy), and gets much faster > in the process, I think people will happily suffer a small recompile. Sorry, I didn't mean to get into a GIL debate. All I'm saying is that I don't think changing the definition of Py_INCREF and Py_DECREF justifies going to Python 4.0, so I don't think their definitions should be part of the ABI. If that's not what the ABI means, that's ok too. From solipsis at pitrou.net Wed May 20 19:34:42 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 20 May 2009 17:34:42 +0000 (UTC) Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> Message-ID: Jeffrey Yasskin gmail.com> writes: > > Sorry, I didn't mean to get into a GIL debate. All I'm saying is that > I don't think changing the definition of Py_INCREF and Py_DECREF > justifies going to Python 4.0, so I don't think their definitions > should be part of the ABI. If that's not what the ABI means, that's ok > too. Consider, though, that if Py_INCREF and Py_DECREF are not part of the ABI, enabling the ABI-specific preprocessor symbol will hide them, which might (or might not!) annoy a lot of extension writers. (I don't know if there are extensions out there having reference count increments and decrements in their critical paths) Regards Antoine. From jyasskin at gmail.com Wed May 20 19:41:37 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 20 May 2009 10:41:37 -0700 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> Message-ID: <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> On Wed, May 20, 2009 at 10:34 AM, Antoine Pitrou wrote: > Jeffrey Yasskin gmail.com> writes: >> >> Sorry, I didn't mean to get into a GIL debate. All I'm saying is that >> I don't think changing the definition of Py_INCREF and Py_DECREF >> justifies going to Python 4.0, so I don't think their definitions >> should be part of the ABI. If that's not what the ABI means, that's ok >> too. > > Consider, though, that if Py_INCREF and Py_DECREF are not part of the ABI, > enabling the ABI-specific preprocessor symbol will hide them, which might (or > might not!) annoy a lot of extension writers. 
Yes, that's my intention. (Well, not the annoying part, but making them use Py_IncRef instead for ABI compatibility is, I think, a good thing.) If they don't want ABI compatibility, they shouldn't ask for it. Giving them something else useful to ask for is why I mentioned an API compatibility mode. To decrease the annoyance of having to change source code, we could have Py_INCREF(x) expand to Py_IncRef(x) in ABI-compatibility mode. From aahz at pythoncraft.com Wed May 20 21:34:22 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 20 May 2009 12:34:22 -0700 Subject: [Python-Dev] distutils.build_ext path comparison - python 2.5.2 In-Reply-To: <4A13F816.7050106@gmail.com> References: <4A13F816.7050106@gmail.com> Message-ID: <20090520193422.GB29309@panix.com> On Wed, May 20, 2009, Sven Schrader wrote: > > since our python installation is located on a symlink'ed directory, > our variables "sys.exec_prefix" and "sys.executable" can have > different paths. Therefore, the respective test in build_ext.py fails > (line 202) and a wrong library directory is obtained. > > To fix this issue, I have attached a patch that uses > "os.path.samefile" instead, to see whether two files are identical > irrespective of its path. Please post this patch to bugs.python.org so it can be tracked. Note that Python 2.5 is now accepting only security patches, so please check whether 2.6 and trunk need it. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From ncoghlan at gmail.com Wed May 20 22:07:08 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 21 May 2009 06:07:08 +1000 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> Message-ID: <4A1462EC.9050007@gmail.com> Jeffrey Yasskin wrote: > Yes, that's my intention. (Well, not the annoying part, but making > them use Py_IncRef instead for ABI compatibility is, I think, a good > thing.) If they don't want ABI compatibility, they shouldn't ask for > it. Giving them something else useful to ask for is why I mentioned an > API compatibility mode. > > To decrease the annoyance of having to change source code, we could > have Py_INCREF(x) expand to Py_IncRef(x) in ABI-compatibility mode. Forcing developers to choose between the speed of the INCREF/DECREF macros and the proposed ABI compatibility mode for the benefit of an as yet hypothetical GIL-less CPython API implementation seems more like a way to kill adoption of the ABI compatibility mode rather than a way to encourage the use of the IncRef/Decref functions. The idea of allow an extension to explicitly version the stable ABI they're using with a macro like Py_ABI_VERSION is a good one though. I'd suggest using the Python version in hex (e.g. 0x020700 and 0x030200) rather than an ISO date though. That way an extension developer that wanted to ensure there code worked with a particular Python version and later could just define the right Py_ABI_VERSION rather than have to specifically compile against that earliest version. Cheers, Nick. 
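To sketch what that could look like from the extension author's side (a rough illustration only: Py_ABI_VERSION is the macro proposed in this thread, not an existing CPython symbol, and the 0x030200 value just reuses the example from the paragraph above):

    /* Hypothetical opt-in to the stable ABI as it stood in 3.2.
       Py_ABI_VERSION is the macro suggested in this thread, not a
       real CPython define; Py_LIMITED_API is the one from the PEP. */
    #define Py_ABI_VERSION 0x030200
    #define Py_LIMITED_API
    #include "Python.h"

Defining the version once, before including Python.h, would be all an extension needs to say "give me the ABI from this release onwards".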
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed May 20 22:10:48 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 21 May 2009 06:10:48 +1000 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A107988.3020202@v.loewis.de> References: <4A107988.3020202@v.loewis.de> Message-ID: <4A1463C8.9090200@gmail.com> Martin v. L?wis wrote: > Functions and function-like Macros > ---------------------------------- > > All functions starting with _Py are not available to applications. > Also, all functions that expect parameter types that are unavailable > to applications are excluded from the ABI, such as PyAST_FromNode > (which expects a ``node*``). > > All other functions are available, unless excluded below. > > Function-like macros (in particular, field access macros) remain > available to applications, but get replaced by function calls > (unless their definition only refers to features of the ABI, such > as the various _Check macros) > > ABI function declarations will not change their parameters or return > types. If a change to the signature becomes necessary, a new function > will be introduced. If the new function is source-compatible (e.g. if > just the return type changes), an alias macro may get added to > redirect calls to the new function when the applications is > recompiled. > > If continued provision of the old function is not possible, it may get > deprecated, then removed, in accordance with PEP 7, causing > applications that use that function to break. Something I haven't seen explicitly mentioned as yet (in the PEP or the python-dev list discussion) are the memory management APIs and the FILE* APIs which can cause the MSVCRT versioning issues on Windows. Those would either need to be excluded from the stable ABI or else changed to use opaque pointers. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ziade.tarek at gmail.com Wed May 20 22:58:50 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 20 May 2009 22:58:50 +0200 Subject: [Python-Dev] distutils.build_ext path comparison - python 2.5.2 In-Reply-To: <4A13F816.7050106@gmail.com> References: <4A13F816.7050106@gmail.com> Message-ID: <94bdd2610905201358h712a5becm1ad59adf4ff6c683@mail.gmail.com> Hi Sven can you add an issue with your patch in http://bugs.python.org/ Thanks in advance Tarek On Wed, May 20, 2009 at 2:31 PM, Sven Schrader wrote: > Hi, > > since our python installation is located on a symlink'ed directory, > our variables "sys.exec_prefix" and "sys.executable" can have different > paths. Therefore, the respective test in build_ext.py fails (line 202) > and a wrong > library directory is obtained. > > To fix this issue, I have attached a patch that uses "os.path.samefile" > instead, > to see whether two files are identical irrespective of its path. > > > Greetings > > Sven Schrader > > > > ps: please CC answers to me, I'm not on the list :-) > pps: I hope the attachment isn't inline... > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ziade.tarek%40gmail.com > > -- Tarek Ziad? 
| http://ziade.org From skip at pobox.com Wed May 20 22:59:32 2009 From: skip at pobox.com (skip at pobox.com) Date: Wed, 20 May 2009 15:59:32 -0500 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: <4A1462EC.9050007@gmail.com> References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> <4A1462EC.9050007@gmail.com> Message-ID: <18964.28469.271.781515@montanaro.dyndns.org> Nick> Jeffrey Yasskin wrote: >> To decrease the annoyance of having to change source code, we could >> have Py_INCREF(x) expand to Py_IncRef(x) in ABI-compatibility mode. Nick> Forcing developers to choose between the speed of the Nick> INCREF/DECREF macros and the proposed ABI compatibility mode for Nick> the benefit of an as yet hypothetical GIL-less CPython API Nick> implementation seems more like a way to kill adoption of the ABI Nick> compatibility mode rather than a way to encourage the use of the Nick> IncRef/Decref functions. I suspect it's not really germane to this discussion but if the incref/decref functions were defined as inline would that effectively be like using the macro versions vis a vis ABI compatibility? Skip From benjamin at python.org Wed May 20 23:01:23 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 20 May 2009 16:01:23 -0500 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: <18964.28469.271.781515@montanaro.dyndns.org> References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> <4A1462EC.9050007@gmail.com> <18964.28469.271.781515@montanaro.dyndns.org> Message-ID: <1afaf6160905201401o550b6a44u179c6a31a53d56bf@mail.gmail.com> 2009/5/20 : > I suspect it's not really germane to this discussion but if the > incref/decref functions were defined as inline would that effectively be > like using the macro versions vis a vis ABI compatibility? The code would be inlined into applications defeating the point of being able to change the implementation. :) -- Regards, Benjamin From stephen at xemacs.org Thu May 21 02:40:56 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 21 May 2009 09:40:56 +0900 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: <1afaf6160905201401o550b6a44u179c6a31a53d56bf@mail.gmail.com> References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> <4A1462EC.9050007@gmail.com> <18964.28469.271.781515@montanaro.dyndns.org> <1afaf6160905201401o550b6a44u179c6a31a53d56bf@mail.gmail.com> Message-ID: <87my97fftz.fsf@uwakimon.sk.tsukuba.ac.jp> Benjamin Peterson writes: > 2009/5/20 : > > > I suspect it's not really germane to this discussion but if the > > incref/decref functions were defined as inline would that effectively be > > like using the macro versions vis a vis ABI compatibility? > > The code would be inlined into applications defeating the point of > being able to change the implementation. :) Hang on, are you sure Skip isn't on to something? 
If the A*P*Is are defined in such way that by making them *function calls* they preserve A*B*I compatibility, while making them inline gives performance, then the user (in this case, I really mean the vendor of an application that contains C modules, I guess) can choose which route to go, right? I suppose that Python itself could be built with inlined code internally, but also provide the ABI (at a cost in size, of course). I don't know if this complexity is manageable or worth trying to manage, but isn't it conceivable that it could work? I guess that's for the advocates of extending the promise of ABI compatibility to these APIs to show, though. I don't need it myself. From foom at fuhm.net Thu May 21 02:48:01 2009 From: foom at fuhm.net (James Y Knight) Date: Wed, 20 May 2009 20:48:01 -0400 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: <4A1462EC.9050007@gmail.com> References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> <4A1462EC.9050007@gmail.com> Message-ID: On May 20, 2009, at 4:07 PM, Nick Coghlan wrote: > Forcing developers to choose between the speed of the INCREF/DECREF > macros and the proposed ABI compatibility mode for the benefit of an > as > yet hypothetical GIL-less CPython API implementation seems more like a > way to kill adoption of the ABI compatibility mode rather than a way > to > encourage the use of the IncRef/Decref functions. Indeed, and if the promise of "no-ABI-breakages-till-4.0" is removed, this would be a non-issue. Keep Py_INCREF macros in the current ABI, and then break the ABI when someone wants to remove the GIL someday. That's certainly going to be a big enough change to justify changing the ABI. James From benjamin at python.org Thu May 21 03:48:53 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 20 May 2009 20:48:53 -0500 Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI In-Reply-To: <87my97fftz.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4A107988.3020202@v.loewis.de> <5d44f72f0905200940k4a40f638j5637a6b79d075b0@mail.gmail.com> <5d44f72f0905201026v7c9241f4jeaca9a8153a2fc7c@mail.gmail.com> <5d44f72f0905201041q146bb99dr2f27b47c10522eb6@mail.gmail.com> <4A1462EC.9050007@gmail.com> <18964.28469.271.781515@montanaro.dyndns.org> <1afaf6160905201401o550b6a44u179c6a31a53d56bf@mail.gmail.com> <87my97fftz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1afaf6160905201848g13b943ddg62b6d2435786b55@mail.gmail.com> 2009/5/20 Stephen J. Turnbull : > Benjamin Peterson writes: > ?> 2009/5/20 ?: > ?> > ?> > I suspect it's not really germane to this discussion but if the > ?> > incref/decref functions were defined as inline would that effectively be > ?> > like using the macro versions vis a vis ABI compatibility? > ?> > ?> The code would be inlined into applications defeating the point of > ?> being able to change the implementation. :) > > Hang on, are you sure Skip isn't on to something? ?If the A*P*Is are > defined in such way that by making them *function calls* they preserve > A*B*I compatibility, while making them inline gives performance, then > the user (in this case, I really mean the vendor of an application > that contains C modules, I guess) can choose which route to go, right? In that case, they might as well be macros because changing would require recompiling. 
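To make the distinction concrete, here is a rough sketch of the two forms under discussion (deliberately simplified; the real CPython macros also do debug-build bookkeeping, and MY_INCREF is an invented name):

    /* Macro or inline form: the field access is compiled into the
       extension, so the extension binary bakes in the current object
       layout and refcounting scheme. */
    #define MY_INCREF(op) (((PyObject *)(op))->ob_refcnt++)

    /* Function form: only a call ends up in the extension; whatever
       interpreter is loaded at run time decides what an incref really
       does.  These already exist in the C API: */
    PyAPI_FUNC(void) Py_IncRef(PyObject *);
    PyAPI_FUNC(void) Py_DecRef(PyObject *);

An inline function behaves like the macro here: its body is compiled into the caller, so it buys back the speed but not the binary compatibility.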
-- Regards, Benjamin From william at resolversystems.com Fri May 22 12:33:02 2009 From: william at resolversystems.com (William Reade) Date: Fri, 22 May 2009 11:33:02 +0100 Subject: [Python-Dev] [Fwd: Re: PEP 384: Defining a Stable ABI] In-Reply-To: <4A127766.8000101@resolversystems.com> References: <4A1141B4.4090608@voidspace.org.uk> <4A127766.8000101@resolversystems.com> Message-ID: <4A167F5E.2050704@resolversystems.com> William Reade wrote: > 2) Since it hasn't always been in place, its introduction won't help > me in the short term: there are an awful lot of extension modules that > use excluded functions (for example, all(?) PyCxx modules use > PyCode_New and PyFrame_New to get nicer tracebacks), and I'll still > have to handle all these cases until everyone is up-to-date with > whatever version of Python this gets accepted into. It seems that where I should have said Pyrex, I actually said PyCxx. Sorry for the confusion. Thanks to Barry Scott for pointing it out. From status at bugs.python.org Fri May 22 18:06:56 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 22 May 2009 18:06:56 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090522160656.341DA785EC@psf.upfronthosting.co.za> ACTIVITY SUMMARY (05/15/09 - 05/22/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2195 open (+35) / 15716 closed (+24) / 17911 total (+59) Open issues with patches: 863 Average duration of open issues: 650 days. Median duration of open issues: 400 days. Open Issues Breakdown open 2168 (+35) pending 27 ( +0) Issues Created Or Reopened (64) _______________________________ bsddb memory leak on ubuntu 05/18/09 CLOSED http://bugs.python.org/issue3541 reopened ajaksu2 idle pydoc et al removed from 3.1 without versioned replacements 05/22/09 http://bugs.python.org/issue5756 reopened nad Dict fails to notice addition and deletion of keys during iterat 05/16/09 CLOSED http://bugs.python.org/issue6017 reopened stevenjd documentation of xml.dom.minidom.parse signature is wrong 05/16/09 CLOSED http://bugs.python.org/issue6025 reopened phihag patch Interpreter crashes when chaining an infinite number of exceptio 05/15/09 http://bugs.python.org/issue6028 reopened amaury.forgeotdarc patch io.BufferedWriter C module missing _write_lock 05/15/09 CLOSED http://bugs.python.org/issue6030 created jroesslein BaseServer.shutdown documentation is incomplete 05/15/09 http://bugs.python.org/issue6031 created gagenellina Fix refleaks in test_urllib2_localnet 05/16/09 CLOSED http://bugs.python.org/issue6032 created collinwinter patch LOOKUP_METHOD and CALL_METHOD optimization 05/16/09 http://bugs.python.org/issue6033 created benjamin.peterson patch Fix object.__reversed__ doc 05/16/09 CLOSED http://bugs.python.org/issue6034 created tjreedy test_poplib Bus error with gcc-4.4 on OS X 05/16/09 CLOSED http://bugs.python.org/issue6035 created marketdickinson Clean up test_posixpath.py 05/16/09 http://bugs.python.org/issue6036 created phihag patch MutableSequence.__iadd__ should return self 05/16/09 CLOSED http://bugs.python.org/issue6037 created amarzal Should collections.Counter check for int? 
05/16/09 CLOSED http://bugs.python.org/issue6038 created hagen cygwin compilers should not check compiler versions 05/16/09 http://bugs.python.org/issue6039 created cdavid bdist_msi does not deal with pre-release version 05/16/09 http://bugs.python.org/issue6040 created cdavid change sdist and register command so they use check 05/16/09 CLOSED http://bugs.python.org/issue6041 created tarek Document and slightly simplify lnotab tracing 05/16/09 http://bugs.python.org/issue6042 created jyasskin patch, needs review HTMLParseError derivation 05/16/09 CLOSED http://bugs.python.org/issue6043 created bayerf Exception message in int() when trying to convert a complex 05/17/09 CLOSED http://bugs.python.org/issue6044 created aletornw Fix dbm interfaces 05/17/09 http://bugs.python.org/issue6045 created georg.brandl test_distutils.py fails on VC6(Windows) 05/17/09 CLOSED http://bugs.python.org/issue6046 created ocean-city patch "install" target in python 3.x makefile should be "fullinstall" 05/17/09 http://bugs.python.org/issue6047 created ronaldoussoren make distutils use the tarinfo command 05/17/09 http://bugs.python.org/issue6048 created tarek str.strip() and " behaviour expected? 05/17/09 CLOSED http://bugs.python.org/issue6049 created sholvar zipfile: Extracting a directory that already exists generates an 05/18/09 http://bugs.python.org/issue6050 created joe.amenta patch smtplib docs should link to email module examples 05/18/09 CLOSED http://bugs.python.org/issue6051 created guettli for-loop doesn't work with -c 05/18/09 CLOSED http://bugs.python.org/issue6052 created exe distutils error on windows 05/18/09 CLOSED http://bugs.python.org/issue6053 created ocean-city patch tarfile normalizes arcname 05/18/09 http://bugs.python.org/issue6054 created mkv References to "pysqlite" in documentation of sqlite3 should be c 05/18/09 CLOSED http://bugs.python.org/issue6055 created MLModel socket.setdefaulttimeout affecting multiprocessing Manager 05/18/09 http://bugs.python.org/issue6056 created ryles sqlite3 error classes should be documented 05/18/09 http://bugs.python.org/issue6057 created MLModel Add cp65001 to encodings/aliases.py 05/19/09 http://bugs.python.org/issue6058 created tzot patch uuid.uuid4 cause segfault in emesene 05/19/09 http://bugs.python.org/issue6059 created acevery PYTHONHOME should be more flexible (and controllable by --libdir 05/19/09 http://bugs.python.org/issue6060 created soundmurderer time.clock(): overflow in programs that run for very long 05/19/09 CLOSED http://bugs.python.org/issue6061 created tom65536 build_ext fails to build in the right directory using the packag 05/19/09 CLOSED http://bugs.python.org/issue6062 created tarek pydoc_data package is not installed 05/19/09 CLOSED http://bugs.python.org/issue6063 created ronaldoussoren patch, 26backport Add "daemon" argument to threading.Thread constructor 05/19/09 http://bugs.python.org/issue6064 created tebeka patch, easy bdist_msi.py failed assert when including extension modules 05/19/09 http://bugs.python.org/issue6065 created tim.golden patch POP_MARK was not in pickle protocol 0 05/20/09 CLOSED http://bugs.python.org/issue6066 created collinwinter patch, easy make error 05/20/09 http://bugs.python.org/issue6067 created gast support read/write c_ulonglong type bitfield structures 05/20/09 http://bugs.python.org/issue6068 created higstar casting error from ctypes array to structure 05/20/09 http://bugs.python.org/issue6069 created higstar Pyhon 2.6 makes .pyc/.pyo bytecode files executable 05/20/09 
http://bugs.python.org/issue6070 created phd no longer possible to hash arrays 05/20/09 http://bugs.python.org/issue6071 created exarkun unittest.TestCase._result is very likely to collide (and break) 05/20/09 CLOSED http://bugs.python.org/issue6072 created exarkun patch threading.Timer and gtk.main are not compatible 05/20/09 http://bugs.python.org/issue6073 created eric .pyc files created readonly if .py file is readonly, python won' 05/20/09 http://bugs.python.org/issue6074 created pdsimanyi Patch for IDLE/OS X to work with Tk-Cocoa 05/20/09 http://bugs.python.org/issue6075 created wordtech patch Missing title for configDialog.py 05/20/09 http://bugs.python.org/issue6076 created wordtech patch Unicode issue with tempfile on Windows 05/21/09 http://bugs.python.org/issue6077 created daniel.ugra freeze.py doesn't work 05/21/09 http://bugs.python.org/issue6078 created mzalokar SyntaxError in xmlrpc.client examples 05/21/09 http://bugs.python.org/issue6079 created thijs Itertools objects are missing "send" 05/21/09 CLOSED http://bugs.python.org/issue6080 created tebeka str.format_from_mapping() 05/21/09 http://bugs.python.org/issue6081 created rhettinger os.path.sameopenfile reports that standard streams are the same 05/22/09 CLOSED http://bugs.python.org/issue6082 reopened ryles Reference counting bug in setrlimit 05/22/09 http://bugs.python.org/issue6083 created billm patch documentation of zip function is error 05/22/09 CLOSED http://bugs.python.org/issue6084 created bones7456 Logging in BaseHTTPServer.BaseHTTPRequestHandler causes lag 05/22/09 http://bugs.python.org/issue6085 created aerodonkey Correct minor typos in doanddont.rst and urllib2.rst howto docum 05/22/09 CLOSED http://bugs.python.org/issue6086 created vshenoy patch distutils.sysconfig.get_python_lib gives surprising result when 05/22/09 http://bugs.python.org/issue6087 created vsajip easy Python3.0.1.1 is not available when system locale is zh_TW.eucTW 05/22/09 http://bugs.python.org/issue6088 created leeon Issues Now Closed (65) ______________________ weakref copy module interaction 456 days http://bugs.python.org/issue2116 pitrou patch os.listdir doc should mention that Unicode decoding can fail 366 days http://bugs.python.org/issue2856 georg.brandl sys.stdin.fileno() gives attribute error in IDLE 354 days http://bugs.python.org/issue3003 kbk Problem with invalidly-encoded command-line arguments (Unix) 351 days http://bugs.python.org/issue3023 benjamin.peterson incorrect comments for PyObject_ReleaseBuffer 315 days http://bugs.python.org/issue3293 pitrou Py_WIN_WIDE_FILENAMES removal 282 days http://bugs.python.org/issue3527 ocean-city patch bsddb memory leak on ubuntu 4 days http://bugs.python.org/issue3541 jcea remove not decodable environment variables 214 days http://bugs.python.org/issue4126 loewis patch 3 tutorial documentation errors 210 days http://bugs.python.org/issue4144 georg.brandl Running Python 2.6 GUI on Windows Vista 202 days http://bugs.python.org/issue4215 georg.brandl Missing make altframeworkinstall for Mac OS X 165 days http://bugs.python.org/issue4554 ronaldoussoren 2.6.1 breaks many applications that embed Python on Windows 163 days http://bugs.python.org/issue4566 chrisyco patch, needs review Setting font from preference dialog in IDLE on OS X broken 97 days http://bugs.python.org/issue5232 kbk patch OS X Installer: add options to specify universal build type and 93 days http://bugs.python.org/issue5269 ronaldoussoren Scanner class in re module undocumented 87 days http://bugs.python.org/issue5337 
rhettinger add a new command called "check" into Distutils 37 days http://bugs.python.org/issue5732 tarek OS X Installer: new make of documentation installs at wrong loca 34 days http://bugs.python.org/issue5769 ronaldoussoren len(reversed( 28 days http://bugs.python.org/issue5786 marketdickinson float('1e500') -> inf, complex('1e500') -> ValueError 26 days http://bugs.python.org/issue5829 marketdickinson patch, easy Better documentation of use of BROWSER environment variable 12 days http://bugs.python.org/issue5935 georg.brandl Problems with dbm documentation 12 days http://bugs.python.org/issue5937 georg.brandl Ambiguity in dbm.open flag documentation 14 days http://bugs.python.org/issue5942 georg.brandl email.message : get_payload args's documentation is confusing 10 days http://bugs.python.org/issue5951 georg.brandl test_distutils fails for Python 3.1b1 on MacOS X 9 days http://bugs.python.org/issue5956 nad WeakSet cmp methods 11 days http://bugs.python.org/issue5964 pitrou patch, needs review Add bug tracker tasks to PEP 101 8 days http://bugs.python.org/issue5980 georg.brandl patch Broken link to "Curses Programming with Python" 6 days http://bugs.python.org/issue5987 georg.brandl Add __bool__ to threading.Event and multiprocessing.Event 4 days http://bugs.python.org/issue5998 benjamin.peterson patch test_urllib2_localnet DigestAuthHandler leaks nonces 7 days http://bugs.python.org/issue6002 collinwinter easy optparse docs say 'default' keyword is deprecated but uses it in 3 days http://bugs.python.org/issue6009 georg.brandl Dict fails to notice addition and deletion of keys during iterat 1 days http://bugs.python.org/issue6017 georg.brandl Fix the output word from "ok" to "OK" when a testcase passes 3 days http://bugs.python.org/issue6018 benjamin.peterson test_distutils leaves a 'foo' file behind in the cwd 5 days http://bugs.python.org/issue6022 r.david.murray Search does not intelligently handle module.function queries on 3 days http://bugs.python.org/issue6023 georg.brandl documentation of xml.dom.minidom.parse signature is wrong 0 days http://bugs.python.org/issue6025 georg.brandl patch io.BufferedWriter C module missing _write_lock 0 days http://bugs.python.org/issue6030 pitrou Fix refleaks in test_urllib2_localnet 3 days http://bugs.python.org/issue6032 collinwinter patch Fix object.__reversed__ doc 0 days http://bugs.python.org/issue6034 georg.brandl test_poplib Bus error with gcc-4.4 on OS X 0 days http://bugs.python.org/issue6035 marketdickinson MutableSequence.__iadd__ should return self 2 days http://bugs.python.org/issue6037 rhettinger Should collections.Counter check for int? 1 days http://bugs.python.org/issue6038 rhettinger change sdist and register command so they use check 0 days http://bugs.python.org/issue6041 tarek HTMLParseError derivation 0 days http://bugs.python.org/issue6043 benjamin.peterson Exception message in int() when trying to convert a complex 0 days http://bugs.python.org/issue6044 marketdickinson test_distutils.py fails on VC6(Windows) 1 days http://bugs.python.org/issue6046 tarek patch str.strip() and " behaviour expected? 
0 days http://bugs.python.org/issue6049 loewis smtplib docs should link to email module examples 2 days http://bugs.python.org/issue6051 georg.brandl for-loop doesn't work with -c 0 days http://bugs.python.org/issue6052 r.david.murray distutils error on windows 0 days http://bugs.python.org/issue6053 loewis patch References to "pysqlite" in documentation of sqlite3 should be c 2 days http://bugs.python.org/issue6055 georg.brandl time.clock(): overflow in programs that run for very long 0 days http://bugs.python.org/issue6061 tom65536 build_ext fails to build in the right directory using the packag 0 days http://bugs.python.org/issue6062 tarek pydoc_data package is not installed 0 days http://bugs.python.org/issue6063 georg.brandl patch, 26backport POP_MARK was not in pickle protocol 0 1 days http://bugs.python.org/issue6066 collinwinter patch, easy unittest.TestCase._result is very likely to collide (and break) 1 days http://bugs.python.org/issue6072 michael.foord patch Itertools objects are missing "send" 0 days http://bugs.python.org/issue6080 rhettinger os.path.sameopenfile reports that standard streams are the same 0 days http://bugs.python.org/issue6082 ryles documentation of zip function is error 0 days http://bugs.python.org/issue6084 georg.brandl Correct minor typos in doanddont.rst and urllib2.rst howto docum 0 days http://bugs.python.org/issue6086 georg.brandl patch http libraries throw errors internally in BitTorrent 1884 days http://bugs.python.org/issue920573 rhettinger Documentation for Descriptors in the main docs 1809 days http://bugs.python.org/issue966625 rhettinger pdb unable to jump to first statement 785 days http://bugs.python.org/issue1689458 jyasskin patch, needs review Failure to build on AIX 5.3 773 days http://bugs.python.org/issue1694442 ajaksu2 syslog syscall support for SysLogLogger 749 days http://bugs.python.org/issue1711603 dandrzejewski patch help() can't find right source file 702 days http://bugs.python.org/issue1738179 pitrou patch, easy Top Issues Most Discussed (10) ______________________________ 11 test_distutils.py fails on VC6(Windows) 1 days closed http://bugs.python.org/issue6046 8 Interpreter crashes when chaining an infinite number of excepti 7 days open http://bugs.python.org/issue6028 8 Dict fails to notice addition and deletion of keys during itera 1 days closed http://bugs.python.org/issue6017 7 test_distutils fails for Python 3.1b1 on MacOS X 9 days closed http://bugs.python.org/issue5956 7 urllib/urllib2: HTTPS over (Squid) Proxy fails 1203 days open http://bugs.python.org/issue1424152 6 Enhanced cPython profiler with high-resolution timer 436 days open http://bugs.python.org/issue2281 5 distutils error on windows 0 days closed http://bugs.python.org/issue6053 5 Fix dbm interfaces 5 days open http://bugs.python.org/issue6045 5 Fix refleaks in test_urllib2_localnet 3 days closed http://bugs.python.org/issue6032 5 Embedding into a shared library fails 177 days open http://bugs.python.org/issue4434 From ziade.tarek at gmail.com Fri May 22 18:27:01 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 22 May 2009 18:27:01 +0200 Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure In-Reply-To: <94bdd2610905200248k79d15f2ahf5aa036ac1e7a80d@mail.gmail.com> References: <94bdd2610905141521i57727416q21f7fb13b1bdd077@mail.gmail.com> <20090514230740.D9C8A3A4061@sparrow.telecommunity.com> <94bdd2610905160906g7b4b03a1m81a7aa8c99e89968@mail.gmail.com> <20090516165302.C71643A4061@sparrow.telecommunity.com> 
<94bdd2610905190704m5efdeb4dne1d559e9964331bd@mail.gmail.com> <20090519203357.CE9C63A40D7@sparrow.telecommunity.com> <94bdd2610905200248k79d15f2ahf5aa036ac1e7a80d@mail.gmail.com> Message-ID: <94bdd2610905220927h25d58259r365b057ab56c334f@mail.gmail.com> On Wed, May 20, 2009 at 11:48 AM, Tarek Ziad? wrote: > So I guess I'll start this prototype in bitbucket and come back with it for feedback > in Distutils-SIG, for a new PEP 376 round. Ok so FYI, I moved the discussion here: http://mail.python.org/pipermail/distutils-sig/2009-May/011933.html Regards Tarek From jimjjewett at gmail.com Fri May 22 18:46:57 2009 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 22 May 2009 12:46:57 -0400 Subject: [Python-Dev] PEP 384: Defining a Stable ABI Message-ID: Martin v. L?wis wrote: > - PyGetSetDef (name, get, set, doc, closure) Is it fully decided that the generally-unused closure parameter will stay until python 4? > The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE) > are also available to applications. There have been several experiments in memory management, ranging from not bothering to change the refcount on permanent objects like None, to proxying objects across multiple threads or processes. I also believe (but don't remember for sure) that some of the proposed Unicode (or String?) optimizations changed the memory layout a bit. So far, these have all been complicated (or slow) enough that they didn't get integrated, but if it ever happens ... I don't think it would justify python 4.0 > New Python > versions may introduce new slot ids, but slot ids will never be > recycled. Slots may get deprecated, but continue to be supported > throughout Python 3.x. Weren't there already a few ready for deprecation? Do you really want to commit to them forever? Even if you aren't willing to settle for less than "3.x from now on", it might make sense to at least start with 3.2, rather than 3.0. -jJ From solipsis at pitrou.net Fri May 22 19:00:00 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 22 May 2009 17:00:00 +0000 (UTC) Subject: [Python-Dev] PEP 384: Defining a Stable ABI References: Message-ID: Jim Jewett gmail.com> writes: > > > The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE) > > are also available to applications. > > There have been several experiments in memory management, ranging from > not bothering to change the refcount on permanent objects like None, > to proxying objects across multiple threads or processes. These experiments don't seem to have been very successful, have they? Besides, Py_TYPE is a fundamental property of every PyObject. On the other hand, I think Py_SIZE should be discouraged in favour of the type-specific variants (PyString_GET_SIZE, etc.), since some types have their own way of (ab)using the size field. > I also > believe (but don't remember for sure) that some of the proposed > Unicode (or String?) optimizations changed the memory layout a bit. The one Unicode optimization I know of, in http://bugs.python.org/issue1943, is suspended because of Marc-Andre's opposition. In any case, it doesn't touch the fundamental PyObject layout. Regards Antoine. 
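A small aside to illustrate the size-field point above (an editorial sketch; PyLongObject is one concrete type that reuses ob_size, folding the sign of the number into it):

    /* Py_SIZE() exposes the raw ob_size field, which is not always a
       plain length: for ints it is negative when the value is negative.
       The type-specific accessors hide that detail. */
    #include "Python.h"

    static Py_ssize_t
    int_digit_count(PyObject *op)      /* op assumed to point to a PyLong */
    {
        Py_ssize_t n = Py_SIZE(op);    /* may be negative */
        return n < 0 ? -n : n;
    }
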
From martin at v.loewis.de Fri May 22 21:47:33 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 22 May 2009 21:47:33 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1463C8.9090200@gmail.com> References: <4A107988.3020202@v.loewis.de> <4A1463C8.9090200@gmail.com> Message-ID: <4A170155.6010402@v.loewis.de> > Something I haven't seen explicitly mentioned as yet (in the PEP or the > python-dev list discussion) are the memory management APIs and the FILE* > APIs which can cause the MSVCRT versioning issues on Windows. > > Those would either need to be excluded from the stable ABI or else > changed to use opaque pointers. Good point. As a separate issue, I would actually like to deprecate, then remove these APIs. I had originally hoped that this would happen for 3.0 already, alas, nobody worked on it. In any case, I have removed them from the ABI now. I haven't thought about the Windows CRT issue yet. I can see that there would be still problems even without that, e.g. when you do setlocale in Python, it might not affect the extension module, etc. How would you propose to deal with that? One approach would to fix the CRT version for Windows, for the lifetime of 3.x. Another approach could be to document the known restrictions, and otherwise declare "use at your own risk". Regards, Martin From dalcinl at gmail.com Sat May 23 02:50:33 2009 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 22 May 2009 21:50:33 -0300 Subject: [Python-Dev] PEP 384: a request for PyType_Slot Message-ID: Martin, a small request. Any chance you consider defining PyType_Slot like below? typedef struct{ int slot; /* slot id, see below */ void *pdata; /* data pointer */ void (*pfunc)(void); /* function pointer */ } PyType_Slot Or perhaps other way? Just to avoid compilers complaining about the illegal conversion between pointers to data and pointers to functions... It would be really annoying being force to do type-punning using an union in order to get "correct" C code... -- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From aahz at pythoncraft.com Sat May 23 18:55:52 2009 From: aahz at pythoncraft.com (Aahz) Date: Sat, 23 May 2009 09:55:52 -0700 Subject: [Python-Dev] FWD: FTP URLs for Python source Message-ID: <20090523165552.GA6818@panix.com> Yes, this is ancient, I've been putting off dealing with it because I couldn't figure out who should handle it. At this point, I think that if anyone does it should be the release team, therefore I'm forwarding to python-dev. Feel free to tell me I made the wrong choice. ;-) ----- Forwarded message from "Douglas W. Goodall" ----- > From: "Douglas W. Goodall" > To: webmaster at python.org > Subject: made too hard... > Date: Mon, 16 Feb 2009 05:57:15 -0800 > > Dear Sir, > > I am not sure why, but you have made it harder than it has to be to > fetch the python source for installation on a unix system such as > OpenBSD. > > I had to use the command line ftp client and it took a lot of time to > discover the real > URL of the download file. > > Here is what ended up working. > > ftp http://www.e you made it this hard on purpose. 
Yes, it is easy if > you > are using a web browser, but if you are on a unix system without X > it is a pain to get it when you don't know how. > > You might want to add the ftp URL to the web page for people like me. > > Respectfully, > > Doug > > --- > Douglas W. Goodall > 425 San Juanico Street > Santa Maria, CA 93455 > (805) 598-9099 > http://www.goodall.com > > I call on each of us to pray for our president. > He is who we have for the next four years, > and we need him to be successful for all of > us. God Bless America, and the President. ----- End forwarded message ----- -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From martin at v.loewis.de Sat May 23 22:44:53 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 23 May 2009 22:44:53 +0200 Subject: [Python-Dev] FWD: FTP URLs for Python source In-Reply-To: <20090523165552.GA6818@panix.com> References: <20090523165552.GA6818@panix.com> Message-ID: <4A186045.5070107@v.loewis.de> Aahz wrote: > Yes, this is ancient, I've been putting off dealing with it because I > couldn't figure out who should handle it. At this point, I think that if > anyone does it should be the release team, therefore I'm forwarding to > python-dev. Feel free to tell me I made the wrong choice. ;-) I don't think it needs any action, except perhaps a half-polite response that we don't intend to change anything. a) if you are really sitting on the console of an OpenBSD system with no X installed, use lynx, or any other text browser: scroll down to "Source distribution", hit Enter b) alternatively, and even better: don't build Python from source at all. Instead, use pkg_add to install the Python version that you want, downloadable from ftp.openbsd.org/pub/OpenBSD//packages//python-.tgz c) OTOH, if you had only connected to the OpenBSD system remotely (e.g. through ssh), just use your local web browser, to either * determine the full source download URL of the Python release you want to build, then wget on the target system, or * if your target system doesn't have wget, download it locally, then scp/rcp/ftp it to the target system. We cannot add an FTP URL to the download page, because we don't run an ftp server anymore, and don't plan to. [I don't quite get the "Here is what ended up working" part. What is http://www.e?] Regards, Martin From hasan.diwan at gmail.com Sun May 24 02:48:02 2009 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Sat, 23 May 2009 17:48:02 -0700 Subject: [Python-Dev] FWD: FTP URLs for Python source In-Reply-To: <4A186045.5070107@v.loewis.de> References: <20090523165552.GA6818@panix.com> <4A186045.5070107@v.loewis.de> Message-ID: <2cda2fc90905231748x8db04fbi1a7d0e68d99c718e@mail.gmail.com> > Aahz wrote: >> Yes, this is ancient, I've been putting off dealing with it because I >> couldn't figure out who should handle it. ?At this point, I think that if >> anyone does it should be the release team, therefore I'm forwarding to >> python-dev. ?Feel free to tell me I made the wrong choice. ?;-) Regarding OpenBSD, what's the problem with just using the port -- the 2.6 version seems to work fine. 
-- Sent from my mobile device From aahz at pythoncraft.com Sun May 24 10:54:51 2009 From: aahz at pythoncraft.com (Aahz) Date: Sun, 24 May 2009 01:54:51 -0700 Subject: [Python-Dev] FWD: FTP URLs for Python source In-Reply-To: <4A186045.5070107@v.loewis.de> References: <20090523165552.GA6818@panix.com> <4A186045.5070107@v.loewis.de> Message-ID: <20090524085451.GA19579@panix.com> On Sat, May 23, 2009, "Martin v. L?wis" wrote: > > We cannot add an FTP URL to the download page, because we don't > run an ftp server anymore, and don't plan to. That's the critical bit. At this point, I don't think anything else needs doing. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From andymac at bullseye.apana.org.au Sun May 24 12:34:07 2009 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Sun, 24 May 2009 20:34:07 +1000 Subject: [Python-Dev] FWD: FTP URLs for Python source In-Reply-To: <4A186045.5070107@v.loewis.de> References: <20090523165552.GA6818@panix.com> <4A186045.5070107@v.loewis.de> Message-ID: <4A19229F.8080203@bullseye.andymac.org> Martin v. L?wis wrote: > * if your target system doesn't have wget, download it locally, > then scp/rcp/ftp it to the target system. All of [Free|Net|Open|Dragonfly]BSD have ftp clients that can also retrieve HTTP URLs, though I guess many wouldn't think of that... -- ------------------------------------------------------------------------- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac at pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From charles.r.mccreary at gmail.com Sun May 24 15:20:56 2009 From: charles.r.mccreary at gmail.com (Charles McCreary) Date: Sun, 24 May 2009 08:20:56 -0500 Subject: [Python-Dev] Introducing GSOC student James Pruitt Message-ID: <8294397f0905240620q262a5ea7le3722cc5eb401899@mail.gmail.com> I am a mentor for a GSOC 2009 student working on a PSF project. His project abstract is "Handling of subprocess async io issues, testing and reimplementing the commands module in terms of subprocess." He has started a blog, http://subdev.blogspot.com/, in which he is providing general information on his GSOC project. In the next few days, he will start a project on google code so that interested parties can help guide his work. I urge anyone interested in the subprocess module to interact with Mr. Pruitt and provide feedback/suggestions/encouragement. Charles R. McCreary -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Mon May 25 14:51:01 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 25 May 2009 07:51:01 -0500 Subject: [Python-Dev] python-checkins is down Message-ID: <1afaf6160905250551y1100f29dp228a35e1942744f4@mail.gmail.com> I haven't gotten emails for any of the commits I've done in the last 12 hours or so. 
-- Regards, Benjamin From aahz at pythoncraft.com Mon May 25 15:44:37 2009 From: aahz at pythoncraft.com (Aahz) Date: Mon, 25 May 2009 06:44:37 -0700 Subject: [Python-Dev] python-checkins is down In-Reply-To: <1afaf6160905250551y1100f29dp228a35e1942744f4@mail.gmail.com> References: <1afaf6160905250551y1100f29dp228a35e1942744f4@mail.gmail.com> Message-ID: <20090525134437.GB16887@panix.com> On Mon, May 25, 2009, Benjamin Peterson wrote: > > I haven't gotten emails for any of the commits I've done in the last > 12 hours or so. Forwarded to postmaster at python.org -- if there's a problem with the checkins process itself, that won't help. Have you verified that the commits are landing? (I.e. is svn working properly?) Also, if you could double-check the python-checkins archives to see whether it's just you not getting the messages, that would help. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From solipsis at pitrou.net Mon May 25 15:49:50 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 25 May 2009 13:49:50 +0000 (UTC) Subject: [Python-Dev] python-checkins is down References: <1afaf6160905250551y1100f29dp228a35e1942744f4@mail.gmail.com> <20090525134437.GB16887@panix.com> Message-ID: Aahz pythoncraft.com> writes: > > Forwarded to postmaster python.org -- if there's a problem with the > checkins process itself, that won't help. Have you verified that the > commits are landing? (I.e. is svn working properly?) Yes, it is. > Also, if you > could double-check the python-checkins archives to see whether it's just > you not getting the messages, that would help. The messages aren't in the archives either. cheers Antoine. From mal at egenix.com Mon May 25 19:41:54 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 25 May 2009 19:41:54 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A107988.3020202@v.loewis.de> References: <4A107988.3020202@v.loewis.de> Message-ID: <4A1AD862.6090100@egenix.com> Martin v. L?wis wrote: > Thomas Wouters reminded me of a long-standing idea; I finally > found the time to write it down. > > Please comment! > ... > Up until this PEP proposal, we had a very simple scheme for the Python C-API: all documented functions and variables with a "Py" prefix were part of the C-API, everything else was not and could change between releases (in particular the private "_Py" prefix APIs). Changing the published APIs was considered a bad thing in the 2.x development process and generally required a good reason to get supported. Changing private functions or ones that were not documented was generally never a big problem. Now, with the PEP, I have a feeling that the Python C-API will in effect be limited to what's in the PEP's idea of a usable ABI and open up the non-inluded public C-APIs to the same rate of change as the private APIs. If that's the case, the PEP should be discussed on the C-API list first, in order to identify a complete list of APIs that is supposed to define the Python C-API. Ideally, all other APIs would then need to be made private. However, I doubt that this is possible before switching to Python 4.0. Then again, I'm not sure whether that's what you're aiming for... An optional cross-version ABI would certainly be a good thing. Limiting the Python C-API would be counterproductive. 
> During the compilation of applications, the preprocessor macro > Py_LIMITED_API must be defined. Doing so will hide all definitions > that are not part of the ABI. So extensions wanting to use the full Python C-API as documented in the C-API docs will still be able to do this, right ? > Type Objects > ------------ > > The structure of type objects is not available to applications; > declaration of "static" type objects is not possible anymore > (for applications using this ABI). Hmm, that's going to create big problems for extensions that want to expose a C-API for their types: Type checks are normally done by pointer comparison using those static type objects. > Functions and function-like Macros > ---------------------------------- > > Function-like macros (in particular, field access macros) remain > available to applications, but get replaced by function calls > (unless their definition only refers to features of the ABI, such > as the various _Check macros) Including Py_INCREF()/Py_DECREF() ? > Excluded Functions > ------------------ > > Functions declared in the following header files are not part > of the ABI: > - cellobject.h > - classobject.h > - code.h > - frameobject.h > - funcobject.h > - genobject.h > - pyarena.h > - pydebug.h > - symtable.h > - token.h > - traceback.h I don't think that's feasable: you basically remove all introspection functions that way. This will need a more fine-grained approach. > Linkage > ------- > > On Windows, applications shall link with python3.dll; You mean: extensions that were compiled with Py_LIMITED_API, right ? > an import > library python3.lib will be available. This DLL will redirect all of > its API functions through /export linker options to the full > interpreter DLL, i.e. python3y.dll. What if you mix extensions that use the full C-API with ones that restrict themselves to the limited version ? Would creating a Python object in a full-API extension and free'ing it in a limited-API extension cause problems ? > Implementation Strategy > ======================= > > This PEP will be implemented in a branch, allowing users to check > whether their modules conform to the ABI. To simplify this testing, an > additional macro Py_LIMITED_API_WITH_TYPES will expose the existing > type object layout, to let users postpone rewriting all types. When > the this branch is merged into the 3.2 code base, this macro will > be removed. Now I'm confused again: this sounds a lot like you do want all extension writers to only use the limited API. [And in another post] >> Something I haven't seen explicitly mentioned as yet (in the PEP or the >> > python-dev list discussion) are the memory management APIs and the FILE* >> > APIs which can cause the MSVCRT versioning issues on Windows. >> > >> > Those would either need to be excluded from the stable ABI or else >> > changed to use opaque pointers. > > Good point. As a separate issue, I would actually like to deprecate, > then remove these APIs. I had originally hoped that this would happen > for 3.0 already, alas, nobody worked on it. > > In any case, I have removed them from the ABI now. How do you expect Python extensions to allocate memory and objects in a platform independent way without those APIs ? And as an aside: Which API families are you referring to ? PyMem_Malloc, PyObject_Malloc, or PyObject_New ? Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 25 2009) >>> Python/Zope Consulting and Support ... 
http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 34 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Mon May 25 23:04:58 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 26 May 2009 07:04:58 +1000 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1AD862.6090100@egenix.com> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> Message-ID: <4A1B07FA.6010509@gmail.com> M.-A. Lemburg wrote: > Now, with the PEP, I have a feeling that the Python C-API > will in effect be limited to what's in the PEP's idea of > a usable ABI and open up the non-inluded public C-APIs > to the same rate of change as the private APIs. Not really - before this PEP it was already fairly easy to write an extension that was source-level compatible with multiple versions of Python (depending on exactly what you wanted to do, of course). However, it is essentially impossible to make an extension that is binary level compatible with multiple versions. With the defined stable ABI in place, each extension module author will be able to make a choice: - choose binary compatibility by limiting themselves to the stable ABI and be able to provide a single binary that will still work with later versions of Py3k - stick with source compatibility and continue to provide new binaries for each version of Python > An optional cross-version ABI would certainly be a good thing. > > Limiting the Python C-API would be counterproductive. I don't think anyone would disagree with that. A discussion on C-API sig would certainly be a good idea. >> During the compilation of applications, the preprocessor macro >> Py_LIMITED_API must be defined. Doing so will hide all definitions >> that are not part of the ABI. > > So extensions wanting to use the full Python C-API as documented > in the C-API docs will still be able to do this, right ? Yep - they just wouldn't define the new macro. >> Type Objects >> ------------ >> >> The structure of type objects is not available to applications; >> declaration of "static" type objects is not possible anymore >> (for applications using this ABI). > > Hmm, that's going to create big problems for extensions that > want to expose a C-API for their types: Type checks are normally > done by pointer comparison using those static type objects. They would just have to expose "MyExtensionPrefix_MyType_Check" and "MyExtensionPrefix_MyType_CheckExact" functions the same way that types in the C API do. >> Functions and function-like Macros >> ---------------------------------- >> >> Function-like macros (in particular, field access macros) remain >> available to applications, but get replaced by function calls >> (unless their definition only refers to features of the ABI, such >> as the various _Check macros) > > Including Py_INCREF()/Py_DECREF() ? I believe so - MvL deliberately left the fields that the ref counting relies on as part of the ABI. 
>> Excluded Functions >> ------------------ >> >> Functions declared in the following header files are not part >> of the ABI: >> - cellobject.h >> - classobject.h >> - code.h >> - frameobject.h >> - funcobject.h >> - genobject.h >> - pyarena.h >> - pydebug.h >> - symtable.h >> - token.h >> - traceback.h > > I don't think that's feasable: you basically remove all introspection > functions that way. > > This will need a more fine-grained approach. I don't think it is reasonable to expect the introspection interfaces to remain stable at a binary level across versions. Having "I want deep introspection support from C" and "I want to use a single binary for multiple Python versions" be mutually exclusive choices sounds like a perfectly sensible position to me. Also, keep in mind that even an extension module that restricts itself to Py_LIMITED_API would still be able to call in to the Python equivalents via PyObject_Call and friends (e.g. by importing and using the inspect and traceback modules). > What if you mix extensions that use the full C-API with ones > that restrict themselves to the limited version ? > > Would creating a Python object in a full-API extension and > free'ing it in a limited-API extension cause problems ? Possibly, if you end up mixing C runtimes in the process. Specifically: 1. Python linked with MSVCRT X 2. Full extension module linked with MSVCRT Y 3. Limited extension module linked with MSVCRT Z The PyMem/PyObject APIs in the limited extension module will use the heap in MSVCRT X, since they will be redirected through the Python stable ABI as function calls. However, if the full extension module uses the macro forms and links with the wrong MSVCRT version, then you have the usual opportunities for conflicts between the two C runtimes. This isn't a problem created by defining a stable ABI though - it's the main reason mixing C runtimes is a bad idea. (The two others we have noted so far being IO issues, especially attempting to share FILE* instances and the fact that changing the locale will only affect whichever runtime the extension module linked against). >> Good point. As a separate issue, I would actually like to deprecate, >> then remove these APIs. I had originally hoped that this would happen >> for 3.0 already, alas, nobody worked on it. >> >> In any case, I have removed them from the ABI now. > > How do you expect Python extensions to allocate memory and objects > in a platform independent way without those APIs ? > > And as an aside: Which API families are you referring to ? PyMem_Malloc, > PyObject_Malloc, or PyObject_New ? The ones with a FILE* parameter in the signature. There's no problem with the PyMem/PyObject functions since those will be redirected to consistently use the version of the C runtime that Python was originally linked against (their macro counterparts are obviously off limits for the stable ABI). Cheers, Nick. 
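To make Nick's last point concrete: a module restricted to the stable ABI can still reach the Python-level introspection machinery through the abstract object calls. A rough sketch, with error handling trimmed; whether every call used below lands inside the final ABI is an assumption.

    #define Py_LIMITED_API
    #include "Python.h"

    /* Print the pending C-level exception by delegating to the pure
     * Python traceback module instead of calling PyTraceBack_Print(). */
    static void
    print_pending_exception(void)
    {
        PyObject *type, *value, *tb, *mod, *result;

        if (!PyErr_Occurred())
            return;

        PyErr_Fetch(&type, &value, &tb);
        PyErr_NormalizeException(&type, &value, &tb);

        mod = PyImport_ImportModule("traceback");
        if (mod != NULL) {
            result = PyObject_CallMethod(mod, "print_exception", "OOO",
                                         type  ? type  : Py_None,
                                         value ? value : Py_None,
                                         tb    ? tb    : Py_None);
            Py_XDECREF(result);
            Py_DECREF(mod);
        }
        Py_XDECREF(type);
        Py_XDECREF(value);
        Py_XDECREF(tb);
    }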
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From aahz at pythoncraft.com Mon May 25 23:54:16 2009 From: aahz at pythoncraft.com (Aahz) Date: Mon, 25 May 2009 14:54:16 -0700 Subject: [Python-Dev] FWD: python-checkins is down Message-ID: <20090525215416.GA19344@panix.com> ----- Forwarded message from Ralf Hildebrandt ----- > Date: Mon, 25 May 2009 21:59:32 +0200 > From: Ralf Hildebrandt > To: Patrick Ben Koetter > Cc: Aahz , postmaster at python.org > Subject: Re: FWD: Re: [Python-Dev] python-checkins is down > > * Patrick Ben Koetter : >> This just hit python-checkins at python.org: >> >> May 25 20:50:33 albatross postfix/local[12976]: A029ED5FF: to=, orig_to=, relay=local, delay=0.17, delays=0.09/0/0/0.08, dsn=2.0.0, status=sent (delivered to command: /usr/local/mailman/mail/mailman post python-checkins) >> >> Looks like the list itself is online and can be reached. >> >> I didn't read the whole thread (deleted part of it already). >> If that isn't the problem, what should I look for then? > > I let all the mails through and set the senders to the "may send > although they're not members" > > -- > Ralf Hildebrandt > Gesch?ftsbereich IT | Abteilung Netzwerk > Charit? - Universit?tsmedizin Berlin > Campus Benjamin Franklin > Hindenburgdamm 30 | D-12200 Berlin > Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 > Ralf.Hildebrandt at charite.de | http://www.charite.de ----- End forwarded message ----- -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson From google at mrabarnett.plus.com Tue May 26 01:50:58 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 26 May 2009 00:50:58 +0100 Subject: [Python-Dev] Arguments of MatchObject in re module Message-ID: <4A1B2EE2.3030107@mrabarnett.plus.com> I've just noticed an oddity of the re module while looking at the sources. I'll illustrate it below: >>> import re >>> p = re.compile("foo") >>> help(p.match) Help on built-in function match: match(...) match(string[, pos[, endpos]]) --> match object or None. Matches zero or more characters at the beginning of the string >>> p.match(string="foo") Traceback (most recent call last): File "", line 1, in p.match(string="foo") TypeError: Required argument 'pattern' (pos 1) not found >>> The name of the first argument should be "string", yet it's "pattern". Does anyone know if it's anything other than a mistake? Should it be fixed in the next version of the re module, or are we just stuck with it (and should just change the docstring to match)? From martin at v.loewis.de Tue May 26 08:59:51 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 26 May 2009 08:59:51 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1AD862.6090100@egenix.com> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> Message-ID: <4A1B9367.7090808@v.loewis.de> > Now, with the PEP, I have a feeling that the Python C-API > will in effect be limited to what's in the PEP's idea of > a usable ABI and open up the non-inluded public C-APIs > to the same rate of change as the private APIs. That's certainly not the plan. Instead, the plan is to have a stable ABI. The policy on the API isn't affected, except for restricting changes to the API that would break the ABI. 
>> During the compilation of applications, the preprocessor macro >> Py_LIMITED_API must be defined. Doing so will hide all definitions >> that are not part of the ABI. > > So extensions wanting to use the full Python C-API as documented > in the C-API docs will still be able to do this, right ? Correct. They would link to the version-specific DLL on Windows. >> The structure of type objects is not available to applications; >> declaration of "static" type objects is not possible anymore >> (for applications using this ABI). > > Hmm, that's going to create big problems for extensions that > want to expose a C-API for their types: Type checks are normally > done by pointer comparison using those static type objects. I don't see the problem. During module initialization, you create the type object and store it in a global variable, and then both clients and the module compare against the stored pointer. >> Function-like macros (in particular, field access macros) remain >> available to applications, but get replaced by function calls >> (unless their definition only refers to features of the ABI, such >> as the various _Check macros) > > Including Py_INCREF()/Py_DECREF() ? Yes, although some people are requesting that these become functions. >> Excluded Functions >> ------------------ >> >> Functions declared in the following header files are not part >> of the ABI: >> - cellobject.h >> - classobject.h >> - code.h >> - frameobject.h >> - funcobject.h >> - genobject.h >> - pyarena.h >> - pydebug.h >> - symtable.h >> - token.h >> - traceback.h > > I don't think that's feasable: you basically remove all introspection > functions that way. > > This will need a more fine-grained approach. What specifically is it that you want to do in a module that you couldn't do anymore? >> On Windows, applications shall link with python3.dll; > > You mean: extensions that were compiled with Py_LIMITED_API, right ? Correct, see "Terminology" in the PEP. > >> an import >> library python3.lib will be available. This DLL will redirect all of >> its API functions through /export linker options to the full >> interpreter DLL, i.e. python3y.dll. > > What if you mix extensions that use the full C-API with ones > that restrict themselves to the limited version ? Some link against python3.dll, others against python32.dll (say). > Would creating a Python object in a full-API extension and > free'ing it in a limited-API extension cause problems ? No problem that I can see. >> This PEP will be implemented in a branch, allowing users to check >> whether their modules conform to the ABI. To simplify this testing, an >> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing >> type object layout, to let users postpone rewriting all types. When >> the this branch is merged into the 3.2 code base, this macro will >> be removed. > > Now I'm confused again: this sounds a lot like you do want all extension > writers to only use the limited API. I certainly want to support as many modules as reasonable with the PEP. Whether or not developers then chose to build version-independent binaries is certainly outside the scope of the PEP - it only specifies action items for Python, not for application authors. >>> Something I haven't seen explicitly mentioned as yet (in the PEP or the >>>> python-dev list discussion) are the memory management APIs and the FILE* >>>> APIs which can cause the MSVCRT versioning issues on Windows. 
>>>> >>>> Those would either need to be excluded from the stable ABI or else >>>> changed to use opaque pointers. >> Good point. As a separate issue, I would actually like to deprecate, >> then remove these APIs. I had originally hoped that this would happen >> for 3.0 already, alas, nobody worked on it. >> >> In any case, I have removed them from the ABI now. > > How do you expect Python extensions to allocate memory and objects > in a platform independent way without those APIs ? I have only removed functions from the ABI that have FILE* in their signatures. > And as an aside: Which API families are you referring to ? PyMem_Malloc, > PyObject_Malloc, or PyObject_New ? Neither. PyRun_AnyFileFlags and friends. Regards, Martin From mal at egenix.com Tue May 26 18:28:59 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 26 May 2009 18:28:59 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1B07FA.6010509@gmail.com> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> <4A1B07FA.6010509@gmail.com> Message-ID: <4A1C18CB.6040208@egenix.com> Nick Coghlan wrote: > M.-A. Lemburg wrote: >> Now, with the PEP, I have a feeling that the Python C-API >> will in effect be limited to what's in the PEP's idea of >> a usable ABI and open up the non-inluded public C-APIs >> to the same rate of change as the private APIs. > > Not really - before this PEP it was already fairly easy to write an > extension that was source-level compatible with multiple versions of > Python (depending on exactly what you wanted to do, of course). Right and I hope that things stay that way. > However, it is essentially impossible to make an extension that is > binary level compatible with multiple versions. On Windows, yes. On Unix, this often worked, even though it wasn't always safe to do. In practice it's usually better to recompile extensions for every single release. > With the defined stable ABI in place, each extension module author will > be able to make a choice: > - choose binary compatibility by limiting themselves to the stable ABI > and be able to provide a single binary that will still work with later > versions of Py3k > - stick with source compatibility and continue to provide new binaries > for each version of Python Great ! >> An optional cross-version ABI would certainly be a good thing. >> >> Limiting the Python C-API would be counterproductive. > > I don't think anyone would disagree with that. A discussion on C-API sig > would certainly be a good idea. > >>> During the compilation of applications, the preprocessor macro >>> Py_LIMITED_API must be defined. Doing so will hide all definitions >>> that are not part of the ABI. >> So extensions wanting to use the full Python C-API as documented >> in the C-API docs will still be able to do this, right ? > > Yep - they just wouldn't define the new macro. Good ! >>> Type Objects >>> ------------ >>> >>> The structure of type objects is not available to applications; >>> declaration of "static" type objects is not possible anymore >>> (for applications using this ABI). >> Hmm, that's going to create big problems for extensions that >> want to expose a C-API for their types: Type checks are normally >> done by pointer comparison using those static type objects. > > They would just have to expose "MyExtensionPrefix_MyType_Check" and > "MyExtensionPrefix_MyType_CheckExact" functions the same way that types > in the C API do. Hmm, that's a function call per type check and will slow things down a lot, esp. 
when working with APIs that deal a lot with these objects. The typical way to implement these type checks is via a simple pointer comparison (falling back to a function for sub-types). That's cheap and fast. >>> Functions and function-like Macros >>> ---------------------------------- >>> >>> Function-like macros (in particular, field access macros) remain >>> available to applications, but get replaced by function calls >>> (unless their definition only refers to features of the ABI, such >>> as the various _Check macros) >> Including Py_INCREF()/Py_DECREF() ? > > I believe so - MvL deliberately left the fields that the ref counting > relies on as part of the ABI. Hmm, another slow-down. This one has even more impact if you're writing extensions that have to deal with lots of objects. >>> Excluded Functions >>> ------------------ >>> >>> Functions declared in the following header files are not part >>> of the ABI: >>> - cellobject.h >>> - classobject.h >>> - code.h >>> - frameobject.h >>> - funcobject.h >>> - genobject.h >>> - pyarena.h >>> - pydebug.h >>> - symtable.h >>> - token.h >>> - traceback.h >> I don't think that's feasable: you basically remove all introspection >> functions that way. >> >> This will need a more fine-grained approach. > > I don't think it is reasonable to expect the introspection interfaces to > remain stable at a binary level across versions. > > Having "I want deep introspection support from C" and "I want to use a > single binary for multiple Python versions" be mutually exclusive > choices sounds like a perfectly sensible position to me. > > Also, keep in mind that even an extension module that restricts itself > to Py_LIMITED_API would still be able to call in to the Python > equivalents via PyObject_Call and friends (e.g. by importing and using > the inspect and traceback modules). Sure, but they'd also want to print tracebacks or raise fatal errors if necessary. >> What if you mix extensions that use the full C-API with ones >> that restrict themselves to the limited version ? >> >> Would creating a Python object in a full-API extension and >> free'ing it in a limited-API extension cause problems ? > > Possibly, if you end up mixing C runtimes in the process. Specifically: > 1. Python linked with MSVCRT X > 2. Full extension module linked with MSVCRT Y > 3. Limited extension module linked with MSVCRT Z > > The PyMem/PyObject APIs in the limited extension module will use the > heap in MSVCRT X, since they will be redirected through the Python > stable ABI as function calls. However, if the full extension module uses > the macro forms and links with the wrong MSVCRT version, then you have > the usual opportunities for conflicts between the two C runtimes. > > This isn't a problem created by defining a stable ABI though - it's the > main reason mixing C runtimes is a bad idea. (The two others we have > noted so far being IO issues, especially attempting to share FILE* > instances and the fact that changing the locale will only affect > whichever runtime the extension module linked against). Of course, but the stable ABI encourages mixing extensions regardless of what runtime they were compiled with. This is not much of an issue if the C runtime DLL doesn't change between releases, but it becomes a problem when they do e.g. due to an upgrade to a new MSVC++ compiler version or in case the extension was downloaded pre-compiled from pypi or some other site. 
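The "cheap and fast" pointer comparison Marc-Andre refers to at the start of this message typically looks like this in an extension that exports a static type object (names are illustrative):

    /* Classic extension style: the type is a static struct, so its
     * address is a link-time constant that client code compares against. */
    static PyTypeObject MyThing_Type = {
        PyVarObject_HEAD_INIT(NULL, 0)
        "mymodule.MyThing",               /* tp_name */
        /* ... remaining slots ... */
    };

    /* Exact check: a single pointer comparison, no function call. */
    #define MyThing_CheckExact(op)  (Py_TYPE(op) == &MyThing_Type)

    /* Subtype-aware check: fall back to a real call only for subclasses. */
    #define MyThing_Check(op) \
        (MyThing_CheckExact(op) || \
         PyType_IsSubtype(Py_TYPE(op), &MyThing_Type))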
I think the module import API should check for possible incompatibilities here and issue a warning (much like it does now for differences in the Python API version). >>> Good point. As a separate issue, I would actually like to deprecate, >>> then remove these APIs. I had originally hoped that this would happen >>> for 3.0 already, alas, nobody worked on it. >>> >>> In any case, I have removed them from the ABI now. >> How do you expect Python extensions to allocate memory and objects >> in a platform independent way without those APIs ? >> >> And as an aside: Which API families are you referring to ? PyMem_Malloc, >> PyObject_Malloc, or PyObject_New ? > > The ones with a FILE* parameter in the signature. There's no problem > with the PyMem/PyObject functions since those will be redirected to > consistently use the version of the C runtime that Python was originally > linked against (their macro counterparts are obviously off limits for > the stable ABI). Ah, ok. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 26 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue May 26 18:42:37 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 26 May 2009 18:42:37 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1B9367.7090808@v.loewis.de> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> <4A1B9367.7090808@v.loewis.de> Message-ID: <4A1C1BFD.8090700@egenix.com> Martin v. L?wis wrote: >> Now, with the PEP, I have a feeling that the Python C-API >> will in effect be limited to what's in the PEP's idea of >> a usable ABI and open up the non-inluded public C-APIs >> to the same rate of change as the private APIs. > > That's certainly not the plan. Instead, the plan is to have > a stable ABI. The policy on the API isn't affected, except > for restricting changes to the API that would break the ABI. Thanks for clarifying this. >>> During the compilation of applications, the preprocessor macro >>> Py_LIMITED_API must be defined. Doing so will hide all definitions >>> that are not part of the ABI. >> So extensions wanting to use the full Python C-API as documented >> in the C-API docs will still be able to do this, right ? > > Correct. They would link to the version-specific DLL on Windows. Good. >>> The structure of type objects is not available to applications; >>> declaration of "static" type objects is not possible anymore >>> (for applications using this ABI). >> Hmm, that's going to create big problems for extensions that >> want to expose a C-API for their types: Type checks are normally >> done by pointer comparison using those static type objects. > > I don't see the problem. During module initialization, you > create the type object and store it in a global variable, and > then both clients and the module compare against the stored > pointer. Ah, good point ! 
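A sketch of the pattern Martin describes in the exchange just above: the pointer comparison stays, but the static PyTypeObject goes away. How the type object is actually created under the stable ABI is still open at this point in the thread, so the constructor below is a hypothetical placeholder.

    #define Py_LIMITED_API
    #include "Python.h"

    /* Set once during module initialization; also exported to client
     * extensions (e.g. as a module attribute) so they can compare too. */
    static PyObject *MyThing_TypeRef = NULL;

    /* Still one pointer comparison in the common, exact-match case. */
    #define MyThing_CheckExact(op) \
        ((PyObject *)Py_TYPE(op) == MyThing_TypeRef)

    static int
    init_mything_type(PyObject *module)
    {
        /* Hypothetical helper: build the type through whatever factory
         * the stable ABI ends up providing. */
        MyThing_TypeRef = make_mything_type();
        if (MyThing_TypeRef == NULL)
            return -1;

        Py_INCREF(MyThing_TypeRef);            /* keep our own reference */
        return PyModule_AddObject(module, "MyThing", MyThing_TypeRef);
    }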
>>> Function-like macros (in particular, field access macros) remain >>> available to applications, but get replaced by function calls >>> (unless their definition only refers to features of the ABI, such >>> as the various _Check macros) >> Including Py_INCREF()/Py_DECREF() ? > > Yes, although some people are requesting that these become functions. I'd opt against that, simply because it creates a lot of overhead due to the function call and issues with cache locality. >>> Excluded Functions >>> ------------------ >>> >>> Functions declared in the following header files are not part >>> of the ABI: >>> - cellobject.h >>> - classobject.h >>> - code.h >>> - frameobject.h >>> - funcobject.h >>> - genobject.h >>> - pyarena.h >>> - pydebug.h >>> - symtable.h >>> - token.h >>> - traceback.h >> I don't think that's feasable: you basically remove all introspection >> functions that way. >> >> This will need a more fine-grained approach. > > What specifically is it that you want to do in a module that you > couldn't do anymore? See my reply to Nick: some of the functions are needed even if you don't want to do introspection, such as Py_FatalError() or PyTraceBack_Print(). BTW: Given the headline, I take it that the various type checking macros in these header will still be available, right ? >>> On Windows, applications shall link with python3.dll; >> You mean: extensions that were compiled with Py_LIMITED_API, right ? > > Correct, see "Terminology" in the PEP. Good, thanks. >>> an import >>> library python3.lib will be available. This DLL will redirect all of >>> its API functions through /export linker options to the full >>> interpreter DLL, i.e. python3y.dll. >> What if you mix extensions that use the full C-API with ones >> that restrict themselves to the limited version ? > > Some link against python3.dll, others against python32.dll (say). > >> Would creating a Python object in a full-API extension and >> free'ing it in a limited-API extension cause problems ? > > No problem that I can see. Can we be sure that the MSVCRT used by python35.dll stays compatible to the one used by say python32.dll ? What if the CRT memory management changes between MSVCRT versions ? Another aspect to consider: How will this work in the light of having multiple copies of Python installed on a Windows machine ? They implementation section suggests that python3.dll would always redirect to the python3x.dll for which it was installed, ie. if I have Python 3.5 installed, but then need to run some app with Python 3.2, the installed python3.dll would then point back to the python32.dll. Now, if I start a Python 3.5 application which uses a limited API extension, this would try to load python32.dll into the Python 3.5 process. AFAIK, that's not possible due to the naming conflicts. >>> This PEP will be implemented in a branch, allowing users to check >>> whether their modules conform to the ABI. To simplify this testing, an >>> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing >>> type object layout, to let users postpone rewriting all types. When >>> the this branch is merged into the 3.2 code base, this macro will >>> be removed. >> Now I'm confused again: this sounds a lot like you do want all extension >> writers to only use the limited API. > > I certainly want to support as many modules as reasonable with the PEP. 
> Whether or not developers then chose to build version-independent > binaries is certainly outside the scope of the PEP - it only specifies > action items for Python, not for application authors. Thanks for the clarification. >>>> Something I haven't seen explicitly mentioned as yet (in the PEP or the >>>>> python-dev list discussion) are the memory management APIs and the FILE* >>>>> APIs which can cause the MSVCRT versioning issues on Windows. >>>>> >>>>> Those would either need to be excluded from the stable ABI or else >>>>> changed to use opaque pointers. >>> Good point. As a separate issue, I would actually like to deprecate, >>> then remove these APIs. I had originally hoped that this would happen >>> for 3.0 already, alas, nobody worked on it. >>> >>> In any case, I have removed them from the ABI now. >> How do you expect Python extensions to allocate memory and objects >> in a platform independent way without those APIs ? > > I have only removed functions from the ABI that have FILE* in their > signatures. > >> And as an aside: Which API families are you referring to ? PyMem_Malloc, >> PyObject_Malloc, or PyObject_New ? > > Neither. PyRun_AnyFileFlags and friends. Good. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 26 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 33 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From martin at v.loewis.de Tue May 26 20:31:16 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 26 May 2009 20:31:16 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1C18CB.6040208@egenix.com> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> <4A1B07FA.6010509@gmail.com> <4A1C18CB.6040208@egenix.com> Message-ID: <4A1C3574.6040306@v.loewis.de> >>>> The structure of type objects is not available to applications; >>>> declaration of "static" type objects is not possible anymore >>>> (for applications using this ABI). >>> Hmm, that's going to create big problems for extensions that >>> want to expose a C-API for their types: Type checks are normally >>> done by pointer comparison using those static type objects. >> They would just have to expose "MyExtensionPrefix_MyType_Check" and >> "MyExtensionPrefix_MyType_CheckExact" functions the same way that types >> in the C API do. > > Hmm, that's a function call per type check and will slow things > down a lot, esp. when working with APIs that deal a lot with > these objects. See my other response. You can continue to provide _Check macros; knowledge of the structure of types is not necessary to perform such checks. > The typical way to implement these type checks is via a simple > pointer comparison (falling back to a function for sub-types). > That's cheap and fast. And will continue to be available to ABI-compliant extensions. >>> Including Py_INCREF()/Py_DECREF() ? >> I believe so - MvL deliberately left the fields that the ref counting >> relies on as part of the ABI. 
> > Hmm, another slow-down. ??? Why is "no change" a slow-down? > This is not much of an issue if the C runtime DLL doesn't change > between releases, but it becomes a problem when they do e.g. > due to an upgrade to a new MSVC++ compiler version or in case > the extension was downloaded pre-compiled from pypi or some > other site. What problem specifically may occur? Regards, Martin From martin at v.loewis.de Tue May 26 20:54:35 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 26 May 2009 20:54:35 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1C1BFD.8090700@egenix.com> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> <4A1B9367.7090808@v.loewis.de> <4A1C1BFD.8090700@egenix.com> Message-ID: <4A1C3AEB.7010708@v.loewis.de> >>>> Functions declared in the following header files are not part >>>> of the ABI: >>>> - cellobject.h >>>> - classobject.h >>>> - code.h >>>> - frameobject.h >>>> - funcobject.h >>>> - genobject.h >>>> - pyarena.h >>>> - pydebug.h >>>> - symtable.h >>>> - token.h >>>> - traceback.h >>> I don't think that's feasable: you basically remove all introspection >>> functions that way. >>> >>> This will need a more fine-grained approach. >> What specifically is it that you want to do in a module that you >> couldn't do anymore? > > See my reply to Nick: some of the functions are needed even > if you don't want to do introspection, such as Py_FatalError() Ok. I don't know what Py_FatalError is doing in pydebug.h, so I now propose to move it to pyerrors.h. > or PyTraceBack_Print(). Ok; I have removed traceback.h from the list. By the other rules of the PEP, the only function that becomes available then is PyTraceBack_Print. > BTW: Given the headline, I take it that the various type checking > macros in these header will still be available, right ? Which headers? The one on the list above? No; my idea would be to completely hide them as-is. All other type-checking macros will remain available, and will remain being macros. >>> Would creating a Python object in a full-API extension and >>> free'ing it in a limited-API extension cause problems ? >> No problem that I can see. > > Can we be sure that the MSVCRT used by python35.dll stays compatible > to the one used by say python32.dll ? What if the CRT memory > management changes between MSVCRT versions ? It doesn't matter. For Python "things", the extension module will use the pymem.h functions, which get routed through pythonxy.dll to the CRT that Python was build with. If the extension uses regular malloc(), it should also invoke regular free() on the pointer. There is no API where Python calls malloc directly and the extension calls free, or vice versa. > How will this work in the light of having multiple copies of > Python installed on a Windows machine ? Interesting question. One solution could be to use SxS, which would allow multiple concurrent installations of python3.dll, although we would need to make sure it always binds to the "right" one in each context. Another solution could be to keep the various copies of python3.dll in their respective PYTHONHOMEs, and leave it to python.exe or the app to load the right one; any subsequent extension modules should then pick up the one that was already loaded. > They implementation section suggests that python3.dll would always > redirect to the python3x.dll for which it was installed, ie. 
if > I have Python 3.5 installed, but then need to run some app with > Python 3.2, the installed python3.dll would then point back to the > python32.dll. That depends on where they get installed. If they all go into system32, only the most recent one would be available, which is probably not desirable. > Now, if I start a Python 3.5 application which uses a limited > API extension, this would try to load python32.dll into the > Python 3.5 process. AFAIK, that's not possible due to the > naming conflicts. I don't see this problem. As long as we manage to install multiple versions of python3.dll on the system somehow, different processes could certainly load different such DLLs, and the same extension module would always use the right one. Regards, Martin From phillip.sitbon+python-dev at gmail.com Tue May 26 21:48:49 2009 From: phillip.sitbon+python-dev at gmail.com (Phillip Sitbon) Date: Tue, 26 May 2009 12:48:49 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows Message-ID: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> Hi everyone, I'm new to the list but I've been embedding Python and working very closely with the core sources for many years now. I discovered Python a long time ago when I needed to embed a scripting language and found the PHP sources... unreadable ;) Anyway, I'd like to ask something that may have been asked already, so I apologize if this has been covered. Instead of removing the GIL, has anyone thought of making it more lightweight? The current situation for Windows is that the single-thread case is decently fast (via interlocked operations), but it drops to using an event object in the case of contention. (see thread_nt.h) Now, I don't have any specific evidence aside from my experience in Windows multithreaded programming, but event objects are often considered the slowest synchronization mechanism available. So, what are the alternatives? Mutexes or critical sections. Semaphores too, if you want to get fancy, but I digress. Because mutexes have the capability of inter-process locking, which we don't need, critical sections fit the bill as a lightweight locking mechanism. They work in a way similar to how the Python GIL is handled: first, attempt an interlocked operation, and if another thread owns the lock, wait on a kernel object. They are known to be extremely fast. There are some catches with using a critical section instead of the current method: 1. It is recursive, while the current GIL setup is not. Would it break Python to support (or deal with) recursive behavior at the GIL level? Note that we can still disallow recursion and fail because we know if the current thread is the lock owner, but the return from the lock function is usually only checked when the wait parameter is zero (meaning "don't block, just try to acquire"). The biggest problem I see here is how mixing the PyGILState_* API with multiple interpreters will behave: when PyGILState_Ensure() is called while the GIL is held for a thread state under an interpreter other than the main interpreter, it tries to re-lock the GIL. This would normally cause a deadlock, but the best we could do with a critical section is have the call fail and/or increase a recursion counter. If maintaining behavior is absolutely necessary, I guess it would be pretty easy to force a deadlock. Personally, I would prefer a Py_FatalError or something like it. 2. Backwards incompatibility: TryEnterCriticalSection isn't available pre-NT4, so Windows 95 support is broken. 
Microsoft doesn't support or even mention it in the list of supporting OSes for their API functions anymore, so... non-issue? Some of the data structure is available to us, so I bet it would be easy to implement the function manually. 3. ?? - I'm sure there are other issues that deserve a look. I've given this a shot already while doing some concurrency testing with my ISAPI extension (PyISAPIe). First of all, nothing looks broken yet. I'm using my modified python26.dll to run all of my Python code and trying to find anywhere it could possibly break. For multiple concurrent requests against a single multithreaded ISAPI handler process, I see a statistically significant speed increase depending on how much Python code is executed. With more Python code executed (e.g. a Django page), the speedup was about 2x. I haven't tested with varied values for _Py_CheckInterval aside from finding a sweet spot for my specific purposes, but using 100 (the default) would likely make the performance difference more noticeable. A spin mutex also does well, but the results vary a lot more. Just as a disclaimer, my tests were nowhere near scientific, but if anyone needs convincing I can come up with some actual measurements. I think at this point most of you are wondering more about what it would break. Hopefully I haven't wasted anyone's time - I just wanted to share what I see as a possibly substantial improvement to Python's core. let me know if you're interested in a patch to use for your own testing. Cheers, Phillip From solipsis at pitrou.net Tue May 26 21:57:53 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 26 May 2009 19:57:53 +0000 (UTC) Subject: [Python-Dev] Making the GIL faster & lighter on Windows References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> Message-ID: Hello, > Hopefully I haven't wasted anyone's time - I just wanted to share what > I see as a possibly substantial improvement to Python's core. let me > know if you're interested in a patch to use for your own testing. You should definitely open a bug entry in http://bugs.python.org. There, post your patch, some explanations and preferably a quick way (e.g. a simple script) of reproducing the speedups (without having to install a third-party library or extension, that is). Thanks Antoine. From v+python at g.nevcal.com Tue May 26 22:01:41 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 26 May 2009 13:01:41 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> Message-ID: <4A1C4AA5.2090905@g.nevcal.com> On approximately 5/26/2009 12:48 PM, came the following characters from the keyboard of Phillip Sitbon: > Hi everyone, > > I'm new to the list but I've been embedding Python and working very > closely with the core sources for many years now. I discovered Python > a long time ago when I needed to embed a scripting language and found > the PHP sources... unreadable ;) ... > I've given this a shot already while doing some concurrency testing > with my ISAPI extension (PyISAPIe). First of all, nothing looks broken > yet. I'm using my modified python26.dll to run all of my Python code > and trying to find anywhere it could possibly break. For multiple > concurrent requests against a single multithreaded ISAPI handler > process, I see a statistically significant speed increase depending on > how much Python code is executed. 
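For those who haven't read thread_nt.h, the replacement being floated amounts to swapping the event-based NonRecursiveMutex for a Win32 CRITICAL_SECTION. Very roughly, it would look like the following; this is a sketch of the idea rather than the actual patch, and it deliberately ignores the recursion and ownership differences raised later in the thread.

    #include <windows.h>
    #include <stdlib.h>

    typedef struct {
        CRITICAL_SECTION cs;
    } cs_lock;

    static cs_lock *
    cs_lock_alloc(void)
    {
        cs_lock *lock = (cs_lock *)malloc(sizeof(*lock));
        if (lock != NULL)
            InitializeCriticalSection(&lock->cs);  /* spin count 0 */
        return lock;
    }

    /* waitflag != 0: block until acquired.  The uncontended path is an
     * interlocked operation; only contended acquisitions touch a kernel
     * object. */
    static int
    cs_lock_acquire(cs_lock *lock, int waitflag)
    {
        if (waitflag) {
            EnterCriticalSection(&lock->cs);
            return 1;
        }
        /* Non-blocking attempt; TryEnterCriticalSection needs NT4+. */
        return TryEnterCriticalSection(&lock->cs) ? 1 : 0;
    }

    static void
    cs_lock_release(cs_lock *lock)
    {
        /* Unlike the event-based lock, only the owning thread may do this. */
        LeaveCriticalSection(&lock->cs);
    }

    static void
    cs_lock_free(cs_lock *lock)
    {
        DeleteCriticalSection(&lock->cs);
        free(lock);
    }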
With more Python code executed (e.g. > a Django page), the speedup was about 2x. I haven't tested with varied > values for _Py_CheckInterval aside from finding a sweet spot for my > specific purposes, but using 100 (the default) would likely make the > performance difference more noticeable. A spin mutex also does well, > but the results vary a lot more. > > Just as a disclaimer, my tests were nowhere near scientific, but if > anyone needs convincing I can come up with some actual measurements. I > think at this point most of you are wondering more about what it would > break. > > Hopefully I haven't wasted anyone's time - I just wanted to share what > I see as a possibly substantial improvement to Python's core. let me > know if you're interested in a patch to use for your own testing. I wonder if the patch could be structured as a conditional compilation? You know how many different spots are touched, and how many lines per spot. If it could be, then theoretically it could be released and people could do lots of comparative stress testing with different workloads. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Tue May 26 22:07:23 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 26 May 2009 22:07:23 +0200 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> Message-ID: <4A1C4BFB.40501@v.loewis.de> > 3. ?? - I'm sure there are other issues that deserve a look. What about fairness? I don't know off-hand whether the GIL is fair, or whether critical sections are fair, but it needs to be considered. Regards, Martin From phillip.sitbon+python-dev at gmail.com Tue May 26 23:00:10 2009 From: phillip.sitbon+python-dev at gmail.com (Phillip Sitbon) Date: Tue, 26 May 2009 14:00:10 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <4A1C4BFB.40501@v.loewis.de> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> Message-ID: <536685ea0905261400t1ff81f6bx72ed8017f1c7cb80@mail.gmail.com> > You should definitely open a bug entry in http://bugs.python.org. There, post > your patch, some explanations and preferably a quick way (e.g. a simple script) > of reproducing the speedups (without having to install a third-party library or > extension, that is). I'll get started on that. I'm assuming I should generate a patch from the trunk (2.7)? The file doesn't look different, but I want to make sure I get it from the right place. > I wonder if the patch could be structured as a conditional compilation? You > know how many different spots are touched, and how many lines per spot. > > If it could be, then theoretically it could be released and people could do > lots of comparative stress testing with different workloads. That would be easy to do, because I am just replacing the *NonRecursiveMutex functions. > What about fairness? I don't know off-hand whether the GIL is > fair, or whether critical sections are fair, but it needs to be > considered. If you define fairness in this context as not starving other threads while consuming resources, that is built into the interpreter via sys.setcheckinterval() and also anywhere the GIL is released for I/O. 
What might be interesting is to see if releasing a critical section and immediately re-acquiring it every _Py_CheckInterval bytecode operations behaves in a similar manner (see ceval.c, line 869). My best guess right now is that it will behave as expected when not using the spin-based critical section. AFAIK, the kernel processes waiters in a FIFO manner without regard to priority. Because a guarantee of mutual exclusion is absolutely necessary, it's up to applications to provide fairness. Python does a decent job of this. - Phillip From tlesher at gmail.com Tue May 26 23:03:44 2009 From: tlesher at gmail.com (Tim Lesher) Date: Tue, 26 May 2009 17:03:44 -0400 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <4A1C4BFB.40501@v.loewis.de> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> Message-ID: <9613db600905261403j1b30f37cs4a94f24912d89960@mail.gmail.com> On Tue, May 26, 2009 at 16:07, "Martin v. L?wis" wrote: >> 3. ?? - I'm sure there are other issues that deserve a look. > > What about fairness? I don't know off-hand whether the GIL is > fair, or whether critical sections are fair, but it needs to be > considered. FWIW, Win32 CriticalSections are guaranteed to be fair, but they don't guarantee a defined order of wakeup among threads of equal priority. -- Tim Lesher From solipsis at pitrou.net Tue May 26 23:09:30 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 26 May 2009 21:09:30 +0000 (UTC) Subject: [Python-Dev] Making the GIL faster & lighter on Windows References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > > What about fairness? I don't know off-hand whether the GIL is > fair, According to a past discussion on this list, the current implementation isn't: http://mail.python.org/pipermail/python-dev/2008-March/077814.html (at least on the poster's system) Regards Antoine. From phillip.sitbon+python-dev at gmail.com Tue May 26 23:45:57 2009 From: phillip.sitbon+python-dev at gmail.com (Phillip Sitbon) Date: Tue, 26 May 2009 14:45:57 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> Message-ID: <536685ea0905261445v6748f995i6a22f63f7598af32@mail.gmail.com> > FWIW, Win32 CriticalSections are guaranteed to be fair, but they don't > guarantee a defined order of wakeup among threads of equal priority. Indeed, I should have quoted the MSDN docs: "The threads of a single process can use a critical section object for mutual-exclusion synchronization. There is no guarantee about the order in which threads will obtain ownership of the critical section, however, the system will be fair to all threads." http://msdn.microsoft.com/en-us/library/ms683472(VS.85).aspx I read somewhere else that the FIFO order is present, but obviously we shouldn't to expect that if it's not documented as such. > According to a past discussion on this list, the current implementation isn't: > http://mail.python.org/pipermail/python-dev/2008-March/077814.html > (at least on the poster's system) > I believe he's only talking about Linux. Apples & oranges when it comes to stuff like this, although it still justifies looking into what happens every _Py_CheckInterval on Windows. 
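Schematically, the every-_Py_CheckInterval behaviour referred to above is the eval loop dropping and immediately re-acquiring the GIL. A heavily simplified sketch follows; the names roughly follow the 2.x eval loop, but this is not the actual ceval.c code.

    #include "Python.h"
    #include "pythread.h"

    static PyThread_type_lock interpreter_lock;   /* the GIL */
    static volatile int ticker;
    static int check_interval = 100;              /* sys.setcheckinterval() */

    static void
    maybe_yield_gil(void)
    {
        if (--ticker < 0) {
            ticker = check_interval;
            /* Drop and immediately re-acquire so waiting threads get a
             * chance to run.  Whether a CRITICAL_SECTION hands the lock
             * over fairly at this point is exactly the open question. */
            PyThread_release_lock(interpreter_lock);
            PyThread_acquire_lock(interpreter_lock, 1);
        }
    }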
- Phillip From martin at v.loewis.de Wed May 27 01:24:02 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 27 May 2009 01:24:02 +0200 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <536685ea0905261400t1ff81f6bx72ed8017f1c7cb80@mail.gmail.com> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> <536685ea0905261400t1ff81f6bx72ed8017f1c7cb80@mail.gmail.com> Message-ID: <4A1C7A12.7020900@v.loewis.de> > If you define fairness in this context as not starving other threads > while consuming resources, that is built into the interpreter via > sys.setcheckinterval() and also anywhere the GIL is released for I/O. > What might be interesting is to see if releasing a critical section > and immediately re-acquiring it every _Py_CheckInterval bytecode > operations behaves in a similar manner (see ceval.c, line 869). My > best guess right now is that it will behave as expected when not using > the spin-based critical section. AFAIK, the kernel processes waiters > in a FIFO manner without regard to priority. Because a guarantee of > mutual exclusion is absolutely necessary, it's up to applications to > provide fairness. Python does a decent job of this. No: fairness in mutex synchronization means that every waiter for the mutex will eventually acquire it; it won't happen that one thread starves waiting for the mutex. This is something that the mutex needs to provide, not the application. Regards, Martin From martin at v.loewis.de Wed May 27 01:36:54 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 27 May 2009 01:36:54 +0200 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <536685ea0905261445v6748f995i6a22f63f7598af32@mail.gmail.com> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> <536685ea0905261445v6748f995i6a22f63f7598af32@mail.gmail.com> Message-ID: <4A1C7D16.50009@v.loewis.de> >> According to a past discussion on this list, the current implementation isn't: >> http://mail.python.org/pipermail/python-dev/2008-March/077814.html >> (at least on the poster's system) >> > > I believe he's only talking about Linux. Apples & oranges when it > comes to stuff like this Please trust Antoine that it's relevant: if the current implementation isn't fair on Linux, there is no need for the new implementation to be fair on Windows. Regards, Martin From phillip.sitbon+python-dev at gmail.com Wed May 27 02:42:39 2009 From: phillip.sitbon+python-dev at gmail.com (Phillip Sitbon) Date: Tue, 26 May 2009 17:42:39 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <4A1C7D16.50009@v.loewis.de> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> <536685ea0905261445v6748f995i6a22f63f7598af32@mail.gmail.com> <4A1C7D16.50009@v.loewis.de> Message-ID: <536685ea0905261742i47707b9fq4fc4821f6a4d5497@mail.gmail.com> > No: fairness in mutex synchronization means that every waiter for the > mutex will eventually acquire it; it won't happen that one thread > starves waiting for the mutex. This is something that the mutex needs to > provide, not the application. Right, I guess I was thinking of it in terms of needing to release the mutex at some point in order for it to be later acquired. 
> Please trust Antoine that it's relevant: if the current implementation > isn't fair on Linux, there is no need for the new implementation to be > fair on Windows. Fair enough. -- While setting up my patch, I'm noticing something that could be potentially bad for this idea that I overlooked until just now. I'm going to hold off on submitting a ticket unless others suggest it's a better idea to keep this discussion going there. The thread module's lock object uses the same code used to lock and unlock the GIL. By replacing the current locking mechanism with a critical section, it'd be breaking the expected functionality of the lock object, specifically two cases: 1. Blocking recursion: Critical sections don't block on recursion, no way to enforce that 2. Releasing: Currently any thread can release a lock, but only the owner release a critical section Of course blocking recursion is only meaningful with the current behavior of #2, otherwise it's an unrecoverable deadlock. There are a few solutions to this. The first would be to implement only the GIL as a critical section. The problem then is the need to change all of the core code that does not use PyEval_Acquire/ReleaseLock (there is some, right?), which is the best place to use something other than the thread module's locking mechanism on the GIL. This is doable with some effort, but clearly not an option if there is any possibility that extensions are using something other than the PyThreadState_*, PyGILState_* and PyEval_* APIs to manipulate the GIL (are there others?). After any of this, of course, I wonder what kind of crazy things might be expected of the GIL externally that requires its behavior to remain as it is. The second solution would be to use semaphores. I can't say yet if it would be worth it performance-wise so I will refrain from conjecture for the moment. I like the first solution above... I don't know why non-recursion would be necessary for the GIL; clearly it would be a little more involved, but if I can demonstrate the performance gain maybe it's worth my time. - Phillip From kristjan at ccpgames.com Wed May 27 11:23:00 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 27 May 2009 09:23:00 +0000 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> Message-ID: <930F189C8A437347B80DF2C156F7EC7F057F03EF73@exchis.ccp.ad.local> I've often thought of this. The problem is that the GIL uses the regular python "lock" which has to be non-recursive, since it is used for synchronization operations other than mutual exclusion, e.g. one thread going to sleep, and another waking it up. Now, we could easily create another class of locks, a python "mutex" or a "critical section" even, which is allowed (but not required) to be recursive. On other platforms, this could fall back to being the good old lock. Requiring it to be recursive would mean that we would need implementations for all platforms. Which is possible, I suppose, building on the old python lock... For the GIL, we would then use a python "mutex" or "critical section" whichever you prefer. Note that for the GIL, if you use a CriticalSection object, you should initialize its "spincount" to zero, because the GIL is almost always in contention. That is, if you don't get the GIL right away, you won't for a while. 
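Concretely, the spin-count choice Kristjan describes is made when the critical section is initialized (illustrative only):

    #include <windows.h>

    static CRITICAL_SECTION gil_cs;

    static void
    gil_cs_init(void)
    {
        /* Plain initialization: spin count 0, i.e. go straight to the
         * kernel wait on contention.  For a lock that is nearly always
         * contended, like the GIL, spinning mostly burns CPU. */
        InitializeCriticalSection(&gil_cs);

        /* The explicit equivalent, and the contrasting case of a lock
         * with short hold times where some spinning can avoid the
         * kernel transition entirely:
         *
         *   InitializeCriticalSectionAndSpinCount(&gil_cs, 0);
         *   InitializeCriticalSectionAndSpinCount(&short_cs, 4000);
         */
    }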
I don't know what kernel primitive the Critical Section uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing. K -----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Phillip Sitbon Sent: 26. ma? 2009 19:49 To: python-dev at python.org Subject: [Python-Dev] Making the GIL faster & lighter on Windows Hi everyone, I'm new to the list but I've been embedding Python and working very closely with the core sources for many years now. I discovered Python a long time ago when I needed to embed a scripting language and found the PHP sources... unreadable ;) Anyway, I'd like to ask something that may have been asked already, so I apologize if this has been covered. Instead of removing the GIL, has anyone thought of making it more lightweight? The current situation for Windows is that the single-thread case is decently fast (via interlocked operations), but it drops to using an event object in the case of contention. (see thread_nt.h) From ncoghlan at gmail.com Wed May 27 13:17:55 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 27 May 2009 21:17:55 +1000 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1C3574.6040306@v.loewis.de> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> <4A1B07FA.6010509@gmail.com> <4A1C18CB.6040208@egenix.com> <4A1C3574.6040306@v.loewis.de> Message-ID: <4A1D2163.4070709@gmail.com> [PEP] >>>>> Function-like macros (in particular, field access macros) remain >>>>> available to applications, but get replaced by function calls >>>>> (unless their definition only refers to features of the ABI, such >>>>> as the various _Check macros) [MAL] >>>> Including Py_INCREF()/Py_DECREF() ? [Nick] >>> I believe so - MvL deliberately left the fields that the ref counting >>> relies on as part of the ABI. [MAL] >> Hmm, another slow-down. [MvL] > ??? Why is "no change" a slow-down? That was just a miscommunication - I misunderstood the sense in which MAL was using "Including". He was referring to the first part of the paragraph from the PEP (most macros become functions), but I answered assuming he was referring to the part in parentheses (some macros get to stay). So to be perfectly clear: the Py_INCREF/Py_DECREF macros are available as part of the stable ABI because they qualify for the PEP's "definition only refers to features of the ABI" exception. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed May 27 13:24:02 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 27 May 2009 21:24:02 +1000 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <4A1C7A12.7020900@v.loewis.de> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> <536685ea0905261400t1ff81f6bx72ed8017f1c7cb80@mail.gmail.com> <4A1C7A12.7020900@v.loewis.de> Message-ID: <4A1D22D2.4050000@gmail.com> Martin v. L?wis wrote: > No: fairness in mutex synchronization means that every waiter for the > mutex will eventually acquire it; it won't happen that one thread > starves waiting for the mutex. This is something that the mutex needs to > provide, not the application. CriticalSections are first come first served on Windows, just like a regular mutex. 
As Phillip already noted, their main limitation is that they don't work cross-process (of course, that's also where they get their extra speed). Since we don't need the cross-process feature and we don't support Win 9x any more, this is certainly an idea worth looking at. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From mal at egenix.com Wed May 27 14:05:06 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 27 May 2009 14:05:06 +0200 Subject: [Python-Dev] PEP 384: Defining a Stable ABI In-Reply-To: <4A1D2163.4070709@gmail.com> References: <4A107988.3020202@v.loewis.de> <4A1AD862.6090100@egenix.com> <4A1B07FA.6010509@gmail.com> <4A1C18CB.6040208@egenix.com> <4A1C3574.6040306@v.loewis.de> <4A1D2163.4070709@gmail.com> Message-ID: <4A1D2C72.1090705@egenix.com> Nick Coghlan wrote: > [PEP] >>>>>> Function-like macros (in particular, field access macros) remain >>>>>> available to applications, but get replaced by function calls >>>>>> (unless their definition only refers to features of the ABI, such >>>>>> as the various _Check macros) > [MAL] >>>>> Including Py_INCREF()/Py_DECREF() ? > [Nick] >>>> I believe so - MvL deliberately left the fields that the ref counting >>>> relies on as part of the ABI. > [MAL] >>> Hmm, another slow-down. > [MvL] >> ??? Why is "no change" a slow-down? > > That was just a miscommunication - I misunderstood the sense in which > MAL was using "Including". He was referring to the first part of the > paragraph from the PEP (most macros become functions), but I answered > assuming he was referring to the part in parentheses (some macros get to > stay). > > So to be perfectly clear: the Py_INCREF/Py_DECREF macros are available > as part of the stable ABI because they qualify for the PEP's "definition > only refers to features of the ABI" exception. Sorry for the confusion. The exclusion clause in the PEP should probably be replaced by an explicit list of macros which are made available. It not necessarily obvious that a macro only uses features made available through the ABI without actually digging through the headers. In the case of Py_INCREF()/Py_DECREF() the macros do use private macros which the ABI omits. Cheers, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 27 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-06-29: EuroPython 2009, Birmingham, UK 32 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From aahz at pythoncraft.com Wed May 27 14:39:55 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 27 May 2009 05:39:55 -0700 Subject: [Python-Dev] Arguments of MatchObject in re module In-Reply-To: <4A1B2EE2.3030107@mrabarnett.plus.com> References: <4A1B2EE2.3030107@mrabarnett.plus.com> Message-ID: <20090527123955.GB2573@panix.com> On Tue, May 26, 2009, MRAB wrote: > > >>> p = re.compile("foo") > >>> help(p.match) > Help on built-in function match: > > match(...) > match(string[, pos[, endpos]]) --> match object or None. 
> Matches zero or more characters at the beginning of the string > > >>> p.match(string="foo") > > Traceback (most recent call last): > File "", line 1, in > p.match(string="foo") > TypeError: Required argument 'pattern' (pos 1) not found > > The name of the first argument should be "string", yet it's "pattern". > Does anyone know if it's anything other than a mistake? Should it be > fixed in the next version of the re module, or are we just stuck with it > (and should just change the docstring to match)? Please file a report on bugs.python.org so this doesn't get lost. Attaching a suggested patch for _sre.c would be most welcome. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 1993 From curt at hagenlocher.org Wed May 27 14:59:48 2009 From: curt at hagenlocher.org (Curt Hagenlocher) Date: Wed, 27 May 2009 05:59:48 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <4A1D22D2.4050000@gmail.com> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <4A1C4BFB.40501@v.loewis.de> <536685ea0905261400t1ff81f6bx72ed8017f1c7cb80@mail.gmail.com> <4A1C7A12.7020900@v.loewis.de> <4A1D22D2.4050000@gmail.com> Message-ID: On Wed, May 27, 2009 at 4:24 AM, Nick Coghlan wrote: > > CriticalSections are first come first served on Windows, just like a > regular mutex. "Starting with Windows Server 2003 with Service Pack 1 (SP1), threads waiting on a critical section do not acquire the critical section on a first-come, first-serve basis." http://msdn.microsoft.com/en-us/library/ms682530(VS.85).aspx Windows critical sections use events for kernel-level synchronization. The user-mode code basically consists of an interlocked instruction inside the spin loop. When the likelihood of contention is low, a critical section should be a big win because it won't need to switch into the kernel. I suspect that contention will be frequent for the GIL A good description of pre-Vista Windows critical sections can be found here: http://msdn.microsoft.com/en-us/magazine/cc164040.aspx -- Curt Hagenlocher curt at hagenlocher.org From phillip.sitbon+python-dev at gmail.com Thu May 28 00:22:52 2009 From: phillip.sitbon+python-dev at gmail.com (Phillip Sitbon) Date: Wed, 27 May 2009 15:22:52 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F057F03EF73@exchis.ccp.ad.local> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F057F03EF73@exchis.ccp.ad.local> Message-ID: <536685ea0905271522w19b26852pfb67b295673e8358@mail.gmail.com> Heads up to those who were following, I did my best to clearly outline the situation and direction in the tracker. http://bugs.python.org/issue6132 It includes a patch that will break the expected behavior of the thread lock object but make it possible to test GIL performance. > Note that for the GIL, if you use a CriticalSection object, you should initialize its "spincount" to zero, because the GIL is almost always in contention. ?That is, if you don't get the GIL right away, you won't for a while. If I'm not mistaken, calling InitializeCriticalSection rather than InitializeCriticalSectionAndSpinCount (gotta love those long function names) sets the spin count to zero. 
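(For anyone who wants to poke at this from Python instead of rebuilding thread_nt.h, here is a rough ctypes sketch of the calls being discussed. It is only a sketch: the 64-byte buffer for the opaque CRITICAL_SECTION and the 4000 spin count are assumptions of mine, not values taken from any real patch.)

    import ctypes

    kernel32 = ctypes.windll.kernel32

    # CRITICAL_SECTION is opaque to ctypes; 64 bytes is assumed to be large
    # enough for the real RTL_CRITICAL_SECTION on 32- and 64-bit Windows.
    cs = ctypes.create_string_buffer(64)

    # Initialize with an explicit spin count (0 here, matching what a plain
    # InitializeCriticalSection call is said to give you).
    kernel32.InitializeCriticalSectionAndSpinCount(ctypes.byref(cs), 0)

    kernel32.EnterCriticalSection(ctypes.byref(cs))
    try:
        pass  # the work the lock protects would go here
    finally:
        kernel32.LeaveCriticalSection(ctypes.byref(cs))

    # SetCriticalSectionSpinCount returns the previous spin count, which is
    # a cheap way to confirm what the default really was.
    previous = kernel32.SetCriticalSectionSpinCount(ctypes.byref(cs), 4000)

    kernel32.DeleteCriticalSection(ctypes.byref(cs))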
I could tell when the spin count wasn't zero as far as performance is concerned - spinning is too much of a gamble in most contention situations. > I don't know what kernel primitive the Critical Section uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing. Judging from the increase in speed and CPU utilization I've seen, I don't believe this is the case. My guess is that it's something similar to a futex. - Phillip From brian.de.alwis at usask.ca Thu May 28 02:02:57 2009 From: brian.de.alwis at usask.ca (Brian de Alwis) Date: Wed, 27 May 2009 18:02:57 -0600 Subject: [Python-Dev] Survey on DVCS usage and experience Message-ID: <75599A84-8A53-4DA4-A022-1BEF4EEDA943@usask.ca> Hello everybody. I'm Brett's former lab-mate, and am part of a team conducting a survey to understand the perceived benefits and challenges of using a decentralized or distributed version control systems (DVCS) in software development. With Python having recently chosen to switch to Mercurial, I hoped that any developers who've used a DVCS (and who are over 18 years old) might like to participate in our survey and share your experiences. (We followed your extensive discussions on the switch with great interest.) Details on partcipating are below. Thanks for your time! Brian. ---------------------------------------------------------------------- An increasing number of software projects have or are considering switching their code repositories to a decentralized or distributed VCS (DVCS). There are many such DVCS tools, including git, bzr, mercurial, monotone, or bitkeeper. We are conducting a survey to assess the perceived benefits and challenges of using a DVCS. We would ask that any individuals who use or are comfortable using a DVCS for managing the artifacts for a project to please consider completing the survey. The survey has several open-ended questions, and may take up to 20 minutes to complete. The data collected from this study will be used in articles for publication in journals and conference proceedings. The results of this study will provide additional knowledge and guidance for projects considering moving to using a DVCS. This is an anonymous survey. Any personal information divulged in answering a question will be kept strictly confidential. The survey is at: http://www.cs.usask.ca/~bsd178/research/dvcs-survey/ Please feel free to redistribute this to other interested groups. If you would like more detail about the survey, or information not included here, please contact us. Brian de Alwis Department of Computer Science University of Saskatchewan brian.de.alwis at usask.ca This research has the ethical approval of the Research Ethics Office at the University of Saskatchewan. If you have any concerns about your treatment or rights as a research subject, please contact the office at 306-966-2084. ---------------------------------------------------------------------- -- Brian de Alwis | HCI Lab | University of Saskatchewan On bike helmets: "If you think your hair is more important than your brain, you're probably right." (B. J. 
Wawrykow) From kristjan at ccpgames.com Thu May 28 11:00:22 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Thu, 28 May 2009 09:00:22 +0000 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <536685ea0905271522w19b26852pfb67b295673e8358@mail.gmail.com> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F057F03EF73@exchis.ccp.ad.local> <536685ea0905271522w19b26852pfb67b295673e8358@mail.gmail.com> Message-ID: <930F189C8A437347B80DF2C156F7EC7F057F03F17E@exchis.ccp.ad.local> You are right, a small experiment confirmed that it is set to 0 (see SetCriticalSectionSpinCount()) I had assumed that a small non-zero value might be chosen on multiprocessor machines. Do you think that the problem lies with the use of the "event" object as such? Have you tried using a "semaphore" or "mutex" instead? Or do you think that all of the synchronizations primitives that rely on the WaitForMultipleObjects() api are subject to the same issue? Cheers, Kristj?n -----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Phillip Sitbon Sent: 27. ma? 2009 22:23 To: python-dev Subject: Re: [Python-Dev] Making the GIL faster & lighter on Windows If I'm not mistaken, calling InitializeCriticalSection rather than InitializeCriticalSectionAndSpinCount (gotta love those long function names) sets the spin count to zero. I could tell when the spin count wasn't zero as far as performance is concerned - spinning is too much of a gamble in most contention situations. > I don't know what kernel primitive the Critical Section uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing. Judging from the increase in speed and CPU utilization I've seen, I don't believe this is the case. My guess is that it's something similar to a futex. From jeremy at alum.mit.edu Thu May 28 15:06:03 2009 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 28 May 2009 09:06:03 -0400 Subject: [Python-Dev] question about docstring formatting Message-ID: A question came up at work about docstring formatting. It relates to the description of the summary line in PEP 257. http://www.python.org/dev/peps/pep-0257/ """Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. The summary line may be used by automatic indexing tools; it is important that it fits on one line and is separated from the rest of the docstring by a blank line. The summary line may be on the same line as the opening quotes or on the next line. The entire docstring is indented the same as the quotes at its first line (see example below).""" It says that the summary line may be used by automatic indexing tools, but is there any evidence that such a tool actually exists? Or was there once upon a time? If there are no such tools, do we still think that it is important that it fits on line line? 
Jeremy From glyph at divmod.com Thu May 28 15:45:30 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 28 May 2009 13:45:30 -0000 Subject: [Python-Dev] question about docstring formatting In-Reply-To: References: Message-ID: <20090528134530.12555.1212950071.divmod.xquotient.11621@weber.divmod.com> On 01:06 pm, jeremy at alum.mit.edu wrote: >It says that the summary line may be used by automatic indexing tools, >but is there any evidence that such a tool actually exists? Or was >there once upon a time? If there are no such tools, do we still think >that it is important that it fits on line line? For what it's worth, https://launchpad.net/pydoctor appears to do this, as you can see from the numerous truncated sentences on . I suspect a more reasonable approach for automatic documentation generators would be to try to identify the first complete sentence, rather than the first line... but, this is at least an accurate description of the status quo for some tools :). From goodger at python.org Thu May 28 15:29:25 2009 From: goodger at python.org (David Goodger) Date: Thu, 28 May 2009 09:29:25 -0400 Subject: [Python-Dev] question about docstring formatting In-Reply-To: References: Message-ID: <4335d2c40905280629xf9138a3qb94026841e20eebd@mail.gmail.com> On Thu, May 28, 2009 at 09:06, Jeremy Hylton wrote: > A question came up at work about docstring formatting. ?It relates to > the description of the summary line in PEP 257. > > http://www.python.org/dev/peps/pep-0257/ > """Multi-line docstrings consist of a summary line just like a > one-line docstring, followed by a blank line, followed by a more > elaborate description. The summary line may be used by automatic > indexing tools; it is important that it fits on one line and is > separated from the rest of the docstring by a blank line. The summary > line may be on the same line as the opening quotes or on the next > line. The entire docstring is indented the same as the quotes at its > first line (see example below).""" > > It says that the summary line may be used by automatic indexing tools, > but is there any evidence that such a tool actually exists? ?Or was > there once upon a time? ?If there are no such tools, do we still think > that it is important that it fits on line line? There are several auto-documentation tools out there, like Sphinx and epydoc, and the stdlib's pydoc. Historically there were other tools, like HappyDoc ad Pythondoc. I'm not up on these or other tools, so I don't know if or how that part of PEP 257 applies. The point of the one-line summary was to allow for tooltips and compact tables of contents. Even if there were no supporting tools, I think it is useful to express the intent of a class/method/function in a single line. The process of distilling the description down can, in itself, be illuminating. To imitate the Zen: if the code can't be described in a short sentence, it may be too complicated. I'm not saying that this should be enforced in any way. It's just a guideline. If a tool needs a short summary and the docstring doens't have a one-liner, I'd expect the tool just to take the first line and add ellipsis ("..."). 
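(To make that concrete, here is a minimal sketch of the fallback being described: take the first line, and append an ellipsis when the docstring does not really have a one-line summary. The helper name is my own invention, not something any existing tool provides.)

    import inspect

    def one_line_summary(obj):
        """Best-effort synopsis in the spirit of PEP 257."""
        doc = inspect.getdoc(obj) or ""
        lines = doc.strip().splitlines()
        if not lines:
            return ""
        summary = lines[0].strip()
        # If the first line is not followed by a blank line, there was no
        # real one-liner, so mark the truncation.
        if len(lines) > 1 and lines[1].strip():
            summary += " ..."
        return summary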
-- David Goodger From phd at phd.pp.ru Thu May 28 15:11:55 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 28 May 2009 17:11:55 +0400 Subject: [Python-Dev] question about docstring formatting In-Reply-To: References: Message-ID: <20090528131155.GC27490@phd.pp.ru> On Thu, May 28, 2009 at 09:06:03AM -0400, Jeremy Hylton wrote: > It says that the summary line may be used by automatic indexing tools, > but is there any evidence that such a tool actually exists? epydoc, for one. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From ziade.tarek at gmail.com Thu May 28 16:19:41 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Thu, 28 May 2009 16:19:41 +0200 Subject: [Python-Dev] [buildbot] some build slaves in bad shape Message-ID: <94bdd2610905280719r2d2af617r8b8b173a5337a74a@mail.gmail.com> Hello, I've noticed some problems since this morning with the trunk and 3.x stable buildbots: - x86 XP-4 (trunk and 3.x) is throwing a "no space left on device" error when it compiles the sqlite module in its temp dir - amd64 gentoo 3.x and ia64 Ubuntu 3.x buildbot versions seem to be too old to run; they should be upgraded - ppc Debian unstable trunk keeps on failing to connect to svn.python.org Regards Tarek -- Tarek Ziadé | http://ziade.org From rrr at ronadam.com Thu May 28 18:12:52 2009 From: rrr at ronadam.com (Ron Adam) Date: Thu, 28 May 2009 11:12:52 -0500 Subject: [Python-Dev] question about docstring formatting In-Reply-To: References: Message-ID: <4A1EB804.7030605@ronadam.com> Jeremy Hylton wrote: > A question came up at work about docstring formatting. It relates to > the description of the summary line in PEP 257. > > http://www.python.org/dev/peps/pep-0257/ > """Multi-line docstrings consist of a summary line just like a > one-line docstring, followed by a blank line, followed by a more > elaborate description. The summary line may be used by automatic > indexing tools; it is important that it fits on one line and is > separated from the rest of the docstring by a blank line. The summary > line may be on the same line as the opening quotes or on the next > line. The entire docstring is indented the same as the quotes at its > first line (see example below).""" > > It says that the summary line may be used by automatic indexing tools, > but is there any evidence that such a tool actually exists? Or was > there once upon a time? If there are no such tools, do we still think > that it is important that it fits on one line? > > Jeremy Python's own built-in help utility, pydoc, uses it. At the help prompt in the Python console window, type "modules searchkey" to get a list of modules that contain the searchkey in their one-line summary. Running pydoc with the -g option opens a Tkinter search window that searches the summary lines. Selecting from that list then opens the browser to that item.
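(The same machinery is reachable from the shell, and pydoc.synopsis() is the helper that reads just that summary line off a module's source file. The csv example and its path below are only illustrative, not copied from a real session.)

    $ python -m pydoc -k csv     # keyword search over the one-line summaries
    $ python -m pydoc -g         # the Tkinter search window mentioned above

    >>> import pydoc
    >>> pydoc.synopsis('Lib/csv.py')   # hypothetical path to a module's .py file
    'CSV parsing and writing.'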
Ron From phillip.sitbon+python-dev at gmail.com Thu May 28 18:11:17 2009 From: phillip.sitbon+python-dev at gmail.com (Phillip Sitbon) Date: Thu, 28 May 2009 09:11:17 -0700 Subject: [Python-Dev] Making the GIL faster & lighter on Windows In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F057F03F17E@exchis.ccp.ad.local> References: <536685ea0905261248i13728f58ka435d8d0a826a80d@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F057F03EF73@exchis.ccp.ad.local> <536685ea0905271522w19b26852pfb67b295673e8358@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F057F03F17E@exchis.ccp.ad.local> Message-ID: <536685ea0905280911r1ba77ba5hb191369872a4c073@mail.gmail.com> The testing patch I submitted to the tracker includes a semaphore as well, and I did take some time to try it out. It seems that it's no better than the event object, either for a single thread or scaled to many threads... so this does appear to indicate that the WaitForXX functions are costly (which is expected) and scale terribly (which is unfortunate). I had always believed event objects to be "slower" but I'm not seeing a difference here compared to semaphores. My guess is that these results could be very different if I were to test on, say, Windows 2000 instead of Vista. - Phillip 2009/5/28 Kristj?n Valur J?nsson : > You are right, a small experiment confirmed that it is set to 0 (see SetCriticalSectionSpinCount()) > I had assumed that a small non-zero value might be chosen on multiprocessor machines. > > Do you think that the problem lies with the use of the "event" object as such? ?Have you tried using a "semaphore" or "mutex" instead? ?Or do you think that all of the synchronizations primitives that rely on the WaitForMultipleObjects() api are subject to the same issue? > > Cheers, > > Kristj?n > > -----Original Message----- > From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Phillip Sitbon > Sent: 27. ma? 2009 22:23 > To: python-dev > Subject: Re: [Python-Dev] Making the GIL faster & lighter on Windows > > > If I'm not mistaken, calling InitializeCriticalSection rather than > InitializeCriticalSectionAndSpinCount (gotta love those long function > names) sets the spin count to zero. I could tell when the spin count > wasn't zero as far as performance is concerned - spinning is too much > of a gamble in most contention situations. > >> I don't know what kernel primitive the Critical Section ?uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing. > > Judging from the increase in speed and CPU utilization I've seen, I > don't believe this is the case. My guess is that it's something > similar to a futex. > > > From rrr at ronadam.com Thu May 28 18:12:52 2009 From: rrr at ronadam.com (Ron Adam) Date: Thu, 28 May 2009 11:12:52 -0500 Subject: [Python-Dev] question about docstring formatting In-Reply-To: References: Message-ID: <4A1EB804.7030605@ronadam.com> Jeremy Hylton wrote: > A question came up at work about docstring formatting. It relates to > the description of the summary line in PEP 257. > > http://www.python.org/dev/peps/pep-0257/ > """Multi-line docstrings consist of a summary line just like a > one-line docstring, followed by a blank line, followed by a more > elaborate description. 
The summary line may be used by automatic > indexing tools; it is important that it fits on one line and is > separated from the rest of the docstring by a blank line. The summary > line may be on the same line as the opening quotes or on the next > line. The entire docstring is indented the same as the quotes at its > first line (see example below).""" > > It says that the summary line may be used by automatic indexing tools, > but is there any evidence that such a tool actually exists? Or was > there once upon a time? If there are no such tools, do we still think > that it is important that it fits on line line? > > Jeremy Python's own built in help utility, pydoc uses it. At the help prompt in the python console window, type "modules searchkey" to get a list of modules that contain the searchkey in thier one line summary. Running pydoc with the -g option opens a tkinter search window, that searches the summery lines. Selecting from that list then opens the browser to that item. Ron From db3l.net at gmail.com Thu May 28 23:12:33 2009 From: db3l.net at gmail.com (David Bolen) Date: Thu, 28 May 2009 17:12:33 -0400 Subject: [Python-Dev] [buildbot] some build slaves in bad shape References: <94bdd2610905280719r2d2af617r8b8b173a5337a74a@mail.gmail.com> Message-ID: Tarek Ziad? writes: > - x86 XP-4 (trunk and 3x) is throwing an "no space left on device" > error when it compiles the sqlite module in its temp dir Ooops, that's mine. Geez - it's a VM, but has a 10GB C: drive, and the actual build slave has its working directory on a separate virtual drive. Wonder what the heck has filled up the system drive. I'm working on it now though. -- David From eric at trueblade.com Fri May 29 00:39:11 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 28 May 2009 18:39:11 -0400 Subject: [Python-Dev] [Python-checkins] r72995 - in python/branches/py3k: Doc/library/contextlib.rst Doc/whatsnew/3.1.rst Lib/contextlib.py Lib/test/test_contextlib.py Misc/NEWS In-Reply-To: <20090528222003.D1096D569@mail.python.org> References: <20090528222003.D1096D569@mail.python.org> Message-ID: <4A1F128F.50409@trueblade.com> raymond.hettinger wrote: > Author: raymond.hettinger > Date: Fri May 29 00:20:03 2009 > New Revision: 72995 > > Log: > Deprecate contextlib.nested(). The with-statement now provides this functionality directly. > > Modified: > python/branches/py3k/Doc/library/contextlib.rst > python/branches/py3k/Doc/whatsnew/3.1.rst > python/branches/py3k/Lib/contextlib.py > python/branches/py3k/Lib/test/test_contextlib.py > python/branches/py3k/Misc/NEWS Shouldn't the test cases exist as long as contextlib.nested still exists? We want to make sure it works, after all. I think they should be removed only when .nested is itself deleted. Eric. From db3l.net at gmail.com Fri May 29 00:39:02 2009 From: db3l.net at gmail.com (David Bolen) Date: Thu, 28 May 2009 18:39:02 -0400 Subject: [Python-Dev] [buildbot] some build slaves in bad shape References: <94bdd2610905280719r2d2af617r8b8b173a5337a74a@mail.gmail.com> Message-ID: David Bolen writes: > Ooops, that's mine. Geez - it's a VM, but has a 10GB C: drive, and > the actual build slave has its working directory on a separate virtual > drive. Wonder what the heck has filled up the system drive. I'm > working on it now though. Well, looks like it was 5+GB of temporary files of some sort. It's cleaned up now and back online. 
-- David From ziade.tarek at gmail.com Fri May 29 00:49:55 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 29 May 2009 00:49:55 +0200 Subject: [Python-Dev] [buildbot] some build slaves in bad shape In-Reply-To: References: <94bdd2610905280719r2d2af617r8b8b173a5337a74a@mail.gmail.com> Message-ID: <94bdd2610905281549u4e2518b1o53e16b4fd98b6c5b@mail.gmail.com> On Fri, May 29, 2009 at 12:39 AM, David Bolen wrote: > David Bolen writes: > >> Ooops, that's mine. ?Geez - it's a VM, but has a 10GB C: drive, and >> the actual build slave has its working directory on a separate virtual >> drive. ?Wonder what the heck has filled up the system drive. ?I'm >> working on it now though. > > Well, looks like it was 5+GB of temporary files of some sort. ?It's > cleaned up now and back online. Thanks that's great From dave at boostpro.com Fri May 29 03:22:45 2009 From: dave at boostpro.com (David Abrahams) Date: Thu, 28 May 2009 21:22:45 -0400 Subject: [Python-Dev] Possibility of binary configuration mismatch Message-ID: Hi All, I'm not sure there's anything you can do about this, but I thought I should alert the Python devs that it can happen... http://allmydata.org/trac/tahoe/ticket/704#comment:7 describes a situation where my macports-installed python25 had a pyOpenSSL egg installed in it by something other than macports (possibly by easy_install-2.5?) that was not compatible with the Python build. My hunch is that the pyOpenSSL had binaries compiled against a UCS4 Python, but I don't know for sure. Whatever did the installation of the bad egg was almost certainly being executed by the macports python25 because macports is installed in /opt/local, and nothing is likely to have installed it under that prefix by chance. In other words, this egg probably couldn't have been left over from some non-macports python installation. In fact, I haven't had any other version of Python2.5 installed on this machine. Very odd. I wonder if it makes sense to enhance the extension module system to record this kind of information so the problem can be diagnosed by the system? -- Dave Abrahams BoostPro Computing http://www.boostpro.com From ben+python at benfinney.id.au Fri May 29 04:41:04 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 29 May 2009 12:41:04 +1000 Subject: [Python-Dev] question about docstring formatting References: <4335d2c40905280629xf9138a3qb94026841e20eebd@mail.gmail.com> Message-ID: <87ljogwrzz.fsf@benfinney.id.au> David Goodger writes: > Even if there were no supporting tools, I think it is useful to > express the intent of a class/method/function in a single line. The > process of distilling the description down can, in itself, be > illuminating. To imitate the Zen: if the code can't be described in a > short sentence, it may be too complicated. Absolutely. If you can't describe what the (function, class, module) does succinctly in a single line, how on earth are you going to choose an appropriate short-but-descriptive name for it? This constraint is well worth keeping, for exactly the reasons David says above. > I'm not saying that this should be enforced in any way. It's just a > guideline. If a tool needs a short summary and the docstring doens't > have a one-liner, I'd expect the tool just to take the first line and > add ellipsis ("..."). Which in itself would be annoying enough to apply social pressure from others to get the synopsis into a single line ? 
so again, I approve :-) -- \ "Men never do evil so completely and cheerfully as when they do | `\ it from religious conviction." --Blaise Pascal (1623-1662), | _o__) Pensées, #894. | Ben Finney From orsenthil at gmail.com Fri May 29 05:35:08 2009 From: orsenthil at gmail.com (Senthil Kumaran) Date: Fri, 29 May 2009 09:05:08 +0530 Subject: [Python-Dev] Survey on DVCS usage and experience Message-ID: <20090529033508.GA4463@ubuntu.ubuntu-domain> On Wed, May 27, 2009 at 06:02:57PM -0600, Brian de Alwis wrote: > With Python having recently chosen to switch to Mercurial, I hoped > that any developers who've used a DVCS (and who are over 18 years > old) might like to participate in our survey and share your Just curious. Why is this age restriction? You might miss out on a few key developers... -- Senthil From ideasman42 at gmail.com Fri May 29 06:07:02 2009 From: ideasman42 at gmail.com (Campbell Barton) Date: Thu, 28 May 2009 21:07:02 -0700 Subject: [Python-Dev] C/Python API Index removed? Message-ID: <7c1ab96d0905282107r58ee393di79a9b7109f86191e@mail.gmail.com> This page used to give an index of the C/Python API functions too: http://docs.python.org/genindex-all.html But a week or so ago I noticed all these functions are now missing (I remember they existed in the 2.6.1 docs). Was this intentional? Quite a while ago (~2.5) the C/API docs had their own index, which personally I prefer: http://docs.python.org/c-api/index.html This page is called an index, but I'm looking for a page like http://docs.python.org/genindex-all.html which includes all C/API function names. Is this the right place to mail such problems? Thanks -- - Campbell From ideasman42 at gmail.com Fri May 29 07:05:51 2009 From: ideasman42 at gmail.com (Campbell Barton) Date: Thu, 28 May 2009 22:05:51 -0700 Subject: [Python-Dev] Warnings when no file exists. Message-ID: <7c1ab96d0905282205o612e03e8vcbfd9183efebcbc@mail.gmail.com> Hi, there has been a problem in blender3d for 6~ years or so that's eluded me; I decided to look into it today. - Whenever a script raises a warning, Python prints out binary garbage in the console. Some users complain that when they run Python games in Blender they get beeps coming from the PC speaker. It turns out that _warnings.c's setup_context() is taking the first value of argv (line 534 in 2.6.2), which in our case is the blender binary; then some part of the binary is printed to the console. Apart from the beeps and not being helpful, this also can mess up the console's state - a bit like "cat /dev/random" might. But the real problem is that warnings expect a file to exist; in Blender we have our own internal texts that don't have a corresponding file on disk, so setting __file__ in the global dict will just point to a location that doesn't exist. It surprises me that warnings do this, since exceptions work as expected, printing useful stack traces from our built-in texts. In case this helps, the scripts are converted into a buffer and run like this... text->compiled = Py_CompileString( buf, text->id.name+2, Py_file_input ); PyEval_EvalCode( text->compiled, globaldict, globaldict ); Does anyone know of a workaround for this? I'm sure there are other cases where you may want to run compiled code that isn't related to a file.
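(One workaround that may be worth a try, sketched here in pure Python rather than the embedding C; the names "MyText" and the sample source are stand-ins for Blender's text blocks, not anything real. As far as I can tell the warnings machinery only uses __file__ for display and for a linecache lookup, so the file does not actually have to exist on disk, and you can even seed linecache by hand so the formatted warning or traceback can still show the offending line.)

    import linecache

    name = "MyText"                      # stand-in for text->id.name+2
    source = 'import warnings\nwarnings.warn("careful!")\n'

    code = compile(source, name, "exec")
    globaldict = {
        "__name__": name,  # a non-"__main__" name sidesteps the sys.argv[0] fallback
        "__file__": name,  # need not point at a real file
    }

    # Optional: register the in-memory source with linecache so tools that
    # format warnings and tracebacks can still find the source line.
    linecache.cache[name] = (len(source), None, source.splitlines(True), name)

    exec(code, globaldict)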
-- - Campbell From ben+python at benfinney.id.au Fri May 29 07:30:45 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 29 May 2009 15:30:45 +1000 Subject: [Python-Dev] Survey on DVCS usage and experience References: <20090529033508.GA4463@ubuntu.ubuntu-domain> Message-ID: <87hbz4wk56.fsf@benfinney.id.au> Senthil Kumaran writes: > On Wed, May 27, 2009 at 06:02:57PM -0600, Brian de Alwis wrote: > > > With Python having recently chosen to switch to Mercurial, I hoped > > that any developers who've used a DVCS (and who are over 18 years > > old) might like to participate in our survey and share your > > Just curious. Why is this age restriction? You might miss out few > key developers... I would guess because they need adult consent in order to legally use the survey results as evidence in whatever psychological/sociological study they perform. -- \ ?No matter how far down the wrong road you've gone, turn back.? | `\ ?Turkish proverb | _o__) | Ben Finney From brian.de.alwis at usask.ca Fri May 29 09:13:34 2009 From: brian.de.alwis at usask.ca (Brian de Alwis) Date: Fri, 29 May 2009 01:13:34 -0600 Subject: [Python-Dev] Survey on DVCS usage and experience In-Reply-To: <20090529033508.GA4463@ubuntu.ubuntu-domain> References: <20090529033508.GA4463@ubuntu.ubuntu-domain> Message-ID: <4CE22B25-BC10-40FB-83B3-449E894E8FAB@usask.ca> On 28-May-09, at 9:35 PM, Senthil Kumaran wrote: > On Wed, May 27, 2009 at 06:02:57PM -0600, Brian de Alwis wrote: > >> With Python having recently chosen to switch to Mercurial, I hoped >> that any developers who've used a DVCS (and who are over 18 years >> old) might like to participate in our survey and share your > > Just curious. Why is this age restriction? You might miss out few > key developers... It's a restriction required to obtain approval from our research ethics board -- people under 18 are considered to be minors in Canada and thus require the consent of their guardian to participate. Trying to obtain such permission for an anonymous survey is a bit difficult! Although we could work around this guardian-consent issue in theory, doing so would require jumping through several additional hoops in the ethics process and would take significantly more time. Brian. From p.f.moore at gmail.com Fri May 29 12:57:31 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 29 May 2009 11:57:31 +0100 Subject: [Python-Dev] Possibility of binary configuration mismatch In-Reply-To: References: Message-ID: <79990c6b0905290357v6bdaab3amd4d4c001fccdb3e4@mail.gmail.com> 2009/5/29 David Abrahams : > http://allmydata.org/trac/tahoe/ticket/704#comment:7 describes a > situation where my macports-installed python25 had a pyOpenSSL egg > installed in it by something other than macports (possibly by > easy_install-2.5?) that was not compatible with the Python build. ?My > hunch is that the pyOpenSSL had binaries compiled against a UCS4 Python, > but I don't know for sure. ?Whatever did the installation of the bad egg > was almost certainly being executed by the macports python25 because > macports is installed in /opt/local, and nothing is likely to have > installed it under that prefix by chance. ?In other words, this egg > probably couldn't have been left over from some non-macports python > installation. ?In fact, I haven't had any other version of Python2.5 > installed on this machine. ?Very odd. > > I wonder if it makes sense to enhance the extension module system to > record this kind of information so the problem can be diagnosed by the > system? 
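(A quick check that narrows this kind of hunch down, as a two-line sketch with nothing macports-specific about it: run it under the interpreter that owns the site-packages directory in question and compare against what the extension was built for.)

    import sys
    from distutils.sysconfig import get_config_var

    # 65535 means a narrow (UCS-2) build, 1114111 a wide (UCS-4) build.
    print sys.maxunicode
    # On Unix builds this reports the configured Py_UNICODE_SIZE (2 or 4).
    print get_config_var('Py_UNICODE_SIZE')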
I have a feeling that this has been discussed before, in the context of easy_install/setuptools' approach to encoding the build details for a binary package in the filename, not covering UCS4 vs UCS2. You may find it useful to search on the distutils-sig archives for further information. Paul. From solipsis at pitrou.net Fri May 29 13:14:28 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 29 May 2009 11:14:28 +0000 (UTC) Subject: [Python-Dev] Survey on DVCS usage and experience References: <20090529033508.GA4463@ubuntu.ubuntu-domain> <4CE22B25-BC10-40FB-83B3-449E894E8FAB@usask.ca> Message-ID: Brian de Alwis usask.ca> writes: > > It's a restriction required to obtain approval from our research > ethics board -- people under 18 are considered to be minors in Canada > and thus require the consent of their guardian to participate. Trying > to obtain such permission for an anonymous survey is a bit difficult! But since your survey is anonymous, you can't be sure all the responders are over 18. Actually, they might even not be human beings! (hint: I'm not) Regards Antoine. From status at bugs.python.org Fri May 29 18:08:04 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 29 May 2009 18:08:04 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090529160804.42CB178604@psf.upfronthosting.co.za> ACTIVITY SUMMARY (05/22/09 - 05/29/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2201 open (+36) / 15764 closed (+18) / 17965 total (+54) Open issues with patches: 866 Average duration of open issues: 652 days. Median duration of open issues: 400 days. Open Issues Breakdown open 2175 (+36) pending 26 ( +0) Issues Created Or Reopened (55) _______________________________ improved allocation of PyUnicode objects 05/24/09 http://bugs.python.org/issue1943 reopened pitrou patch str.format raises SystemError 05/22/09 CLOSED http://bugs.python.org/issue6089 created eggy zipfile DeprecationWarning Python 2.6.2 05/22/09 http://bugs.python.org/issue6090 created ivb Curses segfaulting in FreeBSD/amd64 05/23/09 http://bugs.python.org/issue6091 created themoken Changed Shortcuts don't show up in menu 05/23/09 http://bugs.python.org/issue6092 created jamesie Ambiguous locale.strxfrm 05/23/09 CLOSED http://bugs.python.org/issue6093 created tuves Python fails to build with Subversion 1.7 05/23/09 CLOSED http://bugs.python.org/issue6094 created Arfrever patch os.curdir as the default argument for os.listdir 05/23/09 http://bugs.python.org/issue6095 created tarek SimpleXMLRPCServer not suitable for HTTP/1.1 keep-alive 05/24/09 http://bugs.python.org/issue6096 created krisvale patch, patch, easy, needs review Encoded surrogate characters on command line not escaped in sys. 
05/24/09 http://bugs.python.org/issue6097 created baikie patch xml.dom.minidom incorrectly claims DOM Level 3 conformance 05/24/09 http://bugs.python.org/issue6098 created phihag patch HTTP/1.1 with keep-alive support for xmlrpclib.ServerProxy 05/24/09 http://bugs.python.org/issue6099 created krisvale patch, patch, needs review Expanding arrays inside other arrays 05/24/09 http://bugs.python.org/issue6100 created marek_sp SETUP_WITH 05/24/09 CLOSED http://bugs.python.org/issue6101 created benjamin.peterson patch When the package has non-ascii path and .pyc file, we cannot imp 05/25/09 http://bugs.python.org/issue6102 created Suzumizaki Static library (libpythonX.Y.a) installed in incorrect location 05/25/09 http://bugs.python.org/issue6103 created Arfrever patch OSX framework builds fail after r72861 move of _locale into core 05/25/09 CLOSED http://bugs.python.org/issue6104 created nad json.dumps doesn't respect OrderedDict's iteration order 05/25/09 http://bugs.python.org/issue6105 created wangchun read_until 05/25/09 http://bugs.python.org/issue6106 created ps Subprocess.Popen output fails on Windows 05/26/09 CLOSED http://bugs.python.org/issue6107 created ac.james unicode(exception) behaves differently on Py2.6 when len(excepti 05/26/09 http://bugs.python.org/issue6108 created ezio.melotti IDLE rendering issue with oriental characters on OSX 05/26/09 http://bugs.python.org/issue6109 created ronaldoussoren IDLE has two "Preferences..." menu's on OSX 05/26/09 CLOSED http://bugs.python.org/issue6110 created ronaldoussoren Impossible to change preferences in IDLE 05/26/09 CLOSED http://bugs.python.org/issue6111 created ronaldoussoren scheduler.cancel does not raise RuntimeError 05/26/09 CLOSED http://bugs.python.org/issue6112 created fidlej Dupicate instances of classes in list 05/26/09 CLOSED http://bugs.python.org/issue6113 reopened mbaynham distutils build_ext path comparison only based on strings 05/26/09 http://bugs.python.org/issue6114 created sleipnir patch Header and doc related to PyNumber_Divide and PyNumber_InPlaceDi 05/26/09 CLOSED http://bugs.python.org/issue6115 created bhy patch frame.f_locals keeps references to things for too long 05/26/09 http://bugs.python.org/issue6116 created exarkun Fix O(n**2) performance problem in socket._fileobject 05/26/09 http://bugs.python.org/issue6117 created krisvale patch, patch, easy, needs review urllib.parse.quote_plus ignores optional arguments 05/26/09 CLOSED http://bugs.python.org/issue6118 created mgiuca patch Confusing DeprecationWarning 05/26/09 http://bugs.python.org/issue6119 created alejolp zipfile.ZipFile's extractall works inproperly under Windows 05/27/09 CLOSED http://bugs.python.org/issue6120 created aerodonkey help('modules ') causes IndexError. 
05/27/09 CLOSED http://bugs.python.org/issue6121 created July patch, easy OSError: [Errno 10] No child processes 05/27/09 http://bugs.python.org/issue6122 created yonas tarfile: opening an empty tar file fails 05/27/09 http://bugs.python.org/issue6123 created evanj patch Tkinter should support the OS X zoom button 05/27/09 http://bugs.python.org/issue6124 created culler 2to3 mishandles "from module_name import" when module_name inclu 05/27/09 CLOSED http://bugs.python.org/issue6125 created MLModel Python 3 pdb: shows internal code, breakpoints don't work 05/27/09 http://bugs.python.org/issue6126 created ericp Unexpected universal newline behavior (newline duplication) in W 05/27/09 http://bugs.python.org/issue6127 created jaraco Consequences of using Py_TPFLAGS_HAVE_GC are incompletely explai 05/27/09 http://bugs.python.org/issue6128 created exarkun 2to3 does not convert imports of the form 'import sub.mod' to re 05/27/09 CLOSED http://bugs.python.org/issue6129 reopened MLModel There ought to be a way for extension types to associate documen 05/27/09 http://bugs.python.org/issue6130 created exarkun test_modulefinder leaks when run after test_distutils 05/27/09 CLOSED http://bugs.python.org/issue6131 created pitrou patch Implement the GIL with critical sections in Windows 05/27/09 http://bugs.python.org/issue6132 created sitbon patch LOAD_CONST followed by LOAD_ATTR can be optimized to just be a L 05/27/09 http://bugs.python.org/issue6133 created alex patch 2to3 tests fail on Windows due to line endings 05/28/09 CLOSED http://bugs.python.org/issue6134 created abbeyj patch subprocess seems to use local 8-bit encoding and gives no choice 05/28/09 http://bugs.python.org/issue6135 created mark Make logging configuration files easier to use 05/28/09 http://bugs.python.org/issue6136 created gjb1002 Pickle migration: Should pickle map "copy_reg" to "copyreg"? 05/28/09 http://bugs.python.org/issue6137 created mkiever './configure; make install' fails in setup.py step if .pydistuti 05/28/09 http://bugs.python.org/issue6138 created r.david.murray Typo in email.base64mime 05/28/09 http://bugs.python.org/issue6139 created ocean-city patch configure error: shadow.h: present but cannot be compiled 05/29/09 http://bugs.python.org/issue6140 created Sashi missing first argument on subprocess.Popen w/ executable 05/29/09 http://bugs.python.org/issue6141 created lieryan patch Distutils doesn't remove .pyc files 05/29/09 http://bugs.python.org/issue6142 created purpleidea patch Issues Now Closed (53) ______________________ Test issue 635 days http://bugs.python.org/issue1064 dtuser2 patch Return from fork() is pid_t, not int 478 days http://bugs.python.org/issue1983 pitrou patch pkg-config support 280 days http://bugs.python.org/issue3585 pitrou patch, needs review test_fileio fails on OpenBSD 4.4 249 days http://bugs.python.org/issue3877 pitrou patch ignored exceptions in generators (regression?) 
236 days http://bugs.python.org/issue4040 doughellmann smtplib SMTP_SSL._get_socket doesn't return a value 228 days http://bugs.python.org/issue4066 r.david.murray patch Py_Object_HEAD_INIT in Py3k 183 days http://bugs.python.org/issue4385 georg.brandl Issue with RotatingFileHandler logging handler on Windows 147 days http://bugs.python.org/issue4749 rcronk pwd, spwd, grp functions vulnerable to denial of service 143 days http://bugs.python.org/issue4859 loewis patch time.ctime docs refer to "time tuple" for default 116 days http://bugs.python.org/issue5079 georg.brandl IDLE to support reindent.py 114 days http://bugs.python.org/issue5150 rhettinger smtplib is broken in Python3 103 days http://bugs.python.org/issue5259 r.david.murray patch, easy StringIO can duplicate newlines in universal newlines mode 102 days http://bugs.python.org/issue5265 jaraco OS X installer: fix makefile target changed for 3.x 101 days http://bugs.python.org/issue5272 ronaldoussoren sys.exc_info()[1] - different handling from str() and unicode() 102 days http://bugs.python.org/issue5274 georg.brandl OS X Installer: by default install versioned-only links in /usr/ 55 days http://bugs.python.org/issue5653 ronaldoussoren Speed up pickling of dicts in cPickle 53 days http://bugs.python.org/issue5670 pitrou patch, needs review idle pydoc et al removed from 3.1 without versioned replacements 3 days http://bugs.python.org/issue5756 nad add file name to py3k IO objects repr() 38 days http://bugs.python.org/issue5761 pitrou patch pickle/cPickle of recursive tuples create pickles that cPickle c 37 days http://bugs.python.org/issue5794 collinwinter patch, easy, 26backport there is en exception om Create User page 33 days http://bugs.python.org/issue5797 georg.brandl cPickle defect with tuples and different from pickle output 27 days http://bugs.python.org/issue5866 collinwinter classmethod, staticmethod: expose wrapped function 19 days http://bugs.python.org/issue5982 rhettinger patch enhance getargs O& to accept cleanup function 16 days http://bugs.python.org/issue6012 loewis patch test_distutils leaves a 'foo' file behind in the cwd 11 days http://bugs.python.org/issue6022 rpetrov "install" target in python 3.x makefile should be "fullinstall" 6 days http://bugs.python.org/issue6047 benjamin.peterson make distutils use the tarinfo command 11 days http://bugs.python.org/issue6048 tarek zipfile: Extracting a directory that already exists generates an 7 days http://bugs.python.org/issue6050 loewis patch PYTHONHOME should be more flexible (and controllable by --libdir 5 days http://bugs.python.org/issue6060 loewis bdist_msi.py failed assert when including extension modules 5 days http://bugs.python.org/issue6065 loewis patch threading.Timer and gtk.main are not compatible 7 days http://bugs.python.org/issue6073 amaury.forgeotdarc freeze.py doesn't work 1 days http://bugs.python.org/issue6078 georg.brandl SyntaxError in xmlrpc.client examples 1 days http://bugs.python.org/issue6079 georg.brandl str.format raises SystemError 1 days http://bugs.python.org/issue6089 eric.smith Ambiguous locale.strxfrm 0 days http://bugs.python.org/issue6093 loewis Python fails to build with Subversion 1.7 1 days http://bugs.python.org/issue6094 Arfrever patch SETUP_WITH 1 days http://bugs.python.org/issue6101 benjamin.peterson patch OSX framework builds fail after r72861 move of _locale into core 0 days http://bugs.python.org/issue6104 benjamin.peterson Subprocess.Popen output fails on Windows 1 days http://bugs.python.org/issue6107 ac.james IDLE 
has two "Preferences..." menu's on OSX 0 days http://bugs.python.org/issue6110 ronaldoussoren Impossible to change preferences in IDLE 0 days http://bugs.python.org/issue6111 nad scheduler.cancel does not raise RuntimeError 0 days http://bugs.python.org/issue6112 georg.brandl Dupicate instances of classes in list 0 days http://bugs.python.org/issue6113 mbaynham Header and doc related to PyNumber_Divide and PyNumber_InPlaceDi 0 days http://bugs.python.org/issue6115 georg.brandl patch urllib.parse.quote_plus ignores optional arguments 0 days http://bugs.python.org/issue6118 georg.brandl patch zipfile.ZipFile's extractall works inproperly under Windows 0 days http://bugs.python.org/issue6120 ocean-city help('modules ') causes IndexError. 1 days http://bugs.python.org/issue6121 r.david.murray patch, easy 2to3 mishandles "from module_name import" when module_name inclu 0 days http://bugs.python.org/issue6125 r.david.murray 2to3 does not convert imports of the form 'import sub.mod' to re 0 days http://bugs.python.org/issue6129 benjamin.peterson test_modulefinder leaks when run after test_distutils 2 days http://bugs.python.org/issue6131 ocean-city patch 2to3 tests fail on Windows due to line endings 1 days http://bugs.python.org/issue6134 benjamin.peterson patch os.listdir on empty strings. Inconsistent behaviour. 2063 days http://bugs.python.org/issue818059 benjamin.peterson patch, needs review Make fcntl work properly on AMD64 1332 days http://bugs.python.org/issue1309352 pitrou patch Top Issues Most Discussed (10) ______________________________ 13 Dupicate instances of classes in list 0 days closed http://bugs.python.org/issue6113 10 improved allocation of PyUnicode objects 5 days open http://bugs.python.org/issue1943 9 .pyc files created readonly if .py file is readonly, python won 9 days open http://bugs.python.org/issue6074 9 Python 2.6 makes .pyc/.pyo bytecode files executable 9 days open http://bugs.python.org/issue6070 8 LOAD_CONST followed by LOAD_ATTR can be optimized to just be a 2 days open http://bugs.python.org/issue6133 8 test_modulefinder leaks when run after test_distutils 2 days closed http://bugs.python.org/issue6131 8 OSError: [Errno 10] No child processes 2 days open http://bugs.python.org/issue6122 7 Impossible to change preferences in IDLE 0 days closed http://bugs.python.org/issue6111 6 zipfile.ZipFile's extractall works inproperly under Windows 0 days closed http://bugs.python.org/issue6120 6 SETUP_WITH 1 days closed http://bugs.python.org/issue6101 From dinov at microsoft.com Sat May 30 02:08:46 2009 From: dinov at microsoft.com (Dino Viehland) Date: Sat, 30 May 2009 00:08:46 +0000 Subject: [Python-Dev] Indentation oddness... Message-ID: <1A472770E042064698CB5ADC83A12ACD02915C13@TK5EX14MBXC118.redmond.corp.microsoft.com> Consider the code: code = "def Foo():\n\n pass\n\n " This code is malformed in that the final indentation (2 spaces) does not agree with the previous indentation of the pass statement (4 spaces). Or maybe it's just fine if you take the blank lines should be ignored statement from the docs to be true. So let's look at different ways I can consume this code. If I use compile to compile this: compile(code, 'foo', 'single') I get an IndentationError: unindent does not match any outer indentation level But if I put this in a file: f= file('indenttest.py', 'w') f.write(code) f.close() import indenttest It imports just fine. 
If I run it through the tokenize module it also tokenizes just fine: >>> import tokenize >>> from cStringIO import StringIO >>> tokenize.tokenize(StringIO(code).readline) 1,0-1,3: NAME 'def' 1,5-1,8: NAME 'Foo' 1,8-1,9: OP '(' 1,9-1,10: OP ')' 1,10-1,11: OP ':' 1,11-1,12: NEWLINE '\n' 2,0-2,1: NL '\n' 3,0-3,4: INDENT ' ' 3,4-3,8: NAME 'pass' 3,8-3,9: NEWLINE '\n' 4,0-4,1: NL '\n' 5,0-5,0: DEDENT '' 5,0-5,0: ENDMARKER '' And if it fails anywhere it would seem tokenization is where it should fail - especially given that tokenize.py seems to report this error on other occasions. And stranger still if I add a new line then it will even compile fine: compile(code + '\n', 'foo', 'single') Which seems strange because in either case all of the trailing lines are blank lines and as such should basically be ignored according to the documentation. Is there some strange reason why compile rejects what everything else agrees is perfectly valid code? From robert.kern at gmail.com Sat May 30 02:26:22 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 29 May 2009 19:26:22 -0500 Subject: [Python-Dev] Indentation oddness... In-Reply-To: <1A472770E042064698CB5ADC83A12ACD02915C13@TK5EX14MBXC118.redmond.corp.microsoft.com> References: <1A472770E042064698CB5ADC83A12ACD02915C13@TK5EX14MBXC118.redmond.corp.microsoft.com> Message-ID: On 2009-05-29 19:08, Dino Viehland wrote: > Consider the code: > > code = "def Foo():\n\n pass\n\n " > > This code is malformed in that the final indentation (2 spaces) does not agree with the previous indentation of the pass statement (4 spaces). Or maybe it's just fine if you take the blank lines should be ignored statement from the docs to be true.
So let's look at different ways I can consume this code. > > If I use compile to compile this: > > compile(code, 'foo', 'single') > > I get an IndentationError: unindent does not match any outer indentation level > > But if I put this in a file: > > f= file('indenttest.py', 'w') > f.write(code) > f.close() > import indenttest > > It imports just fine. The 'single' mode, which is used for the REPL, is a bit different than 'exec', which is used for modules. This difference lets you insert "blank" lines of whitespace into a function definition without exiting the definition. Ending with a truly empty line does not cause the IndentationError, so the REPL can successfully compile the code, signaling that the user has finished typing the function. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dinov at microsoft.com Sat May 30 02:52:33 2009 From: dinov at microsoft.com (Dino Viehland) Date: Sat, 30 May 2009 00:52:33 +0000 Subject: [Python-Dev] Indentation oddness... In-Reply-To: References: <1A472770E042064698CB5ADC83A12ACD02915C13@TK5EX14MBXC118.redmond.corp.microsoft.com> Message-ID: <1A472770E042064698CB5ADC83A12ACD0294ADCA@TK5EX14MBXC118.redmond.corp.microsoft.com> > The 'single' mode, which is used for the REPL, is a bit different than > 'exec', > which is used for modules. This difference lets you insert "blank" > lines of > whitespace into a function definition without exiting the definition. > Ending > with a truly empty line does not cause the IndentationError, so the > REPL can > successfully compile the code, signaling that the user has finished > typing the > function. Sorry, I probably should have mentioned this but it repros w/ compile(..., "exec") as well: >>> code = "def Foo():\n\n pass\n\n " >>> compile(code, 'foo', 'exec') Traceback (most recent call last): File "", line 1, in File "foo", line 5 IndentationError: unindent does not match any outer indentation level It also repros when passing in PyCF_DONT_IMPLY_DEDENT for flags under single and exec. From guido at python.org Sat May 30 04:19:34 2009 From: guido at python.org (Guido van Rossum) Date: Fri, 29 May 2009 19:19:34 -0700 Subject: [Python-Dev] Indentation oddness... In-Reply-To: <1A472770E042064698CB5ADC83A12ACD0294ADCA@TK5EX14MBXC118.redmond.corp.microsoft.com> References: <1A472770E042064698CB5ADC83A12ACD02915C13@TK5EX14MBXC118.redmond.corp.microsoft.com> <1A472770E042064698CB5ADC83A12ACD0294ADCA@TK5EX14MBXC118.redmond.corp.microsoft.com> Message-ID: I usually append some extra newlines before passing a string to compile(). That's the usual work-around. There's probably a subtle bug in the tokenizer when reading from a string -- if you find it, please upload a patch to the tracker! --Guido On Fri, May 29, 2009 at 5:52 PM, Dino Viehland wrote: >> The 'single' mode, which is used for the REPL, is a bit different than >> 'exec', >> which is used for modules. This difference lets you insert "blank" >> lines of >> whitespace into a function definition without exiting the definition. >> Ending >> with a truly empty line does not cause the IndentationError, so the >> REPL can >> successfully compile the code, signaling that the user has finished >> typing the >> function. > > Sorry, I probably should have mentioned this but it repros w/ > compile(..., "exec") as well: > >>>> code = "def ?Foo():\n\n ? ?pass\n\n ?" 
>>>> compile(code, 'foo', 'exec') > Traceback (most recent call last): > ?File "", line 1, in > ?File "foo", line 5 > > IndentationError: unindent does not match any outer indentation level > > It also repros when passing in PyCF_DONT_IMPLY_DEDENT for flags under > single and exec. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From dinov at microsoft.com Sat May 30 17:35:44 2009 From: dinov at microsoft.com (Dino Viehland) Date: Sat, 30 May 2009 15:35:44 +0000 Subject: [Python-Dev] Indentation oddness... In-Reply-To: References: <1A472770E042064698CB5ADC83A12ACD02915C13@TK5EX14MBXC118.redmond.corp.microsoft.com> <1A472770E042064698CB5ADC83A12ACD0294ADCA@TK5EX14MBXC118.redmond.corp.microsoft.com> Message-ID: <1A472770E042064698CB5ADC83A12ACD029521FD@TK5EX14MBXC118.redmond.corp.microsoft.com> Unfortunately my problem is the opposite one - trying to emulate what compile does for IronPython rather than just trying to make some code compile. So adding newlines doesn't help me. But this case isn't really that important - it was just a wacky corner case I ran into while trying to get other behavior right. I think I can safely ignore this one especially if it's just a bug. > -----Original Message----- > From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of > Guido van Rossum > Sent: Friday, May 29, 2009 7:20 PM > To: Dino Viehland > Cc: Robert Kern; python-dev at python.org > Subject: Re: [Python-Dev] Indentation oddness... > > I usually append some extra newlines before passing a string to > compile(). That's the usual work-around. There's probably a subtle bug > in the tokenizer when reading from a string -- if you find it, please > upload a patch to the tracker! > > --Guido > > On Fri, May 29, 2009 at 5:52 PM, Dino Viehland > wrote: > >> The 'single' mode, which is used for the REPL, is a bit different > than > >> 'exec', > >> which is used for modules. This difference lets you insert "blank" > >> lines of > >> whitespace into a function definition without exiting the definition. > >> Ending > >> with a truly empty line does not cause the IndentationError, so the > >> REPL can > >> successfully compile the code, signaling that the user has finished > >> typing the > >> function. > > > > Sorry, I probably should have mentioned this but it repros w/ > > compile(..., "exec") as well: > > > >>>> code = "def Foo():\n\n pass\n\n " > >>>> compile(code, 'foo', 'exec') > > Traceback (most recent call last): > > File "", line 1, in > > File "foo", line 5 > > > > IndentationError: unindent does not match any outer indentation level > > > > It also repros when passing in PyCF_DONT_IMPLY_DEDENT for flags under > > single and exec. 
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > http://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe: http://mail.python.org/mailman/options/python-
> dev/guido%40python.org
> >
> >
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

From benjamin at python.org Sat May 30 20:04:35 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 30 May 2009 13:04:35 -0500 Subject: [Python-Dev] [RELEASED] Python 3.1 Release Candidate 1 Message-ID: <1afaf6160905301104w203b5a76u5d9909942f91ecb4@mail.gmail.com>

On behalf of the Python development team, I'm happy to announce the first release candidate of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of the features and changes that Python 3.0 introduced. For example, the new I/O system has been rewritten in C for speed. File system APIs that use unicode strings now handle paths with undecodable bytes in them. Other features include an ordered dictionary implementation, a condensed syntax for nested with statements, and support for ttk Tile in Tkinter. For a more extensive list of changes in 3.1, see http://doc.python.org/dev/py3k/whatsnew/3.1.html or Misc/NEWS in the Python distribution.

This is a release candidate, and as such, we do not recommend use in production environments. However, please take this opportunity to test the release with your libraries or applications. This will hopefully discover bugs before the final release and allow you to determine how changes in 3.1 might impact you.

If you find things broken or incorrect, please submit a bug report at

    http://bugs.python.org

For more information and downloadable distributions, see the Python 3.1 website:

    http://www.python.org/download/releases/3.1/

See PEP 375 for release schedule details:

    http://www.python.org/dev/peps/pep-0375/

Enjoy,

-- Benjamin

Benjamin Peterson
benjamin at python.org
Release Manager
(on behalf of the entire python-dev team and 3.1's contributors)

From carmstr3 at illinois.edu Sat May 30 20:02:04 2009 From: carmstr3 at illinois.edu (carmstr3 at illinois.edu) Date: Sat, 30 May 2009 13:02:04 -0500 (CDT) Subject: [Python-Dev] looking for some people to talk with about Python development Message-ID: <20090530130204.BOO92267@expms3.cites.uiuc.edu>

Hello,

My name is Chandler Armstrong and I'm investigating environments of collaboration. I'm a PhD candidate at the University of Illinois, Urbana-Champaign, specialized in internet research and science & technology studies. I'm generally interested in development methods overall, and specifically interested in both artificial language construction and evolution, and collaboration in open-source models.

I would like to talk to some members of the Python development community about what kinds of activities they do within it. If anybody is interested in this, please email me at carmstr3 at illinois.edu. I will send you a document that describes the research and interview in more detail. I'd like to do a voice interview over skype or a phone, but I can accommodate an online chat or even email.

I have some current research on this specific mailing list which is more quantitative in nature. I downloaded the entire mailing list from the archives. Next I looked through all the python-dev summaries and used links provided to referenced threads to indicate that a particular message or thread was meaningful in development.
I characterized the mailing list as threads, describing each instance with about 30 attributes (things like the number of posts, the depth of the tree, a measure of 'branchyness' of the thread, the standard deviation of post counts across posters, the hour/day/month of the thread, etc). Using these attributes I attempted to classify, using logistic regression, the threads that were indicated as meaningful in the python-dev summaries. There are some significant results. If anyone is interested I can send you my results, or even post them here to the list. I'll be presenting my results at the Classification Society Conference in St. Louis in June. The work is unpublished at the moment but I hope to find a journal for it this summer.

I used Python exclusively for all that quantitative work: downloading the mailing list and going through all the summaries, opening the links and matching the referenced message to the correct one in my downloaded database, and cleaning and transforming data. It was a ton of fun. I hope to develop more scripts for other sorts of automated analysis.

At any rate, please contact me if you'd like to contribute to my current tack of investigation. I would ultimately want to interview however many people are willing to talk with me. I need to do about two in the next couple of weeks, and I would get with other volunteers in the weeks after that.

Thanks,
Chandler Armstrong
carmstr3 at illinois.edu

From nnorwitz at gmail.com Sat May 30 20:54:09 2009 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sat, 30 May 2009 11:54:09 -0700 Subject: [Python-Dev] cleanup before 3.1 is released Message-ID:

Has anyone run valgrind/purify and pychecker/pylint on the 3.1 code recently? Both sets of tools should be used before the final release so we can fix any obvious problems.

n

From g.brandl at gmx.net Sat May 30 22:43:19 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 30 May 2009 22:43:19 +0200 Subject: [Python-Dev] cleanup before 3.1 is released In-Reply-To: References: Message-ID:

Neal Norwitz schrieb:
> Has anyone run valgrind/purify and pychecker/pylint on the 3.1 code
> recently? Both sets of tools should be used before the final release
> so we can fix any obvious problems.

Do pychecker/pylint work on 3.x code?

Gerog

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four.
Tabs are right out.

From wojtek.gminick.walczak at gmail.com Sun May 31 01:35:38 2009 From: wojtek.gminick.walczak at gmail.com (Wojciech Walczak) Date: Sun, 31 May 2009 01:35:38 +0200 Subject: [Python-Dev] [Sphinx] GSoC project announcement Message-ID: <2c3c21060905301635h7d2c195bsbd30057156a8b413@mail.gmail.com>

Hi, guys,

just a short introduction of one of this year's GSoC PSF projects: I am implementing support for per-paragraph comments and a user/developer interface for submitting/committing fixes in Sphinx[1].

In case you are interested in adding your 2 cents (or more) by commenting on my application[2] or proposing some enhancements - feel free to do so on sphinx-dev[3]. Or take a look at my blog to keep up to date[4].

[1] - http://sphinx.pocoo.org/
[2] - http://tosh.pl/gminick/gsoc/sphinx/
[3] - http://groups.google.com/group/sphinx-dev
[4] - http://gminick.wordpress.com

Best regards,

--
Wojtek Walczak
http://tosh.pl/gminick/

From greg.ewing at canterbury.ac.nz Sun May 31 03:21:15 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 31 May 2009 13:21:15 +1200 Subject: [Python-Dev] Survey on DVCS usage and experience In-Reply-To: References: <20090529033508.GA4463@ubuntu.ubuntu-domain> <4CE22B25-BC10-40FB-83B3-449E894E8FAB@usask.ca> Message-ID: <4A21DB8B.7070301@canterbury.ac.nz>

Antoine Pitrou wrote:

> you can't be sure all the responders are
> over 18. Actually, they might even not be human beings!
> (hint: I'm not)

Not over 18, or not a human being?

--
Greg

From greg.ewing at canterbury.ac.nz Sun May 31 04:02:41 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 31 May 2009 14:02:41 +1200 Subject: [Python-Dev] Indentation oddness... In-Reply-To: References: <1A472770E042064698CB5ADC83A12ACD02915C13@TK5EX14MBXC118.redmond.corp.microsoft.com> Message-ID: <4A21E541.1020003@canterbury.ac.nz>

Robert Kern wrote:

> The 'single' mode, which is used for the REPL, is a bit different than
> 'exec', which is used for modules. This difference lets you insert
> "blank" lines of whitespace into a function definition without exiting
> the definition.

All that means is that the REPL needs to keep reading lines until it gets a completely blank one. I don't see why the compiler has to treat the source any differently once the REPL has decided how much text to feed it.

--
Greg
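
For reference, a minimal sketch of the behavior discussed in the indentation thread above (assumes 2.6-era CPython; the trailing whitespace in the string, the padded-string work-around, and the file-based import are taken from the messages in the thread rather than independently verified):

# Sketch only: assumes the behavior reported in this thread.
code = "def Foo():\n\n    pass\n\n  "    # the last "line" is two spaces at end-of-file

try:
    compile(code, 'foo', 'exec')         # reported above to raise IndentationError
except IndentationError:
    pass                                 # "unindent does not match any outer indentation level"

# Guido's work-around: pad the string so the whitespace-only line is no longer
# the very last thing the tokenizer sees at end-of-file.
compile(code + "\n\n", 'foo', 'exec')    # compiles cleanly

# Writing the identical text to a file and importing it also succeeds, because
# the file-reading path of the tokenizer handles the trailing line differently.
f = open('indenttest.py', 'w')
f.write(code)
f.close()
import indenttest                        # assumes the current directory is on sys.path
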