From ncoghlan at gmail.com Wed Apr 1 00:03:05 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Apr 2009 08:03:05 +1000 Subject: [Python-Dev] And the winner is... In-Reply-To: <87y6ulvdb4.fsf@xemacs.org> References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <49D20FB3.9050400@gmail.com> <87y6ulvdb4.fsf@xemacs.org> Message-ID: <49D29319.2010902@gmail.com> Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > Every single git command line example I have seen gives me exactly the > > same gut reaction I get whenever I have to read Perl code. > > Every single one? Sounds to me like the cause is probably something > you ate, not anything you read. In the examples in the PEP, about 80% > of the commands were syntactically identical across VCSes. What, hyperbole on the internets? ;) The non-trivial examples are the ones I was talking about - as you say, for trivial tasks, the only difference is typically going to be in the exact name of the command. > I hope nobody is put off either git or bzr by the result of this PEP. > If there's anything striking about the PEP's examples, it's how > similar the usage of the VCSes would be in the context of Python's > workflow. There are important differences, and I agree with Guido's > choice, for Python, on March 30, 2009. But all three are capable > VCSes, with advantages and disadvantages, and were this PEP started > next June rather than last December, the result could have been very > different. Indeed! (although I doubt git's CLI will ever evolve into anything I could claim to love) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From eduardo.padoan at gmail.com Wed Apr 1 00:20:49 2009 From: eduardo.padoan at gmail.com (Eduardo O. Padoan) Date: Tue, 31 Mar 2009 19:20:49 -0300 Subject: [Python-Dev] And the winner is... 
In-Reply-To: <3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com> References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <874oxaw95q.fsf@xemacs.org> <3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com> Message-ID: On Tue, Mar 31, 2009 at 3:04 PM, Mike Coleman wrote: > It looks like there might be a Python clone sprouting here: > > ? ?http://gitorious.org/projects/git-python/ AFAIK, git-python is just a lib to manipulate git repos from python, not a git clone. Dulwich is more like it: http://samba.org/~jelmer/dulwich/ -- Eduardo de Oliveira Padoan http://importskynet.blogspot.com http://djangopeople.net/edcrypt/ "Distrust those in whom the desire to punish is strong." -- Goethe, Nietzsche, Dostoevsky From martin at v.loewis.de Wed Apr 1 00:44:29 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 31 Mar 2009 17:44:29 -0500 Subject: [Python-Dev] Test failures under Windows? In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local> <49C9EEB5.2090804@gmail.com> <930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local> Message-ID: <49D29CCD.9000701@v.loewis.de> > I guess I'll stop asking after this note, but can anyone give a final > verdict on whether the older "-n" option can be restored to the > buildbot test.bat (from the revision history I'm not actually sure it > was intentionally removed in the first place)? I have now restored it; it was removed by an unintentional merge from the trunk. Notice, however, that the feature was never present in the trunk. Regards, Martin From greg.ewing at canterbury.ac.nz Wed Apr 1 00:47:23 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 01 Apr 2009 10:47:23 +1200 Subject: [Python-Dev] Broken import? 
In-Reply-To: <49D20B54.1010108@gmail.com> References: <49D20B54.1010108@gmail.com> Message-ID: <49D29D7B.7000002@canterbury.ac.nz> Nick Coghlan wrote: > Jim Fulton's example in that tracker issue shows that with a bit of > creativity you can provoke this behaviour *without* using a from-style > import. Torsten Bronger later brought up the same issue that Fredrik did > - it prevents some kinds of explicit relative import that look like they > should be fine. I haven't been following this very closely, but if there's something that's making absolute and relative imports behave differently, I think it should be fixed. The only difference between an absolute and relative import of the same module should be the way you specify the module. -- Greg
From kristjan at ccpgames.com Wed Apr 1 01:28:38 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Tue, 31 Mar 2009 23:28:38 +0000 Subject: [Python-Dev] Test failures under Windows? In-Reply-To: <4222a8490903311459m68e7b9f8m8cfcf27aa71b92ac@mail.gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local> <49C9EEB5.2090804@gmail.com> <930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local> <4222a8490903311431u351b8d7kfb1e6cca716b1976@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F056D527435@exchis.ccp.ad.local> <4222a8490903311459m68e7b9f8m8cfcf27aa71b92ac@mail.gmail.com> Message-ID: <930F189C8A437347B80DF2C156F7EC7F056D527439@exchis.ccp.ad.local> Revision 70843. I don't know when this crept in. I didn't go and check if it applies to other branches too. Also, I'm sorry for just checking this in without warning. But I had just spent what amounts to a full day tracking this down, which was tricky because it happens in a subprocess and those are hard to debug on Windows. My eagerness got the best of me. But again, it shows how useful assertions can be and why we ought not to disable them.
Cheers, Kristj?n -----Original Message----- From: Jesse Noller [mailto:jnoller at gmail.com] Sent: 31. mars 2009 22:00 To: Kristj?n Valur J?nsson Cc: Curt Hagenlocher; mhammond at skippinet.com.au; David Bolen; python-dev at python.org Subject: Re: [Python-Dev] Test failures under Windows? Does it need to be backported? I wonder when that was introduced. Also, what CL was it so I can review it? 2009/3/31 Kristj?n Valur J?nsson : > I found a different problem in multiprocessing, for the py3k. > In import.c, get_file.c, it was knowingly leaking FILE objects, while the underlying fh was being correctly closed. ?This caused the CRT to assert when cleaning up FILE pointers on subprocess exit. > I fixed this this afternoon in a submission to the py3k branch. > > K From greg.ewing at canterbury.ac.nz Wed Apr 1 01:50:40 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 01 Apr 2009 11:50:40 +1200 Subject: [Python-Dev] And the winner is... In-Reply-To: <3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com> References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <874oxaw95q.fsf@xemacs.org> <3c6c07c20903311104i6b50a9eeg3362ade5cf981c5c@mail.gmail.com> Message-ID: <49D2AC50.5040107@canterbury.ac.nz> Mike Coleman wrote: > I mentioned this once on the git list and Linus' response was > something like "C lets me see exactly what's going on". I'm not > unsympathetic to this point of view--I'm really growing to loathe C++ > partly because it *doesn't* let me see exactly what's going on--but > I'm not convinced, either. I think Python lets you see exactly what's going on too, at the level of abstraction you're working with. The problem with C++ is that it indiscriminately mixes up wildly different levels of abstraction, so that it's hard to look at a piece of code and decide whether it's doing something high-level or low-level. 
Python takes a uniformly high-level view of everything, which is fine for the vast majority of application programming, I think -- VCSes included. -- Greg From tseaver at palladion.com Wed Apr 1 02:42:41 2009 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 31 Mar 2009 20:42:41 -0400 Subject: [Python-Dev] And the winner is... In-Reply-To: <874oxaw95q.fsf@xemacs.org> References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <874oxaw95q.fsf@xemacs.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Stephen J. Turnbull wrote: > I also just wrote a long post about the comparison of bzr to hg > responding to a comment on bazaar at canonical.com. I won't recap it > here but it might be of interest. Thank you very much for your writeups on that thread: both in tone and in content I found them extremely helpful. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJ0riB+gerLs4ltQ4RAir2AJ4rXedI4gfkaZxP5LRiOSonAI/csQCgqkpb CY6QHmE8VHpGYGaENeUMnXQ= =t/1R -----END PGP SIGNATURE----- From db3l.net at gmail.com Wed Apr 1 02:50:44 2009 From: db3l.net at gmail.com (David Bolen) Date: Tue, 31 Mar 2009 20:50:44 -0400 Subject: [Python-Dev] Test failures under Windows? References: <930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local> <49C9EEB5.2090804@gmail.com> <930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local> <49D29CCD.9000701@v.loewis.de> Message-ID: "Martin v. L?wis" writes: > Notice, however, that the feature was never present in the trunk. Yep - would be nice if it were to get backported to trunk at some point but that's a separate discussion ... 
presumably at some point py3k will be the trunk anyway, and for better or worst (perhaps due to the sorts of changes made) the assertions seem to have hit the py3k branch more than others. Thanks for the test.bat change. -- David From db3l.net at gmail.com Wed Apr 1 02:51:57 2009 From: db3l.net at gmail.com (David Bolen) Date: Tue, 31 Mar 2009 20:51:57 -0400 Subject: [Python-Dev] Test failures under Windows? References: <930F189C8A437347B80DF2C156F7EC7F05068D5486@exchis.ccp.ad.local> <49C9EEB5.2090804@gmail.com> <930F189C8A437347B80DF2C156F7EC7F056D526790@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F056D5272C8@exchis.ccp.ad.local> <4222a8490903311431u351b8d7kfb1e6cca716b1976@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F056D527435@exchis.ccp.ad.local> <4222a8490903311459m68e7b9f8m8cfcf27aa71b92ac@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F056D527439@exchis.ccp.ad.local> Message-ID: Kristj?n Valur J?nsson writes: > But again, it shows how useful assertions can be and why we ought > not to disable them. Note that just to be clear, I'm certainly not advocating the disabling of CRT assertions - just the redirection of them so they don't prevent unattended test runs from completing. -- David From aleaxit at gmail.com Wed Apr 1 03:24:56 2009 From: aleaxit at gmail.com (Alex Martelli) Date: Tue, 31 Mar 2009 18:24:56 -0700 Subject: [Python-Dev] And the winner is... In-Reply-To: References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <874oxaw95q.fsf@xemacs.org> Message-ID: On Tue, Mar 31, 2009 at 5:42 PM, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Stephen J. Turnbull wrote: > > > I also just wrote a long post about the comparison of bzr to hg > > responding to a comment on bazaar at canonical.com. I won't recap it > > here but it might be of interest. > > Thank you very much for your writeups on that thread: both in tone and > in content I found them extremely helpful. 
I'd like to read that thread for my edification -- might there be a URL for it perhaps...? Thanks, Alex
From alexandre at peadrop.com Wed Apr 1 03:33:42 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 31 Mar 2009 21:33:42 -0400 Subject: [Python-Dev] And the winner is... In-Reply-To: References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <874oxaw95q.fsf@xemacs.org> Message-ID: 2009/3/31 Alex Martelli : > On Tue, Mar 31, 2009 at 5:42 PM, Tres Seaver wrote: >> Stephen J. Turnbull wrote: >> >> > I also just wrote a long post about the comparison of bzr to hg >> > responding to a comment on bazaar at canonical.com. ?I won't recap it >> > here but it might be of interest. >> >> Thank you very much for your writeups on that thread: ?both in tone and >> in content I found them extremely helpful. > > I'd like to read that thread for my edification -- might there be a URL for > it perhaps...?
> https://lists.ubuntu.com/archives/bazaar/2009q1/055850.html https://lists.ubuntu.com/archives/bazaar/2009q1/055872.html -- Alexandre
From aleaxit at gmail.com Wed Apr 1 03:39:27 2009 From: aleaxit at gmail.com (Alex Martelli) Date: Tue, 31 Mar 2009 18:39:27 -0700 Subject: [Python-Dev] And the winner is... In-Reply-To: References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <874oxaw95q.fsf@xemacs.org> Message-ID: On Tue, Mar 31, 2009 at 6:33 PM, Alexandre Vassalotti wrote: ... > html > https://lists.ubuntu.com/archives/bazaar/2009q1/055872.html > Perfect, thanks! Alex
From fijall at gmail.com Wed Apr 1 05:15:29 2009 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 1 Apr 2009 05:15:29 +0200 Subject: [Python-Dev] issue5578 - explanation Message-ID: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> So. The issue was closed and I suppose it was closed by not entirely understanding the problem (or I didn't get it completely). The question is - what the following code should do? def f(): ?a = 2 ?class C: ? ?exec 'a = 42' ? ?abc = a ?return C print f().abc (quick answer - on python2.5 it return 42, on python 2.6 and up it returns 2, the patch changes it to syntax error). I would say that returning 2 is the less obvious thing to do.
The reason why IMO this should be a syntax error is this code: def f(): a = 2 def g(): exec 'a = 42' abc = a which throws syntax error. Cheers, fijal From guido at python.org Wed Apr 1 05:25:01 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Mar 2009 20:25:01 -0700 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> Message-ID: Well hold on for a minute, I remember we used to have an exec statement in a class body in the standard library, to define some file methods in socket.py IIRC. It's a totally different case than exec in a nested function, and I don't believe it should be turned into a syntax error at all. An exec in a class body is probably meant to define some methods or other class attributes. I actually think the 2.5 behavior is correct, and I don't know why it changed in 2.6. --Guido On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski wrote: > So. The issue was closed and I suppose it was closed by not entirely > understanding > the problem (or I didn't get it completely). > > The question is - what the following code should do? > > def f(): > ?a = 2 > ?class C: > ? ?exec 'a = 42' > ? ?abc = a > ?return C > > print f().abc > > (quick answer - on python2.5 it return 42, on python 2.6 and up it > returns 2, the patch changes > it to syntax error). > > I would say that returning 2 is the less obvious thing to do. The > reason why IMO this should > be a syntax error is this code: > > def f(): > ?a = 2 > ?def g(): > ? ?exec 'a = 42' > ? ?abc = a > > which throws syntax error. 
> > Cheers, > fijal > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 1 05:34:15 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Mar 2009 20:34:15 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D26BB1.8050108@hastings.org> References: <49D26BB1.8050108@hastings.org> Message-ID: Can you get Jim Fulton's feedback? ISTR he originated this. On Tue, Mar 31, 2009 at 12:14 PM, Larry Hastings wrote: > > The CObject API has two flaws. > > First, there is no usable type safety mechanism. ?You can store a void > *object, and a void *description. ?There is no established schema for > the description; it could be an integer cast to a pointer, or it could > point to memory of any configuration, or it could be NULL. ?Thus users > of the CObject API generally ignore it--thus working without any type > safety whatsoever. ?A programmer could crash the interpreter from pure > Python by mixing and matching CObjects from different modules (e.g. give > "curses" a CObject from "_ctypes"). > > Second, the destructor callback is defined as taking *either* one *or* > two parameters, depending on whether the "descr" pointer is non-NULL. One > can debate the finer points of what is and isn't defined behavior in > C, but at its heart this is a sloppy API. > > MvL and I discussed this last night and decided to float a revision of > the API. ?I wrote the patch, though, so don't blame Martin if you don't > like my specific approach. > > The color of this particular bike shed is: > * The PyCObject is now a private data structure; you must use accessors. > ?I added accessors for all the members. 
> * The constructors and the main accessor (PyCObject_AsVoidPtr) now all > ?*require* a "const char *type" parameter, which must be a non-NULL C > ?string of non-zero length. ?If you call that accessor and the "type" > ?is invalid *or doesn't match*, it fails. > * The destructor now takes the PyObject *, not the PyCObject *. ?You > ?must use accessors to get your hands on the data inside to free it. > > Yes, you can easily skip around the "matching type" restriction by > calling PyCObject_AsVoidPtr(cobj, PyCObject_GetType(cobj)). ?The point > of my API changes is to *encourage* correct use. > > I've posted a patch implementing this change in the 3.1 trunk to the > bug tracker: > > ? http://bugs.python.org/issue5630 > > I look forward to your comments! > > > /larry/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fijall at gmail.com Wed Apr 1 05:36:24 2009 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 1 Apr 2009 05:36:24 +0200 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> Message-ID: <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com> Because classes have now it's own local scope (according to Martin) It's not about exec in class, it's about exec in class in nested function. On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum wrote: > Well hold on for a minute, I remember we used to have an exec > statement in a class body in the standard library, to define some file > methods in socket.py IIRC. ?It's a totally different case than exec in > a nested function, and I don't believe it should be turned into a > syntax error at all. 
An exec in a class body is probably meant to > define some methods or other class attributes. I actually think the > 2.5 behavior is correct, and I don't know why it changed in 2.6. > > --Guido > > On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski wrote: >> So. The issue was closed and I suppose it was closed by not entirely >> understanding >> the problem (or I didn't get it completely). >> >> The question is - what the following code should do? >> >> def f(): >> ?a = 2 >> ?class C: >> ? ?exec 'a = 42' >> ? ?abc = a >> ?return C >> >> print f().abc >> >> (quick answer - on python2.5 it return 42, on python 2.6 and up it >> returns 2, the patch changes >> it to syntax error). >> >> I would say that returning 2 is the less obvious thing to do. The >> reason why IMO this should >> be a syntax error is this code: >> >> def f(): >> ?a = 2 >> ?def g(): >> ? ?exec 'a = 42' >> ? ?abc = a >> >> which throws syntax error. >> >> Cheers, >> fijal >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From guido at python.org Wed Apr 1 05:38:13 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Mar 2009 20:38:13 -0700 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com> Message-ID: OK that might change matters. Shame on you though for posting a patch without any explanation of the issue. On Tue, Mar 31, 2009 at 8:36 PM, Maciej Fijalkowski wrote: > Because classes have now it's own local scope (according to Martin) > > It's not about exec in class, it's about exec in class in nested function. 
> > On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum wrote: >> Well hold on for a minute, I remember we used to have an exec >> statement in a class body in the standard library, to define some file >> methods in socket.py IIRC. ?It's a totally different case than exec in >> a nested function, and I don't believe it should be turned into a >> syntax error at all. An exec in a class body is probably meant to >> define some methods or other class attributes. I actually think the >> 2.5 behavior is correct, and I don't know why it changed in 2.6. >> >> --Guido >> >> On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski wrote: >>> So. The issue was closed and I suppose it was closed by not entirely >>> understanding >>> the problem (or I didn't get it completely). >>> >>> The question is - what the following code should do? >>> >>> def f(): >>> ?a = 2 >>> ?class C: >>> ? ?exec 'a = 42' >>> ? ?abc = a >>> ?return C >>> >>> print f().abc >>> >>> (quick answer - on python2.5 it return 42, on python 2.6 and up it >>> returns 2, the patch changes >>> it to syntax error). >>> >>> I would say that returning 2 is the less obvious thing to do. The >>> reason why IMO this should >>> be a syntax error is this code: >>> >>> def f(): >>> ?a = 2 >>> ?def g(): >>> ? ?exec 'a = 42' >>> ? ?abc = a >>> >>> which throws syntax error. 
>>> >>> Cheers, >>> fijal >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> http://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>> >> >> >> >> -- >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fijall at gmail.com Wed Apr 1 06:16:30 2009 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 1 Apr 2009 06:16:30 +0200 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com> Message-ID: <693bc9ab0903312116o78d31684t8bafbb4b80587047@mail.gmail.com> Shame on me indeed. On Wed, Apr 1, 2009 at 5:38 AM, Guido van Rossum wrote: > OK that might change matters. Shame on you though for posting a patch > without any explanation of the issue. > > On Tue, Mar 31, 2009 at 8:36 PM, Maciej Fijalkowski wrote: >> Because classes have now it's own local scope (according to Martin) >> >> It's not about exec in class, it's about exec in class in nested function. >> >> On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum wrote: >>> Well hold on for a minute, I remember we used to have an exec >>> statement in a class body in the standard library, to define some file >>> methods in socket.py IIRC. ?It's a totally different case than exec in >>> a nested function, and I don't believe it should be turned into a >>> syntax error at all. An exec in a class body is probably meant to >>> define some methods or other class attributes. I actually think the >>> 2.5 behavior is correct, and I don't know why it changed in 2.6. >>> >>> --Guido >>> >>> On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski wrote: >>>> So. 
The issue was closed and I suppose it was closed by not entirely >>>> understanding >>>> the problem (or I didn't get it completely). >>>> >>>> The question is - what the following code should do? >>>> >>>> def f(): >>>> ?a = 2 >>>> ?class C: >>>> ? ?exec 'a = 42' >>>> ? ?abc = a >>>> ?return C >>>> >>>> print f().abc >>>> >>>> (quick answer - on python2.5 it return 42, on python 2.6 and up it >>>> returns 2, the patch changes >>>> it to syntax error). >>>> >>>> I would say that returning 2 is the less obvious thing to do. The >>>> reason why IMO this should >>>> be a syntax error is this code: >>>> >>>> def f(): >>>> ?a = 2 >>>> ?def g(): >>>> ? ?exec 'a = 42' >>>> ? ?abc = a >>>> >>>> which throws syntax error. >>>> >>>> Cheers, >>>> fijal >>>> _______________________________________________ >>>> Python-Dev mailing list >>>> Python-Dev at python.org >>>> http://mail.python.org/mailman/listinfo/python-dev >>>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>>> >>> >>> >>> >>> -- >>> --Guido van Rossum (home page: http://www.python.org/~guido/) >>> >> > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From rdmurray at bitdance.com Wed Apr 1 07:17:05 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 1 Apr 2009 01:17:05 -0400 (EDT) Subject: [Python-Dev] 3.1a2 In-Reply-To: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> Message-ID: On Tue, 31 Mar 2009 at 14:09, Benjamin Peterson wrote: > I haven't looked at #4847 in depth, but appears that the csv module > will need some API changes to deal with encodings. Perhaps somebody > would like to sprint on it? First we have to figure out what should be done. 
http://bugs.python.org/4847 Having read through the ticket, it seems that a CSV file must be (and 2.6 was) treated as a binary file, and part of the CSV module's job is to convert that binary data to and from strings. That is, the CSV module is at the same layer of the input stack as the TextIOWrapper. So IMO it should have an encoding parameter, and the defaults should be handled the same way they are for TextIOBase. _csv as indicated by the initial error report is in py3k expecting to read strings from the iterator passed to it, which IMO is wrong. It should be expecting bytes. The problem with this solution is that those people currently passing it string iterators would have to change their code. The documentation says "If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference." With the advent of unicode strings, it now makes a difference on all platforms. -- R. David Murray http://www.bitdance.com
From p.f.moore at gmail.com Wed Apr 1 10:57:34 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 1 Apr 2009 09:57:34 +0100 Subject: [Python-Dev] And the winner is... In-Reply-To: References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> <874oxaw95q.fsf@xemacs.org> Message-ID: <79990c6b0904010157p25ac7212v77e1b85947e364da@mail.gmail.com> 2009/4/1 Tres Seaver : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Stephen J. Turnbull wrote: > >> I also just wrote a long post about the comparison of bzr to hg >> responding to a comment on bazaar at canonical.com. ?I won't recap it >> here but it might be of interest. > > Thank you very much for your writeups on that thread: ?both in tone and > in content I found them extremely helpful. Agreed.
Paul From solipsis at pitrou.net Wed Apr 1 12:07:15 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 1 Apr 2009 10:07:15 +0000 (UTC) Subject: [Python-Dev] CSV, bytes and encodings References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> Message-ID: R. David Murray bitdance.com> writes: > > Having read through the ticket, it seems that a CSV file must be (and > 2.6 was) treated as a binary file, and part of the CSV module's job > is to convert that binary data to and from strings. IMO this interpretation is flawed. In 2.6 there is no tangible difference between "binary" and "text" files, except for newline handling. Also, as a matter of fact, if you want the 2.x CSV module to read a file with Windows line endings, you have to open the file in "rU" mode (that is, the closest we have to a moral equivalent of the 3.x text files). Therefore, I don't think 2.x is of any guidance to us for what 3.x should do. I see three possible practical cases that, ideally, the 3.x CSV module should be able to handle: 1. be handed a binary file (yielding bytes) without an encoding: in this case, the CSV module should return lists of bytes objects 2. be handed a text file (yielding str) without an encoding: in this case, the CSV module should return lists of str objects 3. be handed a binary file (yielding bytes) with an encoding: in this case, the CSV module should also return lists of str objects I think 2 and 3 both /should/ be supported (for 3, it's probably enough to wrap the binary file in a TextIOWrapper ;-)). 1 would be convenient too, but perhaps more work than it deserves (since it means the CSV module must be able to deal internally with two different datatypes: bytes and str). > The documentation says "If csvfile is a file object, it must be opened > with the ?b? flag on platforms where that makes a difference." The documentation is, IMO, wrong even in 2.x. 
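Case 3, for what it's worth, is only a few lines -- this is just the TextIOWrapper idea sketched out with made-up data, assuming a reader that accepts str input, not a proposed csv API:

```python
import csv
import io

# Case 3: a binary (bytes) file plus an explicit encoding. Wrapping the
# byte stream in a TextIOWrapper turns it into a str-yielding file;
# newline='' leaves newline handling to the csv module itself.
raw = io.BytesIO("a,b\r\n1,caf\u00e9\r\n".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")

rows = list(csv.reader(text))
print(rows)  # [['a', 'b'], ['1', 'café']]
```

Case 1 (bytes in, bytes out) has no such ready-made wrapper, which is exactly why it is the expensive one to support.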
Just yesterday I had to open a CSV file in 'rU' mode because it had Windows line endings and I'm under Linux.... Regards Antoine. From skip at pobox.com Wed Apr 1 12:37:38 2009 From: skip at pobox.com (skip at pobox.com) Date: Wed, 1 Apr 2009 05:37:38 -0500 Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> Message-ID: <18899.17394.455907.841425@montanaro.dyndns.org> >> Having read through the ticket, it seems that a CSV file must be (and >> 2.6 was) treated as a binary file, and part of the CSV module's job >> is to convert that binary data to and from strings. Antoine> IMO this interpretation is flawed. In 2.6 there is no tangible Antoine> difference between "binary" and "text" files, except for Antoine> newline handling. Also, as a matter of fact, if you want the Antoine> 2.x CSV module to read a file with Windows line endings, you Antoine> have to open the file in "rU" mode (that is, the closest we Antoine> have to a moral equivalent of the 3.x text files). The problem is that fields in CSV files, at least those produced by Excel, can contain embedded newlines. You are welcome to decide that *all* CRLF pairs should be translated to LF, but that is not the decision the original authors (mostly Andrew MacNamara) made. The contents of the fields was deemed to be separate from the newline convention, so the csv module needed to do its own newline processing, and thus required files to be opened in binary mode. This case arises rarely, but it does turn up every now and again. If you are comfortable with translating all CRLF pairs into LF, no matter if they are true end-of-line markers or embedded content, that's fine. (It certainly simplifies the implementation.) However, a) I would run it past the folks on csv at python.org first, and b) put a big fat note in the module docs about the transformation. 
Antoine> Therefore, I don't think 2.x is of any guidance to us for what Antoine> 3.x should do. I suspect we will disagree on this. I believe the behavior of the 2.x version of the module is easily defensible and should be a useful guide to how the 3.x version of the module behaves. >> The documentation says "If csvfile is a file object, it must be >> opened with the 'b' flag on platforms where that makes a difference." Antoine> The documentation is, IMO, wrong even in 2.x. Just yesterday I Antoine> had to open a CSV file in 'rU' mode because it had Windows line Antoine> endings and I'm under Linux.... See above. You almost certainly didn't have fields containing CRLF pairs or didn't care that while reading the file your data values were silently altered. Skip From ncoghlan at gmail.com Wed Apr 1 12:45:26 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Apr 2009 20:45:26 +1000 Subject: [Python-Dev] Broken import? In-Reply-To: <49D29D7B.7000002@canterbury.ac.nz> References: <49D20B54.1010108@gmail.com> <49D29D7B.7000002@canterbury.ac.nz> Message-ID: <49D345C6.2050507@gmail.com> Greg Ewing wrote: > Nick Coghlan wrote: > >> Jim Fulton's example in that tracker issue shows that with a bit of >> creativity you can provoke this behaviour *without* using a from-style >> import. Torsten Bronger later brought up the same issue that Fredrik did >> - it prevents some kinds of explicit relative import that look like they >> should be fine. > > I haven't been following this very closely, but if there's > something that's making absolute and relative imports > behave differently, I think it should be fixed. The only > difference between an absolute and relative import of the > same module should be the way you specify the module. That's exactly the problem though.
Because of the difference in the way the target module is specified, the way it is looked up is different: 'import a.b.c' will look in sys.modules for "a.b.c", succeed and work, even if "a.b.c" is in the process of being imported. 'from a.b import c' (or 'from . import c' in a subpackage of "a.b") will only look in sys.modules for "a.b", and then look on that object for a "c" attribute. The cached "a.b.c" module in sys.modules is ignored. It doesn't appear to be an impossible problem to solve, but it probably isn't going to be easy to fix in a backwards compatible way. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Wed Apr 1 12:53:19 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 1 Apr 2009 10:53:19 +0000 (UTC) Subject: [Python-Dev] CSV, bytes and encodings References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> Message-ID: skip at pobox.com writes: > > Antoine> The documentation is, IMO, wrong even in 2.x. Just yesterday I > Antoine> had to open a CSV file in 'rU' mode because it had Windows line > Antoine> endings and I'm under Linux.... > > See above. You almost certainly didn't have fields containing CRLF pairs or > didn't care that while reading the file your data values were silently > altered. Perhaps. But without using 'rU' the file couldn't be read at all. (I'm not sure it was Windows line endings by the way; perhaps Macintosh ones; anyway, it didn't work using 'rb') I have to add that if individual fields really can contain newlines, then the CSV module ought to be smarter when /saving/ those fields.
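[Editorial sketch, not part of the original mail: returning to Nick's import explanation above, the two lookup paths can be observed with a throwaway package in which a/b/c.py imports a/b/d.py, and d.py imports a.b.c back while c is still only half-initialised. The package layout and names here are invented purely for illustration.]

```python
import os
import sys
import tempfile

# Build the throwaway package: a/b/c.py triggers a/b/d.py, which imports
# a.b.c back while c.py is still executing.
root = tempfile.mkdtemp()

def write(relpath, body):
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(body)

write('a/__init__.py', '')
write('a/b/__init__.py', '')
write('a/b/c.py', 'import a.b.d\n')
write('a/b/d.py',
      'import sys\n'
      "import a.b.c  # plain 'import' succeeds: sys.modules has the entry\n"
      "in_cache = 'a.b.c' in sys.modules\n"
      "parent_has_attr = hasattr(sys.modules['a.b'], 'c')\n")

sys.path.insert(0, root)
import a.b.c  # starts the cycle
import a.b.d  # fully loaded by now; read what it observed mid-cycle

print(a.b.d.in_cache)         # True:  the cache already knew about a.b.c
print(a.b.d.parent_has_attr)  # False: attribute 'c' not yet bound on a.b
```

At the moment d.py runs, a 'from a.b import c' would have had to find 'c' as an attribute of a.b, which is exactly what is missing; modern CPython (3.5 and later) adds a fallback to sys.modules for this case.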
I've inadvertently tried to produce a CSV file with such fields and it ended up wrong when opened as a spreadsheet (text after the newlines was ignored in Gnumeric and in OpenOffice, while Excel displayed a spurious additional row containing only the text after the newline). Regards Antoine. From greg.ewing at canterbury.ac.nz Wed Apr 1 13:11:10 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 01 Apr 2009 23:11:10 +1200 Subject: [Python-Dev] Broken import? In-Reply-To: <49D345C6.2050507@gmail.com> References: <49D20B54.1010108@gmail.com> <49D29D7B.7000002@canterbury.ac.nz> <49D345C6.2050507@gmail.com> Message-ID: <49D34BCE.4050401@canterbury.ac.nz> Nick Coghlan wrote: > 'import a.b.c' will look in sys.modules for "a.b.c", succeed and work, > even if "a.b.c" is in the process of being imported. > > 'from a.b import c' (or 'from . import c' in a subpackage of "a.b") will > only look in sys.modules for "a.b", and then look on that object for a > "c" attribute. The cached "a.b.c' module in sys.modules is ignored. Hasn't 'from a.b import c' always been that way, though? Is the problem just that relative imports make it easier to run into this behaviour, or has something about the way imports work changed? -- Greg From ncoghlan at gmail.com Wed Apr 1 13:50:07 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Apr 2009 21:50:07 +1000 Subject: [Python-Dev] Broken import? In-Reply-To: <49D34BCE.4050401@canterbury.ac.nz> References: <49D20B54.1010108@gmail.com> <49D29D7B.7000002@canterbury.ac.nz> <49D345C6.2050507@gmail.com> <49D34BCE.4050401@canterbury.ac.nz> Message-ID: <49D354EF.1010300@gmail.com> Greg Ewing wrote: > Nick Coghlan wrote: > >> 'import a.b.c' will look in sys.modules for "a.b.c", succeed and work, >> even if "a.b.c" is in the process of being imported. >> >> 'from a.b import c' (or 'from . import c' in a subpackage of "a.b") will >> only look in sys.modules for "a.b", and then look on that object for a >> "c" attribute. 
The cached "a.b.c" module in sys.modules is ignored. > > Hasn't 'from a.b import c' always been that way, though? > Is the problem just that relative imports make it easier > to run into this behaviour, or has something about the > way imports work changed? The former - while a few things have obviously changed in this area due to PEP 328 and PEP 366, I don't believe any of that affected this aspect of the semantics (the issue I linked dates from 2004!). Instead, I'm pretty sure implicit relative imports use the 'import a.b.c' rules and hence work in situations where explicit relative imports now fail. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From chris at simplistix.co.uk Wed Apr 1 14:12:41 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Wed, 01 Apr 2009 13:12:41 +0100 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> Message-ID: <49D35A39.7020507@simplistix.co.uk> Guido van Rossum wrote: > Well hold on for a minute, I remember we used to have an exec > statement in a class body in the standard library, to define some file > methods in socket.py IIRC. But why an exec?! Surely there must be some other way to do this than an exec? Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From skip at pobox.com Wed Apr 1 14:51:28 2009 From: skip at pobox.com (skip at pobox.com) Date: Wed, 1 Apr 2009 07:51:28 -0500 Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> Message-ID: <18899.25424.820832.462451@montanaro.dyndns.org> Antoine> Perhaps. But without using 'rU' the file couldn't be read at Antoine> all.
(I'm not sure it was Windows line endings by the way; Antoine> perhaps Macintosh ones; anyway, it didn't work using 'rb') Please file a bug report and assign to me. Does it work in 2.x? What was the source of the file? Antoine> I have to add that if individual fields really can contain Antoine> newlines, then the CSV module ought to be smarter when /saving/ Antoine> those fields. I've inadvertently tried to produce a CSV file Antoine> with such fields and it ended up wrong when opened as a Antoine> spreadsheet (text after the newlines was ignored in Gnumeric Antoine> and in OpenOffice, while Excel displayed a spurious additional Antoine> row containing only the text after the newline). Sounds like you have a budding test case. Of course, the problem with CSV files is that there is no standard. In the above paragraph you named three. The CSV authors chose Excel's behavior as the measuring stick. Still, that's not written down anywhere. You have to read the tea leaves. Skip From rdmurray at bitdance.com Wed Apr 1 16:54:19 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 1 Apr 2009 10:54:19 -0400 (EDT) Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: <18899.17394.455907.841425@montanaro.dyndns.org> References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> Message-ID: On Wed, 1 Apr 2009 at 05:37, skip at pobox.com wrote: > This case arises rarely, but it does turn up every now and again. If you For some definition of "rarely". I don't handle CSV files generated by Windows very often, but I've run into it at least a couple times. That says to me that it isn't all that rare in the wild. (One out of fifty? But I'm sure it depends on your data sources; some people will run into it often, others almost never.)
Of course, on unix it doesn't help much having those newlines preserved, since there are few tools on unix other than the CSV module that even attempt to deal with newlines inside quoted strings being data, but on Windows it makes a difference. It would actually be nice if the CSV module had an option for turning those quoted newlines into spaces, but that's a feature request and is out of scope for this discussion :) > Antoine> The documentation is, IMO, wrong even in 2.x. Just yesterday I > Antoine> had to open a CSV file in 'rU' mode because it had Windows line > Antoine> endings and I'm under Linux.... That sounds like a bug, IMO. From the source code it looks like the 2.6 _csv module should be handling that, and certainly intended to handle it. --David From peck at spss.com Wed Apr 1 15:45:25 2009 From: peck at spss.com (Peck, Jon) Date: Wed, 1 Apr 2009 08:45:25 -0500 Subject: [Python-Dev] Python 2.6 64-bit Mac Release Message-ID: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com> Apparently the Mac Python 2.6.1 Installer image does not include 64-bit binaries. Is this going to change? Is there some technical reason why these are not included? We are hoping to support that in our next 64-bit release. Thanks for your help. Jon K. Peck SPSS Inc. peck at spss.com (ip) phone 312-651-3435 -------------- next part -------------- An HTML attachment was scrubbed... URL: From cto at whiz.com Wed Apr 1 18:01:01 2009 From: cto at whiz.com (Frank The Extruder Lomax) Date: April 1, 2009 12:01:01 EDT Subject: [Python-Dev] All Hail the FLUFL Message-ID: On behalf of the entire Python community and as CTO of Cheez Whiz Global Conglomerates, Inc. I would like to extend My Thanks to our BDEVIL's many Selfless and Dedicated years of service. I can say without remorse that the State of the Art in the Gas Propelled Cheezical Sciences would be 11 years behind schedule if it weren't for the BDEVIL and Python. 
However, as we at CWGC have been Contemplating an upgrade to our Pythonic Orange Oscillation Process for many years - we're still on Python 1.5.1 and due to our Centuries-Old corporate culture, only upgrade to multiples of two - we have been dismayed by recent Unfortunate Decisions negatively impacting our hope of world-wide long term acceptance of Python 3.0.2. Thus I applaud you for your chosen New Path, wish you a Speedy Shirpa Assisted Ascent, and I welcome our new FLUFL's similarly Lofty Ascent to Pythonic Overlordhood. You may not realize that the technology behind our groundbreaking Subsonic Plasticine Extrusion of Coagulated Flavor Oil utilize inequalities to a vast degree. In fact, our many Published Papers confirm our mathematical leadership in the Algebraic States of Less Than and More Than Simultaneity. Were it not for the Diamond Operator, billions, nay! trillions of crackers would have languished Unadorned, Unenjoyed, and Unloved. For this reason, the sole choice of the Evil Hash-Equal was enough to force us to Seriously Investigate a switch to Stenographic Non-deterministic SQL (a promising new scripting language somewhat similar to Ruby). It is with an Overboiling Pot of Joy that we fully support Official FLUFL Act 2. Trust me when I say that with this single reversion, the world of High Velocity Extremely High Pressure Milkyish Orange Goop Delivery Devices will never be the same. I am also Ecstatic at the reversal of DVCS decision. It is with no Small Irony that I admit the mere utterance of any derivative of the root word "Mercury" is a firing offence in my Establishment. In honor of our new FLUFL, I am directing our CFO Timmy "The Larch" Lomax (no direct relation to myself) to donate the sum of USA $23,250 to the PSU in furtherance of their mission. If there is a PyCon sponsorship level Above Diamond (may we suggest "Orange"?) we would be honored to claim that Pinnacle for 2010. 
Atlanta is located very near our Secret Manufacturing Facility and I would be remiss if I did not direct additional PyCon branded delivery of 5000 cans of our Premium Velvet Brand Cheez Whiz Lunchables with Detachable Shooters. I think the 2010 conference attendees will appreciate the diversion and hope this will entice people to join our Sprint next year. foolish-ly y'rs, frank From rdmurray at bitdance.com Wed Apr 1 17:00:06 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 1 Apr 2009 11:00:06 -0400 (EDT) Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <49D35A39.7020507@simplistix.co.uk> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> Message-ID: On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote: > Guido van Rossum wrote: >> Well hold on for a minute, I remember we used to have an exec >> statement in a class body in the standard library, to define some file >> methods in socket.py IIRC. > > But why an exec?! Surely there must be some other way to do this than an > exec? Maybe, but this sure is gnarly code: _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n" "%s.__doc__ = _realsocket.%s.__doc__\n") for _m in _socketmethods: exec _s % (_m, _m, _m, _m) del _m, _s Guido's memory is good, that's from the _socketobject class in socket.py. --David From jeremy at alum.mit.edu Wed Apr 1 17:20:32 2009 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 1 Apr 2009 11:20:32 -0400 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <693bc9ab0903312036w175b21b2ued5e5f7631e82123@mail.gmail.com> Message-ID: I posted in the bug report, but repeating here: I don't remember why exec in a nested function changed either. It would help if someone could summarize why we made the change. (I hope I didn't do it <0.2 wink>.) 
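[Editorial sketch, not part of the original mail: the forwarding that socket.py builds with exec can be written without exec at all, using a closure and setattr. The class and stand-in socket below are invented for the demonstration; the real code would also copy the docstrings across, which the forwarder could do via forwarder.__doc__.]

```python
_socketmethods = ('recv', 'send', 'close')

def _make_forwarder(name):
    # The closure captures the method name; no string formatting or exec.
    def forwarder(self, *args):
        return getattr(self._sock, name)(*args)
    forwarder.__name__ = name
    return forwarder

class _FakeRealSocket:
    # Stand-in for the real _socket.socket, just for the demonstration.
    def recv(self, n):
        return b'x' * n
    def send(self, data):
        return len(data)
    def close(self):
        return None

class _socketobject:
    def __init__(self, sock):
        self._sock = sock

for _m in _socketmethods:
    setattr(_socketobject, _m, _make_forwarder(_m))

s = _socketobject(_FakeRealSocket())
print(s.recv(3), s.send(b'hello'))  # b'xxx' 5
```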
Jeremy On Tue, Mar 31, 2009 at 11:36 PM, Maciej Fijalkowski wrote: > Because classes have now its own local scope (according to Martin) > > It's not about exec in class, it's about exec in class in nested function. > > On Wed, Apr 1, 2009 at 5:25 AM, Guido van Rossum wrote: >> Well hold on for a minute, I remember we used to have an exec >> statement in a class body in the standard library, to define some file >> methods in socket.py IIRC. It's a totally different case than exec in >> a nested function, and I don't believe it should be turned into a >> syntax error at all. An exec in a class body is probably meant to >> define some methods or other class attributes. I actually think the >> 2.5 behavior is correct, and I don't know why it changed in 2.6. >> >> --Guido >> >> On Tue, Mar 31, 2009 at 8:15 PM, Maciej Fijalkowski wrote: >>> So. The issue was closed and I suppose it was closed by not entirely >>> understanding >>> the problem (or I didn't get it completely). >>> >>> The question is - what should the following code do? >>> >>> def f(): >>>   a = 2 >>>   class C: >>>     exec 'a = 42' >>>     abc = a >>>   return C >>> >>> print f().abc >>> >>> (quick answer - on python 2.5 it returns 42, on python 2.6 and up it >>> returns 2, the patch changes >>> it to a syntax error). >>> >>> I would say that returning 2 is the less obvious thing to do. The >>> reason why IMO this should >>> be a syntax error is this code: >>> >>> def f(): >>>   a = 2 >>>   def g(): >>>     exec 'a = 42' >>>     abc = a >>> >>> which throws a syntax error.
>>> >>> Cheers, >>> fijal >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> http://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>> >> >> >> >> -- >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/jeremy%40alum.mit.edu > From brett at python.org Wed Apr 1 17:21:48 2009 From: brett at python.org (Brett Cannon) Date: Wed, 1 Apr 2009 08:21:48 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> Message-ID: On Tue, Mar 31, 2009 at 20:34, Guido van Rossum wrote: > Can you get Jim Fulton's feedback? ISTR he originated this. > I thought Neal started this idea? -Brett > > On Tue, Mar 31, 2009 at 12:14 PM, Larry Hastings > wrote: > > > > The CObject API has two flaws. > > > > First, there is no usable type safety mechanism. You can store a void > > *object, and a void *description. There is no established schema for > > the description; it could be an integer cast to a pointer, or it could > > point to memory of any configuration, or it could be NULL. Thus users > > of the CObject API generally ignore it--thus working without any type > > safety whatsoever. A programmer could crash the interpreter from pure > > Python by mixing and matching CObjects from different modules (e.g. give > > "curses" a CObject from "_ctypes"). > > > > Second, the destructor callback is defined as taking *either* one *or* > > two parameters, depending on whether the "descr" pointer is non-NULL. One > > can debate the finer points of what is and isn't defined behavior in > > C, but at its heart this is a sloppy API. 
> > > > MvL and I discussed this last night and decided to float a revision of > > the API. I wrote the patch, though, so don't blame Martin if you don't > > like my specific approach. > > > > The color of this particular bike shed is: > > * The PyCObject is now a private data structure; you must use accessors. > > I added accessors for all the members. > > * The constructors and the main accessor (PyCObject_AsVoidPtr) now all > > *require* a "const char *type" parameter, which must be a non-NULL C > > string of non-zero length. If you call that accessor and the "type" > > is invalid *or doesn't match*, it fails. > > * The destructor now takes the PyObject *, not the PyCObject *. You > > must use accessors to get your hands on the data inside to free it. > > > > Yes, you can easily skip around the "matching type" restriction by > > calling PyCObject_AsVoidPtr(cobj, PyCObject_GetType(cobj)). The point > > of my API changes is to *encourage* correct use. > > > > I've posted a patch implementing this change in the 3.1 trunk to the > > bug tracker: > > > > http://bugs.python.org/issue5630 > > > > I look forward to your comments! > > > > > > /larry/ > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > http://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: > > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/ > ) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kristjan at ccpgames.com Wed Apr 1 17:34:42 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 1 Apr 2009 15:34:42 +0000 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D26BB1.8050108@hastings.org> References: <49D26BB1.8050108@hastings.org> Message-ID: <930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local> What are the semantics of the "type" argument for PyCObject_FromVoidPtr()? -Does it do a strdup, or is the type required to be valid while the object exists (e.g. a static string)? -How is the type match determined, strcmp, or pointer comparison? -----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Larry Hastings Sent: 31. mars 2009 19:15 To: Python-Dev at python.org Subject: [Python-Dev] Let's update CObject API so it is safe and regular! * The constructors and the main accessor (PyCObject_AsVoidPtr) now all *require* a "const char *type" parameter, which must be a non-NULL C string of non-zero length. If you call that accessor and the "type" is invalid *or doesn't match*, it fails. From ronaldoussoren at mac.com Wed Apr 1 18:17:40 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 01 Apr 2009 11:17:40 -0500 Subject: [Python-Dev] Python 2.6 64-bit Mac Release In-Reply-To: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com> References: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com> Message-ID: On 1 Apr, 2009, at 8:45, Peck, Jon wrote: > Apparently the Mac Python 2.6.1 Installer image does not include 64- > bit binaries. Is this going to change? Is there some technical > reason why these are not included? We are hoping to support that in > our next 64-bit release. The 2.6 installer image does not include 64-bit binaries. 
As of this week the script that creates the installer can build an installer that does support 64-bit code as well, but that only works on Leopard systems. I'm thinking about how to distribute binaries that support 64-bit code without unduly complicating the world. The easiest option for me would be to have two installers: one 32-bit only that supports OSX 10.3.9 and later and a 4-way universal one that supports OSX Leopard and later. It might be possible to have a single installer that supports 64-bit code on Leopard but is usable on 10.3.9 as well, but I haven't checked yet how much that would complicate the build. Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2224 bytes Desc: not available URL: From fuzzyman at voidspace.org.uk Wed Apr 1 18:19:42 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 01 Apr 2009 11:19:42 -0500 Subject: [Python-Dev] Python 2.6 64-bit Mac Release In-Reply-To: References: <6CD9B6A6B6CCBA4FA497F07182F4EE83014B6EAB@MIAEMAILEVS1.spss.com> Message-ID: <49D3941E.103@voidspace.org.uk> Ronald Oussoren wrote: > > On 1 Apr, 2009, at 8:45, Peck, Jon wrote: > >> Apparently the Mac Python 2.6.1 Installer image does not include >> 64-bit binaries. Is this going to change? Is there some technical >> reason why these are not included? We are hoping to support that in >> our next 64-bit release. > > The 2.6 installer image does not include 64-bit binaries. As of this > week the script that creates the installer can build an installer that > does support 64-bit code as well, but that only works on Leopard systems. > > I'm thinking about how to distribute binaries that support 64-bit code > without unduly complicating the world. 
The easiest option for me would > be to have two installers: one 32-bit only that supports OSX 10.3.9 > and later and a 4-way universal one that supports OSX Leopard and > later. It might be possible to have a single installer that supports > 64-bit code on Leopard but is usable on 10.3.9 as well, but I haven't > checked yet how much that would complicate the build. > Two installers sounds OK to me, particularly if it simplifies the build process but allows us to still support 64bit. Michael > Ronald > > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ From jeremy at alum.mit.edu Wed Apr 1 18:21:03 2009 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 1 Apr 2009 12:21:03 -0400 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> Message-ID: Eeek, I think it was me. Part of the AST changes involved raising a SyntaxError when exec was used in a scope that had a free variable, since the behavior is pretty much undefined. If the compiler decides a variable is free, then it can't be assigned to in the function body. The compiled exec code can't know whether the variable is local or free in the exec context, only that it should generate a STORE_NAME opcode. The STORE_NAME can't possibly work. It looks like I did a bad job of documenting the change, though. I had forgotten about it, because it was three or four years ago. It looks like the same exception should be raised for the class statement. Jeremy On Wed, Apr 1, 2009 at 11:00 AM, R.
David Murray wrote: > On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote: >> >> Guido van Rossum wrote: >>> >>> Well hold on for a minute, I remember we used to have an exec >>> statement in a class body in the standard library, to define some file >>> methods in socket.py IIRC. >> >> But why an exec?! Surely there must be some other way to do this than an >> exec? > > Maybe, but this sure is gnarly code: > >    _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n" >          "%s.__doc__ = _realsocket.%s.__doc__\n") >    for _m in _socketmethods: >        exec _s % (_m, _m, _m, _m) >    del _m, _s > > Guido's memory is good, that's from the _socketobject class in > socket.py. > > --David > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/jeremy%40alum.mit.edu > From ron.duplain at gmail.com Wed Apr 1 18:50:48 2009 From: ron.duplain at gmail.com (Ron DuPlain) Date: Wed, 1 Apr 2009 11:50:48 -0500 Subject: [Python-Dev] 3to2 Project In-Reply-To: <1afaf6160903301929l4120abe5g96e2ca2fdb722896@mail.gmail.com> References: <4222a8490903300744t498e79daodea9cff32e4a94c1@mail.gmail.com> <43aa6ff70903301037y215d979he36246d36c987493@mail.gmail.com> <1afaf6160903301929l4120abe5g96e2ca2fdb722896@mail.gmail.com> Message-ID: <2b485bad0904010950h7c3f3275n1f03c4b2cf2dcc3e@mail.gmail.com> On Mon, Mar 30, 2009 at 9:29 PM, Benjamin Peterson wrote: > 2009/3/30 Collin Winter : >> If anyone is interested in working on this during the PyCon sprints or >> otherwise, here are some easy, concrete starter projects that would >> really help move this along: >> - The core refactoring engine needs to be broken out from 2to3. In >> particular, the tests/ and fixes/ need to get pulled up a directory, >> out of lib2to3/.
>> - Once that's done, lib2to3 should then be renamed to something like >> librefactor or something else that indicates its more general nature. >> This will allow both 2to3 and 3to2 to more easily share the core >> components. > > FWIW, I think it is unfortunately too late to make this change. We've > already released it as lib2to3 in the standard library and I have > actually seen it used in other projects. (PythonScope, for example.) > Paul Kippes and I have been sprinting on this. We put lib2to3 into a refactor package and kept a shell lib2to3 to support the old interface. We are able to run 2to3, 3to2, lib2to3 tests, and refactor tests. We only have a few simple 3to2 fixes now, but they should be easy to add. We kept the old lib2to3 tests to make sure we didn't break anything. As things settle down, I'd like to verify that our new lib2to3 is backward-compatible (since right now it points to the new refactor lib) with one of the external projects. We've been using hg to push changesets between each other, but we'll be committing to the svn sandbox before the week is out. I'm heading out today, but Paul is sticking around another day. It's a start, Ron > > -- > Regards, > Benjamin From fuzzyman at voidspace.org.uk Wed Apr 1 19:51:50 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 01 Apr 2009 12:51:50 -0500 Subject: [Python-Dev] Wing IDE and python.wpr Message-ID: <49D3A9B6.9020609@voidspace.org.uk> Hello all, How many are using the Wing IDE to work on core Python? It would be nice to have a 'python.wpr' checked in to trunk, as I have to recreate the project file every time I do a new checkout. Would this be useful for anyone else? Where is a good place for it to live? Littering the top level directory seems like a bad idea but I can't see anywhere else immediately *obvious* (no reason it has to live at the top level). 
Wing can be configured to use two files for the project - one file for the basic configuration (which would be checked in) and one for your personal settings (which files you have open, how many windows you are using etc) and would be svn-ignored. Michael Foord -- http://www.ironpythoninaction.com/ From larry at hastings.org Wed Apr 1 20:40:36 2009 From: larry at hastings.org (Larry Hastings) Date: Wed, 01 Apr 2009 11:40:36 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> Message-ID: <49D3B524.5090708@hastings.org> Brett Cannon wrote: > On Tue, Mar 31, 2009 at 20:34, Guido van Rossum > wrote: > > Can you get Jim Fulton's feedback? ISTR he originated this. > > > I thought Neal started this idea? The earliest revision spotted in "svn blame cobject.[ch]" is 5782: svn log -r 5782 ------------------------------------------------------------------------ r5782 | guido | 1996-01-11 16:44:03 -0800 (Thu, 11 Jan 1996) | 2 lines opaque C object a la Jim Fulton I'll email Jim Fulton and inquire. /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at voidspace.org.uk Wed Apr 1 20:44:40 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 01 Apr 2009 13:44:40 -0500 Subject: [Python-Dev] Wing IDE and python.wpr In-Reply-To: <49D3A9B6.9020609@voidspace.org.uk> References: <49D3A9B6.9020609@voidspace.org.uk> Message-ID: <49D3B618.2010203@voidspace.org.uk> Michael Foord wrote: > Hello all, > > How many are using the Wing IDE to work on core Python? > > It would be nice to have a 'python.wpr' checked in to trunk, as I have > to recreate the project file every time I do a new checkout. Would > this be useful for anyone else? Where is a good place for it to live? > Littering the top level directory seems like a bad idea but I can't > see anywhere else immediately *obvious* (no reason it has to live at > the top level). 
> > Wing can be configured to use two files for the project - one file for > the basic configuration (which would be checked in) and one for your > personal settings (which files you have open, how many windows you are > using etc) and would be svn-ignored. The Wing project file is now checked in. It is Misc/python-wing.wpr The project is configured with SVN integration enabled, with two file configuration and the wpu file SVN ignored plus the main project directory added. The wpr file is text so changes are diff friendly. There is an issue with the way the project is displayed - the Misc directory is the top-level with '..' showing as another directory in the project. This issue will be resolved in the next version of Wing. There are various other feature-requests now with Wing to better support using it for developing Python. Currently the debugger doesn't work with a newly built version of Python and the executable name / location is platform dependent and so setting a custom executable would only work on one platform. It would be easy to add custom tools to (for example) integrate regrtest or do the configure / make dance on a fresh checkout. All the best, Michael > > Michael Foord > -- http://www.ironpythoninaction.com/ From larry at hastings.org Wed Apr 1 20:58:00 2009 From: larry at hastings.org (Larry Hastings) Date: Wed, 01 Apr 2009 11:58:00 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local> References: <49D26BB1.8050108@hastings.org> <930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local> Message-ID: <49D3B938.5000202@hastings.org> Kristj?n Valur J?nsson wrote: > What are the semantics of the "type" argument for PyCObject_FromVoidPtr()? 
> From the patch, from the documentation comment above the prototype for PyCObject_FromVoidPtr() in Include/cobject.h: The "type" string must point to a legal C string of non-zero length, > -Does it do a strdup, or is the type required to be valid while the object exists (e.g. a static string)? > From the patch, continuing on from where we just left off: and this string must outlive the CObject. > -How is the type match determined, strcmp, or pointer comparison? From the patch, observing the code in the static function _is_legal_cobject_and_type() in Objects/cobject.c: if (!type || !*type) { PyErr_SetString(PyExc_TypeError, invalidType); return 0; } if (strcmp(type, self->type)) { PyErr_SetString(PyExc_TypeError, incorrectType); return 0; } A method for answering further such questions suggests itself, /larry// / From jim at zope.com Wed Apr 1 23:29:19 2009 From: jim at zope.com (Jim Fulton) Date: Wed, 1 Apr 2009 17:29:19 -0400 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D26BB1.8050108@hastings.org> References: <49D26BB1.8050108@hastings.org> Message-ID: On Mar 31, 2009, at 3:14 PM, Larry Hastings wrote: (Thanks for calling my attention to this. :) > > The CObject API has two flaws. > > First, there is no usable type safety mechanism. You can store a void > *object, and a void *description. There is no established schema for > the description; it could be an integer cast to a pointer, or it could > point to memory of any configuration, or it could be NULL. Thus users > of the CObject API generally ignore it--thus working without any type > safety whatsoever. A programmer could crash the interpreter from pure > Python by mixing and matching CObjects from different modules (e.g. > give > "curses" a CObject from "_ctypes"). The description field wasn't in the original CObject implementation that I was involved with many years ago. 
Looking at it now, I don't think it is intended as a type-safety mechanism at all, but as a way to pass data to the destructor. I don't know what motivated this. (I don't know why it's called "description". This name seems to be very confusing.) The only type-safety mechanism for a CObject is its identity. If you want to make sure you're using the foomodule api, make sure the address of the CObject is the same as the address of the api object exported by the module. The exporting module should automate use of the C API by providing an appropriate header file, as described in http://docs.python.org/extending/extending.html#providing-a-c-api-for-an-extension-module . > Second, the destructor callback is defined as taking *either* one *or* > two parameters, depending on whether the "descr" pointer is non- > NULL. One can debate the finer points of what is and isn't defined behavior in > C, but at its heart this is a sloppy API. It was necessary for backward compatibility. I don't know what motivated this, so I don't know if the benefit was worth the ugliness. > MvL and I discussed this last night and decided to float a revision of > the API. I wrote the patch, though, so don't blame Martin if you > don't > like my specific approach. > > The color of this particular bike shed is: > * The PyCObject is now a private data structure; you must use > accessors. > I added accessors for all the members. The original implementation didn't expose the structure. I don't know why it was exposed. It would be backward incompatible to hide it again now. > * The constructors and the main accessor (PyCObject_AsVoidPtr) now all > *require* a "const char *type" parameter, which must be a non-NULL C > string of non-zero length. If you call that accessor and the "type" > is invalid *or doesn't match*, it fails. That would break backward compatibility. Are you proposing this for Python 3? What would be the gain in this? The CObject is already a type identifier for itself.
In any case, client code generally doesn't mess with CObjects directly anyway. > * The destructor now takes the PyObject *, not the PyCObject *. You > must use accessors to get your hands on the data inside to free it. It currently isn't passed the CObject, but the C pointer that it holds. In any case, changing the API isn't practical, at least not for Python 2. > Yes, you can easily skip around the "matching type" restriction by > calling PyCObject_AsVoidPtr(cobj, PyCObject_GetType(cobj)). The point > of my API changes is to *encourage* correct use. > > I've posted a patch implementing this change in the 3.1 trunk to the > bug tracker: > > http://bugs.python.org/issue5630 > > I look forward to your comments! -1 I don't see that this gains anything. 1. All you're adding, afaict is a name for the API and the (address of the) CObject itself already provides this. 2. Only code provided by the module provider should be accessing the CObject exported by the module. Jim -- Jim Fulton Zope Corporation From david.christian at gmail.com Wed Apr 1 23:49:31 2009 From: david.christian at gmail.com (David Christian) Date: Wed, 1 Apr 2009 17:49:31 -0400 Subject: [Python-Dev] bdb.py trace C implementation? Message-ID: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> Hi all, I've recently written a C version of the trace function used in figleaf (the coverage tool written by Titus). After a few rewrites to add in caching, etc, it gives users a significant speedup. One person stated that switching to the C version caused coverage to decrease from a 442% slowdown to only a 56% slowdown. You can see my C implementation at: http://github.com/ctb/figleaf/blob/e077155956c288b68704b09889ebcd675ba02240/figleaf/_coverage.c (Specific comments about the implementation welcome off-list). I'd like to attempt something similar for bdb.py (only for the trace function). A naive C trace function which duplicated the current python function should speed up bdb significantly. 
I would initially write the smallest part of the C implementation that I could. Basically the tracing function would call back out to python at any point where a line requires action. I'd be willing to maintain the C implementation. I would be willing to write those tests that are possible as well. Is this something that would be likely to be accepted? Thanks, David Christian Senior Software Engineer rPath, Inc. From rdmurray at bitdance.com Thu Apr 2 00:22:26 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 1 Apr 2009 18:22:26 -0400 (EDT) Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> Message-ID: On Wed, 1 Apr 2009 at 10:53, Antoine Pitrou wrote: > Perhaps. But without using 'rU' the file couldn't be read at all. > (I'm not sure it was Windows line endings by the way; perhaps Macintosh ones; > anyway, it didn't work using 'rb') I just tested it in 2.6. It must have been old-mac (\r), which indeed gave me the error message you mentioned. Windows lineneds worked fine for me reading in binary mode on linux. > I have to add that if individual fields really can contain newlines, then the > CSV module ought to be smarter when /saving/ those fields. I've inadvertently > tried to produce a CSV file with such fields and it ended up wrong when opened > as a spreadsheet (text after the newlines was ignored in Gnumeric and in > OpenOffice, while Excel displayed a spurious additional row containing only the > text after the newline). I just added some tests to trunk that seem to indicate this case is handled correctly in terms of preserving the data. Maybe you didn't write the file such that the fields with the newlines were quoted? And of course how non-Excel applications handle that data on import can be different from how Excel handles it. 
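A minimal sketch of the quoting behavior discussed above (standard-library csv module; the sample field values are invented for illustration): with the default QUOTE_MINIMAL dialect, the writer quotes any field containing the line terminator, so an embedded newline survives a round trip instead of producing the spurious extra row described earlier.

```python
import csv
import io

# A field with an embedded newline must be quoted on output, or readers
# will see a spurious extra row -- the symptom described above.
buf = io.StringIO()
csv.writer(buf).writerow(["first", "line one\nline two", "last"])
print(repr(buf.getvalue()))  # 'first,"line one\nline two",last\r\n'

# Reading it back, the quoted field keeps its newline intact.
buf.seek(0)
rows = list(csv.reader(buf))
print(rows)  # [['first', 'line one\nline two', 'last']]
```

When reading and writing real files rather than StringIO, the csv documentation recommends opening them with newline='' so that no newline translation interferes with embedded newlines in quoted fields.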
--David From benjamin at python.org Thu Apr 2 00:25:57 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 1 Apr 2009 17:25:57 -0500 Subject: [Python-Dev] bdb.py trace C implementation? In-Reply-To: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> Message-ID: <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> 2009/4/1 David Christian : > Hi all, > I've recently written a C version of the trace function used in > figleaf (the coverage tool written by Titus). ?After a few rewrites to > add in caching, etc, it gives users a significant speedup. ?One person > stated that switching to the C version caused coverage to decrease > from a 442% slowdown to only a 56% slowdown. > > You can see my C implementation at: > ?http://github.com/ctb/figleaf/blob/e077155956c288b68704b09889ebcd675ba02240/figleaf/_coverage.c > > (Specific comments about the implementation welcome off-list). > > I'd like to attempt something similar for bdb.py (only for the trace > function). ?A naive C trace function which duplicated the current > python function should speed up bdb significantly. ?I would initially > write the smallest part of the C implementation that I could. > Basically the tracing function would call back out to python at any > point where a line requires action. > > I'd be willing to maintain the C implementation. ?I would be willing > to write those tests that are possible as well. > > Is this something that would be likely to be accepted? Generally debugging doesn't require good performance, so this is definitely low priority. However, if you can contribute it, I don't have a problem with it. -- Regards, Benjamin From rdmurray at bitdance.com Thu Apr 2 00:44:58 2009 From: rdmurray at bitdance.com (R. 
David Murray) Date: Wed, 1 Apr 2009 18:44:58 -0400 (EDT) Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> Message-ID: OK, Antoine, having merged my newline tests to py3k and having them work when lineend is set to '', as you suggested on the ticket, I'm inclined to agree with you that this is a doc bug. Skip? --David From guido at python.org Thu Apr 2 00:48:59 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Apr 2009 15:48:59 -0700 Subject: [Python-Dev] bdb.py trace C implementation? In-Reply-To: <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> Message-ID: On Wed, Apr 1, 2009 at 3:25 PM, Benjamin Peterson wrote: > 2009/4/1 David Christian : >> Hi all, >> I've recently written a C version of the trace function used in >> figleaf (the coverage tool written by Titus). ?After a few rewrites to >> add in caching, etc, it gives users a significant speedup. ?One person >> stated that switching to the C version caused coverage to decrease >> from a 442% slowdown to only a 56% slowdown. >> >> You can see my C implementation at: >> ?http://github.com/ctb/figleaf/blob/e077155956c288b68704b09889ebcd675ba02240/figleaf/_coverage.c >> >> (Specific comments about the implementation welcome off-list). >> >> I'd like to attempt something similar for bdb.py (only for the trace >> function). ?A naive C trace function which duplicated the current >> python function should speed up bdb significantly. ?I would initially >> write the smallest part of the C implementation that I could. >> Basically the tracing function would call back out to python at any >> point where a line requires action. >> >> I'd be willing to maintain the C implementation. 
I would be willing >> to write those tests that are possible as well. >> >> Is this something that would be likely to be accepted? > > Generally debugging doesn't require good performance, so this is > definitely low priority. However, if you can contribute it, I don't > have a problem with it. Tracing has other uses besides debugging though. In particular, coverage, which usually wants per-line data. Also, sometimes if you set a breakpoint in a function it turns on tracing for the entire function. This can sometimes be annoyingly slow. So, personally, I am more positive than that, and hope it will make it in. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From benjamin at python.org Thu Apr 2 00:53:30 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 1 Apr 2009 17:53:30 -0500 Subject: [Python-Dev] bdb.py trace C implementation? In-Reply-To: References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> Message-ID: <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com> 2009/4/1 Guido van Rossum : > Tracing has other uses besides debugging though. The OP said he wished to implement a C trace function for bdb.
> Wouldn't that make it only applicable to debugging? Once you are at the breakpoint and stepping through the code manually, the performance is not all that important. However, up until that breakpoint, you are running a lot of code "in bulk". It would be useful to have a performant trace function that interferes with that code the least. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From david.christian at gmail.com Thu Apr 2 01:07:32 2009 From: david.christian at gmail.com (David Christian) Date: Wed, 1 Apr 2009 19:07:32 -0400 Subject: [Python-Dev] bdb.py trace C implementation? In-Reply-To: <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com> References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com> Message-ID: <63940b00904011607y75a602acxea87905d9923f66e@mail.gmail.com> On Wed, Apr 1, 2009 at 6:53 PM, Benjamin Peterson wrote: > 2009/4/1 Guido van Rossum : >> Tracing has other uses besides debugging though. > > The OP said he wished to implement a C trace function for bdb. > Wouldn't that make it only applicable to debugging? > > Benjamin > I was suggesting a speedup for debugging. However, I could certainly also contribute my figleaf work that I referenced earlier, with a few tweaks, as a tracing replacement for the tracing function in trace.py. My concern with moving the coverage tracing code in particular to the standard library is that it tries to extract the maximum speed by being clever*, and certainly has not been out in the wild for long enough. I would write something much more conservative as a starting point for bdb.py. I expect that any C implementation that was thinking about performance at all would be much better than the status quo. 
* figleaf checks a regular expression to determine whether or not we wish to trace a particular file. If the file is not being traced, I switch to the profiler instead of the line tracer, which means that the trace function only gets called twice per function instead of once per line. This can give a large speedup when you are skipping the entire standard library, at some measurable cost per function call, and a cost in code complexity. --- David Christian Senior Software Engineer rPath, Inc From guido at python.org Thu Apr 2 01:14:25 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Apr 2009 16:14:25 -0700 Subject: [Python-Dev] bdb.py trace C implementation? In-Reply-To: <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com> References: <63940b00904011449o4f87917i7dcd83b1ca05da@mail.gmail.com> <1afaf6160904011525y28ab16efh87bc693aa41f703d@mail.gmail.com> <1afaf6160904011553r515a9e5agafbd0b3bf4e77f7f@mail.gmail.com> Message-ID: On Wed, Apr 1, 2009 at 3:53 PM, Benjamin Peterson wrote: > 2009/4/1 Guido van Rossum : >> Tracing has other uses besides debugging though. > > The OP said he wished to implement a C trace function for bdb. > Wouldn't that make it only applicable to debugging? I honestly don't recall, but I believe pretty much everyone who uses tracing does so via bdb.py. And yes, when debugging sometimes you have to silently skip 1000 iterations until a condition becomes true, and the tracing speed matters. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From larry at hastings.org Thu Apr 2 01:26:15 2009 From: larry at hastings.org (Larry Hastings) Date: Wed, 01 Apr 2009 16:26:15 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> Message-ID: <49D3F817.9080201@hastings.org> Jim Fulton wrote: > The only type-safety mechanism for a CObject is its identity.
If you > want to make sure you're using the foomodule api, make sure the > address of the CObject is the same as the address of the api object > exported by the module. That doesn't help. Here's a program that crashes the interpreter, something I shouldn't be able to do from pure Python: import _socket import cStringIO cStringIO.cStringIO_CAPI = _socket.CAPI import cPickle s = cPickle.dumps([1, 2, 3]) How can cPickle determine that cStringIO.cStringIO_CAPI is legitimate? > That would break backward compatibility. Are you proposing this for > Python 3? I'm proposing this for Python 3.1. My understanding is that breaking backwards compatibility is still on the table, which is why I wrote the patch the way I did. If we have to preserve the existing API, I still think we should add new APIs and deprecate the old ones. It's worth noting that there's been demand for this for a long time. Check out this comment from Include/datetime.h: #define PyDateTime_IMPORT \ PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_Import("datetime", \ "datetime_CAPI") /* This macro would be used if PyCObject_ImportEx() was created. #define PyDateTime_IMPORT \ PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_ImportEx("datetime", \ "datetime_CAPI", \ DATETIME_API_MAGIC) */ That was checked in by Tim Peters on 2004-06-20, r36214. (At least, in the py3k/trunk branch; I'd hope it would be the same revision number in other branches.) /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpe at wingware.com Thu Apr 2 01:29:20 2009 From: jpe at wingware.com (John Ehresman) Date: Wed, 01 Apr 2009 18:29:20 -0500 Subject: [Python-Dev] PyDict_SetItem hook Message-ID: <49D3F8D0.8070805@wingware.com> I've written a proof of concept patch to add a hook to PyDict_SetItem at http://bugs.python.org/issue5654 My motivation is to enable watchpoints in a python debugger that are called when an attribute or global changes. 
I know that this won't cover function locals and objects with slots (as Martin pointed out). We talked about this at the sprints and a few issues came up: * Is this worth it for debugger watchpoint support? This is a feature that probably wouldn't be used regularly but is extremely useful in some situations. * Would it be better to create a namespace dict subclass of dict, use it for modules, classes, & instances, and only allow watches of the subclass instances? * To what extent should non-debugger code use the hook? At one end of the spectrum, the hook could be made readily available for non-debug use and at the other end, it could be documented as being debug only, disabled in python -O, & not exposed in the stdlib to python code. John From chris at simplistix.co.uk Thu Apr 2 01:48:13 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 02 Apr 2009 00:48:13 +0100 Subject: [Python-Dev] Get the standard library to declare the versions it provides! In-Reply-To: References: <20090327204953.12555.1384799699.divmod.xquotient.6636@weber.divmod.com> <49CD43B7.3050904@v.loewis.de> <49CD4A4F.30900@trueblade.com> <878wmqxjaz.fsf@xemacs.org> <49CE2726.3050307@trueblade.com> Message-ID: <49D3FD3D.4020503@simplistix.co.uk> Fred Drake wrote: > Even simple cases present issues with regard to this. For example, I > work on a project that relies on the uuid module, so we declare a > dependency on Ka-Ping Ye's uuid module (since we're using Python 2.4). > How should we write that in a version-agnostic way if we want to use the > standard library version of that module with newer Pythons? 
Well, that could be done by getting standard library modules to: - declare what version they are - be overridable by installed packages That way, the fact that the standard library's development moves at the speed of frozen tar wouldn't stop packages in it being developed and released separately for people who want to use newer versions of them and aren't in a situation where they need "batteries included"... cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From guido at python.org Thu Apr 2 01:53:28 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Apr 2009 16:53:28 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D3F817.9080201@hastings.org> References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> Message-ID: 2009/4/1 Larry Hastings : > > Jim Fulton wrote: > > The only type-safety mechanism for a CObject is its identity. If you want > to make sure you're using the foomodule api, make sure the address of the > CObject is the same as the address of the api object exported by the module. > > That doesn't help. Here's a program that crashes the interpreter, something > I shouldn't be able to do from pure Python: > > import _socket > import cStringIO > cStringIO.cStringIO_CAPI = _socket.CAPI > > import cPickle > s = cPickle.dumps([1, 2, 3]) > > How can cPickle determine that cStringIO.cStringIO_CAPI is legitimate? This is a bug in cPickle. It calls the PycString_IMPORT macro at the very end of its init_stuff() function without checking for success. This macro calls PyCObject_Import("cStringIO", "cStringIO_CAPI") which in turn calls PyCObject_AsVoidPtr() on the object that it finds as cStringIO.cStringIO_CAPI, and this function *does* do a type check and sets an exception if the object isn't a PyCObject instance.
However cPickle's initialization doesn't check for errors immediately and apparently some later code overrides the exception. The fix should be simple: insert if (PyErr_Occurred()) return -1; immediately after the line PycString_IMPORT; in init_stuff() in cPickle.c. This will cause the import of cPickle to fail with an exception and all should be well. I have to say, I haven't understood this whole thread, but I'm skeptical about a redesign. But perhaps you can come up with an example that doesn't rely on this cPickle bug? --Guido > That would break backward compatibility. Are you proposing this for Python > 3? > > I'm proposing this for Python 3.1. My understanding is that breaking > backwards compatibility is still on the table, which is why I wrote the > patch the way I did. If we have to preserve the existing API, I still think > we should add new APIs and deprecate the old ones. > > It's worth noting that there's been demand for this for a long time. Check > out this comment from Include/datetime.h: > > #define PyDateTime_IMPORT \ > PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_Import("datetime", \ > "datetime_CAPI") > > /* This macro would be used if PyCObject_ImportEx() was created. > #define PyDateTime_IMPORT \ > PyDateTimeAPI = (PyDateTime_CAPI*) PyCObject_ImportEx("datetime", \ > "datetime_CAPI", \ > DATETIME_API_MAGIC) > */ > > That was checked in by Tim Peters on 2004-06-20, r36214. (At least, in the > py3k/trunk branch; I'd hope it would be the same revision number in other > branches.)
> > > /larry/ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Thu Apr 2 02:31:29 2009 From: collinw at gmail.com (Collin Winter) Date: Wed, 1 Apr 2009 17:31:29 -0700 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <49D3F8D0.8070805@wingware.com> References: <49D3F8D0.8070805@wingware.com> Message-ID: <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> On Wed, Apr 1, 2009 at 4:29 PM, John Ehresman wrote: > I've written a proof of concept patch to add a hook to PyDict_SetItem at > ?http://bugs.python.org/issue5654 ?My motivation is to enable watchpoints in > a python debugger that are called when an attribute or global changes. ?I > know that this won't cover function locals and objects with slots (as Martin > pointed out). > > We talked about this at the sprints and a few issues came up: > > * Is this worth it for debugger watchpoint support? ?This is a feature that > probably wouldn't be used regularly but is extremely useful in some > situations. > > * Would it be better to create a namespace dict subclass of dict, use it for > modules, classes, & instances, and only allow watches of the subclass > instances? > > * To what extent should non-debugger code use the hook? ?At one end of the > spectrum, the hook could be made readily available for non-debug use and at > the other end, it could be documented as being debug only, disabled in > python -O, & not exposed in the stdlib to python code. Have you measured the impact on performance? Collin From larry at hastings.org Thu Apr 2 02:39:34 2009 From: larry at hastings.org (Larry Hastings) Date: Wed, 01 Apr 2009 17:39:34 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! 
In-Reply-To: References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> Message-ID: <49D40946.1050100@hastings.org> Guido van Rossum wrote: > This is a bug in cPickle. It calls the PycString_IMPORT macro at the > very end of its init_stuff() function without checking for success. > The bug you cite is a genuine bug, but that's not what I'm exploiting. % python >>> import _socket >>> _socket.CAPI The PyCObject_Import() call in PycString_IMPORT doesn't return failure--it returns a valid CObject. I stuck the *wrong* CObject in cStringIO on purpose. With the current API there's no way for cPickle to tell that it's using the wrong one. For what it's worth, the previous example was for Python 2.x. (Python 3 doesn't have "cStringIO" or "cPickle".) Here's an example that crashes python in my py3k/trunk (sync'd Monday morning). And this one's only three lines: import unicodedata import _multibytecodec _multibytecodec.__create_codec(unicodedata.ucnhash_CAPI) /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ocean-city at m2.ccsnet.ne.jp Thu Apr 2 02:46:30 2009 From: ocean-city at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Thu, 02 Apr 2009 09:46:30 +0900 Subject: [Python-Dev] 3.1a2 In-Reply-To: <49D26ED4.7090205@m2.ccsnet.ne.jp> References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <49D26ED4.7090205@m2.ccsnet.ne.jp> Message-ID: <49D40AE6.9040408@m2.ccsnet.ne.jp> Hirokazu Yamamoto wrote: > > I added #5499 to release blocker because it needs specification > decision. (It's too strong?) Thank you for fixing this. I also added #5391: mmap: read_byte/write_byte and object type #5410: msvcrt bytes cleanup which depend on this issue. These are also API spec issue. #5410 is easy, but #5391 still needs decision which of getarg("c") or getarg("b") read_byte/write_byte should use. 
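For readers following #5391, the behavior in question is easy to probe from the interpreter itself; a minimal sketch using an anonymous mapping (so no backing file is needed), reflecting the convention modern Python 3 settled on, where read_byte() and write_byte() traffic in ints rather than length-1 bytes objects:

```python
import mmap

# Anonymous 16-byte mapping -- no backing file required.
m = mmap.mmap(-1, 16)
m.write_byte(65)        # write_byte() takes an int in Python 3
m.seek(0)
value = m.read_byte()   # read_byte() returns an int, not bytes
print(value)            # 65
m.close()
```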
From benjamin at python.org Thu Apr 2 03:17:24 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 1 Apr 2009 20:17:24 -0500 Subject: [Python-Dev] 3.1a2 In-Reply-To: <49D40AE6.9040408@m2.ccsnet.ne.jp> References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <49D26ED4.7090205@m2.ccsnet.ne.jp> <49D40AE6.9040408@m2.ccsnet.ne.jp> Message-ID: <1afaf6160904011817v1ca3d47ep1f7640a636a0c615@mail.gmail.com> 2009/4/1 Hirokazu Yamamoto : > > Hirokazu Yamamoto wrote: >> >> I added #5499 to release blocker because it needs specification decision. >> (It's too strong?) > > Thank you for fixing this. I also added > > #5391: mmap: read_byte/write_byte and object type > #5410: msvcrt bytes cleanup > > which depend on this issue. These are also API spec issue. > #5410 is easy, but #5391 still needs decision which of getarg("c") or > getarg("b") read_byte/write_byte should use. I'm afraid neither of these bugs are anywhere near my areas of expertise, so I'll leave resolution of them to the experts. :) -- Regards, Benjamin From lists at cheimes.de Thu Apr 2 03:23:41 2009 From: lists at cheimes.de (Christian Heimes) Date: Thu, 02 Apr 2009 03:23:41 +0200 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <49D3F8D0.8070805@wingware.com> References: <49D3F8D0.8070805@wingware.com> Message-ID: John Ehresman wrote: > * To what extent should non-debugger code use the hook? At one end of > the spectrum, the hook could be made readily available for non-debug use > and at the other end, it could be documented as being debug only, > disabled in python -O, & not exposed in the stdlib to python code. To explain Collin's mail: Python's dict implementation is crucial to the performance of any Python program. Modules, types, instances all rely on the speed of Python's dict type because most of them use a dict to store their name space. Even the smallest change to the C code may lead to a severe performance penalty. This is especially true for set and get operations. 
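To make the point about dict performance concrete, here is a rough sketch of the kind of micro-benchmark a change to PyDict_SetItem would have to survive; the absolute numbers are machine-dependent, so only the before/after ratio of a patched interpreter versus a stock one is meaningful.

```python
import timeit

# Dict store and lookup sit on the hot path of attribute and global
# access, so even a single extra "is a hook installed?" branch in
# PyDict_SetItem shows up in numbers like these.
set_time = timeit.timeit("d['key'] = 1", setup="d = {}", number=1_000_000)
get_time = timeit.timeit("d['key']", setup="d = {'key': 1}", number=1_000_000)
print(f"1M sets: {set_time:.3f}s  1M gets: {get_time:.3f}s")
```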
From python at rcn.com Thu Apr 2 03:37:34 2009 From: python at rcn.com (Raymond Hettinger) Date: Wed, 1 Apr 2009 18:37:34 -0700 Subject: [Python-Dev] PyDict_SetItem hook References: <49D3F8D0.8070805@wingware.com> Message-ID: <686ADDF37DF5413D93F43090806E0E5B@RaymondLaptop1> > John Ehresman wrote: >> * To what extent should non-debugger code use the hook? At one end of >> the spectrum, the hook could be made readily available for non-debug use >> and at the other end, it could be documented as being debug only, >> disabled in python -O, & not exposed in the stdlib to python code. > > To explain Collin's mail: > Python's dict implementation is crucial to the performance of any Python > program. Modules, types, instances all rely on the speed of Python's > dict type because most of them use a dict to store their name space. > Even the smallest change to the C code may lead to a severe performance > penalty. This is especially true for set and get operations. See my comments in http://bugs.python.org/issue5654 Raymond From guido at python.org Thu Apr 2 04:08:56 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Apr 2009 19:08:56 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D40946.1050100@hastings.org> References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> <49D40946.1050100@hastings.org> Message-ID: On Wed, Apr 1, 2009 at 5:39 PM, Larry Hastings wrote: > > Guido van Rossum wrote: > > This is a bug in cPickle. It calls the PycString_IMPORT macro at the > very end of its init_stuff() function without checking for success. > > > The bug you cite is a genuine bug, but that's not what I'm exploiting. > > % python >>>> import _socket >>>> _socket.CAPI > > > The PyCObject_Import() call in PycString_IMPORT doesn't return failure--it > returns a valid CObject.? I stuck the *wrong* CObject in cStringIO on > purpose.? 
With the current API there's no way for cPickle to tell that it's > using the wrong one. Ouch. So true. > For what it's worth, the previous example was for Python 2.x.? (Python 3 > doesn't have "cStringIO" or "cPickle".)? Here's an example that crashes > python in my py3k/trunk (sync'd Monday morning).? And this one's only three > lines: > > import unicodedata > import _multibytecodec > _multibytecodec.__create_codec(unicodedata.ucnhash_CAPI) Yeah, any two CAPI objects can be used to play this trick, as long as you have some place that calls them. :-( So what's your solution? If it was me I'd change the API to put the full module name and variable name of the object inside the object and have the IMPORT call check that. Then you can only have crashes if some extension module cheats, and surely there are many other ways that C extensions can cheat, so that doesn't bother me. :) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jpe at wingware.com Thu Apr 2 04:16:51 2009 From: jpe at wingware.com (John Ehresman) Date: Wed, 01 Apr 2009 21:16:51 -0500 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> Message-ID: <49D42013.3010600@wingware.com> Collin Winter wrote: > Have you measured the impact on performance? I've tried to test using pystone, but am seeing more differences between runs than there is between python w/ the patch and w/o when there is no hook installed. The highest pystone is actually from the binary w/ the patch, which I don't really believe unless it's some low level code generation affect. The cost is one test of a global variable and then a switch to the branch that doesn't call the hooks. I'd be happy to try to come up with better numbers next week after I get home from pycon. 
John From larry at hastings.org Thu Apr 2 04:58:30 2009 From: larry at hastings.org (Larry Hastings) Date: Wed, 01 Apr 2009 19:58:30 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> <49D40946.1050100@hastings.org> Message-ID: <49D429D6.90006@hastings.org> Guido van Rossum wrote: > Yeah, any two CAPI objects can be used to play this trick, as long as > you have some place that calls them. :-( FWIW, I can't take credit for this observation. Neal Norwitz threw me at this class of problem at the Py3k sprints in August 2007 at Google Mountain View, specifically with curses, though the approach he suggested then was removing the CObjects. Then, Monday night MvL and I re-established the problem based on my dim memories. > So what's your solution? If it was me I'd change the API to put the > full module name and variable name of the object inside the object and > have the IMPORT call check that. Then you can only have crashes if > some extension module cheats, and surely there are many other ways > that C extensions can cheat, so that doesn't bother me. :) My proposed API requires that the creator of the CObject pass in a "type" string, which must be of nonzero length, and the caller must pass in a matching string. I figured that was easy to get right and sufficient for "consenting adults". Note also this cheap exported-vtable hack isn't the only use of CObjects; for example _ctypes uses them to wrap plenty of one-off objects which are never set as attributes of the _ctypes module. We'd like a solution that enforces some safety for those too, without creating spurious module attributes. /larry// / From dalcinl at gmail.com Thu Apr 2 05:36:32 2009 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 2 Apr 2009 00:36:32 -0300 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! 
In-Reply-To: <49D429D6.90006@hastings.org> References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> <49D40946.1050100@hastings.org> <49D429D6.90006@hastings.org> Message-ID: On Wed, Apr 1, 2009 at 11:58 PM, Larry Hastings wrote: > > Guido van Rossum wrote: >> >> Yeah, any two CAPI objects can be used to play this trick, as long as >> you have some place that calls them. :-( > > FWIW, I can't take credit for this observation. ?Neal Norwitz threw me at > this class of problem at the Py3k sprints in August 2007 at Google Mountain > View, specifically with curses, though the approach he suggested then was > removing the CObjects. > IMHO, removing them would be a really bad idea... PyCObject's are the documented recommended way to make ext modules export its API's, and that works pretty well in practice, and more well now with your approach. > >> So what's your solution? If it was me I'd change the API to put the >> full module name and variable name of the object inside the object and >> have the IMPORT call check that. Then you can only have crashes if >> some extension module cheats, and surely there are many other ways >> that C extensions can cheat, so that doesn't bother me. :) > > My proposed API requires that the creator of the CObject pass in a "type" > string, which must be of nonzero length, and the caller must pass in a > matching string. ?I figured that was easy to get right and sufficient for > "consenting adults". Just for reference, I'll comment how Cython uses this. First, Cython exports API in a function-by-function basis (instead of a single pointer to a C struct with function pointers, as e.g. cStringIO, or an array of func pointers, as e.g. NumPy). All these are cached in a "private" module global (a dict) named "__pyx_api__". See the link below, for example: http://mpi4py.scipy.org/docs/api/mpi4py.MPI-module.html#__pyx_capi__ So the dict keys are the exported function names. 
Moreover, the PyCObject's "desc" is a C string with the function signature. Cython retrieves a function by name from the dict and checks that the expected signature matches. BTW, now I believe Cython should also use the function name for the "descr" :-) The only issue with this approach for Cython is that PyCObject currently stores "void*" (i.e., pointers to data), but does not have room for "void(*)(void)" (i.e. pointers to functions, aka code). Recently I had to write some hackery using type-punning with unions to avoid the illegal conversion problem between pointers to data and functions. Larry, I did not understand your comments in the tracker about this. Why do you see the above approach as a misuse of the API? All this works extremely well in practice... A Cython-implemented extension module can export its API, and next you can consume it from Cython, and moreover from a hand-written C extension (and then you can easily write SWIG typemaps). And as the functions are exported one by one, you can even add stuff to some module API, and the consumers will not notice the change (with API tables implemented as a pointer to a C struct or an array of function pointers, you need to be more careful to keep the exported API backward compatible) -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From guido at python.org Thu Apr 2 05:51:55 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Apr 2009 20:51:55 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <49D429D6.90006@hastings.org> References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> <49D40946.1050100@hastings.org> <49D429D6.90006@hastings.org> Message-ID: On Wed, Apr 1, 2009 at 7:58 PM, Larry Hastings wrote: > Guido van Rossum wrote: >> Yeah, any two CAPI objects can be used to play this trick, as long as >> you have some place that calls them. :-( > > FWIW, I can't take credit for this observation. ?Neal Norwitz threw me at > this class of problem at the Py3k sprints in August 2007 at Google Mountain > View, specifically with curses, though the approach he suggested then was > removing the CObjects. ?Then, Monday night MvL and I re-established the > problem based on my dim memories. > >> So what's your solution? If it was me I'd change the API to put the >> full module name and variable name of the object inside the object and >> have the IMPORT call check that. Then you can only have crashes if >> some extension module cheats, and surely there are many other ways >> that C extensions can cheat, so that doesn't bother me. :) > > My proposed API requires that the creator of the CObject pass in a "type" > string, which must be of nonzero length, and the caller must pass in a > matching string. ?I figured that was easy to get right and sufficient for > "consenting adults". OK, my proposal would be to agree on the value of this string too: "module.variable". > Note also this cheap exported-vtable hack isn't the > only use of CObjects; for example _ctypes uses them to wrap plenty of > one-off objects which are never set as attributes of the _ctypes module. > ?We'd like a solution that enforces some safety for those too, without > creating spurious module attributes. Why would you care about safety for ctypes? It's about as unsafe as it gets anyway. Coredump emptor I say. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Thu Apr 2 06:35:22 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 1 Apr 2009 21:35:22 -0700 Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: <18899.25424.820832.462451@montanaro.dyndns.org> References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> <18899.25424.820832.462451@montanaro.dyndns.org> Message-ID: <20090402043522.GA21023@panix.com> On Wed, Apr 01, 2009, skip at pobox.com wrote: > > Antoine> Perhaps. But without using 'rU' the file couldn't be read at > Antoine> all. (I'm not sure it was Windows line endings by the way; > Antoine> perhaps Macintosh ones; anyway, it didn't work using 'rb') > > Please file a bug report and assign to me. Does it work in 2.x? What was > the source of the file? Perhaps there have been changes, but in my last job, I was running into this problem with Python 2.3, and I also needed to open with 'rU' under Linux. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. Kernighan From solipsis at pitrou.net Thu Apr 2 07:23:22 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 02 Apr 2009 07:23:22 +0200 Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> Message-ID: <1238649802.6033.5.camel@fsol> Le mercredi 01 avril 2009 ? 18:22 -0400, R. David Murray a ?crit : > I just added some tests to trunk that seem to indicate this case is > handled correctly in terms of preserving the data. Maybe you didn't > write the file such that the fields with the newlines were quoted? 
I used the default csv.writer into a StringIO, and the whole was then returned as the response of an HTTP request (with the proper Content-Type and Content-Disposition headers). I assume quoting is enabled by default? > And of course how non-Excel applications handle that data on import > can be different from how Excel handles it. Of course, but when three major spreadsheet software (including Excel itself) choke on the embedded newline, there might be a problem (or not :)). (please note that as for Excel I couldn't test myself, a client of mine did) Regards Antoine. From rdmurray at bitdance.com Thu Apr 2 07:27:05 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 2 Apr 2009 01:27:05 -0400 (EDT) Subject: [Python-Dev] CSV, bytes and encodings In-Reply-To: <1238649802.6033.5.camel@fsol> References: <1afaf6160903311209w43623e04mb35f15883b1d2560@mail.gmail.com> <18899.17394.455907.841425@montanaro.dyndns.org> <1238649802.6033.5.camel@fsol> Message-ID: On Thu, 2 Apr 2009 at 07:23, Antoine Pitrou wrote: > Le mercredi 01 avril 2009 ?? 18:22 -0400, R. David Murray a ??crit : >> I just added some tests to trunk that seem to indicate this case is >> handled correctly in terms of preserving the data. Maybe you didn't >> write the file such that the fields with the newlines were quoted? > > I used the default csv.writer into a StringIO, and the whole was then > returned as the response of an HTTP request (with the proper > Content-Type and Content-Disposition headers). I assume quoting is > enabled by default? Yes, it is. The files I've encountered that had embedded newlines I never tried to open in Excel or any other spreadsheet, so all _I'm_ sure of is that Excel produces them. >> And of course how non-Excel applications handle that data on import >> can be different from how Excel handles it. > > Of course, but when three major spreadsheet software (including Excel > itself) choke on the embedded newline, there might be a problem (or > not :)). 
> (please note that as for Excel I couldn't test myself, a client of mine > did) I've made a note to test this, out of curiosity, when I get home. --David From ajaksu at gmail.com Thu Apr 2 10:51:59 2009 From: ajaksu at gmail.com (Daniel (ajax) Diniz) Date: Thu, 2 Apr 2009 05:51:59 -0300 Subject: [Python-Dev] Left the GSoC-mentors list Message-ID: <2d75d7660904020151h7eabc461ged408e986f3cc34c@mail.gmail.com> Hi, I've just left the soc2009-mentors list on request, as I'm not a mentor. So if you need my input on the mentor side regarding ideas I've contributed to [1] (struct, socket, core helper tools or Roundup), please CC me. Best regards, Daniel [1] http://wiki.python.org/moin/SummerOfCode/2009/Incoming From kristjan at ccpgames.com Thu Apr 2 12:02:39 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Thu, 2 Apr 2009 10:02:39 +0000 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D3B938.5000202@hastings.org> References: <49D26BB1.8050108@hastings.org> <930F189C8A437347B80DF2C156F7EC7F056D52762C@exchis.ccp.ad.local> <49D3B938.5000202@hastings.org> Message-ID: <930F189C8A437347B80DF2C156F7EC7F056D52773A@exchis.ccp.ad.local> Thanks Larry. I didn't notice the patch, or indeed the defect, hence my question. A clarification in the documentation that a string comparison is indeed used might be useful. As a user of CObject I appreciate this effort. K -----Original Message----- From: Larry Hastings [mailto:larry at hastings.org] A method for answering further such questions suggests itself, From greg.ewing at canterbury.ac.nz Thu Apr 2 13:28:34 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 02 Apr 2009 23:28:34 +1200 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> Message-ID: <49D4A162.2020209@canterbury.ac.nz> Jim Fulton wrote: > The only type-safety mechanism for a CObject is it's identity. 
If you > want to make sure you're using the foomodule api, make sure the address > of the CObject is the same as the address of the api object exported by > the module. I don't follow that. If you already have the address of the thing you want to use, you don't need a CObject. > 2. Only code provided by the module provider should be accessing the > CObject exported by the module. Not following that either. Without attaching some kind of metadata to a CObject, I don't see how you can know whether a CObject passed to you from Python code is one that you created yourself, or by some other unrelated piece of code. Attaching some kind of type info to a CObject and having an easy way of checking it makes sense to me. If the existing CObject API can't be changed, maybe a new enhanced one could be added. -- Greg From gjcarneiro at gmail.com Thu Apr 2 14:25:54 2009 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Thu, 2 Apr 2009 13:25:54 +0100 Subject: [Python-Dev] OSError.errno => exception hierarchy? Message-ID: Apologies if this has already been discussed. I was expecting that by now, python 3.0, the following code: # clean the target dir import errno try: shutil.rmtree(trace_output_path) except OSError, ex: if ex.errno not in [errno.ENOENT]: raise Would have become something simpler, like this: # clean the target dir try: shutil.rmtree(trace_output_path) except OSErrorNoEntry: # or maybe os.ErrorNoEntry pass Apparently no one has bothered yet to turn OSError + errno into a hierarchy of OSError subclasses, as it should. What's the problem, no will to do it, or no manpower? Regards, -- Gustavo J. A. M. Carneiro INESC Porto, Telecommunications and Multimedia Unit "The universe is always one step beyond logic." -- Frank Herbert -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hrvoje.niksic at avl.com Thu Apr 2 14:42:44 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Thu, 02 Apr 2009 14:42:44 +0200 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001> References: <49D26BB1.8050108@hastings.org> <10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001> Message-ID: <49D4B2C4.4060107@avl.com> Greg Ewing wrote: > Attaching some kind of type info to a CObject and having > an easy way of checking it makes sense to me. If the > existing CObject API can't be changed, maybe a new > enhanced one could be added. I thought the entire *point* of C object was that it's an opaque box without any info whatsoever, except that which is known and shared by its creator and its consumer. If we're adding type information, then please make it a Python object rather than a C string. That way the creator and the consumer can use a richer API to query the "type", such as by calling its methods or by inspecting it in some other way. Instead of comparing strings with strcmp, it could use PyObject_RichCompareBool, which would allow a much more flexible way to define "types". Using a PyObject also ensures that the lifecycle of the attached "type" is managed by the well-understood reference-counting mechanism. From jim at zope.com Thu Apr 2 15:16:29 2009 From: jim at zope.com (Jim Fulton) Date: Thu, 2 Apr 2009 09:16:29 -0400 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> <49D40946.1050100@hastings.org> <49D429D6.90006@hastings.org> Message-ID: <64D315D7-D01E-4F8C-90C1-879D8B89EB8E@zope.com> On Apr 1, 2009, at 11:51 PM, Guido van Rossum wrote: ... 
>> Note also this cheap exported-vtable hack isn't the >> only use of CObjects; for example _ctypes uses them to wrap plenty of >> one-off objects which are never set as attributes of the _ctypes >> module. >> We'd like a solution that enforces some safety for those too, >> without >> creating spurious module attributes. > > Why would you care about safety for ctypes? It's about as unsafe as it > gets anyway. Coredump emptor I say. At which point, I wonder why we worry so much about someone intentionally breaking a CObject as in Larry's example. Jim -- Jim Fulton Zope Corporation From jim at zope.com Thu Apr 2 15:22:44 2009 From: jim at zope.com (Jim Fulton) Date: Thu, 2 Apr 2009 09:22:44 -0400 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D4A162.2020209@canterbury.ac.nz> References: <49D26BB1.8050108@hastings.org> <49D4A162.2020209@canterbury.ac.nz> Message-ID: <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> On Apr 2, 2009, at 7:28 AM, Greg Ewing wrote: > Jim Fulton wrote: > >> The only type-safety mechanism for a CObject is it's identity. If >> you want to make sure you're using the foomodule api, make sure >> the address of the CObject is the same as the address of the api >> object exported by the module. > > I don't follow that. If you already have the address of the > thing you want to use, you don't need a CObject. I was refering to the identity of the CObject itself. >> 2. Only code provided by the module provider should be accessing >> the CObject exported by the module. > > Not following that either. Without attaching some kind of > metadata to a CObject, I don't see how you can know whether > a CObject passed to you from Python code is one that you > created yourself, or by some other unrelated piece of > code. The original use case for CObjects was to export an API from a module, in which case, you'd be importing the API from the module. The presence in the module indicates the type. 
Of course, this doesn't account for someone intentionally replacing the module's CObject with a fake. > Attaching some kind of type info to a CObject and having > an easy way of checking it makes sense to me. If the > existing CObject API can't be changed, maybe a new > enhanced one could be added. I don't think backward compatibility needs to be a consideration for Python 3 at this point. I don't see much advantage in the proposal, but I can live with it for Python 3. Jim -- Jim Fulton Zope Corporation From kristjan at ccpgames.com Thu Apr 2 15:36:37 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Thu, 2 Apr 2009 13:36:37 +0000 Subject: [Python-Dev] py3k regression tests on Windows Message-ID: <930F189C8A437347B80DF2C156F7EC7F056DD0B8E0@exchis.ccp.ad.local> Hello there. Yesterday I created a number of defects for regression test failures on Windows: http://bugs.python.org/issue5646 : test_importlib fails for py3k on Windows http://bugs.python.org/issue5645 : test_memoryio fails for py3k on windows http://bugs.python.org/issue5643 : test__locale fails with RADIXCHAR on Windows Does anyone feel like taking a look? K -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Apr 2 17:32:02 2009 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 02 Apr 2009 10:32:02 -0500 Subject: [Python-Dev] PEP 382: Namespace Packages Message-ID: <49D4DA72.60401@v.loewis.de> I propose the following PEP for inclusion to Python 3.1. Please comment. Regards, Martin Abstract ======== Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the packages __path__ must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. 
Terminology
===========

Within this PEP, the term package refers to Python packages as defined
by Python's import statement. The term distribution refers to separately
installable sets of Python modules as stored in the Python package
index, and installed by distutils or setuptools. The term vendor package
refers to groups of files installed by an operating system's packaging
mechanism (e.g. Debian or Redhat packages install on Linux systems). The
term portion refers to a set of files in a single directory (possibly
stored in a zip file) that contribute to a namespace package.

Namespace packages today
========================

Python currently provides pkgutil.extend_path to denote a package as a
namespace package. The recommended way of using it is to put::

  from pkgutil import extend_path
  __path__ = extend_path(__path__, __name__)

in the package's ``__init__.py``. Every distribution needs to provide
the same contents in its ``__init__.py``, so that extend_path is invoked
independent of which portion of the package gets imported first. As a
consequence, the package's ``__init__.py`` cannot practically define any
names, since the order of the package fragments on sys.path determines
which portion is imported first. As a special feature, extend_path reads
files named ``*.pkg`` which allow declaring additional portions.

setuptools provides a similar function, pkg_resources.declare_namespace,
that is used in the form::

  import pkg_resources
  pkg_resources.declare_namespace(__name__)

In the portion's __init__.py, no assignment to __path__ is necessary, as
declare_namespace modifies the package __path__ through sys.modules. As
a special feature, declare_namespace also supports zip files, and
registers the package name internally so that future additions to
sys.path by setuptools can properly add additional portions to each
package.
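For concreteness, the extend_path recipe above can be exercised end-to-end with two portions created in scratch directories (the package name nspkg and the modules foo and bar are invented for the illustration, not part of the PEP):

```python
# Two portions of the same namespace package, installed in two separate
# directories, both placed on sys.path before the first import.
import os
import sys
import tempfile
import textwrap

boiler = textwrap.dedent("""\
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)
""")

root = tempfile.mkdtemp()
for portion, module in (("portion1", "foo"), ("portion2", "bar")):
    pkgdir = os.path.join(root, portion, "nspkg")
    os.makedirs(pkgdir)
    with open(os.path.join(pkgdir, "__init__.py"), "w") as f:
        f.write(boiler)          # every portion carries the same boilerplate
    with open(os.path.join(pkgdir, module + ".py"), "w") as f:
        f.write("name = %r\n" % module)
    sys.path.insert(0, os.path.join(root, portion))

# Whichever portion is found first supplies __init__.py; extend_path
# then appends the other portion's directory to nspkg.__path__.
import nspkg.foo
import nspkg.bar
```

This also demonstrates the drawback the PEP notes: both portions must ship an identical `__init__.py`, so neither can define names of its own.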
setuptools allows declaring namespace packages in a distribution's
setup.py, so that distribution developers don't need to put the magic
__path__ modification into __init__.py themselves.

Rationale
=========

The current imperative approach to namespace packages has led to
multiple slightly-incompatible mechanisms for providing namespace
packages. For example, pkgutil supports ``*.pkg`` files; setuptools
doesn't. Likewise, setuptools supports inspecting zip files, and
supports adding portions to its _namespace_packages variable, whereas
pkgutil doesn't.

In addition, the current approach causes problems for system vendors.
Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk will
fail or cause unpredictable behavior. As vendors might choose to package
distributions such that they all end up in a single directory for the
namespace package, all portions would contribute conflicting __init__.py
files.

Specification
=============

Rather than using an imperative mechanism for importing packages, a
declarative approach is proposed here, as an extension to the existing
``*.pkg`` mechanism.

The import statement is extended so that it directly considers ``*.pkg``
files during import; a directory is considered a package if it either
contains a file named __init__.py, or a file whose name ends with
".pkg".

In addition, the format of the ``*.pkg`` file is extended: a line with
the single character ``*`` indicates that the entire sys.path will be
searched for portions of the namespace package at the time the namespace
package is imported.

Importing a package will immediately compute the package's __path__; the
``*.pkg`` files are no longer considered after the initial import.
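A toy model of the declarative lookup specified above (purely illustrative — these import changes are a proposal, not an existing CPython feature, and `compute_path` and the directory names are invented for the sketch):

```python
# Toy model of the proposed *.pkg resolution: given a package name and a
# search path, compute the __path__ the PEP describes.  Not a real
# import hook -- the PEP's mechanism was never added to CPython.
import os
import tempfile


def compute_path(pkgname, search_path):
    """Return the __path__ the proposal would compute for pkgname."""
    path = []
    is_namespace = False
    for entry in search_path:
        pkgdir = os.path.join(entry, pkgname)
        if not os.path.isdir(pkgdir):
            continue
        has_init = os.path.isfile(os.path.join(pkgdir, "__init__.py"))
        pkg_files = [f for f in os.listdir(pkgdir) if f.endswith(".pkg")]
        if not (has_init or pkg_files):
            continue                 # not a portion of this package
        path.append(pkgdir)
        for name in pkg_files:
            # A line holding a single '*' marks a namespace package.
            with open(os.path.join(pkgdir, name)) as f:
                if any(line.strip() == "*" for line in f):
                    is_namespace = True
    if is_namespace:
        path.insert(0, "*")          # at most one asterisk is prepended
    return path


# Scratch layout: two portions of "nspkg", one declared via nspkg.pkg.
root = tempfile.mkdtemp()
for site, marker in (("site1", "nspkg.pkg"), ("site2", "__init__.py")):
    pkgdir = os.path.join(root, site, "nspkg")
    os.makedirs(pkgdir)
    with open(os.path.join(pkgdir, marker), "w") as f:
        f.write("*\n" if marker.endswith(".pkg") else "")

result = compute_path("nspkg", [os.path.join(root, s) for s in ("site1", "site2")])
```

Note how the site1 portion needs no `__init__.py` at all: the presence of `nspkg.pkg` alone makes the directory a package portion.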
If a ``*.pkg`` file contains an asterisk, this asterisk is prepended to
the package's __path__ to indicate that the package is a namespace
package (and that thus further extensions to sys.path might also want to
extend __path__). At most one such asterisk gets prepended to the path.

extend_path will be extended to recognize namespace packages according
to this PEP, and avoid adding directories twice to __path__.

No other change to the importing mechanism is made; the search for
modules (including __init__.py) will continue to stop at the first
module encountered.

Discussion
==========

With the addition of ``*.pkg`` files to the import mechanism, namespace
packages can stop filling out the namespace package's __init__.py. As a
consequence, extend_path and declare_namespace become obsolete.

It is recommended that distributions put a file .pkg into their
namespace packages, with a single asterisk. This allows vendor packages
to install multiple portions of a namespace package into a single
directory, with no risk of overlapping files.

Namespace packages can start providing non-trivial __init__.py
implementations; to do so, it is recommended that a single distribution
provides a portion with just the namespace package's __init__.py (and
potentially other modules that belong to the namespace package proper).

The mechanism is mostly compatible with the existing namespace
mechanisms. extend_path will be adjusted to this specification; any
other mechanism might cause portions to get added twice to __path__.

Copyright
=========

This document has been placed in the public domain.

From pje at telecommunity.com Thu Apr 2 19:14:42 2009 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 02 Apr 2009 13:14:42 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D4DA72.60401@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> Message-ID: <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com> At 10:32 AM 4/2/2009 -0500, Martin v.
Löwis wrote: >I propose the following PEP for inclusion to Python 3.1. >Please comment. An excellent idea. One thing I am not 100% clear on, is how to get additions to sys.path to work correctly with this. Currently, when pkg_resources adds a new egg to sys.path, it uses its existing registry of namespace packages in order to locate which packages need __path__ fixups. It seems under this proposal that it would have to scan sys.modules for objects with __path__ attributes that are lists that begin with a '*', instead... which is a bit troubling because sys.modules doesn't always only contain module objects. Many major frameworks place lazy module objects, and module proxies or wrappers of various sorts in there, so scanning through it arbitrarily is not really a good idea. Perhaps we could add something like a sys.namespace_packages that would be updated by this mechanism? Then, pkg_resources could check both that and its internal registry to be both backward and forward compatible. Apart from that, this mechanism sounds great! I only wish there was a way to backport it all the way to 2.3 so I could drop the messy bits from setuptools. From guido at python.org Thu Apr 2 19:19:17 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Apr 2009 10:19:17 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> References: <49D26BB1.8050108@hastings.org> <49D4A162.2020209@canterbury.ac.nz> <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> Message-ID: On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton wrote: > The original use case for CObjects was to export an API from a module, in > which case, you'd be importing the API from the module. I consider this the *only* use case. What other use cases are there? > The presence in the > module indicates the type. Of course, this doesn't account for someone > intentionally replacing the module's CObject with a fake. And that's the problem.
I would like the following to hold: given a finite number of extension modules that I trust to be safe (i.e. excluding ctypes!), pure Python code should not be able to cause any of their CObjects to be passed off for another. Putting an identity string in the CObject and checking that string in PyCObject_Import() solves this. Adding actual information about what the CObject *means* is emphatically out of scope. Once a CObject is identified as having the correct module and name, I am okay with trusting it, because Python code has no way to create CObjects. I have to trust the extension that exports the CObject anyway, since after all it is C code that could do anything at all. But I need to be able to trust that the app cannot swap CObjects. >> Attaching some kind of type info to a CObject and having >> an easy way of checking it makes sense to me. If the >> existing CObject API can't be changed, maybe a new >> enhanced one could be added. > > I don't think backward compatibility needs to be a consideration for Python > 3 at this point. ?I don't see much advantage in the proposal, but I can live > with it for Python 3. Good. Let's solve this for 3.1, and figure out whether or how to backport later, since for 2.6 (and probably 2.7) binary backwards compatibility is most important. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From solipsis at pitrou.net Thu Apr 2 19:24:04 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 2 Apr 2009 17:24:04 +0000 (UTC) Subject: [Python-Dev] Let's update CObject API so it is safe and regular! References: <49D26BB1.8050108@hastings.org> <49D4A162.2020209@canterbury.ac.nz> <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> Message-ID: Guido van Rossum python.org> writes: > > On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton zope.com> wrote: > > The original use case for CObjects was to export an API from a module, in > > which case, you'd be importing the API from the module. > > I consider this the *only* use case. 
What other use cases are there? I don't know if it is good style, but I could imagine it being used to accumulate non-PyObject data in a Python container (e.g. a list), without too much overhead. It is used in getargs.c to manage a list of "destructors" of temporarily created data for when a call to PyArg_Parse* fails. From guido at python.org Thu Apr 2 19:53:40 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Apr 2009 10:53:40 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> <49D4A162.2020209@canterbury.ac.nz> <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> Message-ID: On Thu, Apr 2, 2009 at 10:24 AM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> >> On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton zope.com> wrote: >> > The original use case for CObjects was to export an API from a module, in >> > which case, you'd be importing the API from the module. >> >> I consider this the *only* use case. What other use cases are there? > > I don't know if it is good style, but I could imagine it being used to > accumulate non-PyObject data in a Python container (e.g. a list), without too > much overhead. > > It is used in getargs.c to manage a list of "destructors" of temporarily created > data for when a call to PyArg_Parse* fails. Well, that sounds like it really just needs to manage a variable-length array of void pointers, and using PyList and PyCObject is just laziness (and perhaps the wrong kind -- I imagine I could write the same code without using Python objects and it would be cleaner *and* faster). So no, I don't consider that a valid use case, or at least not one we need to consider for backwards compatibility of the PyCObject design. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Thu Apr 2 20:41:07 2009 From: thomas at python.org (Thomas Wouters) Date: Thu, 2 Apr 2009 20:41:07 +0200 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <49D42013.3010600@wingware.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> Message-ID: <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> On Thu, Apr 2, 2009 at 04:16, John Ehresman wrote: > Collin Winter wrote: > >> Have you measured the impact on performance? >> > > I've tried to test using pystone, but am seeing more differences between > runs than there is between python w/ the patch and w/o when there is no hook > installed. The highest pystone is actually from the binary w/ the patch, > which I don't really believe unless it's some low level code generation > affect. The cost is one test of a global variable and then a switch to the > branch that doesn't call the hooks. > > I'd be happy to try to come up with better numbers next week after I get > home from pycon. > Pystone is pretty much a useless benchmark. If it measures anything, it's the speed of the bytecode dispatcher (and it doesn't measure it particularly well.) PyBench isn't any better, in my experience. Collin has collected a set of reasonable benchmarks for Unladen Swallow, but they still leave a lot to be desired. From the discussions at the VM and Language summits before PyCon, I don't think anyone else has better benchmarks, though, so I would suggest using Unladen Swallow's: http://code.google.com/p/unladen-swallow/wiki/Benchmarks -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ron.duplain at gmail.com Thu Apr 2 20:44:46 2009 From: ron.duplain at gmail.com (Ron DuPlain) Date: Thu, 2 Apr 2009 14:44:46 -0400 Subject: [Python-Dev] 3to2 Project In-Reply-To: <2b485bad0904010950h7c3f3275n1f03c4b2cf2dcc3e@mail.gmail.com> References: <4222a8490903300744t498e79daodea9cff32e4a94c1@mail.gmail.com> <43aa6ff70903301037y215d979he36246d36c987493@mail.gmail.com> <1afaf6160903301929l4120abe5g96e2ca2fdb722896@mail.gmail.com> <2b485bad0904010950h7c3f3275n1f03c4b2cf2dcc3e@mail.gmail.com> Message-ID: <2b485bad0904021144r614d468av45c26529019a56e3@mail.gmail.com> On Wed, Apr 1, 2009 at 12:50 PM, Ron DuPlain wrote: > On Mon, Mar 30, 2009 at 9:29 PM, Benjamin Peterson wrote: >> 2009/3/30 Collin Winter : >>> If anyone is interested in working on this during the PyCon sprints or >>> otherwise, here are some easy, concrete starter projects that would >>> really help move this along: >>> - The core refactoring engine needs to be broken out from 2to3. In >>> particular, the tests/ and fixes/ need to get pulled up a directory, >>> out of lib2to3/. >>> - Once that's done, lib2to3 should then be renamed to something like >>> librefactor or something else that indicates its more general nature. >>> This will allow both 2to3 and 3to2 to more easily share the core >>> components. >> >> FWIW, I think it is unfortunately too late to make this change. We've >> already released it as lib2to3 in the standard library and I have >> actually seen it used in other projects. (PythonScope, for example.) >> > > Paul Kippes and I have been sprinting on this. We put lib2to3 into a > refactor package and kept a shell lib2to3 to support the old > interface. > > We are able to run 2to3, 3to2, lib2to3 tests, and refactor tests. We > only have a few simple 3to2 fixes now, but they should be easy to add. > We kept the old lib2to3 tests to make sure we didn't break anything.
> As things settle down, I'd like to verify that our new lib2to3 is > backward-compatible (since right now it points to the new refactor > lib) with one of the external projects. > > We've been using hg to push changesets between each other, but we'll > be committing to the svn sandbox before the week is out. ?I'm heading > out today, but Paul is sticking around another day. > > It's a start, > > Ron > See sandbox/trunk/refactor_pkg. More fixers to come... -Ron From python at rcn.com Thu Apr 2 20:58:18 2009 From: python at rcn.com (Raymond Hettinger) Date: Thu, 2 Apr 2009 11:58:18 -0700 Subject: [Python-Dev] PyDict_SetItem hook References: <49D3F8D0.8070805@wingware.com><43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com><49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> Message-ID: <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> The measurements are just a distractor. We all already know that the hook is being added to a critical path. Everyone will pay a cost for a feature that few people will use. This is a really bad idea. It is not part of a thorough, thought-out framework of container hooks (something that would need a PEP at the very least). The case for how it helps us is somewhat thin. The case for DTrace hooks was much stronger. If something does go in, it should be #ifdef'd out by default. But then, I don't think it should go in at all. Raymond On Thu, Apr 2, 2009 at 04:16, John Ehresman wrote: Collin Winter wrote: Have you measured the impact on performance? I've tried to test using pystone, but am seeing more differences between runs than there is between python w/ the patch and w/o when there is no hook installed. The highest pystone is actually from the binary w/ the patch, which I don't really believe unless it's some low level code generation affect. The cost is one test of a global variable and then a switch to the branch that doesn't call the hooks. 
I'd be happy to try to come up with better numbers next week after I get home from pycon. Pystone is pretty much a useless benchmark. If it measures anything, it's the speed of the bytecode dispatcher (and it doesn't measure it particularly well.) PyBench isn't any better, in my experience. Collin has collected a set of reasonable benchmarks for Unladen Swallow, but they still leave a lot to be desired. From the discussions at the VM and Language summits before PyCon, I don't think anyone else has better benchmarks, though, so I would suggest using Unladen Swallow's: http://code.google.com/p/unladen-swallow/wiki/Benchmarks -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Apr 2 21:22:51 2009 From: larry at hastings.org (Larry Hastings) Date: Thu, 02 Apr 2009 12:22:51 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: References: <49D26BB1.8050108@hastings.org> <49D4A162.2020209@canterbury.ac.nz> <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> Message-ID: <49D5108B.3070706@hastings.org> Guido van Rossum wrote: > On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton wrote: > >> The original use case for CObjects was to export an API from a module, in >> which case, you'd be importing the API from the module. >> > I consider this the *only* use case. What other use cases are there? Exporting a C/C++ data structure: http://wiki.cacr.caltech.edu/danse/index.php/Lots_more_details_on_writing_wrappers http://www.cacr.caltech.edu/projects/ARCS/array_kluge/array_klugemodule/html/misc_8h.html http://svn.xiph.org/trunk/vorbisfile-python/vorbisfile.c Some folks don't register a proper type; they just wrap their objects in CObjects and add module methods. 
The "obscure" method in the "Robin" package ( http://code.google.com/p/robin/ ) curiously wraps a *Python* object in a CObject: http://code.google.com/p/robin/source/browse/trunk/src/robin/frontends/python/module.cc I must admit I don't understand why this is a good idea. There are many more wild & wooly use cases to be found if you Google for "PyCObject_FromVoidPtr". Using CObject to export C APIs seems to be the minority, outside the CPython sources anyway. /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Apr 2 21:26:13 2009 From: larry at hastings.org (Larry Hastings) Date: Thu, 02 Apr 2009 12:26:13 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D4B2C4.4060107@avl.com> References: <49D26BB1.8050108@hastings.org> <10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001> <49D4B2C4.4060107@avl.com> Message-ID: <49D51155.3030606@hastings.org> Hrvoje Niksic wrote: > If we're adding type information, then please make it a Python object > rather than a C string. That way the creator and the consumer can use > a richer API to query the "type", such as by calling its methods or by > inspecting it in some other way. I'm not writing my patch that way; it would be too cumbersome for what is ostensibly an easy, light-weight API. If you're going that route you might as well create a real PyTypeObject for the blob you're passing in. But please feel free to contribute your own competing patch; you may start with my patch if you like. YAGNI, /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Thu Apr 2 21:28:48 2009 From: larry at hastings.org (Larry Hastings) Date: Thu, 02 Apr 2009 12:28:48 -0700 Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: References: <49D26BB1.8050108@hastings.org> <49D3F817.9080201@hastings.org> <49D40946.1050100@hastings.org> <49D429D6.90006@hastings.org> Message-ID: <49D511F0.1040104@hastings.org> Guido van Rossum wrote: > OK, my proposal would be to agree on the value of this string too: > "module.variable". > That's a fine idea for cases where the CObject is stored as an attribute of a module; my next update of my patch will change the existing uses to use that format. > Why would you care about safety for ctypes? It's about as unsafe as it > gets anyway. Coredump emptor I say. _ctypes and exporting C APIs are not the only use cases of CObjects in the wild. Please see, uh, that email I wrote like five minutes ago, also a reply to you. /larry/ From chris at simplistix.co.uk Thu Apr 2 22:03:34 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 02 Apr 2009 21:03:34 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D4DA72.60401@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> Message-ID: <49D51A16.70804@simplistix.co.uk> Martin v. Löwis wrote: > I propose the following PEP for inclusion to Python 3.1. > Please comment. Would this support the following case: I have a package called mortar, which defines useful stuff: from mortar import content, ... I now want to distribute large optional chunks separately, but ideally so that the following will work: from mortar.rbd import ... from mortar.zodb import ... from mortar.wsgi import ... Does the PEP support this? The only way I can currently think to do this would result in: from mortar import content,.. from mortar_rbd import ... from mortar_zodb import ... from mortar_wsgi import ... ...which looks a bit unsightly to me.
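(For illustration: the layout Chris is asking about can already be approximated today with the stdlib pkgutil.extend_path idiom that PEP 382 aims to replace. The sketch below builds his hypothetical mortar package on the fly in two temporary sys.path entries; everything here is invented for the demo, not code from the thread.)

```python
import os
import sys
import tempfile

# Each "distribution" gets its own sys.path entry containing a mortar/
# package whose __init__.py calls pkgutil.extend_path, so every portion
# ends up on one shared __path__.
INIT = ("from pkgutil import extend_path\n"
        "__path__ = extend_path(__path__, __name__)\n")

def make_portion(root, modname, body):
    # Create <root>/mortar/<modname>.py plus the extend_path __init__.py.
    pkg = os.path.join(root, "mortar")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg, modname + ".py"), "w") as f:
        f.write(body)

base = tempfile.mkdtemp()
dist1 = os.path.join(base, "dist1")  # ships mortar.content
dist2 = os.path.join(base, "dist2")  # ships mortar.rbd separately
make_portion(dist1, "content", "kind = 'core'\n")
make_portion(dist2, "rbd", "kind = 'addon'\n")

sys.path[:0] = [dist1, dist2]
from mortar import content, rbd  # both portions importable as mortar.*
print(content.kind, rbd.kind)  # -> core addon
```

The catch, and part of the motivation for the PEP, is that every portion must ship its own boilerplate __init__.py, and the portions' __init__ modules shadow each other.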
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Thu Apr 2 22:03:49 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 02 Apr 2009 21:03:49 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com> Message-ID: <49D51A25.10400@simplistix.co.uk> P.J. Eby wrote: > Apart from that, this mechanism sounds great! I only wish there was a > way to backport it all the way to 2.3 so I could drop the messy bits > from setuptools. Maybe we could? :-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From benjamin at python.org Thu Apr 2 22:25:09 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 2 Apr 2009 15:25:09 -0500 Subject: [Python-Dev] OSError.errno => exception hierarchy? In-Reply-To: References: Message-ID: <1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com> 2009/4/2 Gustavo Carneiro : > Apologies if this has already been discussed. I don't believe it has ever been discussed to be implemented. > Apparently no one has bothered yet to turn OSError + errno into a hierarchy > of OSError subclasses, as it should. What's the problem, no will to do it, > or no manpower? Python doesn't need any more builtin exceptions to clutter the namespace. Besides, what's wrong with just checking the errno? -- Regards, Benjamin From mal at egenix.com Thu Apr 2 22:33:25 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 02 Apr 2009 22:33:25 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D4DA72.60401@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> Message-ID: <49D52115.6020001@egenix.com> On 2009-04-02 17:32, Martin v. Löwis wrote: > I propose the following PEP for inclusion to Python 3.1. Thanks for picking this up.
I'd like to extend the proposal to Python 2.7 and later. > Please comment. > > Regards, > Martin > > Specification > ============= > > Rather than using an imperative mechanism for importing packages, a > declarative approach is proposed here, as an extension to the existing > ``*.pkg`` mechanism. > > The import statement is extended so that it directly considers ``*.pkg`` > files during import; a directory is considered a package if it either > contains a file named __init__.py, or a file whose name ends with > ".pkg". That's going to slow down Python package detection a lot - you'd replace an O(1) test with an O(n) scan. Alternative Approach: --------------------- Wouldn't it be better to stick with a simpler approach and look for "__pkg__.py" files to detect namespace packages using that O(1) check ? This would also avoid any issues you'd otherwise run into if you want to maintain this scheme in an importer that doesn't have access to a list of files in a package directory, but is well capable of checking for the existence of a file. Mechanism: ---------- If the import mechanism finds a matching namespace package (a directory with a __pkg__.py file), it then goes into namespace package scan mode and scans the complete sys.path for more occurrences of the same namespace package. The import loads all __pkg__.py files of matching namespace packages having the same package name during the search. One of the namespace packages, the defining namespace package, will have to include a __init__.py file. After having scanned all matching namespace packages and loading the __pkg__.py files in the order of the search, the import mechanism then sets the package's .__path__ attribute to include all namespace package directories found on sys.path and finally executes the __init__.py file. (Please let me know if the above is not clear, I will then try to follow up on it.)
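(For illustration: the scan just described can be modelled in a few lines of pure Python. This is only a sketch of the proposed mechanism, not real import machinery; the __pkg__.py name is as proposed above.)

```python
import os

def scan_namespace_package(name, search_path):
    """Model of the proposed scan: collect every <entry>/<name> directory
    on the search path that contains __pkg__.py; the portion that also
    has __init__.py is the 'defining' namespace package."""
    portions = []
    defining = None
    for entry in search_path:
        pkg_dir = os.path.join(entry, name)
        # O(1) per path entry: test one file, no directory listing needed
        if os.path.isfile(os.path.join(pkg_dir, "__pkg__.py")):
            portions.append(pkg_dir)
            if os.path.isfile(os.path.join(pkg_dir, "__init__.py")):
                defining = pkg_dir
    # portions would become the package's __path__, and the defining
    # portion's __init__.py would be executed last
    return defining, portions
```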
Discussion: ----------- The above mechanism allows the same kind of flexibility we already have with the existing normal __init__.py mechanism. * It doesn't add yet another .pth-style sys.path extension (which are difficult to manage in installations). * It always uses the same naive sys.path search strategy. The strategy is not determined by some file contents. * The search is only done once - on the first import of the package. * It's possible to have a defining package dir and add-on package dirs. * Namespace packages are easy to recognize by testing for a single resource. * Namespace __pkg__.py modules can provide extra meta-information, logging, etc. to simplify debugging namespace package setups. * It's possible to freeze such setups, to put them into ZIP files, or only have parts of it in a ZIP file and the other parts in the file-system. Caveats: * Changes to sys.path will not result in an automatic rescan for additional namespace packages, if the package was already loaded. However, we could have a function to make such a rescan explicit. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 02 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jackdied at gmail.com Thu Apr 2 22:35:41 2009 From: jackdied at gmail.com (Jack diederich) Date: Thu, 2 Apr 2009 16:35:41 -0400 Subject: [Python-Dev] OSError.errno => exception hierarchy?
In-Reply-To: <1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com> References: <1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com> Message-ID: On Thu, Apr 2, 2009 at 4:25 PM, Benjamin Peterson wrote: > 2009/4/2 Gustavo Carneiro : >> Apologies if this has already been discussed. > > I don't believe it has ever been discussed to be implemented. > >> Apparently no one has bothered yet to turn OSError + errno into a hierarchy >> of OSError subclasses, as it should. What's the problem, no will to do it, >> or no manpower? > > Python doesn't need any more builtin exceptions to clutter the > namespace. Besides, what's wrong with just checking the errno? The problem is manpower (this has been no one's itch). In order to have a hierarchy of OSError exceptions the underlying code would have to raise them. That means diving into all the C code that raises OSError and cleaning them up. I'm +1 on the idea but -1 on doing the work myself. -Jack From barry at python.org Thu Apr 2 22:44:09 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 2 Apr 2009 15:44:09 -0500 Subject: [Python-Dev] OSError.errno => exception hierarchy? In-Reply-To: References: <1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com> Message-ID: <1304C4AA-F450-49D4-9EC3-CDE3B414FA40@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 2, 2009, at 3:35 PM, Jack diederich wrote: > On Thu, Apr 2, 2009 at 4:25 PM, Benjamin Peterson > wrote: >> 2009/4/2 Gustavo Carneiro : >>> Apologies if this has already been discussed. >> >> I don't believe it has ever been discussed to be implemented. >> >>> Apparently no one has bothered yet to turn OSError + errno into a >>> hierarchy >>> of OSError subclasses, as it should. What's the problem, no will >>> to do it, >>> or no manpower? >> >> Python doesn't need any more builtin exceptions to clutter the >> namespace. Besides, what's wrong with just checking the errno? > > The problem is manpower (this has been no one's itch).
In order to > have a hierarchy of OSError exceptions the underlying code would have > to raise them. That means diving into all the C code that raises > OSError and cleaning them up. > > I'm +1 on the idea but -1 on doing the work myself. I'm +0/-1 (idea/work) on doing them all, but I think a /few/ errnos would be very handy. I certainly check ENOENT and EEXIST very frequently, so being able to easily catch or ignore those would be a big win. I'm sure there's one or two others that would give big bang for little buck. Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSdUjmnEjvBPtnXfVAQKsqAP+Ol4N2EqmNl0AFRIyxyvY+i7JEWhcJMQl 7fNm/lVJt3s7+5oO7egzNJYAjCmvjd9Vdh4poAqWvmcrcJB3a0WDxf8ZTJnCErJx ehdSpx9JO0nohrhcHM+EwcvQS39vZFFlLgOkCS5O57Wy5GdynAGBlPQY5abwJGEe V8or9I16W/E= =JG7r -----END PGP SIGNATURE----- From chris at simplistix.co.uk Thu Apr 2 23:16:28 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 02 Apr 2009 22:16:28 +0100 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> Message-ID: <49D52B2C.5050509@simplistix.co.uk> R. David Murray wrote: > On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote: >> Guido van Rossum wrote: >>> Well hold on for a minute, I remember we used to have an exec >>> statement in a class body in the standard library, to define some file >>> methods in socket.py IIRC. >> >> But why an exec?! Surely there must be some other way to do this than >> an exec? 
> > Maybe, but this sure is gnarly code: > > _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n" > "%s.__doc__ = _realsocket.%s.__doc__\n") > for _m in _socketmethods: > exec _s % (_m, _m, _m, _m) > del _m, _s I played around with this and managed to rewrite it as: from functools import partial from new import instancemethod def meth(name,self,*args): return getattr(self._sock,name)(*args) for _m in _socketmethods: p = partial(meth,_m) p.__name__ = _m p.__doc__ = getattr(_realsocket,_m).__doc__ m = instancemethod(p,None,_socketobject) setattr(_socketobject,_m,m) Have I missed something or is that a suitable replacement that gets rid of the exec nastiness? Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From guido at python.org Thu Apr 2 23:18:30 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Apr 2009 14:18:30 -0700 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <49D52B2C.5050509@simplistix.co.uk> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> Message-ID: On Thu, Apr 2, 2009 at 2:16 PM, Chris Withers wrote: > R. David Murray wrote: >> >> On Wed, 1 Apr 2009 at 13:12, Chris Withers wrote: >>> >>> Guido van Rossum wrote: >>>> >>>> Well hold on for a minute, I remember we used to have an exec >>>> statement in a class body in the standard library, to define some file >>>> methods in socket.py IIRC. >>> >>> But why an exec?! Surely there must be some other way to do this than an >>> exec? >> >> Maybe, but this sure is gnarly code: >> >>    _s = ("def %s(self, *args): return self._sock.%s(*args)\n\n" >>          "%s.__doc__ = _realsocket.%s.__doc__\n") >>    for _m in _socketmethods: >>        exec _s % (_m, _m, _m, _m)
>>    del _m, _s > I played around with this and managed to rewrite it as: > > from functools import partial > from new import instancemethod > > def meth(name,self,*args): >    return getattr(self._sock,name)(*args) > > for _m in _socketmethods: >    p = partial(meth,_m) >    p.__name__ = _m >    p.__doc__ = getattr(_realsocket,_m).__doc__ >    m = instancemethod(p,None,_socketobject) >    setattr(_socketobject,_m,m) > > Have I missed something or is that a suitable replacement that gets rid of > the exec nastiness? That code in socket.py is much older than functools... I don't know if the dependency matters, probably not. But anyways this is moot, the bug was only about exec in a class body *nested inside a function*. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Apr 2 23:19:58 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Apr 2009 14:19:58 -0700 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: Wow. Can you possibly be more negative? 2009/4/2 Raymond Hettinger : > The measurements are just a distractor. We all already know that the hook > is being added to a critical path. Everyone will pay a cost for a feature > that few people will use. This is a really bad idea. It is not part of a > thorough, thought-out framework of container hooks (something that would > need a PEP at the very least). The case for how it helps us is somewhat > thin. The case for DTrace hooks was much stronger. > > If something does go in, it should be #ifdef'd out by default. But then, I > don't think it should go in at all.
> > Raymond > > > > > On Thu, Apr 2, 2009 at 04:16, John Ehresman wrote: >> >> Collin Winter wrote: >>> >>> Have you measured the impact on performance? >> >> I've tried to test using pystone, but am seeing more differences between >> runs than there is between python w/ the patch and w/o when there is no hook >> installed. The highest pystone is actually from the binary w/ the patch, >> which I don't really believe unless it's some low level code generation >> affect. The cost is one test of a global variable and then a switch to the >> branch that doesn't call the hooks. >> >> I'd be happy to try to come up with better numbers next week after I get >> home from pycon. > > Pystone is pretty much a useless benchmark. If it measures anything, it's > the speed of the bytecode dispatcher (and it doesn't measure it particularly > well.) PyBench isn't any better, in my experience. Collin has collected a > set of reasonable benchmarks for Unladen Swallow, but they still leave a lot > to be desired.
From the discussions at the VM and Language summits before > PyCon, I don't think anyone else has better benchmarks, though, so I would > suggest using Unladen Swallow's: > http://code.google.com/p/unladen-swallow/wiki/Benchmarks > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From chris at simplistix.co.uk Thu Apr 2 23:21:31 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 02 Apr 2009 22:21:31 +0100 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> Message-ID: <49D52C5B.7010506@simplistix.co.uk> Guido van Rossum wrote: >> from functools import partial >> from new import instancemethod >> >> def meth(name,self,*args): >>    return getattr(self._sock,name)(*args) >> >> for _m in _socketmethods: >>    p = partial(meth,_m) >>    p.__name__ = _m >>    p.__doc__ = getattr(_realsocket,_m).__doc__ >>    m = instancemethod(p,None,_socketobject) >>    setattr(_socketobject,_m,m) >> >> Have I missed something or is that a suitable replacement that gets rid of >> the exec nastiness? > > That code in socket.py is much older than functools... I don't know if > the dependency matters, probably not. > > But anyways this is moot, the bug was only about exec in a class body > *nested inside a function*. Indeed, I just hate seeing execs and it was an interesting mental exercise to try and get rid of the above one ;-) Assuming it breaks no tests, would there be objection to me committing the above change to the Python 3 trunk?
Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From amauryfa at gmail.com Thu Apr 2 23:27:20 2009 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Thu, 2 Apr 2009 23:27:20 +0200 Subject: [Python-Dev] OSError.errno => exception hierarchy? In-Reply-To: References: <1afaf6160904021325m6d5eb230o3f762ec5b5568d1f@mail.gmail.com> Message-ID: Hello, On Thu, Apr 2, 2009 at 22:35, Jack diederich wrote: > On Thu, Apr 2, 2009 at 4:25 PM, Benjamin Peterson wrote: >> 2009/4/2 Gustavo Carneiro : >>> Apologies if this has already been discussed. >> >> I don't believe it has ever been discussed to be implemented. >> >>> Apparently no one has bothered yet to turn OSError + errno into a hierarchy >>> of OSError subclasses, as it should. What's the problem, no will to do it, >>> or no manpower? >> >> Python doesn't need any more builtin exceptions to clutter the >> namespace. Besides, what's wrong with just checking the errno? > > The problem is manpower (this has been no one's itch). In order to > have a hierarchy of OSError exceptions the underlying code would have > to raise them. That means diving into all the C code that raises > OSError and cleaning them up. > > I'm +1 on the idea but -1 on doing the work myself. > > -Jack The py library (http://codespeak.net/py/dist/) already has a py.error module that provides an exception class for each errno. See for example how they use py.error.ENOENT, py.error.EACCES... to implement some kind of FilePath object: http://codespeak.net/svn/py/dist/py/path/local/local.py But I'm not sure I would like this kind of code in core python. Too much magic...
-- Amaury Forgeot d'Arc From guido at python.org Thu Apr 2 23:49:22 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Apr 2009 14:49:22 -0700 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <49D52C5B.7010506@simplistix.co.uk> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> <49D52C5B.7010506@simplistix.co.uk> Message-ID: On Thu, Apr 2, 2009 at 2:21 PM, Chris Withers wrote: > Guido van Rossum wrote: >>> >>> from functools import partial >>> from new import instancemethod >>> >>> def meth(name,self,*args): >>>    return getattr(self._sock,name)(*args) >>> >>> for _m in _socketmethods: >>>    p = partial(meth,_m) >>>    p.__name__ = _m >>>    p.__doc__ = getattr(_realsocket,_m).__doc__ >>>    m = instancemethod(p,None,_socketobject) >>>    setattr(_socketobject,_m,m) >>> >>> Have I missed something or is that a suitable replacement that gets rid >>> of >>> the exec nastiness? >> >> That code in socket.py is much older than functools... I don't know if >> the dependency matters, probably not. >> >> But anyways this is moot, the bug was only about exec in a class body >> *nested inside a function*. > > Indeed, I just hate seeing execs and it was an interesting mental exercise > to try and get rid of the above one ;-) > > Assuming it breaks no tests, would there be objection to me committing the > above change to the Python 3 trunk? That's up to Benjamin. Personally, I live by "if it ain't broke, don't fix it." :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From orsenthil at gmail.com Thu Apr 2 23:50:20 2009 From: orsenthil at gmail.com (Senthil Kumaran) Date: Thu, 2 Apr 2009 16:50:20 -0500 Subject: [Python-Dev] [issue3609] does parse_header really belong in CGI module?
In-Reply-To: <1238708765.18.0.659138371932.issue3609@psf.upfronthosting.co.za> References: <1219193477.24.0.768590998992.issue3609@psf.upfronthosting.co.za> <1238708765.18.0.659138371932.issue3609@psf.upfronthosting.co.za> Message-ID: <7c42eba10904021450k3756ee0ftfe282d065024f2bb@mail.gmail.com> http://bugs.python.org/issue3609 requests to move the function parse_header present in cgi module to email package. The reasons for this request are: 1) The MIME type header parsing methods rightly belong to email package. Conforming to RFC 2045. 2) parse_qs, parse_qsl were similarly moved from cgi to urlparse. The question here is, should the relocation happen in Python 2.7 as well as in Python 3k or only in Python 3k? If changes happen in Python 2.7, then cgi.parse_header will have DeprecationWarning just in case we go for more versions in Python 2.x series. Does anyone have any concerns with this change? -- Senthil From chris at simplistix.co.uk Thu Apr 2 23:57:07 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 02 Apr 2009 22:57:07 +0100 Subject: [Python-Dev] Package Management - thoughts from the peanut gallery Message-ID: <49D534B3.8020801@simplistix.co.uk> Hey All, I have to admit to not having the willpower to plough through the 200 unread messages in the packaging thread when I got back from PyCon but just wanted to throw out a few thoughts on what my python packaging utopia would look like: - python would have a package format that included version numbers and dependencies. - this package format would "play nice" with os-specific ideas of how packages should be structured.
- python itself would have a version number, so it could be treated as just another dependency by packages (ie: python >=2.3,<3) - python would ship with a package manager that would let you install and uninstall python packages, resolving dependencies in the process and complaining if it couldn't or if there were clashes - this package manager would facilitate the building of os-specific packages (.deb, .rpm) including providing dependency information, so making life *much* easier for these packagers. - the standard library packages would be no different from any other package, and could be overridden as and when new versions became available on PyPI, should an end user so desire. They would also be free to have their own release lifecycles (unittest, distutils, email, I'm looking at you!) - python would still ship "batteries included" with versions of these packages appropriate for the release, to keep those in corporate shackles or with no network happy. In fact, creating application-specific "bundles" like this would become trivial, helping those who have apps where they want to ship as single, isolated lumps which the os-specific package managers could use without having to worry about any python package dependencies. Personally I feel all of the above are perfectly possible, and can't see anyone being left unhappy by them. I'm sure I've missed something then, otherwise why not make it happen? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From fuzzyman at voidspace.org.uk Thu Apr 2 23:58:23 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 02 Apr 2009 16:58:23 -0500 Subject: [Python-Dev] unittest package Message-ID: <49D534FF.60901@voidspace.org.uk> Hello all, The unittest module is around 1500 lines of code now, and the tests are 3000 lines. It would be much easier to maintain as a package rather than a module. 
Shall I work on a suggested structure or are there objections in principle? Obviously all the functionality would still be available from the top-level unittest namespace (for backwards compatibility). Michael -- http://www.ironpythoninaction.com/ From python at rcn.com Fri Apr 3 00:07:03 2009 From: python at rcn.com (Raymond Hettinger) Date: Thu, 2 Apr 2009 15:07:03 -0700 Subject: [Python-Dev] PyDict_SetItem hook References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: > Wow. Can you possibly be more negative? I think it's worse to give the poor guy the run around by making him run lots of random benchmarks. In the end, someone will run a timeit or have a specific case that shows the full effect. All of the respondents so far seem to have a clear intuition that hook is right in the middle of a critical path. Their intuition matches what I learned by spending a month trying to find ways to optimize dictionaries. Am surprised that there has been no discussion of why this should be in the default build (as opposed to a compile time option). AFAICT, users have not previously requested a hook like this. Also, there has been no discussion for an overall strategy for monitoring containers in general. Lists and tuples will both defy this approach because there is so much code that accesses the arrays directly. Am not sure whether the setitem hook would work for other implementations either. It seems weird to me that Collin's group can be working so hard just to get a percent or two improvement in specific cases for pickling while python-dev is readily entertaining a patch that slows down the entire language. If my thoughts on the subject bug you, I'll happily withdraw from the thread. I don't aspire to be a source of negativity. 
I just happen to think this proposal isn't a good idea. Raymond ----- Original Message ----- From: "Guido van Rossum" To: "Raymond Hettinger" Cc: "Thomas Wouters" ; "John Ehresman" ; Sent: Thursday, April 02, 2009 2:19 PM Subject: Re: [Python-Dev] PyDict_SetItem hook Wow. Can you possibly be more negative? 2009/4/2 Raymond Hettinger : > The measurements are just a distractor. We all already know that the hook > is being added to a critical path. Everyone will pay a cost for a feature > that few people will use. This is a really bad idea. It is not part of a > thorough, thought-out framework of container hooks (something that would > need a PEP at the very least). The case for how it helps us is somewhat > thin. The case for DTrace hooks was much stronger. > > If something does go in, it should be #ifdef'd out by default. But then, I > don't think it should go in at all. > > > Raymond > > > > > On Thu, Apr 2, 2009 at 04:16, John Ehresman wrote: >> >> Collin Winter wrote: >>> >>> Have you measured the impact on performance? >> >> I've tried to test using pystone, but am seeing more differences between >> runs than there is between python w/ the patch and w/o when there is no hook >> installed. The highest pystone is actually from the binary w/ the patch, >> which I don't really believe unless it's some low level code generation >> affect. The cost is one test of a global variable and then a switch to the >> branch that doesn't call the hooks. >> >> I'd be happy to try to come up with better numbers next week after I get >> home from pycon. > > Pystone is pretty much a useless benchmark. If it measures anything, it's > the speed of the bytecode dispatcher (and it doesn't measure it particularly > well.) PyBench isn't any better, in my experience. Collin has collected a > set of reasonable benchmarks for Unladen Swallow, but they still leave a lot > to be desired. 
From the discussions at the VM and Language summits before > PyCon, I don't think anyone else has better benchmarks, though, so I would > suggest using Unladen Swallow's: > http://code.google.com/p/unladen-swallow/wiki/Benchmarks > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From robert.collins at canonical.com Fri Apr 3 00:00:42 2009 From: robert.collins at canonical.com (Robert Collins) Date: Fri, 03 Apr 2009 09:00:42 +1100 Subject: [Python-Dev] unittest package In-Reply-To: <49D534FF.60901@voidspace.org.uk> References: <49D534FF.60901@voidspace.org.uk> Message-ID: <1238709643.2700.147.camel@lifeless-64> On Thu, 2009-04-02 at 16:58 -0500, Michael Foord wrote: > Hello all, > > The unittest module is around 1500 lines of code now, and the tests are > 3000 lines. > > It would be much easier to maintain as a package rather than a module. > Shall I work on a suggested structure or are there objections in principle? > > Obviously all the functionality would still be available from the > top-level unittest namespace (for backwards compatibility). > > Michael I'd like to see this; jml's testtools package has a layout for this which is quite nice. -Rob
From solipsis at pitrou.net Fri Apr 3 00:14:29 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 2 Apr 2009 22:14:29 +0000 (UTC) Subject: [Python-Dev] PyDict_SetItem hook References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: Raymond Hettinger <python at rcn.com> writes: > > It seems weird to me that Collin's group can be working > so hard just to get a percent or two improvement in > specific cases for pickling while python-dev is readily > entertaining a patch that slows down the entire language. I think it's really more than a percent or two: http://bugs.python.org/issue5670 Regards Antoine. From python at rcn.com Fri Apr 3 00:20:32 2009 From: python at rcn.com (Raymond Hettinger) Date: Thu, 2 Apr 2009 15:20:32 -0700 Subject: [Python-Dev] PyDict_SetItem hook References: <49D3F8D0.8070805@wingware.com><43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com><49D42013.3010600@wingware.com><9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com><78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: >> It seems weird to me that Collin's group can be working >> so hard just to get a percent or two improvement in >> specific cases for pickling while python-dev is readily >> entertaining a patch that slows down the entire language. [Antoine Pitrou] > I think it's really more than a percent or two: > http://bugs.python.org/issue5670 For lists, it was a percent or two: http://bugs.python.org/issue5671 I expect Collin's overall efforts to pay off nicely. I was just pointing out the contrast between module-specific optimization efforts versus anti-optimizations that affect the whole language.
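The kind of micro-measurement invoked earlier in the thread ("someone will run a timeit") is a one-liner; a sketch, for illustration only, with absolute numbers that vary by machine and build:

```python
import timeit

# Time the operation the proposed hook would sit on: a plain dict store.
per_million = timeit.timeit("d['key'] = 1", setup="d = {}", number=1000000)
print("1M dict stores: %.3f s" % per_million)
```

Comparing this figure between a patched and an unpatched interpreter (hook disabled in both) is the measurement being argued about.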
Raymond From amauryfa at gmail.com Fri Apr 3 00:26:23 2009 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 3 Apr 2009 00:26:23 +0200 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> Message-ID: On Thu, Apr 2, 2009 at 03:23, Christian Heimes wrote: > John Ehresman wrote: >> * To what extent should non-debugger code use the hook? At one end of >> the spectrum, the hook could be made readily available for non-debug use >> and at the other end, it could be documented as being debug only, >> disabled in python -O, & not exposed in the stdlib to python code. > > To explain Collin's mail: > Python's dict implementation is crucial to the performance of any Python > program. Modules, types, instances all rely on the speed of Python's > dict type because most of them use a dict to store their name space. > Even the smallest change to the C code may lead to a severe performance > penalty. This is especially true for set and get operations. A change that would have no performance impact could be to set mp->ma_lookup to another function, that calls all the hooks it wants before calling the "super()" method (lookdict). This ma_lookup is already an attribute of every dict, so a debugger could trace only the namespaces it monitors. The only problem here is that ma_lookup is called with the key and its hash, but not with the value, and you cannot know whether you are reading or setting the dict. It is easy to add an argument and call ma_lookup with the value (or NULL, or -1 depending on the action: set, get or del), but this may have a slight impact (benchmark needed!) even if this argument is not used by the standard function.
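At the Python level, the behaviour the debugger authors are after can be illustrated with a dict subclass. This is a sketch for illustration only; CPython namespaces are plain C dicts that bypass such subclass methods, which is exactly why the thread is about a C-level hook rather than subclassing:

```python
class TracingDict(dict):
    """Call a hook on every mutation; a pure-Python model of the proposal."""

    def __init__(self, hook):
        dict.__init__(self)
        self._hook = hook

    def __setitem__(self, key, value):
        self._hook('set', key, value)       # notify before the store
        dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        self._hook('del', key, None)        # notify before the delete
        dict.__delitem__(self, key)

events = []
d = TracingDict(lambda op, key, value: events.append((op, key, value)))
d['x'] = 1
d['x'] = 2
del d['x']
print(events)  # -> [('set', 'x', 1), ('set', 'x', 2), ('del', 'x', None)]
```

The proposed C hook would deliver essentially this ('set'/'del', key, value) stream, but for ordinary dicts such as module and instance namespaces.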
-- Amaury Forgeot d'Arc From thomas at python.org Fri Apr 3 00:44:18 2009 From: thomas at python.org (Thomas Wouters) Date: Fri, 3 Apr 2009 00:44:18 +0200 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: <9e804ac0904021544o3c6b1263o6db0a80d15acc3c1@mail.gmail.com> On Fri, Apr 3, 2009 at 00:07, Raymond Hettinger wrote: > > It seems weird to me that Collin's group can be working > so hard just to get a percent or two improvement in specific cases for > pickling while python-dev is readily entertaining a patch that slows down > the entire language. Collin's group has unfortunately seen that you cannot know the actual impact of a change until you measure it. GCC performance, for instance, is extremely unpredictable, and I can easily see a change like this proving to have zero impact -- or even positive impact -- on most platforms because, say, it warms the cache for the common case. I doubt it will, but you can't *know* until you measure it. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at python.org Fri Apr 3 00:57:22 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Apr 2009 15:57:22 -0700 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: On Thu, Apr 2, 2009 at 3:07 PM, Raymond Hettinger wrote: >> Wow. Can you possibly be more negative?
> > I think it's worse to give the poor guy the run around Mind your words please. > by making him run lots of random benchmarks. In > the end, someone will run a timeit or have a specific > case that shows the full effect. All of the respondents so far seem to have > a clear intuition that hook is right in the middle of a critical path. > Their intuition matches > what I learned by spending a month trying to find ways > to optimize dictionaries. > > Am surprised that there has been no discussion of why this should be in the > default build (as opposed to a compile time option). AFAICT, users have not > previously > requested a hook like this. I may be partially to blame for this. John and Stephan are requesting this because it would (mostly) fulfill one of the top wishes of the users of Wingware. So the use case is certainly real. > Also, there has been no discussion for an overall strategy > for monitoring containers in general. Lists and tuples will > both defy this approach because there is so much code > that accesses the arrays directly. Am not sure whether the > setitem hook would work for other implementations either. The primary use case is some kind of trap on assignment. While this cannot cover all cases, most non-local variables are stored in dicts. List mutations are not in the same league, as use case. > It seems weird to me that Collin's group can be working > so hard just to get a percent or two improvement in specific cases for > pickling while python-dev is readily entertaining a patch that slows down > the entire language. I don't actually believe that you can know whether this affects performance at all without serious benchmarking. The patch amounts to a single global flag check as long as the feature is disabled, and that flag could be read from the L1 cache. > If my thoughts on the subject bug you, I'll happily > withdraw from the thread. I don't aspire to be a > source of negativity. I just happen to think this proposal isn't a good idea. I think we need more proof either way. > Raymond > > > > ----- Original Message ----- From: "Guido van Rossum" > To: "Raymond Hettinger" > Cc: "Thomas Wouters" ; "John Ehresman" > ; > Sent: Thursday, April 02, 2009 2:19 PM > Subject: Re: [Python-Dev] PyDict_SetItem hook > > > Wow. Can you possibly be more negative? > > 2009/4/2 Raymond Hettinger : >> >> The measurements are just a distractor. We all already know that the hook >> is being added to a critical path. Everyone will pay a cost for a feature >> that few people will use. This is a really bad idea. It is not part of a >> thorough, thought-out framework of container hooks (something that would >> need a PEP at the very least). The case for how it helps us is somewhat >> thin. The case for DTrace hooks was much stronger. >> >> If something does go in, it should be #ifdef'd out by default. But then, I >> don't think it should go in at all. >> >> >> Raymond >> >> >> >> >> On Thu, Apr 2, 2009 at 04:16, John Ehresman wrote: >>> >>> Collin Winter wrote: >>>> >>>> Have you measured the impact on performance? >>> >>> I've tried to test using pystone, but am seeing more differences between >>> runs than there is between python w/ the patch and w/o when there is no >>> hook >>> installed. The highest pystone is actually from the binary w/ the patch, >>> which I don't really believe unless it's some low level code generation >>> affect. The cost is one test of a global variable and then a switch to >>> the >>> branch that doesn't call the hooks. >>> >>> I'd be happy to try to come up with better numbers next week after I get >>> home from pycon. >> >> Pystone is pretty much a useless benchmark. If it measures anything, it's >> the speed of the bytecode dispatcher (and it doesn't measure it >> particularly >> well.) PyBench isn't any better, in my experience.
Collin has collected a >> set of reasonable benchmarks for Unladen Swallow, but they still leave a >> lot >> to be desired. From the discussions at the VM and Language summits before >> PyCon, I don't think anyone else has better benchmarks, though, so I would >> suggest using Unladen Swallow's: >> http://code.google.com/p/unladen-swallow/wiki/Benchmarks >> > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Fri Apr 3 01:07:29 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 2 Apr 2009 18:07:29 -0500 Subject: [Python-Dev] unittest package In-Reply-To: <49D534FF.60901@voidspace.org.uk> References: <49D534FF.60901@voidspace.org.uk> Message-ID: <55E8EAA0-868C-4AEA-B0AE-7DB85F66B348@python.org> On Apr 2, 2009, at 4:58 PM, Michael Foord wrote: > The unittest module is around 1500 lines of code now, and the tests > are 3000 lines. > > It would be much easier to maintain as a package rather than a > module. Shall I work on a suggested structure or are there > objections in principle? +1/jfdi :) Barry From gjcarneiro at gmail.com Fri Apr 3 01:13:05 2009 From: gjcarneiro at gmail.com (Gustavo Carneiro) Date: Fri, 3 Apr 2009 00:13:05 +0100 Subject: [Python-Dev] OSError.errno => exception hierarchy?
In-Reply-To: References: Message-ID: (cross-posting back to python-dev to finalize discussions) 2009/4/2 Guido van Rossum [...] > > The problem you report: > >> > >> try: > >> ... > >> except OSWinError: > >> ... > >> except OSLinError: > >> ... > >> > > > > Would be solved if both OSWinError and OSLinError were always defined in > > both Linux and Windows Python. Programs could be written to catch both > > OSWinError and OSLinError, except that on Linux OSWinError would never > > actually be raised, and on Windows OSLinError would never occur. Problem > > solved. > > Yeah, but now you'd have to generate the list of exceptions (which > would be enormously long) based on the union of all errno codes in the > universe. > > Unless you only want to do it for some errno codes and not for others, > which sounds like asking for trouble. > > Also you need a naming scheme that works for all errnos and doesn't > require manual work. Frankly, the only scheme that I can think of that > could be automated would be something like OSError_ENAME. > > And, while OSError is built-in, I think these exceptions (because > there are so many) should not be built-in, and probably not even live > in the 'os' namespace -- the best place for them would be the errno > module, so errno.OSError_ENAME. > > > The downsides of this? I can only see memory, at the moment, but I might > be > > missing something. > > It's an enormous amount of work to make it happen across all > platforms. And it doesn't really solve an important problem. I partially agree. It will be a lot of work. I think the problem is valid, although not very important, I agree. > > > > Now just one final word why I think this matters. 
The currently correct > way > > to remove a directory tree and only ignore the error "it does not exist" > is: > > > > try: > > shutil.rmtree("dirname") > > except OSError, e: > > if errno.errorcode[e.errno] != 'ENOENT': > > raise > > > > However, only very experienced programmers will know to write that > correct > > code (apparently I am not experienced enought!). > > That doesn't strike me as correct at all, since it doesn't distinguish > between ENOENT being raised for some file deep down in the tree vs. > the root not existing. (This could happen if after you did > os.listdir() some other process deleted some file.) OK. Maybe in a generic case this could happen, although I'm sure this won't happen in my particular scenario. This is about a build system, and I am assuming there are no two concurrent builds (or else a lot of other things would fail anyway). > A better way might be > > try: > shutil.rmtree() > except OSError: > if os.path.exists(): > raise Sure, this works, but at the cost of an extra system call. I think it's more elegant to check the errno (assuming the corner case you pointed out above is not an issue). > Though I don't know what you wish to happen of were a dangling > symlink. > > > What I am proposing is that the simpler correct code would be something > > like: > > > > try: > > shutil.rmtree("dirname") > > except OSNoEntryError: > > pass > > > > Much simpler, no? > > And wrong. > > > Right now, developers are tempted to write code like: > > > > shutil.rmtree("dirname", ignore_errors=True) > > > > Or: > > > > try: > > shutil.rmtree("dirname") > > except OSError: > > pass > > > > Both of which follow the error hiding anti-pattern [1]. > > > > [1] http://en.wikipedia.org/wiki/Error_hiding > > > > Thanks for reading this far. > > Thanks for not wasting any more of my time. OK, I won't waste more time. If this were an obvious improvement beyond doubt to most people, I would pursue it, but since it's not, I can live with it. 
Thanks anyway, -- Gustavo J. A. M. Carneiro INESC Porto, Telecommunications and Multimedia Unit "The universe is always one step beyond logic." -- Frank Herbert From daniel at stutzbachenterprises.com Fri Apr 3 01:14:59 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 2 Apr 2009 18:14:59 -0500 Subject: [Python-Dev] __length_hint__ Message-ID: Iterators can implement a method called __length_hint__ that provides a hint to certain internal routines (such as list.extend) so they can operate more efficiently. As far as I can tell, __length_hint__ is currently undocumented. Should it be? If so, are there any constraints on what an iterator should return? I can think of 3 possible rules, each with advantages and disadvantages: 1. return your best guess 2. return your best guess that you are certain is not higher than the true value 3. return your best guess that you are certain is not lower than the true value Also, I've noticed that if a VERY large hint is returned by the iterator, list.extend will sometimes disregard the hint and try to allocate memory incrementally (correct for rule #1 or #2). However, in another code path it will throw a MemoryError immediately based on the hint (correct for rule #3). -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC From benjamin at python.org Fri Apr 3 01:17:08 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 2 Apr 2009 18:17:08 -0500 Subject: [Python-Dev] __length_hint__ In-Reply-To: References: Message-ID: <1afaf6160904021617k5c810c86sc7e2d99508076f5@mail.gmail.com> 2009/4/2 Daniel Stutzbach : > Iterators can implement a method called __length_hint__ that provides a hint > to certain internal routines (such as list.extend) so they can operate more > efficiently.
As far as I can tell, __length_hint__ is currently > undocumented. Should it be? This has been discussed, and no, it is an implementation detail mostly for the optimization of builtin iterators. > > If so, are there any constraints on what an iterator should return? I can > think of 3 possible rules, each with advantages and disadvantages: > 1. return your best guess > 2. return your best guess that you are certain is not higher than the true > value > 3. return your best guess that you are certain is not lower than the true > value > > Also, I've noticed that if a VERY large hint is returned by the iterator, > list.extend will sometimes disregard the hint and try to allocate memory > incrementally (correct for rule #1 or #2). However, in another code path it > will throw a MemoryError immediately based on the hint (correct for rule > #3). Perhaps Raymond can shed some light on these. -- Regards, Benjamin From python at rcn.com Fri Apr 3 01:30:39 2009 From: python at rcn.com (Raymond Hettinger) Date: Thu, 2 Apr 2009 16:30:39 -0700 Subject: [Python-Dev] __length_hint__ References: <1afaf6160904021617k5c810c86sc7e2d99508076f5@mail.gmail.com> Message-ID: >> Iterators can implement a method called __length_hint__ that provides a hint >> to certain internal routines (such as list.extend) so they can operate more >> efficiently. As far as I can tell, __length_hint__ is currently >> undocumented. Should it be? > > This has been discussed, and no, it is an implementation detail mostly > for the optimization of builtin iterators. Right. That matches my vague recollection on the subject. >> If so, are there any constraints on what an iterator should return? I can >> think of 3 possible rules, each with advantages and disadvantages: >> 1. return your best guess Yes. BTW, the same rule also applies to __len__. IIRC, Tim proposed to add that to the docs somewhere. > Perhaps Raymond can shed some light on these. Can't guess the future of __length_hint__().
Since it doesn't have a slot, the attribute lookup can actually slow down cases with a small number of iterands. The original idea was based on some research on map/fold operations, noting that iterators can sometimes be processed more efficiently if accompanied by some metadata (i.e. the iterator has a known length, consists of unique items, is sorted, is all of a certain type, is re-iterable, etc.). Raymond From ben+python at benfinney.id.au Fri Apr 3 02:25:57 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 03 Apr 2009 11:25:57 +1100 Subject: [Python-Dev] UnicodeDecodeError bug in distutils References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com> <200702242309.46022.pogonyshev@gmx.net> <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <45E0C012.7090801@palladion.com> <5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com> Message-ID: <877i22fuqy.fsf_-_@benfinney.id.au> "Phillip J. Eby" writes: > However, there's currently no standard, as far as I know, for what > encoding the PKG-INFO file should use. Who would define such a standard? My vote goes for "default is UTF-8". > Meanwhile, the 'register' command accepts Unicode, but is broken in > handling it. [...] > > Unfortunately, this isn't fixable until there's a new 2.5.x release. > For previous Python versions, both register and write_pkg_info() > accepted 8-bit strings and passed them on as-is, so the only > workaround for this issue at the moment is to revert to Python 2.4 > or less. What is the prognosis on this issue? It's still hitting me in Python 2.5.4. -- \ "Everything you read in newspapers is absolutely true, except | `\ for that rare story of which you happen to have first-hand | _o__) knowledge." --Erwin Knoll | Ben Finney From pje at telecommunity.com Fri Apr 3 02:44:00 2009 From: pje at telecommunity.com (P.J.
Eby) Date: Thu, 02 Apr 2009 20:44:00 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D52115.6020001@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> Message-ID: <20090403004135.B76443A40A7@sparrow.telecommunity.com> At 10:33 PM 4/2/2009 +0200, M.-A. Lemburg wrote: >That's going to slow down Python package detection a lot - you'd >replace an O(1) test with an O(n) scan. I thought about this too, but it's pretty trivial considering that the only time it takes effect is when you have a directory name that matches the name you're importing, and that it will only happen once for that directory, unless there is no package on sys.path with that name, and the program tries to import the package multiple times. In other words, the overhead isn't likely to be much, compared to the time needed to say, open and marshal even a trivial __init__.py file. >Alternative Approach: >--------------------- > >Wouldn't it be better to stick with a simpler approach and look for >"__pkg__.py" files to detect namespace packages using that O(1) check ? I thought the same thing (or more precisely, a single .pkg file), but when I got lower in the PEP I saw the reason was to support system packages not having overlapping filenames. The PEP could probably be a little clearer about the connection between needing *.pkg and the system-package use case. >One of the namespace packages, the defining namespace package, will have >to include a __init__.py file. Note that there is no such thing as a "defining namespace package" -- namespace package contents are symmetrical peers. >The above mechanism allows the same kind of flexibility we already >have with the existing normal __init__.py mechanism. > >* It doesn't add yet another .pth-style sys.path extension (which are >difficult to manage in installations). > >* It always uses the same naive sys.path search strategy. The strategy >is not determined by some file contents. 
The above are also true for using only a '*' in .pkg files -- in that event there are no sys.path changes. (Frankly, I'm doubtful that anybody is using extend_path and .pkg files to begin with, so I'd be fine with a proposal that instead used something like '.nsp' files that didn't even need to be opened and read -- which would let the directory scan stop at the first .nsp file found.) >* The search is only done once - on the first import of the package. I believe the PEP does this as well, IIUC. >* It's possible to have a defining package dir and add-on package >dirs. Also possible in the PEP, although the __init__.py must be in the first such directory on sys.path. (However, such "defining" packages are not that common now, due to tool limitations.) From greg.ewing at canterbury.ac.nz Fri Apr 3 03:11:42 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Apr 2009 14:11:42 +1300 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <49D4B2C4.4060107@avl.com> References: <49D26BB1.8050108@hastings.org> <10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001> <49D4B2C4.4060107@avl.com> Message-ID: <49D5624E.9050502@canterbury.ac.nz> Hrvoje Niksic wrote: > I thought the entire *point* of C object was that it's an opaque box > without any info whatsoever, except that which is known and shared by > its creator and its consumer. But there's no way of telling who created a given CObject, so *nobody* knows anything about it for certain. -- Greg From greg.ewing at canterbury.ac.nz Fri Apr 3 03:18:50 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Apr 2009 14:18:50 +1300 Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> References: <49D26BB1.8050108@hastings.org> <49D4A162.2020209@canterbury.ac.nz> <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> Message-ID: <49D563FA.7050909@canterbury.ac.nz>

Jim Fulton wrote: > The original use case for CObjects was to export an API from a module, > in which case, you'd be importing the API from the module. The presence > in the module indicates the type.

Sure, but it can't hurt to have an additional sanity check. Also, there are wider uses for CObjects than this. I see it as a quick way of creating a wrapper when you don't want to go to the trouble of a full-blown extension type. A small amount of metadata would make CObjects much more useful. -- Greg

From doko at ubuntu.com Fri Apr 3 03:21:10 2009 From: doko at ubuntu.com (Matthias Klose) Date: Fri, 03 Apr 2009 03:21:10 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D4DA72.60401@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> Message-ID: <49D56486.8020708@ubuntu.com>

Martin v. Löwis schrieb:
> I propose the following PEP for inclusion to Python 3.1.
> Please comment.
>
> Regards,
> Martin
>
> Abstract
> ========
>
> Namespace packages are a mechanism for splitting a single Python
> package across multiple directories on disk. In current Python
> versions, an algorithm to compute the package's __path__ must be
> formulated. With the enhancement proposed here, the import machinery
> itself will construct the list of directories that make up the
> package.

+1 speaking as a downstream packaging python for Debian/Ubuntu I welcome this approach. The current practice of shipping the very same file (__init__.py) in different packages leads to conflicts for the installation of these packages (this is not specific to dpkg, but is true for rpm packaging as well).
Current practice of packaging (for downstreams) so called "name space packages" is:
- either to split out the namespace __init__.py into a separate (linux distribution) package (needing manual packaging effort for each name space package)
- using downstream specific packaging techniques to handle conflicting files (diversions)
- replicating the current behaviour of setuptools simply overwriting the file conflicts.

Following this proposal (downstream) packaging of namespace packages is made possible independent of any manual downstream packaging decisions or any downstream specific packaging decisions.

Matthias

From pje at telecommunity.com Fri Apr 3 05:12:18 2009 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 02 Apr 2009 23:12:18 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D56486.8020708@ubuntu.com> References: <49D4DA72.60401@v.loewis.de> <49D56486.8020708@ubuntu.com> Message-ID: <20090403030953.32A493A40A7@sparrow.telecommunity.com>

At 03:21 AM 4/3/2009 +0200, Matthias Klose wrote: >+1 speaking as a downstream packaging python for Debian/Ubuntu I >welcome this approach. The current practice of shipping the very >same file (__init__.py) in different packages leads to conflicts for >the installation of these packages (this is not specific to dpkg, >but is true for rpm packaging as well). Current practice of >packaging (for downstreams) so called "name space packages" is: - >either to split out the namespace __init__.py into a >separate (linux distribution) package (needing manual packaging >effort for each name space package) - using downstream specific >packaging techniques to handle conflicting files (diversions) - >replicating the current behaviour of setuptools simply overwriting >the file conflicts.
Following this proposal (downstream) >packaging of namespace packages is made possible independent of any >manual downstream packaging decisions or any downstream specific >packaging decisions.

A clarification: setuptools does not currently install the __init__.py file when installing in --single-version-externally-managed or --root mode. Instead, it uses a project-version-nspkg.pth file that essentially simulates a variation of Martin's .pkg proposal, by abusing .pth file support. If this PEP is adopted, setuptools would replace its nspkg.pth file with a .pkg file on Python versions that provide native support for .pkg imports, keeping the .pth file only for older Pythons. (.egg files and directories will not be affected by the change, unless the zipimport module also supports .pkg files... and again, only for Python versions that support the new approach.)

From stephen at xemacs.org Fri Apr 3 06:12:36 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 03 Apr 2009 13:12:36 +0900 Subject: [Python-Dev] [issue3609] does parse_header really belong in CGI module? In-Reply-To: <7c42eba10904021450k3756ee0ftfe282d065024f2bb@mail.gmail.com> References: <1219193477.24.0.768590998992.issue3609@psf.upfronthosting.co.za> <1238708765.18.0.659138371932.issue3609@psf.upfronthosting.co.za> <7c42eba10904021450k3756ee0ftfe282d065024f2bb@mail.gmail.com> Message-ID: <87zleytlxn.fsf@xemacs.org>

Senthil Kumaran writes: > http://bugs.python.org/issue3609 requests to move the function > parse_header present in cgi module to email package. > > The reasons for this request are: > > 1) The MIME type header parsing methods rightly belong to email > package. Conforming to RFC 2045.

In practice, the "mail" part of the name is historical; RFC 822-style headers are used in many protocols, most prominently email, netnews (less important nowadays :-( ), and HTTP. If there are differences in usage, the parsing methods may be different.
If not, then this functionality is redundant in email, which has its own parser. It can't be right for email to have two parsers and CGI none!

Anyway, "moving" the function is almost certainly the *wrong* thing to do, as the email package has its own conventions and organization. In particular, in email, header parsing is done by methods of the message and header objects (in their respective initializations), rather than by a (global) function. Since Barry et al have been sprinting on email TNG, you really ought to coordinate this with them.

I think it would be good to have header parsing and generation in a free-standing package separate from other aspects of handling Internet protocols, but this will require coordination of several modules besides email and cgi.

From alexandre at peadrop.com Fri Apr 3 06:10:56 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 3 Apr 2009 00:10:56 -0400 Subject: [Python-Dev] Should the io-c modules be put in their own directory? Message-ID:

Hello, I just noticed that the new io-c modules were merged in the py3k branch (I know, I am kind of late on the news -- blame school work). Anyway, I am just wondering if it would be a good idea to put the io-c modules in a sub-directory (like sqlite), instead of scattering them around in the Modules/ directory. Cheers, -- Alexandre

From stephen at xemacs.org Fri Apr 3 06:55:58 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 03 Apr 2009 13:55:58 +0900 Subject: [Python-Dev] Package Management - thoughts from the peanut gallery In-Reply-To: <49D534B3.8020801@simplistix.co.uk> References: <49D534B3.8020801@simplistix.co.uk> Message-ID: <87y6uitjxd.fsf@xemacs.org>

Chris Withers writes: > Personally I feel all of the above are perfectly possible, and can't see > anyone being left unhappy by them. I'm sure I've missed something then, > otherwise why not make it happen?

Labor shortage.

We will need a PEP, the PEP will need a sample implementation, and a proponent.
Who's gonna bell the cat? From hrvoje.niksic at avl.com Fri Apr 3 09:59:24 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Fri, 03 Apr 2009 09:59:24 +0200 Subject: [Python-Dev] Let's update CObject API so it is safe and regular! In-Reply-To: <20295095.32108.1238700461315.JavaMail.xicrypt@atgrzls001> References: <49D26BB1.8050108@hastings.org> <10203019.4341695.1238671712090.JavaMail.xicrypt@atgrzls001> <49D4B2C4.4060107@avl.com> <20295095.32108.1238700461315.JavaMail.xicrypt@atgrzls001> Message-ID: <49D5C1DC.10801@avl.com> Larry Hastings wrote: >> If we're adding type information, then please make it a Python object >> rather than a C string. That way the creator and the consumer can use >> a richer API to query the "type", such as by calling its methods or by >> inspecting it in some other way. > > I'm not writing my patch that way; it would be too cumbersome for what > is ostensibly an easy, light-weight API. If you're going that route > you might as well create a real PyTypeObject for the blob you're > passing in. Well, that's exactly the point, given a PyObject* tag, you can add any kind of type identification you need, including some Python type. (It is assumed that the actual pointer you're passing is not a PyObject itself, of course, otherwise you wouldn't need PyCObject at all.) I have no desire to compete with your patch, it was a suggestion for (what I see as) improvement. From ziade.tarek at gmail.com Fri Apr 3 10:01:51 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 3 Apr 2009 10:01:51 +0200 Subject: [Python-Dev] Package Management - thoughts from the peanut gallery In-Reply-To: <87y6uitjxd.fsf@xemacs.org> References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org> Message-ID: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> Guys, I have taken the commitment to lead these tasks and synchronize the people that are willing to help on this. 
We are working on several tasks and PEPs to make things happen since the summit. There's no public roadmap yet on when things will be done (because there's no 100% certitude yet on what shall be done). But it will probably be too late to see it happen in 3.1. Python 2.7 will be our target.

The tasks discussed so far are:

- version definition (http://wiki.python.org/moin/DistutilsVersionFight)
- egg.info standardification (PEP 376)
- metadata enhancement (rewrite PEP 345)
- static metadata definition work (*)
- creation of a network of OS packager people
- PyPI mirroring (PEP 381)

Each one of these tasks has a leader, except the one with (*). I just got back from travelling, and I will reorganize http://wiki.python.org/moin/Distutils asap so it is up-to-date. If you want to work on one of these tasks or feel there's a new task you can start, please join Distutils SIG or contact me,

Regards Tarek

On Fri, Apr 3, 2009 at 6:55 AM, Stephen J. Turnbull wrote: > Chris Withers writes: > > > Personally I feel all of the above are perfectly possible, and can't see > > anyone being left unhappy by them. I'm sure I've missed something then, > > otherwise why not make it happen? > > Labor shortage. > > We will need a PEP, the PEP will need a sample implementation, and > a proponent. Who's gonna bell the cat? > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/ziade.tarek%40gmail.com >

-- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/

-------------- next part -------------- An HTML attachment was scrubbed...
URL:

From ziade.tarek at gmail.com Fri Apr 3 10:46:56 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 3 Apr 2009 10:46:56 +0200 Subject: [Python-Dev] UnicodeDecodeError bug in distutils In-Reply-To: <877i22fuqy.fsf_-_@benfinney.id.au> References: <94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com> <200702242309.46022.pogonyshev@gmx.net> <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <45E0C012.7090801@palladion.com> <5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com> <877i22fuqy.fsf_-_@benfinney.id.au> Message-ID: <94bdd2610904030146m569aa5a2q5b7bdc542f4570e5@mail.gmail.com>

On Fri, Apr 3, 2009 at 2:25 AM, Ben Finney wrote: > "Phillip J. Eby" writes: > >> However, there's currently no standard, as far as I know, for what >> encoding the PKG-INFO file should use. > > Who would define such a standard?

PEP 376 where we can explain that all files in egg-info should be in a specific encoding

> My vote goes for "default is UTF-8".

+1

> >> Meanwhile, the 'register' command accepts Unicode, but is broken in >> handling it. [...]

how so?

Tarek

From solipsis at pitrou.net Fri Apr 3 11:14:51 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 3 Apr 2009 09:14:51 +0000 (UTC) Subject: [Python-Dev] Should the io-c modules be put in their own directory? References: Message-ID:

Alexandre Vassalotti peadrop.com> writes: > > I just noticed that the new io-c modules were merged in the py3k > branch (I know, I am kind of late on the news -- blame school work). Anyway, > I am just wondering if it would be a good idea to put the io-c modules > in a sub-directory (like sqlite), instead of scattering them around in > the Modules/ directory.

Welcome back! I have no particular opinion on this. I suggest waiting for Benjamin's advice and following it :-) (unless the FLUFL wants to chime in)

Benjamin-makes-boring-decisions-easy'ly yrs, Antoine.
From solipsis at pitrou.net Fri Apr 3 11:27:40 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 3 Apr 2009 09:27:40 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PyDict=5FSetItem_hook?= References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> Message-ID: Thomas Wouters python.org> writes: > > > Pystone is pretty much a useless benchmark. If it measures anything, it's the speed of the bytecode dispatcher (and it doesn't measure it particularly well.) PyBench isn't any better, in my experience. I don't think pybench is useless. It gives a lot of performance data about crucial internal operations of the interpreter. It is of course very little real-world, but conversely makes you know immediately where a performance regression has happened. (by contrast, if you witness a regression in a high-level benchmark, you still have a lot of investigation to do to find out where exactly something bad happened) Perhaps someone should start maintaining a suite of benchmarks, high-level and low-level; we currently have them all scattered around (pybench, pystone, stringbench, richard, iobench, and the various Unladen Swallow benchmarks; not to mention other third-party stuff that can be found in e.g. the Computer Language Shootout). I also know Gregory P. Smith had emitted the idea of plotting benchmark figures for each new revision of trunk or py3k (and, perhaps, other implementations), but I don't know if he's willing to do it himself :-) Regards Antoine. From eckhardt at satorlaser.com Fri Apr 3 12:09:38 2009 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Fri, 3 Apr 2009 12:09:38 +0200 Subject: [Python-Dev] sequence slice that wraps, bug or intention? Message-ID: <200904031209.38726.eckhardt@satorlaser.com> Hi! 
I just stumbled across something in Python 2.6 where I'm not sure if it is by design or a fault:

x = 'abcd'
x[-3:-3] -> ''
x[-3:-2] -> 'b'
x[-3:-1] -> 'bc'
x[-3: 0] -> ''

The one that actually bothers me here is the last one, I would have expected it to yield 'bcd' instead, because otherwise I don't see a way to specify a slice that starts with a negative index but still includes the last element.

Similarly, I would expect x[-1:1] to yield 'ca' or at least raise an error, but not to return an empty string.

Bug?

Uli

-- Sator Laser GmbH Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
**************************************************************************************
Sator Laser GmbH, Fangdieckstraße 75a, 22547 Hamburg, Deutschland Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
**************************************************************************************
Visit our website at
**************************************************************************************
Diese E-Mail einschließlich sämtlicher Anhänge ist nur für den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empfänger sein sollten. Die E-Mail ist in diesem Fall zu löschen und darf weder gelesen, weitergeleitet, veröffentlicht oder anderweitig benutzt werden. E-Mails können durch Dritte gelesen werden und Viren sowie nichtautorisierte Änderungen enthalten. Sator Laser GmbH ist für diese Folgen nicht verantwortlich.
**************************************************************************************

From kristjan at ccpgames.com Fri Apr 3 12:22:58 2009 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Fri, 3 Apr 2009 10:22:58 +0000 Subject: [Python-Dev] Let's update CObject API so it is safe and regular!
In-Reply-To: References: <49D26BB1.8050108@hastings.org> <49D4A162.2020209@canterbury.ac.nz> <6B01D28B-34E2-42A1-B7AC-17963E064CEF@zope.com> Message-ID: <930F189C8A437347B80DF2C156F7EC7F056DD0BA54@exchis.ccp.ad.local>

Here's one from EVE, where the DB module creates raw data, for our Crowsets, and then hands it over to another module for consumption (actual creation of the CRow and CrowDescriptor objects):

BluePy raw(PyCObject_FromVoidPtr(&mColumnList, 0));
if (!raw) return 0;
return PyObject_CallMethod(blueModule, "DBRowDescriptor", "O", raw.o);

This is done for performance reasons to avoid data duplication. Of course it implies tight coupling of the modules. In our FreeType wrapper system, we also use it to wrap pointers to FreeType structs:

template <class T> struct Wrapper : public T {
    ...
    PyObject *Wrap() {if (!sMap.size())Init(); return PyCObject_FromVoidPtrAndDesc(this, &sMap, 0);}
};

It is quite useful to pass unknown and opaque stuff around with, really, and makes certain things possible that otherwise wouldn't be. We live with the type unsafety, of course. In fact, I don't think we ever use a CObject to expose an API.

Kristj'an

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Guido van Rossum
Sent: 2. apríl 2009 17:19
To: Jim Fulton
Cc: Python-Dev at python.org
Subject: Re: [Python-Dev] Let's update CObject API so it is safe and regular!

On Thu, Apr 2, 2009 at 6:22 AM, Jim Fulton wrote: > The original use case for CObjects was to export an API from a module, in > which case, you'd be importing the API from the module.

I consider this the *only* use case. What other use cases are there?

From p.f.moore at gmail.com Fri Apr 3 12:29:17 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 3 Apr 2009 11:29:17 +0100 Subject: [Python-Dev] sequence slice that wraps, bug or intention?
In-Reply-To: <200904031209.38726.eckhardt@satorlaser.com> References: <200904031209.38726.eckhardt@satorlaser.com> Message-ID: <79990c6b0904030329i1a8a070em84d0d25df7cdd35a@mail.gmail.com> 2009/4/3 Ulrich Eckhardt : > Hi! > > I just stumbled across something in Python 2.6 where I'm not sure if it is by > design or a fault: > > x = 'abdc' > x[-3:-3] -> '' > x[-3:-2] -> 'b' > x[-3:-1] -> 'bc' > x[-3: 0] -> '' > > The one that actually bothers me here is the last one, I would have expected > it to yield 'bcd' instead, because otherwise I don't see a way to specify a > slice that starts with a negative index but still includes the last element. > > Similarly, I would expect x[-1,1] to yield 'ca' or at least raise an error, > but not to return an empty string. > > Bug? Feature. Documented behaviour, even (http://docs.python.org/reference/expressions.html#id5 section "Slicings"). This question is more appropriate for python-list (comp.lang.python) as it is about using Python, rather than the development of the Python interpreter itself (although I can see that your uncertainty as to whether this was a bug might have led you to think this was a more appropriate list). You should first confirm on python-list that a given behaviour is a bug, and if it is, post it to the tracker, rather than to python-dev. In this case, the behaviour is fine. As regards your point "I don't see a way to specify a slice that starts with a negative index but still includes the last element" what you want is x[-3:]. If you want to discuss this further, please do so on python-list. Paul. From hrvoje.niksic at avl.com Fri Apr 3 14:07:02 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Fri, 03 Apr 2009 14:07:02 +0200 Subject: [Python-Dev] Getting values stored inside sets Message-ID: <49D5FBE6.6090807@avl.com> I've stumbled upon an oddity using sets. 
It's trivial to test if a value is in the set, but it appears to be impossible to retrieve a stored value, other than by iterating over the whole set. Let me describe a concrete use case. Imagine a set of objects identified by some piece of information, such as a "key" slot (guaranteed to be constant for any particular element). The object could look like this: class Element(object): def __init__(self, key): self.key = key def __eq__(self, other): return self.key == other def __hash__(self): return hash(self.key) # ... Now imagine a set "s" of such objects. I can add them to the set: >>> s = set() >>> s.add(Element('foo')) >>> s.add(Element('bar')) I can test membership using the keys: >>> 'foo' in s True >>> 'blah' in s False But I can't seem to find a way to retrieve the element corresponding to 'foo', at least not without iterating over the entire set. Is this an oversight or an intentional feature? Or am I just missing an obvious way to do this? I know I can work around this by changing the set of elements to a dict that maps key -> element, but this feels unsatisfactory. It's redundant, as the element already contains all the necessary information, and the set already knows how to use it, and the set must remember the original elements anyway, to be able to iterate over them, so why not allow one to retrieve them? Secondly, the data structure I need conceptually *is* a set of elements, so it feels wrong to pigeonhole it into a dict. This wasn't an isolated case, we stumbled on this several times while trying to use sets. In comparison, STL sets don't have this limitation. If this is not possible, I would like to propose either that set's __getitem__ translates key to value, so that s['foo'] would return the first element, or, if this is considered ugly, an equivalent method, such as s.get('foo'). 
From p.f.moore at gmail.com Fri Apr 3 14:22:02 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 3 Apr 2009 13:22:02 +0100 Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: <49D5FBE6.6090807@avl.com> References: <49D5FBE6.6090807@avl.com> Message-ID: <79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>

2009/4/3 Hrvoje Niksic :
> I've stumbled upon an oddity using sets.  It's trivial to test if a value is
> in the set, but it appears to be impossible to retrieve a stored value,
> other than by iterating over the whole set.  Let me describe a concrete use
> case.
>
> Imagine a set of objects identified by some piece of information, such as a
> "key" slot (guaranteed to be constant for any particular element).  The
> object could look like this:
>
> class Element(object):
>    def __init__(self, key):
>        self.key = key
>    def __eq__(self, other):
>        return self.key == other
>    def __hash__(self):
>        return hash(self.key)
>    # ...
>
> Now imagine a set "s" of such objects.  I can add them to the set:
>
>>>> s = set()
>>>> s.add(Element('foo'))
>>>> s.add(Element('bar'))
>
> I can test membership using the keys:
>
>>>> 'foo' in s
> True
>>>> 'blah' in s
> False
>
> But I can't seem to find a way to retrieve the element corresponding to
> 'foo', at least not without iterating over the entire set.  Is this an
> oversight or an intentional feature?  Or am I just missing an obvious way to
> do this?

My instinct is that it's intentional. I'd say that you're abusing __eq__ here. If you can say "x in s" and then can't use x as if it were the actual item inserted into s, then are they really "equal"?

Using a dict seems like the correct answer. I certainly don't think it's worth complicating the set interface to cover this corner case.

Paul.
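The dict-based approach recommended in this reply can be sketched as follows (the `registry` name and `add` helper are illustrative, not an established API):

```python
class Element(object):
    """Object identified by a constant key."""
    def __init__(self, key):
        self.key = key

# Map key -> element, instead of keeping a bare set of elements.
registry = {}

def add(elem):
    registry[elem.key] = elem

add(Element('foo'))
add(Element('bar'))

# Membership and retrieval are now both O(1) dict operations,
# and __eq__/__hash__ need no special definitions:
print('foo' in registry)        # True
elem = registry['foo']
print(elem.key)                 # foo
print(registry.get('blah'))     # None
```

The cost is the redundancy the original poster objected to: the key is stored both in the element and as the dict key.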
From tjreedy at udel.edu Fri Apr 3 14:26:02 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 03 Apr 2009 08:26:02 -0400 Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: <49D5FBE6.6090807@avl.com> References: <49D5FBE6.6090807@avl.com> Message-ID: Hrvoje Niksic wrote: > I've stumbled upon an oddity using sets. It's trivial to test if a > value is in the set, but it appears to be impossible to retrieve a > stored value, Set elements, by definition, do not have keys or position by which to grab. When they do, use a dict or list. > other than by iterating over the whole set. Let me > describe a concrete use case. > > Imagine a set of objects identified by some piece of information, such > as a "key" slot (guaranteed to be constant for any particular element). > The object could look like this: > > class Element(object): > def __init__(self, key): > self.key = key > def __eq__(self, other): > return self.key == other > def __hash__(self): > return hash(self.key) > # ... > > Now imagine a set "s" of such objects. I can add them to the set: > > >>> s = set() > >>> s.add(Element('foo')) > >>> s.add(Element('bar')) > > I can test membership using the keys: > > >>> 'foo' in s > True > >>> 'blah' in s > False > > But I can't seem to find a way to retrieve the element corresponding to > 'foo', at least not without iterating over the entire set. Is this an > oversight or an intentional feature? Or am I just missing an obvious > way to do this? Use a dict, like you did. > > I know I can work around this by changing the set of elements to a dict > that maps key -> element, but this feels unsatisfactory. Sorry, that is the right way. > It's > redundant, as the element already contains all the necessary > information, Records in a database have all the information of the record, but we still put out fields for indexes. 
tjr

From olemis at gmail.com Fri Apr 3 14:41:52 2009 From: olemis at gmail.com (Olemis Lang) Date: Fri, 3 Apr 2009 07:41:52 -0500 Subject: [Python-Dev] unittest package In-Reply-To: <55E8EAA0-868C-4AEA-B0AE-7DB85F66B348@python.org> References: <49D534FF.60901@voidspace.org.uk> <55E8EAA0-868C-4AEA-B0AE-7DB85F66B348@python.org> Message-ID: <24ea26600904030541p41dd7bb9w11fb00c26cce0948@mail.gmail.com>

On Thu, Apr 2, 2009 at 6:07 PM, Barry Warsaw wrote: > On Apr 2, 2009, at 4:58 PM, Michael Foord wrote: > >> The unittest module is around 1500 lines of code now, and the tests are >> 3000 lines. >> >> It would be much easier to maintain as a package rather than a module. >> Shall I work on a suggested structure or are there objections in principle? > > +1/jfdi :) >

I remember that something like this was discussed some time ago ... perhaps the ideas mentioned that time might be valuable ... AFAICR somebody provided an example ... ;)

+1 for unittest as a package ... BTW ...

Q: Does it mean that there will be subpkgs for specific (... yet standard ...) pkgs ?

If this is the case, and there is a space for a unittest.doctest pkg (... or whatever ... the name may be different ;) ... and inclusion is Ok ... and so on ... I wonder ...

Q: Is it possible that dutest module [1]_ be considered ... to live in stdlib ... ?

The module integrates doctest + unittest ... without needing a plugin architecture or anything like that, just unittest + doctest... (... in fact, sometimes I don't really get the idea for having plugins in testing frameworks for what can be done following unittest philosophy ... but anyway ... this is a long OT thread ... and I don't even think to continue ... was just a brief comment ...)

Classes
=====
- DocTestLoader allows one to load (using unittest-style) TestCases which check the match made for doctests.
It provides integration with TestProgram, supports building complex TestSuites in a more natural way, and eases the use of specialized instances of TestCases built out of doctest examples.
- A few classes so as to allow reporting the individual results of each and every interactive example executed during the test run. A separate entry is created in the corresponding TestResult instance containing the expected value and the actual result.
- PackageTestLoader class (acting as a decorator ... design pattern ;) loads all the tests found throughout a package hierarchy using another loader. The latter is used to retrieve the tests found in modules matching a specified pattern.
- dutest.main is an alias for dutest.VerboseTestProgram. This class fixes a minor bug (... IMO) I found while specifying different verbosity levels from the command line to unittest.TestProgram.

These are the classes right now, but some others (e.g. DocTestScriptLoader ... to load doctests out of test scripts ...) might be helpful as well ... ;o)

Download from PyPI
===============
dutest-0.2.2.win32.exe    MS Windows installer    any    76KB    28
dutest-0.2.2-py2.5.egg    Python Egg              2.5    17KB    93
dutest-0.2.2.zip          Source                  any    13KB    47

PS: Random thoughts ...

.. [1] dutest 0.2.2 (http://pypi.python.org/pypi/dutest)
.. [2] "Doctest and unittest... now they'll live happily together", O. Lang (2008) The Python Papers, Volume 3, Issue 1, pp. 31:51 (http://ojs.pythonpapers.org/index.php/tpp/article/view/56/51)

-- Regards, Olemis. Blog ES: http://simelo-es.blogspot.com/ Blog EN: http://simelo-en.blogspot.com/ Featured article: No me gustan los templates de Django ...
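The standard library already ships a minimal form of the doctest/unittest bridge discussed above; a rough sketch using doctest.DocTestSuite (the inline throwaway module is purely illustrative -- in practice you would pass a real imported module):

```python
import doctest
import types
import unittest

# Build a throwaway module whose docstring holds interactive examples.
sample = types.ModuleType('sample')
sample.__doc__ = """
>>> 1 + 1
2
>>> sorted(['b', 'a'])
['a', 'b']
"""

# DocTestSuite wraps the module's doctests as unittest test cases;
# all examples in one docstring become a single test case.
suite = doctest.DocTestSuite(sample)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, result.wasSuccessful())   # 1 True
```

This covers loading and running, but not the per-example result reporting that dutest's specialized TestResult entries aim to provide.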
From olemis at gmail.com Fri Apr 3 14:56:19 2009 From: olemis at gmail.com (Olemis Lang) Date: Fri, 3 Apr 2009 07:56:19 -0500 Subject: [Python-Dev] Package Management - thoughts from the peanut gallery In-Reply-To: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org> <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> Message-ID: <24ea26600904030556o79b82a68le326ddd934806135@mail.gmail.com>

2009/4/3 Tarek Ziadé :
> Guys,
>
> The tasks discussed so far are:
>
> - version definition (http://wiki.python.org/moin/DistutilsVersionFight)
> - egg.info standardification (PEP 376)
> - metadata enhancement (rewrite PEP 345)
> - static metadata definition work (*)

Looks fine ... and very useful ... ;)

> - creation of a network of OS packager people
> - PyPI mirroring (PEP 381)
>

Wow !

BTW ... I see nothing about removing dist_* commands from distutils ...

Q: Am I wrong or it seems they will remain in stdlib ?

-- Regards, Olemis. Blog ES: http://simelo-es.blogspot.com/ Blog EN: http://simelo-en.blogspot.com/ Featured article: Comandos : Pipe Viewer ... ¿Qué está pasando por esta tubería?
Ok, beware that what I am writing here is for the long term. There are no plans yet to remove things right now. Maybe some things for 3.1, as long as it is clearly defined and non-controversial. And this is not the most urgent thing to take care of.

So, some commands are not really used by the OS packagers, either because these commands don't provide what packagers need, or because they are unable to let the packagers configure them the way they would like to. Packagers still need to tell us why and how to make things better. Some people like Toshio or Matthias are already helping a lot on this. We are making a lot of progress since the summit to share our points of view. So I'd put this task under "creation of a network of OS packager people" (them + others). And in detail:

1/ define with them the precise usage of Distutils commands in each OS community
2/ define if there's a leading project that could take care of building OS-dependent packages, using packages built by/with Distutils
4/ see what needs to be done in Distutils to let these projects play with Python packages without pain.
5/ finally, see what could be externalized/removed from Distutils in favor of these third-party projects.

This is roughly what Guido was talking about when he said we would remove things like bdist_rpm from the stdlib: it's too OS-specific for the stdlib to do a good job in this area.

To discuss this plan in detail, let's move to Distutils-SIG

Cheers Tarek

-- Tarek Ziadé
| Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

From steve at pearwood.info Fri Apr 3 16:57:21 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 4 Apr 2009 01:57:21 +1100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>
References: <49D5FBE6.6090807@avl.com>
	<79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>
Message-ID: <200904040157.21938.steve@pearwood.info>

On Fri, 3 Apr 2009 11:22:02 pm Paul Moore wrote:

> I'd say that you're abusing __eq__ here. If you can say "x in s"
> and then can't use x as if it were the actual item inserted into
> s, then are they really "equal"?

That's hardly unusual in Python.

>>> alist = [0, 1, 2, 3, 4]
>>> 3.0 in alist
True
>>> alist[3.0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers

Besides, there's a concrete use-case for retrieving the actual object
inside the set. You can ensure that you only have one instance of any
object with a particular value by using a cache like this:

_cache = {}
def cache(obj):
    if obj in _cache:
        return _cache[obj]
    _cache[obj] = obj
    return obj

Arguably, it would be neater if the cache was a set rather than a dict,
thus saving one pointer per item, but of course that would rely on a
change on set behaviour.

--
Steven D'Aprano

From solipsis at pitrou.net Fri Apr 3 17:07:28 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 3 Apr 2009 15:07:28 +0000 (UTC)
Subject: [Python-Dev] Getting values stored inside sets
References: <49D5FBE6.6090807@avl.com>
	<79990c6b0904030522v5f5b52b5y94473c4c439d91a8@mail.gmail.com>
	<200904040157.21938.steve@pearwood.info>
Message-ID: 

Steven D'Aprano <steve at pearwood.info> writes:
>
> That's hardly unusual in Python.
>
> >>> alist = [0, 1, 2, 3, 4]
> >>> 3.0 in alist
> True
> >>> alist[3.0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: list indices must be integers

Your example is wrong:

>>> alist = [0, 1, 2, 3, 4]
>>> alist.index(3.0)
3
>>> alist[alist.index(3.0)]
3

Regards

Antoine.

From steve at pearwood.info Fri Apr 3 17:41:25 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 4 Apr 2009 02:41:25 +1100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: 
References: <49D5FBE6.6090807@avl.com>
	<200904040157.21938.steve@pearwood.info>
Message-ID: <200904040241.25758.steve@pearwood.info>

On Sat, 4 Apr 2009 02:07:28 am Antoine Pitrou wrote:
> Your example is wrong:

Of course it is. The perils of posting at 2am, sorry.

Nevertheless, the principle still holds. There's nothing in Python that
prohibits two objects from being equal without being interchangeable.
As poorly written as my example was, it still holds: I just need to add
a level of indirection.

>>> alist = [100, 111, 102, 103, 105, 104, 106, 108]
>>> indices_of_odd_numbers = [alist.index(n) for n in alist if n%2]
>>> if Decimal('3') in indices_of_odd_numbers:
...     print alist[Decimal('3')]
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: list indices must be integers

Python does not promise that if x == y, you can use y anywhere you can
use x. Nor should it. Paul's declaration of abuse of __eq__ is
unfounded.
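The dict-based cache Steven posted earlier in this thread is exactly what makes "retrieve the element stored in the set" possible today: a dict mapping each object to itself hands back the canonical stored instance even when the lookup key is merely equal, not identical. A minimal sketch (the helper name `intern_obj` is illustrative, not from the thread):

```python
# Sketch of the interning cache discussed above: the dict maps each
# object to itself, so a lookup with an *equal* key returns the
# canonical stored instance -- the behaviour people want from sets.
_cache = {}

def intern_obj(obj):
    """Return the stored instance equal to obj, storing obj if it is new."""
    try:
        return _cache[obj]
    except KeyError:
        _cache[obj] = obj
        return obj

a = (1, 2, 3)
b = tuple([1, 2, 3])        # equal to a, but built separately: a distinct object
assert b is not a
assert intern_obj(a) is a
assert intern_obj(b) is a   # the stored instance comes back, not b
```

This costs one extra pointer per item compared with a hypothetical set-based lookup, which is precisely the trade-off Steven notes.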
--
Steven D'Aprano

From srittau at jroger.in-berlin.de Fri Apr 3 17:45:42 2009
From: srittau at jroger.in-berlin.de (Sebastian Rittau)
Date: Fri, 3 Apr 2009 17:45:42 +0200
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D5FBE6.6090807@avl.com>
References: <49D5FBE6.6090807@avl.com>
Message-ID: <20090403154541.GA6881@jroger.in-berlin.de>

Hello,

On Fri, Apr 03, 2009 at 02:07:02PM +0200, Hrvoje Niksic wrote:

> But I can't seem to find a way to retrieve the element corresponding to
> 'foo', at least not without iterating over the entire set. Is this an
> oversight or an intentional feature? Or am I just missing an obvious
> way to do this?

I am missing a simple way to retrieve the "first" element of any
iterable in python that matches a certain condition anyway. Something
like this:

  def first(iter, cb):
      for el in iter:
          if cb(el):
              return el
      raise IndexError()

Or (shorter, but potentially slower):

  def first(iter, cb):
      return [el for el in iter if cb(el)][0]

To be used like this:

  my_el = first(my_set, lambda el: el == "foobar")

This is something I need from time to time and this also seems to solve
your problem.

 - Sebastian

From thomas at python.org Fri Apr 3 18:06:17 2009
From: thomas at python.org (Thomas Wouters)
Date: Fri, 3 Apr 2009 18:06:17 +0200
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: 
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
Message-ID: <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>

On Fri, Apr 3, 2009 at 11:27, Antoine Pitrou wrote:

> Thomas Wouters <thomas at python.org> writes:
> >
> > Pystone is pretty much a useless benchmark. If it measures anything,
> > it's the speed of the bytecode dispatcher (and it doesn't measure it
> > particularly well.) PyBench isn't any better, in my experience.
>
> I don't think pybench is useless.
> It gives a lot of performance data about crucial internal operations
> of the interpreter. It is of course very little real-world, but
> conversely makes you know immediately where a performance regression
> has happened. (by contrast, if you witness a regression in a
> high-level benchmark, you still have a lot of investigation to do to
> find out where exactly something bad happened)

Really? Have you tried it? I get at least 5% noise between runs without
any changes. I have gotten results that include *negative* run times.
And yes, I tried all the different settings for calibration runs and
timing mechanisms. The tests in PyBench are not micro-benchmarks (they
do way too much for that), they don't try to minimize overhead or
noise, but they are also not representative of real-world code. That
doesn't just mean "you can't infer the affected operation from the test
name", but "you can't infer anything." You can just be looking at
differently borrowed runtime.

I have in the past written patches to Python that improved *every*
micro-benchmark and *every* real-world measurement I made, except
PyBench. Trying to pinpoint the slowdown invariably lead to tests that
did too much in the measurement loop, introduced too much noise in the
"calibration" run or just spent their time *in the measurement loop* on
doing setup and teardown of the test. Collin and Jeffrey have seen the
exact same thing since starting work on Unladen Swallow.

So, sure, it might be "useful" if you have 10% or more difference
across the board, and if you don't have access to anything but pybench
and pystone.

> Perhaps someone should start maintaining a suite of benchmarks,
> high-level and low-level; we currently have them all scattered around
> (pybench, pystone, stringbench, richard, iobench, and the various
> Unladen Swallow benchmarks; not to mention other third-party stuff
> that can be found in e.g. the Computer Language Shootout).
That's exactly what Collin proposed at the summits last week. Have you
seen http://code.google.com/p/unladen-swallow/wiki/Benchmarks ? Please
feel free to suggest more benchmarks to add :)

--
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help
me spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From amauryfa at gmail.com Fri Apr 3 18:07:29 2009
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 3 Apr 2009 18:07:29 +0200
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <20090403154541.GA6881@jroger.in-berlin.de>
References: <49D5FBE6.6090807@avl.com>
	<20090403154541.GA6881@jroger.in-berlin.de>
Message-ID: 

Hi,

On Fri, Apr 3, 2009 at 17:45, Sebastian Rittau wrote:
> I am missing a simple way to retrieve the "first" element of any
> iterable in python that matches a certain condition anyway. Something
> like this:
>
>   def first(iter, cb):
>       for el in iter:
>           if cb(el):
>               return el
>       raise IndexError()
>
> Or (shorter, but potentially slower):
>
>   def first(iter, cb):
>       return [el for el in iter if cb(el)][0]
>
> To be used like this:
>
>   my_el = first(my_set, lambda el: el == "foobar")
>
> This is something I need from time to time and this also seems to solve
> your problem.

def first(iter, cb):
    return itertools.ifilter(cb, iter).next()

--
Amaury Forgeot d'Arc

From chris at simplistix.co.uk Fri Apr 3 18:08:05 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 03 Apr 2009 17:08:05 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: 
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com>
	<49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk>
	<49D52C5B.7010506@simplistix.co.uk>
Message-ID: <49D63465.80401@simplistix.co.uk>

Guido van Rossum wrote:
>>> But anyways this is moot, the bug was only about exec in a class body
>>> *nested inside a function*.
>> Indeed, I just hate seeing execs and it was an interesting mental exercise
>> to try and get rid of the above one ;-)
>>
>> Assuming it breaks no tests, would there be objection to me committing the
>> above change to the Python 3 trunk?
>
> That's up to Benjamin. Personally, I live by "if it ain't broke, don't
> fix it." :-)

Anything using an exec is broken by definition ;-)

Benjamin?

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From olemis at gmail.com Fri Apr 3 18:16:47 2009
From: olemis at gmail.com (Olemis Lang)
Date: Fri, 3 Apr 2009 11:16:47 -0500
Subject: [Python-Dev] Package Management - thoughts from the peanut gallery
In-Reply-To: <94bdd2610904030636q4089bcban635c32e5eaac1d6d@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
	<24ea26600904030556o79b82a68le326ddd934806135@mail.gmail.com>
	<94bdd2610904030636q4089bcban635c32e5eaac1d6d@mail.gmail.com>
Message-ID: <24ea26600904030916y8b8d6aeqbda28fe6a481a7b5@mail.gmail.com>

On Fri, Apr 3, 2009 at 8:36 AM, Tarek Ziadé wrote:
> On Fri, Apr 3, 2009 at 2:56 PM, Olemis Lang wrote:
>>
>> BTW ... I see nothing about removing dist_* commands from distutils ...
>>
>> Q: Am I wrong or it seems they will remain in stdlib ?
>
> This is roughly what Guido was talking about when he said we would
> remove things like bdist_rpm
> from the stdlib : it's too OS-specific for the stdlib to do a good job
> in this area.
>
> To discuss this plan in details, let's move to Distutils-SIG
>

understood ... ;)

--
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
Comandos : Pipe Viewer ... ¿Qué está pasando por esta tubería?
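Returning to the "Getting values stored inside sets" thread above: Sebastian's two `first()` variants and Amaury's `itertools.ifilter` one-liner can be folded into a single lazy helper. A sketch only — the optional-default handling is my addition, not something proposed in the thread:

```python
# Lazy "first matching element" helper, combining the loop version
# (stops at the first hit) with an optional default instead of
# unconditionally raising.
_MISSING = object()

def first(iterable, pred, default=_MISSING):
    """Return the first element of iterable for which pred(el) is true."""
    for el in iterable:
        if pred(el):
            return el             # stop as soon as a match is found
    if default is _MISSING:
        raise IndexError("no matching element")
    return default

assert first([1, 4, 6], lambda x: x % 2 == 0) == 4
assert first([1, 3, 5], lambda x: x % 2 == 0, default=None) is None
```

In Python 3 the same lazy behaviour is available as `next(filter(pred, iterable))`, since the builtin `filter` returns an iterator.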
From collinw at gmail.com Fri Apr 3 18:19:29 2009
From: collinw at gmail.com (Collin Winter)
Date: Fri, 3 Apr 2009 09:19:29 -0700
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: 
References: <49D3F8D0.8070805@wingware.com>
	<43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com>
	<49D42013.3010600@wingware.com>
	<9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com>
Message-ID: <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com>

On Fri, Apr 3, 2009 at 2:27 AM, Antoine Pitrou wrote:
> Thomas Wouters <thomas at python.org> writes:
>>
>> Pystone is pretty much a useless benchmark. If it measures anything,
>> it's the speed of the bytecode dispatcher (and it doesn't measure it
>> particularly well.) PyBench isn't any better, in my experience.
>
> I don't think pybench is useless. It gives a lot of performance data about
> crucial internal operations of the interpreter. It is of course very little
> real-world, but conversely makes you know immediately where a performance
> regression has happened. (by contrast, if you witness a regression in a
> high-level benchmark, you still have a lot of investigation to do to find out
> where exactly something bad happened)
>
> Perhaps someone should start maintaining a suite of benchmarks, high-level and
> low-level; we currently have them all scattered around (pybench, pystone,
> stringbench, richard, iobench, and the various Unladen Swallow benchmarks; not
> to mention other third-party stuff that can be found in e.g. the Computer
> Language Shootout).

Already in the works :) As part of the common standard library and test
suite that we agreed on at the PyCon language summit last week, we're
going to include a common benchmark suite that all Python
implementations can share. This is still some months off, though, so
there'll be plenty of time to bikeshed^Wrationally discuss which
benchmarks should go in there.
Collin

From chris at simplistix.co.uk Fri Apr 3 18:20:36 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 03 Apr 2009 17:20:36 +0100
Subject: [Python-Dev] Package Management - thoughts from the peanut gallery
In-Reply-To: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org>
	<94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com>
Message-ID: <49D63754.6030601@simplistix.co.uk>

Tarek Ziadé wrote:
> I have taken the commitment to lead these tasks and synchronize the people
> that are willing to help on this.

Good, I'm one of those people, sadly my only help may be to ask "how is
this bit going to be done?".

> The tasks discussed so far are:
>
> - version definition (http://wiki.python.org/moin/DistutilsVersionFight)
> - egg.info standardification (PEP 376)
> - metadata enhancement (rewrite PEP 345)
> - static metadata definition work (*)

These all seem to be a subset of the last one, right?

> - creation of a network of OS packager people

This would be useful...

> - PyPI mirroring (PEP 381)

I don't see why PyPI isn't just ported to GAE with an S3 data storage
bit and be done with it... Offline mirrors for people behind firewalls
already have solutions out there...

> Each one of this task has a leader, except the one with (*). I just got back
> from travelling, and I will reorganize
> http://wiki.python.org/moin/Distutils asap so it is up-to-date.

Cool, is this the focal point to track your activities?

> If you want to work on one of this task or feel there's a new task you can
> start, please, join Distutils SIG or contact me,

Well, I think my "big list" breaks down roughly as tasks, of which I
think the stuff you're already doing will hopefully take care of the
first 2, but what about the rest.
If labour shortage is all that's stopping this, then let me know ;-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From martin at v.loewis.de Fri Apr 3 18:21:27 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 03 Apr 2009 18:21:27 +0200 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: <49D63787.3080304@v.loewis.de> > I think it's worse to give the poor guy the run around > by making him run lots of random benchmarks. "the poor guy" works for Wingware (a company you may have heard of) and has contributed to Python at several occasions. His name is John Ehresmann. > In the end, someone will run a timeit or have a specific > case that shows the full effect. All of the respondents so far seem to > have a clear intuition that hook is right in the middle of a critical > path. Their intuition matches what I learned by spending a month > trying to find ways to optimize dictionaries. Ok, so add me as a respondent who thinks that this deserves to be added despite being in the critical path. I doubt it will be noticeable in practice. > Am surprised that there has been no discussion of why this should be in > the default build (as opposed to a compile time option). Because, as a compile time option, it will be useless. It's not targeted for people who want to work on the Python VM (who are the primary users of compile time options), but for people developing Python applications. > AFAICT, users have not previously requested a hook like this. That's because debugging Python in general is in a sad state (which, in turn, is because you can get very far with just print calls). 
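The debugger hook being debated here is easiest to picture at the Python level. A minimal sketch of a per-dict write hook follows — only an illustration of the watchpoint idea, not the proposed C-level `PyDict_SetItem` hook, which would also catch writes to real namespaces that never go through a subclass:

```python
# Illustration of a dict write hook: TracingDict records every
# assignment before delegating to the real dict.  A debugger's
# watchpoint would inspect (key, value) here instead of appending.
class TracingDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.events = []

    def __setitem__(self, key, value):
        self.events.append((key, value))   # the "hook": observe every write
        super().__setitem__(key, value)

ns = TracingDict()
ns["x"] = 1
ns["x"] = 2
assert ns["x"] == 2
assert ns.events == [("x", 1), ("x", 2)]
```

The limitation of this subclass approach is exactly the motivation for the C-level hook: module and instance namespaces are plain dicts created by the interpreter, so a debugger cannot substitute a subclass for them.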
> Also, there has been no discussion for an overall strategy
> for monitoring containers in general. Lists and tuples will
> both defy this approach because there is so much code
> that accesses the arrays directly.

Dicts are special because they are used to implement namespaces.
Watchpoints are an incredibly useful debugging aid.

> Am not sure whether the
> setitem hook would work for other implementations either.

I can't see why it shouldn't.

> If my thoughts on the subject bug you, I'll happily
> withdraw from the thread. I don't aspire to be a
> source of negativity. I just happen to think this proposal isn't a good
> idea.

As somebody who has worked a lot on performance, I'm puzzled how easily
you judge the performance impact of a patch without having seen any
benchmarks. If I have learned anything about performance, it is this:
never guess the performance aspects of code without benchmarking.

Regards,
Martin

From status at bugs.python.org Fri Apr 3 18:06:59 2009
From: status at bugs.python.org (Python tracker)
Date: Fri, 3 Apr 2009 18:06:59 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20090403160659.1AB6278590@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (03/27/09 - 04/03/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue
number.  Do NOT respond to this message.

2272 open (+59) / 15229 closed (+39) / 17501 total (+98)

Open issues with patches:  857

Average duration of open issues: 647 days.
Median duration of open issues: 388 days.
Open Issues Breakdown open 2222 (+57) pending 50 ( +2) Issues Created Or Reopened (103) ________________________________ mmap.move crashes by integer overflow 03/31/09 CLOSED http://bugs.python.org/issue5387 reopened ocean-city patch Issue in transparency in top level tk window(python) on MAC 03/28/09 CLOSED http://bugs.python.org/issue5569 reopened YMohan unqualified exec in class body 04/01/09 http://bugs.python.org/issue5578 reopened loewis patch abc.abstractproperty() docs list fget as required; fget is not r 03/27/09 CLOSED http://bugs.python.org/issue5581 created Devin Jeanpierre Incorrect DST transition on Windows 03/27/09 http://bugs.python.org/issue5582 created acummings Optional extensions in setup.py 03/28/09 CLOSED http://bugs.python.org/issue5583 created georg.brandl patch json.loads(u'3.14') fails unexpectedly (minor scanner bug) 03/28/09 http://bugs.python.org/issue5584 created bob.ippolito easy implement initializer for multiprocessing.BaseManager.start() 03/28/09 CLOSED http://bugs.python.org/issue5585 created lekma patch, needs review The documentation of os.makedirs is misleading 03/28/09 http://bugs.python.org/issue5586 created mher patch vars() no longer has a use __repr__ 03/28/09 http://bugs.python.org/issue5587 created rhettinger Add --randseed to regrtest 03/28/09 CLOSED http://bugs.python.org/issue5588 created collinwinter patch, needs review Wrong dump of floats 03/28/09 CLOSED http://bugs.python.org/issue5589 created stein pyexpat defines global symbol template_string 03/28/09 http://bugs.python.org/issue5590 created doko global symbols in shared libpython not prefixed with Py or _Py 03/28/09 CLOSED http://bugs.python.org/issue5591 created doko Modules/_textio.c defines global symbol encodefuncs 03/28/09 CLOSED http://bugs.python.org/issue5592 created doko test_math.testFsum failure on release30-maint 03/29/09 http://bugs.python.org/issue5593 reopened pitrou IDLE startup configuration 03/29/09 http://bugs.python.org/issue5594 created 
mark os.path.ismount (ntpath) gives UnboundLocalError for any input 03/29/09 CLOSED http://bugs.python.org/issue5595 created mnewman memory leaks in 3.1 03/29/09 http://bugs.python.org/issue5596 created pitrou patch inspect.formatargspec crashes on missing kwonlydefaults 03/29/09 CLOSED http://bugs.python.org/issue5597 created petr.dolezal "paths" argument missing in DocFileSuite documentation 03/29/09 CLOSED http://bugs.python.org/issue5598 created harobed test_email_codecs is skipped because it fails to import TestSkip 03/30/09 CLOSED http://bugs.python.org/issue5599 created r.david.murray easy Slight inaccuracy in webbrowser documentation 03/30/09 CLOSED http://bugs.python.org/issue5600 created MLModel webbrowser doesn't just open browsers 03/30/09 http://bugs.python.org/issue5601 created MLModel Slight punctuation problem in documentation of urllib.request.ur 03/30/09 CLOSED http://bugs.python.org/issue5602 created MLModel Garbled sentence in documentation of urllib.request.urlopen 03/30/09 CLOSED http://bugs.python.org/issue5603 created MLModel imp.find_module() mixes UTF8 and MBCS 03/30/09 CLOSED http://bugs.python.org/issue5604 created gvanrossum Don't assume that repr of literal dicts are sorted like pprint s 03/30/09 CLOSED http://bugs.python.org/issue5605 created fwierzbicki The makefile dependencies listing formatter.h are wrong 03/30/09 http://bugs.python.org/issue5606 created stutzbach patch Lib/distutils/test/test_util: test_get_platform bogus for OSX 03/30/09 http://bugs.python.org/issue5607 created ronaldoussoren Add python.exe to the path in windows? 
03/30/09 CLOSED http://bugs.python.org/issue5608 created twillis Create Unit Tests for nturl2path module 03/30/09 http://bugs.python.org/issue5609 created Kozyarchuk email feedparser.py CRLFLF bug: $ vs \Z 03/30/09 http://bugs.python.org/issue5610 created tony_nelson patch Auto-detect indentation in C source in vimrc 03/30/09 http://bugs.python.org/issue5611 created KirkMcDonald patch whitespace folding in the email package could be better ;-) 03/30/09 http://bugs.python.org/issue5612 created cjw296 test_posix.py and test_wait4.py having missing import on win32 03/30/09 CLOSED http://bugs.python.org/issue5613 created tdriscol Malloc errors in test_io 03/30/09 http://bugs.python.org/issue5614 created ronaldoussoren linking fails when configured --without-threads 03/30/09 http://bugs.python.org/issue5615 created stutzbach patch Distutils 2to3 support doesn't have the doctest_only flag. 03/30/09 http://bugs.python.org/issue5616 created lregebro Unicode printing in gdb post-mortem sessions 03/31/09 CLOSED http://bugs.python.org/issue5617 created dugan PyMemberDef type T_UBYTE incorrectly documtented 03/31/09 CLOSED http://bugs.python.org/issue5618 created briancurtin patch Pass MS CRT debug flags into subprocesses 04/01/09 http://bugs.python.org/issue5619 reopened jnoller The attribute's action of an object is not correct. 
03/31/09 CLOSED http://bugs.python.org/issue5620 created Yong yang Add description of special case to "Assignment statements" secti 03/31/09 http://bugs.python.org/issue5621 created jjposner wrong error from curses.wrapper if curses initialization fails 03/31/09 http://bugs.python.org/issue5622 created nad test_fdopen fails with vs2005, release build on Windows 2000 03/31/09 http://bugs.python.org/issue5623 created amaury.forgeotdarc patch Py3K branch import _winreg instead of winreg 03/31/09 CLOSED http://bugs.python.org/issue5624 created Kozyarchuk test_urllib2 fails - urlopen error file not on local host 03/31/09 http://bugs.python.org/issue5625 created nad misleading comment in socket.gethostname() documentation 03/31/09 http://bugs.python.org/issue5626 created nad PyDict_SetItemString() fails when the second argument is null 03/31/09 CLOSED http://bugs.python.org/issue5627 created eulerto TextIOWrapper fails with SystemError when reading HTTPResponse 04/01/09 http://bugs.python.org/issue5628 reopened orsenthil PEP 0 date and revision not being set 03/31/09 http://bugs.python.org/issue5629 created brett.cannon Update CObject API so it is safe and regular 03/31/09 http://bugs.python.org/issue5630 created lhastings patch Distutils "upload" command does not show up in--help-commands ou 03/31/09 CLOSED http://bugs.python.org/issue5631 created blais Bug - threading.currentThread().ident returns None in main threa 03/31/09 CLOSED http://bugs.python.org/issue5632 created skip.montanaro patch fix for timeit when the statment is a string and the setup is n 03/31/09 http://bugs.python.org/issue5633 created tdriscol cPickle error in case of recursion limit 03/31/09 http://bugs.python.org/issue5634 created bad test_sys reference counting fails while tracing 03/31/09 CLOSED http://bugs.python.org/issue5635 created dugan patch csv.reader next() method missing 04/01/09 CLOSED http://bugs.python.org/issue5636 created tonyjoblin 2to3 does not convert urllib.urlopen to 
urllib.request.urlopen 04/01/09 CLOSED http://bugs.python.org/issue5637 created orsenthil test_httpservers fails CGI tests if --enable-shared 04/01/09 http://bugs.python.org/issue5638 created tony_nelson Support TLS SNI extension in ssl module 04/01/09 http://bugs.python.org/issue5639 created pdp patch Wrong print() result when unicode error handler is not 'strict' 04/01/09 http://bugs.python.org/issue5640 created ishimoto patch Local variables not freed when Exception raises in function call 04/02/09 CLOSED http://bugs.python.org/issue5641 reopened Glin multiprocessing.Pool.map() docs slightly misleading 04/01/09 http://bugs.python.org/issue5642 created jmmcd test__locale fails with RADIXCHAR on Windows 04/01/09 http://bugs.python.org/issue5643 created krisvale test___future__ fails for py3k on Windows 04/01/09 CLOSED http://bugs.python.org/issue5644 created krisvale test_memoryio fails for py3k on windows 04/01/09 http://bugs.python.org/issue5645 created krisvale test_importlib fails for py3k on Windows 04/01/09 http://bugs.python.org/issue5646 created krisvale MutableSet.__iand__ implementation calls self.discard while iter 04/01/09 CLOSED http://bugs.python.org/issue5647 created della OS X Installer: do not install obsolete documentation within Pyt 04/01/09 http://bugs.python.org/issue5648 created nad OS X Installer: only include PythonSystemFixes package if target 04/01/09 http://bugs.python.org/issue5649 created nad Obsolete RFC's should be removed from doc of urllib.urlparse 04/01/09 http://bugs.python.org/issue5650 created MLModel OS X Installer: add checks to ensure proper Tcl configuration du 04/01/09 http://bugs.python.org/issue5651 created nad OS X Installer: remove references to Mac/Tools which no longer e 04/01/09 http://bugs.python.org/issue5652 created nad OS X Installer: by default install versioned-only links in /usr/ 04/01/09 http://bugs.python.org/issue5653 created nad Add C hook in PyDict_SetItem for debuggers 04/01/09 
http://bugs.python.org/issue5654 created jpe patch fix glob.iglob docstring 04/01/09 CLOSED http://bugs.python.org/issue5655 created dsm001 patch Coverage execution fails for files not encoded with utf-8 04/01/09 CLOSED http://bugs.python.org/issue5656 created maru patch bad repr of itertools.count object with negative value on OS X 1 04/01/09 http://bugs.python.org/issue5657 created nad make html in doc fails because Makefile assigns python to PYTHO 04/01/09 CLOSED http://bugs.python.org/issue5658 created MLModel logging.FileHandler encoding parameter does not work as expected 04/01/09 CLOSED http://bugs.python.org/issue5659 created warp Cannot deepcopy unittest.TestCase instances 04/02/09 CLOSED http://bugs.python.org/issue5660 created spiv asyncore should catch EPIPE while sending() and receiving() 04/02/09 http://bugs.python.org/issue5661 created giampaolo.rodola patch py3k interpreter leak 04/02/09 CLOSED http://bugs.python.org/issue5662 created quiver Better failure messages for unittest assertions 04/02/09 CLOSED http://bugs.python.org/issue5663 created michael.foord patch 2to3 wont convert Cookie.Cookie properly 04/02/09 http://bugs.python.org/issue5664 created orsenthil Add more pickling tests 04/02/09 http://bugs.python.org/issue5665 created collinwinter patch, needs review Py_BuildValue("c") should return bytes? 
04/02/09 http://bugs.python.org/issue5666 created ocean-city patch Interpreter fails to initialize on build dir when IO encoding is 04/02/09 http://bugs.python.org/issue5667 created hyeshik.chang file "" on disk creates garbage output in stack trace 04/02/09 http://bugs.python.org/issue5668 created zbysz Extra heapq nlargest/nsmallest option for including ties 04/02/09 CLOSED http://bugs.python.org/issue5669 reopened gsakkis Speed up pickling of dicts in cPickle 04/02/09 http://bugs.python.org/issue5670 created collinwinter patch, needs review Speed up pickling of lists in cPickle 04/02/09 http://bugs.python.org/issue5671 created collinwinter patch, needs review Implement a way to change the python process name 04/02/09 http://bugs.python.org/issue5672 created marcelo_fernandez Add timeout option to subprocess.Popen 04/02/09 http://bugs.python.org/issue5673 created rnk patch distutils fails to find Linux libs (lib.....so.n) 04/02/09 http://bugs.python.org/issue5674 created jgarrison string module requires bytes type for maketrans, but calling met 04/02/09 http://bugs.python.org/issue5675 created MechPaul Fix "make clean" in py3k/trunk 04/03/09 http://bugs.python.org/issue5676 created lhastings patch Serious interpreter crash and/or arbitrary memory leak using .re 04/03/09 http://bugs.python.org/issue5677 created nneonneo typo in future_builtins documentation 04/03/09 http://bugs.python.org/issue5678 created fredreichbier Repair or Change installation error 03/30/09 http://bugs.python.org/issue1565509 reopened ghazel Speed up using + for string concatenation 03/30/09 http://bugs.python.org/issue1569040 reopened benjamin.peterson patch Issues Now Closed (233) _______________________ Get rid of more references to __cmp__ 166 days http://bugs.python.org/issue1717 georg.brandl patch email.MIMEText.MIMEText.as_string incorrectly folding long subje 426 days http://bugs.python.org/issue1974 cjw296 asynchat push always sends 512 bytes (ignoring ac_out_buffer_siz 414 days 
http://bugs.python.org/issue2073 josiahcarlson time.strptime too strict? should it assume current year? 394 days http://bugs.python.org/issue2227 brett.cannon patch, easy Missing documentation about old/new-style classes 386 days http://bugs.python.org/issue2266 georg.brandl Using an iteration variable outside a list comprehension needs a 379 days http://bugs.python.org/issue2344 jhylton 26backport Backport memoryview object to Python 2.7 380 days http://bugs.python.org/issue2396 pitrou patch regrtest should not just skip imports that fail 378 days http://bugs.python.org/issue2409 r.david.murray patch stdbool support 366 days http://bugs.python.org/issue2497 r.david.murray patch locale.format() problems with decimal separator 365 days http://bugs.python.org/issue2522 r.david.murray patch Seconds range in time unit 360 days http://bugs.python.org/issue2568 r.david.murray patch, easy cmd.py should track input file objects so macros with submacros 357 days http://bugs.python.org/issue2577 rickbking ErrorHandler buffer overflow in ?unused? SGI extension module al 354 days http://bugs.python.org/issue2591 gvanrossum alp_ReadFrames() integer overflow leads to buffer overflow 355 days http://bugs.python.org/issue2593 r.david.murray alp_readsamps() overflow leads to memory corruption in ?unused? 355 days http://bugs.python.org/issue2594 r.david.murray allow field_name in format strings to default to next positional 354 days http://bugs.python.org/issue2599 r.david.murray mailbox.MH.get_message() treats result of get_sequences() as lis 355 days http://bugs.python.org/issue2625 r.david.murray patch, easy Mac version of IDLE doesn't scroll as expected 330 days http://bugs.python.org/issue2754 ronaldoussoren IDLE ignores module change before restart 330 days http://bugs.python.org/issue2755 tjreedy pydoc doesnt show 'from module import identifier' in the docs 308 days http://bugs.python.org/issue2966 georg.brandl arguments and default path not set in site.py and sitecustomize. 
308 days http://bugs.python.org/issue2972 brett.cannon Clean up Demos and Tools 293 days http://bugs.python.org/issue3087 brett.cannon easy Multiprocessing package build problem on Solaris 10 291 days http://bugs.python.org/issue3110 jnoller patch Hang when calling get() on an empty queue in the queue module 281 days http://bugs.python.org/issue3138 tazle test_multiprocessing: test_listener_client flakiness 272 days http://bugs.python.org/issue3270 jnoller patch Test failure in test_math::testSum 252 days http://bugs.python.org/issue3421 marketdickinson urllib documentation: urlopen().info() return type 252 days http://bugs.python.org/issue3427 georg.brandl patch Multi-process 2to3 250 days http://bugs.python.org/issue3448 benjamin.peterson patch os.path.normcase documentation/behaviour unclear on Mac OS X 241 days http://bugs.python.org/issue3485 ronaldoussoren patch Missing IDLE Preferences on Mac 231 days http://bugs.python.org/issue3549 kbk multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES i 231 days http://bugs.python.org/issue3551 jnoller patch A more informative message for ImportError 225 days http://bugs.python.org/issue3619 brett.cannon patch Cannot read saved csv file in a single run 216 days http://bugs.python.org/issue3681 gpolo Python 3.0 beta 2 : json and urllib not working together? 
210 days http://bugs.python.org/issue3763 orsenthil test_multiprocessing fails on systems with HAVE_SEM_OPEN=0 210 days http://bugs.python.org/issue3770 jnoller patch, needs review Patch for adding "default" to itemgetter and attrgetter 169 days http://bugs.python.org/issue4124 rhettinger patch Add file comparisons to the unittest library 156 days http://bugs.python.org/issue4217 georg.brandl On some Python builds, exec in a function can't create shadows o 138 days http://bugs.python.org/issue4315 jhylton __mro__ documentation 127 days http://bugs.python.org/issue4411 georg.brandl Given a module hierarchy string 'a.b.c', add an easy way to impo 126 days http://bugs.python.org/issue4438 georg.brandl patch Build / Test Py3K failed on Ubuntu 8.10 117 days http://bugs.python.org/issue4535 benjamin.peterson add SEEK_* values to io and/or io.IOBase 116 days http://bugs.python.org/issue4572 georg.brandl compile() doesn't ignore the source encoding when a string is pa 110 days http://bugs.python.org/issue4626 jmfauth patch, needs review urllib's splitpasswd does not accept newline chars in passwords 104 days http://bugs.python.org/issue4675 orsenthil patch try to build a C module, but don't worry if it doesn't work 101 days http://bugs.python.org/issue4706 tarek exec() behavior - revisited 86 days http://bugs.python.org/issue4831 jhylton MacPython build script uses Carbon and MacOS modules slated for 84 days http://bugs.python.org/issue4848 ronaldoussoren js_output wrong for cookies with " characters 85 days http://bugs.python.org/issue4860 orsenthil patch system wide site-packages dir not used on Mac OS X 82 days http://bugs.python.org/issue4865 ronaldoussoren patch Behavior of backreferences to named groups in regular expression 82 days http://bugs.python.org/issue4882 georg.brandl patch test/regrtest.py contains error on __import__ 82 days http://bugs.python.org/issue4886 r.david.murray email/header.py ecre regular expression issue 71 days 
http://bugs.python.org/issue4958 amaury.forgeotdarc urlparse & nfs url (rfc 2224) 74 days http://bugs.python.org/issue4962 orsenthil patch multiprocessing/pipe_connection.c compiler warning (conn_poll) 71 days http://bugs.python.org/issue5002 jnoller patch Overly general claim about sequence unpacking in tutorial 70 days http://bugs.python.org/issue5018 georg.brandl Adjust reference-counting note 67 days http://bugs.python.org/issue5039 georg.brandl patch Bug of CGIXMLRPCRequestHandler 67 days http://bugs.python.org/issue5040 orsenthil patch Printing Unicode chars from the interpreter in a non-UTF8 termin 61 days http://bugs.python.org/issue5110 ishimoto patch Add combinatoric counting functions to the math module. 58 days http://bugs.python.org/issue5139 rhettinger multiprocessing: SocketListener should use SO_REUSEADDR 51 days http://bugs.python.org/issue5177 jnoller patch optparse doex not export make_option 50 days http://bugs.python.org/issue5190 georg.brandl warns vars() assignment as well as locals() 49 days http://bugs.python.org/issue5199 georg.brandl patch String Formatting with namedtuple 50 days http://bugs.python.org/issue5205 rhettinger urllib2.build_opener( 49 days http://bugs.python.org/issue5208 georg.brandl change value of local variable in debug 50 days http://bugs.python.org/issue5215 georg.brandl patch Py_Main() does not return on sys.exit() 47 days http://bugs.python.org/issue5227 georg.brandl multiprocessing not compatible with functools.partial 47 days http://bugs.python.org/issue5228 jackdied time.strptime should reject bytes arguments on Py3 46 days http://bugs.python.org/issue5236 brett.cannon Change time.strptime() to make it work with Unicode chars 46 days http://bugs.python.org/issue5239 brett.cannon patch Missing flags in the Regex howto 47 days http://bugs.python.org/issue5241 ezio.melotti PyRun_SimpleStringFlags() documentation 46 days http://bugs.python.org/issue5245 georg.brandl with lock fails on multiprocessing 44 days 
http://bugs.python.org/issue5261 jnoller patch OS X installer: faulty Python.app bundle inside of framework 43 days http://bugs.python.org/issue5270 ronaldoussoren OS X installer: build can fail on import checks 43 days http://bugs.python.org/issue5271 ronaldoussoren http client error 37 days http://bugs.python.org/issue5314 jhylton __subclasses__ undocumented 37 days http://bugs.python.org/issue5324 georg.brandl json needs object_pairs_hook 31 days http://bugs.python.org/issue5381 bob.ippolito patch mmap.move crashes by integer overflow 0 days http://bugs.python.org/issue5387 ocean-city patch patches for multiprocessing module on NetBSD 30 days http://bugs.python.org/issue5400 jnoller patch Reference to missing(?) function in Extending & Embedding Docume 27 days http://bugs.python.org/issue5417 georg.brandl Pyshell history management error 27 days http://bugs.python.org/issue5428 kbk test_httpservers on Debian Testing 27 days http://bugs.python.org/issue5435 zamotcr [3.1alpha1] test_importlib fails on Mac OSX 10.5.6 25 days http://bugs.python.org/issue5442 brett.cannon only accept byte for getarg('c') and unicode for getarg('C') 16 days http://bugs.python.org/issue5499 haypo patch Deletion of some statements in re documentation 12 days http://bugs.python.org/issue5519 georg.brandl HTTPRedirectHandler documentation is wrong 12 days http://bugs.python.org/issue5522 georg.brandl execfile() removed from Python3 12 days http://bugs.python.org/issue5524 jhylton patch Backport sys module docs involving import to 2.7 11 days http://bugs.python.org/issue5529 georg.brandl tearDown in unittest should be executed regardless of result in 7 days http://bugs.python.org/issue5538 yaneurabeya needs review "file objects" in python 3 tutorial 9 days http://bugs.python.org/issue5540 georg.brandl multiprocessing: switch to autoconf detection of platform values 9 days http://bugs.python.org/issue5545 jnoller patch In the tutorial, PyMODINIT_FUNC is shown as having a return type 8 days 
http://bugs.python.org/issue5548 georg.brandl Python 3.0.1 Mac OS X install image ReadMe file is incorrect 6 days http://bugs.python.org/issue5558 ronaldoussoren Document bdist_msi 6 days http://bugs.python.org/issue5563 georg.brandl patch os.symlink/os.link docs should say old/new, not src/dst 3 days http://bugs.python.org/issue5564 benjamin.peterson Minor error in document of PyLong_AsSsize_t 5 days http://bugs.python.org/issue5566 georg.brandl Operators in operator module don't work with keyword arguments 6 days http://bugs.python.org/issue5567 rhettinger Issue in transparency in top level tk window(python) on MAC 0 days http://bugs.python.org/issue5569 amaury.forgeotdarc Bus error when calling .poll() on a closed Connection from multi 4 days http://bugs.python.org/issue5570 jnoller new "TestCase.skip" method causes all tests to skip under trial 1 days http://bugs.python.org/issue5571 glyph multiprocessing queues.py doesn't include JoinableQueue in its _ 4 days http://bugs.python.org/issue5574 jnoller yield in iterators 0 days http://bugs.python.org/issue5577 gvanrossum abc.abstractproperty() docs list fget as required; fget is not r 4 days http://bugs.python.org/issue5581 georg.brandl Optional extensions in setup.py 4 days http://bugs.python.org/issue5583 tarek patch implement initializer for multiprocessing.BaseManager.start() 5 days http://bugs.python.org/issue5585 lekma patch, needs review Add --randseed to regrtest 1 days http://bugs.python.org/issue5588 marketdickinson patch, needs review Wrong dump of floats 0 days http://bugs.python.org/issue5589 benjamin.peterson global symbols in shared libpython not prefixed with Py or _Py 1 days http://bugs.python.org/issue5591 pitrou Modules/_textio.c defines global symbol encodefuncs 0 days http://bugs.python.org/issue5592 pitrou os.path.ismount (ntpath) gives UnboundLocalError for any input 0 days http://bugs.python.org/issue5595 benjamin.peterson inspect.formatargspec crashes on missing kwonlydefaults 0 days 
http://bugs.python.org/issue5597 benjamin.peterson "paths" argument missing in DocFileSuite documentation 2 days http://bugs.python.org/issue5598 georg.brandl test_email_codecs is skipped because it fails to import TestSkip 0 days http://bugs.python.org/issue5599 benjamin.peterson easy Slight inaccuracy in webbrowser documentation 0 days http://bugs.python.org/issue5600 benjamin.peterson Slight punctuation problem in documentation of urllib.request.ur 2 days http://bugs.python.org/issue5602 georg.brandl Garbled sentence in documentation of urllib.request.urlopen 2 days http://bugs.python.org/issue5603 georg.brandl imp.find_module() mixes UTF8 and MBCS 3 days http://bugs.python.org/issue5604 asvetlov Don't assume that repr of literal dicts are sorted like pprint s 0 days http://bugs.python.org/issue5605 benjamin.peterson Add python.exe to the path in windows? 0 days http://bugs.python.org/issue5608 loewis test_posix.py and test_wait4.py having missing import on win32 2 days http://bugs.python.org/issue5613 r.david.murray Unicode printing in gdb post-mortem sessions 1 days http://bugs.python.org/issue5617 georg.brandl PyMemberDef type T_UBYTE incorrectly documtented 1 days http://bugs.python.org/issue5618 georg.brandl patch The attribute's action of an object is not correct. 
1 days http://bugs.python.org/issue5620 Yong yang Py3K branch import _winreg instead of winreg 1 days http://bugs.python.org/issue5624 georg.brandl PyDict_SetItemString() fails when the second argument is null 1 days http://bugs.python.org/issue5627 georg.brandl Distutils "upload" command does not show up in--help-commands ou 1 days http://bugs.python.org/issue5631 georg.brandl Bug - threading.currentThread().ident returns None in main threa 0 days http://bugs.python.org/issue5632 benjamin.peterson patch test_sys reference counting fails while tracing 0 days http://bugs.python.org/issue5635 georg.brandl patch csv.reader next() method missing 1 days http://bugs.python.org/issue5636 georg.brandl 2to3 does not convert urllib.urlopen to urllib.request.urlopen 1 days http://bugs.python.org/issue5637 benjamin.peterson Local variables not freed when Exception raises in function call 0 days http://bugs.python.org/issue5641 georg.brandl test___future__ fails for py3k on Windows 0 days http://bugs.python.org/issue5644 benjamin.peterson MutableSet.__iand__ implementation calls self.discard while iter 0 days http://bugs.python.org/issue5647 rhettinger fix glob.iglob docstring 0 days http://bugs.python.org/issue5655 georg.brandl patch Coverage execution fails for files not encoded with utf-8 0 days http://bugs.python.org/issue5656 georg.brandl patch make html in doc fails because Makefile assigns python to PYTHO 1 days http://bugs.python.org/issue5658 MLModel logging.FileHandler encoding parameter does not work as expected 1 days http://bugs.python.org/issue5659 warp Cannot deepcopy unittest.TestCase instances 0 days http://bugs.python.org/issue5660 michael.foord py3k interpreter leak 0 days http://bugs.python.org/issue5662 benjamin.peterson Better failure messages for unittest assertions 0 days http://bugs.python.org/issue5663 michael.foord patch Extra heapq nlargest/nsmallest option for including ties 0 days http://bugs.python.org/issue5669 rhettinger Confusions in 
formatfloat 2566 days http://bugs.python.org/issue532631 marketdickinson Bgen should learn about booleans 2404 days http://bugs.python.org/issue602291 ronaldoussoren More documentation for the imp module 2376 days http://bugs.python.org/issue616247 brett.cannon Imports can deadlock 2233 days http://bugs.python.org/issue689895 brett.cannon Reloading pseudo modules 2213 days http://bugs.python.org/issue701743 brett.cannon patch bgen requires Universal Headers, not OS X dev headers 2072 days http://bugs.python.org/issue779153 ronaldoussoren BasicModuleLoader behaviour in Python 2.3c2 2074 days http://bugs.python.org/issue779191 brett.cannon zipimport on meta_path fails with mutual importers 2060 days http://bugs.python.org/issue787113 brett.cannon imp.find_module doesn't work in /tmp 2022 days http://bugs.python.org/issue809254 brett.cannon cryptic os.spawnvpe() return code 1972 days http://bugs.python.org/issue837577 georg.brandl reload() fails with modules from zips 1942 days http://bugs.python.org/issue856103 brett.cannon Some Carbon modules missing from documentation 1873 days http://bugs.python.org/issue896199 ronaldoussoren easy bundlebuilder: some way to add non-py files in packages 1866 days http://bugs.python.org/issue900502 ronaldoussoren asyncore fixes and improvements 1854 days http://bugs.python.org/issue909005 giampaolo.rodola patch nametowidget throws TypeError for Tcl_Objs 1811 days http://bugs.python.org/issue934418 gpolo List with Canvas.create_line Option arrow=LAST Broke 1799 days http://bugs.python.org/issue941262 gpolo importing dynamic modules via embedded python 1764 days http://bugs.python.org/issue965206 brett.cannon import x.y inside of module x.y 1763 days http://bugs.python.org/issue966431 brett.cannon PyObject_GenericGetAttr is undocumented 1755 days http://bugs.python.org/issue970783 georg.brandl easy Starting a script in OSX within a specific folder 1748 days http://bugs.python.org/issue974159 ronaldoussoren An inconsistency with nested 
scopes 1721 days http://bugs.python.org/issue991196 jhylton exec statement balks at CR/LF 1719 days http://bugs.python.org/issue992207 georg.brandl easy test__locale fails on MacOS X 1699 days http://bugs.python.org/issue1005113 brett.cannon Can't raise "C API version mismatch" warning 1634 days http://bugs.python.org/issue1044382 brett.cannon current directory in sys.path handles symlinks badly 1585 days http://bugs.python.org/issue1074015 brett.cannon Carbon.Res misses GetIndString 1560 days http://bugs.python.org/issue1089399 ronaldoussoren Carbon.File.FSCatalogInfo.createDate implementation 1560 days http://bugs.python.org/issue1089624 ronaldoussoren _AEModule.c patch 1557 days http://bugs.python.org/issue1090958 ronaldoussoren patch sys.__stdout__ doco isn't discouraging enough 1546 days http://bugs.python.org/issue1096310 georg.brandl OSATerminology still semi-broken 1519 days http://bugs.python.org/issue1113328 ronaldoussoren patches to compile for AIX 4.1.x 1509 days http://bugs.python.org/issue1119626 ajaksu2 patch eval does not bind variables in lambda bodies correctly 1492 days http://bugs.python.org/issue1153622 jhylton Neverending warnings from asyncore 1483 days http://bugs.python.org/issue1161031 brett.cannon patch threading.Condition.wait() return value indicates timeout 1457 days http://bugs.python.org/issue1175933 gvanrossum patch, easy allow running multiple instances of IDLE 1417 days http://bugs.python.org/issue1201569 kbk patch 'insufficient disk space' message wrong (msi on win xp pro) 1361 days http://bugs.python.org/issue1234328 ajaksu2 httplib gzip support 1347 days http://bugs.python.org/issue1243678 georg.brandl patch expat binding for XML_ParserReset (Bug #1208730) 1345 days http://bugs.python.org/issue1244208 ajaksu2 patch QuickTime API needs corrected object types 1329 days http://bugs.python.org/issue1254695 ronaldoussoren patch 2.4.1 make fails on Solaris 10 (complexobject.c/HUGE_VAL) 1308 days http://bugs.python.org/issue1276509 
ajaksu2 Incorrect use of -L/usr/lib/termcap 1257 days http://bugs.python.org/issue1332732 ajaksu2 async_chat.push() can trigger handle_error(). undocumented. 1217 days http://bugs.python.org/issue1370380 josiahcarlson minidom namespace problems 1214 days http://bugs.python.org/issue1371937 ajaksu2 _winreg specifies EnvironmentError instead of WindowsError 1197 days http://bugs.python.org/issue1386675 georg.brandl Compile under mingw properly 1162 days http://bugs.python.org/issue1412448 ajaksu2 patch PyImport_AppendInittab stores pointer to parameter 1157 days http://bugs.python.org/issue1419652 brett.cannon Unable to stringify datetime with tzinfo 1114 days http://bugs.python.org/issue1447945 ajaksu2 Hitting CTRL-C while in a loop closes IDLE on cygwin 1083 days http://bugs.python.org/issue1468223 tebeka endless loop in PyCFunction_Fini() 1049 days http://bugs.python.org/issue1488906 ajaksu2 sys.path issue if sys.prefix contains a colon 1021 days http://bugs.python.org/issue1507224 brett.cannon __del__: Type is cleared before instances 1006 days http://bugs.python.org/issue1513802 benjamin.peterson site.py can break the location of the python library 1007 days http://bugs.python.org/issue1514734 brett.cannon fcntl.ioctl fails to copy back exactly-1024 buffer 992 days http://bugs.python.org/issue1520818 ajaksu2 Document additions from PEP 302 987 days http://bugs.python.org/issue1525549 brett.cannon Literal strings use BS as octal escape character 978 days http://bugs.python.org/issue1530012 georg.brandl distutils 'register' command and windows home directories 973 days http://bugs.python.org/issue1531505 ajaksu2 Win32 debug version of _msi creates _msi.pyd, not _msi_d.pyd 968 days http://bugs.python.org/issue1534738 ajaksu2 sys.path gets munged with certain directory structures 969 days http://bugs.python.org/issue1534764 brett.cannon "make install" doesn't install to /usr/lib64 on x86_64 boxes 965 days http://bugs.python.org/issue1536339 ajaksu2 python-2.5c1.msi 
contains ICE validation errors and warnings 955 days http://bugs.python.org/issue1542432 ajaksu2 test_tempfile fails on cygwin 953 days http://bugs.python.org/issue1543467 ajaksu2 Wireless on Python 946 days http://bugs.python.org/issue1547300 ajaksu2 C modules reloaded on certain failed imports 946 days http://bugs.python.org/issue1548687 brett.cannon python 2.5 install can't find tcl/tk in /usr/lib64 936 days http://bugs.python.org/issue1553166 ajaksu2 Class instance apparently not destructed when expected 935 days http://bugs.python.org/issue1553819 ajaksu2 2.5c1 Core dump during 64-bit make on Solaris 9 Sparc 929 days http://bugs.python.org/issue1557490 ajaksu2 strftime('%z') behaving differently with/without time arg. 924 days http://bugs.python.org/issue1560794 ajaksu2 IDLE: Dedent with Italian keyboard 921 days http://bugs.python.org/issue1562092 gpolo importing threading in a thread does not work 921 days http://bugs.python.org/issue1562822 brett.cannon --disable-sunaudiodev --disable-tk does not work 895 days http://bugs.python.org/issue1579029 ajaksu2 Error piping output between scripts on Windows 879 days http://bugs.python.org/issue1590068 georg.brandl import deadlocks when using PyObjC threads 879 days http://bugs.python.org/issue1590864 brett.cannon PyThread_release_lock with pthreads munges errno 849 days http://bugs.python.org/issue1608921 benjamin.peterson patch IDLE crashes on OS X 10.4 when "Preferences" selected 831 days http://bugs.python.org/issue1621111 kbk Please provide rsync-method in the urllib[2] module 810 days http://bugs.python.org/issue1634770 orsenthil MIME renderer: wrong header line break with long subject? 
795 days http://bugs.python.org/issue1645148 barry sgmllib _convert_ref UnicodeDecodeError exception, new in 2.5 786 days http://bugs.python.org/issue1651995 georg.brandl patch thread join() with timeout hangs on Windows 2003 x64 786 days http://bugs.python.org/issue1654429 amaury.forgeotdarc Handle requests to intern string subtype instances 776 days http://bugs.python.org/issue1658799 ajaksu2 patch Calling tparm from extension lib fails in Python 2.5 777 days http://bugs.python.org/issue1659171 ajaksu2 Hangup when using cgitb in a thread while still in import 770 days http://bugs.python.org/issue1665206 brett.cannon add identity function 760 days http://bugs.python.org/issue1673203 rhettinger Make threading.Event().wait(timeout=3) return isSet 757 days http://bugs.python.org/issue1674032 georg.brandl patch Redirect cause invalid descriptor error 756 days http://bugs.python.org/issue1675026 georg.brandl Remove trailing slash from --prefix 755 days http://bugs.python.org/issue1676135 georg.brandl patch PEP 361 Warnings 742 days http://bugs.python.org/issue1683908 ajaksu2 patch python throws an error when unpacking bz2 file 694 days http://bugs.python.org/issue1714773 georg.brandl Destructor behavior faulty 689 days http://bugs.python.org/issue1717900 pitrou Line ending bug SimpleXMLRPCServer 677 days http://bugs.python.org/issue1725295 georg.brandl patch Windows Build Warnings 677 days http://bugs.python.org/issue1726196 amaury.forgeotdarc patch telnetlib: A callback for monitoring the telnet session 666 days http://bugs.python.org/issue1730959 jackdied patch asyncore/asynchat patches 654 days http://bugs.python.org/issue1736190 intgr patch Top Issues Most Discussed (10) ______________________________ 43 additional unittest type equality methods 361 days open http://bugs.python.org/issue2578 21 asyncore delayed calls feature 473 days open http://bugs.python.org/issue1641 15 Pass MS CRT debug flags into subprocesses 2 days open http://bugs.python.org/issue5619 14 
multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES 231 days closed http://bugs.python.org/issue3551 13 implement initializer for multiprocessing.BaseManager.start() 5 days closed http://bugs.python.org/issue5585 13 Neverending warnings from asyncore 1483 days closed http://bugs.python.org/issue1161031 12 Speed up pickling of dicts in cPickle 1 days open http://bugs.python.org/issue5670 12 test_math.testFsum failure on release30-maint 5 days open http://bugs.python.org/issue5593 11 Extra heapq nlargest/nsmallest option for including ties 0 days closed http://bugs.python.org/issue5669 10 PyDict_SetItemString() fails when the second argument is null 1 days closed http://bugs.python.org/issue5627

From martin at v.loewis.de Fri Apr 3 18:35:09 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 03 Apr 2009 18:35:09 +0200
Subject: [Python-Dev] Should the io-c modules be put in their own directory?
In-Reply-To:
References:
Message-ID: <49D63ABD.30000@v.loewis.de>

>> I just noticed that the new io-c modules were merged in the py3k
>> branch (I know, I am kind of late on the news -- blame school work). Anyway,
>> I am just wondering if it would be a good idea to put the io-c modules
>> in a sub-directory (like sqlite), instead of scattering them around in
>> the Modules/ directory.
>
> Welcome back!
>
> I have no particular opinion on this.
I suggest waiting for Benjamin's advice
> and following it :-)

I would suggest to leave it as is:
a) never change a running system
b) flat is better than nested

Martin

From olemis at gmail.com Fri Apr 3 18:38:12 2009
From: olemis at gmail.com (Olemis Lang)
Date: Fri, 3 Apr 2009 11:38:12 -0500
Subject: [Python-Dev] Package Management - thoughts from the peanut gallery
In-Reply-To: <24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org> <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> <49D63754.6030601@simplistix.co.uk> <24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
Message-ID: <24ea26600904030938i45f191a7o6be70ac0f1a95761@mail.gmail.com>

On Fri, Apr 3, 2009 at 11:20 AM, Chris Withers wrote:
> Tarek Ziadé wrote:
>
>> - PyPI mirroring (PEP 381)
>
> I don't see why PyPI isn't just ported to GAE with an S3 data storage bit
> and be done with it... Offline mirrors for people behind firewalls already
> have solutions out there...
>

-1 ... IMHO ...

--
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article: No me gustan los templates de Django ...

--
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article: Comandos : Pipe Viewer ... ¿Qué está pasando por esta tubería?

From martin at v.loewis.de Fri Apr 3 18:43:05 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Apr 2009 18:43:05 +0200
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D5FBE6.6090807@avl.com>
References: <49D5FBE6.6090807@avl.com>
Message-ID: <49D63C99.6000302@v.loewis.de>

> I've stumbled upon an oddity using sets. It's trivial to test if a
> value is in the set, but it appears to be impossible to retrieve a
> stored value, other than by iterating over the whole set.

Of course it is.
That's why it is called a set: it's an unordered collection of
objects, keyed by nothing.

If you have a set of elements, and you check "'foo' in s", then
you should be able just to use the string 'foo' itself for whatever
you want to do with it - you have essentially created a set of
strings. If you think that 'foo' and Element('foo') are different
things, you should not implement __eq__ in a way that they are
considered equal.

Regards,
Martin

From solipsis at pitrou.net Fri Apr 3 18:43:58 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 3 Apr 2009 16:43:58 +0000 (UTC)
Subject: [Python-Dev] =?utf-8?q?PyDict=5FSetItem_hook?=
References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com>
Message-ID:

Thomas Wouters python.org> writes:
>
> Really? Have you tried it? I get at least 5% noise between runs without any changes. I have gotten results that include *negative* run times.

That's an implementation problem, not an issue with the tests themselves.
Perhaps a better timing mechanism could be inspired from the timeit module.
Perhaps the default numbers of iterations should be higher (many subtests
run in less than 100ms on a modern CPU, which might be too low for accurate
measurement). Perhaps the so-called "calibration" should just be disabled.
etc.

> The tests in PyBench are not micro-benchmarks (they do way too much for that),

Then I wonder what you call a micro-benchmark. Should it involve direct
calls to low-level C API functions?

> but they are also not representative of real-world code.

Representativity is not black or white. Is measuring Spitfire performance
representative of the Genshi templating engine, or str.format-based
templating? Regardless of the answer, it is still an interesting
measurement.
> That doesn't just mean "you can't infer the affected operation from the test name"

I'm not sure what you mean by that. If you introduce an optimization to
make list comprehensions faster, it will certainly show up in the list
comprehensions subtest, and probably in none of the other tests. Isn't it
enough in terms of specificity? Of course, some optimizations are
interpreter-wide, and then the breakdown into individual subtests is less
relevant.

> I have in the past written patches to Python that improved *every* micro-benchmark and *every* real-world measurement I made, except PyBench.

Well, I didn't claim that pybench measures /everything/. That's why we have
other benchmarks as well (stringbench, iobench, whatever). It does test a
bunch of very common operations which are important in daily use of Python.
If some important operation is missing, it's possible to add a new test.

Conversely, someone optimizing e.g. list comprehensions and trying to
measure the impact using a set of so-called "real-world benchmarks" which
don't involve any list comprehension in their critical path will not see
any improvement in those "real-world benchmarks". Does it mean that the
optimization is useless? No, certainly not. The world is not black and
white.

> That's exactly what Collin proposed at the summits last week. Have you seen http://code.google.com/p/unladen-swallow/wiki/Benchmarks

Yes, I've seen. I haven't tried it, I hope it can be run without installing
the whole unladen-swallow suite? These are the benchmarks I've had a
tendency to use depending on the issue at hand: pybench, richards,
stringbench, iobench, binary-trees (from the Computer Language Shootout).
And various custom timeit runs :-)

Cheers

Antoine.
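The timeit-style approach Antoine mentions -- repeat the whole measurement several times and take the minimum, since the fastest run is the least disturbed by scheduler noise -- can be sketched in a few lines. The statement being timed here is arbitrary, purely for illustration:

```python
import timeit

# Repeat the full measurement several times; the minimum is the least
# noise-distorted figure (timeit's own docs recommend min(), not mean()).
timings = timeit.repeat(stmt="[x * 2 for x in range(100)]",
                        repeat=5, number=10000)
best = min(timings)
print("best of 5: %.6f sec per %d inner loops" % (best, 10000))
```

The same idea applies to any benchmark harness: raising `number` until each sample is well above timer resolution addresses the "subtests run in less than 100ms" concern raised above.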
From jpe at wingware.com Fri Apr 3 18:44:19 2009
From: jpe at wingware.com (John Ehresman)
Date: Fri, 03 Apr 2009 11:44:19 -0500
Subject: [Python-Dev] PyDict_SetItem hook
In-Reply-To: <49D63787.3080304@v.loewis.de>
References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> <49D63787.3080304@v.loewis.de>
Message-ID: <49D63CE3.40203@wingware.com>

Just want to reply quickly because I'm traveling -- I appreciate the
feedback from Raymond and others. Part of the reason I created an issue
with a proof of concept patch is to get this kind of feedback. I also
agree that this shouldn't go in if it slows things down noticeably. I
will do some benchmarking and look at the dtrace patches next week to see
if there is some sort of more systematic way of adding these types of
hooks.

John

From chris at simplistix.co.uk Fri Apr 3 18:55:04 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 03 Apr 2009 17:55:04 +0100
Subject: [Python-Dev] Package Management - thoughts from the peanut gallery
In-Reply-To: <24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org> <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> <49D63754.6030601@simplistix.co.uk> <24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com>
Message-ID: <49D63F68.2050409@simplistix.co.uk>

Olemis Lang wrote:
> On Fri, Apr 3, 2009 at 11:20 AM, Chris Withers wrote:
>> Tarek Ziadé wrote:
>>
>>> - PyPI mirroring (PEP 381)
>> I don't see why PyPI isn't just ported to GAE with an S3 data storage bit
>> and be done with it... Offline mirrors for people behind firewalls already
>> have solutions out there...
>
> -1 ... IMHO ...

For what reason?
Chris

--
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

From p.f.moore at gmail.com Fri Apr 3 18:57:29 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 3 Apr 2009 17:57:29 +0100
Subject: [Python-Dev] Getting values stored inside sets
In-Reply-To: <49D63C99.6000302@v.loewis.de>
References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de>
Message-ID: <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com>

2009/4/3 Steven D'Aprano :
> Python does not promise that if x == y, you can use y anywhere you can
> use x. Nor should it. Paul's declaration of abuse of __eq__ is
> unfounded.

Sorry, I was trying to simplify what I was saying, and simplified it to
the point where it didn't make sense :-) Martin (quoted below) explained
what I was trying to say far more clearly.

2009/4/3 "Martin v. Löwis" :
> If you have a set of elements, and you check "'foo' in s", then
> you should be able just to use the string 'foo' itself for whatever
> you want to do with it - you have essentially created a set of
> strings. If you think that 'foo' and Element('foo') are different
> things, you should not implement __eq__ in a way that they are
> considered equal.

-- in particular, if you're using things in sets (which are *all about*
equality, insofar as that's how "duplicates" are defined) you should
ensure that your definition of __eq__ respects the idea that equal
objects are duplicates (ie, interchangeable). Otherwise, a dict is the
appropriate data structure.

Actually, given the definition in the original post,

class Element(object):
    def __init__(self, key):
        self.key = key
    def __eq__(self, other):
        return self.key == other
    def __hash__(self):
        return hash(self.key)

as far as I can tell, equality is *only* defined between Elements and
keys - not even between 2 elements! So with that definition, there could
be many Elements in a set, all equal to the same key. Which is completely
insane.
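That a set really does treat the wrapper object and its raw key as duplicates is easy to check. A quick sketch, using a stand-in class mirroring the one from the original post:

```python
class Element(object):
    def __init__(self, key):
        self.key = key
    def __eq__(self, other):
        # delegates equality to the key, as in the original post
        return self.key == other
    def __hash__(self):
        return hash(self.key)

s = {Element('foo')}
print('foo' in s)   # True: same hash, and the comparison routes through the key
s.add('foo')
print(len(s))       # still 1 -- the raw string counts as a duplicate
```

So membership tests and deduplication both go through `__hash__` plus `__eq__`, which is why the raw string and the Element collapse into one entry.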
In fact, Python seems to be doing something I don't understand: >>> class Element(object): ... def __init__(self, key, id): ... self.key = key ... self.id = id ... def __eq__(self, other): ... print "Calling __eq__ for %s" % self.id ... return self.key == other ... def __hash__(self): ... return hash(self.key) ... >>> a = Element('k', 'a') >>> b = Element('k', 'b') >>> a == b Calling __eq__ for a Calling __eq__ for b True >>> a == a Calling __eq__ for a Calling __eq__ for a True >>> Why does __eq__ get called twice in these cases? Why does a == b, as that means a.key == b, and clearly a.key ('k') does *not* equal b. Or are there some further options being tried, in str,__eq__ or object.__eq__? The documentation doesn't say so... Specifically, there's nothing saying that a "reversed" version is tried. Paul. From mal at egenix.com Fri Apr 3 19:04:36 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 03 Apr 2009 19:04:36 +0200 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com> Message-ID: <49D641A4.10904@egenix.com> On 2009-04-03 18:06, Thomas Wouters wrote: > On Fri, Apr 3, 2009 at 11:27, Antoine Pitrou wrote: > >> Thomas Wouters python.org> writes: >>> >>> Pystone is pretty much a useless benchmark. If it measures anything, it's >> the >> speed of the bytecode dispatcher (and it doesn't measure it particularly >> well.) >> PyBench isn't any better, in my experience. >> >> I don't think pybench is useless. It gives a lot of performance data about >> crucial internal operations of the interpreter. It is of course very little >> real-world, but conversely makes you know immediately where a performance >> regression has happened. 
(by contrast, if you witness a regression in a >> high-level benchmark, you still have a lot of investigation to do to find >> out >> where exactly something bad happened) > > > Really? Have you tried it? I get at least 5% noise between runs without any > changes. I have gotten results that include *negative* run times. On which platform ? pybench 2.0 works reasonably well on Linux and Windows, but of course can't do better than the timers available for those platforms. If you have e.g. NTP running and it uses wall clock timers, it is possible that you get negative round times. If you don't and still get negative round times, you have to change the test parameters (see below). > And yes, I > tried all the different settings for calibration runs and timing mechanisms. > The tests in PyBench are not micro-benchmarks (they do way too much for > that), they don't try to minimize overhead or noise, That is not true. They were written as micro-benchmarks and adjusted to have a high signal-noise ratio. For some operations this isn't easy to do, but I certainly tried hard to get the overhead low (note that the overhead is listed in the output). That said, please keep in mind that the settings in pybench were last adjusted some years ago to have the tests all run in more or less the same wall clock time. CPUs have evolved a lot since then and this shows. > but they are also not > representative of real-world code. True and they never were meant for that, since I was frustrated by other benchmarks at the time and the whole approach in general. Each of the tests checks one specific aspect of Python. If your application happens to use a lot of dictionary operations, you'll be mostly interested in those. If you do a lot of simple arithmetic, there's another test for that. On top of that the application is written to be easily extensible, so it's easy to add new tests specific to whatever application space you're after. 
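Reduced to a sketch, the overhead-subtraction scheme looks like this (hypothetical class and names; pybench's real tests are structured similarly but not identically, and `time.perf_counter` stands in for the wall-clock/process timers pybench chooses between):

```python
import time

class StringRepeat:
    """Toy pybench-style test: time an operation minus loop overhead."""
    rounds = 200000

    def test(self):
        for _ in range(self.rounds):
            s = "abc" * 10          # the operation being measured

    def calibrate(self):
        for _ in range(self.rounds):
            pass                    # same loop, operation removed

def measure(bench):
    start = time.perf_counter()
    bench.test()
    test_time = time.perf_counter() - start
    start = time.perf_counter()
    bench.calibrate()
    overhead = time.perf_counter() - start
    # Subtracting the calibration run removes loop and setup overhead;
    # with a coarse or jittery timer the difference can even come out
    # negative, which is why raising the rounds helps in that case.
    return test_time - overhead

elapsed = measure(StringRepeat())
```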
> That doesn't just mean "you can't infer > the affected operation from the test name", but "you can't infer anything." > You can just be looking at differently borrowed runtime. I have in the past > written patches to Python that improved *every* micro-benchmark and *every* > real-world measurement I made, except PyBench. Trying to pinpoint the > slowdown invariably led to tests that did too much in the measurement loop, > introduced too much noise in the "calibration" run or just spent their time > *in the measurement loop* on doing setup and teardown of the test. pybench calibrates itself to remove that kind of noise from the output. Each test has a .calibrate() method which does all the setup and tear down minus the actual benchmark operations. If you get wrong numbers, try adjusting the parameters and add more "packets" of operations. Don't forget to adjust the version number to not compare apples and oranges, though. Perhaps it's time to readjust the pybench parameters to today's CPUs. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 03 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ziade.tarek at gmail.com Fri Apr 3 19:12:20 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 3 Apr 2009 19:12:20 +0200 Subject: [Python-Dev] Package Management - thoughts from the peanut gallery In-Reply-To: <49D63754.6030601@simplistix.co.uk> References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org> <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> <49D63754.6030601@simplistix.co.uk> Message-ID: <94bdd2610904031012g48bb2blccf24c59573fadfc@mail.gmail.com> On Fri, Apr 3, 2009 at 6:20 PM, Chris Withers wrote: > Tarek Ziadé wrote: >> >> I have taken the commitment to lead these tasks and synchronize the people >> that are willing to help on this. > Good, I'm one of those people, Great ! > sadly my only help may be to ask "how is this > bit going to be done?". I'll work on the wiki this week end for that > >> The tasks discussed so far are: >> >> - version definition (http://wiki.python.org/moin/DistutilsVersionFight) >> - egg.info standardification (PEP 376) >> - metadata enhancement (rewrite PEP 345) >> - static metadata definition work (*) > > These all seem to be a subset of the last one, right? Sorry I used "task" I should have used "topics". We are trying to have a list of well-defined, isolated tasks. These tasks are built upon the discussions we have in these topics. The last topic (static metadata) might generate new tasks and/or complete existing tasks. >> - PyPI mirroring (PEP 381) > > I don't see why PyPI isn't just ported to GAE with an S3 data storage bit > and be done with it... Offline mirrors for people behind firewalls already > have solutions out there... GAE+S3 is just an implementation imho. We still need a mirroring protocol a la CPAN and features in client software to use them. (as defined in 381) > >> Each one of this task has a leader, except the one with (*).
I just got >> back >> from travelling, and I will reorganize >> http://wiki.python.org/moin/Distutils asap so it is up-to-date. > Cool, is this the focal point to track your activities? Exactly. And Distutils-SIG is the mailing list to discuss in ;) > >> If you want to work on one of this task or feel there's a new task you can >> start, please, join Distutils SIG or contact me, > Well, I think my "big list" breaks down roughly as tasks, of which I think > the stuff you're already doing will hopefully take care of the first 2, but > what about the rest. If labour shortage is all that's stopping this, then > let me know ;-) > Please discuss these new points in Distutils-SIG Cheers Tarek From fuzzyman at voidspace.org.uk Fri Apr 3 19:13:39 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 03 Apr 2009 18:13:39 +0100 Subject: [Python-Dev] Package Management - thoughts from the peanut gallery In-Reply-To: <49D63F68.2050409@simplistix.co.uk> References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org> <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> <49D63754.6030601@simplistix.co.uk> <24ea26600904030937y72e36dcdydeab19607302f23d@mail.gmail.com> <49D63F68.2050409@simplistix.co.uk> Message-ID: <49D643C3.3090707@voidspace.org.uk> Chris Withers wrote: > Olemis Lang wrote: >> On Fri, Apr 3, 2009 at 11:20 AM, Chris Withers >> wrote: >>> Tarek Ziadé wrote: >>> >>>> - PyPI mirroring (PEP 381) >>> I don't see why PyPI isn't just ported to GAE with an S3 data >>> storage bit >>> and be done with it... Offline mirrors for people behind firewalls >>> already >>> have solutions out there... >> >> -1 ... IMHO ... > > For what reason? GAE does suffer from blackouts - which is the problem we are attempting to solve with mirroring. I don't see why we should tie vital Python infrastructure to the proprietary APIs of a single vendor and outsource delivery entirely to them.
If we have the manpower to do this ourselves it seems better to do it and retain control. Added to which GAE is a commercial service and beyond a certain level bandwidth / cycles needs paying for. This may not be an issue in itself (either Google may waive charges or the PSF may be willing to pay). Michael > > Chris > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From python at rcn.com Fri Apr 3 19:14:10 2009 From: python at rcn.com (Raymond Hettinger) Date: Fri, 3 Apr 2009 10:14:10 -0700 Subject: [Python-Dev] Getting values stored inside sets References: <49D5FBE6.6090807@avl.com> Message-ID: <00AE37B203704E668246D2A70BAD4568@RaymondLaptop1> > Hrvoje Niksic wrote: >> I've stumbled upon an oddity using sets. It's trivial to test if a >> value is in the set, but it appears to be impossible to retrieve a >> stored value, See: http://code.activestate.com/recipes/499299/ Raymond From alexander.belopolsky at gmail.com Fri Apr 3 19:16:24 2009 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 3 Apr 2009 13:16:24 -0400 Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: <49D63C99.6000302@v.loewis.de> References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de> Message-ID: I just want to add a link to a 2.5 year old discussion on this issue: . In that discussion I disagreed with Martin and argued that "interning is a set operation and it is unfortunate that set API does not support it directly." On Fri, Apr 3, 2009 at 12:43 PM, "Martin v. Löwis" wrote: >> I've stumbled upon an oddity using sets. It's trivial to test if a >> value is in the set, but it appears to be impossible to retrieve a >> stored value, other than by iterating over the whole set. > > Of course it is. That's why it is called a set: it's an unordered > collection of objects, keyed by nothing.
> > If you have a set of elements, and you check "'foo' in s", then > you should be able just to use the string 'foo' itself for whatever > you want to do with it - you have essentially created a set of > strings. If you think that 'foo' and Element('foo') are different > things, you should not implement __eq__ in a way that they are > considered equal. > > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexander.belopolsky%40gmail.com > From collinw at gmail.com Fri Apr 3 19:18:04 2009 From: collinw at gmail.com (Collin Winter) Date: Fri, 3 Apr 2009 10:18:04 -0700 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com> Message-ID: <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com> On Fri, Apr 3, 2009 at 9:43 AM, Antoine Pitrou wrote: > Thomas Wouters python.org> writes: >> >> Really? Have you tried it? I get at least 5% noise between runs without any > changes. I have gotten results that include *negative* run times. > > That's an implementation problem, not an issue with the tests themselves. > Perhaps a better timing mechanism could be inspired from the timeit module. > Perhaps the default numbers of iterations should be higher (many subtests run > in less than 100ms on a modern CPU, which might be too low for accurate > measurement). Perhaps the so-called "calibration" should just be disabled. > etc. > >> The tests in PyBench are not micro-benchmarks (they do way too much for > that), > > Then I wonder what you call a micro-benchmark. Should it involve direct calls > to > low-level C API functions? 
I agree that a suite of microbenchmarks is supremely useful: I would very much like to be able to isolate, say, raise statement performance. PyBench suffers from implementation defects that in its current incarnation make it unsuitable for this, though: - It does not effectively isolate component performance as it claims. When I was working on a change to BINARY_MODULO to make string formatting faster, PyBench would report that floating point math got slower, or that generator yields got slower. There is a lot of random noise in the results. - We have observed overall performance swings of 10-15% between runs on the same machine, using the same Python binary. Using the same binary on the same unloaded machine should give as close an answer to 0% as possible. - I wish PyBench actually did more isolation. Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it didn't put keyword arguments and **kwargs in the same microbenchmark. - In experimenting with gcc 4.4's FDO support, I produced a training load that resulted in a 15-30% performance improvement (depending on benchmark) across all benchmarks. Using this trained binary, PyBench slowed down by 10%. - I would like to see PyBench incorporate better statistics for indicating the significance of the observed performance difference. I don't believe that these are insurmountable problems, though. A great contribution to Python performance work would be an improved version of PyBench that corrects these problems and offers more precise measurements. Is that something you might be interested in contributing to? As performance moves more into the wider consciousness, having good tools will become increasingly important. 
Thanks, Collin From fuzzyman at voidspace.org.uk Fri Apr 3 19:28:40 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 03 Apr 2009 18:28:40 +0100 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> Message-ID: <49D64748.70305@voidspace.org.uk> Collin Winter wrote: > On Fri, Apr 3, 2009 at 2:27 AM, Antoine Pitrou wrote: > >> Thomas Wouters python.org> writes: >> >>> Pystone is pretty much a useless benchmark. If it measures anything, it's the >>> >> speed of the bytecode dispatcher (and it doesn't measure it particularly well.) >> PyBench isn't any better, in my experience. >> >> I don't think pybench is useless. It gives a lot of performance data about >> crucial internal operations of the interpreter. It is of course very little >> real-world, but conversely makes you know immediately where a performance >> regression has happened. (by contrast, if you witness a regression in a >> high-level benchmark, you still have a lot of investigation to do to find out >> where exactly something bad happened) >> >> Perhaps someone should start maintaining a suite of benchmarks, high-level and >> low-level; we currently have them all scattered around (pybench, pystone, >> stringbench, richard, iobench, and the various Unladen Swallow benchmarks; not >> to mention other third-party stuff that can be found in e.g. the Computer >> Language Shootout). >> > > Already in the works :) > > As part of the common standard library and test suite that we agreed > on at the PyCon language summit last week, we're going to include a > common benchmark suite that all Python implementations can share. 
This > is still some months off, though, so there'll be plenty of time to > bikeshed^Wrationally discuss which benchmarks should go in there. > Where is the right place for us to discuss this common benchmark and test suite? As the benchmark is developed I would like to ensure it can run on IronPython. The test suite changes will need some discussion as well - Jython and IronPython (and probably PyPy) have almost identical changes to tests that currently rely on deterministic finalisation (reference counting) so it makes sense to test changes on both platforms and commit a single solution. Michael > Collin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From rdmurray at bitdance.com Fri Apr 3 19:33:39 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Fri, 3 Apr 2009 13:33:39 -0400 (EDT) Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com> References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de> <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com> Message-ID: On Fri, 3 Apr 2009 at 17:57, Paul Moore wrote: > In fact, Python seems to be doing something I don't understand: > >>>> class Element(object): > ... def __init__(self, key, id): > ... self.key = key > ... self.id = id > ... def __eq__(self, other): > ... print "Calling __eq__ for %s" % self.id > ... return self.key == other > ... def __hash__(self): > ... return hash(self.key) > ... >>>> a = Element('k', 'a') >>>> b = Element('k', 'b') >>>> a == b > Calling __eq__ for a > Calling __eq__ for b > True >>>> a == a > Calling __eq__ for a > Calling __eq__ for a > True >>>> > > Why does __eq__ get called twice in these cases? 
Why does a == b, as > that means a.key == b, and clearly a.key ('k') does *not* equal b. Or > are there some further options being tried, in str,__eq__ or > object.__eq__? The documentation doesn't say so... Specifically, > there's nothing saying that a "reversed" version is tried. a == b So, python calls a.__eq__(b) Now, that function does: a.key == b Since b is an object with an __eq__ method, python calls b.__eq__(a.key). That function does: a.key == b.key ie: the OP's code is inefficient :) --David From collinw at gmail.com Fri Apr 3 19:35:28 2009 From: collinw at gmail.com (Collin Winter) Date: Fri, 3 Apr 2009 10:35:28 -0700 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <49D64748.70305@voidspace.org.uk> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> Message-ID: <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> On Fri, Apr 3, 2009 at 10:28 AM, Michael Foord wrote: > Collin Winter wrote: >> As part of the common standard library and test suite that we agreed >> on at the PyCon language summit last week, we're going to include a >> common benchmark suite that all Python implementations can share. This >> is still some months off, though, so there'll be plenty of time to >> bikeshed^Wrationally discuss which benchmarks should go in there. >> > > Where is the right place for us to discuss this common benchmark and test > suite? > > As the benchmark is developed I would like to ensure it can run on > IronPython. 
> > The test suite changes will need some discussion as well - Jython and > IronPython (and probably PyPy) have almost identical changes to tests that > currently rely on deterministic finalisation (reference counting) so it > makes sense to test changes on both platforms and commit a single solution. I believe Brett Cannon is the best person to talk to about this kind of thing. I don't know that any common mailing list has been set up, though there may be and Brett just hasn't told anyone yet :) Collin From solipsis at pitrou.net Fri Apr 3 19:50:21 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 3 Apr 2009 17:50:21 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PyDict=5FSetItem_hook?= References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com> <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com> Message-ID: Collin Winter gmail.com> writes: > > - I wish PyBench actually did more isolation. > Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it > didn't put keyword arguments and **kwargs in the same microbenchmark. Well, there is a balance to be found between having more subtests and keeping a reasonable total running time :-) (I have to plead guilty for ComplexPythonFunctionCalls, btw) > - I would like to see PyBench incorporate better statistics for > indicating the significance of the observed performance difference. I see you already have this kind of measurement in your perf.py script, would it be easy to port it? We could also discuss making individual tests longer (by changing the default "warp factor"). 
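The kind of significance statistic under discussion fits in a few lines of pure Python. This is a hypothetical helper, not perf.py's actual code: Welch's two-sample t statistic over two lists of timings, with its approximate degrees of freedom.

```python
import math

def welch_ttest(xs, ys):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)  # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    se2 = vx / nx + vy / ny
    t = (mx - my) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

# Two hypothetical timing samples, in seconds:
t, df = welch_ttest([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
# t == -1.0 and df == 8.0 for these inputs
```

A large |t| relative to df indicates the observed difference between the two benchmark runs is unlikely to be noise.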
From p.f.moore at gmail.com Fri Apr 3 19:56:44 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 3 Apr 2009 18:56:44 +0100 Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de> <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com> Message-ID: <79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com> 2009/4/3 R. David Murray : > a == b > > So, python calls a.__eq__(b) > > Now, that function does: > > a.key == b > > Since b is an object with an __eq__ method, python calls > b.__eq__(a.key). That's the bit I can't actually find documented anywhere. Ah, looking again I see that I misread the section describing the rich comparison methods: """ There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does); rather, __lt__() and __gt__() are each other's reflection, __le__() and __ge__() are each other's reflection, and __eq__() and __ne__() are their own reflection. """ I read that as meaning that no "reversed" version was called, whereas it actually means that __eq__ is its own reversed version - and so gets called both times. Thanks for helping me clear that up! Paul.
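The reflected dispatch can be demonstrated directly with a pair of toy classes (an illustrative sketch, not the Element example from the thread): the left operand's __eq__ runs first, and returning NotImplemented makes Python try __eq__ on the right operand with the arguments swapped.

```python
calls = []

class Left:
    def __eq__(self, other):
        calls.append("Left")
        return NotImplemented   # decline, so Python tries the other side

class Right:
    def __eq__(self, other):
        calls.append("Right")
        return True

result = (Left() == Right())
# Left.__eq__ runs first and declines; __eq__ being "its own
# reflection", Python then calls Right.__eq__ with the operands swapped.
assert result is True
assert calls == ["Left", "Right"]
```

This is also why `a.key == b` in the original example ends up in b.__eq__: str.__eq__('k', b) returns NotImplemented for the unknown type, and the reflected call does the rest.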
From fuzzyman at voidspace.org.uk Fri Apr 3 20:00:43 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 03 Apr 2009 19:00:43 +0100 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> Message-ID: <49D64ECB.9040100@voidspace.org.uk> Collin Winter wrote: > On Fri, Apr 3, 2009 at 10:28 AM, Michael Foord > wrote: > >> Collin Winter wrote: >> >>> As part of the common standard library and test suite that we agreed >>> on at the PyCon language summit last week, we're going to include a >>> common benchmark suite that all Python implementations can share. This >>> is still some months off, though, so there'll be plenty of time to >>> bikeshed^Wrationally discuss which benchmarks should go in there. >>> >>> >> Where is the right place for us to discuss this common benchmark and test >> suite? >> >> As the benchmark is developed I would like to ensure it can run on >> IronPython. >> >> The test suite changes will need some discussion as well - Jython and >> IronPython (and probably PyPy) have almost identical changes to tests that >> currently rely on deterministic finalisation (reference counting) so it >> makes sense to test changes on both platforms and commit a single solution. >> > > I believe Brett Cannon is the best person to talk to about this kind > of thing. I don't know that any common mailing list has been set up, > though there may be and Brett just hasn't told anyone yet :) > > Collin > Which begs the question of whether we *should* have a separate mailing list. 
I don't think we discussed this specific point in the language summit - although it makes sense. Should we have a list specifically for the test / benchmarking or would a more general implementations-sig be appropriate? And is it really Brett who sets up mailing lists? My understanding is that he is pulling out of stuff for a while anyway, so that he can do Java / Phd type things... ;-) Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From collinw at gmail.com Fri Apr 3 20:05:46 2009 From: collinw at gmail.com (Collin Winter) Date: Fri, 3 Apr 2009 11:05:46 -0700 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com> <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com> Message-ID: <43aa6ff70904031105y182abfabtc59b1880736625db@mail.gmail.com> On Fri, Apr 3, 2009 at 10:50 AM, Antoine Pitrou wrote: > Collin Winter gmail.com> writes: >> >> - I wish PyBench actually did more isolation. >> Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it >> didn't put keyword arguments and **kwargs in the same microbenchmark. > > Well, there is a balance to be found between having more subtests and keeping a > reasonable total running time :-) > (I have to plead guilty for ComplexPythonFunctionCalls, btw) Sure, there's definitely a balance to maintain. With perf.py, we're going down the road of having different tiers of benchmarks: the default set is the one we pay the most attention to, with other benchmarks available for benchmarking certain specific subsystems or workloads (like pickling list-heavy input data). Something similar could be done for PyBench, giving the user the option of increasing the level of detail (and run-time) as appropriate. 
>> - I would like to see PyBench incorporate better statistics for >> indicating the significance of the observed performance difference. > > I see you already have this kind of measurement in your perf.py script, would it > be easy to port it? Yes, it should be straightforward to incorporate these statistics into PyBench. In the same directory as perf.py, you'll find test_perf.py which includes tests for the stats functions we're using. Collin From steve at holdenweb.com Fri Apr 3 21:50:01 2009 From: steve at holdenweb.com (Steve Holden) Date: Fri, 03 Apr 2009 15:50:01 -0400 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <49D63465.80401@simplistix.co.uk> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> <49D52C5B.7010506@simplistix.co.uk> <49D63465.80401@simplistix.co.uk> Message-ID: Chris Withers wrote: > Guido van Rossum wrote: >>>> But anyways this is moot, the bug was only about exec in a class body >>>> *nested inside a function*. >>> Indeed, I just hate seeing execs and it was an interesting mental >>> exercise >>> to try and get rid of the above one ;-) >>> >>> Assuming it breaks no tests, would there be objection to me >>> committing the >>> above change to the Python 3 trunk? >> >> That's up to Benjamin. Personally, I live by "if it ain't broke, don't >> fix it." :-) > > Anything using an exec that can be done in some other (more pythonic way) > is broken by definition ;-) > > Benjamin? > We've just had a fairly clear demonstration that small semantic changes to the language can leave unexpected areas borked. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Want to know? Come to PyCon - soon! 
http://us.pycon.org/ From martin at v.loewis.de Fri Apr 3 21:49:58 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 03 Apr 2009 21:49:58 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <20090402171218.9DDEF3A40A7@sparrow.telecommunity.com> Message-ID: <49D66866.5020505@v.loewis.de> > Perhaps we could add something like a sys.namespace_packages that would > be updated by this mechanism? Then, pkg_resources could check both that > and its internal registry to be both backward and forward compatible. I could see no problem with that, so I have added this to the PEP. Thanks for the feedback, Martin From martin at v.loewis.de Fri Apr 3 21:55:22 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 03 Apr 2009 21:55:22 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D51A16.70804@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk> Message-ID: <49D669AA.6080001@v.loewis.de> Chris Withers wrote: > Martin v. Löwis wrote: >> I propose the following PEP for inclusion to Python 3.1. >> Please comment. > > Would this support the following case: > > I have a package called mortar, which defines useful stuff: > > from mortar import content, ... > > I now want to distribute large optional chunks separately, but ideally > so that the following will work: > > from mortar.rbd import ... > from mortar.zodb import ... > from mortar.wsgi import ... > > Does the PEP support this? That's the primary purpose of the PEP. You can do this today already (see the zope package, and the reference to current techniques in the PEP), but the PEP provides a cleaner way.
In each chunk (which the PEP calls portion), you had a structure like this: mortar/ mortar/rbd.pkg (contains just "*") mortar/rbd.py or mortar/ mortar/zodb.pkg mortar/zodb/ mortar/zodb/__init__.py mortar/zodb/backends.py As a side effect, you can also do "import mortar", but that would just give you the (nearly) empty namespace package, whose only significant contents is the variable __path__. Regards, Martin From martin at v.loewis.de Fri Apr 3 22:07:10 2009 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 03 Apr 2009 22:07:10 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D52115.6020001@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> Message-ID: <49D66C6E.3090602@v.loewis.de> > I'd like to extend the proposal to Python 2.7 and later. I don't object, but I also don't want to propose this, so I added it to the discussion. My (and perhaps other people's) concern is that 2.7 might well be the last release of the 2.x series. If so, adding this feature to it would make 2.7 an odd special case for users and providers of third party tools. > That's going to slow down Python package detection a lot - you'd > replace an O(1) test with an O(n) scan. I question that claim. In traditional Unix systems, the file system driver performs a linear search of the directory, so it's rather O(n)-in-kernel vs. O(n)-in-Python. Even for advanced file systems, you need at least O(log n) to determine whether a specific file is in a directory. For all practical purposes, the package directory will fit in a single disk block (containing a single .pkg file, and one or few subpackages), making listdir complete as fast as stat. > Wouldn't it be better to stick with a simpler approach and look for > "__pkg__.py" files to detect namespace packages using that O(1) check ? Again - this wouldn't be O(1).
More importantly, it breaks system packages, which now again have to deal with the conflicting file names if they want to install all portions into a single location. > This would also avoid any issues you'd otherwise run into if you want > to maintain this scheme in an importer that doesn't have access to a list > of files in a package directory, but is well capable for the checking > the existence of a file. Do you have a specific mechanism in mind? Regards, Martin From martin at v.loewis.de Fri Apr 3 22:15:55 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 03 Apr 2009 22:15:55 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090403004135.B76443A40A7@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <20090403004135.B76443A40A7@sparrow.telecommunity.com> Message-ID: <49D66E7B.9080304@v.loewis.de> > Note that there is no such thing as a "defining namespace package" -- > namespace package contents are symmetrical peers. With the PEP, a "defining package" becomes possible - at most one portion can define an __init__.py. I know that the current mechanisms don't support it, and it might not be useful in general, but now there is a clean way of doing it, so I wouldn't exclude it. Distribution-wise, all distributions relying on the defining package would need to require (or install_require, or depend on) it. > The above are also true for using only a '*' in .pkg files -- in that > event there are no sys.path changes. (Frankly, I'm doubtful that > anybody is using extend_path and .pkg files to begin with, so I'd be > fine with a proposal that instead used something like '.nsp' files that > didn't even need to be opened and read -- which would let the directory > scan stop at the first .nsp file found. That would work for me as well. Nobody at PyCon could remember where .pkg files came from. > I believe the PEP does this as well, IIUC. Correct. 
>> * It's possible to have a defining package dir and add-on package >> dirs. > > Also possible in the PEP, although the __init__.py must be in the first > such directory on sys.path. I should make it clear that this is not the case. I envision it to work this way: import zope

- searches sys.path, until finding either a directory zope, or a file zope.{py,pyc,pyd,...}
- if it is a directory, it checks for .pkg files. If it finds any, it processes them, extending __path__.
- it *then* checks for __init__.py, taking the first hit anywhere on __path__ (just like any module import would)
- if no .pkg was found, nor an __init__.py, it proceeds with the next sys.path item (skipping the directory entirely)

Regards, Martin From jyasskin at gmail.com Fri Apr 3 22:17:37 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Fri, 3 Apr 2009 15:17:37 -0500 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <49D64748.70305@voidspace.org.uk> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> Message-ID: <5d44f72f0904031317o1a2cb434p7e59ce5046c4bbd1@mail.gmail.com> On Fri, Apr 3, 2009 at 12:28 PM, Michael Foord wrote: > Collin Winter wrote: >> On Fri, Apr 3, 2009 at 2:27 AM, Antoine Pitrou >> wrote: >>> Thomas Wouters python.org> writes: >>>> Pystone is pretty much a useless benchmark. If it measures anything, >>>> it's the speed of the bytecode dispatcher (and it doesn't measure it >>>> particularly well.) PyBench isn't any better, in my experience. >>> I don't think pybench is useless. It gives a lot of performance data >>> about crucial internal operations of the interpreter.
It is of course very >>> little >>> real-world, but conversely makes you know immediately where a performance >>> regression has happened. (by contrast, if you witness a regression in a >>> high-level benchmark, you still have a lot of investigation to do to find >>> out >>> where exactly something bad happened) >>> >>> Perhaps someone should start maintaining a suite of benchmarks, >>> high-level and >>> low-level; we currently have them all scattered around (pybench, pystone, >>> stringbench, richard, iobench, and the various Unladen Swallow >>> benchmarks; not >>> to mention other third-party stuff that can be found in e.g. the Computer >>> Language Shootout). >>> >> >> Already in the works :) >> >> As part of the common standard library and test suite that we agreed >> on at the PyCon language summit last week, we're going to include a >> common benchmark suite that all Python implementations can share. This >> is still some months off, though, so there'll be plenty of time to >> bikeshed^Wrationally discuss which benchmarks should go in there. >> > > Where is the right place for us to discuss this common benchmark and test > suite? Dunno. Here, by default, but I'd subscribe to a tests-sig or commonlibrary-sig or benchmark-sig if one were created. > As the benchmark is developed I would like to ensure it can run on > IronPython. We want to ensure the same thing for the current unladen swallow suite. If you find ways it currently doesn't, send us patches (until we get it moved to the common library repository at which point you'll be able to submit changes yourself). You should be able to check out http://unladen-swallow.googlecode.com/svn/tests independently of the rest of the repository. Follow the instructions at http://code.google.com/p/unladen-swallow/wiki/Benchmarks to run benchmarks though perf.py. 
You'll probably want to select benchmarks individually rather than accepting the default of "all" because it's currently not very resilient to tests that don't run on one of the comparison pythons. Personally, I'd be quite happy moving our performance tests into the main python repository before the big library+tests move, but I don't know what directory to put it in, and I don't know what Collin+Thomas think of that. > The test suite changes will need some discussion as well - Jython and > IronPython (and probably PyPy) have almost identical changes to tests that > currently rely on deterministic finalisation (reference counting) so it > makes sense to test changes on both platforms and commit a single solution. IMHO, any place in the test suite that relies on deterministic finalization but isn't explicitly testing that CPython-specific feature is a bug and should be fixed, even before we export it to the new repository. Jeffrey From jyasskin at gmail.com Fri Apr 3 22:36:57 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Fri, 3 Apr 2009 15:36:57 -0500 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: <5d44f72f0904031336r70736337wc32212771e750608@mail.gmail.com> On Thu, Apr 2, 2009 at 5:57 PM, Guido van Rossum wrote: > On Thu, Apr 2, 2009 at 3:07 PM, Raymond Hettinger wrote: >>> Wow. Can you possibly be more negative? >> >> I think it's worse to give the poor guy the run around > > Mind your words please. > >> by making him run lots of random benchmarks. ?In >> the end, someone will run a timeit or have a specific >> case that shows the full effect. ?All of the respondents so far seem to have >> a clear intuition that hook is right in the middle of a critical path. 
>> Their intuition matches >> what I learned by spending a month trying to find ways >> to optimize dictionaries. >> >> Am surprised that there has been no discussion of why this should be in the >> default build (as opposed to a compile time option). AFAICT, users have not >> previously >> requested a hook like this. > > I may be partially to blame for this. John and Stephan are requesting > this because it would (mostly) fulfill one of the top wishes of the > users of Wingware. So the use case is certainly real. > >> Also, there has been no discussion for an overall strategy >> for monitoring containers in general. Lists and tuples will >> both defy this approach because there is so much code >> that accesses the arrays directly. Am not sure whether the >> setitem hook would work for other implementations either. > > The primary use case is some kind of trap on assignment. While this > cannot cover all cases, most non-local variables are stored in dicts. > List mutations are not in the same league, as a use case. > >> It seems weird to me that Collin's group can be working >> so hard just to get a percent or two improvement in specific cases for >> pickling while python-dev is readily entertaining a patch that slows down >> the entire language. > > I don't actually believe that you can know whether this affects > performance at all without serious benchmarking. The patch amounts to > a single global flag check as long as the feature is disabled, and > that flag could be read from the L1 cache. When I was optimizing the tracing support in the eval loop, we started with two memory loads and an if test. Removing the whole thing saved about 3% of runtime, although I think that had been as high as 5% when Neal measured it a year before. (That indicates that the exact arrangement of the code can affect performance in subtle and annoying ways.) Removing one of the two loads saved about 2% of runtime.
I don't remember exactly which benchmark that was; it may just have been pybench. Here, we're talking about introducing a load+if in dicts, which is less critical than the eval loop, so I'd guess that the effect will be less than 2% overall. I do think the real-life benchmarks are worth getting for this, but they may not predict the effect after other code changes. And I don't really have an opinion on what performance hit for normal use is worth better debugging. >> If my thoughts on the subject bug you, I'll happily >> withdraw from the thread. ?I don't aspire to be a >> source of negativity. ?I just happen to think this proposal isn't a good >> idea. > > I think we need more proof either way. > >> Raymond >> >> >> >> ----- Original Message ----- From: "Guido van Rossum" >> To: "Raymond Hettinger" >> Cc: "Thomas Wouters" ; "John Ehresman" >> ; >> Sent: Thursday, April 02, 2009 2:19 PM >> Subject: Re: [Python-Dev] PyDict_SetItem hook >> >> >> Wow. Can you possibly be more negative? >> >> 2009/4/2 Raymond Hettinger : >>> >>> The measurements are just a distractor. We all already know that the hook >>> is being added to a critical path. Everyone will pay a cost for a feature >>> that few people will use. This is a really bad idea. It is not part of a >>> thorough, thought-out framework of container hooks (something that would >>> need a PEP at the very least). The case for how it helps us is somewhat >>> thin. The case for DTrace hooks was much stronger. >>> >>> If something does go in, it should be #ifdef'd out by default. But then, I >>> don't think it should go in at all. >>> >>> >>> Raymond >>> >>> >>> >>> >>> On Thu, Apr 2, 2009 at 04:16, John Ehresman wrote: >>>> >>>> Collin Winter wrote: >>>>> >>>>> Have you measured the impact on performance? >>>> >>>> I've tried to test using pystone, but am seeing more differences between >>>> runs than there is between python w/ the patch and w/o when there is no >>>> hook >>>> installed. 
The highest pystone is actually from the binary w/ the patch, >>>> which I don't really believe unless it's some low level code generation >>>> affect. The cost is one test of a global variable and then a switch to >>>> the >>>> branch that doesn't call the hooks. >>>> >>>> I'd be happy to try to come up with better numbers next week after I get >>>> home from pycon. >>> >>> Pystone is pretty much a useless benchmark. If it measures anything, it's >>> the speed of the bytecode dispatcher (and it doesn't measure it >>> particularly >>> well.) PyBench isn't any better, in my experience. Collin has collected a >>> set of reasonable benchmarks for Unladen Swallow, but they still leave a >>> lot >>> to be desired. From the discussions at the VM and Language summits before >>> PyCon, I don't think anyone else has better benchmarks, though, so I would >>> suggest using Unladen Swallow's: >>> http://code.google.com/p/unladen-swallow/wiki/Benchmarks From glyph at divmod.com Fri Apr 3 23:16:49 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 03 Apr 2009 21:16:49 -0000 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D66E7B.9080304@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <20090403004135.B76443A40A7@sparrow.telecommunity.com> <49D66E7B.9080304@v.loewis.de> Message-ID: <20090403211649.12555.1005832716.divmod.xquotient.6954@weber.divmod.com> On 08:15 pm, martin at v.loewis.de wrote: >>Note that there is no such thing as a "defining namespace package" -- >>namespace package contents are symmetrical peers. > >With the PEP, a "defining package" becomes possible - at most one >portion can define an __init__.py. For what it's worth, this is a _super_ useful feature for Twisted. We have one "defining package" for the "twisted" package (twisted core) and then a bunch of other things which want to put things into twisted.* (twisted.web, twisted.conch, et. al.). 
For debian we already have separate packages, but such a definition of namespace packages would allow us to actually have things separated out on the cheeseshop as well. From benjamin at python.org Fri Apr 3 23:12:50 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 3 Apr 2009 16:12:50 -0500 Subject: [Python-Dev] Should the io-c modules be put in their own directory? In-Reply-To: References: Message-ID: <1afaf6160904031412n6b7415acxcc9e85677f54981e@mail.gmail.com> 2009/4/3 Antoine Pitrou : > Alexandre Vassalotti peadrop.com> writes: >> >> I just noticed that the new io-c modules were merged in the py3k >> branch (I know, I am kind late on the news?blame school work). Anyway, >> I am just wondering if it would be a good idea to put the io-c modules >> in a sub-directory (like sqlite), instead of scattering them around in >> the Modules/ directory. > > Welcome back! > > I have no particular opinion on this. I suggest waiting for Benjamin's advice > and following it :-) I'm +.2. This is the layout I would suggest: Modules/ _io/ _io.c stringio.c textio.c etc.... > > (unless the FLUFL wants to chime in) > > Benjamin-makes-boring-decisions-easy'ly yrs, > > Antoine. mad-with-power'ly yours, Benjamin From benjamin at python.org Fri Apr 3 23:15:47 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 3 Apr 2009 16:15:47 -0500 Subject: [Python-Dev] Should the io-c modules be put in their own directory? In-Reply-To: <49D63ABD.30000@v.loewis.de> References: <49D63ABD.30000@v.loewis.de> Message-ID: <1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com> 2009/4/3 "Martin v. L?wis" : >>> I just noticed that the new io-c modules were merged in the py3k >>> branch (I know, I am kind late on the news?blame school work). Anyway, >>> I am just wondering if it would be a good idea to put the io-c modules >>> in a sub-directory (like sqlite), instead of scattering them around in >>> the Modules/ directory. >> >> Welcome back! 
>> >> I have no particular opinion on this. I suggest waiting for Benjamin's advice >> and following it :-) > > I would suggest to leave it as is: > a) never change a running system > b) flat is better than nested It doesn't make sense, though, to have the 8 files that make up the _io module scattered around in a directory with scores of other ones. -- Regards, Benjamin From pje at telecommunity.com Fri Apr 3 23:23:19 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 03 Apr 2009 17:23:19 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D66E7B.9080304@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <20090403004135.B76443A40A7@sparrow.telecommunity.com> <49D66E7B.9080304@v.loewis.de> Message-ID: <20090403212054.D15F73A40A7@sparrow.telecommunity.com> At 10:15 PM 4/3/2009 +0200, Martin v. L?wis wrote: >I should make it clear that this is not the case. I envision it to work >this way: import zope >- searches sys.path, until finding either a directory zope, or a file > zope.{py,pyc,pyd,...} >- if it is a directory, it checks for .pkg files. If it finds any, > it processes them, extending __path__. >- it *then* checks for __init__.py, taking the first hit anywhere > on __path__ (just like any module import would) >- if no .pkg was found, nor an __init__.py, it proceeds with the next > sys.path item (skipping the directory entirely) Ah, I missed that. Maybe the above should be added to the PEP to clarify. 
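The lookup order quoted above can be sketched in ordinary Python. This is an illustrative sketch only: the helper name, the return convention, and the simplified treatment of '*' are invented here, and it is not the PEP's reference implementation.

```python
import os

def find_package(name, search_path):
    """Sketch of the PEP 382 lookup order described above (hypothetical
    helper, not the PEP's reference implementation)."""
    for entry in search_path:
        # A plain module file wins for this search-path entry.
        for ext in ('.py', '.pyc', '.pyd'):
            filename = os.path.join(entry, name + ext)
            if os.path.isfile(filename):
                return 'module', [filename]
        directory = os.path.join(entry, name)
        if not os.path.isdir(directory):
            continue
        # Step 1: check for .pkg files and extend __path__ with portions.
        pkg_files = sorted(f for f in os.listdir(directory)
                           if f.endswith('.pkg'))
        path = [directory]
        if pkg_files:
            # Simplification: treat every .pkg as containing '*', i.e.
            # "add the portions found on the rest of the search path".
            for other in search_path:
                portion = os.path.join(other, name)
                if os.path.isdir(portion) and portion not in path:
                    path.append(portion)
        # Step 2: only *then* look for __init__.py, taking the first hit
        # anywhere on the extended __path__.
        for p in path:
            if os.path.isfile(os.path.join(p, '__init__.py')):
                return 'package', path
        if pkg_files:
            return 'namespace package', path
        # Neither .pkg nor __init__.py: skip this directory entirely.
    raise ImportError(name)
```

Given two search-path entries where the first holds a portion with only a .pkg file and the second holds the portion with __init__.py, this returns the combined __path__ with the __init__.py found on the second entry, matching the sequence Martin describes.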
From benjamin at python.org Fri Apr 3 23:27:05 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 3 Apr 2009 16:27:05 -0500 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <49D63465.80401@simplistix.co.uk> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> <49D52C5B.7010506@simplistix.co.uk> <49D63465.80401@simplistix.co.uk> Message-ID: <1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com> 2009/4/3 Chris Withers : > Guido van Rossum wrote: >>>> >>>> But anyways this is moot, the bug was only about exec in a class body >>>> *nested inside a function*. >>> >>> Indeed, I just hate seeing execs and it was an interesting mental >>> exercise >>> to try and get rid of the above one ;-) >>> >>> Assuming it breaks no tests, would there be objection to me committing >>> the >>> above change to the Python 3 trunk? >> >> That's up to Benjamin. Personally, I live by "if it ain't broke, don't >> fix it." :-) > > Anything using an exec is broken by definition ;-) "practicality beats purity" > > Benjamin? +0 -- Regards, Benjamin From guido at python.org Fri Apr 3 23:32:42 2009 From: guido at python.org (Guido van Rossum) Date: Fri, 3 Apr 2009 14:32:42 -0700 Subject: [Python-Dev] Should the io-c modules be put in their own directory? In-Reply-To: <1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com> References: <49D63ABD.30000@v.loewis.de> <1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com> Message-ID: On Fri, Apr 3, 2009 at 2:15 PM, Benjamin Peterson wrote: > 2009/4/3 "Martin v. L?wis" : >>>> I just noticed that the new io-c modules were merged in the py3k >>>> branch (I know, I am kind late on the news?blame school work). Anyway, >>>> I am just wondering if it would be a good idea to put the io-c modules >>>> in a sub-directory (like sqlite), instead of scattering them around in >>>> the Modules/ directory. >>> >>> Welcome back! 
>>> >>> I have no particular opinion on this. I suggest waiting for Benjamin's advice >>> and following it :-) >> >> I would suggest to leave it as is: >> a) never change a running system >> b) flat is better than nested > > It doesn't make sense, though, to have the 8 files that make up the > _io module scattered around in a directory with scores of other ones. I think Benjamin is right. While most of the C source is indeed exactly one level below the root, there's plenty of code that isn't, e.g. _ctypes, cjkcodecs, expat, _multiprocessing, zlib. And even Objects/stringlib. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at janc.be Sat Apr 4 00:36:14 2009 From: lists at janc.be (Jan Claeys) Date: Sat, 04 Apr 2009 00:36:14 +0200 Subject: [Python-Dev] Integrate BeautifulSoup into stdlib? In-Reply-To: <49C939BA.8040206@v.loewis.de> References: <49BA3154.8080408@simplistix.co.uk> <49BAA596.5020106@v.loewis.de> <49C79C1A.8040301@simplistix.co.uk> <49C7FC85.5000809@v.loewis.de> <49C80FA0.4020800@simplistix.co.uk> <87ab7bh5fb.fsf@xemacs.org> <49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de> <49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com> <49C939BA.8040206@v.loewis.de> Message-ID: <1238798174.5360.388.camel@saeko.local> Op dinsdag 24-03-2009 om 20:51 uur [tijdzone +0100], schreef "Martin v. L?wis": > The Windows story is indeed sad, as none of the Windows packaging > formats provides support for dependencies That's not entirely true; Cygwin comes with a package management tool that probably could be used to set up a repository of python packages for native Windows: This package manager is in no way dependent on Cygwin, supports (basic) dependencies, etc. Of course some people would have to take care of the packaging work (just like happens for most open source OS distros and for Mac OS X already). It seems like XEmacs is already using a fork of that installer for the same purpose. 
-- Jan Claeys From alexandre at peadrop.com Sat Apr 4 00:53:18 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 3 Apr 2009 18:53:18 -0400 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> Message-ID: On Tue, Mar 31, 2009 at 11:25 PM, Guido van Rossum wrote: > Well hold on for a minute, I remember we used to have an exec > statement in a class body in the standard library, to define some file > methods in socket.py IIRC. FYI, collections.namedtuple is also implemented using exec. - Alexandre From leif.walsh at gmail.com Sat Apr 4 01:18:22 2009 From: leif.walsh at gmail.com (Leif Walsh) Date: Fri, 3 Apr 2009 19:18:22 -0400 Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: <49D5FBE6.6090807@avl.com> References: <49D5FBE6.6090807@avl.com> Message-ID: On Fri, Apr 3, 2009 at 8:07 AM, Hrvoje Niksic wrote: > But I can't seem to find a way to retrieve the element corresponding to > 'foo', at least not without iterating over the entire set. ?Is this an > oversight or an intentional feature? ?Or am I just missing an obvious way to > do this? >>> query_obj in s True >>> s_prime = s.copy() >>> s_prime.discard(query_obj) >>> x = s.difference(s_prime).pop() Pretty ugly, but I think it only uses a shallow copy, and it might be a bit better than iterating, if difference is intelligent. I haven't run any tests though. -- Cheers, Leif From brett at python.org Sat Apr 4 01:37:06 2009 From: brett at python.org (Brett Cannon) Date: Fri, 3 Apr 2009 16:37:06 -0700 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D66E7B.9080304@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <20090403004135.B76443A40A7@sparrow.telecommunity.com> <49D66E7B.9080304@v.loewis.de> Message-ID: On Fri, Apr 3, 2009 at 13:15, "Martin v. 
L?wis" wrote: > > Note that there is no such thing as a "defining namespace package" -- > > namespace package contents are symmetrical peers. > > With the PEP, a "defining package" becomes possible - at most one > portion can define an __init__.py. > > I know that the current mechanisms don't support it, and it might > not be useful in general, but now there is a clean way of doing it, > so I wouldn't exclude it. Distribution-wise, all distributions > relying on the defining package would need to require (or > install_require, or depend on) it. > > > The above are also true for using only a '*' in .pkg files -- in that > > event there are no sys.path changes. (Frankly, I'm doubtful that > > anybody is using extend_path and .pkg files to begin with, so I'd be > > fine with a proposal that instead used something like '.nsp' files that > > didn't even need to be opened and read -- which would let the directory > > scan stop at the first .nsp file found. > > That would work for me as well. Nobody at PyCon could remember where > .pkg files came from. > > > I believe the PEP does this as well, IIUC. > > Correct. > > >> * It's possible to have a defining package dir and add-one package > >> dirs. > > > > Also possible in the PEP, although the __init__.py must be in the first > > such directory on sys.path. > > I should make it clear that this is not the case. I envision it to work > this way: import zope > - searches sys.path, until finding either a directory zope, or a file > zope.{py,pyc,pyd,...} > - if it is a directory, it checks for .pkg files. If it finds any, > it processes them, extending __path__. > - it *then* checks for __init__.py, taking the first hit anywhere > on __path__ (just like any module import would) Just so people know how this __init__ search could be done such that __path__ is set from the .pkg is to treat it as a reload (assuming .pkg files can only be found off of sys.path). 
-Brett > - if no .pkg was found, nor an __init__.py, it proceeds with the next > sys.path item (skipping the directory entirely) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at janc.be Sat Apr 4 01:59:38 2009 From: lists at janc.be (Jan Claeys) Date: Sat, 04 Apr 2009 01:59:38 +0200 Subject: [Python-Dev] And the winner is... In-Reply-To: References: <3c6c07c20903301759u209f1b0dyb46c933e5f25f0b2@mail.gmail.com> Message-ID: <1238803178.5360.389.camel@saeko.local> Op maandag 30-03-2009 om 21:54 uur [tijdzone -0500], schreef Guido van Rossum: > But is his humility enough to cancel out Linus's attitude? I hope not, or the /.-crowd would become desperate... ;-) -- Jan Claeys From alexandre at peadrop.com Sat Apr 4 01:59:44 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 3 Apr 2009 19:59:44 -0400 Subject: [Python-Dev] Should the io-c modules be put in their own directory? In-Reply-To: <1afaf6160904031412n6b7415acxcc9e85677f54981e@mail.gmail.com> References: <1afaf6160904031412n6b7415acxcc9e85677f54981e@mail.gmail.com> Message-ID: On Fri, Apr 3, 2009 at 5:12 PM, Benjamin Peterson wrote: > I'm +.2. This is the layout I would suggest: > > Modules/ > ?_io/ > ? ? _io.c > ? ? stringio.c > ? ? textio.c > ? ? etc.... > That seems good to me. I opened an issue on the tracker and included a patch. http://bugs.python.org/issue5682 -- Alexandre From brett at python.org Sat Apr 4 02:28:59 2009 From: brett at python.org (Brett Cannon) Date: Fri, 3 Apr 2009 17:28:59 -0700 Subject: [Python-Dev] Going "offline" for three months Message-ID: In order to hunker down and get my thesis proposal done by its due date, I am disabling mail delivery for myself for all mail.python.org mailing lists for three months (sans python-committers so I don't accidentally commit when I shouldn't). If something comes up I should know about you can always email or IM me directly. See you all on July 1. 
Here is to hoping I don't suffer any withdrawal. -Brett From ncoghlan at gmail.com Sat Apr 4 03:54:51 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 04 Apr 2009 11:54:51 +1000 Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: <79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com> References: <49D5FBE6.6090807@avl.com> <49D63C99.6000302@v.loewis.de> <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com> <79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com> Message-ID: <49D6BDEB.3050505@gmail.com> Paul Moore wrote: > 2009/4/3 R. David Murray : >> a == b >> >> So, python calls a.__eq__(b) >> >> Now, that function does: >> >> a.key == b >> >> Since b is an object with an __eq__ method, python calls >> b.__eq__(a.key). > > That's the bit I can't actually find documented anywhere. It doesn't quite work the way RDM described it - he missed a step.

a == b

So, python calls a.__eq__(b)

Now, that function does:

a.key == b

which first calls

a.key.__eq__(b)  # This step was missing

Since str has no idea what an Element is, that returns NotImplemented. Since __eq__ is defined as being commutative, the interpreter then tries b.__eq__(a.key). That function does:

b.key == a.key

which calls

b.key.__eq__(a.key)

which is a well-defined string comparison and returns the expected answer. Cheers, Nick.
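The dispatch sequence Nick walks through can be reproduced with a small sketch. The Element class here is a hypothetical stand-in for the one under discussion in the thread; its __eq__ deliberately compares its key against the other operand without a type check, so the reflected call does the work:

```python
class Element(object):
    """Hypothetical stand-in: equality compares self.key to the operand."""
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        # No isinstance check: comparing a str key against an Element
        # relies on the interpreter retrying the reflected __eq__.
        return self.key == other

    def __hash__(self):
        return hash(self.key)

a = Element("foo")
b = Element("foo")

# str has no idea what an Element is, so its __eq__ answers NotImplemented:
print("foo".__eq__(b))   # NotImplemented
# ...so the interpreter falls back to the reflected call b.__eq__("foo"),
# which compares b.key == "foo" and succeeds:
print("foo" == b)        # True
# a == b goes through the same two-step dispatch via a.key == b:
print(a == b)            # True
```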
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin at v.loewis.de Sat Apr 4 04:07:34 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Apr 2009 04:07:34 +0200 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> Message-ID: <49D6C0E6.5050404@v.loewis.de> Alexandre Vassalotti wrote: > On Tue, Mar 31, 2009 at 11:25 PM, Guido van Rossum wrote: >> Well hold on for a minute, I remember we used to have an exec >> statement in a class body in the standard library, to define some file >> methods in socket.py IIRC. > > FYI, collections.namedtuple is also implemented using exec. Ah, but it uses "exec ... in ...". That is much safer than an unqualified exec (where the issue is what namespace it executes in, and, consequentially, what early binding is possible). The patch bans only unqualified exec, IIUC. Regards, Martin From martin at v.loewis.de Sat Apr 4 04:12:28 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 04 Apr 2009 04:12:28 +0200 Subject: [Python-Dev] Integrate BeautifulSoup into stdlib? In-Reply-To: <1238798174.5360.388.camel@saeko.local> References: <49BA3154.8080408@simplistix.co.uk> <49BAA596.5020106@v.loewis.de> <49C79C1A.8040301@simplistix.co.uk> <49C7FC85.5000809@v.loewis.de> <49C80FA0.4020800@simplistix.co.uk> <87ab7bh5fb.fsf@xemacs.org> <49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de> <49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com> <49C939BA.8040206@v.loewis.de> <1238798174.5360.388.camel@saeko.local> Message-ID: <49D6C20C.8030102@v.loewis.de> > That's not entirely true; Cygwin comes with a package management tool > that probably could be used to set up a repository of python packages > for native Windows: Ah, ok. It has the big disadvantage of not being Microsoft-endorsed, though. 
In that sense, it feels very much like easy_install (which also does dependencies). Regards, Martin From ben+python at benfinney.id.au Sat Apr 4 04:33:39 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 04 Apr 2009 13:33:39 +1100 Subject: [Python-Dev] Going "offline" for three months References: Message-ID: <87hc15cflo.fsf@benfinney.id.au> Brett Cannon writes: > See you all on July 1. Here is to hoping I don't suffer any > withdrawal. Ouch. Best of luck to you! -- \ “Giving every man a vote has no more made men wise and free | `\ than Christianity has made them good.” —Henry L. Mencken | _o__) | Ben Finney From python at rcn.com Sat Apr 4 04:37:38 2009 From: python at rcn.com (Raymond Hettinger) Date: Fri, 3 Apr 2009 19:37:38 -0700 Subject: [Python-Dev] Getting values stored inside sets References: <49D5FBE6.6090807@avl.com><49D63C99.6000302@v.loewis.de> <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com> <79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com> <49D6BDEB.3050505@gmail.com> Message-ID: <420CC0DE254142398B721C0067FC9403@RaymondLaptop1> [Nick Coghlan] > It doesn't quite work the way RDM described it - he missed a step. Thanks for the clarification. We ought to write out the process somewhere in a FAQ. It may also be instructive to step through the recipe that answers the OP's original request, http://code.activestate.com/recipes/499299/ The call "get_equivalent(set([1, 2, 3]), 2.0)" wraps the 2.0 in a new object t and calls "t in set([1,2,3])". The set.__contains__ method hashes t using t.__hash__() and checks for an exact match using t.__eq__(other). Both calls delegate to float objects but the latter also records the "other" that resulted in a successful equality test (i.e. 2 is the member of the set that matched the 2.0). The get_equivalent call then returns the matching value, 2 (the int actually stored in the set).
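The idea behind that recipe can be sketched in a few lines. This is a minimal sketch of the capture-on-equality technique, not the recipe's exact code; the class and function names are invented here:

```python
class _CaptureEq(object):
    """Wrapper whose __eq__ remembers the object it compared equal to."""
    def __init__(self, obj):
        self.obj = obj
        self.match = None

    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            # `other` is the container's own stored member, not our probe.
            self.match = other
        return result

    def __hash__(self):
        # Must hash like the wrapped object so the set probes the
        # right bucket.
        return hash(self.obj)

def get_equivalent(container, item, default=None):
    """Return the member of `container` that compares equal to `item`."""
    t = _CaptureEq(item)
    if t in container:
        return t.match
    return default

member = get_equivalent(set([1, 2, 3]), 2.0)
print(member, type(member))   # 2 <class 'int'>
```

The membership test hashes the wrapper like the original 2.0, and the equality check against the stored int 2 ends up in _CaptureEq.__eq__ (the int's own __eq__ returns NotImplemented for the unknown wrapper type), which records the stored member before answering True.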
As far as I can tell, the technique is completely generic and lets you reach inside any function or container to retrieve the "other" value that is equivalent to "self". Raymond From fuzzyman at voidspace.org.uk Sat Apr 4 04:51:45 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 04 Apr 2009 03:51:45 +0100 Subject: [Python-Dev] Going "offline" for three months In-Reply-To: References: Message-ID: <49D6CB41.5030608@voidspace.org.uk> Brett Cannon wrote: > In order to hunker down and get my thesis proposal done by its due > date, I am disabling mail delivery for myself for all mail.python.org > mailing lists for three months (sans > python-committers so I don't accidentally commit when I shouldn't). If > something comes up I should know about you can always email or IM me > directly. > > See you all on July 1. Here is to hoping I don't suffer any withdrawal. We'll miss you. Hope you don't end up preferring Java. ;-) Michael > > -Brett > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From ctb at msu.edu Sat Apr 4 04:55:34 2009 From: ctb at msu.edu (C. 
Titus Brown) Date: Fri, 3 Apr 2009 19:55:34 -0700 Subject: [Python-Dev] core python tests (was: Re: PyDict_SetItem hook) In-Reply-To: <49D64ECB.9040100@voidspace.org.uk> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> <49D64ECB.9040100@voidspace.org.uk> Message-ID: <20090404025534.GA12996@idyll.org> On Fri, Apr 03, 2009 at 07:00:43PM +0100, Michael Foord wrote: -> Collin Winter wrote: -> >On Fri, Apr 3, 2009 at 10:28 AM, Michael Foord -> > wrote: -> > -> >>Collin Winter wrote: -> >> -> >>>As part of the common standard library and test suite that we agreed -> >>>on at the PyCon language summit last week, we're going to include a -> >>>common benchmark suite that all Python implementations can share. This -> >>>is still some months off, though, so there'll be plenty of time to -> >>>bikeshed^Wrationally discuss which benchmarks should go in there. -> >>> -> >>> -> >>Where is the right place for us to discuss this common benchmark and test -> >>suite? -> >> -> >>As the benchmark is developed I would like to ensure it can run on -> >>IronPython. -> >> -> >>The test suite changes will need some discussion as well - Jython and -> >>IronPython (and probably PyPy) have almost identical changes to tests that -> >>currently rely on deterministic finalisation (reference counting) so it -> >>makes sense to test changes on both platforms and commit a single -> >>solution. -> >> -> > -> >I believe Brett Cannon is the best person to talk to about this kind -> >of thing. 
I don't know that any common mailing list has been set up, -> >though there may be and Brett just hasn't told anyone yet :) -> > -> >Collin -> > -> Which begs the question of whether we *should* have a separate mailing list. -> -> I don't think we discussed this specific point in the language summit - -> although it makes sense. Should we have a list specifically for the test -> / benchmarking or would a more general implementations-sig be appropriate? -> -> And is it really Brett who sets up mailing lists? My understanding is -> that he is pulling out of stuff for a while anyway, so that he can do -> Java / Phd type things... ;-) 'tis a sad loss for both Python-dev and the academic community... I vote for a separate mailing list -- 'python-tests'? -- but I don't know exactly how splintered to make the conversation. It probably belongs at python.org but if you want me to host it, I can. N.B. There are a bunch of GSoC projects to work on or with the CPython test framework (increase test coverage, write plugins to make it runnable in nose or py.test, etc.). I don't know that the students should be active participants in such a list, but the mentors should at least try to stay in the loop so that we don't completely waste our time. cheers, --titus -- C. Titus Brown, ctb at msu.edu From brett at python.org Sat Apr 4 04:56:11 2009 From: brett at python.org (Brett Cannon) Date: Fri, 3 Apr 2009 19:56:11 -0700 Subject: [Python-Dev] Going "offline" for three months In-Reply-To: <49D6CB41.5030608@voidspace.org.uk> References: <49D6CB41.5030608@voidspace.org.uk> Message-ID: On Fri, Apr 3, 2009 at 19:51, Michael Foord wrote: > Brett Cannon wrote: > >> In order to hunker down and get my thesis proposal done by its due date, I >> am disabling mail delivery for myself for all mail.python.org < >> http://mail.python.org> mailing lists for three months (sans >> python-committers so I don't accidentally commit when I shouldn't). 
If >> something comes up I should know about you can always email or IM me >> directly. >> >> See you all on July 1. Here is to hoping I don't suffer any withdrawal. >> > > We'll miss you. Hope you don't end up preferring Java. ;-) No, it would be more like JavaScript, but I don't see that happening either. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Sat Apr 4 07:03:40 2009 From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Apr 2009 07:03:40 +0200 Subject: [Python-Dev] Should the io-c modules be put in their own directory? In-Reply-To: References: <49D63ABD.30000@v.loewis.de> <1afaf6160904031415y3a62c7b1u90b674d2dc3ed28c@mail.gmail.com> Message-ID: <49D6EA2C.6080103@v.loewis.de> > I think Benjamin is right. While most of the C source is indeed > exactly one level below the root, there's plenty of code that isn't, > e.g. _ctypes, cjkcodecs, expat, _multiprocessing, zlib. And even > Objects/stringlib. It's fine with me either way. Martin From ncoghlan at gmail.com Sat Apr 4 07:16:23 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 04 Apr 2009 15:16:23 +1000 Subject: [Python-Dev] core python tests In-Reply-To: <20090404025534.GA12996@idyll.org> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> <49D64ECB.9040100@voidspace.org.uk> <20090404025534.GA12996@idyll.org> Message-ID: <49D6ED27.8030908@gmail.com> C. Titus Brown wrote: > I vote for a separate mailing list -- 'python-tests'? -- but I don't > know exactly how splintered to make the conversation. It probably > belongs at python.org but if you want me to host it, I can. 
If too many things get moved off to SIGs there won't be anything left for python-dev to talk about ;) (Although in this case it makes sense, as I expect there will be developers involved in alternate implementations that would like to be part of the test suite discussion without having to sign up for the rest of python-dev) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Sat Apr 4 07:54:23 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 04 Apr 2009 15:54:23 +1000 Subject: [Python-Dev] Documenting the Py3k coercion rules (was Re: Getting values stored inside sets) In-Reply-To: <420CC0DE254142398B721C0067FC9403@RaymondLaptop1> References: <49D5FBE6.6090807@avl.com><49D63C99.6000302@v.loewis.de> <79990c6b0904030957g380e9ce6u54fbf60d13897374@mail.gmail.com> <79990c6b0904031056m32a85ea5yea16dd6c4540dfb1@mail.gmail.com> <49D6BDEB.3050505@gmail.com> <420CC0DE254142398B721C0067FC9403@RaymondLaptop1> Message-ID: <49D6F60F.8090909@gmail.com> Raymond Hettinger wrote: > > [Nick Coghlan] >> It doesn't quite work the way RDM desribed it - he missed a step. > > Thanks for the clarification. We ought to write-out the process > somewhere in a FAQ. The closest we currently have to that is the write-up of the coercion rules in 2.x: http://docs.python.org/reference/datamodel.html#id5 Unfortunately, that mixes in a lot of CPython specific special cases along with the old coerce() builtin that obscure the basic behaviour for __op__ and __rop__ pairs. Here's an initial stab at a write-up of the coercion rules for Py3k that is accurate without getting too CPython specific: """ Given "a OP b", the coercion sequence is: 1. Try "a.__op__(b)" 2. If "a.__op__" doesn't exist or the call returns NotImplemented, try "b.__rop__(a)" 3. 
If "b.__rop__" doesn't exist or the call returns NotImplemented, raise TypeError identifying "type(a)" and "type(b)" as unsupported operands for OP 4. If step 1 or 2 is successful, then the result of the call is the value of the expression Given "a OP= b" the coercion sequence is: 1. Try "a = a.__iop__(b)" 2. If "a.__iop__" doesn't exist or the call returns NotImplemented, try "a = a OP b" using the normal binary coercion rules above Special cases: - if "type(b)" is a strict subclass of "type(a)" and overrides "__rop__", then "b.__rop__" is tried before "a.__op__". This allows subclasses to ensure an instance of the subclass is returned when interacting with instances of the parent class. - rich comparisons are associated into __op__/__rop__ pairs as follows: __eq__/__eq__ (i.e. a == b is considered equivalent to b == a) __ne__/__ne__ (i.e. a != b is considered equivalent to b != a) __lt__/__gt__ (i.e. a < b is considered equivalent to b > a) __le__/__ge__ (i.e. a <= b is considered equivalent to b >= a) - __rpow__ is never invoked for the 3 argument form of pow(), as the coercion rules only apply to binary operations. In this case, a NotImplemented return from the call to __pow__ is converted immediately into a TypeError. """ Cheers, Nick.
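The sequence above can be checked interactively with a couple of toy classes (nothing here is CPython-specific):

```python
class A:
    def __add__(self, other):
        print("A.__add__")
        return NotImplemented      # decline: hand control to the other operand

class B:
    def __radd__(self, other):
        print("B.__radd__")
        return "B handled it"

class SubA(A):
    def __radd__(self, other):
        return "SubA.__radd__ ran first"

print(A() + B())       # A.__add__ declines, then B.__radd__ succeeds
print(A() + SubA())    # subclass's __radd__ is tried *before* A.__add__

try:
    A() + 1            # both sides decline -> TypeError
except TypeError as e:
    print("TypeError:", e)
```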
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Sat Apr 4 13:04:28 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 4 Apr 2009 11:04:28 +0000 (UTC) Subject: [Python-Dev] core python tests References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> <49D64ECB.9040100@voidspace.org.uk> <20090404025534.GA12996@idyll.org> <49D6ED27.8030908@gmail.com> Message-ID: Nick Coghlan gmail.com> writes: > > C. Titus Brown wrote: > > I vote for a separate mailing list -- 'python-tests'? -- but I don't > > know exactly how splintered to make the conversation. It probably > > belongs at python.org but if you want me to host it, I can. > > If too many things get moved off to SIGs there won't be anything left > for python-dev to talk about ;) There is already an stdlib-sig, which has been almost unused. Regards Antoine. From aahz at pythoncraft.com Sat Apr 4 15:28:01 2009 From: aahz at pythoncraft.com (Aahz) Date: Sat, 4 Apr 2009 06:28:01 -0700 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com> <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com> Message-ID: <20090404132800.GA10257@panix.com> On Fri, Apr 03, 2009, Collin Winter wrote: > > I don't believe that these are insurmountable problems, though. 
A > great contribution to Python performance work would be an improved > version of PyBench that corrects these problems and offers more > precise measurements. Is that something you might be interested in > contributing to? As performance moves more into the wider > consciousness, having good tools will become increasingly important. GSoC work? -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. Kernighan From ctb at msu.edu Sat Apr 4 16:26:12 2009 From: ctb at msu.edu (C. Titus Brown) Date: Sat, 4 Apr 2009 07:26:12 -0700 Subject: [Python-Dev] GSoC (was Re: PyDict_SetItem hook) In-Reply-To: <20090404132800.GA10257@panix.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <9e804ac0904030906x281906ra555c7fe2619197b@mail.gmail.com> <43aa6ff70904031018s6b97a76at545ce2a8f17916c9@mail.gmail.com> <20090404132800.GA10257@panix.com> Message-ID: <20090404142612.GG12593@idyll.org> On Sat, Apr 04, 2009 at 06:28:01AM -0700, Aahz wrote: -> On Fri, Apr 03, 2009, Collin Winter wrote: -> > -> > I don't believe that these are insurmountable problems, though. A -> > great contribution to Python performance work would be an improved -> > version of PyBench that corrects these problems and offers more -> > precise measurements. Is that something you might be interested in -> > contributing to? As performance moves more into the wider -> > consciousness, having good tools will become increasingly important. -> -> GSoC work? Alas, it's too late to submit new proposals; the deadline was yesterday. 
The next "Google gives us money to wrangle students into doing development" project will probably be GHOP for highschool students, in the winter, although it has not been announced and may not happen. cheers, --titus -- C. Titus Brown, ctb at msu.edu From fuzzyman at voidspace.org.uk Sat Apr 4 16:33:49 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 04 Apr 2009 15:33:49 +0100 Subject: [Python-Dev] core python tests In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> <49D64ECB.9040100@voidspace.org.uk> <20090404025534.GA12996@idyll.org> <49D6ED27.8030908@gmail.com> Message-ID: <49D76FCD.8050303@voidspace.org.uk> Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: > >> C. Titus Brown wrote: >> >>> I vote for a separate mailing list -- 'python-tests'? -- but I don't >>> know exactly how splintered to make the conversation. It probably >>> belongs at python.org but if you want me to host it, I can. >>> >> If too many things get moved off to SIGs there won't be anything left >> for python-dev to talk about ;) >> > > There is already an stdlib-sig, which has been almost unused. > > stdlib-sig isn't *quite* right (the testing and benchmarking are as much about core python as the stdlib) - although we could view the benchmarks and tests themselves as part of the standard library... Either way we should get it underway. Collin and Jeffrey - happy to use stdlib-sig? Michael > Regards > > Antoine. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From ctb at msu.edu Sat Apr 4 17:01:11 2009 From: ctb at msu.edu (C. Titus Brown) Date: Sat, 4 Apr 2009 08:01:11 -0700 Subject: [Python-Dev] graphics maths types in python core? Message-ID: <20090404150111.GQ12593@idyll.org> Hi all, we're having a discussion over on the GSoC mailing list about basic math types, and I was wondering if there is any history that we should be aware of in python-dev. Has this been brought up before and rejected? Should the interested projects work towards a consensus and maybe write up a PEP? (The proximal issue is whether or not this is of direct relevance to the python core and hence should be given priority.) tnx, -titus Rene Dudfield wrote: -> there's seven graphics math type proposals which would be a good project -> for the graphics python using projects -- especially if they can get -> into python. -> -> It would be great if one of these proposals was accepted to work towards -> getting these simple types into python. -> -> Otherwise we'll be doomed to have each project implement vec2, vec3, -> vec4, matrix3/4, quaternion (which has already happened many times) - -> and continue to have interoperability issues. -> -> The reason why just these basic types, and not full blown numpy is that -> numpy is never planned to get into python. Numpy doesn't want to tie -> it's development into pythons development cycle. Whereas a small set of -> types can be implemented and stabalised for python more easily. -> -> Also, it's not image, or 3d format types -- since those are also a way -> larger project. -- C. 
Titus Brown, ctb at msu.edu From solipsis at pitrou.net Sat Apr 4 17:09:39 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 4 Apr 2009 15:09:39 +0000 (UTC) Subject: [Python-Dev] graphics maths types in python core? References: <20090404150111.GQ12593@idyll.org> Message-ID: C. Titus Brown msu.edu> writes: > > we're having a discussion over on the GSoC mailing list about basic > math types > [...] > -> > -> Otherwise we'll be doomed to have each project implement vec2, vec3, > -> vec4, matrix3/4, quaternion (which has already happened many times) - > -> and continue to have interoperability issues. This interoperability problem is the very reason the new buffer API and memoryview object were devised by Travis Oliphant (who is, AFAIK, a numpy contributor). Unfortunately, Travis disappeared and left us with an unfinished implementation which doesn't support anything else than linear byte buffers. So, rather than trying to stuff new specialized datatypes into Python, I suggest maths types proponents contribute the missing bits of the new buffer API and memoryview object :-) Regards Antoine. From aahz at pythoncraft.com Sat Apr 4 17:40:50 2009 From: aahz at pythoncraft.com (Aahz) Date: Sat, 4 Apr 2009 08:40:50 -0700 Subject: [Python-Dev] Mercurial? Message-ID: <20090404154049.GA23987@panix.com> With Brett's (hopefully temporary!) absence, who is spearheading the Mercurial conversion? Whoever it is should probably take over PEP 374 and start updating it with the conversion plan, particularly WRT expectations for dates relative to 3.1 final and 2.7 final. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. 
Kernighan From mario.danic at gmail.com Sat Apr 4 17:46:02 2009 From: mario.danic at gmail.com (Mario) Date: Sat, 4 Apr 2009 17:46:02 +0200 Subject: [Python-Dev] Helper Python core development tools Message-ID: <79957db20904040846k6ea1c392lcc9393899aa77352@mail.gmail.com> With all the sand and sun on the beaches, should I really be doing this now? That is the question we probably ask ourselves every time we have to do some boring task. What kind of things do you think could be made better? What would make your workflow smoother and more fun? Now is your chance to voice your opinion. http://wiki.python.org/moin/CoreDevHelperTools Some of the tools/extensions categories that could be relevant: - Wrappers for working with tracker issues - Wrapper for managing patches - Wrapper for running tests - Wrapper for submitting diffs for review - Commit helpers (various hooks) - Various Roundup extensions Please be invited to comment and raise your concerns, so we could discuss them together and make our hacker's life more enjoyable. My name is Mario Đanić, a hopeful GSoC student, and I am looking forward to working with you. Thank you for your time and your help in this matter. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From collinw at gmail.com Sat Apr 4 18:52:11 2009 From: collinw at gmail.com (Collin Winter) Date: Sat, 4 Apr 2009 09:52:11 -0700 Subject: [Python-Dev] core python tests In-Reply-To: <49D76FCD.8050303@voidspace.org.uk> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> <49D64ECB.9040100@voidspace.org.uk> <20090404025534.GA12996@idyll.org> <49D6ED27.8030908@gmail.com> <49D76FCD.8050303@voidspace.org.uk> Message-ID: <43aa6ff70904040952g4aece85ajfceac04b7d857194@mail.gmail.com> On Sat, Apr 4, 2009 at 7:33 AM, Michael Foord wrote: > Antoine Pitrou wrote: >> >> Nick Coghlan gmail.com> writes: >> >>> >>> C. Titus Brown wrote: >>> >>>> >>>> I vote for a separate mailing list -- 'python-tests'? -- but I don't >>>> know exactly how splintered to make the conversation. ?It probably >>>> belongs at python.org but if you want me to host it, I can. >>>> >>> >>> If too many things get moved off to SIGs there won't be anything left >>> for python-dev to talk about ;) >>> >> >> There is already an stdlib-sig, which has been almost unused. >> >> > > stdlib-sig isn't *quite* right (the testing and benchmarking are as much > about core python as the stdlib) - although we could view the benchmarks and > tests themselves as part of the standard library... > > Either way we should get it underway. Collin and Jeffrey - happy to use > stdlib-sig? Works for me. Collin From guido at python.org Sat Apr 4 20:20:19 2009 From: guido at python.org (Guido van Rossum) Date: Sat, 4 Apr 2009 11:20:19 -0700 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: References: <20090404150111.GQ12593@idyll.org> Message-ID: I'm not even sure what you mean by "basic math types" (it would probably depend on which math curriculum you are using :-) but if you're not already aware of PEP 3141, that's where to start. 
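For anyone following along, PEP 3141's numeric tower shipped as the numbers module in Python 2.6 and 3.0, so the ABCs it defines can be tried directly:

```python
import numbers
from fractions import Fraction

# The tower: Number > Complex > Real > Rational > Integral
assert isinstance(3, numbers.Integral)
assert isinstance(3.5, numbers.Real)
assert not isinstance(3.5, numbers.Integral)
assert isinstance(1 + 2j, numbers.Complex)
assert not isinstance(1 + 2j, numbers.Real)
assert isinstance(Fraction(1, 3), numbers.Rational)
```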
--Guido On Sat, Apr 4, 2009 at 8:09 AM, Antoine Pitrou wrote: > C. Titus Brown msu.edu> writes: >> >> we're having a discussion over on the GSoC mailing list about basic >> math types >> > [...] >> -> >> -> Otherwise we'll be doomed to have each project implement vec2, vec3, >> -> vec4, matrix3/4, quaternion (which has already happened many times) - >> -> and continue to have interoperability issues. > > This interoperability problem is the very reason the new buffer API and > memoryview object were devised by Travis Oliphant (who is, AFAIK, a numpy > contributor). Unfortunately, Travis disappeared and left us with an unfinished > implementation which doesn't support anything else than linear byte buffers. > > So, rather than trying to stuff new specialized datatypes into Python, I suggest > maths types proponents contribute the missing bits of the new buffer API and > memoryview object :-) > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Sat Apr 4 20:23:30 2009 From: barry at python.org (Barry Warsaw) Date: Sat, 4 Apr 2009 14:23:30 -0400 Subject: [Python-Dev] Package Management - thoughts from the peanut gallery In-Reply-To: <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> References: <49D534B3.8020801@simplistix.co.uk> <87y6uitjxd.fsf@xemacs.org> <94bdd2610904030101k297d59cah6987ddd8ad37207@mail.gmail.com> Message-ID: <75F5CB1E-8589-4848-937E-F43F2B82D5F3@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 3, 2009, at 4:01 AM, Tarek Ziad? wrote: > Each one of this task has a leader, except the one with (*). I just > got back > from travelling, and I will reorganize > http://wiki.python.org/moin/Distutils asap to it is up-to-date. 
I added a link to this from the new SIG page. http://wiki.python.org/moin/Special%20Interest%20Groups Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSdelonEjvBPtnXfVAQI0QQP/c0mXr4OA+yLOFHqSksFxT5pkt2xPtxPO 25VfcGFmP0FydsGMW0fpIPC9nw3kaZhtwtx5iYiRXOg796IParSzSdleKwRdabwA SH+EzhD0gprwyfPEi6Vptb+ORz8if1gz4UPIUBfJaLVGw7eXH0Xue5rqUEksu6MX wi/MMub9V0g= =2FHl -----END PGP SIGNATURE----- From fuzzyman at voidspace.org.uk Sat Apr 4 20:37:26 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 04 Apr 2009 19:37:26 +0100 Subject: [Python-Dev] core python tests In-Reply-To: <5d44f72f0904041130x5f805862t396787b8fbb5ce6f@mail.gmail.com> References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904030919j725b375avfbe4c80c9f7bc464@mail.gmail.com> <49D64748.70305@voidspace.org.uk> <43aa6ff70904031035i62687614va480c93db09ade36@mail.gmail.com> <49D64ECB.9040100@voidspace.org.uk> <20090404025534.GA12996@idyll.org> <49D6ED27.8030908@gmail.com> <49D76FCD.8050303@voidspace.org.uk> <43aa6ff70904040952g4aece85ajfceac04b7d857194@mail.gmail.com> <5d44f72f0904041130x5f805862t396787b8fbb5ce6f@mail.gmail.com> Message-ID: <49D7A8E6.5030901@voidspace.org.uk> Jeffrey Yasskin wrote: > On Sat, Apr 4, 2009 at 11:52 AM, Collin Winter wrote: > >> On Sat, Apr 4, 2009 at 7:33 AM, Michael Foord wrote: >> >>> Antoine Pitrou wrote: >>> >>>> Nick Coghlan gmail.com> writes: >>>> >>>> >>>>> C. Titus Brown wrote: >>>>> >>>>> >>>>>> I vote for a separate mailing list -- 'python-tests'? -- but I don't >>>>>> know exactly how splintered to make the conversation. It probably >>>>>> belongs at python.org but if you want me to host it, I can. >>>>>> >>>>>> >>>>> If too many things get moved off to SIGs there won't be anything left >>>>> for python-dev to talk about ;) >>>>> >>>>> >>>> There is already an stdlib-sig, which has been almost unused. 
>>>> >>>> >>>> >>> stdlib-sig isn't *quite* right (the testing and benchmarking are as much >>> about core python as the stdlib) - although we could view the benchmarks and >>> tests themselves as part of the standard library... >>> >>> Either way we should get it underway. Collin and Jeffrey - happy to use >>> stdlib-sig? >>> >> Works for me. >> > > Me too. > > bcc python-dev, -> stdlib-sig > > First question: Do people want the unladen-swallow performance tests > in the CPython repository until the whole library gets moved out? If > so, where? Tools/performance? Lib/test/benchmarks? > I'm +1 on including them (so long as they run under trunk of course) but agnostic on location. Maybe better not in test as it might be expected that a full regrtest would then run them? I'm keeping Python-dev cc'd as it is a Python-dev decision and bcc messages require individual admin approval. Michael > Jeffrey > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From dirkjan at ochtman.nl Sat Apr 4 22:37:55 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sat, 04 Apr 2009 22:37:55 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <20090404154049.GA23987@panix.com> References: <20090404154049.GA23987@panix.com> Message-ID: <49D7C523.2090605@ochtman.nl> On 04/04/2009 17:40, Aahz wrote: > With Brett's (hopefully temporary!) absence, who is spearheading the > Mercurial conversion? Whoever it is should probably take over PEP 374 > and start updating it with the conversion plan, particularly WRT > expectations for dates relative to 3.1 final and 2.7 final. I'd like to take that on. I know hardly anyone here knows me, but I'm one of the Mercurial developers. I've been in contact with Brett, saying that I'd gladly as much help as I could, and I figured I'd put a lot of time in providing the best possible migration path. While I haven't posted here much, I've been lurking for about two years now, so I know a little about what's going on. 
Maybe I could pair up with someone here who wants to work on it, if that makes people more confident? Anyway, I'm also on the tracker-discuss list, since Brett told me that's where infra stuff mostly takes place. Cheers, Dirkjan From greg.ewing at canterbury.ac.nz Sun Apr 5 00:00:36 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Apr 2009 10:00:36 +1200 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: <20090404150111.GQ12593@idyll.org> References: <20090404150111.GQ12593@idyll.org> Message-ID: <49D7D884.5060801@canterbury.ac.nz> C. Titus Brown wrote: > we're having a discussion over on the GSoC mailing list about basic > math types, and I was wondering if there is any history that we should > be aware of in python-dev. Something I've suggested before is to provide a set of functions for doing elementwise arithmetic operations on objects that support the new buffer protocol. Together with a multidimensional version of the standard array.array type, this would provide a kind of "numpy lite" that you could use to build reasonably efficient vector and matrix types with no external dependencies. By making these functions that operate through the buffer protocol rather than special types, they would be much more flexible and interoperate with other libraries very well. -- Greg From solipsis at pitrou.net Sun Apr 5 00:11:55 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 4 Apr 2009 22:11:55 +0000 (UTC) Subject: [Python-Dev] graphics maths types in python core? References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> Message-ID: Greg Ewing canterbury.ac.nz> writes: > > Something I've suggested before is to provide a set of > functions for doing elementwise arithmetic operations on > objects that support the new buffer protocol. 
> > Together with a multidimensional version of the standard > array.array type, this would provide a kind of "numpy > lite" that you could use to build reasonably efficient > vector and matrix types with no external dependencies. Again, I don't want to spoil the party, but multidimensional buffers are not implemented, and neither are buffers of anything other than single-byte data. Interested people should start with this, before jumping to the higher-level stuff. Regards Antoine. From brian at sweetapp.com Sun Apr 5 00:35:32 2009 From: brian at sweetapp.com (brian at sweetapp.com) Date: Sat, 4 Apr 2009 18:35:32 -0400 (EDT) Subject: [Python-Dev] Possible py3k io wierdness Message-ID: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> Hey, I noticed that the call pattern of the C-implemented io libraries is as follows (translating from C to Python): class _FileIO(object): def flush(self): if self.__IOBase_closed: raise ... def close(self): self.flush() self.__IOBase_closed = True class _RawIOBase(_FileIO): def close(self): # do close _FileIO.close(self) This means that, if a subclass overrides flush(), it will be called after the file has been closed e.g. >>> import io >>> class MyIO(io.FileIO): ... def flush(self): ... print('closed:', self.closed) ... >>> f = MyIO('test.out', 'wb') >>> f.close() closed: True It seems to me that, during close, calls should only propagate up the class hierarchy i.e. class _FileIO(object): def flush(self): if self.__IOBase_closed: raise ... def close(self): _FileIO.flush(self) self.__IOBase_closed = True I volunteer to change this if there is agreement that this is the way to go. 
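Reduced to plain Python, the call pattern above looks like this (the class names are illustrative, not the real io hierarchy):

```python
class IOBase:
    def __init__(self):
        self._closed = False

    def flush(self):
        if self._closed:
            raise ValueError("flush of closed file")

    def close(self):
        self.flush()           # virtual dispatch -- may land on a user override
        self._closed = True

class RawIO(IOBase):
    def close(self):
        self._closed = True    # the raw layer closes the "file" first...
        IOBase.close(self)     # ...then the base close() flushes, too late

class MyIO(RawIO):
    def flush(self):
        print("closed:", self._closed)

MyIO().close()                 # prints: closed: True
```

The proposed change corresponds to having close() call IOBase.flush(self) directly rather than self.flush(), so that closing never dispatches into subclass code.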
Cheers,
Brian

From benjamin at python.org Sun Apr 5 01:13:40 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 4 Apr 2009 18:13:40 -0500
Subject: [Python-Dev] [RELEASED] Python 3.1 alpha 2
Message-ID: <1afaf6160904041613t4bb44976x65d7a4d4a90f2c47@mail.gmail.com>

On behalf of the Python development team, I'm thrilled to announce the
second alpha release of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of features and
changes Python 3.0 introduced. For example, the new I/O system has been
rewritten in C for speed. Other features include an ordered dictionary
implementation and support for ttk Tile in Tkinter. For a more extensive
list of changes in 3.1, see
http://doc.python.org/dev/py3k/whatsnew/3.1.html or Misc/NEWS in the
Python distribution.

Please note that this is an alpha release, and as such is not suitable
for production environments. We continue to strive for a high degree of
quality, but there are still some known problems and the feature sets
have not been finalized. This alpha is being released to solicit
feedback and hopefully discover bugs, as well as allowing you to
determine how changes in 3.1 might impact you. If you find things broken
or incorrect, please submit a bug report at

    http://bugs.python.org

For more information and downloadable distributions, see the Python 3.1
website:

    http://www.python.org/download/releases/3.1/

See PEP 375 for release schedule details:

    http://www.python.org/dev/peps/pep-0375/

Regards,
-- Benjamin

Benjamin Peterson
benjamin at python.org
Release Manager
(on behalf of the entire python-dev team and 3.1's contributors)

From solipsis at pitrou.net Sun Apr 5 01:23:06 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 4 Apr 2009 23:23:06 +0000 (UTC)
Subject: [Python-Dev] Possible py3k io wierdness
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
Message-ID: 

Hi!
sweetapp.com> writes:
>
> class _RawIOBase(_FileIO):

FileIO is a subclass of _RawIOBase, not the reverse:

>>> issubclass(_io._RawIOBase, _io.FileIO)
False
>>> issubclass(_io.FileIO, _io._RawIOBase)
True

I do understand your surprise, but the Python implementation of
IOBase.close() in _pyio.py does the same thing:

    def close(self) -> None:
        """Flush and close the IO object.

        This method has no effect if the file is already closed.
        """
        if not self.__closed:
            try:
                self.flush()
            except IOError:
                pass  # If flush() fails, just give up
            self.__closed = True

Note how it calls `self.flush()` and not `IOBase.flush(self)`.
When writing the C version of the I/O stack, we tried to keep the
semantics the same as in the Python version, although there are a couple
of subtleties.

Your problem here is that it's IOBase.close() which calls your flush()
method, but FileIO.close() has already done its job before and the
internal file descriptor has been closed (hence `self.closed` is True).
In this particular case, I advocate overriding close() as well and
calling your flush() method manually from there.

Thanks for your feedback!

Regards

Antoine.

From steve at holdenweb.com Sun Apr 5 01:23:32 2009
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 04 Apr 2009 19:23:32 -0400
Subject: [Python-Dev] Integrate BeautifulSoup into stdlib?
In-Reply-To: <49D6C20C.8030102@v.loewis.de>
References: <49BA3154.8080408@simplistix.co.uk> <49BAA596.5020106@v.loewis.de> <49C79C1A.8040301@simplistix.co.uk> <49C7FC85.5000809@v.loewis.de> <49C80FA0.4020800@simplistix.co.uk> <87ab7bh5fb.fsf@xemacs.org> <49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de> <49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com> <49C939BA.8040206@v.loewis.de> <1238798174.5360.388.camel@saeko.local> <49D6C20C.8030102@v.loewis.de>
Message-ID: 

Martin v.
L?wis wrote: >> That's not entirely true; Cygwin comes with a package management tool >> that probably could be used to set up a repository of python packages >> for native Windows: > > Ah, ok. It has the big disadvantage of not being Microsoft-endorsed, > though. In that sense, it feels very much like easy_install (which also > does dependencies). > Not only that, but the Cygwin packaging system appears to be extremely difficult to organize a package for. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Want to know? Come to PyCon - soon! http://us.pycon.org/ From benjamin at python.org Sun Apr 5 01:31:45 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 4 Apr 2009 18:31:45 -0500 Subject: [Python-Dev] 3.1 beta is closer than you think Message-ID: <1afaf6160904041631u37bf7ed6ga3ad8b338c0afe95@mail.gmail.com> 3.1's only beta is planned for May 2nd, so that means you have exactly 28 days to get the amazing 3.1 features you have planned checked into the py3k branch. There will be absolutely no new features after the beta is released. -- Regards, Benjamin From greg.ewing at canterbury.ac.nz Sun Apr 5 01:34:20 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Apr 2009 11:34:20 +1200 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> Message-ID: <49D7EE7C.4040604@canterbury.ac.nz> Antoine Pitrou wrote: > Again, I don't want to spoil the party, but multidimensional buffers are > not implemented, and neither are buffers of anything other than single-byte > data. When you say "buffer" here, are you talking about the buffer interface itself, or the memoryview object? -- Greg From solipsis at pitrou.net Sun Apr 5 01:38:30 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 4 Apr 2009 23:38:30 +0000 (UTC) Subject: [Python-Dev] graphics maths types in python core? 
References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> Message-ID: Greg Ewing canterbury.ac.nz> writes: > > > Again, I don't want to spoil the party, but multidimensional buffers are > > not implemented, and neither are buffers of anything other than single-byte > > data. > > When you say "buffer" here, are you talking about the > buffer interface itself, or the memoryview object? Both. Well, taking a buffer or memoryview to non-bytes data is supported, but since it's basically unused, some things are likely missing or broken (e.g. memoryview.tolist()). From greg.ewing at canterbury.ac.nz Sun Apr 5 01:52:11 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Apr 2009 11:52:11 +1200 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> Message-ID: <49D7F2AB.8060907@canterbury.ac.nz> Antoine Pitrou wrote: > Both. > Well, taking a buffer or memoryview to non-bytes data is supported, but since > it's basically unused, some things are likely missing or broken So you're saying the buffer interface *has* been fully implemented, it just hasn't been tested very well? If so, writing some things that attempt to use it in non-trivial ways would be a useful thing to do. -- Greg From benjamin at python.org Sun Apr 5 01:52:57 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 4 Apr 2009 18:52:57 -0500 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: <49D7F2AB.8060907@canterbury.ac.nz> References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> Message-ID: <1afaf6160904041652r4d538210y88ee33f7027a4dae@mail.gmail.com> 2009/4/4 Greg Ewing : > Antoine Pitrou wrote: > >> Both. 
>> Well, taking a buffer or memoryview to non-bytes data is supported, but >> since >> it's basically unused, some things are likely missing or broken > > So you're saying the buffer interface *has* been fully > implemented, it just hasn't been tested very well? No, only simple linear bytes are supported. > > If so, writing some things that attempt to use it in > non-trivial ways would be a useful thing to do. -- Regards, Benjamin From solipsis at pitrou.net Sun Apr 5 01:56:07 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 4 Apr 2009 23:56:07 +0000 (UTC) Subject: [Python-Dev] graphics maths types in python core? References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> Message-ID: Greg Ewing canterbury.ac.nz> writes: > > So you're saying the buffer interface *has* been fully > implemented, it just hasn't been tested very well? No, it hasn't been implemented for multi-dimensional types, and it hasn't been really tested for anything other than plain linear collections of bytes. (I have added tests for arrays in test_memoryview, but that's all. And that's only in py3k since array.array in 2.x only supports the old buffer interface) From martin at v.loewis.de Sun Apr 5 02:44:38 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 02:44:38 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D7C523.2090605@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl> Message-ID: <49D7FEF6.1010006@v.loewis.de> > I'd like to take that on. I know hardly anyone here knows me, but I'm > one of the Mercurial developers. I've been in contact with Brett, saying > that I'd gladly as much help as I could, and I figured I'd put a lot of > time in providing the best possible migration path. 
I'm personally happy letting you do that (although I do wonder who would then be in charge of the Mercurial installation in the long run, the way I have been in charge of the subversion installation). To proceed, I think the next step should be to discuss in the PEP the details of the migration procedure (see PEP 347 for what level of detail I produced for the svn migration), and to set up a demo installation that is considered ready-to-run, except that it might get torn down again, if the actual conversion requires that (it did for the CVS->svn case), or if problems are found with the demo installation. I would personally remove all non-mercurial stuff out of PEP 374, and retitle it, but that would be your choice. Regards, Martin From solipsis at pitrou.net Sun Apr 5 03:03:23 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 01:03:23 +0000 (UTC) Subject: [Python-Dev] BufferedReader.peek() ignores its argument Message-ID: Hello, Currently, BufferedReader.peek() ignores its argument and can return more or less than the number of bytes requested by the user. This is how it was implemented in the Python version, and we've reflected this in the C version. It seems a bit strange and unhelpful though. Should we change the implementation so that the argument to peek() becomes the upper bound to the number of bytes returned? Thanks for your advice, Antoine. 
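The current behaviour is easy to demonstrate; a short sketch (peek() gives no exact-size guarantee, so only the guaranteed properties are noted):

```python
import io

raw = io.BytesIO(b"abcdefgh")
buffered = io.BufferedReader(raw)

# Current semantics: the argument is only a hint. peek() may return
# more (or fewer) bytes than requested, but it never advances the
# stream position.
peeked = buffered.peek(1)

# A subsequent read still starts at the beginning of the stream.
first4 = buffered.read(4)   # b"abcd"
```

A caller who wants a hard upper bound today has to cap the result manually, e.g. with peeked[:1].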
From ben+python at benfinney.id.au Sun Apr 5 03:07:26 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 05 Apr 2009 11:07:26 +1000 Subject: [Python-Dev] UnicodeDecodeError bug in distutils References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com> <200702242309.46022.pogonyshev@gmx.net> <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <45E0C012.7090801@palladion.com> <5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com> <877i22fuqy.fsf_-_@benfinney.id.au> Message-ID: <87iqljc3ht.fsf@benfinney.id.au> Ben Finney writes: > "Phillip J. Eby" writes: > > > Meanwhile, the 'register' command accepts Unicode, but is broken in > > handling it. [?] > > > > Unfortunately, this isn't fixable until there's a new 2.5.x release. > > For previous Python versions, both register and write_pkg_info() > > accepted 8-bit strings and passed them on as-is, so the only > > workaround for this issue at the moment is to revert to Python 2.4 > > or less. > > What is the prognosis on this issue? It's still hitting me in Python > 2.5.4. Any word on this? Is there an open bug tracker issue with more information? Who's working on this? -- \ ?If sharing a thing in no way diminishes it, it is not rightly | `\ owned if it is not shared.? ?Saint Augustine | _o__) | Ben Finney From benjamin at python.org Sun Apr 5 03:11:50 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 4 Apr 2009 20:11:50 -0500 Subject: [Python-Dev] BufferedReader.peek() ignores its argument In-Reply-To: References: Message-ID: <1afaf6160904041811y5d4933dfj2f7b0da02967a833@mail.gmail.com> 2009/4/4 Antoine Pitrou : > Hello, > > Currently, BufferedReader.peek() ignores its argument and can return more or > less than the number of bytes requested by the user. This is how it was > implemented in the Python version, and we've reflected this in the C version. > > It seems a bit strange and unhelpful though. 
> Should we change the implementation
> so that the argument to peek() becomes the upper bound to the number of bytes
> returned?

+1 That sounds more useful.

>
> Thanks for your advice,

-- 
Regards,
Benjamin

From lists at cheimes.de Sun Apr 5 03:14:03 2009
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 05 Apr 2009 03:14:03 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D7FEF6.1010006@v.loewis.de>
References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl> <49D7FEF6.1010006@v.loewis.de>
Message-ID: 

Martin v. Löwis wrote:
> I would personally remove all non-mercurial stuff out of PEP 374,
> and retitle it, but that would be your choice.

I suggest we keep the old PEP and start a new one about Hg exclusively.
The original PEP 374 has cost Brett a lot of time. It would be a shame
to throw it away when it may come in handy for other FOSS projects that
want to move away from subversion.

Dirkjan or whoever is going to work on the PEP can copy n' paste the
interesting pieces from PEP 374 to the new one.

Christian

From martin at v.loewis.de Sun Apr 5 03:40:22 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 03:40:22 +0200
Subject: [Python-Dev] UnicodeDecodeError bug in distutils
In-Reply-To: <87iqljc3ht.fsf@benfinney.id.au>
References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com> <200702242309.46022.pogonyshev@gmx.net> <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <45E0C012.7090801@palladion.com> <5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com> <877i22fuqy.fsf_-_@benfinney.id.au> <87iqljc3ht.fsf@benfinney.id.au>
Message-ID: <49D80C06.20809@v.loewis.de>

>>> Meanwhile, the 'register' command accepts Unicode, but is broken in
>>> handling it. [?]
>>>
>>> Unfortunately, this isn't fixable until there's a new 2.5.x release.
>>> For previous Python versions, both register and write_pkg_info() >>> accepted 8-bit strings and passed them on as-is, so the only >>> workaround for this issue at the moment is to revert to Python 2.4 >>> or less. >> What is the prognosis on this issue? It's still hitting me in Python >> 2.5.4. > > Any word on this? Is there an open bug tracker issue with more > information? Who's working on this? For Python 2.5.4, no further changes will be made. If you can reproduce with 2.6, and can't find a tracker issue, make a new report. Regards, Martin From tjreedy at udel.edu Sun Apr 5 03:45:40 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 04 Apr 2009 21:45:40 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl> <49D7FEF6.1010006@v.loewis.de> Message-ID: Christian Heimes wrote: > Martin v. L?wis wrote: >> I would personally remove all non-mercurial stuff out of PEP 374, >> and retitle it, but that would be your choice. > > I suggest we keep the old PEP and start a new one about Hg exclusively. > The original PEP 374 has cost Brett a lot of time. It would be a shame > to throw it away when it may become in handy for other FOSS projects > that want to move away from subversion. I second not tossing the data and history. It serves as partial justification for the decision, which has been and will occasionally again be discussed on python-list. > Dirkjan or whoever is going to work on the PEP can copy n' paste the > interesting pieces from PEP 374 to the new one. 
tjr From aahz at pythoncraft.com Sun Apr 5 03:58:00 2009 From: aahz at pythoncraft.com (Aahz) Date: Sat, 4 Apr 2009 18:58:00 -0700 Subject: [Python-Dev] BufferedReader.peek() ignores its argument In-Reply-To: References: Message-ID: <20090405015800.GB19165@panix.com> On Sun, Apr 05, 2009, Antoine Pitrou wrote: > > Currently, BufferedReader.peek() ignores its argument and can return > more or less than the number of bytes requested by the user. This is > how it was implemented in the Python version, and we've reflected this > in the C version. > > It seems a bit strange and unhelpful though. Should we change the > implementation so that the argument to peek() becomes the upper bound > to the number of bytes returned? IIRC, this was made to handle SSL where the number of bytes returned may need to be larger than the size. If that's the case, there should be a record somewhere in the list archives... (Or possibly the svn logs.) -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian W. Kernighan From ben+python at benfinney.id.au Sun Apr 5 04:48:58 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 05 Apr 2009 12:48:58 +1000 Subject: [Python-Dev] UnicodeDecodeError bug in distutils References: <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <94bdd2610702241247t568a942dw2fe1b10883b62d20@mail.gmail.com> <200702242309.46022.pogonyshev@gmx.net> <94bdd2610702241306q60b1a10rb91dff4919fdae13@mail.gmail.com> <45E0C012.7090801@palladion.com> <5.1.1.6.0.20070224203115.0270a5a8@sparrow.telecommunity.com> <877i22fuqy.fsf_-_@benfinney.id.au> <87iqljc3ht.fsf@benfinney.id.au> Message-ID: <87eiw7bysl.fsf@benfinney.id.au> Ben Finney writes: > Is there an open bug tracker issue with more information? Answer: . Apparently the issue is resolved for Python 2.6. 
I will need to wait for my distribution to catch up before I can know whether it's resolved. -- \ ?The World is not dangerous because of those who do harm but | `\ because of those who look at it without doing anything.? | _o__) ?Albert Einstein | Ben Finney From martin at v.loewis.de Sun Apr 5 04:56:12 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 04:56:12 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl> <49D7FEF6.1010006@v.loewis.de> Message-ID: <49D81DCC.70306@v.loewis.de> > I second not tossing the data and history. It serves as partial > justification for the decision, which has been and will occasionally > again be discussed on python-list. It's in subversion, so the history won't be tossed. To keep it online, it doesn't have to be in the PEP - putting it in a wiki page would also allow referring to it. Regards, Martin From tjreedy at udel.edu Sun Apr 5 05:33:50 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 04 Apr 2009 23:33:50 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D81DCC.70306@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl> <49D7FEF6.1010006@v.loewis.de> <49D81DCC.70306@v.loewis.de> Message-ID: Martin v. L?wis wrote: >> I second not tossing the data and history. It serves as partial >> justification for the decision, which has been and will occasionally >> again be discussed on python-list. > > It's in subversion, so the history won't be tossed. I know; I should have been more exact: not hidden and made difficult to access. > To keep it online, > it doesn't have to be in the PEP - putting it in a wiki page would > also allow referring to it. Sure. A title like DvcsComparison would be easy to remember. 
From alexandre at peadrop.com Sun Apr 5 07:55:03 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 01:55:03 -0400
Subject: [Python-Dev] Mercurial?
In-Reply-To: <20090404154049.GA23987@panix.com>
References: <20090404154049.GA23987@panix.com>
Message-ID: 

On Sat, Apr 4, 2009 at 11:40 AM, Aahz wrote:
> With Brett's (hopefully temporary!) absence, who is spearheading the
> Mercurial conversion? Whoever it is should probably take over PEP 374
> and start updating it with the conversion plan, particularly WRT
> expectations for dates relative to 3.1 final and 2.7 final.

I am willing to take over this. I was in charge of the Mercurial
scenarios in the PEP, so it would be natural for me to continue with the
transition. In addition, I volunteer to maintain the new Mercurial
installation.

Off the top of my head, the following is needed for a successful migration:

    - Verify that the repository at http://code.python.org/hg/ is
      properly converted.
    - Convert the current svn commit hooks to Mercurial.
    - Add Mercurial support to the issue tracker.
    - Update the developer FAQ.
    - Set up temporary svn mirrors for the main Mercurial repositories.
    - Augment code.python.org infrastructure to support the creation of
      developer accounts.
    - Update the release.py script.

There are probably some other things that I missed, but I think this is
a good overview of what needs to be done. And of course, I would welcome
anyone who would be willing to help me with the transition.

-- Alexandre

From alexandre at peadrop.com Sun Apr 5 08:07:29 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 02:07:29 -0400
Subject: [Python-Dev] BufferedReader.peek() ignores its argument
In-Reply-To: 
References: 
Message-ID: 

On Sat, Apr 4, 2009 at 9:03 PM, Antoine Pitrou wrote:
> Hello,
>
> Currently, BufferedReader.peek() ignores its argument and can return more or
> less than the number of bytes requested by the user. This is how it was
> implemented in the Python version, and we've reflected this in the C version.
>
> It seems a bit strange and unhelpful though. Should we change the implementation
> so that the argument to peek() becomes the upper bound to the number of bytes
> returned?
>

I am not sure if this is a good idea. Currently, the argument of peek()
is documented as a lower bound that cannot exceed the size of the
buffer:

    Returns buffered bytes without advancing the position.

    The argument indicates a desired minimal number of bytes; we
    do at most one raw read to satisfy it. We never return more
    than self.buffer_size.

Changing the meaning of peek() now could introduce at least some
confusion and maybe also bugs. And personally, I like the current
behavior, since it guarantees that peek() won't return an empty string
unless you reached the end-of-file. Plus, it is fairly easy to cap the
number of bytes returned by doing f.peek()[:upper_bound].

-- Alexandre

From alexandre at peadrop.com Sun Apr 5 08:28:52 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 5 Apr 2009 02:28:52 -0400
Subject: [Python-Dev] Should I/O object wrappers close their underlying buffer when deleted?
Message-ID: 

Hello,

I would like to call to your attention the following behavior of
TextIOWrapper:

import io

def test(buf):
    textio = io.TextIOWrapper(buf)

buf = io.BytesIO()
test(buf)
print(buf.closed)  # This prints True currently

The problem here is TextIOWrapper closes its buffer when deleted.
BufferedRWPair behaves similarly. The solution is simply to override the
__del__ method of TextIOWrapper inherited from IOBase.

-- Alexandre

From ncoghlan at gmail.com Sun Apr 5 09:13:39 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 05 Apr 2009 17:13:39 +1000
Subject: [Python-Dev] graphics maths types in python core?
In-Reply-To: 
References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz>
Message-ID: <49D85A23.6020405@gmail.com>

Antoine Pitrou wrote:
> Greg Ewing canterbury.ac.nz> writes:
>> So you're saying the buffer interface *has* been fully
>> implemented, it just hasn't been tested very well?
>
> No, it hasn't been implemented for multi-dimensional types, and it hasn't been
> really tested for anything other than plain linear collections of bytes.
> (I have added tests for arrays in test_memoryview, but that's all. And that's
> only in py3k since array.array in 2.x only supports the old buffer interface)

Step back for a sec here... PEP 3118 has three pieces, not two.

Part 1, the actual new buffer protocol, is complete and works fine as
far as I know. If it didn't, we would have heard about it from the
third-party clients of the new protocol by now.

Parts 2 and 3, being the memoryview API and support for the new protocol
in the builtin types, are the parts that are currently restricted to
simple linear memory views.

That's largely because parts 2 and 3 are somewhat use case challenged:
the key motivation behind PEP 3118 was so that libraries like NumPy, PIL
and the like would have a common standard for data interchange. Since
those all have their own extension objects and will be using the PEP
3118 C API directly rather than going through memoryview, the state of
the Python API and the support from builtin container types is largely
irrelevant to the target audience for the PEP.

Actually *finishing* parts 2 and 3 of PEP 3118 would be a good precursor
to having some kind of multi-dimensional mathematics in the standard
library though.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From stephen at xemacs.org Sun Apr 5 11:08:44 2009
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sun, 05 Apr 2009 18:08:44 +0900 Subject: [Python-Dev] Integrate BeautifulSoup into stdlib? In-Reply-To: References: <49BA3154.8080408@simplistix.co.uk> <49BAA596.5020106@v.loewis.de> <49C79C1A.8040301@simplistix.co.uk> <49C7FC85.5000809@v.loewis.de> <49C80FA0.4020800@simplistix.co.uk> <87ab7bh5fb.fsf@xemacs.org> <49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de> <49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com> <49C939BA.8040206@v.loewis.de> <1238798174.5360.388.camel@saeko.local> <49D6C20C.8030102@v.loewis.de> Message-ID: <87fxgntqlf.fsf@xemacs.org> Steve Holden writes: > Not only that, but the Cygwin packaging system appears to be extremely > difficult to organize a package for. Really? I Don't Do Windows[tm], but the people who did installers and stuff for XEmacs releases never had problems with it. It was much more painful to create the .exe-style Windows installers. From greg.ewing at canterbury.ac.nz Sun Apr 5 11:08:02 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Apr 2009 21:08:02 +1200 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: <49D85A23.6020405@gmail.com> References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> <49D85A23.6020405@gmail.com> Message-ID: <49D874F2.8080709@canterbury.ac.nz> Nick Coghlan wrote: > Actually *finishing* parts 2 and 3 of PEP 3118 would be a good precursor > to having some kind of multi-dimensional mathematics in the standard > library though. Even if they only work on the existing one-dimensional sequence types, elementwise operations would still be useful to have. And if they work through the new buffer protocol, they'll be ready for multi-dimensional types if and when such types appear. 
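Concretely, such an elementwise helper might look something like this (a hypothetical sketch only: the function name and the memoryview-based dispatch are illustrative, and it assumes operands that expose a one-dimensional buffer with a matching format):

```python
from array import array

def elementwise_add(a, b):
    # Hypothetical helper: operate through the buffer protocol (via
    # memoryview) instead of depending on any concrete sequence type.
    ma, mb = memoryview(a), memoryview(b)
    if ma.format != mb.format or ma.shape != mb.shape:
        raise ValueError("operands must have matching format and shape")
    # Iterating a memoryview yields the unpacked elements, so the sum
    # can be repacked into a fresh array of the same format.
    return array(ma.format, (x + y for x, y in zip(ma, mb)))

result = elementwise_add(array('d', [1.0, 2.0, 3.0]),
                         array('d', [0.5, 0.5, 0.5]))
```

Because the helper only touches the buffer protocol, any future multi-dimensional type exposing the same interface could plug in without changes to the arithmetic itself.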
-- Greg

From martin at v.loewis.de Sun Apr 5 11:06:33 2009
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 05 Apr 2009 11:06:33 +0200
Subject: [Python-Dev] Mercurial?
In-Reply-To: 
References: <20090404154049.GA23987@panix.com>
Message-ID: <49D87499.5060502@v.loewis.de>

> Off the top of my head, the following is needed for a successful migration:
>
>     - Verify that the repository at http://code.python.org/hg/ is
> properly converted.

I see that this has four branches. What about all the other branches?
Will they be converted, or not? What about the stuff outside /python?

In particular, the Stackless people have requested that they move along
with what core Python does, so their code should also be converted.

> - Add Mercurial support to the issue tracker.

Not sure what this means. There is currently svn support insofar as the
tracker can format rNNN references into ViewCVS links; this should be
updated if possible (removed if not). There would also be a possibility
to auto-close issues from the commit messages. This is not done
currently, so I would not make it a prerequisite for the switch.

> - Set up temporary svn mirrors for the main Mercurial repositories.

What is that?

> - Augment code.python.org infrastructure to support the creation of
> developer accounts.

One option would be to carry on with the current setup; migrating it to
hg might work as well, of course.

> - Update the release.py script.
>
> There are probably some other things that I missed

Here are some:
- integrate with the buildbot
- come up with a strategy for /external (also relevant for the buildbot
  slaves)
- decide what to do with the bzr mirrors

Regards,
Martin

From brian at sweetapp.com Sun Apr 5 11:07:48 2009
From: brian at sweetapp.com (Brian Quinlan)
Date: Sun, 05 Apr 2009 10:07:48 +0100
Subject: [Python-Dev] Possible py3k io wierdness
In-Reply-To: 
References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com>
Message-ID: <49D874E4.6030602@sweetapp.com>

Hey Antoine,

Thanks for the clarification!

I see that the C implementation matches the Python implementation but I
don't see how the semantics of either are useful in this case.

If a subclass implements flush then, as you say, it must also implement
close and call flush itself before calling its superclass' close method.
But then _RawIOBase will pointlessly call the subclass' flush method a
second time. This second call should raise (because the file is closed)
and the exception will be caught and suppressed. I don't see why this is
helpful.

Could you explain why _RawIOBase.close() calling self.flush() is useful?

Cheers,
Brian

Antoine Pitrou wrote:
> Hi!
>
> sweetapp.com> writes:
>> class _RawIOBase(_FileIO):
>
> FileIO is a subclass of _RawIOBase, not the reverse:
>
>>>> issubclass(_io._RawIOBase, _io.FileIO)
> False
>>>> issubclass(_io.FileIO, _io._RawIOBase)
> True
>
> I do understand your surprise, but the Python implementation of IOBase.close()
> in _pyio.py does the same thing:
>
>     def close(self) -> None:
>         """Flush and close the IO object.
>
>         This method has no effect if the file is already closed.
>         """
>         if not self.__closed:
>             try:
>                 self.flush()
>             except IOError:
>                 pass  # If flush() fails, just give up
>             self.__closed = True
>
> Note how it calls `self.flush()` and not `IOBase.flush(self)`.
> When writing the C version of the I/O stack, we tried to keep the semantics the
> same as in the Python version, although there are a couple of subtleties.
>
> Your problem here is that it's IOBase.close() which calls your flush() method,
> but FileIO.close() has already done its job before and the internal file
> descriptor has been closed (hence `self.closed` is True). In this particular
> case, I advocate overriding close() as well and call your flush() method
> manually from there.
>
> Thanks for your feedback!
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brian%40sweetapp.com

From cournape at gmail.com Sun Apr 5 11:14:37 2009
From: cournape at gmail.com (David Cournapeau)
Date: Sun, 5 Apr 2009 18:14:37 +0900
Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D87499.5060502@v.loewis.de>
References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de>
Message-ID: <5b8d13220904050214i40142b43tbbadb7d6815c7923@mail.gmail.com>

On Sun, Apr 5, 2009 at 6:06 PM, "Martin v. Löwis" wrote:
>> Off the top of my head, the following is needed for a successful migration:
>>
>>     - Verify that the repository at http://code.python.org/hg/ is
>> properly converted.
>
> I see that this has four branches. What about all the other branches?
> Will they be converted, or not? What about the stuff outside /python?
>
> In particular, the Stackless people have requested that they move along
> with what core Python does, so their code should also be converted.

I don't know the capabilities of hg w.r.t svn conversion, so this may
well be overkill, but git has a really good tool for svn conversion
(svn-all-fast-export, developed by KDE). You can handle almost any svn
organization (e.g.
outside the usual trunk/tags/branches), and convert email addresses of committers, split one big svn repo into subprojects, etc... Then, the git repo could be converted to hg relatively easily I believe. cheers, David From robertc at robertcollins.net Sun Apr 5 11:16:56 2009 From: robertc at robertcollins.net (Robert Collins) Date: Sun, 05 Apr 2009 19:16:56 +1000 Subject: [Python-Dev] Integrate BeautifulSoup into stdlib? In-Reply-To: <87fxgntqlf.fsf@xemacs.org> References: <49BA3154.8080408@simplistix.co.uk> <49BAA596.5020106@v.loewis.de> <49C79C1A.8040301@simplistix.co.uk> <49C7FC85.5000809@v.loewis.de> <49C80FA0.4020800@simplistix.co.uk> <87ab7bh5fb.fsf@xemacs.org> <49C87004.2030807@holdenweb.com> <49C88503.2030902@v.loewis.de> <49C886EF.80203@v.loewis.de> <49C8C9B3.3070403@holdenweb.com> <49C939BA.8040206@v.loewis.de> <1238798174.5360.388.camel@saeko.local> <49D6C20C.8030102@v.loewis.de> <87fxgntqlf.fsf@xemacs.org> Message-ID: <1238923019.2700.394.camel@lifeless-64> On Sun, 2009-04-05 at 18:08 +0900, Stephen J. Turnbull wrote: > Steve Holden writes: > > > Not only that, but the Cygwin packaging system appears to be extremely > > difficult to organize a package for. > > Really? I Don't Do Windows[tm], but the people who did installers and > stuff for XEmacs releases never had problems with it. It was much > more painful to create the .exe-style Windows installers. Back when I was maintaining setup.exe was when XEmacs started using setup.exe to do installers; it must have been fairly straight forward because we first heard of it when it was complete :). The following may have changed, but I doubt it has changed dramatically - the setup.exe system is kindof trivial: There is a .lst file which is a .INI format file listing packages and direct dependencies. - each package is a .tar.(gz|bz2) which is unpacked on disk, and [optional] post-install, pre-removal scripts inside the tarball. 
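The .lst scheme described above is, at bottom, a tiny dependency resolver: each package names its direct dependencies, and the install set is their transitive closure. A sketch of that walk in Python (the `requires` field name and the exact INI layout are guesses on my part; the real format is only loosely described here):

```python
# Sketch only: walk the transitive "requires" entries of a Cygwin-style
# .lst file.  The "requires" field name is an assumption, not the
# documented format.
from configparser import ConfigParser

def install_set(cfg, package, seen=None):
    """Return the set of packages needed (transitively) by `package`."""
    seen = set() if seen is None else seen
    if cfg.has_option(package, 'requires'):
        for dep in cfg.get(package, 'requires').split():
            if dep not in seen:
                seen.add(dep)
                install_set(cfg, dep, seen)
    return seen

cfg = ConfigParser()
cfg.read_string("""
[setup]
requires = bash coreutils

[bash]
requires = libc

[coreutils]
requires = libc

[libc]
""")
assert install_set(cfg, 'setup') == {'bash', 'coreutils', 'libc'}
```

which is presumably all setup.exe has to work out before it starts unpacking tarballs.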
Doing an installer for something not part of Cygwin requires a one-time fork of the setup.exe program, to change the master source for .lst files, and that's about it. Beyond that it's all maintaining whatever set of packages and dependencies you have. If you are installing things for Cygwin itself you can just depend directly on things Cygwin ships in your .lst file; and not ship a setup.exe at all - setup.exe can source from many places to satisfy dependencies. -Rob From mario.danic at gmail.com Sun Apr 5 11:21:01 2009 From: mario.danic at gmail.com (Mario) Date: Sun, 5 Apr 2009 11:21:01 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D87499.5060502@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> Message-ID: <79957db20904050221m72d998b1ja126d496594c4438@mail.gmail.com> > > > Not sure what this means. There is currently svn support insofar as the > tracker can format rNNN references into ViewCVS links; this should be > updated if possible (removed if not). There would also be a possibility > to auto-close issues from the commit messages. This is not done > currently, so I would not make it a prerequisite for the switch. > > While I don't know how urgent this is, I will just mention that I am willing to work on Roundup-mercurial during GSoC (or outside it, if I don't get accepted). Cheers, M. From dirkjan at ochtman.nl Sun Apr 5 11:41:40 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 11:41:40 +0200 Subject: [Python-Dev] Mercurial?
In-Reply-To: References: <20090404154049.GA23987@panix.com> Message-ID: <49D87CD4.1000909@ochtman.nl> On 05/04/2009 07:55, Alexandre Vassalotti wrote: > - Verify that the repository at http://code.python.org/hg/ is > properly converted. I'm pretty sure that we'll need to reconvert; I don't think the current conversion is particularly good. We'll also have to decide on named branches vs. clones, for example, and if we could try to reorder revlogs to make the repo smaller after conversion. I've svnsynced the SVN repo so that we can work on it efficiently, and I've already talked with Augie Fackler, the hgsubversion maintainer, about what the best way forward is. For example, we may want to leave some of the very old history behind, or prune some old branches. > - Convert the current svn commit hooks to Mercurial. Some new hooks should also be discussed. For example, Mozilla uses a single-head hook, to prevent people from pushing multiple heads. They also have a pushlog extension that keeps a SQLite database of what people pushed. This is particularly useful for linearizing history, which is required for integration with buildbot infrastructure. > - Add Mercurial support to the issue tracker. I don't think there's much to do there, but a regex to link up some commonly-used revision references would be good. If we use cloned branches, we'll have to come up with some syntax to make that work. > - Update the developer FAQ. > - Setup temporary svn mirrors for the main Mercurial repositories. How do you plan to do that? I don't think there are any tools that support that, yet. I've actually started on my own, but I haven't gotten very far with it, yet. > - Augment code.python.org infrastructure to support the creation of > developer accounts. Developers already have accounts, don't they? In any case, some web interface to facilitate setting up new clones (branches) is also something that's probably desirable. 
I think Mozilla has some tooling for that which we might be able to start off of. Cheers, Dirkjan From dirkjan at ochtman.nl Sun Apr 5 11:45:38 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 11:45:38 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D7FEF6.1010006@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl> <49D7FEF6.1010006@v.loewis.de> Message-ID: <49D87DC2.2040708@ochtman.nl> On 05/04/2009 02:44, "Martin v. L?wis" wrote: > I'm personally happy letting you do that (although I do wonder who would > then be in charge of the Mercurial installation in the long run, the way > I have been in charge of the subversion installation). I'd be happy to commit to that for the foreseeable future. > To proceed, I think the next step should be to discuss in the PEP the > details of the migration procedure (see PEP 347 for what level of detail > I produced for the svn migration), and to set up a demo installation > that is considered ready-to-run, except that it might get torn down > again, if the actual conversion requires that (it did for the CVS->svn > case), or if problems are found with the demo installation. Sounds sane. Would I be able to get access to PSF infrastructure to get started on that, or do you want me to get started on my own box? I'll probably do the conversion on my own box, but for authn/authz it might be useful to be able to use PSF infra. > I would personally remove all non-mercurial stuff out of PEP 374, > and retitle it, but that would be your choice. Moving the current content to a wiki page like you suggest later in this thread sounds like a good idea. Cheers, Dirkjan From dirkjan at ochtman.nl Sun Apr 5 11:55:22 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 11:55:22 +0200 Subject: [Python-Dev] Mercurial? 
In-Reply-To: <49D87499.5060502@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> Message-ID: <49D8800A.60601@ochtman.nl> On 05/04/2009 11:06, "Martin v. Löwis" wrote: > In particular, the Stackless people have requested that they move along > with what core Python does, so their code should also be converted. I'd be interested to hear if they want all of their stuff converted, or just the mainline/trunk of what is currently in trunk/branches/tags. > - integrate with the buildbot I've set up the buildbot infra for Mercurial (though not many people are interested in it, so it's kind of languished). Using buildbot's hg support is easy. 0.7.10 is the first version which works with hg 1.1+, though, so we probably don't want to go with anything earlier. > - come up with a strategy for /external (also relevant for > the buildbot slaves) I'm not sure exactly what the purpose or mechanism for /external is. Sure, it's like a snapshot dir, probably used to pull some stuff into another process? Seems to me like it might be interesting to, for example, convert to a simple config file + script that lets you specify a package (repository) + tag, which can then be easily pulled in. But it'd be nice to know where and how exactly this is used. > - decide what to do with the bzr mirrors I'm assuming the bzr people have ways of importing hg repos. It's probably more effective for them to deal with this problem. If helpful, there are some scripts that do fast-exporting from hg repos. Cheers, Dirkjan From solipsis at pitrou.net Sun Apr 5 12:19:46 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 10:19:46 +0000 (UTC) Subject: [Python-Dev] BufferedReader.peek() ignores its argument References: Message-ID: Alexandre Vassalotti peadrop.com> writes: > > I am not sure if this is a good idea.
Currently, the argument of > peek() is documented as a lower bound that cannot exceed the size of > the buffer: Unfortunately, in practice, the argument is neither a lower bound nor an upper bound. It's just used as some kind of internal heuristic (in the Python version) or not used at all (in the C version). Regards Antoine. From solipsis at pitrou.net Sun Apr 5 12:27:45 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 10:27:45 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> Message-ID: Alexandre Vassalotti peadrop.com> writes: > > Off the top of my head, the following is needed for a successful migration: There's also the issue of how we adapt the current workflow of "svnmerging" between branches when we want to back- or forward-port stuff. In particular, tracking of already done or blocked backports. (the issue being that "svnmerge" is different from what DVCS'es call "merging" :-)) From solipsis at pitrou.net Sun Apr 5 12:29:17 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 10:29:17 +0000 (UTC) Subject: [Python-Dev] Possible py3k io wierdness References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> Message-ID: Brian Quinlan sweetapp.com> writes: > > I don't see why this is helpful. Could you explain why > _RawIOBase.close() calling self.flush() is useful? I could not explain it for sure since I didn't write the Python version. I suppose it's so that people who only override flush() automatically get the flush-on-close behaviour. cheers Antoine. From solipsis at pitrou.net Sun Apr 5 12:33:33 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 10:33:33 +0000 (UTC) Subject: [Python-Dev] graphics maths types in python core? 
References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> <49D85A23.6020405@gmail.com> Message-ID: Nick Coghlan gmail.com> writes: > > Parts 2 and 3, being the memoryview API and support for the new protocol > in the builtin types are the parts that are currently restricted to > simple linear memory views. > > That's largely because parts 2 and 3 are somewhat use case challenged: > the key motivation behind PEP 3118 was so that libraries like NumPy, PIL > and the like would have a common standard for data interchange. If I understand correctly, one of the motivations behind memoryview() is to replace buffer() as a way to get cheap slicing without memory copies (it's used e.g. in the C IO library). I don't know whether the third-party types mentioned above could also benefit from that. Regards Antoine. From dirkjan at ochtman.nl Sun Apr 5 12:51:30 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 12:51:30 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> Message-ID: <49D88D32.60202@ochtman.nl> On 05/04/2009 12:27, Antoine Pitrou wrote: > There's also the issue of how we adapt the current workflow of "svnmerging" > between branches when we want to back- or forward-port stuff. In particular, > tracking of already done or blocked backports. Right. The canonical way to do that with Mercurial is to commit patches against the "oldest" branch where they should be applied, so that every stable branch is a strict subset of every less stable branch. From what I've understood, this doesn't fit the way the Python-dev community/process works very well. In that case, there are a number of alternatives. 
For example, hg's export/import commands can be used to explicitly deal with diffs that contain hg metadata, the transplant extension can be used to automate that, or in some cases, the rebase extension might be more appropriate. We can put extended examples from the PEP in the wiki to help people discovering the best workflow. Cheers, Dirkjan From brian at sweetapp.com Sun Apr 5 12:56:47 2009 From: brian at sweetapp.com (Brian Quinlan) Date: Sun, 05 Apr 2009 11:56:47 +0100 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> Message-ID: <49D88E6F.4080801@sweetapp.com> Antoine Pitrou wrote: > Brian Quinlan sweetapp.com> writes: >> I don't see why this is helpful. Could you explain why >> _RawIOBase.close() calling self.flush() is useful? > > I could not explain it for sure since I didn't write the Python version. > I suppose it's so that people who only override flush() automatically get the > flush-on-close behaviour. But the way that the code is currently written, flush only gets called *after* the file has been closed (see my original example). It seems very unlikely that this is the behavior that the subclass would want/expect. So any objections to me changing IOBase (and the C implementation) to: def close(self): """Flush and close the IO object. This method has no effect if the file is already closed. """ if not self.__closed: try: - self.flush() + IOBase.flush(self) except IOError: pass # If flush() fails, just give up self.__closed = True Cheers, Brian From solipsis at pitrou.net Sun Apr 5 13:00:11 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 11:00:11 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D88D32.60202@ochtman.nl> Message-ID: Dirkjan Ochtman ochtman.nl> writes: > > Right. 
The canonical way to do that with Mercurial is to commit patches > against the "oldest" branch where they should be applied, so that every > stable branch is a strict subset of every less stable branch. It doesn't work between py3k and trunk, which are wildly diverging. > In that case, there are a number of > alternatives. For example, hg's export/import commands can be used to > explicitly deal with diffs that contain hg metadata, the transplant > extension can be used to automate that, or in some cases, the rebase > extension might be more appropriate. Transplant or export/import have the right semantics IMO, but we lose the tracking that's built in svnmerge. Perhaps a new hg extension? ;) (the missing functionality is to store the list of transplanted or blocked changesets in a .hgXXX file (storing the original hashes, not the ones after transplant), and parse that file in order to compare it with the incoming changesets from an other repo) Regards Antoine. From dirkjan at ochtman.nl Sun Apr 5 13:04:13 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 13:04:13 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D88D32.60202@ochtman.nl> Message-ID: <49D8902D.8050803@ochtman.nl> On 05/04/2009 13:00, Antoine Pitrou wrote: > Transplant or export/import have the right semantics IMO, but we lose the > tracking that's built in svnmerge. Perhaps a new hg extension? ;) > (the missing functionality is to store the list of transplanted or blocked > changesets in a .hgXXX file (storing the original hashes, not the ones after > transplant), and parse that file in order to compare it with the incoming > changesets from an other repo) Transplant can already keep the source revision hash on the new revision (in hg's equivalent of generic revprops, the extra dict). 
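The tracking Antoine asks for above boils down to two sets of hashes kept alongside the repo; a minimal sketch, with a file format invented purely for illustration (this is not an existing hg feature):

```python
# Sketch of the svnmerge-style tracking discussed above: record the
# *original* hashes of transplanted/blocked changesets in a tracking
# file, then filter incoming changesets against it.  Both the line
# format and the workflow are hypothetical.
def load_tracking(lines):
    """Parse lines like 'transplanted <hash>' / 'blocked <hash>'."""
    tracked = {'transplanted': set(), 'blocked': set()}
    for line in lines:
        state, node = line.split()
        tracked[state].add(node)
    return tracked

def needs_porting(incoming, tracked):
    """Changesets neither transplanted already nor explicitly blocked."""
    done = tracked['transplanted'] | tracked['blocked']
    return [node for node in incoming if node not in done]

tracked = load_tracking(["transplanted aaa111", "blocked bbb222"])
assert needs_porting(["aaa111", "bbb222", "ccc333"], tracked) == ["ccc333"]
```

An extension would only have to keep that file up to date on transplant and consult it on incoming.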
I think that blocked revisions will not be an issue due to the nature of the DAG, but I have too little experience with svnmerge to say for sure. Cheers, Dirkjan From ncoghlan at gmail.com Sun Apr 5 13:16:25 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 05 Apr 2009 21:16:25 +1000 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> <49D85A23.6020405@gmail.com> Message-ID: <49D89309.7050307@gmail.com> Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: >> Parts 2 and 3, being the memoryview API and support for the new protocol >> in the builtin types are the parts that are currently restricted to >> simple linear memory views. >> >> That's largely because parts 2 and 3 are somewhat use case challenged: >> the key motivation behind PEP 3118 was so that libraries like NumPy, PIL >> and the like would have a common standard for data interchange. > > If I understand correctly, one of the motivations behind memoryview() is to > replace buffer() as a way to get cheap slicing without memory copies (it's used > e.g. in the C IO library). I don't know whether the third-party types mentioned > above could also benefit from that. Yep, once memoryview supports all of the PEP 3118 semantics it should be usable with sufficiently recent versions of NumPy arrays and the like. Its implementation has unfortunately lagged because those with the most relevant expertise don't need it (they access the objects they care about through the C API), and there are some interesting semantics to get right which are hard to judge without that expertise.
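The cheap, no-copy slicing Antoine mentions can already be demonstrated with memoryview as it stands:

```python
# A slice of a memoryview is another view on the same memory, so no
# copy is made; slicing a bytes object would copy the data instead.
data = bytearray(b"abcdefgh")
view = memoryview(data)
chunk = view[2:5]                  # no copy here
data[2] = ord(b"X")                # mutate the underlying buffer...
assert chunk.tobytes() == b"Xde"   # ...and the view sees the change
```
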
Still, as both you and Greg have pointed out, even in its current form memoryview is already useful as a replacement for buffer that doesn't share buffer's problems - it's only if they try to use it with the more sophisticated aspects of the PEP 3118 API that people may be disappointed by its capabilities. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From l.mastrodomenico at gmail.com Sun Apr 5 13:17:20 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Sun, 5 Apr 2009 13:17:20 +0200 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> <49D85A23.6020405@gmail.com> Message-ID: 2009/4/5 Antoine Pitrou : > Nick Coghlan gmail.com> writes: >> >> That's largely because parts 2 and 3 are somewhat use case challenged: >> the key motivation behind PEP 3118 was so that libraries like NumPy, PIL >> and the like would have a common standard for data interchange. > > If I understand correctly, one of the motivations behind memoryview() is to > replace buffer() as a way to get cheap slicing without memory copies (it's used > e.g. in the C IO library). I don't know whether the third-party types mentioned > above could also benefit from that. Well, PEP 3118 is useful because it would be nice having e.g. the possibility of opening an image with PIL, manipulate it directly with NumPy and saving it to file with PIL. Right now this is possible only if the PIL image is first converted (and copied) to a new NumPy array and then the array is converted back to an image. BTW, while PEP 3118 provides a common C API for this, the related PEP 368 proposes a standard "image protocol" on the Python side that should be compatible with the image classes of PIL, wxPython and pygame, and (mostly) with NumPy arrays. 
I started an implementation of PEP 368 at: http://code.google.com/p/pyimage/ Both the PEP and the implementation need updates (pyimage already includes an IEEE 754r compatible half-precision floating point type, aka float16, that's not yet in the PEP), but if someone is interested and willing to help I may start again working on them. Also note that the subjects "vec2, vec3, quaternion, etc" (PEP 3141) and "multi-dimensional arrays" (PEP 3118) are mostly unrelated. -- Lino Mastrodomenico From firephoenix at wanadoo.fr Sun Apr 5 13:31:48 2009 From: firephoenix at wanadoo.fr (Firephoenix) Date: Sun, 05 Apr 2009 13:31:48 +0200 Subject: [Python-Dev] Generator methods - "what's next" ? Message-ID: <49D896A4.3000104@wanadoo.fr> Hello everyone I'm a little confused by the recent changes to the generator system... I basically agreed with renaming the next() method to __next__(), so as to follow the naming of other similar methods (__iter__() etc.). But I noticed then that all the other methods of the generator had stayed the same (send, throw, close...), which gives really weird (imo) codes : next(it) it.send(35) it.throw(Exception()) next(it) .... Browsing the web, I've found people troubled by that asymmetry, but no remarks on its causes nor its future... Since __next__(), send() and others have really really close semantics, I consider that state as a python wart, one of the few real ones I can think of. Is there any plan to fix this ? Either by coming back to the next() method, or by putting all the "magical methods" of generators in the __specialattributes__ bag ? next(it) send(it, 5) throw(it, Exception()) ... Thanks a lot for the information, Pascal From g.brandl at gmx.net Sun Apr 5 14:46:12 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 05 Apr 2009 14:46:12 +0200 Subject: [Python-Dev] Mercurial? 
In-Reply-To: <49D88D32.60202@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D88D32.60202@ochtman.nl> Message-ID: Dirkjan Ochtman schrieb: > On 05/04/2009 12:27, Antoine Pitrou wrote: >> There's also the issue of how we adapt the current workflow of "svnmerging" >> between branches when we want to back- or forward-port stuff. In particular, >> tracking of already done or blocked backports. > > Right. The canonical way to do that with Mercurial is to commit patches > against the "oldest" branch where they should be applied, so that every > stable branch is a strict subset of every less stable branch. That's what I do as well in Sphinx. It works fine there, but there are two issues if you want to apply it to Python: * As Antoine said, trunk and py3k are very different. Merging would still be possible, but confusing. * Our current trunk/maint branches will have completely different commits, so pulling (e.g.) from 2.6 into trunk won't work. So I'd be in favor of a solution like the following: * Once 2.7 and 3.1 are final, create their maint branches as "real" Hg branches, so that for each pair committing to maint and pulling into trunk works. * For the 2->3 merging, use transplant (optionally with the mentioned feature of keeping track what was already transplanted and blocked). Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Sun Apr 5 14:48:38 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 05 Apr 2009 14:48:38 +0200 Subject: [Python-Dev] Generator methods - "what's next" ? 
In-Reply-To: <49D896A4.3000104@wanadoo.fr> References: <49D896A4.3000104@wanadoo.fr> Message-ID: Firephoenix schrieb: > Hello everyone > > I'm a little confused by the recent changes to the generator system... > > I basically agreed with renaming the next() method to __next__(), so as > to follow the naming of other similar methods (__iter__() etc.). > But I noticed then that all the other methods of the generator had > stayed the same (send, throw, close...), which gives really weird (imo) > codes : > > next(it) > it.send(35) > it.throw(Exception()) > next(it) > .... > > Browsing the web, I've found people troubled by that asymmetry, but no > remarks on its causes nor its future... > > Since __next__(), send() and others have really really close semantics, > I consider that state as a python wart, one of the few real ones I can > think of. You're missing an important detail: next()/__next__() is a feature of all iterators, while send() and throw() are generator-only methods. The only thing I could imagine is to add a generator.next() method that is simply an alias for generator.__next__(). However, TSBOOWTDI. cheers, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From alexandre at peadrop.com Sun Apr 5 15:11:43 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sun, 5 Apr 2009 09:11:43 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D87499.5060502@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> Message-ID: On Sun, Apr 5, 2009 at 5:06 AM, "Martin v. Löwis" wrote: >> Off the top of my head, the following is needed for a successful migration: >> >> - Verify that the repository at http://code.python.org/hg/ is >> properly converted.
> I see that this has four branches. What about all the other branches? > Will they be converted, or not? What about the stuff outside /python? > I am not sure if it would be useful to convert the old branches to Mercurial. The simplest thing to do would be to keep the current svn repository as a read-only archive. And if people need to commit to these branches, they could request the branch to be imported into a Mercurial branch (or a simple-to-use script could be provided and developers could run it directly on the server to create a user branch). > In particular, the Stackless people have requested that they move along > with what core Python does, so their code should also be converted. > Noted. >> - Add Mercurial support to the issue tracker. > > Not sure what this means. There is currently svn support insofar as the > tracker can format rNNN references into ViewCVS links; this should be > updated if possible (removed if not). There would also be a possibility > to auto-close issues from the commit messages. This is not done > currently, so I would not make it a prerequisite for the switch. > Yes, I was referring to the rNNN references. Actually, I am not sure how this could be implemented, since with Mercurial we lose atomic revision IDs. We could use something like hash@branch-name (e.g., bf94293b1932@py3k) referring to a specific revision. An auto-close would be a nice feature, but, as you said, not necessary for the migration. The main stumbling block to implement an auto-close feature is to define when an issue should be closed. Maybe we could add our own meta-data to the commit message. For example: Fix some nasty bug. Close-Issue: 4532 When such a commit would arrive in one of the main branches, a commit hook would close the issue if all the affected releases have been fixed. >> - Setup temporary svn mirrors for the main Mercurial repositories. > > What is that?
> I think it would be a good idea to host temporary svn mirrors for developers who access their VCS via an IDE. Although, I am not sure anymore if supporting these developers (if there are any) would be worth the trouble. So, think of this as optional. >> - Augment code.python.org infrastructure to support the creation of >> developer accounts. > > One option would be to carry on with the current setup; migrating it > to hg might work as well, of course. > You mean the current setup for svn.python.org? Would you be comfortable to let this machine be accessed by core developers through SSH? Since with Mercurial, SSH access will be needed for server-side clone (or, a script similar to what the Mozilla folk have [1] could be added). [1]: https://developer.mozilla.org/en/Publishing_Mercurial_Clones >> - Update the release.py script. >> >> There is probably some other things that I missed > > Here are some: > > - integrate with the buildbot Good one. It seems buildbot has support for Mercurial. [2] So, this will be a matter of tweaking the right options. The batch scripts in Tools/buildbot will also need to be updated. [2]: http://djmitche.github.com/buildbot/docs/0.7.10/#How-Different-VC-Systems-Specify-Sources > - come up with a strategy for /external (also relevant for > the buildbot slaves) Since the directories in /external are considered read-only, we could simply create a new Mercurial repository and copy the content of /external in it. When a new release needs to be added, just create a new directory and commit. > - decide what to do with the bzr mirrors > I don't see much benefit in keeping them. So, I say, archive the branches there unless someone steps up to maintain them. -- Alexandre From benjamin at python.org Sun Apr 5 15:13:28 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 5 Apr 2009 08:13:28 -0500 Subject: [Python-Dev] Mercurial?
In-Reply-To: References: <20090404154049.GA23987@panix.com> Message-ID: <1afaf6160904050613w2016ed87i2ffab6f67c48aca4@mail.gmail.com> 2009/4/5 Alexandre Vassalotti : > Off the top of my head, the following is needed for a successful migration: ... > - Update the release.py script. I'll do this. -- Regards, Benjamin From benjamin at python.org Sun Apr 5 15:15:48 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 5 Apr 2009 08:15:48 -0500 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8800A.60601@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl> Message-ID: <1afaf6160904050615i39aa6fd9o953336f4c74fa871@mail.gmail.com> 2009/4/5 Dirkjan Ochtman : > On 05/04/2009 11:06, "Martin v. Löwis" wrote: >> - come up with a strategy for /external (also relevant for >> the buildbot slaves) > > I'm not sure exactly what the purpose or mechanism for /external is. Sure, > it's like a snapshot dir, probably used for to pull some stuff into other > process? Seems to me like it might be interesting to, for example, convert > to a simple config file + script that lets you specify a package > (repository) + tag, which can then be easily pulled in. > > But it'd be nice to know where and how exactly this is used. Basically it contains released versions of packages that some parts of Python depend on. For example, Sphinx dependencies to build the docs reside there. A simple script that downloads a tarball and extracts it seems more elegant. -- Regards, Benjamin From firephoenix at wanadoo.fr Sun Apr 5 15:35:21 2009 From: firephoenix at wanadoo.fr (Firephoenix) Date: Sun, 05 Apr 2009 15:35:21 +0200 Subject: [Python-Dev] Generator methods - "what's next" ? In-Reply-To: <49D896A4.3000104@wanadoo.fr> References: <49D896A4.3000104@wanadoo.fr> Message-ID: <49D8B399.4020003@wanadoo.fr> Georg Brandl a écrit : > Firephoenix schrieb: > >> Hello everyone >> >> I'm a little confused by the recent changes to the generator system...
>> >> I basically agreed with renaming the next() method to __next__(), so as >> to follow the naming of other similar methods (__iter__() etc.). >> But I noticed then that all the other methods of the generator had >> stayed the same (send, throw, close...), which gives really weird (imo) >> codes : >> >> next(it) >> it.send(35) >> it.throw(Exception()) >> next(it) >> .... >> >> Browsing the web, I've found people troubled by that asymmetry, but no >> remarks on its causes nor its future... >> >> Since __next__(), send() and others have really really close semantics, >> I consider that state as a python wart, one of the few real ones I can >> think of. >> > > You're missing an important detail: next()/__next__() is a feature of all > iterators, while send() and throw() are generator-only methods. > > The only thing I could imagine is to add a generator.next() method that > is simply an alias for generator.__next__(). However, TSBOOWTDI. > > cheers, > Georg > > Good point indeed. Generator methods (send, throw...) are some kind of black magic compared to normal methods, so I'd find it normal if their naming reflected this specificity, but on the other end it wouldn't be cool to overflow the builtin scope with all the corresponding functions "send(iter, var)"... so I guess all that will stay the way it is. Regards, Pascal -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre at peadrop.com Sun Apr 5 15:46:01 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sun, 5 Apr 2009 09:46:01 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> Message-ID: On Sun, Apr 5, 2009 at 6:27 AM, Antoine Pitrou wrote: > Alexandre Vassalotti peadrop.com> writes: >> >> Off the top of my head, the following is needed for a successful migration: > > There's also the issue of how we adapt the current workflow of "svnmerging" > between branches when we want to back- or forward-port stuff. 
In particular, > tracking of already done or blocked backports. > > (the issue being that "svnmerge" is different from what DVCS'es call "merging" > :-)) > See the PEP about that. I have written a fair amount of details how this would work with Mercurial: http://www.python.org/dev/peps/pep-0374/#backport -- Alexandre From dirkjan at ochtman.nl Sun Apr 5 16:13:21 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 16:13:21 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> Message-ID: <49D8BC81.7040007@ochtman.nl> (going back on-list) On 05/04/2009 15:42, Alexandre Vassalotti wrote: >> I'm pretty sure that we'll need to reconvert; I don't think the current >> conversion is particularly good. > > What is bad about it? For one thing, it has the [svn] prefixes, which I found to be quite ugly. hgsubversion in many cases will preserve the rev order from svn so that the local revision numbers that hg shows will be the same as in SVN anyway. On top of that, good conversion tools save the svn revision in the revision metadata in hg, so that you can see it with log --debug. For another, I'd like to use an author map to bring the revision authors more in line with what Mercurial repositories usually display; this helps with tool support and is also just a nicer solution IMO. I have a stab at an author map at http://dirkjan.ochtman.nl/author-map. Could use some review, but it seems like a good start. > I largely prefer clone to named branches. From personal experience, I > found named branches difficult to use properly. And, I think even > Mercurial developers don't use them. No, the Mercurial project currently doesn't use them. Mozilla does use them at the moment, because they found they did have some advantages (especially lower disk usage because no separate clones were needed). I think named branches are fine for long-lived branches. 
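The author map Dirkjan mentions rewrites Subversion committer names into the "Full Name <email>" form that Mercurial tools expect. A toy parser for such a file — assuming the one-entry-per-line "svnname = Full Name <email>" layout used by hg's convert extension; the entries below are invented, not taken from his map — might look like this:

```python
def parse_author_map(text):
    """Parse an author-map file: one 'svnname = Full Name <email>' entry
    per line; blank lines and '#' comments are ignored.
    (The file layout is an assumption based on hg convert's format.)"""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, _, dst = line.partition("=")
        mapping[src.strip()] = dst.strip()
    return mapping

sample = """
# svn committer -> hg author (illustrative entries only)
georg.brandl = Georg Brandl <georg@example.org>
martin.von.loewis = Martin v. Loewis <martin@example.org>
"""

authors = parse_author_map(sample)
print(authors["georg.brandl"])  # Georg Brandl <georg@example.org>
```

A conversion tool would then look up each svn committer id in this mapping when writing changesets, falling back to the raw id for names missing from the map.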
At the very least we should have a proper discussion over this. > How do you reorder the revlog of a repository? There are scripts for this which can be investigated. > I am in favor of pruning the old branches, but not of leaving the old > history behind. The current Mercurial mirror of py3k is 92M on my disk > which is totally reasonable. So, I don't see what would be the > advantage there. The current Mercurial mirror for py3k also doesn't include any history from before it was branched, which is bad, IMO. In order to get the most of the DVCS structure, it would be helpful if py3k shared history with the normal (trunk) branches. > I was thinking of something very basic, e.g., something like a commit > hook that would asynchronously commit the latest revision to svn. We > wouldn't need to convert much meta-data; just the committer's name and > the changelog would be fine. What's the use case, who do you want to support with this? hgweb trivially provides tarballs for download on every revision, so people who don't want to use hg can easily download a snapshot. > Not really. Currently, core developers can only push stuff using the > Bazaar setup. Personally, I think SSH access would be a lot nicer, but > this will depend how confident python.org's admins are with this idea. We could still enable pushing through http(s) for hgweb(dir). Cheers, Dirkjan From dirkjan at ochtman.nl Sun Apr 5 16:27:30 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 16:27:30 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> Message-ID: <49D8BFD2.8090600@ochtman.nl> On 05/04/2009 15:11, Alexandre Vassalotti wrote: > I am not sure if it would be useful to convert the old branches to > Mercurial. The simplest thing to do would be to keep the current svn > repository as a read-only archive.
And if people needs to commit to > these branches, they could request the branch to be imported into a > Mercurial branch (or a simple to use script could be provided and > developer could run it directly on the server to create a user > branch). We should probably not include any branches that haven't been touched in the last 18 months. Then we also leave out branches that have been pruned. BTW, tags are also missing from the current conversions. We probably want to keep all release tags, but not the partial tags (e.g. the Distutils tags). Are there any other particularly useful tags we should keep? > An auto-close would be a nice feature, but, as you said, not necessary > for the migration. The main stumbling block to implement an auto-close > feature is to define when an issue should be closed. Maybe we could > add our own meta-data to the commit message. For example: > > Fix some nasty bug. > > Close-Issue: 4532 > > When a such commit would arrive in one of the main branches, a commit > hook would close the issue if all the affected releases have been > fixed. It makes more sense to me to use the syntax already used by Trac et al., e.g. "(fix|close)s? (issue|#)\d+" for closing and possibly "ref(erence)?s? (issue|#)\d+" for creating a link on the issue. BTW, this would also be a good time to split out the stdlib if that's still desirable (which I seem to have gleaned from the PyCon videos). Cheers, Dirkjan From solipsis at pitrou.net Sun Apr 5 16:39:20 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 14:39:20 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> Message-ID: Hello, > hgsubversion in many cases will preserve the rev order from svn so > that the local revision numbers that hg shows will be the same as in SVN > anyway. Er... 
I guess it's only the case in simplistic cases where you convert all branches in the SVN repo to a single hg repo (which is not workable for the CPython repo, which is too big), and there are no cases of SVN revisions being either ignored or split between several hg changesets (for example because they span multiple branches). The other nice thing with having "[svn rXXX]" in the patch subject line is that it makes the info easily viewable and searchable in the Web front-end. > For another, I'd like to use an author map to bring the revision authors > more in line with what Mercurial repositories usually display; this > helps with tool support and is also just a nicer solution IMO. Good idea. [in-repo multiple branches] > No, the Mercurial project currently doesn't use them. Mozilla does use > them at the moment, because they found they did have some advantages > (especially lower disk usage because no separate clones were needed). I > think named branches are fine for long-lived branches. > > At the very least we should have a proper discussion over this. I think at least 3.x and 2.x should live in separate repos. It is pointless for a clone of py3k to end up pulling all 40000+ changesets from the trunk. It would add 100MB+ to every py3k clone (that is, quadrupling the size of the repository). > The current Mercurial mirror for py3k also doesn't include any history > from before it was branched, which is bad, IMO. Given how much separate work has taken place in both, I'm not sure having that history would be very useful. We have to take into account practical needs. Someone needing to search history before py3k was created can just do a clone of the trunk. > In order to get the most > of the DVCS structure, it would be helpful if py3k shared history with > the normal (trunk) branches. Is any SVN-to-hg conversion tool able to parse the commits produced by svnmerge? 
And, even then, turn that information into useful hg information (say, transplant metadata of which changes were ported)? > > Not really. Currently, core developers can only push stuff using the > > Bazaar setup. Personally, I think SSH access would be a lot nicer, but > > this will depend how confident python.org's admins are with this idea. > > We could still enable pushing through http(s) for hgweb(dir). I'm not sure what the problem is. Developer SVN access already goes through ssh. cheers Antoine. From dirkjan at ochtman.nl Sun Apr 5 16:53:23 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 16:53:23 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> Message-ID: <49D8C5E3.5000200@ochtman.nl> On 05/04/2009 16:39, Antoine Pitrou wrote: > The other nice thing with having "[svn rXXX]" in the patch subject line is that > it makes the info easily viewable and searchable in the Web front-end. We can still make it accessible/searchable on the web if we don't put it in the commit message. > I think at least 3.x and 2.x should live in separate repos. It is pointless for > a clone of py3k to end up pulling all 40000+ changesets from the trunk. It would > add 100MB+ to every py3k clone (that is, quadrupling the size of the repository). I don't agree. It's quite annoying for things like annotate/blame, for example, where you may have to switch to another branch while chasing down a defective change. I also think 100MB+ is a cheap price to pay, given you only pay it in disk space (cheap) and initial clone time (not very often, and usually still quite fast). Also, at some point you presumably want to deprecate the whole 2.x line, right? So at that point, it'd be nice to have a full 3.x line with all the history in it, so that you can just throw away the 2.x stuff and still have full history. 
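Both commit-message conventions discussed in these threads — Antoine's "[svn rXXX]" subject prefix and the Trac-style "fixes #NNNN" annotations Dirkjan suggests for an issue-closing hook — are easy to recognize mechanically. A rough sketch (the patterns are adapted from the messages above; the exact spellings a real hook would accept were still being debated):

```python
import re

# "[svn rXXX]" prefix as produced by the current svn-to-hg conversion.
SVN_REV = re.compile(r"\[svn r(\d+)\]")

# Trac-style closing annotations: "fixes #4532", "close issue 4532", etc.
CLOSES = re.compile(r"\b(?:fix|close)(?:e?s|d)?\s+(?:issue\s*#?|#)\s*(\d+)", re.I)

def scan_message(msg):
    """Return (svn_revision_or_None, list_of_issue_numbers) for a commit message."""
    m = SVN_REV.search(msg)
    issues = [int(n) for n in CLOSES.findall(msg)]
    return (int(m.group(1)) if m else None, issues)

print(scan_message("[svn r70123] Fix some nasty bug. Closes #4532."))
# (70123, [4532])
```

A commit hook would run something like `scan_message` over each incoming changeset description and notify the tracker for every issue number found.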
I do agree that 2.x and 3.x should probably be in separate clones. > Is any SVN-to-hg conversion tool able to parse the commits produced by > svnmerge? And, even then, turn that information into useful hg information (say, > transplant metadata of which changes were ported)? I think things like these are planned for hgsubversion, yes. I'd probably want to look at implementing some support for it myself if that makes the conversion of the Python repositories better. > I'm not sure what the problem is. Developer SVN access already goes through > ssh. Okay, sounds like that will be easy. Would be good to enable compression on the SSH, though, if that's not already done. Cheers, Dirkjan From solipsis at pitrou.net Sun Apr 5 17:18:41 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 15:18:41 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> Message-ID: Dirkjan Ochtman ochtman.nl> writes: > > I also think 100MB+ is a cheap price to pay, > given you only pay it in disk space (cheap) and initial clone time (not > very often, and usually still quite fast). It is a cheap price to pay if there is a significant return for it. In my experience using the hg mirror of the py3k branch, I don't remember having had to run "annotate" on the trunk to hunt for a change that I'd witnessed in py3k. Other developers may have different experiences, though. As for the clone time, one of our prominent developers is, IIRC, on a 40 kb/s line. Perhaps he wants to step in and say whether cloning the trunk is a painful experience for him, or not. > Also, at some point you > presumably want to deprecate the whole 2.x line, right? The consensus seems to be that it will not happen before a couple of years. > Okay, sounds like that will be easy. Would be good to enable compression > on the SSH, though, if that's not already done.
Does the hg protocol compress that good? I would have thought there is already a lot of compression in the layout (given that it seems much more efficient than some of its competitors). Regards Antoine. From dirkjan at ochtman.nl Sun Apr 5 17:47:12 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 17:47:12 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> Message-ID: <49D8D280.6060900@ochtman.nl> On 05/04/2009 17:18, Antoine Pitrou wrote: > It is a cheap price to pay if there is a significant return for it. In my > experience using the hg mirror of the py3k branch, I don't remember having had > to run "annotate" on the trunk to hunt for a change that I'd witnessed in py3k. > Other developers may have different experiences, though. > > As for the clone time, one of our proeminent developers is, IIRC, on a 40 kb/s > line. Perhaps he wants to step in and say whether cloning the trunk is a painful > experience for him, or not. Sure it's painful, but he only has to go through that once, maybe twice. > The consensus seems to be that it will not happen before a couple of years. See, I think the point here is that, even though you want the branches to be clones, you also want them to all be part of the same directed acyclic graph (that DAG thing I keep nattering on about). That way, you can later merge every branch back in to some other branch (even if it's trivial merge that doesn't keep anything from one of the branches). Even if that's not for a couple of years, it's nice when you'll be able to do it in a couple of years without changing all the hashes (meaning everybody has to re-clone). For any dial-up providers, we could for example provide a repository that just has the changesets up to the split between trunk and py3k. 
He can clone that once, clone it locally, then pull the rest of the respective history in those local clones. If you don't have common history, a few of the niceties of having a DAG-based DVCS in the first place go away; that seems like a pity. > Does the hg protocol compress that good? I would have thought there is already a > lot of compression in the layout (given that it seems much more efficient than > some of its competitors). When used over HTTP, hg uses bundles (which can also be used as separate file to exchange changesets informally). Bundles contain gzip- or bzip2-compressed csets. When communicating over SSH, on the other hand, hg defaults to uncompressed streams, because the assumption is that connections can use SSH's compression, which is more efficient. All of this functions on top of the already quite efficient revlogs that make up the basic storage model for hg. Cheers, Dirkjan From benjamin at python.org Sun Apr 5 18:48:28 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 5 Apr 2009 11:48:28 -0500 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> Message-ID: <1afaf6160904050948n32674f60p6de4b1d335ad150d@mail.gmail.com> 2009/4/5 Antoine Pitrou : > Dirkjan Ochtman ochtman.nl> writes: >> >> I also think 100MB+ is a cheap price to pay, >> given you only pay it in disk space (cheap) and initial clone time (not >> very often, and usually still quite fast). > > It is a cheap price to pay if there is a significant return for it. In my > experience using the hg mirror of the py3k branch, I don't remember having had > to run "annotate" on the trunk to hunt for a change that I'd witnessed in py3k. > Other developers may have different experiences, though. I agree with Dirkjan. > > As for the clone time, one of our proeminent developers is, IIRC, on a 40 kb/s > line. 
Perhaps he wants to step in and say whether cloning the trunk is a painful > experience for him, or not. I suppose this is me. Cloning the hg trunk repo only takes slightly longer than an svn checkout for me, and it only needs to be done occasionally, so I have no problem with including all the history. -- Regards, Benjamin From foom at fuhm.net Sun Apr 5 18:51:38 2009 From: foom at fuhm.net (James Y Knight) Date: Sun, 5 Apr 2009 12:51:38 -0400 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> Message-ID: <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> On Apr 5, 2009, at 6:29 AM, Antoine Pitrou wrote: > Brian Quinlan sweetapp.com> writes: >> >> I don't see why this is helpful. Could you explain why >> _RawIOBase.close() calling self.flush() is useful? > > I could not explain it for sure since I didn't write the Python > version. > I suppose it's so that people who only override flush() > automatically get the > flush-on-close behaviour. It seems that a separate method "_internal_close" should've been defined to do the actual closing of the file, and the close() method should've been defined on the base class as "self.flush(); self._internal_close()" and never overridden. James From barry at python.org Sun Apr 5 19:04:10 2009 From: barry at python.org (Barry Warsaw) Date: Sun, 5 Apr 2009 13:04:10 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D87499.5060502@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> Message-ID: <25252456-64F8-42C2-BFEF-4AA791C3F1AB@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 5, 2009, at 5:06 AM, Martin v. L?wis wrote: > - decide what to do with the bzr mirrors I don't see any reason to keep them running on python.org. There are, or will be, other alternatives. 
Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSdjki3EjvBPtnXfVAQK2gAP8Duw+imZwZhsyGildHkUSeNW1uHazxbzL cKPeEfanSDUtkC51478/NC7+UxfNGdQJ4umo+LNiy6GXG3Kx7KCmYKHr6yBCzaxS 4HsuOVkFcjqn57u2eT9A5PDcxGgK4Os7XfB3kMS/f1xlBPYsF7W4Qpdck8gTbL+i dXJnq/+rd6k= =QSw3 -----END PGP SIGNATURE----- From firephoenix at wanadoo.fr Sun Apr 5 19:25:56 2009 From: firephoenix at wanadoo.fr (Firephoenix) Date: Sun, 05 Apr 2009 19:25:56 +0200 Subject: [Python-Dev] Generator methods - "what's next" ? In-Reply-To: <878wmftac1.fsf@xemacs.org> References: <49D896A4.3000104@wanadoo.fr> <878wmftac1.fsf@xemacs.org> Message-ID: <49D8E9A4.70604@wanadoo.fr> Stephen J. Turnbull a écrit : > Firephoenix writes: > > > I'm a little confused by the recent changes to the generator system... > > Welcome to the club. It's not easy even for the gurus. See the PEP > 380 ("yield from") discussions (mostly on Python-Ideas). > > > But I noticed then that all the other methods of the generator had > > stayed the same (send, throw, close...), which gives really weird (imo) > > codes : > > > > next(it) > > it.send(35) > > it.throw(Exception()) > > next(it) > > .... > > > > Browsing the web, I've found people troubled by that asymmetry, but no > > remarks on its causes nor its future... > > Well, this kind of discussion generally belongs on c.l.py, but as far > as I know, .next() is still present for generators but it's spelled > .send(None). See PEP 342. It seems to me that the rationale for > respelling .next() as .__next__() given in PEP 3114 doesn't apply to > .send() and .throw(), since there is no syntax which invokes those > methods magically. > > Also note that since next() takes no argument, it presumes no > knowledge of the implementation of the iterator. So specification as > a function called "on" the iterator seems natural to me. But .send() > and .throw() are only useful if you know the semantics of their > arguments, ie, the implementation of the generator.
Thus using method > syntax for them seems more natural to me. > > If you have some concrete suggestions you want to follow up to > Python-Dev with, two remarks: > > The code presented above is weird because that code is weird, not > because the generator methods are messed up. Why would you ever write > that code? You need a plausible use case, one where a generator is > the natural way to write the code, but it's not explicitly iterative. > > Second, the whole trend is the other direction, fitting generators > naturally into Python syntax without using explicit invocation of > methods. Again, PEP 380 is an example (though rather specialized). > As is the expression form of yield (half-successful in that no > recv() syntax or builtin is needed, although .send() seems to be). So > the use case requested above will need to be compelling. > > > Whoups, now that you mention it, I discover other mailing-lists seemed more suitable for this subject... sorry Actually I ran over an example like the following, in the case of a "reversed generator" that has to be activated by a first call to "next", before we're able to send data to the yield expression it has encountered. But as you mention, send(None) would work as well, and this kind of "setup operation" had better be hidden in a function decorator or something like that. > next(it) # init phase > it.send(35) > it.send(36) Regards, pascal Chambon From martin at v.loewis.de Sun Apr 5 19:37:53 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 05 Apr 2009 19:37:53 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> Message-ID: <49D8EC71.5020105@v.loewis.de> > I am not sure if it would be useful to convert the old branches to > Mercurial. The simplest thing to do would be to keep the current svn > repository as a read-only archive. 
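The "setup operation hidden in a function decorator" that Pascal alludes to is the familiar priming recipe from the PEP 342 era; a minimal version (the decorator name is illustrative, not a stdlib API) looks like this:

```python
def consumer(func):
    """Prime a generator so callers can send() to it immediately."""
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield (the "init phase")
        return gen
    return wrapper

@consumer
def sink(results):
    # A "reversed generator": data flows in through yield expressions.
    while True:
        value = yield
        results.append(value * 2)

results = []
it = sink(results)
it.send(35)   # no explicit next(it) needed first
it.send(36)
print(results)  # [70, 72]
```

Without the decorator, the first `it.send(35)` would raise TypeError, because a just-created generator must be advanced to its first yield before it can accept a value.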
And if people need to commit to these branches, they could request the branch to be imported into a Mercurial branch (or a simple-to-use script could be provided and developers could run it directly on the server to create a user branch). I think it should be stated in the PEP what branches get converted, in what form, and what the further usage of the svn repository should be. > An auto-close would be a nice feature, but, as you said, not necessary > for the migration. The main stumbling block to implement an auto-close > feature is to define when an issue should be closed. Maybe we could > add our own meta-data to the commit message. For example: > > Fix some nasty bug. > > Close-Issue: 4532 > > When such a commit would arrive in one of the main branches, a commit > hook would close the issue if all the affected releases have been > fixed. I think there is a long tradition of such annotations; we should try to repeat history here. IIUC, the Debian bugtracker understands Closes: #4532 and some other syntaxes. It must be easy to remember, else people won't use it. >>> - Setup temporary svn mirrors for the main Mercurial repositories. >> What is that? >> > > I think it would be a good idea to host temporary svn mirrors for > developers who access their VCS via an IDE. Although, I am not sure > anymore if supporting these developers (if there are any) would be worth > the trouble. So, think of this as optional. Any decision to have or not have such a feature should be stated in the PEP. I personally don't use IDEs, so I don't care (although I do notice that the apparent absence of IDE support for Mercurial indicates maturity of the technology) >>> - Augment code.python.org infrastructure to support the creation of >>> developer accounts. >> One option would be to carry on with the current setup; migrating it >> to hg might work as well, of course. >> > > You mean the current setup for svn.python.org?
Would you be > comfortable to let this machine be accessed by core developers through > SSH? Since with Mercurial, SSH access will be needed for server-side > clone (or, a script similar to what the Mozilla folk have [1] could be > added). > > [1]: https://developer.mozilla.org/en/Publishing_Mercurial_Clones Ok, I take that back. I assumed that Mercurial could work *exactly* as Subversion. Apparently, that's not the case (although I have no idea what a server-side clone is). So I wait for the PEP to explain how authentication and access control is to be implemented. Creating individual Unix accounts for committers should be avoided. >> - integrate with the buildbot > Good one. It seems buildbot has support for Mercurial. [2] So, this > will be a matter of tweaking the right options. The batch scripts in > Tools/buildbot will also need to be updated. > > [2]: http://djmitche.github.com/buildbot/docs/0.7.10/#How-Different-VC-Systems-Specify-Sources I can give you access to the master setup. Ideally, this should be tested before the switchover (with a single branch). We also need instructions for the slaves (if any - perhaps installing a hg binary is sufficient). > Since the directories in /external are considered read-only, we could > simply create a new Mercurial repository and copy the content of /external in > it. >> - decide what to do with the bzr mirrors > > I don't see much benefit in keeping them. Both should go into the PEP. Regards, Martin From martin at v.loewis.de Sun Apr 5 19:39:18 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 05 Apr 2009 19:39:18 +0200 Subject: [Python-Dev] Mercurial?
In-Reply-To: <1afaf6160904050615i39aa6fd9o953336f4c74fa871@mail.gmail.com> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl> <1afaf6160904050615i39aa6fd9o953336f4c74fa871@mail.gmail.com> Message-ID: <49D8ECC6.9080302@v.loewis.de> >> I'm not sure exactly what the purpose or mechanism for /external is. Sure, >> it's like a snapshot dir, probably used to pull some stuff into other >> process? Seems to me like it might be interesting to, for example, convert >> to a simple config file + script that lets you specify a package >> (repository) + tag, which can then be easily pulled in. >> >> But it'd be nice to know where and how exactly this is used. > > Basically it contains released versions of packages that some parts of > Python depend on. For example, Sphinx dependencies to build the docs > reside there. A simple script that downloads a tarball and extracts it > seems more elegant. Such a script would, in particular, also have to work on the Windows buildbot slaves. /external is primarily used for the Windows build. Regards, Martin From martin at v.loewis.de Sun Apr 5 19:42:59 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 19:42:59 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D87DC2.2040708@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D7C523.2090605@ochtman.nl> <49D7FEF6.1010006@v.loewis.de> <49D87DC2.2040708@ochtman.nl> Message-ID: <49D8EDA3.3080405@v.loewis.de> > Sounds sane. Would I be able to get access to PSF infrastructure to get > started on that, or do you want me to get started on my own box? I'll > probably do the conversion on my own box, but for authn/authz it might > be useful to be able to use PSF infra. Now that Alexandre has also volunteered, you two need to decide who is in charge. Whoever does that will certainly get access to code.python.org; the demo installation should run on that machine.
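A bare-bones version of the tarball-fetching script proposed for /external might look like the following. The function name and layout here are guesses; a real replacement would also need checksum verification, pinned versions from a config file, and the Windows-slave considerations Martin raises:

```python
import os
import tarfile
import urllib.request

def fetch_external(url, dest):
    """Download a release tarball and unpack it under dest.
    (Hypothetical sketch; not an actual python.org script.)"""
    os.makedirs(dest, exist_ok=True)
    # urlretrieve stores the download in a local file and returns its path.
    local_path, _headers = urllib.request.urlretrieve(url)
    with tarfile.open(local_path) as tar:
        tar.extractall(dest)
```

The same script works with `file://` URLs, which makes it easy to test locally before pointing it at a real download site.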
Regards, Martin From solipsis at pitrou.net Sun Apr 5 19:45:50 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 17:45:50 +0000 (UTC) Subject: [Python-Dev] Possible py3k io wierdness References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> Message-ID: James Y Knight fuhm.net> writes: > > It seems that a separate method "_internal_close" should've been > defined to do the actual closing of the file, and the close() method > should've been defined on the base class as "self.flush(); > self._internal_close()" and never overridden. I'm completely open to changes as long as there is a reasonable consensus around them. Your proposal looks sane, although the fact that a semi-private method (starting with an underscore) is designed to be overridden in some classes is a bit annoying. I'd also like to have some advice from Guido, since he was one of the driving forces behind the specification and the original Python implementation. Regards Antoine. From martin at v.loewis.de Sun Apr 5 19:50:07 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 19:50:07 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8800A.60601@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl> Message-ID: <49D8EF4F.7010709@v.loewis.de> Dirkjan Ochtman wrote: > On 05/04/2009 11:06, "Martin v. Löwis" wrote: >> In particular, the Stackless people have requested that they move along >> with what core Python does, so their code should also be converted. Richard Tew would be the person to discuss the details with.
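The close()/flush() contract James describes in the io thread above could be sketched like this (class and hook names are illustrative, not the actual io module API):

```python
class SketchIOBase:
    """Sketch of James's proposal: close() always flushes first, then
    calls one private hook, and is itself never overridden."""

    def __init__(self):
        self.closed = False

    def flush(self):
        pass  # buffering subclasses override this

    def _internal_close(self):
        pass  # subclasses release the underlying resource here

    def close(self):
        if not self.closed:
            self.flush()            # the "self.flush(); self._internal_close()" contract
            self._internal_close()
            self.closed = True

class Recorder(SketchIOBase):
    """Records the call order so the contract is visible."""
    def __init__(self):
        super().__init__()
        self.calls = []
    def flush(self):
        self.calls.append("flush")
    def _internal_close(self):
        self.calls.append("close")

r = Recorder()
r.close()
r.close()  # second close is a no-op
print(r.calls)  # ['flush', 'close']
```

With this split, a subclass that only overrides flush() still gets flush-on-close behaviour, while a subclass that only overrides the closing hook cannot accidentally skip the flush.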
>> - integrate with the buildbot > I've set up the buildbot infra for Mercurial (though not many people are > interested in it, so it's kind of languished). Using buildbot's hg > support is easy. 0.7.10 is the first version which works with hg 1.1+, > though, so we probably don't want to go with anything earlier. Ok, that's a problem. We currently run 0.7.5 on the master, and have made custom changes that need to be forward-ported. IIUC, this will also mean that the waterfall default page is gone, which might surprise people. I suppose all slaves also need to upgrade. >> - come up with a strategy for /external (also relevant for >> the buildbot slaves) > I'm not sure exactly what the purpose or mechanism for /external is. > Sure, it's like a snapshot dir, probably used to pull some stuff > into other process? Seems to me like it might be interesting to, for > example, convert to a simple config file + script that lets you specify > a package (repository) + tag, which can then be easily pulled in. > > But it'd be nice to know where and how exactly this is used. Take a look at the batch files in Tools/buildbot - they are the primary consumers. PCbuild/readme.txt also refers to it. Regards, Martin From martin at v.loewis.de Sun Apr 5 19:53:48 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 05 Apr 2009 19:53:48 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8BFD2.8090600@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8BFD2.8090600@ochtman.nl> Message-ID: <49D8F02C.1040008@v.loewis.de> > We should probably not include any branches that haven't been touched in > the last 18 months. Then we also leave out branches that have been pruned. > > BTW, tags are also missing from the current conversions. We probably > want to keep all release tags, but not the partial tags (e.g. the > Distutils tags). Are there any other particularly useful tags we should keep?
First of all, if the conversion is incomplete, the PEP should make explicit what information will be lost. As for tags - I think providing just the release tags is fine. > BTW, this would also be a good time to split out the stdlib if that's > still desirable (which I seem to have gleaned from the PyCon videos). Is it possible to branch from a subdirectory? For the "different VMs" stuff, it's desirable to have a branch of the test suite, and perhaps the standard library, rather than extracting it from the source. Regards, Martin From dirkjan at ochtman.nl Sun Apr 5 19:58:21 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 19:58:21 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8EF4F.7010709@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl> <49D8EF4F.7010709@v.loewis.de> Message-ID: <49D8F13D.90302@ochtman.nl> On 05/04/2009 19:50, "Martin v. Löwis" wrote: > Ok, that's a problem. We currently run 0.7.5 on the master, and have > made custom changes that need to be forward-ported. IIUC, this will > also mean that the waterfall default page is gone, which might surprise > people. > > I suppose all slaves also need to upgrade. Why is the waterfall default page gone? I had that in my 0.7.9 setup, at least. Provided the 0.7.5 slaves work with 0.7.10, then no, it's not necessary to upgrade the slaves. The problem in buildbot was strictly with the change detection in hg repos (combined with the Mercurial API, which hasn't entirely become stable -- so it changed a bit in 1.1). > Take a look at the batch files in Tools/buildbot - they are the > primary consumers. PCbuild/readme.txt also refers to it. Will do. Cheers, Dirkjan From dirkjan at ochtman.nl Sun Apr 5 20:02:44 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 20:02:44 +0200 Subject: [Python-Dev] Mercurial?
In-Reply-To: <49D8EC71.5020105@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> Message-ID: <49D8F244.8080204@ochtman.nl> On 05/04/2009 19:37, "Martin v. L?wis" wrote: > Any decision to have or not have such a feature should be stated in > the PEP. I personally don't use IDEs, so I don't care (although > I do notice that the apparent absence of IDE support for Mercurial > indicates maturity of the technology) Well, there should be good support for Eclipse (through MercurialEclipse), NetBeans (they use hg themselves, after all), and the IDE-version of Komodo 5.0+ also includes hg support. I suppose other, more Python-specific IDEs might be following suit as Python switches. > Ok, I take that back. I assumed that Mercurial could work *exactly* > as Subversion. Apparently, that's not the case (although I have no > idea what a server-side clone is). So I wait for the PEP to explain > how authentication and access control is to be implemented. Creating > individual Unix accounts for committers should be avoided. Yeah, that won't be necessary. The canonical solution is to have just one Unix account called hg, to which we can add public keys. Cheers, Dirkjan From martin at v.loewis.de Sun Apr 5 20:18:33 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 20:18:33 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8F13D.90302@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl> <49D8EF4F.7010709@v.loewis.de> <49D8F13D.90302@ochtman.nl> Message-ID: <49D8F5F9.3000803@v.loewis.de> >> Ok, that's a problem. We currently run 0.7.5 on the master, and have >> made custom changes that need to be forward-ported. IIUC, this will >> also mean that the waterfall default page is gone, which might surprise >> people. >> >> I suppose all slaves also need to upgrade. 
> > Why is the waterfall default page gone? I had that in my 0.7.9 setup, at > least. Provided the 0.7.5 slaves work with 0.7.10, then no, it's not > necessary to upgrade the slaves. The problem in buildbot was strictly > with the change detection in hg repos (combined with the Mercurial API, > which hasn't entirely become stable -- so it changed a bit in 1.1). My understanding is that with 0.7.6 and later, the default page won't be the waterfall anymore. In the 0.7.6 release notes, it says # The initial page (when you hit the root of the web site) is served # from index.html, and provides links to the Waterfall as well as the # other pages. In the 0.7.9 release notes, it says # The html.Waterfall status target was replaced by html.WebStatus in # 0.7.6, and will be removed by 0.8.0. But then, I have not tried installing it, so I don't know what it actually looks like. Regards, Martin From barry at python.org Sun Apr 5 20:19:55 2009 From: barry at python.org (Barry Warsaw) Date: Sun, 5 Apr 2009 14:19:55 -0400 Subject: [Python-Dev] Tools Message-ID: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Someone (I'm sorry, I forgot who) asked me at Pycon about stripping out Demos and Tools. I'm happy to remove the two I wrote - Tools/ world and Tools/pynche - from the distribution and release them as separate projects (retaining the PSF license). Should I remove them from both the Python 2.x and 3.x trunks? Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSdj2S3EjvBPtnXfVAQJvkAQAhj/Go+OtfYP//OZ7HIHwTjaeMlpAkfwn iPxE6O8gY0K48J1AUmjvGSeckfP4FRqVJWOVMQYvX8yTHNFnCJxDSl4JjgboqLz4 s/IvrUYjSiN1FGrQJBA3RI4jFmuetzmKxNWgi6gEzQ6ocTLC80EyCHhxsAMhCeqr SGQ+Alrewis= =ODWt -----END PGP SIGNATURE----- From martin at v.loewis.de Sun Apr 5 20:22:46 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 20:22:46 +0200 Subject: [Python-Dev] Mercurial? 
In-Reply-To: <49D8F244.8080204@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> <49D8F244.8080204@ochtman.nl> Message-ID: <49D8F6F6.8030308@v.loewis.de> > Yeah, that won't be necessary. The canonical solution is to have just > one Unix account called hg, to which we can add public keys. That would work fine for me. We currently call the account pythondev, but calling it hg would be shorter, and therefore better (plus, pythondev is associated with svn). The PEP should then explain what the authorized_keys lines should look like; this allows people to review the security of the setup. Regards, Martin From martin at v.loewis.de Sun Apr 5 20:29:31 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 20:29:31 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D87CD4.1000909@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> Message-ID: <49D8F88B.3050102@v.loewis.de> > I've svnsynced the SVN repo so that we can work on it efficiently, and > I've already talked with Augie Fackler, the hgsubversion maintainer, > about what the best way forward is. For example, we may want to leave > some of the very old history behind, or prune some old branches. I'm -1 on removing very old history; it's still useful to find out that some change goes back to 1994. I'm -0 on removing old branches (your 18 month policy sounds reasonable). >> - Convert the current svn commit hooks to Mercurial. > > Some new hooks should also be discussed. For example, Mozilla uses a > single-head hook, to prevent people from pushing multiple heads. They > also have a pushlog extension that keeps a SQLite database of what > people pushed. This is particularly useful for linearizing history, > which is required for integration with buildbot infrastructure. 
FYI: this is the list of hooks currently employed: - pre: check whitespace - post: mail python-checkins inform regular buildbot inform community buildbot trigger website rebuild if a PEP was modified (but then, whether or not the PEPs will be maintained in hg also needs to be decided) >> - Augment code.python.org infrastructure to support the creation of >> developer accounts. > > Developers already have accounts, don't they? Depends on the term "account". There is a mapping ssh-key <-> logname. > In any case, some web > interface to facilitate setting up new clones (branches) is also > something that's probably desirable. I think Mozilla has some tooling > for that which we might be able to start off of. How to authenticate in that interface? We don't have passwords per committer. Regards, Martin From martin at v.loewis.de Sun Apr 5 20:36:42 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 05 Apr 2009 20:36:42 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8BC81.7040007@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> Message-ID: <49D8FA3A.5050400@v.loewis.de> > For another, I'd like to use an author map to bring the revision authors > more in line with what Mercurial repositories usually display; this > helps with tool support and is also just a nicer solution IMO. We do require full real names (i.e. no nicknames). Can Mercurial guarantee such a thing? > At the very least we should have a proper discussion over this. If so, I would like to see that discussion in the PEP. I don't think I can personally contribute to that discussion. I will have to trust that whatever Mercurial experts propose is good. > The current Mercurial mirror for py3k also doesn't include any history > from before it was branched, which is bad, IMO. In order to get the most > of the DVCS structure, it would be helpful if py3k shared history with > the normal (trunk) branches. 
In the long run, the current trunk may cease to exist, and the py3k branch may take over its role. Not sure whether this needs to be considered. >> Not really. Currently, core developers can only push stuff using the >> Bazaar setup. Personally, I think SSH access would be a lot nicer, but >> this will depend how confident python.org's admins are with this idea. If it's the same as the current subversion access, it's fine. Otherwise, it needs discussion. > We could still enable pushing through http(s) for hgweb(dir). But that would require to hand out (and manage) passwords, right? Martin From dirkjan at ochtman.nl Sun Apr 5 20:37:36 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 20:37:36 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8F5F9.3000803@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8800A.60601@ochtman.nl> <49D8EF4F.7010709@v.loewis.de> <49D8F13D.90302@ochtman.nl> <49D8F5F9.3000803@v.loewis.de> Message-ID: <49D8FA70.2060304@ochtman.nl> On 05/04/2009 20:18, "Martin v. L?wis" wrote: > But then, I have not tried installing it, so I don't know what it > actually looks like. Ah, right. In my setup, there was an index page with three lines of text, one of which had a link to the waterfall. So I think that should still be simple enough for most of the interested parties. ;) Cheers, Dirkjan From martin at v.loewis.de Sun Apr 5 20:40:20 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Apr 2009 20:40:20 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8C5E3.5000200@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> Message-ID: <49D8FB14.2080509@v.loewis.de> >> I think at least 3.x and 2.x should live in separate repos. It is >> pointless for >> a clone of py3k to end up pulling all 40000+ changesets from the >> trunk. 
It would >> add 100MB+ to every py3k clone (that is, quadrupling the size of the >> repository). > > I don't agree. It's quite annoying for things like annotate/blame, for > example, where you may have to switch to another branch while chasing > down a defective change. FWIW, I also think that all branches should go back to the very beginning. > Okay, sounds like that will be easy. Would be good to enable compression > on the SSH, though, if that's not already done. Where is that configured? Regards, Martin From dirkjan at ochtman.nl Sun Apr 5 20:43:24 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 20:43:24 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8F88B.3050102@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> Message-ID: <49D8FBCC.1050801@ochtman.nl> On 05/04/2009 20:29, "Martin v. L?wis" wrote: > FYI: this is the list of hooks currently employed: > - pre: check whitespace > - post: mail python-checkins > inform regular buildbot > inform community buildbot > trigger website rebuild if a PEP was modified > (but then, whether or not the PEPs will be maintained > in hg also needs to be decided) All this is easy to do with Mercurial's hook system. One caveat is that stuff (like whitespace) only gets checked at push time, not at commit time (running commit hooks would have to be done on the client, but since we don't sandbox hooks, they would be a liability to distribute by default). People could still set them up locally for pre-commit if they want, of course, but otherwise they may need some trickery to modify the changesets they want to push. For the email messages, we'll probably want to use the notify extension that comes with hg. > How to authenticate in that interface? We don't have passwords per > committer. Okay, so we'll use ssh. 
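(For reference, a forced-command entry in such an hg account's authorized_keys file -- e.g. using Mercurial's bundled hg-ssh wrapper -- might look roughly like the line below; the key, repository path and address are placeholders, not a proposed configuration:)

```
command="hg-ssh repos/python",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3Nza...rest-of-key... committer@example.org
```

One such line per committer would then allow access to be granted or revoked by adding or deleting a single line.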
Cheers, Dirkjan From dirkjan at ochtman.nl Sun Apr 5 20:45:27 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Sun, 05 Apr 2009 20:45:27 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8FA3A.5050400@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> Message-ID: <49D8FC47.8080803@ochtman.nl> On 05/04/2009 20:36, "Martin v. L?wis" wrote: > We do require full real names (i.e. no nicknames). Can Mercurial > guarantee such a thing? We could pre-record the list of allowed names in a hook, then have the hook check that usernames include one of those names and an email address (so people can still start using another email address). > In the long run, the current trunk may cease to exist, and the py3k > branch may take over its role. Not sure whether this needs to be > considered. I considered that in some other subthread. :) Cheers, Dirkjan From benjamin at python.org Sun Apr 5 21:30:04 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 5 Apr 2009 14:30:04 -0500 Subject: [Python-Dev] Tools In-Reply-To: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> Message-ID: <1afaf6160904051230h3f657ad3tbf41e6fa20bd02fb@mail.gmail.com> 2009/4/5 Barry Warsaw : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Someone (I'm sorry, I forgot who) asked me at Pycon about stripping out > Demos and Tools. ?I'm happy to remove the two I wrote - Tools/world and > Tools/pynche - from the distribution and release them as separate projects > (retaining the PSF license). ? Should I remove them from both the Python 2.x > and 3.x trunks? +1 to removing some of the old unused stuff from those directories. -- Regards, Benjamin From g.brandl at gmx.net Sun Apr 5 22:09:46 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 05 Apr 2009 22:09:46 +0200 Subject: [Python-Dev] Mercurial? 
In-Reply-To: <49D8FC47.8080803@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl> Message-ID: Dirkjan Ochtman schrieb: > On 05/04/2009 20:36, "Martin v. L?wis" wrote: >> We do require full real names (i.e. no nicknames). Can Mercurial >> guarantee such a thing? > > We could pre-record the list of allowed names in a hook, then have the > hook check that usernames include one of those names and an email > address (so people can still start using another email address). What about commits from other people, e.g. pulled from a repo or imported via hg import? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Sun Apr 5 22:11:36 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 05 Apr 2009 22:11:36 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8FBCC.1050801@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> Message-ID: Dirkjan Ochtman schrieb: > On 05/04/2009 20:29, "Martin v. L?wis" wrote: >> FYI: this is the list of hooks currently employed: >> - pre: check whitespace >> - post: mail python-checkins >> inform regular buildbot >> inform community buildbot >> trigger website rebuild if a PEP was modified >> (but then, whether or not the PEPs will be maintained >> in hg also needs to be decided) > > All this is easy to do with Mercurial's hook system. 
One caveat is that > stuff (like whitespace) only gets checked at push time, not at commit > time (running commit hooks would have to be done on the client, but > since we don't sandbox hooks, they would be a liability to distribute by > default). People could still set them up locally for pre-commit if they > want, of course, but otherwise they may need some trickery to modify the > changesets they want to push. When commits with bad whitespace changes are rejected on push, this is a pretty good incentive to run the pre-commit hook next time, so that you don't have to re-do all the commits in that batch :) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From solipsis at pitrou.net Sun Apr 5 22:29:57 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 20:29:57 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> Message-ID: Georg Brandl gmx.net> writes: > > When commits with bad whitespace changes are rejected on push, this is a > pretty good incentive to run the pre-commit hook next time, so that you > don't have to re-do all the commits in that batch :) Do you really have to re-do all the commits, or can you just commit the whitespace fixes separately? From martin at v.loewis.de Sun Apr 5 22:35:45 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 05 Apr 2009 22:35:45 +0200 Subject: [Python-Dev] Mercurial? 
In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl> Message-ID: <49D91621.1050306@v.loewis.de> >> We could pre-record the list of allowed names in a hook, then have the >> hook check that usernames include one of those names and an email >> address (so people can still start using another email address). > > What about commits from other people, e.g. pulled from a repo or imported > via hg import? Not sure. What is the recommendation? Ideally, we would have a contributor agreement on file of any, well, contributor. Regards, Martin From g.brandl at gmx.net Sun Apr 5 22:37:06 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 05 Apr 2009 22:37:06 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> Message-ID: Antoine Pitrou schrieb: > Georg Brandl gmx.net> writes: >> >> When commits with bad whitespace changes are rejected on push, this is a >> pretty good incentive to run the pre-commit hook next time, so that you >> don't have to re-do all the commits in that batch :) > > Do you really have to re-do all the commits, or can you just commit the > whitespace fixes separately? Probably yes. I was just painting the devil on the wall :) At PyCon, I already wrote the pre-commit hook. And what's best, since it runs locally it can fix the files for you instead of just bitching around... Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
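A local pretxncommit whitespace check of the kind being discussed might be wired up roughly like this (an illustrative sketch, not the actual hook written at PyCon; the hgrc section name and function names are made up for the example):

```python
# Illustrative whitespace-checking hook for Mercurial.  It could be
# enabled locally in .hg/hgrc with something like:
#
#   [hooks]
#   pretxncommit.whitespace = python:whitespace_hook.check
#
# The helper below is plain Python and does the actual scanning.

def find_whitespace_problems(text):
    """Return (line_number, line) pairs for lines with trailing
    whitespace or tab indentation."""
    problems = []
    for i, line in enumerate(text.splitlines(), 1):
        if line != line.rstrip() or line.startswith('\t'):
            problems.append((i, line))
    return problems

def check(ui, repo, node, **kwargs):
    """pretxncommit hook: a truthy return value rolls back the commit."""
    ctx = repo[node]
    bad = False
    for path in ctx.files():
        if not path.endswith('.py') or path not in ctx:
            continue  # skip removed files and non-Python files
        data = ctx[path].data().decode('utf-8', 'replace')
        for lineno, _ in find_whitespace_problems(data):
            ui.warn('%s:%d: trailing whitespace or tab\n' % (path, lineno))
            bad = True
    return bad
```

Because the scanning lives in a plain helper function, the same code could just as easily rewrite the offending files instead of rejecting the commit.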
From g.brandl at gmx.net Sun Apr 5 22:47:32 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 05 Apr 2009 22:47:32 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D91621.1050306@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl> <49D91621.1050306@v.loewis.de> Message-ID: Martin v. L?wis schrieb: >>> We could pre-record the list of allowed names in a hook, then have the >>> hook check that usernames include one of those names and an email >>> address (so people can still start using another email address). >> >> What about commits from other people, e.g. pulled from a repo or imported >> via hg import? > > Not sure. What is the recommendation? > > Ideally, we would have a contributor agreement on file of any, well, > contributor. Well, in theory it shouldn't make a difference if a contributed patch is committed by a committer under his name (and the contributor's name mentioned in the commit message), or if the patch is committed under the contributor's name. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
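The author-name check sketched earlier in the thread essentially amounts to matching each changeset's author field against a "Full Name <email>" pattern plus a recorded list of full names. A minimal sketch (the allow-list here is a hypothetical placeholder, not real committer data):

```python
import re

# Hypothetical allow-list; in practice this would come from the
# committer records rather than being hard-coded.
ALLOWED_NAMES = {"Martin v. Loewis", "Georg Brandl", "Dirkjan Ochtman"}

AUTHOR_RE = re.compile(r'^(?P<name>[^<>]+?)\s*<(?P<email>[^<>@\s]+@[^<>@\s]+)>$')

def author_ok(author):
    """True if author is 'Full Name <email>' with a pre-recorded full
    name; the email address part is free to vary."""
    m = AUTHOR_RE.match(author)
    return bool(m) and m.group('name') in ALLOWED_NAMES
```

Such a check would still need a policy for changesets pulled in from outside contributors, as discussed above.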
From greg.ewing at canterbury.ac.nz Mon Apr 6 00:39:43 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Apr 2009 10:39:43 +1200 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: <49D88E6F.4080801@sweetapp.com> References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <49D88E6F.4080801@sweetapp.com> Message-ID: <49D9332F.4050503@canterbury.ac.nz> Brian Quinlan wrote: > if not self.__closed: > try: > - self.flush() > + IOBase.flush(self) > except IOError: > pass # If flush() fails, just give up > self.__closed = True That doesn't seem like a good idea to me at all. If someone overrides flush() but not close(), their flush method won't get called, which would be surprising. To get the desired behaviour, you need something like def close(self): if not self.__closed: self.flush() self._close() self.__closed = True and then tell people to override _close() rather than close(). -- Greg From greg.ewing at canterbury.ac.nz Mon Apr 6 00:51:19 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Apr 2009 10:51:19 +1200 Subject: [Python-Dev] graphics maths types in python core? In-Reply-To: <49D89309.7050307@gmail.com> References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> <49D85A23.6020405@gmail.com> <49D89309.7050307@gmail.com> Message-ID: <49D935E7.5010207@canterbury.ac.nz> Nick Coghlan wrote: > Still, as both you and Greg have pointed out, even in its current form > memoryview is already useful as a replacement for buffer that doesn't > share buffer's problems That may be so, but I was more pointing out that the elementwise functions I'm talking about would be useful even without memoryview at all. Mostly you would just use them directly on array.array or other sequence types. 
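The elementwise helpers described above could be as simple as the following (a sketch of the idea only, not a concrete proposal -- the function name and dispatch are invented for illustration):

```python
from array import array

def elementwise_add(a, b):
    """Elementwise addition of two equal-length sequences, returning
    a new sequence of the same kind (array.array, list, tuple, ...)."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    result = [x + y for x, y in zip(a, b)]
    if isinstance(a, array):
        return array(a.typecode, result)
    return type(a)(result)

v1 = array('d', [1.0, 2.0, 3.0])
v2 = array('d', [0.5, 0.5, 0.5])
print(list(elementwise_add(v1, v2)))  # [1.5, 2.5, 3.5]
```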
Why is it that whenever the word "buffer" is mentioned, some people seem to think it has something to do with memoryview? There is no such thing as "a buffer". There is the buffer interface, and there are objects which support the buffer interface, of which memoryview is one among many. -- Greg From skippy.hammond at gmail.com Mon Apr 6 00:48:04 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Mon, 06 Apr 2009 08:48:04 +1000 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8BC81.7040007@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> Message-ID: <49D93524.3060500@gmail.com> On 6/04/2009 12:13 AM, Dirkjan Ochtman wrote: > > I have a stab at an author map at http://dirkjan.ochtman.nl/author-map. > Could use some review, but it seems like a good start. Just to be clear, what input would you like on that map? I'm listed twice: mark.hammond = Mark Hammond mhammond = Mark Hammond but that email address isn't the address normally associated with any checkins I make, nor the address in the comments of the ssh keys I use (which is mhammond at skippinet.com.au) The addresses given are valid though, so I'm not sure what kind of review or feedback you are after. Cheers, Mark From ncoghlan at gmail.com Mon Apr 6 00:54:20 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Apr 2009 08:54:20 +1000 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> Message-ID: <49D9369C.8080400@gmail.com> Antoine Pitrou wrote: > James Y Knight fuhm.net> writes: >> It seems that a separate method "_internal_close" should've been >> defined to do the actual closing of the file, and the close() method >> should've been defined on the base class as "self.flush(); >> self._internal_close()" and never overridden. 
> > I'm completely open to changes as long as there is a reasonable consensus around > them. Your proposal looks sane, although the fact that a semi-private method > (starting with an underscore) is designed to be overriden in some classes is a > bit annoying. Note that we already do that in a couple of places where it makes sense - in those cases the underscore is there to tell *clients* of the class "don't call this directly", but it is still explicitly documented as part of the subclassing API. (the only example I can find at the moment is in asynchat, but I thought there were a couple of more common ones than that - hopefully I'll think of them later) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From greg.ewing at canterbury.ac.nz Mon Apr 6 00:56:28 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Apr 2009 10:56:28 +1200 Subject: [Python-Dev] Generator methods - "what's next" ? In-Reply-To: <49D896A4.3000104@wanadoo.fr> References: <49D896A4.3000104@wanadoo.fr> Message-ID: <49D9371C.3000202@canterbury.ac.nz> Firephoenix wrote: > I basically agreed with renaming the next() method to __next__(), so as > to follow the naming of other similar methods (__iter__() etc.). > But I noticed then that all the other methods of the generator had > stayed the same (send, throw, close...) Keep in mind that next() is part of the iterator protocol that applies to all iterators, whereas the others are specific to generators. By your reasoning, any object that has any __xxx__ methods should have all its other methods turned into __xxx__ methods as well. -- Greg From ncoghlan at gmail.com Mon Apr 6 01:10:37 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Apr 2009 09:10:37 +1000 Subject: [Python-Dev] graphics maths types in python core? 
In-Reply-To: <49D935E7.5010207@canterbury.ac.nz> References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> <49D85A23.6020405@gmail.com> <49D89309.7050307@gmail.com> <49D935E7.5010207@canterbury.ac.nz> Message-ID: <49D93A6D.2040602@gmail.com> Greg Ewing wrote: > > Why is it that whenever the word "buffer" is mentioned, > some people seem to think it has something to do with > memoryview? There is no such thing as "a buffer". There > is the buffer interface, and there are objects which > support the buffer interface, of which memoryview is > one among many. > Probably because memoryview *is* the Python API for the C-level buffer interface. While I can understand that point of view, I don't agree with it, which is why I consider it important to point out that memoryview's limitations aren't shared by the underlying API when the topic comes up. /tangent from the vector math thread (hopefully) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Mon Apr 6 01:14:54 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 5 Apr 2009 23:14:54 +0000 (UTC) Subject: [Python-Dev] graphics maths types in python core? References: <20090404150111.GQ12593@idyll.org> <49D7D884.5060801@canterbury.ac.nz> <49D7EE7C.4040604@canterbury.ac.nz> <49D7F2AB.8060907@canterbury.ac.nz> <49D85A23.6020405@gmail.com> <49D89309.7050307@gmail.com> <49D935E7.5010207@canterbury.ac.nz> Message-ID: Greg Ewing canterbury.ac.nz> writes: > > Why is it that whenever the word "buffer" is mentioned, > some people seem to think it has something to do with > memoryview? There is no such thing as "a buffer". There is a Py_buffer struct. 
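(Concretely, the protocol nature of the buffer interface is visible from Python itself: several unrelated built-in types can all be wrapped by memoryview, which is just one consumer of that interface. Modern py3k-style syntax used for illustration:)

```python
from array import array

# bytes, bytearray and array.array all export the buffer interface;
# memoryview merely consumes it.
for obj in (b"bytes", bytearray(b"abc"), array('i', [1, 2, 3])):
    m = memoryview(obj)
    print(type(obj).__name__, len(m), m.itemsize)
```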
From doko at ubuntu.com Mon Apr 6 01:37:00 2009 From: doko at ubuntu.com (Matthias Klose) Date: Mon, 06 Apr 2009 01:37:00 +0200 Subject: [Python-Dev] Tools In-Reply-To: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> Message-ID: <49D9409C.6060108@ubuntu.com> Barry Warsaw schrieb: > Someone (I'm sorry, I forgot who) asked me at Pycon about stripping out > Demos and Tools. I'm happy to remove the two I wrote - Tools/world and > Tools/pynche - from the distribution and release them as separate > projects (retaining the PSF license). Should I remove them from both > the Python 2.x and 3.x trunks? +1, but please for 2.7 and 3.1 only. From ajaksu at gmail.com Mon Apr 6 01:56:09 2009 From: ajaksu at gmail.com (Daniel (ajax) Diniz) Date: Sun, 5 Apr 2009 20:56:09 -0300 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8EC71.5020105@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> Message-ID: <2d75d7660904051656s2242a9ex91ac0a2d8056cbfe@mail.gmail.com> "Martin v. L?wis" wrote: >> I think it would be a good idea to host a temporary svn mirrors for >> developers who accesses their VCS via an IDE. Although, I am sure >> anymore if supporting these developers (if there are any) would worth >> the trouble. So, think of this as optional. > > Any decision to have or not have such a feature should be stated in > the PEP. I personally don't use IDEs, so I don't care (although > I do notice that the apparent absence of IDE support for Mercurial > indicates maturity of the technology) I can spend some time on Mercurial integration for the main IDEs in use by core devs, I'm sure the PIDA folks have most of this sorted already. It would be important to have these (and any other non-PEP worthy tasks/helpers) listed with some detail, e.g., in a wiki page. If anyone has requests for tools that would make the transition smoother (e.g. 
the script for /external, a wrapper for svnmerge semantics on top of hg transplant, etc.) but has no time to work on them, please add them to http://wiki.python.org/moin/CoreDevHelperTools . Daniel From aleaxit at gmail.com Mon Apr 6 02:28:51 2009 From: aleaxit at gmail.com (Alex Martelli) Date: Sun, 5 Apr 2009 17:28:51 -0700 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: <49D9369C.8080400@gmail.com> References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> <49D9369C.8080400@gmail.com> Message-ID: On Sun, Apr 5, 2009 at 3:54 PM, Nick Coghlan wrote: > Antoine Pitrou wrote: > > James Y Knight fuhm.net> writes: > >> It seems that a separate method "_internal_close" should've been > >> defined to do the actual closing of the file, and the close() method > >> should've been defined on the base class as "self.flush(); > >> self._internal_close()" and never overridden. > > > > I'm completely open to changes as long as there is a reasonable consensus > around > > them. Your proposal looks sane, although the fact that a semi-private > method > > (starting with an underscore) is designed to be overriden in some classes > is a > > bit annoying. > > Note that we already do that in a couple of places where it makes sense > - in those cases the underscore is there to tell *clients* of the class > "don't call this directly", but it is still explicitly documented as > part of the subclassing API. > > (the only example I can find at the moment is in asynchat, but I thought > there were a couple of more common ones than that - hopefully I'll think > of them later) > Queue.Queue in 2.* (and queue.Queue in 3.*) is like that too -- the single leading underscore meaning "protected" ("I'm here for subclasses to override me, only" in C++ parlance) and a great way to denote "hook methods" in a Template Method design pattern instance. Base class deals with all locking issues in e.g. 
'get' (the method a client calls), subclass can override _get and not worry about threading (it will be called by parent class's get with proper locks held and locks will be properly released &c afterwards). Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Apr 6 03:06:52 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Apr 2009 13:06:52 +1200 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> Message-ID: <49D955AC.2050106@canterbury.ac.nz> Antoine Pitrou wrote: > Your proposal looks sane, although the fact that a semi-private method > (starting with an underscore) is designed to be overriden in some classes is a > bit annoying. The only other way I can see is to give up any attempt in the base class to ensure that flushing occurs before closing, and make that the responsibility of the derived class. -- Greg From skip at pobox.com Mon Apr 6 04:03:44 2009 From: skip at pobox.com (skip at pobox.com) Date: Sun, 5 Apr 2009 21:03:44 -0500 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> Message-ID: <18905.25344.21772.908887@montanaro.dyndns.org> After the private hell I've gone through the past few days stumbling around Mercurial without really understanding what the hell I was doing, I strongly recommend that when the conversion is complete that there is a "do it just like you did it with svn" mode available. Fortunately, this was just with my little lockfile module, so the damage was very isolated. (And perhaps "damage" is the wrong word. Someone more experienced with hg could almost certainly correct my mistakes.) 
I freely admit that my own misunderstanding of how Mercurial works was the primary cause of my problems. Still, until people are really familiar with what is going on, especially people like me who have little or no familiarity with dVCSs, I think it's best to just treat it like Subversion if at all possible. Skip From skip at pobox.com Mon Apr 6 04:27:36 2009 From: skip at pobox.com (skip at pobox.com) Date: Sun, 5 Apr 2009 21:27:36 -0500 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8D280.6060900@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> <49D8D280.6060900@ochtman.nl> Message-ID: <18905.26776.630604.13623@montanaro.dyndns.org> >> As for the clone time, one of our prominent developers is, IIRC, on >> a 40 kb/s line. Perhaps he wants to step in and say whether cloning >> the trunk is a painful experience for him, or not. Dirkjan> Sure it's painful, but he only has to go through that once, Dirkjan> maybe twice. Maybe once for each currently active Subversion branch (2.6, 2.7, 3.0, 3.1)? Skip From skip at pobox.com Mon Apr 6 04:50:42 2009 From: skip at pobox.com (skip at pobox.com) Date: Sun, 5 Apr 2009 21:50:42 -0500 Subject: [Python-Dev] Tools In-Reply-To: <49D9409C.6060108@ubuntu.com> References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> <49D9409C.6060108@ubuntu.com> Message-ID: <18905.28162.645078.593247@montanaro.dyndns.org> Barry> Someone asked me at Pycon about stripping out Demos and Tools. Matthias> +1, but please for 2.7 and 3.1 only. Is there a list of other demos or tools which should be deleted? If possible the list should be publicized so that people can pick up external maintenance if desired.
Skip From jackdied at gmail.com Mon Apr 6 04:58:22 2009 From: jackdied at gmail.com (Jack diederich) Date: Sun, 5 Apr 2009 22:58:22 -0400 Subject: [Python-Dev] Tools In-Reply-To: <18905.28162.645078.593247@montanaro.dyndns.org> References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> <49D9409C.6060108@ubuntu.com> <18905.28162.645078.593247@montanaro.dyndns.org> Message-ID: On Sun, Apr 5, 2009 at 10:50 PM, wrote: > Barry> Someone asked me at Pycon about stripping out Demos and Tools. > > Matthias> +1, but please for 2.7 and 3.1 only. > > Is there a list of other demos or tools which should be deleted? If > possible the list should be publicized so that people can pick up external > maintenance if desired. I liked Brett's (Georg's?) half-joking idea at sprints. Just delete each subdirectory in a separate commit and then wait to see what people revert. -Jack From alexandre at peadrop.com Mon Apr 6 06:06:15 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 6 Apr 2009 00:06:15 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8EC71.5020105@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> Message-ID: On Sun, Apr 5, 2009 at 1:37 PM, "Martin v. Löwis" wrote: > I think it should be stated in the PEP what branches get converted, > in what form, and what the further usage of the svn repository should > be. > Noted. > I think there is a long tradition of such annotations; we should > try to repeat history here. IIUC, the Debian bugtracker understands > > Closes: #4532 > > and some other syntaxes. It must be easy to remember, else people > won't use it. > That should be reasonable. Personally, I don't really care about the syntax we would use as long as it's consistent and documented. > Any decision to have or not have such a feature should be stated in > the PEP.
I personally don't use IDEs, so I don't care (although > I do notice that the apparent absence of IDE support for Mercurial > indicates maturity of the technology) > I know NetBeans has Mercurial support built in (which makes sense because Sun uses Mercurial for its open-source projects). However, I am not sure if Eclipse has good Mercurial support yet. There are third-party plugins for Eclipse, but I don't know if they work well. > Ok, I take that back. I assumed that Mercurial could work *exactly* > as Subversion. Apparently, that's not the case (although I have no > idea what a server-side clone is). So I wait for the PEP to explain > how authentication and access control is to be implemented. Creating > individual Unix accounts for committers should be avoided. With Subversion, we can do a server-side clone (or copy) using the copy command: svn copy SRC_URL DEST_URL This avoids wasting time and bandwidth, since the copy is done directly on the server. Without this feature, you would need to check out the repository you want to clone, then push it to a different location. Since upload bandwidth is often limited, creating a new branch in such a fashion would be time-consuming. With Mercurial, we will need to add support for server-side clones ourselves. There are a few ways to provide this feature. We give Unix user accounts to all core developers and let developers manage their private branches directly on the server. You made it clear that this is not wanted. So an alternative approach is to add an interface accessible via SSH. As I previously mentioned, this is the approach used by Mozilla. Yet another approach would be to add a web interface for managing the repositories. This is what the OpenSolaris admins opted for. Personally, I do not think this is a good idea because it would require us to roll our own authentication mechanism, which is clearly a bad thing (both security-wise and usability-wise). This reminds me that we will have to decide how we will reorganize our workflow.
For this, we can either be conservative and keep the current CVS-style development workflow, i.e., a few main repositories which all developers can commit to. Or we could drink the kool-aid and go with a kernel-style development workflow, i.e., each developer maintains his own branch and pulls changes from the others. From what I have heard, the CVS-style workflow has a lower overhead than the kernel-style workflow. However, the kernel-style workflow is somewhat advantageous because changes get reviewed several times before they get into the main branches. Thus, it is less likely that someone manages to break the build. In addition, Mercurial is much better suited to supporting the kernel-style workflow. However, if we go kernel-style, we will need to designate someone (i.e., an integrator) who will maintain the main branches, which will be tested by buildbot and used for the public releases. These are issues I would like to address in the PEP. > I can give you access to the master setup. Ideally, this should > be tested before the switchover (with a single branch). We also > need instructions for the slaves (if any - perhaps installing > a hg binary is sufficient). > I am not too familiar with our buildbot setup. So, I will have to do some reading before actually making any changes. You can give me access to the buildbot master now. However, I would use this access only to study how the current setup works and to plan the changes we need accordingly. >> Since the directories in /external are considered read-only, we could >> simply create a new Mercurial repository and copy the content of /external in >> it. >>> - decide what to do with the bzr mirrors >>> >> >> I don't see much benefit in keeping them. > > Both should go into the PEP. Noted. Regards, -- Alexandre From aahz at pythoncraft.com Mon Apr 6 06:20:16 2009 From: aahz at pythoncraft.com (Aahz) Date: Sun, 5 Apr 2009 21:20:16 -0700 Subject: [Python-Dev] Mercurial?
In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> Message-ID: <20090406042016.GA97@panix.com> On Mon, Apr 06, 2009, Alexandre Vassalotti wrote: > > This makes me remember that we will have to decide how we will > reorganize our workflow. For this, we can either be conservative and > keep the current CVS-style development workflow?i.e., a few main > repositories where all developers can commit to. Or we could drink the > kool-aid and go with a kernel-style development workflow?i.e., each > developer maintains his own branch and pull changes from each others. How difficult would it be to change the decision later? That is, how about starting with a CVS-style system and maybe switch to kernel-style once people get comfortable with Hg? -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From alexandre at peadrop.com Mon Apr 6 06:20:52 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 6 Apr 2009 00:20:52 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8FC47.8080803@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl> Message-ID: On Sun, Apr 5, 2009 at 2:45 PM, Dirkjan Ochtman wrote: > On 05/04/2009 20:36, "Martin v. L?wis" wrote: >> >> We do require full real names (i.e. no nicknames). Can Mercurial >> guarantee such a thing? > > We could pre-record the list of allowed names in a hook, then have the hook > check that usernames include one of those names and an email address (so > people can still start using another email address). 
> But that won't work if people who are not core developers submit patch bundles for us to import. And maintaining such a white-list sounds to me more burdensome than necessary. -- Alexandre From alexandre at peadrop.com Mon Apr 6 06:26:06 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 6 Apr 2009 00:26:06 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8FB14.2080509@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> <49D8FB14.2080509@v.loewis.de> Message-ID: On Sun, Apr 5, 2009 at 2:40 PM, "Martin v. Löwis" wrote: >> Okay, sounds like that will be easy. Would be good to enable compression >> on the SSH, though, if that's not already done. > > Where is that configured? > If I recall correctly, only ssh clients can request compression from the > server; in other words, the server cannot force the clients to use > compression, but merely allows them to use it. > > See the man pages for sshd_config and ssh_config for the specific details. -- Alexandre From alexandre at peadrop.com Mon Apr 6 06:31:56 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 6 Apr 2009 00:31:56 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <20090406042016.GA97@panix.com> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> <20090406042016.GA97@panix.com> Message-ID: On Mon, Apr 6, 2009 at 12:20 AM, Aahz wrote: > How difficult would it be to change the decision later? That is, how > about starting with a CVS-style system and maybe switch to kernel-style > once people get comfortable with Hg? I believe it would be fairly easy. It would be a matter of designating a volunteer to maintain the main repositories and asking core developers to avoid committing directly to them.
Cheers, -- Alexandre From dirkjan at ochtman.nl Mon Apr 6 08:00:10 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 08:00:10 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D93524.3060500@gmail.com> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D93524.3060500@gmail.com> Message-ID: On Mon, Apr 6, 2009 at 00:48, Mark Hammond wrote: > Just to be clear, what input would you like on that map? Review of email addresses, pointers to names/email addresses for the usernames I don't have anything for yet. Also, there's a few commented question marks, it would be useful if someone checked those. > I'm listed twice: > > mark.hammond = Mark Hammond > mhammond = Mark Hammond > > but that email address isn't the address normally associated with any > checkins I make, nor the address in the comments of the ssh keys I use > (which is mhammond at skippinet.com.au) Your being listed twice is normal; both mark.hammond and mhammond have been used in the commit history, and I just assumed they're both you. I'll probably change your email address to be the one associated with the checkins/public key, though. Is there a list of such email addresses? I just parsed python-dev archives to get to my list. Cheers, Dirkjan From dirkjan at ochtman.nl Mon Apr 6 08:03:18 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 08:03:18 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <18905.26776.630604.13623@montanaro.dyndns.org> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> <49D8D280.6060900@ochtman.nl> <18905.26776.630604.13623@montanaro.dyndns.org> Message-ID: On Mon, Apr 6, 2009 at 04:27, wrote: > Maybe once for each currently active Subversion branch (2.6, 2.7, 3.0, 3.1)? Sure, if we're doing cloned branches. But then someone will also need to clone 2.5, and maybe 2.4. 
The point is, as long as it's a constant factor and not an order of magnitude more, it's still quite easy to cope with. This would also be one of the arguments *for* named branches, I suppose. Cheers, Dirkjan From dirkjan at ochtman.nl Mon Apr 6 08:04:52 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 08:04:52 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl> Message-ID: On Mon, Apr 6, 2009 at 06:20, Alexandre Vassalotti wrote: > But that won't work if people who are not core developers submit us > patch bundle to import. And maintaining a such white-list sounds to me > more burdensome than necessary. Well, if you need contributors to sign a contributor's agreement anyway, there's already some list out there that we can leverage. The other option is to play the consenting adults card and ask all people with push access to ascertain the correct names of committer names on patches they push. Cheers, Dirkjan From dirkjan at ochtman.nl Mon Apr 6 08:07:39 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 06:07:39 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8C5E3.5000200@ochtman.nl> <49D8FB14.2080509@v.loewis.de> Message-ID: Alexandre Vassalotti peadrop.com> writes: > If I recall correctly, only ssh clients can request compression to the > server?in other words, the server cannot forces the clients to use > compression, but merely allow them use it. > > See the man page for sshd_config and ssh_config for the specific details. So we'll explain how to configure that in the .hgrc/Mercurial.ini file that people will have to create anyway. Alternatively, we do it the way Mozilla has done and let everyone clone/pull over http and push over ssh. 
Then everyone always gets compression for the big clones/pulls; pushes are a little slower (but they aren't usually that large), and people who don't have push access already have the right setup. Cheers, Dirkjan From dirkjan at ochtman.nl Mon Apr 6 08:13:00 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 06:13:00 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8BFD2.8090600@ochtman.nl> <49D8F02C.1040008@v.loewis.de> Message-ID: Martin v. Löwis v.loewis.de> writes: > Is it possible to branch from a subdirectory? For the "different VMs" > stuff, it's rather desirable to have a branch of the test suite, and > perhaps the standard library, rather than extracting it from the source. You can only branch the whole repository. Of course you could drop the other stuff right after branching it, but that would kind of defy the point of branching (since you won't really be able to merge later on). This is why it might be interesting to just split out the stdlib entirely. Though maybe we should wait for Mercurial's subrepo support to arrive before we go there (so we can have a CPython repo that has the stdlib repo included automatically). Something like that is already provided by the forest extension, but it's not being maintained. Subrepo support is slated for the 1.3 release, which is planned for early July. Cheers, Dirkjan From dirkjan at ochtman.nl Mon Apr 6 08:21:05 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 06:21:05 +0000 (UTC) Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> Message-ID: Alexandre Vassalotti peadrop.com> writes: > With Mercurial, we will need to add support for server-side clones > ourselves. There are a few ways to provide this feature.
We give Unix user > accounts to all core developers and let developers manage their > private branches directly on the server. You made it clear that this is > not wanted. So an alternative approach is to add an interface > accessible via SSH. As I previously mentioned, this is the approach > used by Mozilla. The easier solution here is to just allow normal local-to-remote clones. hg supports commands like hg clone . ssh://hg at hg.python.org/my-branch without the need for any extra scripts or setup. I think that would be a good start. > This reminds me that we will have to decide how we will > reorganize our workflow. For this, we can either be conservative and > keep the current CVS-style development workflow, i.e., a few main > repositories which all developers can commit to. Or we could drink the > kool-aid and go with a kernel-style development workflow, i.e., each > developer maintains his own branch and pulls changes from the others. The differences between these workflows aren't all that big, i.e. it's not like there's a big schism between them. But I suspect that, in a setup where buildbots are important, a heavily multi-repo setup probably isn't a good idea (this is also why Mozilla doesn't use that many repos; their continuous integration infra is /very/ important to them). Cheers, Dirkjan From brian at sweetapp.com Mon Apr 6 08:51:21 2009 From: brian at sweetapp.com (Brian Quinlan) Date: Mon, 06 Apr 2009 07:51:21 +0100 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> Message-ID: <49D9A669.9010008@sweetapp.com> James Y Knight wrote: > > On Apr 5, 2009, at 6:29 AM, Antoine Pitrou wrote: > >> Brian Quinlan sweetapp.com> writes: >>> >>> I don't see why this is helpful.
Could you explain why >>> _RawIOBase.close() calling self.flush() is useful? >> >> I could not explain it for sure since I didn't write the Python version. >> I suppose it's so that people who only override flush() automatically >> get the >> flush-on-close behaviour. > > It seems that a separate method "_internal_close" should've been defined > to do the actual closing of the file, and the close() method should've > been defined on the base class as "self.flush(); self._internal_close()" > and never overridden. Are you imagining something like this?

class RawIOBase(object):
    def flush(self):
        pass

    def _internal_close(self):
        pass

    def close(self):
        self.flush()
        self._internal_close()

class FileIO(RawIOBase):
    def _internal_close(self):
        # Do close
        super()._internal_close()

class SomeSubclass(FileIO):
    def flush(self):
        # Do flush
        super().flush()

    def _internal_close(self):
        # Do close
        super()._internal_close()

That looks pretty good. RawIOBase.close acts as the controller and .flush() calls move up the class hierarchy. The downsides that I see: - you need the cooperation of your subclasses i.e. they must call super().flush() in .flush() to get correct close behavior (and this represents a backwards-incompatible semantic change) - there are also going to be some extra method calls Another approach is to get every subclass to deal with its own close semantics, i.e.:

class RawIOBase(object):
    def flush(self):
        pass

    def close(self):
        pass

class FileIO(RawIOBase):
    def close(self):
        # Do close
        super().close()

class SomeSubclass(FileIO):
    def _flush_internal(self):
        # Do flush
        pass

    def flush(self):
        self._flush_internal()
        super().flush()

    def close(self):
        SomeSubclass._flush_internal(self)
        # Do close
        super().close()

I was thinking about this approach when I wrote this patch: http://bugs.python.org/file13620/remove_flush.diff But I think I like your way better. Let me play with it a bit.
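To see the first variant in action, here is a minimal runnable sketch (the class names follow the discussion, not the real io module, and the events list exists only to make the call ordering visible):

```python
class RawIOBase:
    # Template Method controller: close() is never overridden;
    # subclasses extend the flush()/_internal_close() hooks instead.
    def __init__(self):
        self.events = []

    def flush(self):
        self.events.append("base-flush")

    def _internal_close(self):
        self.events.append("base-close")

    def close(self):
        # Always flush before doing the actual close.
        self.flush()
        self._internal_close()


class BufferedFile(RawIOBase):
    # A cooperating subclass: each hook calls super().
    def flush(self):
        self.events.append("buffered-flush")
        super().flush()

    def _internal_close(self):
        self.events.append("buffered-close")
        super()._internal_close()


f = BufferedFile()
f.close()
print(f.events)
# prints ['buffered-flush', 'base-flush', 'buffered-close', 'base-close']
```

The recorded order shows the guarantee the base class is after: every flush hook runs before any close hook, without any subclass having to override close() itself.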
Cheers, Brian From larry at hastings.org Mon Apr 6 10:00:57 2009 From: larry at hastings.org (Larry Hastings) Date: Mon, 06 Apr 2009 01:00:57 -0700 Subject: [Python-Dev] CObject take 2: Introducing the "Capsule" Message-ID: <49D9B6B9.6020304@hastings.org> (See my posting "Let's update CObject API so it is safe and regular!" from 2009/03/31 for "take 1"). I discussed this off-list with GvR. He was primarily concerned with fixing the passing-around-a-vtable C API usage of CObject, but he wanted to preserve as much backwards compatibility as possible. In the end, he suggested I create a new API and leave CObject unchanged. I've done that, incorporating many of GvR's suggestions, though the blame for the proposed new API is ultimately mine. The new object is called a "Capsule". (I *had* wanted to call it "Wrapper", but there's already a PyWrapper_New in descrobject.h.) Highlights of the new API: * PyCapsule_New() replaces PyCObject_FromVoidPtr. * It takes a void * pointer, a const char *name, and a destructor. * The pointer must not be NULL. * The name may be NULL; if it is not NULL, it must be a valid C string which outlives the capsule. * The destructor takes a PyObject *, not a void *. * PyCapsule_GetPointer() replaces PyCObject_AsVoidPtr. * It takes a PyObject * and a const char *name. * The name must compare to the name inside the object; either they're both NULL or they strcmp to be the same. * PyCapsule_Import() replaces PyCObject_Import. * It takes three arguments: const char *module_name, const char *attribute_name, int no_block. * It ensures that the "name" of the Capsule is "modulename.attributename". * If no_block is true, it uses PyModule_ImportModuleNoBlock. If this fails it sets no exception. * The PyCapsule structure is private. There are accessors for all fields: pointer, name, destructor, and "context". * The "context" is a second "void *" you can set / get. 
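For illustration, the name-matching rule above can be modelled in pure Python. This is a toy model of the proposed semantics only, not the C implementation; the class name, method names, and the use of ValueError are stand-ins:

```python
class Capsule:
    # Pure-Python model of the proposed Capsule name check.
    def __init__(self, pointer, name=None, destructor=None):
        if pointer is None:
            raise ValueError("the pointer must not be NULL")
        self._pointer = pointer
        self._name = name          # None models a NULL name
        self._destructor = destructor
        self.context = None        # models the second void* slot

    def get_pointer(self, name=None):
        # Either both names are NULL, or they compare equal (strcmp == 0).
        if self._name != name:
            raise ValueError("Capsule accessed with an incorrect name")
        return self._pointer


# A module might wrap its C-level "vtable" in a capsule named
# "modulename.attributename" so importers can sanity-check what they got.
vtable = {"add": lambda a, b: a + b}
cap = Capsule(vtable, name="mymodule.API")
api = cap.get_pointer("mymodule.API")   # correct name: pointer comes back
print(api["add"](2, 3))                 # prints 5
```

Passing any other name (or forgetting it entirely) raises, which is the safety property the new API adds over CObject's unchecked void pointer.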
You can read the full API and its implementation in the patch I just posted to the tracker: http://bugs.python.org/issue5630 The patch was written against svn r71304. The patch isn't ready to be applied--there is no documentation for the new API beyond the header file. GvR and I disagree on one point: he thinks that we should leave CObject in forever, undeprecated. I think we should deprecate it now and remove it... whenever we'd do that. The new API does everything the old one does, and more, and it's cleaner and safer. Let me start an informal poll: assuming we accept the new API, should we deprecate CObject? /larry/ From phil at freehackers.org Mon Apr 6 10:21:36 2009 From: phil at freehackers.org (Philippe Fremy) Date: Mon, 06 Apr 2009 10:21:36 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8FBCC.1050801@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> Message-ID: <49D9BB90.8040008@freehackers.org> Dirkjan Ochtman wrote: > On 05/04/2009 20:29, "Martin v. L?wis" wrote: >> FYI: this is the list of hooks currently employed: >> - pre: check whitespace >> - post: mail python-checkins >> inform regular buildbot >> inform community buildbot >> trigger website rebuild if a PEP was modified >> (but then, whether or not the PEPs will be maintained >> in hg also needs to be decided) > > All this is easy to do with Mercurial's hook system. One question: if somebody pushes a changeset with 3 commits, will the pre and post hooks be applied on all of the commits, or only on the final commit ? If this is applied on every commit, then you have no way to fix a whitespace problem without rewriting your history ? cheers, Philippe From dirkjan at ochtman.nl Mon Apr 6 10:33:36 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 10:33:36 +0200 Subject: [Python-Dev] Mercurial? 
In-Reply-To: <49D9BB90.8040008@freehackers.org> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org> Message-ID: On Mon, Apr 6, 2009 at 10:21, Philippe Fremy wrote: > One question: if somebody pushes a changeset with 3 commits, will the > pre and post hooks be applied on all of the commits, or only on the > final commit ? > > If this is applied on every commit, then you have no way to fix a > whitespace problem without rewriting your history ? Correct, so if the latter is something we want, we could run the whitespace hook just on every changegroup (group of changesets pushed). Cheers, Dirkjan From aafshar at gmail.com Mon Apr 6 09:52:15 2009 From: aafshar at gmail.com (Ali Afshar) Date: Mon, 06 Apr 2009 08:52:15 +0100 Subject: [Python-Dev] Mercurial? In-Reply-To: <2d75d7660904051656s2242a9ex91ac0a2d8056cbfe@mail.gmail.com> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> <2d75d7660904051656s2242a9ex91ac0a2d8056cbfe@mail.gmail.com> Message-ID: <49D9B4AF.6080403@gmail.com> Daniel (ajax) Diniz wrote: > "Martin v. L?wis" wrote: > >>> I think it would be a good idea to host a temporary svn mirrors for >>> developers who accesses their VCS via an IDE. Although, I am sure >>> anymore if supporting these developers (if there are any) would worth >>> the trouble. So, think of this as optional. >>> >> Any decision to have or not have such a feature should be stated in >> the PEP. I personally don't use IDEs, so I don't care (although >> I do notice that the apparent absence of IDE support for Mercurial >> indicates maturity of the technology) >> > > I can spend some time on Mercurial integration for the main IDEs in > use by core devs, I'm sure the PIDA folks have most of this sorted > already. 
It would be important to have these (and any other non-PEP > worthy tasks/helpers) listed with some detail, e.g., in a wiki page. > > Well PIDA is the IDE-hater's IDE, but yes, it has excellent Mercurial integration (probably the best integration of any system). It is all through anyvc with a small amount of user interface added. I am sure this would be easily portable. Ali (thanks for cc) From phil at freehackers.org Mon Apr 6 11:14:39 2009 From: phil at freehackers.org (Philippe Fremy) Date: Mon, 06 Apr 2009 11:14:39 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org> Message-ID: <49D9C7FF.80506@freehackers.org> Dirkjan Ochtman wrote: > On Mon, Apr 6, 2009 at 10:21, Philippe Fremy wrote: >> One question: if somebody pushes a changeset with 3 commits, will the >> pre and post hooks be applied on all of the commits, or only on the >> final commit ? >> >> If this is applied on every commit, then you have no way to fix a >> whitespace problem without rewriting your history ? > > Correct, so if the latter is something we want, we could run the > whitespace hook just on every changegroup (group of changesets > pushed). Probably wise, and for many other checks as well. This is a problem I have with my daily usage of mercurial. It's supposed to be great to work offline and to commit your intermediate versions before it's fully working but if you do that, all those intermediate non working versions find their way into the main repository. This means that something like "all test pass 100% or close on every version of the repository" is not really feasible unless every committer agrees not to have any version in his local repository that does not break any tests. Which defeats part of the purpose of being able to have a local repository, no ? 
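The changegroup-level whitespace check discussed above could be sketched as a Python hook. This is illustrative only: the hook signature follows Mercurial's Python-hook convention, but the module and function names are invented, and this is not python.org's actual setup:

```python
def trailing_whitespace(text):
    """Return (line_number, line) pairs whose line ends in spaces or tabs."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if line != line.rstrip():
            bad.append((lineno, line))
    return bad


def check_whitespace_hook(ui, repo, node=None, **kwargs):
    """pretxnchangegroup hook: reject a pushed changegroup that adds
    trailing whitespace.  `node` is the first changeset in the group;
    returning True makes Mercurial roll back the whole transaction."""
    for rev in range(repo[node].rev(), len(repo)):
        ctx = repo[rev]
        for path in ctx.files():
            if path not in ctx:      # file was removed in this changeset
                continue
            data = ctx[path].data()
            for lineno, _ in trailing_whitespace(data.decode("utf-8", "replace")):
                ui.warn("%s:%d: trailing whitespace\n" % (path, lineno))
                return True
    return False
```

Wired up on the server side (assuming the module is importable as checks), the hgrc entry would be: [hooks] pretxnchangegroup.whitespace = python:checks.check_whitespace_hook. Because such a hook runs once per push rather than once per commit, a committer can fix a whitespace slip in a later local commit without rewriting history.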
cheers, Philippe From dirkjan at ochtman.nl Mon Apr 6 11:41:30 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 11:41:30 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D9C7FF.80506@freehackers.org> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org> <49D9C7FF.80506@freehackers.org> Message-ID: On Mon, Apr 6, 2009 at 11:14, Philippe Fremy wrote: > This is a problem I have with my daily usage of mercurial. It's supposed > to be great to work offline and to commit your intermediate versions > before it's fully working but if you do that, all those intermediate non > working versions find their way into the main repository. Well, it can also be nice to have smaller commits. They're easier to review, and will provide easier history to browse/read later on. BTW, having smaller commits doesn't necessarily equate having non-working changesets. I.e. in my work on Mercurial, I'll often push small changesets (we all do), but we try to keep the test suite passing in every single one of them. > This means that something like "all test pass 100% or close on every > version of the repository" is not really feasible unless every committer > agrees not to have any version in his local repository that does not > break any tests. Which defeats part of the purpose of being able to have > a local repository, no ? This is why you'd want something like a pushlog, to provide a way to see what revisions were actually tested by buildbots. Another thing that I discussed with Georg last night would be a setup where changesets get pushed to a gateway repo that runs the tests and only pushes to an "official" repo if everything's still green. That should probably be a topic discussed separately, though. 
Cheers, Dirkjan From ncoghlan at gmail.com Mon Apr 6 13:08:47 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Apr 2009 21:08:47 +1000 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> <49D9369C.8080400@gmail.com> Message-ID: <49D9E2BF.1010604@gmail.com> Alex Martelli wrote: > Queue.Queue in 2.* (and queue.Queue in 3.*) is like that too -- the > single leading underscore meaning "protected" ("I'm here for subclasses > to override me, only" in C++ parlance) and a great way to denote "hook > methods" in a Template Method design pattern instance. Base class deals > with all locking issues in e.g. 'get' (the method a client calls), > subclass can override _get and not worry about threading (it will be > called by parent class's get with proper locks held and locks will be > properly released &c afterwards). Ah, thank you - yes, that's the one I was thinking of. My brain was telling me "threading", which makes some sense, since I put the Queue conceptually in the same bucket as the rest of the locking constructs in the threading module. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Mon Apr 6 13:13:36 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Apr 2009 21:13:36 +1000 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: <49D9A669.9010008@sweetapp.com> References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> <49D9A669.9010008@sweetapp.com> Message-ID: <49D9E3E0.2060408@gmail.com> Brian Quinlan wrote: > - you need the cooperation of your subclasses i.e. 
they must call > super().flush() in .flush() to get correct close behavior (and this > represents a backwards-incompatible semantic change) Are you sure about that? Going by the current _pyio semantics that Antoine posted, it looks to me that it is already the case that subclasses need to invoke the parent flush() call correctly to avoid breaking the base class semantics (which really isn't an uncommon problem when it comes to writing correct subclasses). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Mon Apr 6 13:36:14 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Apr 2009 21:36:14 +1000 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> <20090406042016.GA97@panix.com> Message-ID: <49D9E92E.4010603@gmail.com> Alexandre Vassalotti wrote: > On Mon, Apr 6, 2009 at 12:20 AM, Aahz wrote: >> How difficult would it be to change the decision later? That is, how >> about starting with a CVS-style system and maybe switch to kernel-style >> once people get comfortable with Hg? > > I believe it would be fairly easy. It would be a matter of declaring a > volunteer to maintain the main repositories and ask core developers to > avoid committing directly to them. I think that would be the way to go then (i.e. start with a fairly centralised workflow, and then look at adjusting to something more decentralised later)*. Cheers, Nick. *I actually had an interesting off-list discussion with Steve Turnbull regarding how well the 3 most popular DVCS tools supported centralised and decentralised workflows (or rather, how their advocates evangelise them in that respect). This is relevant when pitching a DVCS to people like me that really only have experience working with a centralised repository model like CVS or SVN. 
My guess was that Bazaar anchored the "centralised" end of the DVCS scale by letting users avoid caring about the underlying acyclic graph, while Git was solidly down the "decentralised" end with users expected to be fully aware of and comfortable with the graph. Mercurial appeared to be somewhere in the middle, as it allowed you to avoid caring about the graph most of the time, but still provided tools to manipulate it when you needed to. That makes Bazaar easy to pitch conceptually to someone like me ("you can use it just like you use SVN, only with much better merging and offline support"), and Git a tough sell ("umm, yeah, you really think about version control all wrong... we're going to have to fix that before Git makes much sense to you"). Mercurial appears to best allow the sales pitch to be tailored to the target audience (in this case, a group including a lot of people with a background predominantly involving centralised version control tools). That's just a subjective impression formed from reading what other people have written *about* the various tools, rather than anything based on my own experience using them, so you may want to investigate the location of the nearest salt mine before taking it too seriously :) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From hrvoje.niksic at avl.com Mon Apr 6 13:37:15 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Mon, 06 Apr 2009 13:37:15 +0200 Subject: [Python-Dev] Getting values stored inside sets In-Reply-To: <17434881.97057.1238778876345.JavaMail.xicrypt@atgrzls001> References: <49D5FBE6.6090807@avl.com> <17434881.97057.1238778876345.JavaMail.xicrypt@atgrzls001> Message-ID: <49D9E96B.1060805@avl.com> Raymond Hettinger wrote: >> Hrvoje Niksic wrote: >>> I've stumbled upon an oddity using sets. 
It's trivial to test if a >>> value is in the set, but it appears to be impossible to retrieve a >>> stored value, > > See: http://code.activestate.com/recipes/499299/ Thanks, this is *really* good, the kind of idea that seems perfectly obvious once pointed out by someone else. :-) I'd still prefer sets to get this functionality so they can be used to implement, say, interning, but this is good enough for me. In fact, I can derive from set and add a method similar to that in the recipe. It can be a bit simpler than yours because it only needs to support operations needed by sets (__eq__ and __hash__), not arbitrary attributes.

class Set(set):
    def find(self, item, default=None):
        capt = _CaptureEq(item)
        if capt in self:
            return capt.match
        return default

class _CaptureEq(object):
    __slots__ = 'obj', 'match'
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, other):
        eq = (self.obj == other)
        if eq:
            self.match = other
        return eq
    def __hash__(self):
        return hash(self.obj)

>>> s = Set([1, 2, 3])
>>> s.find(2.0)
2

From ncoghlan at gmail.com Mon Apr 6 13:44:21 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Apr 2009 21:44:21 +1000 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D8BC81.7040007@ochtman.nl> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> Message-ID: <49D9EB15.8070806@gmail.com> Dirkjan Ochtman wrote: > I have a stab at an author map at http://dirkjan.ochtman.nl/author-map. > Could use some review, but it seems like a good start. Martin may be able to provide a better list of names based on the checkin name<->SSH public key mapping in the SVN setup. (e.g. I believe my SVN checkin name is nick.coghlan rather than the shorter ncoghlan in my email address, and many others are in a similar boat since first.last was the chosen scheme for names in the SVN switchover) Cheers, Nick.
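For reference, the author map consumed by Mercurial's convert extension is a plain text file with one mapping per line, old checkin name on the left and the new identity on the right. A sketch using the example from the message above:

```
# authormap file for "hg convert": SVN checkin name = DVCS identity
nick.coghlan = Nick Coghlan <ncoghlan@gmail.com>
```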
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Mon Apr 6 13:47:02 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 06 Apr 2009 21:47:02 +1000 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org> <49D9C7FF.80506@freehackers.org> Message-ID: <49D9EBB6.1080004@gmail.com> Dirkjan Ochtman wrote: > Another thing that I discussed with Georg last night would be a setup > where changesets get pushed to a gateway repo that runs the tests and > only pushes to an "official" repo if everything's still green. That > should probably be a topic discussed separately, though. That was one of the post-switch workflow enhancements that Barry was advocating - it's still a good idea, even if Barry's preferred flavour of DVCS wasn't chosen :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From fuzzyman at voidspace.org.uk Mon Apr 6 13:55:55 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 06 Apr 2009 12:55:55 +0100 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D9EBB6.1080004@gmail.com> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org> <49D9C7FF.80506@freehackers.org> <49D9EBB6.1080004@gmail.com> Message-ID: <49D9EDCB.7010905@voidspace.org.uk> Nick Coghlan wrote: > Dirkjan Ochtman wrote: > >> Another thing that I discussed with Georg last night would be a setup >> where changesets get pushed to a gateway repo that runs the tests and >> only pushes to an "official" repo if everything's still green. That >> should probably be a topic discussed separately, though. 
>> > > That was one of the post-switch workflow enhancements that Barry was > advocating - it's still a good idea, even if Barry's preferred flavour > of DVCS wasn't chosen :) > > Gated checkins can work fine but can also have many problems. For example if we have a spuriously failing test then if you are working on an unrelated issue it will be entirely up to chance as to whether you can checkin... Building the docs would be another thing we could check, although it can take a while. If we have a queue then it could be the case that you do a commit - and then discover half an hour later that it conflicts with something that was ahead of you in the queue. Michael > Cheers, > Nick. > > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From dirkjan at ochtman.nl Mon Apr 6 14:23:29 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 6 Apr 2009 14:23:29 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D9EDCB.7010905@voidspace.org.uk> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org> <49D9C7FF.80506@freehackers.org> <49D9EBB6.1080004@gmail.com> <49D9EDCB.7010905@voidspace.org.uk> Message-ID: On Mon, Apr 6, 2009 at 13:55, Michael Foord wrote: > Gated checkins can work fine but can also have many problems. For example if > we have a spuriously failing test then if you are working on an unrelated > issue it will be entirely up to chance as to whether you can checkin... Sure, it's a problem, but it does get you a tree that's always green. They're all trade-offs. But let's keep this discussion for some time *after* migration to hg is completed. 
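Mechanically, the queue Michael describes is simple; the subtlety is all in the failure modes. A minimal sketch (all names hypothetical: run_tests, publish and reject stand in for the real buildbot and repository operations):

```python
from collections import deque

def drain_gate_queue(queue, run_tests, publish, reject):
    """Land queued changegroups in FIFO order, gated on a green test run.

    queue holds (name, apply_changes) pairs.  Each entry is applied to
    the candidate tree and the tests are run; the entry is published
    only if they pass.  A later entry can therefore be bounced by
    whatever landed ahead of it in the queue, which is Michael's caveat.
    """
    landed, bounced = [], []
    while queue:
        name, apply_changes = queue.popleft()
        apply_changes()
        if run_tests():
            publish(name)
            landed.append(name)
        else:
            reject(name)
            bounced.append(name)
    return landed, bounced
```

A spuriously failing test bounces whichever entry happens to be under test at the time, which is exactly the fairness problem raised above.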
Cheers, Dirkjan
From jnoller at gmail.com Mon Apr 6 14:55:58 2009 From: jnoller at gmail.com (Jesse Noller) Date: Mon, 6 Apr 2009 08:55:58 -0400 Subject: [Python-Dev] Tools In-Reply-To: References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> <49D9409C.6060108@ubuntu.com> <18905.28162.645078.593247@montanaro.dyndns.org> Message-ID: <4222a8490904060555u676e0a02y50db666a9f447436@mail.gmail.com> On Sun, Apr 5, 2009 at 10:58 PM, Jack diederich wrote: > On Sun, Apr 5, 2009 at 10:50 PM, wrote: >> Barry> Someone asked me at Pycon about stripping out Demos and Tools. >> >> Matthias> +1, but please for 2.7 and 3.1 only. >> >> Is there a list of other demos or tools which should be deleted? If >> possible the list should be publicized so that people can pick up external >> maintenance if desired. > > I liked Brett's (Georg's?) half joking idea at sprints. Just delete > each subdirectory in a separate commit and then wait to see what > people revert. > > -Jack Jack brought up a good point - this discussion came up during the sprints, I believe Martin and others had some good arguments to keep *some* of the demo/... stuff, however I think we all agreed that it belongs somewhere else; possibly the documentation. As it is, the demo/... directory only exists in subversion - it's not installed anywhere. I really do think that most of the contents can either be deleted, or moved to the docs where it might be of more use for people in general. Random thought - what if we made a docs/demos directory, which contained sub directories ala Demo/... - and added a sphinx extension which would detect nested directories and zip them up during the build? This way, you could add a tag in the .rst for the module that looked like: .. demos:: multiprocessing.zip The zip would not be checked in, but created at build time from Docs/demos/multiprocessing Just some thoughts. Back to my coffee.
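The build-time half of that random thought is a few lines of standard library code. A sketch (the Docs/demos layout and the .. demos:: directive are hypothetical; this shows only the zipping step such a Sphinx extension would perform):

```python
import os
import shutil

def build_demo_archives(demos_dir, out_dir):
    """Zip each subdirectory of demos_dir into out_dir/<name>.zip.

    E.g. Docs/demos/multiprocessing/ becomes build/multiprocessing.zip,
    ready to be referenced by a ``.. demos:: multiprocessing.zip`` tag.
    """
    os.makedirs(out_dir, exist_ok=True)
    archives = []
    for name in sorted(os.listdir(demos_dir)):
        src = os.path.join(demos_dir, name)
        if os.path.isdir(src):
            # make_archive appends ".zip" to the base name itself
            base = os.path.join(out_dir, name)
            archives.append(shutil.make_archive(base, "zip", root_dir=src))
    return archives
```

As suggested above, the zip files would be produced at build time and never checked in.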
-jesse
From chris at simplistix.co.uk Mon Apr 6 15:00:18 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 06 Apr 2009 14:00:18 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D669AA.6080001@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk> <49D669AA.6080001@v.loewis.de> Message-ID: <49D9FCE2.4070805@simplistix.co.uk> Martin v. Löwis wrote: > Chris Withers wrote: >> Martin v. Löwis wrote: >>> I propose the following PEP for inclusion to Python 3.1. >>> Please comment. >> Would this support the following case: >> >> I have a package called mortar, which defines useful stuff: >> >> from mortar import content, ... >> >> I now want to distribute large optional chunks separately, but ideally >> so that the following will work: >> >> from mortar.rbd import ... >> from mortar.zodb import ... >> from mortar.wsgi import ... >> >> Does the PEP support this? > > That's the primary purpose of the PEP. Are you sure? Does the pep really allow for: from mortar import content from mortar.rdb import something ...where 'content' is a function defined in mortar/__init__.py and 'something' is a function defined in mortar/rdb/__init__.py *and* the following are separate distributions on PyPI: - mortar - mortar.rdb ...where 'mortar' does not contain 'mortar.rdb'. > You can do this today already > (see the zope package, No, they have nothing but a (functionally) empty __init__.py in the zope package.
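For the curious, the closest thing that works today is pkgutil.extend_path (a real stdlib helper). The sketch below fakes Chris's two-distribution mortar layout in a temporary directory to show the mechanics. The caveat, and part of what PEP 382 addresses, is that only the first mortar/__init__.py found on sys.path is actually executed, which is why the zope package keeps its own __init__.py functionally empty:

```python
# Sketch of the pre-PEP-382 approach using pkgutil.extend_path.
# The mortar/mortar.rdb layout is Chris's hypothetical example; both
# "distributions" are faked here in a temporary directory.
import os
import sys
import tempfile

NS_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)

def make_distributions(root):
    """Lay out dist_a/mortar and dist_b/mortar/rdb as two installs would."""
    a = os.path.join(root, "dist_a", "mortar")
    b = os.path.join(root, "dist_b", "mortar", "rdb")
    os.makedirs(a)
    os.makedirs(b)
    # dist_a carries the real package code plus the namespace boilerplate...
    with open(os.path.join(a, "__init__.py"), "w") as f:
        f.write(NS_INIT + "def content():\n    return 'content'\n")
    # ...dist_b ships only the subpackage, plus an effectively empty stub
    # whose own code would be shadowed by dist_a's __init__.py anyway.
    with open(os.path.join(root, "dist_b", "mortar", "__init__.py"), "w") as f:
        f.write(NS_INIT)
    with open(os.path.join(b, "__init__.py"), "w") as f:
        f.write("def something():\n    return 'something'\n")
    return [os.path.join(root, d) for d in ("dist_a", "dist_b")]

sys.path[:0] = make_distributions(tempfile.mkdtemp())
from mortar import content          # found in dist_a
from mortar.rdb import something    # found in dist_b via extend_path
```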
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Mon Apr 6 15:01:09 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 06 Apr 2009 14:01:09 +0100 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: <1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com> References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> <49D52C5B.7010506@simplistix.co.uk> <49D63465.80401@simplistix.co.uk> <1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com> Message-ID: <49D9FD15.9030406@simplistix.co.uk> Benjamin Peterson wrote: >>>> Assuming it breaks no tests, would there be objection to me committing >>>> the >>>> above change to the Python 3 trunk? >>> That's up to Benjamin. Personally, I live by "if it ain't broke, don't >>> fix it." :-) >> Anything using an exec is broken by definition ;-) > > "practicality beats purity" > >> Benjamin? 
> > +0 OK, well, I'll use it as my first "test commit" when I get a chance :-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
From aahz at pythoncraft.com Mon Apr 6 15:04:46 2009 From: aahz at pythoncraft.com (Aahz) Date: Mon, 6 Apr 2009 06:04:46 -0700 Subject: [Python-Dev] FWD: Documentation site problems Message-ID: <20090406130446.GB19296@panix.com> The 3.0 docs seem to be correct: http://docs.python.org/3.0/tutorial/ ----- Forwarded message from Ernst Persson ----- > Subject: Documentation site problems > From: Ernst Persson > To: webmaster at python.org > Organization: StickyBit AB > Date: Mon, 06 Apr 2009 10:32:42 +0200 > > Hi, > > the contents are missing from the python tutorial: > http://docs.python.org/tutorial/ > > BR > /Ernst Persson ----- End forwarded message ----- -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz
From aahz at pythoncraft.com Mon Apr 6 15:06:18 2009 From: aahz at pythoncraft.com (Aahz) Date: Mon, 6 Apr 2009 06:06:18 -0700 Subject: [Python-Dev] FWD: Library Reference is incomplete Message-ID: <20090406130618.GD19296@panix.com> Hrm, looks like the whole 2.6 build is broken. ----- Forwarded message from "Müller-Reineke, Matthias" ----- > Subject: Library Reference is incomplete > Date: Mon, 6 Apr 2009 11:25:54 +0200 > From: "Müller-Reineke, Matthias" > To: webmaster at python.org > > Dear Webmaster, > > "Library Reference" on http://www.python.org/doc/ takes me to http://docs.python.org/library/ . > That site doesn't contain the index of contents.
> > Matthias Müller-Reineke
> > ------------------------------------------
> Grundeigentümer-Versicherung VVaG
> Große Bäckerstraße 7
> 20095 Hamburg
> Tel: 040 - 3 76 63 - 199
> Fax: 040 - 3 76 63 - 98 199
> > http://www.grundvers.de
> > > Registered office: Hamburg HRB 13 103
> Management board: Heinz Walter Berens (chairman), Rüdiger Buyten
> Supervisory board chairman: Peter Landmann
----- End forwarded message ----- -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz
From barry at python.org Mon Apr 6 15:07:21 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 6 Apr 2009 09:07:21 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D9EDCB.7010905@voidspace.org.uk> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8F88B.3050102@v.loewis.de> <49D8FBCC.1050801@ochtman.nl> <49D9BB90.8040008@freehackers.org> <49D9C7FF.80506@freehackers.org> <49D9EBB6.1080004@gmail.com> <49D9EDCB.7010905@voidspace.org.uk> Message-ID: <462BFB67-C648-42DB-91BC-E9610DABC8D4@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 6, 2009, at 7:55 AM, Michael Foord wrote: > Gated checkins can work fine but can also have many problems. For > example if we have a spuriously failing test then if you are working > on an unrelated issue it will be entirely up to chance as to whether > you can checkin... > > Building the docs would be another thing we could check, although it > can take a while. > > If we have a queue then it could be the case that you do a commit - > and then discover half an hour later that it conflicts with > something that was ahead of you in the queue. All very true. Where I've worked with gated branches, there are procedures for dealing with each of these issues.
For a test suite like Python's which runs in a few minutes, I don't think some of the more extreme approaches are necessary (as opposed to a system where a full test run takes *hours*). On the whole though, it's a net win because you know the main tree is always good. This is especially useful around release time! But I guess it's up to Benjamin now to push for that :). Barry
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin)
iQCVAwUBSdn+iXEjvBPtnXfVAQIEKAP/b3RcUIxcxOpTGfk8POAj+oQXvcvIpI+H
6sN2CWss7bt9qLVlJMFCJoEH78JKnydHuGy+JmZf2rMtnfwIr0w7EFSMoT8X7tPg
YflsHn3ePrBddqD9EOwXo+hQfgodSKHEyPHDPgYSMUtiR4TTqkVXD/o4ViQk4K1b
YFtRkehHKfc= =F39k
-----END PGP SIGNATURE-----
From ben+python at benfinney.id.au Mon Apr 6 15:15:09 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 06 Apr 2009 23:15:09 +1000 Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> <20090406042016.GA97@panix.com> <49D9E92E.4010603@gmail.com> Message-ID: <87ab6tappe.fsf@benfinney.id.au> Nick Coghlan writes: > My guess was that Bazaar anchored the "centralised" end of the DVCS > scale by letting users avoid caring about the underlying acyclic > graph [...] > That makes Bazaar easy to pitch conceptually to someone like me > ("you can use it just like you use SVN, only with much better > merging and offline support") [...] > Mercurial appears to best allow the sales pitch to be tailored to > the target audience (in this case, a group including a lot of people > with a background predominantly involving centralised version > control tools). I don't follow. Wouldn't your preceding points above instead make *Bazaar* the one best suited for a group including a lot of people with a background predominantly involving centralised version control tools? -- \ "I disapprove of what you say, but I will defend to the death | `\ your right to say it."
--Evelyn Beatrice Hall, _The Friends of | _o__) Voltaire_, 1906 | Ben Finney
From jnoller at gmail.com Mon Apr 6 15:21:06 2009 From: jnoller at gmail.com (Jesse Noller) Date: Mon, 6 Apr 2009 09:21:06 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D52115.6020001@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> Message-ID: <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: > On 2009-04-02 17:32, Martin v. Löwis wrote: >> I propose the following PEP for inclusion to Python 3.1. > > Thanks for picking this up. > > I'd like to extend the proposal to Python 2.7 and later. > -1 to adding it to the 2.x series. There was much discussion around adding features to 2.x *and* 3.0, and the consensus seemed to *not* add new features to 2.x and use those new features as carrots to help lead people into 3.0. jesse
From barry at python.org Mon Apr 6 15:26:24 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 6 Apr 2009 09:26:24 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote: > On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >> On 2009-04-02 17:32, Martin v. Löwis wrote: >>> I propose the following PEP for inclusion to Python 3.1. >> >> Thanks for picking this up. >> >> I'd like to extend the proposal to Python 2.7 and later. >> > > -1 to adding it to the 2.x series. There was much discussion around > adding features to 2.x *and* 3.0, and the consensus seemed to *not* > add new features to 2.x and use those new features as carrots to help > lead people into 3.0.
Actually, isn't the policy just that nothing can go into 2.7 that isn't backported from 3.1? Whether the actual backport happens or not is up to the developer though. OTOH, we talked about a lot of things and my recollection is probably fuzzy. Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSdoDAXEjvBPtnXfVAQIrPgQAse7BXQfPYHJJ/g3HNEtc0UmZZ9MCNtGc sIoZ2EHRVz+pylZT9fmSmorJdIdFvAj7E43tKsV2bQpo/am9XlL10SMn3k0KLxnF vNCi39nB1B7Uktbnrlpnfo4u93suuEqYexEwrkDhJuTMeye0Cxg0os5aysryuPza mKr5jsqkV5c= =Y9iP -----END PGP SIGNATURE----- From barry at python.org Mon Apr 6 15:34:09 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 6 Apr 2009 09:34:09 -0400 Subject: [Python-Dev] Tools In-Reply-To: <49D9409C.6060108@ubuntu.com> References: <6AD085E2-AC98-484D-B5FB-E6A3671C75FB@python.org> <49D9409C.6060108@ubuntu.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 5, 2009, at 7:37 PM, Matthias Klose wrote: > Barry Warsaw schrieb: >> Someone (I'm sorry, I forgot who) asked me at Pycon about stripping >> out >> Demos and Tools. I'm happy to remove the two I wrote - Tools/world >> and >> Tools/pynche - from the distribution and release them as separate >> projects (retaining the PSF license). Should I remove them from >> both >> the Python 2.x and 3.x trunks? > > +1, but please for 2.7 and 3.1 only. Yes, of course. 
Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSdoE0XEjvBPtnXfVAQIyFgP+MqBghtSqVigJF9w/u47npaheOusITPWT iUeeJfTFDDHBKyYKXOwpASW+SahtnTO3OTR3f40S0Ptf+HRGo0J2efWUWcbXkN5X ikrHePT8YIp0MC4qYcUAfNrSNtgYxJuVKd7ARCFotBSN3Nu+bxzPO+LGw5xhlvbT Q3H3f3TQM3A= =nCUB -----END PGP SIGNATURE----- From cesare.dimauro at a-tono.com Mon Apr 6 16:28:45 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Mon, 06 Apr 2009 16:28:45 +0200 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: On Mar 29, 2009 at 05:36PM, Guido van Rossum wrote: >> - Issue #5593: code like 1e16+2.9999 is optimized away and its result stored as >> a constant (again), but the result can vary slightly depending on the internal >> FPU precision. > > I would just not bother constant folding involving FP, or only if the > values involved have an exact representation in IEEE binary FP format. The Language Reference says nothing about the effects of code optimizations. I think it's a very good thing, because we can do some work here with constant folding. If someone wants to preserve precision with floats, it can always use a temporary variable, like in many other languages. >> These problems have probably been there for a long time and almost no one seems >> to complain, but I thought I'd report them here just in case. > > I would expect that constant folding isn't nearly effective in Python > as in other (less dynamic) languages because it doesn't do anything > for NAMED constants. E.g. > > MINUTE = 60 > > def half_hour(): > return MINUTE*30 > > This should be folded to "return 1800" but doesn't because the > compiler doesn't know that MINUTE is a constant. I completely agree. We can't say nothing about MINUTE at the time half_hour will be executed. The code here must never been changed. > Has anyone ever profiled the effectiveness of constant folding on > real-world code? 
The only kind of constant folding that I expect to be > making a difference is things like unary operators, since e.g. "x = -2" > is technically an expression involving a unary minus. At this time with Python 2.6.1 we have these results:

def f(): return 1 + 2 * 3 + 4j

dis(f)
  1           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               5 (6)
              6 BINARY_ADD
              7 LOAD_CONST               4 (4j)
             10 BINARY_ADD
             11 RETURN_VALUE

def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)

dis(f)
  1           0 LOAD_CONST               1 ('a')
              3 LOAD_CONST               7 (('b', 'c'))
              6 BUILD_LIST               2
              9 LOAD_CONST               4 (1)
             12 LOAD_CONST               8 (6)
             15 BINARY_ADD
             16 BINARY_MULTIPLY
             17 RETURN_VALUE

With proper constant folding code, both functions can be reduced to a single LOAD_CONST and a RETURN_VALUE (or even to a single instruction with an advanced peephole optimizer). I'll show it at PyCon in Florence, next month. > ISTM that historically, almost every time we attempted some new form > of constant folding, we introduced a bug. I found a very rich test battery with Python, which helped me a lot in my work of changing the ast, compiler, peephole, and VM. If they aren't enough, we can expand them to add more test cases. But, again, the Language Reference says nothing about optimizations. Cheers, Cesare
From eric at trueblade.com Mon Apr 6 16:40:56 2009 From: eric at trueblade.com (Eric Smith) Date: Mon, 6 Apr 2009 10:40:56 -0400 (EDT) Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> Message-ID: <39936.63.251.87.214.1239028856.squirrel@mail.trueblade.com> > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote: > >> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >>> On 2009-04-02 17:32, Martin v. Löwis wrote: >>>> I propose the following PEP for inclusion to Python 3.1. >>> >>> Thanks for picking this up. >>> >>> I'd like to extend the proposal to Python 2.7 and later.
>>> >> >> -1 to adding it to the 2.x series. There was much discussion around >> adding features to 2.x *and* 3.0, and the consensus seemed to *not* >> add new features to 2.x and use those new features as carrots to help >> lead people into 3.0. > > Actually, isn't the policy just that nothing can go into 2.7 that > isn't backported from 3.1? Whether the actual backport happens or not > is up to the developer though. OTOH, we talked about a lot of things > and my recollection is probably fuzzy. I believe Barry is correct. The official policy is "no features in 2.7 that aren't also in 3.1". I personally think I'm not going to put anything else in 2.7, specifically the ',' formatter stuff from PEP 378. 3.1 has diverged too far from 2.7 in this regard to make the backport easy to do. But this decision is left up to the individual committer. From solipsis at pitrou.net Mon Apr 6 16:43:11 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 6 Apr 2009 14:43:11 +0000 (UTC) Subject: [Python-Dev] pyc files, constant folding and borderline portability issues References: Message-ID: Cesare Di Mauro a-tono.com> writes: > def f(): return ['a', ('b', 'c')] * (1 + 2 * 3) [...] > > With proper constant folding code, both functions can be reduced > to a single LOAD_CONST and a RETURN_VALUE (or, definitely, by > a single instruction at all with an advanced peephole optimizer). Lists are mutable, you can't optimize the creation of list literals by storing them as singleton constants. Regards Antoine. From pje at telecommunity.com Mon Apr 6 17:21:42 2009 From: pje at telecommunity.com (P.J. 
Eby) Date: Mon, 06 Apr 2009 11:21:42 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D9FCE2.4070805@simplistix.co.uk> References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk> <49D669AA.6080001@v.loewis.de> <49D9FCE2.4070805@simplistix.co.uk> Message-ID: <20090406151915.5D4F93A406A@sparrow.telecommunity.com> At 02:00 PM 4/6/2009 +0100, Chris Withers wrote: >Martin v. Löwis wrote: >>Chris Withers wrote: >>>Would this support the following case: >>> >>>I have a package called mortar, which defines useful stuff: >>> >>>from mortar import content, ... >>> >>>I now want to distribute large optional chunks separately, but ideally >>>so that the following will work: >>> >>>from mortar.rbd import ... >>>from mortar.zodb import ... >>>from mortar.wsgi import ... >>> >>>Does the PEP support this? >>That's the primary purpose of the PEP. > >Are you sure? > >Does the pep really allow for: > >from mortar import content >from mortar.rdb import something > >...where 'content' is a function defined in mortar/__init__.py and >'something' is a function defined in mortar/rdb/__init__.py *and* >the following are separate distributions on PyPI: > >- mortar >- mortar.rdb > >...where 'mortar' does not contain 'mortar.rdb'. See the third paragraph of http://www.python.org/dev/peps/pep-0382/#discussion
From chris at simplistix.co.uk Mon Apr 6 17:57:59 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 06 Apr 2009 16:57:59 +0100 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090406151915.5D4F93A406A@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D51A16.70804@simplistix.co.uk> <49D669AA.6080001@v.loewis.de> <49D9FCE2.4070805@simplistix.co.uk> <20090406151915.5D4F93A406A@sparrow.telecommunity.com> Message-ID: <49DA2687.6050508@simplistix.co.uk> P.J.
Eby wrote: > See the third paragraph of > http://www.python.org/dev/peps/pep-0382/#discussion Indeed, I guess the PEP could be made more explanatory then, 'cos as a packager, I don't see what I'd put in the various setup.py and __init__.py to make this work... That said, I'm delighted to hear it's going to be possible and wholeheartedly support the PEP and its backporting to 2.7 as a result... cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From jnoller at gmail.com Mon Apr 6 18:00:46 2009 From: jnoller at gmail.com (Jesse Noller) Date: Mon, 6 Apr 2009 12:00:46 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> Message-ID: <4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com> On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote: > >> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >>> >>> On 2009-04-02 17:32, Martin v. Löwis wrote: >>>> >>>> I propose the following PEP for inclusion to Python 3.1. >>> >>> Thanks for picking this up. >>> >>> I'd like to extend the proposal to Python 2.7 and later. >>> >> >> -1 to adding it to the 2.x series. There was much discussion around >> adding features to 2.x *and* 3.0, and the consensus seemed to *not* >> add new features to 2.x and use those new features as carrots to help >> lead people into 3.0. > > Actually, isn't the policy just that nothing can go into 2.7 that isn't > backported from 3.1? Whether the actual backport happens or not is up to > the developer though. OTOH, we talked about a lot of things and my > recollection is probably fuzzy.
> > Barry That *is* the official policy, but there was discussions around no further backporting of features from 3.1 into 2.x, therefore providing more of an upgrade incentive From tseaver at palladion.com Mon Apr 6 18:15:43 2009 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 06 Apr 2009 12:15:43 -0400 Subject: [Python-Dev] deprecating BaseException.message In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Brett Cannon wrote: > During the PyCon sprint I tried to make BaseException accept only a single > argument and bind it to BaseException.message . I was successful (see the > p3yk_no_args_on_exc branch), but it was very painful to pull off as anyone > who sat around me the last three days of the sprint will tell you as they > had to listen to me curse incessantly. > > Because of the pain that I went through in the transition and thus the > lessons learned, Guido and I discussed it and we think it would be best to > give up on forcing BaseException to accept only a single argument. I think > it is still doable, but requires a multi-release transition period and not > the one that 2.6 -> 3.0 is offering. And so Guido and I plan on deprecating > BaseException.message as its entire point in existence was to help > transition to what we are not going to have happen. =) > > Now that means BaseException.message might hold the record for shortest > lived feature as it was only introduced in 2.5 and is now to be deprecated > in 2.6 and removed in 2.7/3.0. =) > > Below is PEP 352, revised to reflect the removal of > BaseException.messageand for letting the 'args' attribute stay (along > with suggesting one should > only pass a single argument to BaseException). Basically the interface for > exceptions doesn't really change in 3.0 except for the removal of > __getitem__. Hmm, I'm working on cleaning up deprecations for Zope and related packages under Python 2.6. 
The irony here is that I'm receiving deprecation warnings for custom exception classes which had a 'message' attribute long before the abortive attempt to add them to the BaseException type, which hardly seems reasonable. For instance, docutils.parsers.rst defines a DirectiveError which takes two arguments, 'level' and 'message', and therefore gets hit with the deprecation (even though it never used the new signature). Likewise, ZODB.POSException defines a ConflictError type which takes 'message' as one of several arguments, all optional, and has since at least 2002. I don't think either of these classes should be subject to a deprecation warning for a feature they never used or depended on. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJ2iqv+gerLs4ltQ4RArg7AJ9cjTweXUuGdUZNxZ3dHzYb9u6AcQCePJW/ PrXQ48wFrwrsrXSslZ0LSB4= =VU1d -----END PGP SIGNATURE----- From rdmurray at bitdance.com Mon Apr 6 18:28:43 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 6 Apr 2009 12:28:43 -0400 (EDT) Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com> Message-ID: On Mon, 6 Apr 2009 at 12:00, Jesse Noller wrote: > On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote: >> >>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >>>> >>>> On 2009-04-02 17:32, Martin v. 
L?wis wrote: >>>>> >>>>> I propose the following PEP for inclusion to Python 3.1. >>>> >>>> Thanks for picking this up. >>>> >>>> I'd like to extend the proposal to Python 2.7 and later. >>>> >>> >>> -1 to adding it to the 2.x series. There was much discussion around >>> adding features to 2.x *and* 3.0, and the consensus seemed to *not* >>> add new features to 2.x and use those new features as carrots to help >>> lead people into 3.0. >> >> Actually, isn't the policy just that nothing can go into 2.7 that isn't >> backported from 3.1? ?Whether the actual backport happens or not is up to >> the developer though. ?OTOH, we talked about a lot of things and my >> recollection is probably fuzzy. >> >> Barry > > That *is* the official policy, but there was discussions around no > further backporting of features from 3.1 into 2.x, therefore providing > more of an upgrade incentive My sense was that this wasn't proposed as a hard and fast rule, more as a strongly suggested guideline. And in this case, I think you could argue that the PEP is actually fixing a bug in the current namespace packaging system. Some projects, especially the large ones where this matters most, are going to have to maintain backward compatibility for 2.x for a long time even as 3.x adoption accelerates. It seems a shame to require packagers to continue to deal with the problems caused by the current system even after all the platforms have made it to 2.7+. --David From jnoller at gmail.com Mon Apr 6 18:33:54 2009 From: jnoller at gmail.com (Jesse Noller) Date: Mon, 6 Apr 2009 12:33:54 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com> Message-ID: <4222a8490904060933y540fd611lc2b9c554eb079c5e@mail.gmail.com> On Mon, Apr 6, 2009 at 12:28 PM, R. 
David Murray wrote: > On Mon, 6 Apr 2009 at 12:00, Jesse Noller wrote: >> >> On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw wrote: >>> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote: >>> >>>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >>>>> >>>>> On 2009-04-02 17:32, Martin v. L?wis wrote: >>>>>> >>>>>> I propose the following PEP for inclusion to Python 3.1. >>>>> >>>>> Thanks for picking this up. >>>>> >>>>> I'd like to extend the proposal to Python 2.7 and later. >>>>> >>>> >>>> -1 to adding it to the 2.x series. There was much discussion around >>>> adding features to 2.x *and* 3.0, and the consensus seemed to *not* >>>> add new features to 2.x and use those new features as carrots to help >>>> lead people into 3.0. >>> >>> Actually, isn't the policy just that nothing can go into 2.7 that isn't >>> backported from 3.1? ?Whether the actual backport happens or not is up to >>> the developer though. ?OTOH, we talked about a lot of things and my >>> recollection is probably fuzzy. >>> >>> Barry >> >> That *is* the official policy, but there was discussions around no >> further backporting of features from 3.1 into 2.x, therefore providing >> more of an upgrade incentive > > My sense was that this wasn't proposed as a hard and fast rule, more > as a strongly suggested guideline. > > And in this case, I think you could argue that the PEP is actually > fixing a bug in the current namespace packaging system. > > Some projects, especially the large ones where this matters most, are > going to have to maintain backward compatibility for 2.x for a long time > even as 3.x adoption accelerates. ?It seems a shame to require packagers > to continue to deal with the problems caused by the current system even > after all the platforms have made it to 2.7+. 
> > --David I know it wasn't a hard and fast rule; also, with 3to2 already being worked on, the barrier of maintenance and back porting is going to be lowered. From skip at pobox.com Mon Apr 6 18:57:44 2009 From: skip at pobox.com (skip at pobox.com) Date: Mon, 6 Apr 2009 11:57:44 -0500 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: <18906.13448.974602.214940@montanaro.dyndns.org> Cesare> At this time with Python 2.6.1 we have these results: Cesare> def f(): return 1 + 2 * 3 + 4j ... Cesare> def f(): return ['a', ('b', 'c')] * (1 + 2 * 3) Guido can certainly correct me if I'm wrong, but I believe the main point of his message was that you aren't going to encounter a lot of code in Python which is amenable to traditional constant folding. For the most part, they will be assigned to symbolic "constants", which, unlike C preprocessor macros aren't really constants at all. Consequently, the opportunity for constant folding is minimal and probably introduces more opportunities for bugs than performance improvements. Skip From cesare.dimauro at a-tono.com Mon Apr 6 18:34:53 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Mon, 6 Apr 2009 18:34:53 +0200 (CEST) Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: <58342.151.53.159.5.1239035693.squirrel@webmail6.pair.com> On Lun, Apr 6, 2009 16:43, Antoine Pitrou wrote: > Cesare Di Mauro a-tono.com> writes: >> def f(): return ['a', ('b', 'c')] * (1 + 2 * 3) > [...] >> >> With proper constant folding code, both functions can be reduced >> to a single LOAD_CONST and a RETURN_VALUE (or, definitely, by >> a single instruction at all with an advanced peephole optimizer). > > Lists are mutable, you can't optimize the creation of list literals by > storing > them as singleton constants. > > Regards > > Antoine. You are right, I've mistyped the example. 
def f(): return ('a', ('b', 'c')) * (1 + 2 * 3) generates a single instruction (depending on the threshold used to limit folding of sequences), whereas def f(): return ['a', ('b', 'c')] * (1 + 2 * 3) needs three. Sorry for the mistake. Cheers, Cesare From tseaver at palladion.com Mon Apr 6 19:06:25 2009 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 06 Apr 2009 13:06:25 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <4222a8490904060933y540fd611lc2b9c554eb079c5e@mail.gmail.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <4222a8490904060900s349180a5k952b32b35274df73@mail.gmail.com> <4222a8490904060933y540fd611lc2b9c554eb079c5e@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jesse Noller wrote: > On Mon, Apr 6, 2009 at 12:28 PM, R. David Murray wrote: >> On Mon, 6 Apr 2009 at 12:00, Jesse Noller wrote: >>> On Mon, Apr 6, 2009 at 9:26 AM, Barry Warsaw wrote: >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> On Apr 6, 2009, at 9:21 AM, Jesse Noller wrote: >>>> >>>>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >>>>>> On 2009-04-02 17:32, Martin v. L?wis wrote: >>>>>>> I propose the following PEP for inclusion to Python 3.1. >>>>>> Thanks for picking this up. >>>>>> >>>>>> I'd like to extend the proposal to Python 2.7 and later. >>>>>> >>>>> -1 to adding it to the 2.x series. There was much discussion around >>>>> adding features to 2.x *and* 3.0, and the consensus seemed to *not* >>>>> add new features to 2.x and use those new features as carrots to help >>>>> lead people into 3.0. >>>> Actually, isn't the policy just that nothing can go into 2.7 that isn't >>>> backported from 3.1? Whether the actual backport happens or not is up to >>>> the developer though. OTOH, we talked about a lot of things and my >>>> recollection is probably fuzzy. 
>>>> >>>> Barry >>> That *is* the official policy, but there was discussions around no >>> further backporting of features from 3.1 into 2.x, therefore providing >>> more of an upgrade incentive >> My sense was that this wasn't proposed as a hard and fast rule, more >> as a strongly suggested guideline. >> >> And in this case, I think you could argue that the PEP is actually >> fixing a bug in the current namespace packaging system. >> >> Some projects, especially the large ones where this matters most, are >> going to have to maintain backward compatibility for 2.x for a long time >> even as 3.x adoption accelerates. It seems a shame to require packagers >> to continue to deal with the problems caused by the current system even >> after all the platforms have made it to 2.7+. >> >> --David > > I know it wasn't a hard and fast rule; also, with 3to2 already being > worked on, the barrier of maintenance and back porting is going to be > lowered. My understanding from the summit is that the only point in a 2.7 release at all is to lower the "speed bumps" which make porting from 2.x to 3.x hard for large codebases. In this case, having a consistent spelling for namespace packages between 2.7 and 3.1 would incent those applications / frameworks / libraries to move to 2.7, and therefore ease getting them to 3.1. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJ2jaR+gerLs4ltQ4RAsi1AJ0cJyKsoP5SlOcBlnzLr6MB11ZoNwCg1Kil 4O2M0sZG+jH12s22p2AmXWk= =DLRM -----END PGP SIGNATURE----- From brian at sweetapp.com Mon Apr 6 20:13:28 2009 From: brian at sweetapp.com (Brian Quinlan) Date: Mon, 06 Apr 2009 19:13:28 +0100 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: <49D9E3E0.2060408@gmail.com> References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> <49D9A669.9010008@sweetapp.com> <49D9E3E0.2060408@gmail.com> Message-ID: <49DA4648.9070204@sweetapp.com> Nick Coghlan wrote: > Brian Quinlan wrote: >> - you need the cooperation of your subclasses i.e. they must call >> super().flush() in .flush() to get correct close behavior (and this >> represents a backwards-incompatible semantic change) > > Are you sure about that? Going by the current _pyio semantics that > Antoine posted, it looks to me that it is already the case that > subclasses need to invoke the parent flush() call correctly to avoid > breaking the base class semantics (which really isn't an uncommon > problem when it comes to writing correct subclasses). As it is now, if you didn't call super().flush() in your flush override, then a buffer won't be flushed at the time that you expected. With the proposed change, if you don't call super().flush() in your flush override, then the buffer will never get flushed and you will lose data when you close the file. I'm not saying that it is a big deal, but it is a difference. 
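The cooperation contract is easy to see with a toy model (invented classes for illustration, not the real io types):

```python
class BufferedBase:
    """Toy stand-in for a buffered writer (not the real io classes)."""
    def __init__(self):
        self._buffer = []
        self._sink = []          # stands in for the raw stream

    def write(self, data):
        self._buffer.append(data)

    def flush(self):
        self._sink.extend(self._buffer)
        del self._buffer[:]

    def close(self):
        # Proposed semantics: close() drains the buffer by calling
        # whatever flush() the instance actually has.
        self.flush()


class Forgetful(BufferedBase):
    def flush(self):             # override without super().flush():
        pass                     # buffered data is silently lost on close


class Cooperative(BufferedBase):
    def flush(self):
        super().flush()          # delegates, so close() still drains


f, c = Forgetful(), Cooperative()
f.write("data"); f.close()
c.write("data"); c.close()
print(f._sink, c._sink)  # [] ['data']
```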
Cheers, Brian From cesare.dimauro at a-tono.com Mon Apr 6 21:23:18 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Mon, 6 Apr 2009 21:23:18 +0200 (CEST) Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <18906.13448.974602.214940@montanaro.dyndns.org> References: <18906.13448.974602.214940@montanaro.dyndns.org> Message-ID: <52217.151.53.159.5.1239045798.squirrel@webmail6.pair.com> On Mon, Apr 6, 2009 18:57, skip at pobox.com wrote: > > Cesare> At this time with Python 2.6.1 we have these results: > Cesare> def f(): return 1 + 2 * 3 + 4j > ... > Cesare> def f(): return ['a', ('b', 'c')] * (1 + 2 * 3) > > Guido can certainly correct me if I'm wrong, but I believe the main point > of > his message was that you aren't going to encounter a lot of code in Python > which is amenable to traditional constant folding. For the most part, > they > will be assigned to symbolic "constants", which, unlike C preprocessor > macros aren't really constants at all. Consequently, the opportunity for > constant folding is minimal and probably introduces more opportunities for > bugs than performance improvements. > > Skip I can understand Guido's concern, but you worked on constant folding as well, and you know that there's room for optimization here. peephole.c has some code for unary, binary, and tuple/list folding; it has worked fine. Why maintain useless and dangerous code otherwise? I know that bugs can come out of such optimizations, but Python has a good test battery that can help find them. Obviously tests can't give us 100% assurance that everything works as expected, but they are a very good starting point. Bugs can happen with every change to the code base, but the code base changes...
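For reference, the folding being discussed is easy to observe from the interpreter itself. This sketch uses the Python 3 spellings (__code__, dis.get_instructions); the exact bytecode varies by CPython version:

```python
import dis

def f():
    return 1 + 2 * 3        # constant arithmetic: folded at compile time

def g():
    return ['a', ('b', 'c')] * (1 + 2 * 3)   # list literal: built at run time

# The peepholer stored the folded result 7 as a constant of f...
print(7 in f.__code__.co_consts)  # True
# ...but the mutable list in g is still constructed at run time.
print("BUILD_LIST" in {i.opname for i in dis.get_instructions(g)})  # True
```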
Cesare From dickinsm at gmail.com Mon Apr 6 21:30:57 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 6 Apr 2009 20:30:57 +0100 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: <5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com> [Antoine] > - Issue #5593: code like 1e16+2.9999 is optimized away and its result stored as > a constant (again), but the result can vary slightly depending on the internal > FPU precision. [Guido] > I would just not bother constant folding involving FP, or only if the > values involved have an exact representation in IEEE binary FP format. +1 for removing constant folding for floats (besides conversion of -). There are just too many things to worry about: FPU rounding mode and precision, floating-point signals and flags, effect of compiler flags, and the potential benefit seems small. Mark From python at rcn.com Mon Apr 6 22:05:37 2009 From: python at rcn.com (Raymond Hettinger) Date: Mon, 6 Apr 2009 13:05:37 -0700 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues References: <5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com> Message-ID: > +1 for removing constant folding for floats (besides conversion > of -). There are just too many things to worry about: > FPU rounding mode and precision, floating-point signals and flags, > effect of compiler flags, and the potential benefit seems small. If you're talking about the existing peepholer optimization that has been in-place for years, I think it would be better to leave it as-is. It's better to have the compiler do the work than to have a programmer thinking he/she needs to do it by hand (reducing readability by introducing magic numbers). 
The code for the lsum() recipe is more readable with a line like: exp = long(mant * 2.0 ** 53) than with exp = long(mant * 9007199254740992.0) It would be a shame if code written like the former suddenly started doing the exponentiation in the inner loop or if the code got rewritten by hand as shown. The list of "things to worry about" seems like the normal list of issues associated with doing anything in floating point. Python is already FPU challenged in that it offers nearly zero control over the FPU or direct access to signals and flags. Every step of a floating point calculation in Python gets written out to a PyFloat object and is squeezed back into a C double (potentially introducing double-rounding if extended precision had been used by the FPU). Disabling the peepholer doesn't change this situation. Raymond From ondrej at certik.cz Mon Apr 6 22:06:06 2009 From: ondrej at certik.cz (Ondrej Certik) Date: Mon, 6 Apr 2009 13:06:06 -0700 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> Message-ID: <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> Hi, On Sun, Mar 29, 2009 at 10:21 AM, Jeffrey Yasskin wrote: > I've heard some good things about cmake -- LLVM, googletest, and Boost > are all looking at switching to it -- so I wanted to see if we could > simplify our autoconf+makefile system by using it. The biggest wins I > see from going to cmake are: > 1. It can autogenerate the Visual Studio project files instead of > needing them to be maintained separately > 2. It lets you write functions and modules without understanding > autoconf's mix of shell and M4. > 3. Its generated Makefiles track header dependencies accurately so we > might be able to add private headers efficiently.
I am switching to cmake with all my Python projects, as it is rock solid, supports building in parallel (if I have some C++ and Cython extensions), and the configure part works well. The only disadvantage that I can see is that one has to learn a new syntax, which is not Python. But on the other hand, at least it forces one to really just use cmake to write build scripts in a standard way, while scons and other Python solutions imho encourage one to write full Python programs, which imho is a disadvantage for the build system, as then every build system is nonstandard. Ondrej From dickinsm at gmail.com Mon Apr 6 22:22:28 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 6 Apr 2009 21:22:28 +0100 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: <5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com> Message-ID: <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com> On Mon, Apr 6, 2009 at 9:05 PM, Raymond Hettinger wrote: > The code for the lsum() recipe is more readable with a line like: > > exp = long(mant * 2.0 ** 53) > > than with > > exp = long(mant * 9007199254740992.0) > > It would be a shame if code written like the former suddenly > started doing the exponentiation in the inner loop or if the code > got rewritten by hand as shown. Well, I'd say that the obvious solution here is to compute the constant 2.0**53 just once, somewhere outside the inner loop. In any case, that value would probably be better written as 2.0**DBL_MANT_DIG (or something similar). As Antoine reported, the constant-folding caused quite a confusing bug report (issue #5593): the problem (when we eventually tracked it down) was that the folded constant was in a .pyc file, and so wasn't updated when the compiler flags changed.
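In today's Python, the DBL_MANT_DIG suggestion can be spelled with sys.float_info; a minimal sketch (the thread's long() is Python 2, so int is used here):

```python
import sys

# Hoisted out of any inner loop, and written in terms of the float
# format instead of a magic literal; mant_dig is 53 for IEEE 754 doubles.
SCALE = 2.0 ** sys.float_info.mant_dig

def scaled(mant):
    # Scale a mantissa in [0.5, 1.0) up to an integer.
    return int(mant * SCALE)

print(SCALE)        # 9007199254740992.0
print(scaled(0.5))  # 4503599627370496
```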
Mark From jackdied at gmail.com Mon Apr 6 22:32:05 2009 From: jackdied at gmail.com (Jack diederich) Date: Mon, 6 Apr 2009 16:32:05 -0400 Subject: [Python-Dev] Getting information out of the buildbots Message-ID: I committed some new telnetlib tests yesterday to the trunk and I can see they are failing on Neal's setup but not what the failures are. Ideally I'd like to get the information out of the buildbots, but they all seem to be hanging on stdio tests and quitting out. Ideas? TIA, -Jack From solipsis at pitrou.net Mon Apr 6 22:35:36 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 6 Apr 2009 20:35:36 +0000 (UTC) Subject: [Python-Dev] Getting information out of the buildbots References: Message-ID: Jack diederich <jackdied at gmail.com> writes: > > I committed some new telnetlib tests yesterday to the trunk and I can > see they are failing on Neal's setup but not what the failures are. > Ideally I'd like to get the information out of the buildbots but they > all seem to be hanging on stdio tests and quitting out. You can commit some temporary debug output in the tests (just sprinkle those print()'s you need to get your tasty information). Regards Antoine. From guido at python.org Mon Apr 6 23:27:29 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Apr 2009 14:27:29 -0700 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: On Mon, Apr 6, 2009 at 7:28 AM, Cesare Di Mauro wrote: > The Language Reference says nothing about the effects of code optimizations. > I think it's a very good thing, because we can do some work here with constant > folding. Unfortunately the language reference is not the only thing we have to worry about. Unlike languages like C++, where compiler writers have the moral right to modify the compiler as long as they stay within the weasel-words of the standard, in Python, users' expectations carry value.
Since the language is inherently not that fast, users are not all that focused on performance (if they were, they wouldn't be using Python). Unsurprising behavior OTOH is valued tremendously. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Apr 6 23:28:41 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Apr 2009 14:28:41 -0700 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com> References: <5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com> <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com> Message-ID: On Mon, Apr 6, 2009 at 1:22 PM, Mark Dickinson wrote: > On Mon, Apr 6, 2009 at 9:05 PM, Raymond Hettinger wrote: >> The code for the lsum() recipe is more readable with a line like: >> >> exp = long(mant * 2.0 ** 53) >> >> than with >> >> exp = long(mant * 9007199254740992.0) >> >> It would be a shame if code written like the former suddenly >> started doing the exponentiation in the inner loop or if the code >> got rewritten by hand as shown. Do you have any evidence that people write lots of inner loops with constant expressions? In real-world code these just don't exist that much. The case of constant folding in Python is *much* weaker than in C because Python doesn't have real compile-time constants, so named "constants" are variables to the compiler. > Well, I'd say that the obvious solution here is to compute > the constant 2.0**53 just once, somewhere outside the > inner loop. In any case, that value would probably be better > written as 2.0**DBL_MANT_DIG (or something similar). So true. > As Antoine reported, the constant-folding caused quite > a confusing bug report (issue #5593): the problem (when > we eventually tracked it down) was that the folded > constant was in a .pyc file, and so wasn't updated when > the compiler flags changed. Right.
Over the years the peephole optimizer and constant folding have been a constant (though small) source of bugs. I'm not sure that there is much real-world value in it, and it is certainly not right to choose speed over correctness. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Mon Apr 6 23:32:45 2009 From: thomas at python.org (Thomas Wouters) Date: Mon, 6 Apr 2009 23:32:45 +0200 Subject: [Python-Dev] FWD: Library Reference is incomplete In-Reply-To: <20090406130618.GD19296@panix.com> References: <20090406130618.GD19296@panix.com> Message-ID: <9e804ac0904061432r16475d1et9fb5b15494ff9d4@mail.gmail.com> Anyone able to look into this and fix it? Having all of the normal entry points for documentation broken is rather inconvenient for users :-) On Mon, Apr 6, 2009 at 15:06, Aahz wrote: > Hrm, looks like the whole 2.6 build is broken. > > ----- Forwarded message from "Müller-Reineke, Matthias" < matthias.mueller-reineke at grundvers.de> ----- > > > Subject: Library Reference is incomplete > > Date: Mon, 6 Apr 2009 11:25:54 +0200 > > From: "Müller-Reineke, Matthias" > > To: webmaster at python.org > > > > Dear Webmaster, > > > > "Library Reference" on http://www.python.org/doc/ takes me to > http://docs.python.org/library/ . > > That site doesn't contain the index of contents. > > > > Matthias Müller-Reineke > > > > ------------------------------------------ > > Grundeigentümer-Versicherung VVaG > > Große Bäckerstraße 7 > > 20095 Hamburg > > Tel: 040 - 3 76 63 - 199 > > Fax: 040 - 3 76 63 - 98 199 > > > > http://www.grundvers.de > > > > > > Firmensitz: Hamburg HRB 13 103 > > Vorstand: Heinz Walter Berens (Vors.), Rüdiger Buyten > > Aufsichtsratsvorsitzender: Peter Landmann > > ----- End forwarded message ----- > > -- > Aahz (aahz at pythoncraft.com) <*> > http://www.pythoncraft.com/ > > "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that
The fact that > characters are also strings is the reason we have problems, but characters > are strings for other good reasons." --Aahz > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/thomas%40python.org > -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Apr 6 23:44:18 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 07 Apr 2009 07:44:18 +1000 Subject: [Python-Dev] deprecating BaseException.message In-Reply-To: References: Message-ID: <49DA77B2.4020508@gmail.com> Tres Seaver wrote: > I don't think either of these classes should be subject to a deprecation > warning for a feature they never used or depended on. Agreed. Could you raise a tracker issue for the spurious warnings? (I believe we should be able to make the warning condition a bit smarter to eliminate these). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Mon Apr 6 23:51:26 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 07 Apr 2009 07:51:26 +1000 Subject: [Python-Dev] Mercurial? In-Reply-To: <87ab6tappe.fsf@benfinney.id.au> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> <20090406042016.GA97@panix.com> <49D9E92E.4010603@gmail.com> <87ab6tappe.fsf@benfinney.id.au> Message-ID: <49DA795E.3030104@gmail.com> Ben Finney wrote: > Nick Coghlan writes: >> Mercurial appears to best allow the sales pitch to be tailored to >> the target audience (in this case, a group including a lot of people >> with a background predominantly involving centralised version >> control tools). 
> > I don't follow. Wouldn't your preceding points above instead make > *Bazaar* the one best suited for a group including a lot of people > with a background predominantly involving centralised version control > tools? Yes, but the Bazaar advocates appear to have a hard time convincing the other existing DVCS users that it provides *enough* access to the underlying graph. So it then tends to get resisted by the folks that are already fans of git or Mercurial. Like I said though, this is a subjective impression formed by reading what other people have written rather than by actually experiencing any of the tools myself. I'm sure all of them are quite capable of getting the job done :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin at v.loewis.de Tue Apr 7 00:05:05 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Apr 2009 00:05:05 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49D9EB15.8070806@gmail.com> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> Message-ID: <49DA7C91.6010202@v.loewis.de> Nick Coghlan wrote: > Dirkjan Ochtman wrote: >> I have a stab at an author map at http://dirkjan.ochtman.nl/author-map. >> Could use some review, but it seems like a good start. > > Martin may be able to provide a better list of names based on the > checkin name<->SSH public key mapping in the SVN setup. I think the identification in the SSH keys is useless. It contains strings like "loewis at mira" or "ncoghlan at uberwald", or even multiple of them (barry at wooz, barry at resist, ...). It seems that the PEP needs to spell out a policy as to what committer information needs to look like; then we need to verify that the proposed name mapping matches that policy. > (e.g. 
I believe my SVN checkin name is nick.coghlan rather than the > shorter ncoghlan in my email address, and many others are in a similar > boat since first.last was the chosen scheme for names in the SVN switchover) Correct. The objective was to not allow nick names, but have real names as committer names. It appears that this policy does not directly translate into Mercurial. Regards, Martin From rhamph at gmail.com Tue Apr 7 00:05:58 2009 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 6 Apr 2009 16:05:58 -0600 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com> References: <5c6f2a5d0904061230j6ef7fd18q369c19d91c5e34b8@mail.gmail.com> <5c6f2a5d0904061322t2c7f6bd7y55f73ced221c8804@mail.gmail.com> Message-ID: On Mon, Apr 6, 2009 at 2:22 PM, Mark Dickinson wrote: > Well, I'd say that the obvious solution here is to compute > the constant 2.0**53 just once, somewhere outside the > inner loop. ?In any case, that value would probably be better > written as 2.0**DBL_MANT_DIG (or something similar). > > As Antoine reported, the constant-folding caused quite > a confusing bug report (issue #5593): ?the problem (when > we eventually tracked it down) was that the folded > constant was in a .pyc file, and so wasn't updated when > the compiler flags changed. Another way of looking at this is that we have a ./configure option which affects .pyc output. Therefor, we should add a flag to the magic number, causing it to be regenerated as needed. Whether that's better or worse than removing constant folding I haven't decided. I have such low expectations of floating point that I'm not surprised by bugs like this. I'm more surprised that people expect consistent, deterministic results... 
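For reference, the hoisting suggested above — computing the constant once, outside the inner loop — can be sketched as follows. This is only an illustration: `sys.float_info.mant_dig` stands in for C's DBL_MANT_DIG (53 for IEEE-754 doubles), and the function name is made up.

```python
import sys

# Compute 2.0 ** 53 once, at module level, without the magic number:
# sys.float_info.mant_dig is the Python-level analogue of DBL_MANT_DIG.
SCALE = 2.0 ** sys.float_info.mant_dig

def to_scaled_ints(mantissas):
    # The inner loop no longer repeats the exponentiation, so neither
    # constant folding nor .pyc caching of a folded value is involved.
    return [int(m * SCALE) for m in mantissas]
```

With the constant hoisted, the question of whether the compiler folds `2.0 ** 53` into the bytecode (and bakes it into the .pyc) simply never arises.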
-- Adam Olsen, aka Rhamphoryncus From martin at v.loewis.de Tue Apr 7 00:12:26 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Apr 2009 00:12:26 +0200 Subject: [Python-Dev] Getting information out of the buildbots In-Reply-To: References: Message-ID: <49DA7E4A.4020203@v.loewis.de> > You can commit some temporary debug output in the tests (just sprinkle those > print()'s you need to get your tasty information). Also, if you want to do a sequence of changes to test a specific machine, you might want to create a branch, make those changes, and then trigger a build of that branch just on that specific slave (use branches/ in the input field). When doing so, feel free to cancel any automated build that is currently running; make sure to use your real name in the UI so we know it's not spam. Regards, Martin From syfou at users.sourceforge.net Tue Apr 7 01:58:16 2009 From: syfou at users.sourceforge.net (Sylvain Fourmanoit) Date: Mon, 6 Apr 2009 19:58:16 -0400 (EDT) Subject: [Python-Dev] FWD: Documentation site problems In-Reply-To: <20090406130446.GB19296@panix.com> References: <20090406130446.GB19296@panix.com> Message-ID: >> there contents is missing from the python tutorial: > The 3.0 docs seem to be correct: > http://docs.python.org/3.0/tutorial/ It seems it is not the case anymore. The devel doc from Python 3 are missing a few tables of contents as well: http://docs.python.org/dev/py3k/tutorial/ When I build the html doc locally, it looks like Sphinx from svn (r68598) has an issue with the 'numbered' option in the toctree directive. Here is my output of `make html' from revision 71295 of the py3k branch: http://fourmanoit.googlepages.com/pydoc_output.txt It did work fine a few days back though -- yesterday, the online doc was still complete: I believe it was last built on March the 28th. Yours, -- Sylvain Fourmanoit Memory fault -- core...uh...um...core... Oh dammit, I forget! 
From steve at pearwood.info Tue Apr 7 02:10:16 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 7 Apr 2009 10:10:16 +1000 Subject: [Python-Dev] =?iso-8859-1?q?pyc_files=2C_constant_folding_and_bor?= =?iso-8859-1?q?derline_=09portability_issues?= In-Reply-To: References: Message-ID: <200904071010.16855.steve@pearwood.info> On Tue, 7 Apr 2009 07:27:29 am Guido van Rossum wrote: > Unfortunately the language reference is not the only thing we have to > worry about. Unlike languages like C++, where compiler writers have > the moral right to modify the compiler as long as they stay within > the weasel-words of the standard, in Python, users' expectations > carry value. Since the language is inherently not that fast, users > are not all that focused on performance (if they were, they wouldn't > be using Python). Unsurprising behavior OTOH is valued tremendously. Speaking as a user, Python's slowness is *not* a feature. Anything reasonable which can increase performance is a Good Thing. One of the better aspects of Python programming is that (in general) you can write code in the most natural way possible, with the least amount of scaffolding getting in the way. I'm with Raymond: I think it would be sad if "exp = long(mant * 2.0 ** 53)" did the exponentiation in the inner-loop. Pre-computing that value outside the loop counts as scaffolding, and gets in the way of readability and beauty. On the other hand, I'm with Guido when he wrote "it is certainly not right to choose speed over correctness". This is especially a problem for floating point optimizations, and I urge Cesare to be conservative in any f.p. optimizations he introduces, including constant folding. So... +1 on the general principle of constant folding, -0.5 on any such optimizations which change the semantics of a f.p. operation. The only reason it's -0.5 rather than -1 is that (presumably) anyone who cares about floating point correctness already knows to never trust the compiler. 
-- Steven D'Aprano From guido at python.org Tue Apr 7 02:18:42 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Apr 2009 17:18:42 -0700 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <200904071010.16855.steve@pearwood.info> References: <200904071010.16855.steve@pearwood.info> Message-ID: On Mon, Apr 6, 2009 at 5:10 PM, Steven D'Aprano wrote: > On Tue, 7 Apr 2009 07:27:29 am Guido van Rossum wrote: > >> Unfortunately the language reference is not the only thing we have to >> worry about. Unlike languages like C++, where compiler writers have >> the moral right to modify the compiler as long as they stay within >> the weasel-words of the standard, in Python, users' expectations >> carry value. Since the language is inherently not that fast, users >> are not all that focused on performance (if they were, they wouldn't >> be using Python). Unsurprising behavior OTOH is valued tremendously. > > Speaking as a user, Python's slowness is *not* a feature. Anything > reasonable which can increase performance is a Good Thing. > > One of the better aspects of Python programming is that (in general) you > can write code in the most natural way possible, with the least amount > of scaffolding getting in the way. I'm with Raymond: I think it would > be sad if "exp = long(mant * 2.0 ** 53)" did the exponentiation in the > inner-loop. Pre-computing that value outside the loop counts as > scaffolding, and gets in the way of readability and beauty. > > On the other hand, I'm with Guido when he wrote "it is certainly not > right to choose speed over correctness". This is especially a problem > for floating point optimizations, and I urge Cesare to be conservative > in any f.p. optimizations he introduces, including constant folding. > > So... +1 on the general principle of constant folding, -0.5 on any such > optimizations which change the semantics of a f.p. operation. 
The only > reason it's -0.5 rather than -1 is that (presumably) anyone who cares > about floating point correctness already knows to never trust the > compiler. Unfortunately, historically well-meaning attempts at adding constant-folding have more than once introduced obscure bugs that were hard to reproduce and only discovered one or two releases later. This has little to do with caring about float correctness. It's more about the difficulty of debugging Heisenbugs. For all these reasons should be super risk averse in this area. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ndbecker2 at gmail.com Tue Apr 7 02:25:44 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 06 Apr 2009 20:25:44 -0400 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <5b8d13220903291114k17e9eff9v6d1a5eef1fb72332@mail.gmail.com> Message-ID: David Cournapeau wrote: > On Mon, Mar 30, 2009 at 2:59 AM, Antoine Pitrou > wrote: ... > > Waf is definitely faster than scons - something like one order of > magnitude. I am yet very familiar with waf, but I like what I saw - > the architecture is much nicer than scons (waf core amount of code is > almost ten times smaller than scons core), but I would not call it a > mature project yet. > I haven't tried waf, but IIUC it _solves_ the bootstrap issue. From steve at holdenweb.com Tue Apr 7 03:35:22 2009 From: steve at holdenweb.com (Steve Holden) Date: Mon, 06 Apr 2009 21:35:22 -0400 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> Message-ID: Ondrej Certik wrote: > Hi, > > On Sun, Mar 29, 2009 at 10:21 AM, Jeffrey Yasskin wrote: >> I've heard some good things about cmake ? 
LLVM, googletest, and Boost >> are all looking at switching to it ? so I wanted to see if we could >> simplify our autoconf+makefile system by using it. The biggest wins I >> see from going to cmake are: >> 1. It can autogenerate the Visual Studio project files instead of >> needing them to be maintained separately >> 2. It lets you write functions and modules without understanding >> autoconf's mix of shell and M4. >> 3. Its generated Makefiles track header dependencies accurately so we >> might be able to add private headers efficiently. > > I am switching to cmake with all my python projects, as it is rock > solid, supports building in parallel (if I have some C++ and Cython > extensions), and the configure part works well. > > The only disadvantage that I can see is that one has to learn a new > syntax, which is not Python. But on the other hand, at least it forces > one to really just use cmake to write build scripts in a standard way, > while scons and other Python solutions imho encourage to write full > Python programs, which imho is a disadvantage for the build system, as > then every build system is nonstandard. > [obirrelevance] Isn't it strange how nobody every complained about the significance of whitespace in makefiles: only the fact that leading tabs were required rather than just-any-old whitespace. I guess some people just home in on things to complain about. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Want to know? Come to PyCon - soon! http://us.pycon.org/ From steve at holdenweb.com Tue Apr 7 04:25:36 2009 From: steve at holdenweb.com (Steve Holden) Date: Mon, 06 Apr 2009 22:25:36 -0400 Subject: [Python-Dev] Mercurial? 
In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl> Message-ID: <49DAB9A0.2090803@holdenweb.com> Dirkjan Ochtman wrote: > On Mon, Apr 6, 2009 at 06:20, Alexandre Vassalotti > wrote: >> But that won't work if people who are not core developers submit us >> patch bundle to import. And maintaining a such white-list sounds to me >> more burdensome than necessary. > > Well, if you need contributors to sign a contributor's agreement > anyway, there's already some list out there that we can leverage. > > The other option is to play the consenting adults card and ask all > people with push access to ascertain the correct names of committer > names on patches they push. > I would remind you all that it's *very* necessary to make sure that whatever finds its way into released code is indeed covered by contributor agreements. The PSF (as the guardian of the IP) has to ensure this, and so we have to find a way of ensuring that all contributions to source are correctly logged against authors in a traceable way. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Want to know? Come to PyCon - soon! http://us.pycon.org/
From dschult at colgate.edu Tue Apr 7 05:47:17 2009 From: dschult at colgate.edu (Dan Schult) Date: Mon, 6 Apr 2009 23:47:17 -0400 Subject: [Python-Dev] calling dictresize outside dictobject.c Message-ID: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu> Hi, I'm trying to write a C extension which is a subclass of dict. I want to do something like a setdefault() but with a single lookup. Looking through the dictobject code, the three workhorse routines lookdict, insertdict and dictresize are not available directly for functions outside dictobject.c, but I can get at lookdict through dict->ma_lookup(). So I use lookdict to get the PyDictEntry (call it ep) I'm looking for. The comments for lookdict say ep is ready to be set... so I do that. Then I check whether the dict needs to be resized--following the nice example of PyDict_SetItem. But I can't call dictresize to finish off the process. Should I be using PyDict_SetItem directly? No... it does its own lookup. I don't want a second lookup! I already know which entry will be filled. So then I look at the code for setdefault and it also does a double lookup for checking and setting an entry.
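In pure-Python terms, the double-lookup pattern being described — the one a single-lookup setdefault would avoid — is roughly the following. This is an illustration only, not the C code; the function name is invented for the example.

```python
def setdefault_two_lookups(d, key, default):
    # What a naive setdefault does: probe the hash table once for the
    # membership test, then again for the store (or the fetch).
    if key not in d:       # lookup #1
        d[key] = default   # lookup #2: re-hashes and re-probes the table
    return d[key]          # and a further probe in this naive spelling
```

The C implementation already knows which entry it just probed, which is exactly why repeating the lookup feels wasteful.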
What subtle issue am I missing? Why does setdefault do a double lookup? More globally, why isn't dictresize available through the C-API? If there isn't a reason to do a double lookup I have a patch for setdefault, but I thought I should ask here first. Thanks! Dan From greg.ewing at canterbury.ac.nz Tue Apr 7 07:20:24 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Apr 2009 17:20:24 +1200 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> Message-ID: <49DAE298.7040007@canterbury.ac.nz> Steve Holden wrote: > Isn't it strange how nobody every complained about the significance of > whitespace in makefiles: only the fact that leading tabs were required > rather than just-any-old whitespace. Make doesn't care how *much* whitespace there is, though, only whether it's there or not. If it accepted anything that looks like whitespace, there would be no cause for complaint. -- Greg From fetchinson at googlemail.com Tue Apr 7 07:55:47 2009 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Mon, 6 Apr 2009 22:55:47 -0700 Subject: [Python-Dev] decorator module in stdlib? Message-ID: The decorator module [1] written by Michele Simionato is a very useful tool for maintaining function signatures while applying a decorator. Many different projects implement their own versions of the same functionality, for example turbogears has its own utility for this, I guess others do something similar too. Was the issue whether to include this module in the stdlib raised? If yes, what were the arguments against it? If not, what do you folks think, shouldn't it be included? I certainly think it should be. Originally I sent this message to c.l.p [2] and Michele suggested it be brought up on python-dev. He also pointed out that a PEP [3] is already written about this topic and it is in draft form. 
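For anyone who hasn't used it: the behaviour at stake can be sketched with plain functools.wraps, which copies metadata like __name__ and __doc__ onto the wrapper but — unlike the decorator module — does not preserve the real argument signature for introspection (the wrapper still reports *args, **kwargs):

```python
import functools

def trace(func):
    # functools.wraps copies __name__, __doc__, __module__ and __dict__
    # from func onto wrapper, so the decorated function keeps its identity.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@trace
def add(x, y):
    "Return x + y."
    return x + y

print(add.__name__)  # -> add
print(add.__doc__)   # -> Return x + y.
```

Preserving the actual (x, y) signature through the wrapper is the part functools doesn't do, and the part the decorator module exists for.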
What do you guys think, wouldn't this be a useful addition to functools? Cheers, Daniel [1] http://pypi.python.org/pypi/decorator [2] http://groups.google.com/group/comp.lang.python/browse_thread/thread/d4056023f1150fe0 [3] http://www.python.org/dev/peps/pep-0362/ -- Psss, psss, put it down! - http://www.cafepress.com/putitdown From stephen at xemacs.org Tue Apr 7 08:03:05 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Apr 2009 15:03:05 +0900 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> Message-ID: <873aclt2zq.fsf@xemacs.org> Alexandre Vassalotti writes: > This makes me remember that we will have to decide how we will > reorganize our workflow. For this, we can either be conservative and > keep the current CVS-style development workflow--i.e., a few main > repositories where all developers can commit to. That was the original idea of PEP 374, that was a presumption under which I wrote my part of it, I think we should stick with it. As people develop personal workflows, they can suggest them, and/or changes in the public workflow needed to support them. But there should be a working sample implementation before thinking about changes to the workflow. Simply allowing more people to work effectively offline is going to speed things up perceptibly. Improved branching will add to that impact. The current workflow is pretty clean. Let's not mess it up or all that will be achieved is to speed up the mess. > Or we could drink the kool-aid and go with a kernel-style > development workflow--i.e., each developer maintains his own branch > and pull changes from each others. Can you give examples of projects using Mercurial that do that? All of the Mercurial projects I've seen "up close" have relatively centralized workflows, which Mercurial encourages because of the way it likes to automatically merge. 
I wouldn't want to try the kernel style with Mercurial because its named branch support doesn't work the way it should. In my experience, to deal with external branches, you have to maintain a separate workspace per external branch you want to follow. You'd also need to provide a users' guide to things like rebasing, which become very important in a kernel-style workflow, but which the Mercurial developers opposed on principle, at least at first. > However if we go kernel-style, I will need to designate someone > (i.e., an integrator) that will maintain the main branches, which > will tested by buildbot and used for the public releases. These are > issues I would like to address in the PEP. IMHO, that's new PEP. This is not part of the PEP 374 decision to go to a dVCS, nor part of the requirements for implementation, whether that is considered an extension of 374 or a new PEP in itself. From dirkjan at ochtman.nl Tue Apr 7 08:15:33 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 7 Apr 2009 08:15:33 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49DA7C91.6010202@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> <49DA7C91.6010202@v.loewis.de> Message-ID: On Tue, Apr 7, 2009 at 00:05, "Martin v. L?wis" wrote: > I think the identification in the SSH keys is useless. It contains > strings like "loewis at mira" or "ncoghlan at uberwald", or even multiple > of them (barry at wooz, barry at resist, ...). Right, so we'll put up the author map somewhere with the email addresses I gathered and ask for a more thorough review at some point. > It seems that the PEP needs to spell out a policy as to what committer > information needs to look like; then we need to verify that the proposed > name mapping matches that policy. Right. It's basically "Name Lastname " -- we can verify that in a hook. > Correct. 
The objective was to not allow nick names, but have real names > as committer names. It appears that this policy does not directly > translate into Mercurial. One of the nicer features of Mercurial/DVCSs, in my experience, is that non-committers get to keep the credit on their patches. That means that it's impossible to enforce a policy more extensive than some basic checks (such as the format above). Unless we keep a list of people who have signed an agreement, which will mean people will have to re-do the username on commits that don't constitute a non-trivial contribution. Cheers, Dirkjan From alexandre at peadrop.com Tue Apr 7 08:17:51 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 7 Apr 2009 02:17:51 -0400 Subject: [Python-Dev] Mercurial? In-Reply-To: <873aclt2zq.fsf@xemacs.org> References: <20090404154049.GA23987@panix.com> <49D87499.5060502@v.loewis.de> <49D8EC71.5020105@v.loewis.de> <873aclt2zq.fsf@xemacs.org> Message-ID: On Tue, Apr 7, 2009 at 2:03 AM, Stephen J. Turnbull wrote: > Alexandre Vassalotti writes: > > ?> This makes me remember that we will have to decide how we will > ?> reorganize our workflow. For this, we can either be conservative and > ?> keep the current CVS-style development workflow--i.e., a few main > ?> repositories where all developers can commit to. > > That was the original idea of PEP 374, that was a presumption under > which I wrote my part of it, I think we should stick with it. ?As > people develop personal workflows, they can suggest them, and/or > changes in the public workflow needed to support them. ?But there > should be a working sample implementation before thinking about > changes to the workflow. > Aahz convinced me earlier that changing the current workflow would be stupid. So, I now think the best thing to do is to provide a CVS-style environment similar to what we have currently, and let the workflow evolve naturally as developers gain more confidence with Mercurial. 
> > ?> Or we could drink the kool-aid and go with a kernel-style > ?> development workflow--i.e., each developer maintains his own branch > ?> and pull changes from each others. > > Can you give examples of projects using Mercurial that do that? > Mercurial itself is developed using that style, I believe. -- Alexandre From dirkjan at ochtman.nl Tue Apr 7 08:18:34 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 7 Apr 2009 08:18:34 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <49DAB9A0.2090803@holdenweb.com> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D8FA3A.5050400@v.loewis.de> <49D8FC47.8080803@ochtman.nl> <49DAB9A0.2090803@holdenweb.com> Message-ID: On Tue, Apr 7, 2009 at 04:25, Steve Holden wrote: > I would remind you all that it's *very* necessary to make sure that > whatever finds its way into released code is indeed covered by > contributor agreements. The PSF (as the guardian of the IP) has to > ensure this, and so we have to find a way of ensuring that all > contributions to source are correctly logged against authors in a > traceable way. I think having full name *and* email addresses make it easier to trace code, I guess, since previously code not written by committers would be harder to trace. The fact that some stuff isn't covered just becomes more explicit, which is a good thing IMO. Cheers, Dirkjan From ben+python at benfinney.id.au Tue Apr 7 08:25:09 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 07 Apr 2009 16:25:09 +1000 Subject: [Python-Dev] Mercurial? References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> <49DA7C91.6010202@v.loewis.de> Message-ID: <873acl7zga.fsf@benfinney.id.au> Dirkjan Ochtman writes: > Right. It's basically "Name Lastname " -- we can verify that > in a hook. 
Remembering, of course, that full names don't follow any template (especially not first-name last-name). The person's full name must be treated as free-form text, since there's no format common to all. -- \ ?We should strive to do things in [Gandhi's] spirit? not to use | `\ violence in fighting for our cause, but by non-participation in | _o__) what we believe is evil.? ?Albert Einstein | Ben Finney From dirkjan at ochtman.nl Tue Apr 7 08:30:10 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 7 Apr 2009 08:30:10 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <873acl7zga.fsf@benfinney.id.au> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> <49DA7C91.6010202@v.loewis.de> <873acl7zga.fsf@benfinney.id.au> Message-ID: On Tue, Apr 7, 2009 at 08:25, Ben Finney wrote: > Remembering, of course, that full names don't follow any template > (especially not first-name last-name). The person's full name must be > treated as free-form text, since there's no format common to all. Of course, unless we lock it down through a list of people who have contributor's agreements. Cheers, Dirkjan From cesare.dimauro at a-tono.com Tue Apr 7 09:27:04 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 07 Apr 2009 09:27:04 +0200 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <200904071010.16855.steve@pearwood.info> References: <200904071010.16855.steve@pearwood.info> Message-ID: On Apr 07, 2009 at 02:10AM, Steven D'Aprano wrote: > On the other hand, I'm with Guido when he wrote "it is certainly not > right to choose speed over correctness". This is especially a problem > for floating point optimizations, and I urge Cesare to be conservative > in any f.p. optimizations he introduces, including constant folding. 
The principle that I followed in doing constant folding was: "do what Python will do without constant folding enabled". So if Python will generate

LOAD_CONST      1
LOAD_CONST      2
BINARY_ADD

the constant folding code will simply replace them with a single

LOAD_CONST      3

When working with this kind of optimization, the temptation is to apply it in every situation possible. For example, in other languages this

a = b * 2 * 3

will be replaced by

a = b * 6

In Python I can't do that, because b can be an object which overloaded the * operator, so it *must* be called two times, once for 2 and once for 3. That's the way I chose to implement constant folding. The only difference at this time regards invalid operations, which will raise exceptions at compile time, not at running time. So if you write:

a = 1 / 0

an exception will be raised at compile time. I decided to let the exception be raised immediately, because I think that it's better to detect an error at compile time than at execution time. However, this can lead to incompatibilities with existing code, so in the final implementation I will add a flag to struct compiling (in ast.c) so that this behaviour can be controlled programmatically (enabling or not the exception raising). I already introduced a flag in struct compiling to control the constant folding, which can be completely disabled, if desired. > So... +1 on the general principle of constant folding, -0.5 on any such > optimizations which change the semantics of a f.p. operation. The only > reason it's -0.5 rather than -1 is that (presumably) anyone who cares > about floating point correctness already knows to never trust the > compiler. As Raymond stated, there's no loss of precision working with constant folding on float data. That's because there will be a rounding and a store of computed values each time a result is calculated.
Other languages will use FPU registers to hold results as long as possible, keeping full 80 bit precision (16 bit exponent + 64 bit mantissa). That's not the Python case. Cesare From andrewm at object-craft.com.au Tue Apr 7 09:43:37 2009 From: andrewm at object-craft.com.au (Andrew McNamara) Date: Tue, 7 Apr 2009 17:43:37 +1000 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: <7CCEA2C3-A131-4E35-BAE6-8D9896A786AB@object-craft.com.au> On 07/04/2009, at 7:27 AM, Guido van Rossum wrote: > On Mon, Apr 6, 2009 at 7:28 AM, Cesare Di Mauro > wrote: >> The Language Reference says nothing about the effects of code >> optimizations. >> I think it's a very good thing, because we can do some work here >> with constant >> folding. > > Unfortunately the language reference is not the only thing we have to > worry about. Unlike languages like C++, where compiler writers have > the moral right to modify the compiler as long as they stay within the > weasel-words of the standard, in Python, users' expectations carry > value. Since the language is inherently not that fast, users are not > all that focused on performance (if they were, they wouldn't be using > Python). Unsurprising behavior OTOH is valued tremendously. Rather than trying to get the optimizer to guess, why not have a "const" keyword and make it explicit? The result would be a symbol that essentially only exists at compile time - references to the symbol would be replaced by the computed value while compiling. Okay, maybe that would suck a bit (no symbolic debug output). Yeah, I know... take it to python-wild-and-ill-considered-ideas at python.org .
From g.brandl at gmx.net Tue Apr 7 10:27:28 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 07 Apr 2009 10:27:28 +0200 Subject: [Python-Dev] FWD: Library Reference is incomplete In-Reply-To: <9e804ac0904061432r16475d1et9fb5b15494ff9d4@mail.gmail.com> References: <20090406130618.GD19296@panix.com> <9e804ac0904061432r16475d1et9fb5b15494ff9d4@mail.gmail.com> Message-ID: Thomas Wouters schrieb: > > Anyone able to look into this and fix it? Having all of the normal > entrypoints for documentation broken is rather inconvenient for users :-) A rebuild should do the trick, I'll fix this ASAP. Georg From p.f.moore at gmail.com Tue Apr 7 12:33:39 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 7 Apr 2009 11:33:39 +0100 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: <200904071010.16855.steve@pearwood.info> Message-ID: <79990c6b0904070333t42c55ddfhc72f4a2c987cc38e@mail.gmail.com> 2009/4/7 Cesare Di Mauro :
> The principle that I followed on doing constant folding was: "do what Python
> will do without constant folding enabled".
>
> So if Python will generate
>
> LOAD_CONST      1
> LOAD_CONST      2
> BINARY_ADD
>
> the constant folding code will simply replace them with a single
>
> LOAD_CONST      3
>
> When working with such kind of optimizations, the temptation is to
> apply them at any situation possible. For example, in other languages
> this
>
> a = b * 2 * 3
>
> will be replaced by
>
> a = b * 6
>
> In Python I can't do that, because b can be an object which overloaded
> the * operator, so it *must* be called two times, one for 2 and one for 3.
>
> That's the way I choose to implement constant folding.
That sounds sufficiently "super risk-averse" to me, so I'm in favour of constant folding being implemented with this attitude :-) Paul.
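Incidentally, the distinction Cesare draws above can be observed from Python itself: CPython's own peephole optimizer already folds the all-constant case, so the folded value shows up among a code object's constants, while `b * 2 * 3` is left alone. A quick check (attribute probed is `co_consts`):

```python
# "1 + 2" involves only constants, so the folded 3 lands in co_consts.
# In "a = b * 2 * 3" the 2 and 3 belong to different multiplications
# (left-associativity: (b * 2) * 3), and b may overload *, so no 6 may appear.
folded = compile("a = 1 + 2", "<example>", "exec")
not_folded = compile("a = b * 2 * 3", "<example>", "exec")

print(3 in folded.co_consts)      # -> True: folded at compile time
print(6 in not_folded.co_consts)  # -> False: left for run time
```

Disassembling the two code objects with the dis module shows the same thing at the bytecode level: a single LOAD_CONST for the first, two multiplications for the second.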
From steve at pearwood.info Tue Apr 7 13:42:05 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 7 Apr 2009 21:42:05 +1000 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au> Message-ID: <200904072142.06158.steve@pearwood.info> On Tue, 7 Apr 2009 04:30:10 pm Dirkjan Ochtman wrote: > On Tue, Apr 7, 2009 at 08:25, Ben Finney wrote: > > Remembering, of course, that full names don't follow any template > > (especially not first-name last-name). The person's full name must > > be treated as free-form text, since there's no format common to > > all. > > Of course, unless we lock it down through a list of people who have > contributor's agreements. Perhaps you should ask Aahz what he thinks about being forced to provide two names before being allowed to contribute. To say nothing of noted MIT professor and computer scientist Arvind, British lords, the magician Teller, and millions of people from Spanish, Portuguese, Indonesian, Burmese and Malaysian cultures. Ben is correct: you can't assume that contributors will have both a first name and a last name, or that a first name and last name is sufficient to legally identify them. Those from Spanish and Portuguese cultures usually have two family names as well as a personal name; people from Indonesian, Burmese and Malaysian cultures often only use a single name. -- Steven D'Aprano From dirkjan at ochtman.nl Tue Apr 7 13:57:05 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 7 Apr 2009 13:57:05 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <200904072142.06158.steve@pearwood.info> References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au> <200904072142.06158.steve@pearwood.info> Message-ID: On Tue, Apr 7, 2009 at 13:42, Steven D'Aprano wrote: > Perhaps you should ask Aahz what he thinks about being forced to provide > two names before being allowed to contribute. Huh? 
The contributor's agreement list would presumably include real names only (so Aahz is out of luck), but the names wouldn't need to be limited to just one "word". I don't think I was implying otherwise; maybe my example much earlier in the thread was simplistic and I should have put it in EBNF (with Unicode character classes just to be very sure). Oh, yes, I am excluding people whose names include non-Unicode characters. Tough luck. Cheers, Dirkjan From mal at egenix.com Tue Apr 7 14:02:54 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 07 Apr 2009 14:02:54 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090403004135.B76443A40A7@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <20090403004135.B76443A40A7@sparrow.telecommunity.com> Message-ID: <49DB40EE.60004@egenix.com> On 2009-04-03 02:44, P.J. Eby wrote: > At 10:33 PM 4/2/2009 +0200, M.-A. Lemburg wrote: >> Alternative Approach: >> --------------------- >> >> Wouldn't it be better to stick with a simpler approach and look for >> "__pkg__.py" files to detect namespace packages using that O(1) check ? > >> One of the namespace packages, the defining namespace package, will have >> to include a __init__.py file. > > Note that there is no such thing as a "defining namespace package" -- > namespace package contents are symmetrical peers. That was a definition :-) Definition namespace package := the namespace package having the __pkg__.py file This is useful to have since packages allowing integration of other sub-packages typically come as a base package with some basic infra-structure in place which is required by all other namespace packages. If the __init__.py file is not found among the namespace directories, the importer will have to raise an exception, since the result would not be a proper Python package. >> * It's possible to have a defining package dir and add-on package >> dirs.
> > Also possible in the PEP, although the __init__.py must be in the first > such directory on sys.path. (However, such "defining" packages are not > that common now, due to tool limitations.) That's a strange limitation of the PEP. Why should the location of the __init__.py file depend on the order of sys.path ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 03 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tseaver at palladion.com Tue Apr 7 14:07:51 2009 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 07 Apr 2009 08:07:51 -0400 Subject: [Python-Dev] deprecating BaseException.message In-Reply-To: <49DA77B2.4020508@gmail.com> References: <49DA77B2.4020508@gmail.com> Message-ID: <49DB4217.5060004@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nick Coghlan wrote: > Tres Seaver wrote: >> I don't think either of these classes should be subject to a deprecation >> warning for a feature they never used or depended on. > > Agreed. Could you raise a tracker issue for the spurious warnings? (I > believe we should be able to make the warning condition a bit smarter to > eliminate these). Done: http://bugs.python.org/issue5716
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJ20IX+gerLs4ltQ4RAkuDAKCTZNp0r38d+hW8TmvjIh9Sj59CJQCfbJlQ taNbsBUT79MF8t7owySE2dg= =LjZf -----END PGP SIGNATURE----- From mal at egenix.com Tue Apr 7 14:25:08 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 07 Apr 2009 14:25:08 +0200 Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382: Namespace Packages) In-Reply-To: <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> Message-ID: <49DB4624.604@egenix.com> On 2009-04-06 15:21, Jesse Noller wrote: > On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >> On 2009-04-02 17:32, Martin v. L?wis wrote: >>> I propose the following PEP for inclusion to Python 3.1. >> Thanks for picking this up. >> >> I'd like to extend the proposal to Python 2.7 and later. >> > > -1 to adding it to the 2.x series. There was much discussion around > adding features to 2.x *and* 3.0, and the consensus seemed to *not* > add new features to 2.x and use those new features as carrots to help > lead people into 3.0. I must have missed that discussion :-) Where's the PEP pinning this down ? The Python 2.x user base is huge and the number of installed applications even larger. 
Cutting these users and application developers off of important new features added to Python 3 is only going to work as "carrot" for those developers who: * have enough resources (time, money, manpower) to port their existing application to Python 3 * can persuade their users to switch to Python 3 * don't rely much on 3rd party libraries (the bread and butter of Python applications) Realistically, such a porting effort is not likely going to happen for any decent sized application, except perhaps a few open source ones. Such a policy would then translate to a dead end for Python 2.x based applications. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 07 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From skip at pobox.com Tue Apr 7 14:14:22 2009 From: skip at pobox.com (skip at pobox.com) Date: Tue, 7 Apr 2009 07:14:22 -0500 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> Message-ID: <18907.17310.201358.697994@montanaro.dyndns.org> Ondrej> ... 
while scons and other Python solutions imho encourage to Ondrej> write full Python programs, which imho is a disadvantage for the Ondrej> build system, as then every build system is nonstandard. Hmmm... Like distutils setup scripts? I don't know thing one about cmake, but if it's good for the goose (building Python proper) would it be good for the gander (building extensions)? -- Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/ "XML sucks, dictionaries rock" - Dave Beazley From mal at egenix.com Tue Apr 7 14:30:19 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 07 Apr 2009 14:30:19 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49D66C6E.3090602@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> Message-ID: <49DB475B.8060504@egenix.com> [Resent due to a python.org mail server problem] On 2009-04-03 22:07, Martin v. L?wis wrote: >> I'd like to extend the proposal to Python 2.7 and later. > > I don't object, but I also don't want to propose this, so > I added it to the discussion. > > My (and perhaps other people's) concern is that 2.7 might > well be the last release of the 2.x series. If so, adding > this feature to it would make 2.7 an odd special case for > users and providers of third party tools. I certainly hope that we'll see more useful features backported from 3.x to the 2.x series or forward ported from 2.x to 3.x (depending on what the core developer preferences are). Regarding this particular PEP, it is well possible to implement an importer that provides the functionality for Python 2.3-2.7 versions, so it doesn't have to be an odd special case. >> That's going to slow down Python package detection a lot - you'd >> replace an O(1) test with an O(n) scan. > > I question that claim. In traditional Unix systems, the file system > driver performs a linear search of the directory, so it's rather > O(n)-in-kernel vs. O(n)-in-Python. 
Even for advanced file systems, > you need at least O(log n) to determine whether a specific file is > in a directory. For all practical purposes, the package directory > will fit in a single disk block (containing a single .pkg file, and > one or few subpackages), making listdir complete as fast as stat. On second thought, you're right, it won't be that costly. It requires an os.listdir() scan due to the wildcard approach and in some cases, such a scan may not be possible, e.g. when using frozen packages. Indeed, the freeze mechanism would not even add the .pkg files - it only handles .py file content. The same is true for distutils, MANIFEST generators and other installer mechanisms - it would have to learn to package the .pkg files along with the Python files. Another problem with the .pkg file approach is that the file extension is already in use for e.g. Mac OS X installers. You don't have those issues with the __pkg__.py file approach I suggested. >> Wouldn't it be better to stick with a simpler approach and look for >> "__pkg__.py" files to detect namespace packages using that O(1) check ? > > Again - this wouldn't be O(1). More importantly, it breaks system > packages, which now again have to deal with the conflicting file names > if they want to install all portions into a single location. True, but since that means changing the package infrastructure, I think it's fair to ask distributors who want to use that approach to also take care of looking into the __pkg__.py files and merging them if necessary. Most of the time the __pkg__.py files will be empty, so that's not really much to ask for. >> This would also avoid any issues you'd otherwise run into if you want >> to maintain this scheme in an importer that doesn't have access to a list >> of files in a package directory, but is well capable for the checking >> the existence of a file. > > Do you have a specific mechanism in mind? Yes: frozen modules and imports straight from a web resource. 
The .pkg file approach requires a directory scan and additional support from all importers. The __pkg__.py approach I suggested can use existing importers without modifications by checking for the existence of such a Python module in an importer managed resource. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 07 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Tue Apr 7 14:53:02 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Apr 2009 12:53:02 +0000 (UTC) Subject: [Python-Dev] Evaluated cmake as an autoconf replacement References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> Message-ID: pobox.com> writes: > > I don't know thing one about cmake, but if it's good for the goose (building > Python proper) would it be good for the gander (building extensions)? African or European? From ndbecker2 at gmail.com Tue Apr 7 15:02:13 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 07 Apr 2009 09:02:13 -0400 Subject: [Python-Dev] What's missing from easy_install Message-ID: 1. easy_remove! 2. Various utilities to provide query package management. 
- easy_install --list (list files installed) From kdr2 at x-macro.com Tue Apr 7 15:05:01 2009 From: kdr2 at x-macro.com (KDr2) Date: Tue, 7 Apr 2009 21:05:01 +0800 Subject: [Python-Dev] What's missing from easy_install In-Reply-To: References: Message-ID: I need a CPyAN. -- Best Regards, -- KDr2, at x-macro.com. On Tue, Apr 7, 2009 at 9:02 PM, Neal Becker wrote: > 1. easy_remove! > > 2. Various utilities to provide query package management. > - easy_install --list (list files installed) > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/kdr2%40x-macro.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Tue Apr 7 15:06:41 2009 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 7 Apr 2009 09:06:41 -0400 Subject: [Python-Dev] What's missing from easy_install In-Reply-To: References: Message-ID: <4222a8490904070606s77e8177exeb053c03bc63ae30@mail.gmail.com> On Tue, Apr 7, 2009 at 9:02 AM, Neal Becker wrote: > 1. easy_remove! > > 2. Various utilities to provide query package management. > - easy_install --list (list files installed) This discussion should happen on the distutils-sig list; not python-dev From solipsis at pitrou.net Tue Apr 7 15:06:53 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Apr 2009 13:06:53 +0000 (UTC) Subject: [Python-Dev] What's missing from easy_install References: Message-ID: Neal Becker gmail.com> writes: > > 2. Various utilities to provide query package management. > - easy_install --list (list files installed) "yolk" will tell you that. http://pypi.python.org/pypi/yolk Regards Antoine.
From cournape at gmail.com Tue Apr 7 15:08:38 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 7 Apr 2009 22:08:38 +0900 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <18907.17310.201358.697994@montanaro.dyndns.org> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> Message-ID: <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> On Tue, Apr 7, 2009 at 9:14 PM, wrote: > > Ondrej> ... while scons and other Python solutions imho encourage to > Ondrej> write full Python programs, which imho is a disadvantage for the > Ondrej> build system, as then every build system is nonstandard. > > Hmmm... Like distutils setup scripts? fortunately, waf and scons are much better than distutils, at least for the build part :) I think it is hard to overestimate the importance of a python solution for python software (python itself is different). Having a full fledged language for complex builds is nice; I think most people familiar with complex makefiles would agree with this. > > I don't know thing one about cmake, but if it's good for the goose (building > Python proper) would it be good for the gander (building extensions)? For complex software, especially packages relying on a lot of C and platform idiosyncrasies, distutils is just too cumbersome and limited. Both Ondrej and I use python for scientific usage, and I think it is no accident that we both look for something else. In those cases, scons - and cmake it seems - are very nice; build tools are incredibly hard to get right once you want to manage dependencies automatically. For simple python projects (pure python, a few .c source files without much dependencies), I think it is just overkill. cheers, David > > -- > Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/ >
"XML sucks, dictionaries rock" - Dave Beazley > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/cournape%40gmail.com > From alex.neundorf at kitware.com Tue Apr 7 15:08:54 2009 From: alex.neundorf at kitware.com (Alexander Neundorf) Date: Tue, 7 Apr 2009 15:08:54 +0200 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <18907.17310.201358.697994@montanaro.dyndns.org> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> Message-ID: <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com> On Tue, Apr 7, 2009 at 2:14 PM, wrote: > > Ondrej> ... while scons and other Python solutions imho encourage to > Ondrej> write full Python programs, which imho is a disadvantage for the > Ondrej> build system, as then every build system is nonstandard. I fully agree here. > Hmmm... Like distutils setup scripts? > > I don't know thing one about cmake, but if it's good for the goose (building > Python proper) would it be good for the gander (building extensions)? What is involved in building python extensions ? Can you please explain ?
Alex From cournape at gmail.com Tue Apr 7 15:23:18 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 7 Apr 2009 22:23:18 +0900 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com> Message-ID: <5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com> On Tue, Apr 7, 2009 at 10:08 PM, Alexander Neundorf wrote: > > What is involved in building python extensions ? Can you please explain ? Not much: at the core, a python extension is nothing more than a dynamically loaded library + a couple of options. One choice is whether to take options from distutils or to set them up independently. In my own scons tool to build python extensions, both are possible. The hard (or rather time consuming) work is to do everything else that distutils does related to the packaging. That's where scons/waf are more interesting than cmake IMO, because you can "easily" give up this task back to distutils, whereas it is inherently more difficult with cmake. cheers, David From pje at telecommunity.com Tue Apr 7 16:05:45 2009 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 07 Apr 2009 10:05:45 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49DB475B.8060504@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> Message-ID: <20090407140317.EBD383A4063@sparrow.telecommunity.com> At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote: > >> Wouldn't it be better to stick with a simpler approach and look for > >> "__pkg__.py" files to detect namespace packages using that O(1) check ? > > > > Again - this wouldn't be O(1). 
More importantly, it breaks system > > packages, which now again have to deal with the conflicting file names > > if they want to install all portions into a single location. > >True, but since that means changing the package infrastructure, I think >it's fair to ask distributors who want to use that approach to also take >care of looking into the __pkg__.py files and merging them if >necessary. > >Most of the time the __pkg__.py files will be empty, so that's not >really much to ask for. This means your proposal actually doesn't add any benefit over the status quo, where you can have an __init__.py that does nothing but declare the package a namespace. We already have that now, and it doesn't need a new filename. Why would we expect OS vendors to start supporting it, just because we name it __pkg__.py instead of __init__.py? From aahz at pythoncraft.com Tue Apr 7 16:29:02 2009 From: aahz at pythoncraft.com (Aahz) Date: Tue, 7 Apr 2009 07:29:02 -0700 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au> <200904072142.06158.steve@pearwood.info> Message-ID: <20090407142902.GC13081@panix.com> On Tue, Apr 07, 2009, Dirkjan Ochtman wrote: > On Tue, Apr 7, 2009 at 13:42, Steven D'Aprano wrote: >> >> Perhaps you should ask Aahz what he thinks about being forced to provide >> two names before being allowed to contribute. Thanks for speaking up! I'm not sure I would have noticed the implication of Dirkjan's post (I'm not paying a huge amount of attention to the conversion process). > Huh? The contributor's agreement list would presumably include real > names only (so Aahz is out of luck), but the names wouldn't need to be > limited to just one "word". What you apparently are unaware of is that "Aahz" is in fact my full legal name. (Which was clearly the point of Steven's post since he knows that Teller also has only one legal name -- it's not common, but we do exist.) 
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From dirkjan at ochtman.nl Tue Apr 7 16:35:17 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 7 Apr 2009 16:35:17 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <20090407142902.GC13081@panix.com> References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au> <200904072142.06158.steve@pearwood.info> <20090407142902.GC13081@panix.com> Message-ID: On Tue, Apr 7, 2009 at 16:29, Aahz wrote: > What you apparently are unaware of is that "Aahz" is in fact my full > legal name. (Which was clearly the point of Steven's post since he knows > that Teller also has only one legal name -- it's not common, but we do > exist.) Ah, sorry about that. But I hope you also concluded from my email that that wouldn't be a problem. Cheers, Dirkjan From aahz at pythoncraft.com Tue Apr 7 16:39:00 2009 From: aahz at pythoncraft.com (Aahz) Date: Tue, 7 Apr 2009 07:39:00 -0700 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au> <200904072142.06158.steve@pearwood.info> <20090407142902.GC13081@panix.com> Message-ID: <20090407143900.GA713@panix.com> On Tue, Apr 07, 2009, Dirkjan Ochtman wrote: > On Tue, Apr 7, 2009 at 16:29, Aahz wrote: >> >> What you apparently are unaware of is that "Aahz" is in fact my full >> legal name. (Which was clearly the point of Steven's post since he knows >> that Teller also has only one legal name -- it's not common, but we do >> exist.) > > Ah, sorry about that. But I hope you also concluded from my email that > that wouldn't be a problem. Nope, thanks for clearing it up.
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From dickinsm at gmail.com Tue Apr 7 16:39:47 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 7 Apr 2009 15:39:47 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? Message-ID: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> Executive summary (details and discussion points below) ================= Some time ago, Noam Raphael pointed out that for a float x, repr(x) can often be much shorter than it currently is, without sacrificing the property that eval(repr(x)) == x, and proposed changing Python accordingly. See http://bugs.python.org/issue1580 For example, instead of the current behaviour: Python 3.1a2+ (py3k:71353:71354, Apr 7 2009, 12:55:16) [GCC 4.0.1 (Apple Inc. build 5490)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> 0.01 0.01 >>> 0.02 0.02 >>> 0.03 0.029999999999999999 >>> 0.04 0.040000000000000001 >>> 0.04 == eval(repr(0.04)) True we'd have this: Python 3.1a2+ (py3k-short-float-repr:71350:71352M, Apr 7 2009, ) [GCC 4.0.1 (Apple Inc. build 5490)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> 0.01 0.01 >>> 0.02 0.02 >>> 0.03 0.03 >>> 0.04 0.04 >>> 0.04 == eval(repr(0.04)) True Initial attempts to implement this encountered various difficulties, and at some point Tim Peters pointed out (I'm paraphrasing horribly here) that one can't have all three of {fast, easy, correct}. One PyCon 2009 sprint later, Eric Smith and I have produced the py3k-short-float-repr branch, which implements short repr of floats and also does some major cleaning up of the current float formatting functions. 
We've gone for the {fast, correct} pairing. We'd like to get this into Python 3.1. Any thoughts/objections/counter-proposals/...? More details ============ Our solution is based on an adaptation of David Gay's 'perfect rounding' code for inclusion in Python. To make eval(repr(x)) roundtripping work, one needs to have correctly rounded float -> decimal *and* decimal -> float conversions: Gay's code provides correctly rounded dtoa and strtod functions for these two conversions. His code is well-known and well-tested: it's used as the basis of the glibc strtod, and is also in OS X. It's available from http://www.netlib.org/fp/dtoa.c So our branch contains a new file Python/dtoa.c, which is a cut down version of Gay's original file. (We've removed stuff for VAX and IBM floating-point formats, hex NaNs, hex floating-point formats, locale-aware interpretation of the decimal separator, K&R headers, code for correct setting of the inexact flag, and various other bits and pieces that Python doesn't care about.) Most of the rest of the work is in the existing file Python/pystrtod.c. Every float -> string or string -> float conversion goes through a function in this file at some point. Gay's code also provides the opportunity to clean up the current float formatting code, and Eric has reworked a lot of the float formatting in the py3k-short-float-repr branch. This reworking should make finishing off the implementation of things like thousands separators much more straightforward. One example of this: the previous string -> float conversion used the system strtod, which is locale-aware, so the code had to first replace the '.' by the current locale's decimal separator, *then* call strtod. There was a similar dance in the reverse direction when doing float -> string conversion. Both these are now unnecessary. The current code is pretty close to ready for merging to py3k. 
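[Editorial sketch] The invariant at stake can be written down as executable code; the point of the branch is that it keeps holding while the strings themselves get shorter (on a build with the short-float-repr branch, repr(0.03) is '0.03' rather than '0.029999999999999999'):

```python
# eval(repr(x)) == x must hold for every finite float, whether repr
# emits 17 significant digits (the old behaviour) or the shortest
# string that still round-trips (the new behaviour).
values = [0.01, 0.02, 0.03, 0.04, 1e16, 1e11 + 0.5, 2.0 ** -1074]
for x in values:
    s = repr(x)
    assert eval(s) == x, (s, x)
print("eval(repr(x)) == x holds for all %d sample values" % len(values))
```

Correctly rounded conversions in both directions (Gay's dtoa and strtod) are what make the shortest such string well defined in the first place.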
I've uploaded a patchset to Rietveld: http://codereview.appspot.com/33084/show Apart from the short float repr, and a couple of bugfixes, all behaviour should be unchanged from before. There are a few exceptions: - format(1e200, '<') doesn't behave quite as it did before. See item (3) below for details - repr switches to using exponential notation at 1e16 instead of the previous 1e17. This avoids a subtle issue where the 'short float repr' result is padded with bogus zeros. - a similar change applies to str, which switches to exponential notation at 1e11, not 1e12. This fixes the following minor annoyance, which goes back at least as far as Python 2.5 (and probably much further): >>> x = 1e11 + 0.5 >>> x 100000000000.5 >>> print(x) 100000000000.0 That .0 seems wrong to me: if we're going to go to the trouble of printing extra digits (str usually only gives 12 significant digits; here there are 13), they should be the *right* extra digits. Discussion points ================= (1) Any objections to including this into py3k? If there's controversy, then I guess we'll need a PEP. (2) Should other Python implementations (Jython, IronPython, etc.) be expected to use short float repr, or should it just be considered an implementation detail of CPython? I propose the latter, except that all implementations should be required to satisfy eval(repr(x)) == x for finite floats x. (3) There's a PEP 3101 line we don't know what to do with. In py3k, we currently have: >>> format(1e200, '<') '1.0e+200' but in our py3k-short-float-repr branch: >>> format(1e200, '<') '1e+200' Which is correct? The py3k behaviour comes from the 'Standard Format Specifiers' section of PEP 3101, where it says: """ The available floating point presentation types are: [... list of other format codes omitted here ...] '' (None) - similar to 'g', except that it prints at least one digit after the decimal point. """ It's that 'at least one digit after the decimal point' bit that's at issue. 
I understood this to apply only to floats converted to a string *without* an exponent; this is the way that repr and str work, adding a .0 to floats formatted without an exponent, but leaving the .0 out when the exponent is present. Should the .0 always be added? Or is it required only when it would be necessary to distinguish a float string from an integer string? My preference is for the latter (i.e., format(x, '<') should behave in the same way as repr and str in this respect). But I'm biased, not least because the other behaviour would be a pain to implement. Does anyone care? This email is already too long. I'll stop now. Mark From mal at egenix.com Tue Apr 7 16:58:39 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 07 Apr 2009 16:58:39 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090407140317.EBD383A4063@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> Message-ID: <49DB6A1F.50801@egenix.com> On 2009-04-07 16:05, P.J. Eby wrote: > At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote: >> >> Wouldn't it be better to stick with a simpler approach and look for >> >> "__pkg__.py" files to detect namespace packages using that O(1) >> check ? >> > >> > Again - this wouldn't be O(1). More importantly, it breaks system >> > packages, which now again have to deal with the conflicting file names >> > if they want to install all portions into a single location. >> >> True, but since that means changing the package infrastructure, I think >> it's fair to ask distributors who want to use that approach to also take >> care of looking into the __pkg__.py files and merging them if >> necessary. >> >> Most of the time the __pkg__.py files will be empty, so that's not >> really much to ask for. 
> > This means your proposal actually doesn't add any benefit over the > status quo, where you can have an __init__.py that does nothing but > declare the package a namespace. We already have that now, and it > doesn't need a new filename. Why would we expect OS vendors to start > supporting it, just because we name it __pkg__.py instead of __init__.py? I lost you there. Since when do we support namespace packages in core Python without the need to add some form of magic support code to __init__.py ? My suggestion basically builds on the same idea as Martin's PEP, but uses a single __pkg__.py file as opposed to some non-Python file yaddayadda.pkg. Here's a copy of the proposal, with some additional discussion bullets added: """ Alternative Approach: --------------------- Wouldn't it be better to stick with a simpler approach and look for "__pkg__.py" files to detect namespace packages using that O(1) check ? This would also avoid any issues you'd otherwise run into if you want to maintain this scheme in an importer that doesn't have access to a list of files in a package directory, but is well capable for the checking the existence of a file. Mechanism: ---------- If the import mechanism finds a matching namespace package (a directory with a __pkg__.py file), it then goes into namespace package scan mode and scans the complete sys.path for more occurrences of the same namespace package. The import loads all __pkg__.py files of matching namespace packages having the same package name during the search. One of the namespace packages, the defining namespace package, will have to include a __init__.py file. After having scanned all matching namespace packages and loading the __pkg__.py files in the order of the search, the import mechanism then sets the packages .__path__ attribute to include all namespace package directories found on sys.path and finally executes the __init__.py file. (Please let me know if the above is not clear, I will then try to follow up on it.) 
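To make the above concrete, here is a rough sketch of the scan in pure Python — the function names and the module-creation details are made up for illustration; a real implementation would live inside the import machinery:

```python
# Illustrative sketch of the __pkg__.py scan described above.  The
# helper names and the exec-based loading are hypothetical, not an
# actual patch to the import system.
import os
import sys
import types


def scan_namespace_package(pkgname):
    """Collect all sys.path entries containing pkgname/__pkg__.py."""
    portions = []
    for entry in sys.path:
        pkgdir = os.path.join(entry, pkgname)
        if os.path.isfile(os.path.join(pkgdir, '__pkg__.py')):
            portions.append(pkgdir)
    return portions


def import_namespace_package(pkgname):
    portions = scan_namespace_package(pkgname)
    if not portions:
        raise ImportError('no namespace package %r found' % pkgname)
    module = types.ModuleType(pkgname)
    module.__path__ = portions
    # Load every __pkg__.py in sys.path search order...
    for pkgdir in portions:
        with open(os.path.join(pkgdir, '__pkg__.py')) as f:
            exec(f.read(), module.__dict__)
    # ...then execute the single defining __init__.py, if present.
    for pkgdir in portions:
        init = os.path.join(pkgdir, '__init__.py')
        if os.path.isfile(init):
            with open(init) as f:
                exec(f.read(), module.__dict__)
            break
    sys.modules[pkgname] = module
    return module
```

Note that the detection cost is a single os.path.isfile() check per sys.path entry, which is the O(1)-per-directory property argued for above.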
Discussion: ----------- The above mechanism allows the same kind of flexibility we already have with the existing normal __init__.py mechanism. * It doesn't add yet another .pth-style sys.path extension (which are difficult to manage in installations). * It always uses the same naive sys.path search strategy. The strategy is not determined by some file contents. * The search is only done once - on the first import of the package. * It's possible to have a defining package dir and add-on package dirs. * The search does not depend on the order of directories in sys.path. There's no requirement for the defining package to appear first on sys.path. * Namespace packages are easy to recognize by testing for a single resource. * There's no conflict with existing files using the .pkg extension such as Mac OS X installer files or Solaris packages. * Namespace __pkg__.py modules can provide extra meta-information, logging, etc. to simplify debugging namespace package setups. * It's possible to freeze such setups, to put them into ZIP files, or only have parts of it in a ZIP file and the other parts in the file-system. * There's no need for a package directory scan, allowing the mechanism to also work with resources that do not permit to (easily and efficiently) scan the contents of a package "directory", e.g. frozen packages or imports from web resources. Caveats: * Changes to sys.path will not result in an automatic rescan for additional namespace packages, if the package was already loaded. However, we could have a function to make such a rescan explicit. """ -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 07 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From Scott.Daniels at Acm.Org Tue Apr 7 17:04:56 2009 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Tue, 07 Apr 2009 08:04:56 -0700 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <49DAE298.7040007@canterbury.ac.nz> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <49DAE298.7040007@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Steve Holden wrote: > >> Isn't it strange how nobody every complained about the significance of >> whitespace in makefiles: only the fact that leading tabs were required >> rather than just-any-old whitespace. > > Make doesn't care how *much* whitespace there > is, though, only whether it's there or not. If > it accepted anything that looks like whitespace, > there would be no cause for complaint. > Make and the *roff formats had the nasty feature that they treated homographs differently. That is, you could print two sources that placed all the same ink on the paper at the same places, but they would perform differently. For make it was tabs. For the *roff files, the periods ending sentences and the periods for abbreviations (such as honorifics) were distinguished by following end-of-sentence periods with two spaces. This left any line ending in a period ambiguous, and tools to strip whitespace off the end of lines as information-destroying. 
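(The make case is easy to demonstrate: two recipe lines that occupy exactly the same columns on paper, only one of which make will accept. A throwaway sketch:)

```python
# Two makefile-style recipe lines that can print identically (with
# 8-column tab stops) yet are different byte sequences; make accepts
# only the tab-indented one.
tab_line = "\tgcc -o prog prog.c"
space_line = "        gcc -o prog prog.c"  # eight literal spaces

# Rendered with 8-column tab stops, the two are indistinguishable...
assert tab_line.expandtabs(8) == space_line

# ...but as input to make they are not the same thing at all.
assert tab_line != space_line
assert tab_line.startswith("\t")
assert not space_line.startswith("\t")
```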
--Scott David Daniels Scott.Daniels at Acm.Org From ronaldoussoren at mac.com Tue Apr 7 17:10:01 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 07 Apr 2009 17:10:01 +0200 Subject: [Python-Dev] PyDict_SetItem hook In-Reply-To: References: <49D3F8D0.8070805@wingware.com> <43aa6ff70904011731l43fc151dib8673788f87f46de@mail.gmail.com> <49D42013.3010600@wingware.com> <9e804ac0904021141k6653e2d6v442ef6065688236e@mail.gmail.com> <78A8FD816C154A01A1A02810534CB4F1@RaymondLaptop1> Message-ID: On 3 Apr, 2009, at 0:57, Guido van Rossum wrote: >> > > The primary use case is some kind of trap on assignment. While this > cannot cover all cases, most non-local variables are stored in dicts. > List mutations are not in the same league, as use case. I have a slightly different use-case than a debugger, although it boils down to "some kind of trap on assignment": implementing Key-Value Observing support for Python objects in PyObjC. "Key-Value Observing" is a technique in Cocoa where you can get callbacks when a property of an object changes and is something I cannot support for plain python objects at the moment due to lack of a callback mechanism. A full implementation would require hooks for mutation of lists and sets as well. The lack of mutation hooks is not a terrible problem for PyObjC, we can always use Cocoa datastructures when using KVO, but it is somewhat annoying that Cocoa datastructures leak into code that could be pure python just because I want to use KVO. Ronald From techtonik at gmail.com Tue Apr 7 17:10:08 2009 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 7 Apr 2009 18:10:08 +0300 Subject: [Python-Dev] os.defpath for Windows In-Reply-To: <494E0A2B.4080704@gmail.com> References: <494E0A2B.4080704@gmail.com> Message-ID: Hi, I've added the issue to tracker.
http://bugs.python.org/issue5717 --anatoly t. On Sun, Dec 21, 2008 at 12:19 PM, Yinon Ehrlich wrote: > Hi, > > just saw that os.defpath for Windows is defined as > ? ? ? ?Lib/ntpath.py:30:defpath = '.;C:\\bin' > > Most Windows machines I saw has no c:\bin directory. > > Any reason why it was defined this way ? > Thanks, > ? ? ? ?Yinon > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/techtonik%40gmail.com > From skip at pobox.com Tue Apr 7 17:19:25 2009 From: skip at pobox.com (skip at pobox.com) Date: Tue, 7 Apr 2009 10:19:25 -0500 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: <200904071010.16855.steve@pearwood.info> Message-ID: <18907.28413.42458.631358@montanaro.dyndns.org> Cesare> The only difference at this time is regards invalid operations, Cesare> which will raise exceptions at compile time, not at running Cesare> time. Cesare> So if you write: Cesare> a = 1 / 0 Cesare> an exception will be raised at compile time. I think I have to call *bzzzzt* here. This is a common technique used during debugging. Insert a 1/0 to force an exception (possibly causing the running program to drop into pdb). I think you have to leave that in. Skip From skip at pobox.com Tue Apr 7 17:22:05 2009 From: skip at pobox.com (skip at pobox.com) Date: Tue, 7 Apr 2009 10:22:05 -0500 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> Message-ID: <18907.28573.46660.915761@montanaro.dyndns.org> >> I don't know thing one about cmake, but if it's good for the goose >> (building Python proper) would it be good for the gander (building >> extensions)? 
Antoine> African or European? I was thinking Canadian... Skip From cesare.dimauro at a-tono.com Tue Apr 7 17:19:10 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 07 Apr 2009 17:19:10 +0200 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <18907.28413.42458.631358@montanaro.dyndns.org> References: <200904071010.16855.steve@pearwood.info> <18907.28413.42458.631358@montanaro.dyndns.org> Message-ID: In data 07 aprile 2009 alle ore 17:19:25, ha scritto: > > Cesare> The only difference at this time is regards invalid operations, > Cesare> which will raise exceptions at compile time, not at running > Cesare> time. > > Cesare> So if you write: > > Cesare> a = 1 / 0 > > Cesare> an exception will be raised at compile time. > > I think I have to call *bzzzzt* here. This is a common technique used > during debugging. Insert a 1/0 to force an exception (possibly causing the > running program to drop into pdb). I think you have to leave that in. > > Skip Many tests rely on this, and I have changed them from something like: try: 1 / 0 except: .... to try: a = 1; a / 0 except: .... But I know that it's a major source of incompatibilities, and in the final code I'll enabled it only if user demanded it (through a flag). Cesare From cournape at gmail.com Tue Apr 7 17:29:02 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 8 Apr 2009 00:29:02 +0900 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49DB6A1F.50801@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> Message-ID: <5b8d13220904070829j416b2536u885cd79a33ebefb5@mail.gmail.com> On Tue, Apr 7, 2009 at 11:58 PM, M.-A. 
Lemburg wrote: >> >> This means your proposal actually doesn't add any benefit over the >> status quo, where you can have an __init__.py that does nothing but >> declare the package a namespace. ?We already have that now, and it >> doesn't need a new filename. ?Why would we expect OS vendors to start >> supporting it, just because we name it __pkg__.py instead of __init__.py? > > I lost you there. > > Since when do we support namespace packages in core Python without > the need to add some form of magic support code to __init__.py ? I think P. Eby refers to the problem that most packaging systems don't like several packages to have the same file - be it empty or not. That's my main personal grip against namespace packages, and from this POV, I think it is fair to say the proposal does not solve anything. Not that I have a solution, of course :) cheers, David > > My suggestion basically builds on the same idea as Martin's PEP, > but uses a single __pkg__.py file as opposed to some non-Python > file yaddayadda.pkg. > > Here's a copy of the proposal, with some additional discussion > bullets added: > > """ > Alternative Approach: > --------------------- > > Wouldn't it be better to stick with a simpler approach and look for > "__pkg__.py" files to detect namespace packages using that O(1) check ? > > This would also avoid any issues you'd otherwise run into if you want > to maintain this scheme in an importer that doesn't have access to a list > of files in a package directory, but is well capable for the checking > the existence of a file. > > Mechanism: > ---------- > > If the import mechanism finds a matching namespace package (a directory > with a __pkg__.py file), it then goes into namespace package scan mode and > scans the complete sys.path for more occurrences of the same namespace > package. > > The import loads all __pkg__.py files of matching namespace packages > having the same package name during the search. 
> > One of the namespace packages, the defining namespace package, will have > to include a __init__.py file. > > After having scanned all matching namespace packages and loading > the __pkg__.py files in the order of the search, the import mechanism > then sets the packages .__path__ attribute to include all namespace > package directories found on sys.path and finally executes the > __init__.py file. > > (Please let me know if the above is not clear, I will then try to > follow up on it.) > > Discussion: > ----------- > > The above mechanism allows the same kind of flexibility we already > have with the existing normal __init__.py mechanism. > > * It doesn't add yet another .pth-style sys.path extension (which are > difficult to manage in installations). > > * It always uses the same naive sys.path search strategy. The strategy > is not determined by some file contents. > > * The search is only done once - on the first import of the package. > > * It's possible to have a defining package dir and add-one package > dirs. > > * The search does not depend on the order of directories in sys.path. > There's no requirement for the defining package to appear first > on sys.path. > > * Namespace packages are easy to recognize by testing for a single > resource. > > * There's no conflict with existing files using the .pkg extension > such as Mac OS X installer files or Solaris packages. > > * Namespace __pkg__.py modules can provide extra meta-information, > logging, etc. to simplify debugging namespace package setups. > > * It's possible to freeze such setups, to put them into ZIP files, > or only have parts of it in a ZIP file and the other parts in the > file-system. > > * There's no need for a package directory scan, allowing the > mechanism to also work with resources that do not permit to > (easily and efficiently) scan the contents of a package "directory", > e.g. frozen packages or imports from web resources. 
> > Caveats: > > * Changes to sys.path will not result in an automatic rescan for > additional namespace packages, if the package was already loaded. > However, we could have a function to make such a rescan explicit. > """ > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Apr 07 2009) >>>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ > > ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: > > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/cournape%40gmail.com > From regebro at gmail.com Tue Apr 7 17:34:37 2009 From: regebro at gmail.com (Lennart Regebro) Date: Tue, 7 Apr 2009 17:34:37 +0200 Subject: [Python-Dev] What's missing from easy_install In-Reply-To: References: Message-ID: <319e029f0904070834j47066061h10c3c9aafe9dd9c9@mail.gmail.com> On Tue, Apr 7, 2009 at 15:05, KDr2 wrote: > I need an CPyAN. On the lighter side of things: That would be pronounced "spy-ann", which means "the vomit" in Swedish. Do you still want it? :-D -- Lennart Regebro: Pythonista, Barista, Notsotrista.
http://regebro.wordpress.com/ +33 661 58 14 64 From fuzzyman at voidspace.org.uk Tue Apr 7 17:41:06 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 07 Apr 2009 16:41:06 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> Message-ID: <49DB7412.9030404@voidspace.org.uk> Mark Dickinson wrote: > [snip...] > > Discussion points > ================= > > (1) Any objections to including this into py3k? If there's > controversy, then I guess we'll need a PEP. > Big +1 > (2) Should other Python implementations (Jython, > IronPython, etc.) be expected to use short float repr, or should > it just be considered an implementation detail of CPython? > I propose the latter, except that all implementations should > be required to satisfy eval(repr(x)) == x for finite floats x. > Short float repr should be an implementation detail, so long as eval(repr(x)) == x still holds. Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From p.f.moore at gmail.com Tue Apr 7 17:51:35 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 7 Apr 2009 16:51:35 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <79990c6b0904070850l7513d9b7y2863d347d87d7e6f@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49DB7412.9030404@voidspace.org.uk> <79990c6b0904070850l7513d9b7y2863d347d87d7e6f@mail.gmail.com> Message-ID: <79990c6b0904070851k9ea2054o4864ccd4fb0c9b35@mail.gmail.com> It would have helped if I'd copied the list... Sorry, Paul. 2009/4/7 Paul Moore : > 2009/4/7 Michael Foord : >> Mark Dickinson wrote: >>> >>> [snip...] >>> ?Discussion points >>> ================= >>> >>> (1) Any objections to including this into py3k? ?If there's >>> controversy, then I guess we'll need a PEP. 
>>> >> >> Big +1 >>> >>> (2) Should other Python implementations (Jython, >>> IronPython, etc.) be expected to use short float repr, or should >>> it just be considered an implementation detail of CPython? >>> I propose the latter, except that all implementations should >>> be required to satisfy eval(repr(x)) == x for finite floats x. >>> >> >> Short float repr should be an implementation detail, so long as >> eval(repr(x)) == x still holds. > > What he said :-) > Paul. > From eric at trueblade.com Tue Apr 7 17:55:34 2009 From: eric at trueblade.com (Eric Smith) Date: Tue, 07 Apr 2009 11:55:34 -0400 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> Message-ID: <49DB7776.3010500@trueblade.com> Mark Dickinson wrote: > One PyCon 2009 sprint later, Eric Smith and I have > produced the py3k-short-float-repr branch, which implements > short repr of floats and also does some major cleaning > up of the current float formatting functions. > We've gone for the {fast, correct} pairing. > We'd like to get this into Python 3.1. > > Any thoughts/objections/counter-proposals/...? As part of the decision process, we've tried this on several buildbots, and it has been successful on at least: AMD64 Gentoo: http://www.python.org/dev/buildbot/3.x/amd64%20gentoo%203.x/builds/592 PPC Debian unstable: http://www.python.org/dev/buildbot/3.x/ppc%20Debian%20unstable%203.x/builds/584 Sparc Solaris 10: http://www.python.org/dev/buildbot/3.x/sparc%20solaris10%20gcc%203.x/builds/493 The Sparc test failed, but that wasn't our fault! Our tests succeeded. These builds are in addition to x86 Linux and x86 Mac, which we've developed on. Eric. From aahz at pythoncraft.com Tue Apr 7 18:01:31 2009 From: aahz at pythoncraft.com (Aahz) Date: Tue, 7 Apr 2009 09:01:31 -0700 Subject: [Python-Dev] Shorter float repr in Python 3.1? 
In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> Message-ID: <20090407160130.GA1220@panix.com> On Tue, Apr 07, 2009, Mark Dickinson wrote: > > Executive summary (details and discussion points below) > ================= > > Some time ago, Noam Raphael pointed out that for a float x, > repr(x) can often be much shorter than it currently is, without > sacrificing the property that eval(repr(x)) == x, and proposed > changing Python accordingly. See > > http://bugs.python.org/issue1580 Sounds good to me! -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From guido at python.org Tue Apr 7 18:19:37 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Apr 2009 09:19:37 -0700 Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382: Namespace Packages) In-Reply-To: <49DB4624.604@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <49DB4624.604@egenix.com> Message-ID: On Tue, Apr 7, 2009 at 5:25 AM, M.-A. Lemburg wrote: > On 2009-04-06 15:21, Jesse Noller wrote: >> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >>> On 2009-04-02 17:32, Martin v. L?wis wrote: >>>> I propose the following PEP for inclusion to Python 3.1. >>> Thanks for picking this up. >>> >>> I'd like to extend the proposal to Python 2.7 and later. >>> >> >> -1 to adding it to the 2.x series. There was much discussion around >> adding features to 2.x *and* 3.0, and the consensus seemed to *not* >> add new features to 2.x and use those new features as carrots to help >> lead people into 3.0. 
> > I must have missed that discussion :-) > > Where's the PEP pinning this down ? > > The Python 2.x user base is huge and the number of installed > applications even larger. > > Cutting these users and application developers off of important new > features added to Python 3 is only going to work as "carrot" for > those developers who: > > ?* have enough resources (time, money, manpower) to port their existing > ? application to Python 3 > > ?* can persuade their users to switch to Python 3 > > ?* don't rely much on 3rd party libraries (the bread and butter > ? of Python applications) > > Realistically, such a porting effort is not likely going to happen > for any decent sized application, except perhaps a few open source > ones. > > Such a policy would then translate to a dead end for Python 2.x > based applications. Think of the advantages though! Python 2 will finally become *stable*. The group of users you are talking to are usually balking at the thought of upgrading from 2.x to 2.(x+1) just as much as they might balk at the thought of Py3k. We're finally giving them what they really want. Regarding calling this a dead end, we're committed to supporting 2.x for at least five years. If that's not enough, well, it's open source, so there's no reason why some group of rogue 2.x fans can't maintain it indefinitely after that. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 7 18:25:53 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Apr 2009 09:25:53 -0700 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: <200904071010.16855.steve@pearwood.info> <18907.28413.42458.631358@montanaro.dyndns.org> Message-ID: Well I'm sorry Cesare but this is unacceptable. As Skip points out there is plenty of code that relies on this. Also, consider what "problem" you are trying to solve here. What is the benefit to the user of moving this error to compile time? 
I cannot see any. --Guido On Tue, Apr 7, 2009 at 8:19 AM, Cesare Di Mauro wrote: > In data 07 aprile 2009 alle ore 17:19:25, ha scritto: > >> >> ? ? Cesare> The only difference at this time is regards invalid operations, >> ? ? Cesare> which will raise exceptions at compile time, not at running >> ? ? Cesare> time. >> >> ? ? Cesare> So if you write: >> >> ? ? Cesare> a = 1 / 0 >> >> ? ? Cesare> an exception will be raised at compile time. >> >> I think I have to call *bzzzzt* here. ?This is a common technique used >> during debugging. ?Insert a 1/0 to force an exception (possibly causing the >> running program to drop into pdb). ?I think you have to leave that in. >> >> Skip > > Many tests rely on this, and I have changed them from something like: > > try: > ? 1 / 0 > except: > ?.... > > to > > try: > ?a = 1; a / 0 > except: > ?.... > > But I know that it's a major source of incompatibilities, and in the final > code I'll enabled it only if user demanded it (through a flag). > > Cesare > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Tue Apr 7 18:34:49 2009 From: aahz at pythoncraft.com (Aahz) Date: Tue, 7 Apr 2009 09:34:49 -0700 Subject: [Python-Dev] calling dictresize outside dictobject.c In-Reply-To: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu> References: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu> Message-ID: <20090407163449.GA10119@panix.com> On Mon, Apr 06, 2009, Dan Schult wrote: > > I'm trying to write a C extension which is a subclass of dict. > I want to do something like a setdefault() but with a single lookup. python-dev is for core development, not for questions about using Python. Please use comp.lang.python or the capi-sig list. 
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From cesare.dimauro at a-tono.com Tue Apr 7 18:46:29 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 7 Apr 2009 18:46:29 +0200 (CEST) Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: <200904071010.16855.steve@pearwood.info> <18907.28413.42458.631358@montanaro.dyndns.org> Message-ID: <56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com> On Tue, Apr 7, 2009 06:25PM, Guido van Rossum wrote: > Well I'm sorry Cesare but this is unacceptable. As Skip points out > there is plenty of code that relies on this. Guido, as I already said, in the final code the normal Python behaviour will be kept, and the stricter one will be enabled solely due to a flag set by the user. > Also, consider what > "problem" you are trying to solve here. What is the benefit to the > user of moving this error to compile time? I cannot see any. > > --Guido In my experience it's better to discover a bug at compile time rather than at running time. Cesare > On Tue, Apr 7, 2009 at 8:19 AM, Cesare Di Mauro > wrote: >> In data 07 aprile 2009 alle ore 17:19:25, ha scritto: >> >>> >>> ? ? Cesare> The only difference at this time is regards invalid >>> operations, >>> ? ? Cesare> which will raise exceptions at compile time, not at running >>> ? ? Cesare> time. >>> >>> ? ? Cesare> So if you write: >>> >>> ? ? Cesare> a = 1 / 0 >>> >>> ? ? Cesare> an exception will be raised at compile time. >>> >>> I think I have to call *bzzzzt* here. ?This is a common technique used >>> during debugging. ?Insert a 1/0 to force an exception (possibly causing >>> the >>> running program to drop into pdb). 
?I think you have to leave that in. >>> >>> Skip >> >> Many tests rely on this, and I have changed them from something like: >> >> try: >> ? 1 / 0 >> except: >> ?.... >> >> to >> >> try: >> ?a = 1; a / 0 >> except: >> ?.... >> >> But I know that it's a major source of incompatibilities, and in the >> final >> code I'll enabled it only if user demanded it (through a flag). >> >> Cesare >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > > From guido at python.org Tue Apr 7 19:22:15 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Apr 2009 10:22:15 -0700 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com> References: <200904071010.16855.steve@pearwood.info> <18907.28413.42458.631358@montanaro.dyndns.org> <56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com> Message-ID: On Tue, Apr 7, 2009 at 9:46 AM, Cesare Di Mauro wrote: > On Tue, Apr 7, 2009 06:25PM, Guido van Rossum wrote: >> Well I'm sorry Cesare but this is unacceptable. As Skip points out >> there is plenty of code that relies on this. > > Guido, as I already said, in the final code the normal Python behaviour > will be kept, and the stricter one will be enabled solely due to a flag > set by the user. Ok. >> Also, consider what >> "problem" you are trying to solve here. What is the benefit to the >> user of moving this error to compile time? I cannot see any. >> >> --Guido > > In my experience it's better to discover a bug at compile time rather than > at running time. That's my point though, which you seem to be ignoring: if the user explicitly writes "1/0" it is not likely to be a bug. 
That's very different than "1/x" where x happens to take on zero at runtime -- *that* is likely bug, but a constant folder can't detect that (at least not for Python). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From cournape at gmail.com Tue Apr 7 19:41:10 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 8 Apr 2009 02:41:10 +0900 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com> <5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com> Message-ID: <5b8d13220904071041w36087a87rf84c8b52defc02c0@mail.gmail.com> On Wed, Apr 8, 2009 at 2:24 AM, Heikki Toivonen wrote: > David Cournapeau wrote: >> The hard (or rather time consuming) work is to do everything else that >> distutils does related to the packaging. That's where scons/waf are >> more interesting than cmake IMO, because you can "easily" give up this >> task back to distutils, whereas it is inherently more difficult with >> cmake. > > I think this was the first I heard about using SCons this way. Do you > have any articles or examples of this? If not, could you perhaps write one? I developed numscons as an experiment to build numpy, scipy, and other complex python projects depending on many library/compilers: http://github.com/cournape/numscons/tree/master The general ideas are somewhat explained on my blog http://cournape.wordpress.com/?s=numscons And also the slides from SciPy08 conf: http://conference.scipy.org/static/wiki/numscons.pdf It is plugged into distutils through a scons command (which bypasses all the compiled build_* ones, so that the whole build is done through scons for correct dependency handling). 
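The "plugged into distutils" part boils down to a custom command whose run() just assembles a command line and shells out to the external tool. A tool-agnostic sketch of that hand-off (the helper names are invented, and numscons itself does considerably more, though -j and -Q are real scons options):

```python
# Sketch of delegating a build to an external tool such as scons: a
# custom distutils command's run() would do little more than call
# run_scons().  Helper names are illustrative, not numscons's API.
import subprocess


def scons_build_command(jobs=None, targets=(), silent=False):
    """Assemble the command line a delegating build step would run."""
    cmd = ['scons']
    if jobs:
        cmd.append('-j%d' % jobs)  # scons's parallel-build option
    if silent:
        cmd.append('-Q')           # suppress scons's progress chatter
    cmd.extend(targets)
    return cmd


def run_scons(**kwds):
    # From here on, dependency tracking is the external tool's job,
    # not distutils'.
    subprocess.check_call(scons_build_command(**kwds))
```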
It is not really meant as a general replacement (it is too fragile, partly because of distutils, partly because of scons, partly because of me), but it shows it is possible not only theoretically. cheers, David From pje at telecommunity.com Tue Apr 7 19:46:21 2009 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 07 Apr 2009 13:46:21 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49DB6A1F.50801@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> Message-ID: <20090407174355.B62983A4063@sparrow.telecommunity.com> At 04:58 PM 4/7/2009 +0200, M.-A. Lemburg wrote: >On 2009-04-07 16:05, P.J. Eby wrote: > > At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote: > >> >> Wouldn't it be better to stick with a simpler approach and look for > >> >> "__pkg__.py" files to detect namespace packages using that O(1) > >> check ? > >> > > >> > Again - this wouldn't be O(1). More importantly, it breaks system > >> > packages, which now again have to deal with the conflicting file names > >> > if they want to install all portions into a single location. > >> > >> True, but since that means changing the package infrastructure, I think > >> it's fair to ask distributors who want to use that approach to also take > >> care of looking into the __pkg__.py files and merging them if > >> necessary. > >> > >> Most of the time the __pkg__.py files will be empty, so that's not > >> really much to ask for. > > > > This means your proposal actually doesn't add any benefit over the > > status quo, where you can have an __init__.py that does nothing but > > declare the package a namespace. We already have that now, and it > > doesn't need a new filename. Why would we expect OS vendors to start > > supporting it, just because we name it __pkg__.py instead of __init__.py? > >I lost you there. 
> >Since when do we support namespace packages in core Python without >the need to add some form of magic support code to __init__.py ? > >My suggestion basically builds on the same idea as Martin's PEP, >but uses a single __pkg__.py file as opposed to some non-Python >file yaddayadda.pkg. Right... which completely obliterates the primary benefit of the original proposal compared to the status quo. That is, that the PEP 382 way is more compatible with system packaging tools. Without that benefit, there's zero gain in your proposal over having __init__.py files just call pkgutil.extend_path() (in the stdlib since 2.3, btw) or pkg_resources.declare_namespace() (similar functionality, but with zipfile support and some other niceties). IOW, your proposal doesn't actually improve the status quo in any way that I am able to determine, except that it calls for loading all the __pkg__.py modules, rather than just the first one. (And the setuptools implementation of namespace packages actually *does* load multiple __init__.py's, so that's still no change over the status quo for setuptools-using packages.) From cesare.dimauro at a-tono.com Tue Apr 7 19:51:45 2009 From: cesare.dimauro at a-tono.com (Cesare Di Mauro) Date: Tue, 7 Apr 2009 19:51:45 +0200 (CEST) Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: <200904071010.16855.steve@pearwood.info> <18907.28413.42458.631358@montanaro.dyndns.org> <56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com> Message-ID: <62037.151.53.159.5.1239126705.squirrel@webmail6.pair.com> On Tue, Apr 7, 2009 07:22PM, Guido van Rossum wrote: >> In my experience it's better to discover a bug at compile time rather >> than >> at running time. > > That's my point though, which you seem to be ignoring: if the user > explicitly writes "1/0" it is not likely to be a bug. 
That's very > different than "1/x" where x happens to take on zero at runtime -- > *that* is likely bug, but a constant folder can't detect that (at > least not for Python). > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) I agree. My only concern was about user mistyping that can lead to an error interceptable by a stricter constant folder. But I admit that it's a rarer case compared to an explicit exception raising such as the one you showed. Cesare From ajaksu at gmail.com Tue Apr 7 20:25:53 2009 From: ajaksu at gmail.com (Daniel (ajax) Diniz) Date: Tue, 7 Apr 2009 15:25:53 -0300 Subject: [Python-Dev] Mercurial? In-Reply-To: References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> <49DA7C91.6010202@v.loewis.de> Message-ID: <2d75d7660904071125o3e132dabg4f250a52755e81dd@mail.gmail.com> Dirkjan Ochtman wrote: > One of the nicer features of Mercurial/DVCSs, in my experience, is > that non-committers get to keep the credit on their patches. That > means that it's impossible to enforce a policy more extensive than > some basic checks (such as the format above). Unless we keep a list of > people who have signed an agreement, which will mean people will have > to re-do the username on commits that don't constitute a non-trivial > contribution. Maybe it'd be better to first replicate the current workflow, shortcomings and all, then later discuss a new policy? That would mean no credits for non-committers should come from the VCS alone: those come from commit messages, the ACKS file, copyright notices in source, etc. BTW, keep in mind some people will prefer to submit diff-generated, non-hg patches. IMO, this use case should be supported before the rich-patch one. Regards, Daniel From dirkjan at ochtman.nl Tue Apr 7 20:32:46 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 7 Apr 2009 20:32:46 +0200 Subject: [Python-Dev] Mercurial?
In-Reply-To: <2d75d7660904071125o3e132dabg4f250a52755e81dd@mail.gmail.com> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> <49DA7C91.6010202@v.loewis.de> <2d75d7660904071125o3e132dabg4f250a52755e81dd@mail.gmail.com> Message-ID: On Tue, Apr 7, 2009 at 20:25, Daniel (ajax) Diniz wrote: > BTW, keep in mind some people will prefer to submit diff-generated, > non-hg patches. IMO, this use case should be supported before the > rich-patch one. Sure, that will be in the PEP as well (and it's quite simple). Cheers, Dirkjan From brtzsnr at gmail.com Tue Apr 7 20:59:01 2009 From: brtzsnr at gmail.com (=?UTF-8?Q?Alexandru_Mo=C8=99oi?=) Date: Tue, 7 Apr 2009 21:59:01 +0300 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues Message-ID: > From: "Cesare Di Mauro" > So if Python will generate > > LOAD_CONST 1 > LOAD_CONST 2 > BINARY_ADD > > the constant folding code will simply replace them with a single > > LOAD_CONST 3 > > When working with such kind of optimizations, the temptation is to > apply them at any situation possible. For example, in other languages > this > > a = b * 2 * 3 > > will be replaced by > > a = b * 6 > > In Python I can't do that, because b can be an object which overloaded > the * operator, so it *must* be called two times, one for 2 and one for 3. Not necessarily. For example C/C++ doesn't define the order of the operations inside an expression (and AFAIK neither Python) and therefore folding 2 * 3 is OK whether b is an integer or an arbitrary object with mul operator overloaded. Moreover one would expect * to be associative and commutative (take a look at Python strings); if a * 2 * 3 returns a different result from a * 6 I will find it very surprising and probably reject such code.
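Whether an overloaded `*` really must be invoked once per operator, which is the crux of the exchange above, is easy to check directly in Python. A minimal editor's sketch (the `Tracked` class is illustrative, not code from the thread):

```python
# Count how many times an overloaded * is dispatched.
class Tracked:
    def __init__(self):
        self.calls = 0

    def __mul__(self, other):
        self.calls += 1
        return self

b = Tracked()
b * 2 * 3          # parsed left-associatively as (b * 2) * 3
print(b.calls)     # one call per * operator: 2

c = Tracked()
c * (2 * 3)        # 2 * 3 is reduced to 6 first, so only one dispatch
print(c.calls)     # 1
```

So folding `b * 2 * 3` into `b * 6` would change observable behavior for such objects, which is why the thread treats it as off-limits for a Python constant folder.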
However you can fix the order of operations like this: a = (b * 2) * 3 or a = b * (2 * 3) or a = b * 2 a = a * 3 -- Alexandru Moșoi http://alexandru.mosoi.googlepages.com From fredrik.johansson at gmail.com Tue Apr 7 21:09:26 2009 From: fredrik.johansson at gmail.com (Fredrik Johansson) Date: Tue, 7 Apr 2009 21:09:26 +0200 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: <3d0cebfb0904071209m12b0d587vffc2057454ba5363@mail.gmail.com> On Tue, Apr 7, 2009 at 8:59 PM, Alexandru Moșoi wrote: > Not necessarily. For example C/C++ doesn't define the order of the > operations inside an expression (and AFAIK neither Python) and > therefore folding 2 * 3 is OK whether b is an integer or an arbitrary > object with mul operator overloaded. Moreover one would expect * to be > associative and commutative (take a look at Python strings); if a * 2 > * 3 returns a different result from a * 6 I will find it very > surprising and probably reject such code. Multiplication is not associative for floats: >>> a = 0.1 >>> a*3*5 1.5000000000000002 >>> a*(3*5) 1.5 Fredrik From martin at v.loewis.de Tue Apr 7 21:20:47 2009 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Apr 2009 21:20:47 +0200 Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382: Namespace Packages) In-Reply-To: <49DB4624.604@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <49DB4624.604@egenix.com> Message-ID: <49DBA78F.7010904@v.loewis.de> > Such a policy would then translate to a dead end for Python 2.x > based applications. 2.x based applications *are* in a dead end, with the only exit being portage to 3.x.
Regards, Martin From firephoenix at wanadoo.fr Tue Apr 7 21:30:19 2009 From: firephoenix at wanadoo.fr (Firephoenix) Date: Tue, 07 Apr 2009 21:30:19 +0200 Subject: [Python-Dev] Generator methods - "what's next" ? In-Reply-To: <49D9371C.3000202@canterbury.ac.nz> References: <49D896A4.3000104@wanadoo.fr> <49D9371C.3000202@canterbury.ac.nz> Message-ID: <49DBA9CB.6010100@wanadoo.fr> Greg Ewing a écrit : > > Firephoenix wrote: > >> I basically agreed with renaming the next() method to __next__(), so >> as to follow the naming of other similar methods (__iter__() etc.). >> But I noticed then that all the other methods of the generator had >> stayed the same (send, throw, close...) > > Keep in mind that next() is part of the iterator protocol > that applies to all iterators, whereas the others are > specific to generators. By your reasoning, any object that > has any __xxx__ methods should have all its other methods > turned into __xxx__ methods as well. > Indeed, I kind of mixed up generators with the wider family of iterators. From martin at v.loewis.de Tue Apr 7 21:51:16 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Apr 2009 21:51:16 +0200 Subject: [Python-Dev] Mercurial? In-Reply-To: <200904072142.06158.steve@pearwood.info> References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au> <200904072142.06158.steve@pearwood.info> Message-ID: <49DBAEB4.1070007@v.loewis.de> > Ben is correct: you can't assume that contributors will have both a > first name and a last name, or that a first name and last name is > sufficient to legally identify them. Those from Spanish and Portuguese > cultures usually have two family names as well as a personal name; > people from Indonesian, Burmese and Malaysian cultures often only use a > single name. That's why I'm asking for a policy. We have to have *some* way of identifying where a certain change originated from.
I'm sure there is a solution, and it doesn't matter to me whether I need to identify myself as "Martin v. Löwis" or "Martin von Löwis of Menar". Regards, Martin From jared.grubb at gmail.com Tue Apr 7 21:55:10 2009 From: jared.grubb at gmail.com (Jared Grubb) Date: Tue, 7 Apr 2009 12:55:10 -0700 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: <2BA445ED-A541-4EA0-8BEC-6D0C469F971E@gmail.com> On 7 Apr 2009, at 11:59, Alexandru Moșoi wrote: > Not necessarily. For example C/C++ doesn't define the order of the > operations inside an expression (and AFAIK neither Python) and > therefore folding 2 * 3 is OK whether b is an integer or an arbitrary > object with mul operator overloaded. Moreover one would expect * to be > associative and commutative (take a look at Python strings); if a * 2 > * 3 returns a different result from a * 6 I will find it very > surprising and probably reject such code. That's not true. All ops in C/C++ have associativity that is fixed and well-defined; the star op is left-associative: 2*3*x is (2*3)*x is 6*x x*2*3 is (x*2)*3, and this is NOT x*6 (You can show this in C++ by creating a class that has a side-effect in its * operator). The star operator is not commutative in Python or C/C++ (otherwise what would __rmul__ do?). It's easier to see that + is not commutative: "abc"+"def" and "def"+"abc" are definitely different! You may be confusing the "order is undefined" for the evaluation of parameter lists in C/C++. Example: foo(f(),g()) calls f and g in an undefined order. Jared From martin at v.loewis.de Tue Apr 7 21:59:09 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Apr 2009 21:59:09 +0200 Subject: [Python-Dev] $Id$ and sys.subversion (Was: Mercurial?)
In-Reply-To: References: <20090404154049.GA23987@panix.com> <873acl7zga.fsf@benfinney.id.au> <200904072142.06158.steve@pearwood.info> Message-ID: <49DBB08D.3090208@v.loewis.de> One issue that the PEP needs to address is what to do with the files that use svn (really, CVS) keywords, and what should happen to sys.subversion. Along with it goes the question what sys.version should say. It probably would be good if somebody could produce a patch that can be applied to a mercurial checkout that gets these things right (perhaps a Mercurial branch in itself?). Subversion-specific code is both in configure.in, Makefile.pre.in, and PCbuild/make_buildinfo.c (not sure whether that would still be needed). Regards, Martin From tjreedy at udel.edu Tue Apr 7 23:04:43 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 07 Apr 2009 17:04:43 -0400 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: References: Message-ID: Daniel Fetchinson wrote: > The decorator module [1] written by Michele Simionato is a very useful > tool for maintaining function signatures while applying a decorator. > Many different projects implement their own versions of the same > functionality, for example turbogears has its own utility for this, I > guess others do something similar too. > > Was the issue whether to include this module in the stdlib raised? If > yes, what were the arguments against it? If not, what do you folks > think, shouldn't it be included? I certainly think it should be. > > Originally I sent this message to c.l.p [2] and Michele suggested it > be brought up on python-dev. He also pointed out that a PEP [3] is > already written about this topic and it is in draft form. > > What do you guys think, wouldn't this be a useful addition to functools? 
> [1] http://pypi.python.org/pypi/decorator > [2] http://groups.google.com/group/comp.lang.python/browse_thread/thread/d4056023f1150fe0 > [3] http://www.python.org/dev/peps/pep-0362/ This probably should have gone to the python-ideas list. In any case, I think it needs to start with a clear offer from Michele (directly or relayed by you) to contribute it to the PSF with the usual conditions. From tjreedy at udel.edu Tue Apr 7 23:09:13 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 07 Apr 2009 17:09:13 -0400 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: <62037.151.53.159.5.1239126705.squirrel@webmail6.pair.com> References: <200904071010.16855.steve@pearwood.info> <18907.28413.42458.631358@montanaro.dyndns.org> <56851.151.53.159.5.1239122789.squirrel@webmail6.pair.com> <62037.151.53.159.5.1239126705.squirrel@webmail6.pair.com> Message-ID: Cesare Di Mauro wrote: > On Tue, Apr 7, 2009 07:22PM, Guido van Rossum wrote: >>> In my experience it's better to discover a bug at compile time rather >>> than >>> at running time. >> That's my point though, which you seem to be ignoring: if the user >> explicitly writes "1/0" it is not likely to be a bug. That's very >> different than "1/x" where x happens to take on zero at runtime -- >> *that* is likely bug, but a constant folder can't detect that (at >> least not for Python). >> >> -- >> --Guido van Rossum (home page: http://www.python.org/~guido/) > > I agree. My only concern was about user mistyping that can leed to an > error interceptable by a stricter constant folder. > > But I admit that it's a rarer case compared to an explicit exception > raising such the one you showed. I would guess that it is so rare as to not be worth bothering about. 
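The folding behavior this thread keeps circling back to can be observed directly with `compile()` in a current CPython. An editor's sketch, not code from the thread (exact bytecode details vary by version, so it only inspects `co_consts`):

```python
# Constant-constant expressions are folded at compile time: only the
# folded result survives into the code object's constants.
folded = compile("1 + 2", "<example>", "eval")
print(folded.co_consts)   # the tuple contains the folded 3

# "1/0" still compiles fine: the folder refuses to evaluate an expression
# that would raise, so the error stays at runtime, exactly as Guido wants
# for code that writes 1/0 on purpose.
code = compile("1/0", "<example>", "eval")
try:
    eval(code)
except ZeroDivisionError:
    print("raised at run time, not at compile time")

# b * 2 * 3 cannot be folded: b is not a constant, and the expression
# associates as (b * 2) * 3, so 2 and 3 remain separate constants.
unfolded = compile("b * 2 * 3", "<example>", "eval")
print(sorted(c for c in unfolded.co_consts if isinstance(c, int)))
```

This matches Cesare's LOAD_CONST description above and shows why the "move 1/0 to compile time" behavior under discussion would be a deliberate change, not the default.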
From tjreedy at udel.edu Tue Apr 7 23:11:41 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 07 Apr 2009 17:11:41 -0400 Subject: [Python-Dev] pyc files, constant folding and borderline portability issues In-Reply-To: References: Message-ID: Alexandru Moșoi wrote: >> From: "Cesare Di Mauro" >> So if Python will generate >> >> LOAD_CONST 1 >> LOAD_CONST 2 >> BINARY_ADD >> >> the constant folding code will simply replace them with a single >> >> LOAD_CONST 3 >> >> When working with such kind of optimizations, the temptation is to >> apply them at any situation possible. For example, in other languages >> this >> >> a = b * 2 * 3 >> >> will be replaced by >> >> a = b * 6 >> >> In Python I can't do that, because b can be an object which overloaded >> the * operator, so it *must* be called two times, one for 2 and one for 3. > > Not necessarily. For example C/C++ doesn't define the order of the > operations inside an expression (and AFAIK neither Python) Yes it does. Expression/Evaluation order "Python evaluates expressions from left to right." From alex.neundorf at kitware.com Tue Apr 7 23:42:48 2009 From: alex.neundorf at kitware.com (Alexander Neundorf) Date: Tue, 7 Apr 2009 23:42:48 +0200 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com> <5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com> Message-ID: <806d41050904071442u405f6473t6888b848dd5a6922@mail.gmail.com> On Tue, Apr 7, 2009 at 3:23 PM, David Cournapeau wrote: > On Tue, Apr 7, 2009 at 10:08 PM, Alexander Neundorf > wrote: > >> >> What is involved in building python extensions ? Can you please explain ?
> > Not much: at the core, a python extension is nothing more than a > dynamically loaded library + a couple of options. CMake has support (slightly but intentionally undocumented) for this, from FindPythonLibs.cmake: # PYTHON_ADD_MODULE( src1 src2 ... srcN) is used to build modules for python. # PYTHON_WRITE_MODULES_HEADER() writes a header file you can include # in your sources to initialize the static python modules Using python_add_module(name file1.c file2.c) you can build python modules, and decide at cmake time whether it should be a dynamically loaded module (default) or whether it should be built as a static library (useful for platforms without shared libs). Installation then happens simply via install(TARGETS ...) > One choice is whether to take options from distutils or to set them up What options ? > independently. In my own scons tool to build python extensions, both > are possible. > > The hard (or rather time consuming) work is to do everything else that > distutils does related to the packaging. That's where scons/waf are > more interesting than cmake IMO, because you can "easily" give up this > task back to distutils, whereas it is inherently more difficult with > cmake. Can you please explain ? It is easy to run external tools with cmake at cmake time and at build time, and it is also possible to run them at install time. Alex From skip at pobox.com Tue Apr 7 23:29:30 2009 From: skip at pobox.com (skip at pobox.com) Date: Tue, 7 Apr 2009 16:29:30 -0500 Subject: [Python-Dev] ANN: deps extension (fwd) Message-ID: <18907.50618.40170.430005@montanaro.dyndns.org> I know the subject of external dependencies came up here in the discussion about Mercurial. I just saw this on the Mercurial mailing list. Perhaps it will be of interest to our hg mavens. Skip -------------- next part -------------- An embedded message was scrubbed... 
From: =?ISO-8859-1?Q?Martin_Vejn=E1r?= Subject: ANN: deps extension Date: Tue, 07 Apr 2009 22:09:38 +0200 Size: 7245 URL: From greg.ewing at canterbury.ac.nz Wed Apr 8 00:43:05 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Apr 2009 10:43:05 +1200 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> Message-ID: <49DBD6F9.7030502@canterbury.ac.nz> David Cournapeau wrote: > Having a full > fledged language for complex builds is nice, I think most familiar > with complex makefiles would agree with this. Yes, people will still need general computation in their build process from time to time whether the build tool they're using supports it or not. And if it doesn't, they'll resort to some ungodly mash such as Makefile+ shell+m4. Python has got to be a better choice than that. -- Greg From alex.neundorf at kitware.com Wed Apr 8 00:54:12 2009 From: alex.neundorf at kitware.com (Alexander Neundorf) Date: Wed, 8 Apr 2009 00:54:12 +0200 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <49DBD6F9.7030502@canterbury.ac.nz> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> Message-ID: <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> On Wed, Apr 8, 2009 at 12:43 AM, Greg Ewing wrote: > David Cournapeau wrote: >> >> Having a full >> fledged language for complex builds is nice, I think most familiar >> with complex makefiles would agree with this. 
> > Yes, people will still need general computation in their > build process from time to time whether the build tool > they're using supports it or not. I'm maintaining the CMake-based buildsystem for KDE4 since 3 years now in my sparetime, millions lines of code, multiple code generators, all major operating systems. My experience is that people don't need general computation in their build process. CMake supports now more general purpose programming features than it did 2 years ago, e.g. it has now functions with local variables, it can do simple math, regexps and other things. If we get to the point where this is not enough, it usually means a real program which does real work is required. In this case it's actually a good thing to have this as a separate tool, and not mixed into the buildsystem. Having a not very powerful, but therefor domain specific language for the buildsystem is really a feature :-) (even if it sounds wrong in the first moment). >From what I saw when I was building Python I didn't actually see too complicated things. In KDE4 we are not only building and installing programs, but we are also installing and shipping a development platform. This includes CMake files which contain functionality which helps in developing KDE software, i.e. variables and a bunch of KDE-specific macros. They are documented here: http://api.kde.org/cmake/modules.html#module_FindKDE4Internal (this is generated automatically from the cmake file we ship). I guess something similar could be useful for Python, maybe this is what distutils actually do ? I.e. they help with developing python-standard-conformant software ? This could be solved easily if python would install a cmake file which provides the necessary utility functions/macros. 
Alex From tleeuwenburg at gmail.com Wed Apr 8 00:59:39 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 8 Apr 2009 08:59:39 +1000 Subject: [Python-Dev] Is there an issue with bugs.python.org currently Message-ID: <43c8685c0904071559p6be5f274r945ac8c6258217d9@mail.gmail.com> Sadly, my work firewall/proxy often handles things badly, so I can't actually tell. Is bugs.python.org accepting changes at the moment (I'm trying to update the Stage of an issue)? Cheers, -T -- -------------------------------------------------- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think" -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Apr 8 01:58:54 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 08 Apr 2009 11:58:54 +1200 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> Message-ID: <49DBE8BE.3090208@canterbury.ac.nz> Alexander Neundorf wrote: > My experience is that people don't need > general computation in their build process. > ... > CMake supports now more general purpose programming features than it > did 2 years ago, e.g. it has now functions with local variables, it > can do simple math, regexps and other things. In other words, it's growing towards being able to do general computation. Why is it doing that, if people don't need general computation in their build process? > If we get to the point where this is not enough, it usually means a > real program which does real work is required. 
> In this case it's actually a good thing to have this as a separate > tool, and not mixed into the buildsystem. There's some merit in that idea, but the build tool and the program need to work together smoothly somehow. If the build tool is implemented in Python, there's more chance of that happening (e.g. the Python code can import parts of the build system and call them directly, rather than having to generate a file in some other language). -- Greg From cournape at gmail.com Wed Apr 8 04:11:33 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 8 Apr 2009 11:11:33 +0900 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <806d41050904071442u405f6473t6888b848dd5a6922@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <806d41050904070608x3f1f025bu18df4f1c843e7357@mail.gmail.com> <5b8d13220904070623j258605bob5200dc84362dc11@mail.gmail.com> <806d41050904071442u405f6473t6888b848dd5a6922@mail.gmail.com> Message-ID: <5b8d13220904071911i1bc9ae8ah616e55fdbc080e83@mail.gmail.com> On Wed, Apr 8, 2009 at 6:42 AM, Alexander Neundorf wrote: > What options ? Compilation options. If you build an extension with distutils, the extension is built with the same flags as the ones used by python, the options are taken from distutils.sysconfig (except for MS compilers, which has its own options, which is one of the big pain in distutils). > > Can you please explain ? If you want to stay compatible with distutils, you have to do quite a lot of things. 
Cmake (and scons, and waf) only handle the build, but they can't handle all the packaging done by distutils (tarballs generation, binaries generation, in place build, develop mode of setuptools, eggs, .pyc and .pyo generation, etc...), so you have two choices: add support for this in the build tool (lot of work) or just use distutils once everything is built with your tool of choice. > It is easy to run external tools with cmake at cmake time and at build > time, and it is also possible to run them at install time. Sure, what kind of build tool could not do that :) But given the design of distutils, if you want to keep all its packaging features, you can't just launch a few commands, you have to integrate them somewhat. Every time you need something from distutils, you would need to launch python for cmake, whether with scons/waf, you can just use it as you would use any python library. That's just inherent to the fact that waf/scons are in the same language as distutils; if we were doing ocaml builds, having a build tool in ocaml would have been easier, etc...
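The "options taken from distutils.sysconfig" that David mentions earlier amount to a few standard-library calls; an external build tool queries them and compiles with the same settings Python itself was built with. A minimal editor's sketch using the `sysconfig` spelling (the `distutils.sysconfig` module of that era exposes the same data):

```python
import sysconfig

# Where Python.h lives, so an external tool can add it to the include path.
include_dir = sysconfig.get_paths()["include"]

# Filename suffix the built extension must use (e.g. a ".so" variant).
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")

# Compiler flags Python was built with (may be None on Windows builds).
cflags = sysconfig.get_config_var("CFLAGS")

print("include:", include_dir)
print("extension suffix:", ext_suffix)
print("cflags:", (cflags or "")[:60])
```

This is the easy part; as the message above says, the time-consuming work is reproducing the rest of distutils' packaging behavior, not fetching these values.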
David From cournape at gmail.com Wed Apr 8 04:18:16 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 8 Apr 2009 11:18:16 +0900 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> Message-ID: <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> On Wed, Apr 8, 2009 at 7:54 AM, Alexander Neundorf wrote: > On Wed, Apr 8, 2009 at 12:43 AM, Greg Ewing wrote: >> David Cournapeau wrote: >>> >>> Having a full >>> fledged language for complex builds is nice, I think most familiar >>> with complex makefiles would agree with this. >> >> Yes, people will still need general computation in their >> build process from time to time whether the build tool >> they're using supports it or not. > > I'm maintaining the CMake-based buildsystem for KDE4 since 3 years now > in my sparetime, millions lines of code, multiple code generators, all > major operating systems. My experience is that people don't need > general computation in their build process. > CMake supports now more general purpose programming features than it > did 2 years ago, e.g. it has now functions with local variables, it > can do simple math, regexps and other things. > If we get to the point where this is not enough, it usually means a > real program which does real work is required. > In this case it's actually a good thing to have this as a separate > tool, and not mixed into the buildsystem. > Having a not very powerful, but therefor domain specific language for > the buildsystem is really a feature :-) > (even if it sounds wrong in the first moment). 
Yes, there are some advantages to that. The point of python is to have the same language for the build specification and the extensions, in my mind. For extensions, you really need a full language - for example, if you want to add support for tools which generate unknown files in advance, and handle this correctly from a build POV, a macro-like language is not sufficient. > > >From what I saw when I was building Python I didn't actually see too > complicated things. In KDE4 we are not only building and installing > programs, but we are also installing and shipping a development > platform. This includes CMake files which contain functionality which > helps in developing KDE software, i.e. variables and a bunch of > KDE-specific macros. They are documented here: > http://api.kde.org/cmake/modules.html#module_FindKDE4Internal > (this is generated automatically from the cmake file we ship). > I guess something similar could be useful for Python, maybe this is > what distutils actually do ? distutils does roughly everything that autotools does, and more: - configuration: not often used in extensions, we (numpy) are the exception I would guess - build - installation - tarball generation - bdist_ installers (msi, .exe on windows, .pkg/.mpkg on mac os x, rpm/deb on Linux) - registration to pypi - more things which just ellude me at the moment cheers, David From tleeuwenburg at gmail.com Wed Apr 8 04:54:27 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 8 Apr 2009 12:54:27 +1000 Subject: [Python-Dev] http://bugs.python.org/issue2240 Message-ID: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com> This issue has been largely resolved, but there is an outstanding bug where the (reviewed and committed) solution does not work on certain versions of FreeBSD (broken in 6.3, working in 7+). Do we have a list of 'supported platforms', and is FreeBSD 6.3 in it? What's the policy with regards to supporting dependencies like this? 
Should I set this issue to 'pending' seeing as no-one is currently working on a patch for this? Or is leaving this open and hanging around exactly the right thing to do? Cheers, -T -- -------------------------------------------------- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think" -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed Apr 8 04:58:59 2009 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 7 Apr 2009 23:58:59 -0300 Subject: [Python-Dev] calling dictresize outside dictobject.c In-Reply-To: <20090407163449.GA10119@panix.com> References: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu> <20090407163449.GA10119@panix.com> Message-ID: Did you read the post until the end? The OP is asking a question related to a very low-level detail of the dict implementation and making an offer to write a patch that could speed up dict.setdefault() in core CPython... IMHO, a poll on python-dev does make sense... On Tue, Apr 7, 2009 at 1:34 PM, Aahz wrote: > On Mon, Apr 06, 2009, Dan Schult wrote: >> >> I'm trying to write a C extension which is a subclass of dict. >> I want to do something like a setdefault() but with a single lookup. > > python-dev is for core development, not for questions about using Python. > Please use comp.lang.python or the capi-sig list. > -- > Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ > > "...string iteration isn't about treating strings as sequences of strings, > it's about treating strings as sequences of characters. The fact that > characters are also strings is the reason we have problems, but characters > are strings for other good reasons." 
--Aahz > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/dalcinl%40gmail.com > -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From ggpolo at gmail.com Wed Apr 8 05:14:13 2009 From: ggpolo at gmail.com (Guilherme Polo) Date: Wed, 8 Apr 2009 00:14:13 -0300 Subject: [Python-Dev] http://bugs.python.org/issue2240 In-Reply-To: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com> References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com> Message-ID: On Tue, Apr 7, 2009 at 11:54 PM, Tennessee Leeuwenburg wrote: > This issue has been largely resolved, but there is an outstanding bug where > the (reviewed and committed) solution does not work on certain versions of > FreeBSD (broken in 6.3, working in 7+). Do we have a list of 'supported > platforms', and is FreeBSD 6.3 in it? > > What's the policy with regards to supporting dependencies like this? Should > I set this issue to 'pending' seeing as no-one is currently working on a > patch for this? Or is leaving this open and hanging around exactly the right > thing to do? > > I would find it more appropriate to close this as fixed because the issue was about adding setitimer and getitimer wrappers and that is done. We could then create another issue regarding this bug in specific versions of FreeBSD against this setitimer/getitimer wrapper. That is what makes more sense to me. 
> Cheers, > -T > > > > -- > -------------------------------------------------- > Tennessee Leeuwenburg > http://myownhat.blogspot.com/ > "Don't believe everything you think" > Regards, -- -- Guilherme H. Polo Goncalves From tleeuwenburg at gmail.com Wed Apr 8 05:24:04 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 8 Apr 2009 13:24:04 +1000 Subject: [Python-Dev] http://bugs.python.org/issue2240 In-Reply-To: References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com> Message-ID: <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com> On Wed, Apr 8, 2009 at 1:14 PM, Guilherme Polo wrote: > On Tue, Apr 7, 2009 at 11:54 PM, Tennessee Leeuwenburg > wrote: > > This issue has been largely resolved, but there is an outstanding bug > where > > the (reviewed and committed) solution does not work on certain versions > of > > FreeBSD (broken in 6.3, working in 7+). Do we have a list of 'supported > > platforms', and is FreeBSD 6.3 in it? > > > > What's the policy with regards to supporting dependencies like this? > Should > > I set this issue to 'pending' seeing as no-one is currently working on a > > patch for this? Or is leaving this open and hanging around exactly the > right > > thing to do? > > > > I would find more appropriate to close this as fixed because the issue > was about adding setitimer and getitimer wrappers and that is done. > > We could then create another issue regarding this bug in specific > versions of freebsd towards this setitimer/getitimer wrapper. That is > what makes more sense to me. Hi Guilherme, I'd agree with that. I just wonder whether it's necessary to create another issue, or whether the issue can be marked as 'fixed' without opening the new issue. It seems like the bug relates only to an older version of a 'weird' operating system and could perhaps be left unfixed without causing anyone any problems. Cheers, -T -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barry at python.org Wed Apr 8 05:28:15 2009 From: barry at python.org (Barry Warsaw) Date: Tue, 7 Apr 2009 23:28:15 -0400 Subject: [Python-Dev] RELEASED Python 2.6.2 candidate 1 Message-ID: <67987D03-6D96-4601-A0C5-08B987A81F3B@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I'm happy to announce the release of Python 2.6.2 candidate 1. This release contains dozens of bug fixes since Python 2.6.1. Please see the NEWS file for a detailed list of changes. Barring unforeseen problems, Python 2.6.2 final will be released within a few days. http://www.python.org/download/releases/2.6.2/NEWS.txt For more information on Python 2.6 please see http://docs.python.org/dev/whatsnew/2.6.html Source tarballs and Windows installers for this release candidate can be downloaded from the Python 2.6.2 page: http://www.python.org/download/releases/2.6.2/ Bugs can be reported in the Python bug tracker: http://bugs.python.org Enjoy, Barry Barry Warsaw barry at python.org Python 2.6/3.0 Release Manager (on behalf of the entire python-dev team) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSdwZ0HEjvBPtnXfVAQJTsAP+Krt1F6qGjuk9a7q8HwF2oAWr/peIAfDf 7HGjOpieoyyAKO1ZNqWvxZ1Ftx+I0YHjfk5OKz/1FN9H3eteFU/L5EEbJD1iTSmK LAOycWWtWJp+OPatqveHZbGr4ap4XON05yMrzlewnnIH0iGnYjMAgxKkwVKA7MwN BiXDeBPba1A= =HdKG -----END PGP SIGNATURE----- From michele.simionato at gmail.com Wed Apr 8 06:09:14 2009 From: michele.simionato at gmail.com (Michele Simionato) Date: Wed, 8 Apr 2009 06:09:14 +0200 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: References: Message-ID: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> On Tue, Apr 7, 2009 at 11:04 PM, Terry Reedy wrote: > > This probably should have gone to the python-ideas list. ?In any case, I > think it needs to start with a clear offer from Michele (directly or relayed > by you) to contribute it to the PSF with the usual conditions. 
I have no problem to contribute the module to the PSF and to maintain it. I would just prefer to have the ability to change the function signature in the core language rather than include in the standard library a clever hack. M. Simionato From stephen at xemacs.org Wed Apr 8 07:19:03 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Apr 2009 14:19:03 +0900 Subject: [Python-Dev] http://bugs.python.org/issue2240 In-Reply-To: <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com> References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com> <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com> Message-ID: <87zlerhge0.fsf@xemacs.org> Tennessee Leeuwenburg writes: > I'd agree with that. I just wonder whether it's necessary to create another > issue, or whether the issue can be marked as 'fixed' without opening the new > issue. Opening a new issue has the effect of running a poll of those who watch such issues on the tracker (in particular, I'd grandfather the nosy list). You could even set the new issue to pending at that time. From tleeuwenburg at gmail.com Wed Apr 8 07:40:42 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 8 Apr 2009 15:40:42 +1000 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour Message-ID: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> Now, I know that sets aren't ordered, but... foo = set([1,2,3,4,5]) bar = [1,2,3,4,5] foo.pop() will reliably return 1 while bar.pop() will return 5 discuss :) Cheers, -T -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at rcn.com Wed Apr 8 07:47:45 2009 From: python at rcn.com (Raymond Hettinger) Date: Tue, 7 Apr 2009 22:47:45 -0700 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> Message-ID: <937C9D2AC5034C3AB2551839855A5D99@RaymondLaptop1> [Tennessee Leeuwenburg ] > Now, I know that sets aren't ordered, but... > > foo = set([1,2,3,4,5]) > bar = [1,2,3,4,5] > > foo.pop() will reliably return 1 > while bar.pop() will return 5 > > discuss :) If that's what you need: http://code.activestate.com/recipes/576694/ Raymond From asmodai at in-nomine.org Wed Apr 8 07:55:41 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Wed, 8 Apr 2009 07:55:41 +0200 Subject: [Python-Dev] http://bugs.python.org/issue2240 In-Reply-To: <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com> References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com> <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com> Message-ID: <20090408055541.GA13110@nexus.in-nomine.org> -On [20090408 05:24], Tennessee Leeuwenburg (tleeuwenburg at gmail.com) wrote: >It seems like the bug relates only to an older version of a 'weird' >operating system and could perhaps be left unfixed without causing >anyone any problems. Being one of the FreeBSD guys I'll throw peanuts at you. :P In any case, 6.3 is from early 2008 and 6.4 is from November 2008. The 6-STABLE branch is still open and a lot of users are still tracking this. However, the main focus is 7 and with 8 looming on the horizon. And FreeBSD 7 does away with libc_r and uses a whole different model for its threading. Are the tests going ok there? If so, then I shouldn't worry about the 6 branch. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Few are those who see with their own eyes and feel with their own hearts... 
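A footnote on Raymond's pointer a couple of messages up: recipe 576694 gives Tennessee a set with a defined removal order by tracking insertion order next to the hash table. Below is a minimal sketch in the same spirit, not the recipe itself: it leans on dict's insertion-order guarantee (Python 3.7+, which postdates this thread) instead of the recipe's set plus doubly-linked list, and the class name is just illustrative.

```python
# Minimal insertion-ordered set, in the spirit of recipe 576694 but not
# identical to it: a dict (insertion-ordered since Python 3.7) stands in
# for the recipe's set + doubly-linked list.

class OrderedSet:
    def __init__(self, iterable=()):
        self._items = dict.fromkeys(iterable)  # keys only, values unused

    def add(self, item):
        self._items[item] = None

    def pop(self):
        # Unlike set.pop(), the removal order here is defined: FIFO.
        if not self._items:
            raise KeyError('pop from an empty OrderedSet')
        item = next(iter(self._items))
        del self._items[item]
        return item

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

s = OrderedSet([1, 2, 3, 4, 5])
print(s.pop())   # 1 -- the first element inserted, reliably
print(list(s))   # [2, 3, 4, 5]
```

With this, `OrderedSet([1, 2, 3, 4, 5]).pop()` really does "reliably return 1", which plain set.pop() only appears to do.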
From jackdied at gmail.com Wed Apr 8 08:10:15 2009 From: jackdied at gmail.com (Jack diederich) Date: Wed, 8 Apr 2009 02:10:15 -0400 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> Message-ID: On Wed, Apr 8, 2009 at 12:09 AM, Michele Simionato wrote: > On Tue, Apr 7, 2009 at 11:04 PM, Terry Reedy wrote: >> >> This probably should have gone to the python-ideas list. ?In any case, I >> think it needs to start with a clear offer from Michele (directly or relayed >> by you) to contribute it to the PSF with the usual conditions. > > I have no problem to contribute the module to the PSF and to maintain it. > I would just prefer to have the ability to change the function signature in > the core language rather than include in the standard library a clever hack. Flipping Michele's commit bit (if he wants it) is overdue. A quick google doesn't show he refused it in the past, but the same search shows the things things he did do - including the explication of MRO in 2.3 (http://www.python.org/download/releases/2.3/mro/). Plus he's a softie for decorators, as am I. -Jack From jbarham at gmail.com Wed Apr 8 08:13:19 2009 From: jbarham at gmail.com (John Barham) Date: Tue, 7 Apr 2009 23:13:19 -0700 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> Message-ID: <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> Tennessee Leeuwenburg wrote: > Now, I know that sets aren't ordered, but... > > foo = set([1,2,3,4,5]) > bar = [1,2,3,4,5] > > foo.pop() will reliably return 1 > while bar.pop() will return 5 > > discuss :) As designed. 
If you play around a bit it becomes clear that what set.pop() returns is independent of the insertion order: PythonWin 2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit (Intel)] on win32. >>> foo = set([5,4,3,2,1]) # Order reversed from above >>> foo.pop() 1 >>> foo = set([-1,0,1,2,3,4,5]) >>> foo.pop() 0 >>> foo = set([-1,1,2,3,4,5]) >>> foo.pop() 1 As the documentation says (http://docs.python.org/library/stdtypes.html#set.pop) set.pop() is free to return an arbitrary element. list.pop() however always returns the last element of the list, unless of course you specify some other index: http://docs.python.org/library/stdtypes.html#mutable-sequence-types, point 6. John From dickinsm at gmail.com Wed Apr 8 08:44:35 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Wed, 8 Apr 2009 07:44:35 +0100 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> Message-ID: <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> On Wed, Apr 8, 2009 at 7:13 AM, John Barham wrote: > If you play around a bit it becomes clear that what set.pop() returns > is independent of the insertion order: It might look like that, but I don't think this is true in general (at least, with the current implementation): >>> foo = set([1, 65537]) >>> foo.pop() 1 >>> foo = set([65537, 1]) >>> foo.pop() 65537 Mark From michele.simionato at gmail.com Wed Apr 8 09:17:17 2009 From: michele.simionato at gmail.com (Michele Simionato) Date: Wed, 8 Apr 2009 09:17:17 +0200 Subject: [Python-Dev] decorator module in stdlib? 
In-Reply-To: References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> Message-ID: <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> On Wed, Apr 8, 2009 at 8:10 AM, Jack diederich wrote: > Plus he's a softie for decorators, as am I. I must admit that while I still like decorators, I do not like them as much as in the past. I also see an overuse of decorators in various libraries for things that could be done more clearly without them ;-( But this is tangential. What I would really like to know is the future of PEP 362, i.e. having a signature object that could be taken from an undecorated function and added to the decorated function. I do not recall people having anything against it, in principle, and there is also an implementation in the sandbox, but after three years nothing happened. I guess this is just not a high priority for the core developers. From solipsis at pitrou.net Wed Apr 8 11:42:49 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Apr 2009 09:42:49 +0000 (UTC) Subject: [Python-Dev] slightly inconsistent set/list pop behaviour References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> Message-ID: Mark Dickinson gmail.com> writes: > > On Wed, Apr 8, 2009 at 7:13 AM, John Barham gmail.com> wrote: > > If you play around a bit it becomes clear that what set.pop() returns > > is independent of the insertion order: > > It might look like that, but I don't think this is > true in general (at least, with the current implementation): Not to mention that other implementations (Jython, etc.) will probably exhibit yet different behaviour, and the CPython hash functions are not engraved in stone either. If you want to write portable code, you can't rely on *any* reproducible ordering for random set member access. Regards Antoine. 
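Antoine's point is easy to make concrete. The sketch below (Python 3 syntax) assumes two CPython details: small ints hash to themselves, and a value's first probe slot is its hash masked by the table size minus one, with a fresh set starting at 8 slots. `probe_slot` is an invented helper for illustration, not a real API, and other implementations need not behave this way at all.

```python
# Sketch of why set.pop() order follows the hash table, not insertion
# order. Assumes CPython's layout: an 8-slot initial table and
# first probe slot = hash(value) & (table_size - 1).

def probe_slot(value, table_size=8):
    """First slot `value` hashes to in a table of `table_size` slots."""
    return hash(value) & (table_size - 1)

# Small ints hash to themselves, so 0 and 8 compete for slot 0, and
# 1 and 65537 compete for slot 1 -- whichever is inserted first tends
# to win the slot, which is how insertion order leaks into pop().
print(probe_slot(0), probe_slot(8))      # 0 0
print(probe_slot(1), probe_slot(65537))  # 1 1

# The only portable way to get a deterministic element out of a set
# is to impose an ordering yourself:
print(min({8, 0}))        # 0, regardless of insertion order
print(sorted({3, 1, 2}))  # [1, 2, 3]
```

Masking by `table_size - 1` is just the usual trick for reducing a hash modulo a power-of-two table; what happens on a collision (the probing sequence) is a further implementation detail.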
From tleeuwenburg at gmail.com Wed Apr 8 12:57:07 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 8 Apr 2009 20:57:07 +1000 Subject: [Python-Dev] http://bugs.python.org/issue2240 In-Reply-To: <20090408055541.GA13110@nexus.in-nomine.org> References: <43c8685c0904071954j49d66c23v322cde4fc1114030@mail.gmail.com> <43c8685c0904072024o756b275cu69962e8731faf162@mail.gmail.com> <20090408055541.GA13110@nexus.in-nomine.org> Message-ID: <43c8685c0904080357i19ab2f1u628222b97875a131@mail.gmail.com> On Wed, Apr 8, 2009 at 3:55 PM, Jeroen Ruigrok van der Werven < asmodai at in-nomine.org> wrote: > -On [20090408 05:24], Tennessee Leeuwenburg (tleeuwenburg at gmail.com) > wrote: > >It seems like the bug relates only to an older version of a 'weird' > >operating system and could perhaps be left unfixed without causing > >anyone any problems. > > Being one of the FreeBSD guys I'll throw peanuts at you. :P > > In any case, 6.3 is from early 2008 and 6.4 is from November 2008. The > 6-STABLE branch is still open and a lot of users are still tracking this. > > However, the main focus is 7 and with 8 looming on the horizon. And FreeBSD > 7 does away with libc_r and uses a whole different model for its threading. > Are the tests going ok there? If so, then I shouldn't worry about the 6 > branch. :) Thanks for your input. I've done the paper shuffling so someone else can pick up the FreeBSD cleanup job as a new issue... Cheers, -T -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jackdied at gmail.com Wed Apr 8 12:57:25 2009 From: jackdied at gmail.com (Jack diederich) Date: Wed, 8 Apr 2009 06:57:25 -0400 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> Message-ID: On Wed, Apr 8, 2009 at 2:44 AM, Mark Dickinson wrote: > On Wed, Apr 8, 2009 at 7:13 AM, John Barham wrote: >> If you play around a bit it becomes clear that what set.pop() returns >> is independent of the insertion order: > > It might look like that, but I don't think this is > true in general (at least, with the current implementation): > >>>> foo = set([1, 65537]) >>>> foo.pop() > 1 >>>> foo = set([65537, 1]) >>>> foo.pop() > 65537 You wrote a program to find the two smallest ints that would have a hash collision in the CPython set implementation? I'm impressed. And by impressed I mean frightened. -Jack From solipsis at pitrou.net Wed Apr 8 13:10:21 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Apr 2009 11:10:21 +0000 (UTC) Subject: [Python-Dev] Dropping bytes "support" in json Message-ID: Hello, We're in the process of forward-porting the recent (massive) json updates to 3.1, and we are also thinking of dropping remnants of support of the bytes type in the json library (in 3.1, again). This bytes support almost didn't work at all, but there was a lot of C and Python code for it nevertheless. We're also thinking of dropping the "encoding" argument in the various APIs, since it is useless. Under the new situation, json would only ever allow str as input, and output str as well. By posting here, I want to know whether anybody would oppose this (knowing, once again, that bytes support is already broken in the current py3k trunk). 
The bug entry is: http://bugs.python.org/issue4136 Regards Antoine. From steve at holdenweb.com Wed Apr 8 13:57:09 2009 From: steve at holdenweb.com (Steve Holden) Date: Wed, 08 Apr 2009 07:57:09 -0400 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> Message-ID: Jack diederich wrote: > On Wed, Apr 8, 2009 at 2:44 AM, Mark Dickinson wrote: >> On Wed, Apr 8, 2009 at 7:13 AM, John Barham wrote: >>> If you play around a bit it becomes clear that what set.pop() returns >>> is independent of the insertion order: >> It might look like that, but I don't think this is >> true in general (at least, with the current implementation): >> >>>>> foo = set([1, 65537]) >>>>> foo.pop() >> 1 >>>>> foo = set([65537, 1]) >>>>> foo.pop() >> 65537 > > You wrote a program to find the two smallest ints that would have a > hash collision in the CPython set implementation? I'm impressed. And > by impressed I mean frightened. > Given the two numbers in question (1, 2**16+1) I suspect this is the result of analysis rather than algorithm. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Watch PyCon on video now! 
http://pycon.blip.tv/ From gripho66 at gmail.com Wed Apr 8 13:58:04 2009 From: gripho66 at gmail.com (Andrea Griffini) Date: Wed, 8 Apr 2009 13:58:04 +0200 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> Message-ID: On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich wrote: > You wrote a program to find the two smallest ints that would have a > hash collision in the CPython set implementation? ?I'm impressed. ?And > by impressed I mean frightened. ? print set([0,8]).pop(), set([8,0]).pop() Andrea From ideasman42 at gmail.com Wed Apr 8 14:04:11 2009 From: ideasman42 at gmail.com (Campbell Barton) Date: Wed, 8 Apr 2009 05:04:11 -0700 Subject: [Python-Dev] PyCFunction_* Missing Message-ID: <7c1ab96d0904080504o3b58b1bdvedd31ac872239921@mail.gmail.com> Hi, Just noticed the new Python 2.6.2 docs now dont have any reference to * PyCFunction_New * PyCFunction_NewEx * PyCFunction_Check * PyCFunction_Call Ofcourse these are still in the source code but Im wondering if this is intentional that these functions should be for internal use only? -- - Campbell From duncan.booth at suttoncourtenay.org.uk Wed Apr 8 14:30:05 2009 From: duncan.booth at suttoncourtenay.org.uk (Duncan Booth) Date: Wed, 8 Apr 2009 12:30:05 +0000 (UTC) Subject: [Python-Dev] slightly inconsistent set/list pop behaviour References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> Message-ID: Andrea Griffini wrote: > On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich > wrote: >> You wrote a program to find the two smallest ints that would have a >> hash collision in the CPython set implementation? ?I'm impressed. >> ?And by impressed I mean frightened. 
> > ? > > print set([0,8]).pop(), set([8,0]).pop() If 'smallest ints' means the sum of the absolute values then these are slightly smaller: >>> print set([-1,6]).pop(), set([6,-1]).pop() 6 -1 From p.f.moore at gmail.com Wed Apr 8 14:58:31 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 8 Apr 2009 13:58:31 +0100 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> Message-ID: <79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com> 2009/4/8 Duncan Booth : > Andrea Griffini wrote: > >> On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich >> wrote: >>> You wrote a program to find the two smallest ints that would have a >>> hash collision in the CPython set implementation? ?I'm impressed. >>> ?And by impressed I mean frightened. >> >> ? >> >> print set([0,8]).pop(), set([8,0]).pop() > > If 'smallest ints' means the sum of the absolute values then these are > slightly smaller: > >>>> print set([-1,6]).pop(), set([6,-1]).pop() > 6 -1 Can't resist: >>> print set([-2,-1]).pop(), set([-1,-2]).pop() -1 -2 Paul. 
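Jack's "you wrote a program" quip aside, Steve is right that analysis suffices: with an 8-slot initial table, any two ints that agree modulo 8 contend for the same slot. For completeness, the brute-force program is only a few lines. This is a sketch assuming CPython's hash-and-mask slot selection; `colliding_pairs` is invented for illustration.

```python
# Brute-force hunt for int pairs whose first probe slot coincides in a
# fresh CPython set (8 slots). Assumes hash(n) == n for small non-negative
# ints and slot = hash(n) & (table_size - 1); both are CPython details.

def colliding_pairs(candidates, table_size=8):
    """Yield (earlier, later) pairs that first probe the same slot."""
    mask = table_size - 1
    first_seen = {}
    for n in candidates:
        slot = hash(n) & mask
        if slot in first_seen:
            yield first_seen[slot], n
        else:
            first_seen[slot] = n

print(list(colliding_pairs(range(16)))[:3])  # [(0, 8), (1, 9), (2, 10)]

# Duncan's (-1, 6) pair comes from another CPython detail: hash(-1) is
# -2 (because -1 is reserved as an error code in the C API), and
# -2 & 7 == 6, the same slot 6 itself probes.
print(hash(-1), hash(-1) & 7)  # -2 6
```

So the "smallest" colliding pairs fall straight out of the arithmetic, no search required.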
From steve at holdenweb.com Wed Apr 8 17:14:09 2009 From: steve at holdenweb.com (Steve Holden) Date: Wed, 08 Apr 2009 11:14:09 -0400 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: <79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com> References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> <79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com> Message-ID: Paul Moore wrote: > 2009/4/8 Duncan Booth : >> Andrea Griffini wrote: >> >>> On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich >>> wrote: >>>> You wrote a program to find the two smallest ints that would have a >>>> hash collision in the CPython set implementation? I'm impressed. >>>> And by impressed I mean frightened. >>> ? >>> >>> print set([0,8]).pop(), set([8,0]).pop() >> If 'smallest ints' means the sum of the absolute values then these are >> slightly smaller: >> >>>>> print set([-1,6]).pop(), set([6,-1]).pop() >> 6 -1 > > Can't resist: > >>>> print set([-2,-1]).pop(), set([-1,-2]).pop() > -1 -2 > >>> a = 0.001 >>> b = 0.002 >>> print set([a, b]).pop(), set([b, a]).pop() 0.002 0.001 Let's stop here ... regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Watch PyCon on video now! 
http://pycon.blip.tv/ From p.f.moore at gmail.com Wed Apr 8 17:38:35 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 8 Apr 2009 16:38:35 +0100 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> <79990c6b0904080558y3b9bb5a2kf4976b104b65baa1@mail.gmail.com> Message-ID: <79990c6b0904080838h12f520f8g96d5197214d820e@mail.gmail.com> 2009/4/8 Steve Holden : > Paul Moore wrote: >> 2009/4/8 Duncan Booth : >>> Andrea Griffini wrote: >>> >>>> On Wed, Apr 8, 2009 at 12:57 PM, Jack diederich >>>> wrote: >>>>> You wrote a program to find the two smallest ints that would have a >>>>> hash collision in the CPython set implementation? ?I'm impressed. >>>>> ?And by impressed I mean frightened. >>>> ? >>>> >>>> print set([0,8]).pop(), set([8,0]).pop() >>> If 'smallest ints' means the sum of the absolute values then these are >>> slightly smaller: >>> >>>>>> print set([-1,6]).pop(), set([6,-1]).pop() >>> 6 -1 >> >> Can't resist: >> >>>>> print set([-2,-1]).pop(), set([-1,-2]).pop() >> -1 -2 >> >>>> a = 0.001 >>>> b = 0.002 >>>> print set([a, b]).pop(), set([b, a]).pop() > 0.002 0.001 Cheat! We were using integers... :-) Paul. From jbaker at zyasoft.com Wed Apr 8 17:50:55 2009 From: jbaker at zyasoft.com (Jim Baker) Date: Wed, 8 Apr 2009 09:50:55 -0600 Subject: [Python-Dev] Contributor Agreements for Patches - was [Jython-dev] Jython on Google AppEngine! Message-ID: A question that arose on this thread, which I'm forwarding for context (and we're quite happy about it too!): - What is the scope of a patch that requires a contributor agreement? 
This particular patch on #1188 simply adds obvious (in retrospect, of course) handling of SecurityException so that it's treated in a similar fashion to IOException (possibly a bit more buried), so it seems like a minor patch. - Do Google employees, working on company time, automatically get treated as contributors with existing contributor agreements on file with the PSF? If so, are there other companies that automatically get this treatment? - Should we change the workflow for roundup to make this assignment of license clearer (see Tobias's idea in the thread about a click-through agreement). In these matters, Jython, as a project under the Python Software Foundation, intends to follow the same policy as CPython. - Jim ---------- Forwarded message ---------- From: Frank Wierzbicki Date: Wed, Apr 8, 2009 at 9:32 AM Subject: Re: [Jython-dev] Jython on Google AppEngine! To: James Robinson Cc: Jython Developers , Alan Kennedy < jython-dev at xhaus.com> On Wed, Apr 8, 2009 at 11:22 AM, James Robinson wrote: > I submitted 1188 and I'm a Google employee working on company time. Let me > know if anything further is needed, but we have quite a few contributors to > the Python project working here. Excellent, and thanks! 1188 was already slated for inclusion in our upcoming RC, but knowing that it is in support of GAE moves it up to a very high priority. -Frank ------------------------------------------------------------------------------ This SF.net email is sponsored by: High Quality Requirements in a Collaborative Environment. Download a free trial of Rational Requirements Composer Now! 
URL: From python at rcn.com Wed Apr 8 17:51:29 2009 From: python at rcn.com (Raymond Hettinger) Date: Wed, 8 Apr 2009 08:51:29 -0700 Subject: [Python-Dev] Dropping bytes "support" in json References: Message-ID: > We're in the process of forward-porting the recent (massive) json updates to > 3.1, and we are also thinking of dropping remnants of support of the bytes type > in the json library (in 3.1, again). This bytes support almost didn't work at > all, but there was a lot of C and Python code for it nevertheless. We're also > thinking of dropping the "encoding" argument in the various APIs, since it is > useless. > > Under the new situation, json would only ever allow str as input, and output str > as well. By posting here, I want to know whether anybody would oppose this > (knowing, once again, that bytes support is already broken in the current py3k > trunk). +1 Raymond From jbaker at zyasoft.com Wed Apr 8 17:53:38 2009 From: jbaker at zyasoft.com (Jim Baker) Date: Wed, 8 Apr 2009 09:53:38 -0600 Subject: [Python-Dev] Contributor Agreements for Patches - was [Jython-dev] Jython on Google AppEngine! In-Reply-To: References: Message-ID: Oops, didn't attach the entire thread, so see below: On Wed, Apr 8, 2009 at 9:50 AM, Jim Baker wrote: > A question that arose on this thread, which I'm forwarding for context (and > we're quite happy about it too!): > > - What is the scope of a patch that requires a contributor agreement? > This particular patch on #1188 simply adds obvious (in retrospect of course) > handling on SecurityException so that it's treated in a similar fashion to > IOException (possibly a bit more buried), so it seems like a minor patch. > - Do Google employees, working on company time, automatically get > treated as contributors with existing contributor agreements on file with > the PSF? If so, are there are other companies that automatically get this > treatment? 
> - Should we change the workflow for roundup to make this assignment of > license clearer (see Tobias's idea in the thread about a click-though > agreement). > > In these matters, Jython, as a project under the Python Software > Foundation, intends to follow the same policy as CPython. > > - Jim > Forwarded conversation Subject: [Jython-dev] Jython on Google AppEngine! ------------------------ From: *Alan Kennedy* Date: Wed, Apr 8, 2009 at 6:37 AM To: Jython Developers , jython users < jython-users at lists.sourceforge.net> Hi all, As you may know, Google announced Java for AppEngine yesterday! http://googleappengine.blogspot.com/2009/04/seriously-this-time-new-language-on-app.html And they're also supporting all of the various languages that run on the JVM, including jython. http://groups.google.com/group/google-appengine-java/web/will-it-play-in-app-engine They say about jython """ - Jython 2.2 works out of the box. - Jython 2.5 requires patches which we'll supply until the changes make it directly into Jython: - jython-r5996-patched-for-appengine.jar is the complete jython binary library, patched for app engine - jython-r5996-appengine.patch is the patch file that contains the source code for the changes """ They provide the patches they used to make 2.5 work http://google-appengine-java.googlegroups.com/web/jython-r5996-appengine.patch I definitely think this is an important patch to consider for the 2.5RC! It would be nice if Google could say Jython 2.2 works out of the box, and jython 2.5 works out of the box. Alan. ------------------------------------------------------------------------------ This SF.net email is sponsored by: High Quality Requirements in a Collaborative Environment. Download a free trial of Rational Requirements Composer Now! 
http://p.sf.net/sfu/www-ibm-com _______________________________________________ Jython-dev mailing list Jython-dev at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/jython-dev ---------- From: *Tobias Ivarsson* Date: Wed, Apr 8, 2009 at 8:18 AM To: Alan Kennedy Cc: Jython Developers Most things in that patch look ok. I'd like to do a more thorough analysis of the implications of each change though. The catching of SecurityException is fine, but I want to look at the places where they drop the exceptions that they caught in their context, and make sure that silently ignoring the exception is a valid approach. The other changes are few but slightly more controversial. Are Google willing to sign a contributors agreement and license this patch to us? otherwise someone who has not looked on it yet (i.e. not me), should probably experiment with Jython on GAE and find out what needs to be patched to get Jython to run there. /Tobias ---------- From: *Jim Baker* Date: Wed, Apr 8, 2009 at 8:33 AM To: Alan Kennedy Cc: Jython Developers , jython users < jython-users at lists.sourceforge.net> This is the same patch set requested in http://bugs.jython.org/issue1188: "Patch against trunk to handle SecurityExceptions". Now we know the source of the request, and the specific application is very clear: a sandboxed Jython, running under a fairly strict security manager. The bug is a blocker for the release candidate, so this fix will be part of 2.5. 
We would love to see more work testing the full scope of environments Jython needs to run under, and any resulting bugs. - Jim -- Jim Baker jbaker at zyasoft.com ---------- From: *James Robinson* Date: Wed, Apr 8, 2009 at 8:30 AM To: Tobias Ivarsson Cc: Jython Developers , Alan Kennedy < jython-dev at xhaus.com> I have a patch up on your issue tracker already, I'll ping it shortly. It's a very small patch and the SecurityExceptions that are caught and ignored are treated the same as I/O exceptions in the vast majority of cases (which they really are). - James ---------- From: *Jim Baker* Date: Wed, Apr 8, 2009 at 8:36 AM To: James Robinson Cc: Tobias Ivarsson , Jython Developers < jython-dev at lists.sourceforge.net>, Alan Kennedy Right, this is a very small patch, we haven't required contributor agreements in similar cases. I think we want to consider how to replicate this setup however so we don't inadvertently reverse things. - Jim ---------- From: *Tobias Ivarsson* Date: Wed, Apr 8, 2009 at 8:40 AM To: Jim Baker Cc: James Robinson , Jython Developers < jython-dev at lists.sourceforge.net>, Alan Kennedy Could we add a click-through agreement for patch submissions? Patches are usually small enough to not be a big deal, but such a thing would leave us entirely safe. /Tobias ---------- From: *Frank Wierzbicki* Date: Wed, Apr 8, 2009 at 8:44 AM To: Tobias Ivarsson Cc: Jython Developers , Alan Kennedy < jython-dev at xhaus.com> Google is a member of the PSF, so as long as Google wants this contributed I think it's okay. 
To be safe we should get an explicit statement, but since the patch is small, this probably isn't strictly necessary. FWIW this is how my on-the-clock contributions to Jython are protected (Sun is a member of the PSF and allows my contributions). -Frank ---------- From: *James Robinson* Date: Wed, Apr 8, 2009 at 9:22 AM To: Frank Wierzbicki Cc: Jython Developers , Alan Kennedy < jython-dev at xhaus.com> I submitted 1188 and I'm a Google employee working on company time. Let me know if anything further is needed, but we have quite a few contributors to the Python project working here. - James ---------- From: *Frank Wierzbicki* Date: Wed, Apr 8, 2009 at 9:33 AM To: Tobias Ivarsson Cc: Jim Baker , Jython Developers < jython-dev at lists.sourceforge.net>, Alan Kennedy A click through is a very good idea, I think Jim is going to find out what they do for CPython. -Frank ---------- From: *Frank Wierzbicki* Date: Wed, Apr 8, 2009 at 9:32 AM To: James Robinson Cc: Jython Developers , Alan Kennedy < jython-dev at xhaus.com> Excellent, and thanks! 1188 was already slated for inclusion in our upcoming RC, but knowing that it is in support of GAE moves it up to a very high priority. -- Jim Baker jbaker at zyasoft.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aahz at pythoncraft.com Wed Apr 8 18:14:59 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 8 Apr 2009 09:14:59 -0700 Subject: [Python-Dev] Update PEP 374 (DVCS) Message-ID: <20090408161459.GA24661@panix.com> Someone listed this URL on c.l.py and I thought it would make a good reference addition to PEP 374 (DVCS decision): http://www.catb.org/~esr/writings/version-control/version-control.html -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "...string iteration isn't about treating strings as sequences of strings, it's about treating strings as sequences of characters. The fact that characters are also strings is the reason we have problems, but characters are strings for other good reasons." --Aahz From guido at python.org Wed Apr 8 19:51:55 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Apr 2009 10:51:55 -0700 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> Message-ID: On Wed, Apr 8, 2009 at 12:17 AM, Michele Simionato wrote: > On Wed, Apr 8, 2009 at 8:10 AM, Jack diederich wrote: >> Plus he's a softie for decorators, as am I. This worries me a bit. There was a remark (though perhaps meant humorously) in Michele's page about decorators that worried me too: "For instance, typical implementations of decorators involve nested functions, and we all know that flat is better than nested." I find the nested-function pattern very clear and easy to grasp, whereas I find using another decorator (a meta-decorator?) to hide this pattern unnecessarily obscuring what's going on. I also happen to disagree in many cases with decorators that attempt to change the signature of the wrapper function to that of the wrapped function. 
While this may make certain kinds of introspection possible, again it obscures what's going on to a future maintainer of the code, and the cleverness can get in the way of good old-fashioned debugging. > I must admit that while I still like decorators, I do like them as > much as in the past. > I also see an overuse of decorators in various libraries for things that could > be done more clearly without them ;-( Right. > But this is tangential. (All this BTW is not to say that I don't trust you with commit privileges if you were to be interested in contributing. I just don't think that adding that particular decorator module to the stdlib would be wise. It can be debated though.) > What I would really like to know is the future of PEP 362, i.e. having > a signature object that could be taken from an undecorated function > and added to the decorated function. > I do not recall people having anything against it, in principle, > and there is also an implementation in the sandbox, but > after three years nothing happened. I guess this is just not > a high priority for the core developers. That's likely true. To me, introspection is mostly useful for certain situations like debugging or interactively finding help, but I would hesitate to build a large amount of stuff (whether a library, framework or application) on systematic use of introspection. In fact, I rarely use the inspect module and had to type help(inspect) to figure out what you meant by "signature". :-) I guess one reason is that in my mind, and in the way I tend to write code, I don't write APIs that require introspection -- for example, I don't like APIs that do different things when given a "callable" as opposed to something else (common practices in web frameworks notwithstanding), and thinking about it I would like it even less if an API cared about the *actual* signature of a function I pass into it. 
I like APIs that say, for example, "argument 'f' must be a function of two arguments, an int and a string," and then I assume that if I pass it something for 'f' it will try to call that something with an int and a string. If I pass it something else, well, I'll get a type error. But it gives me the freedom to pass something that doesn't even have a signature but happens to be callable in that way regardless (e.g. a bound method of a built-in type). I will probably regret saying this. So be it. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Apr 8 20:30:02 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Apr 2009 20:30:02 +0200 Subject: [Python-Dev] slightly inconsistent set/list pop behaviour In-Reply-To: References: <43c8685c0904072240u404fc816u431e354a20c61bf1@mail.gmail.com> <4f34febc0904072313h51f46674nd57efa7de4f52de6@mail.gmail.com> <5c6f2a5d0904072344s1d08f38am4bbc8d523d6f703d@mail.gmail.com> Message-ID: <49DCED2A.4090401@v.loewis.de> >>>>> foo = set([1, 65537]) >>>>> foo.pop() >> 1 >>>>> foo = set([65537, 1]) >>>>> foo.pop() >> 65537 > > You wrote a program to find the two smallest ints that would have a > hash collision in the CPython set implementation? I'm impressed. And > by impressed I mean frightened. Well, Mark is the guy who deals with floating point numbers for fun. *That* should frighten you :-) Martin From martin at v.loewis.de Wed Apr 8 20:33:35 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Apr 2009 20:33:35 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: Message-ID: <49DCEDFF.7050708@v.loewis.de> > We're in the process of forward-porting the recent (massive) json updates to > 3.1, and we are also thinking of dropping remnants of support of the bytes type > in the json library (in 3.1, again). 
This bytes support almost didn't work at > all, but there was a lot of C and Python code for it nevertheless. We're also > thinking of dropping the "encoding" argument in the various APIs, since it is > useless. > > Under the new situation, json would only ever allow str as input, and output str > as well. By posting here, I want to know whether anybody would oppose this > (knowing, once again, that bytes support is already broken in the current py3k > trunk). What does Bob Ippolito think about this change? IIUC, he considers simplejson's speed one of its primary advantages, and also attributes it to the fact that he can parse directly out of byte strings, and marshal into them (which is important, as you typically receive them over the wire). Having to run them through a codec slows parsing down. Regards, Martin From martin at v.loewis.de Wed Apr 8 20:37:39 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 08 Apr 2009 20:37:39 +0200 Subject: [Python-Dev] Contributor Agreements for Patches - was [Jython-dev] Jython on Google AppEngine! In-Reply-To: References: Message-ID: <49DCEEF3.7020603@v.loewis.de> > * What is the scope of a patch that requires a contributor > agreement? Unfortunately, that question was never fully answered (or I forgot what the answer was). > * Do Google employees, working on company time, automatically get > treated as contributors with existing contributor agreements on > file with the PSF? Yes, they do. > If so, are there other companies that > automatically get this treatment? Not that I know of. > * Should we change the workflow for roundup to make this assignment > of license clearer (see Tobias's idea in the thread about a > click-through agreement). I think we do need something written; a lawyer may be able to tell precisely. I still hope that we can record, in the tracker, which contributors have signed an agreement. 
> In these matters, Jython, as a project under the Python Software > Foundation, intends to follow the same policy as CPython. Please keep pushing. From this message alone, I find two questions to the lawyer, and one (possibly two) feature requests for the bug tracker. Regards, Martin From pje at telecommunity.com Wed Apr 8 20:41:13 2009 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 08 Apr 2009 14:41:13 -0400 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> Message-ID: <20090408183847.180053A4063@sparrow.telecommunity.com> At 10:51 AM 4/8/2009 -0700, Guido van Rossum wrote: >I would like it even less if an API cared about the >*actual* signature of a function I pass into it. One notable use of callable argument inspection is Bobo, the 12-years-ago predecessor to Zope, which used argument information to determine form or query string parameter names. (Were Bobo being written for the first time today for Python 3, I imagine it would use argument annotations to specify types, instead of requiring them to be in the client-side field names.) Bobo, of course, is just a single case of the general pattern of tools that expose a callable to some other (possibly explicitly-typed) system. E.g., wrapping Python functions for exposure to C, Java, .NET, CORBA, SOAP, etc. Anyway, it's nice for decorators to be transparent to inspection when the decorator doesn't actually modify the calling signature, so that you can then use your decorated functions with tools like the above. 
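The kind of tool Eby describes, one that reads a callable's argument names to bind request parameters to them, can be sketched in a few lines. This is a hedged, hypothetical illustration (the helper names are invented, not any framework's real API), and it uses the modern `inspect.signature`, which postdates this thread; on current interpreters it follows the `__wrapped__` attribute that `functools.wraps` sets, so a wraps-based decorator stays transparent to such a tool:

```python
import functools
import inspect

def call_with_query_params(handler, params):
    """Hypothetical Bobo-style helper: bind request parameters to the
    handler's named arguments by introspecting its signature."""
    sig = inspect.signature(handler)
    kwargs = {name: params[name] for name in sig.parameters if name in params}
    return handler(**kwargs)

def naive(func):
    """A decorator whose wrapper hides the real signature from tools."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def transparent(func):
    """functools.wraps records the wrapped function as ``__wrapped__``,
    which inspect.signature follows, keeping the decorated function
    usable with introspecting tools like call_with_query_params."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@transparent
def get_user(username, active=True):
    return (username, active)

# The introspecting tool still sees (username, active) through the wrapper
# and binds only the matching parameter, leaving the default for 'active':
print(call_with_query_params(get_user, {'username': 'alice', 'junk': 1}))
```

Had `get_user` been wrapped with `naive` instead, `inspect.signature` would report `(*args, **kwargs)` and the helper would find no bindable parameters, which is exactly the transparency problem being discussed.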
From alex.neundorf at kitware.com Wed Apr 8 21:45:18 2009 From: alex.neundorf at kitware.com (Alexander Neundorf) Date: Wed, 8 Apr 2009 21:45:18 +0200 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> Message-ID: <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> On Wed, Apr 8, 2009 at 4:18 AM, David Cournapeau wrote: ... >> I guess something similar could be useful for Python, maybe this is >> what distutils actually do ? > > distutils does roughly everything that autotools does, and more: > - configuration: not often used in extensions, we (numpy) are the > exception I would guess > - build > - installation > - tarball generation > - bdist_ installers (msi, .exe on windows, .pkg/.mpkg on mac os x, > rpm/deb on Linux) I think cmake can do all of the above (cpack supports creating packages). > - registration to pypi No idea what this is . 
Alex From skip at pobox.com Wed Apr 8 21:53:08 2009 From: skip at pobox.com (skip at pobox.com) Date: Wed, 8 Apr 2009 14:53:08 -0500 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> Message-ID: <18909.164.374915.626585@montanaro.dyndns.org> >> - registration to pypi Alex> No idea what this is . http://pypi.python.org/ It is, in some ways, a CPAN-like system for Python. Skip From ncoghlan at gmail.com Wed Apr 8 23:40:44 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 07:40:44 +1000 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: <20090408183847.180053A4063@sparrow.telecommunity.com> References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <20090408183847.180053A4063@sparrow.telecommunity.com> Message-ID: <49DD19DC.20204@gmail.com> P.J. Eby wrote: > Anyway, it's nice for decorators to be transparent to inspection when > the decorator doesn't actually modify the calling signature, so that you > can then use your decorated functions with tools like the above. If anyone wanted to take PEP 362 up again, we could easily add a __signature__ attribute to functools.update_wrapper. It may be too late to hammer it into shape for 3.1/2.7 though (I don't recall how far the PEP was from being ready for prime time) . Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From eric at trueblade.com Thu Apr 9 01:07:45 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 08 Apr 2009 19:07:45 -0400 Subject: [Python-Dev] Deprecating PyOS_ascii_formatd Message-ID: <49DD2E41.80401@trueblade.com> Assuming that Mark's and my changes in the py3k-short-float-repr branch get checked in shortly, I'd like to deprecate PyOS_ascii_formatd. Its functionality is largely being replaced by PyOS_double_to_string, which we're introducing on our branch. PyOS_ascii_formatd was introduced to fix the issue in PEP 331. PyOS_double_to_string addresses all of the same issues, namely a non-locale aware double-to-string conversion. PyOS_ascii_formatd has an unfortunate interface. It accepts a printf-like format string for a single double parameter. It must parse the format string into the parameters it uses. All uses of it inside Python already know the parameters and must build up a format string using sprintf, only to turn around and have PyOS_ascii_formatd reparse it. In the branch I've replaced all of the internal calls to PyOS_ascii_format with PyOS_double_to_string. My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in 3.2. The 2.7 situation is tricker, because we're not planning on backporting the short-float-repr work back to 2.7. In 2.7 I guess we'll leave PyOS_ascii_formatd around, unfortunately. FWIW, I didn't find any external callers of it using Google code search. And as a reminder, the py3k-short-float-repr changes are on Rietveld at http://codereview.appspot.com/33084/show. So far, no comments. From thobes at gmail.com Thu Apr 9 01:10:50 2009 From: thobes at gmail.com (Tobias Ivarsson) Date: Thu, 9 Apr 2009 01:10:50 +0200 Subject: [Python-Dev] Contributor Agreements for Patches - was [Jython-dev] Jython on Google AppEngine! 
In-Reply-To: <49DCEEF3.7020603@v.loewis.de> References: <49DCEEF3.7020603@v.loewis.de> Message-ID: <9997d5e60904081610q306746a7x710a4edb804e1cda@mail.gmail.com> On Wed, Apr 8, 2009 at 8:37 PM, "Martin v. Löwis" wrote: --8<-- > > * Should we change the workflow for roundup to make this assignment > > of license clearer (see Tobias's idea in the thread about a > > click-through agreement). > > I think we do need something written; a lawyer may be able to tell > precisely. The company I work for does open source development. And our lawyers said that our model of having contributors send an e-mail with the text "I agree" and our CLA as an attachment was perfectly valid, no hand written signature needed. From there the step to a click through for something as simple as a patch isn't too far. But I would not claim that I know any of these things, I'm just hoping that we can have a simple process with no legal gray areas. > > > I still hope that we can record, in the tracker, which contributors have > signed an agreement. That would be good. Cheers, Tobias -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Apr 9 01:31:31 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Apr 2009 23:31:31 +0000 (UTC) Subject: [Python-Dev] Dropping bytes "support" in json References: <49DCEDFF.7050708@v.loewis.de> Message-ID: Martin v. Löwis <martin at v.loewis.de> writes: > > What does Bob Ippolito think about this change? IIUC, he considers > simplejson's speed one of its primary advantages, and also attributes it > to the fact that he can parse directly out of byte strings, and marshal > into them (which is important, as you typically receive them over the > wire). The only thing I know is that the new version (the one I've tried to merge) is massively faster than the old one - several times faster - and within 20-30% of the speed of the 2.x version (*). 
Besides, Bob doesn't really seem to care about porting to py3k (he hasn't said anything about it until now, other than that he didn't feel competent to do it). But I'm happy with someone proposing an alternate patch if they want to. As for me, I just wanted to fill the gap and I'm not interested in doing lot of work on this issue. (*) timeit -s "import json; l=['abc']*100" "json.dumps(l)" -> trunk: 33.4 usec per loop -> py3k + patch: 37.1 usec per loop -> vanilla py3k: 314 usec per loop timeit -s "import json; s=json.dumps(['abc']*100)" "json.loads(s)" -> trunk: 44.8 usec per loop -> py3k + patch: 35.4 usec per loop -> vanilla py3k: 1.48 msec per loop (!) Regards Antoine. From guido at python.org Thu Apr 9 02:36:20 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Apr 2009 17:36:20 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: Message-ID: On Wed, Apr 8, 2009 at 4:10 AM, Antoine Pitrou wrote: > We're in the process of forward-porting the recent (massive) json updates to > 3.1, and we are also thinking of dropping remnants of support of the bytes type > in the json library (in 3.1, again). This bytes support almost didn't work at > all, but there was a lot of C and Python code for it nevertheless. We're also > thinking of dropping the "encoding" argument in the various APIs, since it is > useless. > > Under the new situation, json would only ever allow str as input, and output str > as well. By posting here, I want to know whether anybody would oppose this > (knowing, once again, that bytes support is already broken in the current py3k > trunk). > > The bug entry is: http://bugs.python.org/issue4136 I'm kind of surprised that a serialization protocol like JSON wouldn't support reading/writing bytes (as the serialized format -- I don't care about having bytes as values, since JavaScript doesn't have something equivalent AFAIK, and hence JSON doesn't allow it IIRC). 
Marshal and Pickle, for example, *always* treat the serialized format as bytes. And since in most cases it will be sent over a socket, at some point the serialized representation *will* be bytes, I presume. What makes supporting this hard? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From cournape at gmail.com Thu Apr 9 03:57:44 2009 From: cournape at gmail.com (David Cournapeau) Date: Thu, 9 Apr 2009 10:57:44 +0900 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> Message-ID: <5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com> On Thu, Apr 9, 2009 at 4:45 AM, Alexander Neundorf wrote: > I think cmake can do all of the above (cpack supports creating packages). I am sure it is - it is just a lot of work, specially if you want to stay compatible with distutils-built extensions :) cheers, David From michele.simionato at gmail.com Thu Apr 9 06:31:41 2009 From: michele.simionato at gmail.com (Michele Simionato) Date: Thu, 9 Apr 2009 06:31:41 +0200 Subject: [Python-Dev] decorator module in stdlib? 
In-Reply-To: References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> Message-ID: <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> On Wed, Apr 8, 2009 at 7:51 PM, Guido van Rossum wrote: > > There was a remark (though perhaps meant humorously) in Michele's page > about decorators that worried me too: "For instance, typical > implementations of decorators involve nested functions, and we all > know that flat is better than nested." I find the nested-function > pattern very clear and easy to grasp, whereas I find using another > decorator (a meta-decorator?) to hide this pattern unnecessarily > obscuring what's going on. I understand your point and I will freely admit that I have always had mixed feelings about the advantages of a meta decorator with respect to plain simple nested functions. I see pros and cons. If functools.update_wrapper could preserve the signature I would probably use it over the decorator module. > I also happen to disagree in many cases with decorators that attempt > to change the signature of the wrapper function to that of the wrapped > function. While this may make certain kinds of introspection possible, > again it obscures what's going on to a future maintainer of the code, > and the cleverness can get in the way of good old-fashioned debugging. Then perhaps you misunderstand the goal of the decorator module. The raison d'etre of the module is to PRESERVE the signature: update_wrapper unfortunately *changes* it. When confronted with a library which I do not know, I often run pydoc, sphinx, or a custom-made documentation tool over it, to extract the signature of functions. For instance, if I see a method get_user(self, username) I have a good hint about what it is supposed to do. 
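The signature extraction Michele describes can be reproduced with the `inspect` module. This is a hedged sketch: `get_user` is a hypothetical example, and `inspect.signature` is the modern replacement for the `inspect.getargspec` of this era:

```python
import inspect

class UserStore:
    def get_user(self, username):
        """Fetch a user record by name (hypothetical example method)."""
        return {'name': username}

# A documentation tool can recover the informative signature...
print(inspect.signature(UserStore.get_user))   # prints (self, username)

# ...but a non signature-preserving decorator erases it:
def logged(func):
    def wrapper(*args, **kwargs):
        print('calling', func.__name__)
        return func(*args, **kwargs)
    return wrapper

decorated = logged(UserStore.get_user)
print(inspect.signature(decorated))            # prints (*args, **kwargs)
```

The second result is exactly the uninformative `(*args, **kwargs)` that a documentation tool reports for a naively decorated function.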
But if the library (say a web framework) uses non signature-preserving decorators, my documentation tool tells me that there is a function get_user(*args, **kwargs) which frankly is not enough [this is the optimistic case, when the author of the decorator has taken care to preserve the name of the original function]. I *hate* losing information about the true signature of functions, since I also make heavy use of IPython, Python help, etc. >> I must admit that while I still like decorators, I do like them as >> much as in the past. Of course there was a missing NOT in this sentence, but you all understood the intended meaning. > (All this BTW is not to say that I don't trust you with commit > privileges if you were to be interested in contributing. I just don't > think that adding that particular decorator module to the stdlib would > be wise. It can be debated though.) Fine. As I have repeated many times, that particular module was never meant for inclusion in the standard library. But I feel strongly about the possibility of being able to preserve (not change!) the function signature. > To me, introspection is mostly useful for certain > situations like debugging or interactively finding help, but I would > hesitate to build a large amount of stuff (whether a library, > framework or application) on systematic use of introspection. In fact, > I rarely use the inspect module and had to type help(inspect) to > figure out what you meant by "signature". :-) I guess one reason is > that in my mind, and in the way I tend to write code, I don't write > APIs that require introspection -- for example, I don't like APIs that > do different things when given a "callable" as opposed to something > else (common practices in web frameworks notwithstanding), and > thinking about it I would like it even less if an API cared about the > *actual* signature of a function I pass into it. 
I like APIs that say, > for example, "argument 'f' must be a function of two arguments, an int > and a string," and then I assume that if I pass it something for 'f' > it will try to call that something with an int and a string. If I pass > it something else, well, I'll get a type error. But it gives me the > freedom to pass something that doesn't even have a signature but > happens to be callable in that way regardless (e.g. a bound method of > a built-in type). I do not think anybody disagrees with your point here. My point still stands, though: objects should not lie about their signature, especially during debugging and when generating documentation from code. From solipsis at pitrou.net Thu Apr 9 07:15:09 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Apr 2009 05:15:09 +0000 (UTC) Subject: [Python-Dev] Dropping bytes "support" in json References: Message-ID: Guido van Rossum <guido at python.org> writes: > > I'm kind of surprised that a serialization protocol like JSON wouldn't > support reading/writing bytes (as the serialized format -- I don't > care about having bytes as values, since JavaScript doesn't have > something equivalent AFAIK, and hence JSON doesn't allow it IIRC). > Marshal and Pickle, for example, *always* treat the serialized format > as bytes. And since in most cases it will be sent over a socket, at > some point the serialized representation *will* be bytes, I presume. > What makes supporting this hard? It's not hard, it just means a lot of duplicated code if the library wants to support both str and bytes in an optimized way as Martin alluded to. This duplicated code already exists in the C parts to support the 2.x semantics of accepting unicode objects as well as str, but not in the Python parts, which explains why the bytes support is broken in py3k - in 2.x, the same Python code can be used for str and unicode. 
On the other hand, supporting it without going after the last percents of performance should be fairly trivial (by encoding/decoding before doing the processing proper), and it would avoid the current duplicated code. As for reading/writing bytes over the wire, JSON is often used in the same context as HTML: you are supposed to know the charset and decode/encode the payload using that charset. However, the RFC specifies a default encoding of utf-8. (*) (*) http://www.ietf.org/rfc/rfc4627.txt The RFC also specifies a discrimination algorithm for non-supersets of ASCII (“Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.”), but it is not implemented in the json module: >>> json.loads('"hi"') 'hi' >>> json.loads(u'"hi"'.encode('utf16')) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads return _default_decoder.decode(s) File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode raise ValueError("No JSON object could be decoded") ValueError: No JSON object could be decoded Regards Antoine. From martin at v.loewis.de Thu Apr 9 07:55:20 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 09 Apr 2009 07:55:20 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <49DCEDFF.7050708@v.loewis.de> Message-ID: <49DD8DC8.8020302@v.loewis.de> > Besides, Bob doesn't really seem to care about > porting to py3k (he hasn't said anything about it until now, other than that he > didn't feel competent to do it). That is quite unfortunate, and suggests that perhaps the module shouldn't have been added to Python in the first place. I can understand that you don't want to spend much time on it. How about removing it from 3.1? We could re-add it when long-term support becomes more likely. Regards, Martin From python at rcn.com Thu Apr 9 09:16:24 2009 From: python at rcn.com (Raymond Hettinger) Date: Thu, 9 Apr 2009 00:16:24 -0700 Subject: [Python-Dev] Dropping bytes "support" in json References: <49DCEDFF.7050708@v.loewis.de> <49DD8DC8.8020302@v.loewis.de> Message-ID: <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1> [Antoine Pitrou] >> Besides, Bob doesn't really seem to care about >> porting to py3k (he hasn't said anything about it until now, other than that he >> didn't feel competent to do it). His actual words were: "I will need some help with 3.0 since I am not well versed in the changes to the C API or Python code for that, but merging for 2.6.1 should be no big deal." [MvL] > That is quite unfortunate, and suggests that perhaps the module > shouldn't have been added to Python in the first place. Bob participated actively in http://bugs.python.org/issue4136 and was responsive to detailed patch review. He gave a popular talk at PyCon less than two weeks ago. He's not derelict. > I can understand that you don't want to spend much time on it. How > about removing it from 3.1? We could re-add it when long-term support > becomes more likely. I'm speechless. Raymond From dirkjan at ochtman.nl Thu Apr 9 09:59:56 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Thu, 9 Apr 2009 09:59:56 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: Message-ID: On Thu, Apr 9, 2009 at 07:15, Antoine Pitrou wrote: > The RFC also specifies a discrimination algorithm for non-supersets of ASCII > (“Since the first two characters of a JSON text will always be ASCII > characters [RFC0020], it is possible to determine whether an octet > stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking > at the pattern of nulls in the first four octets.”), but it is not implemented in the json module: Well, your example is bad in the context of the RFC. The RFC states that JSON-text = object / array, meaning "loads" for '"hi"' isn't strictly valid. The discrimination algorithm obviously only works in the context of that grammar, where the first character of a document must be { or [ and the next character can only be {, [, f, n, t, ", -, a number, or insignificant whitespace (space, \t, \r, \n). >>>> json.loads('"hi"') > 'hi' >>>> json.loads(u'"hi"'.encode('utf16')) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads > return _default_decoder.decode(s) > File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode > obj, end = self.raw_decode(s, idx=_w(s, 0).end()) > File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode > raise ValueError("No JSON object could be decoded") > ValueError: No JSON object could be decoded Cheers, Dirkjan From ncoghlan at gmail.com Thu Apr 9 12:54:39 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 20:54:39 +1000 Subject: [Python-Dev] Deprecating PyOS_ascii_formatd In-Reply-To: <49DD2E41.80401@trueblade.com> References: <49DD2E41.80401@trueblade.com> Message-ID: <49DDD3EF.2010501@gmail.com> Eric Smith wrote: > And as a reminder, the py3k-short-float-repr changes are on Rietveld at > http://codereview.appspot.com/33084/show. So far, no comments. I skipped over the actual number crunching parts (the test suite will do a better job than I will of telling you whether or not you have those parts correct), but I had a look at the various other changes to make use of the new API. Looks like you were able to delete some fairly respectable chunks of redundant code! Cheers, Nick. 
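The null-pattern discrimination quoted from RFC 4627 above is short to implement; as Dirkjan notes, it relies on the JSON-text = object / array grammar guaranteeing that the first two characters are ASCII. A minimal sketch (`detect_json_encoding` is a hypothetical name, not part of the json module's API):

```python
import json

def detect_json_encoding(data):
    """Guess the Unicode encoding of a JSON octet stream from the pattern
    of null bytes in its first four octets (RFC 4627, section 3). Assumes
    the text starts with two ASCII characters, which the object/array
    grammar guarantees, and that the stream carries no BOM."""
    if len(data) < 4:
        return 'utf-8'                 # too short to discriminate; RFC default
    b0, b1, b2, b3 = data[0], data[1], data[2], data[3]
    if b0 == 0 and b1 == 0 and b2 == 0:
        return 'utf-32-be'             # 00 00 00 xx
    if b0 == 0 and b2 == 0:
        return 'utf-16-be'             # 00 xx 00 xx
    if b1 == 0 and b2 == 0 and b3 == 0:
        return 'utf-32-le'             # xx 00 00 00
    if b1 == 0 and b3 == 0:
        return 'utf-16-le'             # xx 00 xx 00
    return 'utf-8'                     # xx xx xx xx

payload = '["hi"]'.encode('utf-16-le')
encoding = detect_json_encoding(payload)
print(encoding, json.loads(payload.decode(encoding)))
```

This is also an instance of the decode-before-parsing approach Antoine suggests: detect, decode to str, then hand the result to the ordinary str-only parser.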
That is quite unfortunate, and suggests that perhaps the module shouldn't have been added to Python in the first place. I can understand that you don't want to spend much time on it. How about removing it from 3.1? We could re-add it when long-term support becomes more likely. Regards, Martin From python at rcn.com Thu Apr 9 09:16:24 2009 From: python at rcn.com (Raymond Hettinger) Date: Thu, 9 Apr 2009 00:16:24 -0700 Subject: [Python-Dev] Dropping bytes "support" in json References: <49DCEDFF.7050708@v.loewis.de> <49DD8DC8.8020302@v.loewis.de> Message-ID: <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1> [Antoine Pitrou] >> Besides, Bob doesn't really seem to care about >> porting to py3k (he hasn't said anything about it until now, other than that he >> didn't feel competent to do it). His actual words were: "I will need some help with 3.0 since I am not well versed in the changes to the C API or Python code for that, but merging for 2.6.1 should be no big deal." [MvL] > That is quite unfortunate, and suggests that perhaps the module > shouldn't have been added to Python in the first place. Bob participated actively in http://bugs.python.org/issue4136 and was responsive to detailed patch review. He gave a popular talk at PyCon less than two weeks ago. He's not derelict. > I can understand that you don't want to spend much time on it. How > about removing it from 3.1? We could re-add it when long-term support > becomes more likely. I'm speechless. Raymond From dirkjan at ochtman.nl Thu Apr 9 09:59:56 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Thu, 9 Apr 2009 09:59:56 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: Message-ID: On Thu, Apr 9, 2009 at 07:15, Antoine Pitrou wrote: > The RFC also specifies a discrimination algorithm for non-supersets of ASCII > (?Since the first two characters of a JSON text will always be ASCII > ? characters [RFC0020], it is possible to determine whether an octet > ? 
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking > ? at the pattern of nulls in the first four octets.?), but it is not > implemented in the json module: Well, your example is bad in the context of the RFC. The RFC states that JSON-text = object / array, meaning "loads" for '"hi"' isn't strictly valid. The discrimination algorithm obviously only works in the context of that grammar, where the first character of a document must be { or [ and the next character can only be {, [, f, n, t, ", -, a number, or insignificant whitespace (space, \t, \r, \n). >>>> json.loads('"hi"') > 'hi' >>>> json.loads(u'"hi"'.encode('utf16')) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads > ? ?return _default_decoder.decode(s) > ?File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode > ? ?obj, end = self.raw_decode(s, idx=_w(s, 0).end()) > ?File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode > ? ?raise ValueError("No JSON object could be decoded") > ValueError: No JSON object could be decoded Cheers, Dirkjan From ncoghlan at gmail.com Thu Apr 9 12:54:39 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 20:54:39 +1000 Subject: [Python-Dev] Deprecating PyOS_ascii_formatd In-Reply-To: <49DD2E41.80401@trueblade.com> References: <49DD2E41.80401@trueblade.com> Message-ID: <49DDD3EF.2010501@gmail.com> Eric Smith wrote: > And as a reminder, the py3k-short-float-repr changes are on Rietveld at > http://codereview.appspot.com/33084/show. So far, no comments. I skipped over the actual number crunching parts (the test suite will do a better job than I will of telling you whether or not you have those parts correct), but I had a look at the various other changes to make use of the new API. Looks like you were able to delete some fairly respectable chunks of redundant code! Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From barry at python.org Thu Apr 9 13:01:19 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 07:01:19 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: Message-ID: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 9, 2009, at 1:15 AM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> >> I'm kind of surprised that a serialization protocol like JSON >> wouldn't >> support reading/writing bytes (as the serialized format -- I don't >> care about having bytes as values, since JavaScript doesn't have >> something equivalent AFAIK, and hence JSON doesn't allow it IIRC). >> Marshal and Pickle, for example, *always* treat the serialized format >> as bytes. And since in most cases it will be sent over a socket, at >> some point the serialized representation *will* be bytes, I presume. >> What makes supporting this hard? > > It's not hard, it just means a lot of duplicated code if the library > wants to > support both str and bytes in an optimized way as Martin alluded to. > This > duplicated code already exists in the C parts to support the 2.x > semantics of > accepting unicode objects as well as str, but not in the Python > parts, which > explains why the bytes support is broken in py3k - in 2.x, the same > Python code > can be used for str and unicode. This is an interesting question, and something I'm struggling with for the email package for 3.x. It turns out to be pretty convenient to have both a bytes and a string API, both for input and output, but I think email really wants to be represented internally as bytes. Maybe. Or maybe just for content bodies and not headers, or maybe both. 
Anyway, aside from that decision, I haven't come up with an elegant way to allow /output/ in both bytes and strings (input is I think theoretically easier by sniffing the arguments). Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSd3Vf3EjvBPtnXfVAQKyNgQApNmI5hh9heTYynyADYaDkP8wzZFXUpgg cKYL741MbLpOFn3IFGAGaRWBQe4Dt8i4CiIEIbg3X7QZqwQJjoTtFwxsJKmXFd1M JR0oCB8Du2kE5YzD+avrEp+d8zwl2goxvzD9dJwziBav5V98w7PMiZc3sApklQFD gNYzbHEOfv4= =tjGr -----END PGP SIGNATURE----- From ncoghlan at gmail.com Thu Apr 9 13:06:21 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 21:06:21 +1000 Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382: Namespace Packages) In-Reply-To: <49DBA78F.7010904@v.loewis.de> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <49DB4624.604@egenix.com> <49DBA78F.7010904@v.loewis.de> Message-ID: <49DDD6AD.9020708@gmail.com> Martin v. L?wis wrote: >> Such a policy would then translate to a dead end for Python 2.x >> based applications. > > 2.x based applications *are* in a dead end, with the only exit > being portage to 3.x. The actual end of the dead end just happens to be in 2013 or so :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Thu Apr 9 13:10:22 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Apr 2009 11:10:22 +0000 (UTC) Subject: [Python-Dev] Dropping bytes "support" in json References: Message-ID: Dirkjan Ochtman ochtman.nl> writes: > > The RFC states > that JSON-text = object / array, meaning "loads" for '"hi"' isn't > strictly valid. 
Sure, but then: >>> json.loads('[]') [] >>> json.loads(u'[]'.encode('utf16')) Traceback (most recent call last): File "", line 1, in File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads return _default_decoder.decode(s) File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode raise ValueError("No JSON object could be decoded") ValueError: No JSON object could be decoded Cheers Antoine. From eric at trueblade.com Thu Apr 9 13:56:21 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 09 Apr 2009 07:56:21 -0400 Subject: [Python-Dev] Deprecating PyOS_ascii_formatd In-Reply-To: <49DDD3EF.2010501@gmail.com> References: <49DD2E41.80401@trueblade.com> <49DDD3EF.2010501@gmail.com> Message-ID: <49DDE265.4070605@trueblade.com> Nick Coghlan wrote: > Eric Smith wrote: >> And as a reminder, the py3k-short-float-repr changes are on Rietveld at >> http://codereview.appspot.com/33084/show. So far, no comments. > Looks like you were able to delete some fairly respectable chunks of > redundant code! Wait until you see how much nasty code gets deleted when I can actually remove PyOS_ascii_formatd! And thanks for your comments on Rietveld, especially catching the memory leak. Eric. From dirkjan at ochtman.nl Thu Apr 9 14:02:43 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Thu, 9 Apr 2009 14:02:43 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: Message-ID: On Thu, Apr 9, 2009 at 13:10, Antoine Pitrou wrote: > Sure, but then: > >>>> json.loads('[]') > [] >>>> json.loads(u'[]'.encode('utf16')) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads > ? ?return _default_decoder.decode(s) > ?File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode > ? 
?obj, end = self.raw_decode(s, idx=_w(s, 0).end()) > ?File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode > ? ?raise ValueError("No JSON object could be decoded") > ValueError: No JSON object could be decoded Right. :) Just wanted to point your test might not be testing what you want to test. Cheers, Dirkjan From steve at holdenweb.com Thu Apr 9 14:07:15 2009 From: steve at holdenweb.com (Steve Holden) Date: Thu, 09 Apr 2009 08:07:15 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: Barry Warsaw wrote: > On Apr 9, 2009, at 1:15 AM, Antoine Pitrou wrote: > >> Guido van Rossum python.org> writes: >>> >>> I'm kind of surprised that a serialization protocol like JSON wouldn't >>> support reading/writing bytes (as the serialized format -- I don't >>> care about having bytes as values, since JavaScript doesn't have >>> something equivalent AFAIK, and hence JSON doesn't allow it IIRC). >>> Marshal and Pickle, for example, *always* treat the serialized format >>> as bytes. And since in most cases it will be sent over a socket, at >>> some point the serialized representation *will* be bytes, I presume. >>> What makes supporting this hard? > >> It's not hard, it just means a lot of duplicated code if the library >> wants to >> support both str and bytes in an optimized way as Martin alluded to. This >> duplicated code already exists in the C parts to support the 2.x >> semantics of >> accepting unicode objects as well as str, but not in the Python parts, >> which >> explains why the bytes support is broken in py3k - in 2.x, the same >> Python code >> can be used for str and unicode. > > This is an interesting question, and something I'm struggling with for > the email package for 3.x. 
It turns out to be pretty convenient to have > both a bytes and a string API, both for input and output, but I think > email really wants to be represented internally as bytes. Maybe. Or > maybe just for content bodies and not headers, or maybe both. Anyway, > aside from that decision, I haven't come up with an elegant way to allow > /output/ in both bytes and strings (input is I think theoretically > easier by sniffing the arguments). > The real problem I came across in storing email in a relational database was the inability to store messages as Unicode. Some messages have a body in one encoding and an attachment in another, so the only ways to store the messages are either as a monolithic bytes string that gets parsed when the individual components are required or as a sequence of components in the database's preferred encoding (if you want to keep the original encoding most relational databases won't be able to help unless you store the components as bytes). All in all, as you might expect from a system that's been growing up since 1970 or so, it can be quite intractable. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Watch PyCon on video now! http://pycon.blip.tv/ From ncoghlan at gmail.com Thu Apr 9 14:11:28 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 22:11:28 +1000 Subject: [Python-Dev] decorator module in stdlib? 
In-Reply-To: <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> Message-ID: <49DDE5F0.4070000@gmail.com> Michele Simionato wrote: > On Wed, Apr 8, 2009 at 7:51 PM, Guido van Rossum wrote: >> There was a remark (though perhaps meant humorously) in Michele's page >> about decorators that worried me too: "For instance, typical >> implementations of decorators involve nested functions, and we all >> know that flat is better than nested." I find the nested-function >> pattern very clear and easy to grasp, whereas I find using another >> decorator (a meta-decorator?) to hide this pattern unnecessarily >> obscuring what's going on. > > I understand your point and I will freely admit that I have always had mixed > feelings about the advantages of a meta decorator with > respect to plain simple nested functions. I see pros and contras. > If functools.update_wrapper could preserve the signature I > would probably use it over the decorator module. Yep, update_wrapper was a compromise along the lines of "well, at least we can make sure the relevant metadata refers to the original function rather than the relatively uninteresting wrapper, even if the signature itself is lost". The idea being that you can often figure out the signature from the doc string even when introspection has been broken by an intervening wrapper. One of my hopes for PEP 362 was that I would be able to just add __signature__ to the list of copied attributes, but that PEP is currently short a champion to work through the process of resolving the open issues and creating an up to date patch (Brett ended up with too many things on his plate so he wasn't able to do it, and nobody else has offered to take it over). Cheers, Nick. 
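The compromise described above is easy to see in a short example — `functools.wraps` (the decorator form of `update_wrapper`) copies the interesting metadata onto the wrapper, but the wrapper's own code object still carries the generic `(*args, **kwargs)` argument list, which is exactly the gap a `__signature__` attribute was meant to close (a sketch; `trace` is an illustrative decorator, not a real API):

```python
import functools

def trace(func):
    @functools.wraps(func)  # decorator shorthand for functools.update_wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@trace
def add(x, y):
    "Add two numbers."
    return x + y

# The interesting metadata now refers to the original function...
print(add.__name__, '-', add.__doc__)   # add - Add two numbers.
# ...but the wrapper's code object still has the generic signature,
# so naive introspection of the argument list is broken.
print(add.__code__.co_varnames)         # ('args', 'kwargs')
```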
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Thu Apr 9 14:17:37 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 22:17:37 +1000 Subject: [Python-Dev] Mercurial? In-Reply-To: <49DA7C91.6010202@v.loewis.de> References: <20090404154049.GA23987@panix.com> <49D87CD4.1000909@ochtman.nl> <49D8BC81.7040007@ochtman.nl> <49D9EB15.8070806@gmail.com> <49DA7C91.6010202@v.loewis.de> Message-ID: <49DDE761.9000206@gmail.com> Martin v. L?wis wrote: > Nick Coghlan wrote: >> Dirkjan Ochtman wrote: >>> I have a stab at an author map at http://dirkjan.ochtman.nl/author-map. >>> Could use some review, but it seems like a good start. >> Martin may be able to provide a better list of names based on the >> checkin name<->SSH public key mapping in the SVN setup. > > I think the identification in the SSH keys is useless. It contains > strings like "loewis at mira" or "ncoghlan at uberwald", or even multiple > of them (barry at wooz, barry at resist, ...). Ah, I forgot our SVN accounts weren't linked up to our email addresses. I guess that means the existing list won't be as useful as I thought it might be. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From michele.simionato at gmail.com Thu Apr 9 14:18:51 2009 From: michele.simionato at gmail.com (Michele Simionato) Date: Thu, 9 Apr 2009 14:18:51 +0200 Subject: [Python-Dev] decorator module in stdlib? 
In-Reply-To: <49DDE5F0.4070000@gmail.com> References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> <49DDE5F0.4070000@gmail.com> Message-ID: <4edc17eb0904090518s62db9461hb190f0db29abb871@mail.gmail.com> On Thu, Apr 9, 2009 at 2:11 PM, Nick Coghlan wrote: > One of my hopes for PEP 362 was that I would be able to just add > __signature__ to the list of copied attributes, but that PEP is > currently short a champion to work through the process of resolving the > open issues and creating an up to date patch (Brett ended up with too > many things on his plate so he wasn't able to do it, and nobody else has > offered to take it over). I am totally ignorant about the internals of Python and I cannot certainly take that role. But I would like to hear from Guido if he wants to support a __signature__ object or if he does not care. In the first case I think somebody will take the job, in the second case it is better to reject the PEP and be done with it. From aahz at pythoncraft.com Thu Apr 9 14:53:12 2009 From: aahz at pythoncraft.com (Aahz) Date: Thu, 9 Apr 2009 05:53:12 -0700 Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382: Namespace Packages) In-Reply-To: <49DDD6AD.9020708@gmail.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <49DB4624.604@egenix.com> <49DBA78F.7010904@v.loewis.de> <49DDD6AD.9020708@gmail.com> Message-ID: <20090409125312.GB1909@panix.com> On Thu, Apr 09, 2009, Nick Coghlan wrote: > > Martin v. L?wis wrote: >>> Such a policy would then translate to a dead end for Python 2.x >>> based applications. >> >> 2.x based applications *are* in a dead end, with the only exit >> being portage to 3.x. 
> > The actual end of the dead end just happens to be in 2013 or so :) More like 2016 or 2020 -- as of January, my former employer was still using Python 2.3, and I wouldn't be surprised if 1.5.2 was still out in the wilds. The transition to 3.x is more extreme, and lots of people will continue making do for years after any formal support is dropped. Whether this warrants including PEP 382 in 2.x, I don't know; I still don't really understand this proposal. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From ncoghlan at gmail.com Thu Apr 9 15:16:26 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 23:16:26 +1000 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: <4edc17eb0904090518s62db9461hb190f0db29abb871@mail.gmail.com> References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> <49DDE5F0.4070000@gmail.com> <4edc17eb0904090518s62db9461hb190f0db29abb871@mail.gmail.com> Message-ID: <49DDF52A.60704@gmail.com> Michele Simionato wrote: > On Thu, Apr 9, 2009 at 2:11 PM, Nick Coghlan wrote: >> One of my hopes for PEP 362 was that I would be able to just add >> __signature__ to the list of copied attributes, but that PEP is >> currently short a champion to work through the process of resolving the >> open issues and creating an up to date patch (Brett ended up with too >> many things on his plate so he wasn't able to do it, and nobody else has >> offered to take it over). > > I am totally ignorant about the internals of Python and I cannot certainly > take that role. But I would like to hear from Guido if he wants to support > a __signature__ object or if he does not care. In the first case > I think somebody will take the job, in the second case it is better to > reject the PEP and be done with it. 
I don't recall Guido being opposed when PEP 362 was first being discussed (keeping in mind that was more than 2 years ago, so he's quite entitled to have changed his mind in the meantime!). That said, it's a sensible, largely straightforward idea, and by creating the object lazily it doesn't even have to incur a runtime cost in programs that don't do much introspection. I think the main problem leading to the current lack of movement on the PEP is that the existing inspect module is good enough for most practical purposes (which are fairly rare in the first place), so this isn't perceived as a huge gain even for the folks that are interested in introspection. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Thu Apr 9 15:28:08 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 09 Apr 2009 23:28:08 +1000 Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382: Namespace Packages) In-Reply-To: <20090409125312.GB1909@panix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <49DB4624.604@egenix.com> <49DBA78F.7010904@v.loewis.de> <49DDD6AD.9020708@gmail.com> <20090409125312.GB1909@panix.com> Message-ID: <49DDF7E8.9000001@gmail.com> Aahz wrote: > On Thu, Apr 09, 2009, Nick Coghlan wrote: >> Martin v. L?wis wrote: >>>> Such a policy would then translate to a dead end for Python 2.x >>>> based applications. >>> 2.x based applications *are* in a dead end, with the only exit >>> being portage to 3.x. >> The actual end of the dead end just happens to be in 2013 or so :) > > More like 2016 or 2020 -- as of January, my former employer was still > using Python 2.3, and I wouldn't be surprised if 1.5.2 was still out in > the wilds. 
Indeed - I know of a system that will finally be migrating from Python 2.2 to Python *2.4* later this year :) > The transition to 3.x is more extreme, and lots of people > will continue making do for years after any formal support is dropped. Yeah, I was only referring to the likely minimum time frame that python-dev would continue providing security releases. As you say, the actual 2.x version of the language will live on long after the day we close all remaining 2.x only bug reports and patches as "out of date". > Whether this warrants including PEP 382 in 2.x, I don't know; I still > don't really understand this proposal. I'd personally still prefer to keep the guideline that new features that are easy to backport *should* be backported, but that's really a decision for the authors of each new feature. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From asmodai at in-nomine.org Thu Apr 9 15:38:30 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 9 Apr 2009 15:38:30 +0200 Subject: [Python-Dev] py3k build erroring out on fileio? Message-ID: <20090409133830.GD13110@nexus.in-nomine.org> Just to make sure I am not doing something silly, with a configure line as such: ./configure --prefix=/home/asmodai/local --with-wide-unicode --with-pymalloc --with-threads --with-computed-gotos, would there be any reason why I am getting the following error with both BSD make and gmake: make: don't know how to make ./Modules/_fileio.c. Stop [Will log an issue if it turns out to, indeed, be a problem with the tree and not me.] -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Forgive us our trespasses, as we forgive those that trespass against us... 
From benjamin at python.org Thu Apr 9 15:41:12 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 9 Apr 2009 08:41:12 -0500 Subject: [Python-Dev] py3k build erroring out on fileio? In-Reply-To: <20090409133830.GD13110@nexus.in-nomine.org> References: <20090409133830.GD13110@nexus.in-nomine.org> Message-ID: <1afaf6160904090641h347a6cf4o936b2b161dd31130@mail.gmail.com> 2009/4/9 Jeroen Ruigrok van der Werven : > Just to make sure I am not doing something silly, with a configure line as > such: ./configure --prefix=/home/asmodai/local --with-wide-unicode > --with-pymalloc --with-threads --with-computed-gotos, would there be any > reason why I am getting the following error with both BSD make and gmake: > > make: don't know how to make ./Modules/_fileio.c. Stop > > [Will log an issue if it turns out to, indeed, be a problem with the tree > and not me.] It seems your Makefile is outdated. We moved the _fileio.c module around a few days, so maybe you just need a make distclean. -- Regards, Benjamin From asmodai at in-nomine.org Thu Apr 9 16:04:55 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 9 Apr 2009 16:04:55 +0200 Subject: [Python-Dev] py3k build erroring out on fileio? In-Reply-To: <1afaf6160904090641h347a6cf4o936b2b161dd31130@mail.gmail.com> References: <20090409133830.GD13110@nexus.in-nomine.org> <1afaf6160904090641h347a6cf4o936b2b161dd31130@mail.gmail.com> Message-ID: <20090409140455.GF13110@nexus.in-nomine.org> -On [20090409 15:41], Benjamin Peterson (benjamin at python.org) wrote: >It seems your Makefile is outdated. We moved the _fileio.c module >around a few days, so maybe you just need a make distclean. Yes, that was the cause. Thanks Benjamin. -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B You yourself, as much as anybody in the entire universe, deserve your love and affection... 
From janssen at parc.com Thu Apr 9 17:08:50 2009 From: janssen at parc.com (Bill Janssen) Date: Thu, 9 Apr 2009 08:08:50 PDT Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: <66887.1239289730@parc.com> Barry Warsaw wrote: > Anyway, aside from that decision, I haven't come up with an > elegant way to allow /output/ in both bytes and strings (input is I > think theoretically easier by sniffing the arguments). Probably a good thing. It just promotes more confusion to do things that way, IMO. Bill From john at arbash-meinel.com Thu Apr 9 17:02:14 2009 From: john at arbash-meinel.com (John Arbash Meinel) Date: Thu, 09 Apr 2009 10:02:14 -0500 Subject: [Python-Dev] Rethinking intern() and its data structure Message-ID: <49DE0DF6.1040900@arbash-meinel.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've been doing some memory profiling of my application, and I've found some interesting results with how intern() works. I was pretty surprised to see that the "interned" dict was actually consuming a significant amount of total memory. To give the specific values, after doing: bzr branch A B of a small project, the total memory consumption is ~21MB Of that, the largest single object is the 'interned' dict, at 1.57MB, which contains 22k strings. One interesting bit, the size of it + the referenced strings is only 2.4MB. So the "interned" dict *by itself* is 2/3rds the size of the dict + strings it contains. It also means that the average size of a referenced string is 37.4 bytes. A 'str' has 24 bytes of overhead, so the average string is 13.5 characters long. So to save references to 13.5*22k ~ 300kB of character data, we are paying 2.4MB, or about 8:1 overhead. When I looked at the actual references from interned, I saw mostly variable names. Considering that every variable goes through the python intern dict. 
And when you look at the intern function, it doesn't use setdefault logic, it actually does a get() followed by a set(), which means the cost of interning is 1-2 lookups depending on likelyhood, etc. (I saw a whole lot of strings as the error codes in win32all / winerror.py, and windows error codes tend to be longer-than-average variable length.) Anyway, I the internals of intern() could be done a bit better. Here are some concrete things: a) Don't keep a double reference to both key and value to the same object (1 pointer per entry), this could be as simple as using a Set() instead of a dict() b) Don't cache the hash key in the set, as strings already cache them. (1 long per entry). This is a big win for space, but would need to be balanced against lookup and collision resolving speed. My guess is that reducing the size of the set will actually improve speed more, because more items can fit in cache. It depends on how many times you need to resolve a collision. If the string hash is sufficiently spread out, and the load factor is reasonable, then likely when you actually find an item in the set, it will be the item you want, and you'll need to bring the string object into cache anyway, so that you can do a string comparison (rather than just a hash comparison.) c) Use the existing lookup function one time. (PySet->lookup()) Sets already have a "lookup" which is optimized for strings, and returns a pointer to where the object would go if it exists. Which means the intern() function can do a single lookup resolving any collisions, and return the object or insert without doing a second lookup. d) Having a special structure might also allow for separate optimizing of things like 'default size', 'grow rate', 'load factor', etc. A lot of this could be tuned specifically knowing that we really only have 1 of these objects, and it is going to be pointing at a lot of strings that are < 50 bytes long. 
If hashes of variable name strings are well distributed, we could probably get away with a load factor of 2. If we know we are likely to have lots and lots that never go away (you rarely *unload* modules, and all variable names are in the intern dict), that would suggest having a large initial size, and probably a wide growth factor to avoid spending a lot of time resizing the set.

e) How tuned is String.hash() for the fact that most of these strings are going to be ASCII text? (I know that Python wants to support non-ASCII variable names, but I still think there is going to be an overwhelming bias towards characters in the range 65-122 ('A'-'z').)

Also note that the performance of the "interned" dict gets even worse on 64-bit platforms, where the size of a 'dictentry' doubles but the average length of a variable name wouldn't change.

Anyway, I would be happy to implement something along the lines of a "StringSet", or maybe the "InternSet", etc. I just wanted to check if people would be interested or not.

John
=:->

PS> I'm not yet subscribed to python-dev, so if you could make sure to CC me in replies, I would appreciate it.

From aahz at pythoncraft.com  Thu Apr  9 17:31:23 2009
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 9 Apr 2009 08:31:23 -0700
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
Message-ID: <20090409153123.GA2971@panix.com>

On Thu, Apr 09, 2009, John Arbash Meinel wrote:
>
> PS> I'm not yet subscribed to python-dev, so if you could make sure to
> CC me in replies, I would appreciate it.
Please do subscribe to python-dev ASAP; I also suggest that you subscribe to python-ideas, because I suspect that this is sufficiently blue-sky to start there. As always, this is the kind of thing where code trumps gedanken, so you shouldn't expect much activity unless either you are willing to make at least initial attempts at trying out your ideas or someone else just happens to find it interesting. In general, the core Python implementation strives for simplicity, so there's already some built-in pushback. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From dirkjan at ochtman.nl Thu Apr 9 17:40:18 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Thu, 9 Apr 2009 17:40:18 +0200 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <20090409153123.GA2971@panix.com> References: <49DE0DF6.1040900@arbash-meinel.com> <20090409153123.GA2971@panix.com> Message-ID: On Thu, Apr 9, 2009 at 17:31, Aahz wrote: > Please do subscribe to python-dev ASAP; I also suggest that you subscribe > to python-ideas, because I suspect that this is sufficiently blue-sky to > start there. It might also be interesting to the unladen-swallow guys. Cheers, Dirkjan From daniel at stutzbachenterprises.com Thu Apr 9 17:55:47 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 9 Apr 2009 10:55:47 -0500 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw wrote: > Anyway, aside from that decision, I haven't come up with an elegant way to > allow /output/ in both bytes and strings (input is I think theoretically > easier by sniffing the arguments). > Won't this work? 
(assuming dumps() always returns a string)

def dumpb(obj, encoding='utf-8', *args, **kw):
    s = dumps(obj, *args, **kw)
    return s.encode(encoding)

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC

From tonynelson at georgeanelson.com  Thu Apr  9 17:05:38 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Thu, 9 Apr 2009 11:05:38 -0400
Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping
	bytes "support" in json)
In-Reply-To: 
References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
Message-ID: 

(email-sig added)

At 08:07 -0400 04/09/2009, Steve Holden wrote:
>Barry Warsaw wrote:
...
>> This is an interesting question, and something I'm struggling with for
>> the email package for 3.x.  It turns out to be pretty convenient to have
>> both a bytes and a string API, both for input and output, but I think
>> email really wants to be represented internally as bytes.  Maybe.  Or
>> maybe just for content bodies and not headers, or maybe both.  Anyway,
>> aside from that decision, I haven't come up with an elegant way to allow
>> /output/ in both bytes and strings (input is I think theoretically
>> easier by sniffing the arguments).
>>
>The real problem I came across in storing email in a relational database
>was the inability to store messages as Unicode. Some messages have a
>body in one encoding and an attachment in another, so the only ways to
>store the messages are either as a monolithic bytes string that gets
>parsed when the individual components are required or as a sequence of
>components in the database's preferred encoding (if you want to keep the
>original encoding most relational databases won't be able to help unless
>you store the components as bytes).
...

I found it confusing myself, and did it wrong for a while.
Now, I understand that messages come over the wire as bytes, either 7-bit
US-ASCII or 8-bit whatever, and are parsed at the receiver. I think of the
database as a wire to the future, and store the data as bytes (a BLOB),
letting the future receiver parse them as it did the first time, when I
cleaned the message. Data I care to query is extracted into fields (in
UTF-8, what I usually use for char fields). I have no need to store
messages as Unicode, and they aren't Unicode anyway. I have no need ever
to flatten a message to Unicode, only to US-ASCII or, for messages (spam)
that are corrupt, raw 8-bit data.

If you need the data from the message, by all means extract it and store it
in whatever form is useful to the purpose of the database. If you need the
entire message, store it intact in the database, as the bytes it is. Email
isn't Unicode any more than a JPEG or other image types (often payloads in
a message) are Unicode.
-- 
____________________________________________________________________
TonyN.:'                       '

From steve at holdenweb.com  Thu Apr  9 18:20:31 2009
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 09 Apr 2009 12:20:31 -0400
Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping
	bytes "support" in json)
In-Reply-To: 
References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
Message-ID: 

Tony Nelson wrote:
> (email-sig added)
>
> At 08:07 -0400 04/09/2009, Steve Holden wrote:
>> Barry Warsaw wrote:
> ...
>>> This is an interesting question, and something I'm struggling with for
>>> the email package for 3.x.  It turns out to be pretty convenient to have
>>> both a bytes and a string API, both for input and output, but I think
>>> email really wants to be represented internally as bytes.  Maybe.  Or
>>> maybe just for content bodies and not headers, or maybe both.
Anyway, >>> aside from that decision, I haven't come up with an elegant way to allow >>> /output/ in both bytes and strings (input is I think theoretically >>> easier by sniffing the arguments). >>> >> The real problem I came across in storing email in a relational database >> was the inability to store messages as Unicode. Some messages have a >> body in one encoding and an attachment in another, so the only ways to >> store the messages are either as a monolithic bytes string that gets >> parsed when the individual components are required or as a sequence of >> components in the database's preferred encoding (if you want to keep the >> original encoding most relational databases won't be able to help unless >> you store the components as bytes). > ... > > I found it confusing myself, and did it wrong for a while. Now, I > understand that essages come over the wire as bytes, either 7-bit US-ASCII > or 8-bit whatever, and are parsed at the receiver. I think of the database > as a wire to the future, and store the data as bytes (a BLOB), letting the > future receiver parse them as it did the first time, when I cleaned the > message. Data I care to query is extracted into fields (in UTF-8, what I > usually use for char fields). I have no need to store messages as Unicode, > and they aren't Unicode anyway. I have no need ever to flatten a message > to Unicode, only to US-ASCII or, for messages (spam) that are corrupt, raw > 8-bit data. > > If you need the data from the message, by all means extract it and store it > in whatever form is useful to the purpose of the database. If you need the > entire message, store it intact in the database, as the bytes it is. Email > isn't Unicode any more than a JPEG or other image types (often payloads in > a message) are Unicode. This is all great, and I did quite quickly realize that the best approach was to store the mails in their network byte-stream format as bytes. 
The approach was negated in my own case because of PostgreSQL's execrable BLOB-handling capabilities. I took a look at the escaping they required, snorted with derision and gave it up as a bad job. PostgreSQL strongly encourages you to store text as encoded columns. Because emails lack an encoding it turns out this is a most inconvenient storage type for it. Sadly BLOBs are such a pain in PostgreSQL that it's easier to store the messages in external files and just use the relational database to index those files to retrieve content, so that's what I ended up doing. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Watch PyCon on video now! http://pycon.blip.tv/ From collinw at gmail.com Thu Apr 9 18:29:00 2009 From: collinw at gmail.com (Collin Winter) Date: Thu, 9 Apr 2009 09:29:00 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com> References: <49DE0DF6.1040900@arbash-meinel.com> Message-ID: <43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com> Hi John, On Thu, Apr 9, 2009 at 8:02 AM, John Arbash Meinel wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I've been doing some memory profiling of my application, and I've found > some interesting results with how intern() works. I was pretty surprised > to see that the "interned" dict was actually consuming a significant > amount of total memory. > To give the specific values, after doing: > ?bzr branch A B > of a small project, the total memory consumption is ~21MB [snip] > Anyway, I the internals of intern() could be done a bit better. Here are > some concrete things: [snip] Memory usage is definitely something we're interested in improving. Since you've already looked at this in some detail, could you try implementing one or two of your ideas and see if it makes a difference in memory consumption? 
Changing from a dict to a set looks promising, and should be a fairly self-contained way of starting on this. If it works, please post the patch on http://bugs.python.org with your results and assign it to me for review. Thanks, Collin Winter From john.arbash.meinel at gmail.com Thu Apr 9 18:34:24 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Thu, 09 Apr 2009 11:34:24 -0500 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com> Message-ID: <49DE2390.4070305@gmail.com> ... >> Anyway, I the internals of intern() could be done a bit better. Here are >> some concrete things: >> > > [snip] > > Memory usage is definitely something we're interested in improving. > Since you've already looked at this in some detail, could you try > implementing one or two of your ideas and see if it makes a difference > in memory consumption? Changing from a dict to a set looks promising, > and should be a fairly self-contained way of starting on this. If it > works, please post the patch on http://bugs.python.org with your > results and assign it to me for review. > > Thanks, > Collin Winter > (I did end up subscribing, just with a different email address :) What is the best branch to start working from? "trunk"? John =:-> From collinw at gmail.com Thu Apr 9 18:36:29 2009 From: collinw at gmail.com (Collin Winter) Date: Thu, 9 Apr 2009 09:36:29 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE2390.4070305@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <43aa6ff70904090929t657a3154rdfc0f66c180469eb@mail.gmail.com> <49DE2390.4070305@gmail.com> Message-ID: <43aa6ff70904090936y32ea66b9o44a6eda4d50502b3@mail.gmail.com> On Thu, Apr 9, 2009 at 9:34 AM, John Arbash Meinel wrote: > ... 
>
>>> Anyway, I the internals of intern() could be done a bit better. Here are
>>> some concrete things:
>>>
>>
>> [snip]
>>
>> Memory usage is definitely something we're interested in improving.
>> Since you've already looked at this in some detail, could you try
>> implementing one or two of your ideas and see if it makes a difference
>> in memory consumption? Changing from a dict to a set looks promising,
>> and should be a fairly self-contained way of starting on this. If it
>> works, please post the patch on http://bugs.python.org with your
>> results and assign it to me for review.
>>
>> Thanks,
>> Collin Winter
>>
> (I did end up subscribing, just with a different email address :)
>
> What is the best branch to start working from? "trunk"?

That's a good place to start, yes. If the idea works well, we'll want
to port it to the py3k branch, too, but that can wait.

Collin

From lists at cheimes.de  Thu Apr  9 19:05:24 2009
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 09 Apr 2009 19:05:24 +0200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com>
References: <49DE0DF6.1040900@arbash-meinel.com>
Message-ID: <49DE2AD4.6090605@cheimes.de>

John Arbash Meinel wrote:
> When I looked at the actual references from interned, I saw mostly
> variable names. Considering that every variable goes through the python
> intern dict. And when you look at the intern function, it doesn't use
> setdefault logic, it actually does a get() followed by a set(), which
> means the cost of interning is 1-2 lookups depending on likelyhood, etc.
> (I saw a whole lot of strings as the error codes in win32all /
> winerror.py, and windows error codes tend to be longer-than-average
> variable length.)

I've read your posting twice but I'm still not sure if you are aware of
the most important feature of interned strings. In the first place,
interning is not about saving some bytes of memory but a speed
optimization.
Interned strings can be compared with a simple and fast pointer
comparison. With interned strings you can simply write:

char *a, *b;
if (a == b) {
    ...
}

Instead of:

char *a, *b;
if (strcmp(a, b) == 0) {
    ...
}

A compiler can optimize the pointer comparison much better than a
function call.

> Anyway, I the internals of intern() could be done a bit better. Here are
> some concrete things:
>
>  a) Don't keep a double reference to both key and value to the same
>     object (1 pointer per entry), this could be as simple as using a
>     Set() instead of a dict()
>
>  b) Don't cache the hash key in the set, as strings already cache them.
>     (1 long per entry). This is a big win for space, but would need to
>     be balanced against lookup and collision resolving speed.
>
>     My guess is that reducing the size of the set will actually improve
>     speed more, because more items can fit in cache. It depends on how
>     many times you need to resolve a collision. If the string hash is
>     sufficiently spread out, and the load factor is reasonable, then
>     likely when you actually find an item in the set, it will be the
>     item you want, and you'll need to bring the string object into
>     cache anyway, so that you can do a string comparison (rather than
>     just a hash comparison.)
>
>  c) Use the existing lookup function one time. (PySet->lookup())
>     Sets already have a "lookup" which is optimized for strings, and
>     returns a pointer to where the object would go if it exists. Which
>     means the intern() function can do a single lookup resolving any
>     collisions, and return the object or insert without doing a second
>     lookup.
>
>  d) Having a special structure might also allow for separate optimizing
>     of things like 'default size', 'grow rate', 'load factor', etc. A
>     lot of this could be tuned specifically knowing that we really only
>     have 1 of these objects, and it is going to be pointing at a lot of
>     strings that are < 50 bytes long.
>
>     If hashes of variable name strings are well distributed, we could
>     probably get away with a load factor of 2. If we know we are likely
>     to have lots and lots that never go away (you rarely *unload*
>     modules, and all variable names are in the intern dict), that would
>     suggest having a large initial size, and probably a wide growth
>     factor to avoid spending a lot of time resizing the set.

I agree that a dict is not the most memory efficient data structure for
interned strings. However dicts are extremely well tested and highly
optimized. Any specialized data structure needs to be designed and
tested very carefully. If you happen to break the interning system it's
going to lead to rather nasty and hard to debug problems.

> e) How tuned is String.hash() for the fact that most of these strings
>    are going to be ascii text? (I know that python wants to support
>    non-ascii variable names, but I still think there is going to be an
>    overwhelming bias towards characters in the range 65-122 ('A'-'z').

Python 3.0 uses unicode for all names. You have to design something that
can be adapted to unicode, too. By the way do you know that dicts have
an optimized lookup function for strings? It's called lookdict_unicode /
lookdict_string.

> Also note that the performance of the "interned" dict gets even worse on
> 64-bit platforms. Where the size of a 'dictentry' doubles, but the
> average length of a variable name wouldn't change.
>
> Anyway, I would be happy to implement something along the lines of a
> "StringSet", or maybe the "InternSet", etc. I just wanted to check if
> people would be interested or not.

Since interning is mostly used in the core and extension modules you
might want to experiment with a different growth rate. The interning
data structure could start with a larger value and have a slower,
non-progressive data growth rate.
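The payoff is visible from pure Python as well. A minimal sketch (using the
Python 3 spelling sys.intern; in 2.x, intern() is a builtin):

```python
import sys

# Build two equal strings at runtime so the compiler cannot fold them
# into a single shared constant.
a = "".join(["long_variable_name_", "42"])
b = "".join(["long_variable_name_", "42"])
assert a == b          # equal contents...
assert a is not b      # ...but distinct objects, so == must compare chars

ia = sys.intern(a)
ib = sys.intern(b)
assert ia is ib        # intern() returns the one canonical object
# CPython's string equality checks identity first, so comparing interned
# strings amounts to the pointer comparison shown above.
```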
Christian From tonynelson at georgeanelson.com Thu Apr 9 19:14:21 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Thu, 9 Apr 2009 13:14:21 -0400 Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping bytes "support" in json) In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: (email-sig dropped, as I didn't see Steve Holden's message there) At 12:20 -0400 04/09/2009, Steve Holden wrote: >Tony Nelson wrote: ... >> If you need the data from the message, by all means extract it and store it >> in whatever form is useful to the purpose of the database. If you need the >> entire message, store it intact in the database, as the bytes it is. Email >> isn't Unicode any more than a JPEG or other image types (often payloads in >> a message) are Unicode. > >This is all great, and I did quite quickly realize that the best >approach was to store the mails in their network byte-stream format as >bytes. The approach was negated in my own case because of PostgreSQL's >execrable BLOB-handling capabilities. I took a look at the escaping they >required, snorted with derision and gave it up as a bad job. ... I use MySQL, but sort of intend to learn PostgreSQL. I didn't know that PostgreSQL has no real support for BLOBs. I agree that having to import them from a file is awful. Also, there appears to be a severe limit on the size of character data fields, so storing in Base64 is out. About the only thing to do then is to use external storage for the BLOBs. Still, email seems to demand such binary storage, whether all databases provide it or not. 
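For comparison, with an adapter that takes bytes parameters directly, no
hand escaping is needed at all. A sketch using Python's sqlite3 module
(table and column names are invented for illustration):

```python
import sqlite3

raw = b"Subject: hi\r\n\r\nbody with raw 8-bit: \xe9\xa4\r\n"  # not valid UTF-8

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (id INTEGER PRIMARY KEY, msg BLOB)")
# Parameterized insert: the driver binds the bytes itself, no escaping.
conn.execute("INSERT INTO mail (msg) VALUES (?)", (raw,))
(stored,) = conn.execute("SELECT msg FROM mail").fetchone()
assert stored == raw   # round-trips byte-for-byte
```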
-- ____________________________________________________________________ TonyN.:' ' From phd at phd.pp.ru Thu Apr 9 19:24:24 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 9 Apr 2009 21:24:24 +0400 Subject: [Python-Dev] BLOBs in Pg (was: email package Bytes vs Unicode) In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: <20090409172424.GD26429@phd.pp.ru> On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote: > I use MySQL, but sort of intend to learn PostgreSQL. I didn't know that > PostgreSQL has no real support for BLOBs. I think it has - BYTEA data type. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From john.arbash.meinel at gmail.com Thu Apr 9 19:35:05 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Thu, 09 Apr 2009 12:35:05 -0500 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE2AD4.6090605@cheimes.de> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> Message-ID: <49DE31C9.103@gmail.com> Christian Heimes wrote: > John Arbash Meinel wrote: >> When I looked at the actual references from interned, I saw mostly >> variable names. Considering that every variable goes through the python >> intern dict. And when you look at the intern function, it doesn't use >> setdefault logic, it actually does a get() followed by a set(), which >> means the cost of interning is 1-2 lookups depending on likelyhood, etc. >> (I saw a whole lot of strings as the error codes in win32all / >> winerror.py, and windows error codes tend to be longer-than-average >> variable length.) > > I've read your posting twice but I'm still not sure if you are aware of > the most important feature of interned strings. In the first place > interning not about saving some bytes of memory but a speed > optimization. Interned strings can be compared with a simple and fast > pointer comparison. 
With interend strings you can simple write: > > char *a, *b; > if (a == b) { > ... > } > > Instead of: > > char *a, *b; > if (strcmp(a, b) == 0) { > ... > } > > A compiler can optimize the pointer comparison much better than a > function call. > Certainly. But there is a cost associated with calling intern() in the first place. You created a string, and you are now trying to de-dup it. That cost is both in the memory to track all strings interned so far, and the cost to do a dict lookup. And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object. I'll also note that increasing memory does have a semi-direct effect on performance, because more memory requires more time to bring memory back and forth from main memory to CPU caches. ... > I agree that a dict is not the most memory efficient data structure for > interned strings. However dicts are extremely well tested and highly > optimized. Any specialized data structure needs to be desinged and > tested very carefully. If you happen to break the interning system it's > going to lead to rather nasty and hard to debug problems. Sure. My plan was to basically take the existing Set/Dict design, and just tweak it slightly for the expected operations of "interned". > >> e) How tuned is String.hash() for the fact that most of these strings >> are going to be ascii text? (I know that python wants to support >> non-ascii variable names, but I still think there is going to be an >> overwhelming bias towards characters in the range 65-122 ('A'-'z'). > > Python 3.0 uses unicode for all names. You have to design something that > can be adopted to unicode, too. By the way do you know that dicts have > an optimized lookup function for strings? It's called lookdict_unicode / > lookdict_string. Sure, but so does PySet. I'm not sure about lookset_unicode, but I would guess that exists or should exist for py3k. 
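A rough way to see the scale of that table overhead from Python itself (a
sketch; exact numbers vary by Python version and platform, and getsizeof
counts only the table, not the strings it references):

```python
import sys

names = ["variable_name_%d" % i for i in range(22000)]

as_dict = dict((n, n) for n in names)  # table-as-dict: key and value slots
as_set = set(names)                    # table-as-set: one reference per entry

print("dict: %d bytes, set: %d bytes"
      % (sys.getsizeof(as_dict), sys.getsizeof(as_set)))
```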
>
>> Also note that the performance of the "interned" dict gets even worse on
>> 64-bit platforms. Where the size of a 'dictentry' doubles, but the
>> average length of a variable name wouldn't change.
>>
>> Anyway, I would be happy to implement something along the lines of a
>> "StringSet", or maybe the "InternSet", etc. I just wanted to check if
>> people would be interested or not.
>
> Since interning is mostly used in the core and extension modules you
> might want to experiment with a different growth rate. The interning
> data structure could start with a larger value and have a slower, non
> progressive data growth rate.
>
> Christian

I'll also mention that there are other uses for intern() where it is
uniquely suitable. Namely, if you are parsing lots of text with
redundant strings, it is a way to decrease total memory consumption.
(And potentially speed up future comparisons, etc.)

The main reason why intern() is useful for this is that it doesn't make
strings immortal, as would happen if you used some other structure,
because strings know about the "interned" object.

The options for a 3rd-party structure fall into something like:

1) A cache that makes the strings immortal. (IIRC this is what older
   versions of Python did.)

2) A cache that is periodically walked to see if any of the objects are
   no longer externally referenced. The main problem here is that
   walking is O(all-objects), whereas doing the checking at refcount=0
   time means you only check objects when you think the last reference
   has gone away.

3) Hijacking PyStringType->dealloc, so that when the refcount goes to 0
   and Python wants to destroy the string, you then trigger your own
   cache to look and see if it should remove the object.

   Even further, you either have to check on every string dealloc, or
   re-use PyStringObject->ob_sstate to track that you have placed this
   string into your custom structure.
That would preclude ever calling intern() on this string, because
intern() doesn't just check a couple bits, it looks at the entire
ob_sstate value. I think you could make it work, such that if your
custom cache had set some values, then intern() would just return
without evaluating, and during dealloc you could make sure that you set
ob_sstate back to 0 before letting the rest of the python machinery
dealloc the string.

John
=:->

From steve at holdenweb.com  Thu Apr  9 20:05:54 2009
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 09 Apr 2009 14:05:54 -0400
Subject: [Python-Dev] BLOBs in Pg
In-Reply-To: <20090409172424.GD26429@phd.pp.ru>
References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
	<20090409172424.GD26429@phd.pp.ru>
Message-ID: <49DE3902.70103@holdenweb.com>

Oleg Broytmann wrote:
> On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote:
>> I use MySQL, but sort of intend to learn PostgreSQL. I didn't know that
>> PostgreSQL has no real support for BLOBs.
>
> I think it has - BYTEA data type.
>
But the Python DB adapters appear to require some fairly hairy escaping
of the data to make it usable with the cursor execute() method. IMHO you
shouldn't have to escape data that is passed for insertion via a
parameterized query.

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
Holden Web LLC                 http://www.holdenweb.com/
Watch PyCon on video now!          http://pycon.blip.tv/

From john.arbash.meinel at gmail.com  Thu Apr  9 20:20:11 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 13:20:11 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: 
References: <49DE0DF6.1040900@arbash-meinel.com>
Message-ID: <49DE3C5B.6020308@gmail.com>

Alexander Belopolsky wrote:
> On Thu, Apr 9, 2009 at 11:02 AM, John Arbash Meinel
> wrote:
> ...
>>  a) Don't keep a double reference to both key and value to the same
>>     object (1 pointer per entry), this could be as simple as using a
>>     Set() instead of a dict()
>>
>
> There is a rejected patch implementing just that:
> http://bugs.python.org/issue1507011 .
>

Thanks for the heads up. So reading that thread, the final reason it
was rejected was two-part:

Without reviewing the patch again, I also doubt it is capable of
getting rid of the reference count cheating: essentially, this cheating
enables the interning dictionary to have weak references to strings,
this is important to allow automatic collection of certain interned
strings. This feature needs to be preserved, so the cheating in the
reference count must continue.

That specific argument was invalid, because the patch just changed the
refcount trickery to use +- 1. And I'm pretty sure Alexander's argument
was just that +- 2 was weird, not that the "weakref" behavior was bad.

The other argument against the patch was based on the idea that:

The operation "give me the member equal but not identical to E" is
conceptually a lookup operation; the mathematical set construct has no
such operation, and the Python set models it closely. IOW, set is *not*
a dict with key==value.

I don't know if there was any consensus reached on this, since only
Martin responded this way.

I can say that for my "do some work with a medium size code base", the
overhead of "interned" as a dictionary was 1.5MB out of 20MB total
memory. Simply changing it to a Set would drop this to 1.0MB. I have no
proof about the impact on performance, since I haven't benchmarked it
yet. Changing it to a StringSet could further drop it to 0.5MB. I would
guess that any performance impact would depend on whether the total
size of 'interned' would fit inside L2 cache or not.

There is a small bug in the original patch where adding the string to
the set failed.
Namely it would return "t == NULL" which would be "t != s" and the
intern in place would end up setting your pointer to NULL rather than
doing nothing and clearing the error code.

So I guess some of it comes down to whether "loweis" would also reject
this change on the basis that mathematically a "set is not a dict".
Though his claim that "nobody else is speaking in favor of the patch"
no longer holds, as at least Collin Winter has expressed some interest
at this point.

John
=:->

From martin at v.loewis.de  Thu Apr  9 20:25:35 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 09 Apr 2009 20:25:35 +0200
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org>
Message-ID: <49DE3D9F.3000902@v.loewis.de>

> This is an interesting question, and something I'm struggling with for
> the email package for 3.x.  It turns out to be pretty convenient to have
> both a bytes and a string API, both for input and output, but I think
> email really wants to be represented internally as bytes.  Maybe.  Or
> maybe just for content bodies and not headers, or maybe both.  Anyway,
> aside from that decision, I haven't come up with an elegant way to allow
> /output/ in both bytes and strings (input is I think theoretically
> easier by sniffing the arguments).

If you allow for content-transfer-encoding: 8bit, I think there is just
no way to represent email as text. You have to accept conversion to,
say, base64 (or quoted-unreadable) when converting an email message to
text.
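A minimal sketch of that conversion:

```python
import base64

raw = b"caf\xe9 au lait\r\n"   # an 8-bit payload; 0xE9 alone is not valid UTF-8

text = base64.b64encode(raw).decode("ascii")  # now plain ASCII text
assert base64.b64decode(text) == raw          # and losslessly reversible
```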
Regards, Martin From tonynelson at georgeanelson.com Thu Apr 9 20:43:16 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Thu, 9 Apr 2009 14:43:16 -0400 Subject: [Python-Dev] BLOBs in Pg (was: email package Bytes vs Unicode) In-Reply-To: <20090409172424.GD26429@phd.pp.ru> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <20090409172424.GD26429@phd.pp.ru> Message-ID: At 21:24 +0400 04/09/2009, Oleg Broytmann wrote: >On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote: >> I use MySQL, but sort of intend to learn PostgreSQL. I didn't know that >> PostgreSQL has no real support for BLOBs. > > I think it has - BYTEA data type. So it does; I see that now that I've opened up the PostgreSQL docs. I don't find escaping data to be a problem -- I do it for all untrusted data. So, after all, there isn't an example of a database that makes onerous the storing of email and other such byte-oriented data, and Python's email package has no need for workarounds in that area. -- ____________________________________________________________________ TonyN.:' ' From martin at v.loewis.de Thu Apr 9 21:06:40 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Apr 2009 21:06:40 +0200 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE3C5B.6020308@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE3C5B.6020308@gmail.com> Message-ID: <49DE4740.2040205@v.loewis.de> > So I guess some of it comes down to whether "loweis" would also reject > this change on the basis that mathematically a "set is not a dict". I'd like to point out that this was not the reason to reject it. Instead, this (or, the opposite of it) was given as a reason why this patch should be accepted (in msg50482). I found that a weak rationale for making that change, in particular because I think the rationale is incorrect. 
I like your rationale (save memory) much more, and was asking in the
tracker for specific numbers, which weren't forthcoming.

> Though given that his claim "nobody else is speaking in favor of the
> patch", while at least Colin Winter has expressed some interest at this
> point.

Again, at that point in the tracker, none of the other committers had
spoken in favor of the patch. Since I wasn't convinced of its
correctness, and nobody else (whom I trust) had reviewed it as correct,
I rejected it.

Now that you have brought up specific numbers, I tried to verify them,
and found them correct (although a bit unfortunate); please see my test
script below. Up to 21800 interned strings, the dict takes (only)
384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
interned strings is "typical", I still don't know.

Wrt. your proposed change, I would be worried about maintainability, in
particular if it would copy parts of the set implementation.

Regards,
Martin

import gc, sys

def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if "find_interned_dict" not in o:
            continue
        for k, v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d), sys.getsizeof(d)

l = []
for i in range(20000):
    if i % 100 == 0:
        print len(d), sys.getsizeof(d)
    l.append(intern(repr(i)))

From benjamin at python.org  Thu Apr  9 21:17:39 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 9 Apr 2009 14:17:39 -0500
Subject: [Python-Dev] calling dictresize outside dictobject.c
In-Reply-To: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>
References: <6CE3CEB2-0753-4708-99A5-78F2B05A054C@colgate.edu>
Message-ID: <1afaf6160904091217g30cbda5bt27529a4fe44e5f0e@mail.gmail.com>

Hi Dan,
Thanks for your interest.

2009/4/6 Dan Schult :
> Hi,
> I'm trying to write a C extension which is a subclass of dict.
> I want to do something like a setdefault() but with a single lookup.
>
> Looking through the dictobject code, the three workhorse
> routines lookdict, insertdict and dictresize are not available
> directly for functions outside dictobject.c,
> but I can get at lookdict through dict->ma_lookup().
>
> So I use lookdict to get the PyDictEntry (call it ep) I'm looking for.
> The comments for lookdict say ep is ready to be set... so I do that.
> Then I check whether the dict needs to be resized--following the
> nice example of PyDict_SetItem. But I can't call dictresize to finish
> off the process.
>
> Should I be using PyDict_SetItem directly? No... it does its own lookup.
> I don't want a second lookup! I already know which entry will be filled.
>
> So then I look at the code for setdefault and it also does
> a double lookup for checking and setting an entry.
>
> What subtle issue am I missing?
> Why does setdefault do a double lookup?
> More globally, why isn't dictresize available through the C-API?

Because it's not useful outside the intimate implementation details
of dictobject.c

>
> If there isn't a reason to do a double lookup I have a patch for setdefault,
> but I thought I should ask here first.

Raymond tells me the cost of the second lookup is negligible because of
caching, but PyObject_Hash needn't be called two times. He's working on
a patch later today.

--
Regards,
Benjamin

From alexandre at peadrop.com Thu Apr 9 21:51:15 2009
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 9 Apr 2009 15:51:15 -0400
Subject: [Python-Dev] Dropping bytes "support" in json
In-Reply-To:
References:
Message-ID:

On Thu, Apr 9, 2009 at 1:15 AM, Antoine Pitrou wrote:
> As for reading/writing bytes over the wire, JSON is often used in the same
> context as HTML: you are supposed to know the charset and decode/encode the
> payload using that charset. However, the RFC specifies a default encoding of
> utf-8. (*)
>
>
> (*) http://www.ietf.org/rfc/rfc4627.txt
>

That is one short and sweet RFC.
:-)

> The RFC also specifies a discrimination algorithm for non-supersets of ASCII
> ("Since the first two characters of a JSON text will always be ASCII
>    characters [RFC0020], it is possible to determine whether an octet
>    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>    at the pattern of nulls in the first four octets."), but it is not
> implemented in the json module:

Given the RFC specifies that the encoding used should be one of the
encodings defined by Unicode, wouldn't it be a better idea to remove the
"unicode" support, instead? To me, it would make sense to use the
detection algorithms for Unicode to sniff the encoding of the JSON
stream and then use the detected encoding to decode the strings embedded
in the JSON stream.

Cheers,
-- Alexandre

From john.arbash.meinel at gmail.com Thu Apr 9 21:59:02 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Thu, 09 Apr 2009 14:59:02 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE4740.2040205@v.loewis.de>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de>
Message-ID: <49DE5386.7070908@gmail.com>

...
> I like your rationale (save memory) much more, and was asking in the
> tracker for specific numbers, which weren't forthcoming.
> ...
> Now that you brought up specific numbers, I tried to verify them,
> and found them correct (although a bit unfortunate), please see my
> test script below. Up to 21800 interned strings, the dict takes (only)
> 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
> interned strings is "typical", I still don't know.

Given that every variable name in any file is interned, it can grow
pretty rapidly. As an extreme case, consider the file
"win32/lib/winerror.py" which tracks all possible win32 errors.

>>> import winerror
>>> print len(winerror.__dict__)
1872

So a single error file has 1.9k strings.
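The winerror count above tallies a module's namespace after import. A rough way to count the distinct identifiers a file would intern, without importing it, is to walk its NAME tokens. A sketch in Python 3 spelling, run on a made-up snippet; keywords are filtered out, so this only approximates what the compiler actually interns:

```python
import io
import keyword
import tokenize

def unique_names(source):
    """Distinct NAME tokens in Python source -- roughly the set of
    strings interned when this code is compiled (keywords excluded)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {tok.string for tok in tokens
            if tok.type == tokenize.NAME
            and not keyword.iskeyword(tok.string)}

src = (
    "def winerror_lookup(code):\n"
    "    table = {0: 'ERROR_SUCCESS'}\n"
    "    return table.get(code, 'unknown')\n"
)
print(sorted(unique_names(src)))
```

String literals such as 'ERROR_SUCCESS' show up as STRING tokens, not NAME, so they are deliberately left out of the count.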
My python version (2.5.2) doesn't have 'sys.getsizeof()', but otherwise
your code looks correct.

If all I do is find the interned dict, I see:

>>> print len(d)
5037

So stock python, without importing much extra (just os, sys, gc, etc.)
has almost 5k strings already.

I don't have a great regex yet for just extracting how many unique
strings there are in a given bit of source code. However, if I do:

import gc, sys

def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if "find_interned_dict" not in o:
            continue
        for k,v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d)
# Just import a few of the core structures
from bzrlib import branch, repository, workingtree, builtins
print len(d)

I start at 5k strings, and after just importing the important bits of
bzrlib, I'm at: 19,316

Now, the bzrlib source code isn't particularly huge. It is about
3.7MB / 91k lines of .py files (that is, without importing the test
suite). Memory consumption with just importing bzrlib shows up at 15MB,
with 300kB taken up by the intern dict.

If I then import some extra bits of bzrlib, like http support, ftp
support, and sftp support (which brings in python's httplib, and
paramiko, and ssh/sftp implementation), I'm up to:

>>> print len(d)
25186

Memory has jumped to 23MB, (interned is now 1.57MB) and I haven't
actually done anything but import python code yet.

If I sum the size of the PyString objects held in intern() it amounts
to 940KB. Though they refer to only 335KB of char data. (or an average
of 13 bytes per string).

>
> Wrt. your proposed change, I would be worried about maintainability,
> in particular if it would copy parts of the set implementation.

Right, so in the first part, I would just use Set(), as it could then
save 1/3rd of the memory it uses today. (Dropping down to 1MB from
1.5MB.)
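The projected one-third saving comes from a dict entry carrying a value slot that a set entry would omit. The container overhead for an intern-sized population can be compared directly; a sketch in Python 3, where the key count is borrowed from the bzr figures above and the absolute numbers, even the ratio, depend on the interpreter version, so treat the output as indicative only:

```python
import sys

keys = ["symbol_%d" % i for i in range(25000)]   # roughly bzr's count

as_dict = dict.fromkeys(keys)   # models interned-as-dict, {s: s}
as_set = set(keys)              # models interned-as-set

print("dict table: %d bytes" % sys.getsizeof(as_dict))
print("set table:  %d bytes" % sys.getsizeof(as_set))
```

sys.getsizeof() reports only the container's own table, not the string objects it references, which is exactly the part the proposed change would shrink.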
I don't have numbers on how much that would improve CPU times, I would imagine improving 'intern()' would impact import times more than run times, simply because import time is interning a *lot* of strings. Though honestly, Bazaar would really like this, because startup overhead for us is almost 400ms to 'do nothing', which is a lot for a command line app. John =:-> From martin at v.loewis.de Thu Apr 9 22:05:28 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Apr 2009 22:05:28 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1> References: <49DCEDFF.7050708@v.loewis.de> <49DD8DC8.8020302@v.loewis.de> <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1> Message-ID: <49DE5508.7000309@v.loewis.de> >> I can understand that you don't want to spend much time on it. How >> about removing it from 3.1? We could re-add it when long-term support >> becomes more likely. > > I'm speechless. It seems that my statement has surprised you, so let me explain: I think we should refrain from making design decisions (such as API decisions) without Bob's explicit consent, unless we assign a new maintainer for the simplejson module (perhaps just for the 3k branch, which perhaps would be a fork from Bob's code). Antoine suggests that Bob did not comment on the issues at hand, therefore, we should not proceed with the proposed design. Since the 3.1 release is only a few weeks ahead, we have the choice of either shipping with the broken version that is currently in the 3k branch, or drop the module from the 3k branch. I believe our users are better served by not having to waste time with a module that doesn't quite work, or may change. 
Regards, Martin From martin at v.loewis.de Thu Apr 9 22:13:40 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 09 Apr 2009 22:13:40 +0200 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE5386.7070908@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de> <49DE5386.7070908@gmail.com> Message-ID: <49DE56F4.6050904@v.loewis.de> > I don't have numbers on how much that would improve CPU times, I would > imagine improving 'intern()' would impact import times more than run > times, simply because import time is interning a *lot* of strings. > > Though honestly, Bazaar would really like this, because startup overhead > for us is almost 400ms to 'do nothing', which is a lot for a command > line app. Maybe I misunderstand your proposed change: how could the representation of the interning dict possibly change the runtime of interning? (let alone significantly) Regards, Martin From martin at v.loewis.de Thu Apr 9 22:19:43 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 09 Apr 2009 22:19:43 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: Message-ID: <49DE585F.6040209@v.loewis.de> Alexandre Vassalotti wrote: > On Thu, Apr 9, 2009 at 1:15 AM, Antoine Pitrou wrote: >> As for reading/writing bytes over the wire, JSON is often used in the same >> context as HTML: you are supposed to know the charset and decode/encode the >> payload using that charset. However, the RFC specifies a default encoding of >> utf-8. (*) >> >> >> (*) http://www.ietf.org/rfc/rfc4627.txt >> > > That is one short and sweet RFC. :-) It is indeed well-specified. Unfortunately, it only talks about the application/json type; the pre-existing other versions of json in MIME types vary widely, such as text/plain (possibly with a charset= parameter), text/json, or text/javascript. For these, the RFC doesn't apply. 
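Where the RFC does apply, the null-pattern sniffing quoted earlier in the thread takes only a few lines to implement. A sketch in Python 3 spelling; like the RFC's rule, it assumes the text begins with at least two ASCII characters:

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of a JSON byte stream from the pattern
    of NUL bytes in its first four octets (RFC 4627, section 3)."""
    if data[:3] == b"\x00\x00\x00":
        return "utf-32-be"
    if data[1:4] == b"\x00\x00\x00":
        return "utf-32-le"
    if data[:1] == b"\x00":
        return "utf-16-be"
    if data[1:2] == b"\x00":
        return "utf-16-le"
    return "utf-8"

for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    payload = '["json"]'.encode(enc)
    print(enc, detect_json_encoding(payload) == enc)
```

A production version would also have to deal with byte-order marks, which the RFC forbids but real-world producers emit anyway; much later CPythons ship a similar helper that json.loads consults when handed bytes.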
> Given the RFC specifies that the encoding used should be one of the > encodings defined by Unicode, wouldn't be a better idea to remove the > "unicode" support, instead? To me, it would make sense to use the > detection algorithms for Unicode to sniff the encoding of the JSON > stream and then use the detected encoding to decode the strings embed > in the JSON stream. That might be reasonable. (but then, I also stand by my view that we shouldn't proceed without Bob's approval). Regards, Martin From john.arbash.meinel at gmail.com Thu Apr 9 22:22:04 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Thu, 09 Apr 2009 15:22:04 -0500 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE56F4.6050904@v.loewis.de> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de> <49DE5386.7070908@gmail.com> <49DE56F4.6050904@v.loewis.de> Message-ID: <49DE58EC.4000803@gmail.com> Martin v. L?wis wrote: >> I don't have numbers on how much that would improve CPU times, I would >> imagine improving 'intern()' would impact import times more than run >> times, simply because import time is interning a *lot* of strings. >> >> Though honestly, Bazaar would really like this, because startup overhead >> for us is almost 400ms to 'do nothing', which is a lot for a command >> line app. > > Maybe I misunderstand your proposed change: how could the representation > of the interning dict possibly change the runtime of interning? (let > alone significantly) > > Regards, > Martin > Decreasing memory consumption lets more things fit in cache. Once the size of 'interned' is greater than fits into L2 cache, you start paying the cost of a full memory fetch, which is usually measured in 100s of cpu cycles. 
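One way to put numbers behind that cache argument, in the spirit of measuring rather than guessing: time the same number of lookups against a table that fits in cache and one that cannot. A sketch only; the key names and sizes are invented here, and both the absolute times and the ratio are entirely machine-dependent:

```python
import random
import timeit

small_keys = ["key%d" % i for i in range(1000)]
big_keys = ["key%d" % i for i in range(1000000)]
small = dict.fromkeys(small_keys, 0)   # tens of KB: cache-resident
big = dict.fromkeys(big_keys, 0)       # tens of MB: far beyond L2

random.shuffle(big_keys)   # a random walk defeats hardware prefetching

# One million lookups each way, so the per-lookup costs are comparable.
t_small = timeit.timeit(lambda: [small[k] for k in small_keys], number=1000)
t_big = timeit.timeit(lambda: [big[k] for k in big_keys], number=1)

print("hot table:  %.0f ns/lookup" % (t_small / 1e6 * 1e9))
print("cold table: %.0f ns/lookup" % (t_big / 1e6 * 1e9))
```

Probing the big table with its full, shuffled key set is what keeps it cold; probing it with a small fixed set of keys would let those entries become cache-hot and mask the effect.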
Avoiding double lookups in the dictionary would be less overhead, though the second lookup is probably pretty fast if there are no collisions, since everything would already be in the local CPU cache. If we were dealing in objects that were KB in size, it wouldn't matter. But as the intern dict quickly gets into MB, it starts to make a bigger difference. How big of a difference would be very CPU and dataset size specific. But certainly caches make certain things much faster, and once you overflow a cache, performance can take a surprising turn. So my primary goal is certainly a decrease of memory consumption. I think it will have a small knock-on effect of improving performance, I don't have anything to give concrete numbers. Also, consider that resizing has to evaluate every object, thus paging in all X bytes, and assigning to another 2X bytes. Cutting X by (potentially 3), would probably have a small but measurable effect. John =:-> From steve at holdenweb.com Thu Apr 9 22:42:21 2009 From: steve at holdenweb.com (Steve Holden) Date: Thu, 09 Apr 2009 16:42:21 -0400 Subject: [Python-Dev] BLOBs in Pg In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <20090409172424.GD26429@phd.pp.ru> Message-ID: Tony Nelson wrote: > At 21:24 +0400 04/09/2009, Oleg Broytmann wrote: >> On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote: >>> I use MySQL, but sort of intend to learn PostgreSQL. I didn't know that >>> PostgreSQL has no real support for BLOBs. >> I think it has - BYTEA data type. > > So it does; I see that now that I've opened up the PostgreSQL docs. I > don't find escaping data to be a problem -- I do it for all untrusted data. > You shouldn't have to when you are using parameterized queries. > So, after all, there isn't an example of a database that makes onerous the > storing of email and other such byte-oriented data, and Python's email > package has no need for workarounds in that area. 
Create a table:

CREATE TABLE tst
(
  id serial,
  byt bytea,
  PRIMARY KEY (id)
) WITH (OIDS=FALSE);
ALTER TABLE tst OWNER TO steve;

The following program prints "0":

import psycopg2 as db
conn = db.connect(database="maildb", user="@@@", password="@@@",
                  host="localhost", port=5432)
curs = conn.cursor()
curs.execute("DELETE FROM tst")
curs.execute("INSERT INTO tst (byt) VALUES (%s)",
             ("".join(chr(i) for i in range(256)), ))
conn.commit()
curs.execute("SELECT byt FROM tst")
for st, in curs.fetchall():
    print len(st)

If I change the data to use range(1, 256) I get a ProgrammingError from
PostgreSQL "invalid input syntax for type bytea".

If I can't pass a 256-byte string into a BLOB and get it back without
anything like this happening then there's *something* in the chain that
makes the database useless. My current belief is that this something is
fairly deeply embedded in the PostgreSQL engine. No "syntax" should be
necessary.

I suppose if we have to go round again on this we should take it to
email as we have gotten pretty far off-topic for python-dev.

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/
Watch PyCon on video now!
http://pycon.blip.tv/ From aahz at pythoncraft.com Thu Apr 9 22:53:26 2009 From: aahz at pythoncraft.com (Aahz) Date: Thu, 9 Apr 2009 13:53:26 -0700 Subject: [Python-Dev] BLOBs in Pg In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <20090409172424.GD26429@phd.pp.ru> Message-ID: <20090409205326.GA2807@panix.com> On Thu, Apr 09, 2009, Steve Holden wrote: > > import psycopg2 as db > conn = db.connect(database="maildb", user="@@@", password="@@@", > host="localhost", port=5432) > curs = conn.cursor() > curs.execute("DELETE FROM tst") > curs.execute("INSERT INTO tst (byt) VALUES (%s)", > ("".join(chr(i) for i in range(256)), )) > conn.commit() > curs.execute("SELECT byt FROM tst") > for st, in curs.fetchall(): > print len(st) > > If I change the date to use range(1, 256) I get a ProgrammingError fron > PostgreSQL "invalid input syntax for type bytea". > > If I can't pass a 256-byte string into a BLOB and get it back without > anything like this happening then there's *something* in the chain that > makes the database useless. My current belief is that this something is > fairly deeply embedded in the PostgreSQL engine. No "syntax" should be > necessary. You're not using a parameterized query. I suggest you post to c.l.py for more information. ;-) -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From phd at phd.pp.ru Thu Apr 9 23:12:17 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 10 Apr 2009 01:12:17 +0400 Subject: [Python-Dev] BLOBs in Pg In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <20090409172424.GD26429@phd.pp.ru> Message-ID: <20090409211217.GA7897@phd.pp.ru> On Thu, Apr 09, 2009 at 04:42:21PM -0400, Steve Holden wrote: > If I can't pass a 256-byte string into a BLOB and get it back without > anything like this happening then there's *something* in the chain that > makes the database useless. 
import psycopg2 con = psycopg2.connect(database="test") cur = con.cursor() cur.execute("CREATE TABLE test (id serial, data BYTEA)") cur.execute('INSERT INTO test (data) VALUES (%s)', (psycopg2.Binary(''.join([chr(i) for i in range(256)])),)) cur.execute('SELECT * FROM test ORDER BY id') for rec in cur.fetchall(): print rec[0], type(rec[1]), repr(str(rec[1])) Result: 1 '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' What am I doing wrong? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From bob at redivi.com Thu Apr 9 23:13:50 2009 From: bob at redivi.com (Bob Ippolito) Date: Thu, 9 Apr 2009 14:13:50 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49DE5508.7000309@v.loewis.de> References: <49DCEDFF.7050708@v.loewis.de> <49DD8DC8.8020302@v.loewis.de> <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1> <49DE5508.7000309@v.loewis.de> Message-ID: <6a36e7290904091413i10994056k754b6ce04a93c0c5@mail.gmail.com> On Thu, Apr 9, 2009 at 1:05 PM, "Martin v. L?wis" wrote: >>> I can understand that you don't want to spend much time on it. How >>> about removing it from 3.1? We could re-add it when long-term support >>> becomes more likely. >> >> I'm speechless. 
> > It seems that my statement has surprised you, so let me explain: > > I think we should refrain from making design decisions (such as > API decisions) without Bob's explicit consent, unless we assign > a new maintainer for the simplejson module (perhaps just for the > 3k branch, which perhaps would be a fork from Bob's code). > > Antoine suggests that Bob did not comment on the issues at hand, > therefore, we should not proceed with the proposed design. Since > the 3.1 release is only a few weeks ahead, we have the choice of > either shipping with the broken version that is currently in the > 3k branch, or drop the module from the 3k branch. I believe our > users are better served by not having to waste time with a module > that doesn't quite work, or may change. Most of my time to spend on json/simplejson and these mailing list discussions is on weekends, I try not to bother with it when I'm busy doing Actual Work unless there is a bug or some other issue that needs more immediate attention. I also wasn't aware that I was expected to comment on those issues. I'm CC'ed on the discussion for issue4136 but I don't see any unanswered questions directed at me. I have the issues (issue5723, issue4136) starred in my gmail and I planned to look at it more closely later, hopefully on Friday or Saturday. As far as Python 3 goes, I honestly have not yet familiarized myself with the changes to the IO infrastructure and what the new idioms are. At this time, I can't make any educated decisions with regard to how it should be done because I don't know exactly how bytes are supposed to work and what the common idioms are for other libraries in the stdlib that do similar things. Until I figure that out, someone else is better off making decisions about the Python 3 version. My guess is that it should work the same way as it does in Python 2.x: take bytes or unicode input in loads (which means encoding is still relevant). 
I also think the output of dumps should also be bytes, since it is a serialization, but I am not sure how other libraries do this in Python 3 because one could argue that it is also text. If other libraries that do text/text encodings (e.g. binascii, mimelib, ...) use str for input and output instead of bytes then maybe Antoine's changes are the right solution and I just don't know better because I'm not up to speed with how people write Python 3 code. I'll do my best to find some time to look into Python 3 more closely soon, but thus far I have not been very motivated to do so because Python 3 isn't useful for us at work and twiddling syntax isn't a very interesting problem for me to solve. -bob From martin at v.loewis.de Fri Apr 10 00:07:23 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 10 Apr 2009 00:07:23 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <6a36e7290904091413i10994056k754b6ce04a93c0c5@mail.gmail.com> References: <49DCEDFF.7050708@v.loewis.de> <49DD8DC8.8020302@v.loewis.de> <351C98023EB24D4EACF1794F7CDAC272@RaymondLaptop1> <49DE5508.7000309@v.loewis.de> <6a36e7290904091413i10994056k754b6ce04a93c0c5@mail.gmail.com> Message-ID: <49DE719B.8050101@v.loewis.de> > As far as Python 3 goes, I honestly have not yet familiarized myself > with the changes to the IO infrastructure and what the new idioms are. > At this time, I can't make any educated decisions with regard to how > it should be done because I don't know exactly how bytes are supposed > to work and what the common idioms are for other libraries in the > stdlib that do similar things. It's really very similar to 2.x: the "bytes" type is to used in all interfaces that operate on byte sequences that may or may not represent characters; in particular, for interface where the operating system deliberately uses bytes - ie. 
low-level file IO and socket IO; also for cases where the encoding is embedded in the stream that still needs to be processed (e.g. XML parsing). (Unicode) strings should be used where the data is truly text by nature, i.e. where no encoding information is necessary to find out what characters are intended. It's used on interfaces where the encoding is known (e.g. text IO, where the encoding is specified on opening, XML parser results, with the declared encoding, and GUI libraries, which naturally expect text). > Until I figure that out, someone else > is better off making decisions about the Python 3 version. Some of us can certainly explain to you how this is supposed to work. However, we need you to check any assumption against the known use cases - would the users of the module be happy if it worked one way or the other? > My guess is > that it should work the same way as it does in Python 2.x: take bytes > or unicode input in loads (which means encoding is still relevant). I > also think the output of dumps should also be bytes, since it is a > serialization, but I am not sure how other libraries do this in Python > 3 because one could argue that it is also text. This, indeed, had been an endless debate, and, in the end, the decision was somewhat arbitrary. Here are some examples: - base64.encodestring expects bytes (naturally, since it is supposed to encode arbitrary binary data), and produces bytes (debatably) - binascii.b2a_hex likewise (expect and produce bytes) - pickle.dumps produces bytes (uniformly, both for binary and text pickles) - marshal.dumps likewise - email.message.Message().as_string produces a (unicode) string (see Barry's recent thread on whether that's a good thing; the email package hasn't been fully ported to 3k, either) - the XML libraries (continue to) parse bytes, and produce Unicode strings - for the IO libraries, see above > If other libraries > that do text/text encodings (e.g. binascii, mimelib, ...) 
use str for > input and output See above - most of them don't; mimetools is no longer (replaced by email package) > instead of bytes then maybe Antoine's changes are the > right solution and I just don't know better because I'm not up to > speed with how people write Python 3 code. There isn't too much fresh end-user code out there, so we can't really tell, either. As for standard library users - users will do whatever the library forces them to do. This is why I'm so concerned about this issue: we should get it right, or not done at all. I still think you would be the best person to determine what is right. > I'll do my best to find some time to look into Python 3 more closely > soon, but thus far I have not been very motivated to do so because > Python 3 isn't useful for us at work and twiddling syntax isn't a very > interesting problem for me to solve. And I didn't expect you to - it seems people are quite willing to do the actual work, as long as there is some guidance. Regards, Martin From martin at v.loewis.de Fri Apr 10 00:10:18 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 10 Apr 2009 00:10:18 +0200 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE58EC.4000803@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de> <49DE5386.7070908@gmail.com> <49DE56F4.6050904@v.loewis.de> <49DE58EC.4000803@gmail.com> Message-ID: <49DE724A.1070300@v.loewis.de> > Also, consider that resizing has to evaluate every object, thus paging > in all X bytes, and assigning to another 2X bytes. Cutting X by > (potentially 3), would probably have a small but measurable effect. I'm *very* skeptical about claims on performance in the absence of actual measurements. 
Too many effects come together, so the actual performance is difficult to predict (and, for that prediction, you would need *at least* a work load that you want to measure - starting bzr would be such a workload, of course). Regards, Martin From steve at holdenweb.com Fri Apr 10 01:56:25 2009 From: steve at holdenweb.com (Steve Holden) Date: Thu, 09 Apr 2009 19:56:25 -0400 Subject: [Python-Dev] BLOBs in Pg In-Reply-To: <20090409211217.GA7897@phd.pp.ru> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <20090409172424.GD26429@phd.pp.ru> <20090409211217.GA7897@phd.pp.ru> Message-ID: Oleg Broytmann wrote: > On Thu, Apr 09, 2009 at 04:42:21PM -0400, Steve Holden wrote: >> If I can't pass a 256-byte string into a BLOB and get it back without >> anything like this happening then there's *something* in the chain that >> makes the database useless. > > import psycopg2 > > con = psycopg2.connect(database="test") > cur = con.cursor() > cur.execute("CREATE TABLE test (id serial, data BYTEA)") > cur.execute('INSERT INTO test (data) VALUES (%s)', (psycopg2.Binary(''.join([chr(i) for i in range(256)])),)) > cur.execute('SELECT * FROM test ORDER BY id') > for rec in cur.fetchall(): > print rec[0], type(rec[1]), repr(str(rec[1])) > > Result: > > 1 '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f 
!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' > > What am I doing wrong? > > Oleg. Corresponding with me, probably. Thank you Oleg. I feel suddenly saner again. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Watch PyCon on video now! http://pycon.blip.tv/ From jake at youtube.com Fri Apr 10 02:37:56 2009 From: jake at youtube.com (Jake McGuire) Date: Thu, 9 Apr 2009 17:37:56 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE4740.2040205@v.loewis.de> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE3C5B.6020308@gmail.com> <49DE4740.2040205@v.loewis.de> Message-ID: <0C73464C-CA60-4DAA-9E7B-88D9D0F5FD42@youtube.com> On Apr 9, 2009, at 12:06 PM, Martin v. L?wis wrote: > Now that you brought up a specific numbers, I tried to verify them, > and found them correct (although a bit unfortunate), please see my > test script below. Up to 21800 interned strings, the dict takes (only) > 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k > interned strings is "typical", I still don't know. > > Wrt. your proposed change, I would be worried about maintainability, > in particular if it would copy parts of the set implementation. I connected to a random one of our processes, which has been running for a typical amount of time and is currently at ~300MB RSS. 
(gdb) p *(PyDictObject*)interned $2 = {ob_refcnt = 1, ob_type = 0x8121240, ma_fill = 97239, ma_used = 95959, ma_mask = 262143, ma_table = 0xa493c008, ....} Going from 3MB to 2.25MB isn't much, but it's not nothing, either. I'd be skeptical of cache performance arguments given that the strings used in any particular bit of code should be spread pretty much evenly throughout the hash table, and 3MB seems solidly bigger than any L2 cache I know of. You should be able to get meaningful numbers out of a C profiler, but I'd be surprised to see the act of interning taking a noticeable amount of time. -jake From greg.ewing at canterbury.ac.nz Fri Apr 10 03:01:26 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Apr 2009 13:01:26 +1200 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE0DF6.1040900@arbash-meinel.com> References: <49DE0DF6.1040900@arbash-meinel.com> Message-ID: <49DE9A66.2020109@canterbury.ac.nz> John Arbash Meinel wrote: > And when you look at the intern function, it doesn't use > setdefault logic, it actually does a get() followed by a set(), which > means the cost of interning is 1-2 lookups depending on likelyhood, etc. Keep in mind that intern() is called fairly rarely, mostly only at module load time. It may not be worth attempting to speed it up. 
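The point that interning clusters at load time is easy to see from the outside: in CPython, the compiler interns the identifiers it stores in a code object, so by the time a module runs, its names already are the canonical copies. A small Python 3 sketch of this CPython-specific behaviour:

```python
import sys

code = compile("spam = eggs + 1", "<demo>", "exec")

# The compiler already interned these identifiers, so intern()
# hands back the very same objects instead of making new entries.
for name in code.co_names:
    print(name, sys.intern(name) is name)
```

This is why import-heavy startup, like the bzr case in this thread, is where most intern() traffic happens: every compiled module contributes its identifiers in one burst.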
--
Greg

From benjamin at python.org Fri Apr 10 03:01:50 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 9 Apr 2009 20:01:50 -0500
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE9A66.2020109@canterbury.ac.nz>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE9A66.2020109@canterbury.ac.nz>
Message-ID: <1afaf6160904091801k2d5ffccdm700bee842bf1a1f5@mail.gmail.com>

2009/4/9 Greg Ewing :
> John Arbash Meinel wrote:
>>
>> And when you look at the intern function, it doesn't use
>> setdefault logic, it actually does a get() followed by a set(), which
>> means the cost of interning is 1-2 lookups depending on likelihood, etc.
>
> Keep in mind that intern() is called fairly rarely, mostly
> only at module load time. It may not be worth attempting
> to speed it up.

That's very important, though, for a command line tool for bazaar. Even
a few fractions of a second can make a difference in user perception of
speed.

--
Regards,
Benjamin

From greg.ewing at canterbury.ac.nz Fri Apr 10 03:22:10 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 10 Apr 2009 13:22:10 +1200
Subject: [Python-Dev] Rethinking intern() and its data structure
In-Reply-To: <49DE31C9.103@gmail.com>
References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com>
Message-ID: <49DE9F42.5000704@canterbury.ac.nz>

John Arbash Meinel wrote:
> And the way intern is currently
> written, there is a third cost when the item doesn't exist yet, which is
> another lookup to insert the object.

That's even rarer still, since it only happens the first
time you load a piece of code that uses a given variable
name anywhere in any module.
-- Greg From john.arbash.meinel at gmail.com Fri Apr 10 03:24:04 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Thu, 09 Apr 2009 20:24:04 -0500 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE9F42.5000704@canterbury.ac.nz> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> Message-ID: <49DE9FB4.9060908@gmail.com> Greg Ewing wrote: > John Arbash Meinel wrote: >> And the way intern is currently >> written, there is a third cost when the item doesn't exist yet, which is >> another lookup to insert the object. > > That's even rarer still, since it only happens the first > time you load a piece of code that uses a given variable > name anywhere in any module. > Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms. John =:-> From nyamatongwe at gmail.com Fri Apr 10 03:49:04 2009 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Fri, 10 Apr 2009 11:49:04 +1000 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> <5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com> Message-ID: <50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com> cmake does not produce relative paths in its generated make and project files. 
There is an option CMAKE_USE_RELATIVE_PATHS which appears to do this but the documentation says: """This option does not work for more complicated projects, and relative paths are used when possible. In general, it is not possible to move CMake generated makefiles to a different location regardless of the value of this variable.""" This means that generated Visual Studio project files will not work for other people unless a particular absolute build location is specified for everyone which will not suit most. Each person that wants to build Python will have to run cmake before starting Visual Studio thus increasing the prerequisites. Neil From barry at python.org Fri Apr 10 04:26:22 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 22:26:22 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> On Apr 9, 2009, at 8:07 AM, Steve Holden wrote: > The real problem I came across in storing email in a relational > database > was the inability to store messages as Unicode. Some messages have a > body in one encoding and an attachment in another, so the only ways to > store the messages are either as a monolithic bytes string that gets > parsed when the individual components are required or as a sequence of > components in the database's preferred encoding (if you want to keep > the > original encoding most relational databases won't be able to help > unless > you store the components as bytes). > > All in all, as you might expect from a system that's been growing up > since 1970 or so, it can be quite intractable. There are really two ways to look at an email message. It's either an unstructured blob of bytes, or it's a structured tree of objects. Those objects have headers and payload. 
The payload can be of any type, though I think it generally breaks down into "strings" for text/* types and bytes for anything else (not counting multiparts). The email package isn't a perfect mapping to this, which is something I want to improve. That aside, I think storing a message in a database means storing some or all of the headers separately from the byte stream (or text?) of its payload. That's for non-multipart types. It would be more complicated to represent a message tree of course. It does seem to make sense to think about headers as text header names and text header values. Of course, header values can contain almost anything and there's an encoding to bring it back to 7-bit ASCII, but again, you really have two views of a header value. Which you want really depends on your application. Maybe you just care about the text of both the header name and value. In that case, I think you want the values as unicodes, and probably the headers as unicodes containing only ASCII. So your table would be strings in both cases. OTOH, maybe your application cares about the raw underlying encoded data, in which case the header names are probably still strings of ASCII-ish unicodes and the values are bytes. It's this distinction (and I think the competing use cases) that makes a true Python 3.x API for email more complicated. Thinking about this stuff makes me nostalgic for the sloppy happy days of Python 2.x -Barry -------------- next part -------------- A non-text attachment was scrubbed...
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 04:29:12 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 22:29:12 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <66887.1239289730@parc.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <66887.1239289730@parc.com> Message-ID: On Apr 9, 2009, at 11:08 AM, Bill Janssen wrote: > Barry Warsaw wrote: > >> Anyway, aside from that decision, I haven't come up with an >> elegant way to allow /output/ in both bytes and strings (input is I >> think theoretically easier by sniffing the arguments). > > Probably a good thing. It just promotes more confusion to do things > that way, IMO. Very possibly so. But applications will definitely want stuff like the text/plain payload as a unicode, or the image/gif payload as a bytes (or even as a PIL image or whatever). Not that I think the email package needs to know about every content type under the sun, but I do think that it should be pluggable so as to allow applications to more conveniently access the data that way. Possibly the defaults should be unicodes for any text/* type and bytes for everything else. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 04:38:11 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 22:38:11 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote: > On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw wrote: > Anyway, aside from that decision, I haven't come up with an elegant > way to allow /output/ in both bytes and strings (input is I think > theoretically easier by sniffing the arguments). > > Won't this work? (assuming dumps() always returns a string) > > def dumpb(obj, encoding='utf-8', *args, **kw): > s = dumps(obj, *args, **kw) > return s.encode(encoding) So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode. What should this return: >>> message['Subject'] The raw bytes or the decoded unicode? Okay, so you've picked one. Now how do you spell the other way? The Message class probably has these explicit methods: >>> Message.get_header_bytes('Subject') >>> Message.get_header_string('Subject') (or better names... it's late and I'm tired ;). One of those maps to message['Subject'] but which is the more obvious choice? Now, setting headers. Sometimes you have some unicode thing and sometimes you have some bytes. You need to end up with bytes in the ASCII range and you'd like to leave the header value unencoded if so. But in both cases, you might have bytes or characters outside that range, so you need an explicit encoding, defaulting to utf-8 probably. >>> Message.set_header('Subject', 'Some text', encoding='utf-8') >>> Message.set_header('Subject', b'Some bytes') One of those maps to >>> message['Subject'] = ??? 
I'm open to any suggestions here! -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 04:40:30 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 22:40:30 -0400 Subject: [Python-Dev] email package Bytes vs Unicode (was Re: Dropping bytes "support" in json) In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> Message-ID: <657BFEEA-04E3-418F-86C0-D2F80C75DB96@python.org> On Apr 9, 2009, at 12:20 PM, Steve Holden wrote: > PostgreSQL strongly encourages you to store text as encoded columns. > Because emails lack an encoding it turns out this is a most > inconvenient > storage type for it. Sadly BLOBs are such a pain in PostgreSQL that > it's > easier to store the messages in external files and just use the > relational database to index those files to retrieve content, so > that's > what I ended up doing. That's not insane for other reasons. Do you really want to store 10MB of mp3 data in your database? Which of course reminds me that I want to add an interface, probably to the parser and message class, to allow an application to store message payloads in other than memory. Parsing and holding onto messages with huge payloads can kill some applications, when you might not care too much about the actual payload content. Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 04:41:55 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 22:41:55 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49DE3D9F.3000902@v.loewis.de> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <49DE3D9F.3000902@v.loewis.de> Message-ID: On Apr 9, 2009, at 2:25 PM, Martin v. Löwis wrote: >> This is an interesting question, and something I'm struggling with >> for >> the email package for 3.x. It turns out to be pretty convenient to >> have >> both a bytes and a string API, both for input and output, but I think >> email really wants to be represented internally as bytes. Maybe. Or >> maybe just for content bodies and not headers, or maybe both. >> Anyway, >> aside from that decision, I haven't come up with an elegant way to >> allow >> /output/ in both bytes and strings (input is I think theoretically >> easier by sniffing the arguments). > > If you allow for content-transfer-encoding: 8bit, I think there is > just > no way to represent email as text. You have to accept conversion to, > say, base64 (or quoted-unreadable) when converting an email message to > text. Agreed. But applications will want to deal with some parts of the message as text on the boundaries. Internally, it should be all bytes (although even that is a pain to write ;). -Barry -------------- next part -------------- A non-text attachment was scrubbed...
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From aahz at pythoncraft.com Fri Apr 10 04:52:03 2009 From: aahz at pythoncraft.com (Aahz) Date: Thu, 9 Apr 2009 19:52:03 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: <20090410025203.GA199@panix.com> On Thu, Apr 09, 2009, Barry Warsaw wrote: > > So, what I'm really asking is this. Let's say you agree that there are > use cases for accessing a header value as either the raw encoded bytes or > the decoded unicode. What should this return: > > >>> message['Subject'] > > The raw bytes or the decoded unicode? Let's make that the raw bytes by default -- we can add a parameter to Message() to specify that the default where possible is unicode for returned values, if that isn't too painful. Here's my reasoning: ultimately, everyone NEEDS to understand that the underlying transport for e-mail is bytes (similar to sockets). We do people no favors by pasting over this too much. We can overlay convenience at various points, but except for text payloads, everything should be bytes by default. Even for text payloads, I'm not entirely certain the default shouldn't be bytes: consider an HTML attachment that you want to compare against the output from a webserver. Still, as long as it's easy to get bytes for text payloads, I think overall I'm still leaning toward unicode for them. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
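Aahz's bytes-by-default idea might look roughly like this. The class, method names, and the decode parameter are invented for illustration (written in today's Python, not the email package's real API), and real headers would of course need RFC 2047 decoding rather than a bare .decode():

```python
class Message:
    """Toy header store: raw bytes by default, decoding is explicit opt-in."""

    def __init__(self, decode=False):
        self._headers = {}   # lowercased name -> raw value as bytes
        self._decode = decode

    def set_raw(self, name, value):
        self._headers[name.lower()] = value

    def __getitem__(self, name):
        raw = self._headers[name.lower()]
        if self._decode:
            # crude stand-in for real header decoding
            return raw.decode('ascii', 'replace')
        return raw

msg = Message()
msg.set_raw('Subject', b'Hello')
msg['Subject']            # b'Hello' -- bytes unless you asked otherwise
Message(decode=True)      # opt in to str values instead
```

The point is only that the constructor parameter picks the default view once, instead of every access site choosing between two spellings.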
From glyph at divmod.com Fri Apr 10 05:11:51 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 10 Apr 2009 03:11:51 -0000 Subject: [Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json) In-Reply-To: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> Message-ID: <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com> On 02:26 am, barry at python.org wrote: >There are really two ways to look at an email message. It's either an >unstructured blob of bytes, or it's a structured tree of objects. >Those objects have headers and payload. The payload can be of any >type, though I think it generally breaks down into "strings" for text/ >* types and bytes for anything else (not counting multiparts). I think this is a problematic way to model bytes vs. text; it gives text a special relationship to bytes which should be avoided. IMHO the right way to think about domains like this is a multi-level representation. The "low level" representation is always bytes, whether your MIME type is text/whatever or application/x-i-dont-know. The thing that's "special" about text is that it's a "high level" representation that the standard library can know about. But the 'email' package ought to support being extended to support other types just as well. For example, I want to ask for image/png content as PIL.Image objects, not bags of bytes. Of course this presupposes some way for PIL itself to get at some bytes, but then you need the email module itself to get at the bytes to convert to text in much the same way. There also needs to be layering at the level of bytes->base64->some different bytes->PIL->Image. There are mail clients that will base64-encode unusual encodings so you have to do that same layering for text sometimes. 
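The layering described here (wire bytes, then transfer decoding, then an optional type-specific high-level object) can be sketched like so. The handler registry and function names are hypothetical, purely to show the shape of the idea:

```python
import base64

# MIME type -> callable turning payload bytes into a high-level object.
# A PIL handler could be registered the same way (wrap the bytes in a
# BytesIO and open it as an image); only text/plain is shown here.
HANDLERS = {
    'text/plain': lambda payload: payload.decode('utf-8'),
}

def get_content(mime_type, raw, cte=None):
    # Layer 1: undo the content-transfer-encoding to recover the payload bytes
    if cte == 'base64':
        raw = base64.b64decode(raw)
    # Layer 2: optional high-level view; unknown types stay low-level (bytes)
    handler = HANDLERS.get(mime_type)
    return raw if handler is None else handler(raw)
```

Text and images go through the same two layers; only the final step differs, which is the crisp layer boundary being argued for.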
I'm also being somewhat handwavy with talk of "low" and "high" level representations; of course there are actually multiple levels beyond that. I might want text/x-python content to show up as an AST, but the intermediate DOM-parsing representation really wants to operate on characters. Similarly for a DOM and text/html content. (Modulo the usual encoding-detection weirdness present in parsers.) So, as long as there's a crisp definition of what layer of the MIME stack one is operating on, I don't think that there's really any ambiguity at all about what type you should be getting. From barry at python.org Fri Apr 10 05:03:35 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 23:03:35 -0400 Subject: [Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json) In-Reply-To: <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com> Message-ID: On Apr 9, 2009, at 11:11 PM, glyph at divmod.com wrote: > I think this is a problematic way to model bytes vs. text; it gives > text a special relationship to bytes which should be avoided. > > IMHO the right way to think about domains like this is a multi-level > representation. The "low level" representation is always bytes, > whether your MIME type is text/whatever or application/x-i-dont-know. This is a really good point, and I really should be clearer when describing my current thinking (sleep would help :). > The thing that's "special" about text is that it's a "high level" > representation that the standard library can know about. But the > 'email' package ought to support being extended to support other > types just as well. For example, I want to ask for image/png > content as PIL.Image objects, not bags of bytes. 
Of course this > presupposes some way for PIL itself to get at some bytes, but then > you need the email module itself to get at the bytes to convert to > text in much the same way. There also needs to be layering at the > level of bytes->base64->some different bytes->PIL->Image. There are > mail clients that will base64-encode unusual encodings so you have > to do that same layering for text sometimes. > > I'm also being somewhat handwavy with talk of "low" and "high" level > representations; of course there are actually multiple levels beyond > that. I might want text/x-python content to show up as an AST, but > the intermediate DOM-parsing representation really wants to operate > on characters. Similarly for a DOM and text/html content. (Modulo > the usual encoding-detection weirdness present in parsers.) When I was talking about supporting text/* content types as strings, I was definitely thinking about using basically the same plug-in or higher level or whatever API to do that as you might use to get PIL images from an image/gif. > So, as long as there's a crisp definition of what layer of the MIME > stack one is operating on, I don't think that there's really any > ambiguity at all about what type you should be getting. In that case, we really need the bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build things on top of that. -Barry -------------- next part -------------- A non-text attachment was scrubbed...
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 05:05:37 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 23:05:37 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <20090410025203.GA199@panix.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> Message-ID: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> On Apr 9, 2009, at 10:52 PM, Aahz wrote: > On Thu, Apr 09, 2009, Barry Warsaw wrote: >> >> So, what I'm really asking is this. Let's say you agree that there >> are >> use cases for accessing a header value as either the raw encoded >> bytes or >> the decoded unicode. What should this return: >> >>>>> message['Subject'] >> >> The raw bytes or the decoded unicode? > > Let's make that the raw bytes by default -- we can add a parameter to > Message() to specify that the default where possible is unicode for > returned values, if that isn't too painful. I don't know whether the parameter thing will work or not, but you're probably right that we need to get the bytes-everywhere API first. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From ncoghlan at gmail.com Fri Apr 10 05:21:05 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Apr 2009 13:21:05 +1000 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> Message-ID: <49DEBB21.70305@gmail.com> Barry Warsaw wrote: > I don't know whether the parameter thing will work or not, but you're > probably right that we need to get the bytes-everywhere API first. Given that json is a wire protocol, that sounds like the right approach for json as well. Once bytes-everywhere works, then a text API can be built on top of it, but it is difficult to build a bytes API on top of a text one. So I guess the IO library *is* the right model: bytes at the bottom of the stack, with text as a wrapper around it (mediated by codecs). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From barry at python.org Fri Apr 10 05:23:40 2009 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Apr 2009 23:23:40 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49DEBB21.70305@gmail.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> Message-ID: <0047AD0A-7B5B-4703-96D6-BD26B9752E7D@python.org> On Apr 9, 2009, at 11:21 PM, Nick Coghlan wrote: > Barry Warsaw wrote: >> I don't know whether the parameter thing will work or not, but you're >> probably right that we need to get the bytes-everywhere API first. 
> Given that json is a wire protocol, that sounds like the right > approach > for json as well. Once bytes-everywhere works, then a text API can be > built on top of it, but it is difficult to build a bytes API on top > of a > text one. Agreed! > So I guess the IO library *is* the right model: bytes at the bottom of > the stack, with text as a wrapper around it (mediated by codecs). Yes, that's a very interesting (and proven?) model. I don't quite see how we could apply that to email and json, but it seems like there's a good idea there. ;) -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From tonynelson at georgeanelson.com Fri Apr 10 05:41:58 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Thu, 9 Apr 2009 23:41:58 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: At 22:38 -0400 04/09/2009, Barry Warsaw wrote: ... >So, what I'm really asking is this. Let's say you agree that there >are use cases for accessing a header value as either the raw encoded >bytes or the decoded unicode. What should this return: > > >>> message['Subject'] > >The raw bytes or the decoded unicode? That's an easy one: Subject: is an unstructured header, so it must be text, thus Unicode. We're looking at a high-level representation of an email message, with parsed header fields and a MIME message tree. >Okay, so you've picked one. Now how do you spell the other way? message.get_header_bytes('Subject') Oh, I see that's what you picked. >The Message class probably has these explicit methods: > > >>> Message.get_header_bytes('Subject') > >>> Message.get_header_string('Subject') > >(or better names... it's late and I'm tired ;).
One of those maps to >message['Subject'] but which is the more obvious choice? Structured header fields are more of a problem. Any header with addresses should return a list of addresses. I think the default return type should depend on the data type. To get an explicit bytes or string or list of addresses, be explicit; otherwise, for convenience, return the appropriate type for the particular header field name. >Now, setting headers. Sometimes you have some unicode thing and >sometimes you have some bytes. You need to end up with bytes in the >ASCII range and you'd like to leave the header value unencoded if so. >But in both cases, you might have bytes or characters outside that >range, so you need an explicit encoding, defaulting to utf-8 probably. Never for header fields. The default is always RFC 2047, unless it isn't, say for params. The Message class should create an object of the appropriate subclass of Header based on the name (or use the existing object, see other discussion), and that should inspect its argument and DTRT or complain. > > >>> Message.set_header('Subject', 'Some text', encoding='utf-8') > >>> Message.set_header('Subject', b'Some bytes') > >One of those maps to > > >>> message['Subject'] = ??? The expected data type should depend on the header field. For Subject:, it should be bytes to be parsed or verbatim text. For To:, it should be a list of addresses or bytes or text to be parsed. The email package should be pythonic, and not require deep understanding of dozens of RFCs to use properly. Users don't need to know about the raw bytes; that's the whole point of MIME and any email package. It should be easy to set header fields with their natural data types, and doing it with bad data should produce an error. This may require a bit more care in the message parser, to always produce a parsed message with defects. 
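The per-field typing Tony describes, unstructured fields as text, address fields as lists of addresses, everything else raw, can be prototyped on today's stdlib helpers. The registry and function names here are hypothetical; email.header.decode_header and email.utils.getaddresses are real:

```python
import email.header
import email.utils

FIELD_PARSERS = {}  # lowercased field name -> parser for the raw value

def parse_unstructured(raw):
    # RFC 2047-decode an unstructured field (e.g. Subject:) into one str
    return ''.join(
        chunk.decode(charset or 'ascii') if isinstance(chunk, bytes) else chunk
        for chunk, charset in email.header.decode_header(raw))

def parse_addresses(raw):
    # address fields (To:, Cc:) become a list of (realname, addr) pairs
    return email.utils.getaddresses([raw])

FIELD_PARSERS['subject'] = parse_unstructured
FIELD_PARSERS['to'] = FIELD_PARSERS['cc'] = parse_addresses

def header_value(name, raw):
    # unknown fields fall back to the raw value untouched
    return FIELD_PARSERS.get(name.lower(), lambda r: r)(raw)

header_value('Subject', '=?utf-8?q?caf=C3=A9?=')  # 'café'
header_value('To', 'Guido <guido@python.org>')    # [('Guido', 'guido@python.org')]
```

Users ask for a field by name and get its natural type; only code that explicitly wants the wire form reaches below the registry.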
-- ____________________________________________________________________ TonyN.:' ' From mike.klaas at gmail.com Fri Apr 10 05:42:37 2009 From: mike.klaas at gmail.com (Mike Klaas) Date: Thu, 9 Apr 2009 20:42:37 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE9FB4.9060908@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> Message-ID: <99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com> On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote: > Greg Ewing wrote: >> John Arbash Meinel wrote: >>> And the way intern is currently >>> written, there is a third cost when the item doesn't exist yet, >>> which is >>> another lookup to insert the object. >> >> That's even rarer still, since it only happens the first >> time you load a piece of code that uses a given variable >> name anywhere in any module. >> > > Somewhat true, though I know it happens 25k times during startup of > bzr... And I would be a *lot* happier if startup time was 100ms > instead > of 400ms. I don't want to quash your idealism too severely, but it is extremely unlikely that you are going to get anywhere near that kind of speed up by tweaking string interning. 25k times doing anything (computation) just isn't all that much. $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in xrange(25000): d.get(x)' 100 loops, best of 3: 8.28 msec per loop Perhaps this isn't representative (int hashing is ridiculously cheap, for instance), but the dict itself is far bigger than the dict you are dealing with and such would have similar cache-busting properties. And yet, 25k accesses (plus python->c dispatching costs which you are paying with interning) consume only ~10ms. You could do more good by eliminating a handful of disk seeks by reducing the number of imported modules... 
-Mike From guido at python.org Fri Apr 10 05:55:34 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Apr 2009 20:55:34 -0700 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> Message-ID: On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato wrote: > Then perhaps you misunderstand the goal of the decorator module. > The raison d'etre of the module is to PRESERVE the signature: > update_wrapper unfortunately *changes* it. > > When confronted with a library which I do not know, I often run > over it pydoc, or sphinx, or a custom made documentation tool, to extract the > signature of functions. Ah, I see. Personally I rarely trust automatically extracted documentation -- too often in my experience it is out of date or simply absent. Extracting the signatures in theory wouldn't lie, but in practice I still wouldn't trust it -- not only because of what decorators might or might not do, but because it might still be misleading. Call me old-fashioned, but I prefer to read the source code. For instance, if I see a method > get_user(self, username) I have a good hint about what it is supposed > to do. But if the library (say a web framework) uses non signature-preserving > decorators, my documentation tool says to me that there is function > get_user(*args, **kwargs) which frankly is not enough [this is the > optimistic case, when the author of the decorator has taken care > to preserve the name of the original function]. But seeing the decorator is often essential for understanding what goes on! Even if the decorator preserves the signature (in truth or according to inspect), many decorators *do* something, and it's important to know how a function is decorated.
For example, I work a lot with a small internal framework at Google whose decorators can raise exceptions and set instance variables; they also help me understand under which conditions a method can be called. > I *hate* losing information about the true signature of functions, since I also > use a lot IPython, Python help, etc. I guess we just have different styles. That's fine. >>> I must admit that while I still like decorators, I do like them as >>> much as in the past. > > Of course there was a missing NOT in this sentence, but you all understood > the intended meaning. > >> (All this BTW is not to say that I don't trust you with commit >> privileges if you were to be interested in contributing. I just don't >> think that adding that particular decorator module to the stdlib would >> be wise. It can be debated though.) > > Fine. As I have repeated many time that particular module was never > meant for inclusion in the standard library. Then perhaps it shouldn't -- I haven't looked but if you don't plan stdlib inclusion it is often the case that the API style and/or implementation details make stdlib inclusion unrealistic. (Though admittedly some older modules wouldn't be accepted by today's standards either -- and I'm not just talking PEP-8 compliance! :-) > But I feel strongly about > the possibility of being able to preserve (not change!) the function > signature. That could be added to functools if enough people want it. > I do not think everybody disagree with your point here. My point still > stands, though: objects should not lie about their signature, especially > during debugging and when generating documentation from code. Source code never lies. Debuggers should make access to the source code a key point. And good documentation should be written by a human, not automatically cobbled together from source code and a few doc strings.
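The signature problem being debated here can be demonstrated directly. Note this runs on today's Python, where functools.wraps records __wrapped__ and inspect.signature follows it, behavior that did not yet exist when this thread was written, which is exactly why the decorator module existed:

```python
import functools
import inspect

def naive(func):
    # wrapper without bookkeeping: name, doc, and signature are all lost
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def preserving(func):
    @functools.wraps(func)   # copies __name__/__doc__, sets __wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@naive
def get_user_a(self, username):
    return username

@preserving
def get_user_b(self, username):
    return username

str(inspect.signature(get_user_a))  # '(*args, **kwargs)' -- Michele's complaint
str(inspect.signature(get_user_b))  # '(self, username)'  -- signature recovered
```

Documentation tools built on inspect see the useless (*args, **kwargs) in the first case and the real signature in the second.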
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From tonynelson at georgeanelson.com Fri Apr 10 05:59:54 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Thu, 9 Apr 2009 23:59:54 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> Message-ID: At 22:26 -0400 04/09/2009, Barry Warsaw wrote: >There are really two ways to look at an email message. It's either an >unstructured blob of bytes, or it's a structured tree of objects. >Those objects have headers and payload. The payload can be of any >type, though I think it generally breaks down into "strings" for text/ >* types and bytes for anything else (not counting multiparts). > >The email package isn't a perfect mapping to this, which is something >I want to improve. That aside, I think storing a message in a >database means storing some or all of the headers separately from the >byte stream (or text?) of its payload. That's for non-multipart >types. It would be more complicated to represent a message tree of >course. Storing an email message in a database does mean storing some of the header fields as database fields, but the set of email header fields is open, so any "unused" fields in a message must be stored elsewhere. It isn't useful to just have a bag of name/value pairs in a table. General message MIME payload trees don't map well to a database either, unless one wants to get very relational. Sometimes the database needs to represent the entire email message, header fields and MIME tree, but only if it is an email program and usually not even then. Usually, the database has a specific purpose, and can be designed for the data it cares about; it may choose to keep the original message as bytes. >It does seem to make sense to think about headers as text header names >and text header values. 
Of course, header values can contain almost >anything and there's an encoding to bring it back to 7-bit ASCII, but >again, you really have two views of a header value. Which you want >really depends on your application. I think of header fields as having text-like names (the set of allowed characters is more than just text, though defined headers don't make use of that), but the data is either bytes or it should be parsed into something appropriate: text for unstructured fields like Subject:, a list of addresses for address fields like To:. Many of the structured header fields have a reasonable mapping to text; certainly this is true for address header fields. Content-Type header fields are barely text, they can be so convolutedly structured, but I suppose one could flatten one of them to text instead of bytes if the user wanted. It's not very useful, though, except for debugging (either by the programmer or the recipient who wants to know what was cleaned from the message). >Maybe you just care about the text of both the header name and value. >In that case, I think you want the values as unicodes, and probably >the headers as unicodes containing only ASCII. So your table would be >strings in both cases. OTOH, maybe your application cares about the >raw underlying encoded data, in which case the header names are >probably still strings of ASCII-ish unicodes and the values are >bytes. It's this distinction (and I think the competing use cases) >that make a true Python 3.x API for email more complicated. If a database stores the Subject: header field, it would be as text. The various recipient address fields are a one message to many names and addresses mapping, and need a related table of name/address fields, with each field being text. The original message (or whatever part of it one preserves) should be bytes. I don't think this complicates the email package API; rather, it just shows where generality is needed. 
>Thinking about this stuff makes me nostalgic for the sloppy happy days >of Python 2.x You now have the opportunity to finally unsnarl that mess. It is not an insurmountable opportunity. -- ____________________________________________________________________ TonyN.:' ' From jyasskin at gmail.com Fri Apr 10 06:04:09 2009 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 9 Apr 2009 21:04:09 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE9FB4.9060908@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> Message-ID: <5d44f72f0904092104y66073939q2ea4ea87937bef69@mail.gmail.com> On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel wrote: > Greg Ewing wrote: >> John Arbash Meinel wrote: >>> And the way intern is currently >>> written, there is a third cost when the item doesn't exist yet, which is >>> another lookup to insert the object. >> >> That's even rarer still, since it only happens the first >> time you load a piece of code that uses a given variable >> name anywhere in any module. >> > > Somewhat true, though I know it happens 25k times during startup of > bzr... And I would be a *lot* happier if startup time was 100ms instead > of 400ms. I think you have plenty of a case to try it out. If you code it up and it doesn't speed anything up, well then we've learned something, and maybe it'll be useful anyway for the memory savings. If it does speed things up, well then Python's faster. I wouldn't waste time arguing about it before you have the change written. Good luck! 
Jeffrey From collinw at gmail.com Fri Apr 10 06:07:54 2009 From: collinw at gmail.com (Collin Winter) Date: Thu, 9 Apr 2009 21:07:54 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DE9FB4.9060908@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> Message-ID: <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel wrote: > Greg Ewing wrote: >> John Arbash Meinel wrote: >>> And the way intern is currently >>> written, there is a third cost when the item doesn't exist yet, which is >>> another lookup to insert the object. >> >> That's even rarer still, since it only happens the first >> time you load a piece of code that uses a given variable >> name anywhere in any module. >> > > Somewhat true, though I know it happens 25k times during startup of > bzr... And I would be a *lot* happier if startup time was 100ms instead > of 400ms. Quite so. We have a number of internal tools, and they find that frequently just starting up Python takes several times the duration of the actual work unit itself. I'd be very interested to review any patches you come up with to improve start-up time; so far on this thread, there's been a lot of theory and not much practice. I'd approach this iteratively: first replace the dict with a set, then if that bears fruit, consider a customized data structure; if that bears fruit, etc. 
Good luck, and be sure to let us know what you find, Collin Winter From guido at python.org Fri Apr 10 06:26:53 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Apr 2009 21:26:53 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> Message-ID: On Thu, Apr 9, 2009 at 9:07 PM, Collin Winter wrote: > On Thu, Apr 9, 2009 at 6:24 PM, John Arbash Meinel wrote: > >And I would be a *lot* happier if startup time was 100ms instead > > of 400ms. > > Quite so. We have a number of internal tools, and they find that > frequently just starting up Python takes several times the duration of > the actual work unit itself. I'd be very interested to review any > patches you come up with to improve start-up time; so far on this > thread, there's been a lot of theory and not much practice. I'd > approach this iteratively: first replace the dict with a set, then if > that bears fruit, consider a customized data structure; if that bears > fruit, etc. > > Good luck, and be sure to let us know what you find, Just to add some skepticism, has anyone done any kind of instrumentation of bzr start-up behavior? IIRC every time I was asked to reduce the start-up cost of some Python app, the cause was too many imports, and the solution was either to speed up import itself (.pyc files were the first thing ever that came out of that -- importing from a single .zip file is one of the more recent tricks) or to reduce the number of modules imported at start-up (or both :-). Heavy-weight frameworks are usually the root cause, but usually there's nothing that can be done about that by the time you've reached this point. 
So, amen on the good luck, but please start with a bit of analysis. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 10 06:34:19 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Apr 2009 21:34:19 -0700 Subject: [Python-Dev] Adding new features to Python 2.x (PEP 382: Namespace Packages) In-Reply-To: <20090409125312.GB1909@panix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <49DB4624.604@egenix.com> <49DBA78F.7010904@v.loewis.de> <49DDD6AD.9020708@gmail.com> <20090409125312.GB1909@panix.com> Message-ID: On Thu, Apr 9, 2009 at 5:53 AM, Aahz wrote: > On Thu, Apr 09, 2009, Nick Coghlan wrote: >> >> Martin v. Löwis wrote: >>>> Such a policy would then translate to a dead end for Python 2.x >>>> based applications. >>> >>> 2.x based applications *are* in a dead end, with the only exit >>> being portage to 3.x. >> >> The actual end of the dead end just happens to be in 2013 or so :) > > More like 2016 or 2020 -- as of January, my former employer was still > using Python 2.3, and I wouldn't be surprised if 1.5.2 was still out in > the wilds. The transition to 3.x is more extreme, and lots of people > will continue making do for years after any formal support is dropped. There's nothing wrong with that. People using 1.5.2 today certainly aren't asking for support, and people using 2.3 probably aren't expecting much either. That's fine, those Python versions are as stable as the rest of their environment. (I betcha they're still using GCC 2.96 too, though they probably don't have any reason to build a new Python binary from source. :-) People *will* be using 2.6 well past 2013. But will they care about the Python community actively supporting it? Of course not! Anything we did would probably break something for them. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From john.arbash.meinel at gmail.com Fri Apr 10 06:38:55 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Thu, 09 Apr 2009 23:38:55 -0500 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com> Message-ID: <49DECD5F.7@gmail.com> ... >> Somewhat true, though I know it happens 25k times during startup of >> bzr... And I would be a *lot* happier if startup time was 100ms instead >> of 400ms. > > I don't want to quash your idealism too severely, but it is extremely > unlikely that you are going to get anywhere near that kind of speed up > by tweaking string interning. 25k times doing anything (computation) > just isn't all that much. > > $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in > xrange(25000): d.get(x)' > 100 loops, best of 3: 8.28 msec per loop > > Perhaps this isn't representative (int hashing is ridiculously cheap, > for instance), but the dict itself is far bigger than the dict you are > dealing with and such would have similar cache-busting properties. And > yet, 25k accesses (plus python->c dispatching costs which you are paying > with interning) consume only ~10ms. You could do more good by > eliminating a handful of disk seeks by reducing the number of imported > modules... > > -Mike > You're also using timeit over the same set of 25k keys, which means it only has to load that subset. And as you are using identical runs each time, those keys are already loaded into your cache lines... And given how hash(int) works, they are all sequential in memory, and all 10M in your original set have 0 collisions. 
Actually, at 10M, you'll have a dict of size 20M entries, and the first 10M entries will be full, and the trailing 10M entries will all be empty. That said, you're right, the benefits of a smaller structure are going to be small. I'll just point out that if I just do a small tweak to your timing and do: $ python -mtimeit -s 'd=dict.fromkeys(xrange(10000000))' 'for x in xrange(25000): d.get(x)' 100 loops, best of 3: 6.27 msec per loop So slightly faster than yours, *but*, let's try a much smaller dict: $ python -mtimeit -s 'd=dict.fromkeys(xrange(25000))' 'for x in xrange(25000): d.get(x)' 100 loops, best of 3: 6.35 msec per loop Pretty much the same time. Well within the noise margin. But if I go back to the "big dict" and actually select 25k keys across the whole set: $ TIMEIT -s 'd=dict.fromkeys(xrange(10000000));' \ -s 'keys=range(0,10000000,10000000/25000)' \ 'for x in keys: d.get(x)' 100 loops, best of 3: 13.1 msec per loop Now I'm still accessing 25k keys, but I'm doing it across the whole range, and suddenly the time *doubled*. What about slightly more random access: $ TIMEIT -s 'import random; d=dict.fromkeys(xrange(10000000));' -s 'bits = range(0, 10000000, 400); random.shuffle(bits)' \ 'for x in bits: d.get(x)' 100 loops, best of 3: 15.5 msec per loop Not as big of a difference as I thought it would be... But I bet if there was a way to put the random shuffle in the inner loop, so you weren't accessing the same identical 25k keys internally, you might get more interesting results. As for other bits about exercising caches: $ shuffle(range(0, 10000000, 400)) 100 loops, best of 3: 15.5 msec per loop $ shuffle(range(0, 10000000, 40)) 10 loops, best of 3: 175 msec per loop 10x more keys, costs 11.3x, pretty close to linear. $ shuffle(range(0, 10000000, 10)) 10 loops, best of 3: 739 msec per loop 4x the keys, 4.5x the time, starting to get more into nonlinear effects. Anyway, you're absolutely right. 
intern() overhead is a tiny fraction of 'import bzrlib.*' time, so I don't expect to see amazing results. That said, accessing 25k keys in a smaller structure is 2x faster than accessing 25k keys spread across a larger structure. John =:-> From glyph at divmod.com Fri Apr 10 07:19:02 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 10 Apr 2009 05:19:02 -0000 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> On 02:38 am, barry at python.org wrote: >So, what I'm really asking is this. Let's say you agree that there >are use cases for accessing a header value as either the raw encoded >bytes or the decoded unicode. What should this return: > > >>> message['Subject'] > >The raw bytes or the decoded unicode? My personal preference would be to just get deprecate this API, and get rid of it, replacing it with a slightly more explicit one. message.headers['Subject'] message.bytes_headers['Subject'] >Now, setting headers. Sometimes you have some unicode thing and >sometimes you have some bytes. You need to end up with bytes in the >ASCII range and you'd like to leave the header value unencoded if so. >But in both cases, you might have bytes or characters outside that >range, so you need an explicit encoding, defaulting to utf-8 probably. message.headers['Subject'] = 'Some text' should be equivalent to message.headers['Subject'] = Header('Some text') My preference would be that message.headers['Subject'] = b'Some Bytes' would simply raise an exception. If you've got some bytes, you should instead do message.bytes_headers['Subject'] = b'Some Bytes' or message.headers['Subject'] = Header(bytes=b'Some Bytes', encoding='utf-8') Explicit is better than implicit, right? 
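The two-view API glyph sketches above can be made concrete. Everything below is hypothetical: `Message`, `headers` and `bytes_headers` are the proposed spellings from this thread, not the real email package, and real code would perform RFC 2047 encoding/decoding where this stub simply uses UTF-8:

```python
# Sketch of the proposed two-view header API (hypothetical names from
# the thread above, not the real email package).

class BytesHeaders:
    """Raw view: values go in and come out as bytes."""
    def __init__(self, store):
        self._store = store          # shared dict: name -> bytes

    def __getitem__(self, name):
        return self._store[name.lower()]

    def __setitem__(self, name, value):
        if not isinstance(value, bytes):
            raise TypeError("bytes_headers values must be bytes")
        self._store[name.lower()] = value


class TextHeaders:
    """Decoded view: values go in and come out as str."""
    def __init__(self, store):
        self._store = store

    def __getitem__(self, name):
        # Real code would undo RFC 2047 encoded-words here.
        return self._store[name.lower()].decode("utf-8")

    def __setitem__(self, name, value):
        if isinstance(value, bytes):
            # "Explicit is better than implicit": raw bytes must be
            # assigned through bytes_headers, never smuggled in here.
            raise TypeError("use bytes_headers to set raw bytes")
        self._store[name.lower()] = value.encode("utf-8")


class Message:
    def __init__(self):
        store = {}                                # one backing store,
        self.headers = TextHeaders(store)         # two views of it
        self.bytes_headers = BytesHeaders(store)


msg = Message()
msg.headers["Subject"] = "Some text"
assert msg.bytes_headers["Subject"] == b"Some text"
msg.bytes_headers["Subject"] = b"Some Bytes"
assert msg.headers["Subject"] == "Some Bytes"
```

Both views share one backing store, so the design question reduces to which decoding happens at the view boundary rather than to two competing copies of the data.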
From glyph at divmod.com Fri Apr 10 07:28:36 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 10 Apr 2009 05:28:36 -0000 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49DEBB21.70305@gmail.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> Message-ID: <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> On 03:21 am, ncoghlan at gmail.com wrote: >Barry Warsaw wrote: >>I don't know whether the parameter thing will work or not, but you're >>probably right that we need to get the bytes-everywhere API first. >Given that json is a wire protocol, that sounds like the right approach >for json as well. Once bytes-everywhere works, then a text API can be >built on top of it, but it is difficult to build a bytes API on top of >a >text one. I wish I could agree, but JSON isn't really a wire protocol. According to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the serialization of structured data". There are some notes about encoding, but it is very clearly described in terms of unicode code points. >So I guess the IO library *is* the right model: bytes at the bottom of >the stack, with text as a wrapper around it (mediated by codecs). In email's case this is true, but in JSON's case it's not. JSON is a format defined as a sequence of code points; MIME is defined as a sequence of octets. From turnbull at sk.tsukuba.ac.jp Fri Apr 10 07:22:04 2009 From: turnbull at sk.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Fri, 10 Apr 2009 14:22:04 +0900 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> Message-ID: <87zlepf5hf.fsf@xemacs.org> Barry Warsaw writes: > There are really two ways to look at an email message. It's either an > unstructured blob of bytes, or it's a structured tree of objects. Indeed! > Those objects have headers and payload. The payload can be of any > type, though I think it generally breaks down into "strings" for text/ > * types and bytes for anything else (not counting multiparts). *sigh* Why are you back-tracking? The payload should be of an appropriate *object* type. Atomic object types will have their content stored as string or bytes [nb I use Python 3 terminology throughout]. Composite types (multipart/*) won't need string or bytes attributes AFAICS. Start by implementing the application/octet-stream and text/plain;charset=utf-8 object types, of course. > It does seem to make sense to think about headers as text header names > and text header values. I disagree. IMHO, structured header types should have object values, and something like message['to'] = "Barry 'da FLUFL' Warsaw " should be smart enough to detect that it's a string and attempt to (flexibly) parse it into a fullname and a mailbox adding escapes, etc. Whether these should be structured objects or they can be strings or bytes, I'm not sure (probably bytes, not strings, though -- see next exampl). OTOH message['to'] = b'''"Barry 'da.FLUFL' Warsaw" ''' should assume that the client knows what they are doing, and should parse it strictly (and I mean "be a real bastard", eg, raise an exception on any non-ASCII octet), merely dividing it into fullname and mailbox, and caching the bytes for later insertion in a wire-format message. 
> In that case, I think you want the values as unicodes, and probably > the headers as unicodes containing only ASCII. So your table would be > strings in both cases. OTOH, maybe your application cares about the > raw underlying encoded data, in which case the header names are > probably still strings of ASCII-ish unicodes and the values are > bytes. It's this distinction (and I think the competing use cases) > that make a true Python 3.x API for email more complicated. I don't see why you can't have the email API be specific, with message['to'] always returning a structured_header object (or maybe even more specifically an address_header object), and methods like message['to'].build_header_as_text() which returns """To: "Barry 'da.FLUFL' Warsaw" """ and message['to'].build_header_in_wire_format() which returns b"""To: "Barry 'da.FLUFL' Warsaw" """ Then have email.textview.Message and email.wireview.Message which provide a simple interface where message['to'] would invoke .build_header_as_text() and .build_header_in_wire_format() respectively. > Thinking about this stuff makes me nostalgic for the sloppy happy days > of Python 2.x Er, yeah. Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs, From fetchinson at googlemail.com Fri Apr 10 07:21:22 2009 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Thu, 9 Apr 2009 22:21:22 -0700 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> Message-ID: >> Then perhaps you misunderstand the goal of the decorator module. >> The raison d'etre of the module is to PRESERVE the signature: >> update_wrapper unfortunately *changes* it. 
>> >> When confronted with a library which I do not not know, I often run >> over it pydoc, or sphinx, or a custom made documentation tool, to extract >> the >> signature of functions. > > Ah, I see. Personally I rarely trust automatically extracted > documentation -- too often in my experience it is out of date or > simply absent. Extracting the signatures in theory wouldn't lie, but > in practice I still wouldn't trust it -- not only because of what > decorators might or might not do, but because it might still be > misleading. Call me old-fashioned, but I prefer to read the source > code. > > For instance, if I see a method >> get_user(self, username) I have a good hint about what it is supposed >> to do. But if the library (say a web framework) uses non >> signature-preserving >> decorators, my documentation tool says to me that there is function >> get_user(*args, **kwargs) which frankly is not enough [this is the >> optimistic case, when the author of the decorator has taken care >> to preserve the name of the original function]. > > But seeing the decorator is often essential for understanding what > goes on! Even if the decorator preserves the signature (in truth or > according inspect), many decorators *do* something, and it's important > to know how a function is decorated. For example, I work a lot with a > small internal framework at Google whose decorators can raise > exceptions and set instance variables; they also help me understand > under which conditions a method can be called. > >> I *hate* losing information about the true signature of functions, since >> I also >> use a lot IPython, Python help, etc. > > I guess we just have different styles. That's fine. > >>>> I must admit that while I still like decorators, I do like them as >>>> much as in the past. >> >> Of course there was a missing NOT in this sentence, but you all understood >> the intended meaning. 
>> >>> (All this BTW is not to say that I don't trust you with commit >>> privileges if you were to be interested in contributing. I just don't >>> think that adding that particular decorator module to the stdlib would >>> be wise. It can be debated though.) >> >> Fine. As I have repeated many time that particular module was never >> meant for inclusion in the standard library. > > Then perhaps it shouldn't -- I haven't looked but if you don't plan > stdlib inclusion it is often the case that the API style and/or > implementation details make stdlib inclusion unrealistic. (Though > admittedly some older modules wouldn't be accepted by today's > standards either -- and I'm not just talking PEP-8 compliance! :-) > >> But I feel strongly about >> the possibility of being able to preserve (not change!) the function >> signature. > > That could be added to functools if enough people want it. My original suggestion for inclusion in stdlib was motivated by this reason alone: I'd like to see an official one way of preserving function signatures by decorators. If there are better ways of doing it than the decorator module, that's totally fine, but there should be one. Cheers, Daniel >> I do not think everybody disagree with your point here. My point still >> stands, though: objects should not lie about their signature, especially >> during debugging and when generating documentation from code. > > Source code never lies. Debuggers should make access to the source > code a key point. And good documentation should be written by a human, > not automatically cobbled together from source code and a few doc > strings. -- Psss, psss, put it down! 
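The signature loss being debated can be shown concretely. A sketch using introspection tools from later Python versions: `functools.wraps` only began setting `__wrapped__` in 3.2, and `inspect.signature` (which follows it) arrived in 3.4 — at the time of this thread, `update_wrapper` really did leave tools reporting `(*args, **kwargs)`, which is the gap the decorator module filled:

```python
import functools
import inspect

def naive_decorator(func):
    def wrapper(*args, **kwargs):        # no wrapper updating at all
        return func(*args, **kwargs)
    return wrapper

def wraps_decorator(func):
    @functools.wraps(func)               # copies __name__/__doc__ and
    def wrapper(*args, **kwargs):        # (3.2+) sets __wrapped__
        return func(*args, **kwargs)
    return wrapper

def get_user(self, username):
    """Fetch a user record."""

bad = naive_decorator(get_user)
good = wraps_decorator(get_user)

# The "(*args, **kwargs)" blob complained about in the thread:
assert str(inspect.signature(bad)) == "(*args, **kwargs)"

# inspect.signature (3.4+) follows __wrapped__, so the true signature
# survives introspection, pydoc, and IPython help:
assert str(inspect.signature(good)) == "(self, username)"
assert good.__name__ == "get_user"
```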
- http://www.cafepress.com/putitdown From sylvain.thenault at logilab.fr Fri Apr 10 09:49:00 2009 From: sylvain.thenault at logilab.fr (Sylvain =?utf-8?B?VGjDqW5hdWx0?=) Date: Fri, 10 Apr 2009 09:49:00 +0200 Subject: [Python-Dev] BLOBs in Pg In-Reply-To: <49DE3902.70103@holdenweb.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <20090409172424.GD26429@phd.pp.ru> <49DE3902.70103@holdenweb.com> Message-ID: <20090410074900.GB21832@lupus.logilab.fr> On 09 April 14:05, Steve Holden wrote: > Oleg Broytmann wrote: > > On Thu, Apr 09, 2009 at 01:14:21PM -0400, Tony Nelson wrote: > >> I use MySQL, but sort of intend to learn PostgreSQL. I didn't know that > >> PostgreSQL has no real support for BLOBs. > > > > I think it has - BYTEA data type. > > > But the Python DB adapters appears to require some fairly hairy escaping > of the data to make it usable with the cursor execute() method. IMHO you > shouldn't have to escape data that is passed for insertion via a > parameterized query. can't you simply use dbmodule.Binary to do the job? -- Sylvain Thénault LOGILAB, Paris (France) Formations Python, Debian, Méth. 
Agiles: http://www.logilab.fr/formations Développement logiciel sur mesure: http://www.logilab.fr/services CubicWeb, the semantic web framework: http://www.cubicweb.org From ncoghlan at gmail.com Fri Apr 10 10:40:28 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Apr 2009 18:40:28 +1000 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> Message-ID: <49DF05FC.9040208@gmail.com> glyph at divmod.com wrote: > On 03:21 am, ncoghlan at gmail.com wrote: >> Given that json is a wire protocol, that sounds like the right approach >> for json as well. Once bytes-everywhere works, then a text API can be >> built on top of it, but it is difficult to build a bytes API on top of a >> text one. > > I wish I could agree, but JSON isn't really a wire protocol. According > to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the > serialization of structured data". There are some notes about encoding, > but it is very clearly described in terms of unicode code points. Ah, my apologies - if the RFC defines things such that the native format is Unicode, then yes, the appropriate Python 3.x data type for the base implementation would indeed be strings. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Fri Apr 10 10:52:40 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Apr 2009 18:52:40 +1000 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> Message-ID: <49DF08D8.9080806@gmail.com> Guido van Rossum wrote: > Just to add some skepticism, has anyone done any kind of > instrumentation of bzr start-up behavior? IIRC every time I was asked > to reduce the start-up cost of some Python app, the cause was too many > imports, and the solution was either to speed up import itself (.pyc > files were the first thing ever that came out of that -- importing > from a single .zip file is one of the more recent tricks) or to reduce > the number of modules imported at start-up (or both :-). Heavy-weight > frameworks are usually the root cause, but usually there's nothing > that can be done about that by the time you've reached this point. So, > amen on the good luck, but please start with a bit of analysis. This problem (slow application startup times due to too many imports at startup, which can in turn can be due to top level imports for library or framework functionality that a given application doesn't actually use) is actually the main reason I sometimes wish for a nice, solid lazy module import mechanism that manages to avoid the potential deadlock problems created by using import statements inside functions. Providing a clean API and implementation for that functionality is a pretty tough nut to crack though, so I'm not holding my breath... Cheers, Nick. P.S. 
It's only an occasional fairly idle wish for me though, or I'd have at least tried to come up with something myself by now. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From robert.collins at canonical.com Fri Apr 10 11:19:39 2009 From: robert.collins at canonical.com (Robert Collins) Date: Fri, 10 Apr 2009 19:19:39 +1000 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> Message-ID: <1239355179.2892.224.camel@lifeless-64> On Thu, 2009-04-09 at 21:26 -0700, Guido van Rossum wrote: > Just to add some skepticism, has anyone done any kind of > instrumentation of bzr start-up behavior? We sure have. 'bzr --profile-imports' reports on the time to import different modules (both cumulative and individually). We have a lazy module loader that allows us to defer loading modules we might not use (though if they are needed we are in fact going to pay for loading them eventually). We monkeypatch the standard library where modules we want are unreasonably expensive to import (for instance by making a regex we wouldn't use be lazy compiled rather than compiled at import time). > IIRC every time I was asked > to reduce the start-up cost of some Python app, the cause was too many > imports, and the solution was either to speed up import itself (.pyc > files were the first thing ever that came out of that -- importing > from a single .zip file is one of the more recent tricks) or to reduce > the number of modules imported at start-up (or both :-). Heavy-weight > frameworks are usually the root cause, but usually there's nothing > that can be done about that by the time you've reached this point. 
So, > amen on the good luck, but please start with a bit of analysis. Certainly, import time is part of it: robertc at lifeless-64:~$ python -m timeit -s 'import sys; import bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors" 10 loops, best of 3: 18.7 msec per loop (errors.py is 3027 lines long with 347 exception classes). We've also looked lower - python does a lot of stat operations search for imports and determining if the pyc is up to date; these appear to only really matter on cold-cache imports (but they matter a lot then); in hot-cache situations they are insignificant. Uhm, there's probably more - but I just wanted to note that we have done quite a bit of analysis. I think a large chunk of our problem is having too much code loaded when only a small fraction will be used in any one operation. Consider importing bzrlib errors - 10% of the startup time for 'bzr help'. In any operation only a few of those exceptions will be used - and typically 0. -Rob -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From solipsis at pitrou.net Fri Apr 10 13:41:07 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 10 Apr 2009 11:41:07 +0000 (UTC) Subject: [Python-Dev] Dropping bytes "support" in json References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> Message-ID: divmod.com> writes: > > In email's case this is true, but in JSON's case it's not. JSON is a > format defined as a sequence of code points; MIME is defined as a > sequence of octets. Another to look at it is that JSON is a subset of Javascript, and as such is text rather than bytes. Regards Antoine. 
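Returning to the start-up discussion for a moment: the lazy import mechanism Nick wished for, and that bzr hand-rolled, eventually grew a stdlib spelling. The sketch below uses `importlib.util.LazyLoader`, which did not exist in 2009 (it arrived in Python 3.5); module execution is deferred until the first attribute access:

```python
# Deferred module loading via importlib.util.LazyLoader (Python 3.5+;
# nothing like this existed in the stdlib when this thread was written).
import importlib.util
import sys

def lazy_import(name):
    """Register `name` in sys.modules without executing its body yet."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)   # execution is postponed, not performed
    return module

json = lazy_import("json")           # cheap: the module body has not run
data = json.dumps({"lazy": True})    # first attribute access triggers it
assert data == '{"lazy": true}'
```

bzr's lazy loader and `--profile-imports` flag attack the same problem at application level; `LazyLoader` moves the trick into the import system itself, which sidesteps the function-local-import deadlock concerns raised above.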
From solipsis at pitrou.net Fri Apr 10 13:52:00 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 10 Apr 2009 11:52:00 +0000 (UTC) Subject: [Python-Dev] Rethinking intern() and its data structure References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> <1239355179.2892.224.camel@lifeless-64> Message-ID: Robert Collins canonical.com> writes: > > (errors.py is 3027 lines long with 347 exception classes). 347 exception classes? Perhaps your framework is over-engineered. Similarly, when using a heavy Web framework, reloading a Web app can take several seconds... but I won't blame Python for that. Regards Antoine. From p.f.moore at gmail.com Fri Apr 10 13:53:47 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 10 Apr 2009 12:53:47 +0100 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49DF05FC.9040208@gmail.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49DF05FC.9040208@gmail.com> Message-ID: <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> 2009/4/10 Nick Coghlan : > glyph at divmod.com wrote: >> On 03:21 am, ncoghlan at gmail.com wrote: >>> Given that json is a wire protocol, that sounds like the right approach >>> for json as well. Once bytes-everywhere works, then a text API can be >>> built on top of it, but it is difficult to build a bytes API on top of a >>> text one. >> >> I wish I could agree, but JSON isn't really a wire protocol. According >> to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the >> serialization of structured data".
There are some notes about encoding, >> but it is very clearly described in terms of unicode code points. > Ah, my apologies - if the RFC defines things such that the native format > is Unicode, then yes, the appropriate Python 3.x data type for the base > implementation would indeed be strings. Indeed, the RFC seems to clearly imply that loads should take a Unicode string, dumps should produce one, and load/dump should work in terms of text files (not byte files). On the other hand, further down in the document: """ 3. Encoding JSON text SHALL be encoded in Unicode. The default encoding is UTF-8. Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets. """ This is at best confused (in my utterly non-expert opinion :-)) as Unicode isn't an encoding... I would guess that what the RFC is trying to say is that JSON is text (Unicode) and where a byte stream purporting to be JSON is encountered without a defined encoding, this is how to guess one. That implies that loads can/should also allow bytes as input, applying the given algorithm to guess an encoding. And similarly load can/should accept a byte stream, on the same basis. (There's no need to allow the possibility of accepting bytes plus an encoding - in that case the user should decode the bytes before passing Unicode to the JSON module). An alternative might be for the JSON module to register a special encoding ('JSON-guess'?) which captures the rules here. Then there's no need for special bytes parameter handling. Of course, this is all from a native English speaker, who therefore has no idea of the real life issues involved in Unicode :-) Paul.
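The null-pattern detection rule Paul quotes from the RFC can be written out as a small helper. This is purely an illustrative sketch of that rule, not a function the json module actually provides:

```python
def sniff_json_encoding(data):
    """Guess the Unicode encoding of a JSON byte stream from the
    pattern of null bytes in its first four octets, per the RFC text
    quoted above (the first two JSON characters are always ASCII)."""
    if len(data) < 4:
        return "utf-8"
    nulls = [byte == 0 for byte in data[:4]]
    if nulls == [True, True, True, False]:
        return "utf-32-be"   # 00 00 00 xx
    if nulls == [False, True, True, True]:
        return "utf-32-le"   # xx 00 00 00
    if nulls == [True, False, True, False]:
        return "utf-16-be"   # 00 xx 00 xx
    if nulls == [False, True, False, True]:
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"           # no nulls: the default

# The rule round-trips for all five encodings named in the RFC:
for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    assert sniff_json_encoding('["spam"]'.encode(enc)) == enc
```

Note this is exactly the kind of guessing Bob says the implementation deliberately does not do; it only works for conforming streams without a BOM.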
From robert.collins at canonical.com Fri Apr 10 14:16:30 2009 From: robert.collins at canonical.com (Robert Collins) Date: Fri, 10 Apr 2009 22:16:30 +1000 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> <1239355179.2892.224.camel@lifeless-64> Message-ID: <1239365790.2892.229.camel@lifeless-64> On Fri, 2009-04-10 at 11:52 +0000, Antoine Pitrou wrote: > Robert Collins canonical.com> writes: > > > > (errors.py is 3027 lines long with 347 exception classes). > > 347 exception classes? Perhaps your framework is over-engineered. > > Similarly, when using a heavy Web framework, reloading a Web app can take > several seconds... but I won't blame Python for that. Well, we've added exceptions as we needed them. This isn't much different to errno in C programs; the errno range has expanded as people have wanted to signal that specific situations have arisen. The key thing for us is to have both something that can be caught (for library users of bzrlib) and something that can be formatted with variable substitution (for displaying to users). If there are better ways to approach this in python than what we've done, that would be great. -Rob -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From martin at v.loewis.de Fri Apr 10 14:55:45 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Apr 2009 14:55:45 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> Message-ID: <49DF41D1.7030003@v.loewis.de> >> In email's case this is true, but in JSON's case it's not. JSON is a >> format defined as a sequence of code points; MIME is defined as a >> sequence of octets. > > Another to look at it is that JSON is a subset of Javascript, and as such is > text rather than bytes. I don't think this can be approached from a theoretical point of view. Instead, what matters is how users want to use it. Regards, Martin From fuzzyman at voidspace.org.uk Fri Apr 10 14:57:43 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 10 Apr 2009 13:57:43 +0100 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> Message-ID: <49DF4247.7060504@voidspace.org.uk> Guido van Rossum wrote: > On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato > wrote: > >> Then perhaps you misunderstand the goal of the decorator module. >> The raison d'etre of the module is to PRESERVE the signature: >> update_wrapper unfortunately *changes* it. 
>> >> When confronted with a library which I do not not know, I often run >> over it pydoc, or sphinx, or a custom made documentation tool, to extract the >> signature of functions. >> > > Ah, I see. Personally I rarely trust automatically extracted > documentation -- too often in my experience it is out of date or > simply absent. Extracting the signatures in theory wouldn't lie, but > in practice I still wouldn't trust it -- not only because of what > decorators might or might not do, but because it might still be > misleading. Call me old-fashioned, but I prefer to read the source > code. > If you auto-generate API documentation by introspection (which we do at Resolver Systems) then preserving signatures can also be important. Interactive use (support for help), and more straightforward tracebacks in the event of usage errors are other reasons to want to preserve signatures and function name. > For instance, if I see a method > >> get_user(self, username) I have a good hint about what it is supposed >> to do. But if the library (say a web framework) uses non signature-preserving >> decorators, my documentation tool says to me that there is function >> get_user(*args, **kwargs) which frankly is not enough [this is the >> optimistic case, when the author of the decorator has taken care >> to preserve the name of the original function]. >> > > But seeing the decorator is often essential for understanding what > goes on! Even if the decorator preserves the signature (in truth or > according inspect), many decorators *do* something, and it's important > to know how a function is decorated. For example, I work a lot with a > small internal framework at Google whose decorators can raise > exceptions and set instance variables; they also help me understand > under which conditions a method can be called. 
> Having methods renamed to 'wrapped' and their signature changed to *args, **kwargs may tell you there *is* a decorator but doesn't give you any useful information about what it does. If you look at the code then the decorator is obvious (whether or not it mangles the method)... > [+1] >> But I feel strongly about >> the possibility of being able to preserve (not change!) the function >> signature. >> > > That could be added to functools if enough people want it. > > +1 Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From barry at python.org Fri Apr 10 15:31:46 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 09:31:46 -0400 Subject: [Python-Dev] Python 2.6.2 final Message-ID: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org> I wanted to cut Python 2.6.2 final tonight, but for family reasons I won't be able to do so until Monday. Please be conservative in any commits to the 2.6 branch between now and then. bugs.python.org is apparently down right now, but I set issue 5724 to release blocker for 2.6.2. This is waiting for input from Mark Dickinson, and it relates to test_cmath failing on Solaris 10. If Mark fixes that, he's welcome to commit it, otherwise I will remove the release blocker tag on the issue and release 2.6.2 anyway. Plan on me tagging 2.6.2 final Sunday evening. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From bill.hoffman at kitware.com Fri Apr 10 16:13:30 2009 From: bill.hoffman at kitware.com (Bill Hoffman) Date: Fri, 10 Apr 2009 10:13:30 -0400 Subject: [Python-Dev] Evaluated cmake as an autoconf replacement In-Reply-To: <50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com> References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> <5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com> <50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com> Message-ID: <49DF540A.9030808@kitware.com> Neil Hodgson wrote: > cmake does not produce relative paths in its generated make and > project files. There is an option CMAKE_USE_RELATIVE_PATHS which > appears to do this but the documentation says: > > """This option does not work for more complicated projects, and > relative paths are used when possible. In general, it is not possible > to move CMake generated makefiles to a different location regardless > of the value of this variable.""" > > This means that generated Visual Studio project files will not work > for other people unless a particular absolute build location is > specified for everyone which will not suit most. Each person that > wants to build Python will have to run cmake before starting Visual > Studio thus increasing the prerequisites. > This is true. CMake does not generate stand alone transferable projects. CMake must be installed on the machine where the compilation is done. 
CMake will automatically re-run if any of the inputs are changed, and have visual studio re-load the project, and CMake can be used for simple cross platform commands like file copy and other operations so that the build files do not depend on shell commands or anything system specific. -Bill From ncoghlan at gmail.com Fri Apr 10 16:53:00 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Apr 2009 00:53:00 +1000 Subject: [Python-Dev] decorator module in stdlib? In-Reply-To: References: <4edc17eb0904072109s43528da5if7ca6f7d34fa8b60@mail.gmail.com> <4edc17eb0904080017k2aa23077q70e5b74aa11421a5@mail.gmail.com> <4edc17eb0904082131j568176a2p30836834623fbfa6@mail.gmail.com> Message-ID: <49DF5D4C.5040607@gmail.com> Guido van Rossum wrote: > On Wed, Apr 8, 2009 at 9:31 PM, Michele Simionato >> But I feel strongly about >> the possibility of being able to preserve (not change!) the function >> signature. > > That could be added to functools if enough people want it. No objection in principle here - it's just hard to do cleanly without PEP 362's __signature__ attribute to underpin it. Without that as a basis, I expect you'd end up being forced to do something similar to what Michele does in the decorator module - inspect the function being wrapped and then use exec to generate a wrapper with a matching signature. Another nice introspection enhancement might be to give class and function objects writable __file__ and __line__ attributes (initially set appropriately by the compiler) and have the inspect module use those when they're available. Then functools.update_wrapper() could be adjusted to copy those attributes, meaning that the wrapper function would point back to the original (decorated) function for the source code, rather than to the definition of the wrapper (note that the actual wrapper code could still be found by looking at the metadata on the function's __code__ attribute). Unfortunately-ideas-aren't-working-code'ly, Nick.
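The inspect-the-function-then-exec-a-wrapper technique described above can be sketched roughly as follows. This is a toy `traced` decorator illustrating the idea, not the decorator module's actual code (it uses the later inspect.signature API, and assumes every default value has an eval()-able repr):

```python
import inspect

def traced(func):
    # Inspect the wrapped function, then exec() wrapper source whose
    # def line repeats the original signature verbatim.
    sig = inspect.signature(func)
    args = ", ".join(sig.parameters)  # e.g. "db, username, active"
    src = (
        f"def {func.__name__}{sig}:\n"
        f"    {func.__name__}.calls += 1\n"
        f"    return _func({args})\n"
    )
    namespace = {"_func": func}
    exec(src, namespace)
    wrapper = namespace[func.__name__]
    wrapper.calls = 0
    wrapper.__doc__ = func.__doc__
    return wrapper

@traced
def get_user(db, username, active=True):
    """Fetch a user record (toy body)."""
    return (username, active)

# inspect now reports the real signature, not (*args, **kwargs):
assert str(inspect.signature(get_user)) == "(db, username, active=True)"
assert get_user(None, "alice") == ("alice", True)
```

With a plain functools.wraps wrapper taking (*args, **kwargs), the introspected signature would be lost, which is exactly the documentation-tool problem Michele raised.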
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From __peter__ at web.de Fri Apr 10 10:58:56 2009 From: __peter__ at web.de (Peter Otten) Date: Fri, 10 Apr 2009 10:58:56 +0200 Subject: [Python-Dev] Rethinking intern() and its data structure References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <99425C9C-BE44-4564-84DC-3BAF370CFAE0@gmail.com> <49DECD5F.7@gmail.com> Message-ID: John Arbash Meinel wrote: > Not as big of a difference as I thought it would be... But I bet if > there was a way to put the random shuffle in the inner loop, so you > weren't accessing the same identical 25k keys internally, you might get > more interesting results. You can prepare a few random samples during startup: $ python -m timeit -s"from random import sample; d = dict.fromkeys(xrange(10**7)); nextrange = iter([sample(xrange(10**7),25000) for i in range(200)]).next" "for x in nextrange(): d.get(x)" 10 loops, best of 3: 20.2 msec per loop To put it into perspective: $ python -m timeit -s"d = dict.fromkeys(xrange(10**7)); nextrange = iter([range(25000)]*200).next" "for x in nextrange(): d.get(x)" 100 loops, best of 3: 10.9 msec per loop Peter From a.badger at gmail.com Fri Apr 10 16:56:20 2009 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 10 Apr 2009 07:56:20 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <1239355179.2892.224.camel@lifeless-64> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> <1239355179.2892.224.camel@lifeless-64> Message-ID: <49DF5E14.3030108@gmail.com> Robert Collins wrote: > Certainly, import time is part of it: > robertc at lifeless-64:~$ python -m 
timeit -s 'import sys; import > bzrlib.errors' "del sys.modules['bzrlib.errors']; import bzrlib.errors" > 10 loops, best of 3: 18.7 msec per loop > > (errors.py is 3027 lines long with 347 exception classes). > > We've also looked lower - python does a lot of stat operations search > for imports and determining if the pyc is up to date; these appear to > only really matter on cold-cache imports (but they matter a lot then); > in hot-cache situations they are insignificant. > Tarek, Georg, and I talked about a way to do both multi-version and speedup of this exact problem with import in the future at pycon. I had to leave before the hackfest got started, though, so I don't know where the idea went from there. Tarek, did this idea progress any? -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From foom at fuhm.net Fri Apr 10 17:08:04 2009 From: foom at fuhm.net (James Y Knight) Date: Fri, 10 Apr 2009 11:08:04 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: On Apr 9, 2009, at 10:38 PM, Barry Warsaw wrote: > So, what I'm really asking is this. Let's say you agree that there > are use cases for accessing a header value as either the raw encoded > bytes or the decoded unicode. As I said in the thread having nearly the same exact discussion on web- sig, except about WSGI headers... > What should this return: > > >>> message['Subject'] > > The raw bytes or the decoded unicode? Until you write a parser for every header, you simply cannot decode to unicode. The only sane choices are: 1) raw bytes 2) parsed structured data There's no "decoded to unicode but not parsed" option: that's doing things in the wrong order. 
If you RFC2047-decode the header before doing tokenization and parsing, you will just have a *broken* implementation. Here's an example where it matters. If you decode the RFC2047 part before parsing, you'd decide that there's two recipients to the message. There aren't. ", " is the display-name of "actual at example.com", not a second recipient. To: =?UTF-8?B?PGJyb2tlbkBleGFtcGxlLmNvbT4sIA==?= Here's a quote from RFC2047: > NOTE: Decoding and display of encoded-words occurs *after* a > structured field body is parsed into tokens. It is therefore > possible to hide 'special' characters in encoded-words which, when > displayed, will be indistinguishable from 'special' characters in > the surrounding text. For this and other reasons, it is NOT > generally possible to translate a message header containing 'encoded- > word's to an unencoded form which can be parsed by an RFC 822 mail > reader. And another quote for good measure: > (2) Any header field not defined as '*text' should be parsed > according to the syntax rules for that header field. However, any > 'word' that appears within a 'phrase' should be treated as an > 'encoded-word' if it meets the syntax rules in section 2. Otherwise > it should be treated as an ordinary 'word'. Now, I suppose there's also a third possibility: 3) US-ASCII-only strings, unmolested except for doing a .decode('ascii'). That'll give you a string all right, but it's really just cheating. It's not actually a text string in any meaningful sense. (in all this I'm assuming your question is not about the "Subject" header in particular; that is of course just unstructured text so the parse step doesn't actually do anything...). James From stephen at xemacs.org Fri Apr 10 17:38:03 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 11 Apr 2009 00:38:03 +0900 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49DF05FC.9040208@gmail.com> <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> Message-ID: <87vdpcfrj8.fsf@xemacs.org> Paul Moore writes: > On the other hand, further down in the document: > > """ > 3. Encoding > > JSON text SHALL be encoded in Unicode. The default encoding is > UTF-8. > > Since the first two characters of a JSON text will always be ASCII > characters [RFC0020], it is possible to determine whether an octet > stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking > at the pattern of nulls in the first four octets. > """ > > This is at best confused (in my utterly non-expert opinion :-)) as > Unicode isn't an encoding... The word "encoding" (by itself) does not have a standard definition AFAIK. However, since Unicode *is* a "coded character set" (plus a bunch of hairy usage rules), there's nothing wrong with saying "text is encoded in Unicode". The RFC 2130 and Unicode TR#17 taxonomies are annoying verbose and pedantic to say the least. So what is being said there (in UTR#17 terminology) is (1) JSON is *text*, that is, a sequence of characters. (2) The abstract repertoire and coded character set are defined by the Unicode standard. (3) The default transfer encoding syntax is UTF-8. > That implies that loads can/should also allow bytes as input, applying > the given algorithm to guess an encoding. It's not a guess, unless the data stream is corrupt---or nonconforming. 
But it should not be the JSON package's responsibility to deal with corruption or non-conformance (eg, ISO-8859-15-encoded programs). That's the whole point of specifying the coded character set in the standard the first place. I think it's a bad idea for any of the core JSON API to accept or produce bytes in any language that provides a Unicode string type. That doesn't mean Python's module shouldn't provide convenience functions to read and write JSON serialized as UTF-8 (in fact, that *should* be done, IMO) and/or other UTFs (I'm not so happy about that). But those who write programs using them should not report bugs until they've checked out and eliminated the possibility of an encoding screwup! From bob at redivi.com Fri Apr 10 17:55:25 2009 From: bob at redivi.com (Bob Ippolito) Date: Fri, 10 Apr 2009 08:55:25 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <87vdpcfrj8.fsf@xemacs.org> References: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49DF05FC.9040208@gmail.com> <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> <87vdpcfrj8.fsf@xemacs.org> Message-ID: <6a36e7290904100855x7ce48f2ege72b4825fd792579@mail.gmail.com> On Fri, Apr 10, 2009 at 8:38 AM, Stephen J. Turnbull wrote: > Paul Moore writes: > > On the other hand, further down in the document: > > > > """ > > 3. Encoding > > > > JSON text SHALL be encoded in Unicode. The default encoding is > > UTF-8. > > > > Since the first two characters of a JSON text will always be ASCII > > characters [RFC0020], it is possible to determine whether an octet > > stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking > > at the pattern of nulls in the first four octets.
> > """ > > > > This is at best confused (in my utterly non-expert opinion :-)) as > > Unicode isn't an encoding... > > The word "encoding" (by itself) does not have a standard definition > AFAIK. However, since Unicode *is* a "coded character set" (plus a > bunch of hairy usage rules), there's nothing wrong with saying "text > is encoded in Unicode". The RFC 2130 and Unicode TR#17 taxonomies are > annoying verbose and pedantic to say the least. > > So what is being said there (in UTR#17 terminology) is > > (1) JSON is *text*, that is, a sequence of characters. > (2) The abstract repertoire and coded character set are defined by the > Unicode standard. > (3) The default transfer encoding syntax is UTF-8. > > > That implies that loads can/should also allow bytes as input, applying > > the given algorithm to guess an encoding. > > It's not a guess, unless the data stream is corrupt---or nonconforming. > > But it should not be the JSON package's responsibility to deal with > corruption or non-conformance (eg, ISO-8859-15-encoded programs). > That's the whole point of specifying the coded character set in the > standard the first place. I think it's a bad idea for any of the core > JSON API to accept or produce bytes in any language that provides a > Unicode string type. > > That doesn't mean Python's module shouldn't provide convenience > functions to read and write JSON serialized as UTF-8 (in fact, that > *should* be done, IMO) and/or other UTFs (I'm not so happy about > that). > But those who write programs using them should not report bugs > until they've checked out and eliminated the possibility of an > encoding screwup! The current implementation doesn't do any encoding guesswork and I have no intention to allow that as a feature. The input must be unicode, UTF-8 bytes, or an encoding must be specified. Personally most of my experience with JSON is as a wire protocol and thus bytes, so the obvious function to encode json should do that.
There probably should be another function to get unicode output, but nobody has ever asked for that in the Python 2.x version. They either want the default behavior (encoding as ASCII str which can be used as unicode due to implementation details of Python 2.x) or encoding as a more compact UTF-8 str (without escaping non-ASCII code points). Perhaps Python 3 users would ask for a unicode output when decoding though. -bob From martin at v.loewis.de Fri Apr 10 18:11:26 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Apr 2009 18:11:26 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <87vdpcfrj8.fsf@xemacs.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49DF05FC.9040208@gmail.com> <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> <87vdpcfrj8.fsf@xemacs.org> Message-ID: <49DF6FAE.3040602@v.loewis.de> > (3) The default transfer encoding syntax is UTF-8. Notice that the RFC is partially irrelevant. It only applies to the application/json mime type, and JSON is used in various other protocols, using various other encodings. > I think it's a bad idea for any of the core > JSON API to accept or produce bytes in any language that provides a > Unicode string type. So how do you integrate the encoding detection that the RFC suggests to be done? 
Regards, Martin From janssen at parc.com Fri Apr 10 18:35:44 2009 From: janssen at parc.com (Bill Janssen) Date: Fri, 10 Apr 2009 09:35:44 PDT Subject: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json) In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com> Message-ID: <92023.1239381344@parc.com> Barry Warsaw wrote: > In that case, we really need the > bytes-in-bytes-out-bytes-in-the-chewy- > center API first, and build things on top of that. Yep. Bill From barry at python.org Fri Apr 10 18:56:09 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 12:56:09 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> Message-ID: On Apr 10, 2009, at 1:19 AM, glyph at divmod.com wrote: > On 02:38 am, barry at python.org wrote: >> So, what I'm really asking is this. Let's say you agree that there >> are use cases for accessing a header value as either the raw >> encoded bytes or the decoded unicode. What should this return: >> >> >>> message['Subject'] >> >> The raw bytes or the decoded unicode? > > My personal preference would be to just get deprecate this API, and > get rid of it, replacing it with a slightly more explicit one. > > message.headers['Subject'] > message.bytes_headers['Subject'] This is pretty darn clever Glyph. Stop that! :) I'm not 100% sure I like the name .bytes_headers or that .headers should be the decoded header (rather than have .headers return the bytes thingie and say .decoded_headers return the decoded thingies), but I do like the general approach. 
>> Now, setting headers. Sometimes you have some unicode thing and >> sometimes you have some bytes. You need to end up with bytes in >> the ASCII range and you'd like to leave the header value unencoded >> if so. But in both cases, you might have bytes or characters >> outside that range, so you need an explicit encoding, defaulting to >> utf-8 probably. > > message.headers['Subject'] = 'Some text' > > should be equivalent to > > message.headers['Subject'] = Header('Some text') Yes, absolutely. I think we're all in general agreement that header values should be instances of Header, or subclasses thereof. > My preference would be that > > message.headers['Subject'] = b'Some Bytes' > > would simply raise an exception. If you've got some bytes, you > should instead do > > message.bytes_headers['Subject'] = b'Some Bytes' > > or > > message.headers['Subject'] = Header(bytes=b'Some Bytes', > encoding='utf-8') > > Explicit is better than implicit, right? Yes. Again, I really like the general idea, if I might quibble about some of the details. Thanks for a great suggestion. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From fumanchu at aminus.org Fri Apr 10 18:47:11 2009 From: fumanchu at aminus.org (Robert Brewer) Date: Fri, 10 Apr 2009 09:47:11 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: <1239382031.8682.11.camel@haku> On Thu, 2009-04-09 at 22:38 -0400, Barry Warsaw wrote: > On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote: > > > On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw wrote: > > Anyway, aside from that decision, I haven't come up with an elegant > > way to allow /output/ in both bytes and strings (input is I think > > theoretically easier by sniffing the arguments). > > > > Won't this work? (assuming dumps() always returns a string) > > > > def dumpb(obj, encoding='utf-8', *args, **kw): > > s = dumps(obj, *args, **kw) > > return s.encode(encoding) > > So, what I'm really asking is this. Let's say you agree that there > are use cases for accessing a header value as either the raw encoded > bytes or the decoded unicode. What should this return: > > >>> message['Subject'] > > The raw bytes or the decoded unicode? > > Okay, so you've picked one. Now how do you spell the other way? > > The Message class probably has these explicit methods: > > >>> Message.get_header_bytes('Subject') > >>> Message.get_header_string('Subject') > > (or better names... it's late and I'm tired ;). One of those maps to > message['Subject'] but which is the more obvious choice? > > Now, setting headers. Sometimes you have some unicode thing and > sometimes you have some bytes. You need to end up with bytes in the > ASCII range and you'd like to leave the header value unencoded if so. 
> But in both cases, you might have bytes or characters outside that > range, so you need an explicit encoding, defaulting to utf-8 probably. > > >>> Message.set_header('Subject', 'Some text', encoding='utf-8') > >>> Message.set_header('Subject', b'Some bytes') > > One of those maps to > > >>> message['Subject'] = ??? > > I'm open to any suggestions here! Syntactically, there's no sense in providing: Message.set_header('Subject', 'Some text', encoding='utf-16') ...since you could more clearly write the same as: Message.set_header('Subject', 'Some text'.encode('utf-16')) The only interesting case is if you provided a *default* encoding, so that: Message.default_header_encoding = 'utf-16' Message.set_header('Subject', 'Some text') ...has the same effect. But it would be far easier to do all the encoding at once in an output() or serialize() method. Do different headers need different encodings? If so, make message['Subject'] a subclass of str and give it an .encoding attribute (with a default). If not, Message.header_encoding should be sufficient. Robert Brewer fumanchu at aminus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Fri Apr 10 19:08:26 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 13:08:26 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: <595A42B2-0D3B-4886-960B-F16D50D0CC5A@python.org> On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote: > At 22:38 -0400 04/09/2009, Barry Warsaw wrote: > ... >> So, what I'm really asking is this. Let's say you agree that there >> are use cases for accessing a header value as either the raw encoded >> bytes or the decoded unicode. What should this return: >> >>>>> message['Subject'] >> >> The raw bytes or the decoded unicode? 
> > That's an easy one: Subject: is an unstructured header, so it must be > text, thus Unicode. We're looking at a high-level representation of > an > email message, with parsed header fields and a MIME message tree. I'm liking Glyph's suggestion here. We'll probably have to support the message['Subject'] API for backward compatibility, but in that case it really should be a bytes API. >> (or better names... it's late and I'm tired ;). One of those maps to >> message['Subject'] but which is the more obvious choice? > Structured header fields are more of a problem. Any header with > addresses > should return a list of addresses. I think the default return type > should > depend on the data type. To get an explicit bytes or string or list > of > addresses, be explicit; otherwise, for convenience, return the > appropriate > type for the particular header field name. Yes, structured headers are trickier. In a separate message, James Knight makes some excellent points, which I agree with. However, the email package obviously cannot support every type of structured header possible. It must support this through extensibility. The obvious way is through inheritance (i.e. subclasses of Header), but in my experience, using inheritance of the Message class really doesn't work very well. You need to pass around factories to parsing functions and your application tends to have its own hierarchy of subclasses for whatever extra things it needs. ISTM that subclassing is simply not the right pattern to support extensibility in the Message objects or Header objects. Yes, this leads me to think that all the MIME* subclasses are essentially /wrong/. Having said all that, the email package must support structured headers. Look at the insanity which is the current folding whitespace splitting and the impossibility of the current code to do the right thing for say Subject headers and Received headers, and you begin to see why it must be possible to extend this stuff. 
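To make the factory approach concrete, here's a toy sketch of a name-to-parser registry -- every name in it is invented for illustration, and the address "parser" is laughably naive:

```python
# Toy sketch: header extensibility via a registry instead of subclassing.
# All names here are invented for illustration; this is not a proposed API.

def parse_unstructured(value):
    # Unstructured headers (Subject, Comments, ...) stay plain text.
    return value.strip()

def parse_address_list(value):
    # Grossly naive: a real parser must handle quoting, comments,
    # groups, and folding whitespace.
    return [mailbox.strip() for mailbox in value.split(',')]

HEADER_PARSERS = {
    'subject': parse_unstructured,
    'to': parse_address_list,
    'cc': parse_address_list,
}

def parse_header(name, value):
    # Applications extend by registering a parser, not by subclassing.
    parser = HEADER_PARSERS.get(name.lower(), parse_unstructured)
    return parser(value)

print(parse_header('To', 'barry@example.com, stephen@example.com'))
# -> ['barry@example.com', 'stephen@example.com']
```

The point isn't the parsing, it's that a registry keyed on the header name can be extended at run time without anyone having to agree on a class hierarchy first.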
>> Now, setting headers. Sometimes you have some unicode thing and >> sometimes you have some bytes. You need to end up with bytes in the >> ASCII range and you'd like to leave the header value unencoded if so. >> But in both cases, you might have bytes or characters outside that >> range, so you need an explicit encoding, defaulting to utf-8 >> probably. > > Never for header fields. The default is always RFC 2047, unless it > isn't, > say for params. > > The Message class should create an object of the appropriate > subclass of > Header based on the name (or use the existing object, see other > discussion), and that should inspect its argument and DTRT or > complain. >>>>> Message.set_header('Subject', 'Some text', encoding='utf-8') >>>>> Message.set_header('Subject', b'Some bytes') >> >> One of those maps to >> >>>>> message['Subject'] = ??? > > The expected data type should depend on the header field. For > Subject:, it > should be bytes to be parsed or verbatim text. For To:, it should > be a > list of addresses or bytes or text to be parsed. At a higher level, yes. At the low level, it has to be bytes. > The email package should be pythonic, and not require deep > understanding of > dozens of RFCs to use properly. Users don't need to know about the > raw > bytes; that's the whole point of MIME and any email package. It > should be > easy to set header fields with their natural data types, and doing > it with > bad data should produce an error. This may require a bit more care > in the > message parser, to always produce a parsed message with defects. I agree that we should have some higher level APIs that make it easy to compose email messages, and probably easy-ish to parse a byte stream into an email message tree. But we can't build those without the lower level raw support. I'm also convinced that this lower level will be the domain of those crazy enough to have the RFCs tattooed to the back of their eyelids. 
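For concreteness, a toy sketch of that layering -- wire-ready bytes underneath, text on top, with the RFC 2047 work delegated to the existing email.header machinery. Every name is invented for illustration, not a proposed API:

```python
from email.header import Header, decode_header, make_header

class HeaderStore:
    """Toy two-view header store; names invented for illustration."""

    def __init__(self):
        self._raw = {}                  # name -> wire-ready ASCII bytes

    # text view: str in, str out; RFC 2047 handled underneath
    def set_text(self, name, value):
        # ASCII stays unencoded; anything else becomes a utf-8
        # encoded-word via Header's charset fallback.
        self._raw[name] = Header(value).encode().encode('ascii')

    def get_text(self, name):
        wire = self._raw[name].decode('ascii')
        return str(make_header(decode_header(wire)))

    # bytes view: exactly what goes on the wire
    def set_bytes(self, name, value):
        value.decode('ascii')           # strict: reject non-wire-ready input
        self._raw[name] = value

    def get_bytes(self, name):
        return self._raw[name]

store = HeaderStore()
store.set_text('Subject', 'Gödel, Escher, Bach')
assert store.get_text('Subject') == 'Gödel, Escher, Bach'
store.get_bytes('Subject')      # an ASCII-only RFC 2047 encoded-word
```

Glyph's message.headers / message.bytes_headers would then just be two dict-like views over the same underlying store.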
-Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 19:12:48 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 13:12:48 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> Message-ID: <50EC006F-CF96-45F4-AD71-73B9DE7E510E@python.org> On Apr 9, 2009, at 11:59 PM, Tony Nelson wrote: >> Thinking about this stuff makes me nostalgic for the sloppy happy >> days >> of Python 2.x > > You now have the opportunity to finally unsnarl that mess. It is > not an > insurmountable opportunity. No, it's just a full time job . Now where did I put that hack- drink-coffee-twitter clone? -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 19:21:45 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 13:21:45 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <87zlepf5hf.fsf@xemacs.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> <87zlepf5hf.fsf@xemacs.org> Message-ID: <67879F1D-B386-4B9B-8203-86DB977BD7FF@python.org> On Apr 10, 2009, at 1:22 AM, Stephen J. Turnbull wrote: >> Those objects have headers and payload. The payload can be of any >> type, though I think it generally breaks down into "strings" for >> text/ >> * types and bytes for anything else (not counting multiparts). > > *sigh* Why are you back-tracking? I'm not. Sleep deprivation on makes it seem like that. > The payload should be of an appropriate *object* type. 
Atomic object > types will have their content stored as string or bytes [nb I use > Python 3 terminology throughout]. Composite types (multipart/*) won't > need string or bytes attributes AFAICS. Yes, agreed. > Start by implementing the application/octet-stream and > text/plain;charset=utf-8 object types, of course. Yes. See my lament about using inheritance for this. >> It does seem to make sense to think about headers as text header >> names >> and text header values. > > I disagree. IMHO, structured header types should have object values, > and something like While I agree, there's still a need for a higher level API that makes it easy to do the simple things. > message['to'] = "Barry 'da FLUFL' Warsaw " > > should be smart enough to detect that it's a string and attempt to > (flexibly) parse it into a fullname and a mailbox adding escapes, etc. > Whether these should be structured objects or they can be strings or > bytes, I'm not sure (probably bytes, not strings, though -- see next > example). OTOH > > message['to'] = b'''"Barry 'da.FLUFL' Warsaw" ''' > > should assume that the client knows what they are doing, and should > parse it strictly (and I mean "be a real bastard", eg, raise an > exception on any non-ASCII octet), merely dividing it into fullname > and mailbox, and caching the bytes for later insertion in a > wire-format message. I agree that the Message class needs to be strict. A parser needs to be lenient; see the .defects attribute introduced in the current email package. Oh, and this reminds me that we still haven't talked about idempotency. That's an important principle in the current email package, but do we need to give up on that? >> In that case, I think you want the values as unicodes, and probably >> the headers as unicodes containing only ASCII. So your table would >> be >> strings in both cases. 
OTOH, maybe your application cares about the >> raw underlying encoded data, in which case the header names are >> probably still strings of ASCII-ish unicodes and the values are >> bytes. It's this distinction (and I think the competing use cases) >> that make a true Python 3.x API for email more complicated. > > I don't see why you can't have the email API be specific, with > message['to'] always returning a structured_header object (or maybe > even more specifically an address_header object), and methods like > > message['to'].build_header_as_text() > > which returns > > """To: "Barry 'da.FLUFL' Warsaw" """ > > and > > message['to'].build_header_in_wire_format() > > which returns > > b"""To: "Barry 'da.FLUFL' Warsaw" """ > > Then have email.textview.Message and email.wireview.Message which > provide a simple interface where message['to'] would invoke > .build_header_as_text() and .build_header_in_wire_format() > respectively. This seems similar to Glyph's basic idea, but with a different spelling. >> Thinking about this stuff makes me nostalgic for the sloppy happy >> days >> of Python 2.x > > Er, yeah. > > Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly > y'rs, Can I have my uucp address back now? -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From v+python at g.nevcal.com Fri Apr 10 20:00:54 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 10 Apr 2009 11:00:54 -0700 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> Message-ID: <49DF8956.5050501@g.nevcal.com> On approximately 4/10/2009 9:56 AM, came the following characters from the keyboard of Barry Warsaw: > On Apr 10, 2009, at 1:19 AM, glyph at divmod.com wrote: >> On 02:38 am, barry at python.org wrote: >>> So, what I'm really asking is this. Let's say you agree that there >>> are use cases for accessing a header value as either the raw encoded >>> bytes or the decoded unicode. What should this return: >>> >>> >>> message['Subject'] >>> >>> The raw bytes or the decoded unicode? >> >> My personal preference would be to just get deprecate this API, and >> get rid of it, replacing it with a slightly more explicit one. >> >> message.headers['Subject'] >> message.bytes_headers['Subject'] > > This is pretty darn clever Glyph. Stop that! :) > > I'm not 100% sure I like the name .bytes_headers or that .headers > should be the decoded header (rather than have .headers return the > bytes thingie and say .decoded_headers return the decoded thingies), > but I do like the general approach. If one name has to be longer than the other, it should be the bytes version. Real user code is more likely to want to use the text version, and hopefully there will be more of that type of code than implementations using bytes. Of course, one could use message.header and message.bythdr and they'd be the same length. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From fuzzyman at voidspace.org.uk Fri Apr 10 20:06:13 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 10 Apr 2009 19:06:13 +0100 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49DF8956.5050501@g.nevcal.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> Message-ID: <49DF8A95.4010700@voidspace.org.uk> Glenn Linderman wrote: > On approximately 4/10/2009 9:56 AM, came the following characters from > the keyboard of Barry Warsaw: >> On Apr 10, 2009, at 1:19 AM, glyph at divmod.com wrote: >>> On 02:38 am, barry at python.org wrote: >>>> So, what I'm really asking is this. Let's say you agree that there >>>> are use cases for accessing a header value as either the raw >>>> encoded bytes or the decoded unicode. What should this return: >>>> >>>> >>> message['Subject'] >>>> >>>> The raw bytes or the decoded unicode? >>> >>> My personal preference would be to just get deprecate this API, and >>> get rid of it, replacing it with a slightly more explicit one. >>> >>> message.headers['Subject'] >>> message.bytes_headers['Subject'] >> >> This is pretty darn clever Glyph. Stop that! :) >> >> I'm not 100% sure I like the name .bytes_headers or that .headers >> should be the decoded header (rather than have .headers return the >> bytes thingie and say .decoded_headers return the decoded thingies), >> but I do like the general approach. > > If one name has to be longer than the other, it should be the bytes > version. Real user code is more likely to want to use the text > version, and hopefully there will be more of that type of code than > implementations using bytes. > > Of course, one could use message.header and message.bythdr and they'd > be the same length. 
> > Shouldn't headers always be text? Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From stephen at xemacs.org Fri Apr 10 20:13:35 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 11 Apr 2009 03:13:35 +0900 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49DF6FAE.3040602@v.loewis.de> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49DF05FC.9040208@gmail.com> <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> <87vdpcfrj8.fsf@xemacs.org> <49DF6FAE.3040602@v.loewis.de> Message-ID: <87r600fkc0.fsf@xemacs.org> "Martin v. Löwis" writes: > > (3) The default transfer encoding syntax is UTF-8. > > Notice that the RFC is partially irrelevant. It only applies > to the application/json mime type, and JSON is used in various > other protocols, using various other encodings. Sure. That's their problem. In Python, Unicode is the native encoding, and we have codecs to deal with the outside world, no? That happens to match very well not only with RFC 4627, but the sidebar on json.org that defines JSON. > > I think it's a bad idea for any of the core JSON API to accept or > > produce bytes in any language that provides a Unicode string type. > So how do you integrate the encoding detection that the RFC suggests > to be done? I suggest you don't. That's mission creep. Think about writing tests for it, and remember that out in the wild those "various other encodings" almost certainly include Shift JIS, Big5, and KOI8-R. Both those considerations point to "er, let's delegate detection and en/decoding to the nice folks who maintain the codec suite." Where it's embedded in some other protocol which specifies a TES, the TES can be implemented there, too. 
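(For the record, the detection the RFC describes is just a null-octet pattern over the first four bytes, so a contrib-style wrapper really is tiny. A sketch, with BOM handling mostly omitted and every name invented:)

```python
import codecs
import json

def sniff_json_encoding(raw):
    # RFC 4627, section 3: the first two characters of a JSON text are
    # ASCII, so the pattern of zero octets among the first four bytes
    # determines which Unicode encoding is in use.
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    nulls = [octet == 0 for octet in raw[:4]]
    if nulls == [True, True, True, False]:
        return 'utf-32-be'
    if nulls == [False, True, True, True]:
        return 'utf-32-le'
    if nulls == [True, False, True, False]:
        return 'utf-16-be'
    if nulls == [False, True, False, True]:
        return 'utf-16-le'
    return 'utf-8'

def loadb(raw, encoding=None):
    # Delegate the decoding to the codec suite, then hand unicode to
    # the real parser.
    return json.loads(raw.decode(encoding or sniff_json_encoding(raw)))

assert loadb('{"price": 25}'.encode('utf-16-le')) == {'price': 25}
```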
As I wrote earlier, I don't see anything wrong with providing a wrapper module that deals with some default/common/easy cases. But I'd stick it in the contrib directory. From barry at python.org Fri Apr 10 20:55:23 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 14:55:23 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49DF8956.5050501@g.nevcal.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> Message-ID: <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org> On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote: > If one name has to be longer than the other, it should be the bytes > version. Real user code is more likely to want to use the text > version, and hopefully there will be more of that type of code than > implementations using bytes. I'm not sure we know that yet, actually. Nothing written for Python 2 counts, and email is too broken in 3 for any sane person to be writing such code for Python 3. > Of course, one could use message.header and message.bythdr and > they'd be the same length. I was trying to figure out what a 'thdr' was that we'd want to index 'by' it. :) -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Fri Apr 10 20:55:56 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 14:55:56 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49DF8A95.4010700@voidspace.org.uk> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <49DF8A95.4010700@voidspace.org.uk> Message-ID: On Apr 10, 2009, at 2:06 PM, Michael Foord wrote: > Shouldn't headers always be text? /me weeps -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From stephen at xemacs.org Fri Apr 10 21:04:22 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 11 Apr 2009 04:04:22 +0900 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <67879F1D-B386-4B9B-8203-86DB977BD7FF@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> <87zlepf5hf.fsf@xemacs.org> <67879F1D-B386-4B9B-8203-86DB977BD7FF@python.org> Message-ID: <87prfkfhzd.fsf@xemacs.org> Shouldn't this thread move lock stock and .signature to email-sig? Barry Warsaw writes: > >> It does seem to make sense to think about headers as text header > >> names and text header values. > > > > I disagree. IMHO, structured header types should have object values, > > and something like > > While I agree, there's still a need for a higher level API that make > it easy to do the simple things. Sure. I'm suggesting that the way to determine whether something is simple or not is by whether it falls out naturally from correct structure. 
Ie, no operations that only a Cirque du Soleil juggler can perform are allowed. > I agree that the Message class needs to be strict. A parser needs to > be lenient; Not always. The Postel Principle only applies to stuph coming in off the wire. But we're *also* going to be parsing pseudo-email components that are being handed to us by applications (eg, the perennial control-character-in-the-unremovable-address Mailman bug). Our parser should Just Say No to that crap. > see the .defects attribute introduced in the current email > package. Oh, and this reminds me that we still haven't talked about > idempotency. That's an important principle in the current email > package, but do we need to give up on that? "Idempotency"? I'm not sure what that means in the context of the email package ... multiplication by zero? Do you mean that .parse().to_wire() should be idempotent? Yes, I think that's a good idea, and it shouldn't be too hard to implement by (optionally?) caching the whole original message or individual components (headers with all whitespace including folding cached verbatim, etc). I think caching has to be done, since stuff like "did the original fold with a leading tab or a leading space, and at what column" and so on seems kind of pointless to encode as attributes on Header objects. [Description of MessageTextView and MessageWireView elided.] > This seems similar to Glyph's basic idea, but with a different spelling. Yes. I don't much care which way it's done, and Glyph's style of spelling is more explicit. But I was thinking in terms of the number of people who are surely going to sing "Mama don' 'low no Unicodes roun' here" and squeal "codec WTF?! outta mah face, man!" From stephen at xemacs.org Fri Apr 10 21:06:59 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 11 Apr 2009 04:06:59 +0900 Subject: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json) In-Reply-To: <92023.1239381344@parc.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com> <92023.1239381344@parc.com> Message-ID: <87ocv4fhv0.fsf@xemacs.org> Bill Janssen writes: > Barry Warsaw wrote: > > > In that case, we really need the > > bytes-in-bytes-out-bytes-in-the-chewy- > > center API first, and build things on top of that. > > Yep. Uh, I hate to rain on a parade, but isn't that how we arrived at the *current* email package? From pje at telecommunity.com Fri Apr 10 21:05:17 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 10 Apr 2009 15:05:17 -0400 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <49DF08D8.9080806@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> <49DF08D8.9080806@gmail.com> Message-ID: <20090410190248.913033A4063@sparrow.telecommunity.com> At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote: >This problem (slow application startup times due to too many imports at >startup, which can in turn can be due to top level imports for library >or framework functionality that a given application doesn't actually >use) is actually the main reason I sometimes wish for a nice, solid lazy >module import mechanism that manages to avoid the potential deadlock >problems created by using import statements inside functions. Have you tried http://pypi.python.org/pypi/Importing ? Or more specifically, http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ? 
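The core trick, stripped to its general shape (a toy sketch, not Importing's actual implementation), is a proxy that defers the import until the first attribute access:

```python
import importlib

class LazyModule:
    """Toy lazy-import proxy.  The real Importing package patches
    sys.modules and handles many more edge cases; this shows only the
    general shape of the trick."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # The first real attribute access triggers the actual import.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

json = LazyModule('json')               # nothing imported at startup
assert json.dumps([1, 2]) == '[1, 2]'   # the import happens here
```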
It does of course use the import lock, but as long as your top-level module code doesn't acquire locks (directly or indirectly), it shouldn't be possible to deadlock. (Or more precisely, to add any *new* deadlocks that you didn't already have.) From barry at python.org Fri Apr 10 21:04:01 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 10 Apr 2009 15:04:01 -0400 Subject: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json) In-Reply-To: <87ocv4fhv0.fsf@xemacs.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com> <92023.1239381344@parc.com> <87ocv4fhv0.fsf@xemacs.org> Message-ID: On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote: > Bill Janssen writes: >> Barry Warsaw wrote: >> >>> In that case, we really need the >>> bytes-in-bytes-out-bytes-in-the-chewy- >>> center API first, and build things on top of that. >> >> Yep. > > Uh, I hate to rain on a parade, but isn't that how we arrived at the > *current* email package? Not really. We got here because we were too damn sloppy about the distinction. I'm going to remove python-dev from subsequent follow ups. Please join us at email-sig for further discussion. Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From aahz at pythoncraft.com Fri Apr 10 21:05:56 2009 From: aahz at pythoncraft.com (Aahz) Date: Fri, 10 Apr 2009 12:05:56 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <49DF8A95.4010700@voidspace.org.uk> Message-ID: <20090410190555.GA5843@panix.com> On Fri, Apr 10, 2009, Barry Warsaw wrote: > On Apr 10, 2009, at 2:06 PM, Michael Foord wrote: >> >> Shouldn't headers always be text? > > /me weeps /me hands Barry a hankie -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From turnbull at sk.tsukuba.ac.jp Fri Apr 10 21:22:09 2009 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 11 Apr 2009 04:22:09 +0900 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <1239382031.8682.11.camel@haku> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <1239382031.8682.11.camel@haku> Message-ID: <87myaofh5q.fsf@xemacs.org> Robert Brewer writes: > Syntactically, there's no sense in providing: > > Message.set_header('Subject', 'Some text', encoding='utf-16') > > ...since you could more clearly write the same as: > > Message.set_header('Subject', 'Some text'.encode('utf-16')) Which you now must *parse* and guess the encoding to determine how to RFC-2047-encode the binary mush. I think the encoding parameter is necessary here. > But it would be far easier to do all the encoding at once in an > output() or serialize() method. Do different headers need different > encodings? 
You can have multiple encodings within a single header (and a naïve algorithm might very well encode "The price of Gödel-Escher-Bach is €25" as "The price of =?ISO-8859-1?Q?G=F6del-Escher-Bach?= is =?ISO-8859-15?Q?=A425?="). > If so, make message['Subject'] a subclass of str and give it an > .encoding attribute (with a default). But if you've set the .encoding attribute, you don't need to encode 'Some text'; .set_header() can take care of it for you. And what about the possibility that the encoding attributes disagree with the argument you passed to the codec? From ctb at msu.edu Fri Apr 10 22:38:09 2009 From: ctb at msu.edu (C. Titus Brown) Date: Fri, 10 Apr 2009 13:38:09 -0700 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC Message-ID: <20090410203809.GA24530@idyll.org> Hi all, this year we have 10-12 GSoC applications that I've put in the "relevant to core Python development" category. These projects, if mentors etc are found, are *guaranteed* a slot under the PSF GSoC umbrella. As backup GSoC admin and general busybody, I've taken on the work of coordinating these as a special subgroup within the PSF GSoC, and I thought it would be good to mention them to python-dev. Note that all of them have been run by a few different committers, including Martin, Tarek, Benjamin, and Brett, and they've been obliging enough to triage a few of them. Thanks, guys! Here's what's left after that triage. Note that except for the four at the top, these have all received positive support from *someone* who is a committer and I don't think we need to discuss them here -- patches etc. can go through normal "python-dev" channels during the course of the summer. I am looking for feedback on the first four, though. Can these reasonably be considered "core" priorities for Python? Remember, this "costs" us something in the sense of preferring these over Python subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim, etc. 
--- Questionable "core": 2x "port NumPy to py3k" -- NumPy is a major Python module and porting it to py3k fits with Guido's request that "more stuff get ported". To be clear, I don't think anyone expects all of NumPy to get ported this summer, but these students will work through issues associated with porting big chunks o' code to py3k. One medium/strong proposal, one medium/weak proposal. Comments/thoughts? 2x "improve testing tools for py3k" -- variously focus on improving test coverage and testing wrappers. One proposes to provide a nice wrapper to make nose and py.test capable of running the regrtests, which (with no change to regrtest) would let people run tests in parallel, distribute or run tests across multiple machines (including Snakebite), tag and run subsets of tests with personal and/or public tags, and otherwise take advantage of many of the nice features of nose and py.test. The other proposes to measure & increase the code coverage of the py3k tests in both Python and C, integrate across multiple machines, and otherwise provide a nice set of integrated reports that anyone can generate on their own machines. This proposal, in particular, could move smoothly towards the effort to produce a "Python-wide" test suite for CPython/IronPython/PyPy/Jython. (This wasn't integrated into the proposal because I only found out about it after the proposals were due.) I personally think that both testing proposals are good, and they grew out of conversations I had with Brett, who thinks that the general ideas are good. So, err, I'm looking for pushback, I guess ;). I can expand on these ideas a bit if people are interested. Both proposals are medium at least, and I've personally been positively impressed with the student interaction. Comments/thoughts? --- Unquestionably "core" by my criteria above: 3to2 tool -- 'nuff said. 
subprocess improvement -- integrating, testing, and proposing some of the various subprocess improvements that have passed across this list & the bug tracker IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker issues relating to IDLE and Tkinter. roundup VCS integration / build tools to support core development -- a single student proposed both of these and has received some support. See http://slexy.org/view/s2pFgWxufI for details. sphinx framework improvement -- support for per-paragraph comments and user/developer interface for submitting/committing fixes 2x "keyring package" -- see http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/. The poorer one of these will probably be axed unless Tarek gives it strong support. -- --titus -- C. Titus Brown, ctb at msu.edu From ggpolo at gmail.com Fri Apr 10 22:53:23 2009 From: ggpolo at gmail.com (Guilherme Polo) Date: Fri, 10 Apr 2009 17:53:23 -0300 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410203809.GA24530@idyll.org> References: <20090410203809.GA24530@idyll.org> Message-ID: On Fri, Apr 10, 2009 at 5:38 PM, C. Titus Brown wrote: > Hi all, > > this year we have 10-12 GSoC applications that I've put in the "relevant > to core Python development" category. These projects, if mentors etc > are found, are *guaranteed* a slot under the PSF GSoC umbrella. As > backup GSoC admin and general busybody, I've taken on the work of > coordinating these as a special subgroup within the PSF GSoC, and I > thought it would be good to mention them to python-dev. > > Note that all of them have been run by a few different committers, > including Martin, Tarek, Benjamin, and Brett, and they've been obliging > enough to triage a few of them. Thanks, guys! > > Here's what's left after that triage. > . > . > > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker >        issues relating to IDLE and Tkinter. 
> Is it important, for the discussion, to mention that it also involves testing this area (idle and tkinter), Titus ? I'm considering this more important than "just" dealing with the tracker issues. > --titus > -- > C. Titus Brown, ctb at msu.edu Regards, -- -- Guilherme H. Polo Goncalves From ctb at msu.edu Fri Apr 10 23:02:26 2009 From: ctb at msu.edu (C. Titus Brown) Date: Fri, 10 Apr 2009 14:02:26 -0700 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: References: <20090410203809.GA24530@idyll.org> Message-ID: <20090410210226.GB13018@idyll.org> On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote: -> > -> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker -> > ? ? ? ?issues relating to IDLE and Tkinter. -> > -> -> Is it important, for the discussion, to mention that it also involves -> testing this area (idle and tkinter), Titus ? I'm considering this -> more important than "just" dealing with the tracker issues. What, I tell you that your app is going to be accepted and we shouldn't argue about it, and you want to argue about it? ;) --titus -- C. Titus Brown, ctb at msu.edu From tjreedy at udel.edu Fri Apr 10 23:05:17 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Apr 2009 17:05:17 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> Message-ID: glyph at divmod.com wrote: > > On 03:21 am, ncoghlan at gmail.com wrote: >> Barry Warsaw wrote: > >>> I don't know whether the parameter thing will work or not, but you're >>> probably right that we need to get the bytes-everywhere API first. 
> >> Given that json is a wire protocol, that sounds like the right approach >> for json as well. Once bytes-everywhere works, then a text API can be >> built on top of it, but it is difficult to build a bytes API on top of a >> text one. > > I wish I could agree, but JSON isn't really a wire protocol. According > to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the > serialization of structured data". There are some notes about encoding, > but it is very clearly described in terms of unicode code points. >> So I guess the IO library *is* the right model: bytes at the bottom of >> the stack, with text as a wrapper around it (mediated by codecs). > > In email's case this is true, but in JSON's case it's not. JSON is a > format defined as a sequence of code points; MIME is defined as a > sequence of octets. What is the 'bytes support' issue for json? Is it about content within a json text? Or about the transport format of a json text? Reading rfc4627, a json text is a unicode string representation of an instance of one of 6 classes. In Python terms, they are Nonetype, bool, numbers (int, float, decimal?), (unicode) str, list, and [string-keyed] dict. The representation is nearly identical to Python's literals and displays. For transport, the encoding SHALL be one of UTF-8, -16LE/BE, -32LE/BE, with UTF-8 the 'default'. So a json parser (a restricted eval()) tokenizes and parses a stream of unicode chars which in Python could come from either a unicode string or a decoded bytes object. The bytes decoding could be either bulk or incremental. Similarly, a json generator (a repr()-like function) produces a stream of unicode chars which again could be optionally encoded to bytes, either incrementally or in bulk. The standard does not specify any correspondence between representations and domain objects. For Python, making 'null', 'true', and 'false' inter-convert with None, True, False is obvious. Numbers are slightly more problematical. 
A generator could produce decimal literals from both floats and decimals but without a non-json extension, a parser could only convert back to one, so the other would not round-trip. (Int could be handled by the presence or absence of '.0'.) Similarly, tuples could be represented, like lists, as json square-bracketed arrays, but they would be converted back to lists, not tuples, unless a non-json extension were used. So the two possible byte-support content issues I see are how to represent them as legal json strings and/or whether some device should be added to make them round-trip. But as indicated above, these two issues are not unique to bytes. Terry Jan Reedy From tleeuwenburg at gmail.com Fri Apr 10 23:26:12 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Sat, 11 Apr 2009 07:26:12 +1000 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410203809.GA24530@idyll.org> References: <20090410203809.GA24530@idyll.org> Message-ID: <43c8685c0904101426q3796d459w2510d236f5f5831@mail.gmail.com> Well, I think Numpy is of huge importance to a major Python user segment, the scientific community. I don't know if that makes it 'core', but I strongly agree that it's important. Better testing is always useful, and more "core", but IMO less important. -T On Sat, Apr 11, 2009 at 6:38 AM, C. Titus Brown wrote: > Hi all, > > this year we have 10-12 GSoC applications that I've put in the "relevant > to core Python development" category. These projects, if mentors etc > are found, are *guaranteed* a slot under the PSF GSoC umbrella. As > backup GSoC admin and general busybody, I've taken on the work of > coordinating these as a special subgroup within the PSF GSoC, and I > thought it would be good to mention them to python-dev. > > Note that all of them have been run by a few different committers, > including Martin, Tarek, Benjamin, and Brett, and they've been obliging > enough to triage a few of them. Thanks, guys! 
> > Here's what's left after that triage. Note that except for the four at > the top, these have all received positive support from *someone* who is > a committer and I don't think we need to discuss them here -- patches > etc. can go through normal "python-dev" channels during the course of the > summer. > > I am looking for feedback on the first four, though. Can these > reasonably be considered "core" priorites for Python? Remember, this > "costs" us something in the sense of preferring these over Python > subprojects like (random example) Cython, NumPy, PySoy, Tahoe, Gajim, > etc. > > --- > > Questionable "core": > > 2x "port NumPy to py3k" -- NumPy is a major Python module and porting it > to py3k fits with Guido's request that "more stuff get ported". > To be clear, I don't think anyone expects all of NumPy to get > ported this summer, but these students will work through issues > associated with porting big chunks o' code to py3k. > > One medium/strong proposal, one medium/weak proposal. > > Comments/thoughts? > > 2x "improve testing tools for py3k" -- variously focus on improving test > coverage and testing wrappers. > > One proposes to provide a nice wrapper to make nose and py.test > capable of running the regrtests, which (with no change to > regrtest) would let people run tests in parallel, distribute or > run tests across multiple machines (including Snakebite), tag > and run subsets of tests with personal and/or public tags, and > otherwise take advantage of many of the nice features of nose > and py.test. > > The other proposes to measure & increase the code coverage of > the py3k tests in both Python and C, integrate across multiple > machines, and otherwise provide a nice set of integrated reports > that anyone can generate on their own machines. This proposal, > in particular, could move smoothly towards the effort to produce > a "Python-wide" test suite for CPython/IronPython/PyPy/Jython. 
> (This wasn't integrated into the proposal because I only found > out about it after the proposals were due.) > > I personally think that both testing proposals are good, and > they grew out of conversations I had with Brett, who thinks that > the general ideas are good. So, err, I'm looking for pushback, > I guess ;). I can expand on these ideas a bit if people are > interested. > > Both proposals are medium at least, and I've personally been > positively impressed with the student interaction. > > Comments/thoughts? > > --- > > Unquestionably "core" by my criteria above: > > 3to2 tool -- 'nuff said. > > subprocess improvement -- integrating, testing, and proposing some of > the various subprocess improvements that have passed across this > list & the bug tracker > > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker > issues relating to IDLE and Tkinter. > > roundup VCS integration / build tools to support core development -- > a single student proposed both of these and has received some > support. See http://slexy.org/view/s2pFgWxufI for details. > > sphinx framework improvement -- support for per-paragraph comments and > user/developer interface for submitting/committing fixes > > 2x "keyring package" -- see > > http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/ > . > The poorer one of these will probably be axed unless Tarek gives it > strong support. > > -- > > --titus > -- > C. Titus Brown, ctb at msu.edu > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/tleeuwenburg%40gmail.com > -- -------------------------------------------------- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think" -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ggpolo at gmail.com Fri Apr 10 23:39:46 2009 From: ggpolo at gmail.com (Guilherme Polo) Date: Fri, 10 Apr 2009 18:39:46 -0300 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410210226.GB13018@idyll.org> References: <20090410203809.GA24530@idyll.org> <20090410210226.GB13018@idyll.org> Message-ID: On Fri, Apr 10, 2009 at 6:02 PM, C. Titus Brown wrote: > On Fri, Apr 10, 2009 at 05:53:23PM -0300, Guilherme Polo wrote: > -> > > -> > IDLE/Tkinter patch integration & improvement -- deal with ~120 tracker > -> > issues relating to IDLE and Tkinter. > -> > > -> > -> Is it important, for the discussion, to mention that it also involves > -> testing this area (idle and tkinter), Titus ? I'm considering this > -> more important than "just" dealing with the tracker issues. > > What, I tell you that your app is going to be accepted and we shouldn't > argue about it, and you want to argue about it? ;) > Oh awesome then :) I think I misread part of your original email. > --titus > -- > C. Titus Brown, ctb at msu.edu > -- -- Guilherme H. Polo Goncalves From benjamin at python.org Sat Apr 11 01:05:02 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 10 Apr 2009 18:05:02 -0500 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410203809.GA24530@idyll.org> References: <20090410203809.GA24530@idyll.org> Message-ID: <1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com> 2009/4/10 C. Titus Brown : > 2x "improve testing tools for py3k" -- variously focus on improving test > coverage and testing wrappers. > > One proposes to provide a nice wrapper to make nose and py.test > capable of running the regrtests, which (with no change to > regrtest) would let people run tests in parallel, distribute or > run tests across multiple machines (including Snakebite), tag > and run subsets of tests with personal and/or public tags, and > otherwise take advantage of many of the nice features of nose > and py.test. > > The other proposes to measure & increase the code coverage of > the py3k tests in both Python and C, integrate across multiple > machines, and otherwise provide a nice set of integrated reports > that anyone can generate on their own machines. This proposal, > in particular, could move smoothly towards the effort to produce > a "Python-wide" test suite for CPython/IronPython/PyPy/Jython. > (This wasn't integrated into the proposal because I only found > out about it after the proposals were due.) > > I personally think that both testing proposals are good, and > they grew out of conversations I had with Brett, who thinks that > the general ideas are good. So, err, I'm looking for pushback, > I guess ;). I can expand on these ideas a bit if people are > interested. > > Both proposals are medium at least, and I've personally been > positively impressed with the student interaction. To me, both of those proposals seem to say "measure and improve test coverage" or "nose integration" with a severe lack of specific details. Especially the nose plugin one seems like very little work. (Running default nose in the test directory in fact works fairly well.) Another small nit is that they should address Python 2.x, too. -- Regards, Benjamin From ctb at msu.edu Sat Apr 11 01:35:24 2009 From: ctb at msu.edu (C. Titus Brown) Date: Fri, 10 Apr 2009 16:35:24 -0700 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com> References: <20090410203809.GA24530@idyll.org> <1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com> Message-ID: <20090410233524.GA18347@idyll.org> On Fri, Apr 10, 2009 at 06:05:02PM -0500, Benjamin Peterson wrote: -> 2009/4/10 C. 
Titus Brown : -> > 2x "improve testing tools for py3k" -- variously focus on improving test -> > coverage and testing wrappers. -> > -> > One proposes to provide a nice wrapper to make nose and py.test -> > capable of running the regrtests, which (with no change to -> > regrtest) would let people run tests in parallel, distribute or -> > run tests across multiple machines (including Snakebite), tag -> > and run subsets of tests with personal and/or public tags, and -> > otherwise take advantage of many of the nice features of nose -> > and py.test. -> > -> > The other proposes to measure & increase the code coverage of -> > the py3k tests in both Python and C, integrate across multiple -> > machines, and otherwise provide a nice set of integrated reports -> > that anyone can generate on their own machines. This proposal, -> > in particular, could move smoothly towards the effort to produce -> > a "Python-wide" test suite for CPython/IronPython/PyPy/Jython. -> > (This wasn't integrated into the proposal because I only found -> > out about it after the proposals were due.) -> > -> > I personally think that both testing proposals are good, and -> > they grew out of conversations I had with Brett, who thinks that -> > the general ideas are good. So, err, I'm looking for pushback, -> > I guess ;). I can expand on these ideas a bit if people are -> > interested. -> > -> > Both proposals are medium at least, and I've personally been -> > positively impressed with the student interaction. -> -> To me, both of those proposals seem to say "measure and improve test -> coverage" or "nose integration" with a severe lack of specific details. -> Especially the nose plugin one seems like very little work. (Running -> default nose in the test directory in fact works fairly well.) ...fairly, yes ;). But not perfectly. And certainly not with equivalent guarantees to regrtest, which is really what Python developers need. Tracking down the corner cases, writing up examples, setting up tags, getting multiprocess to work properly, and making sure that coverage recording works properly, and then getting people to try it out on THEIR machines, is likely to be a lot of work. The plugin ecosystem for nose is growing daily and supporting that for core would be fantastic; extending it to py.test (whose plugin interface is now mostly compatible with nose) would be even better. The lack of detail on the code coverage is intentional, IMO. It's non-trivial to get a full handle on C code coverage integrated with Python code coverage -- or at least it has been for me -- so I supported the student focusing on first writing robust coverage analysis tools, and only then deciding what to "hit" with more tests. I will encourage the student to talk to this list (or the "tests" list in the stdlib sig) in order to target areas that are more relevant to people. I have had a hard time getting a good sense of what core code is well tested and what is not well tested, across various platforms. While Walter's C/Python integrated code coverage site is nice, it would be even nicer to have a way to generate all that information within any particular checkout on a real-time basis. Doing so in the context of Snakebite would be icing... and I think it's worth supporting in core, especially if it can be done without any changes *to* core. -> Another small nit is that they should address Python 2.x, too. I asked that they focus on EITHER 2.x or 3.x, since "too broad" is an equally valid criticism. Certainly 3.x is the future so I thought focusing on increasing code coverage, and especially C code coverage, could best be applied to 3.x. cheers, --titus -- C. 
Titus Brown, ctb at msu.edu From jackdied at gmail.com Sat Apr 11 01:53:56 2009 From: jackdied at gmail.com (Jack diederich) Date: Fri, 10 Apr 2009 19:53:56 -0400 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410203809.GA24530@idyll.org> References: <20090410203809.GA24530@idyll.org> Message-ID: On Fri, Apr 10, 2009 at 4:38 PM, C. Titus Brown wrote: [megasnip] > roundup VCS integration / build tools to support core development -- > a single student proposed both of these and has received some > support. See http://slexy.org/view/s2pFgWxufI for details. From the listed webpage I have no idea what he is promising (a combination of very high level and very low level tasks). If he is offering all the same magic for Hg that Trac does for SVN (autolinking "r2001" text to patches, for example) then I'm +1. That should be cake even for a student project. He says vague things about patches too, but I'm not sure what. If he wanted to make that into a 'patchbot' that just applied every patch in isolation and ran 'make && make test' and posted results in the tracker I'd be a happy camper. But maybe those are goals for next year, because I'm not quite sure what the proposal is. 
-Jack From greg.ewing at canterbury.ac.nz Sat Apr 11 02:41:14 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Apr 2009 12:41:14 +1200 Subject: [Python-Dev] Lazy importing (was Rethinking intern() and its data structure) In-Reply-To: <49DF08D8.9080806@gmail.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> <49DF08D8.9080806@gmail.com> Message-ID: <49DFE72A.5010905@canterbury.ac.nz> Nick Coghlan wrote: > I sometimes wish for a nice, solid lazy > module import mechanism that manages to avoid the potential deadlock > problems created by using import statements inside functions. I created an ad-hoc one of these for PyGUI recently. I can send you the code if you're interested. I didn't have any problems with deadlocks, but I did find one rather annoying problem. It seems that an exception occurring at certain times during the import process gets swallowed and turned into a generic ImportError. I had to resort to catching exceptions and printing my own traceback in order to diagnose missing auto-imported names. -- Greg From greg.ewing at canterbury.ac.nz Sat Apr 11 02:51:29 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Apr 2009 12:51:29 +1200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49DF05FC.9040208@gmail.com> <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> Message-ID: <49DFE991.8090605@canterbury.ac.nz> Paul Moore wrote: > 3. 
Encoding > > JSON text SHALL be encoded in Unicode. The default encoding is > UTF-8. > > This is at best confused (in my utterly non-expert opinion :-)) as > Unicode isn't an encoding... I'm inclined to agree. I'd go further and say that if JSON is really meant to be a text format, the standard has no business mentioning encodings at all. The reason you use a text format in the first place is that you have some way of transmitting text, and you want to send something that isn't text. In that situation, the encoding is already determined by whatever means you're using to send the text. -- Greg From brendan at kublai.com Sat Apr 11 02:52:01 2009 From: brendan at kublai.com (Brendan Cully) Date: Fri, 10 Apr 2009 17:52:01 -0700 Subject: [Python-Dev] Rethinking intern() and its data structure In-Reply-To: <20090410190248.913033A4063@sparrow.telecommunity.com> References: <49DE0DF6.1040900@arbash-meinel.com> <49DE2AD4.6090605@cheimes.de> <49DE31C9.103@gmail.com> <49DE9F42.5000704@canterbury.ac.nz> <49DE9FB4.9060908@gmail.com> <43aa6ff70904092107n116fc719g592158570db4d4eb@mail.gmail.com> <49DF08D8.9080806@gmail.com> <20090410190248.913033A4063@sparrow.telecommunity.com> Message-ID: <20090411005201.GD7706@kremvax.cs.ubc.ca> On Friday, 10 April 2009 at 15:05, P.J. Eby wrote: > At 06:52 PM 4/10/2009 +1000, Nick Coghlan wrote: >> This problem (slow application startup times due to too many imports at >> startup, which in turn can be due to top level imports for library >> or framework functionality that a given application doesn't actually >> use) is actually the main reason I sometimes wish for a nice, solid lazy >> module import mechanism that manages to avoid the potential deadlock >> problems created by using import statements inside functions. I'd love to see that too. I imagine it would be beneficial for many python applications. > Have you tried http://pypi.python.org/pypi/Importing ? 
Or more > specifically, > http://peak.telecommunity.com/DevCenter/Importing#lazy-imports ? Here's what we do in Mercurial, which is a little more user-friendly, but possibly too magical for general use (but provides us a very nice speedup): http://www.selenic.com/repo/index.cgi/hg/file/tip/mercurial/demandimport.py#l1 It's nice and small, and it is invisible to the rest of the code, but it's probably too aggressive for all users. The biggest problem is probably that ImportErrors are deferred until first access, which trips up modules that do things like try: import foo except ImportError import fallback as foo of which there are a few. The mercurial module maintains a blacklist as a bandaid, but it'd be great to have a real fix. From guido at python.org Sat Apr 11 04:11:44 2009 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Apr 2009 19:11:44 -0700 Subject: [Python-Dev] [Email-SIG] the email module, text, and bytes (was Re: Dropping bytes "support" in json) In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <1F3DC671-746B-425C-A847-4F6CB0DB9FD0@python.org> <20090410031151.12555.724184150.divmod.xquotient.7482@weber.divmod.com> <92023.1239381344@parc.com> <87ocv4fhv0.fsf@xemacs.org> Message-ID: On Fri, Apr 10, 2009 at 12:04 PM, Barry Warsaw wrote: > On Apr 10, 2009, at 3:06 PM, Stephen J. Turnbull wrote: > >> Bill Janssen writes: >>> >>> Barry Warsaw wrote: >>> >>>> In that case, we really need the >>>> bytes-in-bytes-out-bytes-in-the-chewy- >>>> center API first, and build things on top of that. >>> >>> Yep. >> >> Uh, I hate to rain on a parade, but isn't that how we arrived at the >> *current* email package? > > Not really. ?We got here because we were too damn sloppy about > the distinction. Agreed. 
I take full responsibility -- the str/unicode approach we introduced in 2.0 seemed like the best thing we could do at the time, but in retrospect it would've been better if we'd left str alone and introduced a unicode type that was truly distinct -- like str in 3.0. The email package is not the only system that ended up with a muddled distinction between the two as a result. > I'm going to remove python-dev from subsequent follow ups. Please join us > at email-sig for further discussion. > >Barry -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Apr 11 04:16:35 2009 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Apr 2009 19:16:35 -0700 Subject: [Python-Dev] Going off-line for a week Message-ID: Folks, I'm going off-line for a week to enjoy a family vacation. When I come back I'll probably just archive most email unread, so now's your chance to add braces to the language. :-) Not-yet-retiring-ly y'rs, -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Sat Apr 11 05:06:25 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Apr 2009 05:06:25 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> Message-ID: <49E00931.6050107@v.loewis.de> >> In email's case this is true, but in JSON's case it's not. JSON is a >> format defined as a sequence of code points; MIME is defined as a >> sequence of octets. > > What is the 'bytes support' issue for json? Is it about content within > a json text? Or about the transport format of a json text? 
The question is whether the json parsing should take bytes or str as input, and whether the json marshalling should produce bytes or str. More specifically, the question is whether it is ok to drop bytes. I personally think that it needs to support bytes, and that perhaps str support is optional (as you could always explicitly encode the str as UTF-8 before passing it to the JSON parser, if you somehow managed to get a str of JSON to parse). However, I really think that this question cannot be answered by reading the RFC. It should be answered by verifying how people use the json library in 2.x. > The standard does not specify any correspondence between representations > and domain objects And that is not the issue at all; nobody is debating what output the parsing should produce. Regards, Martin From thiagoharry at riseup.net Sat Apr 11 03:58:26 2009 From: thiagoharry at riseup.net (Harry (Thiago Leucz Astrizi)) Date: Fri, 10 Apr 2009 22:58:26 -0300 (BRT) Subject: [Python-Dev] Needing help to change the grammar Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello everybody. My name is Thiago and currently I'm working as a teacher in a high school in Brazil. I have plans to offer in the school a programming course to the students, but I had some problems to find a good language. As a Python programmer, I really like the language's syntax and I think that Python is very good to teach programming. But there's a little problem: the commands and keywords are in english and this can be an obstacle to the teenagers that could enter in the course. Because of this, I decided to create a Python version with keywords in portuguese and with some modifications in the grammar to be more portuguese-like. To this, I'm using Python 3.0.1 source code. I already read PEP 306 (How to Change Python's Grammar) and changed the suggested files. My changes currently are working properly except for one thing: the "comp_op". 
The code that in english Python is written as "is not", in portuguese Python shall be "não é". Besides the translations to the words "is" and "not", I'm also changing the order in which they appear, letting "not" before "is". It appears to be a simple change, but strangely, I'm not being able to perform it. I already made correct modifications in the Grammar/Grammar file, the new keywords already appear in Lib/keyword.py and I also changed the function validate_comp_op in Modules/parsermodule.c:

static int
validate_comp_op(node *tree)
{
    (...)
    else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
        res = (validate_ntype(CHILD(tree, 0), NAME)
               && validate_ntype(CHILD(tree, 1), NAME)
               && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                    && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
                   || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
                       && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
        if (!res && !PyErr_Occurred())
            err_string("operador de comparação desconhecido");
    }
    return (res);
}

I also looked in the other files proposed in the PEP but I didn't find anything in them that I recognized as needing changes. But when I type "make" to compile the new language, the following error appears in Lib/encodings/__init__.py (which I already translated to the portuguese Python):

harry at skynet:~/Python-3.0.1$ make
Fatal Python error: Py_Initialize: can't initialize sys standard streams
  File "/home/harry/Python-3.0.1/Lib/encodings/__init__.py", line 73
    se entry não é _unknown:
    ^
SyntaxError: invalid syntax

The comp_op doesn't work! I don't know what more to change. Perhaps there's some file that I should modify, but I didn't pay enough attention to it... Please, does anybody have some idea of what I should do? Thanks a lot. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFJ3/eTmNGEzq1zP84RAh5vAJ492eVFgbR5KCCJNdTJOIR/Xtfb0ACdE0NG Yxnxmo9yjOL6H8J93nPBcJs= =6VLu -----END PGP SIGNATURE----- From skippy.hammond at gmail.com Sat Apr 11 06:36:00 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Sat, 11 Apr 2009 14:36:00 +1000 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49E00931.6050107@v.loewis.de> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> Message-ID: <49E01E30.8060302@gmail.com> [Dropping email sig] On 11/04/2009 1:06 PM, "Martin v. Löwis" wrote: > However, I really think that this question cannot be answered by > reading the RFC. It should be answered by verifying how people use > the json library in 2.x. In the absence of anything more formal, here are 2 anecdotes: * The python-twitter package seems to: - Use dumps() mainly to get string objects. It uses it both for __str__, and for an API called 'AsJsonString' - the intent of this seems to be to provide strings for the consumer of the twitter API - it's not clear how such consumers would use them. Note that this API doesn't seem to need to 'write' json objects, else I suspect they would then be expecting dumps to return bytes to put on the wire. They expect loads to accept the bytes they are reading directly off the wire. * couchdb's wrappers use these functions purely as bytes - they are either decoding an application/json object from the bits they read, or they are encoding it to use directly in the body of a request (or even directly in the URL of the request!) I find myself conflicted. On one hand I believe the most common use of json will be to exchange data with something inherently byte-based. 
On the other hand though, json itself seems to be naturally "stringy" and the most natural interface for a casual user would be strings.

I'm personally leaning slightly towards strings, putting the burden on bytes-users of json to explicitly use the appropriate encoding, even in cases where it *must* be utf8. On the other hand, I'm too lazy to dig back through this large thread, but I seem to recall a suggestion that using bytes would be significantly faster. If that is true, I'd be happy to settle for bytes as I believe the most common *actual* use of json will be via things like the twitter and couch libraries - and may even be a key bottleneck for such libraries - so people will not be directly exposed to its interface...

Mark

Cheers,
Mark

From martin at v.loewis.de Sat Apr 11 07:45:49 2009
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Apr 2009 07:45:49 +0200
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To:
References:
Message-ID: <49E02E8D.5090005@v.loewis.de>

> It appears to be a simple change, but strangely, I'm not being able to
> perform it. I already made correct modifications in Grammar/Grammar
> file, the new keywords already appear in Lib/keyword.py and I also
> changed the function validate_comp_op in Modules/parsermodule.c:
>
> static int
> validate_comp_op(node *tree)
> {
>     (...)
>     else if ((res = validate_numnodes(tree, 2, "comp_op")) != 0) {
>         res = (validate_ntype(CHILD(tree, 0), NAME)
>                && validate_ntype(CHILD(tree, 1), NAME)
>                && (((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                     && (strcmp(STR(CHILD(tree, 1)), "é") == 0))
>                    || ((strcmp(STR(CHILD(tree, 0)), "não") == 0)
>                        && (strcmp(STR(CHILD(tree, 1)), "em") == 0))));
>         if (!res && !PyErr_Occurred())
>             err_string("operador de comparação desconhecido");
>     }
>     return (res);
> }

Notice that Python source is represented in UTF-8 in the parser. It might be that the C source code has a different encoding, which would cause the strcmp to fail.
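Martin's diagnosis — the parser compares UTF-8 bytes, so keyword literals in a C file saved in another encoding can never match — can be checked from plain Python. A short sketch (the byte values shown are properties of the encodings themselves, not taken from Harry's source tree):

```python
# The tokenizer hands the parser source text as UTF-8 bytes, so the
# keyword strings compiled into parsermodule.c must be UTF-8 too.
utf8_bytes = "não".encode("utf-8")      # what the parser will compare
latin1_bytes = "não".encode("latin-1")  # what a Latin-1-saved C file contains

print(utf8_bytes)    # b'n\xc3\xa3o'
print(latin1_bytes)  # b'n\xe3o'

# Different byte sequences, so a strcmp() between them can never succeed,
# and validate_comp_op() rejects the new operator.
print(utf8_bytes == latin1_bytes)  # False
```

Saving Modules/parsermodule.c as UTF-8 (or spelling the keywords with explicit `"n\xc3\xa3o"` byte escapes in the C literals) makes the comparison byte-for-byte identical.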
Regards, Martin From martin at v.loewis.de Sat Apr 11 07:49:50 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Apr 2009 07:49:50 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49E01E30.8060302@gmail.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E01E30.8060302@gmail.com> Message-ID: <49E02F7E.6010605@v.loewis.de> > I'm personally leaning slightly towards strings, putting the burden on > bytes-users of json to explicitly use the appropriate encoding, even in > cases where it *must* be utf8. On the other hand, I'm too lazy to dig > back through this large thread, but I seem to recall a suggestion that > using bytes would be significantly faster. Not sure whether it would be *significantly* faster, but yes, Bob wrote an accelerator for parsing out of a byte string to make it really fast; IIRC, he claims that it is faster than pickling. Regards, Martin From martin at v.loewis.de Sat Apr 11 08:13:35 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Apr 2009 08:13:35 +0200 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410203809.GA24530@idyll.org> References: <20090410203809.GA24530@idyll.org> Message-ID: <49E0350F.8040506@v.loewis.de> > 2x "keyring package" -- see > http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/. > The poorer one of these will probably be axed unless Tarek gives it > strong support. I don't think these are good "core" projects. Even if the students come up with a complete solution, it shouldn't be integrated with the standard library right away. 
Instead, it should have a life outside the standard library, and be considered for inclusion only if the user community wants it.

I'm also skeptical that this is a good SoC project in the first place. Coming up with a wrapper for, say, Apple Keychain, could be a good project. Coming up with a unifying API for all keychains is out of scope, IMO; various past attempts at unifying APIs have demonstrated that creating them is difficult, and might require writing a PEP (whose acceptance then might not happen within a summer).

Regards,
Martin

From jackdied at gmail.com Sat Apr 11 08:20:24 2009
From: jackdied at gmail.com (Jack diederich)
Date: Sat, 11 Apr 2009 02:20:24 -0400
Subject: [Python-Dev] Needing help to change the grammar
In-Reply-To:
References:
Message-ID:

On Fri, Apr 10, 2009 at 9:58 PM, Harry (Thiago Leucz Astrizi) wrote:
>
> Hello everybody. My name is Thiago and currently I'm working as a
> teacher in a high school in Brazil. I have plans to offer in the
> school a programming course to the students, but I had some problems
> to find a good language. As a Python programmer, I really like the
> language's syntax and I think that Python is very good to teach
> programming. But there's a little problem: the commands and keywords
> are in english and this can be an obstacle to the teenagers that could
> enter in the course.
>
> Because of this, I decided to create a Python version with keywords in
> portuguese and with some modifications in the grammar to be more
> portuguese-like. To this, I'm using Python 3.0.1 source code.

I love the idea (and I most recently edited PEP 306), so here are a few suggestions: Brazil has many Python programmers, so you might be able to make quick progress by asking them for volunteer time. To bug-hunt your technical problem: try switching the "not is" operator to include an underscore, "not_is." The Python LL(1) grammar checker works for Python but isn't robust, and does miss some grammar ambiguities.
Making the operator a single word might reveal a bug in the parser. Please consider switching your students to 'real' python part way through the course. If they want to use the vast amount of python code on the internet as examples they will need to know the few English keywords. Also - most python core developers are not native English speakers and do OK :) PyCon speakers are about 25% non-native English speakers and EuroPython speakers are about the reverse (my rough estimate - I'd love to see some hard numbers). Keep up the Good Work, -Jack From ncoghlan at gmail.com Sat Apr 11 09:09:33 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Apr 2009 17:09:33 +1000 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org> Message-ID: <49E0422D.10704@gmail.com> Barry Warsaw wrote: >> Of course, one could use message.header and message.bythdr and they'd >> be the same length. > > I was trying to figure out what a 'thdr' was that we'd want to index > 'by' it. :) In the discussions about os.environ, the suggested approach was to just tack a 'b' onto the end of the name to get the bytes version (i.e. os.environb). That aligns nicely with the b"" prefix for bytes literals, and isn't much of a typing or reading burden when dealing with the bytes API instead of the text one. A similar naming scheme (i.e. msg.headers and msg.headersb) would probably work for email as well. Cheers, Nick. 
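The trailing-'b' convention Nick describes can be sketched in a few lines. Everything below is a hypothetical illustration — these are not the actual os.environb or email package APIs, just the shape of the paired text/bytes view idea:

```python
# Toy sketch of the paired text/bytes view: one mapping stores text,
# a parallel "b" view exposes the same entries encoded as bytes.
class Headers:
    def __init__(self, encoding="utf-8"):
        self._data = {}          # header name -> str value
        self.encoding = encoding

    def __setitem__(self, name, value):
        self._data[name] = value

    def __getitem__(self, name):
        return self._data[name]

class HeadersB:
    """Bytes view over a Headers instance (the trailing-'b' convention)."""
    def __init__(self, headers):
        self._headers = headers

    def __getitem__(self, name):
        # Encode on access, so both views always agree.
        return self._headers[name].encode(self._headers.encoding)

h = Headers()
h["Subject"] = "Some text"
hb = HeadersB(h)
print(h["Subject"])   # Some text
print(hb["Subject"])  # b'Some text'
```

The point of the view, as opposed to storing bytes twice, is that there is a single source of truth: setting through the text mapping is immediately visible, correctly encoded, through the bytes view.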
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net Sat Apr 11 10:12:23 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 11 Apr 2009 08:12:23 +0000 (UTC)
Subject: [Python-Dev] Dropping bytes "support" in json
References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E01E30.8060302@gmail.com> <49E02F7E.6010605@v.loewis.de>
Message-ID:

Martin v. Löwis <martin at v.loewis.de> writes:
>
> Not sure whether it would be *significantly* faster, but yes, Bob wrote
> an accelerator for parsing out of a byte string to make it really fast;
> IIRC, he claims that it is faster than pickling.

Isn't premature optimization the root of all evil? Besides, the fact that many values in a typical JSON object will be strings, and must be encoded from/decoded to unicode objects in py3k, suggests that accepting/outputting unicode as default is the laziest (i.e. the best) choice performance-wise. But you don't have to trust me: look at the quick numbers I've posted. The py3k version (in the str-only incarnation I've proposed) is sometimes actually faster than the trunk version:

http://mail.python.org/pipermail/python-dev/2009-April/088498.html

Regards

Antoine.

From stephen at xemacs.org Sat Apr 11 10:35:01 2009
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sat, 11 Apr 2009 17:35:01 +0900 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49DFE991.8090605@canterbury.ac.nz> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49DF05FC.9040208@gmail.com> <79990c6b0904100453g41c662fbs25d272d5372b5f47@mail.gmail.com> <49DFE991.8090605@canterbury.ac.nz> Message-ID: <87myantwp6.fsf@xemacs.org> Greg Ewing writes: > The reason you use a text format in the first place is that > you have some way of transmitting text, and you want to > send something that isn't text. In that situation, the > encoding is already determined by whatever means you're > using to send the text. Determined, yes, but all too often in a nondeterministic way. That's precisely the problem that the spec is trying to avert. People often schlep "text" around as if that were well-defined, forcing receivers to guess what is meant. Having a spec isn't going to stop them, but at least you can lash them with a wet noodle. The specification of at least the abstract character repertoire and coded character set also allows implementers like Python to proceed confidently with their usual internal encoding. From chris at simplistix.co.uk Sat Apr 11 11:17:27 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 11 Apr 2009 10:17:27 +0100 Subject: [Python-Dev] How do I update http://www.python.org/dev/faq? Message-ID: <49E06027.60409@simplistix.co.uk> Hi All, How do I update the faq on the website? This section: http://python.org/dev/faq/#how-to-test-a-patch ...could do with fleshing out from this discussion: http://mail.python.org/pipermail/python-dev/2009-March/086771.html ...and the link to: http://www.python.org/doc/lib/module-test.html ...still ends up at the 2.5.2 docs. 
cheers,
Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk

From chris at simplistix.co.uk Sat Apr 11 12:12:31 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 11:12:31 +0100
Subject: [Python-Dev] Test failures on Python 2.7 (trunk)
Message-ID: <49E06D0F.2080905@simplistix.co.uk>

Hi All,

Got these when running from checkout on Mac OS:

Could not find '/Users/chris/py2k/Lib/test' in sys.path to remove it
...
test test_asynchat produced unexpected output:
**********************************************************************
error: uncaptured python exception, closing channel
(<class 'socket.error'>:[Errno 9] Bad file descriptor
[/Users/chris/py2k/Lib/asyncore.py|readwrite|107]
[/Users/chris/py2k/Lib/asyncore.py|handle_expt_event|441]
[|getsockopt|1]
[/Users/chris/py2k/Lib/socket.py|_dummy|165])
...(lots of repeats of the above)
**********************************************************************
test_asyncore
test test_asyncore failed -- Traceback (most recent call last):
  File "/Users/chris/py2k/Lib/test/test_asyncore.py", line 144, in test_readwrite
    self.assertEqual(tobj.read, True)
AssertionError: False != True
...
test test_macostools failed -- Traceback (most recent call last):
  File "/Users/chris/py2k/Lib/test/test_macostools.py", line 90, in test_mkalias_relative
    macostools.mkalias(test_support.TESTFN, TESTFN2, sys.prefix)
  File "/Users/chris/py2k/Lib/plat-mac/macostools.py", line 40, in mkalias
    relativefsr = File.FSRef(relative)
Error: (-35, 'no such volume')

Should I expect these? If so, why?
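For readers puzzling over the asyncore traceback above: asyncore's readwrite() turns poll flags into calls on the dispatcher's handler methods, roughly as sketched below. This is a simplified illustration, not the exact stdlib code — the real function has more branches (POLLHUP/POLLERR/POLLNVAL handling among them), and the exact flag handling varied between releases, which is what the failing test_readwrite exercises:

```python
import select

def readwrite_sketch(obj, flags):
    # Simplified version of asyncore.readwrite()'s flag dispatch.
    if flags & select.POLLIN:
        obj.handle_read_event()
    if flags & select.POLLOUT:
        obj.handle_write_event()
    if flags & select.POLLPRI:
        obj.handle_expt_event()

class Recorder:
    """Stands in for an asyncore.dispatcher; just records which handlers ran."""
    def __init__(self):
        self.calls = []
    def handle_read_event(self):
        self.calls.append("read")
    def handle_write_event(self):
        self.calls.append("write")
    def handle_expt_event(self):
        self.calls.append("expt")

r = Recorder()
readwrite_sketch(r, select.POLLIN | select.POLLOUT)
print(r.calls)  # ['read', 'write']
```

If the loop passes a flags value the object's handlers don't expect (or the test asserts on state the new flag propagation no longer sets), you get exactly the kind of mismatch shown in the traceback.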
cheers, Chris From chris at simplistix.co.uk Sat Apr 11 12:14:32 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 11 Apr 2009 11:14:32 +0100 Subject: [Python-Dev] Test failure on Py3k branch Message-ID: <49E06D88.6060507@simplistix.co.uk> Hi All, Also got the following failure from a py3k checkout: test test_cmd_line failed -- Traceback (most recent call last): File "/Users/chris/py3k/Lib/test/test_cmd_line.py", line 143, in test_run_code 0) AssertionError: 1 != 0 Should I expect this or does someone owe beer? ;-) Chris From mario.danic at gmail.com Sat Apr 11 12:21:18 2009 From: mario.danic at gmail.com (Mario) Date: Sat, 11 Apr 2009 12:21:18 +0200 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: References: <20090410203809.GA24530@idyll.org> Message-ID: <79957db20904110321n58e50f3o4d8ede6ffc97070c@mail.gmail.com> > > > He says vague things about patches too, but I'm not sure what. If he > wanted to make that into a 'patchbot' that just applied every patch in > isolation and ran 'make && make test' and posted results in the > tracker I'd be a happy camper. > > Jack, how about you write that idea down on the wiki page mentioned in the proposal, along with the use case? Following that, I'll see if I can do anything about it to make it a reality. Cheers, M. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dickinsm at gmail.com Sat Apr 11 12:39:08 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sat, 11 Apr 2009 11:39:08 +0100 Subject: [Python-Dev] Test failure on Py3k branch In-Reply-To: <49E06D88.6060507@simplistix.co.uk> References: <49E06D88.6060507@simplistix.co.uk> Message-ID: <5c6f2a5d0904110339l1f614e0hfaeb0f253c8eede@mail.gmail.com> On Sat, Apr 11, 2009 at 11:14 AM, Chris Withers wrote: > Also got the following failure from a py3k checkout: > > test test_cmd_line failed -- Traceback (most recent call last): > ?File "/Users/chris/py3k/Lib/test/test_cmd_line.py", line 143, in > test_run_code > ? ?0) > AssertionError: 1 != 0 Are you on OS X? This looks like http://bugs.python.org/issue4388 Mark From chris at simplistix.co.uk Sat Apr 11 12:41:19 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 11 Apr 2009 11:41:19 +0100 Subject: [Python-Dev] Test failure on Py3k branch In-Reply-To: <5c6f2a5d0904110339l1f614e0hfaeb0f253c8eede@mail.gmail.com> References: <49E06D88.6060507@simplistix.co.uk> <5c6f2a5d0904110339l1f614e0hfaeb0f253c8eede@mail.gmail.com> Message-ID: <49E073CF.5080702@simplistix.co.uk> Mark Dickinson wrote: > On Sat, Apr 11, 2009 at 11:14 AM, Chris Withers wrote: >> Also got the following failure from a py3k checkout: >> >> test test_cmd_line failed -- Traceback (most recent call last): >> File "/Users/chris/py3k/Lib/test/test_cmd_line.py", line 143, in >> test_run_code >> 0) >> AssertionError: 1 != 0 > > Are you on OS X? This looks like > > http://bugs.python.org/issue4388 Yup, that looks like it. 
cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Sat Apr 11 12:41:59 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 11 Apr 2009 11:41:59 +0100 Subject: [Python-Dev] issue5578 - explanation In-Reply-To: References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> <49D52C5B.7010506@simplistix.co.uk> <49D63465.80401@simplistix.co.uk> Message-ID: <49E073F7.9060309@simplistix.co.uk> Steve Holden wrote: >> Anything using an exec > > that can be done in some other (more pythonic way) There's *always* another way ;-) >> is broken by definition ;-) >> >> Benjamin? >> > We've just had a fairly clear demonstration that small semantic changes > to the language can leave unexpected areas borked. Oh? I don't follow... Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From ncoghlan at gmail.com Sat Apr 11 13:10:40 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Apr 2009 21:10:40 +1000 Subject: [Python-Dev] Test failures on Python 2.7 (trunk) In-Reply-To: <49E06D0F.2080905@simplistix.co.uk> References: <49E06D0F.2080905@simplistix.co.uk> Message-ID: <49E07AB0.6020301@gmail.com> Chris Withers wrote: > Hi All, > > Got these when running from checkout on Mac OS: > > Could not find '/Users/chris/py2k/Lib/test' in sys.path to remove it > ... 
> test test_asynchat produced unexpected output:
> **********************************************************************
> error: uncaptured python exception, closing channel
> (<class 'socket.error'>:[Errno 9] Bad file descriptor
> [/Users/chris/py2k/Lib/asyncore.py|readwrite|107]
> [/Users/chris/py2k/Lib/asyncore.py|handle_expt_event|441]
> [|getsockopt|1]
> [/Users/chris/py2k/Lib/socket.py|_dummy|165])
> ...(lots of repeats of the above)
> **********************************************************************
> test_asyncore
> test test_asyncore failed -- Traceback (most recent call last):
>   File "/Users/chris/py2k/Lib/test/test_asyncore.py", line 144, in test_readwrite
>     self.assertEqual(tobj.read, True)
> AssertionError: False != True

I'm getting the asyncore failure on Linux as well (no unexpected output though - just the final exception).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From chris at simplistix.co.uk Sat Apr 11 13:23:18 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 12:23:18 +0100
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49D9FD15.9030406@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D35A39.7020507@simplistix.co.uk> <49D52B2C.5050509@simplistix.co.uk> <49D52C5B.7010506@simplistix.co.uk> <49D63465.80401@simplistix.co.uk> <1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com> <49D9FD15.9030406@simplistix.co.uk>
Message-ID: <49E07DA6.2010100@simplistix.co.uk>

Chris Withers wrote:
> Benjamin Peterson wrote:
>>>>> Assuming it breaks no tests, would there be objection to me committing
>>>>> the
>>>>> above change to the Python 3 trunk?
>>>> That's up to Benjamin. Personally, I live by "if it ain't broke, don't
>>>> fix it." :-)
>>> Anything using an exec is broken by definition ;-)
>>
>> "practicality beats purity"
>>
>>> Benjamin?
>>
>> +0
>> >> +0 > > OK, well, I'll use it as my first "test commit" when I get a chance :-) Actually, this was gone on the py3k branch already. I've committed the fix to trunk, is there anything else I need to do? cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From dickinsm at gmail.com Sat Apr 11 14:20:28 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sat, 11 Apr 2009 13:20:28 +0100 Subject: [Python-Dev] Python 2.6.2 final In-Reply-To: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org> References: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org> Message-ID: <5c6f2a5d0904110520o2ea97af9t4cd18a168db795d5@mail.gmail.com> On Fri, Apr 10, 2009 at 2:31 PM, Barry Warsaw wrote: > bugs.python.org is apparently down right now, but I set issue 5724 to > release blocker for 2.6.2. ?This is waiting for input from Mark Dickinson, > and it relates to test_cmath failing on Solaris 10. I'd prefer to leave this alone for 2.6.2. There's a fix posted to the issue tracker, but it's not entirely trivial and I think the risk of accidental breakage outweighs the niceness of seeing 'all tests passed' on Solaris. Mark From chris at simplistix.co.uk Sat Apr 11 14:33:33 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 11 Apr 2009 13:33:33 +0100 Subject: [Python-Dev] email header encoding In-Reply-To: <87myaofh5q.fsf@xemacs.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <1239382031.8682.11.camel@haku> <87myaofh5q.fsf@xemacs.org> Message-ID: <49E08E1D.6070207@simplistix.co.uk> Stephen J. 
Turnbull wrote:
> Robert Brewer writes:
>
>  > Syntactically, there's no sense in providing:
>  >
>  > Message.set_header('Subject', 'Some text', encoding='utf-16')
>  >
>  > ...since you could more clearly write the same as:
>  >
>  > Message.set_header('Subject', 'Some text'.encode('utf-16'))
>
> Which you now must *parse* and guess the encoding to determine how to
> RFC-2047-encode the binary mush. I think the encoding parameter is
> necessary here.

Indeed.

>  > But it would be far easier to do all the encoding at once in an
>  > output() or serialize() method. Do different headers need different
>  > encodings?
>
> You can have multiple encodings within a single header (and a naïve

"can" and "should" are two very different things. When is it even a good idea to have more than one encoding in a single header?

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk

From chris at simplistix.co.uk Sat Apr 11 14:39:40 2009
From: chris at simplistix.co.uk (Chris Withers)
Date: Sat, 11 Apr 2009 13:39:40 +0100
Subject: [Python-Dev] headers api for email package
In-Reply-To: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org>
Message-ID: <49E08F8C.5030205@simplistix.co.uk>

Barry Warsaw wrote:
> >>> message['Subject']
>
> The raw bytes or the decoded unicode?

A header object.
> > >>> Message.set_header('Subject', 'Some text', encoding='utf-8') > >>> Message.set_header('Subject', b'Some bytes') Where you just want "a damned valid email and stop making my life hard!": Message['Subject']='Some text' Where you care about what encoding is used: Message['Subject']=Header('Some text',encoding='utf-8') If you have bytes, for whatever reason: Message['Subject']=b'some bytes'.decode('utf-8') ...because only you know what encoding those bytes use! > One of those maps to > > >>> message['Subject'] = ??? ...should only accept text or a Header object. Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Sat Apr 11 14:41:46 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 11 Apr 2009 13:41:46 +0100 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49E0422D.10704@gmail.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org> <49E0422D.10704@gmail.com> Message-ID: <49E0900A.3000302@simplistix.co.uk> Nick Coghlan wrote: > Barry Warsaw wrote: >>> Of course, one could use message.header and message.bythdr and they'd >>> be the same length. >> I was trying to figure out what a 'thdr' was that we'd want to index >> 'by' it. :) > > In the discussions about os.environ, the suggested approach was to just > tack a 'b' onto the end of the name to get the bytes version (i.e. > os.environb). > > That aligns nicely with the b"" prefix for bytes literals, and isn't > much of a typing or reading burden when dealing with the bytes API > instead of the text one. > > A similar naming scheme (i.e. msg.headers and msg.headersb) would > probably work for email as well. 
That just feels nasty though :-( Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Sat Apr 11 14:46:18 2009 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 11 Apr 2009 13:46:18 +0100 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> Message-ID: <49E0911A.9040809@simplistix.co.uk> glyph at divmod.com wrote: > > My preference would be that > > message.headers['Subject'] = b'Some Bytes' > > would simply raise an exception. If you've got some bytes, you should > instead do > > message.bytes_headers['Subject'] = b'Some Bytes' Remind me again why you need to differentiate between headers and bytes_headers? I think bytes headers are evil. If you don't know the encoding when you have one, who does or ever will? > message.headers['Subject'] = Header(bytes=b'Some Bytes', > encoding='utf-8') > > Explicit is better than implicit, right? Indeed, and the case for the above would be to keep indempotence of incoming messages in applications like mailman... ...otherwise we could just decode them and be done with it. cheers, Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk From aahz at pythoncraft.com Sat Apr 11 15:01:04 2009 From: aahz at pythoncraft.com (Aahz) Date: Sat, 11 Apr 2009 06:01:04 -0700 Subject: [Python-Dev] How do I update http://www.python.org/dev/faq? In-Reply-To: <49E06027.60409@simplistix.co.uk> References: <49E06027.60409@simplistix.co.uk> Message-ID: <20090411130104.GB15750@panix.com> On Sat, Apr 11, 2009, Chris Withers wrote: > > How do I update the faq on the website? 
Brett Cannon has been the primary maintainer, but he's offline for a while; are you interested in picking up the task? If yes, please subscribe to pydotorg at python.org and then send in your SSH key to request commit access to the website. Otherwise, please send your suggested updates to pydotorg. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From rdmurray at bitdance.com Sat Apr 11 15:14:30 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Sat, 11 Apr 2009 09:14:30 -0400 (EDT) Subject: [Python-Dev] Test failures on Python 2.7 (trunk) In-Reply-To: <49E07AB0.6020301@gmail.com> References: <49E06D0F.2080905@simplistix.co.uk> <49E07AB0.6020301@gmail.com> Message-ID: On Sat, 11 Apr 2009 at 21:10, Nick Coghlan wrote: > Chris Withers wrote: >> Hi All, >> >> Got these when running from checkout on Mac OS: >> >> Could not find '/Users/chris/py2k/Lib/test' in sys.path to remove it >> ... >> test test_asynchat produced unexpected output: >> ********************************************************************** >> error: uncaptured python exception, closing channel >> (> 'socket.error'>:[Errno 9] Bad file descriptor >> [/Users/chris/py2k/Lib/asyncore.py|readwrite|107] >> [/Users/chris/py2k/Lib/asyncore.py|handle_expt_event|441] >> [|getsockopt|1] [/Users/chris/py2k/Lib/socket.py|_dummy|165]) >> ...(lots of repeats of the above) >> ********************************************************************** >> test_asyncore >> test test_asyncore failed -- Traceback (most recent call last): >> File "/Users/chris/py2k/Lib/test/test_asyncore.py", line 144, in >> test_readwrite >> self.assertEqual(tobj.read, True) >> AssertionError: False != True > > I'm getting the asyncore failure on Linux as well (no unexpected output > though - just the final exception). Ditto. I looked at that asyncore traceback yesterday. 
The way that the flags argument to the readwrite call is propagated to the object was changed, but the tests were not updated to match. I haven't yet gotten as far as figuring out why the changes were made, but svn blames josiah.carlson for the changes (or at least the most recent ones).

--David

From benjamin at python.org Sat Apr 11 15:21:23 2009
From: benjamin at python.org (Benjamin Peterson)
Date: Sat, 11 Apr 2009 08:21:23 -0500
Subject: [Python-Dev] issue5578 - explanation
In-Reply-To: <49E07DA6.2010100@simplistix.co.uk>
References: <693bc9ab0903312015l78d542a2qd07ae9e6fdfeb84@mail.gmail.com> <49D52B2C.5050509@simplistix.co.uk> <49D52C5B.7010506@simplistix.co.uk> <49D63465.80401@simplistix.co.uk> <1afaf6160904031427p7fa95d07q340fd54cb7c34963@mail.gmail.com> <49D9FD15.9030406@simplistix.co.uk> <49E07DA6.2010100@simplistix.co.uk>
Message-ID: <1afaf6160904110621g5d3e05bap63747462bad9f92b@mail.gmail.com>

2009/4/11 Chris Withers :
> Actually, this was gone on the py3k branch already.
>
> I've committed the fix to trunk, is there anything else I need to do?

Since it's not in py3k, I think not.

--
Regards,
Benjamin

From stephen at xemacs.org Sat Apr 11 16:19:32 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 11 Apr 2009 23:19:32 +0900
Subject: [Python-Dev] email header encoding
In-Reply-To: <49E08E1D.6070207@simplistix.co.uk>
References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <1239382031.8682.11.camel@haku> <87myaofh5q.fsf@xemacs.org> <49E08E1D.6070207@simplistix.co.uk>
Message-ID: <87vdpbjmrv.fsf@xemacs.org>

Chris Withers writes:

> When is it even a good idea to have more than one encoding in a single
> header?

I'd be happy to discuss that on email-sig, but it's really OT for Python-Dev at this point.
From g.brandl at gmx.net Sat Apr 11 20:12:34 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 11 Apr 2009 20:12:34 +0200
Subject: [Python-Dev] PyCFunction_* Missing
In-Reply-To: <7c1ab96d0904080504o3b58b1bdvedd31ac872239921@mail.gmail.com>
References: <7c1ab96d0904080504o3b58b1bdvedd31ac872239921@mail.gmail.com>
Message-ID:

Campbell Barton schrieb:
> Hi, just noticed the new Python 2.6.2 docs now don't have any reference to
> * PyCFunction_New
> * PyCFunction_NewEx
> * PyCFunction_Check
> * PyCFunction_Call
>
> Of course these are still in the source code, but I'm wondering if this
> is intentional, that these functions should be for internal use only?

I don't think so. PyCFunctions are mentioned in the C API reference, so it seems that these functions simply fall into the regrettably quite large category of public API functions that aren't documented yet. Please open a tracker item and assign it to me, so that I don't forget this.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

From ctb at msu.edu Sat Apr 11 20:16:33 2009
From: ctb at msu.edu (C. Titus Brown)
Date: Sat, 11 Apr 2009 11:16:33 -0700
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <79957db20904110321n58e50f3o4d8ede6ffc97070c@mail.gmail.com>
References: <20090410203809.GA24530@idyll.org> <79957db20904110321n58e50f3o4d8ede6ffc97070c@mail.gmail.com>
Message-ID: <20090411181633.GG7768@idyll.org>

On Sat, Apr 11, 2009 at 12:21:18PM +0200, Mario wrote:
-> > He says vague things about patches too, but I'm not sure what. If he
-> > wanted to make that into a 'patchbot' that just applied every patch in
-> > isolation and ran 'make && make test' and posted results in the
-> > tracker I'd be a happy camper.
-> > -> Jack, how about you write that idea down on the wiki page mentioned in the -> proposal, along with the use case? Following that, I'll see if I can do -> anything about it to make it a reality. We had a GSoC student two years back who worked on something like this; his name is Michal Kwiatkowski. He probably has the code working somewhere. It's a nontrivial problem if you want to do it properly with VMs etc. cheers, --titus -- C. Titus Brown, ctb at msu.edu From ctb at msu.edu Sat Apr 11 20:21:14 2009 From: ctb at msu.edu (C. Titus Brown) Date: Sat, 11 Apr 2009 11:21:14 -0700 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <49E0350F.8040506@v.loewis.de> References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de> Message-ID: <20090411182114.GH7768@idyll.org> On Sat, Apr 11, 2009 at 08:13:35AM +0200, "Martin v. L?wis" wrote: -> > 2x "keyring package" -- see -> > http://tarekziade.wordpress.com/2009/03/27/pycon-hallway-session-1-a-keyring-library-for-python/. -> > The poorer one of these will probably be axed unless Tarek gives it -> > strong support. -> -> I don't think these are good "core" projects. Even if the students come -> up with a complete solution, it shouldn't be integrated with the -> standard library right away. Instead, it should have a life outside the -> standard library, and be considered for inclusion only if the user -> community wants it. Tarek has said he can put it into distutils on a trial basis, although I'm sure that'll depend on what the student comes up with. I'm using "core projects" as a shorthand for projects that directly address the core development environment, the stdlib, and priorities of committers on python-dev. Tarek is a committer, and it sounded like you, Jim, and Georg were all interested in this project, too -- that pushes it well into "core" territory IMO. -> I'm also skeptical that this is a good SoC project in the first place. 
-> Coming up with a wrapper for, say, Apple Keychain, could be a good
-> project. Coming up with a unifying API for all keychains is out of
-> scope, IMO; various past attempts at unifying APIs have demonstrated
-> that creating them is difficult, and might require writing a PEP
-> (whose acceptance then might not happen within a summer).

Well, that's a more unassailable argument and one I agree with ;).

cheers,
--titus
--
C. Titus Brown, ctb at msu.edu

From ziade.tarek at gmail.com  Sat Apr 11 20:41:09 2009
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sat, 11 Apr 2009 20:41:09 +0200
Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC
In-Reply-To: <20090411182114.GH7768@idyll.org>
References: <20090410203809.GA24530@idyll.org>
	<49E0350F.8040506@v.loewis.de> <20090411182114.GH7768@idyll.org>
Message-ID: <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com>

> -> I'm also skeptical that this is a good SoC project in the first place.

What is a good SoC project from your point of view?

> -> Coming up with a wrapper for, say, Apple Keychain, could be a good
> -> project. Coming up with a unifying API for all keychains is out of
> -> scope, IMO; various past attempts at unifying APIs have demonstrated
> -> that creating them is difficult, and might require writing a PEP
> -> (whose acceptance then might not happen within a summer).

In this case, the student's work is not "dumb" work consisting of
writing code for an already-thought-out PEP... Part of the work will
consist of working on a PEP-like document, and on building APIs for
various keychains to see if we can have a unified one. I doubt the
PEP-like document can be written before prototype APIs for the various
keychains have been written.
At the end of the summer, if we come up with a nice unified API, I'd like to include it to Distutils for the "register" command, and maybe write a PEP to have it as part of the standard library because it makes sense to have this kind of feature imho. Tarek From ziade.tarek at gmail.com Sat Apr 11 21:13:10 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Sat, 11 Apr 2009 21:13:10 +0200 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com> References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de> <20090411182114.GH7768@idyll.org> <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com> Message-ID: <94bdd2610904111213q141093a6td3368cd8d370b317@mail.gmail.com> Ok what about this then: I am changing the scope a little bit, and I think the students will be fine with this change since it's the same work. "The project will consist of creating a plugin system into Distutils to be able to store and retrieve the username/password used by some commands, without having to store it in *clear text* in the .pypirc file anymore. The student will also provide some plugins for a maximum number of existing keyring systems. Some of these plugins might be included in Distutils, and some of them in a third-party package. " Regards Tarek From martin at v.loewis.de Sun Apr 12 00:19:04 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Apr 2009 00:19:04 +0200 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090411182114.GH7768@idyll.org> References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de> <20090411182114.GH7768@idyll.org> Message-ID: <49E11758.7070804@v.loewis.de> > I'm using "core projects" as a shorthand for projects that directly > address the core development environment, the stdlib, and priorities of > committers on python-dev. 
Tarek is a committer, and it sounded like > you, Jim, and Georg were all interested in this project, too -- that > pushes it well into "core" territory IMO. I understand why Tarek wants it, and I can sympathise with that: to protect PyPI passwords better (they are currently stored on disk in plain). Putting it into distutils might not make it "official API", but then, I think it ought to be official API, since PyPI would be just one (minor) application of it; Python also features a netrc module (which probably nobody uses). So I think it would be good to have a discussion upfront whether this should be added to the library after the summer is over (assuming it actually works by then). Decision to accept it or not as a SoC project is independent, but if accepted, the student should well understand the outcome of this discussion. Regards, Martin From martin at v.loewis.de Sun Apr 12 00:36:39 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Apr 2009 00:36:39 +0200 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com> References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de> <20090411182114.GH7768@idyll.org> <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com> Message-ID: <49E11B77.5040408@v.loewis.de> Tarek Ziad? wrote: >> -> I'm also skeptical that this is a good SoC project in the first place. > > What is a good SoC project from your point of view ? As a core project - tricky. Implement some long-standing complex feature request, or fix a pile of outstanding bug reports for a module (like the IDLE proposal). I liked the outcome of last year's "memory profiling" project: the student added sys.getsizeof (with much of mentoring on my side), and created a profiling library and application that wasn't added to the core. 
The latter part is a biased outcome (as I originally hoped to get something that becomes part of the standard library - but gave up on this quickly as way too much design went into that library); the useful core contribution (getsizeof) took considerable amount of learning, and still had a few tricky design issues to resolve. In short, there must be a realistic chance that the code gets actually used. Chances for a from-scratch library to be used are nearly zero, so from-scratch libraries are not good projects. In case you wonder why I give it nearly zero chance: I keep telling long-term contributors that libraries have to be field-tested before being considered for inclusion, and sometimes, even field-testing is not enough (think setuptools). If SoC students get to short-cut the process, that would send a wrong message to contributors and users. > Part of the work will consist of working on a PEP-like document, and on > building APIs for various keychains and see if we can have an unified one. > I doubt the PEP-like document can be written before writing prototypes APIs > for various keychains has been done. That's certainly true. That's why I think it is a much larger project: - write different wrappers - come up with a unifying API - field-test it for actual applications - write a PEP This could easily take a few years to get right (unless the actual authors of the various keychain implementations get together, define a common C API, which then a Python module just needs to wrap). > At the end of the summer, if we come up with a nice unified API, I'd > like to include > it to Distutils for the "register" command, and maybe write a PEP to have it > as part of the standard library because it makes sense to have this kind > of feature imho. I completely agree that this is a useful functionality to have, and I also agree it *eventually* belongs into the standard library. I just don't like the idea of bypassing the proper process by making it part of distutils. 
This model (I need it, so I add it) made both distutils and setuptools so unmaintainable. Regards, Martin From martin at v.loewis.de Sun Apr 12 00:38:51 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Apr 2009 00:38:51 +0200 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <94bdd2610904111213q141093a6td3368cd8d370b317@mail.gmail.com> References: <20090410203809.GA24530@idyll.org> <49E0350F.8040506@v.loewis.de> <20090411182114.GH7768@idyll.org> <94bdd2610904111141k71ba5168g81f64da88fa066e5@mail.gmail.com> <94bdd2610904111213q141093a6td3368cd8d370b317@mail.gmail.com> Message-ID: <49E11BFB.3070102@v.loewis.de> > The student will also provide some plugins for a maximum number of > existing keyring systems. > Some of these plugins might be included in Distutils, and some of them > in a third-party package. This is slightly better, but see my previous message (that is feature creep in distutils, and likely, people will start using the distutils implementation as if it were official API). Also, if you want it pluggable, you likely come up with *another* ad-hoc plugin system. Regards, Martin From greg.ewing at canterbury.ac.nz Sun Apr 12 01:49:00 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 12 Apr 2009 11:49:00 +1200 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49E0900A.3000302@simplistix.co.uk> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org> <49E0422D.10704@gmail.com> <49E0900A.3000302@simplistix.co.uk> Message-ID: <49E12C6C.4020607@canterbury.ac.nz> Chris Withers wrote: > Nick Coghlan wrote: > >> A similar naming scheme (i.e. msg.headers and msg.headersb) would >> probably work for email as well. 
> > That just feels nasty though :-( It does tend to look like a typo to me. Inserting an underscore (headers_b) would make it look less accidental. -- Greg From brian.curtin at gmail.com Sun Apr 12 02:12:37 2009 From: brian.curtin at gmail.com (curtin@acm.org) Date: Sat, 11 Apr 2009 19:12:37 -0500 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49E0900A.3000302@simplistix.co.uk> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <71E1EA03-6E24-4A28-A47A-4EA2D501CC6D@python.org> <49E0422D.10704@gmail.com> <49E0900A.3000302@simplistix.co.uk> Message-ID: FWIW, that is also the way things are done in the pickle/cPickle module. dump/dumps and load/loads to differentiate between the file object and string ways of using that functionality. On Sat, Apr 11, 2009 at 7:41 AM, Chris Withers wrote: > Nick Coghlan wrote: > >> Barry Warsaw wrote: >> >>> Of course, one could use message.header and message.bythdr and they'd >>>> be the same length. >>>> >>> I was trying to figure out what a 'thdr' was that we'd want to index >>> 'by' it. :) >>> >> >> In the discussions about os.environ, the suggested approach was to just >> tack a 'b' onto the end of the name to get the bytes version (i.e. >> os.environb). >> >> That aligns nicely with the b"" prefix for bytes literals, and isn't >> much of a typing or reading burden when dealing with the bytes API >> instead of the text one. >> >> A similar naming scheme (i.e. msg.headers and msg.headersb) would >> probably work for email as well. 
>> > > That just feels nasty though :-( > > Chris > > -- > Simplistix - Content Management, Zope & Python Consulting > - http://www.simplistix.co.uk > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brian.curtin%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skippy.hammond at gmail.com Sun Apr 12 04:29:24 2009 From: skippy.hammond at gmail.com (Mark Hammond) Date: Sun, 12 Apr 2009 12:29:24 +1000 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E01E30.8060302@gmail.com> <49E02F7E.6010605@v.loewis.de> Message-ID: <49E15204.3000401@gmail.com> On 11/04/2009 6:12 PM, Antoine Pitrou wrote: > Martin v. L?wis v.loewis.de> writes: >> Not sure whether it would be *significantly* faster, but yes, Bob wrote >> an accelerator for parsing out of a byte string to make it really fast; >> IIRC, he claims that it is faster than pickling. > > Isn't premature optimization the root of all evil? > > Besides, the fact that many values in a typical JSON object will be strings, and > must be encoded from/decoded to unicode objects in py3k, suggests that > accepting/outputting unicode as default is the laziest (i.e. the best) choice > performance-wise. I don't see it as premature optimization, but rather trying to ensure the interface/api best suits the actual use cases. > But you don't have to trust me: look at the quick numbers I've posted. 
The py3k > version (in the str-only incarnation I've proposed) is sometimes actually faster > than the trunk version: > http://mail.python.org/pipermail/python-dev/2009-April/088498.html But if all *actual* use-cases involve moving to and from utf8 encoded bytes, I'm not sure that little example is particularly useful. In those use-cases, I'd be surprised if there wasn't significant time and space benefits in not asking apps to use an 'intermediate' string object before getting the bytes they need, particularly when the payload may be a significant size. Assuming the above is all true, I'd see choosing bytes less as a premature optimization and more a design choice which best supports actual use. So to my mind the only real question is whether the above *is* true, or if there are common use-cases which don't involve utf8-off/on-the-wire... Cheers, Mark From ron.duplain at gmail.com Sun Apr 12 04:58:07 2009 From: ron.duplain at gmail.com (Ron DuPlain) Date: Sat, 11 Apr 2009 22:58:07 -0400 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410203809.GA24530@idyll.org> References: <20090410203809.GA24530@idyll.org> Message-ID: <2b485bad0904111958o7008ae4u582604437afa9b6d@mail.gmail.com> On Fri, Apr 10, 2009 at 4:38 PM, C. Titus Brown wrote: > Unquestionably "core" by my criteria above: > > 3to2 tool -- 'nuff said. I worked on the 3to2 tool during the sprint last week at PyCon. I can chip in for GSoC in the event it does get picked up. -Ron PS - I'm out of town next week for a family vacation, returning online the week of 20 Apr. 
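Returning to the json thread above: the utf8-off/on-the-wire use case Mark describes needs only a thin shim over a str-only API. A sketch; the helper names are invented for illustration and are not a proposed json interface:

```python
import json

def dumps_bytes(obj, encoding="utf-8"):
    # Serialize to str first, then encode once at the wire boundary.
    return json.dumps(obj).encode(encoding)

def loads_bytes(data, encoding="utf-8"):
    # Decode once on the way in; json.loads then only ever sees str.
    return json.loads(data.decode(encoding))

payload = dumps_bytes({"name": "café", "count": 3})
assert isinstance(payload, bytes)
assert loads_bytes(payload) == {"name": "café", "count": 3}
```

The open performance question in the thread is whether doing the encode/decode inside the library (on bytes directly) would beat this two-step route for large payloads.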
From mrts.pydev at gmail.com Sun Apr 12 12:40:12 2009 From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=) Date: Sun, 12 Apr 2009 13:40:12 +0300 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49CD2930.4080307@cornell.edu> <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: The general consensus in python-ideas is that the following is needed, so I bring it to python-dev to final discussions before I file a feature request in bugs.python.org. Proposal: add add_query_params() for appending query parameters to an URL to urllib.parse and urlparse. Implementation: http://github.com/mrts/qparams/blob/83d1ec287ec10934b5e637455819cf796b1b421c/qparams.py(feel free to fork and comment). Behaviour (longish, guided by "simple things are simiple, complex things possible"): In the simplest form, parameters can be passed via keyword arguments: >>> add_query_params('foo', bar='baz') 'foo?bar=baz' >>> add_query_params('http://example.com/a/b/c?a=b', b='d') 'http://example.com/a/b/c?a=b&b=d' Note that '/', if given in arguments, is encoded: >>> add_query_params('http://example.com/a/b/c?a=b', b='d', foo='/bar') 'http://example.com/a/b/c?a=b&b=d&foo=%2Fbar' Duplicates are discarded: >>> add_query_params('http://example.com/a/b/c?a=b', a='b') 'http://example.com/a/b/c?a=b' >>> add_query_params('http://example.com/a/b/c?a=b&c=q', a='b', b='d', ... c='q') 'http://example.com/a/b/c?a=b&c=q&b=d' But different values for the same key are supported: >>> add_query_params('http://example.com/a/b/c?a=b', a='c', b='d') 'http://example.com/a/b/c?a=b&a=c&b=d' Pass different values for a single key in a list (again, duplicates are removed): >>> add_query_params('http://example.com/a/b/c?a=b', a=('q', 'b', 'c'), ... 
b='d')
'http://example.com/a/b/c?a=b&a=q&a=c&b=d'

Keys with no value are respected, pass ``None`` to create one:

>>> add_query_params('http://example.com/a/b/c?a', b=None)
'http://example.com/a/b/c?a&b'

But if a value is given, the empty key is considered a duplicate (i.e.
the case of a&a=b is considered nonsensical):

>>> add_query_params('http://example.com/a/b/c?a', a='b', c=None)
'http://example.com/a/b/c?a=b&c'

If you need to pass in key names that are not allowed in keyword
arguments, pass them via a dictionary in the second argument:

>>> add_query_params('foo', {"+'|äüö": 'bar'})
'foo?%2B%27%7C%C3%A4%C3%BC%C3%B6=bar'

Order of original parameters is retained, although similar keys are
grouped together. Order of keyword arguments is not (and cannot be)
retained:

>>> add_query_params('foo?a=b&b=c&a=b&a=d', a='b')
'foo?a=b&a=d&b=c'

>>> add_query_params('http://example.com/a/b/c?a=b&q=c&e=d',
...                  x='y', e=1, o=2)
'http://example.com/a/b/c?a=b&q=c&e=d&e=1&x=y&o=2'

If you need to retain the order of the added parameters, use an
:class:`OrderedDict` as the second argument (*params_dict*):

>>> from collections import OrderedDict
>>> od = OrderedDict()
>>> od['xavier'] = 1
>>> od['abacus'] = 2
>>> od['janus'] = 3
>>> add_query_params('http://example.com/a/b/c?a=b', od)
'http://example.com/a/b/c?a=b&xavier=1&abacus=2&janus=3'

If both *params_dict* and keyword arguments are provided, values from
the former are used before the latter:

>>> add_query_params('http://example.com/a/b/c?a=b', od, xavier=1.1,
...                  zorg='a', alpha='b', watt='c', borg='d')
'http://example.com/a/b/c?a=b&xavier=1&xavier=1.1&abacus=2&janus=3&zorg=a&borg=d&watt=c&alpha=b'

Do nothing with a single argument:

>>> add_query_params('a')
'a'

>>> add_query_params('arbitrary strange stuff?öäüõ*()+-=42')
'arbitrary strange stuff?\xc3\xb6\xc3\xa4\xc3\xbc\xc3\xb5*()+-=42'

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From brian at sweetapp.com Sun Apr 12 12:49:37 2009 From: brian at sweetapp.com (Brian Quinlan) Date: Sun, 12 Apr 2009 11:49:37 +0100 Subject: [Python-Dev] Possible py3k io wierdness In-Reply-To: <49DA4648.9070204@sweetapp.com> References: <48927.89.100.167.183.1238884532.squirrel@webmail5.pair.com> <49D874E4.6030602@sweetapp.com> <3CC2B586-5720-47BC-9D8A-4702E94E0B25@fuhm.net> <49D9A669.9010008@sweetapp.com> <49D9E3E0.2060408@gmail.com> <49DA4648.9070204@sweetapp.com> Message-ID: <49E1C741.5020604@sweetapp.com> I've added a new proposed patch to: http://bugs.python.org/issue5700 The idea is: - only IOBase implements close() (though a subclass can override close without causing problems so long as it calls super().close() or calls .flush() and ._close() directly) - change IOBase.close to call .flush() and then ._close() - .flush() invokes super().flush() in every class except IOBase - ._close() invokes super()._close() in every class except IOBase - FileIO is implemented in Python in _pyio.py so that it can have the same base class as the other Python-implemented files classes - tests verify that .flush() is not called after the file is closed - tests verify that ._close()/.flush() calls are propagated correctly On nice side effect is that inheritance is a lot easier and MI works as expected i.e. class DebugClass(IOBase): def flush(self): print() super().flush() def _close(self): print( super()._close() class MyClass(FileIO, DebugClass): # whatever order makes sense ... m = MyClass(...) 
m.close() # Will call: # IOBase.close() # DebugClass.flush() # FileIO has no .flush method # IOBase.flush() # FileIO._close() # DebugClass._close() # IOBase._close() Cheers, Brian From mrts.pydev at gmail.com Sun Apr 12 15:15:46 2009 From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=) Date: Sun, 12 Apr 2009 16:15:46 +0300 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: <49E1DD5A.30405@improva.dk> References: <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> <49E1DD5A.30405@improva.dk> Message-ID: On Sun, Apr 12, 2009 at 3:23 PM, Jacob Holm wrote: > Hi Mart > > >>> add_query_params('http://example.com/a/b/c?a=b', b='d', foo='/bar') >> 'http://example.com/a/b/c?a=b&b=d&foo=%2Fbar < >> http://example.com/a/b/c?a=b&b=d&foo=%2Fbar>' >> >> Duplicates are discarded: >> > > Why discard duplicates? They are valid and have a well-defined meaning. The bad thing about reasoning about query strings is that there is no comprehensive documentation about their meaning. Both RFC 1738 and RFC 3986 are rather vague in that matter. But I agree that duplicates actually have a meaning (an ordered list of identical values), so I'll remove the bits that prune them unless anyone opposes (which I doubt). >> But if a value is given, the empty key is considered a duplicate (i.e. the >> case of a&a=b is considered nonsensical): >> > > Again, it is a valid url and this will change its meaning. Why? I'm uncertain whether a&a=b has a meaning, but don't see any harm in supporting it, so I'll add the feature. 
>> >>> add_query_params('http://example.com/a/b/c?a', a='b', c=None) >> 'http://example.com/a/b/c?a=b&c ' >> >> If you need to pass in key names that are not allowed in keyword >> arguments, >> pass them via a dictionary in second argument: >> >> >>> add_query_params('foo', {"+'|???": 'bar'}) >> 'foo?%2B%27%7C%C3%A4%C3%BC%C3%B6=bar' >> >> Order of original parameters is retained, although similar keys are >> grouped >> together. >> > > Why the grouping? Is it a side effect of your desire to discard > duplicates? Changing the order like that changes the meaning of the url. > A concrete case where the order of field names matters is the ":records" > converter in http://pypi.python.org/pypi/zope.httpform/1.0.1 (a small > independent package extracted from the form handling code in zope). It's also related to duplicate handling, but it mostly relates to the data structure used in the initial implementation (an OrderedDict). Re-grouping is removed now and not having to deal with duplicates simplified the code considerably (using a simple list of key-value tuples now). If you change it to keep duplicates and not unnecessarily mangle the field > order I am +1, else I am -0. Thanks for your input! Changes pushed to github (see the updated behaviour there as well): http://github.com/mrts/qparams/blob/4f32670b55082f8d0ef01c33524145c3264c161a/qparams.py MS -------------- next part -------------- An HTML attachment was scrubbed... URL: From thiagoharry at riseup.net Sun Apr 12 21:09:22 2009 From: thiagoharry at riseup.net (Harry (Thiago Leucz Astrizi)) Date: Sun, 12 Apr 2009 16:09:22 -0300 (BRT) Subject: [Python-Dev] Needing help to change the grammar In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Written by "Martin v. L?wis" : > Notice that Python source is represented in UTF-8 in the parser. It > might be that the C source code has a different encoding, which > would cause the strcmp to fail. 
No, all the files in the source code were already in UTF-8. My system
is configured to treat UTF-8 as the default encoding. This is not an
encoding problem.

Written by "Jack diederich" :
> I love the idea (and most recently edited PEP 306) so here are a few
> suggestions;
>
> Brazil has many python programmers so you might be able to make
> quick progress by asking them for volunteer time.

Yes, I have plans to ask for help on the Brazilian Python mailing list
when I finish preparing the C source code for this project. Then I
expect to receive help translating the Python modules for this new
language. There's a lot of work to do.

> To bug-hunt your technical problem: try switching the "not is"
> operator to include an underscore "not_is." The python LL(1)
> grammar checker works for python but isn't robust, and does miss
> some grammar ambiguities. Making the operator a single word might
> reveal a bug in the parser.

Thanks for the advice, you almost guessed what went wrong. I made some
tests and already discovered what the problem is. When I change
Grammar/Grammar, Python/ast.c and Modules/parsermodule.c to turn
"is not" into "not is", everything works fine and I create a new
Python version where "a is not None" is wrong and "a not is None" is
right. But when I translate this to "não é", a SyntaxError always
happens. So the problem is really in the grammar checker, which can't
handle accented letters. Well, knowing where the problem is, I think I
can try to solve it by myself. Thanks again.

> Please consider switching your students to 'real' python part way
> through the course. If they want to use the vast amount of python
> code on the internet as examples they will need to know the few
> English keywords.
> > Also - most python core developers are not native English speakers > and do OK :) PyCon speakers are about 25% non-native English > speakers and EuroPython speakers are about the reverse (my rough > estimate - I'd love to see some hard numbers). Yes, I know. To a more "serious" programmer, it's essential to have a basic understanding in english and would be better for him to start with the real Python. But my intent is not to substitute Python in Brazil, but to create a new language that could be learned easily by younger people for educational purposes. My intent is to show them how a computer software works. But surely I will warn my students that to take programming more seriously, it's important to learn how to program in some other language, like the original Python. But thanks for the advice. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFJ4jrjmNGEzq1zP84RAvikAJ4k25vufyWWiDvj3HFZ7Q4M38zCjgCglBGC dPQTd7mBuswKbNstpJqRuFE= =xApj -----END PGP SIGNATURE----- From tjreedy at udel.edu Sun Apr 12 22:30:28 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 12 Apr 2009 16:30:28 -0400 Subject: [Python-Dev] Needing help to change the grammar In-Reply-To: References: Message-ID: Harry (Thiago Leucz Astrizi) wrote: > Yes, I have plans to ask for help in the brazilian Python mailing list > when I finish to prepare the C source code for this project. Then I > expect to receive help to translate the python modules for this new > language. There's a lot of work to do. There are only a few modules that you really need to do this for for beginners. Trying to convert the entire stdlib, let alone other stuff on pypi, strikes me as foolish. ... > Yes, I know. To a more "serious" programmer, it's essential to have a > basic understanding in english and would be better for him to start > with the real Python. 
> But my intent is not to substitute Python in
> Brazil, but to create a new language that could be learned easily by
> younger people for educational purposes. My intent is to show them how
> computer software works. But surely I will warn my students that to
> take programming more seriously, it's important to learn how to
> program in some other language, like the original Python. But thanks
> for the advice.

If possible, and I presume it is, make your interpreter dual-language.
Source code in .py files is parsed as now (and a module compiles to
.pyc). Source in .pyb (python-brazil) is parsed with your new parser,
and gets a Brazilian equivalent of builtins, but uses the same AST and
bytecode. Bytecode is neither English nor Brazilian ;-). This would
give your students access to the whole world of Python modules and
allow those who want to move to normal English-based international
Python to do so without obsoleting their existing work.

Terry Jan Reedy

PS. Since this thread is not about developing Python itself, it would
be more appropriate on the python-ideas list if continued much further.

PPS. Once unicode identifiers were allowed, I considered it inevitable
that people would also want native-language keywords, especially for
younger students. So I expected a project like yours, though I expected
the first to be in Asia. I think dual-language versions, if possible,
would be the way to do this without ghettoizing the national versions.
But as I said, a general discussion of this belongs on python-ideas.
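Terry's dual-language idea can be prototyped at the token level without touching the grammar, at least under Python 3, where identifiers may contain accented characters. A rough sketch with a deliberately tiny, hypothetical keyword table (single-word keywords only; multi-word forms like "não é" still require the grammar-level work discussed earlier in the thread):

```python
import io
import tokenize

# Hypothetical Portuguese-to-English keyword table for illustration; a
# real implementation would cover the full keyword set and builtins.
PT_KEYWORDS = {
    "se": "if",
    "senão": "else",
    "enquanto": "while",
    "imprimir": "print",
    "Verdadeiro": "True",
    "Falso": "False",
}

def translate(source):
    """Map translated keywords back to English, token by token."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in PT_KEYWORDS:
            out.append((tok.type, PT_KEYWORDS[tok.string]))
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out)

english = translate("se Verdadeiro:\n    imprimir(42)\n")
exec(compile(english, "<pyb>", "exec"))  # prints 42
```

Both surface languages then share the same AST and bytecode, which is exactly the property Terry is after.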
From l.mastrodomenico at gmail.com Sun Apr 12 22:59:09 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Sun, 12 Apr 2009 22:59:09 +0200 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> <49E1DD5A.30405@improva.dk> Message-ID: 2009/4/12 Mart S?mermaa : > The bad thing about reasoning about query strings is that there is no > comprehensive documentation about their meaning. Both RFC 1738 and RFC 3986 > are rather vague in that matter. FYI the HTML5 spec (http://whatwg.org/html5 ) may have a better contact with reality than the RFCs. >From a quick scan, two sections that may be relevant are "4.10.16.3 Form submission algorithm": and "4.10.16.4 URL-encoded form data": -- Lino Mastrodomenico From cs at zip.com.au Sun Apr 12 23:17:46 2009 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 13 Apr 2009 07:17:46 +1000 Subject: [Python-Dev] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: Message-ID: <20090412211746.GA23767@cskk.homeip.net> On 12Apr2009 16:15, Mart S?mermaa wrote: | On Sun, Apr 12, 2009 at 3:23 PM, Jacob Holm wrote: | > Hi Mart | > >>> add_query_params('http://example.com/a/b/c?a=b', b='d', foo='/bar') | >> 'http://example.com/a/b/c?a=b&b=d&foo=%2Fbar < | >> http://example.com/a/b/c?a=b&b=d&foo=%2Fbar>' | >> | >> Duplicates are discarded: | > | > Why discard duplicates? They are valid and have a well-defined meaning. | | The bad thing about reasoning about query strings is that there is no | comprehensive documentation about their meaning. Both RFC 1738 and RFC 3986 | are rather vague in that matter. But I agree that duplicates actually have a | meaning (an ordered list of identical values), so I'll remove the bits that | prune them unless anyone opposes (which I doubt). 
+1 from me, with the following suggestion: it's probably worth adding to the doco that people working with dict-style query_string params should probably go make a dict or OrderedDict and use: add_query_params(..., **the_dict) just to make the other use case obvious. An alternative would be to have add_ and append_ methods with set and list behaviour. Feels a little like API bloat, though the convenience function can be nice. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ The wonderous pulp and fibre of the brain had been substituted by brass and iron; he had taught wheelwork to think. - Harry Wilmot Buxton 1832, referring to Charles Babbage and his difference engine. From tonynelson at georgeanelson.com Sun Apr 12 23:41:00 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Sun, 12 Apr 2009 17:41:00 -0400 Subject: [Python-Dev] Needing help to change the grammar In-Reply-To: References: Message-ID: At 16:30 -0400 04/12/2009, Terry Reedy wrote: ... > Source in .pyb (python-brazil) is parsed with your new parser, ... In case anyone ever does this again, I suggest that the extension be the language and optionally country code: .py_pt or .py_pt_BR -- ____________________________________________________________________ TonyN.:' ' From solipsis at pitrou.net Sun Apr 12 23:56:58 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 12 Apr 2009 21:56:58 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= References: <49CD2930.4080307@cornell.edu> <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: Mart Sõmermaa gmail.com> writes: > > Proposal: add add_query_params() for appending query parameters to an URL to urllib.parse and urlparse. Is there anything to /remove/ a query parameter?
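[Editor's note: Antoine's question about removal can be answered with pieces already in the stdlib. A minimal sketch, using the Python 3 urllib.parse names; the helper name is invented here and is not part of the actual proposal:]

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def remove_query_param(url, name):
    """Return url with every query parameter called `name` removed."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Round-trip through parse_qsl/urlencode, dropping matching keys.
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
              if k != name]
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

print(remove_query_param("http://example.com/a?x=1&y=2&x=3", "x"))
# http://example.com/a?y=2
```

Note that round-tripping through parse_qsl/urlencode normalizes percent-escaping, which is usually acceptable for this kind of manipulation.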
From v+python at g.nevcal.com Mon Apr 13 00:11:26 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 12 Apr 2009 15:11:26 -0700 Subject: [Python-Dev] Needing help to change the grammar In-Reply-To: References: Message-ID: <49E2670E.3070705@g.nevcal.com> On approximately 4/12/2009 2:41 PM, came the following characters from the keyboard of Tony Nelson: > At 16:30 -0400 04/12/2009, Terry Reedy wrote: > ... >> Source in .pyb (python-brazil) is parsed with with your new parser, > ... > > In case anyone ever does this again, I suggest that the extension be the > language and optionally country code: > > .py_pt or .py_pt_BR Wouldn't that be a good idea for this implementation too? It sounds like it is not-yet-released, as it is also not-yet-bug-free. And actually, wouldn't it be nice if international keywords could be accepted as alternates if one just said import pt_BR An implementation along that line, except for things like reversing the order of "not" and "is", would allow the next national language customization to be done by just recoding the pt_BR module, renaming to pt_it or pt_fr or pt_no and translating a bunch of strings, no? Probably it would be sufficient to allow for one language at a time, per module. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From rasky at develer.com Mon Apr 13 00:55:21 2009 From: rasky at develer.com (Giovanni Bajo) Date: Sun, 12 Apr 2009 22:55:21 +0000 (UTC) Subject: [Python-Dev] Evaluated cmake as an autoconf replacement References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <806d41050903301034i8018472pbdb3f550a1629886@mail.gmail.com> <49D110AD.3080703@cheimes.de> Message-ID: On Mon, 30 Mar 2009 20:34:21 +0200, Christian Heimes wrote: > Hallo Alexander! 
> > Alexander Neundorf wrote: >> This of course depends on the definition of "as good as" ;-) Well, I >> have met Windows-only developers which use CMake because it is able to >> generate project files for different versions of Visual Studio, and >> praise it for that. > > So far I haven't heard any complaints about or feature requests for the > project files. ;) In fact, I have had one. I asked to put all those big CJK codecs outside of python2x.dll because they were too big and created far larger self-contained distributions (aka: py2exe/pyinstaller) than would normally be required. I was told that it would be inconvenient to do so because the build system is made by hand and it's hard to generate project files for each third party module. Were those project files generated automatically, changing between external modules within or outside python2x dll would be a one-line switch in CMakeLists.txt (or similar). -- Giovanni Bajo Develer S.r.l. http://www.develer.com From rasky at develer.com Mon Apr 13 01:00:04 2009 From: rasky at develer.com (Giovanni Bajo) Date: Sun, 12 Apr 2009 23:00:04 +0000 (UTC) Subject: [Python-Dev] Evaluated cmake as an autoconf replacement References: <5d44f72f0903291021u352e9864x5db3c3f9d7b32b76@mail.gmail.com> <85b5c3130904061306l80daac0y501cc6c1cd4c9594@mail.gmail.com> <18907.17310.201358.697994@montanaro.dyndns.org> <5b8d13220904070608xf5ba61fl6b22c3f08675dd64@mail.gmail.com> <49DBD6F9.7030502@canterbury.ac.nz> <806d41050904071554x30dade8eva60be765af462112@mail.gmail.com> <5b8d13220904071918x2fed76a8t9e94ad4017721ec7@mail.gmail.com> <806d41050904081245u2dad5623r2cf87aff1edf364d@mail.gmail.com> <5b8d13220904081857w46237b57t82d8a4006f00adbb@mail.gmail.com> <50862ebd0904091849q7f28fa5bmeaf3b9061629a1c6@mail.gmail.com> Message-ID: On Fri, 10 Apr 2009 11:49:04 +1000, Neil Hodgson wrote: > This means that generated Visual Studio project files will not work > for other people unless a particular absolute build location is
specified for everyone which will not suit most. Each person that wants > to build Python will have to run cmake before starting Visual Studio > thus increasing the prerequisites. Given that we're now stuck with using whatever Visual Studio version the Python maintainers decided to use, I don't see this as a problem. As in: there is already a far larger and invasive dependency. CMake is readily available on all platforms, and it can be installed in a couple of seconds. -- Giovanni Bajo Develer S.r.l. http://www.develer.com From asmodai at in-nomine.org Mon Apr 13 10:09:08 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Mon, 13 Apr 2009 10:09:08 +0200 Subject: [Python-Dev] UTF-8 Decoder Message-ID: <20090413080908.GM13110@nexus.in-nomine.org> [Note: I haven't looked thoroughly at our handling yet, so hence I raise the question.] This got posted on the Unicode list, does it seem interesting for Python itself, the UTF-8 to UTF-16 transcoding might be? http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Whenever you meet difficult situations dash forward bravely and joyfully... From mrts.pydev at gmail.com Mon Apr 13 11:29:46 2009 From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=) Date: Mon, 13 Apr 2009 12:29:46 +0300 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou wrote: > Mart S?mermaa gmail.com> writes: > > > > Proposal: add add_query_params() for appending query parameters to an URL > to > urllib.parse and urlparse. > > Is there anything to /remove/ a query parameter? I'd say this is outside the scope of add_query_params(). 
As for the duplicate handling, I've implemented a threefold strategy that should address all use cases raised before:

def add_query_params(*args, **kwargs):
    """
    add_query_params(url, [allow_dups, [args_dict, [separator]]], **kwargs)

    Appends query parameters to an URL and returns the result.

    :param url: the URL to update, a string.
    :param allow_dups: if
        * True: plainly append new parameters, allowing all duplicates
          (default),
        * False: disallow duplicates in values and regroup keys so that
          different values for the same key are adjacent,
        * None: disallow duplicates in keys -- each key can have a single
          value and later values override the value (like dict.update()).
    :param args_dict: optional dictionary of parameters, default is {}.
    :param separator: either ';' or '&', the separator between key-value
        pairs, default is '&'.
    :param kwargs: parameters as keyword arguments.

    :return: original URL with updated query parameters or the original URL
        unchanged if no parameters given.
    """

The commit is http://github.com/mrts/qparams/blob/b9bdbec46bf919d142ff63e6b2b822b5d57b6f89/qparams.py; an extensive description of the behaviour is in the doctests. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Apr 13 12:19:13 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Apr 2009 10:19:13 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= References: <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: Mart Sõmermaa gmail.com> writes: > > On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou pitrou.net> wrote: > Mart Sõmermaa gmail.com> writes: > > > > Proposal: add add_query_params() for appending query parameters to an URL to > urllib.parse and urlparse. > Is there anything to /remove/ a query parameter?
> > I'd say this is outside the scope of add_query_params(). Given the name of the proposed function, sure. But it sounds a bit weird to have a function dedicated to adding parameters and nothing to remove them. You could e.g. rename the function to update_query_params() and decide that every parameter whose specified value is None must actually be removed from the URL. Regards Antoine. From fuzzyman at voidspace.org.uk Mon Apr 13 13:53:10 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 13 Apr 2009 12:53:10 +0100 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: <49E327A6.3000801@voidspace.org.uk> Antoine Pitrou wrote: > Mart Sõmermaa gmail.com> writes: > >> On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou pitrou.net> >> > wrote: > >> Mart Sõmermaa gmail.com> writes: >> >>> Proposal: add add_query_params() for appending query parameters to an URL >>> > to > >> urllib.parse and urlparse. >> Is there anything to /remove/ a query parameter? >> >> I'd say this is outside the scope of add_query_params(). >> > > Given the name of the proposed function, sure. But it sounds a bit weird to > have a function dedicated to adding parameters and nothing to remove them. > > Weird or not, is there actually a *need* to remove query parameters? Michael > You could e.g. rename the function to update_query_params() and decide that > every parameter whose specified value is None must actually be removed from > the URL. > > Regards > > Antoine.
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From solipsis at pitrou.net Mon Apr 13 14:01:51 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Apr 2009 12:01:51 +0000 (UTC) Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) References: <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> <49E327A6.3000801@voidspace.org.uk> Message-ID: Michael Foord voidspace.org.uk> writes: > > Weird or not, is there actually a *need* to remove query parameters? Say you are filtering or sorting data based on some URL parameters. If the user wants to remove one of those filters, you have to remove the corresponding query parameter. Regards Antoine. From orsenthil at gmail.com Mon Apr 13 14:22:05 2009 From: orsenthil at gmail.com (Senthil Kumaran) Date: Mon, 13 Apr 2009 17:52:05 +0530 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49D0ACD5.5090209@gmail.com> <49E327A6.3000801@voidspace.org.uk> Message-ID: <7c42eba10904130522r2dbaef23ja5e785a2206177d9@mail.gmail.com> On Mon, Apr 13, 2009 at 5:31 PM, Antoine Pitrou wrote: > Say you are filtering or sorting data based on some URL parameters. If the user > wants to remove one of those filters, you have to remove the corresponding query > parameter. This is a use case, and possibly a hypothetical one, which a programmer might hit only in special situations. There are lots of such use cases for which urllib.parse or urlparse has been used. But my thoughts with this proposal are: do we have good RFC specifications for implementing this?
If not, and if we just go by the practical needs, then eventually we will end up with bugs or feature requests in this which will take a lot of discussion and time to get fixed. Someone pointed out that we should read the HTML 5.0 spec instead of the RFCs for this request. I have yet to do that, but my opinion with respect to additions to the url* modules is that backing of RFCs would be the best way to go and maintain. -- Senthil From tino at wildenhain.de Mon Apr 13 14:33:08 2009 From: tino at wildenhain.de (Tino Wildenhain) Date: Mon, 13 Apr 2009 14:33:08 +0200 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: <7c42eba10904130522r2dbaef23ja5e785a2206177d9@mail.gmail.com> References: <49D0ACD5.5090209@gmail.com> <49E327A6.3000801@voidspace.org.uk> <7c42eba10904130522r2dbaef23ja5e785a2206177d9@mail.gmail.com> Message-ID: <49E33104.2040302@wildenhain.de> Hi, Senthil Kumaran wrote: > On Mon, Apr 13, 2009 at 5:31 PM, Antoine Pitrou wrote: >> Say you are filtering or sorting data based on some URL parameters. If the user >> wants to remove one of those filters, you have to remove the corresponding query >> parameter. > > This is a use case, and possibly a hypothetical one, which a programmer > might hit only in special situations. > There are lots of such use cases for which urllib.parse or urlparse > has been used. > > But my thoughts with this proposal are: do we have good RFC > specifications for implementing this? > If not, and if we just go by the practical needs, then eventually > we will end up with bugs or feature requests in this which will take a > lot of discussion and time to get fixed. > > Someone pointed out that we should read the HTML 5.0 spec instead of the RFCs for this > request. I have yet to do that, but my opinion with respect to additions > to the url* modules is that backing of RFCs would be the best way to go and > maintain.
I'd rather like to see an ordered-dict-like object returned by urlparse for parameters; this would make extra methods superfluous. Also note that you might need to specify the encoding of the data somewhere (most of the time it's utf-8, but it depends on the encoding used in the form page). A nice add-on would actually be a template form object which holds all the expected items and their type (and whether optional or not) with little wrappers for common types (int, float, string, list, ...) which generate nice exceptions when used somewhere and not filled/no default or actually wrong data for a type. Otoh, this might get a bit too much in direction of a web app framework. Regards Tino From barry at python.org Mon Apr 13 16:01:14 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 13 Apr 2009 10:01:14 -0400 Subject: [Python-Dev] Python 2.6.2 final In-Reply-To: <5c6f2a5d0904110520o2ea97af9t4cd18a168db795d5@mail.gmail.com> References: <776F906E-418C-4A2C-8C6C-2B0036B49AFA@python.org> <5c6f2a5d0904110520o2ea97af9t4cd18a168db795d5@mail.gmail.com> Message-ID: <3E33F52B-DC06-44D5-BC91-68F4D6AD5300@python.org> On Apr 11, 2009, at 8:20 AM, Mark Dickinson wrote: > On Fri, Apr 10, 2009 at 2:31 PM, Barry Warsaw > wrote: >> bugs.python.org is apparently down right now, but I set issue 5724 to >> release blocker for 2.6.2. This is waiting for input from Mark >> Dickinson, >> and it relates to test_cmath failing on Solaris 10. > > I'd prefer to leave this alone for 2.6.2. There's a fix posted to > the issue > tracker, but it's not entirely trivial and I think the risk of > accidental > breakage outweighs the niceness of seeing 'all tests passed' on > Solaris. Agreed. I've knocked this back to 'high' priority and accepted it for 2.6.3. Mark, feel free to apply it after 2.6.2 is tagged (which should be in about 8 hours or 2200 UTC today). -Barry -------------- next part -------------- A non-text attachment was scrubbed...
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Mon Apr 13 16:11:09 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 13 Apr 2009 10:11:09 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: On Apr 10, 2009, at 11:08 AM, James Y Knight wrote: > Until you write a parser for every header, you simply cannot decode > to unicode. The only sane choices are: > 1) raw bytes > 2) parsed structured data The email package does not need a parser for every header, but it should provide a framework that applications (or third party libraries) can use to extend the built-in header parsers. A bare minimum for functionality requires a Content-Type parser. I think the email package should also include an address header (Originator, Destination) parser, and a Message-ID header parser. Possibly others. The default would probably be some unstructured parser for headers like Subject. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Mon Apr 13 16:14:04 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 13 Apr 2009 10:14:04 -0400 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49DF8956.5050501@g.nevcal.com> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> Message-ID: <7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org> On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote: > If one name has to be longer than the other, it should be the bytes > version. 
Real user code is more likely to want to use the text > version, and hopefully there will be more of that type of code than > implementations using bytes. > > Of course, one could use message.header and message.bythdr and > they'd be the same length. Actually, thinking about this over the weekend, it's much better for message['subject'] to return a Header instance in all cases. Use bytes(header) to get the raw bytes. A good API for getting the parsed and decoded header values needs to take into account that it won't always be a string. For unstructured headers like Subject, str(header) would work just fine. For an Originator or Destination address, what does str(header) return? And what would be the API for getting the set of realname/addresses out of the header? -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From barry at python.org Mon Apr 13 16:28:32 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 13 Apr 2009 10:28:32 -0400 Subject: [Python-Dev] headers api for email package In-Reply-To: <49E08F8C.5030205@simplistix.co.uk> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <49E08F8C.5030205@simplistix.co.uk> Message-ID: On Apr 11, 2009, at 8:39 AM, Chris Withers wrote: > Barry Warsaw wrote: >> >>> message['Subject'] >> The raw bytes or the decoded unicode? > > A header object. Yep. You got there before I did. :) >> Okay, so you've picked one. Now how do you spell the other way? > > str(message['Subject']) Yes for unstructured headers like Subject. For structured headers... hmm. > bytes(message['Subject']) Yes. >> Now, setting headers. Sometimes you have some unicode thing and >> sometimes you have some bytes. You need to end up with bytes in >> the ASCII range and you'd like to leave the header value unencoded >> if so. 
But in both cases, you might have bytes or characters >> outside that range, so you need an explicit encoding, defaulting to >> utf-8 probably. >> >>> Message.set_header('Subject', 'Some text', encoding='utf-8') >> >>> Message.set_header('Subject', b'Some bytes') > > Where you just want "a damned valid email and stop making my life > hard!": > > Message['Subject']='Some text' Yes. In which case I propose we guess the encoding as 1) ascii, 2) utf-8, 3) wtf? > Where you care about what encoding is used: > > Message['Subject']=Header('Some text',encoding='utf-8') Yes. > If you have bytes, for whatever reason: > > Message['Subject']=b'some bytes'.decode('utf-8') > > ...because only you know what encoding those bytes use! So you're saying that __setitem__() should not accept raw bytes? -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From martin at v.loewis.de Mon Apr 13 16:44:36 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 13 Apr 2009 16:44:36 +0200 Subject: [Python-Dev] Contributor Agreements for Patches - was [Jython-dev] Jython on Google AppEngine! In-Reply-To: References: Message-ID: <49E34FD4.3060809@v.loewis.de> > * What is the scope of a patch that requires a contributor > agreement? Van's advice is as follows: There is no definite ruling on what constitutes "work" that is copyright-protected; estimates vary between 10 and 50 lines. Establishing a rule based on line limits is not supported by law. Formally, to be on the safe side, paperwork would be needed for any contribution (no matter how small); this is tedious and probably unnecessary, as the risk of somebody suing is small. Also, in that case, there would be a strong case for an implied license.
So his recommendation is to put the words "By submitting a patch or bug report, you agree to license it under the Apache Software License, v. 2.0, and further agree that it may be relicensed as necessary for inclusion in Python or other downstream projects." into the tracker; this should be sufficient for most cases. For committers, we should continue to require contributor forms. Contributor forms can be electronic, but they need to name the parties, include a signature (including electronic), and include a company contribution agreement as necessary. Regards, Martin P.S. I'm sure Van will jump in if I misunderstood parts of this. From thobes at gmail.com Mon Apr 13 17:45:11 2009 From: thobes at gmail.com (Tobias Ivarsson) Date: Mon, 13 Apr 2009 17:45:11 +0200 Subject: [Python-Dev] Contributor Agreements for Patches - was [Jython-dev] Jython on Google AppEngine! In-Reply-To: <49E34FD4.3060809@v.loewis.de> References: <49E34FD4.3060809@v.loewis.de> Message-ID: <9997d5e60904130845t7c1f636cof05cfb86d20c9c29@mail.gmail.com> On Mon, Apr 13, 2009 at 4:44 PM, "Martin v. L?wis" wrote: > > * What is the scope of a patch that requires a contributor > > agreement? > > Van's advise is as follows: > > There is no definite ruling on what constitutes "work" that is > copyright-protected; estimates vary between 10 and 50 lines. > Establishing a rule based on line limits is not supported by > law. Formally, to be on the safe side, paperwork would be needed > for any contribution (no matter how small); this is tedious and > probably unnecessary, as the risk of somebody suing is small. > Also, in that case, there would be a strong case for an implied > license. > > So his recommendation is to put the words > > "By submitting a patch or bug report, you agree to license it under the > Apache Software License, v. 2.0, and further agree that it may be > relicensed as necessary for inclusion in Python or other downstream > projects." 
> > into the tracker; this should be sufficient for most cases. For > committers, we should continue to require contributor forms. Sounds great to me. Cheers, Tobias -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmurray at bitdance.com Mon Apr 13 17:49:35 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 13 Apr 2009 11:49:35 -0400 (EDT) Subject: [Python-Dev] headers api for email package In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <49E08F8C.5030205@simplistix.co.uk> Message-ID: On Mon, 13 Apr 2009 at 10:28, Barry Warsaw wrote: > On Apr 11, 2009, at 8:39 AM, Chris Withers wrote: > >> Barry Warsaw wrote: >> > > > > message['Subject'] >> > The raw bytes or the decoded unicode? >> >> A header object. > > Yep. You got there before I did. :) +1 >> > Okay, so you've picked one. Now how do you spell the other way? >> >> str(message['Subject']) > > Yes for unstructured headers like Subject. For structured headers... hmm. Some "reasonable" printable interpretation that has no semantic meaning? >> bytes(message['Subject']) > > Yes. > >> > Now, setting headers. Sometimes you have some unicode thing and >> > sometimes you have some bytes. You need to end up with bytes in the >> > ASCII range and you'd like to leave the header value unencoded if so. >> > But in both cases, you might have bytes or characters outside that range, >> > so you need an explicit encoding, defaulting to utf-8 probably. >> > > > > Message.set_header('Subject', 'Some text', encoding='utf-8') >> > > > > Message.set_header('Subject', b'Some bytes') >> >> Where you just want "a damned valid email and stop making my life hard!": >> >> Message['Subject']='Some text' > > Yes. In which case I propose we guess the encoding as 1) ascii, 2) utf-8, 3) > wtf? 
Given some usenet postings I've just dealt with, (3) appears to sometimes be spelled 'x-unknown' and sometimes (in the most recent case) 'unknown-8bit'. A quick google turns up a hit on RFC1428 for the latter, and a bunch of trouble tickets for the former...so I think 'wtf' is correctly spelled 'unknown-8bit'. However, it's not supposed to be used by mail composers, who are expected to know the encoding. It's for mail gateways that are transforming something and don't know the encoding. I'm not sure what this means for the email module, which certainly will be used in a mail gateways....maybe it's the responsibility of the application code to explicitly say 'unknown encoding'? >> Where you care about what encoding is used: >> >> Message['Subject']=Header('Some text',encoding='utf-8') > > Yes. > >> If you have bytes, for whatever reason: >> >> Message['Subject']=b'some bytes'.decode('utf-8') >> >> ...because only you know what encoding those bytes use! > > So you're saying that __setitem__() should not accept raw bytes? If I'm understanding things correctly, if it did accept bytes the person using that interface would need to do whatever encoding (eg: encoded-word) was needed, so the interface should check that the byte string is 8 bit clean. But having some sort of 'setraw' method on Header might be better for that case. --David From daniel at stutzbachenterprises.com Mon Apr 13 18:11:35 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 13 Apr 2009 11:11:35 -0500 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49E00931.6050107@v.loewis.de> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> Message-ID: On Fri, Apr 10, 2009 at 10:06 PM, "Martin v. 
L?wis" wrote: > However, I really think that this question cannot be answered by > reading the RFC. It should be answered by verifying how people use > the json library in 2.x. > I use the json module in 2.6 to communicate with a C# JSON library and a JavaScript JSON library. The C# and JavaScript libraries produce and consume the equivalent of str, not the equivalent of bytes. Yes, the data eventually has to go over a socket as bytes, but that's often handled by a different layer of code. For JavaScript, data is typically received by via XMLHttpRequest(), which automatically figures out the encoding from the HTTP headers and/or other information (defaulting to UTF-8) and returns a str-like object that I pass to the JavaScript JSON library. For C#, I wrap the socket in a StreamReader object, which decodes the byte stream into a string stream (similar to Python's new TextIOWrapper class). Hope that helps, -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From walter at livinglogic.de Mon Apr 13 18:39:06 2009 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Mon, 13 Apr 2009 18:39:06 +0200 Subject: [Python-Dev] Google Summer of Code/core Python projects - RFC In-Reply-To: <20090410233524.GA18347@idyll.org> References: <20090410203809.GA24530@idyll.org> <1afaf6160904101605l2235f906if36aa79703cd9fd7@mail.gmail.com> <20090410233524.GA18347@idyll.org> Message-ID: <49E36AAA.6050903@livinglogic.de> C. Titus Brown wrote: > [...] > I have had a hard time getting a good sense of what core code is well > tested and what is not well tested, across various platforms. While > Walter's C/Python integrated code coverage site is nice, it would be > even nicer to have a way to generate all that information within any > particular checkout on a real-time basis. This might have to be done incrementally. 
Creating the output for http://coverage.livinglogic.de/ takes about 90 minutes. This breaks down like this:

Downloading: 2sec
Unpacking: 3sec
Configuring: 30sec
Compiling: 1min
Running the test suite: 1hour
Reading coverage files: 8sec
Generating HTML files: 30min

> Doing so in the context of > Snakebite would be icing... and I think it's worth supporting in core, > especially if it can be done without any changes *to* core. The only thing we'd probably need in core is a way to configure Python to run with code coverage. The coverage script does this by patching the makefile. Running the code coverage script on Snakebite would be awesome. The script is available from here: http://pypi.python.org/pypi/pycoco > -> Another small nit is that they should address Python 2.x, too. > > I asked that they focus on EITHER 2.x or 3.x, since "too broad" is an > equally valid criticism. Certainly 3.x is the future so I thought > focusing on increasing code coverage, and especially C code coverage, > could best be applied to 3.x. Servus, Walter From stephen at xemacs.org Mon Apr 13 19:15:20 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 14 Apr 2009 02:15:20 +0900 Subject: [Python-Dev] [Email-SIG] headers api for email package In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <49E08F8C.5030205@simplistix.co.uk> Message-ID: <873accv5jr.fsf@xemacs.org> Barry Warsaw writes: > On Apr 11, 2009, at 8:39 AM, Chris Withers wrote: > > > Barry Warsaw wrote: > >> >>> message['Subject'] > >> The raw bytes or the decoded unicode? > > > > A header object. > > Yep. You got there before I did. :) > > >> Okay, so you've picked one. Now how do you spell the other way? > > > > str(message['Subject']) > > Yes for unstructured headers like Subject. For structured headers... > hmm. Well, suppose we get really radical here. *People* see email as (rich-)text. So ...
message['Subject'] returns an object, partly to be consistent with more complex headers' APIs, but partly to remind us that nothing in email is as simple as it seems. Now, str(message['Subject']) is really for presentation to the user, right? OK, so let's make it a presentation function! Decode the MIME-words, optionally unfold folded lines, optionally compress spaces, etc. This by default returns the subject field as a single, possibly quite long, line. Then a higher-level API can rewrap it, add fonts etc, for fancy presentation. This also suggests that we don't want the field tag (ie, "Subject") to be part of this value. Of course a *really* smart higher-level API would access structured headers based on their structure, not on the one-size-fits-all str() conversion. Then MTAs see email as a string of octets. So guess what: > > bytes(message['Subject']) gives wire format. Yow! I think I'm just joking. Right? > >> Now, setting headers. Sometimes you have some unicode thing and > >> sometimes you have some bytes. You need to end up with bytes in > >> the ASCII range and you'd like to leave the header value unencoded > >> if so. But in both cases, you might have bytes or characters > >> outside that range, so you need an explicit encoding, defaulting to > >> utf-8 probably. > >> >>> Message.set_header('Subject', 'Some text', encoding='utf-8') > >> >>> Message.set_header('Subject', b'Some bytes') > > > > Where you just want "a damned valid email and stop making my life > > hard!": -1 I mean, yeah, Brother, I feel your pain but it just isn't that easy. If that were feasible, it would be *criminal* to have a .set_header() method at all! In fact, > > Message['Subject']='Some text' is going to (a) need to take *only* unicodes, or (b) raise Exceptions at the slightest provocation when handed bytes. And things only get worse if you try to provide this interface for say "From" (let alone "Content-Type").
Is it really worth doing the mapping interface if it's only usable with free-form headers (ie, only Subject among the commonly used headers)? > Yes. In which case I propose we guess the encoding as 1) ascii, 2) > utf-8, 3) wtf? Uh, what guessing? If you don't know what you have but you believe it to be a valid header field, then presumably you got it off the wire and it's still in bytes and you just spit it out on the wire without trying to decode or encode it. But as I already said, I think that's a bad idea. Otherwise, you should have a unicode, and you simply look at the range of the string. If it fits in ASCII, Bob's your uncle. If not, Bob's your aunt (and you use UTF-8). > > Where you care about what encoding is used: > > > > Message['Subject']=Header('Some text',encoding='utf-8') > > Yes. > > > If you have bytes, for whatever reason: > > > > Message['Subject']=b'some bytes'.decode('utf-8') > > > > ...because only you know what encoding those bytes use! > > So you're saying that __setitem__() should not accept raw bytes? How do you distinguish "raw" bytes from "encoded bytes"? __setitem__() shouldn't accept bytes at all. There should be an API which sets a .formatted_for_the_wire member, and it should have a "validate" option (ie, when true the API attempts to parse the header and raises an exception if it fails to do so; when false, it assumes you know what you're doing and will send out the bytes verbatim). 
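The str()-for-presentation versus bytes()-for-wire split that Stephen describes can be prototyped on top of the RFC 2047 machinery that already ships with Python — a toy sketch only, with a hypothetical class name and attribute, not the email package's actual API:

```python
from email.header import decode_header, make_header

class SubjectHeader:
    """Toy model of the header object under discussion: str() yields the
    decoded presentation form, bytes() yields the RFC 2047 wire form."""

    def __init__(self, raw):
        self.raw = raw  # wire-format value, e.g. '=?utf-8?q?...?='

    def __str__(self):
        # Presentation: decode MIME encoded-words into one unicode line.
        return str(make_header(decode_header(self.raw)))

    def __bytes__(self):
        # Wire format: an RFC 2047-encoded value is pure ASCII.
        return self.raw.encode('ascii')

subject = SubjectHeader('=?utf-8?q?Testing-=CE=B2-?=')
```

With this toy class, str(subject) gives 'Testing-β-' while bytes(subject) gives back the encoded-word octets — the two views being debated here.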
From martin at v.loewis.de Mon Apr 13 19:19:18 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 13 Apr 2009 19:19:18 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> Message-ID: <49E37416.5030802@v.loewis.de> > I use the json module in 2.6 to communicate with a C# JSON library and a > JavaScript JSON library. The C# and JavaScript libraries produce and > consume the equivalent of str, not the equivalent of bytes. I assume there is a TCP connection between the json module and the C#/JavaScript libraries? If so, it doesn't matter what representation these implementations chose to use. > Hope that helps, Maybe I misunderstood, and you are *not* communicating over the wire. In this case, I'm puzzled how you get the data from Python to the C# JSON library, or to the JavaScript library. Regards, Martin From steven.bethard at gmail.com Mon Apr 13 19:23:45 2009 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 13 Apr 2009 10:23:45 -0700 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: On Mon, Apr 13, 2009 at 2:29 AM, Mart Sõmermaa wrote: > > > On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou > wrote: >> >> Mart Sõmermaa gmail.com> writes: >> > >> > Proposal: add add_query_params() for appending query parameters to an >> > URL to >> urllib.parse and urlparse. >> >> Is there anything to /remove/ a query parameter? > > I'd say this is outside the scope of add_query_params().
> > As for the duplicate handling, I've implemented a threefold strategy that > should address all use cases raised before: > > def add_query_params(*args, **kwargs): > """ > add_query_params(url, [allow_dups, [args_dict, [separator]]], **kwargs) > > Appends query parameters to an URL and returns the result. > > :param url: the URL to update, a string. > :param allow_dups: if > * True: plainly append new parameters, allowing all duplicates > (default), > * False: disallow duplicates in values and regroup keys so that > different values for the same key are adjacent, > * None: disallow duplicates in keys -- each key can have a single > value and later values override the value (like dict.update()). Unnamed flag parameters are unfriendly to the reader. If I see something like: add_query_params(url, True, dict(a=b, c=d)) I can pretty much guess what the first and third arguments are, but I have no clue for the second. Even if I have read the documentation before, I may not remember whether the middle argument is "allow_dups" or "keep_dups". Steve > :param args_dict: optional dictionary of parameters, default is {}. > :param separator: either ';' or '&', the separator between key-value > pairs, default is '&'. > :param kwargs: parameters as keyword arguments. > > :return: original URL with updated query parameters or the original URL > unchanged if no parameters given. > """ > > The commit is > > http://github.com/mrts/qparams/blob/b9bdbec46bf919d142ff63e6b2b822b5d57b6f89/qparams.py > > extensive description of the behaviour is in the doctests.
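Steven's readability complaint can be addressed mechanically in Python 3 with keyword-only arguments. The following is a simplified two-policy sketch of such a signature (illustrative names and behaviour, not Mart's actual patch, and it does not resolve the clash with a genuine query key named allow_dups):

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def add_query_params(url, params=None, *, allow_dups=True, **kwargs):
    # Keyword-only flag: call sites must spell out allow_dups=False,
    # so the unreadable bare positional True/False cannot occur.
    parts = urlparse(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    new = list((params or {}).items()) + list(kwargs.items())
    if allow_dups:
        pairs = pairs + new     # plainly append, duplicates allowed
    else:
        merged = dict(pairs)    # later values override, like dict.update()
        merged.update(new)
        pairs = list(merged.items())
    return urlunparse(parts._replace(query=urlencode(pairs)))
```

For example, add_query_params('http://example.com/?a=1', b='2') yields 'http://example.com/?a=1&b=2', while passing allow_dups=False collapses repeated keys instead of appending them.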
From steve at pearwood.info Mon Apr 13 20:32:25 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 14 Apr 2009 04:32:25 +1000 Subject: [Python-Dev] [Email-SIG] headers api for email package In-Reply-To: <873accv5jr.fsf@xemacs.org> References: <873accv5jr.fsf@xemacs.org> Message-ID: <200904140432.25953.steve@pearwood.info> On Tue, 14 Apr 2009 03:15:20 am Stephen J. Turnbull wrote: > *People* see email as (rich-)text. We do? It's not clear what you actually mean by "(rich-)text". In the context of email, I understand it to mean HTML in the body, web-bugs, security exploits, 36pt hot-pink bold text on a lime-green background, and all the other wonderful things modern mail clients let you put in your email. But as far as I know, no mail client tries to render HTML tags inside mail headers, so you're probably not talking about HTML rich-text. I guess you mean Unicode characters. Am I right? Now, correct me if I'm wrong, but I don't think mail headers can actually be anything *but* bytes. I see that my mail client, at least, sends bytes in the Subject header. If I try to send characters, e.g. the subject header "Testing-β-" (without the quotes), what actually gets sent is the bytes "=?utf-8?q?Testing-=CE=B2-?=" (again without the quotation marks). This seems to be covered by RFC 2047: http://tools.ietf.org/html/rfc2047 If you're proposing converting those bytes into characters, that's all very well and good, but what's your strategy for dealing with the inevitable wrongly-formatted headers? If the header can't be correctly decoded into text, there still needs to be a way to get to the raw bytes. Apart from (e.g.) mail processing apps like SpamBayes which will want to inspect the raw bytes, mail readers will need to deal with badly formatted mail. The RFC states: "However, a mail reader MUST NOT prevent the display or handling of a message because an 'encoded-word' is incorrectly formed." [...] > Then MTAs see email as a string of octets.
So guess what: > > > > bytes(message['Subject']) > > gives wire format. Yow! I think I'm just joking. Right? Er, I'm not sure. Are you joking? I hope not, because it is important to be able to get to the raw, unmodified bytes that the MTA sees, without all the fancy processing you suggest. [...] > Otherwise, you should have a unicode, and you simply look > at the range of the string. If it fits in ASCII, Bob's your uncle. > If not, Bob's your aunt (and you use UTF-8). Again, correct me if I'm wrong, but *all* valid mail headers must fit in ASCII. RFC 5335 defines an experimental approach to allowing full Unicode in mail headers, but surely it's going to be a while before that's common, let alone standard. http://tools.ietf.org/html/rfc5335 -- Steven D'Aprano From daniel at stutzbachenterprises.com Mon Apr 13 20:42:41 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 13 Apr 2009 13:42:41 -0500 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49E37416.5030802@v.loewis.de> References: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> Message-ID: On Mon, Apr 13, 2009 at 12:19 PM, "Martin v. Löwis" wrote: > > I use the json module in 2.6 to communicate with a C# JSON library and a > JavaScript JSON library. The C# and JavaScript libraries produce and > consume the equivalent of str, not the equivalent of bytes. > > I assume there is a TCP connection between the json module and the > C#/JavaScript libraries? > Yes, there's a TCP connection. Sorry for not making that clear to begin with. I also sometimes store JSON objects in a database. In that case, I pass strings to the database API which stores them in a TEXT field.
Obviously somewhere they get encoded to bytes, but that's handled by the database. > If so, it doesn't matter what representation these implementations chose to use. True, I can always convert from bytes to str or vice versa. Sometimes it is illustrative to see how others have chosen to solve the same problem. The JSON specification and other implementations serialize an object to a string. Python's json.dumps() needs to either return a str or let the user specify an encoding. At least one of these two needs to work: json.dumps({}).encode('utf-16le') # dumps() returns str '{\x00}\x00' json.dumps({}, encoding='utf-16le') # dumps() returns bytes '{\x00}\x00' In 2.6, the first one works. The second incorrectly returns '{}'. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From foom at fuhm.net Mon Apr 13 21:11:37 2009 From: foom at fuhm.net (James Y Knight) Date: Mon, 13 Apr 2009 15:11:37 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: <25BB706E-C155-451B-AE18-7A8C83824FD6@fuhm.net> On Apr 13, 2009, at 10:11 AM, Barry Warsaw wrote: > The email package does not need a parser for every header, but it > should provide a framework that applications (or third party > libraries) can use to extend the built-in header parsers. A bare > minimum for functionality requires a Content-Type parser. I think > the email package should also include an address header (Originator, > Destination) parser, and a Message-ID header parser. Possibly others. Sure, that's fine... > The default would probably be some unstructured parser for headers > like Subject. But for unknown headers, it's not a useful choice to return a "str" object.
"str" is just one possible structured data representation for a header: there's no correct useful decoding of all headers into str. Of course for the "Subject" header, str is the correct result type, but that's not a default, that's explicit support for "Subject". You can't correctly decode "To" into a str, so what makes you think you can decode "X-Gabazaborph" into str? The only useful and correct representation for unknown (or unimplemented) headers is the raw bytes. James From martin at v.loewis.de Mon Apr 13 22:02:17 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 13 Apr 2009 22:02:17 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410025203.GA199@panix.com> <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> Message-ID: <49E39A49.9070507@v.loewis.de> > Yes, there's a TCP connection. Sorry for not making that clear to begin > with. > > If so, it doesn't matter what representation these implementations chose > to use. > > > True, I can always convert from bytes to str or vice versa. I think you are missing the point. It will not be necessary to convert. You can write the JSON into the TCP connection in Python, and it will come out as strings just fine in C# and JavaScript. This is how middleware works - it abstracts from programming languages, and allows for different representations in different languages, in a manner invisible to the participating processes. > At least one of these two needs to work: > > json.dumps({}).encode('utf-16le') # dumps() returns str > '{\x00}\x00' > > json.dumps({}, encoding='utf-16le') # dumps() returns bytes > '{\x00}\x00' > > In 2.6, the first one works. The second incorrectly returns '{}'.
Ok, that might be a bug in the JSON implementation - but you shouldn't be using utf-16le, anyway. Use UTF-8 always, and it will work fine. The question is: which of them is more appropriate, if what you want is bytes. I argue that the second form is better, since it saves you an encode invocation. Regards, Martin From mrts.pydev at gmail.com Mon Apr 13 22:14:50 2009 From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=) Date: Mon, 13 Apr 2009 23:14:50 +0300 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: On Mon, Apr 13, 2009 at 8:23 PM, Steven Bethard wrote: > > On Mon, Apr 13, 2009 at 2:29 AM, Mart Sõmermaa wrote: > > > > > > On Mon, Apr 13, 2009 at 12:56 AM, Antoine Pitrou > > wrote: > >> > >> Mart Sõmermaa gmail.com> writes: > >> > > >> > Proposal: add add_query_params() for appending query parameters to an > >> > URL to > >> urllib.parse and urlparse. > >> > >> Is there anything to /remove/ a query parameter? > > > > I'd say this is outside the scope of add_query_params(). > > > > As for the duplicate handling, I've implemented a threefold strategy that > > should address all use cases raised before: > > > > def add_query_params(*args, **kwargs): > > """ > > add_query_params(url, [allow_dups, [args_dict, [separator]]], **kwargs) > > > > Appends query parameters to an URL and returns the result. > > > > :param url: the URL to update, a string. > > :param allow_dups: if > > * True: plainly append new parameters, allowing all duplicates > > (default), > > * False: disallow duplicates in values and regroup keys so that > > different values for the same key are adjacent, > > * None: disallow duplicates in keys -- each key can have a single > > value and later values override the value (like dict.update()). > > Unnamed flag parameters are unfriendly to the reader.
If I see something like: > > add_query_params(url, True, dict(a=b, c=d)) > > I can pretty much guess what the first and third arguments are, but I > have no clue for the second. Even if I have read the documentation > before, I may not remember whether the middle argument is "allow_dups" > or "keep_dups". Keyword arguments are already used for specifying the arguments to the query, so naming can't be used. Someone may need an 'allow_dups' key in their query and forget to pass it in params_dict. A default behaviour should be found that works according to most users' expectations so that they don't need to use the positional arguments generally. Antoine Pitrou wrote: > You could e.g. rename the function to update_query_params() and decide that > every parameter whose specified value is None must actually be removed from > the URL. I agree that removing parameters is useful. Currently, None is used for signifying a key with no value. Instead, booleans could be used: if a key is True (but obviously not any other value that evaluates to True), it is a key with no value; if False (under the same evaluation restriction), it should be removed from the query if present. None should not be treated specially under that scheme. As an example: >>> update_query_params('http://example.com/?q=foo', q=False, a=True, b='c', d=None) 'http://example.com/?a&b=c&d=None' However, 1) I'm not sure about the implications of 'foo is True', I have never used it and PEP 8 explicitly warns against it -- does it work consistently across different Python implementations? (Assuming on the grounds that True should be a singleton no different from None that it should work.) 2) the API gets overly complicated -- as per the complaint above, it's usability-challenged already.
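The magic-value problem Mart worries about here (None, True and False are all plausible query values) can be sidestepped with an explicit sentinel. The following hypothetical update_query_params is only a sketch of that idea, not part of the actual proposal:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

REMOVE = object()  # sentinel: can never collide with a real query value

def update_query_params(url, **kwargs):
    # Keys mapped to REMOVE are dropped from the query string; every
    # other key is set, replacing any existing value for that key.
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    for key, value in kwargs.items():
        if value is REMOVE:
            query.pop(key, None)
        else:
            query[key] = value
    return urlunparse(parts._replace(query=urlencode(query)))
```

For example, update_query_params('http://example.com/?q=foo&page=2', q=REMOVE, sort='asc') returns 'http://example.com/?page=2&sort=asc', with no special-casing of None or booleans.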
From bob at redivi.com Mon Apr 13 22:28:26 2009 From: bob at redivi.com (Bob Ippolito) Date: Mon, 13 Apr 2009 13:28:26 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49E39A49.9070507@v.loewis.de> References: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> Message-ID: <6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com> On Mon, Apr 13, 2009 at 1:02 PM, "Martin v. Löwis" wrote: >> Yes, there's a TCP connection. Sorry for not making that clear to begin >> with. >> >> If so, it doesn't matter what representation these implementations chose >> to use. >> >> >> True, I can always convert from bytes to str or vice versa. > > I think you are missing the point. It will not be necessary to convert. > You can write the JSON into the TCP connection in Python, and it will > come out as strings just fine in C# and JavaScript. This > is how middleware works - it abstracts from programming languages, and > allows for different representations in different languages, in a > manner invisible to the participating processes. > >> At least one of these two needs to work: >> >> json.dumps({}).encode('utf-16le') # dumps() returns str >> '{\x00}\x00' >> >> json.dumps({}, encoding='utf-16le') # dumps() returns bytes >> '{\x00}\x00' >> >> In 2.6, the first one works. The second incorrectly returns '{}'. > > Ok, that might be a bug in the JSON implementation - but you shouldn't > be using utf-16le, anyway. Use UTF-8 always, and it will work fine. > > The question is: which of them is more appropriate, if what you want > is bytes. I argue that the second form is better, since it saves you > an encode invocation. It's not a bug in dumps, it's a matter of not reading the documentation.
The encoding parameter of dumps decides how byte strings should be interpreted, not what the output encoding is. The output of json/simplejson dumps for Python 2.x is either an ASCII bytestring (default) or a unicode string (when ensure_ascii=False). This is very practical in 2.x because an ASCII bytestring can be treated as either text or bytes in most situations, isn't going to get mangled over any kind of encoding mismatch (as long as it's an ASCII superset), and skips an encoding step if getting sent over the wire.. >>> simplejson.dumps(['\x00f\x00o\x00o'], encoding='utf-16be') '["foo"]' >>> simplejson.dumps(['\x00f\x00o\x00o'], encoding='utf-16be', ensure_ascii=False) u'["foo"]' -bob From daniel at stutzbachenterprises.com Mon Apr 13 22:32:17 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 13 Apr 2009 15:32:17 -0500 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com> References: <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> <6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com> Message-ID: On Mon, Apr 13, 2009 at 3:28 PM, Bob Ippolito wrote: > It's not a bug in dumps, it's a matter of not reading the > documentation. The encoding parameter of dumps decides how byte > strings should be interpreted, not what the output encoding is. > You're right; I apologize for not reading more closely. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniel at stutzbachenterprises.com Mon Apr 13 23:25:28 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 13 Apr 2009 16:25:28 -0500 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49E39A49.9070507@v.loewis.de> References: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> Message-ID: On Mon, Apr 13, 2009 at 3:02 PM, "Martin v. Löwis" wrote: > > True, I can always convert from bytes to str or vice versa. > > I think you are missing the point. It will not be necessary to convert. Sometimes I want bytes and sometimes I want str. I am going to be converting some of the time. ;-) Below is a basic CGI application that assumes that json module works with str, not bytes. How would you write it if the json module does not support returning a str? print("Content-Type: application/json; charset=utf-8") input_object = json.loads(sys.stdin.read()) output_object = do_some_work(input_object) print(json.dumps(output_object)) print() > The question is: which of them is more appropriate, if what you want > is bytes. I argue that the second form is better, since it saves you > an encode invocation. > If what you want is bytes, encoding has to happen somewhere. If the json module has some optimizations to do the encoding at the same time as the serialization, great. However, based on the original post of this thread, it sounds like that code doesn't exist or doesn't work correctly. What's the benefit of preventing users from getting a str out if that's what they want? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed...
URL: From greg.ewing at canterbury.ac.nz Tue Apr 14 01:12:51 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Apr 2009 11:12:51 +1200 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <91ad5bf80903271728ka18360cpd514aa5dd93cd74a@mail.gmail.com> <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> <49E327A6.3000801@voidspace.org.uk> Message-ID: <49E3C6F3.9040400@canterbury.ac.nz> Antoine Pitrou wrote: > Say you are filtering or sorting data based on some URL parameters. If the user > wants to remove one of those filters, you have to remove the corresponding query > parameter. For an application like that, I would be keeping the parameters as a list or some other structured way and only converting them to a URL when needed. -- Greg From greg.ewing at canterbury.ac.nz Tue Apr 14 01:27:42 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Apr 2009 11:27:42 +1200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> Message-ID: <49E3CA6E.1070501@canterbury.ac.nz> Barry Warsaw wrote: > The default > would probably be some unstructured parser for headers like Subject. Only for headers known to be unstructured, I think. Completely unknown headers should be available only as bytes. 
-- Greg From greg.ewing at canterbury.ac.nz Tue Apr 14 01:28:24 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Apr 2009 11:28:24 +1200 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org> Message-ID: <49E3CA98.3090504@canterbury.ac.nz> Barry Warsaw wrote: > For an > Originator or Destination address, what does str(header) return? It should be an error, I think. -- Greg From alexandre at peadrop.com Tue Apr 14 01:44:38 2009 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 13 Apr 2009 19:44:38 -0400 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> Message-ID: On Mon, Apr 13, 2009 at 5:25 PM, Daniel Stutzbach wrote: > On Mon, Apr 13, 2009 at 3:02 PM, "Martin v. Löwis" > wrote: >> >> > True, I can always convert from bytes to str or vice versa. >> >> I think you are missing the point. It will not be necessary to convert. > > Sometimes I want bytes and sometimes I want str. I am going to be > converting some of the time. ;-) > > Below is a basic CGI application that assumes that json module works with > str, not bytes. How would you write it if the json module does not support > returning a str? > > print("Content-Type: application/json; charset=utf-8") > input_object = json.loads(sys.stdin.read()) > output_object = do_some_work(input_object) > print(json.dumps(output_object)) > print() > Like this?
print("Content-Type: application/json; charset=utf-8") input_object = json.loads(sys.stdin.buffer.read()) output_object = do_some_work(input_object) sys.stdout.buffer.write(json.dumps(output_object).encode('utf-8')) -- Alexandre From rdmurray at bitdance.com Tue Apr 14 01:46:20 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 13 Apr 2009 19:46:20 -0400 (EDT) Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: <49E3CA98.3090504@canterbury.ac.nz> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org> <49E3CA98.3090504@canterbury.ac.nz> Message-ID: On Tue, 14 Apr 2009 at 11:28, Greg Ewing wrote: > Barry Warsaw wrote: >> For an Originator or Destination address, what does str(header) return? > > It should be an error, I think. That doesn't make sense to me. str() should return _something_. --David From greg.ewing at canterbury.ac.nz Tue Apr 14 01:59:55 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Apr 2009 11:59:55 +1200 Subject: [Python-Dev] [Email-SIG] Dropping bytes "support" in json In-Reply-To: References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <20090410051902.12555.1059181741.divmod.xquotient.7720@weber.divmod.com> <49DF8956.5050501@g.nevcal.com> <7DF370A6-88E4-4710-9CF8-B0B3D7249383@python.org> <49E3CA98.3090504@canterbury.ac.nz> Message-ID: <49E3D1FB.9000205@canterbury.ac.nz>
-- Greg From solipsis at pitrou.net Tue Apr 14 01:58:27 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Apr 2009 23:58:27 +0000 (UTC) Subject: [Python-Dev] Dropping bytes "support" in json References: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> <6a36e7290904131328u6d4d3c20g6e12e0fd893523a2@mail.gmail.com> Message-ID: Bob Ippolito redivi.com> writes: > > The output of json/simplejson dumps for Python 2.x is either an ASCII > bytestring (default) or a unicode string (when ensure_ascii=False). > This is very practical in 2.x because an ASCII bytestring can be > treated as either text or bytes in most situations, isn't going to get > mangled over any kind of encoding mismatch (as long as it's an ASCII > superset), and skips an encoding step if getting sent over the wire.. Which means that the json module already deals with text rather than bytes, apart from the optimization that pure ASCII text is returned as 8-bit strings. Regards Antoine. From greg.ewing at canterbury.ac.nz Tue Apr 14 02:00:19 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 14 Apr 2009 12:00:19 +1200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> Message-ID: <49E3D213.4030801@canterbury.ac.nz> Alexandre Vassalotti wrote: >>print("Content-Type: application/json; charset=utf-8") >>input_object = json.loads(sys.stdin.read()) >>output_object = do_some_work(input_object) >>print(json.dumps(output_object)) >>print() That assumes the encoding being used by stdout has ascii as a subset. 
-- Greg From eric at trueblade.com Tue Apr 14 02:05:34 2009 From: eric at trueblade.com (Eric Smith) Date: Mon, 13 Apr 2009 20:05:34 -0400 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> Message-ID: <49E3D34E.8040705@trueblade.com> Mark has uploaded our newest work to Rietveld, again at http://codereview.appspot.com/33084/show. Since the last version, Mark has added 387 support (and other fixes) and I've added localized formatting ('n') back in as well as ',' formatting for float and int. I think this addresses all open issues. If you have time, please review the code on Rietveld. We believe we're ready to merge this back into the py3k branch. Pending any comments here or on Rietveld, we'll do the merge in the next day or so. Before then, if anyone could build and test the py3k-short-float-repr branch on any of the following machines, that would be great: Windows (preferably 64-bit) Itanium Old Intel/Linux (e.g., the snakebite nitrogen box) Something bigendian, like a G4 Mac We're pretty well tested on x86 Mac and Linux, and I've run it once on my Windows 32-bit machine. I have a Snakebite account, and I'll try running on nitrogen once I figure out how to log in again. I just had Itanium and PPC buildbots test our branch, and they both succeeded (or at least failed with errors not related to our changes). Eric. From benjamin at python.org Tue Apr 14 02:54:29 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 13 Apr 2009 19:54:29 -0500 Subject: [Python-Dev] Shorter float repr in Python 3.1? 
In-Reply-To: <49E3D34E.8040705@trueblade.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> Message-ID: <1afaf6160904131754v414a855coff27921490de2a0a@mail.gmail.com> 2009/4/13 Eric Smith : > Mark has uploaded our newest work to Rietveld, again at > http://codereview.appspot.com/33084/show. Since the last version, Mark has > added 387 support (and other fixes) and I've added localized formatting > ('n') back in as well as ',' formatting for float and int. I think this > addresses all open issues. If you have time, please review the code on > Rietveld. > > We believe we're ready to merge this back into the py3k branch. Pending any > comments here or on Rietveld, we'll do the merge in the next day or so. Cool. Will you use svnmerge.py to integrate the branch? After having some odd behavior merging the io-c branch, I suggest you just apply a patch to the py3k branch. -- Regards, Benjamin From eric at trueblade.com Tue Apr 14 03:14:50 2009 From: eric at trueblade.com (Eric Smith) Date: Mon, 13 Apr 2009 21:14:50 -0400 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <1afaf6160904131754v414a855coff27921490de2a0a@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <1afaf6160904131754v414a855coff27921490de2a0a@mail.gmail.com> Message-ID: <49E3E38A.6040204@trueblade.com> Benjamin Peterson wrote: > Cool. Will you use svnmerge.py to integrate the branch? After having > some odd behavior merging the io-c branch, I suggest you just apply a > patch to the py3k branch. We're just going to apply 2 patches, without using svnmerge. First we'll add new files and the configure changes. Once we're sure that builds everywhere, then the second step will actually hook in the new functions and will have the formatting changes.
From steven.bethard at gmail.com Tue Apr 14 03:19:27 2009 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 13 Apr 2009 18:19:27 -0700 Subject: [Python-Dev] [Python-ideas] Proposed addition to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: On Mon, Apr 13, 2009 at 1:14 PM, Mart Sõmermaa wrote: > On Mon, Apr 13, 2009 at 8:23 PM, Steven Bethard wrote: >> On Mon, Apr 13, 2009 at 2:29 AM, Mart Sõmermaa wrote: >> > As for the duplicate handling, I've implemented a threefold strategy that >> > should address all use cases raised before: >> > >> > def add_query_params(*args, **kwargs): >> > """ >> > add_query_params(url, [allow_dups, [args_dict, [separator]]], **kwargs) >> > >> > Appends query parameters to a URL and returns the result. >> > >> > :param url: the URL to update, a string. >> > :param allow_dups: if >> > * True: plainly append new parameters, allowing all duplicates >> > (default), >> > * False: disallow duplicates in values and regroup keys so that >> > different values for the same key are adjacent, >> > * None: disallow duplicates in keys -- each key can have a single >> > value and later values override the value (like dict.update()). >> >> Unnamed flag parameters are unfriendly to the reader. If I see something like: >> >> add_query_params(url, True, dict(a=b, c=d)) >> >> I can pretty much guess what the first and third arguments are, but I >> have no clue for the second. Even if I have read the documentation >> before, I may not remember whether the middle argument is "allow_dups" >> or "keep_dups". > > Keyword arguments are already used for specifying the arguments to the > query, so naming can't be used. Someone may need an 'allow_dups' key > in their query and forget to pass it in params_dict.
> > A default behaviour should be found that works according to most > users' expectations so that they don't need to use the positional > arguments generally. I believe the usual Python approach here is to have two variants of the function, add_query_params and add_query_params_no_dups (or whatever you want to name them). That way the flag parameter is "named" right in the function name. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From nad at acm.org Tue Apr 14 04:07:57 2009 From: nad at acm.org (Ned Deily) Date: Mon, 13 Apr 2009 19:07:57 -0700 Subject: [Python-Dev] Shorter float repr in Python 3.1? References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> Message-ID: In article <49E3D34E.8040705 at trueblade.com>, Eric Smith wrote: > Before then, if anyone could build and test the py3k-short-float-repr > branch on any of the following machines, that would be great: > [...] > Something bigendian, like a G4 Mac I'll crank up some OS X installer builds and run them on G3 and G4 Macs vs 32-/64- Intel. Any tests of interest beyond the default regrtest.py? -- Ned Deily, nad at acm.org From martin at v.loewis.de Tue Apr 14 04:40:10 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 14 Apr 2009 04:40:10 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <663162E3-D2EB-4417-93D0-4764BC94646C@python.org> <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> Message-ID: <49E3F78A.7000307@v.loewis.de> > Below is a basic CGI application that assumes the json module works > with str, not bytes. How would you write it if the json module does not support returning a str?
In a CGI application, you shouldn't be using sys.stdin or print(). Instead, you should be using sys.stdin.buffer (or sys.stdin.buffer.raw), and sys.stdout.buffer.raw. A CGI script essentially does binary IO; if you use TextIO, there likely will be bugs (e.g. if you have attachments of type application/octet-stream). > print("Content-Type: application/json; charset=utf-8") > input_object = json.loads(sys.stdin.read()) > output_object = do_some_work(input_object) > print(json.dumps(output_object)) > print() out = sys.stdout.buffer.raw out.write(b"Content-Type: application/json; charset=utf-8\n\n") input_object = json.loads(sys.stdin.buffer.raw.read()) output_object = do_some_work(input_object) out.write(json.dumps(output_object)) > What's the benefit of preventing users from getting a str out if that's > what they want? If they really want it, there is no benefit from preventing them. I'm just puzzled why they want it, and what possible applications might be where they want it. Perhaps they misunderstand something when they think they want it. Regards, Martin From stephen at xemacs.org Tue Apr 14 09:00:59 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 14 Apr 2009 16:00:59 +0900 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <49E3CA6E.1070501@canterbury.ac.nz> References: <86F681EB-2645-4C8C-B02F-06E9F4344139@python.org> <07025875-59B6-4508-96E5-BAFE4D36FF3B@python.org> <49E3CA6E.1070501@canterbury.ac.nz> Message-ID: <87ocuzu3bo.fsf@xemacs.org> Warning: Reply-To set to email-sig. Greg Ewing writes: > Only for headers known to be unstructured, I think. > Completely unknown headers should be available only > as bytes. Why do I get the feeling that you guys are feeling up an elephant? There are four things you might want to do with a header: (1) Put it on the wire, which must be bytes (in fact, ASCII). 
(2) Show it to a user (such as a rootin-tootin spam-fightin mail admin), which for consistency with well-behaved, implemented headers (ie, you might want to *gasp* *concatenate* your unknown header with a string), will sooner or later be string (ie, Unicode). (3) (Try to) parse it, in which case an internal representation with some other structure may or may not be appropriate for storing the parsed data. (4) Munge it, in which case an internal representation with some other structure may or may not be appropriate. I see no particular reason for restricting these basic API classes for any header. From eric at trueblade.com Tue Apr 14 10:45:28 2009 From: eric at trueblade.com (Eric Smith) Date: Tue, 14 Apr 2009 04:45:28 -0400 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> Message-ID: <49E44D28.3010500@trueblade.com> Ned Deily wrote: > In article <49E3D34E.8040705 at trueblade.com>, > Eric Smith wrote: >> Before then, if anyone could build and test the py3k-short-float-repr >> branch on any of the following machines, that would be great: >> > [...] >> Something bigendian, like a G4 Mac > > I'll crank up some OS X installer builds and run them on G3 and G4 Macs > vs 32-/64- Intel. Any tests of interest beyond the default regrtest.py? Thanks! regrtest.py should be enough. Eric. From nad at acm.org Tue Apr 14 10:45:51 2009 From: nad at acm.org (Ned Deily) Date: Tue, 14 Apr 2009 01:45:51 -0700 Subject: [Python-Dev] Shorter float repr in Python 3.1? References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> Message-ID: In article , Ned Deily wrote: > In article <49E3D34E.8040705 at trueblade.com>, > Eric Smith wrote: > > Before then, if anyone could build and test the py3k-short-float-repr > > branch on any of the following machines, that would be great: > > > [...]
> > Something bigendian, like a G4 Mac > > I'll crank up some OS X installer builds and run them on G3 and G4 Macs > vs 32-/64- Intel. Any tests of interest beyond the default regrtest.py? First attempt was a fat (32-bit i386 and ppc) build on 10.5 targeted for 10.3 and above; this is similar to recent python.org OSX installers. The good news: on 10.5 i386, running the default regrtest, no significant differences were noted from an installer built from the current main py3k head. Bad news: the same build installed on a G4 running 10.5 hung hard in test_pow of test_builtin; a kill was needed to terminate python. Same results on a G3 running 10.4. nad at pbg4:/Library/Frameworks/Python.framework/Versions/3.1$ bin/python -S lib/python3.1/test/regrtest.py -s -v test_builtin test_builtin test_abs (test.test_builtin.BuiltinTest) ... ok test_all (test.test_builtin.BuiltinTest) ... ok test_any (test.test_builtin.BuiltinTest) ... ok test_ascii (test.test_builtin.BuiltinTest) ... ok test_bin (test.test_builtin.BuiltinTest) ... ok test_callable (test.test_builtin.BuiltinTest) ... ok test_chr (test.test_builtin.BuiltinTest) ... ok test_cmp (test.test_builtin.BuiltinTest) ... ok test_compile (test.test_builtin.BuiltinTest) ... ok test_delattr (test.test_builtin.BuiltinTest) ... ok test_dir (test.test_builtin.BuiltinTest) ... ok test_divmod (test.test_builtin.BuiltinTest) ... ok test_eval (test.test_builtin.BuiltinTest) ... ok test_exec (test.test_builtin.BuiltinTest) ... ok test_exec_redirected (test.test_builtin.BuiltinTest) ... ok test_filter (test.test_builtin.BuiltinTest) ... ok test_general_eval (test.test_builtin.BuiltinTest) ... ok test_getattr (test.test_builtin.BuiltinTest) ... ok test_hasattr (test.test_builtin.BuiltinTest) ... ok test_hash (test.test_builtin.BuiltinTest) ... ok test_hex (test.test_builtin.BuiltinTest) ... ok test_id (test.test_builtin.BuiltinTest) ... ok test_import (test.test_builtin.BuiltinTest) ...
ok test_input (test.test_builtin.BuiltinTest) ... ok test_isinstance (test.test_builtin.BuiltinTest) ... ok test_issubclass (test.test_builtin.BuiltinTest) ... ok test_iter (test.test_builtin.BuiltinTest) ... ok test_len (test.test_builtin.BuiltinTest) ... ok test_map (test.test_builtin.BuiltinTest) ... ok test_max (test.test_builtin.BuiltinTest) ... ok test_min (test.test_builtin.BuiltinTest) ... ok test_neg (test.test_builtin.BuiltinTest) ... ok test_next (test.test_builtin.BuiltinTest) ... ok test_oct (test.test_builtin.BuiltinTest) ... ok test_open (test.test_builtin.BuiltinTest) ... ok test_ord (test.test_builtin.BuiltinTest) ... ok test_pow (test.test_builtin.BuiltinTest) ... ^CTerminated Stepping through some of test_pow from the interactive interpreter: Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35) [GCC 4.0.1 (Apple Inc. build 5490)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> pow(0,0) 1 <-- OK >>> pow(2,30) 1073741824 <-- OK >>> pow(0.,0) ^C^CTerminated <-- float argument => python hung in CPU loop, killed Then I tried a couple of random floats: Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35) [GCC 4.0.1 (Apple Inc. build 5490)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> 3.1 -9.255965342383856e+61 >>> 1. ^C Terminated <-- kill needed The same tests work fine on the intel Mac. Just out of curiosity, I'll try to do the same build on the 10.4 ppc; there are occasionally a few differences noted in the build results. That won't be available until later today. 
-- Ned Deily, nad at acm.org From l.mastrodomenico at gmail.com Tue Apr 14 10:54:04 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Tue, 14 Apr 2009 10:54:04 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: References: <49DEBB21.70305@gmail.com> <20090410052836.12555.1364055898.divmod.xquotient.7734@weber.divmod.com> <49E00931.6050107@v.loewis.de> <49E37416.5030802@v.loewis.de> <49E39A49.9070507@v.loewis.de> Message-ID: 2009/4/13 Daniel Stutzbach : > print("Content-Type: application/json; charset=utf-8") Please don't do that! According to RFC 4627 the "charset" parameter is not allowed for the application/json media type. Just use "Content-Type: application/json", the charset is only misleading because even if you specify, e.g., ISO-8859-1 a standard-compliant receiver will probably still try to interpret the content as UTF-8/16/32. OTOH a charset can be used if you send JSON with an application/javascript MIME type. -- Lino Mastrodomenico From dickinsm at gmail.com Tue Apr 14 12:31:21 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 11:31:21 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> Message-ID: <5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com> On Tue, Apr 14, 2009 at 9:45 AM, Ned Deily wrote: > Ned Deily wrote: >> Eric Smith wrote: >> > Before then, if anyone could build and test the py3k-short-float-repr >> > branch on any of the following machines, that would be great: >> > >> [...] >> > Something bigendian, like a G4 Mac >> >> I'll crank up some OS X installer builds and run them on G3 and G4 Macs >> vs 32-/64- Intel. Any tests of interest beyond the default regttest.py? Ned, many thanks for doing this! > Then I tried a couple of random floats: > > Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35) > [GCC 4.0.1 (Apple Inc. 
build 5490)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> 3.1 > -9.255965342383856e+61 >>>> 1. > ^C > Terminated <-- kill needed Cool! I suspect endianness issues. As evidence, I present: >>> list(struct.pack('<d', 3.1)) [...] >>> list(struct.pack('>d', 3.1)) [...] Mark From eric at trueblade.com (Eric Smith) Subject: [Python-Dev] Shorter float repr in Python 3.1? References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> Message-ID: <49E466A8.9050306@trueblade.com> Ned Deily wrote: >> I'll crank up some OS X installer builds and run them on G3 and G4 Macs >> vs 32-/64- Intel. Any tests of interest beyond the default regrtest.py? > > First attempt was a fat (32-bit i386 and ppc) build on 10.5 targeted for > 10.3 and above; this is similar to recent python.org OSX installers. > The good news: on 10.5 i386, running the default regrtest, no significant > differences were noted from an installer built from the current main > py3k head. Okay, that's awesome. Thanks. > Bad news: the same build installed on a G4 running 10.5 hung hard in > test_pow of test_builtin; a kill was needed to terminate python. Same > results on a G3 running 10.4. Okay, that's less than awesome. But still a huge thanks. > Then I tried a couple of random floats: > > Python 3.1a2+ (py3k-short-float-repr, Apr 13 2009, 20:55:35) > [GCC 4.0.1 (Apple Inc. build 5490)] on darwin > Type "help", "copyright", "credits" or "license" for more information. >>>> 3.1 > -9.255965342383856e+61 >>>> 1. > ^C > Terminated <-- kill needed I don't suppose it's possible that you could run this under gdb and get a stack trace when it starts looping (assuming that's what's happening)? I think I might have a PPC Mac Mini I can get my hands on, and I'll test there if possible. Eric. From dickinsm at gmail.com Tue Apr 14 12:37:38 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 11:37:38 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1?
In-Reply-To: <5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com> Message-ID: <5c6f2a5d0904140337m394239f2w617488b18e41a198@mail.gmail.com> By the way, a simple native build on OS X 10.4/PPC passed all tests (that we're already failing before). Mark From dickinsm at gmail.com Tue Apr 14 12:42:09 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 11:42:09 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <5c6f2a5d0904140337m394239f2w617488b18e41a198@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com> <5c6f2a5d0904140337m394239f2w617488b18e41a198@mail.gmail.com> Message-ID: <5c6f2a5d0904140342s1567cdefyd9c1d9ddab089192@mail.gmail.com> On Tue, Apr 14, 2009 at 11:37 AM, Mark Dickinson wrote: > By the way, a simple native build on OS X 10.4/PPC passed all tests (that > we're already failing before). s/we're/weren't From solipsis at pitrou.net Tue Apr 14 12:44:13 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Apr 2009 10:44:13 +0000 (UTC) Subject: [Python-Dev] Shorter float repr in Python 3.1? References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140331u33b29136o55ad0286a335326e@mail.gmail.com> Message-ID: Mark Dickinson gmail.com> writes: > > But I'd expect that there are already similar issues > with a 'fat' build of py3k on OS X. After all, there's > already a 'WORDS_BIGENDIAN' in pyconfig.h.in. I > don't know where this is used. It's used e.g. in unicode encoding/decoding, and in the IO lib. If that constant can't take different values depending on the CPU arch, we have a big problem. Regards Antoine. 
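Antoine's point about WORDS_BIGENDIAN can be made concrete: a fat binary runs the same compiled code on CPUs of either byte order, so anything baked in at configure time is suspect, and the alternative is probing at runtime the way the struct module permits. A minimal standard-library sketch, for readers following along (illustration only — this is not code from the py3k-short-float-repr branch):

```python
import struct
import sys

# sys.byteorder reports the native *integer* byte order at runtime.
native_order = sys.byteorder  # 'little' or 'big'

# The same fact can be recovered by packing an integer with native
# byte order and checking which end the low byte lands on.
packed = struct.pack('=I', 1)
assert (packed[0:1] == b'\x01') == (native_order == 'little')

# Floats deserve their own probe: pack a known double natively and
# compare against the two explicit byte orders.  On almost all current
# hardware the float order matches the integer order, but mixed-endian
# float formats have existed (e.g. old ARM FPA), which is why code like
# marshal, pickle and struct detects float byte order separately.
native_double = struct.pack('=d', 3.1)
if native_double == struct.pack('<d', 3.1):
    float_order = 'little'
elif native_double == struct.pack('>d', 3.1):
    float_order = 'big'
else:
    float_order = 'mixed'

print(native_order, float_order)
```

On a little-endian Intel Mac this prints "little little"; on a big-endian G4 it would print "big big", with no recompilation involved.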
From dickinsm at gmail.com Tue Apr 14 14:40:36 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 13:40:36 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> Message-ID: <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> On Tue, Apr 14, 2009 at 9:45 AM, Ned Deily wrote: > First attempt was a fat (32-bit i386 and ppc) build on 10.5 targeted for > 10.3 and above; this is similar to recent python.org OSX installers. What's the proper way to create such a build? I've been trying: ./configure --with-universal-archs=32-bit --enable-framework --enable-universalsdk=/ MACOSX_DEPLOYMENT_TARGET=10.5 but the configure AC_C_BIGENDIAN macro doesn't seem to pick up on the universality: the output from ./configure contains the line: checking whether byte ordering is bigendian... no I was expecting a "... universal" instead of "... no". From reading the autoconf manual, it seems as though AC_C_BIGENDIAN knows some magic to make things work for universal builds; it ought to be possible to imitate that magic somehow. Mark From solipsis at pitrou.net Tue Apr 14 16:42:31 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Apr 2009 14:42:31 +0000 (UTC) Subject: [Python-Dev] UTF-8 Decoder References: <20090413080908.GM13110@nexus.in-nomine.org> Message-ID: Jeroen Ruigrok van der Werven in-nomine.org> writes: > > This got posted on the Unicode list, does it seem interesting for Python > itself, the UTF-8 to UTF-16 transcoding might be? > > http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ If you have some time on your hands, you could try benchmarking it against Python 3.1's (py3k) decoder. There are two cases to consider: - mostly non-ASCII input, such as the "utf-8 demo" file mentioned in the page above - mostly ASCII input, such as will happen very often (think HTML, XML, log files, etc.)
The py3k utf-8 decoder is optimized for the latter. Regards Antoine. From mal at egenix.com Tue Apr 14 17:02:39 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 14 Apr 2009 17:02:39 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090407174355.B62983A4063@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> Message-ID: <49E4A58F.70309@egenix.com> On 2009-04-07 19:46, P.J. Eby wrote: > At 04:58 PM 4/7/2009 +0200, M.-A. Lemburg wrote: >> On 2009-04-07 16:05, P.J. Eby wrote: >> > At 02:30 PM 4/7/2009 +0200, M.-A. Lemburg wrote: >> >> >> Wouldn't it be better to stick with a simpler approach and look for >> >> >> "__pkg__.py" files to detect namespace packages using that O(1) >> >> check ? >> >> > >> >> > Again - this wouldn't be O(1). More importantly, it breaks system >> >> > packages, which now again have to deal with the conflicting file >> names >> >> > if they want to install all portions into a single location. >> >> >> >> True, but since that means changing the package infrastructure, I >> think >> >> it's fair to ask distributors who want to use that approach to also >> take >> >> care of looking into the __pkg__.py files and merging them if >> >> necessary. >> >> >> >> Most of the time the __pkg__.py files will be empty, so that's not >> >> really much to ask for. >> > >> > This means your proposal actually doesn't add any benefit over the >> > status quo, where you can have an __init__.py that does nothing but >> > declare the package a namespace. We already have that now, and it >> > doesn't need a new filename. Why would we expect OS vendors to start >> > supporting it, just because we name it __pkg__.py instead of >> __init__.py? >> >> I lost you there. 
>> >> Since when do we support namespace packages in core Python without >> the need to add some form of magic support code to __init__.py ? >> >> My suggestion basically builds on the same idea as Martin's PEP, >> but uses a single __pkg__.py file as opposed to some non-Python >> file yaddayadda.pkg. > > Right... which completely obliterates the primary benefit of the > original proposal compared to the status quo. That is, that the PEP 382 > way is more compatible with system packaging tools. > > Without that benefit, there's zero gain in your proposal over having > __init__.py files just call pkgutil.extend_path() (in the stdlib since > 2.3, btw) or pkg_resources.declare_namespace() (similar functionality, > but with zipfile support and some other niceties). > > IOW, your proposal doesn't actually improve the status quo in any way > that I am able to determine, except that it calls for loading all the > __pkg__.py modules, rather than just the first one. (And the setuptools > implementation of namespace packages actually *does* load multiple > __init__.py's, so that's still no change over the status quo for > setuptools-using packages.) The purpose of the PEP is to create a standard for namespace packages. That's orthogonal to trying to enhance or change some existing techniques. I don't see the emphasis in the PEP on Linux distribution support and the remote possibility of them wanting to combine separate packages back into one package as good argument for adding yet another separate hierarchy of special files which Python scans during imports. That said, note that most distributions actually take the other route: they try to split up larger packages into smaller ones, so the argument becomes even weaker. It is much more important to standardize the approach than to try to extend some existing trickery and make them even more opaque than they already are by introducing yet another level of complexity. 
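For concreteness, the status quo referred to in the quoted text — __init__.py files that do nothing but call pkgutil.extend_path(), in the stdlib since 2.3 — can be exercised end to end. A self-contained sketch; the nspkg namespace and the portion1/portion2 layout are invented for illustration:

```python
import os
import sys
import tempfile

# Each independently-installed portion of the namespace ships an
# *identical* two-line __init__.py containing only the stdlib
# boilerplate -- which is exactly the duplicate-file situation that
# system packaging tools object to.
BOILERPLATE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)

base = tempfile.mkdtemp()
for portion, module in (("portion1", "mod1"), ("portion2", "mod2")):
    pkg_dir = os.path.join(base, portion, "nspkg")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(BOILERPLATE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("value = %r\n" % module)

# Both portions end up on sys.path, as if installed to two locations.
sys.path[:0] = [os.path.join(base, "portion1"), os.path.join(base, "portion2")]

import nspkg.mod1  # runs portion1's __init__.py, which extends __path__
import nspkg.mod2  # found in portion2 thanks to the extended __path__

print(nspkg.mod1.value, nspkg.mod2.value)
```

The point of contention is the BOILERPLATE file: every portion must carry its own copy, and only the first copy found on sys.path is ever executed.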
My alternative approach builds on existing methods and fits nicely with the __init__.py approach Python has already been using for more than a decade now. It's transparent, easy to understand and provides enough functionality to build upon - much like the original __init__.py idea. I've already laid out the arguments for and against it in my previous reply, so won't repeat them here. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 14 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Tue Apr 14 17:17:07 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 14 Apr 2009 17:17:07 +0200 Subject: [Python-Dev] Adding new features to Python 2.x In-Reply-To: References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <4222a8490904060621y36285ca4g73fafe85d4b9c146@mail.gmail.com> <49DB4624.604@egenix.com> Message-ID: <49E4A8F3.7010202@egenix.com> On 2009-04-07 18:19, Guido van Rossum wrote: > On Tue, Apr 7, 2009 at 5:25 AM, M.-A. Lemburg wrote: >> On 2009-04-06 15:21, Jesse Noller wrote: >>> On Thu, Apr 2, 2009 at 4:33 PM, M.-A. Lemburg wrote: >>>> On 2009-04-02 17:32, Martin v. L?wis wrote: >>>>> I propose the following PEP for inclusion to Python 3.1. >>>> Thanks for picking this up. >>>> >>>> I'd like to extend the proposal to Python 2.7 and later. >>>> >>> -1 to adding it to the 2.x series. 
There was much discussion around >>> adding features to 2.x *and* 3.0, and the consensus seemed to *not* >>> add new features to 2.x and use those new features as carrots to help >>> lead people into 3.0. >> I must have missed that discussion :-) >> >> Where's the PEP pinning this down ? >> >> The Python 2.x user base is huge and the number of installed >> applications even larger. >> >> Cutting these users and application developers off of important new >> features added to Python 3 is only going to work as "carrot" for >> those developers who: >> >> * have enough resources (time, money, manpower) to port their existing >> application to Python 3 >> >> * can persuade their users to switch to Python 3 >> >> * don't rely much on 3rd party libraries (the bread and butter >> of Python applications) >> >> Realistically, such a porting effort is not likely going to happen >> for any decent sized application, except perhaps a few open source >> ones. >> >> Such a policy would then translate to a dead end for Python 2.x >> based applications. > > Think of the advantages though! Python 2 will finally become *stable*. > The group of users you are talking to are usually balking at the > thought of upgrading from 2.x to 2.(x+1) just as much as they might > balk at the thought of Py3k. We're finally giving them what they > really want. Python 2.x is stable - much more than 3.x is today. However, stable does not mean zero development, which a "No new features in Python 2.x" policy would translate to. If there are core developers that care about 2.x, then it should be possible for them to add the necessary patches to future 2.x releases. > Regarding calling this a dead end, we're committed to supporting 2.x > for at least five years. If that's not enough, well, it's open source, > so there's no reason why some group of rogue 2.x fans can't maintain > it indefinitely after that. Sure, but why can't this be done within the existing Python developer community ? 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 14 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dickinsm at gmail.com Tue Apr 14 18:09:35 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 17:09:35 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> Message-ID: <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Okay, I think I might have fixed up the float endianness detection for universal builds on OS X. Ned, any chance you could give this another try with an updated version of the py3k-short-float-repr branch? One thing I don't understand: Is it true that to produce a working universal/fat build of Python, one has to first regenerate configure and pyconfig.h.in using autoconf version >= 2.62? If not, then I don't understand how the AC_C_BIGENDIAN autoconf macro can be giving the right results. Mark From solipsis at pitrou.net Tue Apr 14 18:14:32 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Apr 2009 16:14:32 +0000 (UTC) Subject: [Python-Dev] Shorter float repr in Python 3.1? 
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Message-ID: Mark Dickinson gmail.com> writes: > > Okay, I think I might have fixed up the float endianness detection for > universal builds on OS X. Ned, any chance you could give this > another try with an updated version of the py3k-short-float-repr branch? If this approach is sane, could it be adopted for all other instances of endianness detection in the py3k code base? Has anyone tested a recent py3k using universal builds? Do all tests pass? From pje at telecommunity.com Tue Apr 14 18:27:51 2009 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 14 Apr 2009 12:27:51 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49E4A58F.70309@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> Message-ID: <20090414162603.70C843A4100@sparrow.telecommunity.com> At 05:02 PM 4/14/2009 +0200, M.-A. Lemburg wrote: >I don't see the emphasis in the PEP on Linux distribution support and the >remote possibility of them wanting to combine separate packages back >into one package as good argument for adding yet another separate hierarchy >of special files which Python scans during imports. > >That said, note that most distributions actually take the other route: >they try to split up larger packages into smaller ones, so the argument >becomes even weaker. I think you've misunderstood something about the use case. System packaging tools don't like separate packages to contain the *same file*. 
That means that they *can't* split a larger package up with your proposal, because every one of those packages would have to contain a __pkg__.py -- and thus be in conflict with each other. Either that, or they would have to make a separate system package containing *only* the __pkg__.py, and then make all packages using the namespace depend on it -- which is more work and requires greater co-ordination among packagers. Allowing each system package to contain its own .pkg or .nsp or whatever files, on the other hand, allows each system package to be built independently, without conflict between contents (i.e., having the same file), and without requiring a special pseudo-package to contain the additional file. Also, executing multiple __pkg__.py files means that when multiple system packages are installed to site-packages, only one of them could possibly be executed. (Note that, even though the system packages themselves are not "combined", in practice they will all be installed to the same directory, i.e., site-packages or the platform equivalent thereof.) From dickinsm at gmail.com Tue Apr 14 18:30:18 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 17:30:18 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Message-ID: <5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com> On Tue, Apr 14, 2009 at 5:14 PM, Antoine Pitrou wrote: > If this approach is sane, could it be adopted for all other instances of > endianness detection in the py3k code base? I think everything else is fine: float endianness detection (for marshal, pickle, struct) is done at runtime. Integer endianness detection goes via AC_C_BIGENDIAN, which understands universal builds---but only for autoconf >= 2.62. 
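The runtime float-endianness check Mark describes (for marshal, pickle, struct) boils down to packing a known double and seeing which byte order comes out. The sketch below is only an illustration of the idea in pure Python — CPython does the equivalent in C — and the helper name is made up:

```python
import struct
import sys

def double_byte_order():
    """Classify the in-memory byte order of a C double at runtime."""
    native = struct.pack('=d', 1.5)      # native-order IEEE 754 double
    if native == struct.pack('>d', 1.5):
        return 'big'
    if native == struct.pack('<d', 1.5):
        return 'little'
    return 'unknown'                     # e.g. legacy mixed-endian ARM doubles

print(double_byte_order(), sys.byteorder)
```

Because this runs on whatever architecture slice of a universal binary is actually executing, it sidesteps the problem that a single compile-time AC_C_BIGENDIAN answer cannot be right for both PPC and Intel slices at once.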
> Has anyone tested a recent py3k using universal builds? Do all tests pass? Do you know the right way to create a universal build? If so, I'm in a position to test on 32-bit PPC, 32-bit Intel and 64-bit Intel. Mark From solipsis at pitrou.net Tue Apr 14 18:49:19 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 14 Apr 2009 16:49:19 +0000 (UTC) Subject: [Python-Dev] Shorter float repr in Python 3.1? References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> <5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com> Message-ID: Mark Dickinson gmail.com> writes: > > > Has anyone tested a recent py3k using universal builds? Do all tests pass? > > Do you know the right way to create a universal build? Not at all, sorry. Regards Antoine. From dickinsm at gmail.com Tue Apr 14 18:52:23 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 17:52:23 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> <5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com> Message-ID: <5c6f2a5d0904140952sc82b8d8x88bc9a77d4dc340e@mail.gmail.com> On Tue, Apr 14, 2009 at 5:49 PM, Antoine Pitrou wrote: > Mark Dickinson gmail.com> writes: >> Do you know the right way to create a universal build? > > Not at all, sorry. No problem :). I might try asking on the pythonmac-sig list. Mark From nad at acm.org Tue Apr 14 19:19:32 2009 From: nad at acm.org (Ned Deily) Date: Tue, 14 Apr 2009 10:19:32 -0700 Subject: [Python-Dev] Shorter float repr in Python 3.1? 
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Message-ID: In article , Antoine Pitrou wrote: > Has anyone tested a recent py3k using universal builds? Do all tests pass? It's done all the time. All of the current released installers (2.5, 2.6, 3.0) are 2-way (i386, ppc) universal and we occasionally test all of the current lines (2.6, trunk, 3.0, 3.1) as 4-way (i386, ppc, x86_64, ppc64), although the ppc64 has had no testing recently. -- Ned Deily, nad at acm.org From nad at acm.org Tue Apr 14 19:22:12 2009 From: nad at acm.org (Ned Deily) Date: Tue, 14 Apr 2009 10:22:12 -0700 Subject: [Python-Dev] Shorter float repr in Python 3.1? References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Message-ID: In article <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8 at mail.gmail.com>, Mark Dickinson wrote: > Okay, I think I might have fixed up the float endianness detection for > universal builds on OS X. Ned, any chance you could give this > another try with an updated version of the py3k-short-float-repr branch? Not looking good. Appears to be same behavior on the G4 with 10.5 (haven't tried the G3 yet). -- Ned Deily, nad at acm.org From nad at acm.org Tue Apr 14 19:32:32 2009 From: nad at acm.org (Ned Deily) Date: Tue, 14 Apr 2009 10:32:32 -0700 Subject: [Python-Dev] Shorter float repr in Python 3.1? 
References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> <5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com> Message-ID: In article <5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9 at mail.gmail.com>, Mark Dickinson wrote: > Do you know the right way to create a universal build? If so, I'm in a > position > to test on 32-bit PPC, 32-bit Intel and 64-bit Intel. The OSX installer script is in Mac/BuildScript/build-installer.py. For 2-way builds, it essentially does: export MACOSX_DEPLOYMENT_TARGET=10.3 configure -C --enable-framework --enable-universalsdk=/Developer/SDKs/MacOSX10.4u.sdk --with-universal-archs='32-bit' --with-computed-gotos OPT='-g -O3' and for 4-way: export MACOSX_DEPLOYMENT_TARGET=10.5 configure -C --enable-framework --enable-universalsdk=/Developer/SDKs/MacOSX10.5.sdk --with-universal-archs='all' --with-computed-gotos OPT='-g -O3' -- Ned Deily, nad at acm.org From martin at v.loewis.de Tue Apr 14 19:55:36 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 14 Apr 2009 19:55:36 +0200 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Message-ID: <49E4CE18.8070109@v.loewis.de> > Is it true that to produce a working universal/fat build of Python, > one has to first regenerate configure and pyconfig.h.in using autoconf > version >= 2.62? If not, then I don't understand how the > AC_C_BIGENDIAN autoconf macro can be giving the right results. The outcome of AC_C_BIGENDIAN isn't used on OSX. 
Depending on the exact version you look at, things might work differently; in trunk, Include/pymacconfig.h should be used, which does #if defined(__APPLE__) # undef WORDS_BIGENDIAN #ifdef __BIG_ENDIAN__ #define WORDS_BIGENDIAN 1 #endif /* __BIG_ENDIAN */ #endif Earlier versions included that ifdef block directly in pyconfig.h.in. In case it isn't clear how this works: GCC predefines __BIG_ENDIAN__ on PPC but not on x86; for universal binaries, two (or more) separate preprocessor (and compiler runs) are done. HTH, Martin From martin at v.loewis.de Tue Apr 14 19:56:53 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 14 Apr 2009 19:56:53 +0200 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Message-ID: <49E4CE65.5080503@v.loewis.de> > If this approach is sane, could it be adopted for all other instances of > endianness detection in the py3k code base? Don't worry - the approach that we already take is already sane, so no further changes are needed. Regards, Martin From dickinsm at gmail.com Tue Apr 14 20:27:29 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 19:27:29 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <49E4CE18.8070109@v.loewis.de> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> <49E4CE18.8070109@v.loewis.de> Message-ID: <5c6f2a5d0904141127i2089d6b6n37dc1cadbbec23fe@mail.gmail.com> On Tue, Apr 14, 2009 at 6:55 PM, "Martin v. L?wis" wrote: > The outcome of AC_C_BIGENDIAN isn't used on OSX. 
Depending on the exact > version you look at, things might work differently; in trunk, > Include/pymacconfig.h should be used [...] Many thanks---that was the missing piece of the puzzle. I think I understand how to make things work now. Mark From dickinsm at gmail.com Tue Apr 14 20:30:23 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 14 Apr 2009 19:30:23 +0100 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> <5c6f2a5d0904140930y7dc7cf4fg496d50fd34f89ac9@mail.gmail.com> Message-ID: <5c6f2a5d0904141130l528bca3cr6eb2c6213d79cc9e@mail.gmail.com> On Tue, Apr 14, 2009 at 6:32 PM, Ned Deily wrote: > The OSX installer script is in Mac/BuildScript/build-installer.py. > > For 2-way builds, it essentially does: > > export MACOSX_DEPLOYMENT_TARGET=10.3 > configure -C --enable-framework > --enable-universalsdk=/Developer/SDKs/MacOSX10.4u.sdk > --with-universal-archs='32-bit' --with-computed-gotos OPT='-g -O3' Great---thank you! And thank you for all the testing. I'll try to sort all this out later this evening (GMT+1); I think I understand how to fix everything now. Mark From jbaker at zyasoft.com Tue Apr 14 20:30:34 2009 From: jbaker at zyasoft.com (Jim Baker) Date: Tue, 14 Apr 2009 12:30:34 -0600 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <49DB7412.9030404@voidspace.org.uk> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49DB7412.9030404@voidspace.org.uk> Message-ID: I rather like supporting short float representation. Given that CPython is adopting it, I'm sure Jython will adopt this approach too as part of a future Jython 3.x release. - Jim On Tue, Apr 7, 2009 at 9:41 AM, Michael Foord wrote: > Mark Dickinson wrote: >> [snip...]
>> Discussion points >> ================= >> >> (1) Any objections to including this into py3k? If there's >> controversy, then I guess we'll need a PEP. >> >> > > Big +1 > >> (2) Should other Python implementations (Jython, >> IronPython, etc.) be expected to use short float repr, or should >> it just be considered an implementation detail of CPython? >> I propose the latter, except that all implementations should >> be required to satisfy eval(repr(x)) == x for finite floats x. >> >> > Short float repr should be an implementation detail, so long as > eval(repr(x)) == x still holds. > > Michael Foord > > -- > http://www.ironpythoninaction.com/ > http://www.voidspace.org.uk/blog > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/jbaker%40zyasoft.com > -- Jim Baker jbaker at zyasoft.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ronaldoussoren at mac.com Tue Apr 14 20:30:16 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 14 Apr 2009 20:30:16 +0200 Subject: [Python-Dev] Shorter float repr in Python 3.1? In-Reply-To: <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> References: <5c6f2a5d0904070739v298013d8p3bd895e832a306bc@mail.gmail.com> <49E3D34E.8040705@trueblade.com> <5c6f2a5d0904140540y670d22d5t2f761eea8f29b314@mail.gmail.com> <5c6f2a5d0904140909x417d225ejd845de9c5c7802c8@mail.gmail.com> Message-ID: <60D4A7E5-5E09-4D53-8E7B-72E5D0321A61@mac.com> On 14 Apr, 2009, at 18:09, Mark Dickinson wrote: > Okay, I think I might have fixed up the float endianness detection for > universal builds on OS X. Ned, any chance you could give this > another try with an updated version of the py3k-short-float-repr > branch? 
> > One thing I don't understand: > > Is it true that to produce a working universal/fat build of Python, > one has to first regenerate configure and pyconfig.h.in using autoconf > version >= 2.62? If not, then I don't understand how the > AC_C_BIGENDIAN autoconf macro can be giving the right results. It cannot, the actual bigendian detection for universal build is done in pymacconfig.h. I have given up on getting pyconfig.h right for universal builds, especially when dealing with 4-way universal builds. Ronald > > Mark > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ronaldoussoren%40mac.com -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2224 bytes Desc: not available URL: From mal at egenix.com Tue Apr 14 22:59:39 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 14 Apr 2009 22:59:39 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090414162603.70C843A4100@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> Message-ID: <49E4F93B.6010802@egenix.com> On 2009-04-14 18:27, P.J. Eby wrote: > At 05:02 PM 4/14/2009 +0200, M.-A. Lemburg wrote: >> I don't see the emphasis in the PEP on Linux distribution support and the >> remote possibility of them wanting to combine separate packages back >> into one package as good argument for adding yet another separate >> hierarchy >> of special files which Python scans during imports. 
>> That said, note that most distributions actually take the other route: >> they try to split up larger packages into smaller ones, so the argument >> becomes even weaker. > > I think you've misunderstood something about the use case. System > packaging tools don't like separate packages to contain the *same > file*. That means that they *can't* split a larger package up with your > proposal, because every one of those packages would have to contain a > __pkg__.py -- and thus be in conflict with each other. Either that, or > they would have to make a separate system package containing *only* the > __pkg__.py, and then make all packages using the namespace depend on it > -- which is more work and requires greater co-ordination among packagers. You are missing the point: When breaking up a large package that lives in site-packages into smaller distribution bundles, you don't need namespace packages at all, so the PEP doesn't apply. The way this works is by having a base distribution bundle that includes the needed __init__.py file and a set of extension bundles that add other files to the same directory (without including another copy of __init__.py). The extension bundles include a dependency on the base package to make sure that it always gets installed first. Debian has been using that approach for egenix-mx-base for years. Works great: http://packages.debian.org/source/lenny/egenix-mx-base eGenix has been using that approach for mx package add-ons as well - long before "namespace" packages were given that name :-) Please note that the PEP is about providing ways to have package parts live on sys.path that reintegrate themselves into a single package at import time. As such it's targeting Python developers that want to ship add-ons to existing packages, not Linux distributions (they usually have their own ideas about what goes where - something that's completely out-of-scope for the PEP).
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 14 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Wed Apr 15 02:32:34 2009 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 14 Apr 2009 20:32:34 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49E4F93B.6010802@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> Message-ID: <20090415003026.B0A783A4114@sparrow.telecommunity.com> At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote: >You are missing the point: When breaking up a large package that lives in >site-packages into smaller distribution bundles, you don't need namespace >packages at all, so the PEP doesn't apply. > >The way this works is by having a base distribution bundle that includes >the needed __init__.py file and a set of extension bundles the add >other files to the same directory (without including another copy of >__init__.py). The extension bundles include a dependency on the base >package to make sure that it always gets installed first. 
If we're going to keep that practice, there's no point to having the PEP: all three methods (base+extensions, pkgutil, setuptools) all work just fine as they are, with no changes to importing or the stdlib. In particular, without the feature of being able to drop that practice, there would be no reason for setuptools to adopt the PEP. That's why I'm -1 on your proposal: it's actually inferior to the methods we already have today. From dan.eloff at gmail.com Wed Apr 15 03:01:55 2009 From: dan.eloff at gmail.com (Dan Eloff) Date: Tue, 14 Apr 2009 20:01:55 -0500 Subject: [Python-Dev] Why does read() return bytes instead of bytearray? Message-ID: <4817b6fc0904141801q4db6f240xe5f429763d1440d1@mail.gmail.com> Hi, Can someone please explain why read() should return an immutable bytes type instead of a mutable bytearray? It's not like read() from a file and use buffer as a key in a dict is common. Certainly read() from file or stream, modify, write is very common. I don't understand why the common case pays the price in performance and simplicity. It seemed to me that the immutable bytes was described as being useful in niche situations, but it actually seems to have been favored over bytearray in Python 3. Was there was a good reason for this decision? Or was this just an artifact in the change to two bytes types? The reason I ask is I have a server application that is mostly stream reading/writing on the hot path and in Python 2.5 the redundant copies add up to a significant overhead, (I estimate as much as 25% from my measurements) I was looking at Python 3 as a way to solve that problem, but unfortunately it doesn't look like it will help. Thanks, -Dan From amauryfa at gmail.com Wed Apr 15 03:50:06 2009 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 15 Apr 2009 03:50:06 +0200 Subject: [Python-Dev] Why does read() return bytes instead of bytearray? 
In-Reply-To: <4817b6fc0904141801q4db6f240xe5f429763d1440d1@mail.gmail.com> References: <4817b6fc0904141801q4db6f240xe5f429763d1440d1@mail.gmail.com> Message-ID: Hello, On Wed, Apr 15, 2009 at 03:01, Dan Eloff wrote: > Hi, > > Can someone please explain why read() should return an immutable bytes > type instead of a mutable bytearray? It's not like read() from a file > and use buffer as a key in a dict is common. Certainly read() from > file or stream, modify, write is very common. I don't understand why > the common case pays the price in performance and simplicity. It > seemed to me that the immutable bytes was described as being useful in > niche situations, but it actually seems to have been favored over > bytearray in Python 3. > > Was there was a good reason for this decision? Or was this just an > artifact in the change to two bytes types? No, the read() method did not change from the 2.x series. It returns a new object on each call. > The reason I ask is I have a server application that is mostly stream > reading/writing on the hot path and in Python 2.5 the redundant copies > add up to a significant overhead, (I estimate as much as 25% from my > measurements) I was looking at Python 3 as a way to solve that > problem, but unfortunately it doesn't look like it will help. Files opened in binary mode have a readinto() method, which fills the given bytearray. Is this what you are looking for? -- Amaury Forgeot d'Arc From dan.eloff at gmail.com Wed Apr 15 05:05:43 2009 From: dan.eloff at gmail.com (Dan Eloff) Date: Tue, 14 Apr 2009 22:05:43 -0500 Subject: [Python-Dev] Why does read() return bytes instead of bytearray? Message-ID: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com> >No, the read() method did not change from the 2.x series. It returns a new object on each call. 
I think you misunderstand me, but the readinto() method looks like a perfectly reasonable solution, I didn't realize it existed, as it's not in the library reference on file objects. Thanks for enlightening me, I feel a little stupid now :) Python 3, lookout, here I come! -Dan From mal at egenix.com Wed Apr 15 09:51:30 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 15 Apr 2009 09:51:30 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415003026.B0A783A4114@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> Message-ID: <49E59202.6050809@egenix.com> On 2009-04-15 02:32, P.J. Eby wrote: > At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote: >> You are missing the point: When breaking up a large package that lives in >> site-packages into smaller distribution bundles, you don't need namespace >> packages at all, so the PEP doesn't apply. >> >> The way this works is by having a base distribution bundle that includes >> the needed __init__.py file and a set of extension bundles the add >> other files to the same directory (without including another copy of >> __init__.py). The extension bundles include a dependency on the base >> package to make sure that it always gets installed first. > > If we're going to keep that practice, there's no point to having the > PEP: all three methods (base+extensions, pkgutil, setuptools) all work > just fine as they are, with no changes to importing or the stdlib. Again: the PEP is about creating a standard for namespace packages. 
It's not about making namespace packages easy to use for Linux distribution maintainers. Instead, it's targeting *developers* that want to enable shipping a single package in multiple, separate pieces, giving the user the freedom to select the ones she needs. Of course, this is possible today using various other techniques. The point is that there is no standard for namespace packages and that's what the PEP is trying to solve. > In particular, without the feature of being able to drop that practice, > there would be no reason for setuptools to adopt the PEP. That's why > I'm -1 on your proposal: it's actually inferior to the methods we > already have today. It's simpler and more in line with the Python Zen, not inferior. You are free not to support it in setuptools - the methods implemented in setuptools will continue to work as they are, but continue to require support code and, over time, no longer be compatible with other tools building upon the standard defined in the PEP. In the end, it's the user that decides: whether to go with a standard or not. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 15 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From alessiogiovanni.baroni at gmail.com Wed Apr 15 10:05:13 2009 From: alessiogiovanni.baroni at gmail.com (Alessio Giovanni Baroni) Date: Wed, 15 Apr 2009 10:05:13 +0200 Subject: [Python-Dev] IDLE timeout. Message-ID: Hi to all, I write on this list, because the error concerns the internals (I think). The IDLE has a strange behaviour. Sometimes, randomly, the IDLE restart the interpreter, with the follow exception on console: ---------------------------------------- Unhandled server exception! Thread: SockThread Client Address: ('127.0.0.1', 8833) Request: Traceback (most recent call last): File "/opt/python301/lib/python3.0/socketserver.py", line 281, in _handle_request_noblock self.process_request(request, client_address) File "/opt/python301/lib/python3.0/socketserver.py", line 307, in process_request self.finish_request(request, client_address) File "/opt/python301/lib/python3.0/socketserver.py", line 320, in finish_request self.RequestHandlerClass(request, client_address, self) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 503, in __init__ socketserver.BaseRequestHandler.__init__(self, sock, addr, svr) File "/opt/python301/lib/python3.0/socketserver.py", line 614, in __init__ self.handle() File "/opt/python301/lib/python3.0/idlelib/run.py", line 259, in handle rpc.RPCHandler.getresponse(self, myseq=None, wait=0.05) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 280, in getresponse response = self._getresponse(myseq, wait) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 300, in _getresponse response = self.pollresponse(myseq, wait) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 424, in pollresponse message = self.pollmessage(wait) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 376, in pollmessage packet = self.pollpacket(wait) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 347, in pollpacket r, 
w, x = select.select([self.sock.fileno()], [], [], wait) select.error: (4, 'Interrupted system call') *** Unrecoverable, server exiting! ---------------------------------------- There isn't a specific reason; IDLE restarts when I write some code, or when I insert a return, or even when I do nothing. If it is a bug, I don't know how to put together a test case, because the error occurs randomly. Thanks to all. From krstic at solarsail.hcs.harvard.edu Wed Apr 15 16:12:31 2009 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=) Date: Wed, 15 Apr 2009 16:12:31 +0200 Subject: [Python-Dev] IDLE timeout. In-Reply-To: References: Message-ID: <8B98611B-BE8A-4D32-9B3A-296DB8BDFDC6@solarsail.hcs.harvard.edu> On Apr 15, 2009, at 10:05 AM, Alessio Giovanni Baroni wrote: > r, w, x = select.select([self.sock.fileno()], [], [], wait) > select.error: (4, 'Interrupted system call') See here for an explanation of the same problem in another module: Sounds like you ought to file a bug against IDLE to have it grow EINTR handling. Cheers, -- Ivan Krstić | http://radian.org From alessiogiovanni.baroni at gmail.com Wed Apr 15 16:33:20 2009 From: alessiogiovanni.baroni at gmail.com (Alessio Giovanni Baroni) Date: Wed, 15 Apr 2009 16:33:20 +0200 Subject: [Python-Dev] IDLE timeout.
In-Reply-To: <8B98611B-BE8A-4D32-9B3A-296DB8BDFDC6@solarsail.hcs.harvard.edu> References: <8B98611B-BE8A-4D32-9B3A-296DB8BDFDC6@solarsail.hcs.harvard.edu> Message-ID: Ah, sometimes the exception raised is the following (slightly different from the previous one): Exception in Tkinter callback Traceback (most recent call last): File "/opt/python301/lib/python3.0/tkinter/__init__.py", line 1399, in __call__ return self.func(*args) File "/opt/python301/lib/python3.0/tkinter/__init__.py", line 487, in callit func(*args) File "/opt/python301/lib/python3.0/idlelib/PyShell.py", line 490, in poll_subprocess response = clt.pollresponse(self.active_seq, wait=0.05) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 424, in pollresponse message = self.pollmessage(wait) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 376, in pollmessage packet = self.pollpacket(wait) File "/opt/python301/lib/python3.0/idlelib/rpc.py", line 347, in pollpacket r, w, x = select.select([self.sock.fileno()], [], [], wait) select.error: (4, 'Interrupted system call') In this case IDLE does not respond, because the Python interpreter is not restarted. I have to close everything :-(. Shall I open an issue in the tracker for IDLE for now? Regards. 2009/4/15 Ivan Krstić > On Apr 15, 2009, at 10:05 AM, Alessio Giovanni Baroni wrote: >> r, w, x = select.select([self.sock.fileno()], [], [], wait) >> select.error: (4, 'Interrupted system call') >> > > > See here for an explanation of the same problem in another module: > > > Sounds like you ought to file a bug against IDLE to have it grow EINTR > handling. Cheers, > > -- > Ivan Krstić | http://radian.org > > From pje at telecommunity.com Wed Apr 15 16:44:17 2009 From: pje at telecommunity.com (P.J.
Eby) Date: Wed, 15 Apr 2009 10:44:17 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49E59202.6050809@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> Message-ID: <20090415144147.6845F3A4100@sparrow.telecommunity.com> At 09:51 AM 4/15/2009 +0200, M.-A. Lemburg wrote: >On 2009-04-15 02:32, P.J. Eby wrote: > > At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote: > >> You are missing the point: When breaking up a large package that lives in > >> site-packages into smaller distribution bundles, you don't need namespace > >> packages at all, so the PEP doesn't apply. > >> > >> The way this works is by having a base distribution bundle that includes > >> the needed __init__.py file and a set of extension bundles the add > >> other files to the same directory (without including another copy of > >> __init__.py). The extension bundles include a dependency on the base > >> package to make sure that it always gets installed first. > > > > If we're going to keep that practice, there's no point to having the > > PEP: all three methods (base+extensions, pkgutil, setuptools) all work > > just fine as they are, with no changes to importing or the stdlib. > >Again: the PEP is about creating a standard for namespace >packages. It's not about making namespace packages easy to use for >Linux distribution maintainers. Instead, it's targeting *developers* >that want to enable shipping a single package in multiple, separate >pieces, giving the user the freedom to the select the ones she needs. 
> >Of course, this is possible today using various other techniques. The >point is that there is no standard for namespace packages and that's >what the PEP is trying to solve. > > > In particular, without the feature of being able to drop that practice, > > there would be no reason for setuptools to adopt the PEP. That's why > > I'm -1 on your proposal: it's actually inferior to the methods we > > already have today. > >It's simpler and more in line with the Python Zen, not inferior. > >You are free not to support it in setuptools - the methods >implemented in setuptools will continue to work as they are, >but continue to require support code and, over time, no longer >be compatible with other tools building upon the standard >defined in the PEP. > >In the end, it's the user that decides: whether to go with a >standard or not. Up until this point, I've been trying to help you understand the use cases, but it's clear now that you already understand them, you just don't care. That wouldn't be a problem if you just stayed on the sidelines, instead of actively working to make those use cases more difficult for everyone else than they already are. Anyway, since you clearly understand precisely what you're doing, I'm now going to stop trying to explain things, as my responses are apparently just encouraging you, and possibly convincing bystanders that there's some genuine controversy here as well. From rdmurray at bitdance.com Wed Apr 15 16:42:33 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 15 Apr 2009 10:42:33 -0400 (EDT) Subject: [Python-Dev] Why does read() return bytes instead of bytearray? In-Reply-To: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com> References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com> Message-ID: On Tue, 14 Apr 2009 at 22:05, Dan Eloff wrote: >> No, the read() method did not change from the 2.x series. It returns a new object on each call. 
> > I think you misunderstand me, but the readinto() method looks like a > perfectly reasonable solution, I didn't realize it existed, as it's > not in the library reference on file objects. Thanks for enlightening > me, I feel a little stupid now :) You have to follow the link from that section to the 'io' module to find it. The io module is about streams and is therefore in the 'generic operating system services' section, not the 'file and directory access section', which makes it a little harder to find when what you think you want to know about is file access...I think this is a doc bug but I'm completely unsure what would be a good fix. --David From barry at python.org Wed Apr 15 17:45:08 2009 From: barry at python.org (Barry Warsaw) Date: Wed, 15 Apr 2009 11:45:08 -0400 Subject: [Python-Dev] RELEASED Python 2.6.2 Message-ID: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> On behalf of the Python community, I'm happy to announce the availability of Python 2.6.2. This is the latest production-ready version in the Python 2.6 series. Dozens of issues have been fixed since Python 2.6.1 was released back in December. Please see the NEWS file for all the gory details. http://www.python.org/download/releases/2.6.2/NEWS.txt For more information on Python 2.6 in general, please see http://docs.python.org/dev/whatsnew/2.6.html Source tarballs, Windows installers, and (soon) Mac OS X disk images can be downloaded from the Python 2.6.2 page: http://www.python.org/download/releases/2.6.2/ Please report bugs for any Python version in the Python tracker: http://bugs.python.org Enjoy, -Barry Barry Warsaw barry at python.org Python 2.6/3.0 Release Manager (on behalf of the entire python-dev team) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From aahz at pythoncraft.com Wed Apr 15 18:10:33 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 15 Apr 2009 09:10:33 -0700 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415144147.6845F3A4100@sparrow.telecommunity.com> References: <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> Message-ID: <20090415161033.GA5218@panix.com> [much quote-trimming, the following is intended to just give the gist, but the bits quoted below are not in directe response to each other] On Wed, Apr 15, 2009, P.J. Eby wrote: > At 09:51 AM 4/15/2009 +0200, M.-A. Lemburg wrote: >> >> [...] >> Again: the PEP is about creating a standard for namespace >> packages. It's not about making namespace packages easy to use for >> Linux distribution maintainers. Instead, it's targeting *developers* >> that want to enable shipping a single package in multiple, separate >> pieces, giving the user the freedom to the select the ones she needs. >> [...] > > [...] > Anyway, since you clearly understand precisely what you're doing, I'm > now going to stop trying to explain things, as my responses are > apparently just encouraging you, and possibly convincing bystanders that > there's some genuine controversy here as well. For the benefit of us bystanders, could you summarize your vote at this point? Given the PEP's intended goals, if you do not oppose the PEP, are there any changes you think should be made? 
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From mal at egenix.com Wed Apr 15 18:15:46 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 15 Apr 2009 18:15:46 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415144147.6845F3A4100@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> Message-ID: <49E60832.8030806@egenix.com> On 2009-04-15 16:44, P.J. Eby wrote: > At 09:51 AM 4/15/2009 +0200, M.-A. Lemburg wrote: >> On 2009-04-15 02:32, P.J. Eby wrote: >> > At 10:59 PM 4/14/2009 +0200, M.-A. Lemburg wrote: >> >> You are missing the point: When breaking up a large package that >> lives in >> >> site-packages into smaller distribution bundles, you don't need >> namespace >> >> packages at all, so the PEP doesn't apply. >> >> >> >> The way this works is by having a base distribution bundle that >> includes >> >> the needed __init__.py file and a set of extension bundles the add >> >> other files to the same directory (without including another copy of >> >> __init__.py). The extension bundles include a dependency on the base >> >> package to make sure that it always gets installed first. >> > >> > If we're going to keep that practice, there's no point to having the >> > PEP: all three methods (base+extensions, pkgutil, setuptools) all work >> > just fine as they are, with no changes to importing or the stdlib. 
>> >> Again: the PEP is about creating a standard for namespace >> packages. It's not about making namespace packages easy to use for >> Linux distribution maintainers. Instead, it's targeting *developers* >> that want to enable shipping a single package in multiple, separate >> pieces, giving the user the freedom to the select the ones she needs. >> >> Of course, this is possible today using various other techniques. The >> point is that there is no standard for namespace packages and that's >> what the PEP is trying to solve. >> >> > In particular, without the feature of being able to drop that practice, >> > there would be no reason for setuptools to adopt the PEP. That's why >> > I'm -1 on your proposal: it's actually inferior to the methods we >> > already have today. >> >> It's simpler and more in line with the Python Zen, not inferior. >> >> You are free not to support it in setuptools - the methods >> implemented in setuptools will continue to work as they are, >> but continue to require support code and, over time, no longer >> be compatible with other tools building upon the standard >> defined in the PEP. >> >> In the end, it's the user that decides: whether to go with a >> standard or not. > > Up until this point, I've been trying to help you understand the use > cases, but it's clear now that you already understand them, you just > don't care. > > That wouldn't be a problem if you just stayed on the sidelines, instead > of actively working to make those use cases more difficult for everyone > else than they already are. > > Anyway, since you clearly understand precisely what you're doing, I'm > now going to stop trying to explain things, as my responses are > apparently just encouraging you, and possibly convincing bystanders that > there's some genuine controversy here as well. 
Hopefully, bystanders will understand that the one single use case you are always emphasizing, namely that of Linux distribution maintainers trying to change the package installation layout, is really a rather rare use case. It is true that I do understand what the namespace package idea is all about. I've been active in Python package development since packages were first added to Python as a new built-in import feature in Python 1.5 and have been distributing packages with package add-ons for more than a decade... For some history, have a look at: http://www.python.org/doc/essays/packages.html Also note how that essay discourages the use of .pth files: """ If the package really requires adding one or more directories on sys.path (e.g. because it has not yet been structured to support dotted-name import), a "path configuration file" named package.pth can be placed in either the site-python or site-packages directory. ... A typical installation should have no or very few .pth files or something is wrong, and if you need to play with the search order, something is very wrong. """ Back to the PEP: The much more common use case is that of wanting to have a base package installation with optional add-ons that live in the same logical package namespace. The PEP provides a way to solve this use case by giving both developers and users a standard which they can follow, across Python implementations, without having to rely on non-standard helpers. My proposal tries to solve this without adding yet another .pth-file-like mechanism - hopefully in the spirit of the original Python package idea. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 15 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From georg at python.org Wed Apr 15 17:52:57 2009 From: georg at python.org (Georg Brandl) Date: Wed, 15 Apr 2009 17:52:57 +0200 Subject: [Python-Dev] Python Bug Day on April 23 Message-ID: <49E602D9.5070603@python.org> Hi, I'd like to announce that there will be a Python Bug Day on April 23. As always, this is a perfect opportunity to get involved in Python development, or bring your own issues to attention, discuss them and (hopefully) resolve them together with the core developers. We will coordinate over IRC, in #python-dev on irc.freenode.net, and the Wiki page http://wiki.python.org/moin/PythonBugDay has all important information and a short list of steps how to get set up. Please spread the word! Georg From pje at telecommunity.com Wed Apr 15 19:49:20 2009 From: pje at telecommunity.com (P.J. 
Eby) Date: Wed, 15 Apr 2009 13:49:20 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415161033.GA5218@panix.com> References: <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <20090415161033.GA5218@panix.com> Message-ID: <20090415174649.43B6B3A4100@sparrow.telecommunity.com> At 09:10 AM 4/15/2009 -0700, Aahz wrote: >For the benefit of us bystanders, could you summarize your vote at this >point? Given the PEP's intended goals, if you do not oppose the PEP, are >there any changes you think should be made? I'm +1 on Martin's original version of the PEP, subject to the point brought up by someone that .pkg should be changed to a different extension. I'm -1 on all of MAL's proposed revisions, as IMO they are a step backwards: they "standardize" an approach that will create problems that don't need to exist, and don't exist now. Martin's proposal is an improvement on the status quo, Marc's proposal is a dis-improvement. From pje at telecommunity.com Wed Apr 15 19:59:34 2009 From: pje at telecommunity.com (P.J. 
Eby) Date: Wed, 15 Apr 2009 13:59:34 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49E60832.8030806@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> Message-ID: <20090415175704.966B13A4100@sparrow.telecommunity.com> At 06:15 PM 4/15/2009 +0200, M.-A. Lemburg wrote: >The much more common use case is that of wanting to have a base package >installation which optional add-ons that live in the same logical >package namespace. Please see the large number of Zope and PEAK distributions on PyPI as minimal examples that disprove this being the common use case. I expect you will find a fair number of others, as well. In these cases, there is NO "base package"... the entire point of using namespace packages for these distributions is that a "base package" is neither necessary nor desirable. In other words, the "base package" scenario is the exception these days, not the rule. I actually know specifically of only one other such package besides your mx.* case, the logilab ll.* package. From mal at egenix.com Wed Apr 15 20:00:42 2009 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 15 Apr 2009 20:00:42 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <9D093FD7-080B-479E-90B4-51294EBE5186@fuhm.net> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <9D093FD7-080B-479E-90B4-51294EBE5186@fuhm.net> Message-ID: <49E620CA.70903@egenix.com> On 2009-04-15 19:38, James Y Knight wrote: > > On Apr 15, 2009, at 12:15 PM, M.-A. Lemburg wrote: > >> The much more common use case is that of wanting to have a base package >> installation which optional add-ons that live in the same logical >> package namespace. >> >> The PEP provides a way to solve this use case by giving both developers >> and users a standard at hand which they can follow without having to >> rely on some non-standard helpers and across Python implementations. > > I'm not sure I understand what advantage your proposal gives over the > current mechanism for doing this. > > That is, add to your __init__.py file: > > from pkgutil import extend_path > __path__ = extend_path(__path__, __name__) > > Can you describe the intended advantages over the status-quo a bit more > clearly? Simple: you don't need the above lines in your __init__.py file anymore and can rely on a Python standard for namespace packages instead of some helper implementation. The fact that you have a __pkg__.py file in your package dir will signal the namespace package character to Python's importer and this will take care of the lookup process for you. 
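The status-quo recipe being compared here (`__path__ = extend_path(__path__, __name__)` in each portion's `__init__.py`) can be exercised end to end. The following is a self-contained sketch using a made-up package name `ns` and invented module names; it builds two separately "installed" portions in a temp directory and shows both resolving under one namespace:

```python
import os
import sys
import tempfile

# The pkgutil recipe each portion's ns/__init__.py carries today:
RECIPE = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"

base = tempfile.mkdtemp()
for part, mod in (("part_a", "alpha"), ("part_b", "beta")):
    pkg = os.path.join(base, part, "ns")          # "ns" is a hypothetical namespace
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(RECIPE)
    with open(os.path.join(pkg, mod + ".py"), "w") as f:
        f.write("NAME = %r\n" % mod)
    sys.path.insert(0, os.path.join(base, part))  # each portion on its own sys.path entry

# Whichever ns/__init__.py is found first runs extend_path(), which appends
# every other directory named "ns" found along sys.path to the package __path__.
import ns.alpha
import ns.beta
print(ns.alpha.NAME, ns.beta.NAME)   # -> alpha beta
```

Under the PEP's proposal as described in this thread, the `__init__.py` stubs carrying the recipe would be replaced by `__pkg__.py` marker files, with the importer doing the equivalent merging itself.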
Namespace packages will be just as easy to write, install and maintain as regular Python packages. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 15 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From foom at fuhm.net Wed Apr 15 19:38:19 2009 From: foom at fuhm.net (James Y Knight) Date: Wed, 15 Apr 2009 13:38:19 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49E60832.8030806@egenix.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> Message-ID: <9D093FD7-080B-479E-90B4-51294EBE5186@fuhm.net> On Apr 15, 2009, at 12:15 PM, M.-A. Lemburg wrote: > The much more common use case is that of wanting to have a base > package > installation which optional add-ons that live in the same logical > package namespace. 
> > The PEP provides a way to solve this use case by giving both > developers > and users a standard at hand which they can follow without having to > rely on some non-standard helpers and across Python implementations. I'm not sure I understand what advantage your proposal gives over the current mechanism for doing this. That is, add to your __init__.py file: from pkgutil import extend_path __path__ = extend_path(__path__, __name__) Can you describe the intended advantages over the status-quo a bit more clearly? James From mal at egenix.com Wed Apr 15 20:09:11 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 15 Apr 2009 20:09:11 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415175704.966B13A4100@sparrow.telecommunity.com> References: <49D4DA72.60401@v.loewis.de> <49D52115.6020001@egenix.com> <49D66C6E.3090602@v.loewis.de> <49DB475B.8060504@egenix.com> <20090407140317.EBD383A4063@sparrow.telecommunity.com> <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> Message-ID: <49E622C7.4000208@egenix.com> On 2009-04-15 19:59, P.J. Eby wrote: > At 06:15 PM 4/15/2009 +0200, M.-A. Lemburg wrote: >> The much more common use case is that of wanting to have a base package >> installation which optional add-ons that live in the same logical >> package namespace. > > Please see the large number of Zope and PEAK distributions on PyPI as > minimal examples that disprove this being the common use case. I expect > you will find a fair number of others, as well. > > In these cases, there is NO "base package"... 
the entire point of using > namespace packages for these distributions is that a "base package" is > neither necessary nor desirable. > > In other words, the "base package" scenario is the exception these days, > not the rule. I actually know specifically of only one other such > package besides your mx.* case, the logilab ll.* package. So now you're arguing against having base packages... at least you've dropped the strange idea of using Linux distribution maintainers as the central use case ;-) Think of base namespace packages (the ones providing the __init__.py file) as defining the namespace. They set up ownership and the basic infrastructure needed by add-ons. If you take Zope as an example, the Products/ package dir is a good example: the __init__.py file in that directory is provided by the Zope installation (generated during Zope instance creation), so Zope "owns" the package. With the proposal, Zope could declare this package dir a namespace base package by adding a __pkg__.py file to it. Zope add-ons could then be installed somewhere else on sys.path and include a Products/ dir as well, only this time it doesn't have the __init__.py file, but only a __pkg__.py file. Python would then take care of integrating the Python modules and packages in the add-on Products/ dir with the base package. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 15 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From amk at amk.ca Wed Apr 15 20:52:21 2009 From: amk at amk.ca (A.M. Kuchling) Date: Wed, 15 Apr 2009 14:52:21 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415175704.966B13A4100@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> Message-ID: <20090415185221.GB13696@amk-desktop.matrixgroup.net> On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote: > Please see the large number of Zope and PEAK distributions on PyPI as > minimal examples that disprove this being the common use case. I expect > you will find a fair number of others, as well. ... > In other words, the "base package" scenario is the exception these days, > not the rule. I actually know specifically of only one other such > package besides your mx.* case, the logilab ll.* package. Isn't that pretty even, then? zope.* and PEAK are two examples of one approach; and mx.* and ll.* are two examples that use the base package approach. Neither approach seems to be the more common one, and both are pretty rare. --amk From georg at python.org Wed Apr 15 21:10:07 2009 From: georg at python.org (Georg Brandl) Date: Wed, 15 Apr 2009 21:10:07 +0200 Subject: [Python-Dev] Correction: Python Bug Day on April 25 Message-ID: <49E6310F.2020300@python.org> Hi, I managed to screw up the date, so here it goes again: I'd like to announce that there will be a Python Bug Day on April 25. 
As always, this is a perfect opportunity to get involved in Python development, or bring your own issues to attention, discuss them and (hopefully) resolve them together with the core developers. We will coordinate over IRC, in #python-dev on irc.freenode.net, and the Wiki page http://wiki.python.org/moin/PythonBugDay has all important information and a short list of steps how to get set up. Please spread the word! Georg From pje at telecommunity.com Wed Apr 15 21:22:52 2009 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 15 Apr 2009 15:22:52 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415185221.GB13696@amk-desktop.matrixgroup.net> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> Message-ID: <20090415192021.558E53A4119@sparrow.telecommunity.com> At 02:52 PM 4/15/2009 -0400, A.M. Kuchling wrote: >On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote: > > Please see the large number of Zope and PEAK distributions on PyPI as > > minimal examples that disprove this being the common use case. I expect > > you will find a fair number of others, as well. > ... > > In other words, the "base package" scenario is the exception these days, > > not the rule. I actually know specifically of only one other such > > package besides your mx.* case, the logilab ll.* package. > >Isn't that pretty even, then? zope.* and PEAK are two examples of one >approach; and mx.* and ll.* are two examples that use the base package >approach. Neither approach seems to be the more common one, and both >are pretty rare. 
If you view the package listings on PyPI, you'll see that the "pure" namespaces currently in use include: alchemist.* amplecode.* atomisator.* bda.* benri.* beyondskins.* bliptv.* bopen.* borg.* bud.* ... This is just going down to the 'b's, looking only at packages whose PyPI project name reflects a nested package name, and only including those with entries that: 1. use setuptools, 2. declare one or more namespace packages, and 3. do not depend on some sort of "base" or "core" package. Technically, setuptools doesn't support base packages anyway, but if the organization appeared to be based on a "core+plugins/addons" model (as opposed to "collection of packages grouped in a namespace") I didn't include it in the list above -- i.e., I'm bending over backwards to be fair in the count. If somebody wants to do a formal count of base vs. pure, it might provide interesting stats. I initially only mentioned Zope and PEAK because I have direct knowledge of the developers' intent regarding their namespace packages. However, now that I've actually looked at a tiny sample of PyPI, it's clear that the actual field use of pure namespace packages has positively exploded since setuptools made it practical to use them. It's unclear, however, who is using base packages besides mx.* and ll.*, although I'd guess from the PyPI listings that perhaps Django is. (It seems that "base" packages are more likely to use a 'base-extension' naming pattern, vs. the 'namespace.project' pattern used by "pure" packages.) Of course, I am certainly not opposed to supporting base packages, and Martin's version of PEP 382 is a plus for setuptools because it would allow setuptools to better support the "base" scenario. But pure packages are definitely not a minority; in fact, a superficial observation of the full PyPI list suggests that there may be almost as many projects using pure-namespace packages, as there are non-namespaced projects! 
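The "pure" style counted above is declared, in setuptools terms, with a one-line stub `__init__.py` plus a `setup()` keyword. A minimal sketch, using invented names (`ns`, `ns.plugin`) rather than the layout of any specific project listed:

```python
# ns/__init__.py -- the only content of the shared namespace stub:
#     __import__('pkg_resources').declare_namespace(__name__)
#
# setup.py for one distribution contributing to the "ns" namespace:
from setuptools import setup, find_packages

setup(
    name="ns.plugin",              # hypothetical project name
    version="0.1",
    packages=find_packages(),
    namespace_packages=["ns"],     # declares the shared top-level namespace
)
```

Each distribution repeats the same stub and declaration, so no one distribution "owns" the namespace; this is a sketch of the convention, not a complete installable project.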
From fijall at gmail.com Wed Apr 15 21:36:01 2009 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 15 Apr 2009 13:36:01 -0600 Subject: [Python-Dev] Correction: Python Bug Day on April 25 In-Reply-To: <49E6310F.2020300@python.org> References: <49E6310F.2020300@python.org> Message-ID: <693bc9ab0904151236w5b72d56bi2a5ce936201eebe6@mail.gmail.com> On Wed, Apr 15, 2009 at 1:10 PM, Georg Brandl wrote: > Hi, > > I managed to screw up the date, so here it goes again: > > I'd like to announce that there will be a Python Bug Day on April 25. > As always, this is a perfect opportunity to get involved in Python > development, or bring your own issues to attention, discuss them and > (hopefully) resolve them together with the core developers. > > We will coordinate over IRC, in #python-dev on irc.freenode.net, > and the Wiki page http://wiki.python.org/moin/PythonBugDay has all > important information and a short list of steps how to get set up. > > Please spread the word! > > Georg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com > Are you aware that this directly conflicts with TurboGears world-wide sprint? Not sure if this is relevant, just a notice. Cheers, fijal From g.brandl at gmx.net Wed Apr 15 21:55:30 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 15 Apr 2009 21:55:30 +0200 Subject: [Python-Dev] Correction: Python Bug Day on April 25 In-Reply-To: <693bc9ab0904151236w5b72d56bi2a5ce936201eebe6@mail.gmail.com> References: <49E6310F.2020300@python.org> <693bc9ab0904151236w5b72d56bi2a5ce936201eebe6@mail.gmail.com> Message-ID: Maciej Fijalkowski schrieb: > On Wed, Apr 15, 2009 at 1:10 PM, Georg Brandl wrote: >> Hi, >> >> I managed to screw up the date, so here it goes again: >> >> I'd like to announce that there will be a Python Bug Day on April 25. 
> Are you aware that this directly conflicts with TurboGears world-wide sprint? > > Not sure if this is relevant, just a notice. I have been made aware :) I don't think it will be much of a problem though. Georg From ziade.tarek at gmail.com Wed Apr 15 22:00:01 2009 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 15 Apr 2009 22:00:01 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415192021.558E53A4119@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> Message-ID: <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com> On Wed, Apr 15, 2009 at 9:22 PM, P.J. Eby wrote: > At 02:52 PM 4/15/2009 -0400, A.M. Kuchling wrote: >> >> On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote: >> > Please see the large number of Zope and PEAK distributions on PyPI as >> > minimal examples that disprove this being the common use case. I expect >> > you will find a fair number of others, as well. >> ... >> > In other words, the "base package" scenario is the exception these days, >> > not the rule. I actually know specifically of only one other such >> > package besides your mx.* case, the logilab ll.* package. >> >> Isn't that pretty even, then? zope.* and PEAK are two examples of one >> approach; and mx.* and ll.* are two examples that use the base package >> approach. Neither approach seems to be the more common one, and both >> are pretty rare.
> > If you view the package listings on PyPI, you'll see that the "pure" > namespaces currently in use include: > > alchemist.* > amplecode.* > atomisator.* > bda.* > benri.* > beyondskins.* > bliptv.* > bopen.* > borg.* > bud.* > ... > > This is just going down to the 'b's, looking only at packages whose PyPI > project name reflects a nested package name, and only including those with > entries that: > > 1. use setuptools, > 2. declare one or more namespace packages, and > 3. do not depend on some sort of "base" or "core" package. > > Technically, setuptools doesn't support base packages anyway, but if the > organization appeared to be based on a "core+plugins/addons" model (as > opposed to "collection of packages grouped in a namespace") I didn't include > it in the list above -- i.e., I'm bending over backwards to be fair in the > count. > > If somebody wants to do a formal count of base vs. pure, it might provide > interesting stats. I initially only mentioned Zope and PEAK because I have > direct knowledge of the developers' intent regarding their namespace > packages. > > However, now that I've actually looked at a tiny sample of PyPI, it's clear > that the actual field use of pure namespace packages has positively exploded > since setuptools made it practical to use them. > > It's unclear, however, who is using base packages besides mx.* and ll.*, > although I'd guess from the PyPI listings that perhaps Django is. (It seems > that "base" packages are more likely to use a 'base-extension' naming > pattern, vs. the 'namespace.project' pattern used by "pure" packages.) > > Of course, I am certainly not opposed to supporting base packages, and > Martin's version of PEP 382 is a plus for setuptools because it would allow > setuptools to better support the "base" scenario.
> > But pure packages are definitely not a minority; in fact, a superficial > observation of the full PyPI list suggests that there may be almost as many > projects using pure-namespace packages, as there are non-namespaced > projects! > In the survey I have done on packaging, 34% of the people that answered are using the setuptools namespace feature, which currently makes it impossible to use the namespace for the base package. Now for the "base" or "core" package, here is what people who use setuptools do most of the time: 1 - they use zc.buildout, so they don't need a base package: they list in a configuration file all packages needed to build the application, and one of these packages happens to have the scripts to launch the application. 2 - they have a "main" package that doesn't use the same namespace, but uses the setuptools install_requires metadata to include namespaced packages. It acts like zc.buildout in some ways. For example, you mentioned atomisator.* in your example; this app has a main package called "Atomisator" (notice the upper A) that uses strategy #2. But frankly, the "base package" scenario is not widespread these days, simply because it's not obvious to do it without depending on an OS that has its own strategy to install packages. For example, if you are not under Debian, it's a pain to use logilab packages because they use this common namespace for several packages, and a plain Python installation of the various packages won't work out of the box under other systems like Windows. (And for pylint, I ended up creating my own distribution for Windows...) So: - having namespaces natively in Python is a big win (Namespaces are one honking great idea -- let's do more of those!) - being able to still write some code under the primary namespace is something I (and lots of people) wish we could do with setuptools, so it's a big win too. Regards, Tarek -- Tarek Ziadé | http://ziade.org From mal at egenix.com Wed Apr 15 22:20:45 2009 From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 15 Apr 2009 22:20:45 +0200 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415192021.558E53A4119@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> Message-ID: <49E6419D.5010302@egenix.com> On 2009-04-15 21:22, P.J. Eby wrote: > At 02:52 PM 4/15/2009 -0400, A.M. Kuchling wrote: >> On Wed, Apr 15, 2009 at 01:59:34PM -0400, P.J. Eby wrote: >> > Please see the large number of Zope and PEAK distributions on PyPI as >> > minimal examples that disprove this being the common use case. I >> expect >> > you will find a fair number of others, as well. >> ... >> > In other words, the "base package" scenario is the exception these >> days, >> > not the rule. I actually know specifically of only one other such >> > package besides your mx.* case, the logilab ll.* package. >> >> Isn't that pretty even, then? zope.* and PEAK are two examples of one >> approach; and mx.* and ll.* are two examples that use the base package >> approach. Neither approach seems to be the more common one, and both >> are pretty rare. > > If you view the package listings on PyPI, you'll see that the "pure" > namespaces currently in use include: > > alchemist.* > amplecode.* > atomisator.* > bda.* > benri.* > beyondskins.* > bliptv.* > bopen.* > borg.* > bud.* > ... > > This is just going down to the 'b's, looking only at packages whose PyPI > project name reflects a nested package name, and only including those > with entries that: > > 1. 
use setuptools, > 2. declare one or more namespace packages, and > 3. do not depend on some sort of "base" or "core" package. > > Technically, setuptools doesn't support base packages anyway, but if the > organization appeared to be based on a "core+plugins/addons" model (as > opposed to "collection of packages grouped in a namespace") I didn't > include it in the list above -- i.e., I'm bending over backwards to be > fair in the count. Hmm, setuptools doesn't support the notion of base packages, ie. packages that provide their own __init__.py module, so I fail to see how your list or any other list of setuptools-depend packages can be taken as indicator for anything related to base packages. Since setuptools probably introduced the idea of namespace sharing packages to many authors in the first place, such a list is even less appropriate to use as sample base. That said, I don't think such statistics provide any useful information to decide on the namespace import strategy standard for Python which is the subject of the PEP. They just show that one helper-based mechanism is used more than others and that's simply a consequence of there not being a standard built-in way of using namespace packages in Python. Whether base packages are useful or not is really a side aspect of the PEP and my proposal. I'm more after a method that doesn't add more .pkg file cruft to Python's import mechanism. Those .pth files were originally meant to help older Python "packages" (think the early PIL or Numeric extensions) to integrate nicely into the new scheme without having to fully support dotted package names right from the start. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 15 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2009-03-19: Released mxODBC.Connect 1.0.1 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From benjamin at python.org Wed Apr 15 22:37:47 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 15 Apr 2009 15:37:47 -0500 Subject: [Python-Dev] Why does read() return bytes instead of bytearray? In-Reply-To: References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com> Message-ID: <1afaf6160904151337o277c6bc3s8aaa71a756968c21@mail.gmail.com> 2009/4/15 R. David Murray : > On Tue, 14 Apr 2009 at 22:05, Dan Eloff wrote: >>> >>> No, the read() method did not change from the 2.x series. It returns a >>> new object on each call. >> >> I think you misunderstand me, but the readinto() method looks like a >> perfectly reasonable solution, I didn't realize it existed, as it's >> not in the library reference on file objects. Thanks for enlightening >> me, I feel a little stupid now :) > > You have to follow the link from that section to the 'io' module to find > it. > > The io module is about streams and is therefore in the 'generic operating > system services' section, not the 'file and directory access section', > which makes it a little harder to find when what you think you want to > know about is file access... I think this is a doc bug but I'm completely > unsure what would be a good fix. I've added a link to the io module in the see also section of the file and directory access section. -- Regards, Benjamin From rowen at u.washington.edu Wed Apr 15 22:47:08 2009 From: rowen at u.washington.edu (Russell E.
Owen) Date: Wed, 15 Apr 2009 13:47:08 -0700 Subject: [Python-Dev] RELEASED Python 2.6.2 References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: Thank you for 2.6.2. I see the Mac binary installer isn't out yet (at least it is not listed on the downloads page). Any chance that it will be compatible with 3rd party Tcl/Tk? Most recent releases have not been; the only way I know to make a compatible build is to build the installer on a machine that already has a 3rd party Tcl/Tk installed; the resulting binary is then compatible with both 3rd party versions of Tcl/Tk and also with Apple's ancient built in version. -- Russell From nad at acm.org Wed Apr 15 22:58:44 2009 From: nad at acm.org (Ned Deily) Date: Wed, 15 Apr 2009 13:58:44 -0700 Subject: [Python-Dev] RELEASED Python 2.6.2 References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: In article , "Russell E. Owen" wrote: > I see the Mac binary installer isn't out yet (at least it is not listed > on the downloads page). Any chance that it will be compatible with 3rd > party Tcl/Tk? > > Most recent releases have not been; the only way I know to make a > compatible build is to build the installer on a machine that already has > a 3rd party Tcl/Tk installed; the resulting binary is then compatible > with both 3rd party versions of Tcl/Tk and also with Apple's ancient > built in version. Thanks for the reminder. FWIW, that issue has recently been documented and there is a patch for the build script to ensure that the 3rd party Tcl/Tk is present during the installer build. I don't think it made it into the 2.6.2 source tree, though. -- Ned Deily, nad at acm.org From pje at telecommunity.com Wed Apr 15 23:01:49 2009 From: pje at telecommunity.com (P.J. 
Eby) Date: Wed, 15 Apr 2009 17:01:49 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49E6419D.5010302@egenix.com> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49E6419D.5010302@egenix.com> Message-ID: <20090415205918.B5B303A4100@sparrow.telecommunity.com> At 10:20 PM 4/15/2009 +0200, M.-A. Lemburg wrote: >Whether base packages are useful or not is really a side aspect >of the PEP and my proposal. It's not whether they're useful, it's whether they're required. Your proposal *requires* base packages, and for people who intend to use pure packages, this is NOT a feature: it's a bug. Specifically, it introduces a large number of unnecessary, boilerplate dependencies to their package distribution strategy. From barry at python.org Wed Apr 15 23:08:38 2009 From: barry at python.org (Barry Warsaw) Date: Wed, 15 Apr 2009 17:08:38 -0400 Subject: [Python-Dev] RELEASED Python 2.6.2 In-Reply-To: References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: On Apr 15, 2009, at 4:47 PM, Russell E. Owen wrote: > Thank you for 2.6.2. > > I see the Mac binary installer isn't out yet (at least it is not > listed > on the downloads page). Any chance that it will be compatible with 3rd > party Tcl/Tk? 
> > Most recent releases have not been; the only way I know to make a > compatible build is to build the installer on a machine that already > has > a 3rd party Tcl/Tk installed; the resulting binary is then compatible > with both 3rd party versions of Tcl/Tk and also with Apple's ancient > built in version. I can't answer this, but Ronald is building the OS X image for 2.6.2, AFAIK. I think it will be out soon, and maybe he can answer your Tcl/ Tk question. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From pje at telecommunity.com Wed Apr 15 23:11:32 2009 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 15 Apr 2009 17:11:32 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com > References: <49DB6A1F.50801@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com> Message-ID: <20090415210902.848443A4100@sparrow.telecommunity.com> At 10:00 PM 4/15/2009 +0200, Tarek Ziad? wrote: >Now for the "base" or "core" package, what peoplethat uses setuptools >do most of the time: > >1- they use zc.buildout so they don't need a base package : they list >in a configuration files all packages needed > to build the application, and one of these package happen to have >the scripts to launch the application. 
> >2 - they have a "main" package that doesn't use the same namespace, >but uses setuptools instal_requires metadata > to include namespaced packages. It acts like zc.buildout in some ways. > >For example, you mentioned atomisator.* in your example, this app has >a main package called "Atomisator" (notice the upper A) >that uses strategy #2 I think that there is some confusion here. A "main" package or buildout that assembles a larger project from components is not the same thing as having a "base" package for a namespace package. A base or core package is one that is depended upon by most or all of the related projects. In other words, the dependencies are in the *opposite direction* from what you described above. To have a base package in setuptools, you would move the target code from the namespace package __init__.py to another module or subpackage within your namespace, then make all your other projects depend on the project containing that module or subpackage. And I explicitly excluded from my survey any packages that were following this strategy, on the assumption that they might consider switching to an __init__.py or __pkg__.py strategy if some version of PEP 382 were supported by setuptools, since they already have a "base" or "core" project -- in that case, they are only changing ONE of their packages' distribution metadata to adopt the new strategy, because the dependencies already exist. >So : >- having namespaces natively in Python is a big win (Namespaces are >one honking great idea -- let's do more of those!) >- being able to still write some code under the primary namespace is >something I (and lots of people) wish we could do > with setuptools, so it's a big win too. Yes, that's why I support Martin's proposal: it would allow setuptools to support this case in the future, and it would also allow improved startup times for installations with many setuptools-based namespace packages installed in flat form. 
(Contra MAL's claims of decreased performance: adopting Martin's proposal allows there to be *fewer* .pth files read at startup, because only .pkg files for an actually-imported package need to be read.) From dalcinl at gmail.com Thu Apr 16 01:13:13 2009 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 15 Apr 2009 20:13:13 -0300 Subject: [Python-Dev] Why does read() return bytes instead of bytearray? In-Reply-To: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com> References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com> Message-ID: On Wed, Apr 15, 2009 at 12:05 AM, Dan Eloff wrote: >>No, the read() method did not change from the 2.x series. It returns a new object on each call. > > I think you misunderstand me, but the readinto() method looks like a > perfectly reasonable solution, I didn't realize it existed, as it's > not in the library reference on file objects. Thanks for enlightening > me, I feel a little stupid now :) > However, your original question is still valid ... Why a binary read() returns an immutable type? -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From solipsis at pitrou.net Thu Apr 16 01:16:29 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 15 Apr 2009 23:16:29 +0000 (UTC) Subject: [Python-Dev] Why does read() return bytes instead of bytearray? References: <4817b6fc0904142005s1b4f79bdu8675d89f3118b258@mail.gmail.com> Message-ID: Lisandro Dalcin gmail.com> writes: > > However, your original question is still valid ... Why a binary read() > returns an immutable type? Because bytes is the standard type for holding binary data.
Bytearray should only be used when there's a real, measured performance advantage doing so (which, IMHO, is rarer than you think). An immutable type makes daily programming much less error-prone. Regards Antoine. From stephen at xemacs.org Thu Apr 16 02:59:45 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 16 Apr 2009 09:59:45 +0900 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <49E6419D.5010302@egenix.com> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49E6419D.5010302@egenix.com> Message-ID: <87ab6h5s72.fsf@xemacs.org> M.-A. Lemburg writes: > Hmm, setuptools doesn't support the notion of base packages, ie. > packages that provide their own __init__.py module, so I fail > to see how your list or any other list of setuptools-depend > packages can be taken as indicator for anything related to > base packages. AFAICS the only things PJE has said about base packages is that (a) they aren't a universal use case for namespace packages, and (b) he'd like to be able to support them in setuptools, but admits that at present they aren't. Your arguments against the PEP supporting namespace packages as currently supported by setuptools seem purely theoretical to me, while he's defending an actual and common use case. "Although practicality beats purity." I think that for this PEP it's more important to unify the various use cases for namespace packages than it is to get rid of the .pth files. 
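For readers following the thread, the status quo that both PEP 382 variants aim to replace is the stdlib pkgutil idiom, in which every distribution ships an identical boilerplate __init__.py for the shared namespace. Below is a minimal, self-contained sketch of that idiom; the `ns`, `dist_a`/`dist_b`, and `alpha`/`beta` names are invented purely for illustration:

```python
import os
import sys
import tempfile

# Build two fake distributions that both contribute a portion of a
# shared "ns" namespace package, using the stdlib pkgutil idiom that
# setuptools' declare_namespace() is modelled on.
root = tempfile.mkdtemp()
for dist, mod in [("dist_a", "alpha"), ("dist_b", "beta")]:
    pkg = os.path.join(root, dist, "ns")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        # Every portion ships this same boilerplate __init__.py.
        f.write("from pkgutil import extend_path\n"
                "__path__ = extend_path(__path__, __name__)\n")
    with open(os.path.join(pkg, mod + ".py"), "w") as f:
        f.write("value = %r\n" % mod)
    sys.path.insert(0, os.path.join(root, dist))

# Both portions are importable even though they live in different
# sys.path entries: extend_path() scans sys.path for other "ns"
# directories and appends them to the package's __path__.
import ns.alpha
import ns.beta
print(ns.alpha.value, ns.beta.value)  # prints: alpha beta
```

Each portion's __init__.py is pure duplicated boilerplate, which is what PEP 382 proposes to remove by marking portions with *.pkg files instead.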
From pje at telecommunity.com Thu Apr 16 04:45:29 2009 From: pje at telecommunity.com (P.J. Eby) Date: Wed, 15 Apr 2009 22:45:29 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <87ab6h5s72.fsf@xemacs.org> References: <49DB6A1F.50801@egenix.com> <20090407174355.B62983A4063@sparrow.telecommunity.com> <49E4A58F.70309@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <49E6419D.5010302@egenix.com> <87ab6h5s72.fsf@xemacs.org> Message-ID: <20090416024300.6E2843A4100@sparrow.telecommunity.com> At 09:59 AM 4/16/2009 +0900, Stephen J. Turnbull wrote: >I think that for this PEP it's more important to unify >the various use cases for namespace packages than it is to get rid of >the .pth files. Actually, Martin's proposal *does* get rid of the .pth files in site-packages, and replaces them with other files inside the individual packages. (Thereby speeding startup times when many namespace packages are present but only a few are used.) So Martin's proposal is a win for performance and even for decreasing clutter. (The same number of special files will be present, but they will be moved inside the namespace package directories instead of being in the parent directory.) >AFAICS the only things PJE has said about base packages is that > > (a) they aren't a universal use case for namespace packages, and > (b) he'd like to be able to support them in setuptools, but admits > that at present they aren't. ...and that Martin's proposal would actually permit me to do so, whereas MAL's proposal would not. 
Replacing __init__.py with a __pkg__.py wouldn't change any of the tradeoffs for how setuptools handles namespace packages, except to add an extra variable to consider (i.e., two filenames to keep track of). From glyph at divmod.com Thu Apr 16 05:46:02 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 16 Apr 2009 03:46:02 -0000 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090415210902.848443A4100@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com> <20090415210902.848443A4100@sparrow.telecommunity.com> Message-ID: <20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com> On 15 Apr, 09:11 pm, pje at telecommunity.com wrote: >I think that there is some confusion here. A "main" package or >buildout that assembles a larger project from components is not the >same thing as having a "base" package for a namespace package. I'm certainly confused. Twisted has its own system for "namespace" packages, and I'm not really sure where we fall in this discussion. I haven't been able to follow the whole thread, but my original understanding was that the PEP supports "defining packages", which we now seem to be calling "base packages", just fine. I don't understand the controversy over the counterproposal, since it seems roughly functionally equivalent to me. I'd appreciate it if the PEP could also be extended cover Twisted's very similar mechanism for namespace packages, "twisted.plugin.pluginPackagePaths". 
I know this is not quite as widely used as setuptools' namespace package support, but its existence belies a need for standardization. The PEP also seems a bit vague with regard to the treatment of other directories containing __init__.py and *.pkg files. The concept of a "defining package" seems important to avoid conflicts like this one: http://twistedmatrix.com/trac/ticket/2339 More specifically I don't quite understand the PEP's intentions towards hierarchical packages. It says that all of sys.path will be searched, but what about this case? In Twisted, the suggested idiom to structure a project which wants to provide Twisted plugins is to have a directory structure like this: MyProject/ myproject/ __init__.py twisted/ plugins/ myproject_plugin.py If you then put MyProject on PYTHONPATH, MyProject/twisted/plugins will be picked up automatically by the plugin machinery. However, as "twisted" is *not* a "namespace" package in the same way, .py files in MyProject/twisted/ would not be picked up - this is very much intentional, since the "twisted" namespace is intended to be reserved for packages that we actually produce. If either MyProject/twisted or MyProject/twisted/plugins/ had an __init__.py, then no modules in MyProject/twisted/plugins/ would be picked up, because it would be considered a conflicting package. This is important so that users can choose not to load the system- installed Twisted's plugins when they have both a system-installed version of Twisted and a non-installed development version of Twisted found first on their PYTHONPATH, and switch between them to indicate which version they want to be the "base" or "defining" package for the twisted/plugins/ namespace. Developers might also want to have a system-installed Twisted, but a non-installed development version of MyProject on PYTHONPATH. I hope this all makes sense. 
As I understand it, both setuptools and the proposed standard would either still have the bug described by ticket 2339 above, or would ignore twisted/plugins/ as a namespace package because its parent isn't a namespace package. I apologize for not testing with current setuptools before asking, but I'm not sure my experiments would be valid given that my environment is set up with assumptions from Twisted's system. P.S.: vendor packaging systems *ARE* a major use case for just about any aspect of Python's package structure. I really liked MvL's coverage of "vendor packages", in the PEP, since this could equally well apply to MSIs, python libraries distributed in bundles on OS X, debs, or RPMs. If this use-case were to be ignored, as one particular fellow seems to be advocating, then the broken packages and user confusion that has been going on for the last 5 years or so is just going to get worse. From jess.austin at gmail.com Thu Apr 16 08:18:01 2009 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 16 Apr 2009 01:18:01 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta Message-ID: hi, I'm new to python core development, and I've been advised to write to python-dev concerning a feature/patch I've placed at http://bugs.python.org/issue5434, with Rietveld at http://codereview.appspot.com/25079. This patch adds a "monthdelta" class and a "monthmod" function to the datetime module. The monthdelta class is much like the existing timedelta class, except that it represents months offset from a date, rather than an exact period offset from a date. This allows us to easily say, e.g. "3 months from now" without worrying about the number of days in the intervening months. 
>>> date(2008, 1, 30) + monthdelta(1) datetime.date(2008, 2, 29) >>> date(2008, 1, 30) + monthdelta(2) datetime.date(2008, 3, 30) The monthmod function, named in (imperfect) analogy to divmod, allows us to round-trip by returning the interim between two dates represented as a (monthdelta, timedelta) tuple: >>> monthmod(date(2008, 1, 14), date(2009, 4, 2)) (datetime.monthdelta(14), datetime.timedelta(19)) Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td These also work with datetimes! There are more details in the documentation included in the patch. In addition to the C module file, I've updated the datetime CAPI, the documentation, and tests. I feel this would be a good addition to core python. In my work, I've often ended up writing annoying one-off "add-a-month" or similar functions. I think since months work differently than most other time periods, a new object is justified rather than trying to shoe-horn something like this into timedelta. I also think that the round-trip functionality provided by monthmod is important to ensure that monthdeltas are "first-class" objects. Please let me know what you think of the idea and/or its execution. thanks, Jess Austin From tleeuwenburg at gmail.com Thu Apr 16 10:06:41 2009 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Thu, 16 Apr 2009 18:06:41 +1000 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <43c8685c0904160106l28126ae1n8490712d794e9fe7@mail.gmail.com> Hi Jess, I'm sorry if I'm failing to understand the use of this function from not looking closely at your code. I'm a bit dubious about the usefulness of this (I'm not sure I understand the use cases), but I'm very open to being convinced. Datetime semantics are very important in some areas -- I use them a lot. I'm not convinced the semantics of monthdelta are obvious. A month doesn't have a consistent length -- it could be 28, 29, 30 or 31 days. 
What happens when you ask for the date in "1 month's" time on the 31st Jan? What date is a month after the 31st Jan? Do you have a good spec (er, I mean PEP) for this describing what happens in the edge cases and what is meant by a monthdelta? The bug notes say it "deals sensibly" with these issues, but that's really not enough to understand what the function is likely to do. At the very least, a few well-chosen examples would help to illustrate the functionality much more clearly. Cheers, -Tennessee On Thu, Apr 16, 2009 at 4:18 PM, Jess Austin wrote: > hi, > > I'm new to python core development, and I've been advised to write to > python-dev concerning a feature/patch I've placed at > http://bugs.python.org/issue5434, with Rietveld at > http://codereview.appspot.com/25079. > > This patch adds a "monthdelta" class and a "monthmod" function to the > datetime module. The monthdelta class is much like the existing > timedelta class, except that it represents months offset from a date, > rather than an exact period offset from a date. This allows us to > easily say, e.g. "3 months from now" without worrying about the number > of days in the intervening months. > > >>> date(2008, 1, 30) + monthdelta(1) > datetime.date(2008, 2, 29) > >>> date(2008, 1, 30) + monthdelta(2) > datetime.date(2008, 3, 30) > > The monthmod function, named in (imperfect) analogy to divmod, allows > us to round-trip by returning the interim between two dates > represented as a (monthdelta, timedelta) tuple: > > >>> monthmod(date(2008, 1, 14), date(2009, 4, 2)) > (datetime.monthdelta(14), datetime.timedelta(19)) > > Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td > > These also work with datetimes! There are more details in the > documentation included in the patch. In addition to the C module > file, I've updated the datetime CAPI, the documentation, and tests. > > I feel this would be a good addition to core python. 
In my work, I've > often ended up writing annoying one-off "add-a-month" or similar > functions. I think since months work differently than most other time > periods, a new object is justified rather than trying to shoe-horn > something like this into timedelta. I also think that the round-trip > functionality provided by monthmod is important to ensure that > monthdeltas are "first-class" objects. > > Please let me know what you think of the idea and/or its execution. > > thanks, > Jess Austin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/tleeuwenburg%40gmail.com > -- -------------------------------------------------- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think" -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phd.pp.ru Thu Apr 16 10:10:36 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 16 Apr 2009 12:10:36 +0400 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <20090416081036.GA25435@phd.pp.ru> On Thu, Apr 16, 2009 at 01:18:01AM -0500, Jess Austin wrote: > I'm new to python core development, and I've been advised to write to > python-dev concerning a feature/patch I've placed at > http://bugs.python.org/issue5434, with Rietveld at > http://codereview.appspot.com/25079. I have read the python code and it looks good. I often have a need to do month-based calculations. > This patch adds a "monthdelta" class and a "monthmod" function to the > datetime module. The monthdelta class is much like the existing > timedelta class, except that it represents months offset from a date, > rather than an exact period offset from a date. I'd rather see the code merged with timedelta: timedelta(months=n). Oleg. 
-- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From skip at pobox.com Thu Apr 16 10:45:24 2009 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Apr 2009 03:45:24 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <18918.61476.980951.991275@montanaro.dyndns.org> >>> date(2008, 1, 30) + monthdelta(1) datetime.date(2008, 2, 29) What would this loop print? for d in range(1, 32): print date(2008, 1, d) + monthdelta(1) I have this funny feeling that arithmetic using monthdelta wouldn't always be intuitive. Skip From lists at jwp.name Thu Apr 16 11:44:14 2009 From: lists at jwp.name (James Pye) Date: Thu, 16 Apr 2009 02:44:14 -0700 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <20090416081036.GA25435@phd.pp.ru> References: <20090416081036.GA25435@phd.pp.ru> Message-ID: On Apr 16, 2009, at 1:10 AM, Oleg Broytmann wrote: >> This patch adds a "monthdelta" class and a "monthmod" function to the >> datetime module. The monthdelta class is much like the existing >> timedelta class, except that it represents months offset from a date, >> rather than an exact period offset from a date. > > I'd rather see the code merged with timedelta: timedelta(months=n). +1 From amauryfa at gmail.com Thu Apr 16 11:54:13 2009 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Thu, 16 Apr 2009 11:54:13 +0200 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <18918.61476.980951.991275@montanaro.dyndns.org> References: <18918.61476.980951.991275@montanaro.dyndns.org> Message-ID: On Thu, Apr 16, 2009 at 10:45, wrote: > >>> date(2008, 1, 30) + monthdelta(1) > datetime.date(2008, 2, 29) > > What would this loop print? > > for d in range(1, 32): > print date(2008, 1, d) + monthdelta(1) > > I have this funny feeling that arithmetic using monthdelta wouldn't always > be intuitive.
FWIW, the Oracle database has two methods for adding months: 1- the add_months() function add_months(to_date('31-jan-2005'), 1) 2- the ANSI interval: to_date('31-jan-2005') + interval '1' month "add_months" is calendar sensitive, "interval" is not. "interval" raises an exception if the day is not valid for the target month (which is the case in my example) "add_months" is similar to the proposed monthdelta(), except that it has a special case for the last day of the month: """ If date is the last day of the month or if the resulting month has fewer days than the day component of date, then the result is the last day of the resulting month. Otherwise, the result has the same day component as date. """ indeed: add_months(to_date('28-feb-2005'), 1) == to_date('31-mar-2005') In my opinion: arithmetic with months is a mess. There is no such "month interval" or "year interval" with a precise definition. If we adopt some kind of month manipulation, it should be a function or a method, like you would do for features like last_day_of_month(d), or following_weekday(d, 'monday'). date(2008, 1, 30).add_months(1) == date(2008, 2, 29) -- Amaury Forgeot d'Arc From dirkjan at ochtman.nl Thu Apr 16 12:16:15 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Thu, 16 Apr 2009 12:16:15 +0200 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <18918.61476.980951.991275@montanaro.dyndns.org> Message-ID: On Thu, Apr 16, 2009 at 11:54, Amaury Forgeot d'Arc wrote: > In my opinion: > arithmetic with months is a mess. There is no such "month interval" or > "year interval" with a precise definition. > If we adopt some kind of month manipulation, it should be a function > or a method, like you would do for features like last_day_of_month(d), > or following_weekday(d, 'monday'). > > ? ?date(2008, 1, 30).add_months(1) == date(2008, 2, 29) I concur. Trying to shoehorn month arithmetic into timedelta is a PITA, precisely because it's somewhat inexact. 
It's better to have some separate mechanism that has well-defined behavior in edge cases. Cheers, Dirkjan From jon+python-dev at unequivocal.co.uk Thu Apr 16 12:47:26 2009 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Thu, 16 Apr 2009 11:47:26 +0100 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <20090416081036.GA25435@phd.pp.ru> References: <20090416081036.GA25435@phd.pp.ru> Message-ID: <20090416104726.GS24050@snowy.squish.net> On Thu, Apr 16, 2009 at 12:10:36PM +0400, Oleg Broytmann wrote: > > This patch adds a "monthdelta" class and a "monthmod" function to the > > datetime module. The monthdelta class is much like the existing > > timedelta class, except that it represents months offset from a date, > > rather than an exact period offset from a date. > > I'd rather see the code merged with timedelta: timedelta(months=n). Unfortunately, that's simply impossible. A timedelta is a fixed number of seconds, and the time between one month and the next varies. I am very much in favour of there being the ability to add months to dates though. Obviously there is the question of what to do when you move forward 1 month from the 31st January; in my opinion an optional argument to specify different behaviours would be nice. From ronaldoussoren at mac.com Thu Apr 16 14:35:25 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 16 Apr 2009 14:35:25 +0200 Subject: [Python-Dev] RELEASED Python 2.6.2 In-Reply-To: References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: On 15 Apr, 2009, at 22:47, Russell E. Owen wrote: > Thank you for 2.6.2. > > I see the Mac binary installer isn't out yet (at least it is not > listed > on the downloads page). Any chance that it will be compatible with 3rd > party Tcl/Tk? The Mac installer is late because I missed the pre-announcement of the 2.6.2 tag. I sent the installer to Barry earlier today. The installer was built using a third-party installation of Tcl/Tk.
Ronald -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2224 bytes Desc: not available URL: From p.f.moore at gmail.com Thu Apr 16 16:54:08 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Apr 2009 15:54:08 +0100 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <79990c6b0904160754r3761518an967fc543a76767d5@mail.gmail.com> 2009/4/16 Jess Austin : > I'm new to python core development, and I've been advised to write to > python-dev concerning a feature/patch I've placed at > http://bugs.python.org/issue5434, with Rietveld at > http://codereview.appspot.com/25079. > > This patch adds a "monthdelta" class and a "monthmod" function to the > datetime module. The monthdelta class is much like the existing > timedelta class, except that it represents months offset from a date, > rather than an exact period offset from a date. This allows us to > easily say, e.g. "3 months from now" without worrying about the number > of days in the intervening months. > > >>> date(2008, 1, 30) + monthdelta(1) > datetime.date(2008, 2, 29) > >>> date(2008, 1, 30) + monthdelta(2) > datetime.date(2008, 3, 30) > > The monthmod function, named in (imperfect) analogy to divmod, allows > us to round-trip by returning the interim between two dates > represented as a (monthdelta, timedelta) tuple: > > >>> monthmod(date(2008, 1, 14), date(2009, 4, 2)) > (datetime.monthdelta(14), datetime.timedelta(19)) > > Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td I like the idea in principle. In practice, of course, month calculations are inherently ill-defined, so you need to be very specific in documenting all of the edge cases, and you should have strong use cases to ensure that the behaviour implemented matches user requirements.
(I haven't yet had time to read the patch - you may well already have these points covered, certainly your comments above indicate that you appreciate the subtleties involved). > These also work with datetimes! There are more details in the > documentation included in the patch. In addition to the C module > file, I've updated the datetime CAPI, the documentation, and tests. > > I feel this would be a good addition to core python. In my work, I've > often ended up writing annoying one-off "add-a-month" or similar > functions. I think since months work differently than most other time > periods, a new object is justified rather than trying to shoe-horn > something like this into timedelta. I also think that the round-trip > functionality provided by monthmod is important to ensure that > monthdeltas are "first-class" objects. I agree that ultimately it would be useful in the core. However, I'd suggest that you release the functionality as an independent module in the first instance, to establish it outside of the core. Once it has matured somewhat as a 3rd party module, it would then be ready for integration in the core. This also has the benefit that it makes the functionality available to users of Python 2.6 (and possibly earlier) rather than just in 2.7/3.1 onwards. > Please let me know what you think of the idea and/or its execution. I hope the above comments help. Ultimately, I'd like to see this added to the core. It's tricky enough that having a "standard" implementation is a definite benefit in itself. But equally, I'd give it time to iron out the corner cases on a faster development cycle than the core offers before "freezing" it as part of the stdlib. Paul.
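The clamping rule Jess describes (a result that lands in a too-short month snaps to that month's last day) can be sketched in pure Python with the standard datetime and calendar modules. The add_months helper below is a hypothetical illustration of those semantics, not the patch's C implementation:

```python
from calendar import monthrange
from datetime import date

def add_months(d, months):
    # Hypothetical helper: normalize (year, month) to a month count,
    # shift by the offset, then split back into year and month.
    y, m = divmod(d.year * 12 + (d.month - 1) + months, 12)
    # Clamp the day to the length of the target month, as the proposal
    # does for e.g. Jan 30 + 1 month in a leap year.
    last_day = monthrange(y, m + 1)[1]
    return date(y, m + 1, min(d.day, last_day))

print(add_months(date(2008, 1, 30), 1))  # 2008-02-29
print(add_months(date(2008, 1, 30), 2))  # 2008-03-30
```

Negative offsets fall out of the same arithmetic: add_months(date(2008, 3, 31), -1) yields the clamped date(2008, 2, 29).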
From p.f.moore at gmail.com Thu Apr 16 16:56:40 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Apr 2009 15:56:40 +0100 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <18918.61476.980951.991275@montanaro.dyndns.org> References: <18918.61476.980951.991275@montanaro.dyndns.org> Message-ID: <79990c6b0904160756k761cdbagf085fa4966176f70@mail.gmail.com> 2009/4/16 : > >>> date(2008, 1, 30) + monthdelta(1) > datetime.date(2008, 2, 29) > > What would this loop print? > > for d in range(1, 32): > print date(2008, 1, d) + monthdelta(1) > > I have this funny feeling that arithmetic using monthdelta wouldn't always > be intuitive. Oh, certainly! But in the absence of "intuitive", I've found in the past that "standardised" is often better than nothing :-) (For example, I use Oracle's add_months function fairly often - it's not perfect, and not always intuitive, but at least it's well-defined in the corner cases, and fine for "normal" use). Paul. From solipsis at pitrou.net Thu Apr 16 17:12:26 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 16 Apr 2009 15:12:26 +0000 (UTC) Subject: [Python-Dev] Issue5434: datetime.monthdelta References: <18918.61476.980951.991275@montanaro.dyndns.org> <79990c6b0904160756k761cdbagf085fa4966176f70@mail.gmail.com> Message-ID: Paul Moore gmail.com> writes: > > Oh, certainly! But in the absence of "intuitive", I've found in the > past that "standardised" is often better than nothing (For > example, I use Oracle's add_months function fairly often - it's not > perfect, and not always intuitive, but at least it's well-defined in > the corner cases, and fine for "normal" use). I think something like "date.add_months()" would be better than the proposed monthdelta.
The monthdelta proposal suggests that addition is something well-defined and rigourous, which is not really the case here (for example, if you add a monthdelta and then substract it again, I'm not sure you always get back the original date). Regards Antoine. From pje at telecommunity.com Thu Apr 16 17:36:18 2009 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 16 Apr 2009 11:36:18 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090416034602.12555.179034490.divmod.xquotient.8434@weber .divmod.com> References: <49DB6A1F.50801@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com> <20090415210902.848443A4100@sparrow.telecommunity.com> <20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com> Message-ID: <20090416153350.702303A4100@sparrow.telecommunity.com> At 03:46 AM 4/16/2009 +0000, glyph at divmod.com wrote: >On 15 Apr, 09:11 pm, pje at telecommunity.com wrote: >>I think that there is some confusion here. A "main" package or >>buildout that assembles a larger project from components is not the >>same thing as having a "base" package for a namespace package. > >I'm certainly confused. > >Twisted has its own system for "namespace" packages, and I'm not >really sure where we fall in this discussion. I haven't been able >to follow the whole thread, but my original understanding was that >the PEP supports "defining packages", which we now seem to be >calling "base packages", just fine. Yes, it does. 
The discussion since the original proposal, however, has been dominated by MAL's counterproposal, which *requires* a defining package. There is a slight distinction between "base package" and "defining package", although I suppose I've been using them a bit interchangeably. Base package describes a use case: you have a base package which is extended in the same namespace. In that use case, you may want to place your base package in the defining package. In contrast, setuptools does not support a defining package, so if you have a base package, you must place it in a submodule or subpackage of the namespace. Does that all make sense now? MAL's proposal requires a defining package, which is counterproductive if you have a pure package with no base, since it now requires you to create an additional project on PyPI just to hold your defining package. >I'd appreciate it if the PEP could also be extended cover Twisted's >very similar mechanism for namespace packages, >"twisted.plugin.pluginPackagePaths". I know this is not quite as >widely used as setuptools' namespace package support, but its >existence belies a need for standardization. > >The PEP also seems a bit vague with regard to the treatment of other >directories containing __init__.py and *.pkg files. Do you have a clarification to suggest? My understanding (probably a projection) is that to be a nested namespace package, you have to have a parent namespace package. > The concept of a "defining package" seems important to avoid > conflicts like this one: > > http://twistedmatrix.com/trac/ticket/2339 > >More specifically I don't quite understand the PEP's intentions >towards hierarchical packages. It says that all of sys.path will be >searched, but what about this case? 
> >In Twisted, the suggested idiom to structure a project which wants >to provide Twisted plugins is to have a directory structure like this: > > MyProject/ > myproject/ > __init__.py > twisted/ > plugins/ > myproject_plugin.py > >If you then put MyProject on PYTHONPATH, MyProject/twisted/plugins >will be picked up automatically by the plugin machinery. Namespaces are not plugins and vice versa. The purpose of a namespace package is to allow projects managed by the same entity to share a namespace (ala Java "package" names) and avoid naming conflicts with other authors. A plugin system, by contrast, is explicitly intended for use by multiple authors, so the use case is rather different... and using namespace packages for plugins actually *increases* the possibility of naming conflicts, unless you add back in another level of hierarchy. (As apparently you are recommending via "myproject_plugin".) > However, as "twisted" is *not* a "namespace" package in the same > way, .py files in MyProject/twisted/ would not be picked up - this > is very much intentional, since the "twisted" namespace is intended > to be reserved for packages that we actually produce. If either > MyProject/twisted or MyProject/twisted/plugins/ had an __init__.py, > then no modules in MyProject/twisted/plugins/ would be picked up, > because it would be considered a conflicting package. Precisely. Note, however, that neither is twisted.plugins a namespace package, and it should not contain any .pkg files. I don't think it's reasonable to abuse PEP 382 namespace packages as a plugin system. In setuptools' case, a different mechanism is provided for locating plugin code, and of course Twisted already has its own system for the same thing. It would be nice to have a standardized way of locating plugins in the stdlib, but that will need to be a different PEP. >I hope this all makes sense. 
As I understand it, both setuptools >and the proposed standard would either still have the bug described >by ticket 2339 above, or would ignore twisted/plugins/ as a >namespace package because its parent isn't a namespace package. If twisted/ lacked an __init__.py, then setuptools would ignore it. Under PEP 382, the same, unless it had .pkg files. (Again, setuptools explicitly does not support using namespace packages as a plugin mechanism.) >P.S.: vendor packaging systems *ARE* a major use case for just about >any aspect of Python's package structure. I really liked MvL's >coverage of "vendor packages", in the PEP, since this could equally >well apply to MSIs, python libraries distributed in bundles on OS X, >debs, or RPMs. If this use-case were to be ignored, as one >particular fellow seems to be advocating, then the broken packages >and user confusion that has been going on for the last 5 years or so >is just going to get worse. Indeed. From nad at acm.org Thu Apr 16 18:47:37 2009 From: nad at acm.org (Ned Deily) Date: Thu, 16 Apr 2009 09:47:37 -0700 Subject: [Python-Dev] Issue5434: datetime.monthdelta References: Message-ID: In article , Jess Austin wrote: > I'm new to python core development, and I've been advised to write to > python-dev concerning a feature/patch I've placed at > http://bugs.python.org/issue5434, with Rietveld at > http://codereview.appspot.com/25079. Without having looked at the code, I wonder whether you've looked at python-dateutil. I believe its relativedelta type does what you propose, plus much more, and it has the advantage of being widely used and tested. 
-- Ned Deily, nad at acm.org From jess.austin at gmail.com Thu Apr 16 20:31:04 2009 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 16 Apr 2009 13:31:04 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <18918.61476.980951.991275@montanaro.dyndns.org> References: <18918.61476.980951.991275@montanaro.dyndns.org> Message-ID: On Thu, Apr 16, 2009 at 3:45 AM, wrote: > >>> date(2008, 1, 30) + monthdelta(1) > datetime.date(2008, 2, 29) > > What would this loop print? > > for d in range(1, 32): > print date(2008, 1, d) + monthdelta(1) >>> for d in range(1, 32): ... print(date(2008, 1, d) + monthdelta(1)) ... 2008-02-01 2008-02-02 2008-02-03 2008-02-04 2008-02-05 2008-02-06 2008-02-07 2008-02-08 2008-02-09 2008-02-10 2008-02-11 2008-02-12 2008-02-13 2008-02-14 2008-02-15 2008-02-16 2008-02-17 2008-02-18 2008-02-19 2008-02-20 2008-02-21 2008-02-22 2008-02-23 2008-02-24 2008-02-25 2008-02-26 2008-02-27 2008-02-28 2008-02-29 2008-02-29 2008-02-29 > I have this funny feeling that arithmetic using monthdelta wouldn't always > be intuitive. I think that's true, especially since these calculations are not necessarily invertible: >>> date(2008, 1, 30) + monthdelta(1) datetime.date(2008, 2, 29) >>> date(2008, 2, 29) - monthdelta(1) datetime.date(2008, 1, 29) It could be that non-intuitivity is inherent in the problem of dealing with dates and months. I've aimed for a good compromise between the needs of the problem and the pythonic example of timedelta. I would submit that timedelta itself isn't intuitive at first blush, especially if one was weaned on the arcana of RDBMS date functions, but after one uses timedelta for just a bit it makes total sense. I hope the same may be said of monthdelta.
cheers, Jess From p.f.moore at gmail.com Thu Apr 16 20:31:36 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Apr 2009 19:31:36 +0100 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <18918.61476.980951.991275@montanaro.dyndns.org> <79990c6b0904160756k761cdbagf085fa4966176f70@mail.gmail.com> Message-ID: <79990c6b0904161131o5b85555k9ba488c1de87f06b@mail.gmail.com> 2009/4/16 Antoine Pitrou : > Paul Moore gmail.com> writes: >> >> Oh, certainly! But in the absence of "intuitive", I've found in the >> past that "standardised" is often better than nothing ?(For >> example, I use Oracle's add_months function fairly often - it's not >> perfect, and not always intuitive, but at least it's well-defined in >> the corner cases, and fine for "normal" use). > > I think something like "date.add_months()" would be better than the proposed > monthdelta. The monthdelta proposal suggests that addition is something > well-defined and rigourous, which is not really the case here (for example, if > you add a monthdelta and then substract it again, I'm not sure you always get > back the original date). I didn't particularly get that impression, but I understand what you're saying. Personally, I don't think it matters much one way or the other. But as well as monthdelta, the proposal included monthmod. I'm not entirely happy with the name, but I like the idea - and particularly the invariant dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td. For me, that makes it a lot easier to reason about month increments. One thing I have certainly needed in the past is a robust way of converting a difference between two dates into "natural language" - 3 years, 2 months, 1 week and 5 days (or whatever). For that type of application, monthmod would have been invaluable. In my view, monthdelta seems a lot more natural alongside monthmod, than an add_months method would. 
And as monthmod is a function of two dates, it can't really be a method (OK, I know, something horrid like date1.monthdiff(date2) is possible, but honestly, I don't see that as reasonable). But this type of API design discussion does emphasise why I think the module should be a 3rd party package for a while before going into the stdlib. Paul. From p.f.moore at gmail.com Thu Apr 16 20:42:00 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Apr 2009 19:42:00 +0100 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com> 2009/4/16 Ned Deily: > In article > , > Jess Austin wrote: >> I'm new to python core development, and I've been advised to write to >> python-dev concerning a feature/patch I've placed at >> http://bugs.python.org/issue5434, with Rietveld at >> http://codereview.appspot.com/25079. > > Without having looked at the code, I wonder whether you've looked at > python-dateutil. I believe its relativedelta type does what you > propose, plus much more, and it has the advantage of being widely used > and tested. The key thing missing (I believe) from dateutil is any equivalent of monthmod. Hmm, it might be possible via relativedelta(d1,d2), but it's not clear to me from the documentation precisely what attributes/methods of a relativedelta object are valid for getting data *out* of it. I do agree, though, that any proposal to extend the Python datetime module should at least be informed by the design of the dateutil module. Paul.
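The round-trip invariant Paul discusses can be sketched on top of the same clamped month shift. Both helpers below are hypothetical illustrations (monthmod here returns a plain int where the patch returns a monthdelta object), not the patch's implementation:

```python
from calendar import monthrange
from datetime import date

def add_months(d, months):
    # Clamped month shift, mirroring the proposed monthdelta addition.
    y, m = divmod(d.year * 12 + (d.month - 1) + months, 12)
    return date(y, m + 1, min(d.day, monthrange(y, m + 1)[1]))

def monthmod(start, end):
    # Largest whole-month offset that does not overshoot end, plus the
    # leftover days, so that add_months(start, months) + days == end.
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:
        months -= 1
    return months, end - add_months(start, months)

print(monthmod(date(2008, 1, 14), date(2009, 4, 2)))  # 14 months and 19 days
```

Because add_months clamps, the month count computed from the raw year/month difference can overshoot by at most one, which is why a single decrement suffices; the remainder is then always a non-negative timedelta smaller than the month in which end occurs.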
From jess.austin at gmail.com Thu Apr 16 20:47:42 2009 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 16 Apr 2009 13:47:42 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <18918.61476.980951.991275@montanaro.dyndns.org> Message-ID: On Thu, Apr 16, 2009 at 4:54 AM, Amaury Forgeot d'Arc wrote: > FWIW, the Oracle database has two methods for adding months: > 1- the add_months() function > add_months(to_date('31-jan-2005'), 1) > 2- the ANSI interval: > to_date('31-jan-2005') + interval '1' month > > "add_months" is calendar sensitive, "interval" is not. > "interval" raises an exception if the day is not valid for the target > month (which is the case in my example) > > "add_months" is similar to the proposed monthdelta(), > except that it has a special case for the last day of the month: > """ > If date is the last day of the month or if the resulting month has > fewer days than the day > component of date, then the result is the last day of the resulting month. > Otherwise, the result has the same day component as date. > """ > indeed: > add_months(to_date('28-feb-2005'), 1) == to_date('31-mar-2005') My proposal has the "calendar sensitive" semantics you describe. It will not raise an exception in this case. > In my opinion: > arithmetic with months is a mess. There is no such "month interval" or > "year interval" with a precise definition. > If we adopt some kind of month manipulation, it should be a function > or a method, like you would do for features like last_day_of_month(d), > or following_weekday(d, 'monday'). > > date(2008, 1, 30).add_months(1) == date(2008, 2, 29) I disagree with this point, in that I really like the pythonic date calculations we have with timedelta. It is easier to reason about adding and subtracting objects than it is to reason about method invocations.
Also, you can store a monthdelta in a variable, which is sometimes convenient, and which is difficult to emulate with function calls. Except in certain particular cases, I'm not fond of last_day_of_month, following_weekday, etc. functions. Much in the way that timezone considerations have been factored out of the core through the use of tzinfo, I think these problems are more effectively addressed at the level of detail one finds at the application level. On the other hand, it seems like effective month calculations could be useful in the core. cheers, Jess From jess.austin at gmail.com Thu Apr 16 20:50:29 2009 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 16 Apr 2009 13:50:29 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <18918.61476.980951.991275@montanaro.dyndns.org> Message-ID: On Thu, Apr 16, 2009 at 5:16 AM, Dirkjan Ochtman wrote: > On Thu, Apr 16, 2009 at 11:54, Amaury Forgeot d'Arc wrote: >> In my opinion: >> arithmetic with months is a mess. There is no such "month interval" or >> "year interval" with a precise definition. >> If we adopt some kind of month manipulation, it should be a function >> or a method, like you would do for features like last_day_of_month(d), >> or following_weekday(d, 'monday'). >> >> date(2008, 1, 30).add_months(1) == date(2008, 2, 29) > > I concur. Trying to shoehorn month arithmetic into timedelta is a > PITA, precisely because it's somewhat inexact. It's better to have > some separate behavior that has well-defined behavior in edge cases. This is my experience also, and including a distinct and well-defined behavior in the core is exactly my intention with this patch.
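Jess's point about storing and composing deltas can be illustrated with a minimal, hypothetical MonthDelta class (a sketch of the idea, not the patch's monthdelta): an arithmetic object can be kept in a variable, negated, and scaled in ways a method call cannot.

```python
from calendar import monthrange
from datetime import date

class MonthDelta:
    # Minimal illustrative sketch: only the operations discussed here.
    def __init__(self, months=1):
        self.months = months

    def __neg__(self):
        return MonthDelta(-self.months)

    def __mul__(self, n):
        return MonthDelta(self.months * n)

    def __radd__(self, d):
        # date + MonthDelta: clamped month shift. date.__add__ returns
        # NotImplemented for unknown types, so Python falls back here.
        y, m = divmod(d.year * 12 + (d.month - 1) + self.months, 12)
        return date(y, m + 1, min(d.day, monthrange(y, m + 1)[1]))

quarter = MonthDelta(3)  # stored in a variable, reused at will
print(date(2008, 11, 30) + quarter)        # 2009-02-28 (clamped)
print(date(2008, 2, 29) + -MonthDelta(1))  # 2008-01-29
```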
From jess.austin at gmail.com Thu Apr 16 21:28:07 2009 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 16 Apr 2009 14:28:07 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <79990c6b0904160754r3761518an967fc543a76767d5@mail.gmail.com> References: <79990c6b0904160754r3761518an967fc543a76767d5@mail.gmail.com> Message-ID: Thanks for everyone's comments! On Thu, Apr 16, 2009 at 9:54 AM, Paul Moore wrote: > I like the idea in principle. In practice, of course, month > calculations are inherently ill-defined, so you need to be very > specific in documenting all of the edge cases, and you should have > strong use cases to ensure that the behaviour implemented matches user > requirements. (I haven't yet had time to read the patch - you may well > already have these points covered, certainly your comments above > indicate that you appreciate the subtleties involved). > > I agree that ultimately it would be useful in the core. However, I'd > suggest that you release the functionality as an independent module in > the first instance, to establish it outside of the core. Once it has > matured somewhat as a 3rd party module, it would then be ready for > integration in the core. This also has the benefit that it makes the > functionality available to users of Python 2.6 (and possibly earlier) > rather than just in 2.7/3.1 onwards. I have uploaded a python-coded version of this functionality to the bug page. I should backport it through 2.3 and post that to pypi, but I haven't done that yet. The current effort was focused on the C module since that's how the rest of datetime is implemented, and also I wanted to learn a bit about CPython internals. To the latter point, I would _really_ appreciate it if someone could leave a few comments on Rietveld. >> Please let me know what you think of the idea and/or its execution. > > I hope the above comments help. Ultimately, I'd like to see this added > to the core. 
It's tricky enough that having a "standard" > implementation is a definite benefit in itself. But equally, I'd give > it time to iron out the corner cases on a faster development cycle > than the core offers before "freezing" it as part of the stdlib. I understand these concerns. I think I was too brief in my initial message. Here are the docstrings: >>> print(monthdelta.__doc__) Months offset from a date or datetime. monthdeltas allow date calculation without regard to the different lengths of different months. A monthdelta value added to a date produces another date that has the same day-of-the-month, regardless of the lengths of the intervening months. If the resulting date is in too short a month, the last day in that month will result: date(2008,1,30) + monthdelta(1) -> date(2008,2,29) monthdeltas may be added, subtracted, multiplied, and floor-divided similarly to timedeltas. They may not be added to timedeltas directly, as both classes are intended to be used directly with dates and datetimes. Only ints may be passed to the constructor, the default argument of which is 1 (one). monthdeltas are immutable. NOTE: in calculations involving the 29th, 30th, and 31st days of the month, monthdeltas are not necessarily invertible [i.e., the result above would NOT imply that date(2008,2,29) - monthdelta(1) -> date(2008,1,30)]. >>> print(monthmod.__doc__) monthmod(start, end) -> (monthdelta, timedelta) Distribute the interim between start and end dates into monthdelta and timedelta portions. If and only if start is after end, returned monthdelta will be negative. Returned timedelta is never negative, and is always smaller than the month in which end occurs. Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] = dt + td There is better-looking documentation in html/library/datetime.html and html/c-api/datetime.html in the patch. By all means, if you're curious, download the patch and try it out yourself! 
cheers, Jess From rowen at u.washington.edu Thu Apr 16 20:58:27 2009 From: rowen at u.washington.edu (Russell Owen) Date: Thu, 16 Apr 2009 11:58:27 -0700 Subject: [Python-Dev] RELEASED Python 2.6.2 In-Reply-To: References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: I installed the Mac binary on my Intel 10.5.6 system and it works, except it still uses Apple's system Tcl/Tk 8.4.7 instead of my ActiveState 8.4.19 (which is in /Library/Frameworks where one would expect). I just built python from source and that version does use ActiveState 8.4.19. I wish I knew what's going on. Not being able to use the binary distros is a bit of a pain. Just out of curiosity: which 3rd party Tcl/Tk did you have installed when you made the installer? Perhaps if it was 8.5 that would explain it. If so I may try updating my Tcl/Tk -- I've been wanting some of the bug fixes in 8.5 anyway. -- Russell On Apr 16, 2009, at 5:35 AM, Ronald Oussoren wrote: > > On 15 Apr, 2009, at 22:47, Russell E. Owen wrote: > >> Thank you for 2.6.2. >> >> I see the Mac binary installer isn't out yet (at least it is not >> listed >> on the downloads page). Any chance that it will be compatible with >> 3rd >> party Tcl/Tk? > > The Mac installer is late because I missed the pre-announcement of > the 2.6.2 tag. I sent the installer to Barry earlier today. > > The installer was build using a 3th-party installation of Tcl/Tk. > > Ronald From jared.grubb at gmail.com Thu Apr 16 22:08:07 2009 From: jared.grubb at gmail.com (Jared Grubb) Date: Thu, 16 Apr 2009 13:08:07 -0700 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com> References: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com> Message-ID: On 16 Apr 2009, at 11:42, Paul Moore wrote: > The key thing missing (I believe) from dateutil is any equivalent of > monthmod. I agree with that. It's well-defined and it makes a lot of sense. 
+1

But, I don't think monthdelta can be made to work... what should the
following be?

print(date(2008,1,30) + monthdelta(1))
print(date(2008,1,30) + monthdelta(2))
print(date(2008,1,30) + monthdelta(1) + monthdelta(1))

Jared

From robert.kern at gmail.com  Thu Apr 16 22:10:15 2009
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 16 Apr 2009 15:10:15 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>
References: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com>
Message-ID: 

On 2009-04-16 13:42, Paul Moore wrote:
> 2009/4/16 Ned Deily:
>> In article
>> ,
>> Jess Austin wrote:
>>> I'm new to python core development, and I've been advised to write to
>>> python-dev concerning a feature/patch I've placed at
>>> http://bugs.python.org/issue5434, with Rietveld at
>>> http://codereview.appspot.com/25079.
>> Without having looked at the code, I wonder whether you've looked at
>> python-dateutil.  I believe its relativedelta type does what you
>> propose, plus much more, and it has the advantage of being widely used
>> and tested.
>
> The key thing missing (I believe) from dateutil is any equivalent of monthmod.
>
> Hmm, it might be possible via relativedelta(d1,d2), but it's not clear
> to me from the documentation precisely what attributes/methods of a
> relativedelta object are valid for getting data *out* of it.

I thought the examples were quite clear. relativedelta() has an alternate
constructor precisely suited to these calculations but is general and
handles more than just months.
>>> from dateutil.relativedelta import * >>> dt = relativedelta(months=1) >>> dt relativedelta(months=+1) >>> from datetime import datetime >>> datetime(2009, 1, 15) + dt datetime.datetime(2009, 2, 15, 0, 0) >>> datetime(2009, 1, 31) + dt datetime.datetime(2009, 2, 28, 0, 0) >>> dt.months 1 >>> datetime(2009, 1, 31) + relativedelta(years=-1) datetime.datetime(2008, 1, 31, 0, 0) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jess.austin at gmail.com Thu Apr 16 23:30:13 2009 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 16 Apr 2009 16:30:13 -0500 Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143 In-Reply-To: References: Message-ID: Jared Grubb wrote: > On 16 Apr 2009, at 11:42, Paul Moore wrote: >> The key thing missing (I believe) from dateutil is any equivalent of >> monthmod. > > > I agree with that. It's well-defined and it makes a lot of sense. +1 > > But, I dont think monthdelta can be made to work... what should the > following be? >>> print(date(2008,1,30) + monthdelta(1)) 2008-02-29 >>> print(date(2008,1,30) + monthdelta(2)) 2008-03-30 >>> print(date(2008,1,30) + monthdelta(1) + monthdelta(1)) 2008-03-29 This is a perceptive observation: in the absence of parentheses to dictate a different order of operations, the third quantity will differ from the second. Furthermore, this won't _always_ be true, just for dates near the end of the month, which is nonintuitive. (Incidentally, this is another reason why this functionality should not just be lumped into timedelta; guarantees that have long existed for operations with timedelta would no longer hold if it tried to deal with months.) I find that date calculations involving months involve a certain amount of inherent confusion. 
I've tried to reduce this by introducing well-specified functionality
that will allow accurate reasoning, as part of the core's included
batteries.  I think that one who uses these objects will develop an
intuition and write accurate code quickly.  It is nonintuitive that
order of operation matters for addition of months, just as it matters
for subtraction and division of all objects, but with the right tools
we can deal with this.

An interesting consequence is that if I want to determine if date b is
more than a month after date a, sometimes I should use:

b - monthdelta(1) > a

rather than

a + monthdelta(1) < b

[Consider a list of run dates for a process that should run the last
day of every month: "a" might be date(2008, 2, 29) while "b" is
date(2008, 3, 31).  In this case the two expressions would have
different values.]

cheers,
Jess

From jess.austin at gmail.com  Thu Apr 16 23:41:19 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 16:41:19 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
Message-ID: 

Jon Ribbens wrote:
> On Thu, Apr 16, 2009 at 12:10:36PM +0400, Oleg Broytmann wrote:
>> > This patch adds a "monthdelta" class and a "monthmod" function to the
>> > datetime module.  The monthdelta class is much like the existing
>> > timedelta class, except that it represents months offset from a date,
>> > rather than an exact period offset from a date.
>>
>>    I'd rather see the code merged with timedelta: timedelta(months=n).
>
> Unfortunately, that's simply impossible. A timedelta is a fixed number
> of seconds, and the time between one month and the next varies.

I agree.

> I am very much in favour of there being the ability to add months to
> dates though. Obviously there is the question of what to do when you
> move forward 1 month from the 31st January; in my opinion an optional
> argument to specify different behaviours would be nice.

Others have suggested raising an exception when a month calculation
lands on an invalid date.
Python already has that; it's spelled like this: >>> dt = date(2008, 1, 31) >>> dt.replace(month=dt.month + 1) Traceback (most recent call last): File "", line 1, in ValueError: day is out of range for month What other behavior options besides "last-valid-day-of-the-month" would you like to see? cheers, Jess From solipsis at pitrou.net Thu Apr 16 23:47:14 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 16 Apr 2009 21:47:14 +0000 (UTC) Subject: [Python-Dev] Issue5434: datetime.monthdelta References: Message-ID: Jess Austin gmail.com> writes: > > What other behavior options besides "last-valid-day-of-the-month" > would you like to see? IMHO, the question is rather what the use case is for the behaviour you are proposing. In which kind of situation is it acceptable to turn 31/2 silently into 29/2? From eric at trueblade.com Thu Apr 16 23:50:27 2009 From: eric at trueblade.com (Eric Smith) Date: Thu, 16 Apr 2009 17:50:27 -0400 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <49E7A823.3060702@trueblade.com> Jess Austin wrote: > What other behavior options besides "last-valid-day-of-the-month" > would you like to see? - Add 30 days to the source date. I'm sure there are others. Followups to python-ideas. From p.f.moore at gmail.com Fri Apr 17 00:17:07 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Apr 2009 23:17:07 +0100 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com> Message-ID: <79990c6b0904161517g5c83673g3e0673072ff8abee@mail.gmail.com> 2009/4/16 Robert Kern : > On 2009-04-16 13:42, Paul Moore wrote: >> >> 2009/4/16 Ned Deily: >>> >>> In article >>> , >>> ?Jess Austin ?wrote: >>>> >>>> I'm new to python core development, and I've been advised to write to >>>> python-dev concerning a feature/patch I've placed at >>>> http://bugs.python.org/issue5434, with Rietveld at >>>> http://codereview.appspot.com/25079. 
>>> >>> Without having looked at the code, I wonder whether you've looked at >>> python-dateutil. ? I believe its relativedelta type does what you >>> propose, plus much more, and it has the advantage of being widely used >>> and tested. >> >> The key thing missing (I believe) from dateutil is any equivalent of >> monthmod. >> >> Hmm, it might be possible via relativedelta(d1,d2), but it's not clear >> to me from the documentation precisely what attributes/methods of a >> relativedelta object are valid for getting data *out* of it. > > I thought the examples were quite clear. relativedelta() has an alternate > constructor precisely suited to these calculations but is general and > handles more than just months. > >>>> from dateutil.relativedelta import * >>>> dt = relativedelta(months=1) >>>> dt > relativedelta(months=+1) >>>> from datetime import datetime >>>> datetime(2009, 1, 15) + dt > datetime.datetime(2009, 2, 15, 0, 0) >>>> datetime(2009, 1, 31) + dt > datetime.datetime(2009, 2, 28, 0, 0) >>>> dt.months > 1 >>>> datetime(2009, 1, 31) + relativedelta(years=-1) > datetime.datetime(2008, 1, 31, 0, 0) Yes, but given r = relativedelta(d1, d2) how do I determine the number of months between d1 and d2, and the "remainder" - what monthmod gives me. From the code, r.months looks like it works, but it's not documented, and I'm not 100% sure if it's always computed. The use case I'm thinking of is converting the difference between 2 dates into "3 years, 2 months, 5 days" or whatever. 
I've got an application which needs to get this right for one of the dates being 29th Feb, so I *really* get to exercise the corner cases :-) Paul From robert.kern at gmail.com Fri Apr 17 00:29:24 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 16 Apr 2009 17:29:24 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <79990c6b0904161517g5c83673g3e0673072ff8abee@mail.gmail.com> References: <79990c6b0904161142n7aeb155akb4196ad6f86054f5@mail.gmail.com> <79990c6b0904161517g5c83673g3e0673072ff8abee@mail.gmail.com> Message-ID: On 2009-04-16 17:17, Paul Moore wrote: > 2009/4/16 Robert Kern: >>>>> from dateutil.relativedelta import * >>>>> dt = relativedelta(months=1) >>>>> dt >> relativedelta(months=+1) >>>>> from datetime import datetime >>>>> datetime(2009, 1, 15) + dt >> datetime.datetime(2009, 2, 15, 0, 0) >>>>> datetime(2009, 1, 31) + dt >> datetime.datetime(2009, 2, 28, 0, 0) >>>>> dt.months >> 1 >>>>> datetime(2009, 1, 31) + relativedelta(years=-1) >> datetime.datetime(2008, 1, 31, 0, 0) > > Yes, but given > > r = relativedelta(d1, d2) > > how do I determine the number of months between d1 and d2, and the > "remainder" - what monthmod gives me. Oops! Sorry, I read too quickly and misread "monthmod" as "monthdelta". > From the code, r.months looks > like it works, but it's not documented, and I'm not 100% sure if it's > always computed. The result of relativedelta(d1, d2) is the same thing as if it were explicitly constructed from the years=, months=, etc. keyword arguments. From this example, I think this is something that can be relied upon: """ It works with dates too. >>> relativedelta(TODAY, johnbirthday) relativedelta(years=+25, months=+5, days=+11, hours=+12) """ > The use case I'm thinking of is converting the difference between 2 > dates into "3 years, 2 months, 5 days" or whatever. 
I've got an > application which needs to get this right for one of the dates being > 29th Feb, so I *really* get to exercise the corner cases :-) I believe relativedelta() is intended for this use case although it may resolve ambiguities in a different way than you were hoping. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jess.austin at gmail.com Fri Apr 17 01:02:11 2009 From: jess.austin at gmail.com (Jess Austin) Date: Thu, 16 Apr 2009 18:02:11 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta Message-ID: Antoine Pitrou wrote: > Jess Austin gmail.com> writes: >> >> What other behavior options besides "last-valid-day-of-the-month" >> would you like to see? > > IMHO, the question is rather what the use case is for the behaviour you are > proposing. In which kind of situation is it acceptable to turn 31/2 silently > into 29/2? I have worked in utility/telecom billing, and needed to examine large numbers of invoice dates, fulfillment dates, disconnection dates, payment dates, collection event dates, etc. There would often be particular rules for the relationships among these dates, and since many companies generate invoices every day of the month, you couldn't rely on rules like "this always happens on the 5th". Here is an example (modified) from the doc page. We want to find missing invoices: >>> invoices = {123: [date(2008, 1, 31), ... date(2008, 2, 29), ... date(2008, 3, 31), ... date(2008, 4, 30), ... date(2008, 5, 31), ... date(2008, 6, 30), ... date(2008, 7, 31), ... date(2008, 12, 31)], ... 456: [date(2008, 1, 1), ... date(2008, 5, 1), ... date(2008, 6, 1), ... date(2008, 7, 1), ... date(2008, 8, 1), ... date(2008, 11, 1), ... date(2008, 12, 1)]} >>> for account, dates in invoices.items(): ... a = dates[0] ... for b in dates[1:]: ... if b - monthdelta(1) > a: ... 
print('account', account, 'missing between', a, 'and', b) ... a = b ... account 456 missing between 2008-01-01 and 2008-05-01 account 456 missing between 2008-08-01 and 2008-11-01 account 123 missing between 2008-07-31 and 2008-12-31 In general, sometimes we care more about the number of months that separate dates than we do about the exact dates themselves. This is perhaps not the most common situation for date calculations, but it does come up for some of us. I tired of writing one-off solutions that would fail in unexpected corner cases, so I created this patch. Paul Moore has also described his favorite use-case for this functionality. cheers, Jess From foom at fuhm.net Fri Apr 17 01:11:51 2009 From: foom at fuhm.net (James Y Knight) Date: Thu, 16 Apr 2009 19:11:51 -0400 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net> On Apr 16, 2009, at 5:47 PM, Antoine Pitrou wrote: > IMHO, the question is rather what the use case is for the behaviour > you are > proposing. In which kind of situation is it acceptable to turn 31/2 > silently > into 29/2? Essentially any situation in which you'd actually want a "next month" operation it's acceptable to do that. It's a human-interface operation, and as such, everyone (ahem) "knows what it means" to say "2 months from now", but the details don't usually have to be thought about too much. Of course when you have a computer program, you actually need to tell it what you really mean. I do a fair amount of date calculating, and use two different kinds of "add-month": Option 1) Add n to the month number, truncate day number to fit the month you end up in. Option 2) As above, but with the additional caveat that if the original date is the last day of its month, the new day should also be the last day of the new month. That is: April 30th + 1 month = May 31st, instead of May 30th. They're both useful behaviors, in different circumstances. 
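For concreteness, a quick sketch of the two behaviors (a hypothetical
function, not taken from any real patch):

```python
import calendar
from datetime import date

def add_month(d, n, snap_to_eom=False):
    # Option 1 (default): shift months and truncate the day to fit.
    # Option 2 (snap_to_eom=True): additionally, the last day of a
    # month always maps to the last day of the target month.
    i = d.month - 1 + n
    y, m = d.year + i // 12, i % 12 + 1
    last = calendar.monthrange(y, m)[1]
    if snap_to_eom and d.day == calendar.monthrange(d.year, d.month)[1]:
        return date(y, m, last)
    return date(y, m, min(d.day, last))
```

Under option 1, April 30th + 1 month is May 30th; under option 2 it is
May 31st.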
James From skip at pobox.com Fri Apr 17 02:18:35 2009 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Apr 2009 19:18:35 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <18918.61476.980951.991275@montanaro.dyndns.org> Message-ID: <18919.51931.874515.848841@montanaro.dyndns.org> >> I have this funny feeling that arithmetic using monthdelta wouldn't >> always be intuitive. Jess> I think that's true, especially since these calculations are not Jess> necessarily invertible: >>> date(2008, 1, 30) + monthdelta(1) datetime.date(2008, 2, 29) >>> date(2008, 2, 29) - monthdelta(1) datetime.date(2008, 1, 29) Jess> It could be that non-intuitivity is inherent in the problem of Jess> dealing with dates and months. To which I would respond: >>> import this The Zen of Python, by Tim Peters ... In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. ... >From the discussion I've seen so far, it's not clear that there is one obvious way to do it, and the ambiguity of the problem forces people to guess. My recommendations after letting it roll around in the back of my brain for the day: * I think it would be best to leave the definition of monthdelta up to individual users. That is, add nothing to the datetime module and let them write a function which does what they want it to do. * The idea/implementation probably needs to bake on the python-ideas list and perhaps comp.lang.python for a bit to see if some concensus can be reached on reasonable functionality. (I'm a bit behind on this thread. Hopefully someone else has already suggested these two things.) 
Skip

From tleeuwenburg at gmail.com  Fri Apr 17 02:52:55 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Fri, 17 Apr 2009 10:52:55 +1000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18919.51931.874515.848841@montanaro.dyndns.org>
References: <18918.61476.980951.991275@montanaro.dyndns.org>
	<18919.51931.874515.848841@montanaro.dyndns.org>
Message-ID: <43c8685c0904161752i6a7f4a23o3ece8f5b71ec6dd8@mail.gmail.com>

My thoughts on balance regarding monthdeltas:
-- Month operations are useful, people will want to do them
-- I think having a monthdelta object rather than a method makes sense to me
-- I think the documentation is severely underdone. The functionality is
not intuitive and therefore the docs need a lot more detail than usual
-- Can you specify "1 month plus 10 days"?, i.e. add a monthdelta to a
timedelta?
-- What about other cyclical periods (fortnights, 28 days, lunar cycles,
high tides)?

Cheers,
-T
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jess.austin at gmail.com  Fri Apr 17 03:01:22 2009
From: jess.austin at gmail.com (Jess Austin)
Date: Thu, 16 Apr 2009 20:01:22 -0500
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <18919.51931.874515.848841@montanaro.dyndns.org>
References: <18918.61476.980951.991275@montanaro.dyndns.org>
	<18919.51931.874515.848841@montanaro.dyndns.org>
Message-ID: 

On Thu, Apr 16, 2009 at 7:18 PM,  wrote:
>
>    >> I have this funny feeling that arithmetic using monthdelta wouldn't
>    >> always be intuitive.
>
>    Jess> I think that's true, especially since these calculations are not
>    Jess> necessarily invertible:
>
>    >>> date(2008, 1, 30) + monthdelta(1)
>    datetime.date(2008, 2, 29)
>    >>> date(2008, 2, 29) - monthdelta(1)
>    datetime.date(2008, 1, 29)
>
>    Jess> It could be that non-intuitivity is inherent in the problem of
>    Jess> dealing with dates and months.
>
> To which I would respond:
>
>    >>> import this
>    The Zen of Python, by Tim Peters
>
>    ...
>    In the face of ambiguity, refuse the temptation to guess.
>    There should be one-- and preferably only one --obvious way to do it.
>    Although that way may not be obvious at first unless you're Dutch.
>    ...
>
> From the discussion I've seen so far, it's not clear that there is one
> obvious way to do it, and the ambiguity of the problem forces people to
> guess.
>
> My recommendations after letting it roll around in the back of my brain for
> the day:
>
>    * I think it would be best to leave the definition of monthdelta up to
>      individual users.  That is, add nothing to the datetime module and let
>      them write a function which does what they want it to do.
>
>    * The idea/implementation probably needs to bake on the python-ideas
>      list and perhaps comp.lang.python for a bit to see if some concensus
>      can be reached on reasonable functionality.

So far, all the other solutions to the problem that have been mentioned
are easily supported in current python.

Raise an exception when a calculation results in an invalid date:

>>> dt = date(2008, 1, 31)
>>> dt.replace(month=dt.month + 1)
Traceback (most recent call last):
 File "", line 1, in 
ValueError: day is out of range for month

Add exactly 30 days to a date:

>>> dt + timedelta(30)
datetime.date(2008, 3, 1)

These operations are useful in particular contexts.  What I've
submitted is also useful, and currently isn't easy in core,
batteries-included python.  While I would consider the foregoing
interpretation of the Zen to be backwards (this doesn't add another way
to do something that's already possible, it makes possible something
that currently encourages one to pull her hair out), I suppose it
doesn't matter.  If adding a class and a function to a module will
require extended advocacy on -ideas and c.l.p, I'm probably not the
person for the job.
If, on the other hand, one of the committers wants to toss this in at
some point, whether now or 3 versions down the road, the patch is up at
bugs.python.org (and I'm happy to make any suggested modifications).
I'm glad to have written this; I learned a bit about CPython internals
and scraped a layer of rust off my C skills.  I will go ahead and
backport the python-coded version to 2.3.  I'll continue this
conversation with whomever for however long, but I suspect this topic
will soon have worn out its welcome on python-dev.

cheers,
Jess

From greg.ewing at canterbury.ac.nz  Fri Apr 17 03:55:58 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 17 Apr 2009 13:55:58 +1200
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: 
References: 
Message-ID: <49E7E1AE.5030809@canterbury.ac.nz>

Jess Austin wrote:

> This is a perceptive observation: in the absence of parentheses to
> dictate a different order of operations, the third quantity will
> differ from the second.

Another aspect of this is the use case mentioned right
at the beginning of this discussion concerning a recurring
event on a particular day of the month.

If you do this the naive way by just repeatedly adding one
of these monthdeltas to the previous date, and the date is
near the end of the month, it will eventually end up
drifting to the 28th of every month.

--
Greg

From tleeuwenburg at gmail.com  Fri Apr 17 04:10:59 2009
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Fri, 17 Apr 2009 12:10:59 +1000
Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143
In-Reply-To: <49E7E1AE.5030809@canterbury.ac.nz>
References: <49E7E1AE.5030809@canterbury.ac.nz>
Message-ID: <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com>

Actually, that's a point.

If it's the 31st of Jan, then +1 monthdelta will be 28 Feb and another
+1 will be 28 March whereas 31st Jan +2 monthdeltas will be 31 March.
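Spelled out with a hypothetical clamped add_months helper (a sketch of
the proposed monthdelta semantics, not the actual patch):

```python
import calendar
from datetime import date

def add_months(d, n):
    # Clamp the day to the last valid day of the target month.
    i = d.month - 1 + n
    y, m = d.year + i // 12, i % 12 + 1
    return date(y, m, min(d.day, calendar.monthrange(y, m)[1]))

start = date(2009, 1, 31)
stepped = add_months(add_months(start, 1), 1)  # via 28 Feb, lands on 28 Mar
direct = add_months(start, 2)                  # straight to 31 Mar
```

So the stepped result is March 28th while the direct result is
March 31st -- the two are not the same.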
That's the kind of thing which really needs to be documented, or I think people really will make mistakes. For example, should a monthdelta include a goal-day so that the example above would go 31 Jan / 28 Feb / 31 March? -T On Fri, Apr 17, 2009 at 11:55 AM, Greg Ewing wrote: > Jess Austin wrote: > > This is a perceptive observation: in the absence of parentheses to >> dictate a different order of operations, the third quantity will >> differ from the second. >> > > Another aspect of this is the use case mentioned right > at the beginning of this discussion concerning a recurring > event on a particular day of the month. > > If you do this the naive way by just repeatedly adding one > of these monthdeltas to the previous date, and the date is > near the end of the month, it will eventually end up > drifting to the 28th of every month. > > -- > Greg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/tleeuwenburg%40gmail.com > -- -------------------------------------------------- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think" -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Apr 17 04:29:11 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 17 Apr 2009 12:29:11 +1000 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: Message-ID: <200904171229.11531.steve@pearwood.info> On Fri, 17 Apr 2009 07:41:19 am Jess Austin wrote: > Others have suggested raising an exception when a month calculation > lands on an invalid date. 
Python already has that; it's spelled like > > this: > >>> dt = date(2008, 1, 31) > >>> dt.replace(month=dt.month + 1) > > Traceback (most recent call last): > File "", line 1, in > ValueError: day is out of range for month > > What other behavior options besides "last-valid-day-of-the-month" > would you like to see? Adding one month to 31st January could mean: 1: raise an exception 2: return 28th February (last day of February) 3: return 3rd April (1 month = 31 days) 4: return 2nd April (1 month = 30 days) 5: return 28th February (1 month = 4 weeks = 28 days) 6: next business day after any of the above dates I don't really expect Python to support scenario 6, as that would require knowledge of local public holidays and conventions for week ends and working days. Open Office spreadsheet includes the following relevant functions: EDATE(start date; months) returns the serial number of the date that is a specified number of months before or after the start date. EOMONTH(start date; months) returns the serial number of the last day of the month that comes a certain number of months before or after the start date. MONTHS(start date; end date; type) calculate the difference in months between start and end date, possible values for type include 0 (interval) and 1 (in calendar months). Rather than a series of almost-identical functions catering for people who want 28 day months and 31 day months, I propose a keyword argument days_in_month which specifies the number of days in a month. Any positive integer should be accepted, but of course only 28, 30 and 31 will be meaningful for the common English meaning of the word "month". 0 or None (the default) should trigger "last day of the month" behaviour (scenario 2 above). That will (I think) simplify both documentation and implementation. Adding 1 month to a day will be defined as adding days_in_month days (if given), and if not given, adding 31 days but truncating the result to the last day of the next month. Thoughts? 
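Roughly, the behaviour I'm proposing would look like this (a sketch
only; the name and defaults are open to bikeshedding):

```python
import calendar
from datetime import date, timedelta

def add_month(d, months=1, days_in_month=None):
    # days_in_month given: one "month" is exactly that many days.
    # days_in_month None or 0: clamp to the last valid day of the
    # target month (scenario 2 above).
    if days_in_month:
        return d + timedelta(days=months * days_in_month)
    i = d.month - 1 + months
    y, m = d.year + i // 12, i % 12 + 1
    return date(y, m, min(d.day, calendar.monthrange(y, m)[1]))
```

So add_month(date(2009, 1, 31)) gives date(2009, 2, 28), while
add_month(date(2009, 1, 31), days_in_month=30) gives date(2009, 3, 2).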
-- Steven D'Aprano From steve at pearwood.info Fri Apr 17 04:34:20 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 17 Apr 2009 12:34:20 +1000 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <200904171229.11531.steve@pearwood.info> References: <200904171229.11531.steve@pearwood.info> Message-ID: <200904171234.21121.steve@pearwood.info> On Fri, 17 Apr 2009 12:29:11 pm Steven D'Aprano wrote: > Adding one month to 31st January could mean: > > 1: raise an exception > 2: return 28th February (last day of February) > 3: return 3rd April (1 month = 31 days) > 4: return 2nd April (1 month = 30 days) > 5: return 28th February (1 month = 4 weeks = 28 days) Obviously I meant March, not April. Oops. -- Steven D'Aprano From glyph at divmod.com Fri Apr 17 04:53:32 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 17 Apr 2009 02:53:32 -0000 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net> References: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net> Message-ID: <20090417025332.12555.1359868255.divmod.xquotient.8476@weber.divmod.com> On 16 Apr, 11:11 pm, foom at fuhm.net wrote: >On Apr 16, 2009, at 5:47 PM, Antoine Pitrou wrote: >It's a human-interface operation, and as such, everyone (ahem) "knows >what it means" to say "2 months from now", but the details don't >usually have to be thought about too much. Of course when you have a >computer program, you actually need to tell it what you really mean. > >I do a fair amount of date calculating, and use two different kinds of >"add-month": > >Option 1) >Add n to the month number, truncate day number to fit the month you >end up in. > >Option 2) >As above, but with the additional caveat that if the original date is >the last day of its month, the new day should also be the last day of >the new month. That is: >April 30th + 1 month = May 31st, instead of May 30th. > >They're both useful behaviors, in different circumstances. 
I don't have a third option, but something that would be useful to
mention in the documentation for "monthdelta": frequently users will
want a recurring "monthly" event.  It's important to note that you need
to keep your original date around if you want these rules to be
consistently applied.

For example, if you have a monthly billing cycle that starts on May 31,
you need to keep the original May 31 around to add monthdelta(X) if you
want it to be May 31 when it rolls around next year; otherwise every
time February rolls around all of your end-of-month dates get clamped
to the 28th of every month.  (Unless you're following James's option 2,
of course, in which case things which are normally on the 28th will get
clamped to the 31st of following months.)

My experience with month-calculating software suggests that this is
something very easy to screw up.

From steve at pearwood.info  Fri Apr 17 04:42:45 2009
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 17 Apr 2009 12:42:45 +1000
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: 
References: 
Message-ID: <200904171242.46009.steve@pearwood.info>

On Fri, 17 Apr 2009 07:47:14 am Antoine Pitrou wrote:
> Jess Austin  gmail.com> writes:
> > What other behavior options besides "last-valid-day-of-the-month"
> > would you like to see?
>
> IMHO, the question is rather what the use case is for the behaviour
> you are proposing. In which kind of situation is it acceptable to
> turn 31/2 silently into 29/2?

Any time the user expects "one month from the last day of January" to
mean "the last day of February".  I dare say that if you did a poll of
non-programmers, that would be a very common expectation, possibly the
most common.

I just asked the missus, who is a non-programmer, what date is one
month after 31st January and her answer was "2nd of March on leap
years, otherwise 3rd of March".
-- Steven D'Aprano From skip at pobox.com Fri Apr 17 04:55:02 2009 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Apr 2009 21:55:02 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <18918.61476.980951.991275@montanaro.dyndns.org> <18919.51931.874515.848841@montanaro.dyndns.org> Message-ID: <18919.61318.106749.848833@montanaro.dyndns.org> Jess> If, on the other hand, one of the committers wants to toss this in Jess> at some point, whether now or 3 versions down the road, the patch Jess> is up at bugs.python.org (and I'm happy to make any suggested Jess> modifications). Again, I think it needs to bake a bit. I understand the desire and need for doing date arithmetic with months. Python is mature enough though that I don't think you can just "toss this in". It should be available as a module outside of Python so people can beat on it, flush out bugs, make suggestions for enhancements, whatever. I believe you mentioned putting it up on PyPI. I think that's an excellent idea. I've used parts of Gustavo Niemeyer's dateutil package for a couple years and love it. It's widely used. Adding it to dateutil seems like another possibility. That would guarantee an instant user base. From there, if it is found to be useful it could make the leap to be part of the datetime module. Skip From skip at pobox.com Fri Apr 17 05:14:33 2009 From: skip at pobox.com (skip at pobox.com) Date: Thu, 16 Apr 2009 22:14:33 -0500 Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143 In-Reply-To: <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com> References: <49E7E1AE.5030809@canterbury.ac.nz> <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com> Message-ID: <18919.62489.500216.889134@montanaro.dyndns.org> Tennessee> If its' the 31st of Jan, then +1 monthdelta will be 28 Feb Tennessee> and another +1 will be 28 March whereas 31st Jan +2 Tennessee> monthdeltas will be 31 March. 
Other possible arithmetics: * 31 Jan 2008 + monthdelta(2) might be 31 Jan 2008 + 31 days (# days in Jan) + 29 days (# days in Feb) * 31 Jan 2008 + monthdelta(2) might be 31 Jan 2008 + 29 days (# days in Feb) + 31 days (# days in Mar) * Treat the day of the month of the base datetime as an offset from the end of the month. 29 Jan 2007 would thus have an EOM offset of -2. Adding monthdelta(2) would advance you into March with the resulting day being two from the end of the month, or 29 Mar 2007. OTOH, adding monthdelta(1) you'd wind up on 26 Feb 2007. * Consider the day of the month in the base datetime as an offset from the start of the month if it is closer to the start or as an offset from the end of the month if it is closer to the end. 12 Mar 2009 - monthdelta(2) would land you at 12 Jan 2009 whereas 17 Mar 2009 - monthdelta(1) would land you at 12 Feb 2009. My mind spins at all the possibilities. I suspect we've seen at least ten different monthdelta rules just in this thread. I don't know how much sense they all make, but we can probably keep dreaming up new ones until the cows come home. Except for completely wacko sets of rules you can probably find uses for most of them. Bake, baby, bake. pillsbury-doughboy-ly, y'rs, Skip From steve at pearwood.info Fri Apr 17 05:20:12 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 17 Apr 2009 13:20:12 +1000 Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143 In-Reply-To: <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com> References: <49E7E1AE.5030809@canterbury.ac.nz> <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com> Message-ID: <200904171320.13025.steve@pearwood.info> On Fri, 17 Apr 2009 12:10:59 pm Tennessee Leeuwenburg wrote: > Actually, that's a point. > > If its' the 31st of Jan, then +1 monthdelta will be 28 Feb and > another +1 will be 28 March whereas 31st Jan +2 monthdeltas will be > 31 March. 
> > That's the kind of thing which really needs to be documented, or I > think people really will make mistakes. It might be worth noting as an aside, but it should be obvious in the same way that string concatenation is different from numerical addition: 1 + 2 = 2 + 1 '1' + '2' != '2' + '1' > For example, should a monthdelta include a goal-day so that the > example above would go 31 Jan / 28 Feb / 31 March? No, that just adds complexity. Consider floating point addition. If you want to step through a loop while doing addition, you need to be aware of round-off error: >>> x = 0.0 >>> step = 0.001 >>> for i in xrange(1000): ... x += step ... >>> x 1.0000000000000007 The solution isn't to add a "goal" to the plus operator, but to fix your code to use a better algorithm: >>> y = 0.0 >>> for i in xrange(1, 1001): ... y = i*step ... >>> y 1.0 Same with monthdelta. -- Steven D'Aprano From nad at acm.org Fri Apr 17 05:28:45 2009 From: nad at acm.org (Ned Deily) Date: Thu, 16 Apr 2009 20:28:45 -0700 Subject: [Python-Dev] RELEASED Python 2.6.2 References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: In article , Russell Owen wrote: > I installed the Mac binary on my Intel 10.5.6 system and it works, > except it still uses Apple's system Tcl/Tk 8.4.7 instead of my > ActiveState 8.4.19 (which is in /Library/Frameworks where one would > expect). > > I just built python from source and that version does use ActiveState > 8.4.19. > > I wish I knew what's going on. Not being able to use the binary > distros is a bit of a pain. 
You're right, the tkinter included with the 2.6.2 installer is not linked properly: Is: $ cd /Library/Frameworks/Python.framework/Versions/2.6 $ cd lib/python2.6/lib-dynload $ otool -L _tkinter.so _tkinter.so: /System/Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility version 8.4.0, current version 8.4.0) /System/Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility version 8.4.0, current version 8.4.0) /usr/lib/libSystem.B.dylib [...] should be: _tkinter.so: /Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility version 8.4.0, current version 8.4.19) /Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility version 8.4.0, current version 8.4.19) /usr/lib/libSystem.B.dylib [...] -- Ned Deily, nad at acm.org From glyph at divmod.com Fri Apr 17 05:58:22 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 17 Apr 2009 03:58:22 -0000 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090416153350.702303A4100@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com> <20090415210902.848443A4100@sparrow.telecommunity.com> <20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com> <20090416153350.702303A4100@sparrow.telecommunity.com> Message-ID: <20090417035822.12555.1669891463.divmod.xquotient.8566@weber.divmod.com> On 16 Apr, 03:36 pm, pje at telecommunity.com wrote: >At 03:46 AM 4/16/2009 +0000, glyph at divmod.com wrote: >>On 15 Apr, 09:11 pm, pje at telecommunity.com wrote: >>Twisted has its own system for 
"namespace" packages, and I'm not >>really sure where we fall in this discussion. I haven't been able to >>follow the whole thread, but my original understanding was that the >>PEP supports "defining packages", which we now seem to be calling >>"base packages", just fine. > >Yes, it does. The discussion since the original proposal, however, has >been dominated by MAL's counterproposal, which *requires* a defining >package. [snip clarifications] >Does that all make sense now? Yes. Thank you very much for the detailed explanation. It was more than I was due :-). >MAL's proposal requires a defining package, which is counterproductive >if you have a pure package with no base, since it now requires you to >create an additional project on PyPI just to hold your defining >package. Just as a use-case: would the Java "com.*" namespace be an example of a "pure package with no base"? i.e. lots of projects are in it, but no project owns it? >>I'd appreciate it if the PEP could also be extended cover Twisted's >>very similar mechanism for namespace packages, >>"twisted.plugin.pluginPackagePaths". I know this is not quite as >>widely used as setuptools' namespace package support, but its >>existence belies a need for standardization. >> >>The PEP also seems a bit vague with regard to the treatment of other >>directories containing __init__.py and *.pkg files. > >Do you have a clarification to suggest? My understanding (probably a >projection) is that to be a nested namespace package, you have to have >a parent namespace package. Just to clarify things on my end: "namespace package" to *me* means "package with modules provided from multiple distributions (the distutils term)". The definition provided by the PEP, that a package is spread over multiple directories on disk, seems like an implementation detail. 
Entries on __path__ slow down import, so my understanding of the platonic ideal of a system python installation is one which has a single directory where all packages reside, and a set of metadata off to the side explaining which files belong to which distributions so they can be uninstalled by a package manager. Of course, for a development installation, easy uninstallation and quick swapping between different versions of relevant dependencies is more important than good import performance. So in that case, you would want to optimize differently by having all of your distributions installed into separate directories, with a long PYTHONPATH or lots of .pth files to point at them. And of course you may want a hybrid of the two. So another clarification I'd like in the PEP is an explanation of motivation. For example, it comes as a complete surprise to me that the expectation of namespace packages was to provide only single-source namespaces like zope.*, peak.*, twisted.*. As I mentioned above, I implicitly thought this was more for com.*, twisted.plugins.*. Right now it just says that it's a package which resides in multiple directories, and it's not made clear why that's a desirable feature. >> The concept of a "defining package" seems important to avoid >>conflicts like this one: >> >> http://twistedmatrix.com/trac/ticket/2339 [snip some stuff about plugins and package layout] >Namespaces are not plugins and vice versa. The purpose of a namespace >package is to allow projects managed by the same entity to share a >namespace (ala Java "package" names) and avoid naming conflicts with >other authors. I think this is missing a key word: *separate* projects managed by the same entity. Hmm. 
I thought I could illustrate that the same problem actually occurs without using a plugin system, but I can actually only come up with an example if an application implements multi-library-version compatibility by doing

    try:
        from bad_old_name import bad_old_feature as feature
    except ImportError:
        from good_new_name import good_new_feature as feature

rather than the other way around; and that's a terrible idea for other reasons. Other than that, you'd have to use pkg_resources.resource_listdir or somesuch, at which point you pretty much are implementing a plugin system. So I started this reply disagreeing but I think I've convinced myself that you're right. >Precisely. Note, however, that neither is twisted.plugins a namespace >package, and it should not contain any .pkg files. I don't think it's >reasonable to abuse PEP 382 namespace packages as a plugin system. In >setuptools' case, a different mechanism is provided for locating plugin >code, and of course Twisted already has its own system for the same >thing. It would be nice to have a standardized way of locating plugins >in the stdlib, but that will need to be a different PEP. Okay. So what I'm hearing is that Twisted should happily continue using our own wacky __path__-calculation logic for twisted.plugins, but that *twisted* should be a namespace package so that our separate distributions (TwistedCore, TwistedWeb, TwistedConch, et. al.) can be installed into separate directories. From greg.ewing at canterbury.ac.nz Fri Apr 17 06:08:46 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 17 Apr 2009 16:08:46 +1200 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <200904171242.46009.steve@pearwood.info> References: <200904171242.46009.steve@pearwood.info> Message-ID: <49E800CE.6000707@canterbury.ac.nz> Steven D'Aprano wrote: > "2rd of March on leap years, ^^^ The turd of March?
-- Greg From greg.ewing at canterbury.ac.nz Fri Apr 17 06:14:06 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 17 Apr 2009 16:14:06 +1200 Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143 In-Reply-To: <200904171320.13025.steve@pearwood.info> References: <49E7E1AE.5030809@canterbury.ac.nz> <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com> <200904171320.13025.steve@pearwood.info> Message-ID: <49E8020E.3010109@canterbury.ac.nz> Steven D'Aprano wrote: > it should be obvious in the > same way that string concatenation is different from numerical > addition: > > 1 + 2 = 2 + 1 > '1' + '2' != '2' + '1' However, the proposed arithmetic isn't just non- commutative, it's non-associative, which is a much rarer and more surprising thing. We do at least have ('1' + '2') + '3' == '1' + ('2' + '3') -- Greg From pje at telecommunity.com Fri Apr 17 06:56:39 2009 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 17 Apr 2009 00:56:39 -0400 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090417035822.12555.1669891463.divmod.xquotient.8566@webe r.divmod.com> References: <49DB6A1F.50801@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com> <20090415210902.848443A4100@sparrow.telecommunity.com> <20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com> <20090416153350.702303A4100@sparrow.telecommunity.com> <20090417035822.12555.1669891463.divmod.xquotient.8566@weber.divmod.com> Message-ID: <20090417045411.67E4B3A4100@sparrow.telecommunity.com> At 03:58 AM 4/17/2009 +0000, glyph at 
divmod.com wrote: >Just as a use-case: would the Java "com.*" namespace be an example >of a "pure package with no base"? i.e. lots of projects are in it, >but no project owns it? Er, I suppose. I was thinking more of the various 'com.foo' and 'org.bar' packages as being the pure namespaces in question. For Python, a "flat is better than nested" approach seems fine at the moment. >Just to clarify things on my end: "namespace package" to *me* means >"package with modules provided from multiple distributions (the >distutils term)". The definition provided by the PEP, that a >package is spread over multiple directories on disk, seems like an >implementation detail. Agreed. >Entries on __path__ slow down import, so my understanding of the >platonic ideal of a system python installation is one which has a >single directory where all packages reside, and a set of metadata >off to the side explaining which files belong to which distributions >so they can be uninstalled by a package manager. True... except that part of the function of the PEP is to ensure that if you install those separately-distributed modules to the same directory, it still needs to work as a package and not have any inter-package file conflicts. >Of course, for a development installation, easy uninstallation and >quick swapping between different versions of relevant dependencies >is more important than good import performance. So in that case, >you would want to optimize differently by having all of your >distributions installed into separate directories, with a long >PYTHONPATH or lots of .pth files to point at them. > >And of course you may want a hybrid of the two. Yep. >So another clarification I'd like in the PEP is an explanation of >motivation. For example, it comes as a complete surprise to me that >the expectation of namespace packages was to provide only >single-source namespaces like zope.*, peak.*, twisted.*. 
As I >mentioned above, I implicitly thought this was more for com.*, >twisted.plugins.*. Well, aside from twisted.plugins, I wasn't aware of anybody in Python doing that... and as I described, I never really interpreted that through the lens of "namespace package" vs. "plugin finding". >Right now it just says that it's a package which resides in multiple >directories, and it's not made clear why that's a desirable feature. Good point; perhaps you can suggest some wording on these matters to Martin? >Okay. So what I'm hearing is that Twisted should happily continue >using our own wacky __path__-calculation logic for twisted.plugins, >but that *twisted* should be a namespace package so that our >separate distributions (TwistedCore, TwistedWeb, TwistedConch, et. >al.) can be installed into separate directories. Yes. Thanks for taking the time to participate in this and add another viewpoint to the mix, not to mention clarifying some areas where the PEP could be clearer. From ronaldoussoren at mac.com Fri Apr 17 08:17:50 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 17 Apr 2009 08:17:50 +0200 Subject: [Python-Dev] RELEASED Python 2.6.2 In-Reply-To: References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: <92CB905D-99F8-4727-A1AE-1772EA3ED79C@mac.com> On 16 Apr, 2009, at 20:58, Russell Owen wrote: > I installed the Mac binary on my Intel 10.5.6 system and it works, > except it still uses Apple's system Tcl/Tk 8.4.7 instead of my > ActiveState 8.4.19 (which is in /Library/Frameworks where one would > expect). That's very strange. I had ActiveState 8.4 installed (whatever was current about a month ago). > > Just out of curiosity: which 3rd party Tcl/Tk did you have installed > when you made the installer? Perhaps if it was 8.5 that would > explain it. If so I may try updating my Tcl/Tk -- I've been wanting > some of the bug fixes in 8.5 anyway. Tcl 8.5 won't happen in 2.6, and might not happen in 2.7 either.
Tkinter needs to work with the system version of Tcl, which is some version of 8.4, Tkinter will not work when the major release of Tcl is different than during the compile. That makes it rather hard to support both 8.4 and 8.5 in the same installer. Ronald > > -- Russell > > On Apr 16, 2009, at 5:35 AM, Ronald Oussoren wrote: > >> >> On 15 Apr, 2009, at 22:47, Russell E. Owen wrote: >> >>> Thank you for 2.6.2. >>> >>> I see the Mac binary installer isn't out yet (at least it is not >>> listed >>> on the downloads page). Any chance that it will be compatible with >>> 3rd >>> party Tcl/Tk? >> >> The Mac installer is late because I missed the pre-announcement of >> the 2.6.2 tag. I sent the installer to Barry earlier today. >> >> The installer was build using a 3th-party installation of Tcl/Tk. >> >> Ronald > -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2224 bytes Desc: not available URL: From glyph at divmod.com Fri Apr 17 09:02:31 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 17 Apr 2009 07:02:31 -0000 Subject: [Python-Dev] PEP 382: Namespace Packages In-Reply-To: <20090417045411.67E4B3A4100@sparrow.telecommunity.com> References: <49DB6A1F.50801@egenix.com> <20090414162603.70C843A4100@sparrow.telecommunity.com> <49E4F93B.6010802@egenix.com> <20090415003026.B0A783A4114@sparrow.telecommunity.com> <49E59202.6050809@egenix.com> <20090415144147.6845F3A4100@sparrow.telecommunity.com> <49E60832.8030806@egenix.com> <20090415175704.966B13A4100@sparrow.telecommunity.com> <20090415185221.GB13696@amk-desktop.matrixgroup.net> <20090415192021.558E53A4119@sparrow.telecommunity.com> <94bdd2610904151300qbe8798dx8c2ba9eef9eb014d@mail.gmail.com> <20090415210902.848443A4100@sparrow.telecommunity.com> <20090416034602.12555.179034490.divmod.xquotient.8434@weber.divmod.com> <20090416153350.702303A4100@sparrow.telecommunity.com> 
<20090417035822.12555.1669891463.divmod.xquotient.8566@weber.divmod.com> <20090417045411.67E4B3A4100@sparrow.telecommunity.com> Message-ID: <20090417070231.12555.1552701942.divmod.xquotient.8602@weber.divmod.com> On 04:56 am, pje at telecommunity.com wrote: >At 03:58 AM 4/17/2009 +0000, glyph at divmod.com wrote: >>Just as a use-case: would the Java "com.*" namespace be an example of >>a "pure package with no base"? i.e. lots of projects are in it, but >>no project owns it? > >Er, I suppose. I was thinking more of the various 'com.foo' and >'org.bar' packages as being the pure namespaces in question. For >Python, a "flat is better than nested" approach seems fine at the >moment. Sure. I wasn't saying we should go down the domain-names-are-package- names road for Python itself, just that "com.*" is a very broad example of a multi-"vendor" namespace :). >>Entries on __path__ slow down import, so my understanding of the >>platonic ideal of a system python installation is one which has a >>single directory where all packages reside, and a set of metadata off >>to the side explaining which files belong to which distributions so >>they can be uninstalled by a package manager. >True... except that part of the function of the PEP is to ensure that >if you install those separately-distributed modules to the same >directory, it still needs to work as a package and not have any inter- >package file conflicts. Are you just referring to anything other than the problem of multiple packages overwriting __init__.py here? It's phrased in a very general way that makes me think maybe there's something else going on. >>So another clarification I'd like in the PEP is an explanation of >>motivation. For example, it comes as a complete surprise to me that >>the expectation of namespace packages was to provide only single- >>source namespaces like zope.*, peak.*, twisted.*. As I mentioned >>above, I implicitly thought this was more for com.*, >>twisted.plugins.*. 
> >Well, aside from twisted.plugins, I wasn't aware of anybody in Python >doing that... and as I described, I never really interpreted that >through the lens of "namespace package" vs. "plugin finding". There is some overlap. In particular, in the "vendor distribution" case, I would like there to be one nice, declarative Python way to say "please put these modules into the same package". In the past, Debian in particular has produced some badly broken Twisted packages in the past because there was no standard Python way to say "I have some modules here that go into an existing package". Since every distribution has its own funny ideas about what the filesystem should look like, this has come up for us in a variety of ways. I'd like it if we could use the "official" way of declaring a namespace package for that. >>Right now it just says that it's a package which resides in multiple >>directories, and it's not made clear why that's a desirable feature. > >Good point; perhaps you can suggest some wording on these matters to >Martin? I think the thing I said in my previous message about "multiple distributions" is a good start. That might not be everything, but I think it's clearly the biggest motivation. >>Okay. So what I'm hearing is that Twisted should happily continue >>using our own wacky __path__-calculation logic for twisted.plugins, >>but that *twisted* should be a namespace package so that our separate >>distributions (TwistedCore, TwistedWeb, TwistedConch, et. al.) can be >>installed into separate directories. > >Yes. I'm fairly happy with that, except the aforementioned communication- channel-with-packagers feature of namespace packages; they unambiguously say "multiple OS packages may contribute modules to this Python package". >Thanks for taking the time to participate in this and add another >viewpoint to the mix, not to mention clarifying some areas where the >PEP could be clearer. My pleasure. 
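As a point of reference for the exchange above, the arrangement being discussed -- several separately-installed distributions contributing modules to one package -- can already be exercised with the stdlib's pkgutil.extend_path (which also honours the *.pkg files the PEP builds on). A self-contained sketch, with invented directory and module names:

```python
import os
import sys
import tempfile

# Two hypothetical "distributions", each shipping part of a package 'ns'.
root = tempfile.mkdtemp()
for dist, mod in (("dist_a", "alpha"), ("dist_b", "beta")):
    pkg = os.path.join(root, dist, "ns")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        # Each copy of __init__.py extends __path__ across sys.path,
        # so the first one imported picks up its siblings.
        f.write("from pkgutil import extend_path\n"
                "__path__ = extend_path(__path__, __name__)\n")
    with open(os.path.join(pkg, mod + ".py"), "w") as f:
        f.write("name = %r\n" % mod)

sys.path[:0] = [os.path.join(root, "dist_a"), os.path.join(root, "dist_b")]
import ns.alpha
import ns.beta  # found via the extended __path__, despite living elsewhere
print(ns.alpha.name, ns.beta.name)
```

The cost glyph mentions is visible here too: every extra entry on ns.__path__ is another directory to stat on each submodule import.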
From robert.kern at gmail.com Fri Apr 17 09:46:57 2009 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 17 Apr 2009 02:46:57 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <18919.61318.106749.848833@montanaro.dyndns.org> References: <18918.61476.980951.991275@montanaro.dyndns.org> <18919.51931.874515.848841@montanaro.dyndns.org> <18919.61318.106749.848833@montanaro.dyndns.org> Message-ID: On 2009-04-16 21:55, skip at pobox.com wrote: > Jess> If, on the other hand, one of the committers wants to toss this in > Jess> at some point, whether now or 3 versions down the road, the patch > Jess> is up at bugs.python.org (and I'm happy to make any suggested > Jess> modifications). > > Again, I think it needs to bake a bit. I understand the desire and need for > doing date arithmetic with months. Python is mature enough though that I > don't think you can just "toss this in". It should be available as a module > outside of Python so people can beat on it, flush out bugs, make suggestions > for enhancements, whatever. I believe you mentioned putting it up on PyPI. > I think that's an excellent idea. > > I've used parts of Gustavo Niemeyer's dateutil package for a couple years > and love it. It's widely used. Adding it to dateutil seems like another > possibility. That would guarantee an instant user base. From there, if it > is found to be useful it could make the leap to be part of the datetime > module. dateutil.relativedelta appears to do everything monthdelta does and more in a general way. Adding monthdelta to dateutil doesn't seem to make much sense. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From asmodai at in-nomine.org Fri Apr 17 10:50:16 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Fri, 17 Apr 2009 10:50:16 +0200 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <18919.61318.106749.848833@montanaro.dyndns.org> References: <18918.61476.980951.991275@montanaro.dyndns.org> <18919.51931.874515.848841@montanaro.dyndns.org> <18919.61318.106749.848833@montanaro.dyndns.org> Message-ID: <20090417085016.GD24948@nexus.in-nomine.org> -On [20090417 04:55], skip at pobox.com (skip at pobox.com) wrote: >Again, I think it needs to bake a bit. I understand the desire and need for >doing date arithmetic with months. Python is mature enough though that I >don't think you can just "toss this in". It should be available as a module >outside of Python so people can beat on it, flush out bugs, make suggestions >for enhancements, whatever. I think people should look at mx.DateTime a bit, including its documentation. -- Jeroen Ruigrok van der Werven / asmodai http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B To do injustice is more disgraceful than to suffer it... From solipsis at pitrou.net Fri Apr 17 11:34:21 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 17 Apr 2009 09:34:21 +0000 (UTC) Subject: [Python-Dev] Issue5434: datetime.monthdelta References: Message-ID: Jess Austin gmail.com> writes: > > I have worked in utility/telecom billing, and needed to examine large > numbers of invoice dates, fulfillment dates, disconnection dates, > payment dates, collection event dates, etc. There would often be > particular rules for the relationships among these dates, and since > many companies generate invoices every day of the month, you couldn't > rely on rules like "this always happens on the 5th". But, as you say, these are /particular rules/. Why do you think they would be the same in another industry, or even another telecom company?
Why should they be integrated in Python's standard distribution? From solipsis at pitrou.net Fri Apr 17 11:37:13 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 17 Apr 2009 09:37:13 +0000 (UTC) Subject: [Python-Dev] Issue5434: datetime.monthdelta References: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net> Message-ID: James Y Knight fuhm.net> writes: > > It's a human-interface operation, and as such, everyone (ahem) "knows > what it means" to say "2 months from now", but the details don't > usually have to be thought about too much. I don't think it's true. When you say "2 months from now", some people will think "9 weeks from now" (or "10 weeks from now"), others "60 days from now", and yet other will think of the meaning this proposal gives it. That's why, when scheduling a meeting, you don't say "2 months from now". You give a precise date instead, because you know otherwise people wouldn't show up on the same day. Regards Antoine. From piet at cs.uu.nl Fri Apr 17 11:42:59 2009 From: piet at cs.uu.nl (Piet van Oostrum) Date: Fri, 17 Apr 2009 11:42:59 +0200 Subject: [Python-Dev] RELEASED Python 2.6.2 In-Reply-To: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> (Barry Warsaw's message of "Wed\, 15 Apr 2009 11\:45\:08 -0400") References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: >>>>> Barry Warsaw (BW) wrote: >BW> On behalf of the Python community, I'm happy to announce the availability >BW> of Python 2.6.2. This is the latest production-ready version in the >BW> Python 2.6 series. Dozens of issues have been fixed since Python 2.6.1 >BW> was released back in December. Please see the NEWS file for all the gory >BW> details. 
>BW> http://www.python.org/download/releases/2.6.2/NEWS.txt >BW> For more information on Python 2.6 in general, please see >BW> http://docs.python.org/dev/whatsnew/2.6.html >BW> Source tarballs, Windows installers, and (soon) Mac OS X disk images can >BW> be downloaded from the Python 2.6.2 page: >BW> http://www.python.org/download/releases/2.6.2/ Maybe a link to the MacOSX image can also be added to http://www.python.org/download -- Piet van Oostrum URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: piet at vanoostrum.org From barry at python.org Fri Apr 17 14:57:49 2009 From: barry at python.org (Barry Warsaw) Date: Fri, 17 Apr 2009 08:57:49 -0400 Subject: [Python-Dev] RELEASED Python 2.6.2 In-Reply-To: References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: <7E3DD051-76DC-44A4-AB9F-DF462D764BF4@python.org> On Apr 17, 2009, at 5:42 AM, Piet van Oostrum wrote: > Maybe a link to the MacOSX image can also be added to > http://www.python.org/download Done. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From skip at pobox.com Fri Apr 17 15:45:05 2009 From: skip at pobox.com (skip at pobox.com) Date: Fri, 17 Apr 2009 08:45:05 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: <49E800CE.6000707@canterbury.ac.nz> References: <200904171242.46009.steve@pearwood.info> <49E800CE.6000707@canterbury.ac.nz> Message-ID: <18920.34785.182858.323654@montanaro.dyndns.org> >> "2rd of March on leap years, > ^^^ > The turd of March? Yeah, it's from a little known Shakespearean play about a benevolent dictator, Guidius van Rossumus. The name of the play escapes me at the moment, but there's this critical scene where the BDFL is in mortal danger because of ongoing schemes by the members of the PSU. 
His one true friend and eventual replacement, Barius Warsawvius, known as the FLUFL, tries to warn him surreptitiously about the dangers lurking all about. Barius utters this immortal quote, "Beware the Turd of March." Unfortunately, the drama of that scene tends to be lost on modern audiences. Upon hearing that famous utterance they tend to break out in laughter, especially if the audience is made up mostly of boys under the age of twelve. -- Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/ "XML sucks, dictionaries rock" - Dave Beazley From Scott.Daniels at Acm.Org Fri Apr 17 16:58:55 2009 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Fri, 17 Apr 2009 07:58:55 -0700 Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143 In-Reply-To: <49E8020E.3010109@canterbury.ac.nz> References: <49E7E1AE.5030809@canterbury.ac.nz> <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com> <200904171320.13025.steve@pearwood.info> <49E8020E.3010109@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Steven D'Aprano wrote: >> it should be obvious in the same way that string concatenation is >> different from numerical addition: >> >> 1 + 2 = 2 + 1 >> '1' + '2' != '2' + '1' > > However, the proposed arithmetic isn't just non- > commutative, it's non-associative, which is a > much rarer and more surprising thing. We do > at least have > > ('1' + '2') + '3' == '1' + ('2' + '3') > But we don't have: (1e40 + -1e40) + 1 == 1e40 + (-1e40 + 1) Non-associativity is what makes for floating point headaches. To my knowledge, floating point is at least commutative. 
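Scott's claim is easy to verify at the interpreter (a quick illustration, not part of the original message): 1 is smaller than one ulp of 1e40, so the right-hand grouping loses it entirely.

```python
# Float addition commutes, but does not associate: -1e40 + 1.0
# rounds straight back to -1e40, so the two groupings differ.
a, b, c = 1e40, -1e40, 1.0
print((a + b) + c)     # 1.0
print(a + (b + c))     # 0.0
print(a + b == b + a)  # True: commutativity holds here
```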
--Scott David Daniels Scott.Daniels at Acm.Org From dickinsm at gmail.com Fri Apr 17 17:42:10 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 17 Apr 2009 16:42:10 +0100 Subject: [Python-Dev] Python-Dev Digest, Vol 69, Issue 143 In-Reply-To: References: <49E7E1AE.5030809@canterbury.ac.nz> <43c8685c0904161910x3fdb30fsafd148233e88bdaa@mail.gmail.com> <200904171320.13025.steve@pearwood.info> <49E8020E.3010109@canterbury.ac.nz> Message-ID: <5c6f2a5d0904170842h5667188dud795e6855245f52a@mail.gmail.com> On Fri, Apr 17, 2009 at 3:58 PM, Scott David Daniels wrote: > Non-associativity is what makes for floating point headaches. > To my knowledge, floating point is at least commutative. Well, mostly. :-) >>> from decimal import Decimal >>> x, y = Decimal('NaN123'), Decimal('-NaN456') >>> x + y Decimal('NaN123') >>> y + x Decimal('-NaN456') Similar effects can happen with regular IEEE 754 binary doubles, but Python doesn't expose NaN payloads or signs, so we don't see those effects within Python. Mark From status at bugs.python.org Fri Apr 17 18:06:56 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 17 Apr 2009 18:06:56 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090417160656.09609781DE@psf.upfronthosting.co.za> ACTIVITY SUMMARY (04/10/09 - 04/17/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2222 open (+37) / 15383 closed (+12) / 17605 total (+49) Open issues with patches: 852 Average duration of open issues: 642 days. Median duration of open issues: 393 days.
Open Issues Breakdown open 2168 (+37) pending 54 ( +0) Issues Created Or Reopened (50) _______________________________ ignore py3_test_grammar.py syntax error 04/11/09 CLOSED http://bugs.python.org/issue5733 reopened benjamin.peterson BufferedRWPair broken 04/11/09 http://bugs.python.org/issue5734 created bquinlan patch Segfault when loading not recompiled module 04/11/09 http://bugs.python.org/issue5735 created chin patch, needs review Add the iterator protocol to dbm modules 04/11/09 http://bugs.python.org/issue5736 created akitada patch add Solaris errnos 04/11/09 http://bugs.python.org/issue5737 created mahrens easy multiprocessing example wrong 04/11/09 http://bugs.python.org/issue5738 created yaneurabeya Language reference is ambiguous regarding next() method lookup 04/12/09 http://bugs.python.org/issue5739 created ncoghlan multiprocessing.connection.Client API documentation incorrect 04/12/09 http://bugs.python.org/issue5740 created yaneurabeya SafeConfigParser incorrectly detects lone percent signs 04/12/09 CLOSED http://bugs.python.org/issue5741 created marcio inspect.findsource() should look only for sources 04/12/09 http://bugs.python.org/issue5742 created hdima patch multiprocessing.managers not accessible even though docs say so 04/12/09 CLOSED http://bugs.python.org/issue5743 created yaneurabeya multiprocessing.managers.BaseManager.connect example typos 04/12/09 CLOSED http://bugs.python.org/issue5744 reopened quiver email document update (more links) 04/13/09 CLOSED http://bugs.python.org/issue5745 created ocean-city patch socketserver problem upon disconnection (undefined member) 04/13/09 CLOSED http://bugs.python.org/issue5746 created eblond knowing the parent command 04/13/09 http://bugs.python.org/issue5747 created tarek Objects/bytesobject.c should include stringdefs.h, instead of de 04/13/09 http://bugs.python.org/issue5748 created eric.smith easy Allow bin() to have an optional "Total Bits" argument. 
04/14/09 CLOSED http://bugs.python.org/issue5749 created MechPaul weird seg fault 04/14/09 CLOSED http://bugs.python.org/issue5750 created utilitarian Typo in documentation of print function parameters 04/14/09 http://bugs.python.org/issue5751 created nicolasg xml.dom.minidom does not handle newline characters in attribute 04/14/09 http://bugs.python.org/issue5752 created Tomalak CVE-2008-5983 python: untrusted python modules search path 04/14/09 http://bugs.python.org/issue5753 created iankko patch Shelve module writeback parameter does not act as advertised 04/14/09 http://bugs.python.org/issue5754 created jherskovic "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++" 04/14/09 http://bugs.python.org/issue5755 created zooko idle pydoc et al removed from 3.1 without versioned replacements 04/14/09 http://bugs.python.org/issue5756 created nad Documentation error for Condition.notify() 04/14/09 http://bugs.python.org/issue5757 created pietvo fileinput.hook_compressed returning bytes from gz file 04/14/09 http://bugs.python.org/issue5758 created mnewman __float__ not called by 'float' on classes derived from str 04/15/09 CLOSED http://bugs.python.org/issue5759 created shura_zam patch __getitem__ error message hard to understand 04/15/09 http://bugs.python.org/issue5760 created cvrebert add file name to py3k IO objects repr() 04/15/09 http://bugs.python.org/issue5761 created pitrou AttributeError: 'NoneType' object has no attribute 'replace' 04/15/09 http://bugs.python.org/issue5762 created hda scope resolving error 04/15/09 CLOSED http://bugs.python.org/issue5763 created vpodpecan 2.6.2 Python Manuals CHM file seems broken 04/15/09 http://bugs.python.org/issue5764 created dx617 stack overflow evaluating eval("()" * 30000) 04/15/09 http://bugs.python.org/issue5765 created gagenellina Mac/scripts/BuildApplet.py reset of sys.executable during instal 04/16/09 http://bugs.python.org/issue5766 created blb xmlrpclib loads invalid documents 04/16/09 
http://bugs.python.org/issue5767 created exarkun logging don't encode Unicode message correctly. 04/16/09 CLOSED http://bugs.python.org/issue5768 created naoki patch OS X Installer: new make of documentation installs at wrong loca 04/16/09 http://bugs.python.org/issue5769 created nad SA bugs with unittest.py 04/16/09 CLOSED http://bugs.python.org/issue5770 created yaneurabeya patch SA bugs with unittest.py at r71263 04/16/09 http://bugs.python.org/issue5771 created yaneurabeya patch For float.__format__, don't add a trailing ".0" if we're using n 04/16/09 http://bugs.python.org/issue5772 created eric.smith easy Crash on shutdown after os.fdopen(2) in debug builds 04/16/09 http://bugs.python.org/issue5773 created amaury.forgeotdarc _winreg.OpenKey() is documented with keyword arguments, but does 04/16/09 http://bugs.python.org/issue5774 created stutzbach marshal.c needs to be checked for out of memory errors 04/16/09 http://bugs.python.org/issue5775 created eric.smith RPM build error with python-2.6.spec 04/17/09 http://bugs.python.org/issue5776 created yasusii patch unable to search in python V3 documentation 04/17/09 http://bugs.python.org/issue5777 created aotto1968 sys.version format differs between MSC and GCC 04/17/09 http://bugs.python.org/issue5778 created t-kamiya _elementtree import can fail silently 04/17/09 CLOSED http://bugs.python.org/issue5779 created naufraghi test_float fails for 'legacy' float repr style 04/17/09 http://bugs.python.org/issue5780 created marketdickinson patch Legacy float repr is used unnecessarily on some platforms 04/17/09 http://bugs.python.org/issue5781 created marketdickinson easy ',' formatting with empty format type '' (PEP 378) 04/17/09 http://bugs.python.org/issue5782 created eric.smith easy Issues Now Closed (37) ______________________ Use shorter float repr when possible 493 days http://bugs.python.org/issue1580 marketdickinson patch "make altinstall" installs pydoc, idle, smtpd.py 490 days 
http://bugs.python.org/issue1590 nad patch PyString_FromStringAndSize() to be considered unsafe 369 days http://bugs.python.org/issue2587 psss PyOS_vsnprintf() underflow leads to memory corruption 371 days http://bugs.python.org/issue2588 psss patch Handle ASDLSyntaxErrors gracefully 347 days http://bugs.python.org/issue2725 georg.brandl patch Starting any program as a subprocess fails when subprocess.Popen 265 days http://bugs.python.org/issue3440 r.david.murray patch Allow Division of datetime.timedelta Objects 156 days http://bugs.python.org/issue4291 Jeremy Banks asyncore's urgent data management and connection closed events 131 days http://bugs.python.org/issue4501 r.david.murray patch handling inf/nan in '%f' 101 days http://bugs.python.org/issue4799 eric.smith patch native build of python win32 using msys under wine. 90 days http://bugs.python.org/issue4954 lritter test_maxint64 fails on 32-bit systems due to assumption that 64- 81 days http://bugs.python.org/issue4977 pitrou io-c: TextIOWrapper is faster than BufferedReader but not protec 25 days http://bugs.python.org/issue5502 pitrou patch Lib/distutils/test/test_util: test_get_platform bogus for OSX 14 days http://bugs.python.org/issue5607 tarek Add more pickling tests 14 days http://bugs.python.org/issue5665 collinwinter patch, needs review string module requires bytes type for maketrans, but calling met 10 days http://bugs.python.org/issue5675 georg.brandl pydoc -w doesn't produce proper HTML 6 days http://bugs.python.org/issue5698 georg.brandl patch, patch, needs review inside *currentmodule* some links is disabled 7 days http://bugs.python.org/issue5703 georg.brandl Command line option '-3' should imply '-t' 7 days http://bugs.python.org/issue5704 georg.brandl setuptools doesn't honor standard compiler variables 7 days http://bugs.python.org/issue5706 tarek Tiny code polishing to unicode_repeat 6 days http://bugs.python.org/issue5708 georg.brandl patch optparse: please provide a usage example in the 
module docstring 5 days http://bugs.python.org/issue5719 georg.brandl ignore py3_test_grammar.py syntax error 0 days http://bugs.python.org/issue5733 benjamin.peterson SafeConfigParser incorrectly detects lone percent signs 1 days http://bugs.python.org/issue5741 georg.brandl multiprocessing.managers not accessible even though docs say so 0 days http://bugs.python.org/issue5743 yaneurabeya multiprocessing.managers.BaseManager.connect example typos 0 days http://bugs.python.org/issue5744 yaneurabeya email document update (more links) 0 days http://bugs.python.org/issue5745 georg.brandl patch socketserver problem upon disconnection (undefined member) 1 days http://bugs.python.org/issue5746 benjamin.peterson Allow bin() to have an optional "Total Bits" argument. 0 days http://bugs.python.org/issue5749 rhettinger weird seg fault 0 days http://bugs.python.org/issue5750 amaury.forgeotdarc __float__ not called by 'float' on classes derived from str 1 days http://bugs.python.org/issue5759 benjamin.peterson patch scope resolving error 0 days http://bugs.python.org/issue5763 marketdickinson logging don't encode Unicode message correctly. 1 days http://bugs.python.org/issue5768 vsajip patch SA bugs with unittest.py 0 days http://bugs.python.org/issue5770 benjamin.peterson patch _elementtree import can fail silently 0 days http://bugs.python.org/issue5779 naufraghi DistributionMetaData error ? 
2217 days http://bugs.python.org/issue708320 varash file.seek() influences write() when opened with a+ mode 1008 days http://bugs.python.org/issue1521491 amaury.forgeotdarc Popen pipe file descriptor leak on OSError in init 641 days http://bugs.python.org/issue1751245 benjamin.peterson Top Issues Most Discussed (10) ______________________________ 10 io.FileIO calls flush() after file closed 12 days open http://bugs.python.org/issue5700 9 Add the iterator protocol to dbm modules 6 days open http://bugs.python.org/issue5736 7 2.6.2c1 fails to pass test_cmath on Solaris10 9 days open http://bugs.python.org/issue5724 6 test_float fails for 'legacy' float repr style 0 days open http://bugs.python.org/issue5780 6 Add test.support.import_python_only 53 days open http://bugs.python.org/issue5354 6 10e667.__format__('+') should return 'inf' 137 days open http://bugs.python.org/issue4482 5 ignore py3_test_grammar.py syntax error 0 days closed http://bugs.python.org/issue5733 5 setdefault speedup 8 days open http://bugs.python.org/issue5730 5 Support telling TestResult objects a test run has finished 8 days open http://bugs.python.org/issue5728 5 IDLE shell gives different len() of unicode strings compared to 973 days open http://bugs.python.org/issue1542677 From rowen at u.washington.edu Fri Apr 17 18:23:46 2009 From: rowen at u.washington.edu (Russell Owen) Date: Fri, 17 Apr 2009 09:23:46 -0700 Subject: [Python-Dev] RELEASED Python 2.6.2 In-Reply-To: <92CB905D-99F8-4727-A1AE-1772EA3ED79C@mac.com> References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> <92CB905D-99F8-4727-A1AE-1772EA3ED79C@mac.com> Message-ID: <583B711E-5FD3-4C5A-9DF2-B1725E0F53A7@u.washington.edu> On Apr 16, 2009, at 11:17 PM, Ronald Oussoren wrote: > On 16 Apr, 2009, at 20:58, Russell Owen wrote: > >> I installed the Mac binary on my Intel 10.5.6 system and it works, >> except it still uses Apple's system Tcl/Tk 8.4.7 instead of my >> ActiveState 8.4.19 (which is in /Library/Frameworks where one 
would
>> expect).
>
> That's very strange. I had ActiveState 8.4 installed (whatever was
> current about a month ago).

I agree. (For what it's worth, you probably have Tcl/Tk 8.4.19 -- a version I've found to be very robust. 8.4.19 was released a while ago and is probably the last version of 8.4 we will see, since all development is happening on 8.5 now).

Could you try a simple experiment (assuming you still have ActiveState Tcl/Tk installed): run python from the command line and enter these commands:

    import Tkinter
    root = Tkinter.Tk()

Then go to the application that comes up and select About Tcl/Tk... (in the Python menu) and see what version it reports. When I run with the Mac binary of 2.6.2 it reports 8.4.7 (Apple's built-in python). When I build python 2.6.2 from source it reports 8.4.19 (my ActiveState Tcl/Tk).

>> Just out of curiosity: which 3rd party Tcl/Tk did you have
>> installed when you made the installer? Perhaps if it was 8.5 that
>> would explain it. If so I may try updating my Tcl/Tk -- I've been
>> wanting some of the bug fixes in 8.5 anyway.
>
> Tcl 8.5 won't happen in 2.6, and might not happen in 2.7 either.
> Tkinter needs to work with the system version of Tcl, which is some
> version of 8.4, Tkinter will not work when the major release of Tcl
> is different than during the compile. That makes it rather hard to
> support both 8.4 and 8.5 in the same installer.

Perfect. I agree.

-- Russell

From ajaksu at gmail.com Fri Apr 17 20:04:45 2009
From: ajaksu at gmail.com (Daniel (ajax) Diniz)
Date: Fri, 17 Apr 2009 15:04:45 -0300
Subject: [Python-Dev] Experimental and Test Tracker instances live
Message-ID: <2d75d7660904171104k3d13427au8e91489132cd2e23@mail.gmail.com>

Hi,

As discussed before, I have put two mock Python Tracker instances online. The Test[1] instance follows bugs.python.org code, so we can test bugfixes and procedures without breaking the real tracker. The Experimental[2] one, aka the cool instance, is where new features are showcased.
Currently no emails are being sent and the dbs can be reset at any time. If you'd like to play as a registered user, please email me and I'll create a user (or activate the one you've started to register). So far, the new features[3] include: * Issue tags [4],[5] * Quiet properties [6] * Restore removed messages and files [7] * Claim ('assign to self') and add/remove self as nosy buttons [8] * Don't close issues with open dependencies [9] * Auto-add nosy users based on Components [10] * "Email me" buttons for messages and issues, "Reply by email" [11] * RSS feeds (per issue and global) [12] * Display selected issues in the index view [13] You can subscribe to a RSS feed[14] about the new features. Thanks to everyone who filled RFEs, there's still time to submit yours :) Regards, Daniel [1] http://bot.bio.br/python-dev/ [2] http://bot.bio.br/python-dev-exp/ [3] http://bot.bio.br/python-dev-exp/issue5 [4] http://mail.python.org/pipermail/tracker-discuss/2009-April/002099.html [5] http://codereview.appspot.com/40100/show [6] http://psf.upfronthosting.co.za/roundup/meta/issue249 [7] http://psf.upfronthosting.co.za/roundup/meta/issue267 [8] http://psf.upfronthosting.co.za/roundup/meta/issue258 [9] http://psf.upfronthosting.co.za/roundup/meta/issue266 [10] http://psf.upfronthosting.co.za/roundup/meta/issue258 [11] http://psf.upfronthosting.co.za/roundup/meta/issue245 [12] http://psf.upfronthosting.co.za/roundup/meta/issue155 [13] http://psf.upfronthosting.co.za/roundup/meta/issue246 [14] http://bot.bio.br/python-dev-exp/issue5?@template=feed From bjourne at gmail.com Fri Apr 17 22:41:49 2009 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Fri, 17 Apr 2009 22:41:49 +0200 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net> Message-ID: <740c3aec0904171341q78c88633ie4179791a123f251@mail.gmail.com> It's not only about what people find intuitive. Why care about them? 
Most persons aren't programmers. It is about what application developers find useful too. I have often needed to calculate month deltas according to the proposal. I suspect many other programmers have too. Writing a month add function isn't entirely trivial and would be a good candidate for stdlib imho.

2009/4/17, Antoine Pitrou :
> James Y Knight fuhm.net> writes:
>>
>> It's a human-interface operation, and as such, everyone (ahem) "knows
>> what it means" to say "2 months from now", but the details don't
>> usually have to be thought about too much.
>
> I don't think it's true. When you say "2 months from now", some people will
> think "9 weeks from now" (or "10 weeks from now"), others "60 days from now",
> and yet others will think of the meaning this proposal gives it.
>
> That's why, when scheduling a meeting, you don't say "2 months from now".
> You give a precise date instead, because you know otherwise people wouldn't
> show up on the same day.
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/bjourne%40gmail.com
>

-- 
mvh Björn

From aahz at pythoncraft.com Fri Apr 17 22:49:28 2009
From: aahz at pythoncraft.com (Aahz)
Date: Fri, 17 Apr 2009 13:49:28 -0700
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: <740c3aec0904171341q78c88633ie4179791a123f251@mail.gmail.com>
References: <52C09164-4AE5-42F2-AA4B-52ABFEFCD93D@fuhm.net> <740c3aec0904171341q78c88633ie4179791a123f251@mail.gmail.com>
Message-ID: <20090417204928.GA15121@panix.com>

On Fri, Apr 17, 2009, BJörn Lindqvist wrote:
>
> It's not only about what people find intuitive. Why care about them?
> Most persons aren't programmers. It is about what application
> developers find useful too. I have often needed to calculate month
> deltas according to the proposal.
I suspect many other programmers > have too. Writing a month add function isn't entirely trivial and > would be a good candidate for stdlib imho. At this point, further discussion really needs to move to python-ideas; for acceptance in stdlib, there needs to be either well-accepted code out in the community or a PEP for Guido to pronounce on (or probably both, in the end). I've set followups to python-ideas for convenience. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From rowen at u.washington.edu Sat Apr 18 00:47:11 2009 From: rowen at u.washington.edu (Russell E. Owen) Date: Fri, 17 Apr 2009 15:47:11 -0700 Subject: [Python-Dev] RELEASED Python 2.6.2 References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: In article , Ned Deily wrote: > In article , > Russell Owen wrote: > > I installed the Mac binary on my Intel 10.5.6 system and it works, > > except it still uses Apple's system Tcl/Tk 8.4.7 instead of my > > ActiveState 8.4.19 (which is in /Library/Frameworks where one would > > expect). > > > > I just built python from source and that version does use ActiveState > > 8.4.19. > > > > I wish I knew what's going on. Not being able to use the binary > > distros is a bit of a pain. > > You're right, the tkinter included with the 2.6.2 installer is not > linked properly: > > Is: > $ cd /Library/Frameworks/Python.framework/Versions/2.6 > $ cd lib/python2.6/lib-dynload > $ otool -L _tkinter.so > _tkinter.so: > /System/Library/Frameworks/Tcl.framework/Versions/8.4/Tcl > (compatibility version 8.4.0, current version 8.4.0) > /System/Library/Frameworks/Tk.framework/Versions/8.4/Tk > (compatibility version 8.4.0, current version 8.4.0) > /usr/lib/libSystem.B.dylib [...] 
> > should be: > _tkinter.so: > /Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility > version 8.4.0, current version 8.4.19) > /Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility > version 8.4.0, current version 8.4.19) > /usr/lib/libSystem.B.dylib [...] Just for the record, when I built Python 2.6 from source I got the latter output (the desired result). If someone can point me to instructions I'm willing to try to make a binary installer and make it available (though I'd much prefer to debug the standard installer). -- Russell From nad at acm.org Sat Apr 18 01:13:34 2009 From: nad at acm.org (Ned Deily) Date: Fri, 17 Apr 2009 16:13:34 -0700 Subject: [Python-Dev] RELEASED Python 2.6.2 References: <1C666973-D1B5-44C8-87B2-4FBEE31C4193@python.org> Message-ID: In article , "Russell E. Owen" wrote: > In article , > Ned Deily wrote: > > In article , > > Russell Owen wrote: > > > I installed the Mac binary on my Intel 10.5.6 system and it works, > > > except it still uses Apple's system Tcl/Tk 8.4.7 instead of my > > > ActiveState 8.4.19 (which is in /Library/Frameworks where one would > > > expect). > > > > > > I just built python from source and that version does use ActiveState > > > 8.4.19. > > > > > > I wish I knew what's going on. Not being able to use the binary > > > distros is a bit of a pain. > > > > You're right, the tkinter included with the 2.6.2 installer is not > > linked properly: > > > > Is: > > $ cd /Library/Frameworks/Python.framework/Versions/2.6 > > $ cd lib/python2.6/lib-dynload > > $ otool -L _tkinter.so > > _tkinter.so: > > /System/Library/Frameworks/Tcl.framework/Versions/8.4/Tcl > > (compatibility version 8.4.0, current version 8.4.0) > > /System/Library/Frameworks/Tk.framework/Versions/8.4/Tk > > (compatibility version 8.4.0, current version 8.4.0) > > /usr/lib/libSystem.B.dylib [...] 
> > > > should be: > > _tkinter.so: > > /Library/Frameworks/Tcl.framework/Versions/8.4/Tcl (compatibility > > version 8.4.0, current version 8.4.19) > > /Library/Frameworks/Tk.framework/Versions/8.4/Tk (compatibility > > version 8.4.0, current version 8.4.19) > > /usr/lib/libSystem.B.dylib [...] > > Just for the record, when I built Python 2.6 from source I got the > latter output (the desired result). > > If someone can point me to instructions I'm willing to try to make a > binary installer and make it available (though I'd much prefer to debug > the standard installer). I suspect Ronald will be fixing this in the standard installer soon. -- Ned Deily, nad at acm.org From ncoghlan at gmail.com Sat Apr 18 14:41:14 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 18 Apr 2009 22:41:14 +1000 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49D09ECF.5090407@trueblade.com> <49D0ACD5.5090209@gmail.com> Message-ID: <49E9CA6A.6060004@gmail.com> Steven Bethard wrote: > On Mon, Apr 13, 2009 at 1:14 PM, Mart S?mermaa wrote: >> A default behaviour should be found that works according to most >> user's expectations so that they don't need to use the positional >> arguments generally. > > I believe the usual Python approach here is to have two variants of > the function, add_query_params and add_query_params_no_dups (or > whatever you want to name them). That way the flag parameter is > "named" right in the function name. Yep - Guido has pointed out in a few different API design discussions that a boolean flag that is almost always set to a literal True or False is a good sign that there are two functions involved rather than just one. There are exceptions to that guideline (e.g. the reverse argument for sorted and list.sort), but they aren't common, and even when they do crop up, making them keyword-only arguments is strongly recommended. Cheers, Nick. 
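The guideline Nick relays can be shown in miniature; a hypothetical sketch (the function names here are invented, not from any proposal in this thread) of splitting a literal-valued flag into two intention-revealing functions:

```python
def _sorted_items(items, reverse):
    # single shared implementation behind two intention-revealing names
    return sorted(items, reverse=reverse)

def ascending(items):
    return _sorted_items(items, reverse=False)

def descending(items):
    return _sorted_items(items, reverse=True)

# Call sites now state their intent instead of passing a bare literal:
print(ascending([3, 1, 2]))   # [1, 2, 3]
print(descending([3, 1, 2]))  # [3, 2, 1]
```

The reverse argument to sorted itself is the exception Nick mentions: callers do sometimes compute it at runtime, which is when a keyword-only flag earns its keep.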
-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From regebro at gmail.com Sat Apr 18 18:03:52 2009
From: regebro at gmail.com (Lennart Regebro)
Date: Sat, 18 Apr 2009 18:03:52 +0200
Subject: [Python-Dev] Issue5434: datetime.monthdelta
In-Reply-To: 
References: 
Message-ID: <319e029f0904180903t39941b55o34478b868f09a876@mail.gmail.com>

On Thu, Apr 16, 2009 at 08:18, Jess Austin wrote:
> hi,
>
> I'm new to python core development, and I've been advised to write to
> python-dev concerning a feature/patch I've placed at
> http://bugs.python.org/issue5434, with Rietveld at
> http://codereview.appspot.com/25079.
>
> This patch adds a "monthdelta" class and a "monthmod" function to the
> datetime module. The monthdelta class is much like the existing
> timedelta class, except that it represents months offset from a date,
> rather than an exact period offset from a date. This allows us to
> easily say, e.g. "3 months from now" without worrying about the number
> of days in the intervening months.
>
>     >>> date(2008, 1, 30) + monthdelta(1)
>     datetime.date(2008, 2, 29)
>     >>> date(2008, 1, 30) + monthdelta(2)
>     datetime.date(2008, 3, 30)
>
> The monthmod function, named in (imperfect) analogy to divmod, allows
> us to round-trip by returning the interim between two dates
> represented as a (monthdelta, timedelta) tuple:
>
>     >>> monthmod(date(2008, 1, 14), date(2009, 4, 2))
>     (datetime.monthdelta(14), datetime.timedelta(19))
>
> Invariant: dt + monthmod(dt, dt+td)[0] + monthmod(dt, dt+td)[1] == dt + td
>
> These also work with datetimes! There are more details in the
> documentation included in the patch. In addition to the C module
> file, I've updated the datetime CAPI, the documentation, and tests.
>
> I feel this would be a good addition to core python. In my work, I've
> often ended up writing annoying one-off "add-a-month" or similar
> functions.
> I think since months work differently than most other time
> periods, a new object is justified rather than trying to shoe-horn
> something like this into timedelta. I also think that the round-trip
> functionality provided by monthmod is important to ensure that
> monthdeltas are "first-class" objects.
>
> Please let me know what you think of the idea and/or its execution.

There are so many meanings of "one month from now" so I'd rather see a bunch of methods for monthly manipulations than a monthdelta class.

Obvious:
Tuesday February 3rd 2009 + 1 month = Tuesday March 3rd 2009

Not obvious:
Tuesday March 3rd 2009 + 1 month = Tuesday April 7th 2009 (5 weeks)
Tuesday April 7th 2009 + 1 month = Tuesday May 5th 2009 (4 weeks)

Problematic:
Tuesday March 31st 2009 + 1 month = what? Thursday April 30th 2009? Error?

Just supporting the obvious case is just not enough to be worth the work. Doing

    month = month + 1
    if month > 12:
        month = 1
        year = year + 1
    lastday = calendar.monthrange(year, month)[1]
    if day > lastday:
        day = lastday

isn't really enough work to warrant its own class IMO, even though it's a method I also end up doing all the time in every bloody calendar implementation I've done. :)

And then comes the same question when talking about years. One year after the 20th of March 2011 may be the 20th of March 2012. But it could also be 19th of March, as 2012 is a leap year. And a year later still would then be the 20th of March 2013 again...

Code that doesn't support ALL the weird-ass variants is really not worth putting into the standard library, IMO. I'd recommend you to look at the dateutil.rrule code, maybe there is something you can use there. Perhaps there is something there that can be used straight off. Or at least maybe it can be extracted to its own extended timedelta library that supports more advanced timedeltas, including "second to last wednesday" and "first sunday after easter."
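Lennart's month-arithmetic snippet can be packaged as a small helper; this sketch implements only the "clamp to the last day of the month" policy for the problematic case (March 31st + 1 month = April 30th), and the function name is invented here:

```python
import calendar
import datetime

def add_months(d, n):
    # Clamp to the last day of the target month, so that
    # March 31st + 1 month gives April 30th rather than an error.
    month_index = d.month - 1 + n
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)

print(add_months(datetime.date(2009, 3, 31), 1))   # 2009-04-30 (clamped)
print(add_months(datetime.date(2008, 1, 30), 1))   # 2008-02-29 (leap year)
print(add_months(datetime.date(2009, 12, 15), 1))  # 2010-01-15 (year rollover)
```

Using floor division on a zero-based month index also makes negative offsets work, which the quoted seven-line version does not handle.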
-- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64 From MLMLists at Comcast.net Sat Apr 18 22:08:55 2009 From: MLMLists at Comcast.net (Mitchell L Model) Date: Sat, 18 Apr 2009 16:08:55 -0400 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable Message-ID: Some library files, such as pdb.py, begin with #!/usr/bin/env python In various discussions regarding some issues I submitted I was told that the decision had been made to call Python 3.x release executables python3. (One of the conflicts I ran into when I made 'python' a link to python3.1 was that some tools used in making the HTML documentation haven't been upgraded to run with 3.) Shouldn't all library files that begin with the above line be changed so that they read 'python3' instead of python? Perhaps I should have just filed this as an issue, but I'm not confident of the state of the plan to move to python3 as the official executable name. From mrts.pydev at gmail.com Sat Apr 18 22:42:40 2009 From: mrts.pydev at gmail.com (=?ISO-8859-1?Q?Mart_S=F5mermaa?=) Date: Sat, 18 Apr 2009 23:42:40 +0300 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: <49E9CA6A.6060004@gmail.com> References: <49D0ACD5.5090209@gmail.com> <49E9CA6A.6060004@gmail.com> Message-ID: On Sat, Apr 18, 2009 at 3:41 PM, Nick Coghlan wrote: > Yep - Guido has pointed out in a few different API design discussions > that a boolean flag that is almost always set to a literal True or False > is a good sign that there are two functions involved rather than just > one. There are exceptions to that guideline (e.g. the reverse argument > for sorted and list.sort), but they aren't common, and even when they do > crop up, making them keyword-only arguments is strongly recommended. 
As you yourself previously noted -- "it is often better to use *args for the two positional arguments - it avoids accidental name conflicts between the positional arguments and arbitrary keyword arguments" -- kwargs may cause name conflicts. But I also agree, that the current proliferation of positional args is ugly. add_query_params_no_dups() would be suboptimal though, as there are currently three different ways to handle the duplicates: * allow duplicates everywhere (True), * remove duplicate *values* for the same key (False), * behave like dict.update -- remove duplicate *keys*, unless explicitly passed a list (None). (See the documentation at http://github.com/mrts/qparams/blob/bf1b29ad46f9d848d5609de6de0bfac1200da310/qparams.py ). Additionally, as proposed by Antoine Pitrou, removing keys could be implemented. It feels awkward to start a PEP for such a marginal feature, but clearly a couple of enlightened design decisions are required. From benjamin at python.org Sat Apr 18 22:48:08 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 18 Apr 2009 15:48:08 -0500 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: Message-ID: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> 2009/4/18 Mitchell L Model : > Some library files, such as pdb.py, begin with > ? ? ? ?#!/usr/bin/env python > In various discussions regarding some issues I submitted I was told that the > decision had been made to call Python 3.x release executables python3. (One > of the conflicts I ran into when I made 'python' a link to python3.1 was > that some tools used in making the HTML documentation haven't been upgraded > to run with 3.) > > Shouldn't all library files that begin with the above line be changed so > that they read 'python3' instead of python? Perhaps I should have just filed > this as an issue, but I'm not confident of the state of the plan to move to > python3 as the official executable name. 
That sounds correct. Please file a bug report. -- Regards, Benjamin From kevin at bud.ca Sun Apr 19 01:01:02 2009 From: kevin at bud.ca (Kevin Teague) Date: Sat, 18 Apr 2009 16:01:02 -0700 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: Message-ID: <4838697B-885E-433E-A50E-1EDC5C96A5EC@bud.ca> On Apr 18, 2009, at 1:08 PM, Mitchell L Model wrote: > Some library files, such as pdb.py, begin with > #!/usr/bin/env python > In various discussions regarding some issues I submitted I was told > that the decision had been made to call Python 3.x release > executables python3. (One of the conflicts I ran into when I made > 'python' a link to python3.1 was that some tools used in making the > HTML documentation haven't been upgraded to run with 3.) > > Shouldn't all library files that begin with the above line be > changed so that they read 'python3' instead of python? Perhaps I > should have just filed this as an issue, but I'm not confident of > the state of the plan to move to python3 as the official executable > name. Hrmm ... On installing from source, one either gets: ./bin/python3.0 Or is using 'make fullinstall': ./bin/python So the default and the tutorial (http://docs.python.org/3.0/tutorial/interpreter.html ) refer to 'python3.0'. But I've done all my Python installs with 'make fullinstall' and then just manage my environment such that 'python' points to a 2.x or 3.x release depending upon what the source code I'm working on requires. If using something such as the Mac OS X Installer you'll get both a 'python' and 'python3.0'. Are there some Python installers that provide './bin/python3'? But if there sometimes just 'python', 'python3.0' or 'python3' then it's not possible for the shebang to work with both all known install methods ... 
One could argue that executable files that are part of the Python standard library should have their interpreter hard-coded to the interpreter they are installed with, e.g.: #!/Users/kteague/shared/python-3.0.1/bin/python Of course, this would remove the ability for a Python installation to be re-located ... if you wanted to move the install, you'd need to re-install it in order to maintain the proper shebangs. But it would mean that these scripts would also use the correct interpreter regardless of a user's current environment. Or, if the standard library was packaged such that all of its scripts were advertised as console_scripts in the entry_points, it'd be easier for different install approaches to decide how to write out the shebang or to instead provide wrapper scripts for accessing those entry points (since it might be nice to have a ./bin/pdb). But that's a bit pie-in-the-sky since entry_points isn't even yet a part of the Distutils Metadata. From ncoghlan at gmail.com Sun Apr 19 01:06:41 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Apr 2009 09:06:41 +1000 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49D0ACD5.5090209@gmail.com> <49E9CA6A.6060004@gmail.com> Message-ID: <49EA5D01.6040208@gmail.com> Mart Sõmermaa wrote: > On Sat, Apr 18, 2009 at 3:41 PM, Nick Coghlan wrote: >> Yep - Guido has pointed out in a few different API design discussions >> that a boolean flag that is almost always set to a literal True or False >> is a good sign that there are two functions involved rather than just >> one. There are exceptions to that guideline (e.g. the reverse argument >> for sorted and list.sort), but they aren't common, and even when they do >> crop up, making them keyword-only arguments is strongly recommended.
> > As you yourself previously noted -- "it is often > better to use *args for the two positional arguments - it avoids > accidental name conflicts between the positional arguments and arbitrary > keyword arguments" -- kwargs may cause name conflicts. Despite what I said earlier, it is probably OK to use named parameters on the function in this case, especially since you have 3 optional arguments that someone may want to specify independently of each other. If someone really wants to add a query parameter to their URL that conflicts with one of the function parameter names then they can pass them in the same way they would pass in parameters that don't meet the rules for a Python identifier (i.e. using the explicit params dictionary). Something that can be done to even further reduce the chance of conflicts is to prefix the function parameter names with underscores: def add_query_params(_url, _dups, _params, _sep, **kwargs) That said, I'm starting to wonder if an even better option may be to just drop the kwargs support from the function and require people to always supply a parameters dictionary. That would simplify the signature to the quite straightforward: def add_query_params(url, params, allow_dups=True, sep='&') The "keyword arguments as query parameters" style would still be supported via dict's own constructor: >>> add_query_params('foo', dict(bar='baz')) 'foo?bar=baz' >>> add_query_params('http://example.com/a/b/c?a=b', dict(b='d')) 'http://example.com/a/b/c?a=b&b=d' >>> add_query_params('http://example.com/a/b/c?a=b&c=q', ... dict(a='b', b='d', c='q')) 'http://example.com/a/b/c?a=b&c=q&a=b&c=q&b=d' >>> add_query_params('http://example.com/a/b/c?a=b', dict(a='c', b='d')) 'http://example.com/a/b/c?a=b&a=c&b=d' This also makes the transition to a different container type (such as OrderedDict) cleaner, since you will already be constructing a separate object to hold the new parameters. 
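The simplified signature discussed above can be sketched concretely. The following is a hypothetical illustration only, not the actual proposed patch; in particular, the allow_dups=False branch collapses the thread's False/None behaviours into a single "skip (key, value) pairs already present" rule:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, quote_plus

def add_query_params(url, params, allow_dups=True, sep='&'):
    # Split the URL so that only the query component is modified.
    scheme, netloc, path, query, fragment = urlsplit(url)
    pairs = parse_qsl(query, keep_blank_values=True)
    for key, value in params.items():
        # With allow_dups=False, (key, value) pairs already present
        # in the query string are silently skipped.
        if allow_dups or (key, value) not in pairs:
            pairs.append((key, value))
    new_query = sep.join(
        '%s=%s' % (quote_plus(str(k)), quote_plus(str(v)))
        for k, v in pairs)
    return urlunsplit((scheme, netloc, path, new_query, fragment))

print(add_query_params('http://example.com/a/b/c?a=b', dict(b='d')))
# -> http://example.com/a/b/c?a=b&b=d
```

A real implementation would still need to decide how to support multiple values per key and the dict.update-style behaviour from Mart's module; this sketch only shows the shape of the two-argument API.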
> But I also agree, that the current proliferation of positional args is ugly. > > add_query_params_no_dups() would be suboptimal though, as there are > currently three different ways to handle the duplicates: > * allow duplicates everywhere (True), > * remove duplicate *values* for the same key (False), > * behave like dict.update -- remove duplicate *keys*, unless > explicitly passed a list (None). So if we went the multiple functions route, we would have at least: add_query_params_allow_duplicates() add_query_params_ignore_duplicate_items() add_query_params_ignore_duplicate_keys() I agree that isn't a good option, but mapping True/False/None to those specific behaviours also seems rather arbitrary (specifically, it is difficult to remember which of "allow_dups=False" and "allow_dups=None" means to ignore any duplicate keys and which means to ignore only duplicate items). It also doesn't provide a clear mechanism for extension (e.g. what if someone wanted duplicate params to trigger an exception?) Perhaps the extra argument should just be a key/value pair filtering function, and we provide functions for the three existing behaviours (i.e. allow_duplicates(), ignore_duplicate_keys(), ignore_duplicate_items()) in the urllib.parse module. > (See the documentation at > http://github.com/mrts/qparams/blob/bf1b29ad46f9d848d5609de6de0bfac1200da310/qparams.py > ). Note that your implementation and docstring currently conflict with each other - the docstring says "pass them via a dictionary in second argument:" but the dictionary is currently the third argument (the docstring also later refers to passing OrderedDictionary as the second argument). Phrases like "second optional argument" and "fourth optional argument" are also ambiguous - do they refer to "the second argument, which happens to be optional" or to "the second of the optional arguments". 
The fact that changing the function signature to disallow keyword arguments would make the optional parameters easier to refer to is a big win in my book. > Additionally, as proposed by Antoine Pitrou, removing keys could be implemented. > > It feels awkward to start a PEP for such a marginal feature, but > clearly a couple of enlightened design decisions are required. Probably not a PEP - just a couple of documented design decisions on a tracker item pointing to discussion on this list for the rationale. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Sun Apr 19 01:19:00 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Apr 2009 09:19:00 +1000 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> Message-ID: <49EA5FE4.9040102@gmail.com> Benjamin Peterson wrote: > 2009/4/18 Mitchell L Model : >> Some library files, such as pdb.py, begin with >> #!/usr/bin/env python >> In various discussions regarding some issues I submitted I was told that the >> decision had been made to call Python 3.x release executables python3. (One >> of the conflicts I ran into when I made 'python' a link to python3.1 was >> that some tools used in making the HTML documentation haven't been upgraded >> to run with 3.) >> >> Shouldn't all library files that begin with the above line be changed so >> that they read 'python3' instead of python?
All that happened with the Python 3 installers is that they do 'altinstall' rather than 'fullinstall' by default, thus leaving the 'python' alias alone. There is no "python3" alias unless a user creates it for themselves (or a distro packager does it for them). I see a few options: 1. Abandon the "python" name for the 3.x series and commit to calling it "python3" now and forever (i.e. actually make the decision that Mitchell refers to). 2. Remove the offending shebang lines from the affected files and tell people to use "python -m " instead. 3. Change the shebang lines in Python standard library scripts to be version specific and update release.py to fix them all when bumping the version number in the source tree. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From benjamin at python.org Sun Apr 19 05:14:17 2009 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 18 Apr 2009 22:14:17 -0500 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EA5FE4.9040102@gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> Message-ID: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> 2009/4/18 Nick Coghlan : > Benjamin Peterson wrote: >> 2009/4/18 Mitchell L Model : >>> Some library files, such as pdb.py, begin with >>> #!/usr/bin/env python >>> In various discussions regarding some issues I submitted I was told that the >>> decision had been made to call Python 3.x release executables python3. (One >>> of the conflicts I ran into when I made 'python' a link to python3.1 was >>> that some tools used in making the HTML documentation haven't been upgraded >>> to run with 3.) >>> >>> Shouldn't all library files that begin with the above line be changed so >>> that they read 'python3' instead of python?
Perhaps I should have just filed >>> this as an issue, but I'm not confident of the state of the plan to move to >>> python3 as the official executable name. >> >> That sounds correct. Please file a bug report. > > As Kevin pointed out, while this is a problem, changing the affected > scripts to say "python3" instead isn't the right answer. > > All that happened with the Python 3 installers is that they do > 'altinstall' rather than 'fullinstall' by default, thus leaving the > 'python' alias alone. There is no "python3" alias unless a user creates > it for themselves (or a distro packager does it for them). I've actually implemented a python3 alias for 3.1. > > I see a few options: > 1. Abandon the "python" name for the 3.x series and commit to calling it > "python3" now and forever (i.e. actually make the decision that Mitchell > refers to). I believe this was decided on sometime (the sprints?). -- Regards, Benjamin From ncoghlan at gmail.com Sun Apr 19 05:22:55 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Apr 2009 13:22:55 +1000 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: <49EA990F.6060301@gmail.com> Benjamin Peterson wrote: > 2009/4/18 Nick Coghlan : >> I see a few options: >> 1. Abandon the "python" name for the 3.x series and commit to calling it >> "python3" now and forever (i.e. actually make the decision that Mitchell >> refers to). > > I believe this was decided on sometime (the sprints?). If that decision has already been made, then sure, changing the shebang lines to use the new name is the right thing to do. 
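Option 3 from Nick's earlier list (version-specific shebang lines fixed up when bumping the version number) would amount to a rewrite step roughly like the following. This is a hypothetical sketch, not the actual release.py code; the fix_shebangs name and the use of pathlib are illustrative assumptions:

```python
import re
from pathlib import Path

# Matches an unversioned '#!/usr/bin/env python' shebang line.
SHEBANG = re.compile(r'^#!\s*/usr/bin/env\s+python\s*$')

def fix_shebangs(libdir, version):
    """Rewrite unversioned shebang lines to a version-specific form,
    e.g. '#!/usr/bin/env python' -> '#!/usr/bin/env python3.1'."""
    for path in Path(libdir).rglob('*.py'):
        lines = path.read_text().splitlines(keepends=True)
        if lines and SHEBANG.match(lines[0]):
            lines[0] = '#!/usr/bin/env python%s\n' % version
            path.write_text(''.join(lines))

# e.g. fix_shebangs('Lib', '3.1') when bumping the version number
```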
It certainly wouldn't be the first time something was discussed at Pycon or the sprints and those involved forgot to mention the outcome on the list :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From steven.bethard at gmail.com Sun Apr 19 05:51:46 2009 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 18 Apr 2009 20:51:46 -0700 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson wrote: > 2009/4/18 Nick Coghlan : >> I see a few options: >> 1. Abandon the "python" name for the 3.x series and commit to calling it >> "python3" now and forever (i.e. actually make the decision that Mitchell >> refers to). > > I believe this was decided on sometime (the sprints?). That's an unfortunate decision. When the 2.X line stops being maintained (after 2.7 maybe?) we're going to be stuck with the "3" suffix forever for the "real" Python. Why doesn't it make more sense to just use "python3" only for "altinstall" and "python" for "fullinstall"? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From tonynelson at georgeanelson.com Sun Apr 19 06:29:08 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Sun, 19 Apr 2009 00:29:08 -0400 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: At 20:51 -0700 04/18/2009, Steven Bethard wrote: >On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson >wrote: >> 2009/4/18 Nick Coghlan : >>> I see a few options: >>> 1. Abandon the "python" name for the 3.x series and commit to calling it >>> "python3" now and forever (i.e. actually make the decision that Mitchell >>> refers to). >> >> I believe this was decided on sometime (the sprints?). > >That's an unfortunate decision. When the 2.X line stops being >maintained (after 2.7 maybe?) we're going to be stuck with the "3" >suffix forever for the "real" Python. > >Why doesn't it make more sense to just use "python3" only for >"altinstall" and "python" for "fullinstall"? Just use python3 in the shebang lines all the time (where applicable ;), as it is made by both altinstall and fullinstall. fullinstall also make plain "python", but that is not important. -- ____________________________________________________________________ TonyN.:' ' From ncoghlan at gmail.com Sun Apr 19 06:37:54 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Apr 2009 14:37:54 +1000 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: <49EAAAA2.4040800@gmail.com> Steven Bethard wrote: > On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson wrote: >> 2009/4/18 Nick Coghlan : >>> I see a few options: >>> 1. 
Abandon the "python" name for the 3.x series and commit to calling it >>> "python3" now and forever (i.e. actually make the decision that Mitchell >>> refers to). >> I believe this was decided on sometime (the sprints?). > > That's an unfortunate decision. When the 2.X line stops being > maintained (after 2.7 maybe?) we're going to be stuck with the "3" > suffix forever for the "real" Python. > > Why doesn't it make more sense to just use "python3" only for > "altinstall" and "python" for "fullinstall"? Note that such an approach would then require an altaltinstall command in order to be able to install a specific version of python 3.x without changing the python3 alias (e.g. installing 3.2 without overriding 3.1). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From steven.bethard at gmail.com Sun Apr 19 06:45:14 2009 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 18 Apr 2009 21:45:14 -0700 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EAAAA2.4040800@gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EAAAA2.4040800@gmail.com> Message-ID: On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan wrote: > Steven Bethard wrote: >> On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson wrote: >>> 2009/4/18 Nick Coghlan : >>>> I see a few options: >>>> 1. Abandon the "python" name for the 3.x series and commit to calling it >>>> "python3" now and forever (i.e. actually make the decision that Mitchell >>>> refers to). >>> I believe this was decided on sometime (the sprints?). >> >> That's an unfortunate decision. When the 2.X line stops being >> maintained (after 2.7 maybe?) we're going to be stuck with the "3" >> suffix forever for the "real" Python. 
>> >> Why doesn't it make more sense to just use "python3" only for >> "altinstall" and "python" for "fullinstall"? > > Note that such an approach would then require an altaltinstall command > in order to be able to install a specific version of python 3.x without > changing the python3 alias (e.g. installing 3.2 without overriding 3.1). I wasn't suggesting that there shouldn't be a "python3.1", "python3.2", etc. I'm more concerned about "fullinstall" creating "python3" instead of regular "python". Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From ncoghlan at gmail.com Sun Apr 19 07:04:02 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Apr 2009 15:04:02 +1000 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EAAAA2.4040800@gmail.com> Message-ID: <49EAB0C2.8040506@gmail.com> Steven Bethard wrote: > On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan wrote: >> Note that such an approach would then require an altaltinstall command >> in order to be able to install a specific version of python 3.x without >> changing the python3 alias (e.g. installing 3.2 without overriding 3.1). > > I wasn't suggesting that there shouldn't be a "python3.1", > "python3.2", etc. I'm more concerned about "fullinstall" creating > "python3" instead of regular "python". 
If I understand Tony's summary correctly, the situation after Benjamin's latest checkin is as follows: 2.x altinstall: - installs python2.x executable 2.x fullinstall (default for "make install"): - installs python2.x executable - adjusts (or creates) python symlink to new executable 3.x altinstall (default for "make install"): - installs python3.x executable - adjusts (or creates) python3 symlink to new executable 3.x fullinstall: - installs python3.x executable - adjusts (or creates) python3 symlink to new executable - adjusts (or creates) python symlink to new executable With that setup, I'm sure we're going to get people complaining that 'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are going to be 3 levels of executable naming (python3.x, python3, python), there needs to be 3 levels of installation rather than the traditional 2. For example, add a new target "py3install" and make that the default for 3.1: 3.x altinstall: - installs python3.x executable 3.x py3install (default for "make install"): - installs python3.x executable - adjusts (or creates) python3 symlink to new executable 3.x fullinstall: - installs python3.x executable - adjusts (or creates) python3 symlink to new executable - adjusts (or creates) python symlink to new executable Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From steven.bethard at gmail.com Sun Apr 19 07:14:32 2009 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 18 Apr 2009 22:14:32 -0700 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EAB0C2.8040506@gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EAAAA2.4040800@gmail.com> <49EAB0C2.8040506@gmail.com> Message-ID: On Sat, Apr 18, 2009 at 10:04 PM, Nick Coghlan wrote: > Steven Bethard wrote: >> On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan wrote: >>> Note that such an approach would then require an altaltinstall command >>> in order to be able to install a specific version of python 3.x without >>> changing the python3 alias (e.g. installing 3.2 without overriding 3.1). >> >> I wasn't suggesting that there shouldn't be a "python3.1", >> "python3.2", etc. I'm more concerned about "fullinstall" creating >> "python3" instead of regular "python". > > If I understand Tony's summary correctly, the situation after Benjamin's > latest checkin is as follows: > > 2.x altinstall: > - installs python2.x executable > > 2.x fullinstall (default for "make install"): > - installs python2.x executable > - adjusts (or creates) python symlink to new executable > > 3.x altinstall (default for "make install"): > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > > 3.x fullinstall: > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > - adjusts (or creates) python symlink to new executable Thanks for the clear explanation. The fact that "python" still appears with "fullinstall" covers my concern.
> With that setup, I'm sure we're going to get people complaining that > 'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are > going to be 3 levels of executable naming (python3.x, python3, python), > there needs to be 3 levels of installation rather than the traditional 2. > > For example, add a new target "py3install" and make that the default for > 3.1: > > 3.x altinstall: > - installs python3.x executable > > 3.x py3install (default for "make install"): > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > > 3.x fullinstall: > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > - adjusts (or creates) python symlink to new executable Yep, I agree this is what needs to be done to sensibly support a "python3". Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From allan at archlinux.org Sun Apr 19 07:23:13 2009 From: allan at archlinux.org (Allan McRae) Date: Sun, 19 Apr 2009 15:23:13 +1000 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EAB0C2.8040506@gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EAAAA2.4040800@gmail.com> <49EAB0C2.8040506@gmail.com> Message-ID: Nick Coghlan wrote: > Steven Bethard wrote: > >> On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan wrote: >> >>> Note that such an approach would then require an altaltinstall command >>> in order to be able to install a specific version of python 3.x without >>> changing the python3 alias (e.g. installing 3.2 without overriding 3.1). >>> >> I wasn't suggesting that there shouldn't be a "python3.1", >> "python3.2", etc. I'm more concerned about "fullinstall" creating >> "python3" instead of regular "python".
>> > > If I understand Tony's summary correctly, the situation after Benjamin's > latest checkin is as follows: > > 2.x altinstall: > - installs python2.x executable > > 2.x fullinstall (default for "make install"): > - installs python2.x executable > - adjusts (or creates) python symlink to new executable > > 3.x altinstall (default for "make install"): > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > > 3.x fullinstall: > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > - adjusts (or creates) python symlink to new executable > > With that setup, I'm sure we're going to get people complaining that > 'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are > going to be 3 levels of executable naming (python3.x, python3, python), > there needs to be 3 levels of installation rather than the traditional 2. > > For example, add a new target "py3install" and make that the default for > 3.1: > > 3.x altinstall: > - installs python3.x executable > > 3.x py3install (default for "make install"): > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > > 3.x fullinstall: > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > - adjusts (or creates) python symlink to new executable > Adjusting the python2 installs to do something similar with symlinks to python2 would also be useful when python3 becomes the standard python and python2 is used for legacy. I.e. 
2.x altinstall: - installs python2.x executable 2.x py2install (default for "make install"): - installs python2.x executable - adjusts (or creates) python2 symlink to new executable 2.x fullinstall (default for "make install"): - installs python2.x executable - adjusts (or creates) python2 symlink to new executable - adjusts (or creates) python symlink to new executable Allan From allan at archlinux.org Sun Apr 19 07:31:27 2009 From: allan at archlinux.org (Allan McRae) Date: Sun, 19 Apr 2009 15:31:27 +1000 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EAAAA2.4040800@gmail.com> <49EAB0C2.8040506@gmail.com> Message-ID: Allan McRae wrote: > Nick Coghlan wrote: >> Steven Bethard wrote: >> >>> On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan >>> wrote: >>> >>>> Note that such an approach would then require an altaltinstall command >>>> in order to be able to install a specific version of python 3.x >>>> without >>>> changing the python3 alias (e.g. installing 3.2 without overriding >>>> 3.1). >>>> >>> I wasn't suggesting that there shouldn't be a "python3.1", >>> "python3.2", etc. I'm more concerned about "fullinstall" creating >>> "python3" instead of regular "python". 
>>> >> >> If I understand Tony's summary correctly, the situation after Benjamin's >> latest checkin is as follows: >> >> 2.x altinstall: >> - installs python2.x executable >> >> 2.x fullinstall (default for "make install"): >> - installs python2.x executable >> - adjusts (or creates) python symlink to new executable >> >> 3.x altinstall (default for "make install"): >> - installs python3.x executable >> - adjusts (or creates) python3 symlink to new executable >> >> 3.x fullinstall: >> - installs python3.x executable >> - adjusts (or creates) python3 symlink to new executable >> - adjusts (or creates) python symlink to new executable >> >> With that setup, I'm sure we're going to get people complaining that >> 'altinstall' of 3.2 broke their python3 symlink from 3.1. If there are >> going to be 3 levels of executable naming (python3.x, python3, python), >> there needs to be 3 levels of installation rather than the >> traditional 2. >> >> For example, add a new target "py3install" and make that the default for >> 3.1: >> >> 3.x altinstall: >> - installs python3.x executable >> >> 3.x py3install (default for "make install"): >> - installs python3.x executable >> - adjusts (or creates) python3 symlink to new executable >> >> 3.x fullinstall: >> - installs python3.x executable >> - adjusts (or creates) python3 symlink to new executable >> - adjusts (or creates) python symlink to new executable >> > > > Adjusting the python2 installs to do something similar with symlinks > to python2 would also be useful when python3 becomes the standard > python and python2 is used for legacy. > > I.e. > > 2.x altinstall: > - installs python2.x executable > > 2.x py2install (default for "make install"): And of course that was supposed to say "future default"... 
> - installs python2.x executable > - adjusts (or creates) python2 symlink to new executable > > > 2.x fullinstall (default for "make install"): > - installs python2.x executable > - adjusts (or creates) python2 symlink to new executable > - adjusts (or creates) python symlink to new executable From regebro at gmail.com Sun Apr 19 08:16:57 2009 From: regebro at gmail.com (Lennart Regebro) Date: Sun, 19 Apr 2009 08:16:57 +0200 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: <319e029f0904182316j11d9a198u205d4fe31b8fff1c@mail.gmail.com> On Sun, Apr 19, 2009 at 05:51, Steven Bethard wrote: > That's an unfortunate decision. When the 2.X line stops being > maintained (after 2.7 maybe?) we're going to be stuck with the "3" > suffix forever for the "real" Python. Yes, but that's the only decision that really works. > Why doesn't it make more sense to just use "python3" only for > "altinstall" and "python" for "fullinstall"? Because you will then get Python 3 trying to run all shebangs that should be run with python 2. Making Python 3 default doesn't make it compatible. ;-) And yes, that means we are stuck with it forever, and I don't like that either, but nobody could come up with an alternative. The recommendation to use python3 could change back to use python once 2.7 falls out of support, which is gonna be many years still. And until then we kinda need different shebang lines. Not much you can do to get around that. 
-- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64 From greg.ewing at canterbury.ac.nz Sun Apr 19 08:52:28 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 19 Apr 2009 18:52:28 +1200 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: <49EACA2C.6060606@canterbury.ac.nz> Steven Bethard wrote: > That's an unfortunate decision. When the 2.X line stops being > maintained (after 2.7 maybe?) we're going to be stuck with the "3" > suffix forever for the "real" Python. I don't see why we have to be stuck with it forever. When 2.x has faded into the sunset, we can start aliasing 'python' to 'python3' if we want, can't we? -- Greg From greg.ewing at canterbury.ac.nz Sun Apr 19 08:54:37 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 19 Apr 2009 18:54:37 +1200 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EAAAA2.4040800@gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EAAAA2.4040800@gmail.com> Message-ID: <49EACAAD.4030401@canterbury.ac.nz> Nick Coghlan wrote: > Note that such an approach would then require an altaltinstall command > in order to be able to install a specific version of python 3.x without > changing the python3 alias (e.g. installing 3.2 without overriding 3.1). Seems like what we need is something in between altinstall and fullinstall that aliases 'python3' but not 'python', and make that the default. Maybe call it 'install3'. 
-- Greg From nad at acm.org Sun Apr 19 09:26:13 2009 From: nad at acm.org (Ned Deily) Date: Sun, 19 Apr 2009 00:26:13 -0700 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EAAAA2.4040800@gmail.com> <49EAB0C2.8040506@gmail.com> Message-ID: In article <49EAB0C2.8040506 at gmail.com>, Nick Coghlan wrote: > Steven Bethard wrote: > > On Sat, Apr 18, 2009 at 9:37 PM, Nick Coghlan wrote: > >> Note that such an approach would then require an altaltinstall command > >> in order to be able to install a specific version of python 3.x without > >> changing the python3 alias (e.g. installing 3.2 without overriding 3.1). > > > > I wasn't suggesting that there shouldn't be a "python3.1", > > "python3.2", etc. I'm more concerned about "fullinstall" creating > > "python3" instead of regular "python". > > If I understand Tony's summary correctly, the situation after Benjamin's > latest checkin is as follows: > > 2.x altinstall: > - installs python2.x executable > > 2.x fullinstall (default for "make install"): > - installs python2.x executable > - adjusts (or creates) python symlink to new executable > > 3.x altinstall (default for "make install"): > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > > 3.x fullinstall: > - installs python3.x executable > - adjusts (or creates) python3 symlink to new executable > - adjusts (or creates) python symlink to new executable Note that versioning is also an unresolved issue for the scripts installed by setup.py; pydoc, idle, 2to3, and smtpd.py. See: http://bugs.python.org/issue5756 Whatever is implemented for python itself should likely apply to them as well. 
> Note that your implementation and docstring currently conflict with each > other - the docstring says "pass them via a dictionary in second > argument:" but the dictionary is currently the third argument (the > docstring also later refers to passing OrderedDictionary as the second > argument). It's a mistake that exemplifies once again that positional args are awkward :). --- So, gentlemen, either def add_query_params(url, params, allow_dups=True, sep='&') or def allow_duplicates(...) def remove_duplicate_values(...) ... def add_query_params(url, params, strategy=allow_duplicates, sep='&') From stephen at xemacs.org Sun Apr 19 11:17:20 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 19 Apr 2009 18:17:20 +0900 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EA5FE4.9040102@gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> Message-ID: <87ocut2ean.fsf@xemacs.org> Nick Coghlan writes: > 3. Change the shebang lines in Python standard library scripts to be > version specific and update release.py to fix them all when bumping the > version number in the source tree. +1 I think that it's probably best to leave "python", "python2", and "python3" for the use of downstream distributors. ISTR that was what Guido concluded, in the discuss that led to Python 3 defaulting to altinstall---it wasn't just convenient because Python 3 is a major change, but that experience has shown that deciding which Python is going to be "The python" on somebody's system just isn't a decision that Python should make. Sure, the difference between Python 2 and Python 3 is big enough to be a hairy nuisance 95% of the time, while the difference between Python 2.5 and Python 2.6 is so only 5% of the time. 
But the fact is that incompatibilities arise with a minor version bump, too, and all the major distros that I know about have some way to select the default Python version that will be "python". That's not because they want to distinguish between Python 2 and Python 3, nor between Python 2 and Python 1. From martin at v.loewis.de Sun Apr 19 12:18:13 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 19 Apr 2009 12:18:13 +0200 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <87ocut2ean.fsf@xemacs.org> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <87ocut2ean.fsf@xemacs.org> Message-ID: <49EAFA65.2090009@v.loewis.de> > I think that it's probably best to leave "python", "python2", and > "python3" for the use of downstream distributors. ISTR that was what > Guido concluded, in the discuss that led to Python 3 defaulting to > altinstall---it wasn't just convenient because Python 3 is a major > change, but that experience has shown that deciding which Python is > going to be "The python" on somebody's system just isn't a decision > that Python should make. Yes. However, at the language summit in Chicago, it was agreed that the installation should also provide a python3 symlink. I don't recall the agreement wrt. to the names of executables on Windows. 
Regards, Martin From p.f.moore at gmail.com Sun Apr 19 15:04:27 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 19 Apr 2009 14:04:27 +0100 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: <79990c6b0904190604s7ee2b6e1j7af35010b28ebb67@mail.gmail.com> 2009/4/19 Steven Bethard : > On Sat, Apr 18, 2009 at 8:14 PM, Benjamin Peterson wrote: >> 2009/4/18 Nick Coghlan : >>> I see a few options: >>> 1. Abandon the "python" name for the 3.x series and commit to calling it >>> "python3" now and forever (i.e. actually make the decision that Mitchell >>> refers to). >> >> I believe this was decided on sometime (the sprints?). > > That's an unfortunate decision. When the 2.X line stops being > maintained (after 2.7 maybe?) we're going to be stuck with the "3" > suffix forever for the "real" Python. > > Why doesn't it make more sense to just use "python3" only for > "altinstall" and "python" for "fullinstall"? Agreed. Personally, I'm -0 on this decision. I'd be -1 if I was a Linux user, or if I thought that it would be applied to Windows as well. As it is, my -0 is based on "it doesn't affect me, but it seems wrong to have the official name be different things depending on platform". Paul. From ncoghlan at gmail.com Sun Apr 19 15:52:39 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Apr 2009 23:52:39 +1000 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EAFA65.2090009@v.loewis.de> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <87ocut2ean.fsf@xemacs.org> <49EAFA65.2090009@v.loewis.de> Message-ID: <49EB2CA7.7000803@gmail.com> Martin v. 
Löwis wrote: >> I think that it's probably best to leave "python", "python2", and >> "python3" for the use of downstream distributors. ISTR that was what >> Guido concluded, in the discuss that led to Python 3 defaulting to >> altinstall---it wasn't just convenient because Python 3 is a major >> change, but that experience has shown that deciding which Python is >> going to be "The python" on somebody's system just isn't a decision >> that Python should make. > > Yes. However, at the language summit in Chicago, it was agreed that > the installation should also provide a python3 symlink. > > I don't recall the agreement wrt. to the names of executables on > Windows. The installer still leaves PATH alone by default, doesn't it? That means the Windows version selection is done by naming the directory. Although I guess choosing a file association for .py files becomes rather more interesting... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From tseaver at palladion.com Sun Apr 19 16:41:59 2009 From: tseaver at palladion.com (Tres Seaver) Date: Sun, 19 Apr 2009 10:41:59 -0400 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Benjamin Peterson wrote: > 2009/4/18 Nick Coghlan : >> Benjamin Peterson wrote: >>> 2009/4/18 Mitchell L Model : >>>> Some library files, such as pdb.py, begin with >>>> #!/usr/bin/env python >>>> In various discussions regarding some issues I submitted I was told that the >>>> decision had been made to call Python 3.x release executables python3. 
(One >>>> of the conflicts I ran into when I made 'python' a link to python3.1 was >>>> that some tools used in making the HTML documentation haven't been upgraded >>>> to run with 3.) >>>> >>>> Shouldn't all library files that begin with the above line be changed so >>>> that they read 'python3' instead of python? Perhaps I should have just filed >>>> this as an issue, but I'm not confident of the state of the plan to move to >>>> python3 as the official executable name. >>> That sounds correct. Please file a bug report. >> As Kevin pointed out, while this is a problem, changing the affected >> scripts to say "python3" instead isn't the right answer. >> >> All that happened with the Python 3 installers is that they do >> 'altinstall' rather than 'fullinstall' by default, thus leaving the >> 'python' alias alone. There is no "python3" alias unless a user creates >> it for themselves (or a distro packager does it for them). > > I've actually implemented a python3 alias for 3.1. > >> I see a few options: >> 1. Abandon the "python" name for the 3.x series and commit to calling it >> "python3" now and forever (i.e. actually make the decision that Mitchell >> refers to). > > I believe this was decided on sometime (the sprints?). It was at the Language Summit. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJ6zg3+gerLs4ltQ4RAt2ZAKDRGXMXBRs5FiHLnC0MQt56janafwCdGytm /nrHCiifI/KibI+ljppr3aA= =uYha -----END PGP SIGNATURE----- From steven.bethard at gmail.com Sun Apr 19 17:54:12 2009 From: steven.bethard at gmail.com (Steven Bethard) Date: Sun, 19 Apr 2009 08:54:12 -0700 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> Message-ID: On Sun, Apr 19, 2009 at 1:38 AM, Mart S?mermaa wrote: > On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan wrote: >> That said, I'm starting to wonder if an even better option may be to >> just drop the kwargs support from the function and require people to >> always supply a parameters dictionary. That would simplify the signature >> to the quite straightforward: >> >> ?def add_query_params(url, params, allow_dups=True, sep='&') > > That's the most straightforward and I like this more than the one below. > >> I agree that isn't a good option, but mapping True/False/None to those >> specific behaviours also seems rather arbitrary (specifically, it is >> difficult to remember which of "allow_dups=False" and "allow_dups=None" >> means to ignore any duplicate keys and which means to ignore only >> duplicate items). > > I'd say it's less of a problem when using named arguments, i.e. you read it as: > > allow_dups=True : yes > allow_dups=False : effeminately no :), > allow_dups=None : strictly no > > which more or less corresponds to the behaviour. > >> It also doesn't provide a clear mechanism for >> extension (e.g. what if someone wanted duplicate params to trigger an >> exception?) 
>> >> Perhaps the extra argument should just be a key/value pair filtering >> function, and we provide functions for the three existing behaviours >> (i.e. allow_duplicates(), ignore_duplicate_keys(), >> ignore_duplicate_items()) in the urllib.parse module. > > This would be the most flexible and conceptually right (ye olde > strategy pattern), but would clutter the API. > >> Note that your implementation and docstring currently conflict with each >> other - the docstring says "pass them via a dictionary in second >> argument:" but the dictionary is currently the third argument (the >> docstring also later refers to passing OrderedDictionary as the second >> argument). > > It's a mistake that exemplifies once again that positional args are awkward :). > > --- > > So, gentlemen, either > > def add_query_params(url, params, allow_dups=True, sep='&') > > or > > def allow_duplicates(...) > > def remove_duplicate_values(...) > > ... > > def add_query_params(url, params, strategy=allow_duplicates, sep='&') +1 for the strategy approach. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From martin at v.loewis.de Sun Apr 19 20:51:47 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 19 Apr 2009 20:51:47 +0200 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EB2CA7.7000803@gmail.com> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <87ocut2ean.fsf@xemacs.org> <49EAFA65.2090009@v.loewis.de> <49EB2CA7.7000803@gmail.com> Message-ID: <49EB72C3.5080307@v.loewis.de> > The installer still leaves PATH alone by default, doesn't it? Correct. However, people frequently set the path "by hand", so they would probably appreciate a python3 binary (and pythonw3? python3w?). Of course, those people could also manually copy/rename the executable. 
> Although I guess choosing a file association for .py files becomes > rather more interesting... Indeed. We could register a py3 extension (and py3w? pyw3?), but then .py might remain associated with python3, even though people want it associated with python 2. Regards, Martin From janssen at parc.com Sun Apr 19 21:26:59 2009 From: janssen at parc.com (Bill Janssen) Date: Sun, 19 Apr 2009 12:26:59 PDT Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> Message-ID: <92117.1240169219@parc.com> Mart S?mermaa wrote: > On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan wrote: > > That said, I'm starting to wonder if an even better option may be to > > just drop the kwargs support from the function and require people to > > always supply a parameters dictionary. That would simplify the signature > > to the quite straightforward: > > > > ?def add_query_params(url, params, allow_dups=True, sep='&') Or even better, stop trying to use a mapping, and just make the "params" value a list of (name, value) pairs. That way you can stop fiddling around with "allow_dups" and just get rid of it. Bill From fuzzyman at voidspace.org.uk Sun Apr 19 21:30:05 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 19 Apr 2009 20:30:05 +0100 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: <92117.1240169219@parc.com> References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> Message-ID: <49EB7BBD.8010806@voidspace.org.uk> Bill Janssen wrote: > Mart S?mermaa wrote: > > >> On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan wrote: >> >>> That said, I'm starting to wonder if an even better option may be to >>> just drop the kwargs support from the function and require people to >>> always supply a parameters dictionary. 
That would simplify the signature >>> to the quite straightforward: >>> >>> def add_query_params(url, params, allow_dups=True, sep='&') >>> > > Or even better, stop trying to use a mapping, and just make the "params" > value a list of (name, value) pairs. That way you can stop fiddling > around with "allow_dups" and just get rid of it. > Reluctant +1, it seems the best solution. You can always use {}.items() if you still want to store the params in a mapping. Michael > Bill > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From solipsis at pitrou.net Sun Apr 19 21:35:56 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 19 Apr 2009 19:35:56 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> Message-ID: Bill Janssen parc.com> writes: > > Or even better, stop trying to use a mapping, and just make the "params" > value a list of (name, value) pairs. You can even accept both a list of (name, value) pairs /and/ some **kwargs, like the dict constructor does. It would be a pity to drop the user-friendliness of kwargs just to satisfy some rare and obscure requirement. Regards Antoine. 
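Antoine's dict()-style compromise (a positional sequence of pairs plus optional keyword arguments, mirroring the dict constructor) is easy to sketch with what urllib.parse already provides. The name add_query_params and the behaviour below follow the thread's proposal; this is not an existing stdlib function:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def add_query_params(url, params=None, **kwargs):
    """Return *url* with extra query parameters appended.

    *params* may be a mapping or a sequence of (name, value) pairs;
    further parameters may be given as keyword arguments, in the same
    spirit as the dict() constructor.  Duplicates are simply allowed.
    """
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if params is not None:
        # A mapping contributes its items; anything else is assumed to
        # already be a sequence of (name, value) pairs.
        query.extend(params.items() if hasattr(params, "items") else params)
    query.extend(kwargs.items())
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_query_params("http://example.com/a?x=1", [("y", "2"), ("y", "3")], z="4"))
# http://example.com/a?x=1&y=2&y=3&z=4
```

A mapping gives the convenient spelling, a pair sequence keeps order and duplicates, and kwargs cover the common case of a few ASCII-named parameters.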
From janssen at parc.com Sun Apr 19 22:21:05 2009 From: janssen at parc.com (Bill Janssen) Date: Sun, 19 Apr 2009 13:21:05 PDT Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= In-Reply-To: References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> Message-ID: <93230.1240172465@parc.com> Antoine Pitrou wrote: > Bill Janssen parc.com> writes: > > > > Or even better, stop trying to use a mapping, and just make the "params" > > value a list of (name, value) pairs. > > You can even accept both a list of (name, value) pairs /and/ some **kwargs, like > the dict constructor does. It would be a pity to drop the user-friendliness of > kwargs just to satisfy some rare and obscure requirement. This whole discussion seems a bit "rare and obscure" to me. I've built URLs for years without this method, and never felt the lack. What bugs me is the lack of a way to build multipart-formdata payloads, the only standard way to send non-Latin1 strings as part of a request. I'd like to suggest we move this off python-dev, and to either the Web-SIG or stdlib-sig mailing lists, which are probably more interested in all of this. Bill From solipsis at pitrou.net Sun Apr 19 22:24:44 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 19 Apr 2009 20:24:44 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> Message-ID: Bill Janssen parc.com> writes: > > This whole discussion seems a bit "rare and obscure" to me. I've built > URLs for years without this method, and never felt the lack. What bugs me > is the lack of a way to build multipart-formdata payloads, the only standard > way to send non-Latin1 strings as part of a request. ?? 
What's the problem with sending non-Latin1 data without multipart-formdata? From janssen at parc.com Sun Apr 19 22:59:44 2009 From: janssen at parc.com (Bill Janssen) Date: Sun, 19 Apr 2009 13:59:44 PDT Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= In-Reply-To: References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> Message-ID: <93939.1240174784@parc.com> Antoine Pitrou wrote: > Bill Janssen parc.com> writes: > > > > This whole discussion seems a bit "rare and obscure" to me. I've built > > URLs for years without this method, and never felt the lack. What bugs me > > is the lack of a way to build multipart-formdata payloads, the only standard > > way to send non-Latin1 strings as part of a request. > > ?? What's the problem with sending non-Latin1 data without multipart-formdata? I should have said, as values for a FORM submission. There are two ways to encode form values for a FORM submission, application/x-www-form-urlencoded, and multipart/form-data. As per http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4: ``The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.'' And we don't support this in the http client-side standard library code. (Do we? Haven't looked lately.) The same section also says: ``Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. 
Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').'' That "the ASCII code of the character" seemingly restricts it to ASCII... But this is complicated by the fact that most browsers try to use the character set the server will understand, and the widely used technique to accomplish this is to use the same charset the page the FORM occurs in uses. Unless this is set explicitly, it defaults to Latin-1. I prefer to avoid all this uncertainty, and use a well-defined format when submitting a form, so I tend to use multipart/form-data, which allows explicit control over this. Bill From solipsis at pitrou.net Sun Apr 19 23:45:04 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 19 Apr 2009 21:45:04 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> <93939.1240174784@parc.com> Message-ID: Bill Janssen parc.com> writes: > > ``The content type "application/x-www-form-urlencoded" is inefficient > for sending large quantities of binary data or text containing non-ASCII > characters. The fact that it's "inefficient" (i.e. takes more bytes than an optimal encoding scheme would) doesn't mean that it doesn't work. There are millions of Web sites out there which allow you to submit non-ASCII data without resorting to "multipart/form-data" encoding. The situations where the submitted text is huge enough that encoding efficiency matters are probably insanely rare. > But this is complicated by the fact that most browsers try to use the > character set the server will understand, and the widely used technique > to accomplish this is to use the same charset the page the FORM occurs > in uses. Unless this is set explicitly, it defaults to Latin-1. Look out there, many Web pages specify a different character set than Latin-1... 
UTF8 is quite a common choice in the modern world. Also, browsers will encode those characters that cannot be encoded in the character set using HTML escapes ("&1234;"). This means you can enter any unicode text into any form, regardless of the encoding of the source page. It's up to the Web application to decode the text, sure, but any decent Web framework or toolkit should do it for you. Regards Antoine. From steve at pearwood.info Mon Apr 20 01:03:28 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 20 Apr 2009 09:03:28 +1000 Subject: [Python-Dev] [Python-ideas] Proposed addtion to urllib.parse in 3.1 (and urlparse in 2.7) In-Reply-To: <92117.1240169219@parc.com> References: <92117.1240169219@parc.com> Message-ID: <200904200903.29083.steve@pearwood.info> On Mon, 20 Apr 2009 05:26:59 am Bill Janssen wrote: > Mart S?mermaa wrote: > > On Sun, Apr 19, 2009 at 2:06 AM, Nick Coghlan wrote: > > > That said, I'm starting to wonder if an even better option may be > > > to just drop the kwargs support from the function and require > > > people to always supply a parameters dictionary. That would > > > simplify the signature to the quite straightforward: > > > > > > ?def add_query_params(url, params, allow_dups=True, sep='&') > > Or even better, stop trying to use a mapping, and just make the > "params" value a list of (name, value) pairs. That way you can stop > fiddling around with "allow_dups" and just get rid of it. Surely it should support any mapping? That's what I do in my own code. People will use regular dicts for convenience when they don't care about order or duplicates, and (name,value) pairs, or an OrderedDict, when they do. I suppose you could force people to write params.items() if params is a dict, but it seems wrong to force an order on input data when it doesn't require one. 
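Steven's distinction is already visible in urlencode itself, which accepts either a mapping or a sequence of two-element tuples; only the latter can express order and duplicate names (shown here with the 3.x urllib.parse spelling):

```python
from urllib.parse import urlencode

# A plain dict can carry each name only once:
print(urlencode({"a": "1", "b": "2"}))                  # a=1&b=2

# A sequence of (name, value) pairs keeps order *and* duplicates:
print(urlencode([("b", "2"), ("a", "1"), ("a", "3")]))  # b=2&a=1&a=3
```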
-- Steven D'Aprano From janssen at parc.com Mon Apr 20 05:41:23 2009 From: janssen at parc.com (Bill Janssen) Date: Sun, 19 Apr 2009 20:41:23 PDT Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= In-Reply-To: References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> <93939.1240174784@parc.com> Message-ID: <98461.1240198883@parc.com> Antoine Pitrou wrote: > Bill Janssen parc.com> writes: > > > > ``The content type "application/x-www-form-urlencoded" is inefficient > > for sending large quantities of binary data or text containing non-ASCII > > characters. > > The fact that it's "inefficient" (i.e. takes more bytes than an optimal encoding > scheme would) doesn't mean that it doesn't work. Absolutely. I'm just quoting the spec to you. In any case, being able to send multipart/form-data would be a nice thing to have, if only for file uploads. > Look out there, many Web pages specify a different character set than > Latin-1... UTF8 is quite a common choice in the modern world. Sure. But nowhere does a spec say that this page charset should be used in sending the values of a FORM using application/x-www-form-urlencoded in a new HTTP request. It's just a convention some browsers use. > Also, browsers will encode those characters that cannot be encoded in the > character set using HTML escapes ("&1234;"). This means you can enter any Sure, some browsers will. Others will apparently replace them with question marks. It's undefined. > unicode text into any form, regardless of the encoding of the source page. It's > up to the Web application to decode the text, sure, but any decent Web framework > or toolkit should do it for you. 
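For reference, a multipart/form-data body is short enough to compose by hand. The helper below is only an illustrative sketch (it is not an existing stdlib API, handles plain text fields only, and skips file parts), but it shows the point being made here: the charset is stated explicitly rather than guessed:

```python
import uuid

def encode_multipart_formdata(fields, charset="utf-8"):
    """Encode (name, value) pairs as a multipart/form-data body.

    Returns (content_type, body) where body is bytes and each field
    carries an explicit charset declaration.  Field names are assumed
    to need no quoting.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields:
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("Content-Type: text/plain; charset=%s" % charset)
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append("--" + boundary + "--")
    lines.append("")
    body = "\r\n".join(lines).encode(charset)
    return "multipart/form-data; boundary=%s" % boundary, body

ctype, body = encode_multipart_formdata([("comment", "r\u00e9sum\u00e9")])
```

ctype goes into the Content-Type request header and body becomes the request payload.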
Bill From christian.doll at basf.com Mon Apr 20 08:54:15 2009 From: christian.doll at basf.com (christian.doll at basf.com) Date: Mon, 20 Apr 2009 08:54:15 +0200 Subject: [Python-Dev] Something like PEP-0304 - suppress *.pyc generation Message-ID: Hello, I'm looking for something like PEP-0304 ( http://www.python.org/dev/peps/pep-0304/ ). I need a way to suppress the generation of *.pyc files, because I have many different machines which call a Python program at the same time. The program crashes at different places and on different machines - I think the problem is the *.pyc files, which are generated by different machines at the same time. Is PEP-0304 implemented in a newer Python version (we use 2.4.4), or is there a workaround, or can someone implement PEP-0304? Thank you for your help! Viele Grüße Christian Doll From solipsis at pitrou.net Mon Apr 20 11:44:54 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 20 Apr 2009 09:44:54 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> <93939.1240174784@parc.com> <98461.1240198883@parc.com> Message-ID: 
From steve at pearwood.info Mon Apr 20 11:25:24 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 20 Apr 2009 19:25:24 +1000 Subject: [Python-Dev] Something like PEP-0304 - suppress *.pyc generation In-Reply-To: References: Message-ID: <200904201925.24851.steve@pearwood.info> On Mon, 20 Apr 2009 04:54:15 pm christian.doll at basf.com wrote: > I need something to suppress the generation of *.pyc files > because i have very much different machines which call a python > program at same time. This list is for development *of* Python, not development *with* Python. You would probably be better off on comp.lang.python or python-list at python.org. However, I believe that the normal way to prevent the generation of .pyc files is to remove write access to the directory where the .py files are. -- Steven D'Aprano From ismail at namtrac.org Mon Apr 20 11:54:15 2009 From: ismail at namtrac.org (=?UTF-8?B?xLBzbWFpbCBEw7ZubWV6?=) Date: Mon, 20 Apr 2009 12:54:15 +0300 Subject: [Python-Dev] Something like PEP-0304 - suppress *.pyc generation In-Reply-To: <200904201925.24851.steve@pearwood.info> References: <200904201925.24851.steve@pearwood.info> Message-ID: <19e566510904200254j9a4f3acx695ec152e65af7cc@mail.gmail.com> On Mon, Apr 20, 2009 at 12:25 PM, Steven D'Aprano wrote: > On Mon, 20 Apr 2009 04:54:15 pm christian.doll at basf.com wrote: > >> I need something to suppress the generation of *.pyc files >> because i have very much different machines which call a python >> program at same time. > > This list is for development *of* Python, not development *with* > Python. You would probably be better off on comp.lang.python or > python-list at python.org. > > However, I believe that the normal way to prevent the generation > of .pyc files is to remove write access to the directory where > the .py files are. Checkout http://docs.python.org/using/cmdline.html#envvar-PYTHONDONTWRITEBYTECODE Regards. 
-- İsmail DÖNMEZ From a.badger at gmail.com Mon Apr 20 16:46:06 2009 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 20 Apr 2009 07:46:06 -0700 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <49EACA2C.6060606@canterbury.ac.nz> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <1afaf6160904182014h5a8b92a8t67104637f0624f9f@mail.gmail.com> <49EACA2C.6060606@canterbury.ac.nz> Message-ID: <49EC8AAE.2050506@gmail.com> Greg Ewing wrote: > Steven Bethard wrote: > >> That's an unfortunate decision. When the 2.X line stops being >> maintained (after 2.7 maybe?) we're going to be stuck with the "3" >> suffix forever for the "real" Python. > > I don't see why we have to be stuck with it forever. > When 2.x has faded into the sunset, we can start > aliasing 'python' to 'python3' if we want, can't we? > You could, but it's not my favorite idea. Gets people used to the idea of python == python2 and python3 == python3 as something they can count on. Then says, "Oops, that was just an implementation detail, we're changing that now". Much better to either make a clean break and call the new language dialect python3 from now and forever or force people to come up with solutions to whether /usr/bin/python == python2 or python3 right now while it's fresh and relevant in their minds. -Toshio 
Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From janssen at parc.com Mon Apr 20 17:32:52 2009 From: janssen at parc.com (Bill Janssen) Date: Mon, 20 Apr 2009 08:32:52 PDT Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= In-Reply-To: References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> <93939.1240174784@parc.com> <98461.1240198883@parc.com> Message-ID: <5339.1240241572@parc.com> Antoine Pitrou wrote: > Bill Janssen parc.com> writes: > > > > Sure. But nowhere does a spec say that this page charset should be used > > in sending the values of a FORM using application/x-www-form-urlencoded > > in a new HTTP request. It's just a convention some browsers use. > > Let's call it a de facto standard then. A behaviour doesn't have to be engraved > in an RFC to be considered standard. Sure. And if HTTP was all about browsers keying off pages, that would be fine with me. But it's not. HTTP is used in lots of places where there are no browsers; in fact, the idea we're busy bike-shedding is all about a client-side library making calls on a server. It's used in places where there are no "pages", too, just servers on which clients are making REST-style calls. So in the real world, the only way in which you can reliably post non-ASCII values to a server using HTTP is with multipart/form-data, which allows you to explicitly say what character set you are using. I've debugged this problem too many times with REST servers of various kinds to think otherwise. 
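[Editor's note: the charset ambiguity Bill describes is easy to reproduce with modern urllib.parse, whose urlencode() takes an encoding argument; the same form field yields different bytes on the wire depending on which charset the caller picks. A sketch, not code from the thread:]

```python
from urllib.parse import urlencode

form = {"q": "café"}

# Identical form data, two different percent-encoded payloads --
# exactly the ambiguity under discussion.
assert urlencode(form, encoding="utf-8") == "q=caf%C3%A9"
assert urlencode(form, encoding="latin-1") == "q=caf%E9"
```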
Bill From solipsis at pitrou.net Mon Apr 20 17:42:28 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 20 Apr 2009 15:42:28 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> <93939.1240174784@parc.com> <98461.1240198883@parc.com> <5339.1240241572@parc.com> Message-ID: Bill Janssen parc.com> writes: > > Sure. And if HTTP was all about browsers keying off pages, that would > be fine with me. But it's not. HTTP is used in lots of places where > there are no browsers; I'm sorry, I don't follow you. The fact that something else than a browser makes the request shouldn't change the behaviour on the /server/ side. > It's used in > places where there are no "pages", too, just servers on which clients > are making REST-style calls. So what? The designer of the REST API must mandate an encoding (most probably UTF-8 rather than Latin-1 as you bizarrely seemed to imply) and the problem is solved. Complaining that the RFC doesn't specify all this sounds like an excuse for programmer laziness. Antoine. From janssen at parc.com Mon Apr 20 18:33:31 2009 From: janssen at parc.com (Bill Janssen) Date: Mon, 20 Apr 2009 09:33:31 PDT Subject: [Python-Dev] =?utf-8?q?=5BPython-ideas=5D_Proposed_addtion_to_url?= =?utf-8?q?lib=2Eparse_in=093=2E1_=28and_urlparse_in_2=2E7=29?= In-Reply-To: References: <49E9CA6A.6060004@gmail.com> <49EA5D01.6040208@gmail.com> <92117.1240169219@parc.com> <93230.1240172465@parc.com> <93939.1240174784@parc.com> <98461.1240198883@parc.com> <5339.1240241572@parc.com> Message-ID: <6787.1240245211@parc.com> Antoine Pitrou wrote: > Bill Janssen parc.com> writes: > > > > Sure. And if HTTP was all about browsers keying off pages, that would > > be fine with me. But it's not. 
HTTP is used in lots of places where > > there are no browsers; > > I'm sorry, I don't follow you. The fact that something else than a browser makes > the request shouldn't change the behaviour on the /server/ side. I'm talking about the client side, though. > > It's used in > > places where there are no "pages", too, just servers on which clients > > are making REST-style calls. > > So what? The designer of the REST API must mandate an encoding (most probably > UTF-8 rather than Latin-1 as you bizarrely seemed to imply) and the problem is > solved. Sure, if they understand that they have to do it. > Complaining that the RFC doesn't specify all this sounds like an excuse for > programmer laziness. Or incompetence, which I'm afraid is a more likely issue. Lots of folks write their own HTTP servers, and don't really understand just *what* they need to specify. As a client-side user of one of those servers, I'm left in the dark. I think we've beat this to death for python-dev. Feel free to continue it on Web-SIG, though, if you wish. Bill From jared.grubb at gmail.com Mon Apr 20 20:22:37 2009 From: jared.grubb at gmail.com (Jared Grubb) Date: Mon, 20 Apr 2009 11:22:37 -0700 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: <87ocut2ean.fsf@xemacs.org> References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <87ocut2ean.fsf@xemacs.org> Message-ID: On 19 Apr 2009, at 02:17, Stephen J. Turnbull wrote: > Nick Coghlan writes: >> 3. Change the shebang lines in Python standard library scripts to be >> version specific and update release.py to fix them all when bumping >> the >> version number in the source tree. > > +1 > > I think that it's probably best to leave "python", "python2", and > "python3" for the use of downstream distributors. 
ISTR that was what > Guido concluded, in the discuss that led to Python 3 defaulting to > altinstall---it wasn't just convenient because Python 3 is a major > change, but that experience has shown that deciding which Python is > going to be "The python" on somebody's system just isn't a decision > that Python should make. Ok, so if I understand, the situation is: * python points to 2.x version * python3 points to 3.x version * need to be able to run certain 3k scripts from cmdline (since we're talking about shebangs) using Python3k even though "python" points to 2.x So, if I got the situation right, then do these same scripts understand that PYTHONPATH and PYTHONHOME and all the others are also probably pointing to 2.x code? Jared From fuzzyman at voidspace.org.uk Mon Apr 20 20:24:30 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 20 Apr 2009 19:24:30 +0100 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable In-Reply-To: References: <1afaf6160904181348x4710050enf0468b4f4ab98d70@mail.gmail.com> <49EA5FE4.9040102@gmail.com> <87ocut2ean.fsf@xemacs.org> Message-ID: <49ECBDDE.4040002@voidspace.org.uk> Jared Grubb wrote: > > On 19 Apr 2009, at 02:17, Stephen J. Turnbull wrote: >> Nick Coghlan writes: >>> 3. Change the shebang lines in Python standard library scripts to be >>> version specific and update release.py to fix them all when bumping the >>> version number in the source tree. >> >> +1 >> >> I think that it's probably best to leave "python", "python2", and >> "python3" for the use of downstream distributors. ISTR that was what >> Guido concluded, in the discuss that led to Python 3 defaulting to >> altinstall---it wasn't just convenient because Python 3 is a major >> change, but that experience has shown that deciding which Python is >> going to be "The python" on somebody's system just isn't a decision >> that Python should make. 
> > Ok, so if I understand, the situation is: > * python points to 2.x version > * python3 points to 3.x version > * need to be able to run certain 3k scripts from cmdline (since we're > talking about shebangs) using Python3k even though "python" points to 2.x > > So, if I got the situation right, then do these same scripts > understand that PYTHONPATH and PYTHONHOME and all the others are also > probably pointing to 2.x code? IIRC the proposal was to also create PYTHON3PATH and PYTHON3HOME. Michael > > Jared > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From benjamin at python.org Tue Apr 21 00:06:20 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 20 Apr 2009 17:06:20 -0500 Subject: [Python-Dev] 3.1 beta blockers Message-ID: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> The first (and only) beta of 3.1 is scheduled for less than 2 weeks away, May 2nd, and is creeping onto the horizon. There are currently 6 blockers: #5692: test_zipfile fails under Windows - This looks like a fairly easy fix. #5775: marshal.c needs to be checked for out of memory errors - Looks like Eric has this one. #5410: msvcrt bytes cleanup - It would be nice to have a Windows expert examine the patch on this issue for correctness. #5786: [This isn't applicable to 3.1] #5783: IDLE cannot find windows chm file - Awaiting a fix to the IDLE or the doc build system. 
-- Thanks for your work, Benjamin From benjamin at python.org Tue Apr 21 00:09:40 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 20 Apr 2009 17:09:40 -0500 Subject: [Python-Dev] 3.1 beta blockers In-Reply-To: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> Message-ID: <1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com> I forgot one: #4136 - Porting the json changes to py3k - This issue exposed the brokenness of the json module in py3k. Was any consensus reached about what the API of json should be? If the beta time rolls around and nothing has changed on this issue, I think Antoine's patch, which makes json input and output unicode should be applied. 2009/4/20 Benjamin Peterson : > The first (and only) beta of 3.1 is scheduled for less than 2 weeks > away, May 2nd, and is creeping onto the horizon. There are currently 6 > blockers: -- Regards, Benjamin From nad at acm.org Tue Apr 21 00:37:24 2009 From: nad at acm.org (Ned Deily) Date: Mon, 20 Apr 2009 15:37:24 -0700 Subject: [Python-Dev] 3.1 beta blockers References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> <1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com> Message-ID: In article <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>, Benjamin Peterson wrote: > I forgot one: [...] What about #5756 - idle, pydoc, et al removed from 3.1? 
-- Ned Deily, nad at acm.org From barry at python.org Tue Apr 21 00:44:00 2009 From: barry at python.org (Barry Warsaw) Date: Mon, 20 Apr 2009 18:44:00 -0400 Subject: [Python-Dev] 3.1 beta blockers In-Reply-To: References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> <1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com> Message-ID: <40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org> On Apr 20, 2009, at 6:37 PM, Ned Deily wrote: > In article > <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>, > Benjamin Peterson wrote: >> I forgot one: [...] > > What about #5756 - idle, pydoc, et al removed from 3.1? Were we going to remove this from 2.7 also? I'm working on splitting two of my Tools (pynche and world) off into separate projects and can't remember what we decided about that. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: PGP.sig Type: application/pgp-signature Size: 304 bytes Desc: This is a digitally signed message part URL: From nad at acm.org Tue Apr 21 00:55:30 2009 From: nad at acm.org (Ned Deily) Date: Mon, 20 Apr 2009 15:55:30 -0700 Subject: [Python-Dev] 3.1 beta blockers References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> <1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com> <40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org> Message-ID: In article <40D62762-ABAB-4DE1-9BE2-798E40AE23DD at python.org>, Barry Warsaw wrote: > On Apr 20, 2009, at 6:37 PM, Ned Deily wrote: > > > In article > > <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>, > > Benjamin Peterson wrote: > >> I forgot one: [...] > > > > What about #5756 - idle, pydoc, et al removed from 3.1? > > Were we going to remove this from 2.7 also? I'm working on splitting > two of my Tools (pynche and world) off into separate projects and > can't remember what we decided about that. I'm confused. 
The point of #5756 was that 3.x builds are broken because the installation of idle, pydoc, 2to3, and smtpd.py have been commented out in setup.py and thus these scripts are no longer being installed. Unless I'm missing something, that's the only way they were being installed in any form. If nothing else, the change breaks the OSX installer build. If they were removed deliberately (and are intended to be removed from 2.7??), there needs to be some replacement and/or doc changes, no? -- Ned Deily, nad at acm.org From benjamin at python.org Tue Apr 21 01:44:26 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 20 Apr 2009 18:44:26 -0500 Subject: [Python-Dev] 3.1 beta blockers In-Reply-To: <40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org> References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> <1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com> <40D62762-ABAB-4DE1-9BE2-798E40AE23DD@python.org> Message-ID: <1afaf6160904201644s684a0044j837176f0a5e71d93@mail.gmail.com> 2009/4/20 Barry Warsaw : > On Apr 20, 2009, at 6:37 PM, Ned Deily wrote: > >> In article >> <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>, >> Benjamin Peterson wrote: >>> >>> I forgot one: [...] >> >> What about #5756 - idle, pydoc, et al removed from 3.1? > > Were we going to remove this from 2.7 also? ?I'm working on splitting two of > my Tools (pynche and world) off into separate projects and can't remember > what we decided about that. Those aren't installed as scripts like idle and pydoc, so I believe they can go. 
-- Regards, Benjamin From benjamin at python.org Tue Apr 21 01:47:25 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 20 Apr 2009 18:47:25 -0500 Subject: [Python-Dev] 3.1 beta blockers In-Reply-To: References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> <1afaf6160904201509g2f5e784ah34c728732ca9b160@mail.gmail.com> Message-ID: <1afaf6160904201647p445e22fcs872edd7d712f794@mail.gmail.com> 2009/4/20 Ned Deily : > In article > <1afaf6160904201509g2f5e784ah34c728732ca9b160 at mail.gmail.com>, > ?Benjamin Peterson wrote: >> I forgot one: [...] > > What about #5756 - idle, pydoc, et al removed from 3.1? I just bumped priority and left a comment. -- Regards, Benjamin From alessiogiovanni.baroni at gmail.com Tue Apr 21 11:13:31 2009 From: alessiogiovanni.baroni at gmail.com (Alessio Giovanni Baroni) Date: Tue, 21 Apr 2009 11:13:31 +0200 Subject: [Python-Dev] 3.1 beta blockers In-Reply-To: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> Message-ID: There are some cases of OutOfMemory? On my machine the float->string conversion is all ok. Also 'make test' is all ok. 2009/4/21 Benjamin Peterson > The first (and only) beta of 3.1 is scheduled for less than 2 weeks > away, May 2nd, and is creeping onto the horizon. There are currently 6 > blockers: > > #5692: test_zipfile fails under Windows - This looks like a fairly easy > fix. > > #5775: marshal.c needs to be checked for out of memory errors - Looks > like Eric has this one. > > #5410: msvcrt bytes cleanup - It would be nice to have a Windows > expert examine the patch on this issue for correctness. > > #5786: [This isn't applicable to 3.1] > > #5783: IDLE cannot find windows chm file - Awaiting a fix to the IDLE > or the doc build system. 
> > > -- > Thanks for your work, > Benjamin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/alessiogiovanni.baroni%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Tue Apr 21 13:13:34 2009 From: eric at trueblade.com (Eric Smith) Date: Tue, 21 Apr 2009 07:13:34 -0400 Subject: [Python-Dev] 3.1 beta blockers In-Reply-To: References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> Message-ID: <49EDAA5E.7040908@trueblade.com> Alessio Giovanni Baroni wrote: > There are some cases of OutOfMemory? On my machine the float->string > conversion is all ok. Also 'make test' is all ok. I assume you're talking about issue 5775. I think it's all explained in the bug report. Basically, the float->string conversion can now return an out of memory error, which it could not before. marshal.c's w_object doesn't check for those error conditions. I doubt they'll ever occur in any test, but they need to be handled none the less. It's on my list of things to do in the next week. But if there's anyone who understands the code and would like to take a look, feel free. Eric. > > 2009/4/21 Benjamin Peterson > > > The first (and only) beta of 3.1 is scheduled for less than 2 weeks > away, May 2nd, and is creeping onto the horizon. There are currently 6 > blockers: > > #5692: test_zipfile fails under Windows - This looks like a fairly > easy fix. > > #5775: marshal.c needs to be checked for out of memory errors - Looks > like Eric has this one. > > #5410: msvcrt bytes cleanup - It would be nice to have a Windows > expert examine the patch on this issue for correctness. > > #5786: [This isn't applicable to 3.1] > > #5783: IDLE cannot find windows chm file - Awaiting a fix to the IDLE > or the doc build system. 
> > > -- > Thanks for your work, > Benjamin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/alessiogiovanni.baroni%40gmail.com > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/eric%2Bpython-dev%40trueblade.com From eric at trueblade.com Tue Apr 21 14:01:26 2009 From: eric at trueblade.com (Eric Smith) Date: Tue, 21 Apr 2009 08:01:26 -0400 Subject: [Python-Dev] 3.1 beta blockers In-Reply-To: <49EDAA5E.7040908@trueblade.com> References: <1afaf6160904201506r1f2df3d8pc9eafe3342a028e3@mail.gmail.com> <49EDAA5E.7040908@trueblade.com> Message-ID: <49EDB596.4070505@trueblade.com> Eric Smith wrote: > Alessio Giovanni Baroni wrote: >> There are some cases of OutOfMemory? On my machine the float->string >> conversion is all ok. Also 'make test' is all ok. > > I assume you're talking about issue 5775. I think it's all explained in > the bug report. Basically, the float->string conversion can now return > an out of memory error, which it could not before. marshal.c's w_object > doesn't check for those error conditions. I doubt they'll ever occur in > any test, but they need to be handled none the less. > > It's on my list of things to do in the next week. But if there's anyone > who understands the code and would like to take a look, feel free. I just fixed it in r71783, so it should be off the list of release blockers. 
From martin at v.loewis.de Wed Apr 22 08:50:22 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 22 Apr 2009 08:50:22 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces Message-ID: <49EEBE2E.3090601@v.loewis.de> I'm proposing the following PEP for inclusion into Python 3.1. Please comment. Regards, Martin PEP: 383 Title: Non-decodable Bytes in System Character Interfaces Version: $Revision: 71793 $ Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $ Author: Martin v. Löwis Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 22-Apr-2009 Python-Version: 3.1 Post-History: Abstract ======== File names, environment variables, and command line arguments are defined as being character data in POSIX; the C APIs however allow passing arbitrary bytes - whether these conform to a certain encoding or not. This PEP proposes a means of dealing with such irregularities by embedding the bytes in character strings in such a way that allows recreation of the original byte string. Rationale ========= The C char type is a data type that is commonly used to represent both character data and bytes. Certain POSIX interfaces are specified and widely understood as operating on character data, however, the system call interfaces make no assumption on the encoding of these data, and pass them on as-is. With Python 3, character strings use a Unicode-based internal representation, making it difficult to ignore the encoding of byte strings in the same way that the C interfaces can ignore the encoding. On the other hand, Microsoft Windows NT has correct the original design limitation of Unix, and made it explicit in its system interfaces that these data (file names, environment variables, command line arguments) are indeed character data, by providing a Unicode-based API (keeping a C-char-based one for backwards compatibility).
For Python 3, one proposed solution is to provide two sets of APIs: a byte-oriented one, and a character-oriented one, where the character-oriented one would be limited to not being able to represent all data accurately. Unfortunately, for Windows, the situation would be exactly the opposite: the byte-oriented interface cannot represent all data; only the character-oriented API can. As a consequence, libraries and applications that want to support all user data in a cross-platform manner have to accept mish-mash of bytes and characters exactly in the way that caused endless troubles for Python 2.x. With this PEP, a uniform treatment of these data as characters becomes possible. The uniformity is achieved by using specific encoding algorithms, meaning that the data can be converted back to bytes on POSIX systems only if the same encoding is used. Specification ============= On Windows, Python uses the wide character APIs to access character-oriented APIs, allowing direct conversion of the environmental data to Python str objects. On POSIX systems, Python currently applies the locale's encoding to convert the byte data to Unicode. If the locale's encoding is UTF-8, it can represent the full set of Unicode characters, otherwise, only a subset is representable. In the latter case, using private-use characters to represent these bytes would be an option. For UTF-8, doing so would create an ambiguity, as the private-use characters may regularly occur in the input also. To convert non-decodable bytes, a new error handler "python-escape" is introduced, which decodes non-decodable bytes using into a private-use character U+F01xx, which is believed to not conflict with private-use characters that currently exist in Python codecs. The error handler interface is extended to allow the encode error handler to return byte strings immediately, in addition to returning Unicode strings which then get encoded again. 
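[Editor's note: the round trip described above can be sketched with the error handler Python 3.1 ultimately shipped under the name "surrogateescape"; the draft's "python-escape"/"utf-8b" names did not survive, but the half-surrogate mechanics did:]

```python
raw = b"caf\xe9"  # Latin-1 bytes for "café": not decodable as UTF-8

# Decoding: each undecodable byte >= 0x80 becomes a lone half
# surrogate in U+DC80..U+DCFF (here 0xE9 -> U+DCE9).
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9"

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```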
If the locale's encoding is UTF-8, the file system encoding is set to a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. Discussion ========== While providing a uniform API to non-decodable bytes, this interface has the limitation that chosen representation only "works" if the data get converted back to bytes with the python-escape error handler also. Encoding the data with the locale's encoding and the (default) strict error handler will raise an exception, encoding them with UTF-8 will produce non-sensical data. For most applications, we assume that they eventually pass data received from a system interface back into the same system interfaces. For example, and application invoking os.listdir() will likely pass the result strings back into APIs like os.stat() or open(), which then encodes them back into their original byte representation. Applications that need to process the original byte strings can obtain them by encoding the character strings with the file system encoding, passing "python-escape" as the error handler name. Copyright ========= This document has been placed in the public domain. From ncoghlan at gmail.com Wed Apr 22 12:56:51 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Apr 2009 20:56:51 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <49EEF7F3.8060406@gmail.com> Martin v. Löwis wrote: > I'm proposing the following PEP for inclusion into Python 3.1. > Please comment. That seems like a much nicer solution than having parallel bytes/Unicode APIs everywhere. When the locale encoding is UTF-8, would UTF-8b also be used for the command line decoding and environment variable encoding/decoding?
(the PEP currently only states that the encoding switch will be done for the file system encoding - it is silent regarding the other two system interfaces). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From glyph at divmod.com Wed Apr 22 14:20:24 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 22 Apr 2009 12:20:24 -0000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> On 06:50 am, martin at v.loewis.de wrote: >I'm proposing the following PEP for inclusion into Python 3.1. >Please comment. >To convert non-decodable bytes, a new error handler "python-escape" is >introduced, which decodes non-decodable bytes using into a private-use >character U+F01xx, which is believed to not conflict with private-use >characters that currently exist in Python codecs. -1. On UNIX, character data is not sufficient to represent paths. We must, must, must continue to have a simple bytes interface to these APIs. Covering it up in layers of obscure encoding hacks will not make the problem go away, it will just make it harder to understand. To make matters worse, Linux and GNOME use the PUA for some printable characters. If you open up charmap on an ubuntu system and select "view by unicode character block", then click on "private use area", you'll see many of these. I know that Apple uses at least a few PUA codepoints for the apple logo and the propeller/option icons as well. I am still -1 on any turn-non-decodable-bytes-into-text, because it makes life harder for those of us trying to keep bytes and text straight, but if you absolutely must represent POSIX filenames as mojibake rather than bytes, the only workable solution is to use NUL as your escape character. 
That's the only code point which _actually_ can't show up in a filename somehow. As we discussed last time, this is what Mono does with System.IO.Path. As a bonus, it's _much_ easier to detect a NUL from random application code than to try to figure out if a string has any half-surrogates or magic PUA characters which shouldn't be interpreted according to platform PUA rules. From walter at livinglogic.de Wed Apr 22 13:48:04 2009 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Wed, 22 Apr 2009 13:48:04 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <49EF03F4.3010407@livinglogic.de> Martin v. L?wis wrote: > I'm proposing the following PEP for inclusion into Python 3.1. > Please comment. > > Regards, > Martin > > PEP: 383 > Title: Non-decodable Bytes in System Character Interfaces > Version: $Revision: 71793 $ > Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $ > Author: Martin v. L?wis > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 22-Apr-2009 > Python-Version: 3.1 > Post-History: > > Abstract > ======== > > File names, environment variables, and command line arguments are > defined as being character data in POSIX; the C APIs however allow > passing arbitrary bytes - whether these conform to a certain encoding > or not. This PEP proposes a means of dealing with such irregularities > by embedding the bytes in character strings in such a way that allows > recreation of the original byte string. > > Rationale > ========= > > The C char type is a data type that is commonly used to represent both > character data and bytes. Certain POSIX interfaces are specified and > widely understood as operating on character data, however, the system > call interfaces make no assumption on the encoding of these data, and > pass them on as-is. 
With Python 3, character strings use a > Unicode-based internal representation, making it difficult to ignore > the encoding of byte strings in the same way that the C interfaces can > ignore the encoding. > > On the other hand, Microsoft Windows NT has correct the original "correct" -> "corrected" > design limitation of Unix, and made it explicit in its system > interfaces that these data (file names, environment variables, command > line arguments) are indeed character data, by providing a > Unicode-based API (keeping a C-char-based one for backwards > compatibility). > > [...] > > Specification > ============= > > On Windows, Python uses the wide character APIs to access > character-oriented APIs, allowing direct conversion of the > environmental data to Python str objects. > > On POSIX systems, Python currently applies the locale's encoding to > convert the byte data to Unicode. If the locale's encoding is UTF-8, > it can represent the full set of Unicode characters, otherwise, only a > subset is representable. In the latter case, using private-use > characters to represent these bytes would be an option. For UTF-8, > doing so would create an ambiguity, as the private-use characters may > regularly occur in the input also. > > To convert non-decodable bytes, a new error handler "python-escape" is > introduced, which decodes non-decodable bytes using into a private-use > character U+F01xx, which is believed to not conflict with private-use > characters that currently exist in Python codecs. Would this mean that real private use characters in the file name would raise an exception? How? The UTF-8 decoder doesn't pass those bytes to any error handler. > The error handler interface is extended to allow the encode error > handler to return byte strings immediately, in addition to returning > Unicode strings which then get encoded again. Then the error callback for encoding would become specific to the target encoding. 
Would this mean that the handler checks which encoding is used and behaves like "strict" if it doesn't recognize the encoding? > If the locale's encoding is UTF-8, the file system encoding is set to > a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes > (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. Is this done by the codec, or the error handler? If it's done by the codec I don't see a reason for the "python-escape" error handler. > Discussion > ========== > > While providing a uniform API to non-decodable bytes, this interface > has the limitation that chosen representation only "works" if the data > get converted back to bytes with the python-escape error handler > also. I thought the error handler would be used for decoding. > Encoding the data with the locale's encoding and the (default) > strict error handler will raise an exception, encoding them with UTF-8 > will produce non-sensical data. > > For most applications, we assume that they eventually pass data > received from a system interface back into the same system > interfaces. For example, and application invoking os.listdir() will "and" -> "an" > likely pass the result strings back into APIs like os.stat() or > open(), which then encodes them back into their original byte > representation. Applications that need to process the original byte > strings can obtain them by encoding the character strings with the > file system encoding, passing "python-escape" as the error handler > name. Servus, Walter From google at mrabarnett.plus.com Wed Apr 22 14:17:31 2009 From: google at mrabarnett.plus.com (MRAB) Date: Wed, 22 Apr 2009 13:17:31 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <49EF0ADB.2090107@mrabarnett.plus.com> Martin v. 
Löwis wrote: [snip] > To convert non-decodable bytes, a new error handler "python-escape" is > introduced, which decodes non-decodable bytes using into a private-use > character U+F01xx, which is believed to not conflict with private-use > characters that currently exist in Python codecs. > > The error handler interface is extended to allow the encode error > handler to return byte strings immediately, in addition to returning > Unicode strings which then get encoded again. > > If the locale's encoding is UTF-8, the file system encoding is set to > a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes > (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. > If the byte stream happens to include a sequence which decodes to U+F01xx, shouldn't that raise an exception? From dirkjan at ochtman.nl Wed Apr 22 14:31:09 2009 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 22 Apr 2009 14:31:09 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> Message-ID: <49EF0E0D.1060805@ochtman.nl> On 22/04/2009 14:20, glyph at divmod.com wrote: > -1. On UNIX, character data is not sufficient to represent paths. We > must, must, must continue to have a simple bytes interface to these > APIs. Covering it up in layers of obscure encoding hacks will not make > the problem go away, it will just make it harder to understand. As a hg developer, I have to concur. Keeping bytes-based APIs intact would make porting hg to py3k much, much easier. You may be able to imagine that dealing with paths correctly cross-platform on a VCS is a major PITA, and py3k is currently not helping the situation. 
Cheers, Dirkjan From benjamin at python.org Wed Apr 22 20:29:22 2009 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 22 Apr 2009 13:29:22 -0500 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF0E0D.1060805@ochtman.nl> References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> <49EF0E0D.1060805@ochtman.nl> Message-ID: <1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com> 2009/4/22 Dirkjan Ochtman : > On 22/04/2009 14:20, glyph at divmod.com wrote: >> >> -1. On UNIX, character data is not sufficient to represent paths. We >> must, must, must continue to have a simple bytes interface to these >> APIs. Covering it up in layers of obscure encoding hacks will not make >> the problem go away, it will just make it harder to understand. > > As a hg developer, I have to concur. Keeping bytes-based APIs intact would > make porting hg to py3k much, much easier. You may be able to imagine that > dealing with paths correctly cross-platform on a VCS is a major PITA, and > py3k is currently not helping the situation. Your concerns are valid, but I don't see anything in the PEP about removing the bytes APIs. -- Regards, Benjamin From solipsis at pitrou.net Wed Apr 22 20:44:40 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 22 Apr 2009 18:44:40 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System?= =?utf-8?q?=09Character_Interfaces?= References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> <49EF0E0D.1060805@ochtman.nl> Message-ID: Dirkjan Ochtman ochtman.nl> writes: > > As a hg developer, I have to concur. Keeping bytes-based APIs intact > would make porting hg to py3k much, much easier. You may be able to > imagine that dealing with paths correctly cross-platform on a VCS is a > major PITA, and py3k is currently not helping the situation. 
bytes-based APIs are certainly more bullet-proof under Unix, but it's the reverse under Windows. Martin's proposal aims to bridge the gap and propose something that makes text-based APIs as bullet-proof under Unix as they already are under Windows. Regards Antoine. From martin at v.loewis.de Wed Apr 22 21:07:47 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 22 Apr 2009 21:07:47 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF03F4.3010407@livinglogic.de> References: <49EEBE2E.3090601@v.loewis.de> <49EF03F4.3010407@livinglogic.de> Message-ID: <49EF6B03.3010701@v.loewis.de> > "correct" -> "corrected" Thanks, fixed. >> To convert non-decodable bytes, a new error handler "python-escape" is >> introduced, which decodes non-decodable bytes using into a private-use >> character U+F01xx, which is believed to not conflict with private-use >> characters that currently exist in Python codecs. > > Would this mean that real private use characters in the file name would > raise an exception? How? The UTF-8 decoder doesn't pass those bytes to > any error handler. The python-escape codec is only used/meaningful if the env encoding is not UTF-8. For any other encoding, it is assumed that no character actually maps to the private-use characters. >> The error handler interface is extended to allow the encode error >> handler to return byte strings immediately, in addition to returning >> Unicode strings which then get encoded again. > > Then the error callback for encoding would become specific to the target > encoding. Why would it become specific? It can work the same way for any encoding: take U+F01xx, and generate the byte xx. >> If the locale's encoding is UTF-8, the file system encoding is set to >> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes >> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. 
> > Is this done by the codec, or the error handler? If it's done by the > codec I don't see a reason for the "python-escape" error handler. utf-8b is a new codec. However, the utf-8b codec is only used if the env encoding would otherwise be utf-8. For utf-8b, the error handler is indeed unnecessary. >> While providing a uniform API to non-decodable bytes, this interface >> has the limitation that chosen representation only "works" if the data >> get converted back to bytes with the python-escape error handler >> also. > > I thought the error handler would be used for decoding. It's used in both directions: for decoding, it converts \xXX to U+F01XX. For encoding, U+F01XX will trigger an error, which is then handled by the handler to produce \xXX. > "and" -> "an" Thanks, fixed. Regards, Martin From rdmurray at bitdance.com Wed Apr 22 20:58:20 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 22 Apr 2009 14:58:20 -0400 (EDT) Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> <49EF0E0D.1060805@ochtman.nl> <1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com> Message-ID: On Wed, 22 Apr 2009 at 13:29, Benjamin Peterson wrote: > 2009/4/22 Dirkjan Ochtman : >> On 22/04/2009 14:20, glyph at divmod.com wrote: >>> >>> -1. On UNIX, character data is not sufficient to represent paths. We >>> must, must, must continue to have a simple bytes interface to these >>> APIs. Covering it up in layers of obscure encoding hacks will not make >>> the problem go away, it will just make it harder to understand. >> >> As a hg developer, I have to concur. Keeping bytes-based APIs intact would >> make porting hg to py3k much, much easier. 
You may be able to imagine that >> dealing with paths correctly cross-platform on a VCS is a major PITA, and >> py3k is currently not helping the situation. > > Your concerns are valid, but I don't see anything in the PEP about > removing the bytes APIs. Yeah, but IIRC a complete set of bytes APIs doesn't exist yet in py3k. --David From martin at v.loewis.de Wed Apr 22 21:17:56 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 22 Apr 2009 21:17:56 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> Message-ID: <49EF6D64.4030302@v.loewis.de> > -1. On UNIX, character data is not sufficient to represent paths. We > must, must, must continue to have a simple bytes interface to these > APIs. I'd like to respond to this concern in three ways: 1. The PEP doesn't remove any of the existing interfaces. So if the interfaces for byte-oriented file names in 3.0 work fine for you, feel free to continue to use them. 2. Even if they were taken away (which the PEP does not propose to do), it would be easy to emulate them for applications that want them. For example, listdir could be wrapped as

    def listdir_b(bytestring):
        fse = sys.getfilesystemencoding()
        string = bytestring.decode(fse, "python-escape")
        for fn in os.listdir(string):
            yield fn.encode(fse, "python-escape")

3. I still disagree that we must, must, must continue to provide these interfaces. I don't understand from the rest of your message what would *actually* break if people would use the proposed interfaces. 
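Martin's wrapper can be exercised end-to-end using "surrogateescape", the name under which the PEP's "python-escape" handler eventually shipped in Python 3.1; the directory and file names below are hypothetical, and this is only a sketch of the idea, not the PEP's reference implementation:

```python
import os
import sys
import tempfile

def listdir_b(bytestring):
    # Bytes -> str via the escape handler, list the directory, then
    # encode each name back to bytes the same way (round-trip safe).
    fse = sys.getfilesystemencoding()
    string = bytestring.decode(fse, "surrogateescape")
    for fn in os.listdir(string):
        yield fn.encode(fse, "surrogateescape")

# Hypothetical usage: a scratch directory containing one file.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "example.txt"), "w").close()
    raw = d.encode(sys.getfilesystemencoding(), "surrogateescape")
    print(list(listdir_b(raw)))  # [b'example.txt']
```

Because both directions use the same error handler, file names that were undecodable in the locale encoding come back out as exactly the original bytes.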
Regards, Martin From martin at v.loewis.de Wed Apr 22 21:19:31 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 22 Apr 2009 21:19:31 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF0E0D.1060805@ochtman.nl> References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> <49EF0E0D.1060805@ochtman.nl> Message-ID: <49EF6DC3.2040406@v.loewis.de> Dirkjan Ochtman wrote: > On 22/04/2009 14:20, glyph at divmod.com wrote: >> -1. On UNIX, character data is not sufficient to represent paths. We >> must, must, must continue to have a simple bytes interface to these >> APIs. Covering it up in layers of obscure encoding hacks will not make >> the problem go away, it will just make it harder to understand. > > As a hg developer, I have to concur. Keeping bytes-based APIs intact > would make porting hg to py3k much, much easier. You may be able to > imagine that dealing with paths correctly cross-platform on a VCS is a > major PITA, and py3k is currently not helping the situation. I find these statements contradicting: py3k *is* keeping the byte-based APIs for file names intact, so why is it not helping the situation, when this is what is needed to make porting much, much easier? Regards, Martin From martin at v.loewis.de Wed Apr 22 21:21:17 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 22 Apr 2009 21:21:17 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> <49EF0E0D.1060805@ochtman.nl> <1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com> Message-ID: <49EF6E2D.9070504@v.loewis.de> > Yeah, but IIRC a complete set of bytes APIs doesn't exist yet in py3k. Define complete. I'm not aware of any interfaces wrt. 
file IO that are lacking, so which ones were you thinking of? Python doesn't currently provide a way to access environment variables and command line arguments as bytes. With the PEP, such a way would actually become available for applications that desire it. Regards, Martin From martin at v.loewis.de Wed Apr 22 21:24:54 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 22 Apr 2009 21:24:54 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF0ADB.2090107@mrabarnett.plus.com> References: <49EEBE2E.3090601@v.loewis.de> <49EF0ADB.2090107@mrabarnett.plus.com> Message-ID: <49EF6F06.9060008@v.loewis.de> MRAB wrote: > Martin v. L?wis wrote: > [snip] >> To convert non-decodable bytes, a new error handler "python-escape" is >> introduced, which decodes non-decodable bytes using into a private-use >> character U+F01xx, which is believed to not conflict with private-use >> characters that currently exist in Python codecs. >> >> The error handler interface is extended to allow the encode error >> handler to return byte strings immediately, in addition to returning >> Unicode strings which then get encoded again. >> >> If the locale's encoding is UTF-8, the file system encoding is set to >> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes >> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. >> > If the byte stream happens to include a sequence which decodes to > U+F01xx, shouldn't that raise an exception? I apparently have not expressed it clearly, so please help me improve the text. What I mean is this: - if the environment encoding (for lack of better name) is UTF-8, Python stops using the utf-8 codec under this PEP, and switches to the utf-8b codec. - otherwise (env encoding is not utf-8), undecodable bytes get decoded with the error handler. 
In this case, U+F01xx will not occur in the byte stream, since no other codec ever produces this PUA character (this is not fully true - UTF-16 may also produce PUA characters, but they can't appear as env encodings). So the case you are referring to should not happen. Regards, Martin From rdmurray at bitdance.com Wed Apr 22 21:28:32 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 22 Apr 2009 15:28:32 -0400 (EDT) Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF6E2D.9070504@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> <49EF0E0D.1060805@ochtman.nl> <1afaf6160904221129y2a81837bm133772bc0f4ea107@mail.gmail.com> <49EF6E2D.9070504@v.loewis.de> Message-ID: On Wed, 22 Apr 2009 at 21:21, "Martin v. Löwis" wrote: >> Yeah, but IIRC a complete set of bytes APIs doesn't exist yet in py3k. > > Define complete. I'm not aware of any interfaces wrt. file IO that are > lacking, so which ones were you thinking of? > > Python doesn't currently provide a way to access environment variables > and command line arguments as bytes. With the PEP, such a way would > actually become available for applications that desire it. Those are the two that I'm thinking of. I think I understand your proposal better now after your example of implementing listdir(bytes). Putting it in the PEP would probably be a good idea. I personally don't have enough practice in actually working with various encodings (or any understanding of unicode escapes) to comment further. 
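The round-trip behaviour under discussion can be seen in a few lines; this sketch uses "surrogateescape", the name under which the PEP's "python-escape"/"utf-8b" mechanism eventually shipped in Python 3.1:

```python
# A byte that is invalid UTF-8 survives a decode/encode round-trip:
# decoding maps it to a lone half surrogate, and encoding with the
# same handler regenerates the original byte exactly.
raw = b"caf\xe9"  # Latin-1 bytes for "café"; \xe9 is not valid UTF-8

name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9' -- the bad byte became U+DCE9

roundtrip = name.encode("utf-8", "surrogateescape")
print(roundtrip == raw)  # True
```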
--David From walter at livinglogic.de Wed Apr 22 22:06:49 2009 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Wed, 22 Apr 2009 22:06:49 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF6B03.3010701@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49EF03F4.3010407@livinglogic.de> <49EF6B03.3010701@v.loewis.de> Message-ID: <49EF78D9.7020709@livinglogic.de> Martin v. Löwis wrote: >> "correct" -> "corrected" > > Thanks, fixed. > >>> To convert non-decodable bytes, a new error handler "python-escape" is >>> introduced, which decodes non-decodable bytes using into a private-use >>> character U+F01xx, which is believed to not conflict with private-use >>> characters that currently exist in Python codecs. >> Would this mean that real private use characters in the file name would >> raise an exception? How? The UTF-8 decoder doesn't pass those bytes to >> any error handler. > > The python-escape codec is only used/meaningful if the env encoding > is not UTF-8. For any other encoding, it is assumed that no character > actually maps to the private-use characters. Which should be true for any encoding from the pre-unicode era, but not for UTF-16/32 and variants. >>> The error handler interface is extended to allow the encode error >>> handler to return byte strings immediately, in addition to returning >>> Unicode strings which then get encoded again. >> Then the error callback for encoding would become specific to the target >> encoding. > > Why would it become specific? It can work the same way for any encoding: > take U+F01xx, and generate the byte xx. If any error callback emits bytes these byte sequences must be legal in the target encoding, which depends on the target encoding itself. 
However for the normal use of this error handler this might be irrelevant, because those filenames that get encoded were constructed in such a way that reencoding them regenerates the original byte sequence. >>> If the locale's encoding is UTF-8, the file system encoding is set to >>> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes >>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. >> Is this done by the codec, or the error handler? If it's done by the >> codec I don't see a reason for the "python-escape" error handler. > > utf-8b is a new codec. However, the utf-8b codec is only used if the > env encoding would otherwise be utf-8. For utf-8b, the error handler > is indeed unnecessary. Wouldn't it make more sense to be consistent how non-decodable bytes get decoded? I.e. should the utf-8b codec decode those bytes to PUA characters too (and refuse to encode them, so the error handler outputs them)? >>> While providing a uniform API to non-decodable bytes, this interface >>> has the limitation that chosen representation only "works" if the data >>> get converted back to bytes with the python-escape error handler >>> also. >> I thought the error handler would be used for decoding. > > It's used in both directions: for decoding, it converts \xXX to > U+F01XX. For encoding, U+F01XX will trigger an error, which is then > handled by the handler to produce \xXX. But only for non-UTF8 encodings? Servus, Walter From mal at egenix.com Wed Apr 22 22:43:34 2009 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 22 Apr 2009 22:43:34 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF78D9.7020709@livinglogic.de> References: <49EEBE2E.3090601@v.loewis.de> <49EF03F4.3010407@livinglogic.de> <49EF6B03.3010701@v.loewis.de> <49EF78D9.7020709@livinglogic.de> Message-ID: <49EF8176.1060704@egenix.com> On 2009-04-22 22:06, Walter Dörwald wrote: > Martin v. 
Löwis wrote: >>> "correct" -> "corrected" >> Thanks, fixed. >> >>>> To convert non-decodable bytes, a new error handler "python-escape" is >>>> introduced, which decodes non-decodable bytes using into a private-use >>>> character U+F01xx, which is believed to not conflict with private-use >>>> characters that currently exist in Python codecs. >>> Would this mean that real private use characters in the file name would >>> raise an exception? How? The UTF-8 decoder doesn't pass those bytes to >>> any error handler. >> The python-escape codec is only used/meaningful if the env encoding >> is not UTF-8. For any other encoding, it is assumed that no character >> actually maps to the private-use characters. > > Which should be true for any encoding from the pre-unicode era, but not > for UTF-16/32 and variants. Actually it's not even true for the pre-Unicode codecs. It was and is common for Asian companies to use company specific symbols in private areas or extended versions of CJK character sets. Microsoft even published an editor for Asian users to create their own glyphs as needed: http://msdn.microsoft.com/en-us/library/cc194861.aspx Here's an overview for some US companies using such extensions: http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=VendorUseOfPUA (it's no surprise that most of these actually defined their own charsets) SIL even started a registry for the private use areas (PUAs): http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA This is their current list of assignments: http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&item_id=SILPUAassignments and here's how to register: http://scripts.sil.org/cms/SCRIPTs/page.php?site_id=nrsi&cat_id=UnicodePUA#404a261e -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 22 2009) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jess.austin at gmail.com Wed Apr 22 22:57:07 2009 From: jess.austin at gmail.com (Jess Austin) Date: Wed, 22 Apr 2009 15:57:07 -0500 Subject: [Python-Dev] Issue5434: datetime.monthdelta In-Reply-To: References: <18918.61476.980951.991275@montanaro.dyndns.org> <18919.51931.874515.848841@montanaro.dyndns.org> Message-ID: On Thu, Apr 16, 2009 at 8:01 PM, Jess Austin wrote: > These operations are useful in particular contexts. What I've > submitted is also useful, and currently isn't easy in core, > batteries-included python. While I would consider the foregoing > interpretation of the Zen to be backwards (this doesn't add another > way to do something that's already possible, it makes possible > something that currently encourages one to pull her hair out), I > suppose it doesn't matter. If adding a class and a function to a > module will require extended advocacy on -ideas and c.l.p, I'm > probably not the person for the job. > > If, on the other hand, one of the committers wants to toss this in at > some point, whether now or 3 versions down the road, the patch is up > at bugs.python.org (and I'm happy to make any suggested > modifications). I'm glad to have written this; I learned a bit about > CPython internals and scraped a layer of rust off my C skills. I will > go ahead and backport the python-coded version to 2.3. I'll continue > this conversation with whomever for however long, but I suspect this > topic will soon have worn out its welcome on python-dev. 
I've uploaded the backported python version source distribution to PyPI, http://pypi.python.org/pypi?name=MonthDelta&:action=display with better-formatted documentation at http://packages.python.org/MonthDelta/ "easy_install MonthDelta" works too. cheers, Jess From martin at v.loewis.de Wed Apr 22 23:00:51 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 22 Apr 2009 23:00:51 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF78D9.7020709@livinglogic.de> References: <49EEBE2E.3090601@v.loewis.de> <49EF03F4.3010407@livinglogic.de> <49EF6B03.3010701@v.loewis.de> <49EF78D9.7020709@livinglogic.de> Message-ID: <49EF8583.1010602@v.loewis.de> >> The python-escape codec is only used/meaningful if the env encoding >> is not UTF-8. For any other encoding, it is assumed that no character >> actually maps to the private-use characters. > > Which should be true for any encoding from the pre-unicode era, but not > for UTF-16/32 and variants. Right. However, these can't appear as environment/file system encodings, because they use null bytes. >> Why would it become specific? It can work the same way for any encoding: >> take U+F01xx, and generate the byte xx. > > If any error callback emits bytes these byte sequences must be legal in > the target encoding, which depends on the target encoding itself. No. The whole process started with data having an *invalid* encoding in the source encoding (which, after the roundtrip, is now the target encoding). So the python-escape error handler deliberately produces byte sequences that are invalid in the environment encoding (hence the additional permission of having it produce bytes instead of characters). > However for the normal use of this error handler this might be > irrelevant, because those filenames that get encoded were constructed in > such a way that reencoding them regenerates the original byte sequence. Exactly so. 
The error handler is not of much use outside this specific scenario. >> utf-8b is a new codec. However, the utf-8b codec is only used if the >> env encoding would otherwise be utf-8. For utf-8b, the error handler >> is indeed unnecessary. > > Wouldn't it make more sense to be consistent how non-decodable bytes get > decoded? I.e. should the utf-8b codec decode those bytes to PUA > characters too (and refuse to encode then, so the error handler outputs > them)? Unfortunately, that won't work. If the original encoding is UTF-8, and uses PUA characters, then, on re-encoding, it's not possible to tell whether to encode as a PUA character, or as an invalid byte. This was my original proposal a year ago, and people immediately suggested that it is not at all acceptable if there is the slightest chance of information loss. Hence the current PEP. >>> I thought the error handler would be used for decoding. >> It's used in both directions: for decoding, it converts \xXX to >> U+F01XX. For encoding, U+F01XX will trigger an error, which is then >> handled by the handler to produce \xXX. > > But only for non-UTF8 encodings? Right. For ease of use, the implementation will specify the error handler regardless, and the recommended use for applications will be to use the error handler regardless. For utf-8b, the error handler will never be invoked, since all input can be converted always. Regards, Martin From glyph at divmod.com Thu Apr 23 00:49:30 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 22 Apr 2009 22:49:30 -0000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF6D64.4030302@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <20090422122024.12555.35958968.divmod.xquotient.8926@weber.divmod.com> <49EF6D64.4030302@v.loewis.de> Message-ID: <20090422224930.12555.316330142.divmod.xquotient.9043@weber.divmod.com> On 07:17 pm, martin at v.loewis.de wrote: >>-1. 
On UNIX, character data is not sufficient to represent paths. We >>must, must, must continue to have a simple bytes interface to these >>APIs. >I'd like to respond to this concern in three ways: > >1. The PEP doesn't remove any of the existing interfaces. So if the > interfaces for byte-oriented file names in 3.0 work fine for you, > feel free to continue to use them. It's good to know this. It would be good if the PEP made it clear that it is proposing an additional way to work with undecodable bytes, not replacing the existing one. For me, this PEP isn't an acceptable substitute for direct bytes-based access to command-line arguments and environment variables on UNIX. To my knowledge *those* APIs still don't exist yet. I would like it if this PEP were not used as an excuse to avoid adding them. >2. Even if they were taken away (which the PEP does not propose to do), > it would be easy to emulate them for applications that want them. I think this is a pretty clear abstraction inversion. Luckily nobody is proposing it :). >3. I still disagree that we must, must, must continue to provide these > interfaces. You do have a point; if there is a clean, defined mapping between str and bytes in terms of all path/argv/environ APIs, then we don't *need* those APIs, since we can just implement them in terms of characters. But I still think that's a bad idea, since mixing the returned strings with *other* APIs remains problematic. However, I still think the mapping you propose is problematic... > I don't understand from the rest of your message what > would *actually* break if people would use the proposed interfaces. As far as more concrete problems: the utf-8 codec currently in python 2.5 and 2.6, and 3.0 will happily encode half-surrogates, at least in the builds I have. >>> '\udc81'.encode('utf-8').decode('utf-8') '\udc81' So there's an ambiguity when passing U+DC81 to this codec: do you mean \xed\xb2\x81 or do you just mean \x81? 
Of course it would be possible to make UTF-8B consistent in this regard, but it is still going to interact with code that thinks in terms of actual UTF-8, and the failure mode here is very difficult to inspect. A major problem here is that it's very difficult to puzzle out whether anything *will* actually break. I might be wrong about the above for some subtlety of unicode that I don't quite understand, but I don't want to spend all day experimenting with every possible set of build options, python versions, and unicode specifications. Neither, I wager, do most people who want to call listdir(). Another specific problem: looking at the Character Map application on my desktop, U+F0126 and U+F0127 are considered printable characters. I'm not sure what they're supposed to be, exactly, but there are glyphs there. This is running Ubuntu 8.04; there may be more of these in use in more recent version of GNOME. There is nothing "private" about the "private use" area; Python can never use any of these characters for *anything*, except possibly internally in ways which are never exposed to application code, because the operating system (or window system, or libraries) might use them. If I pass a string with those printable PUA/A characters in it to listdir(), what happens? Do they get turned into bytes, do they only get turned into bytes if my filesystem encoding happens to be something other than UTF-8...? The PEP seems a bit ambiguous to me as far as how the PUA hack and the half-surrogate hack interact. I could be wrong, but it seems to me to be an either-or proposition, in which case there would be *four* bytes types in python 3.1: bytes, bytearray, str-with-PUA/A-junk, str-with- half-surrogate-junk. Detecting the difference would be an expensive and subtle affair; the simplest solution I could think of would be to use an error-prone regex. If the encoding hack used were simply NULL, then the detection would be straightforward: "if '\u0000' in thingy:". 
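As it turned out, later Python 3 releases removed this particular ambiguity: the strict utf-8 codec refuses to encode lone surrogates, so a half surrogate can only round-trip through the escape handler (which shipped under the name "surrogateescape"), and escaped names can be detected with a simple code-point range check rather than a regex. A sketch of that behaviour:

```python
# In Python 3.1+, strict utf-8 rejects lone surrogates outright, so
# '\udc81' encodes only via the escape handler, yielding b'\x81'.
try:
    "\udc81".encode("utf-8")
except UnicodeEncodeError:
    print("strict utf-8 refuses lone surrogates")

print("\udc81".encode("utf-8", "surrogateescape"))  # b'\x81'

def has_escaped_bytes(name):
    # Escaped (undecodable) bytes occupy U+DC80..U+DCFF exclusively.
    return any(0xDC80 <= ord(c) <= 0xDCFF for c in name)

print(has_escaped_bytes("caf\udce9"))  # True
print(has_escaped_bytes("cafe"))       # False
```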
Ultimately I think I'm only -0 on all of this now, as long as we get bytes versions of environ and argv. Even if these corner-case issues aren't fixed, those of us who want to have correct handling of undecodable filenames can do so. From larry at hastings.org Thu Apr 23 10:42:02 2009 From: larry at hastings.org (Larry Hastings) Date: Thu, 23 Apr 2009 01:42:02 -0700 Subject: [Python-Dev] Proposed: add an environment variable, PYTHONPREFIXES Message-ID: <49F029DA.6080107@hastings.org> I've submitted a patch to implement a new environment variable, PYTHONPREFIXES. The patch is here: http://bugs.python.org/issue5819 PYTHONPREFIXES is similar to PYTHONUSERBASE: it lets you add "prefix directories" to be culled for site packages. It differs from PYTHONUSERBASE in three ways: * PYTHONPREFIXES has an empty default value. PYTHONUSERBASE has a default, e.g. ~/.local on UNIX-like systems. * PYTHONPREFIXES supports multiple directories, separated by the site-specific directory separator character (os.pathsep). Earlier directories take precedence. PYTHONUSERBASE supports specifying at most one directory. * PYTHONPREFIXES adds its directories to site.PREFIXES, so it reuses the existing mechanisms for site package directories, exactly simulating a real prefix directory. PYTHONUSERBASE only adds a single directory, using its own custom code path. This last point bears further discussion. PYTHONUSERBASE's custom code to inspect only a single directory has resulted in at least one bug, if not more, as follows: * The bona-fide known bug: the Debian package maintainer for Python decided to change "site-packages" to "dist-packages" in 2.6, for reasons I still don't quite understand. He made this change in site.addsitepackages and distutils.sysconfig.get_python_lib, and similarly in setuptools, but he missed changing it in site.addusersitepackages. 
This meant that if you used setup.py to install a package to a private prefix directory, PYTHONUSERBASE had no hope of ever finding the package. (Happily this bug is fixed.) * I suspect there's a similar bug with PYTHONUSERBASE on the "os2emx" and "riscos" platforms. site.addsitepackages on those platforms looks in "{prefix}/Lib/site-packages", but site.addusersitepackages looks in "{prefix}/lib/python{version}/site-packages" as it does on any non-Windows platform. Presumably setup.py on those two platforms installs site packages to the directory site.addsitepackages inspects, which means that PYTHONUSERBASE doesn't work on those two platforms. PYTHONUSERBASE's custom code path to add site package directories seems unnecessary to me. I cannot fathom why its implementors chose this approach; in any case I think reusing site.addsitepackages is a clear win. I fear it's too late to change PYTHONUSERBASE so it simply called site.addsitepackages, as that would change its established semantics. Though if that idea found support I'd be happy to contribute a patch. A few more notes on PYTHONPREFIXES: * PYTHONPREFIXES is gated by the exact same mechanisms that shut off PYTHONUSERBASE. * Specifying "-s" on the Python command line shuts it off. * Setting the environment variable PYTHONNOUSERSITE to a non-empty string shuts it off. * If the effective uid / gid doesn't match the actual uid / gid it automatically shuts off. * I'm not enormously happy with the name. Until about an hour or two ago I was calling it "PYTHONUSERBASES". I'm open to other suggestions. * I'm not sure that PYTHONPREFIXES should literally modify site.PREFIXES. If that's a bad idea I'd be happy to amend the patch so it didn't touch site.PREFIXES. * Reaction in python-ideas has been reasonably positive, though I gather Nick Coghlan and Scott David Daniels think it's unnecessary. (To read the discussion, search for the old name: "PYTHONUSERBASES".) 
* Ben Finney prefers a counter-proposal he made in the python-ideas discussion: change the existing PYTHONUSERBASE to support multiple directories. I don't like this approach, because: a) it means you have to explicitly add the local default if you want to use it, and b) PYTHONUSERBASE only inspects one directory, whereas PYTHONPREFIXES inspects all the directories Python might use for site packages. I do admit this approach would be preferable to no change at all. The patch is thrillingly simple and works fine. However it's not ready to be merged because I haven't touched the documentation. I figured I'd hold off until I see which way the wind blows. I'd also be happy to convert this into a PEP if that's what's called for. /larry/ From georg at python.org Thu Apr 23 11:07:23 2009 From: georg at python.org (Georg Brandl) Date: Thu, 23 Apr 2009 11:07:23 +0200 Subject: [Python-Dev] Reminder: Python Bug Day on Saturday Message-ID: <49F02FCB.3020402@python.org> Hi, I'd like to remind everyone that there will be a Python Bug Day on April 25. As always, this is a perfect opportunity to get involved in Python development, or bring your own issues to attention, discuss them and (hopefully) resolve them together with the core developers. We will coordinate over IRC, in #python-dev on irc.freenode.net, and the Wiki page http://wiki.python.org/moin/PythonBugDay has all important information and a short list of steps how to get set up. Hope to see you there! 
Georg From ben+python at benfinney.id.au Thu Apr 23 11:13:13 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 23 Apr 2009 19:13:13 +1000 Subject: [Python-Dev] Location of OS-installed versus Python-installed libraries (was: Proposed: add an environment variable, PYTHONPREFIXES) References: <49F029DA.6080107@hastings.org> Message-ID: <87hc0f4tsm.fsf@benfinney.id.au> Larry Hastings writes: > the Debian package mantainer for Python decided to change > "site-packages" to "dist-packages" in 2.6, for reasons I still don't > quite understand. For reference, Larry is referring to changes announced by Matthias Klose on 2009-02-16 in Message-ID: <18841.49052.405847.359567 at gargle.gargle.HOWL> : > Local installation path > ----------------------- > > When installing Python modules using distutils, the resulting files > end up in the same location wether they are installed by a Debian > package, or by a local user or administrator, unless the installation > path is overwritten on the command line. Compare this with most > software based on autoconf, where an explicit prefix has to be > provided for the packaging, while the default install installs into > /usr/local. For new Python versions packaged in Debian this will > change so that an installation into /usr (not /usr/local) requires an > extra option to distutils install command (--install-layout=deb). To > avoid breaking the packaging of existing code the distutils install > command for 2.4 and 2.5 will just accept this option and ignore it. > For the majority of packages we won't see changes in the packaging, > provided that the python packaging helpers can find the files in the > right location and move it to the expected target path. 
> > A second issue raised by developers was the clash of modules and > extensions installed by a local python installation (with default > prefix /usr/local) with the modules provided by Debian packages > (/usr/local/lib/pythonX.Y/site-packages shared by the patched "system" > python and the locally installed python. To avoid this clash the > directory `site-packages' should be renamed to `dist-packages' in > both locations: > > - /usr/lib/pythonX.Y/dist-packages (installation location for code > packaged for Debian) > - /usr/local/lib/pythonX.Y/dist-packages (installation location > for locally installed code using distutils install without > options). > > The path /usr/lib/pythonX.Y/site-packages is not found on sys.path > anymore. > > About the name: Discussed this with Barry Warsaw and Martin v. Loewis, > and we came to the conclusion that using the same directory name for > both locations would be the most consistent way. -- \ ?In any great organization it is far, far safer to be wrong | `\ with the majority than to be right alone.? ?John Kenneth | _o__) Galbraith, 1989-07-28 | Ben Finney From ben at redfrontdoor.org Thu Apr 23 14:54:59 2009 From: ben at redfrontdoor.org (Ben North) Date: Thu, 23 Apr 2009 13:54:59 +0100 Subject: [Python-Dev] Suggested doc patch for tarfile Message-ID: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com> Hi, The current documentation for tarfile.TarFile.extractfile() does not mention that the returned 'file-like object' supports close() and also iteration. The attached patch (against svn trunk) fixes this. (Background: I was wondering whether I could write def process_and_close_file(f_in): with closing(f_in) as f: # Do stuff with f. and have it work whether f_in was a true file or the return value of extractfile(), and thought from the documentation that I couldn't. Of course, I could have just tried it, but I think fixing the documentation wouldn't hurt.) 
Alternative: enhance the tarfile.ExFileObject class to support use as a context manager? Thanks, Ben. -------------- next part -------------- A non-text attachment was scrubbed... Name: tarfile.rst.patch Type: application/octet-stream Size: 612 bytes Desc: not available URL: From aahz at pythoncraft.com Thu Apr 23 14:57:37 2009 From: aahz at pythoncraft.com (Aahz) Date: Thu, 23 Apr 2009 05:57:37 -0700 Subject: [Python-Dev] Suggested doc patch for tarfile In-Reply-To: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com> References: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com> Message-ID: <20090423125737.GB59@panix.com> On Thu, Apr 23, 2009, Ben North wrote: > > The current documentation for tarfile.TarFile.extractfile() does not > mention that the returned 'file-like object' supports close() and also > iteration. The attached patch (against svn trunk) fixes this. Please post the patch to bugs.python.org -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From ben at redfrontdoor.org Thu Apr 23 15:04:44 2009 From: ben at redfrontdoor.org (Ben North) Date: Thu, 23 Apr 2009 14:04:44 +0100 Subject: [Python-Dev] Suggested doc patch for tarfile In-Reply-To: <20090423125737.GB59@panix.com> References: <5169ff10904230554i495b0cb3v34f9a761698a2c50@mail.gmail.com> <20090423125737.GB59@panix.com> Message-ID: <5169ff10904230604u6dcce35ar16b928fb920ce398@mail.gmail.com> >> The current documentation for tarfile.TarFile.extractfile() does not >> mention that the returned 'file-like object' supports close() and also >> iteration. ?The attached patch (against svn trunk) fixes this. > > Please post the patch to bugs.python.org Done: http://bugs.python.org/issue5821 Thanks, Ben. 
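[Editor's note: the iteration-plus-closing() pattern Ben describes does work with the file-like object returned by extractfile(); here is a self-contained sketch that builds a throwaway archive in memory rather than touching the filesystem.]

```python
import io
import tarfile
from contextlib import closing

# Build a small tar archive in memory so the example is self-contained.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    data = b"line one\nline two\n"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    member = tf.extractfile("hello.txt")
    # The returned file-like object supports close() and iteration,
    # so contextlib.closing() works on it just as on a true file.
    with closing(member) as f:
        lines = list(f)

assert lines == [b"line one\n", b"line two\n"]
```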
From cy6ergn0m at gmail.com Thu Apr 23 20:21:37 2009 From: cy6ergn0m at gmail.com (cyberGn0m) Date: Thu, 23 Apr 2009 22:21:37 +0400 Subject: [Python-Dev] Python3 and arm-linux Message-ID: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com> Does somebody know whether python3 works on arm-linux? Is it possible to build it? Where can I find related discussions? Maybe some special patches are already available? Should I try to get the sources from svn, or get a known version snapshot? ----------------------------------------------------------------- From aleaxit at gmail.com Thu Apr 23 20:55:48 2009 From: aleaxit at gmail.com (Alex Martelli) Date: Thu, 23 Apr 2009 11:55:48 -0700 Subject: [Python-Dev] Python3 and arm-linux In-Reply-To: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com> References: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com> Message-ID: On Thu, Apr 23, 2009 at 11:21 AM, cyberGn0m wrote: > Does somebody know whether python3 works on arm-linux? Is it possible to build it? > Where can I find related discussions? Maybe some special patches are already > available? Should I try to get the sources from svn, or get a known version > snapshot? > I haven't tried, but there's an interesting distro at http://www.vanille-media.de/site/index.php/projects/python-for-arm-linux/ -- I don't know if other such distros have better-updated Python versions (eg. current 2.6.* vs that one's 2.4.*) but that one includes a lot of very useful add-ons. Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From cy6ergn0m at gmail.com Thu Apr 23 20:59:01 2009 From: cy6ergn0m at gmail.com (cyberGn0m) Date: Thu, 23 Apr 2009 22:59:01 +0400 Subject: [Python-Dev] Python3 and arm-linux In-Reply-To: References: <645d12c20904231121y342247b8ib75f9eea00fea8ee@mail.gmail.com> Message-ID: <645d12c20904231159s3eb1df23oc16b9ae0ff74dff4@mail.gmail.com> Yes, I visited it... but 
it looks like it has not been updated in a few years; maybe it was integrated, and python-for-arm-linux is now part of the openembedded project? 2009/4/23 Alex Martelli > On Thu, Apr 23, 2009 at 11:21 AM, cyberGn0m wrote: > >> Somebody knowns, is python3 works on arm-linux. Is it possible to build >> it? Where to find related discussions? Maybe some special patches already >> available? Should i try to get sources from svn or get known version >> snapshot? >> > > I haven't tried, but there's an interesting distro at > http://www.vanille-media.de/site/index.php/projects/python-for-arm-linux/-- I don't know if other such distros have better-updated Python versions > (eg. current 2.6.* vs that one's 2.4.*) but that one includes a lot of very > useful add-ons. > > > Alex > > -- ----------------------------------------------------------------- From cs at zip.com.au Fri Apr 24 01:27:12 2009 From: cs at zip.com.au (Cameron Simpson) Date: Fri, 24 Apr 2009 09:27:12 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> Message-ID: <20090423232712.GA31693@cskk.homeip.net> On 22Apr2009 08:50, Martin v. Löwis wrote: | File names, environment variables, and command line arguments are | defined as being character data in POSIX; Specific citation please? I'd like to check the specifics of this. | the C APIs however allow | passing arbitrary bytes - whether these conform to a certain encoding | or not. Indeed. | This PEP proposes a means of dealing with such irregularities | by embedding the bytes in character strings in such a way that allows | recreation of the original byte string. [...] So you're proposing that all POSIX OS interfaces (which use byte strings) interpret those byte strings into Python3 str objects, with a codec that will accept arbitrary byte sequences losslessly and is totally reversible, yes? And, I hope, that the os.* interfaces silently use it by default. 
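[Editor's note: the answer to Cameron's question turned out to be yes; a lossless, reversible codec error handler is exactly what the PEP provides. In the form that eventually shipped with Python 3.1 the handler is spelled "surrogateescape" rather than the draft's "python-escape". A round-trip sketch using the released spelling:]

```python
raw = b"caf\xe9"  # a latin-1 byte string; the lone 0xE9 is not valid UTF-8
s = raw.decode("utf-8", "surrogateescape")
# The undecodable byte is smuggled through as lone low surrogate U+DCE9.
assert s == "caf\udce9"
# Encoding with the same handler restores the original bytes exactly.
assert s.encode("utf-8", "surrogateescape") == raw
```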
| For most applications, we assume that they eventually pass data | received from a system interface back into the same system | interfaces. For example, an application invoking os.listdir() will | likely pass the result strings back into APIs like os.stat() or | open(), which then encodes them back into their original byte | representation. Applications that need to process the original byte | strings can obtain them by encoding the character strings with the | file system encoding, passing "python-escape" as the error handler | name. -1 This last sentence kills the idea for me, unless I'm missing something. Which I may be, of course. POSIX filesystems _do_not_ have a file system encoding. The user's environment suggests a preferred encoding via the locale stuff, and apps honouring that will make nice looking byte strings as filenames for that user. (Some platforms, like MacOSX' HFS filesystems, _do_ enforce an encoding, and a quite specific variety of UTF-8 it is; I would say they're not a full UNIX filesystem _precisely_ because they reject certain byte strings that are valid on other UNIX filesystems. What will your proposal do here? I can imagine it might cope with existing names, but what happens when the user creates a new name?) Further, different users can use different locales and encodings. If they do it in different work areas they'll be perfectly happy; if they do it in a shared area doubtless confusion will reign, but only in the users' minds, not in the filesystem. If I'm writing a general purpose UNIX tool like chmod or find, I expect it to work reliably on _any_ UNIX pathname. It must be totally encoding blind. If I speak to the os.* interface to open a file, I expect to hand it bytes and have it behave. 
As an explicit example, I would be just fine with python's open(filename, "w") to take a string and encode it for use, but _not_ ok for os.open() to require me to supply a string and cross my fingers and hope something sane happens when it is turned into bytes for the UNIX system call. I'm very much in favour of being able to work in strings for most purposes, but if I use the os.* interfaces on a UNIX system it is necessary to be _able_ to work in bytes, because UNIX file pathnames are bytes. If there isn't a byte-safe os.* facility in Python3, it will simply be unsuitable for writing low level UNIX tools. And I very much like using Python2 for that. Finally, I have a small python program whose whole purpose in life is to transcode UNIX filenames before transfer to a MacOSX HFS directory, because of HFS's enforced particular encoding. What approach should a Python app take to transcode UNIX pathnames under your scheme? Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ The nice thing about standards is that you have so many to choose from; furthermore, if you do not like any of them, you can just wait for next year's model. - Andrew S. Tanenbaum From cs at zip.com.au Fri Apr 24 01:32:45 2009 From: cs at zip.com.au (Cameron Simpson) Date: Fri, 24 Apr 2009 09:32:45 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090423232712.GA31693@cskk.homeip.net> Message-ID: <20090423233245.GA6401@cskk.homeip.net> On 24Apr2009 09:27, I wrote: | If I'm writing a general purpose UNIX tool like chmod or find, I expect | it to work reliably on _any_ UNIX pathname. It must be totally encoding | blind. If I speak to the os.* interface to open a file, I expect to hand | it bytes and have it behave. 
As an explicit example, I would be just fine | with python's open(filename, "w") to take a string and encode it for use, | but _not_ ok for os.open() to require me to supply a string and cross | my fingers and hope something sane happens when it is turned into bytes | for the UNIX system call. | | I'm very much in favour of being able to work in strings for most | purposes, but if I use the os.* interfaces on a UNIX system it is | necessary to be _able_ to work in bytes, because UNIX file pathnames | are bytes. Just to follow up to my own words here, I would be ok for all the pure-byte stuff to be off in the "posix" module if os.* goes pure character instead of bytes or bytes+strings. -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ ... that, in a few years, all great physical constants will have been approximately estimated, and that the only occupation which will be left to men of science will be to carry these measurements to another place of decimals. - James Clerk Maxwell (1813-1879) Scientific Papers 2, 244, October 1871 From cs at zip.com.au Fri Apr 24 01:47:24 2009 From: cs at zip.com.au (Cameron Simpson) Date: Fri, 24 Apr 2009 09:47:24 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF6D64.4030302@v.loewis.de> Message-ID: <20090423234724.GA8077@cskk.homeip.net> On 22Apr2009 21:17, Martin v. L?wis wrote: | > -1. On UNIX, character data is not sufficient to represent paths. We | > must, must, must continue to have a simple bytes interface to these | > APIs. | | I'd like to respond to this concern in three ways: | | 1. The PEP doesn't remove any of the existing interfaces. So if the | interfaces for byte-oriented file names in 3.0 work fine for you, | feel free to continue to use them. Ok. I think I had read things as supplanting byte-oriented interfaces with this exciting new strings-can-do-it-all approach. | 2. 
Even if they were taken away (which the PEP does not propose to do), | it would be easy to emulate them for applications that want them. | For example, listdir could be wrapped as | | def listdir_b(bytestring): | fse = sys.getfilesystemencoding() Alas, no, because there is no sys.getfilesystemencoding() at the POSIX level. It's only the user's current locale stuff on a UNIX system, and has _nothing_ to do with the filesystem because UNIX filesystems don't have encodings. In particular, because the "best" (or to my mind "misleading") you can do for this is report what the current user thinks: http://docs.python.org/library/sys.html#sys.getfilesystemencoding then there's no guarantee that what is chosen has any relationship to what was in use when the files being consulted were made. Now, if I were writing listdir_b() I'd want to be able to do something along these lines: - set LC_ALL=C (or some equivalent mechanism) - have os.listdir() read bytes as numeric values and transcode their values _directly_ into the corresponding Unicode code points. - yield bytes( ord(c) for c in os_listdir_string ) - have os.open() et al transcode unicode code points back into bytes. i.e. a straight one-to-one mapping, using only codepoints in the range 1..255. Then I'd have some confidence that I had got hold of the bytes as they had come from the underlying UNIX system call, and a way to get those bytes _back_ to a UNIX system call intact. | string = bytestring.decode(fse, "python-escape") | for fn in os.listdir(string): | yield fn.encode(fse, "python-escape") | | 3. I still disagree that we must, must, must continue to provide these | interfaces. I don't understand from the rest of your message what | would *actually* break if people would use the proposed interfaces. My other longer message describes what would break, if I understand your proposal. 
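[Editor's note: the straight one-to-one mapping Cameron sketches above, where byte value N becomes code point N and back, is exactly what the standard "latin-1" codec does, so the round trip he wants can be demonstrated without any helper functions:]

```python
name = b"foo\xff"  # the trailing 0xFF makes this invalid as UTF-8
s = name.decode("latin-1")  # each byte maps directly to the same code point
assert [ord(c) for c in s] == [0x66, 0x6F, 0x6F, 0xFF]
assert s.encode("latin-1") == name  # the exact bytes come back intact
```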
-- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ From foom at fuhm.net Fri Apr 24 02:52:03 2009 From: foom at fuhm.net (James Y Knight) Date: Thu, 23 Apr 2009 20:52:03 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: On Apr 22, 2009, at 2:50 AM, Martin v. L?wis wrote: > I'm proposing the following PEP for inclusion into Python 3.1. > Please comment. +1. Even if some people still want a low-level bytes API, it's important that the easy case be easy. That is: the majority of Python applications should *just work, damnit* even with not-properly-encoded- in-current-LC_CTYPE filenames. It looks like this proposal accomplishes that, and does so in a relatively nice fashion. James From google at mrabarnett.plus.com Fri Apr 24 03:38:49 2009 From: google at mrabarnett.plus.com (MRAB) Date: Fri, 24 Apr 2009 02:38:49 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EF6F06.9060008@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49EF0ADB.2090107@mrabarnett.plus.com> <49EF6F06.9060008@v.loewis.de> Message-ID: <49F11829.9070504@mrabarnett.plus.com> Martin v. L?wis wrote: > MRAB wrote: >> Martin v. L?wis wrote: >> [snip] >>> To convert non-decodable bytes, a new error handler "python-escape" is >>> introduced, which decodes non-decodable bytes using into a private-use >>> character U+F01xx, which is believed to not conflict with private-use >>> characters that currently exist in Python codecs. >>> >>> The error handler interface is extended to allow the encode error >>> handler to return byte strings immediately, in addition to returning >>> Unicode strings which then get encoded again. >>> >>> If the locale's encoding is UTF-8, the file system encoding is set to >>> a new encoding "utf-8b". 
The UTF-8b codec decodes non-decodable bytes >>> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. >>> >> If the byte stream happens to include a sequence which decodes to >> U+F01xx, shouldn't that raise an exception? > > I apparently have not expressed it clearly, so please help me improve > the text. What I mean is this: > > - if the environment encoding (for lack of better name) is UTF-8, > Python stops using the utf-8 codec under this PEP, and switches > to the utf-8b codec. > - otherwise (env encoding is not utf-8), undecodable bytes get decoded > with the error handler. In this case, U+F01xx will not occur > in the byte stream, since no other codec ever produces this PUA > character (this is not fully true - UTF-16 may also produce PUA > characters, but they can't appear as env encodings). > So the case you are referring to should not happen. > I think what's confusing me is that you talk about mapping non-decodable bytes to U+F01xx, but you also talk about decoding to half surrogate codes U+DC80..U+DCFF. If the bytes are mapped to single half surrogate codes instead of the normal pairs (low+high), then I can see that decoding could never be ambiguous and encoding could produce the original bytes. From larry.bugbee at boeing.com Fri Apr 24 05:55:04 2009 From: larry.bugbee at boeing.com (Bugbee, Larry) Date: Thu, 23 Apr 2009 20:55:04 -0700 Subject: [Python-Dev] Python3 and arm-linux In-Reply-To: References: Message-ID: <9418DB6C0B9D434190E54A78E931C3D109465F84@XCH-NW-7V1.nw.nos.boeing.com> > > Somebody knowns, is python3 works on arm-linux. Is it possible to build it? > > Where to find related discussions? Maybe some special patches already > > available? Should i try to get sources from svn or get known version > > snapshot? > > > > I haven't tried, but there's an interesting distro at http://www.vanille- > media.de/site/index.php/projects/python-for-arm-linux/ -- I don't know if > other such distros have better-updated Python versions (eg. 
current 2.6.* > vs that one's 2.4.*) but that one includes a lot of very > useful add-ons. You may want to look at the Ångström distro. http://www.angstrom-distribution.org/ http://www.angstrom-distribution.org/repo/?pkgname=libpython2.6-1.0 That's where I'll be heading in a couple of weeks. (I have a new BeagleBoard with an ARM Cortex-A8.) Larry From hodgestar+pythondev at gmail.com Fri Apr 24 09:59:03 2009 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Fri, 24 Apr 2009 09:59:03 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. Löwis" wrote: > For Python 3, one proposed solution is to provide two sets of APIs: a > byte-oriented one, and a character-oriented one, where the > character-oriented one would be limited to not being able to represent > all data accurately. Unfortunately, for Windows, the situation would > be exactly the opposite: the byte-oriented interface cannot represent > all data; only the character-oriented API can. As a consequence, > libraries and applications that want to support all user data in a > cross-platform manner have to accept mish-mash of bytes and characters > exactly in the way that caused endless troubles for Python 2.x. 
My feeling is that the correct solution is to either standardise on the bytes interface as the lowest common denominator, or to add a Path type (and I guess an EnvironmentalData type) and use the new type to attempt to hide the differences. Schiavo Simon From v+python at g.nevcal.com Fri Apr 24 11:22:14 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 24 Apr 2009 02:22:14 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <49F184C6.8000905@g.nevcal.com> On approximately 4/24/2009 12:59 AM, came the following characters from the keyboard of Simon Cross: > On Wed, Apr 22, 2009 at 8:50 AM, "Martin v. L?wis" wrote: >> For Python 3, one proposed solution is to provide two sets of APIs: a >> byte-oriented one, and a character-oriented one, where the >> character-oriented one would be limited to not being able to represent >> all data accurately. Unfortunately, for Windows, the situation would >> be exactly the opposite: the byte-oriented interface cannot represent >> all data; only the character-oriented API can. As a consequence, >> libraries and applications that want to support all user data in a >> cross-platform manner have to accept mish-mash of bytes and characters >> exactly in the way that caused endless troubles for Python 2.x. > > Is the second part of this actually true? My understanding may be > flawed, but surely all Unicode data can be converted to and from bytes > using UTF-8? Obviously not all byte sequences are valid UTF-8, but > this doesn't prevent one from creating an arbitrary Unicode string > using "utf-8 bytes".decode("utf-8"). Given this, can't people who > must have access to all files / environment data just use the bytes > interface? > > Disclosure: My gut reaction is that the solution described in the PEP > is a hack, but I'm hardly a character encoding expert. 
My feeling is > that the correct solution is to either standardise on the bytes > interface as the lowest common denominator, or to add a Path type (and > I guess an EnvironmentalData type) and use the new type to attempt to > hide the differences. Oh clearly it is a hack. The right solution of a Path type (and friends) was discarded in earlier discussion, because it would impact too much existing code. The use of bytes would be annoying in the context of py3, where things that you want to display are in str (Unicode). So there is no solution that allows the use of str, and the robustness of bytes, and is 100% compatible with existing practice. Hence the desire is to find a hack that is "good enough". At least, that is my understanding and synopsis. I never saw MvL's original message with the PEP delivered to my mailbox, but some of the replies came there, so I found and extensively replied to it using the Google group / usenet. My reply never showed up here and no one has commented on it either... Should I repost via the mailing list? I think so... I'll just paste it in here, with one tweak I noticed after I sent it fixed... (Sorry Simon, but it is still the same thread, anyway.) (Sorry to others, if my original reply was seen, and just wasn't worth replying to.) On Apr 21, 11:50 pm, "Martin v. Löwis" wrote: > I'm proposing the following PEP for inclusion into Python 3.1. > Please comment. Basically the scheme doesn't work. Aside from that, it is very close. There are tons of encoding schemes that could work... they don't have to include half-surrogates or bytes. What they have to do, is make sure that they are uniformly applied to all appropriate strings. The problem with this, and other preceding schemes that have been discussed here, is that there is no means of ascertaining whether a particular file name str was obtained from a str API, or was funny-decoded from a bytes API... 
and thus, there is no means of reliably ascertaining whether a particular filename str should be passed to a str API, or funny-encoded back to bytes. The assumption in the 2nd Discussion paragraph may hold for a large percentage of cases, maybe even including some number of 9s, but it is not guaranteed, and cannot be enforced, therefore there are cases that could fail. Whether those failure cases are a concern or not is an open question. Picking a character (I don't find U+F01xx in the Unicode standard, so I don't know what it is) that is obscure, and unlikely to be used in "real" file names, might help the heuristic nature of the encoding and decoding avoid most conflicts, but provides no guarantee that data puns will not occur in practice. Today's obscure character is tomorrows commonly used character, perhaps. Someone not on this list may be happily using that character for their own nefarious, incompatible purpose. As I realized in the email-sig, in talking about decoding corrupted headers, there is only one way to guarantee this... to encode _all_ character sequences, from _all_ interfaces. Basically it requires reserving an escape character (I'll use ? in these examples -- yes, an ASCII question mark -- happens to be illegal in Windows filenames so all the better on that platform, but the specific character doesn't matter... avoiding / \ and . is probably good, though). So the rules would be, when obtaining a file name from the bytes OS interface, that doesn't properly decode according to UTF-8, decode it by placing a ? at the beginning, then for each decodable UTF-8 sequence, add a Unicode character -- unless the character is ?, in which case you add two ??, and for each non-decodable byte sequence, place a ? and two hex digits, or a ? and a half surrogate code, or a ? and whatever gibberish you like. Two hex digits are fine by me, and will serve for this discussion. ALSO, when obtaining a file name from the str OS interfaces, encode it too... 
if it contains any ?, then place a ? at the front, and then any other ? in the name must be doubled. Then you have a string that can/must be encoded to be used on either str or bytes OS interfaces... or any other interfaces that want str or bytes... but whichever they want, you can do a decode, or determine that you can't, into that form. The encode and decode functions should be available for coders to use, that code to external interfaces, either OS or 3rd party packages, that do not use this encoding scheme. This encoding scheme would be used throughout all Python APIs (most of which would need very little change to accommodate it). However, programs would have to keep track of whether they were dealing with encoded or unencoded strings, if they use both types in their program (an example, is hard-coded file names or file name parts). The initial ? is not strictly necessary for this scheme to work, but I think it would be a good flag to the user that this name has been altered. This scheme does not depend on assumptions about the use of file names. This scheme would be enhanced if the file name APIs returned a subtype of str for the encoded names, but that should be considered only a hint, not a requirement. When encoding file name strings to pass to bytes APIs, the ? followed by two hex digits would be converted to a byte. Leading ? would be dropped, and ?? would convert to ?. I don't believe failures are possible when encoding to bytes. When encoding file name strings to pass to str APIs, the discovery of ? followed by two hex digits would raise an exception, the file name is not acceptable to a str API. However, leading ? would be dropped, and ?? would convert to ?, and if no ? followed by two hex digits were found, the file name would be successfully converted for use on the str API. Note that not even on Unix/Posix is it particularly easy nor useful to place a ? into file names from command lines due to shell escapes, etc. The use of ? 
in file names also interferes with easy ability to specifically match them in globs, etc. Anything short of such an encoding of both types of interfaces, such that it is known that all python-manipulated filenames will be encoded, will have data puns that provide a potential for failure in edge cases. Note that in this scheme, no file names that are fully Unicode and do not contain ? characters are altered by the decoding or the encoding process. That will probably reach quite a few 9s of likelihood that the scheme will go unnoticed by most people and programs and filenames. But the scheme will work reliably if implemented correctly and completely, and will have no edge cases of failure due to not having data puns. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From hodgestar+pythondev at gmail.com Fri Apr 24 11:37:02 2009 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Fri, 24 Apr 2009 11:37:02 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F184C6.8000905@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> Message-ID: On Fri, Apr 24, 2009 at 11:22 AM, Glenn Linderman wrote: > Oh clearly it is a hack. The right solution of a Path type (and friends) > was discarded in earlier discussion, because it would impact too much > existing code. The use of bytes would be annoying in the context of py3, > where things that you want to display are in str (Unicode). So there is no > solution that allows the use of str, and the robustness of bytes, and is > 100% compatible with existing practice. Hence the desire is to find a hack > that is "good enough". At least, that is my understanding and synopsis.
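As a concrete illustration, the '?'-escaping scheme Glenn describes earlier in this thread could be sketched roughly as below. This is a hypothetical helper, not part of any proposal; the function name and the simplification of always prefixing altered names with a single '?' are mine:

```python
def escape_os_bytes(raw: bytes) -> str:
    """Decode OS bytes to str; flag altered names with a leading '?',
    double any literal '?', and escape each undecodable byte as '?XX'."""
    try:
        text = raw.decode("utf-8")
        if "?" not in text:
            return text                       # clean UTF-8 with no '?': unchanged
        return "?" + text.replace("?", "??")  # literal '?' is doubled
    except UnicodeDecodeError:
        pass
    out = []
    i = 0
    while i < len(raw):
        for j in range(len(raw), i, -1):      # longest decodable UTF-8 run
            try:
                out.append(raw[i:j].decode("utf-8").replace("?", "??"))
                i = j
                break
            except UnicodeDecodeError:
                continue
        else:
            out.append("?%02x" % raw[i])      # undecodable byte -> ? + two hex digits
            i += 1
    return "?" + "".join(out)
```

Decoding back is the mirror image: drop the leading '?', collapse '??' to '?', and turn each remaining '?' plus two hex digits into a byte.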
What about keeping the bytes interface (utf-8 encoded Unicode on Windows) and adding a Path type (and friends) interface that mirrors it? > (Sorry Simon, but it is still the same thread, anyway.) Python discussions do seem to womble through a rather large set of mailing lists and news groups. :) Schiavo Simon From hodgestar+pythondev at gmail.com Fri Apr 24 12:39:15 2009 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Fri, 24 Apr 2009 12:39:15 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> Message-ID: On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman wrote: > The goal of Unicode users everywhere is to use Unicode for everything, no? > After all, all "real" files should have Unicode based names, and the only > proper byte sequences that should exist are UTF-8 encoded Unicode bytes. > (Cheek to tongue: Get out of here!) Humour aside :), the expectation that filenames are Unicode data simply doesn't agree with the reality of POSIX file systems. I think an approach similar to that adopted by glib [1] could work -- i.e. use the bytes API and provide some tools to assist application developers in converting them to and from Unicode strings (these tools are then where all the guess work about what encoding to use can live).
[1] http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html Schiavo Simon From p.f.moore at gmail.com Fri Apr 24 14:00:40 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 24 Apr 2009 13:00:40 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> Message-ID: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> 2009/4/24 Simon Cross : > On Fri, Apr 24, 2009 at 12:04 PM, Glenn Linderman wrote: >> The goal of Unicode users everywhere is to use Unicode for everything, no? >> After all, all "real" files should have Unicode based names, and the only >> proper byte sequences that should exist are UTF-8 encoded Unicode bytes. >> (Cheek to tongue: Get out of here!) > > Humour aside :), the expectation that filenames are Unicode data > simply doesn't agree with the reality of POSIX file systems. However, it *does* agree with the reality of Windows file systems. The fundamental problem here is that there is a strong OS disparity - for Windows, the OS uses Unicode, for POSIX, the OS uses bytes. Traditionally, Python has been happy to expose OS differences, and let application code address platform portability issues. But this is such a fundamental area, that doing so is problematic - it could easily result in *more* code being OS-specific (in subtle, only-affects-non-Latin-alphabet-using-users manners) rather than less. That is why it makes sense to have *some* means of normalising things in a way that does the best it can. The raw bytes interfaces should be available for POSIX users writing low-level code that *must* handle all possible nightmare scenarios[1], but Martin's proposal is designed to handle "the majority of cases" in a platform-independent way. To that end, a string-based interface makes sense, as frankly that's how "normal" users think of filenames.
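For concreteness, the kind of round-tripping the PEP aims at can be sketched with the error handler it proposes (available in Python 3.1+; the sample byte string below is made up):

```python
# A POSIX file name that is not valid UTF-8 (here, Latin-1 'café').
raw = b"caf\xe9"

# Decode for display/manipulation; the bad byte becomes a lone surrogate.
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9"

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```

The point of the example is that a string-based API can still hand the OS back exactly the bytes it received, even when those bytes were never valid in the nominal encoding.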
The rest of Martin's proposal seems to follow the same sort of practical approach. Paul. [1] Maybe there's a need for a Unicode interface on Windows that doesn't do *any* encoding, even in the face of garbled Unicode - I don't know low-level details well enough to be sure here. But the same principle applies, that "get the raw data, regardless" is a low-level OS-specific operation, and should not be the one used in day-to-day programming. From yesim4 at yahoo.com Fri Apr 24 15:34:29 2009 From: yesim4 at yahoo.com (Yuma Scott) Date: Fri, 24 Apr 2009 06:34:29 -0700 (PDT) Subject: [Python-Dev] version for blender Vista Message-ID: <140172.96060.qm@web30808.mail.mud.yahoo.com> Can you tell me which installer of Python I need to work with Blender and Windows Vista Home Premium? Thanks! Yuma Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: From orsenthil at gmail.com Fri Apr 24 15:59:08 2009 From: orsenthil at gmail.com (Senthil Kumaran) Date: Fri, 24 Apr 2009 19:29:08 +0530 Subject: [Python-Dev] version for blender Vista In-Reply-To: <140172.96060.qm@web30808.mail.mud.yahoo.com> References: <140172.96060.qm@web30808.mail.mud.yahoo.com> Message-ID: <7c42eba10904240659o515ed8fcqa685e068af6fe3f4@mail.gmail.com> From: http://mail.python.org/mailman/listinfo/python-dev About Python-Dev ***Do not post general Python questions to this list. For help with Python please see the Python help page.*** On this list the key Python developers discuss the future of the language and its implementation. Topics include Python design issues, release mechanics, and maintenance of existing releases. On Fri, Apr 24, 2009 at 7:04 PM, Yuma Scott wrote: > > Can you tell me which installer of Python I need to work with > Blender and Windows Vista Home Premium? > Thanks! 
> Yuma Scott > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/orsenthil%40gmail.com > > -- -- Senthil From foom at fuhm.net Fri Apr 24 16:54:07 2009 From: foom at fuhm.net (James Y Knight) Date: Fri, 24 Apr 2009 10:54:07 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> Message-ID: <0313C0B7-27A3-455E-962E-A178B41CE049@fuhm.net> On Apr 24, 2009, at 8:00 AM, Paul Moore wrote: > However, it *does* agree with the reality of Windows file systems. The > fundamental problem here is that there is a strong OS disparity - for > Windows, the OS uses Unicode, for POSIX, the OS uses bytes. It's unfortunately the case that this isn't *precisely* true. Windows uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit sequences. Neither one is required by the operating system to be a proper unicode encoding. The main difference is that there is already a widely accepted way to decode an improperly-encoded 16-bit sequence with the utf-16 codec: simply leave the lone surrogates in place.
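That decoding convention can be illustrated in modern Python 3, where the 'surrogatepass' error handler lets a lone surrogate round-trip through the UTF-16 codec (a small made-up example):

```python
# UTF-16-LE bytes for the lone surrogate U+D800: legal in a Windows
# file name, but not valid Unicode on its own.
raw = b"\x00\xd8"

# The strict codec refuses it...
raised = False
try:
    raw.decode("utf-16-le")
except UnicodeDecodeError:
    raised = True
assert raised

# ...but 'surrogatepass' leaves the lone surrogate in place, both ways.
name = raw.decode("utf-16-le", "surrogatepass")
assert name == "\ud800"
assert name.encode("utf-16-le", "surrogatepass") == raw
```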
James From aahz at pythoncraft.com Fri Apr 24 17:27:46 2009 From: aahz at pythoncraft.com (Aahz) Date: Fri, 24 Apr 2009 08:27:46 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> Message-ID: <20090424152746.GA9543@panix.com> On Fri, Apr 24, 2009, Paul Moore wrote: > 2009/4/24 Simon Cross : >> >> Humour aside :), the expectation that filenames are Unicode data >> simply doesn't agree with the reality of POSIX file systems. > > However, it *does* agree with the reality of Windows file systems. The > fundamental problem here is that there is a strong OS disparity - for > Windows, the OS uses Unicode, for POSIX, the OS uses bytes. > Traditionally, Python has been happy to expose OS differences, and let > application code address platform portability issues. But this is such > a fundamental area, that doing so is problematic - it could easily > result in *more* code being OS-specific (in subtle, > only-affects-non-Latin-alphabet-using-users manners) rather than less. The part that I haven't seen clearly addressed so far is what happens when disks get mounted across OSes (e.g. NFS). While I agree that there should be a layer on top that can handle "most" situations, it also seems clear that the raw layer needs to be readily accessible. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." 
--Red Adair From solipsis at pitrou.net Fri Apr 24 17:33:05 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 24 Apr 2009 15:33:05 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System?= =?utf-8?q?=09Character=09Interfaces?= References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> Message-ID: Aahz pythoncraft.com> writes: > > The part that I haven't seen clearly addressed so far is what happens > when disks get mounted across OSes (e.g. NFS). Unless there's some kind of native NFS API for file access, it is hopelessly out of scope for Python. We use whatever the C library exports to us, and don't have any control over filesystem details. From stephen at xemacs.org Fri Apr 24 17:53:53 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 25 Apr 2009 00:53:53 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <0313C0B7-27A3-455E-962E-A178B41CE049@fuhm.net> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <0313C0B7-27A3-455E-962E-A178B41CE049@fuhm.net> Message-ID: <87y6tqkpym.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > It's unfortunately the case that this isn't *precisely* true. Windows > uses arbitrary 16-bit sequences, just as unix uses arbitrary 8-bit > sequences. Including U+FFFE and U+FFFF "not a character nowhere nohow"? Just when I was thinking Microsoft would actually nail one.... 
From p.f.moore at gmail.com Fri Apr 24 17:59:59 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 24 Apr 2009 16:59:59 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> Message-ID: <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> 2009/4/24 Antoine Pitrou : > Aahz pythoncraft.com> writes: >> >> The part that I haven't seen clearly addressed so far is what happens >> when disks get mounted across OSes (e.g. NFS). > > Unless there's some kind of native NFS API for file access, it is hopelessly out > of scope for Python. We use whatever the C library exports to us, and don't have > any control over filesystem details. For "raw" level stuff (bytes on Unix, Unicode-nearly (:-)) on Windows) that's right. Resist the temptation to guess and all that. For the level Martin is (as far as I can tell) aiming at [1], we need some defined rules on how to behave (relatively) sanely. Windows is fairly easy - "nearly-Unicode" to Unicode isn't too bad. But on Unix, you're dealing with bytes-to-Unicode in the absence of a clearly stated encoding - which is a known can of worms... In my view: The pros for Martin's proposal are a uniform cross-platform interface, and a user-friendly API for the common case. The cons are subtle and complex corner cases, and lack of agreement on the validity of the proposed encoding in those cases. The fact that the bytes APIs won't go away probably mitigates the cons to a large extent (again, in my view...) Paul. [1] Actually, all the PEP says is "With this PEP, a uniform treatment of these data as characters becomes possible." An argument as to why this is a good thing would be a useful addition to the PEP. 
At the moment it's more or less treated as self-evident - which I agree with, but which clearly the Unix people here are not as certain of. From status at bugs.python.org Fri Apr 24 18:07:29 2009 From: status at bugs.python.org (Python tracker) Date: Fri, 24 Apr 2009 18:07:29 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20090424160729.B33E5780C9@psf.upfronthosting.co.za> ACTIVITY SUMMARY (04/17/09 - 04/24/09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2227 open (+32) / 15427 closed (+17) / 17654 total (+49) Open issues with patches: 865 Average duration of open issues: 641 days. Median duration of open issues: 395 days. Open Issues Breakdown open 2175 (+31) pending 52 ( +1) Issues Created Or Reopened (51) _______________________________ Builtin round function is sometimes inaccurate for floats 04/18/09 CLOSED http://bugs.python.org/issue1869 reopened marketdickinson patch logging to file + encoding 04/20/09 CLOSED http://bugs.python.org/issue5170 reopened shamilbi IDLE cannot find windows chm file 04/17/09 http://bugs.python.org/issue5783 created rhettinger patch raw deflate format and zlib module 04/17/09 http://bugs.python.org/issue5784 created phr Condition.wait() does not respect its timeout 04/18/09 CLOSED http://bugs.python.org/issue5785 created Kjir len(reversed([1,2,3])) does not work anymore in 2.6.2 04/19/09 http://bugs.python.org/issue5786 reopened rhettinger object.__getattribute__(super, '__bases__') crashes the interpre 04/19/09 CLOSED http://bugs.python.org/issue5787 reopened alexer datetime.timedelta is inconvenient to use... 
04/18/09 http://bugs.python.org/issue5788 created bquinlan patch powerset recipe listed twice in itertools docs 04/19/09 CLOSED http://bugs.python.org/issue5789 created stevenjd easy itertools.izip python code has a typo 04/19/09 CLOSED http://bugs.python.org/issue5790 created stevenjd title information of unicodedata is wrong in some cases 04/19/09 CLOSED http://bugs.python.org/issue5791 created cfbolz Enable short float repr() on Solaris/x86 04/19/09 http://bugs.python.org/issue5792 created marketdickinson easy Rationalize isdigit / isalpha / tolower / ... uses throughout Py 04/19/09 http://bugs.python.org/issue5793 created marketdickinson easy pickle/cPickle of recursive tuples create pickles that cPickle c 04/19/09 http://bugs.python.org/issue5794 created cwitty test_distutils failure on the ppc Debian buildbot 04/19/09 CLOSED http://bugs.python.org/issue5795 created pitrou test_posix, test_pty crash under Windows 04/19/09 CLOSED http://bugs.python.org/issue5796 created pitrou patch there is en exception om Create User page 04/20/09 http://bugs.python.org/issue5797 created nabeel test_asynchat fails on Mac OSX 04/20/09 http://bugs.python.org/issue5798 created cartman Change ntpath functions to implicitly support UNC paths 04/20/09 http://bugs.python.org/issue5799 created larry patch make wsgiref.headers.Headers accept empty constructor 04/20/09 http://bugs.python.org/issue5800 created tarek easy spurious empty lines in wsgiref code 04/20/09 http://bugs.python.org/issue5801 created tarek The security descriptors of python binaries in Windows are not s 04/20/09 http://bugs.python.org/issue5802 created kindloaf email/quoprimime: encode and decode are very slow on large messa 04/20/09 http://bugs.python.org/issue5803 created dmbaggett Add a "tail" argument to zlib.decompress 04/21/09 http://bugs.python.org/issue5804 created krisvale patch Distutils (or py2exe) error with DistributionMetaData 04/21/09 CLOSED http://bugs.python.org/issue5805 created varash MySQL 
crash on machine startup.... 04/21/09 CLOSED http://bugs.python.org/issue5806 created plattecoducks ConfigParser.RawConfigParser it's an "old-style" class 04/21/09 CLOSED http://bugs.python.org/issue5807 created ZeD Subprocess.getstatusoutput Fails Executing 'dir' Command on Wind 04/21/09 CLOSED http://bugs.python.org/issue5808 created mrwizard82d1 "No such file or directory" with framework build under MacOS 10. 04/21/09 http://bugs.python.org/issue5809 created creachadair test_distutils fails - sysconfig._config_vars is None 04/22/09 http://bugs.python.org/issue5810 created srid io.BufferedReader.peek(): Documentation differs from Implementat 04/22/09 http://bugs.python.org/issue5811 created trott Fraction('1e6') should be valid. 04/22/09 CLOSED http://bugs.python.org/issue5812 created marketdickinson patch Pointer into language reference from __future__ module documenta 04/22/09 CLOSED http://bugs.python.org/issue5813 created ncoghlan SocketServer: TypeError: waitpid() takes no keyword arguments 04/22/09 http://bugs.python.org/issue5814 created arekm locale.getdefaultlocale() missing corner case 04/22/09 http://bugs.python.org/issue5815 created rg3 patch Simplify parsing of complex numbers and make complex('inf') vali 04/22/09 CLOSED http://bugs.python.org/issue5816 created marketdickinson patch Right-click behavior from Windows Explorer 04/23/09 http://bugs.python.org/issue5817 created Mkop Fix five small bugs in the bininstall and altbininstall pseudota 04/23/09 http://bugs.python.org/issue5818 created larry patch Add PYTHONPREFIXES environment variable 04/23/09 http://bugs.python.org/issue5819 created larry patch Very small bug in documentation of json.load() 04/23/09 CLOSED http://bugs.python.org/issue5820 created pcshyamshankar patch Documentation: mention 'close' and iteration for tarfile.TarFile 04/23/09 http://bugs.python.org/issue5821 created bennorth patch inconsistent behavior of range when used in combination with rem 04/23/09 CLOSED 
http://bugs.python.org/issue5822 created zero79 feature request: a conditional "for" statement 04/23/09 CLOSED http://bugs.python.org/issue5823 created zero79 SocketServer.DatagramRequestHandler Broken under Linux 04/23/09 http://bugs.python.org/issue5824 created jimd Patch to add "remove" method to tempfile.NamedTemporaryFile 04/24/09 http://bugs.python.org/issue5825 created tebeka patch new unittest function listed as assertIsNotNot() instead of asse 04/24/09 http://bugs.python.org/issue5826 created mrooney os.path.normpath doesn't preserve unicode 04/24/09 http://bugs.python.org/issue5827 created mgiuca patch Invalid behavior of unicode.lower 04/24/09 http://bugs.python.org/issue5828 created jarek float('1e500') -> inf, complex('1e500') -> ValueError 04/24/09 http://bugs.python.org/issue5829 created marketdickinson easy heapq item comparison problematic with sched's events 04/24/09 http://bugs.python.org/issue5830 created kfj Doc mistake : threading.Timer is *not* a class 04/24/09 http://bugs.python.org/issue5831 created maxenced easy Issues Now Closed (50) ______________________ Use shorter float repr when possible 495 days http://bugs.python.org/issue1580 marketdickinson patch Builtin round function is sometimes inaccurate for floats 0 days http://bugs.python.org/issue1869 marketdickinson patch Idle, some Linuxes, cannot position Cursor by mouseclick 329 days http://bugs.python.org/issue2995 gpolo Make conversions from long to float correctly rounded. 
303 days http://bugs.python.org/issue3166 marketdickinson patch Exception for test_urllib2_localnet 246 days http://bugs.python.org/issue3584 r.david.murray float.fromhex discrepancy under Solaris 240 days http://bugs.python.org/issue3633 marketdickinson patch, needs review patch for review: OS/2 EMX port fixes for 2.6 221 days http://bugs.python.org/issue3868 aimacintyre patch, patch logging to file + encoding 2 days http://bugs.python.org/issue5170 vsajip Allow auto-numbered replacement fields in str.format() strings 68 days http://bugs.python.org/issue5237 eric.smith patch Add test.support.import_python_only 58 days http://bugs.python.org/issue5354 ncoghlan 'n' formatting for int and float handles leading zero padding po 35 days http://bugs.python.org/issue5515 eric.smith bad repr of itertools.count object with negative value on OS X 1 18 days http://bugs.python.org/issue5657 ronaldoussoren Fix BufferedRWPair 8 days http://bugs.python.org/issue5734 pitrou patch Typo in documentation of print function parameters 7 days http://bugs.python.org/issue5751 georg.brandl Documentation error for Condition.notify() 7 days http://bugs.python.org/issue5757 georg.brandl __getitem__ error message hard to understand 3 days http://bugs.python.org/issue5760 georg.brandl SA bugs with unittest.py at r71263 2 days http://bugs.python.org/issue5771 benjamin.peterson patch For float.__format__, don't add a trailing ".0" if we're using n 6 days http://bugs.python.org/issue5772 eric.smith easy marshal.c needs to be checked for out of memory errors 5 days http://bugs.python.org/issue5775 eric.smith unable to search in python V3 documentation 1 days http://bugs.python.org/issue5777 georg.brandl _elementtree import can fail silently 1 days http://bugs.python.org/issue5779 benjamin.peterson test_float fails for 'legacy' float repr style 1 days http://bugs.python.org/issue5780 marketdickinson patch Legacy float repr is used unnecessarily on some platforms 1 days 
http://bugs.python.org/issue5781 marketdickinson easy ',' formatting with empty format type '' (PEP 378) 5 days http://bugs.python.org/issue5782 eric.smith easy Condition.wait() does not respect its timeout 1 days http://bugs.python.org/issue5785 benjamin.peterson object.__getattribute__(super, '__bases__') crashes the interpre 0 days http://bugs.python.org/issue5787 benjamin.peterson powerset recipe listed twice in itertools docs 3 days http://bugs.python.org/issue5789 rhettinger easy itertools.izip python code has a typo 2 days http://bugs.python.org/issue5790 rhettinger title information of unicodedata is wrong in some cases 0 days http://bugs.python.org/issue5791 loewis test_distutils failure on the ppc Debian buildbot 1 days http://bugs.python.org/issue5795 tarek test_posix, test_pty crash under Windows 2 days http://bugs.python.org/issue5796 r.david.murray patch Distutils (or py2exe) error with DistributionMetaData 0 days http://bugs.python.org/issue5805 loewis MySQL crash on machine startup.... 0 days http://bugs.python.org/issue5806 plattecoducks ConfigParser.RawConfigParser it's an "old-style" class 0 days http://bugs.python.org/issue5807 benjamin.peterson Subprocess.getstatusoutput Fails Executing 'dir' Command on Wind 0 days http://bugs.python.org/issue5808 benjamin.peterson Fraction('1e6') should be valid. 
2 days http://bugs.python.org/issue5812 marketdickinson patch Pointer into language reference from __future__ module documenta 1 days http://bugs.python.org/issue5813 georg.brandl Simplify parsing of complex numbers and make complex('inf') vali 2 days http://bugs.python.org/issue5816 marketdickinson patch Very small bug in documentation of json.load() 0 days http://bugs.python.org/issue5820 georg.brandl patch inconsistent behavior of range when used in combination with rem 0 days http://bugs.python.org/issue5822 zero79 feature request: a conditional "for" statement 0 days http://bugs.python.org/issue5823 zero79 No sleep or busy wait 2093 days http://bugs.python.org/issue780602 gpolo ConfigParser non-string defaults broken with .getboolean() 1771 days http://bugs.python.org/issue974019 draghuram patch, easy ctrl-left/-right works incorectly with diacritics 1707 days http://bugs.python.org/issue1012435 gpolo not enough information in SGMLParseError 1625 days http://bugs.python.org/issue1063229 ajaksu2 Frame does not receive configure event on move 1562 days http://bugs.python.org/issue1100366 gpolo Let shift operators take any integer value 1436 days http://bugs.python.org/issue1205239 marketdickinson Allow thread(ing) tests to pass without setting stack size 994 days http://bugs.python.org/issue1533520 aimacintyre patch import deadlocks when using PyObjC threads 898 days http://bugs.python.org/issue1590864 abaron object.__init__ shouldn't allow args/kwds 763 days http://bugs.python.org/issue1683368 KayEss Top Issues Most Discussed (10) ______________________________ 18 Add DTrace probes 194 days open http://bugs.python.org/issue4111 15 IDLE cannot find windows chm file 7 days pending http://bugs.python.org/issue5783 9 len(reversed([1,2,3])) does not work anymore in 2.6.2 6 days open http://bugs.python.org/issue5786 8 Use shorter float repr when possible 495 days closed http://bugs.python.org/issue1580 7 Fraction('1e6') should be valid. 
2 days closed http://bugs.python.org/issue5812 7 datetime.timedelta is inconvenient to use... 6 days open http://bugs.python.org/issue5788 7 logging to file + encoding 2 days closed http://bugs.python.org/issue5170 7 Builtin round function is sometimes inaccurate for floats 0 days closed http://bugs.python.org/issue1869 6 Simplify parsing of complex numbers and make complex('inf') val 2 days closed http://bugs.python.org/issue5816 6 locale.getdefaultlocale() missing corner case 2 days open http://bugs.python.org/issue5815 From google at mrabarnett.plus.com Fri Apr 24 18:29:29 2009 From: google at mrabarnett.plus.com (MRAB) Date: Fri, 24 Apr 2009 17:29:29 +0100 Subject: [Python-Dev] Dates in python-dev Message-ID: <49F1E8E9.60903@mrabarnett.plus.com> Hi, I've recently subscribed to this list and received my first "Summary of Python tracker Issues". What I find annoying are the dates, for example: ACTIVITY SUMMARY (04/17/09 - 04/24/09) 3 x double-digits (have we learned nothing from Y2K? :-)) with the _middle_ ones changing fastest! I know it's the US standard, but Python is global. Could we have an 'international' style instead, say, year-month-day: ACTIVITY SUMMARY (2009-04-17 - 2009-04-24) Thank you for your attention, etc. From arfrever.fta at gmail.com Fri Apr 24 18:37:15 2009 From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis) Date: Fri, 24 Apr 2009 18:37:15 +0200 Subject: [Python-Dev] Dates in python-dev In-Reply-To: <49F1E8E9.60903@mrabarnett.plus.com> References: <49F1E8E9.60903@mrabarnett.plus.com> Message-ID: <200904241837.21090.Arfrever.FTA@gmail.com> 2009-04-24 18:29:29 MRAB napisał(a): > Hi, > > I've recently subscribed to this list and received my first "Summary of > Python tracker Issues". What I find annoying are the dates, for example: > > ACTIVITY SUMMARY (04/17/09 - 04/24/09) > > 3 x double-digits (have we learned nothing from Y2K? :-)) with the > _middle_ ones changing fastest!
> > I know it's the US standard, but Python is global. Could we have an > 'international' style instead, say, year-month-day: > > ACTIVITY SUMMARY (2009-04-17 - 2009-04-24) +1. ISO 8601 should be mandatory. -- Arfrever Frehtes Taifersar Arahesis -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From phd at phd.pp.ru Fri Apr 24 19:06:57 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 24 Apr 2009 21:06:57 +0400 Subject: [Python-Dev] Dates in python-dev In-Reply-To: <49F1E8E9.60903@mrabarnett.plus.com> References: <49F1E8E9.60903@mrabarnett.plus.com> Message-ID: <20090424170657.GA13056@phd.pp.ru> On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote: > I've recently subscribed to this list and received my first "Summary of > Python tracker Issues". What I find annoying are the dates, for example: > > ACTIVITY SUMMARY (04/17/09 - 04/24/09) > > 3 x double-digits (have we learned nothing from Y2K? :-)) with the > _middle_ ones changing fastest! > > I know it's the US standard, but Python is global. Could we have an > 'international' style instead, say, year-month-day: > > ACTIVITY SUMMARY (2009-04-17 - 2009-04-24) +1000 from me! Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From stephen at xemacs.org Fri Apr 24 19:25:03 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 25 Apr 2009 02:25:03 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> Message-ID: <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > The pros for Martin's proposal are a uniform cross-platform interface, > and a user-friendly API for the common case. A more accurate phrasing would be "... a user-friendly API for those who feel very lucky today." Which is the common case, of course, but spins a little differently. > [1] Actually, all the PEP says is "With this PEP, a uniform > treatment of these data as characters becomes possible." An > argument as to why this is a good thing would be a useful addition > to the PEP. At the moment it's more or less treated as self-evident > - which I agree with, but which clearly the Unix people here are > not as certain of. Well, the problem is that both parts are false. If you didn't start with a valid string in a known encoding, you shouldn't treat it as characters because it's not. Hand it to a careful API, and you'll get an Exception raised in your face. And that's precisely why it's not obviously a good thing. Careful clients will have to treat it as "transcoded bytes", and so the people who develop those clients get no benefit. OTOH, at least some of those who feel lucky and use it naively are going to turn out to be wrong. That said, I'm +0 on the PEP as is. 
It's a little bit better than the current situation in that developers who would otherwise just punt on dealing with the other world (ie, Windows for Unix hackers, and Unix for Windows coders) will have a unified interface so it'll maybe work automagically (when you're lucky :-) in that other world, too. And if somebody comes up with an idea of true genius for handling the underlying problem, or even just a slight practical improvement, then everybody who uses this API can benefit simply by upgrading Python. From stephen at xemacs.org Fri Apr 24 19:39:51 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 25 Apr 2009 02:39:51 +0900 Subject: [Python-Dev] Dates in python-dev In-Reply-To: <200904241837.21090.Arfrever.FTA@gmail.com> References: <49F1E8E9.60903@mrabarnett.plus.com> <200904241837.21090.Arfrever.FTA@gmail.com> Message-ID: <87skjykl20.fsf@uwakimon.sk.tsukuba.ac.jp> Followups directed to Tracker-Discuss, where the people who can do something about it are hanging out. (They're here too, but I'm pretty sure they'd rather discuss this issue on that list.) Arfrever Frehtes Taifersar Arahesis writes: > 2009-04-24 18:29:29 MRAB napisał(a): > > Hi, > > > > I've recently subscribed to this list and received my first "Summary of > > Python tracker Issues". What I find annoying are the dates, for example: > > > > ACTIVITY SUMMARY (04/17/09 - 04/24/09) > > > > 3 x double-digits (have we learned nothing from Y2K? :-)) with the > > _middle_ ones changing fastest! > > > > I know it's the US standard, but Python is global. Could we have an > > 'international' style instead, say, year-month-day: > > > > ACTIVITY SUMMARY (2009-04-17 - 2009-04-24) > > +1. > ISO 8601 should be mandatory.
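For what it's worth, the style being requested here is what the stdlib already emits by default (a quick illustration):

```python
import datetime

start = datetime.date(2009, 4, 17)
end = datetime.date(2009, 4, 24)

# date.isoformat() / str() already produce ISO 8601 (YYYY-MM-DD).
summary = "ACTIVITY SUMMARY (%s - %s)" % (start.isoformat(), end)
assert summary == "ACTIVITY SUMMARY (2009-04-17 - 2009-04-24)"
```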
From solipsis at pitrou.net Fri Apr 24 19:39:15 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 24 Apr 2009 17:39:15 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System?= =?utf-8?q?=09Character=09Interfaces?= References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull xemacs.org> writes: > > Well, the problem is that both parts are false. If you didn't start > with a valid string in a known encoding, you shouldn't treat it as > characters because it's not. Hand it to a careful API, and you'll get > an Exception raised in your face. Which "careful API" are you talking about? > OTOH, at least some of those who feel lucky and use it > naively are going to turn out to be wrong. Why will they turn out to be wrong? From tlesher at gmail.com Fri Apr 24 19:44:13 2009 From: tlesher at gmail.com (Tim Lesher) Date: Fri, 24 Apr 2009 13:44:13 -0400 Subject: [Python-Dev] PyEval_Call* convenience functions Message-ID: <9613db600904241044i7b7a9e46x1110d809a72235e1@mail.gmail.com> Is there a reason that the PyEval_CallFunction() and PyEval_CallMethod() convenience functions remain undocumented? (i.e., would a doc-and-test patch to correct this be rejected?) I didn't see any mention of this coming up in python-dev before. Also, despite its name, PyEval_CallMethod() is quite useful for calling module-level functions or classes (given that it's just a PyObject_GetAttrString plus the implementation of PyEval_CallFunction). Is there any reason (beyond its undocumented status) to believe this use case would ever be deprecated? Thanks. 
-- Tim Lesher From ajaksu at gmail.com Fri Apr 24 19:50:47 2009 From: ajaksu at gmail.com (Daniel Diniz) Date: Fri, 24 Apr 2009 14:50:47 -0300 Subject: [Python-Dev] [Tracker-discuss] Dates in python-dev In-Reply-To: <87skjykl20.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49F1E8E9.60903@mrabarnett.plus.com> <200904241837.21090.Arfrever.FTA@gmail.com> <87skjykl20.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <2d75d7660904241050l27665a8ege0aa52f6822375bd@mail.gmail.com> http://psf.upfronthosting.co.za/roundup/meta/issue274 From python at rcn.com Fri Apr 24 19:52:50 2009 From: python at rcn.com (Raymond Hettinger) Date: Fri, 24 Apr 2009 10:52:50 -0700 Subject: [Python-Dev] Tuples and underorderable types Message-ID: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> Does anyone have any ideas about what to do with issue 5830 and handling the problem in a general way (not just for sched)? The basic problem is that decorate/compare/undecorate patterns no longer work when the primary sort keys are equal and the secondary keys are unorderable (which is now the case for many callables). >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)] >>> tasks.sort() Traceback (most recent call last): ... TypeError: unorderable types: function() < function() Would it make sense to provide a default ordering whenever the types are the same? def object.__lt__(self, other): if type(self) == type(other): return id(self) < id(other) raise TypeError Raymond From solipsis at pitrou.net Fri Apr 24 20:02:43 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 24 Apr 2009 18:02:43 +0000 (UTC) Subject: [Python-Dev] Tuples and underorderable types References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> Message-ID: Raymond Hettinger rcn.com> writes: > > Would it make sense to provide a default ordering whenever the types are > the same? 
This doesn't work when they are not the same :-) Instead, you could make the decorating a bit more sophisticated: decorated = [(key, id(value), value) for key, value in blah(values)] or even: decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))] From python at rcn.com Fri Apr 24 20:19:39 2009 From: python at rcn.com (Raymond Hettinger) Date: Fri, 24 Apr 2009 11:19:39 -0700 Subject: [Python-Dev] Tuples and underorderable types References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> Message-ID: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> >> Would it make sense to provide a default ordering whenever the types are >> the same? > > This doesn't work when they are not the same :-) _ ~ @ @ \_/ > Instead, you could make the decorating a bit more sophisticated: > > decorated = [(key, id(value), value) for key, value in blah(values)] > > or even: > > decorated = [(key, n, value) for n, (key, value) in enumerate(blah(values))] I already do something along those lines in heapq.nsmallest() and nlargest() to preserve sort stability. The real issue isn't how to fix one particular module. The problem is that a basic python pattern is now broken in a way that may not readily surface during testing. I'm wondering if there is something we can do to mitigate the issue in a general way. It bites that the venerable technique of tuple sorting has lost some of its mojo. This may be an unintended consequence of eliminating default comparisons. Raymond From scott+python-dev at scottdial.com Fri Apr 24 20:25:04 2009 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 24 Apr 2009 14:25:04 -0400 Subject: [Python-Dev] Tuples and underorderable types In-Reply-To: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> Message-ID: <49F20400.3030400@scottdial.com> Raymond Hettinger wrote: > Would it make sense to provide a default ordering whenever the types are > the same?
> > def object.__lt__(self, other): > if type(self) == type(other): > return id(self) < id(other) > raise TypeError No. This only makes it more difficult for someone wanting to behave smartly with incomparable types. I can easily imagine someone wanting incomparable objects to be treated as equal wrt. sorting. I am thinking especially with respect to keeping the sort stable. I think many developers would be surprised to find, >>> a = >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)] >>> tasks.sort() >>> assert tasks[0][1]() == 0 , is not guaranteed. Moreover, I fail to see your point in general as a bug if you accept that not all objects can be totally ordered. We shouldn't be patching the object base class because of legacy code that relied on sorting tuples; this code should be updated to use a key function. -Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From stephen at xemacs.org Fri Apr 24 20:40:12 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 25 Apr 2009 03:40:12 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > Stephen J. Turnbull xemacs.org> writes: > > > > Well, the problem is that both parts are false. If you didn't start > > with a valid string in a known encoding, you shouldn't treat it as > > characters because it's not. Hand it to a careful API, and you'll get > > an Exception raised in your face. > > Which "careful API" are you talking about? > > > OTOH, at least some of those who feel lucky and use it > > naively are going to turn out to be wrong.
> > Why will they turn out to be wrong? To quote the PEP: """ While providing a uniform API to non-decodable bytes, this interface has the limitation that chosen representation only "works" if the data get converted back to bytes with the python-escape error handler also. Encoding the data with the locale's encoding and the (default) strict error handler will raise an exception, encoding them with UTF-8 will produce non-sensical data. For most applications, we assume that they eventually pass data received from a system interface back into the same system interfaces. """ But you can't know that. These are now "just strings", which could end up in pickles and other persistent objects, be passed across network interfaces (remote copy, for example), etc, etc, and there is no way to guarantee that the recipient will understand the rules, unless the application encapsulates them in some kind of representation that says "I look like a Unicode but I'm really just encoded bytes." But the whole point is to turn them into plain old strings so people *don't have to bother* keeping track. As I already said, this is no worse than the current situation, but it gives the impression that Python has a standard "solution". (Yes, I know Martin doesn't claim it's a solution to any of those problems. The point is user perception.) I have to wonder whether having a standard way of not solving any problems is better than having no standard way of not solving any problems. It may be, and it probably can't hurt, which is why I'm +0. 
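[Editor's note: the round-trip limitation the quoted PEP paragraph describes can be demonstrated with the error handler under the name it eventually shipped with in Python 3.1, "surrogateescape" (the draft discussed in this thread still calls it "python-escape"); a minimal sketch:]

```python
# 0xFF is never valid in UTF-8, so strict decoding would raise.
# With surrogateescape it becomes the lone surrogate U+DCFF instead.
name = b"caf\xff".decode("utf-8", errors="surrogateescape")
assert name == "caf\udcff"

# Passing the string back through the same handler restores the bytes.
assert name.encode("utf-8", errors="surrogateescape") == b"caf\xff"

# Encoding with the default strict handler raises, as the PEP warns.
raised = False
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    raised = True
assert raised
```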
From martin at v.loewis.de Fri Apr 24 20:31:54 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 24 Apr 2009 20:31:54 +0200 Subject: [Python-Dev] Tuples and underorderable types In-Reply-To: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> Message-ID: <49F2059A.9090708@v.loewis.de> > I'm wondering if there is something we can do to mitigate > the issue in a general way. It bites that the venerable technique > of tuple sorting has lost some of its mojo. This may be > an unintended consequence of eliminating default comparisons. I would discourage use of the decorate/sort/undecorate pattern, and encourage use of the key= argument. Or, if you really need to decorate into a tuple, still pass a key= argument. Regards, Martin From ajaksu at gmail.com Fri Apr 24 20:53:41 2009 From: ajaksu at gmail.com (Daniel Diniz) Date: Fri, 24 Apr 2009 15:53:41 -0300 Subject: [Python-Dev] Tuples and underorderable types In-Reply-To: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> Message-ID: <2d75d7660904241153ub47507ai1ec23545129985ed@mail.gmail.com> Raymond Hettinger wrote: > The problem is that a basic python pattern is now broken > in a way that may not readily surface during testing. > > I'm wondering if there is something we can do to mitigate > the issue in a general way. It bites that the venerable technique > of tuple sorting has lost some of its mojo. This may be > an unintended consequence of eliminating default comparisons.
There could be a high performance, non-lame version of the mapping pattern below available in the stdlib (or at least in the docs): keymap = {type(lambda: 1) : id} def decorate_helper(tup): return tuple(keymap[type(i)](i) if type(i) in keymap else i for i in tup) tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)] tasks.sort(key=decorate_helper) This works when comparing different types too, but then some care must be taken to avoid bad surprises: keymap[type(1j)] = abs imaginary_tasks = [(10j, lambda: 0), (20, lambda: 1), (10+1j, lambda: 2)] imaginary_tasks.sort(key=decorate_helper) # not so bad if intended mixed_tasks = [(lambda: 0,), (0.0,), (2**32,)] mixed_tasks.sort(key=decorate_helper) # oops, not the same order as in 2.x Regards, Daniel From g.brandl at gmx.net Fri Apr 24 20:59:01 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 24 Apr 2009 18:59:01 +0000 Subject: [Python-Dev] PyEval_Call* convenience functions In-Reply-To: <9613db600904241044i7b7a9e46x1110d809a72235e1@mail.gmail.com> References: <9613db600904241044i7b7a9e46x1110d809a72235e1@mail.gmail.com> Message-ID: Tim Lesher schrieb: > Is there a reason that the PyEval_CallFunction() and > PyEval_CallMethod() convenience functions remain undocumented? (i.e., > would a doc-and-test patch to correct this be rejected?) > > I didn't see any mention of this coming up in python-dev before. > > Also, despite its name, PyEval_CallMethod() is quite useful for > calling module-level functions or classes (given that it's just a > PyObject_GetAttrString plus the implementation of > PyEval_CallFunction). Is there any reason (beyond its undocumented > status) to believe this use case would ever be deprecated? FWIW, there's also PyObject_CallMethod(); all PyObject_Call* variants are documented, but none of the PyEval_Call* functions are. I actually don't know why we have two sets of these, with partially conflicting definitions; perhaps someone else can shed some light? 
Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From aahz at pythoncraft.com Fri Apr 24 21:10:33 2009 From: aahz at pythoncraft.com (Aahz) Date: Fri, 24 Apr 2009 12:10:33 -0700 Subject: [Python-Dev] Tuples and underorderable types In-Reply-To: <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> Message-ID: <20090424191033.GA14924@panix.com> On Fri, Apr 24, 2009, Raymond Hettinger wrote: > > I'm wondering if there is something we can do to mitigate the issue in > a general way. It bites that the venerable technique of tuple sorting > has lost some of its mojo. This may be an unintended consequence of > eliminating default comparisons. My understanding was that this was entirely an *intended* consequence of eliminating default comparisons. Not so much in the sense that it was desired by itself, but that the whole discussion of whether to keep moving forward in stripping out default comparisons explicitly revolved around whether this kind of difficulty warranted the overall simplification we now have (I don't remember off-hand whether this specific case was discussed, though). I think that anyone who wants to suggest reverting to some kind of default comparison behavior needs to write up a PEP and clearly summarize all previous discussion prior to 3.0 release, then go through the usual grind of starting with python-ideas before coming back to python-dev. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." 
--Red Adair From l.mastrodomenico at gmail.com Fri Apr 24 21:41:21 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Fri, 24 Apr 2009 21:41:21 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: 2009/4/22 "Martin v. Löwis" : > To convert non-decodable bytes, a new error handler "python-escape" is > introduced, which decodes non-decodable bytes into a private-use > character U+F01xx, which is believed to not conflict with private-use > characters that currently exist in Python codecs. Why not use U+DCxx for non-UTF-8 encodings too? Overall I like the PEP: I think it's the best proposal so far that doesn't put a heavy burden on applications that only want to do simple things with the API. -- Lino Mastrodomenico From v+python at g.nevcal.com Fri Apr 24 21:41:25 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 24 Apr 2009 12:41:25 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <49F215E5.4050205@g.nevcal.com> On approximately 4/24/2009 11:40 AM, came the following characters from the keyboard of Stephen J. Turnbull: > Antoine Pitrou writes: > > Stephen J. Turnbull xemacs.org> writes: > > > > > > Well, the problem is that both parts are false. If you didn't start > > > with a valid string in a known encoding, you shouldn't treat it as > > > characters because it's not. Hand it to a careful API, and you'll get > > > an Exception raised in your face.
> > > > Which "careful API" are you talking about? > > > > > OTOH, at least some of those who feel lucky and use it > > > naively are going to turn out to be wrong. > > > > Why will they turn out to be wrong? Because the encoding is not reliably reversible. That is why I proposed one that is. > To quote the PEP: > > """ > While providing a uniform API to non-decodable bytes, this interface > has the limitation that chosen representation only "works" if the data > get converted back to bytes with the python-escape error handler > also. Encoding the data with the locale's encoding and the (default) > strict error handler will raise an exception, encoding them with UTF-8 > will produce non-sensical data. > > For most applications, we assume that they eventually pass data > received from a system interface back into the same system > interfaces. > """ And so my encoding (1) doesn't alter the data stream for any valid Windows file name, and where the naivest of users reside (2) doesn't alter the data stream for any Posix file name that was encoded as UTF-8 sequences and doesn't contain ? characters in the file name [I perceive the use of ? in file names to be rare on Posix, because of experience, and because of the other problems caused by such use] (3) doesn't introduce data puns within applications that are correctly coded to know the encoding occurs. The encoding technique in the PEP not only can produce data puns, thus not being reversible, it provides no reliable mechanism to know that this has occurred. > But you can't know that. These are now "just strings", which could > end up in pickles and other persistent objects, be passed across > network interfaces (remote copy, for example), etc, etc, and there is > no way to guarantee that the recipient will understand the rules, > unless the application encapsulates them in some kind of > representation that says "I look like a Unicode but I'm really just > encoded bytes." This could happen. 
Well-formed programs need to use the encoding at the boundaries. Python could encapsulate its interfaces to the file system, but cannot encapsulate other interfaces. Fortunately, something that is pickled, would probably be unpicked by Python, and therefore all would be well. But any interface that expects a file name, and is not encapsulated by Python, must be encapsulated by the application. > But the whole point is to turn them into plain old > strings so people *don't have to bother* keeping track. And if that is the point, it isn't worth doing. If the point is that it can minimize the amount of existing, file name manipulation code that uses string manipulations, that must be reworked to be functional during a 2to3 migration, then it can be worth doing. But I think it should be done with an encoding that doesn't introduce undetectable data puns, whether mine or some different encoding with that characteristic, but not the one presently in the PEP, because it does introduce undetectable data puns. > As I already said, this is no worse than the current situation, but it > gives the impression that Python has a standard "solution". (Yes, I > know Martin doesn't claim it's a solution to any of those problems. > The point is user perception.) > > I have to wonder whether having a standard way of not solving any > problems is better than having no standard way of not solving any > problems. It may be, and it probably can't hurt, which is why I'm +0. Interesting phraseology there, Stephen! I'm +1 on the concept, -1 on the PEP, due solely to the lack of a reversible encoding. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Fri Apr 24 21:44:11 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 24 Apr 2009 12:44:11 -0700 Subject: [Python-Dev] Dates in python-dev In-Reply-To: <20090424170657.GA13056@phd.pp.ru> References: <49F1E8E9.60903@mrabarnett.plus.com> <20090424170657.GA13056@phd.pp.ru> Message-ID: <49F2168B.2050306@g.nevcal.com> On approximately 4/24/2009 10:06 AM, came the following characters from the keyboard of Oleg Broytmann: > On Fri, Apr 24, 2009 at 05:29:29PM +0100, MRAB wrote: >> I've recently subscribed to this list and received my first "Summary of >> Python tracker Issues". What I find annoying are the dates, for example: >> >> ACTIVITY SUMMARY (04/17/09 - 04/24/09) >> >> 3 x double-digits (have we learned nothing from Y2K? :-)) with the >> _middle_ ones changing fastest! >> >> I know it's the US standard, but Python is global. Could we have an >> 'international' style instead, say, year-month-day: >> >> ACTIVITY SUMMARY (2009-04-17 - 2009-04-24) > > +1000 from me! > > Oleg. You missed a prime opportunity, Oleg... +2000 from me! -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Fri Apr 24 22:25:25 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 24 Apr 2009 22:25:25 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <49F22035.9030405@v.loewis.de> > Why not use U+DCxx for non-UTF-8 encodings too? I thought of that, and was tricked into believing that only U+DC8x is a half surrogate. Now I see that you are right, and have fixed the PEP accordingly. 
Regards, Martin From tjreedy at udel.edu Fri Apr 24 22:25:42 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Apr 2009 16:25:42 -0400 Subject: [Python-Dev] Summary of Python tracker Issues In-Reply-To: <20090424160729.B33E5780C9@psf.upfronthosting.co.za> References: <20090424160729.B33E5780C9@psf.upfronthosting.co.za> Message-ID: Python tracker wrote: [snip] In going through this, I notice a lot of effort by Mark Dickinson and others to get some details of numbers computation and display right in time for 3.1. As a certain-to-be beneficiary, I want to thank all who contributed. Terry Jan Reedy From dickinsm at gmail.com Fri Apr 24 22:37:55 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 24 Apr 2009 21:37:55 +0100 Subject: [Python-Dev] Summary of Python tracker Issues In-Reply-To: References: <20090424160729.B33E5780C9@psf.upfronthosting.co.za> Message-ID: <5c6f2a5d0904241337g275450f5uee119a15b831b659@mail.gmail.com> On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy wrote: > In going through this, I notice a lot of effort by Mark Dickinson and others Many others, but Eric Smith's name needs to be in big lights here. There's no way the short float repr would have been ready for 3.1 if Eric hadn't shown an interest in this at PyCon, and then taken on the major internal replumbing job this entailed for all of Python's string formatting. > 3.1. As a certain-to-be beneficiary, I want to thank all who contributed. Glad you like it!
Mark From eric at trueblade.com Fri Apr 24 23:08:51 2009 From: eric at trueblade.com (Eric Smith) Date: Fri, 24 Apr 2009 17:08:51 -0400 Subject: [Python-Dev] Summary of Python tracker Issues In-Reply-To: <5c6f2a5d0904241337g275450f5uee119a15b831b659@mail.gmail.com> References: <20090424160729.B33E5780C9@psf.upfronthosting.co.za> <5c6f2a5d0904241337g275450f5uee119a15b831b659@mail.gmail.com> Message-ID: <49F22A63.60808@trueblade.com> Mark Dickinson wrote: > On Fri, Apr 24, 2009 at 9:25 PM, Terry Reedy wrote: >> In going through this, I notice a lot of effort by Mark Dickenson and others > > Many others, but Eric Smith's name needs to be in big lights here. > There's no way the short float repr would have been ready for 3.1 if > Eric hadn't shown an interest in this at PyCon, and then taken on > the major internal replumbing job this entailed for all of Python's > string formatting. Not to get too much into a mutual admiration mode, but Mark did the parts involving hard thinking. >> 3.1. As a certain-to-be beneficiary, I want to thank all who contributed. > > Glad you like it! Me, too. I think it's going to be great once we get it all straightened out. And I think we're close! Eric. From eric at trueblade.com Fri Apr 24 23:15:13 2009 From: eric at trueblade.com (Eric Smith) Date: Fri, 24 Apr 2009 17:15:13 -0400 Subject: [Python-Dev] Deprecating PyOS_ascii_formatd In-Reply-To: <49DD2E41.80401@trueblade.com> References: <49DD2E41.80401@trueblade.com> Message-ID: <49F22BE1.7020603@trueblade.com> Eric Smith wrote: > Assuming that Mark's and my changes in the py3k-short-float-repr branch > get checked in shortly, I'd like to deprecate PyOS_ascii_formatd. Its > functionality is largely being replaced by PyOS_double_to_string, which > we're introducing on our branch. We've checked the changes in, and everything looks good as far as I can tell. > My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in 3.2. 
Having heard no dissent, I'd like to go ahead and deprecate this API. What are the mechanics of deprecating this? Just documentation, or is there something I should do in the code to generate a warning? Any pointers to examples would be great. > The 2.7 situation is tricker, because we're not planning on backporting > the short-float-repr work back to 2.7. In 2.7 I guess we'll leave > PyOS_ascii_formatd around, unfortunately. I backported the new API to 2.7, so I'll also deprecate PyOS_ascii_formatd there. Eric. From benjamin at python.org Fri Apr 24 23:17:19 2009 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 24 Apr 2009 16:17:19 -0500 Subject: [Python-Dev] Deprecating PyOS_ascii_formatd In-Reply-To: <49F22BE1.7020603@trueblade.com> References: <49DD2E41.80401@trueblade.com> <49F22BE1.7020603@trueblade.com> Message-ID: <1afaf6160904241417i64fc6640x680f7a54789b322c@mail.gmail.com> 2009/4/24 Eric Smith : >> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in >> 3.2. > > Having heard no dissent, I'd like to go ahead and deprecate this API. What > are the mechanics of deprecating this? Just documentation, or is there > something I should do in the code to generate a warning? Any pointers to > examples would be great. You can use PyErr_WarnEx(). 
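[Editor's note: a minimal sketch of what the suggested PyErr_WarnEx() deprecation looks like from the caller's side, using the Python-level warnings machinery as a stand-in for the C call; `old_api` and `new_api` are illustrative names, not the real C functions:]

```python
import warnings

def old_api():
    # Python-level counterpart of the C call Benjamin suggests:
    #   PyErr_WarnEx(PyExc_DeprecationWarning, "...", 1)
    warnings.warn("old_api() is deprecated; use new_api() instead",
                  DeprecationWarning, stacklevel=2)
    return 42

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result = old_api()

# The deprecated call still works, but a DeprecationWarning is emitted.
assert result == 42
assert caught[0].category is DeprecationWarning
```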
-- Regards, Benjamin From a.badger at gmail.com Fri Apr 24 23:26:12 2009 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 24 Apr 2009 14:26:12 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F215E5.4050205@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F215E5.4050205@g.nevcal.com> Message-ID: <49F22E74.4070108@gmail.com> Glenn Linderman wrote: > On approximately 4/24/2009 11:40 AM, came the following characters from > And so my encoding (1) doesn't alter the data stream for any valid > Windows file name, and where the naivest of users reside (2) doesn't > alter the data stream for any Posix file name that was encoded as UTF-8 > sequences and doesn't contain ? characters in the file name [I perceive > the use of ? in file names to be rare on Posix, because of experience, > and because of the other problems caused by such use] (3) doesn't > introduce data puns within applications that are correctly coded to know > the encoding occurs. The encoding technique in the PEP not only can > produce data puns, thus not being reversible, it provides no reliable > mechanism to know that this has occurred. > Uhm.... Not arguing with your goals but '?' is unfortunately reasonably easy to get into a filename. For instance, I've had to download a lot of scratch built packages from our buildsystem recently. 
Scratch builds have url's with query strings in them so:: wget 'http://koji.fedoraproject.org/koji/getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm' Which results in the filename: getfile?taskID=1318059&name=monodevelop-debugger-gdb-2.0-1.1.i586.rpm -Toshio From tjreedy at udel.edu Fri Apr 24 23:36:54 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Apr 2009 17:36:54 -0400 Subject: [Python-Dev] Tuples and underorderable types In-Reply-To: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> Message-ID: Raymond Hettinger wrote: > Does anyone have any ideas about what to do with issue 5830 and handling > the problem in a general way (not just for sched)? > > The basic problem is that decorate/compare/undecorate patterns no longer > work when the primary sort keys are equal and the secondary keys are > unorderable (which is now the case for many callables). > > >>> tasks = [(10, lambda: 0), (20, lambda: 1), (10, lambda: 2)] > >>> tasks.sort() > Traceback (most recent call last): > ... > TypeError: unorderable types: function() < function() > > Would it make sense to provide a default ordering whenever the types are > the same? > > def object.__lt__(self, other): > if type(self) == type(other): > return id(self) < id(other) > raise TypeError The immediate problem with this is that 'same type', or not, is sometimes a somewhat arbitrary implementation detail. In 2.x, 4000000000 could be int or long, depending on the build. In 3.0, that difference disappeared. User-defined and builtin functions are different classes for implementation, not conceptual reasons. (This could potentially bite what I understand to be your r71844/5 fix.)
Unbound methods used to be the same class as bound methods (as I remember). In 3.0, the wrapping disappeared and they are the same thing as the underlying function. In 2.x, ascii text and binary data might both be str. Now they might be str and bytes. Universal ordering and default ordering by id was broken (and doomed) when Guido decided that complex numbers should not be comparable either lexicographically or by id. Your proposed object.__lt__ would reverse that decision, unless, of course, complex was special-cased (again) to over-ride it, but then we would be back to the 2.x situation of mixed rules and exceptions. Terry Jan Reedy From p.f.moore at gmail.com Sat Apr 25 00:05:04 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 24 Apr 2009 23:05:04 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> 2009/4/24 Stephen J. Turnbull : > Paul Moore writes: > > > The pros for Martin's proposal are a uniform cross-platform interface, > > and a user-friendly API for the common case. > > A more accurate phrasing would be "... a user-friendly API for those > who feel very lucky today." Which is the common case, of course, but > spins a little differently. Sorry, but I think you're misrepresenting things. I'd have probably let you off if you'd missed out the "very" - but I do think that it's the common case.
Consider: - Windows systems where broken Unicode (lone surrogates or whatever) isn't involved - Unix systems where the user's stated filesystem encoding is correct Can you honestly say that this isn't the vast majority of real-world environments? (IIRC, you are based in Japan, so it may well be true that the likelihood of problems is a lot higher where you are than where I am - the UK - but I suspect that averaging out, things are generally as above). > > [1] Actually, all the PEP says is "With this PEP, a uniform > > treatment of these data as characters becomes > > possible." An > > argument as to why this is a good thing would be a useful addition > > to the PEP. At the moment it's more or less treated as self-evident > > - which I agree with, but which clearly the Unix people here are > > not as certain of. > > Well, the problem is that both parts are false. I can't work out which "parts" you are referring to here. > If you didn't start > with a valid string in a known encoding, you shouldn't treat it as > characters because it's not. Again, that's the purist argument. If you have a string (of bytes, I guess) and a 99% certain guess as to the correct encoding, then I'd argue that, as long as (a) it's not mission-critical (lives or backups depend on it) and (b) you have a means of failing relatively gracefully, you have every reason to make the assumption about encoding. After all, what's the alternative? Ultimately, you have a byte string and no encoding. You make some assumption, or you can do hardly anything. What use is "Processing file \x66\x6f\x6f" as a progress indicator for a program that scans a directory? (That was "foo" for people who can't read latin-1 written in hex :-)) > Hand it to a careful API, and you'll get > an Exception raised in your face. And that's precisely why it's not > obviously a good thing. Careful clients will have to treat it as > "transcoded bytes", and so the people who develop those clients get no > benefit.
OTOH, at least some of those who feel lucky and use it > naively are going to turn out to be wrong. But 99% of the time, "it" is a perfectly acceptable string. (Percentage invented out of thin air, admitted :-)) Remember, only when the system encounters an undecodable byte sequence, would a technically invalid string be generated - and as far as I can tell, the main case when that would happen is on Unix, if the user specifies UTF-8 as the encoding, and the actual filesystem uses something else, *and* there's a file with a name whose byte sequence is invalid UTF-8. I'm *really* struggling to see that as a common scenario. Admittedly, there are other, possibly more common, cases where the string translation is valid, but semantically not what the user expects - user says CP1251, but filesystem is CP850, say. As a UK Windows user, I'm used to seeing CP850 vs CP1251 confusions like this - "?" replaced with ? is the common case. It happens occasionally, and occasionally causes code to behave unexpectedly. But it doesn't reformat my hard drive and the alternative (having to be extra-careful to tell every program precisely which encoding I'm using in every situation) would make programs effectively unusable. > That said, I'm +0 on the PEP as is. So I'm largely preaching to the converted here. After all, lukewarm acceptance from someone with experience of Asian encoding issues is pretty much the equivalent of resounding support from someone who only ever works in English! :-) Paul. From python at rcn.com Sat Apr 25 00:23:52 2009 From: python at rcn.com (Raymond Hettinger) Date: Fri, 24 Apr 2009 15:23:52 -0700 Subject: [Python-Dev] Tuples and underorderable types References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> <49F2059A.9090708@v.loewis.de> Message-ID: <2DDEC3B33EDF49CABD54E3C1D1CAAF0B@RaymondLaptop1> > I would discourage use of the decorate/sort/undecorate pattern, > and encourage use of the key= argument.
Or, if you really need > to decorate into a tuple, still pass a key= argument. The bug report was actually about the sched module which used heapq to prioritize tuples consisting of times, priorities, and actions. I fixed and closed the original bug a few hours ago but had a thought that the pattern itself may be ubiquitous (especially with heapq). ISTM that other bugs like this are lurking about. But all of you guys seem to think the status quo is fine, so that's the end of it. Cheers, Raymond From Leo.Barendse at nokia.com Fri Apr 24 23:57:36 2009 From: Leo.Barendse at nokia.com (Leo.Barendse at nokia.com) Date: Fri, 24 Apr 2009 23:57:36 +0200 Subject: [Python-Dev] "Length of str " changes after passed in Python 2.5 Message-ID: <89F9BF23C080784180D44D7A8FEBA42909E037321B@NOK-EUMSG-03.mgdnok.nokia.com> ----------------------------------------------------------- I have the following code: # len(all_svs) = 10 # then I call a function with 2 list parameters def proc_line(line,all_svs) : # inside the function the length of the list "all_svs" is 1 more -> 11 # I had to workaround it for i in range(len(all_svs) - 1 ) : # somehow the length of all_svs is incremented !!!!!!!!!!!!!!!!!!!!!!!!!!! -------------------------------------------------------------- Is this a compiler bug ?? Or is it because of my first try of Python Thanks -------------- next part -------------- An HTML attachment was scrubbed...
URL: From aahz at pythoncraft.com Sat Apr 25 00:34:18 2009 From: aahz at pythoncraft.com (Aahz) Date: Fri, 24 Apr 2009 15:34:18 -0700 Subject: [Python-Dev] "Length of str " changes after passed in Python 2.5 In-Reply-To: <89F9BF23C080784180D44D7A8FEBA42909E037321B@NOK-EUMSG-03.mgdnok.nokia.com> References: <89F9BF23C080784180D44D7A8FEBA42909E037321B@NOK-EUMSG-03.mgdnok.nokia.com> Message-ID: <20090424223418.GA23575@panix.com> On Fri, Apr 24, 2009, Leo.Barendse at nokia.com wrote: > > I have the following code: > # len(all_svs) = 10 > > # the I call a function with 2 list parameters > def proc_line(line,all_svs) : > > # inside the function the length of the list "all_svs" is 1 more -> 11 > # I had to workaround it This sounds like a usage question. Please use comp.lang.python (or possibly the tutor mailing list). -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From foom at fuhm.net Sat Apr 25 01:06:29 2009 From: foom at fuhm.net (James Y Knight) Date: Fri, 24 Apr 2009 19:06:29 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> Message-ID: On Apr 24, 2009, at 6:05 PM, Paul Moore wrote: > - Windows systems where broken Unicode (lone surrogates or whatever) > isn't involved > - Unix systems where the user's stated filesystem encoding is correct > > Can you honestly say that this isn't the vast majority of real-world > environments? 
> environments? (IIRC, you are based in Japan, so it may well be true > that the likelihood of problems is a lot higher where you are than > where I am - the UK - but I suspect that averaging out, things are > generally as above). In my experience, it is normal on most unix systems that some programs (mostly daemons) are running in default "POSIX" locale, others (most user programs) are running in the "en_US.utf-8" locale, and some luddite users have set themselves to "en_US.8859-1". All running on the same system. James From tjreedy at udel.edu Sat Apr 25 03:08:16 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Apr 2009 21:08:16 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F22E74.4070108@gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F215E5.4050205@g.nevcal.com> <49F22E74.4070108@gmail.com> Message-ID: Toshio Kuratomi wrote: > Glenn Linderman wrote: >> On approximately 4/24/2009 11:40 AM, came the following characters from >> And so my encoding (1) doesn't alter the data stream for any valid >> Windows file name, and where the naivest of users reside (2) doesn't >> alter the data stream for any Posix file name that was encoded as UTF-8 >> sequences and doesn't contain ? characters in the file name [I perceive >> the use of ? in file names to be rare on Posix, because of experience, >> and because of the other problems caused by such use] (3) doesn't >> introduce data puns within applications that are correctly coded to know >> the encoding occurs.
The encoding technique in the PEP not only can >> produce data puns, thus not being reversible, it provides no reliable >> mechanism to know that this has occurred. >> > Uhm.... Not arguing with your goals but '?' is unfortunately reasonably > easy to get into a filename. For instance, I've had to download a lot > of scratch built packages from our buildsystem recently. Scratch builds > have url's with query strings in them so:: Is NUL \0 allowed in POSIX file names? If not, could that be used as an escape char. If it is not legal, then custom translated strings that escape in the wild would raise a red flag as soon as something else tried to use them. From tjreedy at udel.edu Sat Apr 25 03:16:24 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Apr 2009 21:16:24 -0400 Subject: [Python-Dev] Tuples and underorderable types In-Reply-To: <2DDEC3B33EDF49CABD54E3C1D1CAAF0B@RaymondLaptop1> References: <729FA93BCDEB499F91C92D454CED094A@RaymondLaptop1> <3E1C6FF6660F42EC8D8094D34CE719D4@RaymondLaptop1> <49F2059A.9090708@v.loewis.de> <2DDEC3B33EDF49CABD54E3C1D1CAAF0B@RaymondLaptop1> Message-ID: Raymond Hettinger wrote: > >> I would discourage use of the decorate/sort/undecorate pattern, >> and encourage use of the key= argument. Or, if you really need >> to decorate into a tuple, still pass a key= argument. > > The bug report was actually about the sched module which used > heapq to prioritize tuples consisting of times, priorities, and actions. > I fixed and closed the original bug a few hours ago but had a > thought that the pattern itself may be ubiquitious (especially with heapq). > ISTM that other bugs like this are lurking about. But all of you guys > seem to think the status quo is fine, so that's the end of it. If you define the bug as the sched module not being updated to the 3.0 order, then there are possibly more. I notice that most of the heapq functions do not take a key function argument. Has or will this change in the future? 
Or is making key-decorated tuples the responsibility of the user? (I can see that a key func would work better with PriQueue class where the key func is passed just once.) tjr From a.badger at gmail.com Sat Apr 25 03:20:56 2009 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 24 Apr 2009 18:20:56 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F215E5.4050205@g.nevcal.com> <49F22E74.4070108@gmail.com> Message-ID: <49F26578.60603@gmail.com> Terry Reedy wrote: > Is NUL \0 allowed in POSIX file names? If not, could that be used as an > escape char. If it is not legal, then custom translated strings that > escape in the wild would raise a red flag as soon as something else > tried to use them. > AFAIK NUL should be okay but I haven't read a specification to reach that conclusion. Is that a proposal? Should I go find someone who has read the relevant standards to find out? -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From cs at zip.com.au Sat Apr 25 06:22:47 2009 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 25 Apr 2009 14:22:47 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F26578.60603@gmail.com> Message-ID: <20090425042247.GA26029@cskk.homeip.net> On 24Apr2009 18:20, Toshio Kuratomi wrote: | Terry Reedy wrote: | > Is NUL \0 allowed in POSIX file names? If not, could that be used as an | > escape char. 
If it is not legal, then custom translated strings that | > escape in the wild would raise a red flag as soon as something else | > tried to use them. | > | AFAIK NUL should be okay but I haven't read a specification to reach | that conclusion. Is that a proposal? Should I go find someone who has | read the relevant standards to find out? NUL cannot occur in a POSIX file path, if for no other reason than that the API uses C strings, which are NUL terminated. So, yes, you could use NUL as an escape character if you're sure you're never dealing with _non_POSIX pathnames:-) Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ | I'm the female partner of a climber (I don't climb) and until now, I was | under the impression that climbers are cool people, but alas, you had to | ruin it for me. *REAL* climbers are crude, impolite, solitary, abrupt, arrogant. Sport climbers are cool. - Rene Tio in rec.climbing From p.f.moore at gmail.com Sat Apr 25 11:00:24 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 25 Apr 2009 10:00:24 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> Message-ID: <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> 2009/4/25 James Y Knight : > On Apr 24, 2009, at 6:05 PM, Paul Moore wrote: >> >> - Windows systems where broken Unicode (lone surrogates or whatever) >> isn't involved >> - Unix systems where the user's stated filesystem encoding is correct >> >> Can you honestly say that this isn't the vast majority of real-world >> environments? 
>> environments? (IIRC, you are based in Japan, so it may well be true >> that the likelihood of problems is a lot higher where you are than >> where I am - the UK - but I suspect that averaging out, things are >> generally as above). > > In my experience, it is normal on most unix systems that some programs > (mostly daemons) are running in default "POSIX" locale, others (most user > programs) are running in the "en_US.utf-8" locale, and some luddite users > have set themselves to "en_US.8859-1". All running on the same system. OK, thanks for the data point. Following on from that, would this (under Martin's proposal) result in programs receiving encoded strings, or just semantically-incorrect ones? Specifically, the 8859-1 case cannot result in encoded strings, as 8859-1 can represent all byte strings (possibly garbled, but at least validly). The utf8 case can hit unrepresentable bytes, but only if there are characters greater than 0x7F in filenames. Is the "POSIX" case ASCII? If so, then the same logic (>=0x80 is unrepresentable). So, the next question is - do people on such systems frequently use high-bit characters in filenames? Paul. PS Unfortunately, I suspect that the biggest group of people likely to be hit badly by this is people using non-latin scripts. And arguing probabilities without real data is optimistic at best.
But those people are also the *least* likely people to contribute on an English-speaking list, I guess :-( (Sincere apologies if everyone but me on this list happens to actually be fluent English-speaking Russians :-)) From eric at trueblade.com Sat Apr 25 13:03:39 2009 From: eric at trueblade.com (Eric Smith) Date: Sat, 25 Apr 2009 07:03:39 -0400 Subject: [Python-Dev] Deprecating PyOS_ascii_formatd In-Reply-To: <1afaf6160904241417i64fc6640x680f7a54789b322c@mail.gmail.com> References: <49DD2E41.80401@trueblade.com> <49F22BE1.7020603@trueblade.com> <1afaf6160904241417i64fc6640x680f7a54789b322c@mail.gmail.com> Message-ID: <49F2EE0B.1090006@trueblade.com> Benjamin Peterson wrote: > 2009/4/24 Eric Smith : >>> My proposal is to deprecate PyOS_ascii_formatd in 3.1 and remove it in >>> 3.2. >> Having heard no dissent, I'd like to go ahead and deprecate this API. What >> are the mechanics of deprecating this? Just documentation, or is there >> something I should do in the code to generate a warning? Any pointers to >> examples would be great. > > You can use PyErr_WarnEx(). Thanks. I created issue 5835 to track this. I marked it as a release blocker, but I should have no problem finishing it up this weekend. From martin at v.loewis.de Sat Apr 25 14:07:44 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 25 Apr 2009 14:07:44 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090423232712.GA31693@cskk.homeip.net> References: <20090423232712.GA31693@cskk.homeip.net> Message-ID: <49F2FD10.9080302@v.loewis.de> Cameron Simpson wrote: > On 22Apr2009 08:50, Martin v. Löwis wrote: > | File names, environment variables, and command line arguments are > | defined as being character data in POSIX; > > Specific citation please? I'd like to check the specifics of this.
For example, on environment variables: http://opengroup.org/onlinepubs/007908799/xbd/envvar.html # For values to be portable across XSI-conformant systems, the value # must be composed of characters from the portable character set (except # NUL and as indicated below). # Environment variable names used by the utilities in the XCU # specification consist solely of upper-case letters, digits and the "_" # (underscore) from the characters defined in Portable Character Set . # Other characters may be permitted by an implementation; Or, on command line arguments: http://opengroup.org/onlinepubs/007908799/xsh/execve.html # The arguments represented by arg0, ... are pointers to null-terminated # character strings where a character string is "A contiguous sequence of characters terminated by and including the first null byte.", and a character is # A sequence of one or more bytes representing a single graphic symbol # or control code. This term corresponds to the ISO C standard term # multibyte character (multi-byte character), where a single-byte # character is a special case of a multi-byte character. Unlike the # usage in the ISO C standard, character here has no necessary # relationship with storage space, and byte is used when storage space # is discussed. > So you're proposing that all POSIX OS interfaces (which use byte strings) > interpret those byte strings into Python3 str objects, with a codec > that will accept arbitrary byte sequences losslessly and is totally > reversible, yes? Correct. > And, I hope, that the os.* interfaces silently use it by default. Correct. > | Applications that need to process the original byte > | strings can obtain them by encoding the character strings with the > | file system encoding, passing "python-escape" as the error handler > | name. > > -1 > > This last sentence kills the idea for me, unless I'm missing something. > Which I may be, of course. > > POSIX filesystems _do_not_ have a file system encoding. 
Why is that a problem for the PEP? > If I'm writing a general purpose UNIX tool like chmod or find, I expect > it to work reliably on _any_ UNIX pathname. It must be totally encoding > blind. If I speak to the os.* interface to open a file, I expect to hand > it bytes and have it behave. See the other messages. If you want to do that, you can continue to. > I'm very much in favour of being able to work in strings for most > purposes, but if I use the os.* interfaces on a UNIX system it is > necessary to be _able_ to work in bytes, because UNIX file pathnames > are bytes. Please re-read the PEP. It provides a way of being able to access any POSIX file name correctly, and still pass strings. > If there isn't a byte-safe os.* facility in Python3, it will simply be > unsuitable for writing low level UNIX tools. Why is that? The mechanism in the PEP is precisely defined to allow writing low level UNIX tools. > Finally, I have a small python program whose whole purpose in life > is to transcode UNIX filenames before transfer to a MacOSX HFS > directory, because of HFS's enforced particular encoding. What approach > should a Python app take to transcode UNIX pathnames under your scheme? Compute the corresponding character strings, and use them. Regards, Martin From martin at v.loewis.de Sat Apr 25 14:12:25 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 25 Apr 2009 14:12:25 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090423234724.GA8077@cskk.homeip.net> References: <20090423234724.GA8077@cskk.homeip.net> Message-ID: <49F2FE29.6080608@v.loewis.de> > | 2. Even if they were taken away (which the PEP does not propose to do), > | it would be easy to emulate them for applications that want them. > | For example, listdir could be wrapped as > | > | def listdir_b(bytestring): > | fse = sys.getfilesystemencoding() > > Alas, no No, what? No, that algorithm would be incorrect? 
> because there is no sys.getfilesystemencoding() at the POSIX > level. It's only the user's current locale stuff on a UNIX system, and > has _nothing_ to do with the filesystem because UNIX filesystems don't > have encodings. So can you produce a specific example where my proposed listdir_b function would fail to work correctly? For it to work, it is not necessary that POSIX has no notion of character sets on the file system level (which is actually not true - POSIX very well recognizes the notion of character sets for file names, and recommends that you restrict yourself to the portable character set). > In particular, because the "best" (or to my mind "misleading") you > can do for this is report what the current user thinks: > http://docs.python.org/library/sys.html#sys.getfilesystemencoding > then there's no guarrentee that what is chosen has any releationship to > what was in use when the files being consulted were made. For this PEP, it's irrelevant. It will work even if the chosen encoding is a bad choice. > Now, if I were writing listdir_b() I'd want to be able to do something > along these lines: > - set LC_ALL=C (or some equivalent mechanism) > - have os.listdir() read bytes as numeric values and transcode their values > _directly_ into the corresponding Unicode code points. > - yield bytes( ord(c) for c in os_listdir_string ) > - have os.open() et al transcode unicode code points back into bytes. > i.e. a straight one-to-one mapping, using only codepoints in the range > 1..255. That would be an alternative approach to the same problem (and one that I think will fail more badly than the one I'm proposing). 
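For readers following along today, the wrapper quoted above can be completed into a runnable sketch. The PEP's working name for the error handler was "python-escape"; it shipped in Python 3.1 as "surrogateescape", which is what this sketch uses:

```python
import os
import sys
import tempfile

def listdir_b(dirname):
    """Emulate a bytes-oriented listdir on top of the str API."""
    fse = sys.getfilesystemencoding()
    # Decode with the escaping error handler, call the str API, then
    # re-encode; undecodable bytes survive as lone surrogates and are
    # restored exactly on the way back out.
    udir = dirname.decode(fse, "surrogateescape")
    return [fn.encode(fse, "surrogateescape") for fn in os.listdir(udir)]

with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "example.txt"), "w").close()
    print(listdir_b(os.fsencode(d)))  # [b'example.txt']
```

In practice os.listdir already accepts a bytes argument and returns bytes unchanged, so the wrapper mainly illustrates that the two views are interconvertible.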
Regards, Martin From martin at v.loewis.de Sat Apr 25 14:17:14 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 25 Apr 2009 14:17:14 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <49F2FF4A.8010507@v.loewis.de> Simon Cross wrote: >> Unfortunately, for Windows, the situation would >> be exactly the opposite: the byte-oriented interface cannot represent >> all data; only the character-oriented API can. > > Is the second part of this actually true? My understanding may be > flawed, but surely all Unicode data can be converted to and from bytes > using UTF-8? [I hope, by "second part", you refer to the part that I left] It's true that UTF-8 could represent all Windows file names. However, the byte-oriented APIs of Windows do not use UTF-8, but instead, they use the Windows ANSI code page (which varies with the installation). > Given this, can't people who > must have access to all files / environment data just use the bytes > interface? No, because the Windows API would interpret the bytes differently, and not find the right file. Regards, Martin From martin at v.loewis.de Sat Apr 25 14:22:27 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 25 Apr 2009 14:22:27 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F184C6.8000905@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> Message-ID: <49F30083.5050506@v.loewis.de> > The problem with this, and other preceding schemes that have been > discussed here, is that there is no means of ascertaining whether a > particular file name str was obtained from a str API, or was funny- > decoded from a bytes API... and thus, there is no means of reliably > ascertaining whether a particular filename str should be passed to a > str API, or funny-encoded back to bytes. 
Why is it necessary that you are able to make this distinction? > Picking a character (I don't find U+F01xx in the > Unicode standard, so I don't know what it is) It's a private use area. It will never carry an official character assignment. > As I realized in the email-sig, in talking about decoding corrupted > headers, there is only one way to guarantee this... to encode _all_ > character sequences, from _all_ interfaces. Basically it requires > reserving an escape character (I'll use ? in these examples -- yes, an > ASCII question mark -- happens to be illegal in Windows filenames so > all the better on that platform, but the specific character doesn't > matter... avoiding / \ and . is probably good, though). I think you'll have to write an alternative PEP if you want to see something like this implemented throughout Python. Regards, Martin From martin at v.loewis.de Sat Apr 25 14:24:50 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 25 Apr 2009 14:24:50 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> Message-ID: <49F30112.80402@v.loewis.de> > Humour aside :), the expectation that filenames are Unicode data > simply doesn't agree with the reality of POSIX file systems. I think > an approach similar to that adopted by glib [1] could work Are you saying that the approach presented in the PEP will not work? I believe it would work no matter whether that expectation agrees with reality or not. The amount of moji-bake that you get is larger when the disagreement is larger, but it will continue to *work*. 
Regards, Martin From martin at v.loewis.de Sat Apr 25 14:28:11 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Apr 2009 14:28:11 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090424152746.GA9543@panix.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> Message-ID: <49F301DB.1090704@v.loewis.de> > The part that I haven't seen clearly addressed so far is what happens > when disks get mounted across OSes (e.g. NFS). > > While I agree that there should be a layer on top that can handle "most" > situations, it also seems clear that the raw layer needs to be readily > accessible. Indeed, with the PEP, the raw layer does remain readily available. If you know that it was originally bytes, you can get the very same bytes back if you want to. However, for disks mounted across OSes, you won't have to, normally. If you think there is a problem with these, can you please describe a specific scenario? What application, what file names, what encodings, what problems? Regards, Martin From martin at v.loewis.de Sat Apr 25 14:31:53 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Apr 2009 14:31:53 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> Message-ID: <49F302B9.6020907@v.loewis.de> > [1] Actually, all the PEP says is "With this PEP, a uniform treatment > of these data as characters becomes > possible." 
An argument as to why this is a good thing would be a > useful addition to the PEP. At the moment it's more or less treated as > self-evident - which I agree with, but which clearly the Unix people > here are not as certain of. Ok, I have added another paragraph. Not sure whether it helps to clarify though. Regards, Martin From martin at v.loewis.de Sat Apr 25 14:35:28 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Apr 2009 14:35:28 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F215E5.4050205@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F215E5.4050205@g.nevcal.com> Message-ID: <49F30390.2040808@v.loewis.de> > Because the encoding is not reliably reversible. Why do you say that? The encoding is completely reversible (unless we disagree on what "reversible" means). > I'm +1 on the concept, -1 on the PEP, due solely to the lack of a > reversible encoding. Then please provide an example for a setup where it is not reversible. 
Regards, Martin From martin at v.loewis.de Sat Apr 25 14:42:37 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Apr 2009 14:42:37 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> Message-ID: <49F3053D.9090100@v.loewis.de> > Following on from that, would this (under Martin's proposal) result in > programs receiving encoded strings, or just semantically-incorrect > ones? Not sure I understand the question - what is an "encoded string"? As you analyse below, sometimes, the current (2.x) file system encoding will do the right thing; sometimes, it will decode successfully, but still not give the intended string, and sometimes, it will fail. With the PEP, it won't fail, but give a string back that likely wasn't intended by the user. This might be confusing if you try to render it to a user interface; if the application merely passes it back to file system APIs, it will work fine. > So, the next question is - do people on such systems frequently use > high-bit characters in filenames? They typically do until they run into problems. For example, if they set the locale to something, and then create files in their homedirectory, it will work just fine, and nobody else will ever see the files (except for the backup software). When they find that the files they created are inaccessible to others, they will often stop using funny characters. 
Regards, Martin From martin at v.loewis.de Sat Apr 25 14:44:49 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Apr 2009 14:44:49 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F11829.9070504@mrabarnett.plus.com> References: <49EEBE2E.3090601@v.loewis.de> <49EF0ADB.2090107@mrabarnett.plus.com> <49EF6F06.9060008@v.loewis.de> <49F11829.9070504@mrabarnett.plus.com> Message-ID: <49F305C1.5070309@v.loewis.de> > If the bytes are mapped to single half surrogate codes instead of the > normal pairs (low+high), then I can see that decoding could never be > ambiguous and encoding could produce the original bytes. I was confused by Markus Kuhn's original UTF-8b specification. I have now changed the PEP to avoid using PUA characters at all. Regards, Martin From google at mrabarnett.plus.com Sat Apr 25 16:21:13 2009 From: google at mrabarnett.plus.com (MRAB) Date: Sat, 25 Apr 2009 15:21:13 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F305C1.5070309@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49EF0ADB.2090107@mrabarnett.plus.com> <49EF6F06.9060008@v.loewis.de> <49F11829.9070504@mrabarnett.plus.com> <49F305C1.5070309@v.loewis.de> Message-ID: <49F31C59.1040309@mrabarnett.plus.com> Martin v. L?wis wrote: >> If the bytes are mapped to single half surrogate codes instead of the >> normal pairs (low+high), then I can see that decoding could never be >> ambiguous and encoding could produce the original bytes. > > I was confused by Markus Kuhn's original UTF-8b specification. I have > now changed the PEP to avoid using PUA characters at all. > I find the PEP easier to understand now. In detail I'd say that if a sequence of bytes >=0x80 is found which is not valid UTF-8, then the first byte is mapped to a half surrogate and then decoding is continued from the next byte. 
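In code, the rule just described - decode valid UTF-8 sequences normally, and map the first byte of each invalid sequence to a lone low surrogate before continuing from the next byte - looks roughly like this. (Illustrative sketch only; CPython implements this as the error handler the PEP calls "python-escape", later named "surrogateescape", not as a Python loop.)

```python
def decode_utf8b(data):
    # Decode bytes as UTF-8, mapping each undecodable byte to a lone
    # low surrogate in U+DC80..U+DCFF, as described above.
    out = []
    i = 0
    while i < len(data):
        # Valid UTF-8 sequences are 1-4 bytes long; take the shortest
        # chunk starting here that decodes cleanly.
        for length in range(1, 5):
            chunk = data[i:i + length]
            try:
                out.append(chunk.decode("utf-8", "strict"))
            except UnicodeDecodeError:
                continue
            i += length
            break
        else:
            # No valid sequence starts here: escape this one byte and
            # resume decoding from the next byte.
            out.append(chr(0xDC00 + data[i]))
            i += 1
    return "".join(out)
```

On well-formed input this is ordinary UTF-8 decoding; only the stray bytes come out as U+DCxx code points.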
The only drawback I can see is if the UTF-8 bytes actually decode to a half surrogate. However, half surrogates should really only occur in UTF-16 (as I understand it), so they shouldn't be encoded in UTF-8 anyway! As for handling this case, you could either: 1. Raise an exception (which is what you're trying to avoid) or: 2. Treat it as invalid UTF-8 and map the bytes to half surrogates (encoding would produce the original bytes). I'd prefer option 2. Anyway, +1 from me. From p.f.moore at gmail.com Sat Apr 25 16:38:03 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 25 Apr 2009 15:38:03 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F3053D.9090100@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> <49F3053D.9090100@v.loewis.de> Message-ID: <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> 2009/4/25 "Martin v. L?wis" : >> Following on from that, would this (under Martin's proposal) result in >> programs receiving encoded strings, or just semantically-incorrect >> ones? > > Not sure I understand the question - what is an "encoded string"? Sorry. I was struggling to come up with terminology for the various concepts I was trying to express, as I went along. I was meaning a string which has been created from a non-decodable byte sequence using the encoding process you specify in the PEP (with the current version of the PEP, this would be a string with lone half surrogate codes). I was distinguishing these because some people seemed to be implying that such strings were the ones which would result in exceptions. 
(I think that was Stephen, when he referred to a "careful API"). > As you analyse below, sometimes, the current (2.x) file system encoding > will do the right thing; sometimes, it will decode successfully, but > still not give the intended string, and sometimes, it will fail. With > the PEP, it won't fail, but give a string back that likely wasn't > intended by the user. This might be confusing if you try to render it to > a user interface; if the application merely passes it back to file > system APIs, it will work fine. OK, looks like my analysis matches yours, except that I wasn't sure if the third case (a string that "likely wasn't intended") could result in exceptions. From what you're saying, it sounds like it would actually be similar to the second case - I'm not clear on how surrogates work, though. >> So, the next question is - do people on such systems frequently use >> high-bit characters in filenames? > > They typically do until they run into problems. For example, if they > set the locale to something, and then create files in their > homedirectory, it will work just fine, and nobody else will ever see > the files (except for the backup software). > > When they find that the files they created are inaccessible to others, > they will often stop using funny characters. Which sounds fairly practical - and the irony of someone with a "funny character" in his surname telling me this hasn't escaped me :-) Paul. 
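The "decodes successfully, but still not the intended string" case above is easy to reproduce, with latin-1 standing in for a mismatched locale encoding: the user sees mojibake, but re-encoding with the same (wrong) encoding returns the exact original bytes, so file system APIs keep working.

```python
on_disk = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9' on disk
seen = on_disk.decode("latin-1")           # decoded under the wrong locale
assert seen == "caf\u00c3\u00a9"           # "cafÃ©" - not what was intended
assert seen.encode("latin-1") == on_disk   # but round-trips byte-exactly
```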
From martin at v.loewis.de Sat Apr 25 17:00:17 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Apr 2009 17:00:17 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> <49F3053D.9090100@v.loewis.de> <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> Message-ID: <49F32581.1050004@v.loewis.de> > OK, looks like my analysis matches yours, except that I wasn't sure if > the third case (a string that "likely wasn't intended") could result > in exceptions. From what you're saying, it sounds like it would > actually be similar to the second case - I'm not clear on how > surrogates work, though. On decoding, there is a guarantee that it decodes successfully. There is also a guarantee that the result will re-encode successfully, and yield the same byte string. If you pass a different string into encoding, you still may get exceptions. For example, if the filesystem encoding is latin-1, passing u"\u20ac" will continue to raise exceptions, even under the python-escape error handler - that error handler will only handle surrogates. There isn't really that much trickery to surrogates. They *have* to come in pairs to be meaningful, with the first one in the range D800..DBFF (high surrogate), and the second in the range DC00..DFFF (low surrogate). Having a lone low surrogate is not meaningful; this is how the escaping works. 
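The escaping mechanism just described can be seen directly: a lone low surrogate is rejected by strict UTF-8, which is exactly what makes it safe to use as an escape marker. (Shown here with "surrogateescape", the eventual name of the PEP's "python-escape" handler.)

```python
lone = "\udcff"          # a lone low surrogate, meaningless on its own
try:
    lone.encode("utf-8")  # strict encoding rejects lone surrogates
    rejected = False
except UnicodeEncodeError:
    rejected = True
assert rejected
# With the escape handler, the surrogate maps back to the original byte:
assert lone.encode("utf-8", "surrogateescape") == b"\xff"
```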
Proper surrogate pairs encode characters outside the BMP, for use with UTF-16: each code contributes 10 bits (just count how many codes there are in D800..DBFF), together, a pair encodes 20 bits, allowing for 2**20 characters, starting at U+10000. >> When they find that the files they created are inaccessible to others, >> they will often stop using funny characters. > > Which sounds fairly practical - and the irony of someone with a "funny > character" in his surname telling me this hasn't escaped me :-) Sure: my Unix account name was always "loewis", and even on Windows, our admins didn't dare to put the umlaut into the account name - it would be difficult to login with a US keyboard, for example. People who use non-ASCII characters in filenames around here are primarily non-IT people who aren't aware that these characters are different from the rest. I recognize that for other languages (without trivial transliterations) the problem is more severe, and people are more likely to create files with Cyrillic, or Japanese, names (say) if the system accepts them at all. Regards, Martin From martin at v.loewis.de Sat Apr 25 17:05:23 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Apr 2009 17:05:23 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F31C59.1040309@mrabarnett.plus.com> References: <49EEBE2E.3090601@v.loewis.de> <49EF0ADB.2090107@mrabarnett.plus.com> <49EF6F06.9060008@v.loewis.de> <49F11829.9070504@mrabarnett.plus.com> <49F305C1.5070309@v.loewis.de> <49F31C59.1040309@mrabarnett.plus.com> Message-ID: <49F326B3.60908@v.loewis.de> > The only drawback I can see is if the UTF-8 bytes actually decode to a > half surrogate. However, half surrogates should really only occur in > UTF-16 (as I understand it), so they shouldn't be encoded in UTF-8 > anyway! Right: that's the rationale for UTF-8b. 
Encoding half surrogates violates parts of the Unicode spec, so UTF-8b is "safe". > As for handling this case, you could either: > > 1. Raise an exception (which is what you're trying to avoid) > > or: > > 2. Treat it as invalid UTF-8 and map the bytes to half surrogates > (encoding would produce the original bytes). > > I'd prefer option 2. I hadn't thought of this case, but you are right - they *are* illegal bytes, after all. Raising an exception would be useless since the whole point of this codec is to never raise unicode errors. Regards, Martin From zooko at zooko.com Sat Apr 25 17:29:54 2009 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat, 25 Apr 2009 09:29:54 -0600 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <9E5E533C-E323-4937-A296-52F47F68AC3F@zooko.com> Thanks for writing this PEP 383, MvL. I recently ran into this problem in Python 2.x in the Tahoe project [1]. The Tahoe project should be considered a good use case showing what some people need. For example, the assumption that a file will later be written back into the same local filesystem (and thus luckily use the same encoding) from which it originally came doesn't hold for us, because Tahoe is used for file-sharing as well as for backup-and-restore. One of my first conclusions in pursuing this issue is that we can never use the Python 2.x unicode APIs on Linux, just as we can never use the Python 2.x str APIs on Windows [2]. (You mentioned this ugliness in your PEP.) My next conclusion was that the Linux way of doing encoding of filenames really sucks compared to, for example, the Mac OS X way. I'm heartened to see what David Wheeler is trying to persuade the maintainers of Linux filesystems to improve some of this: [3]. 
My final conclusion was that we needed to have two kinds of workaround for the Linux suckage: first, if decoding using the suggested filesystem encoding fails, then we fall back to mojibake [4] by decoding with iso-8859-1 (or else with windows-1252 -- I'm not sure if it matters and I haven't yet understood if utf-8b offers another alternative for this case). Second, if decoding succeeds using the suggested filesystem encoding on Linux, then write down the encoding that we used and include that with the filename. This expands the size of our filenames significantly, but it is the only way to allow some future programmer to undo the damage of a falsely- successful decoding. Here's our whole plan: [5]. Regards, Zooko [1] http://allmydata.org [2] http://allmydata.org/pipermail/tahoe-dev/2009-March/001379.html # see the footnote of this message [3] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html [4] http://en.wikipedia.org/wiki/Mojibake [5] http://allmydata.org/trac/tahoe/ticket/534#comment:47 From phd at phd.pp.ru Sat Apr 25 17:51:57 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 25 Apr 2009 19:51:57 +0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F32581.1050004@v.loewis.de> References: <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> <49F3053D.9090100@v.loewis.de> <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> <49F32581.1050004@v.loewis.de> Message-ID: <20090425155157.GA10071@phd.pp.ru> On Sat, Apr 25, 2009 at 05:00:17PM +0200, "Martin v. 
L?wis" wrote: > I recognize that for other languages (without trivial transliterations) > the problem is more severe, and people are more likely to create > files with Cyrillic, or Japanese, names (say) if the systems accepts > them at all. In different encodings on the same filesystem... Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From murman at gmail.com Sat Apr 25 18:18:20 2009 From: murman at gmail.com (Michael Urman) Date: Sat, 25 Apr 2009 11:18:20 -0500 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F32581.1050004@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> <49F3053D.9090100@v.loewis.de> <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> <49F32581.1050004@v.loewis.de> Message-ID: On Sat, Apr 25, 2009 at 10:00, "Martin v. L?wis" wrote: > On decoding, there is a guarantee that it decodes successfully. There is > also a guarantee that the result will re-encode successfully, and yield > the same byte string. > > If you pass a different string into encoding, you still may get > exceptions. For example, if the filesystem encoding is latin-1, > passing u"\u20ac" will continue to raise exceptions, even under the > python-escape error handler - that error handler will only handle > surrogates. One angle I've not seen discussed yet is a set of use cases. While the PEP addresses the need for the python developer to not have to write insane conditional code that maps between bytes and str depending on the platform, it doesn't talk about what this allows an application to provide to a user, and at what risks. 
I see two main user-oriented use cases for the resulting Unicode strings this PEP will produce on all systems: displaying a list of filenames for the user to select from (an open file dialog), and allowing a user to edit or supply a filename (a save dialog or a rename control). It's clear what this PEP provides for the former. On well-behaved systems where a simpler filesystemencoding approach would work, the results are identical; the user can select filenames that are what he expects to see on both Unix and Windows. On less well-behaved systems, some characters may appear as junk in the middle of the name (or would they be invisible?), but should be recognizable enough to choose, or at least to open sequentially and remember what the last one was. On particularly poorly behaved systems, the results will be extremely difficult to read, but no approach is likely to fix this. What I don't find clear is what the risks are for the latter. On the less well behaved system, a user may well attempt to use this python application to fix filenames. Can we estimate a likelihood that edits to the names would result in a Unicode string that can no longer be encoded with the python-escape? Will a new name fully provided by a user on his keyboard (ignoring copy and paste) almost always safely encode? 
-- Michael Urman From martin at v.loewis.de Sat Apr 25 18:33:17 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 25 Apr 2009 18:33:17 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> <49F3053D.9090100@v.loewis.de> <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> <49F32581.1050004@v.loewis.de> Message-ID: <49F33B4D.3070707@v.loewis.de> > I see two main user-oriented use cases for the resulting Unicode > strings this PEP will produce on all systems: displaying a list of > filenames for the user to select from (an open file dialog), and > allowing a user to edit or supply a filename (a save dialog or a > rename control). There are more, in particular the case "user passes a file name on the command line", and "web server passes URL in environment variable". > It's clear what this PEP provides for the former. On well-behaved > systems where a simpler filesystemencoding approach would work, the > results are identical; the user can select filenames that are what he > expects to see on both Unix and Windows. On less well-behaved systems, > some characters may appear as junk in the middle of the name (or would > they be invisible?) Depends on the rendering. Try "print u'\udc00'" in your terminal to see what happens; for me, it renders the glyph for "replacement character". In GUI applications, you often see white boxes (rectangles). > What I don't find clear is what the risks are for the latter. On the > less well behaved system, a user may well attempt to use this python > application to fix filenames. 
Can we estimate a likelihood that edits > to the names would result in a Unicode string that can no longer be > encoded with the python-escape? Will a new name fully provided by a > user on his keyboard (ignoring copy and paste) almost always safely > encode? That very much depends on the system setup, and your impression is right that the PEP doesn't address it - it only deals with cases where you get random unsupported bytes; getting random unsupported characters from the user is not considered. If the user has the locale setup in way that matches his keyboard, it should work all fine - and will already, even without the PEP. If the user enters a character that doesn't directly map to a good file name, you get an exception, and have to tell the user to pick a different filename. Notice that it may fail at several layers: - it may be that characters entered are not supported in what Python choses as the file system encoding. - it may be that the characters are not supported by the file system, e.g. leading spaces in Win32. - it may be that the file cannot be renamed because the target name already exists. In all these cases, the application has to ask the user to reconsider; for at least the last case, it should be prepared to do that, anyway (there is also the case where renaming fails because of lack of permissions; in that case, picking a different file name won't help). 
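The first of those failure layers can be sketched concretely, with latin-1 standing in for the file system encoding as in Martin's earlier euro-sign example: the escape handler only rescues the DC80..DCFF surrogates that came from undecodable bytes, not ordinary characters the encoding cannot represent.

```python
# An escaped byte survives encoding under any single-byte codec...
assert "\udcff".encode("latin-1", "surrogateescape") == b"\xff"

# ...but a genuinely unencodable character still raises, handler or not.
try:
    "\u20ac".encode("latin-1", "surrogateescape")  # U+20AC EURO SIGN
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised
```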
Regards, Martin From solipsis at pitrou.net Sat Apr 25 18:34:13 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 25 Apr 2009 16:34:13 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?= =?utf-8?q?haracter=09Interfaces?= References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> Message-ID: Paul Moore gmail.com> writes: > But those > people are also the *least* likely people to contribute on an > English-speaking list, I guess (Sincere apologies if everyone but > me on this list happens to actually be fluent English-speaking > Russians ) Actually, we're all Finnish. Regards, ?ntoine. From murman at gmail.com Sat Apr 25 18:48:21 2009 From: murman at gmail.com (Michael Urman) Date: Sat, 25 Apr 2009 11:48:21 -0500 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F33B4D.3070707@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> <49F3053D.9090100@v.loewis.de> <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> <49F32581.1050004@v.loewis.de> <49F33B4D.3070707@v.loewis.de> Message-ID: On Sat, Apr 25, 2009 at 11:33, "Martin v. L?wis" wrote: > If the user has the locale setup in way that matches his keyboard, > it should work all fine - and will already, even without the PEP. > If the user enters a character that doesn't directly map to a > good file name, you get an exception, and have to tell the user > to pick a different filename. 
This sounds good so far - the 90% (or higher) case is still clean. > Notice that it may fail at several layers: > - it may be that characters entered are not supported in what > Python choses as the file system encoding. > - it may be that the characters are not supported by the file > system, e.g. leading spaces in Win32. > - it may be that the file cannot be renamed because the target > name already exists. > In all these cases, the application has to ask the user to > reconsider; for at least the last case, it should be prepared > to do that, anyway (there is also the case where renaming fails > because of lack of permissions; in that case, picking a different > file name won't help). This argument sounds good to me too. How will we communicate to developers what new exception might occur where? It would be a shame to have a solid application developed under Windows start raising encoding exceptions on Linux. Would the encoding error get mapped to an IOError for all file APIs that do this encoding? -- Michael Urman From google at mrabarnett.plus.com Sat Apr 25 19:27:47 2009 From: google at mrabarnett.plus.com (MRAB) Date: Sat, 25 Apr 2009 18:27:47 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F33B4D.3070707@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> <49F3053D.9090100@v.loewis.de> <79990c6b0904250738i51be782fqc060870f865f6cd0@mail.gmail.com> <49F32581.1050004@v.loewis.de> <49F33B4D.3070707@v.loewis.de> Message-ID: <49F34813.5030206@mrabarnett.plus.com> Martin v. 
L?wis wrote: >> I see two main user-oriented use cases for the resulting Unicode >> strings this PEP will produce on all systems: displaying a list of >> filenames for the user to select from (an open file dialog), and >> allowing a user to edit or supply a filename (a save dialog or a >> rename control). > > There are more, in particular the case "user passes a file name > on the command line", and "web server passes URL in environment > variable". > >> It's clear what this PEP provides for the former. On well-behaved >> systems where a simpler filesystemencoding approach would work, the >> results are identical; the user can select filenames that are what he >> expects to see on both Unix and Windows. On less well-behaved systems, >> some characters may appear as junk in the middle of the name (or would >> they be invisible?) > > Depends on the rendering. Try "print u'\udc00'" in your terminal to see > what happens; for me, it renders the glyph for "replacement character". > In GUI applications, you often see white boxes (rectangles). > >> What I don't find clear is what the risks are for the latter. On the >> less well behaved system, a user may well attempt to use this python >> application to fix filenames. Can we estimate a likelihood that edits >> to the names would result in a Unicode string that can no longer be >> encoded with the python-escape? Will a new name fully provided by a >> user on his keyboard (ignoring copy and paste) almost always safely >> encode? > > That very much depends on the system setup, and your impression is > right that the PEP doesn't address it - it only deals with cases > where you get random unsupported bytes; getting random unsupported > characters from the user is not considered. > > If the user has the locale setup in way that matches his keyboard, > it should work all fine - and will already, even without the PEP. 
> If the user enters a character that doesn't directly map to a > good file name, you get an exception, and have to tell the user > to pick a different filename. > > Notice that it may fail at several layers: > - it may be that characters entered are not supported in what > Python choses as the file system encoding. > - it may be that the characters are not supported by the file > system, e.g. leading spaces in Win32. > - it may be that the file cannot be renamed because the target > name already exists. > In all these cases, the application has to ask the user to > reconsider; for at least the last case, it should be prepared > to do that, anyway (there is also the case where renaming fails > because of lack of permissions; in that case, picking a different > file name won't help). > This has made me think about what happens going the other way, ie when a user-supplied Unicode string needs to be converted to UTF-8b. That should also be reversible. Therefore: When encoding using UTF-8b, codepoints in the range U+DC80..U+DCFF should map to bytes 0x80..0xFF; all other codepoints, including the remaining half surrogates, should be encoded normally. When decoding using UTF-8b, undecodable bytes in the range 0x80..0xFF should map to U+DC80..U+DCFF; all other bytes, including the encodings for the remaining half surrogates, should be decoded normally. This will ensure that even when the user has provided a string containing half surrogates it can be encoded to bytes and then decoded back to the original string. 
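The symmetric mapping just described gives exactly the round-trip guarantee in question, and can be exercised with Python 3's "surrogateescape" handler (the eventual name of the PEP's "python-escape"):

```python
raw = b"caf\xe9-\xff\xfe"                 # not valid UTF-8
s = raw.decode("utf-8", "surrogateescape")
assert s == "caf\udce9-\udcff\udcfe"      # undecodable bytes -> U+DC80..U+DCFF
assert s.encode("utf-8", "surrogateescape") == raw   # byte-exact round trip
```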
From asmodai at in-nomine.org Sat Apr 25 21:31:40 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Sat, 25 Apr 2009 21:31:40 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> References: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <79990c6b0904250200m5dd87ec4n715d09bb16785591@mail.gmail.com> Message-ID: <20090425193140.GV10900@nexus.in-nomine.org> -On [20090425 11:01], Paul Moore (p.f.moore at gmail.com) wrote: >PS Unfortunately, I suspect that the biggest group of people likely to >be hit badly by this is people using non-latin scripts. And arguing >probabilities without real data is optimistic at best. But those >people are also the *least* likely people to contribute on an >English-speaking list, I guess :-( (Sincere apologies if everyone but >me on this list happens to actually be fluent English-speaking >Russians :-)) Even though I am Dutch I have to deal with a variety of scripts for my i18n and L10n efforts, which includes contributions to Unicode. Aside from that I also have the fair share of audio files which have the names/descriptions in the respective script (Thai, Korean, Chinese, Taiwanese, Japanese, and so on). -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Necessity relieves us of the ordeal of choice... 
From eric at trueblade.com Sun Apr 26 03:28:17 2009 From: eric at trueblade.com (Eric Smith) Date: Sat, 25 Apr 2009 21:28:17 -0400 Subject: [Python-Dev] [Python-checkins] r71946 - peps/trunk/pep-0315.txt In-Reply-To: <20090426003437.05F081E4022@bag.python.org> References: <20090426003437.05F081E4022@bag.python.org> Message-ID: <49F3B8B1.6090707@trueblade.com> You might want to note in the PEP that the problem that's being solved is known as the "loop and a half" problem. http://www.cs.duke.edu/~ola/patterns/plopd/loops.html#loop-and-a-half raymond.hettinger wrote: > Author: raymond.hettinger > Date: Sun Apr 26 02:34:36 2009 > New Revision: 71946 > > Log: > Revive PEP 315. > > Modified: > peps/trunk/pep-0315.txt > > Modified: peps/trunk/pep-0315.txt > ============================================================================== > --- peps/trunk/pep-0315.txt (original) > +++ peps/trunk/pep-0315.txt Sun Apr 26 02:34:36 2009 > @@ -2,9 +2,9 @@ > Title: Enhanced While Loop > Version: $Revision$ > Last-Modified: $Date$ > -Author: W Isaac Carroll > - Raymond Hettinger > -Status: Deferred > +Author: Raymond Hettinger > + W Isaac Carroll > +Status: Draft > Type: Standards Track > Content-Type: text/plain > Created: 25-Apr-2003 > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > From cs at zip.com.au Sun Apr 26 03:51:13 2009 From: cs at zip.com.au (Cameron Simpson) Date: Sun, 26 Apr 2009 11:51:13 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F2FD10.9080302@v.loewis.de> Message-ID: <20090426015113.GA17300@cskk.homeip.net> On 25Apr2009 14:07, "Martin v. L?wis" wrote: | Cameron Simpson wrote: | > On 22Apr2009 08:50, Martin v. L?wis wrote: | > | File names, environment variables, and command line arguments are | > | defined as being character data in POSIX; | > | > Specific citation please? 
I'd like to check the specifics of this. | For example, on environment variables: | http://opengroup.org/onlinepubs/007908799/xbd/envvar.html [...] | http://opengroup.org/onlinepubs/007908799/xsh/execve.html [...] Thanks. | > So you're proposing that all POSIX OS interfaces (which use byte strings) | > interpret those byte strings into Python3 str objects, with a codec | > that will accept arbitrary byte sequences losslessly and is totally | > reversible, yes? | | Correct. | | > And, I hope, that the os.* interfaces silently use it by default. | | Correct. Ok, then I'm probably good with the PEP. Though I have a quite strong desire to be able to work in bytes at need without doing multiple encode/decode steps. | > | Applications that need to process the original byte | > | strings can obtain them by encoding the character strings with the | > | file system encoding, passing "python-escape" as the error handler | > | name. | > | > -1 | > This last sentence kills the idea for me, unless I'm missing something. | > Which I may be, of course. | > POSIX filesystems _do_not_ have a file system encoding. | | Why is that a problem for the PEP? Because you said above "by encoding the character strings with the file system encoding", which is a fiction. | > If I'm writing a general purpose UNIX tool like chmod or find, I expect | > it to work reliably on _any_ UNIX pathname. It must be totally encoding | > blind. If I speak to the os.* interface to open a file, I expect to hand | > it bytes and have it behave. | | See the other messages. If you want to do that, you can continue to. | | > I'm very much in favour of being able to work in strings for most | > purposes, but if I use the os.* interfaces on a UNIX system it is | > necessary to be _able_ to work in bytes, because UNIX file pathnames | > are bytes. | | Please re-read the PEP. It provides a way of being able to access any | POSIX file name correctly, and still pass strings. 
| | > If there isn't a byte-safe os.* facility in Python3, it will simply be | > unsuitable for writing low level UNIX tools. | | Why is that? The mechanism in the PEP is precisely defined to allow | writing low level UNIX tools. Then implicitly it's byte safe. Clearly I'm being unclear; I mean original OS-level byte strings must be obtainable undamaged, and it must be possible to create/work on OS objects starting with a byte string as the pathname. | > Finally, I have a small python program whose whole purpose in life | > is to transcode UNIX filenames before transfer to a MacOSX HFS | > directory, because of HFS's enforced particular encoding. What approach | > should a Python app take to transcode UNIX pathnames under your scheme? | | Compute the corresponding character strings, and use them. In Python2 I've been going (ignoring checks for unchanged names): - Obtain the old name and interpret it into a str() "correctly". I mean here that I go: unicode_name = unicode(name, srcencoding) in old Python2 speak. name is a bytes string obtained from listdir() and srcencoding is the encoding known to have been used when the old name was constructed. Eg iso8859-1. - Compute the new name in the desired encoding. For MacOSX HFS, that's: utf8_name = unicodedata.normalize('NFD',unicode_name).encode('utf8') Still in Python2 speak, that's a byte string. - os.rename(name, utf8_name) Under your scheme I imagine this is amended. I would change your listdir_b() function as follows: def listdir_b(bytestring, fse=None): if fse is None: fse = sys.getfilesystemencoding() string = bytestring.decode(fse, "python-escape") for fn in os.listdir(string): yield fn.encode(fse, "python-escape") So, internally, os.listdir() takes a string and encodes it to an _unspecified_ encoding in bytes, and opens the directory with that byte string using POSIX opendir(3). 
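For what it's worth, the escaping round trip itself is lossless. A minimal sketch, using the error handler under the name it eventually shipped with, 'surrogateescape' (the PEP draft calls it 'python-escape'): undecodable bytes come back as lone surrogates in U+DC80..U+DCFF, and re-encoding with the same handler restores the original bytes exactly.

```python
# Round-trip sketch for PEP 383's escaping: undecodable bytes map to
# lone surrogates and are restored on encoding with the same handler.
raw = b"gr\xfcn.txt"  # Latin-1 bytes; 0xFC is not valid UTF-8

name = raw.decode("utf-8", "surrogateescape")
assert name == "gr\udcfcn.txt"  # the stray byte survives as U+DCFC

restored = name.encode("utf-8", "surrogateescape")
assert restored == raw  # lossless: the original bytes come back
```

Note that such a string cannot be printed or strictly re-encoded without care; the surrogate only exists to carry the byte through.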
How does listdir() ensure that the byte string it passes to the underlying opendir(3) is identical to 'bytestring' as passed to listdir_b()? It seems from the PEP that "On POSIX systems, Python currently applies the locale's encoding to convert the byte data to Unicode". Your extension is to augment that by expressing the non-decodable byte sequences in a non-conflicting way for reversal later, yes? That seems to double the complexity of my example application, since it wants to interpret the original bytes in a caller-specified fashion, not using the locale defaults. So I must go: def macify(dirname, srcencoding): # I need this to reverse your encoding scheme fse = sys.getfilesystemencoding() # I'll pretend dirname is ready for use # it possibly has had to undergo the inverse of what happens inside # the loop below for fn in listdir(dirname): # listdir reads POSIX-bytes from readdir(3) # then encodes using the locale encoding, with your escape addition bytename = fn.encode(fse, "python-escape") oldname = unicode(bytename, srcencoding) newbytename = unicodedata.normalize('NFD', oldname).encode('utf8') newname = newbytename.decode(fse, "python-escape") if fn != newname: os.rename(fn, newname) And I'm sure there's some os.path.join() complexity I have omitted. Is that correct? You'll note I need to recode the oldname unicode string because I don't know that fse is the same as the required target MacOSX UTF8 NFD encoding. So if my changes above are correct WRT the PEP, I grant that this is still doable in your scheme. But it would be far far easier with a bytes API. And let us not consider threads or other effects from locale changes during the loop run. I forget what was decided with the pure-bytes interfaces (out of scope for your PEP). Would there be a posix module with a bytes API? Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ The old day of Perl's try-it-before-you-use-it are long as gone. 
Nowadays you can write as many as 20..100 lines of Perl without hitting a bug in the perl implementation. - Ilya Zakharevich , in the perl-porters list, 22sep1998 From dickinsm at gmail.com Sun Apr 26 12:06:56 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 26 Apr 2009 11:06:56 +0100 Subject: [Python-Dev] Two proposed changes to float formatting Message-ID: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> I'd like to propose two minor changes to float and complex formatting, for 3.1. I don't think either change should prove particularly disruptive. (1) Currently, '%f' formatting automatically changes to '%g' formatting for numbers larger than 1e50. For example: >>> '%f' % 2**166. '93536104789177786765035829293842113257979682750464.000000' >>> '%f' % 2**167. '1.87072e+50' I propose removing this feature for 3.1 More details: The current behaviour is documented (standard library->builtin types). (Until very recently, it was actually misdocumented as changing at 1e25, not 1e50.) """For safety reasons, floating point precisions are clipped to 50; %f conversions for numbers whose absolute value is over 1e50 are replaced by %g conversions. [5] All other errors raise exceptions.""" There's even a footnote: """[5] These numbers are fairly arbitrary. They are intended to avoid printing endless strings of meaningless digits without hampering correct use and without having to know the exact precision of floating point values on a particular machine.""" I don't find this particularly convincing, though---I just don't see a really good reason not to give the user exactly what she/he asks for here. I have a suspicion that at least part of the motivation for the '%f' -> '%g' switch is that it means the implementation can use a fixed-size buffer. But Eric has fixed this (in 3.1, at least) and the buffer is now dynamically allocated, so this isn't a concern any more. 
Other reasons not to switch from '%f' to '%g' in this way: - the change isn't gentle: as you go over the 1e50 boundary, the number of significant digits produced suddenly changes from 56 to 6; it would make more sense to me if it stayed fixed at 56 sig digits for numbers larger than 1e50. - now that we're using David Gay's 'perfect rounding' code, we can be sure that the digits aren't entirely meaningless, or at least that they're the 'right' meaningless digits. This wasn't true before. - C doesn't do this, and the %f, %g, %e formats really owe their heritage to C. - float formatting is already quite complicated enough; no need to add to the mental complexity - removal simplifies the implementation :-) On to the second proposed change: (2) complex str and repr don't behave like float str and repr, in that the float version always adds a trailing '.0' (unless there's an exponent), but the complex version doesn't: >>> 4., 10. (4.0, 10.0) >>> 4. + 10.j (4+10j) I propose changing the complex str and repr to behave like the float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)" rather than "(4+10j)". Mostly this is just about consistency, ease of implementation, and aesthetics. As far as I can tell, the extra '.0' in the float repr serves two closely-related purposes: it makes it clear to the human reader that the number is a float rather than an integer, and it makes sure that e.g., eval(repr(x)) recovers a float rather than an int. The latter point isn't a concern for the current complex repr, but the former is: 4+10j looks to me more like a Gaussian integer than a complex number. Any comments? 
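For concreteness, here is what proposal (1) looks like on an interpreter with the switch removed (on a pre-change interpreter the second string collapses to '1.87072e+50'):

```python
# With the %f -> %g switch removed, both sides of the old 1e50
# boundary stay in fixed-point notation; only the digit count grows.
small = '%f' % 2**166.  # just under 1e50
large = '%f' % 2**167.  # just over 1e50

assert 'e' not in small and 'e' not in large  # never scientific notation
assert large.endswith('.000000')              # six digits after the point
assert len(large) == len(small) + 1           # one more digit, no collapse
```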
Mark From steve at pearwood.info Sun Apr 26 13:17:57 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 26 Apr 2009 21:17:57 +1000 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> Message-ID: <200904262117.58087.steve@pearwood.info> On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote: > I'd like to propose two minor changes to float and complex > formatting, for 3.1. I don't think either change should prove > particularly disruptive. > > (1) Currently, '%f' formatting automatically changes to '%g' > formatting for numbers larger than 1e50. ... > I propose removing this feature for 3.1 No objections from me. +1 > I propose changing the complex str and repr to behave like the > float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)" > rather than "(4+10j)". No objections here either. +0 -- Steven D'Aprano From fuzzyman at voidspace.org.uk Sun Apr 26 15:10:50 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 26 Apr 2009 14:10:50 +0100 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <200904262117.58087.steve@pearwood.info> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> <200904262117.58087.steve@pearwood.info> Message-ID: <49F45D5A.1020401@voidspace.org.uk> Steven D'Aprano wrote: > On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote: > >> I'd like to propose two minor changes to float and complex >> formatting, for 3.1. I don't think either change should prove >> particularly disruptive. >> >> (1) Currently, '%f' formatting automatically changes to '%g' >> formatting for numbers larger than 1e50. >> > ... > >> I propose removing this feature for 3.1 >> > > No objections from me. +1 > > >> I propose changing the complex str and repr to behave like the >> float version. That is, repr(4. 
+ 10.j) should be "(4.0 + 10.0j)" >> rather than "(4+10j)". >> > > No objections here either. +0 > > > > Doing it sooner rather than later means that it is less likely to disrupt anyone relying on the representation (i.e. doctests). Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From stephen at xemacs.org Sun Apr 26 15:47:44 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 26 Apr 2009 22:47:44 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> Message-ID: <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > 2009/4/24 Stephen J. Turnbull : > > Paul Moore writes: > > > > ?> The pros for Martin's proposal are a uniform cross-platform interface, > > ?> and a user-friendly API for the common case. > > > > A more accurate phrasing would be "... a user-friendly API for those > > who feel very lucky today." ?Which is the common case, of course, but > > spins a little differently. > > Sorry, but I think you're misrepresenting things. I'd have probably > let you off if you'd missed out the "very" - but I do think that it's > the common case. Consider: If you need reliability, then you can't get it this way. The reason "very" is (somewhat) justified is that this kind of issue is a little like unemployment. You hardly ever meet someone who's 7.2% unemployed, but you probably know several who are 100% unemployed. 
If you see a broken encoding once, you're likely to see it a million times (spammers have the most broken software) or maybe have it raise an unhandled Exception a dozen times (in rate of using busted software, the spammers are closely followed by bosses---which would be very bad, eh, if you 2/3 of the mail from your boss ends up in an undeliverables queue due to encoding errors that are unhandled by your some filter in your mail pipeline). > - Windows systems where broken Unicode (lone surrogates or whatever) > isn't involved > - Unix systems where the user's stated filesystem encoding is correct > Can you honestly say that this isn't the vast majority of real-world > environments? Again, that's not the point. The point is that six-sigma reliability world-wide is not going to be very comforting to the poor souls who happen to have broken software in their environment sending broken encodings regularly, because they're going to be dealing with one or two sigmas, and that's just not good enough in a production environment. > > If you didn't start with a valid string in a known encoding, you > > shouldn't treat it as characters because it's not. > > Again, that's the purist argument. If you have a string (of bytes, I > guess) and a 99% certain guess as to the correct encoding, then I'd > argue that, as long as (a) it's not mission-critical (lives or backups > depend on it) Assurance that you can even determine (a) is not provided by the PEP. There is no way to contain a problem if it should occur, because it's "just a string" and could go anywhere, and get converted back or otherwise manipulated in a context that doesn't know how to handle it (which might not even be Python if a C-level extension is involved). 
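That failure mode is easy to demonstrate with the handler as it eventually shipped ('surrogateescape'): the escaped string is accepted everywhere strings go, and only blows up when something far from the original call site re-encodes it strictly.

```python
# A string carrying an escaped byte decodes fine at the boundary,
# but any strict re-encoding later raises far from the decode site.
s = b"report-\xff.txt".decode("utf-8", "surrogateescape")

try:
    s.encode("utf-8")  # e.g. writing to a UTF-8 log, socket, or database
    failed = False
except UnicodeEncodeError:
    failed = True

assert failed  # the error surfaces here, not at decode time
```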
Given that Python has no internal mechanism for saying "in this area only valid Unicode will be accepted", it seems likely that mission critical software *will* interact with this feature, if only indirectly (or perhaps only in software originally intended for use in the U.S. only, but then it gets exported, etc). > and (b) you have a means of failing relatively > gracefully, you have every reason to make the assumption about > encoding. (b) is not provided in the PEP, either. We have no idea what the failure mode will be. > After all, what's the alternative? The alternative is to refuse to provide a simple standard way to decode unreliably, and in that way make the user reponsible for an explicit choice about what level and kinds of unreliability they will accept. I realize that's unpalatable to most people who use Python to develop software, and so I'm unwilling to go even -0 on the PEP. However, to give one example, I've been following Mailman development for about 10 years, and it is a dismal story despite a group of developers very sympathetic to encoding and multicultural issues. As recently as Mailman 2.10 (IIRC) there were *still* bugs in encoding handling that could stop the show (ie, not only did the buggy post not get processed, but the exception propagated high enough to cause everything behind it in the queue to fail, too). I think it would be sad if ten years from now there was software using this technique and failing occasionally. From dickinsm at gmail.com Sun Apr 26 15:45:47 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 26 Apr 2009 14:45:47 +0100 Subject: [Python-Dev] Bug tracker down? Message-ID: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com> The bugs.python.org site seems to be down. 
ping gives me the following (from Ireland): Macintosh-4:py3k dickinsm$ ping bugs.python.org PING bugs.python.org (88.198.142.26): 56 data bytes 36 bytes from et.2.16.rs3k6.rz5.hetzner.de (213.239.244.101): Destination Host Unreachable Vr HL TOS Len ID Flg off TTL Pro cks Src Dst 4 5 00 5400 77e1 0 0000 3a 01 603d 192.168.1.2 88.198.142.26 Various others on #python-dev have confirmed that it's not working for them. Does anyone know what the problem is? Mark From aahz at pythoncraft.com Sun Apr 26 17:19:46 2009 From: aahz at pythoncraft.com (Aahz) Date: Sun, 26 Apr 2009 08:19:46 -0700 Subject: [Python-Dev] Bug tracker down? In-Reply-To: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com> References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com> Message-ID: <20090426151946.GB17459@panix.com> On Sun, Apr 26, 2009, Mark Dickinson wrote: > > The bugs.python.org site seems to be down. Dunno -- forwarded to the people who can do something about it. (There's a migration to a new mailserver going on, but I don't think this is related.) -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From dickinsm at gmail.com Sun Apr 26 18:35:30 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 26 Apr 2009 17:35:30 +0100 Subject: [Python-Dev] Bug tracker down? In-Reply-To: <20090426151946.GB17459@panix.com> References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com> <20090426151946.GB17459@panix.com> Message-ID: <5c6f2a5d0904260935v6d08ebb8y49896f145889f49c@mail.gmail.com> On Sun, Apr 26, 2009 at 4:19 PM, Aahz wrote: > On Sun, Apr 26, 2009, Mark Dickinson wrote: >> >> The bugs.python.org site seems to be down. > > Dunno -- forwarded to the people who can do something about it. ?(There's > a migration to a new mailserver going on, but I don't think this is > related.) Thanks. 
Who should I contact next time, to avoid spamming python-dev? Mark From aahz at pythoncraft.com Sun Apr 26 18:36:48 2009 From: aahz at pythoncraft.com (Aahz) Date: Sun, 26 Apr 2009 09:36:48 -0700 Subject: [Python-Dev] Bug tracker down? In-Reply-To: <5c6f2a5d0904260935v6d08ebb8y49896f145889f49c@mail.gmail.com> References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com> <20090426151946.GB17459@panix.com> <5c6f2a5d0904260935v6d08ebb8y49896f145889f49c@mail.gmail.com> Message-ID: <20090426163648.GA6892@panix.com> On Sun, Apr 26, 2009, Mark Dickinson wrote: > On Sun, Apr 26, 2009 at 4:19 PM, Aahz wrote: >> On Sun, Apr 26, 2009, Mark Dickinson wrote: >>> >>> The bugs.python.org site seems to be down. >> >> Dunno -- forwarded to the people who can do something about it. ?(There's >> a migration to a new mailserver going on, but I don't think this is >> related.) > > Thanks. Who should I contact next time, to avoid spamming python-dev? python-dev isn't a bad place (because it alerts the core developers), but you can also send a message to pydotorg at python.org -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From eric at trueblade.com Sun Apr 26 18:59:14 2009 From: eric at trueblade.com (Eric Smith) Date: Sun, 26 Apr 2009 12:59:14 -0400 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> Message-ID: <49F492E2.7010702@trueblade.com> Mark Dickinson wrote: > I'd like to propose two minor changes to float and complex > formatting, for 3.1. I don't think either change should prove > particularly disruptive. > > (1) Currently, '%f' formatting automatically changes to '%g' formatting for > numbers larger than 1e50. For example: ... 
> I propose removing this feature for 3.1 I'm +1 on this. > I have a suspicion that at least part of the > motivation for the '%f' -> '%g' switch is that it means the > implementation can use a fixed-size buffer. But Eric has > fixed this (in 3.1, at least) and the buffer is now dynamically > allocated, so this isn't a concern any more. I agree that this is a big part of the reason it was done. There's still some work to be done in the fallback code which we use if we can't use Gay's implementation of _Py_dg_dtoa. But it's reasonably easy to calculate the maximum buffer size needed given the precision, for passing on to PyOS_snprintf. (At least I think that sentence is true, I'll verify with Mark offline). > Other reasons not to switch from '%f' to '%g' in this way: > > - the change isn't gentle: as you go over the 1e50 boundary, > the number of significant digits produced suddenly changes > from 56 to 6; it would make more sense to me if it > stayed fixed at 56 sig digits for numbers larger than 1e50. This is the big reason for me. > - float formatting is already quite complicated enough; no > need to add to the mental complexity And this, too. > (2) complex str and repr don't behave like float str and repr, in that > the float version always adds a trailing '.0' (unless there's an > exponent), but the complex version doesn't: ... > I propose changing the complex str and repr to behave like the > float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)" > rather than "(4+10j)". I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm not sure about the spaces around the sign. If we do want the spaces there, we can get rid of Py_DTSF_SIGN, since that's the only place it's used and we won't be able to use it for complex going forward. Eric. 
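The status quo that proposal (2) is aimed at is easy to check; in current CPython (where, as far as I know, the complex change never landed) the inconsistency looks like this:

```python
# float repr marks the type with a trailing .0; complex repr does not.
assert repr(4.0) == '4.0'
assert repr(10.0) == '10.0'
assert repr(4.0 + 10.0j) == '(4+10j)'  # components look like integers

# eval(repr(x)) still round-trips, but the display reads like a
# Gaussian integer rather than a pair of floats.
z = 4.0 + 10.0j
assert eval(repr(z)) == z
```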
From dickinsm at gmail.com Sun Apr 26 19:40:44 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 26 Apr 2009 18:40:44 +0100 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <49F492E2.7010702@trueblade.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> <49F492E2.7010702@trueblade.com> Message-ID: <5c6f2a5d0904261040m4fbdcc14rd0f81c37ce4bf85b@mail.gmail.com> On Sun, Apr 26, 2009 at 5:59 PM, Eric Smith wrote: > Mark Dickinson wrote: >> I propose changing the complex str and repr to behave like the >> float version. ?That is, repr(4. + 10.j) should be "(4.0 + 10.0j)" >> rather than "(4+10j)". > > I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm > not sure about the spaces around the sign. If we do want the spaces there, Whoops. The spaces were a mistake: I'm not proposing to add those. I meant "(4.0+10.0j)" rather than "(4.0 + 10.0j)". Mark From tseaver at palladion.com Sun Apr 26 20:03:12 2009 From: tseaver at palladion.com (Tres Seaver) Date: Sun, 26 Apr 2009 14:03:12 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F215E5.4050205@g.nevcal.com> <49F22E74.4070108@gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Terry Reedy wrote: > Is NUL \0 allowed in POSIX file names? If not, could that be used as an > escape char. If it is not legal, then custom translated strings that > escape in the wild would raise a red flag as soon as something else > tried to use them. 
Per David Wheeler's excellent "Fixing Linux/Unix/POSIX Filenames"[1]: Traditionally, Unix/Linux/POSIX filenames can be almost any sequence of bytes, and their meaning is unassigned. The only real rules are that '/' is always the directory separator, and that filenames can't contain byte 0 (because this is the terminator). [1] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJ9KHg+gerLs4ltQ4RAs0HAKCiAOxmB8oBJRIoOIK+OK2LryUN6ACgp64k fzGUNScJwcdzzod3N+5JhOE= =Cw4m -----END PGP SIGNATURE----- From martin at v.loewis.de Sun Apr 26 21:03:00 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 26 Apr 2009 21:03:00 +0200 Subject: [Python-Dev] Bug tracker down? In-Reply-To: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com> References: <5c6f2a5d0904260645ob40d38h5b6fb5043801b5ea@mail.gmail.com> Message-ID: <49F4AFE4.6040709@v.loewis.de> > Does anyone know what the problem is? The hardware running it apparently has serious problems. Upfronthosting, the company providing the hardware, is working on a solution. Unfortunately, it is difficult to get support from the datacenter on weekends. Regards, Martin From Scott.Daniels at Acm.Org Sun Apr 26 21:11:36 2009 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sun, 26 Apr 2009 12:11:36 -0700 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> Message-ID: Mark Dickinson wrote: > ... """[5] These numbers are fairly arbitrary. 
They are intended to > avoid printing endless strings of meaningless digits without > hampering correct use and without having to know the exact > precision of floating point values on a particular machine.""" > I don't find this particularly convincing, though---I just don't see > a really good reason not to give the user exactly what she/he > asks for here. As a user of Idle, I would not like to see the change you seek of having %f stay full-precision. When a number gets too long to print on a single line, the wrap depends on the current window width, and is calculated dynamically. One section of the display with an 8000-digit (100-line) text makes Idle slow to scroll around in. It is too easy for numbers to go massively positive in a bug. > - the change isn't gentle: as you go over the 1e50 boundary, > the number of significant digits produced suddenly changes > from 56 to 6; it would make more sense to me if it > stayed fixed at 56 sig digits for numbers larger than 1e50. > - now that we're using David Gay's 'perfect rounding' > code, we can be sure that the digits aren't entirely > meaningless, or at least that they're the 'right' meaningless > digits. This wasn't true before. However, this is, I agree, a problem. 
Since all of these numbers should end in a massive number of zeroes, how about we replace only the trailing zeroes with the e, so we wind up with: 1157920892373161954235709850086879078532699846656405640e+23 or 115792089237316195423570985008687907853269984665640564.0e+24 or some such, rather than 1.157920892373162e+77 or 1.15792089237316195423570985008687907853269984665640564e+77 --Scott David Daniels Scott.Daniels at Acm.Org From dickinsm at gmail.com Sun Apr 26 21:19:20 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 26 Apr 2009 20:19:20 +0100 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> Message-ID: <5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com> On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote: > As a user of Idle, I would not like to see the change you seek of having %f stay full-precision. When a number gets too long to print on a single line, the wrap depends on the current window width, and is calculated dynamically. One section of the display with an 8000-digit (100-line) text makes Idle slow to scroll around in. It is too easy for numbers to go massively positive in a bug. I see your point. Since we're talking about floats, though, there should never be more than 316 characters in a '%f' % x: the largest float is around 1.8e308, giving 309 digits before the point, 6 after, a decimal point, and possibly a minus sign. (Assuming that your platform uses IEEE 754 doubles.) > However, this is, I agree, a problem. Since all of these numbers > should end in a massive number of zeroes But they typically don't end in zeros (except the six zeros following the point), because they're stored in binary rather than decimal. 
For example: >>> int(1e308) 100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336 Mark From allison at shasta.stanford.edu Sun Apr 26 21:29:38 2009 From: allison at shasta.stanford.edu (Dennis Allison) Date: Sun, 26 Apr 2009 12:29:38 -0700 Subject: [Python-Dev] float formatting Message-ID: <200904261929.n3QJTcJY030239@shasta.stanford.edu> Floating point printing is tricky, as I am sure you know. You might want to refresh your understanding by consulting the literature--I know I would. For example, you might want to look at http://portal.acm.org/citation.cfm?id=93559 Guy Steele's paper: Guy L. Steele , Jon L. White, How to print floating-point numbers accurately, ACM SIGPLAN Notices, v.39 n.4, April 2004 is a classic and worthy of a read. From tjreedy at udel.edu Sun Apr 26 23:02:19 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 26 Apr 2009 17:02:19 -0400 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> Message-ID: Mark Dickinson wrote: > I'd like to propose two minor changes to float and complex > formatting, for 3.1. I don't think either change should prove > particularly disruptive. > > (1) Currently, '%f' formatting automatically changes to '%g' formatting for > numbers larger than 1e50. For example: > >>>> '%f' % 2**166. > '93536104789177786765035829293842113257979682750464.000000' >>>> '%f' % 2**167. > '1.87072e+50' > > I propose removing this feature for 3.1 > > More details: The current behaviour is documented (standard > library->builtin types). (Until very recently, it was actually > misdocumented as changing at 1e25, not 1e50.) 
> > """For safety reasons, floating point precisions are clipped to 50; %f > conversions for numbers whose absolute value is over 1e50 are > replaced by %g conversions. [5] All other errors raise exceptions.""" > > There's even a footnote: > > """[5] These numbers are fairly arbitrary. They are intended to > avoid printing endless strings of meaningless digits without > hampering correct use and without having to know the exact > precision of floating point values on a particular machine.""" > > I don't find this particularly convincing, though---I just don't see > a really good reason not to give the user exactly what she/he > asks for here. I have a suspicion that at least part of the > motivation for the '%f' -> '%g' switch is that it means the > implementation can use a fixed-size buffer. But Eric has > fixed this (in 3.1, at least) and the buffer is now dynamically > allocated, so this isn't a concern any more. > > Other reasons not to switch from '%f' to '%g' in this way: > > - the change isn't gentle: as you go over the 1e50 boundary, > the number of significant digits produced suddenly changes > from 56 to 6; Looking at your example, that jumped out at me as somewhat startling... > it would make more sense to me if it > stayed fixed at 56 sig digits for numbers larger than 1e50. So I agree with this, even if the default # of sig digits were less. +1 > - now that we're using David Gay's 'perfect rounding' > code, we can be sure that the digits aren't entirely > meaningless, or at least that they're the 'right' meaningless > digits. This wasn't true before. > - C doesn't do this, and the %f, %g, %e formats really > owe their heritage to C. 
> - float formatting is already quite complicated enough; no > need to add to the mental complexity > - removal simplifies the implementation :-) > > > On to the second proposed change: > > (2) complex str and repr don't behave like float str and repr, in that > the float version always adds a trailing '.0' (unless there's an > exponent), but the complex version doesn't: > >>>> 4., 10. > (4.0, 10.0) >>>> 4. + 10.j > (4+10j) > > I propose changing the complex str and repr to behave like the > float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)" > rather than "(4+10j)". > > Mostly this is just about consistency, ease of implementation, > and aesthetics. As far as I can tell, the extra '.0' in the float > repr serves two closely-related purposes: it makes it clear to > the human reader that the number is a float rather than an > integer, and it makes sure that e.g., eval(repr(x)) recovers a > float rather than an int. The latter point isn't a concern for > the current complex repr, but the former is: 4+10j looks to > me more like a Gaussian integer than a complex number. I agree. A complex is alternately an ordered pair of floats. A different, number-theory oriented implementation of Python might even want to read 4+10j as a G. i. tjr From Scott.Daniels at Acm.Org Sun Apr 26 23:42:20 2009 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sun, 26 Apr 2009 14:42:20 -0700 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> <5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com> Message-ID: <49F4D53C.8070304@Acm.Org> Mark Dickinson wrote: > On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels > wrote: >> As a user of Idle, I would not like to see the change you seek of >> having %f stay full-precision. 
When a number gets too long to print >> on a single line, the wrap depends on the current window width, and >> is calculated dynamically. One section of the display with an 8000 >> -digit (100-line) text makes Idle slow to scroll around in. It is >> too easy for numbers to go massively positive in a bug. > I had also said (without explaining): > > only the trailing zeroes with the e, so we wind up with: > > 1157920892373161954235709850086879078532699846656405640e+23 > > or 115792089237316195423570985008687907853269984665640564.0e+24 > > or some such, rather than > > 1.157920892373162e+77 > > or 1.15792089237316195423570985008687907853269984665640564e+77 These are all possible representations for 2 ** 256. > I see your point. Since we're talking about floats, though, there > should never be more than 316 characters in a '%f' % x: the > largest float is around 1.8e308, giving 308 digits before the > point, 6 after, a decimal point, and possibly a minus sign. > (Assuming that your platform uses IEEE 754 doubles.) You are correct that I had not thought long and hard about that. 308 is livable, if not desirable. I was remembering accidentally displaying the result of a factorial call. >> However, this is, I agree, a problem. Since all of these numbers >> should end in a massive number of zeroes > > But they typically don't end in zeros (except the six zeros following > the point), > because they're stored in binary rather than decimal.... _but_ the printed decimal number I am proposing is within one ULP of the value of the binary number. That is, the majority of the digits in int(1e308) are a fiction -- they could just as well be the digits of int(1e308) + int(1e100) because 1e308 + 1e100 == 1e308 That is the sense in which I say those digits in decimal are zeroes. My proposal was to have the integer part of the expansion be a representation of the accuracy of the number in a visible form.
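As a side note, all four spellings quoted above do denote the same float: each differs from 2**256 by far less than half an ulp, so each parses back to the identical binary value. A quick illustrative check:

```python
# Each spelling above is within half an ulp of 2**256,
# so float() maps all of them to the identical binary value.
spellings = [
    "1157920892373161954235709850086879078532699846656405640e+23",
    "115792089237316195423570985008687907853269984665640564.0e+24",
    "1.157920892373162e+77",
    "1.15792089237316195423570985008687907853269984665640564e+77",
]
assert all(float(s) == 2.0 ** 256 for s in spellings)
```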
I chose the value I chose since a zero lies at the very end, and tried to indicate I did not really care where trailing actual accuracy zeros get taken off the representation. The reason I don't care is that the code for getting a floating point value is tricky, and I suspect the printing code might not easily be able to distinguish between a significant trailing zero and fictitious bits. --Scott David Daniels Scott.Daniels at Acm.Org From dickinsm at gmail.com Mon Apr 27 00:35:00 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 26 Apr 2009 23:35:00 +0100 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <49F4D53C.8070304@Acm.Org> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> <5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com> <49F4D53C.8070304@Acm.Org> Message-ID: <5c6f2a5d0904261535m50781b00vfba17efb5aaf631f@mail.gmail.com> On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote: > I had also said (without explaining): >> > only the trailing zeroes with the e, so we wind up with: >> > 1157920892373161954235709850086879078532699846656405640e+23 >> > or 115792089237316195423570985008687907853269984665640564.0e+24 >> > or some such, rather than >> > 1.157920892373162e+77 >> > or 1.15792089237316195423570985008687907853269984665640564e+77 > These are all possible representations for 2 ** 256. Understood. > _but_ the printed decimal number I am proposing is within one ULP of > the value of the binary number. But there are plenty of ways to get this if this is what you want: if you want a displayed result that's within 1 ulp (or 0.5 ulps, which would be better) of the true value then repr should serve your needs. If you want more control over the number of significant digits then '%g' formatting gives that, together with a nice-looking output for small numbers.
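Concretely, the two tools mentioned above behave as described; a quick illustrative check:

```python
x = 2.0 ** 256
# repr round-trips: the displayed string converts back to the same float
assert float(repr(x)) == x
# '%g' keeps six significant digits and picks a compact notation,
# staying in fixed-point form for moderately small magnitudes
assert "%g" % x == "1.15792e+77"
assert "%g" % 0.0001234 == "0.0001234"
```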
It's only '%f' formatting that I'm proposing changing: I see a '%.2f' formatting request as a very specific, precise one: give me exactly 2 digits after the point---no more, no less, and it seems wrong and arbitrary that this request should be ignored for numbers larger than 1e50 in absolute value. That is, for general float formatting needs, use %g, str and repr. %e and %f are for when you want fine control. > That is, the majority of the digits > in int(1e308) are a fiction Not really: the float that Python stores has a very specific value, and the '%f' formatting is showing exactly that value. (Yes, I know that some people advocate viewing a float as a range of values rather than a specific value; but I'm pretty sure that that's not the way that the creators of IEEE 754 were thinking.) > zeros get taken off the representation. The reason I don't care is > that the code for getting a floating point value is tricky, and I > suspect the printing code might not easily be able to distinguish > between a significant trailing zero and fictitious bits. As of 3.1, the printing code should be fine: it's using David Gay's 'perfect rounding' code, so what's displayed should be correctly rounded to the requested precision. Mark From python at rcn.com Mon Apr 27 01:35:43 2009 From: python at rcn.com (Raymond Hettinger) Date: Sun, 26 Apr 2009 16:35:43 -0700 Subject: [Python-Dev] Two proposed changes to float formatting References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> Message-ID: <200F596842DC4E0CA0305794F05B8905@RaymondLaptop1> >> it would make more sense to me if it >> stayed fixed at 56 sig digits for numbers larger than 1e50. > > So I agree with this, even if the default # of sig digits were less. Several reasons to accept Mark's proposal: * It matches what C does and many languages tend to copy the C standards with respect to format codes.
Matching other languages helps in porting code, copying algorithms, and mentally switching back and forth when working in multiple languages. * When a programmer has chosen %f, that means that they have consciously rejected choosing %e or %g. It is generally best to have the code do what the programmer asked for ;-) * Code that tested well with 1e47, 1e48, 1e49, and 1e50 suddenly shifts behavior with 1e51. Behavior shifts like that are bug bait. * The 56 significant digits may be rooted in the longest decimal expansion of a 53 bit float. For example, len(str(Decimal.from_float(.1))) is 57 including the leading zero. But not all machines (now, in the past, or in the future) use 53 bits for the significand. * Use of exponents is common but not universal. Some converters for SQL specs like Decimal(10,80) may not recognize the e-notation. The xmlrpc spec only accepts decimal expansions, not %e notation. * The programmer needs to have some way to spell out a decimal expansion when needed. Currently, %f is the only way. Raymond From eric at trueblade.com Mon Apr 27 01:42:51 2009 From: eric at trueblade.com (Eric Smith) Date: Sun, 26 Apr 2009 19:42:51 -0400 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> Message-ID: <49F4F17B.9090807@trueblade.com> Mark Dickinson wrote: > (1) Currently, '%f' formatting automatically changes to '%g' formatting for > numbers larger than 1e50. For example: > >>>> '%f' % 2**166. > '93536104789177786765035829293842113257979682750464.000000' >>>> '%f' % 2**167. > '1.87072e+50' > > I propose removing this feature for 3.1
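For the record, this removal did land (in Python 2.7 and 3.1), so on any current interpreter the example quoted above now honors the '%f' request at every magnitude; a quick check:

```python
# With the clipping gone, '%f' yields the full fixed-point expansion
# even above 1e50, instead of silently switching to '%g'.
big = 2.0 ** 167              # a power of two, hence exactly representable
assert "%f" % big == str(2 ** 167) + ".000000"
assert "e" not in ("%f" % big)  # no silent switch to exponent notation
```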
From agbauer at gmail.com Mon Apr 27 02:59:54 2009 From: agbauer at gmail.com (Adrian) Date: Mon, 27 Apr 2009 00:59:54 +0000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <5f1bf48e0904261759i54c730fdvbdc47e0f80aa0667@mail.gmail.com> How about another str-like type, a sequence of char-or-bytes? Could be called strbytes or stringwithinvalidcharacters. It would support whatever subset of str functionality makes sense / is easy to implement plus a to_escaped_str() method (that does the escaping the PEP talks about) for people who want to use regexes or other str-only stuff. Here is a description by example: os.listdir('.') -> [strbytes('normal_file'), strbytes('bad', 128, 'file')] strbytes('a')[0] -> strbytes('a') strbytes('bad', 128, 'file')[3] -> strbytes(128) strbytes('bad', 128, 'file').to_escaped_str() -> 'bad?128file' Having a separate type is cleaner than a "str that isn't exactly what it represents". And making the escaping an explicit (but rarely-needed) step would be less surprising for users. Anyway, I don't know a whole lot about this issue so there may an obvious reason this is a bad idea. On Wed, Apr 22, 2009 at 6:50 AM, "Martin v. L?wis" wrote: > I'm proposing the following PEP for inclusion into Python 3.1. > Please comment. > > Regards, > Martin > > PEP: 383 > Title: Non-decodable Bytes in System Character Interfaces > Version: $Revision: 71793 $ > Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $ > Author: Martin v. L?wis > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 22-Apr-2009 > Python-Version: 3.1 > Post-History: > > Abstract > ======== > > File names, environment variables, and command line arguments are > defined as being character data in POSIX; the C APIs however allow > passing arbitrary bytes - whether these conform to a certain encoding > or not. 
This PEP proposes a means of dealing with such irregularities > by embedding the bytes in character strings in such a way that allows > recreation of the original byte string. > > Rationale > ========= > > The C char type is a data type that is commonly used to represent both > character data and bytes. Certain POSIX interfaces are specified and > widely understood as operating on character data; however, the system > call interfaces make no assumption on the encoding of these data, and > pass them on as-is. With Python 3, character strings use a > Unicode-based internal representation, making it difficult to ignore > the encoding of byte strings in the same way that the C interfaces can > ignore the encoding. > > On the other hand, Microsoft Windows NT has corrected the original > design limitation of Unix, and made it explicit in its system > interfaces that these data (file names, environment variables, command > line arguments) are indeed character data, by providing a > Unicode-based API (keeping a C-char-based one for backwards > compatibility). > > For Python 3, one proposed solution is to provide two sets of APIs: a > byte-oriented one, and a character-oriented one, where the > character-oriented one would be limited to not being able to represent > all data accurately. Unfortunately, for Windows, the situation would > be exactly the opposite: the byte-oriented interface cannot represent > all data; only the character-oriented API can. As a consequence, > libraries and applications that want to support all user data in a > cross-platform manner have to accept a mish-mash of bytes and characters > exactly in the way that caused endless troubles for Python 2.x. > > With this PEP, a uniform treatment of these data as characters becomes > possible. The uniformity is achieved by using specific encoding > algorithms, meaning that the data can be converted back to bytes on > POSIX systems only if the same encoding is used.
> > Specification > ============= > > On Windows, Python uses the wide character APIs to access > character-oriented APIs, allowing direct conversion of the > environmental data to Python str objects. > > On POSIX systems, Python currently applies the locale's encoding to > convert the byte data to Unicode. If the locale's encoding is UTF-8, > it can represent the full set of Unicode characters, otherwise, only a > subset is representable. In the latter case, using private-use > characters to represent these bytes would be an option. For UTF-8, > doing so would create an ambiguity, as the private-use characters may > regularly occur in the input also. > > To convert non-decodable bytes, a new error handler "python-escape" is > introduced, which decodes non-decodable bytes into a private-use > character U+F01xx, which is believed to not conflict with private-use > characters that currently exist in Python codecs. > > The error handler interface is extended to allow the encode error > handler to return byte strings immediately, in addition to returning > Unicode strings which then get encoded again. > > If the locale's encoding is UTF-8, the file system encoding is set to > a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes > (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. > > Discussion > ========== > > While providing a uniform API to non-decodable bytes, this interface > has the limitation that the chosen representation only "works" if the data > get converted back to bytes with the python-escape error handler > also. Encoding the data with the locale's encoding and the (default) > strict error handler will raise an exception, encoding them with UTF-8 > will produce nonsensical data. > > For most applications, we assume that they eventually pass data > received from a system interface back into the same system > interfaces.
For example, an application invoking os.listdir() will > likely pass the result strings back into APIs like os.stat() or > open(), which then encodes them back into their original byte > representation. Applications that need to process the original byte > strings can obtain them by encoding the character strings with the > file system encoding, passing "python-escape" as the error handler > name. > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/agbauer%40gmail.com > From Scott.Daniels at Acm.Org Mon Apr 27 04:56:43 2009 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sun, 26 Apr 2009 19:56:43 -0700 Subject: [Python-Dev] Two proposed changes to float formatting In-Reply-To: <5c6f2a5d0904261535m50781b00vfba17efb5aaf631f@mail.gmail.com> References: <5c6f2a5d0904260306y429ddc1ey82b930f17fd93469@mail.gmail.com> <5c6f2a5d0904261219n1164e451j877a268f1e4a7c28@mail.gmail.com> <49F4D53C.8070304@Acm.Org> <5c6f2a5d0904261535m50781b00vfba17efb5aaf631f@mail.gmail.com> Message-ID: <49F51EEB.7000508@Acm.Org> Mark Dickinson wrote: > On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote: >... >> I had also said (without explaining): >>>> only the trailing zeroes with the e, so we wind up with: >>>> 1157920892373161954235709850086879078532699846656405640e+23 >>>> or 115792089237316195423570985008687907853269984665640564.0e+24 >>>> or some such, rather than >>>> 1.157920892373162e+77 >>>> or 1.15792089237316195423570985008687907853269984665640564e+77 >> These are all possible representations for 2 ** 256. > > Understood. > >> _but_ the printed decimal number I am proposing is within one ULP of >> the value of the binary number.
> > But there are plenty of ways to get this if this is what you want: if > you want a displayed result that's within 1 ulp (or 0.5 ulps, which > would be better) of the true value then repr should serve your needs. The representation I am suggesting here is a half-way measure between your proposal and the existing behavior. This representation addresses the abrupt transition that you point out (number of significant digits drops precipitously) without particularly changing the goal of the transition (displaying faux accuracy), without, in my (possibly naive) view, seriously complicating either the print-generating code or the issues for the reader of the output. To wit, the proposal is (A) for numbers where the printed digits exceed the accuracy presented, represent the result as an integer with an e+N, rather than a number between 1 and 2-epsilon with an exponent that makes you have to count digits to compare the two values, and (B) that the full precision available in the value be shown in the representation. Given that everyone understands that is what I am proposing, I am OK with the decision going where it will. I am comforted that we are only talking about four wrapped lines if we go to the full integer, which I had not realized. Further, I agree with you that there is an abrupt transition in represented accuracy as we cross from %f to %g, that should be somehow addressed. You want to address it by continuing to show digits, and I want to limit the digits shown to a value that reflects the known accuracy. I also want text that compares "smoothly" with numbers near the transition (so that greater-than and less-than relationships are obvious without thinking), hence the representation that avoids the "normalized" mantissa. Having said all this, I think my compromise position should be clear. I did not mean to argue with you, but rather intended to propose a possible middle way that some might find appealing.
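The (A)+(B) representation described above can be sketched in a few lines; the helper name here is hypothetical, and 17 significant digits stand in for "full precision" (the proposal leaves the exact digit count open):

```python
def int_mantissa(x: float) -> str:
    """Sketch of the integer-mantissa representation (positive x only):
    print all significant digits as an integer, exponent adjusted."""
    mantissa, exp = ("%.16e" % x).split("e")   # 17 significant digits
    digits = mantissa.replace(".", "").rstrip("0") or "0"
    return "%se%+d" % (digits, int(exp) - (len(digits) - 1))

# 2**256 renders with an integer mantissa rather than as 1.157...e+77,
# so magnitudes near a transition compare digit-for-digit
assert int_mantissa(2.0 ** 256) == "1157920892373162e+62"
```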
--Scott David Daniels Scott.Daniels at Acm.Org From martin at v.loewis.de Mon Apr 27 07:34:03 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 27 Apr 2009 07:34:03 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <5f1bf48e0904261759i54c730fdvbdc47e0f80aa0667@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <5f1bf48e0904261759i54c730fdvbdc47e0f80aa0667@mail.gmail.com> Message-ID: <49F543CB.7000707@v.loewis.de> > How about another str-like type, a sequence of char-or-bytes? That would be a different PEP. I personally like my own proposal more, but feel free to propose something different. Regards, Martin From v+python at g.nevcal.com Mon Apr 27 08:39:41 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 26 Apr 2009 23:39:41 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F30390.2040808@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F215E5.4050205@g.nevcal.com> <49F30390.2040808@v.loewis.de> Message-ID: <49F5532D.2090700@g.nevcal.com> On approximately 4/25/2009 5:35 AM, came the following characters from the keyboard of Martin v. Löwis: >> Because the encoding is not reliably reversible. > > Why do you say that? The encoding is completely reversible > (unless we disagree on what "reversible" means). > >> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a >> reversible encoding. > > Then please provide an example for a setup where it is not reversible. > > Regards, > Martin It is reversible if you know that it has been decoded, and apply the encoding.
But if you don't know that it has been encoded, then applying the reverse transform can convert an undecoded str that matches the decoded str to the form that it could have, but never did take. The problem is that there is no guarantee that the str interface provides only strictly conforming Unicode, so decoding bytes to non-strictly conforming Unicode can result in a data pun between non-strictly conforming Unicode coming from the str interface vs bytes being decoded to non-strictly conforming Unicode coming from the bytes interface. Any particular program that always consistently uses one or the other (bytes vs str) APIs under the covers might never be affected by such a data pun, but programs that may use both types of interface could potentially see a data pun. If your PEP depends on consistent use of one or the other type of interface, you should say so, and if the platform only provides that type of interface, maybe all is well. Both types of interfaces are available on Windows, perhaps POSIX only provides native bytes interfaces, and if the PEP is the only way to provide str interfaces, then perhaps consistent use is required. There are still issues regarding how Windows and POSIX programs that are sharing cross-mounted file systems might communicate file names between each other, which is not at all clear from the PEP. If this is an insoluble or un-addressed issue, it should be stated. (It is probably insoluble, due to there being multiple ways that the cross-mounted file systems might translate names; but if there are, can we learn something from the rules the mounting systems use, to be compatible with (one of) them, or not.) Your change to avoid using PUA characters, together with the rule suggested by MRAB in another branch of this thread of treating half-surrogates as invalid byte sequences, may avoid the data puns I'm concerned about.
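The pun can be made concrete with the error handler that eventually implemented PEP 383 ('surrogateescape', shipped with Python 3.1): a name that was born as a str is indistinguishable from one decoded from undecodable bytes:

```python
# A name decoded from bytes that are invalid UTF-8...
from_bytes = b"a\x80b".decode("utf-8", "surrogateescape")
# ...and the "same" name handed over directly as a str by some other API.
from_str = "a\udc80b"
assert from_bytes == from_str  # the data pun: the two are identical
# Encoding either one reproduces the byte form, even for the str-born name.
assert from_str.encode("utf-8", "surrogateescape") == b"a\x80b"
```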
It is not clear how half-surrogate characters would be displayed, when the user prints or displays such a file name string. It would seem that programs that display file names to users might still have issues with such; an escaping mechanism that uses displayable characters would have an advantage there. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Mon Apr 27 09:07:16 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 00:07:16 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F30083.5050506@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> Message-ID: <49F559A4.8050400@g.nevcal.com> On approximately 4/25/2009 5:22 AM, came the following characters from the keyboard of Martin v. Löwis: >> The problem with this, and other preceding schemes that have been >> discussed here, is that there is no means of ascertaining whether a >> particular file name str was obtained from a str API, or was funny- >> decoded from a bytes API... and thus, there is no means of reliably >> ascertaining whether a particular filename str should be passed to a >> str API, or funny-encoded back to bytes. > > Why is it necessary that you are able to make this distinction? It is necessary that programs (not me) can make the distinction, so that they know whether or not to apply the funny-encoding. If a name is funny-decoded when the name is accessed by a directory listing, it needs to be funny-encoded in order to open the file. >> Picking a character (I don't find U+F01xx in the >> Unicode standard, so I don't know what it is) > > It's a private use area. It will never carry an official character > assignment. I know that U+F0000 - U+FFFFF is a private use area.
I don't find a definition of U+F01xx to know what the notation means. Are you picking a particular character within the private use area, or a particular range, or what? >> As I realized in the email-sig, in talking about decoding corrupted >> headers, there is only one way to guarantee this... to encode _all_ >> character sequences, from _all_ interfaces. Basically it requires >> reserving an escape character (I'll use ? in these examples -- yes, an >> ASCII question mark -- happens to be illegal in Windows filenames so >> all the better on that platform, but the specific character doesn't >> matter... avoiding / \ and . is probably good, though). > > I think you'll have to write an alternative PEP if you want to see > something like this implemented throughout Python. I'm certainly not experienced enough in Python development processes or internals to attempt such, as yet. But somewhere in 25 years of programming, I picked up the knowledge that if you want to have a 1-to-1 reversible mapping, you have to avoid data puns, mappings of two different data values into a single data value. Your PEP, as first written, didn't seem to do that... since there are two interfaces from which to obtain data values, one performing a mapping from bytes to "funny invalid" Unicode, and the other performing no mapping, but accepting any sort of Unicode, possibly including "funny invalid" Unicode, the possibility of data puns seems to exist. I may be misunderstanding something about the use cases that prevent these two sources of "funny invalid" Unicode from ever coexisting, but if so, perhaps you could point it out, or clarify the PEP. I'll try to reread it again... could you post a URL to the most up-to-date version of the PEP, since I haven't seen such appear here, and the version I found via a Google search seems to be the original? -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From cs at zip.com.au Mon Apr 27 09:55:49 2009 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 27 Apr 2009 17:55:49 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F5532D.2090700@g.nevcal.com> Message-ID: <20090427075549.GA4418@cskk.homeip.net> On 26Apr2009 23:39, Glenn Linderman wrote: [...snip...] > There are still issues regarding how Windows and POSIX programs that are > sharing cross-mounted file systems might communicate file names between > each other, which is not at all clear from the PEP. If this is an > insoluble or un-addressed issue, it should be stated. (It is probably > insoluble, due to there being multiple ways that the cross-mounted file > systems might translate names; but if there are, can we learn something > from the rules the mounting systems use, to be compatible with (one of) > them, or not. I'd say that's out of scope. A windows filesystem mounted on a UNIX host should probably be mounted with a mapping to translate the Windows Unicode names into whatever the sysadmin deems the locally most apt byte encoding. But sys.getfilesystemencoding() is based on the current user's locale settings, which need not be the same. > Together with your change to avoid using PUA characters, and the rule > suggested by MRAB in another branch of this thread, of treating > half-surrogates as invalid byte sequences may avoid the data puns I'm > concerned about. > > It is not clear how half-surrogate characters would be displayed, when > the user prints or displays such a file name string. It would seem that > programs that display file names to users might still have issues with > such; an escaping mechanism that uses displayable characters would have > an advantage there. 
Wouldn't any escaping mechanism that uses displayable characters require visually mangling occurences of those characters that legitimately occur in the original? -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ From v+python at g.nevcal.com Mon Apr 27 10:40:43 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 01:40:43 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090427075549.GA4418@cskk.homeip.net> References: <20090427075549.GA4418@cskk.homeip.net> Message-ID: <49F56F8B.7030108@g.nevcal.com> On approximately 4/27/2009 12:55 AM, came the following characters from the keyboard of Cameron Simpson: > On 26Apr2009 23:39, Glenn Linderman wrote: > [...snip...] > >> There are still issues regarding how Windows and POSIX programs that are >> sharing cross-mounted file systems might communicate file names between >> each other, which is not at all clear from the PEP. If this is an >> insoluble or un-addressed issue, it should be stated. (It is probably >> insoluble, due to there being multiple ways that the cross-mounted file >> systems might translate names; but if there are, can we learn something >> from the rules the mounting systems use, to be compatible with (one of) >> them, or not. >> > > I'd say that's out of scope. A windows filesystem mounted on a UNIX host > should probably be mounted with a mapping to translate the Windows > Unicode names into whatever the sysadmin deems the locally most apt > byte encoding. But sys.getfilesystemencoding() is based on the current user's > locale settings, which need not be the same. > And if it were, what would it do with files that can't be encoded with the locally most apt byte encoding? That's where we might learn something about what behaviors are deemed acceptable. Would such files be inaccessible? Accessible with mangled names? or what? And for a Unix filesystem mounted on a Windows host? 
Or accessed via some network connection? >> Together with your change to avoid using PUA characters, and the rule >> suggested by MRAB in another branch of this thread, of treating >> half-surrogates as invalid byte sequences may avoid the data puns I'm >> concerned about. >> >> It is not clear how half-surrogate characters would be displayed, when >> the user prints or displays such a file name string. It would seem that >> programs that display file names to users might still have issues with >> such; an escaping mechanism that uses displayable characters would have >> an advantage there. >> > > Wouldn't any escaping mechanism that uses displayable characters > require visually mangling occurences of those characters that > legitimately occur in the original? > Yes. My suggested use of ? is a visible character that is illegal in Windows file names, thus causing no valid Windows file names to be visually mangled. It is also a character that should be avoided in POSIX names because: 1) it is known to be illegal on Windows, and thus non-portable 2) it is hard to write globs that match ? without allowing matches of other characters as well 3) it must be quoted to specify it on a command line That said, someone provided a case where it is "easy" to get ? in POSIX file names. The remaining question is whether that is a reasonable use case, a frequent use case, or a stupid use case; and whether the resulting visible mangling is more or less understandable and disruptive than using half-surrogates which are: 1) invalid Unicode 2) non-displayable 3) indistinguishable using normal non-displayable character substitution rules -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From rdmurray at bitdance.com Mon Apr 27 11:32:42 2009 From: rdmurray at bitdance.com (R. 
David Murray) Date: Mon, 27 Apr 2009 05:32:42 -0400 (EDT) Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F56F8B.7030108@g.nevcal.com> References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> Message-ID: On Mon, 27 Apr 2009 at 01:40, Glenn Linderman wrote: > Yes. My suggested use of ? is a visible character that is illegal in Windows > file names, thus causing no valid Windows file names to be visually mangled. > It is also a character that should be avoided in POSIX names because: > > 1) it is known to be illegal on Windows, and thus non-portable > 2) it is hard to write globs that match ? without allowing matches of other > characters as well > 3) it must be quoted to specify it on a command line > > That said, someone provided a case where it is "easy" to get ? in POSIX file > names. The remaining question is whether that is a reasonable use case, a > frequent use case, or a stupid use case; and whether the resulting visible Reasonable I don't know, but frequent (FSDO frequent) and out of our control yes. It happens often when downloading files with wget, for example. --David From solipsis at pitrou.net Mon Apr 27 13:29:14 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Apr 2009 11:29:14 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System?= =?utf-8?q?=09Character=09Interfaces?= References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. 
Turnbull xemacs.org> writes: > > If > you see a broken encoding once, you're likely to see it a million times > (spammers have the most broken software) or maybe have it raise an > unhandled Exception a dozen times (in rate of using busted software, > the spammers are closely followed by bosses---which would be very bad, > eh, if you 2/3 of the mail from your boss ends up in an undeliverables > queue due to encoding errors that are unhandled by your some filter in > your mail pipeline). I'm not sure how mail being stuck in a pipeline has anything to do with Martin's proposal (which deals with file paths, not with SMTP...). Besides, I don't care about spammers and their broken software. > Again, that's not the point. The point is that six-sigma reliability > world-wide is not going to be very comforting to the poor souls who > happen to have broken software in their environment sending broken > encodings regularly, because they're going to be dealing with one or > two sigmas, and that's just not good enough in a production > environment. So you're arguing that whatever solution which isn't 100% perfect but only 99.999% perfect shouldn't be implemented at all, and leave the status quo at 98%? This sounds disturbing to me. (especially given you probably sent this mail using TCP/IP...) Regards Antoine. From dd at crosstwine.com Mon Apr 27 16:25:47 2009 From: dd at crosstwine.com (Damien Diederen) Date: Mon, 27 Apr 2009 16:25:47 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: (Antoine Pitrou's message of "Wed, 8 Apr 2009 11:10:21 +0000 (UTC)") References: Message-ID: <87k556dvh0.fsf@keem.bcc> Hello, Antoine Pitrou writes: > Hello, > > We're in the process of forward-porting the recent (massive) json > updates to 3.1, and we are also thinking of dropping remnants of > support of the bytes type in the json library (in 3.1, again). This > bytes support almost didn't work at all, but there was a lot of C and > Python code for it nevertheless. 
We're also thinking of dropping the "encoding" argument in the various APIs, since it is useless. I had a quick look into the module on both branches, and at Antoine's latest patch (json_py3k-3). The current situation on trunk is indeed not very pretty in terms of code duplication, and I agree it would be nice not to carry that forward. I couldn't figure out a way to get rid of it short of multi-#including "templates" and playing with the C preprocessor, however, and have the nagging feeling the latter would be frowned upon by the maintainers. There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm wrong about that. Should I give it a try, and see how "clean" the result can be made? > Under the new situation, json would only ever allow str as input, and > output str as well. By posting here, I want to know whether anybody > would oppose this (knowing, once again, that bytes support is already > broken in the current py3k trunk). Provided one of the alternatives is dropped, wouldn't it be better to do the opposite, i.e., have the decoder take bytes as input, and the encoder produce bytes--and layer the str functionality on top of that? I guess the answer depends on how the (most common) lower layers are structured, but it would be nice to allow a straight bytes path to/from the underlying transport. (I'm willing to have a go at the conversion in case somebody is interested.) Bob, would you have an idea of which lower layers are most commonly used with the json module, and whether people are more likely to expect strs or bytes in Python 3.x? Maybe that data could be inferred from some bug tracking system? > The bug entry is: http://bugs.python.org/issue4136 > > Regards > Antoine. 
Regards, Damien -- http://crosstwine.com "Strong Opinions, Weakly Held" -- Bob Johansen From eric at trueblade.com Mon Apr 27 17:03:21 2009 From: eric at trueblade.com (Eric Smith) Date: Mon, 27 Apr 2009 11:03:21 -0400 Subject: [Python-Dev] Windows buildbots failing test_types in trunk Message-ID: <49F5C939.6020802@trueblade.com> Mark Dickinson pointed out to me that the trunk buildbots are failing under Windows. After some analysis, I think this is because of a change I made to use _toupper in integer formatting. The correct solution to this is to implement issue 5793 to come up with a working, cross-platform, locale-unaware set of functions and/or macros for isdigit / isupper / toupper, etc. I'll work on this tonight or tomorrow, at which point the Windows buildbots should turn green. I don't think this affects py3k, although I'll port it there before the beta release. Eric. From eric at trueblade.com Mon Apr 27 17:05:04 2009 From: eric at trueblade.com (Eric Smith) Date: Mon, 27 Apr 2009 11:05:04 -0400 (EDT) Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <87k556dvh0.fsf@keem.bcc> References: <87k556dvh0.fsf@keem.bcc> Message-ID: <26274.63.251.87.214.1240844704.squirrel@mail.trueblade.com> > I couldn't figure out a way to get rid of it short of multi-#including > "templates" and playing with the C preprocessor, however, and have the > nagging feeling the latter would be frowned upon by the maintainers. Not sure if this is exactly what you mean, but look at Objects/stringlib. str.format() and unicode.format() share the same implementation, using stringdefs.h and unicodedefs.h. Eric. 
From bob at redivi.com Mon Apr 27 17:07:04 2009 From: bob at redivi.com (Bob Ippolito) Date: Mon, 27 Apr 2009 08:07:04 -0700 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <87k556dvh0.fsf@keem.bcc> References: <87k556dvh0.fsf@keem.bcc> Message-ID: <6a36e7290904270807kbe9ac4y90c7078393e1a393@mail.gmail.com> On Mon, Apr 27, 2009 at 7:25 AM, Damien Diederen
wrote: > > Antoine Pitrou writes: >> Hello, >> >> We're in the process of forward-porting the recent (massive) json >> updates to 3.1, and we are also thinking of dropping remnants of >> support of the bytes type in the json library (in 3.1, again). This >> bytes support almost didn't work at all, but there was a lot of C and >> Python code for it nevertheless. We're also thinking of dropping the >> "encoding" argument in the various APIs, since it is useless. > > I had a quick look into the module on both branches, and at Antoine's > latest patch (json_py3k-3). The current situation on trunk is indeed > not very pretty in terms of code duplication, and I agree it would be > nice not to carry that forward. > > I couldn't figure out a way to get rid of it short of multi-#including > "templates" and playing with the C preprocessor, however, and have the > nagging feeling the latter would be frowned upon by the maintainers. > > There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm > wrong about that. Should I give it a try, and see how "clean" the > result can be made? > >> Under the new situation, json would only ever allow str as input, and >> output str as well. By posting here, I want to know whether anybody >> would oppose this (knowing, once again, that bytes support is already >> broken in the current py3k trunk). > > Provided one of the alternatives is dropped, wouldn't it be better to do > the opposite, i.e., have the decoder take bytes as input, and the > encoder produce bytes--and layer the str functionality on top of that? I > guess the answer depends on how the (most common) lower layers are > structured, but it would be nice to allow a straight bytes path to/from > the underlying transport. > > (I'm willing to have a go at the conversion in case somebody is > interested.) 
> > Bob, would you have an idea of which lower layers are most commonly used > with the json module, and whether people are more likely to expect strs > or bytes in Python 3.x? Maybe that data could be inferred from some bug > tracking system? I don't know what Python 3.x users expect. As far as I know, none of the lower layers of the json package are used directly. They're certainly not supposed to be or documented as such. My use case for dumps is typically bytes output because we push it straight to and from IO. Some people embed JSON in other documents (e.g. HTML) where you would want it to be text. I'm pretty sure that the IO case is more common. -bob From dd at crosstwine.com Mon Apr 27 17:22:32 2009 From: dd at crosstwine.com (Damien Diederen) Date: Mon, 27 Apr 2009 17:22:32 +0200 Subject: [Python-Dev] Dropping bytes "support" in json In-Reply-To: <26274.63.251.87.214.1240844704.squirrel@mail.trueblade.com> (Eric Smith's message of "Mon, 27 Apr 2009 11:05:04 -0400 (EDT)") References: <87k556dvh0.fsf@keem.bcc> <26274.63.251.87.214.1240844704.squirrel@mail.trueblade.com> Message-ID: <87y6tmce9z.fsf@keem.bcc> Hi Eric, "Eric Smith" writes: >> I couldn't figure out a way to get rid of it short of multi-#including >> "templates" and playing with the C preprocessor, however, and have the >> nagging feeling the latter would be frowned upon by the maintainers. > > Not sure if this is exactly what you mean, but look at Objects/stringlib. > str.format() and unicode.format() share the same implementation, using > stringdefs.h and unicodedefs.h. That's indeed a much better example! I'm more comfortable applying the same technique to the json module now that I see it used in the core. (Provided Bob and Antoine are not turned away by the relative ugliness, that is.) > Eric. 
Cheers, Damien -- http://crosstwine.com "Strong Opinions, Weakly Held" -- Bob Johansen From solipsis at pitrou.net Mon Apr 27 17:24:29 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Apr 2009 15:24:29 +0000 (UTC) Subject: [Python-Dev] Dropping bytes "support" in json References: <87k556dvh0.fsf@keem.bcc> Message-ID: Damien Diederen
crosstwine.com> writes: > > I couldn't figure out a way to get rid of it short of multi-#including > "templates" and playing with the C preprocessor, however, and have the > nagging feeling the latter would be frowned upon by the maintainers. > > There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm > wrong about that. Should I give it a try, and see how "clean" the > result can be made? Keep in mind that json is externally maintained by Bob. The more we rework his code, the less easy it will be to backport other changes from the simplejson library. I think we should either keep the code duplication (if we want to keep fast paths for both bytes and str objects), or only keep one of the two versions as my patch does. > Provided one of the alternatives is dropped, wouldn't it be better to do > the opposite, i.e., have the decoder take bytes as input, and the > encoder produce bytes--and layer the str functionality on top of that? I > guess the answer depends on how the (most common) lower layers are > structured, but it would be nice to allow a straight bytes path to/from > the underlying transport. The straightest path is actually to/from unicode, since JSON data can contain unicode strings but no byte strings. Also, the json library /has/ to output unicode when `ensure_ascii` is False. In 2.x: >>> json.dumps([u"éléphant"], ensure_ascii=False) > u'["\xe9l\xe9phant"]' In any case, I don't think it will matter much in terms of speed whether we take one route or the other. UTF-8 encoding/decoding is probably much faster (in characters per second) than JSON encoding/decoding is. Regards Antoine. From stephen at xemacs.org Mon Apr 27 17:47:05 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 28 Apr 2009 00:47:05 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > I'm not sure how mail being stuck in a pipeline has anything to do > with Martin's proposal (which deals with file paths, not with > SMTP...). I hate to break it to you, but most stages of mail processing have very little to do with SMTP. In particular, processing MIME attachments often requires dealing with file names. Would practical problems arise? I expect they would. Can I tell you what they are? No; if I could I'd write a better PEP. I'm just saying that my experience is that Murphy's Law applies more to encoding processing than any other area of software I've worked in (admittedly, I don't do threads ;-). > Besides, I don't care about spammers and their broken software. That's precisely my point. The PEP's "solution" will be very appealing to people who just don't care as long as it works for them, in the subset of corner cases they happen to encounter. A lot of software, including low-level components, will be written using these APIs, and they will result in escapes of uninterpreted bytes (encoded as Unicode) into the textual world. > So you're arguing that whatever solution which isn't 100% perfect > but only 99.999% perfect shouldn't be implemented at all, and leave > the status quo at 98%? No, I'm not talking about "whatever solution". I'm only arguing about PEP 383. The point is that Martin's proposal is not just a solution to the problem he posed. 
It's also going to be the one obvious way to make the usual mistakes, i.e., the return values will escape into code paths they're not intended for. And the APIs won't be killable until Python 4000. If we find a better way (which I think Python 3's move to "text is Unicode" is likely to inspire!), we'll have to wait 10-15 years or more before it becomes the OOWTDI. The only real hope about that is that Unicode will become universal before that, and only archaeologists will ever encounter malformed text. I believe there are solutions that don't have that problem. Specifically, if the return values were bytes, or (better for 2.x, where bytes are strings as far as most programmers are concerned) as a new data type, to indicate that they're not text until the client acknowledges them as such. EIBTI. Unfortunately, Martin clearly doesn't intend to make such a change to the PEP. I don't have the time or the Python expertise to generate an alternative PEP. :-( I do have long experience with the pain of dealing with encoding issues caused by APIs that are intended to DTRT, conveniently. Martin's is better than most, but I just don't think convenience and robustness can be combined in this area. > This sounds disturbing to me. BTW, I'm on record as +0 on the PEP. I don't think the better proposals have a chance, because most people *want* the non-solution that they can just use as a habit, allowing Python to make decisions that should be made by the application, and not have to do "unnecessary" conversions and the like. It's not obvious to me that it should not be given to them, but I don't much like it. 
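PEP 383 eventually shipped as the "surrogateescape" error handler in Python 3.1, so the round-trip property being debated in this thread can be checked directly in any later Python 3. A minimal sketch:

```python
# PEP 383 as it shipped: the "surrogateescape" error handler maps each
# undecodable byte 0x80-0xFF to a lone surrogate U+DC80-U+DCFF, and
# encoding with the same handler restores the original byte exactly.
raw = b"caf\xe9.txt"  # Latin-1 bytes; 0xE9 is invalid as UTF-8

name = raw.decode("utf-8", errors="surrogateescape")
assert name == "caf\udce9.txt"  # the undecodable byte became U+DCE9

# the decode/encode round trip is lossless
assert name.encode("utf-8", errors="surrogateescape") == raw
```

The lone surrogate is exactly the "half-surrogate" discussed elsewhere in the thread: invalid as interchange Unicode, but stable inside a Python str.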
From p.f.moore at gmail.com Mon Apr 27 17:58:46 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Apr 2009 16:58:46 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <79990c6b0904270858h72760fe6m2248f0e8bb99c3d7@mail.gmail.com> 2009/4/27 Stephen J. Turnbull : > I believe there are solutions that don't have that problem. > Specifically, if the return values were bytes, or (better for 2.x, > where bytes are strings as far as most programmers are concerned) as a > new data type, to indicate that they're not text until the client > acknowledges them as such. EIBTI. I think you're ignoring the fact that under Windows, it's the *bytes* APIs that are lossy. Can I at least assume that you aren't recommending that only the bytes API exists on Unix, and only the Unicode API on Windows? So what's your suggestion? > Unfortunately, Martin clearly doesn't intend to make such a change to > the PEP. I don't have the time or the Python expertise to generate an > alternative PEP. :-( I do have long experience with the pain of > dealing with encoding issues caused by APIs that are intended to DTRT, > conveniently. Martin's is better than most, but I just don't think > convenience and robustness can be combined in this area. The *only* "robust" solution is to completely separate the 2 platforms. Which helps no-one, and is at least as bad as the 2.x situation. (Probably worse). > BTW, I'm on record as +0 on the PEP. 
I don't think the better > proposals have a chance, because most people *want* the non-solution > that they can just use as a habit, allowing Python to make decisions > that should be made by the application, and not have to do > "unnecessary" conversions and the like. It's not obvious to me that > it should not be given to them, but I don't much like it. People *want* a solution that doesn't require every application developer to sweat blood to write working code, simply to cover corner cases that they don't believe will happen. Not every application is a 24x7 server, and all that. Similarly, not every application is a backup program. Such applications have unique issues, which the developers should (but don't always, admittedly!) understand. The rest of us don't want to be made to care. It's not sloppiness. It's a realistic appreciation of the requirements of the application. (And an acceptance that not every bug must be fixed before release). Paul. From aahz at pythoncraft.com Mon Apr 27 17:59:13 2009 From: aahz at pythoncraft.com (Aahz) Date: Mon, 27 Apr 2009 08:59:13 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20090427155912.GA524@panix.com> On Mon, Apr 27, 2009, Antoine Pitrou wrote: > Stephen J. 
Turnbull xemacs.org> writes: >> >> If >> you see a broken encoding once, you're likely to see it a million times >> (spammers have the most broken software) or maybe have it raise an >> unhandled Exception a dozen times (in rate of using busted software, >> the spammers are closely followed by bosses---which would be very bad, >> eh, if you 2/3 of the mail from your boss ends up in an undeliverables >> queue due to encoding errors that are unhandled by your some filter in >> your mail pipeline). > > Besides, I don't care about spammers and their broken software. Maybe you don't, but anyone who has to process random messages does; you have to assume that messages will be broken. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From solipsis at pitrou.net Mon Apr 27 18:09:07 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Apr 2009 16:09:07 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?= =?utf-8?q?haracter=09Interfaces?= References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull xemacs.org> writes: > > I hate to break it to you, but most stages of mail processing have > very little to do with SMTP. In particular, processing MIME > attachments often requires dealing with file names. AFAIK, the file name is only there as an indication for the user when he wants to save the file. If it's garbled a bit, no big deal. 
> The point is that Martin's proposal is not just a solution > to the problem he posed. But you haven't concretely demonstrated it with actual use cases. The problems that the PEP tries to solve, conversely, /have/ been experienced. > And the APIs won't be killable until > Python 4000. Which APIs? The PEP doesn't propose any new API, it just enhances the implementation of current APIs so that they work out of the box in all cases. > Specifically, if the return values were bytes, ... it would make Windows support worse. > or (better for 2.x, > where bytes are strings as far as most programmers are concerned) as a > new data type, I'm -1 on any new string-like type (for file paths or whatever else) with custom encoding/decoding semantics. It's the best way to ruin the clean str/bytes separation that 3.x introduced. Besides, the goal is also to make things easier for the programmer. Otherwise, we'll have the same situation as in 2.x where many English-centric programmers produced code that was incapable of dealing with non-ASCII input, because they didn't care about the distinction between str and unicode. Regards Antoine. From dd at crosstwine.com Mon Apr 27 18:21:15 2009 From: dd at crosstwine.com (Damien Diederen) Date: Mon, 27 Apr 2009 18:21:15 +0200 Subject: [Python-Dev] Dropping bytes "support" in json References: <87k556dvh0.fsf@keem.bcc> Message-ID: <87ab62awzo.fsf@keem.bcc> Hi Antoine, Antoine Pitrou writes: > Damien Diederen
crosstwine.com> writes: >> I couldn't figure out a way to get rid of it short of multi-#including >> "templates" and playing with the C preprocessor, however, and have the >> nagging feeling the latter would be frowned upon by the maintainers. >> >> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm >> wrong about that. Should I give it a try, and see how "clean" the >> result can be made? > > Keep in mind that json is externally maintained by Bob. The more we rework his > code, the less easy it will be to backport other changes from the simplejson > library. > > I think we should either keep the code duplication (if we want to keep fast > paths for both bytes and str objects), or only keep one of the two versions as > my patch does. Yes, I was (slowly) reaching the same conclusion. >> Provided one of the alternatives is dropped, wouldn't it be better to do >> the opposite, i.e., have the decoder take bytes as input, and the >> encoder produce bytes--and layer the str functionality on top of that? I >> guess the answer depends on how the (most common) lower layers are >> structured, but it would be nice to allow a straight bytes path to/from >> the underlying transport. > > The straightest path is actually to/from unicode, since JSON data can contain > unicode strings but no byte strings. Also, the json library /has/ to output > unicode when `ensure_ascii` is False. In 2.x: > >>>> json.dumps([u"éléphant"], ensure_ascii=False) > u'["\xe9l\xe9phant"]' > > In any case, I don't think it will matter much in terms of speed > whether we take one route or the other. UTF-8 encoding/decoding is > probably much faster (in characters per second) than JSON > encoding/decoding is. You're undoubtedly right. I was more concerned about the interaction with other modules, and avoiding unnecessary copies/conversions especially when they don't make sense from the user's perspective. 
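The bytes-over-str layering being discussed is thin enough to sketch in a few lines, using the hypothetical `dumpb`/`loadb` names from this thread (an illustration of the idea, not an API that exists in the stdlib):

```python
import json

def dumpb(obj, encoding="utf-8", **kw):
    # serialize via the str-based API, then encode for the transport
    return json.dumps(obj, **kw).encode(encoding)

def loadb(data, encoding="utf-8", **kw):
    # decode bytes from the transport, then parse via the str-based API
    return json.loads(data.decode(encoding), **kw)

doc = {"animal": "éléphant", "count": [1, 2]}
wire = dumpb(doc, ensure_ascii=False)
assert isinstance(wire, bytes)
assert loadb(wire) == doc
```

Since JSON text is defined over Unicode, the str layer is the natural primitive here and the bytes layer is a pure wrapper; the reverse layering would have to pick an encoding internally anyway.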
I will whip up a patch adding a {loadb,dumpb} API as you suggested in another email, with the most trivial implementation, and then we'll see where to go from there. It can still be dropped if there is a concern of perpetuating a "bad idea," or I can follow up with a port of Bob's "bytes" implementation from 2.x if there is any interest. > Regards > Antoine. Cheers, Damien -- http://crosstwine.com "Strong Opinions, Weakly Held" -- Bob Johansen From jek-gmane1 at kleckner.net Mon Apr 27 19:10:31 2009 From: jek-gmane1 at kleckner.net (Jim Kleckner) Date: Mon, 27 Apr 2009 10:10:31 -0700 Subject: [Python-Dev] 2.6.2 Vista installer failure on upgrade from 2.6.1 Message-ID: I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755 with the message "system cannot open the device or file". I uninstalled 2.6.1, removing all residual files also, and got the error message again. When I ran msiexec as follows to get a log, it magically worked: msiexec /i python-2.6.2.msi /l*v install.log Should I attempt to explore this further or just be happy? From stephen at xemacs.org Mon Apr 27 19:45:15 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Apr 2009 02:45:15 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904270858h72760fe6m2248f0e8bb99c3d7@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904270858h72760fe6m2248f0e8bb99c3d7@mail.gmail.com> Message-ID: <87y6tmj8ic.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > 2009/4/27 Stephen J. Turnbull : > > I believe there are solutions that don't have that problem. 
> > Specifically, if the return values were bytes, or (better for 2.x, > > where bytes are strings as far as most programmers are concerned) as a > > new data type, to indicate that they're not text until the client > > acknowledges them as such. EIBTI. > > I think you're ignoring the fact that under Windows, it's the *bytes* > APIs that are lossy. The *Windows* bytes APIs may be lossy. Python's bytes on the other hand can represent anything that UTF-16 can. Just represented as UTF-8. The point is that in Python 3 "bytes" means it's *your* responsibility, not Python's, to decode that data. The advantage of a new data type is that Python can provide ways to do it and hide the internal representation (in theory, it could even be different for the different platforms). > Can I at least assume that you aren't recommending that only the bytes > API exists on Unix, and only the Unicode API on Windows? I'm agnostic about the underlying APIs used to talk to the OS; people who actually use that OS should decide that. I'm just recommending that the return values of the getters not be of a "character string" type until converted explicitly by the application. > The *only* "robust" solution is to completely separate the 2 > platforms. I'm not so pessimistic, unless you're referring to Microsoft's penchant for forking any solution they don't own. > People *want* a solution that doesn't require every application > developer to sweat blood to write working code, simply to cover > corner cases that they don't believe will happen. The rest of us > don't want to be made to care. Well, yes, I wrote pretty much the same thing in the post you're replying to. But do you really think PEP 383 as written is the unique solution to those requirements? From stephen at xemacs.org Mon Apr 27 20:04:44 2009 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 28 Apr 2009 03:04:44 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > > or (better for 2.x, where bytes are strings as far as most > > programmers are concerned) as a new data type, > > I'm -1 on any new string-like type (for file paths or whatever > else) with custom encoding/decoding semantics. It's the best way to > ruin the clean str/bytes separation that 3.x introduced. Excuse me, but I can't see a scheme that encodes bytes as Unicodes but only sometimes as a "clean separation". It's a dirty hack that makes life a lot easier for Windows programmers and a little easier for many Unix programmers. Practicality beats purity, true, but at the cost of the purity. > Besides, the goal is also to make things easier for the > programmer. Otherwise, we'll have the same situation as in 2.x > where many English-centric programmers produced code that was > incapable of dealing with non-ASCII input, because they didn't care > about the distinction between str and unicode. So what you'll get here, AFAICS, is a new situation where many Windows-centric programmers will produce code that's incapable of dealing with non-Unicode input because they don't have to care about the distinction between Unicode and bytes. That's an improvement, but we can do still better and not at huge expense to programmers. 
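As a footnote for readers of the archive: an explicit-conversion API close in spirit to what is being asked for here arrived later as os.fsencode() and os.fsdecode() (Python 3.2). They make the str/bytes choice the application's explicit act while using PEP 383's handler underneath. A sketch of the round trip, assuming a POSIX build with a UTF-8 filesystem encoding (Windows behaves differently):

```python
import os

# An arbitrary byte filename, including one byte that is invalid UTF-8.
raw = b"caf\xe9.txt"

# fsdecode gives the str form that os.listdir() etc. would return;
# fsencode reverses it exactly (on POSIX, via surrogateescape).
name = os.fsdecode(raw)
assert isinstance(name, str)
assert os.fsencode(name) == raw
```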
From tonynelson at georgeanelson.com Mon Apr 27 20:08:51 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Mon, 27 Apr 2009 14:08:51 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F5532D.2090700@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zhlwtv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F215E5.4050205@g.nevcal.com> <49F30390.2040808@v.loewis.de> <49F5532D.2090700@g.nevcal.com> Message-ID: At 23:39 -0700 04/26/2009, Glenn Linderman wrote: >On approximately 4/25/2009 5:35 AM, came the following characters from >the keyboard of Martin v. Löwis: >>> Because the encoding is not reliably reversible. >> >> Why do you say that? The encoding is completely reversible >> (unless we disagree on what "reversible" means). >> >>> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a >>> reversible encoding. >> >> Then please provide an example for a setup where it is not reversible. >> >> Regards, >> Martin > >It is reversible if you know that it is decoded, and apply the encoding. > But if you don't know that has been encoded, then applying the reverse >transform can convert an undecoded str that matches the decoded str to >the form that it could have, but never did take. > >The problem is that there is no guarantee that the str interface >provides only strictly conforming Unicode, so decoding bytes to >non-strictly conforming Unicode, can result in a data pun between >non-strictly conforming Unicode coming from the str interface vs bytes >being decoded to non-strictly conforming Unicode coming from the bytes >interface. ... 
Maybe this is a dumb idea, but some people might be reassured if the half-surrogates had some particular pattern that is unlikely to occur even in unreasonable text (as half-surrogates are an error in Unicode). The pattern could be some sequence of half-surrogate encoded bytes, framing the intended data, as is done for RFC 2047 internationalized header fields in email. It would take up a few more bytes in the string, but no matter. It would also make it easier to diagnose when decoding was not properly done. FWIW, I like the idea in the PEP, now that I think I understand it. (BTW, gotta love what the email package is doing to the Subject: header field. ;-') -- ____________________________________________________________________ TonyN.:' ' From tonynelson at georgeanelson.com Mon Apr 27 20:07:45 2009 From: tonynelson at georgeanelson.com (Tony Nelson) Date: Mon, 27 Apr 2009 14:07:45 -0400 Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?= =?utf-8?q?haracter=09Interfaces?= In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: At 16:09 +0000 04/27/2009, Antoine Pitrou wrote: >Stephen J. Turnbull xemacs.org> writes: >> >> I hate to break it to you, but most stages of mail processing have >> very little to do with SMTP. In particular, processing MIME >> attachments often requires dealing with file names. > >AFAIK, the file name is only there as an indication for the user when he wants >to save the file. If it's garbled a bit, no big deal. ... Yep. In fact, it should be cleaned carefully. 
RFC 2183, 2.3: "It is important that the receiving MUA not blindly use the suggested filename. The suggested filename SHOULD be checked (and possibly changed) to see that it conforms to local filesystem conventions, does not overwrite an existing file, and does not present a security problem (see Security Considerations below). The receiving MUA SHOULD NOT respect any directory path information that may seem to be present in the filename parameter. The filename should be treated as a terminal component only. Portable specification of directory paths might possibly be done in the future via a separate Content Disposition parmeter, but no provision is made for it in this draft." -- ____________________________________________________________________ TonyN.:' ' From solipsis at pitrou.net Mon Apr 27 20:13:47 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Apr 2009 18:13:47 +0000 (UTC) Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull xemacs.org> writes: > > Excuse me, but I can't see a scheme that encodes bytes as Unicodes but > only sometimes as a "clean separation". Yet it is. Filenames are all unicode, without exception, and there's no implicit conversion to bytes. That's a clean separation. 
> So what you'll get here, AFAICS, is a new situation where many > Windows-centric programmers will produce code that's incapable of > dealing with non-Unicode input because they don't have to care about > the distinction between Unicode and bytes. I don't understand what you're saying. py3k filenames are all unicode, even on POSIX systems, so where is the problem with/for Windows programmers? From asmodai at in-nomine.org Mon Apr 27 20:28:40 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Mon, 27 Apr 2009 20:28:40 +0200 Subject: [Python-Dev] UTF-8 Decoder In-Reply-To: References: <20090413080908.GM13110@nexus.in-nomine.org> Message-ID: <20090427182840.GA64563@nexus.in-nomine.org> -On [20090414 16:43], Antoine Pitrou (solipsis at pitrou.net) wrote: >If you have some time on your hands, you could try benchmarking it against >Python 3.1's (py3k) decoder. There are two cases to consider: Bjoern actually did it himself already: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#performance (results are Large, Medium, Tiny) PyUnicode_DecodeUTF8Stateful (3.1a2), Visual C++ 7.1 -Ox -Ot -G7 4523ms 5686ms 3138ms Manually inlined transcoder (see above), Visual C++ 7.1 -Ox -Ot -G7 4277ms 4998ms 4640ms So on medium and large datasets Bjoern's decoder is very interesting, but in the tiny case (just Bjoern's name) it is quite a bit slower. The other cases seem more typical of what the average use in Python would be. -- Jeroen Ruigrok van der Werven / asmodai http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Nobilitas sola est atque unica virtus...
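[Editorial note: the comparison above can be re-run in spirit from Python itself. A rough sketch follows; the data shapes are only loosely modelled on Bjoern's Large/Medium/Tiny buffers, and all sizes and labels here are assumptions, not his actual datasets.]

```python
import timeit

# Rough decode benchmark in the spirit of the numbers quoted above.
# Shapes loosely modelled on Bjoern's buffers (Hindi Wikipedia dump /
# UTF-8-demo.txt / a short mostly-ASCII name); sizes are made up.
datasets = {
    "large (multi-byte heavy)":
        ("\u0939\u093f\u0928\u094d\u0926\u0940 " * 50000).encode("utf-8"),
    "medium (mixed)":
        ("ASCII text plus caf\u00e9 na\u00efve " * 20000).encode("utf-8"),
    "tiny":
        "Bj\u00f6rn H\u00f6hrmann".encode("utf-8"),
}

for label, data in datasets.items():
    # timeit returns total seconds for 100 runs; t * 10 is ms per decode.
    t = timeit.timeit(lambda d=data: d.decode("utf-8"), number=100)
    print(f"{label}: {len(data)} bytes, {t * 10:.3f} ms/decode")
```

Swapping in a mostly-ASCII buffer (as Antoine suggests downthread) only requires changing the dataset literals.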
From solipsis at pitrou.net Mon Apr 27 20:48:38 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Apr 2009 18:48:38 +0000 (UTC) Subject: [Python-Dev] UTF-8 Decoder References: <20090413080908.GM13110@nexus.in-nomine.org> <20090427182840.GA64563@nexus.in-nomine.org> Message-ID: Jeroen Ruigrok van der Werven in-nomine.org> writes: > > So on medium and large datasets the decoder of Bjoern is very interesting, > but the tiny case (just Bjoern's name) is quite a tad bit slower. The other > cases seems more typical of what the average use in Python would be. Keep in mind what the datasets are: "The large buffer is a April 2009 Hindi Wikipedia article XML dump, the medium buffer Markus Kuhn's UTF-8-demo.txt, and the tiny buffer my name" It would be interesting to test with mostly ASCII data to see what that gives. Now the good thing is that, even with wildly non-ASCII data, our current decoder is very efficient. Regards Antoine. From martin at v.loewis.de Mon Apr 27 21:42:02 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 27 Apr 2009 21:42:02 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F559A4.8050400@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> Message-ID: <49F60A8A.8090603@v.loewis.de> >> It's a private use area. It will never carry an official character >> assignment. > > > I know that U+F0000 - U+FFFFF is a private use area. I don't find a > definition of U+F01xx to know what the notation means. Are you picking > a particular character within the private use area, or a particular > range, or what? It's a range. The lower-case 'x' denotes a variable half-byte, ranging from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code points.
Regards, Martin From martin at v.loewis.de Mon Apr 27 21:48:27 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 27 Apr 2009 21:48:27 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F56F8B.7030108@g.nevcal.com> References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> Message-ID: <49F60C0B.9000905@v.loewis.de> >>> There are still issues regarding how Windows and POSIX programs that >>> are sharing cross-mounted file systems might communicate file names >>> between each other, which is not at all clear from the PEP. If this >>> is an insoluble or un-addressed issue, it should be stated. (It is >>> probably insoluble, due to there being multiple ways that the >>> cross-mounted file systems might translate names; but if there are, >>> can we learn something from the rules the mounting systems use, to >>> be compatible with (one of) them, or not. >>> >> >> I'd say that's out of scope. A windows filesystem mounted on a UNIX host >> should probably be mounted with a mapping to translate the Windows >> Unicode names into whatever the sysadmin deems the locally most apt >> byte encoding. But sys.getfilesystemencoding() is based on the current >> user's locale settings, which need not be the same. >> > > And if it were, what would it do with files that can't be encoded with > the locally most apt byte encoding? As Cameron says: it's out of the scope of the PEP. It really depends how the operating system deals with them. Most likely, the files are not accessible - not only not from Python, but also not accessible from any other Unix program. Details depend on the specific operating system software being used, and the specific parameters passed to it. > That's where we might learn > something about what behaviors are deemed acceptable. Would such files > be inaccessible? Accessible with mangled names? or what? Difficult to tell. 
What operating system did you use, and what mount options did you pass? > And for a Unix filesystem mounted on a Windows host? Or accessed via > some network connection? Same issue really: what specific mounting software did you use? Windows cannot mount Unix file systems on its own, or through some network connection. Regards, Martin From martin at v.loewis.de Mon Apr 27 22:27:27 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 27 Apr 2009 22:27:27 +0200 Subject: [Python-Dev] 2.6.2 Vista installer failure on upgrade from 2.6.1 In-Reply-To: References: Message-ID: <49F6152F.9080707@v.loewis.de> Jim Kleckner wrote: > I went to upgrade a Vista machine from 2.6.1 to 2.6.2 and got error 2755 > with the message "system cannot open the device or file". > > I uninstalled 2.6.1, removing all residual files also, and got the error > message again. > > When I ran msiexec as follows to get a log, it magically worked: > msiexec /i python-2.6.2.msi /l*v install.log > > Should I attempt to explore this further or just be happy? Were you by any chance using a SUBSTed drive? If so, just be happy: this is a known limitation (of Windows installer). Otherwise, if you can contribute a useful bug report (or even a patch), please go ahead. I would try to turn logging on through the registry and see whether that gives any insight. Regards, Martin From cs at zip.com.au Mon Apr 27 23:14:47 2009 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 28 Apr 2009 07:14:47 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F559A4.8050400@g.nevcal.com> Message-ID: <20090427211447.GA4291@cskk.homeip.net> On 27Apr2009 00:07, Glenn Linderman wrote: > On approximately 4/25/2009 5:22 AM, came the following characters from > the keyboard of Martin v. Löwis:
>>> The problem with this, and other preceding schemes that have been >>> discussed here, is that there is no means of ascertaining whether a >>> particular file name str was obtained from a str API, or was funny- >>> decoded from a bytes API... and thus, there is no means of reliably >>> ascertaining whether a particular filename str should be passed to a >>> str API, or funny-encoded back to bytes. >> >> Why is it necessary that you are able to make this distinction? > > > It is necessary that programs (not me) can make the distinction, so that > it knows whether or not to do the funny-encoding or not. I would say this isn't so. It's important that programs know if they're dealing with strings-for-filenames, but not that they be able to figure that out "a priori" if handed a bare string (especially since they can't:-) > If a name is > funny-decoded when the name is accessed by a directory listing, it needs > to be funny-encoded in order to open the file. Hmm. I had thought that legitimate unicode strings already get transcoded to bytes via the mapping specified by sys.getfilesystemencoding() (the user's locale). That already happens I believe, and Martin's scheme doesn't change this. He's just funny-encoding non-decodable byte sequences, not the decoded stuff that surrounds them. So it is already the case that strings get decoded to bytes by calls like open(). Martin isn't changing that. I suppose if your program carefully constructs a unicode string riddled with half-surrogates etc and imagines something specific should happen to them on the way to being POSIX bytes then you might have a problem... I think the advantage to Martin's choice of encoding-for-undecodable-bytes is that it _doesn't_ use normal characters for the special bits. This means that _all_ normal characters are left unmangled in both "bare" and "funny-encoded" strings.
I think presentation of the special characters _should_ look bogus in an app (eg little rectangles or whatever in a GUI); it's a fine flashing red light to the user. Also, by avoiding reuse of legitimate characters in the encoding we can avoid your issue with losing track of where a string came from; legitimate characters are currently untouched by Martin's scheme, except for the normal "bytes<->string via the user's locale" translation that must already happen, and there you're aided by bytes and strings being different types. > I'm certainly not experienced enough in Python development processes or > internals to attempt such, as yet. But somewhere in 25 years of > programming, I picked up the knowledge that if you want to have a 1-to-1 > reversible mapping, you have to avoid data puns, mappings of two > different data values into a single data value. Your PEP, as first > written, didn't seem to do that... since there are two interfaces from > which to obtain data values, one performing a mapping from bytes to > "funny invalid" Unicode, and the other performing no mapping, but > accepting any sort of Unicode, possibly including "funny invalid" > Unicode, the possibility of data puns seems to exist. I may be > misunderstanding something about the use cases that prevent these two > sources of "funny invalid" Unicode from ever coexisting, but if so, > perhaps you could point it out, or clarify the PEP. Please elucidate the "second source" of strings. I'm presuming you mean strings generated from scratch rather than obtained by something like listdir(). Given such a string with "funny invalid" stuff in it, and _absent_ Martin's scheme, what do you expect the source of the strings to _expect_ to happen to them if passed to open()? They still have to be converted to bytes at the POSIX layer anyway. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Heaven could change from chocolate to vanilla without violating perfection.
- arromdee at jyusenkyou.cs.jhu.edu (Ken Arromdee) From hodgestar+pythondev at gmail.com Mon Apr 27 23:27:23 2009 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Mon, 27 Apr 2009 23:27:23 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F60C0B.9000905@v.loewis.de> References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de> Message-ID: On Mon, Apr 27, 2009 at 9:48 PM, "Martin v. L?wis" wrote: > As Cameron says: it's out of the scope of the PEP. It really depends how > the operating system deals with them. Most likely, the files are not > accessible - not only not from Python, but also not accessible from > any other Unix program. Details depend on the specific operating system > software being used, and the specific parameters passed to it. $ touch $'\xFF\xAA\xFF' $ vi $'\xFF\xAA\xFF' $ egrep foo $'\xFF\xAA\xFF' All worked fine from my Bash shell with locale encoding set to UTF-8. I can also open the created file from the GNOME editor file dialog (it even tells me the filename is not valid in my locale's encoding). The Nedit editor also worked. So far I haven't found anything that failed. From martin at v.loewis.de Mon Apr 27 23:33:56 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 27 Apr 2009 23:33:56 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de> Message-ID: <49F624C4.10006@v.loewis.de> > $ touch $'\xFF\xAA\xFF' > $ vi $'\xFF\xAA\xFF' > $ egrep foo $'\xFF\xAA\xFF' > > All worked fine from my Bash shell with locale encoding set to UTF-8. > I can also open the created file from the GNOME editor file dialog (it > even tells me the filename is not valid in my locale's encoding). The > Nedit editor also worked. 
So far I haven't found anything that failed. So what SMB server did you mount here, using what software, and what mount options? I think you might be referring to an entirely different use case. Regards, Martin From solipsis at pitrou.net Mon Apr 27 23:55:41 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 27 Apr 2009 21:55:41 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?= =?utf-8?q?haracter=09Interfaces?= References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de> Message-ID: Simon Cross gmail.com> writes: > > $ touch $'\xFF\xAA\xFF' > $ vi $'\xFF\xAA\xFF' > $ egrep foo $'\xFF\xAA\xFF' > > All worked fine from my Bash shell with locale encoding set to UTF-8. The PEP is precisely about making py3k able to better handle these files (right now os.listdir() doesn't return the offending file in its list of results). Regards Antoine. From fuzzyman at voidspace.org.uk Tue Apr 28 00:56:05 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 27 Apr 2009 23:56:05 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System C haracter Interfaces In-Reply-To: <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <49F63805.6000208@voidspace.org.uk> Stephen J. 
Turnbull wrote: > Antoine Pitrou writes: > > > > or (better for 2.x, where bytes are strings as far as most > > > programmers are concerned) as a new data type, > > > > I'm -1 on any new string-like type (for file paths or whatever > > else) with custom encoding/decoding semantics. It's the best way to > > ruin the clean str/bytes separation that 3.x introduced. > > Excuse me, but I can't see a scheme that encodes bytes as Unicodes but > only sometimes as a "clean separation". It's a dirty hack that makes > life a lot easier for Windows programmers and a little easier for many > Unix programmers. Practicality beats purity, true, but at the cost of > the purity. > > The problem you don't address, which is still the reality for most programmers (especially Mac OS X where filesystem encoding is UTF 8), is that programmers *are* going to treat filenames as strings. The proposed PEP allows that to work for them - whatever platform their program runs on. Michael > > Besides, the goal is also to makes things easier for the > > programmer. Otherwise, we'll have the same situation as in 2.x > > where many English-centric programmers produced code that was > > incapable of dealing with non-ASCII input, because they didn't care > > about the distinction between str and unicode. > > So what you'll get here, AFAICS, is a new situation where many > Windows-centric programmers will produce code that's incapable of > dealing with non-Unicode input because they don't have to care about > the distinction between Unicode and bytes. > > That's an improvement, but we can do still better and not at huge > expense to programmers. 
-- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From v+python at g.nevcal.com Tue Apr 28 01:09:13 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 16:09:13 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F60A8A.8090603@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> Message-ID: <49F63B19.7010306@g.nevcal.com> On approximately 4/27/2009 12:42 PM, came the following characters from the keyboard of Martin v. Löwis: >>> It's a private use area. It will never carry an official character >>> assignment. >> >> I know that U+F0000 - U+FFFFF is a private use area. I don't find a >> definition of U+F01xx to know what the notation means. Are you picking >> a particular character within the private use area, or a particular >> range, or what? > > It's a range. The lower-case 'x' denotes a variable half-byte, ranging > from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code > points. So you only need 128 code points, so there is something else unclear. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Tue Apr 28 01:46:06 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 16:46:06 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F60C0B.9000905@v.loewis.de> References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de> Message-ID: <49F643BE.4050605@g.nevcal.com> On approximately 4/27/2009 12:48 PM, came the following characters from the keyboard of Martin v. L?wis: >>>> There are still issues regarding how Windows and POSIX programs that >>>> are sharing cross-mounted file systems might communicate file names >>>> between each other, which is not at all clear from the PEP. If this >>>> is an insoluble or un-addressed issue, it should be stated. (It is >>>> probably insoluble, due to there being multiple ways that the >>>> cross-mounted file systems might translate names; but if there are, >>>> can we learn something from the rules the mounting systems use, to >>>> be compatible with (one of) them, or not. >>>> >>>> >>> I'd say that's out of scope. A windows filesystem mounted on a UNIX host >>> should probably be mounted with a mapping to translate the Windows >>> Unicode names into whatever the sysadmin deems the locally most apt >>> byte encoding. But sys.getfilesystemencoding() is based on the current >>> user's locale settings, which need not be the same. >>> >>> >> And if it were, what would it do with files that can't be encoded with >> the locally most apt byte encoding? >> > > As Cameron says: it's out of the scope of the PEP. It really depends how > the operating system deals with them. Most likely, the files are not > accessible - not only not from Python, but also not accessible from > any other Unix program. 
Details depend on the specific operating system > software being used, and the specific parameters passed to it. > I'm not suggesting the PEP should solve the problem of mounting foreign file systems, although if it doesn't it should probably point that out. I'm just suggesting that if the people that write software to solve the problem of mounting foreign file systems have already solved the naming problem, then it might be a source of a good solution. On the other hand, it might be the source of a mediocre or bad solution. However, if those mounting system have good solutions, it would be good to be compatible with them, rather than have yet another solution. It was in that sense, of thinking about possibly existing practice, and leveraging an existing solution, that caused me to bring up the topic. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From steve at pearwood.info Tue Apr 28 02:27:17 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Apr 2009 10:27:17 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <200904281027.20431.steve@pearwood.info> On Tue, 28 Apr 2009 04:13:47 am Antoine Pitrou wrote: > Stephen J. Turnbull xemacs.org> writes: ... > > So what you'll get here, AFAICS, is a new situation where many > > Windows-centric programmers will produce code that's incapable of > > dealing with non-Unicode input because they don't have to care > > about the distinction between Unicode and bytes. > > I don't understand what you're saying. py3k filenames are all > unicode, even on POSIX systems, How is that possible on POSIX systems where the underlying file system uses bytes for filenames? 
If I write a piece of Python code: filename = 'some path/some name' I might call it a filename, I might think of it as a filename, but it *isn't*, it's a string in a Python program. It isn't a filename until it hits the file system, and in POSIX systems that makes it bytes. -- Steven D'Aprano From cs at zip.com.au Tue Apr 28 02:42:32 2009 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 28 Apr 2009 10:42:32 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F60C0B.9000905@v.loewis.de> Message-ID: <20090428004232.GA12325@cskk.homeip.net> On 27Apr2009 21:48, Martin v. L?wis wrote: | >>> There are still issues regarding how Windows and POSIX programs that | >>> are sharing cross-mounted file systems might communicate file names | >>> between each other, which is not at all clear from the PEP. If this | >>> is an insoluble or un-addressed issue, it should be stated. (It is | >>> probably insoluble, due to there being multiple ways that the | >>> cross-mounted file systems might translate names; but if there are, | >>> can we learn something from the rules the mounting systems use, to | >>> be compatible with (one of) them, or not. | >> | >> I'd say that's out of scope. A windows filesystem mounted on a UNIX host | >> should probably be mounted with a mapping to translate the Windows | >> Unicode names into whatever the sysadmin deems the locally most apt | >> byte encoding. But sys.getfilesystemencoding() is based on the current | >> user's locale settings, which need not be the same. | >> | > | > And if it were, what would it do with files that can't be encoded with | > the locally most apt byte encoding? | | As Cameron says: it's out of the scope of the PEP. It really depends how | the operating system deals with them. Most likely, the files are not | accessible - not only not from Python, but also not accessible from | any other Unix program. Well... 
If the files exist and the encoding of the mount software permits, there will be a sequence of bytes for the filename, and it will be accessible to a pure UNIX byte-speaking program. It will also be accessible from Python, because the os.* calls convert both ways: bytes->string and string->bytes as required. Martin's PEP just makes that lossless, which currently it is not. Conversely, if the mount software refuses to map the filename to a POSIX byte string, the file won't exist, or will refuse to be created. For a concrete example we have but to observe my macify program I was trying to counter the PEP with (I'm now a convert, btw). It is to run on a real UNIX system and recode filenames into UTF-8 NFD, _prior_ to rsyncing to a Mac. Why? Because the MacOSX HFS filesystem refuses to accept byte strings not parsable by that encoding, and my music rsyncs were exploding, refusing to create files on the target Mac. And there's probably some grey area where a dodgy mount software will present names that can't be used. There's a supposed counter example in another followup post which I'll address there, since it seemed a little bogus to me. I think that, almost independent of this PEP, there should be an os.fsencode() function that takes a byte string (as a POSIX OS call will take) and performs the _same_ byte->string encoding that listdir() and friends are doing under the hood. And a partner os.fsdecode() for string->bytes. That will save a lot of wheel respoking and probably make it easier for people to think about this. Aside: thinking on that, perhaps those functions should be in posix.*, or alternatively would a Windows system offer them in os.* to produce native UTF-16 byte strings; useless for the Windows API which cleanly takes unicode (I gather) but perhaps handy for people hacking filesystems directly or something like that.
(Except I gather from a former existence that there is a multitude of on-disk filename encodings under Windows depending how old your filesystems are and if they're FAT or NTFS, etc). Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Your eyes are weary from staring at the CRT. You feel sleepy. Notice how restful it is to watch the cursor blink. Close your eyes. The opinions stated above are yours. You cannot imagine why you ever felt otherwise. - gabrielh at tplrd.tpl.oz.au From cs at zip.com.au Tue Apr 28 02:48:09 2009 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 28 Apr 2009 10:48:09 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: Message-ID: <20090428004809.GA17780@cskk.homeip.net> On 27Apr2009 23:27, Simon Cross wrote: | On Mon, Apr 27, 2009 at 9:48 PM, "Martin v. Löwis" wrote: | > As Cameron says: it's out of the scope of the PEP. It really depends how | > the operating system deals with them. Most likely, the files are not | > accessible - not only not from Python, but also not accessible from | > any other Unix program. Details depend on the specific operating system | > software being used, and the specific parameters passed to it. | | $ touch $'\xFF\xAA\xFF' | $ vi $'\xFF\xAA\xFF' | $ egrep foo $'\xFF\xAA\xFF' | | All worked fine from my Bash shell with locale encoding set to UTF-8. | I can also open the created file from the GNOME editor file dialog (it | even tells me the filename is not valid in my locale's encoding). The | Nedit editor also worked. So far I haven't found anything that failed. Yes, they would. Are you doing that on a real UNIX filesystem (ext2/3/4, XFS etc)? I'm not sure whether you're arguing for or against the proposal here, btw. This would make a file with a presumably UTF-8-invalid name. Martin's proposal would cheerfully map that losslessly to a string. Is there a problem here?
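[Editorial note: the lossless mapping described here, together with the os.fsencode()/os.fsdecode() helper pair Cameron floats earlier in the thread, can be sketched as follows. The function names and the 'surrogateescape' error-handler name are assumptions of this sketch, not part of the original messages.]

```python
import sys

def fs_decode(raw: bytes) -> str:
    """bytes -> str, escaping undecodable bytes as lone surrogates
    (hypothetical helper; PEP 383 semantics assumed)."""
    return raw.decode(sys.getfilesystemencoding(), "surrogateescape")

def fs_encode(name: str) -> bytes:
    """str -> bytes, the exact inverse of fs_decode."""
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")

# Simon's UTF-8-invalid filename from above round-trips losslessly:
raw = b"\xff\xaa\xff"
assert fs_encode(fs_decode(raw)) == raw
# Ordinary decodable names pass through untouched:
assert fs_decode(b"plain") == "plain"
```

The round-trip holds regardless of the filesystem encoding in effect, because undecodable bytes are escaped rather than replaced or dropped.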
-- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Stepwise Refinement n. A sequence of kludges K, neither distinct or finite, applied to a program P aimed at transforming it into the target program Q. From benjamin at python.org Tue Apr 28 03:09:10 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 27 Apr 2009 20:09:10 -0500 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090428004232.GA12325@cskk.homeip.net> References: <49F60C0B.9000905@v.loewis.de> <20090428004232.GA12325@cskk.homeip.net> Message-ID: <1afaf6160904271809g773641aag3975ff67178ab69@mail.gmail.com> 2009/4/27 Cameron Simpson : > I think that, almost independent of this PEP, there should be an > os.fsencode() function that takes a byte string (as a POSIX OS call > will take) and performs the _same_ byte->string encoding that listdir() > and friends are doing under the hood. And a partner os.fsdecode() for > string->bytes. That will save a lot of wheel respoking and probably make > it easier for people to think about this. some_path.encode(sys.getfilesystemencoding()) -- Regards, Benjamin From v+python at g.nevcal.com Tue Apr 28 03:15:17 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 18:15:17 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090427211447.GA4291@cskk.homeip.net> References: <20090427211447.GA4291@cskk.homeip.net> Message-ID: <49F658A5.7080807@g.nevcal.com> On approximately 4/27/2009 2:14 PM, came the following characters from the keyboard of Cameron Simpson: > On 27Apr2009 00:07, Glenn Linderman wrote: > >> On approximately 4/25/2009 5:22 AM, came the following characters from >> the keyboard of Martin v. 
L?wis: >> >>>> The problem with this, and other preceding schemes that have been >>>> discussed here, is that there is no means of ascertaining whether a >>>> particular file name str was obtained from a str API, or was funny- >>>> decoded from a bytes API... and thus, there is no means of reliably >>>> ascertaining whether a particular filename str should be passed to a >>>> str API, or funny-encoded back to bytes. >>>> >>> Why is it necessary that you are able to make this distinction? >>> >> It is necessary that programs (not me) can make the distinction, so that >> it knows whether or not to do the funny-encoding or not. >> > > I would say this isn't so. It's important that programs know if they're > dealing with strings-for-filenames, but not that they be able to figure > that out "a priori" if handed a bare string (especially since they > can't:-) > So you agree they can't... that there are data puns. (OK, you may not have thought that through) >> If a name is >> funny-decoded when the name is accessed by a directory listing, it needs >> to be funny-encoded in order to open the file. >> > > Hmm. I had thought that legitimate unicode strings already get transcoded > to bytes via the mapping specified by sys.getfilesystemencoding() > (the user's locale). That already happens I believe, and Martin's > scheme doesn't change this. He's just funny-encoding non-decodable byte > sequences, not the decoded stuff that surrounds them. > So assume a non-decodable sequence in a name. That puts us into Martin's funny-decode scheme. His funny-decode scheme produces a bare string, indistinguishable from a bare string that would be produced by a str API that happens to contain that same sequence. Data puns. So when open is handed the string, should it open the file with the name that matches the string, or the file with the name that funny-decodes to the same string? It can't know, unless it knows that the string is a funny-decoded string or not. 
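[Editorial note: the ambiguity Glenn describes can be made concrete. A sketch, assuming the PEP's funny-decoding is exposed as a codec error handler here given the hypothetical name 'surrogateescape':]

```python
# A str produced by funny-decoding bytes from the filesystem...
decoded = b"\xff".decode("utf-8", "surrogateescape")

# ...and an equal str arriving from some str API (or built directly):
constructed = "\udcff"

# The two are indistinguishable -- Glenn's "data pun":
assert decoded == constructed
assert type(decoded) is type(constructed) is str
```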
> So it is already the case that strings get decoded to bytes by > calls like open(). Martin isn't changing that. > I thought the process of converting strings to bytes is called encoding. You seem to be calling it decoding? > I suppose if your program carefully constructs a unicode string riddled > with half-surrogates etc and imagines something specific should happen > to them on the way to being POSIX bytes then you might have a problem... > Right. Or someone else's program does that. I only want to use Unicode file names. But if those other file names exist, I want to be able to access them, and not accidentally get a different file. > I think the advantage to Martin's choice of encoding-for-undecodable-bytes > is that it _doesn't_ use normal characters for the special bits. This > means that _all_ normal characters are left unmangled in both "bare" > and "funny-encoded" strings. > Whether the characters used for funny decoding are normal or abnormal, unless they are prevented from also appearing in filenames when they are obtained from or passed to other APIs, there is the possibility that the funny-decoded name also exists in the filesystem by the funny-decoded name... a data pun on the name. Whether the characters used for funny decoding are normal or abnormal, if they are not prevented from also appearing in filenames when they are obtained from or passed to other APIs, then in order to prevent data puns, *all* names must be passed through the decoder, and the decoder must perform a 1-to-1 reversible mapping. Martin's funny-decode process does not perform a 1-to-1 reversible mapping (unless he's changed it from the version of the PEP I found to read). This is why some people have suggested using the null character for the decoding, because it and / can't appear in POSIX file names, but everything else can. But that makes it really hard to display the funny-decoded characters. 
> Because of that, I now think I'm -1 on your "use printable characters > for the encoding". I think presentation of the special characters > _should_ look bogus in an app (eg little rectangles or whatever in a > GUI); it's a fine flashing red light to the user. > The reason I picked an ASCII printable character is just to make it easier for humans to see the encoding. The scheme would also work with a non-ASCII non-printable character... but I fail to see how that would help a human compare the strings on a display of file names. Having a bunch of abnormal characters in a row, displayed using a single replacement glyph, just makes an annoying mess in the file open dialog. > Also, by avoiding reuse of legitimate characters in the encoding we can > avoid your issue with losing track of where a string came from; > legitimate characters are currently untouched by Martin's scheme, except > for the normal "bytes<->string via the user's locale" translation that > must already happen, and there you're aided by bytes and strings being > different types. > There are abnormal characters, but there are no illegal characters. NTFS permits any 16-bit "character" code, including abnormal ones, including half-surrogates, and including full surrogate sequences that decode to PUA characters. POSIX permits all byte sequences, including things that look like UTF-8, things that don't look like UTF-8, things that look like half-surrogates, and things that look like full surrogate sequences that decode to PUA characters. So whether the decoding/encoding scheme uses common characters, or uncommon characters, you still have the issue of data puns, unless you use a 1-to-1 transformation that is reversible. With ASCII strings, I think no one questions that you need to escape the escape characters. C uses \ as an escape character... Everyone understands that if you want to use a \ in a C string, you have to use \\ instead... and that scheme has escaped the boundaries of C to other use cases. 
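The escape-the-escape rule generalizes into a toy, fully reversible codec (a hypothetical illustration only, not Glenn's exact proposal; "?" here merely stands in for whichever escape character is chosen):

```python
ESC = "?"  # stand-in escape character (hypothetical choice)

def toy_decode(raw: bytes) -> str:
    """bytes -> str. ESC is doubled; undecodable bytes become ESC + two hex digits."""
    out = []
    for b in raw:
        if b < 0x80:
            c = chr(b)
            out.append(ESC + ESC if c == ESC else c)
        else:
            out.append("%s%02x" % (ESC, b))
    return "".join(out)

def toy_encode(name: str) -> bytes:
    """str -> bytes. Exact inverse of toy_decode, so there are no data puns."""
    out, i = bytearray(), 0
    while i < len(name):
        c = name[i]
        if c != ESC:
            out.append(ord(c))
            i += 1
        elif name[i + 1] == ESC:   # doubled escape -> one literal ESC byte
            out.append(ord(ESC))
            i += 2
        else:                      # escaped hex pair -> one raw byte
            out.append(int(name[i + 1:i + 3], 16))
            i += 3
    return bytes(out)

# Every byte string round-trips, even ones that already contain "?":
for raw in (b"plain.txt", b"what?.txt", b"caf\xe9", b"??\xff"):
    assert toy_encode(toy_decode(raw)) == raw
```

Because the escape character is itself escaped, the mapping is injective: a name that merely *looks* funny-decoded still round-trips to its own distinct byte sequence.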
But it seems that you think that if we could just find one more character that no one else uses, that we wouldn't have to escape it.... and that could be true, but there aren't any characters that no one else uses. So whatever character (and a range makes it worse) you pick, someone else uses it. So in order for the scheme to work, you have to escape the escape character(s), even in names that wouldn't otherwise need to be funny-decoded. >> I'm certainly not experienced enough in Python development processes or >> internals to attempt such, as yet. But somewhere in 25 years of >> programming, I picked up the knowledge that if you want to have a 1-to-1 >> reversible mapping, you have to avoid data puns, mappings of two >> different data values into a single data value. Your PEP, as first >> written, didn't seem to do that... since there are two interfaces from >> which to obtain data values, one performing a mapping from bytes to >> "funny invalid" Unicode, and the other performing no mapping, but >> accepting any sort of Unicode, possibly including "funny invalid" >> Unicode, the possibility of data puns seems to exist. I may be >> misunderstanding something about the use cases that prevent these two >> sources of "funny invalid" Unicode from ever coexisting, but if so, >> perhaps you could point it out, or clarify the PEP. >> > > Please elucidate the "second source" of strings. I'm presuming you mean > strings generated from scratch rather than obtained by something like > listdir(). > POSIX has byte APIs for strings, that's one source, that is most under discussion. Windows has both bytes and 16-bit APIs for strings... the 16-bit APIs are generally mapped directly to UTF-16, but are not checked for UTF-16 validity, so all of Martin's funny-decoded files could be used for Windows file names on the 16-bit APIs. And yes, strings can be generated from scratch. 
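Such from-scratch strings are trivial to produce: a Python literal may contain a lone surrogate, which strict codecs then refuse while the PEP's handler maps it back to a raw byte (illustration):

```python
s = "file\udcff.txt"  # a de novo string containing a lone surrogate

# Strict UTF-8 refuses to encode it...
try:
    s.encode("utf-8")
except UnicodeEncodeError:
    pass
else:
    raise AssertionError("expected UnicodeEncodeError")

# ...but the PEP's error handler turns it back into a raw byte,
# exactly as if it had been funny-decoded from a directory listing:
assert s.encode("utf-8", "surrogateescape") == b"file\xff.txt"
```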
> Given such a string with "funny invalid" stuff in it, and _absent_ > Martin's scheme, what do you expect the source of the strings to _expect_ > to happen to them if passed to open()? They still have to be converted > to bytes at the POSIX layer anyway. There is a fine encoding scheme that can take any str and encode to bytes: UTF-8. The problem is that UTF-8 doesn't work to take any byte sequence and decode to str, and that means that special handling has to happen when such byte sequences are encountered. But there is no str that can be generated that can't be generated in other ways, which would be properly encoded to a different byte sequence. Hence there are data puns, no 1-to-1 mapping. Hence it seems obvious to me that the only complete solution is to have an escape character, and ensure that all strings are decoded and encoded. As soon as you have an escape character, then you can decode anything into displayable, standard, Unicode, and you can create the reverse encoding unambiguously. Without an escape character, you just have a heuristic that will work sometimes, and break sometimes. If you believe non-UTF-8-decodable byte sequences are rare, you can ignore them. That's what we do now, but people squawk. If you believe that you can invent an encoding that has data puns, and that because the character or characters involved are rare, the problems that result can be ignored, fine... but people will squawk when they hit the problem... I'm just trying to squawk now, to point out that this is complexity for complexity's sake; it adds complexity to trade one problem for a different problem, under the belief that the other problem is somehow rarer than the first. And maybe it is, today. I'd much rather have a solution that actually solves the problem. If you don't like ? as the escape character, then pick U+10F01, and anytime a U+10F01 is encountered in a file name, double it. 
And anytime there is an undecodable byte sequence, emit U+10F01, and then U+80 through U+FF as a subsequent character for the first byte in the undecodable sequence, and restart the decoder with the next byte. That'll work too. But use of rare, abnormal characters to take the place of undecodable bytes can never work, because of data puns, and valid use of the rare, abnormal characters. Someone suggested treating the byte sequences of the rare, abnormal characters as undecodable bytes, and decoding them using the same substitution rules. That would work too, if applied consistently, because then the rare, abnormal characters would each be escaped. But having 128 escape characters seems more complex than necessary, also. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Tue Apr 28 03:24:24 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 18:24:24 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090428004232.GA12325@cskk.homeip.net> References: <20090428004232.GA12325@cskk.homeip.net> Message-ID: <49F65AC8.5010206@g.nevcal.com> On approximately 4/27/2009 5:42 PM, came the following characters from the keyboard of Cameron Simpson: > I think that, almost independent of this PEP, there should be an > os.fsencode() function that takes a byte string (as a POSIX OS call > will take) and performs the _same_ byte->string encoding that listdir() > and friends are doing under the hood. And a partner os.fsdecode() for > string->bytes. That will save a lot of wheel respoking and probably make > it easier for people to think about this. 
> If a generally useful encoding scheme is invented for transforming file names within Python, it should definitely be made available for those cases where the application must transform between an encoded Python name and either a str or bytes interface presented by 3rd party software. It should be available on all platforms, so that portable code can be written. Of course, if there are variations in the 3rd party software on the various platforms, there still may be a need for platform-specific code. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From cs at zip.com.au Tue Apr 28 04:11:17 2009 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 28 Apr 2009 12:11:17 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F658A5.7080807@g.nevcal.com> Message-ID: <20090428021117.GA25536@cskk.homeip.net> On 27Apr2009 18:15, Glenn Linderman wrote: >>>>> The problem with this, and other preceding schemes that have been >>>>> discussed here, is that there is no means of ascertaining whether a >>>>> particular file name str was obtained from a str API, or was funny- >>>>> decoded from a bytes API... and thus, there is no means of reliably >>>>> ascertaining whether a particular filename str should be passed to a >>>>> str API, or funny-encoded back to bytes. >>>>> >>>> Why is it necessary that you are able to make this distinction? >>>> >>> It is necessary that programs (not me) can make the distinction, so >>> that it knows whether or not to do the funny-encoding or not. >>> >> >> I would say this isn't so. It's important that programs know if they're >> dealing with strings-for-filenames, but not that they be able to figure >> that out "a priori" if handed a bare string (especially since they >> can't:-) > > So you agree they can't... that there are data puns. 
(OK, you may not > have thought that through) I agree you can't examine a string and know if it came from the os.* munging or from someone else's munging. I totally disagree that this is a problem. There may be puns. So what? Use the right strings for the right purpose and all will be well. I think what is missing here, and missing from Martin's PEP, is some utility functions for the os.* namespace. PROPOSAL: add to the PEP the following functions: os.fsdecode(bytes) -> funny-encoded Unicode This is what os.listdir() does to produce the strings it hands out. os.fsencode(funny-string) -> bytes This is what open(filename,..) does to turn the filename into bytes for the POSIX open. os.pathencode(your-string) -> funny-encoded-Unicode This is what you must do to a de novo string to turn it into a string suitable for use by open. Importantly, for most strings not hand crafted to have weird sequences in them, it is a no-op. But it will recode your puns for survival. and for me, I would like to see: os.setfilesystemencoding(coding) Currently os.getfilesystemencoding() returns you the encoding based on the current locale, and (I trust) the os.* stuff encodes on that basis. setfilesystemencoding() would override that, unless coding==None in which case it reverts to the former "use the user's current locale" behaviour. (We have locale "C" for what one might otherwise expect None to mean:-) The idea here is to let the program control the codec used for filenames for special purposes, without working indirectly through the locale. >>> If a name is funny-decoded when the name is accessed by a directory >>> listing, it needs to be funny-encoded in order to open the file. >> >> Hmm. I had thought that legitimate unicode strings already get transcoded >> to bytes via the mapping specified by sys.getfilesystemencoding() >> (the user's locale). That already happens I believe, and Martin's >> scheme doesn't change this. 
He's just funny-encoding non-decodable byte >> sequences, not the decoded stuff that surrounds them. > > So assume a non-decodable sequence in a name. That puts us into > Martin's funny-decode scheme. His funny-decode scheme produces a bare > string, indistinguishable from a bare string that would be produced by a > str API that happens to contain that same sequence. Data puns. See my proposal above. Does it address your concerns? A program still must know the provenance of the string, and _if_ you're working with non-decodable sequences in a name then you should transmute them into the funny encoding using the os.pathencode() function described above. In this way the punning issue can be avoided. _Lacking_ such a function, your punning concern is valid. > So when open is handed the string, should it open the file with the name > that matches the string, or the file with the name that funny-decodes to > the same string? It can't know, unless it knows that the string is a > funny-decoded string or not. True. open() should always expect a funny-encoded name. >> So it is already the case that strings get decoded to bytes by >> calls like open(). Martin isn't changing that. > > I thought the process of converting strings to bytes is called encoding. > You seem to be calling it decoding? My head must be standing in the wrong place. Yes, I probably mean encoding here. I'm trying to accompany these terms with little pictures like "string->bytes" to avoid confusion. >> I suppose if your program carefully constructs a unicode string riddled >> with half-surrogates etc and imagines something specific should happen >> to them on the way to being POSIX bytes then you might have a problem... > > Right. Or someone else's program does that. I only want to use Unicode > file names. But if those other file names exist, I want to be able to > access them, and not accidentally get a different file. Point taken. And I think addressed by the utility function proposed above. 
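For what it's worth, functions with exactly these first two names did land later, as os.fsencode() and os.fsdecode() in Python 3.2; the round-trip Cameron sketches behaves like this (illustration with a hypothetical name):

```python
import os

# bytes -> funny-encoded str, as listdir() would produce it:
name = os.fsdecode(b"caf\xe9")

# str -> bytes, the exact inverse, as open() would consume it:
assert os.fsencode(name) == b"caf\xe9"

# An ordinary, already-decoded name passes through untouched:
assert os.fsdecode("plain.txt") == "plain.txt"
```

The third function, os.pathencode(), was never added; under the PEP a de novo string is simply assumed to already be in the funny-encoded form.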
[...snip normal versus odd chars for the funny-encoding ...] >> Also, by avoiding reuse of legitimate characters in the encoding we can >> avoid your issue with losing track of where a string came from; >> legitimate characters are currently untouched by Martin's scheme, except >> for the normal "bytes<->string via the user's locale" translation that >> must already happen, and there you're aided by bytes and strings being >> different types. > > There are abnormal characters, but there are no illegal characters. I thought half-surrogates were illegal in well-formed Unicode. I confess to being weak in this area. By "legitimate" above I meant things like half-surrogates which, like quarks, should not occur alone? > NTFS permits any 16-bit "character" code, including abnormal ones, > including half-surrogates, and including full surrogate sequences that > decode to PUA characters. POSIX permits all byte sequences, including > things that look like UTF-8, things that don't look like UTF-8, things > that look like half-surrogates, and things that look like full surrogate > sequences that decode to PUA characters. Sure. I'm not really talking about what filesystem will accept at the native layer, I was talking in the python funny-encoded space. [..."escaping is necessary"... I agree...] >>> I'm certainly not experienced enough in Python development processes >>> or internals to attempt such, as yet. But somewhere in 25 years of >>> programming, I picked up the knowledge that if you want to have a >>> 1-to-1 reversible mapping, you have to avoid data puns, mappings of >>> two different data values into a single data value. Your PEP, as >>> first written, didn't seem to do that... 
since there are two >>> interfaces from which to obtain data values, one performing a >>> mapping from bytes to "funny invalid" Unicode, and the other >>> performing no mapping, but accepting any sort of Unicode, possibly >>> including "funny invalid" Unicode, the possibility of data puns >>> seems to exist. I may be misunderstanding something about the use >>> cases that prevent these two sources of "funny invalid" Unicode from >>> ever coexisting, but if so, perhaps you could point it out, or >>> clarify the PEP. >> >> Please elucidate the "second source" of strings. I'm presuming you mean >> strings generated from scratch rather than obtained by something like >> listdir(). >> > > POSIX has byte APIs for strings, that's one source, that is most under > discussion. Windows has both bytes and 16-bit APIs for strings... the > 16-bit APIs are generally mapped directly to UTF-16, but are not checked > for UTF-16 validity, so all of Martin's funny-decoded files could be > used for Windows file names on the 16-bit APIs. These are existing file objects, I'll take them as source 1. They get encoded for release by os.listdir() et al. > And yes, strings can be > generated from scratch. I take this to be source 2. I think I agree with all the discussion that followed, and think the real problem is lack of utility functions to funny-encode source 2 strings for use. Hence the proposal above. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Be smart, be safe, be paranoid. 
- Ryan Cousineau, courier at compdyn.com DoD#863, KotRB, KotKWaWCRH From benjamin at python.org Tue Apr 28 04:58:34 2009 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 27 Apr 2009 21:58:34 -0500 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090428021117.GA25536@cskk.homeip.net> References: <49F658A5.7080807@g.nevcal.com> <20090428021117.GA25536@cskk.homeip.net> Message-ID: <1afaf6160904271958r15f2c3c0ide616c9bbc8ca0ee@mail.gmail.com> 2009/4/27 Cameron Simpson : > > PROPOSAL: add to the PEP the following functions: > > os.fsdecode(bytes) -> funny-encoded Unicode > This is what os.listdir() does to produce the strings it hands out. > os.fsencode(funny-string) -> bytes > This is what open(filename,..) does to turn the filename into bytes > for the POSIX open. > os.pathencode(your-string) -> funny-encoded-Unicode > This is what you must do to a de novo string to turn it into a > string suitable for use by open. > Importantly, for most strings not hand crafted to have weird > sequences in them, it is a no-op. But it will recode your puns > for survival. > > and for me, I would like to see: > > os.setfilesystemencoding(coding) > > Currently os.getfilesystemencoding() returns you the encoding based on > the current locale, and (I trust) the os.* stuff encodes on that basis. > setfilesystemencoding() would override that, unless coding==None in which > case it reverts to the former "use the user's current locale" behaviour. > (We have locale "C" for what one might otherwise expect None to mean:-) Time machine! 
http://docs.python.org/dev/py3k/library/sys.html#sys.setfilesystemencoding -- Regards, Benjamin From martin at v.loewis.de Tue Apr 28 05:35:59 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 28 Apr 2009 05:35:59 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F63B19.7010306@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> Message-ID: <49F6799F.5030208@v.loewis.de> Glenn Linderman wrote: > On approximately 4/27/2009 12:42 PM, came the following characters from > the keyboard of Martin v. Löwis: >>>> It's a private use area. It will never carry an official character >>>> assignment. >>> >>> I know that U+F0000 - U+FFFFF is a private use area. I don't find a >>> definition of U+F01xx to know what the notation means. Are you picking >>> a particular character within the private use area, or a particular >>> range, or what? >> >> It's a range. The lower-case 'x' denotes a variable half-byte, ranging >> from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code >> points. > > > So you only need 128 code points, so there is something else unclear. (please understand that this is history now, since the PEP has stopped using PUA characters). No. You seem to assume that all bytes < 128 decode successfully always. I believe this assumption is wrong, in general: py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position 3-4: illegal multibyte sequence All bytes are below 128, yet it fails to decode. 
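In 3.x syntax the same example still fails -- and, because the offending bytes (0x27 0x20) are below 128, even the error handler the PEP introduces (eventually named "surrogateescape") refuses them, since it only maps bytes 0x80-0xFF (illustration):

```python
data = b"\x1b$B' \x1b(B"  # Martin's iso-2022-jp example, as bytes

for handler in ("strict", "surrogateescape"):
    try:
        data.decode("iso-2022-jp", handler)
    except UnicodeDecodeError:
        pass  # the failing bytes are < 0x80, so neither handler accepts them
    else:
        raise AssertionError("decode unexpectedly succeeded")
```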
Regards, Martin From martin at v.loewis.de Tue Apr 28 05:39:40 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 05:39:40 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F643BE.4050605@g.nevcal.com> References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de> <49F643BE.4050605@g.nevcal.com> Message-ID: <49F67A7C.4070602@v.loewis.de> > I'm not suggesting the PEP should solve the problem of mounting foreign > file systems, although if it doesn't it should probably point that out. > I'm just suggesting that if the people that write software to solve the > problem of mounting foreign file systems have already solved the naming > problem, then it might be a source of a good solution. On the other > hand, it might be the source of a mediocre or bad solution. However, if > those mounting system have good solutions, it would be good to be > compatible with them, rather than have yet another solution. It was in > that sense, of thinking about possibly existing practice, and leveraging > an existing solution, that caused me to bring up the topic. I think you make quite a lot of assumptions here. It would be better to research the state of the art first, and only then propose to follow it. Regards, Martin From cs at zip.com.au Tue Apr 28 05:39:46 2009 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 28 Apr 2009 13:39:46 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <1afaf6160904271958r15f2c3c0ide616c9bbc8ca0ee@mail.gmail.com> Message-ID: <20090428033946.GA14685@cskk.homeip.net> On 27Apr2009 21:58, Benjamin Peterson wrote: | 2009/4/27 Cameron Simpson : | > PROPOSAL: add to the PEP the following functions: [...] 
| > and for me, I would like to see: | > os.setfilesystemencoding(coding) | > | > Currently os.getfilesystemencoding() returns you the encoding based on | > the current locale, and (I trust) the os.* stuff encodes on that basis. | > setfilesystemencoding() would override that, unless coding==None in which | > case it reverts to the former "use the user's current locale" behaviour. | > (We have locale "C" for what one might otherwise expect None to mean:-) | | Time machine! 
Regards, Martin From stephen at xemacs.org Tue Apr 28 06:26:36 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Apr 2009 13:26:36 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F63805.6000208@voidspace.org.uk> References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87ws96j7lv.fsf@uwakimon.sk.tsukuba.ac.jp> <49F63805.6000208@voidspace.org.uk> Message-ID: <87skjtjtdv.fsf@uwakimon.sk.tsukuba.ac.jp> Michael Foord writes: > The problem you don't address, which is still the reality for most > programmers (especially Mac OS X where filesystem encoding is UTF 8), is > that programmers *are* going to treat filenames as strings. > The proposed PEP allows that to work for them - whatever platform their > program runs on. Sure, for values of "work" == "No exception will be raised in my module, and some content will actually be returned." It doesn't say anything about what happens once those strings escape the immediate context. So it *encourages* those programmers to pass any problems downstream, but only after discarding the resources needed to deal with problems effectively. It's not that hard to overcome that problem, but it does require a slightly more complex API, and one that doesn't return a string but rather a stringlike object annotated with the information about how it was decoded. Conversion to a string *should* be trivial; I just think it should be invoked explicitly to make it clear where information is being discarded. Without an implicit conversion, the nature of the data (ie, context-dependent structure) is made explicit. 
There's a natural place to document the problem that context must be used to interpret the data accurately, and even add more robust processing (in a new PEP, of course!), etc. Then in the future this interface could be used as the basis of a more robust API. With good design (and luck) it might be subclassible or extensible to a path object API, for example. PEP 383 on the other hand is a dead end as it stands. AFAICS it gives the best possible treatment of conversion of OS data to plain string, but we're already got developers lining up to say "I can't use it". :-( From stephen at xemacs.org Tue Apr 28 06:43:12 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Apr 2009 13:43:12 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F18E90.9070801@nevcal.com> <79990c6b0904240500k4041d2aai33dd6dc340644649@mail.gmail.com> <20090424152746.GA9543@panix.com> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> Tony Nelson writes: > At 16:09 +0000 04/27/2009, Antoine Pitrou wrote: > >Stephen J. Turnbull xemacs.org> writes: > >> > >> I hate to break it to you, but most stages of mail processing have > >> very little to do with SMTP. In particular, processing MIME > >> attachments often requires dealing with file names. > > > >AFAIK, the file name is only there as an indication for the user > >when he wants to save the file. If it's garbled a bit, no big > >deal. Nobody said we were at the stage of *saving* the file! 
From foom at fuhm.net Tue Apr 28 07:19:22 2009 From: foom at fuhm.net (James Y Knight) Date: Tue, 28 Apr 2009 01:19:22 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6799F.5030208@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> Message-ID: <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> On Apr 27, 2009, at 11:35 PM, Martin v. L?wis wrote: > No. You seem to assume that all bytes < 128 decode successfully > always. > I believe this assumption is wrong, in general: > > py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax > Traceback (most recent call last): > File "", line 1, in > UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position > 3-4: illegal multibyte sequence > > All bytes are below 128, yet it fails to decode. Surely nobody uses iso2022 as an LC_CTYPE encoding. That's expressly forbidden by POSIX, if I'm not mistaken...and I can't see how it would work, considering that it uses all the bytes from 0x20-0x7f, including 0x2f ("/"), to represent non-ascii characters. Hopefully it can be assumed that your locale encoding really is a non- overlapping superset of ASCII, as is required by POSIX... I'm a bit scared at the prospect that U+DCAF could turn into "/", that just screams security vulnerability to me. So I'd like to propose that only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be encoded/decoded via the error handler. 
James From v+python at g.nevcal.com Tue Apr 28 07:25:15 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 22:25:15 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6799F.5030208@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> Message-ID: <49F6933B.7020705@g.nevcal.com> On approximately 4/27/2009 8:35 PM, came the following characters from the keyboard of Martin v. L?wis: > Glenn Linderman wrote: >> On approximately 4/27/2009 12:42 PM, came the following characters from >> the keyboard of Martin v. L?wis: >>>>> It's a private use area. It will never carry an official character >>>>> assignment. >>>> I know that U+F0000 - U+FFFFF is a private use area. I don't find a >>>> definition of U+F01xx to know what the notation means. Are you picking >>>> a particular character within the private use area, or a particular >>>> range, or what? >>> It's a range. The lower-case 'x' denotes a variable half-byte, ranging >>> from 0 to F. So this is the range U+F0100..U+F01FF, giving 256 code >>> points. >> >> So you only need 128 code points, so there is something else unclear. > > (please understand that this is history now, since the PEP has stopped > using PUA characters). Yes, but having found the latest PEP finally (at least I hope the one at python.org is the latest, it has quit using PUA anyway), I confirm it is history. But the same issue applies to the range of half-surrogates. > No. You seem to assume that all bytes < 128 decode successfully always. 
> I believe this assumption is wrong, in general: > > py> "\x1b$B' \x1b(B".decode("iso-2022-jp") #2.x syntax > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > UnicodeDecodeError: 'iso2022_jp' codec can't decode bytes in position > 3-4: illegal multibyte sequence > > All bytes are below 128, yet it fails to decode. Indeed, that was the missing piece. I'd forgotten about the encodings that use escape sequences, rather than UTF-8, and DBCS. I don't think those encodings are permitted by POSIX file systems, but I suppose they could sneak in via Environment variable values, and the like. The switch from PUA to half-surrogates does not resolve the issues with the encoding not being a 1-to-1 mapping, though. The very fact that you think you can get away with use of lone surrogates means that other people might, accidentally or intentionally, also use lone surrogates for some other purpose. Even in file names. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From robert.collins at canonical.com Tue Apr 28 07:39:01 2009 From: robert.collins at canonical.com (Robert Collins) Date: Tue, 28 Apr 2009 15:39:01 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6933B.7020705@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> Message-ID: <1240897141.5830.12.camel@lifeless-64> On Mon, 2009-04-27 at 22:25 -0700, Glenn Linderman wrote: > > Indeed, that was the missing piece. I'd forgotten about the > encodings > that use escape sequences, rather than UTF-8, and DBCS.
I don't > think > those encodings are permitted by POSIX file systems, but I suppose > they > could sneak in via Environment variable values, and the like. This may already have been discussed, and if so I apologise for the noise. Does the PEP take into consideration the normalising behaviour of Mac OS X? We've had some ongoing challenges related to this in bzr. -Rob -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part URL: From v+python at g.nevcal.com Tue Apr 28 07:41:34 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 22:41:34 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F67A7C.4070602@v.loewis.de> References: <20090427075549.GA4418@cskk.homeip.net> <49F56F8B.7030108@g.nevcal.com> <49F60C0B.9000905@v.loewis.de> <49F643BE.4050605@g.nevcal.com> <49F67A7C.4070602@v.loewis.de> Message-ID: <49F6970E.4000701@g.nevcal.com> On approximately 4/27/2009 8:39 PM, came the following characters from the keyboard of Martin v. Löwis: >> I'm not suggesting the PEP should solve the problem of mounting foreign >> file systems, although if it doesn't it should probably point that out. >> I'm just suggesting that if the people that write software to solve the >> problem of mounting foreign file systems have already solved the naming >> problem, then it might be a source of a good solution. On the other >> hand, it might be the source of a mediocre or bad solution. However, if >> those mounting systems have good solutions, it would be good to be >> compatible with them, rather than have yet another solution. It was in >> that sense, of thinking about possibly existing practice, and leveraging >> an existing solution, that caused me to bring up the topic. >> > > I think you make quite a lot of assumptions here.
It would be better > to research the state of the art first, and only then propose to follow it. I didn't propose to follow it. I only proposed an area that could be researched as a source of ideas and/or potential solutions. Apparently there wasn't, but there could have been someone listening that had the results of such research on the tip of their tongue, and might have piped up with the techniques used. I did, in fact, begin researching the topic after making the suggestion, and thus far haven't found any brilliant solutions from that arena. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From tmbdev at gmail.com Tue Apr 28 08:29:23 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 08:29:23 +0200 Subject: [Python-Dev] PEP 383 (again) Message-ID: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> I thought PEP-383 was a fairly neat approach, but after thinking about it, I now think that it is wrong. PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode strings in a reversible way. But how do those non-UTF-8 byte sequences get into those path names in the first place? Most likely because an encoding other than UTF-8 was used to write the file system, but you're now trying to interpret its path names as UTF-8. Quietly escaping a bad UTF-8 encoding with private Unicode characters is unlikely to be the right thing, since using the wrong encoding likely means that other characters are decoded incorrectly as well. As a result, the path name may fail in string comparisons and pattern matching, and will look wrong to the user in print statements and dialog boxes. Therefore, when Python encounters path names on a file system that are not consistent with the (assumed) encoding for that file system, Python should raise an error. 
If you really don't care what the string looks like and you just want an encoding that round-trips without loss, you can probably just set your encoding to one of the 8 bit encodings, like ISO 8859-15. Decoding arbitrary byte sequences to unicode strings as ISO 8859-15 is no less correct than decoding them as the proposed "utf-8b". In fact, the most likely source of non-UTF-8 sequences is ISO 8859 encodings. As for what the byte-oriented interfaces should do, they are simply platform dependent. On UNIX, they should do the obvious thing. On Windows, they can either hook up to the low-level byte-oriented system calls that the systems supply, or Windows could fake it and have the byte-oriented interfaces use UTF-8 encodings always and reject non-UTF-8 sequences as illegal (there are already many illegal byte sequences anyway). Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Tue Apr 28 08:50:02 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 08:50:02 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> Message-ID: <49F6A71A.3020809@v.loewis.de> James Y Knight wrote: > Hopefully it can be assumed that your locale encoding really is a > non-overlapping superset of ASCII, as is required by POSIX... Can you please point to the part of the POSIX spec that says that such overlapping is forbidden? > I'm a bit scared at the prospect that U+DCAF could turn into "/", that > just screams security vulnerability to me. 
So I'd like to propose that > only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be > encoded/decoded via the error handler. Actually, it would be U+DC2F that would turn into "/". I'm happy to exclude that range from the mapping if POSIX really requires an encoding not to be overlapping with ASCII. Regards, Martin From v+python at g.nevcal.com Tue Apr 28 08:52:48 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 27 Apr 2009 23:52:48 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090428021117.GA25536@cskk.homeip.net> References: <20090428021117.GA25536@cskk.homeip.net> Message-ID: <49F6A7C0.6090105@g.nevcal.com> On approximately 4/27/2009 7:11 PM, came the following characters from the keyboard of Cameron Simpson: > On 27Apr2009 18:15, Glenn Linderman wrote: > >>>>>> The problem with this, and other preceding schemes that have been >>>>>> discussed here, is that there is no means of ascertaining whether a >>>>>> particular file name str was obtained from a str API, or was funny- >>>>>> decoded from a bytes API... and thus, there is no means of reliably >>>>>> ascertaining whether a particular filename str should be passed to a >>>>>> str API, or funny-encoded back to bytes. >>>>>> >>>>>> >>>>> Why is it necessary that you are able to make this distinction? >>>>> >>>>> >>>> It is necessary that programs (not me) can make the distinction, so >>>> that it knows whether or not to do the funny-encoding or not. >>>> >>>> >>> I would say this isn't so. It's important that programs know if they're >>> dealing with strings-for-filenames, but not that they be able to figure >>> that out "a priori" if handed a bare string (especially since they >>> can't:-) >>> >> So you agree they can't... that there are data puns. (OK, you may not >> have thought that through) >> > > I agree you can't examine a string and know if it came from the os.* munging
> > I totally disagree that this is a problem. > > There may be puns. So what? Use the right strings for the right purpose > and all will be well. > > I think what is missing here, and missing from Martin's PEP, is some > utility functions for the os.* namespace. > > PROPOSAL: add to the PEP the following functions: > > os.fsdecode(bytes) -> funny-encoded Unicode > This is what os.listdir() does to produce the strings it hands out. > os.fsencode(funny-string) -> bytes > This is what open(filename,..) does to turn the filename into bytes > for the POSIX open. > os.pathencode(your-string) -> funny-encoded-Unicode > This is what you must do to a de novo string to turn it into a > string suitable for use by open. > Importantly, for most strings not hand crafted to have weird > sequences in them, it is a no-op. But it will recode your puns > for survival. > > and for me, I would like to see: > > os.setfilesystemencoding(coding) > > Currently os.getfilesystemencoding() returns you the encoding based on > the current locale, and (I trust) the os.* stuff encodes on that basis. > setfilesystemencoding() would override that, unless coding==None in what > case it reverts to the former "use the user's current locale" behaviour. > (We have locale "C" for what one might otherwise expect None to mean:-) > > The idea here is to let to program control the codec used for filenames > for special purposes, without working indirectly through the locale. > > >>>> If a name is funny-decoded when the name is accessed by a directory >>>> listing, it needs to be funny-encoded in order to open the file. >>>> >>> Hmm. I had thought that legitimate unicode strings already get transcoded >>> to bytes via the mapping specified by sys.getfilesystemencoding() >>> (the user's locale). That already happens I believe, and Martin's >>> scheme doesn't change this. He's just funny-encoding non-decodable byte >>> sequences, not the decoded stuff that surrounds them. 
>>> >> So assume a non-decodable sequence in a name. That puts us into >> Martin's funny-decode scheme. His funny-decode scheme produces a bare >> string, indistinguishable from a bare string that would be produced by a >> str API that happens to contain that same sequence. Data puns. >> > > See my proposal above. Does it address your concerns? A program still > must know the providence of the string, and _if_ you're working with > non-decodable sequences in a names then you should transmute then into > the funny encoding using the os.pathencode() function described above. > > In this way the punning issue can be avoided. > > _Lacking_ such a function, your punning concern is valid. > Seems like one would also desire os.pathdecode to do the reverse. And also versions that take or produce bytes from funny-encoded strings. Then, if programs were re-coded to perform these transformations on what you call de novo strings, then the scheme would work. But I think a large part of the incentive for the PEP is to try to invent a scheme that intentionally allows for the puns, so that programs do not need to be recoded in this manner, and yet still work. I don't think such a scheme exists. If there is going to be a required transformation from de novo strings to funny-encoded strings, then why not make one that people can actually see and compare and decode from the displayable form, by using displayable characters instead of lone surrogates? >> So when open is handed the string, should it open the file with the name >> that matches the string, or the file with the name that funny-decodes to >> the same string? It can't know, unless it knows that the string is a >> funny-decoded string or not. >> > > True. open() should always expect a funny-encoded name. > > >>> So it is already the case that strings get decoded to bytes by >>> calls like open(). Martin isn't changing that. >>> >> I thought the process of converting strings to bytes is called encoding. 
>> You seem to be calling it decoding? >> > My head must be standing in the wrong place. Yes, I probably mean > encoding here. I'm trying to accompany these terms with little pictures > like "string->bytes" to avoid confusion. > > >>> I suppose if your program carefully constructs a unicode string riddled >>> with half-surrogates etc and imagines something specific should happen >>> to them on the way to being POSIX bytes then you might have a problem... >>> >> Right. Or someone else's program does that. I only want to use Unicode >> file names. But if those other file names exist, I want to be able to >> access them, and not accidentally get a different file. >> > > Point taken. And I think addressed by the utility function proposed > above. > > [...snip normal versus odd chars for the funny-encoding ...] > >>> Also, by avoiding reuse of legitimate characters in the encoding we can >>> avoid your issue with losing track of where a string came from; >>> legitimate characters are currently untouched by Martin's scheme, except >>> for the normal "bytes<->string via the user's locale" translation that >>> must already happen, and there you're aided by bytes and strings being >>> different types. >>> >> There are abnormal characters, but there are no illegal characters. >> > > I thought half-surrogates were illegal in well formed Unicode. I confess > to being weak in this area. By "legitimate" above I meant things like > half-surrogates which, like quarks, should not occur alone? > "Illegal" just means violating the accepted rules. In this case, the accepted rules are those enforced by the file system (at the bytes or str API levels), and by Python (for the str manipulations). None of those rules outlaw lone surrogates. Hence, while all of the systems under discussion can handle all Unicode characters in one way or another, none of them require that all Unicode rules are followed. Yes, you are correct that lone surrogates are illegal in Unicode.
No, none of the accepted rules for these systems require Unicode. >> NTFS permits any 16-bit "character" code, including abnormal ones, >> including half-surrogates, and including full surrogate sequences that >> decode to PUA characters. POSIX permits all byte sequences, including >> things that look like UTF-8, things that don't look like UTF-8, things >> that look like half-surrogates, and things that look like full surrogate >> sequences that decode to PUA characters. >> > > Sure. I'm not really talking about what filesystem will accept at > the native layer, I was talking in the python funny-encoded space. > > [..."escaping is necessary"... I agree...] > >>>> I'm certainly not experienced enough in Python development processes >>>> or internals to attempt such, as yet. But somewhere in 25 years of >>>> programming, I picked up the knowledge that if you want to have a >>>> 1-to-1 reversible mapping, you have to avoid data puns, mappings of >>>> two different data values into a single data value. Your PEP, as >>>> first written, didn't seem to do that... since there are two >>>> interfaces from which to obtain data values, one performing a >>>> mapping from bytes to "funny invalid" Unicode, and the other >>>> performing no mapping, but accepting any sort of Unicode, possibly >>>> including "funny invalid" Unicode, the possibility of data puns >>>> seems to exist. I may be misunderstanding something about the use >>>> cases that prevent these two sources of "funny invalid" Unicode from >>>> ever coexisting, but if so, perhaps you could point it out, or >>>> clarify the PEP. >>>> >>> Please elucidate the "second source" of strings. I'm presuming you mean >>> strings generated from scratch rather than obtained by something like >>> listdir(). >>> >>> >> POSIX has byte APIs for strings, that's one source, that is most under >> discussion. Windows has both bytes and 16-bit APIs for strings...
the >> 16-bit APIs are generally mapped directly to UTF-16, but are not checked >> for UTF-16 validity, so all of Martin's funny-decoded files could be >> used for Windows file names on the 16-bit APIs. >> > > These are existing file objects, I'll take them as source 1. They get > encoded for release by os.listdir() et al. > > >> And yes, strings can be >> generated from scratch. >> > > I take this to be source 2. > One variation of source 2 is reading output from other programs, such as ls (POSIX) or dir (Windows). > I think I agree with all the discussion that followed, and think the > real problem is lack of utility functions to funny-encode source 2 > strings for use. Hence the proposal above. I think we understand each other now. I think your proposal could work, Cameron, although when recoding applications to use your proposal, I'd find it easier to use the "file name object" that others have proposed. I think that because either your proposal or the object proposals require recoding the application, that they will not be accepted. I think that because the PEP 383 allows data puns, that it should not be accepted in its present form. I think if your proposal is accepted, that it then becomes possible to use an encoding that uses visible characters, which makes it easier for people to understand and verify. An encoding such as the one I suggested, but perhaps using a more obscure character, if there is one, but yet doesn't violate true Unicode. I think it should transform all data, from str and bytes interfaces, and produce only str values containing conforming Unicode, escaping all the non-conforming sequences in some manner. This would make the strings truly readable, as long as fonts for all the characters are available.
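[Editor's note: the visible-escape idea described above can be sketched as follows. The helper name and the `%XX` notation are invented for illustration, not part of any proposal; note that the escape character itself must also be escaped, or the scheme reintroduces exactly the data puns under discussion.]

```python
def visible_decode(raw: bytes) -> str:
    """Decode bytes to a display string, rendering undecodable bytes
    as printable %XX sequences (hypothetical scheme, for illustration)."""
    s = raw.decode("utf-8", "surrogateescape")
    out = []
    for ch in s:
        if "\udc80" <= ch <= "\udcff":
            out.append("%%%02X" % (ord(ch) - 0xDC00))  # escaped byte -> %XX
        elif ch == "%":
            out.append("%25")  # escape the escape char, avoiding puns
        else:
            out.append(ch)
    return "".join(out)

assert visible_decode(b"caf\xe9") == "caf%E9"   # undecodable 0xE9 shown
assert visible_decode(b"100%") == "100%25"      # literal '%' stays unambiguous
```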
And I had already suggested the utility functions you are suggesting, actually, in my first tirade against PEP 383 (search for "The encode and decode functions should be available for coders to use, that code to external interfaces, either OS or 3rd party packages, that do not use this encoding scheme"). I really don't care if you or who gets the credit for the idea, others may have suggested it before me, but I do care that the solution should provide functionality that works without ambiguity/data puns. The solution that was proposed in the lead up to releasing Python 3.0 was to offer both bytes and str interfaces (so we have those), and then for those that want to have a single portable implementation that can access all data, an object that encapsulates the differences, and the variant system APIs. (file system is one, command line is another, environment is another, I'm not sure if there are more.) I haven't heard if any progress on such an encapsulating object has been made; the people that proposed such have been rather quiet about this PEP. I would expect that an object implementation would provide display strings, and APIs to submit de novo str and bytes values to an object, which would run the appropriate encoding on them. Programs that want to use str interfaces on POSIX will see a subset of files on systems that contain files whose bytes filenames are not decodable. If a sysadmin wants to standardize on UTF-8 names universally, they can use something like convmv to clean up existing file names that don't conform. Programs that use str interfaces on POSIX system will work fine, but with a subset of the files. When that is unacceptable, they can either be recoded to use the bytes interfaces, or the hopefully forthcoming object encapsulation. 
The issue then will be what technique will be used to transform bytes into display names, but since the display names would never be fed back to the objects directly (but the object would have an interface to accept de novo str and de novo bytes) then it is just a display issue, and one that uses visible characters would seem more useful in my mind, than one that uses half-surrogates or PUAs. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Tue Apr 28 08:53:10 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 08:53:10 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <1240897141.5830.12.camel@lifeless-64> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <1240897141.5830.12.camel@lifeless-64> Message-ID: <49F6A7D6.5030809@v.loewis.de> > Does the PEP take into consideration the normalising behaviour of Mac > OSX ? We've had some ongoing challenges in bzr related to this with bzr. No, that's completely out of scope, AFAICT. I don't even know what the issues are, so I'm not able to propose a solution, at the moment. 
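[Editor's note: for reference, the Mac OS X behaviour Robert is alluding to can be shown without any filesystem at all. HFS+ stores names in a decomposed form, so the name a program writes and the name it reads back may differ at the code-point level while remaining canonically equivalent:]

```python
import unicodedata

composed = "caf\u00e9"                               # 'é' as one code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining accent
assert composed != decomposed                        # different code points...
assert unicodedata.normalize("NFC", decomposed) == composed  # ...same text
```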
Regards, Martin From martin at v.loewis.de Tue Apr 28 08:59:19 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 08:59:19 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> Message-ID: <49F6A947.1050106@v.loewis.de> > PEP-383 attempts to represent non-UTF-8 byte sequences in Unicode > strings in a reversible way. That isn't really true; it is not, inherently, about UTF-8. Instead, it tries to represent non-filesystem-encoding byte sequences in Unicode strings in a reversible way. > Quietly escaping a bad UTF-8 encoding with private Unicode characters is > unlikely to be the right thing And indeed, the PEP stopped using PUA characters. > Therefore, when Python encounters path names on a file system > that are not consistent with the (assumed) encoding for that file > system, Python should raise an error. This is what happens currently, and users are quite unhappy about it. > If you really don't care what the string looks like and you just want an > encoding that round-trips without loss, you can probably just set your > encoding to one of the 8 bit encodings, like ISO 8859-15. Decoding > arbitrary byte sequences to unicode strings as ISO 8859-15 is no less > correct than decoding them as the proposed "utf-8b". In fact, the most > likely source of non-UTF-8 sequences is ISO 8859 encodings. Yes, users can do that (to a degree), but they are still unhappy about it. The approach actually fails for command line arguments. > As for what the byte-oriented interfaces should do, they are simply > platform dependent. On UNIX, they should do the obvious thing.
On > Windows, they can either hook up to the low-level byte-oriented system > calls that the systems supply, or Windows could fake it and have the > byte-oriented interfaces use UTF-8 encodings always and reject non-UTF-8 > sequences as illegal (there are already many illegal byte sequences > anyway). As is, these interfaces are incomplete - they don't support command line arguments, or environment variables. If you want to complete them, you should write a PEP. Regards, Martin From tmbdev at gmail.com Tue Apr 28 09:30:01 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 09:30:01 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F6A947.1050106@v.loewis.de> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> Message-ID: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> > > Therefore, when Python encounters path names on a file system > > that are not consistent with the (assumed) encoding for that file > > system, Python should raise an error. > > This is what happens currently, and users are quite unhappy about it. We need to keep "users" and "programmers" distinct here. Programmers may find it inconvenient that they have to spend time figuring out and dealing with platform-dependent file system encoding issues and errors. But internationalization and unicode are hard, that's just a fact of life. End users, however, are going to be quite unhappy if they get a string of gibberish for a file name because you decided to interpret some non-Unicode string as UTF-8-with-extra-bytes. Or some Python program might copy files from an ISO8859-15 encoded file system to a UTF-8 encoded file system, and instead of getting an error when the encodings are set incorrectly, Python would quietly create ISO8859-15 encoded file names, making the target file system inconsistent. There is a lot of potential for major problems for end users with your proposals.
In both cases, what should happen is that the end user gets an error, submits a bug, and the programmer figures out how to deal with the encoding issues correctly. > Yes, users can do that (to a degree), but they are still unhappy about > it. The approach actually fails for command line arguments As it should: if I give an ISO8859-15 encoded command line argument to a Python program that expects a UTF-8 encoding, the Python program should tell me that there is something wrong when it notices that. Quietly continuing is the wrong thing to do. If we follow your approach, that ISO8859-15 string will get turned into an escaped unicode string inside Python. If I understand your proposal correctly, if it's a output file name and gets passed to Python's open function, Python will then decode that string and end up with an ISO8859-15 byte sequence, which it will write to disk literally, even if the encoding for the system is UTF-8. That's the wrong thing to do. As is, these interfaces are incomplete - they don't support command > line arguments, or environment variables. If you want to complete them, > you should write a PEP. There's no point in scratching when there's no itch. Tom PS: > Quietly escaping a bad UTF-8 encoding with private Unicode characters is > > unlikely to be the right thing > > And indeed, the PEP stopped using PUA characters. Let me rephrase this: "quietly escaping a bad UTF-8 encoding is unlikely to be the right thing"; it doesn't matter how you do it. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From phd at phd.pp.ru Tue Apr 28 09:58:06 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 28 Apr 2009 11:58:06 +0400 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> Message-ID: <20090428075806.GB23828@phd.pp.ru> On Tue, Apr 28, 2009 at 09:30:01AM +0200, Thomas Breuel wrote: > Programmers may find it inconvenient that they have to spend time figuring > out and deal with platform-dependent file system encoding issues and > errors. But internationalization and unicode are hard, that's just a fact > of life. As long as it's hard there will be no internationalization. A fact of life, damn it. Programmers are lazy, and have many problems to solve. > end user gets an > error, submits a bug, and the programmer figures out how to deal with the > encoding issues correctly. And the programmer answers "The program expects a correct environment, good filenames, etc." and closes the issue with the resolution "User error, will not fix". I am not arguing for or against the PEP in question. Python certainly has to have a way to make portable i18n less hard or else the number of portable internationalized programs will be about zero. What the way should be - I don't know. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
From tmbdev at gmail.com Tue Apr 28 10:37:45 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 10:37:45 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <20090428075806.GB23828@phd.pp.ru> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> Message-ID: <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> > > > Until it's hard there will be no internationalization. A fact of life, > damn it. Programmers are lazy, and have many problems to solve. PEP 383 doesn't make it any easier; it just turns one set of problems into another. Actually, it makes it worse, since any problems that show up now show up far from the source of the problem, and since it can lead to security problems and/or data loss. > And the programmer answers "The program is expected a correct > environment, good filenames, etc." and closes the issue with the resolution > "User error, will not fix". The problem may well be with the program using the wrong encodings or incorrectly ignoring encoding information. Furthermore, even if it is user error, the program needs to validate its inputs and put up a meaningful error message, not mangle the disk. To detect such program bugs, it's important that when Python detects an incorrect encoding, it doesn't quietly continue with an incorrect string. Furthermore, if you don't provide clear error messages, it often takes a significant amount of time for each issue to determine that it is user error. > I am not arguing for or against the PEP in question. Python certainly > has to have a way to make portable i18n less hard or else the number of > portable internationalized program will be about zero. What the way should > be - I don't know. Returning an error for an incorrect encoding doesn't make internationalization harder, it makes it easier because it makes debugging easier.
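[Editor's note: the two behaviours being argued over can be put side by side, using the `surrogateescape` handler that the PEP's "utf-8b" eventually became:]

```python
raw = b"\xe9ducation"          # ISO 8859-15 bytes, invalid as UTF-8

# Thomas's preference: strict decoding fails loudly, close to the source.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("strict decode failed:", e.reason)

# PEP 383's approach: never fails, but smuggles a lone surrogate through.
s = raw.decode("utf-8", "surrogateescape")
assert s == "\udce9ducation"
```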
Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phd.pp.ru Tue Apr 28 11:00:11 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 28 Apr 2009 13:00:11 +0400 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> Message-ID: <20090428090011.GA27583@phd.pp.ru> On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote: > Returning an error for an incorrect encoding doesn't make > internationalization harder, it makes it easier because it makes debugging > easier. What is a "correct encoding"? I have an FTP server to which clients with different local encodings are connecting. FTP protocol doesn't have a notion of encoding so filenames on the filesystem are in koi8-r, cp1251 and utf-8 encodings - all in one directory! What should os.listdir() return for that directory? What is a correct encoding for that directory?! If any program starts to raise errors Python becomes completely unusable for me! But is there anything I can debug here? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From p.f.moore at gmail.com Tue Apr 28 11:20:44 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 28 Apr 2009 10:20:44 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F658A5.7080807@g.nevcal.com> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> Message-ID: <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> 2009/4/28 Glenn Linderman : > So assume a non-decodable sequence in a name. That puts us into Martin's > funny-decode scheme.
His funny-decode scheme produces a bare string, > indistinguishable from a bare string that would be produced by a str API > that happens to contain that same sequence. Data puns. > > So when open is handed the string, should it open the file with the name > that matches the string, or the file with the name that funny-decodes to the > same string? It can't know, unless it knows whether the string is a > funny-decoded string or not. Sorry for picking on Glenn's comment - it's only one of many in this thread. But it seems to me that there is an assumption that problems will arise when code gets a potentially funny-decoded string and doesn't know where it came from. Is that a real concern? How many programs really don't know where their data came from? Maybe a general-purpose library routine *might* just need to document explicitly how it handles funny-encoded data (I can't actually imagine anything that would, but I'll concede it may be possible) but that's just a matter of documenting your assumptions - no better or worse than many other cases. This all sounds similar to the idea of "tainted" data in security - if you lose track of untrusted data from the environment, you expose yourself to potential security issues. So the same techniques should be relevant here (including ignoring it if your application isn't such that it's a concern!) I've yet to hear anyone claim that they would have an actual problem with a specific piece of code they have written. (NB, if such a claim has been made, feel free to point me to it - I admit I've been skimming this thread at times). Paul. 
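[Editor's note: the "funny-decode" scheme Glenn describes and the tainting Paul mentions can be made concrete. A minimal sketch, assuming a Python 3.1+ interpreter where the PEP's error handler exists under the name it was eventually given, 'surrogateescape':]

```python
# PEP 383's "funny-decode": an undecodable byte such as 0xFF maps to the
# lone surrogate U+DCFF instead of raising UnicodeDecodeError.
name = b'\xff'.decode('utf-8', 'surrogateescape')
assert name == '\udcff'

# Re-encoding with the same handler restores the original bytes, so a
# name that came from the filesystem round-trips exactly.
assert name.encode('utf-8', 'surrogateescape') == b'\xff'

# A "data pun" requires a string that already contains a lone surrogate;
# decoding ordinary text can never produce one, which is why in practice
# code rarely needs to know where its strings came from.
ordinary = b'caf\xc3\xa9'.decode('utf-8', 'surrogateescape')
assert ordinary == 'caf\xe9'
```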
From tmbdev at gmail.com Tue Apr 28 11:32:26 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 11:32:26 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <20090428090011.GA27583@phd.pp.ru> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <20090428090011.GA27583@phd.pp.ru> Message-ID: <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com> On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann wrote: > On Tue, Apr 28, 2009 at 10:37:45AM +0200, Thomas Breuel wrote: > > Returning an error for an incorrect encoding doesn't make > > internationalization harder, it makes it easier because it makes > debugging > > easier. > > What is a "correct encoding"? > > I have an FTP server to which clients with different local encodings > are connecting. FTP protocol doesn't have a notion of encoding so filenames > on the filesystem are in koi8-r, cp1251 and utf-8 encodings - all in one > directory! What should os.listdir() return for that directory? What is a > correct encoding for that directory?! I don't know what it should do (ftplib needs to worry about that). I do know what it shouldn't do, however: it should not return a utf-8b string which, when used to create a file, will create a file reproducing the byte sequence of the remote machine; that's wrong. > If any program starts to raise errors Python becomes completely unusable > for me! But is there anything I can debug here? If we follow PEP 383, you will get lots of errors anyway because those strings, when encoded in utf-8b, will result in an error when you try to write them on a Windows file system or any other system that doesn't allow the byte sequences that the utf-8b encodes. Tom -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From phd at phd.pp.ru Tue Apr 28 11:52:23 2009 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 28 Apr 2009 13:52:23 +0400 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <20090428090011.GA27583@phd.pp.ru> <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com> Message-ID: <20090428095223.GB27583@phd.pp.ru> On Tue, Apr 28, 2009 at 11:32:26AM +0200, Thomas Breuel wrote: > On Tue, Apr 28, 2009 at 11:00, Oleg Broytmann wrote: > > I have an FTP server to which clients with different local encodings > > are connecting. FTP protocol doesn't have a notion of encoding so filenames > > on the filesystem are in koi8-r, cp1251 and utf-8 encodings - all in one > > directory! What should os.listdir() return for that directory? What is a > > correct encoding for that directory?! > > I don't know what it should do (ftplib needs to worry about that). There is no ftplib there. The FTP server is ProFTPd; the FTP clients are of all sorts - one, e.g., is an FTP client built into an automatic web camera. I use Python programs to process files after they have been uploaded. The programs access the FTP directory as a part of the local filesystem. > I do know > what it shouldn't do, however: it should not return a utf-8b string which, > when used to create a file, will create a file reproducing the byte sequence > of the remote machine; that's wrong. That's certainly wrong. But at least the approach allows Python programs to list all files in a directory - currently AFAIU os.listdir() silently skips undecodable filenames. 
And after a program gets all the files it can process them further - it can clean up filenames (base64-encode them, e.g.), but at least it can do something, where currently it cannot. PS. It seems I started to argue for the PEP. Well, well... Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Tue Apr 28 13:49:47 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Apr 2009 11:49:47 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?= =?utf-8?q?haracter=09Interfaces?= References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> Message-ID: Paul Moore gmail.com> writes: > > I've yet to hear anyone claim that they would have an actual problem > with a specific piece of code they have written. Yep, that's the problem. Lots of theoretical problems no one has ever encountered brought up against a PEP which resolves some actual problems people encounter on a regular basis. For the record, I'm +1 on the PEP being accepted and implemented as soon as possible (preferably before 3.1). Regards Antoine. 
From jianchun.zhou at gmail.com Tue Apr 28 13:55:40 2009 From: jianchun.zhou at gmail.com (Jianchun Zhou) Date: Tue, 28 Apr 2009 19:55:40 +0800 Subject: [Python-Dev] Can not run under python 2.6 Message-ID: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com> Hi, there: I am new to python, and now I got a trouble: I have an application named canola, it is written under python 2.5, and can run normally under python 2.5 But when it comes under python 2.6, problem up, it says: Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line 151, in _load_plugins classes = plg.load() File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line 94, in load mod = self._ldr.load() File "/usr/lib/python2.6/site-packages/terra/core/module_loader.py", line 42, in load mod = __import__(modpath, fromlist=[mod_name]) ImportError: Import by filename is not supported. Any body any idea what should I do? -- Best Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From dickinsm at gmail.com Tue Apr 28 13:56:59 2009 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 28 Apr 2009 12:56:59 +0100 Subject: [Python-Dev] One more proposed formatting change for 3.1 Message-ID: <5c6f2a5d0904280456k1fa5ade0gad1aad54364002d1@mail.gmail.com> Here's one more proposed change, this time for formatting of floats using format() and the empty presentation type. To avoid repeating myself, here's the text from the issue I just opened: http://bugs.python.org/issue5864 """ In all versions of Python from 2.6 up, I get the following behaviour: >>> format(123.456, '.4') '123.5' >>> format(1234.56, '.4') '1235.0' >>> format(12345.6, '.4') '1.235e+04' The first and third results are as I expect, but the second is somewhat misleading: it gives 5 significant digits when only 4 were requested, and moreover the last digit is incorrect. 
I propose that Python 2.7 and Python 3.1 be changed so that the output for the second line above is '1.235e+03'. """ This issue seems fairly clear cut to me, and I doubt that there's been enough uptake of 'format' yet for this to risk significant breakage. So unless there are objections I'll plan to make this change before this weekend's beta. Mark From p.f.moore at gmail.com Tue Apr 28 13:57:10 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 28 Apr 2009 12:57:10 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> Message-ID: <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> 2009/4/28 Antoine Pitrou : > Paul Moore gmail.com> writes: >> >> I've yet to hear anyone claim that they would have an actual problem >> with a specific piece of code they have written. > > Yep, that's the problem. Lots of theoretical problems noone has ever encountered > brought up against a PEP which resolves some actual problems people encounter on > a regular basis. > > For the record, I'm +1 on the PEP being accepted and implemented as soon as > possible (preferably before 3.1). In case it's not clear, I am also +1 on the PEP as it stands. Paul. 
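[Editor's note: Mark's significant-digits point is easiest to see next to the 'g' presentation type, whose rounding behaviour is well defined. A quick sketch of the behaviour he wants the empty presentation type to track (the exact output of the empty type itself depends on whether the proposed change is applied):]

```python
# 'g' keeps exactly the requested number of significant digits,
# switching to exponential notation once the exponent reaches the
# precision.
assert format(123.456, '.4g') == '123.5'
assert format(1234.56, '.4g') == '1235'       # 4 digits, no padding zero
assert format(12345.6, '.4g') == '1.235e+04'
```

Under the proposal, `format(1234.56, '.4')` would likewise round to four significant digits ('1.235e+03') rather than producing the five-digit '1235.0'.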
From fuzzyman at voidspace.org.uk Tue Apr 28 14:03:42 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 28 Apr 2009 13:03:42 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> Message-ID: <49F6F09E.2020506@voidspace.org.uk> Paul Moore wrote: > 2009/4/28 Antoine Pitrou : > >> Paul Moore gmail.com> writes: >> >>> I've yet to hear anyone claim that they would have an actual problem >>> with a specific piece of code they have written. >>> >> Yep, that's the problem. Lots of theoretical problems noone has ever encountered >> brought up against a PEP which resolves some actual problems people encounter on >> a regular basis. >> >> For the record, I'm +1 on the PEP being accepted and implemented as soon as >> possible (preferably before 3.1). >> > > In case it's not clear, I am also +1 on the PEP as it stands. > Me 2 Michael > Paul. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ From fuzzyman at voidspace.org.uk Tue Apr 28 14:06:46 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 28 Apr 2009 13:06:46 +0100 Subject: [Python-Dev] Can not run under python 2.6 In-Reply-To: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com> References: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com> Message-ID: <49F6F156.6040901@voidspace.org.uk> Jianchun Zhou wrote: > Hi, there: > > I am new to python, and now I got a trouble: > > I have an application named canola, it is written under python 2.5, > and can run normally under python 2.5 > > But when it comes under python 2.6, problem up, it says: > > Traceback (most recent call last): > File > "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line > 151, in _load_plugins > classes = plg.load() > File > "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", line > 94, in load > mod = self._ldr.load() > File "/usr/lib/python2.6/site-packages/terra/core/module_loader.py", > line 42, in load > mod = __import__(modpath, fromlist=[mod_name]) > ImportError: Import by filename is not supported. > > Any body any idea what should I do? The Python-Dev mailing list is for the development of Python and not with Python. You will get a much better response asking on the comp.lang.python (python-list) or python-tutor newsgroups / mailing lists. comp.lang.python has both google groups and gmane gateways and so is easy to post to. For the particular problem you mention it is an intentional change and so the code in canola will need to be modified in order to run under Python 2.6. 
All the best, Michael Foord > > -- > Best Regards > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ From jianchun.zhou at gmail.com Tue Apr 28 14:20:06 2009 From: jianchun.zhou at gmail.com (Jianchun Zhou) Date: Tue, 28 Apr 2009 20:20:06 +0800 Subject: [Python-Dev] Can not run under python 2.6 In-Reply-To: <49F6F156.6040901@voidspace.org.uk> References: <2b767f890904280455j2cbdf444i187841113df1df2b@mail.gmail.com> <49F6F156.6040901@voidspace.org.uk> Message-ID: <2b767f890904280520j2dbe469di5f580c835b240a83@mail.gmail.com> OK, Thanks a lot. On Tue, Apr 28, 2009 at 8:06 PM, Michael Foord wrote: > Jianchun Zhou wrote: > >> Hi, there: >> >> I am new to python, and now I got a trouble: >> >> I have an application named canola, it is written under python 2.5, and >> can run normally under python 2.5 >> >> But when it comes under python 2.6, problem up, it says: >> >> Traceback (most recent call last): >> File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", >> line 151, in _load_plugins >> classes = plg.load() >> File "/usr/lib/python2.6/site-packages/terra/core/plugin_manager.py", >> line 94, in load >> mod = self._ldr.load() >> File "/usr/lib/python2.6/site-packages/terra/core/module_loader.py", line >> 42, in load >> mod = __import__(modpath, fromlist=[mod_name]) >> ImportError: Import by filename is not supported. >> >> Any body any idea what should I do? >> > > The Python-Dev mailing list is for the development of Python and not with > Python. You will get a much better response asking on the comp.lang.python > (python-list) or python-tutor newsgroups / mailing lists. 
comp.lang.python > has both google groups and gmane gateways and so is easy to post to. > > For the particular problem you mention it is an intentional change and so > the code in canola will need to be modified in order to run under Python > 2.6. > > All the best, > > Michael Foord > > >> -- >> Best Regards >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: >> http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk >> >> > > > -- > http://www.ironpythoninaction.com/ > > -- Best Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From l.mastrodomenico at gmail.com Tue Apr 28 14:29:19 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Tue, 28 Apr 2009 14:29:19 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <20090428090011.GA27583@phd.pp.ru> <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com> Message-ID: 2009/4/28 Thomas Breuel : > If we follow PEP 383, you will get lots of errors anyway because those > strings, when encoded in utf-8b, will result in an error when you try to > write them on a Windows file system or any other system that doesn't allow > the byte sequences that the utf-8b encodes. I'm not sure if when you say "write them on a Windows FS" you mean from within Windows itself or a filesystem mounted on another OS, so I'll cover both cases. Let's suppose that I use Python 2.x or something else to create a file with name b'\xff'. 
My (Linux) system has a sane configuration and the filesystem encoding is UTF-8, so it's an invalid name but the kernel will blindly accept it anyway. With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'. Now if this string somehow ends up in a Python 3.1 program running on Windows and it tries to create a file with this name, it will work (no exception will be raised). The Windows GUI will display the standard "invalid character" symbol (an empty box) when listing this file, but this seems reasonable since the original file was displayed as "?" by the Linux console and with another invalid character symbol by the GNOME file manager. OTOH if I write the same file on a Windows filesystem mounted on another OS, there will be in place an automatic translation (probably done by the OS kernel) from the user-visible filesystem encoding (see e.g. the "iocharset" or "utf8" mount options for vfat on Linux) to UTF-16. Which means that the write will fail with something like: IOError: [Errno 22] invalid filename: b'/media/windows_disk/\xff' (The "problem" is that a vfat filesystem mounted with the "utf8" option on Linux will only accept byte sequences that are valid UTF-8, or at least reasonably similar: e.g. b'\xed\xb3\xbf' is accepted.) Again this seems reasonable since it already happens in Python 2 and with pretty much any other software, including GNU cp. I don't see how Martin can do better than this. Well, ok, I guess he could break into my house and rename the original file to something sane... 
-- Lino Mastrodomenico From ronaldoussoren at mac.com Tue Apr 28 14:30:43 2009 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 28 Apr 2009 14:30:43 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6F09E.2020506@voidspace.org.uk> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> <49F6F09E.2020506@voidspace.org.uk> Message-ID: <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> For what it's worth, the OSX APIs seem to behave as follows: * If you create a file with a non-UTF-8 name on an HFS+ filesystem the system automatically encodes the name. That is, open(chr(255), 'w') will silently create a file named '%FF' instead of the name you'd expect on a unix system. * If you mount an NFS filesystem from a linux host and that directory contains a file named chr(255) - unix-level tools will see a file with the expected name (just like on linux) - Cocoa's NSFileManager returns u"?" as the filename, that is, when the filename cannot be decoded using UTF-8, the name returned by the high-level API is mangled. This is regardless of the setting of LANG. - I haven't found a way yet to access files whose names are not valid UTF-8 using the high-level Cocoa APIs. The latter two are interesting because Cocoa has a unicode filesystem API on top of a POSIX C-API, just like Python 3.x. I guess the chosen behaviour works out on OSX (where users are unlikely to run into this issue), but could be more problematic on other POSIX systems. Ronald On 28 Apr, 2009, at 14:03, Michael Foord wrote: > Paul Moore wrote: >> 2009/4/28 Antoine Pitrou : >> >>> Paul Moore gmail.com> writes: >>> >>>> I've yet to hear anyone claim that they would have an actual >>>> problem >>>> with a specific piece of code they have written. >>>> >>> Yep, that's the problem. 
Lots of theoretical problems noone has >>> ever encountered >>> brought up against a PEP which resolves some actual problems >>> people encounter on >>> a regular basis. >>> >>> For the record, I'm +1 on the PEP being accepted and implemented >>> as soon as >>> possible (preferably before 3.1). >>> >> >> In case it's not clear, I am also +1 on the PEP as it stands. >> > > Me 2 > > Michael >> Paul. >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk >> > > > -- > http://www.ironpythoninaction.com/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ronaldoussoren%40mac.com -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2224 bytes Desc: not available URL: From tmbdev at gmail.com Tue Apr 28 14:37:33 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 14:37:33 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> Message-ID: <7e51d15d0904280537n22168cfl16c58f727be1755e@mail.gmail.com> > > Yep, that's the problem. Lots of theoretical problems noone has ever > encountered > brought up against a PEP which resolves some actual problems people > encounter on > a regular basis. How can you bring up practical problems against something that hasn't been implemented? The fact that no other language or library does this is perhaps an indication that it isn't the right thing to do. 
But the biggest problem with the proposal is that it isn't needed: if you want to be able to turn arbitrary byte sequences into unicode strings and back, just set your encoding to iso8859-15. That already works and it doesn't require any changes. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From hrvoje.niksic at avl.com Tue Apr 28 14:41:19 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Tue, 28 Apr 2009 14:41:19 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <26924021.1861174.1240921767958.JavaMail.xicrypt@atgrzls001> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <20090428090011.GA27583@phd.pp.ru> <7e51d15d0904280232w5a2dc186id67791feb9bf21e3@mail.gmail.com> <26924021.1861174.1240921767958.JavaMail.xicrypt@atgrzls001> Message-ID: <49F6F96F.5050507@avl.com> Lino Mastrodomenico wrote: > Let's suppose that I use Python 2.x or something else to create a file > with name b'\xff'. My (Linux) system has a sane configuration and the > filesystem encoding is UTF-8, so it's an invalid name but the kernel > will blindly accept it anyway. > > With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'. One question that really bothers me about this proposal is the following: Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8 sequence, will be converted to the half-surrogate '\udcff'. However, a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be converted to '\udcff'. Those are quite different POSIX pathnames; how will Python know which one it was when I later pass '\udcff' to open()? A poster hinted at this question, but I haven't seen it answered, yet. 
[1] I'm assuming that it's valid UTF8 because it passes through Python 2.5's '\xed\xb3\xbf'.decode('utf-8'). I don't claim to be a UTF-8 expert. From hrvoje.niksic at avl.com Tue Apr 28 14:46:11 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Tue, 28 Apr 2009 14:46:11 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001> Message-ID: <49F6FA93.7080302@avl.com> Thomas Breuel wrote: > But the biggest problem with the proposal is that it isn't needed: if > you want to be able to turn arbitrary byte sequences into unicode > strings and back, just set your encoding to iso8859-15. That already > works and it doesn't require any changes. Are you proposing to unconditionally encode file names as iso8859-15, or to do so only when undecodeable bytes are encountered? If you unconditionally set encoding to iso8859-15, then you are effectively reverting to treating file names as bytes, regardless of the locale. You're also angering a lot of European users who expect iso8859-2, etc. If you switch to iso8859-15 only in the presence of undecodable UTF-8, then you have the same round-trip problem as the PEP: both b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a way to unambiguously recover the original file name. From rdmurray at bitdance.com Tue Apr 28 14:47:40 2009 From: rdmurray at bitdance.com (R. 
David Murray) Date: Tue, 28 Apr 2009 08:47:40 -0400 (EDT) Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> Message-ID: On Tue, 28 Apr 2009 at 09:30, Thomas Breuel wrote: >>> Therefore, when Python encounters path names on a file system >>> that are not consistent with the (assumed) encoding for that file >>> system, Python should raise an error. >> >> This is what happens currently, and users are quite unhappy about it. > > We need to keep "users" and "programmers" distinct here. > > Programmers may find it inconvenient that they have to spend time figuring > out and deal with platform-dependent file system encoding issues and > errors. But internationalization and unicode are hard, that's just a fact > of life. And most programmers won't do it, because most programmers write for an English speaking audience and have no clue about unicode issues. That is probably slowly changing, but it is still true, I think. > End users, however, are going to be quite unhappy if they get a string of > gibberish for a file name because you decided to interpret some non-Unicode > string as UTF-8-with-extra-bytes. No, end users expect the gibberish, because they get it all the time (at least on Unix) when dealing with international filenames. They expect to be able to manipulate such files _despite_ the gibberish. (I speak here as an end user who does this!!) > Or some Python program might copy files from an ISO8859-15 encoded file > system to a UTF-8 encoded file system, and instead of getting an error when > the encodings are set incorrectly, Python would quietly create ISO8859-15 > encoded file names, making the target file system inconsistent. As will almost all unix programs, and the unix OS itself. 
On Unix, you can't make the file system inconsistent by doing this, because filenames are just byte strings with no NULLs. How _does_ Windows handle this? Would a Windows program complain, or would it happily record the gibberish? I suspect the latter, but I don't use Windows so I don't know. > There is a lot of potential for major problems for end users with your > proposals. In both cases, what should happen is that the end user gets an > error, submits a bug, and the programmer figures out how to deal with the > encoding issues correctly. What would actually happen is that the user would abandon the program that didn't work for one (not written in Python) that did. If the programmer was lucky they'd get a bug report, which they wouldn't be able to do anything about since Python wouldn't be providing the tools to let them fix it (ie: there are currently no bytes interfaces for environ or the command line in python3). >> Yes, users can do that (to a degree), but they are still unhappy about >> it. The approach actually fails for command line arguments > > As it should: if I give an ISO8859-15 encoded command line argument to a > Python program that expects a UTF-8 encoding, the Python program should tell > me that there is something wrong when it notices that. Quietly continuing > is the wrong thing to do. Imagine you are on a unix system, and you have gotten from somewhere a file whose name is encoded in something other than UTF-8 (I have a number of those on my system). Now imagine that I want to run a python program against that file, passing the name in on the command line. I type the program name, the first few (non-mangled) characters, and hit tab for completion, and my shell automagically puts the escaped bytes onto the command line. Or perhaps I cut and paste from an 'ls' listing into a quoted string on the command line. 
Python is now getting the mangled filename passed in on the command line, and if the python program can't manipulate that file like any other file on my disk I am going to be mightily pissed. This is the _reality_ of current unix systems, like it or not. The same apparently applies to Windows, though in that case the mangled names may be fewer and you tend to pick them from a GUI interface rather than do cut-and-paste or tab completion. > If we follow your approach, that ISO8859-15 string will get turned into an > escaped unicode string inside Python. If I understand your proposal > correctly, if it's a output file name and gets passed to Python's open > function, Python will then decode that string and end up with an ISO8859-15 > byte sequence, which it will write to disk literally, even if the encoding > for the system is UTF-8. That's the wrong thing to do. Right. Like I said, that's what most (almost all) Unix/Linux programs _do_. Now, in some future world where everyone (including Windows) acts like we are hearing OS/X does and rejects the garbled encoding _at the OS level_, then we'd be able to trust the file system encoding (FSDO trust) and there would be no need for this PEP or any similar solution. --David From l.mastrodomenico at gmail.com Tue Apr 28 15:01:32 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Tue, 28 Apr 2009 15:01:32 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6933B.7020705@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> Message-ID: 2009/4/28 Glenn Linderman : > The switch from PUA to half-surrogates does not resolve the issues with the > encoding not being a 1-to-1 mapping, though. 
?The very fact that you ?think > you can get away with use of lone surrogates means that other people might, > accidentally or intentionally, also use lone surrogates for some other > purpose. ?Even in file names. It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is not a valid Unicode character (not a character at all, really) and the only way you can put this in a POSIX filename is if you use a very lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'. Since this byte sequence doesn't represent a valid character when decoded with UTF-8, it should simply be considered an invalid UTF-8 sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not* '\udcff'). Martin: maybe the PEP should say this explicitly? Note that the round-trip works without ambiguities between '\udcff' in the filename: b'\xed\xb3\xbf' -> '\udced\udcb3\udcbf' -> b'\xed\xb3\xbf' and b'\xff' in the filename, decoded by Python to '\udcff': b'\xff' -> '\udcff' -> b'\xff' -- Lino Mastrodomenico From solipsis at pitrou.net Tue Apr 28 15:03:46 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Apr 2009 13:03:46 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?PEP_383=3A_Non-decodable_Bytes_in_System_C?= =?utf-8?q?haracter=09Interfaces?= References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <7e51d15d0904280537n22168cfl16c58f727be1755e@mail.gmail.com> Message-ID: Thomas Breuel gmail.com> writes: > > How can you bring up practical problems against something that hasn't been implemented? The PEP is simple enough that you can simulate its effect by manually computing the resulting unicode string for a hypothetical broken filename. Several people have already done so in this thread. > The fact that no other language or library does this is perhaps an indication that it isn't the right thing to do. According to some messages, it seems Java and Mono actually use this kind of workaround. 
Though I haven't checked (I don't use those languages). > But the biggest problem with the proposal is that it isn't needed: if you want to be able to turn arbitrary byte sequences into unicode strings and back, just set your encoding to iso8859-15. That already works That doesn't work at all. With your proposal, any non-ASCII filename will be unreadable; not only the broken ones. Antoine. From hrvoje.niksic at avl.com Tue Apr 28 15:06:17 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Tue, 28 Apr 2009 15:06:17 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001> Message-ID: <49F6FF49.6010205@avl.com> Lino Mastrodomenico wrote: > Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid character when > decoded with UTF-8, it should simply be considered an invalid UTF-8 > sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not* > '\udcff'). "Should be considered" or "will be considered"? Python 3.0's UTF-8 decoder happily accepts it and returns u'\udcff': >>> b'\xed\xb3\xbf'.decode('utf-8') '\udcff' If the PEP depends on this being changed, it should be mentioned in the PEP.
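The round-trip Lino describes can be checked concretely in a later Python: the PEP's utf-8b eventually shipped in Python 3.1 as the 'surrogateescape' error handler, and the strict UTF-8 decoder was tightened at the same time, so a modern interpreter behaves as proposed here. A quick sketch:

```python
# PEP 383's "utf-8b" shipped in Python 3.1 as the 'surrogateescape'
# error handler; the round-trips discussed above can be checked directly.

# An undecodable byte maps to a lone low surrogate, and back:
assert b'\xff'.decode('utf-8', 'surrogateescape') == '\udcff'
assert '\udcff'.encode('utf-8', 'surrogateescape') == b'\xff'

# b'\xed\xb3\xbf' (the lenient UTF-8 encoding of U+DCFF) is itself
# invalid UTF-8, so each of its three bytes is escaped individually;
# the filenames b'\xff' and b'\xed\xb3\xbf' stay distinguishable:
assert b'\xed\xb3\xbf'.decode('utf-8', 'surrogateescape') == '\udced\udcb3\udcbf'
assert '\udced\udcb3\udcbf'.encode('utf-8', 'surrogateescape') == b'\xed\xb3\xbf'
```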
From solipsis at pitrou.net Tue Apr 28 15:13:37 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Apr 2009 13:13:37 +0000 (UTC) Subject: [Python-Dev] lone surrogates in utf-8 References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001> <49F6FF49.6010205@avl.com> Message-ID: Hrvoje Niksic <hrvoje.niksic at avl.com> writes: > > "Should be considered" or "will be considered"? Python 3.0's UTF-8 > decoder happily accepts it and returns u'\udcff': > > >>> b'\xed\xb3\xbf'.decode('utf-8') > '\udcff' Yes, there is already a bug entry for it: http://bugs.python.org/issue3672 I think we could happily fix it for 3.1 (perhaps leaving 2.7 unchanged for compatibility reasons - I don't know if some people may rely on the current behaviour). From l.mastrodomenico at gmail.com Tue Apr 28 15:14:19 2009 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Tue, 28 Apr 2009 15:14:19 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6FF49.6010205@avl.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001> <49F6FF49.6010205@avl.com> Message-ID: 2009/4/28 Hrvoje Niksic <hrvoje.niksic at avl.com>: > Lino Mastrodomenico wrote: >> Since this byte sequence [b'\xed\xb3\xbf'] doesn't represent a valid >> character when >> decoded with UTF-8, it should simply be considered an invalid UTF-8 >> sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not* >> '\udcff'). > > "Should be considered" or "will be considered"?
Python 3.0's UTF-8 decoder > happily accepts it and returns u'\udcff': > >>>> b'\xed\xb3\xbf'.decode('utf-8') > '\udcff' Only for the new utf-8b encoding (if Martin agrees), while the existing utf-8 is fine as is (or at least waaay outside the scope of this PEP). -- Lino Mastrodomenico From p.f.moore at gmail.com Tue Apr 28 15:19:55 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 28 Apr 2009 14:19:55 +0100 Subject: [Python-Dev] One more proposed formatting change for 3.1 In-Reply-To: <5c6f2a5d0904280456k1fa5ade0gad1aad54364002d1@mail.gmail.com> References: <5c6f2a5d0904280456k1fa5ade0gad1aad54364002d1@mail.gmail.com> Message-ID: <79990c6b0904280619n21002694j3ba63026cf954f53@mail.gmail.com> 2009/4/28 Mark Dickinson: > Here's one more proposed change, this time for formatting > of floats using format() and the empty presentation type. > To avoid repeating myself, here's the text from the issue > I just opened: > > http://bugs.python.org/issue5864 > > """ > In all versions of Python from 2.6 up, I get the following behaviour: > >>>> format(123.456, '.4') > '123.5' >>>> format(1234.56, '.4') > '1235.0' >>>> format(12345.6, '.4') > '1.235e+04' > > The first and third results are as I expect, but the second is somewhat > misleading: it gives 5 significant digits when only 4 were requested, > and moreover the last digit is incorrect. > > I propose that Python 2.7 and Python 3.1 be changed so that the output > for the second line above is '1.235e+03'. > """ > > This issue seems fairly clear cut to me, and I doubt that there's been > enough uptake of 'format' yet for this to risk significant breakage. So > unless there are objections I'll plan to make this change before this weekend's beta.
+1 From duncan.booth at suttoncourtenay.org.uk Tue Apr 28 15:22:45 2009 From: duncan.booth at suttoncourtenay.org.uk (Duncan Booth) Date: Tue, 28 Apr 2009 13:22:45 +0000 (UTC) Subject: [Python-Dev] PEP 383 (again) References: <26924021.1861174.1240921767958.JavaMail.xicrypt@atgrzls001> <49F6F96F.5050507@avl.com> Message-ID: Hrvoje Niksic wrote: > Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8 > sequence, will be converted to the half-surrogate '\udcff'. However, > a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be > converted to '\udcff'. Those are quite different POSIX pathnames; how > will Python know which one it was when I later pass '\udcff' to > open()? > > > [1] > I'm assuming that it's valid UTF8 because it passes through Python > 2.5's '\xed\xb3\xbf'.decode('utf-8'). I don't claim to be a UTF-8 > expert. I'm not a UTF-8 expert either, but I got bitten by this yesterday. I was uploading a file to a Google Search Appliance and it was rejected as invalid UTF-8 despite having been encoded into UTF-8 by Python. The cause was a byte sequence which decoded to a half surrogate similar to your example above. Python will happily decode and encode such sequences, but as I found to my cost other systems reject them. Reading wikipedia implies that Python is wrong to accept these sequences and I think (though I'm not a lawyer) that RFC 3629 also implies this: "The definition of UTF-8 prohibits encoding character numbers between U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding form (as surrogate pairs) and do not directly represent characters." and "Implementations of the decoding algorithm above MUST protect against decoding invalid sequences." 
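Duncan's reading of RFC 3629 matches what Python eventually implemented once the bug entry mentioned earlier (issue 3672) was fixed: the strict UTF-8 codec rejects lone surrogates in both directions. A sketch, assuming Python 3.1 or later:

```python
# In Python 3.1+ the strict UTF-8 codec follows RFC 3629: code points
# U+D800..U+DFFF are rejected on both the decode and the encode side.

def rejects(operation):
    """Return True if the codec operation raises a Unicode error."""
    try:
        operation()
    except (UnicodeDecodeError, UnicodeEncodeError):
        return True
    return False

# The three-byte encoding of the half surrogate U+DCFF is not
# accepted as UTF-8 ...
assert rejects(lambda: b'\xed\xb3\xbf'.decode('utf-8'))
# ... and a lone surrogate cannot be encoded to UTF-8 either:
assert rejects(lambda: '\udcff'.encode('utf-8'))
```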
From murman at gmail.com Tue Apr 28 16:00:50 2009 From: murman at gmail.com (Michael Urman) Date: Tue, 28 Apr 2009 09:00:50 -0500 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Apr 27, 2009 at 23:43, Stephen J. Turnbull wrote: > Nobody said we were at the stage of *saving* the [attachment]! But speaking of saving files, I think that's the biggest hole in this that has been nagging at the back of my mind. This PEP intends to allow easy access to filenames and other environment strings which are not restricted to known encodings. What happens if the detected encoding changes? There may be difficulties de/serializing these names, such as for an MRU list. Since the serialization of the Unicode string is likely to use UTF-8, and the string for such a file will include half surrogates, the application may raise an exception when encoding the names for a configuration file. These encoding exceptions will be as rare as the unusual names (which the careful I18N aware developer has probably eradicated from his system), and thus will appear late. Or say de/serialization succeeds. Since the resulting Unicode string differs depending on the encoding (which is a good thing; it is supposed to make most cases mostly readable), when the filesystem encoding changes (say from legacy to UTF-8), the "name" changes, and deserialized references to it become stale. 
This can probably be handled through careful use of the same encoding/decoding scheme, if relevant, but that sounds like we've just moved the problem from fs/environment access to serialization. Is that good enough? For other uses the API knew whether it was environmentally aware, but serialization probably will not. Should this PEP make recommendations about how to save filenames in configuration files? -- Michael Urman From stephen at xemacs.org Tue Apr 28 16:09:33 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Apr 2009 23:09:33 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> Message-ID: <87ocugkgyq.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > But it seems to me that there is an assumption that problems will > arise when code gets a potentially funny-decoded string and doesn't > know where it came from. > > Is that a real concern? Yes, it's a real concern. I don't think it's possible to show a small piece of code one could point at and say "without a better API I bet you can't write this correctly," though. Rather, my experience with Emacs and various mail packages is that without type information it is impossible to keep track of the myriad bits and pieces of text that are recombining like pig flu, and eventually one breaks out and causes an error. It's usually easy to fix, but so are the next hundred similar regressions, and in the meantime a hundred users have suffered more or less damage or at least annoyance. There's no question that dealing with escapes of funny-decoded strings to unprepared code paths is mission creep compared to Martin's stated purpose for PEP 383, but it is also a real problem.
From stephen at xemacs.org Tue Apr 28 16:24:55 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 28 Apr 2009 23:24:55 +0900 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> Message-ID: <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> Thomas Breuel writes: > PEP 383 doesn't make it any easier; it just turns one set of > problems into another. That's false. There is an interesting class of problems of the form "get a list of names from the OS and allow the user to select from it, and retrieve corresponding content." People are *very* often able to decode complete gibberish, as long as it's the only gibberish in a list. Ditto partial gibberish. In that case, PEP 383 allows the content retrieval operation to complete. There are probably other problems that this PEP solves. > Actually, it makes it worse, Again, it gives you different problems, which may be better and may be worse according to the user's requirements. Currently, you often get an exception, and running the program again is no help. The user must clean up the list to make progress. This may or may not be within the user's capacity (eg, read-only media). > since any problems that show up now show up far from the source of > the problem, and since it can lead to security problems and/or data > loss. Yes. This is a point I have been at pains to argue elsewhere in this thread. However, it is "mission creep": Martin didn't volunteer to write a PEP for it, he volunteered to write a PEP to solve the "roundtrip the value of os.listdir()" problem. And he succeeded, up to some minor details. 
> The problem may well be with the program using the wrong encodings or > incorrectly ignoring encoding information. Furthermore, even if it is user > error, the program needs to validate its inputs and put up a meaningful > error message, not mangle the disk. To detect such program bugs, it's > important that when Python detects an incorrect encoding that it doesn't > quietly continue with an incorrect string. I agree. Guido, however, responded that "Practicality beats purity" to a similar point in the PEP 263 discussion. Be aware that you're fighting an uphill battle here. From martin at v.loewis.de Tue Apr 28 18:46:19 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 18:46:19 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> Message-ID: <49F732DB.8050101@v.loewis.de> > If we follow your approach, that ISO8859-15 string will get turned into > an escaped unicode string inside Python. If I understand your proposal > correctly, if it's a output file name and gets passed to Python's open > function, Python will then decode that string and end up with an > ISO8859-15 byte sequence, which it will write to disk literally, even if > the encoding for the system is UTF-8. That's the wrong thing to do. I don't think anything can, or should be, done about that. If you had byte-oriented interfaces (as you do in 2.x), exactly the same thing will happen: the name of the file will be the very same byte sequence as the one passed on the command line. Most Unix users here agree that this is the right thing to happen. 
Regards, Martin From martin at v.loewis.de Tue Apr 28 18:49:23 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 18:49:23 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> Message-ID: <49F73393.90901@v.loewis.de> > It does solve this issue, because (unlike e.g. U+F01FF) '\udcff' is > not a valid Unicode character (not a character at all, really) and the > only way you can put this in a POSIX filename is if you use a very > lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'. > > Since this byte sequence doesn't represent a valid character when > decoded with UTF-8, it should simply be considered an invalid UTF-8 > sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not* > '\udcff'). > > Martin: maybe the PEP should say this explicitly? Sure, will do. 
Regards, Martin From martin at v.loewis.de Tue Apr 28 19:00:37 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 28 Apr 2009 19:00:37 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <49F73635.6010105@v.loewis.de> > Since the serialization of the Unicode string is likely to use UTF-8, > and the string for such a file will include half surrogates, the > application may raise an exception when encoding the names for a > configuration file. These encoding exceptions will be as rare as the > unusual names (which the careful I18N aware developer has probably > eradicated from his system), and thus will appear late. There are trade-offs to any solution; if there was a solution without trade-offs, it would be implemented already. The Python UTF-8 codec will happily encode half-surrogates; people argue that it is a bug that it does so, however, it would help in this specific case. An alternative that doesn't suffer from the risk of not being able to store decoded strings would have been the use of PUA characters, but people rejected it because of the potential ambiguities. So they clearly dislike one risk more than the other. UTF-8b is primarily meant as an in-memory representation. > Or say de/serialization succeeds. Since the resulting Unicode string > differs depending on the encoding (which is a good thing; it is > supposed to make most cases mostly readable), when the filesystem > encoding changes (say from legacy to UTF-8), the "name" changes, and > deserialized references to it become stale. That problem has nothing to do with the PEP. 
If the encoding changes, LRU entries may get stale even if there were no encoding errors at all. Suppose the old encoding was Latin-1, and the new encoding is KOI8-R, then all file names are decodable before and afterwards, yet the string representation changes. Applications that want to protect themselves against that happening need to store byte representations of the file names, not character representations. Depending on the configuration file format, that may or may not be possible. I find the case pretty artificial, though: if the locale encoding changes, all file names will look incorrect to the user, so he'll quickly switch back, or rename all the files. As an application supporting a LRU list, I would remove/hide all entries that don't correlate to existing files - after all, the user may have as well deleted the file in the LRU list. Regards, Martin From martin at v.loewis.de Tue Apr 28 19:08:37 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 19:08:37 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6FF49.6010205@avl.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001> <49F6FF49.6010205@avl.com> Message-ID: <49F73815.1010806@v.loewis.de> > If the PEP depends on this being changed, it should be mentioned in the > PEP. The PEP says that the utf-8b codec decodes invalid bytes into low surrogates. I have now clarified that a strict definition of UTF-8 is assumed for utf-8b. 
Regards, Martin From foom at fuhm.net Tue Apr 28 19:53:42 2009 From: foom at fuhm.net (James Y Knight) Date: Tue, 28 Apr 2009 13:53:42 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6A71A.3020809@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> Message-ID: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote: > James Y Knight wrote: >> Hopefully it can be assumed that your locale encoding really is a >> non-overlapping superset of ASCII, as is required by POSIX... > > Can you please point to the part of the POSIX spec that says that > such overlapping is forbidden? I can't find it...I would've thought it would be on this page: http://opengroup.org/onlinepubs/007908775/xbd/charset.html but it's not (at least, not obviously). That does say (effectively) that all encodings must be supersets of ASCII and use the same codepoints, though. However, ISO-2022 being inappropriate for LC_CTYPE usage is the entire reason why EUC-JP was created, so I'm pretty sure that it is in fact inappropriate, and I cannot find any evidence of it ever being used on any system. From http://en.wikipedia.org/wiki/EUC-JP: "To get the EUC form of an ISO-2022 character, the most significant bit of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 to each of these original 7-bit codes); this allows software to easily distinguish whether a particular byte in a character string belongs to the ISO-646 code or the ISO-2022 (EUC) code."
Also: http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html >> I'm a bit scared at the prospect that U+DCAF could turn into "/", >> that >> just screams security vulnerability to me. So I'd like to propose >> that >> only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be >> encoded/decoded via the error handler. > > It would be actually U+DC2f that would turn into /. Yes, I meant to say DC2F, sorry for the confusion. > I'm happy to exclude that range from the mapping if POSIX really > requires an encoding not to be overlapping with ASCII. I think it has to be excluded from mapping in order to not introduce security issues. However... There's also SHIFT-JIS to worry about...which apparently some people actually want to use as their default encoding, despite it being broken to do so. RedHat apparently refuses to provide it as a locale charset (due to its brokenness), and it's also not available by default on my Debian system. People do unfortunately seem to actually use it in real life. https://bugzilla.redhat.com/show_bug.cgi?id=136290 So, I'd like to propose this: The "python-escape" error handler when given a non-decodable byte from 0x80 to 0xFF will produce values of U+DC80 to U+DCFF. When given a non- decodable byte from 0x00 to 0x7F, it will be converted to U+0000-U +007F. On the encoding side, values from U+DC80 to U+DCFF are encoded into 0x80 to 0xFF, and all other characters are treated in whatever way the encoding would normally treat them. This proposal obviously works for all non-overlapping ASCII supersets, where 0x00 to 0x7F always decode to U+00 to U+7F. But it also works for Shift-JIS and other similar ASCII-supersets with overlaps in trailing bytes of a multibyte sequence. So, a sequence like "\x81\xFD".decode("shift-jis", "python-escape") will turn into u"\uDC81\u00fd". Which will then properly encode back into "\x81\xFD". 
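As it happens, the error handler that eventually shipped as 'surrogateescape' adopted exactly this restriction: only bytes 0x80-0xFF map to U+DC80-U+DCFF, and encoding any surrogate outside that range fails, so a stray U+DC2F can never be written out as "/". A sketch in a modern Python:

```python
# The shipped 'surrogateescape' handler only maps U+DC80..U+DCFF back
# to bytes 0x80..0xFF; anything below that range is refused outright.
assert '\udc80'.encode('utf-8', 'surrogateescape') == b'\x80'
assert '\udcff'.encode('utf-8', 'surrogateescape') == b'\xff'

try:
    '\udc2f'.encode('utf-8', 'surrogateescape')
    raise AssertionError('U+DC2F should have been rejected')
except UnicodeEncodeError:
    pass  # U+DC2F is outside the escaped range, so it can never become "/"
```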
The character sets this *doesn't* work for are: ebcdic code pages (obviously completely unsuitable for a locale encoding on unix), iso2022-* (covered above), and shift-jisx0213 (because it has replaced \ with yen, and - with overline). If it's desirable to work with shift_jisx0213, a modification of the proposal can be made: Change the second sentence to: "When given a non- decodable byte from 0x00 to 0x7F, that byte must be the second or later byte in a multibyte sequence. In such a case, the error handler will produce the encoding of that byte if it was standing alone (thus in most encodings, \x00-\x7f turn into U+00-U+7F)." It sounds from https://bugzilla.novell.com/show_bug.cgi?id=162501 like some people do actually use shift_jisx0213, unfortunately. James From tmbdev at gmail.com Tue Apr 28 20:38:44 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 20:38:44 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> > > However, it is "mission creep": Martin didn't volunteer to > write a PEP for it, he volunteered to write a PEP to solve the > "roundtrip the value of os.listdir()" problem. And he succeeded, up > to some minor details. Yes, it solves that problem. But that doesn't come without cost. Most importantly, now Python writes illegal UTF-8 strings even if the user chose a UTF-8 encoding. That means that illegal UTF-8 encodings can propagate anywhere, without warning. 
Furthermore, I don't believe that PEP 383 works consistently on Windows, and it causes programs to behave differently in unintuitive ways on Windows and Linux. I'll suggest an alternative in a separate message. Tom From martin at v.loewis.de Tue Apr 28 20:45:57 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 20:45:57 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> Message-ID: <49F74EE5.6060305@v.loewis.de> > Furthermore, I don't believe that PEP 383 works consistently on Windows, What makes you say that? PEP 383 will have no effect on Windows, compared to the status quo, whatsoever. Regards, Martin From v+python at g.nevcal.com Tue Apr 28 20:48:37 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 11:48:37 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F73635.6010105@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> Message-ID: <49F74F85.9010800@g.nevcal.com> On approximately 4/28/2009 10:00 AM, came the following characters from the keyboard of Martin v.
Löwis: > An alternative that doesn't suffer from the risk of not being able to > store decoded strings would have been the use of PUA characters, but > people rejected it because of the potential ambiguities. So they clearly > dislike one risk more than the other. UTF-8b is primarily meant as > an in-memory representation. The UTF-8b representation suffers from the same potential ambiguities as the PUA characters... perhaps slightly less likely in practice, due to the use of Unicode-illegal characters, but exactly the same theoretical likelihood in the space of Python-acceptable character codes. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From google at mrabarnett.plus.com Tue Apr 28 20:55:09 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 28 Apr 2009 19:55:09 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> Message-ID: <49F7510D.7070603@mrabarnett.plus.com> James Y Knight wrote: > > On Apr 28, 2009, at 2:50 AM, Martin v. Löwis wrote: > >> James Y Knight wrote: >>> Hopefully it can be assumed that your locale encoding really is a >>> non-overlapping superset of ASCII, as is required by POSIX... >> >> Can you please point to the part of the POSIX spec that says that >> such overlapping is forbidden?
> > I can't find it...I would've thought it would be on this page: > http://opengroup.org/onlinepubs/007908775/xbd/charset.html > but it's not (at least, not obviously). That does say (effectively) that > all encodings must be supersets of ASCII and use the same codepoints, > though. > > However, ISO-2022 being inappropriate for LC_CTYPE usage is the entire > reason why EUC-JP was created, so I'm pretty sure that it is in fact > inappropriate, and I cannot find any evidence of it ever being used on > any system. > > From http://en.wikipedia.org/wiki/EUC-JP: > "To get the EUC form of an ISO-2022 character, the most significant bit > of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 > to each of these original 7-bit codes); this allows software to easily > distinguish whether a particular byte in a character string belongs to > the ISO-646 code or the ISO-2022 (EUC) code." > > Also: > http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html > > >>> I'm a bit scared at the prospect that U+DCAF could turn into "/", that >>> just screams security vulnerability to me. So I'd like to propose that >>> only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be >>> encoded/decoded via the error handler. >> >> It would be actually U+DC2f that would turn into /. > > Yes, I meant to say DC2F, sorry for the confusion. > >> I'm happy to exclude that range from the mapping if POSIX really >> requires an encoding not to be overlapping with ASCII. > > I think it has to be excluded from mapping in order to not introduce > security issues. > > However... > > There's also SHIFT-JIS to worry about...which apparently some people > actually want to use as their default encoding, despite it being broken > to do so. RedHat apparently refuses to provide it as a locale charset > (due to its brokenness), and it's also not available by default on my > Debian system. People do unfortunately seem to actually use it in real > life. 
> > https://bugzilla.redhat.com/show_bug.cgi?id=136290 > > So, I'd like to propose this: > The "python-escape" error handler when given a non-decodable byte from > 0x80 to 0xFF will produce values of U+DC80 to U+DCFF. When given a > non-decodable byte from 0x00 to 0x7F, it will be converted to > U+0000-U+007F. On the encoding side, values from U+DC80 to U+DCFF are > encoded into 0x80 to 0xFF, and all other characters are treated in > whatever way the encoding would normally treat them. > > This proposal obviously works for all non-overlapping ASCII supersets, > where 0x00 to 0x7F always decode to U+00 to U+7F. But it also works for > Shift-JIS and other similar ASCII-supersets with overlaps in trailing > bytes of a multibyte sequence. So, a sequence like > "\x81\xFD".decode("shift-jis", "python-escape") will turn into > u"\uDC81\u00fd". Which will then properly encode back into "\x81\xFD". > > The character sets this *doesn't* work for are: ebcdic code pages > (obviously completely unsuitable for a locale encoding on unix), > iso2022-* (covered above), and shift-jisx0213 (because it has replaced \ > with yen, and - with overline). > > If it's desirable to work with shift_jisx0213, a modification of the > proposal can be made: Change the second sentence to: "When given a > non-decodable byte from 0x00 to 0x7F, that byte must be the second or > later byte in a multibyte sequence. In such a case, the error handler > will produce the encoding of that byte if it was standing alone (thus in > most encodings, \x00-\x7f turn into U+00-U+7F)." > > It sounds from https://bugzilla.novell.com/show_bug.cgi?id=162501 like > some people do actually use shift_jisx0213, unfortunately. > I've been thinking of "python-escape" only in terms of UTF-8, the only encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are decodable. But if you're talking about using it with other encodings, eg shift-jisx0213, then I'd suggest the following: 1. 
Bytes 0x00 to 0xFF which can't normally be decoded are decoded to half surrogates U+DC00 to U+DCFF. 2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF are treated as though they are undecodable bytes. 3. Half surrogates U+DC00 to U+DCFF which can be produced by decoding are encoded to bytes 0x00 to 0xFF. 4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't be produced by decoding raise an exception. I think I've covered all the possibilities. :-) From tmbdev at gmail.com Tue Apr 28 21:01:58 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 21:01:58 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) Message-ID: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> I think we should break up this problem into several parts: (1) Should the default UTF-8 decoder fail if it gets an illegal byte sequence. It's probably OK for the default decoder to be lenient in some way (see below). (2) Should the default UTF-8 encoder for file system operations be allowed to generate illegal byte sequences? I think that's a definite no; if I set the encoding for a device to UTF-8, I never want Python to try to write illegal UTF-8 strings to my device. (3) What kind of representation should the UTF-8 decoder return for illegal inputs? There are actually several choices: (a) it could guess what the actual encoding is and use that, (b) it could return a valid unicode string that indicates the illegal characters but does not re-encode to the original byte sequence, or (c) it could return some kind of non-standard representation that encodes back into the original byte sequence. PEP 383 violated (2), and I think that's a bad thing. I think the best solution would be to use (3a) and fall back to (3b) if that doesn't work. If people try to write those strings, they will always get written as correctly encoded UTF-8 strings. 
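For reference, option (3c) — a non-standard representation that encodes back to the original byte sequence — is exactly what PEP 383 specifies; CPython 3.1 and later expose it as the "surrogateescape" error handler. A minimal round trip:

```python
# (3c) in practice: an undecodable byte maps to a lone surrogate on
# decode and back to the same byte on encode (surrogateescape handler).
raw = b"abc\xff"                       # 0xFF is not valid UTF-8
name = raw.decode("utf-8", "surrogateescape")
assert name == "abc\udcff"             # 0xFF -> U+DCFF
assert name.encode("utf-8", "surrogateescape") == raw   # exact round trip
```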
If people really want the option of (3c), then I think encoders related to the file system should by default reject those strings as illegal because the potential problems from writing them are just too serious. Printing routines and UI routines could display them without error (but some clear indication), of course. There is yet another option, which is arguably the "right" one: make the results of os.listdir() subclasses of string that keep track of where they came from. If you write back to the same device, it just writes the same byte sequence. But if you write to other devices and the byte sequence is illegal according to its encoding, you get an error. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From zooko at zooko.com Tue Apr 28 20:51:43 2009 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Tue, 28 Apr 2009 12:51:43 -0600 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6FA93.7080302@avl.com> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001> <49F6FA93.7080302@avl.com> Message-ID: On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote: > Are you proposing to unconditionally encode file names as > iso8859-15, or to do so only when undecodeable bytes are encountered? For what it is worth, what we have previously planned to do for the Tahoe project is the second of these -- decode using some 1-byte encoding such as iso-8859-1, iso-8859-15, or windows-1252 only in the case that attempting to decode the bytes using the local alleged encoding failed. > If you switch to iso8859-15 only in the presence of undecodable > UTF-8, then you have the same round-trip problem as the PEP: both > b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a > way to unambiguously recover the original file name. Why do you say that? 
It seems to work as I expected here: >>> '\xff'.decode('iso-8859-15') u'\xff' >>> '\xc3\xbf'.decode('iso-8859-15') u'\xc3\xbf' >>> >>> >>> >>> '\xff'.decode('cp1252') u'\xff' >>> '\xc3\xbf'.decode('cp1252') u'\xc3\xbf' Regards, Zooko From google at mrabarnett.plus.com Tue Apr 28 21:04:35 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 28 Apr 2009 20:04:35 +0100 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F74EE5.6060305@v.loewis.de> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> Message-ID: <49F75343.9020602@mrabarnett.plus.com> Martin v. L?wis wrote: >> Furthermore, I don't believe that PEP 383 works consistently on Windows, > > What makes you say that? PEP 383 will have no effect on Windows, > compared to the status quo, whatsoever. > You could argue that if Windows is actually returning UTF-16 with half surrogates that they should be altered to conform to what UTF-8 would have returned. 
From v+python at g.nevcal.com Tue Apr 28 21:07:54 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 12:07:54 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> Message-ID: <49F7540A.9010500@g.nevcal.com> On approximately 4/28/2009 10:53 AM, came the following characters from the keyboard of James Y Knight: > > On Apr 28, 2009, at 2:50 AM, Martin v. L?wis wrote: > >> James Y Knight wrote: >>> Hopefully it can be assumed that your locale encoding really is a >>> non-overlapping superset of ASCII, as is required by POSIX... >> >> Can you please point to the part of the POSIX spec that says that >> such overlapping is forbidden? > > I can't find it...I would've thought it would be on this page: > http://opengroup.org/onlinepubs/007908775/xbd/charset.html > but it's not (at least, not obviously). That does say (effectively) that > all encodings must be supersets of ASCII and use the same codepoints, > though. > > However, ISO-2022 being inappropriate for LC_CTYPE usage is the entire > reason why EUC-JP was created, so I'm pretty sure that it is in fact > inappropriate, and I cannot find any evidence of it ever being used on > any system. It would seem from the definition of ISO-2022 that what it calls "escape sequences" is in your POSIX spec called "locking-shift encoding". 
Therefore, the second bullet item under the "Character Encoding" heading prohibits use of ISO-2022, for whatever uses that document defines (which, since you referenced it, I assume means locales, and possibly file system encodings, but I'm not familiar with the structure of all the POSIX standards documents). A locking-shift encoding (where the state of the character is determined by a shift code that may affect more than the single character following it) cannot be defined with the current character set description file format. Use of a locking-shift encoding with any of the standard utilities in the XCU specification or with any of the functions in the XSH specification that do not specifically mention the effects of state-dependent encoding is implementation-dependent. > From http://en.wikipedia.org/wiki/EUC-JP: > "To get the EUC form of an ISO-2022 character, the most significant bit > of each 7-bit byte of the original ISO 2022 codes is set (by adding 128 > to each of these original 7-bit codes); this allows software to easily > distinguish whether a particular byte in a character string belongs to > the ISO-646 code or the ISO-2022 (EUC) code." > > Also: > http://www.cl.cam.ac.uk/~mgk25/ucs/iso2022-wc.html > > >>> I'm a bit scared at the prospect that U+DCAF could turn into "/", that >>> just screams security vulnerability to me. So I'd like to propose that >>> only 0x80-0xFF <-> U+DC80-U+DCFF should ever be allowed to be >>> encoded/decoded via the error handler. >> >> It would be actually U+DC2f that would turn into /. > > Yes, I meant to say DC2F, sorry for the confusion. > >> I'm happy to exclude that range from the mapping if POSIX really >> requires an encoding not to be overlapping with ASCII. > > I think it has to be excluded from mapping in order to not introduce > security issues. > > However... 
> > There's also SHIFT-JIS to worry about...which apparently some people > actually want to use as their default encoding, despite it being broken > to do so. RedHat apparently refuses to provide it as a locale charset > (due to its brokenness), and it's also not available by default on my > Debian system. People do unfortunately seem to actually use it in real > life. > > https://bugzilla.redhat.com/show_bug.cgi?id=136290 > > So, I'd like to propose this: > The "python-escape" error handler when given a non-decodable byte from > 0x80 to 0xFF will produce values of U+DC80 to U+DCFF. When given a > non-decodable byte from 0x00 to 0x7F, it will be converted to > U+0000-U+007F. On the encoding side, values from U+DC80 to U+DCFF are > encoded into 0x80 to 0xFF, and all other characters are treated in > whatever way the encoding would normally treat them. > > This proposal obviously works for all non-overlapping ASCII supersets, > where 0x00 to 0x7F always decode to U+00 to U+7F. But it also works for > Shift-JIS and other similar ASCII-supersets with overlaps in trailing > bytes of a multibyte sequence. So, a sequence like > "\x81\xFD".decode("shift-jis", "python-escape") will turn into > u"\uDC81\u00fd". Which will then properly encode back into "\x81\xFD". > > The character sets this *doesn't* work for are: ebcdic code pages > (obviously completely unsuitable for a locale encoding on unix), Why is that obvious? The only thing I saw that could exclude EBCDIC would be the requirement that the codes be positive in a char, but on a system where the C compiler treats char as unsigned, EBCDIC would qualify. 
Of course, the use of EBCDIC would also restrict the other possible code pages to those derived from EBCDIC (rather than the bulk of code pages that are derived from ASCII), due to: If the encoded values associated with each member of the portable character set are not invariant across all locales supported by the implementation, the results achieved by an application accessing those locales are unspecified. > iso2022-* (covered above), and shift-jisx0213 (because it has replaced \ > with yen, and - with overline). > > If it's desirable to work with shift_jisx0213, a modification of the > proposal can be made: Change the second sentence to: "When given a > non-decodable byte from 0x00 to 0x7F, that byte must be the second or > later byte in a multibyte sequence. In such a case, the error handler > will produce the encoding of that byte if it was standing alone (thus in > most encodings, \x00-\x7f turn into U+00-U+7F)." > > It sounds from https://bugzilla.novell.com/show_bug.cgi?id=162501 like > some people do actually use shift_jisx0213, unfortunately. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From tmbdev at gmail.com Tue Apr 28 21:24:40 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Tue, 28 Apr 2009 21:24:40 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F74EE5.6060305@v.loewis.de> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> Message-ID: <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> On Tue, Apr 28, 2009 at 20:45, "Martin v. 
L?wis" wrote: > > Furthermore, I don't believe that PEP 383 works consistently on Windows, > > What makes you say that? PEP 383 will have no effect on Windows, > compared to the status quo, whatsoever. > That's what you believe, but it's not clear to me that that follows from your proposal. Your proposal says that utf-8b would be used for file systems, but then you also say that it might be used for command line arguments and environment variables. So, which specific APIs will it be used with on Windows and on POSIX systems? Or will utf-8b simply not be available on Windows at all? What happens if I create a Python version of tar, utf-8b strings slip in there, and I try to use them on Windows? You also assume that all Windows file system functions strictly conform to UTF-16 in practice (not just on paper). Have you verified that? It certainly isn't true across all versions of Windows (since NT originally used UCS-2). What's the situation on Windows CE? Another question on Linux: what happens when I decode a file system path with utf-8b and then pass the resulting unicode string to Gnome? To Qt? To windows.forms? To Java? To a unicode regular expression library? To wprintf? AFAIK, the behavior of most libraries is undefined for the kinds of unicode strings you construct, and it may be undefined in a bad way (crash, buffer overflow, whatever). Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From zooko at zooko.com Tue Apr 28 21:50:55 2009 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Tue, 28 Apr 2009 13:50:55 -0600 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> Message-ID: <944CCCB4-36E5-40D5-8F69-67C45F7FD640@zooko.com> On Apr 28, 2009, at 13:01 PM, Thomas Breuel wrote: > (2) Should the default UTF-8 encoder for file system operations be > allowed to generate illegal byte sequences? > > I think that's a definite no; if I set the encoding for a device to > UTF-8, I never want Python to try to write illegal UTF-8 strings to > my device. ... > If people really want the option of (3c), then I think encoders > related to the file system should by default reject those strings > as illegal because the potential problems from writing them are > just too serious. Printing routines and UI routines could display > them without error (but some clear indication), of course. For what it is worth, sometimes we have to write bytes to a POSIX filesystem even though those bytes are not the encoding of any string in the filesystem's "alleged encoding". The reason is that it is common for there to be filenames which are not the encodings of anything in the filesystem's alleged encoding, and the user expects my tool (Tahoe-LAFS [1]) to copy that name to a distributed storage grid and then copy it back unchanged. Even though, I re-iterate, that name is *not* a valid encoding of anything in the current encoding. This doesn't argue that this behavior has to be the *default* behavior, but it is sometimes necessary. It's too bad that POSIX is so far behind Mac OS X in this respect. (Also so far behind Windows, but I use Mac as the example to show how it is possible to build a better system on top of POSIX.) Hopefully David Wheeler's proposals to tighten the requirements in Linux filesystems will catch on: [2]. 
Regards, Zooko [1] http://allmydata.org [2] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html From martin at v.loewis.de Tue Apr 28 22:04:12 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 22:04:12 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> Message-ID: <49F7613C.9000901@v.loewis.de> > Your proposal says that utf-8b would be used for file systems, but then > you also say that it might be used for command line arguments and > environment variables. So, which specific APIs will it be used with on > Windows and on POSIX systems? On Windows, the Wide APIs are already used throughout the code base, e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the specific API for a specific functionality, please read the source code. > Or will utf-8b simply not be available > on Windows at all? It will be available, but it won't be used automatically for anything. > What happens if I create a Python version of tar, > utf-8b strings slip in there, and I try to use them on Windows? No need to create it - the tarfile module is already there. By "in there", do you mean on the file system, or in the tarfile? > You also assume that all Windows file system functions strictly conform > to UTF-16 in practice (not just on paper). Have you verified that? No, I don't assume that. 
I assume that all functions are strictly available in a Wide character version, and have verified that they are. > What's the situation on Windows CE? I can't see how this question is relevant to the PEP. The PEP says this: # On Windows, Python uses the wide character APIs to access # character-oriented APIs, allowing direct conversion of the # environmental data to Python str objects. This is what it already does, and this is what it will continue to do. > Another question on Linux: what happens when I decode a file system path > with utf-8b and then pass the resulting unicode string to Gnome? To > Qt? You probably get moji-bake, or an error, I didn't try. > To windows.forms? To Java? How do you do that, on Linux? > To a unicode regular expression library? You mean, SRE? SRE will match the code points as individual characters, class Cs. You should have been able to find out that for yourself. > To wprintf? Depends on the wprintf implementation. > AFAIK, the behavior of most libraries is > undefined for the kinds of unicode strings you construct, and it may be > undefined in a bad way (crash, buffer overflow, whatever). Indeed so. This is intentional. If you can crash Python that way, nothing gets worse by this PEP - you can then *already* crash Python in that way. 
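The SRE behaviour described here is easy to confirm on a modern CPython: the regex engine treats lone surrogates as ordinary code points (category Cs), so they can be matched and ranged over like any other character:

```python
import re

# Lone surrogates produced by surrogateescape decoding are ordinary
# code points to the re module; they can be found and ranged over.
name = b"caf\xe9".decode("utf-8", "surrogateescape")   # 0xE9 -> U+DCE9
assert name == "caf\udce9"
assert re.findall("[\udc80-\udcff]", name) == ["\udce9"]
```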
Regards, Martin From martin at v.loewis.de Tue Apr 28 22:05:22 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 22:05:22 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F75343.9020602@mrabarnett.plus.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <49F75343.9020602@mrabarnett.plus.com> Message-ID: <49F76182.9010000@v.loewis.de> MRAB wrote: > Martin v. L?wis wrote: >>> Furthermore, I don't believe that PEP 383 works consistently on Windows, >> >> What makes you say that? PEP 383 will have no effect on Windows, >> compared to the status quo, whatsoever. >> > You could argue that if Windows is actually returning UTF-16 with half > surrogates that they should be altered to conform to what UTF-8 would > have returned. Perhaps - but this is not what the PEP specifies (and intentionally so). 
Regards, Martin From v+python at g.nevcal.com Tue Apr 28 22:16:34 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 13:16:34 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F7510D.7070603@mrabarnett.plus.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F7510D.7070603@mrabarnett.plus.com> Message-ID: <49F76422.4010806@g.nevcal.com> On approximately 4/28/2009 11:55 AM, came the following characters from the keyboard of MRAB: > I've been thinking of "python-escape" only in terms of UTF-8, the only > encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are > decodable. UTF-8 is only mentioned in the sense of having special handling for re-encoding; all the other locales/encodings are implicit. But I also went down that path to some extent. > But if you're talking about using it with other encodings, eg > shift-jisx0213, then I'd suggest the following: > > 1. Bytes 0x00 to 0xFF which can't normally be decoded are decoded to > half surrogates U+DC00 to U+DCFF. This makes 256 different escape codes. > 2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF > are treated as though they are undecodable bytes. This provides escaping for the 256 different escape codes, which is lacking from the PEP. > 3. Half surrogates U+DC00 to U+DCFF which can be produced by decoding > are encoded to bytes 0x00 to 0xFF. This reverses the escaping. > 4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't > be produced by decoding raise an exception. This is confusing. Did you mean "excluding" instead of "including"? > I think I've covered all the possibilities. 
:-) You might have. Seems like there could be a simpler scheme, though... 1. Define an escape codepoint. It could be U+003F or U+DC00 or U+F817 or pretty much any defined Unicode codepoint outside the range U+0100 to U+01FF (see rule 3 for why). Only one escape codepoint is needed, this is easier for humans to comprehend. 2. When the escape codepoint is decoded from the byte stream for a bytes interface or found in a str on the str interface, double it. 3. When an undecodable byte 0xPQ is found, decode to the escape codepoint, followed by codepoint U+01PQ, where P and Q are hex digits. 4. When encoding, a sequence of two escape codepoints would be encoded as one escape codepoint, and a sequence of the escape codepoint followed by codepoint U+01PQ would be encoded as byte 0xPQ. Escape codepoints not followed by the escape codepoint, or by a codepoint in the range U+0100 to U+01FF would raise an exception. 5. Provide functions that will perform the same decoding and encoding as would be done by the system calls, for both bytes and str interfaces. This differs from my previous proposal in three ways: A. Doesn't put a marker at the beginning of the string (which I said wasn't necessary even then). B. Allows for a choice of escape codepoint, the previous proposal suggested a specific one. But the final solution will only have a single one, not a user choice, but an implementation choice. C. Uses the range U+0100 to U+01FF for the escape codes, rather than U+0000 to U+00FF. This avoids introducing the NULL character and escape characters into the decoded str representation, yet still uses characters for which glyphs are commonly available, are non-combining, and are easily distinguishable one from another. Rationale: The use of codepoints with visible glyphs makes the escaped string friendlier to display systems, and to people. I still recommend using U+003F as the escape codepoint, but certainly one with a typically visible glyph available.
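A rough sketch of rules 1-4 above, using U+003F ("?") as the escape codepoint. This is purely illustrative of the proposal — the function names and the byte-at-a-time decoding strategy are mine, not part of any existing codec:

```python
# Illustrative only: one escape codepoint ("?"), doubled when literal
# (rule 2); an undecodable byte 0xPQ becomes "?" + U+01PQ (rule 3).
ESC = "\u003f"                       # the proposed escape codepoint, "?"

def escape_decode(data: bytes, encoding: str) -> str:
    out, i = [], 0
    while i < len(data):
        ch = None
        # Find the longest decodable chunk starting at i (max 4 bytes).
        for j in range(min(len(data), i + 4), i, -1):
            try:
                ch = data[i:j].decode(encoding)
                break
            except UnicodeDecodeError:
                ch = None
        if ch is None:               # rule 3: escape the undecodable byte
            out.append(ESC + chr(0x0100 + data[i]))
            i += 1
        else:                        # rule 2: double any literal escapes
            out.append(ch.replace(ESC, ESC + ESC))
            i = j
    return "".join(out)

def escape_encode(text: str, encoding: str) -> bytes:
    out, i = [], 0
    while i < len(text):
        if text[i] == ESC:           # rule 4: reverse the escaping
            nxt = text[i + 1] if i + 1 < len(text) else None
            if nxt == ESC:
                out.append(ESC.encode(encoding))
            elif nxt is not None and 0x0100 <= ord(nxt) <= 0x01FF:
                out.append(bytes([ord(nxt) - 0x0100]))
            else:
                raise ValueError("lone escape codepoint")
            i += 2
        else:
            out.append(text[i].encode(encoding))
            i += 1
    return b"".join(out)
```

Round-tripping b"a?\xffb" through these gives the str "a???\u01ffb" and back the original bytes: the literal "?" is doubled, and the undecodable 0xFF appears as "?" followed by U+01FF, a codepoint with a visible glyph.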
This avoids what I consider to be an annoyance with the PEP, that the codepoints used are not ones that are easily displayed, so endecodable names could easily result in long strings of indistinguishable substitution characters. It, like MRAB's proposal, also avoids data puns, which is a major problem with the PEP. I consider this proposal to be easier to understand than MRAB's proposal, or the PEP, because of the single escape codepoint and the use of visible characters. This proposal, like my initial one, also decodes and encodes (just the escape codes) values on the str interfaces. This is necessary to avoid data puns on systems that provide both types of interfaces. This proposal could be used for programs that use str values, and easily migrates to a solution that provides an object that provides an abstraction for system interfaces that have two forms. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Tue Apr 28 22:25:07 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 22:25:07 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F74F85.9010800@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> Message-ID: <49F76623.8060903@v.loewis.de> > The UTF-8b representation suffers from the same potential ambiguities as > the PUA characters... Not at all the same ambiguities. Here, again, the two choices: A. 
use PUA characters to represent undecodable bytes, in particular for UTF-8 (the PEP actually never proposed this to happen). This introduces an ambiguity: two different files in the same directory may decode to the same string name, if one has the PUA character, and the other has a non-decodable byte that gets decoded to the same PUA character. B. use UTF-8b, representing the byte will ill-formed surrogate codes. The same ambiguity does *NOT* exist. If a file on disk already contains an invalid surrogate code in its file name, then the UTF-8b decoder will recognize this as invalid, and decode it byte-for-byte, into three surrogate codes. Hence, the file names that are different on disk are also different in memory. No ambiguity. Regards, Martin From v+python at g.nevcal.com Tue Apr 28 22:34:21 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 13:34:21 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> Message-ID: <49F7684D.4040904@g.nevcal.com> On approximately 4/28/2009 6:01 AM, came the following characters from the keyboard of Lino Mastrodomenico: > 2009/4/28 Glenn Linderman : >> The switch from PUA to half-surrogates does not resolve the issues with the >> encoding not being a 1-to-1 mapping, though. The very fact that you think >> you can get away with use of lone surrogates means that other people might, >> accidentally or intentionally, also use lone surrogates for some other >> purpose. Even in file names. > > It does solve this issue, because (unlike e.g. 
U+F01FF) '\udcff' is > not a valid Unicode character (not a character at all, really) and the > only way you can put this in a POSIX filename is if you use a very > lenient UTF-8 encoder that gives you b'\xed\xb3\xbf'. Wrong. An 8859-1 locale allows any byte sequence to placed into a POSIX filename. And while U+DCFF is illegal alone in Unicode, it is not illegal in Python str values. And from my testing, Python 3's current UTF-8 encoder will happily provide exactly the bytes value you mention when given U+DCFF. > Since this byte sequence doesn't represent a valid character when > decoded with UTF-8, it should simply be considered an invalid UTF-8 > sequence of three bytes and decoded to '\udced\udcb3\udcbf' (*not* > '\udcff'). > > Martin: maybe the PEP should say this explicitly? > > Note that the round-trip works without ambiguities between '\udcff' in > the filename: > > b'\xed\xb3\xbf' -> '\udced\udcb3\udcbf' -> b'\xed\xb3\xbf' > > and b'\xff' in the filename, decoded by Python to '\udcff': > > b'\xff' -> '\udcff' -> b'\xff' Others have made this suggestion, and it is helpful to the PEP, but not sufficient. As implemented as an error handler, I'm not sure that the b'\xed\xb3\xbf' sequence would trigger the error handler, if the UTF-8 decoder is happy with it. Which, in my testing, it is. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
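For what it's worth, later CPython releases tightened the strict UTF-8 codec to reject surrogate byte sequences, which is what makes the round trip Lino describes unambiguous under the handler as finally shipped ("surrogateescape"):

```python
# With a strict UTF-8 decoder that rejects surrogate byte sequences,
# the two on-disk names stay distinct after a surrogateescape round trip.
assert b"\xed\xb3\xbf".decode("utf-8", "surrogateescape") == "\udced\udcb3\udcbf"
assert b"\xff".decode("utf-8", "surrogateescape") == "\udcff"
assert "\udced\udcb3\udcbf".encode("utf-8", "surrogateescape") == b"\xed\xb3\xbf"
assert "\udcff".encode("utf-8", "surrogateescape") == b"\xff"
```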
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Tue Apr 28 22:37:07 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 13:37:07 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F76623.8060903@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> Message-ID: <49F768F3.8080304@g.nevcal.com> On approximately 4/28/2009 1:25 PM, came the following characters from the keyboard of Martin v. L?wis: >> The UTF-8b representation suffers from the same potential ambiguities as >> the PUA characters... > > Not at all the same ambiguities. Here, again, the two choices: > > A. use PUA characters to represent undecodable bytes, in particular for > UTF-8 (the PEP actually never proposed this to happen). > This introduces an ambiguity: two different files in the same > directory may decode to the same string name, if one has the PUA > character, and the other has a non-decodable byte that gets decoded > to the same PUA character. > > B. use UTF-8b, representing the byte will ill-formed surrogate codes. > The same ambiguity does *NOT* exist. If a file on disk already > contains an invalid surrogate code in its file name, then the UTF-8b > decoder will recognize this as invalid, and decode it byte-for-byte, > into three surrogate codes. Hence, the file names that are different > on disk are also different in memory. No ambiguity. C. 
File on disk with the invalid surrogate code, accessed via the str interface, no decoding happens, matches in memory the file on disk with the byte that translates to the same surrogate, accessed via the bytes interface. Ambiguity. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Tue Apr 28 23:01:14 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 23:01:14 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F7684D.4040904@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <49F7684D.4040904@g.nevcal.com> Message-ID: <49F76E9A.6090701@v.loewis.de> > Others have made this suggestion, and it is helpful to the PEP, but not > sufficient. As implemented as an error handler, I'm not sure that the > b'\xed\xb3\xbf' sequence would trigger the error handler, if the UTF-8 > decoder is happy with it. Which, in my testing, it is. Rest assured that the utf-8b codec will work the way it is specified. 
Regards, Martin From google at mrabarnett.plus.com Tue Apr 28 23:01:44 2009 From: google at mrabarnett.plus.com (MRAB) Date: Tue, 28 Apr 2009 22:01:44 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F76422.4010806@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F7510D.7070603@mrabarnett.plus.com> <49F76422.4010806@g.nevcal.com> Message-ID: <49F76EB8.4030900@mrabarnett.plus.com> Glenn Linderman wrote: > On approximately 4/28/2009 11:55 AM, came the following characters from > the keyboard of MRAB: >> I've been thinking of "python-escape" only in terms of UTF-8, the only >> encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are >> decodable. > > > UTF-8 is only mentioned in the sense of having special handling for > re-encoding; all the other locales/encodings are implicit. But I also > went down that path to some extent. > > >> But if you're talking about using it with other encodings, eg >> shift-jisx0213, then I'd suggest the following: >> >> 1. Bytes 0x00 to 0xFF which can't normally be decoded are decoded to >> half surrogates U+DC00 to U+DCFF. > > > This makes 256 different escape codes. > > Speaking personally, I won't call them 'escape codes'. I'd use the term 'escape code' to mean a character that changes the interpretation of the next character(s). >> 2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF >> are treated as though they are undecodable bytes. > > > This provides escaping for the 256 different escape codes, which is > lacking from the PEP. > > >> 3. 
Half surrogates U+DC00 to U+DCFF which can be produced by decoding >> are encoded to bytes 0x00 to 0xFF. > > > This reverses the escaping. > > >> 4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't >> be produced by decoding raise an exception. > > > This is confusing. Did you mean "excluding" instead of "including"? > Perhaps I should've said "Any codepoint which can't be produced by decoding should raise an exception". For example, decoding with UTF-8b will never produce U+DC00, therefore attempting to encode U+DC00 should raise an exception and not produce 0x00. > >> I think I've covered all the possibilities. :-) > > > You might have. Seems like there could be a simpler scheme, though... > > 1. Define an escape codepoint. It could be U+003F or U+DC00 or U+F817 > or pretty much any defined Unicode codepoint outside the range U+0100 to > U+01FF (see rule 3 for why). Only one escape codepoint is needed, this > is easier for humans to comprehend. > > 2. When the escape codepoint is decoded from the byte stream for a bytes > interface or found in a str on the str interface, double it. > > 3. When an undecodable byte 0xPQ is found, decode to the escape > codepoint, followed by codepoint U+01PQ, where P and Q are hex digits. > > 4. When encoding, a sequence of two escape codepoints would be encoded > as one escape codepoint, and a sequence of the escape codepoint followed > by codepoint U+01PQ would be encoded as byte 0xPQ. Escape codepoints > not followed by the escape codepoint, or by a codepoint in the range > U+0100 to U+01FF would raise an exception. > > 5. Provide functions that will perform the same decoding and encoding as > would be done by the system calls, for both bytes and str interfaces. > > > This differs from my previous proposal in three ways: > > A. Doesn't put a marker at the beginning of the string (which I said > wasn't necessary even then). > > B. 
Allows for a choice of escape codepoint, the previous proposal > suggested a specific one. But the final solution will only have a > single one, not a user choice, but an implementation choice. > > C. Uses the range U+0100 to U+01FF for the escape codes, rather than > U+0000 to U+00FF. This avoids introducing the NULL character and escape > characters into the decoded str representation, yet still uses > characters for which glyphs are commonly available, are non-combining, > and are easily distinguishable one from another. > > Rationale: > > The use of codepoints with visible glyphs makes the escaped string > friendlier to display systems, and to people. I still recommend using > U+003F as the escape codepoint, but certainly one with a typically > visible glyph available. This avoids what I consider to be an annoyance > with the PEP, that the codepoints used are not ones that are easily > displayed, so undecodable names could easily result in long strings of > indistinguishable substitution characters. > Perhaps the escape character should be U+005C. ;-) > It, like MRAB's proposal, also avoids data puns, which is a major > problem with the PEP. I consider this proposal to be easier to > understand than MRAB's proposal, or the PEP, because of the single > escape codepoint and the use of visible characters. > > This proposal, like my initial one, also decodes and encodes (just the > escape codes) values on the str interfaces. This is necessary to avoid > data puns on systems that provide both types of interfaces. > > This proposal could be used for programs that use str values, and easily > migrates to a solution that provides an object that provides an > abstraction for system interfaces that have two forms.
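[Editorial note: for concreteness, Glenn's numbered rules can be sketched as a pair of pure-Python helpers layered on top of a funny-encoded (surrogateescape-style) string. The function names, and the choice of '?' as the escape codepoint, are illustrative only; nothing like this was adopted:]

```python
ESC = '\u003f'  # '?', the suggested escape codepoint (an illustrative choice)

def to_visible(s: str) -> str:
    """Escape a funny-encoded string into a visibly displayable form."""
    out = []
    for ch in s:
        if ch == ESC:
            out.append(ESC + ESC)              # rule 2: double the escape itself
        elif 0xDC00 <= ord(ch) <= 0xDCFF:      # a half surrogate carrying byte 0xPQ
            out.append(ESC + chr(0x0100 + (ord(ch) & 0xFF)))  # rule 3: ESC + U+01PQ
        else:
            out.append(ch)
    return ''.join(out)

def from_visible(s: str) -> str:
    """Reverse the escaping; malformed escape sequences raise (rule 4)."""
    out, i = [], 0
    while i < len(s):
        ch = s[i]
        if ch != ESC:
            out.append(ch)
            i += 1
            continue
        nxt = s[i + 1]                         # IndexError on a dangling escape
        if nxt == ESC:
            out.append(ESC)
        elif 0x0100 <= ord(nxt) <= 0x01FF:
            out.append(chr(0xDC00 + (ord(nxt) - 0x0100)))
        else:
            raise ValueError('ill-formed escape sequence')
        i += 2
    return ''.join(out)
```

The round trip `from_visible(to_visible(s)) == s` holds for any funny-encoded `s`, which is the property the proposal is after. What it gives up, as Cameron argues later in the thread, is that the transformation is no longer a no-op for well-formed strings: any string containing the escape codepoint itself is rewritten.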
> From martin at v.loewis.de Tue Apr 28 23:02:59 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Apr 2009 23:02:59 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F768F3.8080304@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> Message-ID: <49F76F03.8040702@v.loewis.de> Glenn Linderman wrote: > On approximately 4/28/2009 1:25 PM, came the following characters from > the keyboard of Martin v. L?wis: >>> The UTF-8b representation suffers from the same potential ambiguities as >>> the PUA characters... >> >> Not at all the same ambiguities. Here, again, the two choices: >> >> A. use PUA characters to represent undecodable bytes, in particular for >> UTF-8 (the PEP actually never proposed this to happen). >> This introduces an ambiguity: two different files in the same >> directory may decode to the same string name, if one has the PUA >> character, and the other has a non-decodable byte that gets decoded >> to the same PUA character. >> >> B. use UTF-8b, representing the byte will ill-formed surrogate codes. >> The same ambiguity does *NOT* exist. If a file on disk already >> contains an invalid surrogate code in its file name, then the UTF-8b >> decoder will recognize this as invalid, and decode it byte-for-byte, >> into three surrogate codes. Hence, the file names that are different >> on disk are also different in memory. No ambiguity. > > C. 
File on disk with the invalid surrogate code, accessed via the str > interface, no decoding happens, matches in memory the file on disk with > the byte that translates to the same surrogate, accessed via the bytes > interface. Ambiguity. Is that an alternative to A and B? Regards, Martin From tmbdev at gmail.com Wed Apr 29 00:30:42 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Wed, 29 Apr 2009 00:30:42 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F7613C.9000901@v.loewis.de> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> Message-ID: <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> > > On Windows, the Wide APIs are already used throughout the code base, > e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the > specific API for a specific functionality, please read the source code. > [...] > No, I don't assume that. I assume that all functions are strictly > available in a Wide character version, and have verified that they are. The wide APIs use UTF-16. UTF-16 suffers from the same problem as UTF-8: not all sequences of words are valid UTF-16 sequences. In particular, sequences containing isolated surrogate pairs are not well-formed according to the Unicode standard. Therefore, the existence of a wide character API function does not guarantee that the wide character strings it returns can be converted into valid unicode strings. And, in fact, Windows Vista happily creates files with malformed UTF-16 encodings, and os.listdir() happily returns them. 
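[Editorial note: the ingredient of Tom's objection — a Python str that legally carries ill-formed Unicode — can be seen portably without Vista. Lone surrogates are representable in a str; the strict codecs refuse them, and only permissive handlers such as 'surrogatepass' (present since Python 3.1 for UTF-8) let them round-trip. A small illustration:]

```python
lone = '\ud800'   # a lone high surrogate: a legal Python str, but ill-formed Unicode

# Strict codecs reject ill-formed data...
try:
    lone.encode('utf-8')
except UnicodeEncodeError:
    pass

# ...while surrogatepass round-trips it, analogous to what a wide OS API
# returning malformed UTF-16 would hand a program.
raw = lone.encode('utf-8', 'surrogatepass')
assert raw == b'\xed\xa0\x80'
assert raw.decode('utf-8', 'surrogatepass') == lone
```

So a library that re-encodes incoming strings with a strict codec will raise on such data rather than crash; whether that failure mode is acceptable is exactly the point Antoine disputes below.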
> If you can crash Python that way, > nothing gets worse by this PEP - you can then *already* crash Python > in that way. Yes, but AFAIK, Python does not currently have functions that, as part of correct usage and normal operation, are intended to generate malformed unicode strings. Under your proposal, passing the output from a correctly implemented file system or other OS function to a correctly written library using unicode strings may crash Python. In order to avoid that, every library that's built into Python would have to be checked and updated to deal with both the Unicode standard and your extension to it. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Apr 29 00:46:17 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Apr 2009 22:46:17 +0000 (UTC) Subject: [Python-Dev] PEP 383 (again) References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> Message-ID: Thomas Breuel gmail.com> writes: > > And, in fact, Windows Vista happily creates files with malformed UTF-16 encodings, and os.listdir() happily returns them. The PEP won't change that, so what's the problem exactly? > Under your proposal, passing the output from a correctly implemented file system or other OS function to a correctly written library using unicode strings may crash Python. That's a very dishonest formulation. 
It cannot crash Python; it can only crash hypothetical third-party programs or libraries with deficient error checking and unreasonable assumptions about input data. (and, of course, you haven't even proven those programs or libraries exist) Antoine. From v+python at g.nevcal.com Wed Apr 29 00:52:22 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 15:52:22 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F76F03.8040702@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> Message-ID: <49F788A6.3040702@g.nevcal.com> On approximately 4/28/2009 2:02 PM, came the following characters from the keyboard of Martin v. L?wis: > Glenn Linderman wrote: >> On approximately 4/28/2009 1:25 PM, came the following characters from >> the keyboard of Martin v. L?wis: >>>> The UTF-8b representation suffers from the same potential ambiguities as >>>> the PUA characters... >>> Not at all the same ambiguities. Here, again, the two choices: >>> >>> A. use PUA characters to represent undecodable bytes, in particular for >>> UTF-8 (the PEP actually never proposed this to happen). >>> This introduces an ambiguity: two different files in the same >>> directory may decode to the same string name, if one has the PUA >>> character, and the other has a non-decodable byte that gets decoded >>> to the same PUA character. >>> >>> B. use UTF-8b, representing the byte will ill-formed surrogate codes. >>> The same ambiguity does *NOT* exist. 
If a file on disk already >>> contains an invalid surrogate code in its file name, then the UTF-8b >>> decoder will recognize this as invalid, and decode it byte-for-byte, >>> into three surrogate codes. Hence, the file names that are different >>> on disk are also different in memory. No ambiguity. >> C. File on disk with the invalid surrogate code, accessed via the str >> interface, no decoding happens, matches in memory the file on disk with >> the byte that translates to the same surrogate, accessed via the bytes >> interface. Ambiguity. > > Is that an alternative to A and B? I guess it is an adjunct to case B, the current PEP. It is what happens when using the PEP on a system that provides both bytes and str interfaces, and both get used. On a Windows system, perhaps the ambiguous case would be the use of the str API and bytes APIs producing different memory names for the same file that contains a (Unicode-illegal) half surrogate. The half-surrogate would seem to get decoded to 3 half surrogates if accessed via the bytes interface, but only one via the str interface. The version with 3 half surrogates could match another name that actually contains 3 half surrogates, that is accessed via the str interface. I can't actually tell by reading the PEP whether it affects Windows bytes interfaces or is only implemented on POSIX, so that POSIX has a str interface. If it is only implemented on POSIX, then the current scheme (now escaping the hundreds of escape codes) could work, within a single platform... but it would still suffer from displaying garbage (sequences of replacement characters) in file listings displayed or printed. There is no way, once the string is adjusted to contain replacement characters for display, to distinguish one file name from another, if they are identical except for a same-length sequence of different undecodable bytes. 
The concept of a function that allows the same decoding and encoding process for 3rd party interfaces is still missing from the PEP; implementation of the PEP would require that all interfaces to 3rd party software that accept file names would have to be transcoded by the interface layer. Or else such software would have to use the bytes interfaces directly, and if they do, there is no need for the PEP. So I see the PEP as a partial solution to a limited problem, that on the one hand potentially produces indistinguishable sequences of replacement characters in filenames, rather than the mojibake (which is at least distinguishable), and on the other hand, doesn't help software that also uses 3rd party libraries to avoid the use of bytes APIs for accessing file names. There are other encodings that produce more distinguishable mojibake, and would work in the same situations as the PEP. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From cs at zip.com.au Wed Apr 29 01:06:55 2009 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 29 Apr 2009 09:06:55 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6A7C0.6090105@g.nevcal.com> Message-ID: <20090428230655.GA23830@cskk.homeip.net> I think I may be able to resolve Glenn's issues with the scheme lower down (through careful use of definitions and hand waving). On 27Apr2009 23:52, Glenn Linderman wrote: > On approximately 4/27/2009 7:11 PM, came the following characters from > the keyboard of Cameron Simpson: [...] >> There may be puns. So what? Use the right strings for the right purpose >> and all will be well. >> >> I think what is missing here, and missing from Martin's PEP, is some >> utility functions for the os.* namespace. 
>> >> PROPOSAL: add to the PEP the following functions: >> >> os.fsdecode(bytes) -> funny-encoded Unicode >> This is what os.listdir() does to produce the strings it hands out. >> os.fsencode(funny-string) -> bytes >> This is what open(filename,..) does to turn the filename into bytes >> for the POSIX open. >> os.pathencode(your-string) -> funny-encoded-Unicode >> This is what you must do to a de novo string to turn it into a >> string suitable for use by open. >> Importantly, for most strings not hand crafted to have weird >> sequences in them, it is a no-op. But it will recode your puns >> for survival. [...] >>> So assume a non-decodable sequence in a name. That puts us into >>> Martin's funny-decode scheme. His funny-decode scheme produces a >>> bare string, indistinguishable from a bare string that would be >>> produced by a str API that happens to contain that same sequence. >>> Data puns. >>> >> >> See my proposal above. Does it address your concerns? A program still >> must know the provenance of the string, and _if_ you're working with >> non-decodable sequences in names then you should transmute them into >> the funny encoding using the os.pathencode() function described above. >> >> In this way the punning issue can be avoided. >> _Lacking_ such a function, your punning concern is valid. > > Seems like one would also desire os.pathdecode to do the reverse. Yes. > And > also versions that take or produce bytes from funny-encoded strings. Isn't that the first two functions above? > Then, if programs were re-coded to perform these transformations on what > you call de novo strings, then the scheme would work. > But I think a large part of the incentive for the PEP is to try to > invent a scheme that intentionally allows for the puns, so that programs > do not need to be recoded in this manner, and yet still work. I don't > think such a scheme exists. I agree no such scheme exists. I don't think it can, just using strings.
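[Editorial note: Cameron's proposed helpers map directly onto the PEP's error handler. A sketch, close in spirit to the os.fsencode()/os.fsdecode() pair that later landed in Python 3.2; the pathencode variant remained hypothetical, and under the surrogate scheme it is literally the identity:]

```python
import sys

def fsdecode(raw, encoding=None):
    """bytes -> funny-encoded str, as os.listdir() would hand out."""
    return raw.decode(encoding or sys.getfilesystemencoding(), 'surrogateescape')

def fsencode(name, encoding=None):
    """funny-encoded str -> bytes, as open() would pass to the POSIX open."""
    return name.encode(encoding or sys.getfilesystemencoding(), 'surrogateescape')

def pathencode(name):
    """de novo str -> funny-encoded str.  Under the surrogate scheme this
    is a no-op: well-formed strings are already compatible as-is."""
    return name
```

For example, `fsdecode(b'caf\xe9', 'utf-8')` yields `'caf\udce9'`, and `fsencode` reverses it byte-exactly; that losslessness for arbitrary byte names is the whole point of the proposal.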
But _unless_ you have made a de novo handcrafted string with ill-formed sequences in it, you don't need to bother because you won't _have_ puns. If Martin's using half surrogates to encode "undecodable" bytes, then no normal string should conflict because a normal string will contain _only_ Unicode scalar values. Half surrogate code points are not such. The advantage here is that unless you've deliberately constructed an ill-formed unicode string, you _do_not_ need to recode into funny-encoding, because you are already compatible. Somewhat like one doesn't need to recode ASCII into UTF-8, because ASCII is unchanged. > If there is going to be a required transformation from de novo strings > to funny-encoded strings, then why not make one that people can actually > see and compare and decode from the displayable form, by using > displayable characters instead of lone surrogates? Because that would _not_ be a no-op for well formed Unicode strings. That reason is sufficient for me. I consider the fact that well-formed Unicode -> funny-encoded is a no-op to be an enormous feature of Martin's scheme. Unless I'm missing something, there _are_no_puns_ between funny-encoded strings and well formed unicode strings. >>>> I suppose if your program carefully constructs a unicode string riddled >>>> with half-surrogates etc and imagines something specific should happen >>>> to them on the way to being POSIX bytes then you might have a problem... >>>> >>> Right. Or someone else's program does that. I've just spent a cosy 20 minutes with my copy of Unicode 5.0 and a coffee, reading section 3.9 (Unicode Encoding Forms). I now do not believe your scenario makes sense. Someone can construct a Python3 string containing code points that includes surrogates. Granted. However such a string is not meaningful because it is not well-formed (D85). It's ill-formed (D84). 
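[Editorial note: the well-formed / ill-formed distinction being drawn on here (D84/D85) is mechanically checkable, which is what makes the no-pun argument work; a minimal sketch:]

```python
def is_well_formed(s: str) -> bool:
    """True iff s contains only Unicode scalar values (no lone surrogates)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)

# A string that came through the funny-decoder advertises itself...
assert not is_well_formed(b'a\xffb'.decode('utf-8', 'surrogateescape'))
# ...while ordinary well-formed text is untouched by the scheme.
assert is_well_formed('ordinary caf\u00e9')
```

Any string that fails this check is ill-formed and therefore cannot collide with well-formed text, which is the no-pun property argued for above.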
It is not sane to expect it to translate into a POSIX byte sequence, be it UTF-8 or anything else, unless it is accompanied by some kind of explicit mapping provided by the programmer. Absent that mapping, it's nonsense in much the same way that a non-decodable UTF-8 byte sequence is nonsense. For example, Martin's funny-encoding is such an explicit mapping. >>>I only want to use >>> Unicode file names. But if those other file names exist, I want to >>> be able to access them, and not accidentally get a different file. But those other names _don't_ exist. >>>> Also, by avoiding reuse of legitimate characters in the encoding we can >>>> avoid your issue with losing track of where a string came from; >>>> legitimate characters are currently untouched by Martin's scheme, except >>>> for the normal "bytes<->string via the user's locale" translation that >>>> must already happen, and there you're aided by bytes and strings being >>>> different types. >>>> >>> There are abnormal characters, but there are no illegal characters. >> >> I thought half-surrogates were illegal in well formed Unicode. I confess >> to being weak in this area. By "legitimate" above I meant things like >> half-surrogates which, like quarks, should not occur alone? > > "Illegal" just means violating the accepted rules. I think that either we've lost track of what each other is saying, or you're wrong here. And my poor terminology hasn't been helping. What we've got: (1) Byte sequence file names in the POSIX file system. It doesn't matter whether the underlying storage is a real POSIX filesystem or mostly POSIX one like MacOSX HFS or a remotely attached non-POSIX filesystem like a Windows one, because we're talking through the POSIX API, and it is handing us byte sequences, which we expect may contain anything except a NUL. (2) Under Martin's scheme, os.listdir() et al hand us (and accept) funny-encoded Python3 strings, which are strings of Unicode code units (D77).
Particularly, if there were bytes in the POSIX byte string that did not decode into Unicode scalar values (D76) then each such byte is encoded as a surrogate (D71,72,73,74). It is important to note here that because surrogates are _not_ Unicode scalar values, there is no punning between the two sets of values. (3) Other Python3 strings that have not been through Martin's mangler in either direction. Ordinary strings. Your concern is that, handed a string, a programmer could misuse (3) as (2) or vice versa because of punning. In a well-formed Unicode string there are no surrogates; surrogates only occur in UTF-16 _encodings_ of Unicode strings (D75). Therefore, it _is_ possible to inspect a string, if one cared, to see if it is funny-encoded or "raw". One may get two different answers: - If there are surrogate code units then it must be funny-encoded and will therefore work perfectly if handed to an os.* interface. - If there are no surrogate code units then it may be funny-encoded or it may not have been through Martin's funny-encoder, you can't tell. However, this doesn't matter because the encoder is a no-op for such strings. Therefore it will work perfectly if handed to an os.* interface. The only gap in this is a specially crafted string containing surrogate code points that did not come via Martin's encoder. But such a string cannot come from a user interface, which will accept only characters, and these only include Unicode scalar values. Such a string can only be explicitly constructed (e.g. with a \uD802 code point). And if something constructs such a string, it must have in mind an explicit interpretation of those code points, which means it is the _constructor_ on whom the burden of translation lies. Does this make sense to you, or have you a counter example in mind? > In this case, the > accepted rules are those enforced by the file system (at the bytes or > str API levels), and by Python (for the str manipulations). None of > those rules outlaw lone surrogates.
Hence, while all of the systems > under discussion can handle all Unicode characters in one way or > another, none of them require that all Unicode rules are followed. Yes, > you are correct that lone surrogates are illegal in Unicode. No, none > of the accepted rules for these systems require Unicode. However, Martin's scheme explicitly translates these ill-formed sequences into Python3 strings and back, losslessly. You can have surrogates in the filesystem storage/API on Windows. You can have non-UTF-8-decodable sequences in the POSIX filesystem layer too. They're all taken in and handled. In Python3 space, one might have a bytes object with a raw POSIX byte filename in it. Presumably one can also have a byte string with a raw (UTF-16) WIndows filename in it. They're not strings, so no confusion. But there's no _string_ for these things without a matching string<->bytestring mapping associated with it. If you have a Python3 string which is well-formed Unicode, then you can hand it to the os.* interfaces and the Right Thing will happen (on Windows just because it stored Unicode and on POSIX provided you agree that your locale/getfilesystemencoding() is the right thing). If you have a string that isn't well-formed, then the meaning of any code points which are not Unicode scalar values is not well defined without some auxiliary stuff in the app. >>> NTFS permits any 16-bit "character" code, including abnormal ones, >>> including half-surrogates, and including full surrogate sequences >>> that decode to PUA characters. POSIX permits all byte sequences, >>> including things that look like UTF-8, things that don't look like >>> UTF-8, things that look like half-surrogates, and things that look >>> like full surrogate sequences that decode to PUA characters. See above. I think this is addressed. [...] >> These are existing file objects, I'll take them as source 1. They get >> encoded for release by os.listdir() et al. 
>> >>> And yes, strings can be generated from scratch. >> >> I take this to be source 2. > > One variation of source 2 is reading output from other programs, such as > ls (POSIX) or dir (Windows). Sure. But that is reading byte sequences, and one must again know the encoding. If that is known and the input decoded happily into Unicode scalar values, then there is no issue. If the input didn't decode, then one must make some decision about what the non-decodable bits mean. >> I think I agree with all the discussion that followed, and think the >> real problem is lack of utility functions to funny-encode source 2 >> strings for use; hence the proposal above. > > I think we understand each other now. I think your proposal could work, > Cameron, although when recoding applications to use your proposal, I'd > find it easier to use the "file name object" that others have proposed. > I think that because either your proposal or the object proposals > require recoding the application, that they will not be accepted. I > think that because the PEP 383 allows data puns, that it should not be > accepted in its present form. I'm of the opinion now that the puns can only occur when the source 2 string has surrogates, and either those surrogates are chosen to match the funny-encoding, in which case the pun is not a pun, or the surrogates are chosen according to a different scheme in which case source 2 is obliged to provide a mapping. A source 2 string of only Unicode scalar values doesn't need remapping. > I think if your proposal is accepted, that it then becomes possible > to use an encoding that uses visible characters, which makes it easier > for people to understand and verify. An encoding such as the one I > suggested, but perhaps using a more obscure character, if there is one, > but yet doesn't violate true Unicode. I think any scheme that uses any Unicode scalar value as an escape character _inherently_ introduces puns, and puns that are easier to encounter.
I think the real strength of Martin's scheme is exactly that byte strings that needed the funny-encoding _do_ produce ill-formed Unicode strings, because such strings _cannot_ conflict with well-formed strings. I think your desire for a human readable encoding is valid, but it should be a further purely "presentation" step, somewhat like quoted-printable encoding in MIME, and not the scheme used by Martin. > I think it should transform all > data, from str and bytes interfaces, and produce only str values > containing conforming Unicode, escaping all the non-conforming sequences > in some manner. This would make the strings truly readable, as long as > fonts for all the characters are available. But I think it would just move the punning. A human readable string with readable escapes in it may be funny-encoded. _Or_ it may be "raw", with funny-encoding yet to happen; after all, one might weirdly be dealing with a filename which contained post-funny-encode visible sequences in it. So you're right back to _guessing_ what you're looking at. With the surrogate scheme you only have to guess if there are surrogates, but then you _know_ that you're dealing with a special encoding scheme; it is certain - the guess is about which scheme. If you're working in a domain with no ill-formed strings you never need to worry at all. With a visible/printable-encoding such as you advocate the guess is about whether the scheme has even been used, which is why I think it is worse. > And I had already suggested > the utility functions you are suggesting, actually, in my first tirade > against PEP 383 (search for "The encode and decode functions should be > available for coders to use, that code to external > interfaces, either OS or 3rd party packages, that do not use this > encoding scheme"). I must have missed that sentence. But it sounds like we want the same facilities at least.
The solution that was proposed in the lead up to releasing Python 3.0 > was to offer both bytes and str interfaces (so we have those), and then > for those that want to have a single portable implementation that can > access all data, an object that encapsulates the differences, and the > variant system APIs. (file system is one, command line is another, > environment is another, I'm not sure if there are more.) I haven't > heard if any progress on such an encapsulating object has been made; the > people that proposed such have been rather quiet about this PEP. I > would expect that an object implementation would provide display > strings, and APIs to submit de novo str and bytes values to an object, > which would run the appropriate encoding on them. I think covering these other cases is quite messy, if only because there's not even agreement amongst existing command line apps about all that stuff. Regarding "APIs to submit de novo str and bytes values to an object, which would run the appropriate encoding on them" I think such a facility for de novo strings must require the caller to provide a handler/mapper for the not-well-formed parts of such strings if they occur. > Programs that want to use str interfaces on POSIX will see a subset of > files on systems that contain files whose bytes filenames are not > decodable. Not under Martin's scheme, because all bytes filenames _are_ decoded.
The issue then will > be what technique will be used to transform bytes into display names, > but since the display names would never be fed back to the objects > directly (but the object would have an interface to accept de novo str > and de novo bytes) then it is just a display issue, and one that uses > visible characters would seem more useful in my mind, than one that uses > half-surrogates or PUAs. I agree it might be handy to have a display function, but isn't repr() exactly that, now I think of it? Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ "waste cycles drawing trendy 3D junk" - Mac Eudora v3 config option From v+python at g.nevcal.com Wed Apr 29 01:02:13 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 16:02:13 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F76EB8.4030900@mrabarnett.plus.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F7510D.7070603@mrabarnett.plus.com> <49F76422.4010806@g.nevcal.com> <49F76EB8.4030900@mrabarnett.plus.com> Message-ID: <49F78AF5.3080406@g.nevcal.com> On approximately 4/28/2009 2:01 PM, came the following characters from the keyboard of MRAB: > Glenn Linderman wrote: >> On approximately 4/28/2009 11:55 AM, came the following characters >> from the keyboard of MRAB: >>> I've been thinking of "python-escape" only in terms of UTF-8, the only >>> encoding mentioned in the PEP. In UTF-8, bytes 0x00 to 0x7F are >>> decodable. >> >> >> UTF-8 is only mentioned in the sense of having special handling for >> re-encoding; all the other locales/encodings are implicit. 
But I also >> went down that path to some extent. >> >> >>> But if you're talking about using it with other encodings, eg >>> shift-jisx0213, then I'd suggest the following: >>> >>> 1. Bytes 0x00 to 0xFF which can't normally be decoded are decoded to >>> half surrogates U+DC00 to U+DCFF. >> >> >> This makes 256 different escape codes. >> >> > Speaking personally, I won't call them 'escape codes'. I'd use the term > 'escape code' to mean a character that changes the interpretation of the > next character(s). OK, I won't be offended if you don't call them 'escape codes'. :) But what else to call them? My use of that term is a bit backwards, perhaps... what happens is that because these 256 half surrogates are used to decode otherwise undecodable bytes, they themselves must be "escaped" or translated into something different, when they appear in the byte sequence. The process described reserves a set of codepoints for use, and requires that that same set of codepoints be translated using a similar mechanism to avoid their untranslated appearance in the resulting str. Escape codes have the same sort of characteristic... by replacing their normal use for some other use, they must themselves have a replacement. Anyway, I think we are communicating successfully. >>> 2. Bytes which would have decoded to half surrogates U+DC00 to U+DCFF >>> are treated as though they are undecodable bytes. >> >> >> This provides escaping for the 256 different escape codes, which is >> lacking from the PEP. >> >> >>> 3. Half surrogates U+DC00 to U+DCFF which can be produced by decoding >>> are encoded to bytes 0x00 to 0xFF. >> >> >> This reverses the escaping. >> >> >>> 4. Codepoints, including half surrogates U+DC00 to U+DCFF, which can't >>> be produced by decoding raise an exception. >> >> >> This is confusing. Did you mean "excluding" instead of "including"? >> > Perhaps I should've said "Any codepoint which can't be produced by > decoding should raise an exception". 
Yes, your rephrasing is clearer, regarding your intention. > For example, decoding with UTF-8b will never produce U+DC00, therefore > attempting to encode U+DC00 should raise an exception and not produce > 0x00. Decoding with UTF-8b might never produce U+DC00, but then again, it won't handle the random byte string, either. >>> I think I've covered all the possibilities. :-) >> >> >> You might have. Seems like there could be a simpler scheme, though... >> >> 1. Define an escape codepoint. It could be U+003F or U+DC00 or U+F817 >> or pretty much any defined Unicode codepoint outside the range U+0100 >> to U+01FF (see rule 3 for why). Only one escape codepoint is needed, >> this is easier for humans to comprehend. >> >> 2. When the escape codepoint is decoded from the byte stream for a >> bytes interface or found in a str on the str interface, double it. >> >> 3. When an undecodable byte 0xPQ is found, decode to the escape >> codepoint, followed by codepoint U+01PQ, where P and Q are hex digits. >> >> 4. When encoding, a sequence of two escape codepoints would be encoded >> as one escape codepoint, and a sequence of the escape codepoint >> followed by codepoint U+01PQ would be encoded as byte 0xPQ. Escape >> codepoints not followed by the escape codepoint, or by a codepoint in >> the range U+0100 to U+01FF would raise an exception. >> >> 5. Provide functions that will perform the same decoding and encoding >> as would be done by the system calls, for both bytes and str interfaces. >> >> >> This differs from my previous proposal in three ways: >> >> A. Doesn't put a marker at the beginning of the string (which I said >> wasn't necessary even then). >> >> B. Allows for a choice of escape codepoint, the previous proposal >> suggested a specific one. But the final solution will only have a >> single one, not a user choice, but an implementation choice. >> >> C. Uses the range U+0100 to U+01FF for the escape codes, rather than >> U+0000 to U+00FF. 
This avoids introducing the NULL character and >> escape characters into the decoded str representation, yet still uses >> characters for which glyphs are commonly available, are non-combining, >> and are easily distinguishable one from another. >> >> Rationale: >> >> The use of codepoints with visible glyphs makes the escaped string >> friendlier to display systems, and to people. I still recommend using >> U+003F as the escape codepoint, but certainly one with a typically >> visible glyph available. This avoids what I consider to be an >> annoyance with the PEP, that the codepoints used are not ones that are >> easily displayed, so undecodable names could easily result in long >> strings of indistinguishable substitution characters. >> > Perhaps the escape character should be U+005C. ;-) Windows users everywhere would love you for that one :) >> It, like MRAB's proposal, also avoids data puns, which is a major >> problem with the PEP. I consider this proposal to be easier to >> understand than MRAB's proposal, or the PEP, because of the single >> escape codepoint and the use of visible characters. >> >> This proposal, like my initial one, also decodes and encodes (just the >> escape codes) values on the str interfaces. This is necessary to >> avoid data puns on systems that provide both types of interfaces. >> >> This proposal could be used for programs that use str values, and >> easily migrates to a solution that provides an object that provides an >> abstraction for system interfaces that have two forms. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From a.badger at gmail.com Wed Apr 29 04:09:42 2009 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 28 Apr 2009 19:09:42 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001> <49F6FA93.7080302@avl.com> Message-ID: <49F7B6E6.20808@gmail.com> Zooko O'Whielacronx wrote: > On Apr 28, 2009, at 6:46 AM, Hrvoje Niksic wrote: >> If you switch to iso8859-15 only in the presence of undecodable UTF-8, >> then you have the same round-trip problem as the PEP: both b'\xff' and >> b'\xc3\xbf' will be converted to u'\u00ff' without a way to >> unambiguously recover the original file name. > > Why do you say that? It seems to work as I expected here: > >>>> '\xff'.decode('iso-8859-15') > u'\xff' >>>> '\xc3\xbf'.decode('iso-8859-15') > u'\xc3\xbf' >>>> >>>> >>>> >>>> '\xff'.decode('cp1252') > u'\xff' >>>> '\xc3\xbf'.decode('cp1252') > u'\xc3\xbf' > You're not showing that this is a fallback path. What won't work is first trying a local encoding (in the following example, utf-8) and then if that doesn't work, trying a one-byte encoding like iso8859-15: try: file1 = '\xff'.decode('utf-8') except UnicodeDecodeError: file1 = '\xff'.decode('iso-8859-15') print repr(file1) try: file2 = '\xc3\xbf'.decode('utf-8') except UnicodeDecodeError: file2 = '\xc3\xbf'.decode('iso-8859-15') print repr(file2) That prints: u'\xff' u'\xff' The two encodings can map different bytes to the same unicode code point so you can't do this type of thing without recording what encoding was used in the translation. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From cs at zip.com.au Wed Apr 29 04:33:53 2009 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 29 Apr 2009 12:33:53 +1000 Subject: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: Message-ID: <20090429023353.GA11210@cskk.homeip.net> On 28Apr2009 11:49, Antoine Pitrou wrote: | Paul Moore gmail.com> writes: | > | > I've yet to hear anyone claim that they would have an actual problem | > with a specific piece of code they have written. | | Yep, that's the problem. Lots of theoretical problems no one has ever encountered | brought up against a PEP which resolves some actual problems people encounter on | a regular basis. | | For the record, I'm +1 on the PEP being accepted and implemented as soon as | possible (preferably before 3.1). I am also +1 on this. I would like utility functions to perform: os-bytes->funny-encoded funny-encoded->os-bytes or explicit example code snippets for same in the PEP text. -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ This person is currently undergoing electric shock therapy at Agnews Developmental Center in San Jose, California. All his opinions are static, please ignore him. Thank you, Nurse Ratched - the sig quote of Bob "Another beer, please" Christ From rdmurray at bitdance.com Wed Apr 29 04:40:04 2009 From: rdmurray at bitdance.com (R.
David Murray) Date: Tue, 28 Apr 2009 22:40:04 -0400 (EDT) Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F768F3.8080304@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> Message-ID: On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote: > C. File on disk with the invalid surrogate code, accessed via the str > interface, no decoding happens, matches in memory the file on disk with the > byte that translates to the same surrogate, accessed via the bytes interface. > Ambiguity. Unless I'm missing something, one of these is type str, and the other is type bytes, so no ambiguity. --David From cs at zip.com.au Wed Apr 29 04:40:26 2009 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 29 Apr 2009 12:40:26 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <7e51d15d0904280537n22168cfl16c58f727be1755e@mail.gmail.com> Message-ID: <20090429024026.GA15177@cskk.homeip.net> On 28Apr2009 14:37, Thomas Breuel wrote: | But the biggest problem with the proposal is that it isn't needed: if you | want to be able to turn arbitrary byte sequences into unicode strings and | back, just set your encoding to iso8859-15. That already works and it | doesn't require any changes. No it doesn't. It does transcode without throwing exceptions. On POSIX. (On Windows? I doubt it - windows isn't using an 8-bit scheme. I believe.) But it utterly destroys any hope of working in any other locale nicely. The PEP lets you work losslessly in other locales.
It _may_ require some app care for particular very weird strings that don't come from the filesystem, but as far as I can see only in circumstances where such care would be needed anyway, i.e. you've got to do special stuff for weirdness in the first place. Weird == "ill-formed unicode string" here. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ I just kept it wide-open thinking it would correct itself. Then I ran out of talent. - C. Fittipaldi From a.badger at gmail.com Wed Apr 29 04:39:20 2009 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 28 Apr 2009 19:39:20 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F73635.6010105@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> Message-ID: <49F7BDD8.3010202@gmail.com> Martin v. Löwis wrote: >> Since the serialization of the Unicode string is likely to use UTF-8, >> and the string for such a file will include half surrogates, the >> application may raise an exception when encoding the names for a >> configuration file. These encoding exceptions will be as rare as the >> unusual names (which the careful I18N aware developer has probably >> eradicated from his system), and thus will appear late. > > There are trade-offs to any solution; if there was a solution without > trade-offs, it would be implemented already. > > The Python UTF-8 codec will happily encode half-surrogates; people argue > that it is a bug that it does so, however, it would help in this > specific case. Can we use this encoding scheme for writing into files as well?
We've turned the filename with undecodable bytes into a string with half surrogates. Putting that string into a file has to turn them into bytes at some level. Can we use the python-escape error handler to achieve that somehow? -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From larry at hastings.org Wed Apr 29 05:03:24 2009 From: larry at hastings.org (Larry Hastings) Date: Tue, 28 Apr 2009 20:03:24 -0700 Subject: [Python-Dev] Proposed: a new function-based C API for declaring Python types Message-ID: <49F7C37C.5090305@hastings.org> EXECUTIVE SUMMARY I've written a patch against py3k trunk creating a new function-based API for creating extension types in C. This allows PyTypeObject to become a (mostly) private structure. THE PROBLEM Here's how you create an extension type using the current API. * First, find some code that already has a working type declaration. Copy and paste their fifty-line PyTypeObject declaration, then hack it up until it looks like what you need. * Next--hey! There *is* no next, you're done. You can immediately create an object using your type and pass it into the Python interpreter and it would work fine. You are encouraged to call PyType_Ready(), but this isn't required and it's often skipped. This approach causes two problems. 1) The Python interpreter *must support* and *cannot change* the PyTypeObject structure, forever. Any meaningful change to the structure will break every extension. This has many consequences: a) Fields that are no longer used must be left in place, forever, as ignored placeholders if need be. Py3k cleaned up a lot of these, but it's already picked up a new one ("tp_compare" is now "tp_reserved"). b) Internal implementation details of the type system must be public. 
c) The interpreter can't even use a different structure internally, because extensions are free to pass in objects using PyTypeObjects the interpreter has never seen before. 2) As a programming interface this lacks a certain gentility. It clearly *works*, but it requires programmers to copy and paste with a large structure mostly containing NULLs, which they must pick carefully through to change just a few fields. THE SOLUTION My patch creates a new function-based extension type definition API. You create a type by calling PyType_New(), then call various accessor functions on the type (PyType_SetString and the like), and when your type has been completely populated you must call PyType_Activate() to enable it for use. With this API available, extension authors no longer need to directly see the innards of the PyTypeObject structure. Well, most of the fields anyway. There are a few shortcut macros in CPython that need to continue working for performance reasons, so the "tp_flags" and "tp_dealloc" fields need to remain publically visible. One feature worth mentioning is that the API is type-safe. Many such APIs would have had one generic "PyType_SetPointer", taking an identifier for the field and a void * for its value, but this would have lost type safety. Another approach would have been to have one accessor per field ("PyType_SetAddFunction"), but this would have exploded the number of functions in the API. My API splits the difference: each distinct *type* has its own set of accessors ("PyType_GetSSizeT") which takes an identifier specifying which field you wish to get or set. SIDE-EFFECTS OF THE API The major change resulting from this API: all PyTypeObjects must now be *pointers* rather than static instances. For example, the external declaration of PyType_Type itself changes from this: PyAPI_DATA(PyTypeObject) PyType_Type; to this: PyAPI_DATA(PyTypeObject *) PyType_Type; This gives rise to the first headache caused by the API: type casts on type objects. 
It took me a day and a half to realize that this, from Modules/_weakref.c: PyModule_AddObject(m, "ref", (PyObject *) &_PyWeakref_RefType); really needed to be this: PyModule_AddObject(m, "ref", (PyObject *) _PyWeakref_RefType); Hopefully I've already found most of these in CPython itself, but this sort of code surely lurks in extensions yet to be touched. (Pro-tip: if you're working with this patch, and you see a crash, and gdb shows you something like this at the top of the stack: #0 0x081056d8 in visit_decref (op=0x8247aa0, data=0x0) at Modules/gcmodule.c:323 323 if (PyObject_IS_GC(op)) { your problem is an errant &, likely on a type object you're passing in to the interpreter. Think--what did you touch recently? Or debug it by salting your code with calls to collect(NUM_GENERATIONS-1).) Another irksome side-effect of the API: because of "tp_flags" and "tp_dealloc", I now have two declarations of PyTypeObject. There's the externally-visible one in Include/object.h, which lets external parties see "tp_dealloc" and "tp_flags". Then there's the internal one in Objects/typeprivate.h which is the real structure. Since declaring a type twice is a no-no, the external one is gated on #ifndef PY_TYPEPRIVATE If you're a normal Python extension programmer, you'd include Python.h as normal: #include "Python.h" Python implementation files that need to see the real PyTypeObject structure now look like this: #define PY_TYPEPRIVATE #include "Python.h" #include "../Objects/typeprivate.h" Also, since the structure of PyTypeObject hasn't yet changed, there are a bunch of fields in PyTypeObject that are externally visible that I don't want to be visible. To ensure no one was using them, I renamed them to "mysterious_object_0" and "mysterious_object_1" and the like. Before this patch gets accepted, I want to reorder the fields in PyTypeObject (which we can! because it's private!) so that these public fields are at the top of both the external and internal structures.
THE UPGRADE PATH Python internally declares a great many types, and I haven't attempted to convert them all. Instead there's a conversion header file that does most of the work for you. Here's how one would apply it to an existing type. 1. Where your file currently has this: #include "Python.h" change it to this: #define PY_TYPEPRIVATE #include "Python.h" #include "pytypeconvert.h" 2. Whenever you declare a type, change it from this: static PyTypeObject YourExtension_Type = { to this: static PyTypeObject *YourExtension_Type; static PyTypeObject _YourExtension_Type = { Use NULL for your metaclass. For example, change this: PyObject_HEAD_INIT(&PyType_Type), to this: PyObject_HEAD_INIT(NULL), Also use NULL for your baseclass. For example, change this: &PyDict_Type, /* tp_base */ to this: NULL, /* tp_base */ setting it to NULL instead. 3. In your module's init function, add this: CONVERT_TYPE(YourExtension_Type, metaclass, baseclass, "description of type"); "metaclass" and "baseclass" should be the metaclass and baseclass for your type, the ones you just set to NULL in step 2. If you had NULL before the baseclass, use NULL here too. 4. If you have any static object declarations, set their ob_type to NULL in the static declaration, then set it explicitly in your init function. If your object uses a locally-defined type, be sure to do this *after* the CONVERT_TYPE line for that type. (See _Py_EllipsisObject for an example.) 5. Anywhere you're using existing Python type declarations you must remove the & from the front. The conversion header file *also* redefines PyTypeObject. But this time it redefines it to the existing definition, and that definition will stay the same forever. That's the whole point: if you have an existing Python 3.0 extension, it won't have to change if we change the internal definition of PyTypeObject. (Why bother with this conversion process, with few py3k extensions in the wild?
This patch was started quite a while ago, when it seemed plausible the API would get backported to 2.x. Now I'm not so sure that will happen.) THE CURRENT PATCH I've uploaded a patch to the tracker: http://bugs.python.org/issue5872 It applies cleanly to py3k/trunk (r72081). But the code is awfully grubby. * I haven't dealt with any types I can't build, and I can't build a lot of the extensions. I'm using Linux, and I don't have the dev headers for many libraries on my laptop, and I haven't touched Windows or Mac stuff. * I created some new build warnings which should obviously be fixed. * With the patch installed, py3k trunk builds and installs. It does *not* pass the regression test suite. (It crashes.) I don't think this'll be too bad, it's just taken me this long to get it as far as I have. * There are some internal scaffolds and hacks that should be purged by the final patch. * There's no documentation. If you'd like to see how you'd use the new API, currently the best way to learn is to read Include/pytypeconvert.h. * I don't like the PY_TYPEPRIVATE hack. I only used it 'cause it sucks less than the other approaches I've thought of. I welcome your suggestions. The second-best approach I've come up with: make PyTypeObject genuinely private, and declare a different structure containing just the head of PyTypeObject. Let's call it PyTypeObjectHead. Then, for the convenience macros that use "dealloc" and "flags", cast the object to PyTypeObjectHead before dereferencing. This abandons type safety, and given my longing for type safety while developing this patch I'd prefer to not make loss of type safety an official API. THE FEEDBACK I SEEK My understanding is that the feature-freeze for Python 3.1 is in a little over a week. Given the current stability level and untestedness of the patch, and the lateness of the hour... is there any chance this would be accepted into Python 3.1? If so, I'll need to act fast. If not, I might as well take it easy, huh.
My thanks to Neal Norwitz for suggesting this project, and Brett Cannon for some recent encouragement. (And another person who I discussed it with so long ago I forgot who it was... maybe Fredrik Lundh?) /larry/ From cs at zip.com.au Wed Apr 29 05:27:40 2009 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 29 Apr 2009 13:27:40 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F768F3.8080304@g.nevcal.com> Message-ID: <20090429032740.GA31335@cskk.homeip.net> On 28Apr2009 13:37, Glenn Linderman wrote: > On approximately 4/28/2009 1:25 PM, came the following characters from > the keyboard of Martin v. Löwis: >>> The UTF-8b representation suffers from the same potential ambiguities as >>> the PUA characters... >> >> Not at all the same ambiguities. Here, again, the two choices: >> >> A. use PUA characters to represent undecodable bytes, in particular for >> UTF-8 (the PEP actually never proposed this to happen). >> This introduces an ambiguity: two different files in the same >> directory may decode to the same string name, if one has the PUA >> character, and the other has a non-decodable byte that gets decoded >> to the same PUA character. >> >> B. use UTF-8b, representing the byte with ill-formed surrogate codes. >> The same ambiguity does *NOT* exist. If a file on disk already >> contains an invalid surrogate code in its file name, then the UTF-8b >> decoder will recognize this as invalid, and decode it byte-for-byte, >> into three surrogate codes. Hence, the file names that are different >> on disk are also different in memory. No ambiguity. > > C. File on disk with the invalid surrogate code, accessed via the str > interface, no decoding happens, matches in memory the file on disk with > the byte that translates to the same surrogate, accessed via the bytes > interface. Ambiguity.
Is this a Windows example, or (now I think on it) an equivalent POSIX example of using the PEP where the locale encoding is UTF-16? In either case, I would say one could make an argument for being stricter in reading in OS-native sequences. Grant that NTFS doesn't prevent half-surrogates in filenames, and likewise that POSIX won't because to the OS they're just bytes. On decoding, require well-formed data. When you hit ill-formed data, treat the nasty half surrogate as a PAIR of bytes to be escaped in the resulting decode. Ambiguity avoided. I'm more concerned with your (yours? someone else's?) mention of shift characters. I'm unfamiliar with these encodings: to translate such a thing into a Latin example, is it the case that there are schemes with valid encodings that look like: [SHIFT] a b c which would produce "ABC" in unicode, which is ambiguous with: A B C which would also produce "ABC"? Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Helicopters are considerably more expensive [than fixed wing aircraft], which is only right because they don't actually fly, but just beat the air into submission. - Paul Tomblin From v+python at g.nevcal.com Wed Apr 29 05:29:16 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 20:29:16 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> Message-ID: <49F7C98C.60406@g.nevcal.com> On approximately 4/28/2009 7:40 PM, came the following characters from the keyboard of R. 
David Murray: > On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote: >> C. File on disk with the invalid surrogate code, accessed via the str >> interface, no decoding happens, matches in memory the file on disk >> with the byte that translates to the same surrogate, accessed via the >> bytes interface. Ambiguity. > > Unless I'm missing something, one of these is type str, and the other is > type bytes, so no ambiguity. You are missing that the bytes value would get decoded to a str; thus both are str; so ambiguity is possible. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Wed Apr 29 05:32:15 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 20:32:15 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090428230655.GA23830@cskk.homeip.net> References: <20090428230655.GA23830@cskk.homeip.net> Message-ID: <49F7CA3F.6000206@g.nevcal.com> On approximately 4/28/2009 4:06 PM, came the following characters from the keyboard of Cameron Simpson: > I think I may be able to resolve Glenn's issues with the scheme lower > down (through careful use of definitions and hand waving). > Close. You at least resolved what you thought my issue was. And, you did make me more comfortable with the idea that I, in programs I write, would not be adversely affected by the PEP if implemented. While I can see that the PEP no doubt solves the os.listdir / open problem on POSIX systems for Python 3 + PEP programs that don't use 3rd party libraries, it does require programs that do use 3rd party libraries to be recoded with your functions -- which so far the PEP hasn't embraced. 
Or, to use the bytes APIs directly to get file names for 3rd party libraries -- but the directly ported, filenames-as-strings type of applications that could call 3rd party filenames-as-bytes libraries in 2.x must be tweaked to do something different than they did before. > On 27Apr2009 23:52, Glenn Linderman wrote: > >> On approximately 4/27/2009 7:11 PM, came the following characters from >> the keyboard of Cameron Simpson: >> > [...] > >>> There may be puns. So what? Use the right strings for the right purpose >>> and all will be well. >>> >>> I think what is missing here, and missing from Martin's PEP, is some >>> utility functions for the os.* namespace. >>> >>> PROPOSAL: add to the PEP the following functions: >>> >>> os.fsdecode(bytes) -> funny-encoded Unicode >>> This is what os.listdir() does to produce the strings it hands out. >>> os.fsencode(funny-string) -> bytes >>> This is what open(filename,..) does to turn the filename into bytes >>> for the POSIX open. >>> os.pathencode(your-string) -> funny-encoded-Unicode >>> This is what you must do to a de novo string to turn it into a >>> string suitable for use by open. >>> Importantly, for most strings not hand crafted to have weird >>> sequences in them, it is a no-op. But it will recode your puns >>> for survival. >>> > [...] > >>>> So assume a non-decodable sequence in a name. That puts us into >>>> Martin's funny-decode scheme. His funny-decode scheme produces a >>>> bare string, indistinguishable from a bare string that would be >>>> produced by a str API that happens to contain that same sequence. >>>> Data puns. >>>> >>>> >>> See my proposal above. Does it address your concerns? A program still >>> must know the provenance of the string, and _if_ you're working with >>> non-decodable sequences in a name then you should transmute them into >>> the funny encoding using the os.pathencode() function described above. >>> >>> In this way the punning issue can be avoided.
>>> _Lacking_ such a function, your punning concern is valid. >>> >> Seems like one would also desire os.pathdecode to do the reverse. >> > > Yes. > > >> And >> also versions that take or produce bytes from funny-encoded strings. >> > > Isn't that the first two functions above? > Yes, sorry. >> Then, if programs were re-coded to perform these transformations on what >> you call de novo strings, then the scheme would work. >> But I think a large part of the incentive for the PEP is to try to >> invent a scheme that intentionally allows for the puns, so that programs >> do not need to be recoded in this manner, and yet still work. I don't >> think such a scheme exists. >> > > I agree no such scheme exists. I don't think it can, just using strings. > > But _unless_ you have made a de novo handcrafted string with > ill-formed sequences in it, you don't need to bother because you > won't _have_ puns. If Martin's using half surrogates to encode > "undecodable" bytes, then no normal string should conflict because a > normal string will contain _only_ Unicode scalar values. Half surrogate > code points are not such. > > The advantage here is that unless you've deliberately constructed an > ill-formed unicode string, you _do_not_ need to recode into > funny-encoding, because you are already compatible. Somewhat like one > doesn't need to recode ASCII into UTF-8, because ASCII is unchanged. > Right. And I don't intend to generate ill-formed Unicode strings, in my programs. But I might well read their names from other sources. It is nice, and thank you for emphasizing (although I already did realize it, back there in the far reaches of the brain) that all the data puns are between ill-formed Unicode strings, and undecodable bytes strings. That is a nice property of the PEP's encoding/decoding method. 
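Cameron's proposed os.fsdecode()/os.fsencode() pair can be sketched in a few lines. This is only an illustrative sketch, not the PEP's reference implementation: it assumes an ASCII-compatible encoding whose sequences are at most 4 bytes (true of UTF-8), and it escapes any undecodable byte as a lone surrogate in U+DC00..U+DCFF, exactly the property being discussed (well-formed input round-trips unchanged; only undecodable bytes produce ill-formed output).

```python
def fsdecode(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw bytes; map each undecodable byte b to chr(0xDC00 + b)."""
    out = []
    i = 0
    while i < len(raw):
        # Greedily try the longest decodable chunk (UTF-8 sequences are <= 4 bytes).
        for j in range(min(len(raw), i + 4), i, -1):
            try:
                out.append(raw[i:j].decode(encoding))
                i = j
                break
            except UnicodeDecodeError:
                pass
        else:
            out.append(chr(0xDC00 + raw[i]))  # escape the undecodable byte
            i += 1
    return "".join(out)

def fsencode(s: str, encoding: str = "utf-8") -> bytes:
    """Encode a funny-encoded string back to the original bytes."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:   # escape surrogate: restore the raw byte
            out.append(cp - 0xDC00)
        else:
            out += ch.encode(encoding)
    return bytes(out)
```

Note the advertised no-op property: a filename that decodes cleanly comes back as ordinary well-formed text, so fsencode(fsdecode(b)) == b for any input, and fsdecode is the identity (modulo transcoding) on well-formed names.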
I'm not sure it outweighs the disadvantage of taking unreadable gibberish, and producing indecipherable gibberish (codepoints with no glyphs), though, when there are ways to produce decipherable gibberish instead... or at least mostly-decipherable gibberish. Another idea forms.... described below. >> If there is going to be a required transformation from de novo strings >> to funny-encoded strings, then why not make one that people can actually >> see and compare and decode from the displayable form, by using >> displayable characters instead of lone surrogates? >> > > Because that would _not_ be a no-op for well formed Unicode strings. > > That reason is sufficient for me. > > I consider the fact that well-formed Unicode -> funny-encoded is a no-op > to be an enormous feature of Martin's scheme. > > Unless I'm missing something, there _are_no_puns_ between funny-encoded > strings and well formed unicode strings. > I think you are correct regarding where the puns are. I agree that not perturbing well-formed Unicode is a benefit. >>>>> I suppose if your program carefully constructs a unicode string riddled >>>>> with half-surrogates etc and imagines something specific should happen >>>>> to them on the way to being POSIX bytes then you might have a problem... >>>>> >>>>> >>>> Right. Or someone else's program does that. >>>> > > I've just spent a cosy 20 minutes with my copy of Unicode 5.0 and a > coffee, reading section 3.9 (Unicode Encoding Forms). > > I now do not believe your scenario makes sense. > > Someone can construct a Python3 string containing code points that > includes surrogates. Granted. > > However such a string is not meaningful because it is not well-formed > (D85). It's ill-formed (D84). It is not sane to expect it to > translate into a POSIX byte sequence, be it UTF-8 or anything else, > unless it is accompanied by some kind of explicit mapping provided by > the programmer. 
Absent that mapping, it's nonsense in much the same > way that a non-decodable UTF-8 byte sequence is nonsense. > > For example, Martin's funny-encoding is such an explicit mapping. > Such a string can be meaningful if it is used as a file name... it is the name of the file. I will agree that it would not be a word in any language, because it is composed of things that are not characters / codepoints, if that is what you meant. >>>> I only want to use >>>> Unicode file names. But if those other file names exist, I want to >>>> be able to access them, and not accidentally get a different file. >>>> > > But those other names _don't_ exist. > They do if someone constructs them. >>>>> Also, by avoiding reuse of legitimate characters in the encoding we can >>>>> avoid your issue with losing track of where a string came from; >>>>> legitimate characters are currently untouched by Martin's scheme, except >>>>> for the normal "bytes<->string via the user's locale" translation that >>>>> must already happen, and there you're aided by bytes and strings being >>>>> different types. >>>>> >>>>> >>>> There are abnormal characters, but there are no illegal characters. >>>> >>> I thought half-surrogates were illegal in well formed Unicode. I confess >>> to being weak in this area. By "legitimate" above I meant things like >>> half-surrogates which, like quarks, should not occur alone? >>> >> "Illegal" just means violating the accepted rules. >> > > I think that either we've lost track of what each other is saying, > or you're wrong here. And my poor terminology hasn't been helping. > > What we've got: > > (1) Byte sequence file names in the POSIX file system. > It doesn't matter whether the underlying storage is a real POSIX > filesystem or mostly POSIX one like MacOSX HFS or a remotely > attached non-POSIX filesystem like a Windows one, because we're > talking through the POSIX API, and it is handing us byte > sequences, which we expect may contain anything except a NUL. 
> > (2) Under Martin's scheme, os.listdir() et al hand us (and accept) > funny-encoded Python3 strings, which are strings of Unicode code > units (D77). > Particularly, if there were bytes in the POSIX byte string that > did not decode into Unicode scalar values (D76) then each such > byte is encoded as a surrogate (D71,72,73,74). > > It is important to note here that because surrogates are _not_ > Unicode scalar values, there is no punning between the two sets > of values. > > (3) Other Python3 strings that have not been through Martin's mangler > in either direction. Ordinary strings. > > Your concern is that, handed a string, a programmer could misuse (3) as > (2) or vice versa because of punning. > > In a well-formed unicode string there are no surrogates; surrogates only > occur in UTF-16 _encodings_ of Unicode strings (D75). > > Therefore, it _is_ possible to inspect a string, if one cared, to see if > it is funny-encoded or "raw". One may get two different answers: > > - If there are surrogate code units then it must be funny-encoded > and will therefore work perfectly if handed to an os.* interface. > > - If there are no surrogate code units then it may be funny-encoded or it > may not have been through Martin's funny-encoder, you can't tell. > However, this doesn't matter because the encoder is a no-op for such > strings. > Therefore it will work perfectly if handed to an os.* interface. > > The only gap in this is a specially crafted string containing surrogate > code points that did not come via Martin's encoder. But such a string > cannot come from a user interface, which will accept only characters > and those only include unicode scalar values. > > Such a string can only be explicitly constructed (eg with a \uD802 > code point). And if something constructs such a string, it must have in > mind an explicit interpretation of those code points, which means it is > the _constructor_ on whom the burden of translation lies. 
> > Does this make sense to you, or have you a counter example in mind? > Lots of configuration systems permit schemes like C's \x to be used to create strings. Whether you perceive that to be a user interface or not, or believe that such things should be part of a user interface or not, they exist. Whether they validate that such strings are properly constructed Unicode text or should or should not do such validation, is open for discussion, but I'd be surprised if there are not some such schemes that don't do such checking, and consider it a feature. Why make the file name longer than necessary, when you can just use all these nice illegal codepoints to keep it shorter instead? Instead of 5 characters for a filename sequence counter, someone might stuff it in 1 character, in binary, and think they were clever. I've seen such techniques, although not specifically in Python, since I'm fairly new to reading Python code. So I consider it not beyond the realm of possibility to encounter lone surrogate code units in strings that haven't been through Martin's funny-encoder. Hence, I disbelieve that the gap you mention can be ignored. >> In this case, the >> accepted rules are those enforced by the file system (at the bytes or >> str API levels), and by Python (for the str manipulations). None of >> those rules outlaw lone surrogates. Hence, while all of the systems >> under discussion can handle all Unicode characters in one way or >> another, none of them require that all Unicode rules are followed. Yes, >> you are correct that lone surrogates are illegal in Unicode. No, none >> of the accepted rules for these systems require Unicode. >> > > However, Martin's scheme explicitly translates these ill-formed > sequences into Python3 strings and back, losslessly. You can have > surrogates in the filesystem storage/API on Windows. You can have > non-UTF-8-decodable sequences in the POSIX filesystem layer too. > They're all taken in and handled. 
> It is still not clear whether the PEP (1) would be implemented on Windows (2) if it is, whether it prevents lone surrogates from being obtained from the str APIs, by transcoding them into 3 lone surrogates, and if it doesn't transcode from the str APIs, but does funny-decode from the bytes APIs, then it would seem there is still the possibility of data puns on Windows. > In Python3 space, one might have a bytes object with a raw POSIX > byte filename in it. Presumably one can also have a byte string with a > raw (UTF-16) Windows filename in it. They're not strings, so no > confusion. > > But there's no _string_ for these things without a matching > string<->bytestring mapping associated with it. > > If you have a Python3 string which is well-formed Unicode, then you can > hand it to the os.* interfaces and the Right Thing will happen (on > Windows just because it stored Unicode and on POSIX provided you agree > that your locale/getfilesystemencoding() is the right thing). > > If you have a string that isn't well-formed, then the meaning of any > code points which are not Unicode scalar values is not well defined > without some auxiliary stuff in the app. > > >>>> NTFS permits any 16-bit "character" code, including abnormal ones, >>>> including half-surrogates, and including full surrogate sequences >>>> that decode to PUA characters. POSIX permits all byte sequences, >>>> including things that look like UTF-8, things that don't look like >>>> UTF-8, things that look like half-surrogates, and things that look >>>> like full surrogate sequences that decode to PUA characters. >>>> > > See above. I think this is addressed. > Without transcoding on the str APIs, which I haven't seen mentioned, I don't think so. > [...] > >>> These are existing file objects, I'll take them as source 1. They get >>> encoded for release by os.listdir() et al. >>> >>> >>>> And yes, strings can be generated from scratch. >>>> >>> I take this to be source 2. 
>>> >> One variation of source 2 is reading output from other programs, such as >> ls (POSIX) or dir (Windows). >> > > Sure. But that is reading byte sequences, and one must again know the > encoding. If that is known and the input decoded happily into Unicode > scalar values, then there is no issue. If the input didn't decode, then > one must make some decision about what the non-decodable bits mean. > Sure. So the PEP needs your functions, or the equivalent. Last I checked, they weren't there. >>> I think I agree with all the discussion that followed, and think the >>> real problem is lack of utility functions to funny-encode source 2 >>> strings for use, hence the proposal above. >>> >> I think we understand each other now. I think your proposal could work, >> Cameron, although when recoding applications to use your proposal, I'd >> find it easier to use the "file name object" that others have proposed. >> I think that because either your proposal or the object proposals >> require recoding the application, that they will not be accepted. I >> think that because the PEP 383 allows data puns, that it should not be >> accepted in its present form. >> > > I'm of the opinion now that the puns can only occur when the source 2 > string has surrogates, and either those surrogates are chosen to match > the funny-encoding, in which case the pun is not a pun, or the > surrogates are chosen according to a different scheme in which case > source 2 is obliged to provide a mapping. > > A source 2 string of only Unicode scalar values doesn't need remapping. > A correct translation of source 2 strings would be obliged to call one of your functions, which don't exist in the PEP, because it appears the PEP wants to assume that such strings don't exist, unless it creates them. So this takes porting effort for programs generating and consuming such strings, to avoid being mangled by the PEP. That isn't necessary today, only post-PEP. 
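Cameron's observation that surrogates never occur in well-formed Unicode gives a cheap mechanical test for the "source 2" situation: a string either is well-formed (and passes through any funny-encoder unchanged) or contains surrogate code points (and needs attention from whoever produced it). A tiny hypothetical helper (the name is invented for illustration; it is not part of the PEP):

```python
def is_well_formed(s: str) -> bool:
    """True if s contains no surrogate code points (U+D800..U+DFFF).

    Well-formed strings are no-ops for the funny-encoder; anything else is
    either already funny-encoded or requires an explicit mapping from its
    producer before being handed to an os.* interface.
    """
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)
```

This is exactly the inspection described above: you cannot tell a funny-encoded string from a plain one when the test passes, but in that case it does not matter.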
>> I think if your proposal is accepted, that it then becomes possible >> to use an encoding that uses visible characters, which makes it easier >> for people to understand and verify. An encoding such as the one I >> suggested, but perhaps using a more obscure character, if there is one, >> but yet doesn't violate true Unicode. >> > > I think any scheme that uses any Unicode scalar value as an escape > character _inherently_ introduces puns, and puns that are easier to > encounter. > > I think the real strength of Martin's scheme is exactly that bytes strings > that needed the funny-encoding _do_ produce ill-formed Unicode strings, > because such strings _cannot_ conflict with well-formed strings. > > I think your desire for a human readable encoding is valid, but it should > be a further purely "presentation" step, somewhat like quoted-printable > encoding in MIME, and not the scheme used by Martin. > Another step? Even more porting effort? For a PEP that is trying to avoid porting effort? But maybe there is a compromise that mostly meets both goals: use U+DC10 as a (high-flying) escape character. It is not printable, so the substitution glyph will likely get displayed by display functions. Then transcode illegal bytes to the range U+0100 to U+01FF, and transcode existing U+DC10 to U+DC10 U+DC10. 1) This is an easy to understand scheme, and illegal byte values would become displayable, but would each be preceded by the substitution glyph for the U+DC10. 2) There would be no need to transcode other lone surrogates... on the other hand, any illegal code values could be treated as illegal bytes and transcoded, making the strings more nearly legal, and more uniformly displayable. 3) The property that all potential data puns are among ill-formed Unicode strings is still retained. 4) Because the result string is nearly legal Unicode (except for the escape characters U+DC10), it becomes uniformly comparable and different strings can be visibly different. 
5) It is still necessary to transcode names from str interfaces, to escape any U+DC10 characters, at least, which is also required by this PEP to avoid data puns on systems that have both str and bytes interfaces. >> I think it should transform all >> data, from str and bytes interfaces, and produce only str values >> containing conforming Unicode, escaping all the non-conforming sequences >> in some manner. This would make the strings truly readable, as long as >> fonts for all the characters are available. >> > > But I think it would just move the punning. A human readable string with > readable escapes in it may be funny-encoded. _Or_ it may be "raw", with > funny-encoded yet to happen; after all one might weirdly be dealing > with a filename which contained post-funny-encode visible sequences in > it. > > So you're right back to _guessing_ what you're looking at. > > With the surrogate scheme you only have to guess if there are surrogates, > but then you _know_ that you're dealing with a special encoding scheme; > it is certain - the guess is about which scheme. > I think you mean you don't have to guess if there are lone surrogates... you can look and see. > If you're working in a domain with no ill-formed strings you never need > to worry at all. > > With a visible/printable-encoding such as you advocate the guess is about > whether the scheme has even been used, which is why I think it is worse. > So the above scheme, using a U+DC10 escape character, meets your desirable truisms about lone surrogates being the trigger for knowing that you are dealing with bizarro names, but being uncertain about which kind, and also makes the results lots more readable. I still think there is a need to provide the encoding and decoding functions, for both bytes and de novo strings. 
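Glenn's U+DC10 proposal is concrete enough to sketch. Assuming names have already been funny-decoded per the PEP (each escaped byte appearing as a surrogate in U+DC80..U+DCFF), the re-escaping into the displayable range U+0100..U+01FF, with a literal U+DC10 doubled, might look like this (function names are invented for illustration; this is a sketch of the proposal, not anything in the PEP):

```python
ESC = "\udc10"  # Glenn's proposed escape character (itself a lone surrogate)

def to_displayable(s: str) -> str:
    """Re-escape a funny-decoded name so escaped bytes become printable:
    U+DC80..U+DCFF -> ESC + a code point in U+0180..U+01FF; ESC doubles."""
    out = []
    for ch in s:
        cp = ord(ch)
        if ch == ESC:
            out.append(ESC + ESC)                      # double a literal escape
        elif 0xDC80 <= cp <= 0xDCFF:
            out.append(ESC + chr(0x0100 + (cp & 0xFF)))  # printable stand-in
        else:
            out.append(ch)
    return "".join(out)

def from_displayable(s: str) -> str:
    """Invert to_displayable, recovering the funny-decoded form."""
    out = []
    it = iter(s)
    for ch in it:
        if ch == ESC:
            nxt = next(it)
            if nxt == ESC:
                out.append(ESC)
            else:
                out.append(chr(0xDC00 + (ord(nxt) & 0xFF)))
        else:
            out.append(ch)
    return "".join(out)
```

Note the properties claimed for the scheme hold in this sketch: the only remaining non-character is the escape itself, every escaped byte is followed by a visible glyph, and the transformation round-trips.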
>> And I had already suggested >> the utility functions you are suggesting, actually, in my first tirade >> against PEP 383 (search for "The encode and decode functions should be >> available for coders to use, that code to external >> interfaces, either OS or 3rd party packages, that do not use this >> encoding scheme"). >> > > I must have missed that sentence. But it sounds like we want the same > facilities at least. > > >> The solution that was proposed in the lead up to releasing Python 3.0 >> was to offer both bytes and str interfaces (so we have those), and then >> for those that want to have a single portable implementation that can >> access all data, an object that encapsulates the differences, and the >> variant system APIs. (file system is one, command line is another, >> environment is another, I'm not sure if there are more.) I haven't >> heard whether any progress on such an encapsulating object has been made; the >> people that proposed such have been rather quiet about this PEP. I >> would expect that an object implementation would provide display >> strings, and APIs to submit de novo str and bytes values to an object, >> which would run the appropriate encoding on them. >> > > I think covering these other cases is quite messy, if only because > there's not even agreement amongst existing command line apps about all > that stuff. > > Regarding "APIs to submit de novo str and bytes values to an object, > which would run the appropriate encoding on them" I think such a > facility for de novo strings must require the caller to provide a > handler/mapper for the not-well-formed parts of such strings if they > occur. > The caller shouldn't have to supply anything. The same encoding that is applied to str system interfaces that supply strings should be applied to de novo strings. 
It is just a matter of transcoding a de novo string into the "right form" that it can then be encoded by the system encoder to produce the original string again, if it goes to a str interface, or to an equivalent bytes string, if it goes to a bytes interface. >> Programs that want to use str interfaces on POSIX will see a subset of >> files on systems that contain files whose bytes filenames are not >> decodable. >> > > Not under Martin's scheme, because all bytes filenames _are_ decoded. > I think I was speaking of the status quo, here, not with the PEP. >> If a sysadmin wants to standardize on UTF-8 names >> universally, they can use something like convmv to clean up existing >> file names that don't conform. Programs that use str interfaces on >> POSIX system will work fine, but with a subset of the files. When that >> is unacceptable, they can either be recoded to use the bytes interfaces, >> or the hopefully forthcoming object encapsulation. The issue then will >> be what technique will be used to transform bytes into display names, >> but since the display names would never be fed back to the objects >> directly (but the object would have an interface to accept de novo str >> and de novo bytes) then it is just a display issue, and one that uses >> visible characters would seem more useful in my mind, than one that uses >> half-surrogates or PUAs. >> > > I agree it might be handy to have a display function, but isn't repr() > exactly that, now I think of it? repr is a display function that produces rather ugly results in most non-ASCII cases. But then again, one could use repr as the funny-encoding scheme, too... I don't think we want to use repr for either case, actually. Of course, with Py 3, if the file names were objects, and could have reprlib customizations... :) :) -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From tmbdev at gmail.com Wed Apr 29 07:12:46 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Wed, 29 Apr 2009 07:12:46 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> Message-ID: <7e51d15d0904282212j681084f3i72be4eb316428499@mail.gmail.com> > > It cannot crash Python; it can only crash > hypothetical third-party programs or libraries with deficient error > checking and > unreasonable assumptions about input data. The error checking isn't necessarily deficient. For example, a safe and legitimate thing to do is for third party libraries to throw a C++ exception, raise a Python exception, or delete the half surrogate. Any of those would break one of the use cases people have been talking about, namely being able to present the output from os.listdir() to the user, say in a file selector, and then access that file. (and, of course, you haven't even proven those programs or libraries exist) PEP 383 is a proposal that suggests changing Python such that malformed unicode strings become a required part of Python and such that Python writes illegal UTF-8 encodings to UTF-8 encoded file systems. Those are big changes, and it's legitimate to ask that PEP 383 address the implications of that choice before it's made. Tom 
From martin at v.loewis.de Wed Apr 29 07:45:08 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 07:45:08 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <49F6A947.1050106@v.loewis.de> <7e51d15d0904280030k1aa4629dnd4e9d79312da85e9@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> Message-ID: <49F7E964.9050700@v.loewis.de> > The wide APIs use UTF-16. UTF-16 suffers from the same problem as > UTF-8: not all sequences of words are valid UTF-16 sequences. In > particular, sequences containing isolated surrogate pairs are not > well-formed according to the Unicode standard. Therefore, the existence > of a wide character API function does not guarantee that the wide > character strings it returns can be converted into valid unicode > strings. And, in fact, Windows Vista happily creates files with > malformed UTF-16 encodings, and os.listdir() happily returns them. Whatever. What does that have to do with PEP 383? Your claim was that PEP 383 may have unfortunate effects on Windows, and I'm telling you that it won't, because the behavior of Python on Windows won't change at all. So whatever the problem - it's there already, and the PEP is not going to change it. I personally don't see a problem here - *of course* os.listdir will report invalid utf-16 encodings, if that's what is stored on disk. It doesn't matter whether the file names are valid wrt. some specification. What matters is that you can access all the files. 
Regards, Martin From martin at v.loewis.de Wed Apr 29 07:52:23 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 07:52:23 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F788A6.3040702@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> Message-ID: <49F7EB17.4010309@v.loewis.de> >>> C. File on disk with the invalid surrogate code, accessed via the str >>> interface, no decoding happens, matches in memory the file on disk with >>> the byte that translates to the same surrogate, accessed via the bytes >>> interface. Ambiguity. >> >> Is that an alternative to A and B? > > I guess it is an adjunct to case B, the current PEP. > > It is what happens when using the PEP on a system that provides both > bytes and str interfaces, and both get used. Your formulation is a bit too stenographic to me, but please trust me that there is *no* ambiguity in the case you construct. By "accessed via the str interface", I assume you do something like fn = "some string" open(fn) You are wrong in assuming "no decoding happens", and that "matches in memory the file on disk" (whatever that means - how do I match a file on disk in memory??????). What happens instead is that fn gets *encoded* with the file system encoding, and the python-escape handler. This will *not* produce an ambiguity. 
If you think there is an ambiguity in that you can use both the byte interface and the string interface to access the same file: this would be a ridiculous interpretation. *Of course* you can access /etc/passwd both as "/etc/passwd" and b"/etc/passwd", there is nothing ambiguous about that. Regards, Martin From martin at v.loewis.de Wed Apr 29 08:04:52 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 08:04:52 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F7BDD8.3010202@gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F7BDD8.3010202@gmail.com> Message-ID: <49F7EE04.6090701@v.loewis.de> >> The Python UTF-8 codec will happily encode half-surrogates; people argue >> that it is a bug that it does so, however, it would help in this >> specific case. > > Can we use this encoding scheme for writing into files as well? We've > turned the filename with undecodable bytes into a string with half > surrogates. Putting that string into a file has to turn them into bytes > at some level. Can we use the python-escape error handler to achieve > that somehow? Sure: if you are aware that what you write to the stream is actually a file name, you should encode it with the file system encoding, and the python-escape handler. However, it's questionable that the same approach is right for the rest of the data that goes into the file. If you use a different encoding on the stream, yet still use the python-escape handler, you may end up with completely non-sensical bytes. 
In practice, it probably won't be that bad - python-escape has likely escaped all non-ASCII bytes, so that on re-encoding with a different encoding, only the ASCII characters get encoded, which likely will work fine. Regards, Martin From martin at v.loewis.de Wed Apr 29 08:07:10 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 29 Apr 2009 08:07:10 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090429032740.GA31335@cskk.homeip.net> References: <20090429032740.GA31335@cskk.homeip.net> Message-ID: <49F7EE8E.1030404@v.loewis.de> > I'm more concerned with your (yours? someone else's?) mention of shift > characters. I'm unfamiliar with these encodings: to translate such a > thing into a Latin example, is it the case that there are schemes with > valid encodings that look like: > > [SHIFT] a b c > > which would produce "ABC" in unicode, which is ambiguous with: > > A B C > > which would also produce "ABC"? No: the "shift" in "shift-jis" is not really about the shift key. See http://en.wikipedia.org/wiki/Shift-JIS Regards, Martin From martin at v.loewis.de Wed Apr 29 08:27:18 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 08:27:18 +0200 Subject: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in System Character?Interfaces In-Reply-To: <20090429023353.GA11210@cskk.homeip.net> References: <20090429023353.GA11210@cskk.homeip.net> Message-ID: <49F7F346.2010003@v.loewis.de> > I would like utility functions to perform: > os-bytes->funny-encoded > funny-encoded->os-bytes > or explicit example code snippets for same in the PEP text. Done! 
Martin From tmbdev at gmail.com Wed Apr 29 08:53:36 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Wed, 29 Apr 2009 08:53:36 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F7E964.9050700@v.loewis.de> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> Message-ID: <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> On Wed, Apr 29, 2009 at 07:45, "Martin v. Löwis" wrote: > Your claim was > that PEP 383 may have unfortunate effects on Windows, No, I simply think that PEP 383 is not sufficiently specified to be able to tell. > and I'm telling > you that it won't, because the behavior of Python on Windows won't > change at all. A justification for your proposal is that there are differences between Python on UNIX and Windows that you would like to reduce. But depending on where you introduce utf-8b coding on UNIX, you may also have to introduce it on Windows in order to keep the platforms consistent. > So whatever the problem - it's there already, and the > PEP is not going to change it. OK, so you are saying that under PEP 383, utf-8b wouldn't be used anywhere on Windows by default. That's not clear from your proposal. It's also not clear from your proposal where utf-8b will get used on UNIX systems. Some of the places that have been suggested are: open, os.listdir, sys.argv, os.getenv. There are other potential ones, like print, write, and os.system. And what about text file and string conversions: will utf-8b become the default, or optional, or unavailable? 
Each of those choices potentially has significant implications. I'm just asking what those choices are so that one can then talk about the implications and see whether this proposal is a good one or whether other alternatives are better. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Wed Apr 29 08:54:21 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 28 Apr 2009 23:54:21 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F7EB17.4010309@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> Message-ID: <49F7F99D.8070606@g.nevcal.com> On approximately 4/28/2009 10:52 PM, came the following characters from the keyboard of Martin v. Löwis: >>>> C. File on disk with the invalid surrogate code, accessed via the str >>>> interface, no decoding happens, matches in memory the file on disk with >>>> the byte that translates to the same surrogate, accessed via the bytes >>>> interface. Ambiguity. >>> Is that an alternative to A and B? >> I guess it is an adjunct to case B, the current PEP. >> >> It is what happens when using the PEP on a system that provides both >> bytes and str interfaces, and both get used. > > Your formulation is a bit too stenographic to me, but please trust me > that there is *no* ambiguity in the case you construct.
No Martin, the point of reviewing the PEP is to _not_ trust you, even though you are generally very knowledgeable and very trustworthy. It is much easier to find problems before something is released, or even coded, than it is afterwards. > By "accessed via the str interface", I assume you do something like > > fn = "some string" > open(fn) > > You are wrong in assuming "no decoding happens", and that "matches > in memory the file on disk" (whatever that means - how do I match > a file on disk in memory??????). What happens instead is that fn > gets *encoded* with the file system encoding, and the python-escape > handler. This will *not* produce an ambiguity. You assumed, and maybe I wasn't clear in my statement. By "accessed via the str interface" I mean that (on Windows) the wide string interface would be used to obtain a file name. Now, suppose that the file name returned contains "abc" followed by the half-surrogate U+DC10 -- four 16-bit codes. Then, ask for the same filename via the bytes interface, using UTF-8 encoding. The PEP says that the above name would get translated to "abc" followed by 3 half-surrogates, corresponding to the 3 UTF-8 bytes used to represent the half-surrogate that is actually in the file name, specifically U+DCED U+DCB0 U+DC90. This means that one name on disk can be seen as two different names in memory. Now posit another file which, when accessed via the str interface, has the name "abc" followed by U+DCED U+DCB0 U+DC90. Looks ambiguous to me. Now if you have a scheme for handling this case, fine, but I don't understand it from what is written in the PEP. > If you think there is an ambiguity in that you can use both the > byte interface and the string interface to access the same file: > this would be a ridiculous interpretation. *Of course* you can > access /etc/passwd both as "/etc/passwd" and b"/etc/passwd", > there is nothing ambiguous about that. Yes, this would be a ridiculous interpretation of "ambiguous". 
-- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Wed Apr 29 09:17:23 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 09:17:23 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> Message-ID: <49F7FF03.2090909@v.loewis.de> > OK, so you are saying that under PEP 383, utf-8b wouldn't be used > anywhere on Windows by default. That's not clear from your proposal. You didn't read it carefully enough. The first three paragraphs of the "Specification" section make that clear. 
Regards, Martin From martin at v.loewis.de Wed Apr 29 09:29:05 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 09:29:05 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F7F99D.8070606@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> Message-ID: <49F801C1.2070109@v.loewis.de> >>>>> C. File on disk with the invalid surrogate code, accessed via the str >>>>> interface, no decoding happens, matches in memory the file on disk >>>>> with >>>>> the byte that translates to the same surrogate, accessed via the bytes >>>>> interface. Ambiguity. >>>> Is that an alternative to A and B? >>> I guess it is an adjunct to case B, the current PEP. >>> >>> It is what happens when using the PEP on a system that provides both >>> bytes and str interfaces, and both get used. >> >> Your formulation is a bit too stenographic to me, but please trust me >> that there is *no* ambiguity in the case you construct. > > > No Martin, the point of reviewing the PEP is to _not_ trust you, even > though you are generally very knowledgeable and very trustworthy. It is > much easier to find problems before something is released, or even > coded, than it is afterwards. Sure. 
However, that requires you to provide meaningful, reproducible counter-examples, rather than a stenographic formulation that might hint some problem you apparently see (which I believe is just not there). > You assumed, and maybe I wasn't clear in my statement. > > By "accessed via the str interface" I mean that (on Windows) the wide > string interface would be used to obtain a file name. What does that mean? What specific interface are you referring to to obtain file names? Most of the time, file names are obtained by the user entering them on the keyboard. GUI applications are completely out of the scope of the PEP. > Now, suppose that > the file name returned contains "abc" followed by the half-surrogate > U+DC10 -- four 16-bit codes. Ok, so perhaps you might be talking about os.listdir here. Communication would be much easier if I would not need to guess what you may mean. Also, why is U+DC10 four 16-bit codes? > Then, ask for the same filename via the bytes interface, using UTF-8 > encoding. How do you do that on Windows? You cannot just pick an encoding, such as UTF-8, and pass that to the byte interface, and expect it to work. If you use the byte interface, you need to encode in the file system encoding, of course. Also, what do you mean by "ask for"?????? WHAT INTERFACE ARE YOU USING???? Please use specific python code. > The PEP says that the above name would get translated to > "abc" followed by 3 half-surrogates, corresponding to the 3 UTF-8 bytes > used to represent the half-surrogate that is actually in the file name, > specifically U+DCED U+DCB0 U+DC90. This means that one name on disk can > be seen as two different names in memory. You are relying on false assumptions here, namely that the UTF-8 encoding would play any role. What would happen instead is that the "mbcs" encoding would be used. The "mbcs" encoding, by design from Microsoft, will never report an error, so the error handler will not be invoked at all. 
> Now posit another file which, when accessed via the str interface, has > the name "abc" followed by U+DCED U+DCB0 U+DC90. > > Looks ambiguous to me. Now if you have a scheme for handling this case, > fine, but I don't understand it from what is written in the PEP. You were just making false assumptions in your reasoning, assumptions that are way beyond the scope of the PEP. Regards, Martin From baptiste13z at free.fr Wed Apr 29 09:38:38 2009 From: baptiste13z at free.fr (Baptiste Carvello) Date: Wed, 29 Apr 2009 09:38:38 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F76422.4010806@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F7510D.7070603@mrabarnett.plus.com> <49F76422.4010806@g.nevcal.com> Message-ID: Glenn Linderman a écrit : > > 3. When an undecodable byte 0xPQ is found, decode to the escape > codepoint, followed by codepoint U+01PQ, where P and Q are hex digits. > The problem with this strategy is: paths are often sliced, so your 2 codepoints could get separated. The good thing with the PEP's strategy is that 1 character stays 1 character.
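[Baptiste's one-byte-to-one-character point can be checked directly. A sketch, using the handler under the name it eventually shipped with in Python 3.1, "surrogateescape"; the path is invented for illustration:]

```python
# Under the PEP's scheme every undecodable byte becomes exactly one
# codepoint, so slicing a decoded path can never split an escape in half.

name = b"abc\xffdef".decode("utf-8", "surrogateescape")
assert len(name) == 7                 # seven bytes -> seven codepoints
assert name[3] == "\udcff"            # the escaped byte travels as one unit

# Any prefix still encodes back to the corresponding byte prefix:
assert name[:4].encode("utf-8", "surrogateescape") == b"abc\xff"
```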
Baptiste From v+python at g.nevcal.com Wed Apr 29 09:49:37 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 29 Apr 2009 00:49:37 -0700 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F7FF03.2090909@v.loewis.de> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> <49F7FF03.2090909@v.loewis.de> Message-ID: <49F80691.80403@g.nevcal.com> On approximately 4/29/2009 12:17 AM, came the following characters from the keyboard of Martin v. Löwis: >> OK, so you are saying that under PEP 383, utf-8b wouldn't be used >> anywhere on Windows by default. That's not clear from your proposal. > > You didn't read it carefully enough. The first three paragraphs of > the "Specification" section make that clear. Sorry, rereading those paragraphs even with this declaration in mind, does not make that clear. It is not enough to have a solution that works; it is necessary to communicate that solution clearly enough that people understand it. By the huge amount of feedback you have received, it is clear that either the solution doesn't work, or that it wasn't communicated clearly. The following comments are an attempt to help you make the PEP clear, based on your above declaration that UTF-8b wouldn't be used on Windows. I may still be unclear about what you mean, but if you can accept these enhancements to the PEP, then maybe we are approaching a common understanding; if not, you should be aware that the PEP still needs clarification.
In the first paragraph, you should make it clear that Python 3.0 does not use the Windows bytes interfaces, if it doesn't. "Python uses *only* the wide character APIs..." would suffice. As stated, it seems like Python *does* use the wide character APIs, but leaves open the possibility that it might use byte APIs also. A short description of what happens on Windows when Python code uses bytes APIs would also be helpful. In the second paragraph, it speaks of "currently" but then speaks of using the half-surrogates. I don't believe that happens "currently". You did change tense, but that paragraph is quite confusing, currently, because of the tense change. You should describe there, the action that is currently taken by Python for non-decodable byes, and then in the next paragraph talk about what the PEP changes. The 4th paragraph is now confusing too... would it not be the decode error handler that returns the byte strings, in addition to the Unicode strings? The 5th paragraph has apparently confused some people into thinking this PEP only applies to locale's using UTF-8 encodings; you should have an "else clause" to clear that up, pointing out that the reverse encoding of half-surrogates by other encodings already produces errors, that UTF-8 is a special case, not the only case. The code added to the discussion has mismatched (), making me wonder if it is complete. There is a reasonable possibility that only the final ) is missing. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From cs at zip.com.au Wed Apr 29 10:17:44 2009 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 29 Apr 2009 18:17:44 +1000 Subject: [Python-Dev] Python-Dev PEP 383: Non-decodable Bytes in System Character?Interfaces In-Reply-To: <49F7F346.2010003@v.loewis.de> Message-ID: <20090429081744.GA18296@cskk.homeip.net> On 29Apr2009 08:27, Martin v. 
Löwis wrote: | > I would like utility functions to perform: | > os-bytes->funny-encoded | > funny-encoded->os-bytes | > or explicit example code snippets for same in the PEP text. | | Done! Thanks! -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ From hrvoje.niksic at avl.com Wed Apr 29 10:29:32 2009 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Wed, 29 Apr 2009 10:29:32 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <22573349.1882424.1240944925201.JavaMail.xicrypt@atgrzls001> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <15546941.1861678.1240922288709.JavaMail.xicrypt@atgrzls001> <49F6FA93.7080302@avl.com> <22573349.1882424.1240944925201.JavaMail.xicrypt@atgrzls001> Message-ID: <49F80FEC.9000900@avl.com> Zooko O'Whielacronx wrote: >> If you switch to iso8859-15 only in the presence of undecodable >> UTF-8, then you have the same round-trip problem as the PEP: both >> b'\xff' and b'\xc3\xbf' will be converted to u'\u00ff' without a >> way to unambiguously recover the original file name. > > Why do you say that? It seems to work as I expected here: > > >>> '\xff'.decode('iso-8859-15') > u'\xff' > >>> '\xc3\xbf'.decode('iso-8859-15') > u'\xc3\xbf' Here is what I mean by "switch to iso8859-15" only in the presence of undecodable UTF-8: def file_name_to_unicode(fn, encoding): try: return fn.decode(encoding) except UnicodeDecodeError: return fn.decode('iso-8859-15') Now, assume a UTF-8 locale and try to use it on the provided example file names. >>> file_name_to_unicode(b'\xff', 'utf-8') 'ÿ' >>> file_name_to_unicode(b'\xc3\xbf', 'utf-8') 'ÿ' That is the ambiguity I was referring to -- two different byte sequences result in the same unicode string.
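[For contrast with the iso-8859-15 fallback above: the PEP's escape handler keeps the two byte strings distinct and round-trippable. A sketch under the handler's eventual Python 3.1 name, "surrogateescape":]

```python
# b'\xff' is invalid UTF-8, so it becomes a lone half surrogate;
# b'\xc3\xbf' is valid UTF-8 for U+00FF.  No collision, and both
# encode back to the exact bytes they came from.

s1 = b"\xff".decode("utf-8", "surrogateescape")      # -> '\udcff'
s2 = b"\xc3\xbf".decode("utf-8", "surrogateescape")  # -> '\xff' (U+00FF)
assert s1 != s2
assert s1.encode("utf-8", "surrogateescape") == b"\xff"
assert s2.encode("utf-8", "surrogateescape") == b"\xc3\xbf"
```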
From baptiste13z at free.fr Wed Apr 29 10:43:49 2009 From: baptiste13z at free.fr (Baptiste Carvello) Date: Wed, 29 Apr 2009 10:43:49 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001> <49F6FF49.6010205@avl.com> Message-ID: Lino Mastrodomenico a écrit : > > Only for the new utf-8b encoding (if Martin agrees), while the > existing utf-8 is fine as is (or at least waaay outside the scope of > this PEP). > This is questionable. This would have the consequence that \udcxx in a python string would sometimes mean a surrogate, and sometimes mean raw bytes, depending on the history of the string. By contrast, if the new utf-8b codec would *supercede* the old one, \udcxx would always mean raw bytes (at least on UCS-4 builds, where surrogates are unused). Thus ambiguity could be avoided. Baptiste From baptiste13z at free.fr Wed Apr 29 11:09:09 2009 From: baptiste13z at free.fr (Baptiste Carvello) Date: Wed, 29 Apr 2009 11:09:09 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F6A7C0.6090105@g.nevcal.com> References: <20090428021117.GA25536@cskk.homeip.net> <49F6A7C0.6090105@g.nevcal.com> Message-ID: Glenn Linderman a écrit : > > If there is going to be a required transformation from de novo strings > to funny-encoded strings, then why not make one that people can actually > see and compare and decode from the displayable form, by using > displayable characters instead of lone surrogates? > The problem with your "escape character" scheme is that the meaning is lost with slicing of the strings, which is a very common operation.
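[The slicing hazard can be seen with a toy version of a two-codepoint scheme. The escape character and decoder below are hypothetical stand-ins for illustration only; nothing here beyond the ESC + U+01PQ shape comes from the thread:]

```python
ESC = "\u2401"  # hypothetical escape character, chosen arbitrarily

def fake_decode(raw: bytes) -> str:
    # Toy decoder: ASCII bytes pass through; each high byte 0xPQ
    # becomes the two codepoints ESC + U+01PQ.
    out = []
    for byte in raw:
        if byte < 0x80:
            out.append(chr(byte))
        else:
            out.append(ESC + chr(0x100 + byte))
    return "".join(out)

name = fake_decode(b"abc\xffdef")
assert len(name) == 8          # 7 bytes became 8 codepoints
head = name[:4]                # a perfectly ordinary slice...
assert head.endswith(ESC)      # ...strands a dangling escape character
```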
>> >> I though half-surrogates were illegal in well formed Unicode. I confess >> to being weak in this area. By "legitimate" above I meant things like >> half-surrogates which, like quarks, should not occur alone? >> > > "Illegal" just means violating the accepted rules. In this case, the > accepted rules are those enforced by the file system (at the bytes or > str API levels), and by Python (for the str manipulations). None of > those rules outlaw lone surrogates. [...] > Python could as well *specify* that lone surrogates are illegal, as their meaning is undefined by Unicode. If this rule is respected language-wise, there is no ambiguity. It might be unrealistic on windows, though. This rule could even be specified only for strings that represent filesystem paths. Sure, they are the same type as other strings, but the programmer usually knows if a given string is intended to be a path or not. Baptiste From tmbdev at gmail.com Wed Apr 29 11:19:01 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Wed, 29 Apr 2009 11:19:01 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F801C1.2070109@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> Message-ID: <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com> > Sure. However, that requires you to provide meaningful, reproducible > counter-examples, rather than a stenographic formulation that might > hint some problem you apparently see (which I believe is just not > there). Well, here's another one: PEP 383 would disallow UTF-8 encodings of half surrogates. But such encodings are currently supported by Python, and they are used as part of CESU-8 coding. 
That's, in fact, a common way of converting UTF-16 to UTF-8. How are you going to deal with existing code that relies on being able to code half surrogates as UTF-8? Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Apr 29 11:25:17 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 29 Apr 2009 09:25:17 +0000 (UTC) Subject: [Python-Dev] PEP 383 (again) References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <7e51d15d0904282212j681084f3i72be4eb316428499@mail.gmail.com> Message-ID: Thomas Breuel <tmbdev at gmail.com> writes: > > The error checking isn't necessarily deficient. For example, a safe and legitimate thing to do is for third party libraries to throw a C++ exception, raise a Python exception, or delete the half surrogate. Do you have any concrete examples of this behaviour? When e.g. Nautilus shows some illegal UTF-8 filenames in an UTF-8 locale, it replaces the offending bytes with placeholders rather than crash in your face. > PEP 383 is a proposal that suggests changing Python such that malformed unicode strings become a required part of Python and such that Python writes illegal UTF-8 encodings to UTF-8 encoded file systems. That's again a misleading statement. It only writes an "illegal encoding" if it received one from the filesystem in the first place. A clean filesystem will only receive clean filenames. > Those are big changes, and it's legitimate to ask that PEP 383 address the implications of that choice before it's made.
No, it's legitimate to ask that /you/ back up your arguments with concrete facts. It's difficult to demonstrate the non-existence of a problem. On the other hand, you can easily demonstrate that it exists, if it really does. By the way, most of those libraries under Unix would take a char * as input, so they wouldn't deal with an "illegal unicode string", they would deal with the original byte string. Regards Antoine. From v+python at g.nevcal.com Wed Apr 29 11:38:32 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 29 Apr 2009 02:38:32 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F7510D.7070603@mrabarnett.plus.com> <49F76422.4010806@g.nevcal.com> Message-ID: <49F82018.5060407@g.nevcal.com> On approximately 4/29/2009 12:38 AM, came the following characters from the keyboard of Baptiste Carvello: > Glenn Linderman a écrit : >> >> 3. When an undecodable byte 0xPQ is found, decode to the escape >> codepoint, followed by codepoint U+01PQ, where P and Q are hex digits. >> > > The problem with this strategy is: paths are often sliced, so your 2 > codepoints could get separated. The good thing with the PEP's strategy > is that 1 character stays 1 character. > > Baptiste Except for half-surrogates that are in the file names already, which get converted to 3 characters. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Wed Apr 29 11:56:05 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 29 Apr 2009 02:56:05 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F801C1.2070109@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> Message-ID: <49F82435.3060205@g.nevcal.com> On approximately 4/29/2009 12:29 AM, came the following characters from the keyboard of Martin v. Löwis: >>>>>> C. File on disk with the invalid surrogate code, accessed via the str >>>>>> interface, no decoding happens, matches in memory the file on disk >>>>>> with >>>>>> the byte that translates to the same surrogate, accessed via the bytes >>>>>> interface. Ambiguity. >>>>> Is that an alternative to A and B? >>>> I guess it is an adjunct to case B, the current PEP. >>>> >>>> It is what happens when using the PEP on a system that provides both >>>> bytes and str interfaces, and both get used. >>> Your formulation is a bit too stenographic to me, but please trust me >>> that there is *no* ambiguity in the case you construct. >> >> No Martin, the point of reviewing the PEP is to _not_ trust you, even >> though you are generally very knowledgeable and very trustworthy.
It is >> much easier to find problems before something is released, or even >> coded, than it is afterwards. > > Sure. However, that requires you to provide meaningful, reproducible > counter-examples, rather than a stenographic formulation that might > hint some problem you apparently see (which I believe is just not > there). > >> You assumed, and maybe I wasn't clear in my statement. >> >> By "accessed via the str interface" I mean that (on Windows) the wide >> string interface would be used to obtain a file name. > > What does that mean? What specific interface are you referring to to > obtain file names? Most of the time, file names are obtained by the > user entering them on the keyboard. GUI applications are completely > out of the scope of the PEP. > >> Now, suppose that >> the file name returned contains "abc" followed by the half-surrogate >> U+DC10 -- four 16-bit codes. > > Ok, so perhaps you might be talking about os.listdir here. Communication > would be much easier if I would not need to guess what you may mean. os.listdir("") > > Also, why is U+DC10 four 16-bit codes? It isn't. First 16-bit code is U+0061 Second 16-bit code is U+0062 Third 16-bit code is U+0063 Fourth 16-bit code is U+DC10 >> Then, ask for the same filename via the bytes interface, using UTF-8 >> encoding. > > How do you do that on Windows? You cannot just pick an encoding, such > as UTF-8, and pass that to the byte interface, and expect it to work. > If you use the byte interface, you need to encode in the file system > encoding, of course. > > Also, what do you mean by "ask for"?????? WHAT INTERFACE ARE YOU > USING???? Please use specific python code. os.listdir(b"") I find that on my Windows system, with all ASCII path file names, that I get quite different results when I pass os.listdir an empty str vs an empty bytes. Rather than keep you guessing, I get the root directory contents from the empty str, and the current directory contents from an empty bytes. 
That is rather unexpected. So I guess I'd better suggest that a specific, equivalent directory name be passed in either bytes or str form. >> The PEP says that the above name would get translated to >> "abc" followed by 3 half-surrogates, corresponding to the 3 UTF-8 bytes >> used to represent the half-surrogate that is actually in the file name, >> specifically U+DCED U+DCB0 U+DC90. This means that one name on disk can >> be seen as two different names in memory. > > You are relying on false assumptions here, namely that the UTF-8 > encoding would play any role. > > What would happen instead is that the "mbcs" encoding would be used. The > "mbcs" encoding, by design from Microsoft, will never report an error, > so the error handler will not be invoked at all. So what you are saying here is that Python doesn't use the "A" forms of the Windows APIs for filenames, but only the "W" forms, and uses lossy decoding (from MS) to the current code page (which can never be UTF-8 on Windows). You are further saying that Python doesn't give the programmer control over the codec that is used to convert from W results to bytes, so that on Windows, it is impossible to obtain a bytes result containing UTF-8 from os.listdir, even though sys.setfilesystemencoding exists, and sys.getfilesystemencoding is affected by it, and the latter is documented as returning "mbcs", and as returning the codec that should be used by the application to convert str to bytes for filenames. (Python 3.0.1). While I can hear a "that is outside the scope of the PEP" coming, this documentation is confusing, to say the least. >> Now posit another file which, when accessed via the str interface, has >> the name "abc" followed by U+DCED U+DCB0 U+DC90. >> >> Looks ambiguous to me. Now if you have a scheme for handling this case, >> fine, but I don't understand it from what is written in the PEP. 
> > You were just making false assumptions in your reasoning, assumptions > that are way beyond the scope of the PEP. Absolutely correct. I was making what seemed to be reasonable assumptions about Python internals on Windows, and several of them are false, including misleading documentation for listdir (which doesn't specify that bytes and str parameters affect whether or not the current directory is honored), and sys.getfilesystemencoding (which reflects the result of sys.setfilesystemencoding, rather than returning, on Windows, the "mbcs" used by Python to create bytes forms of filenames from W forms of filenames even after sys.setfilesystemencoding is called. Things are a little clearer in the documentation for sys.setfilesystemencoding, which does say the encoding isn't used by Windows -- so why is it permitted to change it, if it has no effect?). -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From rdmurray at bitdance.com Wed Apr 29 13:07:11 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 29 Apr 2009 07:07:11 -0400 (EDT) Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F7C98C.60406@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F7C98C.60406@g.nevcal.com> Message-ID: On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote: > On approximately 4/28/2009 7:40 PM, came the following characters from the > keyboard of R. 
David Murray: >> On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote: >> > C. File on disk with the invalid surrogate code, accessed via the str >> > interface, no decoding happens, matches in memory the file on disk with >> > the byte that translates to the same surrogate, accessed via the bytes >> > interface. Ambiguity. >> >> Unless I'm missing something, one of these is type str, and the other is >> type bytes, so no ambiguity. > > > You are missing that the bytes value would get decoded to a str; thus both > are str; so ambiguity is possible. Only if you as the programmer decode it. Now, I don't understand the subtleties of Unicode enough to know if Martin has already successfully addressed this concern in another fashion, but personally I think that if you as a programmer are comparing funnydecoded-str strings gotten via a string interface with normal-decoded strings gotten via a bytes interface, that we could claim that your program has a bug. --David From cs at zip.com.au Wed Apr 29 13:36:53 2009 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 29 Apr 2009 21:36:53 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F82435.3060205@g.nevcal.com> Message-ID: <20090429113653.GA22908@cskk.homeip.net> On 29Apr2009 02:56, Glenn Linderman wrote: > os.listdir(b"") > > I find that on my Windows system, with all ASCII path file names, that I > get quite different results when I pass os.listdir an empty str vs an > empty bytes. > > Rather than keep you guessing, I get the root directory contents from > the empty str, and the current directory contents from an empty bytes. > That is rather unexpected. > > So I guess I'd better suggest that a specific, equivalent directory name > be passed in either bytes or str form. I think you may have uncovered an implementation bug rather than an encoding issue (because I'd expect "" and b"" to be equivalent). 
In ancient times, "" was a valid UNIX name for the working directory. POSIX disallows that, and requires people to use ".". Maybe you're seeing an artifact; did python move from UNIX to Windows or the other way around in its porting history? I'd guess the former. Do you get differing results from listdir(".") and listdir(b".") ? How's python2 behave for ""? (Since there's no b"" in python2.) Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ 'Supposing a tree fell down, Pooh, when we were underneath it?' 'Supposing it didn't,' said Pooh after careful thought. From v+python at g.nevcal.com Wed Apr 29 13:47:00 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 29 Apr 2009 04:47:00 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F7C98C.60406@g.nevcal.com> Message-ID: <49F83E34.4020005@g.nevcal.com> On approximately 4/29/2009 4:07 AM, came the following characters from the keyboard of R. David Murray: > On Tue, 28 Apr 2009 at 20:29, Glenn Linderman wrote: >> On approximately 4/28/2009 7:40 PM, came the following characters from >> the keyboard of R. David Murray: >>> On Tue, 28 Apr 2009 at 13:37, Glenn Linderman wrote: >>> > C. File on disk with the invalid surrogate code, accessed via the >>> str > interface, no decoding happens, matches in memory the file on >>> disk with > the byte that translates to the same surrogate, accessed >>> via the bytes > interface. Ambiguity. 
>>> >>> Unless I'm missing something, one of these is type str, and the >>> other is >>> type bytes, so no ambiguity. >> >> >> You are missing that the bytes value would get decoded to a str; thus >> both are str; so ambiguity is possible. > > Only if you as the programmer decode it. Now, I don't understand the > subtleties of Unicode enough to know if Martin has already successfully > addressed this concern in another fashion, but personally I think that > if you as a programmer are comparing funnydecoded-str strings gotten > via a string interface with normal-decoded strings gotten via a bytes > interface, that we could claim that your program has a bug. Hopefully Martin will clarify the PEP as I suggested in another branch of this thread. He has eventually convinced me that this ambiguity is not possible, via email discussion, but the PEP is certainly less than sufficiently explanatory to make that obvious. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Wed Apr 29 14:06:57 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 29 Apr 2009 05:06:57 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090429113653.GA22908@cskk.homeip.net> References: <20090429113653.GA22908@cskk.homeip.net> Message-ID: <49F842E1.1060008@g.nevcal.com> On approximately 4/29/2009 4:36 AM, came the following characters from the keyboard of Cameron Simpson: > On 29Apr2009 02:56, Glenn Linderman wrote: > >> os.listdir(b"") >> >> I find that on my Windows system, with all ASCII path file names, that I >> get quite different results when I pass os.listdir an empty str vs an >> empty bytes. >> >> Rather than keep you guessing, I get the root directory contents from >> the empty str, and the current directory contents from an empty bytes. 
>> That is rather unexpected. >> >> So I guess I'd better suggest that a specific, equivalent directory name >> be passed in either bytes or str form. >> > > I think you may have uncovered an implementation bug rather than an > encoding issue (because I'd expect "" and b"" to be equivalent). > Me too. > In ancient times, "" was a valid UNIX name for the working directory. > POSIX disallows that, and requires people to use ".". > > Maybe you're seeing an artifact; did python move from UNIX to Windows or the > other way around in its porting history? I'd guess the former. > > Do you get differing results from listdir(".") and listdir(b".") ? > No. Both are the same as b"" > How's python2 behave for ""? (Since there's no b"" in python2.) Python2 os.listdir("") produces the same thing as Python3 os.listdir(b"") Python2 os.listdir(u"") produces the same thing as Python3 os.listdir("") Another phenomenon of note: I created a directory named ábc. (Windows XP, Python 3.0.1, Python 2.6.1, SetConsoleOutputCP(65001)) Python3 os.listdir(b".") prints it as b"\xe1bc" Python2 os.listdir(".") prints it as b"\xe1bc" Python2 os.listdir(u".") prints it as u"\xe1bc" Python3 os.listdir(".") prints it as "ábc" -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From stephen at xemacs.org Wed Apr 29 14:18:42 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 29 Apr 2009 21:18:42 +0900 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> Message-ID: <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> Thomas Breuel writes: > PEP 383 violated (2), and I think that's a bad thing.
The whole purpose of PEP 383 is to send the exact same bytes that were read from the OS back to the OS => violating (2) (for whatever the apparent system file-encoding is, not limited to UTF-8), and that has overwhelmingly popular support. Note that this won't happen automatically, either, AIUI. The PEP's proposed implementation is as an error handler, and this would need to be specified explicitly. It's not intended to be the default. > I think the best solution would be to use (3a) and fall back to (3b) if that > doesn't work. If people try to write those strings, they will always get > written as correctly encoded UTF-8 strings. The intended audience aren't trying to write anything in particular, though. They just want to repeat verbatim what the OS told them. > There is yet another option, which is arguably the "right" one: make the > results of os.listdir() subclasses of string that keep track of where they > came from. Sure. This has been mentioned by several people. Martin has no intention of doing it in PEP 383, though, so it will need a new PEP. It has gotten strong pushback from several people, as well. From stephen at xemacs.org Wed Apr 29 15:14:18 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 29 Apr 2009 22:14:18 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <49F6933B.7020705@g.nevcal.com> <30565838.1863289.1240923804684.JavaMail.xicrypt@atgrzls001> <49F6FF49.6010205@avl.com> Message-ID: <87d4avk3f9.fsf@uwakimon.sk.tsukuba.ac.jp> Baptiste Carvello writes: > By contrast, if the new utf-8b codec would *supercede* the old one, > \udcxx would always mean raw bytes (at least on UCS-4 builds, where > surrogates are unused). Thus ambiguity could be avoided. 
Unfortunately, that's false. It could have come from a literal string (similar to the text above ;-), a C extension, or a string slice (on 16-bit builds), and there may be other ways to do it. The only way to avoid ambiguity is to change the definition of a Python string to be *valid* Unicode (possibly with Python extensions such as PEP 383 for internal use only). But Guido has rejected that in the past; validation is the application's problem, not Python's. Nor is a UCS-4 build exempt. IIRC Guido specifically envisioned Python strings being used to build up code point sequences to be directly output, which means that a UCS-4 string might none-the-less contain surrogates being added to a string intended to be sent as UTF-16 output simply by truncating the 32-bit code units to 16 bits. From stephen at xemacs.org Wed Apr 29 15:39:26 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 29 Apr 2009 22:39:26 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F73635.6010105@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> Message-ID: <87bpqfk29d.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > I find the case pretty artificial, though: if the locale encoding > changes, all file names will look incorrect to the user, so he'll > quickly switch back, or rename all the files. It's not necessarily the case that the locale encoding changes, but rather the name of the file. I have a couple of directories where I have Japanese in both EUC-JP and UTF-8, for example. 
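A Python sketch of the same try-one-encoding-then-the-other trick for such mixed directories (the encoding order and the surrogateescape last resort are choices of this illustration, not anything the PEP prescribes):

```python
def decode_name(raw: bytes) -> str:
    """Decode a filename that may be UTF-8 or EUC-JP encoded."""
    for enc in ("utf-8", "euc-jp"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            pass
    # Last resort: smuggle the undecodable bytes through as lone surrogates.
    return raw.decode("utf-8", "surrogateescape")

assert decode_name("日本語".encode("euc-jp")) == "日本語"
assert decode_name(b"\xff") == "\udcff"
```

The order matters: a short EUC-JP name can occasionally be valid UTF-8 as well, which is exactly why heuristics like this (or a date-based one) are guesses rather than guarantees.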
(The applications where I never bothered to do a conversion from EUC to UTF-8 are things like stripping MIME attachments from messages and saving them to files when I changed my default.) So I have a little Emacs Lisp function that tries EUC or UTF8 depending on date and falls back to the other on a decode error. Another possible situation would be a user program in the user's locale communicating with a daemon running in some other locale (quite likely POSIX). So while out of scope of the PEP, I don't think it's at all artificial. From skip at pobox.com Wed Apr 29 16:30:53 2009 From: skip at pobox.com (skip at pobox.com) Date: Wed, 29 Apr 2009 09:30:53 -0500 (CDT) Subject: [Python-Dev] string to float containing whitespace Message-ID: <20090429143053.3AE1D10276C9@montanaro.dyndns.org> Someone please tell me I'm not going mad. I could have sworn that once upon a time attempting to convert numeric strings to ints or floats if they contained whitespace raised an exception. As far back as 1.5.2 it appears that float(), string.atof() and string.atoi() allow whitespace. Maybe I'm thinking of trailing non-numeric, non-whitespace characters. Skip From amauryfa at gmail.com Wed Apr 29 17:13:42 2009 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 29 Apr 2009 17:13:42 +0200 Subject: [Python-Dev] string to float containing whitespace In-Reply-To: <20090429143053.3AE1D10276C9@montanaro.dyndns.org> References: <20090429143053.3AE1D10276C9@montanaro.dyndns.org> Message-ID: Hi, 2009/4/29 : > Someone please tell me I'm not going mad. ?I could have sworn that once upon > a time attempting to convert numeric strings to ints or floats if they > contained whitespace raised an exception. ?As far back as 1.5.2 it appears > that float(), string.atof() and string.atoi() allow whitespace. ?Maybe I'm > thinking of trailing non-numeric, non-whitespace characters. You are maybe referring to the Decimal constructor: decimal.Decimal(" 123") fails with 2.5, but works with 2.6. 
(issue 1780) -- Amaury Forgeot d'Arc From skip at pobox.com Wed Apr 29 17:26:17 2009 From: skip at pobox.com (skip at pobox.com) Date: Wed, 29 Apr 2009 10:26:17 -0500 Subject: [Python-Dev] string to float containing whitespace In-Reply-To: References: <20090429143053.3AE1D10276C9@montanaro.dyndns.org> Message-ID: <18936.29081.376818.250362@montanaro.dyndns.org> Amaury> You are maybe referring to the Decimal constructor: Amaury> decimal.Decimal(" 123") Amaury> fails with 2.5, but works with 2.6. (issue 1780) Highly unlikely, since my recollection is from way back in the early days. Also, I have yet to actually use the decimal module. :-/ Skip From theandromedan at gmail.com Wed Apr 29 17:42:11 2009 From: theandromedan at gmail.com (Paul Franz) Date: Wed, 29 Apr 2009 11:42:11 -0400 Subject: [Python-Dev] Installing Python 2.5.4 from Source under Windows Message-ID: <49F87553.7070501@gmail.com> I have looked and looked and looked. But I can not find any directions on how to install the version of Python build using Microsoft's compiler. It builds. I get the dlls and the exe's. But there is no documentation that says how to install what has been built. I have read every readme and stop by the IRC channel and there seems to be nothing. Any ideas where I can look? Paul Franz From aahz at pythoncraft.com Wed Apr 29 18:03:00 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 29 Apr 2009 09:03:00 -0700 Subject: [Python-Dev] Installing Python 2.5.4 from Source under Windows In-Reply-To: <49F87553.7070501@gmail.com> References: <49F87553.7070501@gmail.com> Message-ID: <20090429160300.GA10295@panix.com> On Wed, Apr 29, 2009, Paul Franz wrote: > > I have looked and looked and looked. But I can not find any directions > on how to install the version of Python build using Microsoft's > compiler. It builds. I get the dlls and the exe's. But there is no > documentation that says how to install what has been built. 
I have read > every readme and stop by the IRC channel and there seems to be nothing. > > Any ideas where I can look? Please use comp.lang.python -- python-dev is for discussion of core development. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From theandromedan at gmail.com Wed Apr 29 19:08:11 2009 From: theandromedan at gmail.com (Paul Franz) Date: Wed, 29 Apr 2009 13:08:11 -0400 Subject: [Python-Dev] Installing Python 2.5.4 from Source under Windows In-Reply-To: <20090429160300.GA10295@panix.com> References: <49F87553.7070501@gmail.com> <20090429160300.GA10295@panix.com> Message-ID: <49F8897B.5080805@gmail.com> Ok. I will ask on the python-list. Paul Franz Aahz wrote: > On Wed, Apr 29, 2009, Paul Franz wrote: > >> I have looked and looked and looked. But I can not find any directions >> on how to install the version of Python build using Microsoft's >> compiler. It builds. I get the dlls and the exe's. But there is no >> documentation that says how to install what has been built. I have read >> every readme and stop by the IRC channel and there seems to be nothing. >> >> Any ideas where I can look? >> > > Please use comp.lang.python -- python-dev is for discussion of core > development. > From larry at hastings.org Wed Apr 29 22:01:38 2009 From: larry at hastings.org (Larry Hastings) Date: Wed, 29 Apr 2009 13:01:38 -0700 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath Message-ID: <49F8B222.7070204@hastings.org> I've written a patch for Python 3.1 that changes os.path so it handles UNC paths on Windows: http://bugs.python.org/issue5799 In a Windows path string, a UNC path functions *exactly* like a drive letter. This patch means that the Python path split/join functions treats them as if they were. 
For instance: >>> splitdrive("A:\\FOO\\BAR.TXT") ("A:", "\\FOO\\BAR.TXT") With this patch applied: >>> splitdrive("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT") ("\\\\HOSTNAME\\SHARE", "\\FOO\\BAR.TXT") This methodology only breaks down in one place: there is no "default directory" for a UNC share point. E.g. you can say >>> os.chdir("c:") or >>> os.chdir("c:foo\\bar") but you can't say >>> os.chdir("\\\\hostname\\share") But this is irrelevant to the patch. Here's what my patch changes: * Modify join, split, splitdrive, and ismount to add explicit support for UNC paths. (The other functions pick up support from these four.) * Simplify isabs and normpath, now that they don't need to be delicate about UNC paths. * Modify existing unit tests and add new ones. * Document the changes to the API. * Deprecate splitunc, with a warning and a documentation remark. This patch adds one subtle change I hadn't expected. If you call split() with a drive letter followed by a trailing slash, it returns the trailing slash as part of the "head" returned. E.g. >>> os.path.split("\\") ("\\", "") >>> os.path.split("A:\\") ("A:\\", "") This is mentioned in the documentation, as follows: Trailing slashes are stripped from head unless it is the root (one or more slashes only). For some reason, when os.path.split was called with a UNC path with only a trailing slash, it stripped the trailing slash: >>> os.path.split("\\\\hostname\\share\\") ("\\\\hostname\\share", "") My patch changes this behavior; you would now see: >>> os.path.split("\\\\hostname\\share\\") ("\\\\hostname\\share\\", "") I think it's an improvement--this is more consistent. Note that this does *not* break the documented requirement that os.path.join(*os.path.split(path)) == path; that continues to work fine. In the interests of full disclosure: I submitted a patch providing this exact behavior just over ten years ago. GvR accepted it into Python 1.5.2b2 (marked "*EXPERIMENTAL*") and removed it from 1.5.2c1.
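The splitdrive behavior described above can be sketched in pure Python. `splitdrive_unc` is a hypothetical name used for illustration only — the actual patch modifies `ntpath.splitdrive` itself:

```python
def splitdrive_unc(p):
    """Split a Windows path into (drive-or-UNC-share, rest).

    Sketch of the rules Larry describes: a \\host\share prefix is
    treated exactly like a drive letter.
    """
    if p.startswith("\\\\"):
        sep1 = p.find("\\", 2)             # backslash ending the host name
        if sep1 > 2:
            sep2 = p.find("\\", sep1 + 1)  # backslash ending the share name
            if sep2 == -1:
                return p, ""               # bare \\host\share
            return p[:sep2], p[sep2:]
        return "", p                       # malformed UNC prefix
    if p[1:2] == ":":
        return p[:2], p[2:]                # classic drive letter
    return "", p

print(splitdrive_unc("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT"))
```

Treating the `\\host\share` prefix as the "drive" is what lets split, join, isabs, and the rest fall out of the same few primitives.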
You can read GvR's commentary upon removing it; see comments in Misc/HISTORY dated "Tue Apr 6 19:38:18 1999". If memory serves correctly, the problems cited were only on Cygwin. At the time Cygwin used "ntpath", and it supported "//a/foo" as an alias for "A:\\FOO". You can see how this would cause Cygwin problems. In the intervening decade, two highly relevant things have happened: * Python no longer uses ntpath for os.path on Cygwin. Instead it uses posixpath. * Cygwin removed the "//a/foo" drive letter hack. In fact, I believe it now support UNC paths. Therefore this patch will have no effect on Cygwin users. What do you think? /larry/ From martin at v.loewis.de Wed Apr 29 22:06:33 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 22:06:33 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F80691.80403@g.nevcal.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> <49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com> Message-ID: <49F8B349.30901@v.loewis.de> > In the first paragraph, you should make it clear that Python 3.0 does > not use the Windows bytes interfaces, if it doesn't. "Python uses > *only* the wide character APIs..." would suffice. That's not quite exact. It uses both ANSI and Wide APIs - depending on whether you pass bytes as input or strings. Please see the Python source code to find out how this works, and what that means. 
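Schematically, the dispatch Martin describes is just a type test on the argument. The backend names below are invented for illustration — the real logic is C code in posixmodule.c:

```python
def listdir_windows(path, wide_api, ansi_api):
    """Route str arguments to the wide ("W") API and bytes arguments to
    the ANSI ("A") API, mirroring the str/bytes split Martin describes."""
    if isinstance(path, bytes):
        return ansi_api(path)   # bytes in -> FindFirstFileA-style, bytes out
    return wide_api(path)       # str in  -> FindFirstFileW-style, str out

# Stand-in backends showing the two result types:
wide = lambda p: ["abc\udc10"]
ansi = lambda p: [b"abc?"]
assert listdir_windows("c:/tmp", wide, ansi) == ["abc\udc10"]
assert listdir_windows(b"c:/tmp", wide, ansi) == [b"abc?"]
```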
> As stated, it seems > like Python *does* use the wide character APIs, but leaves open the > possibility that it might use byte APIs also. A short description of > what happens on Windows when Python code uses bytes APIs would also be > helpful. I'm at a loss how to make the text more clear than it already is. I'm really not good at writing long essays, with a lot of explanatory-but-non-normative text. I also think that explanations do not belong in the section titled specification, nor does a full description of the status quo belongs into the PEP at all. The reader should consult the current Python source code if in doubt what the status quo is. > In the second paragraph, it speaks of "currently" but then speaks of > using the half-surrogates. I don't believe that happens "currently". > You did change tense, but that paragraph is quite confusing, currently, > because of the tense change. You should describe there, the action that > is currently taken by Python for non-decodable byes, and then in the > next paragraph talk about what the PEP changes. Thanks, fixed. > The 4th paragraph is now confusing too... would it not be the decode > error handler that returns the byte strings, in addition to the Unicode > strings? No, why do you think so? That's intended as stated. > The 5th paragraph has apparently confused some people into thinking this > PEP only applies to locale's using UTF-8 encodings; you should have an > "else clause" to clear that up, pointing out that the reverse encoding > of half-surrogates by other encodings already produces errors, that > UTF-8 is a special case, not the only case. I have fixed that by extending the third paragraph. > The code added to the discussion has mismatched (), making me wonder if > it is complete. There is a reasonable possibility that only the final ) > is missing. Indeed; this is now also fixed. 
Regards, Martin From martin at v.loewis.de Wed Apr 29 22:15:12 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 22:15:12 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com> Message-ID: <49F8B550.9070808@v.loewis.de> > Sure. However, that requires you to provide meaningful, reproducible > counter-examples, rather than a stenographic formulation that might > hint some problem you apparently see (which I believe is just not > there). > > > Well, here's another one: PEP 383 would disallow UTF-8 encodings of half > surrogates. But such encodings are currently supported by Python, and > they are used as part of CESU-8 coding. That's, in fact, a common way > of converting UTF-16 to UTF-8. How are you going to deal with existing > code that relies on being able to code half surrogates as UTF-8? Can you please elaborate? What code specifically are you talking about? 
Regards, Martin From martin at v.loewis.de Wed Apr 29 22:28:54 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 22:28:54 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F82435.3060205@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com> Message-ID: <49F8B886.5020700@v.loewis.de> >>>>>>> C. File on disk with the invalid surrogate code, accessed via the >>>>>>> str interface, no decoding happens, matches in memory the file on disk >>>>>>> with the byte that translates to the same surrogate, accessed via the >>>>>>> bytes interface. Ambiguity. >> What does that mean? What specific interface are you referring to to >> obtain file names? > > os.listdir("") > > os.listdir(b"") > > So I guess I'd better suggest that a specific, equivalent directory name > be passed in either bytes or str form. [Leaving the issue of the empty string apparently having different meanings aside ...] Ok. Now I understand the example. So you do os.listdir("c:/tmp") os.listdir(b"c:/tmp") and you have a file in c:/tmp that is named "abc\uDC10". > So what you are saying here is that Python doesn't use the "A" forms of > the Windows APIs for filenames, but only the "W" forms, and uses lossy > decoding (from MS) to the current code page (which can never be UTF-8 on > Windows). 
Actually, it does use the A form, in the second listdir example. This, in turn (inside Windows), uses the lossy CP_ACP encoding. You get back a byte string; the listdirs should give ["abc\uDC10"] [b"abc?"] (not quite sure about the second - I only guess that CP_ACP will replace the half surrogate with a question mark). So where is the ambiguity here? > You are further saying that Python doesn't give the programmer control > over the codec that is used to convert from W results to bytes, so that > on Windows, it is impossible to obtain a bytes result containing UTF-8 > from os.listdir, even though sys.setfilesystemencoding exists, and > sys.getfilesystemencoding is affected by it, and the latter is > documented as returning "mbcs", and as returning the codec that should > be used by the application to convert str to bytes for filenames. > (Python 3.0.1). Not exactly. You *can* do setfilesystemencoding on Windows, but it has no effect, as the Python file system encoding is never used on Windows. For a string, it passes it to the W API as is; for bytes, it passes it to the A API as-is. Python never invokes any codec here. > While I can hear a "that is outside the scope of the PEP" coming, this > documentation is confusing, to say the least. Only because you are apparently unaware of the status quo. If you would study the current Python source code, it would be all very clear. > Things are a little clearer in the documentation for > sys.setfilesystemencoding, which does say the encoding isn't used by > Windows -- so why is it permitted to change it, if it has no effect?). As in many cases: because nobody contributed code to make it behave otherwise. It's not that the file system encoding is "mbcs" - the file system encoding is simply unused on Windows (but that wasn't always the case, in particular not when Windows 9x still had to be supported). 
Regards, Martin From martin at v.loewis.de Wed Apr 29 22:35:17 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 22:35:17 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <87bpqfk29d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <87bpqfk29d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <49F8BA05.2070906@v.loewis.de> > So while out of scope of the PEP, I don't think it's at all > artificial. Sure - but I see this as the same case as "the file got renamed". If you have a LRU list in your app, and a file gets renamed, then the LRU list breaks (unless you also store the inode number in the LRU list, and lookup the file by inode number - or object UUID on NTFS, possibly using distributed link tracking). Regards, Martin From tjreedy at udel.edu Wed Apr 29 22:59:57 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 29 Apr 2009 16:59:57 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F842E1.1060008@g.nevcal.com> References: <20090429113653.GA22908@cskk.homeip.net> <49F842E1.1060008@g.nevcal.com> Message-ID: Glenn Linderman wrote: > On approximately 4/29/2009 4:36 AM, came the following characters from > the keyboard of Cameron Simpson: >> On 29Apr2009 02:56, Glenn Linderman wrote: >> >>> os.listdir(b"") >>> >>> I find that on my Windows system, with all ASCII path file names, >>> that I get quite different results when I pass os.listdir an empty >>> str vs an empty bytes. 
>>> >>> Rather than keep you guessing, I get the root directory contents >>> from the empty str, and the current directory contents from an empty >>> bytes. That is rather unexpected. >>> >>> So I guess I'd better suggest that a specific, equivalent directory >>> name be passed in either bytes or str form. >>> >> >> I think you may have uncovered an implementation bug rather than an >> encoding issue (because I'd expect "" and b"" to be equivalent). >> > > Me too. Sounds like an issue for the tracker. From tjreedy at udel.edu Wed Apr 29 23:03:30 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 29 Apr 2009 17:03:30 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com> References: <49EEBE2E.3090601@v.loewis.de> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com> Message-ID: Thomas Breuel wrote: > > Sure. However, that requires you to provide meaningful, reproducible > counter-examples, rather than a stenographic formulation that might > hint some problem you apparently see (which I believe is just not > there). > > > Well, here's another one: PEP 383 would disallow UTF-8 encodings of half > surrogates. By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows that. > But such encodings are currently supported by Python, and > they are used as part of CESU-8 coding. That's, in fact, a common way > of converting UTF-16 to UTF-8. How are you going to deal with existing > code that relies on being able to code half surrogates as UTF-8? 
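Terry's reading matches what the strict codec enforces on a current Python 3 (in the 3.0 era the UTF-8 codec still let lone surrogates through on encode, which is the behavior Thomas is pointing at), and it is the surrogateescape handler, not the strict codec, that round-trips raw bytes:

```python
# Strict UTF-8 rejects a lone surrogate on encode (the Unicode 5.1 reading)...
try:
    "\udc80".encode("utf-8")
    lone_surrogate_encodes = True
except UnicodeEncodeError:
    lone_surrogate_encodes = False
assert not lone_surrogate_encodes

# ...while the PEP's error handler exists precisely to round-trip bytes:
name = b"abc\xff".decode("utf-8", "surrogateescape")
print(repr(name))  # 'abc\udcff'
assert name.encode("utf-8", "surrogateescape") == b"abc\xff"
```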
From v+python at g.nevcal.com Wed Apr 29 23:09:26 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 29 Apr 2009 14:09:26 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F8B886.5020700@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com> <49F8B886.5020700@v.loewis.de> Message-ID: <49F8C206.5070801@g.nevcal.com> On approximately 4/29/2009 1:28 PM, came the following characters from the keyboard of Martin v. L?wis: >>>>>>>> C. File on disk with the invalid surrogate code, accessed via the >>>>>>>> str interface, no decoding happens, matches in memory the file on disk >>>>>>>> with the byte that translates to the same surrogate, accessed via the >>>>>>>> bytes interface. Ambiguity. >>> What does that mean? What specific interface are you referring to to >>> obtain file names? >> os.listdir("") >> >> os.listdir(b"") >> >> So I guess I'd better suggest that a specific, equivalent directory name >> be passed in either bytes or str form. > > [Leaving the issue of the empty string apparently having different > meanings aside ...] > > Ok. Now I understand the example. So you do > > os.listdir("c:/tmp") > os.listdir(b"c:/tmp") > > and you have a file in c:/tmp that is named "abc\uDC10". 
> >> So what you are saying here is that Python doesn't use the "A" forms of >> the Windows APIs for filenames, but only the "W" forms, and uses lossy >> decoding (from MS) to the current code page (which can never be UTF-8 on >> Windows). > > Actually, it does use the A form, in the second listdir example. This, > in turn (inside Windows), uses the lossy CP_ACP encoding. You get back > a byte string; the listdirs should give > > ["abc\uDC10"] > [b"abc?"] > > (not quite sure about the second - I only guess that CP_ACP will replace > the half surrogate with a question mark). > > So where is the ambiguity here? None. But not everyone can read all the Python source code to try to understand it; they expect the documentation to help them avoid that. Because the documentation is lacking in this area, it makes your concisely stated PEP rather hard to understand. Thanks for clarifying the Windows behavior, here. A little more clarification in the PEP could have avoided lots of discussion. It would seem that a PEP, proposed to modify a poorly documented (and therefore likely poorly understood) area, should be educational about the status quo, as well as presenting the suggested change. Or is it the Python philosophy that the PEPs should be as incomprehensible as possible, to generate large discussions? -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Wed Apr 29 23:17:32 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 23:17:32 +0200 Subject: [Python-Dev] string to float containing whitespace In-Reply-To: <20090429143053.3AE1D10276C9@montanaro.dyndns.org> References: <20090429143053.3AE1D10276C9@montanaro.dyndns.org> Message-ID: <49F8C3EC.8020001@v.loewis.de> skip at pobox.com wrote: > Someone please tell me I'm not going mad. 
I could have sworn that once upon > a time attempting to convert numeric strings to ints or floats if they > contained whitespace raised an exception. As far back as 1.5.2 it appears > that float(), string.atof() and string.atoi() allow whitespace. Maybe I'm > thinking of trailing non-numeric, non-whitespace characters. Maybe you remember truly *embedded* whitespace: py> float("1. 3") Traceback (most recent call last): File "", line 1, in ValueError: invalid literal for float(): 1. 3 Regards, Martin From martin at v.loewis.de Wed Apr 29 23:19:25 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Apr 2009 23:19:25 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <49F8C45D.6060302@v.loewis.de> > The whole purpose of PEP 383 is to send the exact same bytes that were > read from the OS back to the OS => violating (2) (for whatever the > apparent system file-encoding is, not limited to UTF-8), and that has > overwhelmingly popular support. > > Note that this won't happen automatically, either, AIUI. The PEP's > proposed implementation is as an error handler, and this would need to > be specified explicitly. It's not intended to be the default. Actually, no: the error handler will be automatically used in all places that convert file names to bytes. I have clarified the PEP to make that explicit. IOW, it replaces the current "strict" setting in these cases. Regards, Martin From cs at zip.com.au Wed Apr 29 23:49:31 2009 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 30 Apr 2009 07:49:31 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: Message-ID: <20090429214931.GA3303@cskk.homeip.net> On 29Apr2009 17:03, Terry Reedy wrote: > Thomas Breuel wrote: >> Sure. 
However, that requires you to provide meaningful, reproducible >> counter-examples, rather than a stenographic formulation that might >> hint some problem you apparently see (which I believe is just not >> there). >> >> Well, here's another one: PEP 383 would disallow UTF-8 encodings of >> half surrogates. > > By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows that. 5.0 also disallows it. No surprise I guess. -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Out on the road, feeling the breeze, passing the cars. - Bob Seger From v+python at g.nevcal.com Thu Apr 30 00:17:42 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 29 Apr 2009 15:17:42 -0700 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F8B349.30901@v.loewis.de> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> <49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com> <49F8B349.30901@v.loewis.de> Message-ID: <49F8D206.2000104@g.nevcal.com> On approximately 4/29/2009 1:06 PM, came the following characters from the keyboard of Martin v. L?wis: > Thanks, fixed. Thanks for your fixes. They are helpful. > I'm at a loss how to make the text more clear than it already is. I'm > really not good at writing long essays, with a lot of > explanatory-but-non-normative text. I also think that explanations do > not belong in the section titled specification, nor does a full > description of the status quo belongs into the PEP at all. 
The reader > should consult the current Python source code if in doubt what the > status quo is. The status quo is what justifies the existence of the PEP. If the status quo were perfect, there would be no need for the PEP. The status quo should be described in the Rationale. Some of it is. The rest of it should be. If there is a need for this PEP for POSIX, but not Windows, the reason why should be given (Para 2 in Rationale seems to try to describe that, but doesn't go far enough), and also the reason that cross-platform code can install this PEP's error handler on both platforms, yet it won't affect bytes interfaces on Windows. These are two omissions that have both caused large amounts of discussion. Attempting to understand the Python source code is a good thing, but there is a lot to understand, and few will achieve a full understanding. >> The 4th paragraph is now confusing too... would it not be the decode >> error handler that returns the byte strings, in addition to the Unicode >> strings? > > No, why do you think so? That's intended as stated. Here, a use case, or several, in the PEP could help clarify why it would be the encode error handler that would return both the bytes string and the Unicode string. And why the decode error handler would not need to. Seems that if the decode handler preserved the bytes from the OS, and made them available as well as the decoded Unicode, that could be interesting to the application that is wanting to manipulate the file. Seems that the encode handler is given the Unicode, so it's not clear why it should also return it. I guess if there is an error during the encode process (can there be?) then the bytes and Unicode for comparison could be useful for error reporting. But I shouldn't have to guess. The PEP should explain how these things are useful. The discussion section could be extended with use cases for both the encode and decode cases.
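For what it's worth, one decode-side use case can be sketched without the
handler having to return both forms, because re-encoding recovers the
original bytes (a hypothetical sketch, assuming the PEP's escaping
handler is available under the name "surrogateescape"):

```python
# Hypothetical sketch: an app that wants both the displayable text and
# the original OS bytes for a PEP-decoded filename can re-encode, so
# the decode handler need not return the bytes alongside the str.
def text_and_bytes(name, fs_encoding="utf-8"):
    """Return (unicode_name, original_os_bytes) for a decoded name."""
    return name, name.encode(fs_encoding, "surrogateescape")

pair = text_and_bytes("abc\udcff")
assert pair == ("abc\udcff", b"abc\xff")
```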
-- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From cs at zip.com.au Thu Apr 30 00:45:32 2009 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 30 Apr 2009 08:45:32 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <87d4avk3f9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20090429224532.GA11604@cskk.homeip.net> On 29Apr2009 22:14, Stephen J. Turnbull wrote: | Baptiste Carvello writes: | > By contrast, if the new utf-8b codec would *supercede* the old one, | > \udcxx would always mean raw bytes (at least on UCS-4 builds, where | > surrogates are unused). Thus ambiguity could be avoided. | | Unfortunately, that's false. It could have come from a literal string | (similar to the text above ;-), a C extension, or a string slice (on | 16-bit builds), and there may be other ways to do it. The only way to | avoid ambiguity is to change the definition of a Python string to be | *valid* Unicode (possibly with Python extensions such as PEP 383 for | internal use only). But Guido has rejected that in the past; | validation is the application's problem, not Python's. | | Nor is a UCS-4 build exempt. IIRC Guido specifically envisioned | Python strings being used to build up code point sequences to be | directly output, which means that a UCS-4 string might none-the-less | contain surrogates being added to a string intended to be sent as | UTF-16 output simply by truncating the 32-bit code units to 16 bits. Wouldn't you then be bypassing the implicit encoding anyway, at least to some extent, and thus not trip over the PEP? -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Clemson is the Harvard of cardboard packaging. 
- overhead by WIRED at the Intelligent Printing conference Oct2006 From fuzzyman at voidspace.org.uk Thu Apr 30 00:50:08 2009 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 29 Apr 2009 23:50:08 +0100 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <49F8B222.7070204@hastings.org> References: <49F8B222.7070204@hastings.org> Message-ID: <49F8D9A0.7000104@voidspace.org.uk> Larry Hastings wrote: > > I've written a patch for Python 3.1 that changes os.path so it handles > UNC paths on Windows: > > http://bugs.python.org/issue5799 +1 for the feature. I have to deal with Windows networks from time to time and this would be useful. Michael > > In a Windows path string, a UNC path functions *exactly* like a drive > letter. This patch means that the Python path split/join functions > treats them as if they were. > > For instance: > >>> splitdrive("A:\\FOO\\BAR.TXT") > ("A:", "\\FOO\\BAR.TXT") > > With this patch applied: > >>> splitdrive("\\\\HOSTNAME\\SHARE\\FOO\\BAR.TXT") > ("\\\\HOSTNAME\\SHARE", "\\FOO\\BAR.TXT") > > This methodology only breaks down in one place: there is no "default > directory" for a UNC share point. E.g. you can say > >>> os.chdir("c:") > or > >>> os.chdir("c:foo\\bar") > but you can't say > >>> os.chdir("\\\\hostname\\share") > But this is irrelevant to the patch. > > Here's what my patch changes: > * Modify join, split, splitdrive, and ismount to add explicit support > for UNC paths. (The other functions pick up support from these four.) > * Simplify isabs and normpath, now that they don't need to be delicate > about UNC paths. > * Modify existing unit tests and add new ones. > * Document the changes to the API. > * Deprecate splitunc, with a warning and a documentation remark. > > This patch adds one subtle change I hadn't expected. If you call > split() with a drive letter followed by a trailing slash, it returns the > trailing slash as part of the "head" returned. E.g. 
> >>> os.path.split("\\") > ("\\", "") > >>> os.path.split("A:\\") > ("A:\\", "") > This is mentioned in the documentation, as follows: > Trailing slashes are stripped from head unless it is the root > (one or more slashes only). > > For some reason, when os.path.split was called with a UNC path with only > a trailing slash, it stripped the trailing slash: > >>> os.path.split("\\\\hostname\\share\\") > ("\\\\hostname\\share", "") > My patch changes this behavior; you would now see: > >>> os.path.split("\\\\hostname\\share\\") > ("\\\\hostname\\share\\", "") > I think it's an improvement--this is more consistent. Note that this > does *not* break the documented requirement that > os.path.join(os.path.split(path)) == path; that continues to work fine. > > > In the interests of full disclosure: I submitted a patch providing this > exact behavior just over ten years ago. GvR accepted it into Python > 1.5.2b2 (marked "*EXPERIMENTAL*") and removed it from 1.5.2c1. > > You can read GvR's commentary upon removing it; see comments in > Misc/HISTORY > dated "Tue Apr 6 19:38:18 1999". If memory serves > correctly, the problems cited were only on Cygwin. At the time Cygwin > used "ntpath", and it supported "//a/foo" as an alias for "A:\\FOO". > You can see how this would cause Cygwin problems. > > In the intervening decade, two highly relevant things have happened: > * Python no longer uses ntpath for os.path on Cygwin. Instead it uses > posixpath. > * Cygwin removed the "//a/foo" drive letter hack. In fact, I believe it > now support UNC paths. > Therefore this patch will have no effect on Cygwin users. > > > What do you think? 
> > > /larry/ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From barry at barrys-emacs.org Thu Apr 30 00:41:16 2009 From: barry at barrys-emacs.org (Barry Scott) Date: Wed, 29 Apr 2009 23:41:16 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49EEBE2E.3090601@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> Message-ID: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> On 22 Apr 2009, at 07:50, Martin v. Löwis wrote: > > If the locale's encoding is UTF-8, the file system encoding is set to > a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes > (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. > Forgive me if this has been covered. I've been reading this thread for a long time and still have a 100 odd replies to go... How do I get a printable unicode version of these path strings if they contain non-unicode data? I'm guessing that an app has to understand that filenames come in two forms, unicode and bytes, if it's not utf-8 data. Why not simply return a string if it's valid utf-8 and otherwise return bytes? Then in the app you check the type of the object, string or bytes, and deal with reporting errors appropriately.
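The check-the-type scheme just described might look like this in
practice (a sketch only -- it mirrors what 2.x's os.listdir does for
unicode arguments, not what the PEP proposes):

```python
import os

# Sketch of the mixed str/bytes scheme: names that decode cleanly come
# back as str, undecodable ones are left as bytes, and the app must
# branch on type when reporting.
def listdir_mixed(path=b"."):
    names = []
    for name in os.listdir(path):        # bytes in, bytes out
        try:
            names.append(name.decode("utf-8"))
        except UnicodeDecodeError:
            names.append(name)           # not valid utf-8: keep bytes
    return names

for name in listdir_mixed():
    label = name if isinstance(name, str) else repr(name)
```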
Barry From eric at trueblade.com Thu Apr 30 00:59:25 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 29 Apr 2009 18:59:25 -0400 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <49F8D9A0.7000104@voidspace.org.uk> References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk> Message-ID: <49F8DBCD.6050504@trueblade.com> Michael Foord wrote: > Larry Hastings wrote: >> >> I've written a patch for Python 3.1 that changes os.path so it handles >> UNC paths on Windows: >> >> http://bugs.python.org/issue5799 > > +1 for the feature. I have to deal with Windows networks from time to > time and this would be useful. +1 from me, too. I haven't looked at the implementation, but for sure the feature would be welcome. >> In the interests of full disclosure: I submitted a patch providing this >> exact behavior just over ten years ago. GvR accepted it into Python >> 1.5.2b2 (marked "*EXPERIMENTAL*") and removed it from 1.5.2c1. >> In the intervening decade, two highly relevant things have happened: >> * Python no longer uses ntpath for os.path on Cygwin. Instead it uses >> posixpath. >> * Cygwin removed the "//a/foo" drive letter hack. In fact, I believe it >> now support UNC paths. >> Therefore this patch will have no effect on Cygwin users. Yes, cygwin supports UNC paths with //host/share, and they use /cygdrive/a, etc., to refer to physical drives. It's been that way for as long as I recall, at least 7 years. From cs at zip.com.au Thu Apr 30 01:28:52 2009 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 30 Apr 2009 09:28:52 +1000 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> Message-ID: <20090429232852.GA26172@cskk.homeip.net> On 29Apr2009 23:41, Barry Scott wrote: > On 22 Apr 2009, at 07:50, Martin v. 
Löwis wrote: >> If the locale's encoding is UTF-8, the file system encoding is set to >> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes >> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF. > > Forgive me if this has been covered. I've been reading this thread for a > long time and still have a 100 odd replies to go... > > How do I get a printable unicode version of these path strings if they > contain non-unicode data? Personally, I'd use repr(). One might ask, what would you expect to see if you were printing such a string? > I'm guessing that an app has to understand that filenames come in two > forms, unicode and bytes, if it's not utf-8 data. Why not simply return a string if > it's valid utf-8 and otherwise return bytes? Then in the app you check the type of > the object, string or bytes, and deal with reporting errors appropriately. Because it complicates the app enormously, for every app. It would be _nice_ to just call os.listdir() et al with strings, get strings, and not worry. With strings becoming unicode in Python3, on POSIX you have an issue of deciding how to get its filenames-are-bytes into a string and the reverse. One could naively map the byte values to the same Unicode code points, but that results in strings that do not contain the same characters as the user/app expects for byte values above 127. Since POSIX does not really have a filesystem level character encoding, just a user environment setting that says how the current user encodes characters into bytes (UTF-8 is increasingly common and useful, but it is not universal), it is more useful to decode filenames on the assumption that they represent characters in the user's (current) encoding convention; that way when things are displayed they are meaningful, and they interoperate well with strings made by the user/app. If all the filenames were actually encoded that way when made, that works.
But different users may adopt different conventions, and indeed a user may have used ASCII or an ISO8859-* coding in the past and be transitioning to something else now, so they will have a bunch of files in different encodings. The PEP uses the user's current encoding with a handler for byte sequences that don't decode to valid Unicode scalar values, in a fashion that is reversible. That is, you get "strings" out of listdir() and those strings will go back in (eg to open()) perfectly robustly. Previous approaches would either silently hide non-decodable names in listdir() results, or throw exceptions when the decode failed, or mangle things non-reversibly. I believe Python3 went with the first option there. The PEP at least lets programs naively access all files that exist, and create a filename from any well-formed unicode string provided that the filesystem encoding permits the name to be encoded. The lengthy discussion mostly revolves around: - Glenn points out that strings that came _not_ from listdir, and that are _not_ well-formed unicode (== "have bare surrogates in them") but that were intended for use as filenames will conflict with the PEP's scheme - programs must know that these strings came from outside and must be translated into the PEP's funny-encoding before use in the os.* functions. Previous to the PEP they would get used directly and encode differently after the PEP, thus producing different POSIX filenames. Breakage. - Glenn would like the encoding to use Unicode scalar values only, using a rare-in-filenames character. That would avoid the issue with "outside" strings that contain surrogates. To my mind it just moves the punning from rare illegal strings to merely uncommon but legal characters.
- Some parties think it would be better to not return strings from os.listdir but a subclass of string (or at least a duck-type of string) that knows where it came from and is also handily recognisable as not-really-a-string for purposes of deciding whether it is PEP-funny-encoded by direct inspection. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ The peever can look at the best day in his life and sneer at it. - Jim Hill, JennyGfest '95 From aahz at pythoncraft.com Thu Apr 30 04:50:50 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 29 Apr 2009 19:50:50 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090429232852.GA26172@cskk.homeip.net> References: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> <20090429232852.GA26172@cskk.homeip.net> Message-ID: <20090430025050.GB1544@panix.com> On Thu, Apr 30, 2009, Cameron Simpson wrote: > > The lengthy discussion mostly revolves around: > > - Glenn points out that strings that came _not_ from listdir, and that are > _not_ well-formed unicode (== "have bare surrogates in them") but that > were intended for use as filenames will conflict with the PEP's scheme - > programs must know that these strings came from outside and must be > translated into the PEP's funny-encoding before use in the os.* > functions. Previous to the PEP they would get used directly and > encode differently after the PEP, thus producing different POSIX > filenames. Breakage. > > - Glenn would like the encoding to use Unicode scalar values only, > using a rare-in-filenames character. > That would avoid the issue with "outside" strings that contain > surrogates. To my mind it just moves the punning from rare illegal > strings to merely uncommon but legal characters.
> > - Some parties think it would be better to not return strings from > os.listdir but a subclass of string (or at least a duck-type of > string) that knows where it came from and is also handily > recognisable as not-really-a-string for purposes of deciding > whether is it PEP-funny-encoded by direct inspection. Assuming people agree that this is an accurate summary, it should be incorporated into the PEP. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From aahz at pythoncraft.com Thu Apr 30 05:16:29 2009 From: aahz at pythoncraft.com (Aahz) Date: Wed, 29 Apr 2009 20:16:29 -0700 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F8B349.30901@v.loewis.de> References: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> <49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com> <49F8B349.30901@v.loewis.de> Message-ID: <20090430031629.GB25125@panix.com> On Wed, Apr 29, 2009, "Martin v. L?wis" wrote: > > I'm at a loss how to make the text more clear than it already is. I'm > really not good at writing long essays, with a lot of > explanatory-but-non-normative text. I also think that explanations do > not belong in the section titled specification, nor does a full > description of the status quo belongs into the PEP at all. The reader > should consult the current Python source code if in doubt what the > status quo is. Perhaps not a full description of the status quo, but the PEP definitely needs a good summary -- remember that PEPs are not just for the time that they are written, but also for the future. 
While telling people to "read the source, Luke" makes some sense at a specific point in time, I don't think that requiring a trawl through code history is fair. And, yes, PEP-writing is painful. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From tmbdev at gmail.com Thu Apr 30 05:16:20 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 05:16:20 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> > > The whole purpose of PEP 383 is to send the exact same bytes that were > read from the OS back to the OS => violating (2) (for whatever the > apparent system file-encoding is, not limited to UTF-8), It's fine to read a file name from a file system and write the same file back as the same raw byte sequence. That I don't have a problem with; it's not quite right, but it's harmless. The problem with this PEP is that the malformed unicode it produces can end up in so many other places: as file names on another file system, in string processing libraries, in text files, in databases, in user interfaces, etc. Some of those destinations will use the utf-8b decoder, so they will get byte sequences that never could occur before and that are illegal under unicode. Nobody knows what will happen. And, yes, Martin is proposing that this is the default behavior. There are several other issues that are unresolved: utf-8b makes some current practices illegal; for example, it might break CESU-8 encodings. Also, what are Jython and IronPython supposed to do on UNIX? Can they implement these semantics at all? > and that has overwhelmingly popular support. 
I think people don't fully understand the tradeoffs. I certainly don't. Although there is a slight benefit, there are unknown and potentially large costs. We'd be changing Python's entire unicode string behavior for the sake of one use cases. Since our uses of Python actually involve a lot of unicode, I am wary of having malformed unicode crop up legally in Python code. And that's why I think this proposal should be shelved for a while until people have had more time to try to understand the issues and also come up with alternative proposals. Once this is adopted and implemented in C-Python, Python is stuck with it forever. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From curt at hagenlocher.org Thu Apr 30 05:40:07 2009 From: curt at hagenlocher.org (Curt Hagenlocher) Date: Wed, 29 Apr 2009 20:40:07 -0700 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> Message-ID: On Wed, Apr 29, 2009 at 8:16 PM, Thomas Breuel wrote: > > Also, what are Jython and IronPython supposed to do on UNIX?? Can they > implement these semantics at all? IronPython will inherit whatever behavior Mono has implemented. The Microsoft CLR defines the native string type as UTF-16 and all of the managed APIs for things like file names and environmental variables operate on UTF-16 strings -- there simply are no byte string APIs. I assume that Mono does the same but I don't have any Mono experience. -- Curt Hagenlocher curt at hagenlocher.org From steve at pearwood.info Thu Apr 30 05:45:52 2009 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 30 Apr 2009 13:45:52 +1000 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> Message-ID: <200904301345.52776.steve@pearwood.info> On Thu, 30 Apr 2009 01:16:20 pm Thomas Breuel wrote: > And that's why I think this proposal should be shelved for a while > until people have had more time to try to understand the issues and > also come up with alternative proposals. Once this is adopted and > implemented in C-Python, Python is stuck with it forever. +1 on this. I'm going to quote the Zen here: Now is better than never. Although never is often better than *right* now. I don't understand the proposal and issues. I see a lot of people claiming that they do, and then spending all their time either talking past each other, or disagreeing. If everyone who claims they understand the issues actually does, why is it so hard to reach a consensus? I'd like to see some real examples of how things can break in the current system, and I'd like any potential solution to be made available as a third-party package before it goes into the standard library (if possible). Currently, we're reduced to trying to predict the consequences of implementing the PEP, instead of being able to try it out and see. Even something like a test suite would be useful: here are a bunch of malformed file names, and this is what happens when you try to work with them. Please, let's see some code we can run, not more words.
-- Steven D'Aprano From tjreedy at udel.edu Thu Apr 30 05:46:44 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 29 Apr 2009 23:46:44 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F8C206.5070801@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com> <49F8B886.5020700@v.loewis.de> <49F8C206.5070801@g.nevcal.com> Message-ID: Glenn Linderman wrote: > On approximately 4/29/2009 1:28 PM, came the following characters from >> So where is the ambiguity here? > > None. But not everyone can read all the Python source code to try to > understand it; they expect the documentation to help them avoid that. > Because the documentation is lacking in this area, it makes your > concisely stated PEP rather hard to understand. If you think a section of the doc is grossly inadequate, and there is no existing issue on the tracker, feel free to add one. > Thanks for clarifying the Windows behavior, here. A little more > clarification in the PEP could have avoided lots of discussion. It > would seem that a PEP, proposed to modify a poorly documented (and > therefore likely poorly understood) area, should be educational about > the status quo, as well as presenting the suggested change. Where the PEP proposes to change, it should start with the status quo. 
But Martin's somewhat reasonable position is that since he is not proposing to change behavior on Windows, it is not his responsibility to document what he is not proposing to change more adequately. This means, of course, that any observed change on Windows would then be a bug, or at least a break of the promise. On the other hand, I can see that this is enough related to what he is proposing to change that better doc would help. tjr From martin at v.loewis.de Thu Apr 30 06:48:24 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 06:48:24 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <49F8D206.2000104@g.nevcal.com> References: <7e51d15d0904272329l1cbfd579i9833f9b56aa4b55f@mail.gmail.com> <20090428075806.GB23828@phd.pp.ru> <7e51d15d0904280137y44fdc0a4u716aaa118b80be60@mail.gmail.com> <87mya0kg94.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> <49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com> <49F8B349.30901@v.loewis.de> <49F8D206.2000104@g.nevcal.com> Message-ID: <49F92D98.3020403@v.loewis.de> > But I shouldn't have to guess. The PEP should explain how these things > are useful. The discussion section could be extended with use cases for > both the encode and decode cases. See PEP 293. 
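For readers without PEP 293 at hand, the machinery it defines looks
roughly like this (the handler name here is invented for illustration):

```python
import codecs

# PEP 293 error handlers are callables registered by name.  A decode
# handler receives the UnicodeDecodeError and returns a tuple of
# (replacement string, position at which to resume decoding).
def hex_escape_errors(exc):
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join("\\x%02x" % byte for byte in bad), exc.end
    raise exc

codecs.register_error("hexescape", hex_escape_errors)

assert b"abc\xff.txt".decode("utf-8", "hexescape") == "abc\\xff.txt"
```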
Regards, Martin From martin at v.loewis.de Thu Apr 30 06:52:18 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 06:52:18 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> References: <49EEBE2E.3090601@v.loewis.de> <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> Message-ID: <49F92E82.9040702@v.loewis.de> > How do get a printable unicode version of these path strings if they > contain none unicode data? Define "printable". One way would be to use a regular expression, replacing all codes in a certain range with a question mark. > I'm guessing that an app has to understand that filenames come in two forms > unicode and bytes if its not utf-8 data. Why not simply return string if > its valid utf-8 otherwise return bytes? That would have been an alternative solution, and the one that 2.x uses for listdir. People didn't like it. Regards, Martin From martin at v.loewis.de Thu Apr 30 07:17:38 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Apr 2009 07:17:38 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <200904301345.52776.steve@pearwood.info> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <200904301345.52776.steve@pearwood.info> Message-ID: <49F93472.5010509@v.loewis.de> > I don't understand the proposal and issues. I see a lot of people > claiming that they do, and then spending all their time either > talking past each other, or disagreeing. If everyone who claims they > understand the issues actually does, why is it so hard to reach a > consensus? Because the problem is difficult, and any solution has trade-offs. People disagree on which trade-offs are worse than others. 
> I'd like to see some real examples of how things can break in the > current system Suppose I create a new directory, and run the following script in 3.x: py> open("x","w").close() py> open(b"\xff","w").close() py> os.listdir(".") ['x'] If I quit Python, I can now do martin at mira:~/work/3k/t$ ls ? x martin at mira:~/work/3k/t$ ls -b \377 x As you can see, there are two files in the current directory, but only one of them is reported by os.listdir. The same happens to command line arguments and environment variables: Python might swallow some of them. > and I'd like any potential solution to be made > available as a third-party package before it goes into the standard > library (if possible). Unfortunately, at least for my solution, this isn't possible. I need to change the implementation of the existing file IO APIs. > Currently, we're reduced to trying to predict > the consequences of implementing the PEP, instead of being able to > try it out and see. In a sense, this is one of the primary points of the PEP process: to discuss a specification before the effort to produce an implementation is started. > Even something like a test suite would be useful: here are a bunch of > malformed file names, and this is what happens when you try to work > with them. Please, let's see some code we can run, not more words. Just try my example above, on a Linux system, in a UTF-8 locale. Regards, Martin From martin at v.loewis.de Thu Apr 30 07:24:34 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 07:24:34 +0200 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> Message-ID: <49F93612.4080100@v.loewis.de> Curt Hagenlocher wrote: > On Wed, Apr 29, 2009 at 8:16 PM, Thomas Breuel wrote: >> Also, what are Jython and IronPython supposed to do on UNIX? Can they >> implement these semantics at all? > > IronPython will inherit whatever behavior Mono has implemented. The > Microsoft CLR defines the native string type as UTF-16 and all of the > managed APIs for things like file names and environmental variables > operate on UTF-16 strings -- there simply are no byte string APIs. > > I assume that Mono does the same but I don't have any Mono experience. Marcin Kowalczyk once did a review, at http://mail.python.org/pipermail/python-3000/2007-September/010450.html It may have changed since then; at the time, Mono would omit non-decodable files in directory listings, and would refuse to start if a non-decodable command line argument is passed. The environment variable MONO_EXTERNAL_ENCODINGS can be set to specify what encodings should be tried in what order. However, I don't think it is relevant for the PEP: as Curt says, these details will be inherited from the VM; the mechanism proposed is really specific to CPython. To implement it on the other VMs, those would have to either implement it natively, or provide byte-oriented APIs to allow Jython/IronPython to implement it on top of it (the latter being not realistic or useful). 
Regards, Martin From martin at v.loewis.de Thu Apr 30 07:29:53 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 07:29:53 +0200 Subject: [Python-Dev] PEP 383 (again) In-Reply-To: <20090430031629.GB25125@panix.com> References: <7e51d15d0904281138q7b66d235i27dec19a1707edc2@mail.gmail.com> <49F74EE5.6060305@v.loewis.de> <7e51d15d0904281224q397240b6p2a0a786367fc55ce@mail.gmail.com> <49F7613C.9000901@v.loewis.de> <7e51d15d0904281530y3ae282f4u77263058e617028e@mail.gmail.com> <49F7E964.9050700@v.loewis.de> <7e51d15d0904282353i31f5ce4cp281194b94c55394a@mail.gmail.com> <49F7FF03.2090909@v.loewis.de> <49F80691.80403@g.nevcal.com> <49F8B349.30901@v.loewis.de> <20090430031629.GB25125@panix.com> Message-ID: <49F93751.7060701@v.loewis.de> > Perhaps not a full description of the status quo, but the PEP definitely > needs a good summary I completely agree, and believe that the PEP *does* have a good summary - it has both an abstract, and a rationale, and both say exactly what I want them to say. If people want them to say different things, they have to tell me what specifically they want it to say (perhaps even with specific formulations). If they can't communicate their requests to me, I can't comply. 
Regards, Martin From martin at v.loewis.de Thu Apr 30 07:42:12 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 07:42:12 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F8C206.5070801@g.nevcal.com> References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com> <49F8B886.5020700@v.loewis.de> <49F8C206.5070801@g.nevcal.com> Message-ID: <49F93A34.4050904@v.loewis.de> > Thanks for clarifying the Windows behavior, here. A little more > clarification in the PEP could have avoided lots of discussion. It > would seem that a PEP, proposed to modify a poorly documented (and > therefore likely poorly understood) area, should be educational about > the status quo, as well as presenting the suggested change. Or is it > the Python philosophy that the PEPs should be as incomprehensible as > possible, to generate large discussions? Certainly not. See PEP 277 for a description of a specification of how file names are handled on Windows. Large discussions could be reduced if readers would try to constructively comment on the PEP, rather than making counter-proposals, or making statements about the PEP without making their implied assumptions explicit. 
Regards, Martin From asmodai at in-nomine.org Thu Apr 30 08:13:09 2009 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Thu, 30 Apr 2009 08:13:09 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <49F93472.5010509@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <200904301345.52776.steve@pearwood.info> <49F93472.5010509@v.loewis.de> Message-ID: <20090430061309.GH9749@nexus.in-nomine.org> -On [20090430 07:18], "Martin v. L?wis" (martin at v.loewis.de) wrote: >Suppose I create a new directory, and run the following script >in 3.x: > >py> open("x","w").close() >py> open(b"\xff","w").close() >py> os.listdir(".") >['x'] That is actually a regression in 3.x: Python 2.6.1 (r261:67515, Mar 8 2009, 11:36:21) >>> import os >>> open("x","w").close() >>> open(b"\xff","w").close() >>> os.listdir(".") ['x', '\xff'] [Apologies if that was completely clear through the entire discussion, but I've lost track at a given point.] -- Jeroen Ruigrok van der Werven / asmodai ????? ?????? ??? ?? ?????? http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Heart is the engine of your body, but Mind is the engine of Life... 
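The regression Jeroen shows above is what the PEP's escaping repairs: with the handler that later shipped in Python 3.1 as `surrogateescape`, the undecodable name is reported by the str API as a string containing a lone surrogate, and the bytes API still sees the raw name. A sketch, assuming a POSIX filesystem that permits the byte 0xFF in names:

```python
import os
import tempfile

d = tempfile.mkdtemp()
# Create one well-formed name and one undecodable name via the bytes API.
open(os.path.join(d, "x"), "w").close()
open(os.path.join(d.encode(), b"\xff"), "w").close()

# The str API no longer swallows the second file: the byte 0xFF comes
# back as the lone surrogate U+DCFF.
assert sorted(os.listdir(d)) == ["x", "\udcff"]

# The bytes API still returns the raw names unchanged.
assert sorted(os.listdir(d.encode())) == [b"x", b"\xff"]
```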
From tmbdev at gmail.com Thu Apr 30 08:28:28 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 08:28:28 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <7e51d15d0904290219v625d23cdy8812939da404e309@mail.gmail.com> Message-ID: <7e51d15d0904292328g3f97b19ele58e76a8b82c9d80@mail.gmail.com> On Wed, Apr 29, 2009 at 23:03, Terry Reedy wrote: > Thomas Breuel wrote: > >> >> Sure. However, that requires you to provide meaningful, reproducible >> counter-examples, rather than a stenographic formulation that might >> hint some problem you apparently see (which I believe is just not >> there). >> >> >> Well, here's another one: PEP 383 would disallow UTF-8 encodings of half >> surrogates. >> > > By my reading, the current Unicode 5.1 definition of 'UTF-8' disallows > that. If we use conformance to Unicode 5.1 as the basis for our discussion, then PEP 383 is off the table anyway. I'm all for strict Unicode compliance. But apparently, the Python community doesn't care. CESU-8 is described in Unicode Technical Report #26, so it at least has some official recognition. More importantly, it's also widely used. So, my question: what are the implications of PEP 383 for CESU-8 encodings on Python? My meta-point is: there are probably many more such issues hidden away and it is a really bad idea to rush something like PEP 383 out. Unicode is hard anyway, and tinkering with its semantics requires a lot of thought. Tom -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tmbdev at gmail.com Thu Apr 30 08:32:51 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 08:32:51 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> Message-ID: <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> On Thu, Apr 30, 2009 at 05:40, Curt Hagenlocher wrote: > IronPython will inherit whatever behavior Mono has implemented. The > Microsoft CLR defines the native string type as UTF-16 and all of the > managed APIs for things like file names and environmental variables > operate on UTF-16 strings -- there simply are no byte string APIs. Yes. Now think about the implications. This means that adopting PEP 383 will make IronPython and Jython running on UNIX intrinsically incompatible with CPython running on UNIX, and there's no way to fix that. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Apr 30 08:41:28 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Apr 2009 08:41:28 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <20090430061309.GH9749@nexus.in-nomine.org> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <200904301345.52776.steve@pearwood.info> <49F93472.5010509@v.loewis.de> <20090430061309.GH9749@nexus.in-nomine.org> Message-ID: <49F94818.7060701@v.loewis.de> Jeroen Ruigrok van der Werven wrote: > -On [20090430 07:18], "Martin v. 
Löwis" (martin at v.loewis.de) wrote:
>> Suppose I create a new directory, and run the following script
>> in 3.x:
>>
>> py> open("x","w").close()
>> py> open(b"\xff","w").close()
>> py> os.listdir(".")
>> ['x']
> 
> That is actually a regression in 3.x: Correct - and precisely the issue that this PEP wants to address. For comparison, do os.listdir(u"."), though: py> os.listdir(u".") [u'x', '\xff'] Regards, Martin From martin at v.loewis.de Thu Apr 30 08:42:21 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 08:42:21 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> Message-ID: <49F9484D.9010507@v.loewis.de> Thomas Breuel wrote: > On Thu, Apr 30, 2009 at 05:40, Curt Hagenlocher > wrote: > > IronPython will inherit whatever behavior Mono has implemented. The > Microsoft CLR defines the native string type as UTF-16 and all of the > managed APIs for things like file names and environmental variables > operate on UTF-16 strings -- there simply are no byte string APIs. > > > Yes. Now think about the implications. This means that adopting PEP > 383 will make IronPython and Jython running on UNIX intrinsically > incompatible with CPython running on UNIX, and there's no way to fix that. *Not* adopting the PEP will also make CPython and IronPython incompatible, and there's no way to fix that.
Regards, Martin From tmbdev at gmail.com Thu Apr 30 09:21:54 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 09:21:54 +0200 Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again) Message-ID: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> Given the stated rationale of PEP 383, I was wondering what Windows actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file names on a device, plugged them into my Windows Vista machine, and fired up Python 3.0. First, os.listdir("f:") returns a list of strings for those file names... but those unicode strings are illegal. You can't even print them without getting an error from Python. In fact, you also can't print strings containing the proposed half-surrogate encodings either: in both cases, the output encoder rejects them with a UnicodeEncodeError. (If not even Python, with its generally lenient attitude, can print those things, some other libraries probably will fail, too.) What about round tripping? So, if you take a malformed file name from an external device (say, because it was actually encoded iso8859-15 or East Asian) and write it to an NTFS directory, it seems to write malformed UTF-16 file names. In essence, Windows doesn't really use unicode, it just implements 16bit raw character strings, just like UNIX historically implements raw 8bit character strings. Then I tried the same thing on my Ubuntu 9.04 machine. It turns out that, unlike Windows, Linux seems to be moving to consistent use of valid UTF-8. If you plug in an external device and nothing else is known about it, it gets mounted with the utf8 option and the kernel actually seems to enforce UTF-8 encoding. I think this calls into question the rationale behind PEP 383, and we should first look into what the roadmap for UNIX/Linux and UTF-8 actually is. UNIX may have consistent unicode support (via UTF-8) before Windows.
As I was saying, I think PEP 383 needs a lot more thought and research... Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmbdev at gmail.com Thu Apr 30 09:26:10 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 09:26:10 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <49F9484D.9010507@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> Message-ID: <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> > > Yes. Now think about the implications. This means that adopting PEP > > 383 will make IronPython and Jython running on UNIX intrinsically > > incompatible with CPython running on UNIX, and there's no way to fix > that. > > *Not* adapting the PEP will also make CPython and IronPython > incompatible, and there's no way to fix that. > CPython and IronPython are incompatible. And they will stay incompatible if the PEP is adopted. They would become compatible if CPython adopted Mono and/or Java semantics. Since both have had to deal with this, have you looked at what they actually do before proposing PEP 383? What did you find? Why did you choose an incompatible approach for PEP 383? Tom -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From v+python at g.nevcal.com Thu Apr 30 09:29:36 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 30 Apr 2009 00:29:36 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <49EEBE2E.3090601@v.loewis.de> <79990c6b0904240859s344398abp4014ed02d87e35e2@mail.gmail.com> <87tz4eklqo.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0904241505w675a21abi1b1eec270acc4e8c@mail.gmail.com> <87ocujle67.fsf@uwakimon.sk.tsukuba.ac.jp> <87zle2jdza.fsf@uwakimon.sk.tsukuba.ac.jp> <87r5zdjsm7.fsf@uwakimon.sk.tsukuba.ac.jp> <49F73635.6010105@v.loewis.de> <49F74F85.9010800@g.nevcal.com> <49F76623.8060903@v.loewis.de> <49F768F3.8080304@g.nevcal.com> <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com> <49F8B886.5020700@v.loewis.de> <49F8C206.5070801@g.nevcal.com> Message-ID: <49F95360.8070808@g.nevcal.com> On approximately 4/29/2009 8:46 PM, came the following characters from the keyboard of Terry Reedy: > Glenn Linderman wrote: >> On approximately 4/29/2009 1:28 PM, came the following characters from > >>> So where is the ambiguity here? >> >> None. But not everyone can read all the Python source code to try to >> understand it; they expect the documentation to help them avoid that. >> Because the documentation is lacking in this area, it makes your >> concisely stated PEP rather hard to understand. > > If you think a section of the doc is grossly inadequate, and there is no > existing issue on the tracker, feel free to add one. > >> Thanks for clarifying the Windows behavior, here. A little more >> clarification in the PEP could have avoided lots of discussion. It >> would seem that a PEP, proposed to modify a poorly documented (and >> therefore likely poorly understood) area, should be educational about >> the status quo, as well as presenting the suggested change. 
> > Where the PEP proposes to change, it should start with the status quo. > But Martin's somewhat reasonable position is that since he is not > proposing to change behavior on Windows, it is not his responsibility to > document what he is not proposing to change more adequately. This > means, of course, that any observed change on Windows would then be a > bug, or at least a break of the promise. On the other hand, I can see > that this is enough related to what he is proposing to change that > better doc would help. Yes; the very fact that the PEP discusses Windows, speaks about cross-platform code, and doesn't explicitly state that no Windows functionality will change, is confusing. An example of how to initialize things within a sample cross-platform application might help, especially if that initialization only happens if the platform is POSIX, or is commented to the effect that it has no effect on Windows, but makes POSIX happy. Or maybe it is all buried within the initialization of Python itself, and is not exposed to the application at all. I still haven't figured that out, but was not (and am still not) as concerned about that as ensuring that the overall algorithms are functional and useful and user-friendly. Showing it might have been helpful in making it clear that no Windows functionality would change, however. A statement that additional features are being added to allow cross-platform programs to deal with non-decodable bytes obtained from POSIX APIs using the same code that already works on Windows, would have made things much clearer. The present Abstract does, in fact, talk only about POSIX, but later statements about Windows muddy the water. Rationale paragraph 3 explicitly talks about cross-platform programs needing to work one way on Windows and another way on POSIX to deal with all the cases. It calls that a proposal, which I guess it is for command line and environment, but it is already implemented in both bytes and str forms for file names...
so that further muddies the water. It is, of course, easier to point out deficiencies in a document than to write a better document; however, it is incumbent upon the PEP author to write a PEP that is good enough to get approved, and that means making it understandable enough that people are in favor... or to respond to the plethora of comments until people are in favor. I'm not sure which one is more time-consuming. I've reached the point, based on PEP and comment responses, where I now believe that the PEP is a solution to the problem it is trying to solve, and doesn't create ambiguities in the naming. I don't believe it is the best solution. The basic problem is the overuse of fake characters... normalizing them for display results in large data loss -- many characters would be translated to the same replacement characters. Solutions exist that would allow the use of fewer different fake characters in the strings, while still having a fake character as the escape character, to preserve the invariant that all the strings manipulated by python-escape from the PEP were, and become, strings containing fake characters (from a strict Unicode perspective), which is a nice invariant*. There even exist solutions that would use only one fake character (repeatedly if necessary), and all other characters generated would be displayable characters. This would ease the burden on the program in displaying the strings, and also on the user that might view the resulting mojibake in trying to differentiate one such string from another. Those are outlined in various emails in this thread, although some include my misconception that strings obtained via Unicode-enabled OS APIs would also need to be encoded and altered. If there is any interest in using a more readable encoding, I'd be glad to rework them to remove those misconceptions. * It would be nice to point out that invariant in the PEP, also.
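Glenn's "one fake escape character plus displayable characters" scheme can be sketched as a PEP 293 error handler. The handler name `hexescape` and the choice of U+DC00 as the escape character are illustrative assumptions, not part of any concrete proposal in the thread:

```python
import codecs

ESC = "\udc00"  # illustrative choice: a single lone surrogate as the escape mark

def hex_escape(err):
    # Map each undecodable byte to ESC plus two displayable hex digits.
    bad = err.object[err.start:err.end]
    return "".join(ESC + format(b, "02x") for b in bad), err.end

codecs.register_error("hexescape", hex_escape)

name = b"caf\xe9".decode("utf-8", "hexescape")
assert name == "caf" + ESC + "e9"   # one fake character, two readable digits
```

A matching encode-side handler would be needed to make the round trip lossless; the sketch only shows the decode direction Glenn describes.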
-- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Thu Apr 30 09:38:29 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 30 Apr 2009 00:38:29 -0700 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <49F93472.5010509@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <200904301345.52776.steve@pearwood.info> <49F93472.5010509@v.loewis.de> Message-ID: <49F95575.5010408@g.nevcal.com> On approximately 4/29/2009 10:17 PM, came the following characters from the keyboard of Martin v. L?wis: >> I don't understand the proposal and issues. I see a lot of people >> claiming that they do, and then spending all their time either >> talking past each other, or disagreeing. If everyone who claims they >> understand the issues actually does, why is it so hard to reach a >> consensus? > > Because the problem is difficult, and any solution has trade-offs. > People disagree on which trade-offs are worse than others. > >> I'd like to see some real examples of how things can break in the >> current system > > Suppose I create a new directory, and run the following script > in 3.x: > > py> open("x","w").close() > py> open(b"\xff","w").close() > py> os.listdir(".") > ['x'] but... py> os.listdir(b".") ['x', '\xff'] > If I quit Python, I can now do > > martin at mira:~/work/3k/t$ ls > ? x > martin at mira:~/work/3k/t$ ls -b > \377 x > > As you can see, there are two files in the current directory, but > only one of them is reported by os.listdir. The same happens to > command line arguments and environment variables: Python might swallow > some of them. There is presently no solution for command line and environment variables, I guess... 
which adds some amount of urgency to the implementation of _something_, even if not this PEP. >> and I'd like any potential solution to be made >> available as a third-party package before it goes into the standard >> library (if possible). > > Unfortunately, at least for my solution, this isn't possible. I need > to change the implementation of the existing file IO APIs. Other than initializing them to use UTF-8b instead of UTF-8, and to use the new python-escape handler? I'm sure if I read the code for that, I'd be able to figure out the answer... I don't find any documented way of adding an encoding/decoding handler to the file IO encoding technique, though, which lends credence to your statement; then again, that could also be an oversight on my part. One could envision a staged implementation: the addition of the ability to add encoding/decoding handlers to the file IO encoding/decoding process, and the external selection of your new python-escape handler during application startup. That way, the hooks would be in the file system to allow your solution to be used, but not require that it be used; competing solutions using similar technology could be implemented and evaluated. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From v+python at g.nevcal.com Thu Apr 30 09:58:16 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 30 Apr 2009 00:58:16 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090430025050.GB1544@panix.com> References: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> <20090429232852.GA26172@cskk.homeip.net> <20090430025050.GB1544@panix.com> Message-ID: <49F95A18.4040907@g.nevcal.com> On approximately 4/29/2009 7:50 PM, came the following characters from the keyboard of Aahz: > On Thu, Apr 30, 2009, Cameron Simpson wrote: >> The lengthy discussion mostly revolves around: >> >> - Glenn points out that strings that came _not_ from listdir, and that are >> _not_ well-formed unicode (== "have bare surrogates in them") but that >> were intended for use as filenames will conflict with the PEP's scheme - >> programs must know that these strings came from outside and must be >> translated into the PEP's funny-encoding before use in the os.* >> functions. Previous to the PEP they would get used directly and >> encode differently after the PEP, thus producing different POSIX >> filenames. Breakage. >> >> - Glenn would like the encoding to use Unicode scalar values only, >> using a rare-in-filenames character. >> That would avoid the issue with "outside' strings that contain >> surrogates. To my mind it just moves the punning from rare illegal >> strings to merely uncommon but legal characters. >> >> - Some parties think it would be better to not return strings from >> os.listdir but a subclass of string (or at least a duck-type of >> string) that knows where it came from and is also handily >> recognisable as not-really-a-string for purposes of deciding >> whether is it PEP-funny-encoded by direct inspection. > > Assuming people agree that this is an accurate summary, it should be > incorporated into the PEP. 
I'll agree that once other misconceptions were explained away, that the remaining issues are those Cameron summarized. Thanks for the summary! Point two could be modified because I've changed my opinion; I like the invariant Cameron first (I think) explicitly stated about the PEP as it stands, and that I just reworded in another message, that the strings that are altered by the PEP in either direction are in the subset of strings that contain fake (from a strict Unicode viewpoint) characters. I still think an encoding that uses mostly real characters that have assigned glyphs would be better than the encoding in the PEP; but would now suggest that an escape character be a fake character. I'll note here that while the PEP encoding causes illegal bytes to be translated to one fake character, the 3-byte sequence that looks like the range of fake characters would also be translated to a sequence of 3 fake characters. This is 512 combinations that must be translated, and understood by the user (or at least by the programmer). The "escape sequence" approach requires changing only 257 combinations, and each altered combination would result in exactly 2 characters. Hence, this seems simpler to understand, and to manually encode and decode for debugging purposes. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Thu Apr 30 10:21:39 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 10:21:39 +0200 Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> Message-ID: <49F95F93.5090204@v.loewis.de> Thomas Breuel wrote: > Given the stated rationale of PEP 383, I was wondering what Windows > actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file > names on a device, plugged them into my Windows Vista machine, and fired > up Python 3.0. How did you do that, and what were the specific names that you had chosen? How does explorer display the file names? > First, os.listdir("f:") returns a list of strings for those file > names... but those unicode strings are illegal. What was the exact result that you got? > You can't even print them without getting an error from Python. This is unrelated to the PEP. Try to run the same code in IDLE, or use the ascii() function. > What about round tripping? So, if you take a malformed file name from an > external device (say, because it was actually encoded iso8859-15 or East > Asian) and write it to an NTFS directory, it seems to write malformed > UTF-16 file names. In essence, Windows doesn't really use unicode, it > just implements 16bit raw character strings, just like UNIX historically > implements raw 8bit character strings. I think you misinterpreted what you saw. To find out what way you misinterpreted it, we would have to know what it is that you saw. > I think this calls into > question the rationale behind PEP 383, and we should first look into > what the roadmap for UNIX/Linux and UTF-8 actually is. UNIX may have > consistent unicode support (via UTF-8) before Windows. If so, PEP 383 won't hurt. 
If you never get decode errors for file names, you can just ignore PEP 383. It's only for those of us who do get decode errors. Regards, Martin From martin at v.loewis.de Thu Apr 30 10:25:57 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 10:25:57 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> Message-ID: <49F96095.4000208@v.loewis.de> > CPython and IronPython are incompatible. And they will stay > incompatible if the PEP is adopted. > > They would become compatible if CPython adopted Mono and/or Java > semantics. Which one should it adopt? Mono semantics, or Java semantics? > Since both have had to deal with this, have you looked at what they > actually do before proposing PEP 383? What did you find? See http://mail.python.org/pipermail/python-3000/2007-September/010450.html > Why did you choose an incompatible approach for PEP 383? Because in Python, we want to be able to access all files on disk. Neither Java nor Mono are capable of doing that. Regards, Martin From martin at v.loewis.de Thu Apr 30 10:48:27 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 10:48:27 +0200 Subject: [Python-Dev] PEP 383 and GUI libraries Message-ID: <49F965DB.6050601@v.loewis.de> I checked how GUI libraries deal with half surrogates. In pygtk, a warning gets issued to the console /tmp/helloworld.py:71: PangoWarning: Invalid UTF-8 string passed to pango_layout_set_text() self.window.show() and then the widget contains three crossed boxes. 
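[The three crossed boxes line up with the byte count of the leaked escape character. A sketch of the mechanism, assuming the surrogateescape/surrogatepass error handlers that ship with the PEP's implementation (Python 3.1+):]

```python
# Sketch of the PEP 383 round-trip, and of why one undecodable byte
# can surface as three boxes in a GUI toolkit.  Assumes the
# 'surrogateescape' and 'surrogatepass' error handlers (Python 3.1+).

raw = b"caf\xe9"  # ISO-8859-1 bytes; 0xe9 is not valid UTF-8 here

# Decoding with surrogateescape maps the bad byte to a lone
# half surrogate instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9"

# Encoding back the same way restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw

# But if the string escapes to a library that serializes it as-is,
# the lone surrogate becomes a three-byte ill-formed UTF-8 sequence:
# pango then draws one box per rejected byte.
assert name.encode("utf-8", "surrogatepass") == b"caf\xed\xb3\xa9"
```

[A strict `name.encode("utf-8")` would raise UnicodeEncodeError instead, which is what keeps escaped strings from leaking silently through codecs that don't know about the scheme.]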
wxpython (in its wxgtk version) behaves the same way. PyQt displays a single square box. Regards, Martin From v+python at g.nevcal.com Thu Apr 30 10:55:12 2009 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 30 Apr 2009 01:55:12 -0700 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: <49F965DB.6050601@v.loewis.de> References: <49F965DB.6050601@v.loewis.de> Message-ID: <49F96770.4080206@g.nevcal.com> On approximately 4/30/2009 1:48 AM, came the following characters from the keyboard of Martin v. L?wis: > I checked how GUI libraries deal with half surrogates. > In pygtk, a warning gets issued to the console > > /tmp/helloworld.py:71: PangoWarning: Invalid UTF-8 string passed to > pango_layout_set_text() > self.window.show() > > and then the widget contains three crossed boxes. > > wxpython (in its wxgtk version) behaves the same way. > > PyQt displays a single square box. Interesting. Did you use a name with other characters? Were they displayed? Both before and after the surrogates? Did you use one or three half surrogates, to produce the three crossed boxes? Did you use one or three half surrogates, to produce the single square box? -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From martin at v.loewis.de Thu Apr 30 11:08:08 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 11:08:08 +0200 Subject: [Python-Dev] PEP 382 update Message-ID: <49F96A78.6060404@v.loewis.de> Guido found out that I had misunderstood the existing pkg mechanism: If a "zope" package is imported, and it uses pkgutil.extend_path, then it won't glob for files ending in .pkg, but instead searches the path for files named zope.pkg. IOW, this is unsuitable as a foundation of PEP 382. 
I have now changed the PEP to call the files .pth, more in line with how top-level .pth files work, and added a statement that the import feature of .pth files is not provided for package .pth files (use __init__.py instead). Regards, Martin From martin at v.loewis.de Thu Apr 30 11:12:32 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 11:12:32 +0200 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: <49F96770.4080206@g.nevcal.com> References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> Message-ID: <49F96B80.5090808@v.loewis.de> > Did you use a name with other characters? Were they displayed? Both > before and after the surrogates? Yes, yes, and yes (IOW, I put the surrogate in the middle). > Did you use one or three half surrogates, to produce the three crossed > boxes? Only one, and it produced three boxes - probably one for each UTF-8 byte that pango considered invalid. > Did you use one or three half surrogates, to produce the single square box? Again, only one. Apparently, PyQt passes the Python Unicode string to Qt in a character-by-character representation, rather than going through UTF-8. Regards, Martin From tmbdev at gmail.com Thu Apr 30 11:20:10 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 11:20:10 +0200 Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again) In-Reply-To: <49F95F93.5090204@v.loewis.de> References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> <49F95F93.5090204@v.loewis.de> Message-ID: <7e51d15d0904300220o1e8a78f9pc54bf3fe8148bd67@mail.gmail.com> On Thu, Apr 30, 2009 at 10:21, "Martin v. L?wis" wrote: > Thomas Breuel wrote: > > Given the stated rationale of PEP 383, I was wondering what Windows > > actually does. So, I created some ISO8859-15 and ISO8859-8 encoded file > > names on a device, plugged them into my Windows Vista machine, and fired > > up Python 3.0. 
> > How did you do that, and what were the specific names that you > had chosen? There are several different ways I tried it. The easiest was to mount a vfat file system with various encodings on Linux and use the Python byte interface to write file names, then plug that flash drive into Windows. > I think you misinterpreted what you saw. To find out what way you > misinterpreted it, we would have to know what it is that you saw. I didn't interpret it much at all. I'm just saying that the PEP 383 assumption that these problems can't occur on Windows isn't true. I can plug in a flash drive with malformed strings, and somewhere between the disk and Python, something maps those strings onto unicode in some way, and it's done in a way that's different from PEP 383. Mono and Java must have their own solutions that are different from PEP 383. My point remains that I think PEP 383 shouldn't be rushed through, and one should look more carefully first at what the Windows kernel does in these situations, and what Mono and Java do. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Apr 30 11:24:43 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 11:24:43 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090430025050.GB1544@panix.com> References: <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> <20090429232852.GA26172@cskk.homeip.net> <20090430025050.GB1544@panix.com> Message-ID: <49F96E5B.5010107@v.loewis.de> > Assuming people agree that this is an accurate summary, it should be > incorporated into the PEP. Done! 
Regards, Martin From martin at v.loewis.de Thu Apr 30 11:35:02 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 11:35:02 +0200 Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300220o1e8a78f9pc54bf3fe8148bd67@mail.gmail.com> References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> <49F95F93.5090204@v.loewis.de> <7e51d15d0904300220o1e8a78f9pc54bf3fe8148bd67@mail.gmail.com> Message-ID: <49F970C6.1000407@v.loewis.de> > There are several different ways I tried it. The easiest was to mount a > vfat file system with various encodings on Linux and use the Python byte > interface to write file names, then plug that flash drive into Windows. So can you share precisely what you have done, to allow others to reproduce it? > I think you misinterpreted what you saw. To find out what way you > misinterpreted it, we would have to know what it is that you saw. > > > I didn't interpret it much at all. I'm just saying that the PEP 383 > assumption that these problems can't occur on Windows isn't true. What are "these problems", and where does PEP 383 say they can't occur on Windows? What could Python do differently on Windows? > I can plug in a flash drive with malformed strings, and somewhere > between the disk and Python, something maps those strings onto unicode > in some way, and it's done in a way that's different from PEP 383. Of course it is. The Windows FAT driver has chosen some mapping for the file names to Unicode, and most likely not the encoding that you meant it to use. There is now no way for a Win32 application to find out how the file name is actually represented on disk, short of implementing the FAT file system itself. So what Python does is the best possible solution already - report the file names as-is, with no interpretation. 
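[The "as-is" reporting is visible today through the byte-oriented API; a small sketch of both sides, assuming a POSIX filesystem that accepts arbitrary bytes and Python 3.1+ for the str behaviour described by the PEP:]

```python
# Sketch: the bytes API reports names as-is, and under PEP 383 the
# str API round-trips the same name through a half surrogate.
# Assumes a POSIX filesystem (arbitrary bytes allowed), a UTF-8
# filesystem encoding, and Python 3.1+.
import os
import tempfile

d = tempfile.mkdtemp()
raw = b"\xfftest"                        # not decodable as UTF-8
open(os.path.join(d.encode(), raw), "wb").close()

assert raw in os.listdir(d.encode())     # bytes in, bytes out, untouched

names = os.listdir(d)                    # str in, str out: escaped form
assert "\udcfftest" in names

# The escaped str re-encodes to the original bytes, so it can be
# passed straight back to the OS:
os.remove(os.path.join(d, "\udcfftest"))
os.rmdir(d)
```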
> My point remains that I think PEP 383 shouldn't be rushed through, and > one should look more carefully first at what the Windows kernel does in > these situations, and what Mono and Java do. These questions really have been studied on this list for the last eight years, over and over again. It's not being rushed. Regards, Martin From martin at v.loewis.de Thu Apr 30 11:42:13 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 11:42:13 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> Message-ID: <49F97275.3010307@v.loewis.de> > I think it has to be excluded from mapping in order to not introduce > security issues. I think you are right. I have now excluded ASCII bytes from being mapped, effectively not supporting any encodings that are not ASCII compatible. Does that sound ok? Regards, Martin From tmbdev at gmail.com Thu Apr 30 11:44:40 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 11:44:40 +0200 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <49F96095.4000208@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> Message-ID: <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com> > > > Since both have had to deal with this, have you looked at what they > > actually do before proposing PEP 383? What did you find? > > See > > http://mail.python.org/pipermail/python-3000/2007-September/010450.html > Thanks, that's very useful. > > Why did you choose an incompatible approach for PEP 383? > > Because in Python, we want to be able to access all files on disk. > Neither Java nor Mono are capable of doing that. OK, so what's wrong with os.listdir() and similar functions returning a unicode string for strings that correctly encode/decode, and with byte strings for strings that are not valid unicode? The file I/O functions already seem to deal with byte strings correctly, you never get byte strings on platforms that are fully unicode, and they are well supported. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Apr 30 12:32:47 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 12:32:47 +0200 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com> Message-ID: <49F97E4F.7080100@v.loewis.de> > OK, so what's wrong with os.listdir() and similar functions returning a > unicode string for strings that correctly encode/decode, and with byte > strings for strings that are not valid unicode? See http://bugs.python.org/issue3187 in particular msg71655 Regards, Martin From solipsis at pitrou.net Thu Apr 30 12:48:36 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 30 Apr 2009 10:48:36 +0000 (UTC) Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again) References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> Message-ID: Thomas Breuel gmail.com> writes: > > So, I created some ISO8859-15 and ISO8859-8 encoded file names on a device, plugged them into my Windows Vista machine, and fired up Python 3.0.First, os.listdir("f:") returns a list of strings for those file names... but those unicode strings are illegal. Sorry, when you report such experiments, is it too much to ask for a cut and paste of your Python session? You are being unhelpful with such unsubstantiated statements, and your mails are taking a lot of valuable bandwidth. Antoine. From tmbdev at gmail.com Thu Apr 30 12:56:03 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 12:56:03 +0200 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <49F97E4F.7080100@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com> <49F97E4F.7080100@v.loewis.de> Message-ID: <7e51d15d0904300356k208304ech7d934d10bb809c38@mail.gmail.com> On Thu, Apr 30, 2009 at 12:32, "Martin v. L?wis" wrote: > > OK, so what's wrong with os.listdir() and similar functions returning a > > unicode string for strings that correctly encode/decode, and with byte > > strings for strings that are not valid unicode? > > See http://bugs.python.org/issue3187 > in particular msg71655 > Why didn't you point to that discussion from the PEP 383? And why didn't you point to Kowalczyk's message on encodings in Mono, Java, etc. from the PEP? You could have saved us all a lot of time. Under the set of constraints that Guido imposes, plus the requirement that round-trip works for illegal encodings, there is no other solution than PEP 383. That doesn't make PEP 383 right--I still think it's a bad decision--but it makes it pointless to discuss it any further. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Apr 30 13:02:08 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 30 Apr 2009 12:02:08 +0100 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <49F97E4F.7080100@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com> <49F97E4F.7080100@v.loewis.de> Message-ID: <79990c6b0904300402p5f8adff3r10c70f6279944f56@mail.gmail.com> 2009/4/30 "Martin v. L?wis" : >> OK, so what's wrong with os.listdir() and similar functions returning a >> unicode string for strings that correctly encode/decode, and with byte >> strings for strings that are not valid unicode? > > See http://bugs.python.org/issue3187 > in particular msg71655 Can I suggest that a pointer to this issue be added to the PEP? It certainly seems like a lot of the discussion of options available is captured there. And the fact that Guido's views are noted there is also useful (as he hasn't been contributing to this thread). 2009/4/30 Thomas Breuel : >> > Since both have had to deal with this, have you looked at what they >> > actually do before proposing PEP 383? What did you find? >> >> See >> >> http://mail.python.org/pipermail/python-3000/2007-September/010450.html > > Thanks, that's very useful. This reference could probably be usefully added to the PEP as well. Paul. From glyph at divmod.com Thu Apr 30 13:26:34 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Apr 2009 11:26:34 -0000 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <49F96095.4000208@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> Message-ID: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> On 08:25 am, martin at v.loewis.de wrote: >>Why did you choose an incompatible approach for PEP 383? > >Because in Python, we want to be able to access all files on disk. >Neither Java nor Mono are capable of doing that. Java is not capable of doing that. Mono, as I keep pointing out, is. It uses NULLs to escape invalid UNIX filenames. Please see: http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, access, and open all files on your filesystem, regardless of encoding." From tmbdev at gmail.com Thu Apr 30 13:34:47 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 13:34:47 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> Message-ID: <7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com> > > Java is not capable of doing that. Mono, as I keep pointing out, is. It > uses NULLs to escape invalid UNIX filenames. 
Please see: > > http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding > > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, > access, and open all files on your filesystem, regardless of encoding." > OK, so why not adopt the Mono solution in CPython? It seems to produce valid unicode strings, removing at least one issue with PEP 383. It also means that IronPython and CPython actually would be compatible. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmurray at bitdance.com Thu Apr 30 13:49:18 2009 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 30 Apr 2009 07:49:18 -0400 (EDT) Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> Message-ID: On Thu, 30 Apr 2009 at 11:26, glyph at divmod.com wrote: > On 08:25 am, martin at v.loewis.de wrote: >> > Why did you choose an incompatible approach for PEP 383? >> >> Because in Python, we want to be able to access all files on disk. >> Neither Java nor Mono are capable of doing that. > > Java is not capable of doing that. Mono, as I keep pointing out, is. It uses > NULLs to escape invalid UNIX filenames. Please see: > > http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding > > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, > access, and open all files on your filesystem, regardless of encoding." 
And then it goes on to say: "You won't be able to pass non-Unicode filenames as command-line arguments."(*) Not only that, but you can't reliably use such files with System.IO (whatever that is, but it sounds pretty basic). This support is only available "within the Mono.Unix and Mono.Unix.Native namespaces". Now, I don't know what that means (never having touched Mono), but it doesn't sound like it simplifies cross-platform support, which is what PEP 383 is aiming for. So it doesn't sound like Mono has solved the problem that Martin is trying to solve, even if it is possible to put Unix specific code into your Mono ap to deal with byte filenames on disk from within your GUI. FWIW I'm +1 on seeing PEP 383 in 3.1, if Martin can manage the patch in time. --David (*) I'd argue that in an important sense that makes Martin's statement about Mono being unable to access all files on disk a true statement; but, then, I freely admit that I have a bias against GUI programs in general :) From martin at v.loewis.de Thu Apr 30 14:25:28 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 14:25:28 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300356k208304ech7d934d10bb809c38@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <7e51d15d0904300244w16654258x67c116937bc60210@mail.gmail.com> <49F97E4F.7080100@v.loewis.de> <7e51d15d0904300356k208304ech7d934d10bb809c38@mail.gmail.com> Message-ID: <49F998B8.8080602@v.loewis.de> > Why didn't you point to that discussion from the PEP 383? And why > didn't you point to Kowalczyk's message on encodings in Mono, Java, etc. 
> from the PEP? Because I assumed that readers of the PEP would know (and I'm sure many of them do - this has been *really* discussed over and over again). > Under the set of constraints that Guido imposes, plus the requirement > that round-trip works for illegal encodings, there is no other solution > than PEP 383. Well, there actually is an alternative: expose byte-oriented interfaces in parallel with the string-oriented ones. In the rationale, the PEP explains why I consider this the worse choice. Regards, Martin From tmbdev at gmail.com Thu Apr 30 14:59:55 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 14:59:55 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> Message-ID: <7e51d15d0904300559v40c4cc53x498184af9485bc17@mail.gmail.com> > > And then it goes on to say: "You won't be able to pass non-Unicode > filenames as command-line arguments."(*) Not only that, but you can't > reliably use such files with System.IO (whatever that is, but it > sounds pretty basic). This support is only available "within the > Mono.Unix and Mono.Unix.Native namespaces". Now, I don't know what > that means (never having touched Mono), but it doesn't sound like > it simplifies cross-platform support, which is what PEP 383 is aiming for. The problem there isn't how the characters are quoted, but that they are quoted at all, and that the ECMA and Microsoft libraries don't understand this quoting convention. 
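[For contrast, a NUL-escape codec in the spirit of Mono's UnixEncoding can be sketched in a few lines. This is not Mono's actual algorithm, and the handler name is invented; it only illustrates the trade-off: the result is well-formed Unicode, but it embeds U+0000, which any NUL-terminated C API will silently truncate:]

```python
# Toy NUL-escape scheme in the spirit of Mono's UnixEncoding.
# NOT Mono's real algorithm -- an illustration of the trade-off only:
# the escaped string is well-formed Unicode but contains U+0000.
import codecs

def _nul_escape(exc):
    # Replace each undecodable byte with NUL + the byte's value.
    bad = exc.object[exc.start:exc.end]
    return "".join("\x00" + chr(b) for b in bad), exc.end

codecs.register_error("nulescape", _nul_escape)

def nul_unescape(s: str) -> bytes:
    """Reverse the escape: NUL + char turns back into one byte."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "\x00" and i + 1 < len(s):
            out.append(ord(s[i + 1]))
            i += 2
        else:
            out.extend(s[i].encode("utf-8"))
            i += 1
    return bytes(out)

raw = b"caf\xe9"
name = raw.decode("utf-8", "nulescape")
assert name == "caf\x00\xe9"      # valid Unicode, embedded NUL
assert nul_unescape(name) == raw  # round-trips, like utf-8b does
```

[PEP 383 escapes into lone surrogates instead, which never introduce NUL but are, by design, not well-formed Unicode on their own.]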
Since command line parsing is handled through ECMA, you happen not to be able to get at those files (that's fixable, but why bother). The analogous problem exists with Martin's proposal on Python: if you pass a unicode string from Python to some library through a unicode API and that library attempts to open the file, it will fail because it doesn't use the proposed Python utf-8b decoder. There just is no way to fix that, no matter which quoting convention you use. In contrast to PEP 383, quoting with u0000 at least results in valid unicode strings in Python. And command line arguments (and environment variables etc.) would work in Python because in Python, those should also use the new encoding for invalid UTF-8 inputs. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Apr 30 15:04:59 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Apr 2009 15:04:59 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> Message-ID: <49F9A1FB.4090104@v.loewis.de> >> Because in Python, we want to be able to access all files on disk. >> Neither Java nor Mono are capable of doing that. > > Java is not capable of doing that. Mono, as I keep pointing out, is. It > uses NULLs to escape invalid UNIX filenames. 
Please see: > > http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding > > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, > access, and open all files on your filesystem, regardless of encoding." I think this is misleading. With Mono 2.0.1, I get ** (/tmp/a.exe:30553): WARNING **: FindNextFile: Bad encoding for '/home/martin/work/3k/t/\xff' Consider using MONO_EXTERNAL_ENCODINGS when running the program using System.IO; class X{ public static void Main(string[] args){ DirectoryInfo di = new DirectoryInfo("."); foreach(FileInfo fi in di.GetFiles()) System.Console.WriteLine("Next:"+fi.Name); } } On the other hand, when I write using Mono.Unix; class X{ public static void Main(string[] args){ UnixDirectoryInfo di = new UnixDirectoryInfo("."); foreach(UnixFileSystemInfo fi in di.GetFileSystemEntries()) System.Console.WriteLine("Next:"+fi.Name); } } I get indeed all files listed (and can also find out the other stat results). Of course, the resulting application will be mono-specific (it links with Mono.Posix), and not work on Microsoft .NET anymore. IOW, IronPython likely won't use this API. Python, of course, already has the equivalent of that: os.listdir, with a byte parameter, will give you access to all files. If you wanted to closely emulate the Mono API, you could set the file system encoding to the mono-lookalike codec. Regards, Martin From martin at v.loewis.de Thu Apr 30 15:06:57 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 15:06:57 +0200 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com> Message-ID: <49F9A271.8050700@v.loewis.de> > OK, so why not adopt the Mono solution in CPython? It seems to produce > valid unicode strings, removing at least one issue with PEP 383. It > also means that IronPython and CPython actually would be compatible. See my other message. The Mono solution may not be what you expect it to be. Regards, Martin From tmbdev at gmail.com Thu Apr 30 15:10:47 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 15:10:47 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <49F9A1FB.4090104@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> Message-ID: <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> > > > > "The upshot to all this is that Mono.Unix and Mono.Unix.Native can list, > > access, and open all files on your filesystem, regardless of encoding." > > I think this is misleading. With Mono 2.0.1, I get This has nothing to do with how Mono quotes. 
The reason for this is that Mono quotes at all and that the Mono developers decided not to change System.IO to understand UNIX quoting. If Mono used PEP 383 quoting, this would fail the same way. And analogous failures will exist with PEP 383 in Python, because there will be more and more libraries with unicode interfaces that then use their own internal decoder (which doesn't understand utf8b) to get a UNIX file name. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Apr 30 15:32:01 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 15:32:01 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> Message-ID: <49F9A851.5010006@v.loewis.de> > This has nothing to do with how Mono quotes. The reason for this is > that Mono quotes at all and that the Mono developers decided not to > change System.IO to understand UNIX quoting. > > If Mono used PEP 383 quoting, this would fail the same way. > > And analogous failures will exist with PEP 383 in Python, because there > will be more and more libraries with unicode interfaces that then use > their own internal decoder (which doesn't understand utf8b) to get a > UNIX file name. What's an analogous failure? 
Or, rather, why would a failure analogous to the one I got when using System.IO.DirectoryInfo ever exist in Python? Regards, Martin From aahz at pythoncraft.com Thu Apr 30 15:42:36 2009 From: aahz at pythoncraft.com (Aahz) Date: Thu, 30 Apr 2009 06:42:36 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F95360.8070808@g.nevcal.com> References: <49F76F03.8040702@v.loewis.de> <49F788A6.3040702@g.nevcal.com> <49F7EB17.4010309@v.loewis.de> <49F7F99D.8070606@g.nevcal.com> <49F801C1.2070109@v.loewis.de> <49F82435.3060205@g.nevcal.com> <49F8B886.5020700@v.loewis.de> <49F8C206.5070801@g.nevcal.com> <49F95360.8070808@g.nevcal.com> Message-ID: <20090430134236.GA12664@panix.com> [top-posting for once to preserve full quoting] Glenn, Could you please reduce your suggestions into sample text for the PEP? We seem to be now at the stage where nobody is objecting to the PEP, so the focus should be on making the PEP clearer. If you still want to create an alternative PEP implementation, please provide step-by-step walkthroughs, preferably in a new thread -- if you did previously provide that, it's gotten lost in the flood of messages. On Thu, Apr 30, 2009, Glenn Linderman wrote: > On approximately 4/29/2009 8:46 PM, came the following characters from > the keyboard of Terry Reedy: >> Glenn Linderman wrote: >>> On approximately 4/29/2009 1:28 PM, came the following characters >>> from >> >>>> So where is the ambiguity here? >>> >>> None. But not everyone can read all the Python source code to try to >>> understand it; they expect the documentation to help them avoid that. >>> Because the documentation is lacking in this area, it makes your >>> concisely stated PEP rather hard to understand. >> >> If you think a section of the doc is grossly inadequate, and there is >> no existing issue on the tracker, feel free to add one. >> >>> Thanks for clarifying the Windows behavior, here. 
A little more >>> clarification in the PEP could have avoided lots of discussion. It >>> would seem that a PEP, proposed to modify a poorly documented (and >>> therefore likely poorly understood) area, should be educational about >>> the status quo, as well as presenting the suggested change. >> >> Where the PEP proposes to change, it should start with the status quo. >> But Martin's somewhat reasonable position is that since he is not >> proposing to change behavior on Windows, it is not his responsibility >> to document what he is not proposing to change more adequately. This >> means, of course, that any observed change on Windows would then be a >> bug, or at least a break of the promise. On the other hand, I can see >> that this is enough related to what he is proposing to change that >> better doc would help. > > > Yes; the very fact that the PEP discusses Windows, speaks about > cross-platform code, and doesn't explicitly state that no Windows > functionality will change, is confusing. > > An example of how to initialize things within a sample cross-platform > application might help, especially if that initialization only happens > if the platform is POSIX, or is commented to the effect that it has no > effect on Windows, but makes POSIX happy. Or maybe it is all buried > within the initialization of Python itself, and is not exposed to the > application at all. I still haven't figured that out, but was not (and > am still not) as concerned about that as ensuring that the overall > algorithms are functional and useful and user-friendly. Showing it > might have been helpful in making it clear that no Windows functionality > would change, however. > > A statement that additional features are being added to allow > cross-platform programs deal with non-decodable bytes obtained from > POSIX APIs using the same code that already works on Windows, would have > made things much clearer. 
The present Abstract does, in fact, talk only > about POSIX, but later statements about Windows muddy the water. > > Rationale paragraph 3, explicitly talks about cross-platform programs > needing to work one way on Windows and another way on POSIX to deal with > all the cases. It calls that a proposal, which I guess it is for > command line and environment, but it is already implemented in both > bytes and str forms for file names... so that further muddies the water. > > It is, of course, easier to point out deficiencies in a document than to > write a better document; however, it is incumbent upon the PEP author to > write a PEP that is good enough to get approved, and that means making > it understandable enough that people are in favor... or to respond to > the plethora of comments until people are in favor. I'm not sure which > one is more time-consuming. > > I've reached the point, based on PEP and comment responses, where I now > believe that the PEP is a solution to the problem it is trying to solve, > and doesn't create ambiguities in the naming. I don't believe it is the > best solution. > > The basic problem is the overuse of fake characters... normalizing them > for display results is large data loss -- many characters would be > translated to the same replacement characters. > > Solutions exist that would allow the use of fewer different fake > characters in the strings, while still having a fake character as the > escape character, to preserve the invariant that all the strings > manipulated by python-escape from the PEP were, and become, strings > containing fake characters (from a strict Unicode perspective), which is > a nice invariant*. There even exist solutions that would use only one > fake character (repeatedly if necessary), and all other characters > generated would be displayable characters. 
This would ease the burden > on the program in displaying the strings, and also on the user that > might view the resulting mojibake in trying to differentiate one such > string from another. Those are outlined in various emails in this > thread, although some include my misconception that strings obtained via > Unicode-enabled OS APIs would also need to be encoded and altered. If > there is any interest in using a more readable encoding, I'd be glad to > rework them to remove those misconceptions. > > * It would be nice to point out that invariant in the PEP, also. > > > -- > Glenn -- http://nevcal.com/ > =========================== > A protocol is complete when there is nothing left to remove. > -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/aahz%40pythoncraft.com -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair From google at mrabarnett.plus.com Thu Apr 30 16:01:01 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 30 Apr 2009 15:01:01 +0100 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <49F9A271.8050700@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <87eivbk5zx.fsf@uwakimon.sk.tsukuba.ac.jp> <7e51d15d0904292016u7848e093i355df00eed338c77@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <7e51d15d0904300434q7b667ed9lf989738c7fbdfc36@mail.gmail.com> <49F9A271.8050700@v.loewis.de> Message-ID: <49F9AF1D.8060100@mrabarnett.plus.com> Martin v. Löwis wrote: >> OK, so why not adopt the Mono solution in CPython? It seems to produce >> valid unicode strings, removing at least one issue with PEP 383. It >> also means that IronPython and CPython actually would be compatible. > > See my other message. The Mono solution may not be what you expect it to be. > Have we considered discussing the problem with the developers and users of the other languages to reach a common solution? From apt.shansen at gmail.com Thu Apr 30 16:05:29 2009 From: apt.shansen at gmail.com (Stephen Hansen) Date: Thu, 30 Apr 2009 07:05:29 -0700 Subject: [Python-Dev] what Windows and Linux really do Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> References: <7e51d15d0904300021t3e623cc6p862381f4c631e7c1@mail.gmail.com> Message-ID: <7a9c25c20904300705i2237ab62pb9f14b19c46252b@mail.gmail.com> > > You can't even print them without getting an error from Python. In fact, > you also can't print strings containing the proposed half-surrogate > encodings either: in both cases, the output encoder rejects them with a > UnicodeEncodeError. (If not even Python, with its generally lenient > attitude, can print those things, some other libraries probably will fail, > too.)
> I think you may be confusing two completely separate things; it's a long-known issue that the windows console is simply not a Unicode-aware display device naturally. You have to manually set the codepage (by typing 'chcp 65001' -- that's utf8) *and* manually make sure you have a unicode-enabled font chosen for it (which for console fonts is extremely limited to none, and last I looked the default font didn't support unicode) before you can even try to successfully print valid unicode. The default codepage is 437 (for me at least; I think it depends on which language of Windows you're using) which is ASCII-/ish/. You have to do your test in an environment which actually supports displaying unicode at all, or it's meaningless. Personally and for all the use cases I have to deal with at work, I would /love/ to see this PEP succeed. Being able to query a list of files in a directory and get them -all-, display them all to a user (which necessitates it being converted to unicode one way or the other. I don't care if certain characters don't display: as long as any arbitrary file will always end up looking like a distinct series of readable and unreadable glyphs so the user can select it clearly), and then perform operations on any selected file regardless of whatever nonsense may be going on underneath with confused users and encodings... in a cross-platform way, would be a tremendous boon to future py3k porting efforts. I ramble. If there are inconsistent encodings used by users on a posix system so that they can only make sense of half of what the names really are... that's for other programs to deal with. I just want to be able to access the files they tell me they want. For anyone who is doing something low-level, they can use the bytes API. --Stephen -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Thu Apr 30 16:39:31 2009 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Apr 2009 07:39:31 -0700 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: <49F96B80.5090808@v.loewis.de> References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: FWIW, I'm in agreement with this PEP (i.e. its status is now Accepted). Martin, you can update the PEP and start the implementation. On Thu, Apr 30, 2009 at 2:12 AM, "Martin v. Löwis" wrote: >> Did you use a name with other characters? Were they displayed? Both >> before and after the surrogates? > > Yes, yes, and yes (IOW, I put the surrogate in the middle). > >> Did you use one or three half surrogates, to produce the three crossed >> boxes? > > Only one, and it produced three boxes - probably one for each UTF-8 byte > that pango considered invalid. > >> Did you use one or three half surrogates, to produce the single square box? > > Again, only one. Apparently, PyQt passes the Python Unicode string to Qt > in a character-by-character representation, rather than going through UTF-8. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tmbdev at gmail.com Thu Apr 30 16:42:45 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 16:42:45 +0200 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <49F9A851.5010006@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> Message-ID: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> > > What's an analogous failure? Or, rather, why would a failure analogous > to the one I got when using System.IO.DirectoryInfo ever exist in > Python? Mono.Unix uses an encoder and a decoder that knows about special quoting rules. System.IO uses a different encoder and decoder because it's a reimplementation of a Microsoft library and the Mono developers chose not to implement Mono.Unix quoting rules in it. There is nothing technical preventing System.IO from using the Mono.Unix codec, it's just that the developers didn't want to change the behavior of an ECMA and Microsoft library. The analogous phenomenon will exist in Python with PEP 383. Let's say I have a C library with wide character interfaces and I pass it a unicode string from Python.(*) That C library now turns that unicode string into UTF-8 for writing to disk using its internal UTF-8 converter. The result is that the file can be opened using Python's "open", but it can't be opened using the other library. There simply is no way you can guarantee that all libraries turn unicode strings into pathnames using utf-8b. I'm not arguing about whether that's good or bad anymore, since it's obvious that the only proposal acceptable to Guido uses some form of non-standard encoding / quoting. 
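[Editorial sketch of the encoder-mismatch scenario above, in Python 3 terms. PEP 383's "utf-8b" was implemented as the "surrogateescape" error handler; a UTF-8 locale and a hypothetical byte filename are assumed here.]

```python
# A filename that is not valid UTF-8 (0xE9 is a bare Latin-1 byte):
raw = b"caf\xe9.txt"

# PEP 383 decoding smuggles the bad byte through as a lone low
# surrogate instead of raising an exception:
name = raw.decode("utf-8", "surrogateescape")
assert name == "caf\udce9.txt"

# Re-encoding with the same error handler restores the original bytes,
# so the round trip through the str API is lossless:
assert name.encode("utf-8", "surrogateescape") == raw

# But a library that runs its own strict UTF-8 encoder -- the case
# Thomas describes -- rejects the string outright:
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    pass  # lone surrogates are not encodable by a conforming codec
```

The round trip only works end-to-end if every layer that touches the name uses the surrogateescape-aware codec; any intermediate strict encoder breaks the chain.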
I'm simply pointing out that the failure you observed with System.IO has nothing to do with which quoting convention you choose, but results from the fact that the developers of System.IO are not using the same encoder/decoder as Mono.Unix (in that case, by choice). So, I don't see any reason to prefer your half surrogate quoting to the Mono U+0000-based quoting. Both seem to achieve the same goal with respect to round tripping file names, displaying them, etc., but Mono quoting actually results in valid unicode strings. It works because null is the one character that's not legal in a UNIX path name. So, why do you prefer half surrogate coding to U+0000 quoting? Tom (*) There's actually a second, subtle issue. PEP 383 intends utf-8b only to be used for file names. But that means that I might have to bind the first argument to TIFFOpen with utf-8b conversion, while I might have to bind other arguments with utf-8 conversion. -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at divmod.com Thu Apr 30 17:19:36 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Apr 2009 15:19:36 -0000 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> Message-ID: <20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com> On 02:42 pm, tmbdev at gmail.com wrote: >So, why do you prefer half surrogate coding to U+0000 quoting? 
I have also been eagerly waiting for an answer to this question. I am afraid I have lost it somewhere in the storm of this thread :). Martin, if you're going to stick with the half-surrogate trick, would you mind adding a section to the PEP on "alternate encoding strategies", explaining why the NULL method was not selected? From p.f.moore at gmail.com Thu Apr 30 17:04:30 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 30 Apr 2009 16:04:30 +0100 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> Message-ID: <79990c6b0904300804u7f54b6e6g1f4efe25fc8f1533@mail.gmail.com> 2009/4/30 Thomas Breuel : > The analogous phenomenon will exist in Python with PEP 383. Let's say I > have a C library with wide character interfaces and I pass it a unicode > string from Python.(*) [...] > (*) There's actually a second, subtle issue. PEP 383 intends utf-8b only to > be used for file names. But that means that I might have to bind the first > argument to TIFFOpen with utf-8b conversion, while I might have to bind > other arguments with utf-8 conversion. The footnote seems to imply that you have a concrete case rather than a hypothetical one. The discussion would be much easier if you would supply the concrete details. Then other participants in the discussion could offer concrete suggestions on how your issue could be addressed. Of course, there are 2 provisos here: 1. 
Maybe you don't care any more, having accepted that the PEP is going to be implemented. That's fine, but there's also no point continuing to argue your case in that event. 2. Maybe you aren't going to accept suggestions that don't conform to your idea of how things should be done. In which case, your reasoning is circular, and you're wasting people's time. Sorry, that sounds grumpy. But I get a headache at the best of times trying to understand Unicode issues, and theoretical, vague, descriptions of problems just make my headache worse... I suggest the discussion should be dropped now, as the PEP has been accepted. Paul. From martin at v.loewis.de Thu Apr 30 17:35:19 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 17:35:19 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> Message-ID: <49F9C537.9020801@v.loewis.de> > What's an analogous failure? Or, rather, why would a failure analogous > to the one I got when using System.IO.DirectoryInfo ever exist in > Python? > > > Mono.Unix uses an encoder and a decoder that knows about special quoting > rules. System.IO uses a different encoder and decoder because it's a > reimplementation of a Microsoft library and the Mono developers chose > not to implement Mono.Unix quoting rules in it. 
There is nothing > technical preventing System.IO from using the Mono.Unix codec, it's just > that the developers didn't want to change the behavior of an ECMA and > Microsoft library. > > The analogous phenomenon will exist in Python with PEP 383. Let's say I > have a C library with wide character interfaces and I pass it a unicode > string from Python.(*) That C library now turns that unicode string > into UTF-8 for writing to disk using its internal UTF-8 converter. What specific library do you have in mind? Would it always use UTF-8? If so, it will fail in many other ways, as well - if the locale charset is different from UTF-8. I fail to see the analogy. In Python, the standard library works, and the extension fails; in Mono, it's actually vice versa, and not at all analogous. > So, I don't see any reason to prefer your half surrogate quoting to the > Mono U+0000-based quoting. Both seem to achieve the same goal with > respect to round tripping file names, displaying them, etc., but Mono > quoting actually results in valid unicode strings. It works because > null is the one character that's not legal in a UNIX path name. > > So, why do you prefer half surrogate coding to U+0000 quoting? If I pass a string with an embedded U+0000 to gtk, gtk will truncate the string, and stop rendering it at this character. This is worse than what it does for invalid UTF-8 sequences. Chances are fairly high that other C libraries will fail in the same way, in particular if they expect char* (which is very common in C). So I prefer the half surrogate because its failure mode is better th > (*) There's actually a second, sutble issue. PEP 383 intends utf-8b > only to be used for file names. But that means that I might have to > bind the first argument to TIFFOpen with utf-8b conversion, while I > might have to bind other arguments with utf-8 conversion. I couldn't find a Python wrapper for libtiff. 
If a wrapper was written, it would indeed have to use the file system encoding for the file name parameters. However, it would have to do that even without PEP 383, since the file name should be encoded in the locale's encoding, not in UTF-8, anyway. Regards, Martin From murman at gmail.com Thu Apr 30 17:43:02 2009 From: murman at gmail.com (Michael Urman) Date: Thu, 30 Apr 2009 10:43:02 -0500 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> Message-ID: On Thu, Apr 30, 2009 at 09:42, Thomas Breuel wrote: > So, I don't see any reason to prefer your half surrogate quoting to the Mono > U+0000-based quoting. Both seem to achieve the same goal with respect to > round tripping file names, displaying them, etc., but Mono quoting actually > results in valid unicode strings. It works because null is the one > character that's not legal in a UNIX path name. This seems to summarize only half of the problem. Mono's U+0000 quoting creates a string which is an invalid filename; PEP 383's creates one which is an unsanctioned collection of code units. Neither can be passed directly to the posix filesystem in question. I favor PEP 383 because its Unicode strings can be usefully passed to most APIs that would display it usefully. Mono's U+0000 probably truncates most strings. 
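[Editorial sketch of the truncation point made above: any C API that takes a NUL-terminated char* stops reading at the first zero byte, so a U+0000 escape character is invisible to most of the C world. The filename is hypothetical.]

```python
import ctypes

# A name using U+0000 as an escape character, as in the Mono scheme:
quoted = "report\x00q.txt"
buf = quoted.encode("utf-8")    # b'report\x00q.txt'

# C string handling sees only the prefix before the NUL byte --
# ctypes' NUL-terminated buffer reproduces strlen()-style behavior:
assert ctypes.create_string_buffer(buf).value == b"report"
```

A half-surrogate string, by contrast, survives char*-based APIs intact; it merely renders oddly, which is the trade-off discussed in this thread.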
And since such non-valid Unicode strings can occur on the Windows filesystem, I don't find their use in PEP 383 to be a flaw. -- Michael Urman From martin at v.loewis.de Thu Apr 30 18:07:33 2009 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 30 Apr 2009 18:07:33 +0200 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> <20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com> Message-ID: <49F9CCC5.2010504@v.loewis.de> > Martin, if you're going to stick with the half-surrogate trick, would > you mind adding a section to the PEP on "alternate encoding strategies", > explaining why the NULL method was not selected? In the PEP process, it isn't my job to criticize competing proposals. Instead, proponents of competing proposals should write alternative PEPs, which then get criticized on their own. As the PEP author, I would have to collect the objections to the PEP in the PEP, which I did; I'm not convinced that I would have to also collect all alternative proposals that people come up with in the PEP (except when they are in fact amendments that I accept). I hope I had made it clear that I don't try to "shoot down" alternative proposals, but have rather asked people making alternative proposals to write their own PEPs. 
At some point (when the amount of alternative proposals grew unreasonably), I stopped responding to each and every alternative proposal that this should be proposed in a separate PEP. Wrt. escaping with U+0000: I personally disliked it because I considered it difficult to implement. In particular, on encoding: how do you arrange the encoder not to encode the NUL character in the encoding, as it would surely be a valid character? The surrogate approach works much better here, as it will automatically invoke the error handler. With further testing, I found that in practice, the proposal also suffers from the problem that the character would be taken as a terminating character by APIs - I found that to be a real problem in gtk, and have added that to the PEP. Regards, Martin From glyph at divmod.com Thu Apr 30 18:26:25 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Apr 2009 16:26:25 -0000 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <49F9C537.9020801@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> <49F9C537.9020801@v.loewis.de> Message-ID: <20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com> On 03:35 pm, martin at v.loewis.de wrote: >>So, why do you prefer half surrogate coding to U+0000 quoting? > >If I pass a string with an embedded U+0000 to gtk, gtk will truncate >the string, and stop rendering it at this character. This is worse than >what it does for invalid UTF-8 sequences. 
Chances are fairly high that >other C libraries will fail in the same way, in particular if they >expect char* (which is very common in C). Hmm. I believe the intended failure mode here, for PyGTK at least, is actually this: TypeError: GtkLabel.set_text() argument 1 must be string without null bytes, not unicode APIs in PyGTK which accept NULLs and silently truncate are probably broken. Although perhaps I've just made your point even more strongly; one because the behavior is inconsistent, and two because it sometimes raises an exception if a NULL is present, and apparently the goal here is to prevent exceptions from being raised anywhere in the process. For this idiom to be of any use to GTK programs, gtk.FileChooser.get_filename() will probably need to be changed, since (in py2) it currently returns a str, not unicode. The PEP should say something about how GUI libraries should handle file choosers, so that they'll be consistent and compatible with the standard library. Perhaps only that file choosers need to take this PEP into account, and the rest is obvious. Or maybe the right thing for GTK to do would be to continue to use bytes on POSIX and convert to text on Windows, since open(), listdir() et al. will continue to accept bytes for filenames? >So I prefer the half surrogate because its failure mode is better th Heh heh heh. From glyph at divmod.com Thu Apr 30 18:35:25 2009 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Apr 2009 16:35:25 -0000 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <49F9CCC5.2010504@v.loewis.de> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> <20090430151936.12555.425993626.divmod.xquotient.10039@weber.divmod.com> <49F9CCC5.2010504@v.loewis.de> Message-ID: <20090430163525.12555.1432542229.divmod.xquotient.10137@weber.divmod.com> On 04:07 pm, martin at v.loewis.de wrote: >>Martin, if you're going to stick with the half-surrogate trick, would >>you mind adding a section to the PEP on "alternate encoding >>strategies", >>explaining why the NULL method was not selected? > >In the PEP process, it isn't my job to criticize competing proposals. >Instead, proponents of competing proposals should write alternative >PEPs, which then get criticized on their own. As the PEP author, I >would >have to collect the objections to the PEP in the PEP, which I did; >I'm not convinced that I would have to also collect all alternative >proposals that people come up with in the PEP (except when they are in >fact amendments that I accept). Fair enough. I have probably misunderstood the process. I dimly recalled reading some PEPs which addressed alternate approaches in this way and I thought it was part of the process. Anyway, congratulations on getting the PEP accepted, good luck with the implementation. Thanks for addressing my question. From martin at v.loewis.de Thu Apr 30 18:21:03 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 18:21:03 +0200 Subject: [Python-Dev] a suggestion ... 
Re: PEP 383 (again) In-Reply-To: <20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com> References: <7e51d15d0904281201l68580ee2te8d726f974dbe6ac@mail.gmail.com> <7e51d15d0904292332n5bb26c4chff04d69e72f5d259@mail.gmail.com> <49F9484D.9010507@v.loewis.de> <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> <49F9C537.9020801@v.loewis.de> <20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com> Message-ID: <49F9CFEF.2050401@v.loewis.de> >> If I pass a string with an embedded U+0000 to gtk, gtk will truncate >> the string, and stop rendering it at this character. This is worse than >> what it does for invalid UTF-8 sequences. Chances are fairly high that >> other C libraries will fail in the same way, in particular if they >> expect char* (which is very common in C). > > Hmm. I believe the intended failure mode here, for PyGTK at least, is > actually this: > > TypeError: GtkLabel.set_text() argument 1 must be string without null > bytes, not unicode It may depend on the widget also, I tried it with wxMessageDialog (I only had the wx example available, and am using wxgtk). > APIs in PyGTK which accept NULLs and silently trucate are probably > broken. Although perhaps I've just made your point even more strongly; > one because the behavior is inconsistent, and two because it sometimes > raises an exception if a NULL is present, and apparently the goal here > is to prevent exceptions from being raised anywhere in the process. Indeed so. > For this idiom to be of any use to GTK programs, > gtk.FileChooser.get_filename() will probably need to be changed, since > (in py2) it currently returns a str, not unicode. 
Perhaps - the entire PEP is about Python 3 only. I don't know whether PyGTK already works with 3.x. > The PEP should say something about how GUI libraries should handle file > choosers, so that they'll be consistent and compatible with the standard > library. Perhaps only that file choosers need to take this PEP into > account, and the rest is obvious. Or maybe the right thing for GTK to > do would be to continue to use bytes on POSIX and convert to text on > Windows, since open(), listdir() et. al. will continue to accept bytes > for filenames? In Python 3, the file chooser should definitely return strings, and it would be good if they were PEP 383 compliant. >> So I prefer the half surrogate because its failure mode is better th > > Heh heh heh. And it wasn't even intentional :-) Martin From stephen at xemacs.org Thu Apr 30 18:39:52 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 01 May 2009 01:39:52 +0900 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <20090429224532.GA11604@cskk.homeip.net> References: <87d4avk3f9.fsf@uwakimon.sk.tsukuba.ac.jp> <20090429224532.GA11604@cskk.homeip.net> Message-ID: <87y6tihz8n.fsf@uwakimon.sk.tsukuba.ac.jp> Cameron Simpson writes: > On 29Apr2009 22:14, Stephen J. Turnbull wrote: > | Baptiste Carvello writes: > | > By contrast, if the new utf-8b codec would *supercede* the old one, > | > \udcxx would always mean raw bytes (at least on UCS-4 builds, where > | > surrogates are unused). Thus ambiguity could be avoided. > | > | Unfortunately, that's false. [Because Python strings are > | intended to be used as containers for widechars which are to be > | interpreted as Unicode when that makes sense, but there's no > | restriction against nonsense code points, including in UCS-4 > | Python.] [...] > Wouldn't you then be bypassing the implicit encoding anyway, at least to > some extent, and thus not trip over the PEP? Sure. 
I'm not really arguing the PEP here; the point is that under the current definition of Python strings, ambiguity is unavoidable. The best we can ask for is fewer exceptions, and an attempt to reduce ambiguity to a bare minimum in the code paths that we open up when we make a definition that allows a formerly erroneous computation to succeed. Martin is well aware of this; the PEP is clear enough about that (to me, but I'm a mail and multilingual editor internals kinda guy). I'd rather have more validation of strings, but *shrug* Martin's doing the work. OTOH, the Unicode fans need to understand that the past policy of Python is not to validate; Python is intended to provide all the tools needed to write validating apps, but it isn't one itself. Martin's PEP is quite narrow in that sense. All it is about is an invertible encoding of broken encodings. It does have the downside that it guarantees that Python itself can produce non-conforming strings, but that's not the end of the world, and an app can keep track of them or even refuse them by setting the error handler, if it wants to. From dripton at ripton.net Thu Apr 30 18:44:35 2009 From: dripton at ripton.net (David Ripton) Date: Thu, 30 Apr 2009 09:44:35 -0700 Subject: [Python-Dev] a suggestion ... Re: PEP 383 (again) In-Reply-To: <49F9CFEF.2050401@v.loewis.de> References: <7e51d15d0904300026v2156926cp48c52cf12d2dbe27@mail.gmail.com> <49F96095.4000208@v.loewis.de> <20090430112634.12555.1299754091.divmod.xquotient.10020@weber.divmod.com> <49F9A1FB.4090104@v.loewis.de> <7e51d15d0904300610r79a28888k9e742367992592b2@mail.gmail.com> <49F9A851.5010006@v.loewis.de> <7e51d15d0904300742s10c4c049pc7a0935b3cb366d1@mail.gmail.com> <49F9C537.9020801@v.loewis.de> <20090430162625.12555.1123571271.divmod.xquotient.10122@weber.divmod.com> <49F9CFEF.2050401@v.loewis.de> Message-ID: <20090430164435.GA314@vidar.dreamhost.com> On 2009.04.30 18:21:03 +0200, "Martin v. Löwis" wrote: > Perhaps - the entire PEP is about Python 3 only.
I don't know whether > PyGTK already works with 3.x. It does not. There is a bug in the Gnome tracker for it, and I believe some work has been done to start porting PyGObject, but it appears that a full PyGTK on Python 3 is a ways off. -- David Ripton dripton at ripton.net From google at mrabarnett.plus.com Thu Apr 30 20:02:23 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 30 Apr 2009 19:02:23 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces Message-ID: <49F9E7AF.1050306@mrabarnett.plus.com> One further question: should the encoder accept a string like u'\uDCC2\uDC80'? That would encode to b'\xC2\x80', which, when decoded, would give u'\x80'. Does the PEP only guarantee that strings decoded from the filesystem are reversible, but not check what might be de novo strings? From jimjjewett at gmail.com Thu Apr 30 21:03:47 2009 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 30 Apr 2009 15:03:47 -0400 Subject: [Python-Dev] #!/usr/bin/env python --> python3 where applicable Message-ID: Jared Grubb wrote: > Ok, so if I understand, the situation is: > * python points to 2.x version > * python3 points to 3.x version > * need to be able to run certain 3k scripts from cmdline (since we're > talking about shebangs) using Python3k even though "python" > points to 2.x > So, if I got the situation right, then do these same scripts > understand that PYTHONPATH and PYTHONHOME and all the others > are also probably pointing to 2.x code? Would it make sense to introduce PYTHON2PATH and PYTHON3PATH (or even PYTHON27PATH and PYTHON32PATH) et al? Or is this an area where we just figure that whoever moved the file locations around for distribution can hardcode things properly?
-jJ From martin at v.loewis.de Thu Apr 30 21:10:37 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 21:10:37 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F9E7AF.1050306@mrabarnett.plus.com> References: <49F9E7AF.1050306@mrabarnett.plus.com> Message-ID: <49F9F7AD.8090704@v.loewis.de> MRAB wrote: > One further question: should the encoder accept a string like > u'\uDCC2\uDC80'? That would encode to b'\xC2\x80' Indeed so. > which, when decoded, would give u'\x80'. Assuming the encoding is UTF-8, yes. > Does the PEP only guarantee that strings decoded > from the filesystem are reversible, but not check what might be de novo > strings? Exactly so. Regards, Martin From mike.klaas at gmail.com Thu Apr 30 21:10:51 2009 From: mike.klaas at gmail.com (Mike Klaas) Date: Thu, 30 Apr 2009 12:10:51 -0700 Subject: [Python-Dev] PEP 383 and GUI libraries In-Reply-To: References: <49F965DB.6050601@v.loewis.de> <49F96770.4080206@g.nevcal.com> <49F96B80.5090808@v.loewis.de> Message-ID: On 30-Apr-09, at 7:39 AM, Guido van Rossum wrote: > FWIW, I'm in agreement with this PEP (i.e. its status is now > Accepted). Martin, you can update the PEP and start the > implementation. +1 Kudos to Martin for seeing this through with (imo) considerable patience and dignity.
-Mike From larry at hastings.org Thu Apr 30 21:32:32 2009 From: larry at hastings.org (Larry Hastings) Date: Thu, 30 Apr 2009 12:32:32 -0700 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <49F8DBCD.6050504@trueblade.com> References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk> <49F8DBCD.6050504@trueblade.com> Message-ID: <49F9FCD0.80208@hastings.org> Counting the votes for http://bugs.python.org/issue5799 : +1 from Mark Hammond (via private mail) +1 from Paul Moore (via the tracker) +1 from Tim Golden (in Python-ideas, though what he literally said was "I'm up for it") +1 from Michael Foord +1 from Eric Smith There have been no other votes. Is that enough consensus for it to go in? If so, are there any core developers who could help me get it in before the 3.1 feature freeze? The patch should be in good shape; it has unit tests and updated documentation. /larry/ From piet at cs.uu.nl Thu Apr 30 21:33:05 2009 From: piet at cs.uu.nl (Piet van Oostrum) Date: Thu, 30 Apr 2009 21:33:05 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> (Ronald Oussoren's message of "Tue\, 28 Apr 2009 14\:30\:43 +0200") References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> <49F6F09E.2020506@voidspace.org.uk> <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> Message-ID: >>>>> Ronald Oussoren (RO) wrote: >RO> For what it's worth, the OSX API's seem to behave as follows: >RO> * If you create a file with a non-UTF8 name on a HFS+ filesystem the >RO> system automatically encodes the name. >RO> That is, open(chr(255), 'w') will silently create a file named '%FF' >RO> instead of the name you'd expect on a unix system. Not for me (I am using Python 2.6.2).
>>> f = open(chr(255), 'w') Traceback (most recent call last): File "", line 1, in IOError: [Errno 22] invalid mode ('w') or filename: '\xff' >>> I once got a tar file from a Linux system which contained a file with a non-ASCII, ISO-8859-1 encoded filename. The tar file refused to be unpacked on a HFS+ filesystem. -- Piet van Oostrum URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: piet at vanoostrum.org From barry at barrys-emacs.org Thu Apr 30 21:43:24 2009 From: barry at barrys-emacs.org (Barry Scott) Date: Thu, 30 Apr 2009 20:43:24 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F92E82.9040702@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> <49F92E82.9040702@v.loewis.de> Message-ID: <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org> On 30 Apr 2009, at 05:52, Martin v. Löwis wrote: >> How do I get a printable unicode version of these path strings if they >> contain non-unicode data? > Define "printable". One way would be to use a regular expression, > replacing all codes in a certain range with a question mark. What I mean by printable is that the string must be valid unicode that I can print to a UTF-8 console or place as text in a UTF-8 web page. I think your PEP gives me a string that will not encode to valid UTF-8 that the outside of python world likes. Did I get this point wrong? > > >> I'm guessing that an app has to understand that filenames come in >> two forms >> unicode and bytes if it's not utf-8 data. Why not simply return >> string if >> it's valid utf-8 otherwise return bytes? > > That would have been an alternative solution, and the one that 2.x > uses > for listdir. People didn't like it. In our application we are running fedora with the assumption that the filenames are UTF-8. When Windows systems FTP files to our system the files are in CP-1251(?) and not valid UTF-8.
What we have to do is detect these non UTF-8 filenames and get the users to rename them. Having an algorithm that says if it's a string no problem, if it's a byte deal with the exceptions seems simple. How do I do this detection with the PEP proposal? Do I end up using the byte interface and doing the utf-8 decode myself? Barry From google at mrabarnett.plus.com Thu Apr 30 21:54:42 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 30 Apr 2009 20:54:42 +0100 Subject: [Python-Dev] Proposed: add support for UNC paths to all functions in ntpath In-Reply-To: <49F9FCD0.80208@hastings.org> References: <49F8B222.7070204@hastings.org> <49F8D9A0.7000104@voidspace.org.uk> <49F8DBCD.6050504@trueblade.com> <49F9FCD0.80208@hastings.org> Message-ID: <49FA0202.2020203@mrabarnett.plus.com> Larry Hastings wrote: > > > Counting the votes for http://bugs.python.org/issue5799 : > > +1 from Mark Hammond (via private mail) > +1 from Paul Moore (via the tracker) > +1 from Tim Golden (in Python-ideas, though what he literally said > was "I'm up for it") > +1 from Michael Foord > +1 from Eric Smith > > There have been no other votes. > > Is that enough consensus for it to go in? If so, are there any core > developers who could help me get it in before the 3.1 feature freeze? > The patch should be in good shape; it has unit tests and updated > documentation. > +1 from me.
From nad at acm.org Thu Apr 30 21:54:50 2009 From: nad at acm.org (Ned Deily) Date: Thu, 30 Apr 2009 12:54:50 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> <49F6F09E.2020506@voidspace.org.uk> <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> Message-ID: In article , Piet van Oostrum wrote: > >>>>> Ronald Oussoren (RO) wrote: > >RO> For what it's worth, the OSX API's seem to behave as follows: > >RO> * If you create a file with a non-UTF8 name on a HFS+ filesystem the > >RO> system automatically encodes the name. > > >RO> That is, open(chr(255), 'w') will silently create a file named '%FF' > >RO> instead of the name you'd expect on a unix system. > > Not for me (I am using Python 2.6.2). > > >>> f = open(chr(255), 'w') > Traceback (most recent call last): > File "", line 1, in > IOError: [Errno 22] invalid mode ('w') or filename: '\xff' > >>> What version of OSX are you using? On Tiger 10.4.11 I see the failure you see but on Leopard 10.5.6 the behavior Ronald reports. -- Ned Deily, nad at acm.org From martin at v.loewis.de Thu Apr 30 22:06:33 2009 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Apr 2009 22:06:33 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org> References: <49EEBE2E.3090601@v.loewis.de> <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> <49F92E82.9040702@v.loewis.de> <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org> Message-ID: <49FA04C9.70906@v.loewis.de> >>> How do I get a printable unicode version of these path strings if they >>> contain non-unicode data? >> >> Define "printable".
One way would be to use a regular expression, >> replacing all codes in a certain range with a question mark. > > What I mean by printable is that the string must be valid unicode > that I can print to a UTF-8 console or place as text in a UTF-8 > web page. > > I think your PEP gives me a string that will not encode to > valid UTF-8 that the outside of python world likes. Did I get this > point wrong? You are right. However, if your *only* requirement is that it should be printable, then this is fairly underspecified. One way to get a printable string would be this function def printable_string(unprintable): return "" This will always return a printable version of the input string... > In our application we are running fedora with the assumption that the > filenames are UTF-8. When Windows systems FTP files to our system > the files are in CP-1251(?) and not valid UTF-8. That would be a bug in your FTP server, no? If you want all file names to be UTF-8, then your FTP server should arrange for that. > Having an algorithm that says if it's a string no problem, if it's > a byte deal with the exceptions seems simple. > > How do I do this detection with the PEP proposal? > Do I end up using the byte interface and doing the utf-8 decode > myself? No, you should encode using the "strict" error handler, with the locale encoding. If the file name encodes successfully, it's correct, otherwise, it's broken.
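For instance, a sketch of that check (the helper name is invented here; this assumes Python 3, where the PEP's escaped bytes show up as lone surrogates that fail a strict encode):

```python
import sys

def is_faithfully_encoded(filename, encoding=None):
    # Hypothetical helper: re-encode the decoded name with
    # errors='strict'. Names that came off the filesystem with
    # undecodable bytes contain lone surrogates, and encoding
    # them strictly raises UnicodeEncodeError.
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        filename.encode(encoding, 'strict')
        return True
    except UnicodeEncodeError:
        return False
```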
Regards, Martin From google at mrabarnett.plus.com Thu Apr 30 22:07:41 2009 From: google at mrabarnett.plus.com (MRAB) Date: Thu, 30 Apr 2009 21:07:41 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org> References: <49EEBE2E.3090601@v.loewis.de> <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> <49F92E82.9040702@v.loewis.de> <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org> Message-ID: <49FA050D.4090104@mrabarnett.plus.com> Barry Scott wrote: > > On 30 Apr 2009, at 05:52, Martin v. Löwis wrote: > >>> How do I get a printable unicode version of these path strings if they >>> contain non-unicode data? >> >> Define "printable". One way would be to use a regular expression, >> replacing all codes in a certain range with a question mark. > > What I mean by printable is that the string must be valid unicode > that I can print to a UTF-8 console or place as text in a UTF-8 > web page. > > I think your PEP gives me a string that will not encode to > valid UTF-8 that the outside of python world likes. Did I get this > point wrong? > > >> >> >>> I'm guessing that an app has to understand that filenames come in two >>> forms >>> unicode and bytes if it's not utf-8 data. Why not simply return string if >>> it's valid utf-8 otherwise return bytes? >> >> That would have been an alternative solution, and the one that 2.x uses >> for listdir. People didn't like it. > > In our application we are running fedora with the assumption that the > filenames are UTF-8. When Windows systems FTP files to our system > the files are in CP-1251(?) and not valid UTF-8. > > What we have to do is detect these non UTF-8 filenames and get the > users to rename them. > > Having an algorithm that says if it's a string no problem, if it's > a byte deal with the exceptions seems simple. > > How do I do this detection with the PEP proposal?
> Do I end up using the byte interface and doing the utf-8 decode > myself? > What do you do currently? The PEP just offers a way of reading all filenames as Unicode, if that's what you want. So what if the strings can't be encoded to normal UTF-8! The filenames aren't valid UTF-8 anyway! :-) From foom at fuhm.net Thu Apr 30 22:20:31 2009 From: foom at fuhm.net (James Y Knight) Date: Thu, 30 Apr 2009 16:20:31 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49F97275.3010307@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F97275.3010307@v.loewis.de> Message-ID: <36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net> On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote: > I think you are right. I have now excluded ASCII bytes from being > mapped, effectively not supporting any encodings that are not ASCII > compatible. Does that sound ok? Yes. The practical upshot of this is that users who brokenly use "ja_JP.SJIS" as their locale (which, note, first requires editing some files in /var/lib/locales manually to enable its use..) may still have python not work with invalid-in-shift-jis filenames. Since that locale is widely recognized as a bad idea to use, and is not supported by any distros, it certainly doesn't bother me that it isn't 100% supported in python. It seems like the most common reason why people want to use SJIS is to make old pre-unicode apps work right in WINE -- in which case it doesn't actually affect unix python at all.
I'd personally be fine with python just declaring that the filesystem- encoding will *always* be utf-8b and ignore the locale...but I expect some other people might complain about that. Of course, application authors can decide to do that themselves by calling sys.setfilesystemencoding('utf-8b') at the start of their program. James From tmbdev at gmail.com Thu Apr 30 22:55:48 2009 From: tmbdev at gmail.com (Thomas Breuel) Date: Thu, 30 Apr 2009 22:55:48 +0200 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> <49F6F09E.2020506@voidspace.org.uk> <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> Message-ID: <7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com> > > Not for me (I am using Python 2.6.2). > > >>> f = open(chr(255), 'w') > Traceback (most recent call last): > File "", line 1, in > IOError: [Errno 22] invalid mode ('w') or filename: '\xff' > >>> You can get the same error on Linux: $ python Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> f=open(chr(255),'w') Traceback (most recent call last): File "", line 1, in IOError: [Errno 22] invalid mode ('w') or filename: '\xff' >>> (Some file system drivers do not enforce valid utf8 yet, but I suspect they will in the future.) Tom -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barry at barrys-emacs.org Thu Apr 30 23:13:43 2009 From: barry at barrys-emacs.org (Barry Scott) Date: Thu, 30 Apr 2009 22:13:43 +0100 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <49FA04C9.70906@v.loewis.de> References: <49EEBE2E.3090601@v.loewis.de> <83846C63-72CE-4E6E-A30D-8CF1AD95D2CF@barrys-emacs.org> <49F92E82.9040702@v.loewis.de> <3710EB34-DB71-4AD3-90D3-9657733B6DD3@barrys-emacs.org> <49FA04C9.70906@v.loewis.de> Message-ID: <3D703962-7B3A-4BBC-95DB-ACD90838F13B@barrys-emacs.org> On 30 Apr 2009, at 21:06, Martin v. Löwis wrote: >>>> How do I get a printable unicode version of these path strings if >>>> they >>>> contain non-unicode data? >>> >>> Define "printable". One way would be to use a regular expression, >>> replacing all codes in a certain range with a question mark. >> >> What I mean by printable is that the string must be valid unicode >> that I can print to a UTF-8 console or place as text in a UTF-8 >> web page. >> >> I think your PEP gives me a string that will not encode to >> valid UTF-8 that the outside of python world likes. Did I get this >> point wrong? > > You are right. However, if your *only* requirement is that it should > be printable, then this is fairly underspecified. One way to get > a printable string would be this function > > def printable_string(unprintable): > return "" Ha ha! Indeed this works, but I would have to try to turn enough of the string into a reasonable hint at the name of the file so the user has some chance of knowing what is being reported. > > > This will always return a printable version of the input string... > >> In our application we are running fedora with the assumption that the >> filenames are UTF-8. When Windows systems FTP files to our system >> the files are in CP-1251(?) and not valid UTF-8. > > That would be a bug in your FTP server, no? If you want all file names > to be UTF-8, then your FTP server should arrange for that.
Not a bug, it's the lack of a feature. We use ProFTPd that has just implemented what is required. I forget the exact details - they are at work - when the ftp client asks for the FEAT of the ftp server, the server can say use UTF-8. Supporting that in the server was apparently non-trivial. > > >> Having an algorithm that says if it's a string no problem, if it's >> a byte deal with the exceptions seems simple. >> >> How do I do this detection with the PEP proposal? >> Do I end up using the byte interface and doing the utf-8 decode >> myself? > > No, you should encode using the "strict" error handler, with the > locale encoding. If the file name encodes successfully, it's correct, > otherwise, it's broken. O.k. I understand. Barry From benjamin at python.org Thu Apr 30 23:25:16 2009 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 30 Apr 2009 16:25:16 -0500 Subject: [Python-Dev] 3.1 beta deferred Message-ID: <1afaf6160904301425h4b420827w3c51eafd097e9c73@mail.gmail.com> Hi everyone! In the interest of letting Martin implement PEP 383 for 3.1, I am deferring the release of the 3.1 beta until next Wednesday, May 6th. Thank you, Benjamin From tjreedy at udel.edu Thu Apr 30 23:39:10 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 30 Apr 2009 17:39:10 -0400 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net> References: <49EEBE2E.3090601@v.loewis.de> <49F184C6.8000905@g.nevcal.com> <49F30083.5050506@v.loewis.de> <49F559A4.8050400@g.nevcal.com> <49F60A8A.8090603@v.loewis.de> <49F63B19.7010306@g.nevcal.com> <49F6799F.5030208@v.loewis.de> <875E02B9-00AA-47E0-AA68-66C2B62DBF33@fuhm.net> <49F6A71A.3020809@v.loewis.de> <873CC8F9-879C-4146-91D5-072ACA4D4D9B@fuhm.net> <49F97275.3010307@v.loewis.de> <36EBC80A-EBF2-4C4E-B948-48AA30E63911@fuhm.net> Message-ID: James Y Knight wrote: > On Apr 30, 2009, at 5:42 AM, Martin v. Löwis wrote: >> I think you are right.
I have now excluded ASCII bytes from being >> mapped, effectively not supporting any encodings that are not ASCII >> compatible. Does that sound ok? > > Yes. The practical upshot of this is that users who brokenly use > "ja_JP.SJIS" as their locale (which, note, first requires editing some > files in /var/lib/locales manually to enable its use..) may still have > python not work with invalid-in-shift-jis filenames. Since that locale > is widely recognized as a bad idea to use, and is not supported by any > distros, it certainly doesn't bother me that it isn't 100% supported in > python. It seems like the most common reason why people want to use SJIS > is to make old pre-unicode apps work right in WINE -- in which case it > doesn't actually affect unix python at all. > > I'd personally be fine with python just declaring that the > filesystem-encoding will *always* be utf-8b and ignore the locale...but > I expect some other people might complain about that. Of course, > application authors can decide to do that themselves by calling > sys.setfilesystemencoding('utf-8b') at the start of their program. It seems to me that the 3.1+ doc set (or wiki) could be usefully extended with a How-to on working with filenames. I am not sure that everything useful fits anywhere in particular in the ref manuals.
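As a sketch of the round-trip such a How-to would demonstrate (using the PEP's error handler, spelled 'surrogateescape' in the eventual implementation; this also illustrates the de novo caveat MRAB raised earlier in the thread):

```python
# A byte that is invalid UTF-8 (0xE9, latin-1 'é') survives a
# decode/encode round-trip: it is escaped as a lone low surrogate.
raw = b'caf\xe9.txt'
name = raw.decode('utf-8', 'surrogateescape')
assert name == 'caf\udce9.txt'
assert name.encode('utf-8', 'surrogateescape') == raw

# De novo strings are not checked: this surrogate pair encodes to
# bytes that *are* valid UTF-8, so its round-trip is not an identity.
devo = '\udcc2\udc80'
encoded = devo.encode('utf-8', 'surrogateescape')
assert encoded == b'\xc2\x80'
assert encoded.decode('utf-8', 'surrogateescape') == '\x80'
```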
From a.badger at gmail.com Thu Apr 30 23:35:42 2009 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 30 Apr 2009 14:35:42 -0700 Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces In-Reply-To: <7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com> References: <20090427211447.GA4291@cskk.homeip.net> <49F658A5.7080807@g.nevcal.com> <79990c6b0904280220x5a1352b6u153edc7487c737f9@mail.gmail.com> <79990c6b0904280457g3c8b1153p84624b3ab1ef04be@mail.gmail.com> <49F6F09E.2020506@voidspace.org.uk> <1209A1AB-1A80-4E46-88B3-5F545476ADFA@mac.com> <7e51d15d0904301355u2268bf0te06769792f697cc7@mail.gmail.com> Message-ID: <49FA19AE.9060402@gmail.com> Thomas Breuel wrote: > Not for me (I am using Python 2.6.2). > > >>> f = open(chr(255), 'w') > Traceback (most recent call last): > File "", line 1, in > IOError: [Errno 22] invalid mode ('w') or filename: '\xff' > >>> > > > You can get the same error on Linux: > > $ python > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) > [GCC 4.3.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> f=open(chr(255),'w') > Traceback (most recent call last): > File "", line 1, in > IOError: [Errno 22] invalid mode ('w') or filename: '\xff' >>>> > > (Some file system drivers do not enforce valid utf8 yet, but I suspect > they will in the future.) > Do you suspect that from discussing the issue with kernel developers or reading a thread on lkml? If not, then your suspicion seems to be pretty groundless.... The fact that VFAT enforces an encoding does not lend itself to your argument for two reasons: 1) VFAT is not a Unix filesystem. It's a filesystem that's compatible with Windows/DOS. If Windows and DOS have filesystem encodings, then it makes sense for that driver to enforce that as well. Filesystems intended to be used natively on Linux/Unix do not necessarily make this design decision. 2) The encoding is specified when mounting the filesystem.
This means that you can still mix encodings in a number of ways. If you mount with an encoding that has full byte coverage, for instance, each user can put filenames from different encodings on there. If you mount with utf8 on a system which uses euc-jp as the default encoding, you can have full paths that contain a mix of utf-8 and euc-jp. Etc. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From tjreedy at udel.edu Thu Apr 30 23:41:56 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 30 Apr 2009 17:41:56 -0400 Subject: [Python-Dev] 3.1 beta deferred In-Reply-To: <1afaf6160904301425h4b420827w3c51eafd097e9c73@mail.gmail.com> References: <1afaf6160904301425h4b420827w3c51eafd097e9c73@mail.gmail.com> Message-ID: Benjamin Peterson wrote: > Hi everyone! > In the interest of letting Martin implement PEP 383 for 3.1, I am > deferring the release of the 3.1 beta until next Wednesday, May 6th. That might also give time for Larry Hastings' UNC path patch. (and anything else essentially ready ;-)