From martin at v.loewis.de Thu May 1 00:50:43 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 01 May 2008 00:50:43 +0200 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: <20080430144804.GA26439@panix.com> References: <20080430144804.GA26439@panix.com> Message-ID: <4818F7C3.7060806@v.loewis.de> > There's a big difference between "not enough memory" and "directory > consumes lots of memory". My company has some directories with several > hundred thousand entries, so using an iterator would be appreciated > (although by the time we upgrade to Python 3.x, we probably will have > fixed that architecture). > > But even then, we're talking tens of megabytes at worst, so it's not a > killer -- just painful. But what kind of operation do you want to perform on that directory? I would expect that usually, you either a) refer to a single file, which you are either going to create, or want to process. In that case, you know the name in advance, so you open/stat/mkdir/unlink/rmdir the file, without caring how many files exist in the directory, or b) need to process all files, to count/sum/backup/remove them; in this case, you will need the entire list in the process, and reading them one-by-one is likely going to slow down the entire operation, instead of speeding it up. So in no case, you actually need to read the entries incrementally. That the C APIs provide chunk-wise processing is just because dynamic memory management is so painful to write in C that the caller is just asked to pass a limited-size output buffer, which then gets refilled in subsequent read calls. Originally, the APIs would return a single entry at a time from the file system, which was super-slow. Today, SysV all-singing all-dancing getdents provides multiple entries at a time, for performance reasons. 
Regards, Martin From greg.ewing at canterbury.ac.nz Thu May 1 00:49:23 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 01 May 2008 10:49:23 +1200 Subject: [Python-3000] range() issues In-Reply-To: References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com> <1209582854.1924.7.camel@qrnik> Message-ID: <4818F773.4060809@canterbury.ac.nz> Guido van Rossum wrote: > I would like to see the following: > > - sq_length should return maxsize if the actual value doesn't fit So that code will silently behave as though the rest of the sequence wasn't there some of the time? Can you elaborate on the rationale for this? I'm having trouble seeing how it's a good idea. -- Greg From guido at python.org Thu May 1 01:00:25 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Apr 2008 16:00:25 -0700 Subject: [Python-3000] range() issues In-Reply-To: <4818F773.4060809@canterbury.ac.nz> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> Message-ID: On Wed, Apr 30, 2008 at 3:49 PM, Greg Ewing wrote: > Guido van Rossum wrote: > > > I would like to see the following: > > > > - sq_length should return maxsize if the actual value doesn't fit > > > > So that code will silently behave as though the rest of > the sequence wasn't there some of the time? Only if it uses LBYL. > Can you elaborate on the rationale for this? I'm having > trouble seeing how it's a good idea. Ask the designers of the Java collections package. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu May 1 01:02:31 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Apr 2008 16:02:31 -0700 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: <4818F7C3.7060806@v.loewis.de> References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> Message-ID: There is one use case I can see for an iterator-version of os.listdir() (to be named os.opendir()): when globbing a huge directory looking for a certain pattern. Using os.listdir() you end up needing enough memory to hold all of the names at once. Using os.opendir() you would need only enough memory to hold all of the names THAT MATCH. On Wed, Apr 30, 2008 at 3:50 PM, "Martin v. Löwis" wrote: > > There's a big difference between "not enough memory" and "directory > > consumes lots of memory". My company has some directories with several > > hundred thousand entries, so using an iterator would be appreciated > > (although by the time we upgrade to Python 3.x, we probably will have > > fixed that architecture). > > > > But even then, we're talking tens of megabytes at worst, so it's not a > > killer -- just painful. > > But what kind of operation do you want to perform on that directory? > > I would expect that usually, you either > > a) refer to a single file, which you are either going to create, or > want to process. In that case, you know the name in advance, so > you open/stat/mkdir/unlink/rmdir the file, without caring how > many files exist in the directory, > or > > b) need to process all files, to count/sum/backup/remove them; > in this case, you will need the entire list in the process, > and reading them one-by-one is likely going to slow down > the entire operation, instead of speeding it up. > > So in no case, you actually need to read the entries incrementally. 
> > That the C APIs provide chunk-wise processing is just because > dynamic memory management is so painful to write in C that the > caller is just asked to pass a limited-size output buffer, which then > gets refilled in subsequent read calls. Originally, the APIs would > return a single entry at a time from the file system, which was > super-slow. Today, SysV all-singing all-dancing getdents provides > multiple entries at a time, for performance reasons. > > Regards, > Martin -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Thu May 1 01:11:23 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 01 May 2008 09:11:23 +1000 Subject: [Python-3000] range() issues In-Reply-To: <4818F773.4060809@canterbury.ac.nz> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> Message-ID: <4818FC9B.1080809@gmail.com> Greg Ewing wrote: > Guido van Rossum wrote: >> I would like to see the following: >> >> - sq_length should return maxsize if the actual value doesn't fit > > So that code will silently behave as though the rest of > the sequence wasn't there some of the time? > > Can you elaborate on the rationale for this? I'm having > trouble seeing how it's a good idea. > Yeah, it sounds more like behaviour I would expect from __length_hint__, not __len__. 
In the bug tracker, Alexander mentioned the possibility of removing __len__ and __getitem__ support from range() objects in py3k, and implementing only __length_hint__ instead (leaving range() as a bare-bones iterable). I'm starting to like that idea more and more. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Thu May 1 01:14:08 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 01 May 2008 11:14:08 +1200 Subject: [Python-3000] range() issues In-Reply-To: References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> Message-ID: <4818FD40.5010701@canterbury.ac.nz> Guido van Rossum wrote: > On Wed, Apr 30, 2008 at 3:49 PM, Greg Ewing wrote: > >> So that code will silently behave as though the rest of >> the sequence wasn't there some of the time? > > Only if it uses LBYL. I don't understand that. Iteration isn't the only thing one does with sequences. If you have a reason to call len() in the first place, I don't see how having it sometimes return inaccurate results can be helpful. >> Can you elaborate on the rationale for this? > Ask the designers of the Java collections package. Do you mean that they have a rationale which you agree with and think applies to Python as well, or do you mean that you're doing it just because Java does it and they must have a good reason? If the former, can you refer me to a document which espouses it? 
-- Greg From guido at python.org Thu May 1 01:36:24 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Apr 2008 16:36:24 -0700 Subject: [Python-3000] range() issues In-Reply-To: <4818FC9B.1080809@gmail.com> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> <4818FC9B.1080809@gmail.com> Message-ID: On Wed, Apr 30, 2008 at 4:11 PM, Nick Coghlan wrote: > In the bug tracker, Alexander mentioned the possibility of removing > __length__ and __getitem__ support from range() objects in py3k, and > implementing only __length_hint__ instead (leaving range() as a bare-bones > iterable). I'm starting to like that idea more and more. Indeed. Do check if it breaks anything though (and how serious the breakage is). Also note that __bool__ for a range should probably remain implemented -- True for a non-empty range, False for an empty one. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu May 1 01:41:22 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Apr 2008 16:41:22 -0700 Subject: [Python-3000] range() issues In-Reply-To: <4818FD40.5010701@canterbury.ac.nz> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> <4818FD40.5010701@canterbury.ac.nz> Message-ID: On Wed, Apr 30, 2008 at 4:14 PM, Greg Ewing wrote: > Guido van Rossum wrote: > > > > > On Wed, Apr 30, 2008 at 3:49 PM, Greg Ewing > wrote: > > > > > > > > > So that code will silently behave as though the rest of > > > the sequence wasn't there some of the time? > > > > > > > Only if it uses LBYL. > > > > I don't understand that. Iteration isn't the only thing > one does with sequences. 
If you have a reason to call > len() in the first place, I don't see how having it > sometimes return inaccurate results can be helpful. I've come across situations where len() raising an exception was more inconvenient than returning a truncated value (e.g. when printing). > > > Can you elaborate on the rationale for this? > > > > > > > > > Ask the designers of the Java collections package. > > > > Do you mean that they have a rationale which you agree > with and think applies to Python as well, or do you > mean that you're doing it just because Java does it > and they must have a good reason? > > If the former, can you refer me to a document which > espouses it? You'll have to do some research, but I believe the circumstances are similar -- they have a size() method that is defined to return an unboxed int, so they are limited by that. I found the spec here: http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collection.html#size() But I didn't find a rationale. I'm sure it was PBP though. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Thu May 1 02:02:22 2008 From: brett at python.org (Brett Cannon) Date: Wed, 30 Apr 2008 17:02:22 -0700 Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: <001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1> Message-ID: On Tue, Apr 29, 2008 at 11:33 PM, Joe Smith wrote: > > "Brett Cannon" wrote in message > news:bbaeab100804291344g1ca48af9s3b5bbdacf516b8d7 at mail.gmail.com... > > > > > On Tue, Apr 29, 2008 at 2:46 AM, Raymond Hettinger wrote: > > > > > > > > > * UserList/UserString [done: 3.0] > > > > > > > > > > Note that these were updated and moved to the collections module in > Py3.0. > > > > > > > > > > Noted. > > > > > > > > > > > > > > anydbm dbm.tools [1]_ > > > > whichdb dbm.tools [1]_ > > > > > > > > > > Were there any better naming suggestions than dbm.tools? The original > > > names seem much more informative. 
> > But way too much overhead for two modules that only contained one > > useful function each. As Nick said, if you don't know DB stuff then I > > don't see any loss of information. > > > > If you can come up with a better name I am open to suggestions, but > > the module merge will happen. > > > > Is there a problem having the functions be just dbm.open() and > dbm.whichdb()? As a user the latter one seems especially logical, as it is a > tool to help me select which "submodule" I want to use. There is a general dislike of putting code in a package's __init__ module. Personally I am fine with doing that, but I tried not to do that with the reorg. If people speak up in support of this then it can happen. -Brett From guido at python.org Thu May 1 02:08:31 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Apr 2008 17:08:31 -0700 Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: <001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1> Message-ID: On Wed, Apr 30, 2008 at 5:02 PM, Brett Cannon wrote: > There is a general dislike of putting code in a package's __init__ > module. Personally I am fine with doing that, but I tried not to do > that with the reorg. If people speak up in support of this then it can > happen. I'm not sure I agree with that sentiment. Quite a few packages have large __init__.py files. Django routinely puts lots of code there too. Even if people prefer not to put (too much) code in __init__.py, a good compromise might be to put actual implementation code in a separate submodule, and to put things like from submodule import * # submodule.py better define __all__... or from submodule import api1, api2, ... in __init__.py. 
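A minimal sketch of the compromise described above, for readers who want to try it. The package name `demo_dbm`, the submodule name `_impl`, and the two stub functions are hypothetical and exist only for illustration; they are not the stdlib `dbm` layout. The sketch also uses the explicit relative import `from ._impl import *`, which is an assumption about how the pattern would be spelled in Python 3.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk so the example is self-contained.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_dbm"  # hypothetical package name
pkg.mkdir()

# The implementation lives in a submodule and declares its public API
# through __all__, as the message above recommends.
(pkg / "_impl.py").write_text(textwrap.dedent('''\
    __all__ = ["open", "whichdb"]

    def open(name):
        return "opened " + name

    def whichdb(name):
        return "demo_dbm.dumb"

    def _private_helper():
        pass
'''))

# __init__.py contains no logic of its own; it only re-exports.
# "import *" copies exactly the names listed in __all__.
(pkg / "__init__.py").write_text("from ._impl import *\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_dbm = importlib.import_module("demo_dbm")

print(demo_dbm.whichdb("spam"))
print(hasattr(demo_dbm, "_private_helper"))
```

Callers then write `demo_dbm.open(...)` and `demo_dbm.whichdb(...)` without needing to know that `_impl` exists, while the helper that is missing from `__all__` stays out of the package namespace.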
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From musiccomposition at gmail.com Thu May 1 02:10:44 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Wed, 30 Apr 2008 19:10:44 -0500 Subject: [Python-3000] range() issues In-Reply-To: References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> <4818FD40.5010701@canterbury.ac.nz> Message-ID: <1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com> On Wed, Apr 30, 2008 at 6:41 PM, Guido van Rossum wrote: > I've come across situations where len() raising an exception was more > inconvenient than returning a truncated value (e.g. when printing). In those cases, shouldn't you be explicit, catch the overflow exception, and then use sys.maxsize? > But I didn't find a rationale. I'm sure it was PBP though. What's PBP? (A search only turns up a bicycle race. :)) -- Cheers, Benjamin Peterson From guido at python.org Thu May 1 02:16:26 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Apr 2008 17:16:26 -0700 Subject: [Python-3000] range() issues In-Reply-To: <1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> <4818FD40.5010701@canterbury.ac.nz> <1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com> Message-ID: On Wed, Apr 30, 2008 at 5:10 PM, Benjamin Peterson wrote: > On Wed, Apr 30, 2008 at 6:41 PM, Guido van Rossum wrote: > > I've come across situations where len() raising an exception was more > > inconvenient than returning a truncated value (e.g. when printing). > > In those cases, shouldn't you be explicit, catch the overflow > exception, and then use sys.maxsize? That's what I did *after* a big run crashed. :-( > > But I didn't find a rationale. I'm sure it was PBP though. > > What's PBP? (A search only turns up a bicycle race. 
:)) Practicality Beats Purity, from the zen of Python. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From musiccomposition at gmail.com Thu May 1 02:26:35 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Wed, 30 Apr 2008 19:26:35 -0500 Subject: [Python-3000] range() issues In-Reply-To: References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> <4818FD40.5010701@canterbury.ac.nz> <1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com> Message-ID: <1afaf6160804301726m26426081mbf70fc1e2812cc07@mail.gmail.com> On Wed, Apr 30, 2008 at 7:16 PM, Guido van Rossum wrote: > > > But I didn't find a rationale. I'm sure it was PBP though. > > > > What's PBP? (A search only turns up a bicycle race. :)) > > Practicality Beats Purity, from the zen of Python It's practical to have a builtin function silently "lie" about the length of a sequence? I don't see how that makes anybody's life much easier. > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- Cheers, Benjamin Peterson From brett at python.org Thu May 1 02:34:19 2008 From: brett at python.org (Brett Cannon) Date: Thu, 1 May 2008 02:34:19 +0200 Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: <001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1> Message-ID: On Wed, Apr 30, 2008 at 5:08 PM, Guido van Rossum wrote: > On Wed, Apr 30, 2008 at 5:02 PM, Brett Cannon wrote: > > There is a general dislike in putting code in a package's __init__ > > module. Personally I am fine with doing that, but I tried not to do > > that with the reorg. If people speak up in support of this then it can > > happen. > > I'm not sure I agree with that sentiment. Quite a few packages have > large __index__.py files. Django routinely puts lots of code there > too. 
> > Even if people prefer not to put (too much) code in __init__.py, a > good compromise might be to put actual implementation code in a > separate submodule, and to put things like > > from submodule import * # submodule.py better define __all__... > > or > > from submodule import api1, api2, ... > > in __init__.py. Going through the PEP, the dbm suggestion seems to be the only one that jumps out at me as possibly benefiting from moving something to the __init__.py module. I personally don't like putting stuff in another module and then importing it, as that provides two different module names to get at the same thing. I prefer there being just a single way to get at the code. Anyway, assuming there is no great outcry then I will take Joe's suggestion as I like that organization more than the current one. -Brett From guido at python.org Thu May 1 02:34:34 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Apr 2008 17:34:34 -0700 Subject: [Python-3000] range() issues In-Reply-To: <1afaf6160804301726m26426081mbf70fc1e2812cc07@mail.gmail.com> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> <4818FD40.5010701@canterbury.ac.nz> <1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com> <1afaf6160804301726m26426081mbf70fc1e2812cc07@mail.gmail.com> Message-ID: As I said before, apparently it is practical in the Java world. On Wed, Apr 30, 2008 at 5:26 PM, Benjamin Peterson wrote: > On Wed, Apr 30, 2008 at 7:16 PM, Guido van Rossum wrote: > > > > But I didn't find a rationale. I'm sure it was PBP though. > > > > > > What's PBP? (A search only turns up a bicycle race. :)) > > > > Practicality Beats Purity, from the zen of Python > > It's practical to have a builtin function silently "lie" about the > length of a sequence? I don't see how that makes anybody's life much > easier. 
> > > > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > -- > Cheers, > Benjamin Peterson > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rasky at develer.com Thu May 1 03:04:35 2008 From: rasky at develer.com (Giovanni Bajo) Date: Thu, 1 May 2008 01:04:35 +0000 (UTC) Subject: [Python-3000] Removal of os.path.walk References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> Message-ID: On Wed, 30 Apr 2008 16:02:31 -0700, Guido van Rossum wrote: > There is one use case I can see for an iterator-version of os.listdir() > (to be named os.opendir()): when globbing a huge directory looking for a > certain pattern. Using os.listdir() you end up needed enough memory to > hold all of the names at once. Using os.opendir() you would need only > enough memory to hold all of the names THAT MATCH. Not only that, but you can also start processing files one by one without having to wait for the whole list to be constructed (which might take time over a network file system); in fact, the user might even want to abort the operation after a few files were processed, in which case the whole directory is not accessed. -- Giovanni Bajo Develer S.r.l. http://www.develer.com From ishimoto at gembook.org Thu May 1 05:06:16 2008 From: ishimoto at gembook.org (atsuo ishimoto) Date: Thu, 1 May 2008 12:06:16 +0900 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de> Message-ID: <797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com> On Thu, May 1, 2008 at 2:36 AM, Guido van Rossum wrote: > I still like this proposal. I don't quite understand the competing (?) 
> proposal by Stephen Turnbull; perhaps Stephen can compare and contrast > the two proposals? I think Stephen's proposal does not compete with Martin's proposal, but adds some characters to be hex-escaped as ambiguous. > And where does Atsuo fall? Sorry, I cannot understand the word 'fall'; perhaps it is a colloquial expression? If you mean 'Hey, Atsuo. Hurry up!', then I have just uploaded a draft PEP to the Python Wiki. http://wiki.python.org/moin/Python3kStringRepr Feedback and suggestions are much appreciated. From stephen at xemacs.org Thu May 1 06:06:34 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 01 May 2008 13:06:34 +0900 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de> <797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com> Message-ID: <87mynazn05.fsf@uwakimon.sk.tsukuba.ac.jp> atsuo ishimoto writes: > > And where does Atsuo fall? > > Sorry, I cannot understand word 'fall', perhaps a colloquial expression? In this case, it means "what is your opinion, compared to Stephen and Martin?" > If you mean 'Hey, Atsuo. Hurry up!', then I have just uploaded draft > PEP to Python Wiki. Great! I'll take a look tomorrow or Friday. 
From martin at v.loewis.de Thu May 1 07:31:33 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 01 May 2008 07:31:33 +0200 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> Message-ID: <481955B5.2030805@v.loewis.de> Guido van Rossum wrote: > There is one use case I can see for an iterator-version of > os.listdir() (to be named os.opendir()): when globbing a huge > directory looking for a certain pattern. Using os.listdir() you end up > needed enough memory to hold all of the names at once. Using > os.opendir() you would need only enough memory to hold all of the > names THAT MATCH. You would still have to read the entire directory, right? In that kind of class, there is a number of applications; e.g. du(1) also wouldn't have to create a list of all files in the directory, but add the sizes of the files incrementally. So the question really is whether it is a problem to keep all file names in memory simultaneously. As Aahz says, the total memory consumption for a large directory is still comparatively low, for today's machines. Regards, Martin From greg.ewing at canterbury.ac.nz Thu May 1 07:33:10 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 01 May 2008 17:33:10 +1200 Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: <001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1> Message-ID: <48195616.9000403@canterbury.ac.nz> Brett Cannon wrote: > There is a general dislike in putting code in a package's __init__ > module. Why? What's the point of having an __init__.py file if you're not allowed to put any code there? If it's something that applies to the package as a whole, that seems like the obvious place to put it. 
-- Greg From martin at v.loewis.de Thu May 1 09:08:48 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 01 May 2008 09:08:48 +0200 Subject: [Python-3000] range() issues In-Reply-To: <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> Message-ID: <48196C80.6020608@v.loewis.de> > These numbers aren't ridiculously large. I just tried > > for i in range(2**31): pass > > on my (32-bit) laptop: it took 736.8 seconds, or about 12 and a bit minutes. > (An aside: in contrast, > > for i in range(2**31-1): pass > > took only 131.1 seconds; looks like there's some potential for optimization > here....) No, it means the optimization has already been implemented: py> iter(range(2**31-1)) py> iter(range(2**31)) IOW, you can iterate over very long ranges, but doing so will be much slower (per element) than iterating over a short range. Regards, Martin From ncoghlan at gmail.com Thu May 1 12:20:04 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 01 May 2008 20:20:04 +1000 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: <481955B5.2030805@v.loewis.de> References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> <481955B5.2030805@v.loewis.de> Message-ID: <48199954.4000800@gmail.com> Martin v. L?wis wrote: > Guido van Rossum wrote: >> There is one use case I can see for an iterator-version of >> os.listdir() (to be named os.opendir()): when globbing a huge >> directory looking for a certain pattern. Using os.listdir() you end up >> needed enough memory to hold all of the names at once. Using >> os.opendir() you would need only enough memory to hold all of the >> names THAT MATCH. > > You would still have to read the entire directory, right? > In that kind of class, there is a number of applications; > e.g. 
du(1) also wouldn't have to create a list of all files > in the directory, but add the sizes of the files incrementally. > > So the question really is whether it is a problem to keep > all file names in memory simultaneously. As Aahz says, the > total memory consumption for a large directory is still > comparatively low, for today's machines. I think Giovanni's point is an important one as well - with an iterator, you can pipeline your operations far more efficiently, since you don't have to wait for the whole directory listing before doing anything (e.g. if you're doing some kind of move/rename operation on a directory, you can start copying the first file to its new location without having to wait for the directory read to finish). Reducing the startup delays of an operation can be a very useful thing when it comes to providing a user with a good feeling of responsiveness from an application (and if it allows the application to more effectively pipeline something, there may be an actual genuine improvement in responsiveness, rather than just the appearance of one). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From richard at tartarus.org Thu May 1 13:36:37 2008 From: richard at tartarus.org (Richard Boulton) Date: Thu, 01 May 2008 12:36:37 +0100 Subject: [Python-3000] range() issues In-Reply-To: <48196C80.6020608@v.loewis.de> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <48196C80.6020608@v.loewis.de> Message-ID: <4819AB45.3060606@tartarus.org> Martin v. L?wis wrote: >> These numbers aren't ridiculously large. I just tried >> >> for i in range(2**31): pass >> >> on my (32-bit) laptop: it took 736.8 seconds, or about 12 and a bit minutes. 
>> (An aside: in contrast, >> >> for i in range(2**31-1): pass >> >> took only 131.1 seconds; looks like there's some potential for optimization >> here....) There's always potential for optimization ... just a question of whether it's worth the increased coding (and maintenance) effort. > No, it means the optimization has already been implemented: > > py> iter(range(2**31-1)) > > py> iter(range(2**31)) > > > IOW, you can iterate over very long ranges, but doing so will be much > slower (per element) than iterating over a short range. In the slow example given, only one of the returned items needs to be a long, so a possible further optimisation which would work well for this case would be to automatically split the range into two parts - the part which only needs short integers, and the part which needs longs, and have a "mixedrange_iterator" type which returned all the items from one of these, followed by all the items from the other. In the general case, there might need to be three such sub-iterators used: range(-2**32, 2**32), for example, would be decomposed (with the usual exclusive upper bounds) into range(-2**32, -2**31) + range(-2**31, 2**31) + range(2**31, 2**32) Not saying it's worth doing this optimisation, particularly, but I'm going to guess that these are the lines the grandparent poster was thinking along. -- Richard From martin at v.loewis.de Thu May 1 15:52:23 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 01 May 2008 15:52:23 +0200 Subject: [Python-3000] range() issues In-Reply-To: <4819A506.6090807@lemurconsulting.com> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <48196C80.6020608@v.loewis.de> <4819A506.6090807@lemurconsulting.com> Message-ID: <4819CB17.2050109@v.loewis.de> > In the slow example given, only one of the returned items needs to be a > long This is Py3k. They are all longs. 
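For reference, the three-way split Richard describes can be sketched in pure Python. This is only an illustration of the decomposition, assuming a step of 1 and a signed 32-bit window; the helper name `split_range` and the constants are made up here. It says nothing about how CPython's range iterators are actually implemented, and as Martin notes every value it yields is an ordinary Python int.

```python
import itertools

# Bounds of the signed 32-bit window (illustrative constants).
INT32_MIN = -2**31
INT32_END = 2**31  # exclusive upper bound of the window


def split_range(start, stop):
    """Split range(start, stop) into the parts below, inside and above
    the signed 32-bit window, dropping any part that is empty."""
    below = range(start, min(stop, INT32_MIN))
    inside = range(max(start, INT32_MIN), min(stop, INT32_END))
    above = range(max(start, INT32_END), stop)
    # Compare the bounds directly rather than calling len(), which can
    # itself overflow for very long ranges.
    return [r for r in (below, inside, above) if r.start < r.stop]


parts = split_range(-2**32, 2**32)
print(parts)

# Chaining the parts visits the same values as the original range.
small = list(itertools.chain(*split_range(-3, 3)))
print(small)
```

Because range upper bounds are exclusive, the middle piece runs up to `2**31`, not `2**31 - 1`; otherwise the boundary values would be skipped.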
Regards, Martin From aahz at pythoncraft.com Thu May 1 16:25:24 2008 From: aahz at pythoncraft.com (Aahz) Date: Thu, 1 May 2008 07:25:24 -0700 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: <481955B5.2030805@v.loewis.de> References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> <481955B5.2030805@v.loewis.de> Message-ID: <20080501142524.GA3546@panix.com> On Thu, May 01, 2008, "Martin v. Löwis" wrote: > Guido van Rossum wrote: >> >> There is one use case I can see for an iterator-version of >> os.listdir() (to be named os.opendir()): when globbing a huge >> directory looking for a certain pattern. Using os.listdir() you end up >> needing enough memory to hold all of the names at once. Using >> os.opendir() you would need only enough memory to hold all of the >> names THAT MATCH. > > You would still have to read the entire directory, right? In that > kind of class, there is a number of applications; e.g. du(1) also > wouldn't have to create a list of all files in the directory, but add > the sizes of the files incrementally. Actually, the primary application I'm thinking of is a CGI that displays part of a directory listing (paged) for manual processing of individual files. > So the question really is whether it is a problem to keep all file > names in memory simultaneously. As Aahz says, the total memory > consumption for a large directory is still comparatively low, for > today's machines. Only for a single process. Throw together three or ten processes, and it adds up. As I said, not a huge problem, but definitely the potential for pain. 
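The two use cases raised in this thread (keeping only the names that match a pattern, and stopping after the first page of results) can be sketched with `os.scandir()`, an iterator-based directory API that was added to Python later (3.5) and is not the `os.opendir()` proposed above. The helper names `iter_matching` and `first_page`, and the temporary directory contents, are hypothetical and only serve the illustration.

```python
import fnmatch
import itertools
import os
import tempfile


def iter_matching(path, pattern):
    """Yield the names in `path` that match `pattern`, one at a time,
    so only matching names ever need to be kept by the caller."""
    with os.scandir(path) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


def first_page(path, pattern, page_size):
    """Return at most `page_size` matching names and then stop reading."""
    return list(itertools.islice(iter_matching(path, pattern), page_size))


# Small demonstration directory: ten .txt files and one .log file.
demo_dir = tempfile.mkdtemp()
for i in range(10):
    open(os.path.join(demo_dir, "a%d.txt" % i), "w").close()
open(os.path.join(demo_dir, "b.log"), "w").close()

print(first_page(demo_dir, "*.txt", 3))
print(sorted(iter_matching(demo_dir, "*.log")))
```

Directory order is whatever the operating system returns, so a paged display that needs a stable order would still have to sort, which brings back the full-listing cost discussed above.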
-- 
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

Help a hearing-impaired person: http://rule6.info/hearing.html

From ncoghlan at gmail.com Thu May 1 16:41:57 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 May 2008 00:41:57 +1000
Subject: [Python-3000] range() issues
In-Reply-To: <4819CB17.2050109@v.loewis.de>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <48196C80.6020608@v.loewis.de> <4819A506.6090807@lemurconsulting.com> <4819CB17.2050109@v.loewis.de>
Message-ID: <4819D6B5.3060905@gmail.com>

Martin v. Löwis wrote:
>> In the slow example given, only one of the returned items needs to be a
>> long
>
> This is Py3k. They are all longs.

Not inside the object they aren't - I believe the optimised one uses C
longs internally, and converts to a Python long when it returns the
values, whereas 'longrange' uses Python long objects internally as well.

Oddly enough, this is going to make the increment/decrement operations
for the counter quite a bit slower :)

One way to optimise this (since all we need to support here is counting
rather than arbitrary arithmetic) would be for the longrange iterator to
use some simple pure C fixed point arithmetic internally to keep track
of an arbitrarily long counter, and only convert to a Python long when
it has to (just like the optimised shortrange iterator).

I'm not sure it is worth the hassle though.

Cheers,
Nick.
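
As a rough illustration of the range split Richard proposed earlier in
this thread, here is a pure-Python sketch. It is not how CPython
implements range iteration. The names split_range and mixed_range are
hypothetical, the 32-bit C long window is an assumption (the real cutoff
depends on the platform), and only a step of 1 is handled. Because
Python ranges are half-open, the cut points used here are -2**31 and
2**31.

```python
import itertools

C_LONG_MIN = -2**31      # assumed 32-bit C long, for illustration only
C_LONG_MAX = 2**31 - 1


def split_range(start, stop):
    """Split range(start, stop) into (below, fits, above) sub-ranges.

    `fits` covers the values inside the assumed C long window; `below`
    and `above` cover the values outside it. Any piece may be empty.
    """
    lo = C_LONG_MIN
    hi = C_LONG_MAX + 1          # exclusive upper bound of the window
    below = range(start, min(stop, lo))
    fits = range(max(start, lo), min(stop, hi))
    above = range(max(start, hi), stop)
    return below, fits, above


def mixed_range(start, stop):
    """Iterate over the three pieces in order, like range(start, stop)."""
    return itertools.chain(*split_range(start, stop))
```

For range(-2**32, 2**32) this yields the three pieces
range(-2**32, -2**31), range(-2**31, 2**31) and range(2**31, 2**32).
At the Python level every value produced is still a plain int, which is
Martin's point; a split like this could only pay off inside a C
implementation.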
-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From dickinsm at gmail.com Thu May 1 16:59:38 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Thu, 1 May 2008 10:59:38 -0400
Subject: [Python-3000] range() issues
In-Reply-To: <4819D6B5.3060905@gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <48196C80.6020608@v.loewis.de> <4819A506.6090807@lemurconsulting.com> <4819CB17.2050109@v.loewis.de> <4819D6B5.3060905@gmail.com>
Message-ID: <5c6f2a5d0805010759w8610ff0oc6e3c4e7aa2c9fc5@mail.gmail.com>

On Thu, May 1, 2008 at 10:41 AM, Nick Coghlan wrote:

> One way to optimise this (since all we need to support here is counting
> rather than arbitrary arithmetic) would be for the longrange iterator to use
> some simple pure C fixed point arithmetic internally to keep track of an
> arbitrarily long counter, and only convert to a Python long when it has to
> (just like the optimised shortrange iterator).
>

Stop already! It was an ill-considered, throwaway comment, and I
apologise for making it.

> I'm not sure it is worth the hassle though.
>

Indeed. Using such a large range is almost certainly not common enough
to make it worth optimising...

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From collinw at gmail.com Thu May 1 16:41:35 2008
From: collinw at gmail.com (Collin Winter)
Date: Thu, 1 May 2008 07:41:35 -0700
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To:
References:
Message-ID: <43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com>

On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon wrote:
> [bcc to stdlib-sig]
>
> After two false starts over the YEARS of trying to cleanup and
> reorganize the stdlib, creating a SIG to get this going, having Guido
> give the PEP the once-over over the past several days, and creating
> two new bugs reports (issues 2715 and 2716), PEP 3108 is finally ready
> for public vetting!
>
> While reading this PEP, do remember this is only about either removing
> modules, renaming them, or moving them into a package. Additions are
> not covered by this PEP!
>
> Also realize all of the right people have been consulted on this stuff
> (e.g., the web SIG about the urllib package). So please do not think
> that something that seems drastic (e.g., the removal of all
> Mac-specific modules) was taken lightly when in fact the proper people
> were asked and they were okay with what is going on.
>
> Lastly, I do not want this to turn into a drawn-out thread about how
> people think some module should stay because they happen to use it or
> suggest some other module to remove. Please think before you propose a
> change. I have been through this proposal process for this reorg
> before and every time it has gotten way out of control. I do not want
> it happen this time.
> > OK, with all of that out of the way, here is the PEP: > ----------------------------------------------- > > PEP: 3108 > Title: Standard Library Reorganization > Version: $Revision: 62573 $ > Last-Modified: $Date: 2008-04-28 17:56:36 -0700 (Mon, 28 Apr 2008) $ > Author: Brett Cannon > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 01-Jan-2007 > Python-Version: 3.0 > Post-History: > > > Abstract > ======== > > Just like the language itself, Python's standard library (stdlib) has > grown over the years to be very rich. But over time some modules > have lost their need to be included with Python. There has also been > an introduction of a naming convention for modules since Python's > inception that not all modules follow. > > Python 3.0 has presented a chance to remove modules that do not have > long term usefulness. This chance also allows for the renaming of > modules so that they follow the Python style guide [#pep-0008]_. This > PEP lists modules that should not be included in Python 3.0 and what > modules need to be renamed. > > > Modules to Remove > ================= > > Guido pronounced that "silly old stuff" is to be deleted from the > stdlib for Py3K [#silly-old-stuff]_. This is open-ended on purpose. > Each module to be removed needs to have a justification as to why it > should no longer be distributed with Python. This can range from the > module being deprecated in Python 2.x to being for a platform that is > no longer widely used. > > This section of the PEP lists the various modules to be removed. Each > subsection represents a different reason for modules to be > removed. Each module must have a specific justification on top of > being listed in a specific subsection so as to make sure only modules > that truly deserve to be removed are in fact removed. 
> > When a reason mentions how long it has been since a module has been > "uniquely edited", it is in reference to how long it has been since a > checkin was done specifically for the module and not for a change that > applied universally across the entire stdlib. If an edit time is not > denoted as "unique" then it is the last time the file was edited, > period. > > The procedure to thoroughly remove a module is: > > #. Remove the module. > #. Remove the tests. > #. Edit ``Modules/Setup.dist`` and ``setup.py`` if needed. > #. Remove the docs (if applicable). > #. Run the regression test suite (using ``-uall``); watch out for > tests that are skipped because an import failed for the removed > module. > > If a deprecation warning is added to 2.6, it would be better to make > all the changes to 2.6, merge the changes into the 3k branch, then > perform the procedure above. This will avoid some merge conflicts. > > > Previously deprecated > --------------------- > > PEP 4 lists all modules that have been deprecated in the stdlib > [#pep-0004]_. The specified motivations mirror those listed in > PEP 4. All modules listed > in the PEP at the time of the first alpha release of Python 3.0 will > be removed. > > The entire contents of lib-old will also be removed. These modules > have already been removed from being imported but are kept in the > distribution for Python for users that rely upon the code. > > * buildtools > > + Documented as deprecated since Python 2.3 without an explicit > reason. > > * cfmfile > > + Documented as deprecated since Python 2.4 without an explicit > reason. > > * cl > > + Documented as obsolete since Python 2.0 or earlier. > + Interface to SGI hardware. > > * md5 > > + Supplanted by the ``hashlib`` module. > > * mimetools > > + Documented as obsolete without an explicit reason. > > * MimeWriter > > + Supplaned by the ``email`` package. > > * mimify > > + Supplanted by the ``email`` package. 
> > * multifile > > + Supplanted by the ``email`` package. > > * posixfile > > + Locking is better done by ``fcntl.lockf()``. > > * rfc822 > > + Supplanted by the ``email`` package. > > * sha > > + Supplanted by the ``hashlib`` package. > > * sv > > + Documented as obsolete since Python 2.0 or earlier. > + Interface to obsolete SGI Indigo hardware. > > * timing > > + Documented as obsolete since Python 2.0 or earlier. > + ``time.clock()`` gives better time resolution. > > > Platform-specific with minimal use > ---------------------------------- > > Python supports many platforms, some of which are not widely held. > And on some of these platforms there are modules that have limited use > to people on those platforms. Because of their limited usefulness it > would be better to no longer burden the Python development team with > their maintenance. > > The module mentioned below are documented. All undocumented modules > for the specified platforms will also be removed. > > IRIX > ///// > The IRIX operating system is no longer produced [#irix-retirement]_. > Removing all modules from the plat-irix[56] directory has been deemed > reasonable because of this fact. > > + AL/al [done: 3.0] > > - Provides sound support on Indy and Indigo workstations. > - Both workstations are no longer available. > - Code has not been uniquely edited in three years. > > + cd [done: 3.0] > > - CD drive control for SGI systems. > - SGI no longer sells machines with IRIX on them. > - Code has not been uniquely edited in 14 years. > > + cddb [done: 3.0] > > - Undocumented. > > + cdplayer [done: 3.0] > > - Undocumented. > > + cl/CL/CL_old [done: 3.0] > > - Compression library for SGI systems. > - SGI no longer sells machines with IRIX on them. > - Code has not been uniquely edited in 14 years. > > + DEVICE/GL/gl/cgen/cgensuport [done: 3.0] > > - GL access, which is the predecessor to OpenGL. > - Has not been edited in at least eight years. 
> - Third-party libraries provide better support (PyOpenGL [#pyopengl]_). > > + ERRNO [done: 3.0] > > - Undocumented. > > + FILE [done: 3.0] > > - Undocumented. > > + FL/fl/flp [done: 3.0] > > - Wrapper for the FORMS library [#irix-forms]_ > - FORMS has not been edited in 12 years. > - Library is not widely used. > - First eight hits on Google are for Python docs for fl. > > + fm [done: 3.0] > > - Wrapper to the IRIS Font Manager library. > - Only available on SGI machines which no longer come with IRIX. > > + GET [done: 3.0] > > - Undocumented. > > + GLWS [done: 3.0] > > - Undocumented. > > + imgfile [done: 3.0] > > - Wrapper for SGI libimage library for imglib image files > (``.rgb`` files). > - Python Imaging Library provdes read-only support [#pil]_. > - Not uniquely edited in 13 years. > > + IN [done: 3.0] > > - Undocumented. > > + IOCTL [done: 3.0] > > - Undocumented. > > + jpeg [done: 3.0] > > - Wrapper for JPEG (de)compressor. > - Code not uniquely edited in nine years. > - Third-party libraries provide better support > (Python Imaging Library [#pil]_). > > + panel [done: 3.0] > > - Undocumented. > > + panelparser [done: 3.0] > > - Undocumented. > > + readcd [done: 3.0] > > - Undocumented. > > + SV [done: 3.0] > > - Undocumented. > > + torgb [done: 3.0] > > - Undocumented. > > + WAIT [done: 3.0] > > - Undocumented. > > > Mac-specific modules > //////////////////// > > The Mac-specific modules are mostly unmaintained (e.g., the bgen > tool used to auto-generate many of the modules has never been > updated to support UCS-4). It is also not Python's place to maintain > such a large amount of OS-specific modules. Thus all modules under > plat-mac are to be removed. > > A stub module for proxy access will be provided for use by urllib. > > * _builtinSuites > > - Undocumented. > - Package under lib-scriptpackages. > > * Audio_mac > > - Undocumented. > > * aepack > > - OSA support is better through third-party modules. > > * Appscript [#appscript]_. 
> > - Hard-coded endianness which breaks on Intel Macs. > - Might need to rename if Carbon package dependent. > > * aetools > > - See aepack. > > * aetypes > > - See aepack. > > * applesingle > > - Undocumented. > - AppleSingle is a binary file format for A/UX. > - A/UX no longer distributed. > > * appletrawmain > > - Undocumented. > > * appletrunner > > - Undocumented. > > * argvemulator > > - Undocumented. > > * autoGIL > > - Very bad model for using Python with the CFRunLoop. > > * bgenlocations > > - Undocumented. > > * bundlebuilder > > - Undocumented. > > * Carbon > > - Carbon development has stopped. > - Does not support 64-bit systems completely. > - Dependent on bgen which has never been updated to support UCS-4 > Unicode builds of Python. > > * CodeWarrior > > - Undocumented. > - Package under lib-scriptpackages. > > * ColorPicker > > - Better to use Cocoa for GUIs. > > * EasyDialogs > > - Better to use Cocoa for GUIs. > > * Explorer > > - Undocumented. > - Package under lib-scriptpackages. > > * Finder > > - Undocumented. > - Package under lib-scriptpackages. > > > * findertools > > - No longer useful. > > * FrameWork > > - Poorly documented. > - Not updated to support Carbon Events. > > * gensuitemodule > > - See aepack. > > * ic > > * icopen > > - Not needed on OS X. > - Meant to replace 'open' which is usually a bad thing to do. > > * macerrors > > - Undocumented. > > * MacOS > > - Would also mean the removal of binhex. > > * macostools > > * macresource > > - Undocumented. > > * MiniAEFrame > > - See aepack. > > * Nav > > - Undocumented. > > * Netscape > > - Undocumented. > - Package under lib-scriptpackages. > > > * pimp > > - Undocumented. > > * PixMapWrapper > > - Undocumented. > > * StdSuites > > - Undocumented. > - Package under lib-scriptpackages. > > * SystemEvents > > - Undocumented. > - Package under lib-scriptpackages. > > * Terminal > > - Undocumented. > - Package under lib-scriptpackages. > > > * terminalcommand > > - Undocumented. 
> > * videoreader > > - No longer used. > > * W > > - No longer distributed with Python. > > > .. _PyObjC: http://pyobjc.sourceforge.net/ > > > Solaris > /////// > > + SUNAUDIODEV/sunaudiodev [done: 3.0] > > - Access to the sound card on Sun machines. > - Code not uniquely edited in over eight years. > > > Hardly used > ------------ > > Some modules that are platform-independent are hardly used. This > can be from how easy it is to implement the functionality from scratch > or because the audience for the code is very small. > > * audiodev [done: 3.0] > > + Undocumented. > + Not edited in five years. > + If removed sunaudio should go as well (also undocumented; not > edited in over seven years). > > * imputil > > + Undocumented. > + Never updated to support absolute imports. > > * mutex > > + Easy to implement using a semaphore and a queue. > + Cannot block on a lock attempt. > + Not uniquely edited since its addition 15 years ago. > + Only useful with the 'sched' module. > + Not thread-safe. > > > * stringold [done: 3.0] > > + Function versions of the methods on string objects. > + Obsolete since Python 1.6. > + Any functionality not in the string object or module will be moved > to the string module (mostly constants). > > * symtable/_symtable > > + Undocumented. > > * toaiff [done: 3.0, moved to Demo] > > + Undocumented. > + Requires ``sox`` library to be installed on the system. > > * user > > + Easily handled by allowing the application specify its own > module name, check for existence, and import if found. > > * new [done: 3.0] > > + Just a rebinding of names from the 'types' module. > + Can also call ``type`` built-in to get most types easily. > + Docstring states the module is no longer useful as of revision > 27241 (2002-06-15). > > * pure [done: 3.0] > > + Written before Pure Atria was bought by Rational which was then > bought by IBM (in other words, very old). > > * test.testall [done: 3.0] > > + From the days before regrtest. 
> > > Obsolete > -------- > > Becoming obsolete signifies that either another module in the stdlib > or a widely distributed third-party library provides a better solution > for what the module is meant for. > > * Bastion/rexec [done: 3.0] > > + Restricted execution / security. > + Turned off in Python 2.3. > + Modules deemed unsafe. > > * bsddb185 [done: 3.0] > > + Superceded by bsddb3 > + Not built by default. > + Documentation specifies that the "module should never be used > directly in new code". > > * commands > > + subprocess module replaces it [#pep-0324]_. > + Remove getstatus(), move rest to subprocess. > > * compiler (need to add AST -> bytecode mechanism) [done: 3.0] > > + Having to maintain both the built-in compiler and the stdlib > package is redundant [#ast-removal]_. > + The AST created by the compiler is available [#ast]_. > + Mechanism to compile from an AST needs to be added. > > * dircache > > + Negligible use. > + Easily replicated. > > * dl [done: 3.0] > > + ctypes provides better support for same functionality. > > * fpformat > > + All functionality is supported by string interpolation. > > * htmllib > > + Superceded by HTMLParser. > > * ihooks > > + Undocumented. > + For use with rexec which has been turned off since Python 2.3. > > * imageop [done: 3.0] > > + Better support by third-party libraries > (Python Imaging Library [#pil]_). > + Unit tests relied on rgbimg and imgfile. > - rgbimg was removed in Python 2.6. > - imgfile slated for removal in this PEP. [done: 3.0] > > * linuxaudiodev [done: 3.0] > > + Replaced by ossaudiodev. > > * mhlib > > + Obsolete mailbox format. > > * popen2 [done: 3.0] > > + subprocess module replaces them [#pep-0324]_. > > * sched > > + Replaced by threading.Timer. > > > * sgmllib > > + Does not fully parse SGML. > + In the stdlib for support to htmllib which is slated for removal. > > * stat > > + ``os.stat`` now returns a tuple with attributes. 
> + Functions in the module should be made into methods for the object > returned by os.stat. > > * statvfs > > + ``os.statvfs`` now returns a tuple with attributes. > > * thread > > + People should use 'threading' instead. > > - Rename 'thread' to _thread. > - Deprecate dummy_thread and rename _dummy_thread. > - Move thread.get_ident over to threading. > > + Guido has previously supported the deprecation > [#thread-deprecation]_. > > * urllib > > + Superceded by urllib2. > + Functionality unique to urllib will be kept in the > `urllib package`_. > > * UserDict [done: 3.0] > > + Not as useful since types can be a superclass. > + Useful bits moved to the 'collections' module. > > * UserList/UserString [done: 3.0] > > + Not useful since types can be a superclass. > > > Modules to Rename > ================= > > Along with the stdlib gaining some modules that are no longer > relevant, there is also the issue of naming. Many modules existed in > the stdlib before PEP 8 came into existence [#pep-0008]_. This has > led to some naming inconsistencies and namespace bloat that should be > addressed. > > > PEP 8 violations > ---------------- > > PEP 8 specifies that modules "should have short, all-lowercase names" > where "underscores can be used ... if it improves readability" > [#pep-0008]_. The use of underscores is discouraged in package names. > The following modules violate PEP 8 and are not somehow being renamed > by being moved to a package. 
> > ================== ================================================== > Current Name Replacement Name > ================== ================================================== > _winreg winreg (rename also because module has a public > interface and thus should not have a leading > underscore) > ConfigParser configparser > copy_reg copyreg > PixMapWrapper pixmapwrapper > Queue queue > SocketServer socketserver > ================== ================================================== > > > Merging C and Python implementations of the same interface > ---------------------------------------------------------- > > Several interfaces have both a Python and C implementation. While it > is great to have a C implementation for speed with a Python > implementation as fallback, there is no need to expose the two > implementations independently in the stdlib. For Python 3.0 all > interfaces with two implementations will be merged into a single > public interface. > > The C module is to be given a leading underscore to delineate the fact > that it is not the reference implementation (the Python implementation > is). This means that any semantic difference between the C and Python > versions must be dealt with before Python 3.0 or else the C > implementation will be removed until it can be fixed. > > One interface that is not listed below is xml.etree.ElementTree. This > is an externally maintained module and thus is not under the direct > control of the Python development team for renaming. See `Open > Issues`_ for a discussion on this. > > * pickle/cPickle > > + Rename cPickle to _pickle. > + Semantic completeness of C implementation *not* verified. > > * profile/cProfile > > + Rename cProfile to _profile. > + Semantic completeness of C implementation *not* verified. > > * StringIO/cStringIO [done: 3.0] > > + Add the class to the 'io' module. 
> > > No public, documented interface > ------------------------------- > > There are several modules in the stdlib that have no defined public > interface. These modules exist as support code for other modules that > are exposed. Because they are not meant to be used directly they > should be renamed to reflect this fact. > > ============ =============================== > Current Name Replacement Name > ============ =============================== > markupbase _markupbase [done: 3.0] > dummy_thread _dummy_thread [#]_ > ============ =============================== > > .. [#] Assumes ``thread`` is renamed to ``_thread``. > > > Poorly chosen names > ------------------- > > A few modules have names that were poorly chosen in hindsight. They > should be renamed so as to prevent their bad name from perpetuating > beyond the 2.x series. > > ================= =============================== > Current Name Replacement Name > ================= =============================== > repr reprlib > test.test_support test.support > ================= =============================== > > > Grouping of modules > ------------------- > > As the stdlib has grown, several areas within it have expanded to > include multiple modules (e.g., dbm support). Thus some new packages > make sense where the renaming makes a module's name easier to work > with. > > > dbm package > /////////// > > ================= =============================== > Current Name Replacement Name > ================= =============================== > anydbm dbm.tools [1]_ > dbhash dbm.bsd > dbm dbm.ndbm > dumbdm dbm.dumb > gdbm dbm.gnu > whichdb dbm.tools [1]_ > ================= =============================== > > > .. [1] ``dbm.tools`` can combine ``anybdbm`` and ``whichdb`` since the public > API for both modules has no name conflict and the two modules have > closely related usage. 
> > > > html package > //////////// > > ================== =============================== > Current Name Replacement Name > ================== =============================== > HTMLParser html.parser > htmlentitydefs html.entities > ================== =============================== > > > http package > //////////// > > ================= =============================== > Current Name Replacement Name > ================= =============================== > httplib http.client > BaseHTTPServer http.server [2]_ > CGIHTTPServer http.server [2]_ > SimpleHTTPServer http.server [2]_ > Cookie http.cookies > cookielib http.cookiejar > ================= =============================== > > .. [2] The ``http.server`` module can combine the specified modules > safely as they have no naming conflicts. > > > tkinter package > /////////////// > > ================== =============================== > Current Name Replacement Name > ================== =============================== > Canvas tkinter.canvas > Dialog tkinter.dialog > FileDialog tkinter.filedialog [4]_ > FixTk tkinter._fix > ScrolledText tkinter.scrolledtext > SimpleDialog tkinter.simpledialog [5]_ > Tix tkinter.tix > Tkconstants tkinter.constants > Tkdnd tkinter.dnd > Tkinter tkinter.__init__ > tkColorChooser tkinter.colorchooser > tkCommonDialog tkinter.commondialog > tkFileDialog tkinter.filedialog [4]_ > tkFont tkinter.font > tkMessageBox tkinter.messagebox > tkSimpleDialog tkinter.simpledialog [5]_ > turtle tkinter.turtle > ================== =============================== > > .. [4] ``tkinter.filedialog`` can safely combine ``FileDialog`` and > ``tkFileDialog`` as there are no naming conflicts. > > .. [5] ``tkinter.simpledialog`` can safely combine ``SimpleDialog`` > and ``tkSimpleDialog`` have no naming conflicts. 
>
>
> urllib package
> //////////////
>
> Originally this new package was to be named ``url``, but because of
> the common use of the name as a variable, it has been deemed better
> to keep the name ``urllib`` and instead shift existing modules around
> into a new package.
>
> ==================  ===============================
> Current Name        Replacement Name
> ==================  ===============================
> urllib2             urllib.request
> urlparse            urllib.parse
> urllib              urllib.parse, urllib.request [6]_
> ==================  ===============================
>
> .. [6] The quoting-related functions from ``urllib`` will be added
>        to ``urllib.parse``. ``urllib.URLOpener`` and
>        ``urllib.FancyUrlOpener`` will be added to ``urllib.request``
>        as long as the documentation for both modules is updated.
>
>
> xmlrpc package
> //////////////
>
> ==================  ===============================
> Current Name        Replacement Name
> ==================  ===============================
> xmlrpclib           xmlrpc.client
> SimpleXMLRPCServer  xmlrpc.server [3]_
> CGIXMLRPCServer     xmlrpc.server [3]_
> ==================  ===============================
>
> .. [3] The modules being combined into ``xmlrpc.server`` have no
>        naming conflicts and thus can safely be merged.
>
>
> Transition Plan
> ===============
>
> For modules to be removed
> -------------------------
>
> For the removal of modules that are continuing to exist in the Python
> 2.x series (i.e., not deprecated explicitly in the 2.x series),
> ``warnings.warn3k()`` will be used to issue a DeprecationWarning.

FYI, we can also flag these using 2to3.

> Renaming of modules
> -------------------
>
> For modules that are renamed, stub modules will be created with the
> original names and be kept in a directory within the stdlib (e.g. like
> how lib-old was once used).
> The need to keep the stub modules within
> a directory is to prevent naming conflicts with case-insensitive
> filesystems in those cases where nothing but the case of the module
> is changing.
>
> These stub modules will import the module code based on the new
> naming. The same type of warning being raised by modules being
> removed will be raised in the stub modules.
>
> Support in the 2to3 refactoring tool for renames will also be used
> [#2to3]_. Import statements will be rewritten so that only the import
> statement and none of the rest of the code needs to be touched. This
> will be accomplished by using the ``as`` keyword in import statements
> to bind in the module namespace to the old name while importing based
> on the new name.

You should cite the existing fix_imports fixer as one example of how to
do this:
http://svn.python.org/view/sandbox/trunk/2to3/lib2to3/fixes/fix_imports.py?view=markup

> Open Issues
> ===========
>
> Renaming of modules maintained outside of the stdlib
> ----------------------------------------------------
>
> xml.etree.ElementTree not only does not meet PEP 8 naming standards
> but it also has an exposed C implementation [#pep-0008]_. It is an
> externally maintained package, though [#pep-0360]_. A request will be
> made for the maintainer to change the name so that it matches PEP 8
> and hides the C implementation.
>
>
> Rejected Ideas
> ==============
>
> Modules that were originally suggested for removal
> --------------------------------------------------
>
> * asynchat/asyncore
>
>   + Josiah Carlson has said he will maintain the modules.
>
> * audioop/sunau/aifc
>
>   + Audio modules where the formats are still used.
>
> * base64/quopri/uu
>
>   + All still widely used.
>   + 'codecs' module does not provide as nice of an API for basic
>     usage.
>
> * fileinput
>
>   + Useful when having to work with stdin.
>
> * linecache
>
>   + Used internally in several places.
> > * nis > > + Testimonials from people that new installations of NIS are still > occurring > > * getopt > > + Simpler than optparse. > > * repr > > + Useful as a basis for overriding. > + Used internally. > > * telnetlib > > + Really handy for quick-and-dirty remote access. > + Some hardware supports using telnet for configuration and > querying. > > * Tkinter > > + Would prevent IDLE from existing. > + No GUI toolkit would be available out of the box. > > > Introducing a new top-level package > ----------------------------------- > > It has been suggested that the entire stdlib be placed within its own > package. This PEP will not address this issue as it has its own > design issues (naming, does it deserve special consideration in import > semantics, etc.). Everything within this PEP can easily be handled if > a new top-level package is introduced. > > > Introducing new packages to contain theme-related modules > --------------------------------------------------------- > > During the writing of this PEP it was noticed that certain themes > appeared in the stdlib. In the past people have suggested introducing > new packages to help collect modules that share a similar theme (e.g., > audio). An Open Issue was created to suggest some new packages to > introduce. > > In the end, though, not enough support could be pulled together to > warrant moving forward with the idea. Instead name simplification has > been chosen as the guiding force for PEPs to create. > > > References > ========== > > .. [#pep-0004] PEP 4: Deprecation of Standard Modules > (http://www.python.org/dev/peps/pep-0004/) > > .. [#pep-0008] PEP 8: Style Guide for Python Code > (http://www.python.org/dev/peps/pep-0008/) > > .. [#pep-0324] PEP 324: subprocess -- New process module > (http://www.python.org/dev/peps/pep-0324/) > > .. [#pep-0360] PEP 360: Externally Maintained Packages > (http://www.python.org/dev/peps/pep-0360/) > > .. 
[#module-index] Python Documentation: Global Module Index > (http://docs.python.org/modindex.html) > > .. [#timing-module] Python Library Reference: Obsolete > (http://docs.python.org/lib/obsolete-modules.html) > > .. [#silly-old-stuff] Python-Dev email: "Py3k release schedule worries" > (http://mail.python.org/pipermail/python-3000/2006-December/005130.html) > > .. [#thread-deprecation] Python-Dev email: Autoloading? > (http://mail.python.org/pipermail/python-dev/2005-October/057244.html) > > .. [#py-dev-summary-2004-11-01] Python-Dev Summary: 2004-11-01 > (http://www.python.org/dev/summary/2004-11-01_2004-11-15/#id10) > > .. [#2to3] 2to3 refactoring tool > (http://svn.python.org/view/sandbox/trunk/2to3/) > > .. [#pyopengl] PyOpenGL > (http://pyopengl.sourceforge.net/) > > .. [#pil] Python Imaging Library (PIL) > (http://www.pythonware.com/products/pil/) > > .. [#twisted] Twisted > (http://twistedmatrix.com/trac/) > > .. [#irix-retirement] SGI Press Release: > End of General Availability for MIPS IRIX Products -- December 2006 > (http://www.sgi.com/support/mips_irix.html) > > .. [#irix-forms] FORMS Library by Mark Overmars > (ftp://ftp.cs.ruu.nl/pub/SGI/FORMS) > > .. [#sun-au] Wikipedia: Au file format > (http://en.wikipedia.org/wiki/Au_file_format) > > .. [#appscript] appscript > (http://appscript.sourceforge.net/) > > .. [#ast] _ast module > (http://docs.python.org/lib/ast.html) > > .. [#ast-removal] python-dev email: getting compiler package failures > (http://mail.python.org/pipermail/python-3000/2007-May/007615.html) > > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. 
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
> End:
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/collinw%40gmail.com
>

From martin at v.loewis.de Thu May 1 17:26:03 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 01 May 2008 17:26:03 +0200
Subject: [Python-3000] range() issues
In-Reply-To: <4819D6B5.3060905@gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <48196C80.6020608@v.loewis.de> <4819A506.6090807@lemurconsulting.com> <4819CB17.2050109@v.loewis.de> <4819D6B5.3060905@gmail.com>
Message-ID: <4819E10B.2040101@v.loewis.de>

Nick Coghlan wrote:
> Martin v. Löwis wrote:
>>> In the slow example given, only one of the returned items needs to be a
>>> long
>>
>> This is Py3k. They are all longs.
>
> Not inside the object they aren't

Right, inside, they are longs - but the *returned items* are all longs.

> One way to optimise this (since all we need to support here is counting
> rather than arbitrary arithmetic) would be for the longrange iterator to
> use some simple pure C fixed point arithmetic internally to keep track
> of an arbitrarily long counter, and only convert to a Python long when
> it has to (just like the optimised shortrange iterator).
>
> I'm not sure it is worth the hassle though.

What simple pure C fixed point arithmetic would you be thinking of?
The long type *is* a pure C fixed point arithmetic.

There are perhaps some simplifications possible in longrangeiter_next,
e.g. it doesn't need to perform a multiplication, but could just add
the step each time.
Also, it could cache the value 1 in a global variable, rather than creating a fresh one each time. Other than that, I cannot imagine why another fixed point arithmetic might be significantly faster. Regards, Martin From guido at python.org Thu May 1 17:57:10 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 08:57:10 -0700 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: <20080501142524.GA3546@panix.com> References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> <481955B5.2030805@v.loewis.de> <20080501142524.GA3546@panix.com> Message-ID: On Thu, May 1, 2008 at 7:25 AM, Aahz wrote: > Actually, the primary application I'm thinking of is a CGI that displays > part of a directory listing (paged) for manual processing of individual > files. But wouldn't you usually want the listing sorted, while os.listdir() doesn't guarantee sorting? So you'd still have to read the entire thing, sort it, and then display the selected page. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu May 1 17:58:22 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 08:58:22 -0700 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: <48199954.4000800@gmail.com> References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> <481955B5.2030805@v.loewis.de> <48199954.4000800@gmail.com> Message-ID: On Thu, May 1, 2008 at 3:20 AM, Nick Coghlan wrote: > I think Giovanni's point is an important one as well - with an iterator, > you can pipeline your operations far more efficiently, since you don't have > to wait for the whole directory listing before doing anything (e.g. if > you're doing some kind of move/rename operation on a directory, you can > start copying the first file to its new location without having to wait for > the directory read to finish). 
> > Reducing the startup delays of an operation can be a very useful thing when > it comes to providing a user with a good feeling of responsiveness from an > application (and if it allows the application to more effectively pipeline > something, there may be an actual genuine improvement in responsiveness, > rather than just the appearance of one). This sounds like optimizing for a super-rare case. And please do tell me if you've timed this. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Thu May 1 18:11:07 2008 From: aahz at pythoncraft.com (Aahz) Date: Thu, 1 May 2008 09:11:07 -0700 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> <481955B5.2030805@v.loewis.de> <20080501142524.GA3546@panix.com> Message-ID: <20080501161106.GA13254@panix.com> On Thu, May 01, 2008, Guido van Rossum wrote: > On Thu, May 1, 2008 at 7:25 AM, Aahz wrote: >> >> Actually, the primary application I'm thinking of is a CGI that displays >> part of a directory listing (paged) for manual processing of individual >> files. > > But wouldn't you usually want the listing sorted, while os.listdir() > doesn't guarantee sorting? So you'd still have to read the entire > thing, sort it, and then display the selected page. With hundreds of thousands of files, the sorting is done after filtering, so reducing the memory consumed during the filter stage is still extremely useful. 
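[Editor's sketch, not part of the original message.] The filter-then-sort approach described above could look roughly like this on a current Python 3, assuming `os.scandir()`, which yields directory entries lazily and did not exist when this thread was written. The function name `page_of_matches`, the `.log` suffix and the page size are illustrative assumptions only.

```python
import os
import tempfile


def page_of_matches(directory, keep, page, page_size):
    """Return one sorted page of the entry names accepted by keep()."""
    # os.scandir() yields entries lazily, so only the names that pass
    # the filter are collected before sorting.
    with os.scandir(directory) as entries:
        matches = sorted(entry.name for entry in entries if keep(entry.name))
    start = page * page_size
    return matches[start:start + page_size]


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.log", "a.log", "c.txt", "d.log"):
        open(os.path.join(tmp, name), "w").close()
    first_page = page_of_matches(tmp, lambda n: n.endswith(".log"), 0, 2)
    second_page = page_of_matches(tmp, lambda n: n.endswith(".log"), 1, 2)
```

The sort still needs every matching name, so the saving is limited to the entries that the filter rejects.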
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Help a hearing-impaired person: http://rule6.info/hearing.html From ishimoto at gembook.org Thu May 1 18:21:04 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Fri, 2 May 2008 01:21:04 +0900 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <797440730804181935p1f618e90ob1b8b9efb48932c3@mail.gmail.com> Message-ID: <797440730805010921r3d0b785bjb10e05d7aefc8d1e@mail.gmail.com> On Thu, May 1, 2008 at 2:34 AM, Guido van Rossum wrote: > This should be done with a new function, not added to print. Once you > specify an encoding, you have to write to sys.stdout.buffer, which is > the underlying binary stream; but you'd have to flush the > TextIOWrapper and deal with incomplete codec state, and in general I > don't think it's a good idea. Thank you for your comment. I'll reconsider this part in the PEP. From martin at v.loewis.de Thu May 1 18:33:05 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 01 May 2008 18:33:05 +0200 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4819F0C1.8050401@v.loewis.de> > The problem is that this doesn't display the representation of strings > and identifier names in an unambiguous way. 
"AKMOT" could be > all-ASCII, it could be all-Cyrillic, or it could be a mixture of > ASCII, Cyrillic, and Greek. I don't see this is a problem. Yes, it can happen, but no, it is not a problem. Unless I lost the thread, we are still talking about the repr() of regular strings here, right? > How about choosing a standard Python repertoire (based on the Unicode > standard, of course) of which characters get a graphic repr and which > ones get \u-escaped, and have a post-hook for repr which gets passed > the string repr proposes to print out? You mean, you only display the characters if they form a valid identifier? That would not be good, since it would disallow display of symbols. Regards, Martin From martin at v.loewis.de Thu May 1 18:38:53 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 01 May 2008 18:38:53 +0200 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4819F21D.8070808@v.loewis.de> > > I think "standard repertoire based on Unicode" may be confusing the issue. > > By "standard repertoire" I mean that all Pythons will show the same > characters the same way, while "based on Unicode" is intended to mean > looking at TR#36 and TR#39 in picking the repertoires. I don't think either TR#36 or TR#39 are applicable here. This is not identifier syntax; there may various symbols and whatnot in the string, which should also be rendered as-is. The escaping that repr() does is *not* to achieve unambiguity, but to achieve printability. 
Regards, Martin From guido at python.org Thu May 1 18:41:07 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 09:41:07 -0700 Subject: [Python-3000] Invitation to try out open source code review tool Message-ID: Some of you may have seen a video recorded in November 2006 where I showed off Mondrian, a code review tool that I was developing for Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped that I could release Mondrian as open source, but it was not to be: due to its popularity inside Google, it became more and more tied to proprietary Google infrastructure like Bigtable, and it remained limited to Perforce, the commercial revision control system most used at Google. What I'm announcing now is the next best thing: an code review tool for use with Subversion, inspired by Mondrian and (soon to be) released as open source. Some of the code is even directly derived from Mondrian. Most of the code is new though, written using Django and running on Google App Engine. I'm inviting the Python developer community to try out the tool on the web for code reviews. I've added a few code reviews already, but I'm hoping that more developers will upload at least one patch for review and invite a reviewer to try it out. To try it out, go here: http://codereview.appspot.com Please use the Help link in the top right to read more on how to use the app. Please sign in using your Google Account (either a Gmail address or a non-Gmail address registered with Google) to interact more with the app (you need to be signed in to create new issues and to add comments to existing issues). Don't hesitate to drop me a note with feedback -- note though that there are a few known issues listed at the end of the Help page. The Help page is really a wiki, so feel free to improve it! 
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Thu May 1 19:12:07 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 01 May 2008 19:12:07 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To:
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de>
Message-ID: <4819F9E7.9040706@v.loewis.de>

> I still like this proposal. I don't quite understand the competing (?)
> proposal by Stephen Turnbull; perhaps Stephen can compare and contrast
> the two proposals? And where does Atsuo fall?

IIUC, Stephen proposes to use some of the "security" algorithms for
display, without (yet) specifying which one specifically. I don't think
they apply, as these algorithms are designed for identifiers (in
particular for use in programming languages and domain names); any
character classified as "confusing" would get escaped. As Stephen
elaborates, that would have the undesirable side effect of escaping the
Cyrillic A (i.e. А), likewise for some Greek letters. In any case, one
would have to write a precise specification first (UTR#36/#39 leave
options), and probably extend the tables in unicodedata.

Atsuo's latest proposal (http://wiki.python.org/moin/Python3kStringRepr)
is an elaboration of mine, I think. I would have phrased it slightly
differently, i.e.

- escaped are all Z* and C* characters, plus backslash, except space.
  In UCS-2 builds, half surrogates get escaped only if they don't
  occur as a pair.
- escaping looks like this:
  * \r, \n, \t, \\
  * \xXX for characters from Latin-1
  * \uXXXX for characters from the BMP
  * \U00XXXXXX for anything else

What I didn't have in my original proposal was escaping of Zs except
for space, which then would also escape NBSP, EN QUAD, EM QUAD,
THIN SPACE, HAIR SPACE, OGHAM SPACE MARK, etc. Escaping them is fine
also. Also, I didn't consider surrogate pairs in UCS-2 builds
originally; they should (of course) get represented as-is.

The issue then is output of repr to a device, which may go wrong in
two ways:

- the device claims it supports the character, but doesn't actually
  have a glyph for it. In that case, the terminal encoding should be
  adjusted.
- the device cannot display certain characters in the repr. Here,
  an escaping error handler can be used if desired.

Regards,
Martin

From ishimoto at gembook.org Thu May 1 19:16:26 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 2 May 2008 02:16:26 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <87mynazn05.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de> <797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com> <87mynazn05.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <797440730805011016s17e93375jb3f2d35e81c105bf@mail.gmail.com>

On Thu, May 1, 2008 at 1:06 PM, Stephen J. Turnbull wrote:
> atsuo ishimoto writes:
>
> > > And where does Atsuo fall?
> >
> > Sorry, I cannot understand word 'fall', perhaps a colloquial expression?
>
> In this case, it means "what is your opinion, compared to Stephen and
> Martin?"

Oh, I see. Thank you.

As I wrote, I think these proposals are not competing, so I don't
'fall' on either side.

In my PEP, I proposed to use Unicode properties, based on the proposal
from Michael and Martin.
It's almost identical to what Martin wrote, but I added the Zs
(Separator, Space) characters other than the ASCII space ('\x20'). This
category contains the characters listed at the end of this mail. I
assume these characters should be hex-escaped, although I know nothing
about these characters.

I think readability beats unambiguity for repr(), so I don't agree with
Stephen's view that "repr is like quoted-printable encoding in MIME".

If the standard repertoire Stephen proposed is desired, the conversion
based on the repertoire should be applied to the strings that repr()
produces. Such a repertoire will be more useful if we have:

    def standard_string(s):
        return _convert_ambiguous_chars(s)

    print standard_string(repr(obj)), standard_string(sys.stdin.readline())

> Great! I'll take a look tomorrow or Friday.
>

Thank you. I'm looking forward to your feedback.

Characters defined as Zs::

---------------------------------------------------------
0x20    SPACE
0xa0    NO-BREAK SPACE
0x1680  OGHAM SPACE MARK
0x2000  EN QUAD
0x2001  EM QUAD
0x2002  EN SPACE
0x2003  EM SPACE
0x2004  THREE-PER-EM SPACE
0x2005  FOUR-PER-EM SPACE
0x2006  SIX-PER-EM SPACE
0x2007  FIGURE SPACE
0x2008  PUNCTUATION SPACE
0x2009  THIN SPACE
0x200a  HAIR SPACE
0x200b  ZERO WIDTH SPACE
0x202f  NARROW NO-BREAK SPACE
0x205f  MEDIUM MATHEMATICAL SPACE
0x3000  IDEOGRAPHIC SPACE

From stephen at xemacs.org Thu May 1 19:33:48 2008
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 02 May 2008 02:33:48 +0900 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <4819F21D.8070808@v.loewis.de> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> Message-ID: <87y76u9ber.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > The escaping that repr() does is *not* to achieve unambiguity, > but to achieve printability. Well, if that is the case, then I withdraw my comments pretty much entirely, and apologize for the noise. I think you've already specified what is needed to achieve printability correctly. From martin at v.loewis.de Thu May 1 19:31:20 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 01 May 2008 19:31:20 +0200 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <87y76u9ber.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <87y76u9ber.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4819FE68.2010400@v.loewis.de> > > The escaping that repr() does is *not* to achieve unambiguity, > > but to achieve printability. > > Well, if that is the case, then I withdraw my comments pretty much > entirely, and apologize for the noise. 
I think you've already > specified what is needed to achieve printability correctly. After I posted this, I read Guido's earlier message that the case may not be as clear. So please take this as my own opinion, not as a given - some people apparently want repr to provide unambiguous output also. If so, I still don't think the security mechanisms of Unicode apply - if you have combining characters in the string, and the precombined version also exists in Unicode, then those algorithms may still not help. Likewise for compatibility characters. Regards, Martin From tjreedy at udel.edu Thu May 1 19:49:37 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 1 May 2008 13:49:37 -0400 Subject: [Python-3000] Displaying strings containing unicode escapes References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com><20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com><20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> Message-ID: ""Martin v. L?wis"" wrote in message news:4819F21D.8070808 at v.loewis.de... |> > I think "standard repertoire based on Unicode" may be confusing the issue. | > | > By "standard repertoire" I mean that all Pythons will show the same | > characters the same way, while "based on Unicode" is intended to mean | > looking at TR#36 and TR#39 in picking the repertoires. | | I don't think either TR#36 or TR#39 are applicable here. This is not | identifier syntax; there may various symbols and whatnot in the | string, which should also be rendered as-is. | | The escaping that repr() does is *not* to achieve unambiguity, | but to achieve printability. I agree with Martin that chasing 'unambiguity' is something of a chimera. 
Whether the glyphs for two Unicode chars are identical depends on the
display system. As I type these here, 1 (one) and l (el) are barely
distinguishable, depending on reading lens and distance. Should one be
escaped? I think not. I have had displays in which they are pixel for
pixel identical, but also ones which made them clearly different. Ditto
for 0 (zero) and O (Oh). A and *could* be made to look different on
modern high-definition outputs. I suspect they already have been or
will be.

I think standard Python should somehow have two options: escape
everything but ASCII (for unambiguity and old display systems) and
escape nothing that is potentially printable (leaving partially capable
systems to fare as they will). In-between solutions will ultimately be
programmer and system specific.

Terry Jan Reedy

From phd at phd.pp.ru Thu May 1 19:56:32 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 1 May 2008 21:56:32 +0400
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To:
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de>
Message-ID: <20080501175632.GA8293@phd.pp.ru>

On Thu, May 01, 2008 at 01:49:37PM -0400, Terry Reedy wrote:
> I think standard Python should somehow have two options: escape everything
> but ASCII (for unambiguity and old display systems) and escape nothing that
> is potentially printable (leaving partially capable systems to fare as they
> will). In-between solutions will ultimately be programmer and system
> specific.

+1

repr() should not escape printable chars, and there should be a codec
to escape everything, so one could write
mystring.encode("escape_string").

Oleg.
--
Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.
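[Editor's sketch, not part of the original messages.] The two options discussed above can be sketched for a current Python 3 as follows. Both function names are hypothetical. The first relies on the standard `unicode_escape` codec; the "escape_string" codec named in the message above is a suggestion, not an existing codec. The second follows the category rule outlined in this thread (escape Z* and C* characters except the ASCII space, plus backslash) and is an illustration of that rule, not the interpreter's actual repr() code.

```python
import unicodedata

# Short escapes used for a few common control characters and the backslash.
_SHORT = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\"}


def escape_non_ascii(s):
    """Option 1: escape everything that is not printable ASCII."""
    # unicode_escape is a standard codec; note that it also escapes
    # backslashes and control characters.
    return s.encode("unicode_escape").decode("ascii")


def escape_unprintable(s):
    """Option 2: escape only separator (Z*) and 'other' (C*) characters."""
    out = []
    for ch in s:
        if ch in _SHORT:
            out.append(_SHORT[ch])
        elif ch == " " or unicodedata.category(ch)[0] not in "ZC":
            out.append(ch)                      # printable: keep as-is
        elif ord(ch) <= 0xFF:
            out.append("\\x%02x" % ord(ch))     # Latin-1 range
        elif ord(ch) <= 0xFFFF:
            out.append("\\u%04x" % ord(ch))     # rest of the BMP
        else:
            out.append("\\U%08x" % ord(ch))     # everything else
    return "".join(out)


# Cyrillic A, space, b, NO-BREAK SPACE, c, newline
sample = "\u0410 b\u00a0c\n"
all_escaped = escape_non_ascii(sample)
some_escaped = escape_unprintable(sample)
```

With this sample, the first function escapes the Cyrillic letter while the second leaves it readable; both escape the NO-BREAK SPACE and the newline.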
From brett at python.org Thu May 1 20:02:52 2008 From: brett at python.org (Brett Cannon) Date: Thu, 1 May 2008 11:02:52 -0700 Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup In-Reply-To: <43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com> References: <43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com> Message-ID: On Thu, May 1, 2008 at 7:41 AM, Collin Winter wrote: > > On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon wrote: > > Transition Plan > > =============== > > > > For modules to be removed > > ------------------------- > > > > For the removal of modules that are continuing to exist in the Python > > 2.x series (i.e., not deprecated explicitly in the 2.x series), > > ``warnings.warn3k()`` will be used to issue a DeprecationWarning. > > FYI, we can also flag these using 2to3. > I can't remember if we have a guiding rule on this yet, but if 2to3 can fix this, do we still want the warning? Obviously both names will be provided so people can move their code over, but perhaps the warning is not needed? > > > Renaming of modules > > ------------------- > > > > For modules that are renamed, stub modules will be created with the > > original names and be kept in a directory within the stdlib (e.g. like > > how lib-old was once used). The need to keep the stub modules within > > a directory is to prevent naming conflicts with case-insensitive > > filesystems in those cases where nothing but the case of the module > > is changing. > > > > These stub modules will import the module code based on the new > > naming. The same type of warning being raised by modules being > > removed will be raised in the stub modules. > > > > Support in the 2to3 refactoring tool for renames will also be used > > [#2to3]_. Import statements will be rewritten so that only the import > > statement and none of the rest of the code needs to be touched. 
This > > will be accomplished by using the ``as`` keyword in import statements > > to bind in the module namespace to the old name while importing based > > on the new name. > > You should cite the existing fix_imports fixer as one example of how > to do this: http://svn.python.org/view/sandbox/trunk/2to3/lib2to3/fixes/fix_imports.py?view=markup Done. -Brett From facundobatista at gmail.com Thu May 1 20:20:10 2008 From: facundobatista at gmail.com (Facundo Batista) Date: Thu, 1 May 2008 15:20:10 -0300 Subject: [Python-3000] range() issues In-Reply-To: <4818FC9B.1080809@gmail.com> References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com> <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com> <1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com> <1209582854.1924.7.camel@qrnik> <4818F773.4060809@canterbury.ac.nz> <4818FC9B.1080809@gmail.com> Message-ID: 2008/4/30, Nick Coghlan : > In the bug tracker, Alexander mentioned the possibility of removing > __length__ and __getitem__ support from range() objects in py3k, and > implementing only __length_hint__ instead (leaving range() as a bare-bones > iterable). I'm starting to like that idea more and more. +1 -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From martin at v.loewis.de Thu May 1 20:59:19 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 01 May 2008 20:59:19 +0200 Subject: [Python-3000] gettext In-Reply-To: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com> References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com> Message-ID: <481A1307.3000605@v.loewis.de> > Are we going to want to keep the "u" variants of the gettext APIs > around in 3.0? Also, the unicode parameters (for .install methods) > don't make much sense in 3.0. > > I don't see how we could remove them in 3.0, but perhaps rename then > to their non-"u" variants and deprecate? 
I think the new module should only support the Unicode API. gettext is about text, i.e. character strings; there is no need for byte-oriented APIs. Regards, Martin From barry at python.org Thu May 1 21:15:11 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 1 May 2008 15:15:11 -0400 Subject: [Python-3000] gettext In-Reply-To: <87d4o7chho.fsf@physik.rwth-aachen.de> References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com> <35FDD892-1F6B-42DA-B5DB-FF5DC6992D46@python.org> <87d4o7chho.fsf@physik.rwth-aachen.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 30, 2008, at 2:41 PM, Torsten Bronger wrote: > Indeed. From today's perspective, I see no use case for getting > human text snippets in byte strings encoded with the same encoding > that just happened to be used in the .mo file, or with the > "preferred system encoding". Agreed. > So it is only about the question how much hassle a > renaming/deprecation generates for existing code. Maybe we shouldn't be so worried about deprecation. Py3 is a clean break, right? >> That might argue for renaming ugettext() to gettext() and adding >> something like a egettext() or bgettext() method. > > Okay. But I think its not much advantage to have the "encoded" > functions under new names, given that instead of renaming, you can > also easily use ugettext to mimic their behaviour. Works for me. >> OTOH, the current names are inspired from GNU gettext so it seems >> to me there's not much value in renaming our methods, except to >> increase confusion and break backward compatibility . > > Well, this is hard to evaluate. However, I think that if there is > no danger of getting silent errors, then the module should switch to > unicode, possibly even unicode-only. After all, the results of > gettext are likely to be passed to higher-level functions that use > (or will switch to) unicode, too. 
> > As for "gettext" returning a unicode string: If clearly documented, > I see not too much harm in using a different type scheme than C > gettext; this should be acceptable in a reimplementation in another > language. Torsten, I agree. Let's just rename ugettext() to gettext() and have it return unicodes. That's the cleanest API we can do for Python. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSBoWv3EjvBPtnXfVAQJF1QQAr2G+UHqXkuckx9oREYpwsXhDhISy4pKJ l3Ai+p+vlVsKIPiYn8HSuJYFRa8QIOBT5EOl6DEDMvQ78hYXu1VaLGWO5bOvnrjS TtCeyM9xZuXWxB3StHO3ao8pK4VdBtljBsi+3vZ8br+4zZpOKQRwiMoWozqyq6u1 EwxFUwE19qI= =936h -----END PGP SIGNATURE----- From barry at python.org Thu May 1 21:15:59 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 1 May 2008 15:15:59 -0400 Subject: [Python-3000] gettext In-Reply-To: <481A1307.3000605@v.loewis.de> References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com> <481A1307.3000605@v.loewis.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 1, 2008, at 2:59 PM, Martin v. L?wis wrote: >> Are we going to want to keep the "u" variants of the gettext APIs >> around in 3.0? Also, the unicode parameters (for .install methods) >> don't make much sense in 3.0. >> >> I don't see how we could remove them in 3.0, but perhaps rename then >> to their non-"u" variants and deprecate? > > I think the new module should only support the Unicode API. gettext is > about text, i.e. character strings; there is no need for byte-oriented > APIs. Sounds like you agree that we should just rename the u-variants and forget about deprecation, correct? 
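[Editor's sketch, not part of the original message.] From the caller's side, the rename discussed above amounts to the following on a current Python 3, where `gettext()` returns `str` and no separate `ugettext()` exists. `NullTranslations` is used here only so that the example needs no `.mo` catalog; the variable names are illustrative.

```python
import gettext

# NullTranslations returns messages unchanged, so no catalog file is needed.
translations = gettext.NullTranslations()
message = translations.gettext("Hello")

# In Python 3 the result is a text string, and the Python 2 "u" variant
# is not present on the translations object.
result_is_text = isinstance(message, str)
has_u_variant = hasattr(translations, "ugettext")
```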
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSBoW8HEjvBPtnXfVAQJd6gP/TprSKI5X9Q5E8D1pqHU2iGB3yKRuJ+4H fzjEEG5vX8Uk+JdaPR83FdwBlTMqtzZPNAKKZzjMJQr/u0a0y+M+JhHhQm6AzS5+ Pc6NFDsqW4HDQDhXVezCMwMK0G7+RRdL4bw+i0mtqiTRkXn0H/ImcM7CzCh7hsYz FuXhiyTiXrQ= =HuR0 -----END PGP SIGNATURE----- From martin at v.loewis.de Thu May 1 21:28:58 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 01 May 2008 21:28:58 +0200 Subject: [Python-3000] gettext In-Reply-To: References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com> <481A1307.3000605@v.loewis.de> Message-ID: <481A19FA.6050202@v.loewis.de> > Sounds like you agree that we should just rename the u-variants and > forget about deprecation, correct? Exactly. Regards, Martin From musiccomposition at gmail.com Thu May 1 22:00:38 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 1 May 2008 15:00:38 -0500 Subject: [Python-3000] gettext In-Reply-To: References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com> <35FDD892-1F6B-42DA-B5DB-FF5DC6992D46@python.org> <87d4o7chho.fsf@physik.rwth-aachen.de> Message-ID: <1afaf6160805011300x2724dcf7p90d697a7470e192e@mail.gmail.com> On Thu, May 1, 2008 at 2:15 PM, Barry Warsaw wrote: > Torsten, I agree. Let's just rename ugettext() to gettext() and have it > return unicodes. That's the cleanest API we can do for Python. I have a patch for something like this at issue 2512. -- Cheers, Benjamin Peterson From barry at python.org Thu May 1 22:26:34 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 1 May 2008 16:26:34 -0400 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 Message-ID: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 This is a reminder that the LAST planned alpha releases of Python 2.6 and 3.0 are scheduled for next Wednesday, 07-May-2008. 
Please be diligent over the next week so that none of your changes break Python. The stable buildbots look moderately okay, let's see what we can do about getting them all green: http://www.python.org/dev/buildbot/stable/ We have a few showstopper bugs, and I will be looking at these more carefully starting next week. http://bugs.python.org/issue?@columns=title,id,activity,versions,status&@sort=activity&@filter=priority,status&@pagesize=50&@startwith=0&priority=1&status=1&@dispname=Showstoppers Time is running short to get any new features into Python 2.6 and 3.0. The release after this one is scheduled to be the first beta release, at which time we will institute a feature freeze. If your feature doesn't make it in by then, you'll have to wait until 2.7/3.1. If there is something that absolutely must go into 2.6/3.0 be sure that there is a bug issue open for it and that the Priority is set to 'release blocker'. I may reduce it to critical for the next alpha, but we'll review all the release blocker and critical issues for the first 2.6 and 3.0 beta releases. Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSBonfHEjvBPtnXfVAQLaSwP+IMjYbLryACRColvgTU4ezPHhbBpdDaRA I2k15cLsqmkFwHitt9TaTlLklnZuETiEfl7pVzow20KW18Z2tWP5U5KVMrVVbrJM 9pMS/vC102FVD88ukyQcPP5q+pw2+r2qTLr3q/205zdELQlWo+Ny6ir6dAgTKOd4 /OZqgCMBHS4= =MhWr -----END PGP SIGNATURE----- From lists at cheimes.de Thu May 1 23:27:52 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 01 May 2008 23:27:52 +0200 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> Message-ID: <481A35D8.60604@cheimes.de> Barry Warsaw schrieb: > This is a reminder that the LAST planned alpha releases of Python 2.6 > and 3.0 are scheduled for next Wednesday, 07-May-2008. Please be > diligent over the next week so that none of your changes break Python. 
> The stable buildbots look moderately okay, let's see what we can do > about getting them all green: I'd like to draw some attention to two features for the last alpha: PEP 370: Per user site-packages directory http://www.python.org/dev/peps/pep-0370/ Alternative memory allocation for ints, floats and longs using PyMalloc instead of the current block allocation. The issue has been discussed at great length a few months ago but without a final decision. http://bugs.python.org/issue2039 Christian From tjreedy at udel.edu Thu May 1 23:42:39 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 1 May 2008 17:42:39 -0400 Subject: [Python-3000] Invitation to try out open source code review tool References: Message-ID: As I understood this, one needs a diff to comment on. I can imagine wanting, or wanting others, to be able to comment on a file or lines of files without making a fake diff (of the file versus itself or a blank file). Then only one column would be needed. I presume the current site is for trial purposes. You obviously don't want hundreds of repositories listed. Are you planning, for instance, to suggest that Google project hosting add a Review tab or link to the project pages? And I followed the link to pages about Rietveld ;-) tjr From guido at python.org Fri May 2 00:41:14 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 15:41:14 -0700 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: Message-ID: On Thu, May 1, 2008 at 2:42 PM, Terry Reedy wrote: > As I understood this, one needs a diff to comment on. > I can imagine wanting, or wanting others, to be able to comment on a file > or lines of files without making a fake diff (of the file versus itself or > a blank file). Then only one column would be needed. Yeah, this use case is not well supported.
In my experience with the internal tool at Google, I don't think that anybody has ever requested that feature, so perhaps in practice it's not so common. I mean, who wants to review a 5000-line file once it's checked in? :-) The right point for such a review (certainly this is the case at Google) is when it goes in. > I presume the current site is for trial purposes. Actually I'm hoping to keep it alive forever, just evolving the functionality based on feedback. > You obviously don't want > hundreds of repositories listed. Repository management is a bit of an open problem. Fortunately, when you use upload.py, you don't need to have a repository listed -- upload.py will specify the correct base URL, especially for repositories hosted at Google. (I should probably figure out how to support SourceForge as well...) > Are you planning, for instance, to > suggest that Google project hosting add a Review tab or link to the project > pages? They've been following my release with interest... > And I followed the link to pages about Rietveld ;-) Thanks. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Fri May 2 01:24:24 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 1 May 2008 19:24:24 -0400 Subject: [Python-3000] Invitation to try out open source code review tool References: Message-ID: "Guido van Rossum" wrote in message news:ca471dc20805011541y63dd132eo6e67310eaeea3ffa at mail.gmail.com... | On Thu, May 1, 2008 at 2:42 PM, Terry Reedy wrote: | > As I understood this,one needs a diff to comment on. | > I can imagine wanting, or wanting others, to be able to comment on a file | > or lines of files without making a fake diff (of the file versus itself or | > a blank file). Then only one column would be needed. | | Yeah, this use case is not well supported. In my experience with the | internal tool at Google, I don't think that anybody has ever requested | that feature, so perhaps in practice it's not so common. 
I mean, who | wants to review a 5000-line file once it's checked in? :-) The right | point for such a review (certainly this is the case at Google) is when | it goes in. I am thinking of an entirely different scenario: a package of modules that are maybe a few hundred lines each and that accompany a book and are meant for human reading as much or more than for machine execution. Or this: 15 minutes ago I was reading a PEP and discovered that a link did not work. So I find the non-clickable author email at the top and notify the author with my email program. But how much nicer to double click an adjacent line and stick the comment in place (and let your system do the emailing). (I presume the sponsor of an item in your system can remove no-longer-needed comments.) So I guess I am thinking of your system as one for collaborative online editing rather than just patch review. Terry From guido at python.org Fri May 2 01:29:13 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 16:29:13 -0700 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: Message-ID: On Thu, May 1, 2008 at 4:24 PM, Terry Reedy wrote: > > "Guido van Rossum" wrote in message > news:ca471dc20805011541y63dd132eo6e67310eaeea3ffa at mail.gmail.com... > > | On Thu, May 1, 2008 at 2:42 PM, Terry Reedy wrote: > | > As I understood this,one needs a diff to comment on. > | > I can imagine wanting, or wanting others, to be able to comment on a > file > | > or lines of files without making a fake diff (of the file versus > itself or > | > a blank file). Then only one column would be needed. > | > | Yeah, this use case is not well supported. In my experience with the > | internal tool at Google, I don't think that anybody has ever requested > | that feature, so perhaps in practice it's not so common. I mean, who > | wants to review a 5000-line file once it's checked in? 
:-) The right > | point for such a review (certainly this is the case at Google) is when > | it goes in. > > I am thinking of an entirely different scenario: a package of modules that > are maybe a few hundred lines each and that accompany a book and are meant > for human reading as much or more than for machine execution. > > Or this: 15 minutes ago I was reading a PEP and discovered that a link did > not work. So I find the non-clickable author email at the top and notify > the author with my email program. But how much nicer to double click an > adjacent line and stick the comment in place (and let your system do the > emailing). (I presume the sponsor of an item in your system can remove > no-longer-needed comments.) So I guess I am thinking of your system as one > for collaborative online editing rather than just patch review. I agree that those are all great use cases. Eventually we'll be able to support these; right now though, I'd like to focus on the more immediate need (IMO) of patch reviews. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ndbecker2 at gmail.com Fri May 2 01:37:55 2008 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 01 May 2008 19:37:55 -0400 Subject: [Python-3000] Invitation to try out open source code review tool References: Message-ID: It would be really nice to see support for some other backends, such as Hg or bzr (which are both written in python), in addition to svn. 
From guido at python.org Fri May 2 01:45:01 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 16:45:01 -0700 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481A35D8.60604@cheimes.de> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> Message-ID: On Thu, May 1, 2008 at 2:27 PM, Christian Heimes wrote: > Barry Warsaw schrieb: > > > This is a reminder that the LAST planned alpha releases of Python 2.6 > > and 3.0 are scheduled for next Wednesday, 07-May-2008. Please be > > diligent over the next week so that none of your changes break Python. > > The stable buildbots look moderately okay, let's see what we can do > > about getting them all green: > > I like to draw some attention to two features for the last alpha: > > PEP 370: Per user site-packages directory > http://www.python.org/dev/peps/pep-0370/ I like this, except one issue: I really don't like the .local directory. I don't see any compelling reason why this needs to be ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide it from view, especially since the user is expected to manage this explicitly. > Alternative memory allocation for ints, floats and longs using PyMalloc > instead of the current block allocation. The issue has been discussed in > great length a few months ago but without a final decision. > http://bugs.python.org/issue2039 I might look at this later; but it seems to me to be a pure optimization and thus not required to be in before the first beta. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri May 2 01:45:33 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 16:45:33 -0700 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: Message-ID: On Thu, May 1, 2008 at 4:37 PM, Neal Becker wrote: > It would be really nice to see support for some other backends, such as Hg > or bzr (which are both written in python), in addition to svn. Once it's open source feel free to add those! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From adlaiff6 at gmail.com Fri May 2 01:47:51 2008 From: adlaiff6 at gmail.com (Leif Walsh) Date: Thu, 1 May 2008 19:47:51 -0400 (EDT) Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: Message-ID: On Thu, 1 May 2008, Neal Becker wrote: > It would be really nice to see support for some other backends, such as Hg > or bzr (which are both written in python), in addition to svn. /me starts the clamour for git -- Cheers, Leif From barry at python.org Fri May 2 01:54:40 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 1 May 2008 19:54:40 -0400 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> Message-ID: <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 1, 2008, at 7:45 PM, Guido van Rossum wrote: > On Thu, May 1, 2008 at 2:27 PM, Christian Heimes > wrote: >> Barry Warsaw schrieb: >> >>> This is a reminder that the LAST planned alpha releases of Python >>> 2.6 >>> and 3.0 are scheduled for next Wednesday, 07-May-2008. Please be >>> diligent over the next week so that none of your changes break >>> Python. 
>>> The stable buildbots look moderately okay, let's see what we can do >>> about getting them all green: >> >> I like to draw some attention to two features for the last alpha: >> >> PEP 370: Per user site-packages directory >> http://www.python.org/dev/peps/pep-0370/ > > I like this, except one issue: I really don't like the .local > directory. I don't see any compelling reason why this needs to be > ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide > it from view, especially since the user is expected to manage this > explicitly. Interesting. I'm of the opposite opinion. I really don't want Python dictating to me what my home directory should look like (a dot file doesn't count because so many tools conspire to hide it from me). I guess there's always $PYTHONUSERBASE, but I think I will not be alone. ;) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSBpYQ3EjvBPtnXfVAQLY+AP/dy7qoQKNEJiKtlwqCtw7LUCMLMQylBX8 DfbIonOnAaKHzjveyswuxVeAEq/C/fxssOGMhyd++H/1koJHjBdIHp47+RgohbHQ 1xCyA6Qj8f6xM3xdCR7lRuIDdjb6Tb/iCIQT/dHLrYxEf+VGUC+xVa3JIXdfJu4s kUYg7tU8SQ8= =xJWG -----END PGP SIGNATURE----- From guido at python.org Fri May 2 03:55:56 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 18:55:56 -0700 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> Message-ID: On Thu, May 1, 2008 at 5:03 PM, wrote: > On 11:45 pm, guido at python.org wrote: > > > I like this, except one issue: I really don't like the .local > > directory. I don't see any compelling reason why this needs to be > > ~/.local/lib/ -- IMO it should just be ~/lib/. 
There's no need to hide > > it from view, especially since the user is expected to manage this > > explicitly. > > > > I've previously given a spirited defense of ~/.local on this list ( > http://mail.python.org/pipermail/python-dev/2008-January/076173.html ) among > other places. > > Briefly, "lib" is not the only directory participating in this convention; > you've also got the full complement of other stuff that might go into an > installation like /usr/local. So, while "lib" might annoy me a little, "bin > etc games include lib lib32 man sbin share src" is going to get ugly pretty > fast, especially if this is what comes up in Finder or Nautilus or Explorer > every time I open a window. Unless I misread the PEP, there's only going to be a lib subdirectory. Python packages don't put stuff in other places AFAIK. On the Mac, the default Finder window is not your home directory but your Desktop, which is a subdirectory thereof with a markedly public name. In fact, OS X has a whole bunch of reserved names in your home directory, and none of them start with a dot. The rule seems to be that if it contains stuff that the user cares about, it doesn't start with a dot. > If it's going to be a visible directory on the > grounds that this is a Python-specific thing that is explicitly *not* > participating in a convention with other software, then please call it > "~/Python" or something. Much better than ~/.local/ IMO. > Am I the only guy who finds software that insists on visible, fixed files > in my home directory rude? vmware, for example, wants a "~/vmware" > directory, but pretty much every other application I use is nice enough to > use dotfiles (even cedega, with a roughly-comparable-to-lib "applications > I've installed for you" folder). The distinction to my mind is that most dot files (with the exception of a few like .profile or .bashrc) are not managed by most users -- the apps that manage them provide APIs for manipulating their contents.
(Sort of like the Windows registry.) Non-dot files are for stuff that the user needs to be aware of. I'm not sure where Python packages fall, but ISTM that this is something a user must explicitly choose as the target of an installer. The user is also likely to have to dig through there to remove stuff, as Python package management doesn't have a way to remove packages. > Put another way - it's trivial to make ~/.local/lib show up by symlinking > ~/lib, That's not the same thing at all. > but you can't make ~/lib disappear, and lots of software ends up > looking at ~. But what software cares about another file there? My home directory is mostly a switching point where I have quick access to everything I access regularly. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Fri May 2 04:31:20 2008 From: brett at python.org (Brett Cannon) Date: Thu, 1 May 2008 19:31:20 -0700 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> Message-ID: On Thu, May 1, 2008 at 1:26 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > This is a reminder that the LAST planned alpha releases of Python 2.6 and > 3.0 are scheduled for next Wednesday, 07-May-2008. Please be diligent over > the next week so that none of your changes break Python. The stable > buildbots look moderately okay, let's see what we can do about getting them > all green: > > http://www.python.org/dev/buildbot/stable/ > > We have a few showstopper bugs, and I will be looking at these more > carefully starting next week. > > > http://bugs.python.org/issue?@columns=title,id,activity,versions,status&@sort=activity&@filter=priority,status&@pagesize=50&@startwith=0&priority=1&status=1&@dispname=Showstoppers > > Time is running short to get any new features into Python 2.6 and 3.0.
The > release after this one is scheduled to be the first beta release, at which > time we will institute a feature freeze. If your feature doesn't make it in > by then, you'll have to wait until 2.7/3.1. If there is something that > absolutely must go into 2.6/3.0 be sure that there is a bug issue open for > it and that the Priority is set to 'release blocker'. I may reduce it to > critical for the next alpha, but we'll review all the release blocker and > critical issues for the first 2.6 and 3.0 beta releases. I just closed the release blocker I created (the backwards-compatibility issue with warnings.showwarning() ). I would like to add a PendingDeprecationWarning (or stronger) to 2.6 for showwarning() implementations that don't support the optional 'line' argument. I guess the best way to do it in C code would be to see if PyFunction_GetDefaults() returns a tuple of length two (since showwarning() already has a single optional argument as it is). Anyone have an issue with me doing this? Is PendingDeprecationWarning safe enough for 2.6? Or should this be a 3.0-only thing with a DeprecationWarning? -Brett From musiccomposition at gmail.com Fri May 2 04:35:12 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 1 May 2008 21:35:12 -0500 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> Message-ID: <1afaf6160805011935k6dc045bexd6fb54f112f2307@mail.gmail.com> On Thu, May 1, 2008 at 9:31 PM, Brett Cannon wrote: > > I just closed the release blocker I created (the > backwards-compatibility issue with warnings.showwarning() ). I would > like to add a PendingDeprecationWarning (or stronger) to 2.6 for > showwarning() implementations that don't support the optional 'line' > argument. 
I guess the best way to do it in C code would be to see if > PyFunction_GetDefaults() returns a tuple of length two (since > showwarning() already has a single optional argument as it is). > > Anyone have an issue with me doing this? Is PendingDeprecationWarning > safe enough for 2.6? Or should this be a 3.0-only thing with a > DeprecationWarning? I vote for a full DeprecationWarning. > > -Brett -- Cheers, Benjamin Peterson From g.brandl at gmx.net Fri May 2 05:28:19 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 02 May 2008 05:28:19 +0200 Subject: [Python-3000] Problems with the new super() In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > On Thu, May 1, 2008 at 11:20 AM, Georg Brandl wrote: >> But the other two magical things about super() really bother me too. I >> haven't looked at the new super in detail so far (and I don't know how >> many others have), and two things are really strikingly unpythonic in >> my view: >> >> * super() only works when named "super" [1]. It shouldn't be a function if >> it has that property; no other Python function has that. > > Actually, I believe IronPython and/or Jython have to use this trick in > certain cases -- at least I recall Jim Hugunin talking about > generating different code when the use of locals() was detected. I don't know if it's possible in Jython to have "locals" referring to something else. For CPython, the name "super" in a function can refer to anything -- local, global or builtin -- and it just feels wrong for the compiler to make assumptions based on the mere mention of a non-reserved name. > I'm not proud of this, but I don't see a way around it. The > alternative would be to make it a keyword, which seemed excessive > (plus, it would be odd if super() were a keyword when self is not). I don't find it odd. In fact, IMO the whole magic needed for the runtime implementation of "super()" justifies super becoming a keyword. 
Georg [Moving this to the Python-3000 list] -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Fri May 2 05:49:18 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 20:49:18 -0700 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> Message-ID: I stand corrected on a few points. You've convinced me that ~/lib/ is wrong. But I still don't like ~/.local/; not least because it's not any more local than any other dot files or directories. The "symmetry" with /usr/local/ is pretty weak, and certainly won't help beginning users. As a compromise, I'm okay with ~/Python/. I would like to be able to say that the user explicitly has to set an environment variable in order to benefit from this feature, just like with $PYTHONPATH and $PYTHONSTARTUP. But that might defeat the point of making this easy to use for noobs. On OS X I think we should put this somewhere under ~/Library/. Just put it in a different place than where the Python framework puts its stuff. On Thu, May 1, 2008 at 8:25 PM, wrote: > On 01:55 am, guido at python.org wrote: > > > On Thu, May 1, 2008 at 5:03 PM, wrote: > > > > Hi everybody. I apologize for writing yet another lengthy screed about a > simple directory naming issue. I feel strongly about it but I encourage > anyone who doesn't to simply skip it.
> > First, some background: my strong feelings here are actually based on an > experience I had a long time ago when helping someone with some C++ > programming homework. They were baffled because when I helped them the > programs compiled, but then as soon as they tried it on their own it didn't. > The issue was that I had replicated my own autotools-friendly directory > structure for them (at the time, "~/bin", "~/include", "~/lib", "~/etc", and > so on managed with GNU stow) onto their machine and edited their shell setup > to include them appropriately. But, as soon as I was finished, they > "cleaned up" the "mess" I had left behind, and thereby removed all of their > build dependencies. This was on a shared university build server, before > the days of linux as a friendly, graphical operating system which encouraged > you to look even more frequently at your home directory, so if anything I > suspect the likelihood that this is a problem would be worse now. Since > cleaning up my own home directory, of course, I find that I appreciate the > lack of visual noise in Nautilus et. al. as well. > > Also, while I obviously think all tools should work this way, I think that > Python in particular will attract an audience who is learning to program but > not necessarily savvy with arcane nuances of filesystem layout, and it would > be best if those details were abstracted. > > My concern here is for the naive python developer reading installation > instructions off of a wiki and trying to get started with Twisted > development. Seeing a directory created in your home directory (or, as the > case may be, 3 directories, "bin", "lib", and "include") is a bit of a > surprise. They don't actually care where the files in their installed > library are, as long as they're "installed", and they can import them. > However, they may care that clicking on the little house icon now shows not > "Pictures", "Movies", etc, but "lib" (what's a 'lib'?) "bin" (what's a bin? 
> is that like a box where I throw my stuff?) "share" (I put my stuff in > "share", but it's not shared. Wait, I'm supposed to put it in "Public"?). > > > > > > > Briefly, "lib" is not the only directory participating in this > convention; > > > you've also got the full complement of other stuff that might go into an > > > installation like /usr/local. So, while "lib" might annoy me a little, > "bin > > > etc games include lib lib32 man sbin share src" is going to get ugly > pretty > > > fast, especially if this is what comes up in Finder or Nautilus or > Explorer > > > every time I open a window. > > > > > > > Unless I misread the PEP, there's only going to be a lib subdirectory. > > Python packages don't put stuff in other places AFAIK. > > > > Python packages, at the very least, frequently put stuff in "bin" (or > "scripts", I think, on Windows). Not all Python packages are pure- Python > packages either; setup.py boasts --install-platlib, --install- headers, > --install-data, and --exec-prefix options, which suggests an "include", > "bin", and "share" directory, at least. I'm sure if I had more time to > grovel around I'd find one that installed manpages. Twisted has some, but > apparently setup.py doesn't do anything with them, we leave that to the OS > packages... > > Of course, very little of this is handled by the PEP. But even the usage > of the name "lib" implies that the PEP is taking some care to be compatible > with an idiom that goes beyond Python itself here, or at least beyond simple > Python packages. > > Even assuming that no Python library ever wanted to install any of these > things, there are many Python libraries which are simply wrappers around > lower-level libraries, and if I want to perform a per-user install of one of > those, I am going to ./configure --prefix=~/something (and by "something", I > mean ".local" ;)) and it would be nice to have Python living in the same > space. 
For that matter it'd be nice to get autotools and Ruby and PHP and > Perl and Emacs (ad nauseum) all looking at ~/.local as a mirror of /usr, so > that I didn't have to write a bunch of shell bootstrap glue to get > everything to behave consistently, or learn the new, special names for bits > of configuration under "~" that are different from the ones under /usr/local > or /etc. > > I replicate a consistent Python development environment with a ton of > bizarre dependencies across something like 15 different OS installations > (not to mention a bevy of virtual machines I keep around just for fun), so I > think about these issues a lot. Most of these machines are macs and linux > boxes, but I do my best on Windows too. FWIW I don't have any idea what the > right thing to do is on Windows; ".local" doesn't particularly make sense, > but neither does "lib" in that context. There's no reasonable guess as to > where to put scripts, or dependent shared libraries... but then, per-user > installation is less of an issue on Windows. > > > > On the Mac, the default Finder window is not your home directory but > > your Desktop, which is a subdirectory thereof with a markedly public > > name. In fact, OS X has a whole bunch of reserved names in your home > > directory, and none of them start with a dot. The rule seems to be > > that if it contains stuff that the user cares about, it doesn't start > > with a dot. > > > > Hmm. On my Mac laptop, the default Finder window is definitely my home > directory; this may be an artifact of many OS upgrades or some tweak that I > performed a long time ago and forgot about, though. Apologies if that is > not the average user experience. > > For what it's worth, Ubuntu also has some directories that it creates: > Desktop, Pictures, Documents, Examples, Templates, Videos. These are empty, > and I typically delete the ones I don't use. 
> > > > > > > If it's going to be a visible directory on the > > > grounds that this is a Python- specific thing that is explicitly *not* > > > participating in a convention with other software, then please call it > > > "~/Python" or something. > > > > > > > Much better than ~/.local/ IMO. > > > > It depends how this is being perceived. If this is Python mirroring the > /usr/local layout convention for users, as the name "lib" implies, then this > is worse. However, if Python is just trying to select a location for its > own library bookkeeping and not allow the installation of platform libraries > or scripts using this mechanism... well, ~/.python.d would still be my > preference ;-) but I could at least understand "Python" as mirroring the > Mac, GNOME and KDE convention for a few very special directories. > > > > > > > Am I the only guy who finds software that insists on visible, fixed > files > > > in my home directory rude? vmware, for example, wants a "~/vmware" > > > directory, but pretty much every other application I use is nice enough > to > > > use dotfiles (even cedega, with a roughly-comparable-to- lib > "applications > > > I've installed for you" folder). > > > > > > > The distinction to my mind is that most dot files (with the exception > > of a few like .profile or .bashrc) are not managed by most users -- > > the apps that manage them provide an APIs for manipulating their > > contents. (Sort of like thw Windows registry.) Non-dot files are for > > stuff that the user needs to be aware of. > > > > My experience of modern Linux suggests that the usage you're describing is > gradually being phased out - applications that want to manage some > non-user-visible storage in something like the registry increasingly use > gconf (or a database, in server-land). Granted, gconf itself is stored in > dotfiles, but it's just a few. 
> > In my home directory I have, in version control, variously written by hand > or databases maintained from externally downloaded stuff: > > ~/.asoundrc > ~/.emacs > ~/.vimrc > ~/.vim > ~/.Xresources > ~/.fonts > ~/.gnomerc > ~/.inputrc > ~/.bashrc > ~/.bash_profile > ~/.profile > ~/.screenrc > ~/.Xresources > ~/.ssh/config > ~/.ssh/authorized_keys > ~/.ssh/known_hosts > > I know about these dot files and I care about them and I maintain them, but > they're there for the benefit of particular pieces of software, not me. > There are a lot of other dotfiles there, but I don't think that this set is > "a few"; I am quite happy that I don't have to see every one of them every > time I am looking at my home directory in a "save as" dialog. > > > > I'm not sure where Python packages fall, but ISTM that this is > > something a user must explicitly choose as the target of an installer. > > The user is also likely to have to dig through there to remove stuff, > > as Python package management doesn't have a way to remove packages. > > > > I hope that users never have to explicitly choose this as the target of the > installer; I was under the impression that the point of adding this feature > was to allow the default behavior of distutils to work simply and > automatically on UNIX-y platforms rather than puking about permissions, or > requiring arcana like "sudo" access or editing your shell's startup. I am > quietly agitating elsewhere to get ~/.local/bin added to $PATH by default, > by the way ;-). (~/.local/lib on $LD_LIBRARY_PATH is a hard sell, but that > too...) > > Once you have to know about it and explicitly choose it it's not much more > work to set all the appropriate shell environment variables yourself. And, > for that matter, *I* already have, so I suppose regardless of the outcome of > this discussion I'll still have a ~/.local :-). 
> > > > > > > Put another way - it's trivial to make ~/.local/lib show up by > symlinking > > > ~/lib, > > > > > > > That's not the same thing at all. > > > > I'm not sure what you're saying it's not the same as. All I'm saying is > that if advanced users want to show it, they'll symlink it; if naive users > want to hide it, they'll delete it and break python, possibly without > knowing why ;). > > > > > > > but you can't make ~/lib disappear, and lots of software ends up > > > looking at ~. > > > > > > > But what software cares about another file there? My home directory is > > mostly a switching point where I have quick access to everything I > > access regularly. > > > > Nothing's going to break, if that's what you mean. No software processes > the list of ~ and does anything with it; but lots of stuff shows me that > list. In GNOME, on Ubuntu, when a "choose file" dialog comes up, 80% of the > time it comes up by default in my home directory. When I open a terminal it > opens in my home directory. The default location for Emacs is my home > directory. I can quickly measure my cognitive load by looking at the > contents of that directory. Since my shell starts there, autocomplete > starts there, and so common-letter real estate is scarce. I have a > directory called "Projects" that I currently autocomplete with 'p' and > a directory called 'Linux' that I autocomplete with 'l'; either > public-name proposal will have me typing an additional letter on these every > day ;-). > > In other words, I care about another file there. I use my home directory > as a sort of to-do list; it's mostly empty unless I have a lot going on, in > which case it fills up with various objects I'm working on, and then I empty > it out again. There are a few exceptions to this rule; on every platform > there are a few things the OS puts there, but they are generally things like > "Pictures", "Desktop", and "Music"... where I put pictures, downloaded > files, and music. 
The Mac's "Library" directory has never bothered me, since > it's OS-provided and basically an alternate location for dotfiles. > ("Application Data" and friends are another story.) > > In a way, I agree with you. "everything I access regularly" is a good > description of my home directory. Except, this "lib" directory is not > something I want to access regularly; very occasionally, maybe once every > few weeks, I want to chuck some dependency in there and then forget about it > for a year. > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri May 2 05:54:33 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 1 May 2008 20:54:33 -0700 Subject: [Python-3000] Problems with the new super() In-Reply-To: References: Message-ID: This whole movement to condemn super because it's not "pure" strikes me as wasted energy. That's my last word. On Thu, May 1, 2008 at 8:28 PM, Georg Brandl wrote: > Guido van Rossum schrieb: > > > > On Thu, May 1, 2008 at 11:20 AM, Georg Brandl wrote: > > > > > But the other two magical things about super() really bother me too. I > > > haven't looked at the new super in detail so far (and I don't know how > > > many others have), and two things are really strikingly unpythonic in > > > my view: > > > > > > * super() only works when named "super" [1]. It shouldn't be a function > if > > > it has that property; no other Python function has that. > > > > > > > Actually, I believe IronPython and/or Jython have to use this trick in > > certain cases -- at least I recall Jim Hugunin talking about > > generating different code when the use of locals() was detected. > > > > I don't know if it's possible in Jython to have "locals" referring to > something else. For CPython, the name "super" in a function can refer to > anything -- local, global or builtin -- and it just feels wrong for the > compiler to make assumptions based on the mere mention of a non-reserved > name. 
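The "only works when named super" complaint quoted above can be reproduced on a current CPython 3 interpreter. This is only a minimal sketch: the class names and the `_alias` name are made up for illustration, and the error raised is an implementation detail that may differ from what the 2008 alphas did.

```python
class Base:
    def greet(self):
        return "Base.greet"


class Direct(Base):
    def greet(self):
        # The compiler sees the name "super" in this method body and
        # creates the hidden __class__ cell that zero-argument super() needs.
        return "Direct -> " + super().greet()


_alias = super  # the same builtin object under a different (hypothetical) name


class Aliased(Base):
    def greet(self):
        # The name "super" never appears here, so no __class__ cell is
        # created and the zero-argument call cannot find its class.
        return "Aliased -> " + _alias().greet()


def try_aliased():
    """Return the exception type raised by the aliased call, or None."""
    try:
        Aliased().greet()
    except RuntimeError as exc:
        return type(exc)
    return None
```

Calling `Direct().greet()` works, while `try_aliased()` reports a RuntimeError, which is the name-dependence being objected to.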
> > > > > I'm not proud of this, but I don't see a way around it. The > > alternative would be to make it a keyword, which seemed excessive > > (plus, it would be odd if super() were a keyword when self is not). > > > > I don't find it odd. In fact, IMO the whole magic needed for the runtime > implementation of "super()" justifies super becoming a keyword. > > Georg > > [Moving this to the Python-3000 list] > > > -- > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. > Four shall be the number of spaces thou shalt indent, and the number of thy > indenting shall be four. Eight shalt thou not indent, nor either indent > thou > two, excepting that thou then proceed to four. Tabs are right out. > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Fri May 2 07:57:31 2008 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 2 May 2008 15:57:31 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080502050720.GO78165@nexus.in-nomine.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <20080502050720.GO78165@nexus.in-nomine.org> Message-ID: <05bb01c8ac19$6db816c0$49284440$@com.au> > Is there a reliable way to identify 32-bits and 64-bits Windows from > within Python? Not that I'm aware of. 'sys.platform=="win32" and "64 bits" in sys.version' will be reliable when it returns True, but it might be wrong when it returns False (although when it returns False, things will look a lot like a 32bit OS). 
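The sys.version test just described could be wrapped up roughly as follows. It is a sketch only: the function name and its parameters are invented here, and the substring is assumed to be "64 bit" (singular), which appears to be the spelling used by CPython's Windows builds.

```python
import sys


def is_64bit_windows_build(platform=None, version=None):
    """Apply the sys.platform / sys.version heuristic.

    A True result is reliable; a False result may still come from a
    32-bit interpreter running on a 64-bit Windows.
    """
    platform = sys.platform if platform is None else platform
    version = sys.version if version is None else version
    # Assumption: 64-bit Windows builds mention "64 bit" in sys.version.
    return platform == "win32" and "64 bit" in version
```

Passing the strings in explicitly makes the heuristic easy to try on any machine; with no arguments it inspects the running interpreter.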
The best way I can find for the win32 API to tell you this is a
combination of the above and IsWow64Process() (which returns True if you
are a 32-bit process on a 64-bit platform).

I'd be interested to know why you care though - i.e., how will the
behavior of your programs depend on that? The virtualization
compatibility hacks which, best I can tell, are currently enabled for
Python mean that the answer to the question might not be as useful as
people might think. But I'm sure valid reasons for wanting to know this
exist, so I'd be happy to create a patch which adds a new
sys.iswow64process() function if desired.

Cheers,

Mark

From jbarham at gmail.com Fri May 2 08:50:52 2008
From: jbarham at gmail.com (John Barham)
Date: Thu, 1 May 2008 23:50:52 -0700
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>
Message-ID: <4f34febc0805012350v6251650do28d46ff5f5577421@mail.gmail.com>

> I think it would be great if Python were the first real adopter of this
> convention...

A convention without any adopters? Seems like a non sequitur...

From lists at cheimes.de Fri May 2 10:30:20 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 02 May 2008 10:30:20 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To:
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de>
Message-ID: <481AD11C.4020806@cheimes.de>

Guido van Rossum wrote:
> I like this, except one issue: I really don't like the .local
> directory.
I don't see any compelling reason why this needs to be > ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide > it from view, especially since the user is expected to manage this > explicitly. The directory name has been commented on by glyph in great length (again). Thanks glyph! I'm all on his side. The base directory for Python related files should be a dot directory in the root directory of the users home dir. I slightly prefer ~/.local/ over other suggestions but I'm also open to ~/.python.d/ Should I wait with the commit until we have agreed on a directory name or do you want me to commit the code now? > I might look at this later; but it seems to me to be a pure > optimization and thus not required to be in before the first beta. Correct, it's an optimization to enhance the memory utilization. Christian From steve at holdenweb.com Fri May 2 10:49:17 2008 From: steve at holdenweb.com (Steve Holden) Date: Fri, 02 May 2008 04:49:17 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> Message-ID: <481AD58D.2010201@holdenweb.com> Guido van Rossum wrote: > I stand corrected on a few points. You've convinced me that ~/lib/ is > wrong. But I still don't like ~/.local/; not in the last place because > it's not any more local than any other dot files or directories. The > "symmetry" with /usr/local/ is pretty weak, and certainly won't help > beginning users. > So it's the *name* you don't like rather than the invisibility? > As a compromise, I'm okay with ~/Python/. I would like to be able to > say that the user explicitly has to set an environment variable in > order to benefit from this feature, just like with $PYTHONPATH and > $PYTHONSTARTUP. 
But that might defeat the point of making this easy to
> use for noobs.
>
Groan. Then everyone else realizes what a "great idea" this is, and we
see ~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a
directory with a comment marker in its name), ~/Lisp/ and the rest? I
don't think people would thank us for that in the long term.

I'm about +10 on invisibility, for the simple reason that "hiding the
mechanism" is the right thing to do for naive users, who are the most
likely to screw things up if given the chance and the most likely to be
unaware of dot-name directories.

If you don't like ~/.local/ then please consider ~/.private/ or
~/.personal/ or something else, but don't gratuitously add a visible
subdirectory.

> On OS X I think we should put this somewhere under ~/Library/. Just
> put it in a different place than where the Python framework puts its
> stuff.
>
Nothing to say about OS X.

One day Windows might start to respect the "hidden dot" convention, but
perhaps in the interim we could create a (Windows-hidden) ~/.private/?
Assuming we could work out where to put it ;-)

> On Thu, May 1, 2008 at 8:25 PM, wrote:
[much good sense]

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

From lists at cheimes.de Fri May 2 10:57:21 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 02 May 2008 10:57:21 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481AD58D.2010201@holdenweb.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <481AD58D.2010201@holdenweb.com>
Message-ID: <481AD771.6040802@cheimes.de>

Steve Holden wrote:
> Nothing to say about OS X.
>
> One day Windows might start to respect the "hidden dot" convention, but
> perhaps in the interim we could create a (Windows-hidden) ~/.private/?
> Assuming we could work out where to put it ;-)

Windows and Mac OS X have dedicated directories for application-specific
libraries: ~/Library on Mac and Application Data on Windows. The latter
is localized and is called "Anwendungsdaten" in German. Fortunately,
Windows sets an environment variable that points to the application data
directory.

Christian

From lists at cheimes.de Fri May 2 11:44:26 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 02 May 2008 11:44:26 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <20080502091633.GV78165@nexus.in-nomine.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <481AD58D.2010201@holdenweb.com> <481AD771.6040802@cheimes.de> <20080502091633.GV78165@nexus.in-nomine.org>
Message-ID: <481AE27A.10906@cheimes.de>

Jeroen Ruigrok van der Werven wrote:
> "Windows uses the Roaming folder for application specific data, such as
> custom dictionaries, which are machine independent and should roam with the
> user profile. The AppData\Roaming folder in Windows Vista is the same as the
> Documents and Settings\username\Application Data folder in Windows XP."
>
> I think that's different from what you meant above though, since I doubt
> you'd want this (the libraries) to roam with the user.

As a matter of fact, I *want* the libraries to roam. On the other hand,
this might become an issue if a user roams between a 32-bit and a 64-bit
system ...
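One conceivable hedge against the 32-bit/64-bit roaming problem mentioned above would be to keep compiled extensions in a subdirectory named after the interpreter version and pointer width. Nothing in this thread settles on such a scheme; the function name and the directory layout below are purely hypothetical.

```python
import os
import struct
import sys


def arch_specific_dir(base, bits=None, version_info=None):
    """Return a per-build subdirectory of base, e.g. base/python2.6-64bit."""
    # The size of a C pointer tells us whether this interpreter is a
    # 32-bit or a 64-bit build.
    if bits is None:
        bits = struct.calcsize("P") * 8
    if version_info is None:
        version_info = sys.version_info
    tag = "python%d.%d-%dbit" % (version_info[0], version_info[1], bits)
    return os.path.join(base, tag)
```

With a layout like this, a roaming profile could carry builds for several machines side by side, at the cost of installing each extension once per build.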
From ncoghlan at gmail.com Fri May 2 12:43:19 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 May 2008 20:43:19 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <20080502092008.GW78165@nexus.in-nomine.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <481AD58D.2010201@holdenweb.com> <20080502092008.GW78165@nexus.in-nomine.org>
Message-ID: <481AF047.5050109@gmail.com>

Jeroen Ruigrok van der Werven wrote:
> -On [20080502 10:50], Steve Holden (steve at holdenweb.com) wrote:
>> Groan. Then everyone else realizes what a "great idea" this is, and we see
>> ~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a directory with
>> a comment marker in its name), ~/Lisp/ and the rest? I don't think people
>> would thank us for that in the long term.
>
> I'm +1 on just using $HOME/.local, but otherwise $HOME/.python makes sense
> too. $HOME/.python.d doesn't do it for me, too clunky (and hardly used if I
> look at my .files in $HOME).
>
> But I agree with Steve that it should be a hidden directory.

This sums up my opinion pretty well. Hidden by default, but easy to
expose (e.g. via a local -> .local symlink) for the more experienced
users that want it more easily accessible.

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From ncoghlan at gmail.com Fri May 2 12:51:20 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 May 2008 20:51:20 +1000
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To:
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com><20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com><20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de>
Message-ID: <481AF228.2080900@gmail.com>

Terry Reedy wrote:
> I think standard Python should somehow have two options: escape everything
> but ASCII (for unambiguity and old display systems) and escape nothing that
> is potentially printable (leaving partially capable systems to fare as they
> will). In-between solutions will ultimately be programmer and system
> specific.

If repr() is made to work as Martin suggests (i.e. only escape the
unprintable stuff), then the unicode_escape codec can be used fairly
easily to restore the 2.x escape-everything-non-ASCII behaviour.

Cheers,
Nick.
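As a rough illustration of the unicode_escape suggestion: assuming repr() leaves printable non-ASCII characters alone, a helper such as the hypothetical `escape_non_ascii` below gives back an ASCII-only rendering. Note that the codec also escapes backslashes and control characters, which goes slightly beyond "non-ASCII".

```python
def escape_non_ascii(text):
    """Return text with every non-ASCII character written as an escape."""
    # The unicode_escape codec produces pure-ASCII bytes, so decoding
    # them as ASCII cannot fail.
    return text.encode("unicode_escape").decode("ascii")
```

Plain ASCII text passes through unchanged, so the helper can be applied unconditionally before writing to a display that cannot cope with anything else.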
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From barry at python.org Fri May 2 13:32:52 2008 From: barry at python.org (Barry Warsaw) Date: Fri, 2 May 2008 07:32:52 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com> Message-ID: <68FCCCBB-7DFF-4157-BE40-F816CBA7AA57@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 2, 2008, at 1:48 AM, glyph at divmod.com wrote: > etc, though. In the long term, if everyone followed suit on > ~/.local, that would be great. But I don't want a ~/Python, ~/Java, > ~/Ruby, ~/PHP, ~/Perl, ~/OCaml and ~/Erlang and a $PATH as long as > my arm just so I can run a few applications without system- > installing them. I hate to send a "me too" messages, but I have to say Glyph is exactly right here. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSBr75XEjvBPtnXfVAQIHAgP+JDpOymVEKfFvzZQZd8WtTpY6jsjvntAA 2J38LslMAXJSs3BcRBU/ELcbvTpr/JoEButktAQJCJpIhsmRTV0y3KcS/d/d+Sao 9V3ME2/yZ94qeQheB7jJIhfihNlC7VhG+CjSOMZrRZwm3k2drGGDdfdgGeSGZJOl B6uCEB0i0iI= =gup1 -----END PGP SIGNATURE----- From exarkun at divmod.com Fri May 2 15:32:49 2008 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Fri, 2 May 2008 09:32:49 -0400 Subject: [Python-3000] warnings.showwarning (was Re: [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008) In-Reply-To: Message-ID: <20080502133249.6859.2057657874.divmod.quotient.58104@ohm> On Thu, 1 May 2008 19:31:20 -0700, Brett Cannon wrote: > > [snip] > >I just closed the release blocker I created (the >backwards-compatibility issue with warnings.showwarning() ). I would >like to add a PendingDeprecationWarning (or stronger) to 2.6 for >showwarning() implementations that don't support the optional 'line' >argument. I guess the best way to do it in C code would be to see if >PyFunction_GetDefaults() returns a tuple of length two (since >showwarning() already has a single optional argument as it is). Hi Brett, I'm still seeing some strange behavior from the warnings module, This can be observed on the community buildbot for Twisted, for example: http://python.org/dev/buildbot/community/trunk/x86%20Ubuntu%20Hardy%20trunk/builds/171/step-Twisted.zope.stable/0 The log ends with basically all of the warning-related tests in Twisted failing, reporting that no warnings happened. 
There is also some strange behavior that can be easily observed in the REPL: exarkun at boson:~/Projects/python/trunk$ ./python /home/exarkun/Projects/Divmod/trunk/Combinator/combinator/xsite.py:7: DeprecationWarning: the sets module is deprecated from sets import Set Python 2.6a2+ (trunk:62636M, May 2 2008, 09:19:41) [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import warnings >>> warnings.warn("foo") :1: UserWarning: foo # Where'd the module name go? >>> def f(*a): ... print a ... >>> warnings.showwarning = f >>> warnings.warn("foo") >>> # Where'd the warning go? Any ideas on this? Jean-Paul From exarkun at divmod.com Fri May 2 15:47:16 2008 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Fri, 2 May 2008 09:47:16 -0400 Subject: [Python-3000] [Python-Dev] warnings.showwarning (was Re: Reminder: last alphas next Wednesday 07-May-2008) In-Reply-To: <20080502133249.6859.2057657874.divmod.quotient.58104@ohm> Message-ID: <20080502134716.6859.1256230877.divmod.quotient.58108@ohm> On Fri, 2 May 2008 09:32:49 -0400, Jean-Paul Calderone wrote: >On Thu, 1 May 2008 19:31:20 -0700, Brett Cannon wrote: >> >>[snip] >> >>I just closed the release blocker I created (the >>backwards-compatibility issue with warnings.showwarning() ). I would >>like to add a PendingDeprecationWarning (or stronger) to 2.6 for >>showwarning() implementations that don't support the optional 'line' >>argument. I guess the best way to do it in C code would be to see if >>PyFunction_GetDefaults() returns a tuple of length two (since >>showwarning() already has a single optional argument as it is). 
>
>Hi Brett,
>
>I'm still seeing some strange behavior from the warnings module, This
>can be observed on the community buildbot for Twisted, for example:
>
>http://python.org/dev/buildbot/community/trunk/x86%20Ubuntu%20Hardy%20trunk/builds/171
>/step-Twisted.zope.stable/0
>
>The log ends with basically all of the warning-related tests in Twisted
>failing, reporting that no warnings happened.

Just to follow up on this part, the failures are due to the tests
expecting to be able to override a different function in the warnings
module (warn_explicit), not showwarning. We used warn_explicit because
there's no clear way to disable the filtering that gets applied to
showwarning. warn_explicit doesn't claim to be a public hook, so I guess
I won't complain about this. :)

The below behavior still seems wrong to me, though.

>There is also some strange behavior that can be easily observed in the REPL:
>
>  exarkun at boson:~/Projects/python/trunk$ ./python
>/home/exarkun/Projects/Divmod/trunk/Combinator/combinator/xsite.py:7:
>DeprecationWarning: the sets module is deprecated
>  from sets import Set
>  Python 2.6a2+ (trunk:62636M, May 2 2008, 09:19:41) [GCC 4.1.3
>20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
>  Type "help", "copyright", "credits" or "license" for more information.
>  >>> import warnings
>  >>> warnings.warn("foo")
>  :1: UserWarning: foo # Where'd the module name go?
>  >>> def f(*a):
>  ...     print a
>  ...
>  >>> warnings.showwarning = f
>  >>> warnings.warn("foo")
>  >>> # Where'd the warning go?
>
>Any ideas on this?
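For comparison, here is the kind of showwarning override the session above is attempting, written as a self-contained sketch for a current Python 3 interpreter rather than the 2.6 trunk under discussion. The helper name `capture_warnings` is hypothetical; the hook signature (message, category, filename, lineno, file=None, line=None) is the documented one, including the optional line argument mentioned in the quoted text.

```python
import warnings


def capture_warnings(trigger):
    """Call trigger() and return (category name, message text) pairs."""
    captured = []

    def show(message, category, filename, lineno, file=None, line=None):
        # Replacement hook: record the warning instead of printing it.
        captured.append((category.__name__, str(message)))

    # catch_warnings() saves the filters and the showwarning hook and
    # restores both on exit, so the override stays local to this call.
    with warnings.catch_warnings():
        warnings.simplefilter("always")
        warnings.showwarning = show
        trigger()
    return captured
```

Setting the "always" filter first matters: a warning suppressed by the filters never reaches the hook, which is one way an overridden showwarning can appear to be ignored.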
>
>Jean-Paul

From guido at python.org Fri May 2 15:56:33 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 2 May 2008 06:56:33 -0700
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481AD11C.4020806@cheimes.de>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <481AD11C.4020806@cheimes.de>
Message-ID:

I'm withdrawing my opposition in the light of the sheer number of words
that have already been written about this.

On Fri, May 2, 2008 at 1:30 AM, Christian Heimes wrote:
> Guido van Rossum wrote:
>
> > I like this, except one issue: I really don't like the .local
> > directory. I don't see any compelling reason why this needs to be
> > ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
> > it from view, especially since the user is expected to manage this
> > explicitly.
>
> The directory name has been commented on by glyph in great length
> (again). Thanks glyph! I'm all on his side. The base directory for
> Python related files should be a dot directory in the root directory of
> the users home dir. I slightly prefer ~/.local/ over other suggestions
> but I'm also open to ~/.python.d/
>
> Should I wait with the commit until we have agreed on a directory name
> or do you want me to commit the code now?
>
>
> > I might look at this later; but it seems to me to be a pure
> > optimization and thus not required to be in before the first beta.
>
> Correct, it's an optimization to enhance the memory utilization.
> > Christian > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Fri May 2 15:59:46 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 02 May 2008 23:59:46 +1000 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> Message-ID: <481B1E52.908@gmail.com> Barry Warsaw wrote: > Time is running short to get any new features into Python 2.6 and 3.0. > The release after this one is scheduled to be the first beta release, at > which time we will institute a feature freeze. If your feature doesn't > make it in by then, you'll have to wait until 2.7/3.1. If there is > something that absolutely must go into 2.6/3.0 be sure that there is a > bug issue open for it and that the Priority is set to 'release > blocker'. I may reduce it to critical for the next alpha, but we'll > review all the release blocker and critical issues for the first 2.6 and > 3.0 beta releases. I tried to bump http://bugs.python.org/issue643841 ("New class special method lookup change") up to release blocker, but the bug tracker still appears to be a bit flaky (it keeps giving me an error when I try to submit the change - unfortunately I can't submit anything about it to the metatracker, because I've forgotten my password for it and the metatracker is getting a connection refused when it tries to send the reminder email :P). Here's the comment I was trying to submit along with the bug priority change: """Bumping the priority on this to release blocker for 3.0 - I think we need to have a good answer for the folks who've written old-style __getattr__ based auto-delegating classes before removing old-style classes entirely in 3.0. 
We could get away with ignoring the issue in the past because people had the option of just using an old-style class rather than having to deal with the difficulties of doing this with a new-style class. With 3.0, that approach is being eliminated. A ProxyMixin class written in Python would address that need (and shouldn't be particularly hard to write), but I'm not sure where it would go in the standard library.""" Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From fdrake at acm.org Fri May 2 19:53:54 2008 From: fdrake at acm.org (Fred Drake) Date: Fri, 2 May 2008 13:53:54 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> Message-ID: On May 1, 2008, at 7:54 PM, Barry Warsaw wrote: > Interesting. I'm of the opposite opinion. I really don't want > Python dictating to me what my home directory should look like (a > dot file doesn't count because so many tools conspire to hide it > from me). I guess there's always $PYTHONUSERBASE, but I think I > will not be alone. ;) Using ~/.local/ for user-managed content doesn't seem right to me at all, because it's hidden by default. If user-local package installs went to ~/ by default (~/bin/ for scripts, ~/lib/python/ or ~/lib/pythonX.Y/ for modules and packages), with a way to set an alternate "prefix" instead of ~/ using a distutils configuration setting, I'd be happy enough. I'd be even happier if there were no default per-user location, but a required configuration setting (in the existing distutils config locations) in order to enable per-user installation. 
-Fred -- Fred Drake From janssen at parc.com Fri May 2 20:03:14 2008 From: janssen at parc.com (Bill Janssen) Date: Fri, 2 May 2008 11:03:14 PDT Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481AD11C.4020806@cheimes.de> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <481AD11C.4020806@cheimes.de> Message-ID: <08May2.110318pdt."58696"@synergy1.parc.xerox.com> > I slightly prefer ~/.local/ over other suggestions > but I'm also open to ~/.python.d/ Guido's point about it not being necessarily "local" is a good one. I use lots of computers; they all automount my home directory (~) from a network file server. Nothing under that directory should be machine-specific. My .login and .xinitrc scripts check the machine ID and do different things on different machines. Bill From janssen at parc.com Fri May 2 20:10:27 2008 From: janssen at parc.com (Bill Janssen) Date: Fri, 2 May 2008 11:10:27 PDT Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481AD771.6040802@cheimes.de> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <481AD58D.2010201@holdenweb.com> <481AD771.6040802@cheimes.de> Message-ID: <08May2.111033pdt."58696"@synergy1.parc.xerox.com> > Windows and Mac OS X have dedicated directories for application specific > libraries. That is ~/Library on Mac and Application Data on Windows. In fact, I had to write code for this, and had to read the specs for each. 
Here's the code (I've substituted Python for UpLib):

    import os
    import sys

    if sys.platform == 'darwin':
        # Mac OS X: per-user application support directory
        listdir = os.path.expanduser(os.path.join("~", "Library", "Application Support", "org.python"))
    elif sys.platform == 'win32':
        # Windows: prefer APPDATA, then fall back to other variables
        if 'APPDATA' in os.environ:
            listdir = os.path.join(os.environ['APPDATA'], 'Python')
        elif 'USERPROFILE' in os.environ:
            listdir = os.path.join(os.environ['USERPROFILE'], 'Application Data', 'Python')
        elif 'HOMEDIR' in os.environ and 'HOMEPATH' in os.environ:
            listdir = os.path.join(os.environ['HOMEDIR'], os.environ['HOMEPATH'], 'Python')
        else:
            listdir = os.path.join(os.path.expanduser("~"), 'Python')
    else:
        # pretty much has to be unix
        listdir = os.path.expanduser(os.path.join("~", ".python"))

From guido at python.org Fri May 2 21:56:35 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 2 May 2008 12:56:35 -0700
Subject: [Python-3000] Special offer! Ten code reviews
Message-ID:

I'd like to get some more people trying out codereview.appspot.com, so
I'm offering to review, by Monday, the first 10 new patches that people
submit there for my review.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org Fri May 2 22:14:52 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 03 May 2008 05:14:52 +0900
Subject: [Python-3000] ~/.local [was: Reminder: last alphas next Wednesday 07-May-2008]
In-Reply-To: <08May2.110318pdt."58696"@synergy1.parc.xerox.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <481AD11C.4020806@cheimes.de> <08May2.110318pdt."58696"@synergy1.parc.xerox.com>
Message-ID: <87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp>

Bill Janssen replied to Christian Heimes as follows::

> > I slightly prefer ~/.local/ over other suggestions
> > but I'm also open to ~/.python.d/
>
> Guido's point about it not being necessarily "local" is a good one.
Christian Heimes (I think) wrote: > Windows and Mac OS X have dedicated directories for application specific > libraries. That is ~/Library on Mac and Application Data on Windows. You're both missing the point of what's wanted here, I suspect. I can't speak for others, but I do want "~/.local" and I agree with the uses Glyph suggests for it. I grant that "local" may not be a good word for it in the context of a personal system in a corporate environment, but here's how I think about it. What it means (to me in the context of Unix-y system organization) is "this is where I put stuff that I would be happy to have as part of the system I was given (by some authority: my boss, Microsoft, or Brett Cannon's stdlib PEP), but for some reason I'm not comfortable/ permitted to install it as system software." It could physically reside on the moon (given a tachyon backbone ) and unlike Mac-ish ~/Library or "Application Data" on Windows data *about me* or my use of the application *does not* go there. From janssen at parc.com Fri May 2 22:26:12 2008 From: janssen at parc.com (Bill Janssen) Date: Fri, 2 May 2008 13:26:12 PDT Subject: [Python-3000] ~/.local [was: Reminder: last alphas next Wednesday 07-May-2008] In-Reply-To: <87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <481AD11C.4020806@cheimes.de> <08May2.110318pdt."58696"@synergy1.parc.xerox.com> <87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <08May2.132618pdt."58696"@synergy1.parc.xerox.com> > What it means (to me in the context of Unix-y system organization) is > "this is where I put stuff that I would be happy to have as part of > the system I was given (by some authority: my boss, Microsoft, or > Brett Cannon's stdlib PEP), but for some reason I'm not comfortable/ > permitted to install it as system software." 
Yeah, I was just pointing out that for me, "~" ports across a number of different machines, and putting stuff specific to any particular machine in there needs more thought. For UpLib, I generate machine UUIDs from characteristics of the machine, using uuidgen, and store compiled code and other machine specific things in a subdirectory with that UUID. Otherwise, we end up trying to execute PPC compiled shared libraries on a SPARC platform, or Python 2.5 extensions with Python 2.3. Bill From solipsis at pitrou.net Fri May 2 22:32:33 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 May 2008 20:32:33 +0000 (UTC) Subject: [Python-3000] Special offer! Ten code reviews References: Message-ID: Guido van Rossum python.org> writes: > > I'd like to get some more people trying out codereview.appspot.com, so > I'm offering the first 10 people to submit a new patch there for my > review to do the review by Monday. I just tried to submit a patch using the Web form, and got a 500 Server Error... From musiccomposition at gmail.com Fri May 2 23:09:05 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 2 May 2008 16:09:05 -0500 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: Message-ID: <1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com> On Thu, May 1, 2008 at 11:41 AM, Guido van Rossum wrote: > Some of you may have seen a video recorded in November 2006 where I > showed off Mondrian, a code review tool that I was developing for > Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped > that I could release Mondrian as open source, but it was not to be: > due to its popularity inside Google, it became more and more tied to > proprietary Google infrastructure like Bigtable, and it remained > limited to Perforce, the commercial revision control system most used > at Google. I was salivating over that video, so I'm really excited be able to try out something like it now. 
> Don't hesitate to drop me a note with feedback -- note though that > there are a few known issues listed at the end of the Help page. The > Help page is really a wiki, so feel free to improve it! My request at the moment is to let people use their real names for display; my email address does not at all resemble my name. -- Cheers, Benjamin Peterson From musiccomposition at gmail.com Fri May 2 23:13:47 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 2 May 2008 16:13:47 -0500 Subject: [Python-3000] Special offer! Ten code reviews In-Reply-To: References: Message-ID: <1afaf6160805021413r3527734cna2a36f1dd6dd5204@mail.gmail.com> On Fri, May 2, 2008 at 3:32 PM, Antoine Pitrou wrote: > I just tried to submit a patch using the Web form, and got a 500 Server Error... It's been fixed. -- Cheers, Benjamin Peterson From guido at python.org Fri May 2 23:25:52 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 2 May 2008 14:25:52 -0700 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: <1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com> References: <1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com> Message-ID: On Fri, May 2, 2008 at 2:09 PM, Benjamin Peterson wrote: > On Thu, May 1, 2008 at 11:41 AM, Guido van Rossum wrote: > > Some of you may have seen a video recorded in November 2006 where I > > showed off Mondrian, a code review tool that I was developing for > > Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped > > that I could release Mondrian as open source, but it was not to be: > > due to its popularity inside Google, it became more and more tied to > > proprietary Google infrastructure like Bigtable, and it remained > > limited to Perforce, the commercial revision control system most used > > at Google. > > I was salivating over that video, so I'm really excited be able to try > out something like it now. 
> > > > Don't hesitate to drop me a note with feedback -- note though that > > there are a few known issues listed at the end of the Help page. The > > Help page is really a wiki, so feel free to improve it! > > My request at the moment is to let people use their real names for > display; my email address does not at all resemble my name. I've noticed. Surely there's an interesting story there. :-) The feature request is on my TODO list. The design is a bit involved, since I'd have to ask people to register and maintain a userid -> nickname mapping; the Google Account API we're piggybacking on only gives you the email address. Once it's open sourced (Monday?) I'd love to see contributions like this! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From musiccomposition at gmail.com Fri May 2 23:28:55 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 2 May 2008 16:28:55 -0500 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: <1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com> Message-ID: <1afaf6160805021428s397cd8eer425fe9712e21ede2@mail.gmail.com> On Fri, May 2, 2008 at 4:25 PM, Guido van Rossum wrote: > > My request at the moment is to let people use their real names for > > display; my email address does not at all resemble my name. > > I've noticed. Surely there's an interesting story there. :-) Maybe I tell you why next PyCon... One more question: What's the number on the upper right hand corner by my username? 
-- Cheers, Benjamin Peterson From brett at python.org Fri May 2 23:27:48 2008 From: brett at python.org (Brett Cannon) Date: Fri, 2 May 2008 14:27:48 -0700 Subject: [Python-3000] [Python-Dev] warnings.showwarning (was Re: Reminder: last alphas next Wednesday 07-May-2008) In-Reply-To: <20080502134716.6859.1256230877.divmod.quotient.58108@ohm> References: <20080502133249.6859.2057657874.divmod.quotient.58104@ohm> <20080502134716.6859.1256230877.divmod.quotient.58108@ohm> Message-ID: On Fri, May 2, 2008 at 6:47 AM, Jean-Paul Calderone wrote: [SNIP] > > Hi Brett, > > > > I'm still seeing some strange behavior from the warnings module, This > > can be observed on the community buildbot for Twisted, for example: > > > > > http://python.org/dev/buildbot/community/trunk/x86%20Ubuntu%20Hardy%20trunk/builds/171 > /step-Twisted.zope.stable/0 > > > > The log ends with basically all of the warning-related tests in Twisted > > failing, reporting that no warnings happened. > > > > Just to follow up on this part, the failures are due to the tests expecting > to be able to override a different function in the warnings module, not > showwarning (warn_explicit). We used warn_explicit because there's no way > to clear way to disable the filtering that gets applied to showwarning. > warn_explicit doesn't claim to be a public hook, so I guess I won't > complain > about this. :) > Yeah, you guys are being naughty by replacing that and expecting stuff still to work. =) > The below behavior still seems wrong to me, though. 
> > > > There is also some strange behavior that can be easily observed in the > REPL: > > > > exarkun at boson:~/Projects/python/trunk$ ./python > /home/exarkun/Projects/Divmod/trunk/Combinator/combinator/xsite.py:7: > DeprecationWarning: the sets module is deprecated > > from sets import Set > > Python 2.6a2+ (trunk:62636M, May 2 2008, 09:19:41) [GCC 4.1.3 > 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > >>> import warnings > > >>> warnings.warn("foo") > > :1: UserWarning: foo # Where'd the module name go? > > >>> def f(*a): > > ... print a > > ... > > >>> warnings.showwarning = f > > >>> warnings.warn("foo") > > >>> # Where'd the warning go? > > > > Any ideas on this? If you run this in a stock 2.5 interpreter I get something similar except the missing '__main__'. If I run it with PYTHONSTARTUP set it actually uses that module for some reason as the source. I created issue2743 to fix the output at the interpreter, but I made it a critical bug since it is only at the interpreter (and thus breaking people's code will be small), but it should still be fixed since 'warnings' is a core piece of infrastructure. -Brett -Brett From guido at python.org Fri May 2 23:39:54 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 2 May 2008 14:39:54 -0700 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: <1afaf6160805021428s397cd8eer425fe9712e21ede2@mail.gmail.com> References: <1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com> <1afaf6160805021428s397cd8eer425fe9712e21ede2@mail.gmail.com> Message-ID: On Fri, May 2, 2008 at 2:28 PM, Benjamin Peterson wrote: > One more question: What's the number on the upper right hand corner by > my username? It's a debugging counter. It gets reset each time a new service instance is created. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Sat May 3 00:33:15 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 2 May 2008 18:33:15 -0400 Subject: [Python-3000] Displaying strings containing unicode escapes References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com><20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com><20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> Message-ID: "Nick Coghlan" wrote in message news:481AF228.2080900 at gmail.com... | Terry Reedy wrote: | > I think standard Python should somehow have two options: escape everything | > but ASCII (for unambiguity and old display systems) and escape nothing that | > is potentially printable (leaving partially capable systems to fare as they | > will). In-between solutions will ultimately be programmer and system | > specific. | | If repr() is made to work as Martin suggests (i.e. only escape the | unprintable stuff), then the unicode_escape codec can be used fairly | easily to restore the 2.x escape everything non-ASCII behaviour. So print(s.encode('unicode_escape'))? Fine with me, especially if that or whatever is added to the repr() doc. From stephen at xemacs.org Sat May 3 01:42:49 2008 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sat, 03 May 2008 08:42:49 +0900 Subject: [Python-3000] ~/.local [was: Reminder: last alphas next Wednesday 07-May-2008] In-Reply-To: <08May2.132618pdt."58696"@synergy1.parc.xerox.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <481AD11C.4020806@cheimes.de> <08May2.110318pdt."58696"@synergy1.parc.xerox.com> <87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp> <08May2.132618pdt."58696"@synergy1.parc.xerox.com> Message-ID: <87iqxww9vq.fsf@uwakimon.sk.tsukuba.ac.jp> Bill Janssen writes: > Yeah, I was just pointing out that for me, "~" ports across a number > of different machines, and putting stuff specific to any particular > machine in there needs more thought. Sure. But AIUI that's not the problem that "~/.local" is intended to solve. Also, it's a generic problem of networked environments, not in any way limited to "~", which should be susceptible to the usual solutions for multiarchitecture installations (eg subdirectories named by GNU's CPU-OS-VENDOR convention, or your UUID convention). In particular, "pure Python" programs shouldn't much care, right? From janssen at parc.com Sat May 3 02:51:53 2008 From: janssen at parc.com (Bill Janssen) Date: Fri, 2 May 2008 17:51:53 PDT Subject: [Python-3000] ~/.local [was: Reminder: last alphas next Wednesday 07-May-2008] In-Reply-To: <87iqxww9vq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <481AD11C.4020806@cheimes.de> <08May2.110318pdt."58696"@synergy1.parc.xerox.com> <87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp> <08May2.132618pdt."58696"@synergy1.parc.xerox.com> <87iqxww9vq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <08May2.175159pdt."58696"@synergy1.parc.xerox.com> > In particular, "pure Python" programs shouldn't much care, right? With the addition of ctypes, "pure" Python programs aren't so pure anymore. But even that should work across architectures, right? 
> Also, it's a generic problem of networked environments, not in > any way limited to "~", which should be susceptible to the usual > solutions for multiarchitecture installations (eg subdirectories named > by GNU's CPU-OS-VENDOR convention, or your UUID convention). Yep. I'm just pointing out that networked environments are becoming more common, not less common. Bill From skip at pobox.com Sat May 3 02:03:20 2008 From: skip at pobox.com (skip at pobox.com) Date: Fri, 2 May 2008 19:03:20 -0500 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> Message-ID: <18459.43976.85481.758104@montanaro-dyndns-org.local> Fred> If user-local package installs went to ~/ by default ... with a Fred> way to set an alternate "prefix" instead of ~/ using a distutils Fred> configuration setting, I'd be happy enough. +1 from me. Skip From ishimoto at gembook.org Sat May 3 02:54:24 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sat, 3 May 2008 09:54:24 +0900 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> Message-ID: <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> On Sat, May 3, 2008 at 7:33 AM, Terry Reedy wrote: > so print(s.encode('unicode_escape)) ? > Fine with me, especially if that or whatever is added to the repr() doc. > I don't recommend repr(obj).encode('unicode_escape'), because backslash characters in the string will be escaped again by the codec. 
>>> print(repr("\\"))
'\\'
>>> print(str(repr("\\").encode("unicode-escape"), "ASCII"))
'\\\\'

The 'ASCII' codec with the 'backslashreplace' error handler works better.

>>> print(str(repr("\\").encode("ASCII", "backslashreplace"), "ASCII"))
'\\'

It looks complicated to get the same result as in Python 2.x. I originally proposed to allow print(repr('\\'), encoding="ASCII", errors="backslashreplace") to get the same result, but this is hard to implement. If requirement for ASCII-repr is popular enough, we can provide a built-in function like this:

def repr_ascii(obj):
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

2to3 can use repr_ascii() for better compatibility. Is new built-in function desirable, or just document is good enough?

From tjreedy at udel.edu Sat May 3 02:02:56 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 2 May 2008 20:02:56 -0400 Subject: [Python-3000] PEP 8 Style Guide and Python 3 Message-ID: At least one of the style recommendations in PEP 8 -- use class rather than string exceptions -- is obsolete in Py 3. There are probably others, and perhaps some where the spirit of the recommendation is the same but details are different. For a new Python 3 programmer who does not need or want to know anything about Python 2, reading about 'string exceptions' would be confusing. One possibility for isolation is for each major section to have separate 2.x and 3.x subsections. But where there are several scattered changes, this would require large chunks of duplication. For instance, under Prescriptive: Naming Conventions Package and Module Names Modules should have... becomes Modules must have ... (I presume, hence the renaming project). But all three paragraphs would have to be duplicated in 2.x and 3.x to be coherent, and then they would not be in their sensible place. A couple of paragraphs on, 'because exceptions should be classes' becomes 'because exceptions are classes'. Again, moving two variants to 2.x and 3.x sections would be awkward.
So, especially if PEP 8 is considered more or less frozen, I suggest the possibility of a new PEP 3008, Python 3 style guide. Terry Jan Reedy From martin at v.loewis.de Sat May 3 09:34:55 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 03 May 2008 09:34:55 +0200 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> Message-ID: <481C159F.9080409@v.loewis.de> > Is new built-in function desirable, or just document is good enough? Traditionally, I take the position that new built-in functions are rarely desirable; this one is no exception. Regards, Martin From ncoghlan at gmail.com Sat May 3 10:48:52 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 03 May 2008 18:48:52 +1000 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <481C159F.9080409@v.loewis.de> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> <481C159F.9080409@v.loewis.de> Message-ID: <481C26F4.1030700@gmail.com> Martin v. Löwis wrote: >> Is new built-in function desirable, or just document is good enough? > > Traditionally, I take the position that new built-in functions are > rarely desirable; this one is no exception.
I agree with that, but string.repr_ascii may be a reasonable thing to add. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sat May 3 11:05:43 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 03 May 2008 19:05:43 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <18459.43976.85481.758104@montanaro-dyndns-org.local> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> Message-ID: <481C2AE7.9010805@gmail.com> skip at pobox.com wrote: > Fred> If user-local package installs went to ~/ by default ... with a > Fred> way to set an alternate "prefix" instead of ~/ using a distutils > Fred> configuration setting, I'd be happy enough. > > +1 from me. But then we clutter up people's (read *my*) home directory with no way for them to do anything about it. We should stay out of people's way by default, while making it easy for them to poke around if they want to. The ~/.local convention does that, but using ~/ directly does not. The major reasons why I think staying out of people's way by default is important: - for people like me (glyph, Georg, etc), it allows us to keep our home directory organised the way we like it. As far as I am concerned, applications can store whatever user-specific configuration and data files they like inside hidden files or directories, but they shouldn't be inflicting any visible files on me that aren't related to things I am working on. - for novice users, the fact that it's hidden helps keep them from deleting it by accident - for experienced users (Barry, skip, etc) that want ~/.local to be more easily accessible, creating a visible ~/local symlink is an utterly trivial exercise.
Switching the default to use public directories instead of hidden ones helps the third group at the expense of the first two groups. Given that the third group already has an easy workaround to get the behaviour they want, that seems like a bad trade-off to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sat May 3 11:08:13 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 03 May 2008 19:08:13 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <08May2.110318pdt."58696"@synergy1.parc.xerox.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <481AD11C.4020806@cheimes.de> <08May2.110318pdt."58696"@synergy1.parc.xerox.com> Message-ID: <481C2B7D.7000403@gmail.com> Bill Janssen wrote: >> I slightly prefer ~/.local/ over other suggestions >> but I'm also open to ~/.python.d/ > > Guido's point about it not being necessarily "local" is a good one. I > use lots of computers; they all automount my home directory (~) from a > network file server. Nothing under that directory should be > machine-specific. My .login and .xinitrc scripts check the machine ID > and do different things on different machines. So long as the machine-specific stuff gets installed to architecture specific directories as they do under /usr/local, I don't see why this would be a problem. Cheers, Nick. 
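As a rough illustration of the "architecture specific directories" idea, here is a minimal sketch. The layout ~/.local/lib/<platform>/pythonX.Y and the helper name user_arch_dir are assumptions made up for this example, not something proposed in the thread; only sysconfig.get_platform() and os.path.expanduser() are standard library calls.

```python
import os
import sys
import sysconfig


def user_arch_dir(base="~/.local"):
    """Return a hypothetical per-platform, per-version directory under base."""
    # e.g. "linux-x86_64" or "win-amd64"; the exact string depends on the host.
    platform_tag = sysconfig.get_platform()
    # Keep extensions built for different Python versions apart as well.
    version_tag = "python%d.%d" % sys.version_info[:2]
    return os.path.join(os.path.expanduser(base), "lib", platform_tag, version_tag)


print(user_arch_dir())
```

With a shared home directory, each machine type would then resolve to its own subdirectory, so compiled code for one platform is never picked up on another.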
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Sat May 3 18:07:03 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 3 May 2008 09:07:03 -0700 Subject: [Python-3000] PEP 8 Style Guide and Python 3 In-Reply-To: References: Message-ID: I'd much rather stick with a single style guide; PEP 8 can be revised as needed. I suggest that we preface the 2.x-specific things with words like "in Python 2, ..." but by and large focus the style guide on Py3k. We could even migrate the rules that are only relevant to 2.x to an Appendix-like chapter. That can then be easily deleted at some point in the future. On Fri, May 2, 2008 at 5:02 PM, Terry Reedy wrote: > At least one of the style recommendations in PEP 8 -- use class rather that > string exceptions -- is obsolete in Py 3. And there are others, and > perhaps others where the spirit of the recommendation is the same but > details are different. > > For a new Python 3 programmer who does not need or want to know anything > about Python 2, reading about 'string exceptions' would be confusing. > > One possibility for isolation is for each major section to have separate > 2.x and 3.x subsections. But where there are several scattered changes, > this would require large chunks of duplication. For instance, under > > Prescriptive: Naming Conventions > Package and Module Names > Modules should have... > > becomes Modules must have ... (I presume, hence the renaming project). > But all three paragraphs would have to be duplicated in 2.x and 3.x to be > coherent, and then they would not be in their sensible place. > > A couple of paragraphs on, 'because exceptions should be classes' becomes > 'because exceptions are classes'. Again, moving two variants to 2.x and > 3.x sections would be awkward. 
> > So, especially if PEP 8 is considered more or less frozen, I suggest the > possibility of a new PEP 3008, Python 3 style guide. > > Terry Jan Reedy > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sat May 3 13:51:40 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 3 May 2008 06:51:40 -0500 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481C2AE7.9010805@gmail.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> Message-ID: <18460.20940.882777.235301@montanaro-dyndns-org.local> Nick> skip at pobox.com wrote: Fred> If user-local package installs went to ~/ by default ... with a Fred> way to set an alternate "prefix" instead of ~/ using a distutils Fred> configuration setting, I'd be happy enough. Skip> +1 from me. Nick> But then we clutter up people's (read *my*) home directory with no Nick> way for them to do anything about it. Fred asked for a --prefix flag (which is what I was voting on). I don't really care what you do by default as long as you give me a way to do it differently. 
Skip From skip at pobox.com Sat May 3 17:08:32 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 3 May 2008 10:08:32 -0500 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> Message-ID: <18460.32752.462983.25145@montanaro-dyndns-org.local> >> - for experienced users (Barry, skip, etc) that want ~/.local to be >> more easily accessible, creating a visible ~/local symlink is an >> utterly trivial exercise. Barry> Hey Nick, I agree with everything above, except that I'd probably Barry> put myself more in Glyph's camp :). Can't speak for Skip Barry> though... I already install everything in ~/local and just have ~/local/bin in my PATH. If I lived in a truly platform-dependent world I'd add platform-dependent ~/local-plat1, ~/local-plat2, etc directories and extend PATH a bit more. Skip From stephen at xemacs.org Sat May 3 13:02:14 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 03 May 2008 20:02:14 +0900 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <481C26F4.1030700@gmail.com> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> <481C159F.9080409@v.loewis.de> <481C26F4.1030700@gmail.com> Message-ID: <87ej8jwszt.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Martin v. Löwis wrote: > >> Is new built-in function desirable, or just document is good enough?
> > > > Traditionally, I take the position that new built-in functions are > > rarely desirable; this one is no exception. > > I agree with that, but string.repr_ascii may be a reasonable thing to add. But this is basically completely a codec issue. We have an internal representation, and we want to translate it in a stream-oriented way to an external representation. Unless there's an efficiency issue, why not just provide a hook for a codec? From barry at python.org Sat May 3 15:10:30 2008 From: barry at python.org (Barry Warsaw) Date: Sat, 3 May 2008 09:10:30 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481C2AE7.9010805@gmail.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 3, 2008, at 5:05 AM, Nick Coghlan wrote: > > The major reasons why I think staying out of people's way by default > is important: > - for people like me (glyph, Georg, etc), it allows us to keep our > home directory organised the way we like it. As far as I am > concered, applications can store whatever user-specific > configuration and data files they like inside hidden files or > directories, but they shouldn't be inflicting any visible files on > me that aren't related to things I am working on. > - for novice users, the fact that it's hidden helps keep them from > deleting it by accident > - for experienced users (Barry, skip, etc) that want ~/.local to be > more easily accessible, creating a visible ~/local symlink is an > utterly trivial exercise. Hey Nick, I agree with everything above, except that I'd probably put myself more in Glyph's camp :). Can't speak for Skip though... 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSBxkSXEjvBPtnXfVAQKSigP/d6HIeQ5QLZR4QZ7GAIttb0d+8JI6PM0e 3E2+br0jZ9IeDwjjCLIAx1kbfgIX56++NGoU7tQqiQtbcapI3H3Vb+X+VSAcs30L ORj709MDtF2oqXSzEHww5HHeKoZiQ8/FfiaZoXrXzqPVP5k9MSZu1zLrT3rpWAUP 8YLFekz/LUA= =l5be -----END PGP SIGNATURE----- From ncoghlan at gmail.com Sat May 3 20:00:05 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 04 May 2008 04:00:05 +1000 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <87ej8jwszt.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> <481C159F.9080409@v.loewis.de> <481C26F4.1030700@gmail.com> <87ej8jwszt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <481CA825.8010606@gmail.com> Stephen J. Turnbull wrote: > Nick Coghlan writes: > > Martin v. Löwis wrote: > > >> Is new built-in function desirable, or just document is good enough? > > > > > > Traditionally, I take the position that new built-in functions are > > > rarely desirable; this one is no exception. > > > > I agree with that, but string.repr_ascii may be a reasonable thing to add. > > But this is basically completely a codec issue. We have an internal > representation, and we want to translate it in a stream-oriented way > to an external representation. Unless there's an efficiency issue, > why not just provide a hook for a codec? It would just be a convenience function to do a string to string conversion in code. I agree for an actual output stream you could just set the encoding to ASCII with backslashreplace error handling. Cheers, Nick.
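The stream-level approach mentioned above might look roughly like this in Python 3. It is only a sketch: the in-memory io.BytesIO buffer is an assumption standing in for whatever real output stream is in use, and the variable names are illustrative.

```python
import io

# An in-memory byte buffer stands in for a real output stream such as a file.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")

# repr() keeps printable non-ASCII characters; the stream escapes them on write.
stream.write(repr("caf\u00e9 \u3042"))
stream.flush()

escaped = raw.getvalue().decode("ascii")
print(escaped)
```

Nothing in the calling code has to know about escaping here; the error handler on the stream decides how non-ASCII characters are written out.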
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sat May 3 20:13:08 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 04 May 2008 04:13:08 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> Message-ID: <481CAB34.6040809@gmail.com> Barry Warsaw wrote: > On May 3, 2008, at 5:05 AM, Nick Coghlan wrote: >> - for experienced users (Barry, skip, etc) that want ~/.local to be >> more easily accessible, creating a visible ~/local symlink is an >> utterly trivial exercise. > > Hey Nick, I agree with everything above, except that I'd probably put > myself more in Glyph's camp :). Can't speak for Skip though... I was actually looking at something Fred wrote and managed to misread it as something you had posted - and it turns out Skip was just agreeing with Fred about the 'provide an option to tell distutils to use a different user-specific directory name than the default one' idea, and isn't particularly worried about where the packages go by default. Sorry for the confusion. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From phd at phd.pp.ru Sat May 3 22:02:14 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sun, 4 May 2008 00:02:14 +0400 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> References: <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> Message-ID: <20080503200214.GA32314@phd.pp.ru> On Sat, May 03, 2008 at 09:54:24AM +0900, Atsuo Ishimoto wrote: > If requirement for ASCII-repr is popular enough, we can provide a > built-in function like this: > > def repr_ascii(obj): > return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII") It is hard to apply the function for repr(container). repr(container).encode("unicode_escape") is the only way (at least I don't see any other way). Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
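To make the container case concrete, here is a small comparison of the two approaches discussed in this thread, written for Python 3. The sample list is made up for illustration; both codecs and the "backslashreplace" error handler are standard.

```python
import ast

container = ["caf\u00e9", "C:\\temp"]
text = repr(container)

# unicode_escape escapes non-ASCII characters, but it also escapes the
# backslashes that repr() already produced a second time.
via_unicode_escape = text.encode("unicode_escape").decode("ascii")

# The ASCII codec with backslashreplace only touches non-ASCII characters.
via_backslashreplace = text.encode("ascii", "backslashreplace").decode("ascii")

print(text)
print(via_unicode_escape)
print(via_backslashreplace)

# Only the backslashreplace variant still evaluates back to the original list.
print(ast.literal_eval(via_backslashreplace) == container)
print(ast.literal_eval(via_unicode_escape) == container)
```

The doubled backslashes in the unicode_escape output are the same effect Atsuo showed for a single string earlier in the thread, just applied to every element of the container.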
From martin at v.loewis.de Sat May 3 22:20:43 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 03 May 2008 22:20:43 +0200 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <20080503200214.GA32314@phd.pp.ru> References: <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com> <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> <20080503200214.GA32314@phd.pp.ru> Message-ID: <481CC91B.307@v.loewis.de> > On Sat, May 03, 2008 at 09:54:24AM +0900, Atsuo Ishimoto wrote: >> If requirement for ASCII-repr is popular enough, we can provide a >> built-in function like this: >> >> def repr_ascii(obj): >> return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII") > > It is hard to apply the function for repr(container). > repr(container).encode("unicode_escape") is the only way (at least I don't > see any other way). I think Atsuo envisioned you to invoke "repr_ascii(container)". Regards, Martin From phd at phd.pp.ru Sat May 3 22:36:17 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sun, 4 May 2008 00:36:17 +0400 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <481CC91B.307@v.loewis.de> References: <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> <20080503200214.GA32314@phd.pp.ru> <481CC91B.307@v.loewis.de> Message-ID: <20080503203617.GA1658@phd.pp.ru> On Sat, May 03, 2008 at 10:20:43PM +0200, "Martin v. 
Löwis" wrote: > > On Sat, May 03, 2008 at 09:54:24AM +0900, Atsuo Ishimoto wrote: > >> If requirement for ASCII-repr is popular enough, we can provide a > >> built-in function like this: > >> > >> def repr_ascii(obj): > >> return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII") > > > > It is hard to apply the function for repr(container). > > repr(container).encode("unicode_escape") is the only way (at least I don't > > see any other way). > > I think Atsuo envisioned you to invoke "repr_ascii(container)". Who knows what are string representations of the objects in container; there is a chance .encode() after repr() will escape or unescape the result in a wrong way. I do not insist on anything (I think printable repr() and repr().encode("unicode_escape") satisfy my needs) so I'm just pointing there could be a problem; don't know how important it is. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From martin at v.loewis.de Sat May 3 22:57:06 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 03 May 2008 22:57:06 +0200 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <20080503203617.GA1658@phd.pp.ru> References: <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> <20080503200214.GA32314@phd.pp.ru> <481CC91B.307@v.loewis.de> <20080503203617.GA1658@phd.pp.ru> Message-ID: <481CD1A2.2030105@v.loewis.de> >>>> def repr_ascii(obj): >>>> return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII") >>> It is hard to apply the function for repr(container). >>> repr(container).encode("unicode_escape") is the only way (at least I don't >>> see any other way). >> I think Atsuo envisioned you to invoke "repr_ascii(container)". 
> > Who knows what are string representations of the objects in container; I know: it's a Unicode object. > there is a chance .encode() after repr() will escape or unescape the result > in a wrong way. No, there is no such chance. > I do not insist on anything (I think printable repr() and > repr().encode("unicode_escape") satisfy my needs) so I'm just pointing > there could be a problem; don't know how important it is. I don't think there is one (except that any repr_ascii function should also *decode* its result back into a string before returning it). Regards, Martin From phd at phd.pp.ru Sat May 3 23:09:00 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sun, 4 May 2008 01:09:00 +0400 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <481CD1A2.2030105@v.loewis.de> References: <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp> <4819F21D.8070808@v.loewis.de> <481AF228.2080900@gmail.com> <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com> <20080503200214.GA32314@phd.pp.ru> <481CC91B.307@v.loewis.de> <20080503203617.GA1658@phd.pp.ru> <481CD1A2.2030105@v.loewis.de> Message-ID: <20080503210900.GB1658@phd.pp.ru> On Sat, May 03, 2008 at 10:57:06PM +0200, "Martin v. Löwis" wrote: > > there is a chance .encode() after repr() will escape or unescape the result > > in a wrong way. > > No, there is no such chance. Ok, then. Probably I was wrong. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
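The point that repr() always yields a string, so the whole result can be escaped safely even for containers, can be sketched as follows. This assumes a released Python 3 interpreter; repr_ascii follows Atsuo's suggestion, spelled with .decode() so that it returns a string again as Martin asks, and the sample container is made up for illustration:

```python
def repr_ascii(obj):
    # repr() returns a str for any object; non-ASCII characters in it are
    # replaced by backslash escapes, then the bytes are decoded back to str.
    return repr(obj).encode("ascii", "backslashreplace").decode("ascii")


# A nested container holding non-ASCII strings at several levels.
container = ["caf\u00e9", {"\u03b1": "\U0001f600"}]

escaped = repr_ascii(container)
print(escaped)

# The escaped text is still a valid literal for the same container.
print(eval(escaped) == container)
```

For this input the result matches what the built-in ascii() of released Python 3 versions returns.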
From fdrake at acm.org Sun May 4 01:34:03 2008 From: fdrake at acm.org (Fred Drake) Date: Sat, 3 May 2008 19:34:03 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <18460.20940.882777.235301@montanaro-dyndns-org.local> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> Message-ID: <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> On May 3, 2008, at 7:51 AM, skip at pobox.com wrote: > Fred asked for a --prefix flag (which is what I was voting on). I > don't > really care what you do by default as long as you give me a way to > do it > differently. What's most interesting (to me) is that no one's commented on my note that my preferred approach would be that there's no default at all; the location would have to be specified explicitly. Whether on the command line or in the distutils configuration doesn't matter, but explicitness should be required. -Fred -- Fred Drake From ncoghlan at gmail.com Sun May 4 06:50:45 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 04 May 2008 14:50:45 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> Message-ID: <481D40A5.1050905@gmail.com> Fred Drake wrote: > On May 3, 2008, at 7:51 AM, skip at pobox.com wrote: >> Fred asked for a --prefix flag (which is what I was voting on). 
I don't >> really care what you do by default as long as you give me a way to do it >> differently. > > What's most interesting (to me) is that no one's commented on my note > that my preferred approach would be that there's no default at all; the > location would have to be specified explicitly. Whether on the command > line or in the distutils configuration doesn't matter, but explicitness > should be required. I thought Christian said something about that defeating one of the main points of the PEP - to allow per-user installation of modules to "just work" for non-administrators. (It may not have been Christian, and it may not have been directly in response to you, but I'm pretty sure I read it somewhere in this thread ;) Anyway, a per-user site-packages directory only "just works" if the standard behaviour of a Python installation is to provide access to the per-user packages without requiring any additional action on the part of the user. A couple of paragraphs in the PEP may also be of interest to you: """For security reasons the user site directory is not added to sys.path when the effective user id or group id is not equal to the process uid / gid [9]. It's an additional barrier against code injection into suid apps. However Python suid scripts must always use the -E and -s option or users can sneak in their own code. The user site directory can be suppressed with a new option -s or the environment variable PYTHONNOUSERSITE. The feature can be disabled globally by setting site.ENABLE_USER_SITE to the value False. It must be set by editing site.py. It can't be altered in sitecustomize.py or later.""" So Python itself turns the feature off automatically for invocation via sudo and the like, and the sysadmin can disable the feature completely through site.py. Cheers, Nick. 
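The switches described in the quoted PEP text can be inspected from a running interpreter. A minimal sketch, assuming a later Python release (2.7 / 3.2 or newer) in which the site module provides getuserbase() and getusersitepackages(); the helper name user_site_report is made up for illustration:

```python
import site
import sys


def user_site_report():
    """Collect the per-user site settings described in PEP 370."""
    user_site = site.getusersitepackages()
    return {
        # True: enabled. False: disabled on request (-s or PYTHONNOUSERSITE).
        # None: disabled for security reasons or not yet determined.
        "enabled": site.ENABLE_USER_SITE,
        # Honours PYTHONUSERBASE when that variable is set.
        "user_base": site.getuserbase(),
        "user_site": user_site,
        # The directory is only added to sys.path if it actually exists.
        "on_sys_path": user_site in sys.path,
    }


report = user_site_report()
for key in sorted(report):
    print(key, "=", report[key])
```

Running it once normally and once with the -s option should show the "enabled" entry change.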
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From aahz at pythoncraft.com Sun May 4 01:25:51 2008 From: aahz at pythoncraft.com (Aahz) Date: Sat, 3 May 2008 16:25:51 -0700 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <68FCCCBB-7DFF-4157-BE40-F816CBA7AA57@python.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com> <68FCCCBB-7DFF-4157-BE40-F816CBA7AA57@python.org> Message-ID: <20080503232550.GB28577@panix.com> On Fri, May 02, 2008, Barry Warsaw wrote: > On May 2, 2008, at 1:48 AM, glyph at divmod.com wrote: >> >>In the long term, if everyone followed suit on >>~/.local, that would be great. But I don't want a ~/Python, ~/Java, >>~/Ruby, ~/PHP, ~/Perl, ~/OCaml and ~/Erlang and a $PATH as long as >>my arm just so I can run a few applications without system- >>installing them. > > I hate to send a "me too" messages, but I have to say Glyph is exactly > right here. +1 -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Help a hearing-impaired person: http://rule6.info/hearing.html From stefan_ml at behnel.de Sun May 4 14:52:42 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 04 May 2008 14:52:42 +0200 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: Message-ID: <481DB19A.20605@behnel.de> Guido van Rossum wrote: > On Thu, May 1, 2008 at 4:37 PM, Neal Becker wrote: >> It would be really nice to see support for some other backends, such as Hg >> or bzr (which are both written in python), in addition to svn. > > Once it's open source feel free to add those! 
trac supports a pretty wide set of VCSes. http://trac.edgewall.org/wiki/VersioningSystemBackend Maybe your tools could integrate these backends somehow instead of re-implementing yet another suite of VCS backend connectors. Stefan From skip at pobox.com Sun May 4 16:14:59 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 4 May 2008 09:14:59 -0500 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> Message-ID: <18461.50403.928218.622685@montanaro-dyndns-org.local> glyph> As I've said a dozen times in this thread already, the feature glyph> I'd like to get from a per-user installation location is that glyph> 'setup.py install', or at least some completely canonical glyph> distutils incantation, should work, by default, for non-root glyph> users; ideally non-administrators on windows as well as non-root glyph> users on unixish platforms. I'm unclear why anything needs changing then. At work we have idiosyncratic central install locations for everything, not just Python. None of this stuff is installed by root. When I want to install some package to test without polluting the central waters I simply run setup.py install with a --prefix arg then set PYTHONPATH to pick up my stuff before the central stuff. I see no reason to change the behavior of setup.py's install command. It gives you the flexibility needed to handle a number of different scenarios. 
Skip From jnoller at gmail.com Sun May 4 16:17:36 2008 From: jnoller at gmail.com (Jesse Noller) Date: Sun, 4 May 2008 10:17:36 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> Message-ID: <4222a8490805040717u5de66d0cw7eadf471f19fe7b8@mail.gmail.com> On Sun, May 4, 2008 at 9:58 AM, wrote: ...snip... > As I've said a dozen times in this thread already, the feature I'd like to > get from a per-user installation location is that 'setup.py install', or at > least some completely canonical distutils incantation, should work, by > default, for non-root users; ideally non-administrators on windows as well > as non-root users on unixish platforms. > This is a big +1 from me. The way I currently work around the "must be root to install stuff" on both OS/X and other Lin/Uni(xes) is via virtualenv.py and a lot of bash environment trickery. If nothing else comes out of this, I think what glyph points out is the ideal, and simplest goal. Ignoring the directory name debate, I would like to see this local "user" dir mirror the normal directory tree that packages installed from distutils/setuptools typically use, namely it should have the: lib/site-packages/ and bin/ directories, and a known parent name. One thing that could be done is pick a default name for the parent, ala ~/Python - but let users override it with an environment variable if they so desire (PYTHON_USER_DIR?) so that those who want it hidden can have it hidden, and those of us who don't, don't. 
-jesse From ncoghlan at gmail.com Sun May 4 16:22:27 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 05 May 2008 00:22:27 +1000 Subject: [Python-3000] PEP 370 (was Re: [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008) In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> Message-ID: <481DC6A3.70104@gmail.com> glyph at divmod.com wrote: > As I've said a dozen times in this thread already, the feature I'd like > to get from a per-user installation location is that 'setup.py install', > or at least some completely canonical distutils incantation, should > work, by default, for non-root users; ideally non-administrators on > windows as well as non-root users on unixish platforms. This is what I see as the goal of PEP 370 as well. Perhaps the PEP could be more explicit in spelling that out? """The primary goal of this PEP is to provide a standard mechanism allowing Python users to install distutils packages for their own use without affecting other users of the same machine, and without requiring any change to the packages themselves.""" I think the current Rationale section kind of assumes that the reader already recognises the above paragraph as the reason for the PEP. In the UNIX Notes section, the PEP should probably also state that the reason for choosing a hidden dot-file directory is that users generally aren't going to have any interest in the source files for the Python packages that they install, and that users that would prefer for the files to be visible can easily make a symbolic link to the directory. 
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From facundobatista at gmail.com Sun May 4 16:49:54 2008 From: facundobatista at gmail.com (Facundo Batista) Date: Sun, 4 May 2008 11:49:54 -0300 Subject: [Python-3000] PEP 8 Style Guide and Python 3 In-Reply-To: References: Message-ID: 2008/5/3, Guido van Rossum : > as needed. I suggest that we preface the 2.x-specific things with > words like "in Python 2, ..." but by and large focus the style guide > on Py3k. We could even migrate the rules that are only relevant to 2.x > to an Appendix-like chapter. That can then be easily deleted at some > point in the future. +1 -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From lists at cheimes.de Sun May 4 18:14:06 2008 From: lists at cheimes.de (Christian Heimes) Date: Sun, 04 May 2008 18:14:06 +0200 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> Message-ID: <481DE0CE.8010306@cheimes.de> > First, Skip, I *only* care about the default behavior. There's already > a way to do it differently: PYTHONPATH. So, Fred, I think what you're > arguing for is to drop this feature entirely. Or is there some other > use for a new way to allow users to explicitly add something to > sys.path, aside from PYTHONPATH? It seems that it would add more > complexity and I can't see what the value would be. 
PYTHONPATH is lacking one feature which is important for lots of packages and setuptools. The directories in PYTHONPATH are just added to sys.path. But setuptools requires a site package directory. Maybe a new env var PYTHONSITEPATH could solve the problem.

> As I've said a dozen times in this thread already, the feature I'd like
> to get from a per-user installation location is that 'setup.py install',
> or at least some completely canonical distutils incantation, should
> work, by default, for non-root users; ideally non-administrators on
> windows as well as non-root users on unixish platforms.

The implementation of my PEP provides a new option for install:

$ python setup.py install --user

Is it sufficient for you?

Christian

From lists at cheimes.de Sun May 4 18:19:17 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 04 May 2008 18:19:17 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481C2AE7.9010805@gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com>
Message-ID: <481DE205.1060108@cheimes.de>

Nick Coghlan wrote:
> - for experienced users (Barry, skip, etc) that want ~/.local to be more
> easily accessible, creating a visible ~/local symlink is an utterly
> trivial exercise.

Or you can set the environment variable PYTHONUSERBASE to $HOME. PYTHONUSERBASE is the root directory for user specific data:

def addusersitepackages(known_paths):
    """Add a per user site-package to sys.path

    Each user has its own python directory with site-packages in the
    home directory.

    USER_BASE is the root directory for all Python versions

    USER_SITE is the user specific site-packages directory

    USER_SITE/.. can be used for data.
    """
    global USER_BASE, USER_SITE
    env_base = os.environ.get("PYTHONUSERBASE", None)

    def joinuser(*args):
        return os.path.expanduser(os.path.join(*args))

    #if sys.platform in ('os2emx', 'riscos'):
    #    # Don't know what to put here
    #    USER_BASE = ''
    #    USER_SITE = ''
    if os.name == "nt":
        base = os.environ.get("APPDATA") or "~"
        USER_BASE = env_base if env_base else joinuser(base, "Python")
        USER_SITE = os.path.join(USER_BASE,
                                 "Python" + sys.version[0] + sys.version[2],
                                 "site-packages")
    else:
        USER_BASE = env_base if env_base else joinuser("~", ".local")
        USER_SITE = os.path.join(USER_BASE, "lib",
                                 "python" + sys.version[:3],
                                 "site-packages")

    if os.path.isdir(USER_SITE):
        addsitedir(USER_SITE, known_paths)
    return known_paths

Christian

From lists at cheimes.de Sun May 4 18:24:38 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 04 May 2008 18:24:38 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <4222a8490805040717u5de66d0cw7eadf471f19fe7b8@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> <4222a8490805040717u5de66d0cw7eadf471f19fe7b8@mail.gmail.com>
Message-ID: <481DE346.2040608@cheimes.de>

Jesse Noller wrote:
> One thing that could be done is pick a default name for the parent,
> ala ~/Python - but let users override it with an environment variable
> if they so desire (PYTHON_USER_DIR?) so that those who want it hidden
> can have it hidden, and those of us who don't, don't.

Has anybody read my PEP or do I need a Christian's English to real English converter?
*g*

From my PEP 370:
---
The path to the user base directory can be overwritten with the environment variable PYTHONUSERBASE. The default location is used when PYTHONUSERBASE is not set or empty.
---

PYTHONUSERBASE defaults to ~/.local/ on Unix. In order to install packages in ~/lib, ~/bin etc directly you can do

export PYTHONUSERBASE=$HOME

in your .bashrc or .profile.

Christian

From jimjjewett at gmail.com Sun May 4 20:01:08 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 4 May 2008 14:01:08 -0400
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <4819F9E7.9040706@v.loewis.de>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de> <4819F9E7.9040706@v.loewis.de>
Message-ID:

On 5/1/08, "Martin v. Löwis" wrote:
> - escaping looks like this:
> * \r, \n, \t, \\
> * \xXX for characters from Latin-1
> * \uXXXX for characters from the BMP
> * \U00XXXXXX for anything else
> What I didn't have in my original proposal was escaping of Zs
> except for space, which then would also escape NBSP, EN QUAD,
> EM QUAD, THIN SPACE, HAIR SPACE, OGHAM SPACE MARK, etc. Escaping
> them is fine also. Also, I didn't consider surrogate pairs in
> UCS-2 builds originally; they should (of course) get represented
> as-is.

I realize that this is the traditional escape form, but I wonder if it might be better to just use the character names instead of the hex character codes. The names can be written in ASCII, they are unambiguous, and they are easier to understand than a random hex value.
-jJ From lists at cheimes.de Sun May 4 21:57:26 2008 From: lists at cheimes.de (Christian Heimes) Date: Sun, 04 May 2008 21:57:26 +0200 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481DE4CD.7070401@egenix.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> <481DE0CE.8010306@cheimes.de> <481DE4CD.7070401@egenix.com> Message-ID: <481E1526.6000903@cheimes.de> M.-A. Lemburg schrieb: >> PYTHONPATH is lacking one feature which is important for lots of >> packages and setuptools. The directories in PYTHONPATH are just added to >> sys.path. But setuptools require a site package directory. Maybe a new >> env var PYTHONSITEPATH could solve the problem. > > We don't need another setup variable for this. Just place a > well-known module into the site-packages/ directory and then > query it's __file__ attribute, e.g. > > site-packages/site_packages.py > > The module could even include a few helpers to query various > settings which apply to the site packages directory, e.g. > > site_packages.get_dir() > site_packages.list_packages() > site_packages.list_modules() > etc. I don't see how it is going to solve the use case "Add another site package directory when I don't have write access to the global site package directory and I don't want to modify my apps." > Just in case you don't know... > > python setup.py install --home=~ > > will install to ~/lib/python > > The problem is not getting the packages installed in a non-admin > location. It's about Python looking in a non-admin location per > default (as well as in the site-packages location). I know the --home option. 
For one, the --home option is Unix-only and not supported on Windows. Also, the --user option takes all options of my PEP 370 user site directory into account, including the PYTHONUSERBASE env var. Christian From lists at cheimes.de Sun May 4 21:59:51 2008 From: lists at cheimes.de (Christian Heimes) Date: Sun, 04 May 2008 21:59:51 +0200 Subject: [Python-3000] PEP 370 (was Re: [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008) In-Reply-To: <481DC6A3.70104@gmail.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> <481DC6A3.70104@gmail.com> Message-ID: <481E15B7.9060003@cheimes.de> Nick Coghlan wrote: > This is what I see as the goal of PEP 370 as well. Perhaps the PEP could > be more explicit in spelling that out? > > """The primary goal of this PEP is to provide a standard mechanism > allowing Python users to install distutils packages for their own use > without affecting other users of the same machine, and without requiring > any change to the packages themselves.""" > > I think the current Rationale section kind of assumes that the reader > already recognises the above paragraph as the reason for the PEP. Good point ;) The author of the PEP was kinda sure all readers would recognize the rationale. Again explicit is better than implicit. I'll update the PEP later. 
> In the UNIX Notes section, the PEP should probably also state that the > reason for choosing a hidden dot-file directory is that users generally > aren't going to have any interest in the source files for the Python > packages that they install, and that users that would prefer for the > files to be visible can easily make a symbolic link to the directory. Good point, too. Thanks Nick! Christian From stephen at xemacs.org Sun May 4 23:02:56 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 05 May 2008 06:02:56 +0900 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de> <4819F9E7.9040706@v.loewis.de> Message-ID: <874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > I realize that this is the traditional escape form, but I wonder if it > might be better to just use the character names instead of the hex > character codes. That would require changing the parser, no? Of all types, string had better roundtrip through repr()! From ncoghlan at gmail.com Mon May 5 04:22:28 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 05 May 2008 12:22:28 +1000 Subject: [Python-3000] Displaying strings containing unicode escapes In-Reply-To: <874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de> <4819F9E7.9040706@v.loewis.de> <874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <481E6F64.80902@gmail.com> Stephen J. 
Turnbull wrote:
> Jim Jewett writes:
>
> > I realize that this is the traditional escape form, but I wonder if it
> > might be better to just use the character names instead of the hex
> > character codes.
>
> That would require changing the parser, no? Of all types, string had
> better roundtrip through repr()!

The string parser has understood Unicode names for quite some time (examples use 2.5.1):

>>> print u"\N{GREEK SMALL LETTER ALPHA}"
α
>>> print u"\N{GREEK CAPITAL LETTER ALPHA}"
Α
>>> print u"\N{GREEK CAPITAL LETTER ALPHA WITH TONOS}"
Ά

Using the names gets fairly verbose compared to the hex escapes though:

>>> u"\N{GREEK SMALL LETTER ALPHA}"
u'\u03b1'
>>> u"\N{GREEK CAPITAL LETTER ALPHA}"
u'\u0391'
>>> u"\N{GREEK CAPITAL LETTER ALPHA WITH TONOS}"
u'\u0386'

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From guido at python.org Mon May 5 05:38:04 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 4 May 2008 20:38:04 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To:
References:
Message-ID:

This code is now open source! Browse it here:

http://code.google.com/p/rietveld/source/browse

--Guido

On Thu, May 1, 2008 at 9:41 AM, Guido van Rossum wrote:
> Some of you may have seen a video recorded in November 2006 where I
> showed off Mondrian, a code review tool that I was developing for
> Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped
> that I could release Mondrian as open source, but it was not to be:
> due to its popularity inside Google, it became more and more tied to
> proprietary Google infrastructure like Bigtable, and it remained
> limited to Perforce, the commercial revision control system most used
> at Google.
> > What I'm announcing now is the next best thing: an code review tool > for use with Subversion, inspired by Mondrian and (soon to be) > released as open source. Some of the code is even directly derived > from Mondrian. Most of the code is new though, written using Django > and running on Google App Engine. > > I'm inviting the Python developer community to try out the tool on the > web for code reviews. I've added a few code reviews already, but I'm > hoping that more developers will upload at least one patch for review > and invite a reviewer to try it out. > > To try it out, go here: > > http://codereview.appspot.com > > Please use the Help link in the top right to read more on how to use > the app. Please sign in using your Google Account (either a Gmail > address or a non-Gmail address registered with Google) to interact > more with the app (you need to be signed in to create new issues and > to add comments to existing issues). > > Don't hesitate to drop me a note with feedback -- note though that > there are a few known issues listed at the end of the Help page. The > Help page is really a wiki, so feel free to improve it! > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 5 05:42:15 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 4 May 2008 20:42:15 -0700 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: References: Message-ID: I forgot -- you need to link or copy the 'django' directory from Django 0.97.pre into the app directory. Otherwise you'll be using the Django 0.96.1 that's included with the AppEngine runtime, and the code is not compatible with that version. On Sun, May 4, 2008 at 8:38 PM, Guido van Rossum wrote: > This code is now open source! 
Browse it here: > > http://code.google.com/p/rietveld/source/browse > > --Guido > > > > On Thu, May 1, 2008 at 9:41 AM, Guido van Rossum wrote: > > Some of you may have seen a video recorded in November 2006 where I > > showed off Mondrian, a code review tool that I was developing for > > Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped > > that I could release Mondrian as open source, but it was not to be: > > due to its popularity inside Google, it became more and more tied to > > proprietary Google infrastructure like Bigtable, and it remained > > limited to Perforce, the commercial revision control system most used > > at Google. > > > > What I'm announcing now is the next best thing: an code review tool > > for use with Subversion, inspired by Mondrian and (soon to be) > > released as open source. Some of the code is even directly derived > > from Mondrian. Most of the code is new though, written using Django > > and running on Google App Engine. > > > > I'm inviting the Python developer community to try out the tool on the > > web for code reviews. I've added a few code reviews already, but I'm > > hoping that more developers will upload at least one patch for review > > and invite a reviewer to try it out. > > > > To try it out, go here: > > > > http://codereview.appspot.com > > > > Please use the Help link in the top right to read more on how to use > > the app. Please sign in using your Google Account (either a Gmail > > address or a non-Gmail address registered with Google) to interact > > more with the app (you need to be signed in to create new issues and > > to add comments to existing issues). > > > > Don't hesitate to drop me a note with feedback -- note though that > > there are a few known issues listed at the end of the Help page. The > > Help page is really a wiki, so feel free to improve it! 
> > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From schmir at gmail.com Mon May 5 13:28:06 2008 From: schmir at gmail.com (Ralf Schmitt) Date: Mon, 5 May 2008 13:28:06 +0200 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> Message-ID: <932f8baf0805050428t2cd31b00x9c60d8ef43b5828c@mail.gmail.com> On Thu, May 1, 2008 at 10:26 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > This is a reminder that the LAST planned alpha releases of Python 2.6 and > 3.0 are scheduled for next Wednesday, 07-May-2008. Please be diligent over > the next week so that none of your changes break Python. The stable > buildbots look moderately okay, let's see what we can do about getting them > all green: > > http://www.python.org/dev/buildbot/stable/ > > We have a few showstopper bugs, and I will be looking at these more > carefully starting next week. > > > http://bugs.python.org/issue?@columns=title,id,activity,versions,status&@sort=activity&@filter=priority,status&@pagesize=50&@startwith=0&priority=1&status=1&@dispname=Showstoppers > running the testsuite segfaults on my 64 bit debian testing in test_pyexpat. This does not happen in a debug build: test_pyclbr test_pydoc test_pyexpat Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x2b573851a6e0 (LWP 19486)] 0x0000000000000000 in ?? () (gdb) bt #0 0x0000000000000000 in ?? 
()
#1 0x00002aaaaf694f0a in doContent (parser=0x1b4bab0, startTagLevel=0, enc=0x1b4dba0, s=0x1b4cb3c "", ' ' , "frozenset([frozenset([2]),\n", ' ' , "frozenset([0,\n", ' ' ..., end=0x1b4cb40 ' ' , "frozenset([frozenset([2]),\n", ' ' , "frozenset([0,\n", ' ' ..., nextPtr=0x1b4bae0, haveMore=1 '\001')
    at extensions/expat/lib/xmlparse.c:2540
#2 0x00002aaaaf6972ee in contentProcessor (parser=0x1b4bab0, start=0x0, end=0x1b4c470 " q\001", endPtr=0x0)
    at extensions/expat/lib/xmlparse.c:2003
#3 0x00002aaaaf698662 in doProlog (parser=0x1b4bab0, enc=0x1b4dba0, s=0x1b4c738 "", 'a' ..., end=0x1b4cb40 ' ' , "frozenset([frozenset([2]),\n", ' ' , "frozenset([0,\n", ' ' ..., tok=29, next=0x1b4c738 "", 'a' ..., nextPtr=0x1b4bae0, haveMore=1 '\001')
    at extensions/expat/lib/xmlparse.c:3803
#4 0x00002aaaaf69adc3 in prologInitProcessor (parser=0x1b4bab0, s=0x1b4c710 "", 'a' ..., end=0x1b4cb40 ' ' , "frozenset([frozenset([2]),\n", ' ' , "frozenset([0,\n", ' ' ..., nextPtr=0x1b4bae0)
    at extensions/expat/lib/xmlparse.c:3551
#5 0x00002aaaaf68cc61 in XML_ParseBuffer (parser=0x1d20670, len=28625724, isFinal=0)
    at extensions/expat/lib/xmlparse.c:1562
#6 0x00002aaaaf689467 in xmlparse_Parse (self=0x1d20670, args=)
    at extensions/pyexpat.c:922
#7 0x0000000000419b9d in PyObject_Call (func=0x1a52b48, arg=0x2b7a710, kw=0x1b4c610)
    at Objects/abstract.c:2490
#8 0x00000000004902f8 in PyEval_EvalFrameEx (f=0x8cba40, throwflag=)
    at Python/ceval.c:3944
#9 0x0000000000494824 in PyEval_EvalCodeEx (co=0x2b5738764558, globals=, locals=, args=0x23aa130, argcount=4, kws=0x23aa150, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:2908
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From martin at v.loewis.de Mon May 5 18:30:30 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 05 May 2008 18:30:30 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <481E6F64.80902@gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp> <4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com> <4807C3C1.6010602@v.loewis.de> <4819F9E7.9040706@v.loewis.de> <874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp> <481E6F64.80902@gmail.com>
Message-ID: <481F3626.5010207@v.loewis.de>

> Using the names gets fairly verbose compared to the hex escapes though:
>
>>>> u"\N{GREEK SMALL LETTER ALPHA}"
> u'\u03b1'
>>>> u"\N{GREEK CAPITAL LETTER ALPHA}"
> u'\u0391'
>>>> u"\N{GREEK CAPITAL LETTER ALPHA WITH TONOS}"
> u'\u0386'

The extreme case (in Python 2.5) is

py> u"\N{ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"
u'\ufbf9'

Regards,
Martin

From martin at v.loewis.de Mon May 5 18:46:32 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 05 May 2008 18:46:32 +0200
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To:
References:
Message-ID: <481F39E8.2010904@v.loewis.de>

> This code is now open source! Browse it here:
>
> http://code.google.com/p/rietveld/source/browse

Are you also going to call it Rietveld then? Sounds better
to me than "the open source code review tool".

Regards,
Martin

From guido at python.org Mon May 5 19:24:56 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 5 May 2008 10:24:56 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <481F39E8.2010904@v.loewis.de>
References: <481F39E8.2010904@v.loewis.de>
Message-ID:

On Mon, May 5, 2008 at 9:46 AM, "Martin v. Löwis" wrote:
> > This code is now open source!
Browse it here: > > > > http://code.google.com/p/rietveld/source/browse > > Are you also going to call it Rietveld then? Sounds better > to me than "the open source code review tool". I've been reluctant to use the Rietveld name too much since Americans can't spell it. :-) But the open source project *is* called Rietveld, so I suppose I should start using that name... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Mon May 5 19:32:43 2008 From: skip at pobox.com (skip at pobox.com) Date: Mon, 5 May 2008 12:32:43 -0500 Subject: [Python-3000] [Python-Dev] Invitation to try out open source code review tool In-Reply-To: References: <481F39E8.2010904@v.loewis.de> Message-ID: <18463.17595.324578.284849@montanaro-dyndns-org.local> Guido> I've been reluctant to use the Rietveld name too much since Guido> Americans can't spell it. :-) But the open source project *is* Guido> called Rietveld, so I suppose I should start using that name... Which reminds me... What's it mean? All I saw was a Dutch city and (maybe?) a Dutch architect by that name. Skip From guido at python.org Mon May 5 19:33:57 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 5 May 2008 10:33:57 -0700 Subject: [Python-3000] [Python-Dev] Invitation to try out open source code review tool In-Reply-To: <18463.17595.324578.284849@montanaro-dyndns-org.local> References: <481F39E8.2010904@v.loewis.de> <18463.17595.324578.284849@montanaro-dyndns-org.local> Message-ID: On Mon, May 5, 2008 at 10:32 AM, wrote: > > Guido> I've been reluctant to use the Rietveld name too much since > Guido> Americans can't spell it. :-) But the open source project *is* > Guido> called Rietveld, so I suppose I should start using that name... > > Which reminds me... What's it mean? All I saw was a Dutch city and > (maybe?) a Dutch architect by that name. 
> > Skip > http://code.google.com/p/rietveld/wiki/CodeReviewBackground -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 6 00:33:40 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 5 May 2008 15:33:40 -0700 Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: Message-ID: On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon wrote: > After two false starts over the YEARS of trying to cleanup and > reorganize the stdlib, creating a SIG to get this going, having Guido > give the PEP the once-over over the past several days, and creating > two new bugs reports (issues 2715 and 2716), PEP 3108 is finally ready > for public vetting! I've accepted this PEP. Everyone, get to work on implementing this! I'm sure some small nits will come up during the work that nobody anticipated during the PEP discussion. In that case, let's be flexible and work to update the PEP with the best possible solution. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Tue May 6 01:20:51 2008 From: brett at python.org (Brett Cannon) Date: Mon, 5 May 2008 16:20:51 -0700 Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: Message-ID: On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum wrote: > On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon wrote: > > After two false starts over the YEARS of trying to cleanup and > > reorganize the stdlib, creating a SIG to get this going, having Guido > > give the PEP the once-over over the past several days, and creating > > two new bugs reports (issues 2715 and 2716), PEP 3108 is finally ready > > for public vetting! > > I've accepted this PEP. Woohoo! > Everyone, get to work on implementing this! > I'm sure some small nits will come up during the work that nobody > anticipated during the PEP discussion. In that case, let's be flexible > and work to update the PEP with the best possible solution. 
And use the PEP to keep track of what state everything is in!
Hopefully I will start work on this tonight or tomorrow.

-Brett

From musiccomposition at gmail.com Tue May 6 02:03:35 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Mon, 5 May 2008 19:03:35 -0500
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To:
References:
Message-ID: <1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com>

On Mon, May 5, 2008 at 6:20 PM, Brett Cannon wrote:
> On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum wrote:
> >
> > I've accepted this PEP.
>
> Woohoo!

Congrats!

>
> > Everyone, get to work on implementing this!
> > I'm sure some small nits will come up during the work that nobody
> > anticipated during the PEP discussion. In that case, let's be flexible
> > and work to update the PEP with the best possible solution.
>
> And use the PEP to keep track of what state everything is in!
> Hopefully I will start work on this tonight or tomorrow.

What can I/we do to help?

--
Cheers,
Benjamin Peterson

From ishimoto at gembook.org Tue May 6 02:56:24 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Tue, 6 May 2008 09:56:24 +0900
Subject: [Python-3000] PEP 3108 - String representation in Python 3000
Message-ID: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>

I've written a PEP for new string representation in Python 3000.

Patch is updated at http://bugs.python.org/issue2630, and Guido
updated a patch to Rietveld:
http://codereview.appspot.com/767 .

I would appreciate your comments and help.

-----------------------------------------------
PEP: 3138
Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-May-2008
Post-History:


Abstract
========

This PEP proposes a new string representation form for Python 3000.
In Python prior to Python 3000, the repr() built-in function converts
arbitrary objects to printable ASCII strings for debugging and logging.
For Python 3000, a wider range of characters, based on the Unicode
standard, should be considered 'printable'.


Motivation
==========

The current repr() converts 8-bit strings to ASCII using the following
algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters (0x00-0x1f, 0x7f) and non-ASCII
  characters (>=0x80) to '\\xXX'.

- Backslash-escape quote characters (' or ") and add a quote character at
  the head and tail.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters (>=0x100) to '\\uXXXX'.

- Convert 21-bit characters (>=0x10000) and surrogate pair characters to
  '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and repr() is
used as a handy and safe way to print strings for debugging or for
logging. Although all non-ASCII characters are escaped, this does not
matter when most of the string's characters are ASCII. But for other
languages, such as Japanese where most characters in a string are not
ASCII, this is very inconvenient. Python 3000 has a lot of nice features
for non-Latin users such as non-ASCII identifiers, so it would be
helpful if Python could also progress in a similar way for printable
output.

Some users might be concerned that such output will mess up their
console if they print binary data like images. But this is unlikely to
happen in practice because bytes and strings are different types in
Python 3000, so printing an image to the console won't mess it up.

This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.
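
To make the escaping rules in the Motivation concrete, here is a minimal sketch of them. It is not the interpreter's code: ``legacy_repr`` is a hypothetical name, it is written for a Python 3 interpreter (so it iterates over code points rather than 16-bit units), and it assumes the result is always wrapped in single quotes.

```python
def legacy_repr(s):
    """Simplified sketch of the ASCII-only escaping described above.

    Assumption: the string is always wrapped in single quotes; the real
    repr() may pick double quotes when the string contains an apostrophe.
    """
    simple = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\", "'": "\\'"}
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif 0x20 <= cp < 0x7f:
            out.append(ch)                  # printable ASCII stays as-is
        elif cp < 0x100:
            out.append("\\x%02x" % cp)      # control characters and 0x80-0xff
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)      # 16-bit range, including lone surrogates
        else:
            out.append("\\U%08x" % cp)      # beyond the 16-bit range
    return "'" + "".join(out) + "'"


print(legacy_repr("tab\there"))             # 'tab\there'
print(legacy_repr("\u65e5\u672c\u8a9e"))    # '\u65e5\u672c\u8a9e'
```

The second call shows the inconvenience the Motivation describes: a three-character Japanese string comes out as eighteen characters of escapes.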


Specification
=============

- The algorithm to build repr() strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert other non-printable ASCII characters (0x00-0x1f, 0x7f) to
    '\\xXX'.

  * Convert leading surrogate pair characters without trailing character
    (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

  * Convert Unicode whitespace other than ASCII space ('\\x20') and
    control characters (categories Z* and C* in the Unicode database)
    to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

- Set the Unicode error-handler for sys.stdout and sys.stderr to
  'backslashreplace' by default.


Rationale
=========

The repr() in Python 3000 should be Unicode-based, not ASCII-based, just
like Python 3000 strings. Also, conversion should not be affected by the
locale setting, because the locale is not necessarily the same as the
output device's locale. For example, it is common for a daemon process
to be invoked in an ASCII setting, but write UTF-8 to its log files.

Characters not supported by the user's console are hex-escaped on
printing, by the Unicode encoders' error-handler. If the error-handler of
the output file is 'backslashreplace', such characters are hex-escaped
without raising UnicodeEncodeError. For example, if your default
encoding is ASCII, ``print('¢')`` will print '\\xa2'. If your encoding
is ISO-8859-1, '¢' will be printed.


Printable characters
--------------------

The Unicode standard doesn't define Non-printable characters, so we must
create our own definition. Here we propose to define Non-printable
characters as follows.

- Non-printable ASCII characters as Python 2.

- Broken surrogate pair characters.

- Characters defined in the Unicode character database as

  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line) ('\\u2028', LINE SEPARATOR)
  * Zp (Separator, Paragraph) ('\\u2029', PARAGRAPH SEPARATOR)
  * Zs (Separator, Space) other than ASCII space ('\\x20'). Characters in
    this category should be escaped to avoid ambiguity.


Alternate Solutions
-------------------

To help debugging in non-Latin languages without changing repr(), other
suggestions were made.

- Supply a tool to print lists or dicts.

  Strings to be printed for debugging are not only contained by lists or
  dicts, but also in many other types of object. File objects contain a
  file name in Unicode, exception objects contain a message in Unicode,
  etc. These strings should be printed in readable form when repr()ed.
  It is unlikely to be possible to implement a tool to print all
  possible object types.

- Use sys.displayhook and sys.excepthook.

  For interactive sessions, we can write hooks to restore hex escaped
  characters to the original characters. But these hooks are called only
  when displaying the result of evaluating an expression entered in an
  interactive Python session, and don't work for the print() function or
  for non-interactive sessions.

- Subclass sys.stdout and sys.stderr.

  It is difficult to implement a subclass to restore hex-escaped
  characters since there isn't enough information left by the time it's
  a string to undo the escaping correctly in all cases. For example,
  ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
  the file object has no way to tell the two cases apart.

- Make the encoding used by unicode_repr() adjustable.

  There is no benefit preserving the current repr() behavior to make
  application/library authors aware of non-ASCII repr(). And selecting
  an encoding on printing is more flexible than having a global setting.
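
The category list under "Printable characters" can be sketched with the standard ``unicodedata`` module. This is only an illustration of the proposed definition, not the reference implementation: ``is_printable``, ``escape`` and ``proposed_repr_body`` are hypothetical names, and the quote and backslash handling from the Specification is left out.

```python
import unicodedata

# Unicode general categories that the draft lists as non-printable.
NON_PRINTABLE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}


def is_printable(ch):
    """Hypothetical helper: True if the draft would leave ch unescaped."""
    if ch == " ":
        return True                 # ASCII space is the one exempted Zs character
    return unicodedata.category(ch) not in NON_PRINTABLE


def escape(ch):
    """Hypothetical helper: hex-escape one character by code point width."""
    cp = ord(ch)
    if cp < 0x100:
        return "\\x%02x" % cp
    if cp < 0x10000:
        return "\\u%04x" % cp
    return "\\U%08x" % cp


def proposed_repr_body(s):
    """Escape only the non-printable characters of s.

    Quote and backslash handling is omitted to keep the sketch short.
    """
    return "".join(ch if is_printable(ch) else escape(ch) for ch in s)


# Hiragana stays readable; the LINE SEPARATOR after it is escaped.
print(proposed_repr_body("\u3042\u2028b"))
```

Keeping the category set in one place makes it easy to see which characters the definition escapes; for instance, NO-BREAK SPACE falls under Zs and is therefore escaped, while ASCII space is not.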


Open Issues
===========

- A lot of people use UTF-8 for their encoding, for example, en_US.utf8
  and de_DE.utf8. In such cases, the backslashescape trick doesn't work.


Backwards Compatibility
=======================

Changing repr() may break some existing code, especially testing code.
Five of Python's regression tests fail with this modification. If you
need repr() strings without non-ASCII characters, as in Python 2, you can
use the following function.

::

    def repr_ascii(obj):
        return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")


Reference Implementation
========================

http://bugs.python.org/issue2630


References
==========

.. [1] Multibyte string on string::string_print
   (http://bugs.python.org/issue479898)


Copyright
=========

This document has been placed in the public domain.

From ishimoto at gembook.org Tue May 6 03:00:22 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Tue, 6 May 2008 10:00:22 +0900
Subject: [Python-3000] PEP 3138 - String representation in Python 3000
Message-ID: <797440730805051800w15a13a8eodc56e2cde72f9177@mail.gmail.com>

Oops, I got the PEP number wrong in the subject!
"PEP 3138 - String representation in Python 3000" should be the correct
subject.

From phd at phd.pp.ru Tue May 6 08:09:17 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 6 May 2008 10:09:17 +0400
Subject: [Python-3000] PEP 3138 - String representation in Python 3000
In-Reply-To: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
Message-ID: <20080506060917.GA29253@phd.pp.ru>

Hello! Well done! Thank you!

On Tue, May 06, 2008 at 09:56:24AM +0900, Atsuo Ishimoto wrote:
> I've written a PEP for new string representation in Python 3000.
>
> Patch is updated at http://bugs.python.org/issue2630, and Guido
> updated a patch to Rietveld:
> http://codereview.appspot.com/767 .
>
> I would appreciate your comments and help.
> > ----------------------------------------------- > PEP: 3138 > Title: String representation in Python 3000 > Version: $Revision$ > Last-Modified: $Date$ > Author: Atsuo Ishimoto > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 05-May-2008 > Post-History: > > Abstract > ======== > > This PEP proposes new string representation form for Python 3000. In > Python prior to Python 3000, the repr() built-in function converts > arbitrary objects to printable ASCII strings for debugging and logging. > For Python 3000, a wider range of characters, based on the Unicode > standard, should be considered 'printable'. > > > Motivation > ========== > > The current repr() converts 8-bit strings to ASCII using following > algorithm. > > - Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'. > > - Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII > characters(>=0x80) to '\\xXX'. > > - Backslash-escape quote characters(' or ") Currently Python doesn't escape double-quote ("), it only escapes apostrophe ('). > and add quote character at "and add the quote character (apostrophe, ') at" > head and tail. I think they are "the beginning" and "the end" of the string. > For Unicode strings, the following additional conversions are done. > > - Convert leading surrogate pair characters without trailing character > (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'. > > - Convert 16-bit characters(>=0x100) to '\\uXXXX'. > > - Convert 21-bit characters(>=0x10000) and surrogate pair characters to > '\\U00xxxxxx'. > > This algorithm converts any string to printable ASCII, and repr() is > used as handy and safe way to print strings for debugging or for > logging. Although all non-ASCII characters are escaped, this does not > matter when most of the string's characters are ASCII. But for other > languages, such as Japanese where most characters in a string are not > ASCII, this is very inconvenient. 
Python 3000 has a lot of nice features > for non-Latin users such as non-ASCII identifiers, so it would be > helpful if Python could also progress in a similar way for printable > output. > > Some users might be concerned that such output will mess up their > console if they print binary data like images. But this is unlikely to > happen in practice because bytes and strings are different types in > Python 3000, so printing an image to the console won't mess it up. > > This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected. > > > Specification > ============= > > - The algorithm to build repr() strings should be changed to: > > * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'. > > * Convert other non-printable ASCII characters(0x00-0x1f, 0x7f) to > '\\xXX'. > > * Convert leading surrogate pair characters without trailing character > (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'. > > * Convert Unicode whitespace other than ASCII space('\\x20') and > control characters (categories Z* and C* in the Unicode database) > to 'xXX', '\\uXXXX' or '\\U00xxxxxx'. > > - Set the Unicode error-handler for sys.stdout and sys.stderr to > 'backslashreplace' by default. > > > Rationale > ========= > > The repr() in Python 3000 should be Unicode not ASCII based, just like > Python 3000 strings. Also, conversion should not be affected by the > locale setting, because the locale is not necessarily the same as the > output device's locale. For example, it is common for a daemon process > to be invoked in an ASCII setting, but writes UTF-8 to its log files. Not only to log files. HTTP daemons, e.g., run with one locale but answer to all kinds of clients. > Characters not supported by user's console are hex-escaped on printing, > by the Unicode encoders' error-handler. If the error-handler of the > output file is 'backslashreplace', such characters are hex-escaped > without raising UnicodeEncodeError. 
For example, if your default
> encoding is ASCII, ``print('¢')`` will prints '\\xa2'. If your encoding
> is ISO-8859-1, '¢' will be printed.
>
>
> Printable characters
> --------------------
>
> The Unicode standard doesn't define Non-printable characters, so we must
> create our own definition. Here we propose to define Non-printable
> characters as follows.
>
> - Non-printable ASCII characters as Python 2.
>
> - Broken surrogate pair characters.
>
> - Characters defined in the Unicode character database as
>
>   * Cc (Other, Control)
>   * Cf (Other, Format)
>   * Cs (Other, Surrogate)
>   * Co (Other, Private Use)
>   * Cn (Other, Not Assigned)
>   * Zl Separator, Line ('\\u2028', LINE SEPARATOR)
>   * Zp Separator, Paragraph ('\\u2029', PARAGRAPH SEPARATOR)
>   * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
>     this category should be escaped to avoid ambiguity.
>
>
> Alternate Solutions
> -------------------
>
> To help debugging in non-Latin languages without changing repr(), other
> suggestion were made.
>
> - Supply a tool to print lists or dicts.
>
>   Strings to be printed for debugging are not only contained by lists or
>   dicts, but also in many other types of object. File objects contain a
>   file name in Unicode, exception objects contain a message in Unicode,
>   etc. These strings should be printed in readable form when repr()ed.
>   It is unlikely to be possible to implement a tool to print all
>   possible object types.
>
> - Use sys.displayhook and sys.excepthook.
>
>   For interactive sessions, we can write hooks to restore hex escaped
>   characters to the original characters. But these hooks are called only
>   when the result of evaluating an expression entered in an interactive
>   Python session, and doesn't work for the print() function or for
>   non-interactive sessions.

   Or for logging.debug("%r", ...)

> - Subclass sys.stdout and sys.stderr.
>
>   It is difficult to implement a subclass to restore hex-escaped
>   characters since there isn't enough information left by the time it's
>   a string to undo the escaping correctly in all cases. For example, ``
>   print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
>   there is no chance to tell file objects apart.
>
> - Make the encoding used by unicode_repr() adjustable.
>
>   There is no benefit preserving the current repr() behavior to make
>   application/library authors aware of non-ASCII repr(). And selecting
>   an encoding on printing is more flexible than having a global setting.
>
>
> Open Issues
> ===========
>
> - A lot of people use UTF-8 for their encoding, for example, en_US.utf8
>   and de_DE.utf8. In such cases, the backslashescape trick doesn't work.

   Also, there is the problem of similar-looking characters in Western,
Greek and Cyrillic languages. These languages use similar (but different)
alphabets (descended from a common ancestor) and contain letters that
look similar but have different character codes. For example, it is hard
to distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'.
(The visual representation, of course, very much depends on the fonts
used, but usually these letters are almost indistinguishable.)

> Backwards Compatibility
> =======================
>
> Changing repr() may break some existing codes, especially testing code.
> Five of Python's regression test fail with this modification. If you
> need repr() strings without non-ASCII character as Python 2, you can use
> following function.
>
> ::
>
>     def repr_ascii(obj):
>         return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
>
>
> Reference Implementation
> ========================
>
> http://bugs.python.org/issue2630
>
>
> References
> ==========
>
> .. [1] Multibyte string on string::string_print
>    (http://bugs.python.org/issue479898)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/phd%40phd.pp.ru Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From brett at python.org Tue May 6 08:22:52 2008 From: brett at python.org (Brett Cannon) Date: Mon, 5 May 2008 23:22:52 -0700 Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup In-Reply-To: <1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com> References: <1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com> Message-ID: On Mon, May 5, 2008 at 5:03 PM, Benjamin Peterson wrote: > On Mon, May 5, 2008 at 6:20 PM, Brett Cannon wrote: > > On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum wrote: > > > > > > > I've accepted this PEP. > > > > Woohoo! > > Congrats! > > > > > > > Everyone, get to work on implementing this! > > > I'm sure some small nits will come up during the work that nobody > > > anticipated during the PEP discussion. In that case, let's be flexible > > > and work to update the PEP with the best possible solution. > > > > And use the PEP to keep track of what state everything is in! > > Hopefully I will start work on this tonight or tomorrow. > > What can I/we do to help? Once I have worked out exactly needs to be done for each possible thing (deletion, rename), then going through the motions in terms of just doing the right thing for 2.6/3.0. I have an idea on how I want to test the deletion warnings. Once I have that in place then it should be a matter of adding the tests to test_py3kwarn, the warning in the module, and the proper note in the docs. 
-Brett From glyph at divmod.com Fri May 2 02:03:24 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 02 May 2008 00:03:24 -0000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> Message-ID: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> On 11:45 pm, guido at python.org wrote: >I like this, except one issue: I really don't like the .local >directory. I don't see any compelling reason why this needs to be >~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide >it from view, especially since the user is expected to manage this >explicitly. I've previously given a spirited defense of ~/.local on this list ( http://mail.python.org/pipermail/python-dev/2008-January/076173.html ) among other places. Briefly, "lib" is not the only directory participating in this convention; you've also got the full complement of other stuff that might go into an installation like /usr/local. So, while "lib" might annoy me a little, "bin etc games include lib lib32 man sbin share src" is going to get ugly pretty fast, especially if this is what comes up in Finder or Nautilus or Explorer every time I open a window. If it's going to be a visible directory on the grounds that this is a Python- specific thing that is explicitly *not* participating in a convention with other software, then please call it "~/Python" or something. Am I the only guy who finds software that insists on visible, fixed files in my home directory rude? vmware, for example, wants a "~/vmware" directory, but pretty much every other application I use is nice enough to use dotfiles (even cedega, with a roughly-comparable-to- lib "applications I've installed for you" folder). Put another way - it's trivial to make ~/.local/lib show up by symlinking ~/lib, but you can't make ~/lib disappear, and lots of software ends up looking at ~. 

From asmodai at in-nomine.org Fri May 2 07:07:20 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Fri, 2 May 2008 07:07:20 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
Message-ID: <20080502050720.GO78165@nexus.in-nomine.org>

-On [20080501 22:27], Barry Warsaw (barry at python.org) wrote:
>Time is running short to get any new features into Python 2.6 and
>3.0.

Is there a reliable way to identify 32-bit and 64-bit Windows from within
Python? I have not found any yet, but it might be a mere oversight on my
part.

The reason I ask is that both return win32, which is most likely a
reference to the API, even when the 64-bit Python version is installed.
Using win32 this way, of course, causes some issues with, for example,
setuptools, since it generates an egg with a win32 identifier. Now if you
have Python C extension code it will be 64-bit compiled, and thus will
not work on 32-bit Windows.

--
Jeroen Ruigrok van der Werven / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
All are lunatics, but he who can analyze his delusions is called a
philosopher.

From glyph at divmod.com Fri May 2 05:25:49 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 02 May 2008 03:25:49 -0000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To:
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
Message-ID: <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>

On 01:55 am, guido at python.org wrote:
>On Thu, May 1, 2008 at 5:03 PM, wrote:

Hi everybody. I apologize for writing yet another lengthy screed about a
simple directory naming issue.
I feel strongly about it but I encourage anyone who doesn't to simply skip it. First, some background: my strong feelings here are actually based on an experience I had a long time ago when helping someone with some C++ programming homework. They were baffled because when I helped them the programs compiled, but then as soon as they tried it on their own it didn't. The issue was that I had replicated my own autotools-friendly directory structure for them (at the time, "~/bin", "~/include", "~/lib", "~/etc", and so on managed with GNU stow) onto their machine and edited their shell setup to include them appropriately. But, as soon as I was finished, they "cleaned up" the "mess" I had left behind, and thereby removed all of their build dependencies. This was on a shared university build server, before the days of linux as a friendly, graphical operating system which encouraged you to look even more frequently at your home directory, so if anything I suspect the likelihood that this is a problem would be worse now. Since cleaning up my own home directory, of course, I find that I appreciate the lack of visual noise in Nautilus et al. as well. Also, while I obviously think all tools should work this way, I think that Python in particular will attract an audience who is learning to program but not necessarily savvy with arcane nuances of filesystem layout, and it would be best if those details were abstracted. My concern here is for the naive python developer reading installation instructions off of a wiki and trying to get started with Twisted development. Seeing a directory created in your home directory (or, as the case may be, 3 directories, "bin", "lib", and "include") is a bit of a surprise. They don't actually care where the files in their installed library are, as long as they're "installed", and they can import them. However, they may care that clicking on the little house icon now shows not "Pictures", "Movies", etc, but "lib" (what's a 'lib'?)
"bin" (what's a bin? is that like a box where I throw my stuff?) "share" (I put my stuff in "share", but it's not shared. Wait, I'm supposed to put it in "Public"?). >> Briefly, "lib" is not the only directory participating in this >>convention; >>you've also got the full complement of other stuff that might go into >>an >>installation like /usr/local. So, while "lib" might annoy me a >>little, "bin >>etc games include lib lib32 man sbin share src" is going to get ugly >>pretty >>fast, especially if this is what comes up in Finder or Nautilus or >>Explorer >>every time I open a window. > >Unless I misread the PEP, there's only going to be a lib subdirectory. >Python packages don't put stuff in other places AFAIK. Python packages, at the very least, frequently put stuff in "bin" (or "scripts", I think, on Windows). Not all Python packages are pure- Python packages either; setup.py boasts --install-platlib, --install- headers, --install-data, and --exec-prefix options, which suggests an "include", "bin", and "share" directory, at least. I'm sure if I had more time to grovel around I'd find one that installed manpages. Twisted has some, but apparently setup.py doesn't do anything with them, we leave that to the OS packages... Of course, very little of this is handled by the PEP. But even the usage of the name "lib" implies that the PEP is taking some care to be compatible with an idiom that goes beyond Python itself here, or at least beyond simple Python packages. Even assuming that no Python library ever wanted to install any of these things, there are many Python libraries which are simply wrappers around lower-level libraries, and if I want to perform a per-user install of one of those, I am going to ./configure --prefix=~/something (and by "something", I mean ".local" ;)) and it would be nice to have Python living in the same space. 
For that matter it'd be nice to get autotools and Ruby and PHP and Perl and Emacs (ad nauseam) all looking at ~/.local as a mirror of /usr, so that I didn't have to write a bunch of shell bootstrap glue to get everything to behave consistently, or learn the new, special names for bits of configuration under "~" that are different from the ones under /usr/local or /etc. I replicate a consistent Python development environment with a ton of bizarre dependencies across something like 15 different OS installations (not to mention a bevy of virtual machines I keep around just for fun), so I think about these issues a lot. Most of these machines are macs and linux boxes, but I do my best on Windows too. FWIW I don't have any idea what the right thing to do is on Windows; ".local" doesn't particularly make sense, but neither does "lib" in that context. There's no reasonable guess as to where to put scripts, or dependent shared libraries... but then, per-user installation is less of an issue on Windows. >On the Mac, the default Finder window is not your home directory but >your Desktop, which is a subdirectory thereof with a markedly public >name. In fact, OS X has a whole bunch of reserved names in your home >directory, and none of them start with a dot. The rule seems to be >that if it contains stuff that the user cares about, it doesn't start >with a dot. Hmm. On my Mac laptop, the default Finder window is definitely my home directory; this may be an artifact of many OS upgrades or some tweak that I performed a long time ago and forgot about, though. Apologies if that is not the average user experience. For what it's worth, Ubuntu also has some directories that it creates: Desktop, Pictures, Documents, Examples, Templates, Videos. These are empty, and I typically delete the ones I don't use.
>>If it's going to be a visible directory on the >>grounds that this is a Python- specific thing that is explicitly *not* >>participating in a convention with other software, then please call it >>"~/Python" or something. > >Much better than ~/.local/ IMO. It depends how this is being perceived. If this is Python mirroring the /usr/local layout convention for users, as the name "lib" implies, then this is worse. However, if Python is just trying to select a location for its own library bookkeeping and not allow the installation of platform libraries or scripts using this mechanism... well, ~/.python.d would still be my preference ;-) but I could at least understand "Python" as mirroring the Mac, GNOME and KDE convention for a few very special directories. >> Am I the only guy who finds software that insists on visible, fixed >>files >>in my home directory rude? vmware, for example, wants a "~/vmware" >>directory, but pretty much every other application I use is nice >>enough to >>use dotfiles (even cedega, with a roughly-comparable-to- lib >>"applications >>I've installed for you" folder). > >The distinction to my mind is that most dot files (with the exception >of a few like .profile or .bashrc) are not managed by most users -- >the apps that manage them provide an APIs for manipulating their >contents. (Sort of like thw Windows registry.) Non-dot files are for >stuff that the user needs to be aware of. My experience of modern Linux suggests that the usage you're describing is gradually being phased out - applications that want to manage some non-user-visible storage in something like the registry increasingly use gconf (or a database, in server-land). Granted, gconf itself is stored in dotfiles, but it's just a few. 
In my home directory I have, in version control, variously written by hand or databases maintained from externally downloaded stuff: ~/.asoundrc ~/.emacs ~/.vimrc ~/.vim ~/.Xresources ~/.fonts ~/.gnomerc ~/.inputrc ~/.bashrc ~/.bash_profile ~/.profile ~/.screenrc ~/.Xresources ~/.ssh/config ~/.ssh/authorized_keys ~/.ssh/known_hosts I know about these dot files and I care about them and I maintain them, but they're there for the benefit of particular pieces of software, not me. There are a lot of other dotfiles there, but I don't think that this set is "a few"; I am quite happy that I don't have to see every one of them every time I am looking at my home directory in a "save as" dialog. >I'm not sure where Python packages fall, but ISTM that this is >something a user must explicitly choose as the target of an installer. >The user is also likely to have to dig through there to remove stuff, >as Python package management doesn't have a way to remove packages. I hope that users never have to explicitly choose this as the target of the installer; I was under the impression that the point of adding this feature was to allow the default behavior of distutils to work simply and automatically on UNIX-y platforms rather than puking about permissions, or requiring arcana like "sudo" access or editing your shell's startup. I am quietly agitating elsewhere to get ~/.local/bin added to $PATH by default, by the way ;-). (~/.local/lib on $LD_LIBRARY_PATH is a hard sell, but that too...) Once you have to know about it and explicitly choose it it's not much more work to set all the appropriate shell environment variables yourself. And, for that matter, *I* already have, so I suppose regardless of the outcome of this discussion I'll still have a ~/.local :-). >> Put another way - it's trivial to make ~/.local/lib show up by >>symlinking >>~/lib, > >That's not the same thing at all. I'm not sure what you're saying it's not the same as. 
All I'm saying is that if advanced users want to show it, they'll symlink it; if naive users want to hide it, they'll delete it and break python, possibly without knowing why ;). >>but you can't make ~/lib disappear, and lots of software ends up >>looking at ~. > >But what software cares about another file there? My home directory is >mostly a switching point where I have quick access to everything I >access regularly. Nothing's going to break, if that's what you mean. No software processes the list of ~ and does anything with it; but lots of stuff shows me that list. In GNOME, on Ubuntu, when a "choose file" dialog comes up, 80% of the time it comes up by default in my home directory. When I open a terminal it opens in my home directory. The default location for Emacs is my home directory. I can quickly measure my cognitive load by looking at the contents of that directory. Since my shell starts there, autocomplete starts there, and so common-letter real estate is scarce. I have a directory called "Projects" that I currently autocomplete with 'p' and a directory called 'Linux' that I autocomplete with 'l'; either public-name proposal will have me typing an additional letter on these every day ;-). In other words, I care about another file there. I use my home directory as a sort of to-do list; it's mostly empty unless I have a lot going on, in which case it fills up with various objects I'm working on, and then I empty it out again. There are a few exceptions to this rule; on every platform there are a few things the OS puts there, but they are generally things like "Pictures", "Desktop", and "Music"... where I put pictures, downloaded files, and music. The Mac's "Library" directory has never bothered me, since it's OS-provided and basically an alternate location for dotfiles. ("Application Data" and friends are another story.) In a way, I agree with you. "everything I access regularly" is a good description of my home directory. 
Except, this "lib" directory is not something I want to access regularly; very occasionally, maybe once every few weeks, I want to chuck some dependency in there and then forget about it for a year. From glyph at divmod.com Fri May 2 07:48:17 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 02 May 2008 05:48:17 -0000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> Message-ID: <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com> On 03:49 am, guido at python.org wrote: >I stand corrected on a few points. You've convinced me that ~/lib/ is >wrong. But I still don't like ~/.local/; not in the last place because >it's not any more local than any other dot files or directories. The >"symmetry" with /usr/local/ is pretty weak, and certainly won't help >beginning users. Why do you say the symmetry is weak? The name might not be that evocative, but the main thrust of what I'm saying is that "~/." should be an autotools-style directory layout. The symmetry I suggest is in exactly that sense; that's what /usr/local is. I don't actually care what "" is, except I (and many others) already use "local" for that value, and the more software that honors it, the better. GNU stow (arguably the king of per-user installation management) suggests ~/local as an autotools --prefix target; the free desktop project implicitly suggests ~/.local (by suggesting ~/.local/share is a place to put the same files that would normally be searched for in /usr/share and /usr/local/share). So the word "local" is just floating around in this meme space; I don't like the word that much, but I don't see that there's a different one which more clearly evokes the concept either. 
I originally used "~/UNIX" and then ~/.unix, but switched to .local when I noticed other folks doing it. One I've actually seen mentioned a few times is "~/.nix-config", which I certainly don't think is any better. It would help beginning users if ~/.local/bin and ~/.local/lib were honored by the system. I, and other adherents of this idea that it would be nice if users could install source without admin privs, have been suggesting that to distro guys when I (we) can, and I figure in a few years, somebody might bite. If that happens, it will start being *easier* to build stuff from source into a separated location than to need root, stomp on the system, and inevitably break some stuff. Agitating for ~/Python/Platform/Libraries on $LD_LIBRARY_PATH (or equivalent) is a lot harder to do with a straight face. This is the reason I'm bothering to spill so many pixels on this topic; I think it would be great if Python were the first real adopter of this convention, and once *one* project has really gone full bore, each subsequent one is progressively easier to convince. However, if you've made up your mind on ~/Python, I think I've more than made my case at this point, so I'll stop cluttering up the lists :). (By the way, for what it's worth: I _hate_ the bin/lib/etc/man/src/include naming convention mess, but it's a mess which is programmatically honored in like a hundred billion lines of code. This is why I want it supported, but hidden ;).) >As a compromise, I'm okay with ~/Python/. I would like to be able to >say that the user explicitly has to set an environment variable in >order to benefit from this feature, just like with $PYTHONPATH and >$PYTHONSTARTUP. But that might defeat the point of making this easy to >use for noobs. Is there another point? It seems to me that this change is entirely about shared conventions and "works by default" behavior. If you are going to set an environment variable, set PYTHONPATH; it's already much more flexible. 
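The autotools-style per-user layout argued for in this message (scripts, libraries, headers and data under one prefix) can be inspected from Python itself. The sketch below is only an illustration and assumes a later Python than the ones under discussion here: the sysconfig module and its "posix_user" / "nt_user" scheme names come from its documentation, not from this thread, and user_layout is a made-up helper name. Framework builds on OS X use a differently named scheme that this sketch does not handle.

```python
import os
import sysconfig


def user_layout():
    """Return the install directories of the per-user scheme as a dict."""
    # Assumption: "nt_user" and "posix_user" are the per-user scheme names,
    # as listed in the sysconfig documentation.
    scheme = "nt_user" if os.name == "nt" else "posix_user"
    return sysconfig.get_paths(scheme=scheme)


if __name__ == "__main__":
    paths = user_layout()
    # purelib/platlib: pure-Python and compiled modules;
    # scripts: executables; data: everything else.
    for key in ("purelib", "platlib", "scripts", "data"):
        print(key.ljust(8), paths[key])
```

On a typical Linux box the printed directories all sit under one hidden prefix in the home directory, which is the "supported, but hidden" arrangement described above.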
~/Python opens up some new problems though, although perhaps they are trivially resolved: how should this interoperate with distutils? 'Just make "python setup.py --user" do what "python setup.py --prefix ~/.local" would do' is pretty straightforward, but "~/Python" would need a new convention. Should "~/Python" have a "~/Python/Scripts" directory that one could add to $PATH? A "~/Python/Platform" directory, for includes, libraries, other random junk like manpages or HTML docs? ~/Python/2.6/lib, or ~/Python2.6/lib? To be fair, a separate, and purpose-designed Python directory layout might also make certain things neater. For example one could support parallel installation with Python2.6 (or Python/2.6) by giving each a 'lib' and 'bin' directory, and always having the scripts in the 2.6/bin dir invoke the 2.6 interpreter, rather than having separated space for libraries but having to mangle the names of scripts ("twistd8.0-py2.6"). I'd still prefer compatibility-by-convention with other tools, languages, etc, though. In the long term, if everyone followed suit on ~/.local, that would be great. But I don't want a ~/Python, ~/Java, ~/Ruby, ~/PHP, ~/Perl, ~/OCaml and ~/Erlang and a $PATH as long as my arm just so I can run a few applications without system-installing them. >On OS X I think we should put this somewhere under ~/Library/. Just >put it in a different place than where the Python framework puts its >stuff. Isn't the whole point that it should be the same place? Under current Python releases, OS X already has this functionality via ~/Library/Python/2.5/site-packages. Also, I'd strongly suggest supporting both ~/Library (although the existing location seems fine to me) *and* whatever the default is on other platforms; there are already enough points of pain where OS X behaves "kind of like a UNIX, but not really", and every project needs to add these little workarounds and caveats in the documentation. 
Is there a benefit to be derived from making this situation worse by introducing another such subtlety? From steve at holdenweb.com Fri May 2 10:49:17 2008 From: steve at holdenweb.com (Steve Holden) Date: Fri, 02 May 2008 04:49:17 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> Message-ID: <481AD58D.2010201@holdenweb.com> Guido van Rossum wrote: > I stand corrected on a few points. You've convinced me that ~/lib/ is > wrong. But I still don't like ~/.local/; not in the last place because > it's not any more local than any other dot files or directories. The > "symmetry" with /usr/local/ is pretty weak, and certainly won't help > beginning users. > So it's the *name* you don't like rather than the invisibility? > As a compromise, I'm okay with ~/Python/. I would like to be able to > say that the user explicitly has to set an environment variable in > order to benefit from this feature, just like with $PYTHONPATH and > $PYTHONSTARTUP. But that might defeat the point of making this easy to > use for noobs. > Groan. Then everyone else realizes what a "great idea" this is, and we see ~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a directory with a comment marker in its name), ~/Lisp/ and the rest? I don't think people would thank us for that in the long term. I'm about +10 on invisibility, for the simple reason that "hiding the mechanism" is the right thing to do for naive users, who are the most likely to screw things up if given the chance and the most likely to be unaware of dot-name directories. If you don't like ~/.local/ then please consider ~/.private/ or ~/.personal/ or something else, but don't gratuitously add a visible subdirectory.
> On OS X I think we should put this somewhere under ~/Library/. Just > put it in a different place than where the Python framework puts its > stuff. > Nothing to say about OS X. One day Windows might start to respect the "hidden dot" convention, but perhaps in the interim we could create a (Windows-hidden) ~/.private/? Assuming we could work out where to put it ;-) > On Thu, May 1, 2008 at 8:25 PM, wrote: [much good sense] regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From glyph at divmod.com Fri May 2 20:34:35 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 02 May 2008 18:34:35 -0000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> Message-ID: <20080502183435.25821.905798949.divmod.xquotient.7501@joule.divmod.com> On 05:53 pm, fdrake at acm.org wrote: >On May 1, 2008, at 7:54 PM, Barry Warsaw wrote: >>Interesting. I'm of the opposite opinion. I really don't want >>Python dictating to me what my home directory should look like (a dot >>file doesn't count because so many tools conspire to hide it from >>me). I guess there's always $PYTHONUSERBASE, but I think I will not >>be alone. ;) >Using ~/.local/ for user-managed content doesn't seem right to me at >all, because it's hidden by default. I don't understand your reason for saying this. Terms like "user" and "manage" are somewhat vague. What sort of experience are you hoping to provide what sort of user with this convention? I hope my earlier explanations were clear as far as the types of users. I believe that the management of ~/.local/ is a subtle question. It will largely be "managed" by simply telling distutils to put files there; I hope, implicitly. 
In my mind there are 2 types of users who will be "managing" it - newbies, who don't really know what's going on but want "cd mypackage-0.0.1; python setup.py install; python -c 'import mypackage'" (or perhaps even "easy_install mypackage") to work, and advanced users who want to be able to mix-and-match different versions of different packages. Advanced users might already have a PYTHONPATH management (virtual python, virtualenv, combinator, ~/.bashrc hacks, a directory full of symlinks) that already works for them, or be comfortable with inspecting a hidden directory, so ~/.local isn't a problem for them (i.e. us); newbies don't want to see the directory until they already know what's going on. >I'd be even happier if there were no default per-user location, but a >required configuration setting (in the existing distutils config >locations) in order to enable per-user installation. If you're happier without this feature, then perhaps your tastes run counter to a useful implementation of it :). Why wouldn't you want it, though? PYTHONPATH still exists; you don't have to use it, personally. From andy at hexten.net Fri May 2 21:39:47 2008 From: andy at hexten.net (Andy Armstrong) Date: Fri, 2 May 2008 20:39:47 +0100 Subject: [Python-3000] Invitation to try out open source code review tool Message-ID: <8DDD3FFD-BFF8-43EE-9F02-CA4013A3037C@hexten.net> Hi Guido, I'm afraid I've added a Perl based project (Test::Harness). I then went back and read your post and got to the bit where you specifically invited *Python* developers. Sorry about that. 
I'm not trying to colonise Pythonspace with Perl, honest :) -- Andy Armstrong, Hexten From glyph at divmod.com Sun May 4 15:58:03 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sun, 04 May 2008 13:58:03 -0000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> Message-ID: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> On 3 May, 11:34 pm, fdrake at acm.org wrote: >On May 3, 2008, at 7:51 AM, skip at pobox.com wrote: >>Fred asked for a --prefix flag (which is what I was voting on). I >>don't >>really care what you do by default as long as you give me a way to do >>it >>differently. > >What's most interesting (to me) is that no one's commented on my note >that my preferred approach would be that there's no default at all; >the location would have to be specified explicitly. Whether on the >command line or in the distutils configuration doesn't matter, but >explicitness should be required. I thought I responded to it in my initial response, but let me be clearer. First, Skip, I *only* care about the default behavior. There's already a way to do it differently: PYTHONPATH. So, Fred, I think what you're arguing for is to drop this feature entirely. Or is there some other use for a new way to allow users to explicitly add something to sys.path, aside from PYTHONPATH? It seems that it would add more complexity and I can't see what the value would be. 
As I've said a dozen times in this thread already, the feature I'd like to get from a per-user installation location is that 'setup.py install', or at least some completely canonical distutils incantation, should work, by default, for non-root users; ideally non-administrators on windows as well as non-root users on unixish platforms. From mal at egenix.com Sun May 4 18:31:09 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 04 May 2008 18:31:09 +0200 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481DE0CE.8010306@cheimes.de> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> <481DE0CE.8010306@cheimes.de> Message-ID: <481DE4CD.7070401@egenix.com> On 2008-05-04 18:14, Christian Heimes wrote: >> First, Skip, I *only* care about the default behavior. There's already >> a way to do it differently: PYTHONPATH. So, Fred, I think what you're >> arguing for is to drop this feature entirely. Or is there some other >> use for a new way to allow users to explicitly add something to >> sys.path, aside from PYTHONPATH? It seems that it would add more >> complexity and I can't see what the value would be. > > PYTHONPATH is lacking one feature which is important for lots of > packages and setuptools. The directories in PYTHONPATH are just added to > sys.path. But setuptools require a site package directory. Maybe a new > env var PYTHONSITEPATH could solve the problem. We don't need another setup variable for this. Just place a well-known module into the site-packages/ directory and then query its __file__ attribute, e.g.
site-packages/site_packages.py The module could even include a few helpers to query various settings which apply to the site packages directory, e.g. site_packages.get_dir() site_packages.list_packages() site_packages.list_modules() etc. >> As I've said a dozen times in this thread already, the feature I'd like >> to get from a per-user installation location is that 'setup.py install', >> or at least some completely canonical distutils incantation, should >> work, by default, for non-root users; ideally non-administrators on >> windows as well as non-root users on unixish platforms. > > The implementation of my PEP provides a new option for install: > > $ python setup.py install --user > > Is it sufficient for you? Just in case you don't know... python setup.py install --home=~ will install to ~/lib/python The problem is not getting the packages installed in a non-admin location. It's about Python looking in a non-admin location per default (as well as in the site-packages location). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 04 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Sun May 4 22:56:34 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Sun, 04 May 2008 22:56:34 +0200 Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481E1526.6000903@cheimes.de> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> <481DE0CE.8010306@cheimes.de> <481DE4CD.7070401@egenix.com> <481E1526.6000903@cheimes.de> Message-ID: <481E2302.8000509@egenix.com> On 2008-05-04 21:57, Christian Heimes wrote: > M.-A. Lemburg schrieb: >>> PYTHONPATH is lacking one feature which is important for lots of >>> packages and setuptools. The directories in PYTHONPATH are just added to >>> sys.path. But setuptools require a site package directory. Maybe a new >>> env var PYTHONSITEPATH could solve the problem. >> We don't need another setup variable for this. Just place a >> well-known module into the site-packages/ directory and then >> query it's __file__ attribute, e.g. >> >> site-packages/site_packages.py >> >> The module could even include a few helpers to query various >> settings which apply to the site packages directory, e.g. >> >> site_packages.get_dir() >> site_packages.list_packages() >> site_packages.list_modules() >> etc. > > I don't see how it is going to solve the use case "Add another site > package directory when I don't have write access to the global site > package directory and I don't want to modify my apps." No, but it's going to solve the issue "which of the sys.path directories is to be considered the site packages" directory. I was under the impression that this is what you were after. >> Just in case you don't know... 
>> >> python setup.py install --home=~ >> >> will install to ~/lib/python >> >> The problem is not getting the packages installed in a non-admin >> location. It's about Python looking in a non-admin location per >> default (as well as in the site-packages location). > > I know the --home option. For one the --home option is Unix only and not > supported on Windows Also the --user option takes all options of my PEP > 370 user site directory into account, includinge the PYTHONUSERBASE env var. Ok. Just wanted to mention that there is a precedent in distutils for doing user home directory installations. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 04 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From asmodai at in-nomine.org Fri May 2 11:16:33 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Fri, 2 May 2008 11:16:33 +0200 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481AD771.6040802@cheimes.de> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <481AD58D.2010201@holdenweb.com> <481AD771.6040802@cheimes.de> Message-ID: <20080502091633.GV78165@nexus.in-nomine.org> -On [20080502 11:00], Christian Heimes (lists at cheimes.de) wrote: >Windows and Mac OS X have dedicated directories for application specific >libraries. That is ~/Library on Mac and Application Data on Windows. The >latter is i18n-ed and called "Anwendungsdaten" in German. Fortunately >Windows sets an environment var to the application data directory. And Vista has C:\ProgramData\{vendor}\{application}, which is *not* $APPDATA, but $ProgramData. $APPDATA points to C:\Users\{user}\AppData\Roaming on Vista -- which is very different. "Windows uses the Roaming folder for application specific data, such as custom dictionaries, which are machine independent and should roam with the user profile. The AppData\Roaming folder in Windows Vista is the same as the Documents and Settings\username\Application Data folder in Windows XP." I think that's different from what you meant above though, since I doubt you'd want this (the libraries) to roam with the user. See http://download.microsoft.com/download/3/b/a/3ba6d659-6e39-4cd7-b3a2-9c96482f5353/Managing%20Roaming%20User%20Data%20Deployment%20Guide.doc for more background. -- Jeroen Ruigrok van der Werven / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Seek not death in the error of your life: and pull not upon yourselves destruction with the works of your hands... From asmodai at in-nomine.org Fri May 2 11:20:08 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Fri, 2 May 2008 11:20:08 +0200 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481AD58D.2010201@holdenweb.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <481AD58D.2010201@holdenweb.com> Message-ID: <20080502092008.GW78165@nexus.in-nomine.org> -On [20080502 10:50], Steve Holden (steve at holdenweb.com) wrote: >Groan. Then everyone else realizes what a "great idea" this is, and we see >~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a directory with >a comment marker in its name), ~/Lisp/ and the rest? I don't think people >would thank us for that in the long term. I'm +1 on just using $HOME/.local, but otherwise $HOME/.python makes sense too. $HOME/.python.d doesn't do it for me, too clunky (and hardly used if I look at my .files in $HOME). But I agree with Steve that it should be a hidden directory. -- Jeroen Ruigrok van der Werven / asmodai http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Cum angelis et pueris, fideles inveniamur. Quis est iste Rex gloriae..? 
From asmodai at in-nomine.org Fri May 2 15:11:31 2008 From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven) Date: Fri, 2 May 2008 15:11:31 +0200 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <481B0DE6.30406@lemurconsulting.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com> <481AD58D.2010201@holdenweb.com> <20080502092008.GW78165@nexus.in-nomine.org> <481B0DE6.30406@lemurconsulting.com> Message-ID: <20080502131131.GD78165@nexus.in-nomine.org> -On [20080502 14:49], Richard Boulton (richard at lemurconsulting.com) wrote: >So, on Ubuntu computers at least, it seems likely that a $HOME/.local/ >directory will already exist, with the beginnings of a unix style layout >inside it. On my Ubuntu 8 box: [15:11] [ruigrok at akuma] (0) {0} % ls ~/.local share -- Jeroen Ruigrok van der Werven / asmodai http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B The only source of knowledge is experience... From ncoghlan at gmail.com Tue May 6 12:41:32 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 06 May 2008 20:41:32 +1000 Subject: [Python-3000] PEP 3138 - String representation in Python 3000 In-Reply-To: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com> References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com> Message-ID: <482035DC.5060906@gmail.com> Atsuo Ishimoto wrote: > I've written a PEP for new string representation in Python 3000. 
+1 from me - with this PEP in place getting the old repr() behaviour back is fairly straightforward (as shown in the PEP), but it's hard to get the unicode-friendly repr() behaviour any other way (because the current repr() loses too much information, as demonstrated fairly thoroughly in the python-dev thread that inspired the PEP). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From barry at python.org Tue May 6 13:26:20 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 6 May 2008 07:26:20 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> Message-ID: <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 1, 2008, at 8:03 PM, glyph at divmod.com wrote: > > Am I the only guy who finds software that insists on visible, fixed > files in my home directory rude? vmware, for example, wants a "~/ > vmware" directory, but pretty much every other application I use is > nice enough to use dotfiles (even cedega, with a roughly-comparable- > to- lib "applications I've installed for you" folder). No Glyph, you are not alone! I don't even like the OS putting stuff like Pictures, Music, Movies, Videos and Desktop in my home directory, but I guess that's the price we pay for a modrin desktopy operatin' systum. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCBAXHEjvBPtnXfVAQJdrgP+Mw0qZebL+MqUk3wKMsRt5mHzT/uHhQ0Z NVwyooWKWnvLMMifCbaG3pjVs7MehfcbAK8uLTlF8Ss9/w1Q5SWJkdhLMWOvHdA6 CJMvGyuokElD5e2cKXiakUWUshN/CeGNElTpxHUBdwmkirfXLQzQll9jlYbnr0I8 du2+rTj/oAc= =015L -----END PGP SIGNATURE----- From rasky at develer.com Tue May 6 13:41:27 2008 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 6 May 2008 11:41:27 +0000 (UTC) Subject: [Python-3000] Removal of os.path.walk References: <20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de> <481955B5.2030805@v.loewis.de> <48199954.4000800@gmail.com> Message-ID: On Thu, 01 May 2008 08:58:22 -0700, Guido van Rossum wrote: > On Thu, May 1, 2008 at 3:20 AM, Nick Coghlan wrote: >> I think Giovanni's point is an important one as well - with an >> iterator, >> you can pipeline your operations far more efficiently, since you don't >> have to wait for the whole directory listing before doing anything >> (e.g. if you're doing some kind of move/rename operation on a >> directory, you can start copying the first file to its new location >> without having to wait for the directory read to finish). >> >> Reducing the startup delays of an operation can be a very useful thing >> when >> it comes to providing a user with a good feeling of responsiveness from >> an application (and if it allows the application to more effectively >> pipeline something, there may be an actual genuine improvement in >> responsiveness, rather than just the appearance of one). > > This sounds like optimizing for a super-rare case. And please do tell me > if you've timed this. I have, it's easy. I have several Maildir directories with tens of thousands of messages, which take 10-15 seconds to be listed through NFS (starting from an ext3 file system). In contrast, commands like "grep -r "whatever" ." start displaying output immediately. Without something like opendir(), it's basically impossible to achieve this in Python. 
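For comparison, here is a minimal sketch of the incremental listing asked for above. It assumes a much later Python that provides os.scandir(), which did not exist when this thread was written; the helper names first_entries and demo, and the msgNNN file names, are hypothetical and only serve the illustration.

```python
import itertools
import os
import tempfile


def first_entries(path, limit):
    """Return up to `limit` entry names without building the full listing."""
    # os.scandir() hands out entries one at a time, in the spirit of
    # opendir()/readdir(), so work can start before the listing completes.
    with os.scandir(path) as entries:
        return [entry.name for entry in itertools.islice(entries, limit)]


def demo():
    # Hypothetical stand-in for a large Maildir: 50 empty files.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(50):
            open(os.path.join(tmp, "msg%03d" % i), "w").close()
        return first_entries(tmp, 5)
```

The order of the returned names is whatever the file system yields, so a caller should not rely on it; the point is only that the first few entries are available without waiting for the rest.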
-- Giovanni Bajo Develer S.r.l. http://www.develer.com From mal at egenix.com Tue May 6 13:45:53 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 06 May 2008 13:45:53 +0200 Subject: [Python-3000] PEP 3138 - String representation in Python 3000 In-Reply-To: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com> References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com> Message-ID: <482044F1.8020100@egenix.com> On 2008-05-06 02:56, Atsuo Ishimoto wrote: > I've written a PEP for new string representation in Python 3000. > > Patch is updated at http://bugs.python.org/issue2630, and Guido > uploaded a patch to Rietveld: > http://codereview.appspot.com/767 . > > I would appreciate your comments and help. >... > Specification > ============= > > - The algorithm to build repr() strings should be changed to: > > * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'. > > * Convert other non-printable ASCII characters(0x00-0x1f, 0x7f) to > '\\xXX'. > > * Convert leading surrogate pair characters without trailing character > (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'. > > * Convert Unicode whitespace other than ASCII space('\\x20') and > control characters (categories Z* and C* in the Unicode database) > to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'. > > - Set the Unicode error-handler for sys.stdout and sys.stderr to > 'backslashreplace' by default. For sys.stderr it may make sense to override any error reporting because of encoding problems. -0 on that. For sys.stdout this doesn't make sense at all, since it hides encoding errors for all applications using sys.stdout as piping mechanism. -1 on that. Both are really way beyond the scope of the PEP and I don't really see the need for them. They also don't cover the cases where you write the repr() to a log file, some stream or syslog. I'd be +1 on making the error handling of sys.stdout and sys.stderr user adjustable. 
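To make "user adjustable" concrete, here is a minimal sketch assuming the Py3k io module, where the error handler is chosen per stream when the text wrapper is created. The helper name write_text is hypothetical, and an in-memory buffer stands in for the real stdout so the result can be inspected.

```python
import io


def write_text(text, errors):
    """Write `text` to an ASCII-only stream using the given error handler."""
    raw = io.BytesIO()  # stand-in for the byte stream behind sys.stdout
    stream = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


def strict_fails(text):
    """Report whether the 'strict' handler rejects `text`."""
    try:
        write_text(text, "strict")
    except UnicodeEncodeError:
        return True
    return False
```

With errors="backslashreplace" the unencodable character is written as an escape, while "strict" surfaces the encoding problem to the application, which is the behaviour a program using stdout as a pipe would want to be able to keep.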
> Printable characters > -------------------- > > The Unicode standard doesn't define Non-printable characters, so we must > create our own definition. Here we propose to define Non-printable > characters as follows. > > - Non-printable ASCII characters as Python 2. > > - Broken surrogate pair characters. > > - Characters defined in the Unicode character database as > > * Cc (Other, Control) > * Cf (Other, Format) > * Cs (Other, Surrogate) > * Co (Other, Private Use) > * Cn (Other, Not Assigned) > * Zl Separator, Line ('\\u2028', LINE SEPARATOR) > * Zp Separator, Paragraph ('\\u2029', PARAGRAPH SEPARATOR) > * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in > this category should be escaped to avoid ambiguity. This is all very nice, but if that means that the whole Unicode database has to be loaded every time the interpreter starts up as you indicated on the ticket, then I'm firmly -1 against that. We've taken great care *not* to do this in Py2.x by moving the database to a module that's imported only when needed. It would be really silly to do this now, just to get some Unicode repr() processed. BTW, I'm sure it's possible to break down the above into a set of ranges and switch cases that are easy to test without having to look up code points in the database. Even if you do end up using the database, it should only be imported if the repr() really does need to look up code points outside the Latin-1 range. > Alternate Solutions > ------------------- > > To help debugging in non-Latin languages without changing repr(), other > suggestions were made. > ... > - Make the encoding used by unicode_repr() adjustable. > > There is no benefit preserving the current repr() behavior to make > application/library authors aware of non-ASCII repr(). And selecting > an encoding on printing is more flexible than having a global setting. I'm not sure what you are saying here. 
I proposed to make the Unicode repr() output a regular encoding that's being implemented by a codec. You could then easily change the encoding to whatever you need for your application or console. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 06 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From alec at swapoff.org Tue May 6 14:45:22 2008 From: alec at swapoff.org (Alec Thomas) Date: Tue, 6 May 2008 22:45:22 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> Message-ID: <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> 2008/5/6 Barry Warsaw : > On May 1, 2008, at 8:03 PM, glyph at divmod.com wrote: > > Am I the only guy who finds software that insists on visible, fixed files > in my home directory rude? vmware, for example, wants a "~/vmware" > directory, but pretty much every other application I use is nice enough to > use dotfiles (even cedega, with a roughly-comparable-to- lib "applications > I've installed for you" folder). > > No Glyph, you are not alone! 
I don't even like the OS putting stuff like > Pictures, Music, Movies, Videos and Desktop in my home directory, but I > guess that's the price we pay for a modrin desktopy operatin' systum. I too find this irritating. FWIW my vote is for ~/.python. ~/.local comes in a distant second due to non-obviousness and ~/Python is several light years beyond that. -- Evolution: Taking care of those too stupid to take care of themselves. From ishimoto at gembook.org Tue May 6 14:53:08 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Tue, 6 May 2008 21:53:08 +0900 Subject: [Python-3000] PEP 3138 - String representation in Python 3000 In-Reply-To: <20080506060917.GA29253@phd.pp.ru> References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com> <20080506060917.GA29253@phd.pp.ru> Message-ID: <797440730805060553s449b863dvb7536244d1d6f252@mail.gmail.com> On Tue, May 6, 2008 at 3:09 PM, Oleg Broytmann wrote: > Hello! Well done! Thank you! Thank you! I updated the Wiki http://wiki.python.org/moin/Python3kStringRepr as per your suggestions. > Not only to log files. HTTP daemons, e.g., run with one locale but > answer to all kinds of clients. > I thought it is not good idea to use repr() to render HTML, but I had to remember the cgitb module. Thank you for your help! From jeremy at alum.mit.edu Tue May 6 15:24:24 2008 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Tue, 6 May 2008 09:24:24 -0400 Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: <1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com> Message-ID: If we want to grab a particular restructuring task, is there a way to record that we're working on it? Jeremy On Tue, May 6, 2008 at 2:22 AM, Brett Cannon wrote: > On Mon, May 5, 2008 at 5:03 PM, Benjamin Peterson > wrote: > > On Mon, May 5, 2008 at 6:20 PM, Brett Cannon wrote: > > > On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum wrote: > > > > > > > > > > I've accepted this PEP. > > > > > > Woohoo! > > > > Congrats! 
> > > > > > > > > > > Everyone, get to work on implementing this! > > > > I'm sure some small nits will come up during the work that nobody > > > > anticipated during the PEP discussion. In that case, let's be flexible > > > > and work to update the PEP with the best possible solution. > > > > > > And use the PEP to keep track of what state everything is in! > > > Hopefully I will start work on this tonight or tomorrow. > > > > What can I/we do to help? > > Once I have worked out exactly what needs to be done for each possible > thing (deletion, rename), then going through the motions in terms of > just doing the right thing for 2.6/3.0. I have an idea on how I want > to test the deletion warnings. Once I have that in place then it > should be a matter of adding the tests to test_py3kwarn, the warning > in the module, and the proper note in the docs. > > -Brett > From ncoghlan at gmail.com Tue May 6 15:51:32 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 06 May 2008 23:51:32 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> Message-ID: <48206264.1010507@gmail.com> Alec Thomas wrote: > FWIW my vote is for ~/.python. ~/.local comes in a distant second due > to non-obviousness and ~/Python is several light years beyond that. 
I think if the obviousness (or lack thereof) of the chosen directory name ever really matters to anyone, we did it wrong. After all, unless you're trying to use something other than distutils to get a package ready for installation, how often does it really matter that the site-packages directory for an installed python interpreter actually lives somewhere inside /usr/local? The main advantage I see to using the "~/.local" approach is that a lot of questions about file layout (e.g. where to put architecture specific code) are automatically (and fairly obviously) answered "Do whatever is done for the system-wide equivalent in /usr/local". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ishimoto at gembook.org Tue May 6 15:55:48 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Tue, 6 May 2008 22:55:48 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 Message-ID: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> (I changed subject) Thank you for your comment. On Tue, May 6, 2008 at 8:45 PM, M.-A. Lemburg wrote: > For sys.stdout this doesn't make sense at all, since it hides encoding > errors for all applications using sys.stdout as piping mechanism. > -1 on that. You can raise UnicodeEncodeError for encoding errors if you want, by setting sys.stdout's error-handler to `strict`. > > Both are really way beyond the scope of the PEP and I don't > really see the need for them. Even if this PEP were rejected, I would still propose to change the default error-handler for sys.stdout and for sys.stderr to 'backslashreplace'. For Python 2, the 'strict' error-handler is acceptable because most text data are 8-bit strings, but for Py3K, raising exceptions when the printed text contains a character not supported by the console is annoying. 
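A small sketch of the situation described above, assuming the repr() behaviour this PEP proposes (printable non-ASCII characters are left unescaped) and an ASCII-only console. The helper names console_bytes and strict_raises are hypothetical.

```python
def console_bytes(obj, errors):
    """Encode repr(obj) the way an ASCII-only console stream would."""
    return repr(obj).encode("ascii", errors)


def strict_raises(obj):
    """Report whether printing repr(obj) would fail under 'strict'."""
    try:
        console_bytes(obj, "strict")
    except UnicodeEncodeError:
        return True
    return False
```

Here strict_raises("\u65e5\u672c\u8a9e") is true, because the new repr() keeps the three characters as they are, whereas console_bytes("\u65e5\u672c\u8a9e", "backslashreplace") still produces output, with each character written as a \uXXXX escape.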
> They also don't cover the cases > where you write the repr() to a log file, some stream or syslog. Sure. I missed some cases, such as cgitb module or logging module. I'll investigate them later. If you have another candidate, please let me know. > > - Characters defined in the Unicode character database as [snip] > > This is all very nice, but if that means that the whole Unicode > database has to be loaded every time the interpreter starts up > as you indicated on the ticket, then I'm firmly -1 against that. I changed the patch to add a flag to the _PyUnicode_TypeRecords table, so the Unicode database is not loaded at startup. > > I proposed to make the Unicode repr() output a regular encoding > that's being implemented by a codec. You could then easily > change the encoding to whatever you need for your application > or console. I think global setting is not flexible enough. And I see no benefit to customizable repr() except to keep compatible with Python 2, but I think it is easy to migrate the existing code to the Py3k. From ncoghlan at gmail.com Tue May 6 16:10:43 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 07 May 2008 00:10:43 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> Message-ID: <482066E3.7030209@gmail.com> Atsuo Ishimoto wrote: >> I proposed to make the Unicode repr() output a regular encoding >> that's being implemented by a codec. You could then easily >> change the encoding to whatever you need for your application >> or console. > > I think global setting is not flexible enough. And I see no benefit to > customizable repr() except to keep compatible with Python 2, but I > think it is easy to migrate the existing code to the Py3k. There's a bigger issue with trying to make whatever repr() does a codec in Py3k. 
As a Unicode->Unicode transformation, it doesn't mesh well with Py3k's strict Unicode->bytes/bytes->Unicode encoding/decoding philosophy. That said, it would be nice to have a way to easily stack Unicode->Unicode transforms on top of text IO streams, or byte->byte transforms on top of binary streams. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From skip at pobox.com Tue May 6 16:21:35 2008 From: skip at pobox.com (skip at pobox.com) Date: Tue, 6 May 2008 09:21:35 -0500 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> Message-ID: <18464.26991.824584.173058@montanaro-dyndns-org.local> Alec> FWIW my vote is for ~/.python. ~/.local comes in a distant second Alec> due to non-obviousness and ~/Python is several light years beyond Alec> that. I guess we're going to have to agree to disagree. I find hiding directories which contain executable code extremely non-obvious. Would you prefer /usr/.local to /usr/local? If not, then why prefer ~/.local to ~/local? 
Skip From steve at holdenweb.com Tue May 6 16:37:04 2008 From: steve at holdenweb.com (Steve Holden) Date: Tue, 06 May 2008 10:37:04 -0400 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <18464.26991.824584.173058@montanaro-dyndns-org.local> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> <18464.26991.824584.173058@montanaro-dyndns-org.local> Message-ID: skip at pobox.com wrote: > Alec> FWIW my vote is for ~/.python. ~/.local comes in a distant second > Alec> due to non-obviousness and ~/Python is several light years beyond > Alec> that. > > I guess we're going to have to agree to disagree. I find hiding directories > which contain executable code extremely non-obvious. Would you prefer > /usr/.local to /usr/local? If not, then why prefer ~/.local to ~/local? > Not wanting to speak for Alec, but in my opinion the answer is mostly because /usr/local doesn't impinge on a home directory listing, so I don't care that it's visible. Naive users don't go looking around the filestore any more than they poke around in their hidden subdirectories. If you want it visible, make a visible symbolic link! 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From alec at swapoff.org Tue May 6 17:11:57 2008 From: alec at swapoff.org (Alec Thomas) Date: Wed, 7 May 2008 01:11:57 +1000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <18464.26991.824584.173058@montanaro-dyndns-org.local> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> <18464.26991.824584.173058@montanaro-dyndns-org.local> Message-ID: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com> 2008/5/7 : > > Alec> FWIW my vote is for ~/.python. ~/.local comes in a distant second > Alec> due to non-obviousness and ~/Python is several light years beyond > Alec> that. > > I guess we're going to have to agree to disagree. I find hiding directories > which contain executable code extremely non-obvious. Would you prefer Python would not be unique. Mozilla/Firefox does exactly this, putting per-user plugins in ~/.mozilla. > /usr/.local to /usr/local? If not, then why prefer ~/.local to ~/local? Because unlike a home directory, users don't frequently perform directory listings or tab completion of /usr/. For a frequently used personal directory one wants the minimum of noise. Also: 1. If every application followed the convention of creating non-hidden paths in home directories the directory listing would be *incredibly* noisy. To illustrate, I have 160 dotfiles, most of which were created by applications. I have only 8 non-hidden directories, all of which I have created myself. 2. Non-hidden directories interfere with tab completion muscle memory. 3. On a more subjective note, home directories are personal space. People shape them to their personality, and interfering with this is impolite. 
4. Per-application dotfiles have 30 years of convention behind them. Conversely, only a few applications use ~/.local (for example, Openbox and Audacious both look for configuration here) and none that I'm aware of default to ~/local. 5. Applications that create non-hidden directories in user home directories are generally perceived as being obnoxious. -- Evolution: Taking care of those too stupid to take care of themselves. From janssen at parc.com Tue May 6 17:30:35 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 6 May 2008 08:30:35 PDT Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> <18464.26991.824584.173058@montanaro-dyndns-org.local> <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com> Message-ID: <08May6.083045pdt."58696"@synergy1.parc.xerox.com> > > /usr/.local to /usr/local? If not, then why prefer ~/.local to ~/local? > > Because unlike a home directory, users don't frequently perform > directory listings or tab completion of /usr/. For a frequently used > personal directory one wants the minimum of noise. Glad someone around here knows actual facts about the statistics of using "ls" :-). Can you point to published user studies about this? If not, let me just say that I perform directory listings of /usr a whole lot *more* than my home directory. Um, isn't this all argument about what color to paint the shed? 
Bill From collinw at gmail.com Tue May 6 18:07:00 2008 From: collinw at gmail.com (Collin Winter) Date: Tue, 6 May 2008 09:07:00 -0700 Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup In-Reply-To: References: <43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com> Message-ID: <43aa6ff70805060907m5568969dqcce8aa1127177b44@mail.gmail.com> On Thu, May 1, 2008 at 11:02 AM, Brett Cannon wrote: > On Thu, May 1, 2008 at 7:41 AM, Collin Winter wrote: > > > > On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon wrote: > > > > > Transition Plan > > > =============== > > > > > > For modules to be removed > > > ------------------------- > > > > > > For the removal of modules that are continuing to exist in the Python > > > 2.x series (i.e., not deprecated explicitly in the 2.x series), > > > ``warnings.warn3k()`` will be used to issue a DeprecationWarning. > > > > FYI, we can also flag these using 2to3. > > > > I can't remember if we have a guiding rule on this yet, but if 2to3 > can fix this, do we still want the warning? Obviously both names will > be provided so people can move their code over, but perhaps the > warning is not needed? I say keep the runtime warning. 2to3 can't fix the cases where the module is being removed entirely; the best it can do is to flag the import statement as requiring the user's attention. > > > Renaming of modules > > > ------------------- > > > > > > For modules that are renamed, stub modules will be created with the > > > original names and be kept in a directory within the stdlib (e.g. like > > > how lib-old was once used). The need to keep the stub modules within > > > a directory is to prevent naming conflicts with case-insensitive > > > filesystems in those cases where nothing but the case of the module > > > is changing. > > > > > > These stub modules will import the module code based on the new > > > naming. The same type of warning being raised by modules being > > > removed will be raised in the stub modules. 
> > > > > > Support in the 2to3 refactoring tool for renames will also be used > > > [#2to3]_. Import statements will be rewritten so that only the import > > > statement and none of the rest of the code needs to be touched. This > > > will be accomplished by using the ``as`` keyword in import statements > > > to bind in the module namespace to the old name while importing based > > > on the new name. > > > > You should cite the existing fix_imports fixer as one example of how > > to do this: http://svn.python.org/view/sandbox/trunk/2to3/lib2to3/fixes/fix_imports.py?view=markup > > Done. > > -Brett > From guido at python.org Tue May 6 18:37:00 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 6 May 2008 09:37:00 -0700 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> <18464.26991.824584.173058@montanaro-dyndns-org.local> <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com> Message-ID: On Tue, May 6, 2008 at 8:11 AM, Alec Thomas wrote: > Python would not be unique. Mozilla/Firefox does exactly this, putting > per-user plugins in ~/.mozilla. Note that this is moot since I'm going to accept the PEP as it stands (i.e. ~/.local) but I want to point out something that seems to be lost occasionally. Hiding stuff in dot files is the right thing to do when there's a separate API (like Mozilla) to manage those files. It is IMO much more questionable when the user is expected to manage things directly using the standard filesystem API. That's why Pictures etc. are not dot files. 
Of course, there's a gray area -- grizzled Unix wizards manage dozens of dot files like .profile and .exrc -- but I still think this is a useful (partial) guiding principle. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 6 18:42:20 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 6 May 2008 09:42:20 -0700 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> <18464.26991.824584.173058@montanaro-dyndns-org.local> Message-ID: On Tue, May 6, 2008 at 7:37 AM, Steve Holden wrote: > If you want it visible, make a visible symbolic link! Note that the point is moot, since I'm going to accept Christian's PEP, i.e. ~/.local, but this argument "you can make it visible yourself" is bogus. The point of visibility (when it's brought up) isn't that you *can* make it visible -- you can always do that with ls -a or whatever Finder option. The point is that (in some people's view) the results of an action should be left *in plain sight* so that the user has clear evidence of what happened. I'm fine in this case with the counterarguments though, so I'll be accepting the PEP in a minute. 
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Tue May 6 18:48:34 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 6 May 2008 11:48:34 -0500
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org> <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com> <18464.26991.824584.173058@montanaro-dyndns-org.local> <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>
Message-ID: <18464.35810.137067.640251@montanaro-dyndns-org.local>

>> /usr/.local to /usr/local? If not, then why prefer ~/.local to ~/local?

Alec> Because unlike a home directory, users don't frequently perform
Alec> directory listings or tab completion of /usr/. For a frequently
Alec> used personal directory one wants the minimum of noise.

I don't mind the system clearly telling me about code I've installed.
That's a lot different than Mozilla hiding its internal stuff in
~/.mozilla.
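For anyone who wants to see where the per-user directory under discussion ends up on their own machine, here is a minimal sketch. It assumes a current Python in which the PEP 370 support is present in the `site` module; the two getter functions are taken from today's standard library, not from this thread, and the paths they return depend on platform and build.

```python
import site

# Base directory for per-user installs (under ~/.local on typical Unix setups).
user_base = site.getuserbase()

# The per-user site-packages directory that is added to sys.path when enabled.
user_site = site.getusersitepackages()

print("user base:         ", user_base)
print("user site-packages:", user_site)

# Whether the interpreter will actually use that directory; it can be
# switched off, for example by the -s command-line option.
print("enabled:           ", site.ENABLE_USER_SITE)
```

Being able to ask the interpreter for the location means the directory can stay out of ordinary listings while remaining easy to find.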
Skip From guido at python.org Tue May 6 19:03:59 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 6 May 2008 10:03:59 -0700 Subject: [Python-3000] PEP 370 (was Re: [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008) In-Reply-To: <481E15B7.9060003@cheimes.de> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org> <18459.43976.85481.758104@montanaro-dyndns-org.local> <481C2AE7.9010805@gmail.com> <18460.20940.882777.235301@montanaro-dyndns-org.local> <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org> <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com> <481DC6A3.70104@gmail.com> <481E15B7.9060003@cheimes.de> Message-ID: All, I've accepted PEP 370, Christian Heimes's proposal to add a per-user site-package directory. The location will be somewhere under ~/.local for Unix/Linux/OS X, and %APPDATA%/Python for Windows (per the original proposal in the PEP). Congratulations Christian, and thanks for championing this. Thanks also to everyone who contributed to the discussion and showed the error of my ways -- especially those who did so in under 100 words. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 6 19:19:08 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 6 May 2008 10:19:08 -0700 Subject: [Python-3000] Invitation to try out open source code review tool In-Reply-To: <8DDD3FFD-BFF8-43EE-9F02-CA4013A3037C@hexten.net> References: <8DDD3FFD-BFF8-43EE-9F02-CA4013A3037C@hexten.net> Message-ID: On Fri, May 2, 2008 at 12:39 PM, Andy Armstrong wrote: > Hi Guido, > > I'm afraid I've added a Perl based project (Test::Harness). I then went > back and read your post and got to the bit where you specifically invited > *Python* developers. Sorry about that. I'm not trying to colonise > Pythonspace with Perl, honest :) No problem! I didn't mean to be exclusive. You're more than welcome to use Rietveld. 
We'll be making an announcement later today that opens it up for everyone anyway, and any experiences you have to share are welcome on the codereview-discuss Google group. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stephen at xemacs.org Tue May 6 20:47:49 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 May 2008 03:47:49 +0900 Subject: [Python-3000] PEP 3108 - String representation in Python 3000 In-Reply-To: <482044F1.8020100@egenix.com> References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com> <482044F1.8020100@egenix.com> Message-ID: <874p9b9smi.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > This is all very nice, but if that means that the whole Unicode > database has to be loaded every time the interpreter starts up Ouch. > BTW, I'm sure it's possible to break down the above into a set of > ranges and switch cases that are easy to test without having to > lookup code points in the database. Even if you do end up using > the database, it should only be imported if the repr() really > does not need to lookup code points outside the Latin-1 range. You mean, "really does need to look up", right? Would it be too disgusting to have a simple range-based repr() as a builtin, and replace it with a lookup-based repr() defined in the Unicode database? From barry at python.org Wed May 7 00:43:05 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 6 May 2008 18:43:05 -0400 Subject: [Python-3000] [Python-Dev] PEP 370 heads up In-Reply-To: <4820DDD8.2040600@cheimes.de> References: <4820DDD8.2040600@cheimes.de> Message-ID: <8F920497-9672-4AE4-9DD2-FC76F98EC568@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 6, 2008, at 6:38 PM, Christian Heimes wrote: > Guido has accepted my user site directory PEP today. > http://python.org/dev/peps/pep-0370/ > > I'm about the merge the code. But first I like to let you know some > things and get your opinion. Very awesome Christian! 
I'm psyched for this to get into the last alpha releases, which I remind everyone happens tomorrow. Plan on svn tree freeze at approximately 6pm EDT (2200 UTC). Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCDe+nEjvBPtnXfVAQJJMAP/XiZvXPptw8tZ4/01hD7r39/lWgoDUmjp gVzne4+XMfz8NcLQMP2+Y38cPrQziyG8BYDqN/vWT641bOwv20QHuZYFvI9Kr09q jTEC39DzNRfD6ThzD/na6M1M7glpXiWr3hj4Va56JEnn1ekj6Ejb7BoW1oyuyz6T gUuAgVT2lOw= =2IIq -----END PGP SIGNATURE----- From lists at cheimes.de Wed May 7 00:50:37 2008 From: lists at cheimes.de (Christian Heimes) Date: Wed, 07 May 2008 00:50:37 +0200 Subject: [Python-3000] [Python-Dev] PEP 370 heads up In-Reply-To: <8F920497-9672-4AE4-9DD2-FC76F98EC568@python.org> References: <4820DDD8.2040600@cheimes.de> <8F920497-9672-4AE4-9DD2-FC76F98EC568@python.org> Message-ID: <4820E0BD.9040405@cheimes.de> Barry Warsaw schrieb: > Very awesome Christian! I'm psyched for this to get into the last alpha > releases, which I remind everyone happens tomorrow. Plan on svn tree > freeze at approximately 6pm EDT (2200 UTC). Thanks Barry! Also thanks to Glyph, Nick and all the other people that stepped in during the discussion in favor of ~/.local! Christian PS: I'll try to get json into shape for Python 3.0. It's going to be tricky for various reasons For example the re module still doesn't support bytes. From barry at python.org Thu May 8 01:24:32 2008 From: barry at python.org (Barry Warsaw) Date: Wed, 7 May 2008 19:24:32 -0400 Subject: [Python-3000] Releasing alphas tonight Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, Just a reminder that I'm going to be cutting the releases tonight. Because of work, I didn't make the 6pm EDT goal, and now I have to run out for a few hours. I will send another message when I'm ready to start spinning the release, but figure it will be at about 10pm EDT. 
Please limit your commits between now and then to only things you absolutely know will improve stability and test passing. Thanks, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCI6MXEjvBPtnXfVAQIfgwP+P7XOTMwWex5+YwiOza0fEeUj5n8OJuxU ISK3p3Tas4tPM65eMCHk5vmIFOBfJDFyWBpNhGr+uKmaWMgiqtPX5fs6nMmkbkrY dWrfG5Mgth9U1hpR4/1y/p2W82DJX9exmnjYL2BxjZ/TGeZdbcpUcs6Cc/fpHKR/ wTQ3dagAPNA= =bDtn -----END PGP SIGNATURE----- From lists at cheimes.de Thu May 8 02:51:43 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 08 May 2008 02:51:43 +0200 Subject: [Python-3000] Releasing alphas tonight In-Reply-To: References: Message-ID: <48224E9F.40407@cheimes.de> Barry Warsaw schrieb: > Hi all, > > Just a reminder that I'm going to be cutting the releases tonight. > Because of work, I didn't make the 6pm EDT goal, and now I have to run > out for a few hours. I will send another message when I'm ready to > start spinning the release, but figure it will be at about 10pm EDT. > Please limit your commits between now and then to only things you > absolutely know will improve stability and test passing. The py3k branch has a major show stopper, It's leaking references to the max. ... test_builtin leaked [14, 14, 14, 14] references, sum=56 test_exceptions beginning 9 repetitions 123456789 ......... test_exceptions leaked [40, 40, 40, 40] references, sum=160 test_types beginning 9 repetitions 123456789 ......... test_types leaked [2, 2, 2, 2] references, sum=8 test_unittest beginning 9 repetitions 123456789 ......... test_unittest leaked [23, 23, 23, 23] references, sum=92 ... I'm trying to find the issue. 
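The "leaked [...] references" lines above come from the test runner's reference-leak mode, which repeats each test and compares reference counts between runs. What follows is only a rough sketch of that idea, not regrtest's actual code: the helper names are hypothetical, and `sys.getrefcount` on a single object stands in for the interpreter-wide total that a debug build exposes. It assumes CPython.

```python
import sys

_cache = []
_token = object()


def leaky_test():
    # Simulated leak: each run stores one more reference that is never freed.
    _cache.append(_token)


def clean_test():
    # The temporary reference is gone again before the function returns.
    tmp = [_token]
    del tmp


def refleak_deltas(test, warmup=5, tracked=4):
    """Run `test` warmup + tracked times; return deltas for the tracked runs."""
    deltas = []
    for i in range(warmup + tracked):
        before = sys.getrefcount(_token)
        test()
        after = sys.getrefcount(_token)
        if i >= warmup:
            deltas.append(after - before)
    return deltas


for test in (leaky_test, clean_test):
    deltas = refleak_deltas(test)
    if any(deltas):
        print("%s leaked %s references, sum=%d"
              % (test.__name__, deltas, sum(deltas)))
```

The warm-up runs are there so that one-time allocations such as caches and lazily created objects are not reported as leaks.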
Christian From lists at cheimes.de Thu May 8 03:25:01 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 08 May 2008 03:25:01 +0200 Subject: [Python-3000] Releasing alphas tonight In-Reply-To: <48224E9F.40407@cheimes.de> References: <48224E9F.40407@cheimes.de> Message-ID: <4822566D.7080207@cheimes.de> Christian Heimes schrieb: > The py3k branch has a major show stopper, It's leaking references to the > max. Fixed ;) Christian From musiccomposition at gmail.com Thu May 8 04:21:13 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Wed, 7 May 2008 21:21:13 -0500 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: References: Message-ID: <1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com> Can I go ahead and remove this then? > > > It seems that os.walk has more options and a cleaner interface to > > walking trees than os.path.walk does. Is there support for the removal > > this in Py3k? > > > > -- > > Cheers, > > Benjamin Peterson -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From guido at python.org Thu May 8 05:12:46 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 7 May 2008 20:12:46 -0700 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: <1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com> References: <1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com> Message-ID: On Wed, May 7, 2008 at 7:21 PM, Benjamin Peterson wrote: > Can I go ahead and remove this then? Yes, but let's do it after Barry has released the alphas. > > > It seems that os.walk has more options and a cleaner interface to > > > walking trees than os.path.walk does. Is there support for the removal > > > this in Py3k? > > > > > > -- > > > Cheers, > > > Benjamin Peterson > > > > -- > Cheers, > Benjamin Peterson > "There's no place like 127.0.0.1." 
> -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Thu May 8 06:36:00 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 00:36:00 -0400 Subject: [Python-3000] Releasing alphas tonight In-Reply-To: <4822566D.7080207@cheimes.de> References: <48224E9F.40407@cheimes.de> <4822566D.7080207@cheimes.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 7, 2008, at 9:25 PM, Christian Heimes wrote: > Christian Heimes schrieb: >> The py3k branch has a major show stopper, It's leaking references >> to the >> max. > > Fixed ;) Thanks! Folks, I apologize. I had some system problems tonight so I fell behind on the release. I just applied Antoine's patch for bug 2507 and I'd like to make sure the buildbots complete. Other than that, the only other release critical is one for the release process. I'll complete the releases tomorrow morning (EDT) so in the meantime, please refrain from committing anything. Thanks, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCKDMXEjvBPtnXfVAQL1CgP9Heg1XlNjpM3wgT4N8PK090HnaGIJ6MzH Fs3QtngvLB/YPf31VrkYILIIMG/YBs+yqCZFziuSR2alNYNBcvwNVfpIljMuq9AM qtj+cu2vbhkoh+gR8LjM1J8ZWAKhI5G6eAxHuGlTWykdumcSllkB6xW4uLQ2RolZ eOS5Avnc/Qs= =kJdD -----END PGP SIGNATURE----- From barry at python.org Thu May 8 06:36:22 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 00:36:22 -0400 Subject: [Python-3000] [Python-Dev] Releasing alphas tonight In-Reply-To: <001801c8b0ac$9b902dc0$0200a8c0@whiterabc2znlh> References: <48224E9F.40407@cheimes.de> <001801c8b0ac$9b902dc0$0200a8c0@whiterabc2znlh> Message-ID: <6AB977B4-8300-4A07-B7CF-F2403C5D4156@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 7, 2008, at 9:41 PM, Hirokazu Yamamoto wrote: > Hello. > >> The py3k branch has a major show stopper, It's leaking references >> to the >> max. > > Is there any chance this leak also will be fixed? 
> http://bugs.python.org/issue2222 Not for the alphas, sorry. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCKDRnEjvBPtnXfVAQKjGAP8CFKRDMBYJC8dm+tR/nHucrRa/Nqfy977 I8rx/B5QWN+feBk6LhODaEQ2NPOQaF+iSTaDnOlF9f2+Z6m85b94zsLJPY9EoiAC qdNmYBmZWYtuzvLmCh5Ef2aCjtfbn4Ik8i3SR9amQJBhuq7ubbdYVUsbcy6HCUUV K2Xp8LV1HWM= =aYB/ -----END PGP SIGNATURE----- From barry at python.org Thu May 8 13:32:42 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 07:32:42 -0400 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <20080508043520.B60821E400E@bag.python.org> References: <20080508043520.B60821E400E@bag.python.org> Message-ID: <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 8, 2008, at 12:35 AM, raymond.hettinger wrote: > Author: raymond.hettinger > Date: Thu May 8 06:35:20 2008 > New Revision: 62848 > > Log: > Frozensets do not benefit from autoconversion. Since the trunk buildbots appear to be mostly happy (well those that are connected anyway), and because I couldn't get the releases out last night, I'll let this one slide. I'd like to find a way to more forcefully enforce commit freezes for the betas though. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCLk2nEjvBPtnXfVAQJxiwP/VPTmeKVLoKkc/xIF0tc/lb6pT7kZ0swL b1M2TUkl/+xOuKf3J2EIkHOiKdNNmivl80nG/wP9/VTa7lVJGnWgIeLi0yC20Q9n wvtHaXCrHDc4/ibiShjwYqD4YR0BGwJI7BrlyCYzohbjFK6QYsxd+5a96Cipb/cB +K/Akjqry4Q= =xQfb -----END PGP SIGNATURE----- From musiccomposition at gmail.com Thu May 8 13:54:35 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 8 May 2008 06:54:35 -0500 Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> Message-ID: <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw wrote: > Since the trunk buildbots appear to be mostly happy (well those that are > connected anyway), and because I couldn't get the releases out last night, > I'll let this one slide. I'd like to find a way to more forcefully enforce > commit freezes for the betas though. I wonder if you couldn't alter the server side commit hook to reject everything with the message "Sorry, we're in a freeze." (You'd have to make an exception for yourself.) -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." 
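The server-side hook suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: the flag-file path and the exempt account name are hypothetical, and nothing here is taken from the actual python.org configuration. It relies only on two pieces of standard Subversion behaviour: a pre-commit hook is called with the repository path and the transaction name, and `svnlook author -t` reports who is committing.

```python
import os
import subprocess
import sys

# Hypothetical names, invented for this sketch.
FREEZE_FLAG = "/var/svn/python/FREEZE"   # the file's existence means "frozen"
RELEASE_MANAGERS = {"barry.warsaw"}      # accounts still allowed to commit


def check_commit(author, frozen, exempt=RELEASE_MANAGERS):
    """Return (exit_code, message); a non-zero code rejects the commit."""
    if frozen and author not in exempt:
        return 1, "Sorry, we're in a freeze."
    return 0, ""


def commit_author(repos, txn):
    """Ask svnlook who is making the pending transaction."""
    out = subprocess.check_output(["svnlook", "author", "-t", txn, repos])
    return out.decode("utf-8").strip()


def main(argv):
    # Subversion invokes the hook as: pre-commit REPOS-PATH TXN-NAME
    repos, txn = argv[1], argv[2]
    code, message = check_commit(commit_author(repos, txn),
                                 os.path.exists(FREEZE_FLAG))
    if message:
        # Text written to stderr is relayed to the committing client.
        sys.stderr.write(message + "\n")
    return code


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Keeping the decision in `check_commit` separate from the `svnlook` call means the policy can be tested without a repository, and lifting the freeze is just a matter of removing the flag file.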
From barry at python.org Thu May 8 13:59:50 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 07:59:50 -0400 Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> Message-ID: <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 8, 2008, at 7:54 AM, Benjamin Peterson wrote: > On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw wrote: >> Since the trunk buildbots appear to be mostly happy (well those >> that are >> connected anyway), and because I couldn't get the releases out last >> night, >> I'll let this one slide. I'd like to find a way to more forcefully >> enforce >> commit freezes for the betas though. > > I wonder if you couldn't alter the server side commit hook to reject > everything with the message "Sorry, we're in a freeze." (You'd have to > make an exception for yourself.) This is exactly what I'm thinking about! - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCLrNnEjvBPtnXfVAQITDwP/WGqlRHSfvE668clPM3gshhYbAapZcF+e mNKGwu407/q03LYRqHr2QY0gBxsySJBWl5OsozmJUOTc7NEY/E/MtiauauzCJiyO 24sJ2V52aROwYBLG+4tLFcaGmWmnsWPg79Qj/yJQKMMiH5OznPfagLECOjlwDZZA ianWqOZxeYc= =xyD7 -----END PGP SIGNATURE----- From barry at python.org Thu May 8 14:03:54 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 08:03:54 -0400 Subject: [Python-3000] Fwd: [issue2547] Py30a4 RELNOTES only cover 30a1 and 30a2 References: <1210247710.02.0.210073560245.issue2547@psf.upfronthosting.co.za> Message-ID: <88A090EF-7B71-46F7-9F08-560FCAE07762@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Begin forwarded message: > From: "Barry A. 
Warsaw" > Date: May 8, 2008 7:55:10 AM EDT > To: barry at python.org > Subject: [issue2547] Py30a4 RELNOTES only cover 30a1 and 30a2 > Reply-To: Tracker > > > Barry A. Warsaw added the comment: > > I've updated the release script to at least touch RELNOTES, but I'm > unsure as to what the policy is for updating the content of this file. > I'm closing this issue but will bring it up on the mailing list. > > __________________________________ > Tracker > > __________________________________ So there was a release critical issue open about making sure to update Py3k's RELNOTES file. I've updated the release script so that I'll be sure to edit this file, however I'm not sure what the policy is on updating it. Would you expect me to update it and if so, from what data source? Do we list all open critical bugs on the Py3k tracker? All open PEPs? I'd like to ask everyone doing Py3k development to help pitch in and keep this file up-to-date. I think this will be more important as we move to beta releases starting next cycle. Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCLsK3EjvBPtnXfVAQJdsAP8CFBVoUk6Zubmw5NWfOywWQH5kg1oLcm4 mhXm5kGKcPvouNphOs6P4UxqG3l8/Fib0cD5TLCx6SFDCDwamuPSogLBGvCxFpZu ztjMGyVWNraxxHDgQ1suq1LvOItIMeA6SHqozRpNJ+UchfaEPu8weRSWT0VGB/bN qoYTQ8rPWwQ= =poFO -----END PGP SIGNATURE----- From lists at cheimes.de Thu May 8 14:21:54 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 08 May 2008 14:21:54 +0200 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> Message-ID: <4822F062.7090305@cheimes.de> Barry Warsaw schrieb: > This is exactly what I'm thinking about! 
-1

A technical solution never solves a social problem. It's just going to
cause more social and technical problems.

All community members with svn write privileges must subscribe to the
Python developer list. Committers must check the lists prior to a check
in if a release is imminent. Releases are announced at least four days
prior to svn freeze so it's not going to be a problem. The problem often
lies with occasional committers and maintainers of stdlib packages.
People need to show more discipline or eventually we have to
(temporarily) revoke their privileges.

Christian

From barry at python.org Thu May 8 15:20:42 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 09:20:42 -0400
Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c
In-Reply-To: <4822F062.7090305@cheimes.de>
References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de>
Message-ID: <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 8, 2008, at 8:21 AM, Christian Heimes wrote:

> Barry Warsaw schrieb:
>> This is exactly what I'm thinking about!
>
> -1
>
> A technical solution never solves a social problem. It's just going to
> cause more social and technical problems.

In this case I disagree. Given our global nature and the vast amounts
of email we all get, I think a friendly little svn commit hook reminder
is a simple and workable solution.

This commit lock really doesn't need to be in place for very long.
Optimistically, I only need it long enough to create the tags, which
/normally/ should take me 10 minutes.

> All community members with svn write privileges must subscribe to the
> Python developer list. Committers must check the lists prior to a check
> in if a release is imminent.
Releases are announced at least four days > prior to svn freeze so it's not going to be a problem. The problem > often > lies with occasional committers and maintainers of stdlib packages. > People need to show more discipline or eventually we have to > (temporarily) revoke their privileges. Or aggressively back out any changes from freeze time to tag time. If we don't add the commit hook lock, I will be very strict about this come the betas. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCL+KnEjvBPtnXfVAQIkxgQAqXXwZjHyI93L1xEvrIPYGkTugxlgEva/ bj9ip59XqB6EYS8NnciJU29WZhcc3WnEoOsdWk7qwYV0qOc2YOgYh775GF4Q2S/A 5qVw+oePFIGCWMhezVG/JYph8V6T0QL36hhgd78WqBJKa2C7IpKEjh3HATwY8DQL nouyqdmIDJo= =Vohh -----END PGP SIGNATURE----- From ncoghlan at gmail.com Thu May 8 15:23:11 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 08 May 2008 23:23:11 +1000 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <4822F062.7090305@cheimes.de> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de> Message-ID: <4822FEBF.9060800@gmail.com> Christian Heimes wrote: > Barry Warsaw schrieb: >> This is exactly what I'm thinking about! > > -1 > > A technical solution never solves a social problem. It's just going to > cause more social and technical problems. > > All community members with svn write privileges must subscribe to the > Python developer list. Committers must check the lists prior to a check > in if a release is immanent. Releases are announced at least four days > prior to svn freeze so it's not going to be a problem. The problem often > lies with occasional committers and maintainers of stdlib packages. > People need to show more discipline or eventually we have to > (temporarily) revoke their privileges. 
It's actually the time zone issues that get me in relation to code freezes... so I just try to avoid committing anything for a day or two :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From murman at gmail.com Thu May 8 15:41:20 2008 From: murman at gmail.com (Michael Urman) Date: Thu, 8 May 2008 08:41:20 -0500 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de> <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> Message-ID: On Thu, May 8, 2008 at 8:20 AM, Barry Warsaw wrote: > Or aggressively back out any changes from freeze time to tag time. If we > don't add the commit hook lock, I will be very strict about this come the > betas. I know this way is fairly entrenched in the python release process, but it sounds like it's using the tools incorrectly. In particular with subversion is very easy (compared to cvs) to branch and to switch branches locally. Why not create a new prerelease branch at the beginning of freeze and only merge in the critical changes? This way only the release manager need know or care about the branch, and nobody else has to really modify his behavior. Then tag, move, and/or delete the branch as desired. The obvious stumbling blocks include buildbots not following the new branch (this could be a blocker), and release scripts possibly needing modifications if they contain direct svn url references. 
-- Michael Urman From barry at python.org Thu May 8 15:51:29 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 09:51:29 -0400 Subject: [Python-3000] Freeze lifted Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've created the tags for 3.0a5 and 2.6a3, and the tarballs look good, so I'm lifting the commit freeze for these two branches. Thanks everyone, and look for the release announcements in a little while. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCMFYXEjvBPtnXfVAQKwyAP/bVnUtGzHMaJcwdc6BZR+kZJ0M22k/Vbp Nk1IfPts3HPKC7cNWzEkpWlqeXnGC0piuqDGrv2igY2Ori7LVMaTOea1xj8L1KqA QxiSHT0qtkW9J/io/q3Vw4cdXjshUQahSVPL2upafmCF1ROGDM0IKODq6kzjxgGV I8XI4BciN20= =NIY+ -----END PGP SIGNATURE----- From barry at python.org Thu May 8 16:24:16 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 10:24:16 -0400 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de> <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> Message-ID: <694A97D7-5CB1-4DC4-B17F-2B157DE89CF4@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 8, 2008, at 9:41 AM, Michael Urman wrote: > On Thu, May 8, 2008 at 8:20 AM, Barry Warsaw wrote: >> Or aggressively back out any changes from freeze time to tag time. >> If we >> don't add the commit hook lock, I will be very strict about this >> come the >> betas. > > I know this way is fairly entrenched in the python release process, > but it sounds like it's using the tools incorrectly. In particular > with subversion is very easy (compared to cvs) to branch and to switch > branches locally. Why not create a new prerelease branch at the > beginning of freeze and only merge in the critical changes? 
This way > only the release manager need know or care about the branch, and > nobody else has to really modify his behavior. Then tag, move, and/or > delete the branch as desired. > > The obvious stumbling blocks include buildbots not following the new > branch (this could be a blocker), and release scripts possibly needing > modifications if they contain direct svn url references. I definitely think we'd want the buildbots to track the release branches, and it's a bit of a pain to get the release scripts to deal with the svn switches. Right now I think the freeze window is pretty short (barring unforeseen networking snafus) that it's not worth it. However, once the release process is smooth enough, maybe this little freeze hiccup will be worth eliminating. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCMNEHEjvBPtnXfVAQIDogP+NVpyE7AhUS1Eerqv/N+ERTuKnmy/rSNQ wQhOlAxlvx/lPgm0Mi70C9cA60ogxwGE+nJPf0RQxN2bVfhE/+fvElRl9x7xuoo3 wAK6/zzItqMCP4bpaT8sbsqn4tPB4OCKr0eM/SgZMxrHZkHHZwLTVAw81h40Fmr3 A30V6JpZpdU= =q3uu -----END PGP SIGNATURE----- From guido at python.org Thu May 8 18:24:35 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 8 May 2008 09:24:35 -0700 Subject: [Python-3000] Fwd: [issue2547] Py30a4 RELNOTES only cover 30a1 and 30a2 In-Reply-To: <88A090EF-7B71-46F7-9F08-560FCAE07762@python.org> References: <1210247710.02.0.210073560245.issue2547@psf.upfronthosting.co.za> <88A090EF-7B71-46F7-9F08-560FCAE07762@python.org> Message-ID: On Thu, May 8, 2008 at 5:03 AM, Barry Warsaw wrote: > So there was a release critical issue open about making sure to update > Py3k's RELNOTES file. I've updated the release script so that I'll be sure > to edit this file, however I'm not sure what the policy is on updating it. > Would you expect me to update it and if so, from what data source? Do we > list all open critical bugs on the Py3k tracker? All open PEPs? 
> > I'd like to ask everyone doing Py3k development to help pitch in and keep > this file up-to-date. I think this will be more important as we move to > beta releases starting next cycle. I believe I invented this file for the 3.0a1 release, when I realized that some things were broken but I didn't want to hold up the release any longer. I also kept adding to it for a while *after* the release, which seems odd, except that I also copied the updated contents to the website. Possibly making it public on the website is the main goal of the file -- it makes users aware of the top ten (say) "gotchas" without having to scan the bug tracker or ask the mailing list. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at egenix.com Thu May 8 18:52:02 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 08 May 2008 18:52:02 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> Message-ID: <48232FB2.3020205@egenix.com> On 2008-05-06 15:55, Atsuo Ishimoto wrote: > (I changed subject) > > Thank you for your comment. > > On Tue, May 6, 2008 at 8:45 PM, M.-A. Lemburg wrote: > >> For sys.stdout this doesn't make sense at all, since it hides encoding >> errors for all applications using sys.stdout as piping mechanism. >> -1 on that. > > You can raise UnicodeEncodigError for encoding errors if you want, by > setting sys.stdout's error-handler to `strict`. No, that's not a good idea. I don't want to change every single affected application just to make sure that they don't write corrupt data to stdout. >> Both are really way beyond the scope of the PEP and I don't >> really see the need for them. > > Even though this PEP was rejected, You mean PEP 3138 was rejected ?? > I'll still propose to change > default error-handler for sys.stdout and for sys.stderr to > 'backslashreplace'. 
> For Python 2, 'strict' error-handler is acceptable
> because most of text data are 8-bit string, but for Py3K, raising
> exceptions when the printed text contains a character not supported by
> console is annoying.

Well, "annoying" is not good enough for such a big change :-)

Please also consider the different situations you are addressing:

* console output (ie. printing)
* stdout file output (ie. piping)
* interactive session use (ie. running print at the Python prompt)

The backslashreplace idea may have some merits in interactive
Python sessions or IDLE, but it hides encoding errors in all
other situations.

>> They also don't cover the cases
>> where you write the repr() to a log file, some stream or syslog.
>
> Sure. I missed some cases, such as cgitb module or logging module.
> I'll investigate them later. If you have another candidate, please let
> me know.

You have to address the general use cases, not just specific
implementations in the Python stdlib - those can easily be changed,
but doing the same in all the existing code out there that wants
to get ported to Py3k is a different issue.

I'm not against changing the repr() of Unicode objects, but
please make sure that this change does not break debugging Python
applications, whether you're debugging an app using 'print'
statements, piping repr() through a socket to a remote debugger
or writing information to a log file. The important factor to take
into account is the other end that will receive the data.

BTW: One problem that your PEP doesn't address, which I mentioned
on the ticket:

By putting all printable chars into the repr() you lose the
ability to actually see the number of code points you have in a
Unicode string.

A Unicode-aware editor, shell or pager will display the data as
glyphs and not as code points, ie. glyphs expressed using combining
code points will appear as one "character" to the user - even though
the Unicode object contains multiple code points.
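A minimal sketch of the two concerns raised so far, assuming a current Python 3 interpreter: `ascii()` stands in here for the hex-escaped style of output, and the sample strings are chosen purely for illustration.

```python
import unicodedata

# Two spellings of the same visible glyph.
composed = "\u00e9"       # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # two code points: "e" + COMBINING ACUTE ACCENT

# A Unicode-aware terminal renders both alike, yet the lengths differ.
print(len(composed), len(decomposed))

# Hex-escaped output makes the individual code points visible again.
print(ascii(composed), ascii(decomposed))

# NFC normalization shows the two spellings are canonically equivalent.
print(unicodedata.normalize("NFC", decomposed) == composed)

# 'strict' raises on characters the target encoding lacks, while
# 'backslashreplace' substitutes an escape and carries on without an error.
try:
    decomposed.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
escaped = decomposed.encode("ascii", "backslashreplace")
print(strict_failed, escaped)
```

The last two lines show the trade-off being argued over: with the lenient handler the write succeeds, but the receiving end gets escape sequences rather than an error that would flag the encoding mismatch.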
As a result, the length and any indexes you might use in the debugging session will not match what the user sees in his shell window. >>> - Characters defined in the Unicode character database as > [snip] >> This is all very nice, but if that means that the whole Unicode >> database has to be loaded every time the interpreter starts up >> as you indicated on the ticket, them I'm firmly -1 against that. > > I changed a patch to add a flag to the _PyUnicode_TypeRecords table, > so the Unicode database is not loaded at stat up. Thanks. Please name the property Py_UNICODE_ISPRINTABLE. Py_UNICODE_ISHEXESCAPED isn't all that intuitive. And also add your definition from the PEP to unicodectype.c - since this is not a Unicode standard. I'd also appreciate if you could make that property available as Unicode method, e.g. .isprintable(). This addition is good on its own. >> I proposed to make the Unicode repr() output a regular encoding >> that's being implemented by a codec. You could then easily >> change the encoding to whatever you need for your application >> or console. > > I think global setting is not flexible enough. And I see no benefit to > customizable repr() except to keep compatible with Python 2, but I > think it is easy to migrate the existing code to the Py3k. That's what I don't see in your PEP. How can things easily be changed so that it's possible to get the Py2.x style hex escaping back into Py3k without having to change all repr() calls and %r format markers for Unicode objects ? I can see your point with it being easier to read e.g. German, Japanese or Korean data, but it still has to be possible to use repr() for proper debugging which allows the user to actually see what is stored in a Unicode object in terms of code points. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Thu May 8 19:18:06 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 08 May 2008 19:18:06 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482066E3.7030209@gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> Message-ID: <482335CE.7000309@egenix.com> On 2008-05-06 16:10, Nick Coghlan wrote: > Atsuo Ishimoto wrote: >>> I proposed to make the Unicode repr() output a regular encoding >>> that's being implemented by a codec. You could then easily >>> change the encoding to whatever you need for your application >>> or console. >> >> I think global setting is not flexible enough. And I see no benefit to >> customizable repr() except to keep compatible with Python 2, but I >> think it is easy to migrate the existing code to the Py3k. > > There's a bigger issue with trying to make whatever repr() does a codec > in Py3k. As a Unicode->Unicode transformation, it doesn't mesh well with > Py3k's strict Unicode->bytes/bytes->Unicode encoding/decoding philosophy. > > That said, it would be nice to have a way to easily stack > Unicode->Unicode transforms on top of text IO streams, or byte->byte > transforms on top of binary streams. +1 Here's what I wrote on the ticket for the PEP. 
I wasn't aware of that change, otherwise, I'd have commented on this earlier: > On 2008-05-06 19:10, Guido van Rossum wrote: >> Guido van Rossum added the comment: >> >> On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote: >>> So you've limited the codec design to just doing Unicode<->bytes >>> conversions ? >> >> Yes. This was quite a conscious decision that was not taken lightly, >> with lots of community input, quite a while ago. >> >>> The original codec design was to have the codec decide which >>> types to take on input and to generate on output, e.g. to >>> escape characters in Unicode (converting Unicode to Unicode), >>> work on compressed 8-bit strings (converting 8-bit strings to >>> 8-bit strings), etc. >> >> Unfortunately this design made it hard to reason about the correctness >> of code, since (especially in Py3k, where bytes and str are more >> different than str and unicode were in 2.x) it's hard to write code >> that uses .encode() or .decode() unless it knows which codec is being >> used. >> >> IOW, when translated to 3.0, the design violates the general design >> principle that the *type* of a function's or method's return value >> should not depend on the *value* of one of the arguments. > > I understand where this concept originates and usual apply this > rule to software design as well, however, in the particular case > of codecs, the codec registry and its helper functions are merely > interfaces to code that is defined elsewhere. > > In comparison, the approach is very much like getattr() - you know > what the attribute is called, but know nothing about its type > until you receive it from the function. > > The reason codecs where designed like this was to be able to > easily stack them. For this to work, only the interfaces need > to be defined, without restricting the codecs too much in terms > of which types may be used. 
> > I'd suggest to lift the type restrictions from the general > codecs.c access APIs (PyCodec_*), since they don't really belong > there and instead only impose the limitation on PyUnicode and > PyString methods .encode() and .decode(). > > If you then also allow those methods to return *both* > PyUnicode and PyString, you'd still have strong typing > (only 1 of two possible types is allowed) and stacking > streams or having codecs that work on PyUnicode->PyUnicode > or PyString->PyString would still be accessible via > .encode()/.decode(). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 08 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From g.brandl at gmx.net Thu May 8 21:03:45 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 08 May 2008 21:03:45 +0200 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de> <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> Message-ID: Barry Warsaw schrieb: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On May 8, 2008, at 8:21 AM, Christian Heimes wrote: > >> Barry Warsaw schrieb: >>> This is exactly what I'm thinking about! 
>> >> -1 >> >> A technical solution never solves a social problem. It's just going to >> cause more social and technical problems. > > In this case I disagree. Given our global nature and the vast amounts > of email we all get, I think a friendly little svn commit hook > reminder is a simple and workable solution. While I'm +0 on the commit hook, it would help if a mail that announces a freeze would - not be hidden in a thread on python-dev and - have a easily recognizable title, like "[TRUNK FREEZE] ....". Georg From tjreedy at udel.edu Thu May 8 22:49:14 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 8 May 2008 16:49:14 -0400 Subject: [Python-3000] [Python-checkins] r62848 -python/trunk/Objects/setobject.c References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de><617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> Message-ID: Given that we cannot depend on timely mail/news propagation or on exact day-ahead scheduling of a freeze, a current freeze notice either from the repository or on a .../dev/status page might work better. From tjreedy at udel.edu Thu May 8 22:55:38 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 8 May 2008 16:55:38 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> Message-ID: Functions that map unicode->unicode or bytes->bytes could be called transcoders. Each type could be given a .transcode method to go along with but contrast with .encode or .decode. 
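A minimal sketch of the distinction being drawn here, added for illustration and not part of the original message. It assumes a later Python 3 release (3.4 or newer, as far as I know), where same-type transforms are reached through `codecs.encode()` and `codecs.decode()` rather than through a `.transcode` method; no such method is confirmed by this thread. The variable names are hypothetical.

```python
import codecs

# str -> str transform: "rot_13" is a text-to-text codec in the stdlib.
scrambled = codecs.encode("Hello", "rot_13")
restored = codecs.decode(scrambled, "rot_13")

# bytes -> bytes transform: "zlib_codec" compresses and decompresses.
payload = b"spam" * 10
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# The str.encode() method stays restricted to str -> bytes codecs,
# so asking it for a same-type transform is rejected.
try:
    "Hello".encode("rot_13")
    method_allowed = True
except LookupError:
    method_allowed = False

print(scrambled, restored, unpacked == payload, method_allowed)
```

The point of the sketch is only that the codec registry can serve same-type transforms while the `str` and `bytes` methods keep a fixed return type.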
tjr From barry at python.org Fri May 9 01:50:06 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 8 May 2008 19:50:06 -0400 Subject: [Python-3000] RELEASED Python 2.6a3 and 3.0a5 Message-ID: <88DFD025-8670-42FA-9B73-AFF5193FB0AE@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team and the Python community, I am happy to announce the third alpha release of Python 2.6, and the fifth alpha release of Python 3.0. Please note that these are alpha releases, and as such are not suitable for production environments. We continue to strive for a high degree of quality, but there are still some known problems and the feature sets have not been finalized. These alphas are being released to solicit feedback and hopefully discover bugs, as well as allowing you to determine how changes in 2.6 and 3.0 might impact you. If you find things broken or incorrect, please submit a bug report at http://bugs.python.org For more information and downloadable distributions, see the Python 2.6 website: http://www.python.org/download/releases/2.6/ and the Python 3.0 web site: http://www.python.org/download/releases/3.0/ These are the last planned alphas for both versions. If all goes well, next month will see the first beta releases of both, which will also signal feature freeze. Two beta releases are planned, with the final releases scheduled for September 3, 2008. 
See PEP 361 for release details: http://www.python.org/dev/peps/pep-0361/ Enjoy, - -Barry Barry Warsaw barry at python.org Python 2.6/3.0 Release Manager (on behalf of the entire python-dev team) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCORrnEjvBPtnXfVAQIK+QQAgEUtAvW7uo0BxMiT1bCAo2E9ZecWJ9xe DBgd/5IK8moITkqhqGAH5UvfytV6uPkOMgGIS/Uvk4hzhU3jwSopEIDJLFQ5nGtC lCzOHzkDjSNZ8Q2OOAI9mbSHY8grvVxCMB4X2SVXIEMZ6M/X1AcV2b0utp9O1w/l T/PEvP8U1uY= =2Tnb -----END PGP SIGNATURE----- From g.brandl at gmx.net Fri May 9 08:02:02 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 09 May 2008 08:02:02 +0200 Subject: [Python-3000] [Python-checkins] r62848 -python/trunk/Objects/setobject.c In-Reply-To: References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de><617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> Message-ID: Terry Reedy schrieb: > Given that we cannot depend on timely mail/news propagation or on exact > day-ahead scheduling of a freeze, a current freeze notice either from the > repository or on a .../dev/status page might work better. Nobody is going to look at such a page before making a commit :) Georg From humberto at digi.com.br Fri May 9 09:45:42 2008 From: humberto at digi.com.br (Humberto Diogenes) Date: Fri, 9 May 2008 04:45:42 -0300 Subject: [Python-3000] Removal of os.path.walk In-Reply-To: References: <1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com> Message-ID: <5ED2EEE7-2846-45A9-90DF-51765E829328@digi.com.br> On 08/05/2008, at 00:12, Guido van Rossum wrote: > On Wed, May 7, 2008 at 7:21 PM, Benjamin Peterson > wrote: >> Can I go ahead and remove this then? > > Yes, but let's do it after Barry has released the alphas. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) Hi, Benjamin! 
I noticed you've already removed os.path.walk in r62909, but there are still some references to it in the code, as I found when issuing a `make altinstall` on a Mac:

  AttributeError: 'module' object has no attribute 'walk'

References in .py files:

./Mac/scripts/cachersrc.py:42: os.path.walk(dir, handler, (verbose, force))
./Mac/scripts/zappycfiles.py:25: os.path.walk(dir, walker, None)
./Mac/Tools/Doc/setup.py:112: os.path.walk(self.build_html, self.visit, None)
./setup.py:1577: os.path.walk(dirname, self.set_dir_modes_visitor, mode)
./Tools/i18n/pygettext.py:344: os.path.walk(name, _visit_pyfiles, list)
./Tools/scripts/findlinksto.py:25: os.path.walk(dirname, visit, prog)
./Tools/versioncheck/checkversions.py:34: os.path.walk(tree, check1dir, None)

Maybe it would be nice to include some tips about the translation from os.path.walk to os.walk in the migration notes, too.

Thanks!

--
Humberto Diógenes
http://humberto.digi.com.br

From stephen at xemacs.org Fri May 9 10:11:08 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 09 May 2008 17:11:08 +0900
Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c
In-Reply-To:
References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de> <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
Message-ID: <8763tnnbhf.fsf@uwakimon.sk.tsukuba.ac.jp>

Michael Urman writes:

> I know this way is fairly entrenched in the python release process,
> but it sounds like it's using the tools incorrectly. In particular
> with subversion it is very easy (compared to cvs) to branch and to switch
> branches locally. Why not create a new prerelease branch at the
> beginning of freeze and only merge in the critical changes?
Well, speaking from experience:

- some of the "critical changes" may only get committed on the release branch
- something different from what's in the mainline may get committed on the release branch
- the milestones are on a sideline, not on the mainline.

Getting these points right is essential to ensure that the beta testers' work is actually relevant to the development process, that bisection searches work correctly, etc.

> only the release manager need know or care about the branch, and
> nobody else has to really modify his behavior.

Behavior modification is the main point of having a release cycle. Setting deadlines, changing the nature of the patches, bringing issues to closure, etc. A release without a freeze is like a sentence without a period, IMO.

From humberto at digi.com.br Fri May 9 10:27:56 2008
From: humberto at digi.com.br (Humberto Diogenes)
Date: Fri, 9 May 2008 05:27:56 -0300
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <5ED2EEE7-2846-45A9-90DF-51765E829328@digi.com.br>
References: <1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com> <5ED2EEE7-2846-45A9-90DF-51765E829328@digi.com.br>
Message-ID: <5E8DADB0-DCAD-4B5A-8910-CE8C7357FCE0@digi.com.br>

On 09/05/2008, at 04:45, Humberto Diogenes wrote:

> I noticed you've already removed os.path.walk in r62909, but there
> are still some references to it in the code, as I noticed issuing a
> `make altinstall` on a Mac:
> AttributeError: 'module' object has no attribute 'walk'

Here's the fix for the installation issue:

Index: setup.py
===================================================================
--- setup.py (revision 62932)
+++ setup.py (working copy)
@@ -1574,13 +1574,10 @@
     def set_dir_modes(self, dirname, mode):
         if not self.is_chmod_supported(): return
-        os.path.walk(dirname, self.set_dir_modes_visitor, mode)
+        for root, dirs, files in os.walk(dirname):
+            log.info("changing mode of %s to %o" % (root, mode))
+            if not self.dry_run: os.chmod(root, mode)
 
-    def set_dir_modes_visitor(self, mode, dirname, names):
-        if os.path.islink(dirname): return
-        log.info("changing mode of %s to %o", dirname, mode)
-        if not self.dry_run: os.chmod(dirname, mode)
-
     def is_chmod_supported(self):
         return hasattr(os, 'chmod')

I don't even know if this is really necessary, as it seems to run in one directory only:

changing mode of /usr/local/lib/python3.0/lib-dynload/ to 755

--
Humberto Diógenes
http://humberto.digi.com.br

From mal at egenix.com Fri May 9 12:44:01 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 09 May 2008 12:44:01 +0200
Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848 - python/trunk/Objects/setobject.c
In-Reply-To: <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
Message-ID: <48242AF1.906@egenix.com>

On 2008-05-08 13:59, Barry Warsaw wrote:
> On May 8, 2008, at 7:54 AM, Benjamin Peterson wrote:
>
>> On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw wrote:
>>> Since the trunk buildbots appear to be mostly happy (well those that are
>>> connected anyway), and because I couldn't get the releases out last
>>> night,
>>> I'll let this one slide. I'd like to find a way to more forcefully
>>> enforce
>>> commit freezes for the betas though.
>
>> I wonder if you couldn't alter the server side commit hook to reject
>> everything with the message "Sorry, we're in a freeze." (You'd have to
>> make an exception for yourself.)
>
> This is exactly what I'm thinking about!

+1, that's easy to do with Subversion and doesn't hurt anyone.

Please also use a term like "freeze" or "frozen" in the subject line of the announcement - perhaps even in capital letters.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 09 2008)
>>> Python/Zope Consulting and Support ...
http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Fri May 9 12:54:02 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 09 May 2008 12:54:02 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> Message-ID: <48242D4A.3060802@egenix.com> On 2008-05-08 22:55, Terry Reedy wrote: > Functions that map unicode->unicode or bytes->bytes could be called > transcoders. Each type could be given a .transcode method to go along with > but contrast with .encode or .decode. Are you suggesting to have two separate methods which then allow same-type-conversions ? One for encoding to the same type and one for decoding ? Fine with me. They do have to map naturally to the codec method encode and decode, though, so a single method won't do, unless maybe you add a parameter to define the direction of the coding process. In summary, I'd just like to see the following happen: * revert the type restrictions on the PyCodec_* API * enforce the restrictions on the .encode() and .decode() methods of PyUnicode and PyString objects (str and bytes) * add a way to PyUnicode and PyString objects (str and bytes) to allow same type encoding and decoding Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 09 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From barry at python.org Fri May 9 14:22:32 2008 From: barry at python.org (Barry Warsaw) Date: Fri, 9 May 2008 08:22:32 -0400 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de> <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> Message-ID: <520D2894-2296-4C6D-97FF-8521520E8E81@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 8, 2008, at 3:03 PM, Georg Brandl wrote: > > While I'm +0 on the commit hook, it would help if a mail that > announces > a freeze would > - not be hidden in a thread on python-dev and > - have a easily recognizable title, like "[TRUNK FREEZE] ....". I will make the freeze announcement more recognizable in the future, but I also want to point out that the entire release schedule has been published far in advance in PEP 361. At this point, the freeze dates should come as no surprise. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCRCCnEjvBPtnXfVAQKJgAQAojZ5vIg2K4q4e+XEHogQKeFjxkh5+o6U eWDjmkeVImwe1Sylb+mCqrxQ7JNY6d1m35hQsna/Ghan1IVIQ857fCBXS84aIUGl AGAnbrzxAt7RoYz/dyhz2twf1Uui5OVGOCYnmZ3ExZhTrEHN7ze43C+Blir0sH+4 DCuDj4xmpMM= =6W75 -----END PGP SIGNATURE----- From skip at pobox.com Fri May 9 14:15:21 2008 From: skip at pobox.com (skip at pobox.com) Date: Fri, 9 May 2008 07:15:21 -0500 Subject: [Python-3000] Code Freeze - full or partial? In-Reply-To: <48242AF1.906@egenix.com> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <48242AF1.906@egenix.com> Message-ID: <18468.16473.206010.819980@montanaro-dyndns-org.local> In the past I seem to recall that the Python code proper might be frozen (for a day or two) before a release, but that it was okay to still commit changes to non-code files such as documentation or files in Misc. Is this still the case in the new release-early-release-often regime? Is the intention to make the duration of the code freeze so short (a few minutes or hours) that it's not worth the effort to make this distinction? Skip From barry at python.org Fri May 9 15:25:17 2008 From: barry at python.org (Barry Warsaw) Date: Fri, 9 May 2008 09:25:17 -0400 Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <48242AF1.906@egenix.com> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <48242AF1.906@egenix.com> Message-ID: <308D6BF1-936D-45F2-960B-0A42D04186A5@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 9, 2008, at 6:44 AM, M.-A. 
Lemburg wrote: > On 2008-05-08 13:59, Barry Warsaw wrote: >> On May 8, 2008, at 7:54 AM, Benjamin Peterson wrote: >>> On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw >>> wrote: >>>> Since the trunk buildbots appear to be mostly happy (well those >>>> that are >>>> connected anyway), and because I couldn't get the releases out >>>> last night, >>>> I'll let this one slide. I'd like to find a way to more >>>> forcefully enforce >>>> commit freezes for the betas though. >>> I wonder if you couldn't alter the server side commit hook to reject >>> everything with the message "Sorry, we're in a freeze." (You'd >>> have to >>> make an exception for yourself.) >> This is exactly what I'm thinking about! > > +1, that's easy to do with Subversion and doesn't hurt anyone. Agreed. Look for it for the first beta. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCRQvXEjvBPtnXfVAQKyLwP8D0AVX+jgvy04hM207eeWRZb3JcHMtZuP ZcOuBQsCsVFppCxAreYIwfa0e6TD2LHBV4uz/G7Nxt6qNI6SY7lHQezNg4RezFwJ e93HAGdD0djj4BrL/xCr0wrK6wCwjodcvcjFdqTjEdLnkS7KGM9ooW8ZdYjQp6jI E+ZLDdhQ/KY= =24yM -----END PGP SIGNATURE----- From barry at python.org Fri May 9 15:30:11 2008 From: barry at python.org (Barry Warsaw) Date: Fri, 9 May 2008 09:30:11 -0400 Subject: [Python-3000] Code Freeze - full or partial? 
In-Reply-To: <18468.16473.206010.819980@montanaro-dyndns-org.local> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <48242AF1.906@egenix.com> <18468.16473.206010.819980@montanaro-dyndns-org.local> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 9, 2008, at 8:15 AM, skip at pobox.com wrote: > In the past I seem to recall that the Python code proper might be > frozen > (for a day or two) before a release, but that it was okay to still > commit > changes to non-code files such as documentation or files in Misc. > Is this > still the case in the new release-early-release-often regime? Is the > intention to make the duration of the code freeze so short (a few > minutes or > hours) that it's not worth the effort to make this distinction? For the alphas, that's certainly been the case because it hasn't been necessary to coordinate all the Experts. IOW, it's okay for the Windows installer to get uploaded a few hours after the tarballs. For the betas, rcs and finals, I think we want a little bit more coordination (correct me if you disagree). So in that case, there may be a longer freeze. Even in that case, I don't envision more than a 24 hour freeze hopefully. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iQCVAwUBSCRR5HEjvBPtnXfVAQLa+gP8CL9koa5eGBvP8g+CA8l61SIuluHNbPkq SH7uOiPMeuIX392xy82ixnXjYTlCJn9epWouYkiWta3GA+ZaCcmTFFavZ3ZbLbE3 uxfzhCWsZ5EUW5/iDCOUrlEwuxXJ6FU4naRTaTCBTELXRKvb3sI5C2pFjrb6JTZc hP2hP6m+A2Y= =avCD -----END PGP SIGNATURE----- From steve at holdenweb.com Fri May 9 15:31:43 2008 From: steve at holdenweb.com (Steve Holden) Date: Fri, 09 May 2008 09:31:43 -0400 Subject: [Python-3000] [Python-checkins] r62848 - python/trunk/Objects/setobject.c In-Reply-To: <520D2894-2296-4C6D-97FF-8521520E8E81@python.org> References: <20080508043520.B60821E400E@bag.python.org> <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org> <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com> <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org> <4822F062.7090305@cheimes.de> <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org> <520D2894-2296-4C6D-97FF-8521520E8E81@python.org> Message-ID: Barry Warsaw wrote: > On May 8, 2008, at 3:03 PM, Georg Brandl wrote: > >> While I'm +0 on the commit hook, it would help if a mail that announces >> a freeze would >> - not be hidden in a thread on python-dev and >> - have a easily recognizable title, like "[TRUNK FREEZE] ....". > > I will make the freeze announcement more recognizable in the future, but > I also want to point out that the entire release schedule has been > published far in advance in PEP 361. At this point, the freeze dates > should come as no surprise. > A python-dev calendar on Google Calendars? 
That would give us one more warning to ignore :-) regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From eric+python-dev at trueblade.com Fri May 9 17:24:35 2008 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 09 May 2008 11:24:35 -0400 Subject: [Python-3000] Adding 'n' format presentation type to integers Message-ID: <48246CB3.7060504@trueblade.com> 'n' is like 'g', but adds locale-specific thousands separators. Issue 2802 (http://bugs.python.org/issue2802) points out that 'n' formatting isn't useful for integers, because it first converts to float. There's no way to get 1,000,000 as a result, since 'g' converts to '1e+06'. I propose adding 'n' as an integer format presentation type to PEP 3101. The definition would be: 'n' - Number. This is the same as 'd', except that it uses the current locale setting to insert the appropriate number separator characters. I already have the C code needed to implement this in Python/pystrtod.c (for floats), so it would just take some refactoring to get the integer formatter to use it. If there is agreement, I'll update the PEP and implement this in 2.6 and 3.0. Eric. From guido at python.org Fri May 9 18:06:59 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 9 May 2008 09:06:59 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <48242D4A.3060802@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> Message-ID: On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg wrote: > On 2008-05-08 22:55, Terry Reedy wrote: >> >> Functions that map unicode->unicode or bytes->bytes could be called >> transcoders. Each type could be given a .transcode method to go along with >> but contrast with .encode or .decode. > > Are you suggesting to have two separate methods which then > allow same-type-conversions ? 
One for encoding to the same > type and one for decoding ? > > Fine with me. > > They do have to map naturally to the codec method encode and > decode, though, so a single method won't do, unless maybe > you add a parameter to define the direction of the coding > process. > > In summary, I'd just like to see the following happen: > > * revert the type restrictions on the PyCodec_* API > > * enforce the restrictions on the .encode() and .decode() > methods of PyUnicode and PyString objects (str and bytes) > > * add a way to PyUnicode and PyString objects (str and bytes) > to allow same type encoding and decoding +1 -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ishimoto at gembook.org Fri May 9 19:23:07 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sat, 10 May 2008 02:23:07 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <48232FB2.3020205@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> Message-ID: <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> On Fri, May 9, 2008 at 1:52 AM, M.-A. Lemburg wrote: >>> For sys.stdout this doesn't make sense at all, since it hides encoding >>> errors for all applications using sys.stdout as piping mechanism. >>> -1 on that. >> >> You can raise UnicodeEncodigError for encoding errors if you want, by >> setting sys.stdout's error-handler to `strict`. > > No, that's not a good idea. I don't want to change every single > affected application just to make sure that they don't write > corrupt data to stdout. The changes you need to make for your applications will be so small that I don't think this is valid argument. And number of applications you need to change will be rather small. What you call "corrupt data" are just hex-escaped characters of foreign language. In most case, printing(or writing to file) such string doesn't harm, so I think raising exception by default is overkill. 
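To make the two positions concrete, here is a small illustrative sketch (not from the original message) of how the standard error handlers behave when text is encoded for a console that cannot represent a character. The sample string and variable names are hypothetical; ASCII stands in for a limited console encoding.

```python
text = "price: 10 \u20ac"  # the euro sign has no ASCII representation

# 'strict' refuses to encode and raises, which is the behavior
# defended for piped output.
try:
    text.encode("ascii", errors="strict")
    strict_result = "encoded"
except UnicodeEncodeError:
    strict_result = "UnicodeEncodeError"

# 'backslashreplace' keeps going and writes a hex escape instead.
escaped = text.encode("ascii", errors="backslashreplace")

# 'replace' substitutes '?', similar to the Java and C# behavior
# mentioned in this thread.
replaced = text.encode("ascii", errors="replace")

print(strict_result, escaped, replaced)
```

With 'backslashreplace' the code point is still recoverable from the output, whereas '?' loses it; whether that counts as hiding an error is exactly what the thread disagrees about.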
Java doesn't raise an exception for an encoding error, but just prints `?`. .NET languages such as C# also print '?'. Perl prints a hex-escaped string, as proposed in this PEP. >> Even though this PEP was rejected, > > You mean PEP 3138 was rejected ?? Er, I should have written "Even if this PEP was ...", perhaps. > Well, "annoying" is not good enough for such a big change :-) So? The annoyance of Perl was enough reason for me to change my entire language :-) > The backslashreplace idea may have some merits in interactive > Python sessions or IDLE, but it hides encoding errors in all > other situations. Encoding errors are not hidden, but are represented by hex-escaped strings. We can get much more information about the string being printed than by printing tracebacks. > I'm not against changing the repr() of Unicode objects, but > please make sure that this change does not break debugging > Python applications. Whether you're debugging an app using > 'print' statements, piping repr() through a socket to a remote > debugger or writing information to a log file. The important > factor to take into account is the other end that will receive > the data. I think your request is too vague to be completed. This proposal improves the currently broken debugging for me, and I see no information lost for debugging. But the "other end" may vary too much to say anything definite. > BTW: One problem that your PEP doesn't address, which I mentioned > on the ticket: > > By putting all printable chars into the repr() you lose the > ability to actually see the number of code points you have > in a Unicode string. > With the current repr(), I cannot get any information other than the number of code points. This is not what I want to know from printing repr(). For the length of the string, I'll just do print(len(s)). > > Please name the property Py_UNICODE_ISPRINTABLE. Py_UNICODE_ISHEXESCAPED > isn't all that intuitive. The name `Py_UNICODE_ISPRINTABLE` came to my mind at first, but I was not sure `printable` is the accurate word.
I'm okay with Py_UNICODE_ISPRINTABLE, but I'd like to hear opinions. If no one objects to Py_UNICODE_ISPRINTABLE, I'll go for it. > > How can things easily be changed so that it's possible to get the > Py2.x style hex escaping back into Py3k without having to change > all repr() calls and %r format markers for Unicode objects ? I didn't intend to imply "without having to change". Perhaps "migrate" was the wrong word and "port" may be better. As for repr() and the %r format, they are unlikely to need changing in most cases. They need to be changed only if pure ASCII is required even when your locale is capable of printing the strings. > I can see your point with it being easier to read e.g. German, > Japanese or Korean data, but it still has to be possible to > use repr() for proper debugging which allows the user to > actually see what is stored in a Unicode object in terms of > code points. You can see code points easily; the function I wrote in the PEP to convert such strings to the Python 2 repr() form is a good example. But I believe the ordinary use case prefers a readable string over code points. From dalcinl at gmail.com Fri May 9 23:35:12 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 9 May 2008 18:35:12 -0300 Subject: [Python-3000] about the status of PyNumberMethods Message-ID: Yesterday I was working on a patch for Cython to make the generated C code work from Python 2.3 to 2.6 and also 3.0. After four hours of carefully diving into the Python sources from 2.3 to 3.0 and finishing the patch, the only thing I would object to in the current codebase of Py3K is the status of PyNumberMethods. A slot changed its name (nb_nonzero to nb_bool), some slots are gone (nb_[inplace_]divide) and others are unused (nb_hex, nb_oct, and nb_coerce). What are the long term plans for this? BTW, I was also looking at the very, very clever hackery implementing the method cache for types.
My English is crude, but perhaps the Py_TPFLAGS_[HAVE|VALID]_VERSION_TAG could be renamed to something like XXX_MCACHE_TAG or XXX_METHODCACHE_TAG, that IMHO is more descriptive of what those flags are intended for... -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From tjreedy at udel.edu Fri May 9 23:52:22 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 9 May 2008 17:52:22 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> Message-ID: "M.-A. Lemburg" wrote in message news:48242D4A.3060802 at egenix.com... | On 2008-05-08 22:55, Terry Reedy wrote: | > Functions that map unicode->unicode or bytes->bytes could be called | > transcoders. Each type could be given a .transcode method to go along with | > but contrast with .encode or .decode. My main idea is that we can both keep current functionality *and* the new restriction on usage of .encode() and .decode() (which *does* make things less confusing at least for me). | Are you suggesting to have two separate methods which then | allow same-type-conversions ? One for encoding to the same | type and one for decoding ? I only suggested the possibility of one because I was thinking of transcoders more generally than those in definite 'encode'/'decode' pairs. A lossy encoder needs a decoder just to do the reverse type conversion. But a lossy transcoder whose natural partner is the identity function does not. At least not conceptually. (Example for bytes: map most control chars to 0 and any above 127 to 127.)
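A minimal sketch of the lossy bytes transcoder described in the parenthetical above, written for Python 3. The helper name `squash` and the decision to leave tab, line feed and carriage return alone (one reading of "most control chars") are assumptions made for illustration; nothing like this was proposed verbatim in the thread.

```python
# Hypothetical lossy bytes -> bytes "transcoder": control bytes become 0,
# bytes above 127 become 127, everything else passes through unchanged.
KEEP = {0x09, 0x0A, 0x0D}  # assumption: tab, LF and CR are the exempt controls


def _map_byte(b):
    """Return the replacement value for a single byte value 0..255."""
    if b > 127:
        return 127
    if b < 32 and b not in KEEP:
        return 0
    return b


# bytes.translate() expects a 256-byte lookup table.
TABLE = bytes(_map_byte(b) for b in range(256))


def squash(data: bytes) -> bytes:
    """Apply the lossy mapping; there is no meaningful inverse."""
    return data.translate(TABLE)


print(squash(b"a\x01b\n\xffc"))
```

Applying `squash` twice gives the same result as applying it once, which is one way to read the remark that the natural partner of such a transcoder is the identity function.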
Another difference is that transcoders can be chained in a way that encoders (or decoders, both in the class-changing sense) cannot. Thinking more, I realize that there are byte transcoders scattered across several modules and they are not going to be consolidated. Perhaps only unicode 'transcoders' are needed. But not for me to decide. | Fine with me. I do not really have a hat in this ring, so details are for others to decide. | They do have to map naturally to the codec method encode and | decode, though, so a single method won't do, unless maybe | you add a parameter to define the direction of the coding | process. It was an open question to me whether to reuse codecs or make a new transcoders module. But ditto my last comment. Terry Jan Reedy From eric+python-dev at trueblade.com Sun May 11 05:16:56 2008 From: eric+python-dev at trueblade.com (Eric Smith) Date: Sat, 10 May 2008 23:16:56 -0400 Subject: [Python-3000] Adding 'n' format presentation type to integers In-Reply-To: <48246CB3.7060504@trueblade.com> References: <48246CB3.7060504@trueblade.com> Message-ID: <48266528.4080400@trueblade.com> Eric Smith wrote: > 'n' is like 'g', but adds locale-specific thousands separators. > > Issue 2802 (http://bugs.python.org/issue2802) points out that 'n' > formatting isn't useful for integers, because it first converts to > float. There's no way to get 1,000,000 as a result, since 'g' converts > to '1e+06'. > > I propose adding 'n' as an integer format presentation type to PEP 3101. > The definition would be: > > 'n' - Number. This is the same as 'd', except that it uses the > current locale setting to insert the appropriate > number separator characters. > > I already have the C code needed to implement this in Python/pystrtod.c > (for floats), so it would just take some refactoring to get the integer > formatter to use it. > > If there is agreement, I'll update the PEP and implement this in 2.6 and > 3.0.
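As a rough illustration of the behaviour described in the quoted proposal, the sketch below uses the built-in format() as it works in Python 3. It deliberately does not call locale.setlocale(), so it runs under the default "C" numeric locale, where no separators are inserted. With a locale such as en_US.UTF-8 (an assumption about what a given system has installed) 'n' would be expected to group the digits instead.

```python
# Under the default "C" numeric locale no grouping characters are inserted,
# so 'n' applied to an int prints the plain digits.
value = 1000000

as_int_n = format(value, 'n')            # the integer is formatted as an integer
as_float_g = format(float(value), 'g')   # 'g' switches to exponent notation

print(as_int_n)
print(as_float_g)
```

The second line of output shows the problem from issue 2802: once the value has gone through 'g', the digits needed for grouping are no longer there.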
Having heard no objections, I'll update the PEP and check in the change. Eric. From g.brandl at gmx.net Sun May 11 22:58:50 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 11 May 2008 22:58:50 +0200 Subject: [Python-3000] CGI module - remove backward-compatibility classes? Message-ID: The CGI module has some classes that are marked as "backwards compatibility only". They are not formally deprecated in the docs, but this can be done for 2.6. Should we remove them in 3.0? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Sun May 11 23:22:00 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 11 May 2008 14:22:00 -0700 Subject: [Python-3000] CGI module - remove backward-compatibility classes? In-Reply-To: References: Message-ID: +1 On 5/11/08, Georg Brandl wrote: > The CGI module has some classes that are marked as "backwards compatibility > only". They are not formally deprecated in the docs, but this can be done > for 2.6. Should we remove them in 3.0? > > Georg -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Sun May 11 23:43:05 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 11 May 2008 23:43:05 +0200 Subject: [Python-3000] CGI module - remove backward-compatibility classes? In-Reply-To: References: Message-ID: Done in r63099. Georg Guido van Rossum schrieb: > +1 > > On 5/11/08, Georg Brandl wrote: >> The CGI module has some classes that are marked as "backwards compatibility >> only".
They are not formally deprecated in the docs, but this can be done >> for 2.6. Should we remove them in 3.0? >> >> Georg From mal at egenix.com Wed May 14 18:18:39 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 14 May 2008 18:18:39 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> Message-ID: <482B10DF.50105@egenix.com> Atsuo, you are not really addressing my arguments in your reply. My main concern is that repr(unicode) as well as '%r' is used a lot in logging and debugging of applications. In the 2.x series of Python, the output of repr() has traditionally always been plain ASCII and does not require any special encoding and also doesn't run into problems when mixing the output with other encodings used in the log file, on the console or wherever the output of repr() is sent.
You are now suggesting to break this convention by allowing all printable code points to be used in the repr() output. Depending on where you send the repr() output and the contents of the PyUnicode object, this will likely result in exceptions in the .write() method of the stream object. Just adjusting sys.stdout and sys.stderr to prevent them from falling over is not enough (and is indeed not within the scope of the PEP, since those changes are *major* and not warranted for just getting your Unicode repr() to work). repr() is very often written to log files and those would all have to be changed as well. Now, as I've said before, I can see your point about wanting to be able to read the Unicode code points, even if you use repr() - instead of the more straight-forward .encode() approach. However, when suggesting such changes, you always have to see the other side as well: - Are there alternative ways to get the "problem" fixed ? - Is the added convenience worth breaking existing conventions ? - Is it worth breaking existing applications ? I've suggested making the repr() output configurable to address the convenience aspect of your proposal. You could then set the output encoding to e.g. "unicode-printable" and get your preferred output. The default could remain set to the current all-ASCII output. Hardwiring the encoding is not a good idea, esp. since there are lots of alternatives for you to get readable output from PyUnicode object now and without any changes to the interpreter. E.g. print '%s' % u.encode('utf-8') or print '%s' % u.encode('shift-jis') or logfile = open('my.log', encoding='unicode-printable') logfile.write(u) or def unicode_repr(u): return u.encode('unicode-printable') print '%s' % unicode_repr(u) There are many ways to solve your problem. 
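For readers on Python 3 as eventually released, here is a hedged sketch of two standard ways to get ASCII-only output for a string: the built-in ascii() and the 'backslashreplace' error handler. The sample text and variable names are made up for illustration, and this is not the 'unicode-printable' codec mentioned above, which exists only as a suggestion in this thread.

```python
# Sample text written with escapes so the source itself stays ASCII:
# German "Gruesse" with u-umlaut and sharp s, followed by two CJK characters.
text = "Gr\u00fc\u00dfe \u65e5\u672c"

# ascii() returns a repr-style string with every non-ASCII code point escaped.
escaped_repr = ascii(text)

# The 'backslashreplace' handler escapes whatever the target codec cannot encode.
escaped_bytes = text.encode("ascii", errors="backslashreplace")

print(escaped_repr)
print(escaped_bytes)
```

Both results can be written to an ASCII-only log file or stream without raising UnicodeEncodeError, which is the property the paragraphs above are concerned with.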
In summary, I am: -1 on hardwiring the unicode repr() output to a non-ASCII encoding +1 on adding the PyUnicode_ISPRINTABLE() API +1 on adding a unicode-printable codec which implements your suggested encoding, so that you can use it for e.g. log files or as sys.stdout encoding +0 on making unicode repr() encoding adjustable Regards, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 14 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Wed May 14 18:23:20 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 14 May 2008 18:23:20 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> Message-ID: <482B11F8.2090200@egenix.com> On 2008-05-09 18:06, Guido van Rossum wrote: > On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg wrote: >> On 2008-05-08 22:55, Terry Reedy wrote: >>> Functions that map unicode->unicode or bytes->bytes could be called >>> transcoders. Each type could be given a .transcode method to go along with >>> but contrast with .encode or .decode.
>> Are you suggesting to have two separate methods which then >> allow same-type-conversions ? One for encoding to the same >> type and one for decoding ? >> >> Fine with me. >> >> They do have to map naturally to the codec method encode and >> decode, though, so a single method won't do, unless maybe >> you add a parameter to define the direction of the coding >> process. >> >> In summary, I'd just like to see the following happen: >> >> * revert the type restrictions on the PyCodec_* API >> >> * enforce the restrictions on the .encode() and .decode() >> methods of PyUnicode and PyString objects (str and bytes) >> >> * add a way to PyUnicode and PyString objects (str and bytes) >> to allow same type encoding and decoding > > +1 Fine, so we need new methods for PyUnicode and PyString objects which allow encoding and decoding using the same type (and enforce the return types). Any suggestions ? How about these: str.str_encode() -> str str.str_decode() -> str bytes.bytes_encode() -> bytes bytes.bytes_decode() -> bytes Thanks, -- Marc-Andre Lemburg eGenix.com From g.brandl at gmx.net Wed May 14 18:33:41 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 14 May 2008 18:33:41 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B11F8.2090200@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> Message-ID: M.-A. Lemburg schrieb: > On 2008-05-09 18:06, Guido van Rossum wrote: >> On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg wrote: >>> On 2008-05-08 22:55, Terry Reedy wrote: >>>> Functions that map unicode->unicode or bytes->bytes could be called >>>> transcoders. Each type could be given a .transcode method to go along with >>>> but contrast with .encode or .decode. >>> Are you suggesting to have two separate methods which then >>> allow same-type-conversions ? One for encoding to the same >>> type and one for decoding ? >>> >>> Fine with me. >>> >>> They do have to map naturally to the codec method encode and >>> decode, though, so a single method won't do, unless maybe >>> you add a parameter to define the direction of the coding >>> process. >>> >>> In summary, I'd just like to see the following happen: >>> >>> * revert the type restrictions on the PyCodec_* API >>> >>> * enforce the restrictions on the .encode() and .decode() >>> methods of PyUnicode and PyString objects (str and bytes) >>> >>> * add a way to PyUnicode and PyString objects (str and bytes) >>> to allow same type encoding and decoding >> >> +1 Will this get us the hex, base64 etc. "codecs" back? If yes, great! > Fine, so we need new methods for PyUnicode and PyString objects > which allow encoding and decoding using the same type (and enforce > the return types). > > Any suggestions ?
> > How about these: > > str.str_encode() -> str > str.str_decode() -> str > > bytes.bytes_encode() -> bytes > bytes.bytes_decode() -> bytes Cool, a naming contest :) What about transform/untransform? Georg From guido at python.org Wed May 14 18:55:28 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 14 May 2008 09:55:28 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> Message-ID: On Wed, May 14, 2008 at 9:33 AM, Georg Brandl wrote: > Will this get us the hex, base64 etc. "codecs" back? If yes, great! If someone does the work, yes. There will need to be some way to add metadata to codecs to indicate which of the following they support: str<->bytes, str<->str, bytes<->bytes. M.-A. Lemburg schrieb: > > Fine, so we need new methods for PyUnicode and PyString objects > > which allow encoding and decoding using the same type (and enforce > > the return types). > > > > Any suggestions ? > > > > How about these: > > > > str.str_encode() -> str > > str.str_decode() -> str > > > > bytes.bytes_encode() -> bytes > > bytes.bytes_decode() -> bytes > Cool, a naming contest :) > > What about transform/untransform? +1, anything to avoid having to type underscores. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at egenix.com Wed May 14 19:24:11 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 14 May 2008 19:24:11 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> Message-ID: <482B203B.3080305@egenix.com> On 2008-05-14 18:33, Georg Brandl wrote: > M.-A.
Lemburg schrieb: >>>> In summary, I'd just like to see the following happen: >>>> >>>> * revert the type restrictions on the PyCodec_* API >>>> >>>> * enforce the restrictions on the .encode() and .decode() >>>> methods of PyUnicode and PyString objects (str and bytes) >>>> >>>> * add a way to PyUnicode and PyString objects (str and bytes) >>>> to allow same type encoding and decoding >>> >>> +1 > > Will this get us the hex, base64 etc. "codecs" back? If yes, great! I suppose so :-) Those would only work on bytes, though, so to convert the result into text, you'd have to do: text = bytes.encodebytes('hex').decode('ascii') bytes = text.encode('ascii').decodebytes('hex') >> Fine, so we need new methods for PyUnicode and PyString objects >> which allow encoding and decoding using the same type (and enforce >> the return types). >> >> Any suggestions ? >> >> How about these: >> >> str.str_encode() -> str >> str.str_decode() -> str >> >> bytes.bytes_encode() -> bytes >> bytes.bytes_decode() -> bytes > > Cool, a naming contest :) > > What about transform/untransform? Not bad :-) Here's a version without underscores: str.encodestr() -> str str.decodestr() -> str bytes.encodebytes() -> bytes bytes.decodebytes() -> bytes -- Marc-Andre Lemburg eGenix.com From mal at egenix.com Wed May 14 19:27:18 2008 From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 14 May 2008 19:27:18 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> Message-ID: <482B20F6.20706@egenix.com> On 2008-05-14 18:55, Guido van Rossum wrote: > On Wed, May 14, 2008 at 9:33 AM, Georg Brandl wrote: >> Will this get us the hex, base64 etc. "codecs" back? If yes, great! > > If someone does the work, yes. There will need to be some way to add > metadata to codecs to indicate which of the following they support: > str<->bytes, str<->str, bytes<->bytes. No problem: we have codecs.CodecInfo to store such information. We'd just need a way to describe the supported input/output type combinations in one or more attributes to that structure. -- Marc-Andre Lemburg eGenix.com From martin at v.loewis.de Wed May 14 19:43:41 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 14 May 2008 19:43:41 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B10DF.50105@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> Message-ID: <482B24CD.5080206@v.loewis.de> > Hardwiring the encoding is not a good idea, esp. since there > are lots of alternatives for you to get readable output from > PyUnicode object now and without any changes to the interpreter. > > E.g. > > print '%s' % u.encode('utf-8') We are talking about Python 3 here, so it is fairly important that you consider all syntactic and semantic details of Python 3 - otherwise it is not clear whether or not you are aware of them: - the print syntax is incorrect - .encode returns a byte string - therefore, %s applies __str__ to the byte string, yielding something like b'...', with hex escapes for the non-ASCII bytes > There are many ways to solve your problem. No. If you strike out those that don't actually work, close to none remain. Regards, Martin From jimjjewett at gmail.com Wed May 14 19:45:10 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 14 May 2008 13:45:10 -0400 Subject: [Python-3000] string API growth [was: Re: PEP 3138- String representation in Python 3000] Message-ID: On 5/14/08, Georg Brandl wrote: > M.-A. Lemburg schrieb: >>> On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg wrote: >>>> On 2008-05-08 22:55, Terry Reedy wrote: >>>>> Functions that map unicode->unicode or bytes->bytes could be called >>>>> transcoders. bytes->bytes might be, but for many mappings (and all unicode->unicode mappings) they are general transformers.
If you care about the concrete representation, then you aren't really dealing with unicode anymore; you're dealing with the ByteString. >>>> Are you suggesting to have two separate methods which then >>>> allow same-type-conversions ? >>>> ... have to map naturally to the codec method encode and >>>> decode For str->str or bytes->bytes, how do you decide which direction is "en"coding vs "de"coding? > > How about these: > > str.str_encode() -> str > > str.str_decode() -> str > > bytes.bytes_encode() -> bytes > > bytes.bytes_decode() -> bytes > What about transform/untransform? Maybe I'm missing something, but it seems to me that there are only a few logical combinations; if the below is wrong, maybe that is one reason unicode seems more complex than it should be.

Encoding: str -> ByteString
    (staticmethod) ByteString.encode(my_string, encoding=?) == my_string.encode(encoding=?)

Decoding: ByteString -> str
    my_bytes.decode(encoding=?) == (staticmethod) str.decode(my_bytes, encoding=?)

General Transforming:
    # Why insist on type-preservation?
    # Why even make these methods?
    my_string.transform(fn) == fn(my_string)
    my_bytes.transform(fn) == fn(my_bytes)

Transcoding: ByteString -> ByteString
    # If you care how it is represented, it is no longer unicode;
    # it is a specific (ByteString) representation
    my_bytes.recode(old_encoding=?, new_encoding)
    # Can the old encoding often be inferred?
    # Or should it always be written because of EIBTI?

-jJ From mal at egenix.com Wed May 14 20:51:22 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 14 May 2008 20:51:22 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B24CD.5080206@v.loewis.de> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <482B24CD.5080206@v.loewis.de> Message-ID: <482B34AA.9000001@egenix.com> On 2008-05-14 19:43, Martin v.
Löwis wrote:
>> Hardwiring the encoding is not a good idea, esp. since there
>> are lots of alternatives for you to get readable output from
>> PyUnicode object now and without any changes to the interpreter.
>>
>> E.g.
>>
>> print '%s' % u.encode('utf-8')
>
> We are talking about Python 3 here, so it is fairly important
> that you consider all syntactic and semantic details of Python
> 3 - otherwise it is not clear whether or not you are aware of
> them:
> - the print syntax is incorrect
> - .encode returns a byte string
> - therefore, %s applies __str__ to the byte string, yielding
>   something like b'...', with hex escapes for the non-ASCII
>   bytes

Sorry, I was in Python 2 mode.

For Python 3 you don't need the .encode() calls since the stream
will take care of that for you:

# Let sys.stdout take care of the encoding
print('"%s"' % u.transform('unicode-printable'))

# Log to a file:
logfile = open('my.log', 'a', encoding='unicode-printable')
logfile.write('"%s"' % u)

# Using a helper
def unicode_repr(u):
    return '"' + u.transform('unicode-printable') + '"'
print(unicode_repr(u))

For the purists: the above assumes that 'unicode-printable' will
encode '"' to '\"'.

BTW: I found that

logfile = open('my.log', 'a', encoding='unicode-printable')

doesn't raise an exception. Only when you call the .write() method
you get the expected:

LookupError: unknown encoding: unicode-printable

Is that intended ? IMO, such errors should not be deferred.

>> There are many ways to solve your problem.
>
> No. If you strike out those that don't actually work, close
> to none remain.

They may look a bit different, but the logic is essentially
the same.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 14 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Wed May 14 23:39:00 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 15 May 2008 07:39:00 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B203B.3080305@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> Message-ID: <482B5BF4.1090007@gmail.com> M.-A. Lemburg wrote: > On 2008-05-14 18:33, Georg Brandl wrote: >> M.-A. Lemburg schrieb: >>> Fine, so we need new methods for PyUnicode and PyString objects >>> which allow encoding and decoding using the same type (and enforce >>> the return types). >>> >>> Any suggestions ? >>> >>> How about these: >>> >>> str.str_encode() -> str >>> str.str_decode() -> str >>> >>> bytes.bytes_encode() -> bytes >>> bytes.bytes_decode() -> bytes >> >> Cool, a naming contest :) >> >> What about transform/untransform? > > Not bad :-) > > Here's a version without underscores: > > str.encodestr() -> str > str.decodestr() -> str > > bytes.encodebytes() -> bytes > bytes.decodebytes() -> bytes A couple more possibilities (Guido is probably going to have to choose a colour for this bikeshed somewhere along the line...): mystr.recodeto('unicode-escaped') mystr.recodefrom('unicode-escaped') mybytes.recodeto('hex') mybytes.recodefrom('hex') Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Wed May 14 23:42:30 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 14 May 2008 14:42:30 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B5BF4.1090007@gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> Message-ID: On Wed, May 14, 2008 at 2:39 PM, Nick Coghlan wrote: > M.-A. Lemburg wrote: >> >> On 2008-05-14 18:33, Georg Brandl wrote: >>> >>> M.-A. Lemburg schrieb: >>>> >>>> Fine, so we need new methods for PyUnicode and PyString objects >>>> which allow encoding and decoding using the same type (and enforce >>>> the return types). >>>> >>>> Any suggestions ? >>>> >>>> How about these: >>>> >>>> str.str_encode() -> str >>>> str.str_decode() -> str >>>> >>>> bytes.bytes_encode() -> bytes >>>> bytes.bytes_decode() -> bytes >>> >>> Cool, a naming contest :) >>> >>> What about transform/untransform? >> >> Not bad :-) >> >> Here's a version without underscores: >> >> str.encodestr() -> str >> str.decodestr() -> str >> >> bytes.encodebytes() -> bytes >> bytes.decodebytes() -> bytes > > A couple more possibilities (Guido is probably going to have to choose a > colour for this bikeshed somewhere along the line...): > > mystr.recodeto('unicode-escaped') > mystr.recodefrom('unicode-escaped') > > mybytes.recodeto('hex') > mybytes.recodefrom('hex') Nah. I'm still in favor of [un]transform. Let's just stick to that. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Thu May 15 02:16:21 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 May 2008 12:16:21 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B11F8.2090200@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> Message-ID: <482B80D5.8000202@canterbury.ac.nz> Wasn't there a big discussion once before about whether encode/decode should be usable for things other than unicode<->non-unicode transformations? I thought the conclusion reached back then was that they shouldn't. Is there some reason the transformations being talked about can't just be provided as functions that operate on strings or bytes? -- Greg From stephen at xemacs.org Thu May 15 02:58:15 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 15 May 2008 09:58:15 +0900 Subject: [Python-3000] string API growth [was: Re: PEP 3138- String representation in Python 3000] In-Reply-To: References: Message-ID: <877idwl6xk.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > Maybe I'm missing something, but it seems to me that there are only a > few logical combinations; There are lots of logical combinations, but most of them fall into "general transform", is that what you mean? > if the below is wrong, maybe that is one > reason unicode seems more complex than it should. > > Encoding: str -> ByteString > (staticmethod) BytesString.encode(my_string, encoding=?) > == > my_string.encode(encoding=?) > > Decoding: ByteString -> str > my_bytes.decode(encoding=?) > == > (staticmethod) str.decode(my_bytes, encoding=?) +1 > General Transforming: > # Why insist on type-preservation? > # Why even make these methods? 
> my_string.transform(fn) == fn(my_string) > my_bytes.transform(fn) == fn(my_bytes) Make them methods if they are "like" codecs, by which I mean something like (more or less) invertible stream-oriented transformations. Eg, my_bytes.gzip() Pretty weak, though. > Transcoding: ByteString -> ByteString > # If you care how it is represented, it is no longer unicode; > # it is a specific (ByteString) representation > mybytes.recode(old_encoding=?, new_encoding) > > # Can the old encoding often be inferred? > # Or should it always be written because of EIBTI? (1) I agree this is the obvious connotation of "transcode" in the codec context. (2) This usage is too special to deserve treatment at this level, especially since for most purposes my_bytes.decode(old_encoding).encode(new_encoding) will be perfectly sufficient. (3) old_encoding should not be inferred as part of .decode() or .recode(), as such inference is unreliable and domain-specific heuristics often lead to great improvements. A separate method/function should be used. From guido at python.org Thu May 15 03:27:56 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 14 May 2008 18:27:56 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B80D5.8000202@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> Message-ID: On Wed, May 14, 2008 at 5:16 PM, Greg Ewing wrote: > Wasn't there a big discussion once before about whether > encode/decode should be usable for things other than > unicode<->non-unicode transformations? I thought the > conclusion reached back then was that they shouldn't. That was before the idea was brought up to have separate APIs for the X<->X transforms. 
The reason to drop those was making the type signatures of
.encode() and .decode() predictable, which is much more of a concern
in 3.0 than it is in 2.x where it's basically string in, string out
and whether that's unicode or 8-bit is a minor detail (in some cases
at least).

> Is there some reason the transformations being talked
> about can't just be provided as functions that operate
> on strings or bytes?

Several people have explained that having these available as
transformations and being able to register new transformations is very
convenient; there is plenty of existing use in 2.x of this feature, so
we're not inventing something new.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org Thu May 15 05:00:49 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 15 May 2008 12:00:49 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B80D5.8000202@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
 <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
 <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com>
 <482B80D5.8000202@canterbury.ac.nz>
Message-ID: <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > Wasn't there a big discussion once before about whether
 > encode/decode should be usable for things other than
 > unicode<->non-unicode transformations? I thought the
 > conclusion reached back then was that they shouldn't.

That group prevailed, but it was more like a WBA title bout ...
here's the rematch. This one won't "prove" anything either.

 > Is there some reason the transformations being talked
 > about can't just be provided as functions that operate
 > on strings or bytes?

This discussion isn't about whether it could be done or not, it's
about where people expect to find such functionality.
Personally, if I can find .encode('euc-jp') on a string object, I would expect to find .encode('gzip') on a bytes object, too. I think this one is just going to come down to BDFL pronouncement about which is more Pythonic, because I don't really see either point of view as more "natural". From guido at python.org Thu May 15 05:22:12 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 14 May 2008 20:22:12 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, May 14, 2008 at 8:00 PM, Stephen J. Turnbull wrote: > This discussion isn't about whether it could be done or not, it's > about where people expect to find such functionality. Personally, if > I can find .encode('euc-jp') on a string object, I would expect to > find .encode('gzip') on a bytes object, too. The argument against reusing the same method name is that in 3.0 we need to keep bytes and str instances separate more carefully than we did in 2.x. Consider code that gets an encoding passed in as a variable e. It knows it has a bytes instance b. To encode b from bytes to str (unicode), it can use s = b.decode(e). It can then treat s as a string, e.g. write it to a text file or pass it to a text processing class. If the possibility existed that the result was actually a bytes instance (e.g. when e == 'gzip' instead of e == 'euc-jp') this would either cause the code to break subtly in the field, or it would require the programmer do an additional type check on s before using it. (And I know quite a few programmers who would feel obliged to handle this case.) 
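The type-predictability argument above can be illustrated with a small sketch. It assumes a recent Python 3 release, where bytes.decode() rejects codecs that are not text encodings; the helper name decode_text is hypothetical, and 'zlib' stands in for the 'gzip' name used in this thread, which is assumed not to be a registered codec name.

```python
import codecs

def decode_text(b, e):
    """Hypothetical helper: decode bytes b with the text encoding named e."""
    s = b.decode(e)
    # With a text encoding the result is always str, so callers need no
    # extra type check before treating it as text.
    assert isinstance(s, str)
    return s

original = "\u65e5\u672c\u8a9e"
text = decode_text(original.encode("euc-jp"), "euc-jp")

# A bytes-to-bytes codec is refused by bytes.decode() with an exception,
# instead of quietly handing back a bytes object.
try:
    b"payload".decode("zlib")
    rejected = False
except LookupError:
    rejected = True

# The same codec remains reachable through the generic codecs functions.
compressed = codecs.encode(b"payload", "zlib")
restored = codecs.decode(compressed, "zlib")
```

In this sketch the failure mode is the predictable one described above: a codec of the wrong kind raises, and a valid text encoding never returns an unusable type.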
Of course the possibility always exists that e is not a valid encoding at all; but that case raises a predictable exception. Similar in the case that b can't be decoded using e. Having something be a valid encoding but return an unusable result is much more problematic. > I think this one is just going to come down to BDFL pronouncement > about which is more Pythonic, because I don't really see either point > of view as more "natural". It's mostly settled. There will be separate methods to transform bytes to bytes and to transform str to str, and these will use separate collections of encodings. (Or perhaps some codecs will apply to multiple cases, e.g. rot13 might apply both for str<->str and for bytes<->bytes; but I'd expect gzip to apply only for bytes<->bytes.) There will be metadata on the codecs so that b.decode("gzip") will raise an exception just as b.transform("utf-8") will. The details haven't all been sorted out but so far the only names proposed that I like are transform() and untransform(). I propose that b.transform("gzip") would compress and b.untransform("gzip") would uncompress. I'm fine with the str and bytes methods both being called transform() and untransform() -- this is no different than the current situation with e.g. lower() and upper(). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Thu May 15 10:13:26 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 May 2008 20:13:26 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <482BF0A6.70602@canterbury.ac.nz> Stephen J. 
Turnbull wrote: > This discussion isn't about whether it could be done or not, it's > about where people expect to find such functionality. Personally, if > I can find .encode('euc-jp') on a string object, I would expect to > find .encode('gzip') on a bytes object, too. What I'm not seeing is a clear rationale on where you draw the line. Out of all the possible transformations between a string and some other kind of data, which ones deserve to be available via this rather strange and special interface, and why? -- Greg From stephen at xemacs.org Thu May 15 11:12:14 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 15 May 2008 18:12:14 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482BF0A6.70602@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> Message-ID: <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > What I'm not seeing is a clear rationale on where you > draw the line. Out of all the possible transformations > between a string and some other kind of data, which > ones deserve to be available via this rather strange > and special interface, and why? I don't know nuthin about just desserts. As I wrote earlier in response to Jim, what I would *expect* to be provided by this interface (not necessarily named "encode" and "decode", but invoked as a method with a transformation name as parameter) are those transformations that are "like codecs": stream-oriented and invertible. From mal at egenix.com Thu May 15 11:48:55 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 15 May 2008 11:48:55 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> Message-ID: <482C0707.8020805@egenix.com> On 2008-05-14 23:42, Guido van Rossum wrote: > On Wed, May 14, 2008 at 2:39 PM, Nick Coghlan wrote: >> M.-A. Lemburg wrote: >>> On 2008-05-14 18:33, Georg Brandl wrote: >>>> M.-A. Lemburg schrieb: >>>>> Fine, so we need new methods for PyUnicode and PyString objects >>>>> which allow encoding and decoding using the same type (and enforce >>>>> the return types). >>>>> >>>>> Any suggestions ? >>>>> >>>>> How about these: >>>>> >>>>> str.str_encode() -> str >>>>> str.str_decode() -> str >>>>> >>>>> bytes.bytes_encode() -> bytes >>>>> bytes.bytes_decode() -> bytes >>>> Cool, a naming contest :) >>>> >>>> What about transform/untransform? >>> Not bad :-) >>> >>> Here's a version without underscores: >>> >>> str.encodestr() -> str >>> str.decodestr() -> str >>> >>> bytes.encodebytes() -> bytes >>> bytes.decodebytes() -> bytes >> A couple more possibilities (Guido is probably going to have to choose a >> colour for this bikeshed somewhere along the line...): >> >> mystr.recodeto('unicode-escaped') >> mystr.recodefrom('unicode-escaped') >> >> mybytes.recodeto('hex') >> mybytes.recodefrom('hex') > > Nah. I'm still in favor of [un]transform. Let's just stick to that. Ok, so I'll add str.transform() -> str (uses the encode function of the codec) str.untransform() -> str (uses the decode function of the codec) bytes.transform() -> bytes (uses the encode function of the codec) bytes.untransform() -> bytes (uses the decode function of the codec) Is there an easy way to SVN-revive the removed base64, hex, etc codec modules in encodings ? 
As far as I remember, I have to look for the revision just before they were deleted and then "copy" them from there using the repo URL. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 15 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From p.f.moore at gmail.com Thu May 15 12:06:56 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 15 May 2008 11:06:56 +0100 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> On 15/05/2008, Guido van Rossum wrote: > Consider code that gets an encoding passed in as a > variable e. It knows it has a bytes instance b. To encode b from bytes > to str (unicode), it can use s = b.decode(e). To encode, you use .decode? It's nice to know it's not just me who has trouble keeping the terminology straight... Paul. From mal at egenix.com Thu May 15 12:22:45 2008 From: mal at egenix.com (M.-A. 
Lemburg)
Date: Thu, 15 May 2008 12:22:45 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
 <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
 <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com>
 <482B80D5.8000202@canterbury.ac.nz>
 <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
 <482BF0A6.70602@canterbury.ac.nz>
 <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <482C0EF5.3070205@egenix.com>

On 2008-05-15 11:12, Stephen J. Turnbull wrote:
> Greg Ewing writes:
>
> > What I'm not seeing is a clear rationale on where you
> > draw the line. Out of all the possible transformations
> > between a string and some other kind of data, which
> > ones deserve to be available via this rather strange
> > and special interface, and why?
>
> I don't know nuthin about just desserts.
>
> As I wrote earlier in response to Jim, what I would *expect* to be
> provided by this interface (not necessarily named "encode" and
> "decode", but invoked as a method with a transformation name as
> parameter) are those transformations that are "like codecs":
> stream-oriented and invertible.

str.transform(encoding) will use the standard codecs.encode(encoding),
but additionally check that the output has the type str and raise
an error if it doesn't. Ditto for .untransform(encoding).

For bytes, the methods will check that the output has type bytes and
raise an error if it doesn't.

The methods could also check the meta-data on the found codecs
before actually running the transformation, but that may not always
lead to usable results, e.g. if a codec can handle both str->str and
bytes->bytes by doing the type check itself.

In any case, the above type checks will always be applied, so that
callers never get a result of an unexpected type.

I'll write up a PEP once we have a better understanding of the
details, e.g.
of how the codec type information should be defined... Here's a straight-forward approach: codecinfo.encode_type_combinations = [(bytes, bytes), (str, str)] codecinfo.decode_type_combinations = [(bytes, bytes), (str, str)] for most codecs (e.g. utf-8, latin-1, cp850, etc.) this would then be: codecinfo.encode_type_combinations = [(str, bytes)] codecinfo.decode_type_combinations = [(bytes, str)] -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 15 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Thu May 15 12:34:32 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 15 May 2008 20:34:32 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482BF0A6.70602@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> Message-ID: <482C11B8.3010505@gmail.com> Greg Ewing wrote: > Stephen J. Turnbull wrote: >> This discussion isn't about whether it could be done or not, it's >> about where people expect to find such functionality. Personally, if >> I can find .encode('euc-jp') on a string object, I would expect to >> find .encode('gzip') on a bytes object, too. > > What I'm not seeing is a clear rationale on where you > draw the line. 
Out of all the possible transformations
> between a string and some other kind of data, which
> ones deserve to be available via this rather strange
> and special interface, and why?
>

Where this kind of unified interface to binary and character transforms
is incredibly handy is in a stacking IO model like the one used in Py3k.
For example, suppose you're using a compressed XML stream to communicate
over a network socket. What this approach allows you to do is have
generic 'transformation' layers in your IO stack, so you can just build
up your IO stack as something like:

   XMLParserIO('myschema')
   BufferedTextIO('utf-8')
   BytesTransform('gzip')
   RawSocketIO

To change to a different compression mechanism (e.g. bz2), you just
change the codec used by the BytesTransform layer from 'gzip' to 'bz2'.

As for how you choose what to provide as codecs... well, that's a major
reason why the codec registry is extensible. The answer is that any
binary or character transform which is useful to the application
programmer can be accessed via the codec API - the only question will be
whether the application programmer will have to write the codec
themselves, or will find it already provided in the standard library.

Cheers,
Nick.

P.S. My original tangential response that didn't actually answer your
question, but may still be useful to some folks:

An actual codec that encodes a character string to a byte sequence, and
decodes a byte sequence back to a character string would be invoked via
the str.encode() and bytes.decode() methods. For example,
mystr.encode('utf-8') to serialise a string using UTF-8,
mybytes.decode('utf-8') to read it back.

A text transform that converts a character string to a different
character string would be invoked via the str.transform() and
str.untransform() methods. For example,
mystr.transform('unicode-escape') to convert unicode characters to their
\u or \U equivalents, mystr.untransform('unicode-escape') to convert
them back to the actual unicode characters.
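The two cases described so far can be tried out with the standard codecs module. This is only a sketch against Python 3 as released, not the str.transform() API proposed in this thread: it assumes that text transforms are reached through codecs.encode() and codecs.decode(), and it uses rot_13 as the str-to-str example because the shipped unicode_escape codec produces bytes rather than str.

```python
import codecs

text = "caf\u00e9 \u20ac"

# str -> bytes and back: the codec case, via str.encode() / bytes.decode().
raw = text.encode("utf-8")
round_trip = raw.decode("utf-8")

# str -> str: rot_13 is a text transform in the standard library. It is
# reached through codecs.encode() / codecs.decode(), not str.encode().
scrambled = codecs.encode("Hello", "rot_13")
unscrambled = codecs.decode(scrambled, "rot_13")

# unicode_escape as shipped is a str -> bytes codec, so escaping gives a
# bytes object holding the backslash escapes.
escaped = text.encode("unicode_escape")
unescaped = escaped.decode("unicode_escape")
```

The variable names are illustrative only. The point of the sketch is that each call has a fixed input and output type, which is the property the separate method names are meant to guarantee.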
A binary transform that converts a byte sequence to a different byte
sequence would be invoked via the bytes.transform() and
bytes.untransform() methods. For example, mybytes.transform('gzip') to
compress a byte sequence, mybytes.untransform('gzip') to decompress it.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From mal at egenix.com Thu May 15 12:38:11 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 15 May 2008 12:38:11 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
 <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
 <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com>
 <482B80D5.8000202@canterbury.ac.nz>
 <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
 <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
Message-ID: <482C1293.3030409@egenix.com>

On 2008-05-15 12:06, Paul Moore wrote:
> On 15/05/2008, Guido van Rossum wrote:
>> Consider code that gets an encoding passed in as a
>> variable e. It knows it has a bytes instance b. To encode b from bytes
>> to str (unicode), it can use s = b.decode(e).
>
> To encode, you use .decode? It's nice to know it's not just me who has
> trouble keeping the terminology straight...

It's all a matter of perspective. You can say you're encoding
Latin-1 to Unicode, or you can say you're encoding Unicode to
Latin-1.

Python's Unicode implementation regards PyUnicode as a "bigger"
type than PyString (*), since it can hold all possible code points,
so when going from the "bigger" type to the smaller one, you
*encode*, whereas when going from the smaller one to the bigger
one, you *decode*.

For codecs in general, you have a source and a destination
defining the codec (= coding / decoding).
When going from the source to the destination you *encode*, the other way around is *decoding*. (*) This is why coercion in Py2 goes from PyString to PyUnicode and not the other way around. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 15 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Thu May 15 13:01:41 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 15 May 2008 21:01:41 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482C0EF5.3070205@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp> <482C0EF5.3070205@egenix.com> Message-ID: <482C1815.1050008@gmail.com> M.-A. Lemburg wrote: > I'll write up a PEP once we have a better understanding of the > details, e.g. of how the codec type information should be > defined... > > Here's a straight-forward approach: > > codecinfo.encode_type_combinations = [(bytes, bytes), (str, str)] > codecinfo.decode_type_combinations = [(bytes, bytes), (str, str)] > > for most codecs (e.g. utf-8, latin-1, cp850, etc.) 
this would > then be: > > codecinfo.encode_type_combinations = [(str, bytes)] > codecinfo.decode_type_combinations = [(bytes, str)] Do we need something that flexible? Would a simpler approach with separate "binary_transform" and "text_transform" flags be enough? With the latter approach, the encode()/decode() methods could complain if either of the transform flags was set on the codec, while the transform()/untransform() methods could complain if the appropriate transform flag *wasn't* set. Note also that both bytearray and bytes provide decode() methods, and will presumably provide transform() methods, so actual type annotations may not be the best way to go about this. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From mal at egenix.com Thu May 15 13:48:40 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 15 May 2008 13:48:40 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482C1815.1050008@gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp> <482C0EF5.3070205@egenix.com> <482C1815.1050008@gmail.com> Message-ID: <482C2318.50205@egenix.com> On 2008-05-15 13:01, Nick Coghlan wrote: > M.-A. Lemburg wrote: >> I'll write up a PEP once we have a better understanding of the >> details, e.g. of how the codec type information should be >> defined... >> >> Here's a straight-forward approach: >> >> codecinfo.encode_type_combinations = [(bytes, bytes), (str, str)] >> codecinfo.decode_type_combinations = [(bytes, bytes), (str, str)] >> >> for most codecs (e.g. utf-8, latin-1, cp850, etc.) 
this would >> then be: >> >> codecinfo.encode_type_combinations = [(str, bytes)] >> codecinfo.decode_type_combinations = [(bytes, str)] > > Do we need something that flexible? Would a simpler approach with > separate "binary_transform" and "text_transform" flags be enough? > > With the latter approach, the encode()/decode() methods could complain > if either of the transform flags was set on the codec, while the > transform()/untransform() methods could complain if the appropriate > transform flag *wasn't* set. The above is a mechanism for codecs which do have a very flexible interface in terms of supported types. The methods on various objects are just convenience helpers for easier access and in Py3k also provide type-safety. The .transform() methods would simply check for the corresponding type combination, ie. str.transform() would check for (str, str). str.encode() would check for (str, bytes), bytes.decode() for (bytes, str). Alternatively, we could just not check the type combinations at all and only apply the result type check. > Note also that both bytearray and bytes provide decode() methods, and > will presumably provide transform() methods, so actual type annotations > may not be the best way to go about this. I'm not sure I understand. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 15 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Thu May 15 15:42:24 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 15 May 2008 23:42:24 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482C2318.50205@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp> <482C0EF5.3070205@egenix.com> <482C1815.1050008@gmail.com> <482C2318.50205@egenix.com> Message-ID: <482C3DC0.4020600@gmail.com> M.-A. Lemburg wrote: > The .transform() methods would simply check for the corresponding > type combination, ie. str.transform() would check for (str, str). > str.encode() would check for (str, bytes), bytes.decode() for > (bytes, str). > > Alternatively, we could just not check the type combinations > at all and only apply the result type check. > >> Note also that both bytearray and bytes provide decode() methods, and >> will presumably provide transform() methods, so actual type >> annotations may not be the best way to go about this. > > I'm not sure I understand. If we went with the approach of checking type annotations on the codec, then would a codec which was only annotated with (bytes, str) on the decode method be usable by bytearray.decode()? And if we aren't going to check the type annotations before invoking the codec, what's the point in having them at all? Better to leave them out entirely, invoke the relevant method of the named codec and see if we get the right type back. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From mal at egenix.com Thu May 15 16:22:13 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 15 May 2008 16:22:13 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482C3DC0.4020600@gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp> <482C0EF5.3070205@egenix.com> <482C1815.1050008@gmail.com> <482C2318.50205@egenix.com> <482C3DC0.4020600@gmail.com> Message-ID: <482C4715.3040906@egenix.com> On 2008-05-15 15:42, Nick Coghlan wrote: > M.-A. Lemburg wrote: >> The .transform() methods would simply check for the corresponding >> type combination, ie. str.transform() would check for (str, str). >> str.encode() would check for (str, bytes), bytes.decode() for >> (bytes, str). >> >> Alternatively, we could just not check the type combinations >> at all and only apply the result type check. >> >>> Note also that both bytearray and bytes provide decode() methods, and >>> will presumably provide transform() methods, so actual type >>> annotations may not be the best way to go about this. >> >> I'm not sure I understand. > > If we went with the approach of checking type annotations on the codec, > then would a codec which was only annotated with (bytes, str) on the > decode method be usable by bytearray.decode()? Probably not, but the suggested form allows adding (bytearray, str) if the codec support this as well and bytearray.decode() could check for that combination. 
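A minimal sketch of the pre-check described above, written for a current Python 3. The table ENCODE_TYPE_COMBINATIONS and the helper checked_encode() are hypothetical names used only for illustration; codecs.CodecInfo in released Pythons has no such attribute. The sketch only shows that a lookup of the (input type, output type) pair can reject a request before the codec does any work.

```python
import codecs

# Hypothetical table: codec name -> supported (input type, output type)
# pairs for the encode direction. Not part of the real codecs module.
ENCODE_TYPE_COMBINATIONS = {
    "utf-8": [(str, bytes)],
    "zlib_codec": [(bytes, bytes)],
}


def checked_encode(obj, codec_name, result_type):
    """Encode obj only if the codec advertises (type(obj), result_type)."""
    combos = ENCODE_TYPE_COMBINATIONS.get(codec_name, [])
    if (type(obj), result_type) not in combos:
        raise TypeError(
            "%s does not encode %s to %s"
            % (codec_name, type(obj).__name__, result_type.__name__)
        )
    return codecs.encode(obj, codec_name)


# A text codec accepts str and produces bytes.
text_as_bytes = checked_encode("abc", "utf-8", bytes)

# A bytes-to-bytes codec is rejected for str input before any work is done.
try:
    checked_encode("abc", "zlib_codec", bytes)
    rejected = False
except TypeError:
    rejected = True
```

Supporting bytearray as well would, under this scheme, mean adding a (bytearray, bytes) pair to the relevant entry.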
> And if we aren't going to check the type annotations before invoking the > codec, what's the point in having them at all? They provide meta-information about the codec capabilities and may be useful in other contexts as well, e.g. if you want to add an .encode() method to some other object. > Better to leave them out > entirely, invoke the relevant method of the named codec and see if we > get the right type back. That's an option, yes. OTOH, if you first decode a 100MB data string using e.g. gzip and then find that the return type doesn't match what you had expected, the added global warming due to wasted CPU heat is going to make you feel rather uncomfortable :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 15 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From param at cs.wisc.edu Thu May 15 17:06:34 2008 From: param at cs.wisc.edu (Paramjit Oberoi) Date: Thu, 15 May 2008 08:06:34 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> Message-ID: On Thu, May 15, 2008 at 3:06 AM, Paul Moore wrote: > On 15/05/2008, Guido van Rossum wrote: >> Consider code that gets an encoding passed in as a >> variable e. It knows it has a bytes instance b. To encode b from bytes >> to str (unicode), it can use s = b.decode(e). > > To encode, you use .decode? It's nice to know it's not just me who has > trouble keeping the terminology straight... It takes a lot of effort, and constant vigilance, to keep encode/decode straight in one's head. Maybe this means they need to be renamed to something like tobytes() and tostring()? tostring() is probably not the best choice though - too much baggage from java. -param From ishimoto at gembook.org Thu May 15 18:13:01 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Fri, 16 May 2008 01:13:01 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482B10DF.50105@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> Message-ID: <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com> On Thu, May 15, 2008 at 1:18 AM, M.-A. Lemburg wrote: > Atuso > > you are not really addressing my arguments in your reply. 
>
> My main concern is that repr(unicode) as well as '%r' is used
> a lot in logging and debugging of applications.
>
> In the 2.x series of Python, the output of repr() has traditionally
> always been plain ASCII and does not require any special encoding
> and also doesn't run into problems when mixing the output with
> other encodings used in the log file, on the console or wherever
> the output of repr() is sent.
>
> You are now suggesting to break this convention by allowing
> all printable code points to be used in the repr() output.
> Depending on where you send the repr() output and the contents
> of the PyUnicode object, this will likely result in exceptions
> in the .write() method of the stream object.
>

I can't understand why Python 3000 should stick to an ASCII repr().
If your concern is about output, it should be addressed by the file
object when printing. repr() generates text information about an
object, and the file encodes that text for the user's environment on
output. This is a straightforward, flexible and common pattern in
Unicode applications.

> Just adjusting sys.stdout and sys.stderr to prevent them from
> falling over is not enough (and is indeed not within the scope
> of the PEP, since those changes are *major* and not warranted
> for just getting your Unicode repr() to work). repr() is very
> often written to log files and those would all have to be
> changed as well.
>

For files other than sys.std*, I see no problem with::

    log = open(filename, 'w', errors='backslashreplace')
    log.write("%r" % obj)

although I would prefer 'backslashreplace' to be the default value for
errors.

> - Are there alternative ways to get the "problem" fixed ?
> - Is the added convenience worth breaking existing conventions ?

I would like to call it "improve", not break :)

> - Is it worth breaking existing applications ?

I guess the number of applications broken by this change would be
small, and the fix would be easy.
So I think it is worth it, and perhaps a lot of programmers in
non-Latin countries might think so, too. From your point of view, this
PEP apparently brings concerns without any benefit. But it is necessary
to make the most of Unicode for debugging and logging.

>
> I've suggested making the repr() output configurable to address
> the convenience aspect of your proposal. You could then set the
> output encoding to e.g. "unicode-printable" and get your preferred
> output. The default could remain set to the current all-ASCII output.
>

I'm sorry, I cannot understand what the "unicode-printable" codec does.
Could you please explain it?

I don't like making repr() adjustable (I presume you mean making
unicode_repr() in Objects/unicodeobject.c adjustable), because code
written for the old repr() convention would remain in use: third-party
applications or libraries could fail when I switch to my custom repr()
function.

From p.f.moore at gmail.com Thu May 15 18:49:06 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 15 May 2008 17:49:06 +0100
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
Message-ID: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>

On 15/05/2008, Atsuo Ishimoto wrote:
> I would like to call it "improve", not break :)

Please can you help me understand the impact here. I am running
Windows XP (UK English - console code page 850, which is some variety
of Latin 1).
Currently, printing non-latin1 characters gives me an
exception: for example,

>>> print("Hello\u03C8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Apps\Python30\lib\io.py", line 1103, in write
    b = s.encode(self._encoding)
  File "D:\Apps\Python30\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u03c8' in
position 5: character maps to <undefined>

(This is 3.0a1 - I don't know if much has changed in more recent
alphas, if it's significant I can upgrade and try again).

Can you explain what I need to change to make sys.stdout behave as you
propose? If you can do that, I can test what I will see in your
proposal if I type print(repr("Hello\u03C8")). My suspicion is that I
will see unreadable garbage, rather than what I currently get, which
is backslash-escaped, but readable.

The key point here is that I don't think you're proposing to detect
the user's display capabilities and adapt the output to match, so if
my display can't cope with the full Unicode character set, I'll have
to make manual adjustments or see broken output.

Like it or not, a large proportion of Python's users still work in
environments where much of the Unicode character space is not
displayed readably.

My apologies if I misunderstood your proposal - I have almost no
Unicode experience, and that probably shows :-)

Paul.
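The failure in the message above can be reproduced without a Windows console. The sketch below assumes a current Python 3 and encodes the same string to cp850 directly, once with the default strict error handler and once with backslashreplace, which is the handler the thread proposes for sys.stdout and sys.stderr. The variable names are illustrative only.

```python
text = "Hello\u03c8"  # ends with GREEK SMALL LETTER PSI

# Strict encoding to code page 850 fails, because cp850 has no psi.
try:
    text.encode("cp850")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With backslashreplace the unencodable character becomes an ASCII escape.
escaped = text.encode("cp850", errors="backslashreplace")

print(strict_failed)  # True
print(escaped)        # b'Hello\\u03c8'
```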
From p.f.moore at gmail.com Thu May 15 18:53:39 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 15 May 2008 17:53:39 +0100 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com> <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> Message-ID: <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com> On 15/05/2008, Paul Moore wrote: > My apologies if I misunderstood your proposal - I have almost no > Unicode experience, and that probably shows :-) One point I forgot to clarify is that I'm fully aware that print(arbitrary_string) may display garbage, if the string contains Unicode that my display can't handle. The key point for me is that print(repr(arbitrary_string)) is *guaranteed* to display correctly, even on my limited-capability terminal, precisely because it only uses ASCII and no matter how dumb, all terminals I know of display ASCII. Paul. 
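For readers on a current Python 3, the ASCII-only guarantee described above is what the built-in ascii() provides, while repr() keeps printable non-ASCII characters. A small sketch, added for illustration and not part of the original message:

```python
s = "Hello\u03c8"

readable = repr(s)     # keeps the printable psi character as it is
ascii_only = ascii(s)  # escapes every non-ASCII character

# ascii() output is safe to send to any terminal that can show ASCII.
is_pure_ascii = all(ord(ch) < 128 for ch in ascii_only)
```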
From phd at phd.pp.ru Thu May 15 19:03:32 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 15 May 2008 21:03:32 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com> <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> Message-ID: <20080515170332.GA9117@phd.pp.ru> On Thu, May 15, 2008 at 05:49:06PM +0100, Paul Moore wrote: > Like it or not, a large proportion of Python's users still work in > environments where much of the Unicode character space is not > displayed readably. How large is that "large proportion"? 10%? 50%? 90%? How often users working in ascii-only environment are confronted with non-ascii strings? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
From phd at phd.pp.ru Thu May 15 19:06:02 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 15 May 2008 21:06:02 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com> <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com> Message-ID: <20080515170602.GB9117@phd.pp.ru> On Thu, May 15, 2008 at 05:53:39PM +0100, Paul Moore wrote: > One point I forgot to clarify is that I'm fully aware that > print(arbitrary_string) may display garbage, if the string contains > Unicode that my display can't handle. The key point for me is that > print(repr(arbitrary_string)) is *guaranteed* to display correctly, > even on my limited-capability terminal, precisely because it only uses > ASCII and no matter how dumb, all terminals I know of display ASCII. That's up to print() or any other output device to decide, not to repr(). If I send repr() from a CGI back to the browser it doesn't matter if the server is ascii-only, it only matters if the browser can display unicode. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
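The argument that escaping belongs to the output device rather than to repr() can be sketched with io.TextIOWrapper on a current Python 3. Here an in-memory BytesIO stands in for a cp850 console or log file; that stand-in and the variable names are assumptions made for the example.

```python
import io

raw = io.BytesIO()  # stands in for a cp850 console or log file
stream = io.TextIOWrapper(raw, encoding="cp850", errors="backslashreplace")

# repr() returns the psi character unescaped; the stream escapes it
# because cp850 cannot represent it.
stream.write(repr("Hello\u03c8"))
stream.flush()

written = raw.getvalue()
```

The same repr() output sent to a UTF-8 stream would keep the character intact, so the decision is made per destination.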
From ishimoto at gembook.org Thu May 15 19:50:22 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 16 May 2008 02:50:22 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com> <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
Message-ID: <797440730805151050g472d947r18e8f7c7d520d44e@mail.gmail.com>

On Fri, May 16, 2008 at 1:49 AM, Paul Moore wrote:
> On 15/05/2008, Atsuo Ishimoto wrote:
>> I would like to call it "improve", not break :)
>
> Please can you help me understand the impact here. I am running
> Windows XP (UK English - console code page 850, which is some variety
> of Latin 1). Currently, printing non-latin1 characters gives me an
> exception: for example,
>
>>>> print("Hello\u03C8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "D:\Apps\Python30\lib\io.py", line 1103, in write
>     b = s.encode(self._encoding)
>   File "D:\Apps\Python30\lib\encodings\cp850.py", line 12, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character '\u03c8' in
> position 5: character maps to <undefined>
>
> (This is 3.0a1 - I don't know if much has changed in more recent
> alphas, if it's significant I can upgrade and try again).
>
> Can you explain what I need to change to make sys.stdout behave as you
> propose? If you can do that, I can test what I will see in your
> proposal if I type print(repr("Hello\u03C8")). My suspicion is that I
> will see unreadable garbage, rather than what I currently get, which
> is backslash-escaped, but readable.
With my proposal, print("Hello\u03C8") prints "Hello\u03C8" instead of
raising an exception, and print(repr("Hello\u03C8")) prints
"'Hello\u03C8'", so no garbage is printed.

Now, let's say you are Greek and working on a Greek version of XP.
print("Hello\u03C8") prints "Hello" + the correct Greek character
(GREEK SMALL LETTER PSI), and print(repr("Hello\u03C8")) prints
"'Hello" + the correct Greek character + "'". If you have a Greek
font, you can try this by switching your command prompt to code page
1253 with "chcp 1253".

>
> The key point here is that I don't think you're proposing to detect
> the user's display capabilities and adapt the output to match, so if
> my display can't cope with the full Unicode character set, I'll have
> to make manual adjustments or see broken output.
>

Python has detected the user's capabilities since Python 2.x (or 1.6?
I forget). On Windows, Python detects the user's encoding from the
code page. On Unix, the locale is used to detect the encoding.

> Like it or not, a large proportion of Python's users still work in
> environments where much of the Unicode character space is not
> displayed readably.
>

I agree. So rejecting my proposal as "not a common use case" might be
reasonable. But I should argue to get sympathy anyway :).

> One point I forgot to clarify is that I'm fully aware that
> print(arbitrary_string) may display garbage, if the string contains
> Unicode that my display can't handle. The key point for me is that
> print(repr(arbitrary_string)) is *guaranteed* to display correctly,
> even on my limited-capability terminal, precisely because it only uses
> ASCII and no matter how dumb, all terminals I know of display ASCII.

I can understand your concern. Perhaps you don't want to see your
terminal flash from escape sequences, beep, print endless graphic
characters, etc. For legacy byte-string applications (whether written
in C or Python), printing an arbitrary string can cause such a mess.
But this is unlikely to happen by printing the Unicode string, since the characters your terminal cannot understand will be escaped or be converted to character such as '?'. Hope this helps. From ncoghlan at gmail.com Fri May 16 00:56:44 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 16 May 2008 08:56:44 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com> <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> Message-ID: <482CBFAC.9040004@gmail.com> Paul Moore wrote: > On 15/05/2008, Atsuo Ishimoto wrote: >> I would like to call it "improve", not break :) > > Please can you help me understand the impact here. I am running > Windows XP (UK English - console code page 850, which is some variety > of Latin 1). Currently, printing non-latin1 characters gives me an > exception: for example, As Oleg and Atsuo already pointed out, this is addressed in the PEP by switching the encoding error mode on sys.stderr and sys.stdout to backslashreplace instead of the current strict. So not only will repr() still display correctly for you, all other strings containing Unicode characters will start displaying as well (with Unicode escapes in place of the glyphs your display can't cope with). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Fri May 16 01:30:29 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 16 May 2008 11:30:29 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482C0707.8020805@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> Message-ID: <482CC795.1050405@canterbury.ac.nz> M.-A. Lemburg wrote: > str.transform() -> str (uses the encode function of the codec) > str.untransform() -> str (uses the decode function of the codec) Not sure I like those names. It's rather unclear which direction is "transform" and which is "untransform". People seem to have trouble enough with "encode" and "decode", but at least there's a clear definition of that from Unicode-land, and there's the type difference to catch the mistake if you get it wrong. Since both ends have the same type here, it's more important to find unambiguous names if possible. 
-- Greg From greg.ewing at canterbury.ac.nz Fri May 16 01:36:42 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 16 May 2008 11:36:42 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482C11B8.3010505@gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> Message-ID: <482CC90A.5010907@canterbury.ac.nz> Nick Coghlan wrote: > What this approach allows you to do is have > generic 'transformation' layers in your IO stack, so you can just build > up your IO stack as something like: > > XMLParserIO('myschema') > BufferedTextIO('utf-8') > BytesTransform('gzip') > RawSocketIO There's nothing wrong with that, but what it doesn't answer is why it's not sufficient just to do things like from gzip import gzip_codec stream2 = BytesTransform(gzip_codec, stream1) i.e. why there has to be a special kind of namespace for codecs. -- Greg From guido at python.org Fri May 16 01:46:31 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 15 May 2008 16:46:31 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482CC795.1050405@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> Message-ID: On Thu, May 15, 2008 at 4:30 PM, Greg Ewing wrote: > M.-A. Lemburg wrote: >> >> str.transform() -> str (uses the encode function of the codec) >> str.untransform() -> str (uses the decode function of the codec) > > Not sure I like those names. 
It's rather unclear which > direction is "transform" and which is "untransform". > > People seem to have trouble enough with "encode" and > "decode", but at least there's a clear definition of > that from Unicode-land, and there's the type difference > to catch the mistake if you get it wrong. > > Since both ends have the same type here, it's more > important to find unambiguous names if possible. Really? Don't you think it's pretty obvious that b.transform("gzip") compresses and b.untransform("gzip") decompresses? Or that b.transform("base64") generates base64 format? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Fri May 16 03:39:54 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 16 May 2008 13:39:54 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48232FB2.3020205@egenix.com> <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com> <482B10DF.50105@egenix.com> <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com> <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com> <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com> Message-ID: <482CE5EA.1020504@canterbury.ac.nz> Paul Moore wrote: > The key point for me is that > print(repr(arbitrary_string)) is *guaranteed* to display correctly, > even on my limited-capability terminal, precisely because it only uses > ASCII and no matter how dumb, all terminals I know of display ASCII. That still sounds like something that the I/O object connected to the terminal should deal with. You'll have the same problem with any other unicode output that ends up going to the terminal, so it has to deal with it anyway. 
-- Greg From greg.ewing at canterbury.ac.nz Fri May 16 04:46:21 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 16 May 2008 14:46:21 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> Message-ID: <482CF57D.6010200@canterbury.ac.nz> Guido van Rossum wrote: > Really? Don't you think it's pretty obvious that b.transform("gzip") > compresses and b.untransform("gzip") decompresses? Or that > b.transform("base64") generates base64 format? Well, maybe. I think the problem is that the word "transform" is inherently direction-neutral, and it only becomes obvious that you have a direction in mind for it when you pair it with some invention such as "untransform". Maybe it's not all that bad, but it just seems like it should be possible to do better than picking a very general word like "transform" and giving it our own special meaning. -- Greg From alexandre at peadrop.com Fri May 16 05:40:17 2008 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Thu, 15 May 2008 23:40:17 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482CF57D.6010200@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> Message-ID: On Thu, May 15, 2008 at 10:46 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> Really? Don't you think it's pretty obvious that b.transform("gzip") >> compresses and b.untransform("gzip") decompresses? Or that >> b.transform("base64") generates base64 format? > > Well, maybe. 
I think the problem is that the word > "transform" is inherently direction-neutral, and it > only becomes obvious that you have a direction in > mind for it when you pair it with some invention > such as "untransform". Me, I don't have a problem with inventing a new word. It is true that it would be slightly more appropriate to say "inverse_transform", but that would be awful to type. Personally, I find the meaning of transform/untransform intuitive, but that's just me. -- Alexandre From tjreedy at udel.edu Fri May 16 07:21:23 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 16 May 2008 01:21:23 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><48242D4A.3060802@egenix.com><482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com><482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> Message-ID: "Greg Ewing" wrote in message news:482CF57D.6010200 at canterbury.ac.nz... | Guido van Rossum wrote: | | > Really? Don't you think it's pretty obvious that b.transform("gzip") | > compresses and b.untransform("gzip") decompresses? Or that | > b.transform("base64") generates base64 format? | | Well, maybe. I think the problem is that the word | "transform" is inherently direction-neutral, and it | only becomes obvious that you have a direction in | mind for it when you pair it with some invention | such as "untransform". | | Maybe it's not all that bad, but it just seems | like it should be possible to do better than picking | a very general word like "transform" and giving | it our own special meaning. Would you prefer re_transform, which is English?
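To make the naming question concrete, here is a rough sketch of what the proposed same-type methods would do, written as free functions on a current Python 3. transform() and untransform() are hypothetical helpers defined here, not methods that exist on bytes; they simply forward to codecs.encode() and codecs.decode() with the standard bytes-to-bytes codecs "zlib_codec" and "base64_codec".

```python
import codecs


def transform(data, name):
    """Apply the codec's encode direction: bytes in, bytes out."""
    if not isinstance(data, bytes):
        raise TypeError("transform() expects bytes")
    return codecs.encode(data, name)


def untransform(data, name):
    """Apply the codec's decode direction: bytes in, bytes out."""
    if not isinstance(data, bytes):
        raise TypeError("untransform() expects bytes")
    return codecs.decode(data, name)


payload = b"spam " * 20

packed = transform(payload, "zlib_codec")      # compresses
restored = untransform(packed, "zlib_codec")   # decompresses

armored = transform(b"spam", "base64_codec")   # produces base64 text as bytes
```

Because both directions take and return bytes, swapping the two calls raises no type error; a mistake shows up only as a codec failure or as wrong data, which is the ambiguity being debated in this part of the thread.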
From guido at python.org Fri May 16 07:23:35 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 15 May 2008 22:23:35 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> Message-ID: On Thu, May 15, 2008 at 10:21 PM, Terry Reedy wrote: > > "Greg Ewing" wrote in message > news:482CF57D.6010200 at canterbury.ac.nz... > | Guido van Rossum wrote: > | > | > Really? Don't you think it's pretty obvious that b.transform("gzip") > | > compresses and b.untransform("gzip") decompresses? Or that > | > b.transform("base64") generates base64 format? > | > | Well, maybe. I think the problem is that the word > | "transform" is inherently direction-neutral, and it > | only becomes obvious that you have a direction in > | mind for it when you pair it with some invention > | such as "untransform". > | > | Maybe it's not all that bad, but it just seems > | like it should be possible to do better than picking > | a very general word like "transform" and giving > | it our own special meaning. > > Would you prefer re_transform, which is English? Yuck, no. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Fri May 16 07:39:30 2008 From: brett at python.org (Brett Cannon) Date: Thu, 15 May 2008 22:39:30 -0700 Subject: [Python-3000] Help with finishing PEP 3108 Message-ID: I need help to finish implementing PEP 3108. While over 80 modules are now deprecated in Python 2.6 (of which I did over 50 of), there are still over 20 tasks left to do in relation to the PEP. My free time is being sucked away since I have a conference paper deadline of June 1. And I am moving May 31. And I have to fly down to California to help my mother move on June 4. 
And other personal stuff (see a certain trend in my life at the moment?). So if you have time to help, please see issue 2775 (http://bugs.python.org/issue2775) and the dependencies list. The issues range from doing patch reviews of work people have already done, renaming a module, creating a new package, removing some use from the stdlib, or even backporting some changes made to 3.0 that were never merged into 2.6. In other words a wide variety of things. =) The PEP outlines the steps necessary to deprecate a module for deletion or for a rename in a step-by-step manner so you don't need to worry about forgetting a step. If you can't choose what to do, the issues that will lead to a module be deleted are the highest priority as renames can be handled by 2to3 in a later version while module deletions are harder to get pushed through and accepted. The modules left to still remove are still there because they are still used somehow in the stdlib. The module renames are mostly done at this point, but the new packages have not been handled yet. Obviously I don't want the beta to be held up by this, nor do I want to see any of the work left out because I couldn't get to it all. So any and all help is appreciated. -Brett From greg.ewing at canterbury.ac.nz Fri May 16 09:31:45 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 16 May 2008 19:31:45 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> Message-ID: <482D3861.8050100@canterbury.ac.nz> Terry Reedy wrote: > Would you prefer re_transform, which is English? Fiddling with the name of the antonym doesn't help. 
The direction of "untransform" or whatever it's called is only as clear as the direction of "transform". -- Greg From mark.russell at zen.co.uk Fri May 16 12:57:37 2008 From: mark.russell at zen.co.uk (Mark Russell) Date: Fri, 16 May 2008 11:57:37 +0100 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482D3861.8050100@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> <482D3861.8050100@canterbury.ac.nz> Message-ID: <4C64E79A-8395-44F4-9B38-8FFB4E001451@zen.co.uk> On 16 May 2008, at 08:31, Greg Ewing wrote: > Fiddling with the name of the antonym doesn't help. How about adding a direction indicator? gzipped = plaintext.transformto("gzip") plaintext = gzipped.transformfrom("gzip") Mark From jjb5 at cornell.edu Fri May 16 15:40:19 2008 From: jjb5 at cornell.edu (Joel Bender) Date: Fri, 16 May 2008 09:40:19 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482D3861.8050100@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> <482D3861.8050100@canterbury.ac.nz> Message-ID: <482D8EC3.1050303@cornell.edu> > Fiddling with the name of the antonym doesn't help. > The direction of "untransform" or whatever it's > called is only as clear as the direction of > "transform". How about making the transformation parameter more descriptive? 
    gzipped = plaintext.transform(plaintext_to_gzip)
    plaintext = gzipped.transform(gzip_to_plaintext)

I would rather have one function that can do lots of different
transformations, the same name can be used for bytes and strings,
the transformation can be subclassed, and it doesn't have to be
reflexive if that doesn't make sense.

    somebytes.transform(ebcdic_to_plaintext)

OK, maybe that's not so common in YOUR world :-)

    pict = open('me.jpg', 'rb').read()
    y = pict.transform(jpeg_to_png).transform(plaintext_to_base64)

Joel

From ncoghlan at gmail.com Fri May 16 16:06:07 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 17 May 2008 00:06:07 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482CC90A.5010907@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz>
Message-ID: <482D94CF.7090107@gmail.com>

Greg Ewing wrote:
> There's nothing wrong with that, but what it doesn't
> answer is why it's not sufficient just to do things
> like
>
>    from gzip import gzip_codec
>    stream2 = BytesTransform(gzip_codec, stream1)
>
> i.e. why there has to be a special kind of namespace
> for codecs.

Selecting an encoding is the kind of thing that will often come from the
application's environment, or user preferences or configuration options,
rather than being hardcoded at development time. With a flat,
string-based codec namespace, those things are trivial to look up.
Having to mess around with __import__ just to support a "choose
compression method" configuration option would be fairly annoying.

The case for the special namespace is much stronger for the actual unicode encodings, but it still has at least some force for the bytes->bytes and str->str transforms. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jjb5 at cornell.edu Fri May 16 17:58:31 2008 From: jjb5 at cornell.edu (Joel Bender) Date: Fri, 16 May 2008 11:58:31 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482D8EC3.1050303@cornell.edu> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> <482D3861.8050100@canterbury.ac.nz> <482D8EC3.1050303@cornell.edu> Message-ID: <482DAF27.4080905@cornell.edu> I wrote: > ...and it doesn't have to be reflexive if that... Umm, that should have said 'have an inverse', which is different than reflexive or symmetric. I get a little lost on 'surjective' and 'injective', having been taught the terms 'onto' and 'one-to-one'. But I digress. For wrapping a file-like object, I would prefer a TransformIO class that takes read and write transform functions, e.g., f = TransformIO(open('data.txt') , read=ebcdic_to_plaintext , write=plaintext_to_ebcdic ) These parameters would be optional, so if 'write' was omitted then write attempts would fail, likewise for 'read'. Using functools.partial could be used to provide common transforms: ISO_8859_1_Transform = functools.partial( TransformIO , read=ISO_8859_1_Decode , write=ISO_8859_1_Encode ) Where the to/from plain text is implicit. And no, I'm not a huge fan of underbars. Joel From stephen at xemacs.org Sat May 17 01:17:29 2008 From: stephen at xemacs.org (Stephen J. 
Turnbull)
Date: Sat, 17 May 2008 08:17:29 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482D8EC3.1050303@cornell.edu>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> <482D3861.8050100@canterbury.ac.nz> <482D8EC3.1050303@cornell.edu>
Message-ID: <87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp>

Joel Bender writes:

> > Fiddling with the name of the antonym doesn't help.
> > The direction of "untransform" or whatever it's
> > called is only as clear as the direction of
> > "transform".
>
> How about making the transformation parameter more descriptive?
>
>     gzipped = plaintext.transform(plaintext_to_gzip)
>     plaintext = gzipped.transform(gzip_to_plaintext)

+1

But why be verbose *and* ignore the vernacular?

    gzipped = plaintext.transform('gzip')
    plaintext = gzipped.transform('gunzip')

I think the style should be EIBTI for "private" protocols, and TOOWDTI
for transforms that wrap well-known libraries.

> I would rather have one function that can do lots of different
> transformations, the same name can be used for bytes and strings,

This is a non-starter, because you don't know what the representation
of strings is. We could be right-thinking and mandate that in the
.transform() context the string representation is considered
big-endian (and for little-endian platforms the bytes are swabbed
before applying the transformation). But that would annoy all the
Wintel users because string.transform('zip') would produce
gobbledygook when unzipped from the command line. And of course
assuming a little-endian representation is un-right-thinkable.

In this sense string-to-string and byte-to-byte *must* be kept
separate from "true" codecs.

I think it would be a very bad idea to allow names to be shared for, say, byte-to-byte and string-to-byte "gzip" for the reason given above. Whether string-to-string and byte-to-byte need to share a namespace is another question, but since we already need three (string->byte, byte->string, byte->byte) that should be forced not to collide, I don't think that there's that big a loss in requiring that .transform('pig_latin') (string to string) be spelled differently from .transform('pig_latin1') (byte to byte assuming ISO 8859/1 data). Do you have use cases where byte-to-byte and string-to-string transformations should share the same name? From greg.ewing at canterbury.ac.nz Sat May 17 10:26:50 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 17 May 2008 20:26:50 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482D94CF.7090107@gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> Message-ID: <482E96CA.80103@canterbury.ac.nz> Nick Coghlan wrote: > Having to mess around with __import__ just to support a "choose > compression method" configuration option would be fairly annoying. Perhaps, but even then, I'm not sure it makes sense to lump them all into the same namespace. If you're choosing a compression method, it makes sense to choose 'zip', 'gzip', or 'bzip2', but less sense to choose 'hex' or 'base64', and even less 'utf8' or 'latin1'. Similarly there will be different appropriate sets for video encoding, audio encoding, etc. -- Greg From mal at egenix.com Sat May 17 11:17:45 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Sat, 17 May 2008 11:17:45 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482E96CA.80103@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482E96CA.80103@canterbury.ac.nz> Message-ID: <482EA2B9.4090801@egenix.com> On 2008-05-17 10:26, Greg Ewing wrote: > Nick Coghlan wrote: > >> Having to mess around with __import__ just to support a "choose >> compression method" configuration option would be fairly annoying. > > Perhaps, but even then, I'm not sure it makes sense to > lump them all into the same namespace. Note that only the stdlib codecs are using one flat namespace. Other codec packages may (and should) register their own codec search functions and can then easily use other namespaces as well. Think of the codec registry and access as a highly specialized module import mechanism. It is well possible to group codecs in packages and then access them via their package name, e.g. 'compress.gzip'. However, in practice, just writing 'gzip' is going to have enough expressiveness to have a programmer understand what is happening. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 17 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Sat May 17 11:18:55 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 17 May 2008 19:18:55 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482E96CA.80103@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482E96CA.80103@canterbury.ac.nz> Message-ID: <482EA2FF.5060306@gmail.com> Greg Ewing wrote: > Nick Coghlan wrote: > >> Having to mess around with __import__ just to support a "choose >> compression method" configuration option would be fairly annoying. > > Perhaps, but even then, I'm not sure it makes sense to > lump them all into the same namespace. > > If you're choosing a compression method, it makes sense > to choose 'zip', 'gzip', or 'bzip2', but less sense to > choose 'hex' or 'base64', and even less 'utf8' or 'latin1'. > > Similarly there will be different appropriate sets for > video encoding, audio encoding, etc. The problem with that is that defining the categories becomes a fairly tedious chore. Having the codec namespace is convenient, and I don't see anything but downsides in trying to replace it with something more complicated. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From stephen at xemacs.org Sat May 17 23:57:34 2008 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sun, 18 May 2008 06:57:34 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482E96CA.80103@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482E96CA.80103@canterbury.ac.nz> Message-ID: <87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > If you're choosing a compression method, it makes sense > to choose 'zip', 'gzip', or 'bzip2', but less sense to > choose 'hex' or 'base64', Doesn't "consenting adults" cover choosing a nonsensical compressor? Do you really think that .transform clients will really choose 'base64' when they want 'lzma'? If so, why isn't if compression_method not in ['zip', 'lzma']: raise PEBKAC_Error sufficient protection? > and even less 'utf8' or 'latin1'. These will fail the typing tests, since they are string->bytes, not bytes->bytes. These tests will be necessary, which could be considered an argument against the flat namespace. 
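The "typing tests" mentioned at the end of the message above can be
illustrated with how later CPython 3 releases (3.4 and newer, so well
after this thread) ended up treating the flat codec namespace. This is
only a sketch of that later behaviour, not something proposed in the
discussion: bytes-to-bytes codecs such as "zlib_codec" are reachable
through codecs.encode() and codecs.decode(), while str.encode() refuses
anything that is not a text encoding.

```python
import codecs

data = b"spam and eggs " * 20

# bytes -> bytes transforms share the flat string namespace with the
# text encodings, but are reached through the codecs module functions.
packed = codecs.encode(data, "zlib_codec")
restored = codecs.decode(packed, "zlib_codec")

# str.encode() only accepts text encodings, so asking it for a
# bytes -> bytes codec is rejected by a type test at lookup time.
try:
    "spam".encode("zlib_codec")
    rejected = False
except LookupError:
    rejected = True
```

The lookup itself stays flat and string-based; the type test is what
keeps a str method from handing back something that is not bytes.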
From greg.ewing at canterbury.ac.nz Sun May 18 01:50:24 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 18 May 2008 11:50:24 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482E96CA.80103@canterbury.ac.nz> <87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <482F6F40.4030902@canterbury.ac.nz> Stephen J. Turnbull wrote: > Do you really think that .transform clients will really choose > 'base64' when they want 'lzma'? It depends on who the "client" is. An application popping up a list of compression methods is just going to confuse users if it lists "base64" as a possibility. So it already needs some application-specific notion of what constitutes a probable compression method built into it, and if that list is to be extensible, it needs an application-specific registry to manage it. Once you've got that, the general codec registry doesn't help you much. -- Greg From stephen at xemacs.org Sun May 18 05:05:59 2008 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sun, 18 May 2008 12:05:59 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482F6F40.4030902@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482E96CA.80103@canterbury.ac.nz> <87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp> <482F6F40.4030902@canterbury.ac.nz> Message-ID: <87lk28fh0o.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > So it already needs some application-specific notion of > what constitutes a probable compression method built > into it, and if that list is to be extensible, it needs > an application-specific registry to manage it. Once > you've got that, the general codec registry doesn't > help you much. Excuse me? The codec-and-transform registry tells whether the codec or transform is available in this Python; that's all it is supposed to do. Even if you do need an application-specific registry of compressors, some Python-level registry is required to determine whether a desired one is actually available and where it lives. True, this could be done through the usual module mechanisms, but that won't require any less coding than using the usual codec mechanism. And I find Nick's rationale for a flat namespace of strings quite convincing given that it won't cost any more. I also suspect that it may make sense to allow various "standard deobfuscations" of codec names as in glibc (whose version of iconv considers "utf8", "UTF-8", and "Utf_8" to be equivalent names for "Unicode UTF-8" according to rules which canonicalize case and strip punctuation), as well as aliasing. 
(These aren't strong reasons for using a flat string registry, but they come more or less for free if we do use it.) From martin at v.loewis.de Sun May 18 09:30:38 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 18 May 2008 09:30:38 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482D94CF.7090107@gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> Message-ID: <482FDB1E.3010303@v.loewis.de> > Selecting an encoding is the kind of thing that will often come from the > application's environment, or user preferences or configuration options, > rather than being hardcoded at development time. And that's the main difference why having encode/decode is a good idea, and having transform/untransform is a bad idea. Encoding names are in configuration data all the time, or even in the actual data (e.g. in MIME); they rarely are in configuration. You typically *don't* read the name of transformations from a configuration file. And even if they are in configuration, you typically have a fixed set of options, rather than an extensible one. > With a flat, > string-based codec namespace, those things are trivial to look up. > Having to mess around with __import__ just to support a "choose > compression method" configuration option would be fairly annoying. I wouldn't mess with import: import gzip, bz2 compressors = {"gzip":gzip.StreamCompressor, "bzip2":bz2.BZ2Compressor} decompressors={"gzip":gzip.StreamDecompressor, "bzip2":bz2.BZ2Decompressor} It's not that people invent new compression methods every day. 
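A runnable variant of the two dictionaries above, as a hedged sketch:
the class names in the message (gzip.StreamCompressor and
gzip.StreamDecompressor) look illustrative rather than real, so this
version assumes the one-shot functions zlib.compress/zlib.decompress
and bz2.compress/bz2.decompress from the standard library instead. The
key names and the roundtrip() helper are made up for the example.

```python
import bz2
import zlib

# Fixed menu of compression methods, keyed by the name that would be
# read from configuration. The key names are illustrative only.
compressors = {"zlib": zlib.compress, "bzip2": bz2.compress}
decompressors = {"zlib": zlib.decompress, "bzip2": bz2.decompress}


def roundtrip(name, payload):
    """Compress and then decompress payload with the configured method."""
    packed = compressors[name](payload)
    return decompressors[name](packed)


# Every method on the menu should give back the original bytes.
results = {name: roundtrip(name, b"hello " * 100) for name in compressors}
```

An unknown name simply raises KeyError from the dictionary lookup, which
is the "fixed set of options" behaviour described above.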
OTOH, these things have often more complex parameters than just a name; e.g. the compressors also take a compression level. In these cases, using output_to = compressors[name](compresslevel=complevel) could work fine (as both might happen to support the compresslevel keyword argument). > The case for the special namespace is much stronger for the actual > unicode encodings, but it still has at least some force for the > bytes->bytes and str->str transforms. Not to me, no. Regards, Martin From regebro at gmail.com Sun May 18 16:38:01 2008 From: regebro at gmail.com (Lennart Regebro) Date: Sun, 18 May 2008 16:38:01 +0200 Subject: [Python-3000] Python incompatibility test project. Message-ID: <319e029f0805180738g633ccef0ke2ebf0bbec200b1c@mail.gmail.com> Hi all! I have created a project to make tests for all incompatibilities between Python 2.5, 2.6 and 3.0. It's hosted on Google code: http://code.google.com/p/python-incompatibility/ It currently contains what I believe to be complete tests of language incompatibilities. It also contains example code of how to avoid the incompatibility if possible and hence write code running under both 2.6 and 3.0. Files called test_something25.py runs under Python 2.5, and 2.6 but should fail under Python 3.0. Files called test_something30.py runs under Python 3.0, but should fail under Python 2.5. Files called test_something26.py runs under Python 2.6 and Python 3.0. It also contains a test runner, runtest.py, and another testrunner that prints out the test in a nice grid, called makereport.py. Both these run under python2.4 to 3.0. makereport.py requires you to have python2.5, python2.6 and python3.0 installed in the path. There is as of today no tests of the standard library changes, but I would like to have it. Help with this is appreciated, ask and ye shall receive commit rights. :) I could also have missed some language incompatibility. 
The report output as of just now is:

                                     Python 2.5 code  Python 2.6 code  Python 3.0 code
Group             Test               2.5  2.6  3.0    2.5  2.6  3.0    2.5  2.6  3.0
classic_classes   MRO                 Y    Y    N      Y    Y    Y      N    N    Y
                  class_type          Y    Y    N      Y    Y    Y      -    -    -
dict              dynamic_key_views   -    -    -      -    -    -      N    N    Y
                  iterator            Y    Y    N      Y    Y    Y      N    N    Y
                  slicing             Y    Y    N      Y    Y    Y      -    -    -
                  sorting             Y    Y    N      Y    Y    Y      -    -    -
division          division            Y    Y    N      Y    Y    Y      N    N    Y
exception_syntax  exception_syntax    Y    Y    N      N    Y    Y      N    Y    Y
filter            filter              Y    Y    N      Y    Y    Y      -    -    -
long              long                Y    Y    N      Y    Y    Y      N    N    Y
map               map                 Y    Y    N      Y    Y    Y      -    -    -
print             print_file          Y    Y    N      N    Y    Y      N    N    Y
                  print_stdout        Y    Y    N      N    Y    Y      N    N    Y
range             range               Y    Y    N      Y    Y    Y      -    -    -
reduce            reduce              Y    Y    N      N    Y    Y      N    Y    Y
sort              sort                Y    Y    N      Y    Y    Y      -    -    -
                  sorted              Y    Y    N      Y    Y    Y      -    -    -
string_exceptions string_exceptions   Y    N    N      -    -    -      -    -    -
unicode           unicode             Y    Y    N      N    Y    Y      N    N    Y
xrange            xrange              Y    Y    N      Y    Y    Y      N    N    Y

Note the following:

- All 2.6 tests run under both 2.6 and 3.0. Python3 is not so
  incompatible as rumour has it. :-)

- There are fewer tests for 3.0 than for 2.5. Much of the
  incompatibility for 3.0 is that you can't do some bad programming
  that you could in 2.x. For example you can't do "adict.keys()[5]" in
  3.0. But why on earth would you misuse dicts like that? :-) Python 3
  will force you to write good code in some cases where you in 2.5 can
  write bad code. :-) So the better your code, the easier to port to
  Python 3. ;-)

Feedback and help are greatly appreciated! No Python 3 experience
necessary, this is a fun way to get to know Python 3!

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From regebro at gmail.com Sun May 18 17:03:16 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 18 May 2008 17:03:16 +0200
Subject: [Python-3000] Python incompatibility test project.
In-Reply-To: <319e029f0805180738g633ccef0ke2ebf0bbec200b1c@mail.gmail.com>
References: <319e029f0805180738g633ccef0ke2ebf0bbec200b1c@mail.gmail.com>
Message-ID: <319e029f0805180803j69503e55jdedfd7b092089e4@mail.gmail.com>

On Sun, May 18, 2008 at 4:38 PM, Lennart Regebro wrote:
> It currently contains what I believe to be complete tests of language
> incompatibilities.

Although I just realized that there is a bunch of builtin methods that
are gone which I don't have tests for. Ah well.

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From paul.bedaride at gmail.com Sun May 18 23:33:34 2008
From: paul.bedaride at gmail.com (paul bedaride)
Date: Sun, 18 May 2008 23:33:34 +0200
Subject: [Python-3000] Metaclass Vs Class Decorator
Message-ID:

I see the PEPs 3115 and 3129 about metaclasses and class decorators.

I think that PEP 3129 needs to be improved to show the way to declare
the decorator and not just the way to apply them.

I also wonder if we need these two things, and if they are not two ways
to express the same semantics.

That's why I want to know how to express the class decorator, to make a
comparison

paul bedaride

From g.brandl at gmx.net Sun May 18 23:35:04 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 18 May 2008 23:35:04 +0200
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To:
References:
Message-ID:

paul bedaride wrote:
> I see the PEPs 3115 and 3129 about metaclasses and class decorators.
>
> I think that PEP 3129 needs to be improved to show the way to declare
> the decorator and not just the way to apply them.
>
> I also wonder if we need these two things, and if they are not two ways
> to express the same semantics.
>
> That's why I want to know how to express the class decorator, to make a
> comparison

A class decorator works exactly like a function decorator, that is,

@foo
class X: ...

is equivalent to

class X: ...
X = foo(X)

This should be all you need to know in order to write a class decorator.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From python at rcn.com Mon May 19 05:36:54 2008
From: python at rcn.com (Raymond Hettinger)
Date: Sun, 18 May 2008 20:36:54 -0700
Subject: [Python-3000] Metaclass Vs Class Decorator
References:
Message-ID: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>

>> That's why I want to know how to express the class decorator, to make a
>> comparison

[Georg]
> A class decorator works exactly like a function decorator, that is,
>
> @foo
> class X: ...
>
> is equivalent to
>
> class X: ...
> X = foo(X)
>
> This should be all you need to know in order to write a class decorator.

I concur.

Raymond

From ncoghlan at gmail.com Mon May 19 07:06:55 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 19 May 2008 15:06:55 +1000
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To:
References:
Message-ID: <48310AEF.8010408@gmail.com>

paul bedaride wrote:
> I also wonder if we need these two things, and if they are not two ways
> to express the same semantics.

Changing the metaclass can lead to some fundamental changes to the way a
class operates. Class decorators are for simpler things which don't
require major changes to the class, and, in particular, things which
shouldn't automatically be inherited by subclasses.

The specific motivating example in the python-dev thread linked from PEP
3129 was a class registry where being a subclass of an already
registered class didn't necessarily imply that the subclass should also
be registered. This semantic is painful to implement using a metaclass,
but trivial with a class decorator.

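The registry case described above can be sketched as follows, using
Python 3 syntax. The names register, AutoRegister, Plugin and Handler
are invented for the illustration and do not come from the PEP or the
python-dev thread.

```python
decorated = []   # filled by the class decorator
via_meta = []    # filled by the metaclass


def register(cls):
    """Class decorator: record exactly the class it is applied to."""
    decorated.append(cls)
    return cls


class AutoRegister(type):
    """Metaclass: records every class created with it, subclasses included."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        via_meta.append(cls)


@register
class Plugin:
    pass


class SubPlugin(Plugin):      # not decorated, so not registered
    pass


class Handler(metaclass=AutoRegister):
    pass


class SubHandler(Handler):    # inherits the metaclass, so registered too
    pass
```

With the decorator, only Plugin ends up in the registry; with the
metaclass, Handler and SubHandler both do, and opting a subclass out
would need extra machinery.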
"Should subclasses implicitly inherit this behaviour" is actually a pretty decent rule of thumb for deciding whether something should be handled with a metaclass or a class decorator. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Mon May 19 17:14:20 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 19 May 2008 08:14:20 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482FDB1E.3010303@v.loewis.de> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> Message-ID: On Sun, May 18, 2008 at 12:30 AM, "Martin v. L?wis" wrote: >> Selecting an encoding is the kind of thing that will often come from the >> application's environment, or user preferences or configuration options, >> rather than being hardcoded at development time. > > And that's the main difference why having encode/decode is a good idea, > and having transform/untransform is a bad idea. > > Encoding names are in configuration data all the time, or even in the > actual data (e.g. in MIME); they rarely are in configuration. > > You typically *don't* read the name of transformations from a > configuration file. And even if they are in configuration, you > typically have a fixed set of options, rather than an extensible > one. > >> With a flat, >> string-based codec namespace, those things are trivial to look up. >> Having to mess around with __import__ just to support a "choose >> compression method" configuration option would be fairly annoying. 
> > I wouldn't mess with import: > > import gzip, bz2 > compressors = {"gzip":gzip.StreamCompressor, > "bzip2":bz2.BZ2Compressor} > decompressors={"gzip":gzip.StreamDecompressor, > "bzip2":bz2.BZ2Decompressor} > > It's not that people invent new compression methods every day. > > OTOH, these things have often more complex parameters than just > a name; e.g. the compressors also take a compression level. In > these cases, using > > output_to = compressors[name](compresslevel=complevel) > > could work fine (as both might happen to support the compresslevel > keyword argument). > >> The case for the special namespace is much stronger for the actual >> unicode encodings, but it still has at least some force for the >> bytes->bytes and str->str transforms. > > Not to me, no. Hm, Martin is pretty convincing here. Before we go ahead and accept .transform() and friends (by whatever name) we should look for convincing use cases where the transformation is typically given by some other input, rather than hard-coded in the app. (And cases where there are two or three possibilities from a fixed menu don't count -- so that would rule out Content-transfer-encoding.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jjb5 at cornell.edu Mon May 19 17:53:11 2008 From: jjb5 at cornell.edu (Joel Bender) Date: Mon, 19 May 2008 11:53:11 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> <482D3861.8050100@canterbury.ac.nz> <482D8EC3.1050303@cornell.edu> <87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4831A267.5000304@cornell.edu> Stephen J. 
Turnbull wrote:

> But why be verbose *and* ignore the vernacular?
>
>     gzipped = plaintext.transform('gzip')
>     plaintext = gzipped.transform('gunzip')

I'm generally resistant to a registry; none of my applications are so general that they would take advantage of a string-key-to-dictionary-to-function-pointer. If they did, they would have to have some pretty severe constraints on what functions can be selected, so I would end up building my own context-sensitive dictionary of available functions.

I'm in favor of:

    gzipped = plaintext.transform(zlib.compress)
    plaintext = gzipped.transform(zlib.decompress)

So, you may ask, why would that be any better than this...

    gzipped = zlib.compress(plaintext)

...and the answer is that it depends on what you consider the most appropriate design pattern to follow.

> I think the style should be EIBTI for "private" protocols, and TOOWDTI
> for transforms that wrap well-known libraries.

I've been around socket libraries and protocol encoding/decoding stacks too long I guess, or I'm just jaded, but TOOWDTI is a pipe dream. There's Only One Blessed Way To Do It I can understand and appreciate.

EIBTI trumps TOOWDTI when it has to go through a registry. I would be -1 on this design:

    In module codecs:

        from gzip import compress as _gzip_compress
        ...
        _registry['gzip'] = _gzip_compress

Where there is a great deal of code that enforces TOOWDTI, effectively obfuscating the fact that all you're passing to transform() is nothing more magical than a reference to a function.

> This is a non-starter, because you don't know what the representation
> of strings is.

If you're working on that kind of application. My applications have to know what the items in the sequence are, or they have to figure it out, but when it comes time to do the transformation, they know.
> We could be right-thinking and mandate that in the
> .transform() context the string representation is considered
> big-endian (and for little-endian platforms the bytes are swabbed
> before applying the transformation).

Yuck.

> But that would annoy all the Wintel users because string.transform('zip')
> would produce gobbledygook when unzipped from the command line. And
> of course assuming a little-endian representation is un-right-thinkable.

It would annoy me because mandating the format of the input is up to the transformation function, not transform().

    y = x.transform(f)

If there is some endian restriction on f, it should detect it and enforce it, or if it can't, document it. If there is some platform strangeness, it should take that into account.

> In this sense string-to-string and byte-to-byte *must* be kept
> separate from "true" codecs.

I don't know of any codecs that aren't true. Some may be more popular or common than others, and the more popular ones may be blessed by being presented as easily accessible, just like your gunzip === gzip_to_plaintext.

> I think it would be a very bad idea to allow names to be shared
> for, say, byte-to-byte and string-to-byte "gzip" for the reason
> given above.

I don't agree, only because I've written plenty of functions that can take a variety of different kinds of inputs as a convenience. If zlib.compress can take bytes or strings I would be fine with that, and if I could be more explicit, e.g.,

    gzipped = plainbytes.transform(zlib.compress_bytes)

I would be even happier.

What is not available in Python that is in C++, and believe me, I don't miss it all THAT much, is a way to select the appropriate function based on both the input and output. Annotations would have been a way to do it, but there are far too many people who don't like it for very good reasons.
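To make the "one function, several input types" idea concrete, here is a rough sketch that selects an implementation from the type of the input. It assumes functools.singledispatch, which was added to the standard library well after this thread, and the names compress, _compress_bytes and _compress_str are made up for illustration. It dispatches on the input only; nothing here selects on the output type.

```python
import zlib
from functools import singledispatch


@singledispatch
def compress(data):
    # Fallback for types that have no registered implementation.
    raise TypeError("cannot compress %s" % type(data).__name__)


@compress.register(bytes)
def _compress_bytes(data):
    # Bytes already have a byte representation, so compress them directly.
    return zlib.compress(data)


@compress.register(str)
def _compress_str(data, encoding="utf-8"):
    # A string has no fixed byte representation, so one is chosen explicitly
    # (UTF-8 by default) before compressing.
    return zlib.compress(data.encode(encoding))
```

With this, compress(b"abc") and compress("abc") both return compressed bytes, while compress(42) raises TypeError.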
> Whether string-to-string and byte-to-byte need to share a namespace is > another question, but since we already need three (string->byte, > byte->string, byte->byte) that should be forced not to collide, I > don't think that there's that big a loss in requiring that > .transform('pig_latin') (string to string) be spelled differently from > .transform('pig_latin1') (byte to byte assuming ISO 8859/1 data). I agree, and I don't think there's an advantage to passing string names. import piglatin as pig piggy = mytext.transform(pig.latin1_encode) I'm -1 on transform.register('pig_latin1', pig.latin1_encode). > Do you have use cases where byte-to-byte and string-to-string > transformations should share the same name? Not in the same module. Joel From guido at python.org Mon May 19 18:10:19 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 19 May 2008 09:10:19 -0700 Subject: [Python-3000] Metaclass Vs Class Decorator In-Reply-To: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1> References: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1> Message-ID: On Sun, May 18, 2008 at 8:36 PM, Raymond Hettinger wrote: >>> It's why a want to know how to express the class decorator for making a >>> comparison > > [Georg] >> >> A class decorator works exactly like a function decorator, that is, >> >> @foo >> class X: ... >> >> is equivalent to >> >> class X: ... >> X = foo(X) >> >> This should be all you need to know in order to write a class decorator. > > I concur. Technically, that's true, but an example wouldn't hurt. Examples also help understanding the motivation. Even the difference between class decorators and metaclasses could be explained with examples. (E.g. a metaclass that auto-registers its classes vs. a class decorator that registers a class.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at egenix.com Mon May 19 19:03:53 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Mon, 19 May 2008 19:03:53 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> Message-ID: <4831B2F9.8040001@egenix.com> On 2008-05-19 17:14, Guido van Rossum wrote: > Hm, Martin is pretty convincing here. Before we go ahead and accept > .transform() and friends (by whatever name) we should look for > convincing use cases where the transformation is typically given by > some other input, rather than hard-coded in the app. (And cases where > there are two or three possibilities from a fixed menu don't count -- > so that would rule out Content-transfer-encoding.) The .transform() methods are meant as interface to same type codecs in general, not just compression algorithms. They are convenience methods to the codecs registry with the added benefit of applying type checks which the codecs registry does not guarantee since it only manages codecs. Of course, you can write everything directly against the codec registry or some other specialized interface, but that's not really what we're after here. The methods are meant to make code easy to write in the general use case, without having to worry about special parameters or finding the right module and function names. Motivation: When was the last time you used a gzip compression option (ie. yes there are options, but do you use them in the general use case) ? Can you write code that applies UU encoding without looking up the details in the documentation (ie. there is a module for doing UU-encoding in the stdlib, but what's it's name, what's the function, does it need extra logic) ? 
The motivation is not driven by having the need to pass a configuration parameter to a .transform() method. It's being able to write str.transform('gzip').transform('uu') which doesn't require knowledge about the modules doing the actual work behind the scenes. We're not adding those methods because there's no other way to get the functionality. It's all about usability, readability and PEP20 ("Beautiful is better than ugly."). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 19 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From python at rcn.com Mon May 19 19:19:06 2008 From: python at rcn.com (Raymond Hettinger) Date: Mon, 19 May 2008 10:19:06 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> Message-ID: <006b01c8b9d4$730657e0$ac00a8c0@RaymondLaptop1> [MAL] > It's being able to write > > str.transform('gzip').transform('uu') > > which doesn't require knowledge about the modules doing the actual > work behind the scenes. What is the reverse operation for the above example: str.untransform('uu').untransform('gzip')? 
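For comparison, a minimal sketch of that round trip written with plain function calls. It assumes zlib.compress as a stand-in for the 'gzip' transform and binascii's uuencode helpers for 'uu'; the helper names uu_encode, uu_decode, pack and unpack are invented for the example. The only point is that the inverse steps have to be applied in the opposite order:

```python
import binascii
import zlib


def uu_encode(data):
    # binascii.b2a_uu accepts at most 45 bytes per call, so encode in chunks.
    # Only the encoded body is produced (no "begin"/"end" lines).
    return b"".join(binascii.b2a_uu(data[i:i + 45])
                    for i in range(0, len(data), 45))


def uu_decode(text):
    # Each uuencoded line decodes independently of the others.
    return b"".join(binascii.a2b_uu(line)
                    for line in text.splitlines(keepends=True))


def pack(data):
    # Forward direction: compress first, then uuencode.
    return uu_encode(zlib.compress(data))


def unpack(text):
    # Reverse direction: undo the uuencoding first, then decompress.
    return zlib.decompress(uu_decode(text))
```

Here unpack(pack(data)) gives back the original bytes, while applying the two inverse steps in the other order fails.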
Why can't we use codecs and stick with the usual encode/decode methods? Raymond From jjb5 at cornell.edu Mon May 19 19:21:38 2008 From: jjb5 at cornell.edu (Joel Bender) Date: Mon, 19 May 2008 13:21:38 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <482FDB1E.3010303@v.loewis.de> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> Message-ID: <4831B722.3070707@cornell.edu> Martin v. L?wis wrote: > And that's the main difference why having encode/decode is a good idea, > and having transform/untransform is a bad idea. I agree that 'untransform' is a bad name for the inverse of transform, but I don't think 'transform' is bad. For me the distinction is existence of a 'model'. sequence -> model -> sequence ...is different than... sequence -> sequence where 'sequence' is a string, bytes or stream. In transformations there is no intermediate model. > OTOH, these things have often more complex parameters than just > a name; e.g. the compressors also take a compression level. In > these cases, using > > output_to = compressors[name](compresslevel=complevel) > > could work fine (as both might happen to support the compresslevel > keyword argument). Your example seems to indicate a model->sequence operation, that I would call 'encode'. Now the question becomes, given 'f', what makes more sense: (a) y = x.transform(f) (b) y = x.encode(f) (c) y = f(x) What do you expect the function signature of 'output_to' to be? Is it callable? Is it something that is going to be a stream wrapper, that has .read() and .write()? 
Is it an intermediary, something that can be built as an object and bound between two streams bidirectionally? f().transform(x, y) Another case, which would suffer from as much if not more API confusion, would be encrypting and decrypting... from Crypto.Cipher import DES obj = DES.new('abcdefgh', DES.ECB) plain = "Guido van Rossum is a space alien.XXXXXX" In this case using .transform() would seem to be a good fit because there is no model, but 'obj' suffers from being directionless, so it becomes this... ciph = plain.transform(obj.encrypt) ...which isn't substantially clearer than... ciph = obj.encrypt(plain) Parametric transformations don't bother me, but that would be an indication that there's a lot more going on, and perhaps there are better (and pre-existing) labels for these functions. Joel From paul.bedaride at gmail.com Mon May 19 19:34:17 2008 From: paul.bedaride at gmail.com (paul bedaride) Date: Mon, 19 May 2008 19:34:17 +0200 Subject: [Python-3000] Metaclass Vs Class Decorator In-Reply-To: References: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1> Message-ID: I think about it, and I think that it's two differents way of applying a similar thing, it's why I wonder, if this can't be good if metaclass and class decorator have the same interface, then we can use a class as a metaclass or as a decorator ?? paul bedaride On Mon, May 19, 2008 at 6:10 PM, Guido van Rossum wrote: > On Sun, May 18, 2008 at 8:36 PM, Raymond Hettinger wrote: > >>> It's why a want to know how to express the class decorator for making a > >>> comparison > > > > [Georg] > >> > >> A class decorator works exactly like a function decorator, that is, > >> > >> @foo > >> class X: ... > >> > >> is equivalent to > >> > >> class X: ... > >> X = foo(X) > >> > >> This should be all you need to know in order to write a class decorator. > > > > I concur. > > Technically, that's true, but an example wouldn't hurt. Examples also > help understanding the motivation. 
Even the difference between class
> decorators and metaclasses could be explained with examples. (E.g. a
> metaclass that auto-registers its classes vs. a class decorator that
> registers a class.)
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon May 19 19:36:34 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 19 May 2008 10:36:34 -0700
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To:
References: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>
Message-ID:

You ought to ask this on c.l.py. The designers of the feature were well aware of the similarities, and also of the differences, and the decision was made to have both. Explaining this to every person who asks is not a good use of our time.

On Mon, May 19, 2008 at 10:34 AM, paul bedaride wrote:
> I think about it, and I think that it's two differents way of applying a
> similar thing,
>
> it's why I wonder, if this can't be good if metaclass and class decorator
> have the same
> interface, then we can use a class as a metaclass or as a decorator ??
>
> paul bedaride
>
> On Mon, May 19, 2008 at 6:10 PM, Guido van Rossum wrote:
>>
>> On Sun, May 18, 2008 at 8:36 PM, Raymond Hettinger wrote:
>> >>> It's why a want to know how to express the class decorator for making
>> >>> a
>> >>> comparison
>> >
>> > [Georg]
>> >>
>> >> A class decorator works exactly like a function decorator, that is,
>> >>
>> >> @foo
>> >> class X: ...
>> >>
>> >> is equivalent to
>> >>
>> >> class X: ...
>> >> X = foo(X)
>> >>
>> >> This should be all you need to know in order to write a class
>> >> decorator.
>> >
>> > I concur.
>> >> Technically, that's true, but an example wouldn't hurt. Examples also >> help understanding the motivation. Even the difference between class >> decorators and metaclasses could be explained with examples. (E.g. a >> metaclass that auto-registers its classes vs. a class decorator that >> registers a class.) >> >> -- >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> _______________________________________________ >> Python-3000 mailing list >> Python-3000 at python.org >> http://mail.python.org/mailman/listinfo/python-3000 >> Unsubscribe: >> http://mail.python.org/mailman/options/python-3000/paul.bedaride%40gmail.com > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at egenix.com Mon May 19 19:36:44 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 19 May 2008 19:36:44 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <006b01c8b9d4$730657e0$ac00a8c0@RaymondLaptop1> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> <006b01c8b9d4$730657e0$ac00a8c0@RaymondLaptop1> Message-ID: <4831BAAC.9050403@egenix.com> On 2008-05-19 19:19, Raymond Hettinger wrote: > [MAL] >> It's being able to write >> >> str.transform('gzip').transform('uu') >> >> which doesn't require knowledge about the modules doing the actual >> work behind the scenes. 
> > What is the reverse operation for the above example: > str.untransform('uu').untransform('gzip')? Yes. BTW: Since the codecs do bytes->bytes conversion, I should have written bytes.transform('gzip').transform('uu') > Why can't we use codecs and stick with the usual encode/decode methods? That's what you can do in Python 2.x. In Py 3.x, .encode() and .decode() have strict type requirements on their return types. .transform() and .untransform() return the same type, .encode() and .decode() return bytes and str resp. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 19 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From tjreedy at udel.edu Mon May 19 20:12:54 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 19 May 2008 14:12:54 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> Message-ID: "M.-A. Lemburg" wrote in message news:4831B2F9.8040001 at egenix.com... | Motivation: When was the last time you used a gzip compression | option (ie. yes there are options, but do you use them in the | general use case) ? 
Can you write code that applies UU encoding | without looking up the details in the documentation (ie. there | is a module for doing UU-encoding in the stdlib, but what's it's | name, what's the function, does it need extra logic) ? This suggests to me the possibility of two more packages for the reorganized stdlib: b2b and s2s. Or of considating most transform functions into one module, just as math and cmath consolidate float and complex transforms -- some with inverses and some not. IOW, I think .transform may be the wrong solution to library disorganization. | The motivation is not driven by having the need to pass a | configuration parameter to a .transform() method. | | It's being able to write | | str.transform('gzip').transform('uu') To me, this is to uu(gzip(s)) as somefloat.transform('cos').transform('sin') is to sin(cos(somefloat)) | which doesn't require knowledge about the modules doing the actual | work behind the scenes. It does require knowledge of the registered name. | We're not adding those methods because there's no other way | to get the functionality. It's all about usability, readability | and PEP20 ("Beautiful is better than ugly."). I think I find the direct function call more readable and prettier. tjr From mal at egenix.com Mon May 19 21:22:57 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 19 May 2008 21:22:57 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> Message-ID: <4831D391.8030008@egenix.com> On 2008-05-19 20:12, Terry Reedy wrote: > IOW, I think .transform may be the wrong solution to library > disorganization. 
Those methods are not meant to help with the library reorg. They are needed as an easy way to access codecs that perform str->str or bytes->bytes encoding/decoding, e.g. for escaping text ('unicode-printable', 'xml-escape'). I'm using gzip, uu or base64 as examples, since those codecs already exist in Python 2.x and currently cannot be used in Python 3.x due to the type restrictions on the .encode() and .decode() methods. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 19 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From martin at v.loewis.de Mon May 19 23:03:23 2008 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 19 May 2008 23:03:23 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <4831B2F9.8040001@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> Message-ID: <4831EB1B.4080404@v.loewis.de> > They are convenience methods to the codecs registry > with the added benefit of applying type checks which the codecs > registry does not guarantee since it only manages codecs. 
I argue that things that could be parameters to .transform don't belong into the codec registry in the first place. > Of course, you can write everything directly against the codec > registry or some other specialized interface, but that's not > really what we're after here. No need for writing directly against the codec registry. Using some other specialized interface: yes, Yes, YES! > Motivation: When was the last time you used a gzip compression > option (ie. yes there are options, but do you use them in the > general use case) ? Depends on what I do: when I invoke gzip from the command line, I pass -9 all the time, as a habit. Or did you mean "in Python"? It's a long time that I needed to use the gzip module at all; and the last few times, I suppose it was always through the tarfile module. I use gzip so rarely that I find it wasteful that it gets its own shortcut. If I had a (half-serious) wish for a string method shortcut, it would be "GET / HTTP/1.0\r\n\r\n".sendto("foo.bar.com", 80) Perhaps I should write a codec for that: "GET / HTTP/1.0\r\n\r\n".encode("http:foo.bar.com") which sends the request and returns the response :-) > Can you write code that applies UU encoding > without looking up the details in the documentation (ie. there > is a module for doing UU-encoding in the stdlib, but what's it's > name, what's the function, does it need extra logic) ? You mean, without looking into the HTML documentation? Sure enough. "import uu" I remember, then I do help(uu), scroll to the end. 
If you can't remember that the module's name is uu, then you probably can't remember the codec, either: py> "foo".encode("uuencode") Traceback (most recent call last): File "", line 1, in LookupError: unknown encoding: uuencode Regards, Martin From martin at v.loewis.de Mon May 19 23:15:45 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 19 May 2008 23:15:45 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <4831B722.3070707@cornell.edu> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B722.3070707@cornell.edu> Message-ID: <4831EE01.7020704@v.loewis.de> >> output_to = compressors[name](compresslevel=complevel) >> > Your example seems to indicate a model->sequence operation, that I would > call 'encode'. Now the question becomes, given 'f', what makes more sense: > > (a) y = x.transform(f) > (b) y = x.encode(f) > (c) y = f(x) > > What do you expect the function signature of 'output_to' to be? People brought that up in the context of stacking streams. So output_to would have a stream interface, so you would say (d) output_to.write(x) (and yes, I do recognize that the ultimate receiver of the output, e.g. the socket or such, is missing in my API) > Is it > callable? Is it something that is going to be a stream wrapper, that > has .read() and .write()? That's what I meant it to be. I'm not quite sure why you are asking these questions. > In this case using .transform() would seem to be a good fit because > there is no model, but 'obj' suffers from being directionless, so it > becomes this... 
> > ciph = plain.transform(obj.encrypt) > > ...which isn't substantially clearer than... > > ciph = obj.encrypt(plain) It isn't substantially clearer, and *therefore* it is a good fit??? > Parametric transformations don't bother me, but that would be an > indication that there's a lot more going on, and perhaps there are > better (and pre-existing) labels for these functions. If you are saying that we should call it .encrypt, not .transform: I completely agree. Regards, Martin From stephen at xemacs.org Tue May 20 00:27:46 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 20 May 2008 07:27:46 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> Message-ID: <87zlqmdj4t.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > Hm, Martin is pretty convincing here. Before we go ahead and accept > .transform() and friends (by whatever name) we should look for > convincing use cases where the transformation is typically given by > some other input, rather than hard-coded in the app. (And cases where > there are two or three possibilities from a fixed menu don't count -- > so that would rule out Content-transfer-encoding.) I don't understand the motivation for this restriction. I think we do not want to share names across categories, so the size of any given category is not important, it's the whole registry that is useful. If people want to filter on category, the registry entries could be given a 'category' attribute. Aside from that, the kind of application I have in mind is indeed something like the email module and its clients (like Mailman). 
Things like language_charset_map = { 'japanese' : 'iso-2022-jp', 'english' : 'iso-8859-1', 'russian' : 'koi8-r', ... } charset_transfer_encoding_map = { 'iso-2022-jp' : 'base64', 'iso-8859-1' : 'quoted-printable', 'koi8-r' : 'base64', ... } mime_type_compression_map = { 'text/plain' : None, 'img/bmp' : 'gzip', ... } with the almost obvious definition of transform_mime_body(). This kind of table is often given in a file accessed by non-Python- programmers. For example, for encodings that are not mostly ASCII, gzipped base64 may be a very economical way to transmit (and store) a text part. However, a non-English list that transmits a lot of code might prefer quoted-printable to allow the code to be analyzed by some kind of robot (obviously a legacy app!), and many lists will have strong preferences between UTF-8 and a legacy encoding. Japanese companies often have corporate encodings containing characters not available in JIS (and sometimes not in Unicode). A list dedicated to image processing may want to add image/* formats that haven't yet been registered with the IANA, etc. On the Mailman lists it is a FAQ that people don't understand the difference between 'None' and None. I don't think we can avoid None, True, and False, but for many Mailman admins the difference between 'gzip' and Compressors.gzip.compress is non-obvious and annoying. Giving string names to all these transforms would make the administration interface perceptibly more regular. On the other hand, suppose we have a web interface for configuration so that the admins don't ever see the difference between a codec registry key and a Python identifier. Do we want to expose all the possible compressors, codecs, transfer encodings, and what not in the module that provides the configuration UI so that the list of names can be provided? How does the web interface avoid needing to know all of those in advance? How does the web interface know which functions are which (eg, compressor v. decompressor)? 
Of course the same questions apply to a registry, but as functionality (answers to those questions) is added to the registry, the changes needed to take advantage of it are much more localized and less invasive than, say, requiring "compressors" to provide "compress" and "uncompress" functions or methods, and a standard set of options. The main thing that I sympathize with in Martin's post is the issue of options to transforms, but it seems to me that keyword arguments deal with that clearly and flexibly. From guido at python.org Tue May 20 00:32:43 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 19 May 2008 15:32:43 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87zlqmdj4t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <87zlqmdj4t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, May 19, 2008 at 3:27 PM, Stephen J. Turnbull wrote: > Guido van Rossum writes: > > > Hm, Martin is pretty convincing here. Before we go ahead and accept > > .transform() and friends (by whatever name) we should look for > > convincing use cases where the transformation is typically given by > > some other input, rather than hard-coded in the app. (And cases where > > there are two or three possibilities from a fixed menu don't count -- > > so that would rule out Content-transfer-encoding.) > > I don't understand the motivation for this restriction. I think we do > not want to share names across categories, so the size of any given > category is not important, it's the whole registry that is useful. If > people want to filter on category, the registry entries could be given > a 'category' attribute. 
> > Aside from that, the kind of application I have in mind is indeed > something like the email module and its clients (like Mailman). > Things like > > language_charset_map = { 'japanese' : 'iso-2022-jp', > 'english' : 'iso-8859-1', > 'russian' : 'koi8-r', > ... } > > charset_transfer_encoding_map = { 'iso-2022-jp' : 'base64', > 'iso-8859-1' : 'quoted-printable', > 'koi8-r' : 'base64', > ... } > > mime_type_compression_map = { 'text/plain' : None, > 'img/bmp' : 'gzip', > ... } > > with the almost obvious definition of transform_mime_body(). > > This kind of table is often given in a file accessed by non-Python- > programmers. For example, for encodings that are not mostly ASCII, > gzipped base64 may be a very economical way to transmit (and store) a > text part. However, a non-English list that transmits a lot of code > might prefer quoted-printable to allow the code to be analyzed by some > kind of robot (obviously a legacy app!), and many lists will have > strong preferences between UTF-8 and a legacy encoding. Japanese > companies often have corporate encodings containing characters not > available in JIS (and sometimes not in Unicode). A list dedicated to > image processing may want to add image/* formats that haven't yet been > registered with the IANA, etc. > > On the Mailman lists it is a FAQ that people don't understand the > difference between 'None' and None. I don't think we can avoid None, > True, and False, but for many Mailman admins the difference between > 'gzip' and Compressors.gzip.compress is non-obvious and annoying. > Giving string names to all these transforms would make the > administration interface perceptibly more regular. There's no reason that for this pretty unusual and specific case you couldn't have your own function that is controlled by the string value read from the map edited by the list admin. I think the real abomination here is to expect list admins to use Python syntax at all. 
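A minimal sketch of the approach suggested above: an application-local table maps the strings a list admin edits to ordinary callables, with no global registry involved. The names COMPRESSORS and TRANSFER_ENCODERS are hypothetical, and transform_mime_body is borrowed from the quoted message, which does not give its definition. The code uses present-day Python 3 bytes APIs, not anything proposed in this thread.

```python
import base64
import gzip
import quopri

# Hypothetical application-local tables: admin-facing string -> callable.
COMPRESSORS = {
    None: lambda data: data,          # no compression
    "gzip": gzip.compress,
}
TRANSFER_ENCODERS = {
    "base64": base64.encodebytes,
    "quoted-printable": quopri.encodestring,
}


def transform_mime_body(body, compression, transfer_encoding):
    """Apply the transforms named by two config values to a bytes body."""
    try:
        compress = COMPRESSORS[compression]
        encode = TRANSFER_ENCODERS[transfer_encoding]
    except KeyError as exc:
        # Report the unknown name instead of leaking a bare KeyError.
        raise ValueError("unknown transform name: %r" % (exc.args[0],)) from None
    return encode(compress(body))
```

The admin only ever sees the dictionary keys; adding a transform means adding one entry to a table the application owns.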
> On the other hand, suppose we have a web interface for configuration > so that the admins don't ever see the difference between a codec > registry key and a Python identifier. Do we want to expose all the > possible compressors, codecs, transfer encodings, and what not in the > module that provides the configuration UI so that the list of names > can be provided? How does the web interface avoid needing to know all > of those in advance? How does the web interface know which functions > are which (eg, compressor v. decompressor)? > > Of course the same questions apply to a registry, but as functionality > (answers to those questions) is added to the registry, the changes > needed to take advantage of it are much more localized and less > invasive than, say, requiring "compressors" to provide "compress" and > "uncompress" functions or methods, and a standard set of options. > > The main thing that I sympathize with in Martin's post is the issue of > options to transforms, but it seems to me that keyword arguments deal > with that clearly and flexibly. > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From turnbull at sk.tsukuba.ac.jp Tue May 20 01:07:55 2008 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 20 May 2008 08:07:55 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <4831A267.5000304@cornell.edu> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com> <482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz> <482CF57D.6010200@canterbury.ac.nz> <482D3861.8050100@canterbury.ac.nz> <482D8EC3.1050303@cornell.edu> <87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp> <4831A267.5000304@cornell.edu> Message-ID: <87y765evuc.fsf@uwakimon.sk.tsukuba.ac.jp> Joel Bender writes: A lot, but I don't understand why. 
You seem to have a completely different pattern (and Python 2, not Python 3) in mind, but in fact as far as I can see the only point of conflict is that if the "registry of string names" proposal were adopted, you'd have trouble using the method name 'transform' as you would like to. There's nothing in the registry proposal that prevents you from calling functions by name, or writing polymorphic transformers, etc. From greg.ewing at canterbury.ac.nz Tue May 20 03:59:22 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 20 May 2008 13:59:22 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <4831B2F9.8040001@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> Message-ID: <4832307A.6040609@canterbury.ac.nz> M.-A. Lemburg wrote: > It's being able to write > > str.transform('gzip').transform('uu') > > which doesn't require knowledge about the modules doing the actual > work behind the scenes. That doesn't preclude those modules exporting their functionality in the form of codecs having the standard codec interface. There are two independent issues here: 1) Should the functionality be provided in the form of a codec? (Yes, that's fine, IMO.) 2) Should all codecs live in a central registry and be callable via methods on strings and bytes? (I'm not convinced that's the case.) 
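To illustrate the distinction between the two issues, the sketch below reaches the same zlib functionality in two ways: by string name through the codec registry, and directly through the module. It is written against present-day Python 3 and assumes the standard library's bytes-to-bytes codec name `zlib_codec`; it is not the `.transform()` API under discussion.

```python
import codecs
import zlib

data = b"spam and eggs " * 10

# By name: the caller supplies only a string; the registry locates the codec.
by_name = codecs.encode(data, "zlib_codec")

# Directly: the caller imports and calls the specialized module itself.
direct = zlib.compress(data)

# Either result is an ordinary zlib stream that round-trips to the input.
restored_by_name = codecs.decode(by_name, "zlib_codec")
restored_direct = zlib.decompress(direct)
```

Both paths end up in the same underlying implementation; the registry adds only the name lookup.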
-- Greg From greg.ewing at canterbury.ac.nz Tue May 20 04:10:33 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 20 May 2008 14:10:33 +1200 Subject: [Python-3000] Metaclass Vs Class Decorator In-Reply-To: References: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1> Message-ID: <48323319.7080901@canterbury.ac.nz> paul bedaride wrote: > it's why I wonder, if this can't be good if metaclass and class > decorator have the same > interface, then we can use a class as a metaclass or as a decorator ?? That doesn't make sense -- metaclasses and class decorators are very different things and have very different capabilities. There is some overlap between the things they can do, but trying to unify them would be a mistake. -- Greg From mal at egenix.com Tue May 20 12:06:38 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 20 May 2008 12:06:38 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <4832307A.6040609@canterbury.ac.nz> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> <4832307A.6040609@canterbury.ac.nz> Message-ID: <4832A2AE.7090700@egenix.com> On 2008-05-20 03:59, Greg Ewing wrote: > M.-A. Lemburg wrote: >> It's being able to write >> >> str.transform('gzip').transform('uu') >> >> which doesn't require knowledge about the modules doing the actual >> work behind the scenes. > > That doesn't preclude those modules exporting their > functionality in the form of codecs having the standard > codec interface. Note that all codecs we currently have in Python are in fact modules that you can import and use directly - even subclass to provide more or altered functionality, e.g. 
from encodings import latin_1 will give you direct access to the Latin-1 codec. You seem to be worried that the functionality is supposed to be buried deep in some codec registry - that's not the case. The codec registry only takes care of finding a codec interface given a name, nothing more. Also note that I'm not suggesting to remove any of the existing implementations of specialized interfaces for e.g. compression or base64 encoding. The codecs for these only use these interface without assimilating them :-) > There are two independent issues here: > > 1) Should the functionality be provided in the form > of a codec? (Yes, that's fine, IMO.) > > 2) Should all codecs live in a central registry and > be callable via methods on strings and bytes? > (I'm not convinced that's the case.) I think there's a misunderstanding here in how codecs work. Codecs exist to provide a consistent and well-defined interface to a wide range of encoding and decoding applications. They are not trying to: * compete with specialized interfaces * replace specialized interfaces * hide specialized interfaces from the user -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 20 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Tue May 20 12:19:40 2008 From: mal at egenix.com (M.-A. 
Lemburg)
Date: Tue, 20 May 2008 12:19:40 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4831EB1B.4080404@v.loewis.de>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
 <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
 <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz>
 <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz>
 <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de>
 <4831B2F9.8040001@egenix.com> <4831EB1B.4080404@v.loewis.de>
Message-ID: <4832A5BC.60704@egenix.com>

On 2008-05-19 23:03, Martin v. Löwis wrote:
>> They are convenience methods to the codecs registry
>> with the added benefit of applying type checks which the codecs
>> registry does not guarantee since it only manages codecs.
>
> I argue that things that could be parameters to .transform don't
> belong into the codec registry in the first place.
>
>> Of course, you can write everything directly against the codec
>> registry or some other specialized interface, but that's not
>> really what we're after here.
>
> No need for writing directly against the codec registry.
>
> Using some other specialized interface: yes, Yes, YES!

So you would like to force users to write e.g.

    def uu(input, errors='strict', filename='', mode=0666):
        from cStringIO import StringIO
        from binascii import b2a_uu
        # using str() because of cStringIO's undesired Unicode behavior.
        infile = StringIO(str(input))
        outfile = StringIO()
        read = infile.read
        write = outfile.write

        # Encode
        write('begin %o %s\n' % (mode & 0777, filename))
        chunk = read(45)
        while chunk:
            write(b2a_uu(chunk))
            chunk = read(45)
        write(' \nend\n')

        return outfile.getvalue()

(this is adapted Py2 code taken from the uu codec) instead of writing

    output = input.transform('uu')

Fair enough, I've noted your -1.

Still, I don't think the specialized interfaces are very user-friendly.
They do serve their purpose, but common usage just doesn't really bother with all those details. And it doesn't end there... You have to look up, implement and test a similar standardizing function for all other specialized interfaces you want to use as well - more or less reinventing the codec interface for every application you write. Anyway, even without a .transform() method, you can still do: import codecs output = codecs.encode(input, 'uu') However, you then have to do the type checking yourself. -- Marc-Andre Lemburg eGenix.com From martin at v.loewis.de Tue May 20 20:35:23 2008 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 20 May 2008 20:35:23 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <4832A5BC.60704@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de> <4831B2F9.8040001@egenix.com> <4831EB1B.4080404@v.loewis.de> <4832A5BC.60704@egenix.com> Message-ID: <483319EB.4090904@v.loewis.de> > So you would like to force users to write e.g. 
> > def uu(input,errors='strict',filename='',mode=0666): > from cStringIO import StringIO > from binascii import b2a_uu > # using str() because of cStringIO's Unicode undesired Unicode > behavior. > infile = StringIO(str(input)) > outfile = StringIO() > read = infile.read > write = outfile.write > > # Encode > write('begin %o %s\n' % (mode & 0777, filename)) > chunk = read(45) > while chunk: > write(b2a_uu(chunk)) > chunk = read(45) > write(' \nend\n') > > return outfile.getvalue() > > (this is adapted Py2 code taken from the uu codec) No. I would just use uu.encode instead, which already does the loop, and everything else. So if I really wanted a string-to-string conversion, I would do infile = StringIO(input) outfile = StringIO() uu.encode(infile, outfile) output = outfile.getvalue() More likely, I have file-like objects already, in which case I won't need to create StringIO objects. Regards, Martin From stefan_ml at behnel.de Thu May 22 10:05:59 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 22 May 2008 10:05:59 +0200 Subject: [Python-3000] Cython code generation for Py3 complete Message-ID: Hi, just a quick announcement that I finished the port of the Cython compiler to Py3. While you cannot currently run Cython itself in Py3, you can build the generated C sources unchanged under Py2.3 through 3.0a5. http://cython.org/ There isn't a release yet (though there will hopefully be one soon), but I would be happy if interested people could already give it some testing. So if you have some Pyrex sources lying around and want them to run on Python 3k, please give it a try and report any problems you find to the Cython mailing list. 
You can get the compiler from the public Mercurial repository: http://hg.cython.org/cython-devel/ and I have put up a developer snapshot here: http://codespeak.net/lxml/dev/Cython-0.9.6.14-3k.tar.gz Hoping for some feedback, Stefan From solipsis at pitrou.net Thu May 22 13:58:10 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 22 May 2008 11:58:10 +0000 (UTC) Subject: [Python-3000] PEP 3138- String representation in Python 3000 References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> Message-ID: M.-A. Lemburg egenix.com> writes: > > It's all a matter of perspective. You can say you're encoding Latin-1 > to Unicode, or you can say your encoding Unicode to Latin-1. Except that Latin-1 is an encoding while Unicode is not. So I don't see how you can encode to Unicode. Of course you can encode to UTF-8, UTF-16, etc. - which /are/ encodings (and, in this case, Python returns you a bytes object :-)). Antoine. From mal at egenix.com Thu May 22 14:27:19 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 22 May 2008 14:27:19 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> Message-ID: <483566A7.6050106@egenix.com> On 2008-05-22 13:58, Antoine Pitrou wrote: > M.-A. Lemburg egenix.com> writes: >> It's all a matter of perspective. 
You can say you're encoding Latin-1 >> to Unicode, or you can say your encoding Unicode to Latin-1. > > Except that Latin-1 is an encoding while Unicode is not. So I don't see how you > can encode to Unicode. Of course you can encode to UTF-8, UTF-16, etc. - which > /are/ encodings (and, in this case, Python returns you a bytes object :-)). Well, yes and no :-) Unicode does encode a way to describe code points. The assignments of integers to letters, symbols, etc. (ie. a "character set") provides the encoding, so you can call it "encoding" as well. OTOH, Unicode is the mother of all character sets so to speak (even though in this case, many children existed before the mother was formed ;-), so it has a special status. In practice the terms "encoding" and "character set" are often used interchangeably, just as most people talk about "characters" when referring to "code points" and/or "glyphs", or happily mix "UTF-8", "UTF-16" and "Unicode". The Unicode consortium usually uses the terms "UCS2" and "UCS4" when referring to Unicode as "character set", but even there you have an ordering which makes it an encoding. See my talk on Unicode for some clarification: http://www.egenix.com/library/presentations/ -- Marc-Andre Lemburg eGenix.com From stephen at xemacs.org Thu May 22 19:52:52 2008 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Fri, 23 May 2008 02:52:52 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <483566A7.6050106@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> Message-ID: <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> M.-A. Lemburg writes: > On 2008-05-22 13:58, Antoine Pitrou wrote: > > M.-A. Lemburg egenix.com> writes: > >> It's all a matter of perspective. You can say you're encoding Latin-1 > >> to Unicode, or you can say your encoding Unicode to Latin-1. > > > > Except that Latin-1 is an encoding while Unicode is not. > > Well, yes and no :-) > > Unicode does encode a way to describe code points. I don't think this is a useful POV in the context of Python, where 'unicode' is a primitive type, and not implemented as an array of (Python) integers. From guido at python.org Thu May 22 19:55:01 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 22 May 2008 10:55:01 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Hi folks, Is this thread reaching a conclusion yet? I am hoping I can soon accept some variant of the following: 1. 
repr() returns a Unicode string containing only printable Unicode characters, using \x\u\U escapes for characters that are not considered printable according to some version of the Unicode standard augmented with some Python practicality, but unaffected by platform or locale. This can be implemented efficiently, without having to load the whole Unicode database, at least for strings containing only a large subset of the Unicode character set (e.g. all of UCS2, and possibly whole ranges of UCS4). 2. If you don't want any non-ASCII printed to a file, set the file's encoding to ASCII and the error handler to backslashescape. But as I haven't followed the thread I may be way off. Is Martin's proposal to allow forcing the default stdin/stdout/stderr encodings through environment variables related? (It should allow for setting the error handler too.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at egenix.com Thu May 22 21:09:25 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 22 May 2008 21:09:25 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4835C4E5.1070407@egenix.com> On 2008-05-22 19:52, Stephen J. Turnbull wrote: > M.-A. Lemburg writes: > > On 2008-05-22 13:58, Antoine Pitrou wrote: > > > M.-A. Lemburg egenix.com> writes: > > >> It's all a matter of perspective. You can say you're encoding Latin-1 > > >> to Unicode, or you can say your encoding Unicode to Latin-1. 
> > > > > > Except that Latin-1 is an encoding while Unicode is not. > > > > Well, yes and no :-) > > > > Unicode does encode a way to describe code points. > > I don't think this is a useful POV in the context of Python, where > 'unicode' is a primitive type, and not implemented as an array of > (Python) integers. Agreed. I was just explaining where the whole notion of encoding and decoding originates and how the meaning of the .encode() and .decode() methods came to be. -- Marc-Andre Lemburg eGenix.com From mal at egenix.com Thu May 22 21:11:56 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 22 May 2008 21:11:56 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4835C57C.9010007@egenix.com> On 2008-05-22 19:55, Guido van Rossum wrote: > Hi folks, > > Is this thread reaching a conclusion yet? I am hoping I can soon > accept some variant of the following: > > 1. 
repr() returns a Unicode string containing only printable Unicode > characters, using \x\u\U escapes for characters that are not > considered printable according to some version of the Unicode standard > augmented with some Python practicality, but unaffected by platform or > locale. This can be implemented efficiently, without having to load > the whole Unicode database, at least for strings containing only a > large subset of the Unicode character set (e.g. all of UCS2, and > possibly whole ranges of UCS4). > > 2. If you don't want any non-ASCII printed to a file, set the file's > encoding to ASCII and the error handler to backslashescape. Sounds like a good compromise. Just please don't set the error handler of sys.stdout to anything but "strict" per default. > But as I haven't followed the thread I may be way off. > > Is Martin's proposal to allow forcing the default stdin/stdout/stderr > encodings through environment variables related? (It should allow for > setting the error handler too.) It's not related, but would be very helpful on its own, esp. for the stdin part in 3.x. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 22 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611

From solipsis at pitrou.net Thu May 22 21:59:09 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 22 May 2008 21:59:09 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To:
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
 <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
 <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
 <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
 <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com>
 <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1211486349.5825.14.camel@fsol>

On Thursday, 22 May 2008 at 10:55 -0700, Guido van Rossum wrote:
> Hi folks,
>
> Is this thread reaching a conclusion yet? I am hoping I can soon
> accept some variant of the following:
>
> 1. repr() returns a Unicode string containing only printable Unicode
> characters, using \x\u\U escapes for characters that are not
> considered printable according to some version of the Unicode standard
> augmented with some Python practicality, but unaffected by platform or
> locale. This can be implemented efficiently, without having to load
> the whole Unicode database, at least for strings containing only a
> large subset of the Unicode character set (e.g. all of UCS2, and
> possibly whole ranges of UCS4).
>
> 2. If you don't want any non-ASCII printed to a file, set the file's
> encoding to ASCII and the error handler to backslashescape.

Since some people still seem wary that repr() might return non-ascii
results, perhaps we could also:

3. Add a builtin function named ascii() and a formatting code "%a" that
both call repr() internally and then convert all non-ascii characters to
\uXXXX escapes.

2to3 might even replace all occurrences of repr() by ascii(), to err on
the safe side.

Regards

Antoine.
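As a sketch of item 3, the snippet below uses the ascii() builtin and the "%a" conversion as they exist in present-day Python 3; that they match the behaviour proposed here in every detail is an assumption. One visible difference: current Python emits \x, \u or \U escapes depending on the code point, not only \uXXXX.

```python
# One Latin-1 character (U+00E9) and one hiragana character (U+3042).
s = "caf\u00e9 \u3042"

printable = repr(s)      # printable non-ASCII characters are kept as-is
escaped = ascii(s)       # like repr(), but non-ASCII is backslash-escaped
formatted = "%a" % s     # the %a conversion applies ascii() to its argument

# Only the escaped forms are printed, so this runs on an ASCII-only terminal.
print(escaped)
print(formatted)
```

The result of ascii() contains only ASCII characters, so it can be written to any stream regardless of the stream's encoding.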
From martin at v.loewis.de Thu May 22 22:38:47 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 22 May 2008 22:38:47 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <483566A7.6050106@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com> <48242D4A.3060802@egenix.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> Message-ID: <4835D9D7.7040809@v.loewis.de> > The Unicode consortium usually uses the terms "UCS2" and "UCS4" > when referring to Unicode as "character set", but even there > you have an ordering which makes it an encoding. The Unicode consortium uses the term "coded character set" to describe the assignment of characters in the set to numbers, and "character encoding scheme" to refer to an algorithm that produces a sequence of bytes, and doesn't use the term "encoding" altogether, see http://www.unicode.org/unicode/reports/tr17/ Regards, Martin From martin at v.loewis.de Thu May 22 22:41:34 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 22 May 2008 22:41:34 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4835DA7E.40304@v.loewis.de> > Is Martin's proposal to allow forcing the default stdin/stdout/stderr > encodings through environment variables related? 
(It should allow for
> setting the error handler too.)

It's related only if it supports setting the error handler as well.

Would "encoding/errorhandler" sound like a useful syntax?

Regards,
Martin

From phd at phd.pp.ru Thu May 22 22:56:36 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 23 May 2008 00:56:36 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4835DA7E.40304@v.loewis.de>
References: <482B80D5.8000202@canterbury.ac.nz>
 <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
 <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
 <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com>
 <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <4835DA7E.40304@v.loewis.de>
Message-ID: <20080522205636.GA8561@phd.pp.ru>

On Thu, May 22, 2008 at 10:41:34PM +0200, "Martin v. Löwis" wrote:
> Would "encoding/errorhandler" sound like a useful syntax?

   encoding:errorhandler

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From guido at python.org Thu May 22 23:16:31 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 22 May 2008 14:16:31 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <20080522205636.GA8561@phd.pp.ru>
References: <482B80D5.8000202@canterbury.ac.nz>
 <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
 <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com>
 <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <4835DA7E.40304@v.loewis.de>
 <20080522205636.GA8561@phd.pp.ru>
Message-ID:

On Thu, May 22, 2008 at 1:56 PM, Oleg Broytmann wrote:
> On Thu, May 22, 2008 at 10:41:34PM +0200, "Martin v. Löwis" wrote:
>> Would "encoding/errorhandler" sound like a useful syntax?
>
> encoding:errorhandler

Whichever character is guaranteed never to be part of an encoding
name. All things being equal I'd prefer ':' too, since that's a pretty
common separator in environment variables, and doesn't make it look
like a pathname.
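A minimal sketch of the "encoding:errorhandler" form discussed above. The helper parse_io_spec and its default of "strict" are hypothetical, written only for illustration. The handler is spelled "backslashreplace" in the codecs module, and present-day Python accepts this form in the PYTHONIOENCODING environment variable.

```python
import io


def parse_io_spec(spec, default_errors="strict"):
    """Split 'encoding:errorhandler' into its two parts (hypothetical helper)."""
    encoding, sep, errors = spec.partition(":")
    return encoding, (errors if sep and errors else default_errors)


encoding, errors = parse_io_spec("ascii:backslashreplace")

# Wrap a byte buffer the way a text stream wraps its underlying file.
# newline="\n" disables platform-specific newline translation.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
stream.write("caf\u00e9\n")
stream.flush()

# The non-ASCII character reaches the buffer as a backslash escape.
print(raw.getvalue())
```

With the default "strict" handler the same write would raise UnicodeEncodeError instead, which is why the error handler needs to be settable alongside the encoding.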
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu May 22 23:18:50 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 22 May 2008 14:18:50 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <1211486349.5825.14.camel@fsol> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <1211486349.5825.14.camel@fsol> Message-ID: > Le jeudi 22 mai 2008 à 10:55 -0700, Guido van Rossum a écrit : >> Is this thread reaching a conclusion yet? I am hoping I can soon >> accept some variant of the following: >> >> 1. repr() returns a Unicode string containing only printable Unicode >> characters, using \x\u\U escapes for characters that are not >> considered printable according to some version of the Unicode standard >> augmented with some Python practicality, but unaffected by platform or >> locale. This can be implemented efficiently, without having to load >> the whole Unicode database, at least for strings containing only a >> large subset of the Unicode character set (e.g. all of UCS2, and >> possibly whole ranges of UCS4). >> >> 2. If you don't want any non-ASCII printed to a file, set the file's >> encoding to ASCII and the error handler to backslashescape. On Thu, May 22, 2008 at 12:59 PM, Antoine Pitrou wrote: > Since some people still seem wary that repr() might return non-ascii > results, perhaps we could also: > > 3. Add a builtin function named ascii() and a formatting code "%a" that > both call repr() internally and then convert all non-ascii characters to > \uXXXX escapes. I'd call that a stretch goal, but it seems an easy one. > 2to3 might even replace all occurrences of repr() by ascii(), to err on > the safe side. I'd be against that.
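[Editorial sketch] Points 1 and 3 above can be illustrated with the `ascii()` builtin and the "%a" format code as they exist in released Python 3. The sample string is an arbitrary choice, and this only shows the behaviour as shipped, not the state of the patch at the time of this thread.

```python
text = "パイソン"  # arbitrary non-ASCII sample (katakana)

# repr() keeps printable non-ASCII characters as they are (point 1).
printable_form = repr(text)

# ascii() and "%a" start from repr() and escape every non-ASCII character
# (point 3), so the result is safe for ASCII-only output.
ascii_form = ascii(text)
formatted = "%a" % text

print(ascii_form)
```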
Could someone (Atsuo?) write up a new version for the PEP, adding the conclusions reached in this thread and recapping some of the discussion? I think this can get in before the first beta release, and that seems doable. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ishimoto at gembook.org Fri May 23 03:46:31 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Fri, 23 May 2008 10:46:31 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <1211486349.5825.14.camel@fsol> Message-ID: <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> On Fri, May 23, 2008 at 6:18 AM, Guido van Rossum wrote: >>> 2. If you don't want any non-ASCII printed to a file, set the file's >>> encoding to ASCII and the error handler to backslashescape. > > On Thu, May 22, 2008 at 12:59 PM, Antoine Pitrou wrote: >> Since some people still seem wary that repr() might return non-ascii >> results, perhaps we could also: >> >> 3. Add a builtin function named ascii() and a formatting code "%a" that >> both call repr() internally and then convert all non-ascii characters to >> \uXXXX escapes. > > I'd call that a stretch goal, but it seems an easy one. Martin may against for new builtin function. Perhaps string.asciirepr() might better? > > Could someone (Atsuo?) write up a new version for the PEP, adding the > conclusions reached in this thread and recapping some of the > discussion? I think this can get in before the first beta release, and > that seems doable. > I'll revise the PEP and the patch soon. One point still remains is default error handler for sys.stdout. 
I can live with 'strict' error handler, but I think raising exceptions for evenry un-supported characters by default is too exacting. From guido at python.org Fri May 23 06:30:35 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 22 May 2008 21:30:35 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> Message-ID: On Thu, May 22, 2008 at 6:46 PM, Atsuo Ishimoto wrote: > On Fri, May 23, 2008 at 6:18 AM, Guido van Rossum wrote: > >>>> 2. If you don't want any non-ASCII printed to a file, set the file's >>>> encoding to ASCII and the error handler to backslashescape. >> >> On Thu, May 22, 2008 at 12:59 PM, Antoine Pitrou wrote: >>> Since some people still seem wary that repr() might return non-ascii >>> results, perhaps we could also: >>> >>> 3. Add a builtin function named ascii() and a formatting code "%a" that >>> both call repr() internally and then convert all non-ascii characters to >>> \uXXXX escapes. >> >> I'd call that a stretch goal, but it seems an easy one. > > Martin may against for new builtin function. Perhaps > string.asciirepr() might better? That's not a pretty name (and aren't we going to get rid of the string module after all?). But it's a minor detail. >> Could someone (Atsuo?) write up a new version for the PEP, adding the >> conclusions reached in this thread and recapping some of the >> discussion? I think this can get in before the first beta release, and >> that seems doable. >> > > I'll revise the PEP and the patch soon. Great! > One point still remains is default error handler for sys.stdout. 
I can > live with 'strict' error handler, but I think raising exceptions for > evenry un-supported characters by default is too exacting. I think to avoid exceptions you should arrange for the encoding to be capable of encoding all characters (e.g. utf8 or utf16). IMO it's important to trust that you didn't write garbage, unless you specifically asked for it. It's different for stderr, there I think the most lenient error handling should be the default. PS> I couldn't get backslashescape to work -- is this just a proposal? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mwm at mired.org Tue May 6 17:36:19 2008 From: mwm at mired.org (Mike Meyer) Date: Tue, 06 May 2008 15:36:19 -0000 Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday 07-May-2008 In-Reply-To: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org> <481A35D8.60604@cheimes.de> <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com> Message-ID: <20080506112925.023901a1@mbook-fbsd> On Fri, 02 May 2008 00:03:24 -0000 glyph at divmod.com wrote: > On 11:45 pm, guido at python.org wrote: > >I like this, except one issue: I really don't like the .local > >directory. I don't see any compelling reason why this needs to be > >~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide > >it from view, especially since the user is expected to manage this > >explicitly. > > I've previously given a spirited defense of ~/.local on this list ( > http://mail.python.org/pipermail/python-dev/2008-January/076173.html ) > among other places. > > Briefly, "lib" is not the only directory participating in this > convention; you've also got the full complement of other stuff that > might go into an installation like /usr/local. 
So, while "lib" might > annoy me a little, "bin etc games include lib lib32 man sbin share src" > is going to get ugly pretty fast, especially if this is what comes up in > Finder or Nautilus or Explorer every time I open a window. You have a problem with 10 directories? Well, ok - if you have that on top of all the clutter that you normally get, yeah, I might object too. On the other hand, if *every* application used those 10 directories - and *only* those 10 directories - for all the files it needed that weren't for user-created data, that would be heaven. The fallacy you're falling into is that users never have to deal with those dot-files (or directories). They do. One of the most common operations when trying to diagnose a misbehaving application is "delete the configuration files" (my favorite is that I fix gnucash printing failures by deleting CUPS config files....), and the user has to figure out which, if any, of those magic files need to be deleted. If you're using Finder, you wind up turning on the preference that says "show me those", and suddenly your nice, clean directory explodes into ... Well, here's my home directory, shared between a Mac and a Unix box: mbook-fbsd% cd mbook-fbsd% ls | wc -l 42 mbook-fbsd% ls -d .* | wc -l 174 It's not very clean. Because it's a Mac, it's got some directories that the Mac felt I needed that I really have no use for. And there's maybe a dozen files there that are scratch files from various things I haven't cleaned up yet. Of course, the dot-files are much worse, because I normally don't see them, so there's not incentive to clean them up at all. But if i could trade those 172 (can't lose . and ..) "hidden" .files for 10 visible directories in ~, I'd do it in an instant - even if I didn't already have bin, etc, src & lib directories there. > Put another way - it's trivial to make ~/.local/lib show up by > symlinking ~/lib, but you can't make ~/lib disappear, and lots of > software ends up looking at ~. 
Just for the record, it's equally trivial - but better - to make ".local" disappear by symlinking '.local' to '.'. But providing an option is even cleaner, and then the fact that you can't use symlink to hide one is moot. As far as I'm concerned, .local is the worst possible choice for this choice for this name. Not only does it wind up in the more cluttered of the two name spaces, it doesn't tell me anything about the application(s) it belongs to, so I have to worry about it pretty much every time I'm mucking about with the config files. .python would be much better - at least I'd know what it was for by the name. http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ocean at m2.ccsnet.ne.jp Thu May 8 04:13:58 2008 From: ocean at m2.ccsnet.ne.jp (Hirokazu Yamamoto) Date: Thu, 08 May 2008 02:13:58 -0000 Subject: [Python-3000] [Python-Dev] Releasing alphas tonight References: <48224E9F.40407@cheimes.de> Message-ID: <001801c8b0ac$9b902dc0$0200a8c0@whiterabc2znlh> Hello. > The py3k branch has a major show stopper, It's leaking references to the > max. Is there any chance this leak also will be fixed? http://bugs.python.org/issue2222 Thank you. From paul.bedaride at gmail.com Fri May 9 22:34:18 2008 From: paul.bedaride at gmail.com (paul bedaride) Date: Fri, 9 May 2008 22:34:18 +0200 Subject: [Python-3000] class style Message-ID: Hello, I'm new on this list and it's just for ask a question about class-style in python 3000 because I don't understand for instance why in class Example(object): var1 = 'example' var2 = property(fget=lambda self: 'example') var1 seems to be linked to class and var2 to object. In more it not seem possible to define class property and it could be usefull. I don't know if you have already discuss about class style but if you have could you give me the log ? 
thanks in advance paul bedaride -------------- next part -------------- An HTML attachment was scrubbed... URL: From ishimoto at gembook.org Fri May 23 09:28:03 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Fri, 23 May 2008 16:28:03 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <482C1293.3030409@egenix.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> Message-ID: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> On Fri, May 23, 2008 at 1:30 PM, Guido van Rossum wrote: >> One point still remains is default error handler for sys.stdout. I can >> live with 'strict' error handler, but I think raising exceptions for >> evenry un-supported characters by default is too exacting. > > I think to avoid exceptions you should arrange for the encoding to be > capable of encoding all characters (e.g. utf8 or utf16). The utf-8 console is fine for my personal development style, I'm afraid it doesn't work for you. Whether your console is capable to display Japanese characters or not, you will want to see Japanese characters in hex-escaped characters, don't you? > > IMO it's important to trust that you didn't write garbage, unless you > specifically asked for it. Is this requested by users? With Python 2, we can always print strings containing garbage without exceptions. Python 3 is much stricter in this respect. To get meaningful information instead of tracebacks, we need to know encoding of output device and characters to be printed whenever we print strings. This is hard to be accomplished in practice. > PS> I couldn't get backslashescape to work -- is this just a proposal? No. Works for me without any modifications. I tried with latest source form svn. 
Python 3.0a5+ (py3k:63546, May 23 2008, 13:42:06) [MSC v.1500 32 bit (Intel)] on win32 >>> "パイソン".encode("ascii", "backslashreplace") b'\\u30d1\\u30a4\\u30bd\\u30f3' [39364 refs] From ncoghlan at gmail.com Fri May 23 10:39:01 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 23 May 2008 18:39:01 +1000 Subject: [Python-3000] class style In-Reply-To: References: Message-ID: <483682A5.8080103@gmail.com> paul bedaride wrote: > Hello, > > I'm new on this list The question you asked is more appropriate for comp.lang.python, not python-dev/python-3000 (which are about the development *of* Python, not development *with* Python). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From stefan_ml at behnel.de Fri May 23 12:44:00 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 23 May 2008 12:44:00 +0200 Subject: [Python-3000] Single buffer implied in new buffer protocol? Message-ID: Hi, while implementing Py_buffer support in Cython, I noticed (the hard way, through a segfault), that the buffer pointer passed into getbuffer() can be NULL, e.g. when calling memoryview.tobytes(). According to PEP 3118 (first paragraph below the getbuffer() signature), this implies setting a lock on the memory. Funny enough, the LOCK flag wasn't even set in my case, I just get NULL as buffer and 285 as flags... Anyway, my point is that this part of the protocol actually implies setting a lock on the buffer *provider* rather than the buffer itself, as the buffer provider cannot distinguish between different buffers based on a NULL pointer. I know, the protocol is overly complex already and hard to implement from a provider perspective, and I understand that that was preferred over putting the complexity into the consumer. But wouldn't it make more sense to *always* pass the buffer pointer, to let the provider decide what it makes of the flags?
I can well imagine the case where a buffer provider chooses to return different buffer pointers based on the WRITABLE flag, for example. In that case, it would be unable to attribute the lock to any of the buffers. Stefan From guido at python.org Fri May 23 16:22:00 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 23 May 2008 07:22:00 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> Message-ID: On Fri, May 23, 2008 at 12:28 AM, Atsuo Ishimoto wrote: > On Fri, May 23, 2008 at 1:30 PM, Guido van Rossum wrote: > >>> One point still remains is default error handler for sys.stdout. I can >>> live with 'strict' error handler, but I think raising exceptions for >>> evenry un-supported characters by default is too exacting. >> >> I think to avoid exceptions you should arrange for the encoding to be >> capable of encoding all characters (e.g. utf8 or utf16). > > The utf-8 console is fine for my personal development style, I'm > afraid it doesn't work for you. Whether your console is capable to > display Japanese characters or not, you will want to see Japanese > characters in hex-escaped characters, don't you? Personally, I can live with it. I rarely generate Japanese text so I doubt it'll be a problem. I can also change the console encoding and error handler. >> IMO it's important to trust that you didn't write garbage, unless you >> specifically asked for it. > > Is this requested by users? With Python 2, we can always print strings > containing garbage without exceptions. Python 3 is much stricter in > this respect. 
To get meaningful information instead of tracebacks, we > need to know encoding of output device and characters to be printed > whenever we print strings. This is hard to be accomplished in > practice. Tracebacks should always go to stderr. What I meant by "not writing garbage" was for some app that e.g. acts like a filter or otherwise produces output (on stdout) for another program to consume. The other program might not understand \u escapes. I'd rather trap this when writing, not when reading the garbage several stages later. IOW: - stderr (and probably also interactive stdout): set backslashreplace - stdout (if not interactive): strict Default encoding taken from environment in all cases. >> PS> I couldn't get backslashescape to work -- is this just a proposal? > > No. Works for me without any modifications. I tried with latest source form svn. > > Python 3.0a5+ (py3k:63546, May 23 2008, 13:42:06) [MSC v.1500 32 bit (Intel)] on > win32 >>>> "パイソン".encode("ascii", "backslashreplace") > b'\\u30d1\\u30a4\\u30bd\\u30f3' > [39364 refs] Ah, backslashreplace, not backslashescape. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ishimoto at gembook.org Fri May 23 17:05:37 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sat, 24 May 2008 00:05:37 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> Message-ID: <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> 2008/5/23 Guido van Rossum : > Personally, I can live with it. I rarely generate Japanese text so I > doubt it'll be a problem. I can also change the console encoding and > error handler.
While you rarely generate Japanese text, but I guess you often get non-ASCII text data e.g. SPAM mail in Japanese, Rietveld comments in Spanish, etc. Forecasting encoding of data is hard in these days. > > What I meant by "not writing garbage" was for some app that e.g. acts > like a filter or otherwise produces output (on stdout) for another > program to consume. The other program might not understand \u escapes. > I'd rather trap this when writing, not when reading the garbage > several stages later. > > IOW: > > - stderr (and probably also interactive stdout): set backslashreplace > - stdout (if not interactive): strict > > Default encoding taken from environment in all cases. Fine with me. I'll update the PEP and patch. Thank you! From stephen at xemacs.org Fri May 23 22:42:54 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 24 May 2008 05:42:54 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <483566A7.6050106@egenix.com> <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> Message-ID: <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> Atsuo Ishimoto writes: > 2008/5/23 Guido van Rossum : > > Personally, I can live with it. I rarely generate Japanese text so I > > doubt it'll be a problem. I can also change the console encoding and > > error handler. > > While you rarely generate Japanese text, but I guess you often get > non-ASCII text data e.g. SPAM mail in Japanese, Rietveld comments in > Spanish, etc. Forecasting encoding of data is hard in these days. I don't see the problem. You don't have to forecast the encoding of data. Strings are Unicode in Python internal format. 
The question is whether the device receiving the output of repr can handle all of the characters that will be generated. From ishimoto at gembook.org Sat May 24 04:04:48 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sat, 24 May 2008 11:04:48 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> On Sat, May 24, 2008 at 5:42 AM, Stephen J. Turnbull wrote: > Atsuo Ishimoto writes: > > 2008/5/23 Guido van Rossum : > > > Personally, I can live with it. I rarely generate Japanese text so I > > > doubt it'll be a problem. I can also change the console encoding and > > > error handler. > > > > While you rarely generate Japanese text, but I guess you often get > > non-ASCII text data e.g. SPAM mail in Japanese, Rietveld comments in > > Spanish, etc. Forecasting encoding of data is hard in these days. > > I don't see the problem. You don't have to forecast the encoding of > data. Strings are Unicode in Python internal format. The question is > whether the device receiving the output of repr can handle all of the > characters that will be generated. > Yes. My question is "Which do you feel comfortable, printing collect glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign characters, but I had feeling that western people prefer hex-escaped ASCII in general. But from responses I saw, perhaps this is not big deal. 
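[Editorial sketch] The question of whether the output device can handle every character, and the stderr/stdout split agreed earlier in the thread ('backslashreplace' for stderr, 'strict' for non-interactive stdout), can be tried out with an in-memory stream. `write_text` is a hypothetical helper, and the ASCII-only "device" is an assumption chosen to force the difference between the two handlers.

```python
import io


def write_text(text, encoding, errors):
    """Write text through a text wrapper and return the bytes produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Lenient handler (the stderr policy): unsupported characters are escaped.
lenient = write_text("パイソン", "ascii", "backslashreplace")

# Strict handler (the non-interactive stdout policy): the writer is told
# immediately instead of passing escapes on to the next program.
try:
    write_text("パイソン", "ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

With a UTF-8 stream neither handler is triggered, which matches the earlier suggestion to pick an encoding that can represent all characters.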
From guido at python.org Sat May 24 07:01:11 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 23 May 2008 22:01:11 -0700 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> Message-ID: On Fri, May 23, 2008 at 7:04 PM, Atsuo Ishimoto wrote: > On Sat, May 24, 2008 at 5:42 AM, Stephen J. Turnbull wrote: >> Atsuo Ishimoto writes: >> > 2008/5/23 Guido van Rossum : >> > > Personally, I can live with it. I rarely generate Japanese text so I >> > > doubt it'll be a problem. I can also change the console encoding and >> > > error handler. >> > >> > While you rarely generate Japanese text, but I guess you often get >> > non-ASCII text data e.g. SPAM mail in Japanese, Rietveld comments in >> > Spanish, etc. Forecasting encoding of data is hard in these days. >> >> I don't see the problem. You don't have to forecast the encoding of >> data. Strings are Unicode in Python internal format. The question is >> whether the device receiving the output of repr can handle all of the >> characters that will be generated. > > Yes. My question is "Which do you feel comfortable, printing collect > glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign > characters, but I had feeling that western people prefer hex-escaped > ASCII in general. But from responses I saw, perhaps this is not big > deal. I've certainly gotten over it, and have come to appreciate your point of view. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Sat May 24 07:07:50 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 24 May 2008 01:07:50 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><1211486349.5825.14.camel@fsol><797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com><797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com><797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com><87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> Message-ID: "Atsuo Ishimoto" wrote in message news:797440730805231904y501d310fw124ccd0e37defd3b at mail.gmail.com... | Yes. My question is "Which do you feel comfortable, printing collect | glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign | characters, but I had feeling that western people prefer hex-escaped | ASCII in general. But from responses I saw, perhaps this is not big | deal. Given that my system displays most major alphabets, and that I can recognize most, the glyphs are more informative for informal purposes than seemingly 'random' codes. 
From ncoghlan at gmail.com Sat May 24 09:33:19 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 24 May 2008 17:33:19 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><1211486349.5825.14.camel@fsol><797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com><797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com><797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com><87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> Message-ID: <4837C4BF.7070302@gmail.com> Terry Reedy wrote: > "Atsuo Ishimoto" wrote in message > news:797440730805231904y501d310fw124ccd0e37defd3b at mail.gmail.com... > | Yes. My question is "Which do you feel comfortable, printing collect > | glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign > | characters, but I had feeling that western people prefer hex-escaped > | ASCII in general. But from responses I saw, perhaps this is not big > | deal. > > Given that my system displays most major alphabets, and that I can > recognize most, the glyphs are more informative for informal purposes than > seemingly 'random' codes. The same goes for me - Konsole displays all sorts of Unicode glyphs just fine. I can actually read Japanese kana and the Cyrillic alphabet a heck of a lot better than I can read Unicode hex escapes, purely because the additional symbols are more distinctive than a relatively arbitrary collection of numbers :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From stephen at xemacs.org Sat May 24 11:37:11 2008 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 24 May 2008 18:37:11 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> Message-ID: <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> Atsuo Ishimoto writes: > Yes. My question is "Which do you feel comfortable, printing collect > glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign > characters, but I had feeling that western people prefer hex-escaped > ASCII in general. But from responses I saw, perhaps this is not big > deal. I think Americans, at least, tend to fear that non-ASCII will be interpreted as terminal control sequences or highlighted annoyingly in some way. Otherwise, they might grumble about the fact that what they're seeing isn't English, but it doesn't matter whether it's hex-escaped or kanji. From ishimoto at gembook.org Sat May 24 12:49:34 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sat, 24 May 2008 19:49:34 +0900 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 Message-ID: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> I updated a PEP 3138 - String representation in Python 3000. Python wiki is also updated. (http://wiki.python.org/moin/Python3kStringRepr) I would appreciate your comments and help. 
-----------------------------------------------

PEP: 3138
Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-May-2008
Post-History:

Abstract
========

This PEP proposes a new string representation form for Python 3000. In
Python prior to Python 3000, the ``repr()`` built-in function converts
arbitrary objects to printable ASCII strings for debugging and logging.
For Python 3000, a wider range of characters, based on the Unicode
standard, should be considered 'printable'.

Motivation
==========

The current ``repr()`` converts 8-bit strings to ASCII using the
following algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters (0x00-0x1f, 0x7f) and non-ASCII
  characters (>=0x80) to '\\xXX'.

- Backslash-escape quote characters (apostrophe, ') and add the quote
  character at the beginning and the end.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters (>=0x100) to '\\uXXXX'.

- Convert 21-bit characters (>=0x10000) and surrogate pair characters to
  '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and ``repr()`` is
used as a handy and safe way to print strings for debugging or for
logging. Although all non-ASCII characters are escaped, this does not
matter when most of the string's characters are ASCII. But for other
languages, such as Japanese where most characters in a string are not
ASCII, this is very inconvenient.

Python 3000 has a lot of nice features for non-Latin users such as
non-ASCII identifiers, so it would be helpful if Python could also
progress in a similar way for printable output.
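[Editorial sketch] The Python 2 escaping steps listed in this section can be written out in Python 3 roughly as follows. This is only an illustration: `legacy_repr` is a hypothetical helper, it handles ``str`` only, and it always uses single quotes, which simplifies the real quote selection.

```python
def legacy_repr(text):
    """ASCII-only representation of a str, following the listed steps."""
    named = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\", "'": "\\'"}
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in named:
            out.append(named[ch])         # CR, LF, TAB, backslash, quote
        elif 0x20 <= cp < 0x7F:
            out.append(ch)                # printable ASCII stays as it is
        elif cp < 0x100:
            out.append("\\x%02x" % cp)    # control characters and 0x80-0xff
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)    # 16-bit characters, lone surrogates
        else:
            out.append("\\U%08x" % cp)    # characters beyond the BMP
    return "'" + "".join(out) + "'"


escaped = legacy_repr("パイソン\n")
```

For a string made up mostly of Japanese text, nearly every character turns into a six-character escape, which is the inconvenience described above.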
Some users might be concerned that such output will mess up their console
if they print binary data like images. But this is unlikely to happen in
practice because bytes and strings are different types in Python 3000, so
printing an image to the console won't mess it up.

This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.

Specification
=============

- Add Python API ``int PY_UNICODE_ISPRINTABLE(Py_UNICODE ch)``.
  ``PY_UNICODE_ISPRINTABLE()`` returns 0 if ``repr()`` should escape the
  Unicode character ``ch``, 1 otherwise. Characters that should be
  escaped are

  * Characters defined in the Unicode character database as "Other"
    (Cc, Cf, Cs, Co, Cn).

  * Characters defined in the Unicode character database as "Separator"
    (Zl, Zp, Zs) other than ASCII space (0x20).

- The algorithm to build ``repr()`` strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to '\\xXX'.

  * Convert leading surrogate pair characters without trailing character
    (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

  * Convert non-printable characters (PY_UNICODE_ISPRINTABLE() returns 0)
    to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

  * Backslash-escape quote characters (apostrophe, ') and add the quote
    character at the beginning and the end.

- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
  default.

- Set the Unicode error-handler for sys.stdout in the Python interactive
  session to 'backslashreplace' by default.

- Add the ``'%a'`` string format operator. ``'%a'`` converts any Python
  object to a string using ``repr()`` and then hex-escapes all non-ASCII
  characters. The ``'%a'`` operator generates the same string as ``'%r'``
  in Python 2.

- Add the ``ascii()`` builtin function. ``ascii()`` converts any Python
  object to a string using ``repr()`` and then hex-escapes all non-ASCII
  characters. ``ascii()`` generates the same string as ``repr()`` in
  Python 2.

- Add ``isprintable()`` method to the string type.
``str.isprintable()`` return True if ``repr()`` should escape the characters in the string, False otherwise. The ``isprintable()`` method calls ``PY_UNICODE_ISPRINTABLE()`` internally. Rationale ========= The ``repr()`` in Python 3000 should be Unicode based, not ASCII based, just like Python 3000 strings. Also, conversion should not be affected by the locale setting, because the locale is not necessarily the same as the output device's locale. For example, it is common for a daemon process to be invoked in an ASCII setting but to write UTF-8 to its log files. Also, web applications might want to report error information in a more readable form based on the HTML page's encoding. Characters not supported by the user's console are hex-escaped on printing, by the Unicode encoder's error-handler. If the error-handler of the output file is 'backslashreplace', such characters are hex-escaped without raising UnicodeEncodeError. For example, if your default encoding is ASCII, ``print('Hello ¢')`` will print 'Hello \\xa2'. If your encoding is ISO-8859-1, 'Hello ¢' will be printed. For non-interactive sessions, the default error-handler of sys.stdout should be 'strict'. Other applications reading the output might not understand hex-escaped characters, so unsupported characters should be trapped when writing. Printable characters -------------------- The Unicode standard doesn't define non-printable characters, so we must create our own definition. Here we propose to define non-printable characters as follows. - Non-printable ASCII characters, as in Python 2. - Broken surrogate pair characters. - Characters defined in the Unicode character database as * Cc (Other, Control) * Cf (Other, Format) * Cs (Other, Surrogate) * Co (Other, Private Use) * Cn (Other, Not Assigned) * Zl (Separator, Line: '\\u2028', LINE SEPARATOR) * Zp (Separator, Paragraph: '\\u2029', PARAGRAPH SEPARATOR) * Zs (Separator, Space) other than ASCII space('\\x20').
Characters in this category should be escaped to avoid ambiguity. Alternate Solutions ------------------- To help debugging in non-Latin languages without changing ``repr()``, other suggestion were made. - Supply a tool to print lists or dicts. Strings to be printed for debugging are not only contained by lists or dicts, but also in many other types of object. File objects contain a file name in Unicode, exception objects contain a message in Unicode, etc. These strings should be printed in readable form when repr()ed. It is unlikely to be possible to implement a tool to print all possible object types. - Use sys.displayhook and sys.excepthook. For interactive sessions, we can write hooks to restore hex escaped characters to the original characters. But these hooks are called only when the result of evaluating an expression entered in an interactive Python session, and doesn't work for the print() function, for non- interactive sessions or for logging.debug("%r", ...), etc. - Subclass sys.stdout and sys.stderr. It is difficult to implement a subclass to restore hex-escaped characters since there isn't enough information left by the time it's a string to undo the escaping correctly in all cases. For example, `` print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But there is no chance to tell file objects apart. - Make the encoding used by ``unicode_repr()`` adjustable, and make current ``repr()`` as default. With adjustable ``repr()``, result of ``repr()`` is unpredictable and would make impossible to write correct code involving ``repr()``. And if current ``repr()`` is default, then old convention remains intact and user may expect ASCII strings as the result of ``repr()``. Third party applications or libraries could be choked when custom ``repr()`` function is used. Backwards Compatibility ======================= Changing ``repr()`` may break some existing codes, especially testing code. Five of Python's regression test fail with this modification. 
If you need ``repr()`` strings without non-ASCII characters, as in Python 2, you can use the following function. ::

    def repr_ascii(obj):
        return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

For logging or for debugging, the following code can raise UnicodeEncodeError. ::

    log = open("logfile", "w")
    log.write(repr(data))     # UnicodeEncodeError will be raised
                              # if data contains unsupported characters.

To avoid the exception, you can specify the error-handler explicitly. ::

    log = open("logfile", "w", errors="backslashreplace")
    log.write(repr(data))  # Unsupported characters will be escaped.

For a console with a Unicode-based encoding, for example en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and no printable characters are escaped. This causes a problem with similar-looking characters in Western, Greek and Cyrillic languages. These languages use similar (but different) alphabets (descended from a common ancestor) and contain letters that look similar but have different character codes. For example, it is hard to distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'. (The visual representation, of course, very much depends on the fonts used but usually these letters are almost indistinguishable.) To avoid the problem, users can adjust the terminal encoding to get a result suitable for their environment, or use ``repr_ascii()`` described above. Open Issues =========== - Is the ``ascii()`` function necessary, or is documentation enough? If necessary, should ``ascii()`` belong to the builtin namespace? Rejected Proposals ================== - Add encoding and errors arguments to the builtin print() function, with defaults of sys.getfilesystemencoding() and 'backslashreplace'. Complicated to implement, and in general, this does not seem to be a good idea. [2]_ - Use character names to escape characters, instead of hex character codes. For example, ``repr('\u03b1')`` can be converted to ``"\N{GREEK SMALL LETTER ALPHA}"``.
Using character names get verbose compared to hex-escape. e.g., ``repr ("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``. Reference Implementation ======================== http://bugs.python.org/issue2630 References ========== .. [1] Multibyte string on string\::string_print (http://bugs.python.org/issue479898) .. [2] [Python-3000] Displaying strings containing unicode escapes (http://mail.python.org/pipermail/python-3000/2008-April/013366.html) Copyright ========= This document has been placed in the public domain. From jimjjewett at gmail.com Sat May 24 18:53:08 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 24 May 2008 12:53:08 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/24/08, Stephen J. Turnbull wrote: > Atsuo Ishimoto writes: > > Yes. My question is "Which do you feel comfortable, printing collect > > glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign > > characters, but I had feeling that western people prefer hex-escaped > > ASCII in general. But from responses I saw, perhaps this is not big > > deal. It depends on why I'm looking at it. I do prefer hex for repr, because hex is safer; if I want pretty, I'll use print (or pprint). > I think Americans, at least, tend to fear that non-ASCII will be > interpreted as terminal control sequences or highlighted annoyingly in > some way. 
Because it often is, even on systems that can display the proper glyphs in other contexts -- and it isn't always possible to recover from a messed-up terminal without restarting the session. I'll grant that this implies bugs in the programs I use -- but they happen enough with enough different programs that it is a concern. > Otherwise, they might grumble about the fact that what > they're seeing isn't English, but it doesn't matter whether it's > hex-escaped or kanji. I'm more worried that it might look like English, yet be subtly (and importantly) different. I can distinguish the characters in ASCII pretty well, or at least recognize when something looks ambiguous. I cannot do that so well with other scripts -- but seeing a hex escape warns me that something special is happening. Note that I have no objection to properly displaying other characters as a system-wide setting. I'm glad that it is easy to do with print. I just want it to be very easy to say "on my system, repr is ASCII". I would prefer that ASCII also be the default, so that people who want more characters opt in to receive them, at least once at installation time. -jJ From phd at phd.pp.ru Sat May 24 19:18:14 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 24 May 2008 21:18:14 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20080524171814.GA4026@phd.pp.ru> On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote: > if I want pretty, I'll use print (or pprint). str(container_of_strings) uses repr(), so you loose prettiness on either print or '%s' % container_of_strings. 
Exceptions use repr() for file names, e.g., which is very inconvenient, IMHO. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From phd at phd.pp.ru Sat May 24 19:27:21 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 24 May 2008 21:27:21 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <20080524171814.GA4026@phd.pp.ru> References: <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> Message-ID: <20080524172721.GC4026@phd.pp.ru> On Sat, May 24, 2008 at 09:18:14PM +0400, Oleg Broytmann wrote: > On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote: > > if I want pretty, I'll use print (or pprint). > > str(container_of_strings) uses repr(), so you loose prettiness on either > print or '%s' % container_of_strings. Exceptions use repr() for file names, > e.g., which is very inconvenient, IMHO. I meant - you cannot print() an exception to make it pretty - it uses repr() internally anyway. The only way to win back the prettiness is to make repr() prints printable strings without encoding. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
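A minimal sketch of the point made in the message above: ``str()`` of a built-in container formats each element with ``repr()``, so both ``print(container)`` and ``'%s' % container`` show the repr form. The class ``Loud`` and the helper ``show`` are hypothetical names used only for illustration.

```python
class Loud:
    """Toy object whose str() and repr() forms are easy to tell apart."""

    def __str__(self):
        return "str-form"

    def __repr__(self):
        return "repr-form"


def show(container):
    """Return the text that print(container) would write."""
    return str(container)


# Direct conversion of a single object uses __str__ ...
single = str(Loud())

# ... but elements inside a list are rendered with __repr__.
listing = show([Loud(), Loud()])

# The same holds for '%s' formatting of a container.
formatted = "%s" % ([Loud()],)
```

Under this behaviour, a readable ``repr()`` is the only thing that makes strings inside containers readable when printed.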
From janssen at parc.com Sat May 24 20:47:55 2008 From: janssen at parc.com (Bill Janssen) Date: Sat, 24 May 2008 11:47:55 PDT Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <08May24.114756pdt."58698"@synergy1.parc.xerox.com> > Atsuo Ishimoto writes: > > > Yes. My question is "Which do you feel comfortable, printing collect > > glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign > > characters, but I had feeling that western people prefer hex-escaped > > ASCII in general. But from responses I saw, perhaps this is not big > > deal. > > I think Americans, at least, tend to fear that non-ASCII will be > interpreted as terminal control sequences or highlighted annoyingly in > some way. Otherwise, they might grumble about the fact that what > they're seeing isn't English, but it doesn't matter whether it's > hex-escaped or kanji. The nice thing about hex-escaped characters is that I can look up the character code to find out what the character is. Hard to do that with a glyph that I don't recognize. 
Bill From martin at v.loewis.de Sat May 24 21:20:57 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 24 May 2008 21:20:57 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <08May24.114756pdt."58698"@synergy1.parc.xerox.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <08May24.114756pdt."58698"@synergy1.parc.xerox.com> Message-ID: <48386A99.1000800@v.loewis.de> > The nice thing about hex-escaped characters is that I can look up the > character code to find out what the character is. Hard to do that > with a glyph that I don't recognize. Not that difficult. Suppose I have the character Ә, I just do py> unicodedata.name(u"Ә") 'CYRILLIC CAPITAL LETTER SCHWA' I used cut-n-paste to insert the character into the interactive prompt; that worked just fine. Regards, Martin From tjreedy at udel.edu Sat May 24 22:15:28 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 24 May 2008 16:15:28 -0400 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python3000 References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> Message-ID: | | - Add ``isprintable()`` method to the string type. ``str.isprintable()`` | return True if ``repr()`` should escape the characters in the string, | False otherwise. Is not this backwards? Isprintable to me mean should *not* escape.
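For comparison, a small sketch of how the method behaves in released Python 3, where ``str.isprintable()`` returns True when no character in the string is classed as non-printable, which is the reading expected in the message above. The helper name ``has_nonprintable`` and the sample strings are illustrative assumptions, not part of the PEP draft.

```python
def has_nonprintable(text):
    """True if text contains at least one character classed as non-printable."""
    return not text.isprintable()


samples = {
    "plain": "abc",          # ordinary ASCII letters
    "newline": "a\nb",       # LF is a control character (Cc)
    "hiragana": "\u3042",    # HIRAGANA LETTER A, a printable non-ASCII letter
    "line_sep": "\u2028",    # LINE SEPARATOR (Zl)
}

flags = {name: has_nonprintable(value) for name, value in samples.items()}

# repr() keeps the printable non-ASCII letter, ascii() hex-escapes it.
readable = repr("\u3042")
escaped = ascii("\u3042")
```

Note that quote characters and the backslash count as printable even though ``repr()`` still backslash-escapes them, so ``isprintable()`` is not an exact test for "repr() changes nothing".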
From janssen at parc.com Sun May 25 00:26:49 2008 From: janssen at parc.com (Bill Janssen) Date: Sat, 24 May 2008 15:26:49 PDT Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <48386A99.1000800@v.loewis.de> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <1211486349.5825.14.camel@fsol> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <08May24.114756pdt."58698"@synergy1.parc.xerox.com> <48386A99.1000800@v.loewis.de> Message-ID: <08May24.152651pdt."58698"@synergy1.parc.xerox.com> > Not that difficult. Suppose I have the character ??, I just do > > py> unicodedata.name(u"??") > 'CYRILLIC CAPITAL LETTER SCHWA' > > I used cut-n-paste to insert the character into the interactive prompt; > that worked just fine. I suppose, if I knew about unicodedata.name(), and if my cursed command-line terminal supported cut-and-paste. Between rxvt, xterm, Emacs shell buffers, Windows command shells, and OS X Terminal.app windows, I find it hard to know just what will and will not work in that regard. Bill From ncoghlan at gmail.com Sun May 25 02:45:30 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 25 May 2008 10:45:30 +1000 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> Message-ID: <4838B6AA.5000207@gmail.com> Terry Reedy wrote: > | > | - Add ``isprintable()`` method to the string type. ``str.isprintable()`` > | return True if ``repr()`` should escape the characters in the string, > | False otherwise. > > Is not this backwards? Isprintable to me mean should *not* escape. 
I agree (I suspect the incorrect phrasing is due to the fact that this query method used to ask the opposite question - then the name got changed without updating the description) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ishimoto at gembook.org Sun May 25 06:10:52 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sun, 25 May 2008 13:10:52 +0900 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python3000 In-Reply-To: <4838B6AA.5000207@gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <4838B6AA.5000207@gmail.com> Message-ID: <797440730805242110r3cccc6f7p80ace3888c3b5336@mail.gmail.com> On Sun, May 25, 2008 at 9:45 AM, Nick Coghlan wrote: > Terry Reedy wrote: >> >> | >> | - Add ``isprintable()`` method to the string type. ``str.isprintable()`` >> | return True if ``repr()`` should escape the characters in the string, >> | False otherwise. >> >> Is not this backwards? Isprintable to me mean should *not* escape. > > I agree (I suspect the incorrect phrasing is due to the fact that this query > method used to ask the opposite question - then the name got changed without > updating the description) Dang, I'm sorry for dumb mistake. Your suspection is right:). I updated the PEP, with some addition to motivation section as per Oleg's advice. ----------------------------------------------- PEP: 3138 Title: String representation in Python 3000 Version: $Revision$ Last-Modified: $Date$ Author: Atsuo Ishimoto Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 05-May-2008 Post-History: Abstract ======== This PEP proposes new string representation form for Python 3000. In Python prior to Python 3000, the ``repr()`` built-in function converts arbitrary objects to printable ASCII strings for debugging and logging. 
For Python 3000, a wider range of characters, based on the Unicode standard, should be considered 'printable'. Motivation ========== The current ``repr()`` converts 8-bit strings to ASCII using following algorithm. - Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'. - Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII characters(>=0x80) to '\\xXX'. - Backslash-escape quote characters(apostrophe, ') and add the quote character at the beginning and the end. For Unicode strings, the following additional conversions are done. - Convert leading surrogate pair characters without trailing character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'. - Convert 16-bit characters(>=0x100) to '\\uXXXX'. - Convert 21-bit characters(>=0x10000) and surrogate pair characters to '\\U00xxxxxx'. This algorithm converts any string to printable ASCII, and ``repr()`` is used as handy and safe way to print strings for debugging or for logging. Although all non-ASCII characters are escaped, this does not matter when most of the string's characters are ASCII. But for other languages, such as Japanese where most characters in a string are not ASCII, this is very inconvenient. we can use ``print(aJapaneseString)`` to get readable string, but we don't have workaround to read strings in containers such as list or tuple. ``print(listOfJapaneseStrings)`` uses repr() to build the string to be printed, so resulting strings are always hex-escaped. Or when ``open(japaneseFilemame)`` raises an exception, the error message is something like ``IOError: [Errno 2] No such file or directory: '\u65e5\u672c\u8a9e'``, which isn't helpful. Python 3000 has a lot of nice features for non-Latin users such as non-ASCII identifiers, so it would be helpful if Python could also progress in a similar way for printable output. Some users might be concerned that such output will mess up their console if they print binary data like images. 
But this is unlikely to happen in practice because bytes and strings are different types in Python 3000, so printing an image to the console won't mess it up. This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected. Specification ============= - Add Python API ``int PY_UNICODE_ISPRINTABLE(Py_UNICODE ch)``. ``PY_UNICODE_ISPRINTABLE()`` returns 0 if ``repr()`` should escape the Unicode character ``ch``, 1 otherwise. Characters that should be escaped are * Characters defined in the Unicode character database as "Other" (Cc, Cf, Cs, Co, Cn). * Characters defined in the Unicode character database as "Separator" (Zl, Zp, Zs) other than ASCII space (0x20). - The algorithm to build ``repr()`` strings should be changed to: * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'. * Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to '\\xXX'. * Convert leading surrogate pair characters without trailing character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'. * Convert non-printable characters (PY_UNICODE_ISPRINTABLE() returns 0) to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'. * Backslash-escape quote characters (apostrophe, ') and add the quote character at the beginning and the end. - Set the Unicode error-handler for sys.stderr to 'backslashreplace' by default. - Set the Unicode error-handler for sys.stdout in the Python interactive session to 'backslashreplace' by default. - Add ``'%a'`` string format operator. ``'%a'`` converts any Python object to a string using ``repr()`` and then hex-escapes all non-ASCII characters. The ``'%a'`` operator generates the same string as ``'%r'`` in Python 2. - Add ``ascii()`` builtin function. ``ascii()`` converts any Python object to a string using ``repr()`` and then hex-escapes all non-ASCII characters. ``ascii()`` generates the same string as ``repr()`` in Python 2. - Add ``isprintable()`` method to the string type. ``str.isprintable()`` returns False if ``repr()`` should escape the characters in the string, True otherwise.
The ``isprintable()`` method calls ``PY_UNICODE_ISPRINTABLE()`` internally. Rationale ========= The ``repr()`` in Python 3000 should be Unicode based, not ASCII based, just like Python 3000 strings. Also, conversion should not be affected by the locale setting, because the locale is not necessarily the same as the output device's locale. For example, it is common for a daemon process to be invoked in an ASCII setting but to write UTF-8 to its log files. Also, web applications might want to report error information in a more readable form based on the HTML page's encoding. Characters not supported by the user's console are hex-escaped on printing, by the Unicode encoder's error-handler. If the error-handler of the output file is 'backslashreplace', such characters are hex-escaped without raising UnicodeEncodeError. For example, if your default encoding is ASCII, ``print('Hello ¢')`` will print 'Hello \\xa2'. If your encoding is ISO-8859-1, 'Hello ¢' will be printed. For non-interactive sessions, the default error-handler of sys.stdout should be 'strict'. Other applications reading the output might not understand hex-escaped characters, so unsupported characters should be trapped when writing. Printable characters -------------------- The Unicode standard doesn't define non-printable characters, so we must create our own definition. Here we propose to define non-printable characters as follows. - Non-printable ASCII characters, as in Python 2. - Broken surrogate pair characters. - Characters defined in the Unicode character database as * Cc (Other, Control) * Cf (Other, Format) * Cs (Other, Surrogate) * Co (Other, Private Use) * Cn (Other, Not Assigned) * Zl (Separator, Line: '\\u2028', LINE SEPARATOR) * Zp (Separator, Paragraph: '\\u2029', PARAGRAPH SEPARATOR) * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in this category should be escaped to avoid ambiguity.
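The category list above can be sketched in pure Python with the standard ``unicodedata`` module. This is only an illustration under stated assumptions: ``is_printable_char`` and ``ESCAPED_CATEGORIES`` are hypothetical names, the function is a stand-in for the proposed C-level ``PY_UNICODE_ISPRINTABLE()`` rather than its actual implementation, and it checks single characters only, so the "broken surrogate pair" rule is covered only to the extent that a lone surrogate has category Cs.

```python
import unicodedata

# General categories the draft proposes to treat as non-printable.
ESCAPED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}


def is_printable_char(ch):
    """Hypothetical Python-level stand-in for PY_UNICODE_ISPRINTABLE()."""
    if ch == " ":
        # ASCII space is category Zs but is explicitly exempted.
        return True
    return unicodedata.category(ch) not in ESCAPED_CATEGORIES


def escape_candidates(text):
    """Return the characters of text that repr() would hex-escape under this rule."""
    return [ch for ch in text if not is_printable_char(ch)]


# NO-BREAK SPACE (Zs) and LINE SEPARATOR (Zl) are flagged; letters are not.
flagged = escape_candidates("a\u00a0\u3042\u2028b")
```

A table lookup on the general category keeps the rule independent of the locale, which matches the requirement stated in the Rationale.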
Alternate Solutions ------------------- To help debugging in non-Latin languages without changing ``repr()``, other suggestion were made. - Supply a tool to print lists or dicts. Strings to be printed for debugging are not only contained by lists or dicts, but also in many other types of object. File objects contain a file name in Unicode, exception objects contain a message in Unicode, etc. These strings should be printed in readable form when repr()ed. It is unlikely to be possible to implement a tool to print all possible object types. - Use sys.displayhook and sys.excepthook. For interactive sessions, we can write hooks to restore hex escaped characters to the original characters. But these hooks are called only when the result of evaluating an expression entered in an interactive Python session, and doesn't work for the print() function, for non- interactive sessions or for logging.debug("%r", ...), etc. - Subclass sys.stdout and sys.stderr. It is difficult to implement a subclass to restore hex-escaped characters since there isn't enough information left by the time it's a string to undo the escaping correctly in all cases. For example, `` print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But there is no chance to tell file objects apart. - Make the encoding used by ``unicode_repr()`` adjustable, and make current ``repr()`` as default. With adjustable ``repr()``, result of ``repr()`` is unpredictable and would make impossible to write correct code involving ``repr()``. And if current ``repr()`` is default, then old convention remains intact and user may expect ASCII strings as the result of ``repr()``. Third party applications or libraries could be choked when custom ``repr()`` function is used. Backwards Compatibility ======================= Changing ``repr()`` may break some existing codes, especially testing code. Five of Python's regression test fail with this modification. 
If you need ``repr()`` strings without non-ASCII characters, as in Python 2, you can use the following function. ::

    def repr_ascii(obj):
        return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

For logging or for debugging, the following code can raise UnicodeEncodeError. ::

    log = open("logfile", "w")
    log.write(repr(data))     # UnicodeEncodeError will be raised
                              # if data contains unsupported characters.

To avoid the exception, you can specify the error-handler explicitly. ::

    log = open("logfile", "w", errors="backslashreplace")
    log.write(repr(data))  # Unsupported characters will be escaped.

For a console with a Unicode-based encoding, for example en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and no printable characters are escaped. This causes a problem with similar-looking characters in Western, Greek and Cyrillic languages. These languages use similar (but different) alphabets (descended from a common ancestor) and contain letters that look similar but have different character codes. For example, it is hard to distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'. (The visual representation, of course, very much depends on the fonts used but usually these letters are almost indistinguishable.) To avoid the problem, users can adjust the terminal encoding to get a result suitable for their environment, or use ``repr_ascii()`` described above. Open Issues =========== - Is the ``ascii()`` function necessary, or is documentation enough? If necessary, should ``ascii()`` belong to the builtin namespace? Rejected Proposals ================== - Add encoding and errors arguments to the builtin print() function, with defaults of sys.getfilesystemencoding() and 'backslashreplace'. Complicated to implement, and in general, this does not seem to be a good idea. [2]_ - Use character names to escape characters, instead of hex character codes. For example, ``repr('\u03b1')`` can be converted to ``"\N{GREEK SMALL LETTER ALPHA}"``.
Using character names get verbose compared to hex-escape. e.g., ``repr ("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``. Reference Implementation ======================== http://bugs.python.org/issue2630 References ========== .. [1] Multibyte string on string\::string_print (http://bugs.python.org/issue479898) .. [2] [Python-3000] Displaying strings containing unicode escapes (http://mail.python.org/pipermail/python-3000/2008-April/013366.html) Copyright ========= This document has been placed in the public domain. From stephen at xemacs.org Sun May 25 10:03:19 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 25 May 2008 17:03:19 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > > Otherwise, they might grumble about the fact that what > > they're seeing isn't English, but it doesn't matter whether it's > > hex-escaped or kanji. > > I'm more worried that it might look like English, yet be subtly (and > importantly) different. Let me remind you that I advocated that position, and (1) Martin shot me down hard, and (2) Guido indicated that it is a point, but he now seems happy enough not to worry about it. If you're serious about that, you need to pick up the ball; I'm not comfortable advocating it, especially in view of the wide variety of cases where it seems to be used for something other than diagnosing normally invisible features of output. 
> I just want it to be very easy to say "on my system, repr is ASCII". That is in all proposals. > I would prefer that ASCII also be the default, so that people who want > more characters opt in to receive them, Well, the people who want more characters include all non-Americans and some large fraction of Americans. I don't see how "Better Is Better" can possibly beat "Worse Is Better" here, given the extent to which repr is used to produce output meaningful to end-users (vs. diagnostics for application and/or Python maintainers). From lists at cheimes.de Sun May 25 16:59:24 2008 From: lists at cheimes.de (Christian Heimes) Date: Sun, 25 May 2008 16:59:24 +0200 Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0 Message-ID: <48397ECC.9070805@cheimes.de> Hello! The first set of betas of Python 2.6 and 3.0 is fast apace. I like to grab the final chance and clean up the C API of 2.6 and 3.0. I know, I know, I brought up the topic two times in the past. But this time I mean it for real! :] Last time Guido said: --- I think it can actually be simplified. I think maintaining binary compatibility between 2.6 and earlier versions is hopeless anyway, so we might as well just rename PyString to PyBytes in 2.6 and 3.0, and have an extra set of macros so that code using PyString needs to be recompiled but not otherwise touched. E.g. typedef { ... } PyBytesObject; #define PyStringObject PyBytesObject ... 
PyString_Type; #define PyBytes_Type PyString_Type --- I like to follow Guido's advice and change the code as following: * replace PyBytes_ with PyByteArray_ * replace PyString with PyBytes_ * rename bytesobject.[ch] to bytearrayobject.[ch] * rename stringobject.[ch] to bytesobject.[ch] * add a new file stringobject.h which contains the aliases PyString_ -> PyBytes_ Christian From lists at cheimes.de Sun May 25 17:28:53 2008 From: lists at cheimes.de (Christian Heimes) Date: Sun, 25 May 2008 17:28:53 +0200 Subject: [Python-3000] Please svnmerge your changes Message-ID: <483985B5.6020705@cheimes.de> Hello fellow developers! I've been busy with personal work in the past weeks. At present I'm still moving into my new apartment. It has been a real challenge to install an IKEA kitchen in a house built before WW2 all by myself. On the one hand it's fun but on the other hand it costs me most of my free time at night. At least this building has a shelter in its cellar so I'm mostly protected in the case of an air strike. *g* In order to get all code merged before the first betas I need your help. Please everybody grab a couple of your checkins and merge them yourself. You can find the list of required merges at http://rafb.net/p/cghbTk63.html Christian From stefan_ml at behnel.de Sun May 25 17:56:29 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 25 May 2008 17:56:29 +0200 Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <48397ECC.9070805@cheimes.de> References: <48397ECC.9070805@cheimes.de> Message-ID: Hi, Christian Heimes wrote: > * add a new file stringobject.h which contains the aliases PyString_ -> > PyBytes_ will that be included by Python.h by default? 

Stefan

From lists at cheimes.de Sun May 25 18:16:52 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 25 May 2008 18:16:52 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To:
References: <48397ECC.9070805@cheimes.de>
Message-ID: <483990F4.30802@cheimes.de>

Stefan Behnel wrote:
> will that be included by Python.h by default?

Only in Python 2.6.

Christian

From g.brandl at gmx.net Sun May 25 21:21:26 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 25 May 2008 21:21:26 +0200
Subject: [Python-3000] dbm package creation
Message-ID:

Hi,

I'll handle the PEP 3108 dbm package if nobody else is already at it.

Two questions though:

* the whichdb() function returns strings that are module names. These
  names won't be importable anymore in 3k. Should the return values
  remain the same in 3k, or should whichdb() return the new names, and
  if the latter, including "dbm." or not?

* two of the previous modules are C modules, namely dbm and gdbm. They
  can't be easily moved into the package. I expect the solution is to
  create stub Python modules and rename the C modules with a leading
  underscore? (It's already like this for bsd, except that the C module
  name, bsddb, has no underscore.)

cheers,
Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From brett at python.org Mon May 26 00:02:32 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 25 May 2008 15:02:32 -0700
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID:

On Sun, May 25, 2008 at 7:59 AM, Christian Heimes wrote:
> Hello!
>
> The first set of betas of Python 2.6 and 3.0 is fast approaching.
> I'd like to grab the final chance and clean up the C API of 2.6 and
> 3.0. I know, I know, I brought up the topic two times in the past. But
> this time I mean it for real! :]
>
> Last time Guido said:
> ---
> I think it can actually be simplified. I think maintaining binary
> compatibility between 2.6 and earlier versions is hopeless anyway, so
> we might as well just rename PyString to PyBytes in 2.6 and 3.0, and
> have an extra set of macros so that code using PyString needs to be
> recompiled but not otherwise touched. E.g.
>
>   typedef { ... } PyBytesObject;
>   #define PyStringObject PyBytesObject
>
>   ... PyString_Type;
>   #define PyBytes_Type PyString_Type
>
>
> ---
>
> I'd like to follow Guido's advice and change the code as follows:
>
> * replace PyBytes_ with PyByteArray_
> * replace PyString with PyBytes_
> * rename bytesobject.[ch] to bytearrayobject.[ch]
> * rename stringobject.[ch] to bytesobject.[ch]
> * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

+1 from me.

-Brett

From brett at python.org Mon May 26 00:04:32 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 25 May 2008 15:04:32 -0700
Subject: [Python-3000] Please svnmerge your changes
In-Reply-To: <483985B5.6020705@cheimes.de>
References: <483985B5.6020705@cheimes.de>
Message-ID:

On Sun, May 25, 2008 at 8:28 AM, Christian Heimes wrote:
> Hello fellow developers!
>
> I've been busy with personal work in the past weeks. At present I'm
> still moving into my new apartment. It has been a real challenge to
> install an IKEA kitchen in a house built before WW2 all by myself. On
> the one hand it's fun but on the other hand it costs me most of my free
> time at night. At least this building has a shelter in its cellar so I'm
> mostly protected in the case of an air strike. *g*
>
> In order to get all code merged before the first betas I need your help.
> Please everybody grab a couple of your checkins and merge them yourself.
>
> You can find the list of required merges at http://rafb.net/p/cghbTk63.html

For stuff from the sandbox, would it help at all to block them
explicitly even though they shouldn't get merged at all?

-Brett

From brett at python.org Mon May 26 00:08:34 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 25 May 2008 15:08:34 -0700
Subject: [Python-3000] dbm package creation
In-Reply-To:
References:
Message-ID:

On Sun, May 25, 2008 at 12:21 PM, Georg Brandl wrote:
> Hi,
>
> I'll handle the PEP 3108 dbm package if nobody else is already at it.
>

I know I have not started the work.

> Two questions though:
>
> * the whichdb() function returns strings that are module names. These
> names won't be importable anymore in 3k. Should the return values
> remain the same in 3k, or should whichdb() return the new names, and
> if the latter, including "dbm." or not?
>

New names with the package name prepended.

Should probably change the API at some point to just return the module
to use instead of the name.

> * two of the previous modules are C modules, namely dbm and gdbm. They
> can't be easily moved into the package. I expect the solution is to
> create stub Python modules and rename the C modules with a leading
> underscore? (It's already like this for bsd, except that the C module
> name, bsddb, has no underscore.)
>

Yep. I don't know of any package in the stdlib that uses an extension
module in some other fashion.

-Brett

From humitos at gmail.com Mon May 26 01:41:41 2008
From: humitos at gmail.com (Manuel Kaufmann)
Date: Sun, 25 May 2008 20:41:41 -0300
Subject: [Python-3000] Hello World!
Message-ID: <200805252041.41477.humitos@gmail.com>

Hi, I'm Manuel Kaufmann (aka humitos) and I'm a student of "Systems
Engineering" in Santa Fé (Capital), Argentina.

I met Python at the "1º Jornadas de Python en Santa Fé" three years ago,
and I'm happy with it. As you can see, I don't speak English very well,
but I try to make myself understood.

"Saludos!"

--
Kaufmann Manuel
Blog: http://humitos.wordpress.com/
PyAr: http://www.python.com.ar/

From tjreedy at udel.edu Mon May 26 02:22:43 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 25 May 2008 20:22:43 -0400
Subject: [Python-3000] Hello World!
References: <200805252041.41477.humitos@gmail.com>
Message-ID:

"Manuel Kaufmann" wrote in message
news:200805252041.41477.humitos at gmail.com...
| Hi, I'm Manuel Kaufmann (aka humitos) and I'm a student of "Systems
| Engineering" in Santa Fé (Capital), Argentina.
|
| I met Python at the "1º Jornadas de Python en Santa Fé" three years ago,
| and I'm happy with it. As you can see, I don't speak English very well,
| but I try to make myself understood.

Hi Manuel,
This list (Python 3000 devel) is for discussion of development of a
*future* version of Python, primarily by the developers of that version.
For general discussion of Python, please address python-list or
comp.lang.python.

From humitos at gmail.com Mon May 26 02:34:45 2008
From: humitos at gmail.com (Manuel Kaufmann)
Date: Sun, 25 May 2008 21:34:45 -0300
Subject: [Python-3000] Hello World!
In-Reply-To:
References: <200805252041.41477.humitos@gmail.com>
Message-ID: <200805252134.46034.humitos@gmail.com>

On Sunday 25 May 2008 21:22:43 Terry Reedy wrote:
> Hi Manuel,
> This list (Python 3000 devel) is for discussion of development of a
> *future* version of Python, primarily by the developers of that version.

Yes, I know that. I subscribed because I want to discuss this issue[1].
I made a patch for it, but I'm not sure whether it's correct.

[1] http://bugs.python.org/issue2888

--
Kaufmann Manuel
Blog: http://humitos.wordpress.com/
PyAr: http://www.python.com.ar/

From humitos at gmail.com Mon May 26 01:44:47 2008
From: humitos at gmail.com (Manuel Kaufmann)
Date: Sun, 25 May 2008 20:44:47 -0300
Subject: [Python-3000] Hello World!
Message-ID: <200805252044.47192.humitos@gmail.com>

Hi, I'm Manuel Kaufmann (aka humitos) and I'm a student of "Systems
Engineering" in Santa Fé (Capital), Argentina.

I met Python at the "1º Jornadas de Python en Santa Fé" three years ago,
and I'm happy with it. As you can see, I don't speak English very well,
but I try to make myself understood.

"Saludos!"

--
Kaufmann Manuel
Blog: http://humitos.wordpress.com/
PyAr: http://www.python.com.ar/

From tjreedy at udel.edu Mon May 26 03:54:49 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 25 May 2008 21:54:49 -0400
Subject: [Python-3000] Hello World!
References: <200805252041.41477.humitos@gmail.com> <200805252134.46034.humitos@gmail.com>
Message-ID:

"Manuel Kaufmann" wrote in message
news:200805252134.46034.humitos at gmail.com...
| On Sunday 25 May 2008 21:22:43 Terry Reedy wrote:
| > Hi Manuel,
| > This list (Python 3000 devel) is for discussion of development of a
| > *future* version of Python, primarily by the developers of that version.
|
| Yes, I know that. I subscribed because I want to discuss this issue[1].
| I made a patch for it, but I'm not sure whether it's correct.
|
| [1] http://bugs.python.org/issue2888

In that case, have more patience and give the tracker discussion process
more time.

From humitos at gmail.com Mon May 26 06:01:07 2008
From: humitos at gmail.com (Manuel Kaufmann)
Date: Mon, 26 May 2008 01:01:07 -0300
Subject: [Python-3000] Hello World!
In-Reply-To:
References: <200805252041.41477.humitos@gmail.com> <200805252134.46034.humitos@gmail.com>
Message-ID: <200805260101.07655.humitos@gmail.com>

On Sunday 25 May 2008 22:54:49 Terry Reedy wrote:
> In that case, have more patience and give the tracker discussion process
> more time.

Sorry, I didn't explain myself well. I want to discuss "How should
pprint.pprint work?" or "Why doesn't py3k work according to the
documentation?[1]"

I prefer the new way of showing it. Sometimes I don't like the old style.

Example (in py2.6):

>>> import pprint
>>> stuff = [1,2,3]
>>> pprint.pprint(stuff, indent=4)
[ 1, 2, 3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[ [ 1, 2, 3], 1, 2, 3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[ [ [ 1, 2, 3], 1, 2, 3], [ 1, 2, 3], 1, 2, 3]

I prefer this one (in py3k):

>>> import pprint
>>> stuff = [1,2,3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[[1, 2, 3], 1, 2, 3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[[[1, 2, 3], 1, 2, 3], [1, 2, 3], 1, 2, 3]
>>>

Now, if py3k is working fine, the documentation should be fixed.

--
Kaufmann Manuel
Blog: http://humitos.wordpress.com/
PyAr: http://www.python.com.ar/

From g.brandl at gmx.net Mon May 26 11:14:06 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 26 May 2008 11:14:06 +0200
Subject: [Python-3000] dbm package creation
In-Reply-To:
References:
Message-ID:

Brett Cannon wrote:
> On Sun, May 25, 2008 at 12:21 PM, Georg Brandl wrote:
>> Hi,
>>
>> I'll handle the PEP 3108 dbm package if nobody else is already at it.
>>
>
> I know I have not started the work.
>
>> Two questions though:
>>
>> * the whichdb() function returns strings that are module names. These
>> names won't be importable anymore in 3k. Should the return values
>> remain the same in 3k, or should whichdb() return the new names, and
>> if the latter, including "dbm." or not?
>>
>
> New names with the package name prepended.
>
> Should probably change the API at some point to just return the module
> to use instead of the name.
>
>> * two of the previous modules are C modules, namely dbm and gdbm. They
>> can't be easily moved into the package. I expect the solution is to
>> create stub Python modules and rename the C modules with a leading
>> underscore? (It's already like this for bsd, except that the C module
>> name, bsddb, has no underscore.)
>>
>
> Yep.
> I don't know of any package in the stdlib that uses an extension
> module in some other fashion.

Okay, that's settled then!

Georg

From solipsis at pitrou.net Mon May 26 11:42:35 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 26 May 2008 09:42:35 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
Message-ID:

Hello all,

Trying to fix #2507 (Exception state lives too long in 3.0) has
uncovered new issues with the bare "raise" statement when used in
exception block nesting situations (see #2833: __exit__ silences the
active exception). I say "uncovered" rather than "created" since, as
Amaury points out in the latter bug entry, re-raising behaviour has
always been a bit limited or non-obvious.

Witness the following code:

try:
    raise Exception("foo")
except Exception:
    try: raise KeyError("caught")
    except KeyError: pass
    raise

With python 2.x and py3k pre-r62847, it would re-raise KeyError("caught")
(whereas the intuitive behaviour would be to re-raise Exception("foo")).
With py3k post-r62847, it now raises a "RuntimeError: No active exception
to reraise".

Note that in py3k at least, we can get the "correct" behaviour by writing
instead:

try:
    raise Exception("foo")
except Exception as e:
    try: raise KeyError("caught")
    except KeyError: pass
    raise e

The only slight annoyance being that the re-raising statement ("raise e")
is added at the end of the original traceback.

There are other funny situations. Just try (with any Python version):

def except_yield():
    try:
        raise Exception("foo")
    except:
        yield 1
        raise
list(except_yield())

The problem with properly fixing the bare "raise" statement is that right
now, the saved exception state is a member of the frame object. That is,
there is no proper stacking of exception states when some lexically
nested exception handlers are involved in the same frame.

Now perhaps it is time to think about fixing that problem, without losing
the expected properties of exceptions in py3k.
I propose the following changes:

- an "except" block now also becomes a block in ceval.c terms, that is,
  a specific PyTryBlock is pushed at its beginning (please note that
  right now SETUP_EXCEPT, despite its name, encloses the "try" block
  rather than any "except" statement)

- this specific PyTryBlock - let's name it EXCEPT_HANDLER - is created
  implicitly, not explicitly through an opcode; this is necessary because
  it must be created *before* setting the current exception state to the
  caught exception; waiting for an opcode to be executed would be too
  late

- before pushing this EXCEPT_HANDLER on the block stack, the current
  thread's exception state (that is, before the exception is caught) is
  saved on the frame stack (that is, the three objects representing the
  type, value and traceback respectively)

- an EXCEPT_HANDLER block is unwound explicitly with a dedicated
  POP_EXCEPT opcode at the end of the exception handler; this opcode not
  only unwinds the block as POP_BLOCK does, but also pops and restores
  the exception state which was saved on the stack before pushing the
  block

- an EXCEPT_HANDLER block, when it is unwound implicitly because of a
  control transfer (e.g. "return" or "continue" or "break" or "raise"),
  follows the same treatment as in the POP_EXCEPT opcode: that is, in
  addition to unwinding the block, it also pops and restores the previous
  exception state

- the current set_exc_info() / reset_exc_info() machinery is yanked,
  since it is not useful anymore; this also probably removes three fields
  in the frame object, because it does not need to contain the previous
  exception state anymore

I've not studied the "with" statement implementation. Chances are it
should also be adapted to follow the principles above. I may also be
missing other annoying "details" :-)

What do you think?

Regards

Antoine.
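
The nested-handler case from the message above can be tried with a short
script. This is only a sketch for readers following along: the function
names are invented here for illustration, and what it shows is the
behaviour of current Python 3 releases (a bare "raise" in the outer
handler re-raises the outer exception, and a name bound with "as" is
unbound when its handler ends), not the behaviour of any 2008 revision
discussed in the thread.

```python
def reraise_after_nested_handler():
    """Return whatever a bare ``raise`` re-raises in the outer handler.

    Hypothetical helper: it wraps the snippet from the thread in an extra
    try/except so the re-raised exception can be inspected.
    """
    try:
        try:
            raise Exception("foo")
        except Exception:
            try:
                raise KeyError("caught")
            except KeyError:
                pass  # the inner exception is fully handled here
            raise  # bare raise, lexically inside the outer handler
    except BaseException as exc:
        return exc


def handler_name_is_unbound():
    """Return True if the ``as`` name is gone after its handler ends."""
    try:
        raise KeyError("caught")
    except KeyError as x:
        pass
    try:
        x  # unbound in Python 3 once the handler has finished
    except NameError:
        return True
    return False


if __name__ == "__main__":
    exc = reraise_after_nested_handler()
    print(type(exc).__name__, exc.args)
    print(handler_name_is_unbound())
```

On a current CPython 3 the first function reports the outer
Exception("foo") rather than the KeyError, which is the "intuitive"
outcome described above.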

From musiccomposition at gmail.com Mon May 26 14:35:24 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Mon, 26 May 2008 07:35:24 -0500
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <1afaf6160805260535l1fd5d136vbdf2d24b1380e2be@mail.gmail.com>

On Sun, May 25, 2008 at 9:59 AM, Christian Heimes wrote:
>
> I'd like to follow Guido's advice and change the code as follows:
>
> * replace PyBytes_ with PyByteArray_
> * replace PyString with PyBytes_
> * rename bytesobject.[ch] to bytearrayobject.[ch]
> * rename stringobject.[ch] to bytesobject.[ch]
> * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

+1

Do you need any help?

>
> Christian

--
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From mal at egenix.com Mon May 26 15:29:07 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 26 May 2008 15:29:07 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <483ABB23.6050900@egenix.com>

On 2008-05-25 16:59, Christian Heimes wrote:
> Hello!
>
> The first set of betas of Python 2.6 and 3.0 is fast approaching. I'd
> like to grab the final chance and clean up the C API of 2.6 and 3.0. I
> know, I know, I brought up the topic two times in the past. But this
> time I mean it for real! :]
>
> Last time Guido said:
> ---
> I think it can actually be simplified. I think maintaining binary
> compatibility between 2.6 and earlier versions is hopeless anyway, so
> we might as well just rename PyString to PyBytes in 2.6 and 3.0, and
> have an extra set of macros so that code using PyString needs to be
> recompiled but not otherwise touched. E.g.
>
>   typedef { ... } PyBytesObject;
>   #define PyStringObject PyBytesObject
>
>   ...
>   PyString_Type;
>   #define PyBytes_Type PyString_Type
>
>
> ---
>
> I'd like to follow Guido's advice and change the code as follows:
>
> * replace PyBytes_ with PyByteArray_
> * replace PyString with PyBytes_
> * rename bytesobject.[ch] to bytearrayobject.[ch]
> * rename stringobject.[ch] to bytesobject.[ch]
> * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

Since this is a major break in the Python C API, please make sure that
you bump the Python C API level used for module imports.

Most imports will fail anyway at the link stage, since PyString_* APIs
are probably the most used C APIs in Python extensions.

One detail I'm worried about is the change of the type name, since
that is sometimes used in object serialization or proxy implementations.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 26 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania 41 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611

From lists at cheimes.de Mon May 26 15:43:57 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 26 May 2008 15:43:57 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <1afaf6160805260535l1fd5d136vbdf2d24b1380e2be@mail.gmail.com>
References: <48397ECC.9070805@cheimes.de> <1afaf6160805260535l1fd5d136vbdf2d24b1380e2be@mail.gmail.com>
Message-ID: <483ABE9D.3070004@cheimes.de>

Benjamin Peterson wrote:
> On Sun, May 25, 2008 at 9:59 AM, Christian Heimes wrote:
>> I'd like to follow Guido's advice and change the code as follows:
>>
>> * replace PyBytes_ with PyByteArray_
>> * replace PyString with PyBytes_
>> * rename bytesobject.[ch] to bytearrayobject.[ch]
>> * rename stringobject.[ch] to bytesobject.[ch]
>> * add a new file stringobject.h which contains the aliases PyString_ ->
>> PyBytes_
>
> +1
>
> Do you need any help?

I've renamed the functions and modules. Can you help me with updating
the C API docs? In Python 2.6 the docs must still use PyString but you
can add a note that PyBytes_ works, too.

Christian

From lists at cheimes.de Mon May 26 15:40:31 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 26 May 2008 15:40:31 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483ABB23.6050900@egenix.com>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
Message-ID: <483ABDCF.8000105@cheimes.de>

M.-A. Lemburg wrote:
> Most imports will fail anyway at the link stage, since PyString_* APIs
> are probably the most used C APIs in Python extensions.

I think you have missed an important point. In Python 2.6 the names stay
the same for the linker. Although the functions are now called
PyBytes_Egg, they are redefined to PyString_Egg by a second header file.

In Python 2.6 the renaming of PyString is purely for consistency with
the new Python 3.0 names.
The names for PyString stay the same for external code like the library
and extension modules.

PyBytes -> PyByteArray is a different story, though.

> One detail I'm worried about is the change of the type name, since
> that is sometimes used in object serialization or proxy implementations.

The type names aren't changed either. They are still "str" and
"bytearray" in Python 2.6.

(moved down)
> Since this is a major break in the Python C API, please make sure
> that you bump the Python C API level used for module imports.

Do you still think it's necessary to bump up the C API version level?

Christian

From mal at egenix.com Mon May 26 17:03:20 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 26 May 2008 17:03:20 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483ABDCF.8000105@cheimes.de>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de>
Message-ID: <483AD138.7000804@egenix.com>

On 2008-05-26 15:40, Christian Heimes wrote:
> M.-A. Lemburg wrote:
>> Most imports will fail anyway at the link stage, since PyString_* APIs
>> are probably the most used C APIs in Python extensions.
>
> I think you have missed an important point. In Python 2.6 the names stay
> the same for the linker. Although the functions are now called
> PyBytes_Egg, they are redefined to PyString_Egg by a second header file.
>
> In Python 2.6 the renaming of PyString is purely for consistency with
> the new Python 3.0 names. The names for PyString stay the same for
> external code like the library and extension modules.

Isn't that an awfully confusing approach?

Wouldn't it be better to keep PyString APIs and definitions in
stringobject.c|h

and only add a new bytesobject.h header file that #defines the
PyBytes APIs in terms of PyString APIs? That maintains
backwards compatibility and allows Python internals to use the
new API names.

With your approach, you've basically backported the confusing
notion in Py3k that str() maps to PyUnicode, only that in Py2
str() will now map to PyBytes.

You'd have to add an alias bytes -> str to the builtins to
at least reduce the confusion a bit.

However, that's bound to cause even more problems, since people
will start using bytes() instead of str() in Py2 applications
and as a result they won't run in older Python versions anymore.

The same problem applies to Py2 extension writers who wish to
support older Python releases as well.

> PyBytes -> PyByteArray is a different story, though.

PyBytes was new in 2.6 anyway, so there's no breakage there.

>> One detail I'm worried about is the change of the type name, since
>> that is sometimes used in object serialization or proxy implementations.
>
> The type names aren't changed either. They are still "str" and
> "bytearray" in Python 2.6.

Good.

> (moved down)
>> Since this is a major break in the Python C API, please make sure
>> that you bump the Python C API level used for module imports.
>
> Do you still think it's necessary to bump up the C API version level?

Yes, but please let's first discuss this some more. I don't think
that the timing was right... you started this thread just yesterday
and the patches are already checked in.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, May 26 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania 41 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611

From g.brandl at gmx.net Mon May 26 17:12:30 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 26 May 2008 17:12:30 +0200
Subject: [Python-3000] http package: _FooCookieJar modules?
Message-ID:

dbm and xmlrpc are done, now I'm at the http package, and have a
question:

Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
modules separate from http.cookiejar? I'd rather merge them into
http.cookiejar and have two fewer strangely named modules.

cheers,
Georg

From fumanchu at aminus.org Mon May 26 18:29:26 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Mon, 26 May 2008 09:29:26 -0700
Subject: [Python-3000] Exception re-raising woes
In-Reply-To:
References:
Message-ID:

Antoine Pitrou wrote:
> Trying to fix #2507 (Exception state lives too long in 3.0) has
> uncovered new issues with the bare "raise" statement when used
> in exception block nesting situations (see #2833: __exit__
> silences the active exception). I say "uncovered" rather than
> "created" since, as Amaury points out in the latter bug entry,
> re-raising behaviour has always been a bit limited or non-obvious.
>
> Witness the following code:
>
> try:
>     raise Exception("foo")
> except Exception:
>     try: raise KeyError("caught")
>     except KeyError: pass
>     raise
>
> With python 2.x and py3k pre-r62847, it would re-raise
> KeyError("caught") (whereas the intuitive behaviour would
> be to re-raise Exception("foo")). With py3k post-r62847,
> it now raises a "RuntimeError: No active exception to
> reraise".
>
> Note that in py3k at least, we can get the "correct" behaviour by
> writing instead:
>
> try:
>     raise Exception("foo")
> except Exception as e:
>     try: raise KeyError("caught")
>     except KeyError: pass
>     raise e
>
> The only slight annoyance being that the re-raising statement
> ("raise e") is added at the end of the original traceback.

I wouldn't call that either "incorrect" or "non-obvious".
It certainly hasn't been a burden in Python 2.x.

> There are other funny situations. Just try (with any Python version):
>
> def except_yield():
>     try:
>         raise Exception("foo")
>     except:
>         yield 1
>         raise
> list(except_yield())
>
> The problem with properly fixing the bare "raise" statement is that
> right now, the saved exception state is a member of the frame object.
> That is, there is no proper stacking of exception states when some
> lexically nested exception handlers are involved in the same frame.
>
> Now perhaps it is time to think about fixing that problem, without
> losing the expected properties of exceptions in py3k. I propose
> the following changes:
>
> - an "except" block now also becomes a block in ceval.c terms,
> that is, a specific PyTryBlock is pushed at its beginning (please
> note that right now SETUP_EXCEPT, despite its name, encloses the
> "try" block rather than any "except" statement)
> [snip lots more changes]

That seems like an awful lot of work and change just to trade the above
problem for a new one:

try:
    raise Exception("foo")
except Exception as e:
    try: raise KeyError("caught")
    except KeyError as x: pass
    raise x

In either case, it's easy enough to bind the exception to a name--easier
in 2.6/3k with the abolition of string exceptions (since "except
BaseException" now catches everything).

Robert Brewer
fumanchu at aminus.org

From solipsis at pitrou.net Mon May 26 18:48:54 2008
From: solipsis at pitrou.net (Antoine)
Date: Mon, 26 May 2008 18:48:54 +0200 (CEST)
Subject: [Python-3000] Exception re-raising woes
In-Reply-To:
References:
Message-ID: <47085.192.165.213.18.1211820534.squirrel@webmail.nerim.net>

Hi,

>> The only slight annoyance being that the re-raising statement
>> ("raise e") is added at the end of the original traceback.
>
> I wouldn't call that either "incorrect" or "non-obvious".

What are you talking about exactly?
:)
The fact that in 2.x the last caught exception is re-raised even after
the end of the except block which caught it, rather than the exception
caught by the lexically enclosing block?

Anyway, in 3.x this behaviour will be impossible to mimic, since by
specification the exception state must disappear at the end of the
except block.

> That seems like an awful lot of work and change just to trade the above
> problem for a new one:
>
> try:
>     raise Exception("foo")
> except Exception as e:
>     try: raise KeyError("caught")
>     except KeyError as x: pass
>     raise x

The snippet above will not work under 3.x *by design* (the "x" variable
disappears at the end of the except block); there is even a test for it
in test_exceptions.py :-)

The proposal I made is meant to allow having proper exception cleanup
semantics as mandated by the py3k spec, and yet be able to use a bare
"raise" re-raising statement in non-trivial nested exception handler
situations.

Or of course we can just decide that bare "raise" is obsolete in 3.x and
must be replaced with a properly qualified raise statement.

Regards

Antoine.

From solipsis at pitrou.net Mon May 26 18:58:15 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 26 May 2008 16:58:15 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
References: <47085.192.165.213.18.1211820534.squirrel@webmail.nerim.net>
Message-ID:

Antoine <solipsis at pitrou.net> writes:
>
> The proposal I made is meant to allow having proper exception cleanup
> semantics as mandated by the py3k spec, and yet be able to use a bare
> "raise" re-raising statement in non-trivial nested exception handler
> situations.

I forgot to add that sys.exc_info() is probably impacted too. Actually,
anything which retrieves the current thread's exception state...

Regards

Antoine.

From brett at python.org Mon May 26 19:45:06 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 26 May 2008 10:45:06 -0700
Subject: [Python-3000] http package: _FooCookieJar modules?
In-Reply-To:
References:
Message-ID:

On Mon, May 26, 2008 at 8:12 AM, Georg Brandl wrote:
> dbm and xmlrpc are done, now I'm at the http package, and have a
> question:
>
> Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
> modules separate from http.cookiejar? I'd rather merge them into
> http.cookiejar and have two fewer strangely named modules.
>

They have leading underscores, so do what you will. If anyone directly
imports them that's their problem.

-Brett

From g.brandl at gmx.net Mon May 26 20:01:22 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 26 May 2008 20:01:22 +0200
Subject: [Python-3000] http package: _FooCookieJar modules?
In-Reply-To:
References:
Message-ID:

Brett Cannon wrote:
> On Mon, May 26, 2008 at 8:12 AM, Georg Brandl wrote:
>> dbm and xmlrpc are done, now I'm at the http package, and have a
>> question:
>>
>> Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
>> modules separate from http.cookiejar? I'd rather merge them into
>> http.cookiejar and have two fewer strangely named modules.
>>
>
> They have leading underscores, so do what you will. If anyone directly
> imports them that's their problem.

Done.

Georg

From brett at python.org Mon May 26 21:08:55 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 26 May 2008 12:08:55 -0700
Subject: [Python-3000] http package: _FooCookieJar modules?
In-Reply-To:
References:
Message-ID:

On Mon, May 26, 2008 at 11:01 AM, Georg Brandl wrote:
> Brett Cannon wrote:
>>
>> On Mon, May 26, 2008 at 8:12 AM, Georg Brandl wrote:
>>>
>>> dbm and xmlrpc are done, now I'm at the http package, and have a
>>> question:
>>>
>>> Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
>>> modules separate from http.cookiejar? I'd rather merge them into
>>> http.cookiejar and have two fewer strangely named modules.
>>>
>>
>> They have leading underscores, so do what you will. If anyone directly
>> imports them that's their problem.
>
> Done.

Great!
Wow, PEP 3108 might actually get finished before the beta! -Brett From lists at cheimes.de Mon May 26 23:34:58 2008 From: lists at cheimes.de (Christian Heimes) Date: Mon, 26 May 2008 23:34:58 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483AD138.7000804@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> Message-ID: <483B2D02.8040400@cheimes.de> M.-A. Lemburg schrieb: > Isn't that an awefuly confusing approach ? > > Wouldn't it be better to keep PyString APIs and definitions in > stringobject.c|h > > and only add a new bytesobject.h header file that #defines the > PyBytes APIs in terms of PyString APIs ? That maintains > backwards compatibility and allows Python internals to use the > new API names. > > With your approach, you've basically backported the confusing > notion in Py3k that str() maps PyUnicode, only that in Py2 > str() will now map to PyBytes. The last time I brought up the topic, I had a lengthy discussion with Guido. At first I wanted to rename the API in Python 3.0 only. Guido argued that it's going to cause too much merge conflicts. He then suggested the approach I implemented today. I find the approach less confusing than your suggestion and my initial idea. The internal API names are consistent for Python 2.6 and 3.0. The byte string C API is prefixed PyBytes and the unicode C API is prefixed PyUnicode. A core developer has just to remember that 'str' is a byte string in 2.x but an unicode object in 3.0. Extension developers don't have to worry at all. The ABI and external API is mostly the same and still exposes the 'str' functions as PyString. > You'd have to add an aliase bytes -> str to the builtins to > at least reduce the confusion a bit. Python 2.6 already has an alias bytes -> str > Yes, but please let's first discuss this some more. I don't think > that the timing was right.... 
you started this thread just yesterday > and the patches are already checked in. I'm sorry if I was too hasty for you. I got +1 from a couple of developers and it's basically Guido's suggestion. Christian From jimjjewett at gmail.com Tue May 27 00:53:41 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 26 May 2008 18:53:41 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <20080524171814.GA4026@phd.pp.ru> References: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> Message-ID: On 5/24/08, Oleg Broytmann wrote: > On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote: > > if I want pretty, I'll use print (or pprint). > str(container_of_strings) uses repr(), so you loose prettiness on either > print or '%s' % container_of_strings. This is not a problem with repr; it is a bug with str. I certainly support a flag for repr meaning "This was really str; repr got called because the container doesn't have str, but go back to str for the contents." (Alternatively, write an explicit repr that does that, add it to the builtin types, and make it available for easy use with extensions.) > Exceptions use repr() for file names, > e.g., which is very inconvenient, IMHO. I'm not sure I fully understand this problem, but I would expect the right solution to be a change to either Exception.__str__ or the way filename-related exceptions are initialized. Changing all of repr is again overkill. 
-jJ From phd at phd.pp.ru Tue May 27 01:03:30 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 27 May 2008 03:03:30 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> Message-ID: <20080526230330.GB8849@phd.pp.ru> On Mon, May 26, 2008 at 06:53:41PM -0400, Jim Jewett wrote: > On 5/24/08, Oleg Broytmann wrote: > > On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote: > > > if I want pretty, I'll use print (or pprint). > > > str(container_of_strings) uses repr(), so you loose prettiness on either > > print or '%s' % container_of_strings. > > This is not a problem with repr; it is a bug with str. str(container) tries to call container.__str__ (which is absent) and then container.__repr__. Is "str or repr" a bug? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From phd at phd.pp.ru Tue May 27 01:24:02 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 27 May 2008 03:24:02 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> Message-ID: <20080526232402.GC8849@phd.pp.ru> On Mon, May 26, 2008 at 06:53:41PM -0400, Jim Jewett wrote: > I certainly support a flag for repr meaning "This was really str; repr > got called because the container doesn't have str, but go back to str > for the contents." 
(Alternatively, write an explicit repr that does > that, add it to the builtin types, and make it available for easy use > with extensions.) > > > Exceptions use repr() for file names, > > e.g., which is very inconvenient, IMHO. > > I'm not sure I fully understand this problem, but I would expect the > right solution to be a change to either Exception.__str__ or the way > filename-related exceptions are initialized. Changing all of repr is > again overkill. There are two different and unrelated problems. One is that str(container) calls repr() on items. This probably could be fixed with a flag to repr() so it remembers it was called from str(). This has nothing with hex-encoding strings - calling str() on items would be a win in any case, especially for items that implements both __str__ and __repr__ methods. The other problem is that repr(string) returns it hex-encoded. The second problem is hard to fix without changing repr itself, because repr() is used in many places (exceptions was only one example). On the other hand those who want the old (current) behaviour can do repr(obj).encode("ascii", errors="backslashreplace"). Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
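
A minimal sketch of the two problems described above, assuming Python 3
semantics. The class name Item and the sample string are hypothetical; the
encode() call is the idiom quoted in the message, and ascii() is the builtin
that PEP 3138 proposes.

```python
class Item:
    """Hypothetical class whose str() and repr() differ."""

    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"


item = Item()

# Problem 1: list does not override __str__, so str(list) falls back to
# list.__repr__, which calls repr() on every item.
direct = str(item)
in_list = str([item])

# Problem 2: getting an ASCII-only representation of a non-ASCII string.
# This is the idiom from the message above; ascii() gives the same text.
text = "caf\u00e9"
ascii_only = repr(text).encode("ascii", errors="backslashreplace")

print(direct)
print(in_list)
print(ascii_only)
print(ascii(text))
```

Note that the encode() idiom returns bytes, while ascii() returns a str, so
the former needs a decode("ascii") step before it can be mixed with text.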
From jimjjewett at gmail.com Tue May 27 01:26:56 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 26 May 2008 19:26:56 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Summary: The only reason for this change is that __repr__ gets used when __str__ *should* be used instead. Fix that bug instead of making repr less predictable. On 5/25/08, Stephen J. Turnbull wrote: > Jim Jewett writes: > > I'm more worried that it might look like English, yet be subtly > > (and importantly) different. > Let me remind you that I advocated that position, and (1) Martin > shot me down hard, and (2) Guido indicated that it is a point, > but he now seems happy enough not to worry about it. I will agree that this is similar to the issue of non-ascii identifiers. If you can always trust everything on your system completely, then it doesn't matter whether or not you can even read the code. If you might have to at least review things, then confusability is an issue. The question is where to draw the line. I see print (and therefore str) as being intended for people, so they should clearly use as much of unicode as available. Identifiers are not usually part of the UI, so the case isn't as strong (but i agree that it is now settled). repr is not for normal UI; it is in explicit contrast to str. I therefore believe it should default to the safest possible representation. >... 
in view of the wide variety of cases where it seems to be > used for something other than diagnosing normally invisible > features of output. These are bugs. I haven't yet seen a single case where it *should* have been using repr instead of str. Unfortunately, str itself resorts to repr in some cases, and -- buggily -- then stays in repr mode as it recurses down. The right answer is not to make repr less predictable; it is to make those str representations better -- if only by having them go back to str(x) for the containers' contents. > > I just want it to be very easy to say "on my system, repr is ASCII". > That is in all proposals. Then I sometimes missed it. And I'll note that it didn't happen for identifiers. > > I would prefer that ASCII also be the default, so that people who want > > more characters opt in to receive them, >... given the extent to > which repr is used to produce output meaningful to end-users > (vs. diagnostics for application and/or Python maintainers). Again -- *why* is repr used instead of str? As nearly as I can tell, it is because of bugs (admittedly, often in builtin __str__ functions); making repr less predictable is a workaround rather than a solution. -jJ From phd at phd.pp.ru Tue May 27 01:34:32 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 27 May 2008 03:34:32 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20080526233432.GD8849@phd.pp.ru> On Mon, May 26, 2008 at 07:26:56PM -0400, Jim Jewett wrote: > Summary: > > The only reason for this change is that __repr__ gets used when > __str__ *should* be used instead. No, it is not the only reason. 
The other reason is that repr() is used in many different places where we want readable output. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From rhamph at gmail.com Tue May 27 02:00:02 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 26 May 2008 18:00:02 -0600 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <20080526232402.GC8849@phd.pp.ru> References: <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> <20080526232402.GC8849@phd.pp.ru> Message-ID: On Mon, May 26, 2008 at 5:24 PM, Oleg Broytmann wrote: > On Mon, May 26, 2008 at 06:53:41PM -0400, Jim Jewett wrote: >> I certainly support a flag for repr meaning "This was really str; repr >> got called because the container doesn't have str, but go back to str >> for the contents." (Alternatively, write an explicit repr that does >> that, add it to the builtin types, and make it available for easy use >> with extensions.) >> >> > Exceptions use repr() for file names, >> > e.g., which is very inconvenient, IMHO. >> >> I'm not sure I fully understand this problem, but I would expect the >> right solution to be a change to either Exception.__str__ or the way >> filename-related exceptions are initialized. Changing all of repr is >> again overkill. > > There are two different and unrelated problems. One is that > str(container) calls repr() on items. This probably could be fixed with > a flag to repr() so it remembers it was called from str(). This has nothing > with hex-encoding strings - calling str() on items would be a win in any > case, especially for items that implements both __str__ and __repr__ > methods. There's a reason for that convention. Would you prefer str(['1', '2', '3']) return '[1, 2, 3]'? 
Personally, I'm happy with Guido's suggestion of stderr and interactive to backslashreplace and stdout to strict. It's a dramatic change, but I think I can get used to it. -- Adam Olsen, aka Rhamphoryncus From jimjjewett at gmail.com Tue May 27 03:06:55 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 26 May 2008 21:06:55 -0400 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> Message-ID: On 5/24/08, Atsuo Ishimoto wrote: > Specification > ============= It might help to call out which parts are changes. If I understand correctly, the only changes (as opposed to additions) are for characters which are for characters which are (all three of) (a) outside of ASCII (b) not broken (that is, not half of a surrogate pair half) (c) not in the new excluded set. > * Characters defined in the Unicode character database as "Separator" > (Zl, Zp, Zs) other than ASCII space(0x20). Please put in a note that Zl and Zp refer only to two specific unicode characters, not to what most people think of as line separators or paragraph markers. > * Backslash-escape quote characters(apostrophe, ') and add quote > character at the beginning and the end. Do you just mean the two ASCII quotation marks that python uses? As written, I wondered whether it would include backquote or guillemet. > - Add ``'%a'`` string format operator. ``'%a'`` converts any python > object to string using ``repr()`` and then hex-escape all non-ASCII > characters. ``'%a'`` operator generates same string as ``'%r'`` in > Python 2. Then why not keep the old %r, and add a new one for the unicode repr? Is it again because of the bug where str([..., mystr, ...]) ends up doing repr on mystr? > - Add ``ascii()`` builtin function. 
``ascii()`` converts any python > object to string using ``repr()`` and then hex-escape all non-ASCII > characters. ``ascii()`` generates same string as ``repr()`` in Python 2. The problem isn't that I want to be able to write code that acts the old way; the problem is that I want to ensure all code running on my system acts the old way. Adding an ascii() function doesn't help. Keeping repr and adding full_repr would work (because I could look for the new name). Keeping repr and fixing the way it recurses when used as a str fallback would be even better. > Strings to be printed for debugging are not only contained by lists or > dicts, but also in many other types of object. File objects contain a > file name in Unicode, exception objects contain a message in Unicode, > etc. These strings should be printed in readable form when repr()ed. > It is unlikely to be possible to implement a tool to print all > possible object types. You could go a long way (particularly in Py3k, where everything inherits from object) by changing the builtin containers, and changing object.__str__ to try "<%s: %s>" % (type(v), iter(v)) before falling back to repr. (You may wish something that looks for mappings and sequences instead of any iterables. You may wish to change the exact look of the repr -- the point is just to tell the contained objects to try str.) > - Make the encoding used by ``unicode_repr()`` adjustable, and make > current ``repr()`` as default. > With adjustable ``repr()``, result of ``repr()`` is unpredictable and > would make impossible to write correct code involving ``repr()``. No more so than 3138. The setting of repr is predictable on a given system. (Even if you make it a changeable during a single run, it is predictable by checking first.) Across systems, the 3138 proposal is already unpredictable, because you don't know which systems will apply backslash-replace on which characters (and on which runs). 
-jJ From greg.ewing at canterbury.ac.nz Tue May 27 04:04:51 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 May 2008 14:04:51 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> Message-ID: <483B6C43.1060401@canterbury.ac.nz> Jim Jewett wrote: > I certainly support a flag for repr meaning "This was really str; repr > got called because the container doesn't have str, but go back to str > for the contents." Doing this properly would require changing the signature of repr() *everywhere*, not just for strings, because the flag needs to be propagated recursively to any nested object that could have strings in it. > Alternatively, write an explicit repr that does > that, Again, every container type would need to know about and handle this new form of repr(). -- Greg From jimjjewett at gmail.com Tue May 27 04:13:10 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 26 May 2008 22:13:10 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> <20080526232402.GC8849@phd.pp.ru> Message-ID: On 5/26/08, Adam Olsen wrote: > On Mon, May 26, 2008 at 5:24 PM, Oleg Broytmann wrote: > > There are two different and unrelated problems. One is that > > str(container) calls repr() on items. This probably could be fixed with > > a flag to repr() so it remembers it was called from str(). 
This has nothing
> > with hex-encoding strings - calling str() on items would be a win in any
> > case, especially for items that implements both __str__ and __repr__
> > methods.

> There's a reason for that convention.  Would you prefer str(['1', '2',
> '3']) return '[1, 2, 3]'?

I don't think anyone is arguing about how to display

    >>> "%" % string

The problem is classes where str(x) != repr(x), and how they get
messed up when a container holding (one of their) instances is
printed.

>>> class A:
        def __str__(self): return "an A"
>>> a=A()

>>> print a    # this is fine.
an A
>>> str(a)     # this is OK, you have asked for "%s" % a
'an A'
>>> repr(a)    # this is OK, you wanted repr explicitly.
'<__main__.A instance at 0x012DDAF8>'

>>> print ([a])   # this stinks ...
[<__main__.A instance at 0x012DDAF8>]

It would be much better as:

>>> print ([a])   # after fixing the recursion bug
['an a']

Whereas you are asking about the (perhaps also acceptable):

>>>                     # after fixing the recursion bug,
>>> print ([a])   # and somehow not even applying str
[an a]

-jJ

From greg.ewing at canterbury.ac.nz  Tue May 27 04:12:51 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 27 May 2008 14:12:51 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To:
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <483B6E23.3050805@canterbury.ac.nz>

Jim Jewett wrote:

> repr is not for normal UI;

Except that there seem to be places where it *is* used
for normal UI, e.g. putting filenames into error messages.
-- Greg From ncoghlan at gmail.com Tue May 27 04:32:24 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 27 May 2008 12:32:24 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> <20080526232402.GC8849@phd.pp.ru> Message-ID: <483B72B8.30906@gmail.com> Jim Jewett wrote: > On 5/26/08, Adam Olsen wrote: >> On Mon, May 26, 2008 at 5:24 PM, Oleg Broytmann wrote: > >> > There are two different and unrelated problems. One is that >> > str(container) calls repr() on items. This probably could be fixed with >> > a flag to repr() so it remembers it was called from str(). This has nothing >> > with hex-encoding strings - calling str() on items would be a win in any >> > case, especially for items that implements both __str__ and __repr__ >> > methods. > >> There's a reason for that convention. Would you prefer str(['1', '2', >> '3']) return '[1, 2, 3]'? > > I don't think anyone is arguing about how to display > > >>> "%" % string > > The problem is classes where str(x) != repr(x), and how they get > messed up when a container holding (one of their) instances is > printed. This is NOT a bug, since str([1, 2, 3]) and str(list("123")) SHOULD produce results that look different. Calling str() internally to display the contents of containers is a broken idea. The ambiguity that recursive calls to str() would introduce would make any concerns about potential confusion between different Unicode glyphs seem utterly inconsequential. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Tue May 27 04:48:08 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 27 May 2008 12:48:08 +1000 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> Message-ID: <483B7668.1090800@gmail.com> Jim Jewett wrote: > Is it again because of the bug where str([..., mystr, ...]) ends up > doing repr on mystr? Jim, could you please stop describing this behaviour as a bug? It is a perfectly legitimate and desirable approach that ensures lists of different items look different when displayed. Or are actually stating that you *want* str([1, 2, 3]) and str(list("123")) to produce the same output? >> - Add ``ascii()`` builtin function. ``ascii()`` converts any python >> object to string using ``repr()`` and then hex-escape all non-ASCII >> characters. ``ascii()`` generates same string as ``repr()`` in Python 2. > > The problem isn't that I want to be able to write code that acts the > old way; the problem is that I want to ensure all code running on my > system acts the old way. This is for Py3k - you'll be lucky if your old code runs at all, let alone in the same way. > Adding an ascii() function doesn't help. > > Keeping repr and adding full_repr would work (because I could look for > the new name). Py3k. The default option should do the right thing (and in that day-and-age, that means permitting Unicode, rather than restricting object representations to the anglo-centric ASCII subset). The ascii() function would just be a convenience for those cases where the programmer deliberately wants to be anglo-centric. > Keeping repr and fixing the way it recurses when used as a str > fallback would be even better. 
No it wouldn't - the ambiguity introduced by doing so would dwarf anything we might introduce by permitting arbitrary Unicode characters in repr() output. > No more so than 3138. The setting of repr is predictable on a given > system. (Even if you make it a changeable during a single run, it is > predictable by checking first.) Across systems, the 3138 proposal is > already unpredictable, because you don't know which systems will apply > backslash-replace on which characters (and on which runs). If you're worried about doctests, those should be using StringIO objects, so nothing will ever need to be backslash replaced (since it will be Unicode all the way). In terms of actual IO for display to a user, why do you care if something gets backslash replaced or not? The characters which are replaced will only be those which the user's terminal can't display anyway. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From stephen at xemacs.org Tue May 27 05:02:45 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 27 May 2008 12:02:45 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > The only reason for this change is that __repr__ gets used when > __str__ *should* be used instead. That's not what the advocates say. 
Now, repr() is supposed to return something that is acceptable to eval (but doesn't always, especially for recursive objects), while str() is supposed to be more "user-friendly" (but can be horrible if you need to see precisely what the contents are or on an output device that's not prepared for it). As far as I can tell, which should be used is a "beauty in the eye of the beholder" issue, and in the case of repr() Spanish and Chinese users are going to feel more or less differently from Americans about which characters should be escaped. > repr is not for normal UI; it is in explicit contrast to str. I > therefore believe it should default to the safest possible > representation. Well, in `String Conversions', the manual says """In particular, converting a string adds quotes around it and converts "funny" characters to escape sequences that are safe to print.""" Now, I agree with you about what's "safe". However, in a text- processing application in a Japanese environment, that's hardly useful, and our Japanese programmer can argue that in his environment, printing all of Unicode *is* safe. Furthermore, most people run in environments where printing Unicode is safe. >>> I just want it to be very easy to say "on my system, repr is ASCII". > >> That is in all proposals. > > Then I sometimes missed it. I should say, "that was in Guido's desiderata, so I assume anything still on the table has it". Viz: 2. If you don't want any non-ASCII printed to a file, set the file's encoding to ASCII and the error handler to backslashescape. (In ) If that's not easy enough for you (I sympathize!), then you need to get Guido's ear. > And I'll note that it didn't happen for identifiers. That's on input, which is very much a different question. > Again -- *why* is repr used instead of str? I don't use it myself other than as a way of diagnosing bugs in programs I write or maintain; in personal practice, I'm in your camp. 
But my understanding is that there is often an intermediate level, such as a website admin, who needs *some* of the precision of repr() such as escaped representation of whitespace, but also needs to be able read most of the output. It so happens that repr() works as designed for ASCII and acceptably so for ISO Latin, precisely because it *was* designed for ASCII! It sucks for non-Western-European scripts, though, including the ISO 8859 scripts for Cyrillic, Greek, Arabic, and Hebrew. My understanding is that there are more use-cases than there are stringifying functions and methods. Something's got to give. From rhamph at gmail.com Tue May 27 06:52:17 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 26 May 2008 22:52:17 -0600 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> <20080526232402.GC8849@phd.pp.ru> Message-ID: On Mon, May 26, 2008 at 8:13 PM, Jim Jewett wrote: > On 5/26/08, Adam Olsen wrote: >> There's a reason for that convention. Would you prefer str(['1', '2', >> '3']) return '[1, 2, 3]'? > > I don't think anyone is arguing about how to display > > >>> "%" % string > > The problem is classes where str(x) != repr(x), and how they get > messed up when a container holding (one of their) instances is > printed. > >>>> class A: > def __str__(self): return "an A" >>>> a=A() > >>>> print a # this is fine. > an A >>>> str(a) # this is OK, you have asked for "%s" % a > 'an A' >>>> repr(a) # this is OK, you wanted repr explicitly. > '<__main__.A instance at 0x012DDAF8>' > >>>> print ([a]) # this stinks ... 
> [<__main__.A instance at 0x012DDAF8>] > > It would be much better as: > >>>> print ([a]) # after fixing the recursion bug > ['an a'] > > > > Whereas you are asking about the (perhaps also acceptable): > >>>> # after fixing the recursion bug, >>>> print ([a]) # and somehow not even applying str > [an a] Hmm, I see where the confusion is. Containers only define __repr__, so although you think it's the list.__str__ that's mistakenly using repr(), it's str(list) itself that's calling repr(list). So the question to ask is whether we can define a useful __str__ for containers. str(['an a']) -> '[an a]' is not too bad, but str(['hello, world']) -> '[hello, world]' is ambiguous. It crosses the line into garbage. We could probably define a third variant (beside __str__ and __repr__) which'd be "pretty but unambiguous", but if it's just for escaping unicode you should use PEP 3138's ascii_repr(), and if it's for more it should be a separate discussion on python-list/python-ideas instead of here. Along the way I've found a new definition of repr: unambiguous when used in a container's repr. -- Adam Olsen, aka Rhamphoryncus From phd at phd.pp.ru Tue May 27 08:29:47 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 27 May 2008 10:29:47 +0400 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <483B7668.1090800@gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> Message-ID: <20080527062947.GA14808@phd.pp.ru> On Tue, May 27, 2008 at 12:48:08PM +1000, Nick Coghlan wrote: > Jim Jewett wrote: > >Is it again because of the bug where str([..., mystr, ...]) ends up > >doing repr on mystr? > > Jim, could you please stop describing this behaviour as a bug? I am with Jim on this part (but only on this). 
I'd like this

class Test:
    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"

test = Test()
print test
print repr(test)
print str(test)
print [test]
print str([test])
print repr([test])

to print

STR
REPR
STR
[STR]
[STR]
[REPR]

but the code actually prints

STR
REPR
STR
[REPR]
[REPR]
[REPR]

   str(container) not calling str() on items is at least a strange and
unexpected behaviour, if not a bug. Unfortunately, it is not easy to fix,
and missing quotes on string items is a loss (albeit a minor one for me),
so I accepted the compromise - to fix repr() and forget about
str(container).

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From ncoghlan at gmail.com  Tue May 27 10:13:39 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 27 May 2008 18:13:39 +1000
Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000
In-Reply-To: <20080527062947.GA14808@phd.pp.ru>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<483B7668.1090800@gmail.com>
	<20080527062947.GA14808@phd.pp.ru>
Message-ID: <483BC2B3.6040308@gmail.com>

Oleg Broytmann wrote:
>    str(container) not calling str() on items is at least a strange and
> unexpected behaviour, if not a bug.

I have no problem at all with people calling this behaviour surprising
and unexpected, but I'm not happy with them calling it a bug without
being challenged, since there are very good reasons for it working the
way it does.

str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce
distinctive output: calling repr() on container contents achieves this,
calling str() does not.
Strings are a good example of this ambiguity problem, but there are others, such as Decimal objects which can be indistinguishable from normal floats and integers when printed, but definitely aren't interchangeable with them: >>> x = [1, 2, 3, 1.0, 2.0, 3.0] >>> y = map(Decimal, [1, 2, 3, '1.0', '2.0', '3.0']) >>> x [1, 2, 3, 1.0, 2.0, 3.0] >>> y [Decimal("1"), Decimal("2"), Decimal("3"), Decimal("1.0"), Decimal("2.0"), Decimal("3.0")] >>> x == y False The reason for the inequality is fairly obvious given a repr() based output for the containers (1.0 != Decimal('1.0') by design), but how big would the potential for confusion be if str() on containers invoked str() on their contents: >>> print '[%s]' % ', '.join(map(str, x)) [1, 2, 3, 1.0, 2.0, 3.0] >>> print '[%s]' % ', '.join(map(str, y)) [1, 2, 3, 1.0, 2.0, 3.0] While it could be argued that if you want unambiguous output you should be invoking repr() on the container instead of str(), I'm still seeing many more downsides than upsides to the idea of making str() on the builtin containers display their contents with str() instead of repr(). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From phd at phd.pp.ru Tue May 27 10:36:13 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 27 May 2008 12:36:13 +0400 Subject: [Python-3000] str(container) calls repr() (was: PEP 3138) In-Reply-To: <483BC2B3.6040308@gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> Message-ID: <20080527083612.GA15216@phd.pp.ru> On Tue, May 27, 2008 at 06:13:39PM +1000, Nick Coghlan wrote: > str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce > distinctive output: calling repr() on container contents achieves this, > calling str() does not. 
String representation is a special case and *the only* special case, and must be handled as a special case. I don't like this special case to be used as a model for all other types. No other type allows usage like list("123"). > While it could be argued that if you want unambiguous output you should > be invoking repr() on the container instead of str(), I'm still seeing > many more downsides than upsides to the idea of making str() on the > builtin containers display their contents with str() instead of repr(). The decision should be upon the user. In an ideal world str(container) calls str() on items, and repr(container) calls repr() on items, so the user can choose what [s]he wants. Currently user is just stuck with repr(). Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From ncoghlan at gmail.com Tue May 27 11:28:42 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 27 May 2008 19:28:42 +1000 Subject: [Python-3000] str(container) calls repr() In-Reply-To: <20080527083612.GA15216@phd.pp.ru> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <20080527083612.GA15216@phd.pp.ru> Message-ID: <483BD44A.3070902@gmail.com> Oleg Broytmann wrote: > On Tue, May 27, 2008 at 06:13:39PM +1000, Nick Coghlan wrote: >> str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce >> distinctive output: calling repr() on container contents achieves this, >> calling str() does not. > > String representation is a special case and *the only* special case, and > must be handled as a special case. The problem arises whenever you have two different objects which can produce the same answer for str(), but different answers for repr(). 
Strings are the most obvious case, since str(obj) and str(repr(obj)) will always produce the same answer for any object which doesn't have separate __str__ and __repr__ implementations, but they aren't the only case (as I endeavoured to show with the Decimal example). > I don't like this special case to be used > as a model for all other types. No other type allows usage like list("123"). That's just the way I happened to write it. You can enter it as str(list(["1", "2", "3"])) if you prefer. > The decision should be upon the user. In an ideal world str(container) > calls str() on items, and repr(container) calls repr() on items, so the > user can choose what [s]he wants. Currently user is just stuck with repr(). That's hardly the case - developers are quite free to iterate over the container invoking str() on each of the sub-items and building the output that way. All I'm really asking for here is for people to identify the use cases that justify introducing such a potential for ambiguity into the container implementations. That has been done to my satisfaction for PEP 3138 (which is why I've switched from being an opponent of allowing repr() to return arbitrary Unicode characters to being a supporter of the PEP), but I've yet to see *any* specific use cases for containers invoking str() that wouldn't be better addressed with an application or library specific display loop. Cheers, Nick.
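For readers wondering what such an application specific display loop could look like, here is one possible sketch in Python 3 syntax. The display() function is hypothetical and only covers lists, tuples and dicts; the Test class mirrors the one posted earlier in the thread.

```python
def display(obj):
    """Hypothetical helper: like str(), but recurses with str() into
    lists, tuples and dicts instead of falling back to repr()."""
    if isinstance(obj, list):
        return "[%s]" % ", ".join(display(item) for item in obj)
    if isinstance(obj, tuple):
        return "(%s)" % ", ".join(display(item) for item in obj)
    if isinstance(obj, dict):
        return "{%s}" % ", ".join(
            "%s: %s" % (display(key), display(value))
            for key, value in obj.items())
    # Anything else is a leaf: use its str() form.
    return str(obj)


class Test:
    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"


print(display([Test(), (Test(), "text")]))  # str() used at every level
print(str([Test()]))                        # builtin list still uses repr()
```

Note that the sketch drops the quotes around string items, so it carries exactly the ambiguity discussed in this thread; that trade-off is left to the application.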
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From phd at phd.pp.ru Tue May 27 11:57:09 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 27 May 2008 13:57:09 +0400 Subject: [Python-3000] str(container) calls repr() In-Reply-To: <483BD44A.3070902@gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <20080527083612.GA15216@phd.pp.ru> <483BD44A.3070902@gmail.com> Message-ID: <20080527095709.GC15216@phd.pp.ru> On Tue, May 27, 2008 at 07:28:42PM +1000, Nick Coghlan wrote: > The problem arises whenever you have two different objects which can > produce the same answer for str(), but different answers for repr(). Aside from strings themselves, the only example of such objects I can imagine is numbers (ints, floats and decimals). str(12) is the same as str('12') and the same as str(Decimal('12')), but that's all. > All I'm really asking for here is for people to identify the use cases > that justify introducing such a potential for ambiguity into the > container implementations. Why are you so afraid of ambiguity? str() is supposed to produce pretty output, not necessarily unambiguous output, right? And if a user wants unambiguous output, [s]he will use repr(). > I've yet to see > *any* specific use cases for containers invoking str() that wouldn't be > better addressed with an application or library specific display loop. Unfortunately, every library has to have its own specific loop, because universal container traversal is impossible. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From mal at egenix.com Tue May 27 12:10:25 2008 From: mal at egenix.com (M.-A.
Lemburg) Date: Tue, 27 May 2008 12:10:25 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483B2D02.8040400@cheimes.de> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> Message-ID: <483BDE11.509@egenix.com> On 2008-05-26 23:34, Christian Heimes wrote: > M.-A. Lemburg schrieb: >> Isn't that an awefuly confusing approach ? >> >> Wouldn't it be better to keep PyString APIs and definitions in >> stringobject.c|h >> >> and only add a new bytesobject.h header file that #defines the >> PyBytes APIs in terms of PyString APIs ? That maintains >> backwards compatibility and allows Python internals to use the >> new API names. >> >> With your approach, you've basically backported the confusing >> notion in Py3k that str() maps PyUnicode, only that in Py2 >> str() will now map to PyBytes. > > The last time I brought up the topic, I had a lengthy discussion with > Guido. At first I wanted to rename the API in Python 3.0 only. Guido > argued that it's going to cause too much merge conflicts. He then > suggested the approach I implemented today. That's the same argument that came up in the module renaming discussion. I have a feeling that we should be looking for better merge tools, rather than implement code changes that cause more trouble than do good, just because our existing tools aren't smart enough. Wouldn't it be possible to have a 2to3.py converter take the 2.x code (including the C code), convert it and then apply any changes to the 3.x branch ? This wouldn't be merging in the classical sense, it would be automated forward porting. > I find the approach less confusing than your suggestion and my initial > idea. I disagree on that. Renaming old APIs to use the new names by adding a header file with #define is standard practice. 
Renaming the old APIs in the source code and undoing the renaming with a header file is not. > The internal API names are consistent for Python 2.6 and 3.0. The > byte string C API is prefixed PyBytes and the unicode C API is prefixed > PyUnicode. A core developer has just to remember that 'str' is a byte > string in 2.x but an unicode object in 3.0. So you've solved part of the problem for 3.x by moving the naming mixup back to 2.x. > Extension developers don't have to worry at all. The ABI and external > API is mostly the same and still exposes the 'str' functions as PyString. Well, yes, but only due to a preprocessor hack that turns the names used in bytesobject.c back into names you'd normally look for in stringobject.c. And all this, just because Subversion can't handle merging of symbol renaming. >> You'd have to add an aliase bytes -> str to the builtins to >> at least reduce the confusion a bit. > > Python 2.6 already has an alias bytes -> str > >> Yes, but please let's first discuss this some more. I don't think >> that the timing was right.... you started this thread just yesterday >> and the patches are already checked in. > > I'm sorry if I was too hasty for you. I got +1 from a couple of > developers and it's basically Guido's suggestion. Please discuss any changes of the 2.x code base on python-dev. Such major changes do need more discussion and possibly a PEP as well. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 27 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 40 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Tue May 27 12:27:21 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 27 May 2008 20:27:21 +1000 Subject: [Python-3000] str(container) calls repr() In-Reply-To: <20080527095709.GC15216@phd.pp.ru> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <20080527083612.GA15216@phd.pp.ru> <483BD44A.3070902@gmail.com> <20080527095709.GC15216@phd.pp.ru> Message-ID: <483BE209.3090202@gmail.com> Oleg Broytmann wrote: > On Tue, May 27, 2008 at 07:28:42PM +1000, Nick Coghlan wrote: >> The problem arises whenever you have two different objects which can >> produce the same answer for str(), but different answers for repr(). > > Aside from strings themselves, the only example of such objects I can imagine is > numbers (ints, floats and decimals). str(12) is the same as str('12') and > the same as str(Decimal('12')), but that's all. Those are the only examples I can think of in the standard library, but who knows what user code is doing. We shouldn't break that without compelling use cases. >> All I'm really asking for here is for people to identify the use cases >> that justify introducing such a potential for ambiguity into the >> container implementations. > > Why are you so afraid of ambiguity? str() is supposed to produce > pretty output, not necessarily unambiguous output, right? And if a user wants > unambiguous output, [s]he will use repr(). Agreed, but I see the fact that the 'pretty' representation of a container is also unambiguous as a feature rather than a bug (note that even the pprint pretty printing module uses repr() for container contents).
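The remark about pprint is easy to confirm. A small sketch in Python 3 syntax, using only pprint.pformat from the standard library; the sample list is made up for illustration.

```python
import pprint

# A made-up list mixing a string that looks like three items,
# an int, and a string that looks like that int.
items = ["1, 2, 3", 1, "1"]

# pformat() returns the text that pprint() would print.
formatted = pprint.pformat(items)
print(formatted)

# For a short list like this one the pretty-printed text is the plain repr(),
# so the quotes around the string items are kept.
matches_repr = (formatted == repr(items))
print(matches_repr)
```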
*shrug* Make the case in a PEP for str() of the standard containers to recurse using str() instead of repr() and get enough people to agree that it's a good idea and I'll shut up about it. Until that happens, I'd like people to stop claiming that the current behaviour is a bug in any way shape or form. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From mal at egenix.com Tue May 27 12:35:04 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 27 May 2008 12:35:04 +0200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <483BE3D8.4080806@egenix.com> On 2008-05-27 05:02, Stephen J. Turnbull wrote: > Jim Jewett writes: > > > The only reason for this change is that __repr__ gets used when > > __str__ *should* be used instead. > > That's not what the advocates say. > > Now, repr() is supposed to return something that is acceptable to eval > (but doesn't always, especially for recursive objects), while str() is > supposed to be more "user-friendly" (but can be horrible if you need > to see precisely what the contents are or on an output device that's > not prepared for it). AFAIK, eval(repr(obj)) is no longer a requirement... simply because it has always only worked for a small subset of objects and in reality, you wouldn't want to call eval() on anything too often due to the security implications. 
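To illustrate the point that eval(repr(obj)) only ever worked for a subset of objects, here is a sketch in Python 3 syntax. The sample values are arbitrary, and eval() is used here on trusted, locally built strings only; as noted above, it should not be applied to untrusted input.

```python
# Builtin literals: repr() produces source text that eval() can read back.
values = [1, 2.5, "text", [1, "a"], {"k": (1, 2)}]
round_trips = all(eval(repr(value)) == value for value in values)

# A plain object() has a repr like "<object object at 0x...>",
# which is not valid Python source.
try:
    eval(repr(object()))
except SyntaxError:
    plain_object_round_trips = False
else:
    plain_object_round_trips = True

print(round_trips)
print(plain_object_round_trips)
```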
In my daily use, I see repr(obj) as a way to get a debugging text view of an object, whereas str(obj) is a way to convert it into text. If an object doesn't have a special debugging text view, then it's fine to use the standard text view instead. > As far as I can tell, which should be used is a "beauty in the eye of > the beholder" issue, and in the case of repr() Spanish and Chinese > users are going to feel more or less differently from Americans about > which characters should be escaped. I'm not sure that's always the case, but users should certainly have the freedom to decide whether they prefer backslashed quoted code points or glyphs on their screen. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 27 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 40 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From theller at ctypes.org Tue May 27 12:56:27 2008 From: theller at ctypes.org (Thomas Heller) Date: Tue, 27 May 2008 12:56:27 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483BDE11.509@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> Message-ID: M.-A. Lemburg schrieb: > On 2008-05-26 23:34, Christian Heimes wrote: >> M.-A. Lemburg schrieb: >>> Isn't that an awefuly confusing approach ? 
>>> >>> Wouldn't it be better to keep PyString APIs and definitions in >>> stringobject.c|h and only add a new bytesobject.h header file >>> that #defines the PyBytes APIs in terms of PyString APIs ? That >>> maintains backwards compatibility and allows Python internals to >>> use the new API names. >>> >>> With your approach, you've basically backported the confusing >>> notion in Py3k that str() maps PyUnicode, only that in Py2 str() >>> will now map to PyBytes. >> >> The last time I brought up the topic, I had a lengthy discussion >> with Guido. At first I wanted to rename the API in Python 3.0 only. >> Guido argued that it's going to cause too much merge conflicts. He >> then suggested the approach I implemented today. > > That's the same argument that came up in the module renaming > discussion. > > I have a feeling that we should be looking for better merge tools, > rather than implement code changes that cause more trouble than do > good, just because our existing tools aren't smart enough. There are applications out there that dynamically import the python dll and link to the exported functions by name; they will all break. I believe in the past we have been more carefull with changes like these. Even when python api functions were turned into cpp macros, we provided exported functions for them; for a few examples see the function definitions near line 1778 in file Python/pythonrun.c . Thomas From stefan_ml at behnel.de Tue May 27 13:10:42 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 27 May 2008 11:10:42 +0000 (UTC) Subject: [Python-3000] =?utf-8?q?Why_is_type=5Fmodified=28=29_in_typeobjec?= =?utf-8?q?t=2Ec_not_a_public_function=3F?= Message-ID: Hi, when we build extension classes in Cython, we have to first build the type to make it available to user code, and then update the type's tp_dict while we run the class body code (PyObject_SetAttr() does not work here). 
In Py2.6+, this requires invalidating the method cache after each attribute change, which Python does internally using the type_modified() function. Could this function get a public interface? I do not think Cython is the only case where C code wants to modify a type after its creation, and copying the code over seems like a hack to me. Stefan From mal at egenix.com Tue May 27 13:54:26 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 27 May 2008 13:54:26 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> Message-ID: <483BF672.6020108@egenix.com> On 2008-05-27 12:56, Thomas Heller wrote: > M.-A. Lemburg schrieb: >> On 2008-05-26 23:34, Christian Heimes wrote: >>> M.-A. Lemburg schrieb: >>>> Isn't that an awefuly confusing approach ? >>>> >>>> Wouldn't it be better to keep PyString APIs and definitions in >>>> stringobject.c|h and only add a new bytesobject.h header file >>>> that #defines the PyBytes APIs in terms of PyString APIs ? That >>>> maintains backwards compatibility and allows Python internals to >>>> use the new API names. >>>> >>>> With your approach, you've basically backported the confusing >>>> notion in Py3k that str() maps PyUnicode, only that in Py2 str() >>>> will now map to PyBytes. >>> The last time I brought up the topic, I had a lengthy discussion >>> with Guido. At first I wanted to rename the API in Python 3.0 only. >>> Guido argued that it's going to cause too much merge conflicts. He >>> then suggested the approach I implemented today. >> That's the same argument that came up in the module renaming >> discussion. 
>> >> I have a feeling that we should be looking for better merge tools, >> rather than implement code changes that cause more trouble than do >> good, just because our existing tools aren't smart enough. > > There are applications out there that dynamically import the python dll > and link to the exported functions by name; they will all break. The exported APIs still use the old names. Just the source code versions of the APIs change to the new names and they now live in different files as well. > I believe in the past we have been more carefull with changes like these. > Even when python api functions were turned into cpp macros, we provided > exported functions for them; for a few examples see the function definitions > near line 1778 in file Python/pythonrun.c . IMO, we should keep using that strategy for Python 2.x. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 27 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 40 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From dalcinl at gmail.com Tue May 27 15:39:02 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 27 May 2008 10:39:02 -0300 Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <48397ECC.9070805@cheimes.de> References: <48397ECC.9070805@cheimes.de> Message-ID: Chistian, I've posted some weeks ago some observation about the status of PyNumberMethods API. The thread link is below, I t did not received much atention. 
http://mail.python.org/pipermail/python-3000/2008-May/013594.html

Now I summarize that post:

* 'nb_nonzero' was renamed to 'nb_bool'
* 'nb_inplace_divide' was removed
* 'nb_hex', 'nb_oct', and 'nb_coerce' are there, but they are unused

IMHO, the PyNumberMethods struct should be left as in Py2, or it should be cleaned up, that is, all unused slots should be removed.

On 5/25/08, Christian Heimes wrote: > Hello! > > The first set of betas of Python 2.6 and 3.0 is fast apace. I like to > grab the final chance and clean up the C API of 2.6 and 3.0. I know, I > know, I brought up the topic two times in the past. But this time I mean > it for real! :] > > Last time Guido said: > --- > I think it can actually be simplified. I think maintaining binary > compatibility between 2.6 and earlier versions is hopeless anyway, so > we might as well just rename PyString to PyBytes in 2.6 and 3.0, and > have an extra set of macros so that code using PyString needs to be > recompiled but not otherwise touched. E.g. > > typedef { ... } PyBytesObject; > #define PyStringObject PyBytesObject > > ...
PyString_Type; > #define PyBytes_Type PyString_Type > > > --- > > I like to follow Guido's advice and change the code as following: > > * replace PyBytes_ with PyByteArray_ > * replace PyString with PyBytes_ > * rename bytesobject.[ch] to bytearrayobject.[ch] > * rename stringobject.[ch] to bytesobject.[ch] > * add a new file stringobject.h which contains the aliases PyString_ -> > PyBytes_ > > Christian > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/dalcinl%40gmail.com > -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From bwinton at latte.ca Tue May 27 17:31:05 2008 From: bwinton at latte.ca (Blake Winton) Date: Tue, 27 May 2008 11:31:05 -0400 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <483BC2B3.6040308@gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> Message-ID: <483C2939.2000409@latte.ca> Nick Coghlan wrote: > Oleg Broytmann wrote: >> str(container) not calling str() on items is at least a strange and >> unexpected behaviour, if not a bug. > I have no problem at all with people calling this behaviour surprising > and unexpected, but I'm not happy with them calling it a bug without > being challenged, since there are very good reasons for it working the > way it does. > > str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce > distinctive output: calling repr() on container contents achieves this, > calling str() does not. Why?
Seriously, I can write: >>> print 1, "1", Decimal("1") and get as my output: 1 1 1 and somehow no-one complains that it should actually print: 1 "1" Decimal("1") but for some reason when I put those in a list, they should magically change their display? I think the burden of proof is on you to explain why, when we have a perfectly good name for unambiguous output ("repr"), do we need to override the name for ambiguous and nicely-formatted output ("str"), to achieve the same goal. > While it could be argued that if you want unambiguous output you should > be invoking repr() on the container instead of str(), I'm still seeing > many more downsides than upsides to the idea of making str() on the > builtin containers display their contents with str() instead of repr(). But which downsides do you see that aren't solved by the use of repr to get unambiguous output? Later, Blake. From facundobatista at gmail.com Tue May 27 18:01:07 2008 From: facundobatista at gmail.com (Facundo Batista) Date: Tue, 27 May 2008 13:01:07 -0300 Subject: [Python-3000] Py3 docs Message-ID: Hi all! Is there any official web site with the documentation for Python 3.0 in html format? If not, let me share with you this one [1], updated nightly, kindly set up by another Argentinian pythonista (Humitos). Regards, [1] http://humitos.homelinux.net/py3kdoc/ -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From amauryfa at gmail.com Tue May 27 18:06:05 2008 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 27 May 2008 18:06:05 +0200 Subject: [Python-3000] Py3 docs In-Reply-To: References: Message-ID: Facundo Batista wrote: > Is there any official web site with the documentation for Python 3.0 > in html format? > > If not, let me share with you this one [1], updated nightly, kindly > set up by another Argentinian pythonista (Humitos). 
> > Regards, > > [1] http://humitos.homelinux.net/py3kdoc/ Well, I use this one every day: http://docs.python.org/dev/3.0/ -- Amaury Forgeot d'Arc From stefan_ml at behnel.de Tue May 27 18:08:42 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 27 May 2008 16:08:42 +0000 (UTC) Subject: [Python-3000] Py3 docs References: Message-ID: Facundo Batista gmail.com> writes: > Is there any official web site with the documentation for Python 3.0 > in html format? You mean like this? http://docs.python.org/dev/3.0/ Stefan From facundobatista at gmail.com Tue May 27 18:09:57 2008 From: facundobatista at gmail.com (Facundo Batista) Date: Tue, 27 May 2008 13:09:57 -0300 Subject: [Python-3000] Py3 docs In-Reply-To: References: Message-ID: 2008/5/27 Amaury Forgeot d'Arc : > Well, I use this one every day: > http://docs.python.org/dev/3.0/ Ah, didn't know about this (I was reading the .rst directly before). Thanks! -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From guido at python.org Tue May 27 18:49:47 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 09:49:47 -0700 Subject: [Python-3000] dbm package creation In-Reply-To: References: Message-ID: On Sun, May 25, 2008 at 3:08 PM, Brett Cannon wrote: > On Sun, May 25, 2008 at 12:21 PM, Georg Brandl wrote: >> Hi, >> >> I'll handle the PEP 3108 dbm package if nobody else is already at it. >> > > I know I have not started the work. > >> Two questions though: >> >> * the whichdb() function returns strings that are module names. These >> names won't be importable anymore in 3k. Should the return values >> remain the same in 3k, or should whichdb() return the new names, and >> if the latter, including "dbm." or not? >> > > New names with the package name prepended. > > Should probably change the API at some point to just return the module > to use instead of the name. I'm not sure I disagree. I see the return value as an enum, only one use for which is to import it. 
(If you wanted to just use the module, why not use anydbm?) I'd prefer to keep the return strings the same (no 'dbm.' prefix) and fix the code that uses whichdb. Or is there an expected future use case where the returned value would be something in a *different* package? Returning a module object would seem the least attractive version -- that would require importing the module, which may not be in the caller's plan at all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 27 19:04:45 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 10:04:45 -0700 Subject: [Python-3000] str(container) calls repr() (was: PEP 3138) In-Reply-To: <20080527083612.GA15216@phd.pp.ru> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <20080527083612.GA15216@phd.pp.ru> Message-ID: On Tue, May 27, 2008 at 1:36 AM, Oleg Broytmann wrote: > On Tue, May 27, 2008 at 06:13:39PM +1000, Nick Coghlan wrote: >> str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce >> distinctive output: calling repr() on container contents achieves this, >> calling str() does not. > > String representation is a special case and *the only* special case, and > must be handled as a special case. I don't like this special case to be used > as a model for all other types. No other type allows usage like list("123"). > >> While it could be argued that if you want unambiguous output you should >> be invoking repr() on the container instead of str(), I'm still seeing >> many more downsides than upsides to the idea of making str() on the >> builtin containers display their contents with str() instead of repr(). > > The decision should be upon the user. In an ideal world str(container) > calls str() on items, and repr(container) calls repr() on items, so the > user can choose what [s]he wants. 
Currently user is just stuck with repr(). I disagree. Calling str() on items is counterproductive. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Tue May 27 20:53:25 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 14:53:25 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <483B72B8.30906@gmail.com> References: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> <20080526232402.GC8849@phd.pp.ru> <483B72B8.30906@gmail.com> Message-ID: On 5/26/08, Nick Coghlan wrote: > Jim Jewett wrote: > > The problem is classes where str(x) != repr(x), and how they get > > messed up when a container holding (one of their) instances is > > printed. > This is NOT a bug, since str([1, 2, 3]) and str(list("123")) SHOULD produce > results that look different. Ideally, but it isn't that important. repr(1) and repr("1") need to be different, but if str(1) and str("1") look alike, that is acceptable. It already happens with >>> "%s %s " % (1, "1") > Calling str() internally to display the > contents of containers is a broken idea. If you are using it for precise debugging, you should use repr -- and repr should be used all the way down. If you're using str because you want (fairly) readable output, then str should be used all the way down. > The ambiguity that recursive calls > to str() would introduce would make any concerns about potential confusion > between different Unicode glyphs seem utterly inconsequential. So don't do it on repr -- do it only on str -- but always do it on str, even when str falls back to repr for a particular level of recursion. 
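The %-formatting observation in this message can be reproduced as follows. This is only a sketch in Python 3 syntax; %s applies str() to each argument and %r applies repr().

```python
# %s uses str(): the int 1 and the string "1" become indistinguishable.
with_str = "%s %s" % (1, "1")

# %r uses repr(): the string keeps its quotes.
with_repr = "%r %r" % (1, "1")

print(with_str)
print(with_repr)
```

So the ambiguity already exists wherever the caller explicitly asks for str(); the disagreement in this thread is about whether containers should make that choice on the caller's behalf.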
-jJ From jimjjewett at gmail.com Tue May 27 21:08:18 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 15:08:18 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/26/08, Stephen J. Turnbull wrote: > Jim Jewett writes: > > The only reason for this change is that __repr__ gets used when > > __str__ *should* be used instead. > That's not what the advocates say. I still haven't seen a use case where it *should* be using repr *and* needs to print outside of ASCII. There are plenty of cases where it *is* using repr because str(container) fell back to repr, and then the contained strings stay in repr instead of shifting back to str. I just haven't seen any where repr is the *right* function, as opposed to what they're stuck with because a container doesn't implement a separate __str__. [The file exceptions *may* be a separate case, because of tracebacks using repr to print, but I'm not sure even there.] > Well, in `String Conversions', the manual says """In particular, > converting a string adds quotes around it and converts > "funny" characters to escape sequences that are safe to print.""" > Now, I agree with you about what's "safe". However, in a text- > processing application in a Japanese environment, that's hardly > useful, and our Japanese programmer can argue that in his environment, > printing all of Unicode *is* safe. 
I think he or she will still be wrong, because of confusables -- it is just that "unsafe" characters are far more rare (since byte value alone isn't a problem) and the cost of not printing non-ASCII characters is higher. So I suggest that he or she use str, rather than repr -- and that we fix containers to make this possible. > > Again -- *why* is repr used instead of str? > I don't use it myself other than as a way of diagnosing bugs in > programs I write or maintain; in personal practice, I'm in your camp. > But my understanding is that there is often an intermediate level, > such as a website admin, who needs *some* of the precision of repr() > such as escaped representation of whitespace, but also needs to be > able read most of the output. Could someone who does need this explain more? I understand wanting the two side-by-side. I sometimes want that with hex. I understand wanting a container's contents to be readable. I realize you can't easily get that today, and consider that a bug. (Nick's disagreement noted.) I don't understand needing *exactly* whitespace escaped, but not, say, stray characters from scripts you've never used, even though the rest of the page *is* in an expected script. -jJ From jimjjewett at gmail.com Tue May 27 21:17:29 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 15:17:29 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <20080524171814.GA4026@phd.pp.ru> <20080526232402.GC8849@phd.pp.ru> Message-ID: On 5/27/08, Adam Olsen wrote: > On Mon, May 26, 2008 at 8:13 PM, Jim Jewett wrote: > > The problem is classes where str(x) != repr(x), and how they get > > messed up when a container holding (one of their) instances is > > printed. > >>>> class A: > >>>> def __str__(self): return "an A" > >>>> a=A() > >>>> print a # this is fine. 
> > an A > >>>> str(a) # this is OK, you have asked for "%s" % a > > 'an A' > >>>> repr(a) # this is OK, you wanted repr explicitly. > > '<__main__.A instance at 0x012DDAF8>' > >>>> print ([a]) # this stinks ... > > [<__main__.A instance at 0x012DDAF8>] > > It would be much better as: > >>>> print ([a]) # after fixing the recursion bug > > ['an a'] > Hmm, I see where the confusion is. Containers only define __repr__, > so although you think it's the list.__str__ that's mistakenly using > repr(), it's str(list) itself that's calling repr(list). Exactly. And the fact that it then calls repr (rather than str) on the contents -- even though the user asked for str -- is what I (but not Nick) consider a bug. I believe this bug is also the only real source of the pain that motivates PEP 3138. > So the question to ask is whether we can define a useful __str__ for > containers. str(['an a']) -> '[an a]' is not too bad, but > str(['hello, world']) -> '[hello, world]' is ambiguous. It crosses > the line into garbage. I would consider '["hello, world"]' to be perfectly acceptable; the conversion to str was explicit. But to be honest, I would also accept '[hello, world]' despite the ambiguity -- if ambiguity is a problem, then str probably isn't the right function. (Admittedly, that does increase the pressure for a 3rd case in between; I'm just not sure that there would be enough need for that in-between, if str worked "all the way down".) > Along the way I've found a new definition of repr: unambiguous when > used in a container's repr. err ... actually even that isn't met, if you look too hard at corner (or malicious) cases. But I agree that it is a good goal -- which need not apply to a containers __str__. 
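The class A example above can be replayed in Python 3 roughly as follows. The function deep_str() is a hypothetical helper, not anything proposed in the thread; it handles only lists and is meant to show both the readable output and the ambiguity Adam points out.

```python
class A:
    def __str__(self):
        return "an A"


a = A()


def deep_str(obj):
    """Hypothetical helper: apply str() recursively to the items of lists."""
    if isinstance(obj, list):
        return "[" + ", ".join(deep_str(item) for item in obj) + "]"
    return str(obj)


print(str(a))        # uses A.__str__
print(str([a]))      # list has no separate __str__, so the item is shown via repr()
print(deep_str([a]))  # the item is shown via A.__str__

# The ambiguity: one string containing a comma versus two strings.
print(deep_str(["hello, world"]))
print(deep_str(["hello", "world"]))
```

The last two calls print identical text, which is the '[hello, world]' case from the message above.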
-jJ From jimjjewett at gmail.com Tue May 27 21:33:02 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 15:33:02 -0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <483BE3D8.4080806@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> <483BE3D8.4080806@egenix.com> Message-ID: On 5/27/08, M.-A. Lemburg wrote: > On 2008-05-27 05:02, Stephen J. Turnbull wrote: > > ... repr() Spanish and Chinese users are going to feel more or less > > differently from Americans about which characters should be escaped. > I'm not sure that's always the case, but users should certainly > have the freedom to decide whether they prefer backslashed quoted > code points or glyphs on their screen. Agreed, and they already do if they go far enough out of their way to be explicit. The question is what to do by default. We agree that, by default, str(x) should display glyphs when possible. (And changing this is hard in practice, even if you don't recognize the glyphs.) We agree that, by default today, repr uses backslash. (And changing this is hard, even if you do recognize the glyphs.) We also agree that in many cases, people want the glyphs but get a backslash. The only disagreement is over how to fix this. PEP 3138 says that repr should start printing Unicode glyphs. I say that repr should (instead) start recognizing when it was called in place of __str__, and should revert back to __str__ when it recurses down to the next level.
-jJ From guido at python.org Tue May 27 21:42:48 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 12:42:48 -0700 Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka String ABC) In-Reply-To: References: Message-ID: [+python-3000] On Tue, May 27, 2008 at 12:32 PM, Armin Ronacher wrote: > Strings are currently iterable and it was stated multiple times that this is a > good idea and shouldn't change. While I still don't think that that's a good > idea I would like to propose a solution for the problem many people are > experiencing by introducing an abstract base class for strings. > > Basically *the* problematic situation with iterable strings is something like > a `flatten` function that flattens out every iterable object except of strings. > Imagine it's implemented in a way similar to that:: > > def flatten(iterable): > for item in iterable: > try: > if isinstance(item, basestring): > raise TypeError() > iterator = iter(item) > except TypeError: > yield item > else: > for i in flatten(iterator): > yield i > > A problem comes up as soon as user defined strings (such as UserString) is > passed to the function. In my opinion a good solution would be a "String" > ABC one could test against. I'm not against this, but so far I've not been able to come up with a good set of methods to endow the String ABC with. Another problem is that not everybody draws the line in the same place -- how should instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) be treated? 
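For comparison, a minimal Python 3 sketch of the flatten() idea. It does not use a String ABC, since none is defined in the thread; instead it assumes a configurable tuple of "atomic" types, here called ATOMIC_TYPES, which is one possible answer to the question of where to draw the line.

```python
from collections.abc import Iterable

# Assumption: this sketch treats str, bytes and bytearray as atoms.
# Other callers might add array.array, memoryview or their own string types.
ATOMIC_TYPES = (str, bytes, bytearray)


def flatten(iterable, atomic=ATOMIC_TYPES):
    """Yield the leaves of a nested iterable, never descending into atomic types."""
    for item in iterable:
        if isinstance(item, Iterable) and not isinstance(item, atomic):
            yield from flatten(item, atomic)
        else:
            yield item


print(list(flatten([1, ["ab", (2, b"cd")], [[3]]])))
```

User-defined string types such as collections.UserString are not covered by the default tuple and would have to be passed in through the atomic argument, which is the gap a common ABC was meant to close.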
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Tue May 27 22:03:51 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 16:03:51 -0400 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <483B7668.1090800@gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> Message-ID: On 5/26/08, Nick Coghlan wrote: > > The problem isn't that I want to be able to write code that acts the > > old way; the problem is that I want to ensure all code running on my > > system acts the old way. > This is for Py3k - you'll be lucky if your old code runs at all, let alone > in the same way. Again, this isn't about code I wrote; it is about code someone else wrote. If they used a new function designed to display unicode, then I know it was intentional. If they used repr, then it is quite likely that they were using 2.x repr, and just didn't consider the non-ASCII case. > > Keeping repr and fixing the way it recurses when used as a str > > fallback would be even better. > No it wouldn't - the ambiguity introduced by doing so would > dwarf anything It wouldn't add *any* ambiguity when someone called repr explicitly. When they called str explicitly, ambiguity would occur exactly for objects where it is already tolerable for str. (Because these same objects would already be ambigous if they were top-level objects instead of contained subobjects.) > In terms of actual IO for display to a user, why do you care if something > gets backslash replaced or not? The characters which are replaced will only > be those which the user's terminal can't display anyway. [Assuming non-buggy terminals, yes.] My biggest concern is with characters that the *terminal* can display fine, but which the *human* will not recognize. (At least not without some emphasis warning them that something is unexpected.) 
For str, those characters are not a problem -- if I don't notice them, then they (almost by definition) are not crucial to me. For repr, that is a problem. If I am using repr, then I want attention called to anything unexpected. The fact that a character might be common somewhere else doesn't matter -- *I* wasn't expecting it *in my environment*. A system-level switch to add expected characters is fine. A generic assumption that anything printable is expected -- that is not fine. -jJ From jyasskin at gmail.com Tue May 27 22:04:02 2008 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Tue, 27 May 2008 15:04:02 -0500 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> <483BE3D8.4080806@egenix.com> Message-ID: <5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com> On Tue, May 27, 2008 at 2:33 PM, Jim Jewett wrote: > I say that repr should (instead) start recognizing when it was called > in place of __str__, and should revert back to __str__ when it > recurses down to the next level. That sounds like a PEP you should write, which, if accepted, might obviate some of the rationale for this one.
-- Namasté, Jeffrey Yasskin From phd at phd.pp.ru Tue May 27 22:14:50 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 28 May 2008 00:14:50 +0400 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com> References: <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> <483BE3D8.4080806@egenix.com> <5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com> Message-ID: <20080527201450.GA29645@phd.pp.ru> On Tue, May 27, 2008 at 03:04:02PM -0500, Jeffrey Yasskin wrote: > On Tue, May 27, 2008 at 2:33 PM, Jim Jewett wrote: > > I say that repr should (instead) start recognizing when it was called > > in place of __str__, and should revert back to __str__ when it > > recurses down to the next level. > > That sounds like a PEP you should write, which, if accepted, might > obviate some of the rationale for this one. I have written the PEP. The draft is being discussed now in the Russian Python and Zope mailing list. I will post it here tomorrow. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From benji at benjiyork.com Tue May 27 22:09:47 2008 From: benji at benjiyork.com (Benji York) Date: Tue, 27 May 2008 16:09:47 -0400 Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka String ABC) In-Reply-To: References: Message-ID: On Tue, May 27, 2008 at 3:42 PM, Guido van Rossum wrote: > [+python-3000] > > On Tue, May 27, 2008 at 12:32 PM, Armin Ronacher > wrote: >> Basically *the* problematic situation with iterable strings is something like >> a `flatten` function that flattens out every iterable object except of strings.
>> Imagine it's implemented in a way similar to that:: > > I'm not against this, but so far I've not been able to come up with a > good set of methods to endow the String ABC with. Another problem is > that not everybody draws the line in the same place -- how should > instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) > be treated? Maybe the opposite approach would be more fruitful. Flattening is about removing nested "containers", so perhaps there should be an ABC that things like lists and tuples provide, but strings don't. No idea what that might be. -- Benji York From rhamph at gmail.com Tue May 27 22:21:57 2008 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 27 May 2008 14:21:57 -0600 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> Message-ID: On Tue, May 27, 2008 at 2:03 PM, Jim Jewett wrote: > On 5/26/08, Nick Coghlan wrote: >> > The problem isn't that I want to be able to write code that acts the >> > old way; the problem is that I want to ensure all code running on my >> > system acts the old way. > >> This is for Py3k - you'll be lucky if your old code runs at all, let alone >> in the same way. > > Again, this isn't about code I wrote; it is about code someone else > wrote. If they used a new function designed to display unicode, then > I know it was intentional. If they used repr, then it is quite likely > that they were using 2.x repr, and just didn't consider the non-ASCII > case. Welcome to 3.0: unicode is now the norm. >> > Keeping repr and fixing the way it recurses when used as a str >> > fallback would be even better. > >> No it wouldn't - the ambiguity introduced by doing so would >> dwarf anything > > It wouldn't add *any* ambiguity when someone called repr explicitly. 
> When they called str explicitly, ambiguity would occur exactly for > objects where it is already tolerable for str. (Because these same > objects would already be ambigous if they were top-level objects > instead of contained subobjects.) I don't think str() is normally used on containers. str(3) and str('hello') are shallow and explicit - not ambiguous. The fact that we fallback to repr() when there is no sensible __str__ means we can use print(obj) for debugging and have it Just Work. If you really cared we could remove the fallback behaviour, raising a TypeError instead, but this won't do anything to help PEP 3138. We'd need a third function that applies to containers (like repr), differing only in how it handles non-ascii. PEP 3138 already provides a simple solution for this though: ascii_repr(). It's just not the default repr(). -- Adam Olsen, aka Rhamphoryncus From gregor.lingl at aon.at Tue May 27 22:48:25 2008 From: gregor.lingl at aon.at (Gregor Lingl) Date: Tue, 27 May 2008 22:48:25 +0200 Subject: [Python-3000] how to deal with compatibility problems (example: turtle module) Message-ID: <483C7399.3050103@aon.at> Hi, when doing some final checking of the new turtle module I ran into the following problem, which I'd like to discuss with the intention to clarify how to handle problems that result more or less from suboptimal design decisions of the module to replace.: (1) As requested I added an __all__ variable to the new turtle module to define those names, that will be imported by: from turtle import * Of course I consider this very useful. (The old module didn't have an __all__ variable) (2) Moreover it was requested that the new turtle module be fully compatible with the old one. (3) The old module has a from math import * statement, which results in importing all names from math when doing from turtle import *. 
Moreover there are defined two functions in turtle.py which overwrite the correspoding functions from math (namely degrees() and radians()) Is this a feature which should be retained? (I suppose that it was not intended by the developer of the old turtle module but happened somehow.) If so, I had to add all names from dir(math) to my __all__ variable (except those two mentioned above). My personal opinion is, that this would be a rather ugly solution, and I think that this 'feature' should at least be eliminated in the Python 3.0 version. On the other hand one could argue, that most (if not all) of the functions in math are normally not used by users of turtle, and those who use it certainly know how to import what they need. So one could drop the from math import * already in Python 2.6. But, of course, this argument doesn't consider the possibility of breaking some old code. I'm also interested in how to proceed with this, because there are a few similar problems with the turtle module which should be solved with the transition from Python2.6 to Python3.0 So, generally, which guidelines should one use to decide on problems like this - and who is the one who decides? With best regards Gregor From g.brandl at gmx.net Tue May 27 22:48:27 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 27 May 2008 22:48:27 +0200 Subject: [Python-3000] dbm package creation In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > On Sun, May 25, 2008 at 3:08 PM, Brett Cannon wrote: >> On Sun, May 25, 2008 at 12:21 PM, Georg Brandl wrote: >>> Hi, >>> >>> I'll handle the PEP 3108 dbm package if nobody else is already at it. >>> >> >> I know I have not started the work. >> >>> Two questions though: >>> >>> * the whichdb() function returns strings that are module names. These >>> names won't be importable anymore in 3k. Should the return values >>> remain the same in 3k, or should whichdb() return the new names, and >>> if the latter, including "dbm." or not? 
>>> >> >> New names with the package name prepended. >> >> Should probably change the API at some point to just return the module >> to use instead of the name. > > I'm not sure I disagree. I see the return value as an enum, only one > use for which is to import it. (If you wanted to just use the module, > why not use anydbm?) I'd prefer to keep the return strings the same > (no 'dbm.' prefix) and fix the code that uses whichdb. So add a mapping to dbm.__init__ that maps old names to new names? > Or is there an expected future use case where the returned value would > be something in a *different* package? There was in the past, with the now-defunct bsddb185 module which was not used by anydbm. > Returning a module object would seem the least attractive version -- > that would require importing the module, which may not be in the > caller's plan at all. It may not be, but the modules are imported anyway during import of dbm.__init__ (which contains whichdb() now.) Georg From guido at python.org Tue May 27 22:57:28 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 13:57:28 -0700 Subject: [Python-3000] how to deal with compatibility problems (example: turtle module) In-Reply-To: <483C7399.3050103@aon.at> References: <483C7399.3050103@aon.at> Message-ID: The old turtle.py explicitly says from math import * # Also for export so I think it's desirable to keep this behavior. My intent with that line was that an absolute beginner could put "from turtle import *" in their interactive session and be able to use both the turtle code and the high-school math functions that might come in handy, like sin() and cos(). The other math functions don' really hurt I believe. Where there's a naming conflict, obviously the turtle module wins. 
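The export behaviour described above can be sketched without importing turtle itself. The module demo_turtle below is a hypothetical stand-in built at runtime; it only mimics the pattern of a star import from math, two functions redefined by the module, and an __all__ list that re-exports the remaining math names.

```python
import math
import sys
import types

# Hypothetical stand-in for turtle.py; the real module is not imported here.
SOURCE = '''
import math as _math
from math import *            # also for export

def degrees():                # defined after the star import, so it wins
    return "demo degrees"

def radians():
    return "demo radians"

def forward(distance):
    return distance

__all__ = ["forward", "degrees", "radians"] + [
    name for name in dir(_math)
    if not name.startswith("_") and name not in ("degrees", "radians")
]
'''

demo = types.ModuleType("demo_turtle")
exec(SOURCE, demo.__dict__)
sys.modules["demo_turtle"] = demo

# Simulate "from demo_turtle import *" in a fresh interactive namespace.
namespace = {}
exec("from demo_turtle import *", namespace)

print(namespace["degrees"]())            # the module's own function
print(namespace["sin"] is math.sin)      # math names are re-exported
print("_math" in namespace)              # names outside __all__ are not
```

With __all__ defined, the math names have to be listed explicitly to stay visible, which is the extra work Gregor describes.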
--Guido On Tue, May 27, 2008 at 1:48 PM, Gregor Lingl wrote: > Hi, > > when doing some final checking of the new turtle module I ran into the > following problem, which I'd like to discuss with the intention to clarify > how to handle problems that result more or less from suboptimal design > decisions of the module to replace.: > > (1) As requested I added an __all__ variable to the new turtle module > to define those names, that will be imported by: from turtle import * > Of course I consider this very useful. (The old module didn't have an > __all__ variable) > > (2) Moreover it was requested that the new turtle module be fully > compatible with the old one. > > (3) The old module has a from math import * statement, which > results in importing all names from math when doing from turtle import *. > Moreover there are defined two functions in turtle.py which overwrite > the correspoding functions from math (namely degrees() and radians()) > Is this a feature which should be retained? (I suppose that it was not > intended by the developer of the old turtle module but happened > somehow.) > > If so, I had to add all names from dir(math) to my __all__ variable > (except those two mentioned above). > > My personal opinion is, that this would be a rather ugly solution, and > I think that this 'feature' should at least be eliminated in the Python 3.0 > version. > > On the other hand one could argue, that most (if not all) of the functions > in math are normally not used by users of turtle, and those who use it > certainly know how to import what they need. So one could drop the > from math import * already in Python 2.6. But, of course, this argument > doesn't consider the possibility of breaking some old code. 
> > I'm also interested in how to proceed with this, because there are a few > similar problems with the turtle module which should be solved with the > transition from Python2.6 to Python3.0 > > So, generally, which guidelines should one use to decide on problems like > this - and who is the one who decides? > > With best regards > Gregor > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 27 22:59:10 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 13:59:10 -0700 Subject: [Python-3000] dbm package creation In-Reply-To: References: Message-ID: On Tue, May 27, 2008 at 1:48 PM, Georg Brandl wrote: > Guido van Rossum schrieb: >> >> On Sun, May 25, 2008 at 3:08 PM, Brett Cannon wrote: >>> >>> On Sun, May 25, 2008 at 12:21 PM, Georg Brandl wrote: >>>> >>>> Hi, >>>> >>>> I'll handle the PEP 3108 dbm package if nobody else is already at it. >>>> >>> >>> I know I have not started the work. >>> >>>> Two questions though: >>>> >>>> * the whichdb() function returns strings that are module names. These >>>> names won't be importable anymore in 3k. Should the return values >>>> remain the same in 3k, or should whichdb() return the new names, and >>>> if the latter, including "dbm." or not? >>>> >>> >>> New names with the package name prepended. >>> >>> Should probably change the API at some point to just return the module >>> to use instead of the name. >> >> I'm not sure I disagree. I see the return value as an enum, only one >> use for which is to import it. (If you wanted to just use the module, >> why not use anydbm?) I'd prefer to keep the return strings the same >> (no 'dbm.' prefix) and fix the code that uses whichdb. 
> > So add a mapping to dbm.__init__ that maps old names to new names? Is the mapping not just 'dbm.' + x? >> Or is there an expected future use case where the returned value would >> be something in a *different* package? > > There was in the past, with the now-defunct bsddb185 module which was > not used by anydbm. > >> Returning a module object would seem the least attractive version -- >> that would require importing the module, which may not be in the >> caller's plan at all. > > It may not be, but the modules are imported anyway during import of > dbm.__init__ (which contains whichdb() now.) Hm, that's a regression if you ask me. Couldn't you use lazy import? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Tue May 27 23:16:53 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 27 May 2008 23:16:53 +0200 Subject: [Python-3000] dbm package creation In-Reply-To: References: Message-ID: Guido van Rossum schrieb: >>> I'm not sure I disagree. I see the return value as an enum, only one >>> use for which is to import it. (If you wanted to just use the module, >>> why not use anydbm?) I'd prefer to keep the return strings the same >>> (no 'dbm.' prefix) and fix the code that uses whichdb. >> >> So add a mapping to dbm.__init__ that maps old names to new names? > > Is the mapping not just 'dbm.' + x? No. The mapping is dbhash -> dbm.bsd dbm -> dbm.ndbm (*) gdbm -> dbm.gnu (*) dumbdbm -> dbm.dumb (*) Not exactly; the original C modules are now called _dbm and _gdbm, and the submodules are stubs that import those. >>> Or is there an expected future use case where the returned value would >>> be something in a *different* package? >> >> There was in the past, with the now-defunct bsddb185 module which was >> not used by anydbm. >> >>> Returning a module object would seem the least attractive version -- >>> that would require importing the module, which may not be in the >>> caller's plan at all. 
>> >> It may not be, but the modules are imported anyway during import of >> dbm.__init__ (which contains whichdb() now.) > > Hm, that's a regression if you ask me. Couldn't you use lazy import? There's a module attribute "error" -- supposed to be a tuple of all possible errors from the db modules; that is hard to make lazy. Of course we could solve this by making all the different db module errors subclasses of a common exception (but since most of them are defined in C modules, this is hard again.) Georg From guido at python.org Tue May 27 23:30:45 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 14:30:45 -0700 Subject: [Python-3000] dbm package creation In-Reply-To: References: Message-ID: On Tue, May 27, 2008 at 2:16 PM, Georg Brandl wrote: > Guido van Rossum schrieb: > >>>> I'm not sure I disagree. I see the return value as an enum, only one >>>> use for which is to import it. (If you wanted to just use the module, >>>> why not use anydbm?) I'd prefer to keep the return strings the same >>>> (no 'dbm.' prefix) and fix the code that uses whichdb. >>> >>> So add a mapping to dbm.__init__ that maps old names to new names? >> >> Is the mapping not just 'dbm.' + x? > > No. The mapping is > > dbhash -> dbm.bsd > dbm -> dbm.ndbm (*) > gdbm -> dbm.gnu (*) > dumbdbm -> dbm.dumb > > (*) Not exactly; the original C modules are now called _dbm and _gdbm, > and the submodules are stubs that import those. OK. I see. Hadn't remembered how messy it was. :-( I withdraw my opposition to returning module names. I still think returning a module would be a bad idea. >>>> Or is there an expected future use case where the returned value would >>>> be something in a *different* package? >>> >>> There was in the past, with the now-defunct bsddb185 module which was >>> not used by anydbm. >>> >>>> Returning a module object would seem the least attractive version -- >>>> that would require importing the module, which may not be in the >>>> caller's plan at all. 
>>> >>> It may not be, but the modules are imported anyway during import of >>> dbm.__init__ (which contains whichdb() now.) >> >> Hm, that's a regression if you ask me. Couldn't you use lazy import? > > There's a module attribute "error" -- supposed to be a tuple of all > possible errors from the db modules; that is hard to make lazy. > > Of course we could solve this by making all the different db module > errors subclasses of a common exception (but since most of them are > defined in C modules, this is hard again.) OK, let's make the latter a stretch goal. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Tue May 27 23:37:09 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 27 May 2008 23:37:09 +0200 Subject: [Python-3000] dbm package creation In-Reply-To: References: Message-ID: Guido van Rossum schrieb: >>>>> Or is there an expected future use case where the returned value would >>>>> be something in a *different* package? >>>> >>>> There was in the past, with the now-defunct bsddb185 module which was >>>> not used by anydbm. >>>> >>>>> Returning a module object would seem the least attractive version -- >>>>> that would require importing the module, which may not be in the >>>>> caller's plan at all. >>>> >>>> It may not be, but the modules are imported anyway during import of >>>> dbm.__init__ (which contains whichdb() now.) >>> >>> Hm, that's a regression if you ask me. Couldn't you use lazy import? >> >> There's a module attribute "error" -- supposed to be a tuple of all >> possible errors from the db modules; that is hard to make lazy. >> >> Of course we could solve this by making all the different db module >> errors subclasses of a common exception (but since most of them are >> defined in C modules, this is hard again.) > > OK, let's make the latter a stretch goal. :-) I just realized: since dumbdbm's "error" is just IOError, using "except [any]dbm.error" will always catch IOError. 
So the easy solution is to just derive the database error classes from IOError. The slightly harder solution is to declare the above a bug, create a new builtin (at least on C level) error class DBError and derive them from that. Georg From gregor.lingl at aon.at Tue May 27 23:40:57 2008 From: gregor.lingl at aon.at (Gregor Lingl) Date: Tue, 27 May 2008 23:40:57 +0200 Subject: [Python-3000] how to deal with compatibility problems (example: turtle module) In-Reply-To: References: <483C7399.3050103@aon.at> Message-ID: <483C7FE9.6070909@aon.at> Guido van Rossum schrieb: > The old turtle.py explicitly says > > from math import * # Also for export > > so I think it's desirable to keep this behavior. My intent with that > line was that an absolute beginner could put "from turtle import *" in > their interactive session and be able to use both the turtle code and > the high-school math functions that might come in handy, like sin() > and cos(). The other math functions don' really hurt I believe. Where > there's a naming conflict, obviously the turtle module wins. > > --Guido > Thanks for the quick reply, I'll do it this way. Gregor P.S.: I'd just like to add one (critical) remark which results from some decades working as a highschool teacher and (nearly) one decade working with Python, and especially turtle graphics with highschool students: sin() and cos() imported from math work with radians. The default angle-mode for turtle is degrees. So when using trig-functions I have to talk about radian measure and conversion of angle units. To calculate the sine of 30 degrees for instance I had to call sin(radians(30)) etc., but unfortunately just this radians() functions is not available anymore when doing from turtle import *. So in this case this import is of limited use. And it definitely makes sense to tell highschool students that sin(), cos() and friends live in a module called math. 
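The unit mismatch mentioned in the P.S. is easy to show with the math module alone. This is only an illustration of degrees versus radians, not of any turtle API.

```python
import math

angle_in_degrees = 30

# math.sin() expects radians, so a value in degrees must be converted first.
converted = math.sin(math.radians(angle_in_degrees))
unconverted = math.sin(angle_in_degrees)   # interpreted as 30 radians

print(converted)
print(unconverted)
```

The first value is close to 0.5; the second is the sine of 30 radians, which is a different number.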
From oliphant.travis at ieee.org Tue May 27 23:53:59 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue, 27 May 2008 16:53:59 -0500 Subject: [Python-3000] Single buffer implied in new buffer protocol? In-Reply-To: References: Message-ID: Stefan Behnel wrote: > Hi, > > while implementing Py_buffer support in Cython, I noticed (the hard way, > through a segfault), that the buffer pointer passed into getbuffer() can be > NULL, e.g. when calling memoryview.tobytes(). According to PEP 3118 (first > paragraph below the getbuffer() signature), this implies setting a lock on the > memory. Funny enough, the LOCK flag wasn't even set in my case, I just get > NULL as buffer and 285 as flags... The memoryview implementation is not yet done. I'm not sure if that is the only issue here. > > Anyway, my point is that this part of the protocol actually implies setting a > lock on the buffer *provider* rather than the buffer itself, as the buffer > provider cannot distinguish between different buffers based on a NULL pointer Yes, the language in the PEP could be more clear. Obviously, if you haven't provided a Py_buffer structure to fill in, then you are only asking to lock the object's buffer from other access. Naturally, the exporter should handle the case when no lock is actually requested. > > I know, the protocol is overly complex already and hard to implement from a > provider perspective, and I understand that that was preferred over putting > the complexity into the consumer. But wouldn't it make more sense to *always* > pass the buffer pointer, to let the provider decide what it makes of the > flags? Perhaps we are not understanding each other. The Py_buffer structure and the buffer pointer are 2 separate things. It is the Py_buffer structure that can be NULL when getbuffer is called (the buf member of the structure is the actual buffer pointer and it is undefined when getbuffer is called and it contains the buffer pointer on successful return).
Thanks for your probing. -Travis From guido at python.org Wed May 28 00:31:44 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 15:31:44 -0700 Subject: [Python-3000] dbm package creation In-Reply-To: References: Message-ID: On Tue, May 27, 2008 at 2:37 PM, Georg Brandl wrote: > Guido van Rossum schrieb: > >>>>>> Or is there an expected future use case where the returned value would >>>>>> be something in a *different* package? >>>>> >>>>> There was in the past, with the now-defunct bsddb185 module which was >>>>> not used by anydbm. >>>>> >>>>>> Returning a module object would seem the least attractive version -- >>>>>> that would require importing the module, which may not be in the >>>>>> caller's plan at all. >>>>> >>>>> It may not be, but the modules are imported anyway during import of >>>>> dbm.__init__ (which contains whichdb() now.) >>>> >>>> Hm, that's a regression if you ask me. Couldn't you use lazy import? >>> >>> There's a module attribute "error" -- supposed to be a tuple of all >>> possible errors from the db modules; that is hard to make lazy. >>> >>> Of course we could solve this by making all the different db module >>> errors subclasses of a common exception (but since most of them are >>> defined in C modules, this is hard again.) >> >> OK, let's make the latter a stretch goal. :-) > > I just realized: since dumbdbm's "error" is just IOError, using > "except [any]dbm.error" will always catch IOError. > > So the easy solution is to just derive the database error classes > from IOError. > > The slightly harder solution is to declare the above a bug, create > a new builtin (at least on C level) error class DBError and derive > them from that. I think deriving them all from IOError is good enough. 
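A minimal sketch of the "derive them all from IOError" idea follows. The class and function names are hypothetical stand-ins, not the real C-level error classes of the db modules; the point is only that one except clause then covers every backend. In current Python 3, IOError is an alias of OSError, which does not change the argument.

```python
# Hypothetical stand-ins for per-backend error classes.
class GdbmLikeError(IOError):
    """Pretend error class of a gdbm-style backend."""


class NdbmLikeError(IOError):
    """Pretend error class of an ndbm-style backend."""


def failing_open(kind):
    """Simulate a failed open for the given (made-up) backend name."""
    if kind == "gdbm":
        raise GdbmLikeError("cannot open gdbm-style file")
    if kind == "ndbm":
        raise NdbmLikeError("cannot open ndbm-style file")
    # The dumb backend's "error" is plain IOError, as noted in the thread.
    raise IOError("cannot open dumb file")


caught = []
for kind in ("gdbm", "ndbm", "dumb"):
    try:
        failing_open(kind)
    except IOError as exc:
        # A single handler is enough because of the shared base class.
        caught.append(exc)

print([type(exc).__name__ for exc in caught])
```

With a shared base class, the module-level tuple of all possible errors is no longer needed to write a catch-all handler, which is what made lazy importing awkward.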
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed May 28 00:34:11 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 27 May 2008 15:34:11 -0700 Subject: [Python-3000] how to deal with compatibility problems (example: turtle module) In-Reply-To: <483C7FE9.6070909@aon.at> References: <483C7399.3050103@aon.at> <483C7FE9.6070909@aon.at> Message-ID: In the light of that, I'm not opposed to relaxing the 100% compatibility requirement. On Tue, May 27, 2008 at 2:40 PM, Gregor Lingl wrote: > > > Guido van Rossum schrieb: >> >> The old turtle.py explicitly says >> >> from math import * # Also for export >> >> so I think it's desirable to keep this behavior. My intent with that >> line was that an absolute beginner could put "from turtle import *" in >> their interactive session and be able to use both the turtle code and >> the high-school math functions that might come in handy, like sin() >> and cos(). The other math functions don' really hurt I believe. Where >> there's a naming conflict, obviously the turtle module wins. >> >> --Guido >> > > Thanks for the quick reply, I'll do it this way. > > Gregor > > P.S.: I'd just like to add one (critical) remark which results from some > decades working as a highschool teacher and (nearly) one decade working > with Python, and especially turtle graphics with highschool students: > > sin() and cos() imported from math work with radians. The default > angle-mode for turtle is degrees. So when using trig-functions I have > to talk about radian measure and conversion of angle units. To calculate > the sine of 30 degrees for instance I had to call sin(radians(30)) etc., > but unfortunately just this radians() functions is not available anymore > when doing from turtle import *. So in this case this import is of limited > use. And it definitely makes sense to tell highschool students that > sin(), cos() and friends live in a module called math. 
> > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Wed May 28 00:40:48 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 18:40:48 -0400 Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka String ABC) In-Reply-To: References: Message-ID: On 5/27/08, Benji York wrote: > Guido van Rossum wrote: > > Armin Ronacher wrote: > >> Basically *the* problematic situation with iterable strings is something like > >> a `flatten` function that flattens out every iterable object except of strings. > > I'm not against this, but so far I've not been able to come up with a > > good set of methods to endow the String ABC with. Another problem is > > that not everybody draws the line in the same place -- how should > > instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) > > be treated? > Maybe the opposite approach would be more fruitful. Flattening is about > removing nested "containers", so perhaps there should be an ABC that > things like lists and tuples provide, but strings don't. No idea what > that might be. It isn't really stringiness that matters, it is that you have to terminate even though you still have an iterable container. The test is roughly (1==len(v) and v[0]==v), except that you want to stop a layer sooner. 
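The termination test mentioned above can be sketched as follows. This is an illustration under assumptions, not code from the thread: the helper names are invented, and the flattener only guarantees termination. One-character strings are emitted whole, but longer strings are still split into characters, which is why one would want to "stop a layer sooner" in practice.

```python
def is_self_containing(v):
    """Roughly the test (1 == len(v) and v[0] == v), guarded for odd types."""
    try:
        return len(v) == 1 and v[0] == v
    except (TypeError, KeyError, IndexError):
        return False


def flatten(obj):
    """Yield the leaves of obj, stopping at non-iterables and at
    objects that contain only themselves (such as the string "a")."""
    try:
        items = iter(obj)
    except TypeError:
        # Not iterable at all: a leaf.
        yield obj
        return
    if is_self_containing(obj):
        # Iterable, but iterating would never bottom out.
        yield obj
        return
    for item in items:
        yield from flatten(item)


result = list(flatten([1, "ab", ["c", [2]]]))
print(result)
```

Without the self-containment check, flatten("a") would recurse forever, because iterating over "a" yields "a" again.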
Guido had at least a start in Searchable, back when ABC were still in the sandbox: http://svn.python.org/view/sandbox/trunk/abc/abc.py?rev=55321&view=auto Searchable represented the fact that (x in c) =/=> (x in iter(c)) because of sequence searches like ("Error" in results) -jJ From python at rcn.com Wed May 28 00:54:20 2008 From: python at rcn.com (Raymond Hettinger) Date: Tue, 27 May 2008 15:54:20 -0700 Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka StringABC) References: Message-ID: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1> "Jim Jewett" > It isn't really stringiness that matters, it is that you have to > terminate even though you still have an iterable container. Well said. > Guido had at least a start in Searchable, back when ABC > were still in the sandbox: Have to disagree here. An object cannot know in general whether a flattener wants to split it or not. That is an application-dependent decision. A better answer is to be able to tell the flattener what should be considered atomic in a given circumstance. Raymond From ncoghlan at gmail.com Wed May 28 01:17:36 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 May 2008 09:17:36 +1000 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <483C9690.9010601@gmail.com> Jim Jewett wrote: > So I suggest that he or she use str, rather than repr -- and that we > fix containers to make this possible. And hope that every other author of a Python container class on the planet does the same thing?
Recursing downwards with str() instead of repr() will break as soon as it encounters a container class which either doesn't recurse with str() or doesn't propagate a new "this is really str()" flag (depending on how Oleg's PEP suggests implementing this). PEP 3138 fixes the problem without relying on third parties to do anything. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Wed May 28 01:21:31 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 May 2008 09:21:31 +1000 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <483C2939.2000409@latte.ca> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca> Message-ID: <483C977B.2000500@gmail.com> Blake Winton wrote: > Nick Coghlan wrote: >> While it could be argued that if you want unambiguous output you >> should be invoking repr() on the container instead of str(), I'm still >> seeing many more downsides than upsides to the idea of making str() on >> the builtin containers display their contents with str() instead of >> repr(). > > But which downsides do you see that aren't solved by the use of repr to > get unambiguous output? The fact that calling str() on containers has been unambiguous for years. All I'm saying is that no compelling use cases have been presented to justify changing the status quo (aside from the Unicode escaping problem, which is better addressed by allowing repr() to return arbitrary Unicode glyphs as proposed by PEP 3138, since that also fixes a bunch of other cases where repr() is invoked on Unicode strings). Cheers, Nick.
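The behaviour under discussion can be observed in a Python 3 interpreter, where the PEP 3138 approach was eventually adopted. This is a small sketch rather than anything from the thread; the sample list is invented, and the non-ASCII text is written with escapes only to keep the source ASCII.

```python
# Two items: an ASCII word and three CJK characters (U+65E5 U+672C U+8A9E).
words = ["spam", "\u65e5\u672c\u8a9e"]

as_str = str(words)      # containers format their items with repr()
as_repr = repr(words)    # printable non-ASCII characters are kept as glyphs
as_ascii = ascii(words)  # the old escaping behaviour, now a separate builtin

print(as_str)
print(as_repr)
print(as_ascii)
```

Here str() and repr() of the list give the same text, with the items quoted, while ascii() escapes the non-ASCII characters.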
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Wed May 28 02:25:50 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 May 2008 12:25:50 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: <483BE3D8.4080806@egenix.com> References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> <483BE3D8.4080806@egenix.com> Message-ID: <483CA68E.7040909@canterbury.ac.nz> M.-A. Lemburg wrote: > AFAIK, eval(repr(obj)) is no longer a requirement... simply because > it has always only worked for a small subset of objects and in > reality, you wouldn't want to call eval() on anything too often > due to the security implications. Yes, I actually think that sentence in the docs should be removed, since it's more misleading than helpful. A suitable replacement might be something like "str() is intended for normal program output, and repr() is intended for diagnostic output". Plus something about it being desirable for repr() to make the type of the object as unambiguous as possible. 
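The proposed wording, str() for normal program output and repr() for diagnostic output that makes the type clear, matches how several standard library types already behave. A brief sketch with arbitrarily chosen values:

```python
from datetime import date
from decimal import Decimal

amount = Decimal("1")
day = date(2008, 5, 28)

# str(): what an end user would expect to read.
user_view = (str(amount), str(day))

# repr(): makes the type of each object unambiguous.
debug_view = (repr(amount), repr(day))

print(user_view)
print(debug_view)
```

Neither form of the date is meant to be passed to eval() without the right imports in scope, which is consistent with the point that eval(repr(obj)) only ever worked for a subset of objects.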
-- Greg From jimjjewett at gmail.com Wed May 28 02:52:06 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 20:52:06 -0400 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <483C977B.2000500@gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca> <483C977B.2000500@gmail.com> Message-ID: On 5/27/08, Nick Coghlan wrote: > Blake Winton wrote: > > But which downsides do you see that aren't solved by the use of repr to > get unambiguous output? > The fact that calling str() on containers has been unambiguous for years. > All I'm saying is that no compelling use cases have been presented to > justify changing the status quo (aside from the Unicode escaping problem, > which is better addressed by allowing repr() to return arbitrary Unicode > glyphs as proposed by PEP 3138, since that also fixes a bunch of other cases > where repr() is invoked on Unicode strings). I think it is pretty clear that there are sometimes reasons to want more than one string representation. There are arguably far more than two distinctions that would be useful, but two is what the language supports. That was the justification for str vs repr in the first place. What is the advantage in continuing to conflate the two for (only portions of) containers? The only justification that I can see is "backwards compatibility", which applies even more strongly to repr than it does to str.
-jJ From greg.ewing at canterbury.ac.nz Wed May 28 03:09:19 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 May 2008 13:09:19 +1200 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <483C2939.2000409@latte.ca> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca> Message-ID: <483CB0BF.2060505@canterbury.ac.nz> Blake Winton wrote: > Seriously, I can write: > >>> print 1, "1", Decimal("1") > and get as my output: > 1 1 1 Yes, but you've explicitly told it to print that, so presumably it's what you want in that case. Equally, you need to be explicit about how you want a list printed. -- Greg From jimjjewett at gmail.com Wed May 28 03:44:54 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 27 May 2008 21:44:54 -0400 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <483CB0BF.2060505@canterbury.ac.nz> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca> <483CB0BF.2060505@canterbury.ac.nz> Message-ID: On 5/27/08, Greg Ewing wrote: > Blake Winton wrote: > > Seriously, I can write: > > >>> print 1, "1", Decimal("1") > > and get as my output: > > 1 1 1 > Yes, but you've explicitly told it to print that, > so presumably it's what you want in that case. > Equally, you need to be explicit about how you want > a list printed. Agreed; and that is why I consider the current behavior a bug. If you want the type information in there, then you should use repr instead of str. For example, to get the type information for [1, "1", Decimal("1")], writing: print repr([1, "1", Decimal("1")]) is not such a huge problem.
On the other hand, if you do not care about the specific types, and want to declutter the output, then writing: print ("[" + ", ".join(str(e) for e in [1, "1", Decimal("1")]) + "]") is a bit more awkward in the best case -- and fails if your data structures are not all nested to exactly the same depth. Suddenly, you need to rewrite the equivalent of the pprint module. Again, what is the advantage of having str(x) be redundant to repr(x) in the case of containers? -jJ From greg.ewing at canterbury.ac.nz Wed May 28 03:45:00 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 May 2008 13:45:00 +1200 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> <483BE3D8.4080806@egenix.com> Message-ID: <483CB91C.9090900@canterbury.ac.nz> Jim Jewett wrote: > PEP 3138 says that repr should start printing unicode glyphs. > > I say that repr should (instead) start recognizing when it was called > in place of __str__, and should revert back to __str__ when it > recurses down to the next level. But we *don't* all agree that the only case where we want unicode glyphs is when we call str(). I can understand a Japanese user wanting to see his text in Japanese when he uses repr() explicitly to debug his program. -- Greg From greg.ewing at canterbury.ac.nz Wed May 28 04:07:58 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 May 2008 14:07:58 +1200 Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: References: Message-ID: <483CBE7E.9080902@canterbury.ac.nz> Travis Oliphant wrote: > Obviously, if you > haven't provided a Py_buffer structure to fill in, then you are only > asking to lock the object's buffer from other access. What's the use case for that? Why would you ever want to lock an object if you don't intend to access it? BTW, I seem to remember when the PEP was being discussed that there was talk of putting some intelligence into the PyObject_* layer to make things easier for both the user and the provider, such as filling in some members of the Py_buffer if the provider didn't do it. Did anything come of that? -- Greg From greg.ewing at canterbury.ac.nz Wed May 28 04:41:55 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 May 2008 14:41:55 +1200 Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru> <483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca> <483CB0BF.2060505@canterbury.ac.nz> Message-ID: <483CC673.4020804@canterbury.ac.nz> Jim Jewett wrote: > Again, what is the advantage of having str(x) be redundant to repr(x) > in the case of containers? I think you're misrepresenting the situation when you describe it that way. Guido didn't sit down and think "I know, let's make str(lst) do the same as repr(lst)." He thought "It's not clear what str(lst) should do, so let's not define it at all." There can't be a bug in list.__str__, because list.__str__ *does not exist*. If you want one, you have to decide what you want it to do and write it yourself. I've never found this to be a problem. Either repr(lst) is good enough, or I've wanted something completely different and had to write my own code for it anyway. I've *never* wanted anything that was just like repr(lst) except that it didn't quote the strings. That would only be confusing. 
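The observation that list defines no __str__ of its own can be checked directly, and the "write it yourself" route is short. The helper below, called plain here, is a hypothetical illustration that handles only nested lists; it is not a proposal from the thread.

```python
from decimal import Decimal


def plain(obj):
    """Hypothetical helper: format nested lists like repr(), but use
    str() for everything that is not a list."""
    if isinstance(obj, list):
        return "[" + ", ".join(plain(item) for item in obj) + "]"
    return str(obj)


data = [1, "1", [Decimal("1"), "a"]]

# list provides __repr__ but inherits __str__ from object, which
# falls back to repr(); hence str(data) and repr(data) agree.
list_defines_str = "__str__" in vars(list)
list_defines_repr = "__repr__" in vars(list)
same_text = str(data) == repr(data)

decluttered = plain(data)
print(repr(data))
print(decluttered)
```

The decluttered form no longer distinguishes the integer 1 from the string "1" or from Decimal("1"), which is the kind of confusion the message above warns about.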
-- Greg From carl at carlsensei.com Wed May 28 05:48:41 2008 From: carl at carlsensei.com (Carl Johnson) Date: Tue, 27 May 2008 17:48:41 -1000 Subject: [Python-3000] Proposal to add __str__ method to iterables. Message-ID: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com> Proposal to add __str__ method to iterables: Proposed behavior of the __str__ method for iterables is that it returns the result of "".join(str(i) for i in self). Justification: Notice this difference in the behavior of filter* and a list comprehension: >>> filter(lambda c: c!="a", "abracadbra") 'brcdbr' >>> [c for c in "abracadbra" if c != "a"] ['b', 'r', 'c', 'd', 'b', 'r'] *This is the pre-3.0 filter's behavior. Post-3.0, "filter" is really ifilter. In order to replicate the behavior of filter with a comprehension, the return type must be the same as the input type: >>> def my_filter(cond, it): ... return type(it)(i for i in it if cond(i)) Thus, we get the same results using the old style filter and my_filter: >>> filter(lambda c: c!="a", (1,2)) (1, 2) >>> my_filter(lambda c: c!="a", (1,2)) (1, 2) >>> filter(lambda c: c!="a", [1,2]) [1, 2] >>> my_filter(lambda c: c!="a", [1,2]) [1, 2] But not in every case! >>> filter(lambda c: c!="a", "abracadbra") 'brcdbr' >>> my_filter(lambda c: c!="a", "abracadbra") '<generator object at 0x...>' Why does my_filter return a string saying "<generator object at 0x...>"? Because generator objects have no __str__ method, so str(gen_obj) returns gen_obj.__repr__(). So, my proposal is to make strings act like the other members of the iterable family by adding an __str__ method to them, which does a "".join on the str of its members. - - - - Potential downside #1: Don't try to print an infinite object, like itertools.count(). Other potential downside #2: This makes "".join(l) obsolete. Regarding #1: Do a repr instead. Regarding #2: I don't consider that to be a bad thing actually.
I think doing "".join is very unnatural for people new to Python, and I think that even as people who are used to Python, I think we should admit that it's a little weird to join list members in that way. In terms of actual implementation, this could also be done by having the str class look for a __str__ method then a __iter__ method and only then use __repr__ as the final fallback instead of falling back to __repr__ as is done now. That might be easier than adding __str__ methods to all iterables. - - - - Incidentally, I think the idea that str(["1", "2"]) should return "[1, 2]" is a terrible idea. Where's the use case for that? When would you ever need to print that? It should return "12", which actually does have a use case as the replacement for "".join(["1", "2"]). From mal at egenix.com Wed May 28 12:12:27 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 28 May 2008 12:12:27 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483BDE11.509@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> Message-ID: <483D300B.5090309@egenix.com> I'm beginning to wonder whether I'm the only one who cares about the Python 2.x branch not getting cluttered up with artifacts caused by a broken forward merge strategy. How can it be that we allow major C API changes such as the renaming of the PyString APIs to go into the trunk without discussion or a PEP ? We're having lengthy discussions about the addition of single method to an object, but such major changes just go in like that and nobody seems to really care. Puzzled, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 28 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 39 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 On 2008-05-27 12:10, M.-A. Lemburg wrote: > On 2008-05-26 23:34, Christian Heimes wrote: >> M.-A. Lemburg schrieb: >>> Isn't that an awefuly confusing approach ? >>> >>> Wouldn't it be better to keep PyString APIs and definitions in >>> stringobject.c|h >>> >>> and only add a new bytesobject.h header file that #defines the >>> PyBytes APIs in terms of PyString APIs ? That maintains >>> backwards compatibility and allows Python internals to use the >>> new API names. >>> >>> With your approach, you've basically backported the confusing >>> notion in Py3k that str() maps PyUnicode, only that in Py2 >>> str() will now map to PyBytes. >> >> The last time I brought up the topic, I had a lengthy discussion with >> Guido. At first I wanted to rename the API in Python 3.0 only. Guido >> argued that it's going to cause too much merge conflicts. He then >> suggested the approach I implemented today. > > That's the same argument that came up in the module renaming > discussion. > > I have a feeling that we should be looking for better merge > tools, rather than implement code changes that cause more trouble > than do good, just because our existing tools aren't smart > enough. > > Wouldn't it be possible to have a 2to3.py converter > take the 2.x code (including the C code), convert it and then > apply any changes to the 3.x branch ? > > This wouldn't be merging in the classical sense, it would be > automated forward porting. > >> I find the approach less confusing than your suggestion and my initial >> idea. 
> > I disagree on that. > > Renaming old APIs to use the new names by adding a header file with > #define is standard practice. > > Renaming the old APIs in the source code and undoing the renaming > with a header file is not. > >> The internal API names are consistent for Python 2.6 and 3.0. The >> byte string C API is prefixed PyBytes and the unicode C API is prefixed >> PyUnicode. A core developer has just to remember that 'str' is a byte >> string in 2.x but an unicode object in 3.0. > > So you've solved part of the problem for 3.x by moving the naming mixup > back to 2.x. > >> Extension developers don't have to worry at all. The ABI and external >> API is mostly the same and still exposes the 'str' functions as PyString. > > Well, yes, but only due to a preprocessor hack that turns the > names used in bytesobject.c back into names you'd normally look > for in stringobject.c. > > And all this, just because Subversion can't handle merging of > symbol renaming. > >>> You'd have to add an aliase bytes -> str to the builtins to >>> at least reduce the confusion a bit. >> >> Python 2.6 already has an alias bytes -> str >> >>> Yes, but please let's first discuss this some more. I don't think >>> that the timing was right.... you started this thread just yesterday >>> and the patches are already checked in. >> >> I'm sorry if I was too hasty for you. I got +1 from a couple of >> developers and it's basically Guido's suggestion. > > Please discuss any changes of the 2.x code base on python-dev. > > Such major changes do need more discussion and possibly a PEP as well. 
> > Thanks, From phd at phd.pp.ru Wed May 28 13:55:15 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Wed, 28 May 2008 15:55:15 +0400 Subject: [Python-3000] str(containter) calls repr(item) In-Reply-To: <20080527201450.GA29645@phd.pp.ru> References: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> <483BE3D8.4080806@egenix.com> <5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com> <20080527201450.GA29645@phd.pp.ru> Message-ID: <20080528115515.GA14748@phd.pp.ru> On Wed, May 28, 2008 at 12:14:50AM +0400, Oleg Broytmann wrote: > I have written the PEP. I'm discussing the PEP with Jim Jewett - more motivation and better wording - so the PEP will be published a bit later. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From musiccomposition at gmail.com Wed May 28 14:00:11 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Wed, 28 May 2008 07:00:11 -0500 Subject: [Python-3000] Proposal to add __str__ method to iterables. In-Reply-To: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com> References: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com> Message-ID: <1afaf6160805280500we89354dv1e2308b11cc2d42e@mail.gmail.com> On Tue, May 27, 2008 at 10:48 PM, Carl Johnson wrote: > - - - - > > Potential downside #1: Don't try to print an infinite object, like > itertools.count(). > > Other potential downside #2: This makes "".join(l) obsolete. No, it wouldn't. What if people want to join sequences with something other than whitespace or whatever you propose? > > Regarding #1: Do a repr instead. > > Regarding #2: I don't consider that to be a bad thing actually.
I think > doing "".join is very unnatural for people new to Python, and I think that > even as people who are used to Python, I think we should admit that it's a > little weird to join list members in that way. It's good to have join on the string object because then any iterable can be joined. It doesn't require the sequence to implement it. > > In terms of actual implementation, this could also be done by having the str > class look for a __str__ method then a __iter__ method and only then use > __repr__ as the final fallback instead of falling back to __repr__ as is > done now. That might be easier than adding __str__ methods to all iterables. > > - - - - > > Incidentally, I think the idea that str(["1", "2"]) should return "[1, 2]" > is a terrible idea. Where's the use case for that? When would you ever need > to print that? It should return "12", which actually does have a use case as > the replacement for "".join(["1", "2"]). However, it's not expected. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From lists at cheimes.de Wed May 28 14:02:53 2008 From: lists at cheimes.de (Christian Heimes) Date: Wed, 28 May 2008 14:02:53 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483BDE11.509@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> Message-ID: <483D49ED.8060907@cheimes.de> M.-A. Lemburg wrote: > I have a feeling that we should be looking for better merge > tools, rather than implement code changes that cause more trouble > than do good, just because our existing tools aren't smart > enough.
We don't have better tools at our hands. I don't think we'll get any tools in time or change the VCS right before a major release. > Wouldn't it be possible to have a 2to3.py converter > take the 2.x code (including the C code), convert it and then > apply any changes to the 3.x branch ? Such a converter would be nice for 3rd party code but it's not an option for the core. In the past few months I've merged a lot of code from trunk to py3k. A 2to3 C converter doesn't help with merge conflicts. Naming differences make any merge more painful. >> I find the approach less confusing than your suggestion and my initial >> idea. > > I disagree on that. > > Renaming old APIs to use the new names by adding a header file with > #define is standard practice. > > Renaming the old APIs in the source code and undoing the renaming > with a header file is not. I wasn't talking about standard practice here. I talked about less confusion for core developers. My approach doesn't split our internal API in two. And by the way it *is* a standard approach for Python. Guido told me that the same approach was used during the 1.x to 2.0 migration. > And all this, just because Subversion can't handle merging of > symbol renaming. As I said earlier we don't have better tools at our disposal. We have to make some compromises. Sometimes practicality beats purity. > Please discuss any changes of the 2.x code base on python-dev. > > Such major changes do need more discussion and possibly a PEP as well. In the last few months I started at least three topics about the C API renaming.
It's in the thread "2.6 and 3.0 tasks" http://permalink.gmane.org/gmane.comp.python.devel/93016 Christian From p.f.moore at gmail.com Wed May 28 14:29:32 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 28 May 2008 13:29:32 +0100 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483D300B.5090309@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> Message-ID: <79990c6b0805280529vefcb2a6l200afda6222503f8@mail.gmail.com> On 28/05/2008, M.-A. Lemburg wrote: > I'm beginning to wonder whether I'm the only one who cares about > the Python 2.x branch not getting cluttered up with artifacts caused > by a broken forward merge strategy. I care, but I struggle to understand the implications and/or what is being proposed in many cases. Recent examples are the ABC backports and the current thread (string C API). I simply don't follow the issues well enough to comment. > How can it be that we allow major C API changes such as the renaming > of the PyString APIs to go into the trunk without discussion or > a PEP ? Christian has raised this a couple of times, but there has been little discussion. I suspect that this is because there is not enough clarity over the practical consequences. A PEP may help here, but I'm not sure how much - it could spark discussion, but would anyone actually end up any better informed? > We're having lengthy discussions about the addition of single method > to an object, but such major changes just go in like that and nobody > seems to really care. I suspect deadline pressure and burnout are involved here. In all honesty, there's been little or no work done on the C API, which is just as much in need of review and possible cleanup for 3.0 as the language. 
It's as close as makes no difference to too late now - does that mean we've lost the chance? Paul. From mal at egenix.com Wed May 28 14:59:33 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 28 May 2008 14:59:33 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <79990c6b0805280529vefcb2a6l200afda6222503f8@mail.gmail.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <79990c6b0805280529vefcb2a6l200afda6222503f8@mail.gmail.com> Message-ID: <483D5735.4090608@egenix.com> On 2008-05-28 14:29, Paul Moore wrote: > On 28/05/2008, M.-A. Lemburg wrote: >> I'm beginning to wonder whether I'm the only one who cares about >> the Python 2.x branch not getting cluttered up with artifacts caused >> by a broken forward merge strategy. > > I care, but I struggle to understand the implications and/or what is > being proposed in many cases. Thanks, so I'm not the only :-) > Recent examples are the ABC backports and the current thread (string C > API). I simply don't follow the issues well enough to comment. > >> How can it be that we allow major C API changes such as the renaming >> of the PyString APIs to go into the trunk without discussion or >> a PEP ? > > Christian has raised this a couple of times, but there has been little > discussion. I suspect that this is because there is not enough clarity > over the practical consequences. A PEP may help here, but I'm not sure > how much - it could spark discussion, but would anyone actually end up > any better informed? Probably, yes. The reason is that if you have a PEP, more people are likely to review it and make comments. If you start a discussion with a general subject line which then results in lots of little sub-threads, important aspects of the discussion are likely to go unnoticed in the noise. 
>> We're having lengthy discussions about the addition of single method >> to an object, but such major changes just go in like that and nobody >> seems to really care. > > I suspect deadline pressure and burnout are involved here. > > In all honesty, there's been little or no work done on the C API, > which is just as much in need of review and possible cleanup for 3.0 > as the language. It's as close as makes no difference to too late now > - does that mean we've lost the chance? Perhaps, but the C API is certainly not used by as many people as the Python front-end and changes to the C API also have much deeper consequences due to the API being written in C rather than Python. Overall, I don't think there's a lot to clean up in the C API. Perhaps remove a few of those '...Ex()' APIs that were introduced to extend the original APIs and maybe remove or free up a few type slots that are no longer needed, but that's about it. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 28 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 39 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Wed May 28 14:47:00 2008 From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 28 May 2008 14:47:00 +0200 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming (Stabilizing the C API of 2.6 and 3.0) In-Reply-To: <483D49ED.8060907@cheimes.de> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> Message-ID: <483D5444.8000705@egenix.com> On 2008-05-28 14:02, Christian Heimes wrote: > M.-A. Lemburg wrote: >> I have a feeling that we should be looking for better merge >> tools, rather than implement code changes that cause more trouble >> than do good, just because our existing tools aren't smart >> enough. > > We don't have better tools at our hands. I don't think we'll get any > tools in time or change the VCS right before a major release. > >> Wouldn't it be possible to have a 2to3.py converter >> take the 2.x code (including the C code), convert it and then >> apply any changes to the 3.x branch ? > > Such a converter would be nice for 3rd party code but it's not an option > for the core. In the past few months I've merged a lot of code from > trunk to py3k. A 2to3 C converter doesn't help with merge conflicts. > Naming differences make any merge more painful I was suggesting to not use SVN to merge changes directly, but to instead use an intermediate step in the process:

Init:

1. grab the latest trunk
2. apply a 2to3 converter to the Python code and the C code, applying any renaming that may be necessary
3. save this converted version in a separate branch merge-branch

Update:

1. check out the merge-branch, grab the latest trunk and 3.x branch
2. apply a 2to3 converter to the Python code and the C code, applying any renaming that may be necessary
3. copy the files over your working copy of the merge-branch
4. create a diff on the merge-branch
5. apply the diffs to 3.x branch, resolving any conflicts as necessary

This doesn't require new tools (except for some C renaming support in the 2to3 tool). It only changes the procedure. We'd basically follow our own suggestions with regard to porting to 3.x, which is to make changes in the 2.x code, apply 2to3 and then apply remaining fixes there. I'm suggesting this, since 3.x is likely to introduce more Python stdlib and C API changes. The process would likely also make a lot of other changes more easily manageable and reduce the overall merge conflicts. >>> I find the approach less confusing than your suggestion and my initial >>> idea. >> I disagree on that. >> >> Renaming old APIs to use the new names by adding a header file with >> #define is standard practice. >> >> Renaming the old APIs in the source code and undoing the renaming >> with a header file is not. > > I wasn't talking about standard practice here. I talked about less > confusion for core developers. My approach doesn't split our internal > API in two. No, but it does apply a well hidden renaming which will cause confusion when using a debugger to trace calls in C code. If you use PyBytes APIs, you expect to find PyBytes functions in the libs and also set breakpoints on these. With the renaming we don't have two sets of APIs (old and new) exposed in the lib, like what we normally do when applying changes to API names. > And by the way it *is* a standard approach for Python. Guido told me > that the same approach was used during the 1.x to 2.0 migration. There was no API change between 1.6 and 2.0. You are probably talking about the great renaming between 1.4 and 1.5. That was different, since it changed almost all C APIs in Python. And it used the standard practice... from rename2.h in Python 1.5: /* This file contains a bunch of #defines that make it possible to use "old style" names (e.g. object) with the new style Python source distribution.
*/ #define True Py_True #define False Py_False #define None Py_None i.e. #define >> And all this, just because Subversion can't handle merging of >> symbol renaming. > > As I said earlier we don't have better tools at our disposal. We have to > make some compromises. Sometimes practicality beats purity. See above. >> Please discuss any changes of the 2.x code base on python-dev. >> >> Such major changes do need more discussion and possibly a PEP as well. > > In the last few months I started at least three topics about the C API > renaming. It's in the thread "2.6 and 3.0 tasks" > http://permalink.gmane.org/gmane.comp.python.devel/93016 Thanks. I stopped reading that thread after Guido's reply in http://comments.gmane.org/gmane.comp.python.devel/92541 It would really help if subject lines were more specific. This thread also uses a much too general subject line (which is why I changed it). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 28 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 39 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Wed May 28 15:43:13 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 28 May 2008 23:43:13 +1000 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming (Stabilizing the C API of 2.6 and 3.0) In-Reply-To: <483D5444.8000705@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> Message-ID: <483D6171.5000208@gmail.com> M.-A. Lemburg wrote: > You are probably talking about the great renaming between 1.4 and 1.5. > That was different, since it changes almost all C APIs in Python. > And it used the standard practice... from rename2.h in Python 1.5: > > /* This file contains a bunch of #defines that make it possible to use > "old style" names (e.g. object) with the new style Python source > distribution. */ > > #define True Py_True > #define False Py_False > #define None Py_None > > ie. #define This is what I expected to see in stringobject.h, along with some code in stringobject.c to allow the linker to see the old names *as well as* the new names. At the moment, all the code appears to be using the new names, but stringobject.h implicitly converts the new names back to the old names - so trying to use ctypes to retrieve the PyBytes_* functions from the Python DLL will fail. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From dalcinl at gmail.com Wed May 28 16:35:06 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 28 May 2008 11:35:06 -0300 Subject: [Python-3000] Single buffer implied in new buffer protocol? 
In-Reply-To: <483CBE7E.9080902@canterbury.ac.nz> References: <483CBE7E.9080902@canterbury.ac.nz> Message-ID: On 5/27/08, Greg Ewing wrote: > Travis Oliphant wrote: > > > Obviously, if you haven't provided a Py_buffer structure to fill in, then > you are only asking to lock the object's buffer from other access. > > > > What's the use case for that? Why would you ever want > to lock an object if you don't intend to access it? > Well, if we have already accessed the object, have stored the raw memory pointer, and hold a reference to it, and now we want another thread to operate on the raw memory, does it not make sense to just lock the object? In the context of MPI communication, I believe I have a use case, using something called persistent communication requests. You emit a Comm.Recv_init() call with the pointer to the buffer receiving the message (then you have to ask the object for the buffer pointer). The Comm.Recv_init() returns a 'Prequest' instance (persistent request), but the actual communication does not start until you call Prequest.Start(). Then when you initiate the communication, we should lock the object until the communication finalizes, because then we could release the GIL but protect the raw memory from other accesses.
-- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From janssen at parc.com Wed May 28 19:08:23 2008 From: janssen at parc.com (Bill Janssen) Date: Wed, 28 May 2008 10:08:23 PDT Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483D300B.5090309@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> Message-ID: <08May28.100829pdt."58698"@synergy1.parc.xerox.com> > I'm beginning to wonder whether I'm the only one who cares about > the Python 2.x branch not getting cluttered up with artifacts caused > by a broken forward merge strategy. I share your concern. Seems to me that perhaps (not sure, but perhaps) the rush to back-port from 3.x, and the concern about minimizing pain of moving from 2.x to 3.x, has become the tail wagging the dog. Bill From brett at python.org Wed May 28 21:40:37 2008 From: brett at python.org (Brett Cannon) Date: Wed, 28 May 2008 12:40:37 -0700 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <8453256766467481803@unknownmsgid> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <8453256766467481803@unknownmsgid> Message-ID: On Wed, May 28, 2008 at 10:08 AM, Bill Janssen wrote: >> I'm beginning to wonder whether I'm the only one who cares about >> the Python 2.x branch not getting cluttered up with artifacts caused >> by a broken forward merge strategy.
> > I share your concern. Seems to me that perhaps (not sure, but > perhaps) the rush to back-port from 3.x, and the concern about > minimizing pain of moving from 2.x to 3.x, has become the tail wagging > the dog. > Speaking for myself, I know that if fixing something in 2.x means a pain in forward-porting, I will just do it in 3.x and leave it to someone else to back-port to 2.x, which will lower the chances of the back-port ever occurring. I don't want to do this, but I am fighting damn hard against burn-out at this point and if I have to choose between complete burn-out and only working on the leading edge version of Python, I will choose the latter. So I for one appreciate Christian taking all of us into account in terms of the approach taken to make our lives easier when we work on Python. -Brett From greg at krypto.org Wed May 28 22:47:29 2008 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 28 May 2008 13:47:29 -0700 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483D300B.5090309@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> Message-ID: <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> On Wed, May 28, 2008 at 3:12 AM, M.-A. Lemburg wrote: > I'm beginning to wonder whether I'm the only one who cares about > the Python 2.x branch not getting cluttered up with artifacts caused > by a broken forward merge strategy. > > How can it be that we allow major C API changes such as the renaming > of the PyString APIs to go into the trunk without discussion or > a PEP ? I do not consider it a C API change. The API and ABI have not changed. Old code still compiles. Old binaries still dynamically load and work fine.
(I just confirmed this by importing a couple of python2.4 .so files into my non-debug build of 2.6 trunk) All of the PyString APIs are the real implementations in 2.x and are still there. We only switched to using their PyBytes equivalent names within the Python trunk code base. Are you objecting to our own code switching to use a different name even though the actual underlying API and ABI haven't changed? I suppose to people reading the code and going against old reference books it could be confusing but they've got to get used to the new names somehow and sometime. I strongly support changes like this one that make the life of porting C code forwards and backwards between 2.x and 3.x easier without breaking compatibility with earlier 2.x versions because that is going to be a serious pain for all of us otherwise. -gps From greg.ewing at canterbury.ac.nz Thu May 29 02:31:52 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 29 May 2008 12:31:52 +1200 Subject: [Python-3000] Single buffer implied in new buffer protocol? In-Reply-To: References: <483CBE7E.9080902@canterbury.ac.nz> Message-ID: <483DF978.70203@canterbury.ac.nz> Lisandro Dalcin wrote: > You emit a > Comm.Recv_init() call with the pointer to the buffer receiving the > message (then you have to ask the object for the buffer pointer). > ... Then when you initiate the communication, we should > lock the object No, you can't rely on a buffer pointer returned earlier if the object may have been unlocked in the meantime. The right thing to do in this case is just keep a reference to the object whose buffer you're going to be storing the result in. Then when it comes time to start the receive, you obtain the buffer pointer and lock the object at the same time.
-- Greg From allyourcode at gmail.com Thu May 29 03:23:51 2008 From: allyourcode at gmail.com (Daniel Wong) Date: Wed, 28 May 2008 18:23:51 -0700 Subject: [Python-3000] suggestion: structured assignment Message-ID: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> Hi, Are there plans for introducing syntax like this: (a, (b[2], c)) = ('big' ('red', 'dog')) It seems quite doable, because Professor Hillfinger at UC Berkeley created pyth, a dialect of Python, which has this feature. See page 10 of the spec he created for his students to implement the language: http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf Of course, this idea could also be applied to 'for' constructs (loops, list comprehensions, and generators) where assignments are implicit. Parallel looping (esp using zip) is a great use case for this. Here's a case that's come up more than once for me that "structured" assignments would solve really nicely: for n, (a, b) in enumerate(list_of_pairs): ... Currently, I must do the following instead: for n, pair in enumerate(list_of_pairs): a, b = pair ... This isn't such a great solution, because there's more indirection with the introduction of an otherwise useless variable; and (less significantly) there's an extra line of code that doesn't actually compute anything. Thoughts? Daniel PS: Sorry if this has already been discussed; I'm new to this list and I didn't see this mentioned in PEP 3099, unless it's covered under the LL(1) clause. 
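
[Editor's note] For readers checking the question above against a current interpreter, here is a minimal sketch of nested ("structured") targets, written for Python 3. The names b, list_of_pairs and rows and the sample data are assumptions made up for illustration; note that the right-hand side needs a comma after 'big' to be a tuple.

```python
# Nested targets in a plain assignment: the right-hand side must be an
# iterable whose shape matches the structure of the targets.
b = [None, None, None]  # hypothetical list so that b[2] is a valid target
(a, (b[2], c)) = ('big', ('red', 'dog'))

# The same nesting works for the implicit assignment a for loop performs.
list_of_pairs = [(1, 2), (3, 4), (5, 6)]  # hypothetical sample data
rows = []
for n, (x, y) in enumerate(list_of_pairs):
    rows.append((n, x, y))
```

If the shapes do not match (for example a pair where a triple is expected), the assignment raises ValueError at run time rather than being rejected by the parser.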
From musiccomposition at gmail.com Thu May 29 03:26:12 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Wed, 28 May 2008 20:26:12 -0500 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> Message-ID: <1afaf6160805281826t5993094ck2a6c179a31c1e91@mail.gmail.com> Hi Daniel, At the moment, we are preparing to ship betas, so this kind of proposal is a little late for 2.6/3.0. Also, I would recommend to try this on the python-ideas mailing list first. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From mike.klaas at gmail.com Thu May 29 03:29:54 2008 From: mike.klaas at gmail.com (Mike Klaas) Date: Wed, 28 May 2008 18:29:54 -0700 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> Message-ID: <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com> On 28-May-08, at 6:23 PM, Daniel Wong wrote: > Currently, I must do the following instead: > > for n, pair in enumerate(list_of_pairs): > a, b = pair > ... > > <> > Thoughts? I find it hard to believe that you have even attempted this, which has been valid in python for ages: >>> for x, (a, b) in enumerate([(1,2), (3,4), (5,6)]): print x, a, b 0 1 2 1 3 4 2 5 6 -Mike From allyourcode at gmail.com Thu May 29 06:34:14 2008 From: allyourcode at gmail.com (allyourcode at gmail.com) Date: Wed, 28 May 2008 21:34:14 -0700 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com> Message-ID: <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com> Well, I'm sorry for bothering his majesty with such a stupid idea. 
At least one other person didn't know about it either... On 5/28/08, Mike Klaas wrote: > > On 28-May-08, at 6:23 PM, Daniel Wong wrote: > >> Currently, I must do the following instead: >> >> for n, pair in enumerate(list_of_pairs): >> a, b = pair >> ... >> >> <> >> Thoughts? > > I find it hard to believe that you have even attempted this, which has > been valid in python for ages: > > >>> for x, (a, b) in enumerate([(1,2), (3,4), (5,6)]): > print x, a, b > > 0 1 2 > 1 3 4 > 2 5 6 > > -Mike > From brett at python.org Thu May 29 06:38:05 2008 From: brett at python.org (Brett Cannon) Date: Wed, 28 May 2008 21:38:05 -0700 Subject: [Python-3000] Finishing up PEP 3108 Message-ID:

The issues related to PEP 3108 now total 14. With the beta (supposedly) in a week, I am hoping the last minor details can be pulled together or decisions made on what can be postponed and what should definitely be considered a release blocker.

Issue 2847 - the aifc module still imports the cl module in 3.0. Problem is that the cl module is gone. =) So it seems silly to have the imports lying about. This can probably be changed to critical.

Issue 2848 - mimetools has been deprecated for a while, but it is still used in a bunch of places. Since this has been deprecated in PEP 4 for a long time, should we add the removal warning in 2.6 now and then make its actual removal of usage something to do by another beta?

Issue 2849 - rfc822 is the same problem as mimetools.

Issue 2854 - gestalt needs to be added back into 3.0. This is Benjamin's issue. =)

Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then again, pydoc is busted thanks to the new doc format.

Issue 2874 - the stat module is not so useful anymore, but it still has functions that are useful. Currently the value returned by os.stat() is a named tuple, but that won't support methods. So the object returned by os.stat() probably needs to become a proper object in posix.

Issue 2876 - The UserDict module has been removed in 3.0, but two classes were moved and renamed in collections and another was removed. The removal is a 3.0 warning, but the class renaming might be a tricky 2to3 fixer (not sure if the fix_imports fixer can be tweaked to handle this).

Issue 2877 - UserString.UserString moved. Just need to apply the patch.

Issue 2878 - Ditto for UserList.

Issue 2885 - Creation of the urllib package. Jeremy has been working on this. I believe his patch is up on rietveld.

Issue 2917 - This is merging pickle and cPickle. Alexandre's thing.

Issue 2918 - Same for StringIO/cStringIO.

Issue 2919 - profile and cProfile need to be merged. This has not been dealt with yet. Would it be reasonable to deprecate importing cProfile directly in 2.6 with the assumption the merge will work out for 3.0?

So that is everything that's left. Issue 2775 is the tracking issue so you can look there to see what issues are still open and need work.

I was hoping to spend Monday and Tuesday trying to tie up as many loose ends as possible, but the conference paper I have been working on that was due Sunday is now due a week later, and so Monday and Tuesday will be spent on that (supervisor's orders). Plus I am flying out Wednesday for 10 days to help my mother move and I don't know when I will get Net again. In other words, I still need help. =)

-Brett

P.S.: A huge thanks goes to everyone who has helped so far. My life has been nothing but stress for a while now and you guys have helped keep the stress from reaching epic proportions.

From guido at python.org Thu May 29 06:47:59 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 28 May 2008 21:47:59 -0700 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> Message-ID: Apart from the missing comma after 'big' this is already supported.
The time machine strikes again! --Guido On Wed, May 28, 2008 at 6:23 PM, Daniel Wong wrote: > Hi, > > Are there plans for introducing syntax like this: > > (a, (b[2], c)) = ('big' ('red', 'dog')) > > It seems quite doable, because Professor Hillfinger at UC Berkeley > created pyth, a dialect of Python, which has this feature. See page 10 > of the spec he created for his students to implement the language: > > http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf > > Of course, this idea could also be applied to 'for' constructs (loops, > list comprehensions, and generators) where assignments are implicit. > > Parallel looping (esp using zip) is a great use case for this. Here's > a case that's come up more than once for me that "structured" > assignments would solve really nicely: > > for n, (a, b) in enumerate(list_of_pairs): ... > > Currently, I must do the following instead: > > for n, pair in enumerate(list_of_pairs): > a, b = pair > ... > > This isn't such a great solution, because there's more indirection > with the introduction of an otherwise useless variable; and (less > significantly) there's an extra line of code that doesn't actually > compute anything. > > Thoughts? > > Daniel > > PS: Sorry if this has already been discussed; I'm new to this list and > I didn't see this mentioned in PEP 3099, unless it's covered under the > LL(1) clause. 
> _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From allyourcode at gmail.com Thu May 29 06:51:11 2008 From: allyourcode at gmail.com (allyourcode at gmail.com) Date: Wed, 28 May 2008 21:51:11 -0700 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com> <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com> Message-ID: <7c8225f20805282151l2740ea43nb48fe8f71979a676@mail.gmail.com> I just looked through the official tutorial and Dive into Python, and didn't find anything about it in either of those places. While this feature is documented in the language reference, it does not seem to be a well-known feature (another example: at least one other person did not know about it). On 5/28/08, allyourcode at gmail.com wrote: > Well, I'm sorry for bothering his majesty with such a stupid idea. At > least one other person didn't know about it either... > > On 5/28/08, Mike Klaas wrote: >> >> On 28-May-08, at 6:23 PM, Daniel Wong wrote: >> >>> Currently, I must do the following instead: >>> >>> for n, pair in enumerate(list_of_pairs): >>> a, b = pair >>> ... >>> >>> <> >>> Thoughts? 
>> >> I find it hard to believe that you have even attempted this, which has >> been valid in python for ages: >> >> >>> for x, (a, b) in enumerate([(1,2), (3,4), (5,6)]): >> print x, a, b >> >> 0 1 2 >> 1 3 4 >> 2 5 6 >> >> -Mike >> > From allyourcode at gmail.com Thu May 29 06:52:28 2008 From: allyourcode at gmail.com (allyourcode at gmail.com) Date: Wed, 28 May 2008 21:52:28 -0700 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> Message-ID: <7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com> Indeed. Thank you, Guido. On 5/28/08, Guido van Rossum wrote: > Apart from the missing comma after 'big' this is already supported. > > The time machine strikes again! > > --Guido > > On Wed, May 28, 2008 at 6:23 PM, Daniel Wong wrote: >> Hi, >> >> Are there plans for introducing syntax like this: >> >> (a, (b[2], c)) = ('big' ('red', 'dog')) >> >> It seems quite doable, because Professor Hillfinger at UC Berkeley >> created pyth, a dialect of Python, which has this feature. See page 10 >> of the spec he created for his students to implement the language: >> >> http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf >> >> Of course, this idea could also be applied to 'for' constructs (loops, >> list comprehensions, and generators) where assignments are implicit. >> >> Parallel looping (esp using zip) is a great use case for this. Here's >> a case that's come up more than once for me that "structured" >> assignments would solve really nicely: >> >> for n, (a, b) in enumerate(list_of_pairs): ... >> >> Currently, I must do the following instead: >> >> for n, pair in enumerate(list_of_pairs): >> a, b = pair >> ... >> >> This isn't such a great solution, because there's more indirection >> with the introduction of an otherwise useless variable; and (less >> significantly) there's an extra line of code that doesn't actually >> compute anything. >> >> Thoughts? 
>> >> Daniel >> >> PS: Sorry if this has already been discussed; I'm new to this list and >> I didn't see this mentioned in PEP 3099, unless it's covered under the >> LL(1) clause. >> _______________________________________________ >> Python-3000 mailing list >> Python-3000 at python.org >> http://mail.python.org/mailman/listinfo/python-3000 >> Unsubscribe: >> http://mail.python.org/mailman/options/python-3000/guido%40python.org >> > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From allyourcode at gmail.com Thu May 29 07:55:34 2008 From: allyourcode at gmail.com (Daniel Wong) Date: Wed, 28 May 2008 22:55:34 -0700 Subject: [Python-3000] non-local assignment Message-ID: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com> I'm confused by the section on "no alternate binding operator" in PEP 3099. On the one hand, it says no alternative binding operator will be considered; yet the link provided shows that Guido is in favor of developing a syntax for non-local assignment. Please excuse me if this post violates that rule. Here's my suggestion on what the syntax should look like: set! var val Scheme users will recognize this syntax, which has the distinct advantage of not being confusable with regular assignment; whereas, this is an unfortunate feature of :=, which Guido has already rejected. The way this is supposed to work is you go to the inner-most scope in which var is declared and change its value there to val. If var does not occur in any containing scope, you could raise an UndeclaredVariable exception. Thoughts? 
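
[Editorial sketch, not part of the original message.] The semantics described above (rebind the variable in the innermost enclosing scope that declares it, and fail if none does) can be compared with the `nonlocal` statement of PEP 3104. The names `make_counter`, `increment` and `missing` are made up for illustration. One difference from the proposal: a missing enclosing binding is reported as a SyntaxError at compile time, not as a runtime exception.

```python
def make_counter():
    count = 0  # binding owned by the enclosing function

    def increment():
        # Rebind the enclosing 'count' instead of creating a new local.
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)

# With no enclosing binding, the compiler rejects the declaration outright.
undeclared_error = None
try:
    compile("def f():\n    nonlocal missing\n", "<sketch>", "exec")
except SyntaxError as exc:
    undeclared_error = exc
print(type(undeclared_error).__name__)
```
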
Daniel From cvrebert at gmail.com Thu May 29 08:08:02 2008 From: cvrebert at gmail.com (Chris Rebert) Date: Wed, 28 May 2008 23:08:02 -0700 Subject: [Python-3000] non-local assignment In-Reply-To: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com> References: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com> Message-ID: <47c890dc0805282308k2bfd636aw66596546cef619da@mail.gmail.com> It's been decided to go w/ the "nonlocal" keyword to declare outer variables (ala the "global" keyword) rather than using an alternate assignment operator (which was one of the competing proposals). It's too late to make a change such as your suggestion because PEP 3104 ( http://www.python.org/dev/peps/pep-3104/ ), which proposed "nonlocal", has already been accepted (and BDFL-blessed IIRC). Furthermore, there's no precedent for Python operators to use both a keyword and punctuation together like "set!", and "set" can't be used instead as it's the name of a builtin type (in Py3K). In the future, searching the list archives can be quite helpful. - Chris Rebert On Wed, May 28, 2008 at 10:55 PM, Daniel Wong wrote: > I'm confused by the section on "no alternate binding operator" in PEP > 3099. On the one hand, it says no alternative binding operator will be > considered; yet the link provided shows that Guido is in favor of > developing a syntax for non-local assignment. Please excuse me if this > post violates that rule. Here's my suggestion on what the syntax > should look like: > > set! var val > > Scheme users will recognize this syntax, which has the distinct > advantage of not being confusable with regular assignment; whereas, > this is an unfortunate feature of :=, which Guido has already > rejected. > > The way this is supposed to work is you go to the inner-most scope in > which var is declared and change its value there to val. If var does > not occur in any containing scope, you could raise an > UndeclaredVariable exception. > > Thoughts? 
> > Daniel > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/cvrebert%40gmail.com > From stefan_ml at behnel.de Thu May 29 08:15:25 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 29 May 2008 08:15:25 +0200 Subject: [Python-3000] Single buffer implied in new buffer protocol? In-Reply-To: References: Message-ID: Travis Oliphant wrote: > Stefan Behnel wrote: >> Anyway, my point is that this part of the protocol actually implies >> setting a >> lock on the buffer *provider* rather than the buffer itself, as the >> buffer >> provider cannot distinguish between different buffers based on a NULL >> pointer > > Yes, the language in the PEP could be more clear. Obviously, if you > haven't provided a Py_buffer structure to fill in, then you are only > asking to lock the object's buffer from other access. That's what I'm questioning below. > Naturally, the exporter should handle the case when no lock is actually > requested. That would be considered a bug, right? So it should raise an exception? I can't find that in the PEP. >> But wouldn't it make more sense to *always* >> pass the buffer pointer, to let the provider decide what it makes of the >> flags? > > Perhaps we are not understanding each other. The Py_buffer structure > and the buffer pointer are 2 separate things. I know, I wasn't clear, but I actually meant what I said: the buffer pointer may not be without interest. Imagine the case that a provider decides to create more than one buffer, maybe one for read-only access and one for each concurrent request for write access (and then merge the changes back on release). Then creating the lock by passing NULL as Py_buffer would set a lock on the provider object, not the respective write buffer (or even the read-buffer, where no lock is required). 
That would be hard to handle by the provider. > the buf member of > the structure is the actual buffer pointer and it is un-defined when > getbuffer is called and it contains the buffer pointer on successful > return. But that's only for the buffer creation case. A lock request could just pass in the correct buffer and set the LOCK flag. That doesn't even change the single buffer case (where overwriting the buffer pointer with itself does no harm), but it enables the multiple buffer case. Stefan From allyourcode at gmail.com Thu May 29 08:24:47 2008 From: allyourcode at gmail.com (allyourcode at gmail.com) Date: Wed, 28 May 2008 23:24:47 -0700 Subject: [Python-3000] non-local assignment In-Reply-To: <47c890dc0805282308k2bfd636aw66596546cef619da@mail.gmail.com> References: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com> <47c890dc0805282308k2bfd636aw66596546cef619da@mail.gmail.com> Message-ID: <7c8225f20805282324j149d5d53hdcce7de28c69eb75@mail.gmail.com> I actually read a good portion of the thread that PEP 3099 refers to, so I thought I had read up on the subject before making my suggestion. I had also perused that PEP and didn't realize there was no way my suggestion could be accepted. I suppose it's too late, but I think it's too bad that a negative keyword was selected, although it is completely accurate. On 5/28/08, Chris Rebert wrote: > It's been decided to go w/ the "nonlocal" keyword to declare outer > variables (ala the "global" keyword) rather than using an alternate > assignment operator (which was one of the competing proposals). It's > too late to make a change such as your suggestion because PEP 3104 ( > http://www.python.org/dev/peps/pep-3104/ ), which proposed "nonlocal", > has already been accepted (and BDFL-blessed IIRC). > > Furthermore, there's no precedent for Python operators to use both a > keyword and punctuation together like "set!", and "set" can't be used > instead as it's the name of a builtin type (in Py3K). 
> > In the future, searching the list archives can be quite helpful. > > - Chris Rebert > > > On Wed, May 28, 2008 at 10:55 PM, Daniel Wong wrote: >> I'm confused by the section on "no alternate binding operator" in PEP >> 3099. On the one hand, it says no alternative binding operator will be >> considered; yet the link provided shows that Guido is in favor of >> developing a syntax for non-local assignment. Please excuse me if this >> post violates that rule. Here's my suggestion on what the syntax >> should look like: >> >> set! var val >> >> Scheme users will recognize this syntax, which has the distinct >> advantage of not being confusable with regular assignment; whereas, >> this is an unfortunate feature of :=, which Guido has already >> rejected. >> >> The way this is supposed to work is you go to the inner-most scope in >> which var is declared and change its value there to val. If var does >> not occur in any containing scope, you could raise an >> UndeclaredVariable exception. >> >> Thoughts? >> >> Daniel >> _______________________________________________ >> Python-3000 mailing list >> Python-3000 at python.org >> http://mail.python.org/mailman/listinfo/python-3000 >> Unsubscribe: >> http://mail.python.org/mailman/options/python-3000/cvrebert%40gmail.com >> > From ishimoto at gembook.org Thu May 29 08:40:22 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Thu, 29 May 2008 15:40:22 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> Message-ID: <797440730805282340h2ea9f8dfqba91f0e67f7e273e@mail.gmail.com> On Tue, May 27, 2008 at 10:06 AM, Jim Jewett wrote: >> * Characters defined in the Unicode character database as "Separator" >> (Zl, Zp, Zs) other than ASCII space(0x20). 
> > Please put in a note that Zl and Zp refer only to two specific > unicode characters, not to what most people think of as line > separators or paragraph markers. Thank you for suggestion. > >> * Backslash-escape quote characters(apostrophe, ') and add quote >> character at the beginning and the end. > > Do you just mean the two ASCII quotation marks that python uses? No, just an apostrophe(') as current Python. > > As written, I wondered whether it would include backquote or guillemet. Proposal to change repr() for these character is not included in this PEP, although I don't know what guillemet is. > >> - Add ``'%a'`` string format operator. ``'%a'`` converts any python >> object to string using ``repr()`` and then hex-escape all non-ASCII >> characters. ``'%a'`` operator generates same string as ``'%r'`` in >> Python 2. > > Then why not keep the old %r, and add a new one for the unicode repr? > repr() and "%r" should be consistent with object's __repr()__ function. > Is it again because of the bug where str([..., mystr, ...]) ends up > doing repr on mystr? I don't think it a bug, as other people described. > >> - Add ``ascii()`` builtin function. ``ascii()`` converts any python >> object to string using ``repr()`` and then hex-escape all non-ASCII >> characters. ``ascii()`` generates same string as ``repr()`` in Python 2. > > The problem isn't that I want to be able to write code that acts the > old way; the problem is that I want to ensure all code running on my > system acts the old way. > Adding an ascii() function doesn't help. I can understand your worry to possible code breakage, but still I think this PEP is right thing for Python 3000. ascii() may make porting code to Python 3000 easier a bit. > >> Strings to be printed for debugging are not only contained by lists or >> dicts, but also in many other types of object. File objects contain a >> file name in Unicode, exception objects contain a message in Unicode, >> etc. 
These strings should be printed in readable form when repr()ed. >> It is unlikely to be possible to implement a tool to print all >> possible object types. > > You could go a long way (particularly in Py3k, where everything > inherits from object) by changing the builtin containers, and changing Changing builtin containers is not sufficient, so the way would be too long to be practical. Do you wish to override __repr__() method of all types you encounter? >> - Make the encoding used by ``unicode_repr()`` adjustable, and make >> current ``repr()`` as default. > >> With adjustable ``repr()``, result of ``repr()`` is unpredictable and >> would make impossible to write correct code involving ``repr()``. > > No more so than 3138. The setting of repr is predictable on a given > system. (Even if you make it a changeable during a single run, it is > predictable by checking first.) Across systems, the 3138 proposal is > already unpredictable, because you don't know which systems will apply > backslash-replace on which characters (and on which runs). > In this PEP, result of repr() is perfectly predictable. The repr() generates exactly same string among systems. But in general, strings printed to console, whether generated by repr() or not, are less predictable. Some characters in the string may be backslash-escaped, may be replaced by '?' or may raise exception depending on user's configuration. 
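
[Editorial sketch, not part of the original message.] The distinction drawn above can be checked on a Python 3 interpreter, where repr() keeps printable non-ASCII characters and ascii() (or '%a') escapes them. The sample string is an arbitrary choice, and the last step only imitates a console configured to backslash-escape what it cannot encode.

```python
# Source kept ASCII-only: the literal below is "cafe" with e-acute,
# a space, and two CJK characters.
text = "caf\u00e9 \u65e5\u672c"

readable = repr(text)   # PEP 3138 behaviour: printable non-ASCII kept as-is
escaped = ascii(text)   # Python 2 style repr: non-ASCII characters escaped
via_format = "%a" % text

# repr() itself does not depend on the console; escaping for an ASCII-only
# console is a separate step applied at output time.
console_view = readable.encode("ascii", "backslashreplace").decode("ascii")

print(escaped)
print(console_view)
```
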
From ishimoto at gembook.org Thu May 29 08:40:50 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Thu, 29 May 2008 15:40:50 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> Message-ID: <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> On Wed, May 28, 2008 at 5:12 AM, Jim Jewett wrote: >> >> - Add ``'%a'`` string format operator. ``'%a'`` converts any python >> >> object to string using ``repr()`` and then hex-escape all non-ASCII >> >> characters. ``'%a'`` operator generates same string as ``'%r'`` in >> >> Python 2. > >> > Then why not keep the old %r, and add a new one for the unicode repr? > >> repr() and "%r" should be consistent with object's __repr()__ function. > > Let me rephrase that: > > Why change repr and add a replacement that acts like old repr? The "%r" and ascii() are not in my original proposal, but proposed in this discussion. I added them to the PEP, but still I'm not sure they are neccesary. > > Wouldn't it be easier to just add a new function (and format > character) that act in the desirable new way? That way there are no > backwards compatibility problems, and people who use it will make an > explicit choice that can be trusted. Adding a new function is not enough, but we should define new protocol to types such as __unicode_repr__() and implement them . For example, the list type should implement a method which does almost same job as __repr__(). class List: def __repr__(self): return "[%s]" % ",".join(repr(s) for s in self._items) def __unicode_repr__(self): return "[%s]" % ",".join(unicode_repr(s) for s in self._items) I think keeping old repr() is not worth this effort. 
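
[Editorial sketch, not part of the original message.] The List example above, laid out as runnable code to show the duplication being objected to. `unicode_repr()` and `__unicode_repr__` are hypothetical names taken from the counter-proposal, not real Python APIs, and the fallback to repr() is an assumption made here so the sketch runs.

```python
def unicode_repr(obj):
    """Hypothetical builtin: use __unicode_repr__ if defined, else repr()."""
    method = getattr(type(obj), "__unicode_repr__", None)
    if method is not None:
        return method(obj)
    return repr(obj)


class List:
    def __init__(self, items):
        self._items = list(items)

    def __repr__(self):
        return "[%s]" % ",".join(repr(s) for s in self._items)

    def __unicode_repr__(self):
        # Nearly a copy of __repr__; only the recursive call differs.
        return "[%s]" % ",".join(unicode_repr(s) for s in self._items)


nested = List([1, List(["a"])])
print(repr(nested))
print(unicode_repr(nested))
```

Every container type would need the same second method, which is the cost the paragraph above refers to.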
> What I really want is that the
>
> "No str? Use repr instead"
>
> fallback change into
>
> "No str? Use repr on *this* object instead, but keep using str on
> subobjects if those are printed"
>

Even if Python changed to call str() on subobjects as you want, I would
still insist on PEP 3138. The result of str() is not always relevant to
debugging, whereas repr() is designed for debugging. str() cannot be a
replacement for repr().

From stefan_ml at behnel.de Thu May 29 09:22:22 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 09:22:22 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID:

Christian Heimes wrote:
> * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

Just a quick note that that file is still missing from SVN, so it's kind of
hard to compile existing code against the current branch state...

Stefan

From jcea at jcea.es Thu May 29 09:34:18 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 29 May 2008 09:34:18 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming (Stabilizing the C API of 2.6 and 3.0)
In-Reply-To: <483D5444.8000705@egenix.com>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com>
Message-ID: <483E5C7A.2090507@jcea.es>

M.-A. Lemburg wrote:
| If you use PyBytes APIs, you expect to find PyBytes functions in
| the libs and also set breakpoints on these.

Very good point.

-- 
Jesus Cea Avion
jcea at jcea.es - http://www.jcea.es/
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz

From stefan_ml at behnel.de Thu May 29 09:57:27 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 09:57:27 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To:
References: <48397ECC.9070805@cheimes.de>
Message-ID:

Lisandro Dalcin wrote:
> Christian, I posted some observations a few weeks ago about the status
> of the PyNumberMethods API. The thread link is below. It did not receive
> much attention.
>
> http://mail.python.org/pipermail/python-3000/2008-May/013594.html
>
> Now I summarize that post
>
> * 'nb_nonzero' was renamed to 'nb_bool'

That's a non-critical change. Usage of these field names outside of the
Python core should be extremely rare.

> * 'nb_inplace_divide' was removed

as was nb_divide, apparently, which is pretty close to the beginning of the
struct.

> * 'nb_hex', 'nb_oct', and 'nb_coerce' are there, but they are unused
>
> IMHO, the PyNumberMethods struct should be left as in Py2, or it
> should be cleaned up, that is, all unused slots should be removed.

Since there were already two fields right inside the struct that were removed
(one even before the three you mention), I think it makes sense to remove the
remaining left-overs also.

I filed a bug.
http://bugs.python.org/issue2997 Stefan From stefan_ml at behnel.de Thu May 29 10:30:55 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 29 May 2008 10:30:55 +0200 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming In-Reply-To: <483D5444.8000705@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> Message-ID: M.-A. Lemburg wrote: > If you use PyBytes APIs, you expect to find PyBytes functions in > the libs and also set breakpoints on these. AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here. Besides, how likely is it that users set a breakpoint on the PyBytes/PyString functions? Stefan From paul.bedaride at gmail.com Thu May 29 10:50:25 2008 From: paul.bedaride at gmail.com (paul bedaride) Date: Thu, 29 May 2008 10:50:25 +0200 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com> Message-ID: this work, (a, (b[2], c)) = ('big', ('red', 'dog')) but this not (a, (b[2], c)) += ('big' ('red', 'dog')) paul bedaride On Thu, May 29, 2008 at 6:52 AM, wrote: > Indeed. Thank you, Guido. > > On 5/28/08, Guido van Rossum wrote: > > Apart from the missing comma after 'big' this is already supported. > > > > The time machine strikes again! > > > > --Guido > > > > On Wed, May 28, 2008 at 6:23 PM, Daniel Wong > wrote: > >> Hi, > >> > >> Are there plans for introducing syntax like this: > >> > >> (a, (b[2], c)) = ('big' ('red', 'dog')) > >> > >> It seems quite doable, because Professor Hillfinger at UC Berkeley > >> created pyth, a dialect of Python, which has this feature. 
See page 10 > >> of the spec he created for his students to implement the language: > >> > >> http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf > >> > >> Of course, this idea could also be applied to 'for' constructs (loops, > >> list comprehensions, and generators) where assignments are implicit. > >> > >> Parallel looping (esp using zip) is a great use case for this. Here's > >> a case that's come up more than once for me that "structured" > >> assignments would solve really nicely: > >> > >> for n, (a, b) in enumerate(list_of_pairs): ... > >> > >> Currently, I must do the following instead: > >> > >> for n, pair in enumerate(list_of_pairs): > >> a, b = pair > >> ... > >> > >> This isn't such a great solution, because there's more indirection > >> with the introduction of an otherwise useless variable; and (less > >> significantly) there's an extra line of code that doesn't actually > >> compute anything. > >> > >> Thoughts? > >> > >> Daniel > >> > >> PS: Sorry if this has already been discussed; I'm new to this list and > >> I didn't see this mentioned in PEP 3099, unless it's covered under the > >> LL(1) clause. > >> _______________________________________________ > >> Python-3000 mailing list > >> Python-3000 at python.org > >> http://mail.python.org/mailman/listinfo/python-3000 > >> Unsubscribe: > >> http://mail.python.org/mailman/options/python-3000/guido%40python.org > >> > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/ > ) > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/paul.bedaride%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wescpy at gmail.com Thu May 29 10:56:55 2008 From: wescpy at gmail.com (wesley chun) Date: Thu, 29 May 2008 01:56:55 -0700 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? Message-ID: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> hi, i'm looking to duplicate this string format operator '#' functionality with the new format(). here it is using the old string format operator: >>> i = 45 >>> 'dec: %d/oct: %o/hex: %X' % (i, i, i) # no "#" means no leading "0" or "0x/X" 'dec: 45/oct: 55/hex: 2D' >>> 'dec: %d/oct: %#o/hex: %#X' % (i, i, i) # leading "#" gives us "0" and "0x/X" 'dec: 45/oct: 0o55/hex: 0X2D' if i repeat both of the above with format(), it fails with the "#": >>> 'dec: {0}/oct: {0:o}/hex: {0:X}'.format(i) 'dec: 45/oct: 55/hex: 2D' >>> 'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i) Traceback (most recent call last): File "", line 1, in 'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i) ValueError: Invalid conversion specification i have to resort to the uglier: >>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i) 'dec: 45/oct: 0o55/hex: 0X2D' is this functionality being dropped, or am i missing something? i didn't get anything from searching the Py3000 mailing list archives. i couldn't find anything in either formatter.h nor stringobject.c. secondly, and much more minor, is that i think there's a minor typo in the PEP: print format(10.0, "7.3g") <-- print() is now a function so it needs another pair of ( ). 
thanks, -- wesley - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - "Core Python Programming", Prentice Hall, (c)2007,2001 http://corepython.com wesley.j.chun :: wescpy-at-gmail.com python training and technical consulting cyberweb.consulting : silicon valley, ca http://cyberwebconsulting.com From stefan_ml at behnel.de Thu May 29 10:59:29 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 29 May 2008 10:59:29 +0200 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805282151l2740ea43nb48fe8f71979a676@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com> <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com> <7c8225f20805282151l2740ea43nb48fe8f71979a676@mail.gmail.com> Message-ID: allyourcode at gmail.com wrote: > I just looked through the official tutorial and Dive into Python, and > didn't find anything about it in either of those places. Tutorial section on "tuples and sequences", not quite the most hidden place in the universe. http://docs.python.org/tut/node7.html#SECTION007300000000000000000 Stefan From ncoghlan at gmail.com Thu May 29 11:47:37 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 29 May 2008 19:47:37 +1000 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> Message-ID: <483E7BB9.5060002@gmail.com> wesley chun wrote: > i have to resort to the uglier: > >>>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i) > 'dec: 45/oct: 0o55/hex: 0X2D' Is being explicit about the displayed prefix really that much uglier? The old # alternative display formats were somewhat arbitrary. > is this functionality being dropped, or am i missing something? i > didn't get anything from searching the Py3000 mailing list archives. 
i > couldn't find anything in either formatter.h nor stringobject.c. > > secondly, and much more minor, is that i think there's a minor typo in the PEP: > print format(10.0, "7.3g") <-- print() is now a function so it needs > another pair of ( ). It works fine as written in 2.x :) (but, yes, you're right that as a 3000-series PEP, 3101 should probably treat print() as a function in its examples) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Thu May 29 11:53:05 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 29 May 2008 19:53:05 +1000 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming In-Reply-To: References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> Message-ID: <483E7D01.4010603@gmail.com> Stefan Behnel wrote: > M.-A. Lemburg wrote: >> If you use PyBytes APIs, you expect to find PyBytes functions in >> the libs and also set breakpoints on these. > > AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here. The PyBytes_* functions appear to be there, but a preprocessor macro means it is actually the PyString_* functions that appear in the Python DLL. That's great from a backwards compatibility point of view, but seriously confusing from the point of view of anyone trying to embed or otherwise debug Python 2.6. > Besides, how likely is it that users set a breakpoint on the PyBytes/PyString > functions? Not very likely at all - but it would still be nice if the PyBytes_* symbols were visible to the linker as well as the preprocessor. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From lists at cheimes.de Thu May 29 11:59:28 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 May 2008 11:59:28 +0200 Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: References: <48397ECC.9070805@cheimes.de> Message-ID: <483E7E80.70403@cheimes.de> Stefan Behnel schrieb: > Christian Heimes wrote: >> * add a new file stringobject.h which contains the aliases PyString_ -> >> PyBytes_ > > Just a quick note that that file is still missing from SVN, so it's kind of > hard to compile existing code against the current branch state... No, the file is in SVN. It's just not in the py3k branch because it's not vital to the core. I had plans to add a Python 2.x compatibility header to Python 3.0 But I'm not going to spend any more time on the topic or any other development until we have reached an agreement on the naming. I don't want to waste more of my free time in vain. Christian From lists at cheimes.de Thu May 29 12:01:44 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 May 2008 12:01:44 +0200 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming In-Reply-To: References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> Message-ID: <483E7F08.80704@cheimes.de> Stefan Behnel schrieb: > M.-A. Lemburg wrote: >> If you use PyBytes APIs, you expect to find PyBytes functions in >> the libs and also set breakpoints on these. > > AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here. In Python 2.6 the PyBytes_* functions are only available to the compiler but not to the linker. 
In 2.6 the ABI functions are PyString_* and in 3.0 it's PyBytes_* Christian From qrczak at knm.org.pl Thu May 29 12:05:20 2008 From: qrczak at knm.org.pl (=?UTF-8?Q?Marcin_=E2=80=98Qrczak=E2=80=99_Kowalczyk?=) Date: Thu, 29 May 2008 12:05:20 +0200 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming In-Reply-To: <483E7D01.4010603@gmail.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> <483E7D01.4010603@gmail.com> Message-ID: <3f4107910805290305s63b97e73i87824755f7aa31fb@mail.gmail.com> 2008/5/29 Nick Coghlan : > it would still be nice if the PyBytes_* symbols > were visible to the linker as well as the preprocessor. If this is not a strict requirement but a useful extra, then it might be done in an unportable way. GCC has an 'alias' attribute: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html -- Marcin Kowalczyk qrczak at knm.org.pl http://qrnik.knm.org.pl/~qrczak/ From stefan_ml at behnel.de Thu May 29 12:08:47 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 29 May 2008 12:08:47 +0200 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming In-Reply-To: <483E7F08.80704@cheimes.de> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> <483E7F08.80704@cheimes.de> Message-ID: Christian Heimes wrote: > Stefan Behnel schrieb: >> M.-A. Lemburg wrote: >>> If you use PyBytes APIs, you expect to find PyBytes functions in >>> the libs and also set breakpoints on these. >> AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here. 
> > In Python 2.6 the PyBytes_* functions are only available to the compiler > but not to the linker. In 2.6 the ABI functions are PyString_* and in > 3.0 it's PyBytes_* Ah, even better then. Given that it was always PyString_*() in Py2, that totally sounds like the right thing to me. I really don't think anyone using the newly advertised Py3 PyBytes_*() C-API functions will honestly expect them to be available in a 2.x binary lib. Stefan From ncoghlan at gmail.com Thu May 29 12:34:58 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 29 May 2008 20:34:58 +1000 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming In-Reply-To: References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> <483E7D01.4010603@gmail.com> Message-ID: <483E86D2.5000809@gmail.com> Stefan Behnel wrote: > Nick Coghlan wrote: >> Stefan Behnel wrote: >>> Besides, how likely is it that users set a breakpoint on the >>> PyBytes/PyString functions? >> Not very likely at all - but it would still be nice if the PyBytes_* >> symbols were visible to the linker as well as the preprocessor. > > Right, that's a nice-to-have, an add-on. Why don't we just let Christian > finish his work, which is vital for the beta release? Then it's still time to > file a bug report on the missing bits and provide a patch that adds linker > symbols for PyBytes_*() in Py2.6 as an additional feature. Yeah, it took me a while to get my head around what he was trying to do, but GPS explained it pretty well elsewhere in this thread. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From qgallet at gmail.com Thu May 29 15:25:12 2008 From: qgallet at gmail.com (Quentin Gallet-Gilles) Date: Thu, 29 May 2008 15:25:12 +0200 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: References: Message-ID: <8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com> On Thu, May 29, 2008 at 9:12 AM, Georg Brandl wrote: > Brett Cannon schrieb: > >> The issues related to PEP 3108 now total 14. With the beta >> (supposedly) in a week, I am hoping the last minor details can be >> pulled together or decisions made on what can be postponed and what >> should definitely be considered a release blocker. >> >> Issue 2847 - the aifc module still imports the cl module in 3.0. >> Problem is that the cl module is gone. =) So it seems silly to have >> the imports lying about. This can probably be changed to critical. >> > > It shouldn't be a problem to rip everything cl-related out of aifc. > The question is how useful aifc will be after that ... > Has someone already used that module ? I took a look into it, but I'm a bit confused about the various compression types, case-sensitivity and compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the same encoding ? Can we use the audioop module for ALAW, just like it's already done for ULAW ? [1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html Quentin -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric+python-dev at trueblade.com Thu May 29 15:28:53 2008 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 29 May 2008 09:28:53 -0400 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? 
In-Reply-To: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> Message-ID: <483EAF95.5050503@trueblade.com> wesley chun wrote: > hi, > > i'm looking to duplicate this string format operator '#' functionality > with the new format(). here it is using the old string format > operator: > >>>> i = 45 >>>> 'dec: %d/oct: %o/hex: %X' % (i, i, i) # no "#" means no leading "0" or "0x/X" > 'dec: 45/oct: 55/hex: 2D' >>>> 'dec: %d/oct: %#o/hex: %#X' % (i, i, i) # leading "#" gives us "0" and "0x/X" > 'dec: 45/oct: 0o55/hex: 0X2D' > > if i repeat both of the above with format(), it fails with the "#": > >>>> 'dec: {0}/oct: {0:o}/hex: {0:X}'.format(i) > 'dec: 45/oct: 55/hex: 2D' >>>> 'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i) > Traceback (most recent call last): > File "", line 1, in > 'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i) > ValueError: Invalid conversion specification > > i have to resort to the uglier: > >>>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i) > 'dec: 45/oct: 0o55/hex: 0X2D' > > is this functionality being dropped, or am i missing something? i > didn't get anything from searching the Py3000 mailing list archives. i > couldn't find anything in either formatter.h nor stringobject.c. I don't see it as a big problem. You can now use any prefix you want, instead of the hard coded values that # supplied. > > secondly, and much more minor, is that i think there's a minor typo in the PEP: > print format(10.0, "7.3g") <-- print() is now a function so it needs > another pair of ( ). Fixed in r63786. Thanks for catching it. There was another print() function already in the PEP, so clearly the intent was to be 3.0 compliant. Eric. 
> > thanks, > -- wesley > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > "Core Python Programming", Prentice Hall, (c)2007,2001 > http://corepython.com > > wesley.j.chun :: wescpy-at-gmail.com > python training and technical consulting > cyberweb.consulting : silicon valley, ca > http://cyberwebconsulting.com > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/eric%2Bpython-dev%40trueblade.com > From g.brandl at gmx.net Thu May 29 16:28:16 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 29 May 2008 16:28:16 +0200 Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming In-Reply-To: References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com> <483E7F08.80704@cheimes.de> Message-ID: Stefan Behnel schrieb: > Christian Heimes wrote: >> Stefan Behnel schrieb: >>> M.-A. Lemburg wrote: >>>> If you use PyBytes APIs, you expect to find PyBytes functions in >>>> the libs and also set breakpoints on these. >>> AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here. >> >> In Python 2.6 the PyBytes_* functions are only available to the compiler >> but not to the linker. In 2.6 the ABI functions are PyString_* and in >> 3.0 it's PyBytes_* > > Ah, even better then. Given that it was always PyString_*() in Py2, that > totally sounds like the right thing to me. I really don't think anyone using > the newly advertised Py3 PyBytes_*() C-API functions will honestly expect them > to be available in a 2.x binary lib. Can't we have the best of both worlds -- have the macro and a stub function for the linker, like done with PyErr_Warn? 
Georg From mal at egenix.com Thu May 29 16:51:25 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 29 May 2008 16:51:25 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <08May28.100829pdt."58698"@synergy1.parc.xerox.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <08May28.100829pdt."58698"@synergy1.parc.xerox.com> Message-ID: <483EC2ED.7010104@egenix.com> On 2008-05-28 19:08, Bill Janssen wrote: >> I'm beginning to wonder whether I'm the only one who cares about >> the Python 2.x branch not getting cluttered up with artifacts caused >> by a broken forward merge strategy. > > I share your concern. Seems to me that perhaps (not sure, but > perhaps) the rush to back-port from 3.x, and the concern about > minimizing pain of moving from 2.x to 3.x, has become the tail wagging > the dog. Indeed. If the need to be able to forward merge changes from the 2.x trunk to the 3.x branch is the only reason for the current approach, then we need to find a better procedure for getting patches to 2.x forwarded to 3.x. I believe that everyone is aware that 3.x breaks things and that's fine. However, the reason for introducing such breakage in 3.x is that users have the option to decide whether and when to switch to the new major version. Being able to play with 3.x features in 2.x is nice, but I wouldn't really consider those essential for 2.x. It certainly doesn't warrant causing major problems in the 2.x releases. The module renaming backport was one example (which was undone again), the C API renaming is another. 
I expect more such features to be backported from 3.x to 2.x (even though I don't really think it's worth the trouble) and since this always means that changes have to applied in two worlds, we'll need a better process for getting changes in one major release ported to the other. Simply tweaking 2.x into shape so that the rather simple minded SVN merge command works, isn't a good enough procedure for this. That's why I suggested to use an intermediate form or branch for the merging - one that implements the 2.x with all renaming and syntax fixing applied. This would: * reduce the number of merge conflicts since the renaming would already have happened * reduce the patch sizes that have to be applied to 3.x in order to stay in sync with 2.x * result in a tool chain that makes it easier for all Python users to port their code to 3.x * simplify renaming or reorg of modules, functions, methods and C APIs without requiring major changes on either side -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 29 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 38 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Thu May 29 17:22:58 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 29 May 2008 17:22:58 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> Message-ID: <483ECA52.6040000@egenix.com> On 2008-05-28 22:47, Gregory P. Smith wrote: > On Wed, May 28, 2008 at 3:12 AM, M.-A. Lemburg wrote: >> I'm beginning to wonder whether I'm the only one who cares about >> the Python 2.x branch not getting cluttered up with artifacts caused >> by a broken forward merge strategy. >> >> How can it be that we allow major C API changes such as the renaming >> of the PyString APIs to go into the trunk without discussion or >> a PEP ? > > I do not consider it a C API change. The API and ABI have not > changed. Old code still compiles. Old binaries still dynamically > load and work fine. (I just confirmed this by importing a couple > python2.4 .so files into my non-debug build of 2.6 trunk) > > A of the PyString APIs are the real implementations in 2.x and are > still there. We only switched to using their PyBytes equivalent names > within the Python trunk code base. > > Are you objecting to our own code switching to use a different name > even though the actual underlying API and ABI haven't changed? I > suppose to people reading the code and going against old reference > books it could be confusing but they've got to get used to the new > names somehow and sometime. > > I strongly support changes like this one that makes the life of > porting C code forwards and backwards between 2.x and 3.x easier > without breaking compatibility with earlier 2.x version because that > is going to be a serious pain for all of us otherwise. 
Well, first of all, it is a change in the C API: APIs have different names now, they live in different files, the Python documentation doesn't apply anymore, books have to be updated, programmers trained, etc. etc. That's fine for 3.x, it's not for 2.x. Second, if you leave out the "ease merging" argument, all of this is not really necessary in 2.x. If you absolutely want to have PyBytes APIs in 2.x, then you can *add* them, without removing the PyString APIs. We have done that on a smaller scale a couple of times in the past (turned functions into macros or vice-versa). And finally, the "merge" argument itself is not really all that strong. It's just a matter of getting the procedure corrected. Then you can rename and restructure as much as you want in 3.x - without affecting the stability and matureness of the 2.x branch. I suspect more of these backports to happen, so we better get things done right now instead of putting Python's reputation as stable and mature programming language at risk. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 29 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 38 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From lars at ibp.de Thu May 29 16:56:20 2008 From: lars at ibp.de (Lars Immisch) Date: Thu, 29 May 2008 16:56:20 +0200 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: <8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com> References: <8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com> Message-ID: <483EC414.7080603@ibp.de> > Issue 2847 - the aifc module still imports the cl module in 3.0. > Problem is that the cl module is gone. =) So it seems silly to have > the imports lying about. This can probably be changed to critical. > > > It shouldn't be a problem to rip everything cl-related out of aifc. > The question is how useful aifc will be after that ... > > > Has someone already used that module ? I took a look into it, but I'm a > bit confused about the various compression types, case-sensitivity and > compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the > same encoding ? Can we use the audioop module for ALAW, just like it's > already done for ULAW ? There is just one alaw I've ever come across (G.711), and the audioop implementation could be used (audioop's alaw support is younger than the aifc module, BTW) The capitalisation is confusing, but your document [1] says: "Apple Computer's QuickTime player recognize only the Apple compression types. Although "ALAW" and "ULAW" contain identical sound samples to the "alaw" and "ulaw" formats and were in use long before Apple introduced the new codes, QuickTime does not recognize them." So this seems just a matter of naming in the AIFC, but not a matter of two different alaw implementations. 
- Lars [1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html From qgallet at gmail.com Thu May 29 17:39:17 2008 From: qgallet at gmail.com (Quentin Gallet-Gilles) Date: Thu, 29 May 2008 17:39:17 +0200 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: <483EC414.7080603@ibp.de> References: <8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com> <483EC414.7080603@ibp.de> Message-ID: <8b943f2b0805290839s7a1f3238g9e21407a56c34159@mail.gmail.com> On Thu, May 29, 2008 at 4:56 PM, Lars Immisch wrote: > > >> Issue 2847 - the aifc module still imports the cl module in 3.0. >> Problem is that the cl module is gone. =) So it seems silly to have >> the imports lying about. This can probably be changed to critical. >> >> >> It shouldn't be a problem to rip everything cl-related out of aifc. >> The question is how useful aifc will be after that ... >> >> >> Has someone already used that module ? I took a look into it, but I'm a >> bit confused about the various compression types, case-sensitivity and >> compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the >> same encoding ? Can we use the audioop module for ALAW, just like it's >> already done for ULAW ? >> > > There is just one alaw I've ever come across (G.711), and the audioop > implementation could be used (audioop's alaw support is younger than the > aifc module, BTW) > > The capitalisation is confusing, but your document [1] says: "Apple > Computer's QuickTime player recognize only the Apple compression types. > Although "ALAW" and "ULAW" contain identical sound samples to the "alaw" and > "ulaw" formats and were in use long before Apple introduced the new codes, > QuickTime does not recognize them." > > So this seems just a matter of naming in the AIFC, but not a matter of two > different alaw implementations. > > - Lars > Ok, I'll handle this issue. I'll be using the audioop implementation as a replacement of the SGI compression library. 
I'll also create a test suite, as Brett mentioned in the bug tracker the module was missing one. Quentin > > [1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at cheimes.de Thu May 29 17:45:24 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 29 May 2008 17:45:24 +0200 Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483ECA52.6040000@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> Message-ID: <483ECF94.7060607@cheimes.de> M.-A. Lemburg schrieb: > Well, first of all, it is a change in the C API: > APIs have different names now, they live in different files, > the Python documentation doesn't apply anymore, books have to > be updated, programmers trained, etc. etc. That's fine for > 3.x, it's not for 2.x. No, that's not correct. The 2.x API is still the same. I've only changed the internal code. > Second, if you leave out the "ease merging" argument, all of > this is not really necessary in 2.x. If you absolutely want > to have PyBytes APIs in 2.x, then you can *add* them, without > removing the PyString APIs. We have done that on a smaller > scale a couple of times in the past (turned functions into > macros or vice-versa). The PyString methods are still available and the official API for dealing with str objects in 2.x. > And finally, the "merge" argument itself is not really all that > strong. It's just a matter of getting the procedure corrected. > Then you can rename and restructure as much as you want in > 3.x - without affecting the stability and matureness of the > 2.x branch. 
I'm volunteering to revert my chances if you are volunteering to keep the Python 2.x series in sync with the 3.x series. Christian From brett at python.org Thu May 29 19:32:04 2008 From: brett at python.org (Brett Cannon) Date: Thu, 29 May 2008 10:32:04 -0700 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: References: Message-ID: On Thu, May 29, 2008 at 12:12 AM, Georg Brandl wrote: > Brett Cannon schrieb: >> >> The issues related to PEP 3108 now total 14. With the beta >> (supposedly) in a week, I am hoping the last minor details can be >> pulled together or decisions made on what can be postponed and what >> should definitely be considered a release blocker. >> >> Issue 2847 - the aifc module still imports the cl module in 3.0. >> Problem is that the cl module is gone. =) So it seems silly to have >> the imports lying about. This can probably be changed to critical. > > It shouldn't be a problem to rip everything cl-related out of aifc. > The question is how useful aifc will be after that ... > If it ends up not being useful then the module can just go. >> Issue 2848 - mimetools has been deprecated for a while, but it is >> still used in a bunch of places. Since this has been deprecated in PEP >> 4 for a long time, should we add the removal warning in 2.6 now and >> then make its actual removal of usage something to do by another beta? >> >> Issue 2849 - rfc822 is the same problem as mimetools. > > The problem is that nobody seems to know what exactly distinguishes > mimetools/rfc822' classes and its successor's (email's) classes, so > it's hard to replace it in the stdlib. > Right. I have looked myself over the years and it never seemed brain-dead simple. >> Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then >> again, pydoc is busted thanks to the new doc format. > > I will try to handle this in the coming week. 
> Fred had the interesting suggestion of removing pydoc in Py3K based on the thinking that documentation tools like pydoc should be external to Python. With the docs now so easy to generate directly, should pydoc perhaps just be gutted to only what is needed for help() to work? >> Issue 2919 - profile and cProfile needs to be merged. This has not >> been dealt with yet. Would it be reasonable to deprecate importing >> cProfile directly in 2.6 with the assumption the merge will work out >> for 3.0? > > That's not the right way to go, you don't want to deprecate cStringIO > or cPickle either. > Yeah, sorry, you're right. Guess my brain was not fully working when I wrote that. =) >> So that is everything that's left. Issue 2775 is the tracking issue so >> you can look there to see what issues are still open and need work. I >> was hoping to spend Monday and Tuesday trying to tie up as many loose >> ends as possible, but the conference paper I have been working on that >> was due Sunday is now due a week later, and so Monday and Tuesday will >> be spent on that (supervisor's orders). Plus I am flying out Wednesday >> for 10 days to help my mother move and I don't know when I will get >> Net again. In other words, I still need help. =) > > Let's hope we get this right in time. > > Then again, there are lots of other release blockers, so it may well be > that the beta is delayed by some time. Guess it depends on the whim of the release manager. 
=) -Brett

From allyourcode at gmail.com Thu May 29 19:51:13 2008
From: allyourcode at gmail.com (allyourcode at gmail.com)
Date: Thu, 29 May 2008 10:51:13 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To:
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com>
Message-ID: <7c8225f20805291051x3f81b716p5caadafefc4af4ed@mail.gmail.com>

This is in response to Stefan Behnel, who wrote

----

Tutorial section on "tuples and sequences", not quite the most hidden place in the universe.

http://docs.python.org/tut/node7.html#SECTION007300000000000000000

Stefan

----

I just read that section twice and nowhere does it mention that Python does what I was suggesting. In fact, by the discussion on sequence unpacking, it seems to imply that Python does *not* do what I wanted.

PS: Sorry for letting all of this sarcasm get under my skin and clog up the mailing list, but it's really mean-spirited and unnecessary.

From wescpy at gmail.com Thu May 29 20:06:58 2008
From: wescpy at gmail.com (wesley chun)
Date: Thu, 29 May 2008 11:06:58 -0700
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <483E7BB9.5060002@gmail.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483E7BB9.5060002@gmail.com>
Message-ID: <78b3a9580805291106v283d8502gc22eae871e0f5aa6@mail.gmail.com>

> wesley chun wrote:
>>
>> i have to resort to the uglier:
>> >>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i)
>> 'dec: 45/oct: 0o55/hex: 0X2D'

[Nick Coghlan ]:
> Is being explicit about the displayed prefix really that much uglier? The
> old # alternative display formats were somewhat arbitrary.

[Eric Smith ]:
> I don't see it as a big problem. You can now use any prefix you want,
> instead of the hard coded values that # supplied.

based on both your replies, it sounds like it's going away! :-) no, i don't have a problem with it.
however, it'd be nice to put something about this in the PEP in case anyone else wonders/asks. >> print format(10.0, "7.3g") [Nick Coghlan ]: > It works fine as written in 2.x :) > (but, yes, you're right that as a 3000-series PEP, 3101 should probably > treat print() as a function in its examples) [Eric Smith ]: > Fixed in r63786. Thanks for catching it. There was another print() > function already in the PEP, so clearly the intent was to be 3.0 compliant. another 2 suggestions then (only pick one): 1. if both str.format() and format() are going to be backported to 2.x, there should be an example of it there too (see below where i'm also taking an additional liberty of changing "g" to "f" which i use more and give another number as an example): 2.x: >>> print format(10.8765, '7.2f') 10.88 3.x: >>> print(format(10.8765, '7.2f')) 10.88 2. drop the print altogether, esp. since this is about strings. >>> format(10.8765, '7.2f') ' 10.88' cheers, -wesley From mal at egenix.com Thu May 29 20:08:57 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 29 May 2008 20:08:57 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483ECF94.7060607@cheimes.de> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> Message-ID: <483EF139.8000606@egenix.com> Christian, so far you have not responded to any of the suggestions made on this thread, only defended your checkin. That's not very helpful in getting to some conclusion. * What's so hard about going with a proper, standard solution that doesn't involve using your preprocessor hack ? * Why can't we have both PyString *and* PyBytes exposed in 2.x, with one redirecting to the other ? 
* Why should the 2.x code base turn to hacks, just because 3.x wants to restructure itself ? * Why aren't you even considering my proposed solution for this whole renaming and reorg problem ? BTW: Is there some PEP or wiki page explaining how you actually implement the merging from 2.x to 3.x ? I'm still under the assumption that you're only using svnmerge.py for this and doing straight merging from the trunk to the branch. Not sure how others feel about it, but if the only option you would feel comfortable with is not having the 3.x renaming backported, then I'd rather go with that, really. It's easy enough to add a header file to map PyString APIs to PyBytes if you want to port an extension to 3.x. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, May 29 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 38 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 On 2008-05-29 17:45, Christian Heimes wrote: > M.-A. Lemburg schrieb: >> Well, first of all, it is a change in the C API: >> APIs have different names now, they live in different files, >> the Python documentation doesn't apply anymore, books have to >> be updated, programmers trained, etc. etc. That's fine for >> 3.x, it's not for 2.x. > > No, that's not correct. The 2.x API is still the same. I've only changed > the internal code. > >> Second, if you leave out the "ease merging" argument, all of >> this is not really necessary in 2.x. 
If you absolutely want >> to have PyBytes APIs in 2.x, then you can *add* them, without >> removing the PyString APIs. We have done that on a smaller >> scale a couple of times in the past (turned functions into >> macros or vice-versa). > > The PyString methods are still available and the official API for > dealing with str objects in 2.x. > >> And finally, the "merge" argument itself is not really all that >> strong. It's just a matter of getting the procedure corrected. >> Then you can rename and restructure as much as you want in >> 3.x - without affecting the stability and matureness of the >> 2.x branch. > > I'm volunteering to revert my chances if you are volunteering to keep > the Python 2.x series in sync with the 3.x series. > > Christian > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/mal%40egenix.com From qrczak at knm.org.pl Thu May 29 20:16:28 2008 From: qrczak at knm.org.pl (=?UTF-8?Q?Marcin_=E2=80=98Qrczak=E2=80=99_Kowalczyk?=) Date: Thu, 29 May 2008 20:16:28 +0200 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <483EAF95.5050503@trueblade.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> Message-ID: <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> 2008/5/29 Eric Smith : > I don't see it as a big problem. You can now use any prefix you want, > instead of the hard coded values that # supplied. Except that it works incorrectly for negative numbers. 
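
The point about negative numbers can be checked with a short sketch. It assumes a current Python 3 release, where the '#' alternate form is accepted by str.format(); that was not the case in the build discussed in this thread. The variable names are illustrative only.

```python
i = -45

# A literal prefix written into the template is emitted before the sign.
manual = '0o{0:o}'.format(i)

# The '#' alternate form (an assumption: current Python 3 releases)
# places the sign in front of the prefix.
alternate = '{0:#o}'.format(i)

print(manual)
print(alternate)
```

With a hand-written prefix the minus sign ends up between the prefix and the digits, which is the incorrect result referred to above.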
-- Marcin Kowalczyk qrczak at knm.org.pl http://qrnik.knm.org.pl/~qrczak/ From guido at python.org Thu May 29 21:19:52 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 29 May 2008 12:19:52 -0700 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805291051x3f81b716p5caadafefc4af4ed@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com> <7c8225f20805291051x3f81b716p5caadafefc4af4ed@mail.gmail.com> Message-ID: On Thu, May 29, 2008 at 10:51 AM, wrote: > This is in response to Stefan Behnel, who wrote > > ---- > > Tutorial section on "tuples and sequences", not quite the most hidden place in > the universe. > > http://docs.python.org/tut/node7.html#SECTION007300000000000000000 > > Stefan > > ---- > > I just read that section twice and no where does it mention that > Python does what I was suggesting. In fact, by the discussion on > sequence unpacking, it seems to imply that Python does *not* do what I > wanted. It implies no such thing. It says "tuples may be nested" but fails to show an example of a nested tuple on the LHS of an assignment. I don't see how you can draw the conclusion from this that it's not supported. > PS: Sorry for letting all of this sarcasm get under my skin and clog > up the mailing list, but it's really mean spirited and unnecssary. You can't very well demand that a tutorial have examples of every feature. Perhaps (continuing the sarcasm for a bit) you also didn't think that tuples could be nested more than one level, since there is no example of that? Plus, you could have tried this in the interactive interpreter in 10 seconds rather than spawning a long thread. 
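
For reference, a minimal sketch of a nested tuple on the left-hand side of an assignment, of the kind that can be tried in the interactive interpreter. The names and values are made up for illustration and are not taken from the tutorial or from the original suggestion.

```python
# One level of nesting on the left-hand side.
origin, (width, height) = (0, 0), (640, 480)

# Nesting can go deeper than one level.
a, (b, (c, d)) = 1, (2, (3, 4))

print(origin, width, height)
print(a, b, c, d)
```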
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From phd at phd.pp.ru Thu May 29 21:21:57 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 29 May 2008 23:21:57 +0400 Subject: [Python-3000] PEP: str(container) should call str(item), not repr(item) Message-ID: <20080529192157.GA17896@phd.pp.ru> Hello. A draft for a discussion. PEP: XXX Title: str(container) should call str(item), not repr(item) Version: $Revision$ Last-Modified: $Date$ Author: Oleg Broytmann , Jim Jewett Discussions-To: python-3000 at python.org Status: Draft Type: Standards Track Content-Type: text/plain Created: 27-May-2008 Post-History: 28-May-2008 Abstract This document discusses the advantages and disadvantages of the current implementation of str(container). It also discusses the pros and cons of a different approach - to call str(item) instead of repr(item). Motivation Currently str(container) calls repr on items. Arguments for it: -- containers refuse to guess what the user wants to see on str(container) - surroundings, delimiters, and so on; -- repr(item) usually displays type information - apostrophes around strings, class names, etc. Arguments against: -- it's illogical; str() is expected to call __str__ if it exists, not __repr__; -- there is no standard way to print a container's content calling items' __str__, that's inconvenient in cases where __str__ and __repr__ return different results; -- repr(item) sometimes do wrong things (hex-escapes non-ascii strings, e.g.) This PEP proposes to change how str(container) works. It is proposed to mimic how repr(container) works except one detail - call str on items instead of repr. This allows a user to choose what results she want to get - from item.__repr__ or item.__str__. Current situation Most container types (tuples, lists, dicts, sets, etc.) 
do not implement a __str__ method, so str(container) calls container.__repr__, and container.__repr__, once called, forgets it is called from str and always calls repr on the container's items.

This behaviour has advantages and disadvantages. One advantage is that most items are represented with type information - strings are surrounded by apostrophes, instances may have both class name and instance data:

    >>> print([42, '42'])
    [42, '42']
    >>> print([Decimal('42'), datetime.now()])
    [Decimal("42"), datetime.datetime(2008, 5, 27, 19, 57, 43, 485028)]

The disadvantage is that __repr__ often returns technical data (like '') or an unreadable string (a hex-escaped string if the input is a non-ascii string):

    >>> print(['тест'])
    ['\xd4\xc5\xd3\xd4']

One of the motivations for PEP 3138 is that neither repr nor str will allow the sensible printing of dicts whose keys are non-ascii text strings. Now that unicode identifiers are allowed, this includes Python's own attribute dicts. This also includes JSON serialization (and caused some hoops for the json lib).

PEP 3138 proposes to fix this by breaking the "repr is safe ASCII" invariant, and changing the way repr (which is used for persistence) outputs some objects, with system-dependent failures.

Changing how str(container) works would allow easy debugging in the normal case, and retain the safety of ASCII-only for the machine-readable case. The only downside is that str(x) and repr(x) would more often be different -- but only in those cases where the current almost-the-same version is insufficient.

It also seems illogical that str(container) calls repr on items instead of str.
It's only logical to expect the following code

    class Test:
        def __str__(self):
            return "STR"

        def __repr__(self):
            return "REPR"


    test = Test()
    print(test)
    print(repr(test))
    print([test])
    print(str([test]))

to print

    STR
    REPR
    [STR]
    [STR]

where it actually prints

    STR
    REPR
    [REPR]
    [REPR]

Especially it is illogical to see that print in Python 2 uses str if it is called on what seems to be a tuple:

    >>> print Decimal('42'), datetime.now()
    42 2008-05-27 20:16:22.534285

where on an actual tuple it prints

    >>> print((Decimal('42'), datetime.now()))
    (Decimal("42"), datetime.datetime(2008, 5, 27, 20, 16, 27, 937911))

A different approach - call str(item)

For example, with numbers it is often only the value that people care about.

    >>> print Decimal('3')
    3

But putting the value in a list forces users to read the type information, exactly as if repr had been called for the benefit of a machine:

    >>> print [Decimal('3')]
    [Decimal("3")]

After this change, the type information would not clutter the str output:

    >>> print "%s".format([Decimal('3')])
    [3]
    >>> str([Decimal('3')]) # ==
    [3]

But it would still be available if desired:

    >>> print "%r".format([Decimal('3')])
    [Decimal('3')]
    >>> repr([Decimal('3')]) # ==
    [Decimal('3')]

There are a number of strategies to fix the problem. The most radical is to change __repr__ so it accepts a new parameter (flag) "called from str, so call str on items, not repr". The drawback of the proposal is that every __repr__ implementation must be changed. Introspection could help a bit (inspect __repr__ before calling to see if it accepts 2 or 3 parameters), but introspection doesn't work on classes written in C, like all builtin containers.

A less radical proposal is to implement __str__ methods for builtin container types. The obvious drawback is a duplication of effort - all those __str__ and __repr__ implementations differ in only one small detail - whether they call str or repr on items.
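As an illustration of the "less radical" strategy, here is a minimal sketch of a list subclass whose __str__ mirrors the layout of __repr__ but calls str on the items. The class name StrList is hypothetical and is not part of the proposal; builtin containers do not behave this way.

```python
class StrList(list):
    """Hypothetical list subclass: str() uses str(item), repr() is unchanged."""

    def __str__(self):
        # Same brackets and delimiters as list.__repr__, but each item
        # goes through str() instead of repr().
        return "[" + ", ".join(str(item) for item in self) + "]"


class Test:
    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"


items = StrList([Test(), 42])
print(str(items))   # items rendered with str()
print(repr(items))  # inherited list.__repr__, items rendered with repr()
```

Unlike list.__repr__, this sketch does not guard against a container that contains itself, so a real implementation would need that extra check.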
The most conservative proposal is not to change str at all but to allow developers to implement their own application- or library-specific pretty-printers. The drawback is again a multiplication of effort and proliferation of many small specific container-traversal algorithms. Backward compatibility In those cases where type information is more important than usual, it will still be possible to get the current results by calling repr explicitly. Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From guido at python.org Thu May 29 21:31:17 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 29 May 2008 12:31:17 -0700 Subject: [Python-3000] PEP: str(container) should call str(item), not repr(item) In-Reply-To: <20080529192157.GA17896@phd.pp.ru> References: <20080529192157.GA17896@phd.pp.ru> Message-ID: Let me just save everyone a lot of time and say that I'm opposed to this change, and that I believe that it would cause way too much disturbance to be accepted this close to beta. --Guido On Thu, May 29, 2008 at 12:21 PM, Oleg Broytmann wrote: > Hello. A draft for a discussion. > > PEP: XXX > Title: str(container) should call str(item), not repr(item) > Version: $Revision$ > Last-Modified: $Date$ > Author: Oleg Broytmann , > Jim Jewett > Discussions-To: python-3000 at python.org > Status: Draft > Type: Standards Track > Content-Type: text/plain > Created: 27-May-2008 > Post-History: 28-May-2008 > > > Abstract > > This document discusses the advantages and disadvantages of the > current implementation of str(container). It also discusses the > pros and cons of a different approach - to call str(item) instead > of repr(item). > > > Motivation > > Currently str(container) calls repr on items. 
> -- > Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Thu May 29 21:41:51 2008 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 29 May 2008 15:41:51 -0400 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> Message-ID: <483F06FF.9090007@trueblade.com> Marcin 'Qrczak' Kowalczyk wrote: > 2008/5/29 Eric Smith: > >> I don't see it as a big problem. You can now use any prefix you want, >> instead of the hard coded values that # supplied. > > Except that it works incorrectly for negative numbers. Excellent point. If only this had been brought up back when the PEP was written :( Any suggestions on how to improve the situation? I guess we could add '#' back in to the format specifier. I can't really think of any other way that doesn't involve converting the number to a string and then operating on that, just to get the sign. I'm reasonably sure I could implement that before the beta (next Wednesday) if a decision is reached before this weekend. Eric. From stephen at xemacs.org Thu May 29 22:00:35 2008 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 30 May 2008 05:00:35 +0900 Subject: [Python-3000] PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com> <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com> <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp> <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com> <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87y75skhi4.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > On 5/26/08, Stephen J. Turnbull wrote: > > Jim Jewett writes: > > > > The only reason for this change is that __repr__ gets used when > > > __str__ *should* be used instead. > > > That's not what the advocates say. > > I still haven't seen a use case where it *should* be using repr *and* > needs to print outside of ASCII. I suggest that's because you rarely (if ever) read program or program *textual* input or output that's not written in ASCII. > > Now, I agree with you about what's "safe". However, in a text- > > processing application in a Japanese environment, that's hardly > > useful, and our Japanese programmer can argue that in his environment, > > printing all of Unicode *is* safe. > > I think he or she will still be wrong, because of confusables -- it is > just that "unsafe" characters are far more rare (since byte value > alone isn't a problem) and the cost of not printing non-ASCII > characters is higher. AFAIK confusables in strings are generally not a problem, that's part of what I mean by "environment". If they are, then you probably need to set up special controls in the environment anyway, and Python giving you Unicode escapes instead of glyphs is redundant. > > I don't use it myself other than as a way of diagnosing bugs in > > programs I write or maintain; in personal practice, I'm in your camp. 
> > But my understanding is that there is often an intermediate level, > > such as a website admin, who needs *some* of the precision of repr() > > such as escaped representation of whitespace, but also needs to be > > able read most of the output. > > Could someone who does need this explain more? I don't think that's useful. See below. > I don't understand needing *exactly* whitespace escaped, but not, say, > stray characters from scripts you've never used, even though the rest > of the page *is* in an expected script. Of course *everybody* wants *stray* characters escaped! The problem is that to a Japanese, the 21000 kanji are *not* stray characters. To a Korean, the 21000 kanji and the 11000 Hangul are not stray characters. Etc. So the first question is "can repr()'s printable repertoire usefully be made locale-dependent?", and the answer is emphatically "no". (I'm pretty sure that's a pronouncement from Guido, I could look it up later.) The next question is "what is the most useful compromise?", and the candidates are "ASCII" and "all of Unicode". You want the former, and the 5.7 billion people whose native language is not American English want the latter. I don't know about the other 300 million Americans. From phd at phd.pp.ru Thu May 29 21:57:57 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Thu, 29 May 2008 23:57:57 +0400 Subject: [Python-3000] PEP: str(container) should call str(item), not repr(item) In-Reply-To: References: <20080529192157.GA17896@phd.pp.ru> Message-ID: <20080529195757.GB17896@phd.pp.ru> On Thu, May 29, 2008 at 12:31:17PM -0700, Guido van Rossum wrote: > Let me just save everyone a lot of time and say that I'm opposed to > this change, and that I believe that it would cause way too much > disturbance to be accepted this close to beta. > > On Thu, May 29, 2008 at 12:21 PM, Oleg Broytmann wrote: > > PEP: XXX > > Title: str(container) should call str(item), not repr(item) That's ok. A rejected PEP has its purpose, too. 
It will rest peacefully in the archive, holding all arguments consolidated and will serve as a point of reference. Any objection if I demand it be properly registered, assigned a number and then rejected? PS. Am I the champion whose PEP has been killed before I even finished it? ;) Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From stephen at xemacs.org Thu May 29 22:15:47 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 30 May 2008 05:15:47 +0900 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com> <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com> Message-ID: <87wslckgss.fsf@uwakimon.sk.tsukuba.ac.jp> allyourcode at gmail.com writes: > Well, I'm sorry for bothering his majesty with such a stupid idea. At > least one other person didn't know about it either... > > On 5/28/08, Mike Klaas wrote: > > I find it hard to believe that you have even attempted this, which has > > been valid in python for ages: Um, stupidity (in the sense of not understanding all the implications of the grammar) or ignorance (of the relevant section of the docs) is not the point. The point is that the proposed syntax (a) might already mean something (even the semantics you suggest, in which case you should say "d'oh, thanks!" when it is pointed out), or (b) not be feasible for reasons that become obvious when you see the error message that is emitted when you try it. So try it before posting. 
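For reference, the kind of quick check suggested here can be sketched as a short script. The variable names are illustrative only, and the exact wording of the error message varies between Python versions.

```python
# Nested targets on the left-hand side are matched against the
# structure of the value on the right-hand side.
b = [0, 1, 2]
(a, (b[2], c)) = ('big', ('red', 'dog'))
print(a, b, c)

# When the shapes do not match, the assignment raises ValueError.
failed = False
try:
    (x, (y, z)) = ('big', ('red',))
except ValueError as exc:
    failed = True
    print("unpacking failed:", exc)
```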
From musiccomposition at gmail.com Thu May 29 22:49:08 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 29 May 2008 15:49:08 -0500 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: References: Message-ID: <1afaf6160805291349i6ca25d6fhef22eb6c3abd5a7d@mail.gmail.com> On Wed, May 28, 2008 at 11:38 PM, Brett Cannon wrote: > > Issue 2854 - gestalt needs to be added back into 3.0. This is > Benjamin's issue. =) Is that your way of saying "Check in the patch!" ? :) -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From brett at python.org Thu May 29 23:01:52 2008 From: brett at python.org (Brett Cannon) Date: Thu, 29 May 2008 14:01:52 -0700 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: <1afaf6160805291349i6ca25d6fhef22eb6c3abd5a7d@mail.gmail.com> References: <1afaf6160805291349i6ca25d6fhef22eb6c3abd5a7d@mail.gmail.com> Message-ID: On Thu, May 29, 2008 at 1:49 PM, Benjamin Peterson wrote: > On Wed, May 28, 2008 at 11:38 PM, Brett Cannon wrote: >> >> Issue 2854 - gestalt needs to be added back into 3.0. This is >> Benjamin's issue. =) > > Is that your way of saying "Check in the patch!" ? :) > More or less; specifically, "don't forget to do this." =) -Brett From musiccomposition at gmail.com Thu May 29 23:04:43 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 29 May 2008 16:04:43 -0500 Subject: [Python-3000] PEP: str(container) should call str(item), not repr(item) In-Reply-To: <20080529195757.GB17896@phd.pp.ru> References: <20080529192157.GA17896@phd.pp.ru> <20080529195757.GB17896@phd.pp.ru> Message-ID: <1afaf6160805291404j3624a914hd08897303a79c9da@mail.gmail.com> On Thu, May 29, 2008 at 2:57 PM, Oleg Broytmann wrote: > On Thu, May 29, 2008 at 12:31:17PM -0700, Guido van Rossum wrote: >> Let me just save everyone a lot of time and say that I'm opposed to >> this change, and that I believe that it would cause way too much >> disturbance to be accepted this close to beta.
>> >> On Thu, May 29, 2008 at 12:21 PM, Oleg Broytmann wrote: >> > PEP: XXX >> > Title: str(container) should call str(item), not repr(item) > > That's ok. A rejected PEP has its purpose, too. It will rest peacefully > in the archive, holding all arguments consolidated and will serve as a point > of reference. > Any objection if I demand it be properly registered, assigned a number > and then rejected? I've added it for you. See r63794. > > PS. Am I the champion whose PEP has been killed before I even finished it? ;) Probably not. :) > > Oleg. > -- > Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru > Programmers don't die, they just GOSUB without RETURN. > -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From phd at phd.pp.ru Thu May 29 23:16:58 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 30 May 2008 01:16:58 +0400 Subject: [Python-3000] PEP 3140: str(container) should call str(item), not repr(item)s In-Reply-To: <1afaf6160805291404j3624a914hd08897303a79c9da@mail.gmail.com> References: <20080529192157.GA17896@phd.pp.ru> <20080529195757.GB17896@phd.pp.ru> <1afaf6160805291404j3624a914hd08897303a79c9da@mail.gmail.com> Message-ID: <20080529211658.GB19274@phd.pp.ru> On Thu, May 29, 2008 at 04:04:43PM -0500, Benjamin Peterson wrote: > On Thu, May 29, 2008 at 2:57 PM, Oleg Broytmann wrote: > > Any objection if I demand it be properly registered, assigned a number > > and then rejected? > > I've added it for you. See r63794. Thank you! Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
From greg.ewing at canterbury.ac.nz Fri May 30 00:26:12 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 30 May 2008 10:26:12 +1200 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> Message-ID: <483F2D84.10604@canterbury.ac.nz> Daniel Wong wrote: > Are there plans for introducing syntax like this: > > (a, (b[2], c)) = ('big' ('red', 'dog')) I think you'll find Guido has made another trip in the time machine for this one: Python 2.3 (#1, Aug 5 2003, 15:52:30) [GCC 3.1 20020420 (prerelease)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> b = [0,1,2] >>> (a, (b[2], c)) = ('big', ('red', 'dog')) >>> a 'big' >>> b [0, 1, 'red'] >>> c 'dog' >>> -- Greg From ncoghlan at gmail.com Fri May 30 00:57:07 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 30 May 2008 08:57:07 +1000 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483EF139.8000606@egenix.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> Message-ID: <483F34C3.3050402@gmail.com> M.-A. Lemburg wrote: > * Why can't we have both PyString *and* PyBytes exposed in 2.x, > with one redirecting to the other ? We do have that - the PyString_* names still work perfectly fine in 2.x. They just won't be used in the Python core codebase anymore - everything in the Python core will use either PyBytes_* or PyUnicode_* regardless of which branch (2.x or 3.x) you're working on. 
I think that's a good thing for ease of maintenance in the future, even if it takes people a while to get their heads around it right now. > * Why should the 2.x code base turn to hacks, just because 3.x wants > to restructure itself ? With the better explanation from Greg of what the checked in approach achieves (i.e. preserving exact ABI compatibility for PyString_*, while allowing PyBytes_* to be used at the source code level), I don't see what has been done as being any more of a hack than the possibly more common "#define " (which *would* break binary compatibility). The only things that I think would tidy it up further would be to: - include an explanation of the approach and its effects on API and ABI backward and forward compatibility within 2.x and between 2.x and 3.x in stringobject.h - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Fri May 30 01:10:23 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 30 May 2008 09:10:23 +1000 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <483F06FF.9090007@trueblade.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> <483F06FF.9090007@trueblade.com> Message-ID: <483F37DF.5060507@gmail.com> Eric Smith wrote: > Marcin 'Qrczak' Kowalczyk wrote: >> 2008/5/29 Eric Smith: >> >>> I don't see it as a big problem. You can now use any prefix you want, >>> instead of the hard coded values that # supplied. >> >> Except that it works incorrectly for negative numbers. > > Excellent point. If only this had been brought up back when the PEP was > written :( > > Any suggestions on how to improve the situation?
I guess we could add > '#' back in to the format specifier. I can't really think of any other > way that doesn't involve converting the number to a string and then > operating on that, just to get the sign. > > I'm reasonably sure I could implement that before the beta (next > Wednesday) if a decision is reached before this weekend. Doing the right thing for negative numbers is a good point. It also means the prefix can be handled properly when dealing with aligned fields. The following update to the standard format specifier in the PEP: [[fill]align][#][sign][0][minimumwidth][.precision][type] The '#' prefix option inserts the appropriate prefix characters ('0b', '0o', '0x', '0X') when displaying numbers in binary, octal or hexadecimal formats. The prefix is inserted into the displayed number after the sign character and fill characters (if any), but before any leading zeroes. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From wescpy at gmail.com Fri May 30 01:34:39 2008 From: wescpy at gmail.com (wesley chun) Date: Thu, 29 May 2008 16:34:39 -0700 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <483F06FF.9090007@trueblade.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> <483F06FF.9090007@trueblade.com> Message-ID: <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com> On 5/29/08, Eric Smith wrote: > Marcin 'Qrczak' Kowalczyk wrote: > > Except that it works incorrectly for negative numbers. wow, that is a great point. i didn't think of this either. it makes it very inconvenient (see below) and makes it more difficult to say we've completely replaced the '%' operator.
> I can't really think of any other way that doesn't involve converting the > number to a string and then operating on that, just to get the sign. here's one way of doing it without converting to a string first (it's ugly too): >>> i = -45 >>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i)) '-0x2d' thx for putting it (back) in, -wesley From humberto at digi.com.br Fri May 30 02:25:21 2008 From: humberto at digi.com.br (Humberto Diogenes) Date: Thu, 29 May 2008 21:25:21 -0300 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: References: Message-ID: <21258346-305C-439D-A7F0-EE945FE77B37@digi.com.br> On 29/05/2008, at 14:32, Brett Cannon wrote: > On Thu, May 29, 2008 at 12:12 AM, Georg Brandl > wrote: >> >>> Issue 2848 - mimetools has been deprecated for a while, but it is >>> still used in a bunch of places. Since this has been deprecated in >>> PEP >>> 4 for a long time, should we add the removal warning in 2.6 now and >>> then make its actual removal of usage something to do by another >>> beta? >>> >>> Issue 2849 - rfc822 is the same problem as mimetools. >> >> The problem is that nobody seems to know what exactly distinguishes >> mimetools/rfc822' classes and its successor's (email's) classes, so >> it's hard to replace it in the stdlib. >> > > Right. I have looked myself over the years and it never seemed > brain-dead simple. Well, as documented in issue 2849, rfc822 is almost gone. I've already removed it from mailbox and test_urllib2 modules. It seems that there remains only one important use of it, which is in cgi.FieldStorage.read_multi(). I couldn't figure out how to replace it there, though, as read_multi's current implementation relies on the fact that rfc822.Message(fp) advances the file pointer just by the amount it needs, while email.parser.Parser.parse() reads the whole file. I believe that read_multi can be rewritten in a way that's compatible with email.parser, but I don't know how to do that... 
:\ -- Humberto Diógenes http://humberto.digi.com.br From eric+python-dev at trueblade.com Fri May 30 02:45:46 2008 From: eric+python-dev at trueblade.com (Eric Smith) Date: Thu, 29 May 2008 20:45:46 -0400 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> <483F06FF.9090007@trueblade.com> <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com> Message-ID: <483F4E3A.9090403@trueblade.com> wesley chun wrote: > On 5/29/08, Eric Smith wrote: >> Marcin 'Qrczak' Kowalczyk wrote: >>> Except that it works incorrectly for negative numbers. > > wow, that is a great point. i didn't think of this either. it makes > it very inconvenient (see below) and makes it more difficult to say > we've completely replaced the '%' operator. > > >> I can't really think of any other way that doesn't involve converting the >> number to a string and then operating on that, just to get the sign. > > here's one way of doing it without converting to a string first (it's ugly too): > >>>> i = -45 >>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i)) > '-0x2d' Agreed, ick! > thx for putting it (back) in, I didn't say I would, I said I would if a decision was reached :) I'd like to see some more consensus, and I hope that Talin (the PEP author) chimes in. Eric.
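To make the sign problem in this thread concrete, here is a small sketch. It assumes a Python version in which str.format accepts the '#' option for integers, as released versions of Python 3 do; at the time of this message that option was still under discussion.

```python
i = -45

# Hand-written prefix: the sign ends up after the prefix.
manual = '0x{0:x}'.format(i)

# The workaround quoted above: split off the sign by hand.
workaround = '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i))

# With the '#' option the prefix is placed after the sign.
alternate = '{0:#x}'.format(i)

print(manual)      # 0x-2d
print(workaround)  # -0x2d
print(alternate)   # -0x2d
```

The first form is the one that "works incorrectly for negative numbers"; the other two agree on the expected output.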
From greg.ewing at canterbury.ac.nz Fri May 30 03:07:47 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 30 May 2008 13:07:47 +1200 Subject: [Python-3000] suggestion: structured assignment In-Reply-To: <7c8225f20805291744n62431d93y1c0c484d0387a01d@mail.gmail.com> References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com> <483F2D84.10604@canterbury.ac.nz> <7c8225f20805291744n62431d93y1c0c484d0387a01d@mail.gmail.com> Message-ID: <483F5363.6030409@canterbury.ac.nz> Daniel Wong wrote: > Ironic that you should mention it. He already mentioned it. The time machine thing is pretty much a standard joke in the Python community, which goes to show how common it is for people to be pleasantly surprised by what Python already does. I think everyone's being a bit hard on Mr. Wong here. When you're new to Python, you don't always realise at first how deep and subtle a thing you're dealing with. I was guilty of failing to follow the try-it-first rule myself in the early days. I soon learned better. In fact, it works the other way too -- the transcript I posted was the result of me thinking "Doesn't that already work? Hang on, I'd better try it first to make sure it really does..." :-) -- Greg From greg at krypto.org Fri May 30 09:45:25 2008 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 30 May 2008 00:45:25 -0700 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483F34C3.3050402@gmail.com> References: <48397ECC.9070805@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> Message-ID: <52dc1c820805300045g5c37256em31de5f5d76dc365b@mail.gmail.com> On Thu, May 29, 2008 at 3:57 PM, Nick Coghlan wrote: > M.-A. 
Lemburg wrote: >> * Why should the 2.x code base turn to hacks, just because 3.x wants >> to restructure itself ? > > With the better explanation from Greg of what the checked in approach > achieves (i.e. preserving exact ABI compatibility for PyString_*, while > allowing PyBytes_* to be used at the source code level), I don't see what > has been done as being any more of a hack than the possibly more common > "#define " (which *would* break binary compatibility). > > The only things that I think would tidy it up further would be to: > - include an explanation of the approach and its effects on API and ABI > backward and forward compatibility within 2.x and between 2.x and 3.x in > stringobject.h > - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0 Yes that is the only complaint I believe I really see left at this point. It is easy enough to fix. Change the current stringobject.h "#define PyBytes_Foo PyString_Foo" approach into a .c file that defines one line stub functions for all PyString_Foo() functions to call actual PyBytes_Foo() functions. I'd even go so far as to put the one line alternate name stubs in the Objects/bytesobject.c and .h file right next to the PyBytes_Foo() method definitions so that its clear from reading a single file that they are the same thing. The performance implications of this are minor all things considered (a single absolute jmp given a good compiler) and regardless of what we do should only apply to extension modules, not the core. If we do the above in trunk will this thread end? I'm personally not really clear on why we need PyBytes_Foo to show up in the -binary- ABI in 2.6. The #define's are enough for me but I'm happy to make this compromise. No 2.x books, documentation or literature will be invalidated by the changes regardless. 
-gps From oliphant.travis at ieee.org Fri May 30 10:21:59 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri, 30 May 2008 03:21:59 -0500 Subject: [Python-3000] Single buffer implied in new buffer protocol? In-Reply-To: References: Message-ID: Stefan Behnel wrote: > Travis Oliphant wrote: >> Stefan Behnel wrote: >>> Anyway, my point is that this part of the protocol actually implies >>> setting a >>> lock on the buffer *provider* rather than the buffer itself, as the >>> buffer >>> provider cannot distinguish between different buffers based on a NULL >>> pointer >> Yes, the language in the PEP could be more clear. Obviously, if you >> haven't provided a Py_buffer structure to fill in, then you are only >> asking to lock the object's buffer from other access. > > That's what I'm questioning below. > I see what you are referring to. The protocol to lock the buffer after requesting and obtaining one was not well thought out. I think the use case I had in mind was locking in the buffer before actually getting it. Once you have a buffer, I see how you may want to lock the buffer after getting it. For example, I could see how you may want to go from a non-locked read/write where you are guaranteed by the object that it won't move the memory but not that someone hasn't written to the memory area to an exclusive write-lock where no-one else can write to the area until you are done. This should be clarified in the PEP. Can you take a stab at it? -Travis From mal at egenix.com Fri May 30 10:37:08 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Fri, 30 May 2008 10:37:08 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483F34C3.3050402@gmail.com> References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com> <483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> Message-ID: <483FBCB4.5020007@egenix.com> On 2008-05-30 00:57, Nick Coghlan wrote: > M.-A. Lemburg wrote: >> * Why can't we have both PyString *and* PyBytes exposed in 2.x, >> with one redirecting to the other ? > > We do have that - the PyString_* names still work perfectly fine in 2.x. > They just won't be used in the Python core codebase anymore - everything > in the Python core will use either PyBytes_* or PyUnicode_* regardless > of which branch (2.x or 3.x) you're working on. I think that's a good > thing for ease of maintenance in the future, even if it takes people a > while to get their heads around it right now. Sorry, I probably wasn't clear enough: Why can't we have both PyString *and* PyBytes exposed as C APIs (ie. visible in code and in the linker) in 2.x, with one redirecting to the other ? >> * Why should the 2.x code base turn to hacks, just because 3.x wants >> to restructure itself ? > > With the better explanation from Greg of what the checked in approach > achieves (i.e. preserving exact ABI compatibility for PyString_*, while > allowing PyBytes_* to be used at the source code level), I don't see > what has been done as being any more of a hack than the possibly more > common "#define " (which *would* break binary > compatibility). 
> > The only things that I think would tidy it up further would be to: > - include an explanation of the approach and its effects on API and ABI > backward and forward compatibility within 2.x and between 2.x and 3.x in > stringobject.h > - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0 Which is what I was suggesting all along; sorry if I wasn't clear enough on that. The standard approach is that you provide #define redirects from the old APIs to the new ones (which are then picked up by the compiler) *and* add function wrappers to the same effect (to make linkers, dynamic load APIs such as ctypes and debuggers happy). Example from pythonrun.h|c:
---------------------------
/* Use macros for a bunch of old variants */
#define PyRun_String(str, s, g, l) PyRun_StringFlags(str, s, g, l, NULL)

/* Deprecated C API functions still provided for binary compatibility */
#undef PyRun_String
PyAPI_FUNC(PyObject *)
PyRun_String(const char *str, int s, PyObject *g, PyObject *l)
{
    return PyRun_StringFlags(str, s, g, l, NULL);
}

I still believe that we should *not* make "ease of merging" the primary motivation for backporting changes in 3.x to 2.x. Software design should not be guided by restrictions in the tool chain, if not absolutely necessary. The main argument for a backport needs to be general usefulness to the 2.x users, IMHO... just like any other feature that makes it into 2.x. If merging is difficult then this needs to be addressed, but there are more options to that than always going back to the original 2.x trunk code. I've given a few suggestions on how this could be approached in other emails on this thread. -- Marc-Andre Lemburg eGenix.com From jgennis at gmail.com Fri May 30 11:13:08 2008 From: jgennis at gmail.com (Jamie Gennis) Date: Fri, 30 May 2008 02:13:08 -0700 Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka StringABC) In-Reply-To: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1> References: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1> Message-ID: Perhaps drawing a distinction between containers (or maybe "collections"?), and non-container iterables is appropriate? I would define containers as objects that can be iterated over multiple times and for which iteration does not instantiate new objects. By this definition generators would not be considered containers (but views would), and for practicality it may be worth also having an ABC for containers-and-generators (no idea what to name it). This would result in the following hierarchy:
iterables
- strings, bytes, etc.
- containers-and-generators
- - containers
- - - tuple, list, set, dict views, etc.
- - generators
I don't think there needs to be different operations defined for the different ABCs. They're all just iterables with different iteration semantics. Jamie On Tue, May 27, 2008 at 3:54 PM, Raymond Hettinger wrote: > "Jim Jewett" > >> It isn't really stringiness that matters, it is that you have to >> terminate even though you still have an iterable container. >> > > Well said. > > > Guido had at least a start in Searchable, back when ABC >> were still in the sandbox: >> > > Have to disagree here. An object cannot know in general > whether a flattener wants to split it or not.
That is an application > dependent decision. A better answer is be able to tell the > flattener what should be considered atomic in a given circumstance. > > > Raymond From solipsis at pitrou.net Fri May 30 11:51:45 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 May 2008 09:51:45 +0000 (UTC) Subject: [Python-3000] Exception re-raising woes References: Message-ID: Hi, I'm surprised that nobody except Robert Brewer reacted to my proposal. The two relevant bugs (#2507 and #2833) have been marked respectively as "critical" and "release blocker", so I thought at least some people felt concerned :-) Should I wait a bit for people to react and give a qualified opinion, or should I assume one of the following implicit answers (and if so, which one!): - we don't really care about re-raising, just fix #2507 the simple way so that exception state is properly cleaned up - we must fix both #2507 and #2833 in a clean way, and your proposal looks fine - we must fix both #2507 and #2833 in a clean way, but your proposal is completely bogus cheers Antoine. From g.brandl at gmx.net Fri May 30 14:19:23 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 30 May 2008 14:19:23 +0200 Subject: [Python-3000] urllib.quote/unquote behavior? Message-ID: Hi, Python 3.0's urllib.quote() and unquote() handle non-ASCII data strangely. quote() encodes characters with codepoint < 256 using latin-1, but others using utf-8. unquote() decodes everything using latin-1. Is the correct behavior to always use utf-8? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From phd at phd.pp.ru Fri May 30 14:42:30 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 30 May 2008 16:42:30 +0400 Subject: [Python-3000] urllib.quote/unquote behavior? In-Reply-To: References: Message-ID: <20080530124230.GA32657@phd.pp.ru> On Fri, May 30, 2008 at 02:19:23PM +0200, Georg Brandl wrote: > Python 3.0's urllib.quote() and unquote() handle non-ASCII data strangely. > quote() encodes characters with codepoint < 256 using latin-1, but others > using utf-8. unquote() decodes everything using latin-1. > > Is the correct behavior to always use utf-8? Always UTF-8. See http://en.wikipedia.org/wiki/Percent-encoding#Current_standard Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Fri May 30 16:07:32 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 May 2008 14:07:32 +0000 (UTC) Subject: [Python-3000] urllib.quote/unquote behavior? References: <20080530124230.GA32657@phd.pp.ru> Message-ID: Oleg Broytmann phd.pp.ru> writes: > On Fri, May 30, 2008 at 02:19:23PM +0200, Georg Brandl wrote: > > Python 3.0's urllib.quote() and unquote() handle non-ASCII data strangely. > > quote() encodes characters with codepoint < 256 using latin-1, but others > > using utf-8. unquote() decodes everything using latin-1. > > > > Is the correct behavior to always use utf-8? > > Always UTF-8. See > http://en.wikipedia.org/wiki/Percent-encoding#Current_standard Well, according to your link things are not that simple: """ This requirement was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected. 
""" Practically, in the particular case of HTTP, you must probably distinguish between the file path part (before the ? sign) and the query string part (after the ? sign). The file path percent-encoding may depend on the actual filesystem encoding, or the Web server configuration. The query string percent-encoding may depend on the actual Web application being queried, or the programming language in which it's written, or anything else altogether :-) Regards Antoine. From divinekid at gmail.com Fri May 30 16:57:45 2008 From: divinekid at gmail.com (Haoyu Bai) Date: Fri, 30 May 2008 22:57:45 +0800 Subject: [Python-3000] Any plan to export PyInstanceMethod? Message-ID: <484015E9.1060206@gmail.com> Hello, As I can see that there is a PyInstanceMethod_New C API in Python 3, which is a replacement of the old new.instancemethod. However, it is not exported to Python namespace such as builtin or other module currently. So I am curious that is there any plan to export it? Thank you! Best regards, Haoyu Bai 5/30/2008 From rrr at ronadam.com Fri May 30 17:50:08 2008 From: rrr at ronadam.com (Ron Adam) Date: Fri, 30 May 2008 10:50:08 -0500 Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka StringABC) In-Reply-To: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1> References: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1> Message-ID: <48402230.3020505@ronadam.com> Raymond Hettinger wrote: > "Jim Jewett" >> It isn't really stringiness that matters, it is that you have to >> terminate even though you still have an iterable container. > > Well said. > > >> Guido had at least a start in Searchable, back when ABC >> were still in the sandbox: > > Have to disagree here. An object cannot know in general > whether a flattener wants to split it or not. That is an application > dependent decision. A better answer is be able to tell the > flattener what should be considered atomic in a given circumstance. 
> > > Raymond A while back (a couple of years I think), we had a discussion on python-list about flatten in which I posted the following version of a flatten function. It turned out to be nearly twice as fast as any other version.
def flatten(L):
    """ Flatten a list in place. """
    i = 0
    while i < len(L):
        while type(L[i]) is list:
            L[i:i+1] = L[i]
        i += 1
    return L

For this to work the object to be flattened needs to be both mutable and list like. At the moment I can't think of any reason I would want to flatten anything that was not list like. To make it a bit more flexible it could be changed just a bit.

def flatten(L):
    """ Flatten a list in place. """
    objtype = type(L)
    i = 0
    while i < len(L):
        while type(L[i]) is objtype:
            L[i:i+1] = L[i]
        i += 1
    return L

Generally, I don't think you would want to flatten dissimilar objects. Cheers, Ron From guido at python.org Fri May 30 19:30:53 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 30 May 2008 10:30:53 -0700 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References: Message-ID: The issue you're raising is deep, subtle and complex -- I can't quite fathom your proposal, and expect I'd have to spend at least an hour with the source code before I could truly understand the issue and the proposal. I haven't done that yet, so take the following with a grain of salt. That said, it seems you are proposing taking the logical consequence of making except handlers properly nested and scoped, and if you can come up with a patch to implement this, I think I could support it. I would be okay as well with restricting bare raise syntactically to appearing only inside an except block, to emphasize the change in semantics that was started when we decided to make the optional variable disappear at the end of the except block. This would render the following code illegal:

def f():
    try: 1/0
    except: pass
    raise

I am fine with that, even if there are probably some uses of it that may be a little tricky to rewrite. (The same happened when we reduced the variable scope.) --Guido On Fri, May 30, 2008 at 2:51 AM, Antoine Pitrou wrote: > > Hi, > > I'm surprised that nobody except Robert Brewer reacted to my proposal.
The two > relevant bugs (#2507 and #2833) have been marked respectively as "critical" and > "release blocker", so I thought at least some people felt concerned :-) > > Should I wait a bit for people to react and give a qualified opinion, or should > I assume one of the following implicit answers (and if so, which one!): > > - we don't really care about re-raising, just fix #2507 the simple way so that > exception state is properly cleaned up > - we must fix both #2507 and #2833 in a clean way, and your proposal looks fine > - we must fix both #2507 and #2833 in a clean way, but your proposal is > completely bogus > > cheers > > Antoine. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Fri May 30 19:40:34 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 30 May 2008 11:40:34 -0600 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References: Message-ID: On Fri, May 30, 2008 at 3:51 AM, Antoine Pitrou wrote: > > Hi, > > I'm surprised that nobody except Robert Brewer reacted to my proposal. The two > relevant bugs (#2507 and #2833) have been marked respectively as "critical" and > "release blocker", so I thought at least some people felt concerned :-) Flip side of the bikeshed effect. Nobody feels confident in their understanding so nobody comments.
> Should I wait a bit for people to react and give a qualified opinion, or should > I assume one of the following implicit answers (and if so, which one!): > > - we don't really care about re-raising, just fix #2507 the simple way so that > exception state is properly cleaned up > - we must fix both #2507 and #2833 in a clean way, and your proposal looks fine > - we must fix both #2507 and #2833 in a clean way, but your proposal is > completely bogus I'd like if a bare "raise" became purely lexical (as Guido just suggested), ditching all the magic. However, things such as pdb.pm() still need access to the last exception. Maybe we can pare it down the bare minimum, a per-thread last_exception? That'd quickly get clobbered (we should intentionally clear when leaving an except block), but is that ever a problem? -- Adam Olsen, aka Rhamphoryncus From solipsis at pitrou.net Fri May 30 20:10:31 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 May 2008 18:10:31 +0000 (UTC) Subject: [Python-3000] Exception re-raising woes References: Message-ID: Hello, Guido van Rossum python.org> writes: > > That said, it seems you are proposing taking the logical consequence > of making except handlers properly nested and scoped, It's exactly that. > I would be okay as well with restricting bare raise syntactically to > appearing only inside an except block, to emphasize the change in > semantics that was started when we decided to make the optional > variable disappear at the end of the except block. > > This would render the following code illegal: > > def f(): > try: 1/0 > except: pass > raise Please note as well that: def f(): try: 1/0 except: pass return sys.exc_info() would return (None, None, None). Actually, it already does with the patch I proposed for #2507, and the test suite runs fine after fixing a problem in doctest.py. Regards Antoine. 
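A minimal sketch of the scoping behaviour described in the message above, assuming a released Python 3 interpreter; the function names are illustrative only and not taken from the thread. Inside the handler, sys.exc_info() reports the exception being handled; once the handler has been left, the state is cleared. As far as released CPython 3 goes, a bare raise outside any handler was not turned into a syntax error; with no active exception it fails at run time with RuntimeError instead.

```python
import sys


def inside_handler():
    try:
        1 / 0
    except ZeroDivisionError:
        # The exception state is visible while the handler runs.
        return sys.exc_info()[0]


def after_handler():
    try:
        1 / 0
    except ZeroDivisionError:
        pass
    # The handler has finished, so the exception state is gone.
    return sys.exc_info()


def bare_raise_outside_handler():
    try:
        1 / 0
    except ZeroDivisionError:
        pass
    raise  # nothing is being handled at this point
```

Called from ordinary code (not from within another except block), after_handler() returns a tuple of three None values, and bare_raise_outside_handler() raises RuntimeError rather than re-raising the swallowed ZeroDivisionError.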
From solipsis at pitrou.net Fri May 30 20:16:20 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 May 2008 18:16:20 +0000 (UTC) Subject: [Python-3000] Exception re-raising woes References: Message-ID: Adam Olsen gmail.com> writes: > I'd like if a bare "raise" became purely lexical (as Guido just > suggested), ditching all the magic. > > However, things such as pdb.pm() still need access to the last > exception. Maybe we can pare it down the bare minimum, a per-thread > last_exception? That'd quickly get clobbered (we should intentionally > clear when leaving an except block), Well, the plan is to keep storing the current exception state in the thread state structure, so sys.exc_info() would still work fine until we leave the exception block. From guido at python.org Fri May 30 20:28:59 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 30 May 2008 11:28:59 -0700 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References: Message-ID: On Fri, May 30, 2008 at 10:40 AM, Adam Olsen wrote: > I'd like if a bare "raise" became purely lexical (as Guido just > suggested), ditching all the magic. > > However, things such as pdb.pm() still need access to the last > exception. Maybe we can pare it down the bare minimum, a per-thread > last_exception? That'd quickly get clobbered (we should intentionally > clear when leaving an except block), but is that ever a problem? No, pdb.pm() uses sys.last_*, not sys.exc_*. This is three variables set only when an unhandled exception reaches the interactive prompt and prints a traceback there. So no worries. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri May 30 20:31:22 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 30 May 2008 11:31:22 -0700 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References: Message-ID: On Fri, May 30, 2008 at 11:10 AM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> >> That said, it seems you are proposing taking the logical consequence >> of making except handlers properly nested and scoped, > > It's exactly that. > >> I would be okay as well with restricting bare raise syntactically to >> appearing only inside an except block, to emphasize the change in >> semantics that was started when we decided to make the optional >> variable disappear at the end of the except block. >> >> This would render the following code illegal: >> >> def f(): >> try: 1/0 >> except: pass >> raise > > Please note as well that: > > def f(): > try: 1/0 > except: pass > return sys.exc_info() > > would return (None, None, None). > Actually, it already does with the patch I proposed for #2507, and the test > suite runs fine after fixing a problem in doctest.py. I'm fine with that. Since in 3.0 sys.exc_info() returns nothing that isn't accessible from the caught variable, the only reason to use it is that it makes the exception available to functions *called* from the except clause. (E.g. logging.exception() works this way.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Fri May 30 20:54:29 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 30 May 2008 12:54:29 -0600 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References: Message-ID: On Fri, May 30, 2008 at 12:16 PM, Antoine Pitrou wrote: > Adam Olsen gmail.com> writes: >> I'd like if a bare "raise" became purely lexical (as Guido just >> suggested), ditching all the magic. >> >> However, things such as pdb.pm() still need access to the last >> exception. 
Maybe we can pare it down the bare minimum, a per-thread >> last_exception? That'd quickly get clobbered (we should intentionally >> clear when leaving an except block), > > Well, the plan is to keep storing the current exception state in the > thread state structure, so sys.exc_info() would still work fine until we > leave the exception block. Just to be clear, you'll remove PyFrameObject's f_exc_{type,value,traceback}, and rely exclusively on sys.exc_info(), right? -- Adam Olsen, aka Rhamphoryncus From solipsis at pitrou.net Fri May 30 21:02:43 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 30 May 2008 19:02:43 +0000 (UTC) Subject: [Python-3000] Exception re-raising woes References: Message-ID: Adam Olsen gmail.com> writes: > > Just to be clear, you'll remove PyFrameObject's > f_exc_{type,value,traceback}, Yes. > and rely exclusively on sys.exc_info(), > right? More exactly, tstate->exc_* will continue storing the current state, and sys.exc_info() will continue relying on these values. regards Antoine. From g.brandl at gmx.net Fri May 30 21:08:33 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 30 May 2008 21:08:33 +0200 Subject: [Python-3000] Mac module removal complete? Message-ID: Hi, there still is a plat-mac directory in Lib (though it's empty), and several places in the tree refer to it. Also, quite a few libs/scripts/tools in the Mac subdir refer to modules that were removed in Python 3. Some Mac head will need to do some additional cleanup before final release (I'd do it, but as a non-Mac-user I can't judge well enough what is important). Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
From guido at python.org Fri May 30 21:09:25 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 30 May 2008 12:09:25 -0700 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <483F4E3A.9090403@trueblade.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> <483F06FF.9090007@trueblade.com> <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com> <483F4E3A.9090403@trueblade.com> Message-ID: I'd be fine with adding '#' back to the formatting language for hex and oct. On Thu, May 29, 2008 at 5:45 PM, Eric Smith wrote: > wesley chun wrote: >> >> On 5/29/08, Eric Smith wrote: >>> >>> Marcin 'Qrczak' Kowalczyk wrote: >>>> >>>> Except that it works incorrectly for negative numbers. >> >> wow, that is a great point. i didn't think of this either. it makes >> it very inconvenient (see below) and makes it more difficult to say >> we've completed replaced the '%' operator. >> >> >>> I can't really think of any other way that doesn't involve converting >>> the >>> number to a string and then operating on that, just to get the sign. >> >> here's one way of doing it without converting to a string first (it's ugly >> too): >> >>>>> i = -45 >>>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i)) >> >> '-0x2d' > > Agreed, ick! > >> thx for putting it (back) in, > > I didn't say I would, I said I would if a decision was reached :) I'd like > to see some more consensus, and I hope that Talin (the PEP author) chimes > in. > > Eric. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From musiccomposition at gmail.com Fri May 30 21:22:33 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 30 May 2008 14:22:33 -0500 Subject: [Python-3000] Mac module removal complete? In-Reply-To: References: Message-ID: <1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com> On Fri, May 30, 2008 at 2:08 PM, Georg Brandl wrote: > Hi, > > there still is a plat-mac directory in Lib (though it's empty), and several > places in the tree refer to it. Also, quite a few libs/scripts/tools in the > Mac subdir refer to modules that were removed in Python 3. I'm pretty sure that plat-mac is going to go, but can Brett confirm? > > Some Mac head will need to do some additional cleanup before final release > (I'd do it, but as a non-Mac-user I can't judge well enough what is > important). I can handle that. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From brett at python.org Fri May 30 21:36:38 2008 From: brett at python.org (Brett Cannon) Date: Fri, 30 May 2008 12:36:38 -0700 Subject: [Python-3000] Mac module removal complete? In-Reply-To: <1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com> References: <1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com> Message-ID: On Fri, May 30, 2008 at 12:22 PM, Benjamin Peterson wrote: > On Fri, May 30, 2008 at 2:08 PM, Georg Brandl wrote: >> Hi, >> >> there still is a plat-mac directory in Lib (though it's empty), and several >> places in the tree refer to it. Also, quite a few libs/scripts/tools in the >> Mac subdir refer to modules that were removed in Python 3. > > I'm pretty sure that plat-mac is going to go, but can Brett confirm? Ditch it.
It's empty so there is no need to keep it. I am not even sure how useful the Mac directory is at this point (but I have not looked). -Brett From guido at python.org Fri May 30 21:37:00 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 30 May 2008 12:37:00 -0700 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <48405420.8010800@trueblade.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> <483F06FF.9090007@trueblade.com> <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com> <483F4E3A.9090403@trueblade.com> <48405420.8010800@trueblade.com> Message-ID: Of course. On Fri, May 30, 2008 at 12:23 PM, Eric Smith wrote: > Guido van Rossum wrote: >> >> I'd be fine with adding '#' back to the formatting language for hex and >> oct. > > And bin, I assume? > >> >> On Thu, May 29, 2008 at 5:45 PM, Eric Smith >> wrote: >>> >>> wesley chun wrote: >>>> >>>> On 5/29/08, Eric Smith wrote: >>>>> >>>>> Marcin 'Qrczak' Kowalczyk wrote: >>>>>> >>>>>> Except that it works incorrectly for negative numbers. >>>> >>>> wow, that is a great point. i didn't think of this either. it makes >>>> it very inconvenient (see below) and makes it more difficult to say >>>> we've completed replaced the '%' operator. >>>> >>>> >>>>> I can't really think of any other way that doesn't involve converting >>>>> the >>>>> number to a string and then operating on that, just to get the sign. >>>> >>>> here's one way of doing it without converting to a string first (it's >>>> ugly >>>> too): >>>> >>>>>>> i = -45 >>>>>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i)) >>>> >>>> '-0x2d' >>> >>> Agreed, ick! >>> >>>> thx for putting it (back) in, >>> >>> I didn't say I would, I said I would if a decision was reached :) I'd >>> like >>> to see some more consensus, and I hope that Talin (the PEP author) chimes >>> in. >>> >>> Eric. 
>>> >>> _______________________________________________ >>> Python-3000 mailing list >>> Python-3000 at python.org >>> http://mail.python.org/mailman/listinfo/python-3000 >>> Unsubscribe: >>> http://mail.python.org/mailman/options/python-3000/guido%40python.org >>> >> >> >> > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From eric+python-dev at trueblade.com Fri May 30 21:23:12 2008 From: eric+python-dev at trueblade.com (Eric Smith) Date: Fri, 30 May 2008 15:23:12 -0400 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> <483F06FF.9090007@trueblade.com> <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com> <483F4E3A.9090403@trueblade.com> Message-ID: <48405420.8010800@trueblade.com> Guido van Rossum wrote: > I'd be fine with adding '#' back to the formatting language for hex and oct. And bin, I assume? > > On Thu, May 29, 2008 at 5:45 PM, Eric Smith > wrote: >> wesley chun wrote: >>> On 5/29/08, Eric Smith wrote: >>>> Marcin 'Qrczak' Kowalczyk wrote: >>>>> Except that it works incorrectly for negative numbers. >>> wow, that is a great point. i didn't think of this either. it makes >>> it very inconvenient (see below) and makes it more difficult to say >>> we've completed replaced the '%' operator. >>> >>> >>>> I can't really think of any other way that doesn't involve converting >>>> the >>>> number to a string and then operating on that, just to get the sign. >>> here's one way of doing it without converting to a string first (it's ugly >>> too): >>> >>>>>> i = -45 >>>>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i)) >>> '-0x2d' >> Agreed, ick! 
>> >>> thx for putting it (back) in, >> I didn't say I would, I said I would if a decision was reached :) I'd like >> to see some more consensus, and I hope that Talin (the PEP author) chimes >> in. >> >> Eric. >> >> _______________________________________________ >> Python-3000 mailing list >> Python-3000 at python.org >> http://mail.python.org/mailman/listinfo/python-3000 >> Unsubscribe: >> http://mail.python.org/mailman/options/python-3000/guido%40python.org >> > > > From musiccomposition at gmail.com Sat May 31 02:22:37 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 30 May 2008 19:22:37 -0500 Subject: [Python-3000] Mac module removal complete? In-Reply-To: References: <1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com> Message-ID: <1afaf6160805301722m1ad4d33fp264907c84e1faf9c@mail.gmail.com> On Fri, May 30, 2008 at 2:36 PM, Brett Cannon wrote: > On Fri, May 30, 2008 at 12:22 PM, Benjamin Peterson >> >> I'm pretty sure that plat-mac is going to go, but can Brett confirm? > > Ditch it. It's empty so there is no need to keep it. I am not even > sure how useful the Mac directory is at this point (but I have not > looked). I did remove references to plat-mac in the Makefile/configure. However, lib-darwin is auto-generated on install. I'm not really sure if this is still needed. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From solipsis at pitrou.net Sat May 31 03:33:14 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 May 2008 01:33:14 +0000 (UTC) Subject: [Python-3000] Exception re-raising woes References: Message-ID: Guido van Rossum python.org> writes: > I would be okay as well with restricting bare raise syntactically to > appearing only inside an except block, to emphasize the change in > semantics that was started when we decided to make the optional > variable disappear at the end of the except block. 
> > This would render the following code illegal: > > def f(): > try: 1/0 > except: pass > raise But you may want to use bare raise in a function called from an exception handler, e.g.: def handle_exception(): if user() == "Albert": # Albert likes his exceptions uncooked raise else: logging.exception("an exception occurred") def f(): try: raise KeyError except: handle_exception() Antoine. From rhamph at gmail.com Sat May 31 03:44:22 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 30 May 2008 19:44:22 -0600 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References: Message-ID: On Fri, May 30, 2008 at 7:33 PM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> I would be okay as well with restricting bare raise syntactically to >> appearing only inside an except block, to emphasize the change in >> semantics that was started when we decided to make the optional >> variable disappear at the end of the except block. >> >> This would render the following code illegal: >> >> def f(): >> try: 1/0 >> except: pass >> raise > > But you may want to use bare raise in a function called from an exception > handler, e.g.: > > def handle_exception(): > if user() == "Albert": > # Albert likes his exceptions uncooked > raise > else: > logging.exception("an exception occurred") > > def f(): > try: > raise KeyError > except: > handle_exception() This can be rewritten to use sys.exc_info(), ie: def handle_exception(): if user() == "Albert": # Albert likes his exceptions uncooked raise sys.exc_info()[1] else: logging.exception("an exception occurred") -- Adam Olsen, aka Rhamphoryncus From mhammond at skippinet.com.au Sat May 31 08:43:43 2008 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 31 May 2008 16:43:43 +1000 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References: Message-ID: <003a01c8c2e9$afc2d910$0f488b30$@com.au> > >> This would render the following code illegal: > >> > >> def f(): > >> try: 1/0 > >> except: pass 
> >> raise > > > > But you may want to use bare raise in a function called from an > exception > > handler, e.g.: > > > > def handle_exception(): > > if user() == "Albert": > > # Albert likes his exceptions uncooked > > raise > > else: > > logging.exception("an exception occurred") > > > > def f(): > > try: > > raise KeyError > > except: > > handle_exception() > > This can be rewritten to use sys.exc_info(), ie: > > def handle_exception(): > if user() == "Albert": > # Albert likes his exceptions uncooked > raise sys.exc_info()[1] > else: > logging.exception("an exception occurred") In both Python 2.x and 3 (a few months old build of Py3k though), the traceback isn't the same as with a bare raise. For Python 2.x you could write it like: def handle_exception(): ... raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2] It's not clear how that would be spelt in py3k though (and from what I can see, sys.exc_info() itself has an uncertain future in py3k). Cheers, Mark From stefan_ml at behnel.de Sat May 31 15:36:58 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 31 May 2008 15:36:58 +0200 Subject: [Python-3000] doctest portability Message-ID: Hi, I currently use a bunch of work-arounds for doctests in lxml's test suite to make them work in Py3. I converted most tests to a mix of Py2 and Py3 syntax (e.g. using both u'' and b'' literals), and most of the runtime work is done using regular expressions that convert the except-as syntax, strip package names from tracebacks and translate bytes/str output between Py2 and Py3 syntax/repr. I know, I could use the lib2to3 package, but it a) is a one-way tool in the wrong direction if you have to distinguish bytes/str literals, b) lacks configurability stating exactly what changes need to be done and c) seemed harder to set up for doctests than doing the conversion by hand.
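The kind of regular-expression work-around described above might look roughly like the sketch below. It is only an illustration and not lxml's actual test code: the class name PrefixInsensitiveChecker and the pattern are hypothetical, and it covers just the u''/b'' literal prefixes in expected output, using the standard doctest.OutputChecker hook.

```python
import doctest
import re

# Hypothetical pattern: a lone "u" or "b" that directly precedes a quote.
_LITERAL_PREFIX = re.compile(r"\b[ub](?=['\"])")


class PrefixInsensitiveChecker(doctest.OutputChecker):
    """Compare doctest output while ignoring u''/b'' literal prefixes."""

    def check_output(self, want, got, optionflags):
        # Strip the prefixes from both sides, then defer to the stock check.
        want = _LITERAL_PREFIX.sub("", want)
        got = _LITERAL_PREFIX.sub("", got)
        return super().check_output(want, got, optionflags)


if __name__ == "__main__":
    checker = PrefixInsensitiveChecker()
    print(checker.check_output("u'abc'\n", "'abc'\n", 0))
    print(checker.check_output("'abc'\n", "'abd'\n", 0))
```

Such a checker can be handed to doctest.DocTestRunner through its checker argument, which keeps the doctest sources themselves untouched.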
It would be really nice if the doctest module had a simple option that specified if the doctests of a test suite are in Py2 or Py3 syntax, and then just did the right thing under Py3 (and maybe also 2.6). Otherwise, a lot more people than just myself will have a hard time getting their test suites to run in Py3, which is basically the only way to sanely migrate code. Stefan From regebro at gmail.com Sat May 31 15:52:59 2008 From: regebro at gmail.com (Lennart Regebro) Date: Sat, 31 May 2008 15:52:59 +0200 Subject: [Python-3000] Proposal to add __str__ method to iterables. In-Reply-To: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com> References: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com> Message-ID: <319e029f0805310652k5c6b4fa6x4e7f597ebc7b344c@mail.gmail.com> On Wed, May 28, 2008 at 5:48 AM, Carl Johnson wrote: > Proposed behavior of the __str__ method for iterables is that it returns the > result of "".join(str(i) for i in self). In 8-9 years of python programming I have probably never needed to do "".join(str(i) for i in self), so even if there was a __str__ on iterables, this seems to me to be a particularly useless default. :) > In order to replicate the behavior of filter with a comprehension Instead of running filter on a string, you can replace the offending character with an empty string. filter(lambda c: c!="a", "abracadbra") 'brcdbr' "abracadbra".replace('a', '') 'brcdbr' I find using filter on a string kinda strange, I have to say. How often do you have to filter away certain characters in a string? Never happened to me. -- Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/ +33 661 58 14 64 From solipsis at pitrou.net Sat May 31 15:59:24 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 May 2008 13:59:24 +0000 (UTC) Subject: [Python-3000] =?utf-8?b?c3lzLmV4Y19pbmZvKCk=?= References: <003a01c8c2e9$afc2d910$0f488b30$@com.au> Message-ID: Mark Hammond skippinet.com.au> writes: > In both Python 2.x and 3 (a few months old build of Py3k though), the > traceback isn't the same. For Python 2.0 you could write it like: > > def handle_exception(): > ... > raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2] > > Its not clear how that would be spelt in py3k though (and from what I can > see, sys.exc_info() itself has an uncertain future in py3k). sys.exc_info() will remain, it's just that the returned value will be (None, None, None) if we are not in an except block in any of the currently active frames in the thread. In the case above it would return the current exception (the one caught in one of the enclosing frames). By the way, another interesting sys.exc_info() case: def except_yield(): try: raise TypeError except: yield 1 def f(): for i in except_yield(): return sys.exc_info() Right now, running f() returns (None, None, None). But with rewritten exception stacking, it may return the 3-tuple for the TypeError raised in except_yield(). Regards Antoine. From g.brandl at gmx.net Sat May 31 17:13:38 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 31 May 2008 17:13:38 +0200 Subject: [Python-3000] doctest portability In-Reply-To: References: Message-ID: Stefan Behnel schrieb: > Hi, > > I currently use a bunch of work-arounds for doctests in lxml's test suite to > make them work in Py3. I converted most tests to a mix of Py2 and Py3 syntax > (e.g. using both u'' and b'' literals), and most of the runtime work is done > using regular expressions that convert the except-as syntax, strip package > names from tracebacks and translate bytes/str output between Py2 and Py3 > syntax/repr. 
> > I know, I could use the lib2to3 package, but it a) is a one-way tool in the > wrong direction if you have to distinguish bytes/str literals, b) lacks > configurability stating exactly what changes need to be done and c) seemed > harder to set up for doctests than doing the conversion by hand. Shouldn't the -d option handle doctests without further set-up? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Sat May 31 17:33:38 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 31 May 2008 17:33:38 +0200 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: References: Message-ID: Brett Cannon schrieb: >>> Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then >>> again, pydoc is busted thanks to the new doc format. >> >> I will try to handle this in the coming week. >> > > Fred had the interesting suggestion of removing pydoc in Py3K based on > the thinking that documentation tools like pydoc should be external to > Python. With the docs now so easy to generate directly, should pydoc > perhaps just be gutted to only what is needed for help() to work? pydoc is fine for displaying docstring help, and interactive help. This should stay. Of course, it would also be nice for ``help("if")`` to work effortlessly, which it currently only does if the generated HTML documentation is available somewhere, which it typically isn't -- on Unix most distributions put it in a separate package (from which pydoc won't always find it of its own), on Windows only the CHM file is distributed and must be decompiled to get single HTML files. 
Now that the docs are reST, the source is almost pretty enough to display it raw, but I could also imagine a "text" writer that removes the more obscure markup to present a casual-reader-friendly text version. The needed sources could then be distributed with Python -- it shouldn't be more than about 200 kb. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From steve at holdenweb.com Sat May 31 17:42:24 2008 From: steve at holdenweb.com (Steve Holden) Date: Sat, 31 May 2008 11:42:24 -0400 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: References: Message-ID: <484171E0.4050204@holdenweb.com> Georg Brandl wrote: > Brett Cannon schrieb: > >>>> Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then >>>> again, pydoc is busted thanks to the new doc format. >>> >>> I will try to handle this in the coming week. >>> >> >> Fred had the interesting suggestion of removing pydoc in Py3K based on >> the thinking that documentation tools like pydoc should be external to >> Python. With the docs now so easy to generate directly, should pydoc >> perhaps just be gutted to only what is needed for help() to work? > > pydoc is fine for displaying docstring help, and interactive help. > This should stay. > > Of course, it would also be nice for ``help("if")`` to work effortlessly, > which it currently only does if the generated HTML documentation is > available somewhere, which it typically isn't -- on Unix most distributions > put it in a separate package (from which pydoc won't always find it > of its own), on Windows only the CHM file is distributed and must be > decompiled to get single HTML files. 
> > Now that the docs are reST, the source is almost pretty enough to display > it raw, but I could also imagine a "text" writer that removes the more > obscure markup to present a casual-reader-friendly text version. > > The needed sources could then be distributed with Python -- it shouldn't > be more than about 200 kb. The versioned documentation will sometimes be available from the Internet if you want to think about using that as a fallback source. It *would* be nice if help("if") worked. It would be even handier if help(if) worked, but that's a syntax problem, and it would be a horrendous one to overcome, I suspect. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From stefan_ml at behnel.de Sat May 31 17:47:06 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 31 May 2008 17:47:06 +0200 Subject: [Python-3000] doctest portability In-Reply-To: References: Message-ID: Georg Brandl wrote: > Stefan Behnel schrieb: >> I know, I could use the lib2to3 package, but it a) is a one-way tool >> in the >> wrong direction if you have to distinguish bytes/str literals, b) lacks >> configurability stating exactly what changes need to be done and c) >> seemed >> harder to set up for doctests than doing the conversion by hand. > > Shouldn't the -d option handle doctests without further set-up? If you start 2to3 from the command prompt to convert the files that contain the doctests and copy them to a new location, then yes. But the question is: how do you run a Py2 doctest in Py3 without first copying your doctests or doctest containing sources to new files and then running the tests from there? You can't require people to put such a work-around into every test script in the world. Adding an option, fine. Copying files, adapting paths and all that, why? 
Stefan From guido at python.org Sat May 31 18:32:13 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 31 May 2008 09:32:13 -0700 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> Message-ID: Hi Atsuo, I'm very close to accepting your PEP. I have a few questions: - The Rationale has a more elaborate (and perhaps slightly conflicting, regarding the status of ASCII space?) definition of our definition of non-printable than the Specification. Perhaps this could be merged? - I'm still not comfortable with making stdout default to backslashreplace. Stuff written to stdout might be consumed by another program that might misinterpret the \ escapes. Previously I thought I was okay with doing this only if stdout.isatty() returns True, but I think that would just add confusion of the kind "it works in interactive mode but not when redirecting to a file". I'm okay with apps who think they need this setting that explicitly, but not to having it be the default. (For stderr however I agree that backslashreplace is the right default.) - What happens to Unicode characters that are "unassigned"? I assume there are many of those, especially outside the basic plane. Shouldn't we be conservative and convert these to \u or \U escapes as well? 
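For readers who want to check the questions above against a current interpreter, here is a small sketch. It assumes Python 3 as it eventually shipped, which this thread predates, so it shows the outcome rather than anything settled in the discussion. The helper name is_escaped_by_repr is hypothetical. U+FFFF is used as the test character because, as a noncharacter, it stays in Unicode category Cn.

```python
import unicodedata


def is_escaped_by_repr(ch):
    """Return True if repr() shows ch as a backslash escape."""
    return repr(ch) != "'" + ch + "'"


samples = {
    "printable non-ASCII": "\u00e9",  # LATIN SMALL LETTER E WITH ACUTE
    "category Cn": "\uffff",          # noncharacter, never assigned
}

for label, ch in samples.items():
    # ascii() keeps the demonstration printable on any terminal encoding.
    print(label, ascii(ch), unicodedata.category(ch), is_escaped_by_repr(ch))

# The 'strict' handler refuses what the encoding cannot represent ...
try:
    "\u00e9".encode("ascii", "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# ... while 'backslashreplace' substitutes an escape sequence instead.
print("backslashreplace:", "\u00e9".encode("ascii", "backslashreplace"))
```

The two error handlers at the end are the ones discussed for stdout and stderr; the sketch exercises them through str.encode only and does not touch the stream defaults.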
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Sat May 31 19:24:32 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sat, 31 May 2008 11:24:32 -0600 Subject: [Python-3000] sys.exc_info() In-Reply-To: References: <003a01c8c2e9$afc2d910$0f488b30$@com.au> Message-ID: On Sat, May 31, 2008 at 7:59 AM, Antoine Pitrou wrote: > Mark Hammond skippinet.com.au> writes: >> In both Python 2.x and 3 (a few months old build of Py3k though), the >> traceback isn't the same. For Python 2.0 you could write it like: >> >> def handle_exception(): >> ... >> raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2] >> >> Its not clear how that would be spelt in py3k though (and from what I can >> see, sys.exc_info() itself has an uncertain future in py3k). > > sys.exc_info() will remain, it's just that the returned value will be (None, > None, None) if we are not in an except block in any of the currently active > frames in the thread. In the case above it would return the current exception > (the one caught in one of the enclosing frames). > > By the way, another interesting sys.exc_info() case: > > def except_yield(): > try: > raise TypeError > except: > yield 1 > > def f(): > for i in except_yield(): > return sys.exc_info() > > Right now, running f() returns (None, None, None). But with rewritten exception > stacking, it may return the 3-tuple for the TypeError raised in except_yield(). What exception stacking? I thought we'd be using a simple per-thread exception. I'd expect the yield statement to clear it, giving us (None, None, None). 
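The generator case quoted above is easy to try as a script. The sketch below only wraps the quoted functions so they run on their own. In the Python 3 releases I am aware of, the caller does not see the TypeError being handled inside the generator, but that is the behaviour as eventually released, not something this thread had decided.

```python
import sys


def except_yield():
    try:
        raise TypeError
    except TypeError:
        # The generator is suspended while still inside its handler.
        yield 1


def f():
    for i in except_yield():
        # Asked from the caller's frame, which is handling nothing itself.
        return sys.exc_info()


if __name__ == "__main__":
    print(f())
```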
-- Adam Olsen, aka Rhamphoryncus From ishimoto at gembook.org Sat May 31 19:30:05 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sun, 1 Jun 2008 02:30:05 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> Message-ID: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> Hi Guido, On Sun, Jun 1, 2008 at 1:32 AM, Guido van Rossum wrote: > Hi Atsuo, > > I'm very close to accepting your PEP. I have a few questions: Great! > > - The Rationale has a more elaborate (and perhaps slightly > conflicting, regarding the status of ASCII space?) definition of our > definition of non-printable than the Specification. Perhaps this could > be merged? > Yes, I'll merge them. > - I'm still not comfortable with making stdout default to > backslashreplace. Stuff written to stdout might be consumed by another > program that might misinterpret the \ escapes. Previously I thought I > was okay with doing this only if stdout.isatty() returns True, but I > think that would just add confusion of the kind "it works in > interactive mode but not when redirecting to a file". I'm okay with > apps who think they need this setting that explicitly, but not to > having it be the default. (For stderr however I agree that > backslashreplace is the right default.) Okay, we'll keep 'strict' as default error handler for stdout always, then. I can live with it. But, my $0.02, I expect this issue will be revisited after people start to develop real applications with Python 3.x. > > - What happens to Unicode characters that are "unassigned"? I assume > there are many of those, especially outside the basic plane. 
Shouldn't > we be conservative and convert these to \u or \U escapes as well? > Unassigned characters are defined as 'Cn' in the Unicode database and they will be escaped. I'll update the PEP and the patch on Sunday. Thank you! From solipsis at pitrou.net Sat May 31 19:41:46 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 May 2008 17:41:46 +0000 (UTC) Subject: [Python-3000] sys.exc_info() References: <003a01c8c2e9$afc2d910$0f488b30$@com.au> Message-ID: Adam Olsen gmail.com> writes: > > By the way, another interesting sys.exc_info() case: > > > > def except_yield(): > > try: > > raise TypeError > > except: > > yield 1 > > > > def f(): > > for i in except_yield(): > > return sys.exc_info() > > > > Right now, running f() returns (None, None, None). But with rewritten exception > > stacking, it may return the 3-tuple for the TypeError raised in except_yield(). > > What exception stacking? I thought we'd be using a simple per-thread > exception. I'd expect the yield statement to clear it, giving us > (None, None, None). There is a per-thread exception for the current exception state but we must also save and restore the previous state when we enter and leave an exception handler, respectively, so that re-raising and sys.exc_info() work properly in situations with lexically nested exception handlers. Also, "yield" cannot blindly clear the exception state, because the frame calling the generator may expect the exception state to be non-None. Consequently, we might have to keep the f_exc_* members solely for the generator case.
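Both points in the message above (restoring the outer state when a nested handler exits, and a caller whose own exception state must survive a generator's yield) can be illustrated with a short script. This is a sketch against Python 3 as later released, not the implementation under discussion here, and the function names nested_handlers, one_item and caller_in_handler are made up for the example.

```python
import sys


def nested_handlers():
    """Record the current exception type at three points."""
    seen = []
    try:
        raise KeyError("outer")
    except KeyError:
        try:
            raise ValueError("inner")
        except ValueError:
            seen.append(sys.exc_info()[0])  # inside the inner handler
        seen.append(sys.exc_info()[0])      # back in the outer handler
    seen.append(sys.exc_info()[0])          # outside any handler
    return seen


def one_item():
    try:
        raise TypeError("inside generator")
    except TypeError:
        yield 1


def caller_in_handler():
    """Iterate the generator while the caller handles its own exception."""
    try:
        raise KeyError("caller")
    except KeyError:
        for _ in one_item():
            return sys.exc_info()[0]


if __name__ == "__main__":
    print([t.__name__ if t else None for t in nested_handlers()])
    print(caller_in_handler().__name__)
```

In the second function the caller still sees its own KeyError after the generator yields from inside its handler, which is the situation the last paragraph of the message is concerned with.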
From rhamph at gmail.com Sat May 31 21:35:36 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sat, 31 May 2008 13:35:36 -0600 Subject: [Python-3000] sys.exc_info() In-Reply-To: References: <003a01c8c2e9$afc2d910$0f488b30$@com.au> Message-ID: On Sat, May 31, 2008 at 11:41 AM, Antoine Pitrou wrote: > Adam Olsen gmail.com> writes: >> > By the way, another interesting sys.exc_info() case: >> > >> > def except_yield(): >> > try: >> > raise TypeError >> > except: >> > yield 1 >> > >> > def f(): >> > for i in except_yield(): >> > return sys.exc_info() >> > >> > Right now, running f() returns (None, None, None). But with rewritten exception >> > stacking, it may return the 3-tuple for the TypeError raised in except_yield(). >> >> What exception stacking? I thought we'd be using a simple per-thread >> exception. I'd expect the yield statement to clear it, giving us >> (None, None, None). > > There is a per-thread exception for the current exception state but we > must also save and restore the previous state when we enter and leave > an exception handler, respectively, so that re-raising and sys.exc_info() > work properly in situations with lexically nested exception handlers. The bytecode generation for "raise" could be changed to be literally the same as "except Foo as e: raise e". Reuse our existing stack, not add another one. sys.exc_info() won't get clobbered until another exception gets raised. I see no reason why this needs to return anything other than (None, None, None): def x(): try: ... except: try: ... except: pass #raise return sys.exc_info() The commented out raise should use the outer except block (and thus be lexically based), but sys.exc_info() doesn't have to be. If you want it to work, use it *immediately* at the start of the block. > Also, "yield" cannot blindly clear the exception state, because the frame > calling the generator may expect the exception state to be non-None.
> Consequently, we might have to keep the f_exc_* members solely for the > generator case. Why? Why should the frame calling the generator be inspecting the exception state of the generator? What's the use case? -- Adam Olsen, aka Rhamphoryncus From solipsis at pitrou.net Sat May 31 22:03:49 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 May 2008 20:03:49 +0000 (UTC) Subject: [Python-3000] sys.exc_info() References: <003a01c8c2e9$afc2d910$0f488b30$@com.au> Message-ID: Adam Olsen gmail.com> writes: > > The bytecode generation for "raise" could be changed literally be the > same as "except Foo as e: raise e". Reuse our existing stack, not add > another one. As someone else pointed out, there is a difference between the two constructs: the latter appends a line to the traceback while the former doesn't. I suppose in some contexts it can be useful (especially if the exception is re-raised several times because of a complex architecture, e.g. a framework). > The commented out raise should use the outer except block (and thus be > lexically based), but sys.exc_info() doesn't have to be. But would you object to sys.exc_info() being lexically based as well? I say that because the bare "raise" statement and sys.exc_info() use the same attributes internally, so they will have the same semantics unless we decide it's better to do otherwise. > > Also, "yield" cannot blindly clear the exception state, because the frame > > calling the generator may expect the exception state to be non-None. > > Consequently, we might have to keep the f_exc_* members solely for the > > generator case. > > Why? Why should the frame calling the generator be inspecting the > exception state of the generator? What's the use case? You misunderstood me. The f_exc_* fields will be used internally to swap between the inner generator's exception state and the calling frame's own exception state.
They will have no useful meaning for outside code so I suggest they are not accessible from Python code anymore. Regards Antoine. From guido at python.org Sat May 31 22:09:49 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 31 May 2008 13:09:49 -0700 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> Message-ID: Great -- get ready to make your patch perfect! On Sat, May 31, 2008 at 10:30 AM, Atsuo Ishimoto wrote: > Hi Guido, > > On Sun, Jun 1, 2008 at 1:32 AM, Guido van Rossum wrote: >> Hi Atsuo, >> >> I'm very close to accepting your PEP. I have a few questions: > > Great! > >> >> - The Rationale has a more elaborate (and perhaps slightly >> conflicting, regarding the status of ASCII space?) definition of our >> definition of non-printable than the Specification. Perhaps this could >> be merged? >> > > Yes, I'll merge them. > >> - I'm still not comfortable with making stdout default to >> backslashreplace. Stuff written to stdout might be consumed by another >> program that might misinterpret the \ escapes. Previously I thought I >> was okay with doing this only if stdout.isatty() returns True, but I >> think that would just add confusion of the kind "it works in >> interactive mode but not when redirecting to a file". I'm okay with >> apps who think they need this setting that explicitly, but not to >> having it be the default. (For stderr however I agree that >> backslashreplace is the right default.) > > Okay, we'll keep 'strict' as default error handler for stdout always, > then. I can live with it. 
> But, my $0.02, I expect this issue will be revisited after people > start to develop real applications with Python 3.x. > >> >> - What happens to Unicode characters that are "unassigned"? I assume >> there are many of those, especially outside the basic plane. Shouldn't >> we be conservative and convert these to \u or \U escapes as well? >> > > Unassigned characters are defined as 'Cn ' in the Unicode database and > they will be escaped. > > I'll update the PEP and the patch on Sunday. Thank you! > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Sat May 31 23:33:05 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sat, 31 May 2008 15:33:05 -0600 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> Message-ID: On Sat, May 31, 2008 at 11:30 AM, Atsuo Ishimoto wrote: > On Sun, Jun 1, 2008 at 1:32 AM, Guido van Rossum wrote: >> - I'm still not comfortable with making stdout default to >> backslashreplace. Stuff written to stdout might be consumed by another >> program that might misinterpret the \ escapes. Previously I thought I >> was okay with doing this only if stdout.isatty() returns True, but I >> think that would just add confusion of the kind "it works in >> interactive mode but not when redirecting to a file". I'm okay with >> apps who think they need this setting that explicitly, but not to >> having it be the default. (For stderr however I agree that >> backslashreplace is the right default.) > > Okay, we'll keep 'strict' as default error handler for stdout always, > then. I can live with it. 
> But, my $0.02, I expect this issue will be revisited after people > start to develop real applications with Python 3.x. I think the reason why strict/backslashreplace (respectively) work well is that you can print a unicode string to stdout, have it fail (encoding can't handle it), then get an exception printed to stderr with the string escaped. Making stderr stricter would make it unable to print the string and making stdout less strict would let the error pass silently (printing potential garbage instead). -- Adam Olsen, aka Rhamphoryncus From dickinsm at gmail.com Sat May 31 23:52:10 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Sat, 31 May 2008 17:52:10 -0400 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: References: Message-ID: <5c6f2a5d0805311452v15a87d06g899fa8182dbd9d2a@mail.gmail.com> On Sat, May 31, 2008 at 11:33 AM, Georg Brandl wrote: > > Now that the docs are reST, the source is almost pretty enough to display > it raw, but I could also imagine a "text" writer that removes the more > obscure markup to present a casual-reader-friendly text version. > > The needed sources could then be distributed with Python -- it shouldn't > be more than about 200 kb. > +1 from me. Would this mean that htmllib and sgmllib could be removed without further ado? Mark From andymac at bullseye.apana.org.au Mon May 26 15:10:43 2008 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Tue, 27 May 2008 00:10:43 +1100 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <48397ECC.9070805@cheimes.de> References: <48397ECC.9070805@cheimes.de> Message-ID: <483AB6D3.8010207@bullseye.andymac.org> Christian Heimes wrote: > The first set of betas of Python 2.6 and 3.0 is fast apace. I like to > grab the final chance and clean up the C API of 2.6 and 3.0. I know, I > know, I brought up the topic two times in the past.
> But this time I mean
> it for real! :]

On the subject of stabilising the API, I assigned issue 2862 to you
concerning tidying up freelist management interfaces for ints and floats
(http://bugs.python.org/issue2862).

Note that the patch in issue 2862 is essentially orthogonal to the patch
in issue 2039, although any int/float freelist implementation changes
would require amendments.

Additionally, I notice that not all of the types with free lists have
grown routines to clear them - dicts, lists and sets are missing these
routines. I will add a patch for these in the next few days if no-one
else gets there first.

On the subject of issue 2039, I've come to the view that "explicit is
better than implicit" applies to the freelist management, and with the
addition of freelist clearing routines called from gc.collect() I see
little reason to pursue bounding of freelist sizes (and would suggest
removal of existing bounding code in those freelist implementations that
currently have it).

I have also come to the view that pymalloc's automatic attempts to
return empty arenas to the OS should be changed to an on-demand cleaning,
called after all other cleanup in gc.collect(). Returning arenas, while
not expensive in general, is nonetheless not free (in performance terms).

-- 
-------------------------------------------------------------------------
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au  (pref) | Snail: PO Box 370
        andymac at pcug.org.au             (alt) |        Belconnen ACT 2616
Web:    http://www.andymac.org/               |        Australia

From steve at holdenweb.com Sat May 31 17:42:24 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 31 May 2008 11:42:24 -0400
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To:
References:
Message-ID: <484171E0.4050204@holdenweb.com>

Georg Brandl wrote:
> Brett Cannon schrieb:
>
>>>> Issue 2873 - htmllib is slated to go, but pydoc still uses it.
>>>> Then again, pydoc is busted thanks to the new doc format.
>>>
>>> I will try to handle this in the coming week.
>>>
>>
>> Fred had the interesting suggestion of removing pydoc in Py3K based on
>> the thinking that documentation tools like pydoc should be external to
>> Python. With the docs now so easy to generate directly, should pydoc
>> perhaps just be gutted to only what is needed for help() to work?
>
> pydoc is fine for displaying docstring help, and interactive help.
> This should stay.
>
> Of course, it would also be nice for ``help("if")`` to work effortlessly,
> which it currently only does if the generated HTML documentation is
> available somewhere, which it typically isn't -- on Unix most distributions
> put it in a separate package (from which pydoc won't always find it
> of its own), on Windows only the CHM file is distributed and must be
> decompiled to get single HTML files.
>
> Now that the docs are reST, the source is almost pretty enough to display
> it raw, but I could also imagine a "text" writer that removes the more
> obscure markup to present a casual-reader-friendly text version.
>
> The needed sources could then be distributed with Python -- it shouldn't
> be more than about 200 kb.

The versioned documentation will sometimes be available from the
Internet if you want to think about using that as a fallback source.

It *would* be nice if help("if") worked. It would be even handier if
help(if) worked, but that's a syntax problem, and it would be a
horrendous one to overcome, I suspect.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
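
A minimal sketch of the syntax point in the message above, assuming
Python 3. The helper name parses() is made up for illustration and is
not part of pydoc or the standard library. It only checks whether each
spelling gets past the compiler, which is presumably why help(if) would
be so hard to support: the parser rejects the bare keyword before
help() is ever called.

```python
def parses(source):
    """Return True if `source` compiles as a single Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A string argument is ordinary call syntax, so this compiles.
keyword_as_string = parses('help("if")')

# A bare keyword is not an expression, so the parser rejects the call.
keyword_bare = parses("help(if)")

print(keyword_as_string, keyword_bare)
```

Whether help("if") then finds any text to display is a separate question
that depends on the documentation sources being available, as discussed
in the thread.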