From andrewm at object-craft.com.au Mon Sep 1 03:53:45 2008
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Mon, 01 Sep 2008 11:53:45 +1000
Subject: [Python-3000] Minor problem with ABCMeta?
Message-ID: <20080901015345.A1196600801@longblack.object-craft.com.au>

The __subclasscheck__ method of ABCMeta contains the following code:

        # Check if it's a subclass of a registered class (recursive)
        for rcls in cls._abc_registry:
            if issubclass(subclass, rcls):
                cls._abc_registry.add(subclass)
                return True

It looks to me like this code will result in an unnecessary call to
cls._abc_registry.add() in the case that "subclass" is already in
cls._abc_registry. It looks like the code should be preceded with
something like:

        if subclass in cls._abc_registry:
            return True

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From ncoghlan at gmail.com Mon Sep 1 12:05:58 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 01 Sep 2008 20:05:58 +1000
Subject: [Python-3000] Minor problem with ABCMeta?
In-Reply-To: <20080901015345.A1196600801@longblack.object-craft.com.au>
References: <20080901015345.A1196600801@longblack.object-craft.com.au>
Message-ID: <48BBBE86.20506@gmail.com>

Andrew McNamara wrote:
> The __subclasscheck__ method of ABCMeta contains the following code:
>
>         # Check if it's a subclass of a registered class (recursive)
>         for rcls in cls._abc_registry:
>             if issubclass(subclass, rcls):
>                 cls._abc_registry.add(subclass)
>                 return True
>
> It looks to me like this code will result in an unnecessary call to
> cls._abc_registry.add() in the case that "subclass" is already in
> cls._abc_registry. It looks like the code should be preceded with
> something like:
>
>         if subclass in cls._abc_registry:
>             return True

Actually, it looks to me like the subclass is getting added to the wrong
set - it should be going into the _abc_cache, not the _abc_registry.
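
For illustration only, here is a minimal sketch of the behaviour being
discussed. It uses nothing but the public abc API, the class names are
made up, and it is not the patch from the tracker. A class that merely
inherits from a registered class is accepted by the recursive check
quoted above, which is the kind of answer a cache (rather than the
registry) would naturally remember:

```python
from abc import ABCMeta


class Base(metaclass=ABCMeta):
    """Hypothetical abstract base class."""


class Registered:
    """Explicitly registered with Base below."""


class Child(Registered):
    """Never registered itself; only inherits from a registered class."""


class Unrelated:
    """Neither registered nor derived from a registered class."""


Base.register(Registered)

# Child passes because it is a subclass of a registered class.
first_answer = issubclass(Child, Base)
# Asking again must give the same answer, whatever set remembers it.
second_answer = issubclass(Child, Base)
unrelated_answer = issubclass(Unrelated, Base)
```

Both answers for Child should be True and the answer for Unrelated
False; which internal set records the positive result is the
implementation detail under discussion.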

Tracker item with the 2-line patch here if someone would care to give it
the necessary post-beta review: http://bugs.python.org/issue3747

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From andrewm at object-craft.com.au Tue Sep 2 03:46:34 2008
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Tue, 02 Sep 2008 11:46:34 +1000
Subject: [Python-3000] Minor problem with ABCMeta?
In-Reply-To: <48BBBE86.20506@gmail.com>
References: <20080901015345.A1196600801@longblack.object-craft.com.au> <48BBBE86.20506@gmail.com>
Message-ID: <20080902014634.326D0600801@longblack.object-craft.com.au>

>Actually, it looks to me like the subclass is getting added to the wrong
>set - it should be going into the _abc_cache, not the _abc_registry.

Ah, you're right - that makes a lot more sense. Thanks.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From andrewm at object-craft.com.au Tue Sep 2 04:37:39 2008
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Tue, 02 Sep 2008 12:37:39 +1000
Subject: [Python-3000] re.escape() fails when passed bytes()
Message-ID: <20080902023739.72D14600801@longblack.object-craft.com.au>

In Python 2, re.escape() works with either str or unicode, but in Python
3, it no longer works with bytes(). I've created issue 3756 to track
this:

    http://bugs.python.org/issue3756

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From guido at python.org Tue Sep 2 19:26:27 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 10:26:27 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raise OverflowError?
In-Reply-To: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID:

On Sat, Aug 30, 2008 at 8:07 AM, Hagen Fürstenau wrote:
> While __len__() is allowed to return a value of any size, issues 2723
> and 3729 need a decision on what len() should do if the value doesn't
> fit in a Py_ssize_t.
>
> In a previous thread
> (http://mail.python.org/pipermail/python-3000/2008-May/013387.html)
> Guido wanted len() to "lie" and return sys.maxsize in this case, but
> several people have voiced strong discomfort with that. Any comments
> or pronouncements?

I stand by my view. I might voice strong discomfort with raising an
exception because it doesn't fit in some implementation detail.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From daniel at stutzbachenterprises.com Tue Sep 2 20:14:30 2008
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 2 Sep 2008 13:14:30 -0500
Subject: [Python-3000] Should len() clip to sys.maxsize or raise OverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID:

On Tue, Sep 2, 2008 at 12:26 PM, Guido van Rossum wrote:

> I stand by my view. I might voice strong discomfort with raising an
> exception because it doesn't fit in some implementation detail.

Isn't that precisely what OverflowError is for? ("it doesn't fit in some
implementation detail")

It seems to me that the Purity angle here would be to allow len() to
return any Python int object. The Practical angle wants to restrict it
to sys.maxsize for performance reasons. Throwing an OverflowError seems
like a good way for Practical to cry, "Oops, I've been caught".

(I'm interested in this issue because my list-like extension type can in
some cases have a length greater than sys.maxsize)

-- 
Daniel Stutzbach, Ph.D.
http://stutzbachenterprises.com

From guido at python.org Tue Sep 2 20:21:49 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 11:21:49 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raise OverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID:

On Tue, Sep 2, 2008 at 11:14 AM, Daniel Stutzbach wrote:
> On Tue, Sep 2, 2008 at 12:26 PM, Guido van Rossum wrote:
>> I stand by my view. I might voice strong discomfort with raising an
>> exception because it doesn't fit in some implementation detail.
>
> Isn't that precisely what OverflowError is for? ("it doesn't fit in some
> implementation detail")
>
> It seems to me that the Purity angle here would be to allow len() to return
> any Python int object. The Practical angle wants to restrict it to
> sys.maxsize for performance reasons. Throwing an OverflowError seems like a
> good way for Practical to cry, "Oops, I've been caught".
>
> (I'm interested in this issue because my list-like extension type can in
> some cases have a length greater than sys.maxsize)

The way I see it is that there are tons of ways I can think of how
raising OverflowError can break unsuspecting programs (e.g. code that
has been tested before but never with a humungous input), whereas
returning a "little white lie" would allow such code to proceed just
fine. Some examples of code that is inconvenienced by the exception:

  if len(x):   # used as non-empty test
  if len(x) > 100:   # used to guarantee we can access items 0 through 99
  for i in range(len(x)):   # will be broken out before reaching the end

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From python at rcn.com Tue Sep 2 21:08:14 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 2 Sep 2008 12:08:14 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID:

From: "Guido van Rossum"
> The way I see it is that there are tons of ways I can think of how
> raising OverflowError can break unsuspecting programs (e.g. code that
> has been tested before but never with a humungous input), whereas
> returning a "little white lie" would allow such code to proceed just
> fine. Some examples of code that is inconvenienced by the exception:
>
>  if len(x):   # used as non-empty test
>  if len(x) > 100:   # used to guarantee we can access items 0 through 99
>  for i in range(len(x)):   # will be broken out before reaching the end

That makes sense to me and there are probably plenty of examples.
However, I worry more about other examples that will fail
and do so in a way that is nearly impossible to find through
code review (because the code IS correct as written).

  n = len(log_entries)
  if log_entries[n] in handled:
       log_entries.pop(n)

It's not hard to imagine other examples with slicing and whatnot.
These cases may be less common than those pointed out by Guido
but they will be disastrous when they occur and very hard to
defend against or debug.

I would rather face the overflow errors when they arise than
deal with the latter cases. In the former, I can always make an immediate
fix by replacing the builtin with an Overflow suppressing version.
But, in the latter case, the silent failure is *much* harder to deal with.


Raymond

From guido at python.org Tue Sep 2 21:20:32 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 12:20:32 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID:

On Tue, Sep 2, 2008 at 12:08 PM, Raymond Hettinger wrote:
> From: "Guido van Rossum"
>>
>> The way I see it is that there are tons of ways I can think of how
>> raising OverflowError can break unsuspecting programs (e.g.
code that
>> has been tested before but never with a humungous input), whereas
>> returning a "little white lie" would allow such code to proceed just
>> fine. Some examples of code that is inconvenienced by the exception:
>>
>>  if len(x):   # used as non-empty test
>>  if len(x) > 100:   # used to guarantee we can access items 0 through 99
>>  for i in range(len(x)):   # will be broken out before reaching the end
>
> That makes sense to me and there are probably plenty of examples.
> However, I worry more about other examples that will fail
> and do so in a way that is nearly impossible to find through
> code review (because the code IS correct as written).
>
>  n = len(log_entries)
>  if log_entries[n] in handled:

This should raise an IndexError. I think you meant something else?

>      log_entries.pop(n)
>
> It's not hard to imagine other examples with slicing and whatnot.
> These cases may be less common than those pointed out by Guido
> but they will be disastrous when they occur and very hard to
> defend against or debug.
>
> I would rather face the overflow errors when they arise than
> deal with the latter cases. In the former, I can always make an immediate
> fix by replacing the builtin with an Overflow suppressing version.
> But, in the latter case, the silent failure is *much* harder to deal with.
>
>
> Raymond
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From digitalxero at gmail.com Tue Sep 2 22:13:21 2008
From: digitalxero at gmail.com (Dj Gilcrease)
Date: Tue, 2 Sep 2008 14:13:21 -0600
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID:

Why would it raise an index error when log_entries has more indices
than sys.maxsize? It should just check the entry @ sys.maxsize.

Maybe I missed it, but why can't len just return an int, which if I
remember correctly is now a long in py3k, so on a 64 bit system len
would (hopefully) never lie, but on a 32 bit system it could return a
number significantly lower than the actual number of entries.

On 9/2/08, Guido van Rossum wrote:
> On Tue, Sep 2, 2008 at 12:08 PM, Raymond Hettinger wrote:
>> From: "Guido van Rossum"
>>>
>>> The way I see it is that there are tons of ways I can think of how
>>> raising OverflowError can break unsuspecting programs (e.g. code that
>>> has been tested before but never with a humungous input), whereas
>>> returning a "little white lie" would allow such code to proceed just
>>> fine. Some examples of code that is inconvenienced by the exception:
>>>
>>>  if len(x):   # used as non-empty test
>>>  if len(x) > 100:   # used to guarantee we can access items 0 through 99
>>>  for i in range(len(x)):   # will be broken out before reaching the end
>>
>> That makes sense to me and there are probably plenty of examples.
>> However, I worry more about other examples that will fail
>> and do so in a way that is nearly impossible to find through
>> code review (because the code IS correct as written).
>>
>>  n = len(log_entries)
>>  if log_entries[n] in handled:
>
> This should raise an IndexError. I think you meant something else?
>
>>      log_entries.pop(n)
>>
>> It's not hard to imagine other examples with slicing and whatnot.
>> These cases may be less common than those pointed out by Guido
>> but they will be disastrous when they occur and very hard to
>> defend against or debug.
>>
>> I would rather face the overflow errors when they arise than
>> deal with the latter cases. In the former, I can always make an
>> immediate
>> fix by replacing the builtin with an Overflow suppressing version.
>> But, in the latter case, the silent failure is *much* harder to deal with.
>>
>>
>> Raymond
>>
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

-- 
Dj Gilcrease
OpenRPG Developer
~~http://www.openrpg.com
OpenRPG+ Lead Developer
~~http://openrpg.digitalxero.net
XeroPortal Creator
~~http://www.digitalxero.net
Web Admin for Thewarcouncil.us
~~http://www.thewarcouncil.us

From python at rcn.com Tue Sep 2 22:35:18 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 2 Sep 2008 13:35:18 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID: <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>

>> That makes sense to me and there are probably plenty of examples.
>> However, I worry more about other examples that will fail
>> and do so in a way that is nearly impossible to find through
>> code review (because the code IS correct as written).
>>
>>  n = len(log_entries)
>>  if log_entries[n] in handled:
>
> This should raise an IndexError. I think you meant something else?
>
>>      log_entries.pop(n)

Right. It should have been n-1 in my quick example.

The idea is that if the len() return value is actually being used for
something (in this case indexing, but possibly also slicing, resource
management, etc), then the app will silently start doing the wrong thing.

    next_ticket_number = len(tickets)
    create_new_ticket_form(time(), next_ticket_number)

ISTM, there are many uses for len() when it is bad news if the result
is less than the real length. Those cases will be harder to detect and
correct than if an overflow was raised.
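
A rough sketch of the two behaviours being weighed, assuming a CPython
build where len() raises OverflowError for results above sys.maxsize.
The Virtual class and the clipped_len() helper are hypothetical names
made up for this illustration; clipped_len() plays the role of the
"Overflow suppressing version" of the builtin mentioned earlier in the
thread:

```python
import sys


class Virtual:
    """Hypothetical container whose length does not fit in Py_ssize_t."""

    def __len__(self):
        return sys.maxsize + 1


def clipped_len(obj):
    """Return len(obj), clipping to sys.maxsize instead of raising."""
    try:
        return len(obj)
    except OverflowError:
        # The "little white lie": report the largest representable size.
        return sys.maxsize


small = clipped_len([1, 2, 3])   # ordinary containers are unaffected
clipped = clipped_len(Virtual())  # silently smaller than the real length
```

With the exception, the caller finds out immediately; with the clipped
value, any later arithmetic on the result is quietly off by the amount
that was cut away.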

Raymond

From guido at python.org Tue Sep 2 22:53:00 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Sep 2008 13:53:00 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To: <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID:

On Tue, Sep 2, 2008 at 1:35 PM, Raymond Hettinger wrote:
>>> That makes sense to me and there are probably plenty of examples.
>>> However, I worry more about other examples that will fail
>>> and do so in a way that is nearly impossible to find through
>>> code review (because the code IS correct as written).
>>>
>>>  n = len(log_entries)
>>>  if log_entries[n] in handled:
>>
>> This should raise an IndexError. I think you meant something else?
>>
>>>      log_entries.pop(n)
>
> Right. It should have been n-1 in my quick example.

And why not -1? That doesn't have the clipping problem.

> The idea is that if the len() return value is actually being used for
> something
> (in this case indexing, but possibly also slicing, resource management, etc),
> then the app will silently start doing the wrong thing.
>
>   next_ticket_number = len(tickets)
>   create_new_ticket_form(time(), next_ticket_number)
>
> ISTM, there are many uses for len() when it is bad news if the result
> is less than the real length. Those cases will be harder to detect and
> correct than if an overflow was raised.

I'm sorry, but toy examples like these don't convince me. Most of them
sound like they are likely using real lists.
But for *real* lists, and for
anything that actually stores an item (no matter how small -- could be a
reference to None) for each valid index value, there is no possibility
that __len__() will ever overflow, since the clipping limit is half the
memory size, while the theoretical number of references that can be
stored in memory is either 1/4th or 1/8th of the memory size depending
on pointer size.

The only time when __len__ can be larger than sys.maxsize is when the
class implements some kind of virtual space where the values are
computed on the fly. In such cases trying to walk over all values is
bound to take forever, and the length is likely not of all that much
interest to the caller -- but sometimes we may need to pass such an
object to some library code we didn't write that is making some
trivial use of len(), like the examples I gave before.

That said, I would actually be okay with the status quo (which does
raise an OverflowError) as long as we commit to fixing this properly
in 2.7 / 3.1, by removing the range restriction (like we've done for
other int operations a long time ago).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From daniel at stutzbachenterprises.com Tue Sep 2 23:18:08 2008
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 2 Sep 2008 16:18:08 -0500
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID:

On Tue, Sep 2, 2008 at 3:53 PM, Guido van Rossum wrote:

> The only time when __len__ can be larger than sys.maxsize is when the
> class implements some kind of virtual space where the values are
> computed on the fly.
In such cases trying to walk over all values is
> bound to take forever, and the length is likely not of all that much
> interest to the caller -- but sometimes we may need to pass such an
> object to some library code we didn't write that is making some
> trivial use of len(), like the examples I gave before.

len() is useful for more than iteration, such as setting the bounds for
a binary search (e.g., over a large on-disk data structure)

> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).
>

+1

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC

From rhamph at gmail.com Tue Sep 2 23:34:26 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 2 Sep 2008 15:34:26 -0600
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID:

On Tue, Sep 2, 2008 at 2:53 PM, Guido van Rossum wrote:
> The only time when __len__ can be larger than sys.maxsize is when the
> class implements some kind of virtual space where the values are
> computed on the fly. In such cases trying to walk over all values is
> bound to take forever, and the length is likely not of all that much
> interest to the caller -- but sometimes we may need to pass such an
> object to some library code we didn't write that is making some
> trivial use of len(), like the examples I gave before.
>
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

+1

Otherwise it sounds like these virtual containers shouldn't support
len() at all. Maybe a .len() method instead, with all the TMTOWTDI
that implies.

-- 
Adam Olsen, aka Rhamphoryncus

From ncoghlan at gmail.com Tue Sep 2 23:57:59 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 03 Sep 2008 07:57:59 +1000
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID: <48BDB6E7.5070304@gmail.com>

Guido van Rossum wrote:
> The only time when __len__ can be larger than sys.maxsize is when the
> class implements some kind of virtual space where the values are
> computed on the fly. In such cases trying to walk over all values is
> bound to take forever, and the length is likely not of all that much
> interest to the caller -- but sometimes we may need to pass such an
> object to some library code we didn't write that is making some
> trivial use of len(), like the examples I gave before.
>
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

For those that haven't been following issue 2690, the latter paragraph
will make it much easier to turn range() into a proper representative of
collections.Sequence.

I don't actually see any huge technical problems with implementing this
- we're just going to have to add a second C level method slot that uses
the unaryfunc signature (returning PyObject *) for a "virtual length"
method in addition to the existing mp_length and sq_length (which return
Py_ssize_t).

Cheers,
Nick.
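
To make the location of the restriction concrete, here is a small
sketch, assuming CPython and a hypothetical VirtualSpace class: the
Python-level __len__ method can return an arbitrarily large int, and it
is only the builtin len(), which goes through the Py_ssize_t slot, that
cannot pass the value on.

```python
import sys


class VirtualSpace:
    """Hypothetical sequence whose items are computed on the fly."""

    def __init__(self, size):
        self._size = size

    def __len__(self):
        return self._size


big = VirtualSpace(sys.maxsize * 4)

# Calling the method directly returns the full Python int.
true_length = big.__len__()

# The builtin has to squeeze the result into a Py_ssize_t.
try:
    len(big)
    builtin_ok = True
except OverflowError:
    builtin_ok = False
```

This is only an illustration of the status quo described in the thread,
not of the proposed second slot.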

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From ncoghlan at gmail.com Wed Sep 3 00:01:19 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 03 Sep 2008 08:01:19 +1000
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com>
Message-ID: <48BDB7AF.5050702@gmail.com>

Dj Gilcrease wrote:
> why would it raise an index error when log_entries has more indices
> than sys.maxsize, it should just check the entry @ sys.maxsize.
>
> Maybe I missed it, but why can't len just return an int, which if I
> remember correctly is now a long in py3k, so on a 64 bit system len
> would (hopefully) never lie, but on a 32 bit system it could return a
> number significantly lower than the actual number of entries.

That's the implementation detail Guido is referring to - when len(obj)
delegates to obj.__len__(), the result of the method call gets stored in
a Py_ssize_t value, creating the problem.

It's fixable, but not for the current release cycle.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From python at rcn.com Tue Sep 2 23:30:59 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 2 Sep 2008 14:30:59 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID: <67F7BAE0456A4455B3D9629F8A8BA15A@RaymondLaptop1>

From: "Guido van Rossum"
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

And there was much rejoicing!


Raymond

From greg.ewing at canterbury.ac.nz Wed Sep 3 03:54:13 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 03 Sep 2008 13:54:13 +1200
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To: <48BDB6E7.5070304@gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1> <48BDB6E7.5070304@gmail.com>
Message-ID: <48BDEE45.705@canterbury.ac.nz>

Nick Coghlan wrote:
> - we're just going to have to add a second C level method slot that uses
> the unaryfunc signature (returning PyObject *) for a "virtual length"
> method in addition to the existing mp_length and sq_length (which return
> Py_ssize_t).

As an aside, is there any plan to clean up the duplication
between the mp_ and sq_ method slots?

-- 
Greg

From hagenf at coli.uni-saarland.de Wed Sep 3 10:23:37 2008
From: hagenf at coli.uni-saarland.de (Hagen Fürstenau)
Date: Wed, 3 Sep 2008 09:23:37 +0100
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID: <33965e610809030123p4df9c62aq9a02e36a9e79a041@mail.gmail.com>

> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

What should be done when __len__() returns a float? In Python 2.6 the
behaviour depends on whether the class is new-style or not. (Old-style
classes raise a TypeError, new-style classes truncate.) Is there any
good reason for truncating?

- Hagen

From guido at python.org Wed Sep 3 18:15:14 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Sep 2008 09:15:14 -0700
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To: <33965e610809030123p4df9c62aq9a02e36a9e79a041@mail.gmail.com>
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1> <33965e610809030123p4df9c62aq9a02e36a9e79a041@mail.gmail.com>
Message-ID:

2008/9/3 Hagen Fürstenau :
>> That said, I would actually be okay with the status quo (which does
>> raise an OverflowError) as long as we commit to fixing this properly
>> in 2.7 / 3.1, by removing the range restriction (like we've done for
>> other int operations a long time ago).
>
> What should be done when __len__() returns a float? In Python 2.6 the
> behaviour depends on whether the class is new-style or not. (Old-style
> classes raise a TypeError, new-style classes truncate.) Is there any
> good reason for truncating?

That sounds like a bug. IMO TypeError is the right response here.
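
A short sketch of the TypeError behaviour, assuming a current CPython 3
interpreter and a made-up FloatLen class (this says nothing about how
2.6 behaved beyond what is described in the message above):

```python
class FloatLen:
    """Hypothetical class whose __len__ returns a non-integer."""

    def __len__(self):
        return 3.7


try:
    len(FloatLen())
    outcome = "no error"
except TypeError:
    # The float is rejected rather than truncated to 3.
    outcome = "TypeError"
```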

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcea at jcea.es Wed Sep 3 18:59:15 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 03 Sep 2008 18:59:15 +0200
Subject: [Python-3000] Should len() clip to sys.maxsize or raiseOverflowError?
In-Reply-To:
References: <33965e610808300807t3dc7eb1es44a29afd2afcd56f@mail.gmail.com> <52F00F4EEEC242D3896F2BF59EE6C1D6@RaymondLaptop1>
Message-ID: <48BEC263.40402@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Guido van Rossum wrote:
> That said, I would actually be okay with the status quo (which does
> raise an OverflowError) as long as we commit to fixing this properly
> in 2.7 / 3.1, by removing the range restriction (like we've done for
> other int operations a long time ago).

+1

Some of my python time is spent managing a huge python persistent
repository of almost two hundred terabytes. I have more than 2^32
objects just now. In fact, a bit more than 2^35. Growing about 25-30%
per year. Disk capacity rises about 40% per year, so the hardware
growth is under control :)... barely.

- --
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/
.
_/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL7CYJlgi5GaxT1NAQJwFgP9HHsI/GfNY3i0ZTEvfRt16BGVJ5gOKq35
KNDv4XzuMFmdaPEdwtAuKvGEbcb5f8+jvkbi3kmUHVJI73hs0pO/lKMbtKAjwsal
rq/wlA7na6oHhe7zIN/UljQPJhy1K0SuSO2Y0GzeJ88MTRbBjw/Ulitw3ESc5Dij
eSQkpRrkQHc=
=QSRC
-----END PGP SIGNATURE-----

From jcea at jcea.es Wed Sep 3 20:03:16 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 03 Sep 2008 20:03:16 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance")
Message-ID: <48BED164.1050103@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

With this email I confirm that bsddb work for 2.6/3.0 rc1 is done.

I have some issues with the buildbots, nevertheless. See:
, for example.

"""
Re-running failed tests in verbose mode
Re-running test 'test_bsddb3' in verbose mode
test test_bsddb3 crashed -- : str() on a bytes instance
Traceback (most recent call last):
  File "./Lib/test/regrtest.py", line 603, in runtest_inner
    indirect_test()
  File "/Users/buildslave/bb/3.0.psf-g4/build/Lib/test/test_bsddb3.py", line 60, in test_main
    print(db.DB_VERSION_STRING, file=sys.stderr)
BytesWarning: str() on a bytes instance
"""

I can't reproduce the issue in my local Python3.0 development version
(here, all tests pass fine). Any suggestions?

"Decoding" the "db.DB_VERSION_STRING" byte string would solve the error,
but I would rather know WHY I am having this issue at all. My
Python3.0 "str()" has no issue with bytes values:

"""
[jcea at tesalia tmp]$ python3.0
Python 3.0b3+ (py3k:66121, Sep 1 2008, 22:25:14)
[GCC 4.2.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> b=b'some string...'
>>> b
b'some string...'
>>> str(b)
"b'some string...'"
>>>
"""

Help appreciated.

- --
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/
. _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSL7RXplgi5GaxT1NAQLGywP+KHothVVQ1bxmQZixBfKrdAhyqCRZ3S61
xGHE9U6IOaF1fB5O9S+E/OGEa8RX5hWNyxie5UsjG7N7qt0r6q5tqA8bedomsZtY
/CM0lCOr5F3ssFsUF965WxUD03aD+IRssr+7SKTyotNHH1qGF7ffggfGDJmF0/wq
C3RYRS6cUjs=
=sx2A
-----END PGP SIGNATURE-----

From lists at cheimes.de Wed Sep 3 20:36:48 2008
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 03 Sep 2008 20:36:48 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance")
In-Reply-To: <48BED164.1050103@jcea.es>
References: <48BED164.1050103@jcea.es>
Message-ID:

Jesus Cea wrote:
> I can't reproduce the issue in my local Python3.0 development version
> (here, all tests pass fine). Any suggestions?

Yeah, use my byte warning mode of Python 3.0. Before your time in the
core team I had a long discussion with Guido and a few others. The
conclusion of the discussion was the byte warning mode.

./python --help
-b : issue warnings about str(bytes_instance), str(buffer_instance)
     and comparing bytes/buffer with str. (-bb: issue errors)

By the way buffer_instance should be renamed to bytearray.

./python -bb
Python 3.0b3+ (py3k:66180M, Sep 3 2008, 12:35:13)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: str() on a bytes instance
[41030 refs]

> "Decoding" the "db.DB_VERSION_STRING" byte string would solve the error,
> but I rather prefer to know WHY I am having this issue at all. My
> Python3.0 "str()" has no any issue with byte values:

It has no issues because Guido wanted str() to succeed. Any comparison
or conversion of a bytes / bytearray instance with / to str is most
likely a bug or design flaw in the application. The byte warning mode
helps to discover hard-to-find bugs like "" == b"".

Christian

From jcea at jcea.es Wed Sep 3 23:31:49 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 03 Sep 2008 23:31:49 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance")
In-Reply-To: 
References: <48BED164.1050103@jcea.es>
Message-ID: <48BF0245.9000102@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Christian Heimes wrote:
> Jesus Cea wrote:
>> I can't reproduce the issue in my local Python3.0 development version
>> (here, all tests passes fine). Any suggestion?.
>
> Yeah, use my byte warning mode of Python 3.0. Before your time in the
> core team I had a long discussion with Guido and a few others. The
> conclusion of the discussion was the byte warning mode.

So much to learn, so little time :). Do you have a URL at hand?

>> "Decoding" the "db.DB_VERSION_STRING" byte string would solve the error,
>> but I rather prefer to know WHY I am having this issue at all. My
>> Python3.0 "str()" has no any issue with byte values:
>
> It has no issues because Guido wanted str() to succeed. Any comparison
> or conversion of a bytes / bytearray instance with / to str is most
> likely a bug or design flaw in the application. The byte warning mode
> helps to discover hard-to-find bugs like "" == b"".

Just committed the "decode" thing: r66188.
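
[Editorial illustration, not part of the original thread.] A minimal sketch of the byte warning mode discussed above, assuming a CPython 3 interpreter that still accepts the -b / -bb flags. The helper name run_snippet is made up for this example and belongs neither to Python nor to bsddb. It runs the same str() call in fresh interpreters with and without -bb, then shows that an explicit decode avoids the warning altogether:

```python
import subprocess
import sys


def run_snippet(flags, snippet):
    """Run `snippet` in a fresh interpreter started with `flags` (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, *flags, "-c", snippet],
        capture_output=True,
        text=True,
    )


# Without -b, str() on bytes quietly returns the repr of the bytes object.
plain = run_snippet([], "print(str(b'abc'))")

# With -bb, the same call is turned into a BytesWarning error.
strict = run_snippet(["-bb"], "print(str(b'abc'))")

# Decoding explicitly is fine in either mode.
decoded = run_snippet(["-bb"], "print(b'abc'.decode('ascii'))")

print("plain  :", plain.returncode, plain.stdout.strip())
print("strict :", strict.returncode, strict.stderr.strip().splitlines()[-1:])
print("decoded:", decoded.returncode, decoded.stdout.strip())
```

This would explain why a buildbot started with -bb fails on a print() of a bytes value while a locally built interpreter started without the flag passes the same test.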
- -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSL8CPplgi5GaxT1NAQK9gQP/SvkaSpRl+hqAtuCy1uh8cj6NwUN2Iw0E Z6XZfxNIRgcBtwJdugSu/70lqRsCusj9cSrxxhCw5xPQSjUeLQTsVlvqGNPGU1XI PNKrA6ofqsHRlJgg/umKmdlyOy8PftckWugPOw2RVIeQXeRuWxs35/7F4uVEnCT2 ttCvJJPhDJo= =hahg -----END PGP SIGNATURE----- From barry at python.org Thu Sep 4 00:29:02 2008 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Sep 2008 18:29:02 -0400 Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance") In-Reply-To: <48BF0245.9000102@jcea.es> References: <48BED164.1050103@jcea.es> <48BF0245.9000102@jcea.es> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 3, 2008, at 5:31 PM, Jesus Cea wrote: > Christian Heimes wrote: >> Jesus Cea wrote: >>> I can't reproduce the issue in my local Python3.0 development >>> version >>> (here, all tests passes fine). Any suggestion?. >> >> Yeah, use my byte warning mode of Python 3.0. Before your time in the >> core team I had a long discussion with Guido and a few others. The >> conclusion of the discussion was the byte warning mode. > > So much to learn, so little time :). Do you have an URL at hand? > >>> "Decoding" the "db.DB_VERSION_STRING" byte string would solve the >>> error, >>> but I rather prefer to know WHY I am having this issue at all. My >>> Python3.0 "str()" has no any issue with byte values: >> >> It has no issues because Guido wanted str() to successed. 
Any >> comparison >> or conversion of a byte / bytearray instance with / to str is most >> likely a bug or design flaw in the application. The byte warning mode >> helps to discover hard to find bugs like "" == b"". > > Just committed the "decode" thing: r66188. Jesus, again thanks for working so hard on pybsddb, I really appreciate the effort. However, in talking with several developers, there are still concerns about bundling bsddb with Python 3.0. We have to leave it in 2.6 for backward compatibility reasons, but we should deprecate it, remove it from 3.0 and continue to release it as a separate package. The issues come down to these: - - You (Jesus) are the only person maintaining the code - - The upstream bsddb API has never been the most stable thing in the world - - Concerns that the buildbot environments are not adequately testing the code My gut own feeling is that both pybsddb and Python would be much better served with this code outside the core, distributed separately. All your work would still live on, and be appreciated by the community, but neither pybsddb nor Python would be tied to each other's release cycles. And of course, your continued maintenance on the 2.6 branch is greatly appreciated. The dilemma for me is whether to let 3.0rc1 go out with or without bsddb. Either way, if there's a pitchfork revolt of this decision we'll have to break our rule for rc2 to back the change out. Guido has already given his approval to remove pybsddb from Python 3.0: http://mail.python.org/pipermail/python-dev/2008-July/081379.html and I know Brett agrees, so that's it. On IRC, I've just asked Benjamin to do the honors for 3.0 and Brett will add the deprecations for 2.6. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL8PrnEjvBPtnXfVAQKFfAP9EfWlyrmdBvIrO85vX4dpd/uIjM1Q5Ngm LP4a20nWPsmA6LpMbW7fjpwVnnNOeJqamqX8JFsqcETw1GOJnIgovqhHItzCgQjb 0X+Uw/m2Uv0TqKcgrf0WXw61sLG8liWdkV4tq92JnbzBVwEzCTdZPDfOGUGEYRop Q3LLOxKRRow= =4dVo -----END PGP SIGNATURE----- From jcea at jcea.es Thu Sep 4 01:01:29 2008 From: jcea at jcea.es (Jesus Cea) Date: Thu, 04 Sep 2008 01:01:29 +0200 Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance") In-Reply-To: References: <48BED164.1050103@jcea.es> <48BF0245.9000102@jcea.es> Message-ID: <48BF1749.7040803@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Barry Warsaw wrote: > and I know Brett agrees, so that's it. On IRC, I've just asked Benjamin > to do the honors for 3.0 and Brett will add the deprecations for 2.6. I just committed the fix for bsddb testsuite in Python 3.0 branch: http://www.python.org/dev/buildbot/3.0.stable/changes/2687 Can I do anything to revert this decision?. If not, what can I do to be reconsidered in 3.1?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSL8XSZlgi5GaxT1NAQJQ5QP/ZGivsmwMbMta2mcxYSbc97BgHGbvIavD fTjuJ7v2R+3p0bQIAAGs7ih7mkJ/6a+F6j2hqC4Qk+0p3NK5IYn+lCThtBmjlIyb zmcTBzWyctuSdjV/AvDhjziRbnMlCIjhBCBHO9vc82hb5AmiBo5XT9szJHCfpDa+ 2DWp8t765t8= =SrAU -----END PGP SIGNATURE----- From barry at python.org Thu Sep 4 03:25:26 2008 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Sep 2008 21:25:26 -0400 Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance") In-Reply-To: <48BF1749.7040803@jcea.es> References: <48BED164.1050103@jcea.es> <48BF0245.9000102@jcea.es> <48BF1749.7040803@jcea.es> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 3, 2008, at 7:01 PM, Jesus Cea wrote: > Barry Warsaw wrote: >> and I know Brett agrees, so that's it. On IRC, I've just asked >> Benjamin >> to do the honors for 3.0 and Brett will add the deprecations for 2.6. > > I just committed the fix for bsddb testsuite in Python 3.0 branch: > http://www.python.org/dev/buildbot/3.0.stable/changes/2687 > > Can I do anything to revert this decision?. If not, what can I do to > be > reconsidered in 3.1?. Start raising some pitchforks. It looks like Raymond will join the march :). Really, this is about what's best for Python and pybsddb. In this article, Guido unambiguously states his opinion: http://mail.python.org/pipermail/python-dev/2008-July/081362.html "+1. In my recollection maintaining bsddb has been nothing but trouble right from the start when we were all sitting together at "Zope Corp North" in a rented office in McLean... We can remove it from 3.0. We can't really remove it from 2.6, but we can certainly start end-of-lifing it in 2.6." 
Jesus, let me stress that IMO this is not a reflection on your work at all. On the contrary, keeping it alive in 2.x and providing a really solid independent package for 3.0 is critical for its continued relevance to Python programmers. I completely agree with Guido that bsddb (not pybsddb) has been a headache since forever. For example, IIRC Sleepycat was notorious for changing the API in micro releases, though I don't know if that's still the case with the current maintainers. I personally believe that Python and pybsddb are both better off with their own maintenance lifecycles so I stand by my decision that pulling it out of 3.0 is the right thing to do. 3.1 is far enough away that any decision we make in 3.0 can be re-evaluated. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL85BnEjvBPtnXfVAQLfkwQAtoagOP37uAwL1r2H7w73erTsWBYHf4VH KcTZsjeQ/mEvmaaJIG86ylAtpxmDmMF5x7OClR66bXXxf0oTnWV4KMC9rLdQW8R/ KpMIfuQw/501AQgFmcB0M6SQ6CYyJHU5K+K6X+ScOPHOJoG8usPK1pk8XFGOXBZK UGXCEHVvlrk= =7AOQ -----END PGP SIGNATURE----- From skip at pobox.com Thu Sep 4 04:33:48 2008 From: skip at pobox.com (skip at pobox.com) Date: Wed, 3 Sep 2008 21:33:48 -0500 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> Message-ID: <18623.18700.76260.893902@montanaro-dyndns-org.local> >From issue3769: Skip> Remind me why we want to get rid of bsddb? Benjamin> The reasons are enumerated in PEP 3108. Not much justification and no references to outside discussion for such a heavily used package which has been part of Python for a long time in one form or another. I find it amusing that bsddb3 is a key justification for the removal of bsddb185 and then later on bsddb3 is deemed too much of a maintenance burden itself to retain. 
Does dumbdbm (aka dbm.dumb) work any better than it used to? I'd hate to
think that's going to be the default cross-platform dictionary-on-disk
package for Python. Can the PEP at least be edited to reflect which of the
various dbm.{bsd,ndbm,dumb,gnu} modules will be supported on what platforms
by default?

Skip

From brett at python.org Thu Sep 4 05:26:22 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 3 Sep 2008 20:26:22 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <18623.18700.76260.893902@montanaro-dyndns-org.local>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local>
Message-ID: 

On Wed, Sep 3, 2008 at 7:33 PM, wrote:
>
> From issue3769:
>
> Skip> Remind me why we want to get rid of bsddb?
>
> Benjamin> The reasons are enumerated in PEP 3108.
>
> Not much justification and no references to outside discussion for such a
> heavily used package which has been part of Python for a long time in one
> form or another.
>
> I find it amusing that bsddb3 is a key justification for the removal of
> bsddb185 and then later on bsddb3 is deemed too much of a maintenance burden
> itself to retain.
>
> Does dumbdbm (aka dbm.dumb) work any better than it used to? I'd hate to
> think that's going to be the default cross-platform dictionary-on-disk
> package for Python. Can the PEP at least be edited to reflect which of the
> various dbm.{bsd,ndbm,dumb,gnu} modules will be supported on what platforms
> by default?
>

All but dbm.dumb require some pre-existing library to compile against.
So any platform that has the proper libraries installed will be able to
use ndbm or gnu, but as for which platforms those are, I do not know.
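
[Editorial illustration, not part of the original thread.] A small sketch of what the answer above means in practice on a current Python 3, assuming the dbm package layout named in the thread (dbm.gnu, dbm.ndbm, dbm.dumb). Both function names are invented for this example. The first reports which backends import on the running build; the second does a round trip through dbm.dumb, the pure-Python fallback that needs no external library:

```python
import importlib
import os
import tempfile

import dbm.dumb


def available_dbm_backends():
    """Return the names of the dbm submodules that import on this build."""
    found = []
    for name in ("dbm.gnu", "dbm.ndbm", "dbm.dumb"):
        try:
            importlib.import_module(name)
        except ImportError:
            # The C extension was not built, usually because the
            # underlying library was missing at compile time.
            continue
        found.append(name)
    return found


def roundtrip_with_dumb(directory):
    """Store one key in a dbm.dumb database under `directory` and read it back."""
    path = os.path.join(directory, "example")
    with dbm.dumb.open(path, "c") as db:
        db[b"key"] = b"value"
    with dbm.dumb.open(path, "r") as db:
        return db[b"key"]


with tempfile.TemporaryDirectory() as tmp:
    value = roundtrip_with_dumb(tmp)

print(available_dbm_backends(), value)
```

The list printed by available_dbm_backends() will differ between platforms, which is the per-platform question Skip raises; only dbm.dumb can be counted on everywhere.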
-Brett From python at rcn.com Thu Sep 4 05:36:40 2008 From: python at rcn.com (Raymond Hettinger) Date: Wed, 3 Sep 2008 20:36:40 -0700 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> Message-ID: <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1> > Skip> Remind me why we want to get rid of bsddb? > > Benjamin> The reasons are enumerated in PEP 3108. > > Not much justification and no references to outside discussion for such a > heavily used package which has been part of Python for a long time in one > form or another. Well said. Raymond From barry at python.org Thu Sep 4 05:41:27 2008 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Sep 2008 23:41:27 -0400 Subject: [Python-3000] Not releasing rc1 tonight Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I'm not going to release rc1 tonight. There are too many open release blockers that I don't want to defer, and I'd like the buildbots to churn through the bsddb removal on all platforms. Let me first thank Benjamin, Brett, Mark and Antoine for their help on IRC tonight. Here are the issues I'm not comfortable with deferring: 3640 test_cpickle crash on AMD64 Windows build 874900 threading module can deadlock after fork 3574 compile() cannot decode Latin-1 source encodings 3657 pickle can pickle the wrong function 3187 os.listdir can return byte strings 3660 reference leaks in 3.0 3594 PyTokenizer_FindEncoding() never succeeds 3629 Py30b3 won't compile a regex that compiles with 2.5.2 and 30b2 In addition, Mark reported in IRC that there are some regressions in the logging module. I appreciate any feedback or fixes you can provide on these issues. You might also want to look at the deferred blockers to see if there's anything that really should be blocking rc1. 
I'd like to try again on Friday and stick to rc2 on the 17th.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSL9Y53EjvBPtnXfVAQJGXwP+JZUa5EWlQh7yzt7aFdEM3qgiFZnKqWhz
TN4Cen0/eK8c4+t8a5WC+OLvc/P3PhMPhLSnE+g6IqQUO+pt+2LANgpAvCUrUahc
Nk2pt3gCclcmWlzVvCBspVPZjFPkHsW0uVhgK6x1C/2Re90yjeBqPGgT4LGlmaR3
bz6A3iiUnk0=
=Y5aN
-----END PGP SIGNATURE-----

From brett at python.org Thu Sep 4 06:10:33 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 3 Sep 2008 21:10:33 -0700
Subject: [Python-3000] Problem with grammar for 'except'?
Message-ID: 

I gave a talk last night at the Vancouver Python users group on
2.6/3.0, and I tried the following code and it failed during a live
demo::

    >>> try: pass
    ... except Exception, Exception: pass
      File "<stdin>", line 2
        except Exception, Exception: pass
                        ^
    SyntaxError: invalid syntax

Now from what I can tell from PEP 3110, that should be legal in 3.0.
Am I reading the PEP correctly?

-Brett

From python at rcn.com Thu Sep 4 06:14:21 2008
From: python at rcn.com (Raymond Hettinger)
Date: Wed, 3 Sep 2008 21:14:21 -0700
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
References: 
Message-ID: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>

[Barry]
> I'm not going to release rc1 tonight.

Can I go ahead with some bug fixes and doc improvements
or should I wait until after Friday?

Raymond

From python at rcn.com Thu Sep 4 06:25:00 2008
From: python at rcn.com (Raymond Hettinger)
Date: Wed, 3 Sep 2008 21:25:00 -0700
Subject: [Python-3000] Problem with grammar for 'except'?
References: 
Message-ID: 

[Brett]
> I gave a talk last night at the Vancouver Python users group on
> 2.6/3.0, and I tried the following code and it failed during a live
> demo::
>
>     >>> try: pass
>     ... except Exception, Exception: pass
>       File "<stdin>", line 2
>         except Exception, Exception: pass
>                         ^
>     SyntaxError: invalid syntax
>
> Now from what I can tell from PEP 3110, that should be legal in 3.0.
> Am I reading the PEP correctly?

Don't think so.
The parens are necessary for a tuple of exceptions lest it be confused
with the old "except E, v" syntax which meant "except E as v".
Maybe in 3.1, the paren requirement can be dropped. But for 3.0, it
would be a problem given that old scripts would start getting
misinterpreted.

I did something similar for list.sort() by requiring keyword arguments.
That way, we wouldn't have list.sort(f) running with f as a cmp function
in 2.6 and as a key function in 3.0.

Raymond

From greg at krypto.org Thu Sep 4 07:08:04 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 3 Sep 2008 22:08:04 -0700
Subject: [Python-3000] PEP 3108 and the demise of bsddb3
In-Reply-To: <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>
References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1>
Message-ID: <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com>

On Wed, Sep 3, 2008 at 8:36 PM, Raymond Hettinger wrote:
>> Skip> Remind me why we want to get rid of bsddb?
>>
>> Benjamin> The reasons are enumerated in PEP 3108.
>>
>> Not much justification and no references to outside discussion for such a
>> heavily used package which has been part of Python for a long time in one
>> form or another.
>
> Well said.

Frankly I don't see a big deal with not including it in *3.0* so long
as a reference to where to download it as an add-on (jcea's pybsddb
site) is included in the release notes and PEP 3108. I've updated the
relevant documentation in 3.0.

I'm not going to fight a battle against its removal when I know
several python devs are already way too scarred and cranky to ever
change their minds due to BerkeleyDB itself being painful to get
working right on all platforms. There's truth to that not being worth
our time if we actually want to test the module properly to avoid
shipping a lemon.
The fact that the Python Lib/bsddb/test/ test suite has uncovered actual Oracle/Sleepycat BerkeleyDB bugs in supposedly stable releases has always disturbed me. I do wish this had been discussed on comp.lang.python before now rather than pulling the rug out at the last minute. oh well. -gps PS Thank you jcea for your wonderful work on improving bsddb! Regardless of whether it appears in the standard library in the future you're making many users very happy with your work. From greg at krypto.org Thu Sep 4 07:17:53 2008 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 3 Sep 2008 22:17:53 -0700 Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta 3 planned for this Wednesday) In-Reply-To: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> Message-ID: <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> I agree that this should go in. zlib should return bytes. other read functions and similar modules like bz2module already return bytes. unless i hear objections, i'll commit this in about 12 hours. On Wed, Aug 20, 2008 at 12:20 AM, Anand Balachandran Pillai wrote: > Hi, > > I think the patches for issue 3492 should also be merged in for this beta. > It affects the behaviour of zlib module. I have submitted patches for > zlibmodule.c > and test_zlib.py a few weeks back and they are ready to be merged anytime. > > I am forwarding a message where I discussed this with Antoine, who is > CCed on the bug. > > Thanks > > --Anand > > ---------- Forwarded message ---------- > From: Antoine Pitrou > Date: Wed, Aug 20, 2008 at 12:26 PM > Subject: Re: [Python-3000] Beta 3 planned for this Wednesday > To: Anand Balachandran Pillai > > > > Hello Anand, > >> If that is the case is http://bugs.python.org/issue3492 important ? 
It is >> not marked as critical or blocker or anything, but I find it strange >> that zlib in >> Python 3.0 will return bytearrays, if we don't merge the patches. > > Agreed, however I can't do it myself today. You should post your message > to the list or on the bug tracker if you want someone to apply the patch > before the beta. > > Regards > > Antoine. > > > > > > -- > -Anand > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/greg%40krypto.org > From abpillai at gmail.com Thu Sep 4 07:42:43 2008 From: abpillai at gmail.com (Anand Balachandran Pillai) Date: Thu, 4 Sep 2008 11:12:43 +0530 Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta 3 planned for this Wednesday) In-Reply-To: <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> Message-ID: <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith wrote: > I agree that this should go in. zlib should return bytes. other read > functions and similar modules like bz2module already return bytes. > unless i hear objections, i'll commit this in about 12 hours. +1 :) > Regards -- -Anand From mhammond at skippinet.com.au Thu Sep 4 09:28:36 2008 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 4 Sep 2008 17:28:36 +1000 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: References: Message-ID: <00cd01c90e5f$daf326a0$90d973e0$@com.au> Barry writes: > In addition, Mark reported in IRC that there are some regressions in > the logging module. 3772 logging module fails with non-ascii data Which according to the IRC discussion doesn't apply to py3k. The fix for 2.6 is quite trivial... 
Cheers,

Mark

From jcea at jcea.es Thu Sep 4 11:59:28 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 04 Sep 2008 11:59:28 +0200
Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance")
In-Reply-To: 
References: <48BED164.1050103@jcea.es> <48BF0245.9000102@jcea.es> <48BF1749.7040803@jcea.es>
Message-ID: <48BFB180.7010706@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Barry Warsaw wrote:
>> Can I do anything to revert this decision?. If not, what can I do to be
>> reconsidered in 3.1?.
>
> Start raising some pitchforks. It looks like Raymond will join the
> march :).

Sorry, I know the word ("pitchfork"), but I don't understand the meaning
you want to communicate. English is not my native language.

> I completely agree with Guido that bsddb (not pybsddb) has been a
> headache since forever.

I took over bsddb maintenance in February/March because it was decided
it was unmaintained and should be removed. I stepped forward to avoid
this risk, because I'm qualified, motivated and I use bsddb every day.

I think that providing a powerful ACID/replication/distributed
transaction module in stock Python is a *huge* feature. In fact, given
some enterprise policies, some users of this module won't be able to
use it, because policies don't allow installing "non standard" packages,
aside from Python itself.

Being a novice python-dev member is a handicap, working on such a
"visible" and shamed module, and it shows. But although managing the 3.0
conversion hasn't been easy, it is already done.

> I personally believe that Python
> and pybsddb are both better off with their own maintenance lifecycles

I agree, and that is the reason there exists a separate bsddb3 module
available via PyPI. But that is orthogonal to bsddb inclusion in Python.

I will keep maintaining bsddb as a separate package, in any case. But I
will miss being able to get your advice and your knowledge, and the
invaluable patches Neal provides from time to time :).
3.0 release is being stressful for all of us. I now. Thanks for the time you spent explaining the situation and letting me argue back :). Thanks to all of you (python-dev) for the time you wasted teaching me and suffering builtbots crashes O:-) PS: I will battle for bsddb readmission. If any of you can provide positive rationals for it, please, let me know. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSL+xgJlgi5GaxT1NAQIdNAP/bySuzSW6bqUnT89Y0tUuGI0G0Svmol1A 3YNXtW/UIhNSL2BVNrAzrSLlcFjJmoCJOOUCfsK22sMb7+JveLFofPiUz+4Q2eaA Zs3rIFY/k13eJmFtDd101OExgBtamzIUjkYVyr6OxdxlvIMbDp2zMdwHiFQM3vr8 MUntInFEDQA= =qTfu -----END PGP SIGNATURE----- From andrewm at object-craft.com.au Thu Sep 4 12:29:04 2008 From: andrewm at object-craft.com.au (Andrew McNamara) Date: Thu, 04 Sep 2008 20:29:04 +1000 Subject: [Python-3000] bsddb finished for 2.6/3.0 (and ": str() on a bytes instance") In-Reply-To: <48BFB180.7010706@jcea.es> References: <48BED164.1050103@jcea.es> <48BF0245.9000102@jcea.es> <48BF1749.7040803@jcea.es> <48BFB180.7010706@jcea.es> Message-ID: <20080904102904.66DBB600898@longblack.object-craft.com.au> >> Start raising some pitchforks. It looks like Raymond will join the >> march :). > >Sorry, I know the word ("pitchfork"), but I don't understand the meaning >you want to communicate. English is not my native language. The farmers march on town hall as a mob, carrying their pitchforks, when the council makes an unpopular rule. 
-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From skip at pobox.com Thu Sep 4 13:03:27 2008 From: skip at pobox.com (skip at pobox.com) Date: Thu, 4 Sep 2008 06:03:27 -0500 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> Message-ID: <18623.49279.686393.3857@montanaro-dyndns-org.local> Brett> All but dbm.dumb require some pre-existing library to exist to Brett> compile against. So any platform that has the proper libraries Brett> installed will be able to use ndbm or gnu, but as for which Brett> platforms that are I do not know. Wasn't bsddb (either bsddb185 or bsddb3) built for the Windows distributions? Without something there's no guarantee that anydbm or shelve will work out of the box. As Raymond pointed out, dumbdbm would be a poor choice as the default dict-on-disk module. Skip From skip at pobox.com Thu Sep 4 13:08:01 2008 From: skip at pobox.com (skip at pobox.com) Date: Thu, 4 Sep 2008 06:08:01 -0500 Subject: [Python-3000] Not releasing rc1 tonight In-Reply-To: References: Message-ID: <18623.49553.393129.601096@montanaro-dyndns-org.local> Barry> In addition, Mark reported in IRC that there are some regressions Barry> in the logging module. Vinay apparently checked in some changes to the logging module with no review. In the absence of obvious bug fixes there that should probably be reverted. 
Skip

From facundobatista at gmail.com Thu Sep 4 13:31:18 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 4 Sep 2008 08:31:18 -0300
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
References: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
Message-ID: 

2008/9/4 Raymond Hettinger :

> Can I go ahead with some bug fixes and doc improvements
> or should I wait until after Friday?

Doc improvements: go ahead.

Bug fixes: the patches should be reviewed by another developer.

(I'll be hanging around in #python-dev today and tomorrow, btw, ping
me if I can help you)

-- 
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From barry at python.org Thu Sep 4 15:01:34 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Sep 2008 09:01:34 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
References: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1>
Message-ID: <5FEA88D7-EEA9-40E1-AD03-FC2607565FEC@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2008, at 12:14 AM, Raymond Hettinger wrote:

> [Barry]
>> I'm not going to release rc1 tonight.
>
> Can I go ahead with some bug fixes and doc improvements
> or should I wait until after Friday?

Doc fixes are fine. Please have bug fix patches reviewed by another
python-dev member. Bonus points for any bug fix that closes a release
blocker!
:) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL/cL3EjvBPtnXfVAQKnmgQAlx89LWeq0hEmTRvTGy/DHIYioARqAisG wJAnZPqinbGI6pkyn0kiwgDOvNzstnFQSZsEFiAFU+iF+nbgkm8agcTf+eLXqCFK y+o0xXTi7fLXKuaGioY54kz3BcwQH17Ul3X6vRxBdCWYesAe3rIXprnNgt/Euuyy P5sZLKwfTls= =b3n4 -----END PGP SIGNATURE----- From barry at python.org Thu Sep 4 15:02:44 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 09:02:44 -0400 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: <00cd01c90e5f$daf326a0$90d973e0$@com.au> References: <00cd01c90e5f$daf326a0$90d973e0$@com.au> Message-ID: <50FD6660-44FE-433E-827C-B1ECD82FB55A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 3:28 AM, Mark Hammond wrote: > Barry writes: > >> In addition, Mark reported in IRC that there are some regressions in >> the logging module. > > 3772 logging module fails with non-ascii data > > Which according to the IRC discussion doesn't apply to py3k. The > fix for > 2.6 is quite trivial... Thanks. Looks like Vinay committed this. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL/cdHEjvBPtnXfVAQIb7gP9G2o8eSnWWfEmlanwoqiHGxgqUjQtx8Xz es/Sjclk5KZ2X4I/jITJcOxGDfTT3h7FX9tDQiUaIzZAVB66qyzWc3957bUwqeqS 9HNqfB4OoIa1Ds2+lukXpEPci6eddl2xVFEkejgsfdyS4q2/K1/R6URTPCXnPNiH zoiXNaEcBcM= =Zk4M -----END PGP SIGNATURE----- From barry at python.org Thu Sep 4 15:03:34 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 09:03:34 -0400 Subject: [Python-3000] Not releasing rc1 tonight In-Reply-To: <18623.49553.393129.601096@montanaro-dyndns-org.local> References: <18623.49553.393129.601096@montanaro-dyndns-org.local> Message-ID: <4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 7:08 AM, skip at pobox.com wrote: > > Barry> In addition, Mark reported in IRC that there are some > regressions > Barry> in the logging module. 
> > Vinay apparently checked in some changes to the logging module with no > review. In the absence of obvious bug fixes there that should > probably be > reverted. Or did he commit Mark's patch from bug 3772? If so, that would count as a reviewed patch. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL/cpnEjvBPtnXfVAQIkSwQApjBbIGgyV3X1oBhBLtRjTZrVDgFXPfRH XyXtVd1r75PT+24UuqPHWNC9l+/sKnUaYqH3kJbHG2duMyr/duG7j6EIJLzOz+QC SKwqtQr+WDBR0vpH3Q0wrUzQNXhtDyCjWx84IatRbKRVDUfbDlFQy/jj+SLvRBBR WGJTAFP1x5g= =mebg -----END PGP SIGNATURE----- From barry at python.org Thu Sep 4 15:04:25 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 09:04:25 -0400 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: References: <7CA312873E964C5ABB4B4AA4F2793DF2@RaymondLaptop1> Message-ID: <27C24817-848B-4173-B5F8-7240EB5B28D1@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 7:31 AM, Facundo Batista wrote: > (I'll be hanging around in #python-dev today and tomorrow, btw, ping > me if I can help you) Me too, though I'm a bit busy at work. Ping my nick 'barry' if you need any RM-level decision. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL/c2XEjvBPtnXfVAQJ4EQP/SecaG0VRtsezedDRpX+zwmVo6W0n+9EP rmKH5CWMSTWh53rXySCmE8IS2rrdhoyCZNSy0aERMTGz5JuEh/sw+O5EaxJQMFST DdYx0aLRVwb62JaQHr7W7YyVWBG5+CQa3BehASFiwsw0dsAp0BpkwW1nIhybkLcW hXNRzB2gwXI= =9Mgt -----END PGP SIGNATURE----- From skip at pobox.com Thu Sep 4 15:45:43 2008 From: skip at pobox.com (skip at pobox.com) Date: Thu, 4 Sep 2008 08:45:43 -0500 Subject: [Python-3000] Not releasing rc1 tonight In-Reply-To: <4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org> References: <18623.49553.393129.601096@montanaro-dyndns-org.local> <4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org> Message-ID: <18623.59015.183377.941143@montanaro-dyndns-org.local> Barry> Or did he commit Mark's patch from bug 3772? 
If so, that would Barry> count as a reviewed patch. The checkin message says issue 3726: Author: vinay.sajip Date: Wed Sep 3 11:20:05 2008 New Revision: 66180 Log: Issue #3726: Allowed spaces in separators in logging configuration files. Modified: python/trunk/Lib/logging/config.py python/trunk/Lib/test/test_logging.py python/trunk/Misc/NEWS I noticed because someone else (Brett?) questioned the apparent lack of review. Skip From barry at python.org Thu Sep 4 15:48:44 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 09:48:44 -0400 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1> <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 1:08 AM, Gregory P. Smith wrote: > Frankly I don't see a big deal with not including it in *3.0* so long > as a reference to where to download it as an add on (jcea's pybsddb > site) is included in the release notes and PEP 3108. I've updated the > relevant documentation in 3.0. Thanks Greg. I think this is exactly right. I've updated the RELNOTES file to point to the Cheeseshop package. We should make sure to widely advertise the availability of the separate package. Aside: can someone review the older RELNOTES items? Ideally, I'd like to have this cleaned up to contain just the issues for 3.0 final. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL/nPXEjvBPtnXfVAQI89QP9GMeaK5YvShAwB1Ok2YKK0FDa0f04LKdk 0rIKpNLCR4Yhw3HTmtff8TVanbGoXcClM17UrTJSkAzzQIZDNSp6dT1Y+lnpe/Gi /3sVObliEdpOhVg5HPBd0mBr3Vgehqo9x4fxaws4p9GcdPvE7dF/96gFIuJXAK4l 5ybmmErxECI= =F78M -----END PGP SIGNATURE----- From p.f.moore at gmail.com Thu Sep 4 16:19:07 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 4 Sep 2008 15:19:07 +0100 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> Message-ID: <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> 2008/9/4 Brett Cannon : > On Wed, Sep 3, 2008 at 7:33 PM, wrote: > All but dbm.dumb require some pre-existing library to exist to compile > against. So any platform that has the proper libraries installed will > be able to use ndbm or gnu, but as for which platforms that are I do > not know. On Windows, none are available except dbm.dumb and bsddb (presently). If bsddb is to be removed, can/should one of the other "real" dbm variants be added to the standard binary, so that Windows users have at least one usable dbm option? Paul From barry at python.org Thu Sep 4 16:39:46 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 10:39:46 -0400 Subject: [Python-3000] Not releasing rc1 tonight In-Reply-To: <18623.59015.183377.941143@montanaro-dyndns-org.local> References: <18623.49553.393129.601096@montanaro-dyndns-org.local> <4C4EC784-A7CA-4497-A0FF-1D9B5D0E8399@python.org> <18623.59015.183377.941143@montanaro-dyndns-org.local> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 9:45 AM, skip at pobox.com wrote: > > Barry> Or did he commit Mark's patch from bug 3772? If so, that > would > Barry> count as a reviewed patch. 
> > The checkin message says issue 3726: > > Author: vinay.sajip > Date: Wed Sep 3 11:20:05 2008 > New Revision: 66180 > > Log: > Issue #3726: Allowed spaces in separators in logging > configuration files. > > Modified: > python/trunk/Lib/logging/config.py > python/trunk/Lib/test/test_logging.py > python/trunk/Misc/NEWS > > I noticed because someone else (Brett?) questioned the apparent lack > of > review. Yep, that's a different issue. Unless someone wants to vouch for the committed patch after the fact, could someone please revert the change and contact Vinay to get a proper fix reviewed? I noticed that he says in the tracker issue that what was committed was modified from the posted patch. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL/zMnEjvBPtnXfVAQJY3QP+LNXhx1YGuCHSw/D2n0yVBj1PLLUbgYnp k/+zWWmvIRc8YSApV1YyYR4iXfqqYFoi1SH0eh7F1k9+2CZ51HHD0p6CZ0Eb1FQ2 405ocxT28R3UR/E0ozxFca3IuNhGPR2FI/BCfsLrdrA3UtHA4XvZMDvM3KxEMarl 9WdYgop/I8Y= =b6Ry -----END PGP SIGNATURE----- From python at rcn.com Thu Sep 4 16:59:13 2008 From: python at rcn.com (Raymond Hettinger) Date: Thu, 4 Sep 2008 07:59:13 -0700 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> Message-ID: [Brett Cannon] >> On Wed, Sep 3, 2008 at 7:33 PM, wrote: >> All but dbm.dumb require some pre-existing library to exist to compile >> against. So any platform that has the proper libraries installed will >> be able to use ndbm or gnu, but as for which platforms that are I do >> not know. [Paul Moore] > On Windows, none are available except dbm.dumb and bsddb (presently). 
> If bsddb is to be removed, can/should one of the other "real" dbm > variants be added to the standard binary, so that Windows users have > at least one usable dbm option? This is a major problem for shelves (which I use often). Some alternative needs to be put in place before bsddb gets ripped-out. Was any of this discussed or thought through for PEP 3108? What's wrong with deprecating in 3.0 and replacing in 3.1? What advantage is there in ripping out bsddb during a release candidate and crippling shelves on Windows? Why the rush to rip it out no matter what? Raymond From jcea at jcea.es Thu Sep 4 17:03:53 2008 From: jcea at jcea.es (Jesus Cea) Date: Thu, 04 Sep 2008 17:03:53 +0200 Subject: [Python-3000] About "daemon" in threading module Message-ID: <48BFF8D9.3030002@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 renamed it to "thread.set_daemon()". Lately Python 3.0 changed the method to an attribute "thread.daemon". I think the last change is risky, because you can mistype and create a new attribute, instead of setting daemon mode. Since daemon mode is usually only visible when things go wrong (the main thread dies), you can miss the bug for a long time. Similarly with the new "thread.name" attribute. I would rather revert to the method style, or redo the class to avoid new attribute creation, maybe via some "thread.__setattr__()" magic. Sorry if this issue has already been discussed. I can't find any previous thread about this. PS: If you mistype the method name, you get an error. If you mistype the attribute assignment, the bug goes unnoticed. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSL/415lgi5GaxT1NAQJ+ugP/RA0wG1b2/q0C96FYq18AIMGONrfKMh7+ BjxrdAL3knqwPXsJW1JbgQV5vsOLpMqx6v8epdFN9FLH5KBLTW3jDmn3OAh7FwyN 5CcoXFc8MWT4/tsa2+SUOVC1rBibx5+b2Cz28KK/RnXK6O4WR/u/3f8fpssMApdW kC6Y0MqFoD4= =2/g3 -----END PGP SIGNATURE----- From p.f.moore at gmail.com Thu Sep 4 17:07:31 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 4 Sep 2008 16:07:31 +0100 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <48BFF8D9.3030002@jcea.es> References: <48BFF8D9.3030002@jcea.es> Message-ID: <79990c6b0809040807sddbb366h8c8ed0286334d570@mail.gmail.com> 2008/9/4 Jesus Cea : > PS: If you mistype the method name, you get an error. If you mistype the > attribute assignment, the bug goes unnoticed. I'm neutral over the threading change, but this is a good point to consider in general as part of the "method vs property" question when designing classes. Paul. From barry at python.org Thu Sep 4 17:24:05 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 11:24:05 -0400 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Since rc1 did not go out last night, bsddb could be restored. I still don't think it should be, but at this point it's up to Guido to override, and I will abide by his decision. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSL/9lXEjvBPtnXfVAQKMygQAi1Wtox+GxN4ZiC1lyiqX3Vklyket4Was 5bCxt4On1DeGmacunjOdCigCzG4Or6fbSe7cSh1y4CbTL3httxKLTh1gok6PME/X k9MBSY3T6j1ykcvc64ThMoaGvgNuFXKmYo7ifLaDfp2KLSpmGDHVj3uuuYcVPmMy HwhYQDIEda4= =Jzrq -----END PGP SIGNATURE----- From jnoller at gmail.com Thu Sep 4 17:33:51 2008 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 4 Sep 2008 11:33:51 -0400 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <48BFF8D9.3030002@jcea.es> References: <48BFF8D9.3030002@jcea.es> Message-ID: <4222a8490809040833r4af29fa1j50af4d69ed3be683@mail.gmail.com> On Thu, Sep 4, 2008 at 11:03 AM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 > renamed it to "thread.set_daemon()". Lately Python 3.0 changes the > method to an attribute "thread.daemon". > > I think the last change is risky, because you can mistype and create a > new attribute, instead of set daemon mode. Since daemon mode is only > usually visible when things goes wrong (the main thread dies), you can > miss the bug for a long time. > > Similarly with the new "thread.name" attribute. > > I would rather revert to the method style, or redo the class to avoid > new attribute creation, maybe via some "thread.__setattr__()" magic. > > Sorry if this issue is already discussed. I don't find any previous > thread about this. > > PS: If you mistype the method name, you get an error. If you mistype the > attribute assignment, the bug goes unnoticed. 
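The hazard described in the quoted message can be reproduced in a few lines. The sketch below is illustrative only: `StrictThread` and its `__setattr__` guard are hypothetical names, not part of the threading module, and the guard assumes that `threading.Thread` keeps its own internal state in underscore-prefixed attributes.

```python
import threading


def work():
    pass


# A plain Thread accepts the misspelled attribute without complaint.
plain = threading.Thread(target=work)
default_flag = plain.daemon
plain.deamon = True  # typo: creates a new, unrelated attribute
typo_ignored = plain.daemon == default_flag  # the real flag did not change


class StrictThread(threading.Thread):
    """Hypothetical subclass that rejects unknown public attributes."""

    def __setattr__(self, name, value):
        # Private names are let through so Thread.__init__ can store its state
        # (assumption: Thread only uses underscore-prefixed instance attributes).
        if not name.startswith("_") and not hasattr(type(self), name):
            raise AttributeError("no such attribute: %r" % name)
        super().__setattr__(name, value)


strict = StrictThread(target=work)
strict.daemon = True  # a real property on the class, so this is accepted

try:
    strict.deamon = True  # the same typo now fails loudly
    typo_caught = False
except AttributeError:
    typo_caught = True
```

With a guard of this kind a misspelled assignment fails at the point of the mistake, much as a misspelled method call would.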
> Jesus - The discussion and implementation happened here: http://bugs.python.org/issue3352 and: http://mail.python.org/pipermail/python-dev/2008-June/080285.html -Jesse From python at rcn.com Thu Sep 4 17:35:26 2008 From: python at rcn.com (Raymond Hettinger) Date: Thu, 4 Sep 2008 08:35:26 -0700 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> Message-ID: <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> [Barry] > Since rc1 did not go out last night, bsddb could be restored. I still > don't think it should be, but at this point it's up to Guido to > override, and I will abide by his decision. Put in my vote for restoration, deprecation, and thought-out removal/replacement in 3.1. The ensuing discussions have made it clear that immediate removal is controversial and problematic. Also, part of the original motivation disappeared when Jesus Cea stepped-up. Raymond From lists at cheimes.de Thu Sep 4 17:39:22 2008 From: lists at cheimes.de (Christian Heimes) Date: Thu, 04 Sep 2008 17:39:22 +0200 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <48BFF8D9.3030002@jcea.es> References: <48BFF8D9.3030002@jcea.es> Message-ID: Jesus Cea wrote: > I would rather revert to the method style, or redo the class to avoid > new attribute creation, maybe via some "thread.__setattr__()" magic. Or maybe with __slots__ in the Thread class. It'd also save some memory, and subclasses of Thread would still work as expected.
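A minimal sketch of the __slots__ idea follows. `Worker` and `SubWorker` are hypothetical stand-ins, not the real threading classes; the point is only to show how `__slots__` treats a misspelled attribute, and that a subclass which does not declare `__slots__` gets a `__dict__` back and so loses the protection.

```python
class Worker:
    """Hypothetical stand-in for a thread-like class that uses __slots__."""

    __slots__ = ("name", "daemon")

    def __init__(self, name):
        self.name = name
        self.daemon = False


class SubWorker(Worker):
    """Subclass without __slots__: its instances get a __dict__ again."""


worker = Worker("w1")
try:
    worker.deamon = True  # typo is rejected: no such slot and no __dict__
    base_rejects_typo = False
except AttributeError:
    base_rejects_typo = True

sub = SubWorker("w2")
sub.deamon = True  # accepted: the subclass instance has a __dict__
sub_accepts_typo = sub.daemon is False and sub.deamon is True
```

So subclasses do keep working as before, but for the same reason they would not be shielded from the typo unless they declare `__slots__` themselves.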
Christian From jnoller at gmail.com Thu Sep 4 17:41:03 2008 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 4 Sep 2008 11:41:03 -0400 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: References: <48BFF8D9.3030002@jcea.es> Message-ID: <4222a8490809040841k3905bcd0h495f5b673f621bee@mail.gmail.com> On Thu, Sep 4, 2008 at 11:39 AM, Christian Heimes wrote: > Jesus Cea wrote: >> >> I would rather revert to the method style, or redo the class to avoid >> new attribute creation, maybe via some "thread.__setattr__()" magic. > > Or maybe with __slots__ in the threading class. It'd also safe some memory > and subclasses of Threading still work as expected. > > Christian FWIW: Any change like this should also patch the multiprocessing module - both threading and multiprocessing are "moving in lock-step" From solipsis at pitrou.net Thu Sep 4 17:47:36 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 4 Sep 2008 15:47:36 +0000 (UTC) Subject: [Python-3000] About "daemon" in threading module References: <48BFF8D9.3030002@jcea.es> Message-ID: Jesus Cea jcea.es> writes: > > First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 > renamed it to "thread.set_daemon()". Lately Python 3.0 changes the > method to an attribute "thread.daemon". > > I think the last change is risky, because you can mistype and create a > new attribute, instead of set daemon mode. Since daemon mode is only > usually visible when things goes wrong (the main thread dies), you can > miss the bug for a long time. I've never understood why the "daemon" flag couldn't be passed as one of the constructor arguments. It would make code shorter, and avoid the mistyping risk mentioned by Jesus. It also sounds saner, since you shouldn't change the flag after the thread is started anyway. Regards Antoine. 
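For readers on later releases: since Python 3.3 the `threading.Thread` constructor accepts a `daemon` keyword argument, which is the style suggested above. The sketch below assumes such a release; `work` is a placeholder function made up for the example.

```python
import threading


def work():
    pass


# The flag is fixed at construction time; no attribute assignment is needed.
worker = threading.Thread(target=work, daemon=True)
flag_before_start = worker.daemon

# A misspelled keyword is an immediate error rather than a silent no-op.
try:
    threading.Thread(target=work, deamon=True)
    keyword_typo_rejected = False
except TypeError:
    keyword_typo_rejected = True

worker.start()
worker.join()

# Changing the flag once the thread has been started is refused.
try:
    worker.daemon = False
    late_change_rejected = False
except RuntimeError:
    late_change_rejected = True
```

The attribute remains available for code that needs to set the flag between construction and `start()`.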
From jcea at jcea.es Thu Sep 4 18:02:22 2008 From: jcea at jcea.es (Jesus Cea) Date: Thu, 04 Sep 2008 18:02:22 +0200 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1> <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com> Message-ID: <48C0068E.1060606@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Gregory P. Smith wrote: > The fact that the Python Lib/bsddb/test/ test suite has uncovered > actual Oracle/Sleepycat BerkeleyDB bugs in supposedly stable releases > has always disturbed me. This is true. But python uses openssl, for example, and it must be updated from time to time, for example. The only difference is that the bugs are not discovered by python. In fact, I can say that Berkeley DB 4.7 snapshot releases crashed a lot with bsddb testsuite. Berkeley DB 4.7.25 is rock solid, in part, because of pybsddb and the feedback between me and Oracle people. > PS Thank you jcea for your wonderful work on improving bsddb! > Regardless of whether it appears in the standard library in the future > you're making many users very happy with your work. I would give a leg to know how many, actually :). Err, put that sawchain down! :-P - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSMAGjplgi5GaxT1NAQIIUgQApuIbyiv6iLf2SgiiXQEJ3iJ3gLK2ksne Nry+Yb1fwR1DfEyUd94QqkpI6rAsCZblqC2uboNblx59naz6V4Xlg8ts2ZCr0k5y GD6vDjV+PeGSgDKZHf8X28kCikVRXDSwPcpT659hjjYBfaezxQDkrVMbt+RQFU8H KbKWpz5A5fc= =rHIW -----END PGP SIGNATURE----- From jcea at jcea.es Thu Sep 4 18:20:59 2008 From: jcea at jcea.es (Jesus Cea) Date: Thu, 04 Sep 2008 18:20:59 +0200 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> Message-ID: <48C00AEB.6030708@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Raymond Hettinger wrote: > [Barry] >> Since rc1 did not go out last night, bsddb could be restored. I >> still don't think it should be, but at this point it's up to Guido >> to override, and I will abide by his decision. > > Put in my vote for restoration, deprecation, and thought-out > removal/replacement in 3.1. > The ensuing discussions have made it clear that immediate removal is > controversial and problematic. > Also, part of the original motivation disappeared when Jesus Cea > stepped-up. I have a travel this weekend (4 days weekend in Madrid, until Wednesday), but I can cancel and be available for any remaining issue. I would need to know in the next 24-30 hours or so :). 
I'm a bit worried about you restoring bsddb and be pulled-off shortly again if I can't resolve any remaining issues in minutes :). But I would take the risk. PS: Something to consider, also, is that (I think), some buildbot could crash (core dump) while running the bsddb testsuite. That would be an issue with Berkeley DB installation, not bsddb module, undetected until now because no code used the Berkeley DB library. Since the run would crash with core dump, it wouldn't verify the entire python testsuite. That would be a big issue. What can be done then, if the buildbot admins are not available to verify library versions, provide the coredump, and so?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSMAK6Jlgi5GaxT1NAQK++QP+INECV7OaHxXDoJNj4BUT2aImNuNq+uFc EkYeWXXIzvbnOjukTT22V6UrH/eXdqKW/E/QMAzw7K9h35/13Xedz+5VLigVVOUK ELW/0/bBiSaEHiwpLPwfWeq1eCN5NgwlgiHVMH9XJtyCh5Qw8f/pio+llcoJ2wun NZ9FQnRP3Sw= =VUCY -----END PGP SIGNATURE----- From barry at python.org Thu Sep 4 19:07:47 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 13:07:47 -0400 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <48C00AEB.6030708@jcea.es> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za><1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com><18623.18700.76260.893902@montanaro-dyndns-org.local><79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> <48C00AEB.6030708@jcea.es> Message-ID: 
<36FB505D-1489-4BD0-84D6-74F2791F914D@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 12:20 PM, Jesus Cea wrote: > I'm a bit worried about you restoring bsddb and be pulled-off shortly > again if I can't resolve any remaining issues in minutes :). But I > would > take the risk. Don't worry about that. Guido's decision will be binding for 3.0. > PS: Something to consider, also, is that (I think), some buildbot > could > crash (core dump) while running the bsddb testsuite. That would be an > issue with Berkeley DB installation, not bsddb module, undetected > until > now because no code used the Berkeley DB library. Since the run would > crash with core dump, it wouldn't verify the entire python testsuite. > That would be a big issue. What can be done then, if the buildbot > admins > are not available to verify library versions, provide the coredump, > and so?. Live with it. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMAV43EjvBPtnXfVAQJRVAP/fr9hqBegZhWSlBa9Rtt0ODfysGCW112Q 4fgOG3LiWjxKOvdk4cTJ4zpRvBqwhF8h5ZSImDPJpQbE+Nzw8qNFGKGjTP37TRfV XNBm6gJlzEvX8B3N7BtvVWk7LmzhVP2+Dcs/36drGwnrfclUIpOjBjlKktet7sL1 eQan89KEPbg= =vYG1 -----END PGP SIGNATURE----- From guido at python.org Thu Sep 4 19:57:40 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Sep 2008 10:57:40 -0700 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> <48C00AEB.6030708@jcea.es> <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org> Message-ID: [I don't know who added my Google address to the CC list. Please don't do that again.] 
On Thu, Sep 4, 2008 at 10:07 AM, Barry Warsaw wrote: > On Sep 4, 2008, at 12:20 PM, Jesus Cea wrote: > >> I'm a bit worried about you restoring bsddb and be pulled-off shortly >> again if I can't resolve any remaining issues in minutes :). But I would >> take the risk. > > Don't worry about that. Guido's decision will be binding for 3.0. I am still in favor of removing bsddb from Python 3.0. It depends on a 3rd party library of enormous complexity whose stability cannot always be taken for granted. Arguments about code ownership, release cycles, bugbot stability and more all point towards keeping it separate. I consider it no different in nature than 3rd party UI packages (e.g. wxPython or PyQt) or relational database bindings (e.g. the MySQL or PostgreSQL bindings): very useful to a certain class of users, but outside the scope of the core distribution. Python 3.0 is a perfect opportunity to say goodbye to bsddb as a standard library component. For apps that depend on it, it is just a download away -- deprecating in 3.0 and removal in 3.1 would actually send the *wrong* message, since it is very much alive! I am grateful for Jesus to have taken over maintenance, and hope that the package blossoms in its newfound freedom. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
From barry at python.org Thu Sep 4 20:16:26 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 14:16:26 -0400 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> <48C00AEB.6030708@jcea.es> <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org> Message-ID: <5149970C-83E2-4F40-903D-06601860BE0B@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 1:57 PM, Guido van Rossum wrote: > I am still in favor of removing bsddb from Python 3.0. It depends on a > 3rd party library of enormous complexity whose stability cannot always > be taken for granted. Arguments about code ownership, release cycles, > bugbot stability and more all point towards keeping it separate. I > consider it no different in nature than 3rd party UI packages (e.g. > wxPython or PyQt) or relational database bindings (e.g. the MySQL or > PostgreSQL bindings): very useful to a certain class of users, but > outside the scope of the core distribution. > > Python 3.0 is a perfect opportunity to say goodbye to bsddb as a > standard library component. For apps that depend on it, it is just a > download away -- deprecating in 3.0 and removal in 3.1 would actually > send the *wrong* message, since it is very much alive! I am grateful > for Jesus to have taken over maintenance, and hope that the package > blossoms in its newfound freedom. Thanks Guido.
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMAl+3EjvBPtnXfVAQKOFAQAoccw1UNoJ9EIqpkauyhD6ITloOYdMEEC Mqp4hmJWgW8PO2J4YzGYGr7W4ty4JdsL9VGxga20bT9iFvIiTVR8ZOkAPInzhZke bpiXhac5Z6v9I0+8xLbEguiM9z10xHVXscCmQdjBkk4RRWTtoioq+NCESeU6qgrM LxHfAPlCMms= =THGn -----END PGP SIGNATURE----- From jcea at jcea.es Thu Sep 4 20:33:25 2008 From: jcea at jcea.es (Jesus Cea) Date: Thu, 04 Sep 2008 20:33:25 +0200 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> <48C00AEB.6030708@jcea.es> <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org> Message-ID: <48C029F5.70307@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Guido van Rossum wrote: > I am still in favor of removing bsddb from Python 3.0. BDFL has talked. I want to record this: * I will keep maintaining bsddb in Python 2.6. No idea what is the plan for 2.7, nevertheless. * I will keep bsddb updated and available via PYPI, both for 2.x and 3.x branches. Source only. Windows users will be at the mercy of other compiling the module and making it available. * I will be available if the decision to drop bsddb from standard lib is reconsidered. * I will try to find another Python area of interest to me, to fully honor my commit privileges. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSMAp8Zlgi5GaxT1NAQLLpAP/Yb5Po5krSTt+L6llyIx/CDkcr60M57Kc W034uMSH8IfQ4cswkM+d96BwCHlDczew5qWzHYR/f7K0YeZPaKWuT4Z8/WlchejV oGC0orGq/NQ1LnNyGjzgdFq50htdQt93EWUvBjhQwOyFeiLb1XuxCFyM/ZA3ge2y fCaamryHnqA= =1RcD -----END PGP SIGNATURE----- From brett at python.org Thu Sep 4 20:41:30 2008 From: brett at python.org (Brett Cannon) Date: Thu, 4 Sep 2008 11:41:30 -0700 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <48C029F5.70307@jcea.es> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> <48C00AEB.6030708@jcea.es> <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org> <48C029F5.70307@jcea.es> Message-ID: On Thu, Sep 4, 2008 at 11:33 AM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Guido van Rossum wrote: >> I am still in favor of removing bsddb from Python 3.0. > > BDFL has talked. > > I want to record this: > > * I will keep maintaining bsddb in Python 2.6. No idea what is the plan > for 2.7, nevertheless. > Great! As everyone has said, this is nothing personal and I am glad you are not taking it that way. As for 2.7, it's a wait and see. My guess is that we will have one where we have backported some more stuff from 3.0/3.1 to 2.7 to keep the transition easy. > * I will keep bsddb updated and available via PYPI, both for 2.x and 3.x > branches. Source only. Windows users will be at the mercy of other > compiling the module and making it available. > > * I will be available if the decision to drop bsddb from standard lib is > reconsidered. 
> > * I will try to find another Python area of interest to me, to fully > honor my commit privileges. > As I mentioned in another email, I think it would be a great idea to change the dbm package so that bsddb can easily be hooked into the dbm package as a 3rd-party DB back-end (along with any other DB backend that wants to). If you want to work on that for 2.7/3.1 it would be greatly appreciated! -Brett From barry at python.org Thu Sep 4 20:52:26 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 14:52:26 -0400 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <48C029F5.70307@jcea.es> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> <00A761011E6B4617B3728CB9B9FD9A60@RaymondLaptop1> <48C00AEB.6030708@jcea.es> <36FB505D-1489-4BD0-84D6-74F2791F914D@python.org> <48C029F5.70307@jcea.es> Message-ID: <6C8A6A2C-0234-4DFD-BBF2-6AF8A4383349@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 2:33 PM, Jesus Cea wrote: > * I will try to find another Python area of interest to me, to fully > honor my commit privileges. BTW Jesus, if you want to maintain the code on python.org, we can create an area in the sandbox for you. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMAuanEjvBPtnXfVAQLD6QP+MZdd5MtqxpNOO2jFG0KpKf1/2f+jzaHS UCyjTINbkRIBoAA8QYOTtbkVdNvALnxatCR4N6HPKPEKxVVdAptOP3QQr4iP3dU4 YjBYfd6Ki17gqcuS65ELwjNowPAow+E+8duOfRK3QmrcOS0nl/soTSh1j4ylF6xZ D9v15jK4NKY= =VN3g -----END PGP SIGNATURE----- From guido at python.org Thu Sep 4 21:36:12 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Sep 2008 12:36:12 -0700 Subject: [Python-3000] Problem with grammar for 'except'? 
In-Reply-To: References: Message-ID: On Wed, Sep 3, 2008 at 9:25 PM, Raymond Hettinger wrote: > [Brett] >> >> I gave a talk last night at the Vancouver Python users group on >> 2.6/3.0, and I tried the following code and it failed during a live >> demo:: >> >> >>> try: pass >> ... except Exception, Exception: pass >> File "", line 2 >> except Exception, Exception: pass >> ^ >> SyntaxError: invalid syntax >> >> Now from what I can tell from PEP 3110, that should be legal in 3.0. >> Am I reading the PEP correctly? > > Don't think so. > The parens are necessary for a tuple of exceptions > lest it be confused with the old "except E, v" syntax > which meant "except E as e". > > Maybe in 3.1, the paren requirement can be dropped. I would wait longer -- until well after the 2.x line is dead and buried. It will take some time for every Python user to train their Python fingers not to type "except E, v:" and we don't want people who are late in migrating inserting bugs like this in their first 3.x program. > But for 3.0, it would be a problem given that old > scripts would start getting misinterpreted. > > I did something similar for list.sort() by requiring > keyword arguments. That way, we wouldn't have > list.sort(f) running with f as a cmp function 2.6 and > as a key function in 3.0. 
> > > Raymond > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Sep 4 21:56:51 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Sep 2008 12:56:51 -0700 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: References: <48BFF8D9.3030002@jcea.es> Message-ID: On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou wrote: > Jesus Cea jcea.es> writes: >> >> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 >> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the >> method to an attribute "thread.daemon". >> >> I think the last change is risky, because you can mistype and create a >> new attribute, instead of set daemon mode. Since daemon mode is only >> usually visible when things goes wrong (the main thread dies), you can >> miss the bug for a long time. > > I've never understood why the "daemon" flag couldn't be passed as one of the > constructor arguments. It would make code shorter, and avoid the mistyping risk > mentioned by Jesus. It also sounds saner, since you shouldn't change the flag > after the thread is started anyway. As to the why question, this was done to match the Java Thread class. I don't want to speculate why the Java API was designed this way -- possibly it was a relic of an earlier API version in Java, but possibly there's a reason I can't fathom right now. After all, there are excellent reasons why start() is a separate call... 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Thu Sep 4 22:05:21 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Sep 2008 16:05:21 -0400 Subject: [Python-3000] [Python-3000-checkins] r66218 - python/branches/py3k/RELNOTES In-Reply-To: References: <20080904134435.6093A1E400B@bag.python.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 3:51 PM, Guido van Rossum wrote: > I'm a little confused -- why did you remove the release notes for > previous betas but leave those for the alphas in place? ISTM that the > file was an accumulation of release notes throughout the various > releases -- just like Misc/NEWS, but with a different focus. This is > how release notes in other products I've seen typically work, too. Mostly because I wasn't sure which of those release notes are still relevant. I asked (albeit in a different thread) for some assistance in determining what the current state of those are. I'm not so sure it's helpful to separately indicate release notes for alphas and betas, and it's definitely not helpful to include items that are no longer relevant. My thought was to list only those that apply to the final release, and then track them with public stable releases such as 3.0.1, 3.0.2, etc. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMA/gXEjvBPtnXfVAQKAhAP+O4/0eQYulAcc1gcuf4fO74kJlUBmF+i9 ezQIoPBSKh98xPGybqPcuwqNw5ZES3TSt4zgX4MNdD+4aLdXE6n/lcTS/OTgcGtd 5Tyw2ltVy9WfNE1oKK785B25uQQe94IpaYORMZW4ABdd2IYL0KOTVbRd3zjKKHqC krx0XRSzgzU= =7X4W -----END PGP SIGNATURE----- From steven.bethard at gmail.com Thu Sep 4 22:05:31 2008 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 4 Sep 2008 14:05:31 -0600 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: References: <48BFF8D9.3030002@jcea.es> Message-ID: On Thu, Sep 4, 2008 at 1:56 PM, Guido van Rossum wrote: > On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou wrote: >> Jesus Cea jcea.es> writes: >>> >>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 >>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the >>> method to an attribute "thread.daemon". >>> >>> I think the last change is risky, because you can mistype and create a >>> new attribute, instead of set daemon mode. Since daemon mode is only >>> usually visible when things goes wrong (the main thread dies), you can >>> miss the bug for a long time. >> >> I've never understood why the "daemon" flag couldn't be passed as one of the >> constructor arguments. It would make code shorter, and avoid the mistyping risk >> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag >> after the thread is started anyway. > > As to the why question, this was done to match the Java Thread class. > I don't want to speculate why the Java API was designed this way -- > possibly it was a relic of an earlier API version in Java, but > possibly there's a reason I can't fathom right now. After all, there > are excellent reasons why start() is a separate call... This may or may not be relevant, but since Java doesn't support argument defaults, it's often easier to define a very simple constructor, and use a bunch of setters if you want to modify the defaults. 
I've done this myself when programming in Java to avoid the exponential number of constructor overloads that would be necessary to do defaults properly. Of course, I don't know whether or not that had anything to do with this particular Java decision. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From brett at python.org Thu Sep 4 22:15:08 2008 From: brett at python.org (Brett Cannon) Date: Thu, 4 Sep 2008 13:15:08 -0700 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: References: Message-ID: On Wed, Sep 3, 2008 at 8:41 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I'm not going to release rc1 tonight. There are too many open release > blockers that I don't want to defer, and I'd like the buildbots to churn > through the bsddb removal on all platforms. Let me first thank Benjamin, > Brett, Mark and Antoine for their help on IRC tonight. > > Here are the issues I'm not comfortable with deferring: > > 3640 test_cpickle crash on AMD64 Windows build > 874900 threading module can deadlock after fork > 3574 compile() cannot decode Latin-1 source encodings > 3657 pickle can pickle the wrong function > 3187 os.listdir can return byte strings > 3660 reference leaks in 3.0 > 3594 PyTokenizer_FindEncoding() never succeeds > 3629 Py30b3 won't compile a regex that compiles with 2.5.2 and 30b2 > I just added issue 3776 to this list: deprecate bsddb/dbhash in 2.6 for removal in 3.0 . There is a patch attached to the issue to be reviewed. 
-Brett From ncoghlan at gmail.com Thu Sep 4 23:20:49 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 05 Sep 2008 07:20:49 +1000 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: References: <48BFF8D9.3030002@jcea.es> Message-ID: <48C05131.60209@gmail.com> Guido van Rossum wrote: > On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou wrote: >> Jesus Cea jcea.es> writes: >>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 >>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the >>> method to an attribute "thread.daemon". >>> >>> I think the last change is risky, because you can mistype and create a >>> new attribute, instead of set daemon mode. Since daemon mode is only >>> usually visible when things goes wrong (the main thread dies), you can >>> miss the bug for a long time. >> I've never understood why the "daemon" flag couldn't be passed as one of the >> constructor arguments. It would make code shorter, and avoid the mistyping risk >> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag >> after the thread is started anyway. > > As to the why question, this was done to match the Java Thread class. > I don't want to speculate why the Java API was designed this way -- > possibly it was a relic of an earlier API version in Java, but > possibly there's a reason I can't fathom right now. After all, there > are excellent reasons why start() is a separate call... Hmm, having (daemon=False) as a parameter on start() would probably be an even better API than having it on __init__() (modulo subclassing compatibility concerns). Regarding Jesus concern, you can always call t._set_daemon(True) and t._set_name(whatever) if you want the extra defence against typographic errors. The potential for mistyping attribute names is hardly a problem that is unique to threading.Thread. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Thu Sep 4 23:30:34 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 05 Sep 2008 07:30:34 +1000 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <48C0068E.1060606@jcea.es> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <1055D9D2507D4F64A6D30656C8A835D1@RaymondLaptop1> <52dc1c820809032208p6e0d5e31x295bebc79beaa86a@mail.gmail.com> <48C0068E.1060606@jcea.es> Message-ID: <48C0537A.9050404@gmail.com> Jesus Cea wrote: > This is true. But python uses openssl, for example, and it must be > updated from time to time, for example. The only difference is that the > bugs are not discovered by python. > > In fact, I can say that Berkeley DB 4.7 snapshot releases crashed a lot > with bsddb testsuite. Berkeley DB 4.7.25 is rock solid, in part, because > of pybsddb and the feedback between me and Oracle people. I think that comparison actually cuts to the heart of the issue - the problem isn't the stability of pybsddb itself, it's the stability of the underlying bsddb libraries. We don't typically have anywhere near the same level of problems with other wrapped interfaces (tk, sqlite3, openssl come to mind). Making anydbm/whichdb more extensible to allow any DB-API compliant interfaces to add themselves in 2.7/3.1 in a supported fashion would definitely be a good change though. The ActiveState and Enthought folks may also give some serious thought to continuing to bundle pybsddb even with their Python 3.0 releases (especially for Windows). Cheers, Nick. 
_______________________________________________ Python-3000 mailing list Python-3000 at python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/ncoghlan%40gmail.com -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Fri Sep 5 00:12:48 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Sep 2008 15:12:48 -0700 Subject: [Python-3000] [Python-3000-checkins] r66218 - python/branches/py3k/RELNOTES In-Reply-To: References: <20080904134435.6093A1E400B@bag.python.org> Message-ID: On Thu, Sep 4, 2008 at 1:05 PM, Barry Warsaw wrote: > > On Sep 4, 2008, at 3:51 PM, Guido van Rossum wrote: > >> I'm a little confused -- why did you remove the release notes for >> previous betas but leave those for the alphas in place? ISTM that the >> file was an accumulation of release notes throughout the various >> releases -- just like Misc/NEWS, but with a different focus. This is >> how release notes in other products I've seen typically work, too. > > Mostly because I wasn't sure which of those release notes are still > relevant. I asked (albeit in a different thread) for some assistance in > determining what the current state of those are. > > I'm not so sure it's helpful to separately indicate release notes for alphas > and betas, and it's definitely not helpful to include items that are no > longer relevant. My thought was to list only those that apply to the final > release, and then track them with public stable releases such as 3.0.1, > 3.0.2, etc. Well, all the alpha notes should have been fixed by now too -- they all describe temporary deviations from our high standard for releases. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Fri Sep 5 06:13:30 2008 From: brett at python.org (Brett Cannon) Date: Thu, 4 Sep 2008 21:13:30 -0700 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: References: Message-ID: On Wed, Sep 3, 2008 at 8:41 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I'm not going to release rc1 tonight. There are too many open release > blockers that I don't want to defer, and I'd like the buildbots to churn > through the bsddb removal on all platforms. Let me first thank Benjamin, > Brett, Mark and Antoine for their help on IRC tonight. > > Here are the issues I'm not comfortable with deferring: > > 3640 test_cpickle crash on AMD64 Windows build > 874900 threading module can deadlock after fork > 3574 compile() cannot decode Latin-1 source encodings > 3657 pickle can pickle the wrong function > 3187 os.listdir can return byte strings > 3660 reference leaks in 3.0 > 3594 PyTokenizer_FindEncoding() never succeeds > 3629 Py30b3 won't compile a regex that compiles with 2.5.2 and 30b2 > And because I can't stop causing trouble, I just uploaded a patch for issue3781 which solidifies warnings.catch_warnings() and its API a little bit more. Really simple patch. -Brett From greg at krypto.org Fri Sep 5 09:17:13 2008 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 5 Sep 2008 00:17:13 -0700 Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta 3 planned for this Wednesday) In-Reply-To: <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com> References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com> Message-ID: <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com> Anyone have an opinion on http://bugs.python.org/issue3492 in regards to it being a release blocker? 
The gist of it: zlib returns bytearrays where other modules return
bytes. zipimport, because it uses zlib, required bytearrays instead
of bytes as input. A few other modules also appear to return
bytearrays when they're likely better off returning bytes for
consistency.

IMHO, it seems like bytearrays should rarely be returned by the
existing standard library APIs. Since they are mutable they are
ideally suited for new APIs where they're passed in and modified.

What's the big deal if this is not fixed before release? Users are
likely to get frustrated at inputs not being hashable without explicit
(data copy) conversion to an immutable type. And any code that gets
written depending on these returning bytearrays instead of bytes would
need fixing if we waited until 3.1 to fix it.

-gps

On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai wrote:
> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith wrote:
>> I agree that this should go in. zlib should return bytes. other read
>> functions and similar modules like bz2module already return bytes.
>> unless i hear objections, i'll commit this in about 12 hours.
>
> +1 :)
>
>>
>
> Regards
>
> --
> -Anand
>

From greg at krypto.org Fri Sep 5 09:30:04 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 5 Sep 2008 00:30:04 -0700
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To:
References: <48BFF8D9.3030002@jcea.es>
Message-ID: <52dc1c820809050030r48ae629ejc05a968a6e811c1e@mail.gmail.com>

On Thu, Sep 4, 2008 at 8:39 AM, Christian Heimes wrote:
> Jesus Cea wrote:
>>
>> I would rather revert to the method style, or redo the class to avoid
>> new attribute creation, maybe via some "thread.__setattr__()" magic.
>
> Or maybe with __slots__ in the threading class. It'd also save some memory
> and subclasses of Threading still work as expected.

Agreed. This is what __slots__ is for.
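
[Illustration, not part of the original message.] A minimal sketch of the __slots__ idea discussed in this thread. The class SlottedWorker and the helper set_misspelled_flag are hypothetical names made up for this example; this is not how threading.Thread is implemented. It only shows that a class declaring __slots__ rejects assignment to a misspelled attribute instead of silently creating it.

```python
class SlottedWorker:
    """Hypothetical stand-in for a thread-like class (not threading.Thread)."""

    # Only these attribute names may be set on instances.
    __slots__ = ("name", "daemon")

    def __init__(self, name):
        self.name = name
        self.daemon = False


def set_misspelled_flag(worker):
    """Assign to the misspelled name 'deamon' and report what happened."""
    try:
        worker.deamon = True  # typo on purpose: the real name is 'daemon'
    except AttributeError:
        return "rejected"
    return "accepted"


worker = SlottedWorker("w1")
result = set_misspelled_flag(worker)  # the typo raises AttributeError
```

One caveat on this design: a subclass that does not declare __slots__ itself gets a per-instance __dict__ again, so the protection would not automatically carry over to the usual pattern of subclassing a thread class.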
From jcea at jcea.es Fri Sep 5 18:08:28 2008 From: jcea at jcea.es (Jesus Cea) Date: Fri, 05 Sep 2008 18:08:28 +0200 Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta 3 planned for this Wednesday) In-Reply-To: <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com> References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com> <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com> Message-ID: <48C1597C.40507@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Gregory P. Smith wrote: > Anyone have an opinion on http://bugs.python.org/issue3492 in regards > to it being a release blocker? > > The gist of it: zlib returns bytearrays where other modules return > bytes. zipimport, because it uses zlib, required bytearrays instead > of bytes as input. A few other modules also appear to return > bytearrays when they're likely better off returning bytes for > consistency. I strongly agree that zlib *SHOULD* return bytes. If zimport requires a bytearray (why?), it can do the conversion itself. > Whats the big deal if this is not fixed before release? Users are > likely to get frustrated at inputs not being hashable without explicit > (data copy) conversion to an immutable type. And any code that gets > written depending on these returning bytearrays instead of bytes would > need fixing if we waited until 3.1 to fix it. +1. Release Blocker. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSMFZeZlgi5GaxT1NAQKPhgQAgeEIZyLU6jhMYq3ALKv5/Ashpa6tGWQV 8SligFMYOyAY7THD8pxMZ3yWtghRtzIRvaDkJCRhNfJp4sGHO4gLj/FCbxe5cuLv 41pYndNQ+VXZMCVkJ5OsdVvww59vvOHKvqwSOWd6BL3JUWjKdWIe/yyAKM2+tKL9 owJPMlIE+l0= =j0iu -----END PGP SIGNATURE----- From jcea at jcea.es Fri Sep 5 18:10:38 2008 From: jcea at jcea.es (Jesus Cea) Date: Fri, 05 Sep 2008 18:10:38 +0200 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <48C05131.60209@gmail.com> References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> Message-ID: <48C159FE.7070400@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nick Coghlan wrote: > Guido van Rossum wrote: >> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou wrote: >>> Jesus Cea jcea.es> writes: >>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 >>>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the >>>> method to an attribute "thread.daemon". >>>> >>>> I think the last change is risky, because you can mistype and create a >>>> new attribute, instead of set daemon mode. Since daemon mode is only >>>> usually visible when things goes wrong (the main thread dies), you can >>>> miss the bug for a long time. >>> I've never understood why the "daemon" flag couldn't be passed as one of the >>> constructor arguments. It would make code shorter, and avoid the mistyping risk >>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag >>> after the thread is started anyway. >> As to the why question, this was done to match the Java Thread class. 
>> I don't want to speculate why the Java API was designed this way -- >> possibly it was a relic of an earlier API version in Java, but >> possibly there's a reason I can't fathom right now. After all, there >> are excellent reasons why start() is a separate call... > > Hmm, having (daemon=False) as a parameter on start() would probably be > an even better API than having it on __init__() (modulo subclassing > compatibility concerns). Agreed. Could it be done for 3.0?. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSMFZ/plgi5GaxT1NAQKvoQQApXIgymNMmPtL3ZX/EsllxnnW47oSgzB7 OaOzQXaFsyCo00ErUFm0hluIIHLT6Wqa4nlY1ixx6ThgytNOqHQIRgN/w6oS4kGP WXO3pztXKaiD3gJfxjUOU7FRdOrlXjqwGryq/OPwKtxKFzyloTdTwUAhKCgpwFt3 9QLSioRgLPo= =Pztd -----END PGP SIGNATURE----- From jnoller at gmail.com Fri Sep 5 19:05:55 2008 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 5 Sep 2008 13:05:55 -0400 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <48C159FE.7070400@jcea.es> References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> Message-ID: <4222a8490809051005j6a0e32c5q78f36fb6f148310@mail.gmail.com> On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Nick Coghlan wrote: >> Guido van Rossum wrote: >>> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou wrote: >>>> Jesus Cea jcea.es> writes: >>>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 >>>>> renamed it to "thread.set_daemon()". 
Lately Python 3.0 changes the >>>>> method to an attribute "thread.daemon". >>>>> >>>>> I think the last change is risky, because you can mistype and create a >>>>> new attribute, instead of set daemon mode. Since daemon mode is only >>>>> usually visible when things goes wrong (the main thread dies), you can >>>>> miss the bug for a long time. >>>> I've never understood why the "daemon" flag couldn't be passed as one of the >>>> constructor arguments. It would make code shorter, and avoid the mistyping risk >>>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag >>>> after the thread is started anyway. >>> As to the why question, this was done to match the Java Thread class. >>> I don't want to speculate why the Java API was designed this way -- >>> possibly it was a relic of an earlier API version in Java, but >>> possibly there's a reason I can't fathom right now. After all, there >>> are excellent reasons why start() is a separate call... >> >> Hmm, having (daemon=False) as a parameter on start() would probably be >> an even better API than having it on __init__() (modulo subclassing >> compatibility concerns). > > Agreed. Could it be done for 3.0?. Personally, I'm staunchly against changing the __init__ for the threading.Thread and multiprocessing.Process modules - it does break/make more confusing the common subclassing people do. I do like the idea of using __slots__, or reverting back to a set_method entirely. 
-jesse From jnoller at gmail.com Fri Sep 5 19:06:27 2008 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 5 Sep 2008 13:06:27 -0400 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <48C159FE.7070400@jcea.es> References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> Message-ID: <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Nick Coghlan wrote: >> Guido van Rossum wrote: >>> On Thu, Sep 4, 2008 at 8:47 AM, Antoine Pitrou wrote: >>>> Jesus Cea jcea.es> writes: >>>>> First we had "thread.setDaemon()". This was not PEP8, so Python 3.0 >>>>> renamed it to "thread.set_daemon()". Lately Python 3.0 changes the >>>>> method to an attribute "thread.daemon". >>>>> >>>>> I think the last change is risky, because you can mistype and create a >>>>> new attribute, instead of set daemon mode. Since daemon mode is only >>>>> usually visible when things goes wrong (the main thread dies), you can >>>>> miss the bug for a long time. >>>> I've never understood why the "daemon" flag couldn't be passed as one of the >>>> constructor arguments. It would make code shorter, and avoid the mistyping risk >>>> mentioned by Jesus. It also sounds saner, since you shouldn't change the flag >>>> after the thread is started anyway. >>> As to the why question, this was done to match the Java Thread class. >>> I don't want to speculate why the Java API was designed this way -- >>> possibly it was a relic of an earlier API version in Java, but >>> possibly there's a reason I can't fathom right now. After all, there >>> are excellent reasons why start() is a separate call... >> >> Hmm, having (daemon=False) as a parameter on start() would probably be >> an even better API than having it on __init__() (modulo subclassing >> compatibility concerns). > > Agreed. Could it be done for 3.0?. 
Also, FWIW, I thought we were no longer doing API changes? From guido at python.org Fri Sep 5 19:37:03 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Sep 2008 10:37:03 -0700 Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta 3 planned for this Wednesday) In-Reply-To: <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com> References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com> <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com> Message-ID: This needs to be fixed. It is surely a relic from the alpha1 situation where the bytes type was mutable. No read APIs should return mutable bytes. Write APIs should accept mutable and immutable bytes though. On Fri, Sep 5, 2008 at 12:17 AM, Gregory P. Smith wrote: > Anyone have an opinion on http://bugs.python.org/issue3492 in regards > to it being a release blocker? > > The gist of it: zlib returns bytearrays where other modules return > bytes. zipimport, because it uses zlib, required bytearrays instead > of bytes as input. A few other modules also appear to return > bytearrays when they're likely better off returning bytes for > consistency. > > IMHO, it seems like bytearrays should rarely be returned by the > existing standard library apis. Since they are mutable they are > ideally suited for new APIs where they're passed in and modified. > > Whats the big deal if this is not fixed before release? Users are > likely to get frustrated at inputs not being hashable without explicit > (data copy) conversion to an immutable type. And any code that gets > written depending on these returning bytearrays instead of bytes would > need fixing if we waited until 3.1 to fix it. > > -gps > > On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai > wrote: >> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. 
Smith wrote:
>>> I agree that this should go in. zlib should return bytes. other read
>>> functions and similar modules like bz2module already return bytes.
>>> unless i hear objections, i'll commit this in about 12 hours.
>>
>> +1 :)
>>
>>>
>>
>> Regards
>>
>> --
>> -Anand
>>
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy.kloth at gmail.com Sat Sep 6 03:54:42 2008
From: jeremy.kloth at gmail.com (Jeremy Kloth)
Date: Fri, 5 Sep 2008 19:54:42 -0600
Subject: [Python-3000] PyUnicodeObject implementation
Message-ID: <200809051954.42787.jeremy.kloth@gmail.com>

I don't know if this is too late to do before the final release, but shouldn't
the implementation of PyUnicodeObject be updated to match the much more
efficient old PyStringObject layout? I mean eliminating the double malloc
that is currently required for each unicode string.

PyStringObject is declared as a PyVarObject allocated in one chunk, whereas
the current PyUnicodeObject is a PyObject allocated in two chunks, one for
the object and one for the Py_UNICODE data.

I think that this change would go a long way towards making unicode strings
comparable to old (2.x) string speeds. I can see that if not changed now,
there would be 3rd party extensions relying on the particular layout of
PyUnicodeObject, which would make changing it later too risky.

If there is interest in this change, I would happily write a patch that makes
this change.
Thanks, Jeremy -- Jeremy Kloth http://4suite.org/ From guido at python.org Sat Sep 6 04:25:41 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Sep 2008 19:25:41 -0700 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: <200809051954.42787.jeremy.kloth@gmail.com> References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: This is an excellent idea that fell by the wayside. Since it is a major coding project there's no way it can be done for 3.0 -- the risk of introducing new instabilities or leaks is just too high. We *might* be able to get it in for 3.0.1, if the code is reviewed really well. Though it might be safer to aim for 3.1. On Fri, Sep 5, 2008 at 6:54 PM, Jeremy Kloth wrote: > I don't know if this is too late to do before the final release, but shouldn't > the implementation of PyUnicodeObject be updated to match the much more > efficient old PyStringObject layout? I mean eliminating the double malloc > that is currently required for each unicode string. > > PyStringObject is declared as a PyVarObject allocated in one chunk, whereas > the current PyUnicodeObject is a PyObject allocated in two chunks, one for > the object and one for the Py_UNICODE data. > > I think that this change would go a long way towads making unicode strings > comparable to old (2.x) string speeds. I can see that if not changed now, > there would be 3rd party extensions that would be relying on the particular > layout of PyUnicodeObject and therefore making changing it later too risky. > > If there is interest in this change, I would happily write a patch that make > this change. 
> > Thanks, > Jeremy > > -- > Jeremy Kloth > http://4suite.org/ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Sat Sep 6 04:38:26 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Sep 2008 12:38:26 +1000 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> Message-ID: <48C1ED22.5040002@gmail.com> Jesse Noller wrote: > On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Nick Coghlan wrote: >>> Hmm, having (daemon=False) as a parameter on start() would probably be >>> an even better API than having it on __init__() (modulo subclassing >>> compatibility concerns). >> Agreed. Could it be done for 3.0?. > > Also, FWIW, I thought we were no longer doing API changes? We aren't - if we'd thought of it a month ago, we could have included it, but now 2.7/3.1 is the earliest for that change. As far as the 'typo protection' goes... I'm still not convinced that the delayed action of the set daemon effect means that the Thread object needs special protection. If an application fails to set the attribute properly, then its test suite will hang on shutdown (as the threading module attempts to do .join() on a thread that hasn't been told to stop). Cheers, Nick. 
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From jnoller at gmail.com Sat Sep 6 04:44:18 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 5 Sep 2008 22:44:18 -0400
Subject: [Python-3000] About "daemon" in threading module
In-Reply-To: <48C1ED22.5040002@gmail.com>
References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> <48C1ED22.5040002@gmail.com>
Message-ID: <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com>

On Fri, Sep 5, 2008 at 10:38 PM, Nick Coghlan wrote:
> Jesse Noller wrote:
>> On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Nick Coghlan wrote:
>>>> Hmm, having (daemon=False) as a parameter on start() would probably be
>>>> an even better API than having it on __init__() (modulo subclassing
>>>> compatibility concerns).
>>> Agreed. Could it be done for 3.0?
>>
>> Also, FWIW, I thought we were no longer doing API changes?
>
> We aren't - if we'd thought of it a month ago, we could have included
> it, but now 2.7/3.1 is the earliest for that change.
>
> As far as the 'typo protection' goes... I'm still not convinced that the
> delayed action of the set daemon effect means that the Thread object
> needs special protection.
>
> If an application fails to set the attribute properly, then its test
> suite will hang on shutdown (as the threading module attempts to do
> .join() on a thread that hasn't been told to stop).

I happen to really like the property approach. It makes sense to write
thread.daemon = True; it's also clean and feels natural now that it's
there. And you're right - typos in this will bite people fairly
quickly, but to Jesus' point - those people may go chasing something
else before noticing they typed deamon instead of daemon.
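
[Editor's sketch, not part of the original message.] The typo concern
discussed above can be reproduced in a few lines. This is a minimal
illustration, assuming a current CPython 3 where Thread.daemon is a
property; make_thread is a hypothetical helper introduced only for this
example, and the threads are never started.

```python
import threading


def make_thread(misspell):
    """Build an unstarted thread, setting the flag with or without the typo."""
    t = threading.Thread(target=lambda: None)
    if misspell:
        t.deamon = True   # typo: silently creates an unrelated attribute
    else:
        t.daemon = True   # the real property; must be set before start()
    return t


typo = make_thread(misspell=True)
ok = make_thread(misspell=False)

# The misspelled assignment raises nothing and leaves the real flag at the
# value inherited from the creating thread.
print("typo.daemon:", typo.daemon)
print("ok.daemon:", ok.daemon)
print("stray attribute present:", hasattr(typo, "deamon"))
```

Nothing fails at the point of the typo, which is why the mistake tends
to surface only later, for instance as a hang at interpreter shutdown.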
-jesse From solipsis at pitrou.net Sat Sep 6 12:40:35 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Sep 2008 10:40:35 +0000 (UTC) Subject: [Python-3000] PyUnicodeObject implementation References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: Jeremy Kloth gmail.com> writes: > > I don't know if this is too late to do before the final release, but shouldn't > the implementation of PyUnicodeObject be updated to match the much more > efficient old PyStringObject layout? I mean eliminating the double malloc > that is currently required for each unicode string. I have already written such a patch some months ago, you can find it here: http://bugs.python.org/issue1943 You will perhaps need to adapt the patch a bit in order for it to work properly with the current py3k branch. Also note that Marc-Andr? Lemburg (one of the authors of the unicode implementation) is opposed to that change. See the discussion in the bug tracker issue for the details. Regards Antoine. From barry at python.org Sat Sep 6 18:24:01 2008 From: barry at python.org (Barry Warsaw) Date: Sat, 6 Sep 2008 12:24:01 -0400 Subject: [Python-3000] [Python-3000-checkins] r66218 - python/branches/py3k/RELNOTES In-Reply-To: References: <20080904134435.6093A1E400B@bag.python.org> Message-ID: <0286F806-C214-4C6A-BF0B-DEF10A1961D9@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 4, 2008, at 6:12 PM, Guido van Rossum wrote: > On Thu, Sep 4, 2008 at 1:05 PM, Barry Warsaw wrote: >> >> On Sep 4, 2008, at 3:51 PM, Guido van Rossum wrote: >> >>> I'm a little confused -- why did you remove the release notes for >>> previous betas but leave those for the alphas in place? ISTM that >>> the >>> file was an accumulation of release notes throughout the various >>> releases -- just like Misc/NEWS, but with a different focus. This is >>> how release notes in other products I've seen typically work, too. 
>> >> Mostly because I wasn't sure which of those release notes are still >> relevant. I asked (albeit in a different thread) for some >> assistance in >> determining what the current state of those are. >> >> I'm not so sure it's helpful to separately indicate release notes >> for alphas >> and betas, and it's definitely not helpful to include items that >> are no >> longer relevant. My thought was to list only those that apply to >> the final >> release, and then track them with public stable releases such as >> 3.0.1, >> 3.0.2, etc. > > Well, all the alpha notes should have been fixed by now too -- they > all describe temporary deviations from our high standard for releases. Okay, I'm going to blow away the old alpha issues and just leave known big issues. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMKuoXEjvBPtnXfVAQKIhQP/c7JOmiEjLGk8Oa3tUbmDIl3ka22ttmxg u+zQJSZJCgCLjVMrU2CUEjmh5QYqHItctSNdIxnQzwXvTPWdETV6D+7Q2Y+Mx5Qz 7kbzeNXXXWGRMaJyacwwfrtoqn5tA517btCJPjCHvwXl/R79suBT0CtTlvM399NG iwUCmZEtOfo= =1MLS -----END PGP SIGNATURE----- From greg at krypto.org Sat Sep 6 22:53:53 2008 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 6 Sep 2008 13:53:53 -0700 Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta 3 planned for this Wednesday) In-Reply-To: References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com> <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com> Message-ID: <52dc1c820809061353w6bd21232jf08ccad6b7bde19f@mail.gmail.com> issue 3797 created with trivial patches for the remaining bytearray returning abusers. review needed. I don't have a build environment for windows to test the PC/winreg one on but its too simple to be wrong. On Fri, Sep 5, 2008 at 10:37 AM, Guido van Rossum wrote: > This needs to be fixed. 
It is surely a relic from the alpha1 situation > where the bytes type was mutable. No read APIs should return mutable > bytes. Write APIs should accept mutable and immutable bytes though. > > On Fri, Sep 5, 2008 at 12:17 AM, Gregory P. Smith wrote: >> Anyone have an opinion on http://bugs.python.org/issue3492 in regards >> to it being a release blocker? >> >> The gist of it: zlib returns bytearrays where other modules return >> bytes. zipimport, because it uses zlib, required bytearrays instead >> of bytes as input. A few other modules also appear to return >> bytearrays when they're likely better off returning bytes for >> consistency. >> >> IMHO, it seems like bytearrays should rarely be returned by the >> existing standard library apis. Since they are mutable they are >> ideally suited for new APIs where they're passed in and modified. >> >> Whats the big deal if this is not fixed before release? Users are >> likely to get frustrated at inputs not being hashable without explicit >> (data copy) conversion to an immutable type. And any code that gets >> written depending on these returning bytearrays instead of bytes would >> need fixing if we waited until 3.1 to fix it. >> >> -gps >> >> On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai >> wrote: >>> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith wrote: >>>> I agree that this should go in. zlib should return bytes. other read >>>> functions and similar modules like bz2module already return bytes. >>>> unless i hear objections, i'll commit this in about 12 hours. 
>>> >>> +1 :) >>> >>>> >>> >>> Regards >>> >>> -- >>> -Anand >>> >> _______________________________________________ >> Python-3000 mailing list >> Python-3000 at python.org >> http://mail.python.org/mailman/listinfo/python-3000 >> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org >> > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From skip at pobox.com Sat Sep 6 23:06:49 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 6 Sep 2008 16:06:49 -0500 Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path? Message-ID: <18626.61673.143430.847735@montanaro-dyndns-org.local> I'm trying to figure out how to install this dbm.sqlite module I have without overwriting the basic install. My thought was to create a dbm package in site-packages then copy sqlite.py there. That doesn't work though. Modifying dbm.__init__.py to include this does: import pkgutil __path__ = pkgutil.extend_path(__path__, __name__) I'm wondering if all the core packages in 3.x should include the above in their __init__.py files. Skip From brett at python.org Sun Sep 7 00:28:18 2008 From: brett at python.org (Brett Cannon) Date: Sat, 6 Sep 2008 15:28:18 -0700 Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path? In-Reply-To: <18626.61673.143430.847735@montanaro-dyndns-org.local> References: <18626.61673.143430.847735@montanaro-dyndns-org.local> Message-ID: On Sat, Sep 6, 2008 at 2:06 PM, wrote: > I'm trying to figure out how to install this dbm.sqlite module I have > without overwriting the basic install. My thought was to create a dbm > package in site-packages then copy sqlite.py there. That doesn't work > though. Modifying dbm.__init__.py to include this does: > > import pkgutil > __path__ = pkgutil.extend_path(__path__, __name__) > > I'm wondering if all the core packages in 3.x should include the above in > their __init__.py files. 
> Well, a side-effect of this is that all package imports will suddenly spike the number of stat calls linearly to the number of entries on sys.path. Another option is to use a pth file that imports your module (as like _dbm_sqlite.py or something) and have it, as a side-effect of importing, set itself on dbm. -Brett From skip at pobox.com Sun Sep 7 00:36:08 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 6 Sep 2008 17:36:08 -0500 Subject: [Python-3000] Nonlinearity in dbm.ndbm? Message-ID: <18627.1496.785870.379332@montanaro-dyndns-org.local> While doing a little testing of my dbm.sqlite module (it's pretty damn slow at the moment) I came across this chestnut. Given this shell for loop: for n in 10 100 1000 10000 ; do rm -f /tmp/trash.db* python3.0 -m timeit -s 'import dbm.ndbm as db' -s 'f = db.open("/tmp/trash.db", "c")' 'for i in range('$n'): f[str(i)] = str(i)' done I get this output: 100000 loops, best of 3: 16 usec per loop 1000 loops, best of 3: 185 usec per loop 100 loops, best of 3: 5.04 msec per loop 10 loops, best of 3: 207 msec per loop Replacing dbm.ndbm with dbm.sqlite shows more linear growth (only went to n=1000 because it was so slow): 10 loops, best of 3: 44.9 msec per loop 10 loops, best of 3: 460 msec per loop 10 loops, best of 3: 5.26 sec per loop My guess is there is something nonlinear in the ndbm code, probably the underlying library, but it may be worth checking the wrapper quickly. Platform is Mac OSX 10.5.4 on a MacBook Pro. Now to dig into the abysmal sqlite performance. Skip From josiah.carlson at gmail.com Sun Sep 7 00:47:49 2008 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Sat, 6 Sep 2008 15:47:49 -0700 Subject: [Python-3000] Nonlinearity in dbm.ndbm? 
In-Reply-To: <18627.1496.785870.379332@montanaro-dyndns-org.local> References: <18627.1496.785870.379332@montanaro-dyndns-org.local> Message-ID: On Sat, Sep 6, 2008 at 3:36 PM, wrote: > While doing a little testing of my dbm.sqlite module (it's pretty damn slow > at the moment) I came across this chestnut. Given this shell for loop: > > for n in 10 100 1000 10000 ; do > rm -f /tmp/trash.db* > python3.0 -m timeit -s 'import dbm.ndbm as db' -s 'f = db.open("/tmp/trash.db", "c")' 'for i in range('$n'): f[str(i)] = str(i)' > done > > I get this output: > > 100000 loops, best of 3: 16 usec per loop > 1000 loops, best of 3: 185 usec per loop > 100 loops, best of 3: 5.04 msec per loop > 10 loops, best of 3: 207 msec per loop > > Replacing dbm.ndbm with dbm.sqlite shows more linear growth (only went to > n=1000 because it was so slow): > > 10 loops, best of 3: 44.9 msec per loop > 10 loops, best of 3: 460 msec per loop > 10 loops, best of 3: 5.26 sec per loop > > My guess is there is something nonlinear in the ndbm code, probably the > underlying library, but it may be worth checking the wrapper quickly. > > Platform is Mac OSX 10.5.4 on a MacBook Pro. > > Now to dig into the abysmal sqlite performance. The version I just posted to the tracker reads/writes about 30k entries/second. You may want to look at the differences (looks to be due to your lack of a primary key/index). - Josiah From skip at pobox.com Sun Sep 7 01:25:48 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 6 Sep 2008 18:25:48 -0500 Subject: [Python-3000] Nonlinearity in dbm.ndbm? In-Reply-To: References: <18627.1496.785870.379332@montanaro-dyndns-org.local> Message-ID: <18627.4476.160727.405718@montanaro-dyndns-org.local> >> Now to dig into the abysmal sqlite performance. Josiah> The version I just posted to the tracker reads/writes about 30k Josiah> entries/second. You may want to look at the differences (looks Josiah> to be due to your lack of a primary key/index). Thanks. 
The real speedup was to avoid using cursors. Here's the progression: * My original (no indexes, keys and values are text, using cursors w/ commit and explicit close, delete+insert to assign key): 10 loops, best of 3: 51.4 msec per loop 10 loops, best of 3: 505 msec per loop * As above, but with a primary key: 10 loops, best of 3: 52.5 msec per loop 10 loops, best of 3: 507 msec per loop * As above, but keys and values are blobs: 10 loops, best of 3: 50.4 msec per loop 10 loops, best of 3: 529 msec per loop * As above, but get rid of del self[key] in __setitem__ and use the replace statement instead of insert: 10 loops, best of 3: 25.4 msec per loop 10 loops, best of 3: 263 msec per loop * Remove try/finally with explicit close() calls (Gerhard says he never closes cursors.): 10 loops, best of 3: 23.2 msec per loop 10 loops, best of 3: 270 msec per loop * Get rid of cursors, calling the connection's execute method instead: 1000 loops, best of 3: 198 usec per loop 100 loops, best of 3: 2.26 msec per loop Hmmm... Should cursors be used? What benefit are they? Without them is the sqlite code thread-safe? Skip From ncoghlan at gmail.com Sun Sep 7 04:33:07 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 07 Sep 2008 12:33:07 +1000 Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path? In-Reply-To: References: <18626.61673.143430.847735@montanaro-dyndns-org.local> Message-ID: <48C33D63.1010305@gmail.com> Brett Cannon wrote: > On Sat, Sep 6, 2008 at 2:06 PM, wrote: >> I'm trying to figure out how to install this dbm.sqlite module I have >> without overwriting the basic install. My thought was to create a dbm >> package in site-packages then copy sqlite.py there. That doesn't work >> though. Modifying dbm.__init__.py to include this does: >> >> import pkgutil >> __path__ = pkgutil.extend_path(__path__, __name__) >> >> I'm wondering if all the core packages in 3.x should include the above in >> their __init__.py files. 
>>
>
> Well, a side-effect of this is that all package imports will suddenly
> spike the number of stat calls linearly to the number of entries on
> sys.path.
>
> Another option is to use a pth file that imports your module (as like
> _dbm_sqlite.py or something) and have it, as a side-effect of
> importing, set itself on dbm.

It would probably be cleaner to add "extend_path" functions to the
extensible core packages rather than have them automatically extend
their path list on startup.

E.g. dbm.__init__.py may have something like the following:

  def extend_package(dirs=None):
      global __path__
      if dirs is None:
          import pkgutil
          if __package_name__ is not None:
              name = __package_name__
          else:
              name = __name__
          __path__ = pkgutil.extend_path(__path__, name)
      else:
          __path__.extend(dirs)

So the standard library packages would be self-contained by default, but
an application could explicitly request that the extensible packages be
expanded to incorporate other directories.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From skip at pobox.com Sun Sep 7 05:37:16 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 Sep 2008 22:37:16 -0500
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
In-Reply-To: <18627.4476.160727.405718@montanaro-dyndns-org.local>
References: <18627.1496.785870.379332@montanaro-dyndns-org.local> <18627.4476.160727.405718@montanaro-dyndns-org.local>
Message-ID: <18627.19564.859436.270707@montanaro-dyndns-org.local>

    Josiah> The version I just posted to the tracker reads/writes about 30k
    Josiah> entries/second. You may want to look at the differences (looks
    Josiah> to be due to your lack of a primary key/index).

    me> Thanks. The real speedup was to avoid using cursors.

Let me take another stab at this.
My __setitem__ looks like this:

    def __setitem__(self, key, val):
        c = self._conn.cursor()
        c.execute("replace into dict"
                  " (key, value) values (?, ?)", (key, val))
        self._conn.commit()

This works (tests pass), but is slow (23-25 msec per loop). If I change it
to this:

    def __setitem__(self, key, val):
        self._conn.execute("replace into dict"
                           " (key, value) values (?, ?)", (key, val))

which is essentially your __setitem__ without the type checks on the key and
value, it runs much faster (about 300 usec per loop), but the unit tests
fail. This also works:

    def __setitem__(self, key, val):
        self._conn.execute("replace into dict"
                           " (key, value) values (?, ?)", (key, val))
        self._conn.commit()

I think you need the commits and have to suffer with the speed penalty.

Skip

From josiah.carlson at gmail.com Sun Sep 7 05:58:20 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Sat, 6 Sep 2008 20:58:20 -0700
Subject: [Python-3000] Nonlinearity in dbm.ndbm?
In-Reply-To: <18627.19564.859436.270707@montanaro-dyndns-org.local>
References: <18627.1496.785870.379332@montanaro-dyndns-org.local> <18627.4476.160727.405718@montanaro-dyndns-org.local> <18627.19564.859436.270707@montanaro-dyndns-org.local>
Message-ID:

On Sat, Sep 6, 2008 at 8:37 PM, wrote:
>
> Josiah> The version I just posted to the tracker reads/writes about 30k
> Josiah> entries/second. You may want to look at the differences (looks
> Josiah> to be due to your lack of a primary key/index).
>
> me> Thanks. The real speedup was to avoid using cursors.
>
> Let me take another stab at this. My __setitem__ looks like this:
>
>     def __setitem__(self, key, val):
>         c = self._conn.cursor()
>         c.execute("replace into dict"
>                   " (key, value) values (?, ?)", (key, val))
>         self._conn.commit()
>
> This works (tests pass), but is slow (23-25 msec per loop).
If I change it > to this: > > def __setitem__(self, key, val): > self._conn.execute("replace into dict" > " (key, value) values (?, ?)", (key, val)) > > which is essentially your __setitem__ without the type checks on the key and > value, it runs much faster (about 300 usec per loop), but the unit tests > fail. This also works: > > def __setitem__(self, key, val): > self._conn.execute("replace into dict" > " (key, value) values (?, ?)", (key, val)) > self._conn.commit() > > I think you need the commits and have to suffer with the speed penalty. I guess I need to look at your unittests, because in my testing, reading/writing with a single instance works great, but if you want changes to be seen by other instances (in other threads or processes), you need to .commit() changes. I'm thinking that that's a reasonable expectation; I never expected bsddbs to be able to share their data with other processes until I did a .sync(), but maybe I never expected much from my dbm-like interfaces? - Josiah From josiah.carlson at gmail.com Sun Sep 7 06:08:03 2008 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Sat, 6 Sep 2008 21:08:03 -0700 Subject: [Python-3000] Nonlinearity in dbm.ndbm? In-Reply-To: References: <18627.1496.785870.379332@montanaro-dyndns-org.local> <18627.4476.160727.405718@montanaro-dyndns-org.local> <18627.19564.859436.270707@montanaro-dyndns-org.local> Message-ID: On Sat, Sep 6, 2008 at 8:58 PM, Josiah Carlson wrote: > On Sat, Sep 6, 2008 at 8:37 PM, wrote: >> >> Josiah> The version I just posted to the tracker reads/writes about 30k >> Josiah> entries/second. You may want to look at the differences (looks >> Josiah> to be due to your lack of a primary key/index). >> >> me> Thanks. The real speedup was to avoid using cursors. >> >> Let me take another stab at this. 
My __setitem__ looks like this: >> >> def __setitem__(self, key, val): >> c = self._conn.cursor() >> c.execute("replace into dict" >> " (key, value) values (?, ?)", (key, val)) >> self._conn.commit() >> >> This works (tests pass), but is slow (23-25 msec per loop). If I change it >> to this: >> >> def __setitem__(self, key, val): >> self._conn.execute("replace into dict" >> " (key, value) values (?, ?)", (key, val)) >> >> which is essentially your __setitem__ without the type checks on the key and >> value, it runs much faster (about 300 usec per loop), but the unit tests >> fail. This also works: >> >> def __setitem__(self, key, val): >> self._conn.execute("replace into dict" >> " (key, value) values (?, ?)", (key, val)) >> self._conn.commit() >> >> I think you need the commits and have to suffer with the speed penalty. > > I guess I need to look at your unittests, because in my testing, > reading/writing with a single instance works great, but if you want > changes to be seen by other instances (in other threads or processes), > you need to .commit() changes. I'm thinking that that's a reasonable > expectation; I never expected bsddbs to be able to share their data > with other processes until I did a .sync(), but maybe I never expected > much from my dbm-like interfaces? I took sandbox/trunk/dbm_sqlite/Lib/test/test_dbm_sqlite.py, changed some of the imports to be used with 2.6, got rid of the 'b' prefix on bytes objects, and my implementation passes in 2.5 (I had to add support for buffer, double-close, and the 'c' flag). Maybe there's something funky going on with Python 3.0's sqlite3? - Josiah From stefan_ml at behnel.de Sun Sep 7 09:15:42 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 07 Sep 2008 09:15:42 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: Antoine Pitrou wrote: > Also note that Marc-Andr? 
Lemburg (one of the authors of the unicode > implementation) is opposed to that change. See the discussion in the bug tracker > issue for the details. >From a Cython perspective, I find the lack of efficient subclassing after such a change particularly striking. That seriously bit me in Py2 when I tried making XML text content a bit more intelligent in lxml (i.e. make it remember what XML element it originated from). Having the same problem for unicode in Py3 doesn't sound like a good idea to me. Stefan From solipsis at pitrou.net Sun Sep 7 15:52:54 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Sep 2008 13:52:54 +0000 (UTC) Subject: [Python-3000] PyUnicodeObject implementation References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: Stefan Behnel behnel.de> writes: > > From a Cython perspective, I find the lack of efficient subclassing after such > a change particularly striking. That seriously bit me in Py2 when I tried > making XML text content a bit more intelligent in lxml (i.e. make it remember > what XML element it originated from). I've used a library which had adopted this kind of behaviour (I think it was BeautifulSoup). After using it several times in a row I noticed memory consumption of my program exploded. The problem was that the library was returning objects which looked innocently like strings, but internally kept a reference to a multi-megabyte HTML tree. The solution was to convert them explicitly to str before storing them for later use, which defeated the point of having an str-derived type. In these cases I think it's much friendlier to the user of the API to use composition rather than inheritance. Or, simply, just return a raw string and let the user keep the context separately if he wants to. PS: what do you call "efficient subclassing"? if you look at the current implementation of unicode_subtype_new() in unicodeobject.c, it isn't very efficient (everything including the raw data buffer is allocated twice). 
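
[Editor's sketch, not part of the original message.] The memory effect
described above, where a string-like result keeps a large tree alive,
can be shown with a small example. This is a hedged illustration only:
Tree, TextResult, extract and tree_survives are hypothetical names, not
the API of lxml or BeautifulSoup, and the immediate reclamation relies
on CPython's reference counting.

```python
import gc
import weakref


class Tree:
    """Stand-in for a large parsed document (hypothetical)."""

    def __init__(self, text):
        self.text = text


class TextResult(str):
    """str subclass that remembers the tree it came from (inheritance)."""

    def __new__(cls, value, origin):
        self = super().__new__(cls, value)
        self.origin = origin      # strong reference to the whole tree
        return self


def extract(tree, smart):
    """Return the text either as a 'smart' string or as a plain str."""
    return TextResult(tree.text, tree) if smart else str(tree.text)


def tree_survives(smart):
    """Report whether the tree stays alive once only the result is kept."""
    tree = Tree("hello")
    probe = weakref.ref(tree)
    kept = extract(tree, smart)   # the caller stores just the "string"
    del tree
    gc.collect()
    return probe() is not None, kept


print("smart string keeps tree alive:", tree_survives(True)[0])
print("plain string keeps tree alive:", tree_survives(False)[0])

# Converting explicitly yields an exact str that carries no back-reference.
plain = str(TextResult("hello", Tree("hello")))
print("type after str():", type(plain).__name__)
```

With composition, the caller would instead hold the plain string and
the context as two separate objects and could drop the context at will.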
From guido at python.org Sun Sep 7 16:38:06 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Sep 2008 07:38:06 -0700 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: On Sun, Sep 7, 2008 at 12:15 AM, Stefan Behnel wrote: > Antoine Pitrou wrote: >> Also note that Marc-Andr? Lemburg (one of the authors of the unicode >> implementation) is opposed to that change. See the discussion in the bug tracker >> issue for the details. > > From a Cython perspective, I find the lack of efficient subclassing after such > a change particularly striking. That seriously bit me in Py2 when I tried > making XML text content a bit more intelligent in lxml (i.e. make it remember > what XML element it originated from). Having the same problem for unicode in > Py3 doesn't sound like a good idea to me. Can you explain this a bit more? I presume you're talking about subclassing in C, which always precarious -- from the Python perspective there's no difference, the objects are opaque. I do note that the mechanisms that exist for supporting adding a __dict__ to a str (in 2.x; or bytes in 3.x) or a tuple could be extended for other purposes. Also, please explain why instead of subclassing you couldn't use a wrapper class? (I.e. use containment instead of inheritance.) All in all, given the advantage (half the number of allocations) of the proposal I think there would have to be *very* good arguments against before we reject this outright. I'd like to understand Marc-Andre's reasons too. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From stefan_ml at behnel.de Sun Sep 7 16:46:29 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 07 Sep 2008 16:46:29 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: Antoine Pitrou wrote: > Stefan Behnel behnel.de> writes: >> From a Cython perspective, I find the lack of efficient subclassing after such >> a change particularly striking. That seriously bit me in Py2 when I tried >> making XML text content a bit more intelligent in lxml (i.e. make it remember >> what XML element it originated from). > > I've used a library which had adopted this kind of behaviour (I think it was > BeautifulSoup). After using it several times in a row I noticed memory > consumption of my program exploded. The problem was that the library was > returning objects which looked innocently like strings, but internally kept a > reference to a multi-megabyte HTML tree. The solution was to convert them > explicitly to str before storing them for later use, which defeated the point of > having an str-derived type. I'm aware of that problem. > In these cases I think it's much friendlier to the user of the API to use > composition rather than inheritance. Or, simply, just return a raw string and > let the user keep the context separately if he wants to. That's not that easy for the result of an arbitrary XPath query. But you can switch the behaviour off when you build the query, so that it gives you a straight string as result. > PS: what do you call "efficient subclassing"? if you look at the current > implementation of unicode_subtype_new() in unicodeobject.c, it isn't very > efficient (everything including the raw data buffer is allocated twice). That's something that may be optimised one day without affecting user code. A different memory layout that prevents C-level subclassing is a very different kind of change. 
Plus, even with the double-allocation, a C-level subclass is still faster
than a Python-level subclass for me.

Setup for timeit:

  s = b"abcdef ghijk"; from lxml.etree import _ElementUnicodeResult; u = type("u", (unicode,), {})

$ python2.6 -m timeit ... 'unicode(s)'
1000000 loops, best of 3: 0.623 usec per loop
$ python2.6 -m timeit -s ... '_ElementUnicodeResult(s)'
1000000 loops, best of 3: 0.822 usec per loop
$ python2.6 -m timeit -s ... 'u(s)'
1000000 loops, best of 3: 0.849 usec per loop

$ python2.6 -m timeit -s ... 'unicode(s, "utf-8")'
1000000 loops, best of 3: 0.622 usec per loop
$ python2.6 -m timeit -s ... '_ElementUnicodeResult(s, "utf-8")'
1000000 loops, best of 3: 0.806 usec per loop
$ python2.6 -m timeit -s ... 'u(s, "utf-8")'
1000000 loops, best of 3: 0.844 usec per loop

Doing the same with a unicode string as input gives me lower but similar
numbers.

Stefan

From stefan_ml at behnel.de Sun Sep 7 16:58:13 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 07 Sep 2008 16:58:13 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To:
References: <200809051954.42787.jeremy.kloth@gmail.com>
Message-ID:

Hi,

Guido van Rossum wrote:
> On Sun, Sep 7, 2008 at 12:15 AM, Stefan Behnel wrote:
>> Antoine Pitrou wrote:
>>> Also note that Marc-André Lemburg (one of the authors of the unicode
>>> implementation) is opposed to that change. See the discussion in the bug tracker
>>> issue for the details.
>> From a Cython perspective, I find the lack of efficient subclassing after such
>> a change particularly striking. That seriously bit me in Py2 when I tried
>> making XML text content a bit more intelligent in lxml (i.e. make it remember
>> what XML element it originated from). Having the same problem for unicode in
>> Py3 doesn't sound like a good idea to me.
>
> Can you explain this a bit more? I presume you're talking about
> subclassing in C

Yes, I mentioned Cython above.
> I do note that the mechanisms that exist for supporting adding a __dict__ > to a str (in 2.x; or bytes in 3.x) or a tuple could be extended for other > purposes. I never looked into these, but this does not sound like it would impact subclassing. > Also, please explain why instead of subclassing you couldn't use a > wrapper class? (I.e. use containment instead of inheritance.) Because users will expect that the return values can be passed into anything that accepts a string, which is much more than you could catch with a wrapper class. There are tons of C-level APIs inside and outside of Python itself that require strings for certain operations and will not accept any other object. Just think of passing a wrapper object as type name of a newly created type. Stefan From skip at pobox.com Sun Sep 7 17:22:11 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 7 Sep 2008 10:22:11 -0500 Subject: [Python-3000] Would someone please look at this bug report? Message-ID: <18627.61859.941803.115472@montanaro-dyndns-org.local> I created this bug report against 3.0 yesterday: http://bugs.python.org/issue3799 I marked it high priority because it seems to me that all the dbm.* modules should agree on whether they accept strings as keys or require bytes. That's clearly not the case at the moment. I suppose perhaps I should have marked it as a release blocker, but I don't think that's my call. Thx, Skip From martin at v.loewis.de Sun Sep 7 18:01:53 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 07 Sep 2008 18:01:53 +0200 Subject: [Python-3000] XML as bytes or unicode? In-Reply-To: References: <1afaf6160808180935k3470efc0n65c318b87d54a99@mail.gmail.com> <48B24525.3080808@v.loewis.de> Message-ID: <48C3FAF1.5090909@v.loewis.de> >> Parsing Unicode XML strings isn't quite that meaningful. 
>
> Maybe not according to the XML standard, but I can see lots of
> practical situations where the encoding is always known and applied by
> some other layer, i.e. the I/O library or a database wrapper. Forcing
> XML to be interpreted as binary isn't always the best idea. E.g.
> consider storing XML in a SVN repository. Or consider storing XML
> fragments in Python string literals.

Stefan got it right - a "higher-level protocol" may override the
encoding declaration in the XML data. In the case of Python Unicode
strings, the data is 16-bit Unicode (or 32-bit), "obviously" overriding
the declared encoding (although technically, that protocol needs to
explicitly state what encoding takes precedence).

So let me rephrase: "Parsing Unicode XML strings may easily lead to
parsing problems" (i.e. if the parser hasn't been told that a
higher-layer protocol was in place). This is currently the case in 3.0:

py> d=xml.dom.minidom.parseString("\u20ac")
py> d.documentElement.childNodes[0].data
'â\x82¬'
py> list(map(ord,d.documentElement.childNodes[0].data))
[226, 130, 172]

Regards,
Martin

From barry at python.org Sun Sep 7 18:02:06 2008
From: barry at python.org (Barry Warsaw)
Date: Sun, 7 Sep 2008 12:02:06 -0400
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To:
References:
Message-ID: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 7, 2008, at 10:51 AM, Fredrik Lundh wrote:

> Barry Warsaw wrote:
>
>> I'm not going to release rc1 tonight. There are too many open
>> release blockers that I don't want to defer, and I'd like the
>> buildbots to churn through the bsddb removal on all platforms.
>
>> I'd like to try again on Friday and stick to rc2 on the 17th.
>
> any news on this front?
>
> (I have a few minor ET fixes, and possibly a Unicode 5.1 patch, but
> have had absolutely no time to spend on that. is the window still
> open?)
There are 8 open release blockers, a few of which have patches that need review. So I think we are still not ready to release rc1. But it worries me because I think this is going to push the final release beyond our October 1st goal. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMP6/3EjvBPtnXfVAQIprQQAsWgxQPKyxM/rrG5TWL4UqI7xne6dLTjL Nx3OBpi8hcNXEqyxzoosFXXZy4PpSWU+SwxuI1YQT9rUjv/ks6yxu3cBcEVhtEHV KE34YS4D825tVGvbvpsOXF06fsfv5j5zZGB6hlSipZoiv1rhR3uEsO2zkWaI4eQ6 Ty2Cfuxu10A= =8eP5 -----END PGP SIGNATURE----- From brett at python.org Sun Sep 7 21:58:15 2008 From: brett at python.org (Brett Cannon) Date: Sun, 7 Sep 2008 12:58:15 -0700 Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path? In-Reply-To: <48C33D63.1010305@gmail.com> References: <18626.61673.143430.847735@montanaro-dyndns-org.local> <48C33D63.1010305@gmail.com> Message-ID: On Sat, Sep 6, 2008 at 7:33 PM, Nick Coghlan wrote: [SNIP] > So the standard library packages would be self-contained by default, but > an application could explicitly request that the extensible packages be > expanded to incorporate other directories. > I was thinking about this the other day and realized there is a risk of people make incorrect associations of third-party code as part of the stdlib. It also could lead to future name clashes with modules if we were ever to add a module with the same name as one injected by a third-party. -Brett From brett at python.org Sun Sep 7 22:02:11 2008 From: brett at python.org (Brett Cannon) Date: Sun, 7 Sep 2008 13:02:11 -0700 Subject: [Python-3000] Would someone please look at this bug report? 
In-Reply-To: <18627.61859.941803.115472@montanaro-dyndns-org.local> References: <18627.61859.941803.115472@montanaro-dyndns-org.local> Message-ID: On Sun, Sep 7, 2008 at 8:22 AM, wrote: > I created this bug report against 3.0 yesterday: > > http://bugs.python.org/issue3799 > > I marked it high priority because it seems to me that all the dbm.* modules > should agree on whether they accept strings as keys or require bytes. > That's clearly not the case at the moment. I suppose perhaps I should have > marked it as a release blocker, but I don't think that's my call. > Well, I think it is your call by being a core developer. If Barry disagrees he can lower the priority. -Brett From ncoghlan at gmail.com Sun Sep 7 23:23:26 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 08 Sep 2008 07:23:26 +1000 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: <48C4464E.5010707@gmail.com> Guido van Rossum wrote: > All in all, given the advantage (half the number of allocations) of > the proposal I think there would have to be *very* good arguments > against before we reject this outright. I'd like to understand > Marc-Andre's reasons too. As Stefan notes, because of the frequency with which strings are manipulated in C code via PyString_* / PyUnicode_* calls, it is a data type where "accept no substitutes" prevails. MAL's primary concern appears to be that having Unicode as a plain PyObject leaves the type more open to subclass-based optimisations that have been rejected for the builtin types themselves. Having PyString/PyBytes as PyVarObjects means that subclasses are more limited in what they can do. One possibility that occurs to me is to use a PyVarObject variant that allocates space for an additional void pointer before the variable sized section of the object. The builtin type would leave that pointer NULL, but subtypes could perform the second allocation needed to populate it. 
The question is whether the 4-8 bytes wasted per object would be worth the fact that only one memory allocation would be needed. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Sun Sep 7 23:25:14 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 08 Sep 2008 07:25:14 +1000 Subject: [Python-3000] Would someone please look at this bug report? In-Reply-To: References: <18627.61859.941803.115472@montanaro-dyndns-org.local> Message-ID: <48C446BA.8070301@gmail.com> Brett Cannon wrote: > On Sun, Sep 7, 2008 at 8:22 AM, wrote: >> I created this bug report against 3.0 yesterday: >> >> http://bugs.python.org/issue3799 >> >> I marked it high priority because it seems to me that all the dbm.* modules >> should agree on whether they accept strings as keys or require bytes. >> That's clearly not the case at the moment. I suppose perhaps I should have >> marked it as a release blocker, but I don't think that's my call. >> > > Well, I think it is your call by being a core developer. If Barry > disagrees he can lower the priority. That's the way I've been interpreting it (and while a couple of them have certainly turned out to be less urgent than I thought after further analysis, I don't regret getting an explicit decision on them before the rc went out). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Mon Sep 8 00:55:32 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Sep 2008 15:55:32 -0700 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: <48C4464E.5010707@gmail.com> References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4464E.5010707@gmail.com> Message-ID: On Sun, Sep 7, 2008 at 2:23 PM, Nick Coghlan wrote: > Guido van Rossum wrote: >> All in all, given the advantage (half the number of allocations) of >> the proposal I think there would have to be *very* good arguments >> against before we reject this outright. I'd like to understand >> Marc-Andre's reasons too. > > As Stefan notes, because of the frequency with which strings are > manipulated in C code via PyString_* / PyUnicode_* calls, it is a data > type where "accept no substitutes" prevails. > > MAL's primary concern appears to be that having Unicode as a plain > PyObject leaves the type more open to subclass-based optimisations that > have been rejected for the builtin types themselves. Hm. I don't have any particularly insightful imagination as to what those optimizations might be. Have any been implemented (in 3rd party code) in the 8 years that the Unicode object has existed? > Having > PyString/PyBytes as PyVarObjects means that subclasses are more limited > in what they can do. True. > One possibility that occurs to me is to use a PyVarObject variant that > allocates space for an additional void pointer before the variable sized > section of the object. The builtin type would leave that pointer NULL, > but subtypes could perform the second allocation needed to populate it. > > The question is whether the 4-8 bytes wasted per object would be worth > the fact that only one memory allocation would be needed. 
I believe that 4-8 bytes is more than the overhead of an extra memory allocation from the obmalloc heap. It is probably about the same as the overhead for a memory allocation from the regular malloc heap. So for short strings (of which there are often a lot) it would be more expensive; for longer objects it would probably work out just about the same. There could be a different approach though, whereby the offset from the start of the object to the start of the character array wasn't a constant but a value stored in the class object. (In fact, tp_basicsize could probably be used for this.) It would slow down access to the characters a bit though -- a classic time-space trade-off that would require careful measurement in order to decide which is better. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Mon Sep 8 02:23:21 2008 From: foom at fuhm.net (James Y Knight) Date: Sun, 7 Sep 2008 20:23:21 -0400 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4464E.5010707@gmail.com> Message-ID: On Sep 7, 2008, at 6:55 PM, Guido van Rossum wrote: >> One possibility that occurs to me is to use a PyVarObject variant >> that >> allocates space for an additional void pointer before the variable >> sized >> section of the object. The builtin type would leave that pointer >> NULL, >> but subtypes could perform the second allocation needed to populate >> it. >> >> The question is whether the 4-8 bytes wasted per object would be >> worth >> the fact that only one memory allocation would be needed. > > I believe that 4-8 bytes is more than the overhead of an extra memory > allocation from the obmalloc heap. It is probably about the same as > the overhead for a memory allocation from the regular malloc heap. So > for short strings (of which there are often a lot) it would be more > expensive; for longer objects it would probably work out just about > the same. 
> > There could be a different approach though, whereby the offset from > the start of the object to the start of the character array wasn't a > constant but a value stored in the class object. (In fact, > tp_basicsize could probably be used for this.) It would slow down > access to the characters a bit though -- a classic time-space > trade-off that would require careful measurement in order to decide > which is better. Given that you can, today, subclass str in Python, without wasting an extra 4/8 bytes of memory, or adding anything new to the class object, why wouldn't anyone who really wanted to make a hypothetical optimized subclass just use the same mechanism (putting your additional data *after* the character data) to subclass it in C? It may be a little tricky, but not exactly rocket science, and given that all these C subclasses of str are so far hypothetical, just leaving it as "it's possible" seems perfectly reasonable... James From wescpy at gmail.com Mon Sep 8 02:34:59 2008 From: wescpy at gmail.com (wesley chun) Date: Sun, 7 Sep 2008 17:34:59 -0700 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> Message-ID: <78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com> >> Barry Warsaw wrote: >>> I'm not going to release rc1 tonight. >>> I'd like to try again on Friday and stick to rc2 on the 17th. > > There are 8 open release blockers, a few of which have patches that need > review. So I think we are still not ready to release rc1. But it worries > me because I think this is going to push the final release beyond our > October 1st goal. the goal is admirable, but unless there are paying sponsors that require this deadline be met, i'd suggest that we can push the releases until they're ready. the changes that 2.6 and 3.0 bring are too major to be released before they are ready for primetime. 
also, there hasn't been a beta3 download available for Win users (aside from the developers who can build it) since Martin has been on vacation... they will effectively be leapfrogged from b2 directly to rc1. i think he comes back tomorrow, so if rc1 really is going out soon, would it make sense for him to make b3 MSI files too? just my $0.02, -wesley From abpillai at gmail.com Mon Sep 8 06:10:01 2008 From: abpillai at gmail.com (Anand Balachandran Pillai) Date: Mon, 8 Sep 2008 09:40:01 +0530 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com> References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> <48C1ED22.5040002@gmail.com> <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com> Message-ID: <8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com> On Sat, Sep 6, 2008 at 8:14 AM, Jesse Noller wrote: > On Fri, Sep 5, 2008 at 10:38 PM, Nick Coghlan wrote: >> Jesse Noller wrote: >>> On Fri, Sep 5, 2008 at 12:10 PM, Jesus Cea wrote: >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> Nick Coghlan wrote: >>>>> Hmm, having (daemon=False) as a parameter on start() would probably be >>>>> an even better API than having it on __init__() (modulo subclassing >>>>> compatibility concerns). >>>> Agreed. Could it be done for 3.0?. >>> >>> Also, FWIW, I thought we were no longer doing API changes? >> >> We aren't - if we'd thought of it a month ago, we could have included >> it, but now 2.7/3.1 is the earliest for that change. >> >> As far as the 'typo protection' goes... I'm still not convinced that the >> delayed action of the set daemon effect means that the Thread object >> needs special protection. 
>>
>> If an application fails to set the attribute properly, then its test
>> suite will hang on shutdown (as the threading module attempts to do
>> .join() on a thread that hasn't been told to stop).
>
> I happen to really like the property-approach. It makes sense to
> call thread.daemon = True, it's also clean and feels natural now that
> it's there. And you're right - typos in this will bite people fairly
> quickly, but to Jesus' point - those people may go chasing something
> else before noticing they typed deamon instead of daemon.

I think Jesus raises a very valid point. I have often typed "setDeamon"
instead of "setDaemon" for my thread objects. I always make it a point
to keep the module documentation for threading.Thread open before
calling this method on the objects.

I think the "Pythonic" way of doing it would be to use properties, so
'thread.daemon=1' is very nice. But without __slots__, we are going to
have many developers write 'thread.deamon=1' and not notice this is the
problem when they start debugging after stuff does not work the way
they expect at process shutdown. They are going to chase after some
other thread...

I guess adding __slots__ to the Thread class is the best approach
for this. +1 for that...

IMHO, this is perhaps late for 3.0, but definitely a good thing
to add for 3.1.
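
The typo protection that __slots__ would give can be sketched as follows.
This is only an illustration under stated assumptions: Worker and set_flag
are hypothetical names, and the real threading.Thread does not define
__slots__.

```python
class Worker:
    """Hypothetical stand-in for a thread-like class; not threading.Thread."""

    # Only these attribute names may be set on instances.
    __slots__ = ("name", "daemon")

    def __init__(self, name):
        self.name = name
        self.daemon = False


def set_flag(obj, attr, value):
    """Try to set an attribute and report whether it was accepted."""
    try:
        setattr(obj, attr, value)
    except AttributeError:
        return False
    return True


w = Worker("w1")
ok_daemon = set_flag(w, "daemon", True)   # correctly spelled: accepted
ok_deamon = set_flag(w, "deamon", True)   # typo: rejected because of __slots__
print(ok_daemon, ok_deamon)
```

Note that a subclass which does not declare __slots__ itself gets a
__dict__ again, so this protection would not automatically carry over to
typical Thread subclasses.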
> > -jesse > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/abpillai%40gmail.com > Regards, -- -Anand From martin at v.loewis.de Mon Sep 8 06:35:28 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 08 Sep 2008 06:35:28 +0200 Subject: [Python-3000] PEP 3108 and the demise of bsddb3 In-Reply-To: <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> References: <1220484055.04.0.310220074498.issue3769@psf.upfronthosting.co.za> <1afaf6160809031622j3b63a08h912686a839bd0ed@mail.gmail.com> <18623.18700.76260.893902@montanaro-dyndns-org.local> <79990c6b0809040719y7836b635vc30a6e46f0ce0182@mail.gmail.com> Message-ID: <48C4AB90.1020601@v.loewis.de> > On Windows, none are available except dbm.dumb and bsddb (presently). > If bsddb is to be removed, can/should one of the other "real" dbm > variants be added to the standard binary, so that Windows users have > at least one usable dbm option? Which one specifically? What's the licensing implications? For 3.0, I think that is too late. Regards, Martin From martin at v.loewis.de Mon Sep 8 07:00:48 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 08 Sep 2008 07:00:48 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: <48C4B180.2050301@v.loewis.de> >> Can you explain this a bit more? I presume you're talking about >> subclassing in C > > Yes, I mentioned Cython above. Can you please still elaborate? I have never used Cython before, but if it cannot efficiently subclass str, isn't that a bug in Cython? >> I do note that the mechanisms that exist for supporting adding a __dict__ >> to a str (in 2.x; or bytes in 3.x) or a tuple could be extended for other >> purposes. 
> > I never looked into these, but this does not sound like it would impact > subclassing. To me, the relationship is fairly straight: if you want to subclass a type, *all* you need is a way to place an __dict__ in the object, if it doesn't already have one. If the base object already has an __dict__, the layout of the subtype can be the same as the layout of the base type. Now, what Guido (probably) refers to is the implementation strategy used for adding __dict__ could be generalized for adding additional slots as well: for a variable-sized object (str or tuple), the dictoffset is negative, indicating that you have to count from the end of the object, not from the start, to find the slot. So if you are worried about __dict__-stored attributes being too slow (*), this approach could be a solution. (*) This assumes that the lack of additional slots actually *is* your concern. Regards, Martin From martin at v.loewis.de Mon Sep 8 07:07:46 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 08 Sep 2008 07:07:46 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4464E.5010707@gmail.com> Message-ID: <48C4B322.2060401@v.loewis.de> > Given that you can, today, subclass str in Python, without wasting an > extra 4/8 bytes of memory, or adding anything new to the class object, > why wouldn't anyone who really wanted to make a hypothetical optimized > subclass just use the same mechanism (putting your additional data > *after* the character data) to subclass it in C? > > It may be a little tricky, but not exactly rocket science I believe many people do consider it rocket science, or at least so much out of their reach that it doesn't actually come to their mind as a possible solution. I'm really curious about Stefan's explanation why efficient subclassing of str is not possible in Cython (is it not possible at all? is it possible but inefficient? 
if so, how much, and why?) Regards, Martin From abpillai at gmail.com Mon Sep 8 07:40:45 2008 From: abpillai at gmail.com (Anand Balachandran Pillai) Date: Mon, 8 Sep 2008 11:10:45 +0530 Subject: [Python-3000] Fwd: Beta 3 planned for this Wednesday (OT: Beta 3 planned for this Wednesday) In-Reply-To: <52dc1c820809061353w6bd21232jf08ccad6b7bde19f@mail.gmail.com> References: <8548c5f30808200020i3a48c512hb9bf5fbbc149dfbf@mail.gmail.com> <52dc1c820809032217h2094f4cds3792da66f6489d91@mail.gmail.com> <8548c5f30809032242w57c8d6f2j612a791a84b5f53c@mail.gmail.com> <52dc1c820809050017p3b1f487cl5601e27ae51d47f1@mail.gmail.com> <52dc1c820809061353w6bd21232jf08ccad6b7bde19f@mail.gmail.com> Message-ID: <8548c5f30809072240n26dce363h2545e5147da1f37@mail.gmail.com> Hi Gregory, If you need help in testing out the bytearray related patches on various platforms (#3797, #3492) let me know. Regards --Anand On Sun, Sep 7, 2008 at 2:23 AM, Gregory P. Smith wrote: > issue 3797 created with trivial patches for the remaining bytearray > returning abusers. review needed. > > I don't have a build environment for windows to test the PC/winreg one > on but its too simple to be wrong. > > On Fri, Sep 5, 2008 at 10:37 AM, Guido van Rossum wrote: >> This needs to be fixed. It is surely a relic from the alpha1 situation >> where the bytes type was mutable. No read APIs should return mutable >> bytes. Write APIs should accept mutable and immutable bytes though. >> >> On Fri, Sep 5, 2008 at 12:17 AM, Gregory P. Smith wrote: >>> Anyone have an opinion on http://bugs.python.org/issue3492 in regards >>> to it being a release blocker? >>> >>> The gist of it: zlib returns bytearrays where other modules return >>> bytes. zipimport, because it uses zlib, required bytearrays instead >>> of bytes as input. A few other modules also appear to return >>> bytearrays when they're likely better off returning bytes for >>> consistency. 
>>>
>>> IMHO, it seems like bytearrays should rarely be returned by the
>>> existing standard library apis. Since they are mutable they are
>>> ideally suited for new APIs where they're passed in and modified.
>>>
>>> What's the big deal if this is not fixed before release? Users are
>>> likely to get frustrated at inputs not being hashable without explicit
>>> (data copy) conversion to an immutable type. And any code that gets
>>> written depending on these returning bytearrays instead of bytes would
>>> need fixing if we waited until 3.1 to fix it.
>>>
>>> -gps
>>>
>>> On Wed, Sep 3, 2008 at 10:42 PM, Anand Balachandran Pillai
>>> wrote:
>>>> On Thu, Sep 4, 2008 at 10:47 AM, Gregory P. Smith wrote:
>>>>> I agree that this should go in. zlib should return bytes. other read
>>>>> functions and similar modules like bz2module already return bytes.
>>>>> unless i hear objections, i'll commit this in about 12 hours.
>>>>
>>>> +1 :)
>>>>
>>>>>
>>>>
>>>> Regards
>>>>
>>>> --
>>>> -Anand
>>>>
>>> _______________________________________________
>>> Python-3000 mailing list
>>> Python-3000 at python.org
>>> http://mail.python.org/mailman/listinfo/python-3000
>>> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>>>
>>
>>
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>
>

--
-Anand

From stefan_ml at behnel.de Mon Sep 8 08:56:17 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 08 Sep 2008 08:56:17 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48C4B180.2050301@v.loewis.de>
References: <200809051954.42787.jeremy.kloth@gmail.com>
 <48C4B180.2050301@v.loewis.de>
Message-ID:

Martin v. Löwis wrote:
>>> Can you explain this a bit more? I presume you're talking about
>>> subclassing in C
>> Yes, I mentioned Cython above.
>
> Can you please still elaborate? I have never used Cython before

Should have been clearer, sorry. C-level subtyping in Cython/Pyrex works as
follows.
We create a new struct for the type that contains the parent-struct as
first field, and then we add the new attributes of the new type behind that.
This implies that this kind of subtyping is single inheritance (as opposed to
normal Python subclassing, which is the same in Pyrex/Cython and Python).

This currently works for all builtin types, except str. It results in a very
regular memory layout for extension types. The way it's written in
Pyrex/Cython is:

    cdef class MyListSubType(PyListObject):
        cdef int some_additional_int_field
        cdef my_struct* some_struct

        def __init__(self):
            self.some_struct = get_the_struct_pointer(...)
            self.some_additional_int_field = 1

PyListObject will become a struct member called "__pyx_base" in the new struct
for MyListSubType, and access to members of the base type does a straight

    self->__pyx_base-> (... possibly more __pyx_base derefs ...) -> field_name

The C compiler will make this a straight "self[index_of_field_name]" pointer
deref, unbeatable in speed. The exact memory layout only needs to be available
at C compile time. Also, the exact members of the parent type(s) are not
required at Cython compile time (only those used in the code), as the C
compiler will get them right when it reads their header file.

> if it cannot efficiently subclass str, isn't that a bug in Cython?

I wouldn't mind letting Cython special case subtypes of str (or unicode in
Py3) *somehow*, as long as this "somehow" proves to be a viable solution that
only applies to exactly those types *and* can be done reliably for subtypes of
subtypes. I'm just not aware of such a solution.

>>> I do note that the mechanisms that exist for supporting adding a __dict__
>>> to a str (in 2.x; or bytes in 3.x) or a tuple could be extended for other
>>> purposes.
>> I never looked into these, but this does not sound like it would impact
>> subclassing.
>
> To me, the relationship is fairly straight: if you want to subclass a
> type, *all* you need is a way to place an __dict__ in the object, if
> it doesn't already have one. If the base object already has an __dict__,
> the layout of the subtype can be the same as the layout of the base
> type.

As long as you accept the dictionary indirection and type unpacking for
accessing fields even in the context of private C-level type members of an
extension type, which are currently accessible through straight pointers.

There is a huge performance difference between e.g.

a) dereferencing a pointer to a C int, and

b) asking a dictionary for a name, having it find the result, checking if the
result is empty, checking if the result is a Python long or int (or a pointer
object, or whatever), and unpacking the result into a C int.

Plus the need to raise an exception in the error case, plus the Python-level
visibility of internal C-level fields (such as arbitrary pointers), plus the
inability to do this without holding the GIL. Plus the casting all over the
place when it's not a C int but a struct pointer, for example.

> Now, what Guido (probably) refers to is the implementation strategy
> used for adding __dict__ could be generalized for adding additional
> slots as well: for a variable-sized object (str or tuple), the
> dictoffset is negative, indicating that you have to count from the
> end of the object, not from the start, to find the slot.

This does sound interesting, but I will have to look into the implications. As
I said, it has to be a viable solution without (noticeable) impact on other
types. I'm not sure how this would interact with subtypes of subtypes, and
what the memory layout would be in that case.
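
The dictoffset mechanism quoted above can be observed from Python through
the type attribute __dictoffset__. The following is a minimal sketch,
assuming CPython; StrSub and TupleSub are hypothetical names, and the exact
offset values are implementation details that differ between CPython
versions, so only their sign (or the fact that they are non-zero) is looked
at here.

```python
class StrSub(str):
    """Hypothetical Python-level str subclass; instances gain a __dict__."""


class TupleSub(tuple):
    """Hypothetical Python-level tuple subclass (tuple is variable-sized)."""


s = StrSub("abc")
s.extra = 1  # stored in the instance __dict__, not in a C-level field

offsets = {
    "str": str.__dictoffset__,            # 0: plain str has no __dict__
    "StrSub": StrSub.__dictoffset__,      # non-zero: instances carry a __dict__
    "tuple": tuple.__dictoffset__,        # 0: plain tuple has no __dict__
    "TupleSub": TupleSub.__dictoffset__,  # negative: counted from the object's end
}
print(offsets)
```

The attribute access on s goes through exactly the dictionary indirection
discussed in the message; the sketch does not show a way to add C-level
fields.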
Stefan

From solipsis at pitrou.net Mon Sep 8 12:19:54 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Sep 2008 10:19:54 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
 <48C4B180.2050301@v.loewis.de>
Message-ID:

Stefan Behnel <stefan_ml at behnel.de> writes:
>
>     cdef class MyListSubType(PyListObject):
>         cdef int some_additional_int_field
>         cdef my_struct* some_struct
>
>         def __init__(self):
>             self.some_struct = get_the_struct_pointer(...)
>             self.some_additional_int_field = 1

In your example, you could wrap the additional fields
(some_additional_int_field and some_struct) in a dedicated struct, and define
a macro which gives a pointer to this struct when given the address of the
object. Once you have the pointer to the struct, accessing additional fields
is as simple as in the non-PyVarObject case.

Something like (pseudocode):

#define MyStrSubType_FIELDS_ADDR(op) \
    ((struct MyStrSubType_subfields*) &((void*)op + PyString_Type->tp_basicsize \
        + op->size * PyString_Type->tp_itemsize))

It's not as trivially cheap as a straight field access, but much less
expensive than a dictionary lookup.

(perhaps this needs to be a bit more complicated if you want a specific
alignment for your fields)

From p.f.moore at gmail.com Mon Sep 8 13:04:35 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 8 Sep 2008 12:04:35 +0100
Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight
In-Reply-To: <78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com>
References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
 <78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com>
Message-ID: <79990c6b0809080404h4ae2f636xe77b46ede7c437ec@mail.gmail.com>

2008/9/8 wesley chun :
> the goal is admirable, but unless there are paying sponsors that
> require this deadline be met, i'd suggest that we can push the
> releases until they're ready.
the changes that 2.6 and 3.0 bring are > too major to be released before they are ready for primetime. I believe that the reason for the Oct 1st deadline is that, if we hit it, the new versions will be included in some vendor OS releases (I don't know the exact details, but that's my recollection). > also, there hasn't been a beta3 download available for Win users > (aside from the developers who can build it) since Martin has been on > vacation... they will effectively be leapfrogged from b2 directly to > rc1. i think he comes back tomorrow, so if rc1 really is going out > soon, would it make sense for him to make b3 MSI files too? I agree that the lack of Windows installers is somewhat frustrating (not that I begrudge Martin his holiday!) but in practice I wonder how much impact it has. I've used the earlier betas and alphas, but most of my code relies on one or more external packages, so I tend to have to wait for 2.3 (or 3.0) compatible binaries of those. The only one readily available is pywin32, where there's a 2.6 version (but still no 3.0). I don't know how common my situation is, but certainly the Windows betas don't get as much testing by me as I'd like. Paul. From solipsis at pitrou.net Mon Sep 8 13:24:03 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Sep 2008 11:24:03 +0000 (UTC) Subject: [Python-3000] os.write accepts unicode strings Message-ID: Hello, I thought I'd mention the following issue before it's too late to possibly fix it in 3.0. Basically, os.write() accepts str as well as bytes object, which doesn't sound right. http://bugs.python.org/issue3782 Regards Antoine. 
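
For reference, here is a small sketch of the behaviour one would expect once
the issue above is resolved. It assumes a current Python 3 interpreter, in
which os.write() accepts only bytes-like objects; the pipe is just a
convenient file descriptor to write to and is not part of the bug report.

```python
import os

read_fd, write_fd = os.pipe()
try:
    # Bytes are accepted and the number of bytes written is returned.
    written = os.write(write_fd, b"abc")

    # A str argument is rejected instead of being silently encoded.
    try:
        os.write(write_fd, "abc")
    except TypeError:
        str_rejected = True
    else:
        str_rejected = False

    # Text has to be encoded explicitly before it is written.
    written += os.write(write_fd, "d\u20ac".encode("utf-8"))
    data = os.read(read_fd, 100)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(str_rejected, written, data)
```

Making the encoding explicit at the call site keeps the choice of encoding
with the caller rather than with os.write().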
From ncoghlan at gmail.com Mon Sep 8 13:25:11 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 08 Sep 2008 21:25:11 +1000 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com> References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> <48C1ED22.5040002@gmail.com> <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com> <8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com> Message-ID: <48C50B97.1060409@gmail.com> Anand Balachandran Pillai wrote: > I guess adding __slots__ to Thread class is the best approach > for this. +1 for that... > > IMHO, this is perhaps late for 3.0, but definitely a good thing > to add for 3.1. I'd want at least one major release with a deprecation warning on Thread's __setattr__ before we did anything like blocking the addition of new attributes. Thread has been exposed as a normal python class for a long time, and there is sure to be code out there that relies on setting new attributes on Thread instances. And I still don't know what makes daemon so special that it needs typo protection when almost everything else in the standard library doesn't have it. Given the amount of memory that is going to be allocated for the new thread's stack, the saving of the space for an empty __dict__ slot also isn't a particularly significant gain. (I deliberately pronounce daemon as day-mon though, so I don't forget how to spell it - perhaps pronouncing it as dee-mon makes it harder to remember the order of the 'a' and the 'e'?) If it is just the specific typo as 'deamon' that concerns people, adding a property specifically to raise an exception for that name would be far less hassle than locking down the attributes of all Thread instances. Cheers, Nick. 
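
The last suggestion above, a property that exists only to reject the
'deamon' misspelling, might look roughly like this. CheckedThread is a
hypothetical subclass used purely for illustration; nothing like it is part
of the threading module, and the error message is made up.

```python
import threading


class CheckedThread(threading.Thread):
    """Hypothetical Thread subclass that rejects the 'deamon' misspelling."""

    @property
    def deamon(self):
        raise AttributeError("did you mean 'daemon'?")

    @deamon.setter
    def deamon(self, value):
        raise AttributeError("did you mean 'daemon'?")


t = CheckedThread(target=lambda: None)
t.daemon = True            # the real attribute still works

try:
    t.deamon = True        # the typo now fails loudly
except AttributeError as exc:
    typo_error = str(exc)
else:
    typo_error = None

t.other_attribute = 42     # arbitrary new attributes remain allowed
print(t.daemon, typo_error, t.other_attribute)
```

Unlike __slots__, this leaves the instance __dict__ in place, so existing
code that sets its own attributes on Thread instances keeps working.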
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From barry at python.org Mon Sep 8 15:16:07 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Sep 2008 09:16:07 -0400 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 7, 2008, at 4:12 PM, Fredrik Lundh wrote: > Barry Warsaw wrote: > >>> (I have a few minor ET fixes, and possibly a Unicode 5.1 patch, >>> but have had absolutely no time to spend on that. is the window >>> still open?) >> There are 8 open release blockers, a few of which have patches that >> need review. So I think we are still not ready to release rc1. > > So what's the new ETA? Should I set aside some time to work on the > patches, say, tomorrow, or is it too late? It's not too late. If they fix bugs and the code gets reviewed then yes, you can check them in. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMUlmHEjvBPtnXfVAQJ51QP7BdUGcKN4+L9vD+g7y2TI0+TSw4Ms+eAc yXprcbQnfGp1+uxzjiTCeAv0OSAodw4aakAaI4wzrAkKYNmsVaWOiGKiKrLvR7+Y ++qBxxxVwlKL606hlJCKgphD4hbZcW1w3wY94CXkmrTqyZe/XrStvBj7X10gWeYW lwC3ATaQQ5Y= =tyym -----END PGP SIGNATURE----- From barry at python.org Mon Sep 8 15:17:46 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Sep 2008 09:17:46 -0400 Subject: [Python-3000] [Python-Dev] Not releasing rc1 tonight In-Reply-To: <79990c6b0809080404h4ae2f636xe77b46ede7c437ec@mail.gmail.com> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <78b3a9580809071734u5967f305mbff6120dfca538b7@mail.gmail.com> <79990c6b0809080404h4ae2f636xe77b46ede7c437ec@mail.gmail.com> Message-ID: <98B2280E-5101-44AE-B7E4-4A880F78A0B2@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 8, 2008, at 7:04 AM, Paul Moore wrote: > 2008/9/8 wesley chun : >> the goal is admirable, but unless there are paying sponsors that >> require this deadline be met, i'd suggest that we can push the >> releases until they're ready. the changes that 2.6 and 3.0 bring are >> too major to be released before they are ready for primetime. > > I believe that the reason for the Oct 1st deadline is that, if we hit > it, the new versions will be included in some vendor OS releases (I > don't know the exact details, but that's my recollection). This is what I've been told. I haven't been told that if we miss the mark, it /won't/ be included but that's my assumption. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMUl+nEjvBPtnXfVAQJHsQQAhCt1HfqB3JooC0KZXzUryRJUNMdC7QZh KiX1dayV8q0R2QZtJFBaxP05uqCMEP0uxnWGwmyUm3LT4Idmde6ZGcTnBO160HgL bjwYGYDMtS7X9PxQjMyszVY1gwIX4iFX4KhYtqXKrtodMrqwSbuH69b5cM/0RZ9s DUUPYS/qKjo= =9zBO -----END PGP SIGNATURE----- From barry at python.org Mon Sep 8 15:23:37 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Sep 2008 09:23:37 -0400 Subject: [Python-3000] Proposed revised schedule In-Reply-To: References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I don't think there's any way we're going to make our October 1st goal. We have 8 open release critical bugs, and 18 deferred blockers. We do not have a beta3 Windows installer and I don't have high hopes for rectifying all of these problems in the next day or two. I propose that we push the entire schedule back two weeks. This means that the planned rc2 on 17-September becomes our rc1. The planned final release for 01-October becomes our rc2, and we release the finals on 15-October. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMUnWXEjvBPtnXfVAQIEAQQAnut+CRyBAacC2zzptb5l9cphwke0sEjx THJXHCBUfidaEV7SCtyfkh6i+IpqynvFRsKyOYSWsMojAa5rO/iM6ZJLkUav9c62 IzweJ6Nw3UnOJ/7xksCesDVxDRncFtvu0eRUZWDkOsrNawL+Z21DGKtAuau/pgiY sFnKeyP7NX0= =ZNPm -----END PGP SIGNATURE----- From guido at python.org Mon Sep 8 19:13:08 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Sep 2008 10:13:08 -0700 Subject: [Python-3000] Proposed revised schedule In-Reply-To: References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> Message-ID: On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw wrote: > I don't think there's any way we're going to make our October 1st goal. We > have 8 open release critical bugs, and 18 deferred blockers. 
We do not have > a beta3 Windows installer and I don't have high hopes for rectifying all of > these problems in the next day or two. > > I propose that we push the entire schedule back two weeks. This means that > the planned rc2 on 17-September becomes our rc1. The planned final release > for 01-October becomes our rc2, and we release the finals on 15-October. > > - -Barry Perhaps it's time to separate the 2.6 and 3.0 release schedules? I don't care if the next version of OSX contains 3.0 or not -- but I do care about it having 2.6. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Mon Sep 8 22:10:07 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Sep 2008 16:10:07 -0400 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: <48C50B97.1060409@gmail.com> References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> <48C1ED22.5040002@gmail.com> <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com> <8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com> <48C50B97.1060409@gmail.com> Message-ID: Nick Coghlan wrote: > (I deliberately pronounce daemon as day-mon though, so I don't forget > how to spell it - perhaps pronouncing it as dee-mon makes it harder to > remember the order of the 'a' and the 'e'?) > > If it is just the specific typo as 'deamon' that concerns people, adding > a property specifically to raise an exception for that name would be far > less hassle than locking down the attributes of all Thread instances. Different people have different mis-spelling quirks. I might type demon (in other contexts) but never deamon instead of daemon. There are other stdlib attributes I am more likely to misspell, so worrying about just this one, to the point of changing the implementation, seems a bit mis-directed. 
From ncoghlan at gmail.com Mon Sep 8 23:01:41 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Sep 2008 07:01:41 +1000 Subject: [Python-3000] About "daemon" in threading module In-Reply-To: References: <48BFF8D9.3030002@jcea.es> <48C05131.60209@gmail.com> <48C159FE.7070400@jcea.es> <4222a8490809051006h78fe49fcm1385a105688212de@mail.gmail.com> <48C1ED22.5040002@gmail.com> <4222a8490809051944x20be939ap196ab565291629d4@mail.gmail.com> <8548c5f30809072110y9faeb0bw2a582e36d8794ff3@mail.gmail.com> <48C50B97.1060409@gmail.com> Message-ID: <48C592B5.6060203@gmail.com> Terry Reedy wrote: > Different people have different mis-spelling quirks. I might type demon > (in other contexts) but never deamon instead of daemon. There are other > stdlib attributes I am more likely to misspell, so worrying about just > this one, to the point of changing the implementation, seems a bit > mis-directed. I actually agree, but the concern about misspelling daemon in particular was raised by a couple of folks (my own opinion is that failing to set a thread's daemon status correctly should be picked up by even a pretty basic unit test suite). However, if anything at all was to be done about this, explicitly intercepting a couple of common spelling errors (such as 'demon' and 'deamon') struck me as a lower impact approach than completely blocking the addition of new attributes to Thread instances. Cheers, Nick.
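A minimal sketch of the lower-impact approach described above: properties that intercept only the two misspellings named in this thread and leave every other attribute alone. `GuardedThread` and `_reject_typo` are hypothetical names used for illustration; nothing like this exists in the `threading` module.

```python
import threading


class GuardedThread(threading.Thread):
    """Hypothetical Thread subclass that rejects two misspellings of 'daemon'."""

    def _reject_typo(self, *args):
        # Used as both getter (self) and setter (self, value).
        raise AttributeError("unknown attribute; did you mean 'daemon'?")

    deamon = property(_reject_typo, _reject_typo)
    demon = property(_reject_typo, _reject_typo)


t = GuardedThread(target=lambda: None)
t.daemon = True        # the real attribute still works
t.note = "anything"    # arbitrary new attributes are still allowed
try:
    t.deamon = True    # the misspelling now fails loudly
except AttributeError as exc:
    print("rejected:", exc)
```

Unlike a `__slots__`-based lockdown, existing code that stores its own attributes on Thread instances keeps working under this sketch.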
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From martin at v.loewis.de Tue Sep 9 00:12:36 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 09 Sep 2008 00:12:36 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> Message-ID: <48C5A354.6070908@v.loewis.de> > I wouldn't mind letting Cython special case subtypes of str (or unicode in > Py3) *somehow*, as long as this "somehow" proves to be a viable solution that > only applies to exactly those types *and* can be done reliably for subtypes > of subtypes. I'm just not aware of such a solution. As people have pointed out: add new fields *after* the variable-sized members. To access it, you need to compute the length of the base object, and then cast the pointer to an extension struct. That extends to further subtypes, too. Access is slightly slower, i.e. it's not a compile-time constant, but base_address + base_address[ob_len]*elem_size - more_fields_size This still compiles efficiently, e.g. on x86, gcc compiles a struct field access to movl 20(%eax), %eax and an access with a var-sized offset into movl 8(%eax), %edx; fetch length into edx movl -20(%eax,%edx,2), %eax; access 20-byte sized struct, assuming elements of size 2 > This does sound interesting, but I will have to look into the implications. As > I said, it has to be a viable solution without (noticeable) impact on other > types. I'm not sure how this would interact with subtypes of subtypes, and > what the memory layout would be in that case. See above.
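The address arithmetic above, with the sign as corrected in the follow-up message (an addition, not a subtraction), can be sketched in plain Python. The function name, the `align` parameter and the sample numbers (a 20-byte fixed part, 2-byte elements) are illustrative assumptions, not CPython code.

```python
def extension_offset(base_size, item_size, n_items, align=1):
    """Byte offset at which a subtype's extra fields would start.

    base_size: size of the fixed part of the base object (like tp_basicsize)
    item_size: size of one variable-sized element (like tp_itemsize)
    n_items:   number of elements stored in this particular instance
    align:     alignment wanted for the extra fields (an assumption here)
    """
    end_of_items = base_size + n_items * item_size
    # Round up to the next multiple of `align`.
    return (end_of_items + align - 1) // align * align


print(extension_offset(20, 2, 5))           # 30
print(extension_offset(20, 2, 5, align=8))  # 32
```

Because `n_items` differs from one instance to the next, the offset has to be computed per object at run time, which is the "not a compile-time constant" cost mentioned in the message.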
Regards, Martin From martin at v.loewis.de Tue Sep 9 00:16:38 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 09 Sep 2008 00:16:38 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: <48C5A354.6070908@v.loewis.de> References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> <48C5A354.6070908@v.loewis.de> Message-ID: <48C5A446.8040101@v.loewis.de> > base_address + base_address[ob_len]*elem_size - more_fields_size The subtraction is wrong, of course - it's still an addition. I was just confused by tp_dictoffset being negative in that case; the sign is but a mere flag in that case, and the offset is still positive. Regards, Martin From musiccomposition at gmail.com Tue Sep 9 01:13:29 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Mon, 8 Sep 2008 18:13:29 -0500 Subject: [Python-3000] Proposed revised schedule In-Reply-To: References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> Message-ID: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> On Mon, Sep 8, 2008 at 12:13 PM, Guido van Rossum wrote: > On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw wrote: >> I don't think there's any way we're going to make our October 1st goal. We >> have 8 open release critical bugs, and 18 deferred blockers. We do not have >> a beta3 Windows installer and I don't have high hopes for rectifying all of >> these problems in the next day or two. >> >> I propose that we push the entire schedule back two weeks. This means that >> the planned rc2 on 17-September becomes our rc1. The planned final release >> for 01-October becomes our rc2, and we release the finals on 15-October. >> >> - -Barry > > Perhaps it's time to separate the 2.6 and 3.0 release schedules? I > don't care if the next version of OSX contains 3.0 or not -- but I do > care about it having 2.6. 
I'm not really sure what good that would do us unless we wanted to bring 3.0 back to the beta phase and continue to work on some larger issues with it. I also suspect doing two separate, but close together final releases would be more stressful than having them in lock and step. Just my pocket change, though. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From eric at trueblade.com Tue Sep 9 01:15:04 2008 From: eric at trueblade.com (Eric Smith) Date: Mon, 08 Sep 2008 19:15:04 -0400 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: <48C5A354.6070908@v.loewis.de> References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> <48C5A354.6070908@v.loewis.de> Message-ID: <48C5B1F8.5070707@trueblade.com> Martin v. Löwis wrote: >> I wouldn't mind letting Cython special case subtypes of str (or unicode in >> Py3) *somehow*, as long as this "somehow" proves to be a viable solution that >> only applies to exactly those types *and* can be done reliably for subtypes >> of subtypes. I'm just not aware of such a solution. > > As people have pointed out: add new fields *after* the variable-sized > members. To access it, you need to compute the length of the base > object, and then cast the pointer to an extension struct. How about putting the variable sized data _before_ the struct? That is, make the memory layout: Admittedly, accessing the string data is now more complex, since you have to know where it starts (which we already know, based on the size). But that might be simpler than having the offset logic when accessing derived object fields, because that would be different from all other C objects. There would be some complications when allocating, because of alignment issues, but I don't think it would be impossible to do this. We'd need to be careful when deallocating, as well (of course). Eric.
From guido at python.org Tue Sep 9 01:25:10 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Sep 2008 16:25:10 -0700 Subject: [Python-3000] Proposed revised schedule In-Reply-To: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> Message-ID: On Mon, Sep 8, 2008 at 4:13 PM, Benjamin Peterson wrote: > On Mon, Sep 8, 2008 at 12:13 PM, Guido van Rossum wrote: >> On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw wrote: >>> I don't think there's any way we're going to make our October 1st goal. We >>> have 8 open release critical bugs, and 18 deferred blockers. We do not have >>> a beta3 Windows installer and I don't have high hopes for rectifying all of >>> these problems in the next day or two. >>> >>> I propose that we push the entire schedule back two weeks. This means that >>> the planned rc2 on 17-September becomes our rc1. The planned final release >>> for 01-October becomes our rc2, and we release the finals on 15-October. >>> >>> - -Barry >> >> Perhaps it's time to separate the 2.6 and 3.0 release schedules? I >> don't care if the next version of OSX contains 3.0 or not -- but I do >> care about it having 2.6. > > I'm not really sure what good that would do us unless we wanted to > bring 3.0 back to the beta phase and continue to work on some larger > issues with it. I also suspect doing two separate, but close together > final releases would be more stressful than having them in lock and > step. Well, from the number of release blockers it sounds like another 3.0 beta is the right thing. For 2.6 however I believe we're much closer to the finish line -- there aren't all those bytes/str issues to clean up, for example! And apparently the benefit of releasing on schedule is that we will be included in OSX. That's a much bigger deal for 2.6 than for 3.0 (I doubt that Apple would add two versions anyway). 
> Just my pocket change, though. > > > > -- > Cheers, > Benjamin Peterson > "There's no place like 127.0.0.1." > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Tue Sep 9 02:11:49 2008 From: lists at cheimes.de (Christian Heimes) Date: Tue, 09 Sep 2008 02:11:49 +0200 Subject: [Python-3000] Proposed revised schedule In-Reply-To: References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> Message-ID: <48C5BF45.9020807@cheimes.de> Guido van Rossum wrote: > Well, from the number of release blockers it sounds like another 3.0 > beta is the right thing. For 2.6 however I believe we're much closer > to the finish line -- there aren't all those bytes/str issues to clean > up, for example! And apparently the benefit of releasing on schedule > is that we will be included in OSX. That's a much bigger deal for 2.6 > than for 3.0 (I doubt that Apple would add two versions anyway). I'm on Guido's side. Ok, from the marketing perspective it's a nice catch to release 2.6 and 3.0 on the same day. "Python 2.6.0 and 3.0.0 released" makes a great headline. But given the chance to get Python 2.6 into the next OSX version it's fine with me to release 3.0 a couple of weeks later. Python 3.0 is not ready for a release candidate. We just fixed a bunch of memory leaks and critical errors over the last week. And don't forget Windows! The Windows builds didn't get thorough testing because we didn't provide our tests with official builds. 
I'm +1 for a 2.6rc and another beta of 3.0 Christian From greg.ewing at canterbury.ac.nz Tue Sep 9 02:10:19 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Sep 2008 12:10:19 +1200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> Message-ID: <48C5BEEB.3030009@canterbury.ac.nz> Stefan Behnel wrote: > We create a new struct for the type that contains the parent-struct > as first field, and then we add the new attributes of the new type behind > that. I seem to remember there's a field in the type called tp_basicsize that's meant to indicate how big the base part of the struct is, with any variable-size part placed after it. If a variable-size type always uses this field to find the variable data, it seems to me that the usual scheme for subclassing should still work, with the extra fields existing in between those of the base class and the new position of the variable data. Does Py_Unicode not take notice of this field? If not, maybe that's something that should be fixed. -- Greg From python at rcn.com Tue Sep 9 04:07:48 2008 From: python at rcn.com (Raymond Hettinger) Date: Mon, 8 Sep 2008 19:07:48 -0700 Subject: [Python-3000] [Python-Dev] Proposed revised schedule References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org><1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> Message-ID: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> [Guido van Rossum] > Well, from the number of release blockers it sounds like another 3.0 > beta is the right thing. For 2.6 however I believe we're much closer > to the finish line -- there aren't all those bytes/str issues to clean > up, for example! And apparently the benefit of releasing on schedule > is that we will be included in OSX. That's a much bigger deal for 2.6 > than for 3.0 (I doubt that Apple would add two versions anyway). 
With the extra time, it would be worthwhile to add dbm.sqlite to 3.0 to compensate for the loss of bsddb so that shelves won't become useless on Windows builds. Raymond From guido at python.org Tue Sep 9 04:11:02 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Sep 2008 19:11:02 -0700 Subject: [Python-3000] [Python-Dev] Proposed revised schedule In-Reply-To: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> Message-ID: On Mon, Sep 8, 2008 at 7:07 PM, Raymond Hettinger wrote: > [Guido van Rossum] >> >> Well, from the number of release blockers it sounds like another 3.0 >> beta is the right thing. For 2.6 however I believe we're much closer >> to the finish line -- there aren't all those bytes/str issues to clean >> up, for example! And apparently the benefit of releasing on schedule >> is that we will be included in OSX. That's a much bigger deal for 2.6 >> than for 3.0 (I doubt that Apple would add two versions anyway). > > With the extra time, it would be worthwhile to add dbm.sqlite to 3.0 > to compensate for the loss of bsddb so that shelves won't become > useless on Windows builds. So get started already! 
:-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Tue Sep 9 04:12:23 2008 From: skip at pobox.com (skip at pobox.com) Date: Mon, 8 Sep 2008 21:12:23 -0500 Subject: [Python-3000] [Python-Dev] Proposed revised schedule In-Reply-To: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> Message-ID: <18629.56199.90786.234922@montanaro-dyndns-org.local> Raymond> With the extra time, it would be worthwhile to add dbm.sqlite Raymond> to 3.0 to compensate for the loss of bsddb so that shelves Raymond> won't become useless on Windows builds. My vote is to separate 2.6 and 3.0 then come back together for 2.7 and 3.1. I'm a bit less sure about adding dbm.sqlite. Unless Josiah's version is substantially faster and more robust I think my version needs to cook a bit longer. I'm just not comfortable enough with SQLite to pronounce my version fit enough. I only intended it as a proof-of-concept, and it's clear it has some shortcomings. Skip From martin at v.loewis.de Tue Sep 9 07:39:43 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 09 Sep 2008 07:39:43 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: <48C5B1F8.5070707@trueblade.com> References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> <48C5A354.6070908@v.loewis.de> <48C5B1F8.5070707@trueblade.com> Message-ID: <48C60C1F.2030501@v.loewis.de> > How about putting the variable sized data _before_ the struct? That won't work for container objects (such as tuples); they already have the GC structure before the PyObject, whose size and layout is opaque to the objects. 
Regards, Martin From stefan_ml at behnel.de Tue Sep 9 08:03:16 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 09 Sep 2008 08:03:16 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: <48C5A354.6070908@v.loewis.de> References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> <48C5A354.6070908@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> I wouldn't mind letting Cython special case subtypes of str (or unicode in >> Py3) *somehow*, as long as this "somehow" proves to be a viable solution that >> only applies to exactly those types *and* can be done reliably for subtypes >> of subtypes. I'm just not aware of such a solution. > > As people have pointed out: add new fields *after* the variable-sized > members. To access it, you need to compute the length of the base > object, and then cast the pointer to an extension struct. > > That extends to further subtypes, too. Thanks Martin, Antoine, this still requires some figuring out of the details for Cython, but I agree that Cython is a good place to handle this problem and to fix it for both Py2 and whatever Py3 will add to it. Martin, you compared these things to rocket science, so let me quote a variant of what some of the Jython people tend to say: "Cython - we write C so you don't have to." Stefan From stefan_ml at behnel.de Tue Sep 9 10:31:33 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 9 Sep 2008 08:31:33 +0000 (UTC) Subject: [Python-3000] PyUnicodeObject implementation References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> <48C5BEEB.3030009@canterbury.ac.nz> Message-ID: Greg Ewing <greg.ewing at canterbury.ac.nz> writes: > > We create a new struct for the type that contains the parent-struct > > as first field, and then we add the new attributes of the new type behind > > that.
> > I seem to remember there's a field in the type called tp_basicsize > that's meant to indicate how big the base part of the struct is, > with any variable-size part placed after it. > > If a variable-size type always uses this field to find the variable > data, it seems to me that the usual scheme for subclassing should > still work, with the extra fields existing in between those of the > base class and the new position of the variable data. > > Does Py_Unicode not take notice of this field? If not, maybe that's > something that should be fixed. Look at the layout of PyStringObject. The last entry is a char* ob_sval[1] The only purpose of that entry is to point to the buffer. That's also exploited by PyString_AS_STRING(), a macro that translates to the pointer deref "s->ob_sval". Subtypes that declare their own members will have them run into ob_sval. As you noted, a general solution for this problem would be to replace PyString_AS_STRING() and the future PyUnicode_AS_DATA() (and, well, all occurrences of "->ob_sval" in the CPython source code) by (s + s->tp_basicsize) But that would have the same impact on all string data access operations as noted by Martin. I expect that this could be done for the new PyUnicode type in Py3. The performance impact is relatively small and it removes the C subclassing problem, so that may be considered a reasonable trade-off. Regarding Cython (and Pyrex), however, it doesn't solve the problem in general for the existing Py2 versions that Cython supports (starting from 2.3), so a portable solution implemented by Cython would still be best. 
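The two type fields under discussion are visible from Python as `__basicsize__` and `__itemsize__`, so the layout rule can be checked without reading C. The sketch below assumes a current CPython 3 interpreter (not the 2.x and 3.0 builds discussed in this thread), where `bytes` is the variable-sized type; absolute byte counts are platform-dependent, so only a difference is compared. For reference, the CPython 2.x headers declare the trailing member as an inline array, `char ob_sval[1]`, rather than as a pointer.

```python
import sys

# tp_basicsize and tp_itemsize are exposed on every type object.
print(bytes.__basicsize__, bytes.__itemsize__)

# For a variable-sized type the instance grows by one item per element.
growth = sys.getsizeof(b"abc") - sys.getsizeof(b"")
print(growth == 3 * bytes.__itemsize__)

# The C-level subclassing problem has a Python-level echo: a subclass of a
# variable-sized type is normally refused extra fixed fields via __slots__.
try:
    class Tagged(bytes):
        __slots__ = ("tag",)
    slots_allowed = True
except TypeError:
    slots_allowed = False
print(slots_allowed)
```

The `try` block is deliberately tolerant: whether the interpreter refuses the `__slots__` declaration is an implementation detail, so the sketch reports the outcome instead of assuming it.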
Stefan From stefan_ml at behnel.de Tue Sep 9 10:39:20 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 9 Sep 2008 08:39:20 +0000 (UTC) Subject: [Python-3000] PyUnicodeObject implementation References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4B180.2050301@v.loewis.de> <48C5BEEB.3030009@canterbury.ac.nz> Message-ID: Stefan Behnel wrote: > (s + s->tp_basicsize) I (obviously) meant (s + Py_TYPE(s)->tp_basicsize) so the impact is another bit bigger. Stefan From mal at egenix.com Tue Sep 9 11:32:37 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 09 Sep 2008 11:32:37 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> <48C4464E.5010707@gmail.com> Message-ID: <48C642B5.2020109@egenix.com> Before jumping to conclusions, please read the discussion on the patch ticket: http://bugs.python.org/issue1943 It turned out that the patch only provides a marginal performance improvement, so the perceived main argument for the PyVarObject implementation doesn't turn out to be a real advantage. The reasons for choosing a PyObject approach for Unicode rather than a PyVarObject one like for strings were the following: * a pointer to the actual data makes it possible to implement optimizations that share data, e.g.
slice objects that a parser generates when parsing a larger input string or view objects that turn a memory mapped file into a live Unicode object without any copying overhead * a fixed object size results in making good use of the Python allocator, since all objects live in the same pool; as a result you have better cache locality - which is good for situations where you have to deal with lots of objects * objects should be small in order to have lots of them in the free lists * resizing the object should not result in the object's address to change, since this is a common operation when creating Unicode objects * a fixed size PyObject makes extending the object at C level very easy (probably a few more that I've forgotten - it's been a while since the days of Python 1.6) The disadvantages of PyVarObjects w/r to extending them in C were made rather clear in this thread: * finding the extensions requires pointer arithmetic * the alignment of the extended parts has to be dealt with in the object implementation (rather than having the compiler take care of this) * when resizing the object's data, the extension parts have to be copied and realigned as well * when resizing the object's data, the addresses of the extension parts change, so code has to be aware of this, e.g. caching of the offsets is not easily possible There are also more general disadvantages: * resizing the object can cause a change in the object's address, so code has to be aware of this * objects are spread over many different pools in the memory allocator, reducing cache locality * keeping PyVarObjects in the free lists requires more memory IMHO, it's a lot better to tweak the parameters that we have in the Unicode implementation (e.g. raise the KEEPALIVE_SIZE_LIMIT to 32, see the ticket for details) and to improve the memory allocator for storage of small memory chunks or improve the free list management (which Antoine did with his free list patch). 
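The data-sharing argument listed earlier in this message (slices or views that point into an existing buffer instead of copying it) has a Python-level analogue in `memoryview`. This is only an analogy chosen for illustration, on a current Python 3 interpreter; it is not how the Unicode object is implemented.

```python
buf = bytearray(b"spam and eggs")
view = memoryview(buf)

head = view[:4]            # a slice of the view; no bytes are copied
buf[0] = ord("S")          # a change made through the owning object...
print(bytes(head))         # ...is visible through the slice
print(head.obj is buf)     # the slice still refers to the original buffer
```

The same trade-off applies as in the C discussion: sharing saves copies, but every access goes through one more level of indirection.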
The only valid advantage I see with the PyVarObject patch is the slightly simplified implementation for the standard case. Given the number of disadvantages, that did not convince me to change my -1 on the patch. Regarding making a PyObject -> PyVarObject change in 3.0.1: that's not a good idea, since it's not a bug fix, but rather a new feature that also changes the C API significantly. Regards, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 09 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 On 2008-09-08 00:55, Guido van Rossum wrote: > On Sun, Sep 7, 2008 at 2:23 PM, Nick Coghlan wrote: >> Guido van Rossum wrote: >>> All in all, given the advantage (half the number of allocations) of >>> the proposal I think there would have to be *very* good arguments >>> against before we reject this outright. I'd like to understand >>> Marc-Andre's reasons too. >> As Stefan notes, because of the frequency with which strings are >> manipulated in C code via PyString_* / PyUnicode_* calls, it is a data >> type where "accept no substitutes" prevails. >> >> MAL's primary concern appears to be that having Unicode as a plain >> PyObject leaves the type more open to subclass-based optimisations that >> have been rejected for the builtin types themselves. > > Hm. I don't have any particularly insightful imagination as to what > those optimizations might be. Have any been implemented (in 3rd party > code) in the 8 years that the Unicode object has existed? 
> >> Having >> PyString/PyBytes as PyVarObjects means that subclasses are more limited >> in what they can do. > > True. > >> One possibility that occurs to me is to use a PyVarObject variant that >> allocates space for an additional void pointer before the variable sized >> section of the object. The builtin type would leave that pointer NULL, >> but subtypes could perform the second allocation needed to populate it. >> >> The question is whether the 4-8 bytes wasted per object would be worth >> the fact that only one memory allocation would be needed. > > I believe that 4-8 bytes is more than the overhead of an extra memory > allocation from the obmalloc heap. It is probably about the same as > the overhead for a memory allocation from the regular malloc heap. So > for short strings (of which there are often a lot) it would be more > expensive; for longer objects it would probably work out just about > the same. > > There could be a different approach though, whereby the offset from > the start of the object to the start of the character array wasn't a > constant but a value stored in the class object. (In fact, > tp_basicsize could probably be used for this.) It would slow down > access to the characters a bit though -- a classic time-space > trade-off that would require careful measurement in order to decide > which is better. 
> From ncoghlan at gmail.com Tue Sep 9 12:20:29 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Sep 2008 20:20:29 +1000 Subject: [Python-3000] [Python-Dev] Proposed revised schedule In-Reply-To: <18629.56199.90786.234922@montanaro-dyndns-org.local> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> <18629.56199.90786.234922@montanaro-dyndns-org.local> Message-ID: <48C64DED.3090103@gmail.com> skip at pobox.com wrote: > Raymond> With the extra time, it would be worthwhile to add dbm.sqlite > Raymond> to 3.0 to compensate for the loss of bsddb so that shelves > Raymond> won't become useless on Windows builds. > > My vote is to separate 2.6 and 3.0 then come back together for 2.7 and 3.1. > I'm a bit less sure about adding dbm.sqlite. Unless Josiah's version is > substantially faster and more robust I think my version needs to cook a bit > longer. I'm just not comfortable enough with SQLite to pronounce my version > fit enough. I only intended it as a proof-of-concept, and it's clear it has > some shortcomings. Given that the *API* is fixed though, it is probably better to have the module present in 3.0 and bring it back to the main line in 2.7. If any absolute clangers from a performance/stability point of view get past Raymond (and everyone else with an interest in this) then they can be addressed in 3.0.1 in a few months time. Whereas if we leave the module out entirely, then 3.0 users are completely out of luck until 3.1 (or have to download and possibly build pybsddb). Cheers, Nick. 
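For readers unfamiliar with what a dbm-style module backed by SQLite involves, here is a minimal sketch using only the standard `sqlite3` module. It is not either of the implementations mentioned in this thread; the class name `SQLiteDict`, the table name `kv` and the bytes-only keys and values are assumptions made for illustration.

```python
import sqlite3
from collections.abc import MutableMapping


class SQLiteDict(MutableMapping):
    """Hypothetical bytes-to-bytes mapping stored in a single SQLite table."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()  # commit every write: slow, but nothing is lost

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        self._conn.commit()
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        rows = self._conn.execute("SELECT key FROM kv").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()


db = SQLiteDict()
db[b"spam"] = b"eggs"
print(db[b"spam"], len(db))
```

Committing after every write is the simplest way to address the data-loss concern, at an obvious cost in speed; a real module would need to make that trade-off deliberately.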
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From solipsis at pitrou.net Tue Sep 9 12:31:19 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Sep 2008 10:31:19 +0000 (UTC)
Subject: [Python-3000] [Python-Dev] dbm.sqlite
References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
 <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
 <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
 <18629.56199.90786.234922@montanaro-dyndns-org.local>
 <48C64DED.3090103@gmail.com>
Message-ID:

Nick Coghlan <ncoghlan at gmail.com> writes:
>
> Given that the *API* is fixed though, it is probably better to have the
> module present in 3.0 and bring it back to the main line in 2.7.
>
> If any absolute clangers from a performance/stability point of view get
> past Raymond (and everyone else with an interest in this) then they can
> be addressed in 3.0.1 in a few months time.

I agree about performance but I don't think it's right to say we can fix
stability later. This is a storage module, and people risk losing their data if
there are glaring bugs. If we really want an efficient dbm-compatible storage
backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb?
Even though it has its quirks, it's certainly much more tested than a
hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild.

From stefan_ml at behnel.de Tue Sep 9 12:55:07 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 9 Sep 2008 10:55:07 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
Message-ID:

Stefan Behnel wrote:
> Antoine Pitrou wrote:
>> Stefan Behnel <stefan_ml at behnel.de> writes:
>>> From a Cython perspective, I find the lack of efficient subclassing after
>>> such a change particularly striking.
>> what do you call "efficient subclassing"?
if you look at the current
>> implementation of unicode_subtype_new() in unicodeobject.c, it isn't very
>> efficient (everything including the raw data buffer is allocated twice).
>
> That's something that may be optimised one day without affecting user code.

Coming back to this: Why is this done anyway? Can't the new instance of the
unicode-subtype just steal the buffer pointer of the already allocated unicode
object?

Stefan

From solipsis at pitrou.net Tue Sep 9 13:13:57 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Sep 2008 11:13:57 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
 <48C4464E.5010707@gmail.com>
 <48C642B5.2020109@egenix.com>
Message-ID:

Hello,

M.-A. Lemburg <mal at egenix.com> writes:
>
> It turned out that the patch only provides a marginal performance
> improvement, so the perceived main argument for the PyVarObject
> implementation doesn't turn out to be a real advantage.

Uh, while the results are not always overwhelming, they are however far better
than the simple freelist improvement (which is not even always an improvement).

> * a fixed object size results in making good use of the Python
> allocator, since all objects live in the same pool; as a result
> you have better cache locality - which is good for situations
> where you have to deal with lots of objects

I'm not sure how cache locality of unrelated unicode objects helps performance.
However, having a separate allocation in a different pool for the raw character
data implies that cache locality is worse when it comes to actually accessing
the character data (the pointer and the data it points to are in completely
different areas). Pointer chasing makes memory accesses impossible to predict,
and thus access latencies difficult to hide for the CPU.
Anyway, it's just theoretical speculation, I think running benchmarks and
comparing performance numbers is the most reasonable thing we can do (which is
a bit difficult since we don't have real-world benchmarks for string
processing; stringbench and pybench most probably run from the CPU cache and
thus don't really stress memory access patterns; it's why I chose the
simplistic split() of a very large string to demonstrate performance of my
patches).

> * objects should be small in order to have lots of them in
> the free lists

But the freelists are less efficient since they only avoid one allocation and
not both of them. And if you make them avoid both allocations, then the
freelists are actually bigger in memory (because of more overhead).

Also, those two arguments could be made for lists vs. tuples, but I've never
seen anyone dispute that tuples are more efficient than lists.

> IMHO, it's a lot better to tweak the parameters that we have
> in the Unicode implementation (e.g. raise the KEEPALIVE_SIZE_LIMIT
> to 32, see the ticket for details) and to improve
> the memory allocator for storage of small memory chunks or
> improve the free list management (which Antoine did with his
> free list patch).

But that patch as I said above yields very mixed results, it even degrades
performance in some cases. I'm not against applying (some variant of) it, but
it's really not a deal-breaker.

Regards

Antoine.

From jnoller at gmail.com Tue Sep 9 14:49:20 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 9 Sep 2008 08:49:20 -0400
Subject: [Python-3000] Proposed revised schedule
In-Reply-To:
References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
Message-ID:

On Sep 8, 2008, at 1:13 PM, "Guido van Rossum" wrote:

> On Mon, Sep 8, 2008 at 6:23 AM, Barry Warsaw wrote:
>> I don't think there's any way we're going to make our October 1st
>> goal. We
>> have 8 open release critical bugs, and 18 deferred blockers.
We do >> not have >> a beta3 Windows installer and I don't have high hopes for >> rectifying all of >> these problems in the next day or two. >> >> I propose that we push the entire schedule back two weeks. This >> means that >> the planned rc2 on 17-September becomes our rc1. The planned final >> release >> for 01-October becomes our rc2, and we release the finals on 15- >> October. >> >> - -Barry > > Perhaps it's time to separate the 2.6 and 3.0 release schedules? I > don't care if the next version of OSX contains 3.0 or not -- but I do > care about it having 2.6. > Given that 2.6 is going to be more widely adopted and used by both the community and OS distributors, I'm +1 on splitting the releases as well. -Jesse From barry at python.org Tue Sep 9 15:17:10 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 9 Sep 2008 09:17:10 -0400 Subject: [Python-3000] Proposed revised schedule In-Reply-To: References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> Message-ID: <937C6D77-3168-4127-8D4F-59AA291F0A86@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 8, 2008, at 1:13 PM, Guido van Rossum wrote: > Perhaps it's time to separate the 2.6 and 3.0 release schedules? I > don't care if the next version of OSX contains 3.0 or not -- but I do > care about it having 2.6. I've talked with my contact at MajorOS Vendor (tm) and, as much as he can say, he would be fine with this. They're having problems getting 3rd party modules to build against 3.0 anyway, but if we can release a very solid 2.6 by the 1-Oct deadline, I would support splitting the releases. I really don't like doing this, but if we can get 2.6 out on time, and 3.0 doesn't lag too far behind, I'm okay with it. We'll have to abbreviate the release schedule though, so everyone should concentrate on fixing the 2.6 showstoppers. I think we need to get 2.6rc1 out this week, followed by 2.6rc2 next Wednesday as planned and 2.6final on 1-October. 
I've shuffled the tracker to reduce all 3.0-only bugs to deferred
blocker, and to increase all 2.6 deferred blockers to release
blockers. There are 11 open blocker issues for 2.6:

3629 Python won't compile a regex that compiles with 2.5.2 and 30b2
3640 test_cpickle crash on AMD64 Windows build
3777 long(4.2) now returns an int
3781 warnings.catch_warnings fails gracelessly when recording warnings but...
2876 Write UserDict fixer for 2to3
2350 'exceptions' import fixer
3642 Objects/obmalloc.c:529: warning: comparison is always false due...
3617 Add MS EULA to the list of third-party licenses in the Windows...
3657 pickle can pickle the wrong function
1868 threading.local doesn't free attrs when assigning thread exits
3809 test_logging leaving a 'test.blah' file behind

If we can close them by Wednesday or Thursday, and the 2.6 bots stay
green, I will cut the 2.6rc1 release this week and the 2.6rc2 and
final on schedule. If you're on board with this, please do what you
can to resolve these open issues. As always, I'm on irc if you need
to discuss anything.

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSMZ3V3EjvBPtnXfVAQKLbAP6A9b0WBB0H/ONZbKie2TazK/qYLthYnZQ
iIpfJ2UboOA7dJ/ueXIsD413oI8GTbUOsUlJOWbSzAfJ6oBuPHrjr4IFRCZhchKG
lwViDaK/7aWgIusGFpt6y/SgwJBU531wb7o3Lx/P6rLx5Wh5Nr+tvhngt0WkSMSj
WtCsy3mmgmQ=
=3HdI
-----END PGP SIGNATURE-----

From barry at python.org Tue Sep 9 15:21:53 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 9 Sep 2008 09:21:53 -0400
Subject: [Python-3000] Proposed revised schedule
In-Reply-To:
References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
 <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
Message-ID: <914B35A3-C8C2-42B6-9A3B-11E1F0F03998@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 8, 2008, at 7:25 PM, Guido van Rossum wrote:

> Well, from the number of release blockers it sounds like another 3.0
> beta is the right thing.
For 2.6 however I believe we're much closer > to the finish line -- there aren't all those bytes/str issues to clean > up, for example! And apparently the benefit of releasing on schedule > is that we will be included in OSX. That's a much bigger deal for 2.6 > than for 3.0 (I doubt that Apple would add two versions anyway). The MajorOS Vendor (tm) may be willing to ship a 3.0 beta if it's far enough along, though not as the primary Python version. They clearly want 2.6 for that. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMZ4cXEjvBPtnXfVAQL4ygP/fLILvf3NhvmN3R2T7htGm08xt/bOBYGt +BDrV4rapS4j3jo2Cx+McEdjJZCdq9x7BIaTN+4ITwq02LEY5fmhp6NkhzE1dlnq qdgBq8x/Z4AnsxfydtqYrPhrzLWPpdEZElgll5FB6Dj6XIA7cB8tuds2cE7+OXJI Guom1Y0k6Ao= =u4FB -----END PGP SIGNATURE----- From barry at python.org Tue Sep 9 15:23:28 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 9 Sep 2008 09:23:28 -0400 Subject: [Python-3000] [Python-Dev] Proposed revised schedule In-Reply-To: <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org><1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> Message-ID: <3DFD4AAC-D8EA-46E6-BC56-C713861C02B7@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 8, 2008, at 10:07 PM, Raymond Hettinger wrote: > [Guido van Rossum] >> Well, from the number of release blockers it sounds like another 3.0 >> beta is the right thing. For 2.6 however I believe we're much closer >> to the finish line -- there aren't all those bytes/str issues to >> clean >> up, for example! And apparently the benefit of releasing on schedule >> is that we will be included in OSX. That's a much bigger deal for 2.6 >> than for 3.0 (I doubt that Apple would add two versions anyway). 
> > With the extra time, it would be worthwhile to add dbm.sqlite to 3.0 > to compensate for the loss of bsddb so that shelves won't become > useless on Windows builds. That seems risky to me. First, it's a new feature. Second, it will be largely untested code. I would much rather see dbm.sqlite released as a separate package for possible integration into the core for 3.1. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMZ40XEjvBPtnXfVAQK2WQP/e3N2rYD2rbsoynEnXvAjzF8lPoPRFDvl hbjERsbB93uSoBPHaTdjtXnW+InC0W4GC5ogHF9wARbzYTJaxx09WmjihX+PvgsW JhXwLpG3gtyclfqSAF8MWZHc4UnKnyUt5UgYBlZrzT0z7FhWmelUPl8QhS8/2n9L oT3qX8eLabI= =Zu70 -----END PGP SIGNATURE----- From barry at python.org Tue Sep 9 15:25:03 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 9 Sep 2008 09:25:03 -0400 Subject: [Python-3000] [Python-Dev] Proposed revised schedule In-Reply-To: References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sep 9, 2008, at 3:22 AM, Georg Brandl wrote: > Even if I can't contribute very much at the moment, I'm still +1 to > that. > I doubt Python would get nice publicity if we released a 3.0 but had > to > tell everyone, "but don't really use it yet, it may still contain any > number of showstoppers." I completely agree. We should not release anything that's not ready. Assuming that we all agree that 2.6 is much closer to being ready, that gives us two options: delay 2.6 to coincide with 3.0 or split the releases. The latter seems like the wisest choice to meet our goals. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSMZ5L3EjvBPtnXfVAQJwSQP/U7FFFI8ao5Xesf6F3QFIUMYFeISrlhof 9ynkQXAskUMelAfayGMSd2nD2+buXA7gyBWplAAEF2rtLhZ3N0+zeh/2HnqcY0b9 EtUM5shAIMlb2948IMoXlxSMplH5auBHMLYFnuPAIH9ERXsGVfyihLnUarAfzmT+ XrWfjrU62TA= =CUR4 -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Sep 9 16:17:27 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2008 00:17:27 +1000 Subject: [Python-3000] [Python-Dev] Proposed revised schedule In-Reply-To: <914B35A3-C8C2-42B6-9A3B-11E1F0F03998@python.org> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <914B35A3-C8C2-42B6-9A3B-11E1F0F03998@python.org> Message-ID: <48C68577.6070705@gmail.com> Barry Warsaw wrote: > On Sep 8, 2008, at 7:25 PM, Guido van Rossum wrote: > >> Well, from the number of release blockers it sounds like another 3.0 >> beta is the right thing. For 2.6 however I believe we're much closer >> to the finish line -- there aren't all those bytes/str issues to clean >> up, for example! And apparently the benefit of releasing on schedule >> is that we will be included in OSX. That's a much bigger deal for 2.6 >> than for 3.0 (I doubt that Apple would add two versions anyway). > > The MajorOS Vendor (tm) may be willing to ship a 3.0 beta if it's far > enough along, though not as the primary Python version. They clearly > want 2.6 for that. Given that the sum total of actual Python 3.0 programs is currently pretty close to zero, I don't really see any reason for *any* OS vendor (even Linux distros) to be including a 3.0 interpreter in their base install at this point in time. I personally expect it to stay in the "optional extras" category until some time next year. Pessimists-have-more-opportunities-to-be-pleasantly-surprised'ly, Nick. 
_______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ncoghlan at gmail.com Tue Sep 9 16:21:14 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2008 00:21:14 +1000 Subject: [Python-3000] [Python-Dev] Proposed revised schedule In-Reply-To: <937C6D77-3168-4127-8D4F-59AA291F0A86@python.org> References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org> <937C6D77-3168-4127-8D4F-59AA291F0A86@python.org> Message-ID: <48C6865A.5000703@gmail.com> Barry Warsaw wrote: > 3781 warnings.catch_warnings fails gracelessly when recording warnings I just assigned this one to myself - I'll have a patch up for review shortly (the patch will revert back to having this be a regression test suite only feature). Cheers, Nick. _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From pje at telecommunity.com Tue Sep 9 19:06:15 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 09 Sep 2008 13:06:15 -0400 Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path? 
In-Reply-To: References: <18626.61673.143430.847735@montanaro-dyndns-org.local> Message-ID: <20080909170519.36BAE3A4072@sparrow.telecommunity.com> At 03:28 PM 9/6/2008 -0700, Brett Cannon wrote: >On Sat, Sep 6, 2008 at 2:06 PM, wrote: > > I'm trying to figure out how to install this dbm.sqlite module I have > > without overwriting the basic install. My thought was to create a dbm > > package in site-packages then copy sqlite.py there. That doesn't work > > though. Modifying dbm.__init__.py to include this does: > > > > import pkgutil > > __path__ = pkgutil.extend_path(__path__, __name__) > > > > I'm wondering if all the core packages in 3.x should include the above in > > their __init__.py files. > > > >Well, a side-effect of this is that all package imports will suddenly >spike the number of stat calls linearly to the number of entries on >sys.path. "All package imports"? "Spike"? >Another option is to use a pth file that imports your module (as like >_dbm_sqlite.py or something) and have it, as a side-effect of >importing, set itself on dbm. That adds an import to startup time, whether you use the package or not. At least extend_path will only take effect if you actually import that package. From josiah.carlson at gmail.com Tue Sep 9 19:43:34 2008 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Tue, 9 Sep 2008 10:43:34 -0700 Subject: [Python-3000] [Python-Dev] dbm.sqlite In-Reply-To: References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> <18629.56199.90786.234922@montanaro-dyndns-org.local> <48C64DED.3090103@gmail.com> Message-ID: On Tue, Sep 9, 2008 at 3:31 AM, Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: >> >> Given that the *API* is fixed though, it is probably better to have the >> module present in 3.0 and bring it back to the main line in 2.7. 
>> >> If any absolute clangers from a performance/stability point of view get >> past Raymond (and everyone else with an interest in this) then they can >> be addressed in 3.0.1 in a few months time. > > I agree about performance but I don't think it's right to say we can fix > stability later. This is a storage module, and people risk losing their data if > there are glaring bugs. If we really want an efficient dbm-compatible storage > backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb? > Even though it has its quirks, it's certainly much more tested than a > hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild. Yes and no. Sqlite in Python 3.0 has been tested and used more than bsddb in Python 3.0, this can be trivially seen because sqlite has been working in Python 3.0 for quite a while, which hasn't been the case with bsddb in Python 3.0. While the wrapper for sqlite to offer a dbm-like interface is relatively untested (it does have testcases thanks to Skip), dealing with a couple-hundred (at most) line wrapper is far more reasonable for testing, verification, bugfixing, etc., than the wrappers for bsddb. - Josiah From brett at python.org Tue Sep 9 21:38:28 2008 From: brett at python.org (Brett Cannon) Date: Tue, 9 Sep 2008 12:38:28 -0700 Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path? In-Reply-To: <20080909170519.36BAE3A4072@sparrow.telecommunity.com> References: <18626.61673.143430.847735@montanaro-dyndns-org.local> <20080909170519.36BAE3A4072@sparrow.telecommunity.com> Message-ID: On Tue, Sep 9, 2008 at 10:06 AM, Phillip J. Eby wrote: > At 03:28 PM 9/6/2008 -0700, Brett Cannon wrote: >> >> On Sat, Sep 6, 2008 at 2:06 PM, wrote: >> > I'm trying to figure out how to install this dbm.sqlite module I have >> > without overwriting the basic install. My thought was to create a dbm >> > package in site-packages then copy sqlite.py there. That doesn't work >> > though. 
Modifying dbm.__init__.py to include this does:
>> >
>> >     import pkgutil
>> >     __path__ = pkgutil.extend_path(__path__, __name__)
>> >
>> > I'm wondering if all the core packages in 3.x should include the above
>> > in
>> > their __init__.py files.
>> >
>>
>> Well, a side-effect of this is that all package imports will suddenly
>> spike the number of stat calls linearly to the number of entries on
>> sys.path.
>
> "All package imports"? "Spike"?
>

pkgutil.extend_path() would be executed for every package imported by
the fact that it is code at the global level of the module. And if you
look at the implementation of extend_path(), there is an
os.path.isdir() call for every entry on sys.path, and if that succeeds
there is an os.path.isfile() call. Plus there is also an
os.path.isfile() call for every sys.path entry as well.

I call that a "spike" in "all package imports" in terms of stat calls
if this was added to all packages as suggested. And that can be
painful on systems where stat calls are expensive (e.g., NFS). At
least extend_path() appends so the new entries are put at the back of
the list.

>
>> Another option is to use a pth file that imports your module (as like
>> _dbm_sqlite.py or something) and have it, as a side-effect of
>> importing, set itself on dbm.
>
> That adds an import to startup time, whether you use the package or not. At
> least extend_path will only take effect if you actually import that package.
>

Yes, it's a trade-off depending on what penalty cost you would prefer
to pay. But as I said, I don't like the idea of letting people inject
into the stdlib namespace like this in the first place so I don't want
this to happen in any official capacity.
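
For readers following the extend_path() discussion, here is a minimal sketch of the mechanism being debated. It is only an illustration under stated assumptions: the package name demo_pkg and the directories "base" and "extra" are hypothetical stand-ins for a stdlib package and a site-packages overlay, and nothing here is taken from the actual dbm package.

```python
import importlib
import os
import sys
import tempfile


def build_split_package(root):
    """Create two sys.path entries that each hold part of 'demo_pkg'.

    'demo_pkg', 'base' and 'extra' are hypothetical names for illustration.
    """
    base = os.path.join(root, "base")    # stands in for the stdlib location
    extra = os.path.join(root, "extra")  # stands in for site-packages
    for directory in (base, extra):
        os.makedirs(os.path.join(directory, "demo_pkg"))
    # The "real" package opts in to being extended from other path entries.
    with open(os.path.join(base, "demo_pkg", "__init__.py"), "w") as f:
        f.write("import pkgutil\n"
                "__path__ = pkgutil.extend_path(__path__, __name__)\n")
    # The overlay carries an (empty) __init__.py plus the extra submodule.
    with open(os.path.join(extra, "demo_pkg", "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(extra, "demo_pkg", "addon.py"), "w") as f:
        f.write("VALUE = 42\n")
    return base, extra


def import_addon():
    """Import demo_pkg.addon from the overlay; return (VALUE, package __path__)."""
    with tempfile.TemporaryDirectory() as root:
        base, extra = build_split_package(root)
        sys.path[:0] = [base, extra]
        try:
            importlib.invalidate_caches()
            addon = importlib.import_module("demo_pkg.addon")
            return addon.VALUE, list(sys.modules["demo_pkg"].__path__)
        finally:
            # Undo the changes to interpreter-wide state.
            sys.path.remove(base)
            sys.path.remove(extra)
            sys.modules.pop("demo_pkg.addon", None)
            sys.modules.pop("demo_pkg", None)
```

Note that the overlay directory ends up at the back of the package's __path__, so a submodule of the same name in the first directory would still win. The sketch says nothing about the stat-call cost; that depends on the length of sys.path and on the filesystem.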
-Brett From python at rcn.com Tue Sep 9 21:47:32 2008 From: python at rcn.com (Raymond Hettinger) Date: Tue, 9 Sep 2008 12:47:32 -0700 Subject: [Python-3000] [Python-Dev] dbm.sqlite References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com><3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1><18629.56199.90786.234922@montanaro-dyndns-org.local><48C64DED.3090103@gmail.com> Message-ID: >>> Given that the *API* is fixed though, it is probably better to have the >>> module present in 3.0 and bring it back to the main line in 2.7. >>> >>> If any absolute clangers from a performance/stability point of view get >>> past Raymond (and everyone else with an interest in this) then they can >>> be addressed in 3.0.1 in a few months time. >> >> I agree about performance but I don't think it's right to say we can fix >> stability later. This is a storage module, and people risk losing their data if >> there are glaring bugs. If we really want an efficient dbm-compatible storage >> backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb? >> Even though it has its quirks, it's certainly much more tested than a >> hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild. > > Yes and no. Sqlite in Python 3.0 has been tested and used more than > bsddb in Python 3.0, this can be trivially seen because sqlite has > been working in Python 3.0 for quite a while, which hasn't been the > case with bsddb in Python 3.0. While the wrapper for sqlite to offer > a dbm-like interface is relatively untested (it does have testcases > thanks to Skip), dealing with a couple-hundred (at most) line wrapper > is far more reasonable for testing, verification, bugfixing, etc., > than the wrappers for bsddb. I concur. Sqlite is very stable, especially for our purposes here (records with only a text key paired with a pickled blob). Also, the dbm API and mapping API's have been worked-out long ago. 
Also, we've got the shelve test suite to exercise the setup. And the
wrapper module is small enough and simple enough to be very easy to
review. Doesn't get much easier than this.

Raymond

From solipsis at pitrou.net Tue Sep 9 21:49:25 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Sep 2008 19:49:25 +0000 (UTC)
Subject: [Python-3000] [Python-Dev] dbm.sqlite
References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
 <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
 <18629.56199.90786.234922@montanaro-dyndns-org.local>
 <48C64DED.3090103@gmail.com>
Message-ID:

Josiah Carlson <josiah.carlson at gmail.com> writes:
>
> While the wrapper for sqlite to offer
> a dbm-like interface is relatively untested (it does have testcases
> thanks to Skip), dealing with a couple-hundred (at most) line wrapper
> is far more reasonable for testing, verification, bugfixing, etc.,
> than the wrappers for bsddb.

There is theory and there is practice. There are lots of things that unittests
can't or often don't catch. I don't think it is reasonable at all to add a
completely new and untested storage backend at the last minute, while the usual
advice for module inclusion is to first publish it on PyPI to get feedback and
estimate popularity.

Another possibility is for the module to be clearly labeled as experimental.

From pje at telecommunity.com Tue Sep 9 22:31:22 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 09 Sep 2008 16:31:22 -0400
Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path?
In-Reply-To:
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
 <20080909170519.36BAE3A4072@sparrow.telecommunity.com>
Message-ID: <20080909203021.8CC703A409C@sparrow.telecommunity.com>

At 12:38 PM 9/9/2008 -0700, Brett Cannon wrote:
>On Tue, Sep 9, 2008 at 10:06 AM, Phillip J.
Eby wrote: > > At 03:28 PM 9/6/2008 -0700, Brett Cannon wrote: > >> > >> On Sat, Sep 6, 2008 at 2:06 PM, wrote: > >> > I'm trying to figure out how to install this dbm.sqlite module I have > >> > without overwriting the basic install. My thought was to create a dbm > >> > package in site-packages then copy sqlite.py there. That doesn't work > >> > though. Modifying dbm.__init__.py to include this does: > >> > > >> > import pkgutil > >> > __path__ = pkgutil.extend_path(__path__, __name__) > >> > > >> > I'm wondering if all the core packages in 3.x should include the above > >> > in > >> > their __init__.py files. > >> > > >> > >> Well, a side-effect of this is that all package imports will suddenly > >> spike the number of stat calls linearly to the number of entries on > >> sys.path. > > > > "All package imports"? "Spike"? > > > >pkgutil.extend_path() would be executed for every package imported by >the fact that is code at the global level of the module. Each package that uses it and that is imported, yes. > And if you >look at the implementation of extend_path(), there is a >os.path.isdir() call for every entry on sys.path, and if that succeeds >there is os.path.isfile() call. Plus there is also an os.path.isfile() >call for every sys.path entry as well. Note, btw, that that could be greatly reduced by use of sys.path_importer_cache; only entries that are missing or None need to have the subdirectory check. >I call that a "spike" in "all package imports" in terms of stat calls >if this was added to all packages as suggested. And that can be >painful on systems where stat calls are expensive (e.g., NFS). At >least extend_path() appends so the new entries are put at the back of >the list. ...which actually negates the entire point of the proposal, which was somebody wanting to be able to install an override/upgrade of a module in a stdlib package. >Yes, it's a trade-off depending on what penalty cost you would prefer >to pay. 
But as I said, I don't like the idea of letting people inject
>into the stdlib namespace like this in the first place so I don't want
>this to happen in any official capacity.

IIUC, the OP was requesting the ability to *upgrade* a stdlib-provided
module, not add items to the namespace.

From mal at egenix.com Tue Sep 9 23:26:34 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Sep 2008 23:26:34 +0200
Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path?
In-Reply-To: <18626.61673.143430.847735@montanaro-dyndns-org.local>
References: <18626.61673.143430.847735@montanaro-dyndns-org.local>
Message-ID: <48C6EA0A.9020006@egenix.com>

On 2008-09-06 23:06, skip at pobox.com wrote:
> I'm trying to figure out how to install this dbm.sqlite module I have
> without overwriting the basic install. My thought was to create a dbm
> package in site-packages then copy sqlite.py there. That doesn't work
> though. Modifying dbm.__init__.py to include this does:
>
>     import pkgutil
>     __path__ = pkgutil.extend_path(__path__, __name__)
>
> I'm wondering if all the core packages in 3.x should include the above in
> their __init__.py files.

If all you want to do is get the module into the dbm package,
why not make this explicit by requiring an import to install
the extra module ?!

    import install_dbm_sqlite

which then does:

    import sys, dbm
    import dbm_sqlite
    # Install dbm_sqlite into the dbm package
    sys.modules['dbm.sqlite'] = dbm_sqlite
    dbm.sqlite = dbm_sqlite

Unlike pkgutil, this also works with ZIP files and frozen
modules and makes the installation explicit and visible to
the user reading your code.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Sep 09 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From mal at egenix.com Tue Sep 9 23:37:15 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 09 Sep 2008 23:37:15 +0200 Subject: [Python-3000] [Python-Dev] dbm.sqlite In-Reply-To: References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> <18629.56199.90786.234922@montanaro-dyndns-org.local> <48C64DED.3090103@gmail.com> Message-ID: <48C6EC8B.7060502@egenix.com> On 2008-09-09 21:49, Antoine Pitrou wrote: > Josiah Carlson gmail.com> writes: >> While the wrapper for sqlite to offer >> a dbm-like interface is relatively untested (it does have testcases >> thanks to Skip), dealing with a couple-hundred (at most) line wrapper >> is far more reasonable for testing, verification, bugfixing, etc., >> than the wrappers for bsddb. > > There is theory and there is practice. There are lots of things that unittests > can't or often don't catch. I don't think it is reasonable at all to add a > completely new and untested storage backend at the last minute, while the usual > advice for module inclusion is to first publish it on PyPI to get feedback and > estimate popularity. > > Another possibility is for the module to be clearly labeled as experimental. Since when are we adding experimental modules to the stdlib ? Also, we're past beta3. This is not the time to add completely new modules to the stdlib - regardless of whether they are stable, experimental or something in between. Note that this doesn't mean to imply anything regarding the module implementation itself. It's a matter of quality control and assurance. 
Besides, what's so bad with downloading and installing a package from PyPI ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 09 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From skip at pobox.com Wed Sep 10 00:03:11 2008 From: skip at pobox.com (skip at pobox.com) Date: Tue, 9 Sep 2008 17:03:11 -0500 Subject: [Python-3000] Should package __init__ files include pkgutil.extend_path? In-Reply-To: <48C6EA0A.9020006@egenix.com> References: <18626.61673.143430.847735@montanaro-dyndns-org.local> <48C6EA0A.9020006@egenix.com> Message-ID: <18630.62111.952479.189074@montanaro-dyndns-org.local> mal> If all you want to do is get the module into the dbm package, why mal> not make this explicit by requiring an import to install the extra mal> module ?! mal> import install_dbm_sqlite mal> which then does: mal> import sys, dbm mal> import dbm_sqlite mal> # Install dbm_sqlite into the dbm package mal> sys.modules['dbm.sqlite'] = dbm_sqlite mal> dbm.sqlite = dbm_sqlite I was hoping to make migration from an external module in test to a module distributed with Python (if it gets that far) as seamless as possible. 
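
The install shim quoted above can be tried out without touching the real dbm package. What follows is a minimal sketch under stated assumptions: fake_dbm and fake_dbm_sqlite are hypothetical stand-in modules built in memory, and install_submodule() is an illustrative helper, not an existing stdlib function.

```python
import importlib
import sys
import types


def install_submodule(package, name, module):
    """Expose `module` as `package.<name>` without touching the package's files.

    Illustrative helper (not part of the stdlib): it registers the module
    under the dotted name in sys.modules and sets it as a package attribute.
    """
    sys.modules[package.__name__ + "." + name] = module
    setattr(package, name, module)


# Stand-ins (hypothetical names) for the dbm package and an external backend.
fake_dbm = types.ModuleType("fake_dbm")
fake_dbm.__path__ = []  # an empty __path__ marks the module as a package
sys.modules["fake_dbm"] = fake_dbm

fake_dbm_sqlite = types.ModuleType("fake_dbm_sqlite")
fake_dbm_sqlite.BACKEND = "sqlite"

install_submodule(fake_dbm, "sqlite", fake_dbm_sqlite)

# The dotted name now resolves through the normal import machinery,
# because sys.modules is consulted before any finder runs.
via_import = importlib.import_module("fake_dbm.sqlite")
```

With this approach the dotted name only exists after the shim has run, so callers still need the explicit installing import first. That extra step is the cost being weighed in this exchange.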
Skip From skip at pobox.com Wed Sep 10 00:26:31 2008 From: skip at pobox.com (skip at pobox.com) Date: Tue, 9 Sep 2008 17:26:31 -0500 Subject: [Python-3000] [Python-Dev] dbm.sqlite In-Reply-To: <48C6EC8B.7060502@egenix.com> References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> <18629.56199.90786.234922@montanaro-dyndns-org.local> <48C64DED.3090103@gmail.com> <48C6EC8B.7060502@egenix.com> Message-ID: <18630.63511.654735.927024@montanaro-dyndns-org.local> mal> Besides, what's so bad with downloading and installing a package mal> from PyPI ? Nothing, I do it all the time. But my impression is that when an external module moves into the core it frequently undergoes some type of name change (e.g. pysqlite vs sqlite3 or Optik vs optparse) even if the two versions are functionally identical. In this case, my hope is that dbm.sqlite will eventually move into the distributed dbm package. If so, it would be nice if the move was transparent. Skip From martin at v.loewis.de Wed Sep 10 00:29:35 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 10 Sep 2008 00:29:35 +0200 Subject: [Python-3000] PyUnicodeObject implementation In-Reply-To: References: <200809051954.42787.jeremy.kloth@gmail.com> Message-ID: <48C6F8CF.1080905@v.loewis.de> > Coming back to this: Why is this done anyway? Can't the new instance of the > unicode-subtype just steal the buffer pointer of the already allocated unicode > object? Only if the refcount of the tmp object is 1. But then, yes, it could. You then also need to change unicode_dealloc, to only optionally release the pointer (and probably also to not put the object into the freelist if it doesn't have a str pointer). IOW, no :-) Regards, Martin From mal at egenix.com Wed Sep 10 11:23:54 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 10 Sep 2008 11:23:54 +0200 Subject: [Python-3000] [Python-Dev] dbm.sqlite In-Reply-To: <18630.63511.654735.927024@montanaro-dyndns-org.local> References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> <18629.56199.90786.234922@montanaro-dyndns-org.local> <48C64DED.3090103@gmail.com> <48C6EC8B.7060502@egenix.com> <18630.63511.654735.927024@montanaro-dyndns-org.local> Message-ID: <48C7922A.2070401@egenix.com> On 2008-09-10 00:26, skip at pobox.com wrote: > mal> Besides, what's so bad with downloading and installing a package > mal> from PyPI ? > > Nothing, I do it all the time. But my impression is that when an external > module moves into the core it frequently undergoes some type of name change > (e.g. pysqlite vs sqlite3 or Optik vs optparse) even if the two versions are > functionally identical. In this case, my hope is that dbm.sqlite will > eventually move into the distributed dbm package. If so, it would be nice > if the move was transparent. Transparent as in "I don't have to change my code" ? I actually find it helpful to have the PyPI packages that ended up in the stdlib use different names, since that opens up the possibility to use the more current releases from PyPI in an application. Switching back to the core version usually just takes a one line change, if at all... try: import pysqlite as sqlite except ImportError: import sqlite3 as sqlite -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 10 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. 
CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ncoghlan at gmail.com Wed Sep 10 11:55:35 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2008 19:55:35 +1000 Subject: [Python-3000] [Python-Dev] dbm.sqlite In-Reply-To: <48C7922A.2070401@egenix.com> References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> <18629.56199.90786.234922@montanaro-dyndns-org.local> <48C64DED.3090103@gmail.com> <48C6EC8B.7060502@egenix.com> <18630.63511.654735.927024@montanaro-dyndns-org.local> <48C7922A.2070401@egenix.com> Message-ID: <48C79997.3040705@gmail.com> M.-A. Lemburg wrote: > On 2008-09-10 00:26, skip at pobox.com wrote: >> mal> Besides, what's so bad with downloading and installing a package >> mal> from PyPI ? >> >> Nothing, I do it all the time. But my impression is that when an external >> module moves into the core it frequently undergoes some type of name change >> (e.g. pysqlite vs sqlite3 or Optik vs optparse) even if the two versions are >> functionally identical. In this case, my hope is that dbm.sqlite will >> eventually move into the distributed dbm package. If so, it would be nice >> if the move was transparent. > > Transparent as in "I don't have to change my code" ? > > I actually find it helpful to have the PyPI packages that ended up in > the stdlib use different names, since that opens up the possibility > to use the more current releases from PyPI in an application. > > Switching back to the core version usually just takes a one line > change, if at all... 
>
> try:
>     import pysqlite as sqlite
> except ImportError:
>     import sqlite3 as sqlite
>

I still think it would be kind of nice to be able to write that as:

import pysqlite or sqlite3 as sqlite

(ditto for "from pysqlite or sqlite3 import ")

You could even do it as a pre-AST transform (similar to try/except/finally) and not even have to go anywhere near the implementation of the import system itself.

I've never been motivated enough to write a PEP about it though.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From fdrake at acm.org Wed Sep 10 14:09:32 2008
From: fdrake at acm.org (Fred Drake)
Date: Wed, 10 Sep 2008 08:09:32 -0400
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <48C7922A.2070401@egenix.com>
References: <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com> <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1> <18629.56199.90786.234922@montanaro-dyndns-org.local> <48C64DED.3090103@gmail.com> <48C6EC8B.7060502@egenix.com> <18630.63511.654735.927024@montanaro-dyndns-org.local> <48C7922A.2070401@egenix.com>
Message-ID: <605AFC31-2E44-4787-BCC8-929DE09F0BCA@acm.org>

On Sep 10, 2008, at 5:23 AM, M.-A. Lemburg wrote:
> I actually find it helpful to have the PyPI packages that ended up in
> the stdlib use different names, since that opens up the possibility
> to use the more current releases from PyPI in an application.

I'm with Marc-Andre on this; using the same module names for different codebases is a pain. The standard library and the rest of the Python world shouldn't overlap module names (this is part of why many have requested a separate namespace for the Python standard library).
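
For illustration only: the try/except ImportError fallback quoted earlier in this thread can be wrapped in a small helper today, without any new syntax. This is a minimal sketch; `import_first` is a hypothetical name, not an existing stdlib function, and `no_such_module_xyz` is a placeholder assumed not to be installed.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported.

    Hypothetical helper (not part of the stdlib); it mirrors the
    try/except ImportError idiom for an arbitrary list of candidates.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is unavailable; try the next one.
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# Prefer an external package if present, else fall back to the stdlib one.
# "no_such_module_xyz" is a placeholder assumed not to be installed.
sqlite = import_first("no_such_module_xyz", "sqlite3")
print(sqlite.__name__)
```

Keeping the two distributions under different names, as argued above, is what makes this kind of explicit preference order possible.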
-Fred

--
Fred Drake

From guido at python.org Wed Sep 10 18:11:41 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 10 Sep 2008 09:11:41 -0700
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To: <605AFC31-2E44-4787-BCC8-929DE09F0BCA@acm.org>
References: <18629.56199.90786.234922@montanaro-dyndns-org.local> <48C64DED.3090103@gmail.com> <48C6EC8B.7060502@egenix.com> <18630.63511.654735.927024@montanaro-dyndns-org.local> <48C7922A.2070401@egenix.com> <605AFC31-2E44-4787-BCC8-929DE09F0BCA@acm.org>
Message-ID:

2008/9/10 Fred Drake :
> On Sep 10, 2008, at 5:23 AM, M.-A. Lemburg wrote:
>> I actually find it helpful to have the PyPI packages that ended up in
>> the stdlib use different names, since that opens up the possibility
>> to use the more current releases from PyPI in an application.
>
> I'm with Marc-Andre on this; using the same module names for different codebases is a pain. The standard library and the rest of the Python world shouldn't overlap module names (this is part of why many have requested a separate namespace for the Python standard library).

+1. Remember the xml / _xmlplus debacle?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From edreamleo at gmail.com Thu Sep 11 17:56:48 2008
From: edreamleo at gmail.com (Edward K. Ream)
Date: Thu, 11 Sep 2008 10:56:48 -0500
Subject: [Python-3000] 2to3 still broken with b3 on XP
Message-ID:

Just downloaded the latest Windows installer, python-3.0b3.msi

The 2to3 script is still broken:

C:\Python30\Tools\Scripts>c:\python30\python.exe 2to3.py
Traceback (most recent call last):
  File "2to3.py", line 5, in <module>
    sys.exit(refactor.main())
TypeError: main() takes at least 1 positional argument (0 given)

Edward
--------------------------------------------------------------------
Edward K.
Ream email: edreamleo at gmail.com
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------

From edreamleo at gmail.com Thu Sep 11 18:46:33 2008
From: edreamleo at gmail.com (Edward K. Ream)
Date: Thu, 11 Sep 2008 11:46:33 -0500
Subject: [Python-3000] 2to3 still broken with b3 on XP
In-Reply-To:
References:
Message-ID:

On Thu, Sep 11, 2008 at 10:56 AM, Edward K. Ream wrote:
> The 2to3 script is still broken:
>
> C:\Python30\Tools\Scripts>c:\python30\python.exe 2to3.py
> Traceback (most recent call last):
>   File "2to3.py", line 5, in <module>
>     sys.exit(refactor.main())
> TypeError: main() takes at least 1 positional argument (0 given)

I hacked 2to3.py as follows:

#!/usr/bin/env python
from lib2to3 import refactor
import sys
import os

sys.exit(refactor.main(fixer_dir=os.curdir))

This mostly seems to work. However, non-ascii characters can cause a crash:

QQQQQQ
C:\leo.repo\trunk\leo>fix test\fix-failure.py

C:\leo.repo\trunk\leo>c:\Python30\python.exe c:\Python30\Tools\Scripts\2to3.py test\fix-failure.py
Traceback (most recent call last):
  File "c:\Python30\Tools\Scripts\2to3.py", line 9, in <module>
    sys.exit(refactor.main(fixer_dir=os.curdir))
  File "c:\python30\lib\lib2to3\refactor.py", line 85, in main
    rt.refactor_args(args)
  File "c:\python30\lib\lib2to3\refactor.py", line 243, in refactor_args
    self.refactor_file(arg)
  File "c:\python30\lib\lib2to3\refactor.py", line 272, in refactor_file
    input = f.read() + "\n" # Silence certain parse errors
  File "c:\python30\lib\io.py", line 1719, in read
    decoder.decode(self.buffer.read(), final=True))
  File "c:\python30\lib\io.py", line 1294, in decode
    output = self.decoder.decode(input, final=final)
  File "c:\python30\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 47: character maps to <undefined>

C:\leo.repo\trunk\leo>print /d:con test\fix-failure.py
C:\leo.repo\trunk\leo\test\fix-failure.py is currently being printed
# -*- coding: utf-8 -*-

s = 'abc'.replace(u'??? ???', '" "')
?
C:\leo.repo\trunk\leo>
QQQQQQ

The actual contents of the file are:

# -*- coding: utf-8 -*-

s = 'abc'.replace(u'" "', '" "')

BTW, in other situations I've seen similar crashes with non-ascii characters outside the first 256 characters on Python 2.5, which seems strange to me because Leo handles all kinds of unicode characters correctly. I have no idea whether I am doing something wrong.

The following appears in my sitecustomize.py file:

sys.setdefaultencoding('utf-8')

Edward
--------------------------------------------------------------------
Edward K. Ream email: edreamleo at gmail.com
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------

From edreamleo at gmail.com Thu Sep 11 18:53:38 2008
From: edreamleo at gmail.com (Edward K. Ream)
Date: Thu, 11 Sep 2008 11:53:38 -0500
Subject: [Python-3000] 2to3 still broken with b3 on XP
In-Reply-To:
References:
Message-ID:

On Thu, Sep 11, 2008 at 11:46 AM, Edward K. Ream wrote:
> The actual contents of the file are:
>
> # -*- coding: utf-8 -*-
>
> s = 'abc'.replace(u'" "', '" "')

The chars got munged. The first string should have the following characters:

U+201C: left double quotation mark
U+201D: right double quotation mark

Edward
--------------------------------------------------------------------
Edward K. Ream email: edreamleo at gmail.com
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------

From skip at pobox.com Fri Sep 12 14:17:55 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 07:17:55 -0500
Subject: [Python-3000] How much should non-dict mappings behave like dict?
Message-ID: <18634.24051.300204.451209@montanaro-dyndns-org.local>

In issue 3783 (http://bugs.python.org/issue3783) the question was raised about whether or not it's worthwhile making this guarantee:

    zip(d.keys(), d.values()) == d.items()

in the face of no changes to the mapping object. At issue is whether the SQL query should force a predictable order on the keys and values fetched from the database or if that's just wasted CPU cycles. Making it concrete, should these two SELECT statements force a consistent ordering on the keys and values retrieved from the database:

    select key from dict order by key
    select value from dict order by key

Currently SQLite does return the keys and values in the same, predictable, order, but doesn't guarantee that behavior (so it could change in the future).

While the discussion in the issue is related to this nascent dbm.sqlite module, I think it's worth considering the more general issue of how much behavior non-dict mapping types should be required to share with the dict type.

In the section "Mapping Types -- dict" in the 2.5.2 library reference:

    http://docs.python.org/lib/typesmapping.html

there is a footnote about ordering of keys and values:

    Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions. If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): "pairs = zip(a.values(), a.keys())". The same relationship holds for the iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(), a.iterkeys())" provides the same value for pairs. Another way to create the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".
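
As a concrete check of the property the footnote describes, here is a minimal sketch in Python 3 spelling. It assumes a plain built-in dict; in 3.x keys(), values() and items() return views and zip() is lazy, hence the list() calls. The sample data is made up for illustration.

```python
d = {"spam": 1, "eggs": 2, "ham": 3}

# With no modification between the calls, keys(), values() and items()
# iterate in corresponding order for a built-in dict.
assert list(zip(d.keys(), d.values())) == list(d.items())

# (value, key) pairs built the two ways the footnote mentions.
pairs_zip = list(zip(d.values(), d.keys()))
pairs_comp = [(v, k) for (k, v) in d.items()]
assert pairs_zip == pairs_comp
print(pairs_zip)
```

This only demonstrates the behavior for dict itself; whether other mapping types must satisfy the same assertions is exactly the open question here.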
It's not entirely clear if this page is meant to apply just to dictionaries or if (to the extent possible) it should apply to all mapping types. I'm of the opinion it should apply more broadly. Others are not of that opinion. Should the documentation be more explicit about this? Comments? Thx, Skip From eric at trueblade.com Fri Sep 12 14:51:32 2008 From: eric at trueblade.com (Eric Smith) Date: Fri, 12 Sep 2008 08:51:32 -0400 Subject: [Python-3000] How much should non-dict mappings behave like dict? In-Reply-To: <18634.24051.300204.451209@montanaro-dyndns-org.local> References: <18634.24051.300204.451209@montanaro-dyndns-org.local> Message-ID: <48CA65D4.9070702@trueblade.com> skip at pobox.com wrote: > In issue 3783 (http://bugs.python.org/issue3783) the question was raised > about whether or not it's worthwhile making this guarantee: > > zip(d.keys(), d.values()) == d.items() > > in the face of no changes to the mapping object. At issue is whether the > SQL query should force a predictable order on the keys and values fetched > from the database or if that's just wasted CPU cycles. Making it concrete, > should these two SELECT statements force a consistent ordering on the keys > and values retrieved from the database: > > select key from dict order by key > select value from dict order by key > > Currently SQLite does return the keys and values in the same, predictable, > order, but doesn't guarantee that behavior (so it could change in the > future). > > While the discussion in the issue is related to this nascent dbm.sqlite > module, I think it's worth considering the more general issue of how > behavior non-dict mapping types should be required to share with the dict > type. 
> > In the section "Mapping Types -- dict" in the 2.5.2 library reference: > > http://docs.python.org/lib/typesmapping.html > > there is a footnote about ordering of keys and values: > > Keys and values are listed in an arbitrary order which is non-random, > varies across Python implementations, and depends on the dictionary's > history of insertions and deletions. If items(), keys(), values(), > iteritems(), iterkeys(), and itervalues() are called with no intervening > modifications to the dictionary, the lists will directly > correspond. This allows the creation of (value, key) pairs using zip(): > "pairs = zip(a.values(), a.keys())". The same relationship holds for the > iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(), > a.iterkeys())" provides the same value for pairs. Another way to create > the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]". > > It's not entirely clear if this page is meant to apply just to dictionaries > or if (to the extent possible) it should apply to all mapping types. I'm of > the opinion it should apply more broadly. Others are not of that opinion. > Should the documentation be more explicit about this? > > Comments? I think the guarantee should be removed from dicts, and in any event shouldn't be a requirement for other mappings. I think the ordering should be an implementation detail, just as the part that says "depends on the dictionary's history of insertions and deletions" need not be true for all mapping implementations. I wouldn't want the performance of any dictionary or any mapping type bound by these constraints, especially one that might be large and in a database. Given items(), I don't see why you'd ever need "zip(a.keys(), a.values())" to work. Antoine makes many of these same points in the issue comments. But then I have no current or imagined use case that would rely on this behavior. Others may of course disagree. Has anyone ever seen code that relies on this? 
Might such code predate items()?

Eric.

From barry at python.org Fri Sep 12 14:54:20 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 12 Sep 2008 08:54:20 -0400
Subject: [Python-3000] Updated release schedule for 2.6 and 3.0
Message-ID: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org>

We had a lot of discussion recently about changing the release schedule and splitting Python 2.6 and 3.0. There was general consensus that this was a good idea, in order to hit our October 1 deadline for Python 2.6 final at least.

There is only one open blocker for 2.6, issue 3657. Andrew, Fred, Tim and I (via IRC) will be getting together tonight to do some Python hacking, so we should resolve this issue and release 2.6rc1 tonight. We'll have an abbreviated 2.6rc1, and I will release 2.6rc2 next Wednesday the 17th as planned. The final planned release of 2.6 will be Wednesday October 1st.

If 3.0 is looking better, I will release 3.0rc1 on Wednesday, otherwise we'll re-evaluate the release schedule for 3.0 as necessary.

This means currently the schedule looks like this:

Fri 12-Sep    2.6rc1
Wed 17-Sep    2.6rc2, 3.0rc1
Wed 01-Oct    2.6 final, 3.0rc2
Wed 15-Oct    3.0 final

I've updated the Python Release Schedule gcal and will update the PEP momentarily. We'll close the tree later tonight (UTC-4).

-Barry

From skip at pobox.com Fri Sep 12 14:59:37 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 07:59:37 -0500
Subject: [Python-3000] How much should non-dict mappings behave like dict?
In-Reply-To: <48CA65D4.9070702@trueblade.com> References: <18634.24051.300204.451209@montanaro-dyndns-org.local> <48CA65D4.9070702@trueblade.com> Message-ID: <18634.26553.442410.703389@montanaro-dyndns-org.local> Eric> Given items(), I don't see why you'd ever need "zip(a.keys(), Eric> a.values())" to work. Eric> Antoine makes many of these same points in the issue comments. And as I pointed out there's no telling what users will do. The zip(keys,values) behavior works for dicts and has probably worked for other mapping types. I'm just asking here whether or not the Python documentation should be more explicit about what is and isn't expected of mapping types. Eric> But then I have no current or imagined use case that would rely on Eric> this behavior. Others may of course disagree. Has anyone ever seen Eric> code that relies on this? Might such code predate items()? Or might accidentally work: keys = d.keys() ...do something with keys and other lists... # sometime later... vals = d.values() ...do something with vals and those other lists... It might just be by accident that the code works, or it might be code which, as you indicated, predates items() (though it's been around quite awhile now). The point of only addressing this to the topic of Python 3 is that this is presumed to be the place where we remove as many warts as possible. It would also be nice to tighten up the documentation in these somewhat murky areas. Skip From edreamleo at gmail.com Fri Sep 12 15:19:27 2008 From: edreamleo at gmail.com (Edward K. Ream) Date: Fri, 12 Sep 2008 08:19:27 -0500 Subject: [Python-3000] Updated release schedule for 2.6 and 3.0 In-Reply-To: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org> References: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org> Message-ID: On Fri, Sep 12, 2008 at 7:54 AM, Barry Warsaw wrote: > We had a lot of discussion recently about changing the release schedule and > splitting Python 2.6 and 3.0. 
There was general consensus that this was a
> good idea, in order to hit our October 1 deadline for Python 2.6 final at
> least.

Does it matter to anyone besides you, the Python developers, whether the schedule slips by two weeks, or two months, for that matter?

I am underwhelmed by 3.0 b3: sax and 2to3 are/were broken. A b4 release for 3.0 (at least) would seem more prudent.

Edward

From barry at python.org Fri Sep 12 15:20:46 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 12 Sep 2008 09:20:46 -0400
Subject: [Python-3000] Updated release schedule for 2.6 and 3.0
In-Reply-To:
References: <93DF7138-3E99-44E3-AF7D-01D89D13E910@python.org>
Message-ID: <364A23D0-DC52-4C1D-B853-1D8A30C6C928@python.org>

On Sep 12, 2008, at 9:19 AM, Edward K. Ream wrote:

> On Fri, Sep 12, 2008 at 7:54 AM, Barry Warsaw wrote:
>
>> We had a lot of discussion recently about changing the release schedule and
>> splitting Python 2.6 and 3.0. There was general consensus that this was a
>> good idea, in order to hit our October 1 deadline for Python 2.6 final at
>> least.
>
> Does it matter to anyone besides you, the Python developers,
> whether the schedule slips by two weeks, or two months, for that
> matter?

For Python 3.0? No.

> I am underwhelmed by 3.0 b3: sax and 2to3 are/were broken. A b4
> release for 3.0 (at least) would seem more prudent.

We will release no Python before its time.

-Barry

From guido at python.org Fri Sep 12 18:13:05 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Sep 2008 09:13:05 -0700
Subject: [Python-3000] How much should non-dict mappings behave like dict?
In-Reply-To: <18634.24051.300204.451209@montanaro-dyndns-org.local> References: <18634.24051.300204.451209@montanaro-dyndns-org.local> Message-ID: 2008/9/12 skip : > In issue 3783 (http://bugs.python.org/issue3783) the question was raised > about whether or not it's worthwhile making this guarantee: > > zip(d.keys(), d.values()) == d.items() > > in the face of no changes to the mapping object. At issue is whether the > SQL query should force a predictable order on the keys and values fetched > from the database or if that's just wasted CPU cycles. Making it concrete, > should these two SELECT statements force a consistent ordering on the keys > and values retrieved from the database: > > select key from dict order by key > select value from dict order by key What's the purpose of the "order by key" clauses here? Doesn't that force the return order? Perhaps you meant to leave those out? > Currently SQLite does return the keys and values in the same, predictable, > order, but doesn't guarantee that behavior (so it could change in the > future). > > While the discussion in the issue is related to this nascent dbm.sqlite > module, I think it's worth considering the more general issue of how > behavior non-dict mapping types should be required to share with the dict > type. > > In the section "Mapping Types -- dict" in the 2.5.2 library reference: > > http://docs.python.org/lib/typesmapping.html > > there is a footnote about ordering of keys and values: > > Keys and values are listed in an arbitrary order which is non-random, > varies across Python implementations, and depends on the dictionary's > history of insertions and deletions. If items(), keys(), values(), > iteritems(), iterkeys(), and itervalues() are called with no intervening > modifications to the dictionary, the lists will directly > correspond. This allows the creation of (value, key) pairs using zip(): > "pairs = zip(a.values(), a.keys())". 
The same relationship holds for the
> iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(),
> a.iterkeys())" provides the same value for pairs. Another way to create
> the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".

That last example is unnecessarily odd -- why does it put the values first?

> It's not entirely clear if this page is meant to apply just to dictionaries
> or if (to the extent possible) it should apply to all mapping types. I'm of
> the opinion it should apply more broadly. Others are not of that opinion.
> Should the documentation be more explicit about this?

I probably wrote an early version of that text, and I meant it to apply to dicts only. (In general this section is a description of the dict implementation, not of the mapping concept.)

I do want to keep this guarantee for dicts, for the following reasons: (a) it's very unlikely it will ever change in CPython (note the caveat of no changes); (b) users will write working code that subtly depends on it, without even realizing it; (c) no amount of documentation is going to get those users not to make that assumption; (d) but documenting this requirement (for dicts) is sure to draw the attention of the implementers of alternative Python versions, who will have to implement this so as not to break the implicit assumptions of users in (b).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Fri Sep 12 19:13:11 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 12:13:11 -0500
Subject: [Python-3000] How much should non-dict mappings behave like dict?
In-Reply-To:
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
Message-ID: <18634.41767.788522.651599@montanaro-dyndns-org.local>

>> select key from dict order by key
>> select value from dict order by key

Guido> What's the purpose of the "order by key" clauses here?
Doesn't Guido> that force the return order? Perhaps you meant to leave those Guido> out? It's simply to guarantee that the order of the elements of values() is the same as the order of the elements of keys(). Again, I was thinking that this property: zip(d.keys(), d.values()) == d.items() was a desirable property of mappings, not just of the CPython dict implementation. So is there a definition of what it means to be a mapping? Maybe this page in the C API doc? http://docs.python.org/api/mapping.html >From that I infer that a mapping must offer these methods: keys, values, items, __len__, __contains__, __getitem__, __setitem__ and __delitem__. No guarantee about the ordering of keys, values and items is made. Can we settle on something like this and spell it out explicitly somewhere in the 3.0 docs? Skip From guido at python.org Fri Sep 12 19:33:06 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Sep 2008 10:33:06 -0700 Subject: [Python-3000] How much should non-dict mappings behave like dict? In-Reply-To: <18634.41767.788522.651599@montanaro-dyndns-org.local> References: <18634.24051.300204.451209@montanaro-dyndns-org.local> <18634.41767.788522.651599@montanaro-dyndns-org.local> Message-ID: On Fri, Sep 12, 2008 at 10:13 AM, wrote: > > >> select key from dict order by key > >> select value from dict order by key > > Guido> What's the purpose of the "order by key" clauses here? Doesn't > Guido> that force the return order? Perhaps you meant to leave those > Guido> out? > > It's simply to guarantee that the order of the elements of values() is the > same as the order of the elements of keys(). Again, I was thinking that > this property: zip(d.keys(), d.values()) == d.items() was a desirable > property of mappings, not just of the CPython dict implementation. But in SQL this would force alphabetical ordering so of course it would both return them in corresponding order. Maybe we should just drop this, it seems hardly relevant. 
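
To make the two queries under discussion concrete, here is a minimal sketch using the stdlib sqlite3 module. The table layout (a `dict` table with `key` and `value` columns) and the sample rows are assumed for illustration and are not taken from the dbm.sqlite source.

```python
import sqlite3

# Assumed schema for illustration only; not the dbm.sqlite implementation.
conn = sqlite3.connect(":memory:")
conn.execute("create table dict (key text primary key, value text)")
conn.executemany(
    "insert into dict (key, value) values (?, ?)",
    [("pear", "3"), ("apple", "1"), ("fig", "2")],
)

# The two queries from the thread, plus the combined one for comparison.
keys = [row[0] for row in conn.execute("select key from dict order by key")]
values = [row[0] for row in conn.execute("select value from dict order by key")]
items = conn.execute("select key, value from dict order by key").fetchall()

# Both single-column queries use the same ORDER BY on a unique column,
# so the results correspond position by position.
assert list(zip(keys, values)) == items
print(items)
conn.close()
```

As noted above, ordering by key also sorts the output; the correspondence between the two result lists comes from using the same ordering clause on both queries, not from the sort order itself.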
> So is there a definition of what it means to be a mapping? Maybe this page > in the C API doc? > > http://docs.python.org/api/mapping.html > > From that I infer that a mapping must offer these methods: keys, values, > items, __len__, __contains__, __getitem__, __setitem__ and __delitem__. No > guarantee about the ordering of keys, values and items is made. Can we > settle on something like this and spell it out explicitly somewhere in the > 3.0 docs? That's a C API definition that hasn't been updated. If anything documents the concept of a mapping it would be the Mapping ABC in the collections module. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From josiah.carlson at gmail.com Fri Sep 12 19:55:13 2008 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Fri, 12 Sep 2008 10:55:13 -0700 Subject: [Python-3000] How much should non-dict mappings behave like dict? In-Reply-To: References: <18634.24051.300204.451209@montanaro-dyndns-org.local> <18634.41767.788522.651599@montanaro-dyndns-org.local> Message-ID: On Fri, Sep 12, 2008 at 10:33 AM, Guido van Rossum wrote: > On Fri, Sep 12, 2008 at 10:13 AM, wrote: >> >> >> select key from dict order by key >> >> select value from dict order by key >> >> Guido> What's the purpose of the "order by key" clauses here? Doesn't >> Guido> that force the return order? Perhaps you meant to leave those >> Guido> out? >> >> It's simply to guarantee that the order of the elements of values() is the >> same as the order of the elements of keys(). Again, I was thinking that >> this property: zip(d.keys(), d.values()) == d.items() was a desirable >> property of mappings, not just of the CPython dict implementation. > > But in SQL this would force alphabetical ordering so of course it > would both return them in corresponding order. Maybe we should just > drop this, it seems hardly relevant. If the desire is to behave like a bsddb.btree instance, alphabetical is ok. 
Replacing the 'order by key' with 'order by rowid' is reasonably sane,
if alphabetical is explicitly undesired. Really there are 3 options:
alphabetical ordering, rowid ordering, no guaranteed ordering (which
seems to be on-disk or rowid ordering, my brief tests tell me
nothing).

>> So is there a definition of what it means to be a mapping? Maybe this page
>> in the C API doc?
>>
>> http://docs.python.org/api/mapping.html
>>
>> From that I infer that a mapping must offer these methods: keys, values,
>> items, __len__, __contains__, __getitem__, __setitem__ and __delitem__. No
>> guarantee about the ordering of keys, values and items is made. Can we
>> settle on something like this and spell it out explicitly somewhere in the
>> 3.0 docs?
>
> That's a C API definition that hasn't been updated. If anything
> documents the concept of a mapping it would be the Mapping ABC in the
> collections module.

According to PEP 3119 on the mapping ABC: "i.e. iterating over the
items, keys and values should return results in the same order."

So... key ordered, or rowid ordered?

 - Josiah

From skip at pobox.com Fri Sep 12 20:02:44 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 12 Sep 2008 13:02:44 -0500
Subject: [Python-3000] How much should non-dict mappings behave like dict?
In-Reply-To:
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
 <18634.41767.788522.651599@montanaro-dyndns-org.local>
Message-ID: <18634.44740.453950.907065@montanaro-dyndns-org.local>

    Guido> What's the purpose of the "order by key" clauses here? Doesn't
    Guido> that force the return order? Perhaps you meant to leave those
    Guido> out?
    >>>
    >>> It's simply to guarantee that the order of the elements of values()
    >>> is the same as the order of the elements of keys(). Again, I was
    >>> thinking that this property: zip(d.keys(), d.values()) == d.items()
    >>> was a desirable property of mappings, not just of the CPython dict
    >>> implementation.
    >>
    >> But in SQL this would force alphabetical ordering so of course it
    >> would both return them in corresponding order. Maybe we should just
    >> drop this, it seems hardly relevant.

    Josiah> If the desire is to behave like a bsddb.btree instance,
    Josiah> alphabetical is ok. Replacing the 'order by key' with 'order by
    Josiah> rowid' is reasonably sane, if alphabetical is explicitly
    Josiah> undesired. Really there are 3 options: alphabetical ordering,
    Josiah> rowid ordering, no guaranteed ordering (which seems to be
    Josiah> on-disk or rowid ordering, my brief tests tell me nothing).

Folks, this is my last comment on this particular issue. I think everybody
misunderstands what I was getting at here. All I wanted to do was guarantee
that keys and values were returned in the same order if called with no
intervening updates. Ordering both statements by the keys seemed to be the
easiest way to accomplish that. I could have cared less if the result was
sorted, just that it was predictable. Gerhard suggested that if predictable
ordering was desired that "order by rowid" would be better.

    >> That's a C API definition that hasn't been updated. If anything
    >> documents the concept of a mapping it would be the Mapping ABC in the
    >> collections module.

    Josiah> According to PEP 3119 on the mapping ABC: "i.e. iterating over
    Josiah> the items, keys and values should return results in the same
    Josiah> order."

    Josiah> So... key ordered, or rowid ordered?

I could care less, just so it's predictable. Ordering by rowid is probably
more efficient.

Skip

From solipsis at pitrou.net Fri Sep 12 20:10:51 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Sep 2008 18:10:51 +0000 (UTC)
Subject: [Python-3000] How much should non-dict mappings behave like dict?
References: <18634.24051.300204.451209@montanaro-dyndns-org.local>
 <18634.41767.788522.651599@montanaro-dyndns-org.local>
 <18634.44740.453950.907065@montanaro-dyndns-org.local>
Message-ID:

> Gerhard suggested that if predictable
> ordering was desired that "order by rowid" would be better.

I personally don't understand what predictability brings (using a disk
backend implies that you should minimize queries, so using keys() then
values() is inefficient compared to using items() anyway), but this will
be my last comment on the issue as well :)

Regards

Antoine.

From greg at krypto.org Sun Sep 14 02:07:46 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 13 Sep 2008 17:07:46 -0700
Subject: [Python-3000] [Python-Dev] Proposed revised schedule
In-Reply-To: <3DFD4AAC-D8EA-46E6-BC56-C713861C02B7@python.org>
References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
 <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
 <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
 <3DFD4AAC-D8EA-46E6-BC56-C713861C02B7@python.org>
Message-ID: <52dc1c820809131707o18e75f6an24a6f3cca1dd084@mail.gmail.com>

On Tue, Sep 9, 2008 at 6:23 AM, Barry Warsaw wrote:
>
> That seems risky to me. First, it's a new feature. Second, it will be
> largely untested code. I would much rather see dbm.sqlite released as a
> separate package for possible integration into the core for 3.1.
>
> - -Barry
>

+1

From andreaskalsch at gmx.de Sun Sep 14 11:57:50 2008
From: andreaskalsch at gmx.de (Andreas Kalsch)
Date: Sun, 14 Sep 2008 11:57:50 +0200
Subject: [Python-3000] Hi
Message-ID: <20080914095750.223700@gmx.net>

Hi,

I am a newbie and I have some questions:

- Is there a searchable archive of this (and any other) mailing lists?
http://mail.python.org/pipermail/python-3000/ doesn't let me search the
archive, so I don't know, if I ask a question which has already been
answered.

- Where can I find replacements for removed Python 2.x modules? E.g.
I want to use the commands module (to execute shell commands in Python).
Where do I find it in Python 3000?

Andi

From qgallet at gmail.com Sun Sep 14 12:21:05 2008
From: qgallet at gmail.com (Quentin Gallet-Gilles)
Date: Sun, 14 Sep 2008 12:21:05 +0200
Subject: [Python-3000] Hi
In-Reply-To: <20080914095750.223700@gmx.net>
References: <20080914095750.223700@gmx.net>
Message-ID: <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>

Hi Andreas,

- There are alternatives to the mailman interface. Gmane, for instance, is
searchable: http://news.gmane.org/gmane.comp.python.python%2d3000.devel

- I suggest you take a look at the 2.6 library reference
(http://docs.python.org/dev/library/index.html). For the "commands" module,
you'll see the following warning:
"In 3.x, getstatus() and two undocumented functions (mk2arg() and mkarg())
have been removed. Also, getstatusoutput() and getoutput() have been moved
to the subprocess module."
Also, if you follow the link to the subprocess module documentation, you'll
see many examples on how to do what you want.

By the way, those questions are best answered on comp.lang.python. This list
is about the core development of Python3000 exclusively.

Cheers,
Quentin

On Sun, Sep 14, 2008 at 11:57 AM, Andreas Kalsch wrote:

> Hi,
>
> I am a newbie and I have some questions:
>
> - Is there a searchable archive of this (and any other) mailing lists?
> http://mail.python.org/pipermail/python-3000/ doesn't let me search the
> archive, so I don't know, if I ask a question which has already been
> answered.
>
> - Where can I find replacements for removed Python 2.x modules? E.g. I want
> to use the commands module (to execute shell commands in Python). Where do I
> find it in Python 3000?
>
> Andi

From andreaskalsch at gmx.de Sun Sep 14 12:59:52 2008
From: andreaskalsch at gmx.de (Andreas Kalsch)
Date: Sun, 14 Sep 2008 12:59:52 +0200
Subject: [Python-3000] Hi
In-Reply-To: <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
References: <20080914095750.223700@gmx.net>
 <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
Message-ID: <20080914105952.18780@gmx.net>

> Hi Andreas,
>
> - There are alternatives to the mailman interface. Gmane, for instance, is
> searchable : http://news.gmane.org/gmane.comp.python.python%2d3000.devel

This is what I was searching for, thanks!

> - I suggest you take a look at the 2.6 library reference (
> http://docs.python.org/dev/library/index.html). For the "commands" module,
> you'll see the following warning :
> "In 3.x, getstatus() and two undocumented functions (mk2arg() and mkarg())
> have been removed. Also, getstatusoutput() and getoutput() have been moved
> to the subprocess module."
> Also, if you follow the link to the subprocess module documentation, you'll
> see many examples on how to do what you want.

Yes, I have found that there is now the subprocess module.

> By the way, those questions are best answered on comp.lang.python. This list
> is about the core development of Python3000 exclusively.
>
> Cheers,
> Quentin

Thank you for your answers!

> On Sun, Sep 14, 2008 at 11:57 AM, Andreas Kalsch wrote:
>
> > Hi,
> >
> > I am a newbie and I have some questions:
> >
> > - Is there a searchable archive of this (and any other) mailing lists?
> > http://mail.python.org/pipermail/python-3000/ doesn't let me search the
> > archive, so I don't know, if I ask a question which has already been
> > answered.
> >
> > - Where can I find replacements for removed Python 2.x modules? E.g. I want
> > to use the commands module (to execute shell commands in Python). Where do I
> > find it in Python 3000?
> >
> > Andi

From ncoghlan at gmail.com Sun Sep 14 14:06:59 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 14 Sep 2008 22:06:59 +1000
Subject: [Python-3000] Hi
In-Reply-To: <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
References: <20080914095750.223700@gmx.net>
 <8b943f2b0809140321p1bc9ab22wbd85c6da05b31c1e@mail.gmail.com>
Message-ID: <48CCFE63.6050109@gmail.com>

Quentin Gallet-Gilles wrote:
> Hi Andreas,
>
> - There are alternatives to the mailman interface. Gmane, for instance,
> is searchable : http://news.gmane.org/gmane.comp.python.python%2d3000.devel

For what it's worth, I personally just use Google on the mailman
archives via a couple of keyword bookmarks in Firefox. (e.g. including
"site:mail.python.org inurl:python-dev" in a Google search will search
the Mailman archives for python-dev, and I can trigger such a search by
typing "pydev " in the address bar).

Cheers,
Nick.
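
For the question earlier in this thread about replacing the commands module, a minimal sketch of the two functions that Quentin's quoted warning says were moved to subprocess. The "echo hello" command is only an illustrative assumption; any command available to the platform shell would do.

```python
import subprocess

# getstatusoutput() runs the command through the shell and returns a tuple
# (exit_status, output), with one trailing newline stripped from the output.
status, output = subprocess.getstatusoutput("echo hello")
print(status, output)

# getoutput() returns only the captured output and discards the exit status.
text = subprocess.getoutput("echo hello")
print(text)
```

Both helpers pass the string to the shell, so they should not be given untrusted input; subprocess.run() with an argument list is the more general tool.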
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From jcea at jcea.es Wed Sep 17 20:27:21 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 17 Sep 2008 20:27:21 +0200
Subject: [Python-3000] [Python-Dev] dbm.sqlite
In-Reply-To:
References: <9ADBEA78-8FB6-4C92-8140-CF74643A76B3@python.org>
 <1afaf6160809081613w266930efx75e9a6f5886b98e5@mail.gmail.com>
 <3B14584AAB7147C6892032FE3FAE3755@RaymondLaptop1>
 <18629.56199.90786.234922@montanaro-dyndns-org.local>
 <48C64DED.3090103@gmail.com>
Message-ID: <48D14C09.10608@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Antoine Pitrou wrote:
> I agree about performance but I don't think it's right to say we can fix
> stability later. This is a storage module, and people risk losing their data if
> there are glaring bugs. If we really want an efficient dbm-compatible storage
> backend for all platforms on 3.0, then why not bite the bullet and re-add bsddb?
> Even though it has its quirks, it's certainly much more tested than a
> hypothetical dbm.sqlite whipped up in a few days and used by nobody in the wild.

Of course I'm +1 to re-adding bsddb, all the more so with 3.0 slipping
past the original October 1st release date. But note that Guido himself
would "rather prefer" to drop bsddb in 3.0.

I have a conflict of interest when it comes to the sqlite dbm module in
3.0, so I would rather not vote on that issue.

- --
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/
.
_/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSNFMA5lgi5GaxT1NAQI9ywP/U4g7PjtMp5Uae0NMxByCJsbFgJPXkbMx
S8xi31YqUx9j3hc/3vFjYH2+Ywf1WPTDfUN3LLhf0oVBEbwJl9QQKyua0e2AesBY
g6qQ0meZdpRHm0WzHByI5/aMkxAnwEoHILveMubnQRr1KpTexGHEa6mXv5aVwkJm
6KIqS3tG0kk=
=XMnZ
-----END PGP SIGNATURE-----

From barry at python.org Thu Sep 18 07:40:18 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 18 Sep 2008 01:40:18 -0400
Subject: [Python-3000] RELEASED Python 2.6rc2 and 3.0rc1
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On behalf of the Python development team and the Python community, I
am happy to announce the second and final planned release candidate
for Python 2.6, as well as the first release candidate for Python 3.0.

These are release candidates, so while they are not suitable for
production environments, we strongly encourage you to download and
test them on your software. We expect only critical bugs to be fixed
between now and the final releases. Currently Python 2.6 is scheduled
for October 1st, 2008. Python 3.0 release candidate 2 is planned for
October 1st, with the final release planned for October 15, 2008.
If you find things broken or incorrect, please submit bug reports at

    http://bugs.python.org

For more information and downloadable distributions, see the Python
2.6 website:

    http://www.python.org/download/releases/2.6/

and the Python 3.0 web site:

    http://www.python.org/download/releases/3.0/

See PEP 361 for release schedule details:

    http://www.python.org/dev/peps/pep-0361/

Enjoy,
- -Barry

Barry Warsaw
barry at python.org
Python 2.6/3.0 Release Manager
(on behalf of the entire python-dev team)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSNHpw3EjvBPtnXfVAQLW9wP/RBCaUvhuheIh+BjLLIHQFBQi7D3uVgqi
l0+4fhhoKGJvtWklLfSM9I1prcjH/d6tzUu4fIOjX7aM+wZSG++vkfmBoehnhyZW
AvU9Lax4mqDwhOJA2QA0WMx0obpYYVHeUl7D1g9kWzbRUkZDX9NZGMWThhEOC1qA
UA3bBYbvWiQ=
=BFNH
-----END PGP SIGNATURE-----

From gregor.lingl at aon.at Mon Sep 22 00:15:36 2008
From: gregor.lingl at aon.at (Gregor Lingl)
Date: Mon, 22 Sep 2008 00:15:36 +0200
Subject: [Python-3000] turtle.Screen.__init__ issue
Message-ID: <48D6C788.2050400@aon.at>

Hello there,

it's high time to resolve an issue which I have already raised twice
some weeks ago. (You can find a more elaborate description in my
former posting cited below.)

There is a tiny difference (also in behaviour!) in
turtle.Screen.__init__() between the versions for 2.6 and 3.0. The
difference results from the fact that I submitted the 3.0 version
approx. a week later, after having ported it to 3.0. In this process I
had found what I now consider to be a bug in 2.6 and changed it
accordingly.
In short: if you already have a Screen object containing some turtles
and some graphics,

in 2.6:

    s = Screen()

returns an object with identical state and behaviour, but clears
(re-initializes) the screen and thus destroys the content;

in 3.0:

    s = Screen()

returns an object with identical state and behaviour, but leaves the
content untouched.

The difference in code consists only in indenting the call of the
__init__ method of the parent class, so that it is executed only
conditionally.

Anyway, as this difference between the two versions is highly
undesirable, there are (imho) three options to proceed:

(1) correct 2.6 so that it works like 3.0
(2) undo the change in 3.0 so that it works like 2.6
(3) find a different solution for both

I would (like Vern, see below) decisively prefer option (1), and I
suppose that there is not enough time left to choose option (3), as
this would probably need some discussion.

What is your opinion, and who should decide?

For your convenience I've attached a diff file which also contains the
description of three other small bugs, which I've found in the meantime
and which shouldn't cause any controversy.

Regards,
Gregor

%%%%%%%%%%

Here follows the answer of Vern Ceder - a long-term turtle graphics user
and author of several patches for the old turtle module - to my former
posting:

>> Gregor,
>>
>> I don't feel authoritative on the correctness/appropriateness of the implementation,
>> but I do agree completely that behavior b, or what you have in the 3.0 version,
>> is vastly preferable.
>>
>> Cheers,
>> Vern

-------- Original Message --------
Subject: [Python-Dev] turtle.Screen- how to implement best a Singleton
Date: Mon, 18 Aug 2008 10:15:45 +0200
From: Gregor Lingl
To: python-dev at python.org
CC: Toby Donaldson , python-3000 at python.org, jjposner at snet.net,
 Brad Miller , Vern Ceder

Hi,

this posting - concerning the new turtle module - goes to the
Python-Dev and Python-3000 lists and to a couple of 'power users' of
turtle graphics, hoping to receive feedback from the developer's point
of view as well as from the user's point of view.

Currently the implementations of the turtle.Screen class for Python 2.6
and Python 3.0 differ by a 'tiny' detail with an important difference in
behaviour. So clearly this has to be resolved before the final release.
(The origin of this difference is that when I ported turtle.py to
Python 3.0 I discovered (and 'fixed') what I now consider to be a bug in
the 2.6 version.) I'd like to ask you kindly for your advice to achieve
an optimal solution.

The posting consists of three parts:
1. Exposition of design goals
2. Problem with the implementation
3. How to solve it?

Preliminary remark: I've had some discussions on this topic before but
I still do not see a clear solution. Moreover I'm well aware of the fact
that using the Singleton pattern is controversial. So ...

1. Exposition of design goals
... why use the Singleton design pattern?

The turtle module contains a TurtleScreen class, which implements
methods to control the drawing area the turtle is (turtles are) drawing
on. Its constructor needs a Tkinter Canvas as argument. In order to
avoid the need for users to tinker around with Tkinter stuff there is
the Screen(TurtleScreen) class, designed to be used by beginners
(students, kids, ...), particularly in interactive sessions.

A (THE (!)) Screen object is essentially a window containing a scrolled
canvas, the TurtleScreen. So it's a resource which should exist only once.
It can be constructed in several ways:

- implicitly by calling an arbitrary function derived from a
  Turtle method, such as forward(100), or by constructing a Turtle,
  such as bob = Turtle()
- implicitly by calling an arbitrary function derived from a
  Screen method, such as bgcolor("red")
- explicitly by calling its constructor, such as s = Screen()

Anyway this construction should only happen if a Screen object
doesn't exist yet.

Now for the pending question: what should happen when s = Screen()
is called explicitly and 'the' Screen object already exists?

(i) Clearly s should get a reference to the existing Screen object,
but ...
(ii) (a) ... should s be reinitialized (this is the case now in
         Python 2.6), or
     (b) ... should s be left untouched (this is the case now in
         Python 3.0)

I, for my part, prefer the latter solution (b). Example: a student,
having (interactively) produced some design using some turtle
t = Turtle(), decides spontaneously to change the background color.
s = Screen(); s.bgcolor("pink") should do this for her - instead of
deleting her design and moreover her turtle. To reinitialize the screen
she can still use s.clear().

Of course, there are workarounds to achieve the same effect also with
solution (a), for instance by assigning s = Screen() *before* drawing
anything or by assigning s = t.getscreen(). But imho (which derives
from my experience as a teacher) solution (b) better supports the
OOP view as well as experimenting spontaneously in interactive sessions.

2. Problem with the implementation

The task is to derive a Singleton class from a non-Singleton class
(Screen from TurtleScreen). The current implementations of the Screen
'Singleton' both use the Borg idiom.
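
For readers unfamiliar with it, the Borg idiom named above can be sketched roughly as follows. This is a generic illustration, not the actual turtle.Screen code; the Canvas class and its drawings attribute are hypothetical stand-ins, and the guard in Canvas.__init__ models behaviour (b), where a second construction leaves existing content alone.

```python
class Borg:
    """Instances are distinct objects but share one attribute dictionary."""

    _shared_state = {}

    def __init__(self):
        # Point every instance's __dict__ at the same class-level dict.
        self.__dict__ = self._shared_state


class Canvas(Borg):
    """Hypothetical stand-in for a screen-like resource."""

    def __init__(self):
        super().__init__()
        # Behaviour (b): only set defaults on first construction, so a
        # later Canvas() call does not wipe existing content.
        if "drawings" not in self.__dict__:
            self.drawings = []


first = Canvas()
first.drawings.append("spiral")
second = Canvas()
print(first is second, second.drawings)
```

Dropping the guard and assigning self.drawings = [] unconditionally would give behaviour (a): every Canvas() call would reset the shared state.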
Just for *explaining* the difference between the two versions of
class Screen here concisely, I'll use a 'standard' Singleton pattern
(roughly equivalent to the Borg idiom):

    class Spam(object):
        def __init__(self, s):
            self.s = s

    class SingleSpam(Spam):
        _inst = None
        def __new__(cls, *args, **kwargs):
            if cls != type(cls._inst):
                # object.__new__ accepts no extra arguments, so pass only cls
                cls._inst = Spam.__new__(cls)
            return cls._inst
        def __init__(self, s):
            if vars(self): return   ###### should this be here???
            Spam.__init__(self, s)

In short, this means that SingleSpam.__init__() acts like an empty
method whenever a (the!) SingleSpam object already exists. The 3.0
version of Screen acts like this. By contrast, the 2.6 version of
Screen acts as if the second-to-last line were not there and thus
reinitializes the Screen object.

3. How to solve it?

Main question: which *behaviour* of the Screen class should be
preferred? If 3.0, is it feasible and correct not to call the
constructor of the parent class if the object already exists?

Additional question: do you consider the Borg idiom a good solution
for this task, or should the standard singleton pattern as shown above
be preferred? Or would you suggest a solution/an approach different
from both?

Thanks for your patience, and - in advance - for your assistance

Regards,
Gregor

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: turtle_patch_rc2-diff.txt
URL:

From Graham.Dumpleton at gmail.com Fri Sep 19 13:37:09 2008
From: Graham.Dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 19 Sep 2008 04:37:09 -0700 (PDT)
Subject: [Python-3000] PySys_SetObject() crashes in secondary sub interpreters.
Message-ID: <8d669f5a-ff02-482c-9fa4-a1fb303ba0d8@o40g2000prn.googlegroups.com>

For early Python 3.0 alpha versions I had mod_wsgi working with no
problems. When I tried b3, it broke for secondary sub interpreters. In
particular, a call to PySys_SetObject() crashes.

From what I can tell so far, the problem is that 'interp->sysdict' is
NULL after calling Py_NewInterpreter() to create a secondary sub
interpreter.

Reading through the code and using a debugger, at this point this seems
to be due to the condition in this code:

    sysmod = _PyImport_FindExtension("sys", "sys");
    if (bimod != NULL && sysmod != NULL) {
        interp->sysdict = PyModule_GetDict(sysmod);
        if (interp->sysdict == NULL)
            goto handle_error;
        Py_INCREF(interp->sysdict);
        PySys_SetPath(Py_GetPath());
        PyDict_SetItemString(interp->sysdict, "modules",
                             interp->modules);
        _PyImportHooks_Init();
        initmain();
        if (!Py_NoSiteFlag)
            initsite();
    }

in Py_NewInterpreter() not executing, because
_PyImport_FindExtension("sys", "sys") returns NULL.

Down in _PyImport_FindExtension(), it appears that the reason it fails
is that the following returns NULL:

    def = (PyModuleDef*)PyDict_GetItemString(extensions, filename);
    .....
    if (def->m_base.m_init == NULL)
        return NULL;

In other words, whatever m_base.m_init is meant to be is NULL when
perhaps it isn't meant to be.

    (gdb) call ((PyModuleDef*)PyDict_GetItemString(extensions,"builtins"))->m_base.m_init
    $9 = (PyObject *(*)()) 0

    (gdb) call ((PyModuleDef*)PyDict_GetItemString(extensions,"sys"))->m_base.m_init
    $10 = (PyObject *(*)()) 0

I am going to keep tracking through to try and work out why, but
posting this initial information in case this rings a bell with
anyone.

I'll also try creating a small test outside of mod_wsgi which creates
a secondary interpreter and calls PySys_SetObject() to see if it
crashes. This should show if there is an underlying problem, or
something to do with how mod_wsgi uses interpreter creation code.

Thanks in advance for any feedback.
Graham

From krstic at solarsail.hcs.harvard.edu Fri Sep 26 02:15:22 2008
From: krstic at solarsail.hcs.harvard.edu (Ivan Krstić)
Date: Thu, 25 Sep 2008 20:15:22 -0400
Subject: [Python-3000] PyCon 2009 - Call for proposals
Message-ID:

Hi folks,

PyCon '09 will be opening for talk proposals shortly; see below. We'd
love to have some great talks on Python 3000, so please don't be shy!

Cheers,

Ivan Krstić
Chair, PyCon 2009 Program Committee

* * *

Call for proposals -- PyCon 2009 --
===============================================================

Want to share your experience and expertise? PyCon 2009 is looking for
proposals to fill the formal presentation tracks. The PyCon conference
days will be March 27-29, 2009 in Chicago, Illinois, preceded by the
tutorial days (March 25-26), and followed by four days of development
sprints (March 30-April 2).

Previous PyCon conferences have had a broad range of presentations,
from reports on academic and commercial projects to tutorials and case
studies. We hope to continue that tradition this year.

Online proposal submission will open on September 29, 2008. Proposals
will be accepted through November 03, with acceptance notifications
coming out on December 15. For the detailed call for proposals, please
see:

We look forward to seeing you in Chicago!

--
Ivan Krstić | http://radian.org

From solipsis at pitrou.net Fri Sep 26 12:08:49 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 26 Sep 2008 10:08:49 +0000 (UTC)
Subject: [Python-3000] PyUnicodeObject implementation
References: <200809051954.42787.jeremy.kloth@gmail.com>
 <48C4464E.5010707@gmail.com>
 <48C642B5.2020109@egenix.com>
Message-ID:

So what would be the outcome of this discussion, and should a decision (and
which one) be taken?

Regards

Antoine.

From mal at egenix.com Fri Sep 26 16:42:59 2008
From: mal at egenix.com (M.-A.
Lemburg)
Date: Fri, 26 Sep 2008 16:42:59 +0200
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To:
References: <200809051954.42787.jeremy.kloth@gmail.com>
 <48C4464E.5010707@gmail.com>
 <48C642B5.2020109@egenix.com>
Message-ID: <48DCF4F3.3050803@egenix.com>

On 2008-09-26 12:08, Antoine Pitrou wrote:
> So what would be the outcome of this discussion, and should a decision (and
> which one) be taken?

I'm still -1 on changing Unicode objects to PyVarObjects for the
reasons already stated in various postings on this thread and on
the ticket.

I'd much rather like to see the parameters of the implementation
optimized (both in the Unicode implementation and pymalloc). See the
ticket discussion for details.

Regards,
--
Marc-Andre Lemburg
eGenix.com

From guido at python.org Fri Sep 26 19:00:09 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Sep 2008 10:00:09 -0700
Subject: [Python-3000] PyUnicodeObject implementation
In-Reply-To: <48DCF4F3.3050803@egenix.com>
References: <200809051954.42787.jeremy.kloth@gmail.com>
 <48C4464E.5010707@gmail.com>
 <48C642B5.2020109@egenix.com>
 <48DCF4F3.3050803@egenix.com>
Message-ID:

On Fri, Sep 26, 2008 at 7:42 AM, M.-A. Lemburg wrote:
> On 2008-09-26 12:08, Antoine Pitrou wrote:
>> So what would be the outcome of this discussion, and should a decision (and
>> which one) be taken?
>
> I'm still -1 on changing Unicode objects to PyVarObjects for the
> reasons already stated in various postings on this thread and on
> the ticket.

I still find those reasons rather weak; the old 8-bit string object
was pretty darn successful despite being a PyVarObject.

> I'd much rather like to see the parameters of the implementation
> optimized (both in the Unicode implementation and pymalloc). See the
> ticket discussion for details.

I think the only way to decide is to have an alternative
implementation ready and prove that it is faster. Or maybe it isn't,
which would also decide the case. In order to allow for a fair race,
if the new implementation comes up with a neat speed-up trick that
could also be applied to the old implementation, it should be removed
or applied to both implementations. That is, I don't want a black-box
race -- I want to see it proven that for a realistic app the choice to
use a PyVarObject actually makes a difference.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org Sat Sep 27 00:24:37 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 26 Sep 2008 18:24:37 -0400
Subject: [Python-3000] Reminder: Python 2.6 final next Wednesday
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is a reminder that Python 2.6 final is scheduled for release next
Wednesday, October 1st.

Once again, I've gone through the release blocker issues and knocked
anything that doesn't specifically affect 2.6 to deferred blocker.
This leaves us with 7 open blocking issues.

Please spend some time over the next several days reviewing patches,
making comments and working toward closing these issues. Email me
directly if you have any questions, or ping me on irc. Unfortunately,
my 'net connection may be a little bit flakey until Wednesday, but I
will do my best to get online and follow up as needed.

Please pitch in to help get Python 2.6 released on time!
Thanks,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSN1hJnEjvBPtnXfVAQIQfQQAgYr0tuzJhm3LZX/1SaCwRJcX09UvNH1I
CjZHs2TKS22MjF9d3mmBgcEJPl9AwGE+6EF6OiSgrsNRoRtnN0MMT3nQo+deRkan
P3jUgMFJMFkA7Uq5MmuNnEnKZXa/bsu/8Om/4wqHqvtDXbUQkZPfyE8BFwBzJJSM
Aa6Wp3wieFs=
=r4iD
-----END PGP SIGNATURE-----

From giles at spacepigs.com Sat Sep 27 22:16:32 2008
From: giles at spacepigs.com (Giles Constant)
Date: Sat, 27 Sep 2008 21:16:32 +0100 (BST)
Subject: [Python-3000] Alternative to standard regular expressions
Message-ID: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com>

Unsure if this is the right place to voice my thoughts on such a thing,
but given the idealism of Python (particularly as an antithesis to many
of the ideas of Perl), after trying to fix a broken Perl script late at
night, it occurred to me that regular expressions are somewhat
un-pythonic. I actually find the Python 're' module, although more
versatile than regular expressions in Perl, something that I always have
to refer to the manual for, in spite of the number of times I've used
it. In other words, I'm tempted to stretch our beloved term "unpythonic"
to regular expressions. This is rare for a small Python module.

So I thought it's time to start something new, perhaps as a Python
module. I've googled around to see if there are any attempts at an
alternative out there, and found nothing, although some people have
written some very good articles about how regular expressions are a
problem in a number of ways:

1) They look horrible. Like line noise. Each character is a functional
unit, meaning something that would take a paragraph to describe is
reduced to a small number of characters. Given that programmers tend to
spend more time thinking than typing, I don't see any advantage to this.

2) They can fail in subtle ways.
Exceptional cases can emerge where an expression which works in 99% of cases starts losing characters whose possibility was missed by the author. 3) They can very quickly become rather long (check the expression for an email address in the back of the 'Mastering Regular Expressions' O'Reilly book). 4) The use of multi-line switches and other trailing-end characters complicates things further. One of the great things about python is that its string, slice, and split/join functions mean that I rarely use regular expressions in python. In fact, I try to avoid it. But a more pythonic matching and substitution system could be a great thing. The first thing that occurred to me in trying to imagine what an easier-to-use alternative would look like is that they're the wrong way round: the functional characters - the things that actually do things - are escaped, while the match strings written in text are the default. Unless you're trying to write a '/' or '\', that is, which you have to escape (carefully, if you're writing something exposed to the internet and you don't want your server hosed by a hacker). In other words, it is the match string which should be treated as special, and the special functions which should be the norm. So, for an example first foray into this idea (I'm making this up as I go along.. I should point out!) Instead of: /\d+hello/ How about (explanation of syntax to follow): boolean = match(input, "oneormore(digit).one('hello')") I'm using a '.' to separate lexical units here. The specifying functions indicate how many times or under what circumstances the unit is matched, and within the brackets are classes representing what needs to be matched. 'digit' represents '\d' in this case, and a string is just that. Taking it a bit further: /\d{1,3}hello/ is replaced by boolean = match(input, "range(digit, (1,3)).one('hello')") Ok, so what about substitution.. 
s/.*(hello).*/$1/ result = substitute(input, "many(char)|one('hello')|many(char)", "match(0)") Instead of dots, matches which should be captured are contained between pipe symbols. I'm still having an argument with myself as to whether some sort of function/keyword should be used instead. I dunno. That's why I emailed you guys :-) I'm going to have a bigger think about this tomorrow, but I think it could be a great feature. Cheers! (and thanks for a great language), Giles From musiccomposition at gmail.com Sat Sep 27 22:52:50 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sat, 27 Sep 2008 15:52:50 -0500 Subject: [Python-3000] Alternative to standard regular expressions In-Reply-To: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com> References: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com> Message-ID: <1afaf6160809271352m3e30be10jc260fd5fb2b4d0de@mail.gmail.com> On Sat, Sep 27, 2008 at 3:16 PM, Giles Constant wrote: > unsure if this is the right place to voice my thoughts on such a thing, > but given the idealism of python (particularly as an anti-thesis to much of > the ideas of perl), after trying to fix a broken perl script late at > night, It > occurred to me that regular expressions are somewhat un-pythonic. I actually > find the python 're' module, although more versatile than regular expressions > in perl, something that I always have to refer to the manual for, in spite of > the number of times I've used it. In other words, I'm tempted to stretch our > beloved term "unpythonic" to regular expressions. This is rare for a small > python module. Try the comp.lang.python or python-ideas mailing list. This list is more devoted to the current development of Python than new ideas. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." 
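A rough sketch of how the proposal in this thread could be layered on top of the existing re module, using plain functions instead of a new pattern string. The names digit, one, oneormore, times and match are hypothetical, borrowed loosely from the message above; they are not part of re or of any library I know of. Anchoring at the start of the input (as re.match does) is also an assumption, since the proposal does not say.

```python
import re

# Hypothetical building blocks: each one is just a regex fragment (a str).
digit = r"\d"

def one(text):
    # Literal text, matched exactly once; re.escape neutralises metacharacters.
    return re.escape(text)

def oneormore(fragment):
    # One or more repetitions of the fragment.
    return "(?:%s)+" % fragment

def times(fragment, low, high):
    # Between low and high repetitions, like {low,high} in a regex.
    return "(?:%s){%d,%d}" % (fragment, low, high)

def match(text, *fragments):
    # Join the fragments in order and test them against the start of text.
    return re.match("".join(fragments), text) is not None

print(match("123hello", oneormore(digit), one("hello")))
print(match("1234hello", times(digit, 1, 3), one("hello")))
```

Because every helper returns an ordinary pattern string, the result can still be handed to re.compile() or mixed with hand-written patterns.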
From digitalxero at gmail.com Sun Sep 28 21:00:10 2008 From: digitalxero at gmail.com (Dj Gilcrease) Date: Sun, 28 Sep 2008 13:00:10 -0600 Subject: [Python-3000] Alternative to standard regular expressions In-Reply-To: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com> References: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com> Message-ID: On Sat, Sep 27, 2008 at 2:16 PM, Giles Constant wrote: > Instead of: > /\d+hello/ > > How about (explanation of syntax to follow): > > boolean = match(input, "oneormore(digit).one('hello')") Looks like you want pyparsing http://pyparsing.wikispaces.com/ From greg at krypto.org Sun Sep 28 22:34:50 2008 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 28 Sep 2008 13:34:50 -0700 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <48DE705E.6050405@v.loewis.de> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> Message-ID: <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> On 9/27/08, "Martin v. Löwis" wrote: >> I think that the problem is important because it's a regression from 2.5 to >> 2.6/3.0. Python 2.5 uses bytes filename, so it was possible to >> open/unlink "invalid" unicode strings (since it's not unicode but bytes). > > I'd like to stress that the problem is *not* a regression from 2.5 to 2.6. > > As for 3.0, I'd like to argue that the problem is a minor issue. Even > though you may run into file names that can't be decoded, that happening > really indicates some bigger problem in the management of the system > where this happens, and the proper solution (IMO) should be to change > the system (leaving open the question whether or not Python should > be also changed to work with such broken systems). > > Regards, > Martin Note: bcc python-dev,cc: python-3000 "broken" systems will always exist. Code to deal with them must be possible to write in python 3.0. 
since any given path (not just fs) can have its own encoding it makes the most sense to me to let the OS deal with the errors and not try to enforce bytes vs string encoding type at the python lib. level. -gps From martin at v.loewis.de Sun Sep 28 23:13:38 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 28 Sep 2008 23:13:38 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> Message-ID: <48DFF382.7020006@v.loewis.de> > "broken" systems will always exist. Code to deal with them must be > possible to write in python 3.0. Python 3.0 will have bugs. This might just be one of them. I can agree that Python 3.x will need to support that somehow, but perhaps not 3.0. Regards, Martin From greg.ewing at canterbury.ac.nz Mon Sep 29 00:55:09 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Sep 2008 10:55:09 +1200 Subject: [Python-3000] Alternative to standard regular expressions In-Reply-To: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com> References: <59903.81.152.141.254.1222546592.squirrel@spacepigs.com> Message-ID: <48E00B4D.9040904@canterbury.ac.nz> Giles Constant wrote: > How about (explanation of syntax to follow): > > boolean = match(input, "oneormore(digit).one('hello')") Take this a step further and use constructor functions to build the RE. from spiffy_re import one, oneormore pattern = oneormore(digit) + one('hello') match = pattern.match(input) -- Greg From greg at krypto.org Mon Sep 29 01:21:16 2008 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 28 Sep 2008 16:21:16 -0700 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? 
In-Reply-To: <48DFF382.7020006@v.loewis.de> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> Message-ID: <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> On Sun, Sep 28, 2008 at 2:13 PM, "Martin v. Löwis" wrote: >> "broken" systems will always exist. Code to deal with them must be >> possible to write in python 3.0. > > Python 3.0 will have bugs. This might just be one of them. I can agree > that Python 3.x will need to support that somehow, but perhaps not 3.0. > > Regards, > Martin Agreed. At this point I think we just need to get 3.0 out there and be willing to fix flaws like this for 3.1 or in some cases for 3.0.1. From foom at fuhm.net Mon Sep 29 06:43:55 2008 From: foom at fuhm.net (James Y Knight) Date: Mon, 29 Sep 2008 00:43:55 -0400 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> Message-ID: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> On Sep 28, 2008, at 7:21 PM, Gregory P. Smith wrote: > On Sun, Sep 28, 2008 at 2:13 PM, "Martin v. Löwis" > wrote: >>> "broken" systems will always exist. Code to deal with them must be >>> possible to write in python 3.0. >> >> Python 3.0 will have bugs. This might just be one of them. I can agree >> that Python 3.x will need to support that somehow, but perhaps not 3.0. >> >> Regards, >> Martin > > Agreed. At this point I think we just need to get 3.0 out there and > be willing to fix flaws like this for 3.1 or in some cases for 3.0.1. 
This problem sure would be "practically" solved simply by switching the way the filesystem encoding is selected. You'll note that if you want things to Just Work for a backup tool with today's Py3k, all you need to do is switch the filesystem encoding to iso-8859-1. In that encoding, every byte string has an associated unique unicode string, so there's no problem with any possible filename. With that in mind, here's my proposal: a) Whenever ASCII would be selected as a filesystem encoding, use iso-8859-1 instead. b) Whenever UTF-8 would be selected as a filesystem encoding, use UTF-8b [1] instead. It's clearly not a 100% perfect solution, but it completely solves the issue for users with the most popular filesystem encodings: ASCII, iso-8859-1, and UTF-8. IMO, that's good enough to just leave things there. But even if it's deemed not good enough, and the byte-string level file access APIs are all implemented, I *still* think doing the above is a good idea. It makes unicode string file/environment/argv access work in a huge majority of cases: a) windows always, b) Mac OS X always, c) ASCII locale always, d) ISO-8859-1 locale always, e) UTF-8 locale always, f) other locales when the filenames really are encoded in their locale. It will make users happy, and it's simple enough to implement for python 3.0. James [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be successfully round-tripped. It's not currently implemented in python core, but it's a pretty trivial encoding, and is available under the BSD license, see below. 
Background: http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html Blog post: http://bsittler.livejournal.com/10381.html Implementation for python: http://hyperreal.org/~est/libutf8b/ James From martin at v.loewis.de Mon Sep 29 07:09:11 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 29 Sep 2008 07:09:11 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> Message-ID: <48E062F7.8060501@v.loewis.de> > This problem sure would be "practically" solved simply by switching the > way the filesystemencoding is selected. Great minds think alike :-) I just proposed a similar approach in the tracker, with the following variations: - applications can explicitly set the file system encoding. If they set it to Latin-1, they can access all files on a POSIX system. - use private-use characters for unrepresentable bytes For the second item, there was the immediate objection that this gives conflicts in UTF-8, for which UTF-8b could be a good solution. Regards, Martin From rhamph at gmail.com Mon Sep 29 09:32:48 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 01:32:48 -0600 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? 
In-Reply-To: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> Message-ID: On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight wrote: > [1] UTF-8b has a similar property to 8859-1, in that all byte strings can be > successfully round-tripped. It's not currently implemented in python core, > but it's a pretty trivial encoding, and is available under the BSD license, > see below. UTF-8b doesn't work as intended. It produces an invalid unicode object (garbage surrogates) that cannot be used with external APIs or libraries that require unicode. If you don't need unicode then your code should state so explicitly, and 8859-1 is ideal there. -- Adam Olsen, aka Rhamphoryncus From solipsis at pitrou.net Mon Sep 29 13:12:43 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 29 Sep 2008 11:12:43 +0000 (UTC) Subject: [Python-3000] =?utf-8?q?=5BPython-Dev=5D_Filename_as_byte_string_?= =?utf-8?b?aW4gcHl0aG9uCTIuNiBvciAzLjA/?= References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> Message-ID: Adam Olsen gmail.com> writes: > > UTF-8b doesn't work as intended. It produces an invalid unicode > object (garbage surrogates) that cannot be used with external APIs or > libraries that require unicode. At least it works with all Python operations supported by the unicode type (methods, concatenation, etc.) without any bad surprise. That feeding it to e.g. PyGTK may give bogus results is another problem. 
> If you don't need unicode then your > code should state so explicitly, and 8859-1 is ideal there. But then you can say bye-bye to proper representation (e.g. using print()) of even valid filenames. From victor.stinner at haypocalc.com Mon Sep 29 14:07:55 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 29 Sep 2008 14:07:55 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue Message-ID: <200809291407.55291.victor.stinner@haypocalc.com> Hi, After reading the previous discussion, here is a new proposition. Python 2.x and Windows are not affected by this issue. Only Python3 on POSIX (e.g. Linux or *BSD) is affected. Some systems are broken, but Python has to be able to open/copy/move/remove files with an "invalid filename". The issue can wait for Python 3.0.1 / 3.1. Windows ------- On Windows, we might reject bytes filenames for all file operations: open(), unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) POSIX OS -------- The default behaviour should be to use unicode and raise an error if conversion to unicode fails. It should also be possible to use bytes using bytes arguments and optional arguments (for getcwd). - listdir(unicode) -> unicode and raise an error on invalid filename - listdir(bytes) -> bytes - getcwd() -> unicode - getcwd(bytes=True) -> bytes - open(): accept bytes or unicode os.path.*() should accept operations on bytes filenames, but maybe not on bytes+unicode arguments. os.path.join('directory', b'filename'): raise an error (or use *implicit* conversion to bytes)? When the user wants to display a filename on the screen, they can use: text = str(filename, fs_encoding, "replace") -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From victor.stinner at haypocalc.com Mon Sep 29 14:12:07 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 29 Sep 2008 14:12:07 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? 
In-Reply-To: <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> References: <200809271404.25654.victor.stinner@haypocalc.com> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> Message-ID: <200809291412.07840.victor.stinner@haypocalc.com> On Monday 29 September 2008 06:43:55, you wrote: > It will make users happy, and it's simple enough to implement for > python 3.0. I dislike your argument. A "quick and dirty hack" is always faster to implement than a real solution, but we may hit new issues later if we don't choose the right solution. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From victor.stinner at haypocalc.com Mon Sep 29 15:23:04 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 29 Sep 2008 15:23:04 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <200809291523.04421.victor.stinner@haypocalc.com> Patches are already available in the issue #3187 (os.listdir): On Monday 29 September 2008 14:07:55, Victor Stinner wrote: > - listdir(unicode) -> unicode and raise an error on invalid filename Need raise_decoding_errors.patch (don't clear Unicode error) > - listdir(bytes) -> bytes Always working. > - getcwd() -> unicode > - getcwd(bytes=True) -> bytes Need merge_os_getcwd_getcwdu.patch Note that the current implementation of getcwd() uses PyUnicode_FromString() to decode the directory, whereas getcwdu() uses the correct code (PyUnicode_Decode). So I merged both functions to keep only the correct version: getcwdu() => getcwd(). > - open(): accept bytes or unicode Need io_byte_filename.patch (just remove a check) > os.path.*() should accept operations on bytes filenames, but maybe not on > bytes+unicode arguments. 
os.path.join('directory', b'filename'): raise an > error (or use *implicit* conversion to bytes)? os.path.join() already rejects mixing bytes + str. But os.path.join(), glob.glob(), fnmatch.*(), etc. don't support bytes. I wrote some patches like: - glob1_bytes.patch: Fix glob.glob() to accept invalid directory name - fnmatch_bytes.patch: Patch fnmatch.filter() to accept bytes filenames But I dislike both patches since they mix bytes and str. So this part still needs some work. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From steven.bethard at gmail.com Mon Sep 29 17:16:47 2008 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 29 Sep 2008 09:16:47 -0600 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 6:07 AM, Victor Stinner wrote: > The default behaviour should be to use unicode and raise an error if > conversion to unicode fails. It should also be possible to use bytes using > bytes arguments and optional arguments (for getcwd). > > - listdir(unicode) -> unicode and raise an error on invalid filename > - listdir(bytes) -> bytes > - getcwd() -> unicode > - getcwd(bytes=True) -> bytes Please let's not introduce boolean flags like this. How about ``getcwdb`` in parallel with the old ``getcwdu``? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From victor.stinner at haypocalc.com Mon Sep 29 18:00:18 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 29 Sep 2008 18:00:18 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <200809291800.18911.victor.stinner@haypocalc.com> On Monday 29 September 2008 17:16:47, Steven Bethard wrote: > > - getcwd() -> unicode > > - getcwd(bytes=True) -> bytes > > Please let's not introduce boolean flags like this. How about > ``getcwdb`` in parallel with the old ``getcwdu``? Yeah, you're right. So I wrote a new patch: os_getcwdb.patch With my patch we get (Python3): * os.getcwd() -> unicode * os.getcwdb() -> bytes Previously in Python2 it was: * os.getcwd() -> str (bytes) * os.getcwdu() -> unicode -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From foom at fuhm.net Mon Sep 29 18:16:32 2008 From: foom at fuhm.net (James Y Knight) Date: Mon, 29 Sep 2008 12:16:32 -0400 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> Message-ID: <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote: > On Sun, Sep 28, 2008 at 10:43 PM, James Y Knight > wrote: >> [1] UTF-8b has a similar property to 8859-1, in that all byte >> strings can be >> successfully round-tripped. It's not currently implemented in >> python core, >> but it's a pretty trivial encoding, and is available under the BSD >> license, >> see below. > > UTF-8b doesn't work as intended. 
It produces an invalid unicode > object (garbage surrogates) that cannot be used with external APIs or > libraries that require unicode. I'd be interested to hear more detail on what you expect the practical ramifications of this to be. It doesn't sound likely to be a problem to me. > If you don't need unicode then your > code should state so explicitly, and 8859-1 is ideal there. But, I *do* want unicode. ALL my filenames are encoded in utf8. Except...that one over there. That's the whole point of UTF-8b: correctly encoded names get decoded correctly and readably, and the other cases get decoded into something unique that cannot possibly conflict. James From g.brandl at gmx.net Mon Sep 29 18:45:28 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 29 Sep 2008 18:45:28 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: Victor Stinner schrieb: > POSIX OS > -------- > > The default behaviour should be to use unicode and raise an error if > conversion to unicode fails. It should also be possible to use bytes using > bytes arguments and optional arguments (for getcwd). > > - listdir(unicode) -> unicode and raise an error on invalid filename > - listdir(bytes) -> bytes > - getcwd() -> unicode > - getcwd(bytes=True) -> bytes > - open(): accept bytes or unicode > > os.path.*() should accept operations on bytes filenames, but maybe not on > bytes+unicode arguments. os.path.join('directory', b'filename'): raise an > error (or use *implicit* conversion to bytes)? This approach (changing all path-handling functions to accept either bytes or string, but not both) is doomed in my eyes. First, there are lots of them, second, they are not only in os.path but in many modules and also in user code, and third, I see no clean way of implementing them in the specified way. 
(Just try to do it with os.path.join as an example; I couldn't find the good way to write it, only the bad and the ugly...) If I had to choose, I'd still argue for the modified UTF-8 as filesystem encoding (if it were UTF-8 otherwise), despite possible surprises when a such-encoded filename escapes from Python. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Mon Sep 29 19:06:01 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Sep 2008 10:06:01 -0700 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: > Victor Stinner schrieb: (Thanks Victor for moving this to the list. Having a discussion in the tracker is really painful, I find.) >> POSIX OS >> -------- >> >> The default behaviour should be to use unicode and raise an error if >> conversion to unicode fails. It should also be possible to use bytes using >> bytes arguments and optional arguments (for getcwd). >> >> - listdir(unicode) -> unicode and raise an error on invalid filename I know I keep flipflopping on this one, but the more I think about it the more I believe it is better to drop those names than to raise an exception. Otherwise a "naive" program that happens to use os.listdir() can be rendered completely useless by a single non-UTF-8 filename. Consider the use of os.listdir() by the glob module. If I am globbing for *.py, why should the presence of a file named b'\xff' cause it to fail? Robust programs using os.listdir() should use the bytes->bytes version. 
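A small sketch of the "bytes in, bytes out" idea from the paragraph above, assuming a Python 3 where os.listdir() accepts a bytes path and os.fsencode() is available. The temporary directory and the file name example.py are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one ordinary file so both listings have something to show.
    open(os.path.join(tmp, "example.py"), "w").close()

    # str path in, str names out: names must be decodable.
    names_str = os.listdir(tmp)

    # bytes path in, bytes names out: no decoding step that could fail.
    names_bytes = os.listdir(os.fsencode(tmp))

    print(names_str)
    print(names_bytes)
```

A tool that must see every directory entry (a backup program, say) would stay on the bytes side and only decode names when displaying them.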
>> - listdir(bytes) -> bytes >> - getcwd() -> unicode >> - getcwd(bytes=True) -> bytes >> - open(): accept bytes or unicode >> >> os.path.*() should accept operations on bytes filenames, but maybe not on >> bytes+unicode arguments. os.path.join('directory', b'filename'): raise an >> error (or use *implicit* conversion to bytes)? (Yeah, it should be all bytes or all strings.) On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl wrote: > This approach (changing all path-handling functions to accept either bytes > or string, but not both) is doomed in my eyes. First, there are lots of them, > second, they are not only in os.path but in many modules and also in user > code, and third, I see no clean way of implementing them in the specified way. > (Just try to do it with os.path.join as an example; I couldn't find the > good way to write it, only the bad and the ugly...) It doesn't have to be supported for all operations -- just enough to be able to access all the system calls and do the most basic pathname manipulations (split and join -- almost everything else can be built out of those). > If I had to choose, I'd still argue for the modified UTF-8 as filesystem > encoding (if it were UTF-8 otherwise), despite possible surprises when a > such-encoded filename escapes from Python. I'm having a hard time finding info about UTF-8b. Does anyone have a decent link? I noticed that OSX has a different approach yet. I believe it insists on valid UTF-8 filenames. It may even require some normalization but I don't know if the kernel enforces this. I tried to create a file named b'\xff' and it came out as %ff. Then "rm %ff" worked. So I think it may be replacing all bad UTF8 sequences with their % encoding. The "set filesystem encoding to be Latin-1" approach has a certain charm as well, but clearly would be a mistake on OSX, and probably on other systems too (whenever the user doesn't think in Latin-1). 
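To make the Latin-1 trade-off concrete, here is a minimal sketch: every possible byte value survives a decode/encode round trip through Latin-1, but a name that was really written as UTF-8 no longer reads correctly once decoded that way. The file name is invented for the example.

```python
# All 256 byte values decode to distinct characters and encode back unchanged.
raw = bytes(range(256))
round_trip = raw.decode("latin-1").encode("latin-1")
print(round_trip == raw)

# A UTF-8 encoded name (e-acute followed by ".txt"), viewed through Latin-1:
# nothing is lost, but the user sees two unrelated characters instead of one.
utf8_name = "\u00e9.txt".encode("utf-8")
shown = utf8_name.decode("latin-1")
print(ascii(shown))
```

So Latin-1 is lossless as a transport for the bytes, which is why it keeps coming up in the thread, but it is not a faithful display encoding for users whose names are in UTF-8.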
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Mon Sep 29 23:57:45 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 15:57:45 -0600 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> Message-ID: On Mon, Sep 29, 2008 at 5:12 AM, Antoine Pitrou wrote: > Adam Olsen gmail.com> writes: >> >> UTF-8b doesn't work as intended. It produces an invalid unicode >> object (garbage surrogates) that cannot be used with external APIs or >> libraries that require unicode. > > At least it works with all Python operations supported by the unicode type > (methods, concatenation, etc.) without any bad surprise. That feeding it to e.g. > PyGTK may give bogus results is another problem. > >> If you don't need unicode then your >> code should state so explicitly, and 8859-1 is ideal there. > > But then you can say bye-bye to proper representation (e.g. using print()) of > even valid filenames. You can't print UTF-8b either. Printing requires converting the unicode object to UTF-8 (or whatever output encoding), and the unicode object isn't valid, so you'd get an exception[1]. The same applies to all other hacks (such as PUA scalars). Either the scalar value already has an expected behaviour, in which case decoding is lossy and reencoding replaces the correct behaviour, or it's not a valid scalar value, which then can't be used with any external API that requires conformant unicode. There's no solution except to not decode, and 8859-1 is the way to do that. [1] Python's UTF codecs are broken in a couple respects, including the fact that python itself uses CESU-8(!). 
See http://bugs.python.org/issue3297 and http://bugs.python.org/issue3672 -- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Tue Sep 30 00:04:38 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 16:04:38 -0600 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <200809291800.18911.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809291800.18911.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 10:00 AM, Victor Stinner wrote: > On Monday 29 September 2008 17:16:47, Steven Bethard wrote: >> > - getcwd() -> unicode >> > - getcwd(bytes=True) -> bytes >> >> Please let's not introduce boolean flags like this. How about >> ``getcwdb`` in parallel with the old ``getcwdu``? > > Yeah, you're right. So I wrote a new patch: os_getcwdb.patch > > With my patch we get (Python3): > * os.getcwd() -> unicode > * os.getcwdb() -> bytes > > Previously in Python2 it was: > * os.getcwd() -> str (bytes) > * os.getcwdu() -> unicode Why not do: * os.getcwd() -> unicode * posix.getcwdb() -> bytes os gets the standard version and posix has an (unambiguously named) platform-specific version. -- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Tue Sep 30 00:17:20 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 16:17:20 -0600 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 11:06 AM, Guido van Rossum wrote: > On Mon, Sep 29, 2008 at 9:45 AM, Georg Brandl wrote: > >> This approach (changing all path-handling functions to accept either bytes >> or string, but not both) is doomed in my eyes. First, there are lots of them, >> second, they are not only in os.path but in many modules and also in user >> code, and third, I see no clean way of implementing them in the specified way. 
>> (Just try to do it with os.path.join as an example; I couldn't find the >> good way to write it, only the bad and the ugly...) > > It doesn't have to be supported for all operations -- just enough to > be able to access all the system calls and do the most basic pathname > manipulations (split and join -- almost everything else can be built > out of those). > >> If I had to choose, I'd still argue for the modified UTF-8 as filesystem >> encoding (if it were UTF-8 otherwise), despite possible surprises when a >> such-encoded filename escapes from Python. > > I'm having a hard time finding info about UTF-8b. Does anyone have a > decent link? http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html Scroll down to item D, near the bottom. It turns malformed bytes into lone (therefore malformed) surrogates. > I noticed that OSX has a different approach yet. I believe it insists > on valid UTF-8 filenames. It may even require some normalization but I > don't know if the kernel enforces this. I tried to create a file named > b'\xff' and it came out as %ff. Then "rm %ff" worked. So I think it > may be replacing all bad UTF8 sequences with their % encoding. I suspect linux will eventually take this route as well. If ext3 had an option for UTF-8 validation I know I'd want it on. That'd move the error to the program creating bogus file names, rather than those trying to read, display, and manage them. > The "set filesystem encoding to be Latin-1" approach has a certain > charm as well, but clearly would be a mistake on OSX, and probably on > other systems too (whenever the user doesn't think in Latin-1). Aye, it's a better hack than UTF-8b, but adding byte functions is even better. 
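To illustrate the "malformed bytes become lone surrogates" behaviour described above, here is a sketch using the surrogateescape error handler found in later Python 3 releases. Using that handler as a stand-in for UTF-8b is my assumption for the sake of the example; it is not something proposed in this thread, and the file name is invented.

```python
# Latin-1 encoded bytes: 0xE9 is not valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)

# A strict encoder refuses the lone surrogate, which is the objection raised
# above about handing such strings to APIs that expect valid Unicode.
strict_error = None
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_error = exc
print(type(strict_error).__name__)
```

Both sides of the argument show up here: the round trip is lossless inside Python, while strict consumers of the string reject it.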
-- Adam Olsen, aka Rhamphoryncus From foom at fuhm.net Tue Sep 30 00:28:31 2008 From: foom at fuhm.net (James Y Knight) Date: Mon, 29 Sep 2008 18:28:31 -0400 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: On Sep 29, 2008, at 6:17 PM, Adam Olsen wrote: > I suspect linux will eventually take this route as well. If ext3 had > an option for UTF-8 validation I know I'd want it on. That'd move the > error to the program creating bogus file names, rather than those > trying to read, display, and manage them. Of course, even on Mac OS X, or a theoretical UTF-8-enforcing ext3, random byte strings are still possible in your program's argv, in environment variables, and as arguments to subprocesses. So python still needs to do something... James From martin at v.loewis.de Tue Sep 30 00:56:18 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 00:56:18 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <48E15D12.40009@v.loewis.de> > The default behaviour should be to use unicode and raise an error if > conversion to unicode fails. It should also be possible to use bytes using > bytes arguments and optional arguments (for getcwd). I'm still opposed to allowing bytes as file names at all in 3k. Python should really strive for providing a uniform datatype, and that should be the character string type. For applications that cannot trust that the conversion works always correctly on POSIX systems, sys.setfilesystemencoding should be provided. 
In the long run, need for explicit calls to this function should be reduced, by a) systems getting more consistent in their file name encoding, and b) Python providing better defaults for detecting the file name encoding, and better round-trip support for non-encodable bytes. Part b) is probably out-of-scope for 3.0 now, but should be reconsidered for 3.1 Regards, Martin From martin at v.loewis.de Tue Sep 30 01:14:29 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 01:14:29 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> Message-ID: <48E16155.1040209@v.loewis.de> Adam Olsen wrote: > There's no solution except to not > decode, and 8859-1 is the way to do that. I think you need to elaborate that. What does ISO-8859-1 has to do with a Python datatype in this context: which datatype, and what algorithm on it are you specifically referring to? When I do (in 2.x) py> "foo".decode("iso-8859-1") u'foo' ISTM that 8859-1 is all about decoding, so I don't understand why you say it is a way not to decode. Regards, Martin From rhamph at gmail.com Tue Sep 30 01:23:52 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 17:23:52 -0600 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? 
In-Reply-To: <48E16155.1040209@v.loewis.de> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <48E16155.1040209@v.loewis.de> Message-ID: On Mon, Sep 29, 2008 at 5:14 PM, "Martin v. L?wis" wrote: > Adam Olsen wrote: >> There's no solution except to not >> decode, and 8859-1 is the way to do that. > > I think you need to elaborate that. What does ISO-8859-1 has to do > with a Python datatype in this context: which datatype, and what > algorithm on it are you specifically referring to? > > When I do (in 2.x) > > py> "foo".decode("iso-8859-1") > u'foo' > > ISTM that 8859-1 is all about decoding, so I don't understand why > you say it is a way not to decode. 8859-1 has no invalid bytes and is a 1-to-1 mapping. If you have an API that always returns unicode but accepts an encoding you can use it, then reencode using 8859-1 to get back the original bytes. An ugly hack, but more correct than UTF-8b or any similar attempt to do "unicode but not quite unicode"; either it's lossy, or it's not unicode. There's no in between. -- Adam Olsen, aka Rhamphoryncus From victor.stinner at haypocalc.com Tue Sep 30 01:29:24 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 30 Sep 2008 01:29:24 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <200809300129.24972.victor.stinner@haypocalc.com> Le Monday 29 September 2008 19:06:01 Guido van Rossum, vous avez ?crit?: > >> - listdir(unicode) -> unicode and raise an error on invalid filename > > I know I keep flipflopping on this one, but the more I think about it > the more I believe it is better to drop those names than to raise an > exception. 
Otherwise a "naive" program that happens to use > os.listdir() can be rendered completely useless by a single non-UTF-8 > filename. Consider the use of os.listdir() by the glob module. If I am > globbing for *.py, why should the presence of a file named b'\xff' > cause it to fail? It would be hard for a newbie programmer to understand why he's unable to find his very important file ("important r?port.doc") using os.listdir(). And yes, if your file system is broken, glob() will fail. If we choose to support bytes on Linux, a robust and portable program has to use only bytes filenames on Linux to always be able to list and open files. A full example to list files and display filenames:

    import os
    import os.path
    import sys

    # Use the unicode API where the platform supports it, bytes otherwise.
    if os.path.supports_unicode_filenames:
        cwd = os.getcwd()
    else:
        cwd = os.getcwdb()
    encoding = sys.getfilesystemencoding()
    for filename in os.listdir(cwd):
        if os.path.supports_unicode_filenames:
            text = filename
        else:
            # bytes filename: decode it only to display it
            text = str(filename, encoding, "replace")
        print("=== File {0} ===".format(text))
        for line in open(filename):
            ...

We need an "if" to choose the directory. The second "if" is only needed to display the filename. Using bytes, it would be possible to write better code to detect the real charset (e.g. ISO-8859-1 in a UTF-8 file system) and so display the filename correctly and/or propose to rename the file. Would it be possible using UTF-8b / PUA hacks? -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From martin at v.loewis.de Tue Sep 30 01:31:11 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 01:31:11 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?
In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <48E16155.1040209@v.loewis.de> Message-ID: <48E1653F.2050308@v.loewis.de> >> ISTM that 8859-1 is all about decoding, so I don't understand why >> you say it is a way not to decode. > > 8859-1 has no invalid bytes and is a 1-to-1 mapping. If you have an > API that always returns unicode but accepts an encoding you can use > it, then reencode using 8859-1 to get back the original bytes. I still don't understand. 8859-1 is an encoding, not a datatype. So how do you propose file names to be represented? "In 8859-1" is not a valid answer, because you cannot derive an implementation from that answer (at least, I cannot). Please explain. Regards, Martin From foom at fuhm.net Tue Sep 30 01:33:47 2008 From: foom at fuhm.net (James Y Knight) Date: Mon, 29 Sep 2008 19:33:47 -0400 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <48E16155.1040209@v.loewis.de> Message-ID: <47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net> On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote: > An ugly hack, but more correct than UTF-8b or any similar attempt to > do "unicode but not quite unicode"; either it's lossy, or it's not > unicode. There's no in between. Promoting the use of 8859-1 to decode mostly-utf-8 data seems like a very poor way forward. I don't see how you can claim it's more correct. It's correct in no case except for pure ASCII on a utf-8 system.
I still like the UTF-8b proposal, but if you want to push against that, I don't see any sensible alternative but to move back towards a bytestring API. Having two parallel APIs or a mixture of data types is confusing, so, just toss the Unicode APIs entirely. That'd be much much nicer than having everyone use 8859-1, incorrectly, for their platform encoding. On Windows, the platform-native Unicode strings could simply be encoded into utf-8 when entering Python-land, and decoded back to Unicode when leaving pythonland, to keep the API consistently bytestring oriented on both platforms. James From rhamph at gmail.com Tue Sep 30 01:34:43 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 17:34:43 -0600 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <48E1653F.2050308@v.loewis.de> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <48E16155.1040209@v.loewis.de> <48E1653F.2050308@v.loewis.de> Message-ID: On Mon, Sep 29, 2008 at 5:31 PM, "Martin v. L?wis" wrote: >>> ISTM that 8859-1 is all about decoding, so I don't understand why >>> you say it is a way not to decode. >> >> 8859-1 has no invalid bytes and is a 1-to-1 mapping. If you have an >> API that always returns unicode but accepts an encoding you can use >> it, then reencode using 8859-1 to get back the original bytes. > > I still don't understand. 8859-1 is an encoding, not a datatype. > So how do you propose file names to be represented? "In 8859-1" > is not a valid answer, because you cannot derive an implementation > from that answer (atleast, I cannot). Please explain. Decoding UTF-8 using 8859-1 gives you garbage, but it's lossless and reversible, and that's all a backup program would need. 
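A minimal sketch of the round trip described here, assuming Python 3 bytes/str semantics. The helper names and the sample filename are made up for illustration.

```python
def as_latin1_text(raw: bytes) -> str:
    """Decode without ever failing: each byte becomes one code point."""
    return raw.decode("iso-8859-1")


def back_to_bytes(text: str) -> bytes:
    """Reverse of as_latin1_text(); recovers the original bytes."""
    return text.encode("iso-8859-1")


# Hypothetical filename: valid UTF-8 ("dossi" + e-acute) plus a stray 0xff.
raw_name = b"dossi\xc3\xa9-\xff"

text = as_latin1_text(raw_name)

# The result is garbage for display (two characters where one was meant) ...
assert len(text) == len(raw_name)

# ... but nothing was lost: encoding again yields the same bytes.
assert back_to_bytes(text) == raw_name

# Strict UTF-8 decoding, by contrast, rejects this name outright.
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8, but still round-trips through ISO-8859-1")
```

The string in the middle is only a container for the bytes; it is not meant to be shown to a user or compared with real text.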
-- Adam Olsen, aka Rhamphoryncus From guido at python.org Tue Sep 30 01:41:36 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Sep 2008 16:41:36 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <200809300129.24972.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300129.24972.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner wrote: > Le Monday 29 September 2008 19:06:01 Guido van Rossum, vous avez ?crit : >> >> - listdir(unicode) -> unicode and raise an error on invalid filename >> >> I know I keep flipflopping on this one, but the more I think about it >> the more I believe it is better to drop those names than to raise an >> exception. Otherwise a "naive" program that happens to use >> os.listdir() can be rendered completely useless by a single non-UTF-8 >> filename. Consider the use of os.listdir() by the glob module. If I am >> globbing for *.py, why should the presence of a file named b'\xff' >> cause it to fail? > > It would be hard for a newbie programmer to understand why he's unable to find > his very important file ("important r?port.doc") using os.listdir(). *Every* failure in this scenario will be hard to understand for a newbie programmer. We can just document the fact. > And yes, > if your file system is broken, glob() will fail. Why should it? > If we choose to support bytes on Linux, a robust and portable program have to > use only bytes filenames on Linux to always be able to list and open files. Right. But such robustness is only needed to support certain odd cases and we cannot demand that most people bother to write robust code all the time. 
> A full example to list files and display filenames:
>
> import os
> import os.path
> import sys
> if os.path.supports_unicode_filenames:

This is backwards -- the Unicode API is always supported, the bytes API only on Linux (and perhaps some other Unixes).

>     cwd = os.getcwd()
> else:
>     cwd = os.getcwdb()
> encoding = sys.getfilesystemencoding()
> for filename in os.listdir(cwd):
>     if os.path.supports_unicode_filenames:
>         text = filename
>     else:
>         text = str(filename, encoding, "replace")
>     print("=== File {0} ===".format(text))
>     for line in open(filename):
>         ...
>
> We need an "if" to choose the directory. The second "if" is only needed to > display the filename. Using bytes, it would be possible to write better code > to detect the real charset (e.g. ISO-8859-1 in a UTF-8 file system) and so > display the filename correctly and/or propose to rename the file. Would it > be possible using UTF-8b / PUA hacks? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Tue Sep 30 01:50:32 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 17:50:32 -0600 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <48E16155.1040209@v.loewis.de> <47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net> Message-ID: On Mon, Sep 29, 2008 at 5:33 PM, James Y Knight wrote: > On Sep 29, 2008, at 7:23 PM, Adam Olsen wrote: >> >> An ugly hack, but more correct than UTF-8b or any similar attempt to >> do "unicode but not quite unicode"; either it's lossy, or it's not >> unicode. There's no in between. > > Promoting the use of 8859-1 to decode mostly-utf-8 data seems like a very > poor way forward. I don't see how you can claim it's more correct.
It's > correct in no case except for pure ASCII on a utf-8 system. It's correct in the sense that it can roundtrip all filenames. UTF-8b is lossy, so certain filenames are not roundtripped properly. It doesn't let you print correctly, but neither would an API that returns bytes. 8859-1 is just a hack for when you want bytes, but the API only allows unicode. > I still like the UTF-8b proposal, but if you want to push against that, I > don't see any sensible alternative but to move back towards a bytestring > API. Having two parallel APIs or a mixture of data types is confusing, so, > just toss the Unicode APIs entirely. That'd be much much nicer than having > everyone use 8859-1, incorrectly, for their platform encoding. As a user, I expect all file names to be printable. That requires unicode, and any program that creates filenames with arbitrary bytestrings is just broken. Not all operating systems enforce this yet, but returning bytes only means we have to explicitly decode in the 99% of cases where we'd happily assume it's correct unicode. I'd rather the 1% of cases that need to handle bad file names make an explicit effort to do so, via alternate byte APIs or (if necessary) the 8859-1 hack. > On Windows, the platform-native Unicode strings could simply be encoded into > utf-8 when entering Python-land, and decoded back to Unicode when leaving > pythonland, to keep the API consistently bytestring oriented on both > platforms. 
-- Adam Olsen, aka Rhamphoryncus From victor.stinner at haypocalc.com Tue Sep 30 02:02:38 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 30 Sep 2008 02:02:38 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <200809300202.38574.victor.stinner@haypocalc.com> Le Monday 29 September 2008 18:45:28 Georg Brandl, vous avez écrit : > If I had to choose, I'd still argue for the modified UTF-8 as filesystem > encoding (if it were UTF-8 otherwise), despite possible surprises when a > such-encoded filename escapes from Python. If I understand this solution correctly, the idea is to change the default file system encoding, right? E.g. if your filesystem is UTF-8, use ISO-8859-1 to make sure that UTF-8 conversion will never fail. Let's try with an ugly directory on my UTF-8 file system:
$ find
.
./têste
./ô
./a?b
./dossié
./dossié/abc
./dir?name
./dir?name/xyz
Python3 using encoding=ISO-8859-1:
>>> import os; os.listdir(b'.')
[b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname']
>>> files=os.listdir('.'); files
['tÃªste', 'Ã´', 'aÿb', 'dossiÃ©', 'dirÿname']
>>> open(files[0]).close()
>>> os.listdir(files[-1])
['xyz']
Ok, I have unicode filenames and I'm able to open a file and list a directory. The problem is now to display the filenames correctly. For me "unicode" sounds like "text (characters) encoded in the correct charset". In this case, unicode is just a storage for *bytes* in a custom charset. How can we mix with ? E.g. os.path.join('dossiÃ©', "fichié") : first argument is encoded in ISO-8859-1 whereas the second argument is encoded in Unicode. It's something like that: str(b'dossi\xc3\xa9', 'ISO-8859-1') + '/' + 'fichi\xe9' Whereas the correct (unicode) result should be: 'dossié/fichié'
as bytes in UTF-8: b'dossi\xc3\xa9/fichi\xc3\xa9' as bytes in ISO-8859-1: b'dossi\xe9/fichi\xe9' Changing the default file system encoding to store bytes in Unicode is like introducing a new Python type: . -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From martin at v.loewis.de Tue Sep 30 02:07:23 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 02:07:23 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <200809300129.24972.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300129.24972.victor.stinner@haypocalc.com> Message-ID: <48E16DBB.1030805@v.loewis.de>
> import os
> import os.path
> import sys
> if os.path.supports_unicode_filenames:
>     cwd = os.getcwd()
> else:
>     cwd = os.getcwdb()
> encoding = sys.getfilesystemencoding()
> for filename in os.listdir(cwd):
>     if os.path.supports_unicode_filenames:
>         text = filename
>     else:
>         text = str(filename, encoding, "replace")
>     print("=== File {0} ===".format(text))
>     for line in open(filename):
>         ...
>
> We need an "if" to choose the directory. The second "if" is only needed to > display the filename. Using bytes, it would be possible to write better code > to detect the real charset (e.g. ISO-8859-1 in a UTF-8 file system) and so > display the filename correctly and/or propose to rename the file. Would it > be possible using UTF-8b / PUA hacks? Not sure what "it" is: to write the code above using the PUA hack:

    for filename in os.listdir(os.getcwd()):
        # repr() escapes whatever cannot be shown as-is
        text = repr(filename)
        print("=== File {0} ===".format(text))
        for line in open(filename):
            ...

If "it" is "display the filename": sure, see above. If "it" is "detect the real charset": sure, why not?
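One way the "detect the real charset" idea could look when the name is available as bytes: try candidate encodings in order and report which one decoded cleanly. This is only a heuristic sketch; the function name and the candidate list are assumptions, and since ISO-8859-1 accepts any byte sequence it only makes sense as the last candidate.

```python
def guess_filename_text(raw, candidates=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes raw.

    Hypothetical helper: returns (lossy_text, None) when no candidate fits.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing matched: fall back to a lossy form, for display only.
    return raw.decode("ascii", "replace"), None


# A correctly encoded UTF-8 name is recognised as such.
print(guess_filename_text(b"dossi\xc3\xa9"))

# The same name written by a Latin-1 program falls through to ISO-8859-1.
print(guess_filename_text(b"dossi\xe9"))
```

A program could use the reported encoding to offer a rename to the filesystem's expected encoding, while still opening the file through its original bytes.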
Regards, Martin From rhamph at gmail.com Tue Sep 30 02:08:41 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Sep 2008 18:08:41 -0600 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <200809300129.24972.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300129.24972.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 5:29 PM, Victor Stinner wrote: > Le Monday 29 September 2008 19:06:01 Guido van Rossum, vous avez ?crit : >> >> - listdir(unicode) -> unicode and raise an error on invalid filename >> >> I know I keep flipflopping on this one, but the more I think about it >> the more I believe it is better to drop those names than to raise an >> exception. Otherwise a "naive" program that happens to use >> os.listdir() can be rendered completely useless by a single non-UTF-8 >> filename. Consider the use of os.listdir() by the glob module. If I am >> globbing for *.py, why should the presence of a file named b'\xff' >> cause it to fail? > > It would be hard for a newbie programmer to understand why he's unable to find > his very important file ("important r?port.doc") using os.listdir(). And yes, > if your file system is broken, glob() will fail. Imagine a program that list all files in a dir, as well as their file size. If we return bytes we'll print the name wrong. If we return lossy unicode we'll be unable to get the size of some files. If we return a malformed unicode we'll be unable to print at all (and what if this is a GUI app?) The common use cases need unicode, so the best options for them are to fail outright or skip bad filenames. The uncommon use cases need bytes, and they could do an explicit lossy decode for printing, while still keeping the internal file name as bytes. Failing outright does have the advantage that the resulting exception should have a half-decent approximation of the bad filename. 
(Thanks to the recent choices on unicode repr() and having stderr do escapes.) -- Adam Olsen, aka Rhamphoryncus From victor.stinner at haypocalc.com Tue Sep 30 02:09:33 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 30 Sep 2008 02:09:33 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48E15B83.9040205@v.loewis.de> Message-ID: <200809300209.33636.victor.stinner@haypocalc.com> Le Tuesday 30 September 2008 01:31:45 Adam Olsen, vous avez ?crit?: > The alternative is not be valid unicode, but since we can't use such > objects with external libs, can't even print them, we might as well > call them something else. We already have a name for that: bytes. :-) From victor.stinner at haypocalc.com Tue Sep 30 02:47:20 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 30 Sep 2008 02:47:20 +0200 Subject: [Python-3000] Patch for an initial support of bytes filename in Python3 Message-ID: <200809300247.20349.victor.stinner@haypocalc.com> Hi, See attached patch: python3_bytes_filename.patch Using the patch, you will get: - open() support bytes - listdir(unicode) -> only unicode, *skip* invalid filenames (as asked by Guido) - remove os.getcwdu() - create os.getcwdb() -> bytes - glob.glob() support bytes - fnmatch.filter() support bytes - posixpath.join() and posixpath.split() support bytes Mixing bytes and str is invalid. 
Examples raising a TypeError: - posixpath.join(b'x', 'y') - fnmatch.filter([b'x', 'y'], '*') - fnmatch.filter([b'x', b'y'], '*') - glob.glob1('.', b'*') - glob.glob1(b'.', '*') $ diffstat ~/python3_bytes_filename.patch Lib/fnmatch.py | 7 +++- Lib/glob.py | 15 ++++++--- Lib/io.py | 2 - Lib/posixpath.py | 20 ++++++++---- Modules/posixmodule.c | 83 ++++++++++++++++++-------------------------------- 5 files changed, 62 insertions(+), 65 deletions(-) TODO: - review this patch :-) - support non-ASCII bytes in fnmatch.filter() - fix other functions, eg. posixpath.isabs() and fnmatch.fnmatchcase() - fix functions written in C: grep FileSystemDefaultEncoding - make sure that mixing bytes and str is rejected -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ -------------- next part -------------- A non-text attachment was scrubbed... Name: python3_bytes_filename.patch Type: text/x-diff Size: 6732 bytes Desc: not available URL: From stephen at xemacs.org Tue Sep 30 04:24:29 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 30 Sep 2008 11:24:29 +0900 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300129.24972.victor.stinner@haypocalc.com> Message-ID: <87prmme5gi.fsf@xemacs.org> Guido van Rossum writes: > On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner > wrote: > > It would be hard for a newbie programmer to understand why he's > > unable to find his very important file ("important r?port.doc") > > using os.listdir(). > *Every* failure in this scenario will be hard to understand for a > newbie programmer. We can just document the fact. Guido is absolutely right. 
The Emacs/Mule people have been trying to solve this kind of problem for 20 years, and the best they've come up with is Martin's strategy: if you need really robust decoding, force ISO 8859/1 (which for historical reasons uses all 256 octets) to get a lossless internal text representation, and decode from that and *track the encoding used* at the application level. The email-sig/Mailman people will testify how hard this is to do well, even when you have a handful of RFCs that specify how it is to be done! On the other hand, this kind of robustness is almost never needed in "general newbie programming", except when you are writing a program to be used to clean up after an undisciplined administration, or some other system disaster. Under normal circumstances the system encoding is well-known and conformance is universal. The best you can do for a general programming system is to heuristically determine a single system encoding and raise an error if the decoding fails. From stephen at xemacs.org Tue Sep 30 05:11:12 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 30 Sep 2008 12:11:12 +0900 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> Message-ID: <87od26e3an.fsf@xemacs.org> James Y Knight writes: > On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote: > > UTF-8b doesn't work as intended. It produces an invalid unicode > > object (garbage surrogates) that cannot be used with external APIs or > > libraries that require unicode. 
> > I'd be interested to hear more detail on what you expect the practical > ramifications of this to be. It doesn't sound likely to be a problem > to me. That's because you have a specific use case in mind. Adam clearly has in mind passing the filename on to a library which might proceed to signal an error (to him, unexpected) on garbage surrogates. He doesn't want to be surprised by that. The problem is that all of these hacks involve a private encoding that looks like something else, and standards-conforming external programs will be confused by them. You can't prevent them from leaking unless you store them as a non-text type, which has huge ramifications. > > If you don't need unicode then your > > code should state so explicitly, and 8859-1 is ideal there. > > But, I *do* want unicode. ALL my filenames are encoded in utf8. That's not what really is at issue here. The point is that in the exceptional case where you get non-Unicode, and are willing to accept it, ersatz binary (ISO-8859-1) works fine. The problem is tagging this as an exceptional filename that doesn't use the usual encoding; that should be done by the application, I think. Most applications won't need it. > Except...that one over there. That's the whole point of UTF-8b: > correctly encoded names get decoded correctly and readably, and the > other cases get decoded into something unique that cannot possibly > conflict. Sure. But there are lots of other operations besides encoding and decoding that we do with filenames. How do you display a filename? How about concatenating them to make paths? What do you do when you want to mix a filename with other, well-formed strings? If you keep the filenames internally in UTF-8b, you're going to need what amounts to a whole string API for dealing with them, aren't you? If you're not doing that, how is UTF-8b represented? And in any case, when you do want to process them as text, the "something unique" will have to be handled exceptionally. 
I don't think it makes sense to delay that exception; the exception should be raised as soon as Python fails to make sense of the filename. What to do about that exception is a policy matter, as well. Shouldn't that policy be decided at the application level, rather than the Python level? From brett at python.org Tue Sep 30 05:14:00 2008 From: brett at python.org (Brett Cannon) Date: Mon, 29 Sep 2008 20:14:00 -0700 Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: <200809300247.20349.victor.stinner@haypocalc.com> References: <200809300247.20349.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 5:47 PM, Victor Stinner wrote: > Hi, > > See attached patch: python3_bytes_filename.patch > Patches should go on the tracker, not the mailing list. Otherwise it will just get lost. -Brett From martin at v.loewis.de Tue Sep 30 08:00:55 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 08:00:55 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <200809300202.38574.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: <48E1C097.8030309@v.loewis.de> > Change the default file system encoding to store bytes in Unicode is like > introducing a new Python type: . Exactly. Seems like the best solution to me, despite your polemics. 
Regards, Martin From g.brandl at gmx.net Tue Sep 30 08:22:37 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 30 Sep 2008 08:22:37 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <200809300202.38574.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: Victor Stinner schrieb: > Le Monday 29 September 2008 18:45:28 Georg Brandl, vous avez écrit : >> If I had to choose, I'd still argue for the modified UTF-8 as filesystem >> encoding (if it were UTF-8 otherwise), despite possible surprises when a >> such-encoded filename escapes from Python. > > If I understand this solution correctly, the idea is to change the default > file system encoding, right? E.g. if your filesystem is UTF-8, use ISO-8859-1 > to make sure that UTF-8 conversion will never fail. No, that was not what I meant (although it is another possibility). As I wrote, Martin's proposal that I support here is using the modified UTF-8 codec that successfully roundtrips otherwise invalid UTF-8 data. You seem to forget that (disregarding OSX here, since it already enforces UTF-8) the majority of file names on Posix systems will be encoded correctly. > Let's try with an ugly directory on my UTF-8 file system:
> $ find
> .
> ./têste
> ./ô
> ./a?b
> ./dossié
> ./dossié/abc
> ./dir?name
> ./dir?name/xyz
>
> Python3 using encoding=ISO-8859-1:
>>>> import os; os.listdir(b'.')
> [b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname']
>>>> files=os.listdir('.'); files
> ['tÃªste', 'Ã´', 'aÿb', 'dossiÃ©', 'dirÿname']
>>>> open(files[0]).close()
>>>> os.listdir(files[-1])
> ['xyz']
>
> Ok, I have unicode filenames and I'm able to open a file and list a directory. > The problem is now to display the filenames correctly. > > For me "unicode" sounds like "text (characters) encoded in the correct > charset".
In this case, unicode is just a storage for *bytes* in a custom > charset. > How can we mix with unicode>? Eg. os.path.join('dossiÃ©', "fichié") : first argument is encoded > in ISO-8859-1 whereas the second argument is encoded in Unicode. It's > something like that: > str(b'dossi\xc3\xa9', 'ISO-8859-1') + '/' + 'fichi\xe9' > > Whereas the correct (unicode) result should be: > 'dossié/fichié' > as bytes in UTF-8: > b'dossi\xc3\xa9/fichi\xc3\xa9' > as bytes in ISO-8859-1: > b'dossi\xe9/fichi\xe9' With the filenames decoded by UTF-8, your files named têste, ô, dossié will be displayed and handled correctly. The others are *invalid* in the filesystem encoding UTF-8 and therefore would be represented by something like u'dir\uXXffname' where XX is some private use Unicode namespace. It won't look pretty when printed, but then, what do other applications do? They e.g. display a question mark as you show above, which is not better in terms of readability. But it will work when given to a filename-handling function. Valid filenames can be compared to Unicode strings. A real-world example: OpenOffice can't open files with invalid bytes in their name. They are displayed in the "Open file" dialog, but trying to open fails. This regularly drives me crazy. Let's not make Python work this way too, or, even worse, not even display those filenames. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
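The round-trip property Georg argues for can be sketched in Python. This is only an illustration under an assumption: it uses the `surrogateescape` error handler that later Python 3 releases gained through PEP 383 as a stand-in for the "modified UTF-8" codec discussed in this thread, which maps undecodable bytes to lone surrogates rather than to the private-use code points Georg mentions. The byte strings are the directory entries from Victor's listing.

```python
# Assumption: Python 3.1+ with the "surrogateescape" error handler (PEP 383),
# used here as a stand-in for the "modified UTF-8" codec discussed above.
names = [b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname']

decoded = [n.decode('utf-8', 'surrogateescape') for n in names]

# Well-formed names decode to ordinary text and compare equal to str literals.
assert decoded[0] == 't\xeaste'
assert decoded[3] == 'dossi\xe9'

# The invalid byte 0xff becomes the lone surrogate U+DCFF instead of an error.
assert decoded[2] == 'a\udcffb'

# Encoding with the same handler restores the original bytes exactly.
roundtrip = [s.encode('utf-8', 'surrogateescape') for s in decoded]
assert roundtrip == names
```

Only the two names containing 0xff end up with unprintable code points; the correctly encoded names are untouched, which is the case Georg says covers the majority of files.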
From rhamph at gmail.com Tue Sep 30 08:52:21 2008 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Sep 2008 00:52:21 -0600 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: On Tue, Sep 30, 2008 at 12:22 AM, Georg Brandl wrote: > Victor Stinner schrieb: >> Le Monday 29 September 2008 18:45:28 Georg Brandl, vous avez ?crit : >>> If I had to choose, I'd still argue for the modified UTF-8 as filesystem >>> encoding (if it were UTF-8 otherwise), despite possible surprises when a >>> such-encoded filename escapes from Python. >> >> If I understand correctly this solution. The idea is to change the default >> file system encoding, right? Eg. if your filesystem is UTF-8, use ISO-8859-1 >> to make sure that UTF-8 conversion will never fail. > > No, that was not what I meant (although it is another possibility). As I wrote, > Martin's proposal that I support here is using the modified UTF-8 codec that > successfully roundtrips otherwise invalid UTF-8 data. > > You seem to forget that (disregarding OSX here, since it already enforces > UTF-8) the majority of file names on Posix systems will be encoded correctly. > >> Let's try with an ugly directory on my UTF-8 file system: >> $ find >> .. >> ../t?ste >> ../? >> ../a?b >> ../dossi? >> ../dossi?/abc >> ../dir?name >> ../dir?name/xyz >> >> Python3 using encoding=ISO-8859-1: >>>>> import os; os.listdir(b'.') >> [b't\xc3\xaaste', b'\xc3\xb4', b'a\xffb', b'dossi\xc3\xa9', b'dir\xffname'] >>>>> files=os.listdir('.'); files >> ['t??ste', '??', 'a?b', 'dossi?(c)', 'dir?name'] >>>>> open(files[0]).close() >>>>> os.listdir(files[-1]) >> ['xyz'] >> >> Ok, I have unicode filenames and I'm able to open a file and list a directory. >> The problem is now to display correctly the filenames. >> >> For me "unicode" sounds like "text (characters) encoded in the correct >> charset". 
In this case, unicode is just a storage for *bytes* in a custom >> charset. > >> How can we mix with > unicode>? Eg. os.path.join('dossi?(c)', "fichi?") : first argument is encoded >> in ISO-8859-1 whereas the second argument is encoding in Unicode. It's >> something like that: >> str(b'dossi\xc3\xa9', 'ISO-8859-1') + '/' + 'fichi\xe9' >> >> Whereas the correct (unicode) result should be: >> 'dossi?/fichi?' >> as bytes in ISO-8859-1: >> b'dossi\xc3\xa9/fichi\xc3\xa9' >> as bytes in UTF-8: >> b'dossi\xe9/fichi\xe9' > > With the filenames decoded by UTF-8, your files named t?ste, ?, dossi? will > be displayed and handled correctly. The others are *invalid* in the filesystem > encoding UTF-8 and therefore would be represented by something like > > u'dir\uXXffname' where XX is some private use Unicode namespace. It won't look > pretty when printed, but then, what do other applications do? They e.g. display > a question mark as you show above, which is not better in terms of readability. > > But it will work when given to a filename-handling function. Valid filenames > can be compared to Unicode strings. > > A real-world example: OpenOffice can't open files with invalid bytes in their > name. They are displayed in the "Open file" dialog, but trying to open fails. > This regularly drives me crazy. Let's not make Python not work this way too, > or, even worse, not even display those filenames. The only way to display that file would be to transform it into some other valid unicode string. However, as that string is already valid, you've just made any files named after it impossible to open. If you extend unicode then you're unable to display that extended name[1]. I think Guido's right on this one. If I have to choose between openoffice crashing or skipping my file, I'd vastly prefer it skip it. A warning would be a nice bonus (from python or from openoffice), telling me there's a buggered file I should go fix. Renaming the file is the end solution. 
[1] You could argue that Unicode should add new scalars to handle all currently invalid UTF-8 sequences. They could then output to their original forms if in UTF-8, or a mundane form in UTF-16 and UTF-32. However, I suspect "we don't want to add validation to linux" will not be a very persuasive argument. -- Adam Olsen, aka Rhamphoryncus From solipsis at pitrou.net Tue Sep 30 11:28:03 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Sep 2008 09:28:03 +0000 (UTC) Subject: [Python-3000] New proposition for Python3 bytes filename issue References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: Adam Olsen <rhamph at gmail.com> writes: > > The only way to display that file would be to transform it into some > other valid unicode string. However, as that string is already valid, > you've just made any files named after it impossible to open. Not if those valid sequences are also properly escaped to avoid collisions. That's what utf-8b claims to do. My view of utf-8b is that it is not really a new codec, but an escaping phase added in front of utf-8, such that illegal byte sequences get converted to legal byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;", etc.). The only difficulty being in choosing sufficiently rare escaping sequences, so that readability is not impacted. From mal at egenix.com Tue Sep 30 12:31:51 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 30 Sep 2008 12:31:51 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <48E1C097.8030309@v.loewis.de> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> Message-ID: <48E20017.3020405@egenix.com> On 2008-09-30 08:00, Martin v. Löwis wrote: >> Change the default file system encoding to store bytes in Unicode is like >> introducing a new Python type: . > > Exactly.
Seems like the best solution to me, despite your polemics. Not a bad idea... have os.listdir() return Unicode subclasses that work like file handles, ie. they have an extra buffer that holds the original bytes value received from the underlying C API. Passing these handles to open() would then do the right thing by using whatever os.listdir() got back from the file system to open the file, while still providing a sane way to display the filename, e.g. using question marks for the invalid characters. The only problem with this approach is concatenation of such handles to form pathnames, but then perhaps those concatenations could just work on the bytes value as well (I don't know of any OS that uses non- ASCII path separators). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 30 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From stephen at xemacs.org Tue Sep 30 13:24:45 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 30 Sep 2008 20:24:45 +0900 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: <87iqsdev0i.fsf@xemacs.org> Adam Olsen writes: > [1] You could argue that Unicode should add new scalars to handle all > currently invalid UTF-8 sequences. AFAIK there are about 2^31 of these, though! 
From rhamph at gmail.com Tue Sep 30 14:20:28 2008 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Sep 2008 06:20:28 -0600 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: On Tue, Sep 30, 2008 at 3:28 AM, Antoine Pitrou wrote: > Adam Olsen <rhamph at gmail.com> writes: >> >> The only way to display that file would be to transform it into some >> other valid unicode string. However, as that string is already valid, >> you've just made any files named after it impossible to open. > > Not if those valid sequences are also properly escaped to avoid collisions. > That's what utf-8b claims to do. > > My view of utf-8b is that it is not really a new codec, but an escaping phase > added in front of utf-8, such that illegal byte sequences get converted to legal > byte sequences. This is how e.g. XML-escaping works ("&" -> "&amp;", etc.). The > only difficulty being in choosing sufficiently rare escaping sequences, so that > readability is not impacted. UTF-8b uses lone surrogates, which are malformed. You bring up a good point though. That sort of escaping is lossless, and a PUA escape character would be unlikely to collide. It would still fail if another API was used to open the file (gtk or openoffice?), and the thought of it creeping into other apps gives me an icky feeling. -- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Tue Sep 30 14:36:51 2008 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Sep 2008 06:36:51 -0600 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <87iqsdev0i.fsf@xemacs.org> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <87iqsdev0i.fsf@xemacs.org> Message-ID: On Tue, Sep 30, 2008 at 5:24 AM, Stephen J.
Turnbull wrote: > Adam Olsen writes: > > > [1] You could argue that Unicode should add new scalars to handle all > > currently invalid UTF-8 sequences. > > AFAIK there are about 2^31 of these, though! They've promised to never allocate above U+10FFFF (0 to 1114111). Not sure that makes new additions easier or harder. ;) -- Adam Olsen, aka Rhamphoryncus From solipsis at pitrou.net Tue Sep 30 11:06:52 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Sep 2008 11:06:52 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <48E16155.1040209@v.loewis.de> <47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net> Message-ID: <1222765612.6214.12.camel@fsol> On Monday 29 September 2008 at 17:50 -0600, Adam Olsen wrote: > It's correct in the sense that it can roundtrip all filenames. UTF-8b > is lossy, so certain filenames are not roundtripped properly. Why do you say UTF-8b is lossy? From what I've read it claims to be lossless (i.e. the range of characters used for escaping of invalid bytes are themselves escaped if they are encountered in the source sequence). > As a user, I expect all file names to be printable. That requires > unicode, and any program that creates filenames with arbitrary > bytestrings is just broken. But if you use iso-8859-1 for decoding, all non-ASCII filenames will be printed wrongly, not only those with invalid bytestrings. I fail to see what it brings.
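Antoine's objection to the ISO-8859-1 approach can be checked directly. The following is a small sketch, not anything proposed in the thread: it takes one correctly encoded UTF-8 name from Victor's earlier listing and decodes it both ways.

```python
# A UTF-8 encoded filename as the filesystem stores it (taken from the thread).
raw = b'dossi\xc3\xa9'

as_utf8 = raw.decode('utf-8')          # the intended text, 'dossi' + U+00E9
as_latin1 = raw.decode('iso-8859-1')   # one character per byte: mojibake

# Latin-1 never fails and round-trips, but it misprints a perfectly valid name.
assert as_latin1.encode('iso-8859-1') == raw
assert as_utf8 == 'dossi\xe9'
assert as_latin1 == 'dossi\xc3\xa9'
assert as_latin1 != as_utf8
```

The Latin-1 decoding preserves the bytes but yields a seven-character string where the user expects six, so every non-ASCII name displays wrongly even though nothing about it was invalid.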
From guido at python.org Tue Sep 30 15:50:10 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 06:50:10 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300129.24972.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy wrote: > >> Le Monday 29 September 2008 19:06:01 Guido van Rossum, vous avez ?crit : > >>> I know I keep flipflopping on this one, but the more I think about it >>> the more I believe it is better to drop those names than to raise an >>> exception. Otherwise a "naive" program that happens to use >>> os.listdir() can be rendered completely useless by a single non-UTF-8 >>> filename. Consider the use of os.listdir() by the glob module. If I am >>> globbing for *.py, why should the presence of a file named b'\xff' >>> cause it to fail? > > To avoid silent skipping, is it possible to drop 'unreadable' names, issue a > warning (instead of exception), and continue to completion? > "Warning: unreadable filename skipped; see PyWiki/UnreadableFilenames" That would be annoying as hell in most cases. I consider the dropping of unreadable names similar to the suppression of "hidden" files by various operating systems. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 15:53:09 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 06:53:09 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <48E1C097.8030309@v.loewis.de> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> Message-ID: On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. L?wis" wrote: >> Change the default file system encoding to store bytes in Unicode is like >> introducing a new Python type: . > > Exactly. 
Seems like the best solution to me, despite your polemics. Martin, I don't understand why you are in favor of storing raw bytes encoded as Latin-1 in Unicode string objects, which clearly gives rise to mojibake. In the past you have always been staunchly opposed to API changes or practices that could lead to mojibake (and you had me quite convinced). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From victor.stinner at haypocalc.com Tue Sep 30 15:54:20 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 30 Sep 2008 15:54:20 +0200 Subject: [Python-3000] =?iso-8859-1?q?=5BPython-Dev=5D_Patch_for_an_initia?= =?iso-8859-1?q?l_support_of_bytes_filename_in=09Python3?= In-Reply-To: <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com> References: <200809300247.20349.victor.stinner@haypocalc.com> <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com> Message-ID: <200809301554.21222.victor.stinner@haypocalc.com> Hi, > This is the most sane contribution I've seen so far :). Oh thanks. > Do I understand properly that (listdir(bytes) -> bytes)? Yes, os.listdir(bytes)->bytes. It's already the current behaviour. But with Python3 trunk, os.listdir(str) -> str ... or bytes (if unicode conversion fails). > If so, this seems basically sane to me, since it provides text behavior > where possible and allows more sophisticated filesystem wrappers (i.e. > Twisted's FilePath, Will McGugan's "FS") to do more tricky things, > separating filenames for display to the user and filenames for exchange > with the FS. It's the goal of my patch. Let people do what you want with bytes: rename the file, try the best charset to display the filename, etc. > >- remove os.getcwdu() > >- create os.getcwdb() -> bytes > >- glob.glob() support bytes > >- fnmatch.filter() support bytes > >- posixpath.join() and posixpath.split() support bytes > > It sounds like maybe there should be some 2to3 fixers in here somewhere, > too? 
IMHO a programmer should not use bytes for filenames. Only specific programs used to fix a broken system (e.g. the convmv program), a backup program, etc. should use bytes. So the "default" type (type and not charset) for filenames should be str in Python3. If my patch is applied, 2to3 only has to replace getcwdu() with getcwd(). That's all. > Not necessarily as part of this patch, but somewhere related? I > don't know what they would do, but it does seem quite likely that code > which was previously correct under 2.6 (using bytes) would suddenly be > mixing bytes and unicode with these APIs. It looks like 2to3 converts all text '...' or u'...' to unicode (str). So converted programs will use str for filenames. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From guido at python.org Tue Sep 30 15:59:42 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 06:59:42 -0700 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: On Mon, Sep 29, 2008 at 11:22 PM, Georg Brandl wrote: > No, that was not what I meant (although it is another possibility). As I wrote, > Martin's proposal that I support here is using the modified UTF-8 codec that > successfully roundtrips otherwise invalid UTF-8 data. I thought that the "successful roundtripping" pretty much stopped as soon as the unicode data is exported to somewhere else -- doesn't it contain invalid surrogate sequences? In general, I'm very reluctant to use utf-8b given that it doesn't seem to be well documented as a standard anywhere. Providing some minimal APIs that can process raw-bytes filenames still makes more sense -- it is mostly analogous to our treatment of text files, where the underlying binary data is also accessible.
> You seem to forget that (disregarding OSX here, since it already enforces > UTF-8) the majority of file names on Posix systems will be encoded correctly. Apparently under certain circumstances (external FS mounted) OSX can also have non-UTF-8 filenames. [...] > With the filenames decoded by UTF-8, your files named t?ste, ?, dossi? will > be displayed and handled correctly. The others are *invalid* in the filesystem > encoding UTF-8 and therefore would be represented by something like > > u'dir\uXXffname' where XX is some private use Unicode namespace. It won't look > pretty when printed, but then, what do other applications do? They e.g. display > a question mark as you show above, which is not better in terms of readability. > > But it will work when given to a filename-handling function. Valid filenames > can be compared to Unicode strings. > > A real-world example: OpenOffice can't open files with invalid bytes in their > name. They are displayed in the "Open file" dialog, but trying to open fails. > This regularly drives me crazy. Let's not make Python not work this way too, > or, even worse, not even display those filenames. How can it *regularly* drive you crazy when "the majority of fie names [...] encoded correctly" (as you assert above)? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 16:04:09 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 07:04:09 -0700 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: On Tue, Sep 30, 2008 at 2:28 AM, Antoine Pitrou wrote: > Adam Olsen gmail.com> writes: >> >> The only way to display that file would be to transform it into some >> other valid unicode string. However, as that string is already valid, >> you've just made any files named after it impossible to open. 
> > Not if those valid sequences are also properly escaped to avoid collisions. > That's what utf-8b claims to do. > > My view of utf-8b is that if is not really a new codec, but an escaping phase > added in front of utf-8, such that illegal byte sequences get converted to legal > byte sequences. This is how e.g. XML-escaping works ("&" -> "&", etc.). The > only difficulty being in choosing sufficiently rare escaping sequences, so that > readability is not impacted. The problem is that there's no way (at least nobody has proposed one AFAICT) to tell whether the escaping has been applied. When reading XML, you *know* that you are expected to unescape exactly one level of & escaping. You would never find XML with the unescaping already done for you. But the output of utf-8b is indistinguishable from regular utf-8 so you don't know whether you need to unescape things. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From victor.stinner at haypocalc.com Tue Sep 30 16:11:02 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 30 Sep 2008 16:11:02 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> Message-ID: <200809301611.03027.victor.stinner@haypocalc.com> Le Tuesday 30 September 2008 15:53:09 Guido van Rossum, vous avez ?crit?: > On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. L?wis" wrote: > >> Change the default file system encoding to store bytes in Unicode is > >> like introducing a new Python type: . > > > > Exactly. Seems like the best solution to me, despite your polemics. > > Martin, I don't understand why you are in favor of storing raw bytes > encoded as Latin-1 in Unicode string objects, which clearly gives rise > to mojibake. In the past you have always been staunchly opposed to API > changes or practices that could lead to mojibake (and you had me quite > convinced). 
If I understood correctly, the goal of Python3 is the clear *separation* of bytes and characters. Storing bytes in Unicode is practical because existing code does not need to change, but it doesn't fix the problem; it just moves the problems so that they are raised later. I didn't get an answer to my question: what is the result + ? I guess that the result is instead of raising an error (invalid types). So again: why introduce a new type instead of reusing existing Python types? -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From guido at python.org Tue Sep 30 16:05:58 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 07:05:58 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <48E20017.3020405@egenix.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <48E20017.3020405@egenix.com> Message-ID: On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg wrote: > On 2008-09-30 08:00, Martin v. Löwis wrote: >>> Change the default file system encoding to store bytes in Unicode is like >>> introducing a new Python type: . >> >> Exactly. Seems like the best solution to me, despite your polemics. > > Not a bad idea... have os.listdir() return Unicode subclasses that work > like file handles, ie. they have an extra buffer that holds the original > bytes value received from the underlying C API. > > Passing these handles to open() would then do the right thing by using > whatever os.listdir() got back from the file system to open the file, > while still providing a sane way to display the filename, e.g. using > question marks for the invalid characters. > > The only problem with this approach is concatenation of such handles > to form pathnames, but then perhaps those concatenations could just > work on the bytes value as well (I don't know of any OS that uses non- > ASCII path separators).
While this seems to work superficially I expect an infinite number of problems caused by code that doesn't understand this subclass. You are hinting at this in your last paragraph. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 16:32:38 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 07:32:38 -0700 Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com> References: <200809300247.20349.victor.stinner@haypocalc.com> <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com> Message-ID: On Tue, Sep 30, 2008 at 6:21 AM, wrote: > On 12:47 am, victor.stinner at haypocalc.com wrote: > > This is the most sane contribution I've seen so far :). Thanks. I'll review it later today (after coffee+breakfast :) and will apply it assuming the code is reasonably sane, otherwise I'll go around with Victor until it is to my satisfaction. >> See attached patch: python3_bytes_filename.patch >> >> Using the patch, you will get: >> - open() support bytes >> - listdir(unicode) -> only unicode, *skip* invalid filenames >> (as asked by Guido) > > Forgive me for being a bit dense, but I couldn't find this hunk in the > patch. Do I understand properly that (listdir(bytes) -> bytes)? > > If so, this seems basically sane to me, since it provides text behavior > where possible and allows more sophisticated filesystem wrappers (i.e. > Twisted's FilePath, Will McGugan's "FS") to do more tricky things, > separating filenames for display to the user and filenames for exchange with > the FS. >> >> - remove os.getcwdu() >> - create os.getcwdb() -> bytes >> - glob.glob() support bytes >> - fnmatch.filter() support bytes >> - posixpath.join() and posixpath.split() support bytes > > It sounds like maybe there should be some 2to3 fixers in here somewhere, > too? 
Not necessarily as part of this patch, but somewhere related? I don't > know what they would do, but it does seem quite likely that code which was > previously correct under 2.6 (using bytes) would suddenly be mixing bytes > and unicode with these APIs. Doesn't seem easy for 2to3 to recognize such cases. If 2.6 weren't pretty much released already I'd ask to add os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer that converts os.getcwdu() to os.getcwd(), leaves os.getcwd() alone (benefit of the doubt) and leaves os.getcwdb() alone as well (a strong indication the user meant to get bytes in the 3.x version of their code. (Similar to using bytes instead of str in 2.6 even though they mean the same thing there -- they will be properly separated in 3.x.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Tue Sep 30 17:14:12 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 30 Sep 2008 11:14:12 -0400 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <48E16155.1040209@v.loewis.de> <47C8A10B-1D2F-4DCB-BACE-BE2D513A11D3@fuhm.net> Message-ID: <3B962D7E-076B-4871-99A8-A3C6220592CD@fuhm.net> On Sep 29, 2008, at 7:50 PM, Adam Olsen wrote: > I'd rather the 1% of cases that need to handle bad file names make an > explicit effort to do so, via alternate byte APIs or (if necessary) > the 8859-1 hack. So are you okay with python failing to run properly if the current directory has strange bytes in it? What if something odd is on the PATH environment variable? So much for being able to access os.environ['PATH']? I just don't see how that's okay behavior of a programming language to fail so drastically. 
Unless you're proposing that nothing in python itself ever use the Unicode file API...but if you're proposing that, it kinda seems silly to even have it. James From mal at egenix.com Tue Sep 30 17:20:42 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 30 Sep 2008 17:20:42 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <48E20017.3020405@egenix.com> Message-ID: <48E243CA.1090604@egenix.com> On 2008-09-30 16:05, Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg wrote: >> On 2008-09-30 08:00, Martin v. L?wis wrote: >>>> Change the default file system encoding to store bytes in Unicode is like >>>> introducing a new Python type: . >>> Exactly. Seems like the best solution to me, despite your polemics. >> Not a bad idea... have os.listdir() return Unicode subclasses that work >> like file handles, ie. they have an extra buffer that holds the original >> bytes value received from the underlying C API. >> >> Passing these handles to open() would then do the right thing by using >> whatever os.listdir() got back from the file system to open the file, >> while still providing a sane way to display the filename, e.g. using >> question marks for the invalid characters. >> >> The only problem with this approach is concatenation of such handles >> to form pathnames, but then perhaps those concatenations could just >> work on the bytes value as well (I don't know of any OS that uses non- >> ASCII path separators). > > While this seems to work superficially I expect an infinite number of > problems caused by code that doesn't understand this subclass. You are > hinting at this in your last paragraph. 
Well, to some extent Unicode objects themselves already implement such a strategy: the default encoded bytes object basically provides the low-level interfacing value. But I agree, the approach is not foolproof. In the end, I think it's better not to be clever and just return the filenames that cannot be decoded as bytes objects in os.listdir(). Passing those to open() will then open the files as expected; in most other cases the application will have to provide explicit conversions in whatever way best fits the application. Also note that os.listdir() isn't the only source of filenames. You often read them from a file, a database, some socket, etc, so letting the application decide what to do is not asking too much, IMHO. -- Marc-Andre Lemburg eGenix.com From janssen at parc.com Tue Sep 30 17:47:30 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 30 Sep 2008 08:47:30 PDT Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: <200809300247.20349.victor.stinner@haypocalc.com> References: <200809300247.20349.victor.stinner@haypocalc.com> Message-ID: <58953.1222789650@parc.com> Victor Stinner wrote: > - listdir(unicode) -> only unicode, *skip* invalid filenames > (as asked by Guido) Is there an option listdir(bytes) which will return *all* filenames (as byte sequences)?
Otherwise, this seems troubling to me; *something* should be returned for filenames which can't be represented, even if it's only None. Bill From glyph at divmod.com Tue Sep 30 15:21:51 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Tue, 30 Sep 2008 13:21:51 -0000 Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: <200809300247.20349.victor.stinner@haypocalc.com> References: <200809300247.20349.victor.stinner@haypocalc.com> Message-ID: <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com> On 12:47 am, victor.stinner at haypocalc.com wrote: This is the most sane contribution I've seen so far :). >See attached patch: python3_bytes_filename.patch > >Using the patch, you will get: >- open() support bytes >- listdir(unicode) -> only unicode, *skip* invalid filenames > (as asked by Guido) Forgive me for being a bit dense, but I couldn't find this hunk in the patch. Do I understand properly that (listdir(bytes) -> bytes)? If so, this seems basically sane to me, since it provides text behavior where possible and allows more sophisticated filesystem wrappers (i.e. Twisted's FilePath, Will McGugan's "FS") to do more tricky things, separating filenames for display to the user and filenames for exchange with the FS. >- remove os.getcwdu() >- create os.getcwdb() -> bytes >- glob.glob() support bytes >- fnmatch.filter() support bytes >- posixpath.join() and posixpath.split() support bytes It sounds like maybe there should be some 2to3 fixers in here somewhere, too? Not necessarily as part of this patch, but somewhere related? I don't know what they would do, but it does seem quite likely that code which was previously correct under 2.6 (using bytes) would suddenly be mixing bytes and unicode with these APIs. 
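A minimal sketch of the type-preserving behavior discussed in this message (listdir(bytes) -> bytes, listdir(str) -> str). It is written against the os module as found in current Python 3 releases, not against Victor's patch, so details may differ from what was under review at the time. The scratch directory and the file name example.txt are made up for illustration, and os.fsencode() is a later addition (Python 3.2) used here only to obtain a bytes path.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (str_names, bytes_names) for one directory given as str."""
    # A str argument yields str names.
    str_names = sorted(os.listdir(directory))
    # A bytes argument yields bytes names, as returned by the OS.
    bytes_names = sorted(os.listdir(os.fsencode(directory)))
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as scratch:
    # Hypothetical file, created only so the listing is not empty.
    with open(os.path.join(scratch, "example.txt"), "wb") as handle:
        handle.write(b"hello\n")

    str_names, bytes_names = list_both_ways(scratch)

    # Joining bytes with bytes stays in bytes, and open() accepts the result.
    joined = os.path.join(os.fsencode(scratch), bytes_names[0])
    with open(joined, "rb") as handle:
        payload = handle.read()
```

Mixing the two kinds in one os.path.join() call raises TypeError, which is the str/bytes mixing concern raised above for code ported from 2.x.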
From foom at fuhm.net Tue Sep 30 18:20:00 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 30 Sep 2008 12:20:00 -0400 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <87od26e3an.fsf@xemacs.org> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> Message-ID: <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote: >> Except...that one over there. That's the whole point of UTF-8b: >> correctly encoded names get decoded correctly and readably, and the >> other cases get decoded into something unique that cannot possibly >> conflict. > > Sure. But there are lots of other operations besides encoding and > decoding that we do with filenames. How do you display a filename? > How about concatenating them to make paths? What do you do when you > want to mix a filename with other, well-formed strings? If you keep > the filenames internally in UTF-8b, you're going to need what amounts > to a whole string API for dealing with them, aren't you? If you're > not doing that, how is UTF-8b represented? No, you keep the filenames internally in a PyUnicode object. All that stuff *works* in Python today, with a UTF-8b decoded string. Displaying a filename is encoding it into some other encoding. Like this: >>> '\x90\x90'.decode('utf-8b') u'\udc90\udc90' >>> u'\udc90\udc90'.encode('utf-8') '\xed\xb2\x90\xed\xb2\x90' So, that seems to work okay. Maybe I should try to display that in a web browser. Shows up as 2 "unknown character" glyphs. Perfect. If you want to mix a filename with other strings, you append them together, or use os.path, same as always. 
You don't need any new string API. Since from what I've tried, things seem to work, I'd really like to know what precisely does fail from the opponents of utf-8b. And again: if utf-8b isn't acceptable, because it does break things in some unknown-to-me way, I really can't imagine anything working but just going back to byte-string access as the only API. It's really not okay for the "obvious" APIs to be totally broken by unexpected input. Think os.getcwd(), sys.argv, os.environ. You can't just ignore bad files and call it done. James From guido at python.org Tue Sep 30 18:46:00 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 09:46:00 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <48E243CA.1090604@egenix.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <48E20017.3020405@egenix.com> <48E243CA.1090604@egenix.com> Message-ID: On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg wrote: > In the end, I think it's better not to be clever and just return > the filenames that cannot be decoded as bytes objects in os.listdir(). Unfortunately that's going to break most code that is using os.listdir(), so it's hardly an improved experience. > Passing those to open() will then open the files as expected, in most > other cases the application will have to provide explicit conversions > in whatever way best fits the application. In most cases the app will try to concatenate a pathname given as a string and then it will fail. > Also note that os.listdir() isn't the only source of filesnames. You > often read them from a file, a database, some socket, etc, so letting > the application decide what to do is not asking too much, IMHO. 
In all those cases, the code that reads them is responsible for picking an encoding or relying on a default encoding, and the resulting filenames are always expressed as text, not bytes. I don't think it's the same at all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 18:47:10 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 09:47:10 -0700 Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: <58953.1222789650@parc.com> References: <200809300247.20349.victor.stinner@haypocalc.com> <58953.1222789650@parc.com> Message-ID: On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen wrote: > Victor Stinner wrote: > >> - listdir(unicode) -> only unicode, *skip* invalid filenames >> (as asked by Guido) > > Is there an option listdir(bytes) which will return *all* filenames (as > byte sequences)? Otherwise, this seems troubling to me; *something* > should be returned for filenames which can't be represented, even if > it's only None. Yes, os.listdir() becomes polymorphic -- if you pass it a pathname in bytes the output is in bytes and it will return everything exactly as the underlying syscall returns it to you. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 18:57:21 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 09:57:21 -0700 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? 
In-Reply-To: <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> Message-ID: On Tue, Sep 30, 2008 at 9:20 AM, James Y Knight wrote: > > On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote: > >>> Except...that one over there. That's the whole point of UTF-8b: >>> correctly encoded names get decoded correctly and readably, and the >>> other cases get decoded into something unique that cannot possibly >>> conflict. >> >> Sure. But there are lots of other operations besides encoding and >> decoding that we do with filenames. How do you display a filename? >> How about concatenating them to make paths? What do you do when you >> want to mix a filename with other, well-formed strings? If you keep >> the filenames internally in UTF-8b, you're going to need what amounts >> to a whole string API for dealing with them, aren't you? If you're >> not doing that, how is UTF-8b represented? > > No, you keep the filenames internally in a PyUnicode object. All that stuff > *works* in Python today, with a UTF-8b decoded string. > > Displaying a filename is encoding it into some other encoding. Like this: >>>> '\x90\x90'.decode('utf-8b') > u'\udc90\udc90' >>>> u'\udc90\udc90'.encode('utf-8') > '\xed\xb2\x90\xed\xb2\x90' > > So, that seems to work okay. Maybe I should try to display that in a web > browser. Shows up as 2 "unknown character" glyphs. Perfect. Well browsers are of course the epitome of lenient parsing. Try incorporating one of these things to an XML file and see if standard-conforming XML product likes it. 
> If you want to mix a filename with other strings, you append them together, > or use os.path, same as always. You don't need any new string API. > > Since from what I've tried, things seem to work, I'd really like to know > what precisely does fail from the opponents of utf-8b. Another problem I have with UTF-8b is its lack of standardization. > And again: if utf-8b isn't acceptable, because it does break things in some > unknown-to-me way, I really can't imagine anything working but just going > back to byte-string access as the only API. It's really not okay for the > "obvious" APIs to be totally broken by unexpected input. Think os.getcwd(), > sys.argv, os.environ. You can't just ignore bad files and call it done. Actually that is what you *have* to do with the filesystem-as-a-black-box model. Filesystems reserve the right to fail occasionally and there's nothing you can do to prevent it -- it would be unacceptable if the entire disk would stop working because it had one bad block (unless the bad block is in some kind of master table) so you just have to deal with it, and you can't wish the problems away by insisting on a perfect abstraction. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tjreedy at udel.edu Tue Sep 30 19:29:01 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 30 Sep 2008 13:29:01 -0400 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300129.24972.victor.stinner@haypocalc.com> Message-ID: Guido van Rossum wrote: > On Mon, Sep 29, 2008 at 8:55 PM, Terry Reedy wrote: >>> On Monday 29 September 2008 19:06:01, Guido van Rossum wrote: >>>> I know I keep flipflopping on this one, but the more I think about it >>>> the more I believe it is better to drop those names than to raise an >>>> exception.
Otherwise a "naive" program that happens to use >>>> os.listdir() can be rendered completely useless by a single non-UTF-8 >>>> filename. Consider the use of os.listdir() by the glob module. If I am >>>> globbing for *.py, why should the presence of a file named b'\xff' >>>> cause it to fail? >> To avoid silent skipping, is it possible to drop 'unreadable' names, issue a >> warning (instead of exception), and continue to completion? >> "Warning: unreadable filename skipped; see PyWiki/UnreadableFilenames" > > That would be annoying as hell in most cases. OK. Put one warning in the docs at the top of OS/Files and Directories: Note: On Unix, illegal filenames (and the files they name) are silently ignored by many of the functions below. -- but perhaps with more specific info, such as what is illegal, which functions, and how to fix outside of Python. > I consider the dropping of unreadable names similar to the suppression > of "hidden" files by various operating systems. That is documented, sometimes annoying, and reversible when it is. Python should at least document doing something similar. tjr From qrczak at knm.org.pl Tue Sep 30 19:37:36 2008 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Tue, 30 Sep 2008 19:37:36 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? 
In-Reply-To: <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> Message-ID: <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com> 2008/9/30 James Y Knight : >>>> u'\udc90\udc90'.encode('utf-8') > '\xed\xb2\x90\xed\xb2\x90' This is wrong: UTF-8 (like other UTF-x) encodes Unicode scalar values, not Unicode code points, i.e. surrogates as such are unencodable. '\xed\xb2\x90' is invalid UTF-8. I've experimentally implemented (not for Python) a different escaping scheme with a similar goal as UTF-8b: undecodable bytes are prefixed with U+0000 instead of being converted to unpaired surrogates, and '\x00' decodes as U+0000 U+0000. Glib provides some functions to convert filenames for display, in a way which is not necessarily reversible (includes some hex escapes in ASCII). -- Marcin Kowalczyk qrczak at knm.org.pl http://qrnik.knm.org.pl/~qrczak/ From janssen at parc.com Tue Sep 30 19:41:05 2008 From: janssen at parc.com (Bill Janssen) Date: Tue, 30 Sep 2008 10:41:05 PDT Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: References: <200809300247.20349.victor.stinner@haypocalc.com> <58953.1222789650@parc.com> Message-ID: <61658.1222796465@parc.com> Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen wrote: > > Victor Stinner wrote: > > > >> - listdir(unicode) -> only unicode, *skip* invalid filenames > >> (as asked by Guido) > > > > Is there an option listdir(bytes) which will return *all* filenames (as > > byte sequences)? 
Otherwise, this seems troubling to me; *something* > > should be returned for filenames which can't be represented, even if > > it's only None. > > Yes, os.listdir() becomes polymorphic -- if you pass it a pathname in > bytes the output is in bytes and it will return everything exactly as > the underlying syscall returns it to you. What about everything else? For instance, if I call os.path.join(, ), I presume I get back a which can be passed to os.listdir() to retrieve the contents of that directory. Bill From guido at python.org Tue Sep 30 19:45:55 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 10:45:55 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: On Tue, Sep 30, 2008 at 10:28 AM, Georg Brandl wrote: >> How can it *regularly* drive you crazy when "the majority of fie names >> [...] encoded correctly" (as you assert above)? > > Because Office files are a) often named with long, seemingly descriptive > filenames, which invariably means umlauts in German, and b) often sent around > between systems, creating encoding problems. Gotcha. > Having seen how much controversy returning an invalid Unicode string sparks, > and given that it really isn't obvious to the newbie either, I think I now agree > that dropping filenames when calling a listdir() that returns Unicode filenames > is the best solution. I'm a little uneasy with having one function for both > bytes and Unicode return, because that kind of str/unicode mixing I thought we > had left behind in 2.x, but of course can live with it. Well, the *current* Py3k behavior where it may return a mix of bytes and str instances is really messy, and likely to trip up most code that doesn't expect it in a way that makes it hard to debug. 
However the *proposed* behavior (returns bytes if the arg was bytes, and returns str when the arg was str) is IMO sane, and no different than the polymorphism found in len() or many builtin operations. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Tue Sep 30 19:47:06 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 30 Sep 2008 13:47:06 -0400 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com> Message-ID: On Sep 30, 2008, at 1:37 PM, Marcin 'Qrczak' Kowalczyk wrote: > I've experimentally implemented (not for Python) a different escaping > scheme with a similar goal as UTF-8b: undecodable bytes are prefixed > with U+0000 instead of being converted to unpaired surrogates, and > '\x00' decodes as U+0000 U+0000. > > Glib provides some functions to convert filenames for display, in a > way which is not necessarily reversible (includes some hex escapes in > ASCII). This sounds quite promising: 0 is an invalid character in the filesystem API, in the environment, and in command lines, yet not in a unicode string. Good thinking! James From mal at egenix.com Tue Sep 30 19:50:37 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 30 Sep 2008 19:50:37 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <48E20017.3020405@egenix.com> <48E243CA.1090604@egenix.com> Message-ID: <48E266ED.9020902@egenix.com> On 2008-09-30 18:46, Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 8:20 AM, M.-A. Lemburg wrote: >> In the end, I think it's better not to be clever and just return >> the filenames that cannot be decoded as bytes objects in os.listdir(). > > Unfortunately that's going to break most code that is using > os.listdir(), so it's hardly an improved experience. Right, but this also signals a problem to the application and the application is in the best position to determine a proper work-around. >> Passing those to open() will then open the files as expected, in most >> other cases the application will have to provide explicit conversions >> in whatever way best fits the application. > > In most cases the app will try to concatenate a pathname given as a > string and then it will fail. True, and that's the right thing to do in those cases. The application will have to deal with the problem, e.g. convert the path to bytes and retry the joining, or convert the bytes string to Latin-1 and then convert the result back to bytes (using Latin-1) for passing it to open() (which will of course only work if there are no non-Latin-1 characters in the path dir), or apply a different filename encoding based on the path and then retry to convert the bytes filename into Unicode, or ask the user what to do, etc. There are many possibilities to solve the problem, apply a work-around, or inform the user of ways to correct it. >> Also note that os.listdir() isn't the only source of filesnames. 
You >> often read them from a file, a database, some socket, etc, so letting >> the application decide what to do is not asking too much, IMHO. > > In all those cases, the code that reads them is responsible for > picking an encoding or relying on a default encoding, and the > resulting filenames are always expressed as text, not bytes. I don't > think it's the same at all. What I was trying to say is that you run into the same problem in other places as well. Trying to have os.listdir() implement some strategy is not going to solve the problem at large. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 30 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From guido at python.org Tue Sep 30 19:54:15 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 10:54:15 -0700 Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: <61658.1222796465@parc.com> References: <200809300247.20349.victor.stinner@haypocalc.com> <58953.1222789650@parc.com> <61658.1222796465@parc.com> Message-ID: On Tue, Sep 30, 2008 at 10:41 AM, Bill Janssen wrote: > Guido van Rossum wrote: >> On Tue, Sep 30, 2008 at 8:47 AM, Bill Janssen wrote: >> > Victor Stinner wrote: >> > >> >> - listdir(unicode) -> only unicode, *skip* invalid filenames >> >> (as asked by Guido) >> > >> > Is there an option listdir(bytes) which will return *all* filenames (as >> > byte sequences)? 
Otherwise, this seems troubling to me; *something* >> > should be returned for filenames which can't be represented, even if >> > it's only None. >> >> Yes, os.listdir() becomes polymorphic -- if you pass it a pathname in >> bytes the output is in bytes and it will return everything exactly as >> the underlying syscall returns it to you. > > What about everything else? For instance, if I call > os.path.join(, ), I presume I get back a which can > be passed to os.listdir() to retrieve the contents of that directory. Yeah, Victor's code at http://bugs.python.org/issue3187 (file python3_bytes_filename.patch) does this. More needs to be done but it's a start. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 19:56:45 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 10:56:45 -0700 Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3 In-Reply-To: <20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com> References: <200809300247.20349.victor.stinner@haypocalc.com> <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com> <20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com> Message-ID: On Tue, Sep 30, 2008 at 10:59 AM, wrote: > On 02:32 pm, guido at python.org wrote: >> If 2.6 weren't pretty much released already I'd ask to add >> os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer >> that converts os.getcwdu() to os.getcwd(), leaves os.getcwd() alone >> (benefit of the doubt) and leaves os.getcwdb() alone as well (a strong >> indication the user meant to get bytes in the 3.x version of their >> code. (Similar to using bytes instead of str in 2.6 even though they >> mean the same thing there -- they will be properly separated in 3.x.) > > In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the > "benefit of the doubt" case? 
It could always be added to 2.7, and the > parity release of 2to3 could have a --2.7 switch that would modify the > behavior of this and other fixers. I'm not sure what you're proposing. *My* proposal is that 2to3 changes os.getcwdu() calls to os.getcwd() and leaves os.getcwd() calls alone -- there's no way to tell whether os.getcwdb() would be a better match, and for portable code, it won't be (since os.getcwdb() is a Unix-only thing). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Tue Sep 30 20:13:49 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 30 Sep 2008 20:13:49 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <200809291407.55291.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: Victor Stinner schrieb: > Hi, > > After reading the previous discussion, here is new proposition. > > Python 2.x and Windows are not affected by this issue. Only Python3 on POSIX > (eg. Linux or *BSD) is affected. > > Some system are broken, but Python have to be able to open/copy/move/remove > files with an "invalid filename". > > The issue can wait for Python 3.0.1 / 3.1. > > Windows > ------- > > On Windows, we might reject bytes filenames for all file operations: open(), > unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) Since I've seen no objections to this yet: please no. If we offer a "lower-level" bytes filename API, it should work for all platforms. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
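For the os.getcwd() / os.getcwdb() split described earlier in this exchange, a small sketch of how the two calls relate in current Python 3 releases. os.fsencode() and os.fsdecode() arrived after this thread (Python 3.2), so their use here is an assumption relative to the discussion and not part of the patch under review.

```python
import os

# Text form of the current directory (what 2to3 would map getcwdu() to).
cwd_text = os.getcwd()

# Bytes form, for code that really wants the raw name.
cwd_bytes = os.getcwdb()

# Converting text -> bytes -> text with the filesystem encoding
# is expected to give back the same string.
round_trip = os.fsdecode(os.fsencode(cwd_text))
```

The point of the sketch is only that each call has a single, predictable return type; which one a given program should use is the question being debated in the thread.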
From guido at python.org Tue Sep 30 20:20:22 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 11:20:22 -0700 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl wrote: > Victor Stinner wrote: >> On Windows, we might reject bytes filenames for all file operations: open(), >> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) > > Since I've seen no objections to this yet: please no. If we offer a > "lower-level" bytes filename API, it should work for all platforms. I'm not sure either way. I've heard it claimed that Windows filesystem APIs use Unicode natively. Does Python 3.0 on Windows currently support filenames expressed as bytes? Are they encoded first before passing to the Unicode APIs? Using what encoding? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Tue Sep 30 20:25:54 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 30 Sep 2008 14:25:54 -0400 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> Message-ID: On Sep 30, 2008, at 12:57 PM, Guido van Rossum wrote: >> And again: if utf-8b isn't acceptable, because it does break things >> in some >> unknown-to-me way, I really can't imagine anything working but just >> going >> back to byte-string access as the only API. It's really not okay >> for the >> "obvious" APIs to be totally broken by unexpected input.
Think >> os.getcwd(), >> sys.argv, os.environ. You can't just ignore bad files and call it >> done. > > Actually that is what you *have* to do with the > filesystem-as-a-black-box model. Filesystems reserve the right to fail > occasionally and there's nothing you can do to prevent it -- it would > be unacceptable if the entire disk would stop working because it had > one bad block (unless the bad block is in some kind of master table) > so you just have to deal with it, and you can't wish the problems away > by insisting on a perfect abstraction. What I meant is that ignoring certain files is not nearly good enough to solve the problem. python -c "import sys; print(sys.argv)" "$(echo -e 'filename\x90\x90')" -> python3 fails to start. cd "$(echo -e 'dir\x90')" # Assume said dir exists python -> python3 fails to start. PATH="$PATH:$(echo -e '/home/user/dir\x90')" python3 -c "import os; print(os.environ['PATH'])" -> nope, no PATH. Those aren't good behaviors, and can't be solved simply by pretending certain files don't exist. But please see the U+0000-escape alternative proposed by Marcin. It, unlike utf-8b, doesn't depend upon non-standard unicode, so maybe there won't be as much opposition to it. James From g.brandl at gmx.net Tue Sep 30 21:41:22 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 30 Sep 2008 21:41:22 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl wrote: >> Victor Stinner wrote: >>> On Windows, we might reject bytes filenames for all file operations: open(), >>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) >> >> Since I've seen no objections to this yet: please no. If we offer a >> "lower-level" bytes filename API, it should work for all platforms. > > I'm not sure either way.
I've heard it claim that Windows filesystem > APIs use Unicode natively. Does Python 3.0 on Windows currently > support filenames expressed as bytes? Are they encoded first before > passing to the Unicode APIs? Using what encoding? Oh, ok. I had assumed Windows just uses a fixed encoding without the problem of misencoded filenames. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From tjreedy at udel.edu Tue Sep 30 21:42:23 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 30 Sep 2008 15:42:23 -0400 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: Guido van Rossum wrote: > On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl wrote: >> Victor Stinner schrieb: >>> On Windows, we might reject bytes filenames for all file operations: open(), >>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) >> Since I've seen no objections to this yet: please no. If we offer a >> "lower-level" bytes filename API, it should work for all platforms. > > I'm not sure either way. I've heard it claim that Windows filesystem > APIs use Unicode natively. Does Python 3.0 on Windows currently > support filenames expressed as bytes? Are they encoded first before > passing to the Unicode APIs? Using what encoding? In 3.0rc1, the listdir doc needs updating: "os.listdir(path) Return a list containing the names of the entries in the directory. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory. Availability: Unix, Windows. On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects." 
s/Unicode/bytes/ at least for Windows. >>> os.listdir(b'.') [b'countries.txt', b'multeetest.py', b't1.py', b't1.pyc', b't2.py', b'tem', b'temp.py', b'temp.pyc', b'temp2.py', b'temp3.py', b'temp4.py', b'test.py', b'z', b'z.txt'] The bytes names do not work however: >>> t=open(b'tem') Traceback (most recent call last): File "", line 1, in t=open(b'tem') File "C:\Programs\Python30\lib\io.py", line 284, in __new__ return open(*args, **kwargs) File "C:\Programs\Python30\lib\io.py", line 184, in open raise TypeError("invalid file: %r" % file) TypeError: invalid file: b'tem' Is this what you were asking? tjr From qrczak at knm.org.pl Tue Sep 30 21:46:36 2008 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Tue, 30 Sep 2008 21:46:36 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com> References: <200809271404.25654.victor.stinner@haypocalc.com> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> <3f4107910809301037j1c5f7ac0p78fb5e02f4af1f36@mail.gmail.com> Message-ID: <3f4107910809301246j62ce1cb7n6401e6f3b303c46@mail.gmail.com> 2008/9/30 Marcin 'Qrczak' Kowalczyk : > I've experimentally implemented (not for Python) a different escaping > scheme with a similar goal as UTF-8b: undecodable bytes are prefixed > with U+0000 instead of being converted to unpaired surrogates, and > '\x00' decodes as U+0000 U+0000. This was not my idea: mono did that first. http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding "In short, it's a Glorious Hack. Rejoice. Or something." 
Note that there are many people, including the Unicode list, who consider this evil because they view this as a non-standard modification of UTF-8. I am undecided on how evil it is. (My implementation differs from mono by the strictness of what Unicode sequences can be encoded: mono encodes all and mine does not, OTOH mine is a bijection and mono is not. Both implementations decode all byte sequences of course.) -- Marcin Kowalczyk qrczak at knm.org.pl http://qrnik.knm.org.pl/~qrczak/ From martin at v.loewis.de Tue Sep 30 22:04:42 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 22:04:42 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> Message-ID: <48E2865A.3010404@v.loewis.de> Guido van Rossum wrote: > On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. L?wis" wrote: >>> Change the default file system encoding to store bytes in Unicode is like >>> introducing a new Python type: . >> Exactly. Seems like the best solution to me, despite your polemics. > > Martin, I don't understand why you are in favor of storing raw bytes > encoded as Latin-1 in Unicode string objects, which clearly gives rise > to mojibake. In the past you have always been staunchly opposed to API > changes or practices that could lead to mojibake (and you had me quite > convinced). True. I try to outweigh the need for simplicity in the API against the need to support all cases. So I see two solutions: a) support bytes as file names. Supports all cases, but complicates the API very much, by pervasively bringing bytes into the status of a character data type. IMO, this must be prevented at all costs. b) make character (Unicode) strings the only string type. Does not immediately support all cases, so some hacks are needed. 
However, even with the hacks, it preserves the simplicity of the API; the hacks then should ideally be limited to the applications that need it. On this side, I see the following approaches: 1. try to automatically embed non-representable characters into the Unicode strings, e.g. by using PUA characters. Reduces the amount of moji-bake, but produces a lot of difficult issues. 2. let applications that so desire access all file names in a uniform manner, at the cost of producing tons of moji-bake. In this case, I think moji-bake is unavoidable: it is just a plain flaw in the POSIX implementations (not the API or specification) that you can run into file names where you can't come up with the right rendering. Even for solution a), the resulting data cannot be displayed "correctly" in all cases. Currently, I favor b2, but haven't given up on b1, and they don't exclude each other. b2 is simple to implement, and delegates the choice between legible file names and universal access to all files to the application. Given the way Unix works, this is the most sensible choice, IMO: by default, Python should try to make file names legible, but stuff like backup applications should be implementable also - and they don't need legible file names. I think option a) will haunt us forever. People will ask for more and more features in the bytes type, eventually asking "give us Python 2.x strings back". It already starts: see #3982, where Benjamin asks to have .format added to bytes (for a reason unrelated to file names). Regards, Martin From solipsis at pitrou.net Tue Sep 30 22:11:41 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Sep 2008 20:11:41 +0000 (UTC) Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <48E2865A.3010404@v.loewis.de> Message-ID: Martin v.
Löwis v.loewis.de> writes: > > True. I try to outweigh the need for simplicity in the API against the > need to support all cases. So I see two solutions: > > a) (...) > > b) (...) By the way, doesn't all this controversy yearn for a PEP? From tjreedy at udel.edu Tue Sep 30 22:12:20 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 30 Sep 2008 16:12:20 -0400 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: Terry Reedy wrote: > Guido van Rossum wrote: >> I'm not sure either way. I've heard it claim that Windows filesystem >> APIs use Unicode natively. Does Python 3.0 on Windows currently >> support filenames expressed as bytes? Are they encoded first before >> passing to the Unicode APIs? Using what encoding? > [os.listdir(bytes) returns list of bytes, open(bytes) fails] More: The path functions also seem not to work: >>> op.abspath(b'tem') ... path = path.replace("/", "\\") TypeError: expected an object with the buffer interface The error message is a bit cryptic given that the problem is that the arguments to replace should be bytes instead of strings for a bytes path. .basename fails with ... while i and p[i-1] not in '/\\': TypeError: 'in ' requires string as left operand, not int os.rename, os.stat, os.mkdir, os.rmdir work. I presume the same is true for others that normally work on Windows.
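For illustration only: a minimal Python sketch of the "bytes in, bytes out; str in, str out" behavior discussed in this thread. It assumes a later Python 3 release in which os.listdir() and the os.path helpers accept bytes paths, which is not what 3.0rc1 did in the session above. The file name "tem" is borrowed from that session, and list_names is a hypothetical helper, not part of any library.

```python
import os
import tempfile


def list_names(directory):
    """Return the sorted entries of `directory`.

    Hypothetical helper: the element type of the result follows the
    type of the argument (str gives str, bytes gives bytes).
    """
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create one plain-ASCII file so that both views can name it.
    open(os.path.join(tmp, "tem"), "w").close()

    str_names = list_names(tmp)                 # str argument
    bytes_names = list_names(os.fsencode(tmp))  # bytes argument

    # The os.path helpers follow the same rule, provided the
    # components passed to them are all of one type.
    str_base = os.path.basename(os.path.join(tmp, "tem"))
    bytes_base = os.path.basename(os.path.join(os.fsencode(tmp), b"tem"))

print(str_names, bytes_names, str_base, bytes_base)
```

Mixing the two types in a single call (for example joining a bytes directory with a str file name) is expected to raise TypeError rather than guess an encoding.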
tjr From martin at v.loewis.de Tue Sep 30 22:22:07 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 22:22:07 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <200809301611.03027.victor.stinner@haypocalc.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <200809301611.03027.victor.stinner@haypocalc.com> Message-ID: <48E28A6F.8030604@v.loewis.de> > I didn't get an answer to my question: what is the result characters) stored in unicode> + ? I guess that the result is > instead of raising an error > (invalid types). So again: why introducing a new type instead of reusing > existing Python types? I didn't mean to introduce a new data type in the strict sense - merely to pass through undecodable bytes through the regular Unicode type. So the result of adding them is a regular Unicode string. Regards, Martin From martin at v.loewis.de Tue Sep 30 22:29:37 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 22:29:37 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> Message-ID: <48E28C31.6060606@v.loewis.de> Guido van Rossum wrote: > However > the *proposed* behavior (returns bytes if the arg was bytes, and > returns str when the arg was str) is IMO sane, and no different than > the polymorphism found in len() or many builtin operations. My concern still is that it brings the bytes type into the status of another character string type, which is really bad, and will require further modifications to Python for the lifetime of 3.x. 
This is because applications will then regularly use byte strings for file names on Unix, and regular strings on Windows, and then expect the program to work the same without further modifications. The next question then will be environment variables and command line arguments, for which we then should provide two versions (e.g. sys.argv and sys.argvb; for os.environ, os.environ["PATH"] could mean something different from os.environ[b"PATH"]). And so on (passwd/group file, Tkinter, ...) Regards, Martin From martin at v.loewis.de Tue Sep 30 22:45:55 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 22:45:55 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <48E29003.1010703@v.loewis.de> > I'm not sure either way. I've heard it claim that Windows filesystem > APIs use Unicode natively. Does Python 3.0 on Windows currently > support filenames expressed as bytes? Yes, it does (at least, os.open, os.stat support them, builtin open doesn't). > Are they encoded first before > passing to the Unicode APIs? Using what encoding? They aren't passed to the Unicode (W) APIs (by Python). Instead, they are passed to the "ANSI" (A) APIs (i.e. CP_ACP APIs). On Windows NT+, that API then converts it to Unicode through the CP_ACP (aka "mbcs") encoding; this is inside the system DLLs. CP_ACP is a lossy encoding (from Unicode to bytes): Microsoft uses replacement characters if they can, starting with similarly-looking characters, and falling back to question marks. 
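A rough sketch of the lossy conversion described above. It does not call the Windows "A" APIs. It assumes cp1252 as a stand-in for CP_ACP and uses Python's errors="replace" handler, which reproduces only the question-mark fallback and not the substitution of similar-looking characters. to_ansi and the sample file name are hypothetical.

```python
def to_ansi(name, codepage="cp1252"):
    """Encode `name` lossily: unrepresentable characters become '?'.

    Hypothetical helper; cp1252 is only an assumed example of an
    "ANSI" code page.
    """
    return name.encode(codepage, errors="replace")


# Two Latin-1 letters (representable) plus two CJK characters (not).
original = "r\u00e9sum\u00e9_\u65e5\u672c.txt"

encoded = to_ansi(original)
round_tripped = encoded.decode("cp1252")

print(encoded)
print(round_tripped == original)
```

Because the two CJK characters collapse to the same byte, two different file names can map to one byte string, which is why a name obtained this way may not be usable to open the file again.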
Regards, Martin From tciny at dword.org Tue Sep 30 22:42:59 2008 From: tciny at dword.org (Jan Althaus) Date: Tue, 30 Sep 2008 21:42:59 +0100 Subject: [Python-3000] Request for documentation: PyModuleDef Message-ID: <215BD948-5392-4CDD-AF82-7CCCBEEDD9D7@dword.org> Please correct me if I'm wrong, but it doesn't seem like there is a full documentation of PyModuleDef's members available? While some of them are intuitive, others aren't. The usage of m_size in particular isn't clear to me. I understand this is the size of additional per-interpreter storage, however I'm not sure how this translates to code; both in terms of declaration and functions such as PyModule_Create (e.g. how/when are the additional members/storage initialised?) Do you reckon it would make sense to add an example for such a case to the Embedding and Extending part of the docs? Ta! Jan From guido at python.org Tue Sep 30 23:06:31 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 14:06:31 -0700 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: On Tue, Sep 30, 2008 at 12:42 PM, Terry Reedy wrote: > Guido van Rossum wrote: >> >> On Tue, Sep 30, 2008 at 11:13 AM, Georg Brandl wrote: >>> >>> Victor Stinner schrieb: >>>> >>>> On Windows, we might reject bytes filenames for all file operations: >>>> open(), >>>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) >>> >>> Since I've seen no objections to this yet: please no. If we offer a >>> "lower-level" bytes filename API, it should work for all platforms. >> >> I'm not sure either way. I've heard it claim that Windows filesystem >> APIs use Unicode natively. Does Python 3.0 on Windows currently >> support filenames expressed as bytes? Are they encoded first before >> passing to the Unicode APIs? Using what encoding? 
> > In 3.0rc1, the listdir doc needs updating: > "os.listdir(path) > Return a list containing the names of the entries in the directory. The list > is in arbitrary order. It does not include the special entries '.' and '..' > even if they are present in the directory. Availability: Unix, Windows. > > On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will > be a list of Unicode objects." > > s/Unicode/bytes/ at least for Windows. > >>>> os.listdir(b'.') > [b'countries.txt', b'multeetest.py', b't1.py', b't1.pyc', b't2.py', b'tem', > b'temp.py', b'temp.pyc', b'temp2.py', b'temp3.py', b'temp4.py', b'test.py', > b'z', b'z.txt'] > > The bytes names do not work however: > >>>> t=open(b'tem') > Traceback (most recent call last): > File "", line 1, in > t=open(b'tem') > File "C:\Programs\Python30\lib\io.py", line 284, in __new__ > return open(*args, **kwargs) > File "C:\Programs\Python30\lib\io.py", line 184, in open > raise TypeError("invalid file: %r" % file) > TypeError: invalid file: b'tem' > > Is this what you were asking? No, that's because bytes is missing from the explicit list of allowable types in io.open. Victor has a one-line trivial patch for this. Could you try this though? >>> import _fileio >>> _fileio._FileIO(b'tem') -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 23:22:11 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 14:22:11 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <48E2865A.3010404@v.loewis.de> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <48E2865A.3010404@v.loewis.de> Message-ID: On Tue, Sep 30, 2008 at 1:04 PM, "Martin v. L?wis" wrote: > Guido van Rossum wrote: >> On Mon, Sep 29, 2008 at 11:00 PM, "Martin v. 
L?wis" wrote: >>>> Change the default file system encoding to store bytes in Unicode is like >>>> introducing a new Python type: . >>> Exactly. Seems like the best solution to me, despite your polemics. >> >> Martin, I don't understand why you are in favor of storing raw bytes >> encoded as Latin-1 in Unicode string objects, which clearly gives rise >> to mojibake. In the past you have always been staunchly opposed to API >> changes or practices that could lead to mojibake (and you had me quite >> convinced). > > True. I try to outweigh the need for simplicity in the API against the > need to support all cases. So I see two solutions: > > a) support bytes as file names. Supports all cases, but complicates > the API very much, by pervasively bringing bytes into the status > of a character data type. IMO, this must be prevented at all costs. That's a matter of opinion. I would also like to point out that it is in fact already supported by the system calls. io.open() doesn't, but that's a wrapper around _fileio._FileIO which does support bytes. All other syscalls already do the right thing (even readlink()!) except os.listdir(), which returns a mixture of bytes and str values (which is horrible) and os.getcwd() which needs a bytes equivalent. Victor's patch addresses all these issues. Victor's patch also tries to fix glob.py, fnmatch.py, and posixpath.py. That is more debatable, because this might be the start of a never-ending project. OTOH we have precedents, e.g. the re module similarly supports both bytes and unicode (and makes an effort to avoid mixing them). > b) make character (Unicode) strings the only string type. Does not > immediately support all cases, so some hacks are needed. However, > even with the hacks, it preserves the simplicity of the API; the > hacks then should ideally be limited to the applications that need > it. On this side, I see the following approaches: > 1. 
try to automatically embed non-representable characters into > the Unicode strings, e.g. by using PUA characters. Reduces > the amount of moji-bake, but produces a lot of difficult issues. > 2. let applications that desire so access all file names in a > uniform manner, at the cost of producing tons of moji-bake > > In this case, I think moji-bake is unavoidable: it is just a plain > flaw in the POSIX implementations (not the API or specification) that > you can run into file names where you can't come up with the right > rendering. Even for solution a), the resulting data cannot > be displayed "correctly" in all cases. But I still like the ultimate solution to displaying names for (a) better: if it's not decodable, display it as the repr() of a bytes object. (Which happens to be its str() as well.) > Currently, I favor b2, but haven't given up on b1, and they don't > exclude each other. b2 is simple to implement, and delegates the > choice between legible file names and universal access to all files > to the application. Given the way Unix works, this is the most sensible > choice, IMO: by default, Python should try to make file names legible, > but stuff like backup applications should be implementable also - > and they don't need legible file names. I don't believe that an application-wide choice is safe. For example the tempfile module manipulates filenames (at least for NamedTemporaryFile) and I think it would be wrong if it were affected by such a global setting. (E.g. the user could pass a suffix argument containing Unicode characters outside Latin-1.) > I think option a) will hunt us forever. People will ask for more and > more features in the bytes type, eventually asking "give us Python > 2.x strings back". It already starts: see #3982, where Benjamin > asks to have .format added to bytes (for a reason unrelated to file > names). 
I'm not so worried about feature requests for the bytes type unrelated to filesystems; we can either grant them or not, and I am actually in many cases in favor of granting them -- just like we support bytes in the re module as I already mentioned above. The bytes and str types have intentionally similar APIs, because they have similar structure, and even somewhat similar semantics (b'ABC' and 'ABC' have related meanings even if there are subtle differences). I am also encouraged by Glyph's support for (a). He has a lot of practical experience. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 23:24:31 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 14:24:31 -0700 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: On Tue, Sep 30, 2008 at 1:12 PM, Terry Reedy wrote: > Terry Reedy wrote: >> >> Guido van Rossum wrote: > >>> I'm not sure either way. I've heard it claim that Windows filesystem >>> APIs use Unicode natively. Does Python 3.0 on Windows currently >>> support filenames expressed as bytes? Are they encoded first before >>> passing to the Unicode APIs? Using what encoding? > >> [os.listdir(bytes) returns list of bytes, open(bytes) fails] > > More: > > The path functions seem also do not work: > >>>> op.abspath(b'tem') > ... > path = path.replace("/", "\\") > TypeError: expected an object with the buffer interface > > The error message is a bit cryptic given that the problem is that the > arguments to replace should be bytes instead of strings for a bytes path. > > .basename fails with > ... > while i and p[i-1] not in '/\\': > TypeError: 'in ' requires string as left operand, not int > > os.rename, os.stat, os.mkdir, os.rmdir work. I presume same is true for > others that normally work on windows. It looks roughly like the system calls do support bytes (using what encoding?) 
but the Python code in os.path doesn't. This is the same as the status quo on Linux. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Tue Sep 30 23:31:34 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Oct 2008 07:31:34 +1000 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> Message-ID: <48E29AB6.908@gmail.com> James Y Knight wrote: > Those aren't good behaviors, and can't be solved simply by pretending > certain files don't exist. A couple of output comparisons for two of James's examples (system Python is 2.5.3, the Python : $ python -V Python 2.5.2 $ python -c "import sys; print sys.argv" "$(echo -e 'filename\x90\x90')" ['-c', 'filename\x90\x90'] $ python -c "import os; print os.environ['DUMMY']" filename?? $ ./python -V Python 3.0b3+ $ ./python -c "import sys; print(sys.argv)" "$(echo -e 'filename\x90\x90')" Could not convert argument 3 to str $ ./python -c "import os; print(os.environ['DUMMY'])" Traceback (most recent call last): File "", line 1, in File "/home/ncoghlan/devel/py3k/Lib/os.py", line 389, in __getitem__ return self.data[self.keymap(key)] KeyError: 'DUMMY' (Is there a bug report for these yet?) I'm also starting to wonder if allowing mixed types might be the way to go for these interfaces - leaving the bytes objects in place if the Unicode decode operation fails. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From martin at v.loewis.de Tue Sep 30 23:33:40 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 30 Sep 2008 23:33:40 +0200 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E1C097.8030309@v.loewis.de> <48E2865A.3010404@v.loewis.de> Message-ID: <48E29B34.5080202@v.loewis.de> > By the way, doesn't all this controversy yearn for a PEP? There must be a solution for 3.0 (which *could* be "it's a bug, don't use Python 3.0 on such broken systems"); we can't wait for a PEP to resolve this issue for 3.0. Most likely, the solution for 3.0 arrives through BDFL pronouncement, in which case no PEP is needed. Regards, Martin From qrczak at knm.org.pl Tue Sep 30 23:34:37 2008 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Tue, 30 Sep 2008 23:34:37 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <48E29335.7090102@g.nevcal.com> References: <200809291407.55291.victor.stinner@haypocalc.com> <48E29335.7090102@g.nevcal.com> Message-ID: <3f4107910809301434j2e23d5f5l84ef14a1d248659b@mail.gmail.com> 2008/9/30 Glenn Linderman : > So the problem is that a Unicode file system interface can't deal with > non-UTF-8 byte streams as file names. > > So it seems there are four suggested approaches, all of which have aspects > that are inconvenient. Let's not forget what happens when a non-UTF-8 file name is read from a file or written to a file, under the assumption that the filename is written to the file directly (which probably breaks for filenames containing newlines or such). > 4) Use of bytes APIs on FS interfaces. 
This seems to be the "solution" > adopted by Posix that creates the "problem" encountered by Unicode-native > applications. It is cumbersome to deal with within applications that > attempt to display the names. What do Posix-style "open file" dialog boxes > do in this case? http://library.gnome.org/devel/glib/stable/glib-Character-Set-Conversion.html#g-filename-display-name I used to observe three different ways to display such filenames within gedit (including %xx and \xx escapes), but now it is consistent, probably because it switched to using the above function everywhere: $ touch $'abc\xffz' $ gedit The Open dialog shows: abc?z (invalid encoding) When the file is open, the window title and the tab title show: abc?z and the same is in recent file list. It has a bug: it appends " (invalid encoding)" even if the filename contains a correctly encoded U+FFFD character. Nautilus has the same behavior and the same bug because this is a design bug of that function which does not allow to tell whether the conversion was successful. A filename containing a newline is sometimes displayed in two lines, and sometimes with a U+000A character from a fallback font (hex character number in a box). -- Marcin Kowalczyk qrczak at knm.org.pl http://qrnik.knm.org.pl/~qrczak/ From guido at python.org Tue Sep 30 23:34:36 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 14:34:36 -0700 Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue In-Reply-To: <48E28C31.6060606@v.loewis.de> References: <200809291407.55291.victor.stinner@haypocalc.com> <200809300202.38574.victor.stinner@haypocalc.com> <48E28C31.6060606@v.loewis.de> Message-ID: On Tue, Sep 30, 2008 at 1:29 PM, "Martin v. L?wis" wrote: > Guido van Rossum wrote: >> However >> the *proposed* behavior (returns bytes if the arg was bytes, and >> returns str when the arg was str) is IMO sane, and no different than >> the polymorphism found in len() or many builtin operations. 
> > My concern still is that it brings the bytes type into the status of > another character string type, which is really bad, and will require > further modifications to Python for the lifetime of 3.x. I'd like to understand why this is "really bad". I thought it was by design that the str and bytes types behave pretty similarly. You can use both as dict keys. > This is because applications will then regularly use byte strings for > file names on Unix, and regular strings on Windows, and then expect > the program to work the same without further modifications. It seems that bytes arguments actually *do* work on Windows -- somehow they get decoded. (Unless Terry's report was from 2.x.) > The next > question then will be environment variables and command line arguments, > for which we then should provide two versions (e.g. sys.argv and > sys.argvb; for os.environ, os.environ["PATH"] could mean something > different from os.environ[b"PATH"]). Actually something like that may not be a bad idea. Ian Bicking's webob supports similar double APIs for getting the request parameters out of a request object; I believe request.GET['x'] is a text object and request.GET_str['x'] is the corresponding uninterpreted bytes sequence. I would prefer to have os.environb over os.environ[b"PATH"] though. > And so on (passwd/group file, Tkinter, ...) I assume at some point we can stop and have sufficiently low-level interfaces that everyone can agree are in bytes only. Bytes aren't going away. How does Java deal with this? Its File class doesn't seem to deal in bytes at all. What would its listFiles() method do with undecodable filenames? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 30 23:38:15 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Sep 2008 14:38:15 -0700 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?
In-Reply-To: <48E29AB6.908@gmail.com> References: <200809271404.25654.victor.stinner@haypocalc.com> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> <48E29AB6.908@gmail.com> Message-ID: On Tue, Sep 30, 2008 at 2:31 PM, Nick Coghlan wrote: > I'm also starting to wonder if allowing mixed types might be the way to > go for these interfaces - leaving the bytes objects in place if the > Unicode decode operation fails. No, no, nooooo! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Sep 30 23:40:01 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 23:40:01 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <48E29CB1.5010309@v.loewis.de> >> On Windows, we might reject bytes filenames for all file operations: open(), >> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) > > Since I've seen no objections to this yet: please no. If we offer a > "lower-level" bytes filename API, it should work for all platforms. Unfortunately, it can't. You cannot represent all possible file names in a byte string in Windows (just as you can't do so in a Unicode string on Unix). So using byte strings on Windows would work for some files, but fail for others. In particular, listdir might give you a list of file names which you then can't open/stat/recurse into. 
(of course, you could use UTF-8 as the file system encoding on Windows, but then you will have to rewrite a lot of C code first) Regards, Martin From martin at v.loewis.de Tue Sep 30 23:42:19 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 23:42:19 +0200 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: References: <200809291407.55291.victor.stinner@haypocalc.com> Message-ID: <48E29D3B.5030900@v.loewis.de> > Oh, ok. I had assumed Windows just uses a fixed encoding without the problem > of misencoded filenames. It's the other way 'round: On Windows, Unicode file names are the natural choice, and byte strings have limitations. In a sense, Windows got it right - but then, they started later. Unix missed the opportunity of declaring that all file APIs are UTF-8 (except for Plan-9 and OS X, neither being "true" Unix). Regards, Martin From martin at v.loewis.de Tue Sep 30 23:48:33 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 23:48:33 +0200 Subject: [Python-3000] Request for documentation: PyModuleDef In-Reply-To: <215BD948-5392-4CDD-AF82-7CCCBEEDD9D7@dword.org> References: <215BD948-5392-4CDD-AF82-7CCCBEEDD9D7@dword.org> Message-ID: <48E29EB1.4070800@v.loewis.de> Jan Althaus wrote: > Please correct me if I'm wrong, but it doesn't seem like there is a full > documentation of PyModuleDef's members available? That's most likely the case, yes. > While some of them are intuitive, others aren't. The usage of m_size in > particular isn't clear to me. See PEP 3121. > I understand this is the size of > additional per-interpreter storage, however I'm not sure how this > translates to code; both in terms of declaration and functions such as > PyModule_Create (e.g. how/when are the additional members/storage > initialised?) 
> Do you reckon it would make sense to add an example for such a case to > the Embedding and Extending part of the docs? Sure! Contributions are welcome. Regards, Martin From martin at v.loewis.de Tue Sep 30 23:51:18 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Sep 2008 23:51:18 +0200 Subject: [Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0? In-Reply-To: <48E29AB6.908@gmail.com> References: <200809271404.25654.victor.stinner@haypocalc.com> <48DE705E.6050405@v.loewis.de> <52dc1c820809281334t36086001ie7b87f618b949bdb@mail.gmail.com> <48DFF382.7020006@v.loewis.de> <52dc1c820809281621l3beb260ahec22988a05e74327@mail.gmail.com> <96AAA50A-8C20-4320-A3C7-58B4C33D091D@fuhm.net> <2040788B-98C7-4AA2-94AD-2E85E7DF07E8@fuhm.net> <87od26e3an.fsf@xemacs.org> <6C26CFCA-21E0-4F6B-A314-57358EC08D55@fuhm.net> <48E29AB6.908@gmail.com> Message-ID: <48E29F56.7060206@v.loewis.de> > $ ./python -c "import sys; print(sys.argv)" "$(echo -e 'filename\x90\x90')" > Could not convert argument 3 to str > $ ./python -c "import os; print(os.environ['DUMMY'])" > Traceback (most recent call last): > File "", line 1, in > File "/home/ncoghlan/devel/py3k/Lib/os.py", line 389, in __getitem__ > return self.data[self.keymap(key)] > KeyError: 'DUMMY' > > (Is there a bug report for these yet?) > > I'm also starting to wonder if allowing mixed types might be the way to > go for these interfaces - leaving the bytes objects in place if the > Unicode decode operation fails. While I can sympathize with people having non-ASCII file names on their disks, I can't sympathize with this example. Normal users just don't put \x90 into their command lines, and those who do deserve the error message they get. 
Regards, Martin From foom at fuhm.net Tue Sep 30 23:59:10 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 30 Sep 2008 17:59:10 -0400 Subject: [Python-3000] New proposition for Python3 bytes filename issue In-Reply-To: <48E29CB1.5010309@v.loewis.de> References: <200809291407.55291.victor.stinner@haypocalc.com> <48E29CB1.5010309@v.loewis.de> Message-ID: <83758335-97EA-441B-A783-05F16EBE6D7A@fuhm.net> On Sep 30, 2008, at 5:40 PM, Martin v. Löwis wrote: >>> On Windows, we might reject bytes filenames for all file >>> operations: open(), >>> unlink(), os.path.join(), etc. (raise a TypeError or UnicodeError) >> >> Since I've seen no objections to this yet: please no. If we offer a >> "lower-level" bytes filename API, it should work for all platforms. > > Unfortunately, it can't. You cannot represent all possible file names > in a byte string in Windows (just as you can't do so in a Unicode > string on Unix). As you mention in the parenthetical below, of course it can. > So using byte strings on Windows would work for some files, but fail > for others. In particular, listdir might give you a list of file names > which you then can't open/stat/recurse into. > > (of course, you could use UTF-8 as the file system encoding on > Windows, > but then you will have to rewrite a lot of C code first) Yes! If there is a byte-string access method for Windows, pretty please make it decode from UTF-8 internally and call the Unicode version of the Windows APIs. The non-Unicode Windows APIs are pretty much just broken -- ideally, Python should never be calling those. But, I still don't like the idea of propagating the "sometimes a string, sometimes bytes" APIs... One or the other, please. Either always strings (if and only if there is a method for assuring decoding always succeeds), or always bytes.
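One possible way to get the "decoding always succeeds" property, sketched under the assumption of a later Python 3 release that provides the surrogateescape error handler (it was not available when this thread was written, and nothing here is claimed to be what the participants had in mind). The sample name is the one from the argv example earlier in the thread.

```python
raw = b"filename\x90\x90"  # the two trailing bytes are not valid UTF-8

# Decoding does not fail: each undecodable byte is mapped to a lone
# surrogate code point in the range U+DC80..U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# A strict encode still refuses to emit the lone surrogates, so such a
# name cannot silently leak into well-formed UTF-8 output.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(text), restored == raw, strict_ok)
```

The trade-off is that the resulting str is not valid Unicode text: it round-trips to the file system, but it has to be escaped or replaced before it is displayed or written with a strict codec.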
James

From solipsis at pitrou.net Tue Sep 30 23:55:48 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 30 Sep 2008 23:55:48 +0200
Subject: [Python-3000] [Python-Dev] New proposition for Python3 bytes filename issue
In-Reply-To: <48E29B34.5080202@v.loewis.de>
References: <200809291407.55291.victor.stinner@haypocalc.com>
 <200809300202.38574.victor.stinner@haypocalc.com>
 <48E1C097.8030309@v.loewis.de>
 <48E2865A.3010404@v.loewis.de>
 <48E29B34.5080202@v.loewis.de>
Message-ID: <1222811748.11841.0.camel@fsol>

On Tuesday, 30 September 2008 at 23:33 +0200, "Martin v. Löwis" wrote:
> > By the way, doesn't all this controversy yearn for a PEP?
>
> There must be a solution for 3.0 (which *could* be "it's a bug,
> don't use Python 3.0 on such broken systems"); we can't wait for
> a PEP to resolve this issue for 3.0.

Yes, I was thinking of a PEP for 3.1, with the solution for 3.0 being
"it's a bug, don't use Python 3.0 on such broken systems" :-)

Regards

Antoine.

From glyph at divmod.com Tue Sep 30 19:59:32 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Tue, 30 Sep 2008 17:59:32 -0000
Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3
In-Reply-To:
References: <200809300247.20349.victor.stinner@haypocalc.com>
 <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
Message-ID: <20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com>

On 02:32 pm, guido at python.org wrote:
>On Tue, Sep 30, 2008 at 6:21 AM, wrote:
>>On 12:47 am, victor.stinner at haypocalc.com wrote:
>>It sounds like maybe there should be some 2to3 fixers in here somewhere,
>>too? Not necessarily as part of this patch, but somewhere related? I don't
>>know what they would do, but it does seem quite likely that code which was
>>previously correct under 2.6 (using bytes) would suddenly be mixing bytes
>>and unicode with these APIs.
>
>Doesn't seem easy for 2to3 to recognize such cases.

Actually I think I'm wrong.
As far as dealing with glob(), listdir() and friends, I suppose that other
bytes/text fixers will already have had their opportunity to deal with
getting the type to be the appropriate thing, and if you have glob() it
will work as expected in 3.0. (I am really just confirming that I have
nothing useful to say here, using too many words to do it: at least, I hope
that nobody will waste further time thinking about it as a result.)

>If 2.6 weren't pretty much released already I'd ask to add
>os.getcwdb() there, as an alias for os.getcwd(), and add a 2to3 fixer
>that converts os.getcwdu() to os.getcwd(), leaves os.getcwd() alone
>(benefit of the doubt) and leaves os.getcwdb() alone as well (a strong
>indication the user meant to get bytes in the 3.x version of their
>code. (Similar to using bytes instead of str in 2.6 even though they
>mean the same thing there -- they will be properly separated in 3.x.)

In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the
"benefit of the doubt" case? It could always be added to 2.7, and the
parity release of 2to3 could have a --2.7 switch that would modify the
behavior of this and other fixers.

From glyph at divmod.com Tue Sep 30 20:47:51 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Tue, 30 Sep 2008 18:47:51 -0000
Subject: [Python-3000] [Python-Dev] Patch for an initial support of bytes filename in Python3
In-Reply-To:
References: <200809300247.20349.victor.stinner@haypocalc.com>
 <20080930132151.31635.132601277.divmod.xquotient.434@weber.divmod.com>
 <20080930175932.31635.989735053.divmod.xquotient.478@weber.divmod.com>
Message-ID: <20080930184751.31635.1484325691.divmod.xquotient.520@weber.divmod.com>

On 05:56 pm, guido at python.org wrote:
>On Tue, Sep 30, 2008 at 10:59 AM, wrote:
>>On 02:32 pm, guido at python.org wrote:
>>In the absence of a 2.6 getcwdb, perhaps the fixer could just drop the
>>"benefit of the doubt" case?
>>It could always be added to 2.7, and the
>>parity release of 2to3 could have a --2.7 switch that would modify the
>>behavior of this and other fixers.
>
>I'm not sure what you're proposing. *My* proposal is that 2to3 changes
>os.getcwdu() calls to os.getcwd() and leaves os.getcwd() calls alone
>-- there's no way to tell whether os.getcwdb() would be a better
>match, and for portable code, it won't be (since os.getcwdb() is a
>Unix-only thing).

My proposal is simply to change getcwd to getcwdb, and getcwdu to getcwd.
This preserves whatever bytes/text behavior you are expecting from 2.6
into 3.0.

Granted, the fact that unicode is really always the right thing to do on
Windows complicates things.

I already tend to avoid os.getcwd() though, and this is just one more
reason to avoid it. In the rare cases where I really do need it, it looks
like os.path.abspath(b".") / os.path.abspath(u".") will provide the
clarity that I want.
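
[Editorial illustration, not part of the original message.] The bytes/text
pairing discussed above can be sketched as follows. This is a minimal
sketch that assumes a current Python 3 interpreter rather than the 3.0
pre-release under discussion; the variable names are hypothetical and
chosen only for illustration.

```python
import os

# Text variant: the current directory as str.
cwd_text = os.getcwd()

# Bytes variant: the same directory as bytes.
cwd_bytes = os.getcwdb()

# os.path.abspath() returns the same type as its argument, so the
# literal passed in makes the intended type explicit at the call site.
abs_text = os.path.abspath(".")
abs_bytes = os.path.abspath(b".")

print(type(cwd_text).__name__, type(cwd_bytes).__name__)
print(type(abs_text).__name__, type(abs_bytes).__name__)
```

Passing a bytes or str literal to os.path.abspath() states the expected
type directly in the code, which appears to be the clarity the message is
after.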