From cool-rr at cool-rr.com Sat Dec 5 12:55:37 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 11:55:37 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
Message-ID:

I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
adding any start value if none is specified?

This current behavior is preventing me from using `sum` to add up a bunch of
non-number objects.

Ram.

From python at mrabarnett.plus.com Sat Dec 5 17:43:15 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 16:43:15 +0000
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References:
Message-ID: <4B1A8DA3.40904@mrabarnett.plus.com>

Ram Rachum wrote:
> I noticed that `sum` tries to add zero to your iterable. Why? Why not
> just skip adding any start value if none is specified?
>
> This current behavior is preventing me from using `sum` to add up a
> bunch of non-number objects.
>
Sometimes you might find that the list you're summing is empty. Because
'sum' is most often used with numbers, the default sum of a list is 0.
If you want to sum a list of non-numbers, provide a suitable start
value. For example, to sum a list of lists a suitable start value is []:

>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]

I agree that it would be nice if the start value could just be omitted,
but then what should 'sum' return if the list is empty?

If sum([1, 2]) returned 3, then I'd want sum([]) to return 0.

If sum([[1], [2]]) returned [1, 2], then I'd want sum([]) to return [].

Unfortunately, I can't have it both ways.

From andreengels at gmail.com Sat Dec 5 17:45:33 2009
From: andreengels at gmail.com (Andre Engels)
Date: Sat, 5 Dec 2009 17:45:33 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References:
Message-ID: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>

On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum wrote:
> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
> adding any start value if none is specified?
>
> This current behavior is preventing me from using `sum` to add up a bunch of non-
> number objects.

In your proposed implementation, sum([]) would be undefined.

-- 
André Engels, andreengels at gmail.com

From cool-rr at cool-rr.com Sat Dec 5 17:56:09 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 16:56:09 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
References: <4B1A8DA3.40904@mrabarnett.plus.com>
Message-ID:

> Sometimes you might find that the list you're summing is empty. Because
> 'sum' is most often used with numbers, the default sum of a list is 0.
> If you want to sum a list of non-numbers, provide a suitable start
> value. For example, to sum a list of lists a suitable start value is []:
>
> >>> sum([[0, 1], [2, 3]], [])
> [0, 1, 2, 3]
>
> I agree that it would be nice if the start value could just be omitted,
> but then what should 'sum' return if the list is empty?

I see the problem. I think a good solution would be to tell the user, "If you
want `sum` to be able to handle a non-empty list, you must supply `start`."
Users that want to add up a (possibly empty) sequence of numbers will have to
specify `start`.

If start is supplied, it will work like it does now. If start isn't supplied, it
will add up all the elements without adding any `start` to them.

What do you think?

From george.sakkis at gmail.com Sat Dec 5 18:01:01 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Sat, 5 Dec 2009 19:01:01 +0200
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
Message-ID: <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>

On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:
> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum wrote:
>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>> adding any start value if none is specified?
>>
>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>> number objects.
>
> In your proposed implementation, sum([]) would be undefined.

Which would make it consistent with min/max.

George

From algorias at gmail.com Sat Dec 5 18:23:19 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 14:23:19 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
Message-ID: <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>

2009/12/5 George Sakkis :
> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:
>
>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum wrote:
>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>> adding any start value if none is specified?
>>>
>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>> number objects.
>>
>> In your proposed implementation, sum([]) would be undefined.
>
> Which would make it consistent with min/max.

And in that case the special string handling could also be dropped?

>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "", line 1, in
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]

This behaviour is quite bothersome.
Sum can handle arbitrary objects
in theory (as long as they define the correct special methods, etc.),
but it gratuitously raises an exception on strings. This behaviour is
also inconsistent with the following:

>>> sum(["a","b"])
Traceback (most recent call last):
  File "", line 1, in
    sum(["a","b"])
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Where sum actually tries to add "a" to the default value of 0.

From g.brandl at gmx.net Sat Dec 5 18:33:13 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 18:33:13 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <4B1A8DA3.40904@mrabarnett.plus.com>
Message-ID:

Ram Rachum schrieb:
>> Sometimes you might find that the list you're summing is empty. Because
>> 'sum' is most often used with numbers, the default sum of a list is 0.
>> If you want to sum a list of non-numbers, provide a suitable start
>> value. For example, to sum a list of lists a suitable start value is []:
>>
>> >>> sum([[0, 1], [2, 3]], [])
>> [0, 1, 2, 3]
>>
>> I agree that it would be nice if the start value could just be omitted,
>> but then what should 'sum' return if the list is empty?
>
>
> I see the problem. I think a good solution would be to tell the user, "If you
> want `sum` to be able to handle a non-empty list, you must supply `start`."
> Users that want to add up a (possibly empty) sequence of numbers will have to
> specify `start`.
>
> If start is supplied, it will work like it does now. If start isn't supplied, it
> will add up all the elements without adding any `start` to them.
>
> What do you think?

There is a choice between these two variants:

a) require start for non-numerical sequences
b) require start for possibly empty sequences

I don't have a preference for either, so for compatibility's sake I would
vote to keep the current one, which is a). It also stands to reason that
case b)

-- 
Thus spake the Lord: Thou shalt indent with four spaces.
No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From g.brandl at gmx.net Sat Dec 5 18:35:07 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 18:35:07 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <4B1A8DA3.40904@mrabarnett.plus.com>
Message-ID:

Ram Rachum schrieb:
>> Sometimes you might find that the list you're summing is empty. Because
>> 'sum' is most often used with numbers, the default sum of a list is 0.
>> If you want to sum a list of non-numbers, provide a suitable start
>> value. For example, to sum a list of lists a suitable start value is []:
>>
>> >>> sum([[0, 1], [2, 3]], [])
>> [0, 1, 2, 3]
>>
>> I agree that it would be nice if the start value could just be omitted,
>> but then what should 'sum' return if the list is empty?
>
>
> I see the problem. I think a good solution would be to tell the user, "If you
> want `sum` to be able to handle a non-empty list, you must supply `start`."
> Users that want to add up a (possibly empty) sequence of numbers will have to
> specify `start`.
>
> If start is supplied, it will work like it does now. If start isn't supplied, it
> will add up all the elements without adding any `start` to them.
>
> What do you think?

(sorry, pressed wrong key)

There is a choice between these two variants:

a) require start for non-numerical sequences
b) require start for possibly empty sequences

I don't have a preference for either, so for compatibility's sake I would
vote to keep the current one, which is a). It also stands to reason that
buggy usage in case b) is harder to detect, since the common case will
not uncover the bug (the sequence being nonempty), while for case a) it does.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces.
No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From g.brandl at gmx.net Sat Dec 5 18:36:32 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 18:36:32 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
 <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
Message-ID:

Vitor Bosshard schrieb:
> 2009/12/5 George Sakkis :
>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:
>>
>>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum wrote:
>>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>>> adding any start value if none is specified?
>>>>
>>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>>> number objects.
>>>
>>> In your proposed implementation, sum([]) would be undefined.
>>
>> Which would make it consistent with min/max.
>
>
> And in that case the special string handling could also be dropped?
>
>>>> sum(["a","b"], "start")
> Traceback (most recent call last):
>   File "", line 1, in
>     sum(["a","b"], "start")
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
>
> This behaviour is quite bothersome. Sum can handle arbitrary objects
> in theory (as long as they define the correct special methods, etc.),
> but it gratuitously raises an exception on strings.

This seems to be an instance where the "practicality" Zen rule beats the
"special cases" rule :)

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four.
Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From algorias at gmail.com Sat Dec 5 19:04:42 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 15:04:42 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
 <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
Message-ID: <2987c46d0912051004j3d590135j392e770219a0dbbe@mail.gmail.com>

2009/12/5 Georg Brandl :
> Vitor Bosshard schrieb:
>> 2009/12/5 George Sakkis :
>>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:
>>>
>>>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum wrote:
>>>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>>>> adding any start value if none is specified?
>>>>>
>>>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>>>> number objects.
>>>>
>>>> In your proposed implementation, sum([]) would be undefined.
>>>
>>> Which would make it consistent with min/max.
>>
>>
>> And in that case the special string handling could also be dropped?
>>
>>>>> sum(["a","b"], "start")
>> Traceback (most recent call last):
>>   File "", line 1, in
>>     sum(["a","b"], "start")
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>
>>
>> This behaviour is quite bothersome. Sum can handle arbitrary objects
>> in theory (as long as they define the correct special methods, etc.),
>> but it gratuitously raises an exception on strings.
>
> This seems to be an instance where the "practicality" Zen rule beats the
> "special cases" rule :)
>

It might be more accurate to say "hand-holding" instead of practicality
(and it doesn't even catch all errors it's meant to).
I'm not so sure that's special enough ;-)

Vitor

From stephen at xemacs.org Sat Dec 5 19:10:51 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 06 Dec 2009 03:10:51 +0900
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
Message-ID: <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>

George Sakkis writes:
 > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:

 > > In your proposed implementation, sum([]) would be undefined.
 >
 > Which would make it consistent with min/max.

There's no justification for trying to make 'min' and 'sum'
consistent. The sum of an empty list of numbers is a well-defined
*number*, namely 0, but the max of an empty list of numbers is a
well-defined *non-number*, namely "minus infinity".

The real question is "what harm is done by preferring the
(well-defined) sum of an empty list of numbers over the (well-defined)
empty sums of lists and/or strings?" Then, if there is any harm, "can
the situation be improved by having no useful default for empty lists
of any type?" Finally, "is it worth breaking existing code to ensure
equal treatment of different types?"

My guess is that the answers are "very little", "hardly at all", and
"emphatically no."
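
For reference, the behaviours discussed up to this point can be reproduced in a
current Python 3 interpreter. The snippet below is only a sketch added for
illustration; the variable names `no_start_error` and `empty_max_error` are made
up here and do not come from any of the messages.

```python
# With no start argument, sum() begins from the integer 0.
assert sum([]) == 0
assert sum([1, 2]) == 3

# Non-numeric items need an explicit start of a compatible type.
assert sum([[0, 1], [2, 3]], []) == [0, 1, 2, 3]

# Without it, sum() attempts 0 + [0, 1] and fails.
try:
    sum([[0, 1], [2, 3]])
except TypeError as exc:
    no_start_error = type(exc).__name__

# max() (like min()) has no fallback value for an empty input and raises.
try:
    max([])
except ValueError as exc:
    empty_max_error = type(exc).__name__

print(no_start_error, empty_max_error)
```

This is the asymmetry the posters are arguing about: an empty `sum` silently
yields an int, while an empty `max` is reported as an error.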

From cool-rr at cool-rr.com Sat Dec 5 19:05:59 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 18:05:59 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
References: <4B1A8DA3.40904@mrabarnett.plus.com>
Message-ID:

> There is a choice between these two variants:
>
> a) require start for non-numerical sequences
> b) require start for possibly empty sequences
>
> I don't have a preference for either, so for compatibility's sake I would
> vote to keep the current one, which is a). It also stands to reason that
> buggy usage in case b) is harder to detect, since the common case will
> not uncover the bug (the sequence being nonempty), while for case a) it does.

I prefer (b). The problem with requiring `start` for sequences of non-numerical
objects is that you now have to go out and create a "zero object" of the same
type as your other objects. The object class might not even have a concept of a
"zero object".

Ram.

From python at mrabarnett.plus.com Sat Dec 5 19:12:31 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 18:12:31 +0000
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <4B1A8DA3.40904@mrabarnett.plus.com>
Message-ID: <4B1AA28F.7040809@mrabarnett.plus.com>

Georg Brandl wrote:
> Ram Rachum schrieb:
>>> Sometimes you might find that the list you're summing is empty.
>>> Because 'sum' is most often used with numbers, the default sum of
>>> a list is 0. If you want to sum a list of non-numbers, provide a
>>> suitable start value. For example, to sum a list of lists a
>>> suitable start value is []:
>>>
>>>>>> sum([[0, 1], [2, 3]], [])
>>> [0, 1, 2, 3]
>>>
>>> I agree that it would be nice if the start value could just be
>>> omitted, but then what should 'sum' return if the list is empty?
>>
>> I see the problem.
I think a good solution would be to tell the
>> user, "If you want `sum` to be able to handle a non-empty list, you
>> must supply `start`." Users that want to add up a (possibly empty)
>> sequence of numbers will have to specify `start`.
>>
>> If start is supplied, it will work like it does now. If start isn't
>> supplied, it will add up all the elements without adding any
>> `start` to them.
>>
>> What do you think?
>
> (sorry, pressed wrong key)
>
> There is a choice between these two variants:
>
> a) require start for non-numerical sequences
> b) require start for possibly empty sequences
>
> I don't have a preference for either, so for compatibility's sake I
> would vote to keep the current one, which is a). It also stands to
> reason that buggy usage in case b) is harder to detect, since the
> common case will not uncover the bug (the sequence being nonempty),
> while for case a) it does.
>
True, providing start will ensure that the result is of the correct
class, instead of it sometimes being an int, causing a TypeError later
on.

From python at mrabarnett.plus.com Sat Dec 5 19:18:08 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 18:18:08 +0000
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <4B1A8DA3.40904@mrabarnett.plus.com>

Message-ID: <4B1AA3E0.7000604@mrabarnett.plus.com>

Ram Rachum wrote:
>> There is a choice between these two variants:
>>
>> a) require start for non-numerical sequences
>> b) require start for possibly empty sequences
>>
>> I don't have a preference for either, so for compatibility's sake I would
>> vote to keep the current one, which is a). It also stands to reason that
>> buggy usage in case b) is harder to detect, since the common case will
>> not uncover the bug (the sequence being nonempty), while for case a) it does.
>
>
> I prefer (b). The problem with requiring `start` for sequences of non-numerical
> objects is that you now have to go out and create a "zero object" of the same
> type as your other objects. The object class might not even have a concept of a
> "zero object".
>
If the objects can be summed, shouldn't there also be a zero object?
Does anyone have an example when that's not possible?

From george.sakkis at gmail.com Sat Dec 5 19:23:35 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Sat, 5 Dec 2009 20:23:35 +0200
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
 <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>

On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull wrote:
> George Sakkis writes:
>  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:
>
>  > > In your proposed implementation, sum([]) would be undefined.
>  >
>  > Which would make it consistent with min/max.
>
> There's no justification for trying to make 'min' and 'sum'
> consistent. The sum of an empty list of numbers is a well-defined
> *number*, namely 0, but the max of an empty list of numbers is a
> well-defined *non-number*, namely "minus infinity".
>
> The real question is "what harm is done by preferring the
> (well-defined) sum of an empty list of numbers over the (well-defined)
> empty sums of lists and/or strings?" Then, if there is any harm, "can
> the situation be improved by having no useful default for empty lists
> of any type?" Finally, "is it worth breaking existing code to ensure
> equal treatment of different types?"
>
> My guess is that the answers are "very little", "hardly at all", and
> "emphatically no."

Agreed that there is little harm in preferring numbers over other
types when it comes to empty sequences, but the more important
question is "should the start argument be used even if the sequence is
*not* empty?". The OP doesn't think so and I agree.

George

From algorias at gmail.com Sat Dec 5 19:39:39 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 15:39:39 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
 <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
 <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
Message-ID: <2987c46d0912051039w1028a365j94d6e0e1c7ea8279@mail.gmail.com>

2009/12/5 George Sakkis :
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.
>

In that case, "default" would be a more appropriate name than "start".
That change of concept is a potential break in compatibility. How
often is the start argument given as a non-zero value? Not all that
often I suppose, but it's still a valid use-case. Ergo, the start
argument should never be omitted if it was explicitly set.
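
The difference between "start" semantics and the "default" semantics floated
above can be sketched as follows. `sum_with_default`, its `default` parameter
and the `_MISSING` sentinel are hypothetical names invented for this
illustration; no such function exists in the standard library, and this is one
possible reading of the idea rather than anything agreed in the thread.

```python
from functools import reduce
import operator

_MISSING = object()  # sentinel: distinguishes "no default given" from None


def sum_with_default(iterable, default=_MISSING):
    """Hypothetical variant: `default` is returned only for empty input."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        if default is _MISSING:
            raise ValueError("empty iterable and no default given")
        return default
    # Fold the remaining items onto the first one; nothing extra is added.
    return reduce(operator.add, it, first)


# Built-in semantics: an explicit start always takes part in the addition.
assert sum([1, 2], 10) == 13

# "default" semantics: the value is ignored when there are items to add.
assert sum_with_default([1, 2], 10) == 3
assert sum_with_default([], 10) == 10
assert sum_with_default([[1], [2]]) == [1, 2]
```

Under this reading an explicitly passed non-zero value changes meaning, which
is the compatibility break pointed out in the message above.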

From janssen at parc.com Sat Dec 5 19:40:21 2009
From: janssen at parc.com (Bill Janssen)
Date: Sat, 5 Dec 2009 10:40:21 PST
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
 <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
 <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
Message-ID: <42682.1260038421@parc.com>

George Sakkis wrote:
> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull wrote:
>
> > George Sakkis writes:
> >  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:
> >
> >  > > In your proposed implementation, sum([]) would be undefined.
> >  >
> >  > Which would make it consistent with min/max.
> >
> > There's no justification for trying to make 'min' and 'sum'
> > consistent. The sum of an empty list of numbers is a well-defined
> > *number*, namely 0, but the max of an empty list of numbers is a
> > well-defined *non-number*, namely "minus infinity".
> >
> > The real question is "what harm is done by preferring the
> > (well-defined) sum of an empty list of numbers over the (well-defined)
> > empty sums of lists and/or strings?" Then, if there is any harm, "can
> > the situation be improved by having no useful default for empty lists
> > of any type?" Finally, "is it worth breaking existing code to ensure
> > equal treatment of different types?"
> >
> > My guess is that the answers are "very little", "hardly at all", and
> > "emphatically no."
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.

Or perhaps, the *default* start value should not be used if it doesn't
match in type the first element of a non-empty sequence. An explicitly
specified start value should still be used even if the sequence is *not*
empty.

Bill

From cool-rr at cool-rr.com Sat Dec 5 19:42:31 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 18:42:31 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
References: <4B1A8DA3.40904@mrabarnett.plus.com>

<4B1AA3E0.7000604@mrabarnett.plus.com>
Message-ID:

MRAB writes:
> > I prefer (b). The problem with requiring `start` for sequences of non-numerical
> > objects is that you now have to go out and create a "zero object" of the same
> > type as your other objects. The object class might not even have a concept of a
> > "zero object".
> >
> If the objects can be summed, shouldn't there also be a zero object?
> Does anyone have an example when that's not possible?

You're right MRAB, probably almost every object type that has a concept of
"addition" will have a concept of a zero element.

BUT, that zero object has to be created by the user of `sum`, and that has two
problems:

1. The user might not know from beforehand which type of object he's adding.
Even within the same type there might be problems. What happens when the user is
using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
the dimensions of the vectors are? How will he know if his zero element should
be Vector([0, 0]) or Vector([0, 0, 0])

2. A smaller problem: The user has to actually create that zero object now, and
for some objects the definition might be lengthy, adding needless complexity to
the code. Also, using the `start` has some overhead, for creating the zero
object and calling __add__.

Ram.

From rhamph at gmail.com Sat Dec 5 19:48:52 2009
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 5 Dec 2009 11:48:52 -0700
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
 <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
Message-ID:

On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard wrote:
> And in that case the special string handling could also be dropped?
>
>>>> sum(["a","b"], "start")
> Traceback (most recent call last):
>   File "", line 1, in
>     sum(["a","b"], "start")
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
>
> This behaviour is quite bothersome. Sum can handle arbitrary objects
> in theory (as long as they define the correct special methods, etc.),
> but it gratuitously raises an exception on strings. This behaviour is
> also inconsistent with the following:
>
>>>> sum(["a","b"])
> Traceback (most recent call last):
>   File "", line 1, in
>     sum(["a","b"])
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>
>
> Where sum actually tries to add "a" to the default value of 0.

sum is defined by repeatedly adding each number in a sequence. As
each number is usually constant, and the size of total grows
logarithmically, this is O(n log n) (but due to implementation
coarseness it usually isn't distinguished from O(n)).

Concatenation however grows the total's size very quickly. You
instead get a performance of O(n**2). Same result, wrong algorithm.

It would be possible to special case strings, but why? The programmer
should know what algorithm they're using and what complexity class it
has, so they can pick the right one (''.join(seq) in this case). IOW,
handling arbitrary objects is an illusion.

For an another example on why the programmer needs to understand the
algorithmic complexity of the operations they're using, and that the
language should value performance consistency and not just correct
output, see ABC's usage of rational numbers:
http://python-history.blogspot.com/2009/03/problem-with-integer-division.html

-- 
Adam Olsen, aka Rhamphoryncus

From algorias at gmail.com Sat Dec 5 19:55:53 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 15:55:53 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <4B1A8DA3.40904@mrabarnett.plus.com>

<4B1AA3E0.7000604@mrabarnett.plus.com>
Message-ID: <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>

2009/12/5 Ram Rachum :
> MRAB writes:
>
>> > I prefer (b). The problem with requiring `start` for sequences of non-numerical
>> > objects is that you now have to go out and create a "zero object" of the same
>> > type as your other objects. The object class might not even have a concept of a
>> > "zero object".
>> >
>> If the objects can be summed, shouldn't there also be a zero object?
>> Does anyone have an example when that's not possible?
>
> You're right MRAB, probably almost every object type that has a concept of
> "addition" will have a concept of a zero element.
>
> BUT, that zero object has to be created by the user of `sum`, and that has two
> problems:
>
> 1. The user might not know from beforehand which type of object he's adding.
> Even within the same type there might be problems. What happens when the user is
> using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
> the dimensions of the vectors are? How will he know if his zero element should
> be Vector([0, 0]) or Vector([0, 0, 0])

Ugly, but works:

itr = iter(sequence)
sum(itr, next(itr))

This is actually a good example in favor of not requiring a start value.

From rhamph at gmail.com Sat Dec 5 20:03:02 2009
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 5 Dec 2009 12:03:02 -0700
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
 <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
 <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
 <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
Message-ID:

On Sat, Dec 5, 2009 at 11:23, George Sakkis wrote:
> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J.
Turnbull wrote:
>
>> George Sakkis writes:
>>  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels wrote:
>>
>>  > > In your proposed implementation, sum([]) would be undefined.
>>  >
>>  > Which would make it consistent with min/max.
>>
>> There's no justification for trying to make 'min' and 'sum'
>> consistent. The sum of an empty list of numbers is a well-defined
>> *number*, namely 0, but the max of an empty list of numbers is a
>> well-defined *non-number*, namely "minus infinity".
>>
>> The real question is "what harm is done by preferring the
>> (well-defined) sum of an empty list of numbers over the (well-defined)
>> empty sums of lists and/or strings?" Then, if there is any harm, "can
>> the situation be improved by having no useful default for empty lists
>> of any type?" Finally, "is it worth breaking existing code to ensure
>> equal treatment of different types?"
>>
>> My guess is that the answers are "very little", "hardly at all", and
>> "emphatically no."
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.

Only sometimes adding the start value makes it more fragile. If you
have Foo() objects that aren't compatible with int and you do
sum([Foo(), Foo()]) you get a Foo() back. If your sequence then
happens to be empty you do sum([]) and get an int back. The result is
likely to be used in a context that's not compatible with int either.
Better always fail and require an explicit start if you need it.

-- 
Adam Olsen, aka Rhamphoryncus

From george.sakkis at gmail.com Sat Dec 5 20:07:22 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Sat, 5 Dec 2009 21:07:22 +0200
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <2987c46d0912051039w1028a365j94d6e0e1c7ea8279@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com> <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com> <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp> <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com> <2987c46d0912051039w1028a365j94d6e0e1c7ea8279@mail.gmail.com>
Message-ID: <91ad5bf80912051107v3b345b55nfc6580a7d088431@mail.gmail.com>

On Sat, Dec 5, 2009 at 8:39 PM, Vitor Bosshard wrote:

> 2009/12/5 George Sakkis :
>>
>> Agreed that there is little harm in preferring numbers over other
>> types when it comes to empty sequences, but the more important
>> question is "should the start argument be used even if the sequence is
>> *not* empty?". The OP doesn't think so and I agree.
>>
>
> In that case, "default" would be a more appropriate name than "start".
> That change of concept is a potential break in compatibility. How
> often is the start argument given as a non-zero value? Not all that
> often I suppose, but it's still a valid use-case. Ergo, the start
> argument should never be omitted if it was explicitly set.

Ok I see the different semantics between 'start' and 'default' and the
use cases for each but at the end of the day there should be a way
(preferably the default) that given a sequence [x1, ..., xN] one can
compute "x1+...+xN" instead of "start+x1+...+xN".

George

From algorias at gmail.com Sat Dec 5 20:19:06 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 16:19:06 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com> <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com> <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
Message-ID: <2987c46d0912051119v524a94d7pc5235ef6ab2d58b5@mail.gmail.com>

2009/12/5 Adam Olsen :
> On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard wrote:
>> And in that case the special string handling could also be dropped?
>>
>> >>> sum(["a","b"], "start")
>> Traceback (most recent call last):
>>   File "", line 1, in
>>     sum(["a","b"], "start")
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>
>>
>> This behaviour is quite bothersome. Sum can handle arbitrary objects
>> in theory (as long as they define the correct special methods, etc.),
>> but it gratuitously raises an exception on strings. This behaviour is
>> also inconsistent with the following:
>>
>> >>> sum(["a","b"])
>> Traceback (most recent call last):
>>   File "", line 1, in
>>     sum(["a","b"])
>> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>
>>
>> Where sum actually tries to add "a" to the default value of 0.
>
> sum is defined by repeatedly adding each number in a sequence. As
> each number is usually constant, and the size of total grows
> logarithmically, this is O(n log n) (but due to implementation
> coarseness it usually isn't distinguished from O(n)).
>
> Concatenation however grows the total's size very quickly. You
> instead get a performance of O(n**2). Same result, wrong algorithm.
>
> It would be possible to special case strings, but why? The programmer
> should know what algorithm they're using and what complexity class it
> has, so they can pick the right one (''.join(seq) in this case). IOW,
> handling arbitrary objects is an illusion.

I think you misunderstood my point. Sorry if I wasn't clear enough in
my original message. I understand the performance characteristics of
repeated concatenation vs str.join.
I just wonder why the language goes out of its way to catch this
particular occurrence of bad code, given there are plenty of ways to
misuse sum or any other builtin for that matter. A newbie is more
likely to get n**2 performance by using a for loop than sum:

    final = ""
    for s in strings:
        final += s

Should python refuse to compile the above snippet? The answer is an
emphatic "no".

From python at rcn.com Sat Dec 5 20:31:14 2009
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 5 Dec 2009 11:31:14 -0800
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
References:
Message-ID:

[Ram Rachum]
> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
> adding any start value if none is specified?

Once the API has been released, it is difficult to change without breaking code.

> This current behavior is preventing me from using `sum` to add up a bunch of
> non-number objects.

You have plenty of options:
* use sum() as designed and supply your own Zero object as a start (see below)
* use reduce(operator.add, s)
* write a simple for-loop to do summing

It's not like summing is a hard task. There's nothing in your situation that
would warrant changing the behavior of a published API where sum(s) is defined
even when s is of length zero or one.

Raymond

------------------------------------

>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'

From python at mrabarnett.plus.com Sat Dec 5 20:34:44 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 19:34:44 +0000
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <42682.1260038421@parc.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com> <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com> <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp> <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com> <42682.1260038421@parc.com>
Message-ID: <4B1AB5D4.1000802@mrabarnett.plus.com>

Bill Janssen wrote:
> George Sakkis wrote:
>
>> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull
>> wrote:
>>
>>> George Sakkis writes:
>>>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels
>>>> wrote:
>>>
>>>>> In your proposed implementation, sum([]) would be undefined.
>>>>
>>>> Which would make it consistent with min/max.
>>>
>>> There's no justification for trying to make 'min' and 'sum'
>>> consistent. The sum of an empty list of numbers is a
>>> well-defined *number*, namely 0, but the max of an empty list of
>>> numbers is a well-defined *non-number*, namely "minus infinity".
>>>
>>> The real question is "what harm is done by preferring the
>>> (well-defined) sum of an empty list of numbers over the
>>> (well-defined) empty sums of lists and/or strings?" Then, if
>>> there is any harm, "can the situation be improved by having no
>>> useful default for empty lists of any type?" Finally, "is it
>>> worth breaking existing code to ensure equal treatment of
>>> different types?"
>>>
>>> My guess is that the answers are "very little", "hardly at all",
>>> and "emphatically no."
>> Agreed that there is little harm in preferring numbers over other
>> types when it comes to empty sequences, but the more important
>> question is "should the start argument be used even if the sequence
>> is *not* empty?". The OP doesn't think so and I agree.
>
> Or perhaps, the *default* start value should not be used if it
> doesn't match in type the first element of a non-empty sequence. An
> explicitly specified start value should still be used even if the
> sequence is *not* empty.
>
Currently if start is None then the result is None if the sequence is
empty, but raises a TypeError otherwise. Would it break any existing
code if it was this instead:

    sum(sequence, start=0)

If start is None then it's omitted from the summation, unless the
sequence is empty, in which case the result is None.

From g.brandl at gmx.net Sat Dec 5 21:59:36 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 21:59:36 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>
References: <4B1A8DA3.40904@mrabarnett.plus.com>

<4B1AA3E0.7000604@mrabarnett.plus.com> <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>
Message-ID:

Vitor Bosshard wrote:
> 2009/12/5 Ram Rachum :
>> MRAB writes:
>>
>>> > I prefer (b). The problem with requiring `start` for sequences of
>>> > non-numerical objects is that you now have to go out and create a
>>> > "zero object" of the same type as your other objects. The object class
>>> > might not even have a concept of a "zero object".
>>> >
>>> If the objects can be summed, shouldn't there also be a zero object?
>>> Does anyone have an example when that's not possible?
>>
>> You're right MRAB, probably almost every object type that has a concept of
>> "addition" will have a concept of a zero element.
>>
>> BUT, that zero object has to be created by the user of `sum`, and that has two
>> problems:
>>
>> 1. The user might not know from beforehand which type of object he's adding.
>> Even within the same type there might be problems. What happens when the user is
>> using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
>> the dimensions of the vectors are? How will he know if his zero element should
>> be Vector([0, 0]) or Vector([0, 0, 0])
>
> Ugly, but works:
>
> itr = iter(sequence)
> sum(itr, itr.next())

Or, for sequences:

    from itertools import islice
    sum(islice(seq, 1, None), seq[0])

which clearly communicates the need for a non-empty sequence.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From python at rcn.com Sat Dec 5 22:13:45 2009
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 5 Dec 2009 13:13:45 -0800
Subject: [Python-ideas] Why does `sum` use a default for the `start`parameter?
References: <4B1A8DA3.40904@mrabarnett.plus.com>

<4B1AA3E0.7000604@mrabarnett.plus.com> <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>
Message-ID: <1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>

>>>> > I prefer (b). The problem with requiring `start` for sequences of
>>>> > non-numerical objects is that you now have to go out and create a
>>>> > "zero object" of the same type as your other objects. The object class
>>>> > might not even have a concept of a "zero object".
>>>> >
>>>> If the objects can be summed, shouldn't there also be a zero object?

Use a single universal zero object that works for everything.
Here's an example from my earlier post:

>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'

Raymond

From ncoghlan at gmail.com Sun Dec 6 00:49:53 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 06 Dec 2009 09:49:53 +1000
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To:
References: <4B1A8DA3.40904@mrabarnett.plus.com>

Message-ID: <4B1AF1A1.7050709@gmail.com>

Ram Rachum wrote:
> I prefer (b). The problem with requiring `start` for sequences of non-numerical
> objects is that you now have to go out and create a "zero object" of the same
> type as your other objects. The object class might not even have a concept of a
> "zero object".

    class _AdditiveIdentity(object):
        def __add__(self, other):
            return other
        __radd__ = __add__

    AdditiveIdentity = _AdditiveIdentity()

    # start is passed positionally: sum() does not accept it as a
    # keyword argument in current releases
    total = sum(itr, AdditiveIdentity)
    if total is AdditiveIdentity:
        pass  # Iterable was empty
    else:
        pass  # we got a real result

(Raymond already posted along these lines, but I wanted to point out
that by making the identity object a singleton you can save the cost of
repeated instantiation and simplify the after-the-fact check for an
empty iterable)

The other philosophical point here is one Guido has expressed several
times in the past: "In general, the type of a return value should not
depend on the *value* of an argument" (although the different numeric
types tend to blur together a bit in this specific context)

With only a default value, sum() could return entirely different types
based on whether or not the sequence was empty. With a start value, on
the other hand, the type returned must at least be one that is
compatible under addition with the start value. You can subvert that a
bit through the use of a universal additive identity, but it holds short
of that.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From tjreedy at udel.edu Sun Dec 6 01:57:38 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 05 Dec 2009 19:57:38 -0500
Subject: [Python-ideas] Why does `sum` use a default for the `start`parameter?
In-Reply-To: <1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>
References: <4B1A8DA3.40904@mrabarnett.plus.com>

<4B1AA3E0.7000604@mrabarnett.plus.com> <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com> <1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>
Message-ID:

Raymond Hettinger wrote:
>
>>>>> > I prefer (b). The problem with requiring `start` for sequences of
>>>>> > non-numerical objects is that you now have to go out and create a
>>>>> > "zero object" of the same type as your other objects. The object
>>>>> > class might not even have a concept of a "zero object".
>>>>> >
>>>>> If the objects can be summed, shouldn't there also be a zero object?
>
>
> Use a single universal zero object that works for everything.
> Here's an example from my earlier post:
>
> >>> class Zero:
> ...     'universal zero for addition'
> ...     def __add__(self, other):
> ...         return other
> ...     def __radd__(self, other):
> ...         return other
> ...
> >>> Zero() + 'xyz'
> 'xyz'
> >>> sum(['xyz', 'pdq'], Zero())
> 'xyzpdq'

I would not have expected this to work, as it does not match
"The iterable's items are normally numbers, and are not allowed to be
strings."

It appears that it is the start value that may not be a string.
I suggest a doc fix in http://bugs.python.org/issue7447

FWIW, sum was designed for summing numbers at C speed. I think it
probably is as good a compromise as we can get. It is easy to program
any other exact behavior one wants, and summing user objects is going to
go at Python speed anyway. Certainly, none of the suggested alterations
strike me as worth breaking code.

Terry Jan Reedy

From python at rcn.com Sun Dec 6 02:18:38 2009
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 5 Dec 2009 17:18:38 -0800
Subject: [Python-ideas] Why does `sum` use a default for the`start`parameter?
References: <4B1A8DA3.40904@mrabarnett.plus.com>

<4B1AA3E0.7000604@mrabarnett.plus.com> <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com> <1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>
Message-ID:

["Terry Reedy"]
> FWIW, sum was designed for summing numbers at C speed. I think it
> probably is as good a compromise as we can get. It is easy to program
> any other exact behavior one wants, and summing user objects is going to
> go at Python speed anyway. Certainly, none of the suggested alterations
> strike me as worth breaking code.

Wisely spoken.

Raymond

From rhamph at gmail.com Sun Dec 6 07:29:05 2009
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 5 Dec 2009 23:29:05 -0700
Subject: [Python-ideas] Why does `sum` use a default for the `start` parameter?
In-Reply-To: <2987c46d0912051119v524a94d7pc5235ef6ab2d58b5@mail.gmail.com>
References: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com> <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com> <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com> <2987c46d0912051119v524a94d7pc5235ef6ab2d58b5@mail.gmail.com>
Message-ID:

On Sat, Dec 5, 2009 at 12:19, Vitor Bosshard wrote:
> I think you misunderstood my point. Sorry if I wasn't clear enough in
> my original message. I understand the performance characteristics of
> repeated concatenation vs str.join. I just wonder why the language
> goes out of its way to catch this particular occurrence of bad code,
> given there are plenty of ways to misuse sum or any other builtin for
> that matter. A newbie is more likely to get n**2 performance by using
> a for loop than sum:
>
> final = ""
> for s in strings:
>     final += s
>
> Should python refuse to compile the above snippet? The answer is an
> emphatic "no".

All the individual operations there are fine. It's the composition
that's wrong. Adding a sanity check would require recognizing that
pattern, and changing the semantics of an individual operation based on
what surrounds it. Not a nice thing to do.
sum() is already a single operation (regardless of how it's
implemented), so it doesn't have that problem.

--
Adam Olsen, aka Rhamphoryncus

From facundobatista at gmail.com Mon Dec 7 11:36:34 2009
From: facundobatista at gmail.com (Facundo Batista)
Date: Mon, 7 Dec 2009 07:36:34 -0300
Subject: [Python-ideas] Heap data type
In-Reply-To: <3CDA63554E1546DEA84A696B56BB4876@RaymondLaptop1>
References: <20090418124357.GA8506@panix.com> <3CDA63554E1546DEA84A696B56BB4876@RaymondLaptop1>
Message-ID:

On Sat, Apr 18, 2009 at 8:40 PM, Raymond Hettinger wrote:

> Facundo, I would like to work with you on this.
> I've been the primary maintainer for heapq for a while
> and had already started working on something like this
> in response to repeated requests to support a key= function
> (like that for sorted/min/max).

After a not very complicated, but different, year (I had a kid!), I'm
bringing this thread back to life.

There were different proposals from different people about what to do
after my initial mail. We can separate them into two sections:

- Move the Heap class to the collections module: I'm just changing the
heapq module to have an OO interface, instead of a bunch of functions.
I'm +0 to moving it to "collections", but note that even after the
reordering of the stdlib for Py3, the heapq module remained there.

- Add functionality to the Heap class: I'm +0 to this, but I don't
want to hold up this change waiting for further functionality... I
propose to have the same functionality, in an OO and less error prone
way. We can add more functionality afterwards.

What do you think?

Raymond, let's work together... but I don't know where. The Heap class
is already coded in my first mail; if you want to start from there and
add functionality, I'm +0. If you want me to add tests and push the
inclusion of that class into the module, just tell me. Something else,
I'm all ears, :)

Regards,

--
.
Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From kristjan at ccpgames.com Tue Dec 8 18:51:46 2009
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Tue, 8 Dec 2009 17:51:46 +0000
Subject: [Python-ideas] disabling .pyc and .pyo files
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>

Hello there.

We have a large project involving multiple perforce branches of hundreds
of .py files each.

Although we employ our own import mechanism for the bulk of these files,
we do use the regular import mechanism for an essential core of them.

Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
This can happen for a variety of reasons, but most often it occurs when
.py files are being removed, or moved in the hierarchy. The problem is
that the application will happily load and import an orphaned .pyo file,
even though the .py file has gone or moved.

I looked at the import code and I found that it is trivial to block the
reading and writing of .pyo files. I am about to implement that patch for
our purposes, thus forcing recompilation of the .py files on each run if
so specified. This will ensure that the application will execute only the
code represented by the checked-out .py files. But it occurred to me that
this functionality might be of interest to other people than just us. I
can imagine, for example, that buildbots running the python regression
testsuite might be running into problems with stray .pyo files from time
to time.

Do you think that such a command line option would be useful for Python
at large?

Cheers,

Kristján

From jnoller at gmail.com Tue Dec 8 19:58:41 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 8 Dec 2009 13:58:41 -0500
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>

2009/12/8 Kristján Valur Jónsson :
> Hello there.
>
> We have a large project involving multiple perforce branches of hundreds of
> .py files each.
>
> Although we employ our own import mechanism for the bulk of these files, we
> do use the regular import mechanism for an essential core of them.
>
> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
> This can happen for a variety of reasons, but most often it occurs when .py
> files are being removed, or moved in the hierarchy. The problem is that the
> application will happily load and import an orphaned .pyo file, even though
> the .py file has gone or moved.
>
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files. I am about to implement that patch for
> our purposes, thus forcing recompilation of the .py files on each run if so
> specified. This will ensure that the application will execute only the
> code represented by the checked-out .py files. But it occurred to me that
> this functionality might be of interest to other people than just us. I can
> imagine, for example, that buildbots running the python regression testsuite
> might be running into problems with stray .pyo files from time to time.
>
> Do you think that such a command line option would be useful for Python at
> large?
>
> Cheers,
>
> Kristján

FWIW: I've been bitten by this more than once, especially on Django
projects, mainly during the development cycle.
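
For readers hitting the same stray-bytecode problem, the cleanup side of it can be sketched in a few lines. This is only an illustration, not something proposed in the thread: the function name find_orphaned_bytecode is hypothetical, and the sketch assumes the legacy layout discussed here, where foo.pyc or foo.pyo sits in the same directory as foo.py. Directories named __pycache__ are skipped because files there follow a different naming scheme.

```python
import os


def find_orphaned_bytecode(root):
    """Yield paths of .pyc/.pyo files under root with no sibling .py file.

    Hypothetical helper; assumes bytecode lives next to its source file.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune __pycache__ in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != '__pycache__']
        present = set(filenames)
        for name in filenames:
            base, ext = os.path.splitext(name)
            if ext in ('.pyc', '.pyo') and base + '.py' not in present:
                yield os.path.join(dirpath, name)
```

Running it over a checkout and deleting (or just reporting) what it yields before a test run is one way to make sure only code backed by a checked-out .py file gets imported.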

From toddw at activestate.com Tue Dec 8 20:07:46 2009
From: toddw at activestate.com (Todd Whiteman)
Date: Tue, 08 Dec 2009 11:07:46 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <4B1EA402.50704@activestate.com>

Kristján Valur Jónsson wrote:
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files. I am about to implement that patch
> for our purposes, thus forcing recompilation of the .py files on each
> run if so specified. This will ensure that the application will
> execute only the code represented by the checked-out .py files. But it
> occurred to me that this functionality might be of interest to other
> people than just us. I can imagine, for example, that buildbots running
> the python regression testsuite might be running into problems with
> stray .pyo files from time to time.
>
> Do you think that such a command line option would be useful for Python
> at large?

Yes, this is already implemented (as of Python 2.6), see -B option:
http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

From guido at python.org Tue Dec 8 20:10:00 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Dec 2009 11:10:00 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>
Message-ID:

Agreed. I wonder if this functionality ought to be opt-in instead of
opt-out? The only use cases I am aware of are software vendors who
don't want to distribute their source (a near-extinct breed for
sure...) or people with absurdly small disks (ditto).

2009/12/8 Jesse Noller :
> 2009/12/8 Kristján Valur Jónsson :
>> Hello there.
>>
>> We have a large project involving multiple perforce branches of hundreds of
>> .py files each.
>>
>> Although we employ our own import mechanism for the bulk of these files, we
>> do use the regular import mechanism for an essential core of them.
>>
>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy. The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.
>>
>> I looked at the import code and I found that it is trivial to block the
>> reading and writing of .pyo files. I am about to implement that patch for
>> our purposes, thus forcing recompilation of the .py files on each run if so
>> specified. This will ensure that the application will execute only the
>> code represented by the checked-out .py files. But it occurred to me that
>> this functionality might be of interest to other people than just us. I can
>> imagine, for example, that buildbots running the python regression testsuite
>> might be running into problems with stray .pyo files from time to time.
>>
>> Do you think that such a command line option would be useful for Python at
>> large?
>>
>> Cheers,
>>
>> Kristján
>
> FWIW: I've been bitten by this more than once, especially on Django
> projects, mainly during the development cycle.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

--
--Guido van Rossum (python.org/~guido)

From guido at python.org Tue Dec 8 20:11:32 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Dec 2009 11:11:32 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <4B1EA402.50704@activestate.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4B1EA402.50704@activestate.com>
Message-ID:

-B only blocks *writing* of bytecode. I think the OP wants to block
*reading*, and only in the specific case where there is no
corresponding source code file.

2009/12/8 Todd Whiteman :
> Kristján Valur Jónsson wrote:
>>
>> I looked at the import code and I found that it is trivial to block the
>> reading and writing of .pyo files. I am about to implement that patch for
>> our purposes, thus forcing recompilation of the .py files on each run if so
>> specified. This will ensure that the application will execute only the
>> code represented by the checked-out .py files. But it occurred to me that
>> this functionality might be of interest to other people than just us. I can
>> imagine, for example, that buildbots running the python regression testsuite
>> might be running into problems with stray .pyo files from time to time.
>>
>> Do you think that such a command line option would be useful for Python at
>> large?
>
> Yes, this is already implemented (as of Python 2.6), see -B option:
> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

--
--Guido van Rossum (python.org/~guido)

From john.arbash.meinel at gmail.com Tue Dec 8 20:27:21 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Tue, 08 Dec 2009 13:27:21 -0600
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To:
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4B1EA402.50704@activestate.com>
Message-ID: <4B1EA899.8060409@gmail.com>

Guido van Rossum wrote:
> -B only blocks *writing* of bytecode. I think the OP wants to block
> *reading*, and only in the specific case where there is no
> corresponding source code file.
>
> 2009/12/8 Todd Whiteman :
>> Kristján Valur Jónsson wrote:
>>> I looked at the import code and I found that it is trivial to block the
>>> reading and writing of .pyo files. I am about to implement that patch for
>>> our purposes, thus forcing recompilation of the .py files on each run if so
>>> specified. This will ensure that the application will execute only the
>>> code represented by the checked-out .py files. But it occurred to me that
>>> this functionality might be of interest to other people than just us. I can
>>> imagine, for example, that buildbots running the python regression testsuite
>>> might be running into problems with stray .pyo files from time to time.
>>>
>>> Do you think that such a command line option would be useful for Python at
>>> large?
>> Yes, this is already implemented (as of Python 2.6), see -B option:
>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

This would be quite nice for us. In our case we have been bit several
times during refactoring.
You move one file, but your test suite still passes because .pyc is
still around.

I think having it be opt-in would be nice. I do think that the standard
py2exe code generates a library.zip that only has .pyc or .pyo files
(and no .py files). It isn't that we would care if they were present,
but I suppose it makes the final .zip file smaller and faster to load?

Whatever flag is available, though, I'm sure py2exe could be taught to
pass it.

John
=:->

From python at rcn.com Tue Dec 8 20:34:25 2009
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 8 Dec 2009 11:34:25 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>
Message-ID: <90229A93A0D24EC387526F18D5741983@RaymondLaptop1>

>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy. The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.

I've seen this same problem occur for a number of users.
It is a recurring opportunity to get tripped-up.

Raymond

From brett at python.org Tue Dec 8 20:51:04 2009
From: brett at python.org (Brett Cannon)
Date: Tue, 8 Dec 2009 11:51:04 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <90229A93A0D24EC387526F18D5741983@RaymondLaptop1>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com> <90229A93A0D24EC387526F18D5741983@RaymondLaptop1>
Message-ID:

On Tue, Dec 8, 2009 at 11:34, Raymond Hettinger wrote:
>
> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>>> This can happen for a variety of reasons, but most often it occurs when
>>> .py files are being removed, or moved in the hierarchy. The problem is
>>> that the application will happily load and import an orphaned .pyo file,
>>> even though the .py file has gone or moved.
>>>
>>
> I've seen this same problem occur for a number of users.
> It is a recurring opportunity to get tripped-up.

Another way that a sys.dont_read_bytecode flag would be helpful is for VMs
that don't use Python bytecode (e.g. Jython). They could set this flag to
True by default which allows code to introspect on the VM to see if it is
using bytecode or not. Plus it would let importlib easily skip bytecode
usage on VMs that don't support it instead of trying to come up with some
heuristic to pick up on that fact (I have not figured that one out yet, but
Jython folk were thinking about having marshal.loads() always throw an
exception).

-Brett

From ben+python at benfinney.id.au Tue Dec 8 22:44:01 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 09 Dec 2009 08:44:01 +1100
Subject: [Python-ideas] Importing orphaned bytecode files (was: disabling .pyc and .pyo files)
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <87ljhdf8f2.fsf@benfinney.id.au>

Kristján Valur Jónsson writes:

> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
> files. This can happen for a variety of reasons, but most often it
> occurs when .py files are being removed, or moved in the hierarchy.
> The problem is that the application will happily load and import an
> orphaned .pyo file, even though the .py file has gone or moved.

Yes, I think Python users would benefit from having the above behaviour
be opt-in.

I suggest:

* A new attribute 'sys.import_orphaned_bytecode'. If set 'True', the
  interpreter follows the current behaviour. If 'False', any bytecode
  file satisfies an import only if it has a corresponding source file
  (where 'corresponding' means 'this source file would, if compiled,
  result in a bytecode file replacing this one').

  I suggest this attribute should be implemented as 'True' by default
  (to match current behaviour), then switched to 'False' by default as
  soon as feasible.

* The 'PYTHONIMPORTORPHANEDBYTECODE' environment variable, when set,
  causes the interpreter to set the above option 'True'.

* The '-b' option to the interpreter command-line sets the above option
  'True'.

--
 \       "I have yet to see any problem, however complicated, which, |
  `\     when you looked at it in the right way, did not become still |
_o__)                            more complicated." -- Paul Anderson |
Ben Finney

From collinw at gmail.com Tue Dec 8 23:20:21 2009
From: collinw at gmail.com (Collin Winter)
Date: Tue, 8 Dec 2009 14:20:21 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To:
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com> <90229A93A0D24EC387526F18D5741983@RaymondLaptop1>
Message-ID: <43aa6ff70912081420p1cc4d5do6126eb3f50421430@mail.gmail.com>

On Tue, Dec 8, 2009 at 11:51 AM, Brett Cannon wrote:
> Another way that a sys.dont_read_bytecode flag would be helpful is for VMs
> that don't use Python bytecode (e.g. Jython). They could set this flag to
> True by default which allows code to introspect on the VM to see if it is
> using bytecode or not. Plus it would let importlib easily skip bytecode
> usage on VMs that don't support it instead of trying to come up with some
> heuristic to pick up on that fact (I have not figured that one out yet, but
> Jython folk were thinking about having marshal.loads() always throw an
> exception).

It would also be useful when benchmarking multiple iterations of the
same VM.
I've considered implementing something like this for Unladen Swallow so that we could more effectively isolate the running binary from global state (with a sys.dont_read_bytecode command-line flag doing for bytecode files what -E does for environment variables). +1 for this in mainline. Collin Winter From greg.ewing at canterbury.ac.nz Tue Dec 8 23:24:15 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 09 Dec 2009 11:24:15 +1300 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <4B1EA899.8060409@gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4B1EA402.50704@activestate.com> <4B1EA899.8060409@gmail.com> Message-ID: <4B1ED20F.8080707@canterbury.ac.nz> John Arbash Meinel wrote: > Whatever flag is available, though, I'm sure py2exe could be taught to > pass it. I'm a bit worried about the idea of adding a flag that is required to turn on functionality that was previously available without any flag. It could make things awkward for launcher scripts that are agnostic about the exact version of Python being used. -- Greg From brett at python.org Wed Dec 9 00:13:48 2009 From: brett at python.org (Brett Cannon) Date: Tue, 8 Dec 2009 15:13:48 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> Message-ID: 2009/12/8 Kristj?n Valur J?nsson > [SNIP] > I looked at the import code and I found that it is trivial to block the > reading and writing of .pyo files. I am about to implement that patch for > our purposes, thus forcing recompilation of the .py files on each run if so > specified. This will ensure that the application will execute only the > code represented by the checked-out .py files. But it occurred to me that > this functionality might be of interest to other people than just us. 
I can > imagine, for example, that buildbots running the python regression testsuite > might be running into problems with stray .pyo files from time to time. > Are you suggesting that the flag turn off reading *period*, or only if no source is available? I think you mean the former while Guido suggested the latter. -Brett From kristjan at ccpgames.com Wed Dec 9 00:23:20 2009 From: kristjan at ccpgames.com (Kristján Valur Jónsson) Date: Tue, 8 Dec 2009 23:23:20 +0000 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> Message-ID: <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> You are right, I was suggesting the former. From what cursory glance I had at the code it seemed simpler to not look for a .pyo file at all, rather than to add a special rule regarding its relation to a .py file. That would also help rule out any timestamp problems. But I’m happy with whatever way we agree on to solve the “orphaned bytecode” problem and glad to see that I’m not the only one experiencing it. Kristján From: bcannon at gmail.com [mailto:bcannon at gmail.com] On Behalf Of Brett Cannon Sent: 8. desember 2009 23:14 To: Kristján Valur Jónsson Cc: python-ideas at python.org Subject: Re: [Python-ideas] disabling .pyc and .pyo files 2009/12/8 Kristján Valur Jónsson > [SNIP] I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files. I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified. This will ensure that the application will execute only the code represented by the checked-out .py files. But it occurred to me that this functionality might be of interest to other people than just us.
I can imagine, for example, that buildbots running the python regression testsuite might be running into problems with stray .pyo files from time to time. Are you suggesting that the flag turn off reading *period*, or only if no source is available? I think you mean the former while Guido suggested the latter. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From debatem1 at gmail.com Wed Dec 9 01:07:35 2009 From: debatem1 at gmail.com (geremy condra) Date: Tue, 8 Dec 2009 19:07:35 -0500 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <4B1EA899.8060409@gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <4B1EA402.50704@activestate.com> <4B1EA899.8060409@gmail.com> Message-ID: On Tue, Dec 8, 2009 at 2:27 PM, John Arbash Meinel wrote: > Guido van Rossum wrote: >> -B only blocks *writing* of bytecode. I think the OP wants to block >> *reading*, and only in the specific case where there is no >> corresponding source code file. >> >> 2009/12/8 Todd Whiteman : >>> Kristj?n Valur J?nsson wrote: >>>> I looked at the import code and I found that it is trivial to block the >>>> reading and writing of .pyo files. ?I am about to implement that patch for >>>> our purposes, thus forcing recompilation of the .py files on each run if so >>>> specified. ? This will ensure that the application will execute only the >>>> code represented by the checked-out .py files. ?But it occurred to me that >>>> this functionality might be of interest to other people than just us. ?I can >>>> imagine, for example, that buildbots running the python regression testsuite >>>> might be running into problems with stray .pyo files from time to time. >>>> >>>> Do you think that such a command line option would be useful for Python at >>>> large? 
>>> Yes, this is already implemented (as of Python 2.6), see -B option: >>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options > > This would be quite nice for us. In our case we have been bit several > times during refactoring. You move one file, but your test suite still > passes because .pyc is still around. Same experience here. Geremy Condra From eric at trueblade.com Wed Dec 9 01:04:04 2009 From: eric at trueblade.com (Eric Smith) Date: Tue, 08 Dec 2009 19:04:04 -0500 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: <87ljhdf8f2.fsf@benfinney.id.au> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: <4B1EE974.2090801@trueblade.com> Ben Finney wrote: > Kristj?n Valur J?nsson > writes: > >> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) >> files. This can happen for a variety of reasons, but most often it >> occurs when .py files are being removed, or moved in the hierarchy. >> The problem is that the application will happily load and import an >> orphaned .pyo file, even though the .py file has gone or moved. > > Yes, I think Python users would benefit from having the above behaviour > be opt-in. Agreed. This has bitten me, too. Often when it's a permissions problem where another user has created the .pyc file and I can't overwrite it (this on Windows). > I suggest: > > * A new attribute ?sys.import_orphaned_bytecode?. If set ?True?, the > interpreter follows the current behaviour. If ?False?, any bytecode > file satisfies an import only if it has a corresponding source file > (where ?corresponding? means ?this source file would, if compiled, > result in a bytecode file replacing this one?). I agree with this in principle, but I don't see how you're going to implement it. In order to actually check this condition, aren't you going to have to compile the source code anyway? If so, just skip the bytecode file. 
Although I guess you could store a hash of the source in the compiled file, or other similar optimizations. > I suggest this attribute should be implemented as ?True? by default > (to match current behaviour), then switched to ?False? by default as > soon as feasible. > > * The ?PYTHONIMPORTORPHANEDBYTECODE? environment variable, when set, > causes the interpreter to set the above option ?True?. > > * The ?-b? option to the interpreter command-line sets the above option > ?True?. Sounds good to me. Eric. From brett at python.org Wed Dec 9 01:45:42 2009 From: brett at python.org (Brett Cannon) Date: Tue, 8 Dec 2009 16:45:42 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> Message-ID: 2009/12/8 Kristj?n Valur J?nsson > You are right, I was suggesting the former. From what cursory glance I > had at the code it seemed simpler to not look for a .pyo file at all, rather > than to add a special rule regarding its relation to a .py file. That would > also help rule out any timestamp problems. But I?m happy with whatever way > we agree on to solve the ?orphaned bytecode? problem and glad to see that > I?m not the only one experiencing it. > > I prefer the former as well (don't read any bytecode no matter if source is available or not); clear and simple semantics that are easy to implement. > > > Kristj?n > > > > *From:* bcannon at gmail.com [mailto:bcannon at gmail.com] *On Behalf Of *Brett > Cannon > *Sent:* 8. desember 2009 23:14 > *To:* Kristj?n Valur J?nsson > *Cc:* python-ideas at python.org > *Subject:* Re: [Python-ideas] disabling .pyc and .pyo files > > > > > > 2009/12/8 Kristj?n Valur J?nsson > > [SNIP] > > I looked at the import code and I found that it is trivial to block the > reading and writing of .pyo files. 
I am about to implement that patch for > our purposes, thus forcing recompilation of the .py files on each run if so > specified. This will ensure that the application will execute only the > code represented by the checked-out .py files. But it occurred to me that > this functionality might be of interest to other people than just us. I can > imagine, for example, that buildbots running the python regression testsuite > might be running into problems with stray .pyo files from time to time. > > > > Are you suggesting that the flag turn off reading *period*, or only if no > source is available? I think you mean the former while Guido suggested the > latter. > > > > -Brett > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Wed Dec 9 03:28:01 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 09 Dec 2009 13:28:01 +1100 Subject: [Python-ideas] Importing orphaned bytecode files References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> Message-ID: <87r5r4ev9q.fsf@benfinney.id.au> Eric Smith writes: > Ben Finney wrote: > > I suggest: > > > > * A new attribute ?sys.import_orphaned_bytecode?. If set ?True?, the > > interpreter follows the current behaviour. If ?False?, any bytecode > > file satisfies an import only if it has a corresponding source file > > (where ?corresponding? means ?this source file would, if compiled, > > result in a bytecode file replacing this one?). > > I agree with this in principle Thanks. > but I don't see how you're going to implement it. In order to actually > check this condition, aren't you going to have to compile the source > code anyway? If so, just skip the bytecode file. Although I guess you > could store a hash of the source in the compiled file, or other > similar optimizations. You seem to be seeing something I was careful not to write. 
The check is: this source file would, if compiled, result in a bytecode file replacing this one Nowhere there is there anything about the resulting bytecode files being equivalent. I'm limiting the check only to whether the resulting bytecode file would *replace* the existing bytecode file. This doesn't require knowing anything at all about the contents of the current bytecode file; indeed, my intention was to phrase it so that it's checked before bothering to open the existing bytecode file. Is there a better term for this? I'm not well-versed enough in the Python import internals to know. -- \ ?Philosophy is questions that may never be answered. Religion | `\ is answers that may never be questioned.? ?anonymous | _o__) | Ben Finney From guido at python.org Wed Dec 9 04:30:25 2009 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Dec 2009 19:30:25 -0800 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: <87r5r4ev9q.fsf@benfinney.id.au> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> <87r5r4ev9q.fsf@benfinney.id.au> Message-ID: On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney wrote: > Eric Smith writes: > >> Ben Finney wrote: >> > I suggest: >> > >> > * A new attribute ?sys.import_orphaned_bytecode?. If set ?True?, the >> > ? interpreter follows the current behaviour. If ?False?, any bytecode >> > ? file satisfies an import only if it has a corresponding source file >> > ? (where ?corresponding? means ?this source file would, if compiled, >> > ? result in a bytecode file replacing this one?). >> >> I agree with this in principle > > Thanks. > >> but I don't see how you're going to implement it. In order to actually >> check this condition, aren't you going to have to compile the source >> code anyway? If so, just skip the bytecode file. Although I guess you >> could store a hash of the source in the compiled file, or other >> similar optimizations. 
> > You seem to be seeing something I was careful not to write. The check > is: > > ? this source file would, if compiled, result in a bytecode file > ? replacing this one > > Nowhere there is there anything about the resulting bytecode files being > equivalent. I'm limiting the check only to whether the resulting > bytecode file would *replace* the existing bytecode file. > > This doesn't require knowing anything at all about the contents of the > current bytecode file; indeed, my intention was to phrase it so that > it's checked before bothering to open the existing bytecode file. > > Is there a better term for this? I'm not well-versed enough in the > Python import internals to know. If there was a corresponding source file, it would have been found first -- and the bytecode file would be used *if* it matches the source file (by comparing a timestamp in the bytecode file's header to the actual mtime of the source file). So I'm not sure what there is to do apart from *not* using "lone" bytecode files. (The latter was actually added as a feature at some point so I betcha it's easy to make it conditional on a flag.) -- --Guido van Rossum (python.org/~guido) From ben+python at benfinney.id.au Wed Dec 9 06:38:32 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 09 Dec 2009 16:38:32 +1100 Subject: [Python-ideas] Importing orphaned bytecode files References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> <87r5r4ev9q.fsf@benfinney.id.au> Message-ID: <87iqcgemg7.fsf@benfinney.id.au> Guido van Rossum writes: > On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney wrote: > > ? this source file would, if compiled, result in a bytecode file > > ? replacing this one > > > > Nowhere there is there anything about the resulting bytecode files > > being equivalent. I'm limiting the check only to whether the > > resulting bytecode file would *replace* the existing bytecode file. 
> > > > This doesn't require knowing anything at all about the contents of > > the current bytecode file; indeed, my intention was to phrase it so > > that it's checked before bothering to open the existing bytecode > > file. > > > > Is there a better term for this? I'm not well-versed enough in the > > Python import internals to know. > > If there was a corresponding source file, it would have been found > first -- and the bytecode file would be used *if* it matches the > source file (by comparing a timestamp in the bytecode file's header to > the actual mtime of the source file). Right, that's what I thought. I was only looking for a way to say “only use a bytecode file if the corresponding source code file exists”, and then trying to define “corresponding source code file”. It appears that all I'm doing is confusing the issue, probably because my understanding of the terminology is fuzzy. I hope someone else can word it better, so the question of “which file, exactly, are we saying must exist?” is well answered. > So I'm not sure what there is to do apart from *not* using "lone" > bytecode files. (The latter was actually added as a feature at some > point so I betcha it's easy to make it conditional on a flag.) I hope your instinct is right, and I betcha it is too. -- \ “Intellectual property is to the 21st century what the slave | `\ trade was to the 16th.” - David Mertz | _o__) | Ben Finney From eric at trueblade.com Wed Dec 9 07:18:45 2009 From: eric at trueblade.com (Eric Smith) Date: Wed, 09 Dec 2009 01:18:45 -0500 Subject: [Python-ideas] Importing orphaned bytecode files Message-ID: Sorry for top posting. My phone makes me! You're right: I misread. Sorry about that. -- Eric. "Ben Finney" wrote: >Eric Smith writes: > >> Ben Finney wrote: >> > I suggest: >> > >> > * A new attribute “sys.import_orphaned_bytecode”. If set “True”, the >> > interpreter follows the current behaviour.
If “False”, any bytecode >> > file satisfies an import only if it has a corresponding source file >> > (where “corresponding” means “this source file would, if compiled, >> > result in a bytecode file replacing this one”). >> >> I agree with this in principle > >Thanks. > >> but I don't see how you're going to implement it. In order to actually >> check this condition, aren't you going to have to compile the source >> code anyway? If so, just skip the bytecode file. Although I guess you >> could store a hash of the source in the compiled file, or other >> similar optimizations. > >You seem to be seeing something I was careful not to write. The check >is: > > this source file would, if compiled, result in a bytecode file > replacing this one > >Nowhere there is there anything about the resulting bytecode files being >equivalent. I'm limiting the check only to whether the resulting >bytecode file would *replace* the existing bytecode file. > >This doesn't require knowing anything at all about the contents of the >current bytecode file; indeed, my intention was to phrase it so that >it's checked before bothering to open the existing bytecode file. > >Is there a better term for this? I'm not well-versed enough in the >Python import internals to know. > >-- > \ “Philosophy is questions that may never be answered. Religion | > `\ is answers that may never be questioned.” - anonymous | >_o__) | >Ben Finney > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas From ben+python at benfinney.id.au Wed Dec 9 07:28:19 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 09 Dec 2009 17:28:19 +1100 Subject: [Python-ideas] [OT] Broken email tools (was: Importing orphaned bytecode files) References: Message-ID: <87ein4ek58.fsf@benfinney.id.au> Eric Smith writes: > Sorry for top posting. My phone makes me!
If you have a broken tool, please don't inflict its brokenness on others, especially if you *know* it's broken when you use it. -- \ “Nothing so needs reforming as other people's habits.” - Mark | `\ Twain, _Pudd'n'head Wilson_ | _o__) | Ben Finney From ncoghlan at gmail.com Wed Dec 9 11:22:35 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 09 Dec 2009 20:22:35 +1000 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: <87iqcgemg7.fsf@benfinney.id.au> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> <87r5r4ev9q.fsf@benfinney.id.au> <87iqcgemg7.fsf@benfinney.id.au> Message-ID: <4B1F7A6B.9060501@gmail.com> Ben Finney wrote: > Right, that's what I thought. I was only looking for a way to say “only > use a bytecode file if the corresponding source code file exists”, and > then trying to define “corresponding source code file”. As Guido said, the check goes the other way: the interpreter looks for source files first, and if it doesn't find one, only then does it look for orphaned bytecode files (pyo/pyc). The check for a corresponding bytecode file after a source file has actually been found follows a different path through the import code. Since the two features are somewhat orthogonal, slicing out the check for orphaned bytecode files while keeping the check for a cached bytecode file should be fairly straightforward. Fair warning to anyone that implements this - expect to be updating quite a few parts of the test suite. The runpy, command line, import and zipimport tests would all need to be updated to make sure they were respecting the flag (and probably the importlib tests as well, at least in Py3k). Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From p.f.moore at gmail.com Wed Dec 9 13:40:53 2009 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Dec 2009 12:40:53 +0000 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> Message-ID: <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> 2009/12/9 Brett Cannon : > I prefer the former as well (don't read any bytecode no matter if source is > available or not); clear and simple semantics that are easy to implement. If that's the rule, what is the point in writing bytecode at all? It'll never be read... Paul. From jnoller at gmail.com Wed Dec 9 14:04:01 2009 From: jnoller at gmail.com (Jesse Noller) Date: Wed, 9 Dec 2009 08:04:01 -0500 Subject: [Python-ideas] [OT] Broken email tools (was: Importing orphaned bytecode files) In-Reply-To: <87ein4ek58.fsf@benfinney.id.au> References: <87ein4ek58.fsf@benfinney.id.au> Message-ID: <4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com> On Wed, Dec 9, 2009 at 1:28 AM, Ben Finney wrote: > Eric Smith writes: > >> Sorry for top posting. My phone makes me! > > No, it really doesn't. If you have a broken tool, please don't inflict > its brokenness on others, especially if you *know* it's broken when > you use it. Top posting isn't that big of an issue. Drop it, please. 
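The lookup order that Guido and Nick Coghlan describe in this thread (source file first, cached bytecode only if its recorded timestamp matches the source mtime, and a lone bytecode file only as a fallback) can be sketched roughly as below. This is only an illustration of the described decision logic, not the interpreter's actual import code: the function name `import_plan`, its arguments, and the `allow_orphans` switch are all hypothetical, with `allow_orphans` standing in for whatever flag the thread eventually settles on.

```python
def import_plan(source_mtime, bytecode_mtime, allow_orphans=True):
    """Decide what an import would load (hypothetical sketch).

    source_mtime   -- mtime of foo.py, or None if there is no source file
    bytecode_mtime -- source timestamp recorded in the bytecode header,
                      or None if there is no bytecode file
    allow_orphans  -- stand-in for the flag proposed in this thread
    """
    if source_mtime is not None:
        # Source was found first: bytecode is only a cache for it.
        if bytecode_mtime == source_mtime:
            return "load cached bytecode"
        return "compile source"
    # No source: the only candidate left is a lone ("orphaned") bytecode file.
    if bytecode_mtime is not None and allow_orphans:
        return "load orphaned bytecode"
    return "fail import"
```

With `allow_orphans=False` a stray .pyc left behind after its .py was moved no longer satisfies the import, while the cache path for existing sources is untouched, which is the separation Nick calls "somewhat orthogonal".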
From brett at python.org Wed Dec 9 19:48:30 2009 From: brett at python.org (Brett Cannon) Date: Wed, 9 Dec 2009 10:48:30 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> Message-ID: 2009/12/9 Paul Moore > 2009/12/9 Brett Cannon : > > I prefer the former as well (don't read any bytecode no matter if source > is > > available or not); clear and simple semantics that are easy to implement. > > If that's the rule, what is the point in writing bytecode at all? > It'll never be read... This entire discussion is in the context of having a flag you need to set to turn off bytecode usage; the default behavior is not going to change. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Dec 9 19:52:20 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Dec 2009 10:52:20 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> Message-ID: Could it be as simple as this: -b don't read bytecode (new flag) -B don't write bytecode (existing flag) ? On Wed, Dec 9, 2009 at 10:48 AM, Brett Cannon wrote: > > > 2009/12/9 Paul Moore >> >> 2009/12/9 Brett Cannon : >> > I prefer the former as well (don't read any bytecode no matter if source >> > is >> > available or not); clear and simple semantics that are easy to >> > implement. >> >> If that's the rule, what is the point in writing bytecode at all? >> It'll never be read... 
> > This entire discussion is in the context of having a flag you need to set to > turn off bytecode usage; the default behavior is not going to change. > -Brett > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- --Guido van Rossum (python.org/~guido) From brett at python.org Wed Dec 9 19:56:03 2009 From: brett at python.org (Brett Cannon) Date: Wed, 9 Dec 2009 10:56:03 -0800 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: <4B1F7A6B.9060501@gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> <87r5r4ev9q.fsf@benfinney.id.au> <87iqcgemg7.fsf@benfinney.id.au> <4B1F7A6B.9060501@gmail.com> Message-ID: On Wed, Dec 9, 2009 at 02:22, Nick Coghlan wrote: > Ben Finney wrote: > > Right, that's what I thought. I was only looking for a way to say ?only > > use a bytecode file if the corresponding source code file exists?, and > > then trying to define ?corresponding source code file?. > > As Guido said, the check goes the other way: the interpreter looks for > source files first, and if it doesn't find one, only then does it look > for orphaned bytecode files (pyo/pyc). > > Just a data point: I reversed that order in importlib to match mental semantics. > The check for a corresponding bytecode files after a source file has > actually been found follows a different path through the import code. > > Since the two features are somewhat orthogonal, slicing out the check > for orphaned bytecode files while keeping the check for a cached > bytecode file should be fairly straightforward. > > Fair warning to anyone that implements this - expect to be updating > quite a few parts of the test suite. 
The runpy, command line, import and > zipimport tests would all need to be updated to make sure they were > respecting the flag (and probably the importlib tests as well, at least > in Py3k). > Yep for importlib, but I already protect bytecode-writing tests with a decorator for sys.dont_write_bytecode, so doing this for tests that rely on reading bytecode could easily be decorated as well. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Dec 9 19:57:43 2009 From: brett at python.org (Brett Cannon) Date: Wed, 9 Dec 2009 10:57:43 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> Message-ID: On Wed, Dec 9, 2009 at 10:52, Guido van Rossum wrote: > Could it be as simple as this: > > -b don't read bytecode (new flag) > -B don't write bytecode (existing flag) > Unfortunately no: -b is "issue warnings about str(bytes_instance), str(bytearray_instance) and comparing bytes/bytearray with str. (-bb: issue errors)" under python3. -Brett > > ? > > On Wed, Dec 9, 2009 at 10:48 AM, Brett Cannon wrote: > > > > > > 2009/12/9 Paul Moore > >> > >> 2009/12/9 Brett Cannon : > >> > I prefer the former as well (don't read any bytecode no matter if > source > >> > is > >> > available or not); clear and simple semantics that are easy to > >> > implement. > >> > >> If that's the rule, what is the point in writing bytecode at all? > >> It'll never be read... > > > > This entire discussion is in the context of having a flag you need to set > to > > turn off bytecode usage; the default behavior is not going to change. 
> > -Brett > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jared.grubb at gmail.com Wed Dec 9 20:07:54 2009 From: jared.grubb at gmail.com (Jared Grubb) Date: Wed, 9 Dec 2009 11:07:54 -0800 Subject: [Python-ideas] Importing orphaned bytecode files (was: disabling .pyc and .pyo files) In-Reply-To: <87ljhdf8f2.fsf@benfinney.id.au> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: On 8 Dec 2009, at 13:44, Ben Finney wrote: > > * A new attribute ?sys.import_orphaned_bytecode?. If set ?True?, the > interpreter follows the current behaviour. If ?False?, any bytecode > file satisfies an import only if it has a corresponding source file > (where ?corresponding? means ?this source file would, if compiled, > result in a bytecode file replacing this one?). One problem with a sys flag is that it's a global setting. Suppose a package is distributed with only pyc/pyo files, then the top-level __init__.py might flip the switch such that its sub-files can get imported from the pyc/pyo files. But you wouldnt want that flag to persist beyond that. Another idea is to use a new file extension, which isnt the best solution, but allows the creator to explicitly set what behavior they intended for their files: * if a foo.py file exists, then use the existing foo.pyc/pyo as is done today * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but file.pyc/pyo is never used, unlike today) (pyxxx is a placeholder for whatever would be a reasonable name) Jared -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Wed Dec 9 20:11:58 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Dec 2009 11:11:58 -0800 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> <87r5r4ev9q.fsf@benfinney.id.au> <87iqcgemg7.fsf@benfinney.id.au> <4B1F7A6B.9060501@gmail.com> Message-ID: On Wed, Dec 9, 2009 at 10:56 AM, Brett Cannon wrote: > > > On Wed, Dec 9, 2009 at 02:22, Nick Coghlan wrote: >> >> Ben Finney wrote: >> > Right, that's what I thought. I was only looking for a way to say "only >> > use a bytecode file if the corresponding source code file exists", and >> > then trying to define "corresponding source code file". >> >> As Guido said, the check goes the other way: the interpreter looks for >> source files first, and if it doesn't find one, only then does it look >> for orphaned bytecode files (pyo/pyc). >> > > Just a data point: I reversed that order in importlib to match mental > semantics. IIRC zipimport also reverses the order. >> The check for a corresponding bytecode file after a source file has >> actually been found follows a different path through the import code. >> >> Since the two features are somewhat orthogonal, slicing out the check >> for orphaned bytecode files while keeping the check for a cached >> bytecode file should be fairly straightforward. >> >> Fair warning to anyone that implements this - expect to be updating >> quite a few parts of the test suite. The runpy, command line, import and >> zipimport tests would all need to be updated to make sure they were >> respecting the flag (and probably the importlib tests as well, at least >> in Py3k). > > Yep for importlib, but I already protect bytecode-writing tests with a > decorator for sys.dont_write_bytecode, so doing this for tests that rely on > reading bytecode could easily be decorated as well.
> -Brett -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Dec 9 20:27:00 2009 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Dec 2009 11:27:00 -0800 Subject: [Python-ideas] Importing orphaned bytecode files (was: disabling .pyc and .pyo files) In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb wrote: > > On 8 Dec 2009, at 13:44, Ben Finney wrote: > > * A new attribute 'sys.import_orphaned_bytecode'. If set 'True', the > interpreter follows the current behaviour. If 'False', any bytecode > file satisfies an import only if it has a corresponding source file > (where "corresponding" means "this source file would, if compiled, > result in a bytecode file replacing this one"). > > One problem with a sys flag is that it's a global setting. Suppose a package > is distributed with only pyc/pyo files, then the top-level __init__.py might > flip the switch such that its sub-files can get imported from the pyc/pyo > files. But you wouldn't want that flag to persist beyond that. I'm not sure that there are any use cases that require using conflicting values of this setting for different packages. > Another idea is to use a new file extension, which isn't the best solution, > but allows the creator to explicitly set what behavior they intended for > their files: > * if a foo.py file exists, then use the existing foo.pyc/pyo as is done > today > * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but > file.pyc/pyo is never used, unlike today) > (pyxxx is a placeholder for whatever would be a reasonable name) It's a much bigger change, but using a different extension would probably remove the need for a flag. It would also help with some tools that hide .pyc/.pyo files from view (e.g. the typical .svnignore).
-- --Guido van Rossum (python.org/~guido) From john.arbash.meinel at gmail.com Wed Dec 9 20:34:41 2009 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Wed, 09 Dec 2009 13:34:41 -0600 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: <4B1FFBD1.4070004@gmail.com> Guido van Rossum wrote: > On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb wrote: >> On 8 Dec 2009, at 13:44, Ben Finney wrote: >> >> * A new attribute 'sys.import_orphaned_bytecode'. If set 'True', the >> interpreter follows the current behaviour. If 'False', any bytecode >> file satisfies an import only if it has a corresponding source file >> (where "corresponding" means "this source file would, if compiled, >> result in a bytecode file replacing this one"). >> >> One problem with a sys flag is that it's a global setting. Suppose a package >> is distributed with only pyc/pyo files, then the top-level __init__.py might >> flip the switch such that its sub-files can get imported from the pyc/pyo >> files. But you wouldn't want that flag to persist beyond that. > > I'm not sure that there are any use cases that require using > conflicting values of this setting for different packages. > Well, consider development of your own codebase, where you would like to not import stale .pyc files, but the codebase depends on a 3rd-party library that ships only .pyc files. Now if the flag was somehow "for all modules under this namespace" that would easily handle it. Or one could just live with "if you want to use private 3rd-party libs, then you don't get this support for your own development". (I don't currently do this, but it certainly is *a* use case.)
John =:-> From ben+python at benfinney.id.au Wed Dec 9 23:18:21 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 10 Dec 2009 09:18:21 +1100 Subject: [Python-ideas] disabling .pyc and .pyo files References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> Message-ID: <87skbjdc5u.fsf@benfinney.id.au> Guido van Rossum writes: > Could it be as simple as this: > > -b don't read bytecode (new flag) > -B don't write bytecode (existing flag) Almost, but I think many in this discussion are agitating for "don't read orphaned bytecode" to become the default. -- \ "Visitors are expected to complain at the office between the | `\ hours of 9 and 11 a.m. daily." -- hotel, Athens | _o__) | Ben Finney From brett at python.org Wed Dec 9 23:43:05 2009 From: brett at python.org (Brett Cannon) Date: Wed, 9 Dec 2009 14:43:05 -0800 Subject: [Python-ideas] Importing orphaned bytecode files (was: disabling .pyc and .pyo files) In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: On Wed, Dec 9, 2009 at 11:27, Guido van Rossum wrote: > On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb > wrote: > > > > On 8 Dec 2009, at 13:44, Ben Finney wrote: > > > > * A new attribute 'sys.import_orphaned_bytecode'. If set 'True', the > > interpreter follows the current behaviour. If 'False', any bytecode > > file satisfies an import only if it has a corresponding source file > > (where "corresponding" means "this source file would, if compiled, > > result in a bytecode file replacing this one"). > > > > One problem with a sys flag is that it's a global setting. Suppose a > package > > is distributed with only pyc/pyo files, then the top-level __init__.py > might > > flip the switch such that its sub-files can get imported from the pyc/pyo > > files.
But you wouldn't want that flag to persist beyond that. > > I'm not sure that there are any use cases that require using > conflicting values of this setting for different packages. > > Same here. This is straying into optimizations for the sake of optimizing. > > Another idea is to use a new file extension, which isn't the best > solution, > > but allows the creator to explicitly set what behavior they intended for > > their files: > > * if a foo.py file exists, then use the existing foo.pyc/pyo as is done > > today > > * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but > > file.pyc/pyo is never used, unlike today) > > (pyxxx is a placeholder for whatever would be a reasonable name) > > It's a much bigger change, but using a different extension would > probably remove the need for a flag. It would also help with some > tools that hide .pyc/.pyo files from view (e.g. the typical > .svnignore). From a Python VM perspective, the problem with this is it doesn't help improve the situation for other VMs that have no concept of bytecode. If we make pyc/pyo files purely an optimization for CPython (and other VMs that choose to support the format) and not a recognized executable format on its own (like it is now) then that would probably help prevent people from distributing pyc/pyo files only and thus locking out the use of other VMs. I know some people seem to think pyc/pyo files are a good way to obfuscate code, but it honestly isn't, IMO. But these people stand the most to lose from us even considering changing default behavior. In a perfect world I would make pyc/pyo files completely optional and only an optimization that could not work w/o the corresponding source. But in a backwards-compatible, paranoid world I would make it an opt-in flag to ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter. -Brett -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ben+python at benfinney.id.au Wed Dec 9 23:44:29 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 10 Dec 2009 09:44:29 +1100 Subject: [Python-ideas] Importing orphaned bytecode files References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1FFBD1.4070004@gmail.com> Message-ID: <87ocm7daya.fsf@benfinney.id.au> John Arbash Meinel writes: > Or just living with "if you want to use private 3rd-party libs, then > you don't get this support for your own development". FWIW, that's the option I would advocate. The default is to develop and distribute with source; choosing to omit source (or choosing to use such software) is choosing an inferior option for many other reasons as well, so I don't see it as a use case that needs explicit support. -- \ ?A learning experience is one of those things that say, ?You | `\ know that thing you just did? Don't do that.?? ?Douglas Adams, | _o__) 2000-04-05 | Ben Finney From ben+python at benfinney.id.au Thu Dec 10 00:00:04 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 10 Dec 2009 10:00:04 +1100 Subject: [Python-ideas] [OT] Broken email tools References: <87ein4ek58.fsf@benfinney.id.au> <4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com> Message-ID: <87k4wvda8b.fsf@benfinney.id.au> Jesse Noller writes: > On Wed, Dec 9, 2009 at 1:28 AM, Ben Finney wrote: > > Eric Smith writes: > > > >> Sorry for top posting. My phone makes me! > > > > No, it really doesn't. If you have a broken tool, please don't > > inflict its brokenness on others, especially if you *know* it's > > broken when you use it. > > Top posting isn't that big of an issue. Drop it, please. No bigger than other problems of poor human-to-human communication. I agree with Eric that it deserves apology, even if you don't think it's a big deal. -- \ ?In any great organization it is far, far safer to be wrong | `\ with the majority than to be right alone.? 
?John Kenneth | _o__) Galbraith, 1989-07-28 | Ben Finney From debatem1 at gmail.com Thu Dec 10 00:04:08 2009 From: debatem1 at gmail.com (geremy condra) Date: Wed, 9 Dec 2009 18:04:08 -0500 Subject: [Python-ideas] Importing orphaned bytecode files (was: disabling .pyc and .pyo files) In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: > In a perfect world I would make pyc/pyo files completely optional and only > an optimization that could not work w/o the corresponding source. But in a > backwards-compatible, paranoid world I would make it an opt-in flag to > ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter. > -Brett FWIW, I'm in about the same boat here. As a somewhat tangential question, is anybody aware of any python3 projects for which requiring source would be an issue? Geremy Condra From fetchinson at googlemail.com Thu Dec 10 00:27:09 2009 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Thu, 10 Dec 2009 00:27:09 +0100 Subject: [Python-ideas] [OT] Broken email tools In-Reply-To: <87k4wvda8b.fsf@benfinney.id.au> References: <87ein4ek58.fsf@benfinney.id.au> <4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com> <87k4wvda8b.fsf@benfinney.id.au> Message-ID: >> >> Sorry for top posting. My phone makes me! >> > >> > No, it really doesn't. If you have a broken tool, please don't >> > inflict its brokenness on others, especially if you *know* it's >> > broken when you use it. >> >> Top posting isn't that big of an issue. Drop it, please. > > No bigger than other problems of poor human-to-human communication. I > agree with Eric that it deserves apology, even if you don't think it's a > big deal. Did you actually make a survey of c.l.p users to determine what fraction finds top posting poor human-to-human communication? My guess is that below 31%. 
Off the top of my head only one name comes to mind who thinks top posting is at least sometimes appropriate: GvR. Note: you are free to install software that will automatically delete any post that is top posted and voilà, you will never be bothered again. Why not do that? Cheers, Daniel -- Psss, psss, put it down! - http://www.cafepress.com/putitdown From solipsis at pitrou.net Thu Dec 10 04:48:54 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 10 Dec 2009 03:48:54 +0000 (UTC) Subject: [Python-ideas] disabling .pyc and .pyo files References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> <87skbjdc5u.fsf@benfinney.id.au> Message-ID: Ben Finney writes: > > Guido van Rossum writes: > > > Could it be as simple as this: > > > > -b don't read bytecode (new flag) > > -B don't write bytecode (existing flag) > > Almost, but I think many in this discussion are agitating for "don't > read orphaned bytecode" to become the default. Either to become the default (which might require updates to things like py2exe), or to have a dedicated flag. On the other hand, a flag not to read bytecode /at all/ doesn't seem to have a use case. If you don't want to read any bytecode, don't produce/install it in the first place. Bytecode is useful: it reduces startup times. It's only annoying when the original .py file has been deleted and the obsolete .pyc/.pyo is dangling on disk. cheers Antoine.
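A rough sketch of how the dangling files described above could be located. It assumes the layout under discussion in this thread, where foo.pyc or foo.pyo sits in the same directory as foo.py. The name find_orphaned_bytecode is made up for illustration and is not an existing tool. Directories named __pycache__ are skipped, since bytecode that newer interpreters store there never has a sibling source file and would all be misreported.

```python
import os


def find_orphaned_bytecode(root):
    """Yield .pyc/.pyo paths under root that have no sibling .py file.

    Hypothetical helper: assumes bytecode lives next to its source.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune __pycache__ so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        names = set(filenames)
        for name in sorted(filenames):
            base, ext = os.path.splitext(name)
            if ext in (".pyc", ".pyo") and base + ".py" not in names:
                yield os.path.join(dirpath, name)
```

Printing the results before deleting anything seems the safer default, since a bytecode-only third-party package would show up in the same list.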
From collinw at gmail.com Thu Dec 10 05:47:05 2009 From: collinw at gmail.com (Collin Winter) Date: Wed, 9 Dec 2009 20:47:05 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> <87skbjdc5u.fsf@benfinney.id.au> Message-ID: <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com> On Wed, Dec 9, 2009 at 7:48 PM, Antoine Pitrou wrote: > Ben Finney writes: >> >> Guido van Rossum writes: >> >> > Could it be as simple as this: >> > >> > -b don't read bytecode (new flag) >> > -B don't write bytecode (existing flag) >> >> Almost, but I think many in this discussion are agitating for "don't >> read orphaned bytecode" to become the default. > > Either to become the default (which might require updates to things like > py2exe), or to have a dedicated flag. > On the other hand, a flag not to read bytecode /at all/ doesn't seem to have a > use case. If you don't want to read any bytecode, don't produce/install it in > the first place. I gave such a use-case earlier in this thread: """ It would also be useful when benchmarking multiple iterations of the same VM. I've considered implementing something like this for Unladen Swallow so that we could more effectively isolate the running binary from global state (with a sys.dont_read_bytecode command-line flag doing for bytecode files what -E does for environment variables). """ We currently handle this by deleting all .pyc/.pyo files in our library tree, but that gets more expensive the more third-party libraries we bring in for testing, and it's not foolproof.
Collin Winter From solipsis at pitrou.net Thu Dec 10 05:50:40 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 10 Dec 2009 05:50:40 +0100 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> <87skbjdc5u.fsf@benfinney.id.au> <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com> Message-ID: <1260420640.3371.1.camel@localhost> > I gave such a use-case earlier in this thread: > > """ > It would also be useful when benchmarking multiple iterations of the > same VM. I've considered implementing something like this for Unladen > Swallow so that we could more effectively isolate the running binary > from global state (with a sys.dont_read_bytecode command-line flag > doing for bytecode files what -E does for environment variables). > """ I'm not sure I understand the point. Surely importing modules isn't in the critical path (or even in the measured path) of your benchmark, is it? 
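One way to get the isolation described in the quoted use case without any new flag might be to compile a module straight from its source text. The sketch below is only an illustration: import_from_source is a hypothetical helper, not an existing API, and it covers a single file only. Imports executed inside that file still go through the normal import machinery and may pick up cached bytecode.

```python
import types


def import_from_source(name, path):
    """Compile the file at path from scratch and return it as a module.

    Hypothetical helper: no .pyc/.pyo file is read or written for this
    module, because the import system is bypassed entirely.
    """
    with open(path, "rb") as f:
        source = f.read()
    # compile() accepts bytes and honours any encoding declaration.
    code = compile(source, path, "exec")
    module = types.ModuleType(name)
    module.__file__ = path
    exec(code, module.__dict__)
    return module
```

The module is deliberately not registered in sys.modules, so a later plain import of the same name would load it again through the usual path.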
From collinw at gmail.com Thu Dec 10 06:00:19 2009 From: collinw at gmail.com (Collin Winter) Date: Wed, 9 Dec 2009 21:00:19 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <1260420640.3371.1.camel@localhost> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> <87skbjdc5u.fsf@benfinney.id.au> <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com> <1260420640.3371.1.camel@localhost> Message-ID: <43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com> On Wed, Dec 9, 2009 at 8:50 PM, Antoine Pitrou wrote: > >> I gave such a use-case earlier in this thread: >> >> """ >> It would also be useful when benchmarking multiple iterations of the >> same VM. I've considered implementing something like this for Unladen >> Swallow so that we could more effectively isolate the running binary >> from global state (with a sys.dont_read_bytecode command-line flag >> doing for bytecode files what -E does for environment variables). >> """ > > I'm not sure I understand the point. Surely importing modules isn't in > the critical path (or even in the measured path) of your benchmark, is > it? When changing the bytecode sequence produced by the CPython compiler, it would be useful to make sure that a module is being compiled from scratch (and hence using the new version of the compiler) instead of reusing older bytecode from a .pyc file. You might say that we should simply increase the magic number with each iteration, but I've never found that having to change more code boosts my productivity (especially in cases where changing the magic number is not necessary for compatibility purposes). I understand this may be a fringe use-case, but given the number of optimization projects based on CPython (of which ours is but one), it may still be worth considering. 
Collin From solipsis at pitrou.net Thu Dec 10 06:04:31 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 10 Dec 2009 06:04:31 +0100 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> <87skbjdc5u.fsf@benfinney.id.au> <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com> <1260420640.3371.1.camel@localhost> <43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com> Message-ID: <1260421471.3371.2.camel@localhost> > When changing the bytecode sequence produced by the CPython compiler, > it would be useful to make sure that a module is being compiled from > scratch (and hence using the new version of the compiler) instead of > reusing older bytecode from a .pyc file. You might say that we should > simply increase the magic number with each iteration, Or simply "rm -f `find -name *.pyc`" :-) From collinw at gmail.com Thu Dec 10 06:07:47 2009 From: collinw at gmail.com (Collin Winter) Date: Wed, 9 Dec 2009 21:07:47 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <1260421471.3371.2.camel@localhost> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> <87skbjdc5u.fsf@benfinney.id.au> <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com> <1260420640.3371.1.camel@localhost> <43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com> <1260421471.3371.2.camel@localhost> Message-ID: <43aa6ff70912092107n62e9c52ew3ffb8c7ea496f695@mail.gmail.com> On Wed, Dec 9, 2009 at 9:04 PM, Antoine Pitrou wrote: > >> When changing the bytecode sequence produced by the CPython compiler, >> it would be useful to make sure that a module is being compiled from >> scratch (and hence 
using the new version of the compiler) instead of >> reusing older bytecode from a .pyc file. You might say that we should >> simply increase the magic number with each iteration, > > Or simply "rm -f `find -name *.pyc`" :-) As I said, "We currently handle this by deleting all .pyc/.pyo files in our library tree, but that gets more expensive the more third-party libraries we bring in for testing, and it's not foolproof." I tire of quoting myself. Collin From ncoghlan at gmail.com Thu Dec 10 11:38:06 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 10 Dec 2009 20:38:06 +1000 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> <87r5r4ev9q.fsf@benfinney.id.au> <87iqcgemg7.fsf@benfinney.id.au> <4B1F7A6B.9060501@gmail.com> Message-ID: <4B20CF8E.7060800@gmail.com> Guido van Rossum wrote: > On Wed, Dec 9, 2009 at 10:56 AM, Brett Cannon wrote: >> >> On Wed, Dec 9, 2009 at 02:22, Nick Coghlan wrote: >>> Ben Finney wrote: >>>> Right, that's what I thought. I was only looking for a way to say "only >>>> use a bytecode file if the corresponding source code file exists", and >>>> then trying to define "corresponding source code file". >>> As Guido said, the check goes the other way: the interpreter looks for >>> source files first, and if it doesn't find one, only then does it look >>> for orphaned bytecode files (pyo/pyc). >>> >> Just a data point: I reversed that order in importlib to match mental >> semantics. > > IIRC zipimport also reverses the order. Hmm, not as orthogonal as I thought then :P I guess it is a credit to the PEP 302 API that I've never needed to care that zipimport might have the check the other way around :) Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Thu Dec 10 11:43:04 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 10 Dec 2009 20:43:04 +1000 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: <4B20D0B8.9040203@gmail.com> Brett Cannon wrote: > I know some people seem to think pyc/pyo fles are a good way to > obfuscate code, but it honestly isn't, IMO. But these people stand the > most to lose from us even considering changing default behavior. People that think it is a good obfuscation trick often don't realise just how powerful Python's introspection features make the disassembly process. When decompiled software includes the original variable names it is a lot easier to follow than the cryptic mass of symbols that is decompiled machine code. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Thu Dec 10 11:49:28 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 10 Dec 2009 20:49:28 +1000 Subject: [Python-ideas] [OT] Broken email tools In-Reply-To: <87k4wvda8b.fsf@benfinney.id.au> References: <87ein4ek58.fsf@benfinney.id.au> <4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com> <87k4wvda8b.fsf@benfinney.id.au> Message-ID: <4B20D238.9050605@gmail.com> Ben Finney wrote: > No bigger than other problems of poor human-to-human communication. I > agree with Eric that it deserves apology, even if you don't think it's a > big deal. I'd prefer what Eric did (making a valid post, but apologising for using a poor tool to do so) over someone feeling they can't participate in the list discussion just because they don't have a decent email client handy. 
Now, if someone was to make a habit of it, then sure, they should be encouraged to switch to a better client. But the occasional post while away from your regular computer? Not a problem. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From greg.ewing at canterbury.ac.nz Fri Dec 11 00:17:12 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Dec 2009 12:17:12 +1300 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> Message-ID: <4B218178.2060009@canterbury.ac.nz> Brett Cannon wrote: > In a perfect world I would make pyc/pyo files completely optional and > only an optimization that could not work w/o the corresponding source. That wouldn't be a perfect world in every universe. For example, consider an app installed in an embedded device with limited memory -- the source is never going to be seen by anyone, and all it would do is waste resources. -- Greg From tjreedy at udel.edu Fri Dec 11 00:25:13 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 10 Dec 2009 18:25:13 -0500 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: <4B218178.2060009@canterbury.ac.nz> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B218178.2060009@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Brett Cannon wrote: > >> In a perfect world I would make pyc/pyo files completely optional and >> only an optimization that could not work w/o the corresponding source. > > That wouldn't be a perfect world in every universe. For > example, consider an app installed in an embedded device > with limited memory -- the source is never going to be > seen by anyone, and all it would do is waste resources. 
In a perfect world, memory would not be limited ;-) But valid point for this world. From ben+python at benfinney.id.au Fri Dec 11 02:03:08 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 11 Dec 2009 12:03:08 +1100 Subject: [Python-ideas] Importing orphaned bytecode files References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B218178.2060009@canterbury.ac.nz> Message-ID: <87bpi6725v.fsf@benfinney.id.au> Greg Ewing writes: > Brett Cannon wrote: > > > In a perfect world I would make pyc/pyo files completely optional > > and only an optimization that could not work w/o the corresponding > > source. > > That wouldn't be a perfect world in every universe. For example, > consider an app installed in an embedded device with limited memory -- > the source is never going to be seen by anyone, and all it would do is > waste resources. If we're positing a perfect world, then all embedded devices would have the source code available and inspectable by any interested user. -- \ ?We can't depend for the long run on distinguishing one | `\ bitstream from another in order to figure out which rules | _o__) apply.? ?Eben Moglen, _Anarchism Triumphant_, 1999 | Ben Finney From jnoller at gmail.com Fri Dec 11 02:47:33 2009 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 10 Dec 2009 20:47:33 -0500 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: <87bpi6725v.fsf@benfinney.id.au> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B218178.2060009@canterbury.ac.nz> <87bpi6725v.fsf@benfinney.id.au> Message-ID: <4222a8490912101747h201305e2s53522fbb2c9f7ee5@mail.gmail.com> On Thu, Dec 10, 2009 at 8:03 PM, Ben Finney wrote: > Greg Ewing writes: > >> Brett Cannon wrote: >> >> > In a perfect world I would make pyc/pyo files completely optional >> > and only an optimization that could not work w/o the corresponding >> > source. 
>> >> That wouldn't be a perfect world in every universe. For example, >> consider an app installed in an embedded device with limited memory -- >> the source is never going to be seen by anyone, and all it would do is >> waste resources. > > If we're positing a perfect world, then all embedded devices would have > the source code available and inspectable by any interested user. Please. Seriously, can we drop this and stop complaining about top posting? I'm pretty sure "alt.general.python.chat" is someplace else. No one cares. From ben+python at benfinney.id.au Fri Dec 11 06:16:44 2009 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 11 Dec 2009 16:16:44 +1100 Subject: [Python-ideas] Importing orphaned bytecode files References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B218178.2060009@canterbury.ac.nz> <87bpi6725v.fsf@benfinney.id.au> <4222a8490912101747h201305e2s53522fbb2c9f7ee5@mail.gmail.com> Message-ID: <87pr6m5bur.fsf@benfinney.id.au> Jesse Noller writes: > Please. Seriously, can we drop this and stop complaining about top > posting? I'm pretty sure "alt.general.python.chat" is someplace else. > No one cares. Er, this discussion isn't related to top posting; and it's hardly off-topic to discuss here about importing bytecode files. -- \ ?I have had a perfectly wonderful evening, but this wasn't it.? 
| `\ ?Groucho Marx | _o__) | Ben Finney From greg.ewing at canterbury.ac.nz Fri Dec 11 11:58:25 2009 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Dec 2009 23:58:25 +1300 Subject: [Python-ideas] Importing orphaned bytecode files In-Reply-To: <87bpi6725v.fsf@benfinney.id.au> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> <87ljhdf8f2.fsf@benfinney.id.au> <4B218178.2060009@canterbury.ac.nz> <87bpi6725v.fsf@benfinney.id.au> Message-ID: <4B2225D1.5000806@canterbury.ac.nz> Ben Finney wrote: > If we're positing a perfect world, then all embedded devices would have > the source code available and inspectable by any interested user. The source wouldn't have to be on the actual device to make that possible, though. -- Greg From brett at python.org Fri Dec 11 20:43:29 2009 From: brett at python.org (Brett Cannon) Date: Fri, 11 Dec 2009 11:43:29 -0800 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> Message-ID: I don't know about the rest of you, but I think it's PEP time as the conversation seems to have run its course. Looks like the popular options are a flag to not read any bytecode or to only read bytecode if the source is also available. And then whether the default behavior should change or not. 2009/12/8 Kristj?n Valur J?nsson > Hello there. > > We have a large project involving multiple perforce branches of hundreds of > .py files each. > > Although we employ our own import mechanism for the bulk of these files, we > do use the regular import mechanism for an essential core of them. > > > > Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files. > This can happen for a variety of reasons, but most often it occurs when .py > files are being removed, or moved in the hierarchy. 
The problem is that the > application will happily load and import an orphaned .pyo file, even though > the .py file has gone or moved. > > > > I looked at the import code and I found that it is trivial to block the > reading and writing of .pyo files. I am about to implement that patch for > our purposes, thus forcing recompilation of the .py files on each run if so > specified. This will ensure that the application will execute only the > code represented by the checked-out .py files. But it occurred to me that > this functionality might be of interest to other people than just us. I can > imagine, for example, that buildbots running the Python regression test suite > might be running into problems with stray .pyo files from time to time. > > > > Do you think that such a command line option would be useful for Python at > large? > > > > Cheers, > > Kristján From rrr at ronadam.com Sat Dec 12 17:59:47 2009 From: rrr at ronadam.com (Ron Adam) Date: Sat, 12 Dec 2009 10:59:47 -0600 Subject: [Python-ideas] disabling .pyc and .pyo files In-Reply-To: References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> Message-ID: <4B23CC03.5000403@ronadam.com> Brett Cannon wrote: > I don't know about the rest of you, but I think it's PEP time as the > conversation seems to have run its course. Looks like the popular > options are a flag to not read any bytecode or to only read bytecode if > the source is also available. And then whether the default behavior > should change or not. A few additional thoughts... Could the existing -B flag be extended to not read bytecode? It might be considered a bug if bytecode is read when the -B option is used to prevent writing of bytecode.
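The orphaned-bytecode situation from the original post can be checked for with a short script. What follows is only a minimal sketch, assuming the layout in use at the time of this thread, where `foo.pyc` / `foo.pyo` sit right next to `foo.py`. The function name `find_orphaned_bytecode` is made up for illustration, and the sketch does not handle the later `__pycache__` layout.

```python
import os


def find_orphaned_bytecode(root):
    """Return sorted paths of .pyc/.pyo files under root with no .py beside them.

    Hypothetical helper: assumes bytecode lives in the same directory
    as its source file (the pre-__pycache__ layout).
    """
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            base, ext = os.path.splitext(name)
            # A bytecode file is an orphan if its source file is missing.
            if ext in (".pyc", ".pyo") and base + ".py" not in names:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)
```

Running something like this over a checkout and reviewing the list before deleting anything is probably safer than removing every bytecode file blindly.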
Is there a use case for forcing the use of old bytecode? What was the original intent of the -B flag? Would adding a flag to force the writing of bytecode do what is needed? It would generate a noisy failure if a source file is moved or missing and renew old bytecode files. These two together would give read_none and write_all bytecode modes, with the default remaining the write-as-needed mode. It may be good to have a utility script in the Python tools directory to find and/or remove orphaned bytecode. I'm not sure that just deleting all .py(co) files is always a good idea. A more off the wall random thought ... It might be nice in the future to have all bytecode in a single directory or package combined into a single byte_cache.py(co) file. I think writing all and reading none of the bytecode files makes good sense in this context. Ron > 2009/12/8 Kristján Valur Jónsson > > > Hello there. > > We have a large project involving multiple Perforce branches of > hundreds of .py files each. > > Although we employ our own import mechanism for the bulk of these > files, we do use the regular import mechanism for an essential core > of them. > > > > Repeatedly we run into trouble because of stray .pyo (and/or .pyc) > files. This can happen for a variety of reasons, but most often it > occurs when .py files are being removed, or moved in the hierarchy. > The problem is that the application will happily load and import an > orphaned .pyo file, even though the .py file has gone or moved. > > > > I looked at the import code and I found that it is trivial to block > the reading and writing of .pyo files. I am about to implement that > patch for our purposes, thus forcing recompilation of the .py files > on each run if so specified. This will ensure that the application > will execute only the code represented by the checked-out .py > files. But it occurred to me that this functionality might be of > interest to other people than just us.
I can imagine, for example, > that buildbots running the Python regression test suite might be > running into problems with stray .pyo files from time to time. > > > > Do you think that such a command line option would be useful for > Python at large? > > > > Cheers, > > Kristján From cool-rr at cool-rr.com Tue Dec 15 13:36:46 2009 From: cool-rr at cool-rr.com (cool-RR) Date: Tue, 15 Dec 2009 14:36:46 +0200 Subject: [Python-ideas] Being able to specify "copy mode" to copy.deepcopy Message-ID: This is about the `copy.deepcopy` function. With the __deepcopy__ method, user-defined objects can specify how they will be copied. But it is assumed that you will always want to copy them the same way. What if sometimes you want to copy them in one way and sometimes in another? I am now being held back by this limitation. I will give some background to what I'm doing: I'm developing a simulations framework called GarlicSim. You can see a short video here: http://garlicsim.org/brief_introduction.html The program handles world states in simulated worlds. To generate the next world state in the timeline, the last world state is deepcopied and then modified. Now sometimes in simulations there are big, read-only objects that I don't want to replicate for each world state. For example, a map of the environment in which the simulation takes place. So I have defined a class called `Persistent`, for which I have defined a __deepcopy__ that doesn't actually copy it, but gives a reference to the original object.
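To make the setup concrete, here is a minimal sketch of the kind of class described above. It is not GarlicSim's actual code: `Map` and `State` are hypothetical stand-ins, and only the `Persistent` name and the idea of a `__deepcopy__` that returns the original object come from the message.

```python
import copy


class Persistent(object):
    """Sketch: instances are shared, not replicated, by copy.deepcopy."""

    def __deepcopy__(self, memo):
        # Hand back the original object instead of building a copy.
        return self


class Map(Persistent):  # hypothetical big read-only object
    def __init__(self, tiles):
        self.tiles = tiles


class State(object):  # hypothetical world state
    def __init__(self, world_map, clock):
        self.world_map = world_map
        self.clock = clock


state = State(Map(["grass", "water", "rock"]), clock=[0])
next_state = copy.deepcopy(state)
next_state.clock[0] += 1  # the copy can be modified freely
```

Here `next_state.world_map` is the very same object as `state.world_map`, while `next_state.clock` is an independent list.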
So now I can use `Persistent` as a base class for these big objects that I don't want to replicate. But in some cases I do want to replicate these objects, and I can't! So I suggest that it will be possible to specify a "mode" for copying. User defined objects will be able to specify how they will be deepcopied in each mode. What do you think? Ram. From python at mrabarnett.plus.com Tue Dec 15 16:29:30 2009 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 15 Dec 2009 15:29:30 +0000 Subject: [Python-ideas] Being able to specify "copy mode" to copy.deepcopy In-Reply-To: References: Message-ID: <4B27AB5A.6010107@mrabarnett.plus.com> cool-RR wrote: > This is about the `copy.deepcopy` function. > > With the __deepcopy__ method, user-defined objects can specify how > they will be copied. But it is assumed that you will always want to > copy them the same way. What if sometimes you want to copy them in > one way and sometimes in another? > > I am now being held back by this limitation. I will give some > background to what I'm doing: > > I'm developing a simulations framework called GarlicSim. You can see > a short video here: http://garlicsim.org/brief_introduction.html The > program handles world states in simulated worlds. To generate the > next world state in the timeline, the last world state is deepcopied > and then modified. > > Now sometimes in simulations there are big, read-only objects that I > don't want to replicate for each world state. For example, a map of > the environment in which the simulation takes place. So I have > defined a class called `Persistent`, for which I have defined a > __deepcopy__ that doesn't actually copy it, but gives a reference to > the original object. So now I can use `Persistent` as a base class for > these big objects that I don't want to replicate. > > But in some cases I do want to replicate these objects, and I can't!
> > So I suggest that it will be possible to specify a "mode" for > copying. User defined objects will be able to specify how they will > be deepcopied in each mode. > > What do you think? > My own feeling is that this is a misuse of __deepcopy__: if you ask for a copy (of a mutable object) then you should get a copy (for immutable objects copying isn't necessary). From cool-rr at cool-rr.com Tue Dec 15 16:35:52 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Tue, 15 Dec 2009 15:35:52 +0000 (UTC) Subject: [Python-ideas] Being able to specify "copy mode" to copy.deepcopy References: <4B27AB5A.6010107@mrabarnett.plus.com> Message-ID: MRAB writes: > cool-RR wrote: > > What do you think? > > > My own feeling is that this is a misuse of __deepcopy__: if you ask for > a copy (of a mutable object) then you should get a copy (for immutable > objects copying isn't necessary). I agree that the Persistent.__deepcopy__ thing does smell like misuse on my part. However I'd be happy to hear any alternative suggestion you have on how to solve the problem I have. Meanwhile, I thought of a nice backwards-compatible way to implement what I suggest, but I want to know whether this idea makes sense at all to the people here. Ram. From algorias at gmail.com Tue Dec 15 17:19:43 2009 From: algorias at gmail.com (Vitor Bosshard) Date: Tue, 15 Dec 2009 13:19:43 -0300 Subject: [Python-ideas] Being able to specify "copy mode" to copy.deepcopy In-Reply-To: References: <4B27AB5A.6010107@mrabarnett.plus.com> Message-ID: <2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com> 2009/12/15 Ram Rachum : > MRAB writes: >> cool-RR wrote: >> > What do you think? >> > >> My own feeling is that this is a misuse of __deepcopy__: if you ask for >> a copy (of a mutable object) then you should get a copy (for immutable >> objects copying isn't necessary). > > > I agree that the Persistent.__deepcopy__ thing does smell like misuse on my > part.
However I'd be happy to hear any alternative suggestion you have on how > to solve the problem I have. Deepcopy is a very simple operation conceptually, there's no need to make it more complicated. How about implementing __deepcopy__ in your world state objects? Specify attributes that don't need copying. You can even use the Persistent class to signal that. Something like this (untested!):

from copy import deepcopy

def __deepcopy__(self, memo):
    new = self.__class__()
    memo[id(self)] = new  # register the copy so shared/cyclic references resolve to it
    for k, v in self.__dict__.items():
        setattr(new, k, v if isinstance(v, Persistent) else deepcopy(v, memo))
    return new

Vitor From jh at improva.dk Tue Dec 15 17:17:35 2009 From: jh at improva.dk (Jacob Holm) Date: Tue, 15 Dec 2009 17:17:35 +0100 Subject: [Python-ideas] Being able to specify "copy mode" to copy.deepcopy In-Reply-To: References: <4B27AB5A.6010107@mrabarnett.plus.com> Message-ID: <4B27B69F.4030506@improva.dk> Ram Rachum wrote: > > I agree that the Persistent.__deepcopy__ thing does smell like misuse on my > part. However I'd be happy to hear any alternative suggestion you have on how > to solve the problem I have. > > Meanwhile, I thought of a nice backwards-compatible way to implement what I > suggest, but I want to know whether this idea makes sense at all to the people > here. > It is already quite easy to abuse the "memo" dict argument of copy.deepcopy to pass this kind of flag to the __deepcopy__ methods. What else do you need? - Jacob From cool-rr at cool-rr.com Tue Dec 15 17:59:57 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Tue, 15 Dec 2009 16:59:57 +0000 (UTC) Subject: [Python-ideas] Being able to specify References: <4B27AB5A.6010107@mrabarnett.plus.com> <2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com> Message-ID: Vitor Bosshard writes: > Deepcopy is a very simple operation conceptually, there's no need to > make it more complicated. How about implementing __deepcopy__ in your > world state objects? Specify attributes that don't need copying.
You > can even use the Persistent class to signal that. Something like this > (untested!):
>
> from copy import deepcopy
>
> def __deepcopy__(self, memo):
>     new = self.__class__()
>     memo[id(self)] = new  # register the copy so shared/cyclic references resolve to it
>     for k, v in self.__dict__.items():
>         setattr(new, k, v if isinstance(v, Persistent) else deepcopy(v, memo))
>     return new
>
> Vitor

And what happens when State refers to another object which refers to a Persistent? Ram. From algorias at gmail.com Tue Dec 15 18:28:12 2009 From: algorias at gmail.com (Vitor Bosshard) Date: Tue, 15 Dec 2009 14:28:12 -0300 Subject: [Python-ideas] Being able to specify In-Reply-To: References: <4B27AB5A.6010107@mrabarnett.plus.com> <2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com> Message-ID: <2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com> 2009/12/15 Ram Rachum : > Vitor Bosshard writes: >> Deepcopy is a very simple operation conceptually, there's no need to >> make it more complicated. How about implementing __deepcopy__ in your >> world state objects? Specify attributes that don't need copying. You >> can even use the Persistent class to signal that. Something like this >> (untested!):
>>
>> from copy import deepcopy
>>
>> def __deepcopy__(self, memo):
>>     new = self.__class__()
>>     memo[id(self)] = new  # register the copy so shared/cyclic references resolve to it
>>     for k, v in self.__dict__.items():
>>         setattr(new, k, v if isinstance(v, Persistent) else deepcopy(v, memo))
>>     return new
>>
>> Vitor
> > > And what happens when State refers to another object which refers to a > Persistent? Then that object would need to implement the same method, perhaps by inheriting from a common base. The point is that it can be done in a straightforward manner without needing to change the stdlib.
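For comparison, the memo-dict trick Jacob Holm mentioned can be sketched roughly as below. This is only an illustration under stated assumptions, not anyone's posted code: the `"copy_mode"` key, the `"full"` value and the `Map` class are invented here, and storing a string key in `memo` is an abuse of a dict that `copy.deepcopy` otherwise keys by object id. Because the default deepcopy machinery hands the same `memo` down through objects that define no `__deepcopy__` of their own, the flag also reaches a `Persistent` nested inside containers or third-party objects.

```python
import copy

COPY_MODE = "copy_mode"  # hypothetical memo key, not part of the copy module


class Persistent(object):
    """Shared by default; really copied only when the memo asks for it."""

    def __deepcopy__(self, memo):
        if memo.get(COPY_MODE) != "full":
            return self  # default mode: hand back the original object
        new = self.__class__.__new__(self.__class__)
        memo[id(self)] = new  # register before recursing, in case of cycles
        new.__dict__.update(copy.deepcopy(self.__dict__, memo))
        return new


class Map(Persistent):  # hypothetical big read-only object
    def __init__(self, tiles):
        self.tiles = tiles


world = {"map": Map([1, 2, 3]), "agents": ["a", "b"]}
shared = copy.deepcopy(world)                     # the map is shared
full = copy.deepcopy(world, {COPY_MODE: "full"})  # the map is replicated
```

The caller picks the mode by seeding the memo dict, so no change to the stdlib is needed, at the cost of every project inventing its own key.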
Vitor From cool-rr at cool-rr.com Tue Dec 15 18:51:32 2009 From: cool-rr at cool-rr.com (Ram Rachum) Date: Tue, 15 Dec 2009 17:51:32 +0000 (UTC) Subject: [Python-ideas] Being able to specify References: <4B27AB5A.6010107@mrabarnett.plus.com> <2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com> <2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com> Message-ID: Vitor Bosshard writes: > > And what happens when State refers to another object which refers to a > > Persistent? > > Then that object would need to implement the same method, perhaps by > inheriting from a common base. And what if the object is from a class defined by a third-party module that I can't change? > The point is that it can be done in a > straightforward manner without needing to change the stdlib. I guess so, yes. My method would be something like what Jacob said, abusing the memo dict to pass the copying mode. But I thought perhaps we can set a standard way for specifying different copy modes, because otherwise I'll do my memo hack and someone else will do his different memo hack and it won't be compatible. I'll detail my hack later today when I'm back home. Ram. From tjreedy at udel.edu Tue Dec 15 21:59:25 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 15 Dec 2009 15:59:25 -0500 Subject: [Python-ideas] Being able to specify In-Reply-To: References: <4B27AB5A.6010107@mrabarnett.plus.com> <2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com> <2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com> Message-ID: On 12/15/2009 12:51 PM, Ram Rachum wrote: > Vitor Bosshard writes: >>> And what happens when State refers to another object which refers to a >>> Persistent? >> >> Then that object would need to implement the same method, perhaps by >> inheriting from a common base. > > And what if the object is from a class defined by a third-party module that I > can't change?
> >> The point is that it can be done in a >> straightforward manner without needing to change the stdlib. > > I guess so, yes. My method would be something like what Jacob said, abusing > the memo dict to pass the copying mode. But I thought perhaps we can set a > standard way for specifying different copy modes, because otherwise I'll do my > memo hack and someone else will do his different memo hack and it won't be > compatible. Perhaps you can post a recipe at the Python Cookbook. People who care about compatibility can follow the same recipe. From ncoghlan at gmail.com Tue Dec 15 22:28:25 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Dec 2009 07:28:25 +1000 Subject: [Python-ideas] Being able to specify In-Reply-To: