From guido at python.org Sun Jan 1 00:56:00 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 16:56:00 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: ISTM the only reasonable thing is to have a random seed picked very early in the process, to be used to change the hash() function of str/bytes/unicode (in a way that they are still compatible with each other). The seed should be unique per process except it should survive fork() (but not exec()). I'm not worried about unrelated processes needing to have the same hash(), but I'm not against offering an env variable or command line flag to force the seed. I'm not too concerned about a 3rd party being able to guess the random seed -- this would require much more effort on their part, since they would have to generate a new set of colliding keys each time they think they have guessed the hash (as long as they can't force the seed -- this actually argues slightly *against* offering a way to force the seed, except that we have strong backwards compatibility requirements). We need to fix this as far back as Python 2.6, and it would be nice if a source patch was available that works on Python 2.5 -- personally I do have a need for a 2.5 fix and if nobody creates one I will probably end up backporting the fix from 2.6 to 2.5. Is there a tracker issue yet? The discussion should probably move there. PS. I would propose a specific fix but I can't seem to build a working CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). 
I get this error late in the build: ./python.exe -SE -m sysconfig --generate-posix-vars Fatal Python error: Py_Initialize: can't initialize sys standard streams Traceback (most recent call last): File "/Users/guido/cpython/Lib/io.py", line 60, in make: *** [Lib/_sysconfigdata.py] Abort trap -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 1 01:11:12 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 17:11:12 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Sat, Dec 31, 2011 at 4:56 PM, Guido van Rossum wrote: > PS. I would propose a specific fix but I can't seem to build a working > CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this > error late in the build: > > ./python.exe -SE -m sysconfig --generate-posix-vars > Fatal Python error: Py_Initialize: can't initialize sys standard streams > Traceback (most recent call last): > File "/Users/guido/cpython/Lib/io.py", line 60, in > make: *** [Lib/_sysconfigdata.py] Abort trap > FWIW I managed to build Python 2.6, and a trivial mutation of the string/unicode hash function (add 1 initially) made only three tests fail; test_symtable and test_json both have a dependency on dictionary order, test_ctypes I can't quite figure out what's going on. Oh, and an unrelated failure in test_sqlite: File "/Users/guido/pythons/p26/Lib/sqlite3/test/types.py", line 355, in CheckSqlTimestamp self.failUnlessEqual(ts.year, now.year) AssertionError: 2012 != 2011 I betcha that's because it's still 2011 here in Texas but already 2012 in UTC-land. Happy New Year everyone! :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
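[Editorial note: Guido's "trivial mutation" above (adding 1 to the hash's initial value) is easy to model. The sketch below is illustrative, not CPython's actual code, but it shows why any change to the seed perturbs every string hash, and hence breaks tests that depend on dictionary order.]

```python
# Toy model of a DJBX33X-style string hash, parameterized by a seed.
# Illustrative only -- not the real CPython implementation.

def toy_string_hash(s, seed=0):
    """Hash a string, folding an initial seed into the starting value."""
    if not s:
        return 0
    # seed enters only through the initial value, as in Guido's experiment
    x = (seed + (ord(s[0]) << 7)) & 0xFFFFFFFFFFFFFFFF
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & 0xFFFFFFFFFFFFFFFF
    x ^= len(s)
    return x

# A one-unit seed change alters the hash of every string, so any test
# that depends on dict iteration order (as test_symtable and test_json
# apparently did) starts failing.
h0 = toy_string_hash("namespace", seed=0)
h1 = toy_string_hash("namespace", seed=1)
print(h0 != h1)  # True: multiplication by an odd constant preserves the difference
```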
URL: From paul at mcmillan.ws Sun Jan 1 04:29:59 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sat, 31 Dec 2011 19:29:59 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > I'm not too concerned about a 3rd party being able to guess the random seed > -- this would require much more effort on their part, since they would have > to generate a new set of colliding keys each time they think they have > guessed the hash This is incorrect. Once an attacker has guessed the random seed, any operation which reveals the ordering of hashed objects can be used to verify the answer. JSON responses would be ideal. In fact, an attacker can do a brute-force attack of the random seed offline. Once they have the seed, generating collisions is a fast process. The goal isn't perfection, but we need to do better than a simple salt. I propose we modify the string hash function like this: https://gist.github.com/0a91e52efa74f61858b5 This code is based on PyPy's implementation, but the concept is universal. Rather than choosing a single short random seed per process, we generate a much larger random seed (r). As we hash, we deterministically choose a portion of that seed and incorporate it into the hash process. This modification is a minimally intrusive change to the existing hash function, and so should not introduce unexpected side effects which might come from switching to a different class of hash functions. I've worked through this code with Alex Gaynor, Antoine Pitrou, and Victor Stinner, and have asked several mathematicians and security experts to review the concept. 
The reviewers who have gotten back to me thus far have agreed that if the initial random seed is not flawed, this should not overly change the properties of the hash function, but should make it quite difficult for an attacker to deduce the necessary information to predictably cause hash collisions. This function is not designed to protect against timing attacks, but should be nontrivial to reverse even with access to timing data. Empirical testing shows that this unoptimized python implementation produces ~10% slowdown in the hashing of ~20 character strings. This is probably an acceptable trade off, and actually provides better performance in the case of short strings than a high-entropy fixed-length seed prefix. -Paul From martin at v.loewis.de Sun Jan 1 04:36:37 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sun, 01 Jan 2012 04:36:37 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120101043637.Horde.a7_NDLuWis5O-9TFmFkWl9A@webmail.df.eu> > (Well, technically, you could use trees or some other O log n data > structure as a fallback once you have too many collisions, for some value > of "too many". Seems a bit wasteful for the purpose, though.) I don't think that would be wasteful. You wouldn't just use the tree for the case of too many collisions, but for any collision. You might special-case the case of a single key, i.e. start using the tree only if there is a collision. The issue is not the effort, but the need to support ordering if you want to use trees. So you might restrict this to dicts that have only str keys (which in practice should never have any collision, unless it's a deliberate setup). I'd use the tagged-pointer trick to determine whether a key is an object pointer (tag 0) or an AVL tree (tag 1). 
So in the common case of interned strings, the comparison for pointer equality (which is the normal case if the keys are interned) will succeed quickly; if pointer comparison fails, check the tag bit. Regards, Martin From solipsis at pitrou.net Sun Jan 1 05:11:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 1 Jan 2012 05:11:03 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <20120101051103.67448343@pitrou.net> On Sat, 31 Dec 2011 16:56:00 -0700 Guido van Rossum wrote: > ISTM the only reasonable thing is to have a random seed picked very early > in the process, to be used to change the hash() function of > str/bytes/unicode (in a way that they are still compatible with each other). Do str and bytes still have to be compatible with each other in 3.x? Merry hashes, weakrefs and thread-local memoryviews to everyone! cheers Antoine. From guido at python.org Sun Jan 1 05:22:47 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 21:22:47 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120101051103.67448343@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <20120101051103.67448343@pitrou.net> Message-ID: On Sat, Dec 31, 2011 at 9:11 PM, Antoine Pitrou wrote: > On Sat, 31 Dec 2011 16:56:00 -0700 > Guido van Rossum wrote: > > ISTM the only reasonable thing is to have a random seed picked very early > > in the process, to be used to change the hash() function of > > str/bytes/unicode (in a way that they are still compatible with each > other). > > Do str and bytes still have to be compatible with each other in 3.x? > Hm, you're right, that's no longer a concern. (Though ATM the hashes still *are* compatible.) > Merry hashes, weakrefs and thread-local memoryviews to everyone! 
> :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 1 05:31:50 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 31 Dec 2011 21:31:50 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan wrote: > > I'm not too concerned about a 3rd party being able to guess the random > seed > > -- this would require much more effort on their part, since they would > have > > to generate a new set of colliding keys each time they think they have > > guessed the hash > > This is incorrect. Once an attacker has guessed the random seed, any > operation which reveals the ordering of hashed objects can be used to > verify the answer. JSON responses would be ideal. In fact, an attacker > can do a brute-force attack of the random seed offline. Once they have > the seed, generating collisions is a fast process. > Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability. Also the class of vulnerable servers would be greatly reduced. > The goal isn't perfection, but we need to do better than a simple > salt. Perhaps. > I propose we modify the string hash function like this: > > https://gist.github.com/0a91e52efa74f61858b5 > > This code is based on PyPy's implementation, but the concept is > universal. Rather than choosing a single short random seed per > process, we generate a much larger random seed (r). As we hash, we > deterministically choose a portion of that seed and incorporate it > into the hash process. 
This modification is a minimally intrusive > change to the existing hash function, and so should not introduce > unexpected side effects which might come from switching to a different > class of hash functions. > I'm not sure I understand this. What's the worry about "a different class of hash functions"? (It may be clear that I do not have a deep mathematical understanding of hash functions.) > I've worked through this code with Alex Gaynor, Antoine Pitrou, and > Victor Stinner, and have asked several mathematicians and security > experts to review the concept. The reviewers who have gotten back to > me thus far have agreed that if the initial random seed is not flawed, I forget -- what do we do on systems without urandom()? (E.g. Windows?) > this should not overly change the properties of the hash function, but > should make it quite difficult for an attacker to deduce the necessary > information to predictably cause hash collisions. This function is not > designed to protect against timing attacks, but should be nontrivial > to reverse even with access to timing data. > Let's worry about timing attacks another time okay? > Empirical testing shows that this unoptimized python implementation > produces ~10% slowdown in the hashing of ~20 character strings. This > is probably an acceptable trade off, and actually provides better > performance in the case of short strings than a high-entropy > fixed-length seed prefix. > Hm. I'm not sure I like the idea of extra arithmetic for every character being hashed. But I like the idea of a bigger random seed from which we deterministically pick some part. How about just initializing x to some subsequence of the seed determined by e.g. the length of the hashed string plus a few characters from it? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
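[Editorial note: the gist itself is not reproduced in the thread. The sketch below is a paraphrase of Paul's description -- a large per-process random table, a portion of which is chosen deterministically and mixed in as each character is hashed. The names and the 256-byte table size are illustrative, not the gist's exact values.]

```python
import os

# Per-process random material, read once at startup.
# (Paul's gist uses a much larger table; 256 bytes keeps the sketch small.)
_r = bytearray(os.urandom(256))

def randomized_hash(s):
    """DJB-style loop plus a per-character lookup into the random table,
    chosen deterministically from the character being hashed."""
    if not s:
        return 0
    x = ord(s[0]) << 7
    for ch in s:
        o = ord(ch)
        # deterministically pick a piece of the seed and fold it in
        x = ((1000003 * x) ^ o ^ _r[o & 0xFF]) & 0xFFFFFFFFFFFFFFFF
    x ^= len(s)
    return x
```

Within one process the function is stable (the table is fixed at startup), but two processes with different tables disagree on every hash, which is the point of the proposal.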
URL: From paul at mcmillan.ws Sun Jan 1 06:57:09 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sat, 31 Dec 2011 21:57:09 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > Still, it would represent an effort for the attacker of a much greater > magnitude than the current attack. It's all a trade-off -- at some point > it'll just be easier for the attacker to use some other vulnerability. Also > the class of vulnerable servers would be greatly reduced. I agree that doing anything is better than doing nothing. If we use the earlier suggestion and prepend everything with a fixed-length seed, we need quite a bit of entropy (and so a fairly long string) to make a lasting difference. > I'm not sure I understand this. What's the worry about "a different class of > hash functions"? (It may be clear that I do not have a deep mathematical > understanding of hash functions.) This was mostly in reference to earlier suggestions of switching to cityhash, or using btrees, or other more invasive changes. Python 2.X is pretty stable and making large changes like that to the codebase can have unpredictable effects. We know that the current hash function works well (except for this specific problem), so it seems like the best fix will be as minimal a modification as possible, to avoid introducing bugs. > I forget -- what do we do on systems without urandom()? (E.g. Windows?) Windows has CryptGenRandom which is approximately equivalent. > Let's worry about timing attacks another time okay? Agreed. As long as there isn't a gaping hole, I'm fine with that. > Hm. I'm not sure I like the idea of extra arithmetic for every character > being hashed. From a performance standpoint, this may still be better than adding 8 or 10 characters to every single hash operation, since most hashes are over short strings. 
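[Editorial note: Paul's performance argument -- per-character mixing versus prepending a long fixed seed -- can be sanity-checked with a toy benchmark. Both functions below are illustrative stand-ins, not the actual candidate implementations.]

```python
import os
import timeit

TABLE = bytearray(os.urandom(256))
PREFIX = bytearray(os.urandom(10))  # the "8 or 10 extra characters" alternative

def per_char_hash(s):
    """Seed lookup folded into every character."""
    x = 0
    for ch in s:
        o = ord(ch)
        x = ((1000003 * x) ^ o ^ TABLE[o & 0xFF]) & 0xFFFFFFFFFFFFFFFF
    return x ^ len(s)

def prefix_hash(s):
    """Plain loop, but with extra rounds spent hashing a seed prefix."""
    x = 0
    for o in PREFIX:  # fixed overhead regardless of input length
        x = ((1000003 * x) ^ o) & 0xFFFFFFFFFFFFFFFF
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & 0xFFFFFFFFFFFFFFFF
    return x ^ len(s)

s = "a typical short key"  # ~20 characters, the common case Paul cites
t1 = timeit.timeit(lambda: per_char_hash(s), number=10000)
t2 = timeit.timeit(lambda: prefix_hash(s), number=10000)
print(t1, t2)  # for short strings the prefix rounds are a large relative cost
```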
It is important that this function touches every character - if it only interacts with a subset of them, an attacker can fix that subset and vary the rest. > But I like the idea of a bigger random seed from which we > deterministically pick some part. Yeah. This makes it much harder to attack, since it very solidly places the attacker outside the realm of "just brute force the key". > How about just initializing x to some > subsequence of the seed determined by e.g. the length of the hashed string > plus a few characters from it? We did consider this, and if performance is absolutely the prime directive, this (or a variant) may be the best option. Unfortunately, the collision generator doesn't necessarily vary the length of the string. Additionally, if we don't vary based on all the letters in the string, an attacker can fix the characters that we do use and generate colliding strings around them. Another option to consider would be to apply this change to some but not all of the rounds. If we added the seed lookup xor operation for only the first and last 5 values of x, we would still retain much of the benefit without adding much computational overhead for very long strings. We could also consider a less computationally expensive operation than the modulo for calculating the lookup index, like simply truncating to the correct number of bits. -Paul From guido at python.org Sun Jan 1 16:09:54 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 1 Jan 2012 08:09:54 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan wrote: > > Still, it would represent an effort for the attacker of a much greater > > magnitude than the current attack. It's all a trade-off -- at some point > > it'll just be easier for the attacker to use some other vulnerability. 
> Also > > the class of vulnerable servers would be greatly reduced. > > I agree that doing anything is better than doing nothing. If we use > the earlier suggestion and prepend everything with a fixed-length > seed, we need quite a bit of entropy (and so a fairly long string) to > make a lasting difference. > Ah, but the effect of that long string is summarized in a single (32- or 64-bit) integer. > > I'm not sure I understand this. What's the worry about "a different > class of > > hash functions"? (It may be clear that I do not have a deep mathematical > > understanding of hash functions.) > > This was mostly in reference to earlier suggestions of switching to > cityhash, or using btrees, or other more invasive changes. Python 2.X > is pretty stable and making large changes like that to the codebase > can have unpredictable effects. Agreed. > We know that the current hash function > works well (except for this specific problem), so it seems like the > best fix will be as minimal a modification as possible, to avoid > introducing bugs. > Yup. > > I forget -- what do we do on systems without urandom()? (E.g. Windows?) > Windows has CryptGenRandom which is approximately equivalent. > > > Let's worry about timing attacks another time okay? > Agreed. As long as there isn't a gaping hole, I'm fine with that. > > > Hm. I'm not sure I like the idea of extra arithmetic for every character > > being hashed. > > From a performance standpoint, this may still be better than adding 8 > or 10 characters to every single hash operation, since most hashes are > over short strings. But how about precomputing the intermediate value (x)? The hash is (mostly) doing x = f(x, c) for each c in the input. It is important that this function touches every > character - if it only interacts with a subset of them, an attacker > can fix that subset and vary the rest. 
> I sort of see your point, but I still think that if we could add as little per-character overhead as possible it would be best -- sometimes people *do* hash very long strings. > > But I like the idea of a bigger random seed from which we > > deterministically pick some part. > > Yeah. This makes it much harder to attack, since it very solidly > places the attacker outside the realm of "just brute force the key". > > > How about just initializing x to some > > subsequence of the seed determined by e.g. the length of the hashed > string > > plus a few characters from it? > > We did consider this, and if performance is absolutely the prime > directive, this (or a variant) may be the best option. Unfortunately, > the collision generator doesn't necessarily vary the length of the > string. Additionally, if we don't vary based on all the letters in the > string, an attacker can fix the characters that we do use and generate > colliding strings around them. > Still, much more work for the attacker. > Another option to consider would be to apply this change to some but > not all of the rounds. If we added the seed lookup xor operation for > only the first and last 5 values of x, we would still retain much of > the benefit without adding much computational overhead for very long > strings. > I like that. > We could also consider a less computationally expensive operation than > the modulo for calculating the lookup index, like simply truncating to > the correct number of bits. > Sure. Thanks for thinking about all the details here!! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
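[Editorial note: Paul's "first and last 5 rounds" compromise, which Guido endorses above, might be sketched as follows. The table index uses a bit-mask rather than a modulo, as suggested at the end of Paul's message. Names and sizes are illustrative.]

```python
import os

_TABLE = bytearray(os.urandom(256))  # per-process random table

def hybrid_hash(s):
    """Mix the random table into only the first and last five characters;
    the middle of a long string pays no extra per-character cost."""
    n = len(s)
    if n == 0:
        return 0
    x = ord(s[0]) << 7
    for i, ch in enumerate(s):
        o = ord(ch)
        x = ((1000003 * x) ^ o) & 0xFFFFFFFFFFFFFFFF
        if i < 5 or i >= n - 5:
            # seed lookup only at the ends; index by truncating to 8 bits
            x ^= _TABLE[o & 0xFF]
    x ^= n
    return x
```

For strings of ten characters or fewer every position gets the lookup, so short keys -- the common case -- keep the full benefit.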
URL: From guido at python.org Sun Jan 1 16:13:11 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 1 Jan 2012 08:13:11 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: Different concern. What if someone were to have code implementing an external, persistent hash table, using Python's hash() function? They might have a way to rehash everything when a new version of Python comes along, but they would not be happy if hash() is different in each process. I somehow vaguely remember possibly having seen such code, or something else where a bit of random data was needed and hash() was used since it's so easily available. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 1 16:13:44 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 1 Jan 2012 08:13:44 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: PS. Is the collision-generator used in the attack code open source? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at cheimes.de Sun Jan 1 16:27:51 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:27:51 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F007B77.8090901@cheimes.de> Am 01.01.2012 16:13, schrieb Guido van Rossum: > Different concern. What if someone were to have code implementing an > external, persistent hash table, using Python's hash() function? 
They > might have a way to rehash everything when a new version of Python comes > along, but they would not be happy if hash() is different in each > process. I somehow vaguely remember possibly having seen such code, or > something else where a bit of random data was needed and hash() was used > since it's so easily available. I had the same concern as you and was worried that projects like ZODB might require a stable hash function. Fred already stated that ZODB doesn't use the hash in its btree structures. Possible solutions: * make it possible to provide the seed as an env var * disable randomizing as default setting or at least add an option to disable randomization IMHO the issue needs a PEP that explains the issue, shows all possible solutions and describes how we have solved the issue. I'm willing to start a PEP. Who likes to be the co-author? Christian From lists at cheimes.de Sun Jan 1 16:30:26 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:30:26 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F007C12.4090401@cheimes.de> Am 31.12.2011 23:38, schrieb Terry Reedy: > On 12/31/2011 4:43 PM, PJ Eby wrote: > >> Here's an idea. Suppose we add a sys.hash_seed or some such, that's >> settable to an int, and defaults to whatever we're using now. Then >> programs that want a fix can just set it to a random number, > > I do not think we can allow that to change once there are hashed > dictionaries existing. Me, too. Armin suggested to use an env var as random. 
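[Editorial note: the env-var idea Armin and Christian mention could look something like the sketch below. The variable name PYHASHSEED is hypothetical, chosen here only for illustration.]

```python
import os
import struct

def get_hash_seed():
    """Pick the process-wide hash seed: honor a forced value from the
    environment (for reproducible runs and external hash tables),
    otherwise draw eight random bytes."""
    value = os.environ.get("PYHASHSEED")  # hypothetical variable name
    if value is not None:
        return int(value)  # forced, reproducible seed
    return struct.unpack("=Q", os.urandom(8))[0]  # random per process
```

A forced seed addresses both concerns above: persistent external hash tables can pin it, and randomization can still be the default when the variable is absent.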
From lists at cheimes.de Sun Jan 1 16:48:32 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:48:32 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F008050.1030807@cheimes.de> Am 01.01.2012 00:56, schrieb Guido van Rossum: > ISTM the only reasonable thing is to have a random seed picked very > early in the process, to be used to change the hash() function of > str/bytes/unicode (in a way that they are still compatible with each other). > > The seed should be unique per process except it should survive fork() > (but not exec()). I'm not worried about unrelated processes needing to > have the same hash(), but I'm not against offering an env variable or > command line flag to force the seed. I've created a clone at http://hg.python.org/features/randomhash/ as a testbed. The code creates the seed very early in PyInitializeEx(). The method isn't called on fork() but on exec(). > I'm not too concerned about a 3rd party being able to guess the random > seed -- this would require much more effort on their part, since they > would have to generate a new set of colliding keys each time they think > they have guessed the hash (as long as they can't force the seed -- this > actually argues slightly *against* offering a way to force the seed, > except that we have strong backwards compatibility requirements). The speakers claim and have shown that it's too easy to pre-calculate collisions with hashing algorithms similar to DJBX33X / DJBX33A. It might be a good idea to change the hashing algorithm, too. Paul has listed some new algorithms. Ruby 1.9 is using FNV http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a good dispersion pattern. A hashing algorithm without a meet-in-the-middle vulnerability would reduce the pressure on a good and secure seed, too. 
> We need to fix this as far back as Python 2.6, and it would be nice if a > source patch was available that works on Python 2.5 -- personally I do > have a need for a 2.5 fix and if nobody creates one I will probably end > up backporting the fix from 2.6 to 2.5. +1 Should the randomization be disabled on 2.5 to 3.2 by default to reduce backward compatibility issues? Christian From lists at cheimes.de Sun Jan 1 16:56:19 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 16:56:19 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120101051103.67448343@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <20120101051103.67448343@pitrou.net> Message-ID: <4F008223.8020806@cheimes.de> Am 01.01.2012 05:11, schrieb Antoine Pitrou: > On Sat, 31 Dec 2011 16:56:00 -0700 > Guido van Rossum wrote: >> ISTM the only reasonable thing is to have a random seed picked very early >> in the process, to be used to change the hash() function of >> str/bytes/unicode (in a way that they are still compatible with each other). > > Do str and bytes still have to be compatible with each other in 3.x? py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking about this invariant? Christian From solipsis at pitrou.net Sun Jan 1 17:09:23 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 1 Jan 2012 17:09:23 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> Message-ID: <20120101170923.5323628a@pitrou.net> On Sun, 01 Jan 2012 16:48:32 +0100 Christian Heimes wrote: > The talkers claim and have shown that it's too easy to pre-calculate > collisions with hashing algorithms similar to DJBX33X / DJBX33A. It > might be a good idea to change the hashing algorithm, too. 
Paul as > listed some new algorithms. Ruby 1.9 is using FNV > http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a > good dispersion pattern. We already seem to be using a FNV-alike, is it just a matter of changing the parameters? > A hashing algorithm without a > meet-in-the-middle vulnerability would reduce the pressure on a good and > secure seed, too. > > > We need to fix this as far back as Python 2.6, and it would be nice if a > > source patch was available that works on Python 2.5 -- personally I do > > have a need for a 2.5 fix and if nobody creates one I will probably end > > up backporting the fix from 2.6 to 2.5. > > +1 > > Should the randomization be disabled on 2.5 to 3.2 by default to reduce > backward compatibility issues? Isn't 2.5 already EOL'ed? As for 3.2, I'd say no. I don't know about 2.6 and 2.7. Regards Antoine. From solipsis at pitrou.net Sun Jan 1 17:10:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 1 Jan 2012 17:10:03 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F008223.8020806@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <20120101051103.67448343@pitrou.net> <4F008223.8020806@cheimes.de> Message-ID: <20120101171003.5f657a00@pitrou.net> On Sun, 01 Jan 2012 16:56:19 +0100 Christian Heimes wrote: > Am 01.01.2012 05:11, schrieb Antoine Pitrou: > > On Sat, 31 Dec 2011 16:56:00 -0700 > > Guido van Rossum wrote: > >> ISTM the only reasonable thing is to have a random seed picked very early > >> in the process, to be used to change the hash() function of > >> str/bytes/unicode (in a way that they are still compatible with each other). > > > > Do str and bytes still have to be compatible with each other in 3.x? > > py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking > about this invariant? Yes. It doesn't seem to have any point anymore. Regards Antoine. 
From lists at cheimes.de Sun Jan 1 17:20:34 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 17:20:34 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F0087D2.9090405@cheimes.de> Am 01.01.2012 06:57, schrieb Paul McMillan: > I agree that doing anything is better than doing nothing. If we use > the earlier suggestion and prepend everything with a fixed-length > seed, we need quite a bit of entropy (and so a fairly long string) to > make a lasting difference. Your code at https://gist.github.com/0a91e52efa74f61858b5 reads about 2 MB (2**21 - 1 bytes) of data from urandom. I'm worried that this is going to exhaust the OS's random pool and suck it dry. We shouldn't forget that Python is used for long running processes as well as short scripts. Your suggestion also increases the process size by 2 MB, which is going to be an issue for mobile and embedded platforms. How about this:

    r = [ord(i) for i in os.urandom(256)]
    rs = os.urandom(4)  # or 8 ?
    seed = rs[-1] + (rs[-2] << 8) + (rs[-3] << 16) + (rs[-4] << 24)

    def _hash_string(s):
        """The algorithm behind compute_hash() for a string or a unicode."""
        from pypy.rlib.rarithmetic import intmask
        length = len(s)
        if length == 0:
            return -1
        x = intmask(seed + (ord(s[0]) << 7))
        i = 0
        while i < length:
            o = ord(s[i])
            x = intmask((1000003*x) ^ o ^ r[o % 0xff])
            i += 1
        x ^= length
        return intmask(x)

This combines a random seed for the hash with your suggestion. We also need to special-case short strings: the above routine hands the seed to an attacker who is able to retrieve lots of single-character hashes. The randomization shouldn't be used if we can prove that it's not possible to create hash collisions for strings shorter than X. For example, 64bit FNV-1 has no collisions for 8 chars or less, and 32bit FNV has no collisions for 4 chars or less. 
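[Editorial note: Christian's sketch depends on pypy's intmask and so does not run on stock CPython. Below is a self-contained equivalent for anyone who wants to experiment with it, with explicit 64-bit masking standing in for intmask. The table index is kept as in the sketch; note that o % 0xff is modulo 255, so r[255] is never selected -- o & 0xff would use the table fully.]

```python
import os

_r = bytearray(os.urandom(256))  # random lookup table
_rs = bytearray(os.urandom(4))
_seed = _rs[-1] + (_rs[-2] << 8) + (_rs[-3] << 16) + (_rs[-4] << 24)
_MASK = 0xFFFFFFFFFFFFFFFF  # stands in for pypy's intmask()

def hash_string(s):
    """CPython-runnable equivalent of Christian's sketch."""
    length = len(s)
    if length == 0:
        return -1
    x = (_seed + (ord(s[0]) << 7)) & _MASK
    for ch in s:
        o = ord(ch)
        # seed byte chosen from the character, as in the sketch
        x = ((1000003 * x) ^ o ^ _r[o % 0xff]) & _MASK
    x ^= length
    return x
```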
Christian From lists at cheimes.de Sun Jan 1 17:34:31 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 17:34:31 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120101170923.5323628a@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> Message-ID: <4F008B17.8000606@cheimes.de> Am 01.01.2012 17:09, schrieb Antoine Pitrou: > On Sun, 01 Jan 2012 16:48:32 +0100 > Christian Heimes wrote: >> The talkers claim and have shown that it's too easy to pre-calculate >> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It >> might be a good idea to change the hashing algorithm, too. Paul has >> listed some new algorithms. Ruby 1.9 is using FNV >> http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a >> good dispersion pattern. > > We already seem to be using a FNV-alike, is it just a matter of > changing the parameters? No, we are using something similar to DJBX33X. FNV is a completely different type of hash algorithm. From solipsis at pitrou.net Sun Jan 1 17:54:04 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 01 Jan 2012 17:54:04 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F008B17.8000606@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> <4F008B17.8000606@cheimes.de> Message-ID: <1325436844.3472.6.camel@localhost.localdomain> Le dimanche 01 janvier 2012 à
17:34 +0100, Christian Heimes a écrit : > Am 01.01.2012 17:09, schrieb Antoine Pitrou: > > On Sun, 01 Jan 2012 16:48:32 +0100 > > Christian Heimes wrote: > >> The talkers claim and have shown that it's too easy to pre-calculate > >> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It > >> might be a good idea to change the hashing algorithm, too. Paul has > >> listed some new algorithms. Ruby 1.9 is using FNV > >> http://isthe.com/chongo/tech/comp/fnv/ which promises to be fast with a > >> good dispersion pattern. > > > > We already seem to be using a FNV-alike, is it just a matter of > > changing the parameters? > > No, we are using something similar to DJBX33X. FNV is a completely > different type of hash algorithm. I don't understand. FNV-1 multiplies the current running result with a prime and then xors it with the following byte. This is also what we do. (I'm assuming 1000003 is prime) I see two differences: - FNV starts with a non-zero constant offset basis - FNV uses a different prime than ours (as a side note, FNV operates on bytes, but for unicode we must operate on code points in [0, 1114111]: although arguably the common case is hashing ASCII substrings (protocol tokens etc.)) Regards Antoine. From lists at cheimes.de Sun Jan 1 18:28:19 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 18:28:19 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <1325436844.3472.6.camel@localhost.localdomain> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> <4F008B17.8000606@cheimes.de> <1325436844.3472.6.camel@localhost.localdomain> Message-ID: <4F0097B3.1040905@cheimes.de> Am 01.01.2012 17:54, schrieb Antoine Pitrou: > I don't understand. FNV-1 multiplies the current running result with a > prime and then xors it with the following byte. This is also what we do.
> (I'm assuming 1000003 is prime) There must be a major difference somewhere inside the algorithm. The talk at the CCC conference in Berlin mentions that Ruby 1.9 is not vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C code of FNV is more complex than our code, too. Christian From lists at cheimes.de Sun Jan 1 18:32:12 2012 From: lists at cheimes.de (Christian Heimes) Date: Sun, 01 Jan 2012 18:32:12 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F00989C.4010708@cheimes.de> Am 01.01.2012 01:11, schrieb Guido van Rossum: > FWIW I managed to build Python 2.6, and a trivial mutation of the > string/unicode hash function (add 1 initially) made only three tests > fail; test_symtable and test_json both have a dependency on dictionary > order, test_ctypes I can't quite figure out what's going on. In my fork, these tests are failing: test_dbm test_dis test_gdb test_inspect test_packaging test_set test_symtable test_urllib test_userdict test_collections test_json From victor.stinner at haypocalc.com Sun Jan 1 18:32:51 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 01 Jan 2012 18:32:51 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F0098C3.9040609@haypocalc.com> Le 01/01/2012 04:29, Paul McMillan a ?crit : > This is incorrect. Once an attacker has guessed the random seed, any > operation which reveals the ordering of hashed objects can be used to > verify the answer. JSON responses would be ideal. In fact, an attacker > can do a brute-force attack of the random seed offline. Once they have > the seed, generating collisions is a fast process. 
If we want to protect a website against this attack for example, we must suppose that the attacker can inject arbitrary data and can get (indirectly) the result of hash(str) (e.g. with the representation of a dict in a traceback, with a JSON output, etc.). > The goal isn't perfection, but we need to do better than a simple > salt. I disagree. I don't want to break backward compatibility and have a hash() function different for each process, if the change is not an effective protection against the "hash vulnerability". It's really hard to write a good (secure) hash function: see for example the recent NIST competition (started in 2008, will end this year). Even good security researchers are unable to write a strong and fast hash function. It's easy to add a weakness to the function if you don't have a good background in cryptography. The NIST competition gives 4 years to analyze new hash functions. We should not rush to add a quick "hack" if it doesn't correctly solve the problem (protect against a collision attack and preimage attack). http://en.wikipedia.org/wiki/NIST_hash_function_competition http://en.wikipedia.org/wiki/Collision_attack Runtime performance does matter, and I'm not completely sure that changing Python is the best place to add a countermeasure against a vulnerability. I don't want to slow down numpy for a web vulnerability. Because there are different use cases, a better compromise may be to add a runtime option to use a secure hash function, and keep the unsafe but fast hash function by default. > I propose we modify the string hash function like this: > > https://gist.github.com/0a91e52efa74f61858b5 Always allocating 2**21 bytes just to work around one specific kind of attack is not acceptable. I suppose that the maximum acceptable is 4096 bytes (or better, 256 bytes). Cryptographic hash functions don't need random data, so why would Python need 2 MB (!) for its hash function?
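Victor's "runtime option to use a secure hash function" could be sketched in a few lines. The function name, the keying scheme, and the 64-bit truncation here are all invented for illustration; this is only a sketch of the idea, not anything CPython shipped:

```python
import hashlib
import struct

def keyed_secure_hash(s, key=b""):
    """Hypothetical opt-in 'secure' string hash: a keyed cryptographic
    digest truncated to a signed 64-bit value, trading speed for
    resistance to offline collision construction."""
    digest = hashlib.sha256(key + s.encode("utf-8")).digest()
    # Take the first 8 bytes as a little-endian signed integer.
    return struct.unpack("<q", digest[:8])[0]
```

A command-line flag could select this in place of the fast multiplicative hash, keeping the default behaviour (and numpy performance) unchanged, which matches the compromise Victor describes.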
Victor From tjreedy at udel.edu Sun Jan 1 19:45:01 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 01 Jan 2012 13:45:01 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: On 1/1/2012 10:13 AM, Guido van Rossum wrote: > PS. Is the collision-generator used in the attack code open source? As I posted before, Alexander Klink and Julian Wölde gave their project email as hashDoS at alech.de. Since they indicated disappointment in not hearing from Python, I presume they would welcome engagement. -- Terry Jan Reedy From tjreedy at udel.edu Sun Jan 1 19:46:51 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 01 Jan 2012 13:46:51 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0097B3.1040905@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net> <4F008B17.8000606@cheimes.de> <1325436844.3472.6.camel@localhost.localdomain> <4F0097B3.1040905@cheimes.de> Message-ID: On 1/1/2012 12:28 PM, Christian Heimes wrote: > Am 01.01.2012 17:54, schrieb Antoine Pitrou: >> I don't understand. FNV-1 multiplies the current running result with a >> prime and then xors it with the following byte. This is also what we do. >> (I'm assuming 1000003 is prime) > > There must be a major difference somewhere inside the algorithm. The > talk at the CCC conference in Berlin mentions that Ruby 1.9 is not > vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C > code of FNV is more complex than our code, too. I understood Alexander Klink and Julian Wölde, hashDoS at alech.de, as saying that they consider that using a random non-zero start value is sufficient to make the hash non-vulnerable.
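For readers following this exchange, the two algorithms being compared can be sketched side by side. The constants below are the published 64-bit FNV-1 parameters; the second function only models the *shape* of the CPython 2.x string hash (unsigned arithmetic is used here, so the exact values differ from CPython's signed C longs):

```python
FNV_OFFSET = 0xcbf29ce484222325   # 64-bit FNV-1 non-zero offset basis
FNV_PRIME = 0x100000001b3         # 64-bit FNV prime
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1_64(data):
    """FNV-1: start from a non-zero constant, multiply by the FNV prime,
    then xor in each byte."""
    h = FNV_OFFSET
    for byte in data:
        h = (h * FNV_PRIME) & MASK64
        h ^= byte
    return h

def cpython2_style_hash(s):
    """Model of the CPython 2.x scheme: seed from the first character,
    multiply by 1000003 and xor in each code point, mix in the length."""
    if not s:
        return 0
    x = (ord(s[0]) << 7) & MASK64
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & MASK64
    x ^= len(s)
    return x
```

Structurally both are "multiply by a prime, xor in the next unit", which is Antoine's point; the visible differences are the non-zero offset basis and the choice of prime.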
-- Terry Jan Reedy From jimjjewett at gmail.com Mon Jan 2 00:28:02 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 18:28:02 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: Steven D'Aprano (in ) wrote: > By compile-time, do you mean when the byte-code is compiled, i.e. just > before runtime, rather than a switch when compiling the Python executable from > source? No. I really mean when the C code is initially compiled to produce a python executable. The only reason we're worrying about this is that an adversary may force worst-case performance. If the python instance isn't a server, or at least isn't exposed to untrusted clients, then even a single extra "if" test is unjustified overhead. Adding overhead to every string hash or every dict lookup is bad. That said, adding some overhead (only) to dict lookups *that already hit half a dozen consecutive collisions* probably is reasonable, because that won't happen very often with normal data. (6 collisions can't happen at all unless there are already at least 6 entries, so small dicts are safe; with at least 1/3 of the slots empty, it should happen only 1/729 for worst-size larger dicts.) -jJ From paul at mcmillan.ws Mon Jan 2 00:43:52 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sun, 1 Jan 2012 15:43:52 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > But how about precomputing the intermediate value (x)? The hash is (mostly) > doing x = f(x, c) for each c in the input. That's a fair point. If we go down that avenue, I think simply choosing a random fixed starting value for x is the correct choice, rather than computing an intermediate value.
> I sort of see your point, but I still think that if we could add as little > per-character overhead as possible it would be best -- sometimes people *do* > hash very long strings. Yep, agreed. My original proposal did not adequately address this. >> Another option to consider would be to apply this change to some but >> not all of the rounds. If we added the seed lookup xor operation for >> only the first and last 5 values of x, we would still retain much of >> the benefit without adding much computational overhead for very long >> strings. > > I like that. I believe this is a reasonable solution. An attacker could still manipulate the internal state of long strings, but the additional information at both ends should make that difficult to exploit. I'll talk it over with the reviewers. >> We could also consider a less computationally expensive operation than >> the modulo for calculating the lookup index, like simply truncating to >> the correct number of bits. > > Sure. Thanks for thinking about all the details here!! Again, I'll talk to the reviewers (and run the randomness test battery) to double-check that this doesn't affect the distribution in some unexpected way, but I think it should be fine. > PS. Is the collision-generator used in the attack code open source? Not in working form, and they've turned down requests for it from other projects that want to check their work. If it's imperative that we have one, I can write one, but I'd rather not spend the effort if we don't need it. -Paul From paul at mcmillan.ws Mon Jan 2 00:49:14 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sun, 1 Jan 2012 15:49:14 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: > Different concern. What if someone were to have code implementing an > external, persistent hash table, using Python's hash() function?
They might > have a way to rehash everything when a new version of Python comes along, > but they would not be happy if hash() is different in each process. I > somehow vaguely remember possibly having seen such code, or something else > where a bit of random data was needed and hash() was used since it's so > easily available. I agree that there are use cases for allowing users to choose the random seed, in much the same way it's helpful to be able to set it for the random number generator. This should probably be something that can be passed in at runtime. This feature would also be useful for users who want to synchronize the hashes of multiple independent processes, for whatever reason. For the general case though, randomization should be on by default. -Paul From jimjjewett at gmail.com Mon Jan 2 01:02:44 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 19:02:44 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: Paul McMillan in wrote: > Guido van Rossum wrote: >> Hm. I'm not sure I like the idea of extra arithmetic for every character >> being hashed. > the collision generator doesn't necessarily vary the length of the > string. Additionally, if we don't vary based on all the letters in the > string, an attacker can fix the characters that we do use and generate > colliding strings around them. If the new hash algorithm doesn't kick in before, say, 32 characters, then most currently hashed strings will not be affected. And if the attacker has to add 32 characters to every key, it reduces the "this can be done with only N bytes uploaded" risk. (The same logic would apply to even longer prefixes, except that an attacker might more easily find short-enough strings that collide.) > We could also consider a less computationally expensive operation > than the modulo for calculating the lookup index, like simply truncating > to the correct number of bits. Given that the modulo is always 2^N, how is that different? 
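Jim's closing question can be checked directly: when the table size is a power of two, reducing modulo the size and masking off the low bits select the same slot, so "truncating to the correct number of bits" is the operation the modulo already performs (a quick sanity check, not code from the thread):

```python
# For non-negative x and a power-of-two table size 2**n,
# x % 2**n and x & (2**n - 1) pick out exactly the same index.
n = 11                      # e.g. a 2048-entry lookup table
size = 1 << n
for x in (0, 1, 97, 12345, 2**63 - 1):
    assert x % size == x & (size - 1)
```

Any real speed difference would come from the compiler already emitting the mask for a constant power-of-two modulus, which is why the two proposals coincide.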
-jJ From jimjjewett at gmail.com Mon Jan 2 01:21:11 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 19:21:11 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: Victor Stinner wrote in > If we want to protect a website against this attack for example, we must > suppose that the attacker can inject arbitrary data and can get > (indirectly) the result of hash(str) (e.g. with the representation of a > dict in a traceback, with a JSON output, etc.). (1) Is it common to hash non-string input? Because generating integers that collide for certain dict sizes is pretty easy... (2) Would it make sense for traceback printing to sort dict keys? (Any site worried about this issue should already be hiding tracebacks from untrusted clients, but the cost of this extra protection may be pretty small, given that tracebacks shouldn't be printed all that often in the first place.) (3) Should the docs for json.encoder.JSONEncoder suggest sort_keys=True? -jJ From jimjjewett at gmail.com Mon Jan 2 01:37:26 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 19:37:26 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html Message-ID: In http://mail.python.org/pipermail/python-dev/2011-December/115172.html, P. J. Eby wrote: > On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull wrote: >> While the dictionary probe has to start with a hash for backward >> compatibility reasons, is there a reason the overflow strategy for >> insertion has to be buckets containing lists? How about >> double-hashing, etc? > This won't help, because the keys still have the same hash value. ANYTHING > you do to them after they're generated will result in them still colliding. > The *only* thing that works is to change the hash function in such a way > that the strings end up with different hashes in the first place. > Otherwise, you'll still end up with (deliberate) collisions. 
Well, there is nothing wrong with switching to a different hash function after N collisions, rather than "in the first place". The perturbation effectively does this by shoving the high-order bits through the part of the hash that survives the mask. > (Well, technically, you could use trees or some other O log n data > structure as a fallback once you have too many collisions, for some value > of "too many". Seems a bit wasteful for the purpose, though.) Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > requires using a real dictionary for compatibility; storing some of the values outside the values array would violate that. Do you consider that obsolete? -jJ From lists at cheimes.de Mon Jan 2 02:04:38 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 02 Jan 2012 02:04:38 +0100 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: References: Message-ID: <4F0102A6.1020108@cheimes.de> Am 02.01.2012 01:37, schrieb Jim Jewett: > Well, there is nothing wrong with switching to a different hash function after N > collisions, rather than "in the first place". The perturbation > effectively does this by > shoving the high-order bits through the part of the hash that survives the mask. Except that it won't work, or will slow down every lookup of missing keys. It's absolutely crucial that the lookup time is kept as fast as possible. You can't just change the hash algorithm in the middle of the work without a speed impact on lookups. The size of the dict can shrink or grow over time. This results in a different number of collisions for the same string. Cuckoo hashing (http://en.wikipedia.org/wiki/Hash_table#Collision_resolution) doesn't sound feasible for us because it slows down lookup and requires an ABI-incompatible change for more hash slots on str/bytes/unicode instances. Christian PS: Something is wrong with your email client. Every one of your replies starts a new thread for me.
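For context on why Christian rules it out: a cuckoo table gives every key two candidate slots, so a miss always costs two probes and every string would need two (ideally cached) hash values, which is where the ABI concern comes from. A toy sketch, derived from the general description on the Wikipedia page he links (illustrative only, not a CPython proposal; the second hash function here is an arbitrary choice):

```python
class Cuckoo:
    """Minimal cuckoo hash table: two arrays, two hash functions.
    An insert that finds its slot occupied kicks the occupant out,
    and the evicted item is re-inserted via its *other* hash."""

    def __init__(self, size=8):
        self.size = size
        self.t1 = [None] * size
        self.t2 = [None] * size

    def _h1(self, key):
        return hash(key) % self.size

    def _h2(self, key):
        # Arbitrary second hash for the sketch: reuse higher bits.
        return (hash(key) // self.size) % self.size

    def get(self, key):
        # A lookup (hit or miss) inspects at most two slots.
        for table, idx in ((self.t1, self._h1(key)), (self.t2, self._h2(key))):
            entry = table[idx]
            if entry is not None and entry[0] == key:
                return entry[1]
        raise KeyError(key)

    def put(self, key, value, max_kicks=32):
        item = (key, value)
        for _ in range(max_kicks):
            h1 = self._h1(item[0])
            slot = self.t1[h1]
            if slot is None or slot[0] == item[0]:
                self.t1[h1] = item
                return
            # Evict the occupant of t1 and move it to its t2 slot.
            self.t1[h1], item = item, slot
            h2 = self._h2(item[0])
            slot = self.t2[h2]
            if slot is None or slot[0] == item[0]:
                self.t2[h2] = item
                return
            self.t2[h2], item = item, slot
        raise RuntimeError("too many displacements; a real table would rehash")
```

The eviction chain in `put` is the part that would hurt insertion-heavy workloads, on top of the doubled probe cost that Christian objects to.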
From solipsis at pitrou.net Mon Jan 2 02:19:37 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 02:19:37 +0100 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html References: <4F0102A6.1020108@cheimes.de> Message-ID: <20120102021937.029dcb91@pitrou.net> On Mon, 02 Jan 2012 02:04:38 +0100 Christian Heimes wrote: > > PS: Something is wrong with your email client. Every one of your replies > starts a new thread for me. Same here. Regards Antoine. From jimjjewett at gmail.com Mon Jan 2 02:23:16 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 20:23:16 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: <4F0102A6.1020108@cheimes.de> References: <4F0102A6.1020108@cheimes.de> Message-ID: On Sun, Jan 1, 2012 at 8:04 PM, Christian Heimes wrote: > Am 02.01.2012 01:37, schrieb Jim Jewett: >> Well, there is nothing wrong with switching to a different hash function after N >> collisions, rather than "in the first place". The perturbation >> effectively does this by >> shoving the high-order bits through the part of the hash that survives the mask. > Except that it won't work or slow down every lookup of missing keys? > It's absolutely crucial that the lookup time is kept as fast as possible. It will only slow down missing keys that themselves hit more than N collisions. Or were you assuming that I meant to switch the whole table, rather than just that one key? I agree that wouldn't work. > You can't just change the hash algorithm in the middle of the work > without a speed impact on lookups. Right -- but there is nothing wrong with modifying the lookdict (and insert_clean) functions to do something different after the Nth collision than they did after the N-1th.
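The probe sequence Jim wants to modify is easy to model. This mirrors the recurrence in CPython's lookdict (i = 5*i + perturb + 1, with perturb starting at the hash and shifting right five bits per step); his "different strategy after the Nth collision" would amount to changing the rule after some number of yielded slots:

```python
def probe_sequence(h, mask, limit):
    """Model of CPython's open-addressing probe order: the first slot is
    hash & mask, then i = 5*i + perturb + 1, with perturb = hash shifted
    right 5 bits after each step. A 'switch after N collisions' strategy
    would simply branch inside this loop once len(slots) exceeds N."""
    i = h & mask
    slots = [i]
    perturb = h
    while len(slots) < limit:
        i = 5 * i + perturb + 1
        slots.append(i & mask)
        perturb >>= 5
    return slots
```

Once perturb is exhausted (for example h = 0) the recurrence degenerates to i = 5*i + 1, which is a full-period generator modulo a power of two, so every slot is eventually visited. It also shows PJE's point: two keys with *identical* hashes get identical perturb values and therefore identical probe sequences.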
-jJ From pje at telecommunity.com Mon Jan 2 04:00:33 2012 From: pje at telecommunity.com (PJ Eby) Date: Sun, 1 Jan 2012 22:00:33 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: References: Message-ID: On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett wrote: > Well, there is nothing wrong with switching to a different hash function > after N > collisions, rather than "in the first place". The perturbation > effectively does this by > shoving the high-order bits through the part of the hash that survives the > mask. > Since these are true hash collisions, they will all have the same high order bits. So, the usefulness of the perturbation is limited mainly to the common case where true collisions are rare. > (Well, technically, you could use trees or some other O log n data > structure as a fallback once you have too many collisions, for some value > of "too many". Seems a bit wasteful for the purpose, though.) > > Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > > requires > using a real dictionary for compatibility; storing some of the values > outside the > values array would violate that. When I said "use some other data structure", I was referring to the internal implementation of the dict type, not to user code. The only user-visible difference (even at C API level) would be the order of keys() et al. (In any case, I still assume this is too costly an implementation change compared to changing the hash function or seeding it.) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jimjjewett at gmail.com Mon Jan 2 04:28:13 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jan 2012 22:28:13 -0500 Subject: [Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html In-Reply-To: References: Message-ID: On Sun, Jan 1, 2012 at 10:00 PM, PJ Eby wrote: > On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett wrote: >> Well, there is nothing wrong with switching to a different hash function >> after N >> collisions, rather than "in the first place". The perturbation >> effectively does this by >> shoving the high-order bits through the part of the hash that survives the >> mask. > Since these are true hash collisions, they will all have the same high order > bits. So, the usefulness of the perturbation is limited mainly to the > common case where true collisions are rare. That is only because the perturb is based solely on the hash. Switching to an entirely new hash after the 5th collision (for a given lookup) would resolve that (after the 5th collision); the question is whether or not the cost is worthwhile. >> > (Well, technically, you could use trees or some other O log n data >> > structure as a fallback once you have too many collisions, for some >> > value >> > of "too many". Seems a bit wasteful for the purpose, though.) >> >> Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > >> requires >> using a real dictionary for compatibility; storing some of the values >> outside the >> values array would violate that. > When I said "use some other data structure", I was referring to the internal > implementation of the dict type, not to user code. The only user-visible > difference (even at C API level) would be the order of keys() et al. Given the wording requiring a real dictionary, I would have assumed that it was OK (if perhaps not sensible) to do pointer arithmetic and access the keys/values/hashes directly.
(Though if the breakage was between python versions, I would feel guilty about griping too loudly.) -jJ From ncoghlan at gmail.com Mon Jan 2 05:44:49 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 14:44:49 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces Message-ID: I've been having an occasional argument with Benjamin regarding braces in 4-line if statements:

if (cond)
    statement;
else
    statement;

vs.

if (cond) {
    statement;
} else {
    statement;
}

He keeps leaving them out, I occasionally tell him they should always be included (most recently this came up when we gave conflicting advice to a patch contributor). He says what he's doing is OK, because he doesn't consider the example in PEP 7 as explicitly disallowing it, I think it's a recipe for future maintenance hassles when someone adds a second statement to one of the clauses but doesn't add the braces. (The only time I consider it reasonable to leave out the braces is for one liner if statements, where there's no else clause at all) Since Benjamin doesn't accept the current brace example in PEP 7 as normative for the case above, I'm bringing it up here to seek clarification. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ben+python at benfinney.id.au Mon Jan 2 06:25:51 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 02 Jan 2012 16:25:51 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces References: Message-ID: <87pqf21xr4.fsf@benfinney.id.au> Nick Coghlan writes: > He keeps leaving [braces] out [when the block is a single statement], > I occasionally tell him they should always be included (most recently > this came up when we gave conflicting advice to a patch contributor). As someone who has maintained his fair share of C code, I am firmly on the side of unconditionally (!) enclosing C statement blocks in braces regardless of how many statements they contain.
> He says what he's doing is OK, because he doesn't consider the example > in PEP 7 as explicitly disallowing it I wonder if he has a stronger argument in favour of his position, because "it's not forbidden" doesn't imply "it's okay". > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. Agreed, it's an issue of code maintainability. Which is enough of a problem in C code that a low-cost improvement like this should always be done. But, as someone who carries no water in the Python developer community, my opinion has no more force than the arguments, and I can't impose it on anyone. Take it for what it's worth. -- \ "God was invented to explain mystery. God is always invented to | `\ explain those things that you do not understand." --Richard P. | _o__) Feynman, 1988 | Ben Finney From scott+python-dev at scottdial.com Mon Jan 2 06:04:15 2012 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 02 Jan 2012 00:04:15 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F013ACF.7090708@scottdial.com> On 1/1/2012 11:44 PM, Nick Coghlan wrote: > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. > (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) Could you explain how these two cases differ with regard to maintenance? In either case, there are superfluous edits required if the original author had used braces *always*. Putting a brace on one-liners adds only a single line to the code -- just like in the if/else case. So, your argument seems conflicted. Surely, you would think this is a simpler edit to make and diff to see in a patch file:

 if(cond) {
     stmt1;
+    stmt2;
 }

vs.
-if(cond)
+if(cond) {
     stmt1;
+    stmt2;
+}

Also, the superfluous edits will wrongly attribute the blame for the conditional to the wrong author. -- Scott Dial scott at scottdial.com From paul at mcmillan.ws Mon Jan 2 06:55:52 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sun, 1 Jan 2012 21:55:52 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0087D2.9090405@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: I fixed a couple things in my proposed algorithm: https://gist.github.com/0a91e52efa74f61858b5 I had a typo, and used 21 instead of 12 for the size multiplier. We definitely don't need 2 MB of random data. The initialization of r was broken. Now it is an array of ints, so there's no conversion when it's used. I've adjusted it so there's 8k of random data, broken into 2048 ints. I added a length-based seed to the initial value of x. This prevents single characters from being used to enumerate raw values from r. This is similar to the change proposed by Christian Heimes. Most importantly, I moved the xor with r[x % len_r] down a line. Before, it wasn't being applied to the last character. > Christian Heimes said: > We also need to special case short strings. The above routine hands over > the seed to attackers, if he is able to retrieve lots of single > character hashes. The updated code always includes at least 2 lookups from r, which I believe solves the single-character enumeration problem. If we special-case part of our hash function for short strings, we may get suboptimal collisions between the two types of hashes. I think Ruby uses FNV-1 with a salt, making it less vulnerable to this. FNV is otherwise similar to our existing hash function. For the record, cryptographically strong hash functions are in the neighborhood of 400% slower than our existing hash function.
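Paul's prose description (an 8 KB table of 2048 random ints, a length-based seed, and the table xor moved so it also covers the last character) translates to roughly the following. This is a reconstruction, not the gist's actual code; in particular the exact form of the length-based seed is guessed here:

```python
import os

# 8 KB of per-process secret data, as 2048 ints (sizes from Paul's mail).
r = [int.from_bytes(os.urandom(4), "little") for _ in range(2048)]
len_r = len(r)
MASK64 = 0xFFFFFFFFFFFFFFFF

def randomized_hash(s):
    """Sketch of the updated proposal: seed x from the length, then for
    every character (including the last) multiply, xor the character,
    and xor a value from the secret table r."""
    if not s:
        return 0
    x = (len(s) << 7) & MASK64          # hypothetical length-based seed
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & MASK64
        x ^= r[x % len_r]               # table lookup now covers the last char
    x ^= len(s)
    return x
```

Because the table index depends on the evolving value of x rather than on a single input character, even one-character strings consume two secrets (the seed and a table entry), which is the single-character-enumeration fix Paul describes.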
> Terry Reedy said: > I understood Alexander Klink and Julian Wölde, hashDoS at alech.de, as saying > that they consider that using a random non-zero start value is sufficient to > make the hash non-vulnerable. I've been talking to them. They're happy to look at our proposed changes. They indicate that a non-zero start value is sufficient to prevent the attack, but D. J. Bernstein disagrees with them. He also has indicated a willingness to look at our solution. -Paul From ncoghlan at gmail.com Mon Jan 2 07:02:59 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 16:02:59 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F013ACF.7090708@scottdial.com> References: <4F013ACF.7090708@scottdial.com> Message-ID: On Mon, Jan 2, 2012 at 3:04 PM, Scott Dial wrote: > On 1/1/2012 11:44 PM, Nick Coghlan wrote: >> I think it's a recipe for future maintenance hassles when someone adds >> a second statement to one of the clauses but doesn't add the braces. >> (The only time I consider it reasonable to leave out the braces is for >> one liner if statements, where there's no else clause at all) > > Could you explain how these two cases differ with regard to maintenance? Sure: always including K&R style braces for compound statements (even when they aren't technically necessary) means that indentation == control flow, just like Python. Indent your additions correctly, and the reader and compiler will agree on what they mean:

if (cond) {
    statement;
} else {
    statement;
    addition;  /* Reader and compiler agree this is part of the else clause */
}

if (cond)
    statement;
else
    statement;
    addition;  /* Uh-oh, should have added braces */

I've been trying to convince Benjamin that there's a reason "always include the braces" is accepted wisdom amongst many veteran C programmers (with some allowing an exception for one-liners), but he isn't believing me, and I'm not going to go through and edit every single one of his commits to add them.
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pje at telecommunity.com Mon Jan 2 07:16:42 2012 From: pje at telecommunity.com (PJ Eby) Date: Mon, 2 Jan 2012 01:16:42 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) Message-ID: On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett wrote: > Given the wording requiring a real dictionary, I would have assumed > that it was OK (if perhaps not sensible) to do pointer arithmetic and > access the keys/values/hashes directly. (Though if the breakage was > between python versions, I would feel guilty about griping too > loudly.) > If you're going to be a language lawyer about it, I would simply point out that all the spec requires is that "type(env) is dict" -- it says nothing about how Python defines "type" or "is" or "dict". So, you're on your own with that one. ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Mon Jan 2 07:22:05 2012 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 02 Jan 2012 00:22:05 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <1325485325.20247.37.camel@Gutsy> On Mon, 2012-01-02 at 14:44 +1000, Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } > > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). He says what he's doing is OK, because > he doesn't consider the example in PEP 7 as explicitly disallowing it, > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces.
I've had to correct myself on this one a few times so I will have to agree it's a good practice. > (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) The problem is only when an additional statement is added to the last block, not the preceding ones, as the compiler will complain about those. So I don't know how the 4 line example without braces is any worse than a 2 line if without braces. I think my preference is, if any block in a multi-block statement needs braces, then the other blocks should have them. (Including the last block even if it's a single line.) The next level up would be to require them on all blocks, even two-line if statements, but I'm not sure that is really needed. At some point the extra noise of the braces makes things harder to read rather than easier, and what you gain in preventing one type of error may increase chances of another type of error not being noticed. Cheers, Ron > Since Benjamin doesn't accept the current brace example in PEP 7 as > normative for the case above, I'm bringing it up here to seek > clarification. From ncoghlan at gmail.com Mon Jan 2 09:31:00 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 18:31:00 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <1325485325.20247.37.camel@Gutsy> References: <1325485325.20247.37.camel@Gutsy> Message-ID: On Mon, Jan 2, 2012 at 4:22 PM, Ron Adam wrote: > The problem is only when an additional statement is added to the last > block, not the preceding ones, as the compiler will complain about > those. So I don't know how the 4 line example without braces is any > worse than a 2 line if without braces. Even when the compiler picks it up, it's still a wasted edit-compile cycle. More importantly though, this approach makes the rules too complicated.
"Always use braces" is simple and easy, and the only cost is the extra line of vertical whitespace for the closing brace. (I personally don't like even the exception made for single clause if statements, but that's already too prevalent in the code base to do anything about. Hence the 4-line example in my original post.) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From raymond.hettinger at gmail.com Mon Jan 2 09:47:14 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 00:47:14 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> On Jan 1, 2012, at 8:44 PM, Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } Really? Do we need to have a brace war? People have different preferences. The standard library includes some of both styles depending on what the maintainer thought was cleanest to their eyes in a given context. Raymond From ncoghlan at gmail.com Mon Jan 2 11:15:25 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Jan 2012 20:15:25 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> Message-ID: On Mon, Jan 2, 2012 at 6:47 PM, Raymond Hettinger wrote: > Really? Do we need to have a brace war? > People have different preferences. > The standard library includes some of both styles > depending on what the maintainer thought was cleanest to their eyes in a given context. If the answer is "either form is OK", I'm actually fine with that (and will update PEP 7 accordingly).
However, I have long read PEP 7 as *requiring* the braces, and until noticing their absence in some of Benjamin's checkins and the recent conflicting advice we gave when reviewing the same patch, I had never encountered their absence in the CPython code base outside the one-liner/two-liner case*. Since I *do* feel strongly that leaving them out is a mistake that encourages future defects, and read PEP 7 as agreeing with that (aside from the general "follow conventions in surrounding code" escape clause), I figured it was better to bring it up explicitly and clarify PEP 7 accordingly (since what is currently there is clearly ambiguous enough for two current committers to have diametrically opposed views on what it says). Cheers, Nick.

* That is, constructs like:

    if (error_condition) return -1;

    if (error_condition)
        return -1;

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Mon Jan 2 11:25:16 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 02 Jan 2012 05:25:16 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: On 1/2/2012 12:55 AM, Paul McMillan wrote: >> Terry Reedy said: >> I understood Alexander Klink and Julian Wälde, hashDoS at alech.de, as saying >> that they consider that using a random non-zero start value is sufficient to >> make the hash non-vulnerable. > > I've been talking to them. They're happy to look at our proposed > changes. They indicate that a non-zero start value is sufficient to > prevent the attack, but D. J. Bernstein disagrees with them. He also > has indicated a willingness to look at our solution. Great. My main concern currently is that there should be no noticeable slowdown for 64 bit builds which are apparently not vulnerable and which therefore would get no benefit.
Terry Jan Reedy From solipsis at pitrou.net Mon Jan 2 13:01:05 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 13:01:05 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: <20120102130105.2ce82e10@pitrou.net> On Sun, 1 Jan 2012 21:55:52 -0800 Paul McMillan wrote: > > This is similar to the change proposed by Christian Heimes. > > Most importantly, I moved the xor with r[x % len_r] down a line. > Before, it wasn't being applied to the last character. Shouldn't it be r[i % len(r)] instead? (refer to yesterday's #python-dev discussion) > I think Ruby uses FNV-1 with a salt, making it less vulnerable to > this. FNV is otherwise similar to our existing hash function. Again, we could re-use FNV-1's primes, since they claim they have better dispersion properties than the average prime. Regards Antoine. From solipsis at pitrou.net Mon Jan 2 13:05:28 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 13:05:28 +0100 Subject: [Python-Dev] PEP 7 clarification request: braces References: Message-ID: <20120102130528.754cec85@pitrou.net> On Mon, 2 Jan 2012 14:44:49 +1000 Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } Good, I was afraid python-dev was getting a bit futile with all these security concerns about hash functions. I don't like having the else on the same line as the closing brace, and prefer:

if (cond) {
    statement;
}
else {
    statement;
}

That said, I agree with Benjamin: the shorter form is visually lighter and should not be frowned upon. Regards Not-frowning Antoine.
From petri at digip.org Mon Jan 2 14:24:28 2012 From: petri at digip.org (Petri Lehtinen) Date: Mon, 2 Jan 2012 15:24:28 +0200 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <20120102130528.754cec85@pitrou.net> References: <20120102130528.754cec85@pitrou.net> Message-ID: <20120102132428.GR24315@p16> Antoine Pitrou wrote: > I don't like having the else on the same line as the closing brace, > and prefer: > > if (cond) { > statement; > } > else { > statement; > } And this is how it's written in PEP-7. It seems to me that PEP-7 doesn't require braces. But it explicitly forbids

if (cond) {
    statement;
} else {
    statement;
}

by saying "braces as shown", and then showing them like this:

if (mro != NULL) {
    ...
}
else {
    ...
}

> That said, I agree with Benjamin: the shorter form is visually lighter > and should not be frowned upon. Me too. From ned at nedbatchelder.com Mon Jan 2 15:10:40 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 02 Jan 2012 09:10:40 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F01BAE0.5000009@nedbatchelder.com> On 1/1/2012 11:44 PM, Nick Coghlan wrote: > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: > > if (cond) > statement; > else > statement; > > vs. > > if (cond) { > statement; > } else { > statement; > } > > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). He says what he's doing is OK, because > he doesn't consider the example in PEP 7 as explicitly disallowing it, > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces.
> (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) > > Since Benjamin doesn't accept the current brace example in PEP 7 as > normative for the case above, I'm bringing it up here to seek > clarification. I've always valued readability and consistency above brevity, and Python does too. *Sometimes* using braces in C is a recipe for confusion, and only adds to the cognitive load in reading the code. The examples elsewhere in this thread of mistakes and noisy diffs due to leaving out the braces are plenty of reason for me to always include braces. The current code uses a mixture of styles, but that doesn't mean we need to allow any style in the future. I'm in favor of PEP 7 being amended to either require or strongly favor the braces-always style. Note: while we're reading the tea-leaves in PEP 7, it has an example of a single-line if clause with no braces. Some people favor the braces-sometimes style because it leads to "lighter" code. I think that's a misguided optimization. Consistency is better than reducing the line count. --Ned. > Cheers, > Nick. > From stephen at xemacs.org Mon Jan 2 15:32:19 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 02 Jan 2012 23:32:19 +0900 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFF1938.6080809@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <87pqf5dk39.fsf@uwakimon.sk.tsukuba.ac.jp> <4EFF1938.6080809@cheimes.de> Message-ID: <87lipq18gc.fsf@uwakimon.sk.tsukuba.ac.jp> Christian Heimes writes: > Am 31.12.2011 13:03, schrieb Stephen J. 
Turnbull: > > I don't know the implementation issues well enough to claim it is a > > solution, but this hasn't been mentioned before AFAICS: > > > > While the dictionary probe has to start with a hash for backward > > compatibility reasons, is there a reason the overflow strategy for > > insertion has to be buckets containing lists? How about > > double-hashing, etc? > > Python's dict implementation doesn't use buckets but open addressing (aka > a closed hash table). The algorithm for conflict resolution doesn't use > double hashing. Instead it takes the original and (in most cases) cached > hash and perturbs the hash with a series of add, multiply and bit shift ops. In an attack, this is still O(collisions) per probe (as is any scheme where the address of the nth collision is a function of only the hash), where double hashing should be "roughly" O(1) (with double the coefficient). But that evidently imposes too large a performance burden on non-evil users, so it's not worth thinking about whether "roughly O(1)" is close enough to O(1) to deter or exhaust attackers. I withdraw the suggestion. From solipsis at pitrou.net Mon Jan 2 15:41:44 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 15:41:44 +0100 Subject: [Python-Dev] Code reviews References: Message-ID: <20120102154144.2241e880@pitrou.net> On Mon, 2 Jan 2012 14:44:49 +1000 Nick Coghlan wrote: > > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). Oh, by the way, this is also why I avoid arguing too much about style in code reviews. There are two bad things which can happen: - your advice conflicts with advice given by another reviewer (perhaps on another issue) - the contributor feels drowned under tiresome requests for style fixes ("please indent continuation lines this way") Both are potentially demotivating.
A contributor can have his/her own style if it doesn't adversely affect code quality. Regards Antoine. From benjamin at python.org Mon Jan 2 15:54:02 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 2 Jan 2012 08:54:02 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: 2012/1/1 Nick Coghlan : > I've been having an occasional argument with Benjamin regarding braces > in 4-line if statements: Python's C code has been dropping braces long before I ever arrived. See this beautiful example in dictobject.c, for example: if (numfree < PyDict_MAXFREELIST && Py_TYPE(mp) == &PyDict_Type) free_list[numfree++] = mp; else Py_TYPE(mp)->tp_free((PyObject *)mp); There's even things like this: if (ep->me_key == dummy) freeslot = ep; else { if (ep->me_hash == hash && unicode_eq(ep->me_key, key)) return ep; freeslot = NULL; } where I would normally put braces on both statements. I think claims of its maintainability are exaggerated. (If someone could cite an example of a bug caused by braces, I'd be interested to see it.) If I start editing one of the bodies, emacs will dedent, so that I know I'm back to the containing block. By virtue of being 5 lines long, it's a very easy case to see and fix as you edit it. I think it's fine Nick raised this. PEP 7 is not very explicit about braces at all. 
-- Regards, Benjamin From solipsis at pitrou.net Mon Jan 2 16:04:58 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jan 2012 16:04:58 +0100 Subject: [Python-Dev] cpython: fix some possible refleaks from PyUnicode_READY error conditions References: Message-ID: <20120102160458.58d4c207@pitrou.net> On Mon, 02 Jan 2012 16:00:50 +0100 benjamin.peterson wrote:

> http://hg.python.org/cpython/rev/d5cda62d0f8c
> changeset:   74236:d5cda62d0f8c
> user:        Benjamin Peterson
> date:        Mon Jan 02 09:00:30 2012 -0600
> summary:
>   fix some possible refleaks from PyUnicode_READY error conditions
>
> files:
>   Objects/unicodeobject.c |  80 ++++++++++++++++++++--------
>   1 files changed, 56 insertions(+), 24 deletions(-)
>
>
> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
> --- a/Objects/unicodeobject.c
> +++ b/Objects/unicodeobject.c
> @@ -9132,10 +9132,15 @@
>      Py_ssize_t len1, len2;
>
>      str_obj = PyUnicode_FromObject(str);
> -    if (!str_obj || PyUnicode_READY(str_obj) == -1)
> +    if (!str_obj)
>          return -1;
>      sub_obj = PyUnicode_FromObject(substr);
> -    if (!sub_obj || PyUnicode_READY(sub_obj) == -1) {
> +    if (!sub_obj) {
> +        Py_DECREF(str_obj);
> +        return -1;
> +    }
> +    if (PyUnicode_READY(substr) == -1 || PyUnicode_READY(str_obj) == -1) {

Shouldn't the first one be PyUnicode_READY(sub_obj) ? From benjamin at python.org Mon Jan 2 16:07:54 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 2 Jan 2012 09:07:54 -0600 Subject: [Python-Dev] cpython: fix some possible refleaks from PyUnicode_READY error conditions In-Reply-To: <20120102160458.58d4c207@pitrou.net> References: <20120102160458.58d4c207@pitrou.net> Message-ID: 2012/1/2 Antoine Pitrou : > On Mon, 02 Jan 2012 16:00:50 +0100 > benjamin.peterson wrote: >> http://hg.python.org/cpython/rev/d5cda62d0f8c >> changeset: 74236:d5cda62d0f8c >> user: Benjamin Peterson >> date: Mon Jan 02 09:00:30 2012 -0600 >> summary: >>
fix some possible refleaks from PyUnicode_READY error conditions >> >> files: >> Objects/unicodeobject.c | 80 ++++++++++++++++++++-------- >> 1 files changed, 56 insertions(+), 24 deletions(-) >> >> >> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c >> --- a/Objects/unicodeobject.c >> +++ b/Objects/unicodeobject.c >> @@ -9132,10 +9132,15 @@ >> Py_ssize_t len1, len2; >> >> str_obj = PyUnicode_FromObject(str); >> - if (!str_obj || PyUnicode_READY(str_obj) == -1) >> + if (!str_obj) >> return -1; >> sub_obj = PyUnicode_FromObject(substr); >> - if (!sub_obj || PyUnicode_READY(sub_obj) == -1) { >> + if (!sub_obj) { >> + Py_DECREF(str_obj); >> + return -1; >> + } >> + if (PyUnicode_READY(substr) == -1 || PyUnicode_READY(str_obj) == -1) { > > Shouldn't the first one be PyUnicode_READY(sub_obj) ? Yes. -- Regards, Benjamin From lists at cheimes.de Mon Jan 2 16:18:41 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 02 Jan 2012 16:18:41 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> Message-ID: <4F01CAD1.6030206@cheimes.de> Am 02.01.2012 06:55, schrieb Paul McMillan: > I think Ruby uses FNV-1 with a salt, making it less vulnerable to > this. FNV is otherwise similar to our existing hash function. > > For the record, cryptographically strong hash functions are in the > neighborhood of 400% slower than our existing hash function. I've pushed a new patch http://hg.python.org/features/randomhash/rev/0a65d2462e0c The changeset adds the murmur3 hash algorithm with some minor changes, for example more random seeds. At first I was worried that murmur might be slower than our old hash algorithm. But in fact it seems to be faster!
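[Editor's sketch: for readers unfamiliar with the algorithm, the published x86 32-bit variant of MurmurHash3 is short enough to transcribe in Python. This is a toy transcription for illustration only -- the actual patch is C and, per the message above, adds extra random seeding.]

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python sketch of MurmurHash3 (x86, 32-bit variant)."""
    def rotl32(x, r):
        return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    nblocks = len(data) // 4

    # body: consume the input four bytes at a time
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # tail: the remaining 1-3 bytes, if any
    tail = data[nblocks * 4:]
    k = 0
    for b in reversed(tail):
        k = (k << 8) | b
    if tail:
        k = (k * c1) & 0xFFFFFFFF
        k = rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # finalization: mix in the length, then avalanche
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h
```

The speed Christian measures comes from the C version's cheap per-word operations (multiply, rotate, xor); the Python above only shows the structure.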
Pybench 10 rounds on my Core2 Duo 2.60: py3k: 3.230 sec randomhash: 3.182 sec Christian From ajm at flonidan.dk Mon Jan 2 16:34:00 2012 From: ajm at flonidan.dk (Anders J. Munch) Date: Mon, 2 Jan 2012 16:34:00 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk><0F70678AC2164512A7E6FCADB2F37EA8@gmail.com><4F008050.1030807@cheimes.de> <20120101170923.5323628a@pitrou.net><4F008B17.8000606@cheimes.de><1325436844.3472.6.camel@localhost.localdomain><4F0097B3.1040905@cheimes.de> Message-ID: <9B1795C95533CA46A83BA1EAD4B01030DFDE77@flonidanmail.flonidan.net> > On 1/1/2012 12:28 PM, Christian Heimes wrote: > I understood Alexander Klink and Julian Wälde, hashDoS at alech.de, as > saying that they consider that using a random non-zero start value is > sufficient to make the hash non-vulnerable. Sufficient against their current attack. But will it last? For a long-running server, there must be plenty of ways information can leak that will help guessing that start value. The alternative, to provide a dict-like datastructure for use with untrusted input, deserves consideration. Perhaps something simpler than a balanced tree would do? How about a dict-like class that is built on a lazily sorted list? Insertions basically just do list.append and set a dirty-flag, and lookups use bisect - sorting first if the dirty-flag is set. It wouldn't be a complete dict replacement by any means, mixing insertions and lookups would have terrible performance, but for something like POST parameters it should be good enough. I half expected to find something like that on activestate recipes already, but couldn't find any.
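[Editor's sketch: the lazily sorted mapping described above can be written in a few lines with the bisect module. This is a toy, not a full MutableMapping; the class name and details are illustrative.]

```python
import bisect

class SortedListDict:
    """Toy mapping on a lazily sorted list.

    Insertions append and mark the list dirty; lookups sort first if
    needed, then bisect.  Worst-case lookup cost is O(log n) no matter
    what keys an attacker chooses, since no hashing is involved.
    Keys must be mutually orderable (e.g. all str).
    """
    def __init__(self):
        self._items = []     # list of (key, value) pairs
        self._dirty = False

    def __setitem__(self, key, value):
        self._items.append((key, value))
        self._dirty = True

    def _ensure_sorted(self):
        if self._dirty:
            # stable sort on key alone: among duplicate keys, the most
            # recent insertion ends up last, so lookups see the newest value
            self._items.sort(key=lambda kv: kv[0])
            self._dirty = False

    def __getitem__(self, key):
        self._ensure_sorted()
        # building the key list each time is O(n); kept simple for the
        # sketch -- a real version would cache it alongside the sort
        keys = [k for k, _ in self._items]
        i = bisect.bisect_right(keys, key)
        if i and keys[i - 1] == key:
            return self._items[i - 1][1]
        raise KeyError(key)
```

As the message notes, alternating single insertions and lookups would trigger a re-sort each time; the structure only pays off for batch-insert-then-lookup patterns like parsed POST parameters.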
regards, Anders From lists at cheimes.de Mon Jan 2 16:47:43 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 02 Jan 2012 16:47:43 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <4F01D19F.3090009@cheimes.de> Am 01.01.2012 19:45, schrieb Terry Reedy: > On 1/1/2012 10:13 AM, Guido van Rossum wrote: >> PS. Is the collision-generator used in the attack code open source? > > As I posted before, Alexander Klink and Julian Wälde gave their project > email as hashDoS at alech.de. Since they indicated disappointment in not > hearing from Python, I presume they would welcome engagement. Somebody should contact Alexander and Julian to let them know that we are working on the matter. It should be somebody "official" for the initial contact, too. I've included Guido (BDFL), Barry (their initial security contact) and MvL (most prominent German core dev) in CC, as they are the logical choice for me. I'm willing to have a phone call with them once the contact has been established. IMHO it's slightly easier to talk in one's native tongue -- Alexander and Julian are German, too. Christian From g.brandl at gmx.net Mon Jan 2 18:35:48 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 02 Jan 2012 18:35:48 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: <20120102154144.2241e880@pitrou.net> References: <20120102154144.2241e880@pitrou.net> Message-ID: On 01/02/2012 03:41 PM, Antoine Pitrou wrote: > On Mon, 2 Jan 2012 14:44:49 +1000 > Nick Coghlan wrote: >> >> He keeps leaving them out, I occasionally tell him they should always >> be included (most recently this came up when we gave conflicting >> advice to a patch contributor). > > Oh, by the way, this is also why I avoid arguing too much about style > in code reviews.
There are two bad things which can happen: > > - your advice conflicts with advice given by another reviewer (perhaps > on another issue) > - the contributor feels drowned under tiresome requests for style > fixes ("please indent continuation lines this way") > > Both are potentially demotivating. A contributor can have his/her own > style if it doesn't adversely affect code quality. Exactly. Especially for reviews of patches from non-core people, we should exercise a lot of restraint: as the committers, I think we can be expected to bite the sour bullet and apply our uniform style (such as it is). It is tiresome, if not downright disappointing, to get reviews that are basically "nothing wrong, but please submit again with one more empty line between the classes", and definitely not the way to attract more contributors. Georg From g.brandl at gmx.net Mon Jan 2 18:38:41 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 02 Jan 2012 18:38:41 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F01D19F.3090009@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F01D19F.3090009@cheimes.de> Message-ID: On 01/02/2012 04:47 PM, Christian Heimes wrote: > Am 01.01.2012 19:45, schrieb Terry Reedy: >> On 1/1/2012 10:13 AM, Guido van Rossum wrote: >>> PS. Is the collision-generator used in the attack code open source? >> >> As I posted before, Alexander Klink and Julian Wälde gave their project >> email as hashDoS at alech.de. Since they indicated disappointment in not >> hearing from Python, I presume they would welcome engagement. > > Somebody should contact Alexander and Julian to let them know that we > are working on the matter. It should be somebody "official" for the > initial contact, too. I've included Guido (BDFL), Barry (their initial > security contact) and MvL (most prominent German core dev) in CC, as > they are the logical choice for me.
> > I'm willing to have a phone call with them once the contact has been > established. IMHO it's slightly easier to talk in one's native tongue -- > Alexander and Julian are German, too. I wouldn't expect too much -- they seem rather keen on cheap laughs: http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large Georg From guido at python.org Mon Jan 2 19:29:29 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jan 2012 10:29:29 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F01D19F.3090009@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F01D19F.3090009@cheimes.de> Message-ID: On Mon, Jan 2, 2012 at 7:47 AM, Christian Heimes wrote: > Am 01.01.2012 19:45, schrieb Terry Reedy: > > On 1/1/2012 10:13 AM, Guido van Rossum wrote: > >> PS. Is the collision-generator used in the attack code open source? > > > > As I posted before, Alexander Klink and Julian Wälde gave their project > > email as hashDoS at alech.de. Since they indicated disappointment in not > > hearing from Python, I presume they would welcome engagement. > > Somebody should contact Alexander and Julian to let them know that we > are working on the matter. It should be somebody "official" for the > initial contact, too. I've included Guido (BDFL), Barry (their initial > security contact) and MvL (most prominent German core dev) in CC, as > they are the logical choice for me. > > I'm willing to have a phone call with them once the contact has been > established. IMHO it's slightly easier to talk in one's native tongue -- > Alexander and Julian are German, too. > I'm not sure I see the point -- just give them a link to the python-dev archives. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From francismb at email.de Mon Jan 2 19:26:13 2012 From: francismb at email.de (francis) Date: Mon, 02 Jan 2012 19:26:13 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> Message-ID: <4F01F6C5.2020205@email.de> On 01/02/2012 06:35 PM, Georg Brandl wrote: > On 01/02/2012 03:41 PM, Antoine Pitrou wrote: >> On Mon, 2 Jan 2012 14:44:49 +1000 >> Nick Coghlan wrote: >>> He keeps leaving them out, I occasionally tell him they should always >>> be included (most recently this came up when we gave conflicting >>> advice to a patch contributor). >> Oh, by the way, this is also why I avoid arguing too much about style >> in code reviews. There are two bad things which can happen: >> >> - your advice conflicts with advice given by another reviewer (perhaps >> on another issue) >> - the contributor feels drowned under tiresome requests for style >> fixes ("please indent continuation lines this way") >> >> Both are potentially demotivating. A contributor can have his/her own >> style if it doesn't adversely affect code quality. > Exactly. Especially for reviews of patches from non-core people, we > should exercise a lot of restraint: as the committers, I think we can be > expected to bite the sour bullet and apply our uniform style (such as > it is). > > It is tiresome, if not downright disappointing, to get reviews that > are basically "nothing wrong, but please submit again with one more > empty line between the classes", and definitely not the way to > attract more contributors. > Hi to all members of this list, I'm not a Python-Dev (only some very small patches over the core-mentorship list. Just my 2 cents here). I would try to relax these conflicts with a script that does the reformatting itself. If that reformatting were part of the process itself, do you think it would still be an issue?
PS: I know that there's a pep8 checker so it could be transformed into a reformatter, but I don't know if there's a pep7 checker (reformatter) Best regards! francis From brian at python.org Mon Jan 2 19:41:07 2012 From: brian at python.org (Brian Curtin) Date: Mon, 2 Jan 2012 12:41:07 -0600 Subject: [Python-Dev] Code reviews In-Reply-To: <4F01F6C5.2020205@email.de> References: <20120102154144.2241e880@pitrou.net> <4F01F6C5.2020205@email.de> Message-ID: On Mon, Jan 2, 2012 at 12:26, francis wrote: > On 01/02/2012 06:35 PM, Georg Brandl wrote: >> >> On 01/02/2012 03:41 PM, Antoine Pitrou wrote: >>> >>> On Mon, 2 Jan 2012 14:44:49 +1000 >>> Nick Coghlan wrote: >>>> >>>> He keeps leaving them out, I occasionally tell him they should always >>>> be included (most recently this came up when we gave conflicting >>>> advice to a patch contributor). >>> >>> Oh, by the way, this is also why I avoid arguing too much about style >>> in code reviews. There are two bad things which can happen: >>> >>> - your advice conflicts with advice given by another reviewer (perhaps >>> on another issue) >>> - the contributor feels drowned under tiresome requests for style >>> fixes ("please indent continuation lines this way") >>> >>> Both are potentially demotivating. A contributor can have his/her own >>> style if it doesn't adversely affect code quality. >> >> Exactly. Especially for reviews of patches from non-core people, we >> should exercise a lot of restraint: as the committers, I think we can be >> expected to bite the sour bullet and apply our uniform style (such as >> it is). >> >> It is tiresome, if not downright disappointing, to get reviews that >> are basically "nothing wrong, but please submit again with one more >> empty line between the classes", and definitely not the way to >> attract more contributors. >> > Hi to all members of this list, > I'm not a Python-Dev (only some very small patches over the core-mentorship > list. > Just my 2 cents here).
> > I would try to relax these conflicts with a script that does the reformatting > itself. If > that reformatting were part of the process itself, do you think it would > still > be an issue? I don't think this is a problem to the point that it needs to be fixed via automation. The code I write is the code I build and test, so I'd rather not have some script that goes in and modifies it to some accepted format, then have to go through the build/test dance again. From snaury at gmail.com Mon Jan 2 19:53:09 2012 From: snaury at gmail.com (Alexey Borzenkov) Date: Mon, 2 Jan 2012 22:53:09 +0400 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F01CAD1.6030206@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> <4F01CAD1.6030206@cheimes.de> Message-ID: On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes wrote: > Am 02.01.2012 06:55, schrieb Paul McMillan: >> I think Ruby uses FNV-1 with a salt, making it less vulnerable to >> this. FNV is otherwise similar to our existing hash function. >> >> For the record, cryptographically strong hash functions are in the >> neighborhood of 400% slower than our existing hash function. > > I've pushed a new patch > http://hg.python.org/features/randomhash/rev/0a65d2462e0c It seems for the 32-bit version you are using pid for the two constants. Also, it's unclear why you even need to use a random constant for the final pass, you already use a random constant as an initial h1, and it should be enough, no need to use one for k1. Same for 128-bit: k1, k2, k3, k4 should be initialized to zero, these are key data, they don't need to be mixed with anything. Also, I'm not sure how portable the always_inline attribute is -- is it supported on all compilers and all platforms?
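[Editor's sketch: the seeding approach Alexey suggests in his follow-up -- hashing the pid and current time down to a single value rather than using the pid directly -- might look something like this. Illustrative only; the actual patch does its seeding in C at interpreter startup, and the function name here is hypothetical.]

```python
import hashlib
import os
import struct
import time

def derive_hash_seed() -> int:
    """Derive one hard-to-predict 32-bit seed from pid and wall clock.

    The pid alone is guessable (small, often sequential), so instead of
    using it directly, mix it with gettimeofday-style timing through a
    digest, spreading whatever entropy is available across all 32 bits.
    """
    raw = struct.pack("<qd", os.getpid(), time.time())
    digest = hashlib.sha256(raw).digest()
    return int.from_bytes(digest[:4], "little")
```

Where an OS CSPRNG is available, reading the seed from it directly is of course preferable; a pid-plus-time construction like this only matters as a fallback on platforms without one.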
From snaury at gmail.com Mon Jan 2 19:57:27 2012 From: snaury at gmail.com (Alexey Borzenkov) Date: Mon, 2 Jan 2012 22:57:27 +0400 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F0087D2.9090405@cheimes.de> <4F01CAD1.6030206@cheimes.de> Message-ID: On Mon, Jan 2, 2012 at 10:53 PM, Alexey Borzenkov wrote: > On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes wrote: >> Am 02.01.2012 06:55, schrieb Paul McMillan: >>> I think Ruby uses FNV-1 with a salt, making it less vulnerable to >>> this. FNV is otherwise similar to our existing hash function. >>> >>> For the record, cryptographically strong hash functions are in the >>> neighborhood of 400% slower than our existing hash function. >> >> I've pushed a new patch >> http://hg.python.org/features/randomhash/rev/0a65d2462e0c > > It seems for 32-bit version you are using pid for the two constants. > Also, it's unclear why you even need to use a random constant for the > final pass, you already use random constant as an initial h1, and it > should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3, > k4 should be initialized to zero, these are key data, they don't need > to be mixed with anything. Sorry, sent too soon. What I mean is that you're initializing a pretty big array of values when you only need a 32-bit value. Pid, in my opinion might be too predictable, it would be a lot better to simply hash pid and gettimeofday bytes to produce this single 32-bit value and use it for h1, h2, h3 and h4 in both 32-bit and 128-bit versions. > Also, I'm not sure how portable is the always_inline attribute, is it > supported on all compilers and all platforms? 
From tseaver at palladion.com Mon Jan 2 21:25:00 2012 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 02 Jan 2012 15:25:00 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F013ACF.7090708@scottdial.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/02/2012 01:02 AM, Nick Coghlan wrote: > On Mon, Jan 2, 2012 at 3:04 PM, Scott Dial > wrote: >> On 1/1/2012 11:44 PM, Nick Coghlan wrote: >>> I think it's a recipe for future maintenance hassles when someone >>> adds a second statement to one of the clauses but doesn't add the >>> braces. (The only time I consider it reasonable to leave out the >>> braces is for one liner if statements, where there's no else >>> clause at all) >> >> Could you explain how these two cases differ with regard to >> maintenance? > > Sure: always including K&R style braces for compound statements (even > when they aren't technically necessary) means that indentation == > control flow, just like Python. Indent your additions correctly, and > the reader and compiler will agree on what they mean: > > if (cond) { statement; } else { statement; addition; /* Reader and > compiler agree this is part of the else clause */ } > > if (cond) statement; else statement; addition; /* Uh-oh, should have > added braces */ > > I've been trying to convince Benjamin that there's a reason "always > include the braces" is accepted wisdom amongst many veteran C > programmers (with some allowing an exception for one-liners), but he > isn't believing me, and I'm not going to go through and edit every > single one of his commits to add them. FWIW, +1 to mandating braces-always (even for one liners): the future maintenance burden isn't worth the trouble of the exception. In the days when I did C / C++ / Java coding as my main gig, braceless code was routinely a bug magnet *for the team*. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8CEpwACgkQ+gerLs4ltQ5vTwCbBjlToJ2yZh4Ra+tNkqMVIaLj NfUAnjAfkDE0BPus1g33hd84tkGonUzd =K1p9 -----END PGP SIGNATURE----- From benjamin at python.org Mon Jan 2 21:31:57 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 2 Jan 2012 14:31:57 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: 2012/1/1 Nick Coghlan : > > if (cond) { > statement; > } else { > statement; > } I might add that assuming you have braces, PEP 7 would want you to format it as if (cond) { statement; } else { more_stuff; } -- Regards, Benjamin From julien at tayon.net Mon Jan 2 22:02:28 2012 From: julien at tayon.net (julien tayon) Date: Mon, 2 Jan 2012 22:02:28 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> <4F01F6C5.2020205@email.de> Message-ID: @francis Like indent? http://www.linuxmanpages.com/man1/indent.1.php @brian > I don't think this is a problem to the point that it needs to be fixed > via automation. The code I write is the code I build and test, so I'd > rather not have some script that goes in and modifies it to some > accepted format, then have to go through the build/test dance again. Well, it breaks committing since it adds non-significant symbols and therefore bloats the diffs. But as far as I am concerned, having used it a long time ago, it did not break anything; it was pretty reliable. 
my 2c * 0.1 -- jul From jimjjewett at gmail.com Mon Jan 2 22:07:59 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 2 Jan 2012 16:07:59 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 1:16 AM, PJ Eby wrote: > On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett wrote: >> >> Given the wording requiring a real dictionary, I would have assumed >> that it was OK (if perhaps not sensible) to do pointer arithmetic and >> access the keys/values/hashes directly. (Though if the breakage was >> between python versions, I would feel guilty about griping too >> loudly.) > If you're going to be a language lawyer about it, I would simply point out > that all the spec requires is that "type(env) is dict" -- it says nothing > about how Python defines "type" or "is" or "dict". So, you're on your own > with that one. ;-) But the public header file < http://hg.python.org/cpython/file/3ed5a6030c9b/Include/dictobject.h > defines the typedef structs for PyDictEntry and _dictobject. What is the purpose of requiring a "real dict" without also promising what the header file promises? -jJ From larry at hastings.org Mon Jan 2 22:50:32 2012 From: larry at hastings.org (Larry Hastings) Date: Mon, 02 Jan 2012 13:50:32 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> Message-ID: <4F0226A8.7020006@hastings.org> On 01/02/2012 12:47 AM, Raymond Hettinger wrote: > Really? Do we need to have a brace war? > People have different preferences. > The standard library includes some of both styles > depending on what the maintainer thought was cleanest to their eyes in a given context. I'm with Raymond. 
Code should be readable, and code reviews are the best way to achieve that--not endlessly specific formatting rules. Have there been bugs in CPython that the proposed new PEP 7 rule would have prevented? /arry From guido at python.org Mon Jan 2 23:08:17 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jan 2012 14:08:17 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F0226A8.7020006@hastings.org> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On Mon, Jan 2, 2012 at 1:50 PM, Larry Hastings wrote: > On 01/02/2012 12:47 AM, Raymond Hettinger wrote: > >> Really? Do we need to have a brace war? >> People have different preferences. >> The standard library includes some of both styles >> depending on what the maintainer thought was cleanest to their eyes in a >> given context. >> > > I'm with Raymond. Code should be readable, and code reviews are the best > way to achieve that--not endlessly specific formatting rules. > > Have there been bugs in CPython that the proposed new PEP 7 rule would > have prevented? The irony is that style guides exist to *avoid* debates like this. Yes, the choices are arbitrary. Yes, tastes differ. Yes, there are exceptions to the rules. But still, once a style rule has been set, the idea is to stop debating and just code. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Mon Jan 2 23:09:49 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 3 Jan 2012 09:09:49 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F0226A8.7020006@hastings.org> References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On 3 January 2012 08:50, Larry Hastings wrote: > On 01/02/2012 12:47 AM, Raymond Hettinger wrote: > >> Really? Do we need to have a brace war? 
>> People have different preferences. >> The standard library includes some of both styles >> depending on what the maintainer thought was cleanest to their eyes in a >> given context. >> > > I'm with Raymond. Code should be readable, and code reviews are the best > way to achieve that--not endlessly specific formatting rules. > > Have there been bugs in CPython that the proposed new PEP 7 rule would > have prevented? > I've found that until someone has experienced multiple nasty bugs caused by not always using braces, it's nearly impossible to convince them of why you should. Afterwards it's impossible to convince them (me) that you shouldn't always use braces. I'd also point out that if you're expecting braces, not having them can make the code less readable. A consistent format tends to make for more readable code. Cheers, Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From francisco.martin at web.de Mon Jan 2 23:13:27 2012 From: francisco.martin at web.de (Francisco Martin Brugue) Date: Mon, 02 Jan 2012 23:13:27 +0100 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> <4F01F6C5.2020205@email.de> Message-ID: <4F022C07.9060800@web.de> On 01/02/2012 10:02 PM, julien tayon wrote: > @francis > Like indent? > http://www.linuxmanpages.com/man1/indent.1.php Thank you, I wasn't aware of this one ! From raymond.hettinger at gmail.com Mon Jan 2 23:32:14 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 14:32:14 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Jan 2, 2012, at 12:31 PM, Benjamin Peterson wrote: > I might add that assuming you have braces, PEP 7 would want you to format it as > > if (cond) { > statement; > } > else { > more_stuff; > } > Running ``grep -B1 else Objects/*c`` shows that we've happily lived with a mixture of styles for a very long time. 
ISTM, our committers have had good instincts about when braces add clarity and when they add clutter. If Nick pushes through an always-use-braces mandate, A LOT of code will need to be changed. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Mon Jan 2 23:55:59 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 14:55:59 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On Jan 2, 2012, at 2:09 PM, Tim Delaney wrote: > I'd also point out that if you're expecting braces, not having them can make the code less readable. If a programmer's mind explodes when they look at the simple and beautiful examples in K&R's The C Programming Language, then they've got problems that can't be solved by braces ;-) Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Tue Jan 3 00:08:15 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 02 Jan 2012 18:08:15 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F0238DF.2010701@nedbatchelder.com> On 1/2/2012 5:32 PM, Raymond Hettinger wrote: > > Running ``grep -B1 else Objects/*c`` shows that we've happily lived > with a mixture of styles for a very long time. > ISTM, our committers have had good instincts about when braces add > clarity and when they add clutter. > If Nick pushes through an always-use-braces mandate, A LOT of code > will need to be changed. > I'm sure we can agree that 1) Nick isn't "pushing through" anything, this is a discussion about what to do, and 2) even if we agree to change PEP 7, no one would advocate having to go through all the C code to change it to a newly-agreed style. --Ned. 
> > Raymond > > > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ned%40nedbatchelder.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Tue Jan 3 01:16:15 2012 From: pje at telecommunity.com (PJ Eby) Date: Mon, 2 Jan 2012 19:16:15 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 4:07 PM, Jim Jewett wrote: > On Mon, Jan 2, 2012 at 1:16 AM, PJ Eby wrote: > > On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett > wrote: > >> > >> Given the wording requiring a real dictionary, I would have assumed > >> that it was OK (if perhaps not sensible) to do pointer arithmetic and > >> access the keys/values/hashes directly. (Though if the breakage was > >> between python versions, I would feel guilty about griping too > >> loudly.) > > > If you're going to be a language lawyer about it, I would simply point > out > > that all the spec requires is that "type(env) is dict" -- it says nothing > > about how Python defines "type" or "is" or "dict". So, you're on your > own > > with that one. ;-) > > But the public header file < > http://hg.python.org/cpython/file/3ed5a6030c9b/Include/dictobject.h > > defines the typedef structs for PyDictEntry and _dictobject. > > What is the purpose of the requiring a "real dict" without also > promising what the header file promises? > > Er, just because it's in the .h doesn't mean it's in the public API. But in any event, if you're actually serious about this, I'd just point out that: 1. The struct layout doesn't guarantee anything about insertion or lookup algorithms, 2. 
If the data structure were changed, the header file would obviously change as well, and 3. ISTM that Python does not even promise inter-version ABI compatibility for internals like the dict object layout. Are you seriously writing code that relies on the C structure layout of dicts? Because really, that was SO not the point of the dict type requirement. It was so that you could use Python's low-level *API* calls, not muck about with the data structure directly. I'm occasionally considered notorious for abusing Python internals, but even I have to draw the line somewhere. ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jan 3 01:22:28 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 3 Jan 2012 10:22:28 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 12:54 AM, Benjamin Peterson wrote: > I think it's fine Nick raised this. PEP 7 is not very explicit about > braces at all. I actually discovered in this thread that I've been misreading PEP 7 for going on 7 years now - I thought the brace usage example *did* use "} else {" (mainly because I write my if statements that way, and nobody had ever pointed out to me that the C style guide actually says otherwise). So I'm happy enough with leaving PEP 7 alone and letting the stylistic inconsistencies stand (even going forward). I agree in these days of auto-indenting editors and automated test suites, the maintenance benefits of always requiring the braces are significantly less than they used to be. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From ncoghlan at gmail.com Tue Jan 3 01:27:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 3 Jan 2012 10:27:11 +1000 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 8:32 AM, Raymond Hettinger wrote: > Running ``grep -B1 else Objects/*c`` shows that we've happily lived with a > mixture of styles for a very long time. > ISTM, our committers have had good instincts about when braces add clarity > and when they add clutter. > If Nick pushes through an always-use-braces mandate, A LOT of code will need > to be changed. Nah, I was asking a genuine question, not pushing anything in particular. I *thought* the code base was more consistent than it is, but it turns out that was an error of perception on my part, rather than an objective fact. With my perception of the status quo corrected, I can stop worrying about preserving a non-existent consistency. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From raymond.hettinger at gmail.com Tue Jan 3 01:47:48 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 2 Jan 2012 16:47:48 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <3FD8D3FB-ABD9-490C-ACC5-C8927EC39843@gmail.com> On Jan 2, 2012, at 4:27 PM, Nick Coghlan wrote: > With my perception of the status quo corrected, I can stop worrying > about preserving a non-existent consistency. +1 QOTD Raymond -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From timothy.c.delaney at gmail.com Tue Jan 3 01:53:06 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 3 Jan 2012 11:53:06 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: On 3 January 2012 09:55, Raymond Hettinger wrote: > > On Jan 2, 2012, at 2:09 PM, Tim Delaney wrote: > > I'd also point out that if you're expecting braces, not having them can > make the code less readable. > > > If a programmer's mind explodes when they look at the simple and beautiful > examples in K&R's The C Programming Language, then they've got problems > that can't be solved by braces ;-) > Now that's just hyperbole ;) If you've got a mix of braces and non-braces in a chunk of code, it's very easy for the mind to skip over the non-brace blocks as not being blocks. I know it's not something I'm likely to mess up when reading the code in-depth, but if I'm skimming over trying to understand the gist of the code or looking for what should be an obvious bug, a block that's not brace-delimited is more likely to be missed than one that is (when amongst other blocks that are). If we had the option of "just use indentation" in C I'd advocate that. Failing that, I find that consistent usage of braces is preferable. Cheers, Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Jan 3 01:54:42 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jan 2012 16:54:42 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 4:27 PM, Nick Coghlan wrote: > On Tue, Jan 3, 2012 at 8:32 AM, Raymond Hettinger > wrote: > > Running ``grep -B1 else Objects/*c`` shows that we've happily lived > with a > > mixture of styles for a very long time. 
> > ISTM, our committers have had good instincts about when braces add > clarity > > and when they add clutter. > > If Nick pushes through an always-use-braces mandate, A LOT of code will > need > > to be changed. > > Nah, I was asking a genuine question, not pushing anything in > particular. I *thought* the code base was more consistent than it is, > but it turns out that was an error of perception on my part, rather > than an objective fact. > > With my perception of the status quo corrected, I can stop worrying > about preserving a non-existent consistency. > Amen. And, as the (nominal) author of the PEP, the PEP didn't mean to state an opinion on whether braces are mandatory. It only meant to state how they should be placed when they are there. It's true that there are other readings possible, but that's what I meant. If someone wants to change the wording to clarify this, go right ahead. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Tue Jan 3 04:20:09 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Jan 2012 22:20:09 -0500 Subject: [Python-Dev] Code reviews In-Reply-To: References: <20120102154144.2241e880@pitrou.net> Message-ID: <20120102222009.0e87431d@limelight.wooz.org> On Jan 02, 2012, at 06:35 PM, Georg Brandl wrote: >Exactly. Especially for reviews of patches from non-core people, we >should exercise a lot of restraint: as the committers, I think we can be >expected to bite the sour bullet and apply our uniform style (such as >it is). > >It is tiresome, if not downright disappointing, to get reviews that >are basically "nothing wrong, but please submit again with one more >empty line between the classes", and definitely not the way to >attract more contributors. I think it's fine in a code review to point out where the submission misses the important consistency points, but not to hold up merging the changes because of that. 
You want to educate and motivate so that the next submission comes closer to our standards. The core dev who commits the change can clean up style issues. -Barry P.S. +1 for the change to PEP 7. From barry at python.org Tue Jan 3 04:25:55 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Jan 2012 22:25:55 -0500 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <9DBEC49E-4292-4C97-A587-8A31B8B8A33D@gmail.com> <4F0226A8.7020006@hastings.org> Message-ID: <20120102222555.15d0a42c@limelight.wooz.org> On Jan 02, 2012, at 02:08 PM, Guido van Rossum wrote: >The irony is that style guides exist to *avoid* debates like this. Yes, the >choices are arbitrary. Yes, tastes differ. Yes, there are exceptions to the >rules. But still, once a style rule has been set, the idea is to stop >debating and just code. +1 The other reason why style guides exist is to give contributors some sense of what they should shoot for. I've worked on existing code bases where there's so little consistency I can't tell what the author's preferences are even if I wanted to adhere to them. -Barry From barry at python.org Tue Jan 3 04:36:01 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Jan 2012 22:36:01 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> <4F01D19F.3090009@cheimes.de> Message-ID: <20120102223601.5cc6e1dc@limelight.wooz.org> On Jan 02, 2012, at 06:38 PM, Georg Brandl wrote: >I wouldn't expect too much -- they seem rather keen on cheap laughs: > >http://twitter.com/#!/bk3n/status/152068096448921600/photo/1/large Heh, so yeah, it won't be me contacting them. 
-Barry From martin at v.loewis.de Tue Jan 3 09:44:03 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 03 Jan 2012 09:44:03 +0100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: Message-ID: <4F02BFD3.7080204@v.loewis.de> > He keeps leaving them out, I occasionally tell him they should always > be included (most recently this came up when we gave conflicting > advice to a patch contributor). He says what he's doing is OK, because > he doesn't consider the example in PEP 7 as explicitly disallowing it, > I think it's a recipe for future maintenance hassles when someone adds > a second statement to one of the clauses but doesn't add the braces. > (The only time I consider it reasonable to leave out the braces is for > one liner if statements, where there's no else clause at all) While this appears to be settled, I'd like to add that I sided with Benjamin here all along. With Python, I accepted a style of "minimal punctuation". Examples of extra punctuation are: - parens around expression in Python's if (and while): if (x < 10): foo () - parens around return expression (C and Python) return(*p); - braces around single-statement blocks in C In all these cases, punctuation can be left out without changing the meaning of the program. I personally think that a policy requiring braces would be (mildly) harmful, as it decreases readability of the code. When I read code, I read every character: not just the identifiers, but also every punctuation character. If there is extra punctuation, I stop and wonder what the motivation for the punctuation is - is there any hidden meaning that required the author to put the punctuation? There is a single case where I can accept extra punctuation in C: to make the operator precedence explicit. Many people (including myself) don't know how a | b << *c * *d would group, so I readily accept extra parens as a clarification. Wrt. 
braces, I don't share the concern that there is a risk of somebody being confused when adding a second statement to a braceless block. An actual risk is stuff like if (cond) MACRO(argument); when MACRO expands to multiple statements. However, we should accept that this is a bug in MACRO (which should have used the do-while(0)-idiom), not in the application of the macro. Regards, Martin From lists at cheimes.de Tue Jan 3 14:18:34 2012 From: lists at cheimes.de (Christian Heimes) Date: Tue, 03 Jan 2012 14:18:34 +0100 Subject: [Python-Dev] RNG in the core Message-ID: <4F03002A.5010800@cheimes.de> Hello, all proposed fixes for a randomized hashing function rise and fall with a good random number generator to feed the random seed. The seed must be created very early in the startup phase of the interpreter, preferably before the basic types are initialized. CPython already has multiple sources for random data (win32_urandom in Modules/posixmodule.c, urandom in Lib/os.py, Mersenne twister in Modules/_randommodule.c). However we can't use them because they are wrapped inside Python modules which require infrastructure like initialized base types. I propose an addition to the current Python C API: int PyOS_URandom(char *buf, Py_ssize_t len) Read "len" chars from the OS's RNG into the pre-allocated buffer "buf". The RNG should be suitable for cryptography. In case of an error the function returns -1 and sets an exception, otherwise it returns 0. On Windows I can re-use most of the code of win32_urandom(). For POSIX I have to implement os.urandom() in C in order to read data from /dev/urandom. That's simple and straight forward. Since some platforms may not have /dev/urandom, we need a PRNG in the core, too. I therefore propose to move the Mersenne twister from randommodule.c into the core, too. 
typedef struct { unsigned long state[N]; int index; } _Py_MT_RandomState; unsigned long _Py_MT_GenRand_Int32(_Py_MT_RandomState *state); // genrand_int32() double _Py_MT_GenRand_Res53(_Py_MT_RandomState *state); // random_random() void _Py_MT_GenRand_Init(_Py_MT_RandomState *state, unsigned long seed); // init_genrand() void _Py_MT_GenRand_InitArray(_Py_MT_RandomState *state, unsigned long init_key[], unsigned long key_length); // init_by_array I suggest Python/random.c as source file and Python/pyrandom.h as header file. Comments? Christian From anacrolix at gmail.com Tue Jan 3 15:46:51 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 4 Jan 2012 01:46:51 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F02BFD3.7080204@v.loewis.de> References: <4F02BFD3.7080204@v.loewis.de> Message-ID: FWIW I'm against forcing braces to be used. Readability is the highest concern, and this should be at the discretion of the contributor. A code formatting tool or compiler extension is the only proper way to handle this, and neither is in use or available. On Tue, Jan 3, 2012 at 7:44 PM, "Martin v. Löwis" wrote: >> He keeps leaving them out, I occasionally tell him they should always >> be included (most recently this came up when we gave conflicting >> advice to a patch contributor). He says what he's doing is OK, because >> he doesn't consider the example in PEP 7 as explicitly disallowing it, >> I think it's a recipe for future maintenance hassles when someone adds >> a second statement to one of the clauses but doesn't add the braces. >> (The only time I consider it reasonable to leave out the braces is for >> one liner if statements, where there's no else clause at all) > > While this appears to be settled, I'd like to add that I sided with > Benjamin here all along. > > With Python, I accepted a style of "minimal punctuation". Examples > of extra punctuation are: > - parens around expression in Python's if (and while): > > if (x < 10): > 
foo () > > - parens around return expression (C and Python) > > return(*p); > > - braces around single-statement blocks in C > > In all these cases, punctuation can be left out without changing > the meaning of the program. > > I personally think that a policy requiring braces would be (mildly) > harmful, as it decreases readability of the code. When I read code, > I read every character: not just the identifiers, but also every > punctuation character. If there is extra punctuation, I stop and wonder > what the motivation for the punctuation is - is there any hidden > meaning that required the author to put the punctuation? > > There is a single case where I can accept extra punctuation in C: > to make the operator precedence explicit. Many people (including > myself) don't know how > > a | b << *c * *d > > would group, so I readily accept extra parens as a clarification. > > Wrt. braces, I don't share the concern that there is a risk of > somebody being confused when adding a second statement to a braceless > block. An actual risk is stuff like > > if (cond) > MACRO(argument); > > when MACRO expands to multiple statements. However, we should > accept that this is a bug in MACRO (which should have used the > do-while(0)-idiom), not in the application of the macro. > > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com -- ?_? From stephen at xemacs.org Tue Jan 3 16:46:22 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Jan 2012 00:46:22 +0900 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F02BFD3.7080204@v.loewis.de> Message-ID: <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> Matt Joiner writes: > Readability is the highest concern, and this should be at the > discretion of the contributor. 
That's quite backwards. "Readability" is community property, and has as much, if not more, to do with common convention as with some absolute metric. The "contributor's discretion" must yield. That doesn't mean the contributor has to do all the work; as several people have pointed out, it makes a lot of sense for experienced reviewers to make such trivial changes themselves before committing, especially for new contributors. From matthieu.brucher at gmail.com Tue Jan 3 18:23:08 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 3 Jan 2012 18:23:08 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F03002A.5010800@cheimes.de> References: <4F03002A.5010800@cheimes.de> Message-ID: Hi, I'm not a core Python developer, but it may be interesting to use a real Crush-resistant RNG, as one from Random123 (a parallel random generator that is Crush resistant, contrary to the Mersenne Twister, and without a state). Cheers, Matthieu Brucher 2012/1/3 Christian Heimes > Hello, > > all proposed fixes for a randomized hashing function rise and fall with > a good random number generator to feed the random seed. The seed must be > created very early in the startup phase of the interpreter, preferably > before the basic types are initialized. CPython already has multiple > sources for random data (win32_urandom in Modules/posixmodule.c, urandom > in Lib/os.py, Mersenne twister in Modules/_randommodule.c). However we > can't use them because they are wrapped inside Python modules which > require infrastructure like initialized base types. > > I propose an addition to the current Python C API: > > int PyOS_URandom(char *buf, Py_ssize_t len) > > Read "len" chars from the OS's RNG into the pre-allocated buffer "buf". > The RNG should be suitable for cryptography. In case of an error the > function returns -1 and sets an exception, otherwise it returns 0. > On Windows I can re-use most of the code of win32_urandom(). 
For POSIX I > have to implement os.urandom() in C in order to read data from > /dev/urandom. That's simple and straightforward. > > > Since some platforms may not have /dev/urandom, we need a PRNG in the > core, too. I therefore propose to move the Mersenne twister from > randommodule.c into the core, too. > > typedef struct { > unsigned long state[N]; > int index; > } _Py_MT_RandomState; > > unsigned long _Py_MT_GenRand_Int32(_Py_MT_RandomState *state); // > genrand_int32() > double _Py_MT_GenRand_Res53(_Py_MT_RandomState *state); // random_random() > void _Py_MT_GenRand_Init(_Py_MT_RandomState *state, unsigned long seed); > // init_genrand() > void _Py_MT_GenRand_InitArray(_Py_MT_RandomState *state, unsigned long > init_key[], unsigned long key_length); // init_by_array > > > I suggest Python/random.c as source file and Python/pyrandom.h as header > file. Comments? > > Christian > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/matthieu.brucher%40gmail.com > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Jan 3 18:46:05 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Jan 2012 18:46:05 +0100 Subject: [Python-Dev] RNG in the core References: <4F03002A.5010800@cheimes.de> Message-ID: <20120103184605.1417f035@pitrou.net> On Tue, 03 Jan 2012 14:18:34 +0100 Christian Heimes wrote: > > I suggest Python/random.c as source file and Python/pyrandom.h as header > file. Comments? Looks good in principle. The API names for MT are a bit ugly. > The RNG should be suitable for cryptography. Sounds like too strong a requirement. For cryptography, we have the ssl module (and third-party libraries). 
(also, "suitable for cryptography" is somewhat vague; for example, the Linux man pages insist that /dev/urandom is ok for session keys but /dev/random is needed for long-lived private keys) Regards Antoine. From lists at cheimes.de Tue Jan 3 18:50:44 2012 From: lists at cheimes.de (Christian Heimes) Date: Tue, 03 Jan 2012 18:50:44 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: References: <4F03002A.5010800@cheimes.de> Message-ID: <4F033FF4.3010204@cheimes.de> On 03.01.2012 18:23, Matthieu Brucher wrote: > Hi, > > I'm not a core Python developer, but it may be interesting to use a real > Crush-resistant RNG, such as one from Random123 (a parallel random generator > that is Crush-resistant, unlike the Mersenne Twister, and without a > state). Hello Matthieu, thanks for your input! The core RNG is going to be part of the randomized hashing function patch. The patch will be applied to all Python versions from 2.6 to 3.3. Some people may want to apply it to 2.4 and 2.5, too. As the patch is going to affect six to eight Python versions, it should introduce as little new code as possible. Any new code might be a source of new bugs. The Mersenne Twister code is mature and works sufficiently as a backup. Any new RNG should go through a PEP process, too. You are welcome to write a PEP and implement an additional RNG for the random module. New developers and new ideas are well received. Regards, Christian From ethan at stoneleaf.us Tue Jan 3 19:42:43 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 03 Jan 2012 10:42:43 -0800 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F034C23.7070000@stoneleaf.us> Stephen J. Turnbull wrote: > Matt Joiner writes: > > > Readability is the highest concern, and this should be at the > > discretion of the contributor. > > That's quite backwards. 
"Readability" is community property, and has > as much, if not more, to do with common convention as with some > absolute metric. The "contributor's discretion" must yield. Readability also includes more than just the source code; as has already been stated: if(cond) { stmt1; + stmt2; } vs. -if(cond) +if(cond) { stmt1; + stmt2; +} I find the diff version that already had braces in place much more readable. ~Ethan~ From barry at python.org Tue Jan 3 20:38:38 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 3 Jan 2012 14:38:38 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <0F70678AC2164512A7E6FCADB2F37EA8@gmail.com> Message-ID: <20120103143838.139a72f5@resist.wooz.org> On Dec 31, 2011, at 04:56 PM, Guido van Rossum wrote: >Is there a tracker issue yet? The discussion should probably move there. I think the answer to this was "no"... until now. http://bugs.python.org/issue13703 Proposed patches should be linked to this issue now. Please nosy yourself if you want to follow the progress. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From jimjjewett at gmail.com Tue Jan 3 20:55:32 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 3 Jan 2012 14:55:32 -0500 Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re: http://mail.python.org/pipermail/python-dev/2011-December/115172.html) In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 7:16 PM, PJ Eby wrote: > On Mon, Jan 2, 2012 at 4:07 PM, Jim Jewett wrote: >> But the public header file < >> http://hg.python.org/cpython/file/3ed5a6030c9b/Include/dictobject.h > >> defines the typedef structs for PyDictEntry and _dictobject. >> What is the purpose of requiring a "real dict" without also >> promising what the header file promises? 
> Er, just because it's in the .h doesn't mean it's in the public API. But in > any event, if you're actually serious about this, I'd just point out that: > 1. The struct layout doesn't guarantee anything about insertion or lookup > algorithms, My concern was about your suggestion of changing the data structure to accommodate some other algorithm -- particularly if it meant that the data would no longer be stored entirely in an array of PyDictEntry. That shouldn't be done lightly even between major versions, and certainly should not be done in a bugfix (or security-only) release. > Are you seriously writing code that relies on the C structure layout of > dicts? The first page of search results for PyDictEntry suggested that others are. (The code I found did seem to be for getting data from a Python dict into some other language, rather than for wsgi.) > Because really, that was SO not the point of the dict type > requirement. It was so that you could use Python's low-level *API* calls, > not muck about with the data structure directly. Would it be too late to clarify that in the PEP itself? -jJ From steve at pearwood.info Tue Jan 3 21:29:10 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 04 Jan 2012 07:29:10 +1100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F03002A.5010800@cheimes.de> References: <4F03002A.5010800@cheimes.de> Message-ID: <4F036516.5080701@pearwood.info> Christian Heimes wrote: [...] > I propose an addition to the current Python C API: > > int PyOS_URandom(char *buf, Py_ssize_t len) > > Read "len" chars from the OS's RNG into the pre-allocated buffer "buf". > The RNG should be suitable for cryptography. > Since some platforms may not have /dev/urandom, we need a PRNG in the > core, too. I therefore propose to move the Mersenne twister from > randommodule.c into the core, too. Mersenne twister is not suitable for cryptography. 
http://en.wikipedia.org/wiki/Mersenne_twister -- Steven From matthieu.brucher at gmail.com Tue Jan 3 22:00:43 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 3 Jan 2012 22:00:43 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F033FF4.3010204@cheimes.de> References: <4F03002A.5010800@cheimes.de> <4F033FF4.3010204@cheimes.de> Message-ID: > The core RNG is going to be part of the randomized hashing function > patch. The patch will be applied to all Python versions from 2.6 to 3.3. > Some people may want to apply it to 2.4 and 2.5, too. As the patch is > going to affect six to eight Python versions, it should introduce as > little new code as possible. Any new code might be a source of new bugs. The > Mersenne Twister code is mature and works sufficiently as a backup. > > Any new RNG should go through a PEP process, too. You are welcome to > write a PEP and implement an additional RNG for the random module. New > developers and new ideas are well received. > Good point. In fact, these RNGs are 100% based on the hash functions provided for instance by OpenSSL. But I think this library is not a dependency, so my proposal still has the same impact. The Random123 library is a reimplementation of some cryptographic functions with two arguments, the key and the counter, and that's it. So if there is such a cryptographic function somewhere in the Python C code, it can be reused to create Crush-resistant random numbers with no new lines of code. Cheers, Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From victor.stinner at gmail.com Tue Jan 3 22:17:06 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 3 Jan 2012 22:17:06 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F03002A.5010800@cheimes.de> References: <4F03002A.5010800@cheimes.de> Message-ID: A randomized hash doesn't need a cryptographic RNG (which is slow and needs a lot of new code), and the new hash function should maybe not be cryptographic. We need to make the DoS more expensive for the attacker, but we don't need to add "too much security" for that. Mersenne Twister is useless here: it is only needed when you need a fast RNG to generate megabytes of random data, whereas we will not need more than 4 KB. The OS RNG is just fine (fast enough and not blocking). So we can use the Windows CryptGenRandom API (which is already implemented in Python, win32_urandom) and /dev/urandom on UNIX/BSD. /dev/urandom never blocks. We also need a fallback if /dev/urandom is not available. Because this case should not occur on a modern OS, the fallback can be a weak function like something combining getpid(), gettimeofday(), address of the stack, etc. To generate 4 KB from a few words, we can use a very simple LCG (x(n+1) = (x(n) * a + c) mod k). From solipsis at pitrou.net Tue Jan 3 22:20:53 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Jan 2012 22:20:53 +0100 Subject: [Python-Dev] RNG in the core References: <4F03002A.5010800@cheimes.de> Message-ID: <20120103222053.2325d352@pitrou.net> On Tue, 3 Jan 2012 22:17:06 +0100 Victor Stinner wrote: > A randomized hash doesn't need a cryptographic RNG (which is slow and > needs a lot of new code), and the new hash function should maybe not be > cryptographic. We need to make the DoS more expensive for the > attacker, but we don't need to add "too much security" for that. Agreed. 
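The fallback Victor describes, combining getpid() and the current time into a weak seed, then expanding it to 4 KB with the simple LCG x(n+1) = (x(n) * a + c) mod k, can be sketched in Python. This is an illustration only: the helper names are hypothetical, the LCG constants are one common textbook choice, and such a seed is guessable by design (that is the accepted trade-off on platforms with no OS RNG).

```python
import os
import time

def weak_seed():
    # Weak fallback seed combining getpid() and the current time,
    # as suggested above.  No security claim is made here: this is
    # only meant for platforms with no native RNG.
    return (os.getpid() ^ int(time.time() * 1_000_000)) & 0xFFFFFFFF

def lcg_bytes(seed, n):
    # Expand a small seed into n bytes with the simple LCG
    # x(n+1) = (x(n) * a + c) mod 2**32.  The constants a and c are
    # a standard 32-bit LCG choice, picked here for illustration.
    a, c, x = 1664525, 1013904223, seed
    out = bytearray()
    while len(out) < n:
        x = (x * a + c) & 0xFFFFFFFF
        out += x.to_bytes(4, "little")
    return bytes(out[:n])

data = lcg_bytes(weak_seed(), 4096)  # the ~4 KB mentioned above
```

The same seed always yields the same stream, which is exactly why a per-process seed source (pid, time, or the OS RNG where available) is the part that matters.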
> Mersenne Twister is useless here: it is only needed when you need a > fast RNG to generate megabytes of random data, whereas we > will not need more than 4 KB. The OS RNG is just fine (fast enough and > not blocking). Have you read the following sentence: “Since some platforms may not have /dev/urandom, we need a PRNG in the core, too. I therefore propose to move the Mersenne twister from randommodule.c into the core, too.” Regards Antoine. From janssen at parc.com Tue Jan 3 23:02:19 2012 From: janssen at parc.com (Bill Janssen) Date: Tue, 3 Jan 2012 14:02:19 PST Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFC68E0.4000606@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> Message-ID: <63988.1325628139@parc.com> Christian Heimes wrote: > On 29.12.2011 12:13, Mark Shannon wrote: > > The attack relies on being able to predict the hash value for a given > > string. Randomising the string hash function is quite straightforward. > > There is no need to change the dictionary code. > > > > A possible (*untested*) patch is attached. I'll leave it for those more > > familiar with unicodeobject.c to do properly. > > I'm worried that hash randomization of str is going to break 3rd party > software that relies on a stable hash across multiple Python instances. > Persistence layers like ZODB and cross interpreter communication > channels used by multiprocessing may (!) rely on the fact that the hash > of a string is fixed. Software that depends on an undefined hash function for synchronization and persistence deserves to break, IMO. There are plenty of well-defined hash functions available for this purpose. 
Bill From martin at v.loewis.de Tue Jan 3 23:21:30 2012 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 03 Jan 2012 23:21:30 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120103222053.2325d352@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> Message-ID: <4F037F6A.1070806@v.loewis.de> > Have you read the following sentence: > > “Since some platforms may not have /dev/urandom, we need a PRNG in the > core, too. I therefore propose to move the Mersenne twister from > randommodule.c into the core, too.” I disagree. We don't need a PRNG on platforms without /dev/urandom or any other native RNG. Initializing the string-hash seed to 0 is perfectly fine on those platforms; we can do slightly better by using, say, the current time (in ms or µs if available) and the current pid (if available). People concerned with the security on those systems either need to switch to a different system, or provide a patch to access the platform's native random number generator. Regards, Martin From ben+python at benfinney.id.au Tue Jan 3 23:30:24 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 04 Jan 2012 09:30:24 +1100 Subject: [Python-Dev] PEP 7 clarification request: braces References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87sjjwzaf3.fsf@benfinney.id.au> "Stephen J. Turnbull" writes: > Matt Joiner writes: > > > Readability is the highest concern, and this should be at the > > discretion of the contributor. > > That's quite backwards. "Readability" is community property, and has > as much, if not more, to do with common convention as with some > absolute metric. The "contributor's discretion" must yield. +1 -- \ “Those who write software only for pay should go hurt some | `\ other field.” 
--Erik Naggum, in _gnu.misc.discuss_ | _o__) | Ben Finney From martin at v.loewis.de Wed Jan 4 00:11:50 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 04 Jan 2012 00:11:50 +0100 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <4F034C23.7070000@stoneleaf.us> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> Message-ID: <4F038B36.1000401@v.loewis.de> > Readability also includes more than just the source code; as has already > been stated: > > if(cond) { > stmt1; > + stmt2; > } > > vs. > > -if(cond) > +if(cond) { > stmt1; > + stmt2; > +} > > I find the diff version that already had braces in place much more > readable. Is it really *much* more readable? I have no difficulties reading either (although I would have preferred a space after the if; this worries me more than the double if line). Regards, Martin From benjamin at python.org Wed Jan 4 00:17:56 2012 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 3 Jan 2012 23:17:56 +0000 (UTC) Subject: [Python-Dev] PEP 7 clarification request: braces References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> Message-ID: Ethan Furman stoneleaf.us> writes: > > Readability also includes more than just the source code; as has already > been stated: > > if(cond) { > stmt1; > + stmt2; > } > > vs. > > -if(cond) > +if(cond) { > stmt1; > + stmt2; > +} > > I find the diff version that already had braces in place much more readable. There are much larger problems facing diff readability. On your basis, we might as well decree that code should never be rearranged or reindented. 
Regards, Benjamin From mwm at mired.org Wed Jan 4 01:40:36 2012 From: mwm at mired.org (Mike Meyer) Date: Tue, 3 Jan 2012 16:40:36 -0800 Subject: [Python-Dev] Proposed PEP on concurrent programming support Message-ID: <20120103164036.681beeae@mikmeyer-vm-fedora> PEP: XXX Title: Interpreter support for concurrent programming Version: $Revision$ Last-Modified: $Date$ Author: Mike Meyer Status: Draft Type: Informational Content-Type: text/x-rst Created: 11-Nov-2011 Post-History: Abstract ======== The purpose of this PEP is to explore strategies for making concurrent programming in Python easier by allowing the interpreter to detect and notify the user about possible bugs in concurrent access. The reason for doing so is that "Errors should never pass silently". Such bugs are caused by allowing objects to be accessed simultaneously from another thread of execution while they are being modified. Currently, Python systems provide no support for detecting such bugs, falling back on the underlying platform facilities and some tools built on top of those. While these tools allow prevention of such modification if the programmer is aware of the need for them, there are no facilities to detect that such a need might exist and warn the programmer of it. The goal is not to prevent such bugs, as that depends on the programmer getting the logic of the interactions correct, which the interpreter can't judge. Nor is the goal to warn the programmer about any such modifications - the goal is to catch standard idioms making unsafe modifications. If the programmer starts tinkering with Python's internals, it's assumed they are aware of these issues. Rationale ========= Concurrency bugs are among the hardest bugs to locate and fix. They result in corrupt data being generated or used in a computation. Like most such bugs, the corruption may not become evident until much later and far away in the program. Minor changes in the code can cause the bugs to fail to manifest. 
They may even fail to manifest from run to run, depending on external factors beyond the control of the programmer. Therefore any help in locating and dealing with such bugs is valuable. If the interpreter is to provide such help, it must be aware of when things are safe to modify and when they are not. This means it will almost certainly cause incompatible changes in Python, and may impose costs so high for non-concurrent operations as to make it untenable. As such, the final options discussed are destined for Python version 4 or later, and may never be implemented in any mainstream implementation of Python. Terminology =========== The word "thread" is used throughout to mean "concurrent thread of execution". Nominally, this means a platform thread. However, it is intended to include any threading mechanism that allows the interpreter to change threads between or in the middle of a statement without the programmer specifically allowing this to happen. Similarly, the word "interpreter" means any system that processes and executes Python language files. While this normally means CPython, the changes discussed here should be amenable to other implementations. Concept ======= Locking object -------------- The idea is that the interpreter should indicate an error anytime an unlocked object is mutated. For mutable types, this would mean changing the value of the object. For Python class instances, this would mean changing the binding of an attribute. Mutating an object bound to such an attribute isn't a change in the object the attribute belongs to, and so wouldn't indicate an error unless the object bound to the attribute was unlocked. Locking by name --------------- It's also been suggested that locking "names" would be useful. That is, to prevent a specific attribute of an object from being rebound, or a key/index entry in a mapping object. 
This provides finer-grained locking than just locking the object, as you could lock a specific attribute or set of attributes of an object, without locking all of them. Unfortunately, this isn't sufficient: a set may need to be locked to prevent deletions for some period, or a dictionary to prevent adding a key, or a list to prevent changing a slice, etc. So some other locking mechanism is required. If that needs to specify objects, some way of distinguishing between locking a name and locking the object bound to the name needs to be invented, or there needs to be two different locking mechanisms. It's not clear that the finer-grained locking is worth adding yet another language mechanism. Alternatives ============ Explicit locking ---------------- These alternatives require that the programmer explicitly name anything that is going to be changed to lock it before changing it. This lets the interpreter get involved, but makes a number of errors possible based on the order that locks are applied. Platform locks '''''''''''''' The current tool set uses platform locks via a C extension. The problem with these is that the interpreter has no knowledge of them, and so can't do anything about detecting the mutation of unlocked objects. A ``locking`` keyword ''''''''''''''''''''' Adding a statement to tell the interpreter to lock objects for the attached suite would let the interpreter know which objects are locked. To help prevent deadlocks, such a keyword needs to imply an order for locking objects, such that any two objects locked by a locking statement will be locked in the same order during a single execution of the program. This can be achieved by sorting objects by the ``id`` of the object, since the requirements for ``id`` are sufficient for this. While the locking order requirement is sufficient to prevent deadlocks from non-nested locking statements, it's not sufficient if locking statements are allowed to nest. 
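The ``id``-ordering rule described above can be sketched at the Python level: if every locking site sorts the objects by ``id()`` before acquiring their locks, any two non-nested sites agree on acquisition order and cannot deadlock against each other. The per-object ``_lock`` attribute here is a hypothetical stand-in for whatever the interpreter would actually use.

```python
import threading

class Shared:
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()  # hypothetical per-object lock

def locks_in_id_order(*objects):
    # Always return the locks sorted by id(), so every call site
    # acquires them in the same global order regardless of how the
    # arguments were written.
    return [obj._lock for obj in sorted(objects, key=id)]

a, b = Shared(1), Shared(2)

# Both locks_in_id_order(a, b) and locks_in_id_order(b, a) yield the
# same sequence, which is the whole point of the ordering rule.
for lock in locks_in_id_order(a, b):
    lock.acquire()
a.value, b.value = b.value, a.value  # the protected mutation
for lock in locks_in_id_order(a, b):
    lock.release()
```

As the surrounding text notes, this guarantee covers non-nested locking statements only; a nested statement that acquires an additional lock later can still invert the order.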
So either nested locking statements need to be disallowed, or the outer statement must lock everything that's going to need to be locked. Either requirement is sufficiently onerous that alternatives need to be considered. Implicit locking ---------------- In this alternative, the interpreter uses one or more heuristics to decide when things should need locking. Software Transactional Memory (STM) ''''''''''''''''''''''''''''''''''' STM is a relatively new technology being experimented with in newer languages, and in a number of 3rd party libraries (both Peak [#Peak]_ and Kamaelia [#Kamaelia]_ provide STM facilities). A suite is marked as a `transaction`, and then when an unlocked object is modified, instead of indicating an error, a locked copy of it is created to be used through the rest of the transaction. If any of the originals are modified during the execution of the suite, the suite is rerun from the beginning. If it completes, the locked copies are copied back to the originals in an atomic manner. This causes the changes seen by any threads not running the transaction to be atomic. If two threads are updating the same object in transactions, the one that finishes second will be restarted with values set by the one that finished first. The advantage of an STM is that the programmer doesn't have to worry about what is locked, and there's no overhead for using locked objects (after locking them, of course). The disadvantage is that any code in a transaction must be safe to run multiple times. This forbids any kind of I/O. Compiler support '''''''''''''''' Since the point is to get the interpreter involved, we might as well let it be involved in figuring out which things are safe and don't need to be locked. This could potentially eliminate a lot of locking. Each object - whether a Python class instance or builtin type - is created with no way to access it until it is bound. So it is inherently safe to modify. Being bound to a local (or nonlocal?) 
variable doesn't change this. Being bound to a global, class or instance variable or stored in a container does change this, as the object may now be accessed from other threads via the module or container. Since this analysis is being done at compile time, being passed to another function - including methods of the object - makes it unsafe. Likewise, yielding an object makes it unsafe for future use. Returning it doesn't change anything, since our execution is over and we lose access to the object. Unfortunately, objects returned from functions must be treated as unsafe. Interpreter support ''''''''''''''''''' If we instead track whether or not objects require locking in the interpreter, then we can improve the analysis. The only thing that definitely makes an object unsafe is binding to a global variable or a variable known to be unsafe. Passing objects to C routines exposes them to concurrent modification, since there's no way to know what will happen inside the C routine. Adding some way of marking C routines - or possibly the objects passed to them - as not exposing things to concurrent modification would help with this, allowing C modules to be called without requiring locking everything passed to them. Binding to class and instance variables, or adding them to a container, is an interesting issue. If the object in question is safe, then anything added to it is also safe. However, this would mean that when an object is flagged as unsafe, all objects accessible through it would also have to be flagged as unsafe. This type of tracking also means that objects effectively have three states: locked, unlocked, and safe. Both locked and safe objects can safely be modified without a problem. Locking and unlocking safe objects is a nop. Interpreter threading --------------------- One alternative is replacing the current threading tools - which are wrappers around the OS-provided threading - with threading support in the interpreter. 
This would allow the interpreter to control whether or not objects are shared between threads, which isn't possible today. The full implications of this approach have yet to be worked out. Mixed solutions --------------- Most likely, any real implementation would use a number of the techniques above, since all of them have serious shortcomings. For instance, combining STM with explicit locking would allow explicit locking when IO was required, but complex multi-object changes could be handled by STM, thus avoiding the nested locking issues. Likewise, interpreter or compiler support could be mixed with most other solutions to relax the requirement of locking for part of the objects used in a program. The implications of mixing these things together also need to be explored more thoroughly. Change proposal =============== This is a 'strawman' proposal to provide a starting point for discussion. The proposal is to add STM support to the Python interpreter. A new suite type - the ``transaction`` - will be added to the language. The suite will have the semantics discussed above: modifying an object in the suite will trigger creation of a thread-local shallow copy to be used in the transaction. Further modifications of the original will cause all existing copies to be discarded and the transaction to be restarted. At the end of the transaction, the originals of all the copies are locked and then updated to the state of the copy. Further work ============ Requiring further investigation: - The interpreter providing its own threading. - How various solutions interact when mixed. There are also a couple of tools that might be useful to build, or at least investigate building: - A static concurrency safety analyzer, that handles the AST of a function to determine which variables are safe. - A dynamic concurrency safety analyzer, similar to coverage [#coverage]_. 
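The transaction semantics in the strawman proposal can be sketched at the Python level, with version counters standing in for the interpreter's conflict detection. Everything here (``Cell``, ``atomically``) is hypothetical illustration, not proposed API: work on copies, and commit only if no original changed in the meantime, otherwise rerun the suite.

```python
# Sketch of the proposed transaction semantics: run the suite against
# thread-local shallow copies, and commit atomically only if none of
# the originals changed while the suite ran; otherwise retry.
class Cell:
    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every committed change

def atomically(suite, *cells):
    while True:
        start = [c.version for c in cells]
        copies = [c.value for c in cells]        # shallow copies
        results = suite(*copies)                 # run the suite
        if [c.version for c in cells] == start:  # no conflict?
            for c, v in zip(cells, results):     # commit
                c.value = v
                c.version += 1
            return
        # An original changed mid-suite: discard the copies and rerun.

x, y = Cell(10), Cell(20)
atomically(lambda a, b: (b, a), x, y)  # swap inside a "transaction"
```

The retry loop is also why the text forbids I/O inside a transaction: the suite must be safe to run more than once.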
Implementation Notes ==================== Not significantly impacting the performance of single-threaded code must be of paramount importance to any implementation. One implementation technique arose that could help with this. Instead of keeping track of the object's state and having methods check that state and modify their behavior based on it, change the methods as the object changes state. So in safe or locked mode, the object's methods could freely modify the object without having to check its mode. In unlocked mode, an attempt to do so would raise an error or warning. Unfortunately, this doesn't work if some global or thread state must be checked instead of just object-local state. References ========== .. [#Peak] "Peak, the Python Enterprise Application Kit", http://peak.telecommunity.com/ .. [#Kamaelia] "Kamaelia - Concurrency made useful, fun", http://www.kamaelia.org/ .. [#coverage] "Code coverage measurement for Python", http://pypi.python.org/pypi/coverage Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From tjreedy at udel.edu Wed Jan 4 01:41:53 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 03 Jan 2012 19:41:53 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <63988.1325628139@parc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: On 1/3/2012 5:02 PM, Bill Janssen wrote: > Software that depends on an undefined hash function for synchronization > and persistence deserves to break, IMO. There are plenty of > well-defined hash functions available for this purpose. The doc for id() now says "This is an integer which is guaranteed to be unique and constant for this object during its lifetime." 
Since the default 3.2.2 hash for my win7 64bit CPython is id-address // 16, it can no longer make that guarantee. I suggest that the hash() doc say something similar: http://bugs.python.org/issue13707 -- Terry Jan Reedy From solipsis at pitrou.net Wed Jan 4 02:34:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 4 Jan 2012 02:34:03 +0100 Subject: [Python-Dev] cpython: Add a new PyUnicode_Fill() function References: Message-ID: <20120104023403.1c86c12e@pitrou.net> > +.. c:function:: int PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \ > + Py_ssize_t length, Py_UCS4 fill_char) > + > + Fill a string with a character: write *fill_char* into > + ``unicode[start:start+length]``. > + > + Fail if *fill_char* is bigger than the string maximum character, or if the > + string has more than 1 reference. > + > + Return the number of written characters, or return ``-1`` and raise an > + exception on error. The return type should then be Py_ssize_t, not int. Regards Antoine. From ncoghlan at gmail.com Wed Jan 4 02:42:20 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 4 Jan 2012 11:42:20 +1000 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F037F6A.1070806@v.loewis.de> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> Message-ID: On Wed, Jan 4, 2012 at 8:21 AM, "Martin v. Löwis" wrote: >> Have you read the following sentence: >> >> “Since some platforms may not have /dev/urandom, we need a PRNG in the >> core, too. I therefore propose to move the Mersenne twister from >> randommodule.c into the core, too.” > > I disagree. We don't need a PRNG on platforms without /dev/urandom or > any other native RNG. > Initializing the string-hash seed to 0 is perfectly fine on those > platforms; we can do slightly better by using, say, the current > time (in ms or µs if available) and the current pid (if available). 
> > People concerned with the security on those systems either need to > switch to a different system, or provide a patch to access the > platform's native random number generator. +1 (especially given how far back this is going to be ported) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Jan 4 02:59:51 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 4 Jan 2012 02:59:51 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <4F037F6A.1070806@v.loewis.de> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> Message-ID: <20120104025951.0f57cca8@pitrou.net> On Tue, 03 Jan 2012 23:21:30 +0100 "Martin v. Löwis" wrote: > > Have you read the following sentence: > > > > “Since some platforms may not have /dev/urandom, we need a PRNG in the > > core, too. I therefore propose to move the Mersenne twister from > > randommodule.c into the core, too.” > > I disagree. We don't need a PRNG on platforms without /dev/urandom or > any other native RNG. Well what if /dev/urandom is unavailable because the program is run e.g. in a chroot? (or is /dev/urandom still available in a chroot?) Regards Antoine. From stephen at xemacs.org Wed Jan 4 05:10:37 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Jan 2012 13:10:37 +0900 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> Message-ID: <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> Benjamin Peterson writes: > Ethan Furman stoneleaf.us> writes: > > > > Readability also includes more than just the source code; as has already > > been stated: [diffs elided] > > I find the diff version that already had braces in place much more readable. > > There are much larger problems facing diff readability.
On your basis, we might > as well decree that code should never be rearranged or reindented. That's a reasonable approach sometimes used, but it would be hard in Python. Specifically, I often produce two patches when substantial rearrangement is involved. The first isolates the actual changes, the second does the reformatting. In Python, the first patch might be syntactically erroneous, which would be both annoying for automatic testing and less readable. A Python-friendly alternative is to provide both a machine-applicable diff and a diff ignoring whitespace changes. This could be a toggle in web interfaces to the VCS. I've also sometimes found doing word diffs to be useful. Most developers resist such procedures passionately, though. *shrug* From benjamin at python.org Wed Jan 4 05:32:23 2012 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 3 Jan 2012 22:32:23 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2012/1/3 Stephen J. Turnbull : > Benjamin Peterson writes: > > Ethan Furman stoneleaf.us> writes: > > > > > > Readability also includes more than just the source code; as has already > > > been stated: > > [diffs elided] > > > > I find the diff version that already had braces in place much more readable. > > > > There are much larger problems facing diff readability. On your basis, we might > > as well decree that code should never be rearranged or reindented. > > That's a reasonable approach sometimes used My goodness, I was trying to make a ridiculous-sounding proposition.
-- Regards, Benjamin From pje at telecommunity.com Wed Jan 4 06:07:27 2012 From: pje at telecommunity.com (PJ Eby) Date: Wed, 4 Jan 2012 00:07:27 -0500 Subject: [Python-Dev] Proposed PEP on concurrent programming support In-Reply-To: <20120103164036.681beeae@mikmeyer-vm-fedora> References: <20120103164036.681beeae@mikmeyer-vm-fedora> Message-ID: On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer wrote: > STM is a relatively new technology being experimented with in newer > languages, and in a number of 3rd party libraries (both Peak [#Peak]_ > and Kamaelia [#Kamaelia]_ provide STM facilities). I don't know about Kamaelia, but PEAK's STM (part of the Trellis event-driven library) is *not* an inter-thread concurrency solution: it's actually used to sort out the order of events in a co-operative multitasking scenario. So, it should not be considered evidence for the practicality of doing inter-thread co-ordination that way in pure Python. A suite is marked > as a `transaction`, and then when an unlocked object is modified, > instead of indicating an error, a locked copy of it is created to be > used through the rest of the transaction. If any of the originals are > modified during the execution of the suite, the suite is rerun from > the beginning. If it completes, the locked copies are copied back to > the originals in an atomic manner. > I'm not sure if "locked" is really the right word here. A private copy isn't "locked" because it's not shared. The disadvantage is that any code in a transaction must be safe to run > multiple times. This forbids any kind of I/O. > More precisely, code in a transaction must be *reversible*, so it doesn't forbid any I/O that can be undone. If you can seek backward in an input file, for example, or delete queued output data, then it can still be done. Even I/O like re-drawing a screen can be made STM safe by making the redraw occur after a transaction that reads and empties a buffer written by other transactions. 
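A minimal sketch of this rerun-until-commit behaviour: one shared cell with optimistic, versioned commits. All names here are invented for illustration; this is not the PEP's proposed API, and a real STM would track whole read/write sets rather than a single cell.

```python
import threading

class Ref(object):
    """A single shared cell with a version counter (toy STM building block)."""
    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self):
        with self._lock:
            return self._value, self._version

    def commit(self, new_value, expected_version):
        # Succeed only if nobody else committed since our snapshot was taken.
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = new_value
            self._version += 1
            return True

def atomically(ref, fn):
    """Re-run fn on a fresh snapshot until the commit succeeds.

    Because fn may run several times, it must be free of irreversible
    side effects -- exactly the constraint discussed above.
    """
    while True:
        value, version = ref.read()
        if ref.commit(fn(value), version):
            return

counter = Ref(0)
threads = [threading.Thread(target=atomically, args=(counter, lambda v: v + 1))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.read()[0])  # -> 8
```

The retry loop is the part that forces transactions to be repeatable: any increment that loses the race simply observes a newer version and runs again.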
For > instance, combining STM with explicit locking would allow explicit > locking when IO was required, I don't think this idea makes any sense, since STM's don't really "lock", and to control I/O in an STM system you just STM-ize the queues. (Generally speaking.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Jan 4 07:30:16 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Jan 2012 15:30:16 +0900 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87d3b03rpj.fsf@uwakimon.sk.tsukuba.ac.jp> Benjamin Peterson writes: > My goodness, I was trying to make a ridiculous-sounding proposition. In this kind of discussion, that's in the same class as "be careful what you wish for -- because you might just get it." From fijall at gmail.com Wed Jan 4 08:59:15 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 4 Jan 2012 09:59:15 +0200 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <63988.1325628139@parc.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen wrote: > Christian Heimes wrote: > >> Am 29.12.2011 12:13, schrieb Mark Shannon: >> > The attack relies on being able to predict the hash value for a given >> > string. Randomising the string hash function is quite straightforward. >> > There is no need to change the dictionary code. >> > >> > A possible (*untested*) patch is attached. I'll leave it for those more >> > familiar with unicodeobject.c to do properly. 
>> I'm worried that hash randomization of str is going to break 3rd party >> software that rely on a stable hash across multiple Python instances. >> Persistence layers like ZODB and cross interpreter communication >> channels used by multiprocessing may (!) rely on the fact that the hash >> of a string is fixed. > > Software that depends on an undefined hash function for synchronization > and persistence deserves to break, IMO. There are plenty of > well-defined hash functions available for this purpose. > > Bill > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com A lot of software will break their tests, because dict ordering would depend on the particular run. I know, because some of them break on pypy which has a different dict ordering. This is probably a good thing in general, but is it really worth it? People will install python 2.6.newest and stuff *will* break. Is it *really* a security issue? We knew all along that dicts are O(n^2) in worst case scenario, how is this suddenly a security problem? Cheers, fijal From martin at v.loewis.de Wed Jan 4 09:02:14 2012 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 04 Jan 2012 09:02:14 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120104025951.0f57cca8@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> <20120104025951.0f57cca8@pitrou.net> Message-ID: <4F040786.3040307@v.loewis.de> > Well what if /dev/urandom is unavailable because the program is run > e.g. in a chroot? If the system ought to have /dev/urandom (as e.g. determined during configure), I propose that Python fails fast, unless the command line option is given that disables random hash seeds.
For the security fixes, we therefore might want to toggle the meaning of the command line switch, i.e. only use random seeds if explicitly requested. > (or is /dev/urandom still available in a chroot?) You can make it available if you want to: just create a /dev directory, and do mknod in it. It's common to run /dev/MAKEDEV (or similar), or to mount devfs into a chroot environment; else many programs run in the chroot are likely going to fail (e.g. if /dev/tty is missing). See, for example, http://tldp.org/HOWTO/Chroot-BIND-HOWTO-2.html bind apparently requires /dev/null and /dev/random. Regards, Martin From solipsis at pitrou.net Wed Jan 4 11:55:13 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 4 Jan 2012 11:55:13 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: <20120104115513.39db6b8b@pitrou.net> On Wed, 4 Jan 2012 09:59:15 +0200 Maciej Fijalkowski wrote: > > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? Because it has been shown to be exploitable for malicious purposes? Regards Antoine. From lists at cheimes.de Wed Jan 4 12:18:54 2012 From: lists at cheimes.de (Christian Heimes) Date: Wed, 04 Jan 2012 12:18:54 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: <4F04359E.3070804@cheimes.de> Am 04.01.2012 08:59, schrieb Maciej Fijalkowski: > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem? 
For example Microsoft has released an extraordinary and unscheduled security patch for the issue between Christmas and New Year. I don't normally use MS as a reference but this should give you a hint about the severity. Have you watched the talk yet? http://www.youtube.com/watch?v=R2Cq3CLI6H8 Christian From victor.stinner at gmail.com Wed Jan 4 04:30:06 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 4 Jan 2012 04:30:06 +0100 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120104025951.0f57cca8@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> <20120104025951.0f57cca8@pitrou.net> Message-ID: > (or is /dev/urandom still available in a chroot?) Last time that I played with chroot, I "binded" /dev and /proc. Many programs rely on specific devices like /dev/null. Python should not refuse to start if /dev/urandom (or CryptoGen) is missing or cannot be used, but should use a weak fallback. Victor From victor.stinner at gmail.com Wed Jan 4 04:30:16 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 4 Jan 2012 04:30:16 +0100 Subject: [Python-Dev] cpython: Add a new PyUnicode_Fill() function In-Reply-To: <20120104023403.1c86c12e@pitrou.net> References: <20120104023403.1c86c12e@pitrou.net> Message-ID: Oops, it's a typo in the doc (copy/paste failure). It's now fixed, thanks. Victor 2012/1/4 Antoine Pitrou : > >> +.. c:function:: int PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \ >> + Py_ssize_t length, Py_UCS4 fill_char) >> + >> + Fill a string with a character: write *fill_char* into >> + ``unicode[start:start+length]``. >> + >> + Fail if *fill_char* is bigger than the string maximum character, or if the >> + string has more than 1 reference. >> + >> + Return the number of written character, or return ``-1`` and raise an >> + exception on error. > The return type should then be Py_ssize_t, not int.
> > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/victor.stinner%40haypocalc.com From brian at python.org Wed Jan 4 15:05:28 2012 From: brian at python.org (Brian Curtin) Date: Wed, 4 Jan 2012 08:05:28 -0600 Subject: [Python-Dev] PEP 7 clarification request: braces In-Reply-To: <87d3b03rpj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F02BFD3.7080204@v.loewis.de> <87lipo6b75.fsf@uwakimon.sk.tsukuba.ac.jp> <4F034C23.7070000@stoneleaf.us> <87fwfw3y6a.fsf@uwakimon.sk.tsukuba.ac.jp> <87d3b03rpj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jan 4, 2012 at 00:30, Stephen J. Turnbull wrote: > Benjamin Peterson writes: > > ?> My goodness, I was trying to make a ridiculous-sounding proposition. > > In this kind of discussion, that's in the same class as "be careful > what you wish for -- because you might just get it." I wish we could move onto better discussions than brace placement/existence at this point. *crosses fingers* From jimjjewett at gmail.com Wed Jan 4 15:41:19 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 4 Jan 2012 09:41:19 -0500 Subject: [Python-Dev] Proposed PEP on concurrent programming support Message-ID: (I've added back python-ideas, because I think that is still the appropriate forum.) >.... A new > suite type - the ``transaction`` will be added to the language. The > suite will have the semantics discussed above: modifying an object in > the suite will trigger creation of a thread-local shallow copy to be > used in the Transaction. Further modifications of the original will > cause all existing copies to be discarded and the transaction to be > restarted. ... How will you know that an object has been modified? 
The only ways I can think of are (1) Timestamp every object -- or at least every mutable object -- and hope that everybody agrees on which modifications should count. (2) Make two copies of every object you're using in the suite; at the end, compare one of them to both the original and the one you were operating on. With this solution, you can decide for youself what counts as a modification, but it still isn't straightforward; I would consider changing a value to be changing a dict, even though nothing in the item (header) itself changed. -jJ From barry at python.org Wed Jan 4 16:20:28 2012 From: barry at python.org (Barry Warsaw) Date: Wed, 4 Jan 2012 10:20:28 -0500 Subject: [Python-Dev] RNG in the core In-Reply-To: <20120104025951.0f57cca8@pitrou.net> References: <4F03002A.5010800@cheimes.de> <20120103222053.2325d352@pitrou.net> <4F037F6A.1070806@v.loewis.de> <20120104025951.0f57cca8@pitrou.net> Message-ID: <20120104102028.4c722b77@limelight.wooz.org> On Jan 04, 2012, at 02:59 AM, Antoine Pitrou wrote: >Well what if /dev/urandom is unavailable because the program is run >e.g. in a chroot? >(or is /dev/urandom still available in a chroot?) It is (apparently) in an schroot in Ubuntu, so I'd guess it's also available in Debian (untested). -Barry From ericsnowcurrently at gmail.com Wed Jan 4 20:15:46 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 4 Jan 2012 12:15:46 -0700 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> Message-ID: On Wed, Jan 4, 2012 at 12:59 AM, Maciej Fijalkowski wrote: > On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen wrote: >> Christian Heimes wrote: >> >>> Am 29.12.2011 12:13, schrieb Mark Shannon: >>> > The attack relies on being able to predict the hash value for a given >>> > string. Randomising the string hash function is quite straightforward. 
>>> > There is no need to change the dictionary code. >>> > >>> > A possible (*untested*) patch is attached. I'll leave it for those more >>> > familiar with unicodeobject.c to do properly. >>> >>> I'm worried that hash randomization of str is going to break 3rd party >>> software that rely on a stable hash across multiple Python instances. >>> Persistence layers like ZODB and cross interpreter communication >>> channels used by multiprocessing may (!) rely on the fact that the hash >>> of a string is fixed. >> >> Software that depends on an undefined hash function for synchronization >> and persistence deserves to break, IMO. There are plenty of >> well-defined hash functions available for this purpose. >> >> Bill >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com > > A lot of software will break their tests, because dict ordering would > depend on the particular run. I know, because some of them break on > pypy which has a different dict ordering. This is probably a good > thing in general, but is it really worth it? People will install > python 2.6.newest and stuff *will* break. So if we're making the new hashing the default and giving an option to use the old, we should make it _really_ clear in the release notes/announcement about how to revert the behavior. -eric > > Is it *really* a security issue? We knew all along that dicts are > O(n^2) in worst case scenario, how is this suddenly a security > problem?
> > Cheers, > fijal > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com From andrew at bemusement.org Thu Jan 5 05:26:27 2012 From: andrew at bemusement.org (Andrew Bennetts) Date: Thu, 5 Jan 2012 15:26:27 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120104115513.39db6b8b@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> Message-ID: <20120105042627.GA10082@flay.puzzling.org> On Wed, Jan 04, 2012 at 11:55:13AM +0100, Antoine Pitrou wrote: > On Wed, 4 Jan 2012 09:59:15 +0200 > Maciej Fijalkowski wrote: > > > > Is it *really* a security issue? We knew all along that dicts are > > O(n^2) in worst case scenario, how is this suddenly a security > > problem? > > Because it has been shown to be exploitable for malicious purposes? I don't think that's news either. http://mail.python.org/pipermail/python-dev/2003-May/035907.html and http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for instance show that in 2003 it was clearly known to at least be likely to be an exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). There was debate about whether it's the language's responsibility to mitigate the problem or if apps should use safer designs for handling untrusted input (e.g. limit the number of keys input is allowed to create, or use something other than dicts), and debate about just how practical an effective exploit would be. But I think it was understood to be a real concern 8 years ago, so not exactly sudden. Just because it's old news doesn't make it not a security problem, of course. -Andrew. 
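The worst case being debated here is easy to reproduce without any knowledge of the real string hash, simply by forcing every key into the same bucket. The `Collider` class below is a toy stand-in for attacker-generated colliding strings, purely for illustration:

```python
class Collider(object):
    """Every instance hashes identically, so each insert must scan the
    whole collision chain: O(n) per insert, O(n**2) to build the dict."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42            # constant hash: all keys collide
    def __eq__(self, other):
        return self.n == other.n

d = {}
for i in range(1000):
    d[Collider(i)] = i       # compares against every earlier colliding key

assert len(d) == 1000
assert d[Collider(500)] == 500
```

An attacker who can predict the interpreter's string hash gets the same effect with plain str keys in a form or header dict, which is why the mitigations discussed are a randomised seed or a cap on the number of parsed keys.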
From paul at smedley.id.au Thu Jan 5 09:58:29 2012 From: paul at smedley.id.au (Paul Smedley) Date: Thu, 05 Jan 2012 19:28:29 +1030 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 Message-ID: Hi All, I'm working on updating my port of Python 2.6.5 to v2.7.2 for the OS/2 platform. I have python.exe and python27.dll compiling fine, but when starting to build sharedmods I'm getting the following error: running build running build_ext Traceback (most recent call last): File "./setup.py", line 2092, in main() File "./setup.py", line 2087, in main 'Lib/smtpd.py'] File "U:/DEV/python-2.7.2/Lib/distutils/core.py", line 152, in setup dist.run_commands() File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in run_command cmd_obj.run() File "U:/DEV/python-2.7.2/Lib/distutils/command/build.py", line 127, in run self.run_command(cmd_name) File "U:/DEV/python-2.7.2/Lib/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in run_command cmd_obj.run() File "U:/DEV/python-2.7.2/Lib/distutils/command/build_ext.py", line 340, in run self.build_extensions() File "./setup.py", line 152, in build_extensions missing = self.detect_modules() File "./setup.py", line 1154, in detect_modules for arg in sysconfig.get_config_var("CONFIG_ARGS").split()] AttributeError: 'NoneType' object has no attribute 'split' make: *** [sharedmods] Error 1 Any suggestions? A Google search showed a similar error on AIX with no clear resolution.
Thanks in advance, Paul From solipsis at pitrou.net Thu Jan 5 14:39:57 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 5 Jan 2012 14:39:57 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> Message-ID: <20120105143957.1b5ba7fe@pitrou.net> On Thu, 5 Jan 2012 15:26:27 +1100 Andrew Bennetts wrote: > > I don't think that's news either. > http://mail.python.org/pipermail/python-dev/2003-May/035907.html and > http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for > instance show that in 2003 it was clearly known to at least be likely to be an > exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). > > There was debate about whether it's the language's responsibility to mitigate > the problem or if apps should use safer designs for handling untrusted input > (e.g. limit the number of keys input is allowed to create, or use something > other than dicts), and debate about just how practical an effective exploit > would be. But I think it was understood to be a real concern 8 years ago, so > not exactly sudden. That's not news indeed, but that doesn't make it less of a problem, especially now that the issue has been widely publicized through a conference and announcements on several widely-read Web sites. That said, only doing the security fix in 3.3 would have the nice side effect of pushing people towards Python 3, so perhaps I'm for it after all. Half-jokingly, Antoine. From mark at hotpy.org Thu Jan 5 14:46:52 2012 From: mark at hotpy.org (Mark Shannon) Date: Thu, 05 Jan 2012 13:46:52 +0000 Subject: [Python-Dev] Testing the tests by modifying the ordering of dict items. 
Message-ID: <4F05A9CC.3000806@hotpy.org> Hi, Python code should not depend upon the ordering of items in a dict. Unfortunately it seems that a number of tests in the standard library do just that. Changing PyDict_MINSIZE from 8 to either 4 or 16 causes the following tests to fail: test_dis test_email test_inspect test_nntplib test_packaging test_plistlib test_pprint test_symtable test_trace test_sys also fails, but this is a legitimate failure in sys.getsizeof() Changing the collision resolution function from f(n) = 5n + 1 to f(n) = n + 1 results in the same failures, except for test_packaging and test_symtable which pass. Finally, changing the seed in unicode_hash() from (implicit) 0 to an arbitrary value (12345678) causes the above tests to fail plus: test_json test_set test_ttk_textonly test_urllib test_urlparse I think this is a real issue as the unicode_hash() function is likely to change soon due to http://bugs.python.org/issue13703. Should I: 1. Submit one big bug report? 2. Submit a bug report for each "failing" test separately? 3. Ignore it, since the tests only fail when I start messing about? Cheers, Mark. From solipsis at pitrou.net Thu Jan 5 14:58:13 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 5 Jan 2012 14:58:13 +0100 Subject: [Python-Dev] Testing the tests by modifying the ordering of dict items. References: <4F05A9CC.3000806@hotpy.org> Message-ID: <20120105145813.35c9b8c5@pitrou.net> On Thu, 05 Jan 2012 13:46:52 +0000 Mark Shannon wrote: > > Should I: > > 1. Submit one big bug report? > > 2. Submit a bug report for each "failing" test separately? I would say a separate bug report for each failing test file, i.e. one report for test_dis, one for test_email etc. Hope this doesn't eat too much of your time :) Regards Antoine. 
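The pattern behind most such failures is an assertion that bakes one particular iteration order into the expected output; comparing sorted contents instead keeps a test independent of the seed, the collision function and PyDict_MINSIZE. A hedged before/after sketch, not taken from any specific failing test:

```python
d = {"b": 2, "a": 1, "c": 3}

# Fragile: the exact repr depends on hash values, insertion history and
# table size, so a different seed or PyDict_MINSIZE can change it:
#     assert repr(d) == "{'a': 1, 'c': 3, 'b': 2}"

# Robust: compare contents, not iteration order.
assert sorted(d.items()) == [("a", 1), ("b", 2), ("c", 3)]
assert set(d) == {"a", "b", "c"}
```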
From amauryfa at gmail.com Thu Jan 5 15:02:44 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Thu, 5 Jan 2012 15:02:44 +0100 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: 2012/1/5 Paul Smedley > Hi All, > > I'm working on updating my port of Python 2.6.5 to v2.7.2 for the OS/2 > platform. > > I have python.exe and python27.dll compiling fine, but when starting to > build sharedmods I'm getting the following error: > running build > running build_ext > Traceback (most recent call last): > File "./setup.py", line 2092, in > main() > File "./setup.py", line 2087, in main > 'Lib/smtpd.py'] > File "U:/DEV/python-2.7.2/Lib/distutils/core.py", line 152, in setup > dist.run_commands() > File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 953, in > run_commands > self.run_command(cmd) > File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in > run_command > cmd_obj.run() > File "U:/DEV/python-2.7.2/Lib/distutils/command/build.py", line 127, > in run > self.run_command(cmd_name) > File "U:/DEV/python-2.7.2/Lib/distutils/cmd.py", line 326, in > run_command > self.distribution.run_command(command) > File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in > run_command > cmd_obj.run() > File "U:/DEV/python-2.7.2/Lib/distutils/command/build_ext.py", line > 340, in run > self.build_extensions() > File "./setup.py", line 152, in build_extensions > missing = self.detect_modules() > File "./setup.py", line 1154, in detect_modules > for arg in sysconfig.get_config_var("CONFIG_ARGS").split()] > AttributeError: 'NoneType' object has no attribute 'split' > make: *** [sharedmods] Error 1 > > > Any suggestions? A Google search showed a similar error on AIX with no clear > resolution. > Is it in the part that configures the "dbm" module? This paragraph is already protected by a "if platform not in ['cygwin']:", I suggest excluding 'os2emx' as well.
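Independently of which platforms get excluded, the traceback itself is a plain None-guard problem: `sysconfig.get_config_var()` returns None for a variable that was never recorded for the build. A defensive sketch of the failing expression (illustrative only, not the actual setup.py fix):

```python
def split_config_args(get_config_var):
    """Return CONFIG_ARGS as a list, tolerating builds where the variable
    was never recorded (get_config_var returns None in that case)."""
    value = get_config_var("CONFIG_ARGS")
    return (value or "").split()

# Simulate a build with no recorded configure arguments, as on OS/2 here:
assert split_config_args(lambda name: None) == []
assert split_config_args(lambda name: "'--prefix=/usr' '--with-pydebug'") == \
    ["'--prefix=/usr'", "'--with-pydebug'"]
```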
-- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Jan 5 16:15:33 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 5 Jan 2012 10:15:33 -0500 Subject: [Python-Dev] Testing the tests by modifying the ordering of dict items. In-Reply-To: <4F05A9CC.3000806@hotpy.org> References: <4F05A9CC.3000806@hotpy.org> Message-ID: <20120105101533.6265853b@limelight.wooz.org> On Jan 05, 2012, at 01:46 PM, Mark Shannon wrote: >2. Submit a bug report for each "failing" test separately? I'm sure it will be a pain, but this is really the best thing to do. -Barry From fijall at gmail.com Thu Jan 5 18:34:13 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 5 Jan 2012 19:34:13 +0200 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120105143957.1b5ba7fe@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> Message-ID: On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou wrote: > On Thu, 5 Jan 2012 15:26:27 +1100 > Andrew Bennetts wrote: >> >> I don't think that's news either. >> http://mail.python.org/pipermail/python-dev/2003-May/035907.html and >> http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for >> instance show that in 2003 it was clearly known to at least be likely to be an >> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). >> >> There was debate about whether it's the language's responsibility to mitigate >> the problem or if apps should use safer designs for handling untrusted input >> (e.g. limit the number of keys input is allowed to create, or use something >> other than dicts), and debate about just how practical an effective exploit >> would be. 
But I think it was understood to be a real concern 8 years ago, so >> not exactly sudden. > > That's not news indeed, but that doesn't make it less of a problem, > especially now that the issue has been widely publicized through a > conference and announcements on several widely-read Web sites. > > That said, only doing the security fix in 3.3 would have the nice side > effect of pushing people towards Python 3, so perhaps I'm for it after > all. > > Half-jokingly, > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com Just to make things clear - stdlib itself has 1/64 of tests relying on dict order. Changing dict order in *older* pythons will break everyone's tests and some people's code. Making this new 2.6.x release would mean that people using new python 2.6 would have to upgrade an unspecified amount of their python packages; that does not sound very cool. Also consider that new 2.6.x would go as a security fix to old ubuntu, but all other packages won't, because they'll not contain security fixes. Just so you know Cheers, fijal From v+python at g.nevcal.com Thu Jan 5 20:14:51 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 05 Jan 2012 11:14:51 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> Message-ID: <4F05F6AB.3060704@g.nevcal.com> On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote: > Also consider that new 2.6.x would go as a security fix to old > ubuntu, but all other packages won't, because they'll not contain > security fixes.
> Just so you know

Why should CPython be constrained by broken policies of Ubuntu? If the other packages must be fixed so they work correctly with a security fix in Python, then they should be considered as containing a security fix. If they aren't, then that is a broken policy. On the other hand, it is very true that the seductive convenience of dict (readily available, good performance) in normal cases has created the vulnerability because its characteristics are a function of the data inserted, and when used for data that is from unknown, possibly malicious sources, that is a bug in the program that uses dict, not in dict itself. So it seems to me that: 1) the security problem is not in CPython, but rather in web servers that use dict inappropriately. 2) changing CPython in a way that breaks code is not a security fix to CPython, but rather a gratuitous breakage of compatibility promises, wrapped in a security-fix lie. The problem for CPython here can be summarized as follows: a) it is being blamed for problems in web servers that are not problems in CPython b) perhaps dict documentation is a bit too seductive, in not declaring that data from malicious sources could cause its performance to degrade significantly (but then, any programmer who has actually taken a decent set of programming classes should understand that, but on the other hand, there are programmers who have not taken such classes). c) CPython provides no other mapping data structures that rival the performance and capabilities of dict as an alternative, nor can such data structures be written in CPython, as the performance of dict comes not only from hashing, but also from being written in C. The solutions could be: A) push back on the blame: it is not a CPython problem B) perhaps add a warning to the documentation for the naïve, untrained programmers C) consider adding an additional data structure to the language, and mention it in the B warning for versions 3.3+.
On the other hand, the web server vulnerability could be blamed on CPython in another way: identify vulnerable packages in the stdlib that are likely to be used during the parsing of user-supplied data. Ones that come to mind (Python 3.2) are: urllib.parse (various parse* functions) (package names different in Python 2.x) cgi (parse_multipart, FieldStorage) So, fixing the vulnerable packages could be a sufficient response, rather than changing the hash function. How to fix? Each of those above allocates and returns a dict. Simply have each of those allocate and return a wrapped dict, which has the following behaviors: i) during __init__, create a local, random, string. ii) for all key values, prepend the string, before passing it to the internal dict. Changing these vulnerable packages rather than the hash function is a much more constrained change, and wouldn't create bugs in programs that erroneously depend on the current hash function directly or indirectly. This would not fix web servers that use their own parsing and storage mechanism for
fields, if they have also inappropriately used a dict as their storage mechanism for user supplied data. However, a similar solution could be applied by the authors of those web servers, and would be a security fix to such packages, so should be applied to Ubuntu, if available there, or other systems with security-only fix acceptance. This solution does not require changes to the hash, does not require a cryptographically secure hash, and does not require code to be added to the initialization of Python before normal objects and mappings can be created. If a port doesn't contain a good random number generator, a weak one can be substituted, but such decisions can be made in Python code after the interpreter is initialized, and use of stdlib packages is available. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Jan 5 20:22:22 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 5 Jan 2012 20:22:22 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> Message-ID: <20120105202222.228d3f00@pitrou.net> On Thu, 5 Jan 2012 19:34:13 +0200 Maciej Fijalkowski wrote: > > Just to make things clear - stdlib itself has 1/64 of tests relying on > dict order. Changing dict order in *older* pythons will break > everyone's tests and some peoples code. Breaking tests is not a problem: they are typically not run by production code and so people can take the time to fix them. Breaking other code is a problem if it is legitimate. Relying on dict ordering is totally wrong and I don't think we should care about such cases. The only issue is when relying on hash() being stable across runs.
But hashing already varies from build to build (32-bit vs. 64-bit) and I think that anyone seriously relying on it should already have been bitten. > Making this new 2.6.x release > would mean that people using new python 2.6 would have to upgrade an > unspecified amount of their python packages, that does not sound very > cool. How about 2.7? Do you think it should also remain untouched? I am ok for leaving 2.6 alone (that's Barry's call anyway) but 2.7 is another matter - should people migrate to 3.x to get the security fix? As for 3.2, it should certainly get the fix IMO. There are not many Python 3 legacy applications relying on hash() stability, I think. > Also consider that new 2.6.x would go as a security fix to old > ubuntu, but all other packages won't, because they'll not contain > security fixes. Ubuntu can decide *not* to ship the fix if they prefer it like that. Their policies and decisions, though, should not taint ours. Regards Antoine. From dmalcolm at redhat.com Thu Jan 5 20:33:24 2012 From: dmalcolm at redhat.com (David Malcolm) Date: Thu, 05 Jan 2012 14:33:24 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> Message-ID: <1325792005.2123.11.camel@surprise> On Thu, 2012-01-05 at 19:34 +0200, Maciej Fijalkowski wrote: > On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou wrote: > > On Thu, 5 Jan 2012 15:26:27 +1100 > > Andrew Bennetts wrote: > >> > >> I don't think that's news either. 
> >> http://mail.python.org/pipermail/python-dev/2003-May/035907.html and > >> http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for > >> instance show that in 2003 it was clearly known to at least be likely to be an > >> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys). > >> > >> There was debate about whether it's the language's responsibility to mitigate > >> the problem or if apps should use safer designs for handling untrusted input > >> (e.g. limit the number of keys input is allowed to create, or use something > >> other than dicts), and debate about just how practical an effective exploit > >> would be. But I think it was understood to be a real concern 8 years ago, so > >> not exactly sudden. > > > > That's not news indeed, but that doesn't make it less of a problem, > > especially now that the issue has been widely publicized through a > > conference and announcements on several widely-read Web sites. > > > > That said, only doing the security fix in 3.3 would have the nice side > > effect of pushing people towards Python 3, so perhaps I'm for it after > > all. > > > > Half-jokingly, > > > > Antoine. > > Just to make things clear - stdlib itself has 1/64 of tests relying on > dict order. Changing dict order in *older* pythons will break > everyone's tests and some peoples code. Making this new 2.6.x release > would mean that people using new python 2.6 would have to upgrade an > unspecified amount of their python packages, that does not sound very > cool. Also consider that new 2.6.x would go as a security fix to old > ubuntu, but all other packages won't, because they'll not contain > security fixes. Just so you know We have similar issues in RHEL, with the Python versions going much further back (e.g. 
2.3) When backporting the fix to ancient python versions, I'm inclined to turn the change *off* by default, requiring the change to be enabled via an environment variable: I want to avoid breaking existing code, even if such code is technically relying on non-guaranteed behavior. But we could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. That way /usr/bin/python would default to the old behavior, but web apps would have some protection. Any such logic here also suggests the need for an attribute in the sys module so that you can verify the behavior. From tseaver at palladion.com Thu Jan 5 20:49:53 2012 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 05 Jan 2012 14:49:53 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F05F6AB.3060704@g.nevcal.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/05/2012 02:14 PM, Glenn Linderman wrote: > 1) the security problem is not in CPython, but rather in web servers > that use dict inappropriately. Most webapp vulnerabilities are due to their use of Python's cgi module, which it uses a dict to hold the form / query string data being supplied by untrusted external users. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8F/uEACgkQ+gerLs4ltQ679QCgqKPYYwEetKR3bEMVh5eukLin cA8An3XJMYWhK5MutjbOCxCfYzKXmDzc =V3lh -----END PGP SIGNATURE----- From paul at smedley.id.au Thu Jan 5 21:01:53 2012 From: paul at smedley.id.au (Paul Smedley) Date: Fri, 06 Jan 2012 06:31:53 +1030 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: Hi Amaury, On 06/01/12 00:32, Amaury Forgeot d'Arc wrote: > 2012/1/5 Paul Smedley > > > Hi All, > > I'm working on updating my port of Python 2.6.5 to v2.7.2 for the > OS/2 platform. > > I have python.exe and python27.dll compiling fine, but when starting > to build sharedmods I'm getting the following error: > running build > running build_ext > Traceback (most recent call last): > File "./setup.py", line 2092, in > main() > File "./setup.py", line 2087, in main > 'Lib/smtpd.py'] > File "U:/DEV/python-2.7.2/Lib/distutils/core.py", line 152, in setup > dist.run_commands() > File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 953, in > run_commands > self.run_command(cmd) > File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in > run_command > cmd_obj.run() > File "U:/DEV/python-2.7.2/Lib/distutils/command/build.py", line > 127, in run > self.run_command(cmd_name) > File "U:/DEV/python-2.7.2/Lib/distutils/cmd.py", line 326, in > run_command > self.distribution.run_command(command) > File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in > run_command > cmd_obj.run() > File "U:/DEV/python-2.7.2/Lib/distutils/command/build_ext.py", > line 340, in run > self.build_extensions() > File "./setup.py", line 152, in build_extensions > missing = self.detect_modules() > File "./setup.py",
line 1154, in detect_modules > for arg in sysconfig.get_config_var("CONFIG_ARGS").split()] > AttributeError: 'NoneType' object has no attribute 'split' > make: *** [sharedmods] Error 1 > > > Any suggestions? A google showed a similar error on AIX with no > clear resolution. > > > Is it in the part that configures the "dbm" module? > This paragraph is already protected by a "if platform not in ['cygwin']:", > I suggest to exclude 'os2emx' as well. It is - however adding os2 to the list of platforms to exclude gets me only a little further: It then bombs with: running build running build_ext Traceback (most recent call last): File "./setup.py", line 2092, in main() File "./setup.py", line 2087, in main 'Lib/smtpd.py'] File "U:/DEV/python-2.7.2/Lib/distutils/core.py", line 152, in setup dist.run_commands() File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 953, in run_commands self.run_command(cmd) File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in run_command cmd_obj.run() File "U:/DEV/python-2.7.2/Lib/distutils/command/build.py", line 127, in run self.run_command(cmd_name) File "U:/DEV/python-2.7.2/Lib/distutils/cmd.py", line 326, in run_command self.distribution.run_command(command) File "U:/DEV/python-2.7.2/Lib/distutils/dist.py", line 972, in run_command cmd_obj.run() File "U:/DEV/python-2.7.2/Lib/distutils/command/build_ext.py", line 340, in run self.build_extensions() File "./setup.py", line 152, in build_extensions missing = self.detect_modules() File "./setup.py", line 1368, in detect_modules if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"): TypeError: argument of type 'NoneType' is not iterable make: *** [sharedmods] Error 1 Which again points to problems with sysconfig.get_config_var("CONFIG_ARGS"): Thanks, Paul From v+python at g.nevcal.com Thu Jan 5 21:19:25 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 05 Jan 2012 12:19:25 -0800 Subject: [Python-Dev] Hash collision security issue (now
public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: <4F0605CD.7010500@g.nevcal.com> On 1/5/2012 11:49 AM, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 01/05/2012 02:14 PM, Glenn Linderman wrote: >> 1) the security problem is not in CPython, but rather in web servers >> that use dict inappropriately. > Most webapp vulnerabilities are due to their use of Python's cgi module, > which it uses a dict to hold the form / query string data being supplied > by untrusted external users. Yes, I understand that (and have some such web apps in production). In fact, I pointed out urllib.parse and cgi as specific modules for which a proposed fix could be made without impacting the Python hash function. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jan 5 21:10:35 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 05 Jan 2012 12:10:35 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: <4F0603BB.2030204@stoneleaf.us> Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 01/05/2012 02:14 PM, Glenn Linderman wrote: >> 1) the security problem is not in CPython, but rather in web servers >> that use dict inappropriately. 
> > Most webapp vulnerabilities are due to their use of Python's cgi module, > which it uses a dict to hold the form / query string data being supplied > by untrusted external users. And Glenn suggested further down that an appropriate course of action would be to fix the cgi module (and others) instead of messing with dict. ~Ethan~ From p.f.moore at gmail.com Thu Jan 5 21:35:57 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 5 Jan 2012 20:35:57 +0000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <1325792005.2123.11.camel@surprise> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> Message-ID: On 5 January 2012 19:33, David Malcolm wrote: > We have similar issues in RHEL, with the Python versions going much > further back (e.g. 2.3) > > When backporting the fix to ancient python versions, I'm inclined to > turn the change *off* by default, requiring the change to be enabled via > an environment variable: I want to avoid breaking existing code, even if > such code is technically relying on non-guaranteed behavior. But we > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. > That way /usr/bin/python would default to the old behavior, but web apps > would have some protection. Any such logic here also suggests the need > for an attribute in the sys module so that you can verify the behavior. Uh, surely no-one is suggesting backporting to "ancient" versions? I couldn't find the statement quickly on the python.org website (so this is via google), but isn't it true that 2.6 is in security-only mode and 2.5 and earlier will never get the fix?
Having a source-only release for 2.6 means the fix is "off by default" in the sense that you can choose not to build it. Or add a #ifdef to the source if it really matters. Personally, I find it hard to see this as a Python security hole, but I can sympathise with the idea that it would be nice to make dict "safer by default". (Although the benefit for me personally would be zero, so I'm reluctant for the change to have a detectable cost...) My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no bells and whistles to switch it off or the like. If it's not suitable to go in on that basis, restrict it to 3.3+ (where it's certainly OK) and advise users of earlier versions to either upgrade or code defensively to avoid hitting the pathological case. Surely that sort of defensive code should be second nature to the people who might be affected by the issue? Paul. From barry at python.org Thu Jan 5 21:45:58 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 5 Jan 2012 15:45:58 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <1325792005.2123.11.camel@surprise> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> Message-ID: <20120105154558.1a9c95df@resist.wooz.org> On Jan 05, 2012, at 02:33 PM, David Malcolm wrote: >We have similar issues in RHEL, with the Python versions going much >further back (e.g. 2.3) > >When backporting the fix to ancient python versions, I'm inclined to >turn the change *off* by default, requiring the change to be enabled via >an environment variable: I want to avoid breaking existing code, even if >such code is technically relying on non-guaranteed behavior. But we >could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. 
>That way /usr/bin/python would default to the old behavior, but web apps >would have some protection. This sounds like a reasonable compromise for all stable Python releases. It can be turned on by default for Python 3.3. If you also make the default setting easy to change (i.e. parameterized in one place), then distros can make their own decision about the default, although I'd argue for the above default approach for Debian/Ubuntu. >Any such logic here also suggests the need for an attribute in the sys module >so that you can verify the behavior. That would be read-only though, right? -Barry From barry at python.org Thu Jan 5 21:50:34 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 5 Jan 2012 15:50:34 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> Message-ID: <20120105155034.7c5f91aa@resist.wooz.org> On Jan 05, 2012, at 08:35 PM, Paul Moore wrote: >Uh, surely no-one is suggesting backporting to "ancient" versions? I >couldn't find the statement quickly on the python.org website (so this >is via google), but isn't it true that 2.6 is in security-only mode >and 2.5 and earlier will never get the fix? Having a source-only >release for 2.6 means the fix is "off by default" in the sense that >you can choose not to build it. Or add a #ifdef to the source if it >really matters. Correct, although there's no reason why a patch for versions older than 2.6 couldn't be included on a python.org security page for reference in CVE or other security notifications. Distros that care about versions older than Python 2.6 will basically be back-porting the patch anyway. 
>My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no >bells and whistles to switch it off or the like. I like David Malcolm's suggestion, but I have no problem applying it to 3.3, enabled by default with no way to turn it off. The off-by-default on-switch policy for stable releases would be justified by maximum backward compatibility conservativeness. -Barry From a.badger at gmail.com Thu Jan 5 21:51:50 2012 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 5 Jan 2012 12:51:50 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> Message-ID: <20120105205150.GM5336@unaka.lan> On Thu, Jan 05, 2012 at 08:35:57PM +0000, Paul Moore wrote: > On 5 January 2012 19:33, David Malcolm wrote: > > We have similar issues in RHEL, with the Python versions going much > > further back (e.g. 2.3) > > > > When backporting the fix to ancient python versions, I'm inclined to > > turn the change *off* by default, requiring the change to be enabled via > > an environment variable: I want to avoid breaking existing code, even if > > such code is technically relying on non-guaranteed behavior. But we > > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. > > That way /usr/bin/python would default to the old behavior, but web apps > > would have some protection. Any such logic here also suggests the need > > for an attribute in the sys module so that you can verify the behavior. > > Uh, surely no-one is suggesting backporting to "ancient" versions? I > couldn't find the statement quickly on the python.org website (so this > is via google), but isn't it true that 2.6 is in security-only mode > and 2.5 and earlier will never get the fix?
> I think when dmalcolm says "backporting" he means that he'll have to backport the fix from modern, supported-by-python.org python to the ancient pythons that he's supporting as part of the Linux distributions where he's the python package maintainer. I'm thinking he's mentioning it here mainly to see if anyone can point out a reason not to diverge from upstream in that manner for those distributions. > Having a source-only > release for 2.6 means the fix is "off by default" in the sense that > you can choose not to build it. Or add a #ifdef to the source if it > really matters. > I don't think that this would satisfy dmalcolm's needs. What he's talking about sounds more like a runtime switch (possibly only when initializing, though, not on-the-fly). -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From dmalcolm at redhat.com Thu Jan 5 21:52:15 2012 From: dmalcolm at redhat.com (David Malcolm) Date: Thu, 05 Jan 2012 15:52:15 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> Message-ID: <1325796736.2123.16.camel@surprise> On Thu, 2012-01-05 at 20:35 +0000, Paul Moore wrote: > On 5 January 2012 19:33, David Malcolm wrote: > > We have similar issues in RHEL, with the Python versions going much > > further back (e.g.
2.3) > > > > When backporting the fix to ancient python versions, I'm inclined to > > turn the change *off* by default, requiring the change to be enabled via > > an environment variable: I want to avoid breaking existing code, even if > > such code is technically relying on non-guaranteed behavior. But we > > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. > > That way /usr/bin/python would default to the old behavior, but web apps > > would have some protection. Any such logic here also suggests the need > > for an attribute in the sys module so that you can verify the behavior. > > Uh, surely no-one is suggesting backporting to "ancient" versions? I > couldn't find the statement quickly on the python.org website (so this > is via google), but isn't it true that 2.6 is in security-only mode > and 2.5 and earlier will never get the fix? Having a source-only > release for 2.6 means the fix is "off by default" in the sense that > you can choose not to build it. Or add a #ifdef to the source if it > really matters. Sorry, if I was unclear. I don't expect python-dev to do this backporting, but those of us who do maintain such ancient pythons via Linux distributions may want to do the backport for our users. My email was to note that it may make sense to pick more conservative defaults for such a scenario, as compared to 2.6 onwards. 
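To make the shape of that opt-in concrete, here is a minimal sketch of the gating logic being discussed (the variable name PYHASHRANDOMIZE is purely illustrative — nothing of the sort has been agreed on):

```python
import os

def hash_seed_from_env(environ):
    """Sketch of an opt-in hash-randomization switch for a backport.

    A seed of 0 means "disabled" and keeps the historical string hash,
    so existing code is untouched unless a web stack (mod_python,
    mod_wsgi, an Apache envvars file) exports the variable.
    """
    value = environ.get("PYHASHRANDOMIZE", "0")
    if value == "0":
        return 0  # disabled by default: classic, repeatable hashes
    if value == "1":
        # Draw a fresh per-process seed; avoid 0, which means "off".
        return int.from_bytes(os.urandom(8), "little") or 1
    return int(value)  # explicit numeric seed, e.g. to reproduce a run
```

A read-only attribute in the sys module could then simply report whether the chosen seed was nonzero.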
[snip] Hope this is helpful Dave From g.brandl at gmx.net Thu Jan 5 21:52:40 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 05 Jan 2012 21:52:40 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120105154558.1a9c95df@resist.wooz.org> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <20120105154558.1a9c95df@resist.wooz.org> Message-ID: On 01/05/2012 09:45 PM, Barry Warsaw wrote: > On Jan 05, 2012, at 02:33 PM, David Malcolm wrote: > >>We have similar issues in RHEL, with the Python versions going much >>further back (e.g. 2.3) >> >>When backporting the fix to ancient python versions, I'm inclined to >>turn the change *off* by default, requiring the change to be enabled via >>an environment variable: I want to avoid breaking existing code, even if >>such code is technically relying on non-guaranteed behavior. But we >>could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. >>That way /usr/bin/python would default to the old behavior, but web apps >>would have some protection. > > This sounds like a reasonable compromise for all stable Python releases. It > can be turned on by default for Python 3.3. If you also make the default > setting easy to change (i.e. parameterized in one place), then distros can > make their own decision about the default, although I'd argue for the above > default approach for Debian/Ubuntu. Agreed. 
Georg From lists at cheimes.de Thu Jan 5 22:40:58 2012 From: lists at cheimes.de (Christian Heimes) Date: Thu, 05 Jan 2012 22:40:58 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120105154558.1a9c95df@resist.wooz.org> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <20120105154558.1a9c95df@resist.wooz.org> Message-ID: <4F0618EA.3080405@cheimes.de> Am 05.01.2012 21:45, schrieb Barry Warsaw: > This sounds like a reasonable compromise for all stable Python releases. It > can be turned on by default for Python 3.3. If you also make the default > setting easy to change (i.e. parameterized in one place), then distros can > make their own decision about the default, although I'd argue for the above > default approach for Debian/Ubuntu. Hey Barry, stop stealing my ideas! :) I've argued for these default settings for days.

ver   delivery    randomized hashing
==========================================
2.3   patch       disabled by default
2.4   patch       disabled
2.5   patch       disabled
2.6   release     disabled
2.7   release     disabled
3.0   ignore?     disabled
3.1   release     disabled
3.2   release     disabled
3.3   n/a yet     enabled by default

2.3 to 2.5 are still used in production (RHEL, Ubuntu LTS). Guido has stated that he needs a patch for 2.4, too. I think we may safely ignore Python 3.0. Nobody should use Python 3.0 on a production system. I've suggested the env var PYRANDOMHASH. It's easy to set env vars in Apache. For example Debian/Ubuntu has /etc/apache2/envvars.
Settings for PYRANDOMHASH:

PYRANDOMHASH=1              enable randomized hashing function
PYRANDOMHASH=/path/to/seed  enable randomized hashing function and read seed from 'seed'
PYRANDOMHASH=0              disable randomized hashing function

Since there isn't an easy way to set env vars in a shebang line (something like #!/usr/bin/env PYRANDOMHASH=1 python2.7 doesn't work), we could come up with a solution for the shebang. IMHO the default setting should be a compile time option. It's reasonably easy to extend the configure script to support --enable-randomhash / --disable-randomhash. The MS VC build scripts can grow a flag, too. I still think that the topic needs a PEP. A couple of days ago I started with a PEP. But Guido told me that he doesn't see a point in a PEP because he prefers a small and quick solution, so I stopped working on it. However the arguments, worries and ideas in this enormous topic have repeated over and over. We know from experience that a PEP is a great way to explain the how, what and why of the change as well as the paths we didn't take. Christian From neologix at free.fr Thu Jan 5 22:44:26 2012 From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Thu, 5 Jan 2012 22:44:26 +0100 Subject: [Python-Dev] usefulness of Python version of threading.RLock Message-ID: Hi, Issue #13697 (http://bugs.python.org/issue13697) deals with a problem with the Python version of threading.RLock (a signal handler which tries to acquire the same RLock is called right at the wrong time) which doesn't affect the C version. Whether such a use case can be considered good practice or the best way to fix this is not settled yet, but the question that arose to me is: "why do we have both a C and Python version?". Here's Antoine's answer (he suggested to me to bring this up on python-dev): """ The C version is quite recent, and there's a school of thought that we should always provide fallback Python implementations.
(also, arguably a Python implementation makes things easier to prototype, although I don't think it's the case for an RLock) """ So, what do you guys think? Would it be okay to nuke the Python version? Do you have more details on this "school of thought"? Also, while we're at it, Victor created #13550 to try to rewrite the "logging hack" of the threading module: there again, I think we could just remove this logging altogether. What do you think? Cheers, cf From lists at cheimes.de Thu Jan 5 22:46:06 2012 From: lists at cheimes.de (Christian Heimes) Date: Thu, 05 Jan 2012 22:46:06 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0603BB.2030204@stoneleaf.us> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> <4F0603BB.2030204@stoneleaf.us> Message-ID: <4F061A1E.4050601@cheimes.de> Am 05.01.2012 21:10, schrieb Ethan Furman: > Tres Seaver wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 01/05/2012 02:14 PM, Glenn Linderman wrote: >>> 1) the security problem is not in CPython, but rather in web servers >>> that use dict inappropriately. >> >> Most webapp vulnerabilities are due to their use of Python's cgi module, >> which it uses a dict to hold the form / query string data being supplied >> by untrusted external users. > > And Glenn suggested further down that an appropriate course of action > would be to fix the cgi module (and others) instead of messing with dict. You'd have to fix any Python core module that may handle data from untrusted sources. The issue isn't limited to web apps and POST requests. It's possible to trigger the DoS from JSON, a malicious PDF, JPEG's EXIF metadata or any other data. Oh, and somebody has to fix all 3rd party modules, too. 
Christian From solipsis at pitrou.net Thu Jan 5 22:59:59 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 5 Jan 2012 22:59:59 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <20120105154558.1a9c95df@resist.wooz.org> <4F0618EA.3080405@cheimes.de> Message-ID: <20120105225959.6e9dd89f@pitrou.net> On Thu, 05 Jan 2012 22:40:58 +0100 Christian Heimes wrote: > Am 05.01.2012 21:45, schrieb Barry Warsaw: > > This sounds like a reasonable compromise for all stable Python releases. It > > can be turned on by default for Python 3.3. If you also make the default > > setting easy to change (i.e. parameterized in one place), then distros can > > make their own decision about the default, although I'd argue for the above > > default approach for Debian/Ubuntu. > > Hey Barry, stop stealing my ideas! :) I've argued for these default > settings for days.

> ver  delivery  randomized hashing
> ==========================================
> 2.3  patch     disabled by default
> 2.4  patch     disabled
> 2.5  patch     disabled
> 2.6  release   disabled
> 2.7  release   disabled
> 3.0  ignore?   disabled
> 3.1  release   disabled
> 3.2  release   disabled
> 3.3  n/a yet   enabled by default

I don't think we (python-dev) are really concerned with 2.3, 2.4, 2.5 and 3.0. They're all unsupported, and people do what they want with their local source trees. Regards Antoine.
From ericsnowcurrently at gmail.com Thu Jan 5 23:02:42 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 5 Jan 2012 15:02:42 -0700 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: 2012/1/5 Charles-François Natali : > Hi, > > Issue #13697 (http://bugs.python.org/issue13697) deals with a problem > with the Python version of threading.RLock (a signal handler which > tries to acquire the same RLock is called right at the wrong time) > which doesn't affect the C version. > Whether such a use case can be considered good practise or the best > way to fix this is not settled yet, but the question that arose to me > is: "why do we have both a C and Python version?". > Here's Antoine's answer (he suggested to me to bring this up on python-dev): > """ > The C version is quite recent, and there's a school of thought that we > should always provide fallback Python implementations. > (also, arguably a Python implementation makes things easier to > prototype, although I don't think it's the case for an RLock) > """ > > So, what do you guys think? > Would it be okay to nuke the Python version? > Do you have more details on this "school of thought"? From what I understand, the biggest motivation for pure Python versions is cooperation with the other Python implementations. See http://www.python.org/dev/peps/pep-0399/ -eric > > Also, while we're at it, Victor created #13550 to try to rewrite the > "logging hack" of the threading module: there again, I think we could > just remove this logging altogether. What do you think?
> > Cheers, > > cf > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com From lists at cheimes.de Thu Jan 5 23:11:41 2012 From: lists at cheimes.de (Christian Heimes) Date: Thu, 05 Jan 2012 23:11:41 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120105225959.6e9dd89f@pitrou.net> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <20120105154558.1a9c95df@resist.wooz.org> <4F0618EA.3080405@cheimes.de> <20120105225959.6e9dd89f@pitrou.net> Message-ID: <4F06201D.7010302@cheimes.de> Am 05.01.2012 22:59, schrieb Antoine Pitrou: > I don't think we (python-dev) are really concerned with 2.3, 2.4, > 2.5 and 3.0. They're all unsupported, and people do what they want > with their local source trees. Let me reply with a quote from Barry: > Correct, although there's no reason why a patch for versions > older than 2.6 couldn't be included on a python.org security > page for reference in CVE or other security notifications. > Distros that care about versions older than Python 2.6 will > basically be back-porting the patch anyway. 
Christian From storchaka at gmail.com Thu Jan 5 23:15:31 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 06 Jan 2012 00:15:31 +0200 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F05F6AB.3060704@g.nevcal.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: 05.01.12 21:14, Glenn Linderman wrote: > So, fixing the vulnerable packages could be a sufficient response, > rather than changing the hash function. How to fix? Each of those > above allocates and returns a dict. Simply have each of those allocate > and return a wrapped dict, which has the following behaviors: > > i) during __init__, create a local, random string. > ii) for all key values, prepend the string, before passing it to the > internal dict. Good idea. -------------- next part -------------- A non-text attachment was scrubbed... Name: SafeDict.py Type: text/x-python Size: 1923 bytes Desc: not available URL: From solipsis at pitrou.net Thu Jan 5 23:17:18 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 5 Jan 2012 23:17:18 +0100 Subject: [Python-Dev] usefulness of Python version of threading.RLock References: Message-ID: <20120105231718.3765b5f2@pitrou.net> On Thu, 5 Jan 2012 15:02:42 -0700 Eric Snow wrote: > 2012/1/5 Charles-François Natali : > > Hi, > > > > Issue #13697 (http://bugs.python.org/issue13697) deals with a problem > > with the Python version of threading.RLock (a signal handler which > > tries to acquire the same RLock is called right at the wrong time) > > which doesn't affect the C version.
> > Whether such a use case can be considered good practise or the best > > way to fix this is not settled yet, but the question that arose to me > > is: "why do we have both a C and Python version?". > > Here's Antoine's answer (he suggested to me to bring this up on python-dev): > > """ > > The C version is quite recent, and there's a school of thought that we > > should always provide fallback Python implementations. > > (also, arguably a Python implementation makes things easier to > > prototype, although I don't think it's the case for an RLock) > > """ > > > > So, what do you guys think? > > Would it be okay to nuke the Python version? > > Do you have more details on this "school of thought"? > > From what I understand, the biggest motivation for pure Python > versions is cooperation with the other Python implementations. See > http://www.python.org/dev/peps/pep-0399/ Apologies, I didn't remember it was written down in a PEP. A bit more than a school of thought, then :-) Regards Antoine. From tjreedy at udel.edu Fri Jan 6 00:55:58 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 05 Jan 2012 18:55:58 -0500 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: On 1/5/2012 3:01 PM, Paul Smedley wrote:

>> File "./setup.py", line 1154, in detect_modules
>>   for arg in sysconfig.get_config_var("__CONFIG_ARGS").split()]
>> AttributeError: 'NoneType' object has no attribute 'split'
>> make: *** [sharedmods] Error 1

> File "./setup.py", line 1368, in detect_modules
>   if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"):
> TypeError: argument of type 'NoneType' is not iterable
> make: *** [sharedmods] Error 1
>
> Which again points to problems with
> sysconfig.get_config_var("CONFIG_ARGS"):

[The earlier call was with "__CONFIG_ARGS", for whatever difference that makes.] It appears to be returning None instead of [] (or a populated list).
In 3.2.2, at line 579 of sysconfig.py is

def get_config_var(name):
    return get_config_vars().get(name)

That defaults to None if name is not a key in the dict returned by get_config_vars(). My guess is that it always is and the value is always a list for tested win/*nix/mac systems. So either setup.py has the bug of assuming that there is always a list value for "CONFIG_ARGS" or sysconfig.py has the bug of not setting it for os2, perhaps because of a bug elsewhere. At line 440 of sysconfig.py is

def get_config_vars(*args):
    global _CONFIG_VARS
    if _CONFIG_VARS is None:
        _CONFIG_VARS = {}
        if os.name in ('nt', 'os2'):
            _init_non_posix(_CONFIG_VARS)
    if args:
        vals = []
        for name in args:
            vals.append(_CONFIG_VARS.get(name))
        return vals
    else:
        return _CONFIG_VARS

At 456 is

def _init_non_posix(vars):
    """Initialize the module as appropriate for NT"""
    # set basic install directories
    ...

"CONFIG_ARGS" is not set explicitly for any system anywhere in the file, so I do not know how the call ever works. -- Terry Jan Reedy From ncoghlan at gmail.com Fri Jan 6 01:10:52 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 6 Jan 2012 10:10:52 +1000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka wrote: > 05.01.12 21:14, Glenn Linderman wrote: >> >> So, fixing the vulnerable packages could be a sufficient response, >> rather than changing the hash function. How to fix? Each of those >> above allocates and returns a dict. Simply have each of those allocate >> and return a wrapped dict, which has the following behaviors: >> >> i) during __init__, create a local, random string.
>> ii) for all key values, prepend the string, before passing it to the >> internal dict. > > > Good idea. Not a good idea - a lot of the 3rd party tests that depend on dict ordering are going to be using those modules anyway, so scattering our solution across half the standard library is needlessly creating additional work without really reducing the incompatibility problem. If we're going to change anything, it may as well be the string hashing algorithm itself. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Fri Jan 6 01:11:22 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 05 Jan 2012 19:11:22 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0603BB.2030204@stoneleaf.us> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> <4F0603BB.2030204@stoneleaf.us> Message-ID: On 1/5/2012 3:10 PM, Ethan Furman wrote: > Tres Seaver wrote: >>> 1) the security problem is not in CPython, but rather in web servers >>> that use dict inappropriately. >> >> Most webapp vulnerabilities are due to their use of Python's cgi module, >> which it uses a dict to hold the form / query string data being supplied >> by untrusted external users. > > And Glenn suggested further down that an appropriate course of action > would be to fix the cgi module (and others) instead of messing with dict. I think both should be done. For web applications, it would be best to reject DOS attempts with 'random' keys in O(1) time rather than in O(n) time even with an improved hash. But in some other apps, like the Python interpreter itself, 'random' names may be quite normal.
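Glenn's wrapped-dict idea quoted above (a per-instance random string prepended to every key) can be sketched roughly as follows; the class and method names are illustrative and not taken from the SafeDict.py attachment:

```python
import os

class SafeDict:
    """Sketch of the wrapped-dict idea: a random per-instance prefix is
    added to every (string) key, so externally supplied keys cannot be
    precomputed to collide inside the inner dict."""

    def __init__(self):
        # i) during __init__, create a local, random string
        self._salt = os.urandom(8).hex()
        self._d = {}

    def _mangle(self, key):
        # ii) prepend the random string before touching the inner dict
        return self._salt + key

    def __setitem__(self, key, value):
        self._d[self._mangle(key)] = value

    def __getitem__(self, key):
        return self._d[self._mangle(key)]

    def __contains__(self, key):
        return self._mangle(key) in self._d

    def __len__(self):
        return len(self._d)
```

Nick's objection still applies to a sketch like this: it only protects the modules that remember to use it, which is why the thread moves toward changing the string hash itself.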
-- Terry Jan Reedy From steve at pearwood.info Fri Jan 6 01:07:27 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 06 Jan 2012 11:07:27 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <1325792005.2123.11.camel@surprise> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> Message-ID: <4F063B3F.9030903@pearwood.info> David Malcolm wrote: > When backporting the fix to ancient python versions, I'm inclined to > turn the change *off* by default, requiring the change to be enabled via > an environment variable: I want to avoid breaking existing code, even if > such code is technically relying on non-guaranteed behavior. But we > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*. > That way /usr/bin/python would default to the old behavior, but web apps > would have some protection. Any such logic here also suggests the need > for an attribute in the sys module so that you can verify the behavior. Surely the way to verify the behaviour is to run this from the shell: python -c print(hash("abcde")) twice, and see that the calls return different values. (Or have I misunderstood the way the fix is going to work?) In any case, I wouldn't want to rely on the presence of a flag in the sys module to verify the behaviour, I'd want to see for myself that hash collisions are no longer predictable. 
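Steven's run-it-twice check can be scripted along these lines; this is a sketch, and the PYTHONHASHSEED variable used here is the spelling that later CPython releases adopted, which was still under discussion at the time of this thread:

```python
import os
import subprocess
import sys

def hash_in_fresh_process(s, seed=None):
    """Spawn a fresh interpreter and return hash(s) as computed there.
    If `seed` is given, it is passed via the PYTHONHASHSEED env var."""
    env = dict(os.environ)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % s], env=env)
    return int(out)

# Randomisation is in effect if two fresh processes disagree:
#   hash_in_fresh_process("abcde") != hash_in_fresh_process("abcde")
```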
-- Steven From barry at python.org Fri Jan 6 01:31:28 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 5 Jan 2012 19:31:28 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0618EA.3080405@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <20120105154558.1a9c95df@resist.wooz.org> <4F0618EA.3080405@cheimes.de> Message-ID: <20120105193128.0ad39332@limelight.wooz.org> On Jan 05, 2012, at 10:40 PM, Christian Heimes wrote: >Hey Barry, stop stealing my ideas! :) I've argued for these default >settings for days. :) >I've suggested the env var PYRANDOMHASH. It's easy to set env vars in >Apache. For example Debian/Ubuntu has /etc/apache2/envvars. For consistency, it really should be PYTHONSOMETHING. I personally don't care how long it is (e.g. PYTHONIOENCODING). >Settings for PYRANDOMHASH: > > PYRANDOMHASH=1 > enable randomized hashing function > > PYRANDOMHASH=/path/to/seed > enable randomized hashing function and read seed from 'seed' > > PYRANDOMHASH=0 > disable randomed hashing function Why not PYTHONHASHSEED then? >Since there isn't an easy way to set env vars in a shebang line since >something like > > #!/usr/bin/env PYRANDOMHASH=1 python2.7 > >doesn't work, we could come up with a solution the shebang. We have precedence for mirroring startup options and envars, so it doesn't bother me to add such a switch to Python 3.3. It *does* bother me to add a switch to any stable release. >IMHO the setting for the default setting should be a compile time >option. It's reasonable easy to extend the configure script to support >--enable-randomhash / --disable-randomhash. The MS VC build scripts can >grow a flag, too. > >I still think that the topic needs a PEP. 
>A couple of days ago I started >with a PEP. But Guido told me that he doesn't see a point in a PEP >because he prefers a small and quick solution, so I stopped working on >it. However the arguments, worries and ideas in this enormous topic have >repeated over and over. We know from experience that a PEP is a great >way to explain the how, what and why of the change as well as the paths >we didn't take. One way to look at it is to have a quick-and-dirty solution for stable releases. It could be suboptimal from a UI point of view because of backward compatibility issues. The PEP could then outline the boffo perfect solution for Python 3.3, with a section on how it will be backported to stable releases. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From ncoghlan at gmail.com Fri Jan 6 01:34:55 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 6 Jan 2012 10:34:55 +1000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F063B3F.9030903@pearwood.info> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> Message-ID: On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: > Surely the way to verify the behaviour is to run this from the shell: > > python -c print(hash("abcde")) > > twice, and see that the calls return different values. (Or have I > misunderstood the way the fix is going to work?) > > In any case, I wouldn't want to rely on the presence of a flag in the sys > module to verify the behaviour, I'd want to see for myself that hash > collisions are no longer predictable.
More directly, you can just check that the hash of the empty string is non-zero. So -1 for a flag in the sys module - "hash('') != 0" should serve as a sufficient check whether or not process-level string hash randomisation is in effect. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at gmail.com Fri Jan 6 01:46:58 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 6 Jan 2012 01:46:58 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <20120105193128.0ad39332@limelight.wooz.org> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <20120105154558.1a9c95df@resist.wooz.org> <4F0618EA.3080405@cheimes.de> <20120105193128.0ad39332@limelight.wooz.org> Message-ID: 2012/1/6 Barry Warsaw : >>Settings for PYRANDOMHASH: >> >> PYRANDOMHASH=1 >> enable randomized hashing function >> >> PYRANDOMHASH=/path/to/seed >> enable randomized hashing function and read seed from 'seed' >> >> PYRANDOMHASH=0 >> disable randomized hashing function > > Why not PYTHONHASHSEED then? See my patch attached to the issue #13703. I prepared the code to be able to easily set the hash seed (it has an LCG; its seed can be provided by the user directly). I agree that the value 0 should give the same behaviour as the current hash (disable the randomized hash). I will add the variable in the next version of my patch.
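The startup logic Victor describes (an env var that can disable randomization, fix the seed reproducibly, or leave it random) might look roughly like this. CPython does this in C during interpreter startup, so the function name and details here are purely illustrative:

```python
import os
import random

def choose_hash_seed():
    """Illustrative sketch of seed selection: "0" disables
    randomization, a positive integer fixes the seed reproducibly,
    and an unset or non-numeric value yields a fresh random seed
    for each process."""
    raw = os.environ.get("PYTHONHASHSEED")
    if raw == "0":
        return 0  # randomization disabled; hashes match historical values
    if raw is not None and raw.isdigit():
        return int(raw)  # reproducible, user-chosen seed
    return random.SystemRandom().getrandbits(64)  # unique per process
```

A fixed seed matters for the fork()/exec() point Guido raised earlier in the thread: worker processes that must agree on hashes can be given the same seed explicitly.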
From lists at cheimes.de Fri Jan 6 01:50:00 2012 From: lists at cheimes.de (Christian Heimes) Date: Fri, 06 Jan 2012 01:50:00 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> Message-ID: <4F064538.3080407@cheimes.de> Am 06.01.2012 01:34, schrieb Nick Coghlan: > On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: >> Surely the way to verify the behaviour is to run this from the shell: >> >> python -c print(hash("abcde")) >> >> twice, and see that the calls return different values. (Or have I >> misunderstood the way the fix is going to work?) >> >> In any case, I wouldn't want to rely on the presence of a flag in the sys >> module to verify the behaviour, I'd want to see for myself that hash >> collisions are no longer predictable. > > More directly, you can just check that the hash of the empty string is non-zero. > > So -1 for a flag in the sys module - "hash('') != 0" should serve as a > sufficient check whether or not process-level string hash > randomisation is in effect. This might not work as we have to special case empty strings and perhaps \0 strings, too. Otherwise we would give away the random seed to an attacker if an attacker can somehow get hold of hash('') or hash(n * '\0'). 
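Christian's leak concern can be illustrated with a toy seeded variant of the era's string hash (a simplification for illustration, not the actual patch): without the guard, the hash of a degenerate input is a trivially invertible function of the secret seed.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # pretend 64-bit hash values

def toy_hash(s, seed):
    """Toy seeded string hash, loosely shaped like the pre-randomization
    CPython algorithm. Illustrative only."""
    if not s:
        # Special case: without this, hash('') would expose a value
        # derived almost directly from the secret seed.
        return 0
    x = (seed ^ (ord(s[0]) << 7)) & MASK
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & MASK
    x ^= len(s)
    return x
```

The same reasoning applies to runs of '\0' characters: any input whose per-character mixing contributes nothing predictable-free lets an attacker solve for the seed, which is why those cases need special handling.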
Christian From benjamin at python.org Fri Jan 6 01:59:49 2012 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 5 Jan 2012 18:59:49 -0600 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> Message-ID: 2012/1/5 Nick Coghlan : > On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: >> Surely the way to verify the behaviour is to run this from the shell: >> >> python -c print(hash("abcde")) >> >> twice, and see that the calls return different values. (Or have I >> misunderstood the way the fix is going to work?) >> >> In any case, I wouldn't want to rely on the presence of a flag in the sys >> module to verify the behaviour, I'd want to see for myself that hash >> collisions are no longer predictable. > > More directly, you can just check that the hash of the empty string is non-zero. > > So -1 for a flag in the sys module - "hash('') != 0" should serve as a > sufficient check whether or not process-level string hash > randomisation is in effect. What exactly is the disadvantage of a sys attribute? That would seem preferable to an obscure incarnation like that. 
-- Regards, Benjamin From solipsis at pitrou.net Fri Jan 6 01:59:10 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 6 Jan 2012 01:59:10 +0100 Subject: [Python-Dev] Hash collision security issue (now public) References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> <4F064538.3080407@cheimes.de> Message-ID: <20120106015910.2a5dea28@pitrou.net> On Fri, 06 Jan 2012 01:50:00 +0100 Christian Heimes wrote: > Am 06.01.2012 01:34, schrieb Nick Coghlan: > > On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: > >> Surely the way to verify the behaviour is to run this from the shell: > >> > >> python -c print(hash("abcde")) > >> > >> twice, and see that the calls return different values. (Or have I > >> misunderstood the way the fix is going to work?) > >> > >> In any case, I wouldn't want to rely on the presence of a flag in the sys > >> module to verify the behaviour, I'd want to see for myself that hash > >> collisions are no longer predictable. > > > > More directly, you can just check that the hash of the empty string is non-zero. > > > > So -1 for a flag in the sys module - "hash('') != 0" should serve as a > > sufficient check whether or not process-level string hash > > randomisation is in effect. > > This might not work as we have to special case empty strings and perhaps > \0 strings, too. The special case value doesn't have to be zero. Make it age(Barry) for example (which, I think, is still representable in a 32-bit integer!). Regards Antoine. 
From ncoghlan at gmail.com Fri Jan 6 02:33:50 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 6 Jan 2012 11:33:50 +1000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> Message-ID: On Fri, Jan 6, 2012 at 10:59 AM, Benjamin Peterson wrote: > What exactly is the disadvantage of a sys attribute? That would seem > preferable to an obscure incarnation like that. Adding sys attributes in maintenance (or security) releases makes me nervous. However, Victor and Christian are right about the need for a special case to avoid leaking information, so my particular suggested check won't work. The most robust check would be to run sys.executable in a subprocess and check if it gives the same hash for a non-empty string as the current process. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From steve at pearwood.info Fri Jan 6 02:52:45 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 06 Jan 2012 12:52:45 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> Message-ID: <4F0653ED.40204@pearwood.info> Benjamin Peterson wrote: > 2012/1/5 Nick Coghlan : >> On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano wrote: >>> Surely the way to verify the behaviour is to run this from the shell: >>> >>> python -c print(hash("abcde")) >>> >>> twice, and see that the calls return different values. (Or have I >>> misunderstood the way the fix is going to work?) >>> >>> In any case, I wouldn't want to rely on the presence of a flag in the sys >>> module to verify the behaviour, I'd want to see for myself that hash >>> collisions are no longer predictable. >> More directly, you can just check that the hash of the empty string is non-zero. >> >> So -1 for a flag in the sys module - "hash('') != 0" should serve as a >> sufficient check whether or not process-level string hash >> randomisation is in effect. > > What exactly is the disadvantage of a sys attribute? That would seem > preferable to an obscure incarnation like that. There's nothing obscure about directly testing the hash. That's about as far from obscure as it is possible to get: you are directly testing the presence of a feature by testing the feature. Relying on a flag to tell you whether hashes are randomised adds additional complexity: now you need to care about whether hashes are randomised AND know that there is a flag you can look up and what it is called. 
And since the flag won't exist in all versions of Python, or even in all builds of a particular Python version, it isn't a matter of just testing the flag, but of doing the try...except or hasattr() dance to check whether it exists first. At some point, presuming that there is no speed penalty, the behaviour will surely become not just enabled by default but mandatory. Python has never promised that hashes must be predictable or consistent, so apart from backwards compatibility concerns for old versions, future versions of Python should make it mandatory. Presuming that there is no speed penalty, I'd argue in favour of making it mandatory for 3.3. Why do we need a flag for something that is going to be always on? -- Steven From benjamin at python.org Fri Jan 6 03:04:34 2012 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 5 Jan 2012 20:04:34 -0600 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0653ED.40204@pearwood.info> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> <4F0653ED.40204@pearwood.info> Message-ID: 2012/1/5 Steven D'Aprano : > Benjamin Peterson wrote: >> >> 2012/1/5 Nick Coghlan : >>> >>> On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano >>> wrote: >>>> >>>> Surely the way to verify the behaviour is to run this from the shell: >>>> >>>> python -c print(hash("abcde")) >>>> >>>> twice, and see that the calls return different values. (Or have I >>>> misunderstood the way the fix is going to work?) >>>> >>>> In any case, I wouldn't want to rely on the presence of a flag in the >>>> sys >>>> module to verify the behaviour, I'd want to see for myself that hash >>>> collisions are no longer predictable. 
>>> >>> More directly, you can just check that the hash of the empty string is >>> non-zero. >>> >>> So -1 for a flag in the sys module - "hash('') != 0" should serve as a >>> sufficient check whether or not process-level string hash >>> randomisation is in effect. >> >> >> What exactly is the disadvantage of a sys attribute? That would seem >> preferable to an obscure incarnation like that. > > > There's nothing obscure about directly testing the hash. That's about as far > from obscure as it is possible to get: you are directly testing the presence > of a feature by testing the feature. It's obscure because hash('') != 0 doesn't necessarily mean the hashes are randomized. A different hashing algorithm could be in effect. -- Regards, Benjamin From lists at cheimes.de Fri Jan 6 03:09:55 2012 From: lists at cheimes.de (Christian Heimes) Date: Fri, 06 Jan 2012 03:09:55 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> <4F0653ED.40204@pearwood.info> Message-ID: <4F0657F3.20607@cheimes.de> Am 06.01.2012 03:04, schrieb Benjamin Peterson: > It's obscure because hash('') != 0 doesn't necessarily mean the hashes > are randomized. A different hashing algorithm could be in effect. Also in 1 of 2**32 or 2**64 tries hash('') is 0 although randomizing is active. 
Christian From robertc at robertcollins.net Fri Jan 6 03:43:32 2012 From: robertc at robertcollins.net (Robert Collins) Date: Fri, 6 Jan 2012 15:43:32 +1300 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: <20120105231718.3765b5f2@pitrou.net> References: <20120105231718.3765b5f2@pitrou.net> Message-ID: On Fri, Jan 6, 2012 at 11:17 AM, Antoine Pitrou wrote: >> From what I understand, the biggest motivation for pure Python >> versions is cooperation with the other Python implementations. See >> http://www.python.org/dev/peps/pep-0399/ > > Apologies, I didn't remember it was written down in a PEP. > A bit more than a school of thought, then :-) It needs to be correct to aid other implementations though, doesn't it? Copying/reusing something buggy won't help... -Rob From steve at pearwood.info Fri Jan 6 04:08:10 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 06 Jan 2012 14:08:10 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> <4F0653ED.40204@pearwood.info> Message-ID: <4F06659A.6050704@pearwood.info> Benjamin Peterson wrote: > 2012/1/5 Steven D'Aprano : [...] >> There's nothing obscure about directly testing the hash. That's about as far >> from obscure as it is possible to get: you are directly testing the presence >> of a feature by testing the feature. > > It's obscure because hash('') != 0 doesn't necessarily mean the hashes > are randomized. A different hashing algorithm could be in effect. Fair point, but I didn't actually suggest testing hash('') != 0, that was Nick's suggestion, which he's since withdrawn.
-- Steven From v+python at g.nevcal.com Fri Jan 6 04:46:53 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 05 Jan 2012 19:46:53 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0653ED.40204@pearwood.info> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> <4F0653ED.40204@pearwood.info> Message-ID: <4F066EAD.8010307@g.nevcal.com> On 1/5/2012 5:52 PM, Steven D'Aprano wrote: > > At some point, presuming that there is no speed penalty, the behaviour > will surely become not just enabled by default but mandatory. Python > has never promised that hashes must be predictable or consistent, so > apart from backwards compatibility concerns for old versions, future > versions of Python should make it mandatory. Presuming that there is > no speed penalty, I'd argue in favour of making it mandatory for 3.3. > Why do we need a flag for something that is going to be always on? I think the whole paragraph is invalid, because it presumes there is no speed penalty. I presume there will be a speed penalty, until benchmarking shows otherwise. -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Fri Jan 6 07:11:37 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 6 Jan 2012 17:11:37 +1100 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: I'm pretty sure the Python version of RLock is in use in several alternative implementations that provide an alternative _thread.lock. I think gevent would fall into this camp, as well as a personal project of mine in a similar vein that operates on python3. 
2012/1/6 Charles-François Natali > Hi, > > Issue #13697 (http://bugs.python.org/issue13697) deals with a problem > with the Python version of threading.RLock (a signal handler which > tries to acquire the same RLock is called right at the wrong time) > which doesn't affect the C version. > Whether such a use case can be considered good practice or the best > way to fix this is not settled yet, but the question that arose to me > is: "why do we have both a C and Python version?". > Here's Antoine's answer (he suggested to me to bring this up on python-dev): > """ > The C version is quite recent, and there's a school of thought that we > should always provide fallback Python implementations. > (also, arguably a Python implementation makes things easier to > prototype, although I don't think it's the case for an RLock) > """ > > So, what do you guys think? > Would it be okay to nuke the Python version? > Do you have more details on this "school of thought"? > > Also, while we're at it, Victor created #13550 to try to rewrite the > "logging hack" of the threading module: there again, I think we could > just remove this logging altogether. What do you think? > > Cheers, > > cf > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com > -- ?_? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From storchaka at gmail.com Fri Jan 6 07:41:17 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 06 Jan 2012 08:41:17 +0200 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: 06.01.12 02:10, Nick Coghlan wrote: > Not a good idea - a lot of the 3rd party tests that depend on dict > ordering are going to be using those modules anyway, so scattering our > solution across half the standard library is needlessly creating > additional work without really reducing the incompatibility problem. > If we're going to change anything, it may as well be the string > hashing algorithm itself. Changing the string hashing algorithm will hit the general performance and also will break any code that depends on dict ordering. A specialized dict slows down only the needed parts of some applications. 
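The "minimal change" being weighed in this exchange — folding a process-wide random seed into the existing algorithm rather than replacing it — can be sketched in pure Python. The constants follow CPython's classic string hash; exactly where the seed is mixed in is an assumption for illustration, not a quote of the actual patch:

```python
import random

_HASH_SEED = random.getrandbits(64)  # picked once, early in the process
_MASK = (1 << 64) - 1                # emulate 64-bit C arithmetic

def seeded_str_hash(s, seed=None):
    """Sketch of CPython's classic string hash with a seed folded in up
    front.  The per-character loop is untouched: the seed costs one extra
    mix at the start, which is why the overhead should be negligible."""
    if seed is None:
        seed = _HASH_SEED
    if not s:
        return 0  # the empty string is special-cased to hash to 0
    x = (seed ^ (ord(s[0]) << 7)) & _MASK
    for c in s:
        x = ((1000003 * x) ^ ord(c)) & _MASK
    return (x ^ len(s)) & _MASK
```

Within one process the seed is fixed, so dicts behave normally; across processes the seed differs, so an attacker's precomputed colliding keys stop lining up.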
From paul at smedley.id.au Fri Jan 6 08:12:46 2012 From: paul at smedley.id.au (Paul Smedley) Date: Fri, 06 Jan 2012 17:42:46 +1030 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: Hi Terry, On 06/01/12 10:25, Terry Reedy wrote: > On 1/5/2012 3:01 PM, Paul Smedley wrote: > >>> File "./setup.py", line 1154, in detect_modules >>> for arg in sysconfig.get_config_var("__CONFIG_ARGS").split()] >>> AttributeError: 'NoneType' object has no attribute 'split' >>> make: *** [sharedmods] Error 1 > >> File "./setup.py", line 1368, in detect_modules >> if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"): >> TypeError: argument of type 'NoneType' is not iterable >> make: *** [sharedmods] Error 1 >> >> Which again points to problems with >> sysconfig.get_config_var("CONFIG_ARGS"): > > [The earlier call was with "__CONFIG_ARGS", for whatever difference that > makes.] It appears to be returning None instead of [] (or a populated > list). > > In 3.2.2, at line 579 of sysconfig.py is > def get_config_var(name): > return get_config_vars().get(name) > > That defaults to None if name is not a key in the dict returned by > get_config_vars(). My guess is that it always is and and the the value > is always a list for tested win/*nix/mac systems. So either setup.py has > the bug of assuming that there is always a list value for "CONFIG_ARGS" > or sysconfig.py has the bug of not setting it for os2, perhaps because > of a bug elsewhere. > > At line 440 of sysconfig.py is > def get_config_var(*args): > global _CONFIG_VARS > if _CONFIG_VARS is None: > _CONFIG_VARS = {} > > if os.name in ('nt', 'os2'): > _init_non_posix(_CONFIG_VARS) > if args: > vals = [] > for name in args: > vals.append(_CONFIG_VARS.get(name)) > return vals > else: > return _CONFIG_VARS > > At 456 is > def _init_non_posix(vars): > """Initialize the module as appropriate for NT""" > # set basic install directories > ... 
> > "CONFIG_ARGS" is not set explicitly for any system anywhere in the file, > so I do not know how the call ever works. This looks pretty much the same as the code in 2.7.2 - I don't understand Python code well enough to debug the script :( Thanks for the response, Paul From paul at smedley.id.au Fri Jan 6 09:52:38 2012 From: paul at smedley.id.au (Paul Smedley) Date: Fri, 06 Jan 2012 19:22:38 +1030 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 Message-ID: Hi All, I'm a little slow in responding to http://blog.python.org/2011/05/python-33-to-drop-support-for-os2.html, but I'm interested in stepping up to help maintain OS/2 support in Python 3.3 and above. I've been building Python 2.x for a while, and currently have binaries of 2.6.5 available from http://os2ports.smedley.info Unlike Andrew Mcintyre, I'm using libc for development (http://svn.netlabs.org/libc) rather than emx. libc is still being developed whereas emx hasn't been updated in about 10 years. I haven't attempted a build of 3.x yet, but will grab the latest 3.x release and see what it takes to get it building here. I expect I'll hit the same problem with sysconfig.get_config_var("CONFIG_ARGS"): as with 2.7.2 but we'll wait and see. 
Cheers, Paul From mark at hotpy.org Fri Jan 6 10:18:39 2012 From: mark at hotpy.org (Mark Shannon) Date: Fri, 06 Jan 2012 09:18:39 +0000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: <4F06BC6F.8040206@hotpy.org> Serhiy Storchaka wrote: > 06.01.12 02:10, Nick Coghlan wrote: >> Not a good idea - a lot of the 3rd party tests that depend on dict >> ordering are going to be using those modules anyway, so scattering our >> solution across half the standard library is needlessly creating >> additional work without really reducing the incompatibility problem. >> If we're going to change anything, it may as well be the string >> hashing algorithm itself. > > Changing the string hashing algorithm will hit the general performance > and also will break any code that depends on dict ordering. > A specialized dict slows down only the needed parts of some applications. The minimal proposed change of seeding the hash from a global value (a single memory read and an addition) will have such a minimal performance effect that it will be undetectable even on the most noise-free testing environment. Cheers, Mark From sandro.tosi at gmail.com Fri Jan 6 10:29:40 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Fri, 6 Jan 2012 10:29:40 +0100 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #12042: a queue is only used to retrive results; preliminary patch by In-Reply-To: <4F06280D.9060102@udel.edu> References: <4F06280D.9060102@udel.edu> Message-ID: On Thu, Jan 5, 2012 at 23:45, Terry Reedy wrote: > On 1/5/2012 1:51 PM, sandro.tosi wrote: >> >> http://hg.python.org/cpython/rev/3353f9747a39 >> changeset: 
74282:3353f9747a39 >> branch: 2.7 > > >> Doc/whatsnew/2.6.rst | 4 ++-- > > > should that have been whatsnew/2.7.rst? The wording correction was in the 2.6 what's new, when describing multiprocessing (which was added in 2.6). -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From steve at pearwood.info Fri Jan 6 11:01:28 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 06 Jan 2012 21:01:28 +1100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F066EAD.8010307@g.nevcal.com> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> <4F0653ED.40204@pearwood.info> <4F066EAD.8010307@g.nevcal.com> Message-ID: <4F06C678.4060105@pearwood.info> Glenn Linderman wrote: > On 1/5/2012 5:52 PM, Steven D'Aprano wrote: >> >> At some point, presuming that there is no speed penalty, the behaviour >> will surely become not just enabled by default but mandatory. Python >> has never promised that hashes must be predictable or consistent, so >> apart from backwards compatibility concerns for old versions, future >> versions of Python should make it mandatory. Presuming that there is >> no speed penalty, I'd argue in favour of making it mandatory for 3.3. >> Why do we need a flag for something that is going to be always on? > > I think the whole paragraph is invalid, because it presumes there is no > speed penalty. I presume there will be a speed penalty, until > benchmarking shows otherwise. 
There *may* be a speed penalty, but I draw your attention to Paul McMillian's email on 1st of January: Empirical testing shows that this unoptimized python implementation produces ~10% slowdown in the hashing of ~20 character strings. and Christian Heimes' email on 3rd of January: The changeset adds the murmur3 hash algorithm with some minor changes, for example more random seeds. At first I was worried that murmur might be slower than our old hash algorithm. But in fact it seems to be faster! So I think that it's a fairly safe bet that there will be a solution that is as fast, or at worst, trivially slower, than the current hash function. But of course, benchmarks will be needed. -- Steven From victor.stinner at haypocalc.com Fri Jan 6 12:42:44 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 6 Jan 2012 12:42:44 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F06C678.4060105@pearwood.info> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <1325792005.2123.11.camel@surprise> <4F063B3F.9030903@pearwood.info> <4F0653ED.40204@pearwood.info> <4F066EAD.8010307@g.nevcal.com> <4F06C678.4060105@pearwood.info> Message-ID: Using my patch (random-2.patch), the overhead is 0%. I cannot see a difference with and without my patch. 
Numbers: --- unpatched: == 3 characters == 1 loops, best of 3: 459 usec per loop == 10 characters == 1 loops, best of 3: 575 usec per loop == 500 characters == 1 loops, best of 3: 1.36 msec per loop patched: == 3 characters == 1 loops, best of 3: 458 usec per loop == 10 characters == 1 loops, best of 3: 575 usec per loop == 500 characters == 1 loops, best of 3: 1.36 msec per loop --- (the patched version looks faster just because the timer is not reliable enough for such a fast test) Script: --- echo "== 3 characters ==" ./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' echo "== 10 characters ==" ./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' echo "== 500 characters ==" ./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in range(1,1000))' 'sum(hash(x) for x in text)' ./python -m timeit -n 1 -s 'text=(("%0500i" % x) --- (Take the smallest timing for each test) "-n 1" is needed because the hash value is only computed once (is cached). It may be possible to have more reliable results by disabling completely the hash cache (comment "PyUnicode_HASH(self) = x;" line). 
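One way to work around the caching Victor mentions without patching the interpreter is to hash freshly created string objects inside the timed statement, so every loop pays the full hashing cost. This is a rough sketch, not a substitute for his C-level measurement:

```python
import timeit

# str objects cache their hash, so timing hash() over the same objects only
# measures the first pass.  Concatenation builds a brand-new str each time,
# whose hash has never been computed, so each iteration does real work.
setup = "words = ['%03i' % x for x in range(1, 1000)]"
stmt = "sum(hash(w + '.') for w in words)"
t = timeit.timeit(stmt, setup=setup, number=100)
print("%.1f usec per pass" % (t / 100 * 1e6))
```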
Victor From solipsis at pitrou.net Fri Jan 6 13:42:45 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 6 Jan 2012 13:42:45 +0100 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 References: Message-ID: <20120106134245.399d34b4@pitrou.net> Hi Paul, > I'm a little slow in responding to > http://blog.python.org/2011/05/python-33-to-drop-support-for-os2.html, > but I'm interested in stepping up to help maintain OS/2 support in > Python 3.3 and above. > > I've been building Python 2.x for a while, and currently have binaries > of 2.6.5 available from http://os2ports.smedley.info > > Unlike Andrew Mcintyre, I'm using libc for development > (http://svn.netlabs.org/libc) rather than emx. libc is still being > developed whereas emx hasn't been updated in about 10 years. > > I haven't attempted a build of 3.x yet, but will grab the latest 3.x > release and see what it takes to get it building here. I would suggest you start from the Mercurial repository instead. There you'll find both the current stable branch (named "3.2") and the current development branch (named "default"). It will also make it easier for you to write and maintain patches. Let me point you to the devguide, even though it doesn't talk specifically about porting: http://docs.python.org/devguide/ Regards Antoine. From status at bugs.python.org Fri Jan 6 18:07:32 2012 From: status at bugs.python.org (Python tracker) Date: Fri, 6 Jan 2012 18:07:32 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20120106170732.732AB1DE8F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-12-30 - 2012-01-06) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 3180 ( +2) closed 22322 (+34) total 25502 (+36) Open issues with patches: 1366 Issues opened (24) ================== #13685: argparse does not sanitize help strings for % signs http://bugs.python.org/issue13685 opened by Jeff.Yurkiw #13686: Some notes on the docs of multiprocessing http://bugs.python.org/issue13686 opened by eli.bendersky #13689: fix CGI Web Applications with Python link in howto/urllib2 http://bugs.python.org/issue13689 opened by sandro.tosi #13691: pydoc help (or help('help')) claims to run a help utility; doe http://bugs.python.org/issue13691 opened by Devin Jeanpierre #13692: 2to3 mangles from . import frobnitz http://bugs.python.org/issue13692 opened by holmbie #13694: asynchronous connect in asyncore.dispatcher does not set addr http://bugs.python.org/issue13694 opened by anacrolix #13695: "type specific" to "type-specific" http://bugs.python.org/issue13695 opened by Retro #13697: python RLock implementation unsafe with signals http://bugs.python.org/issue13697 opened by rbcollins #13698: Mailbox module should support other mbox formats in addition t http://bugs.python.org/issue13698 opened by endolith #13700: imaplib.IMAP4.authenticate authobject fails with PLAIN mechani http://bugs.python.org/issue13700 opened by etukia #13701: Remove Decimal Python 2.3 Compatibility http://bugs.python.org/issue13701 opened by ramchandra.apte #13702: relative symlinks in tarfile.extract broken (windows) http://bugs.python.org/issue13702 opened by Patrick.von.Reth #13703: Hash collision security issue http://bugs.python.org/issue13703 opened by barry #13704: Random number generator in Python core http://bugs.python.org/issue13704 opened by christian.heimes #13706: non-ascii fill characters no longer work in formatting http://bugs.python.org/issue13706 opened by skrah #13708: Document ctypes.wintypes http://bugs.python.org/issue13708 opened by ramchandra.apte #13709: Capitalization mistakes in the documentation for ctypes 
http://bugs.python.org/issue13709 opened by ramchandra.apte #13712: pysetup create should not convert package_data to extra_files http://bugs.python.org/issue13712 opened by christian.heimes #13715: typo in unicodedata documentation http://bugs.python.org/issue13715 opened by eli.collins #13716: distutils doc contains lots of XXX http://bugs.python.org/issue13716 opened by flox #13718: Format Specification Mini-Language does not accept comma for p http://bugs.python.org/issue13718 opened by mkesper #13719: bdist_msi upload fails http://bugs.python.org/issue13719 opened by schmir #13720: argparse print_help() fails if COLUMNS is set to a low value http://bugs.python.org/issue13720 opened by zbysz #818201: distutils: clean does not use build_base option from build http://bugs.python.org/issue818201 reopened by eric.araujo Most recent 15 issues with no replies (15) ========================================== #13720: argparse print_help() fails if COLUMNS is set to a low value http://bugs.python.org/issue13720 #13718: Format Specification Mini-Language does not accept comma for p http://bugs.python.org/issue13718 #13715: typo in unicodedata documentation http://bugs.python.org/issue13715 #13708: Document ctypes.wintypes http://bugs.python.org/issue13708 #13691: pydoc help (or help('help')) claims to run a help utility; doe http://bugs.python.org/issue13691 #13689: fix CGI Web Applications with Python link in howto/urllib2 http://bugs.python.org/issue13689 #13682: Documentation of os.fdopen() refers to non-existing bufsize ar http://bugs.python.org/issue13682 #13668: mute ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 #13665: TypeError: string or integer address expected instead of str i http://bugs.python.org/issue13665 #13649: termios.ICANON is not documented http://bugs.python.org/issue13649 #13638: PyErr_SetFromErrnoWithFilenameObject is undocumented http://bugs.python.org/issue13638 #13633: Handling of hex character references in 
HTMLParser.handle_char http://bugs.python.org/issue13633 #13631: readline fails to parse some forms of .editrc under editline ( http://bugs.python.org/issue13631 #13608: remove born-deprecated PyUnicode_AsUnicodeAndSize http://bugs.python.org/issue13608 #13605: document argparse's nargs=REMAINDER http://bugs.python.org/issue13605 Most recent 15 issues waiting for review (15) ============================================= #13719: bdist_msi upload fails http://bugs.python.org/issue13719 #13715: typo in unicodedata documentation http://bugs.python.org/issue13715 #13712: pysetup create should not convert package_data to extra_files http://bugs.python.org/issue13712 #13704: Random number generator in Python core http://bugs.python.org/issue13704 #13703: Hash collision security issue http://bugs.python.org/issue13703 #13700: imaplib.IMAP4.authenticate authobject fails with PLAIN mechani http://bugs.python.org/issue13700 #13694: asynchronous connect in asyncore.dispatcher does not set addr http://bugs.python.org/issue13694 #13691: pydoc help (or help('help')) claims to run a help utility; doe http://bugs.python.org/issue13691 #13684: httplib tunnel infinite loop http://bugs.python.org/issue13684 #13681: Aifc read compressed frames fix http://bugs.python.org/issue13681 #13677: correct docstring for builtin compile http://bugs.python.org/issue13677 #13676: sqlite3: Zero byte truncates string contents http://bugs.python.org/issue13676 #13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi http://bugs.python.org/issue13673 #13670: Increase test coverage for pstats.py http://bugs.python.org/issue13670 #13668: mute ImportError in __del__ of _threading_local module http://bugs.python.org/issue13668 Top 10 most discussed issues (10) ================================= #13703: Hash collision security issue http://bugs.python.org/issue13703 70 msgs #13609: Add "os.get_terminal_size()" function http://bugs.python.org/issue13609 17 msgs #8184: multiprocessing.managers 
will not fail if listening ocket alre http://bugs.python.org/issue8184 14 msgs #13697: python RLock implementation unsafe with signals http://bugs.python.org/issue13697 11 msgs #13700: imaplib.IMAP4.authenticate authobject fails with PLAIN mechani http://bugs.python.org/issue13700 10 msgs #1079: decode_header does not follow RFC 2047 http://bugs.python.org/issue1079 7 msgs #13704: Random number generator in Python core http://bugs.python.org/issue13704 6 msgs #13706: non-ascii fill characters no longer work in formatting http://bugs.python.org/issue13706 6 msgs #8416: python 2.6.5 documentation can't search http://bugs.python.org/issue8416 5 msgs #9993: shutil.move fails on symlink source http://bugs.python.org/issue9993 5 msgs Issues closed (34) ================== #6031: BaseServer.shutdown documentation is incomplete http://bugs.python.org/issue6031 closed by sandro.tosi #8245: email examples don't actually work (SMTP.connect is not called http://bugs.python.org/issue8245 closed by sandro.tosi #9201: IDLE: raises Exception TclError in a special case http://bugs.python.org/issue9201 closed by ned.deily #9349: document argparse's help=SUPPRESS http://bugs.python.org/issue9349 closed by sandro.tosi #9975: Incorrect use of flowinfo and scope_id in IPv6 sockaddr tuple http://bugs.python.org/issue9975 closed by neologix #10521: str methods don't accept non-BMP fillchar on a narrow Unicode http://bugs.python.org/issue10521 closed by benjamin.peterson #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 closed by benjamin.peterson #11648: openlog()s 'logopt' keyword broken in syslog module http://bugs.python.org/issue11648 closed by sandro.tosi #11984: Wrong "See also" in symbol and token module docs http://bugs.python.org/issue11984 closed by sandro.tosi #12042: What's New multiprocessing example error http://bugs.python.org/issue12042 closed by sandro.tosi #12926: tarfile tarinfo.extract*() broken with symlinks 
http://bugs.python.org/issue12926 closed by lars.gustaebel #13302: Clarification needed in C API arg parsing http://bugs.python.org/issue13302 closed by sandro.tosi #13511: Specifying multiple lib and include directories on linux http://bugs.python.org/issue13511 closed by loewis #13558: multiprocessing package incompatible with PyObjC http://bugs.python.org/issue13558 closed by ned.deily #13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le http://bugs.python.org/issue13565 closed by neologix #13594: Aifc markers write fix http://bugs.python.org/issue13594 closed by sandro.tosi #13636: Python SSL Stack doesn't have a Secure Default set of ciphers http://bugs.python.org/issue13636 closed by pitrou #13640: add mimetype for application/vnd.apple.mpegurl http://bugs.python.org/issue13640 closed by sandro.tosi #13679: Multiprocessing system crash http://bugs.python.org/issue13679 closed by pitrou #13680: Aifc comptype write fix http://bugs.python.org/issue13680 closed by sandro.tosi #13683: Docs in Python 3:raise statement mistake http://bugs.python.org/issue13683 closed by sandro.tosi #13687: parse incorrect command line on windows 7 http://bugs.python.org/issue13687 closed by balenocui #13688: ast.literal_eval fails on octal numbers http://bugs.python.org/issue13688 closed by fidoman #13690: Add DEBUG flag to documentation of re.compile http://bugs.python.org/issue13690 closed by sandro.tosi #13693: email.Header.Header incorrect/non-smart on international chars http://bugs.python.org/issue13693 closed by r.david.murray #13696: [urllib.request.HTTPRedirectHandler.http_error_302] Relative R http://bugs.python.org/issue13696 closed by orsenthil #13699: test_gdb has recently started failing http://bugs.python.org/issue13699 closed by python-dev #13705: Raising exceptions from finally works better than advertised i http://bugs.python.org/issue13705 closed by python-dev #13707: Clarify hash() constancy period http://bugs.python.org/issue13707 closed by 
rhettinger #13710: hash() on strings containing only null characters returns the http://bugs.python.org/issue13710 closed by benjamin.peterson #13711: html.parser.HTMLParser doesn't parse tags in comments in scrip http://bugs.python.org/issue13711 closed by ezio.melotti #13713: Regression for http.client read() http://bugs.python.org/issue13713 closed by pitrou #13714: Methods of ftplib never ends if the ip address changes http://bugs.python.org/issue13714 closed by giampaolo.rodola #13717: print fails on unicode '\udce5' surrogates not allowed http://bugs.python.org/issue13717 closed by ezio.melotti From neologix at free.fr Fri Jan 6 20:10:04 2012 From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Fri, 6 Jan 2012 20:10:04 +0100 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: Thanks for those precisions, but I must admit it doesn't help me much... Can we drop it? A yes/no answer will do it ;-) > I'm pretty sure the Python version of RLock is in use in several alternative > implementations that provide an alternative _thread.lock. I think gevent > would fall into this camp, as well as a personal project of mine in a > similar vein that operates on python3. Sorry, I'm not sure I understand. Do those projects use _PyRLock directly? If yes, then aliasing it to _CRLock should do the trick, no? From paul at smedley.id.au Fri Jan 6 20:58:00 2012 From: paul at smedley.id.au (Paul Smedley) Date: Sat, 07 Jan 2012 06:28:00 +1030 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 In-Reply-To: References: Message-ID: Hi All, On 06/01/12 19:22, Paul Smedley wrote: > I'm a little slow in responding to > http://blog.python.org/2011/05/python-33-to-drop-support-for-os2.html, > but I'm interested in stepping up to help maintain OS/2 support in > Python 3.3 and above. 
> > I've been building Python 2.x for a while, and currently have binaries > of 2.6.5 available from http://os2ports.smedley.info > > Unlike Andrew Mcintyre, I'm using libc for development > (http://svn.netlabs.org/libc) rather than emx. libc is still being > developed whereas emx hasn't been updated in about 10 years. > > I haven't attempted a build of 3.x yet, but will grab the latest 3.x > release and see what it takes to get it building here. I expect I'll hit > the same problem with sysconfig.get_config_var("CONFIG_ARGS"): as with > 2.7.2 but we'll wait and see. I now have a dll and exe - however when it tried to build the modules, it dies with: Could not find platform independent libraries Could not find platform dependent libraries Consider setting $PYTHONHOME to [:] Fatal Python error: Py_Initialize: Unable to get the locale encoding LookupError: no codec search functions registered: can't find encoding Have done a small amount of debugging: in get_codeset(), char* codeset = nl_langinfo(CODESET); returns: ISO8859-1 Which can't be found by: codec = _PyCodec_Lookup(encoding); from get_codec_name(const char *encoding) Where is the list of valid codepages read from? Should ISO8859-1 be valid? I see some references to ISO-8859-1 in the code but not ISO8859-1 TIA, Paul From jimjjewett at gmail.com Fri Jan 6 21:06:54 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 6 Jan 2012 15:06:54 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: In http://mail.python.org/pipermail/python-dev/2012-January/115350.html, Mark Shannon wrote: > The minimal proposed change of seeding the hash from a global value (a > single memory read and an addition) will have such a minimal performance > effect that it will be undetectable even on the most noise-free testing > environment. (1) Is it established that this (a single initial add, with no per-loop operations) would be sufficient? 
I thought that was in the gray area of "We don't yet have a known attack, but there are clearly safer options." (2) Even if the direct cost (fetch and add) were free, it might be expensive in practice. The current hash function is designed to send "similar" strings (and similar numbers) to similar hashes. (2a) That guarantees they won't (initially) collide, even in very small dicts. (2b) It keeps them nearby, which has an effect on cache hits. The exact effect (and even direction) would of course depend on the workload, which makes me distrust micro-benchmarks. If this were a problem in practice, I could understand accepting a little slowdown as the price of safety, but ... it isn't. Even in theory, the only way to trigger this is to take unreasonable amounts of user input and turn it directly into an unreasonable number of keys (as opposed to values, or list elements) placed in the same dict (as opposed to a series of smaller dicts). -jJ From mark at hotpy.org Fri Jan 6 21:25:46 2012 From: mark at hotpy.org (Mark Shannon) Date: Fri, 06 Jan 2012 20:25:46 +0000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: Message-ID: <4F0758CA.1060303@hotpy.org> Hi, It seems to me that half the folk discussing this issue want a super-strong, resist-all-hypothetical-attacks hash with little regard to performance. The other half want no change or a change that will have no observable effect. (I may be exaggerating a little.) Can I propose the following, half-way proposal: 1. Since there is a published vulnerability, that we fix it with the most efficient solution proposed so far: http://bugs.python.org/file24143/random-2.patch 2. Decide which versions of Python this should be applied to. 3.3 seems a given, the others are open to debate. 3. If and only if (and I think this unlikely) the solution chosen is shown to be vulnerable to a more sophisticated attack then a new issue should be opened and dealt with separately. Cheers, Mark.
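[Editor's note: the approach in the patch Mark cites is the one sketched earlier in the thread — perturb the existing string hash with a per-process random seed. A minimal Python model of the idea follows; the names _PREFIX/_SUFFIX, the seed width, and the exact mixing steps are illustrative assumptions, not a transcription of random-2.patch.]

```python
import os

# Per-process random seeds, chosen once at startup (as the proposed fix
# does in the interpreter).  Illustrative names, not the patch's own.
_PREFIX = int.from_bytes(os.urandom(8), "little")
_SUFFIX = int.from_bytes(os.urandom(8), "little")
_MASK = 2**64 - 1  # model 64-bit unsigned overflow

def randomized_hash(s: bytes) -> int:
    """Model of CPython's multiply/XOR string hash with random seeds mixed
    into the start and end of the computation."""
    if not s:
        return 0
    x = (_PREFIX ^ (s[0] << 7)) & _MASK
    for c in s:
        x = ((1000003 * x) ^ c) & _MASK
    x ^= len(s)
    x ^= _SUFFIX
    return x
```

Equal strings still hash equally within one process, so dict lookups are unaffected, but the seeds change on every interpreter start, so an attacker cannot precompute a colliding key set without first recovering the seed.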
From solipsis at pitrou.net Fri Jan 6 21:28:29 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 6 Jan 2012 21:28:29 +0100 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 References: Message-ID: <20120106212829.7f4b5f43@pitrou.net> On Sat, 07 Jan 2012 06:28:00 +1030 Paul Smedley wrote: > > I now have a dll and exe - however when it tried to build the modules, > it dies with: > Could not find platform independent libraries > Could not find platform dependent libraries > Consider setting $PYTHONHOME to [:] > Fatal Python error: Py_Initialize: Unable to get the locale encoding I would look at this line: > LookupError: no codec search functions registered: can't find encoding Normally the standard codec search function is registered when importing the "encodings" module (see Lib/encodings/__init__.py), which is done at the end of _PyCodecRegistry_Init() in Python/codecs.c. There's this comment there: /* Ignore ImportErrors... this is done so that distributions can disable the encodings package. Note that other errors are not masked, e.g. SystemErrors raised to inform the user of an error in the Python configuration are still reported back to the user. */ For the purpose of debugging you could *not* ignore the error and instead print it out or bail out. Regards Antoine. From p.f.moore at gmail.com Fri Jan 6 21:52:55 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 6 Jan 2012 20:52:55 +0000 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4F0758CA.1060303@hotpy.org> References: <4F0758CA.1060303@hotpy.org> Message-ID: On 6 January 2012 20:25, Mark Shannon wrote: > Hi, > > It seems to me that half the folk discussing this issue want a super-strong, > resist-all-hypothetical-attacks hash with little regard to performance. The > other half want no change or a change that will have no observable effect. > (I may be exaggerating a little.) > > Can I propose the following, half-way proposal: > > 1.
Since there is a published vulnerability, > that we fix it with the most efficient solution proposed so far: > http://bugs.python.org/file24143/random-2.patch > > 2. Decide which versions of Python this should be applied to. > 3.3 seems a given, the other are open to debate. > > 3. If and only if (and I think this unlikely) the solution chosen is shown > to be vulnerable to a more sophisticated attack then a new issue should be > opened and dealt with separately. +1 Paul From paul at smedley.id.au Fri Jan 6 22:52:36 2012 From: paul at smedley.id.au (Paul Smedley) Date: Sat, 07 Jan 2012 08:22:36 +1030 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 In-Reply-To: <20120106212829.7f4b5f43@pitrou.net> References: <20120106212829.7f4b5f43@pitrou.net> Message-ID: Hi Antoine, On 07/01/12 06:58, Antoine Pitrou wrote: > On Sat, 07 Jan 2012 06:28:00 +1030 > Paul Smedley wrote: >> >> I now have a dll and exe - however when it tried to build the modules, >> it dies with: >> Could not find platform independent libraries >> Could not find platform dependent libraries >> Consider setting $PYTHONHOME to[:] >> Fatal Python error: Py_Initialize: Unable to get the locale encoding > > I would look at this line: > >> LookupError: no codec search functions registered: can't find encoding > > Normally the standard codec search function is registered when > importing the "encodings" module (see Lib/encodings/__init__.py), which > is done at the end of _PyCodecRegistry_Init() in Python/codecs.c. > There's this comment there: > > /* Ignore ImportErrors... this is done so that > distributions can disable the encodings package. Note > that other errors are not masked, e.g. SystemErrors > raised to inform the user of an error in the Python > configuration are still reported back to the user. */ > > For the purpose of debugging you could *not* ignore the error and > instead print it out or bail out. 
Thanks - commenting out the ImportErrors block, I get: ImportError: No module named encodings So seems it's not finding modules - possibly related to the warnings about: >> Could not find platform independent libraries >> Could not find platform dependent libraries Seems getenv() may not be working correctly... From v+python at g.nevcal.com Fri Jan 6 04:39:30 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 05 Jan 2012 19:39:30 -0800 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFC4B56.90709@hotpy.org> <4EFC68E0.4000606@cheimes.de> <63988.1325628139@parc.com> <20120104115513.39db6b8b@pitrou.net> <20120105042627.GA10082@flay.puzzling.org> <20120105143957.1b5ba7fe@pitrou.net> <4F05F6AB.3060704@g.nevcal.com> Message-ID: <4F066CF2.4050405@g.nevcal.com> On 1/5/2012 4:10 PM, Nick Coghlan wrote: > On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka wrote: >> 05.01.12 21:14, Glenn Linderman wrote: >>> So, fixing the vulnerable packages could be a sufficient response, >>> rather than changing the hash function. How to fix? Each of those >>> above allocates and returns a dict. Simply have each of those allocate >>> and return a wrapped dict, which has the following behaviors: >>> >>> i) during __init__, create a local, random, string. >>> ii) for all key values, prepend the string, before passing it to the >>> internal dict. >> >> Good idea. Thanks for the implementation, Serhiy. That is the sort of thing I had in mind, indeed. > Not a good idea - a lot of the 3rd party tests that depend on dict > ordering are going to be using those modules anyway, Stats? Didn't someone post a list of tests that fail when changing the hash? Oh, those were stdlib tests, not 3rd party tests. I'm not sure how to gather the stats, then, are you?
> so scattering our > solution across half the standard library is needlessly creating > additional work without really reducing the incompatibility problem. Half the standard library? no one has cared to augment my list of modules, but I have seen reference to JSON in addition to cgi and urllib.parse. I think there are more than 6 modules in the standard library... > If we're going to change anything, it may as well be the string > hashing algorithm itself. Changing the string hashing algorithm is known (or at least no one has argued otherwise) to be a source of backward incompatibility that will break programs. My proposal (and Serhiy's implementation, assuming it works, or can be easily tweaked to work, I haven't reviewed it in detail or attempted to test it) will only break programs that have vulnerabilities. I failed to mention one other benefit of my proposal: every web request would have a different random prefix, so attempting to gather info is futile: the next request has a different random prefix, so different strings would collide. > Cheers, > Nick. > Indeed it is nice when we can be cheery even when arguing, for the most part :) I've enjoyed reading the discussions in this forum because most folks have respect for other people's opinions, even when they differ. -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Sat Jan 7 01:10:17 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 7 Jan 2012 11:10:17 +1100 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: _PyRLock is not used directly. Instead, no _CRLock is provided, so the threading.RLock function calls _PyRLock. It's done this way because green threading libraries may only provide a greened lock. _CRLock in these contexts would not work: It would block the entire native thread. 
I suspect that if you removed _PyRLock, these implementations would have to expose their own RLock primitive which works the same way as the one just removed from the standard library. I don't know if this is a good thing. I would recommend checking with at least the gevent and eventlet developers. 2012/1/7 Charles-François Natali > Thanks for those precisions, but I must admit it doesn't help me much... > Can we drop it? A yes/no answer will do it ;-) > > > I'm pretty sure the Python version of RLock is in use in several > alternative > > implementations that provide an alternative _thread.lock. I think gevent > > would fall into this camp, as well as a personal project of mine in a > > similar vein that operates on python3. > > Sorry, I'm not sure I understand. Do those projects use _PyRLock directly? > If yes, then aliasing it to _CRLock should do the trick, no? > -- ?_? From tim.peters at gmail.com Sat Jan 7 04:05:46 2012 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 6 Jan 2012 22:05:46 -0500 Subject: [Python-Dev] "Sort attacks" (was Re: Hash collision security issue (now public)) Message-ID: I can't find it now, but I believe Marc-Andre mentioned that CPython's list.sort() was vulnerable to attack too, because of its O(n log n) worst-case behavior. I wouldn't worry about that, because nobody could stir up anguish about it by writing a paper ;-) 1. O(n log n) is enormously more forgiving than O(n**2). 2. An attacker need not be clever at all: O(n log n) is not only sort()'s worst case, it's also its _expected_ case when fed randomly ordered data. 3. It's provable that no comparison-based sorting algorithm can have better worst-case asymptotic behavior when fed randomly ordered data.
So if anyone whines about this, tell 'em to go do something useful instead :-) still-solving-problems-not-in-need-of-attention-ly y'rs - tim From paul at smedley.id.au Sat Jan 7 09:48:10 2012 From: paul at smedley.id.au (Paul Smedley) Date: Sat, 07 Jan 2012 19:18:10 +1030 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: Hi All, On 06/01/12 10:25, Terry Reedy wrote: > On 1/5/2012 3:01 PM, Paul Smedley wrote: > >>> File "./setup.py", line 1154, in detect_modules >>> for arg in sysconfig.get_config_var("__CONFIG_ARGS").split()] >>> AttributeError: 'NoneType' object has no attribute 'split' >>> make: *** [sharedmods] Error 1 > >> File "./setup.py", line 1368, in detect_modules >> if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"): >> TypeError: argument of type 'NoneType' is not iterable >> make: *** [sharedmods] Error 1 >> >> Which again points to problems with >> sysconfig.get_config_var("CONFIG_ARGS"): > > [The earlier call was with "__CONFIG_ARGS", for whatever difference that > makes.] It appears to be returning None instead of [] (or a populated > list). > > In 3.2.2, at line 579 of sysconfig.py is > def get_config_var(name): > return get_config_vars().get(name) > > That defaults to None if name is not a key in the dict returned by > get_config_vars(). My guess is that it always is, and the value > is always a list for tested win/*nix/mac systems. So either setup.py has > the bug of assuming that there is always a list value for "CONFIG_ARGS" > or sysconfig.py has the bug of not setting it for os2, perhaps because of a bug elsewhere.
> > At line 440 of sysconfig.py is > def get_config_var(*args): > global _CONFIG_VARS > if _CONFIG_VARS is None: > _CONFIG_VARS = {} > > if os.name in ('nt', 'os2'): > _init_non_posix(_CONFIG_VARS) > if args: > vals = [] > for name in args: > vals.append(_CONFIG_VARS.get(name)) > return vals > else: > return _CONFIG_VARS > > At 456 is > def _init_non_posix(vars): > """Initialize the module as appropriate for NT""" > # set basic install directories > ... > > "CONFIG_ARGS" is not set explicitly for any system anywhere in the file, > so I do not know how the call ever works. using _init_posix() for 'os2' instead of _init_non_posix is the fix for this. sysconfig.py also needs the following changes: --- \dev\Python-2.7.2-o\Lib\sysconfig.py 2012-01-06 19:27:14.000000000 +1030 +++ sysconfig.py 2012-01-07 19:03:00.000000000 +1030 @@ -46,7 +46,7 @@ 'scripts': '{base}/Scripts', 'data' : '{base}', }, - 'os2_home': { + 'os2_user': { 'stdlib': '{userbase}/lib/python{py_version_short}', 'platstdlib': '{userbase}/lib/python{py_version_short}', 'purelib': '{userbase}/lib/python{py_version_short}/site-packages', @@ -413,9 +413,9 @@ _CONFIG_VARS['platbase'] = _EXEC_PREFIX _CONFIG_VARS['projectbase'] = _PROJECT_BASE - if os.name in ('nt', 'os2'): + if os.name in ('nt'): _init_non_posix(_CONFIG_VARS) - if os.name == 'posix': + if os.name in ('posix', 'os2'): _init_posix(_CONFIG_VARS) # Setting 'userbase' is done below the call to the From tjreedy at udel.edu Sat Jan 7 10:17:33 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 07 Jan 2012 04:17:33 -0500 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: On 1/7/2012 3:48 AM, Paul Smedley wrote: > using _init_posix() for 'os2' instead of _init_non_posix is the fix for > this. 
> > sysconfig.py also needs the following changes: > --- \dev\Python-2.7.2-o\Lib\sysconfig.py 2012-01-06 19:27:14.000000000 > +1030 > +++ sysconfig.py 2012-01-07 19:03:00.000000000 +1030 > @@ -46,7 +46,7 @@ > 'scripts': '{base}/Scripts', > 'data' : '{base}', > }, > - 'os2_home': { > + 'os2_user': { > 'stdlib': '{userbase}/lib/python{py_version_short}', > 'platstdlib': '{userbase}/lib/python{py_version_short}', > 'purelib': '{userbase}/lib/python{py_version_short}/site-packages', > @@ -413,9 +413,9 @@ > _CONFIG_VARS['platbase'] = _EXEC_PREFIX > _CONFIG_VARS['projectbase'] = _PROJECT_BASE > > - if os.name in ('nt', 'os2'): > + if os.name in ('nt'): > _init_non_posix(_CONFIG_VARS) > - if os.name == 'posix': > + if os.name in ('posix', 'os2'): > _init_posix(_CONFIG_VARS) Submit a patch on the tracker, preferably as a file rather than cut and paste. -- Terry Jan Reedy From stefan_ml at behnel.de Sat Jan 7 12:02:04 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 07 Jan 2012 12:02:04 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: <4EFE88AD.2060505@cheimes.de> References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <4EFE88AD.2060505@cheimes.de> Message-ID: Christian Heimes, 31.12.2011 04:59: > Am 31.12.2011 03:22, schrieb Victor Stinner: > The unique structure of CPython's dict implementation makes it harder to > get the number of values with equal hash. The academic hash map (the one > I learnt about at university) uses a bucket to store all elements with > equal hash (more precise hash: mod mask). However Python's dict however > perturbs the hash until it finds a free slot its array. The second, > third ... collision can be caused by a legit and completely different > (!) hash. > >> The last choice is to change the hash algorithm. 
The *idea* is the same >> than adding salt to hashed password (in practice it will be a little bit >> different): if a pseudo-random salt is added, the attacker cannot >> prepare a single dataset, he/she will have to regenerate a new dataset >> for each possible salt value. If the salt is big enough (size in bits), >> the attacker will need too much CPU to generate the dataset (compute N >> keys with the same hash value). Basically, it slows down the attack by >> 2^(size of the salt). > > That's the idea of randomized hashing functions as implemented by Ruby > 1.8, Perl and others. The random seed is used as IV. Multiple rounds of > multiply, XOR and MOD (integer overflows) cause a deviation. In your > other posting you were worried about the performance implication. A > randomized hash function just adds a single ADD operation, that's all. > > Downside: With randomization all hashes are unpredictable and change > after every restart of the interpreter. This has some subtle side > effects like a different outcome of {a:1, b:1, c:1}.keys() after a > restart of the interpreter. > >> Another possibility would be to replace our fast hash function by a >> better hash function like MD5 or SHA1 (so the creation of the dataset >> would be too slow in practice = too expensive), but cryptographic hash >> functions are much slower (and so would slow down Python too much). > > I agree with your analysis. Cryptographic hash functions are far too > slow for our use case. During my research I found another hash function > that claims to be fast and that may not be vulnerable to this kind of > attack: http://isthe.com/chongo/tech/comp/fnv/ Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's portable, known to provide a very good distribution for different string values and is generally fast on both 32 and 64 bit architectures. 
http://burtleburtle.net/bob/c/lookup3.c The analysis is here: http://burtleburtle.net/bob/hash/doobs.html It seems that there's also support for generating 64bit hash values (actually 2x32bits) efficiently. Admittedly, this may require some adaptation for the PEP393 unicode memory layout in order to produce identical hashes for all three representations if they represent the same content. So it's not a drop-in replacement. Stefan From ncoghlan at gmail.com Sat Jan 7 14:13:23 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jan 2012 23:13:23 +1000 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: 2012/1/7 Charles-François Natali : > Thanks for those precisions, but I must admit it doesn't help me much... > Can we drop it? A yes/no answer will do it ;-) The yes/no answer is "No, we can't drop it". Even though CPython no longer uses the Python version of RLock in normal operation, it's still the reference implementation for everyone else that has to perform the same task (i.e. wrap Python code around a non-reentrant lock to create a reentrant one). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 7 14:22:44 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 7 Jan 2012 23:22:44 +1000 Subject: [Python-Dev] [Python-checkins] cpython: Issue #9993: When the source and destination are on different filesystems, In-Reply-To: References: Message-ID: On Sat, Jan 7, 2012 at 5:17 AM, antoine.pitrou wrote: > http://hg.python.org/cpython/rev/1ea8b7233fd7 > changeset: 74288:1ea8b7233fd7 > user: Antoine Pitrou > date: Fri Jan 06 20:16:19 2012 +0100 > summary: > Issue #9993: When the source and destination are on different filesystems, and the source is a symlink, shutil.move() now recreates a symlink on the destination instead of copying the file contents. > Patch by Jonathan Niehof and Hynek Schlawack.
That seems like a fairly nasty backwards incompatibility right there. While the old behaviour was different from mv, it was still perfectly well defined. Now, operations that used to work may fail - basically anything involving an absolute symlink will silently fail if being moved to removable media (it will create a symlink that is completely useless on the destination machine). Relative symlinks may or may not be broken depending on whether or not their target is *also* being copied to the destination media. The new help text also doesn't say what will happen if the destination doesn't even *support* symlinks (as is quite likely in the removable media case). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From anacrolix at gmail.com Sat Jan 7 16:22:15 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 8 Jan 2012 02:22:15 +1100 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: Nick did you mean to say "wrap python code around a reentrant lock to create a non-reentrant lock"? Isn't that what PyRLock is doing? FWIW having now read issues 13697 and 13550, I'm +1 for dropping Python RLock, and all the logging machinery in threading. 2012/1/8 Nick Coghlan > 2012/1/7 Charles-François Natali : > > Thanks for those precisions, but I must admit it doesn't help me much... > > Can we drop it? A yes/no answer will do it ;-) > > The yes/no answer is "No, we can't drop it". > > Even though CPython no longer uses the Python version of RLock in > normal operation, it's still the reference implementation for everyone > else that has to perform the same task (i.e. wrap Python code around a > non-reentrant lock to create a reentrant one). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- ?_? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Sat Jan 7 16:38:26 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 8 Jan 2012 01:38:26 +1000 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: 2012/1/8 Matt Joiner : > Nick did you mean to say "wrap python code around a reentrant lock to create > a non-reentrant lock"? Isn't that what PyRLock is doing? Actually, I should have said recursive, not reentrant. > FWIW having now read issues 13697 and 13550, I'm +1 for dropping Python > RLock, and all the logging machinery in threading. While I agree on removing the unused and potentially problematic debugging machinery, I'm not convinced of the benefits of removing the pure Python RLock implementation. To quote Charles-François from the tracker issue: "Now, the fun part: this affects not only RLock, but every Python code performing "atomic" actions: condition variables, barriers, etc. There are some constraints on what can be done from a signal handler, and it should probably be documented." Removing the pure Python RLock doesn't seem to actually solve anything - it just pushes the problem of fixing the signal interaction back onto third party users that are even more ill-equipped to resolve it than we are. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From hs at ox.cx Sat Jan 7 17:11:19 2012 From: hs at ox.cx (Hynek Schlawack) Date: Sat, 7 Jan 2012 17:11:19 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Issue #9993: When the source and destination are on different filesystems, In-Reply-To: References: Message-ID: <4A50146146794AF9924E0B4A753C7351@gmail.com> Hi Nick, On Saturday, 7 January 2012 at 14:22, Nick Coghlan wrote:
> > http://hg.python.org/cpython/rev/1ea8b7233fd7 > > changeset: 74288:1ea8b7233fd7 > > user: Antoine Pitrou > > date: Fri Jan 06 20:16:19 2012 +0100 > > summary: > > Issue #9993: When the source and destination are on different filesystems, > > and the source is a symlink, shutil.move() now recreates a symlink on the > > destination instead of copying the file contents. > > Patch by Jonathan Niehof and Hynek Schlawack. > > That seems like a fairly nasty backwards incompatibility right there. > While the old behaviour was different from mv, it was still perfectly > well defined. Now, operations that used to work may fail - basically > anything involving an absolute symlink will silently fail if being > moved to removable media (it will create a symlink that is completely > useless on the destination machine). Relative symlinks may or may not > be broken depending on whether or not their target is *also* being > copied to the destination media. I had a look at it, the possible cases are as follows: 1. we can just do an os.rename(): if src is a link it stays one 2. os.rename() fails, src is not a symlink but a directory: copytree() is used with symlinks=True, i.e. symlinks are preserved, no matter where they point to, i.e. this would clash with removable media as well. 3. os.rename() fails and src is a symlink. In both former cases, links were preserved. And the removable-media argument is IMHO moot due to case 2. If you want hardcore backwards compatibility, we could make the old behavior default and add some flag. But to be honest, the new approach seems more congruent to me. > The new help text also doesn't say what will happen if the destination > doesn't even *support* symlinks (as is quite likely in the removable > media case). A clarification might be appropriate. Maybe even a direct warning, that in such cases the usage of copytree(..., symlinks=False) might be a better idea?
But the more I think about it, the more my impression is that symlink problems aren't really our problem, as they go through all possible layers and it's next to impossible to catch all edge cases in library code. Therefore I'd say it's best just to behave like UNIX tools (please note I'm not defensive here, I've just fixed the tests+docs :)). Cheers, Hynek From lists at cheimes.de Sat Jan 7 18:57:10 2012 From: lists at cheimes.de (Christian Heimes) Date: Sat, 07 Jan 2012 18:57:10 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <4EFE88AD.2060505@cheimes.de> Message-ID: On 07.01.2012 12:02, Stefan Behnel wrote: > Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's > portable, known to provide a very good distribution for different string > values and is generally fast on both 32 and 64 bit architectures. > > http://burtleburtle.net/bob/c/lookup3.c > > The analysis is here: > > http://burtleburtle.net/bob/hash/doobs.html > > It seems that there's also support for generating 64bit hash values > (actually 2x32bits) efficiently. This thread as well as the ticket is getting so long that people barely have a chance to catch up ... Guido has stated that he doesn't want a completely new hash algorithm for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too. I've done some experiments with FNV and Murmur3. With Murmur3 128bit I've seen some minor speed improvements on 64bit platforms. At first I was surprised but it makes sense. Murmur3 operates on uint32_t blocks while Python's hash algorithm iterates over 1 byte (bytes, ASCII), 2 bytes (UCS2) or 4 bytes (UCS4) types. Since most strings are either ASCII or UCS2, the inner loop of the current algorithm is tighter.
> Admittedly, this may require some adaptation for the PEP393 unicode memory > layout in order to produce identical hashes for all three representations > if they represent the same content. So it's not a drop-in replacement. Is this condition required and implemented at the moment? Christian From martin at v.loewis.de Sat Jan 7 18:57:41 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sat, 07 Jan 2012 18:57:41 +0100 Subject: [Python-Dev] Python as a Metro-style App Message-ID: <4F088795.5000800@v.loewis.de> I just tried porting Python as a Metro (Windows 8) App, and failed. Metro Apps use a variant of the Windows API called WinRT that still allows writing native applications in C++, but restricts various APIs to a subset of the full Win32 functionality. For example, everything related to subprocess creation would not work; none of the byte-oriented file API seems to be present, and a number of file operation functions are absent as well (such as MoveFile). Regardless, porting Python ought to be feasible, except that it fails fundamentally with the preview release of Visual Studio. The problem is that compilation of C code is apparently not supported/tested in that preview release. When compiling a trivial C file in a Metro app, the compiler complains that a temporary file ending with "md" could not be found, most likely because the C compiler failed to generate it, whereas the C++ compiler would. I tried compiling the Python sources as C++, but that produced hundreds of compilation errors. Most of them are either about missing casts (e.g. from int to enum types, or from void * to other pointer types), or about the "static forward" declarations of type objects. For the latter, anonymous namespaces should be used. While it is feasible to replace static PyTypeObject foo; ... static PyTypeObject foo = { ... }; with Py_BEGIN_STATIC PyTypeObject foo; Py_END_STATIC ... Py_BEGIN_STATIC PyTypeObject foo = { ...
}; Py_END_STATIC I'm not sure whether such a change would be accepted, in particular as Microsoft might fix the bug in the compiler until the final release of Windows 8. Regards, Martin From tjreedy at udel.edu Sat Jan 7 21:53:29 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 07 Jan 2012 15:53:29 -0500 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: <8A861810-A566-4C5E-B5D1-6A73D31A7CD7@voidspace.org.uk> <4EFE71E0.2000505@haypocalc.com> <4EFE88AD.2060505@cheimes.de> Message-ID: On 1/7/2012 12:57 PM, Christian Heimes wrote: > Am 07.01.2012 12:02, schrieb Stefan Behnel: >> Admittedly, this may require some adaptation for the PEP393 unicode memory >> layout in order to produce identical hashes for all three representations >> if they represent the same content. So it's not a drop-in replacement. > > Is this condition required and implemented at the moment? If o1 == o2, then hash(o1) == hash(o2) is an unstated requirement implied by "They [hash values] are used to quickly compare dictionary keys during a dictionary lookup." since hash(o1) != hash(o2) is taken to mean o1 != o2 (whereas hash(o1) == hash(o2) is taken to mean o1 == o2 is possible but must be checked). Hashing should be a coarsening of == as an equivalence relationship. -- Terry Jan Reedy From vinay_sajip at yahoo.co.uk Sat Jan 7 22:25:37 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sat, 7 Jan 2012 21:25:37 +0000 (UTC) Subject: [Python-Dev] A question about the subprocess implementation Message-ID: The subprocess.Popen constructor takes stdin, stdout and stderr keyword arguments which are supposed to represent the file handles of the child process. The object also has stdin, stdout and stderr attributes, which one would naively expect to correspond to the passed in values, except where you pass in e.g. subprocess.PIPE (in which case the corresponding attribute would be set to an actual stream or descriptor). 
However, in common cases, even when keyword arguments are passed in, the corresponding attributes are set to None. The following script import os from subprocess import Popen, PIPE import tempfile cmd = 'ls /tmp'.split() p = Popen(cmd, stdout=open(os.devnull, 'w+b')) print('process output streams: %s, %s' % (p.stdout, p.stderr)) p = Popen(cmd, stdout=tempfile.TemporaryFile()) print('process output streams: %s, %s' % (p.stdout, p.stderr)) prints process output streams: None, None process output streams: None, None under both Python 2.7 and 3.2. However, if subprocess.PIPE is passed in, then the corresponding attribute *is* set: if the last four lines are changed to p = Popen(cmd, stdout=PIPE) print('process output streams: %s, %s' % (p.stdout, p.stderr)) p = Popen(cmd, stdout=open(os.devnull, 'w+b'), stderr=PIPE) print('process output streams: %s, %s' % (p.stdout, p.stderr)) then you get process output streams: <open file '<fdopen>', mode 'rb' at 0x2088660>, None process output streams: None, <open file '<fdopen>', mode 'rb' at 0x2088e40> under Python 2.7, and process output streams: <_io.FileIO name=3 mode='rb'>, None process output streams: None, <_io.FileIO name=5 mode='rb'> under Python 3.2. This seems to me to contradict the principle of least surprise. One would expect, when a file-like object is passed in as a keyword argument, that it be placed in the corresponding attribute. That way, if one wants to do p.stdout.close() (which is necessary in some cases), one doesn't hit an AttributeError because NoneType has no attribute 'close'. This seems like it might be a bug, but if so it does seem rather egregious: can someone tell me if there is a good design reason for the current behaviour? If there isn't one, I'll raise an issue.
Regards, Vinay Sajip From benjamin at python.org Sat Jan 7 22:47:50 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 7 Jan 2012 16:47:50 -0500 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <4F088795.5000800@v.loewis.de> References: <4F088795.5000800@v.loewis.de> Message-ID: 2012/1/7 "Martin v. Löwis" : > I just tried porting Python as a Metro (Windows 8) App, and failed. Is this required for Python to run on Windows 8? Sorry if that's a dumb question. I'm not sure if "Metro App" is a special class of application. -- Regards, Benjamin From martin at v.loewis.de Sat Jan 7 23:07:33 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 07 Jan 2012 23:07:33 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> Message-ID: <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> Quoting Benjamin Peterson: > 2012/1/7 "Martin v. Löwis" : >> I just tried porting Python as a Metro (Windows 8) App, and failed. > > Is this required for Python to run on Windows 8? No. Existing applications ("desktop applications") will continue to work unmodified. Metro-style apps are primarily intended for smart phones and tablet PCs, and will be distributed through the Windows app store. The current VS prerelease supports both Intel and ARM processors for Apps. A related question is whether Python will compile unmodified with Visual Studio 11. Although I had some difficulties with that also so far, I expect that this will ultimately work (although not unmodified - the project files need to be updated, as will the packaging process). A then-related question is whether Python 3.3 should be compiled with Visual Studio 11. I'd still be in favor of that, provided Microsoft manages to release that soon enough. 
Regards, Martin From brian at python.org Sat Jan 7 23:52:44 2012 From: brian at python.org (Brian Curtin) Date: Sat, 7 Jan 2012 16:52:44 -0600 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> References: <4F088795.5000800@v.loewis.de> <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> Message-ID: On Sat, Jan 7, 2012 at 16:07, wrote: > A then-related question is whether Python 3.3 should be compiled with Visual > Studio 11. I'd still be in favor of that, provided Microsoft manages to > release that soon enough. I'm guessing the change would have to be done before the first beta? It would have to be released awfully soon, and I haven't heard an estimated release date as of yet. I currently have the default branch mostly ported to VS 2010 save for a number of failed tests, FWIW. From eliben at gmail.com Sat Jan 7 23:56:20 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 8 Jan 2012 00:56:20 +0200 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> References: <4F088795.5000800@v.loewis.de> <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> Message-ID: > A then-related question is whether Python 3.3 should be compiled with > Visual > Studio 11. I'd still be in favor of that, provided Microsoft manages to > release > that soon enough. > Martin, I assume you mean the Express version of Visual Studio 11 here, right? Eli From solipsis at pitrou.net Sat Jan 7 23:57:29 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 7 Jan 2012 23:57:29 +0100 Subject: [Python-Dev] Python as a Metro-style App References: <4F088795.5000800@v.loewis.de> Message-ID: <20120107235729.5d3953af@pitrou.net> On Sat, 07 Jan 2012 18:57:41 +0100 "Martin v. 
Löwis" wrote: > For example, everything > related to subprocess creation would not work; none of the > byte-oriented file API seems to be present, and a number of file > operation functions are absent as well (such as MoveFile). When you say MoveFile is absent, is MoveFileEx supported instead? Or is moving files just totally impossible? Depending on the extent of removed/disabled functionality, it might not be very interesting to have a Metro port at all. > I'm not sure whether such a change would be accepted, in particular as > Microsoft might fix the bug in the compiler until the final release > of Windows 8. I would hope they finally support compiling C code... Regards Antoine. From tjreedy at udel.edu Sun Jan 8 00:38:08 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 07 Jan 2012 18:38:08 -0500 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> Message-ID: On 1/7/2012 4:47 PM, Benjamin Peterson wrote: > 2012/1/7 "Martin v. Löwis": >> I just tried porting Python as a Metro (Windows 8) App, and failed. > > Is this required for Python to run on Windows 8? No, normal 'desktop' programs will still run in desktop mode. > Sorry if that's a dumb question. I'm not sure if "Metro App" is a > special class of application. Yes. They are basically 'phone/touchpad' apps, and will be managed in more or less the same way. They will probably only be available through MS storefront, after vetting by MS. Only Metro Apps will survive a system Refresh, along with user data. Traditional unvetted, direct-from-supplier, desktop apps will be wiped because they might be 'bad'. 
-- Terry Jan Reedy From paul at smedley.id.au Sun Jan 8 00:47:59 2012 From: paul at smedley.id.au (Paul Smedley) Date: Sun, 08 Jan 2012 10:17:59 +1030 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: Hi Terry, On 07/01/12 19:47, Terry Reedy wrote: > On 1/7/2012 3:48 AM, Paul Smedley wrote: > >> using _init_posix() for 'os2' instead of _init_non_posix is the fix for >> this. >> >> sysconfig.py also needs the following changes: >> --- \dev\Python-2.7.2-o\Lib\sysconfig.py 2012-01-06 19:27:14.000000000 >> +1030 >> +++ sysconfig.py 2012-01-07 19:03:00.000000000 +1030 >> @@ -46,7 +46,7 @@ >> 'scripts': '{base}/Scripts', >> 'data' : '{base}', >> }, >> - 'os2_home': { >> + 'os2_user': { >> 'stdlib': '{userbase}/lib/python{py_version_short}', >> 'platstdlib': '{userbase}/lib/python{py_version_short}', >> 'purelib': '{userbase}/lib/python{py_version_short}/site-packages', >> @@ -413,9 +413,9 @@ >> _CONFIG_VARS['platbase'] = _EXEC_PREFIX >> _CONFIG_VARS['projectbase'] = _PROJECT_BASE >> >> - if os.name in ('nt', 'os2'): >> + if os.name in ('nt'): >> _init_non_posix(_CONFIG_VARS) >> - if os.name == 'posix': >> + if os.name in ('posix', 'os2'): >> _init_posix(_CONFIG_VARS) > > Submit a patch on the tracker, preferably as a file rather than cut and > paste. Will do right now. Cheers, Paul From tjreedy at udel.edu Sun Jan 8 01:02:08 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 07 Jan 2012 19:02:08 -0500 Subject: [Python-Dev] A question about the subprocess implementation In-Reply-To: References: Message-ID: On 1/7/2012 4:25 PM, Vinay Sajip wrote: > The subprocess.Popen constructor takes stdin, stdout and stderr keyword > arguments which are supposed to represent the file handles of the child process. > The object also has stdin, stdout and stderr attributes, which one would naively > expect to correspond to the passed in values, except where you pass in e.g. 
> subprocess.PIPE (in which case the corresponding attribute would be set to an > actual stream or descriptor). > > However, in common cases, even when keyword arguments are passed in, the > corresponding attributes are set to None. The following script > > import os > from subprocess import Popen, PIPE > import tempfile > > cmd = 'ls /tmp'.split() > > p = Popen(cmd, stdout=open(os.devnull, 'w+b')) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > p = Popen(cmd, stdout=tempfile.TemporaryFile()) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > > prints > > process output streams: None, None > process output streams: None, None > > under both Python 2.7 and 3.2. However, if subprocess.PIPE is passed in, then > the corresponding attribute *is* set: if the last four lines are changed to > > p = Popen(cmd, stdout=PIPE) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > p = Popen(cmd, stdout=open(os.devnull, 'w+b'), stderr=PIPE) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > > then you get > > process output streams: <open file '<fdopen>', mode 'rb' at 0x2088660>, None > process output streams: None, <open file '<fdopen>', mode 'rb' at 0x2088e40> > > under Python 2.7, and > > process output streams: <_io.FileIO name=3 mode='rb'>, None > process output streams: None, <_io.FileIO name=5 mode='rb'> > > This seems to me to contradict the principle of least surprise. One would > expect, when a file-like object is passed in as a keyword argument, that it be > placed in the corresponding attribute. The behavior matches the doc: Popen.stdin If the stdin argument was PIPE, this attribute is a file object that provides input to the child process. Otherwise, it is None. -- ditto for Popen.stdout, .stderr > That way, if one wants to do > p.stdout.close() (which is necessary in some cases), one doesn't hit an > AttributeError because NoneType has no attribute 'close'. I believe you are expected to keep a reference to anything you pass in. 
pout = open(os.devnull, 'w+b') p = Popen(cmd, stdout=pout, stderr=PIPE) The attributes were added for the case when you do not otherwise have access. > This seems like it might be a bug, but if so it does seem rather egregious: It would be egregious if it were a bug, but it is not. > someone tell me if there is a good design reason for the current behaviour? If > there isn't one, I'll raise an issue. That seems like a possibly reasonable enhancement request. But the counterargument might be that you have to separately keep track of the need to close anyway. Or that you should do things like with open(os.devnull, 'w+b') as pout: p = Popen(cmd, stdout=pout, stderr=PIPE) -- Terry Jan Reedy From p.f.moore at gmail.com Sun Jan 8 01:04:38 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 8 Jan 2012 00:04:38 +0000 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> Message-ID: On 7 January 2012 22:56, Eli Bendersky wrote: > >> A then-related question is whether Python 3.3 should be compiled with >> Visual >> Studio 11. I'd still be in favor of that, provided Microsoft manages to >> release >> that soon enough. > > > Martin, I assume you mean the Express version of Visual Studio 11 here, > right? I would assume that Express should work, but the python.org distributed binaries will use the full version (IIUC, the official distribution uses some optimisations not present in Express - Profile Guided Optimisation, I believe). Paul. 
From brian at python.org Sun Jan 8 01:11:22 2012 From: brian at python.org (Brian Curtin) Date: Sat, 7 Jan 2012 18:11:22 -0600 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> Message-ID: On Sat, Jan 7, 2012 at 18:04, Paul Moore wrote: > On 7 January 2012 22:56, Eli Bendersky wrote: >> >>> A then-related question is whether Python 3.3 should be compiled with >>> Visual >>> Studio 11. I'd still be in favor of that, provided Microsoft manages to >>> release >>> that soon enough. >> >> >> Martin, I assume you mean the Express version of Visual Studio 11 here, >> right? > > I would assume that Express should work, but the python.org > distributed binaries will use the full version (IIUC, the official > distribution uses some optimisations not present in Express - Profile > Guided Optimisation, I believe). The bigger issue is how Express doesn't (officially) support x64 builds, unless that's changing in VS11. Perhaps this is better for another topic, but is anyone using the PGO stuff? I know we have PGInstrument and PGUpdate build configurations but I've never seen them mentioned anywhere. From nyamatongwe at gmail.com Sun Jan 8 01:12:08 2012 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Sun, 8 Jan 2012 11:12:08 +1100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <20120107235729.5d3953af@pitrou.net> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> Message-ID: Antoine Pitrou: > When you say MoveFile is absent, is MoveFileEx supported instead? WinRT strongly prefers asynchronous methods for all lengthy operations. The most likely call to use for moving files is StorageFile.MoveAsync. http://msdn.microsoft.com/en-us/library/windows/apps/br227219.aspx > Depending on the extent of removed/disabled functionality, it might not > be very interesting to have a Metro port at all. 
Asynchronous APIs will become much more important on all platforms in the future to ensure responsive user interfaces. Python should not be left behind. Neil From mwm at mired.org Sun Jan 8 01:14:06 2012 From: mwm at mired.org (Mike Meyer) Date: Sat, 7 Jan 2012 16:14:06 -0800 Subject: [Python-Dev] A question about the subprocess implementation In-Reply-To: References: Message-ID: <20120107161406.5c46b9b0@bhuda.mired.org> On Sat, 7 Jan 2012 21:25:37 +0000 (UTC) Vinay Sajip wrote: > The subprocess.Popen constructor takes stdin, stdout and stderr keyword > arguments which are supposed to represent the file handles of the child process. > The object also has stdin, stdout and stderr attributes, which one would naively > expect to correspond to the passed in values, except where you pass in e.g. > subprocess.PIPE (in which case the corresponding attribute would be set to an > actual stream or descriptor). > > However, in common cases, even when keyword arguments are passed in, the > corresponding attributes are set to None. The following script Note that this is documented behavior for these attributes. > This seems to me to contradict the principle of least surprise. One > would expect, when an file-like object is passed in as a keyword > argument, that it be placed in the corresponding attribute. Since the only reason they exist is so you can access your end of a pipe, setting them to anything would seem to be a bug. I'd argue that their existence is more a pola violation than them having the value None. But None is easier than a call to hasattr. > That way, if one wants to do p.stdout.close() (which is necessary in > some cases), one doesn't hit an AttributeError because NoneType has > no attribute 'close'. You can close the object you passed in if it wasn't PIPE. If you passed in PIPE, the object has to be exposed some way, otherwise you *can't* close it. This did raise one interesting question, which will go to ideas... 
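Mike's distinction (only the PIPE case creates an object the parent must close; anything passed in remains the caller's responsibility) is commonly handled with a small guard. The following is an illustrative sketch, not part of the subprocess API, and the helper name `close_pipe_ends` is invented:

```python
from subprocess import PIPE, Popen

def close_pipe_ends(proc):
    """Close whichever pipe ends this Popen object exposes.

    Only streams created for PIPE are non-None; a file object the
    caller passed in is the caller's job to track and close.
    """
    for stream in (proc.stdin, proc.stdout, proc.stderr):
        if stream is not None:
            stream.close()

p = Popen(["echo", "done"], stdout=PIPE)  # assumes a POSIX system
out = p.stdout.read()  # drain the pipe before closing our end
p.wait()
close_pipe_ends(p)
print(out)  # b'done\n'
```

The None check is exactly the conditional close logic the thread is debating; with this helper it lives in one place.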
http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org 
> From solipsis at pitrou.net Sun Jan 8 01:27:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 08 Jan 2012 01:27:34 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> Message-ID: <1325982454.3374.1.camel@localhost.localdomain> > > When you say MoveFile is absent, is MoveFileEx supported instead? > > WinRT strongly prefers asynchronous methods for all lengthy > operations. The most likely call to use for moving files is > StorageFile.MoveAsync. > http://msdn.microsoft.com/en-us/library/windows/apps/br227219.aspx How does it translate to C? > > Depending on the extent of removed/disabled functionality, it might not > > be very interesting to have a Metro port at all. > > Asynchronous APIs will become much more important on all platforms > in the future to ensure responsive user interfaces. Python should not > be left behind. I'm not sure why "responsive user interfaces" would be more important today than 10 years ago, but at least I hope Microsoft has found something more usable than overlapped I/O. Regards Antoine. From ncoghlan at gmail.com Sun Jan 8 01:32:10 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 8 Jan 2012 10:32:10 +1000 Subject: [Python-Dev] [Python-checkins] cpython: Issue #9993: When the source and destination are on different filesystems, In-Reply-To: <20120107190031.2f59ca63@pitrou.net> References: <20120107190031.2f59ca63@pitrou.net> Message-ID: On Sun, Jan 8, 2012 at 4:00 AM, Antoine Pitrou wrote: > I'm not sure it was *well* defined (or even defined at all). It seems > more of a by-product of the implementation. It's not only different > from mv, but it's inconsistent with itself (the semantics are different > depending on whether the paths are on the same filesystem or not; > also, it copied the *file* but erased the *link*). Yeah, Hynek's explanation pointing out the existing inconsistencies made sense to me. 
I have to agree with the point that symlinks+removable media are almost inevitably going to create weirdness that isn't easily handled by any means other than "symlinks=False" :P Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nyamatongwe at gmail.com Sun Jan 8 02:02:21 2012 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Sun, 8 Jan 2012 12:02:21 +1100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <1325982454.3374.1.camel@localhost.localdomain> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <1325982454.3374.1.camel@localhost.localdomain> Message-ID: Antoine Pitrou: > How does it translate to C? The simplest technique would be to use C++ code to bridge from C to the API. If you really wanted to you could explicitly call the function pointer in the COM vtable but doing COM in C is more effort than calling through C++. > I'm not sure why "responsive user interfaces" would be more important > today than 10 years ago, but at least I hope Microsoft has found > something more usable than overlapped I/O. They are more important now due to the use of phones and tablets together with distant file systems. Neil From python-dev at masklinn.net Sun Jan 8 02:19:38 2012 From: python-dev at masklinn.net (Xavier Morel) Date: Sun, 8 Jan 2012 02:19:38 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <1325982454.3374.1.camel@localhost.localdomain> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <1325982454.3374.1.camel@localhost.localdomain> Message-ID: <1F06B9D1-2997-40CC-9A75-12EB0FB7185A@masklinn.net> On 2012-01-08, at 01:27 , Antoine Pitrou wrote: >>> When you say MoveFile is absent, is MoveFileEx supported instead? >> WinRT strongly prefers asynchronous methods for all lengthy >> operations. The most likely call to use for moving files is >> StorageFile.MoveAsync. 
>> http://msdn.microsoft.com/en-us/library/windows/apps/br227219.aspx > How does it translate to C? From what I've read so far, it does not. WinRT inherits from COM (and the .net framework in some parts), so it seems like it's fundamentally an object-based API and the lowest-level language available is two variants of C++ (a template library and an extension to C++ which looks a bit like MS's older C++/CLI). I have not seen any mention of C bindings for WinRT so far. From vinay_sajip at yahoo.co.uk Sun Jan 8 02:48:54 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sun, 8 Jan 2012 01:48:54 +0000 (UTC) Subject: [Python-Dev] A question about the subprocess implementation References: Message-ID: Terry Reedy udel.edu> writes: > The behavior matches the doc: Popen.stdin > If the stdin argument was PIPE, this attribute is a file object that > provides input to the child process. Otherwise, it is None. Right, but it's not very helpful, nor especially intuitive. Why does it have to be None in the case where you pass in a file object? Is there some benefit to be gained by doing this? Does something bad happen if you store that file object in proc.stdin / proc.stdout / proc.stderr? > I believe you are expected to keep a reference to anything you pass in. This can of course be done, but it can make code less clear than it needs to be. For example, if you run a subprocess asynchronously, the code that makes the Popen constructor call can be in a different place to the code that e.g. captures process output after completion. For that code to know how the Popen was constructed seems to make coupling overly strong. > That seems like a possibly reasonable enhancement request. But the > counterargument might be that you have to separately keep track of the > need to close anyway. 
It may be that the close() needs to be called whether you passed PIPE in, or a file-like object - (a) because of the need to receive and handle SIGPIPE in command pipelines, and (b) because it's e.g. set to a pipe you constructed yourself, and you need to close the write end before you can issue an unsized read on the read end. So the close logic would have to do e.g. if proc.stdout is not None: proc.stdout.close() else: # pull out the reference from some other place and then close it rather than just proc.stdout.close() It's doable, of course. The with construction you suggested isn't usable in the general case, where the close() code is in a different place from the code which fires off the subprocess. Of course, since the behaviour matches the docs it would be an enhancement request rather than a bug report. I was hoping someone could enlighten me as to the *reason* for the current behaviour ... as it is, subprocess comes in for some stick in the community for being "hard to use" ... Regards, Vinay Sajip From vinay_sajip at yahoo.co.uk Sun Jan 8 03:06:33 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sun, 8 Jan 2012 02:06:33 +0000 (UTC) Subject: [Python-Dev] A question about the subprocess implementation References: <20120107161406.5c46b9b0@bhuda.mired.org> Message-ID: Mike Meyer mired.org> writes: > Since the only reason they exist is so you can access your end of a > pipe, setting them to anything would seem to be a bug. I'd argue that > their existence is more a pola violation than them having the value > None. But None is easier than a call to hasattr. I don't follow your reasoning, re. why setting them to a handle used for subprocess output would be a bug - it's logically the same as the PIPE case. For example, I might have a pipe (say, constructed using os.pipe()) whose write end is intended for the subprocess to output to, and whose read end I want to hand off to some other code to read the output from the subprocess. 
However, if that other code does a read() on that pipe, it will hang until the write handle for the pipe is closed. So, once the subprocess has terminated, I need to close the write handle. The actual reading might be done not in my code but in some client code of my code. While I could use some other place to store it, where's the problem in storing it in proc.stdout or proc.stderr? > You can close the object you passed in if it wasn't PIPE. If you > passed in PIPE, the object has to be exposed some way, otherwise you > *can't* close it. Yes, I'm not disputing that I need to keep track of it - just that proc.stdout seems a good place to keep it. That way, the closing code can be de-coupled from the code that sets up the subprocess. A use case for this is when you want the subprocess and the parent to run concurrently/asynchronously, so the proc.wait() and subsequent processing happens at a different time and place to the kick-off. Regards, Vinay Sajip From dasdasich at googlemail.com Sun Jan 8 03:29:45 2012 From: dasdasich at googlemail.com (Daniel Neuhäuser) Date: Sun, 8 Jan 2012 03:29:45 +0100 Subject: [Python-Dev] A question about the subprocess implementation In-Reply-To: References: Message-ID: That's documented behaviour nonetheless. I would agree that the behaviour is a stupid one (not knowing the reason for it); even so it cannot be changed in a backwards compatible way. On 07.01.2012 at 22:25, Vinay Sajip wrote: > The subprocess.Popen constructor takes stdin, stdout and stderr keyword > arguments which are supposed to represent the file handles of the child process. > The object also has stdin, stdout and stderr attributes, which one would naively > expect to correspond to the passed in values, except where you pass in e.g. > subprocess.PIPE (in which case the corresponding attribute would be set to an > actual stream or descriptor). 
> > However, in common cases, even when keyword arguments are passed in, the > corresponding attributes are set to None. The following script > > import os > from subprocess import Popen, PIPE > import tempfile > > cmd = 'ls /tmp'.split() > > p = Popen(cmd, stdout=open(os.devnull, 'w+b')) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > p = Popen(cmd, stdout=tempfile.TemporaryFile()) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > > prints > > process output streams: None, None > process output streams: None, None > > under both Python 2.7 and 3.2. However, if subprocess.PIPE is passed in, then > the corresponding attribute *is* set: if the last four lines are changed to > > p = Popen(cmd, stdout=PIPE) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > p = Popen(cmd, stdout=open(os.devnull, 'w+b'), stderr=PIPE) > print('process output streams: %s, %s' % (p.stdout, p.stderr)) > > then you get > > process output streams: <open file '<fdopen>', mode 'rb' at 0x2088660>, None > process output streams: None, <open file '<fdopen>', mode 'rb' at 0x2088e40> > > under Python 2.7, and > > process output streams: <_io.FileIO name=3 mode='rb'>, None > process output streams: None, <_io.FileIO name=5 mode='rb'> > > This seems to me to contradict the principle of least surprise. One would > expect, when a file-like object is passed in as a keyword argument, that it be > placed in the corresponding attribute. That way, if one wants to do > p.stdout.close() (which is necessary in some cases), one doesn't hit an > AttributeError because NoneType has no attribute 'close'. > > This seems like it might be a bug, but if so it does seem rather egregious: can > someone tell me if there is a good design reason for the current behaviour? If > there isn't one, I'll raise an issue. 
> > Regards, > > Vinay Sajip From vandry at TZoNE.ORG Sun Jan 8 03:28:56 2012 From: vandry at TZoNE.ORG (Phil Vandry) Date: Sun, 08 Jan 2012 11:28:56 +0900 Subject: [Python-Dev] A question about the subprocess implementation In-Reply-To: References: Message-ID: <4F08FF68.1020406@TZoNE.ORG> On 2012-01-08 10:48 , Vinay Sajip wrote: > Terry Reedy udel.edu> writes: >> The behavior matches the doc: Popen.stdin >> If the stdin argument was PIPE, this attribute is a file object that >> provides input to the child process. Otherwise, it is None. > > Right, but it's not very helpful, nor especially intuitive. Why does it have to > be None in the case where you pass in a file object? Is there some benefit to be > gained by doing this? Does something bad happen if you store that file object in > proc.stdin / proc.stdout / proc.stderr? proc.stdin, proc.stdout, and proc.stderr aren't meant to be a reference to the file that got connected to the subprocess' stdin/stdout/stderr. They are meant to be a reference to the OTHER END of the pipe that got connected. When you pass in a normal file object there is no such thing as the OTHER END of that file. The value None reflects this fact, and should continue to do so. 
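The os.pipe() scenario discussed in this thread, together with the point that there is no "other end" for Popen to expose when the caller supplies the stream, can be sketched like this (POSIX-only, purely illustrative):

```python
import os
import subprocess

r, w = os.pipe()
p = subprocess.Popen(["echo", "hello"], stdout=w)  # caller supplies the fd
p.wait()

# The caller supplied the stream, so there is no "other end" for
# Popen to expose: p.stdout is None, per the documented behaviour.
print(p.stdout)  # None

# The parent still holds its copy of the write end; until it is
# closed, a read on the read end will never see EOF.
os.close(w)

chunks = []
while True:
    chunk = os.read(r, 4096)
    if not chunk:
        break
    chunks.append(chunk)
os.close(r)
data = b"".join(chunks)
print(data)  # b'hello\n'
```

Closing the parent's write end after the child exits is exactly the step the caller must track itself, since Popen holds no reference to it.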
-Phil From mwm at mired.org Sun Jan 8 03:48:56 2012 From: mwm at mired.org (Mike Meyer) Date: Sat, 7 Jan 2012 18:48:56 -0800 Subject: [Python-Dev] A question about the subprocess implementation In-Reply-To: References: <20120107161406.5c46b9b0@bhuda.mired.org> Message-ID: <20120107184856.382eef31@bhuda.mired.org> On Sun, 8 Jan 2012 02:06:33 +0000 (UTC) Vinay Sajip wrote: > Mike Meyer mired.org> writes: > > > Since the only reason they exist is so you can access your end of a > > pipe, setting them to anything would seem to be a bug. I'd argue that > > their existence is more a pola violation than them having the value > > None. But None is easier than a call to hasattr. > > I don't follow your reasoning, re. why setting them to a handle used for > subprocess output would be a bug - it's logically the same as the PIPE case. No, it isn't. In the PIPE case, the value of the attributes isn't otherwise available to the caller. I think you're not following because you're thinking about what you want to do with the attributes: > storing it [the fd] in proc.stdout or proc.stderr? As opposed to what they're used for, which is communicating the fd's created in the PIPE case to the caller. Would you feel the same way if they were given the more accurate names "pipe_input" and "pipe_output"? > > You can close the object you passed in if it wasn't PIPE. If you > > passed in PIPE, the object has to be exposed some way, otherwise you > > *can't* close it. > Yes, I'm not disputing that I need to keep track of it - just that proc.stdout > seems a good place to keep it. I disagree. Having the proc object keep track of these things for you is making it more complicated (by the admittedly trivial change of assigning those two attributes when they aren't used) so you can make your process creation code less complicated (by the equally trivial change of assigning the values in those two attributes when they are used). 
Since only the caller knows when this complication is needed, that's the logical place to put it. > That way, the closing code can be de-coupled from the code that sets > up the subprocess. There are other ways to do that. It's still the same tradeoff - you're making the proc code more complicated to make the calling code simpler, even though only the calling code knows if that's needed. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From martin at v.loewis.de Sun Jan 8 04:17:14 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sun, 08 Jan 2012 04:17:14 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> Message-ID: <20120108041714.Horde.YrX3Btjz9kRPCQq6ODd2_PA@webmail.df.eu> Quoting Eli Bendersky: >> A then-related question is whether Python 3.3 should be compiled with >> Visual Studio 11. I'd still be in favor of that, provided Microsoft >> manages to >> release that soon enough. >> > > Martin, I assume you mean the Express version of Visual Studio 11 here, > right? *Here*, I mean "Visual Studio 11, any edition". I don't think the edition matters for determining what version the project files have - any edition will be able to read the project files, Express or not. If you are specifically asking whether I would make the release of the express edition a prerequisite to releasing Python: no, I wouldn't. I would expect that Microsoft releases the express edition along with or soon after the commercial editions, and the commercial edition is sufficient for running the Python release process. 
Regards, Martin From martin at v.loewis.de Sun Jan 8 04:35:17 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sun, 08 Jan 2012 04:35:17 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <20120107235729.5d3953af@pitrou.net> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> Message-ID: <20120108043517.Horde.AUkaT9jz9kRPCQ71FhcAPHA@webmail.df.eu> > When you say MoveFile is absent, is MoveFileEx supported instead? > Or is moving files just totally impossible? I can't check the SDK headers right now, but according to the online documentation, MoveFileExW is indeed available. I'm not sure whether you are allowed to pass arbitrary file names in an App, though. > Depending on the extent of removed/disabled functionality, it might not > be very interesting to have a Metro port at all. I'm not so sure. Even if the low-level Win32 API was not available, you might still be able to do useful things with the higher-level APIs, such as Windows.Storage (in case of file access). If you use, say, Windows.Storage.ApplicationData.RoamingSettings in your app, you should not actually worry what the file is named on disk (or whether there is a spinning disk in the system at all, which probably isn't). Regards, Martin From martin at v.loewis.de Sun Jan 8 04:38:38 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sun, 08 Jan 2012 04:38:38 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> <20120107230733.Horde.nZHobKGZi1VPCMIlAXXCemA@webmail.df.eu> Message-ID: <20120108043838.Horde.MAeIAdjz9kRPCQ__2wbnJUA@webmail.df.eu> > Perhaps this is better for another topic, but is anyone using the PGO > stuff? I know we have PGInstrument and PGUpdate build configurations > but I've never seen them mentioned anywhere. I'm using them in the 32-bit builds. 
I don't use them for the 64-bit builds, as the build machine was a 32-bit system (but perhaps I start with PGO for Win64 for 3.3). Regards, Martin From martin at v.loewis.de Sun Jan 8 04:42:46 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sun, 08 Jan 2012 04:42:46 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <1325982454.3374.1.camel@localhost.localdomain> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <1325982454.3374.1.camel@localhost.localdomain> Message-ID: <20120108044246.Horde.oH2ZD9jz9kRPCRC2frhgQvA@webmail.df.eu> Zitat von Antoine Pitrou : >> > When you say MoveFile is absent, is MoveFileEx supported instead? >> >> WinRT strongly prefers asynchronous methods for all lengthy >> operations. The most likely call to use for moving files is >> StorageFile.MoveAsync. >> http://msdn.microsoft.com/en-us/library/windows/apps/br227219.aspx > > How does it translate to C? Not sure whether you are asking literally for *C*: please remember that my original report said that C is apparently not currently supported for Apps. In any case, for native C++ code, do StorageFile ^the_file = something(); the_file->MoveAsync(destinationFolder, "newfile.txt"); This may look like managed C++ to you, but it really compiles into native code. Regards, Martin From paul at smedley.id.au Sun Jan 8 09:37:48 2012 From: paul at smedley.id.au (Paul Smedley) Date: Sun, 08 Jan 2012 19:07:48 +1030 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 In-Reply-To: References: <20120106212829.7f4b5f43@pitrou.net> Message-ID: On 07/01/12 08:22, Paul Smedley wrote: >> For the purpose of debugging you could *not* ignore the error and >> instead print it out or bail out. 
> Thanks - commenting out the ImportErrors block, I get: > ImportError: No module named encodings OK got through this - PYTHONPATH in makefile was borked for OS/2 (: separators vs ; which don't work so well with drive letters) Now having trouble importing the _io module even though it's builtin From paul at smedley.id.au Sun Jan 8 09:42:48 2012 From: paul at smedley.id.au (Paul Smedley) Date: Sun, 08 Jan 2012 19:12:48 +1030 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 In-Reply-To: References: <20120106212829.7f4b5f43@pitrou.net> Message-ID: On 08/01/12 19:07, Paul Smedley wrote: > On 07/01/12 08:22, Paul Smedley wrote: >>> For the purpose of debugging you could *not* ignore the error and >>> instead print it out or bail out. >> Thanks - commenting out the ImportErrors block, I get: >> ImportError: No module named encodings > > OK got through this - PYTHONPATH in makefile was borked for OS/2 (: > separators vs ; which don't work so well with drive letters) > > Now having trouble importing the _io module even though it's builtin > to be clear, the error is: Fatal Python error: Py_Initialize: can't initialize sys standard streams Traceback (most recent call last): File "U:/DEV/python-3.2.2/Lib/io.py", line 60, in Killed by SIGABRT From paul at smedley.id.au Sun Jan 8 09:59:59 2012 From: paul at smedley.id.au (Paul Smedley) Date: Sun, 08 Jan 2012 19:29:59 +1030 Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3 In-Reply-To: References: <20120106212829.7f4b5f43@pitrou.net> Message-ID: On 08/01/12 19:12, Paul Smedley wrote: > On 08/01/12 19:07, Paul Smedley wrote: >> On 07/01/12 08:22, Paul Smedley wrote: >>>> For the purpose of debugging you could *not* ignore the error and >>>> instead print it out or bail out. 
>>> Thanks - commenting out the ImportErrors block, I get: >>> ImportError: No module named encodings >> >> OK got through this - PYTHONPATH in makefile was borked for OS/2 (: >> separators vs ; which don't work so well with drive letters) >> >> Now having trouble importing the _io module even though it's builtin >> >> > to be clear, the error is: > Fatal Python error: Py_Initialize: can't initialize sys standard streams > Traceback (most recent call last): > File "U:/DEV/python-3.2.2/Lib/io.py", line 60, in > > Killed by SIGABRT > > and it's dying in _iomodule.c at: /* put os in the module state */ state->os_module = PyImport_ImportModule("os"); if (state->os_module == NULL){ fprintf(stderr,"_iomodule fail\n"); goto fail;} for some reason.. at least I'm slowly making progress :P (I think) Cheers, Paul From neologix at free.fr Sun Jan 8 12:32:08 2012 From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Sun, 8 Jan 2012 12:32:08 +0100 Subject: [Python-Dev] usefulness of Python version of threading.RLock In-Reply-To: References: Message-ID: > The yes/no answer is "No, we can't drop it". Thanks, that's a clear answer :-) > I'm not convinced of the benefits of removing the pure Python RLock > implementation Indeed. As noted, this issue with signal handlers is more general, so this wouldn't solve the problem at hand. I just wanted to know whether we could remove this "duplicate" code, but since it might be used by some implementations, it's best to keep it. From vinay_sajip at yahoo.co.uk Sun Jan 8 13:09:38 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sun, 8 Jan 2012 12:09:38 +0000 (UTC) Subject: [Python-Dev] A question about the subprocess implementation References: <4F08FF68.1020406@TZoNE.ORG> Message-ID: Phil Vandry TZoNE.ORG> writes: > proc.stdin, proc.stdout, and proc.stderr aren't meant to be a reference > to the file that got connected to the subprocess' stdin/stdout/stderr. 
> They are meant to be a reference to the OTHER END of the pipe that got > connected. Of course, and I've been using them like that, in general. But reading those two sentences above made the light bulb come on :-) Regards, Vinay Sajip From jimjjewett at gmail.com Sun Jan 8 23:33:32 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 8 Jan 2012 17:33:32 -0500 Subject: [Python-Dev] Hash collision security issue (now public) Message-ID: In http://mail.python.org/pipermail/python-dev/2012-January/115368.html Stefan Behnel wrote: > Admittedly, this may require some adaptation for the PEP393 unicode memory > layout in order to produce identical hashes for all three representations > if they represent the same content. They SHOULD NOT represent the same content; comparing two strings currently requires converting them to canonical form, which means the smallest format (of those three) that works. If it can be represented in PyUnicode_1BYTE_KIND, then representations using PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND don't count as canonical, won't be created by Python itself, and already compare unequal according to both PyUnicode_RichCompare and stringlib/eq.h (a shortcut used by dicts). That said, I don't think smallest-format is actually enforced with anything stronger than comments (such as in unicodeobject.h struct PyASCIIObject) and asserts (mostly calling _PyUnicode_CheckConsistency). I don't have any insight on how prevalent non-conforming strings will be in practice, or whether supporting their equality will be required as a bugfix. 
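From pure Python the canonical-form invariant described above is invisible: however a string is built, equal content means equal comparison and equal hash. A small illustration of CPython's behaviour (the invariant can only be broken from the C API):

```python
# Three different construction paths, one canonical representation.
a = "caf\u00e9"
b = "caf" + chr(0xE9)
c = "".join(["c", "a", "f", "\xe9"])
assert a == b == c
assert hash(a) == hash(b) == hash(c)

# Slicing a string that needs the widest (4-byte) representation
# re-canonicalises the result down to the smallest form, so it still
# compares and hashes equal to a string built narrow from the start.
wide = "abc\U0001F600"
assert wide[:3] == "abc"
assert hash(wide[:3]) == hash("abc")
```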
-jJ
From brian at python.org Mon Jan 9 01:36:59 2012 From: brian at python.org (Brian Curtin) Date: Sun, 8 Jan 2012 18:36:59 -0600 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: Message-ID: On Sun, Jan 8, 2012 at 16:33, Jim Jewett wrote: > In http://mail.python.org/pipermail/python-dev/2012-January/115368.html > Stefan Behnel wrote: Can you please configure your mail client to not create new threads like this? As if this topic wasn't already hard enough to follow, it now exists across handfuls of threads with the same title.
From ncoghlan at gmail.com Mon Jan 9 01:40:18 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 9 Jan 2012 10:40:18 +1000 Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as In-Reply-To: References: Message-ID: On Mon, Jan 9, 2012 at 5:31 AM, charles-francois.natali wrote: > Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as > it should on OpenIndiana. Can rewinddir() end up touching the filesystem to retrieve data? I noticed that your previous change (the one this checkin reverted) moved it outside the GIL release macros. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
From benjamin at python.org Mon Jan 9 01:43:33 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 8 Jan 2012 19:43:33 -0500 Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as In-Reply-To: References: Message-ID: 2012/1/8 Nick Coghlan : > On Mon, Jan 9, 2012 at 5:31 AM, charles-francois.natali > wrote: >> Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as >> it should on OpenIndiana. > > Can rewinddir() end up touching the filesystem to retrieve data? I > noticed that your previous change (the one this checkin reverted) > moved it outside the GIL release macros.
It just resets a position count. (in glibc). -- Regards, Benjamin From lists at cheimes.de Mon Jan 9 02:01:46 2012 From: lists at cheimes.de (Christian Heimes) Date: Mon, 09 Jan 2012 02:01:46 +0100 Subject: [Python-Dev] py3benchmark not working Message-ID: Hello, I tried to compare the py3k baseline with my randomhash branch but the benchmark suite is failing. I've follewed the instruction # hg clone http://hg.python.org/benchmarks/ py2benchmarks # mkdir py3benchmarks; # cd py3benchmarks # ../py2benchmarks/make_perf3.sh ../py2benchmarks # python3.1 perf.py -b py3k old_py3k new_py3k but the suite immediately bails out: $ ../3.1/python perf.py -r -b default ../py3k/python ../randomhash/python Running 2to3... INFO:root:Running ../py3k/python lib/2to3/2to3 -f all lib/2to3_data Traceback (most recent call last): File "perf.py", line 2236, in main(sys.argv[1:]) File "perf.py", line 2192, in main options))) File "perf.py", line 1279, in BM_2to3 return SimpleBenchmark(Measure2to3, *args, **kwargs) File "perf.py", line 706, in SimpleBenchmark *args, **kwargs) File "perf.py", line 1275, in Measure2to3 return MeasureCommand(command, trials, env, options.track_memory) File "perf.py", line 1223, in MeasureCommand CallAndCaptureOutput(command, env=env) File "perf.py", line 1053, in CallAndCaptureOutput raise RuntimeError("Benchmark died: " + str(stderr, 'ascii')) RuntimeError: Benchmark died: RefactoringTool: Skipping implicit fixer: buffer RefactoringTool: Skipping implicit fixer: idioms RefactoringTool: Skipping implicit fixer: set_literal RefactoringTool: Skipping implicit fixer: ws_comma Traceback (most recent call last): File "lib/2to3/2to3", line 5, in sys.exit(main("lib2to3.fixes")) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/main.py", line 173, in main options.processes) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/refactor.py", line 700, in refactor items, write, doctests_only) File 
"/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/refactor.py", line 294, in refactor self.refactor_dir(dir_or_file, write, doctests_only) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/refactor.py", line 314, in refactor_dir self.refactor_file(fullname, write, doctests_only) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/refactor.py", line 741, in refactor_file *args, **kwargs) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/refactor.py", line 349, in refactor_file tree = self.refactor_string(input, filename) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/refactor.py", line 381, in refactor_string self.refactor_tree(tree, name) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/refactor.py", line 455, in refactor_tree new = fixer.transform(node, results) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/fixes/fix_operator.py", line 43, in transform method = self._check_method(node, results) File "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/fixes/fix_operator.py", line 89, in _check_method method = getattr(self, "_" + results["method"][0].value.encode("ascii")) TypeError: Can't convert 'bytes' object to str implicitly Christian From solipsis at pitrou.net Mon Jan 9 02:24:42 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 9 Jan 2012 02:24:42 +0100 Subject: [Python-Dev] py3benchmark not working References: Message-ID: <20120109022442.089d190f@pitrou.net> On Mon, 09 Jan 2012 02:01:46 +0100 Christian Heimes wrote: > > I tried to compare the py3k baseline with my randomhash branch but the > benchmark suite is failing. > > I've follewed the instruction For the record, you don't really need this. Just run the "2n3" benchmark set (it works under both 2.x and 3.x). The "py3k" set will include a couple more/other benchmarks though. Regards Antoine. 
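The TypeError at the bottom of Christian's traceback comes from fix_operator building a method name as str + bytes, which Python 3 never concatenates implicitly. A minimal reproduction (the class and names here are stand-ins, not the real lib2to3 code):

```python
class FakeFixer:
    # Stand-in for lib2to3's fixer class, which looks up "_" + method.
    def _iadd(self, node):
        return "transformed"

fixer = FakeFixer()
method = "iadd"          # in lib2to3 this value is already a str

raised = False
try:
    getattr(fixer, "_" + method.encode("ascii"))   # str + bytes
except TypeError:
    raised = True
assert raised

# Dropping the .encode("ascii") keeps everything as str, which is the
# fix suggested later in the thread.
assert getattr(fixer, "_" + method)(None) == "transformed"
```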
From jdhardy at gmail.com Mon Jan 9 07:13:25 2012 From: jdhardy at gmail.com (Jeff Hardy) Date: Sun, 8 Jan 2012 22:13:25 -0800 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <20120107235729.5d3953af@pitrou.net> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> Message-ID: On Sat, Jan 7, 2012 at 2:57 PM, Antoine Pitrou wrote: > Depending on the extent of removed/disabled functionality, it might not > be very interesting to have a Metro port at all. Win 8 is practically a new OS target - the nt module may need to be replaced with a metro module to handle it well. Accessing the WinRT APIs directly from Python will also require a set of Python projections for the API, which should be straightforward to generate from the WinRT metadata files. I know Dino Viehland did some work on that; not sure if he can elaborate or not though. Otherwise, IronPython would be the only option for writing Metro apps in Python - not that I'd be *horribly* upset at that :). IronPython is slowly growing Metro support, and it seems like most things will work, but the .NET framework shields it from a lot of the WinRT guts. - Jeff From stefan_ml at behnel.de Mon Jan 9 09:13:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 09 Jan 2012 09:13:15 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: Message-ID: Jim Jewett, 08.01.2012 23:33: > Stefan Behnel wrote: >> Admittedly, this may require some adaptation for the PEP393 unicode memory >> layout in order to produce identical hashes for all three representations >> if they represent the same content. > > They SHOULD NOT represent the same content; comparing two strings > currently requires converting them to canonical form, which means the > smallest format (of those three) that works. > [...] 
> That said, I don't think smallest-format is actually enforced with > anything stronger than comments (such as in unicodeobject.h struct > PyASCIIObject) and asserts (mostly calling > _PyUnicode_CheckConsistency). That's what I meant. AFAIR, the PEP393 discussions at some point brought up the suspicion that third party code may end up generating Unicode strings that do not comply with that "invariant". So internal code shouldn't strictly rely on it when it deals with user provided data. One example is the "unequal kinds" optimisation in equality comparison, which, if I'm not mistaken, wasn't implemented, due to exactly this reasoning. The same applies to hashing then. Stefan From neologix at free.fr Mon Jan 9 09:23:30 2012 From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Mon, 9 Jan 2012 09:23:30 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as In-Reply-To: References: Message-ID: >> Can rewinddir() end up touching the filesystem to retrieve data? I >> noticed that your previous change (the one this checkin reverted) >> moved it outside the GIL release macros. > > It just resets a position count. (in glibc). Actually, it also calls lseek() on the directory FD: http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/rewinddir.c;hb=HEAD But lseek() doesn't (normally) perform I/O, it just sets an offset in the kernel file structure: http://lxr.free-electrons.com/source/fs/read_write.c#L38 For example, it's not documented to return EINTR. Now, one could imagine that the kernel could do some read-ahead or some other magic things when passed SEEK_DATA or SEEK_HOLE, but seeking at the beginning of a directory FD should be fast. Anyway, I ended up reverting this change, because for some reason this broke OpenIndiana buildbots (maybe rewinddir() is a no-op before readdir() has been called?). 
Cheers, cf From mark at hotpy.org Mon Jan 9 09:56:56 2012 From: mark at hotpy.org (Mark Shannon) Date: Mon, 09 Jan 2012 08:56:56 +0000 Subject: [Python-Dev] py3benchmark not working In-Reply-To: References: Message-ID: <4F0AABD8.8000802@hotpy.org> Christian Heimes wrote: > Hello, > > I tried to compare the py3k baseline with my randomhash branch but the > benchmark suite is failing. > > I've follewed the instruction > > # hg clone http://hg.python.org/benchmarks/ py2benchmarks > # mkdir py3benchmarks; > # cd py3benchmarks > # ../py2benchmarks/make_perf3.sh ../py2benchmarks > # python3.1 perf.py -b py3k old_py3k new_py3k > > but the suite immediately bails out: > [snip] > "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/fixes/fix_operator.py", > line 89, in _check_method > method = getattr(self, "_" + results["method"][0].value.encode("ascii")) > TypeError: Can't convert 'bytes' object to str implicitly > You can temporarily "fix" this by removing the .encode("ascii") from line 89 in lib2to3/fixes/fix_operator.py I'm not sure if this is a bug in 2to3 or the benchmark. Cheers, Mark From victor.stinner at haypocalc.com Mon Jan 9 10:53:19 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 9 Jan 2012 10:53:19 +0100 Subject: [Python-Dev] Hash collision security issue (now public) In-Reply-To: References: Message-ID: > That said, I don't think smallest-format is actually enforced with > anything stronger than comments (such as in unicodeobject.h struct > PyASCIIObject) and asserts (mostly calling > _PyUnicode_CheckConsistency). ?I don't have any insight on how > prevalent non-conforming strings will be in practice, or whether > supporting their equality will be required as a bugfix. If you are only Python, you cannot create a string in a non canonical form. 
If you use the C API, you can create a string in a non canonical form using PyUnicode_New() + PyUnicode_WRITE, or PyUnicode_FromUnicode(NULL, length) (or PyUnicode_FromStringAndSize(NULL, length)) + direct access to the Py_UNICODE* string. If you create strings in a non canonical form, it is a bug in your application and Python doesn't help you. But how could Python help you? Expose a function to check your newly created string? There is already _PyUnicode_CheckConsistency() which is slow (O(n)) because it checks each character, it is only used in debug mode. Victor
From victor.stinner at haypocalc.com Mon Jan 9 10:58:25 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 9 Jan 2012 10:58:25 +0100 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: > -        if os.name in ('nt', 'os2'): > +        if os.name in ('nt'): This change is wrong: it should be os.name == 'nt'. Victor
From steve at pearwood.info Mon Jan 9 11:02:57 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 09 Jan 2012 21:02:57 +1100 Subject: [Python-Dev] Compiling 2.7.2 on OS/2 In-Reply-To: References: Message-ID: <4F0ABB51.2080004@pearwood.info> Victor Stinner wrote: >> - if os.name in ('nt', 'os2'): >> + if os.name in ('nt'): > > This change is wrong: it should be os.name == 'nt'. Or possibly os.name in ('nt', ) (note the comma). -- Steven
From benjamin at python.org Mon Jan 9 14:02:53 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 9 Jan 2012 08:02:53 -0500 Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as In-Reply-To: References: Message-ID: 2012/1/9 Charles-François Natali : >>> Can rewinddir() end up touching the filesystem to retrieve data? I >>> noticed that your previous change (the one this checkin reverted) >>> moved it outside the GIL release macros. >> >> It just resets a position count. (in glibc).
> > Actually, it also calls lseek() on the directory FD: > http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/rewinddir.c;hb=HEAD > > But lseek() doesn't (normally) perform I/O, it just sets an offset in > the kernel file structure: > http://lxr.free-electrons.com/source/fs/read_write.c#L38 Sorry, I should have implied I looked at the kernel source, too. :) -- Regards, Benjamin From pasparis at noos.fr Mon Jan 9 15:46:04 2012 From: pasparis at noos.fr (pasparis at noos.fr) Date: Mon, 9 Jan 2012 15:46:04 +0100 (CET) Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class Message-ID: An HTML attachment was scrubbed... URL: From jon at sandgate.com Mon Jan 9 15:32:13 2012 From: jon at sandgate.com (Jon Wells) Date: Tue, 10 Jan 2012 01:32:13 +1100 Subject: [Python-Dev] descriptor as instance attribute Message-ID: <1326119533.16276.54.camel@localhost> I can't find an answer to this grovelling through get user info. on descriptors. Assuming desc() is a data descriptor class why are the following not the same??? class poop(object): var = desc() and class poop(object): def __init__(self): self.var = desc() In the second form the descriptor protocol for access to 'var' is ignored. Would seem to not make sense to me. jon. From phd at phdru.name Mon Jan 9 16:51:35 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 9 Jan 2012 19:51:35 +0400 Subject: [Python-Dev] descriptor as instance attribute In-Reply-To: <1326119533.16276.54.camel@localhost> References: <1326119533.16276.54.camel@localhost> Message-ID: <20120109155135.GA27690@iskra.aviel.ru> Hello. We are sorry but we cannot help you. This mailing list is to work on developing Python (adding new features to Python itself and fixing bugs); if you're having problems learning, understanding or using Python, please find another forum. 
Probably python-list/comp.lang.python mailing list/news group is the best place; there are Python developers who participate in it; you may get a faster, and probably more complete, answer there. See http://www.python.org/community/ for other lists/news groups/fora. Thank you for understanding. On Tue, Jan 10, 2012 at 01:32:13AM +1100, Jon Wells wrote: > I can't find an answer to this grovelling through get user info. on > descriptors. Read carefully http://users.rcn.com/python/download/Descriptor.htm > Assuming desc() is a data descriptor class why are the following not the > same??? > > class poop(object): > var = desc() > > and > > class poop(object): > def __init__(self): > self.var = desc() > > In the second form the descriptor protocol for access to 'var' is > ignored. From http://users.rcn.com/python/download/Descriptor.htm: ...transforms b.x into type(b).__dict__['x'].__get__(b, type(b)).. Please note the first type(b). Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From amauryfa at gmail.com Mon Jan 9 19:09:19 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 9 Jan 2012 19:09:19 +0100 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: References: Message-ID: Good evening, 2012/1/9 > ** > I am trying to send a tuple to a method of a python class and I got a Run > failed from netbeans compiler > when I want to send a tuple to a simple method in a module it works,when I > want to send a simple parameter to a method of a clas it works also but not > a tuple to a method of a class > This mailing list is for the development *of* python. For development *with* python, please ask your questions on the comp.lang.python group or the python-list at python.org mailing list. There you will find friendly people willing to help. 
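The descriptor question above comes down to where the lookup happens: `b.x` only triggers the protocol when `x` is found on `type(b)`, which is why the instance-attribute version bypasses it. A short sketch:

```python
class Desc:
    """A minimal data descriptor (defines both __get__ and __set__)."""
    def __get__(self, obj, objtype=None):
        return "via descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class OnClass:
    var = Desc()                 # found on type(obj): protocol runs

class OnInstance:
    def __init__(self):
        # A plain entry in the instance __dict__ is never consulted as
        # a descriptor -- lookup just returns the Desc object itself.
        self.var = Desc()

assert OnClass().var == "via descriptor"
assert isinstance(OnInstance().var, Desc)
```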
[for your particular question: keep in mind that PyObject_Call takes arguments as a tuple; if you want to pass one tuple, you need to build a 1-tuple around your tuple] -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From dinov at microsoft.com Mon Jan 9 18:59:45 2012 From: dinov at microsoft.com (Dino Viehland) Date: Mon, 9 Jan 2012 17:59:45 +0000 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> Message-ID: <6C7ABA8B4E309440B857D74348836F2E4CCBBC92@TK5EX14MBXC292.redmond.corp.microsoft.com> We spent some time investigating Python/Win8 projections but we don't really have anything else to say right now, but it is certainly possible. I haven't been following this thread so maybe this was already discussed, but on the whole "new OS target" thing - if people want to write immersive apps in Python then there will need to be a new build of Python. One thing that might make that easier is the fact that the C runtime is still available to metro apps, even if the C runtime calls a banned API. So to the extent that Python is just a C program the "port" should be pretty easy and mostly involve disabling functionality that isn't available at all to metro apps. I have packaged up Python 2.7 in an appx and run the application verifier on it (this was a while ago, so things may have changed between now and then), the attached banned.txt includes the list of APIs which Python is using that aren't allowed for the curious. Also, people who write apps will need to distribute Python w/ their app, there's currently no sharing between apps. 
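Amaury's bracketed tip above has a direct pure-Python analogue: the argument tuple of PyObject_Call plays the role of *args, so passing a tuple as a single argument means wrapping it in a 1-tuple (in C, e.g. via Py_BuildValue("(O)", tup)). A sketch of the distinction:

```python
def call(func, args):
    # Mirrors PyObject_Call(func, args, NULL): one positional argument
    # per element of the args tuple.
    return func(*args)

def takes_one_tuple(t):
    return len(t)

point = (1, 2, 3)

# Wrong: the tuple's three elements arrive as three separate arguments.
try:
    call(takes_one_tuple, point)
except TypeError:
    pass

# Right: wrap the tuple so it arrives as one argument.
assert call(takes_one_tuple, (point,)) == 3
```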
-----Original Message----- From: Jeff Hardy [mailto:jdhardy at gmail.com] Sent: Sunday, January 08, 2012 10:13 PM To: Antoine Pitrou Cc: python-dev at python.org; Dino Viehland Subject: Re: [Python-Dev] Python as a Metro-style App On Sat, Jan 7, 2012 at 2:57 PM, Antoine Pitrou wrote: > Depending on the extent of removed/disabled functionality, it might > not be very interesting to have a Metro port at all. Win 8 is practically a new OS target - the nt module may need to be replaced with a metro module to handle it well. Accessing the WinRT APIs directly from Python will also require a set of Python projections for the API, which should be straightforward to generate from the WinRT metadata files. I know Dino Viehland did some work on that; not sure if he can elaborate or not though. Otherwise, IronPython would be the only option for writing Metro apps in Python - not that I'd be *horribly* upset at that :). IronPython is slowly growing Metro support, and it seems like most things will work, but the .NET framework shields it from a lot of the WinRT guts. - Jeff -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: banned.txt URL: From solipsis at pitrou.net Mon Jan 9 22:59:07 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 9 Jan 2012 22:59:07 +0100 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. References: Message-ID: <20120109225907.18e834c3@pitrou.net> On Mon, 09 Jan 2012 21:58:29 +0100 terry.reedy wrote: > > -Different branches are used at a time to represent different *minor versions* > -in which development is made. All development should be done **first** in the > -:ref:`in-development ` branch, and selectively backported > -to other branches when necessary. > +There is a branch for each *minor version*. Development is done separately > +for Python 2 and Python 3. 
For each *major version*, each change should be made > +**first** in the oldest branch to which it applies and forward-ported as > +appropriate. Please avoid using the terms "minor version" and "major version", they are confusing. Thanks Antoine. From neologix at free.fr Mon Jan 9 23:01:54 2012 From: neologix at free.fr (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Mon, 9 Jan 2012 23:01:54 +0100 Subject: [Python-Dev] svn.python.org certificate expired Message-ID: Hi, All the buildbots are turning red because of test_ssl: """ ====================================================================== ERROR: test_connect (test.test_ssl.NetworkedTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_ssl.py", line 616, in test_connect s.connect(("svn.python.org", 443)) File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/ssl.py", line 519, in connect self._real_connect(addr, False) File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/ssl.py", line 509, in _real_connect self.do_handshake() File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/ssl.py", line 489, in do_handshake self._sslobj.do_handshake() ssl.SSLError: [Errno 1] _ssl.c:420: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed """ It seems that svn.python.org certificate expired today (09/01/2012). Cheers, cf From ncoghlan at gmail.com Tue Jan 10 02:52:40 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Jan 2012 11:52:40 +1000 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: <20120109225907.18e834c3@pitrou.net> References: <20120109225907.18e834c3@pitrou.net> Message-ID: On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou wrote: > Please avoid using the terms "minor version" and "major version", they > are confusing. Indeed. 
"Feature release" (2.7, 3.2, 3.3) and "release series" (2.x, 3.x) are the least confusing terms we have available. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From tjreedy at udel.edu Tue Jan 10 05:05:05 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 09 Jan 2012 23:05:05 -0500 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: References: <20120109225907.18e834c3@pitrou.net> Message-ID: On 1/9/2012 8:52 PM, Nick Coghlan wrote: > On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou wrote: >> Please avoid using the terms "minor version" and "major version", they >> are confusing. > > Indeed. "Feature release" (2.7, 3.2, 3.3) and "release series" (2.x, > 3.x) are the least confusing terms we have available. I minimally edited what was already there to correct what is now an error. The change comes immediately after a section defining major, minor, and micro releases. To change terms, http://docs.python.org/devguide/devcycle.html and possibly other pages needs more extensive editing. -- Terry Jan Reedy From stefan_ml at behnel.de Tue Jan 10 09:35:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 10 Jan 2012 09:35:44 +0100 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: References: Message-ID: Hi, sorry for hooking into this off-topic thread. Amaury Forgeot d'Arc, 09.01.2012 19:09: > 2012/1/9 >> I am trying to send a tuple to a method of a python class and I got a Run >> failed from netbeans compiler >> when I want to send a tuple to a simple method in a module it works,when I >> want to send a simple parameter to a method of a clas it works also but not >> a tuple to a method of a class > > This mailing list is for the development *of* python. > For development *with* python, please ask your questions on > the comp.lang.python group or the python-list at python.org mailing list. 
> There you will find friendly people willing to help. It's also worth mentioning the cython-users mailing list here, in case the OP cares about simplifying these kinds of issues from the complexity of C/C++ into Python. Cython is a really good and simple way to implement these kinds of language interactions, also for embedding Python. > [for your particular question: keep in mind that PyObject_Call takes > arguments as a tuple; > if you want to pass one tuple, you need to build a 1-tuple around your > tuple] The presented code also requires a whole lot of fixes (specifically in the error handling parts) that Cython would basically just handle for you already. Stefan From anacrolix at gmail.com Tue Jan 10 09:40:39 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Tue, 10 Jan 2012 19:40:39 +1100 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: References: Message-ID: Perhaps the python-dev mailing list should be renamed to python-core. On Tue, Jan 10, 2012 at 7:35 PM, Stefan Behnel wrote: > Hi, > > sorry for hooking into this off-topic thread. > > Amaury Forgeot d'Arc, 09.01.2012 19:09: >> 2012/1/9 >>> I am trying to send a tuple to a method of a python class and I got a Run >>> failed from netbeans compiler >>> when I want to send a tuple to a simple method in a module it works,when I >>> want to send a simple parameter to a method of a clas it works also but not >>> a tuple to a method of a class >> >> This mailing list is for the development *of* python. >> For development *with* python, please ask your questions on >> the comp.lang.python group or the python-list at python.org mailing list. >> There you will find friendly people willing to help. > > It's also worth mentioning the cython-users mailing list here, in case the > OP cares about simplifying these kinds of issues from the complexity of > C/C++ into Python. 
Cython is a really good and simple way to implement > these kinds of language interactions, also for embedding Python. > > >> [for your particular question: keep in mind that PyObject_Call takes >> arguments as a tuple; >> if you want to pass one tuple, you need to build a 1-tuple around your >> tuple] > > The presented code also requires a whole lot of fixes (specifically in the > error handling parts) that Cython would basically just handle for you already. > > Stefan > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com -- ?_? From ncoghlan at gmail.com Tue Jan 10 09:50:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Jan 2012 18:50:11 +1000 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12760: Add a create mode to open(). Patch by David Townshend. In-Reply-To: References: Message-ID: On Tue, Jan 10, 2012 at 7:40 AM, charles-francois.natali wrote: > http://hg.python.org/cpython/rev/bf609baff4d3 > changeset: 74315:bf609baff4d3 > user: Charles-François Natali > date: Mon Jan 09 22:40:02 2012 +0100 > summary: > Issue #12760: Add a create mode to open(). Patch by David Townshend. To help make the 'x' more intuitive, it would be helpful if the mode was referred to as "exclusive create" in the docs (at least once, anyway), and the What's New entry stated explicitly that 'x' is used based on the C11 precedent. Otherwise, I'm sure I'll be far from the only one thinking "why not 'c'?". People shouldn't have to go read the tracker item to find out the reason 'x' is used instead of 'c'. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From rob.cliffe at btinternet.com Tue Jan 10 09:49:04 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 10 Jan 2012 08:49:04 +0000 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: References: <20120109225907.18e834c3@pitrou.net> Message-ID: <4F0BFB80.8010008@btinternet.com> But "minor version" and "major version" are readily understandable to the general reader, e.g. me, whereas "feature release" and "release series" I find are not. Couldn't the first two terms be defined once and then used throughout? Rob Cliffe On 10/01/2012 04:05, Terry Reedy wrote: > On 1/9/2012 8:52 PM, Nick Coghlan wrote: >> On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou >> wrote: >>> Please avoid using the terms "minor version" and "major version", they >>> are confusing. >> >> Indeed. "Feature release" (2.7, 3.2, 3.3) and "release series" (2.x, >> 3.x) are the least confusing terms we have available. > > I minimally edited what was already there to correct what is now an > error. The change comes immediately after a section defining major, > minor, and micro releases. To change terms, > http://docs.python.org/devguide/devcycle.html > and possibly other pages needs more extensive editing. > From anthony.hw.kong at gmail.com Tue Jan 10 11:03:25 2012 From: anthony.hw.kong at gmail.com (Anthony Kong) Date: Tue, 10 Jan 2012 21:03:25 +1100 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: <4F0BFB80.8010008@btinternet.com> References: <20120109225907.18e834c3@pitrou.net> <4F0BFB80.8010008@btinternet.com> Message-ID: I don't find 'major' and 'minor' confusing too. Maybe because it is the designation used in linux community for years. On Tue, Jan 10, 2012 at 7:49 PM, Rob Cliffe wrote: > But "minor version" and "major version" are readily understandable to the > general reader, e.g. me, whereas "feature release" and "release series" I > find are not. 
Couldn't the first two terms be defined once > and then used > throughout? > Rob Cliffe > > > On 10/01/2012 04:05, Terry Reedy wrote: > >> On 1/9/2012 8:52 PM, Nick Coghlan wrote: >> >>> On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou >>> wrote: >>> >>>> Please avoid using the terms "minor version" and "major version", they >>>> are confusing. >>>> >>> >>> Indeed. "Feature release" (2.7, 3.2, 3.3) and "release series" (2.x, >>> 3.x) are the least confusing terms we have available. >>> >> >> I minimally edited what was already there to correct what is now an >> error. The change comes immediately after a section defining major, minor, >> and micro releases. To change terms, >> http://docs.python.org/devguide/devcycle.html >> and possibly other pages needs more extensive editing. >> >> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ > anthony.hw.kong%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Tue Jan 10 11:09:37 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 10 Jan 2012 11:09:37 +0100 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: References: <20120109225907.18e834c3@pitrou.net> <4F0BFB80.8010008@btinternet.com> Message-ID: <20120110110937.4eb53781@rivendell> On Jan 10, 2012, at 09:03 PM, Anthony Kong wrote: >I don't find 'major' and 'minor' confusing too. Maybe because it is the >designation used in linux community for years. Neither do I. I read them as aliases for "leftmost digit" and "middle digit" respectively, regardless of Python's interpretation of them.
-Barry From peck at us.ibm.com Tue Jan 10 12:00:58 2012 From: peck at us.ibm.com (Jon K Peck) Date: Tue, 10 Jan 2012 04:00:58 -0700 Subject: [Python-Dev] AUTO: Jon K Peck is out of the office (returning 01/12/2012) Message-ID: I am out of the office until 01/12/2012. I will be out of the office Monday through Wednesday with limited access to email. Note: This is an automated response to your message "Python-Dev Digest, Vol 102, Issue 26" sent on 1/9/2012 21:05:32. This is the only notification you will receive while this person is away. From stefan_ml at behnel.de Tue Jan 10 13:17:34 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 10 Jan 2012 13:17:34 +0100 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: References: Message-ID: Matt Joiner, 10.01.2012 09:40: > Perhaps the python-dev mailing list should be renamed to python-core. Well, there *is* a rather visible warning on the list subscription page that tells people that it's most likely not the list they actually want to use. If they manage to ignore that, I doubt that a different list name would fix it for them. Stefan From solipsis at pitrou.net Tue Jan 10 13:57:05 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 10 Jan 2012 13:57:05 +0100 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. References: <20120109225907.18e834c3@pitrou.net> <4F0BFB80.8010008@btinternet.com> Message-ID: <20120110135705.3756738c@pitrou.net> On Tue, 10 Jan 2012 08:49:04 +0000 Rob Cliffe wrote: > But "minor version" and "major version" are readily understandable to > the general reader, e.g. me, whereas "feature release" and "release > series" I find are not. Couldn't the first two terms be defined once > and then used throughout? To me "minor" is a bugfix release, e.g. 2.7.2, and "major" is a feature release, e.g. 3.3. I have a hard time considering 3.2 or 3.3 "minor". Regards Antoine.
From victor.stinner at haypocalc.com Tue Jan 10 14:08:52 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 10 Jan 2012 14:08:52 +0100 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix stock symbol for Microsoft In-Reply-To: References: Message-ID: You may port the fix to 3.2 and 3.3. Victor 2012/1/10 raymond.hettinger : > http://hg.python.org/cpython/rev/068ce5d7f7e7 > changeset: 74320:068ce5d7f7e7 > branch: 2.7 > user: Raymond Hettinger > date: Tue Jan 10 09:51:51 2012 +0000 > summary: > Fix stock symbol for Microsoft > > files: > Doc/library/sqlite3.rst | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > > diff --git a/Doc/library/sqlite3.rst b/Doc/library/sqlite3.rst > --- a/Doc/library/sqlite3.rst > +++ b/Doc/library/sqlite3.rst > @@ -66,7 +66,7 @@ > > # Larger example > for t in [('2006-03-28', 'BUY', 'IBM', 1000, 45.00), > - ('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00), > + ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00), > ('2006-04-06', 'SELL', 'IBM', 500, 53.00), > ]: > c.execute('insert into stocks values (?,?,?,?,?)', t) > @@ -86,7 +86,7 @@ > (u'2006-01-05', u'BUY', u'RHAT', 100, 35.14) > (u'2006-03-28', u'BUY', u'IBM', 1000, 45.0) > (u'2006-04-06', u'SELL', u'IBM', 500, 53.0) > - (u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0) > + (u'2006-04-05', u'BUY', u'MSFT', 1000, 72.0) >
>>> > > > > -- > Repository URL: http://hg.python.org/cpython > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > From sandro.tosi at gmail.com Tue Jan 10 17:32:01 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Tue, 10 Jan 2012 17:32:01 +0100 Subject: [Python-Dev] Sphinx version for Python 2.x docs In-Reply-To: References: <4E4AF610.5040303@simplistix.co.uk> Message-ID: Hi all, On Sat, Aug 27, 2011 at 07:47, Georg Brandl wrote: > One of the main reasons for keeping Sphinx compatibility to 0.6.x was to > enable distributions (like Debian) to build the docs for the Python they ship > with the version of Sphinx that they ship. > > This should now be fine with 1.0.x, so since you are ready to do the work of > converting the 2.7 Doc sources, it will be accepted. The argument of easier > backports is a very good one. Not exactly as quickly as I would, I started to work on upgrading sphinx for 2.7. Currently I've all the preliminary patches at: http://hg.python.org/sandbox/morph/shortlog/5057ce392838 in the 2.7-sphinx branch (they fix one thing at a time, they'll be collapsed once all ready). During the build process, there are some warnings that I can understand: writing output...
[100%] whatsnew/index /home/morph/cpython/morph_sandbox/Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal /home/morph/cpython/morph_sandbox/Doc/library/stdtypes.rst:2372: WARNING: more than one target found for cross-reference u'next': iterator.next, multifile.MultiFile.next, csv.csvreader.next, dbhash.dbhash.next, mailbox.oldmailbox.next, ttk.Treeview.next, nntplib.NNTP.next, file.next, bsddb.bsddbobject.next, tarfile.TarFile.next, generator.next /home/morph/cpython/morph_sandbox/Doc/library/stdtypes.rst:2372: WARNING: more than one target found for cross-reference u'next': iterator.next, multifile.MultiFile.next, csv.csvreader.next, dbhash.dbhash.next, mailbox.oldmailbox.next, ttk.Treeview.next, nntplib.NNTP.next, file.next, bsddb.bsddbobject.next, tarfile.TarFile.next, generator.next /home/morph/cpython/morph_sandbox/Doc/library/sys.rst:651: WARNING: unknown keyword: None /home/morph/cpython/morph_sandbox/Doc/library/sys.rst:712: WARNING: unknown keyword: None /home/morph/cpython/morph_sandbox/Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in /home/morph/cpython/morph_sandbox/Doc/reference/expressions.rst:1101: WARNING: unknown keyword: not in /home/morph/cpython/morph_sandbox/Doc/reference/expressions.rst:1135: WARNING: unknown keyword: not in /home/morph/cpython/morph_sandbox/Doc/reference/expressions.rst:1176: WARNING: unknown keyword: not in /home/morph/cpython/morph_sandbox/Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not /home/morph/cpython/morph_sandbox/Doc/reference/expressions.rst:1362: WARNING: unknown keyword: is not /home/morph/cpython/morph_sandbox/Doc/reference/simple_stmts.rst:700: WARNING: unknown keyword: None /home/morph/cpython/morph_sandbox/Doc/reference/simple_stmts.rst:729: WARNING: unknown keyword: None /home/morph/cpython/morph_sandbox/Doc/reference/simple_stmts.rst:729: WARNING: unknown keyword: None writing additional files... 
genindex py-modindex search download index opensearch Do you know how I can fix them? Thanks & Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From glyph at twistedmatrix.com Tue Jan 10 17:57:03 2012 From: glyph at twistedmatrix.com (Glyph) Date: Tue, 10 Jan 2012 11:57:03 -0500 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: <20120110135705.3756738c@pitrou.net> References: <20120109225907.18e834c3@pitrou.net> <4F0BFB80.8010008@btinternet.com> <20120110135705.3756738c@pitrou.net> Message-ID: <3A6C669C-DF8E-46A9-892F-F0BEF4818FA0@twistedmatrix.com> On Jan 10, 2012, at 7:57 AM, Antoine Pitrou wrote: > On Tue, 10 Jan 2012 08:49:04 +0000 > Rob Cliffe wrote: >> But "minor version" and "major version" are readily understandable to >> the general reader, e.g. me, whereas "feature release" and "release >> series" I find are not. Couldn't the first two terms be defined once >> and then used throughout? > > To me "minor" is a bugfix release, e.g. 2.7.2, and "major" is a feature > release, e.g. 3.3. I have a hard time considering 3.2 or 3.3 "minor". Whatever your personal feelings, there is a precedent established in the API: >>> sys.version_info.major 2 >>> sys.version_info.minor 7 >>> sys.version_info.micro 1 This strikes me as the most authoritative definition of the terms, in the context of Python. (Although the fact that this precedent is widely established elsewhere doesn't hurt.) Whatever term is chosen, the important thing is to apply the terminology consistently so that it's clear what is meant. I doubt that anyone has a term which every reader will intuitively and immediately associate with "middle dot-separated digit increment by one". If you want to emphasize the importance of a release, just choose a subjective term aside from "major" or "minor". 
-glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Tue Jan 10 18:09:51 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 11 Jan 2012 04:09:51 +1100 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: <20120110135705.3756738c@pitrou.net> References: <20120109225907.18e834c3@pitrou.net> <4F0BFB80.8010008@btinternet.com> <20120110135705.3756738c@pitrou.net> Message-ID: http://semver.org/ This has made sense since Gentoo days. On Tue, Jan 10, 2012 at 11:57 PM, Antoine Pitrou wrote: > On Tue, 10 Jan 2012 08:49:04 +0000 > Rob Cliffe wrote: >> But "minor version" and "major version" are readily understandable to >> the general reader, e.g. me, whereas "feature release" and "release >> series" I find are not. Couldn't the first two terms be defined once >> and then used throughout? > > To me "minor" is a bugfix release, e.g. 2.7.2, and "major" is a feature > release, e.g. 3.3. I have a hard time considering 3.2 or 3.3 "minor". > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com From anacrolix at gmail.com Tue Jan 10 18:15:06 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 11 Jan 2012 04:15:06 +1100 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: References: Message-ID: I suspect it actually would fix the confusion. "dev" usually means development, not "core implementation development". People float past looking for dev help... python-dev. Python-list is a bit generic. On Tue, Jan 10, 2012 at 11:17 PM, Stefan Behnel wrote: > Matt Joiner, 10.01.2012 09:40: >> Perhaps the python-dev mailing list should be renamed to python-core.
> > Well, there *is* a rather visible warning on the list subscription page > that tells people that it's most likely not the list they actually want to > use. If they manage to ignore that, I doubt that a different list name > would fix it for them. > > Stefan > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com From solipsis at pitrou.net Tue Jan 10 18:14:47 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 10 Jan 2012 18:14:47 +0100 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: <3A6C669C-DF8E-46A9-892F-F0BEF4818FA0@twistedmatrix.com> References: <20120109225907.18e834c3@pitrou.net> <4F0BFB80.8010008@btinternet.com> <20120110135705.3756738c@pitrou.net> <3A6C669C-DF8E-46A9-892F-F0BEF4818FA0@twistedmatrix.com> Message-ID: <20120110181447.4b61c22a@pitrou.net> On Tue, 10 Jan 2012 11:57:03 -0500 Glyph wrote: > > Whatever your personal feelings, there is a precedent established in the API: > > >>> sys.version_info.major > 2 > >>> sys.version_info.minor > 7 > >>> sys.version_info.micro > 1 > > This strikes me as the most authoritative definition of the terms, in the context of Python. (Although the fact that this precedent is widely established elsewhere doesn't hurt.) While authoritative, it is still counter-intuitive and misleading for some people (including Nick and me, apparently). I never use the field names myself, I use version_info as a 3-tuple. > Whatever term is chosen, the important thing is to apply the terminology consistently so that it's clear what is meant. I doubt that anyone has a term which every reader will intuitively and immediately associate with "middle dot-separated digit increment by one". 
I changed the terminology in my latest changeset: http://hg.python.org/devguide/rev/f39d063ab3dd Important to notice is that the major / minor distinction isn't relevant in most contexts, while the feature / bugfix distinction is. Where "major" plays a role, we can simply avoid the term by talking about Python 2 and Python 3, which is more explicit too. I doubt this needs to be revisited before 10 years anyway. Regards Antoine. From martin at v.loewis.de Tue Jan 10 23:30:58 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 10 Jan 2012 23:30:58 +0100 Subject: [Python-Dev] svn.python.org certificate expired In-Reply-To: References: Message-ID: <4F0CBC22.2090505@v.loewis.de> > It seems that svn.python.org certificate expired today (09/01/2012). I have now replaced the certificate. The current one will expire on Christmas 2013. Regards, Martin From tjreedy at udel.edu Tue Jan 10 23:38:18 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 10 Jan 2012 17:38:18 -0500 Subject: [Python-Dev] devguide: Backporting is obsolete. Add details that I had to learn. In-Reply-To: <20120110181447.4b61c22a@pitrou.net> References: <20120109225907.18e834c3@pitrou.net> <4F0BFB80.8010008@btinternet.com> <20120110135705.3756738c@pitrou.net> <3A6C669C-DF8E-46A9-892F-F0BEF4818FA0@twistedmatrix.com> <20120110181447.4b61c22a@pitrou.net> Message-ID: On 1/10/2012 12:14 PM, Antoine Pitrou wrote: > I changed the terminology in my latest changeset: > http://hg.python.org/devguide/rev/f39d063ab3dd > > Important to notice is that the major / minor distinction isn't > relevant in most contexts, while the feature / bugfix distinction is. > Where "major" plays a role, we can simply avoid the term by talking > about Python 2 and Python 3, which is more explicit too. I doubt this > needs to be revisited before 10 years anyway. FWIW, I like the changes, and you did them better than I would have.
-- Terry Jan Reedy From martin at v.loewis.de Wed Jan 11 01:20:21 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 11 Jan 2012 01:20:21 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> Message-ID: <4F0CD5C5.2080309@v.loewis.de> On 09.01.2012 07:13, Jeff Hardy wrote: > On Sat, Jan 7, 2012 at 2:57 PM, Antoine Pitrou wrote: >> Depending on the extent of removed/disabled functionality, it might not >> be very interesting to have a Metro port at all. > > Win 8 is practically a new OS target - the nt module may need to be > replaced with a metro module to handle it well. No, it's not. Everything continues to work just fine on Windows 8, as long as we keep developing desktop apps. Only if Metro Apps are the target things may need to be replaced (but only very few changes are necessary to the nt module to make it compile). Regards, Martin From martin at v.loewis.de Wed Jan 11 01:32:12 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 11 Jan 2012 01:32:12 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <6C7ABA8B4E309440B857D74348836F2E4CCBBC92@TK5EX14MBXC292.redmond.corp.microsoft.com> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <6C7ABA8B4E309440B857D74348836F2E4CCBBC92@TK5EX14MBXC292.redmond.corp.microsoft.com> Message-ID: <4F0CD88C.6030407@v.loewis.de> > I haven't been following this thread so maybe this was already > discussed, but on the whole "new OS target" thing - if people want to > write immersive apps in Python then there will need to be a new build > of Python. One thing that might make that easier is the fact that > the C runtime is still available to metro apps, even if the C runtime > calls a banned API. Does that hold for all versions of the C runtime (i.e.
is msvcr80.dll also exempt from the ban, or just the version that comes with VS 11)? > So to the extent that Python is just a C program > the "port" should be pretty easy and mostly involve disabling > functionality that isn't available at all to metro apps. See the start of the thread: I tried to create a "WinRT Component DLL", and that failed, as VS would refuse to compile any C file in such a project. Not sure whether this is triggered by defining WINAPI_FAMILY=2, or any other compiler setting. I'd really love to use WINAPI_FAMILY=2, as compiler errors are much easier to fix than verifier errors. Regards, Martin From phd at phdru.name Mon Jan 9 16:07:23 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 9 Jan 2012 19:07:23 +0400 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: References: Message-ID: <20120109150723.GA24824@iskra.aviel.ru> Hello. We are sorry but we cannot help you. This mailing list is to work on developing Python (adding new features to Python itself and fixing bugs); if you're having problems learning, understanding or using Python, please find another forum. Probably python-list/comp.lang.python mailing list/news group is the best place; there are Python developers who participate in it; you may get a faster, and probably more complete, answer there. See http://www.python.org/community/ for other lists/news groups/fora. Thank you for understanding. On Mon, Jan 09, 2012 at 03:46:04PM +0100, pasparis at noos.fr wrote: > Hello,

> I am trying to send a tuple to a method of a python Class Also please don't send html-only mail. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From martin at v.loewis.de Wed Jan 11 02:09:02 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 11 Jan 2012 02:09:02 +0100 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: References: Message-ID: <4F0CE12E.4030002@v.loewis.de> On 10.01.2012 18:15, Matt Joiner wrote: > I suspect it actually would fix the confusion. "dev" usually means > development, not "core implementation development". People float past > looking for dev help... python-dev. Python-list is a bit generic. There is occasional confusion. More often, people think "there are the folks who could actually answer my question, and nobody on python-list answered, so I'll just ask there". We established to assume that they are confused instead of deliberately breaking convention, which is a polite way of pointing out that we really mean it. IOW, I think it is all fine the way it is. Typically, somebody answers quickly. In this case, *two* people answered the same, which a) really gets the message through, and b) suggests that people are not too tired in actually typing in this message every now and then. Of course, pointing the OP to a more specific focused forum (which is not always cython-users) is also kind. Regards, Martin From ncoghlan at gmail.com Wed Jan 11 03:25:46 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 11 Jan 2012 12:25:46 +1000 Subject: [Python-Dev] os.walk() with followlinks=False Message-ID: When discussing http://bugs.python.org/issue13734, Charles-François noted that when os.walk() is called with "followlinks=False", symlinks to directories are still included in the "subdirs" list rather than the "files" list.
This seems rather odd to me, so I'm asking here to see if there's a specific rationale for it, or if it's just an artifact of the implementation. If it's the latter... could we change it for 3.3, or is that too significant a breach of backwards compatibility? Even if we can't change os.walk(), does os.walkfd() need to replicate the annoying behaviour for consistency, or can it instead consider such symlinks to be files rather than directories? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dinov at microsoft.com Wed Jan 11 02:59:08 2012 From: dinov at microsoft.com (Dino Viehland) Date: Wed, 11 Jan 2012 01:59:08 +0000 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <4F0CD88C.6030407@v.loewis.de> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <6C7ABA8B4E309440B857D74348836F2E4CCBBC92@TK5EX14MBXC292.redmond.corp.microsoft.com> <4F0CD88C.6030407@v.loewis.de> Message-ID: <6C7ABA8B4E309440B857D74348836F2E4CCBE969@TK5EX14MBXC292.redmond.corp.microsoft.com>
I can always ping some people on the C++ team and ask them for help if I run into issues. I'll give it a shot tomorrow and get back to you. From ericsnowcurrently at gmail.com Wed Jan 11 05:23:28 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 10 Jan 2012 21:23:28 -0700 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: <4F0CE12E.4030002@v.loewis.de> References: <4F0CE12E.4030002@v.loewis.de> Message-ID: On Tue, Jan 10, 2012 at 6:09 PM, "Martin v. L?wis" wrote: > IOW, I think it is all fine the way it is. Typically, somebody answers > quickly. In this case, *two* people answered the same, which > a) really gets the message through, and > b) suggests that people are not too tired in actually typing in > ? this message every now and then. +1 -eric From lists at cheimes.de Wed Jan 11 10:49:46 2012 From: lists at cheimes.de (Christian Heimes) Date: Wed, 11 Jan 2012 10:49:46 +0100 Subject: [Python-Dev] shutil.copy() and hard links Message-ID: Hello, here is another fun fact about links, this time hard links and the shutil.copy() function. The shutil.copy() functions behaves like the Unix cp(1) command. Both don't unlink the destination file if it already exists. As a consequence all hard links point to the updated file data. This behavior may surprise some users. Perhaps the docs should point out how shutil.copy() works when hard links join the party. It might be worth to add a function that works similar to install(1). The install(1) command unlinks the destination first and opens it with exclusive create flags. This compensates for possible symlink attacks, too. 
Christian Shell session example of cp and install ======================================= $ echo "test1" > test1 $ echo "test2" > test2 $ ln test1 test_hardlink now test_hardlink points to the same inodes as test1 $ cat test_hardlink test1 test_hardlink still points to the same inodes $ cp test2 test1 $ cat test_hardlink test2 reset $ echo "test1" > test1 $ cat test_hardlink test1 install unlinks the file first, test1 and test_hardlink point to different inodes $ install test2 test1 $ cat test_hardlink test1 strace of install test2 test1 ============================= stat("test1", {st_mode=S_IFREG|0755, st_size=6, ...}) = 0 stat("test2", {st_mode=S_IFREG|0664, st_size=6, ...}) = 0 lstat("test1", {st_mode=S_IFREG|0755, st_size=6, ...}) = 0 unlink("test1") = 0 open("test2", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0664, st_size=6, ...}) = 0 open("test1", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4 fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0 From martin at v.loewis.de Wed Jan 11 11:12:16 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Wed, 11 Jan 2012 11:12:16 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <6C7ABA8B4E309440B857D74348836F2E4CCBE969@TK5EX14MBXC292.redmond.corp.microsoft.com> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <6C7ABA8B4E309440B857D74348836F2E4CCBBC92@TK5EX14MBXC292.redmond.corp.microsoft.com> <4F0CD88C.6030407@v.loewis.de> <6C7ABA8B4E309440B857D74348836F2E4CCBE969@TK5EX14MBXC292.redmond.corp.microsoft.com> Message-ID: <20120111111216.Horde.Ruxpc0lCcOxPDWCAqYUzX5A@webmail.df.eu> > Let me see if I can try this. Hopefully I still have my VM w/ this > all setup and > I can see if I can get it building this way. I can always ping some > people on the > C++ team and ask them for help if I run into issues. I'll give it a > shot tomorrow > and get back to you. Hi Dino, I reported that as a bug. 
If you need that for reference, see https://connect.microsoft.com/VisualStudio/feedback/details/717395/c1083-when-compiling-c-code-in-a-metro-app Regards, Martin From solipsis at pitrou.net Wed Jan 11 15:52:07 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 11 Jan 2012 15:52:07 +0100 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class References: <4F0CE12E.4030002@v.loewis.de> Message-ID: <20120111155207.04929873@pitrou.net> On Wed, 11 Jan 2012 02:09:02 +0100 "Martin v. L?wis" wrote: > Am 10.01.2012 18:15, schrieb Matt Joiner: > > I suspect it actually would fix the confusion. "dev" usually means > > development, not "core implementation development". People float past > > looking for dev help... python-dev. Python-list is a bit generic. > > There is occasional confusion. More often, people think "there are the > folks who could actually answer my question, and nobody on python-list > answered, so I'll just ask there". We established to assume that they > are confused instead of deliberately breaking convention, which is a > polite way of pointing out that we really mean it. > > IOW, I think it is all fine the way it is. Typically, somebody answers > quickly. In this case, *two* people answered the same, which > a) really gets the message through, and > b) suggests that people are not too tired in actually typing in > this message every now and then. I suspect one of them doesn't actually *type* the message ;) Regards Antoine. 
From solipsis at pitrou.net Wed Jan 11 15:54:05 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 11 Jan 2012 15:54:05 +0100 Subject: [Python-Dev] os.walk() with followlinks=False References: Message-ID: <20120111155405.3eab04de@pitrou.net> On Wed, 11 Jan 2012 12:25:46 +1000 Nick Coghlan wrote: > When discussing http://bugs.python.org/issue13734, Charles-Fran?ois > noted that when os.walk() is called with "followlinks=False", symlinks > to directories are still included in the "subdirs" list rather than > the "files" list. > > This seems rather odd to me, so I'm asking here to see if there's a > specific rationale for it, or if it's just an artifact of the > implementation. > > If it's the latter... could we change it for 3.3, or is that too > significant a breach of backwards compatibility? I think we could change it. > Even if we can't change os.walk(), does os.walkfd() need to replicate > the annoying behaviour for consistency, or can it instead consider > such symlinks to be files rather than directories? IMO walkfd() should do the right thing. Regards Antoine. From phd at phdru.name Wed Jan 11 16:07:32 2012 From: phd at phdru.name (Oleg Broytman) Date: Wed, 11 Jan 2012 19:07:32 +0400 Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a python Class In-Reply-To: <20120111155207.04929873@pitrou.net> References: <4F0CE12E.4030002@v.loewis.de> <20120111155207.04929873@pitrou.net> Message-ID: <20120111150732.GA24839@iskra.aviel.ru> On Wed, Jan 11, 2012 at 03:52:07PM +0100, Antoine Pitrou wrote: > On Wed, 11 Jan 2012 02:09:02 +0100 > "Martin v. L?wis" wrote: > > b) suggests that people are not too tired in actually typing in > > this message every now and then. > > I suspect one of them doesn't actually *type* the message ;) Certainly, no. 
:0r mail/misc/python-dev And even this command is in vim history, I don't type it, just press :0 ;-) Sometimes I add something useful to the OP but this time I didn't - I just haven't got any helpful information. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From jdhardy at gmail.com Wed Jan 11 18:30:28 2012 From: jdhardy at gmail.com (Jeff Hardy) Date: Wed, 11 Jan 2012 09:30:28 -0800 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <4F0CD5C5.2080309@v.loewis.de> References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <4F0CD5C5.2080309@v.loewis.de> Message-ID: On Tue, Jan 10, 2012 at 4:20 PM, "Martin v. L?wis" wrote: >> Win 8 is practically a new OS target - the nt module may need to be >> replaced with a metro module to handle it well. > > No, it's not. Everything continues to work just fine on Windows 8, > as long as we keep developing desktop apps. > > Only if Metro Apps are the target things may need to be replaced (but > only very few changes are necessary to the nt module to make it compile). Yeah, that's what I meant. I should have said "WinRT is ..." instead of "Win 8 is ...". If nt can be made to work, than that's even better than I expected. - Jeff From mwm at mired.org Thu Jan 12 01:01:44 2012 From: mwm at mired.org (Mike Meyer) Date: Wed, 11 Jan 2012 16:01:44 -0800 Subject: [Python-Dev] Proposed PEP on concurrent programming support In-Reply-To: References: <20120103164036.681beeae@mikmeyer-vm-fedora> Message-ID: <20120111160144.66c46236@mikmeyer-vm-fedora> On Wed, 4 Jan 2012 00:07:27 -0500 PJ Eby wrote: > On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer wrote: > > A suite is marked > > as a `transaction`, and then when an unlocked object is modified, > > instead of indicating an error, a locked copy of it is created to be > > used through the rest of the transaction. 
If any of the originals > > are modified during the execution of the suite, the suite is rerun > > from the beginning. If it completes, the locked copies are copied > > back to the originals in an atomic manner. > I'm not sure if "locked" is really the right word here. A private > copy isn't "locked" because it's not shared. Do you have a suggestion for a better word? Maybe the "safe" state used elsewhere? > > For > > instance, combining STM with explicit locking would allow explicit > > locking when IO was required, > I don't think this idea makes any sense, since STM's don't really > "lock", and to control I/O in an STM system you just STM-ize the > queues. (Generally speaking.) I thought about that. I couldn't convince myself that STM by itself sufficient. If you need to make irreversible changes to the state of an object, you can't use STM, so what do you use? Can every such situation be handled by creating "safe" values then using an STM to update them? References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <6C7ABA8B4E309440B857D74348836F2E4CCBBC92@TK5EX14MBXC292.redmond.corp.microsoft.com> <4F0CD88C.6030407@v.loewis.de> Message-ID: <6C7ABA8B4E309440B857D74348836F2E4CCC110D@TK5EX14MBXC292.redmond.corp.microsoft.com> Martin wrote: > See the start of the thread: I tried to create a "WinRT Component DLL", and > that failed, as VS would refuse to compile any C file in such a project. Not > sure whether this is triggered by defining WINAPI_FAMILY=2, or any other > compiler setting. > > I'd really love to use WINAPI_FAMILY=2, as compiler errors are much easier > to fix than verifier errors. I got the same errors as you - it seems like they're related to enabling the Immersive bit for the compile of the DLL. 
I'm not certain if that's necessary, when I did the run before to see if Python would pass the app store validation it didn't care that we didn't have the App Container bit set on the DLL (it did want NXCOMPAT and dynamic base set though). I was also able to just define WINAPI_FAMILY=2 in the .vcxproj file and I got the various expected errors when accessing banned APIs (it actually seems like a bunch were missing vs. what the validator reported, but maybe that's just an issue w/ the developer preview). Once I fixed those errors up I was able to get a DLL that successfully compiled. I'm going to ping some people on the windows team and see if the app container bit is or will be necessary for DLLs. From anacrolix at gmail.com Thu Jan 12 08:20:08 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 12 Jan 2012 18:20:08 +1100 Subject: [Python-Dev] Proposed PEP on concurrent programming support In-Reply-To: <20120111160144.66c46236@mikmeyer-vm-fedora> References: <20120103164036.681beeae@mikmeyer-vm-fedora> <20120111160144.66c46236@mikmeyer-vm-fedora> Message-ID: On Thu, Jan 12, 2012 at 11:01 AM, Mike Meyer wrote: > On Wed, 4 Jan 2012 00:07:27 -0500 > PJ Eby wrote: > >> On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer wrote: >> > A suite is marked >> > as a `transaction`, and then when an unlocked object is modified, >> > instead of indicating an error, a locked copy of it is created to be >> > used through the rest of the transaction. If any of the originals >> > are modified during the execution of the suite, the suite is rerun >> > from the beginning. If it completes, the locked copies are copied >> > back to the originals in an atomic manner. >> I'm not sure if "locked" is really the right word here. ?A private >> copy isn't "locked" because it's not shared. > > Do you have a suggestion for a better word? Maybe the "safe" state > used elsewhere? 
> >> > For >> > instance, combining STM with explicit locking would allow explicit >> > locking when IO was required, >> I don't think this idea makes any sense, since STM's don't really >> "lock", and to control I/O in an STM system you just STM-ize the >> queues. (Generally speaking.) > > I thought about that. I couldn't convince myself that STM by itself > sufficient. If you need to make irreversible changes to the state of > an object, you can't use STM, so what do you use? Can every such > situation be handled by creating "safe" values then using an STM to > update them? > > ? ? ? _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com IMHO STM by itself isn't sufficient. Either immutability, or careful use of references protected by STM amounting to the same are the only reasonable ways to do it. Both also perform much better than the alternatives. From ncoghlan at gmail.com Thu Jan 12 12:47:16 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 12 Jan 2012 21:47:16 +1000 Subject: [Python-Dev] os.walk() with followlinks=False In-Reply-To: <20120111155405.3eab04de@pitrou.net> References: <20120111155405.3eab04de@pitrou.net> Message-ID: On Thu, Jan 12, 2012 at 12:54 AM, Antoine Pitrou wrote: > On Wed, 11 Jan 2012 12:25:46 +1000 > Nick Coghlan wrote: >> If it's the latter... could we change it for 3.3, or is that too >> significant a breach of backwards compatibility? > > I think we could change it. For the benefit of those not following the tracker issue, Charles-Fran?ois pointed out that putting the symlinks-to-directories into the files list instead of the subdirectory list isn't really any better (it just moves the problem to different use cases, such as those that actually want to read the file contents). 
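For reference, the behaviour Charles-François describes can be reproduced with a short script (POSIX only, since it creates a symlink):

```python
import os
import tempfile

# Layout: top/real_dir, plus top/link_dir -> real_dir
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "real_dir"))
os.symlink("real_dir", os.path.join(top, "link_dir"))

walked = {dirpath: (sorted(dirnames), filenames)
          for dirpath, dirnames, filenames in os.walk(top, followlinks=False)}

# Even with followlinks=False, the symlink shows up in the dirnames
# list of its parent (it just isn't descended into), not in filenames.
assert walked[top][0] == ["link_dir", "real_dir"]
assert os.path.join(top, "link_dir") not in walked
```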
With that being the case, I've changed my mind and figure we may as well leave the current behaviour alone. I'll think about adding a filter to walkdir that makes it easy to control the way they're handled [1]. [1] https://bitbucket.org/ncoghlan/walkdir/issue/9/better-handling-of-dir-symlinks Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From victor.stinner at haypocalc.com Fri Jan 13 02:24:33 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 13 Jan 2012 02:24:33 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability Message-ID: Many people proposed their own idea to fix the vulnerability, but only 3 wrote a patch: - Glenn Linderman proposes to fix the vulnerability by adding a new "safe" dict type (only accepting string keys). His proof-of-concept (SafeDict.py) uses a secret of 64 random bits and uses it to compute the hash of a key. - Marc Andre Lemburg proposes to fix the vulnerability directly in dict (for any key type). The patch raises an exception if a lookup causes more than 1000 collisions. - I propose to fix the vulnerability only in the Unicode hash (not for other types). My patch adds a random secret initialized at startup (it can be disabled or fixed using an environment variable). -- I consider that Glenn's proposition is not applicable in practice because all applications and all libraries have to be patched to use the new "safe" dict type. Some people are concerned by possible regression introduced by Marc's proposition: his patch may raise an exception for legitimate data. My proposition tries to be "just enough" secure with a low (runtime performance) overhead. My patch becomes huge (and so backporting is more complex), whereas Marc's patch is very simple and so trivial to backport. -- It is still unclear to me if the fix should be enabled by default for Python < 3.3. 
Because the overhead (of my patch) is low, I would prefer to enable the fix by default, to protect everyone with a simple Python upgrade. I prefer to explain how to explicitly disable the randomized hash (PYTHONHASHSEED=0) (or how to fix application bugs) to people having trouble with randomized hashes, instead of leaving the hole open by default. -- We might change hash() for types other than str, but it looks like web servers are only concerned by dicts with string keys. We may use Paul's hash function if mine is not secure enough. My patch doesn't fix the DoS, it just makes the attack more complex. The attacker cannot pregenerate data for an attack: (s)he has first to compute the hash secret, and then compute hash collisions using the secret. The hash secret is at least 64 bits long (128 bits on a 64 bit system). So I hope that computing collisions requires a lot of CPU time (is slow), making the attack ineffective with today's computers. -- I plan to write a nice patch for Python 3.3, then write a simpler patch for 3.1 and 3.2 (duplicate os.urandom code to keep it unchanged, maybe don't create a new random.c file, maybe don't touch the test suite while the patch breaks many tests), and finally write patches for Python 2.6 and 2.7. Details about my patch: - I tested it on Linux (32 and 64 bits) and Windows (Seven 64 bits) - a new PYTHONSEED environment variable allows controlling the randomized hash: PYTHONSEED=0 disables the randomized hash completely (restoring the previous behaviour), PYTHONSEED=value uses a fixed seed for processes sharing data and needing the same hash values (multiprocessing users?) - no overhead on hash(str) - no startup overhead on Linux - startup overhead is 10% on Windows (see the issue, I propose another solution with a startup overhead of 1%) The patch is not done, some tests are still failing because of the randomized hash.
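For illustration, the scheme described above (a secret prefix folded into the classic multiply/xor string hash, and a secret suffix xored into the result) can be sketched in pure Python. The constants mirror CPython's historical string hash, but this is a sketch of the idea, not the actual patch:

```python
MASK = 2**64 - 1  # assume a 64-bit build

def seeded_hash(s, prefix, suffix):
    # Classic CPython-style string hash with a random prefix and
    # suffix mixed in; prefix = suffix = 0 reproduces the historical
    # (unseeded) values.
    if not s:
        return 0
    x = prefix ^ (ord(s[0]) << 7)
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & MASK
    x ^= len(s)
    return x ^ suffix
```

Because each per-character step is a bijection for a fixed character, two different prefixes never produce the same hash for the same string and suffix, while an attacker who cannot guess the prefix cannot precompute colliding keys.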
-- FYI, PHP released a version 5.3.9 adding "max_input_vars directive to prevent attacks based on hash collisions (CVE-2011-4885)". Victor From guido at python.org Fri Jan 13 03:57:42 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jan 2012 18:57:42 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: Hm... I started out as a big fan of the randomized hash, but thinking more about it, I actually believe that the chances of some legitimate app having >1000 collisions are way smaller than the chances that somebody's code will break due to the variable hashing. In fact we know for a fact that the latter will break code, since it changes the order of items in a dict. This affects many tests written without this in mind, and I assume there will be some poor sap out there who uses Python's hash() function to address some external persistent hash table or some other external datastructure. How pathological the data needs to be before the collision counter triggers? I'd expect *very* pathological. This is depending on how the counting is done (I didn't look at MAL's patch), and assuming that increasing the hash table size will generally reduce collisions if items collide but their hashes are different. That said, even with collision counting I'd like a way to disable it without changing the code, e.g. a flag or environment variable. --Guido On Thu, Jan 12, 2012 at 5:24 PM, Victor Stinner < victor.stinner at haypocalc.com> wrote: > Many people proposed their own idea to fix the vulnerability, but only > 3 wrote a patch: > > - Glenn Linderman proposes to fix the vulnerability by adding a new > "safe" dict type (only accepting string keys). His proof-of-concept > (SafeDict.py) uses a secret of 64 random bits and uses it to compute > the hash of a key. > - Marc Andre Lemburg proposes to fix the vulnerability directly in > dict (for any key type). 
The patch raises an exception if a lookup > causes more than 1000 collisions. > - I propose to fix the vulnerability only in the Unicode hash (not for > other types). My patch adds a random secret initialized at startup (it > can be disabled or fixed using an environment variable). > > -- > > I consider that Glenn's proposition is not applicable in practice > because all applications and all libraries have to be patched to use > the new "safe" dict type. > > Some people are concerned by possible regression introduced by Marc's > proposition: his patch may raise an exception for legitimate data. > > My proposition tries to be "just enough" secure with a low (runtime > performance) overhead. My patch becomes huge (and so backporting is > more complex), whereas Marc's patch is very simple and so trivial to > backport. > > -- > > It is still unclear to me if the fix should be enabled by default for > Python < 3.3. Because the overhead (of my patch) is low, I would > prefer to enable the fix by default, to protect everyone with a simple > Python upgrade. > > I prefer to explain how to disable explicitly the randomized hash > (PYTHONHASHSEED=0) (or how to fix application bugs) to people having > troubles with randomized hash, instead of leaving the hole open by > default. > > -- > > We might change hash() for types other than str, but it looks like web > servers are only concerned by dict with string keys. > > We may use Paul's hash function if mine is not enough secure. > > My patch doesn't fix the DoS, it just make the attack more complex. > The attacker cannot pregenerate data for an attack: (s)he has first to > compute the hash secret, and then compute hash collisions using the > secret. The hash secret is a least 64 bits long (128 bits on a 64 bit > system). So I hope that computing collisions requires a lot of CPU > time (is slow) to make the attack ineffective with today computers. 
> > -- > > I plan to write a nice patch for Python 3.3, then write a simpler > patch for 3.1 and 3.2 (duplicate os.urandom code to keep it unchanged, > maybe don't create a new random.c file, maybe don't touch the test > suite while the patch breaks many tests), and finally write patches > for Python 2.6 and 2.7. > > Details about my patch: > > - I tested it on Linux (32 and 64 bits) and Windows (Seven 64 bits) > - a new PYTHONSEED environment variable allow to control the > randomized hash: PYTHONSEED=0 disables completly the randomized hash > (restore the previous behaviour), PYTHONSEED=value uses a fixed seed > for processes sharing data and needind same hash values > (multiprocessing users?) > - no overhead on hash(str) > - no startup overhead on Linux > - startup overhead is 10% on Windows (see the issue, I propose another > solution with a startup overhead of 1%) > > The patch is not done, some tests are still failing because of the > randomized hash. > > -- > > FYI, PHP released a version 5.3.9 adding "max_input_vars directive to > prevent attacks based on hash collisions (CVE-2011-4885)". > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pje at telecommunity.com Fri Jan 13 05:19:29 2012 From: pje at telecommunity.com (PJ Eby) Date: Thu, 12 Jan 2012 23:19:29 -0500 Subject: [Python-Dev] Proposed PEP on concurrent programming support In-Reply-To: <20120111160144.66c46236@mikmeyer-vm-fedora> References: <20120103164036.681beeae@mikmeyer-vm-fedora> <20120111160144.66c46236@mikmeyer-vm-fedora> Message-ID: On Wed, Jan 11, 2012 at 7:01 PM, Mike Meyer wrote: > On Wed, 4 Jan 2012 00:07:27 -0500 > PJ Eby wrote: > > On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer wrote: > > > For > > > instance, combining STM with explicit locking would allow explicit > > > locking when IO was required, > > I don't think this idea makes any sense, since STM's don't really > > "lock", and to control I/O in an STM system you just STM-ize the > > queues. (Generally speaking.) > > I thought about that. I couldn't convince myself that STM by itself > sufficient. If you need to make irreversible changes to the state of > an object, you can't use STM, so what do you use? Can every such > situation be handled by creating "safe" values then using an STM to > update them? > If you need to do something irreversible, you just need to use an STM-controlled queue, with something that reads from it to do the irreversible things. The catch is that your queue design has to support guaranteed-successful item removal, since if the dequeue transaction fails, it's too late. Alternately, the queue reader can commit removal first, then perform the irreversible operation... but leave open a short window for failure. It depends on the precise semantics you're looking for. In either case, though, the STM is pretty much sufficient, given a good enough queue data structure. -------------- next part -------------- An HTML attachment was scrubbed... 
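The "rerun the suite until the commit is consistent" model being discussed in this thread can be sketched with a toy version-checked transaction. A real STM tracks read and write sets far more carefully, so this is only an illustration of the retry loop:

```python
import threading

class TVar:
    """A toy transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()

def atomically(fn, *tvars):
    """Run fn on snapshots of tvars and commit the returned values,
    rerunning fn from the beginning if any tvar changed meanwhile."""
    while True:
        snapshot = [(v.value, v.version) for v in tvars]
        new_values = fn(*(value for value, _ in snapshot))
        with _commit_lock:
            if all(v.version == ver for v, (_, ver) in zip(tvars, snapshot)):
                for v, nv in zip(tvars, new_values):
                    v.value = nv
                    v.version += 1
                return new_values
        # A concurrent commit invalidated our snapshot: retry.

# Example: an atomic transfer between two accounts.
a, b = TVar(100), TVar(0)
atomically(lambda x, y: (x - 30, y + 30), a, b)
```

Irreversible actions (I/O) would then live outside `atomically`, fed by a transactionally managed queue as suggested above.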
URL: From pydev at sievertsen.de Fri Jan 13 09:11:47 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Fri, 13 Jan 2012 09:11:47 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <4F0FE743.20002@sievertsen.de> Am 13.01.2012 02:24, schrieb Victor Stinner: > My patch doesn't fix the DoS, it just make the attack more complex. > The attacker cannot pregenerate data for an attack: (s)he has first to > compute the hash secret, and then compute hash collisions using the > secret. The hash secret is a least 64 bits long (128 bits on a 64 bit > system). So I hope that computing collisions requires a lot of CPU > time (is slow) to make the attack ineffective with today computers. Unfortunately it requires only a few seconds to compute enough 32bit collisions on one core with no precomputed data. I'm sure it's possible to make this less than a second. In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) ^ suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) is possible. So the question is: How difficult is it to guess the seed? Frank From victor.stinner at haypocalc.com Fri Jan 13 10:23:45 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 13 Jan 2012 10:23:45 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <4F0FE743.20002@sievertsen.de> References: <4F0FE743.20002@sievertsen.de> Message-ID: > Unfortunately it requires only a few seconds to compute enough 32bit > collisions on one core with no precomputed data. Are you running the hash function "backward" to generate strings with the same value, or you are more trying something like brute forcing? And how do you get the hash secret? You need it to run an attack. > In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) ^ > suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) is > possible. 
My change also adds a prefix (both a prefix and a suffix). I don't know if it changes anything for generating collisions. > So the question is: How difficult is it to guess the seed? I wrote some remarks about that in the issue. For example: (hash("\0")^1) ^ (hash("\0\0")^2) gives ((prefix * 1000003) & HASH_MASK) ^ ((prefix * 1000003**2) & HASH_MASK) I suppose that in practice you don't see the full output of hash(str) directly, but only hash(str) & DICT_MASK, where DICT_MASK is the size of the internal dict array minus 1. For example, for a dictionary of 65,536 items, the mask is 0x1ffff and so cannot give you more than 17 bits of the hash(str) output. I still don't know how difficult it is to retrieve hash(str) bits from repr(dict). Victor From regebro at gmail.com Fri Jan 13 12:20:28 2012 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 13 Jan 2012 12:20:28 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: On Fri, Jan 13, 2012 at 02:24, Victor Stinner wrote: > - Glenn Linderman proposes to fix the vulnerability by adding a new > "safe" dict type (only accepting string keys). His proof-of-concept > (SafeDict.py) uses a secret of 64 random bits and uses it to compute > the hash of a key. This is my preferred solution. The vulnerability is basically only in the dictionary in which you keep the form data you get from a request. This solves it easily and nicely. It can also be a separate module installable for Python 2, which many web frameworks still use, so it is practically implementable now, and not in a couple of years. Then again, nothing prevents us from having both this, *and* one of the other solutions.
:-) //Lennart From ncoghlan at gmail.com Fri Jan 13 13:14:43 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 13 Jan 2012 22:14:43 +1000 Subject: [Python-Dev] PEP 380 ("yield from") is now Final Message-ID: I marked PEP 380 as Final this evening, after pushing the tested and documented implementation to hg.python.org: http://hg.python.org/cpython/rev/d64ac9ab4cd0 As the list of names in the NEWS and What's New entries suggests, it was quite a collaborative effort to get this one over the line, and that's without even listing all the people that offered helpful suggestions and comments along the way :) print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From frank at sievertsen.de Fri Jan 13 12:49:15 2012 From: frank at sievertsen.de (Frank Sievertsen) Date: Fri, 13 Jan 2012 12:49:15 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F0FE743.20002@sievertsen.de> Message-ID: <4F101A3B.2070408@sievertsen.de> >> Unfortunately it requires only a few seconds to compute enough 32bit >> collisions on one core with no precomputed data. > Are you running the hash function "backward" to generate strings with > the same value, or you are more trying something like brute forcing? If you brute-force to hit a specific target, you'll find only one good string in every 4 billion tries. That's why you first blow up your target: you start backward from an arbitrary target value. Brute-forcing backward over 3 characters, for example, gives you 16 million intermediate values which you know will all end up at your target value. Those 16 million values are a huge target when you then brute-force forward: every 256 tries you'll hit one of them. > And how do you get the hash secret? You need it to run an attack. I don't know.
This was meant as an answer to the quoted text "So I hope that computing collisions requires a lot of CPU time (is slow) to make the attack ineffective with today computers.". What I wanted to say is: The security relies on the fact that the attacker can't guess the prefix, not that he can't precompute the values and it takes hours or days to compute the collisions. If the prefix leaks out of the application, then the rest is trivial and done in a few seconds. The suffix is not important for the collision-prevention, but it will probably make it much harder to guess the prefix. I don't know an effective way to get the prefix either, (if the application doesn't leak full hash(X) values). Frank From anacrolix at gmail.com Fri Jan 13 13:34:38 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 13 Jan 2012 23:34:38 +1100 Subject: [Python-Dev] PEP 380 ("yield from") is now Final In-Reply-To: References: Message-ID: Great work Nick, I've been looking forward to this one. Thanks all for putting the effort in. On Fri, Jan 13, 2012 at 11:14 PM, Nick Coghlan wrote: > I marked PEP 380 as Final this evening, after pushing the tested and > documented implementation to hg.python.org: > http://hg.python.org/cpython/rev/d64ac9ab4cd0 > > As the list of names in the NEWS and What's New entries suggests, it > was quite a collaborative effort to get this one over the line, and > that's without even listing all the people that offered helpful > suggestions and comments along the way :) > > print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) > > -- > Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com From and-dev at doxdesk.com Fri Jan 13 13:45:50 2012 From: and-dev at doxdesk.com (And Clover) Date: Fri, 13 Jan 2012 12:45:50 +0000 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <4F10277E.6060804@doxdesk.com> On 2012-01-13 11:20, Lennart Regebro wrote: > The vulnerability is basically only in the dictionary in which you keep > the form data you get from a request. I'd have to disagree with this statement. The vulnerability is anywhere that creates a dictionary (or set) from attacker-provided keys. That would include HTTP headers, RFC822-family subheaders and parameters, the environ, input taken from JSON or XML, and so on - and indeed hash collision attacks are not at all web-specific. The problem with having two dict implementations is that a caller would have to tell libraries that use dictionaries which implementation to use. So for example an argument would have to be passed to json.load[s] to specify whether the input was known-sane or potentially hostile. Any library that could ever use dictionaries to process untrusted input *or any library that used another library that did* would have to pass such a flag through, which would quickly get very unwieldy indeed... or else they'd have to just always use safedict, in which case we're in pretty much the same position as we are with changing dict anyway.
-- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ gtalk:chat?jid=bobince at gmail.com From g.brandl at gmx.net Fri Jan 13 16:17:09 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 13 Jan 2012 16:17:09 +0100 Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682) In-Reply-To: References: Message-ID: Caution, long review ahead. On 01/13/2012 12:43 PM, nick.coghlan wrote: > http://hg.python.org/cpython/rev/d64ac9ab4cd0 > changeset: 74356:d64ac9ab4cd0 > user: Nick Coghlan > date: Fri Jan 13 21:43:40 2012 +1000 > summary: > Implement PEP 380 - 'yield from' (closes #11682) > diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst > --- a/Doc/reference/expressions.rst > +++ b/Doc/reference/expressions.rst > @@ -318,7 +318,7 @@ There should probably be a "versionadded" somewhere on this page. > .. productionlist:: > yield_atom: "(" `yield_expression` ")" > - yield_expression: "yield" [`expression_list`] > + yield_expression: "yield" [`expression_list` | "from" `expression`] > > The :keyword:`yield` expression is only used when defining a generator function, > and can only be used in the body of a function definition. Using a > @@ -336,7 +336,10 @@ > the generator's methods, the function can proceed exactly as if the > :keyword:`yield` expression was just another external call. The value of the > :keyword:`yield` expression after resuming depends on the method which resumed > -the execution. > +the execution. If :meth:`__next__` is used (typically via either a > +:keyword:`for` or the :func:`next` builtin) then the result is :const:`None`, > +otherwise, if :meth:`send` is used, then the result will be the value passed > +in to that method. > > .. index:: single: coroutine > > @@ -346,12 +349,29 @@ > where should the execution continue after it yields; the control is always > transferred to the generator's caller. 
> > -The :keyword:`yield` statement is allowed in the :keyword:`try` clause of a > +:keyword:`yield` expressions are allowed in the :keyword:`try` clause of a > :keyword:`try` ... :keyword:`finally` construct. If the generator is not > resumed before it is finalized (by reaching a zero reference count or by being > garbage collected), the generator-iterator's :meth:`close` method will be > called, allowing any pending :keyword:`finally` clauses to execute. > > +When ``yield from expression`` is used, it treats the supplied expression as > +a subiterator. All values produced by that subiterator are passed directly > +to the caller of the current generator's methods. Any values passed in with > +:meth:`send` and any exceptions passed in with :meth:`throw` are passed to > +the underlying iterator if it has the appropriate methods. If this is not the > +case, then :meth:`send` will raise :exc:`AttributeError` or :exc:`TypeError`, > +while :meth:`throw` will just raise the passed in exception immediately. > + > +When the underlying iterator is complete, the :attr:`~StopIteration.value` > +attribute of the raised :exc:`StopIteration` instance becomes the value of > +the yield expression. It can be either set explicitly when raising > +:exc:`StopIteration`, or automatically when the sub-iterator is a generator > +(by returning a value from the sub-generator). > + > +The parentheses can be omitted when the :keyword:`yield` expression is the > +sole expression on the right hand side of an assignment statement. > + > .. index:: object: generator > > The following generator's methods can be used to control the execution of a > @@ -444,6 +464,10 @@ > The proposal to enhance the API and syntax of generators, making them > usable as simple coroutines. > > + :pep:`0380` - Syntax for Delegating to a Subgenerator > + The proposal to introduce the :token:`yield_from` syntax, making delegation > + to sub-generators easy. > + > > .. 
_primaries: > > PEP 3155: Qualified name for classes and functions > ================================================== > > @@ -208,7 +224,6 @@ > how they might be accessible from the global scope. > > Example with (non-bound) methods:: > - > >>> class C: > ... def meth(self): > ... pass This looks like a spurious (and syntax-breaking) change. > diff --git a/Grammar/Grammar b/Grammar/Grammar > --- a/Grammar/Grammar > +++ b/Grammar/Grammar > @@ -121,7 +121,7 @@ > |'**' test) > # The reason that keywords are test nodes instead of NAME is that using NAME > # results in an ambiguity. ast.c makes sure it's a NAME. > -argument: test [comp_for] | test '=' test # Really [keyword '='] test > +argument: (test) [comp_for] | test '=' test # Really [keyword '='] test This looks like a change without effect? > diff --git a/Include/genobject.h b/Include/genobject.h > --- a/Include/genobject.h > +++ b/Include/genobject.h > @@ -11,20 +11,20 @@ > struct _frame; /* Avoid including frameobject.h */ > > typedef struct { > - PyObject_HEAD > - /* The gi_ prefix is intended to remind of generator-iterator. */ > + PyObject_HEAD > + /* The gi_ prefix is intended to remind of generator-iterator. */ > > - /* Note: gi_frame can be NULL if the generator is "finished" */ > - struct _frame *gi_frame; > + /* Note: gi_frame can be NULL if the generator is "finished" */ > + struct _frame *gi_frame; > > - /* True if generator is being executed. */ > - int gi_running; > + /* True if generator is being executed. */ > + int gi_running; > > - /* The code object backing the generator */ > - PyObject *gi_code; > + /* The code object backing the generator */ > + PyObject *gi_code; > > - /* List of weak reference. */ > - PyObject *gi_weakreflist; > + /* List of weak reference. */ > + PyObject *gi_weakreflist; > } PyGenObject; While these change tabs into spaces, it should be 4 spaces, not 8. 
> @@ -34,6 +34,7 @@ > > PyAPI_FUNC(PyObject *) PyGen_New(struct _frame *); > PyAPI_FUNC(int) PyGen_NeedsFinalizing(PyGenObject *); > +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **); > > #ifdef __cplusplus > } Does this API need to be public? If yes, it needs to be documented. > diff --git a/Include/opcode.h b/Include/opcode.h > --- a/Include/opcode.h > +++ b/Include/opcode.h > @@ -7,116 +7,117 @@ > > /* Instruction opcodes for compiled code */ > > -#define POP_TOP 1 > -#define ROT_TWO 2 > -#define ROT_THREE 3 > -#define DUP_TOP 4 > +#define POP_TOP 1 > +#define ROT_TWO 2 > +#define ROT_THREE 3 > +#define DUP_TOP 4 > #define DUP_TOP_TWO 5 > -#define NOP 9 > +#define NOP 9 > > -#define UNARY_POSITIVE 10 > -#define UNARY_NEGATIVE 11 > -#define UNARY_NOT 12 > +#define UNARY_POSITIVE 10 > +#define UNARY_NEGATIVE 11 > +#define UNARY_NOT 12 > > -#define UNARY_INVERT 15 > +#define UNARY_INVERT 15 > > -#define BINARY_POWER 19 > +#define BINARY_POWER 19 > > -#define BINARY_MULTIPLY 20 > +#define BINARY_MULTIPLY 20 > > -#define BINARY_MODULO 22 > -#define BINARY_ADD 23 > -#define BINARY_SUBTRACT 24 > -#define BINARY_SUBSCR 25 > +#define BINARY_MODULO 22 > +#define BINARY_ADD 23 > +#define BINARY_SUBTRACT 24 > +#define BINARY_SUBSCR 25 > #define BINARY_FLOOR_DIVIDE 26 > #define BINARY_TRUE_DIVIDE 27 > #define INPLACE_FLOOR_DIVIDE 28 > #define INPLACE_TRUE_DIVIDE 29 > > -#define STORE_MAP 54 > -#define INPLACE_ADD 55 > -#define INPLACE_SUBTRACT 56 > -#define INPLACE_MULTIPLY 57 > +#define STORE_MAP 54 > +#define INPLACE_ADD 55 > +#define INPLACE_SUBTRACT 56 > +#define INPLACE_MULTIPLY 57 > > -#define INPLACE_MODULO 59 > -#define STORE_SUBSCR 60 > -#define DELETE_SUBSCR 61 > +#define INPLACE_MODULO 59 > +#define STORE_SUBSCR 60 > +#define DELETE_SUBSCR 61 > > -#define BINARY_LSHIFT 62 > -#define BINARY_RSHIFT 63 > -#define BINARY_AND 64 > -#define BINARY_XOR 65 > -#define BINARY_OR 66 > -#define INPLACE_POWER 67 > -#define GET_ITER 68 > -#define STORE_LOCALS 69 > 
-#define PRINT_EXPR 70 > +#define BINARY_LSHIFT 62 > +#define BINARY_RSHIFT 63 > +#define BINARY_AND 64 > +#define BINARY_XOR 65 > +#define BINARY_OR 66 > +#define INPLACE_POWER 67 > +#define GET_ITER 68 > +#define STORE_LOCALS 69 > +#define PRINT_EXPR 70 > #define LOAD_BUILD_CLASS 71 > +#define YIELD_FROM 72 > > -#define INPLACE_LSHIFT 75 > -#define INPLACE_RSHIFT 76 > -#define INPLACE_AND 77 > -#define INPLACE_XOR 78 > -#define INPLACE_OR 79 > -#define BREAK_LOOP 80 > +#define INPLACE_LSHIFT 75 > +#define INPLACE_RSHIFT 76 > +#define INPLACE_AND 77 > +#define INPLACE_XOR 78 > +#define INPLACE_OR 79 > +#define BREAK_LOOP 80 > #define WITH_CLEANUP 81 > > -#define RETURN_VALUE 83 > -#define IMPORT_STAR 84 > +#define RETURN_VALUE 83 > +#define IMPORT_STAR 84 > > -#define YIELD_VALUE 86 > -#define POP_BLOCK 87 > -#define END_FINALLY 88 > -#define POP_EXCEPT 89 > +#define YIELD_VALUE 86 > +#define POP_BLOCK 87 > +#define END_FINALLY 88 > +#define POP_EXCEPT 89 > > -#define HAVE_ARGUMENT 90 /* Opcodes from here have an argument: */ > +#define HAVE_ARGUMENT 90 /* Opcodes from here have an argument: */ > > -#define STORE_NAME 90 /* Index in name list */ > -#define DELETE_NAME 91 /* "" */ > -#define UNPACK_SEQUENCE 92 /* Number of sequence items */ > -#define FOR_ITER 93 > +#define STORE_NAME 90 /* Index in name list */ > +#define DELETE_NAME 91 /* "" */ > +#define UNPACK_SEQUENCE 92 /* Number of sequence items */ > +#define FOR_ITER 93 > #define UNPACK_EX 94 /* Num items before variable part + > (Num items after variable part << 8) */ > > -#define STORE_ATTR 95 /* Index in name list */ > -#define DELETE_ATTR 96 /* "" */ > -#define STORE_GLOBAL 97 /* "" */ > -#define DELETE_GLOBAL 98 /* "" */ > +#define STORE_ATTR 95 /* Index in name list */ > +#define DELETE_ATTR 96 /* "" */ > +#define STORE_GLOBAL 97 /* "" */ > +#define DELETE_GLOBAL 98 /* "" */ > > -#define LOAD_CONST 100 /* Index in const list */ > -#define LOAD_NAME 101 /* Index in name list */ > -#define BUILD_TUPLE 
102 /* Number of tuple items */ > -#define BUILD_LIST 103 /* Number of list items */ > -#define BUILD_SET 104 /* Number of set items */ > -#define BUILD_MAP 105 /* Always zero for now */ > -#define LOAD_ATTR 106 /* Index in name list */ > -#define COMPARE_OP 107 /* Comparison operator */ > -#define IMPORT_NAME 108 /* Index in name list */ > -#define IMPORT_FROM 109 /* Index in name list */ > +#define LOAD_CONST 100 /* Index in const list */ > +#define LOAD_NAME 101 /* Index in name list */ > +#define BUILD_TUPLE 102 /* Number of tuple items */ > +#define BUILD_LIST 103 /* Number of list items */ > +#define BUILD_SET 104 /* Number of set items */ > +#define BUILD_MAP 105 /* Always zero for now */ > +#define LOAD_ATTR 106 /* Index in name list */ > +#define COMPARE_OP 107 /* Comparison operator */ > +#define IMPORT_NAME 108 /* Index in name list */ > +#define IMPORT_FROM 109 /* Index in name list */ > > -#define JUMP_FORWARD 110 /* Number of bytes to skip */ > -#define JUMP_IF_FALSE_OR_POP 111 /* Target byte offset from beginning of code */ > -#define JUMP_IF_TRUE_OR_POP 112 /* "" */ > -#define JUMP_ABSOLUTE 113 /* "" */ > -#define POP_JUMP_IF_FALSE 114 /* "" */ > -#define POP_JUMP_IF_TRUE 115 /* "" */ > +#define JUMP_FORWARD 110 /* Number of bytes to skip */ > +#define JUMP_IF_FALSE_OR_POP 111 /* Target byte offset from beginning of code */ > +#define JUMP_IF_TRUE_OR_POP 112 /* "" */ > +#define JUMP_ABSOLUTE 113 /* "" */ > +#define POP_JUMP_IF_FALSE 114 /* "" */ > +#define POP_JUMP_IF_TRUE 115 /* "" */ > > -#define LOAD_GLOBAL 116 /* Index in name list */ > +#define LOAD_GLOBAL 116 /* Index in name list */ > > -#define CONTINUE_LOOP 119 /* Start of loop (absolute) */ > -#define SETUP_LOOP 120 /* Target address (relative) */ > -#define SETUP_EXCEPT 121 /* "" */ > -#define SETUP_FINALLY 122 /* "" */ > +#define CONTINUE_LOOP 119 /* Start of loop (absolute) */ > +#define SETUP_LOOP 120 /* Target address (relative) */ > +#define SETUP_EXCEPT 121 /* "" */ > +#define 
SETUP_FINALLY 122 /* "" */ > > -#define LOAD_FAST 124 /* Local variable number */ > -#define STORE_FAST 125 /* Local variable number */ > -#define DELETE_FAST 126 /* Local variable number */ > +#define LOAD_FAST 124 /* Local variable number */ > +#define STORE_FAST 125 /* Local variable number */ > +#define DELETE_FAST 126 /* Local variable number */ > > -#define RAISE_VARARGS 130 /* Number of raise arguments (1, 2 or 3) */ > +#define RAISE_VARARGS 130 /* Number of raise arguments (1, 2 or 3) */ > /* CALL_FUNCTION_XXX opcodes defined below depend on this definition */ > -#define CALL_FUNCTION 131 /* #args + (#kwargs<<8) */ > -#define MAKE_FUNCTION 132 /* #defaults + #kwdefaults<<8 + #annotations<<16 */ > -#define BUILD_SLICE 133 /* Number of items */ > +#define CALL_FUNCTION 131 /* #args + (#kwargs<<8) */ > +#define MAKE_FUNCTION 132 /* #defaults + #kwdefaults<<8 + #annotations<<16 */ > +#define BUILD_SLICE 133 /* Number of items */ > > #define MAKE_CLOSURE 134 /* same as MAKE_FUNCTION */ > #define LOAD_CLOSURE 135 /* Load free variable from closure */ Not sure putting these and all the other cosmetic changes into an already big patch is such a good idea... > diff --git a/Include/pyerrors.h b/Include/pyerrors.h > --- a/Include/pyerrors.h > +++ b/Include/pyerrors.h > @@ -51,6 +51,11 @@ > Py_ssize_t written; /* only for BlockingIOError, -1 otherwise */ > } PyOSErrorObject; > > +typedef struct { > + PyException_HEAD > + PyObject *value; > +} PyStopIterationObject; > + > /* Compatibility typedefs */ > typedef PyOSErrorObject PyEnvironmentErrorObject; > #ifdef MS_WINDOWS > @@ -380,6 +385,8 @@ > const char *reason /* UTF-8 encoded string */ > ); > > +/* create a StopIteration exception with the given value */ > +PyAPI_FUNC(PyObject *) PyStopIteration_Create(PyObject *); About this API see below. 
> diff --git a/Objects/abstract.c b/Objects/abstract.c > --- a/Objects/abstract.c > +++ b/Objects/abstract.c > @@ -2267,7 +2267,6 @@ > > func = PyObject_GetAttrString(o, name); > if (func == NULL) { > - PyErr_SetString(PyExc_AttributeError, name); > return 0; > } > > @@ -2311,7 +2310,6 @@ > > func = PyObject_GetAttrString(o, name); > if (func == NULL) { > - PyErr_SetString(PyExc_AttributeError, name); > return 0; > } > va_start(va, format); These two changes also look suspiciously unrelated? > +PyObject * > +PyStopIteration_Create(PyObject *value) > +{ > + return PyObject_CallFunctionObjArgs(PyExc_StopIteration, value, NULL); > +} I think this function is rather questionable. It is only used in one place. If kept, it should rather be named _PyE{rr,xc}_CreateStopIteration. But since it's so trivial, it should be removed altogether. > diff --git a/Objects/genobject.c b/Objects/genobject.c > --- a/Objects/genobject.c > +++ b/Objects/genobject.c > @@ -5,6 +5,9 @@ > #include "structmember.h" > #include "opcode.h" > > +static PyObject *gen_close(PyGenObject *gen, PyObject *args); > +static void gen_undelegate(PyGenObject *gen); > + > static int > gen_traverse(PyGenObject *gen, visitproc visit, void *arg) > { > @@ -90,12 +93,18 @@ > > /* If the generator just returned (as opposed to yielding), signal > * that the generator is exhausted. */ > - if (result == Py_None && f->f_stacktop == NULL) { > - Py_DECREF(result); > - result = NULL; > - /* Set exception if not called by gen_iternext() */ > - if (arg) > + if (result && f->f_stacktop == NULL) { > + if (result == Py_None) { > + /* Delay exception instantiation if we can */ > PyErr_SetNone(PyExc_StopIteration); > + } else { > + PyObject *e = PyStopIteration_Create(result); > + if (e != NULL) { > + PyErr_SetObject(PyExc_StopIteration, e); > + Py_DECREF(e); > + } Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here anyway?
> +/* > + * If StopIteration exception is set, fetches its 'value' > + * attribute if any, otherwise sets pvalue to None. > + * > + * Returns 0 if no exception or StopIteration is set. > + * If any other exception is set, returns -1 and leaves > + * pvalue unchanged. > + */ > + > +int > +PyGen_FetchStopIterationValue(PyObject **pvalue) { > + PyObject *et, *ev, *tb; > + PyObject *value = NULL; > + > + if (PyErr_ExceptionMatches(PyExc_StopIteration)) { > + PyErr_Fetch(&et, &ev, &tb); > + Py_XDECREF(et); > + Py_XDECREF(tb); > + if (ev) { > + value = ((PyStopIterationObject *)ev)->value; > + Py_DECREF(ev); > + } PyErr_Fetch without PyErr_Restore clears the exception; that should be mentioned in the docstring. Georg From techtonik at gmail.com Fri Jan 13 16:34:38 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 13 Jan 2012 18:34:38 +0300 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) Message-ID: Posting to python-dev as it no longer relates to the idea of improving print(). sys.stdout.write() in Python 3 causes backwards incompatible behavior that breaks a recipe for unbuffered character reading from stdin on Linux - http://code.activestate.com/recipes/134892/ At first I thought that the problem was in the new print() function, but it appeared that the culprit is sys.stdout.write(). Attached is a test script which is a stripped-down version of the recipe above. If executed with Python 2, you can see the prompt to press a key (even though output on Linux is buffered in Python 2). With Python 3, there is no prompt until you press a key. Is it a bug or intended behavior? What is the cause of this break? -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: getchrec.py Type: text/x-python Size: 489 bytes Desc: not available URL: From guido at python.org Fri Jan 13 16:49:56 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 13 Jan 2012 07:49:56 -0800 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) In-Reply-To: References: Message-ID: I think this may be because in Python 2, there is a coupling between stdin and stdout (in the C stdlib code) that flushes stdout when you read stdin. This doesn't seem to be required by the C standard, but most implementations seem to do it. http://stackoverflow.com/questions/2123528/does-reading-from-stdin-flush-stdout I think it was a nice feature but I can see problems with it; apps that want this behavior ought to bite the bullet and flush stdout. On Fri, Jan 13, 2012 at 7:34 AM, anatoly techtonik wrote: > Posting to python-dev as it no longer relates to the idea of improving > print(). > > > sys.stdout.write() in Python 3 causes backwards incompatible behavior that > breaks a recipe for unbuffered character reading from stdin on Linux - > http://code.activestate.com/recipes/134892/ At first I thought that the > problem was in the new print() function, but it appeared that the culprit is > sys.stdout.write() > > Attached is a test script which is a stripped-down version of the recipe > above. > > If executed with Python 2, you can see the prompt to press a key (even > though output on Linux is buffered in Python 2). > With Python 3, there is no prompt until you press a key. > > Is it a bug or intended behavior? What is the cause of this break? > -- > anatoly t.
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Jan 13 17:00:23 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 13 Jan 2012 08:00:23 -0800 Subject: [Python-Dev] PEP 380 ("yield from") is now Final In-Reply-To: References: Message-ID: AWESOME!!! On Fri, Jan 13, 2012 at 4:14 AM, Nick Coghlan wrote: > I marked PEP 380 as Final this evening, after pushing the tested and > documented implementation to hg.python.org: > http://hg.python.org/cpython/rev/d64ac9ab4cd0 > > As the list of names in the NEWS and What's New entries suggests, it > was quite a collaborative effort to get this one over the line, and > that's without even listing all the people that offered helpful > suggestions and comments along the way :) > > print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python-dev at masklinn.net Fri Jan 13 17:00:57 2012 From: python-dev at masklinn.net (Xavier Morel) Date: Fri, 13 Jan 2012 17:00:57 +0100 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) In-Reply-To: References: Message-ID: <941F8C0E-287B-47B1-B657-A2D1304EC0E9@masklinn.net> On 2012-01-13, at 16:34 , anatoly techtonik wrote: > Posting to python-dev as it is no more relates to the idea of improving > print(). 
> > > sys.stdout.write() in Python 3 causes backwards incompatible behavior that > breaks a recipe for unbuffered character reading from stdin on Linux - > http://code.activestate.com/recipes/134892/ At first I thought that the > problem was in the new print() function, but it appeared that the culprit is > sys.stdout.write() > > Attached is a test script which is a stripped-down version of the recipe > above. > > If executed with Python 2, you can see the prompt to press a key (even > though output on Linux is buffered in Python 2). > With Python 3, there is no prompt until you press a key. > > Is it a bug or intended behavior? What is the cause of this break? FWIW this is not restricted to Linux (the same behavior change can be observed on OS X), and the script is overly complex; you can expose the change with 3 lines: import sys sys.stdout.write('prompt>') sys.stdin.read(1) Python 2 displays "prompt>" and terminates execution on [Return], Python 3 does not display anything until [Return] is pressed.
Interestingly, the `-u` option is not sufficient to make "prompt>" appear in Python 3; the stream has to be flushed explicitly unless the input is ~16k characters (I guess that's an internal buffer size of some sort) From solipsis at pitrou.net Fri Jan 13 17:19:08 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 13 Jan 2012 17:19:08 +0100 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) References: <941F8C0E-287B-47B1-B657-A2D1304EC0E9@masklinn.net> Message-ID: <20120113171908.4e1da88d@pitrou.net> On Fri, 13 Jan 2012 17:00:57 +0100 Xavier Morel wrote: > FWIW this is not restricted to Linux (the same behavior change can > be observed on OS X), and the script is overly complex; you can expose > the change with 3 lines: > > import sys > sys.stdout.write('prompt>') > sys.stdin.read(1) > > Python 2 displays "prompt>" and terminates execution on [Return], > Python 3 does not display anything until [Return] is pressed. > > Interestingly, the `-u` option is not sufficient to make > "prompt>" appear in Python 3; the stream has to be flushed > explicitly unless the input is ~16k characters (I guess that's > an internal buffer size of some sort) "-u" forces line-buffering mode for stdout/stderr, which is already the default if they are wired to an interactive device (isatty() returning True). But this was already rehashed on python-ideas and the bug tracker, and apparently Anatoly thought it would be a good idea to post on a third medium. Sigh. Regards Antoine.
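For completeness, the fix suggested in this thread ("bite the bullet and flush stdout") is a one-line change to the recipe. A minimal sketch follows; the `prompt_char` helper and its stream parameters are made up here so the behavior can be exercised with in-memory streams instead of a terminal:

```python
import io
import sys

def prompt_char(msg, out=None, inp=None):
    # Default to the real streams; the parameters exist only to make
    # this sketch testable without a tty.
    out = sys.stdout if out is None else out
    inp = sys.stdin if inp is None else inp
    out.write(msg)
    # Python 3 no longer flushes stdout as a side effect of reading stdin,
    # so the prompt must be flushed explicitly before blocking on input.
    out.flush()
    return inp.read(1)

# Exercise it with in-memory streams (no terminal needed):
demo_out = io.StringIO()
key = prompt_char("prompt>", out=demo_out, inp=io.StringIO("x"))
```

With the explicit flush, the prompt appears before the read on both Python 2 and Python 3, buffered or not.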
From status at bugs.python.org Fri Jan 13 18:07:30 2012 From: status at bugs.python.org (Python tracker) Date: Fri, 13 Jan 2012 18:07:30 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20120113170730.7B70B1CBFF@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2012-01-06 - 2012-01-13) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3210 (+30) closed 22352 (+30) total 25562 (+60) Open issues with patches: 1384 Issues opened (42) ================== #6774: socket.shutdown documentation: on some platforms, closing one http://bugs.python.org/issue6774 reopened by neologix #13721: ssl.wrap_socket on a connected but failed connection succeeds http://bugs.python.org/issue13721 opened by kiilerix #13722: "distributions can disable the encodings package" http://bugs.python.org/issue13722 opened by pitrou #13723: Regular expressions: (?:X|\s+)*$ takes a long time http://bugs.python.org/issue13723 opened by ericp #13725: regrtest does not recognize -d flag http://bugs.python.org/issue13725 opened by etukia #13726: regrtest ambiguous -S flag http://bugs.python.org/issue13726 opened by etukia #13727: Accessor macros for PyDateTime_Delta members http://bugs.python.org/issue13727 opened by amaury.forgeotdarc #13728: Description of -m and -c cli options wrong? 
http://bugs.python.org/issue13728 opened by sandro.tosi #13730: Grammar mistake in Decimal documentation http://bugs.python.org/issue13730 opened by zacherates #13733: Change required to sysconfig.py for Python 2.7.2 on OS/2 http://bugs.python.org/issue13733 opened by Paul.Smedley #13734: Add a generic directory walker method to avoid symlink attacks http://bugs.python.org/issue13734 opened by hynek #13736: urllib.request.urlopen leaks exceptions from socket and httpli http://bugs.python.org/issue13736 opened by jmoy #13737: bugs.python.org/review's Django settings file DEBUG=True http://bugs.python.org/issue13737 opened by Bithin.A #13740: winsound.SND_NOWAIT ignored on modern Windows platforms http://bugs.python.org/issue13740 opened by bughunter2 #13742: Add a key parameter (like sorted) to heapq.merge http://bugs.python.org/issue13742 opened by ssapin #13743: xml.dom.minidom.Document class is not documented http://bugs.python.org/issue13743 opened by sandro.tosi #13744: raw byte strings are described in a confusing way http://bugs.python.org/issue13744 opened by barry #13745: configuring --with-dbmliborder=bdb doesn't build the gdbm exte http://bugs.python.org/issue13745 opened by doko #13746: ast.Tuple's have an inconsistent "col_offset" value http://bugs.python.org/issue13746 opened by bronikkk #13747: ssl_version documentation error http://bugs.python.org/issue13747 opened by Ben.Darnell #13749: socketserver can't stop http://bugs.python.org/issue13749 opened by teamnoir #13751: multiprocessing.pool hangs if any worker raises an Exception w http://bugs.python.org/issue13751 opened by fmitha #13752: add a str.casefold() method http://bugs.python.org/issue13752 opened by benjamin.peterson #13756: Python3.2.2 make fail on cygwin http://bugs.python.org/issue13756 opened by holgerd00d #13758: compile() should not encode 'filename' (at least on Windows) http://bugs.python.org/issue13758 opened by terry.reedy #13759: Python 3.2.2 Mac installer version doesn't 
accept multibyte ch http://bugs.python.org/issue13759 opened by ats #13760: ConfigParser exceptions are not pickleable http://bugs.python.org/issue13760 opened by fmitha #13761: Add flush keyword to print() http://bugs.python.org/issue13761 opened by georg.brandl #13763: rm obsolete reference in devguide http://bugs.python.org/issue13763 opened by tshepang #13764: Misc/build.sh is outdated... talks about svn http://bugs.python.org/issue13764 opened by tshepang #13766: explain the relationship between Lib/lib2to3/Grammar.txt and G http://bugs.python.org/issue13766 opened by tshepang #13768: Doc/tools/dailybuild.py available only on 2.7 branch http://bugs.python.org/issue13768 opened by tshepang #13769: json.dump(ensure_ascii=False) return str instead of unicode http://bugs.python.org/issue13769 opened by mmarkk #13770: python3 & json: add ensure_ascii documentation http://bugs.python.org/issue13770 opened by mmarkk #13771: HTTPSConnection __init__ super implementation causes recursion http://bugs.python.org/issue13771 opened by michael.mulich #13772: listdir() doesn't work with non-trivial symlinks http://bugs.python.org/issue13772 opened by pitrou #13773: Support sqlite3 uri filenames http://bugs.python.org/issue13773 opened by poq #13774: json.loads raises a SystemError for invalid encoding on 2.7.2 http://bugs.python.org/issue13774 opened by Julian #13775: Access Denied message on symlink creation misleading for an ex http://bugs.python.org/issue13775 opened by santa4nt #13777: socket: communicating with Mac OS X KEXT controls http://bugs.python.org/issue13777 opened by goderbauer #13779: os.walk: bottom-up http://bugs.python.org/issue13779 opened by patrick.vrijlandt #13780: make YieldFrom its own node http://bugs.python.org/issue13780 opened by benjamin.peterson Most recent 15 issues with no replies (15) ========================================== #13780: make YieldFrom its own node http://bugs.python.org/issue13780 #13779: os.walk: bottom-up 
http://bugs.python.org/issue13779 #13777: socket: communicating with Mac OS X KEXT controls http://bugs.python.org/issue13777 #13771: HTTPSConnection __init__ super implementation causes recursion http://bugs.python.org/issue13771 #13770: python3 & json: add ensure_ascii documentation http://bugs.python.org/issue13770 #13769: json.dump(ensure_ascii=False) return str instead of unicode http://bugs.python.org/issue13769 #13768: Doc/tools/dailybuild.py available only on 2.7 branch http://bugs.python.org/issue13768 #13766: explain the relationship between Lib/lib2to3/Grammar.txt and G http://bugs.python.org/issue13766 #13760: ConfigParser exceptions are not pickleable http://bugs.python.org/issue13760 #13756: Python3.2.2 make fail on cygwin http://bugs.python.org/issue13756 #13745: configuring --with-dbmliborder=bdb doesn't build the gdbm exte http://bugs.python.org/issue13745 #13743: xml.dom.minidom.Document class is not documented http://bugs.python.org/issue13743 #13740: winsound.SND_NOWAIT ignored on modern Windows platforms http://bugs.python.org/issue13740 #13730: Grammar mistake in Decimal documentation http://bugs.python.org/issue13730 #13727: Accessor macros for PyDateTime_Delta members http://bugs.python.org/issue13727 Most recent 15 issues waiting for review (15) ============================================= #13780: make YieldFrom its own node http://bugs.python.org/issue13780 #13777: socket: communicating with Mac OS X KEXT controls http://bugs.python.org/issue13777 #13775: Access Denied message on symlink creation misleading for an ex http://bugs.python.org/issue13775 #13774: json.loads raises a SystemError for invalid encoding on 2.7.2 http://bugs.python.org/issue13774 #13773: Support sqlite3 uri filenames http://bugs.python.org/issue13773 #13763: rm obsolete reference in devguide http://bugs.python.org/issue13763 #13761: Add flush keyword to print() http://bugs.python.org/issue13761 #13752: add a str.casefold() method http://bugs.python.org/issue13752 
#13742: Add a key parameter (like sorted) to heapq.merge http://bugs.python.org/issue13742 #13736: urllib.request.urlopen leaks exceptions from socket and httpli http://bugs.python.org/issue13736 #13734: Add a generic directory walker method to avoid symlink attacks http://bugs.python.org/issue13734 #13733: Change required to sysconfig.py for Python 2.7.2 on OS/2 http://bugs.python.org/issue13733 #13730: Grammar mistake in Decimal documentation http://bugs.python.org/issue13730 #13727: Accessor macros for PyDateTime_Delta members http://bugs.python.org/issue13727 #13725: regrtest does not recognize -d flag http://bugs.python.org/issue13725 Top 10 most discussed issues (10) ================================= #13703: Hash collision security issue http://bugs.python.org/issue13703 43 msgs #13734: Add a generic directory walker method to avoid symlink attacks http://bugs.python.org/issue13734 13 msgs #13761: Add flush keyword to print() http://bugs.python.org/issue13761 12 msgs #13721: ssl.wrap_socket on a connected but failed connection succeeds http://bugs.python.org/issue13721 8 msgs #13122: Out of date links in the sidebar of the documentation index of http://bugs.python.org/issue13122 7 msgs #13241: llvm-gcc-4.2 miscompiles Python (XCode 4.1 on Mac OS 10.7) http://bugs.python.org/issue13241 7 msgs #13733: Change required to sysconfig.py for Python 2.7.2 on OS/2 http://bugs.python.org/issue13733 7 msgs #13642: urllib incorrectly quotes username and password in https basic http://bugs.python.org/issue13642 6 msgs #9253: argparse: optional subparsers http://bugs.python.org/issue9253 5 msgs #13521: Make dict.setdefault() atomic http://bugs.python.org/issue13521 5 msgs Issues closed (29) ================== #9637: docs do not say that urllib uses HTTP_PROXY http://bugs.python.org/issue9637 closed by orsenthil #9993: shutil.move fails on symlink source http://bugs.python.org/issue9993 closed by pitrou #11418: Method's global scope is module containing function definition 
http://bugs.python.org/issue11418 closed by python-dev #11682: PEP 380 reference implementation for 3.3 http://bugs.python.org/issue11682 closed by ncoghlan #12364: Deadlock in test_concurrent_futures http://bugs.python.org/issue12364 closed by rosslagerwall #13168: Python 2.6 having trouble finding modules when invoked via a s http://bugs.python.org/issue13168 closed by terry.reedy #13502: Documentation for Event.wait return value is either wrong or i http://bugs.python.org/issue13502 closed by neologix #13692: 2to3 mangles from . import frobnitz http://bugs.python.org/issue13692 closed by benjamin.peterson #13718: Format Specification Mini-Language does not accept comma for p http://bugs.python.org/issue13718 closed by eric.smith #13724: socket.create_connection and multiple IP addresses http://bugs.python.org/issue13724 closed by pitrou #13729: Evaluation order for dics key/value http://bugs.python.org/issue13729 closed by amaury.forgeotdarc #13731: Awkward phrasing in Decimal documentation http://bugs.python.org/issue13731 closed by rhettinger #13732: test_logging failure on Windows buildbots http://bugs.python.org/issue13732 closed by python-dev #13735: The protocol > 0 of cPickle does not given stable dictionary v http://bugs.python.org/issue13735 closed by pitrou #13738: Optimize bytes.upper() and lower() http://bugs.python.org/issue13738 closed by pitrou #13739: os.fdlistdir() is not idempotent http://bugs.python.org/issue13739 closed by neologix #13741: *** glibc detected *** python: double free or corruption (!pre http://bugs.python.org/issue13741 closed by neologix #13748: Allow rb"" literals as an equivalent to br"" http://bugs.python.org/issue13748 closed by pitrou #13750: queue broken when built without-thread http://bugs.python.org/issue13750 closed by rhettinger #13753: str.join description contains an incorrect reference to argume http://bugs.python.org/issue13753 closed by terry.reedy #13754: str.ljust and str.rjust do not exactly describes 
original stri http://bugs.python.org/issue13754 closed by python-dev #13755: str.endswith and str.startswith do not take lists of strings http://bugs.python.org/issue13755 closed by rhettinger #13757: os.fdlistdir() should not close the file descriptor given in a http://bugs.python.org/issue13757 closed by neologix #13762: missing section: how to contribute to devguide http://bugs.python.org/issue13762 closed by tshepang #13765: Distutils does not put quotes around paths that contain spaces http://bugs.python.org/issue13765 closed by eric.araujo #13767: Would be nice to have a future import that turned off old exce http://bugs.python.org/issue13767 closed by benjamin.peterson #13776: formatter_unicode.c still assumes ASCII http://bugs.python.org/issue13776 closed by eric.smith #13778: Python should invalidate all non-owned 'thread.lock' objects w http://bugs.python.org/issue13778 closed by neologix #12736: Request for python casemapping functions to use full not simpl http://bugs.python.org/issue12736 closed by benjamin.peterson From python-dev at masklinn.net Fri Jan 13 18:07:28 2012 From: python-dev at masklinn.net (Xavier Morel) Date: Fri, 13 Jan 2012 18:07:28 +0100 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) In-Reply-To: <20120113171908.4e1da88d@pitrou.net> References: <941F8C0E-287B-47B1-B657-A2D1304EC0E9@masklinn.net> <20120113171908.4e1da88d@pitrou.net> Message-ID: <2AF373C1-C710-4788-91F6-D75FEF4A9931@masklinn.net> On 2012-01-13, at 17:19 , Antoine Pitrou wrote: > > "-u" forces line-buffering mode for stdout/stderr, which is already the > default if they are wired to an interactive device (isatty() returning > True). Oh, I had not noticed the documentation had changed in Python 3 (in Python 2 it stated that `-u` made IO unbuffered, on Python 3 it now states that only binary IO is unbuffered and text IO remains line-buffered). Sorry about that.
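The line-buffered text layer discussed above can also be bypassed explicitly from application code; a minimal sketch, assuming Python 3.3+ where `io.TextIOWrapper` gained the `write_through` parameter (demonstrated over an in-memory stream, though the same wrapper can be placed over `sys.stdout.buffer`):

```python
import io

# With write_through=True, text written to the wrapper is pushed to the
# underlying binary stream immediately, with no text-layer buffering.
raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding="utf-8", write_through=True)
writer.write("hello\n")
print(raw.getvalue())  # b'hello\n' -- already in the binary layer

# A default wrapper, by contrast, holds small writes in its own buffer
# until flush() is called or the buffer fills.
raw2 = io.BytesIO()
buffered = io.TextIOWrapper(raw2, encoding="utf-8")
buffered.write("hello\n")
print(raw2.getvalue())  # b'' -- still sitting in the text buffer
```

The equivalent for standard output would be `io.TextIOWrapper(sys.stdout.buffer, write_through=True)`, which approximates the old Python 2 `-u` behaviour for text writes.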
From dickinsm at gmail.com Fri Jan 13 18:08:26 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 13 Jan 2012 17:08:26 +0000 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum wrote: > How > pathological the data needs to be before the collision counter triggers? I'd > expect *very* pathological. How pathological do you consider the set {1 << n for n in range(2000)} to be? What about the set: ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} ? The > 2000 elements of the latter set have only 61 distinct hash values on 64-bit machine, so there will be over 2000 total collisions involved in creating this set (though admittedly only around 30 collisions per hash value). -- Mark From guido at python.org Fri Jan 13 18:43:00 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 13 Jan 2012 09:43:00 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: On Fri, Jan 13, 2012 at 9:08 AM, Mark Dickinson wrote: > On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum > wrote: > > How > > pathological the data needs to be before the collision counter triggers? > I'd > > expect *very* pathological. > > How pathological do you consider the set > > {1 << n for n in range(2000)} > > to be? What about the set: > > ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} > > ? The > 2000 elements of the latter set have only 61 distinct hash > values on 64-bit machine, so there will be over 2000 total collisions > involved in creating this set (though admittedly only around 30 > collisions per hash value). > Hm... So how does the collision counting work for this case? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Fri Jan 13 18:54:29 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 13 Jan 2012 18:54:29 +0100 Subject: [Python-Dev] PEP 380 ("yield from") is now Final References: Message-ID: <20120113185429.41c7b4ad@pitrou.net> On Fri, 13 Jan 2012 22:14:43 +1000 Nick Coghlan wrote: > I marked PEP 380 as Final this evening, after pushing the tested and > documented implementation to hg.python.org: > http://hg.python.org/cpython/rev/d64ac9ab4cd0 I don't know if this is supposed to work, but the exception looks wrong: >>> def g(): yield from () ... >>> f = list(g()) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in g SystemError: error return without exception set Also, the checkin lacked a bytecode magic number bump. It is not really a problem since I've just bumped it anyway. Regards Antoine. From dickinsm at gmail.com Fri Jan 13 19:13:08 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 13 Jan 2012 18:13:08 +0000 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum wrote: >> How pathological do you consider the set >> >> {1 << n for n in range(2000)} >> >> to be? What about the set: >> >> ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} >> >> ? The > 2000 elements of the latter set have only 61 distinct hash >> values on 64-bit machine, so there will be over 2000 total collisions >> involved in creating this set (though admittedly only around 30 >> collisions per hash value). > > Hm... So how does the collision counting work for this case? Ah, my bad. It looks like the ieee754_powers_of_two is safe---IIUC, it's the number of collisions involved in a single key-set operation that's limited.
So a dictionary with keys {1<<n for n in range(2000)} is fine, but a dictionary with keys {1<<(61*n) for n in range(2000)} is not: >>> {1<<(n*61):True for n in range(2000)} Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 1, in <dictcomp> KeyError: 'too many hash collisions' [67961 refs] I'd still not consider this particularly pathological, though. -- Mark From guido at python.org Fri Jan 13 22:22:32 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 13 Jan 2012 13:22:32 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson wrote: > On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum > wrote: > >> How pathological do you consider the set > >> > >> {1 << n for n in range(2000)} > >> > >> to be? What about the set: > >> > >> ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)} > >> > >> ? The > 2000 elements of the latter set have only 61 distinct hash > >> values on 64-bit machine, so there will be over 2000 total collisions > >> involved in creating this set (though admittedly only around 30 > >> collisions per hash value). > > > > Hm... So how does the collision counting work for this case? > > Ah, my bad. It looks like the ieee754_powers_of_two is safe---IIUC, > it's the number of collisions involved in a single key-set operation > that's limited. So a dictionary with keys {1<<n for n in range(2000)} > is fine, but a dictionary with keys {1<<(61*n) for n in range(2000)} > is not: > > >>> {1<<(n*61):True for n in range(2000)} > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "<stdin>", line 1, in <dictcomp> > KeyError: 'too many hash collisions' > [67961 refs] > > I'd still not consider this particularly pathological, though. Really? Even though you came up with it specifically to prove me wrong? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tjreedy at udel.edu Fri Jan 13 23:48:09 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 13 Jan 2012 17:48:09 -0500 Subject: [Python-Dev] PEP 380 ("yield from") is now Final In-Reply-To: References: Message-ID: On 1/13/2012 7:14 AM, Nick Coghlan wrote: > print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) I pulled, rebuilt, and it indeed works (on Win 7). I just remembered that Tim Peters somewhere (generator.c?) left a large comment with examples of recursive generators, such as knight's tours. Could these be rewritten with (and benefit from) 'yield from'? (It occurs to me his stuff might be worth exposing in an iterator/generator how-to.) -- Terry Jan Reedy From dinov at microsoft.com Sat Jan 14 00:22:20 2012 From: dinov at microsoft.com (Dino Viehland) Date: Fri, 13 Jan 2012 23:22:20 +0000 Subject: [Python-Dev] Python as a Metro-style App References: <4F088795.5000800@v.loewis.de> <20120107235729.5d3953af@pitrou.net> <6C7ABA8B4E309440B857D74348836F2E4CCBBC92@TK5EX14MBXC292.redmond.corp.microsoft.com> <4F0CD88C.6030407@v.loewis.de> Message-ID: <6C7ABA8B4E309440B857D74348836F2E528A417A@TK5EX14MBXC292.redmond.corp.microsoft.com> Dino wrote: > Martin wrote: > > See the start of the thread: I tried to create a "WinRT Component > > DLL", and that failed, as VS would refuse to compile any C file in > > such a project. Not sure whether this is triggered by defining > > WINAPI_FAMILY=2, or any other compiler setting. > > > > I'd really love to use WINAPI_FAMILY=2, as compiler errors are much > > easier to fix than verifier errors. > > ... > > I'm going to ping some people on the windows team and see if the app > container bit is or will be necessary for DLLs. > I heard back from the Windows team and they are going to require the app container bit to be set on all PE files (although they don't currently enforce it). 
I was able to compile a simple .c file and pass /link /appcontainer and that worked, so I'm going to try and figure out if there's some way to get the .vcxproj to build a working command line that includes that. From benjamin at python.org Sat Jan 14 01:37:28 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 13 Jan 2012 19:37:28 -0500 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: 2012/1/13 Guido van Rossum : > Really? Even though you came up with specifically to prove me wrong? Coming up with a counterexample now invalidates it? -- Regards, Benjamin From solipsis at pitrou.net Sat Jan 14 02:17:08 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 14 Jan 2012 02:17:08 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability References: Message-ID: <20120114021708.2fbe990f@pitrou.net> On Thu, 12 Jan 2012 18:57:42 -0800 Guido van Rossum wrote: > Hm... I started out as a big fan of the randomized hash, but thinking more > about it, I actually believe that the chances of some legitimate app having > >1000 collisions are way smaller than the chances that somebody's code will > break due to the variable hashing. Breaking due to variable hashing is deterministic: you notice it as soon as you upgrade (and then you use PYTHONHASHSEED to disable variable hashing). That seems better than unpredictable breaking when some legitimate collision chain happens. Regards Antoine. From victor.stinner at haypocalc.com Sat Jan 14 02:35:14 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sat, 14 Jan 2012 02:35:14 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: > - Glenn Linderman proposes to fix the vulnerability by adding a new > "safe" dict type (only accepting string keys). 
His proof-of-concept > (SafeDict.py) uses a secret of 64 random bits and uses it to compute > the hash of a key. We could mix Marc's collision counter with SafeDict idea (being able to use a different secret for each dict): use hash(key, secret) (simple example: hash(secret+key)) instead of hash(key) in dict (and set), and change the secret if we have more than N collisions. But it would slow down all dict lookup (dict creation, get, set, del, ...). And getting new random data can also be slow. SafeDict and hash(secret+key) lose the benefit of the cached hash result. Because the hash result depends on an argument, we cannot cache the result anymore, and we have to recompute the hash for each lookup (even if you lookup the same key twice or more). Victor From guido at python.org Sat Jan 14 02:38:02 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 13 Jan 2012 17:38:02 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <20120114021708.2fbe990f@pitrou.net> References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou wrote: > On Thu, 12 Jan 2012 18:57:42 -0800 > Guido van Rossum wrote: > > Hm... I started out as a big fan of the randomized hash, but thinking > more > > about it, I actually believe that the chances of some legitimate app > having > > >1000 collisions are way smaller than the chances that somebody's code > will > > break due to the variable hashing. > > Breaking due to variable hashing is deterministic: you notice it as > soon as you upgrade (and then you use PYTHONHASHSEED to disable > variable hashing). That seems better than unpredictable breaking when > some legitimate collision chain happens. Fair enough. But I'm now uncomfortable with turning this on for bugfix releases.
I'm fine with making this the default in 3.3, just not in 3.2, 3.1 or 2.x -- it will break too much code and organizations will have to roll back the release or do extensive testing before installing a bugfix release -- exactly what we *don't* want for those. FWIW, I don't believe in the SafeDict solution -- you never know which dicts you have to change. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Sat Jan 14 02:58:23 2012 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 13 Jan 2012 17:58:23 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum wrote: > On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou wrote: > >> On Thu, 12 Jan 2012 18:57:42 -0800 >> Guido van Rossum wrote: >> > Hm... I started out as a big fan of the randomized hash, but thinking >> more >> > about it, I actually believe that the chances of some legitimate app >> having >> > >1000 collisions are way smaller than the chances that somebody's code >> will >> > break due to the variable hashing. >> >> Breaking due to variable hashing is deterministic: you notice it as >> soon as you upgrade (and then you use PYTHONHASHSEED to disable >> variable hashing). That seems better than unpredictable breaking when >> some legitimate collision chain happens. > > > Fair enough. But I'm now uncomfortable with turning this on for bugfix > releases. I'm fine with making this the default in 3.3, just not in 3.2, > 3.1 or 2.x -- it will break too much code and organizations will have to > roll back the release or do extensive testing before installing a bugfix > release -- exactly what we *don't* want for those. > > FWIW, I don't believe in the SafeDict solution -- you never know which > dicts you have to change. > > Agreed. 
Of the three options Victor listed only one is good. I don't like *SafeDict*. *-1*. It puts the onus on the coder to always get everything right with regards to data that came from outside the process never ending up hashed in a non-safe dict or set *anywhere*. "Safe" needs to be the default option for all hash tables. I don't like the "*too many hash collisions*" exception. *-1*. It provides non-deterministic application behavior for data driven applications with no way for them to predict when it'll happen or where and prepare for it. It may work in practice for many applications but is simply odd behavior. I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be back ported to any Python version. It is perfectly okay to break existing users who had anything depending on ordering of internal hash tables. Their code was already broken. We *will* provide a flag and/or environment variable that can be set to turn the feature off at their own peril which they can use in their test harnesses that are stupid enough to use doctests with order dependencies. This approach worked fine for Perl 9 years ago. https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Sat Jan 14 03:09:33 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 13 Jan 2012 18:09:33 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <4F10E3DD.1010200@g.nevcal.com> On 1/13/2012 5:35 PM, Victor Stinner wrote: >> - Glenn Linderman proposes to fix the vulnerability by adding a new >> "safe" dict type (only accepting string keys). His proof-of-concept >> (SafeDict.py) uses a secret of 64 random bits and uses it to compute >> the hash of a key.
> We could mix Marc's collision counter with SafeDict idea (being able > to use a different secret for each dict): use hash(key, secret) > (simple example: hash(secret+key)) instead of hash(key) in dict (and > set), and change the secret if we have more than N collisions. But it > would slow down all dict lookup (dict creation, get, set, del, ...). > And getting new random data can also be slow. > > SafeDict and hash(secret+key) lose the benefit of the cached hash > result. Because the hash result depends on a argument, we cannot cache > the result anymore, and we have to recompute the hash for each lookup > (even if you lookup the same key twice ore more). > > Victor So integrating SafeDict into dict so it could be automatically converted would mean changing the data structures underneath dict. Given that, a technique for hash caching could be created, that isn't quite as good as the one in place, but may be less expensive than not caching the hashes. It would also take more space, a second dict, internally, as well as the secret. So once the collision counter reaches some threshold (since there would be a functional fallback, it could be much lower than 1000), the secret is obtained, and the keys are rehashed using hash(secret+key). Now when lookups occur, the object id of the key and the hash of the key are used as the index and hash(secret+key) is stored as a cached value. This would only benefit lookups by the same object, other objects with the same key value would be recalculated (at least the first time). Some limit on the number of cached values would probably be appropriate. This would add complexity, of course, in trying to save time. An alternate solution would be to convert a dict to a tree once the number of collisions produces poor performance. 
Converting to a tree would result in O(log N) instead of O(1) lookup performance, but that is better than the degenerate case of O(N) which is produced by the excessive number of collisions resulting from an attack. This would require new tree code to be included in the core, of course, probably a red-black tree, which stays balanced. In either of these cases, the conversion is expensive, because a collision threshold must first be reached to determine the need for conversion, so the hash could already contain lots of data. If it were too expensive, the attack could still be effective. Another solution would be to change the collision code, so that colliding keys don't produce O(N) behavior, but some other behavior. Each colliding entry could convert that entry to a tree of entries, perhaps. This would require no conversion of "bad dicts", and an attack could at worst convert O(1) performance to O(log N). Clearly these ideas are more complex than adding randomization, but adding randomization doesn't seem to be produce immunity from attack, when data about the randomness is leaked. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Sat Jan 14 03:25:49 2012 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 13 Jan 2012 18:25:49 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <4F10E3DD.1010200@g.nevcal.com> References: <4F10E3DD.1010200@g.nevcal.com> Message-ID: > > > Clearly these ideas are more complex than adding randomization, but adding > randomization doesn't seem to be produce immunity from attack, when data > about the randomness is leaked. > Which will not normally happen. I'm firmly in the camp that believes the random seed can be probed and determined by creatively injecting values and measuring timing of things. But doing that is difficult and time and bandwidth intensive so the per process random hash seed is good enough. 
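Mark Dickinson's collision counts from earlier in the thread can be checked directly on a 64-bit CPython build, where (as an implementation detail of modern CPython) the hash of an int is reduced modulo the Mersenne prime 2**61 - 1, so hashes of powers of two repeat with period 61; a sketch:

```python
# On a 64-bit CPython build, hash(n) for a positive int is n % (2**61 - 1),
# so the 2000 powers of two below share only 61 distinct hash values.
distinct = {hash(1 << n) for n in range(2000)}
print(len(distinct))  # 61 on a 64-bit build

# Keys of the form 1 << (61*k) therefore all land on a single hash value,
# which is exactly the degenerate probing case the thread is discussing.
colliding = [1 << (61 * k) for k in range(100)]
print(len({hash(k) for k in colliding}))  # 1
```

On a 32-bit build the modulus (and hence the cycle length) differs, but the same clustering effect appears.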
There's another elephant in the room here, if you want to avoid this attack use a 64-bit Python build as it uses 64-bit hash values that are significantly more difficult to force a collision on. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Sat Jan 14 03:34:48 2012 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 13 Jan 2012 18:34:48 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F10E3DD.1010200@g.nevcal.com> Message-ID: btw, Tim's commit message on this one is amusingly relevant. :) http://hg.python.org/cpython/diff/8d2bbbbf2cb9/Objects/dictobject.c On Fri, Jan 13, 2012 at 6:25 PM, Gregory P. Smith wrote: > >> Clearly these ideas are more complex than adding randomization, but >> adding randomization doesn't seem to be produce immunity from attack, when >> data about the randomness is leaked. >> > > Which will not normally happen. > > I'm firmly in the camp that believes the random seed can be probed and > determined by creatively injecting values and measuring timing of things. > But doing that is difficult and time and bandwidth intensive so the per > process random hash seed is good enough. > > There's another elephant in the room here, if you want to avoid this > attack use a 64-bit Python build as it uses 64-bit hash values that are > significantly more difficult to force a collision on. > > -gps > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jan 14 03:55:22 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 14 Jan 2012 13:55:22 +1100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: <4F10EE9A.4060703@pearwood.info> On 14/01/12 12:58, Gregory P. Smith wrote: > I do like *randomly seeding the hash*. *+1*. This is easy. 
It can easily be > back ported to any Python version. > > It is perfectly okay to break existing users who had anything depending on > ordering of internal hash tables. Their code was already broken. For the record: steve at runes:~$ python -c "print(hash('spam ham'))" -376510515 steve at runes:~$ jython -c "print(hash('spam ham'))" 2054637885 So it is already the case that Python code that assumes stable hashing is broken. For what it's worth, I'm not convinced that we should be overly-concerned by "poor saps" (Guido's words) who rely on accidents of implementation regarding hash. We shouldn't break their code unless we have a good reason, but this strikes me as a good reason. The documentation for hash certainly makes no promise about stability, and relying on it strikes me as about as sensible as relying on the stability of error messages. I'm also not convinced that the option to raise an exception after 1000 collisions actually solves the problem. That relies on the application being re-written to catch the exception and recover from it (how?). Otherwise, all it does is change the attack vector from "cause an indefinite number of hash collisions" to "cause 999 hash collisions followed by crashing the application with an exception", which doesn't strike me as much of an improvement. +1 on random seeding. Default to on in 3.3+ and default to off in older versions, which allows people to avoid breaking their code until they're ready for it to be broken. -- Steven From greg at krypto.org Sat Jan 14 04:06:00 2012 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 13 Jan 2012 19:06:00 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. 
Smith wrote: > > On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum wrote: > >> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou wrote: >> >>> On Thu, 12 Jan 2012 18:57:42 -0800 >>> Guido van Rossum wrote: >>> > Hm... I started out as a big fan of the randomized hash, but thinking >>> more >>> > about it, I actually believe that the chances of some legitimate app >>> having >>> > >1000 collisions are way smaller than the chances that somebody's code >>> will >>> > break due to the variable hashing. >>> >>> Breaking due to variable hashing is deterministic: you notice it as >>> soon as you upgrade (and then you use PYTHONHASHSEED to disable >>> variable hashing). That seems better than unpredictable breaking when >>> some legitimate collision chain happens. >> >> >> Fair enough. But I'm now uncomfortable with turning this on for bugfix >> releases. I'm fine with making this the default in 3.3, just not in 3.2, >> 3.1 or 2.x -- it will break too much code and organizations will have to >> roll back the release or do extensive testing before installing a bugfix >> release -- exactly what we *don't* want for those. >> >> FWIW, I don't believe in the SafeDict solution -- you never know which >> dicts you have to change. >> >> > Agreed. > > Of the three options Victor listed only one is good. > > I don't like *SafeDict*. *-1*. It puts the onerous on the coder to > always get everything right with regards to data that came from outside the > process never ending up hashed in a non-safe dict or set *anywhere*. > "Safe" needs to be the default option for all hash tables. > > I don't like the "*too many hash collisions*" exception. *-1*. It > provides non-deterministic application behavior for data driven > applications with no way for them to predict when it'll happen or where and > prepare for it. It may work in practice for many applications but is simply > odd behavior. > > I do like *randomly seeding the hash*. *+1*. This is easy. 
It can easily > be back ported to any Python version. > > It is perfectly okay to break existing users who had anything depending on > ordering of internal hash tables. Their code was already broken. We *will*provide a flag and/or environment variable that can be set to turn the > feature off at their own peril which they can use in their test harnesses > that are stupid enough to use doctests with order dependencies. > What an implementation looks like: http://pastebin.com/9ydETTag some stuff to be filled in, but this is all that is really required. add logic to allow a particular seed to be specified or forced to 0 from the command line or environment. add the logic to grab random bytes. add the autoconf glue to disable it. done. -gps > This approach worked fine for Perl 9 years ago. > https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 > > -gps > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Sat Jan 14 04:19:38 2012 From: barry at python.org (Barry Warsaw) Date: Sat, 14 Jan 2012 04:19:38 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: <20120114041938.098fd14b@rivendell> On Jan 13, 2012, at 05:38 PM, Guido van Rossum wrote: >On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou wrote: > >> Breaking due to variable hashing is deterministic: you notice it as >> soon as you upgrade (and then you use PYTHONHASHSEED to disable >> variable hashing). That seems better than unpredictable breaking when >> some legitimate collision chain happens. > > >Fair enough. But I'm now uncomfortable with turning this on for bugfix >releases. I'm fine with making this the default in 3.3, just not in 3.2, >3.1 or 2.x -- it will break too much code and organizations will have to >roll back the release or do extensive testing before installing a bugfix >release -- exactly what we *don't* want for those. 
+1 -Barry From merwok at netwok.org Sat Jan 14 04:24:52 2012 From: merwok at netwok.org (=?UTF-8?Q?=C3=89ric_Araujo?=) Date: Sat, 14 Jan 2012 04:24:52 +0100 Subject: [Python-Dev] Sphinx version for Python 2.x docs In-Reply-To: References: "\"<4E4AF610.5040303@simplistix.co.uk>" " Message-ID: Hi Sandro, Thanks for getting the ball rolling on this. One style for markup, one Sphinx version to code our extensions against and one location for the documenting guidelines will make our work a bit easier. > During the build process, there are some warnings that I can > understand: I assume you mean "can't", as you later ask how to fix them. As a general rule, they're only warnings, so they don't break the build, only some links or stylings, so I think it's okay to ignore them *right now*. > Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal That's a mistake I did in cefe4f38fa0e. This sentence should be removed. > Doc/library/stdtypes.rst:2372: WARNING: more than one target found > for > cross-reference u'next': Need to use :meth:`.next` to let Sphinx find the right target (more info on request :) > Doc/library/sys.rst:651: WARNING: unknown keyword: None Should use ``None``. > Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in > Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not I don't know if these should work (i.e. create a link to the appropriate language reference section) or abuse the markup (there are "not" and "in" keywords, but no "not in" keyword -- use ``not in``). I'd say ignore them.
Cheers From martin at v.loewis.de Sat Jan 14 04:45:57 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 14 Jan 2012 04:45:57 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA@webmail.df.eu> > What an implementation looks like: > > http://pastebin.com/9ydETTag > > some stuff to be filled in, but this is all that is really required. I think this statement (and the patch) is wrong. You also need to change the byte string hashing, at least for 2.x. This I consider the biggest flaw in that approach - other people may have written string-like objects which continue to compare equal to a string but now hash different. Regards, Martin From guido at python.org Sat Jan 14 05:00:54 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 13 Jan 2012 20:00:54 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith wrote: > It is perfectly okay to break existing users who had anything depending on > ordering of internal hash tables. Their code was already broken. We *will*provide a flag and/or environment variable that can be set to turn the > feature off at their own peril which they can use in their test harnesses > that are stupid enough to use doctests with order dependencies. No, that is not how we usually take compatibility between bugfix releases. "Your code is already broken" is not an argument to break forcefully what worked (even if by happenstance) before. The difference between CPython and Jython (or between different CPython feature releases) also isn't relevant -- historically we have often bent over backwards to avoid changing behavior that was technically undefined, if we believed it would affect a significant fraction of users. 
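The upgrade-time behaviour change being debated here can be observed directly on the Python versions that eventually shipped randomized hashing (on by default from 3.3, controllable via PYTHONHASHSEED); a sketch that spawns fresh interpreters, since the seed is fixed per process:

```python
import os
import subprocess
import sys

def spawn_hash(seed):
    # Run a fresh interpreter with the given PYTHONHASHSEED and report
    # what that process computes for hash('spam ham').
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('spam ham'))"], env=env)
    return int(out)

# A fixed seed makes string hashes reproducible across processes...
assert spawn_hash("0") == spawn_hash("0")

# ...while "random" (the default since 3.3) varies from run to run.
print(spawn_hash("random"), spawn_hash("random"))
```

This is the knob Antoine refers to: test suites that accidentally depend on dict/set ordering can pin PYTHONHASHSEED while they are being fixed.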
I don't think anyone doubts that this will break lots of code (at least, the arguments I've heard have been "their code is broken", not "nobody does that"). This approach worked fine for Perl 9 years ago. > https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 > I don't know what the Perl attitude about breaking undefined behavior between micro versions was at the time. But ours is pretty clear -- don't do it. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jan 14 06:16:32 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 14 Jan 2012 15:16:32 +1000 Subject: [Python-Dev] [Python-checkins] cpython: add test, which was missing from d64ac9ab4cd0 In-Reply-To: References: Message-ID: On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson wrote: > http://hg.python.org/cpython/rev/be85914b611c > changeset: 74363:be85914b611c > parent: 74361:609482c6710e > user: Benjamin Peterson > date: Fri Jan 13 14:39:38 2012 -0500 > summary: > add test, which was missing from d64ac9ab4cd0 Ah, that's where that came from, thanks. I still haven't fully trained myself to use hg import instead of patch, which would avoid precisely this kind of error :P Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Sat Jan 14 06:43:04 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 14 Jan 2012 00:43:04 -0500 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On 1/13/2012 8:58 PM, Gregory P. Smith wrote: > It is perfectly okay to break existing users who had anything depending > on ordering of internal hash tables. Their code was already broken. Given that the doc says "Return the hash value of the object", I do not think we should be so hard-nosed.
The above clearly implies that there is such a thing as *the* Python hash value for an object. And indeed, that has been true across many versions. If we had written "Return a hash value for the object, which can vary from run to run", the case would be different. -- Terry Jan Reedy From jackdied at gmail.com Sat Jan 14 07:24:54 2012 From: jackdied at gmail.com (Jack Diederich) Date: Sat, 14 Jan 2012 01:24:54 -0500 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum wrote: > Hm... I started out as a big fan of the randomized hash, but thinking more > about it, I actually believe that the chances of some legitimate app having >>1000 collisions are way smaller than the chances that somebody's code will > break due to the variable hashing. Python's dicts are designed to avoid hash conflicts by resizing and keeping the available slots bountiful. 1000 conflicts sounds like a number that couldn't be hit accidentally unless you had a single dict using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're good). The hashes also look to exploit cache locality but that is very unlikely to get one thousand conflicts by chance. If you get that many there is an attack. > This is depending on how the counting is done (I didn't look at MAL's > patch), and assuming that increasing the hash table size will generally > reduce collisions if items collide but their hashes are different. The patch counts conflicts on an individual insert and not lifetime conflicts. Looks sane to me. > That said, even with collision counting I'd like a way to disable it without > changing the code, e.g. a flag or environment variable. Agreed. Paranoid people can turn the behavior off and if it ever were to become a problem in practice we could point people to a solution. 
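The "conflicts on an individual insert" behaviour Jack describes can be modelled with a toy open-addressing table. This is an illustrative sketch, not MAL's actual C patch: the class name, the exception, the linear probing and the default limit are all assumptions for demonstration.

```python
COLLISION_LIMIT = 1000  # the limit discussed in the thread

class TooManyCollisions(RuntimeError):
    pass

class CountingTable:
    """Open-addressing table that counts collisions per insert."""

    def __init__(self, size=8, limit=COLLISION_LIMIT):
        self.slots = [None] * size  # size must be a power of two
        self.limit = limit

    def insert(self, key, value):
        mask = len(self.slots) - 1
        i = hash(key) & mask
        collisions = 0              # reset on every insert -- not a lifetime count
        while self.slots[i] is not None and self.slots[i][0] != key:
            collisions += 1
            if collisions > self.limit:
                raise TooManyCollisions(
                    "more than %d collisions inserting %r" % (self.limit, key))
            i = (i + 1) & mask      # simple linear probe for illustration
        self.slots[i] = (key, value)
```

Because the counter resets on each insert, only a single pathological probe sequence trips the limit, never accumulated history; raising the limit models the "turn it off" switch discussed above.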
-Jack From ncoghlan at gmail.com Sat Jan 14 07:53:39 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 14 Jan 2012 16:53:39 +1000 Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682) In-Reply-To: References: Message-ID: On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl wrote: > On 01/13/2012 12:43 PM, nick.coghlan wrote: >> diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst > > There should probably be a "versionadded" somewhere on this page. Good catch, I added versionchanged notes to this page, simple_stmts and the StopIteration entry in the library reference. >> PEP 3155: Qualified name for classes and functions >> ================================================== > > This looks like a spurious (and syntax-breaking) change. Yeah, it was an error I introduced last time I merged from default. Fixed. >> diff --git a/Grammar/Grammar b/Grammar/Grammar >> -argument: test [comp_for] | test '=' test  # Really [keyword '='] test >> +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test > > This looks like a change without effect? Fixed. It was a lingering after-effect of Greg's original patch (which also modified the function call syntax to allow "yield from" expressions with extra parens). I reverted the change to the function call syntax, but forgot to ditch the added parens while doing so. >> diff --git a/Include/genobject.h b/Include/genobject.h >> >> -    /* List of weak reference. */ >> -    PyObject *gi_weakreflist; >> +        /* List of weak reference. */ >> +        PyObject *gi_weakreflist; >> } PyGenObject; > > While these change tabs into spaces, it should be 4 spaces, not 8. Fixed. >> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **); > > Does this API need to be public? If yes, it needs to be documented.
Hmm, good point - that one needs a bit of thought, so I've put it on the tracker: http://bugs.python.org/issue13783 (that issue also covers your comments regarding the docstring for this function and whether or not we even need the StopIteration instance creation API) >> -#define CALL_FUNCTION        131     /* #args + (#kwargs<<8) */ >> -#define MAKE_FUNCTION        132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */ >> -#define BUILD_SLICE  133     /* Number of items */ >> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */ >> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */ >> +#define BUILD_SLICE     133     /* Number of items */ > > Not sure putting these and all the other cosmetic changes into an already > big patch is such a good idea... I agree, but it's one of the challenges of a long-lived branch like the PEP 380 one (I believe some of these cosmetic changes started life in Greg's original patch and separating them out would have been quite a pain). Anyone that wants to see the gory details of the branch history can take a look at my bitbucket repo: https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29 >> diff --git a/Objects/abstract.c b/Objects/abstract.c >> --- a/Objects/abstract.c >> +++ b/Objects/abstract.c >> @@ -2267,7 +2267,6 @@ >> >>     func = PyObject_GetAttrString(o, name); >>     if (func == NULL) { >> -        PyErr_SetString(PyExc_AttributeError, name); >>         return 0; >>     } >> >> @@ -2311,7 +2310,6 @@ >> >>     func = PyObject_GetAttrString(o, name); >>     if (func == NULL) { >> -        PyErr_SetString(PyExc_AttributeError, name); >>         return 0; >>     } >>     va_start(va, format); > > These two changes also look suspiciously unrelated?
IIRC, I removed those lines while working on the patch because the message they produce (just the attribute name) is worse than the one produced by the call to PyObject_GetAttrString (which also includes the type of the object being accessed). Leaving the original exceptions alone helped me track down some failures I was getting at the time. I've now made the various CallMethod helper APIs in abstract.c (1 public, 3 private) consistently leave the GetAttr exception alone and added an explicit C API note to NEWS. (Vaguely related tangent: the new code added by the patch probably has a few parts that could benefit from the new GetAttrId private API) >> diff --git a/Objects/genobject.c b/Objects/genobject.c >> +        } else { >> +            PyObject *e = PyStopIteration_Create(result); >> +            if (e != NULL) { >> +                PyErr_SetObject(PyExc_StopIteration, e); >> +                Py_DECREF(e); >> +            } > > Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here > anyway? I think you're right - so noted in the tracker issue about the C API additions. Thanks for the thorough review, a fresh set of eyes is very helpful :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 14 08:01:48 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 14 Jan 2012 17:01:48 +1000 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich wrote: >> This is depending on how the counting is done (I didn't look at MAL's >> patch), and assuming that increasing the hash table size will generally >> reduce collisions if items collide but their hashes are different. > > The patch counts conflicts on an individual insert and not lifetime > conflicts. Looks sane to me. Having a hard limit on the worst-case behaviour certainly sounds like an attractive prospect.
And there's nothing to worry about in terms of secrecy or sufficient randomness - by default, attackers cannot generate more than 1000 hash collisions in one lookup, period. >> That said, even with collision counting I'd like a way to disable it without >> changing the code, e.g. a flag or environment variable. > > Agreed. Paranoid people can turn the behavior off and if it ever were > to become a problem in practice we could point people to a solution. Does MAL's patch allow the limit to be set on a per-dict basis (including setting it to None to disable collision limiting completely)? If people have data sets that need to tolerate that kind of collision level (and haven't already decided to move to a data structure other than the builtin dict), then it may make sense to allow them to remove the limit when using trusted input. For maintenance versions though, it would definitely need to be possible to switch it off without touching the code. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.brandl at gmx.net Sat Jan 14 08:53:59 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 14 Jan 2012 08:53:59 +0100 Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682) In-Reply-To: References: Message-ID: On 01/14/2012 07:53 AM, Nick Coghlan wrote: >>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **); >> >> Does this API need to be public? If yes, it needs to be documented. > > Hmm, good point - that one needs a bit of thought, so I've put it on > the tracker: http://bugs.python.org/issue13783 > > (that issue also covers your comments regarding the docstring for this > function and whether or not we even need the StopIteration instance > creation API) Great.
>>> -#define CALL_FUNCTION 131 /* #args + (#kwargs<<8) */ >>> -#define MAKE_FUNCTION 132 /* #defaults + #kwdefaults<<8 + #annotations<<16 */ >>> -#define BUILD_SLICE 133 /* Number of items */ >>> +#define CALL_FUNCTION 131 /* #args + (#kwargs<<8) */ >>> +#define MAKE_FUNCTION 132 /* #defaults + #kwdefaults<<8 + #annotations<<16 */ >>> +#define BUILD_SLICE 133 /* Number of items */ >> >> Not sure putting these and all the other cosmetic changes into an already >> big patch is such a good idea... > > I agree, but it's one of the challenges of a long-lived branch like > the PEP 380 one (I believe some of these cosmetic changes started life > in Greg's original patch and separating them out would have been quite > a pain). Anyone that wants to see the gory details of the branch > history can take a look at my bitbucket repo: > > https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29 I see. I hadn't followed the development of PEP 380 closely before. In any case, it is probably a good idea to mention this branch URL in the commit message in case it is meant to be kept permanently (it would also be possible to put only that branch of your sandbox into another clone at hg.python.org). >>> diff --git a/Objects/abstract.c b/Objects/abstract.c >>> --- a/Objects/abstract.c >>> +++ b/Objects/abstract.c >>> @@ -2267,7 +2267,6 @@ >>> >>> func = PyObject_GetAttrString(o, name); >>> if (func == NULL) { >>> - PyErr_SetString(PyExc_AttributeError, name); >>> return 0; >>> } >>> >>> @@ -2311,7 +2310,6 @@ >>> >>> func = PyObject_GetAttrString(o, name); >>> if (func == NULL) { >>> - PyErr_SetString(PyExc_AttributeError, name); >>> return 0; >>> } >>> va_start(va, format); >> >> These two changes also look suspiciously unrelated? 
> > IIRC, I removed those lines while working on the patch because the > message they produce (just the attribute name) is worse than the one > produced by the call to PyObject_GetAttrString (which also includes > the type of the object being accessed). Leaving the original > exceptions alone helped me track down some failures I was getting at > the time. I agree that it's useful. > I've now made the various CallMethod helper APIs in abstract.c (1 > public, 3 private) consistently leave the GetAttr exception alone and > added an explicit C API note to NEWS. > > (Vaguely related tangent: the new code added by the patch probably has > a few parts that could benefit from the new GetAttrId private API) Maybe another candidate for an issue, so that we don't forget? cheers, Georg From chris at simplistix.co.uk Fri Jan 13 21:11:36 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 13 Jan 2012 20:11:36 +0000 Subject: [Python-Dev] PEP 380 ("yield from") is now Final In-Reply-To: References: Message-ID: <4F108FF8.3010800@simplistix.co.uk> Finally, a reason to use Python 3 ;-) Chris On 13/01/2012 16:00, Guido van Rossum wrote: > AWESOME!!! > > On Fri, Jan 13, 2012 at 4:14 AM, Nick Coghlan > wrote: > > I marked PEP 380 as Final this evening, after pushing the tested and > documented implementation to hg.python.org : > http://hg.python.org/cpython/rev/d64ac9ab4cd0 > > As the list of names in the NEWS and What's New entries suggests, it > was quite a collaborative effort to get this one over the line, and > that's without even listing all the people that offered helpful > suggestions and comments along the way :) > > print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))()))) > > -- > --Guido van Rossum (python.org/~guido ) > > ______________________________________________________________________ > This email has been scanned by the Symantec Email Security.cloud service. 
> For more information please visit http://www.symanteccloud.com > ______________________________________________________________________ > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/chris%40simplistix.co.uk -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From stephen at xemacs.org Sat Jan 14 09:05:24 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 14 Jan 2012 17:05:24 +0900 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <87aa5q7lq3.fsf@uwakimon.sk.tsukuba.ac.jp> Jack Diederich writes: > On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum wrote: > > Hm... I started out as a big fan of the randomized hash, but thinking more > > about it, I actually believe that the chances of some legitimate app having > >>1000 collisions are way smaller than the chances that somebody's code will > > break due to the variable hashing. > > Python's dicts are designed to avoid hash conflicts by resizing and > keeping the available slots bountiful. 1000 conflicts sounds like a > number that couldn't be hit accidentally I may be missing something, but AIUI, with the resize, the search for an unused slot after collision will be looking in a different series of slots, so the N counter for the N^2 behavior resets on resize. If not, you can delete this message now. If so, since (a) in the error-on-many-collisions approach we're adding a test here for collision count anyway and (b) we think this is almost never gonna happen, can't we defuse the exploit by just resizing the dict after 1000 collisions, with strictly better performance than the error approach, and almost current performance for "normal" input? 
In order to prevent attackers from exploiting every 1000th collision to force out-of-memory, the expansion factor for collision-induced resizing could be "very small". (I don't know if that's possible in the Python dict implementation, if the algorithm requires something like doubling the dict size on every resize this is right out, of course.) Or, since this is an error/rare path anyway, offer the user a choice of an error or a resize on hitting 1000 collisions? From solipsis at pitrou.net Sat Jan 14 09:33:02 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 14 Jan 2012 09:33:02 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability References: <20120114021708.2fbe990f@pitrou.net> <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA@webmail.df.eu> Message-ID: <20120114093302.13fbd473@pitrou.net> On Sat, 14 Jan 2012 04:45:57 +0100 martin at v.loewis.de wrote: > > What an implementation looks like: > > > > http://pastebin.com/9ydETTag > > > > some stuff to be filled in, but this is all that is really required. > > I think this statement (and the patch) is wrong. You also need to change > the byte string hashing, at least for 2.x. This I consider the biggest > flaw in that approach - other people may have written string-like objects > which continue to compare equal to a string but now hash different. They're unlikely to have rewritten the hash algorithm by hand - especially given the caveats wrt. differences between Python integers and C integers. Rather, they would have returned the hash() of the equivalent str or unicode object. Regards Antoine. 
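The well-behaved pattern Antoine describes -- delegating to hash() rather than copying the algorithm -- is what keeps a string-like class safe under any seeding scheme. A minimal sketch (the class here is hypothetical, invented for the example):

```python
class Tag:
    """String-like wrapper that stays hash-compatible with str."""

    def __init__(self, text):
        self._text = text

    def __eq__(self, other):
        if isinstance(other, Tag):
            other = other._text
        return self._text == other

    def __hash__(self):
        # Delegate instead of re-implementing the string hash, so the
        # invariant a == b  =>  hash(a) == hash(b) keeps holding even if
        # the interpreter's hash algorithm (or its seed) changes.
        return hash(self._text)
```

A class that instead hand-copied the old multiply-and-XOR loop would silently stop being usable as a dict key alongside plain strings the moment the seed changed.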
From solipsis at pitrou.net Sat Jan 14 09:33:28 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 14 Jan 2012 09:33:28 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability References: <20120114021708.2fbe990f@pitrou.net> <4F10EE9A.4060703@pearwood.info> Message-ID: <20120114093328.2faba43c@pitrou.net> On Sat, 14 Jan 2012 13:55:22 +1100 Steven D'Aprano wrote: > On 14/01/12 12:58, Gregory P. Smith wrote: > > > I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be > > back ported to any Python version. > > > > It is perfectly okay to break existing users who had anything depending on > > ordering of internal hash tables. Their code was already broken. > > For the record: > > steve at runes:~$ python -c "print(hash('spam ham'))" > -376510515 > steve at runes:~$ jython -c "print(hash('spam ham'))" > 2054637885 Not to mention: $ ./python -c "print(hash('spam ham'))" -6071355389066156083 (64-bit CPython) Regards Antoine. From martin at v.loewis.de Sat Jan 14 13:09:40 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 14 Jan 2012 13:09:40 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <20120114093302.13fbd473@pitrou.net> References: <20120114021708.2fbe990f@pitrou.net> <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA@webmail.df.eu> <20120114093302.13fbd473@pitrou.net> Message-ID: <20120114130940.Horde.zymPDKGZi1VPEXCErcdB3uA@webmail.df.eu> >> I think this statement (and the patch) is wrong. You also need to change >> the byte string hashing, at least for 2.x. This I consider the biggest >> flaw in that approach - other people may have written string-like objects >> which continue to compare equal to a string but now hash different. > > They're unlikely to have rewritten the hash algorithm by hand - > especially given the caveats wrt. differences between Python integers > and C integers. 
See the CHAR_HASH macro in http://hg.python.org/cpython/file/e78f00dbd7ae/Modules/expat/xmlparse.c It's not *that* unlikely that more copies of that algorithm exist. Regards, Martin From ncoghlan at gmail.com Sat Jan 14 15:04:55 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Jan 2012 00:04:55 +1000 Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes #11682) Message-ID: On Jan 14, 2012 5:56 PM, "Georg Brandl" wrote: > > On 01/14/2012 07:53 AM, Nick Coghlan wrote: > > I agree, but it's one of the challenges of a long-lived branch like > > the PEP 380 one (I believe some of these cosmetic changes started life > > in Greg's original patch and separating them out would have been quite > > a pain). Anyone that wants to see the gory details of the branch > > history can take a look at my bitbucket repo: > > > > https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29 > > I see. I hadn't followed the development of PEP 380 closely before. > > In any case, it is probably a good idea to mention this branch URL in the > commit message in case it is meant to be kept permanently (it would also be > possible to put only that branch of your sandbox into another clone at > hg.python.org). You're right we should have a PSF-controlled copy of the entire branch history in cases like this. I actually still keep an irregularly updated clone of my entire sandbox repo on hg.python.org (that's actually where it started), so I'll refresh that and add a link to the pep380 branch history into the tracker item that covered the PEP 380 integration into 3.3. > > (Vaguely related tangent: the new code added by the patch probably has > > a few parts that could benefit from the new GetAttrId private API) > > Maybe another candidate for an issue, so that we don't forget? I just added a note about it to the C API cleanup tracker item. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sandro.tosi at gmail.com Sat Jan 14 15:31:31 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Sat, 14 Jan 2012 15:31:31 +0100 Subject: [Python-Dev] Sphinx version for Python 2.x docs In-Reply-To: References: <4E4AF610.5040303@simplistix.co.uk> Message-ID: On Sat, Jan 14, 2012 at 04:24, Éric Araujo wrote: > Hi Sandro, > > Thanks for getting the ball rolling on this. One style for markup, one > Sphinx version to code our extensions against and one location for the > documenting guidelines will make our work a bit easier. thanks :) I'm happy to help! >> During the build process, there are some warnings that I can understand: > > I assume you mean "can't", as you later ask how to fix them. As a yes, indeed > general rule, they're only warnings, so they don't break the build, only > some links or stylings, so I think it's okay to ignore them *right now*. but I like to get them fixed nonetheless: after all, the current build doesn't show warnings - but I agree it's a non-blocking issue. >> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal > > That's a mistake I did in cefe4f38fa0e. This sentence should be removed. Do you mean revert this whole hunk: @@ -480,10 +516,11 @@ nested scope The ability to refer to a variable in an enclosing definition. For instance, a function defined inside another function can refer to - variables in the outer function. Note that nested scopes work only for - reference and not for assignment which will always write to the innermost - scope. In contrast, local variables both read and write in the innermost - scope. Likewise, global variables read and write to the global namespace. + variables in the outer function. Note that nested scopes by default work + only for reference and not for assignment. Local variables both read and + write in the innermost scope. Likewise, global variables read and write + to the global namespace. The :keyword:`nonlocal` allows writing to outer + scopes.
new-style class Any class which inherits from :class:`object`. This includes all built-in or just "The :keyword:`nonlocal` allows writing to outer scopes."? >> Doc/library/stdtypes.rst:2372: WARNING: more than one target found for >> cross-reference u'next': > > Need to use :meth:`.next` to let Sphinx find the right target (more info > on request :) it seems what it needed was :meth:`next` (without the dot). The current page links all 'next' in file.next() to functions.html#next, and using :meth:`next` does that. >> Doc/library/sys.rst:651: WARNING: unknown keyword: None > > Should use ``None``. fixed >> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in >> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not > > I don't know if these should work (i.e. create a link to the appropriate > language reference section) or abuse the markup (there are 'not' and > 'in' keywords, but no 'not in' keyword -- use ``not in``). I'd say ignore > them. ACK, but I'm willing to fix them if someone tells me how to :) I'm going to prepare the patches and then push - I'll send a heads-up afterward. Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From martin at v.loewis.de Sat Jan 14 16:12:19 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sat, 14 Jan 2012 16:12:19 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <4F119B53.2050602@v.loewis.de> Am 13.01.2012 18:08, schrieb Mark Dickinson: > On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum wrote: >> How >> pathological the data needs to be before the collision counter triggers? I'd >> expect *very* pathological. > > How pathological do you consider the set > > {1 << n for n in range(2000)} > > to be?
I think this is not a counter-example for the proposed algorithm (at least not in the way I think it should be implemented). Those values may collide on the slot in the set, but they don't collide on the actual hash value. So in order to determine whether the collision limit is exceeded, we shouldn't count colliding slots, but colliding hash values (which we will all encounter during an insert). > though admittedly only around 30 collisions per hash value. I do consider the case of hashing integers with only one bit set pathological. However, this can be overcome by factoring the magnitude of the number into the hash as well. Regards, Martin From martin at v.loewis.de Sat Jan 14 16:17:59 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sat, 14 Jan 2012 16:17:59 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <4F119CA7.1070903@v.loewis.de> Am 14.01.2012 01:37, schrieb Benjamin Peterson: > 2012/1/13 Guido van Rossum : >> Really? Even though you came up with it specifically to prove me wrong? > > Coming up with a counterexample now invalidates it? There are two concerns here: - is it possible to come up with an example of constructed values that show many collisions in a way that poses a threat? To this, the answer is apparently "yes", and the proposed reaction is to hard-limit the number of collisions accepted by the implementation. - then, *assuming* such a limitation is in place: is it possible to come up with a realistic application that would break under this limitation. Mark's example is no such realistic application, instead, it is yet another example demonstrating collisions using constructed values (although the specific example would continue to work fine even under the limitation). A valid counterexample would have to come from a real application, or at least from a scenario that is plausible for a real application.
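Mark's set is easy to examine directly. A quick sketch follows; the figures in the comments assume a 64-bit build, where int hashing reduces modulo the Mersenne prime 2**61 - 1, so hash(2**n) == 2**(n % 61):

```python
from collections import Counter

# Mark Dickinson's example: 2000 distinct integers, each a single set bit.
values = [1 << n for n in range(2000)]
hash_counts = Counter(hash(v) for v in values)

# On a 64-bit build these 2000 keys share only 61 distinct hash values,
# each carried by roughly 33 of them -- matching the "around 30 collisions
# per hash value" figure quoted above.
print(len(hash_counts), max(hash_counts.values()))
```

Whether a collision counter sees this set as pathological therefore depends directly on whether it counts equal hash values or merely occupied slots.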
Regards, Martin From sandro.tosi at gmail.com Sat Jan 14 17:14:05 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Sat, 14 Jan 2012 17:14:05 +0100 Subject: [Python-Dev] 2.7 now uses Sphinx 1.0 Message-ID: Hello, just a heads-up: documentation for the 2.7 branch has been ported to use Sphinx 1.0, so now the same syntax can be used for 2.x and 3.x patches, hopefully easing work on both Python stacks. Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From sandro.tosi at gmail.com Sat Jan 14 19:09:10 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Sat, 14 Jan 2012 19:09:10 +0100 Subject: [Python-Dev] "Documenting Python" is moving to devguide Message-ID: Hi all, (another) heads-up about my current work: I've just pushed the "Documenting Python" doc section (ftr: http://docs.python.org/documenting/index.html) to the devguide. That was possible now that we use the same Sphinx version on all the active branches. It was not a re-editing of the content, which might still be outdated and in need of work, but just a brutal cut & paste of the current files. Now that we have a central place, additional editing will be much easier. The section is still available in the cpython repo, and I'm waiting to remove it because it's better to have some redirections in place from the current URLs to the new ones. I've prepared a small set of RewriteRules (attached): I don't know the actual setup of Apache for docs.p.o but at least they are a start :) whoever has root access, could you please review & apply those rules? Once the rewrites are in place, I'll take care of removing the Doc/documenting dir from the active branches.
Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi -------------- next part -------------- RewriteEngine On RewriteRule /documenting/$ /devguide/documenting.html [NE,R=permanent,L] RewriteRule /documenting/index.html /devguide/documenting.html [NE,R=permanent,L] RewriteRule /documenting/intro.html /devguide/documenting.html#introduction [NE,R=permanent,L] RewriteRule /documenting/style.html /devguide/documenting.html#style-guide [NE,R=permanent,L] RewriteRule /documenting/rest.html /devguide/documenting.html#restructuredtext-primer [NE,R=permanent,L] RewriteRule /documenting/markup.html /devguide/documenting.html#additional-markup-constructs [NE,R=permanent,L] RewriteRule /documenting/fromlatex.html /devguide/documenting.html#differences-to-the-latex-markup [NE,R=permanent,L] RewriteRule /documenting/building.html /devguide/documenting.html#building-the-documentation [NE,R=permanent,L] From greg at krypto.org Sat Jan 14 20:17:01 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 14 Jan 2012 11:17:01 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA@webmail.df.eu> References: <20120114021708.2fbe990f@pitrou.net> <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA@webmail.df.eu> Message-ID: My patch example does change the bytes object hash as well as Unicode. On Jan 13, 2012 7:46 PM, wrote: > What an implementation looks like: >> >> http://pastebin.com/9ydETTag >> >> some stuff to be filled in, but this is all that is really required. >> > > I think this statement (and the patch) is wrong. You also need to change > the byte string hashing, at least for 2.x. This I consider the biggest > flaw in that approach - other people may have written string-like objects > which continue to compare equal to a string but now hash different. 
> > Regards, > Martin > > > ______________________________**_________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/**mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** > greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandro.tosi at gmail.com Sat Jan 14 22:34:52 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Sat, 14 Jan 2012 22:34:52 +0100 Subject: [Python-Dev] "Documenting Python" is moving to devguide In-Reply-To: References: Message-ID: Hi again, On Sat, Jan 14, 2012 at 19:09, Sandro Tosi wrote: > Hi all, > (another) heads-up about my current work: I've just pushed the > "Documenting Python" doc section (ftr: > http://docs.python.org/documenting/index.html) to devguide. That was > possibile now that we use the same sphinx version on all the active > branches. > > It was not a re-editing of the content, that might still be outdated > and in need of work, but just a brutal cut & paste of the current > files. Now that we have a central place, additional editing will be > much more easy. > > The section is still available in the cpython repo, and I'm waiting to > remove it because it's better to have some redirections in place from > the current urls to the new ones. I've prepared a small set of > RewriteRules (attached): I don't know the actual setup of apache for > docs.p.o but at least they are a start :) whomever has root access, > could please review & apply those rules? Thanks to Georg that applied the rewrites both for 2.7 and 3.2 . > Once the rewrites are in place, i'll take care of removing the > Doc/documenting dir from the active branches. and so Doc/documenting is gone on all the active branches. 
Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From greg at krypto.org Sun Jan 15 02:31:34 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 14 Jan 2012 17:31:34 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA@webmail.df.eu> Message-ID: FWIW the quick change i pastebin'ed is basically covered by the change already under review in http://bugs.python.org/review/13704/show. I've made my comments and suggestions there. I looked into Modules/expat/xmlparse.c and it has an odd copy of the old string hash algorithm entirely for its own internal use and its own internal hash table implementations. That module is likely vulnerable to creatively crafted documents for the same reason. With 13704 and the public API it provides to get the random hash seed, that module could simply be updated to use that in its own hash implementation. As for when to enable it or not, I unfortunately have to agree, despite my wild desires we can't turn on the hash randomization change by default in anything prior to 3.3. -gps On Sat, Jan 14, 2012 at 11:17 AM, Gregory P. Smith wrote: > My patch example does change the bytes object hash as well as Unicode. > On Jan 13, 2012 7:46 PM, wrote: > >> What an implementation looks like: >>> >>> http://pastebin.com/9ydETTag >>> >>> some stuff to be filled in, but this is all that is really required. >>> >> >> I think this statement (and the patch) is wrong. You also need to change >> the byte string hashing, at least for 2.x. This I consider the biggest >> flaw in that approach - other people may have written string-like objects >> which continue to compare equal to a string but now hash different. 
>> >> Regards, >> Martin >> >> >> ______________________________**_________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/**mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** >> greg%40krypto.org >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Jan 15 05:42:59 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 15 Jan 2012 15:42:59 +1100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <4F125953.5060309@pearwood.info> Victor Stinner wrote: > - Marc Andre Lemburg proposes to fix the vulnerability directly in > dict (for any key type). The patch raises an exception if a lookup > causes more than 1000 collisions. Am I missing something? How does this fix the vulnerability? It seems to me that the only thing this does is turn one sort of DOS attack into another sort of DOS attack: hostile users will just cause hash collisions until an exception is raised and the application falls over. Catching these exceptions, and recovering from them (how?), would be the responsibility of the application author. Given that developers are unlikely to ever see 1000 collisions by accident, or even realise that it could happen, I don't expect that many people will do that -- until they personally get bitten. -- Steven From steve at pearwood.info Sun Jan 15 05:49:50 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 15 Jan 2012 15:49:50 +1100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: <4F125AEE.3050702@pearwood.info> Guido van Rossum wrote: > On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith wrote: > >> It is perfectly okay to break existing users who had anything depending on >> ordering of internal hash tables. 
Their code was already broken. We *will* provide a flag and/or environment variable that can be set to turn the >> feature off at their own peril which they can use in their test harnesses >> that are stupid enough to use doctests with order dependencies. > > > No, that is not how we usually take compatibility between bugfix releases. > "Your code is already broken" is not an argument to break forcefully what > worked (even if by happenstance) before. The difference between CPython and > Jython (or between different CPython feature releases) also isn't relevant > -- historically we have often bent over backwards to avoid changing > behavior that was technically undefined, if we believed it would affect a > significant fraction of users. > > I don't think anyone doubts that this will break lots of code (at least, > the arguments I've heard have been "their code is broken", not "nobody does > that"). I don't know about "lots" of code, but it will break at least one library (or so I'm told): http://mail.python.org/pipermail/python-list/2012-January/1286535.html -- Steven From ncoghlan at gmail.com Sun Jan 15 06:11:44 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Jan 2012 15:11:44 +1000 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <4F125953.5060309@pearwood.info> References: <4F125953.5060309@pearwood.info> Message-ID: On Sun, Jan 15, 2012 at 2:42 PM, Steven D'Aprano wrote: > Victor Stinner wrote: > >> - Marc Andre Lemburg proposes to fix the vulnerability directly in >> dict (for any key type). The patch raises an exception if a lookup >> causes more than 1000 collisions. > > > > Am I missing something? How does this fix the vulnerability? It seems to me > that the only thing this does is turn one sort of DOS attack into another > sort of DOS attack: hostile users will just cause hash collisions until an > exception is raised and the application falls over.
> > Catching these exceptions, and recovering from them (how?), would be the > responsibility of the application author. Given that developers are unlikely > to ever see 1000 collisions by accident, or even realise that it could > happen, I don't expect that many people will do that -- until they > personally get bitten. As I understand it, the way the attack works is that a *single* malicious request from the attacker can DoS the server by eating CPU resources while evaluating a massive collision chain induced in a dict by attacker supplied data. Explicitly truncating the collision chain boots them out almost immediately (likely with a 500 response for an internal server error), so they no longer affect other events, threads and processes on the same machine. In some ways, the idea is analogous to the way we implement explicit recursion limiting in an attempt to avoid actually blowing the C stack - we take a hard-to-detect-and-hard-to-handle situation (i.e. blowing the C stack or malicious generation of long collision chains in a dict) and replace it with something that is easy to detect and can be handled by normal exception processing (i.e. a recursion depth exception or one reporting an excessive number of slot collisions in a dict lookup). 
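The collision-limiting idea sketches easily in pure Python. The toy table below is purely illustrative — the class names, the linear probing, and the limit of 4 are made up for brevity (MAL's proposed patch used a limit of 1000 inside CPython's C dict lookup):

```python
# Toy open-addressing hash table that refuses pathological lookup chains.
# MAX_COLLISIONS and the probing scheme are illustrative, not CPython's.
MAX_COLLISIONS = 4


class LimitedDict:
    """Dict-like table that raises once one lookup chain gets too long."""

    def __init__(self, size=64):          # size must be a power of two
        self._slots = [None] * size       # each slot holds (key, value)

    def _probe(self, key):
        mask = len(self._slots) - 1
        i = hash(key) & mask
        collisions = 0
        while self._slots[i] is not None and self._slots[i][0] != key:
            collisions += 1
            if collisions > MAX_COLLISIONS:
                # Easy to detect, easy to handle -- like a recursion limit.
                raise RuntimeError("too many hash collisions in one lookup")
            i = (i + 1) & mask            # linear probing, for simplicity
        return i

    def __setitem__(self, key, value):
        self._slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        entry = self._slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]


class Colliding:
    """Stand-in for attacker-crafted keys: every instance hashes alike."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.name == other.name
```

Five colliding keys fit under the limit here; a sixth makes the probe chain exceed it, so the insert fails fast with an ordinary exception instead of silently degrading the whole process.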
That then makes the default dict implementation safe from this kind of attack by default, and use cases that are getting that many collisions legitimately can be handled in one of two ways: - switch to a more appropriate data type (if you're getting that many collisions with benign data, a dict is probably the wrong container to be using) - offer a mechanism (command line switch or environment variable) to turn the collision limiting off Now, where you can still potentially run into problems is if a single shared dict is used to store both benign and malicious data - if the malicious data makes it into the destination dict before the exception finally gets triggered, and then benign data also happens to trigger the same collision chain, then yes, the entire app may fall over. However, such an app would have been crippled by the original DoS anyway, since its performance would have been gutted - the collision chain limiting just means it will trigger exceptions for the cases that would have been insanely slow. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eliben at gmail.com Sun Jan 15 07:33:16 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 15 Jan 2012 08:33:16 +0200 Subject: [Python-Dev] "Documenting Python" is moving to devguide In-Reply-To: References: Message-ID: > > The section is still available in the cpython repo, and I'm waiting to > > remove it because it's better to have some redirections in place from > > the current urls to the new ones. I've prepared a small set of > > RewriteRules (attached): I don't know the actual setup of apache for > > docs.p.o but at least they are a start :) whoever has root access, > > could you please review & apply those rules? > > Thanks to Georg, who applied the rewrites both for 2.7 and 3.2 . > > > Once the rewrites are in place, I'll take care of removing the > > Doc/documenting dir from the active branches.
> Good work Sandro, thanks! "Documenting Python" definitely belongs in the devguide Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From hs at ox.cx Sun Jan 15 13:15:05 2012 From: hs at ox.cx (Hynek Schlawack) Date: Sun, 15 Jan 2012 13:15:05 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <4F125AEE.3050702@pearwood.info> References: <20120114021708.2fbe990f@pitrou.net> <4F125AEE.3050702@pearwood.info> Message-ID: On Sunday, 15 January 2012 at 05:49, Steven D'Aprano wrote: > > I don't think anyone doubts that this will break lots of code (at least, > > the arguments I've heard have been "their code is broken", not "nobody does > > that"). > > I don't know about "lots" of code, but it will break at least one library (or > so I'm told): > > http://mail.python.org/pipermail/python-list/2012-January/1286535.html Sadly, suds is also Python's _only_ usable SOAP library at this moment. :( (on top of that, the development is in limbo ATM) From victor.stinner at haypocalc.com Sun Jan 15 15:27:55 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 15 Jan 2012 15:27:55 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> <4F125AEE.3050702@pearwood.info> Message-ID: I don't think that it would be hard to patch this library to use another hash function. It can implement its own hash function, use MD5, SHA1, or anything else. hash() is not stable across Python versions and 32/64 bit systems. Victor 2012/1/15 Hynek Schlawack : > On Sunday, 15 January 2012 at 05:49, Steven D'Aprano wrote: >> > I don't think anyone doubts that this will break lots of code (at least, >> > the arguments I've heard have been "their code is broken", not "nobody does >> > that").
>> >> I don't know about "lots" of code, but it will break at least one library (or >> so I'm told): >> >> http://mail.python.org/pipermail/python-list/2012-January/1286535.html > Sadly, suds is also Python's _only_ usable SOAP library at this moment. :( (on top of that, the development is in limbo ATM) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/victor.stinner%40haypocalc.com From stefan_ml at behnel.de Sun Jan 15 15:30:59 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Jan 2012 15:30:59 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: Terry Reedy, 14.01.2012 06:43: > On 1/13/2012 8:58 PM, Gregory P. Smith wrote: > >> It is perfectly okay to break existing users who had anything depending >> on ordering of internal hash tables. Their code was already broken. > > Given that the doc says "Return the hash value of the object", I do not > think we should be so hard-nosed. The above clearly implies that there is > such a thing as *the* Python hash value for an object. And indeed, that has > been true across many versions. If we had written "Return a hash value for > the object, which can vary from run to run", the case would be different. Just a side note, but I don't think hash() is the right place to document this. Hashing is a protocol in Python, just like indexing or iteration. Nothing keeps an object from changing its hash value due to modification, and that would even be valid in the face of the usual dict lookup invariants if changes are only applied while the object is not referenced by any dict. So the guarantees do not depend on the function hash() and may be even weaker than your above statement. 
Stefan From lukasz at langa.pl Sun Jan 15 15:17:39 2012 From: lukasz at langa.pl (Łukasz Langa) Date: Sun, 15 Jan 2012 15:17:39 +0100 Subject: [Python-Dev] Dinsdale is no more Message-ID: Gentlemen, www.python.org is down at the moment. -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o. From eliben at gmail.com Sun Jan 15 16:20:06 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 15 Jan 2012 17:20:06 +0200 Subject: [Python-Dev] Dinsdale is no more In-Reply-To: References: Message-ID: 2012/1/15 Łukasz Langa > Gentlemen, www.python.org is down at the moment. > > Well, it's back now: http://www.downforeveryoneorjustme.com/python.org Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 15 17:10:54 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 15 Jan 2012 08:10:54 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: > Terry Reedy, 14.01.2012 06:43: > > On 1/13/2012 8:58 PM, Gregory P. Smith wrote: > > > >> It is perfectly okay to break existing users who had anything depending > >> on ordering of internal hash tables. Their code was already broken. > > > > Given that the doc says "Return the hash value of the object", I do not > > think we should be so hard-nosed. The above clearly implies that there is > > such a thing as *the* Python hash value for an object. And indeed, that > has > > been true across many versions. If we had written "Return a hash value > for > > the object, which can vary from run to run", the case would be > different. > > > >> Just a side note, but I don't think hash() is the right place to > document > >> this. > > > > You mean we shouldn't document that the hash() of a string will vary per > > run?
> Hashing is a protocol in Python, just like indexing or iteration. > Nothing keeps an object from changing its hash value due to modification, > Eh? There's a huge body of cultural awareness that only immutable objects should define a hash, implying that the hash remains constant during the object's lifetime. > and that would even be valid in the face of the usual dict lookup > invariants if changes are only applied while the object is not referenced > by any dict. And how would you know it isn't? > So the guarantees do not depend on the function hash() and may > be even weaker than your above statement. > There are no actual guarantees for hash(), but lots of rules for well-behaved hashes. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Sun Jan 15 17:46:36 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Jan 2012 17:46:36 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: Guido van Rossum, 15.01.2012 17:10: > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: >> Terry Reedy, 14.01.2012 06:43: >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote: >>> >>>> It is perfectly okay to break existing users who had anything depending >>>> on ordering of internal hash tables. Their code was already broken. >>> >>> Given that the doc says "Return the hash value of the object", I do not >>> think we should be so hard-nosed. The above clearly implies that there is >>> such a thing as *the* Python hash value for an object. And indeed, that >> has >>> been true across many versions. If we had written "Return a hash value >> for >>> the object, which can vary from run to run", the case would be different. >> >> Just a side note, but I don't think hash() is the right place to document >> this. 
> > You mean we shouldn't document that the hash() of a string will vary per > run? No, I mean that the hash() builtin function is not the right place to document the behaviour of a string hash. That should go into the string object documentation. Although, arguably, it may be worth mentioning in the docs of hash() that, in general, hash values of builtin types are bound to the lifetime of the interpreter instance (or entire runtime?) and may change after restarts. I think that's a reasonable restriction to document that prominently, even if it will only apply to str for the time being. >> Hashing is a protocol in Python, just like indexing or iteration. >> Nothing keeps an object from changing its hash value due to modification, > > Eh? There's a huge body of cultural awareness that only immutable objects > should define a hash, implying that the hash remains constant during the > object's lifetime. > >> and that would even be valid in the face of the usual dict lookup >> invariants if changes are only applied while the object is not referenced >> by any dict. > > And how would you know it isn't? Well, if it's an object with a mutable hash then it's up to the application defining that object to make sure it's used in a sensible way. Immutability just makes your life easier. I can imagine that an object gets removed from a dict (say, a cache), modified and then reinserted, and I think it's valid to allow the modification to have an impact on the hash in this case, in order to accommodate for any changes to equality comparisons due to the modification. That being said, it seems that the Python docs actually consider constant hashes a requirement rather than a virtue. http://docs.python.org/glossary.html#term-hashable """ An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). 
Hashable objects which compare equal must have the same hash value. """ It also seems to me that the wording "has a hash value which never changes during its lifetime" makes it pretty clear that the lifetime of the hash value is not guaranteed to supersede the lifetime of the object (although that's a rather muddy definition - memory lifetime? or pickle-unpickle as well?). However, this entry in the glossary only seems to have appeared with Py2.6, likely as a result of the abc changes. So it won't help in defending a change to the hash function. >> So the guarantees do not depend on the function hash() and may >> be even weaker than your above statement. > > There are no actual guarantees for hash(), but lots of rules for > well-behaved hashes. Absolutely. Stefan From greg at krypto.org Sun Jan 15 18:02:35 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 15 Jan 2012 09:02:35 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel wrote: > > It also seems to me that the wording "has a hash value which never changes > during its lifetime" makes it pretty clear that the lifetime of the hash > value is not guaranteed to supersede the lifetime of the object (although > that's a rather muddy definition - memory lifetime? or pickle-unpickle as > well?). > Lifetime to me means of that specific instance of the object. I would not expect that to survive pickle-unpickle. > However, this entry in the glossary only seems to have appeared with Py2.6, > likely as a result of the abc changes. So it won't help in defending a > change to the hash function. > Ugh, I really hope there is no code out there depending on the hash function being the same across a pickle and unpickle boundary. 
Unfortunately the hash function was last changed in 1996 in http://hg.python.org/cpython/rev/839f72610ae1 so it is possible someone somewhere has written code blindly assuming that non-guarantee is true. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Jan 15 18:11:10 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 15 Jan 2012 18:11:10 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability References: <20120114021708.2fbe990f@pitrou.net> Message-ID: <20120115181110.4ff580ba@pitrou.net> On Sun, 15 Jan 2012 17:46:36 +0100 Stefan Behnel wrote: > Guido van Rossum, 15.01.2012 17:10: > > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: > >> Terry Reedy, 14.01.2012 06:43: > >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote: > >>> > >>>> It is perfectly okay to break existing users who had anything depending > >>>> on ordering of internal hash tables. Their code was already broken. > >>> > >>> Given that the doc says "Return the hash value of the object", I do not > >>> think we should be so hard-nosed. The above clearly implies that there is > >>> such a thing as *the* Python hash value for an object. And indeed, that > >> has > >>> been true across many versions. If we had written "Return a hash value > >> for > >>> the object, which can vary from run to run", the case would be different. > >> > >> Just a side note, but I don't think hash() is the right place to document > >> this. > > > > You mean we shouldn't document that the hash() of a string will vary per > > run? > > No, I mean that the hash() builtin function is not the right place to > document the behaviour of a string hash. That should go into the string > object documentation. No, but we can document that *any* hash() value can vary between runs without being specific about which builtin types randomize their hashes right now. Regards Antoine. 
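The per-run variation being discussed can be observed from the outside by pinning the seed in child interpreters. A sketch, assuming an interpreter that already has hash randomization and the PYTHONHASHSEED environment variable (i.e. Python 3.3+ once the fix lands) — before the fix, every run prints the same number:

```python
# Sketch: compare hash('collision') as computed by fresh interpreters
# started with different (pinned) hash seeds.
import os
import subprocess
import sys


def str_hash_with_seed(seed):
    """Return hash('collision') as computed by a fresh child interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('collision'))"], env=env
    )
    return int(out.decode())


a = str_hash_with_seed("1")
b = str_hash_with_seed("2")
assert a == str_hash_with_seed("1")  # same seed: reproducible across runs
assert a != b                        # different seed: the str hash moves
```

Pinning the seed (or setting it to 0 to disable randomization) is exactly the escape hatch mentioned above for test harnesses that depend on dict ordering.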
From guido at python.org Sun Jan 15 18:44:08 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 15 Jan 2012 09:44:08 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel wrote: > Guido van Rossum, 15.01.2012 17:10: > > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: > >> Terry Reedy, 14.01.2012 06:43: > >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote: > >>> > >>>> It is perfectly okay to break existing users who had anything > depending > >>>> on ordering of internal hash tables. Their code was already broken. > >>> > >>> Given that the doc says "Return the hash value of the object", I do not > >>> think we should be so hard-nosed. The above clearly implies that there > is > >>> such a thing as *the* Python hash value for an object. And indeed, that > >> has > >>> been true across many versions. If we had written "Return a hash value > >> for > >>> the object, which can vary from run to run", the case would be > different. > >> > >> Just a side note, but I don't think hash() is the right place to > document > >> this. > > > > You mean we shouldn't document that the hash() of a string will vary per > > run? > > No, I mean that the hash() builtin function is not the right place to > document the behaviour of a string hash. That should go into the string > object documentation. > > Although, arguably, it may be worth mentioning in the docs of hash() that, > in general, hash values of builtin types are bound to the lifetime of the > interpreter instance (or entire runtime?) and may change after restarts. I > think that's a reasonable restriction to document that prominently, even if > it will only apply to str for the time being. > Actually it will apply to a lot more than str, because the hash of (immutable) compound objects is often derived from the hash of the constituents, e.g. hash of a tuple. 
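Guido's point about compound objects is easy to demonstrate in-process: a tuple's hash is computed from its elements' hashes, which is how str-hash randomization reaches tuples, frozensets, and other immutable containers automatically. (FixedHash below is a throwaway class invented for this illustration, not anything from the stdlib.)

```python
# A tuple's hash is derived from the hashes of its elements.
class FixedHash:
    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h

    def __eq__(self, other):
        return isinstance(other, FixedHash) and self.h == other.h


# Equal element hashes give equal tuple hashes...
assert hash((FixedHash(42),)) == hash((FixedHash(42),))
# ...and changing an element's hash changes the containing tuple's hash.
assert hash((FixedHash(1),)) != hash((FixedHash(2),))
```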
> >> Hashing is a protocol in Python, just like indexing or iteration. > >> Nothing keeps an object from changing its hash value due to > modification, > > > > Eh? There's a huge body of cultural awareness that only immutable objects > > should define a hash, implying that the hash remains constant during the > > object's lifetime. > > > >> and that would even be valid in the face of the usual dict lookup > >> invariants if changes are only applied while the object is not > referenced > >> by any dict. > > > > And how would you know it isn't? > > Well, if it's an object with a mutable hash then it's up to the application > defining that object to make sure it's used in a sensible way. Immutability > just makes your life easier. I can imagine that an object gets removed from > a dict (say, a cache), modified and then reinserted, and I think it's valid > to allow the modification to have an impact on the hash in this case, in > order to accommodate for any changes to equality comparisons due to the > modification. > That could be considered valid only in a very abstract, theoretical, non-constructive way, since there is no protocol to detect removal from a dict (and you cannot assume an object is used in only one dict at a time). > That being said, it seems that the Python docs actually consider constant > hashes a requirement rather than a virtue. > > http://docs.python.org/glossary.html#term-hashable > > """ > An object is hashable if it has a hash value which never changes during its > lifetime (it needs a __hash__() method), and can be compared to other > objects (it needs an __eq__() or __cmp__() method). Hashable objects which > compare equal must have the same hash value. > """ > > It also seems to me that the wording "has a hash value which never changes > during its lifetime" makes it pretty clear that the lifetime of the hash > value is not guaranteed to supersede the lifetime of the object (although > that's a rather muddy definition - memory lifetime? 
or pickle-unpickle as > well?). > Across pickle-unpickle it's not considered the same object. Pickling at best preserves values. However, this entry in the glossary only seems to have appeared with Py2.6, > likely as a result of the abc changes. So it won't help in defending a > change to the hash function. > > > >> So the guarantees do not depend on the function hash() and may > >> be even weaker than your above statement. > > > > There are no actual guarantees for hash(), but lots of rules for > > well-behaved hashes. > > Absolutely. > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From modelnine at modelnine.org Sun Jan 15 19:40:49 2012 From: modelnine at modelnine.org (Heiko Wundram) Date: Sun, 15 Jan 2012 19:40:49 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> <4F125AEE.3050702@pearwood.info> Message-ID: <4F131DB1.8020704@modelnine.org> On 15.01.2012 15:27, Victor Stinner wrote: > I don't think that it would be hard to patch this library to use > another hash function. It can implement its own hash function, use > MD5, SHA1, or anything else. hash() is not stable across Python > versions and 32/64 bit systems. As I wrote in a reply further down: no, it isn't hard to change this behaviour (and I find the current caching system, which uses hash() on a URL to choose the cache index, braindead to begin with), but, as with all other considerations: the current version of the library, with the default options, depends on hash() to be stable for the cache to make any sense at all (and especially with "generic" schema such as the referenced xml.dtd, caching makes a lot of sense, and not being able to cache _breaks_ applications as it did mine). This is just something to bear in mind. -- --- Heiko.
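Victor's suggestion — key the cache on an explicit digest rather than on the process-dependent built-in hash() — can be sketched in a few lines. The `cache_path_for` helper and the file-name scheme below are invented for illustration; they are not suds' real API:

```python
# Sketch: derive the cache index from a stable digest of the URL instead
# of hash(url).  Unlike hash(), an MD5 hex digest is identical in every
# process, on 32- and 64-bit builds, and across Python versions.
import hashlib


def cache_path_for(url):
    """Hypothetical helper: stable cache file name for a given URL."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return "suds-%s.cache" % digest


p1 = cache_path_for("http://example.com/xml.dtd")
p2 = cache_path_for("http://example.com/xml.dtd")
assert p1 == p2  # stable: the cache survives restarts and upgrades
```

Since the digest is only a cache index here, not a security boundary, even a non-cryptographic but explicitly specified hash would do; the point is simply that the library, not the interpreter, controls the function.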
From ulrich.eckhardt at dominolaser.com Mon Jan 16 10:12:27 2012 From: ulrich.eckhardt at dominolaser.com (Ulrich Eckhardt) Date: Mon, 16 Jan 2012 10:12:27 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <4F088795.5000800@v.loewis.de> References: <4F088795.5000800@v.loewis.de> Message-ID: <4F13E9FB.4090000@dominolaser.com> Am 07.01.2012 18:57, schrieb "Martin v. L?wis": > I just tried porting Python as a Metro (Windows 8) App, and failed. > > Metro Apps use a variant of the Windows API called WinRT that still > allows to write native applications in C++, but restricts various APIs > to a subset of the full Win32 functionality. For example, everything > related to subprocess creation would not work; none of the > byte-oriented file API seems to be present, and a number of file > operation functions are absent as well (such as MoveFile). Just wondering, do Metro apps define UNDER_CE or _WIN32_WCE? The point is that the old ANSI functions (CreateFileA etc) have been removed from the embedded MS Windows CE long ago, too, and MS Windows Mobile used to be a custom CE variant or at least strongly related. In any case, it could help using the existing (incomplete) CE port as base for Metro. Uli ************************************************************************************** Domino Laser GmbH, Fangdieckstra?e 75a, 22547 Hamburg, Deutschland Gesch?ftsf?hrer: Thorsten F?cking, Amtsgericht Hamburg HR B62 932 ************************************************************************************** Visit our website at http://www.dominolaser.com ************************************************************************************** Diese E-Mail einschlie?lich s?mtlicher Anh?nge ist nur f?r den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empf?nger sein sollten. 
Die E-Mail ist in diesem Fall zu l?schen und darf weder gelesen, weitergeleitet, ver?ffentlicht oder anderweitig benutzt werden. E-Mails k?nnen durch Dritte gelesen werden und Viren sowie nichtautorisierte ?nderungen enthalten. Domino Laser GmbH ist f?r diese Folgen nicht verantwortlich. ************************************************************************************** From neo_python at 126.com Mon Jan 16 11:23:51 2012 From: neo_python at 126.com (python) Date: Mon, 16 Jan 2012 18:23:51 +0800 Subject: [Python-Dev] Python-Dev Digest, Vol 102, Issue 35 Message-ID: jbk python-dev-request at python.org??? >Send Python-Dev mailing list submissions to > python-dev at python.org > >To subscribe or unsubscribe via the World Wide Web, visit > http://mail.python.org/mailman/listinfo/python-dev >or, via email, send a message with subject or body 'help' to > python-dev-request at python.org > >You can reach the person managing the list at > python-dev-owner at python.org > >When replying, please edit your Subject line so it is more specific >than "Re: Contents of Python-Dev digest..." > > >Today's Topics: > > 1. Re: Status of the fix for the hash collision vulnerability > (Gregory P. Smith) > 2. Re: Status of the fix for the hash collision vulnerability > (Barry Warsaw) > 3. Re: Sphinx version for Python 2.x docs (?ric Araujo) > 4. Re: Status of the fix for the hash collision vulnerability > (martin at v.loewis.de) > 5. Re: Status of the fix for the hash collision vulnerability > (Guido van Rossum) > 6. Re: [Python-checkins] cpython: add test, which was missing > from d64ac9ab4cd0 (Nick Coghlan) > 7. Re: Status of the fix for the hash collision vulnerability > (Terry Reedy) > 8. Re: Status of the fix for the hash collision vulnerability > (Jack Diederich) > 9. Re: cpython: Implement PEP 380 - 'yield from' (closes #11682) > (Nick Coghlan) > 10. 
Re: Status of the fix for the hash collision vulnerability > (Nick Coghlan) > > >---------------------------------------------------------------------- > >Message: 1 >Date: Fri, 13 Jan 2012 19:06:00 -0800 >From: "Gregory P. Smith" >Cc: python-dev at python.org >Subject: Re: [Python-Dev] Status of the fix for the hash collision > vulnerability >Message-ID: > >Content-Type: text/plain; charset="iso-8859-1" > >On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith wrote: > >> >> On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum wrote: >> >>> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou wrote: >>> >>>> On Thu, 12 Jan 2012 18:57:42 -0800 >>>> Guido van Rossum wrote: >>>> > Hm... I started out as a big fan of the randomized hash, but thinking >>>> more >>>> > about it, I actually believe that the chances of some legitimate app >>>> having >>>> > >1000 collisions are way smaller than the chances that somebody's code >>>> will >>>> > break due to the variable hashing. >>>> >>>> Breaking due to variable hashing is deterministic: you notice it as >>>> soon as you upgrade (and then you use PYTHONHASHSEED to disable >>>> variable hashing). That seems better than unpredictable breaking when >>>> some legitimate collision chain happens. >>> >>> >>> Fair enough. But I'm now uncomfortable with turning this on for bugfix >>> releases. I'm fine with making this the default in 3.3, just not in 3.2, >>> 3.1 or 2.x -- it will break too much code and organizations will have to >>> roll back the release or do extensive testing before installing a bugfix >>> release -- exactly what we *don't* want for those. >>> >>> FWIW, I don't believe in the SafeDict solution -- you never know which >>> dicts you have to change. >>> >>> >> Agreed. >> >> Of the three options Victor listed only one is good. >> >> I don't like *SafeDict*. *-1*. 
It puts the onus on the coder to >> always get everything right with regards to data that came from outside the >> process never ending up hashed in a non-safe dict or set *anywhere*. >> "Safe" needs to be the default option for all hash tables. >> >> I don't like the "*too many hash collisions*" exception. *-1*. It >> provides non-deterministic application behavior for data driven >> applications with no way for them to predict when it'll happen or where and >> prepare for it. It may work in practice for many applications but is simply >> odd behavior. >> >> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily >> be back ported to any Python version. >> >> It is perfectly okay to break existing users who had anything depending on >> ordering of internal hash tables. Their code was already broken. We *will* provide a flag and/or environment variable that can be set to turn the >> feature off at their own peril which they can use in their test harnesses >> that are stupid enough to use doctests with order dependencies. >> > >What an implementation looks like: > > http://pastebin.com/9ydETTag > >some stuff to be filled in, but this is all that is really required. add >logic to allow a particular seed to be specified or forced to 0 from the >command line or environment. add the logic to grab random bytes. add the >autoconf glue to disable it. done. > >-gps > > >> This approach worked fine for Perl 9 years ago. >> https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 >> >> -gps >> >-------------- next part -------------- >An HTML attachment was scrubbed...
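(The pastebin link above has since expired. As a rough illustration of the idea — not Gregory's actual patch — here is a pure-Python sketch of CPython 2.x's string hash with a process-wide random seed folded in; the function and seed names are invented for this example:)

```python
import os
import struct

# Process-wide seed, chosen once at interpreter startup.  The value is
# inherited across fork() but regenerated on exec(), matching Guido's request.
_HASH_SEED = struct.unpack("<Q", os.urandom(8))[0]

def seeded_string_hash(s, seed=_HASH_SEED):
    """Variant of CPython 2.x's string hash, perturbed by a random seed.

    Equal strings still hash equal within one process; across processes
    (with different seeds) the values differ, so an attacker cannot
    precompute a set of colliding keys.
    """
    if not s:
        return 0
    # Fold the seed into the starting value; the rest is the classic
    # multiply-and-xor loop, truncated to 64 bits.
    x = (seed ^ (ord(s[0]) << 7)) & 0xFFFFFFFFFFFFFFFF
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & 0xFFFFFFFFFFFFFFFF
    return x ^ len(s)
```

Note that, as Martin points out later in the digest, a real patch must apply the same perturbation to byte strings as well as unicode, so that string-like objects that compare equal continue to hash equal.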
>URL: > >------------------------------ > >Message: 2 >Date: Sat, 14 Jan 2012 04:19:38 +0100 >From: Barry Warsaw >To: python-dev at python.org >Subject: Re: [Python-Dev] Status of the fix for the hash collision > vulnerability >Message-ID: <20120114041938.098fd14b at rivendell> >Content-Type: text/plain; charset=US-ASCII > >On Jan 13, 2012, at 05:38 PM, Guido van Rossum wrote: > >>On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou wrote: >> >>> Breaking due to variable hashing is deterministic: you notice it as >>> soon as you upgrade (and then you use PYTHONHASHSEED to disable >>> variable hashing). That seems better than unpredictable breaking when >>> some legitimate collision chain happens. >> >> >>Fair enough. But I'm now uncomfortable with turning this on for bugfix >>releases. I'm fine with making this the default in 3.3, just not in 3.2, >>3.1 or 2.x -- it will break too much code and organizations will have to >>roll back the release or do extensive testing before installing a bugfix >>release -- exactly what we *don't* want for those. > >+1 > >-Barry > > >------------------------------ > >Message: 3 >Date: Sat, 14 Jan 2012 04:24:52 +0100 >From: Éric Araujo >To: >Subject: Re: [Python-Dev] Sphinx version for Python 2.x docs >Message-ID: >Content-Type: text/plain; charset=UTF-8; format=flowed > >Hi Sandro, > >Thanks for getting the ball rolling on this. One style for markup, one >Sphinx version to code our extensions against and one location for the >documenting guidelines will make our work a bit easier. > >> During the build process, there are some warnings that I can >> understand: >I assume you mean “can’t”, as you later ask how to fix them. As a >general rule, they’re only warnings, so they don’t break the build, >only >some links or stylings, so I think it’s okay to ignore them *right >now*. > >> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal >That’s a mistake I did in cefe4f38fa0e. This sentence should be >removed.
> >> Doc/library/stdtypes.rst:2372: WARNING: more than one target found >> for >> cross-reference u'next': >Need to use :meth:`.next` to let Sphinx find the right target (more >info >on request :) > >> Doc/library/sys.rst:651: WARNING: unknown keyword: None >Should use ``None``. > >> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in >> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not >I don’t know if these should work (i.e. create a link to the >appropriate >language reference section) or abuse the markup (there are “not” and >“in” keywords, but no “not in” keyword — use ``not in``). I’d say >ignore >them. > >Cheers > > >------------------------------ > >Message: 4 >Date: Sat, 14 Jan 2012 04:45:57 +0100 >From: martin at v.loewis.de >To: python-dev at python.org >Subject: Re: [Python-Dev] Status of the fix for the hash collision > vulnerability >Message-ID: > <20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA at webmail.df.eu> >Content-Type: text/plain; charset=ISO-8859-1; format=flowed; DelSp=Yes > >> What an implementation looks like: >> >> http://pastebin.com/9ydETTag >> >> some stuff to be filled in, but this is all that is really required. > >I think this statement (and the patch) is wrong. You also need to change >the byte string hashing, at least for 2.x. This I consider the biggest >flaw in that approach - other people may have written string-like objects >which continue to compare equal to a string but now hash different. > >Regards, >Martin > > > > >------------------------------ > >Message: 5 >Date: Fri, 13 Jan 2012 20:00:54 -0800 >From: Guido van Rossum >To: "Gregory P. Smith" >Cc: Antoine Pitrou , python-dev at python.org >Subject: Re: [Python-Dev] Status of the fix for the hash collision > vulnerability >Message-ID: > >Content-Type: text/plain; charset="iso-8859-1" > >On Fri, Jan 13, 2012 at 5:58 PM, Gregory P.
Smith wrote: > >> It is perfectly okay to break existing users who had anything depending on >> ordering of internal hash tables. Their code was already broken. We *will* provide a flag and/or environment variable that can be set to turn the >> feature off at their own peril which they can use in their test harnesses >> that are stupid enough to use doctests with order dependencies. > > >No, that is not how we usually take compatibility between bugfix releases. >"Your code is already broken" is not an argument to break forcefully what >worked (even if by happenstance) before. The difference between CPython and >Jython (or between different CPython feature releases) also isn't relevant >-- historically we have often bent over backwards to avoid changing >behavior that was technically undefined, if we believed it would affect a >significant fraction of users. > >I don't think anyone doubts that this will break lots of code (at least, >the arguments I've heard have been "their code is broken", not "nobody does >that"). > >This approach worked fine for Perl 9 years ago. >> https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371 >> > >I don't know what the Perl attitude about breaking undefined behavior >between micro versions was at the time. But ours is pretty clear -- don't >do it. > >-- >--Guido van Rossum (python.org/~guido) >-------------- next part -------------- >An HTML attachment was scrubbed... >URL: > >------------------------------ > >Message: 6 >Date: Sat, 14 Jan 2012 15:16:32 +1000 >From: Nick Coghlan >To: python-dev at python.org >Cc: python-checkins at python.org >Subject: Re: [Python-Dev] [Python-checkins] cpython: add test, which > was missing from d64ac9ab4cd0 >Message-ID: > >Content-Type: text/plain; charset=ISO-8859-1 > >On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson > wrote: >> http://hg.python.org/cpython/rev/be85914b611c >> changeset: 74363:be85914b611c >> parent: 74361:609482c6710e >> user: Benjamin Peterson >> date:
Fri Jan 13 14:39:38 2012 -0500 >> summary: >> add test, which was missing from d64ac9ab4cd0 > >Ah, that's where that came from, thanks. > >I still haven't fully trained myself to use hg import instead of >patch, which would avoid precisely this kind of error :P > >Cheers, >Nick. > >-- >Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > >------------------------------ > >Message: 7 >Date: Sat, 14 Jan 2012 00:43:04 -0500 >From: Terry Reedy >To: python-dev at python.org >Subject: Re: [Python-Dev] Status of the fix for the hash collision > vulnerability >Message-ID: >Content-Type: text/plain; charset=UTF-8; format=flowed > >On 1/13/2012 8:58 PM, Gregory P. Smith wrote: > >> It is perfectly okay to break existing users who had anything depending >> on ordering of internal hash tables. Their code was already broken. > >Given that the doc says "Return the hash value of the object", I do not >think we should be so hard-nosed. The above clearly implies that there >is such a thing as *the* Python hash value for an object. And indeed, >that has been true across many versions. If we had written "Return a >hash value for the object, which can vary from run to run", the case >would be different. > >-- >Terry Jan Reedy > > > >------------------------------ > >Message: 8 >Date: Sat, 14 Jan 2012 01:24:54 -0500 >From: Jack Diederich >To: Guido van Rossum >Cc: Python Dev >Subject: Re: [Python-Dev] Status of the fix for the hash collision > vulnerability >Message-ID: > >Content-Type: text/plain; charset=ISO-8859-1 > >On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum wrote: >> Hm... I started out as a big fan of the randomized hash, but thinking more >> about it, I actually believe that the chances of some legitimate app having >>>1000 collisions are way smaller than the chances that somebody's code will >> break due to the variable hashing. > >Python's dicts are designed to avoid hash conflicts by resizing and >keeping the available slots bountiful.
1000 conflicts sounds like a >number that couldn't be hit accidentally unless you had a single dict >using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're >good). The hash probing also exploits cache locality, but it is >very unlikely to produce one thousand conflicts by chance. If you get >that many there is an attack. > >> This is depending on how the counting is done (I didn't look at MAL's >> patch), and assuming that increasing the hash table size will generally >> reduce collisions if items collide but their hashes are different. > >The patch counts conflicts on an individual insert and not lifetime >conflicts. Looks sane to me. > >> That said, even with collision counting I'd like a way to disable it without >> changing the code, e.g. a flag or environment variable. > >Agreed. Paranoid people can turn the behavior off and if it ever were >to become a problem in practice we could point people to a solution. > >-Jack > > >------------------------------ > >Message: 9 >Date: Sat, 14 Jan 2012 16:53:39 +1000 >From: Nick Coghlan >To: Georg Brandl >Cc: python-dev at python.org >Subject: Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from' > (closes #11682) >Message-ID: > >Content-Type: text/plain; charset=ISO-8859-1 > >On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl wrote: >> On 01/13/2012 12:43 PM, nick.coghlan wrote: >>> diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst >> >> There should probably be a "versionadded" somewhere on this page. > >Good catch, I added versionchanged notes to this page, simple_stmts >and the StopIteration entry in the library reference. > >>> PEP 3155: Qualified name for classes and functions >>> ================================================== >> >> This looks like a spurious (and syntax-breaking) change. > >Yeah, it was an error I introduced last time I merged from default. Fixed.
> >>> diff --git a/Grammar/Grammar b/Grammar/Grammar >>> -argument: test [comp_for] | test '=' test ?# Really [keyword '='] test >>> +argument: (test) [comp_for] | test '=' test ?# Really [keyword '='] test >> >> This looks like a change without effect? > >Fixed. > >It was a lingering after-effect of Greg's original patch (which also >modified the function call syntax to allow "yield from" expressions >with extra parens). I reverted the change to the function call syntax, >but forgot to ditch the added parens while doing so. > >>> diff --git a/Include/genobject.h b/Include/genobject.h >>> >>> - ? ? /* List of weak reference. */ >>> - ? ? PyObject *gi_weakreflist; >>> + ? ? ? ?/* List of weak reference. */ >>> + ? ? ? ?PyObject *gi_weakreflist; >>> ?} PyGenObject; >> >> While these change tabs into spaces, it should be 4 spaces, not 8. > >Fixed. > >>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **); >> >> Does this API need to be public? If yes, it needs to be documented. > >Hmm, good point - that one needs a bit of thought, so I've put it on >the tracker: http://bugs.python.org/issue13783 > >(that issue also covers your comments regarding the docstring for this >function and whether or not we even need the StopIteration instance >creation API) > >>> -#define CALL_FUNCTION ? ? ? ?131 ? ? /* #args + (#kwargs<<8) */ >>> -#define MAKE_FUNCTION ? ? ? ?132 ? ? /* #defaults + #kwdefaults<<8 + #annotations<<16 */ >>> -#define BUILD_SLICE ?133 ? ? /* Number of items */ >>> +#define CALL_FUNCTION ? 131 ? ? /* #args + (#kwargs<<8) */ >>> +#define MAKE_FUNCTION ? 132 ? ? /* #defaults + #kwdefaults<<8 + #annotations<<16 */ >>> +#define BUILD_SLICE ? ? 133 ? ? /* Number of items */ >> >> Not sure putting these and all the other cosmetic changes into an already >> big patch is such a good idea... 
> >I agree, but it's one of the challenges of a long-lived branch like >the PEP 380 one (I believe some of these cosmetic changes started life >in Greg's original patch and separating them out would have been quite >a pain). Anyone that wants to see the gory details of the branch >history can take a look at my bitbucket repo: > >https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29 > >>> diff --git a/Objects/abstract.c b/Objects/abstract.c >>> --- a/Objects/abstract.c >>> +++ b/Objects/abstract.c >>> @@ -2267,7 +2267,6 @@ >>> >>> ? ? ?func = PyObject_GetAttrString(o, name); >>> ? ? ?if (func == NULL) { >>> - ? ? ? ?PyErr_SetString(PyExc_AttributeError, name); >>> ? ? ? ? ?return 0; >>> ? ? ?} >>> >>> @@ -2311,7 +2310,6 @@ >>> >>> ? ? ?func = PyObject_GetAttrString(o, name); >>> ? ? ?if (func == NULL) { >>> - ? ? ? ?PyErr_SetString(PyExc_AttributeError, name); >>> ? ? ? ? ?return 0; >>> ? ? ?} >>> ? ? ?va_start(va, format); >> >> These two changes also look suspiciously unrelated? > >IIRC, I removed those lines while working on the patch because the >message they produce (just the attribute name) is worse than the one >produced by the call to PyObject_GetAttrString (which also includes >the type of the object being accessed). Leaving the original >exceptions alone helped me track down some failures I was getting at >the time. > >I've now made the various CallMethod helper APIs in abstract.c (1 >public, 3 private) consistently leave the GetAttr exception alone and >added an explicit C API note to NEWS. > >(Vaguely related tangent: the new code added by the patch probably has >a few parts that could benefit from the new GetAttrId private API) > >>> diff --git a/Objects/genobject.c b/Objects/genobject.c >>> + ? ? ? ?} else { >>> + ? ? ? ? ? ?PyObject *e = PyStopIteration_Create(result); >>> + ? ? ? ? ? ?if (e != NULL) { >>> + ? ? ? ? ? ? ? ?PyErr_SetObject(PyExc_StopIteration, e); >>> + ? ? ? ? ? ? ? ?Py_DECREF(e); >>> + ? ? ? ? ? 
?} >> >> Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here >> anyway? > >I think you're right - so noted in the tracker issue about the C API additions. > >Thanks for the thorough review, a fresh set of eyes is very helpful :) > >Cheers, >Nick. > >-- >Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia > > >------------------------------ > >Message: 10 >Date: Sat, 14 Jan 2012 17:01:48 +1000 >From: Nick Coghlan >To: Jack Diederich >Cc: Guido van Rossum , Python Dev > >Subject: Re: [Python-Dev] Status of the fix for the hash collision > vulnerability >Message-ID: > >Content-Type: text/plain; charset=ISO-8859-1 > >On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich wrote: >>> This is depending on how the counting is done (I didn't look at MAL's >>> patch), and assuming that increasing the hash table size will generally >>> reduce collisions if items collide but their hashes are different. >> >> The patch counts conflicts on an individual insert and not lifetime >> conflicts. ?Looks sane to me. > >Having a hard limit on the worst-case behaviour certainly sounds like >an attractive prospect. And there's nothing to worry about in terms of >secrecy or sufficient randomness - by default, attackers cannot >generate more than 1000 hash collisions in one lookup, period. > >>> That said, even with collision counting I'd like a way to disable it without >>> changing the code, e.g. a flag or environment variable. >> >> Agreed. ?Paranoid people can turn the behavior off and if it ever were >> to become a problem in practice we could point people to a solution. > >Does MAL's patch allow the limit to be set on a per-dict basis >(including setting it to None to disable collision limiting >completely)? If people have data sets that need to tolerate that kind >of collision level (and haven't already decided to move to a data >structure other than the builtin dict), then it may make sense to >allow them to remove the limit when using trusted input. 
> >For maintenance versions though, it would definitely need to be >possible to switch it off without touching the code. > >Cheers, >Nick. > >-- >Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > >------------------------------ > >_______________________________________________ >Python-Dev mailing list >Python-Dev at python.org >http://mail.python.org/mailman/listinfo/python-dev > > >End of Python-Dev Digest, Vol 102, Issue 35 >******************************************* From steve at pearwood.info Mon Jan 16 13:28:59 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 16 Jan 2012 23:28:59 +1100 Subject: [Python-Dev] Python-Dev Digest, Vol 102, Issue 35 In-Reply-To: References: Message-ID: <4F14180B.2080003@pearwood.info> python wrote: > jbk [snip 560+ lines of quoted text] Please delete irrelevant text when replying to digests, and replace the subject line with a meaningful subject. -- Steven From merwok at netwok.org Mon Jan 16 16:42:14 2012 From: merwok at netwok.org (Éric Araujo) Date: Mon, 16 Jan 2012 16:42:14 +0100 Subject: [Python-Dev] Sphinx version for Python 2.x docs In-Reply-To: References: <4E4AF610.5040303@simplistix.co.uk> Message-ID: Hi, On 14/01/2012 15:31, Sandro Tosi wrote: > On Sat, Jan 14, 2012 at 04:24, Éric Araujo wrote: >>> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal >> That’s a mistake I did in cefe4f38fa0e. This sentence should be >> removed. > Do you mean revert this whole hunk: > [...] > or just "The :keyword:`nonlocal` allows writing to outer scopes."? My proposal was to remove just that one last sentence, but the only other change in the diff hunk is the addition of “by default”, which is connected to the existence of nonlocal. Both changes, i.e. the whole hunk, should be reverted (I think I’ll have time to do that today).
>>> Doc/library/stdtypes.rst:2372: WARNING: more than one target found >>> for >>> cross-reference u'next': >> Need to use :meth:`.next` to let Sphinx find the right target (more >> info >> on request :) > it seems what it needed to was :meth:`next` (without the dot). The > current page links all 'next' in file.next() to functions.html#next, > and using :meth:`next` does that. I should have given more info, as I wanted the opposite result :) file.next should not link to the next function but to the file.next method. Because Sphinx does not differentiate between meth/func/class/mod roles, :meth:`next` is not resolved to the nearest next method as one could expect but to the next function, so we have to use :meth:`~SomeClass.next` or :meth:`.next` (local ref markup) to get our links to methods. >>> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in >>> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is >>> not Georg fixed them. Cheers From brett at python.org Mon Jan 16 17:17:42 2012 From: brett at python.org (Brett Cannon) Date: Mon, 16 Jan 2012 11:17:42 -0500 Subject: [Python-Dev] [Python-checkins] peps: Bring the Python 3.3 feature list up to date. In-Reply-To: References: Message-ID: Is the change to the pyc format big enough news to go into the release PEP? Or should that just be a "What's New" topic? On Fri, Jan 13, 2012 at 15:18, georg.brandl wrote: > http://hg.python.org/peps/rev/ea3ffa3611e5 > changeset: 4012:ea3ffa3611e5 > user: Georg Brandl > date: Fri Jan 13 21:18:11 2012 +0100 > summary: > Bring the Python 3.3 feature list up to date. 
> > files: > pep-0398.txt | 17 ++++++++++++----- > 1 files changed, 12 insertions(+), 5 deletions(-) > > > diff --git a/pep-0398.txt b/pep-0398.txt > --- a/pep-0398.txt > +++ b/pep-0398.txt > @@ -57,27 +57,34 @@ > Features for 3.3 > ================ > > +Implemented PEPs: > + > +* PEP 380: Syntax for Delegating to a Subgenerator > +* PEP 393: Flexible String Representation > +* PEP 3151: Reworking the OS and IO exception hierarchy > +* PEP 3155: Qualified name for classes and functions > + > +Other final large-scale changes: > + > +* Addition of the "packaging" module, deprecating "distutils" > +* Addition of the faulthandler module > + > Candidate PEPs: > > * PEP 362: Function Signature Object > -* PEP 380: Syntax for Delegating to a Subgenerator > * PEP 382: Namespace Packages > -* PEP 393: Flexible String Representation > * PEP 395: Module Aliasing > * PEP 397: Python launcher for Windows > * PEP 3143: Standard daemon process library > -* PEP 3151: Reworking the OS and IO exception hierarchy > > (Note that these are not accepted yet and even if they are, they might > not be finished in time for Python 3.3.) > > Other planned large-scale changes: > > -* Addition of the "packaging" module, replacing "distutils" > * Implementing ``__import__`` using importlib > * Email version 6 > * A standard event-loop interface (PEP by Jim Fulton pending) > -* Adding the faulthandler module. > * Breaking out standard library and docs in separate repos? > * A PEP on supplementing C modules with equivalent Python modules? > > > -- > Repository URL: http://hg.python.org/peps > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Mon Jan 16 17:28:11 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 16 Jan 2012 17:28:11 +0100 Subject: [Python-Dev] [Python-checkins] peps: Bring the Python 3.3 feature list up to date. References: Message-ID: <20120116172811.45e868f9@pitrou.net> On Mon, 16 Jan 2012 11:17:42 -0500 Brett Cannon wrote: > Is the change to the pyc format big enough news to go into the release PEP? > Or should that just be a "What's New" topic? "What's New" sounds enough to me. The change doesn't enable any new feature, it just makes an issue much less likely to pop up. Regards Antoine. From jaraco at jaraco.com Mon Jan 16 21:00:37 2012 From: jaraco at jaraco.com (Jason R. Coombs) Date: Mon, 16 Jan 2012 20:00:37 +0000 Subject: [Python-Dev] Script(s) for building Python on Windows Message-ID: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> The current scripts for building Python leave some things to be desired. The first thing I notice when I try to build Python on Windows is that the scripts expect to be run inside of a Visual Studio environment, which is only defined within a cmd.exe context. This means the scripts can't be executed from within Powershell (my preferred shell on Windows). One must first shell out to cmd.exe, which disables any Powershell-specific features the developer might have installed (aliases, functions, etc). The second thing I notice is the scripts assume Visual Studio 2008. And while I recognize that Python is specifically built against Visual Studio 2008 for the official releases and that Visual Studio 2008 may be the only officially-supported build environment, later releases, such as Visual Studio 2010, are also adequate for testing purposes. I've been developing Python against Visual Studio 2010 for quite a while and it seems to be more than adequate.
And while it's not the responsibility of the scripts to accommodate such environments, if the scripts could allow for such environments, that would be nice. Furthermore, having scripts that codify the process to upgrade will facilitate the migration should someone make the decision to officially upgrade to Visual Studio 2010. The third thing that I notice is that the command-line argument handling by the batch scripts is clumsy (compared to argparse, for example). This clumsiness is not a criticism of the authors, who have done well with the tools they had. However, batch programming is probably one of the least powerful ways to automate builds these days. So to ease my experience, I've developed my own library of functions and commands to facilitate building Python that aren't subject to the above limitations. Of course, I built these in Python, so they do require Python to build Python (not a huge burden, but worth mentioning). All of these modules are open-source and part of the jaraco.develop package. The first of these modules is jaraco.develop.vstudio. It exposes a class for locating Visual Studio in the usual locations, loading the environment for that instance of Visual Studio, and upgrading a project or solution file to that version. This class in particular enables running Visual Studio commands (including msbuild) from within a Visual Studio environment without actually requiring a cmd.exe context with that environment. Another module is jaraco.develop.python, which includes build_python, a function (and command) to build Python using whatever version of Visual Studio can be found (9 or 10 required). It has no environmental requirements except that Visual Studio be installed. Simply run build-python (part of jaraco.develop's console scripts) and it will build PCbuild.sln from the current directory to whatever targets are specified (or all of them if none are specified).
The builder currently makes some assumptions (such as always building the 64-bit Release targets), but those could easily be customized using argparse parameters. This package and these modules have been tested and run on Python 2.7+. These tools solve the three shortcomings I mentioned above and make the development process so much smoother, IMO. If these modules were built into the repository, building Python could be as simple as "hg clone; cd cpython/pcbuild; ./build.py" (assuming only Visual Studio and Python available). I'd like to propose migrating this functionality (mainly these two modules) into the CPython heads for Python 2.7, 3.1, 3.2, and default as PCbuild/build.py (or similar). This functionality doesn't necessarily need to supersede the existing scripts (env, build_env, build), though it certainly could (and would as far as my usage is concerned). If there are no objections, I'll work to extract the aforementioned functionality from the jaraco.develop modules and into a portable script and put together a proof-of-concept in the default branch. The build script should not interfere with any build bots or other existing build processes, but should enable another more powerful technique for producing builds. I look forward to your comments and feedback. Regards, Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6662 bytes Desc: not available URL: From greg at krypto.org Mon Jan 16 21:16:38 2012 From: greg at krypto.org (Gregory P. 
Smith) Date: Mon, 16 Jan 2012 12:16:38 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <20120114021708.2fbe990f@pitrou.net> Message-ID: On Sun, Jan 15, 2012 at 9:44 AM, Guido van Rossum wrote: > On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel wrote: > >> Guido van Rossum, 15.01.2012 17:10: >> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote: >> >> Terry Reedy, 14.01.2012 06:43: >> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote: >> >>> >> >>>> It is perfectly okay to break existing users who had anything >> depending >> >>>> on ordering of internal hash tables. Their code was already broken. >> >>> >> >>> Given that the doc says "Return the hash value of the object", I do >> not >> >>> think we should be so hard-nosed. The above clearly implies that >> there is >> >>> such a thing as *the* Python hash value for an object. And indeed, >> that >> >> has >> >>> been true across many versions. If we had written "Return a hash value >> >> for >> >>> the object, which can vary from run to run", the case would be >> different. >> >> >> >> Just a side note, but I don't think hash() is the right place to >> document >> >> this. >> > >> > You mean we shouldn't document that the hash() of a string will vary per >> > run? >> >> No, I mean that the hash() builtin function is not the right place to >> document the behaviour of a string hash. That should go into the string >> object documentation. >> >> Although, arguably, it may be worth mentioning in the docs of hash() that, >> in general, hash values of builtin types are bound to the lifetime of the >> interpreter instance (or entire runtime?) and may change after restarts. I >> think that's a reasonable restriction to document that prominently, even >> if >> it will only apply to str for the time being. 
>> > > Actually it will apply to a lot more than str, because the hash of > (immutable) compound objects is often derived from the hash of the > constituents, e.g. hash of a tuple. > > >> >> Hashing is a protocol in Python, just like indexing or iteration. >> >> Nothing keeps an object from changing its hash value due to >> modification, >> > >> > Eh? There's a huge body of cultural awareness that only immutable >> objects >> > should define a hash, implying that the hash remains constant during the >> > object's lifetime. >> > >> >> and that would even be valid in the face of the usual dict lookup >> >> invariants if changes are only applied while the object is not >> referenced >> >> by any dict. >> > >> > And how would you know it isn't? >> >> Well, if it's an object with a mutable hash then it's up to the >> application >> defining that object to make sure it's used in a sensible way. >> Immutability >> just makes your life easier. I can imagine that an object gets removed >> from >> a dict (say, a cache), modified and then reinserted, and I think it's >> valid >> to allow the modification to have an impact on the hash in this case, in >> order to accommodate for any changes to equality comparisons due to the >> modification. >> > > That could be considered valid only in a very abstract, theoretical, > non-constructive way, since there is no protocol to detect removal from a > dict (and you cannot assume an object is used in only one dict at a time). > > >> That being said, it seems that the Python docs actually consider constant >> hashes a requirement rather than a virtue. >> >> http://docs.python.org/glossary.html#term-hashable >> >> """ >> An object is hashable if it has a hash value which never changes during >> its >> lifetime (it needs a __hash__() method), and can be compared to other >> objects (it needs an __eq__() or __cmp__() method). Hashable objects which >> compare equal must have the same hash value. 
>> """ >> >> It also seems to me that the wording "has a hash value which never changes >> during its lifetime" makes it pretty clear that the lifetime of the hash >> value is not guaranteed to supersede the lifetime of the object (although >> that's a rather muddy definition - memory lifetime? or pickle-unpickle as >> well?). >> > > Across pickle-unpickle it's not considered the same object. Pickling at > best preserves values. > Updating the docs to explicitly clarify this sounds like a good idea. How does this wording to be added to the glossary.rst hashing section sound? """Hash values may not be stable across Python processes and must not be used for storage or otherwise communicated outside of a single Python session.""" -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at python.org Mon Jan 16 21:19:33 2012 From: brian at python.org (Brian Curtin) Date: Mon, 16 Jan 2012 14:19:33 -0600 Subject: [Python-Dev] Script(s) for building Python on Windows In-Reply-To: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> References: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> Message-ID: On Mon, Jan 16, 2012 at 14:00, Jason R. Coombs wrote: > The second thing I notice is the scripts assume Visual Studio 2008. And > while I recognize that Python is specifically built against Visual Studio > 2008 for the official releases and that Visual Studio 2008 may be the only > officially-supported build environment, later releases, such as Visual > Studio 2010 are also adequate for testing purposes. I?ve been developing > Python against Visual Studio 2010 for quite a while and it seems to be more > than adequate. And while it?s not the responsibility of the scripts to > accommodate such environments, if the scripts could allow for such > environments, that would be nice. 
2010 is adequate for limited use but the test suite doesn't pass, so I would be hesitant to add support and/or documentation for building with it until we actually support it the same as or in place of 2008.
From jaraco at jaraco.com Mon Jan 16 21:33:08 2012 From: jaraco at jaraco.com (Jason R. Coombs) Date: Mon, 16 Jan 2012 20:33:08 +0000 Subject: [Python-Dev] Script(s) for building Python on Windows In-Reply-To: References: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> Message-ID: <7E79234E600438479EC119BD241B48D60142BB93@SN2PRD0604MB141.namprd06.prod.outlook.com> > From: Brian Curtin [mailto:brian at python.org] > Sent: Monday, 16 January, 2012 15:20 > > 2010 is adequate for limited use but the test suite doesn't pass, so I would be > hesitant to add support and/or documentation for building with it until we > actually support it the same as or in place of 2008. Good point. The current tools don't automatically support 2010; an extra command is required to perform the conversion. I'll be cautious and not expose that functionality without some indication to the user of the limitations.
From martin at v.loewis.de Mon Jan 16 22:24:40 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 16 Jan 2012 22:24:40 +0100 Subject: [Python-Dev] Script(s) for building Python on Windows In-Reply-To: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> References: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> Message-ID: <4F149598.7070006@v.loewis.de> > If there are no objections, I'll work to extract the aforementioned > functionality from the jaraco.develop modules and into a portable script > and put together a proof-of-concept in the default branch.
The build > script should not interfere with any build bots or other existing build > processes, but should enable another more powerful technique for > producing builds. I'd be hesitant to put too many specialized tools into the tree that will become unmaintained. Please take a look at the vs9to8 tool in PCbuild; if you could adjust that to support VS 10, it would be better IMO. As for completely automating the build: please take notice of Tools/buildbot/build.bat. It also fully automates the build, also doesn't require that the VS environment is already activated, and has the additional advantage of not requiring Python to be installed. Regards, Martin From paul at mcmillan.ws Mon Jan 16 23:23:40 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Mon, 16 Jan 2012 14:23:40 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F125953.5060309@pearwood.info> Message-ID: > As I understand it, the way the attack works is that a *single* > malicious request from the attacker can DoS the server by eating CPU > resources while evaluating a massive collision chain induced in a dict > by attacker supplied data. Explicitly truncating the collision chain > boots them out almost immediately (likely with a 500 response for an > internal server error), so they no longer affect other events, threads > and processes on the same machine. This is only true in the specific attack presented at 28c3. If an attacker can insert data without triggering the attack, it's possible to produce (in the example of a web application) urls that (regardless of the request) always produce pathological behavior. For example, a collection of pathological usernames might make it impossible to list users (and so choose which ones to delete) without resorting to removing the problem data at an SQL level. This is why the "simply throw an error" solution isn't a complete fix. 
Making portions of an interface unusable for regular users is clearly a bad thing, and is clearly applicable to other types of poisoned data as well. We need to detect collisions and work around them transparently. > However, such an app would have been crippled by the original DoS > anyway, since its performance would have been gutted - the collision > chain limiting just means it will trigger exceptions for the cases > that would have been insanely slow. We can do better than saying "it would have been broken before, it's broken differently now". The universal hash function idea has merit, and for practical purposes hash randomization would fix this too (since colliding data is only likely to collide within a single process, persistent poisoning is far less feasible). -Paul
From timothy.c.delaney at gmail.com Tue Jan 17 00:14:02 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 17 Jan 2012 10:14:02 +1100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F125953.5060309@pearwood.info> Message-ID: On 17 January 2012 09:23, Paul McMillan wrote: > This is why the "simply throw an error" solution isn't a complete fix. > Making portions of an interface unusable for regular users is clearly > a bad thing, and is clearly applicable to other types of poisoned data > as well. We need to detect collisions and work around them > transparently. What if in a pathological collision (e.g. > 1000 collisions), we increased the size of a dict by a small but random amount? Should be transparent, have negligible speed penalty, maximal reuse of existing code, and should be very difficult to attack since the dictionary would change size in a (near) non-deterministic manner when being attacked (i.e. first attack causes non-deterministic remap, next attack should fail).
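[A toy sketch of the proposal above, for illustration only -- the class, thresholds, and linear probing are invented here, and CPython's dict uses a different (perturbed) probing scheme:]

```python
import random

class RandomGrowthTable:
    """Toy open-addressing table: when a probe chain gets pathologically
    long, grow the table by a small *random* amount so the new bucket
    layout is unpredictable to an attacker."""

    CHAIN_LIMIT = 1000  # the "pathological collision" threshold above

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.used = 0

    def _find_slot(self, key):
        # Linear probing: start at hash(key) % size, walk until we find
        # the key or an empty slot; give up past CHAIN_LIMIT collisions.
        i = hash(key) % self.size
        for _ in range(self.CHAIN_LIMIT):
            if self.slots[i] is None or self.slots[i][0] == key:
                return i
            i = (i + 1) % self.size
        raise RuntimeError("pathological probe chain")

    def _resize(self, newsize):
        items = [s for s in self.slots if s is not None]
        self.size, self.slots = newsize, [None] * newsize
        for k, v in items:
            self.slots[self._find_slot(k)] = (k, v)

    def insert(self, key, value):
        try:
            i = self._find_slot(key)
        except RuntimeError:
            # The random increment makes the remap non-deterministic,
            # so a second attack against the same table should fail.
            self._resize(self.size * 2 + random.randrange(1, 64))
            i = self._find_slot(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)
        if self.used * 3 >= self.size * 2:   # keep load factor < 2/3
            self._resize(self.size * 2)

    def lookup(self, key):
        slot = self.slots[self._find_slot(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]
```

[Normal operation never hits the limit; only an attacker-built chain triggers the randomized remap.]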
It should also have near-zero effect on existing tests and frameworks since we would only get the non-deterministic behaviour in pathological cases, which we would presumably need new tests for. Thoughts? Tim Delaney
From timothy.c.delaney at gmail.com Tue Jan 17 00:17:05 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 17 Jan 2012 10:17:05 +1100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F125953.5060309@pearwood.info> Message-ID: On 17 January 2012 10:14, Tim Delaney wrote: > On 17 January 2012 09:23, Paul McMillan wrote: > >> This is why the "simply throw an error" solution isn't a complete fix. >> Making portions of an interface unusable for regular users is clearly >> a bad thing, and is clearly applicable to other types of poisoned data >> as well. We need to detect collisions and work around them >> transparently. > > > What if in a pathological collision (e.g. > 1000 collisions), we increased > the size of a dict by a small but random amount? Should be transparent, > have negligible speed penalty, maximal reuse of existing code, and should be > very difficult to attack since the dictionary would change size in a (near) > non-deterministic manner when being attacked (i.e. first attack causes > non-deterministic remap, next attack should fail). > > It should also have near-zero effect on existing tests and frameworks > since we would only get the non-deterministic behaviour in pathological > cases, which we would presumably need new tests for. > > Thoughts? > And one thought I had immediately after hitting send is that there could be an attack of the form "build a huge dict, then hit it with something that causes it to rehash due to >1000 collisions". But that's not really going to be any worse than just building a huge dict and hitting a resize anyway.
Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaraco at jaraco.com Tue Jan 17 01:01:12 2012 From: jaraco at jaraco.com (Jason R. Coombs) Date: Tue, 17 Jan 2012 00:01:12 +0000 Subject: [Python-Dev] Script(s) for building Python on Windows In-Reply-To: <4F149598.7070006@v.loewis.de> References: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> <4F149598.7070006@v.loewis.de> Message-ID: <7E79234E600438479EC119BD241B48D60142C5F3@SN2PRD0604MB141.namprd06.prod.outlook.com> > From: "Martin v. L?wis" [mailto:martin at v.loewis.de] > Sent: Monday, 16 January, 2012 16:25 > > I'd be hesitant to put too many specialized tools into the tree that will > become unmaintained. Please take a look at the vs9to8 tool in PCbuild; if you > could adjust that to support VS 10, it would be better IMO. Are you suggesting creating vs10to9, which would be congruent to vs9to8, or vs9to10? I'm unsure if the conversion from 9 to 10 or 10 to 9 can be as simple as the vs9to8 suggests. When I run the upgrade using the Visual Studio tools, it does upgrade the .sln file [as so]( http://a.libpa.st/kB19G). But as you can see, it also converts all of the .vcproj to .vcxproj, which appears to be a very different schema. According to [this article]( http://social.msdn.microsoft.com/Forums/en/vsprereleaseannouncements/thread/ 4345a151-d288-48d6-b7c7-a7c598d0f85e) it should be trivial to downgrade by only updating the .sln file (perhaps Visual Studio 2008 is forward compatible with the .vcxproj format). I'll look into this more when I have a better idea what you had in mind. My goal in adding the upgrade code was to provide a one-step upgrade for developers with only VS 10 installed. That's what vs-upgrade in jaraco.develop does. > As for completely automating the build: please take notice of > Tools/buildbot/build.bat. 
It also fully automates the build, also doesn't > require that the VS environment is already activated, and has the additional > advantage of not requiring Python to be installed. That's interesting, but it still suffers from several shortcomings: 1) It still assumes Visual Studio 2008 and fails with an obscure error otherwise. 2) You can't use it to build different targets (only the whole solution). 3) It automatically downloads the external dependencies (it'd be nice to build without them on occasion). 4) It's still a batch file, so still gives the abominable "Terminate batch job (Y/N)?" when cancelling any operation via Ctrl+C. 5) This functionality isn't in PCBuild/*. Why not? 6) There's no good way to select which type to build (64-bit versus 32-bit, release versus debug). Adding these command-line options is clumsy in batch files. 7) Since it's written in batch script, Python programmers might be hesitant to work with it (improve it). For a buildbot, the batch file is perfectly adequate. It should do the same thing every time reliably. For anyone but a robot or seasoned CPython Windows developer, however, the build tools are not intuitive, and I find that I'm constantly tweaking the batch scripts and asking myself, "why couldn't this be in Python, which is a much more powerful language?" This is why I developed the scripts, and my thought is they could be useful to others as well. My hope is they might even supersede the existing scripts and become canonical, in which case there would be no possibility of them becoming unmaintained. If it turns out that they do become unused and unmaintained, they can be removed, but my feeling is since they're concise, documented, Python scripts, they'd be more likely to be maintained than their '.bat' counterparts. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 6662 bytes Desc: not available URL: From brian at python.org Tue Jan 17 01:13:29 2012 From: brian at python.org (Brian Curtin) Date: Mon, 16 Jan 2012 18:13:29 -0600 Subject: [Python-Dev] Script(s) for building Python on Windows In-Reply-To: <7E79234E600438479EC119BD241B48D60142C5F3@SN2PRD0604MB141.namprd06.prod.outlook.com> References: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> <4F149598.7070006@v.loewis.de> <7E79234E600438479EC119BD241B48D60142C5F3@SN2PRD0604MB141.namprd06.prod.outlook.com> Message-ID: On Mon, Jan 16, 2012 at 18:01, Jason R. Coombs > My goal in adding the upgrade code was to provide a one-step upgrade for > developers with only VS 10 installed. That's what vs-upgrade in > jaraco.develop does. Upgrading to 2010 requires some code changes in addition to the conversion, so the process might not be as ripe for automation as the previous versions. For one, a lot of constants in errno had to be updated, then a few places that set certain errnos had to be updated. From victor.stinner at haypocalc.com Tue Jan 17 01:16:43 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 17 Jan 2012 01:16:43 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F125953.5060309@pearwood.info> Message-ID: 2012/1/17 Tim Delaney : > What if in a pathological collision (e.g. > 1000 collisions), we increased > the size of a dict by a small but random amount? It doesn't change anything, you will still get collisions. 
Victor From guido at python.org Tue Jan 17 02:18:27 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jan 2012 17:18:27 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F125953.5060309@pearwood.info> Message-ID: On Mon, Jan 16, 2012 at 4:16 PM, Victor Stinner < victor.stinner at haypocalc.com> wrote: > 2012/1/17 Tim Delaney : > > What if in a pathological collision (e.g. > 1000 collisions), we > increased > > the size of a dict by a small but random amount? > > It doesn't change anything, you will still get collisions. That depends right? If the collision is because they all have the same hash(), yes. It might be different if it is because the secondary hashing (or whatever it's called :-) causes collisions. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaraco at jaraco.com Tue Jan 17 04:08:27 2012 From: jaraco at jaraco.com (Jason R. Coombs) Date: Tue, 17 Jan 2012 03:08:27 +0000 Subject: [Python-Dev] Script(s) for building Python on Windows In-Reply-To: <7E79234E600438479EC119BD241B48D60142C5F3@SN2PRD0604MB141.namprd06.prod.outlook.com> References: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> <4F149598.7070006@v.loewis.de> <7E79234E600438479EC119BD241B48D60142C5F3@SN2PRD0604MB141.namprd06.prod.outlook.com> Message-ID: <7E79234E600438479EC119BD241B48D60142C806@SN2PRD0604MB141.namprd06.prod.outlook.com> > From: python-dev-bounces+jaraco=jaraco.com at python.org [mailto:python- > dev-bounces+jaraco=jaraco.com at python.org] On Behalf Of Jason R. Coombs > Sent: Monday, 16 January, 2012 19:01 > > I'm unsure if the conversion from 9 to 10 or 10 to 9 can be as simple as the > vs9to8 suggests. When I run the upgrade using the Visual Studio tools, it does > upgrade the .sln file [as so]( http://a.libpa.st/kB19G). 
But as you can see, it also > converts all of the .vcproj to .vcxproj, which appears to be a very different > schema. According to [this article]( > http://social.msdn.microsoft.com/Forums/en/vsprereleaseannouncements/thre > ad/ > 4345a151-d288-48d6-b7c7-a7c598d0f85e) it should be trivial to downgrade by > only updating the .sln file (perhaps Visual Studio 2008 is forward compatible > with the .vcxproj format). I upgraded the solution file using Visual Studio, then followed those instructions suggested by the article, but the solution no longer builds under Visual Studio 2008, so apparently that answer is incorrect. Perhaps it's possible to upgrade the .sln in a less aggressive way than the Visual Studio tools do by default, but my initial experience suggests it won't be as easy to upgrade/downgrade the solution file as it was between VS8/VS9. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6662 bytes Desc: not available URL: From g.brandl at gmx.net Tue Jan 17 08:22:34 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 17 Jan 2012 08:22:34 +0100 Subject: [Python-Dev] [Python-checkins] peps: Bring the Python 3.3 feature list up to date. In-Reply-To: <20120116172811.45e868f9@pitrou.net> References: <20120116172811.45e868f9@pitrou.net> Message-ID: Am 16.01.2012 17:28, schrieb Antoine Pitrou: > On Mon, 16 Jan 2012 11:17:42 -0500 > Brett Cannon wrote: >> Is the change to the pyc format big enough news to go into the release PEP? >> Or should that just be a "What's New" topic? > > "What's New" sounds enough to me. The change doesn't enable any new > feature, it just makes an issue much less likely to pop out. Agreed. 
Georg From martin at v.loewis.de Tue Jan 17 09:16:36 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Tue, 17 Jan 2012 09:16:36 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F125953.5060309@pearwood.info> Message-ID: <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> >> It doesn't change anything, you will still get collisions. > > > That depends right? If the collision is because they all have the same > hash(), yes. It might be different if it is because the secondary hashing > (or whatever it's called :-) causes collisions. But Python deals with the latter case just fine already. The open hashing approach relies on the dict resizing "enough" to prevent collisions after the dictionary has grown. Unless somebody can demonstrate a counter example, I believe this discussion is a red herring. Plus: if an attacker could craft keys that deliberately cause collisions because of the dictionary size, they could likely also craft keys in the same number that collide on actual hash values, bringing us back to the original problem. 
Regards, Martin From techtonik at gmail.com Tue Jan 17 11:59:16 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 17 Jan 2012 13:59:16 +0300 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) In-Reply-To: <20120113171908.4e1da88d@pitrou.net> References: <941F8C0E-287B-47B1-B657-A2D1304EC0E9@masklinn.net> <20120113171908.4e1da88d@pitrou.net> Message-ID: On Fri, Jan 13, 2012 at 7:19 PM, Antoine Pitrou wrote: > On Fri, 13 Jan 2012 17:00:57 +0100 > Xavier Morel wrote: > > FWIW this is not restricted to Linux (the same behavior change can > > be observed in OSX), and the script is overly complex you can expose > > the change with 3 lines > > > > import sys > > sys.stdout.write('promt>') > > sys.stdin.read(1) > > > > Python 2 displays "prompt" and terminates execution on [Return], > > Python 3 does not display anything until [Return] is pressed. > > > > Interestingly, the `-u` option is not sufficient to make > > "prompt>" appear in Python 3, the stream has to be flushed > > explicitly unless the input is ~16k characters (I guess that's > > an internal buffer size of some sort) > > "-u" forces line-buffering mode for stdout/stderr, which is already the > default if they are wired to an interactive device (isattr() returning > True). > > But this was already rehashed on python-ideas and the bug tracker, and > apparently Anatoly thought it would be a good idea to post on a third > medium. Sigh. > If you track this more closely, you'll notice there are four issues (surprises) from the user point of view: 1. print() buffers output on Python3 2. print() also buffers output on Python2, but only on Linux 3. there is some useless '-u' command line parameter (useless, because the last thing user wants is not only care about Python 2/3, but also how to invoke them) 4. 
print() is not guilty - it is sys.stdout.write() that buffers output 1-2 discussion was about idea to make new print() function behavior more 'pythonic', i.e. 'user-friendly' or just KISS, which resulted in adding a flush parameter 3 is a just a side FYI remark 4 doesn't relate to python-ideas anymore about fixing print() - it is about the *cause* of the problem with print() UX, which is underlying sys.stdout.write() behavior I asked 4 here, because it is the more appropriate place not only to ask if it can be/will be fixed, but also why. The target audience of the question are developers. Hope that helps Antoine recover from the sorrow. ;) -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jan 17 12:10:38 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 17 Jan 2012 20:10:38 +0900 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> References: <4F125953.5060309@pearwood.info> <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> Message-ID: <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp> martin at v.loewis.de writes: > >> It doesn't change anything, you will still get collisions. > > > > > > That depends right? If the collision is because they all have the same > > hash(), yes. It might be different if it is because the secondary hashing > > (or whatever it's called :-) causes collisions. > > But Python deals with the latter case just fine already. The open hashing > approach relies on the dict resizing "enough" to prevent collisions after > the dictionary has grown. Unless somebody can demonstrate a counter example, > I believe this discussion is a red herring. 
> > Plus: if an attacker could craft keys that deliberately cause collisions > because of the dictionary size, they could likely also craft keys in the same > number that collide on actual hash values, bringing us back to the original > problem. I thought that the original problem was that with N insertions in the dictionary, by repeatedly inserting different keys generating the same hash value an attacker could arrange that the cost of finding an open slot is O(N), and thus the cost of N insertions is O(N^2). If so, frequent resizing could make the attacker's problem much more difficult, as the distribution of secondary probes should change with each resize.
From victor.stinner at haypocalc.com Tue Jan 17 12:55:02 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 17 Jan 2012 12:55:02 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F125953.5060309@pearwood.info> <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: > I thought that the original problem was that with N insertions in the > dictionary, by repeatedly inserting different keys generating the same > hash value an attacker could arrange that the cost of finding an open > slot is O(N), and thus the cost of N insertions is O(N^2). > > If so, frequent resizing could make the attacker's problem much more > difficult, as the distribution of secondary probes should change with > each resize. The attack creates 60,000 strings (or more) with exactly the same hash value. A dictionary uses hash(str) & DICT_MASK to compute the bucket index, where DICT_MASK is the number of buckets minus one. If all strings have the same hash value, we always start in the same bucket and the key has to be compared to all previous strings to find the next empty bucket.
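[The effect Victor describes can be demonstrated without crafting actual colliding strings, by giving dict keys a constant `__hash__`; the `Colliding` class below is invented for illustration and merely stands in for the attacker's strings:]

```python
import time

class Colliding:
    """Key whose hash is constant: every instance starts its probe
    sequence in the same dict bucket, so each insertion must be
    compared against all previously inserted keys."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42                     # same hash for every key
    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n

def build_dict(n, key):
    """Time the construction of a dict with n keys of the given type."""
    start = time.perf_counter()
    d = {key(i): i for i in range(n)}
    return time.perf_counter() - start, d

t_colliding, d = build_dict(2000, Colliding)
t_normal, _ = build_dict(2000, int)
# N all-colliding insertions cost O(N**2) comparisons in total, so
# t_colliding dwarfs t_normal even at this modest size.
```

[The dict still works correctly with such keys -- lookups just degrade to a linear scan of the collision chain.]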
The attack works because a LOT of strings are compared and comparing strings is slow. If hash(str1)&DICT_MASK == hash(str2)&DICT_MASK but hash(str1)!=hash(str2), strings are not compared (this is a common optimization in Python), and so the attack would not be successful (it would be slow, but not as slow as comparing two strings). Victor
From ronaldoussoren at mac.com Tue Jan 17 12:25:18 2012 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 17 Jan 2012 12:25:18 +0100 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) In-Reply-To: References: <941F8C0E-287B-47B1-B657-A2D1304EC0E9@masklinn.net> <20120113171908.4e1da88d@pitrou.net> Message-ID: <19C851AE-B345-451A-B50C-2597F087F7E1@mac.com> On 17 Jan, 2012, at 11:59, anatoly techtonik wrote: > > > If you track this more closely, you'll notice there are four issues (surprises) from the user point of view: > 1. print() buffers output on Python3 > 2. print() also buffers output on Python2, but only on Linux > 3. there is some useless '-u' command line parameter > (useless, because the last thing user wants is not only care about Python 2/3, but also how to invoke them) > 4. print() is not guilty - it is sys.stdout.write() that buffers output > > 1-2 discussion was about idea to make new print() function behavior more 'pythonic', i.e. 'user-friendly' or just KISS, which resulted in adding a flush parameter > 3 is a just a side FYI remark > 4 doesn't relate to python-ideas anymore about fixing print() - it is about the *cause* of the problem with print() UX, which is underlying sys.stdout.write() behavior > > I asked 4 here, because it is the more appropriate place not only to ask if it can be/will be fixed, but also why. The target audience of the question are developers. All four "issues" are related to output buffering and how that is not user-friendly.
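[The buffering behaviour under discussion can be reproduced without a terminal by putting a line-buffered text wrapper over an in-memory stream -- a sketch, with `raw` standing in for the terminal and `out` for sys.stdout on a tty in Python 3:]

```python
import io

# Line-buffered text stream over an in-memory buffer: a partial line
# stays in the buffer, just like a prompt written to sys.stdout on a
# tty, until a newline or an explicit flush pushes it through.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding='ascii', line_buffering=True)

out.write('prompt> ')              # no newline: nothing reaches `raw`
invisible = raw.getvalue()         # still b''
out.flush()                        # explicit flush (or a newline)...
visible = raw.getvalue()           # ...pushes b'prompt> ' through
```

[This is why `sys.stdout.write('prompt> ')` followed by a read from stdin shows nothing until the stream is flushed.]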
The new issue you raise is the same as before: sys.stdout is line buffered when writing to a tty, which means that you have to explicitly flush output when you want to output a partial line. Why is this a problem for you? Is that something that bothers you personally or do you have data that suggests that this is a problem for a significant number of (new) users? Ronald
From victor.stinner at haypocalc.com Tue Jan 17 13:28:52 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 17 Jan 2012 13:28:52 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: I finished my patch transforming hash(str) to a randomized hash function, see random-8.patch attached to the issue: http://bugs.python.org/issue13703 The remaining question is which random number generator should be used on Windows to initialize the hash secret (CryptoGen adds an overhead of 10%, at least when the DLL is loaded dynamically), read the issue for the details. I plan to commit my fix to Python 3.3 if it is accepted. Then write a simplified version to Python 3.2 and backport it to 3.1. Then backport the simplified fix to 2.7, and finally to 2.6. The vulnerability has been public for one month; it is time to fix it before it is widely exploited.
Victor From jeremy at jeremysanders.net Tue Jan 17 16:39:03 2012 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Tue, 17 Jan 2012 15:39:03 +0000 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability References: <4F125953.5060309@pearwood.info> <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Victor Stinner wrote: > If hash(str1)&DICT_MASK == hash(str2)&DICT_MASK but > hash(str1)!=hash(str2), strings are not compared (this is a common > optimization in Python), and the so the attack would not be successful > (it would be slow, but not as slow as comparing two strings). It's a shame the hash function can't take a second salt parameter to include in the hash. Each dict could have its own salt, generated from a quick pseudo-random generator. Jeremy From jeremy at jeremysanders.net Tue Jan 17 16:44:21 2012 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Tue, 17 Jan 2012 15:44:21 +0000 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability References: <4F125953.5060309@pearwood.info> <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Jeremy Sanders wrote: > Victor Stinner wrote: > >> If hash(str1)&DICT_MASK == hash(str2)&DICT_MASK but >> hash(str1)!=hash(str2), strings are not compared (this is a common >> optimization in Python), and the so the attack would not be successful >> (it would be slow, but not as slow as comparing two strings). > > It's a shame the hash function can't take a second salt parameter to > include in the hash. Each dict could have its own salt, generated from a > quick pseudo-random generator. Please ignore... forgot that the hashes are cached for strings! 
Jeremy
From merwok at netwok.org Tue Jan 17 18:26:05 2012 From: merwok at netwok.org (Éric Araujo) Date: Tue, 17 Jan 2012 18:26:05 +0100 Subject: [Python-Dev] [Python-checkins] cpython: add str.casefold() (closes #13752) In-Reply-To: References: Message-ID: <9bd4a2c9c735b9cf1a896fa6f11fe2e3@netwok.org> Hi, > changeset: d4669f43d05f > user: Benjamin Peterson > date: Sat Jan 14 13:23:30 2012 -0500 > summary: > add str.casefold() (closes #13752) > diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst > --- a/Doc/library/stdtypes.rst > +++ b/Doc/library/stdtypes.rst > @@ -1002,6 +1002,14 @@ > rest lowercased. > > > +.. method:: str.casefold() > + > + Return a casefolded copy of the string. Casefolded strings may be > used for > + caseless matching. For example, ``"MASSE".casefold() == > "maße".casefold()``. > + > + .. versionadded:: 3.3 I think this method requires at least a link to relevant definitions (Unicode website or Wikipedia), and at best a bit more explanation (for example, it is not locale-dependent, even though the example above is only meaningful for German). Cheers
From merwok at netwok.org Tue Jan 17 18:27:31 2012 From: merwok at netwok.org (Éric Araujo) Date: Tue, 17 Jan 2012 18:27:31 +0100 Subject: [Python-Dev] [Python-checkins] cpython: provide a common method to check for RETR_DATA validity, first checking the In-Reply-To: References: Message-ID: <6dbcf4f4a79cd81464501ff4ff9aafb0@netwok.org> Hi Giampaolo, > changeset: 53a5a5b8859d > user: Giampaolo Rodola' > date: Mon Jan 09 17:10:10 2012 +0100 > summary: > provide a common method to check for RETR_DATA validity, first > checking the expected len and then the actual data content; this > way we get a failure on len mismatch rather than content mismatch > (which is very long and unreadable) My trick is to convert long strings to lists (with data.split(appropriate line ending)) and pass them to assertEqual.
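[A sketch of the trick -- the payload data and test class below are hypothetical, illustrating the pattern rather than the actual test_ftplib code:]

```python
import unittest

class RetrDataTest(unittest.TestCase):
    """Illustrates both tricks: list-based comparisons for readable
    diffs, and assertEqual's third (msg) argument for showing the
    offending data on a length mismatch."""

    def test_payload(self):
        expected = 'line one\r\nline two\r\nline three\r\n'
        received = 'line one\r\nline two\r\nline three\r\n'
        got_lines = received.split('\r\n')
        # Comparing lists yields an element-by-element diff on failure
        # instead of one huge unreadable string comparison.
        self.assertEqual(got_lines, expected.split('\r\n'))
        # The third argument is printed on failure, so a length
        # mismatch immediately shows the actual data.
        self.assertEqual(len(got_lines), 4, got_lines)

suite = unittest.TestLoader().loadTestsFromTestCase(RetrDataTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```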
Then I get more readable element-based diffs when there is a test failure. Another trick I use is this (for example when I don't want to make too much diff noise, or when I don't want to build the list of expected results): self.assertEqual(len(got), 3, got) unittest will print the third argument on failure. Regards
From matrixhasu at gmail.com Tue Jan 17 19:02:13 2012 From: matrixhasu at gmail.com (Sandro Tosi) Date: Tue, 17 Jan 2012 19:02:13 +0100 Subject: [Python-Dev] Sphinx version for Python 2.x docs In-Reply-To: References: <4E4AF610.5040303@simplistix.co.uk> Message-ID: On Mon, Jan 16, 2012 at 16:42, Éric Araujo wrote: > Hi, > > Le 14/01/2012 15:31, Sandro Tosi a écrit : >> >> On Sat, Jan 14, 2012 at 04:24, Éric Araujo wrote: >>>> >>>> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal >>> >>> That's a mistake I did in cefe4f38fa0e. This sentence should be removed. >> >> Do you mean revert this whole hunk: >> [...] >> >> or just "The :keyword:`nonlocal` allows writing to outer scopes."? > > > My proposal was to remove just that one last sentence, but the only > other change in the diff hunk is the addition of "by default", which is > connected to the existence of nonlocal. Both changes, i.e. the whole > hunk, should be reverted (I think I'll have time to do that today). I've reverted it with ef1612a6a4f7 >>>> Doc/library/stdtypes.rst:2372: WARNING: more than one target found for >>>> cross-reference u'next': >>> >>> Need to use :meth:`.next` to let Sphinx find the right target (more info >>> on request :) >> >> it seems what it needed to was :meth:`next` (without the dot). The >> current page links all 'next' in file.next() to functions.html#next, >> and using :meth:`next` does that. > > > I should have given more info, as I wanted the opposite result :) > file.next should not link to the next function but to the file.next > method.
> Because Sphinx does not differentiate between
> meth/func/class/mod roles, :meth:`next` is not resolved to the nearest
> next method as one could expect but to the next function, so we have to
> use :meth:`~SomeClass.next` or :meth:`.next` (local ref markup) to get
> our links to methods.

I tried :meth:`.next` but got a lot of:

/home/morph/cpython/py27/Doc/library/stdtypes.rst:2372: WARNING: more
than one target found for cross-reference u'next': iterator.next,
multifile.MultiFile.next, csv.csvreader.next, dbhash.dbhash.next,
mailbox.oldmailbox.next, ttk.Treeview.next, nntplib.NNTP.next,
file.next, bsddb.bsddbobject.next, tarfile.TarFile.next,
generator.next

so I ended up with :meth:`next` but it was still wrong. I've committed
51e11b4937b7 which uses :meth:`~file.next` instead, and it works.

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi

From g.brandl at gmx.net  Tue Jan 17 20:33:30 2012
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 17 Jan 2012 20:33:30 +0100
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: 
References: <4E4AF610.5040303@simplistix.co.uk>
Message-ID: 

Am 17.01.2012 19:02, schrieb Sandro Tosi:
>> I should have given more info, as I wanted the opposite result :)
>> file.next should not link to the next function but to the file.next
>> method. Because Sphinx does not differentiate between
>> meth/func/class/mod roles, :meth:`next` is not resolved to the nearest
>> next method as one could expect but to the next function, so we have to
>> use :meth:`~SomeClass.next` or :meth:`.next` (local ref markup) to get
>> our links to methods.
> > I tried :meth:`.next` but got a lots of : > > /home/morph/cpython/py27/Doc/library/stdtypes.rst:2372: WARNING: more > than one target found for cross-reference u'next': iterator.next, > multifile.MultiFile.next, csv.csvreader.next, dbhash.dbhash.next, > mailbox.oldmailbox.next, ttk.Treeview.next, nntplib.NNTP.next, > file.next, bsddb.bsddbobject.next, tarfile.TarFile.next, > generator.next > > so I ended up with :meth:`next` but it was still wrong. I've committed > 51e11b4937b7 which uses :meth:`~file.next` instead, and it works. No need to try, just read the docs :) `next` looks in the current (class, then module) namespaces. `.next` looks everywhere, so the match must be unique. So for something as common as "next", an explicit `file.next` is required. Georg From martin at v.loewis.de Tue Jan 17 21:09:02 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 17 Jan 2012 21:09:02 +0100 Subject: [Python-Dev] Script(s) for building Python on Windows In-Reply-To: <7E79234E600438479EC119BD241B48D60142C5F3@SN2PRD0604MB141.namprd06.prod.outlook.com> References: <7E79234E600438479EC119BD241B48D60142BAF1@SN2PRD0604MB141.namprd06.prod.outlook.com> <4F149598.7070006@v.loewis.de> <7E79234E600438479EC119BD241B48D60142C5F3@SN2PRD0604MB141.namprd06.prod.outlook.com> Message-ID: <4F15D55E.4030205@v.loewis.de> > Are you suggesting creating vs10to9, which would be congruent to vs9to8, or > vs9to10? After reconsidering, I don't think I want anything like this in the tree at this point. The code will be outdated by the time Python 3.3 is released, as Python 3.3 will be built with a Visual Studio different from 2008. Regards, Martin P.S. Please shorten your messages. They contain too much text for me to grasp. 
From martin at v.loewis.de  Tue Jan 17 21:29:51 2012
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 21:29:51 +0100
Subject: [Python-Dev] Status of the fix for the hash collision vulnerability
In-Reply-To: <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <4F125953.5060309@pearwood.info> <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4F15DA3F.4010603@v.loewis.de>

> I thought that the original problem was that with N insertions in the
> dictionary, by repeatedly inserting different keys generating the same
> hash value an attacker could arrange that the cost of finding an open
> slot is O(N), and thus the cost of N insertions is O(N^2).
>
> If so, frequent resizing could make the attacker's problem much more
> difficult, as the distribution of secondary probes should change with
> each resize.

Not sure what you mean by "distribution of secondary probes". Let H be
the initial hash value, and let MASK be the current size of the
dictionary minus one. Then I(n), the sequence of dictionary indices
being probed, is computed as

    I(0) = H & MASK
    PERTURB(0) = H
    I(n+1) = (5*I(n) + 1 + PERTURB(n)) & MASK
    PERTURB(n+1) = PERTURB(n) >> 5

So if two objects O1 and O2 have the same hash value H, the sequence of
probed indices is the same for any MASK value. It will be a different
sequence, yes, but they will still collide on each and every slot. This
is the very nature of open addressing. If it *wouldn't* try all indices
in the probe sequence, it may not be possible to perform the lookup for
a key correctly.
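[The recurrence above can be sanity-checked with a short sketch; the function name and the hash/mask values here are illustrative, not CPython code:]

```python
def probe_indices(h, mask, n):
    """Return the first n slot indices probed for hash value h in a
    table with mask = table_size - 1, following the recurrence above."""
    indices = []
    i = h & mask
    perturb = h
    for _ in range(n):
        indices.append(i)
        i = (5 * i + 1 + perturb) & mask
        perturb >>= 5
    return indices

# Resizing the table changes the probe sequence for a given hash value...
print(probe_indices(10, 7, 5))   # → [2, 5, 2, 3, 0]
print(probe_indices(10, 31, 5))  # → [10, 29, 18, 27, 8]

# ...but two keys with equal hash values always probe exactly the same
# slots, since the sequence depends only on H and MASK. They keep
# colliding on every slot, whatever the table size.
h1 = h2 = 10
assert probe_indices(h1, 31, 5) == probe_indices(h2, 31, 5)
```

[This is the point being made: randomizing or growing the table does not separate keys whose hash values are identical.]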
Regards, Martin From solipsis at pitrou.net Tue Jan 17 21:34:40 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 17 Jan 2012 21:34:40 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions Message-ID: <20120117213440.0008fd70@pitrou.net> Hello, We would like to propose the following PEP to change (C)Python's release cycle. Discussion is welcome, especially from people involved in the release process, and maintainers from third-party distributions of Python. Regards Antoine. PEP: 407 Title: New release cycle and introducing long-term support versions Version: $Revision$ Last-Modified: $Date$ Author: Antoine Pitrou , Georg Brandl , Barry Warsaw Status: Draft Type: Process Content-Type: text/x-rst Created: 2012-01-12 Post-History: Resolution: TBD Abstract ======== Finding a release cycle for an open-source project is a delicate exercise in managing mutually contradicting constraints: developer manpower, availability of release management volunteers, ease of maintenance for users and third-party packagers, quick availability of new features (and behavioural changes), availability of bug fixes without pulling in new features or behavioural changes. The current release cycle errs on the conservative side. It is adequate for people who value stability over reactivity. This PEP is an attempt to keep the stability that has become a Python trademark, while offering a more fluid release of features, by introducing the notion of long-term support versions. Scope ===== This PEP doesn't try to change the maintenance period or release scheme for the 2.7 branch. Only 3.x versions are considered. Proposal ======== Under the proposed scheme, there would be two kinds of feature versions (sometimes dubbed "minor versions", for example 3.2 or 3.3): normal feature versions and long-term support (LTS) versions. 
Normal feature versions would get either zero or at most one bugfix release; the latter only if needed to fix critical issues. Security fix handling for these branches needs to be decided. LTS versions would get regular bugfix releases until the next LTS version is out. They then would go into security fixes mode, up to a termination date at the release manager's discretion. Periodicity ----------- A new feature version would be released every X months. We tentatively propose X = 6 months. LTS versions would be one out of N feature versions. We tentatively propose N = 4. With these figures, a new LTS version would be out every 24 months, and remain supported until the next LTS version 24 months later. This is mildly similar to today's 18 months bugfix cycle for every feature version. Pre-release versions -------------------- More frequent feature releases imply a smaller number of disruptive changes per release. Therefore, the number of pre-release builds (alphas and betas) can be brought down considerably. Two alpha builds and a single beta build would probably be enough in the regular case. The number of release candidates depends, as usual, on the number of last-minute fixes before final release. Effects ======= Effect on development cycle --------------------------- More feature releases might mean more stress on the development and release management teams. This is quantitatively alleviated by the smaller number of pre-release versions; and qualitatively by the lesser amount of disruptive changes (meaning less potential for breakage). The shorter feature freeze period (after the first beta build until the final release) is easier to accept. The rush for adding features just before feature freeze should also be much smaller. Effect on bugfix cycle ---------------------- The effect on fixing bugs should be minimal with the proposed figures. The same number of branches would be simultaneously open for regular maintenance (two until 2.x is terminated, then one). 
Effect on workflow ------------------ The workflow for new features would be the same: developers would only commit them on the ``default`` branch. The workflow for bug fixes would be slightly updated: developers would commit bug fixes to the current LTS branch (for example ``3.3``) and then merge them into ``default``. If some critical fixes are needed to a non-LTS version, they can be grafted from the current LTS branch to the non-LTS branch, just like fixes are ported from 3.x to 2.7 today. Effect on the community ----------------------- People who value stability can just synchronize on the LTS releases which, with the proposed figures, would give a similar support cycle (both in duration and in stability). People who value reactivity and access to new features (without taking the risk to install alpha versions or Mercurial snapshots) would get much more value from the new release cycle than currently. People who want to contribute new features or improvements would be more motivated to do so, knowing that their contributions will be more quickly available to normal users. Also, a smaller feature freeze period makes it less cumbersome to interact with contributors of features. Discussion ========== These are open issues that should be worked out during discussion: * Decide on X (months between feature releases) and N (feature releases per LTS release) as defined above. * For given values of X and N, is the no-bugfix-releases policy for non-LTS versions feasible? * Restrict new syntax and similar changes (i.e. everything that was prohibited by PEP 3003) to LTS versions? * What is the effect on packagers such as Linux distributions? * How will release version numbers or other identifying and marketing material make it clear to users which versions are normal feature releases and which are LTS releases? How do we manage user expectations? 
A community poll or survey to collect opinions from the greater Python
community would be valuable before making a final decision.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From martin at v.loewis.de  Tue Jan 17 21:43:49 2012
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 21:43:49 +0100
Subject: [Python-Dev] Switching to Visual Studio 2010
Message-ID: <4F15DD85.6000905@v.loewis.de>

It seems a number of people are interested that the Python trunk
switches to Visual Studio 2010 *now*. I've been hesitant to agree to
such a change, as I still hope that Python can skip over VS 2010
(a.k.a. VS 10), and go straight to VS 11.

However, I just learned that VS 11 supposedly reads VS 10 project files
just fine, with no need of conversion.

So I'd be willing to agree to converting the Python trunk now. It will
surely cause all kinds of issues, as any switching of Visual Studio
releases has caused in the past.

Since a number of people have already started with such a project, I'd
like to ask for a volunteer who will lead this project. You get the
honor to commit the changes, and you will be in charge if something
breaks, hopefully finding out solutions in a timely manner (not
necessarily implementing the solutions yourself).

Any volunteers?

Regards,
Martin

P.S. Here is my personal list of requirements and non-requirements:
- must continue to live in PCbuild, and must replace the VS 9 project
  files "for good"
- may or may not support automatic conversion to VS 9. If it turns out
  that conversion to old project files is not feasible, we could either
  decide to maintain old project files manually (in PC/VS9), or just
  give up on maintaining build support for old VS releases.
- must generate binaries that run on Windows XP
- must support x86 and AMD64 builds
- must support debug and no-debug builds
- must support PGO builds
- must support buildbot
- must support building all extensions that we currently build
- may break existing buildbot installations until they upgrade to a new
  VS release
- must support PCbuild/rt.bat
- should support Tools/msi. If it doesn't, I'll look into it.
- must nearly pass the test suite (i.e. number of additional failures
  due to VS 2010 should be "small")

From brian at python.org  Tue Jan 17 21:51:04 2012
From: brian at python.org (Brian Curtin)
Date: Tue, 17 Jan 2012 14:51:04 -0600
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <4F15DD85.6000905@v.loewis.de>
References: <4F15DD85.6000905@v.loewis.de>
Message-ID: 

On Tue, Jan 17, 2012 at 14:43, "Martin v. Löwis" wrote:
> It seems a number of people are interested that the Python trunk
> switches to Visual Studio 2010 *now*. I've been hesitant to agree
> to such a change, as I still hope that Python can skip over VS 2010
> (a.k.a. VS 10), and go straight to VS 11.
>
> However, I just learned that VS 11 supposedly reads VS 10 project files
> just fine, with no need of conversion.
>
> So I'd be willing to agree to converting the Python trunk now. It
> will surely cause all kinds of issues, as any switching of Visual Studio
> releases has caused in the past.
>
> Since a number of people have already started with such a project,
> I'd like to ask for a volunteer who will lead this project. You
> get the honor to commit the changes, and you will be in charge if
> something breaks, hopefully finding out solutions in a timely manner
> (not necessarily implementing the solutions yourself).
>
> Any volunteers?

I previously completed the port at my old company (but could not
release it), and I have a good bit of it completed for us at
http://hg.python.org/sandbox/vs2010port/.
That repo is a little bit behind 'default' but updating it shouldn't pose any problems. From martin at v.loewis.de Tue Jan 17 21:52:02 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 17 Jan 2012 21:52:02 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: Message-ID: <4F15DF72.1060201@v.loewis.de> > I plan to commit my fix to Python 3.3 if it is accepted. Then write a > simplified version to Python 3.2 and backport it to 3.1. I'm opposed to any change to the hash values of strings in maintenance releases, so I guess I'm opposed to your patch in principle. See my next message for an alternative proposal. > The vulnerability is public since one month, it is maybe time to fix > it before it is widely exploited. I don't think there is any urgency. The vulnerability has been known for more than five years now. From creating a release to the point where the change actually arrives at end users, many months will pass. Regards, Martin From martin at v.loewis.de Tue Jan 17 21:59:28 2012 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 17 Jan 2012 21:59:28 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts Message-ID: <4F15E130.6010200@v.loewis.de> I'd like to propose a different approach to seeding the string hashes: only do so for dictionaries involving only strings, and leave the tp_hash slot of strings unchanged. Each string would get two hashes: the "public" hash, which is constant across runs and bugfix releases, and the dict-hash, which is only used by the dictionary implementation, and only if all keys to the dict are strings. In order to allow caching of the hash, all dicts should use the same hash (if caching wasn't necessary, each dict could use its own seed). There are several variants of that approach wrt. caching of the hash 1. add an additional field to all string objects, to cache the second hash value. 
a) variant: in 3.3, drop the extra field, and declare that hashes may change across runs 2. only cache the dict-hash, recomputing the public hash each time 3. on a per-string choice, cache either the dict-hash or the public hash, depending on which one gets computed first, and recompute the other one every time it's needed. As you can see, 1 vs. 2/3 is a classical time-space-tradeoff. What do you think? Regards, Martin From martin at v.loewis.de Tue Jan 17 22:01:21 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 17 Jan 2012 22:01:21 +0100 Subject: [Python-Dev] Switching to Visual Studio 2010 In-Reply-To: References: <4F15DD85.6000905@v.loewis.de> Message-ID: <4F15E1A1.6090303@v.loewis.de> > I previously completed the port at my old company (but could not > release it), and I have a good bit of it completed for us at > http://hg.python.org/sandbox/vs2010port/. That repo is a little bit > behind 'default' but updating it shouldn't pose any problems. So: do you agree that we switch? Do you volunteer to drive the change? Regards, Martin From martin at v.loewis.de Tue Jan 17 22:06:30 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 17 Jan 2012 22:06:30 +0100 Subject: [Python-Dev] Python as a Metro-style App In-Reply-To: <4F13E9FB.4090000@dominolaser.com> References: <4F088795.5000800@v.loewis.de> <4F13E9FB.4090000@dominolaser.com> Message-ID: <4F15E2D6.4000409@v.loewis.de> > Just wondering, do Metro apps define UNDER_CE or _WIN32_WCE? The point > is that the old ANSI functions (CreateFileA etc) have been removed from > the embedded MS Windows CE long ago, too, and MS Windows Mobile used to > be a custom CE variant or at least strongly related. In any case, it > could help using the existing (incomplete) CE port as base for Metro. I have now completed building Python as a Metro DLL; the WinRT restrictions are fairly limited (code-wise, not so in impact). 
They are quite different from the CE restrictions. For example,
CreateSemaphore is not available on WinRT, you have to use
CreateSemaphoreExW (which is new in Windows Vista). No traces of the CE
API can be seen in the restrictions, and the separation is done in a
different manner (WINAPI_FAMILY==2).

Regards,
Martin

From brian at python.org  Tue Jan 17 22:11:21 2012
From: brian at python.org (Brian Curtin)
Date: Tue, 17 Jan 2012 15:11:21 -0600
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <4F15E1A1.6090303@v.loewis.de>
References: <4F15DD85.6000905@v.loewis.de> <4F15E1A1.6090303@v.loewis.de>
Message-ID: 

On Tue, Jan 17, 2012 at 15:01, "Martin v. Löwis" wrote:
>> I previously completed the port at my old company (but could not
>> release it), and I have a good bit of it completed for us at
>> http://hg.python.org/sandbox/vs2010port/. That repo is a little bit
>> behind 'default' but updating it shouldn't pose any problems.
>
> So: do you agree that we switch? Do you volunteer to drive the change?

I do, and I'll volunteer.

From solipsis at pitrou.net  Tue Jan 17 22:26:11 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 17 Jan 2012 22:26:11 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
References: <4F15E130.6010200@v.loewis.de>
Message-ID: <20120117222611.64b3fd4e@pitrou.net>

On Tue, 17 Jan 2012 21:59:28 +0100
"Martin v. Löwis" wrote:
> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.

I think Python 3 would be better with a clean fix (all hashes
randomized). Now for Python 2...

The problem with this idea is that it only addresses str dicts. Unicode
dicts, and any other dicts, are left vulnerable. Unicode dicts are
quite likely in Web frameworks/applications and other places which have
well-thought text semantics.

That said, here's a suggestion to squeeze those bits:

> 1. add an additional field to all string objects, to cache the second
>    hash value.
>    a) variant: in 3.3, drop the extra field, and declare that hashes
>    may change across runs

In 2.7, a string object has the following fields:

    long ob_shash;
    int ob_sstate;

Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
could cache a "hash perturbation" computed from the string and the
random bits:

- hash() would use ob_shash
- dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))

This way, you cache almost all computations, adding only a computation
and a couple logical ops when looking up a string in a dict.

Regards

Antoine.

From mark at hotpy.org  Tue Jan 17 23:03:45 2012
From: mark at hotpy.org (Mark Shannon)
Date: Tue, 17 Jan 2012 22:03:45 +0000
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: 
References: 
Message-ID: <4F15F041.6010607@hotpy.org>

Hi all.

Let's start controversially: I don't like PEP 380, I think it's a
kludge.

I think that CPython should have proper coroutines, rather than add
more bits and pieces to generators in an attempt to make them more like
coroutines.

I have mentioned this before, but this time I have done something about
it :)

I have a working, portable, (asymmetric) coroutine implementation here:
https://bitbucket.org/markshannon/hotpy_coroutines

It's all standard C, no messing with the C stack, just using standard
techniques to convert recursion to iteration (in the VM, not at the
Python level) and a revised internal calling convention to make CPython
stackless:
https://bitbucket.org/markshannon/hotpy_stackless

Then I've added a Coroutine class and fiddled with the implementation
of YIELD_VALUE to support it.

I think the stackless implementation is pretty solid, but the coroutine
stuff needs some refinement. I've not tested it well (it passes the
test suite, but I've added no new tests). It is (surprisingly) a bit
faster than tip (on my machine).
There are limitations: all calls must be Python-to-Python calls, which
rules out most __xxx__ methods. It might be worth special-casing
__iter__, but I've not done that yet.

To try it out:

>>> import coroutine

To send a value to a coroutine:

>>> co.send(val)

where co is a Coroutine(). To yield a value:

>>> coroutine.co_yield(val)

send() is a method, co_yield is a function.

Here's a little program to demonstrate:

import coroutine

class Node:
    def __init__(self, l, item, r):
        self.l = l
        self.item = item
        self.r = r

def make_tree(n):
    if n == 0:
        return Node(None, n, None)
    else:
        return Node(make_tree(n-1), n, make_tree(n-1))

def walk_tree(t, f):
    if t is not None:
        walk_tree(t.l, f)
        f(t)
        walk_tree(t.r, f)

def yielder(t):
    coroutine.co_yield(t.item)

def tree_yielder(t):
    walk_tree(t, yielder)

co = coroutine.Coroutine(tree_yielder, (make_tree(2),))
while True:
    print(co.send(None))

Which will output:

0
1
0
2
0
1
0
None
Traceback (most recent call last):
  File "co_demo.py", line 30, in <module>
    print(co.send(None))
TypeError: can't send to a halted coroutine

Cheers,
Mark.

From victor.stinner at haypocalc.com  Tue Jan 17 23:06:47 2012
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 17 Jan 2012 23:06:47 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <4F15E130.6010200@v.loewis.de>
References: <4F15E130.6010200@v.loewis.de>
Message-ID: 

2012/1/17 "Martin v. Löwis" :
> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.

The real problem is in dict (or any structure using a hash table), so
if it is possible, I would also prefer to fix the problem directly in
dict.

> There are several variants of that approach wrt. caching of the hash
> 1. add an additional field to all string objects, to cache the second
>    hash value.
>    a) variant: in 3.3, drop the extra field, and declare that hashes
>    may change across runs
> 2. only cache the dict-hash, recomputing the public hash each time
> 3. on a per-string choice, cache either the dict-hash or the public
>    hash, depending on which one gets computed first, and recompute
>    the other one every time it's needed.

There is a simpler solution:

bucket_index = (hash(str) ^ secret) & DICT_MASK.

Remark: set must also be fixed.

Victor

From victor.stinner at haypocalc.com  Tue Jan 17 23:23:48 2012
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 17 Jan 2012 23:23:48 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: 
References: <4F15E130.6010200@v.loewis.de>
Message-ID: 

> There is a simpler solution:
>
> bucket_index = (hash(str) ^ secret) & DICT_MASK.

Oops, hash^secret doesn't add any security.

Victor

From ericsnowcurrently at gmail.com  Tue Jan 17 23:24:09 2012
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 17 Jan 2012 15:24:09 -0700
Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions
In-Reply-To: <20120117213440.0008fd70@pitrou.net>
References: <20120117213440.0008fd70@pitrou.net>
Message-ID: 

On Tue, Jan 17, 2012 at 1:34 PM, Antoine Pitrou wrote:
> Under the proposed scheme, there would be two kinds of feature
> versions (sometimes dubbed "minor versions", for example 3.2 or 3.3):
> normal feature versions and long-term support (LTS) versions.
...
> A new feature version would be released every X months. We
> tentatively propose X = 6 months.
>
> LTS versions would be one out of N feature versions. We tentatively
> propose N = 4.

It sounds like every six months we would get a new feature version,
with every fourth one an LTS release. That sounds great, but, unless
I've misunderstood, there has been a strong desire to keep that number
to one digit. It doesn't matter to me all that much. However, if there
is such a limit, implied or explicit, it should be mentioned and
factored into the PEP.

That aside, +1.
-eric

From victor.stinner at haypocalc.com  Tue Jan 17 23:57:46 2012
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Tue, 17 Jan 2012 23:57:46 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <4F15E130.6010200@v.loewis.de>
References: <4F15E130.6010200@v.loewis.de>
Message-ID: 

> Each string would get two hashes: the "public" hash, which is constant
> across runs and bugfix releases, and the dict-hash, which is only used
> by the dictionary implementation, and only if all keys to the dict are
> strings.

The distinction between the secret (private, secure) hash and the
"public" (deterministic) hash is not clear to me. Example:
collections.UserDict implements __hash__() using hash(self.data).
Should it use the public or the private hash? collections.abc.Set
computes its hash using hash(x) of each item. Same question.

If we need to use the secret hash, it should be exposed in Python.
Which function/method would be used? I suppose that we cannot add
anything to stable releases like 2.7.

Victor

From anacrolix at gmail.com  Wed Jan 18 00:04:19 2012
From: anacrolix at gmail.com (Matt Joiner)
Date: Wed, 18 Jan 2012 10:04:19 +1100
Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions
In-Reply-To: <20120117213440.0008fd70@pitrou.net>
References: <20120117213440.0008fd70@pitrou.net>
Message-ID: 

If minor/feature releases are introducing breaking changes perhaps it's
time to adopt an accelerated major versioning schedule. For instance
there are breaking ABI changes between 3.0/3.1, and 3.2, and while
acceptable for the early adoption state of Python 3, such changes
should normally be reserved for major versions. If every 4th or so
feature release is sufficiently different to be worthy of an LTS,
consider this a major release, albeit with smaller breaking changes
than Python 3.
Aside from this, given the radical features of 3.3, and the upcoming Ubuntu 12.04 LTS, I would recommend adopting 2.7 and 3.2 as the first LTSs, to be reviewed 2 years hence should this go ahead. -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Wed Jan 18 00:17:13 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 18 Jan 2012 10:17:13 +1100 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <4F15F041.6010607@hotpy.org> References: <4F15F041.6010607@hotpy.org> Message-ID: Just to clarify, this differs in functionality from enhanced generators by allowing you to yield from an arbitrary call depth rather than having to "yield from" through a chain of calling generators? Furthermore there's no syntactical change except to the bottommost frame doing a co_yield? Does this capture the major differences? -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jan 18 00:20:06 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 00:20:06 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> Message-ID: <20120118002006.7304d768@pitrou.net> Hello, On Wed, 18 Jan 2012 10:04:19 +1100 Matt Joiner wrote: > If minor/feature releases are introducing breaking changes perhaps it's > time to adopt accelerated major versioning schedule. The PEP doesn't propose to accelerate compatibility breakage. So I don't think a change in numbering is required. > For instance there are > breaking ABI changes between 3.0/3.1, and 3.2, and while acceptable for the > early adoption state of Python 3, such changes should normally be reserved > for major versions. Which "breaking ABI changes" are you thinking about? Python doesn't guarantee any A*B*I (as opposed to API), unless you use Py_LIMITED_API which was introduced in 3.2. Regards Antoine. 
From victor.stinner at haypocalc.com  Wed Jan 18 00:25:23 2012
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 18 Jan 2012 00:25:23 +0100
Subject: [Python-Dev] Status of the fix for the hash collision vulnerability
In-Reply-To: <4F15DF72.1060201@v.loewis.de>
References: <4F15DF72.1060201@v.loewis.de>
Message-ID: 

>> I plan to commit my fix to Python 3.3 if it is accepted. Then write a
>> simplified version to Python 3.2 and backport it to 3.1.
>
> I'm opposed to any change to the hash values of strings in maintenance
> releases, so I guess I'm opposed to your patch in principle.

If randomized hashes cannot be turned on by default, an alternative is
to switch them off by default, and add an option (command line option,
environment variable, etc.) to enable them.

>> The vulnerability is public since one month, it is maybe time to fix
>> it before it is widely exploited.
>
> I don't think there is any urgency. The vulnerability has been known for
> more than five years now. From creating a release to the point where
> the change actually arrives at end users, many months will pass.

In 2003, Python was not seen as vulnerable. Maybe because the hash
function is different from Perl's hash function, or because nobody
tried to generate collisions. Today it is clear that Python is
vulnerable (the 64-bit version is also affected), and it's really fast
to generate collisions using the right algorithm.

Why is it taking so long to fix the vulnerability in Python, whereas it
was fixed quickly in Ruby?
(they chose to use a randomized hash)

Victor

From tjreedy at udel.edu  Wed Jan 18 00:29:11 2012
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 17 Jan 2012 18:29:11 -0500
Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions
In-Reply-To: <20120117213440.0008fd70@pitrou.net>
References: <20120117213440.0008fd70@pitrou.net>
Message-ID: 

On 1/17/2012 3:34 PM, Antoine Pitrou wrote:
>
> Hello,
>
> We would like to propose the following PEP to change (C)Python's release
> cycle. Discussion is welcome, especially from people involved in the
> release process, and maintainers from third-party distributions of
> Python.
>
> Regards
>
> Antoine.
>
>
> PEP: 407
> Title: New release cycle and introducing long-term support versions

To me, as I understand the proposal, the title is wrong. Our current
feature releases already are long-term support versions. They get
bugfix releases at close to 6-month intervals for 1 1/2 - 2 years and
security fixes for 3 years. The only change here is that you propose,
for instance, a fixed 6-month interval and 2-year period.

As I read this, you propose to introduce a new short-term (interim,
preview) feature release along with each bugfix release. Each would
have all the bugfixes plus a preview of the new features expected to be
in the next long-term release. (I know, this is not exactly how you
spun it.)

There has been discussion on python-ideas about whether new features
are or can be considered experimental, or whether there should be an
'experimental' package. An argument against is that long-term
production releases should not have experimental features that might go
away or have their APIs changed. If the short-term, non-production,
interim feature releases were called preview releases, then some or all
of the new features could be labelled experimental and subject to
change. It might actually be good to have major new features tested in
at least one preview release before being frozen.
Maybe then more of the initial bugs would be found and repaired *before* their initial appearance in a long-term release. (All of this is not to say that experimental features should be casually changed or reverted without good reason.) One problem, at least on Windows, is that short-term releases would almost never have compiled binaries for 3rd-party libraries. It already takes a while for them to appear for the current long-term releases. On the other hand, library authors might be more inclined to test new features, a few at a time, if part of tested preview releases, than if just in the repository. So the result *might* be quicker library updates after each long-term release. -- Terry Jan Reedy From ethan at stoneleaf.us Tue Jan 17 23:46:35 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 17 Jan 2012 14:46:35 -0800 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <4F15F041.6010607@hotpy.org> References: <4F15F041.6010607@hotpy.org> Message-ID: <4F15FA4B.2080309@stoneleaf.us> Mark Shannon wrote: > I think that CPython should have proper coroutines, rather than add more > bits and pieces to generators in an attempt to make them more like > coroutines. > > I have mentioned this before, but this time I have done something about > it :) > > I have a working, portable, (asymmetric) coroutine implementation here: > > https://bitbucket.org/markshannon/hotpy_coroutines As a user, this sounds cool! ~Ethan~ From glyph at twistedmatrix.com Wed Jan 18 00:37:31 2012 From: glyph at twistedmatrix.com (Glyph) Date: Tue, 17 Jan 2012 18:37:31 -0500 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <4F15F041.6010607@hotpy.org> References: <4F15F041.6010607@hotpy.org> Message-ID: <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote: > Let's start controversially: I don't like PEP 380, I think it's a kludge. Too late; it's already accepted.
There's not much point in making controversial statements about it now. > I think that CPython should have proper coroutines, rather than add more bits and pieces to generators in an attempt to make them more like coroutines. By "proper" coroutines, you mean implicit coroutines (cooperative threads) rather than explicit coroutines (cooperative generators). Python has been going in the "explicit" direction on this question for a long time. (And, in my opinion, this is the right direction to go, but that's not really relevant here.) I think this discussion would be more suitable for python-ideas though, since you have a long row to hoe here. There's already a PEP - http://www.python.org/dev/peps/pep-0219/ - apparently deferred and not rejected, which you may want to revisit. There are several libraries which can give you cooperative threading already; I assume you're already aware of greenlet and stackless, but I didn't see what advantages your proposed implementation provides over those. I would guess that one of the first things you should address on python-ideas is why adopting your implementation would be a better idea than just bundling one of those with the standard library :). -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jan 18 00:42:21 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 00:42:21 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions References: <20120117213440.0008fd70@pitrou.net> Message-ID: <20120118004221.56da92cb@pitrou.net> On Tue, 17 Jan 2012 18:29:11 -0500 Terry Reedy wrote: > > To me, as I understand the proposal, the title is wrong. Our current > feature releases already are long-term support versions. They get bugfix > releases at close to 6-month intervals for 1 1/2 to 2 years and security > fixes for 3 years.
The only change here is that you propose, for > instance, a fixed 6-month interval and 2-year period. > > As I read this, you propose to introduce a new short-term (interim, > preview) feature release along with each bugfix release. Each would have > all the bugfixes plus a preview of the new features expected to be in > the next long-term release. (I know, this is not exactly how you spun it.) Well, "spinning" is important here. We are not proposing any "preview" releases. These would have the same issue as alphas or betas: nobody wants to install them where they could disrupt working applications and libraries. What we are proposing are first-class releases that are as robust as any other (and usable in production). It's really about making feature releases more frequent, not making previews available during development. I agree "long-term" could be misleading as their support duration is not significantly longer than current feature releases. I chose this term because it is quite well-known and well-understood, but we could pick something else ("extended support", "2-year support", etc.). > There has been discussion on python-ideas about whether new features are > or can be considered experimental, or whether there should be an > 'experimental' package. An argument against is that long-term production > releases should not have experimental features that might go away or > have their APIs changed. That's orthogonal to this PEP. (That said, more frequent feature releases are also a benefit for the __preview__ proposal, since we could be more reactive in changing APIs in that namespace.) > One problem, at least on Windows, is that short-term releases would > almost never have compiled binaries for 3rd-party libraries. That's a good point, although Py_LIMITED_API will hopefully make things better in the medium term. Regards Antoine.
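The PEP's tentative figures — a feature release every 6 months, with every fourth one an LTS — can be made concrete with a toy sketch of the resulting calendar. The start date and version numbers below are invented purely for illustration and are not part of the proposal:

```python
from datetime import date

def release_schedule(start, months_between=6, lts_every=4, count=8):
    """Toy model of the proposed cadence: a feature release every
    `months_between` months, with every `lts_every`-th one an LTS."""
    releases = []
    for i in range(count):
        total = start.month - 1 + i * months_between
        when = date(start.year + total // 12, total % 12 + 1, 1)
        releases.append((f"3.{3 + i}", when, i % lts_every == 0))
    return releases

# With these figures, an LTS lands every 24 months, while regular
# feature releases fill the gaps in between.
for version, when, is_lts in release_schedule(date(2012, 8, 1)):
    print(version, when.isoformat(), "LTS" if is_lts else "regular")
```

Under these assumptions the LTS cadence (one every 24 months) comes out close to the current 18-month feature cycle, which is the comparison several posters in this thread are making.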
From ezio.melotti at gmail.com Wed Jan 18 00:50:52 2012 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Wed, 18 Jan 2012 01:50:52 +0200 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <20120117213440.0008fd70@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> Message-ID: <4F16095C.3050701@gmail.com> Hi, On 17/01/2012 22.34, Antoine Pitrou wrote: > [...] > > Proposal > ======== > > Under the proposed scheme, there would be two kinds of feature > versions (sometimes dubbed "minor versions", for example 3.2 or 3.3): > normal feature versions and long-term support (LTS) versions. > > Normal feature versions would get either zero or at most one bugfix > release; the latter only if needed to fix critical issues. Security > fix handling for these branches needs to be decided. If non-LTS releases won't get bug fixes, a bug that is fixed in 3.3.x might not be fixed in 3.4, unless the bugfix releases are synchronized with the new feature releases (see below). > LTS versions would get regular bugfix releases until the next LTS > version is out. They then would go into security fixes mode, up to a > termination date at the release manager's discretion. > > Periodicity > ----------- > > A new feature version would be released every X months. We > tentatively propose X = 6 months. > > LTS versions would be one out of N feature versions. We tentatively > propose N = 4. If LTS bugfix releases and feature releases are synchronized, we will have something like: 3.3 3.3.1 / 3.4 3.3.2 / 3.5 3.3.3 / 3.6 3.7 3.7.1 / 3.8 ... so every new feature release will have all the bug fixes of the current LTS release, plus new features. With this scheme we will soon run out of 1-digit numbers though.
Currently we already have a 3.x release every ~18 months, so if we keep doing that (just every 24 months instead of 18) and introduce the feature releases in between under a different versioning scheme, we might avoid the problem. This means: 3.1 ... 18 months, N bug fix releases... 3.2 ... 18 months, N bug fix releases ... 3.3 LTS ... 24 months, 3 bug fix releases, 3 feature releases ... 3.4 LTS ... 24 months, 3 bug fix releases, 3 feature releases ... 3.5 LTS In this way we solve the numbering problem and keep a familiar scheme (all the 3.x will be LTS and will be released at the same pace as before, no need to mark some 3.x as LTS). OTOH this will make the feature releases less "noticeable" and people might just ignore them and stick with the LTS releases. Also we would need to define a versioning convention for the feature releases. > [...] > > Effect on bugfix cycle > ---------------------- > > The effect on fixing bugs should be minimal with the proposed figures. > The same number of branches would be simultaneously open for regular > maintenance (two until 2.x is terminated, then one). Wouldn't it still be two? Bug fixes will go to the last LTS and on default, features only on default. > Effect on workflow > ------------------ > > The workflow for new features would be the same: developers would only > commit them on the ``default`` branch. > > The workflow for bug fixes would be slightly updated: developers would > commit bug fixes to the current LTS branch (for example ``3.3``) and > then merge them into ``default``. So here the difference is that instead of committing on the previous release (what currently is 3.2), we commit it to the previous LTS release, ignoring the ones between that and default. > If some critical fixes are needed to a non-LTS version, they can be > grafted from the current LTS branch to the non-LTS branch, just like > fixes are ported from 3.x to 2.7 today.
> > Effect on the community > ----------------------- > > People who value stability can just synchronize on the LTS releases > which, with the proposed figures, would give a similar support cycle > (both in duration and in stability). That's why I proposed to keep the same versioning scheme for these releases, and have a different numbering for the feature releases. > [...] > > Discussion > ========== > > These are open issues that should be worked out during discussion: > > * Decide on X (months between feature releases) and N (feature releases > per LTS release) as defined above. This doesn't necessarily have to be fixed, especially if we don't change the versioning scheme (so we don't need to know that we have an LTS release every N releases). > * For given values of X and N, is the no-bugfix-releases policy for > non-LTS versions feasible? If LTS bugfix releases and feature releases are synchronized, it should be feasible. > * Restrict new syntax and similar changes (i.e. everything that was > prohibited by PEP 3003) to LTS versions? (I was reading this the other way around, maybe rephrase it to "Allow new syntax and similar changes only in LTS versions") > * What is the effect on packagers such as Linux distributions? * What is the effect on PyPy/Jython/IronPython? Can they just skip the feature releases and focus on the LTS ones? > * How will release version numbers or other identifying and marketing > material make it clear to users which versions are normal feature > releases and which are LTS releases? How do we manage user > expectations? This is not an issue with the scheme I proposed. > A community poll or survey to collect opinions from the greater Python > community would be valuable before making a final decision. > > [...]
Best Regards, Ezio Melotti From tjreedy at udel.edu Wed Jan 18 00:58:55 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 17 Jan 2012 18:58:55 -0500 Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print()) In-Reply-To: References: <941F8C0E-287B-47B1-B657-A2D1304EC0E9@masklinn.net> <20120113171908.4e1da88d@pitrou.net> Message-ID: On 1/17/2012 5:59 AM, anatoly techtonik wrote: > 1. print() buffers output on Python3 > 2. print() also buffers output on Python2, but only on Linux No, print() does not buffer output. It merely sends it to a file. > 4. print() is not guilty - it is sys.stdout.write() that buffers output Oh, you already know that 1&2 are false. So is 4, if interpreted as saying that sys.stdout.write() *will* buffer output. sys.stdout can be *any* file-like object. Its .write method *may* buffer output, or it *may not*. With IDLE, it does not. We have been over this before. At your instigation, the doc has been changed to make this clearer. At your request, a new feature has been added to force flushing. By most people's standards, you won. -- Terry Jan Reedy From jdhardy at gmail.com Wed Jan 18 01:24:34 2012 From: jdhardy at gmail.com (Jeff Hardy) Date: Tue, 17 Jan 2012 16:24:34 -0800 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F16095C.3050701@gmail.com> References: <20120117213440.0008fd70@pitrou.net> <4F16095C.3050701@gmail.com> Message-ID: On Tue, Jan 17, 2012 at 3:50 PM, Ezio Melotti wrote: > * What is the effect on PyPy/Jython/IronPython? Can they just skip the > feature releases and focus on the LTS ones? At least for IronPython it's unlikely we'd be able to track the feature releases. We're still trying to catch up as it is. Honestly, I don't see the advantages of this. Are there really enough new features planned that Python needs a full release more than every 18 months?
- Jeff From martin at v.loewis.de Wed Jan 18 01:30:59 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Wed, 18 Jan 2012 01:30:59 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: References: <4F15E130.6010200@v.loewis.de> Message-ID: <20120118013059.Horde.Ywb7VKGZi1VPFhLDwqojoCA@webmail.df.eu> Quoting Victor Stinner: >> Each string would get two hashes: the "public" hash, which is constant >> across runs and bugfix releases, and the dict-hash, which is only used >> by the dictionary implementation, and only if all keys to the dict are >> strings. > > The distinction between secret (private, secure) and "public" hash > (deterministic) is not clear to me. It's not about privacy or security. It's about compatibility. The dict-hash is only used in the dict implementation, and never exposed, leaving the tp_hash unmodified. > Example: collections.UserDict implements __hash__() using > hash(self.data). Are you sure? I only see that used for UserString, not UserDict. > collections.abc.Set computes its hash using hash(x) of each item. Same > question. The hash of the Set should most certainly use the element's tp_hash. That *is* the hash of the objects, and it may collide for strings just fine due to the vulnerability. > If we need to use the secret hash, it should be exposed in Python. It's not secret, just specific. I don't mind it being exposed. However, that would be a new feature, which cannot be added in a security fix or bug fix release. > Which function/method would be used? I suppose that we cannot add > anything to stable releases like 2.7. Right. Nor do I see any need to expose it. It fixes the vulnerability just fine without being exposed.
Regards, Martin From martin at v.loewis.de Wed Jan 18 01:37:49 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Wed, 18 Jan 2012 01:37:49 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F15DF72.1060201@v.loewis.de> Message-ID: <20120118013749.Horde.m2JwR6GZi1VPFhRdoesjpvA@webmail.df.eu> > If randomized hashes cannot be turned on by default, an alternative is > to switch them off by default, and add an option (command line option, > environment variable, etc.) to enable it. That won't really fix the problem. If people install a new release because it fixes a vulnerability, it had better do so. > In 2003, Python was not seen as vulnerable. Maybe because its hash > function is different from Perl's hash function, or because nobody tried > to generate collisions. Today it is clear that Python is vulnerable > (the 64-bit version is also affected), and it is really fast to generate > collisions using the right algorithm. There is the common vulnerability to the threat of confusing threats with vulnerabilities [1]. Python was vulnerable all along, and nobody claimed otherwise. It's just that nobody saw it as a threat. I still don't see it as a practical threat, as there are many ways that people use in practice to protect against this threat already. But I understand that others feel threatened now. > Why is it taking so long to fix the vulnerability in Python, when it was > fixed quickly in Ruby? (they chose to use a randomized hash) Because the risk of breakage for Python is much higher than it is for Ruby. Regards, Martin [1] http://jps.anl.gov/Volume4_iss2/Paper3-RGJohnston.pdf From merwok at netwok.org Wed Jan 18 01:39:17 2012 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Wed, 18 Jan 2012 01:39:17 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Refactored logging rotating handlers for improved flexibility.
In-Reply-To: References: Message-ID: <4F1614B5.2070307@netwok.org> Hi, > changeset: 57295c4d81ac > user: Vinay Sajip > date: Wed Jan 04 12:02:26 2012 +0000 > summary: > Refactored logging rotating handlers for improved flexibility. > diff --git a/Doc/howto/logging-cookbook.rst b/Doc/howto/logging-cookbook.rst > --- a/Doc/howto/logging-cookbook.rst > +++ b/Doc/howto/logging-cookbook.rst > [snip] > +These are not “true” .gz files, as they are bare compressed data, with no > +“container” such as you’d find in an actual gzip file. This snippet is just > +for illustration purposes. I believe using the right characters for quote marks will upset LaTeX and thus PDF generation, so the docs use ASCII straight quote marks. > diff --git a/Doc/library/logging.handlers.rst b/Doc/library/logging.handlers.rst > --- a/Doc/library/logging.handlers.rst > +++ b/Doc/library/logging.handlers.rst > [snip] > + .. method:: BaseRotatingHandler.rotation_filename(default_name) > + > + Modify the filename of a log file when rotating. > + > + This is provided so that a custom filename can be provided. > + > + The default implementation calls the 'namer' attribute of the handler, > + if it's callable, passing the default name to it. If the attribute isn't > + callable (the default is `None`), the name is returned unchanged. Should be ``None``. Regards From martin at v.loewis.de Wed Jan 18 01:58:42 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 18 Jan 2012 01:58:42 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <20120117222611.64b3fd4e@pitrou.net> References: <4F15E130.6010200@v.loewis.de> <20120117222611.64b3fd4e@pitrou.net> Message-ID: <4F161942.5040100@v.loewis.de> On 17.01.2012 22:26, Antoine Pitrou wrote: > On Tue, 17 Jan 2012 21:59:28 +0100 > "Martin v.
Löwis" wrote: >> I'd like to propose a different approach to seeding the string hashes: >> only do so for dictionaries involving only strings, and leave the >> tp_hash slot of strings unchanged. > > I think Python 3 would be better with a clean fix (all hashes > randomized). > Now for Python 2... The problem with this idea is that it only > addresses str dicts. Unicode dicts, and any other dicts, are left > vulnerable. No, you misunderstood. I meant to propose that this applies to both kinds of string (unicode and byte strings); for 2.x also dictionaries including a mix of them. > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits > could cache a "hash perturbation" computed from the string and the > random bits: > > - hash() would use ob_shash > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3)) > > This way, you cache almost all computations, adding only a computation > and a couple logical ops when looking up a string in a dict. That's a good idea. For Unicode, it might be best to add another slot into the object, even though this increases the object size. Regards, Martin From stephen at xemacs.org Wed Jan 18 03:37:08 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 18 Jan 2012 11:37:08 +0900 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <20120117213440.0008fd70@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> Message-ID: <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> Executive summary: My take is "show us the additional resources, and don't be stingy!" Sorry, Antoine, I agree with your goals, but I think you are too optimistic about the positive effects and way too optimistic about the costs. Antoine Pitrou writes: > Finding a release cycle for an open-source project is a delicate > exercise in managing mutually contradicting constraints: developer > manpower, This increases the demand for developer manpower somewhat.
> availability of release management volunteers, Dramatic increase here. It may look like RM is not so demanding -- run a few scripts to put out the alphas/betas/releases. But the RM needs to stay on top of breaking news, make decisions. That takes time, interrupts other work, etc. > ease of maintenance for users and third-party packagers, Dunno about users, but 3rd party packagers will also have more work to do, or will have to tell their users "we only promise compatibility with LTS releases." > quick availability of new features (and behavioural changes), These are already *available*, just not *tested*. Since testing is the bottleneck on what users consider to be "available for me", you cannot decrease the amount of testing (alpha, beta releases) by anywhere near the amount by which you're increasing release frequency, or you're just producing "as is" snapshots. Percentage of time in feature freeze goes way up, features get introduced all at once just before the next release, schedule slippage is inevitable on some releases. > availability of bug fixes without pulling in new features or > behavioural changes. Sounds like a slight further increase in demand for RM, and as described a dramatic decrease in the bugfixing for throw-away releases. > The current release cycle errs on the conservative side. What evidence do you have for that, besides people who aren't RMs wishing that somebody else would do more RM work? > More feature releases might mean more stress on the development and > release management teams. This is quantitatively alleviated by the > smaller number of pre-release versions; and qualitatively by the > lesser amount of disruptive changes (meaning less potential for > breakage). Way optimistic IMO (theoretical, admitted, but I do release management for a less well-organized project, and I teach in a business school, FWIW). > The shorter feature freeze period (after the first beta build until > the final release) is easier to accept.
But you need to look at total time in feature freeze over the LTS cycle, not just before each throw-away release. > The rush for adding features just before feature freeze should also > be much smaller. This doesn't depend on the length of time in feature freeze per release; it depends on the fraction of time in feature freeze over the cycle. Given your quality goals, this will go way up. From tjreedy at udel.edu Wed Jan 18 05:32:04 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 17 Jan 2012 23:32:04 -0500 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <20120118004221.56da92cb@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> <20120118004221.56da92cb@pitrou.net> Message-ID: On 1/17/2012 6:42 PM, Antoine Pitrou wrote: > On Tue, 17 Jan 2012 18:29:11 -0500 > Terry Reedy wrote: >> >> To me, as I understand the proposal, the title is wrong. Our current >> feature releases already are long-term support versions. They get bugfix >> releases at close to 6-month intervals for 1 1/2 to 2 years and security >> fixes for 3 years. The only change here is that you propose, for >> instance, a fixed 6-month interval and 2-year period. >> >> As I read this, you propose to introduce a new short-term (interim, >> preview) feature release along with each bugfix release. Each would have >> all the bugfixes plus a preview of the new features expected to be in >> the next long-term release. (I know, this is not exactly how you spun it.) The main point of my comment is that the new thing you are introducing is not long-term supported versions but short-term unsupported versions. > Well, "spinning" is important here. We are not proposing any "preview" > releases. These would have the same issue as alphas or betas: nobody I said nothing about quality. We aim to keep default in near-release condition and seem to be getting better.
The new Unicode implementation is still getting polished a bit, it seems, after 3 months, but that is fairly unusual. > wants to install them where they could disrupt working applications and > libraries. > > What we are proposing are first-class releases that are as robust as > any other (and usable in production). But I am dubious that releases that are obsolete in 6 months and lack 3rd party support will see much production use. > It's really about making feature releases more frequent, > not making previews available during development. Given the difficulty of making a complete Windows build, it would be nice to have one made available every 6 months, regardless of how it is labeled. I believe that some people will see and use good-for-6-months releases as previews of the new features that will be in the 'real', normal, bug-fix supported, long-term releases. Every release is a snapshot of a continuous process, with some extra effort made to tie up some (but not all) of the loose ends. -- Terry Jan Reedy From greg at krypto.org Wed Jan 18 06:58:51 2012 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 17 Jan 2012 21:58:51 -0800 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <4F15E130.6010200@v.loewis.de> References: <4F15E130.6010200@v.loewis.de> Message-ID: On Tue, Jan 17, 2012 at 12:59 PM, "Martin v. Löwis" wrote: > I'd like to propose a different approach to seeding the string hashes: > only do so for dictionaries involving only strings, and leave the > tp_hash slot of strings unchanged. > > Each string would get two hashes: the "public" hash, which is constant > across runs and bugfix releases, and the dict-hash, which is only used > by the dictionary implementation, and only if all keys to the dict are > strings. In order to allow caching of the hash, all dicts should use > the same hash (if caching wasn't necessary, each dict could use its own > seed). > > There are several variants of that approach wrt. caching of the hash > 1.
add an additional field to all string objects, to cache the second > hash value. > yuck, our objects are large enough as it is. > a) variant: in 3.3, drop the extra field, and declare that hashes > may change across runs > +1 Absolutely. We can and should make 3.3 change hashes across runs (behavior that can be disabled via a flag or environment variable). I think the issue of doctests and such breaking even in 2.7 due to hash order changes is being overblown. Code like that already needs to fix its tests at least once if they want them to pass on both 32-bit and 64-bit Python VMs (they have different hashes). Do we have _any_ measure of how big a deal this will be before going too far here? -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Wed Jan 18 07:06:33 2012 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 17 Jan 2012 22:06:33 -0800 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <4F15DF72.1060201@v.loewis.de> References: <4F15DF72.1060201@v.loewis.de> Message-ID: On Tue, Jan 17, 2012 at 12:52 PM, "Martin v. Löwis" wrote: > > I plan to commit my fix to Python 3.3 if it is accepted. Then write a > > simplified version to Python 3.2 and backport it to 3.1. > > I'm opposed to any change to the hash values of strings in maintenance > releases, so I guess I'm opposed to your patch in principle. > Please at least consider his patch for 3.3 onwards then. Changing the hash seed per interpreter instance / process is the right thing to do going forward. What to do on maintenance releases is a separate discussion. -gps -------------- next part -------------- An HTML attachment was scrubbed...
URL: From martin at v.loewis.de Wed Jan 18 08:15:35 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 18 Jan 2012 08:15:35 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: References: <4F15DF72.1060201@v.loewis.de> Message-ID: <4F167197.5020109@v.loewis.de> On 18.01.2012 07:06, Gregory P. Smith wrote: > > On Tue, Jan 17, 2012 at 12:52 PM, "Martin v. Löwis" > wrote: > > > I plan to commit my fix to Python 3.3 if it is accepted. Then write a > > simplified version to Python 3.2 and backport it to 3.1. > > I'm opposed to any change to the hash values of strings in maintenance > releases, so I guess I'm opposed to your patch in principle. > > > Please at least consider his patch for 3.3 onwards then. Changing the > hash seed per interpreter instance / process is the right thing to do > going forward. For 3.3 onwards, I'm skeptical whether all this configuration support is really necessary. I think a much smaller patch which leaves no choice would be more appropriate. Regards, Martin From martin at v.loewis.de Wed Jan 18 08:19:44 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 18 Jan 2012 08:19:44 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: References: <4F15E130.6010200@v.loewis.de> Message-ID: <4F167290.4090800@v.loewis.de> > +1 Absolutely. We can and should make 3.3 change hashes across runs > (behavior that can be disabled via a flag or environment variable). > > I think the issue of doctests and such breaking even in 2.7 due to hash > order changes is being overblown. Code like that already needs to > fix its tests at least once if they want them to pass on both > 32-bit and 64-bit Python VMs (they have different hashes). Do we have > _any_ measure of how big a deal this will be before going too far here? My concern is not about breaking doctests: this proposal will also break them.
My concern is about applications that assume that hash(s) is stable across runs, and we do have reports that it will break applications. Regards, Martin From g.brandl at gmx.net Wed Jan 18 08:46:39 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 18 Jan 2012 08:46:39 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <20120118004221.56da92cb@pitrou.net> Message-ID: On 18.01.2012 05:32, Terry Reedy wrote: > On 1/17/2012 6:42 PM, Antoine Pitrou wrote: >> On Tue, 17 Jan 2012 18:29:11 -0500 >> Terry Reedy wrote: >>> >>> To me, as I understand the proposal, the title is wrong. Our current >>> feature releases already are long-term support versions. They get bugfix >>> releases at close to 6-month intervals for 1 1/2 to 2 years and security >>> fixes for 3 years. The only change here is that you propose, for >>> instance, a fixed 6-month interval and 2-year period. >>> >>> As I read this, you propose to introduce a new short-term (interim, >>> preview) feature release along with each bugfix release. Each would have >>> all the bugfixes plus a preview of the new features expected to be in >>> the next long-term release. (I know, this is not exactly how you spun it.) > > The main point of my comment is that the new thing you are introducing > is not long-term supported versions but short-term unsupported versions. That is really a matter of perspective. For the proposed cycle, there would be more regular versions than LTS versions, so they are the exception and get the special name. (And at the same time, the name is already established and people probably grasp instantly what it means.) >> Well, "spinning" is important here. We are not proposing any "preview" >> releases. These would have the same issue as alphas or betas: nobody > > I said nothing about quality. We aim to keep default in near-release > condition and seem to be getting better.
The new unicode is still > getting polished a bit, it seems, after 3 months, but that is fairly > unusual. > >> wants to install them where they could disrupt working applications and >> libraries. >> >> What we are proposing are first-class releases that are as robust as >> any other (and usable in production). > > But I am dubious that releases that are obsolete in 6 months and lack > 3rd party support will see much production use. Whether people would use the releases is probably something that only they can tell us -- that's why a community survey is mentioned in the PEP. Not sure what you mean by lacking 3rd party support. >> It's really about making feature releases more frequent, > > not making previews available during development. > > Given the difficulty of making a complete windows build, it would be > nice to have one made available every 6 months, regardless of how it is > labeled. > > I believe that some people will see and use good-for-6-months releases > as previews of the new features that will be in the 'real', normal, > bug-fix supported, long-term releases. Maybe they will. That's another thing that is made clear in the PEP: for one group of people (those preferring stability over long time), nothing much changes, except that the release period is a little longer, and there are these "previews" as you call them. Georg From p.f.moore at gmail.com Wed Jan 18 08:44:30 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 18 Jan 2012 07:44:30 +0000 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <20120118004221.56da92cb@pitrou.net> Message-ID: On 18 January 2012 04:32, Terry Reedy wrote: >> It's really about making feature releases more frequent, > >> not making previews available during development. 
> > Given the difficulty of making a complete windows build, it would be nice to > have one made available every 6 months, regardless of how it is labeled. > > I believe that some people will see and use good-for-6-months releases as > previews of the new features that will be in the 'real', normal, bug-fix > supported, long-term releases. I'd love to see 6-monthly releases, including Windows binaries, and binary builds of all packages that needed a compiler to build. Oh, and a pony every LTS release :-) Seriously, this proposal doesn't really acknowledge the amount of work by other people that would be needed for a 6-month release to be *usable* in normal cases (by Windows users, at least). It's usually some months after a release on the current schedule that Windows binaries have appeared for everything I use regularly. I could easily imagine 3rd-party developers tending to only focus on LTS releases, making the release cycle effectively *slower* for me, rather than faster. Paul PS Things that might help improve this: (1) PY_LIMITED_API, and (2) support in packaging for binary releases, including a way to force installation of a binary release on the "wrong" version (so that developers don't have to repackage and publish identical binaries every 6 months). From g.brandl at gmx.net Wed Jan 18 08:55:08 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 18 Jan 2012 08:55:08 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <4F16095C.3050701@gmail.com> Message-ID: Am 18.01.2012 01:24, schrieb Jeff Hardy: > On Tue, Jan 17, 2012 at 3:50 PM, Ezio Melotti wrote: >> * What is the effect on PyPy/Jython/IronPython? Can they just skip the >> feature releases and focus on the LTS ones? > > At least for IronPython it's unlikely we'd be able track the feature > releases. We're still trying to catch up as it is. > > Honestly, I don't see the advantages of this. 
Are there really enough > new features planned that Python needs a full release more than every > 18 months? Yes, we think so. (What is a non-full release, by the way?) The main reason is changes in the library. We have been getting complaints about the standard library bitrotting for years now, and one of the main reasons it's so hard to a) get decent code into the stdlib and b) keep it maintained is that the release cycles are so long. It's a tough thing for contributors to accept that the feature you've just implemented will only be in a stable release in 16 months. If the stdlib does not get more reactive, it might just as well be cropped down to a bare core, because 3rd-party libraries do everything as well and do it before we do. But you're right that if Python came without batteries, the current release cycle would be fine. (Another, more far-reaching proposal, has been to move the stdlib out of the cpython repo and share a new repo with Jython/IronPython/PyPy. It could then also be released separately from the core. But this is much more work than the current proposal.) Georg From p.f.moore at gmail.com Wed Jan 18 08:52:20 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 18 Jan 2012 07:52:20 +0000 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <20120118004221.56da92cb@pitrou.net> Message-ID: On 18 January 2012 07:46, Georg Brandl wrote: >> But I am dubious that releases that are obsolete in 6 months and lack >> 3rd party support will see much production use. > > Whether people would use the releases is probably something that only > they can tell us -- that's why a community survey is mentioned in the > PEP. 
The class of people whom we need to consider carefully is those who want to use the latest release, but are limited by the need for other parties to release stuff that works with that release (usually, this means Windows binaries of extensions, or platform vendor packaged releases of modules/packages). For them, if the other parties focus on LTS releases (as is possible, certainly) the release cycle becomes slower, going from 18 months to 24. > Not sure what you mean by lacking 3rd party support. I take it as meaning the people who release Windows binaries on PyPI, and the vendors who package up PyPI distributions in their own distribution formats. Lacking support in the sense that these people might well decide that a 6-month cycle is too fast (too much work) and explicitly decide to focus only on LTS releases. Paul From g.brandl at gmx.net Wed Jan 18 09:00:55 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 18 Jan 2012 09:00:55 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F16095C.3050701@gmail.com> References: <20120117213440.0008fd70@pitrou.net> <4F16095C.3050701@gmail.com> Message-ID: On 18.01.2012 00:50, Ezio Melotti wrote: > Hi, > > On 17/01/2012 22.34, Antoine Pitrou wrote: >> [...] >> >> Proposal >> ======== >> >> Under the proposed scheme, there would be two kinds of feature >> versions (sometimes dubbed "minor versions", for example 3.2 or 3.3): >> normal feature versions and long-term support (LTS) versions. >> >> Normal feature versions would get either zero or at most one bugfix >> release; the latter only if needed to fix critical issues. Security >> fix handling for these branches needs to be decided. > > If non-LTS releases won't get bug fixes, a bug that is fixed in 3.3.x > might not be fixed in 3.4, unless the bugfix releases are > synchronized with the new feature releases (see below). That's already the case today.
3.2.5 might be released before 3.3.1 and therefore include bugfixes that 3.3.0 doesn't. True, there will be a 3.3.1 afterwards that does include it, but in the new case, there will be a new feature release instead. >> LTS versions would get regular bugfix releases until the next LTS >> version is out. They then would go into security fixes mode, up to a >> termination date at the release manager's discretion. >> >> Periodicity >> ----------- >> >> A new feature version would be released every X months. We >> tentatively propose X = 6 months. >> >> LTS versions would be one out of N feature versions. We tentatively >> propose N = 4. > > If LTS bugfix releases and feature releases are synchronized, we will > have something like:
>
> 3.3
> 3.3.1 / 3.4
> 3.3.2 / 3.5
> 3.3.3 / 3.6
> 3.7
> 3.7.1 / 3.8
> ...
>
> so every new feature release will have all the bug fixes of the current > LTS release, plus new features. > > With this scheme we will soon run out of 1-digit numbers though. > Currently we already have a 3.x release every ~18 months, so if we keep > doing that (just every 24 months instead of 18) and introduce the > feature releases in between under a different versioning scheme, we > might avoid the problem. > > This means:
> 3.1
> ... 18 months, N bug fix releases ...
> 3.2
> ... 18 months, N bug fix releases ...
> 3.3 LTS
> ... 24 months, 3 bug fix releases, 3 feature releases ...
> 3.4 LTS
> ... 24 months, 3 bug fix releases, 3 feature releases ...
> 3.5 LTS
>
> In this way we solve the numbering problem and keep a familiar scheme > (all the 3.x will be LTS and will be released at the same pace as > before, no need to mark some 3.x as LTS). OTOH this will make the > feature releases less "noticeable" and people might just ignore them and > stick with the LTS releases. Also we would need to define a versioning > convention for the feature releases. Let's see how Guido feels about 3.10 first. >> [...]
>> >> Effect on bugfix cycle >> ---------------------- >> >> The effect on fixing bugs should be minimal with the proposed figures. >> The same number of branches would be simultaneously open for regular >> maintenance (two until 2.x is terminated, then one). > > Wouldn't it still be two? > Bug fixes will go to the last LTS and on default, features only on default. "Maintenance" excludes the feature development branch here. Will clarify. >> Effect on workflow >> ------------------ >> >> The workflow for new features would be the same: developers would only >> commit them on the ``default`` branch. >> >> The workflow for bug fixes would be slightly updated: developers would >> commit bug fixes to the current LTS branch (for example ``3.3``) and >> then merge them into ``default``. > > So here the difference is that instead of committing on the previous > release (what currently is 3.2), we commit it to the previous LTS > release, ignoring the ones between that and default. Yes. >> If some critical fixes are needed to a non-LTS version, they can be >> grafted from the current LTS branch to the non-LTS branch, just like >> fixes are ported from 3.x to 2.7 today. >> >> Effect on the community >> ----------------------- >> >> People who value stability can just synchronize on the LTS releases >> which, with the proposed figures, would give a similar support cycle >> (both in duration and in stability). > > That's why I proposed to keep the same versioning scheme for these > releases, and have a different numbering for the feature releases. > >> [...] >> >> Discussion >> ========== >> >> These are open issues that should be worked out during discussion: >> >> * Decide on X (months between feature releases) and N (feature releases >> per LTS release) as defined above. > > This doesn't necessarily have to be fixed, especially if we don't change > the versioning scheme (so we don't need to know that we have a LTS > release every N releases). 
For these relatively short times (X = 6 months), I feel it is important to fix the time spans to have predictability for our developers. Georg From mark at hotpy.org Wed Jan 18 09:47:57 2012 From: mark at hotpy.org (Mark Shannon) Date: Wed, 18 Jan 2012 08:47:57 +0000 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: References: <4F15F041.6010607@hotpy.org> Message-ID: <4F16873D.5090507@hotpy.org> Matt Joiner wrote: > Just to clarify, this differs in functionality from enhanced generators > by allowing you to yield from an arbitrary call depth rather than having > to "yield from" through a chain of calling generators? Furthermore > there's no syntactical change except to the bottommost frame doing a > co_yield? Does this capture the major differences? > Yes. From mark at hotpy.org Wed Jan 18 10:23:49 2012 From: mark at hotpy.org (Mark Shannon) Date: Wed, 18 Jan 2012 09:23:49 +0000 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> Message-ID: <4F168FA5.2000503@hotpy.org> Glyph wrote: > On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote: > >> Lets start controversially: I don't like PEP 380, I think it's a kludge. > > Too late; it's already accepted. There's not much point in making > controversial statements about it now. Why is it too late? Presenting this as a fait accompli does not make it any better. The PEP mailing list is closed to most people, so what forum for debate is there? > >> I think that CPython should have proper coroutines, rather than add >> more bits and pieces to generators in an attempt to make them more >> like coroutines. > > By "proper" coroutines, you mean implicit coroutines (cooperative > threads) rather than explicit coroutines (cooperative generators). Nothing "implicit" about it. > Python has been going in the "explicit" direction on this question for > a long time. 
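[A minimal sketch of the chaining Matt describes above: with PEP 380 (Python 3.3+ syntax), every intermediate frame between the consumer and the yielding frame must itself be a generator and must re-delegate explicitly, whereas Mark's proposal would allow plain function calls in between. Function names here are illustrative only.]

```python
def inner():
    # The bottommost frame actually producing values.
    yield 1
    yield 2

def middle():
    # Under PEP 380, this intermediate frame must itself be a
    # generator, re-delegating with an explicit "yield from".
    yield from inner()

def outer():
    yield from middle()

values = list(outer())
```

With an ordinary (non-generator) `middle()` in the call chain, the values could not be yielded through to `outer()`'s caller; that restriction is the "chain of calling generators" being discussed.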
(And, in my opinion, this is the right direction to go, > but that's not really relevant here.) You can use asymmetric coroutines with a scheduler to provide cooperative threads if you want, but coroutines do not have to be used as threads. The key advantages of my coroutine implementation over PEP 380 are:
1. No syntax change.
2. Code can be used in coroutines without modification.
3. No stack unwinding is required at a yield point.
> > I think this discussion would be more suitable for python-ideas though, > since you have a long row to hoe here. There's already a PEP - > http://www.python.org/dev/peps/pep-0219/ - apparently deferred and not > rejected, which you may want to revisit. > > There are several libraries which can give you cooperative threading > already; I assume you're already aware of greenlet and stackless, but I > didn't see what advantages your proposed implementation provides over > those. I would guess that one of the first things you should address on > python-ideas is why adopting your implementation would be a better idea > than just bundling one of those with the standard library :). Already been discussed: http://mail.python.org/pipermail/python-ideas/2011-October/012571.html All of the objections to coroutines (as I propose) also apply to PEP 380. The advantage of my implementation over greenlets is portability. I suspect stackless is actually fairly similar to what I have done; I haven't checked in detail. Cheers, Mark. From victor.stinner at haypocalc.com Wed Jan 18 10:54:26 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 18 Jan 2012 10:54:26 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <4F167197.5020109@v.loewis.de> References: <4F15DF72.1060201@v.loewis.de> <4F167197.5020109@v.loewis.de> Message-ID: 2012/1/18 "Martin v. Löwis" : > For 3.3 onwards, I'm skeptical whether all this configuration support is > really necessary.
I think a much smaller patch which leaves no choice > would be more appropriate. The configuration helps unit testing: see the changes to Lib/test/*.py in my last patch. I hesitate to say that the configuration is required for tests. Anyway, users upgrading from Python 3.2 to 3.3 may need to keep the same hash function and don't care about security (e.g. programs running locally with trusted data). Victor From hrvoje.niksic at avl.com Wed Jan 18 11:15:49 2012 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Wed, 18 Jan 2012 11:15:49 +0100 Subject: [Python-Dev] Status of the fix for the hash collision vulnerability In-Reply-To: <4F15DA3F.4010603@v.loewis.de> References: <4F125953.5060309@pearwood.info> <20120117091636.Horde.6hzGLqGZi1VPFS5kLcfCSXA@webmail.df.eu> <8739be7ff5.fsf@uwakimon.sk.tsukuba.ac.jp> <4F15DA3F.4010603@v.loewis.de> Message-ID: <4F169BD5.4030703@avl.com> On 01/17/2012 09:29 PM, "Martin v. Löwis" wrote:
> I(0) = H & MASK
> PERTURB(0) = H
> I(n+1) = (5*I(n) + 1 + PERTURB(n)) & MASK
> PERTURB(n+1) = PERTURB(n) >> 5
>
> So if two objects O1 and O2 have the same hash value H, the sequence of > probed indices is the same for any MASK value. It will be a different > sequence, yes, but they will still collide on each and every slot. > > This is the very nature of open addressing. Open addressing can still deploy a collision resolution mechanism without this property. For example, double hashing uses a different hash function (applied to the key) to calculate PERTURB(0). To defeat it, the attacker would have to produce keys that hash the same under both hash functions. Double hashing is not a good general solution for Python dicts because it complicates the interface of hash tables that support arbitrary keys. Still, it could be considered for dicts with known key types (built-ins could hardcode the alternative hash function) or for SafeDicts, if they are still considered.
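[A toy Python rendering of the probe recurrence quoted above, together with the double-hashing variant Hrvoje suggests. This is a sketch only: the second hash function here is a made-up multiplicative mix, not anything actually proposed for CPython.]

```python
def probe(h, mask, perturb=None, steps=8):
    """Yield the slot indices open addressing would try for an entry
    with hash h, in a table of mask + 1 slots (CPython-style recurrence)."""
    if perturb is None:
        perturb = h          # CPython seeds the perturbation with the hash itself
    i = h & mask
    for _ in range(steps):
        yield i
        i = (5 * i + 1 + perturb) & mask
        perturb >>= 5

H = 0x9E3779B9
# With the stock recurrence, the sequence depends only on H and the mask,
# so two keys sharing hash H probe identical slots for *any* table size:
for mask in (7, 255, 4095):
    assert list(probe(H, mask)) == list(probe(H, mask))

# Double hashing: seed the perturbation from a second, key-dependent hash
# (a made-up 32-bit mix here), so keys that collide on H still part ways.
second_hash = lambda key: (key * 0x85EBCA6B) & 0xFFFFFFFF
a = list(probe(H, 255, perturb=second_hash(1)))
b = list(probe(H, 255, perturb=second_hash(2)))
assert a[0] == b[0]   # the first slot still collides (same H)
assert a != b         # but the subsequent probes diverge
```

This illustrates Martin's point (equal hashes collide on every probed slot, whatever MASK is) and why a key-dependent PERTURB(0) breaks that property.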
Hrvoje From solipsis at pitrou.net Wed Jan 18 12:00:00 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 12:00:00 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions References: <20120117213440.0008fd70@pitrou.net> <20120118004221.56da92cb@pitrou.net> Message-ID: <20120118120000.7aaae1ad@pitrou.net> On Wed, 18 Jan 2012 07:52:20 +0000 Paul Moore wrote: > On 18 January 2012 07:46, Georg Brandl wrote: > >> But I am dubious that releases that are obsolete in 6 months and lack > >> 3rd party support will see much production use. > > > > Whether people would use the releases is probably something that only > > they can tell us -- that's why a community survey is mentioned in the > > PEP. > > The class of people who we need to consider carefully is those who > want to use the latest release, but are limited by the need for other > parties to release stuff that works with that release (usually, this > means Windows binaries of extensions, or platform vendor packaged > releases of modules/packages). Well, do consider, though, that anyone not using third-party C extensions under Windows (either Windows users that are content with pure Python libs, or users of other platforms) won't have that problem. That should be quite a lot of people already. As for vendors, they have their own release management independent of ours already, so this PEP wouldn't change anything for them. Regards Antoine. From solipsis at pitrou.net Wed Jan 18 12:15:30 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 12:15:30 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120118121530.2e6a3b52@pitrou.net> On Wed, 18 Jan 2012 11:37:08 +0900 "Stephen J. 
Turnbull" wrote: > > availability of release management volunteers, > > Dramatic increase here. It may look like RM is not so demanding -- > run a few scripts to put out the alphas/betas/releases. But the RM > needs to stay on top of breaking news, make decisions. That takes > time, interrupts other work, etc. Georg and Barry may answer you here: they are release managers and PEP co-authors. > > quick availability of new features (and behavioural changes), > > These are already *available*, just not *tested*. > > Since testing is the bottleneck on what users consider to be > "available for me", you cannot decrease the amount of testing (alpha, > beta releases) by anywhere near the amount you're increasing > frequency, or you're just producing "as is" snapshots. The point is to *increase* the amount of testing by making features available in stable releases on a more frequent basis. Not decrease it. Alphas and betas never produce much feedback, because people are reluctant to install them for anything else than toying around. Python is not emacs or Firefox, you don't use it in a vacuum and therefore installing non-stable versions is dangerous. Regards Antoine. From ncoghlan at gmail.com Wed Jan 18 12:26:19 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Jan 2012 21:26:19 +1000 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <20120117213440.0008fd70@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> Message-ID: This won't be a surprise to Antoine or Georg (since I've already expressed the same opinion privately), but I'm -1 on the idea of official releases of the whole shebang every 6 months. We're not Ubuntu, Fedora, Chrome or Firefox with a for-profit company (or large foundation) with multiple paid employees kicking around to really drive the QA process. 
If we had official support from Red Hat or Canonical promising to devote paid QA and engineering resources to keeping things on track my opinion might be different, but that is highly unlikely. I'm also wholly in agreement with Ezio that using the same versioning scheme for both full releases and interim releases is thoroughly confusing for users (for example, I consider Red Hat's completely separate branding and versioning for Fedora and RHEL a better model for end users than Canonical's more subtle 'Ubuntu' and 'Ubuntu LTS' distinction, and that's been my opinion since long before I started working for RH). My original suggestion to Antoine and Georg for 3.4 was that we simply propose to Larry Hastings (the 3.4 RM) that we spread out the release cycle, releasing the first alpha after ~6 months, the second after about ~12, then rolling into the regular release cycle of a final alpha, some beta releases, one or two release candidates and then the actual release. However, I'm sympathetic to Antoine's point that early alphas aren't likely to be at all interesting to folks that would like a fully supported stdlib update to put into production and no longer think that suggestion makes much sense on its own. Instead, if the proposal involves instituting a PEP 3003 style moratorium (i.e. stdlib changes only) for all interim releases, then we're essentially talking about splitting the versioning of the core language (and the CPython C API) and the standard library. If we're going to discuss that, we may as well go a bit further and just split development of the two out onto separate branches, with the current numbering scheme applying to full language version releases and switching to a date-based versioning scheme for the standard library (i.e. if 3.3 goes out in August as planned, then it would be "Python 3.3 with the 12.08 stdlib release"). What might such a change mean? 1. 
For 3.3, the following releases would be made:
- 3.2.x is cut from the 3.2 branch (1 rc + 1 release)
- 3.3.0 + PyStdlib 12.08 is created from the default branch (1 alpha, 2 betas, 1+ rc, 1 release)
- the 3.3 maintenance branch is created
- the stdlib development branch is created
2. Once 3.2 goes into security-fix only mode, this would then leave us with 4 active branches:
- 2.7 (maintenance)
- 3.3 (maintenance)
- stdlib (Python 3.3 compatible, PEP 3003 compliant updates)
- default (3.4 development)
The 2.7 branch would remain a separate head of development, but for 3.x development the update flow would become:
Bug fixes: 3.3->stdlib->default
Stdlib features: stdlib->default
Language changes: default
3. Somewhere around February 2013, we prepare to release Python 3.4a1 and 3.3.1, along with PyStdlib 13.02:
- 3.3.1 + PyStdlib 12.08 is cut from the 3.3 branch (1 rc + 1 release)
- 3.3.1 + PyStdlib 13.02 comes from the stdlib branch (1 alpha, 1 beta, 1+ rc, 1 release)
- 3.4.0a1 comes from the default branch (may include additional stdlib changes)
4. Around August 2013 this process repeats:
- 3.3.2 + PyStdlib 12.08 is cut from the 3.3 branch
- 3.3.2 + PyStdlib 13.08 comes from the stdlib branch (final 3.3 compatible stdlib release)
- 3.4.0a2 comes from the default branch
5. And then in February 2014, we gear up for a new major release:
- 3.3.3 is cut from the 3.3 branch and the 3.3 branch enters security-fix only mode
- 3.4.0 + PyStdlib 14.02 is created from the default branch (1 alpha, 2 betas, 1+ rc, 1 release)
- the 3.4 maintenance branch is created and merged into the stdlib branch
(alternatively, Feb 2014 could be another interim release of 3.4 alpha and a 3.3 compatible stdlib update, with 3.4 delayed until August 2014)
I believe this approach would get to the core of what the PEP authors want (i.e.
more frequent releases of the standard library), while being quite explicit in *avoiding* the concerns associated with more frequent releases of the core language itself. The rate of updates on the language spec, the C API (and ABI), the bytecode format and the AST would remain largely unchanged at 18-24 months. Other key protocols (e.g. default pickle formats) could also be declared ineligible for changes in interim releases. If a critical security problem is found, then additional releases may be cut for the maintenance branch and for the stdlib branch. There's a slight annoyance in having all development filtered through an additional branch, but there's a large advantage in that having a stable core in the stdlib branch makes it more likely we'll be able to use it as a venue for collaboration with the PyPy, Jython and IronPython folks (they all have push rights and a separate branch means they can use it without having to worry about any of the core changes going on in the default branch). A separate branch with combined "3.x.y + PyStdlib YY.MM" releases is also significantly less work than trying to split the stdlib out completely into a separate repo. Regards, Nick. From glyph at twistedmatrix.com Wed Jan 18 12:27:39 2012 From: glyph at twistedmatrix.com (Glyph) Date: Wed, 18 Jan 2012 06:27:39 -0500 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <4F168FA5.2000503@hotpy.org> References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> Message-ID: <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> On Jan 18, 2012, at 4:23 AM, Mark Shannon wrote: > Glyph wrote: >> On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote: >>> Lets start controversially: I don't like PEP 380, I think it's a kludge. >> Too late; it's already accepted. There's not much point in making controversial statements about it now. > > Why is it too late? Because discussion happens before the PEP is accepted. 
See the description of the workflow in . The time to object to PEP 380 was when those threads were going on. > Presenting this as a fait accompli does not make it any better. But it is[1] a fait accompli, whether you like it or not; I'm first and foremost informing you of the truth, not trying to make you feel better (or worse). Secondly, I am trying to forestall a long and ultimately pointless conversation :). > The PEP mailing list is closed to most people, The PEP mailing list is just where you submit your PEPs, and where the PEP editors do their work. I'm not on it, but to my understanding of the process, there's not really any debate there. > so what forum for debate is there? python-ideas, and then this mailing list, in that order. Regarding PEP 380 specifically, there's been quite a bit. See for example . Keep in mind that the purpose of debate in this context is to inform Guido's opinion. There's no voting involved, although he will occasionally delegate decisions about particular PEPs to people knowledgeable in a relevant area. >> I think this discussion would be more suitable for python-ideas though [...] > Already been discussed: > http://mail.python.org/pipermail/python-ideas/2011-October/012571.html If you're following the PEP process, then the next step would be for you (having built some support) to author a new PEP, or to resurrect the deferred Stackless PEP with some new rationale - personally I'd recommend the latter. My brief skimming of the linked thread doesn't indicate you have a lot of strong support though, just some people who would be somewhat interested. So I still think it bears more discussion there, especially on the motivation / justification side of things. > All of the objections to coroutines (as I propose) also apply to PEP 380. You might want to see the video of Guido's "Fireside Chat" last year . Skip to a little before 15:00. 
He mentions the point that coroutines that can implicitly switch out from under you have the same non-deterministic property as threads: you don't know where you're going to need a lock or lock-like construct to update any variables, so you need to think about concurrency more deeply than if you could explicitly always see a 'yield'. I have more than one "painful event in my past" (as he refers to it) indicating that microthreads have the same problem as real threads :). (And yes, they're microthreads, even if you don't have an elaborate scheduling construct. If you can switch to another stack by making a function call, then you are effectively context switching, and it can become arbitrarily complex. Any coroutine in a system may introduce an arbitrarily complex microthread scheduler just by calling a function that yields to it.) -glyph ([1]: Well actually it isn't, note the dashed line from "Accepted" to "Rejected" in the workflow diagram. But you have to have a really darn good reason, and championing the rejection of a pep that Guido has explicitly accepted and has liked from pretty much the beginning is going to be very, very hard.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Jan 18 13:30:10 2012 From: barry at python.org (Barry Warsaw) Date: Wed, 18 Jan 2012 07:30:10 -0500 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <4F167290.4090800@v.loewis.de> References: <4F15E130.6010200@v.loewis.de> <4F167290.4090800@v.loewis.de> Message-ID: <20120118073010.39c080e6@resist.wooz.org> On Jan 18, 2012, at 08:19 AM, Martin v. L?wis wrote: >My concern is not about breaking doctests: this proposal will also break >them. My concern is about applications that assume that hash(s) is >stable across runs, and we do have reports that it will break >applications. I am a proponent of doctests, and thus use them heavily. 
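[The sort-before-printing convenience Barry describes can be sketched like this — a hypothetical helper, not his actual Launchpad code: printing a dict's items in sorted key order makes the expected doctest output independent of hash ordering.]

```python
import doctest

def show_dict(d):
    """Print a dict's items in sorted key order, so the doctest's
    expected output never depends on hash ordering.

    >>> show_dict({'b': 2, 'a': 1})
    a=1
    b=2
    """
    for key in sorted(d):
        print("{0}={1}".format(key, d[key]))

# Run the docstring example explicitly and check that it passes.
runner = doctest.DocTestRunner()
finder = doctest.DocTestFinder()
for test in finder.find(show_dict, "show_dict", globs={"show_dict": show_dict}):
    runner.run(test)
assert runner.failures == 0
```

Without the `sorted()` call, the same doctest would pass or fail depending on the dict's iteration order — exactly the breakage being discussed for randomized hashes.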
I can tell you that the issue of dict hashing (non-)order has been well known for *years* and I have convenience functions in my own doctests to sort and print dict elements. Back in my Launchpad days (which has oodles of doctests), many years ago we went on a tear to fix dict printing when some change in Python caused them to break. So I'm not personally worried that such a change would break any of my own code. Even though I hope anybody who uses doctests has their own workarounds for this, I still support being conservative in default behavior for stable releases, because it's the right thing to do for our users. -Barry From solipsis at pitrou.net Wed Jan 18 13:30:13 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 13:30:13 +0100 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> Message-ID: <1326889813.3395.37.camel@localhost.localdomain> Le mercredi 18 janvier 2012 ? 21:26 +1000, Nick Coghlan a ?crit : > I'm also wholly in agreement with Ezio that using the > same versioning scheme for both full releases and interim releases is > thoroughly confusing for users It's a straight-forward way to track the feature support of a release. How do you suggest all these "sys.version_info >= (3, 2)" - and the corresponding documentation snippets a.k.a "versionadded" or "versionchanged" tags - be spelt otherwise? > for example, I consider Red Hat's > completely separate branding and versioning for Fedora and RHEL a > better model for end users It's not only branding and versioning, is it? They're completely different projects with different goals (and different commercial support). If you're suggesting we do only short-term releases and leave the responsibility of long-term support to another project or entity, I'm not against it, but it's far more radical than what we are proposing in the PEP :-) > Instead, if the proposal involves instituting a PEP 3003 style > moratorium (i.e. 
stdlib changes only) for all interim releases, then > we're essentially talking about splitting the versioning of the core > language (and the CPython C API) and the standard library. If we're > going to discuss that, we may as well go a bit further and just split > development of the two out onto separate branches, with the current > numbering scheme applying to full language version releases and > switching to a date-based versioning scheme for the standard library > (i.e. if 3.3 goes out in August as planned, then it would be "Python > 3.3 with the 12.08 stdlib release"). Well, you're opposing the PEP on the basis that it's workforce-intensive but you're proposing something much more workforce-intensive :-) Splitting the stdlib: - requires someone to do the splitting (highly non-trivial given the interactions of some modules with interpreter details or low-level C code) - requires setting up separate resources (continuous integration with N stdlib versions and M interpreter versions, for example) - requires separate maintenance and releases for the stdlib (but with non-trivial interaction with interpreter maintenance, since they will affect each other and must be synchronized for Python to be usable at all) - requires more attention by users since there are now *two* release schedules and independent version numbers to track The former two are one-time costs, but the latter two are recurring costs. Therefore, splitting the stdlib is much more complicated and involved than many people think; it's not just "move a few directories around and be done". And it's not even obvious it would have an actual benefit, since developers of other implementations are busy doing just that (see Jeff Hardy's message in this thread). Regards Antoine. From stephen at xemacs.org Wed Jan 18 13:48:58 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 18 Jan 2012 21:48:58 +0900 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <20120118121530.2e6a3b52@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> Message-ID: <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > > Since testing is the bottleneck on what users consider to be > > "available for me", you cannot decrease the amount of testing (alpha, > > beta releases) by anywhere near the amount you're increasing > > frequency, or you're just producing "as is" snapshots. > > The point is to *increase* the amount of testing by making features > available in stable releases on a more frequent basis. Not decrease > it. We're talking about different kinds of testing. You're talking about (what old-school commercial software houses meant by "beta") testing in a production or production prototype environment. I'd love to see more of that, too! My claim is that I don't expect much uptake if you don't do close to as many of what are called "alpha" and "beta" tests on python-dev as are currently done. > Alphas and betas never produce much feedback, because people are > reluctant to install them for anything else than toying around. Python > is not emacs or Firefox, you don't use it in a vacuum > and therefore installing non-stable versions is dangerous. Exactly my point, except that the PEP authors seem to think that we can cut back on the number of alpha and beta prereleases and still achieve the stability that such users expect from a Python release. I don't think that's right. I expect that unless quite substantial resources (far more than "proportional to 1/frequency") are devoted to each non-LTS release, a large fraction of such users to avoid non-LTS releases the way they avoid betas now. 
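[The dict-printing workaround Barry describes earlier in this thread — sorting dict elements before display so doctest output doesn't depend on hash order — might look roughly like the following. This is an illustrative sketch, not Launchpad's actual helper.]

```python
def sorted_dict_repr(d):
    """Render a dict with keys in sorted order, so doctest output does
    not depend on hash-based dict ordering (illustrative helper only).

    >>> sorted_dict_repr({'b': 2, 'a': 1})
    "{'a': 1, 'b': 2}"
    """
    items = ', '.join('%r: %r' % (key, d[key]) for key in sorted(d))
    return '{%s}' % items

print(sorted_dict_repr({'b': 2, 'a': 1}))
```

A doctest written against such a helper keeps passing even if the string hash (and hence dict iteration order) changes between runs or between Python versions.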
From solipsis at pitrou.net Wed Jan 18 14:02:07 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 14:02:07 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1326891727.3395.44.camel@localhost.localdomain> Le mercredi 18 janvier 2012 à 21:48 +0900, Stephen J. Turnbull a écrit : > My claim is that I don't expect much uptake if you > don't do close to as many of what are called "alpha" and "beta" tests > on python-dev as are currently done. You claim people won't use stable releases because of not enough alphas? That sounds completely unrelated. I don't know of any users who would bother about that. (you can produce flimsy software with many alphas, too) > > Alphas and betas never produce much feedback, because people are > > reluctant to install them for anything else than toying around. Python > > is not emacs or Firefox, you don't use it in a vacuum > > and therefore installing non-stable versions is dangerous. > > Exactly my point, except that the PEP authors seem to think that we > can cut back on the number of alpha and beta prereleases and still > achieve the stability that such users expect from a Python release. I > don't think that's right. Sure, and we think it is :) Regards Antoine.
From ncoghlan at gmail.com Wed Jan 18 15:08:49 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Jan 2012 00:08:49 +1000 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: <1326889813.3395.37.camel@localhost.localdomain> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: On Wed, Jan 18, 2012 at 10:30 PM, Antoine Pitrou wrote: > Splitting the stdlib: > - requires someone to do the splitting (highly non-trivial given the > interactions of some modules with interpreter details or low-level C > code) > - requires setting up separate resources (continuous integration with N > stdlib versions and M interpreter versions, for example) > - requires separate maintenance and releases for the stdlib (but with > non-trivial interaction with interpreter maintenance, since they will > affect each other and must be synchronized for Python to be usable at > all) > - requires more attention by users since there are now *two* release > schedules and independent version numbers to track Did you read what I actually proposed? I specifically *didn't* propose separate stdlib releases (for all the reasons you point out), only separate date based stdlib *versioning*. Distribution of the CPython interpreter + stdlib would remain monolithic, as it is today. Any given stdlib release would only be supported for the most recent language release. The only difference is that between language releases, where we currently only release maintenance builds, we'd *also* release a second version of each maintenance build with an updated standard library, along with an alpha release of the next language version (with the last part being entirely optional, but I figured I may as well make the suggestion since I like the idea to encourage getting syntax updates and the like out for earlier experimentation). 
When you initially pitched the proposal via email, you didn't include the "language moratorium applies to interim releases" idea. That one additional suggestion makes the whole concept *much* more appealing to me, but I only like it on the condition that we decouple the stdlib versioning from the language definition versioning (even though I recommend we only officially support very specific combinations of the two). My suggestion is really just a concrete proposal for implementing Ezio's idea of only bumping the Python version for releases with long term support, and using some other mechanism to distinguish the interim releases. So, assuming a 2 year LTS cycle, the released versions up to February 2015 with my suggestion would end up being: From the default branch: Python 3.3.0 + stdlib 12.08.0 (~August 2012) Python 3.4.0a1 + stdlib 14.08.0a1 (~February 2013) Python 3.4.0a2 + stdlib 14.08.0a2 (~August 2013) Python 3.4.0a3 + stdlib 14.08.0a3 (~February 2014) Python 3.4.0a4 + stdlib 14.08.0a4 (~2014) Python 3.4.0b1 + stdlib 14.08.0b1 (~2014) Python 3.4.0b2 + stdlib 14.08.0b2 (~2014) Python 3.4.0c1 + stdlib 14.08.0c1 (~2014) Python 3.4.0 + stdlib 14.08 (~August 2014) Python 3.5.0a1 + stdlib 16.08.0a1 (~February 2015) From the 3.3 maintenance branch (these are maintenance updates to the "LTS" release): Python 3.3.1 + stdlib 12.08.1 (~February 2013) Python 3.3.2 + stdlib 12.08.2 (~August 2013) Python 3.3.3 + stdlib 12.08.3 (~February 2014) Python 3.3.4 + stdlib 12.08.4 (~August 2014) (and 3.3 branch enters security patch only mode) From the 3.4 maintenance branch (these are maintenance updates to the "LTS" release): Python 3.4.1 + stdlib 14.08.1 (~February 2015) From the stdlib feature development branch (these are the new interim releases with standard library updates only as proposed by PEP 407): Python 3.3.1 + stdlib 13.02.0 (~February 2013) Python 3.3.2 + stdlib 13.08.0 (~August 2013) Python 3.3.3 + stdlib 14.02.0 (~February 2014) (only upgrade path from here
is to make the jump to 3.4.0) -- 3.4.0 + 12.08.0 is released from default branch -- Python 3.4.1 + stdlib 15.02.0 (~February 2015) If we have to make "brown paper bag" releases for the maintenance or stdlib branches then the micro versions get bumped - the date based version of the standard library versions relates to when that particular *API* was realised, not when bugs were last fixed in it. If a target release date slips, then the stdlib version would be increased accordingly (cf. Ubuntu 6.06). Yes, we'd have an extra set of active buildbots to handle the stdlib branch, but a) that's no harder than creating the buildbots for a new maintenance branch and b) the interim release proposal will need to separate language level changes from stdlib level changes *anyway*. As far as how sys.version checks would be updated, I would propose a simple API addition to track the new date-based standard lib versioning: sys.stdlib_version. People could choose to just depend on a specific Python version (implicitly depending on the stdlib version that was originally shipped with that version of CPython), or they may instead decide to depend on a specific stdlib version (implicitly depending on the first Python version that was shipped with that stdlib). The reason I like this scheme is that it allows us (and users) to precisely track the things that can vary at the two different rates. At least the following would still be governed by changes in the first two fields of sys.version (i.e. the major Python version): - deprecation policy - language syntax - compiler AST - C ABI stability - Windows compilation suite and C runtime version - anything else we decide to link with the Python language version (e.g. 
default pickle protocol) However, the addition of date based stdlib versioning would allow us to clearly identify the new interim releases proposed by PEP 407 *without* mucking up all those things that are currently linked to sys.version and really *shouldn't* be getting updated every 6 months. Users get a clear guarantee that if they follow the stdlib updates instead of the regular maintenance releases, they'll get nice new features along with their bug fixes, but no new deprecations or backwards incompatible API changes. However, they're also going to be obliged to transition to each new language release as it comes out if they want to continue getting security updates. Basically, what it boils down to is that I'm now +1 on the general proposal in the PEP, *so long as*: 1. We get a separate Hg branch for "stdlib only" changes and default becomes the destination specifically for "language update" changes (with the latter being a superset of the former) 2. The proposed "interim releases" are denoted by a new date-based sys.stdlib_version field and sys.version retains its current meaning (and slow rate of change) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Wed Jan 18 16:06:07 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Jan 2012 01:06:07 +1000 Subject: [Python-Dev] [Python-checkins] Daily reference leaks (12de1ad1cee8): sum=6024 In-Reply-To: References: Message-ID: On Wed, Jan 18, 2012 at 2:31 PM, wrote: > results for 12de1ad1cee8 on branch "default" > -------------------------------------------- > > test_capi leaked [2008, 2008, 2008] references, sum=6024 Yikes, you weren't kidding about that new subinterpreter code execution test upsetting the refleak detection... Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stephen at xemacs.org Wed Jan 18 16:25:10 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 19 Jan 2012 00:25:10 +0900 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <1326891727.3395.44.camel@localhost.localdomain> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> Message-ID: <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > You claim people won't use stable releases because of not enough > alphas? That sounds completely unrelated. Surely testing is related to user perceptions of stability. More testing helps reduce bugs in released software, which improves user perception of stability, encouraging them to use the software in production. Less testing, then, will have the opposite effect. But you understand that theory, I'm sure. So what do you mean to say? > (you can produce flimsy software with many alphas, too) The problem is the converse: can you produce Python-release-quality software with much less pre-release testing than current feature releases get? > Sure, and we think it is [possible to do that] :) Given the relative risk of rejecting PEP 407 and me being wrong (the status quo really isn't all that bad AFAICS), vs. accepting PEP 407 and you being wrong, I don't find a smiley very convincing. In fact, I don't find the PEP itself convincing -- and I'm not the only one. We'll see what Barry and Georg have to say. 
From solipsis at pitrou.net Wed Jan 18 16:51:59 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 16:51:59 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1326901919.3395.67.camel@localhost.localdomain> Le jeudi 19 janvier 2012 à 00:25 +0900, Stephen J. Turnbull a écrit : > > > You claim people won't use stable releases because of not enough > > alphas? That sounds completely unrelated. > > Surely testing is related to user perceptions of stability. More > testing helps reduce bugs in released software, which improves user > perception of stability, encouraging them to use the software in > production. I have asked a practical question, a theoretical answer isn't exactly what I was waiting for. > > Sure, and we think it is [possible to do that] :) > > Given the relative risk of rejecting PEP 407 and me being wrong (the > status quo really isn't all that bad AFAICS), vs. accepting PEP 407 > and you being wrong, I don't find a smiley very convincing. I don't care to convince *you*, since you are not involved in Python development and release management (you haven't ever been a contributor AFAIK). Unless you produce practical arguments, saying "I don't think you can do it" is plain FUD and certainly not worth answering to. Regards Antoine.
From senthil at uthcode.com Wed Jan 18 16:54:49 2012 From: senthil at uthcode.com (Senthil Kumaran) Date: Wed, 18 Jan 2012 23:54:49 +0800 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> Message-ID: <20120118155449.GE1958@mathmagic> On Wed, Jan 18, 2012 at 09:26:19PM +1000, Nick Coghlan wrote: > My original suggestion to Antoine and Georg for 3.4 was that we simply > propose to Larry Hastings (the 3.4 RM) that we spread out the release > cycle, releasing the first alpha after ~6 months, the second after > about ~12, then rolling into the regular release cycle of a final > alpha, some beta releases, one or two release candidates and then the > actual release. However, I'm sympathetic to Antoine's point that early > alphas aren't likely to be at all interesting to folks that would like > a fully supported stdlib update to put into production and no longer > think that suggestion makes much sense on its own. This looks like a 'good bridge' suggestion between rapid releases and stable releases. What would be the purpose of an alpha release? Would we encourage people to use it or test it? With the rapid release cycle, the encouragement is to use rather than test. -- Senthil From solipsis at pitrou.net Wed Jan 18 16:56:04 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 16:56:04 +0100 Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024 References: Message-ID: <20120118165604.23c66c00@pitrou.net> On Thu, 19 Jan 2012 01:06:07 +1000 Nick Coghlan wrote: > On Wed, Jan 18, 2012 at 2:31 PM, wrote: > > results for 12de1ad1cee8 on branch "default" > > -------------------------------------------- > > > > test_capi leaked [2008, 2008, 2008] references, sum=6024 > > Yikes, you weren't kidding about that new subinterpreter code > execution test upsetting the refleak detection...
Well, these are real leaks, but I expect them to be quite difficult to track (I've found a couple of them), because they can be scattered around in C module initialization routines and the like. I suggest we skip this test on refleak runs. cheers Antoine. From pje at telecommunity.com Wed Jan 18 17:01:10 2012 From: pje at telecommunity.com (PJ Eby) Date: Wed, 18 Jan 2012 11:01:10 -0500 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <4F161942.5040100@v.loewis.de> References: <4F15E130.6010200@v.loewis.de> <20120117222611.64b3fd4e@pitrou.net> <4F161942.5040100@v.loewis.de> Message-ID: On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" wrote: > Am 17.01.2012 22:26, schrieb Antoine Pitrou: > > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits > > could cache a "hash perturbation" computed from the string and the > > random bits: > > > > - hash() would use ob_shash > > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3)) > > > > This way, you cache almost all computations, adding only a computation > > and a couple logical ops when looking up a string in a dict. > > That's a good idea. For Unicode, it might be best to add another slot > into the object, even though this increases the object size. > Wouldn't that break the ABI in 2.x? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From brett at python.org Wed Jan 18 17:14:50 2012 From: brett at python.org (Brett Cannon) Date: Wed, 18 Jan 2012 11:14:50 -0500 Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024 In-Reply-To: <20120118165604.23c66c00@pitrou.net> References: <20120118165604.23c66c00@pitrou.net> Message-ID: On Wed, Jan 18, 2012 at 10:56, Antoine Pitrou wrote: > On Thu, 19 Jan 2012 01:06:07 +1000 > Nick Coghlan wrote: > > On Wed, Jan 18, 2012 at 2:31 PM, wrote: > > > results for 12de1ad1cee8 on branch "default" > > > -------------------------------------------- > > > > > > test_capi leaked [2008, 2008, 2008] references, sum=6024 > > > > Yikes, you weren't kidding about that new subinterpreter code > > execution test upsetting the refleak detection... > > Well, these are real leaks, but I expect them to be quite difficult to > track (I've found a couple of them), because they can be scattered > around in C module initialization routines and the like. I suggest we > skip this test on refleak runs. > Do we have any general strategy to help make it more fine-grained to detect where the leak might be coming from? We could then maybe try to get some people pound on this at the PyCon sprints. Otherwise I'm reluctant to skip it since they are legitimate leaks that should be get fixed. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Wed Jan 18 17:27:56 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 17:27:56 +0100 Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024 In-Reply-To: References: <20120118165604.23c66c00@pitrou.net> Message-ID: <20120118172756.2df75c23@pitrou.net> On Wed, 18 Jan 2012 11:14:50 -0500 Brett Cannon wrote: > On Wed, Jan 18, 2012 at 10:56, Antoine Pitrou wrote: > > > On Thu, 19 Jan 2012 01:06:07 +1000 > > Nick Coghlan wrote: > > > On Wed, Jan 18, 2012 at 2:31 PM, wrote: > > > > results for 12de1ad1cee8 on branch "default" > > > > -------------------------------------------- > > > > > > > > test_capi leaked [2008, 2008, 2008] references, sum=6024 > > > > > > Yikes, you weren't kidding about that new subinterpreter code > > > execution test upsetting the refleak detection... > > > > Well, these are real leaks, but I expect them to be quite difficult to > > track (I've found a couple of them), because they can be scattered > > around in C module initialization routines and the like. I suggest we > > skip this test on refleak runs. > > > > Do we have any general strategy to help make it more fine-grained to detect > where the leak might be coming from? Unfortunately not. I've tried to track down the remaining leaks (*) by using gc.get_objects(), but apart from a couple of false positives (dead weakrefs lingering in some tp_subclasses slots until the next subclasses take their place ;-)), most refleaks seem to be either on long-lived objects (meaning the leaks are not severe) or on non-gc-tracked objects. (*) $ ./python -m test -R 3:2 test_capi [1/1] test_capi beginning 5 repetitions 12345 ..... test_capi leaked [152, 152] references, sum=304 > We could then maybe try to get some > people pound on this at the PyCon sprints. Otherwise I'm reluctant to skip > it since they are legitimate leaks that should be get fixed. 
Well it's the old well-known issue with pseudo-"permanent" references not being appropriately managed/cleaned up. Which only shows when calling Py_Initialize/Py_Finalize multiple times, or using sub-interpreters. Regards Antoine. From brett at python.org Wed Jan 18 17:39:42 2012 From: brett at python.org (Brett Cannon) Date: Wed, 18 Jan 2012 11:39:42 -0500 Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024 In-Reply-To: <20120118172756.2df75c23@pitrou.net> References: <20120118165604.23c66c00@pitrou.net> <20120118172756.2df75c23@pitrou.net> Message-ID: On Wed, Jan 18, 2012 at 11:27, Antoine Pitrou wrote: > On Wed, 18 Jan 2012 11:14:50 -0500 > Brett Cannon wrote: > > > On Wed, Jan 18, 2012 at 10:56, Antoine Pitrou > wrote: > > > > > On Thu, 19 Jan 2012 01:06:07 +1000 > > > Nick Coghlan wrote: > > > > On Wed, Jan 18, 2012 at 2:31 PM, wrote: > > > > > results for 12de1ad1cee8 on branch "default" > > > > > -------------------------------------------- > > > > > > > > > > test_capi leaked [2008, 2008, 2008] references, sum=6024 > > > > > > > > Yikes, you weren't kidding about that new subinterpreter code > > > > execution test upsetting the refleak detection... > > > > > > Well, these are real leaks, but I expect them to be quite difficult to > > > track (I've found a couple of them), because they can be scattered > > > around in C module initialization routines and the like. I suggest we > > > skip this test on refleak runs. > > > > > > > Do we have any general strategy to help make it more fine-grained to > detect > > where the leak might be coming from? > > Unfortunately not. I've tried to track down the remaining leaks (*) by > using gc.get_objects(), but apart from a couple of false positives > (dead weakrefs lingering in some tp_subclasses slots until the next > subclasses take their place ;-)), most refleaks seem to be either on > long-lived objects (meaning the leaks are not severe) or on > non-gc-tracked objects. 
> > (*) > > $ ./python -m test -R 3:2 test_capi > [1/1] test_capi > beginning 5 repetitions > 12345 > ..... > test_capi leaked [152, 152] references, sum=304 > > > > We could then maybe try to get some > > people pound on this at the PyCon sprints. Otherwise I'm reluctant to > skip > > it since they are legitimate leaks that should be get fixed. > > Well it's the old well-known issue with pseudo-"permanent" references > not being appropriately managed/cleaned up. Which only shows when > calling Py_Initialize/Py_Finalize multiple times, or using > sub-interpreters. > Could we tweak the report to somehow ignore the permanent refcounts for just this test? If not then we might as well leave it out since that number will never hit 0. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jan 18 17:42:15 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 17:42:15 +0100 Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024 In-Reply-To: References: <20120118165604.23c66c00@pitrou.net> <20120118172756.2df75c23@pitrou.net> Message-ID: <20120118174215.09a267d6@pitrou.net> On Wed, 18 Jan 2012 11:39:42 -0500 Brett Cannon wrote: > > > > > We could then maybe try to get some > > > people pound on this at the PyCon sprints. Otherwise I'm reluctant to > > skip > > > it since they are legitimate leaks that should be get fixed. > > > > Well it's the old well-known issue with pseudo-"permanent" references > > not being appropriately managed/cleaned up. Which only shows when > > calling Py_Initialize/Py_Finalize multiple times, or using > > sub-interpreters. > > > > Could we tweak the report to somehow ignore the permanent refcounts for > just this test? If not then we might as well leave it out since that number > will never hit 0. I can't think of any way to specifically ignore them (if we knew where they are we could just fix the refleaks :-)). Regards Antoine. 
From stephen at xemacs.org Wed Jan 18 17:57:12 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 19 Jan 2012 01:57:12 +0900 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: <87ipk96j9z.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > >From the stdlib feature development branch (these are the new interim > releases with standard library updates only as proposed by PEP 407): > Python 3.3.1 + stdlib 13.02.0 (~February 2013) > Python 3.3.2 + stdlib 13.08.0 (~August 2013) > Python 3.3.3 + stdlib 14.02.0 (~February 2014) (only upgrade path > from here is to make the jump to 3.4.0) > -- 3.4.0 + 12.08.0 is released from default branch -- Typo? -> 3.4.0 + 14.08.0, right? > Python 3.4.1 + stdlib 15.02.0 (~February 2015) It seems to me there could be considerable divergence between the stdlib code in > the default branch: > Python 3.4.0a1 + stdlib 14.08.0a1 (~February 2013) > Python 3.4.0a2 + stdlib 14.08.0a2 (~August 2013) > Python 3.4.0a3 + stdlib 14.08.0a3 (~February 2014) and > the stdlib feature development branch > Python 3.3.1 + stdlib 13.02.0 (~February 2013) > Python 3.3.2 + stdlib 13.08.0 (~August 2013) > Python 3.3.3 + stdlib 14.02.0 (~February 2014) (only upgrade path because 14.08.0a* will be targeting 3.4, and *should* use new language constructs and APIs where they are appropriate, while 13.02.0 ... 14.02.0 will be targeting the 3.3 API, and mustn't use them. 
From dirkjan at ochtman.nl Wed Jan 18 18:32:22 2012 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 18 Jan 2012 18:32:22 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <20120117213440.0008fd70@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> Message-ID: On Tuesday, January 17, 2012, Antoine Pitrou wrote: > We would like to propose the following PEP to change (C)Python's release > cycle. Discussion is welcome, especially from people involved in the > release process, and maintainers from third-party distributions of > Python. As a Gentoo packager, this would mean much more work for us, unless all the non-LTS releases promised to be backwards compatible. I.e. the hard part for us is managing all the incompatibilities in other packages, compatibility with Python. As a user of Python, I would rather dislike the change from 18 to 24 months for LTS release cycles. And the limiting factor for my use of Python features is largely old Python versions still in use, not the availability of newer features in the newest Python. So I'm much more interested in finding ways of improving 2.7/3.2 uptake than adding more feature releases. I also think that it would be sensible to wait with something like this process change until the 3.x adoption curve is much further along. Cheers, Dirkjan > Regards > > Antoine. 
> > > PEP: 407 > Title: New release cycle and introducing long-term support versions > Version: $Revision$ > Last-Modified: $Date$ > Author: Antoine Pitrou , > Georg Brandl , > Barry Warsaw > Status: Draft > Type: Process > Content-Type: text/x-rst > Created: 2012-01-12 > Post-History: > Resolution: TBD > > > Abstract > ======== > > Finding a release cycle for an open-source project is a delicate > exercise in managing mutually contradicting constraints: developer > manpower, availability of release management volunteers, ease of > maintenance for users and third-party packagers, quick availability of > new features (and behavioural changes), availability of bug fixes > without pulling in new features or behavioural changes. > > The current release cycle errs on the conservative side. It is > adequate for people who value stability over reactivity. This PEP is > an attempt to keep the stability that has become a Python trademark, > while offering a more fluid release of features, by introducing the > notion of long-term support versions. > > > Scope > ===== > > This PEP doesn't try to change the maintenance period or release > scheme for the 2.7 branch. Only 3.x versions are considered. > > > Proposal > ======== > > Under the proposed scheme, there would be two kinds of feature > versions (sometimes dubbed "minor versions", for example 3.2 or 3.3): > normal feature versions and long-term support (LTS) versions. > > Normal feature versions would get either zero or at most one bugfix > release; the latter only if needed to fix critical issues. Security > fix handling for these branches needs to be decided. > > LTS versions would get regular bugfix releases until the next LTS > version is out. They then would go into security fixes mode, up to a > termination date at the release manager's discretion. > > Periodicity > ----------- > > A new feature version would be released every X months. We > tentatively propose X = 6 months. 
> > LTS versions would be one out of N feature versions. We tentatively > propose N = 4. > > With these figures, a new LTS version would be out every 24 months, > and remain supported until the next LTS version 24 months later. This > is mildly similar to today's 18 months bugfix cycle for every feature > version. > > Pre-release versions > -------------------- > > More frequent feature releases imply a smaller number of disruptive > changes per release. Therefore, the number of pre-release builds > (alphas and betas) can be brought down considerably. Two alpha builds > and a single beta build would probably be enough in the regular case. > The number of release candidates depends, as usual, on the number of > last-minute fixes before final release. > > > Effects > ======= > > Effect on development cycle > --------------------------- > > More feature releases might mean more stress on the development and > release management teams. This is quantitatively alleviated by the > smaller number of pre-release versions; and qualitatively by the > lesser amount of disruptive changes (meaning less potential for > breakage). The shorter feature freeze period (after the first beta > build until the final release) is easier to accept. The rush for > adding features just before feature freeze should also be much > smaller. > > Effect on bugfix cycle > ---------------------- > > The effect on fixing bugs should be minimal with the proposed figures. > The same number of branches would be simultaneously open for regular > maintenance (two until 2.x is terminated, then one). > > Effect on workflow > ------------------ > > The workflow for new features would be the same: developers would only > commit them on the ``default`` branch. > > The workflow for bug fixes would be slightly updated: developers would > commit bug fixes to the current LTS branch (for example ``3.3``) and > then merge them into ``default``. 
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Wed Jan 18 18:55:31 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 18 Jan 2012 18:55:31 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: References: <4F15E130.6010200@v.loewis.de> <20120117222611.64b3fd4e@pitrou.net> <4F161942.5040100@v.loewis.de> Message-ID: <4F170793.9060802@v.loewis.de> Am 18.01.2012 17:01, schrieb PJ Eby: > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" > wrote: > > Am 17.01.2012 22:26, schrieb Antoine Pitrou: > > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits > > could cache a "hash perturbation" computed from the string and the > > random bits: > > > > - hash() would use ob_shash > > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3)) > > > > This way, you cache almost all computations, adding only a computation > > and a couple logical ops when looking up a string in a dict. > > That's a good idea. For Unicode, it might be best to add another slot > into the object, even though this increases the object size. > > > Wouldn't that break the ABI in 2.x? I was thinking about adding the field at the end, so I thought it shouldn't. However, if somebody inherits from PyUnicodeObject, it still might - so my new proposal is to add the extra hash into the str block, either at str[-1], or after the terminating 0. This would cause an average increase of four bytes of the storage (0 bytes in 50% of the cases, 8 bytes because of padding in the other 50%). What do you think?
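[Editorial note: Antoine's cached-perturbation scheme quoted above can be sketched in plain Python. ob_shash and ob_sstate are CPython internals; the 32-bit mask, the cache dict and the reuse of the 1000003 multiplier below are illustrative stand-ins, not the real implementation.]

```python
import os

# Per-process random bits, picked once at startup (as Guido suggested,
# surviving fork() but not exec()).
SECRET = int.from_bytes(os.urandom(4), "little")

_shash_cache = {}  # plays the role of the cached ob_shash field

def base_hash(s):
    """Stand-in for the existing string hash, computed once and cached."""
    if s not in _shash_cache:
        h = 0
        for ch in s:
            h = (h * 1000003 + ord(ch)) & 0xFFFFFFFF
        _shash_cache[s] = h
    return _shash_cache[s]

def lookup_hash(s):
    """What dict lookups would use: the cached hash mixed with the random
    bits, so key sets crafted to collide under one seed miss under another."""
    return (base_hash(s) * 1000003 ^ SECRET) & 0xFFFFFFFF
```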
Regards, Martin From brett at python.org Wed Jan 18 18:56:21 2012 From: brett at python.org (Brett Cannon) Date: Wed, 18 Jan 2012 12:56:21 -0500 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: On Wed, Jan 18, 2012 at 09:08, Nick Coghlan wrote: > On Wed, Jan 18, 2012 at 10:30 PM, Antoine Pitrou > wrote: > > Splitting the stdlib: > > - requires someone to do the splitting (highly non-trivial given the > > interactions of some modules with interpreter details or low-level C > > code) > > - requires setting up separate resources (continuous integration with N > > stdlib versions and M interpreter versions, for example) > > - requires separate maintenance and releases for the stdlib (but with > > non-trivial interaction with interpreter maintenance, since they will > > affect each other and must be synchronized for Python to be usable at > > all) > > - requires more attention by users since there are now *two* release > > schedules and independent version numbers to track > > Did you read what I actually proposed? I specifically *didn't* propose > separate stdlib releases (for all the reasons you point out), only > separate date based stdlib *versioning*. Distribution of the CPython > interpreter + stdlib would remain monolithic, as it is today. Any > given stdlib release would only be supported for the most recent > language release. The only difference is that between language > releases, where we currently only release maintenance builds, we'd > *also* release a second version of each maintenance build with an > updated standard library, along with an alpha release of the next > language version (with the last part being entirely optional, but I > figured I may as well make the suggestion since I like the idea to > encourage getting syntax updates and the like out for earlier > experimentation). 
> When you initially pitched > the proposal via email, you didn't include > the "language moratorium applies to interim releases" idea. That one > additional suggestion makes the whole concept *much* more appealing to > me, but I only like it on the condition that we decouple the stdlib > versioning from the language definition versioning (even though I > recommend we only officially support very specific combinations of the > two). My suggestion is really just a concrete proposal for > implementing Ezio's idea of only bumping the Python version for > releases with long term support, and using some other mechanism to > distinguish the interim releases. > IOW we would have a language moratorium every 2 years (i.e. between LTS releases) while switching to a 6 month release cycle for language/VM bugfixes and full stdlib releases? I would support that as it has several benefits from several angles. From a VM perspective, it gives other VMs 2 years to catch up to the next release instead of 18 months; not a big switch, but still better than shortening it. It also makes disruptive language changes less frequent so people have more time to catch up, update books/docs, etc. We can also let them bake longer and we all get more experience with them. Doing a release every 6 months that includes updates to the stdlib and bugfixes to the language/VM also benefits other VMs by getting compatibility fixes in faster. All of the other VM maintainers have told me that keeping the stdlib non-CPython compliant is the biggest hurdle. This kind of switch means they could release a VM that supports a release 6 months or a year after a language change release (e.g. 1 to 2 releases in) so as to get changes in faster and lower the need to keep their own fork. It should also increase the chances of external developers of projects being willing to become core developers and contributing their project to Python.
If they get to keep a 6 month release cycle we could consider pulling in projects like httplib2 and others that have resisted inclusion in the stdlib because of the painfully long (for them) wait between releases. > > So, assuming a 2 year LTS cycle, the released versions up to February > 2015 with my suggestion would end up being: > > From the default branch: > Python 3.3.0 + stdlib 12.08.0 (~August 2012) > Python 3.4.0a1 + stdlib 14.08.0a1 (~February 2013) > Python 3.4.0a2 + stdlib 14.08.0a2 (~August 2013) > Python 3.4.0a3 + stdlib 14.08.0a3 (~February 2014) > Python 3.4.0a4 + stdlib 14.08.0a4 (~2014) > Python 3.4.0b1 + stdlib 14.08.0b1 (~2014) > Python 3.4.0b2 + stdlib 14.08.0b2 (~2014) > Python 3.4.0c1 + stdlib 14.08.0c1 (~2014) > Python 3.4.0 + stdlib 14.08 (~August 2014) > Python 3.5.0a1 + stdlib 16.08.0a1 (~February 2015) > > From the 3.3 maintenance branch (these are maintenance updates to the > "LTS" release): > Python 3.3.1 + stdlib 12.08.1 (~February 2013) > Python 3.3.2 + stdlib 12.08.2 (~August 2013) > Python 3.3.3 + stdlib 12.08.3 (~February 2014) > Python 3.3.4 + stdlib 12.08.4 (~August 2014) (and 3.3 branch enters > security patch only mode) > > From the 3.4 maintenance branch (these are maintenance updates to the > "LTS" release): > Python 3.4.1 + stdlib 14.08.1 (~February 2015) > > From the stdlib feature development branch (these are the new interim > releases with standard library updates only as proposed by PEP 407): > Python 3.3.1 + stdlib 13.02.0 (~February 2013) > Python 3.3.2 + stdlib 13.08.0 (~August 2013) > Python 3.3.3 + stdlib 14.02.0 (~February 2014) (only upgrade path > from here is to make the jump to 3.4.0) > -- 3.4.0 + 14.08.0 is released from default branch -- > Python 3.4.1 + stdlib 15.02.0 (~February 2015) > > If we have to make "brown paper bag" releases for the maintenance or > stdlib branches then the micro versions get bumped - the date based > version of the standard library versions relates to when that > particular *API* was
realised, not when bugs were last fixed in it. If > a target release date slips, then the stdlib version would be > increased accordingly (cf. Ubuntu 6.06). > > Yes, we'd have an extra set of active buildbots to handle the stdlib > branch, but a) that's no harder than creating the buildbots for a new > maintenance branch and b) the interim release proposal will need to > separate language level changes from stdlib level changes *anyway*. > > As far as how sys.version checks would be updated, I would propose a > simple API addition to track the new date-based standard lib > versioning: sys.stdlib_version. People could choose to just depend on > a specific Python version (implicitly depending on the stdlib version > that was originally shipped with that version of CPython), or they may > instead decide to depend on a specific stdlib version (implicitly > depending on the first Python version that was shipped with that > stdlib). > > The reason I like this scheme is that it allows us (and users) to > precisely track the things that can vary at the two different rates. > At least the following would still be governed by changes in the first > two fields of sys.version (i.e. the major Python version): > - deprecation policy > - language syntax > - compiler AST > - C ABI stability > - Windows compilation suite and C runtime version > - anything else we decide to link with the Python language version > (e.g. default pickle protocol) > > However, the addition of date based stdlib versioning would allow us > to clearly identify the new interim releases proposed by PEP 407 > *without* mucking up all those things that are currently linked to > sys.version and really *shouldn't* be getting updated every 6 months. > Users get a clear guarantee that if they follow the stdlib updates > instead of the regular maintenance releases, they'll get nice new > features along with their bug fixes, but no new deprecations or > backwards incompatible API changes. 
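[Editorial note: a hypothetical sketch of how Nick's proposed sys.stdlib_version might be consumed. sys.stdlib_version does not exist; the attribute name, the function and its defaults below are illustrative only.]

```python
import sys

def check_requirements(min_python=(3, 3), min_stdlib="12.08"):
    """Fail fast unless both the language and (proposed) stdlib versions
    are new enough."""
    if sys.version_info[:2] < min_python:
        raise RuntimeError("Python %d.%d or later required" % min_python)
    stdlib = getattr(sys, "stdlib_version", None)  # proposed attribute
    # zero-padded "YY.MM" strings compare correctly as plain strings
    if stdlib is not None and stdlib < min_stdlib:
        raise RuntimeError("stdlib %s or later required" % min_stdlib)
```

A project could then depend on a specific Python version, a specific stdlib version, or both, matching the two rates of change Nick describes.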
However, they're also going to be > obliged to transition to each new language release as it comes out if > they want to continue getting security updates. > > Basically, what it boils down to is that I'm now +1 on the general > proposal in the PEP, *so long as*: > 1. We get a separate Hg branch for "stdlib only" changes and default > becomes the destination specifically for "language update" changes > (with the latter being a superset of the former) > 2. The proposed "interim releases" are denoted by a new date-based > sys.stdlib_version field and sys.version retains its current meaning > (and slow rate of change) > > I don't think we need to do a new versioning scheme. Why can't we just say which releases are covered by a language moratorium? The community seemed to pick up on that rather well when we did it for Python 3 and I didn't see anyone having difficulty explaining it when someone didn't know what was going on. As long as we are clear which releases are under a language moratorium and which ones aren't we shouldn't need to switch to a language + stdlib versioning scheme. This will lead to us reaching Python 4 faster (in about 4 years), but even that doesn't need to be a big deal. Linux jumped from 2 to 3 w/o issue. Once again, as long as we are clear on which new versions have language changes it should be clear as to what to expect. Otherwise I say we just bump the major version when we do a language-changing release (i.e. every 2 years) and just do a minor/feature number bump (i.e. every 6 months) when we add/change stuff to the stdlib. People can then be told "learn Python 4" which is easy to point out on docs, e.g. you won't have to go digging for what minor/feature release a book covers, just what major release which will probably be emblazoned on the cover. And with the faster stdlib release schedule other VMs can aim for X.N versions when they have all the language features *and* all of their compatibility fixes into the stdlib.
And then once they hit that they can just continue to support that major version by just keeping up with minor releases with compatibility fixes (which buildbots can help guarantee). And honestly, if we don't go with this I'm with Georg's comment in another email of beginning to consider stripping the stdlib down to core libraries to help stop with the bitrot (sorry, Paul). If we can't attract new replacements for modules we can't ditch because of backwards compatibility I start to wonder if I should even care about improving the stdlib outside of core code required to make Python simply function. -Brett > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Wed Jan 18 18:52:23 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 18 Jan 2012 18:52:23 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <20120118073010.39c080e6@resist.wooz.org> References: <4F15E130.6010200@v.loewis.de> <4F167290.4090800@v.loewis.de> <20120118073010.39c080e6@resist.wooz.org> Message-ID: <4F1706D7.8080406@v.loewis.de> Am 18.01.2012 13:30, schrieb Barry Warsaw: > On Jan 18, 2012, at 08:19 AM, Martin v. Löwis wrote: > >> My concern is not about breaking doctests: this proposal will also break >> them. My concern is about applications that assume that hash(s) is >> stable across runs, and we do have reports that it will break >> applications. > > I am a proponent of doctests, and thus use them heavily.
I can tell you that > the issue of dict hashing (non-)order has been well known for *years* and I > have convenience functions in my own doctests to sort and print dict > elements. Indeed. So that breakage may actually be less than people expect. As for cases that still rely on dict order: none of the proposed solutions preserve full compatibility in dict order. The only solution (not actually proposed so far) is to add an AVL tree into the hash table, to track keys that collide on hash values (rather than hash slots). Such a tree would be only used if there is an actual collision, which, in practical dict usage, never occurs. I've been seriously considering implementing a balanced tree inside the dict (again for string-only dicts, as ordering can't be guaranteed otherwise). However, this would be a lot of code for a security fix. It *would* solve the issue for good, though. Regards, Martin From solipsis at pitrou.net Wed Jan 18 19:37:33 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 19:37:33 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> Message-ID: <20120118193733.0be2c21d@pitrou.net> Hello Dirkjan, On Wed, 18 Jan 2012 18:32:22 +0100 Dirkjan Ochtman wrote: > On Tuesday, January 17, 2012, Antoine Pitrou wrote: > > We would like to propose the following PEP to change (C)Python's release > > cycle. Discussion is welcome, especially from people involved in the > > release process, and maintainers from third-party distributions of > > Python. > > As a Gentoo packager, this would mean much more work for us, unless all the > non-LTS releases promised to be backwards compatible. I.e. the hard part > for us is managing all the incompatibilities in other packages, > compatibility with Python. It might need to be spelt clearly in the PEP, but one of my assumptions is that packagers choose on what release series they want to synchronize. 
So packagers can synchronize on the LTS releases if it's more practical for them, or if it maps better to their own release model (e.g. Debian). Do you think that's a valid answer to Gentoo's concerns? > So I'm much more interested in > finding ways of improving 2.7/3.2 uptake than adding more feature releases. That would be nice as well, but I think it's orthogonal to the PEP. Besides, I'm afraid there's not much we (python-dev) can do about it. Some vendors (Debian, Redhat) will always lag behind the bleeding-edge feature releases. Regards Antoine. From g.brandl at gmx.net Wed Jan 18 19:43:13 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 18 Jan 2012 19:43:13 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Am 18.01.2012 16:25, schrieb Stephen J. Turnbull: > Antoine Pitrou writes: > > > You claim people won't use stable releases because of not enough > > alphas? That sounds completely unrelated. > > Surely testing is related to user perceptions of stability. More > testing helps reduce bugs in released software, which improves user > perception of stability, encouraging them to use the software in > production. Less testing, then, will have the opposite effect. But > you understand that theory, I'm sure. So what do you mean to say? > > > (you can produce flimsy software with many alphas, too) > > The problem is the converse: can you produce Python-release-quality > software with much less pre-release testing than current feature > releases get? 
> > > Sure, and we think it is [possible to do that] :) > > Given the relative risk of rejecting PEP 407 and me being wrong (the > status quo really isn't all that bad AFAICS), vs. accepting PEP 407 > and you being wrong, I don't find a smiley very convincing. "The status quo really isn't all that bad" applies to any PEP. Also, compared to most PEPs, it is quite easy to revert to the previous state of things if they don't work out as wanted. > In fact, > I don't find the PEP itself convincing -- and I'm not the only one. That is noted. And I think Antoine was a little harsh earlier; of course we also need to convince users that the new cycle is advantageous and not detrimental. > We'll see what Barry and Georg have to say. Two things: a) The release manager's job is not as bad as you might believe. We have an incredibly helpful and active core of developers which means that the RM job is more or less "reduced" to pronouncing on changes during the rc phase, and actually producing the releases. b) I did not have the impression (maybe someone can underline that with tracker stats?) that there were a lot more bug reports than usual during the alpha and early beta stages of Python 3.2. Georg From g.brandl at gmx.net Wed Jan 18 19:46:38 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 18 Jan 2012 19:46:38 +0100 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: Am 18.01.2012 18:56, schrieb Brett Cannon: > IOW we would have a language moratorium every 2 years (i.e. between LTS > releases) while switching to a 6 month release cycle for language/VM bugfixes > and full stdlib releases? That is certainly a possibility (it's listed as an open issue in the PEP). > I would support that as it has several benefits from > several angles. 
> > From a VM perspective, it gives other VMs 2 years to catch up to the next > release instead of 18 months; not a big switch, but still better than shortening it. > > It also makes disruptive language changes less frequent so people have more time > to catch up, update books/docs, etc. We can also let them bake longer and we all > get more experience with them. Yes. In the end, the moratorium really was a good idea, and this would be carrying on the spirit. > Doing a release every 6 months that includes updates to the stdlib and bugfixes > to the language/VM also benefits other VMs by getting compatibility fixes in > faster. All of the other VM maintainers have told me that keeping the stdlib > non-CPython compliant is the biggest hurdle. This kind of switch means they > could release a VM that supports a release 6 months or a year after a language > change release (e.g. 1 to 2 releases in) so as to get changes in faster and > lower the need to keep their own fork. > > It should also increase the chances of external developers of projects being > willing to become core developers and contributing their project to Python. If > they get to keep a 6 month release cycle we could consider pulling in project > like httplib2 and others that have resisted inclusion in the stdlib because > painfully long (for them) wait between releases. Exactly! Georg From v+python at g.nevcal.com Wed Jan 18 20:09:27 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 18 Jan 2012 11:09:27 -0800 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <4F1706D7.8080406@v.loewis.de> References: <4F15E130.6010200@v.loewis.de> <4F167290.4090800@v.loewis.de> <20120118073010.39c080e6@resist.wooz.org> <4F1706D7.8080406@v.loewis.de> Message-ID: <4F1718E7.2000909@g.nevcal.com> On 1/18/2012 9:52 AM, "Martin v. 
Löwis" wrote: > I've been seriously considering implementing a balanced tree inside > the dict (again for string-only dicts, as ordering can't be guaranteed > otherwise). However, this would be a lot of code for a security fix. > It *would* solve the issue for good, though. To handle dicts whose keys mix non-orderable components with strings, which are just as vulnerable as string-only dicts, especially if the non-string components can have fixed values during an attack, you could simply use their hash value as an orderable proxy for the non-orderable key components. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jan 18 21:50:57 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 18 Jan 2012 21:50:57 +0100 Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024 References: <20120118165604.23c66c00@pitrou.net> <20120118172756.2df75c23@pitrou.net> <20120118174215.09a267d6@pitrou.net> Message-ID: <20120118215057.23b2396a@pitrou.net> Well, they should be fixed now :-) Regards Antoine. On Wed, 18 Jan 2012 17:42:15 +0100 Antoine Pitrou wrote: > On Wed, 18 Jan 2012 11:39:42 -0500 > Brett Cannon wrote: > > > > > > > We could then maybe try to get some > > > > people pound on this at the PyCon sprints. Otherwise I'm reluctant to > > > skip > > > > it since they are legitimate leaks that should get fixed. > > > > > > Well it's the old well-known issue with pseudo-"permanent" references > > > not being appropriately managed/cleaned up. Which only shows when > > > calling Py_Initialize/Py_Finalize multiple times, or using > > > sub-interpreters. > > > > > > > Could we tweak the report to somehow ignore the permanent refcounts for > > just this test? If not then we might as well leave it out since that number > > will never hit 0. > > I can't think of any way to specifically ignore them (if we knew where > they are we could just fix the refleaks :-)). > > Regards > > Antoine.
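[Editorial note: Glenn's ordering-proxy suggestion above can be sketched in a few lines of Python. This is illustrative only, not the C-level dict change under discussion; the function name and tuple encoding are invented for the sketch.]

```python
# When keys mix strings with non-orderable objects, use each component's
# hash as an orderable proxy so a balanced tree can still order them.
def order_key(key):
    if isinstance(key, tuple):
        # order tuple keys component-wise
        return tuple(order_key(part) for part in key)
    if isinstance(key, str):
        return (0, key)        # strings still order among themselves
    return (1, hash(key))      # anything else orders by its hash value

# None and 2 are not comparable in Python 3, but the proxy makes these
# mixed keys sortable deterministically within one process:
keys = sorted([("x", 2), ("x", None), ("a", 1)], key=order_key)
```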
From fwierzbicki at gmail.com Wed Jan 18 22:31:26 2012 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Wed, 18 Jan 2012 13:31:26 -0800 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: On Wed, Jan 18, 2012 at 9:56 AM, Brett Cannon wrote: > Doing a release every 6 months that includes updates to the stdlib and > bugfixes to the language/VM also benefits other VMs by getting compatibility > fixes in faster. All of the other VM maintainers have told me that keeping > the stdlib non-CPython compliant is the biggest hurdle. This kind of switch > means they could release a VM that supports a release 6 months or a year > after a language change release (e.g. 1 to 2 releases in) so as to get > changes in faster and lower the need to keep their own fork. As one of the other VM maintainers I agree with everything Brett has said here. The proposal sounds very good to me from that perspective. -Frank From steve at pearwood.info Thu Jan 19 01:12:06 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 19 Jan 2012 11:12:06 +1100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <1326901919.3395.67.camel@localhost.localdomain> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> Message-ID: <4F175FD6.30502@pearwood.info> Antoine Pitrou wrote: > Le jeudi 19 janvier 2012 à 00:25 +0900, Stephen J. Turnbull a écrit : >> > You claim people won't use stable releases because of not enough >> > alphas? That sounds completely unrelated. >> >> Surely testing is related to user perceptions of stability.
More >> testing helps reduce bugs in released software, which improves user >> perception of stability, encouraging them to use the software in >> production. > > I have asked a practical question, a theoretical answer isn't exactly > what I was waiting for. [...] > I don't care to convince *you*, since you are not involved in Python > development and release management (you haven't ever been a contributor > AFAIK). Unless you produce practical arguments, saying "I don't think > you can do it" is plain FUD and certainly not worth answering to. Pardon me, but people like Stephen Turnbull are *users* of Python, exactly the sort of people you DO have to convince that moving to an accelerated or more complex release process will result in a better product. The risk is that you will lose users, or fragment the user base even more than it is now with 2.x vs 3.x. Quite frankly, I like the simplicity and speed of the current release cycle. All this talk about separate LTS releases and parallel language releases and library releases makes my head spin. I fear the day that people asking questions on the tutor or python-list mailing lists will have to say (e.g.) "I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the version they're using. I fear change, because the current system works well and for every way to make it better there are a thousand ways to make it worse. Dismissing fears like this as FUD doesn't do anyone any favours. One on-going complaint is that Python-Dev doesn't have the manpower or time to do everything that needs to be done. Bugs languish for months or years because nobody has the time to look at it. Will going to a more rapid release cycle give people more time, or just increase their workload? You're hoping that a more rapid release cycle will attract more developers, and there is a chance that you could be right; but a more rapid release cycle WILL increase the total work load. 
So you're betting that this change will attract enough new developers that the work load per person will decrease even as the total work load increases. I don't think that's a safe bet. -- Steven From steve at pearwood.info Thu Jan 19 01:19:29 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 19 Jan 2012 11:19:29 +1100 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: <4F176191.6090206@pearwood.info> Brett Cannon wrote: > And honestly, if we don't go with this I'm with Georg's comment in another > email of beginning to consider stripping the stdlib down to core libraries > to help stop with the bitrot (sorry, Paul). If we can't attract new > replacements for modules we can't ditch because of backwards compatibility > I start to wonder if I should even care about improving the stdlib outside > of core code required to make Python simply function. Do we have any evidence of this alleged bitrot? I spend a lot of time on the comp.lang.python newsgroup and I see no evidence that people using Python believe the standard library is rotting from lack of attention. I do see people having trouble with installing third party packages. I see that stripping back the standard library and forcing people to rely more on external libraries will hurt, rather than help, the experience they have with Python. -- Steven From anacrolix at gmail.com Thu Jan 19 01:42:00 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 19 Jan 2012 11:42:00 +1100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <4F16095C.3050701@gmail.com> Message-ID: On Wed, Jan 18, 2012 at 6:55 PM, Georg Brandl wrote: > The main reason is changes in the library. 
We have been getting complaints > about the standard library bitrotting for years now, and one of the main > reasons it's so hard to a) get decent code into the stdlib and b) keep it > maintained is that the release cycles are so long. It's a tough thing for > contributors to accept that the feature you've just implemented will only > be in a stable release in 16 months. > > If the stdlib does not get more reactive, it might just as well be cropped > down to a bare core, because 3rd-party libraries do everything as well and > do it before we do. But you're right that if Python came without batteries, > the current release cycle would be fine. I think this is the real issue here. The batteries in Python are so important because: 1) The stability and quality of 3rd party libraries is not guaranteed. 2) The mechanism used to obtain 3rd party libraries is not popular or considered reliable. Much of the "bitrot" is that standard library modules have been deprecated by third party ones that are of a much higher functionality. Rather than importing these libraries, it needs to be trivial to obtain them. Putting some of these higher quality 3rd party modules into lock step with Python is an unpopular move, and hampers their future growth. From the top of my head, libraries such as LXML, argparse, and requests are such popular libraries that shouldn't be baked in. In the long term, it would be nice to see these kinds of libraries dropped from the standard installation, and made available through the new distribute package systems etc. From ethan at stoneleaf.us Thu Jan 19 01:01:23 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 18 Jan 2012 16:01:23 -0800 Subject: [Python-Dev] Writable __doc__ Message-ID: <4F175D53.1050107@stoneleaf.us> Is there a reason why normal classes can't have their __doc__ strings rewritten? Creating a do-nothing metaclass seems like overkill for such a simple operation. Python 3.2 ... on win32 --> class Test(): ...
__doc__ = 'am I permanent?' ... --> Test.__doc__ 'am I permanent?' --> Test.__doc__ = 'yes' Traceback (most recent call last): File "", line 1, in AttributeError: attribute '__doc__' of 'type' objects is not writable --> type(Test) --> class Meta(type): ... "only exists to allow writable __doc__" ... --> class Test(metaclass=Meta): ... __doc__ = 'am I permanent?' ... --> Test.__doc__ 'am I permanent?' --> Test.__doc__ = 'No!' --> Test.__doc__ 'No!' --> type(Test) Should I create a bug report? ~Ethan~ From benjamin at python.org Thu Jan 19 01:54:39 2012 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 18 Jan 2012 19:54:39 -0500 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <4F175D53.1050107@stoneleaf.us> References: <4F175D53.1050107@stoneleaf.us> Message-ID: 2012/1/18 Ethan Furman : > Is there a reason why normal classes can't have their __doc__ strings > rewritten? Creating a do-nothing metaclass seems like overkill for such a > simple operation. > > Python 3.2 ... on win32 > --> class Test(): > ... __doc__ = 'am I permanent?' > ... > --> Test.__doc__ > 'am I permanent?' > --> Test.__doc__ = 'yes' > Traceback (most recent call last): > File "", line 1, in > AttributeError: attribute '__doc__' of 'type' objects is not writable > --> type(Test) > > > --> class Meta(type): > ... "only exists to allow writable __doc__" > ... > --> class Test(metaclass=Meta): > ... __doc__ = 'am I permanent?' > ... > --> Test.__doc__ > 'am I permanent?' > --> Test.__doc__ = 'No!' > --> Test.__doc__ > 'No!' > --> type(Test) > > > Should I create a bug report? $ ./python Python 3.3.0a0 (default:095de2293f39, Jan 18 2012, 10:34:18) [GCC 4.5.3] on linux Type "help", "copyright", "credits" or "license" for more information. >>> class Test: ... __doc__ = "time machine" ...
>>> Test.__doc__ = "strikes again" >>> Test.__doc__ 'strikes again' -- Regards, Benjamin From anacrolix at gmail.com Thu Jan 19 01:58:24 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 19 Jan 2012 11:58:24 +1100 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> Message-ID: PEP380 and Mark's coroutines could coexist, so I really don't think "it's too late" matters. Furthermore, PEP380 has utility in its own right without considering its use for "explicit coroutines". I would like to see these coroutines considered, but as someone else mentioned, coroutines via PEP380 enhanced generators have some interesting characteristics; from my experimentation, they feel monadic. From ncoghlan at gmail.com Thu Jan 19 02:03:15 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Jan 2012 11:03:15 +1000 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: On Thu, Jan 19, 2012 at 7:31 AM, fwierzbicki at gmail.com wrote: > On Wed, Jan 18, 2012 at 9:56 AM, Brett Cannon wrote: > >> Doing a release every 6 months that includes updates to the stdlib and >> bugfixes to the language/VM also benefits other VMs by getting compatibility >> fixes in faster. All of the other VM maintainers have told me that keeping >> the stdlib non-CPython compliant is the biggest hurdle. This kind of switch >> means they could release a VM that supports a release 6 months or a year >> after a language change release (e.g. 1 to 2 releases in) so as to get >> changes in faster and lower the need to keep their own fork. > As one of the other VM maintainers I agree with everything Brett has > said here.
The proposal sounds very good to me from that perspective. Yes, with the addition of the idea of a PEP 3003 style language change moratorium for interim releases, I've been converted from an initial opponent of the idea (since we don't want to give the wider community whiplash) to a supporter (since some parts of the community, especially web service developers that deploy to tightly controlled environments, aren't well served by the standard library's inability to keep up with externally maintained standards and recommended development practices). It means PEP 407 can end up serving two goals: 1. Speeding up the rate of release for the standard library, allowing enhanced features to be made available to end users sooner. 2. Slowing down (slightly) the rate of release of changes to the core language and builtins, providing more time for those changes to filter out through the wider Python ecosystem. Agreeing with those goals in principle then leaves two key questions to be addressed: 1. How would we have to update our development practices to make such a dual versioning scheme feasible? 2. How can we best communicate a new approach to versioning without unduly confusing developers that have built up certain expectations about Python's release cycle over the past 20+ years? For the first point, I think having two active development branches (one for stdlib updates, one for language updates) will prove to be absolutely essential. Otherwise all language updates would have to be landed in the 6 month window between the last stdlib release for a given language version and the next language release, which seems to me a crazy way to go about things. As a consequence, I think we'd be obliged to do something to avoid conflicts on Misc/NEWS (this could be as simple as splitting it out into NEWS and NEWS_STDLIB, but if we're restructuring those files anyway, we may also want to do something about the annoying conflicts between maintenance releases and development releases). 
That then leaves the question of how to best communicate such a change to the rest of the Python community. This is more a political and educational question than it is a technical one. A few different approaches have already been suggested: 1. I believe the PEP currently proposes just taking the "no more than 9" limit off the minor version of the language. Feature releases would just come out every 6 months, with every 4th release flagged as a language release. This could even be conveyed programmatically by offering "sys.lang_version" and "sys.lang_version_info" attributes that define the *language* version of a given release - 3.3, 3.4, 3.5 and 3.6 would all have something like sys.lang_version == '3.3', and then in 3.7 (the next language release) it would be updated to say sys.lang_version == '3.7'. This approach would require that some policies (such as the deprecation cycle) be updated to refer to changes in the language version (sys.lang_version) rather than changes in the stdlib version (sys.version). I don't like this scheme because it tries to use one number (the minor version field) to cover two very different concepts (stdlib updates and language updates). While technically feasible, this is unnecessarily obscure and confusing for end users. 2. Brett's alternative proposal is that we switch to using the major version for language releases and the minor version for stdlib releases. We would then release 3.3, 3.4, 3.5 and 3.6 at 6 month intervals, with 4.0 then being released in August 2014 as a new language version. Without taking recent history into account, I actually like this scheme - it fits well with traditional usage of major.minor.micro version numbering. However, I'm not confident that the "python" name will refer to Python 3 on a majority of systems by 2014 and accessing Python 4.0 through the "python3" name would just be odd.
It also means we lose our ability to signal to the community when we plan to make a backwards incompatible language release (making the assumption that we're never going to want to do that again would be incredibly naive). On a related note, we'd also be setting ourselves up to have to explain to everyone that "no, no, Python 3 -> 4 is like upgrading from Python 3.2 -> 3.3, not 2.7 -> 3.2". I expect the disruptions of the Python 3 transition will still be fresh enough in everyone's mind at that point that we really shouldn't go there if we don't have to. 3. Finally, we get to my proposal: that we just leave sys.version and sys.version_info alone. They will still refer to Python language versions, the micro release will be incremented every 6 months or so, the minor release once every couple of years to indicate a language update and the major release every decade or so (if absolutely necessary) to indicate the introduction of backwards incompatibilities. All current intuitions and expectations regarding the meaning of sys.version and sys.version_info remain completely intact. However, we would still need *something* to indicate that the stdlib has changed in the interim releases. This should be a monotonically increasing value, but should also be clearly distinct from the language version. Hence my proposal of a date-based sys.stdlib_version and sys.stdlib_version_info. That way, nobody has to *unlearn* anything about current Python development practices and policies. Instead, all people have to do is *learn* that we now effectively have two release streams: a date-based release stream that comes out every 6 months (described by sys.stdlib_version) and an explicitly numbered release stream (described by sys.version) that comes out every 24 months. So in August this year, we would release 3.3+12.08, followed by 3.3+13.02, 3.3+13.08, 3.3+14.02 at 6 month intervals, and then the next language release as 3.4+14.08.
If someone refers to just Python 3.3, then "at least stdlib 12.08" is implied. If they refer to Python stdlib 12.08, 13.02, 13.08 or 14.02, then it is the dependency on "Python 3.3" that is implied. Two different rates of release -> two different version numbers. Makes sense to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jan 19 02:06:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Jan 2012 11:06:01 +1000 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: <4F176191.6090206@pearwood.info> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <4F176191.6090206@pearwood.info> Message-ID: On Thu, Jan 19, 2012 at 10:19 AM, Steven D'Aprano wrote: > Brett Cannon wrote: > Do we have any evidence of this alleged bitrot? I spend a lot of time on the > comp.lang.python newsgroup and I see no evidence that people using Python > believe the standard library is rotting from lack of attention. IMO, it's a problem mainly with network (especially web) protocols and file formats. It can take the stdlib a long time to catch up with external developments due to the long release cycle, so people are often forced to switch to third party libraries that better track the latest versions of relevant standards (de facto or otherwise). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Thu Jan 19 02:54:45 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 18 Jan 2012 20:54:45 -0500 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <4F176191.6090206@pearwood.info> Message-ID: On 1/18/2012 8:06 PM, Nick Coghlan wrote: > On Thu, Jan 19, 2012 at 10:19 AM, Steven D'Aprano wrote: >> Do we have any evidence of this alleged bitrot?
I spend a lot of time on the >> comp.lang.python newsgroup and I see no evidence that people using Python >> believe the standard library is rotting from lack of attention. > > IMO, it's a problem mainly with network (especially web) protocols and > file formats. It can take the stdlib a long time to catch up with > external developments due to the long release cycle, so people are > often forced to switch to third party libraries that better track the > latest versions of relevant standards (de facto or otherwise). Some of those modules are more than 2 years out of date and I guess what Brett is saying is that the people interested and able to update them will not do so in the stdlib because they want to be able to push out feature updates whenever they are needed and available and not be tied to a slow release schedule. Moreover, since the external standards will continue to evolve for the foreseeable future, the need to track them more quickly will also continue. We could relax the ban on new features in micro releases and designate such modules as volatile and let them get new features in each x.y.z release. In a sense, this would be less drastic than inventing a new type of release. Code can require an x.y.z release, as it must if it depends on a bug fix not in x.y.0. I also like the idea of stretching out the alpha release cycle. I would like to see 3.3.0a1 appear along with 3.2.3 (in February?). If alpha releases are released with all buildbots green, they are as good, at least with respect to old features, as a corresponding bugfix release. All releases will become more dependable as test coverage improves. Again, this idea avoids inventing a new type of release with new release designations. I think one reason people avoid alpha releases is that they so quickly become obsolete. If one sat for 3 to 6 months, it might get more attention. As for any alpha stigma, we should emphasize that alpha only means not feature frozen.
-- Terry Jan Reedy From ericsnowcurrently at gmail.com Thu Jan 19 04:31:38 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 18 Jan 2012 20:31:38 -0700 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <4F175D53.1050107@stoneleaf.us> References: <4F175D53.1050107@stoneleaf.us> Message-ID: On Wed, Jan 18, 2012 at 5:01 PM, Ethan Furman wrote: > Is there a reason why normal classes can't have their __doc__ strings > rewritten? Creating a do-nothing metaclass seems like overkill for such a > simple operation. > > Python 3.2 ... on win32 > --> class Test(): > ... __doc__ = 'am I permanent?' > ... > --> Test.__doc__ > 'am I permanent?' > --> Test.__doc__ = 'yes' > Traceback (most recent call last): > File "", line 1, in > AttributeError: attribute '__doc__' of 'type' objects is not writable > --> type(Test) > > > --> class Meta(type): > ... "only for exists to allow writable __doc__" > ... > --> class Test(metaclass=Meta): > ... __doc__ = 'am I permanent?' > ... > --> Test.__doc__ > 'am I permanent?' > --> Test.__doc__ = 'No!' > --> Test.__doc__ > 'No!' > --> type(Test) > > > Should I create a bug report? http://bugs.python.org/issue12773 :) -eric From thatiparthysreenivas at gmail.com Thu Jan 19 05:52:08 2012 From: thatiparthysreenivas at gmail.com (Sreenivas Reddy T) Date: Thu, 19 Jan 2012 10:22:08 +0530 Subject: [Python-Dev] Writable __doc__ In-Reply-To: References: <4F175D53.1050107@stoneleaf.us> Message-ID: this is happening on python 2.6 too. Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> class Test(type): ... __doc__= File "", line 2 __doc__= ^ SyntaxError: invalid syntax >>> class Test(type): ... __doc__='asasdas' ...
>>> >>> Test.__doc__='sadfsdff' Traceback (most recent call last): File "", line 1, in AttributeError: attribute '__doc__' of 'type' objects is not writable >>> type(Test) >>> From noufal at nibrahim.net.in Thu Jan 19 06:07:30 2012 From: noufal at nibrahim.net.in (Noufal Ibrahim) Date: Thu, 19 Jan 2012 10:37:30 +0530 Subject: [Python-Dev] Writable __doc__ In-Reply-To: (Sreenivas Reddy T.'s message of "Thu, 19 Jan 2012 10:22:08 +0530") References: <4F175D53.1050107@stoneleaf.us> Message-ID: <874nvs8elp.fsf@sanitarium.localdomain> Sreenivas Reddy T writes: > this is happening on python 2.6 too. > > Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) > [GCC 4.4.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> class Test(type): > ... __doc__= > File "", line 2 > __doc__= > ^ > SyntaxError: invalid syntax >>>> class Test(type): > ... __doc__='asasdas' > ... >>>> I don't get any syntax errors (Python2.7 and 2.6) >>> class Test(object): ... __doc__ = "Something" ... >>> >>> help(Test) >>> class Test(type): ... __doc__ = "something" ... >>> help(Test) >>> Test.__doc__ 'something' >>>> Test.__doc__='sadfsdff' > Traceback (most recent call last): > File "", line 1, in > AttributeError: attribute '__doc__' of 'type' objects is not writable >>>> type(Test) > >>>> The __name__, __bases__, __module__, __abstractmethods__, __dict__ and __doc__ attributes have custom getters and setters in the type object definition. __doc__ has only a getter. No setter and no deleter. http://hg.python.org/cpython/file/0b5ce36a7a24/Objects/typeobject.c#l658 That is why you're seeing this. What's the question here? [...] -- ~noufal http://nibrahim.net.in May I ask a question?
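Noufal's pointer to the getset table in typeobject.c can be illustrated in pure Python: a getter-only property on a metaclass fails on assignment in just the way Ethan's 3.2 session does, while on 3.3 and later `__doc__` itself is simply writable, as Benjamin's session shows. This is a sketch, not CPython internals; the `ReadOnlyDocMeta`, `Frozen`, and `doc` names are illustrative only.

```python
# Sketch of a getter-only attribute, mimicking a C getset descriptor that
# defines a getter but no setter (names here are illustrative only).
class ReadOnlyDocMeta(type):
    @property
    def doc(cls):
        # Getter only -- there is no corresponding setter.
        return cls._doc

class Frozen(metaclass=ReadOnlyDocMeta):
    _doc = 'am I permanent?'

print(Frozen.doc)  # am I permanent?

try:
    Frozen.doc = 'yes'
except AttributeError as exc:
    # Same failure mode as Test.__doc__ = 'yes' on Python 3.2.
    write_error = exc

# On Python 3.3 and later, class __doc__ is simply writable:
class Test:
    __doc__ = 'am I permanent?'

Test.__doc__ = 'strikes again'
print(Test.__doc__)  # strikes again
```

Issue 12773, referenced later in the thread, is the change that made the direct assignment work in 3.3.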
From noufal at nibrahim.net.in Thu Jan 19 06:12:51 2012 From: noufal at nibrahim.net.in (Noufal Ibrahim) Date: Thu, 19 Jan 2012 10:42:51 +0530 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <874nvs8elp.fsf@sanitarium.localdomain> (Noufal Ibrahim's message of "Thu, 19 Jan 2012 10:37:30 +0530") References: <4F175D53.1050107@stoneleaf.us> <874nvs8elp.fsf@sanitarium.localdomain> Message-ID: <87zkdk6zsc.fsf@sanitarium.localdomain> Noufal Ibrahim writes: [...] > That is why you're seeing this. What's the question here? [...] My apologies. I didn't read the whole thread. -- ~noufal http://nibrahim.net.in Some bird populations soaring down -Headline of an article in Science News, page 126, February 20, 1993. From turnbull at sk.tsukuba.ac.jp Thu Jan 19 07:29:35 2012 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 19 Jan 2012 15:29:35 +0900 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87d3ag6w8g.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > "The status quo really isn't all that bad" applies to any PEP. Also, > compared to most PEPs, it is quite easy to revert to the previous > state of things if they don't work out as wanted. That depends on how "doesn't work out" plays out. If meeting the schedule *and* producing a good release regularly is just more work than expected, of course you're right. If you stick to the schedule with insufficient resources, and lack of testing produces a really bad release (or worse, a couple of sorta bad releases in succession), reverting Python's reputation for stability is going to be non-trivial. > a) The release manager's job is not as bad as you might believe. 
We > have an incredibly helpful and active core of developers which means > that the RM job is more or less "reduced" to pronouncing on changes > during the rc phase, and actually producing the releases. I've done release management and I've been watching Python do release management since PEP 263; I'm well aware that Python has a truly excellent process in place, and I regularly recommend studying it to friends interested in improving their own projects' processes. But I've also (twice) been involved (as RM) in a major revision of RM procedures, and both times it was a lot more work than anybody expected. Finally, the whole point of this exercise is to integrate a lot more stdlib changes (including whole packages) than in the past on a much shorter timeline, and to do it repeatedly. "Every six months" still sounds like a long time if you are a "leaf" project still working on your changes on your own schedule and chafing at the bit waiting to get them into the core project's releases, but it's actually quite short for the RM. I'm not against this change (especially since, as Antoine so graciously pointed out, I'm not going to be actually doing the work in the foreseeable future), but I do advise that the effort required tends to be dramatically underestimated. > b) I did not have the impression (maybe someone can underline that > with tracker stats?) that there were a lot more bug reports than > usual during the alpha and early beta stages of Python 3.2. Yeah, but the question for Python's stability reputation is "were there more than zero?" Every bug that gets through is a risk. From stephen at xemacs.org Thu Jan 19 07:33:51 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 19 Jan 2012 15:33:51 +0900 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F175FD6.30502@pearwood.info> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> Message-ID: <87boq06w1c.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Pardon me, but people like Stephen Turnbull are *users* of Python, exactly the > sort of people you DO have to convince that moving to an accelerated or more > complex release process will result in a better product. Well, to be fair, Antoine is right in excluding me from the user base he's trying to attract (as I understand it). I do not maintain products or systems that depend on Python working 99.99999% of the time, and in fact in many of my personal projects I use trunk. One of the problems with this kind of discussion is that the targets of the new procedures are not clear in everybody's mind, but all of us tend to use generic terms like "users" when we mean to discuss benefits or costs to a specific class of users. From greg at krypto.org Thu Jan 19 07:59:10 2012 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 18 Jan 2012 22:59:10 -0800 Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024 In-Reply-To: <20120118215057.23b2396a@pitrou.net> References: <20120118165604.23c66c00@pitrou.net> <20120118172756.2df75c23@pitrou.net> <20120118174215.09a267d6@pitrou.net> <20120118215057.23b2396a@pitrou.net> Message-ID: On Wed, Jan 18, 2012 at 12:50 PM, Antoine Pitrou wrote: > > Well, they should be fixed now :-) > > Regards > > Antoine. awesome! 
:) From victor.stinner at haypocalc.com Thu Jan 19 11:02:05 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 19 Jan 2012 11:02:05 +0100 Subject: [Python-Dev] Writable __doc__ In-Reply-To: References: <4F175D53.1050107@stoneleaf.us> Message-ID: > http://bugs.python.org/issue12773 :) The bug is marked as closed, whereas the bug still exists in Python 3.2 and has not been fixed there. The fix must be backported. Victor From solipsis at pitrou.net Thu Jan 19 12:07:59 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Jan 2012 12:07:59 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> Message-ID: <20120119120759.28bdef68@pitrou.net> On Thu, 19 Jan 2012 11:12:06 +1100 Steven D'Aprano wrote: > Antoine Pitrou wrote: > > Le jeudi 19 janvier 2012 à 00:25 +0900, Stephen J. Turnbull a écrit : > >> > You claim people won't use stable releases because of not enough > >> > alphas? That sounds completely unrelated. > >> > >> Surely testing is related to user perceptions of stability. More > >> testing helps reduce bugs in released software, which improves user > >> perception of stability, encouraging them to use the software in > >> production. > > > > I have asked a practical question, a theoretical answer isn't exactly > > what I was waiting for. > [...] > > I don't care to convince *you*, since you are not involved in Python > > development and release management (you haven't ever been a contributor > > AFAIK). Unless you produce practical arguments, saying "I don't think > > you can do it" is plain FUD and certainly not worth answering to.
> > Pardon me, but people like Stephen Turnbull are *users* of Python, exactly the > sort of people you DO have to convince that moving to an accelerated or more > complex release process will result in a better product. The risk is that you > will lose users, or fragment the user base even more than it is now with 2.x > vs 3.x. Well, you might bring some examples here, but I haven't seen any project lose users *because* they switched to a faster release cycle (*). I don't understand why this proposal would fragment the user base, either. We're not proposing to drop compatibility or build Python 4. ((*) Firefox's decrease in popularity seems to be due to Chrome uptake, and their new release cycle is arguably in response to that) > Quite frankly, I like the simplicity and speed of the current release cycle. > All this talk about separate LTS releases and parallel language releases and > library releases makes my head spin. Well, the PEP discussion might make your head spin, because various possibilities are explored. Obviously the final solution will have to be simple enough to be understood by anyone :-) (do you find Ubuntu's release model, for example, too complicated?) > I fear the day that people asking > questions on the tutor or python-list mailing lists will have to say (e.g.) > "I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the > version they're using. Yeah, that's my biggest problem with Nick's proposal. Hopefully we can avoid parallel version schemes. > You're hoping that a > more rapid release cycle will attract more developers, and there is a chance > that you could be right; but a more rapid release cycle WILL increase the > total work load. So you're betting that this change will attract enough new > developers that the work load per person will decrease even as the total work > load increases. This is not something that we can find out without trying, I think. 
As Georg pointed out, the decision is easy to revert or amend if we find out that the new release cycle is unworkable. Regards Antoine. From solipsis at pitrou.net Thu Jan 19 12:17:51 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Jan 2012 12:17:51 +0100 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> Message-ID: <20120119121751.6d10cb04@pitrou.net> On Thu, 19 Jan 2012 11:03:15 +1000 Nick Coghlan wrote: > > 1. I believe the PEP currently proposes just taking the "no more than > 9" limit off the minor version of the language. Feature releases would > just come out every 6 months, with every 4th release flagged as a > language release. With the moratorium suggestion factored in, yes. The PEP insists on support duration rather than the breadth of changes, though. I think that's a more important piece of information for users. (you don't care whether or not new language constructs were added, if you were not planning to use them) > I don't like this scheme because it tries to use one number (the minor > version field) to cover two very different concepts (stdlib updates > and language updates). While technically feasible, this is > unnecessarily obscure and confusing for end users. As an end user I wouldn't really care whether a release is "stdlib changes only" or "language/builtins additions too" (especially in a language like Python where the boundaries are somewhat blurry). I think this distinction is useful mainly for experts and therefore not worth complicating version numbering for. > 2. Brett's alternative proposal is that we switch to using the major > version for language releases and the minor version for stdlib > releases. We would then release 3.3, 3.4, 3.5 and 3.6 at 6 month > intervals, with 4.0 then being released in August 2014 as a new > language version. 
The main problem I see with this is that Python 3 was a big disruptive event for the community, and calling a new version "Python 4" may make people anxious at the prospect of compatibility breakage. Instead of spending some time advertising that "Python 4" is a safe upgrade, perhaps we could simply call it "Python 3.X+1"? (and, as you point out, keep "Python X+1" for when we want to change the language in incompatible ways again) > So in August this year, we would release 3.3+12.08, followed by > 3.3+13.02, 3.3+13.08, 3.3+14.02 at 6 month intervals, and then the > next language release as 3.4+14.08. If someone refers to just Python > 3.3, then the "at least stdlib 12.08" is implied. If they refer to > Python stdlib 12.08, 13.02, 13.08 or 14.02, then it is the dependency > on "Python 3.3" that is implied. If I were a casual user of a piece of software, I'd really find such a numbering scheme complicated and intimidating. I don't think most users want such a level of information. Regards Antoine. From solipsis at pitrou.net Thu Jan 19 12:18:44 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Jan 2012 12:18:44 +0100 Subject: [Python-Dev] Writable __doc__ References: <4F175D53.1050107@stoneleaf.us> Message-ID: <20120119121844.3e014bd2@pitrou.net> On Wed, 18 Jan 2012 20:31:38 -0700 Eric Snow wrote: > > > > Should I create a bug report? 
> > http://bugs.python.org/issue12773 :) Well done Eric :) From ncoghlan at gmail.com Thu Jan 19 12:35:19 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Jan 2012 21:35:19 +1000 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <20120119120759.28bdef68@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <20120119120759.28bdef68@pitrou.net> Message-ID: On Thu, Jan 19, 2012 at 9:07 PM, Antoine Pitrou wrote: >> I fear the day that people asking >> questions on the tutor or python-list mailing lists will have to say (e.g.) >> "I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the >> version they're using. > > Yeah, that's my biggest problem with Nick's proposal. Hopefully we can > avoid parallel version schemes. They're not really parallel - the stdlib version would fully determine the language version. I'm only proposing two version numbers because we're planning to start versioning *two* things (the standard library, updated every 6 months, and the language spec, updated every 18-24 months). Since the latter matches what we do now, I'm merely proposing that we leave its versioning alone, and add a *new* identifier specifically for the interim stdlib updates. Thinking about it though, I've realised that the sys.version string already contains a lot more than just the language version number, so I think it should just be updated to include the stdlib version information, and the version_info named tuple could get a new 'stdlib' field as a string.
That way, sys.version and sys.version_info would still fully define the Python version; we just wouldn't be mucking with the meaning of any of the existing fields. For example, the current: >>> sys.version '3.2.2 (default, Sep 5 2011, 21:17:14) \n[GCC 4.6.1]' >>> sys.version_info sys.version_info(major=3, minor=2, micro=2, releaselevel='final', serial=0) might become: >>> sys.version '3.3.1 (stdlib 12.08, default, Feb 18 2013, 21:17:14) \n[GCC 4.6.1]' >>> sys.version_info sys.version_info(major=3, minor=3, micro=1, releaselevel='final', serial=0, stdlib='12.08') for the maintenance release and: >>> sys.version '3.3.1 (stdlib 13.02, default, Feb 18 2013, 21:17:14) \n[GCC 4.6.1]' >>> sys.version_info sys.version_info(major=3, minor=3, micro=1, releaselevel='final', serial=0, stdlib='13.02') for the stdlib-only update. Explicit-is-better-than-implicit'ly yours, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jan 19 13:00:06 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Jan 2012 22:00:06 +1000 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: <20120119121751.6d10cb04@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <20120119121751.6d10cb04@pitrou.net> Message-ID: On Thu, Jan 19, 2012 at 9:17 PM, Antoine Pitrou wrote: > If I were a casual user of a piece of software, I'd really find such a > numbering scheme complicated and intimidating. I don't think most users > want such a level of information. I think the ideal numbering scheme from a *new* user point of view is the one Brett suggested (where major=language update, minor=stdlib update), but (as has been noted) there are solid historical reasons we can't use that.
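The extended `version_info` shape Nick floats in his earlier message can be mocked up to show how checks would read under such a scheme. None of these fields or values exist in any released Python, and `stdlib_at_least` is a hypothetical helper invented here for illustration.

```python
from collections import namedtuple

# Mock of the proposed tuple: existing fields keep their meaning, and a
# date-based 'stdlib' field is appended (hypothetical, per the proposal).
VersionInfo = namedtuple(
    'version_info', 'major minor micro releaselevel serial stdlib')

vi = VersionInfo(3, 3, 1, 'final', 0, '13.02')

# Language-level checks are unchanged from today:
assert (vi.major, vi.minor) >= (3, 3)

def stdlib_at_least(info, required):
    """Compare date-based stdlib versions like '12.08' as (year, month)."""
    return (tuple(map(int, info.stdlib.split('.')))
            >= tuple(map(int, required.split('.'))))

print(stdlib_at_least(vi, '12.08'))  # True
print(stdlib_at_least(vi, '13.08'))  # False
```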
While I still have misgivings, I'm starting to come around to the idea of just allowing the minor release number to increment faster (Barry's co-authorship of the PEP, suggesting he doesn't see such a scheme causing any problems for Ubuntu, is a big factor in that). I'd still like the core language version to be available programmatically, though, and I'd like the PEP to consider displaying it as part of sys.version and using it to allow things like having bytecode-compatible versions share bytecode files in the cache. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Thu Jan 19 13:00:35 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 19 Jan 2012 07:00:35 -0500 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: <20120119121751.6d10cb04@pitrou.net> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <20120119121751.6d10cb04@pitrou.net> Message-ID: <20120119070035.3a8a518e@resist.wooz.org> On Jan 19, 2012, at 12:17 PM, Antoine Pitrou wrote: >The main problem I see with this is that Python 3 was a big >disruptive event for the community, and calling a new version "Python >4" may make people anxious at the prospect of compatibility breakage. s/was/is/ The Python 3 transition is ongoing, and Guido himself at the time thought it would take 5 years. I think we're making excellent progress, but there are still occasional battles just to convince upstream third party developers that supporting Python 3 (let alone *switching* to Python 3) is even worth the effort. I think we're soon going to be at a tipping point where not supporting Python 3 will be the minority position. Even if a hypothetical Python 4 were completely backward compatible, I shudder at the PR nightmare that would entail.
I'm not saying there will never be a time for Python 4, but I sure hope it's far enough in the future that you youngun's will be telling us about it in the Tim Peters Home for Python Old Farts, where we'll smile blankly, bore you again with stories of vinyl records, phones with real buttons, and Python 1.6.1 while you feed us our mush under chronologically arranged pictures of BDFLs Van Rossum, Peterson, and Van Rossum. -Barry From benjamin at python.org Thu Jan 19 14:07:45 2012 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 19 Jan 2012 08:07:45 -0500 Subject: [Python-Dev] Writable __doc__ In-Reply-To: References: <4F175D53.1050107@stoneleaf.us> Message-ID: 2012/1/19 Victor Stinner : >> http://bugs.python.org/issue12773 :) > > The bug is marked as closed, whereas the bug exists in Python 3.2 and > has not been closed. The fix must be backported. It's not a bug; it's a feature. -- Regards, Benjamin From merwok at netwok.org Thu Jan 19 15:03:07 2012 From: merwok at netwok.org (Éric Araujo) Date: Thu, 19 Jan 2012 15:03:07 +0100 Subject: [Python-Dev] [Python-checkins] cpython: add str.casefold() (closes #13752) In-Reply-To: <9bd4a2c9c735b9cf1a896fa6f11fe2e3@netwok.org> References: <9bd4a2c9c735b9cf1a896fa6f11fe2e3@netwok.org> Message-ID: Thanks for 0b5ce36a7a24 Benjamin. From pje at telecommunity.com Thu Jan 19 16:17:18 2012 From: pje at telecommunity.com (PJ Eby) Date: Thu, 19 Jan 2012 10:17:18 -0500 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <4F170793.9060802@v.loewis.de> References: <4F15E130.6010200@v.loewis.de> <20120117222611.64b3fd4e@pitrou.net> <4F161942.5040100@v.loewis.de> <4F170793.9060802@v.loewis.de> Message-ID: On Jan 18, 2012 12:55 PM, Martin v. Löwis wrote: > > On 18.01.2012 17:01, PJ Eby wrote: > > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" > > wrote: > > > > On 17.01.2012 22:26, Antoine Pitrou wrote: > > > Only 2 bits are used in ob_sstate, meaning 30 are left.
These 30 bits > > > could cache a "hash perturbation" computed from the string and the > > > random bits: > > > > > > - hash() would use ob_shash > > > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3)) > > > > > > This way, you cache almost all computations, adding only a computation > > > and a couple logical ops when looking up a string in a dict. > > > > That's a good idea. For Unicode, it might be best to add another slot > > into the object, even though this increases the object size. > > > > > > Wouldn't that break the ABI in 2.x? > > I was thinking about adding the field at the end, so I thought it > shouldn't. However, if somebody inherits from PyUnicodeObject, it still > might - so my new proposal is to add the extra hash into the str block, > either at str[-1], or after the terminating 0. This would cause an > average increase of four bytes of the storage (0 bytes in 50% of the > cases, 8 bytes because of padding in the other 50%). > > What do you think? So far it sounds like the very best solution of all, as far as backward compatibility is concerned. If the extra bits are only used when two strings have a matching hash value, the only doctests that could be affected are ones testing for this issue. ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jan 19 17:36:59 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 19 Jan 2012 08:36:59 -0800 Subject: [Python-Dev] Writable __doc__ In-Reply-To: References: <4F175D53.1050107@stoneleaf.us> Message-ID: <4F1846AB.7060700@stoneleaf.us> Benjamin Peterson wrote: > 2012/1/19 Victor Stinner : >>> http://bugs.python.org/issue12773 :) >> The bug is marked as closed, whereas the bug exists in Python 3.2 and >> has not been closed. The fix must be backported. > > It's not a bug; it's a feature. Where does one draw the line between feature and bug?
As a user I'm inclined to classify this as a bug: __doc__ was writable with old-style classes; __doc__ is writable with new-style classes with any metaclass; and there exists no good reason (that I'm aware of ;) for __doc__ to not be writable. ~Ethan~ From guido at python.org Thu Jan 19 18:21:56 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jan 2012 09:21:56 -0800 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <4F1846AB.7060700@stoneleaf.us> References: <4F175D53.1050107@stoneleaf.us> <4F1846AB.7060700@stoneleaf.us> Message-ID: On Thu, Jan 19, 2012 at 8:36 AM, Ethan Furman wrote: > Benjamin Peterson wrote: > >> 2012/1/19 Victor Stinner : >> >>> http://bugs.python.org/issue12773 :) >>>> >>> The bug is marked as closed, whereas the bug exists in Python 3.2 and >>> has not been closed. The fix must be backported. >>> >> >> It's not a bug; it's a feature. >> > > Where does one draw the line between feature and bug? As a user I'm > inclined to classify this as a bug: __doc__ was writable with old-style > classes; __doc__ is writable with new-style classes with any metaclass; and > there exists no good reason (that I'm aware of ;) for __doc__ to not be > writable. Like it or not, this has worked this way ever since new-style classes were introduced. That has made it a de-facto feature. We should not encourage people to write code that works with a certain bugfix release but not with the previous bugfix release of the same feature release. Given that we haven't had any complaints about this in nearly a decade, the backport can't be important. Don't do it. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From janssen at parc.com Thu Jan 19 18:22:03 2012 From: janssen at parc.com (Bill Janssen) Date: Thu, 19 Jan 2012 09:22:03 PST Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <4F176191.6090206@pearwood.info> Message-ID: <62991.1326993723@parc.com> Nick Coghlan wrote: > On Thu, Jan 19, 2012 at 10:19 AM, Steven D'Aprano wrote: > > Brett Cannon wrote: > > Do we have any evidence of this alleged bitrot? I spend a lot of time on the > > comp.lang.python newsgroup and I see no evidence that people using Python > > believe the standard library is rotting from lack of attention. > > IMO, it's a problem mainly with network (especially web) protocols and > file formats. It can take the stdlib a long time to catch up with > external developments due to the long release cycle, so people are > often forced to switch to third party libraries that better track the > latest versions of relevant standards (de facto or otherwise). I'm not sure how much of a problem this really is. I continually build fairly complicated systems with Python that do a lot of HTTP networking, for instance. It's fairly easy to replace use of the standard library modules with use of Tornado and httplib2, and I wouldn't think of *not* doing that. But the standard modules are there, out-of-the-box, for experimentation and tinkering, and they work in the sense that they pass their module tests. Are those standard modules as "Internet-proof" as some commercially-supported package with an income stream that supports frequent security updates would be? Perhaps not. But maybe that's OK. 
Another way of doing this would be to "bless" certain third-party modules in some fashion short of incorporation, and provide them with more robust development support, again, "somehow", so that they don't fall by the wayside when their developers move on to something else, but are still able to release on an independent schedule. Bill From greg at krypto.org Thu Jan 19 18:41:56 2012 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 19 Jan 2012 09:41:56 -0800 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <4F170793.9060802@v.loewis.de> References: <4F15E130.6010200@v.loewis.de> <20120117222611.64b3fd4e@pitrou.net> <4F161942.5040100@v.loewis.de> <4F170793.9060802@v.loewis.de> Message-ID: On Wed, Jan 18, 2012 at 9:55 AM, "Martin v. Löwis" wrote: > On 18.01.2012 17:01, PJ Eby wrote: > > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" > > wrote: > > > > On 17.01.2012 22:26, Antoine Pitrou wrote: > > > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 > bits > > > could cache a "hash perturbation" computed from the string and the > > > random bits: > > > > > > - hash() would use ob_shash > > > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3)) > > > > > > This way, you cache almost all computations, adding only a > computation > > > and a couple logical ops when looking up a string in a dict. > > > > That's a good idea. For Unicode, it might be best to add another slot > > into the object, even though this increases the object size. > > > > Wouldn't that break the ABI in 2.x? > > I was thinking about adding the field at the end, so I thought it > shouldn't. However, if somebody inherits from PyUnicodeObject, it still > might - so my new proposal is to add the extra hash into the str block, > either at str[-1], or after the terminating 0. This would cause an > average increase of four bytes of the storage (0 bytes in 50% of the > cases, 8 bytes because of padding in the other 50%).
> > What do you think? > str[-1] is not likely to work if you want to maintain ABI compatibility. Appending it to the data after the terminating \0 is more likely to be possible, but if there is any possibility that existing compiled extension modules have somehow inlined code to do allocation of the str field even that is questionable (I don't think there are?). I'd also be concerned about C API code that uses PyUnicode_Resize(). How do you keep track of whether you have filled in these extra bytes at the end or not? Do allocation and resize fill it with a magic value indicating "not filled in", similar to a tp_hash of -1? Regardless of all of this, I don't think this fully addresses the overall issue as strings within other hashable data structures like tuples would not be treated this way, only strings directly stored in a dict. Sure you can continue on and "fix" tuples and such in a similar manner but then what about user defined classes that implement __hash__ based on the return value of hash() on some strings they contain? I don't see anything I'd consider a real complete fix unless we also backport the randomized hash code so that people who need a guaranteed fix can enable it and use it. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Thu Jan 19 19:01:15 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 19 Jan 2012 11:01:15 -0700 Subject: [Python-Dev] PEP 407 / splitting the stdlib In-Reply-To: <62991.1326993723@parc.com> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <4F176191.6090206@pearwood.info> <62991.1326993723@parc.com> Message-ID: On Jan 19, 2012 9:28 AM, "Bill Janssen" wrote: > I'm not sure how much of a problem this really is. I continually build > fairly complicated systems with Python that do a lot of HTTP networking, > for instance.
It's fairly easy to replace use of the standard library > modules with use of Tornado and httplib2, and I wouldn't think of *not* > doing that. But the standard modules are there, out-of-the-box, for > experimentation and tinkering, and they work in the sense that they pass > their module tests. Are those standard modules as "Internet-proof" as > some commercially-supported package with an income stream that supports > frequent security updates would be? This is starting to sound a little like the discussion about the __preview__ / __experimental__ idea. If I recall correctly, one of the points is that for some organizations getting a third-party library approved for use is not trivial. In contrast, inclusion in the stdlib is like a free pass, since the organization can rely on the robustness of the CPython QA and release processes. As well, there is at least a small cost with third-party libraries for those that maintain more rigorous configuration management. In contrast, there is basically no extra cost with new/updated stdlib, beyond upgrading Python. -eric > > Perhaps not. But maybe that's OK. > > Another way of doing this would be to "bless" certain third-party > modules in some fashion short of incorporation, and provide them with > more robust development support, again, "somehow", so that they don't > fall by the wayside when their developers move on to something else, > but are still able to release on an independent schedule. > > Bill > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ericsnowcurrently%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Thu Jan 19 19:04:36 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Fri, 20 Jan 2012 03:04:36 +0900 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <4F1846AB.7060700@stoneleaf.us> References: <4F175D53.1050107@stoneleaf.us> <4F1846AB.7060700@stoneleaf.us> Message-ID: <871uqv7emj.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Where does one draw the line between feature and bug? Bug: Doesn't work as documented. Feature: Works as expected but not documented[1] to do so. Miracle: Works as documented.[2] Unspecified behavior that doesn't work as you expect is the unmarked case (ie, none of the above). The Devil's Dictionary defines feature somewhat differently: Feature: Name for any behavior you don't feel like justifying to a user. Footnotes: [1] Including cases where the patch contains documentation but hasn't been committed to trunk yet. [2] Python is pretty miraculous, isn't it? From ethan at stoneleaf.us Thu Jan 19 18:46:07 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 19 Jan 2012 09:46:07 -0800 Subject: [Python-Dev] Writable __doc__ In-Reply-To: References: <4F175D53.1050107@stoneleaf.us> <4F1846AB.7060700@stoneleaf.us> Message-ID: <4F1856DF.3040708@stoneleaf.us> Guido van Rossum wrote: > We should not encourage people to write code that works with a certain > bugfix release but not with the previous bugfix release of the same > feature release. Then what's the point of a bug-fix release? If 3.2.1 had broken threading, wouldn't we fix it in 3.2.2 and encourage folks to switch to 3.2.2? Or would we scrap 3.2 and move immediately to 3.3? (Is that more or less what happened with 3.0?) > Like it or not, this has worked this way ever since new-style classes > were introduced. That has made it a de-facto feature. But what of the discrepancy between the 'type' metaclass and any other Python metaclass? > Given that we haven't had any complaints about this in nearly a decade, > the backport can't be important. Don't do it. Agreed. 
~Ethan~ From fuzzyman at voidspace.org.uk Thu Jan 19 19:40:09 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 19 Jan 2012 18:40:09 +0000 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <4F1856DF.3040708@stoneleaf.us> References: <4F175D53.1050107@stoneleaf.us> <4F1846AB.7060700@stoneleaf.us> <4F1856DF.3040708@stoneleaf.us> Message-ID: <4F186389.2060601@voidspace.org.uk> On 19/01/2012 17:46, Ethan Furman wrote: > Guido van Rossum wrote: > > We should not encourage people to write code that works with a certain > > bugfix release but not with the previous bugfix release of the same > > feature release. > > Then what's the point of a bug-fix release? If 3.2.1 had broken > threading, wouldn't we fix it in 3.2.2 and encourage folks to switch > to 3.2.2? Or would we scrap 3.2 and move immediately to 3.3? (Is > that more or less what happened with 3.0?) > > >> Like it or not, this has worked this way ever since new-style classes >> were introduced. That has made it a de-facto feature. > > But what of the discrepancy between the 'type' metaclass and any other > Python metaclass? There are many discrepancies between built-in types and any Python class. Writable attributes are (generally) one of them. Michael > > >> Given that we haven't had any complaints about this in nearly a >> decade, the backport can't be important. Don't do it. > > Agreed. > > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From g.brandl at gmx.net Thu Jan 19 21:12:01 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 19 Jan 2012 21:12:01 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F175FD6.30502@pearwood.info> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> Message-ID: On 19.01.2012 01:12, Steven D'Aprano wrote: > One on-going complaint is that Python-Dev doesn't have the manpower or time to > do everything that needs to be done. Bugs languish for months or years because > nobody has the time to look at them. Will going to a more rapid release cycle > give people more time, or just increase their workload? You're hoping that a > more rapid release cycle will attract more developers, and there is a chance > that you could be right; but a more rapid release cycle WILL increase the > total work load. So you're betting that this change will attract enough new > developers that the work load per person will decrease even as the total work > load increases. I don't think that's a safe bet. I can't help noticing that so far, worries about the workload came mostly from people who don't actually bear that load (this is no accusation!), while those that do are the proponents of the PEP... That is, I don't want to exclude you from the discussion, but on the issue of workload I would like to encourage more of our (past and present) release managers and active bug triagers to weigh in.
cheers, Georg From nadeem.vawda at gmail.com Thu Jan 19 22:09:40 2012 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Thu, 19 Jan 2012 23:09:40 +0200 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add documentation for nargs=argparse.REMAINDER In-Reply-To: References: Message-ID: On Thu, Jan 19, 2012 at 11:03 PM, sandro.tosi wrote: > + ?are gathered into a lits. This is commonly useful for command line s/lits/list ? From sandro.tosi at gmail.com Thu Jan 19 22:17:56 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Thu, 19 Jan 2012 22:17:56 +0100 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add documentation for nargs=argparse.REMAINDER In-Reply-To: References: Message-ID: On Thu, Jan 19, 2012 at 22:09, Nadeem Vawda wrote: > On Thu, Jan 19, 2012 at 11:03 PM, sandro.tosi > wrote: >> + ?are gathered into a lits. This is commonly useful for command line > > s/lits/list ? crap! I committed an older version of the patch... thanks for spotting it, i'll fix it right away -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From guido at python.org Thu Jan 19 22:21:28 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jan 2012 13:21:28 -0800 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <4F1856DF.3040708@stoneleaf.us> References: <4F175D53.1050107@stoneleaf.us> <4F1846AB.7060700@stoneleaf.us> <4F1856DF.3040708@stoneleaf.us> Message-ID: On Thu, Jan 19, 2012 at 9:46 AM, Ethan Furman wrote: > Guido van Rossum wrote: > > We should not encourage people to write code that works with a certain > > bugfix release but not with the previous bugfix release of the same > > feature release. > > Then what's the point of a bug-fix release? If 3.2.1 had broken > threading, wouldn't we fix it in 3.2.2 and encourage folks to switch to > 3.2.2? Or would we scrap 3.2 and move immediately to 3.3? 
(Is that more > or less what happened with 3.0?) Usually the bugs fixed in bugfix releases are things that usually go well but don't work under certain circumstances. But I'd also be happy to just declare that assignable __doc__ is a feature without explaining why. Like it or not, this has worked this way ever since new-style classes were > introduced. That has made it a de-facto feature. > But what of the discrepancy between the 'type' metaclass and any other > Python metaclass? Michael Foord explained that. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandro.tosi at gmail.com Thu Jan 19 23:10:42 2012 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Thu, 19 Jan 2012 23:10:42 +0100 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add documentation for nargs=argparse.REMAINDER In-Reply-To: <4F18860F.7030909@udel.edu> References: <4F18860F.7030909@udel.edu> Message-ID: On Thu, Jan 19, 2012 at 22:07, Terry Reedy wrote: > typo ... > lits .> list yep, i've already fixed it committing a more useful example too -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From tjreedy at udel.edu Thu Jan 19 23:14:49 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Jan 2012 17:14:49 -0500 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <871uqv7emj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F175D53.1050107@stoneleaf.us> <4F1846AB.7060700@stoneleaf.us> <871uqv7emj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 1/19/2012 1:04 PM, Stephen J. Turnbull wrote: > Ethan Furman writes: > > > Where does one draw the line between feature and bug? > > Bug: Doesn't work as documented. The basic idea is that the x.y docs define (mostly) the x.y language. Patches to the x.y docs fix typos, omissions, ambiguities, and the occasional error. 
The x.y.z cpython releases are increasingly better implementations of Python x.y. -- Terry Jan Reedy From ethan at stoneleaf.us Thu Jan 19 23:44:18 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 19 Jan 2012 14:44:18 -0800 Subject: [Python-Dev] Writable __doc__ In-Reply-To: <871uqv7emj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F175D53.1050107@stoneleaf.us> <4F1846AB.7060700@stoneleaf.us> <871uqv7emj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F189CC2.3090300@stoneleaf.us> Stephen J. Turnbull wrote: > Ethan Furman writes: > >> Where does one draw the line between feature and bug? > > Miracle: Works as documented.[2] > > > [2] Python is pretty miraculous, isn't it? Yes, indeed it is! :) ~Ethan~ From martin at v.loewis.de Fri Jan 20 00:54:00 2012 From: martin at v.loewis.de ("Martin v. Löwis") Date: Fri, 20 Jan 2012 00:54:00 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> Message-ID: <4F18AD18.2080901@v.loewis.de> > I can't help noticing that so far, worries about the workload came mostly from > people who don't actually bear that load (this is no accusation!), while those > that do are the proponents of the PEP... Ok, so let me add then that I'm worried about the additional work-load. I'm particularly worried about the coordination of vacation across the three people that work on a release.
It might well not be possible to make any release for a period of two months, which, in a six-month release cycle with two alphas and a beta, might mean that we (the release people) would need to adjust our vacation plans with the release schedule, or else step down (unless you would release the "normal" feature releases as source-only releases). FWIW, it might well be that I can't be available for the 3.3 final release (I haven't finalized my vacation schedule yet for August). Regards, Martin From vijaymajagaonkar at gmail.com Fri Jan 20 00:56:25 2012 From: vijaymajagaonkar at gmail.com (Vijay N. Majagaonkar) Date: Thu, 19 Jan 2012 18:56:25 -0500 Subject: [Python-Dev] python build failed on mac Message-ID: Hi all, I am trying to build Python 3 on a Mac and the build is failing with the following error; can somebody help me with this? $ hg clone http://hg.python.org/cpython $ ./configure $ make gcc -framework CoreFoundation -o python.exe Modules/python.o libpython3.3m.a -ldl -framework CoreFoundation ./python.exe -SE -m sysconfig --generate-posix-vars Could not find platform dependent libraries Consider setting $PYTHONHOME to [:] python.exe(43296) malloc: *** mmap(size=7310873954244194304) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug make: *** [Lib/_sysconfigdata.py] Segmentation fault: 11 make: *** Deleting file `Lib/_sysconfigdata.py' ;) -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at haypocalc.com Fri Jan 20 01:48:53 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 20 Jan 2012 01:48:53 +0100 Subject: [Python-Dev] Counting collisions for the win Message-ID: Hi, I've been working on the hash collision issue for 2 or 3 weeks. I evaluated all solutions and I think that I have now a good knowledge of the problem and how it should be solved. The major issue is to have a minor or no impact on applications (don't break backward compatibility).
I saw three major solutions: - use a randomized hash - use two hashes, a randomized hash and the actual hash kept for backward compatibility - count collisions on dictionary lookup Using a randomized hash does break a lot of tests (e.g. tests relying on the representation of a dictionary). The patch is huge, too big to backport it directly on stable versions. Using a randomized hash may also break (indirectly) real applications because the application output is also somehow "randomized". For example, in the Django test suite, the HTML output is different at each run. Web browsers may render the web page differently, or crash, or ... I don't think that Django would like to sort attributes of each HTML tag, just because we wanted to fix a vulnerability. Randomized hash has also a major issue: if the attacker is able to compute the secret, (s)he can easily compute collisions and exploit the hash collision vulnerability again. I don't know exactly how complex it is to compute the secret, but our hash function is weak (it is far from being cryptographic, it is really simple to run it backward). If someone writes a fast function to compute the secret, we will go back to the same point. IMO using two hashes has the same disadvantages as the randomized hash solution, whereas it is more complex to implement. The last solution is very simple: count collisions and raise an exception if it hits a limit. The patch is something like 10 lines, whereas the randomized hash is closer to 500 lines: add a new file, change the Visual Studio project file, etc. First I thought that it would break more applications than the randomized hash, but I tried on Django: the test suite fails with a limit of 20 collisions, but not with a limit of 50 collisions, whereas the patch uses a limit of 1000 collisions. According to my basic tests, a limit of 35 collisions requires a dictionary with more than 10,000,000 integer keys to raise an error. I am not talking about the attack, but valid data.
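[Editorial sketch, purely for illustration and not the actual C patch: a toy open-addressing table with CPython-style probing that gives up once a single lookup sees too many colliding slots. The function names, the Colliding class, and the default limit of 1000 are assumptions for this example.]

```python
class TooManyCollisions(Exception):
    """Raised when one lookup probes too many colliding slots."""

def toy_insert(table, key, value, mask):
    """Insert into a toy open-addressing table (no resizing)."""
    i = hash(key) & mask
    perturb = hash(key)
    while table[i] is not None and table[i][0] != key:
        i = (5 * i + perturb + 1) & mask   # CPython-style probe sequence
        perturb >>= 5
    table[i] = (key, value)

def toy_lookup(table, key, mask, limit=1000):
    """Probe like a dict lookup, but give up after `limit` collisions."""
    i = hash(key) & mask
    perturb = hash(key)
    collisions = 0
    while table[i] is not None:
        if table[i][0] == key:
            return table[i][1]
        collisions += 1
        if collisions > limit:
            # Attacker-supplied colliding keys hit this long before
            # ordinary data does.
            raise TooManyCollisions(key)
        i = (5 * i + perturb + 1) & mask
        perturb >>= 5
    raise KeyError(key)

class Colliding:
    """Every instance hashes to the same value (the worst case)."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42
    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n

# Fill a small table with 30 keys that all collide.
table = [None] * 64
for n in range(30):
    toy_insert(table, Colliding(n), n, 63)
```

A lookup whose collision chain is short never comes near the limit, while a flood of colliding keys trips TooManyCollisions (deliberately a separate exception type rather than KeyError) long before legitimate data would.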
More details about my tests on the Django test suite: http://bugs.python.org/issue13703#msg151620 -- I propose to solve the hash collision vulnerability by counting collisions because it does fix the vulnerability with a minor or no impact on applications or backward compatibility. I don't see why we should use a different fix for Python 3.3. If counting collisions solves the issue for stable versions, it is also enough for Python 3.3. We now know all issues of the randomized hash solution, and I think that there are more drawbacks than advantages. IMO the randomized hash is overkill to fix the hash collision issue. I just have some requests on Marc Andre Lemburg's patch: - the limit should be configurable: a new function in the sys module should be enough. It may be private (or replaced by an environment variable?) in stable versions - the set type should also be patched (I didn't check if it is vulnerable or not using the patch) - the patch has no test! (a class with a fixed hash should be enough to write a test) - the limit must be documented somewhere - the exception type should be different than KeyError Victor From greg.ewing at canterbury.ac.nz Thu Jan 19 22:41:17 2012 From: greg.ewing at canterbury.ac.nz (Greg) Date: Fri, 20 Jan 2012 10:41:17 +1300 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> Message-ID: <4F188DFD.6080401@canterbury.ac.nz> Glyph wrote: > [Guido] mentions the point that coroutines that can implicitly switch out from > under you have the same non-deterministic property as threads: you don't > know where you're going to need a lock or lock-like construct to update > any variables, so you need to think about concurrency more deeply than > if you could explicitly always see a 'yield'.
I'm not convinced that being able to see 'yield's will help all that much. In any system that makes substantial use of generator-based coroutines, you're going to see 'yield from's all over the place, from the lowest to the highest levels. But that doesn't mean you need a correspondingly large number of locks. You can't look at a 'yield' and conclude that you need a lock there or tell what needs to be locked. There's no substitute for deep thought where any kind of threading is involved, IMO. -- Greg From guido at python.org Fri Jan 20 03:47:13 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jan 2012 18:47:13 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: On Thu, Jan 19, 2012 at 4:48 PM, Victor Stinner < victor.stinner at haypocalc.com> wrote: > Hi, > > I've been working on the hash collision issue for 2 or 3 weeks. I > evaluated all solutions and I think that I have now a good knowledge > of the problem and how it should be solved. The major issue is to have > a minor or no impact on applications (don't break backward > compatibility). I saw three major solutions: > > - use a randomized hash > - use two hashes, a randomized hash and the actual hash kept for > backward compatibility > - count collisions on dictionary lookup > > Using a randomized hash does break a lot of tests (e.g. tests relying > on the representation of a dictionary). The patch is huge, too big to > backport it directly on stable versions. Using a randomized hash may > also break (indirectly) real applications because the application > output is also somehow "randomized". For example, in the Django test > suite, the HTML output is different at each run. Web browsers may > render the web page differently, or crash, or ... I don't think that > Django would like to sort attributes of each HTML tag, just because we > wanted to fix a vulnerability.
> > Randomized hashing also has a major issue: if the attacker is able to > compute the secret, (s)he can easily compute collisions and exploit > the hash collision vulnerability again. I don't know exactly how > complex it is to compute the secret, but our hash function is weak (it > is far from being cryptographic, it is really simple to run it > backward). If someone writes a fast function to compute the secret, we > will go back to the same point. > > IMO using two hashes has the same disadvantages as the randomized hash > solution, whereas it is more complex to implement. > > The last solution is very simple: count collisions and raise an > exception if it hits a limit. The patch is something like 10 lines > whereas the randomized hash is closer to 500 lines, add a new > file, change Visual Studio project file, etc. First I thought that it > would break more applications than the randomized hash, but I tried on > Django: the test suite fails with a limit of 20 collisions, but not > with a limit of 50 collisions, whereas the patch uses a limit of 1000 > collisions. According to my basic tests, a limit of 35 collisions > requires a dictionary with more than 10,000,000 integer keys to raise > an error. I am not talking about the attack, but valid data. > > More details about my tests on the Django test suite: > http://bugs.python.org/issue13703#msg151620 > > -- > > I propose to solve the hash collision vulnerability by counting > collisions because it does fix the vulnerability with a minor or no > impact on applications or backward compatibility. I don't see why we > should use a different fix for Python 3.3. If counting collisions > solves the issue for stable versions, it is also enough for Python > 3.3. We now know all issues of the randomized hash solution, and I > think that there are more drawbacks than advantages. IMO the > randomized hash is overkill to fix the hash collision issue.
> +1 > I just have some requests on Marc Andre Lemburg patch: > > - the limit should be configurable: a new function in the sys module > should be enough. It may be private (or replaced by an environment > variable?) in stable versions > - the set type should also be patched (I didn't check if it is > vulnerable or not using the patch) > - the patch has no test! (a class with a fixed hash should be enough > to write a test) > - the limit must be documented somwhere > - the exception type should be different than KeyError > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jan 20 03:49:29 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 20 Jan 2012 12:49:29 +1000 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F18AD18.2080901@v.loewis.de> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <4F18AD18.2080901@v.loewis.de> Message-ID: On Fri, Jan 20, 2012 at 9:54 AM, "Martin v. L?wis" wrote: >> I can't help noticing that so far, worries about the workload came mostly from >> people who don't actually bear that load (this is no accusation!), while those >> that do are the proponents of the PEP... > > Ok, so let me add then that I'm worried about the additional work-load. 
> > I'm particularly worried about the coordination of vacation across the > three people that work on a release. It might well not be possible to > make any release for a period of two months, which, in a six-months > release cycle with two alphas and a beta, might mean that we (the > release people) would need to adjust our vacation plans with the release > schedule, or else step down (unless you would release the "normal" > feature releases as source-only releases). I must admit that aspect had concerned me as well. Currently we use the 18-24 month window for releases to slide things around to accommodate the schedules of the RM, Martin (Windows binaries) and Ned/Ronald (Mac OS X binaries). Before we could realistically switch to more frequent releases, something would need to change on the binary release side. Regards, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From anacrolix at gmail.com Fri Jan 20 04:01:19 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 20 Jan 2012 14:01:19 +1100 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <4F188DFD.6080401@canterbury.ac.nz> References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> <4F188DFD.6080401@canterbury.ac.nz> Message-ID: On Fri, Jan 20, 2012 at 8:41 AM, Greg wrote: > Glyph wrote: >> >> [Guido] mentions the point that coroutines that can implicitly switch out >> from under you have the same non-deterministic property as threads: you >> don't know where you're going to need a lock or lock-like construct to >> update any variables, so you need to think about concurrency more deeply >> than if you could explicitly always see a 'yield'. > > > I'm not convinced that being able to see 'yield's will help > all that much. 
In any system that makes substantial use of > generator-based coroutines, you're going to see 'yield from's > all over the place, from the lowest to the highest levels. > But that doesn't mean you need a correspondingly large > number of locks. You can't look at a 'yield' and conclude > that you need a lock there or tell what needs to be locked. > > There's no substitute for deep thought where any kind of > threading is involved, IMO. > > -- > Greg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com I wasn't aware that Guido had brought this up, and I believe what he says to be true. Preemptive coroutines are just a hack around the GIL, and reduce OS overheads. It's the explicit nature of the enhanced generators that is their greatest value. FWIW, I wrote a Python 3 compatible equivalent to gevent (also greenlet based, and also very similar to Brett et al.'s coroutine proposal), which didn't really solve the concurrency problems I had hoped it would. There were no guarantees whether functions would "switch out", so all the locking and threading issues simply reemerged, albeit while also needing all calls to be non-blocking, losing compatibility with any routine that didn't make use of nonblocking calls and/or expose its "yield" in the correct way, but reducing GIL contention. Overall not worth it. In short, implicit coroutines are just a GIL workaround that breaks compatibility for little gain. Thanks Glyph for those links.
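[The explicit-switching model Matt and Glyph prefer can be illustrated with a toy round-robin scheduler; all names here are illustrative, not any real framework's API. Tasks are plain generators, and control can only transfer at a visible `yield`, so the read-modify-write below needs no lock.]

```python
from collections import deque

counter = {"value": 0}

def worker(n):
    for _ in range(n):
        counter["value"] += 1   # safe: no task switch can happen mid-statement
        yield                   # explicit, visible suspension point

def run(tasks):
    """Round-robin scheduler: resume each task until it finishes."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task up to its next `yield`
        except StopIteration:
            continue            # task finished; drop it
        queue.append(task)      # otherwise, reschedule it

run([worker(3), worker(4)])
print(counter["value"])  # 7
```

With implicit (greenlet-style) switching, any call could hide a context switch, and the increment above would need protection.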
From ivan at ludios.org Fri Jan 20 04:32:13 2012 From: ivan at ludios.org (Ivan Kozik) Date: Fri, 20 Jan 2012 03:32:13 +0000 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: On Fri, Jan 20, 2012 at 00:48, Victor Stinner wrote: > I propose to solve the hash collision vulnerability by counting > collisions because it does fix the vulnerability with a minor or no > impact on applications or backward compatibility. I don't see why we > should use a different fix for Python 3.3. If counting collisons > solves the issue for stable versions, it is also enough for Python > 3.3. We now know all issues of the randomized hash solution, and I > think that there are more drawbacks than advantages. IMO the > randomized hash is overkill to fix the hash collision issue. I'd like to point out that an attacker is not limited to sending just one dict full of colliding keys. Given a 22ms stall for a dict full of 1000 colliding keys, and 100 such objects inside a parent object (perhaps JSON), you can stall a server for 2.2+ seconds. Going with the raise-at-1000 approach doesn't solve the problem for everyone. In addition, because the raise-at-N-collisions approach raises an exception, everyone who wants to handle this error condition properly has to change their code to catch a previously-unexpected exception. (I know they're usually still better off with the fix, but why force many people to change code when you can actually fix the hashing problem?) Another issue is that even with a configurable limit, different modules can't have their own limits. One module might want a relatively safe raise-at-100, and another module creating massive dicts might want raise-at-1000. How does a developer know whether they can raise or lower the limit, given that they use a bunch of different modules? 
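[The stall Ivan describes can be reproduced with a deliberately degenerate hash — an illustration only, since the real attack precomputes strings that collide under CPython's string hash and needs no custom class. Every key lands in the same probe chain, so building one dict is quadratic, and a request carrying a hundred such objects multiplies the stall.]

```python
import timeit

class Colliding:
    """Every instance hashes to the same value: worst-case probing."""
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return 42              # all keys share one hash -> one probe chain
    def __eq__(self, other):
        return self.x == other.x

class Normal(Colliding):
    def __hash__(self):
        return hash(self.x)    # well-distributed hashes

def build(n, cls):
    # with all-colliding keys, insertion i probes the i earlier keys:
    # O(n**2) equality tests in total
    return {cls(i): None for i in range(n)}

slow = timeit.timeit(lambda: build(1000, Colliding), number=1)
fast = timeit.timeit(lambda: build(1000, Normal), number=1)
print(slow > fast)   # the colliding build is dramatically slower
```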
I actually went with this stop-at-N-collisions approach by patching my CPython a few years ago, where I limited dictobject and setobject's critical `for` loop to 100 iterations (I realize this might handle fewer than 100 collisions.) This worked fine until I tried to compile PyPy, where the translator blew up due to a massive dict. This, combined with the second problem (needing to catch an exception), led me to abandon this approach and write Securetypes, which has a securedict that uses SHA-1. Not that I like this either; I think I'm happy with the randomize-hash() approach. Ivan From guido at python.org Fri Jan 20 04:48:16 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jan 2012 19:48:16 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik wrote: > On Fri, Jan 20, 2012 at 00:48, Victor Stinner > wrote: > > I propose to solve the hash collision vulnerability by counting > > collisions because it does fix the vulnerability with a minor or no > > impact on applications or backward compatibility. I don't see why we > > should use a different fix for Python 3.3. If counting collisions > > solves the issue for stable versions, it is also enough for Python > > 3.3. We now know all issues of the randomized hash solution, and I > > think that there are more drawbacks than advantages. IMO the > > randomized hash is overkill to fix the hash collision issue. > > I'd like to point out that an attacker is not limited to sending just > one dict full of colliding keys. Given a 22ms stall for a dict full > of 1000 colliding keys, and 100 such objects inside a parent object > (perhaps JSON), you can stall a server for 2.2+ seconds. Going with > the raise-at-1000 approach doesn't solve the problem for everyone. > It's "just" a DoS attack. Those won't go away. We just need to raise the effort needed for the attacker.
The original attack would cause something like 5 minutes of CPU usage per request (with a set of colliding keys that could be computed once and used to attack every Python-run website in the world). That's at least 2 orders of magnitude worse. In addition, because the raise-at-N-collisions approach raises an > exception, everyone who wants to handle this error condition properly > has to change their code to catch a previously-unexpected exception. > (I know they're usually still better off with the fix, but why force > many people to change code when you can actually fix the hashing > problem?) > Why would anybody need to change their code? Every web framework worth its salt has a top-level error catcher that logs the error, serves a 500 response, and possibly does other things like email the admin. > Another issue is that even with a configurable limit, different > modules can't have their own limits. One module might want a > relatively safe raise-at-100, and another module creating massive > dicts might want raise-at-1000. How does a developer know whether > they can raise or lower the limit, given that they use a bunch of > different modules? > I don't think it needs to be configurable. There just needs to be a way to turn it off. > I actually went with this stop-at-N-collisions approach by patching my > CPython a few years ago, where I limiting dictobject and setobject's > critical `for` loop to 100 iterations (I realize this might handle > fewer than 100 collisions.) This worked fine until I tried to compile > PyPy, where the translator blew up due to a massive dict. I think that's because your collision-counting algorithm was much more primitive than MAL's. > This, > combined with the second problem (needing to catch an exception), led > me to abandon this approach and write Securetypes, which has a > securedict that uses SHA-1. Not that I like this either; I think I'm > happy with the randomize-hash() approach. > Why did you need to catch the exception? 
Were you not happy with the program simply terminating with a traceback when it got attacked? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at python.org Fri Jan 20 04:57:53 2012 From: brian at python.org (Brian Curtin) Date: Thu, 19 Jan 2012 21:57:53 -0600 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F18AD18.2080901@v.loewis.de> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <4F18AD18.2080901@v.loewis.de> Message-ID: On Thu, Jan 19, 2012 at 17:54, "Martin v. L?wis" wrote: > Ok, so let me add then that I'm worried about the additional work-load. > > I'm particularly worried about the coordination of vacation across the > three people that work on a release. It might well not be possible to > make any release for a period of two months, which, in a six-months > release cycle with two alphas and a beta, might mean that we (the > release people) would need to adjust our vacation plans with the release > schedule, or else step down (unless you would release the "normal" > feature releases as source-only releases). > > FWIW, it might well be that I can't be available for the 3.3 final > release (I haven't finalized my vacation schedule yet for August). In the interest of not having Windows releases depend on one person, and having gone through building the installer myself (which I know is but one of the duties), I'm available to help should you need it. 
From steve at pearwood.info Fri Jan 20 05:00:48 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 20 Jan 2012 15:00:48 +1100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: <4F18E6F0.2010208@pearwood.info> Victor Stinner wrote: > The last solution is very simple: count collisions and raise an > exception if it hits a limit. ... > According to my basic tests, a limit of 35 collisions > requires a dictionary with more than 10,000,000 integer keys to raise > an error. I am not talking about the attack, but valid data. You might think that 10 million keys is a lot of data, but that's only about 100 MB worth. I already see hardware vendors advertising computers with 6 GB RAM as "entry level", e.g. the HP Pavilion starts with 6GB expandable to 16GB. I expect that there are already people using Python who will unpredictably hit that limit by accident, and the number will only grow as computers get more memory. With a limit of 35 collisions, it only takes 35 keys to force a dict to raise an exception, if you are an attacker able to select colliding keys. We're trying to defend against an attacker who is able to force collisions, not one who is waiting for accidental collisions. I don't see that causing the dict to raise an exception helps matters: it just changes the attack from "keep the dict busy indefinitely" to "cause an exception and crash the application". This moves responsibility for dealing with collisions out of the dict and into the application code. Instead of solving the problem in one place (the built-in dict), now every application that uses dicts has to identify which dicts can be attacked, and deal with the exception. That pushes the responsibility for security onto people who are the least willing or able to deal with it: the average developer, who neither understands nor cares about security, or if they do care, they can't convince their manager to care.
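[For readers following along, the collision-counting scheme under debate can be modeled in a few lines of pure Python. Marc-Andre Lemburg's actual patch lives in C inside dictobject.c and uses open addressing; this toy uses separate chaining and illustrative names, but shows the behavior both sides are arguing about: valid data sails through, while a flood of equal-hash keys hits the limit and raises.]

```python
COLLISION_LIMIT = 35

class CountingDict:
    """Toy chained hash table that raises once a bucket grows too long."""
    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]

    def __setitem__(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite existing key
                return
        if len(bucket) >= COLLISION_LIMIT:
            # deliberately NOT KeyError, per Victor's request
            raise RuntimeError("too many hash collisions")
        bucket.append((key, value))

    def __getitem__(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        raise KeyError(key)

class Evil:
    """Attacker-controlled key: constant hash, distinct identity."""
    def __hash__(self):
        return 7

d = CountingDict()
try:
    for i in range(100):
        d[Evil()] = i        # every key collides; insertion 36 raises
except RuntimeError as e:
    print("rejected:", e)
```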
I suppose an exception is an improvement over the application hanging indefinitely, but I'd hardly call it a fix. Ruby uses randomized hashes. Are there any other languages with a dict or mapping class that raises on too many collisions? -- Steven From ivan at ludios.org Fri Jan 20 05:06:25 2012 From: ivan at ludios.org (Ivan Kozik) Date: Fri, 20 Jan 2012 04:06:25 +0000 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: On Fri, Jan 20, 2012 at 03:48, Guido van Rossum wrote: > I think that's because your collision-counting algorithm was much more > primitive than MAL's. Conceded. >> This, >> combined with the second problem (needing to catch an exception), led >> me to abandon this approach and write Securetypes, which has a >> securedict that uses SHA-1. Not that I like this either; I think I'm >> happy with the randomize-hash() approach. > > > Why did you need to catch the exception? Were you not happy with the program > simply terminating with a traceback when it got attacked? No, I wasn't happy with termination. I wanted to treat it just like a JSON decoding error, and send the appropriate response. I actually forgot to mention the main reason I abandoned the stop-at-N-collisions approach. I had a server with a dict that stayed in memory across many requests. It was being populated with identifiers chosen by clients. I couldn't have my server stay broken if this dict filled up with a bunch of colliding keys. (I don't think I could have done anything else either, like nuking the dict or evicting some keys.)
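[The securedict Ivan mentions can be sketched as a wrapper that never lets attacker-chosen strings reach hash() directly: the internal dict is keyed by a salted SHA-1 digest, so without the per-instance salt an attacker cannot predict where a key lands. This is an illustration under assumed names, not the actual Securetypes implementation.]

```python
import hashlib
import os

class SecureDict:
    """String-keyed mapping whose bucket placement depends on a secret salt."""
    def __init__(self):
        self._salt = os.urandom(16)   # per-instance, unpredictable
        self._data = {}

    def _digest(self, key):
        # attacker can't compute this without the salt, so they can't
        # engineer collisions in the underlying dict
        return hashlib.sha1(self._salt + key.encode("utf-8")).digest()

    def __setitem__(self, key, value):
        self._data[self._digest(key)] = (key, value)

    def __getitem__(self, key):
        return self._data[self._digest(key)][1]

    def __len__(self):
        return len(self._data)

d = SecureDict()
d["user"] = "alice"
print(d["user"])  # alice
```

The price, as the thread notes, is a SHA-1 computation per operation instead of a cheap string hash.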
Ivan From carl at oddbird.net Fri Jan 20 05:54:18 2012 From: carl at oddbird.net (Carl Meyer) Date: Thu, 19 Jan 2012 21:54:18 -0700 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: <4F18F37A.4040200@oddbird.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Victor, On 01/19/2012 05:48 PM, Victor Stinner wrote: [snip] > Using a randomized hash may > also break (indirectly) real applications because the application > output is also somehow "randomized". For example, in the Django test > suite, the HTML output is different at each run. Web browsers may > render the web page differently, or crash, or ... I don't think that > Django would like to sort attributes of each HTML tag, just because we > wanted to fix a vulnerability. I'm a Django core developer, and if it is true that our test-suite has a dictionary-ordering dependency that is expressed via HTML attribute ordering, I consider that a bug and would like to fix it. I'd be grateful for, not resentful of, a change in CPython that revealed the bug and prompted us to fix it. (I presume that it is true, as it sounds like you experienced it directly; I don't have time to play around at the moment, but I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago). I can't speak for the core team, but I doubt there would be much disagreement on this point: ideally Django would run equally well on any implementation of Python, and as far as I know none of the alternative implementations guarantee hash or dict-ordering compatibility with CPython. I don't have the expertise to speak otherwise to the alternatives for fixing the collisions vulnerability, but I don't believe it's accurate to presume that Django would not want to fix a dict-ordering dependency, and use that as a justification for one approach over another. 
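[The class of test bug Carl describes, and its fix, can be sketched in a few lines — illustrative code, not Django's. A test that compares raw HTML strings bakes in dict iteration order; comparing the parsed attribute mapping does not.]

```python
import re

def render_tag(name, attrs):
    """Render attributes in whatever order the dict yields them."""
    parts = " ".join('{}="{}"'.format(k, v) for k, v in attrs.items())
    return "<{} {}>".format(name, parts)

def attrs_of(tag):
    """Order-independent view of a tag's attributes."""
    return dict(re.findall(r'(\w+)="([^"]*)"', tag))

a = render_tag("input", {"type": "text", "name": "q"})
b = render_tag("input", {"name": "q", "type": "text"})

# Fragile test: `assert a == b` encodes iteration order and breaks
# whenever the hash function (or dict implementation) changes.
# Robust test: compare the parsed attributes instead.
print(attrs_of(a) == attrs_of(b))  # True regardless of attribute order
```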
Carl -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8Y83oACgkQ8W4rlRKtE2cNawCg5q/p1+OOKFYDymDJGoClBBlg WNAAn3xevD+0CqAQ+mFNHCBhtLgw8IYv =HDOh -----END PGP SIGNATURE----- From ncoghlan at gmail.com Fri Jan 20 06:15:16 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 20 Jan 2012 15:15:16 +1000 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F18E6F0.2010208@pearwood.info> References: <4F18E6F0.2010208@pearwood.info> Message-ID: On Fri, Jan 20, 2012 at 2:00 PM, Steven D'Aprano wrote: > With a limit of 35 collisions, it only takes 35 keys to to force a dict to > raise an exception, if you are an attacker able to select colliding keys. > We're trying to defend against an attacker who is able to force collisions, > not one who is waiting for accidental collisions. I don't see that causing > the dict to raise an exception helps matters: it just changes the attack > from "keep the dict busy indefinitely" to "cause an exception and crash the > application". No, that's fundamentally misunderstanding the nature of the attack. The reason the hash collision attack is a problem is because it allows you to DoS a web service in a way that requires minimal client side resources but can have a massive effect on the server. The attacker is making a single request that takes the server an inordinately long time to process, consuming CPU resources all the while, and likely preventing the handling of any other requests (especially for an event-based server, since the attack is CPU based, bypassing all use of asynchronous IO). With the 1000 collision limit in place, the attacker sends their massive request, the affected dict quickly hits the limit, throws an unhandled exception which is then caught by the web framework and turned into a 500 Error response (or whatever's appropriate for the protocol being attacked). 
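[The "catch all handler" Nick describes can be sketched as minimal WSGI middleware. This is a simplified illustration: a production server would also log the exception and pass exc_info to start_response, and the app and exception here are stand-ins.]

```python
def catch_all(app):
    """WSGI middleware: turn any unhandled exception into a 500 response."""
    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # a real framework would log the traceback here
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")])
            return [b"internal server error"]
    return wrapped

def vulnerable_app(environ, start_response):
    # stand-in for a request handler whose dict hit the collision limit
    raise RuntimeError("too many hash collisions")

app = catch_all(vulnerable_app)

statuses = []
body = app({}, lambda status, headers: statuses.append(status))
print(statuses[0])  # 500 Internal Server Error
```

The attacked request fails with a 500, the worker keeps serving other requests, and the DoS costs the attacker far more effort per stall.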
If a given web service doesn't *already* have a catch all handler to keep an unexpected exception from bringing the entire service down, then DoS attacks like this one are the least of its worries. As for why other languages haven't gone this way, I have no idea. There are lots of details relating to a language's hash and hash map design that will drive how suitable randomisation is as an answer, and it also depends greatly on how you decide to characterise the threat. FWIW, Victor's analysis in the opening post of this thread matches the conclusions I came to a few days ago, although he's been over the alternatives far more thoroughly than I have. Regards, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Fri Jan 20 06:18:36 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 20 Jan 2012 15:18:36 +1000 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F18F37A.4040200@oddbird.net> References: <4F18F37A.4040200@oddbird.net> Message-ID: On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer wrote: > I don't have the expertise to speak otherwise to the alternatives for > fixing the collisions vulnerability, but I don't believe it's accurate > to presume that Django would not want to fix a dict-ordering dependency, > and use that as a justification for one approach over another. It's more a matter of wanting deployment of a security fix to be as painless as possible - a security fix that system administrators can't deploy because it breaks critical applications may as well not exist. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From v+python at g.nevcal.com Fri Jan 20 06:24:55 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 19 Jan 2012 21:24:55 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F18F37A.4040200@oddbird.net> References: <4F18F37A.4040200@oddbird.net> Message-ID: <4F18FAA7.7000805@g.nevcal.com> On 1/19/2012 8:54 PM, Carl Meyer wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi Victor, > > On 01/19/2012 05:48 PM, Victor Stinner wrote: > [snip] >> Using a randomized hash may >> also break (indirectly) real applications because the application >> output is also somehow "randomized". For example, in the Django test >> suite, the HTML output is different at each run. Web browsers may >> render the web page differently, or crash, or ... I don't think that >> Django would like to sort attributes of each HTML tag, just because we >> wanted to fix a vulnerability. > I'm a Django core developer, and if it is true that our test-suite has a > dictionary-ordering dependency that is expressed via HTML attribute > ordering, I consider that a bug and would like to fix it. I'd be > grateful for, not resentful of, a change in CPython that revealed the > bug and prompted us to fix it. (I presume that it is true, as it sounds > like you experienced it directly; I don't have time to play around at > the moment, but I'm surprised we haven't seen bug reports about it from > users of 64-bit Pythons long ago). I can't speak for the core team, but > I doubt there would be much disagreement on this point: ideally Django > would run equally well on any implementation of Python, and as far as I > know none of the alternative implementations guarantee hash or > dict-ordering compatibility with CPython. 
> > I don't have the expertise to speak otherwise to the alternatives for > fixing the collisions vulnerability, but I don't believe it's accurate > to presume that Django would not want to fix a dict-ordering dependency, > and use that as a justification for one approach over another. > > Carl It might be a good idea to have a way to seed the hash with some value to allow testing with different dict orderings -- this would allow tests to be developed using one Python implementation that would be immune to the different orderings on different implementations; however, randomizing the hash not only doesn't solve the problem for long-running applications, it causes non-deterministic performance from one run to the next even with the exact same data: a different (random) seed could cause collisions sporadically with data that usually gave good performance results, and there would be little explanation for it, and little way to reproduce the problem to report it or understand it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at twistedmatrix.com Fri Jan 20 07:28:22 2012 From: glyph at twistedmatrix.com (Glyph) Date: Fri, 20 Jan 2012 01:28:22 -0500 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <4F188DFD.6080401@canterbury.ac.nz> References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> <4F188DFD.6080401@canterbury.ac.nz> Message-ID: On Jan 19, 2012, at 4:41 PM, Greg wrote: > Glyph wrote: >> [Guido] mentions the point that coroutines that can implicitly switch out from under you have the same non-deterministic property as threads: you don't know where you're going to need a lock or lock-like construct to update any variables, so you need to think about concurrency more deeply than if you could explicitly always see a 'yield'. 
> > I'm not convinced that being able to see 'yield's will help > all that much. Well, apparently we disagree, and I work on such a system all day, every day :-). It was nice to see that Matt Joiner also agreed for very similar reasons, and at least I know I'm not crazy. > In any system that makes substantial use of > generator-based coroutines, you're going to see 'yield from's > all over the place, from the lowest to the highest levels. > But that doesn't mean you need a correspondingly large > number of locks. You can't look at a 'yield' and conclude > that you need a lock there or tell what needs to be locked. Yes, but you can look at a 'yield' and conclude that you might need a lock, and that you have to think about it. Further exploration of my own feelings on the subject grew a bit beyond a good length for a reply here, so if you're interested in my thoughts you can have a look at my blog: . > There's no substitute for deep thought where any kind of theading is involved, IMO. Sometimes there's no alternative, but wherever I can, I avoid thinking, especially hard thinking. This maxim has served me very well throughout my programming career ;-). -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From hs at ox.cx Fri Jan 20 10:29:06 2012 From: hs at ox.cx (Hynek Schlawack) Date: Fri, 20 Jan 2012 10:29:06 +0100 Subject: [Python-Dev] python build failed on mac In-Reply-To: References: Message-ID: Hello Vijay Am Freitag, 20. Januar 2012 um 00:56 schrieb Vijay N. Majagaonkar: > I am trying to build python 3 on mac and build failing with following error can somebody help me with this It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: http://bugs.python.org/issue13241 make clean CC=clang ./configure && make -s works though (despite the abundant warnings). 
Regards, -h From martin at v.loewis.de Fri Jan 20 10:34:09 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 20 Jan 2012 10:34:09 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: <4F193511.5000102@v.loewis.de> > The last solution is very simple: count collisions and raise an > exception if it hits a limit. The patch is something like 10 lines > whereas the randomized hash is closer to 500 lines, add a new > file, change Visual Studio project file, etc. First I thought that it > would break more applications than the randomized hash The main issue with that approach is that it allows a new kind of attack. An attacker now needs to find 1000 colliding keys, and submit them one-by-one into a database. The limit will not trigger, as those are just database insertions. Now, if the application also has a need to read the entire database table into a dictionary, that will suddenly break, and not for the attacker (which would be ok), but for the regular user of the application or the site administrator. So it may be that this approach actually simplifies the attack, making the cure worse than the disease. Regards, Martin From hrvoje.niksic at avl.com Fri Jan 20 10:49:06 2012 From: hrvoje.niksic at avl.com (Hrvoje Niksic) Date: Fri, 20 Jan 2012 10:49:06 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: <4F170793.9060802@v.loewis.de> References: <4F15E130.6010200@v.loewis.de> <20120117222611.64b3fd4e@pitrou.net> <4F161942.5040100@v.loewis.de> <4F170793.9060802@v.loewis.de> Message-ID: <4F193892.90901@avl.com> On 01/18/2012 06:55 PM, "Martin v. Löwis" wrote: > I was thinking about adding the field at the end, Will this make all strings larger, or only those that create dict collisions? Making all strings larger to fix this issue sounds like a really bad idea. Also, would it be acceptable to simply not cache the alternate hash?
The cached string hash is an optimization anyway. Hrvoje From pydev at sievertsen.de Fri Jan 20 10:57:44 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Fri, 20 Jan 2012 10:57:44 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F193511.5000102@v.loewis.de> References: <4F193511.5000102@v.loewis.de> Message-ID: <4F193A98.1070603@sievertsen.de> > The main issue with that approach is that it allows a new kind of attack. Indeed, I posted another example: http://bugs.python.org/msg151677 This kind of fix can be used in a specific application or maybe in a special-purpose framework, but not on the level of a general-purpose language. Frank -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jan 20 11:06:32 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 20 Jan 2012 20:06:32 +1000 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F193511.5000102@v.loewis.de> References: <4F193511.5000102@v.loewis.de> Message-ID: On Fri, Jan 20, 2012 at 7:34 PM, "Martin v. L?wis" wrote: > The main issue with that approach is that it allows a new kind of attack. > > An attacker now needs to find 1000 colliding keys, and submit them > one-by-one into a database. The limit will not trigger, as those are > just database insertions. > > Now, if the applications also as a need to read the entire database > table into a dictionary, that will suddenly break, and not for the > attacker (which would be ok), but for the regular user of the > application or the site administrator. > > So it may be that this approach actually simplifies the attack, making > the cure worse than the disease. Ouch, I think you're right. So hash randomisation may be the best option, and admins will need to test for themselves to see if it breaks things... Regards, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
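[Mark's point that iteration order is not derivable from a dict's contents is easy to demonstrate. The snippet below runs under modern CPython, where order happens to follow insertion; under the 2012 dict it depended on hashing, table size, and resize history instead — either way, equal dicts need not iterate in the same order, so tests must not rely on it.]

```python
d1 = {}
d1["a"] = 1
d1["b"] = 2

d2 = {}
d2["b"] = 2   # same contents, different insertion order
d2["a"] = 1

print(d1 == d2)              # True: the dicts are equal
print(list(d1) == list(d2))  # False: they iterate in different orders

# An order-independent comparison is what tests should use:
assert sorted(d1.items()) == sorted(d2.items())
```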
Brisbane, Australia From mark at hotpy.org Fri Jan 20 11:49:05 2012 From: mark at hotpy.org (Mark Shannon) Date: Fri, 20 Jan 2012 10:49:05 +0000 Subject: [Python-Dev] Changing the order of iteration over a dictionary In-Reply-To: <62991.1326993723@parc.com> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <4F176191.6090206@pearwood.info> <62991.1326993723@parc.com> Message-ID: <4F1946A1.2050306@hotpy.org> Hi, One of the main sticking points over possible fixes for the hash-collision security issue seems to be a fear that changing the iteration order of a dictionary will break backwards compatibility. The order of iteration has never been specified. In fact not only is it arbitrary, it cannot be determined from the contents of a dict alone; it may depend on the insertion order. Changing a hash function is not the only change that will change the iteration order; any of the following will also do so: * Changing the minimum size of a dict. * Changing the load factor of a dict. * Changing the resizing policy of a dict. * Sharing of keys between dicts. By treating iteration order as part of the API we are effectively ruling out ever making any improvements to the dict. For example, my new dictionary implementation https://bitbucket.org/markshannon/hotpy_new_dict/ reduces memory use by 47% for gcbench, and by about 20% for the 2to3 benchmark, on my 32bit machine. (Nice graphs: http://tinyurl.com/7qd2nnm http://tinyurl.com/6uqvl2x ) The new dict implementation (necessarily) changes the iteration order and will break code that relies on it. If dict iteration order is to be treated as part of the API (and I think that is a very bad idea) then it should be documented, which will be difficult since it is barely deterministic. This will also be a major problem for PyPy, Jython and IronPython, as they will have to reimplement their dicts. 
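Mark's point, that iteration order depends on insertion order as soon as two keys collide, can be demonstrated with a toy open-addressing table. This is a simplified sketch, not CPython's actual probing or resize logic (the real dict uses perturbed probing, not plain linear probing):

```python
# Toy open-addressing table: same keys, different insertion order, and the
# iteration order differs once two keys collide (hash(int) is the int itself
# in CPython for small ints, so 0 and 8 collide in an 8-slot table).
class ToyTable:
    def __init__(self, size=8):
        self.slots = [None] * size
        self.mask = size - 1
    def insert(self, key):
        i = hash(key) & self.mask
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) & self.mask   # linear probing, for simplicity
        self.slots[i] = key
    def keys(self):
        return [k for k in self.slots if k is not None]

a, b = ToyTable(), ToyTable()
a.insert(0); a.insert(8)      # 0 first, then 8
b.insert(8); b.insert(0)      # same keys, opposite order
assert set(a.keys()) == set(b.keys())   # equal contents...
assert a.keys() != b.keys()             # ...different iteration order
```

Any change to the hash function, table size, or probing policy reshuffles which slot each key lands in, which is why the order was never part of the API.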
So, don't be afraid to change that hash function :) Cheers, Mark From fdrake at acm.org Fri Jan 20 12:11:47 2012 From: fdrake at acm.org (Fred Drake) Date: Fri, 20 Jan 2012 06:11:47 -0500 Subject: [Python-Dev] Changing the order of iteration over a dictionary In-Reply-To: <4F1946A1.2050306@hotpy.org> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <4F176191.6090206@pearwood.info> <62991.1326993723@parc.com> <4F1946A1.2050306@hotpy.org> Message-ID: On Fri, Jan 20, 2012 at 5:49 AM, Mark Shannon wrote: > So, don't be afraid to change that hash function :) Definitely. The hash function *has* been changed in the past, and a lot of developers were schooled in not relying on the iteration order. That's a good thing, as those developers now write tests of what's actually important rather than relying on implementation details of the Python runtime. A hash function that changes more often than during an occasional major version update will encourage more developers to write better tests. We can think of it as an educational tool. -Fred -- Fred L. Drake, Jr. "A person who won't read has no advantage over one who can't read." --Samuel Langhorne Clemens From ncoghlan at gmail.com Fri Jan 20 12:32:04 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 20 Jan 2012 21:32:04 +1000 Subject: [Python-Dev] Changing the order of iteration over a dictionary In-Reply-To: <4F1946A1.2050306@hotpy.org> References: <20120117213440.0008fd70@pitrou.net> <1326889813.3395.37.camel@localhost.localdomain> <4F176191.6090206@pearwood.info> <62991.1326993723@parc.com> <4F1946A1.2050306@hotpy.org> Message-ID: On Fri, Jan 20, 2012 at 8:49 PM, Mark Shannon wrote: > So, don't be afraid to change that hash function :) Changing it for 3.3 isn't really raising major concerns: the real concern is with changing it in maintenance and security patches for earlier releases.
Security patches that may break production applications aren't desirable, since it means admins have to weigh up the risk of being affected by the security vulnerability against the risk of breakage from the patch itself. The collision counting approach was attractive because it looked like it might offer a way out that was less likely to break deployed systems. Unfortunately, I think the point Martin raised about just opening a new (even more subtle) attack vector kills that idea dead. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at haypocalc.com Fri Jan 20 12:46:39 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 20 Jan 2012 12:46:39 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: 2012/1/20 Ivan Kozik : > I'd like to point out that an attacker is not limited to sending just > one dict full of colliding keys. Given a 22ms stall ... The presented attack produces a stall of at least 30 seconds (5 minutes or more if there is no time limit in the application), not 0.022 seconds. You have to send a lot of requests to produce a DoS if a single request just eats 22 ms. I suppose that there are a lot of other kinds of requests that take much longer than 22 ms, even valid requests. > Another issue is that even with a configurable limit, different > modules can't have their own limits. One module might want a > relatively safe raise-at-100, and another module creating massive > dicts might want raise-at-1000. How does a developer know whether > they can raise or lower the limit, given that they use a bunch of > different modules? Python becomes really slow when you have more than N collisions (O(n^2) problem). If an application hits this limit with valid data, it is time to use another data structure or use a different hash function.
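The O(n^2) behaviour Victor mentions is easy to reproduce with a toy key type whose instances all share one hash value. This is a sketch of the degenerate case only, not of the actual attack strings:

```python
# n keys that all share one hash value force each insertion to compare
# against every earlier key along the probe path, so total work grows
# roughly as n*(n-1)/2 instead of ~n -- the quadratic stall the attack exploits.
class Colliding:
    total_comparisons = 0
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return 42                      # every instance hashes identically
    def __eq__(self, other):
        Colliding.total_comparisons += 1
        return self.name == other.name

d = {Colliding(str(i)): None for i in range(200)}
assert len(d) == 200
assert Colliding.total_comparisons > 200 * 20  # far more than linear work
```

Doubling n roughly quadruples the comparison count, which is why a request full of colliding keys can stall a server for minutes.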
We have to do more tests to choose N correctly, but according to my first tests, it looks like N=1000 is a safe limit. Marc Andre's patch doesn't count all "collisions", but only "collisions" that require comparing objects. When two objects have the same hash value, the open addressing algorithm searches for a free bucket. If a bucket is not free but has a different hash value, the objects are not compared and the collision counter is not incremented. The limit is only reached when you have N objects having the same hash value modulo the size of the table (hash(str) & DICT_MASK). When there are not enough empty buckets (it comes before all buckets are full), Python resizes the dictionary (it does something like size = size * 2) and so it uses at least one more bit of the hash each time the dictionary is resized. Collisions are very likely with a small dictionary, but become rarer each time the dictionary is resized. It means that the number of potential collisions (with valid data) decreases when the dictionary grows. Tell me if I am wrong. From victor.stinner at haypocalc.com Fri Jan 20 13:08:43 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 20 Jan 2012 13:08:43 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F18F37A.4040200@oddbird.net> References: <4F18F37A.4040200@oddbird.net> Message-ID: > I'm surprised we haven't seen bug reports about it from users > of 64-bit Pythons long ago A Python dictionary only uses the lower bits of a hash value. If your dictionary has less than 2**32 items, the dictionary order is exactly the same on 32- and 64-bit systems: hash32(str) & mask == hash64(str) & mask for mask <= 2**32-1.
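The masking Victor describes can be sketched in a few lines. The hash values below are hypothetical, chosen only so that the low 32 bits agree, and the sketch covers only the initial slot selection, not collision probing (which mixes in higher bits):

```python
# A dict table of 2**k slots selects the initial slot with hash & (2**k - 1),
# so a 32-bit and a 64-bit hash that agree in their low bits pick the same
# initial slot on both platforms.
def slot_index(h, table_size):
    assert table_size & (table_size - 1) == 0, "table sizes are powers of two"
    return h & (table_size - 1)

h32 = 0x6C48A2F5            # hypothetical 32-bit hash of some string
h64 = 0x1234ABCD6C48A2F5    # hypothetical 64-bit hash, same low 32 bits
for size in (8, 64, 4096):  # any table with fewer than 2**32 slots
    assert slot_index(h32, size) == slot_index(h64, size)
```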
From frank at sievertsen.de Fri Jan 20 13:12:57 2012 From: frank at sievertsen.de (Frank Sievertsen) Date: Fri, 20 Jan 2012 13:12:57 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F18F37A.4040200@oddbird.net> Message-ID: <4F195A49.6050508@sievertsen.de> No, that's not true. Whenever a collision happens, other bits are mixed in very fast. Frank On 20.01.2012 13:08, Victor Stinner wrote: >> I'm surprised we haven't seen bug reports about it from users >> of 64-bit Pythons long ago > A Python dictionary only uses the lower bits of a hash value. If your > dictionary has less than 2**32 items, the dictionary order is exactly > the same on 32- and 64-bit systems: hash32(str) & mask == hash64(str) & > mask for mask <= 2**32-1. From victor.stinner at haypocalc.com Fri Jan 20 13:42:37 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 20 Jan 2012 13:42:37 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F195A49.6050508@sievertsen.de> References: <4F18F37A.4040200@oddbird.net> <4F195A49.6050508@sievertsen.de> Message-ID: 2012/1/20 Frank Sievertsen : > No, that's not true. > Whenever a collision happens, other bits are mixed in very fast. Oh, I didn't know that. So the dict order is only the same if there is no collision. Victor From victor.stinner at haypocalc.com Fri Jan 20 13:50:18 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 20 Jan 2012 13:50:18 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F193511.5000102@v.loewis.de> References: <4F193511.5000102@v.loewis.de> Message-ID: > The main issue with that approach is that it allows a new kind of attack. > > An attacker now needs to find 1000 colliding keys, and submit them > one-by-one into a database. The limit will not trigger, as those are > just database insertions.
> > Now, if the application also has a need to read the entire database > table into a dictionary, that will suddenly break, and not for the > attacker (which would be ok), but for the regular user of the > application or the site administrator. Oh, good catch. But I would not call it a new kind of attack, it is just a particular case of the hash collision vulnerability. Counting collisions doesn't solve this case, but it doesn't make the situation worse than before. Raising an exception quickly is better than stalling for minutes, even if I agree that it is not the best behaviour. The best would be to answer quickly with the expected result :-) (using a different data structure or a different hash function?) Right now, I don't see any countermeasure against this case. Victor From p.f.moore at gmail.com Fri Jan 20 13:57:45 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 20 Jan 2012 12:57:45 +0000 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <4F18AD18.2080901@v.loewis.de> Message-ID: On 20 January 2012 03:57, Brian Curtin wrote: >> FWIW, it might well be that I can't be available for the 3.3 final >> release (I haven't finalized my vacation schedule yet for August). > > In the interest of not having Windows releases depend on one person, > and having gone through building the installer myself (which I know is > but one of the duties), I'm available to help should you need it.
One thought comes to mind - while we need a PEP to make a permanent change to the release schedule, would it be practical in any way to do a "trial run" of the process, and simply aim to release 3.4 about 6 months after 3.3? Based on the experiences gained from that, some of the discussions around this PEP could be supported (or not :-)) with more concrete information. If we can't do that, then that says something about the practicality of the proposal in itself... The plan for 3.4 would need to be publicised well in advance, of course, but doing that as a one-off exercise might well be viable. Paul. PS I have no view on whether the proposal is a good idea or a bad idea from a RM point of view. That's entirely up to the people who do the work to decide, in my opinion. From barry at python.org Fri Jan 20 14:10:30 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 20 Jan 2012 08:10:30 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> Message-ID: <20120120081030.75529cf5@resist.wooz.org> On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote: >Counting collisions doesn't solve this case, but it doesn't make the >situation worse than before. Raising an exception quickly is better >than stalling for minutes, even if I agree that it is not the best >behaviour. ISTM that adding the possibility of raising a new exception on dictionary insertion is *more* backward incompatible than changing dictionary order, which for a very long time has been known to not be guaranteed. You're running some application, you upgrade Python because you apply all security fixes, and suddenly you're starting to get exceptions in places you can't really do anything about. Yet those exceptions are now part of the documented public API for dictionaries. This is asking for trouble.
Bugs will suddenly start appearing in that application's tracker and they will seem to the application developer like Python just added a new public API in a security release. OTOH, if you change dictionary order and *that* breaks the application, then the bugs submitted to the application's tracker will be legitimate bugs that have to be fixed even if nothing else changed. So I still think we should ditch the paranoia about dictionary order changing, and fix this without counting. A little bit of paranoia could creep back in by disabling the hash fix by default in stable releases, but I think it would be fine to make that a compile-time option. -Barry From barry at python.org Fri Jan 20 14:17:05 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 20 Jan 2012 08:17:05 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F18F37A.4040200@oddbird.net> Message-ID: <20120120081705.32252b01@resist.wooz.org> On Jan 20, 2012, at 03:18 PM, Nick Coghlan wrote: >On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer wrote: >> I don't have the expertise to speak otherwise to the alternatives for >> fixing the collisions vulnerability, but I don't believe it's accurate >> to presume that Django would not want to fix a dict-ordering dependency, >> and use that as a justification for one approach over another. > >It's more a matter of wanting deployment of a security fix to be as >painless as possible - a security fix that system administrators can't >deploy because it breaks critical applications may as well not exist. True, but collision counting is worse IMO. It's just as likely (maybe) that an application would start getting new exceptions on dictionary insertion as it would see failures due to dictionary order changes. Unfortunately, in the former case it's because Python just added a new public API in a security release (the new exception *is* public API).
In the latter case, no new API was added, but something exposed an already existing bug in the application. That's still a bug in the application even if counting was added. It's also a bug that any number of changes in the environment, or OS vendor deployment, could have triggered. -1 for collision counting. -Barry From barry at python.org Fri Jan 20 14:20:55 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 20 Jan 2012 08:20:55 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F18E6F0.2010208@pearwood.info> Message-ID: <20120120082055.076ffc11@resist.wooz.org> On Jan 20, 2012, at 03:15 PM, Nick Coghlan wrote: >With the 1000 collision limit in place, the attacker sends their >massive request, the affected dict quickly hits the limit, throws an >unhandled exception which is then caught by the web framework and >turned into a 500 Error response (or whatever's appropriate for the >protocol being attacked). Let's just be clear about it: this exception is new public API. Changing dictionary order is not. For me, that comes down firmly on the side of the latter rather than the former for stable releases. -Barry From solipsis at pitrou.net Fri Jan 20 14:51:59 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 20 Jan 2012 14:51:59 +0100 Subject: [Python-Dev] Counting collisions for the win References: <4F193511.5000102@v.loewis.de> Message-ID: <20120120145159.5aaf5328@pitrou.net> On Fri, 20 Jan 2012 13:50:18 +0100 Victor Stinner wrote: > > The main issue with that approach is that it allows a new kind of attack. > > > > An attacker now needs to find 1000 colliding keys, and submit them > > one-by-one into a database. The limit will not trigger, as those are > > just database insertions. 
> > > > Now, if the application also has a need to read the entire database > > table into a dictionary, that will suddenly break, and not for the > > attacker (which would be ok), but for the regular user of the > > application or the site administrator. > > Oh, good catch. But I would not call it a new kind of attack, it is > just a particular case of the hash collision vulnerability. > > Counting collisions doesn't solve this case, but it doesn't make the > situation worse than before. Raising an exception quickly is better > than stalling for minutes, even if I agree that it is not the best > behaviour. Actually, it *is* worse because stalling for seconds or minutes may not be a problem in some cases (e.g. some batch script that gets run overnight). Regards Antoine. From solipsis at pitrou.net Fri Jan 20 14:53:47 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 20 Jan 2012 14:53:47 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <4F18AD18.2080901@v.loewis.de> Message-ID: <20120120145347.14950d9c@pitrou.net> On Fri, 20 Jan 2012 12:57:45 +0000 Paul Moore wrote: > On 20 January 2012 03:57, Brian Curtin wrote: > >> FWIW, it might well be that I can't be available for the 3.3 final > >> release (I haven't finalized my vacation schedule yet for August). > > > > In the interest of not having Windows releases depend on one person, > > and having gone through building the installer myself (which I know is > > but one of the duties), I'm available to help should you need it.
> > One thought comes to mind - while we need a PEP to make a permanent > change to the release schedule, would it be practical in any way to do > a "trial run" of the process, and simply aim to release 3.4 about 6 > months after 3.3? It sounds reasonable to me, although we probably wouldn't market it as a "trial run". cheers Antoine. From meadori at gmail.com Fri Jan 20 16:02:24 2012 From: meadori at gmail.com (Meador Inge) Date: Fri, 20 Jan 2012 09:02:24 -0600 Subject: [Python-Dev] [Python-checkins] cpython (3.1): Closes #13807: Now checks for sys.stderr being there before writing to it. In-Reply-To: References: Message-ID: On Fri, Jan 20, 2012 at 5:32 AM, vinay.sajip wrote: > http://hg.python.org/cpython/rev/73dad4940b88 > changeset:   74538:73dad4940b88 > branch:      3.1 I thought that the 3.1 branch is in security mode? Is this a security-related fix? From my brief scan of the changeset, it doesn't seem to be. > parent:      74253:fb5707168351 > user:        Vinay Sajip > date:        Fri Jan 20 11:23:02 2012 +0000 > summary: >   Closes #13807: Now checks for sys.stderr being there before writing to it. > > files: >   Lib/logging/__init__.py |  2 +- >   1 files changed, 1 insertions(+), 1 deletions(-) > > > diff --git a/Lib/logging/__init__.py b/Lib/logging/__init__.py > --- a/Lib/logging/__init__.py > +++ b/Lib/logging/__init__.py > @@ -721,7 +721,7 @@ >         You could, however, replace this with a custom handler if you wish. >         The record which was being processed is passed in to this method. >         """ > -        if raiseExceptions: > +        if raiseExceptions and sys.stderr:  # see issue 13807 >             ei = sys.exc_info() >             try: >                 
traceback.print_exception(ei[0], ei[1], ei[2], None, sys.stderr) > > -- > Repository URL: http://hg.python.org/cpython > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > -- # Meador From guido at python.org Fri Jan 20 16:33:28 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 07:33:28 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F193511.5000102@v.loewis.de> References: <4F193511.5000102@v.loewis.de> Message-ID: On Fri, Jan 20, 2012 at 1:34 AM, "Martin v. Löwis" wrote: > > The last solution is very simple: count collisions and raise an > > exception if it hits a limit. The patch is something like 10 lines > > whereas the randomized hash is closer to 500 lines, add a new > > file, change Visual Studio project file, etc. First I thought that it > > would break more applications than the randomized hash > > The main issue with that approach is that it allows a new kind of attack. > > An attacker now needs to find 1000 colliding keys, and submit them > > one-by-one into a database. The limit will not trigger, as those are > > just database insertions. > > Now, if the application also has a need to read the entire database > table into a dictionary, that will suddenly break, and not for the > attacker (which would be ok), but for the regular user of the > application or the site administrator. > > So it may be that this approach actually simplifies the attack, making > the cure worse than the disease. > It would be a pretty lousy app that tried to load the contents of an entire database into a dict. It seems that this would require much more knowledge of what the app is trying to do before a successful attack can be mounted. So I don't think this is worse than the original attack -- I think it requires much more ingenuity of an attacker.
(I'm thinking that the original attack is trivial once the set of 65000 colliding keys is public knowledge, which must be only a matter of time.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From pydev at sievertsen.de Fri Jan 20 16:55:32 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Fri, 20 Jan 2012 16:55:32 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> Message-ID: <4F198E74.3050807@sievertsen.de> Hello, I still see at least two ways to create a DoS attack even with the collision-counting patch. I assumed that it's possible to send ~500KB of payload to the application. 1. It's fully deterministic which slots the dict will look up. Since we don't count slot-collisions, but only hash-value collisions, this can be exploited easily by creating strings with the hash-values along the lookup-way of an arbitrary (short) string. So first we pick an arbitrary string. Then calculate which slots it will visit on the way to the first empty slot. Then we create strings with hash-values for these slots. This attack first injects the strings to fill all the slots that the one short string will want to visit. Then it adds THE SAME string again and again. Since the entry is already there, nothing will be added and no additional collisions happen, no exception raised. $ ls -l super.txt -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt $ tail -n3 super.txt FX5 FX5 FX5 $ wc -l super.txt 90000 super.txt $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))' real 0m52.724s user 0m51.543s sys 0m0.028s 2. The second attack exploits the fact that 1000 allowed string comparisons are still a lot of work. First I added 999 strings that collide with a one-byte string "a". In some applications a zero-byte string might work even better. Then I can add many thousands of the "a"'s, just like the first attack.
$ ls -l 1000.txt -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt $ head -n 3 1000.txt 7hLci00 4wVFm10 _rZJU50 $ wc -l 1000.txt 247000 1000.txt $ tail -n 3 1000.txt a a a $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))' real 0m17.408s user 0m15.897s sys 0m0.008s Of course the first attack is far more efficient. One could argue that 16 seconds is not enough for an attack. But maybe it's possible to send 1MB, use zero-byte strings, and since, for example, Django does 5 lookups per query string, this will keep it busy for ~80 seconds on my PC. What to do now? I think it's not smart to reduce the number of allowed collisions dramatically AND count all slot-collisions at the same time. Frank From victor.stinner at haypocalc.com Fri Jan 20 17:04:18 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 20 Jan 2012 17:04:18 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> Message-ID: > (I'm thinking that the original > attack is trivial once the set of 65000 colliding keys is public knowledge, > which must be only a matter of time.) I have a program able to generate collisions: it takes 1 second to compute 60,000 colliding strings on a desktop computer. So the security of the randomized hash is based on the fact that the attacker cannot compute the secret. Victor From victor.stinner at haypocalc.com Fri Jan 20 17:17:24 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 20 Jan 2012 17:17:24 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <20120120081030.75529cf5@resist.wooz.org> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: > So I still think we should ditch the paranoia about dictionary order changing, > and fix this without counting.
The randomized hash has other issues: - its security is based on its secret, whereas it looks easy to compute (see more details in the issue) - my patch only changes hash(str), whereas other developers asked me to also patch bytes, int and other types hash(bytes) can be changed. But changing hash(int) may easily leak the secret. We may use a different secret for each type, but if it is easy to compute the int hash secret, dictionaries using int are still vulnerable. -- There is no perfect solution; the drawbacks of each should be compared. Victor From solipsis at pitrou.net Fri Jan 20 17:31:17 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 20 Jan 2012 17:31:17 +0100 Subject: [Python-Dev] Counting collisions for the win References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: <20120120173117.7f78d4f8@pitrou.net> On Fri, 20 Jan 2012 17:17:24 +0100 Victor Stinner wrote: > > So I still think we should ditch the paranoia about dictionary order changing, > > and fix this without counting. > > The randomized hash has other issues: > > - its security is based on its secret, whereas it looks easy to > compute (see more details in the issue) How do you compute the secret? I see two possibilities: - the application leaks the hash() values: this sounds unlikely since I don't see the use case for it; - the application shows the dict iteration order (e.g. order of HTML attributes): then we could add a second per-dictionary secret so that the iteration order of a single dict doesn't give any useful information about the hash function.
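The randomized string hash being discussed works roughly like this toy version. This is a sketch in the spirit of the proposal, not the actual patch: the mixing is the old FNV-style string hash with a per-process seed XORed in, and the seed-handling details (surviving fork(), environment overrides) are not modelled:

```python
import os

# Per-process secret, picked once at startup; everything below is an
# illustrative stand-in for the proposed randomized hash, not CPython code.
_SECRET = int.from_bytes(os.urandom(8), "little")

def randomized_hash(s):
    """Toy randomized string hash: FNV-style mixing seeded with _SECRET."""
    h = _SECRET ^ 0x345678
    for ch in s:
        h = ((1000003 * h) ^ ord(ch)) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return h ^ len(s)
```

Keys crafted to collide under one seed generally stop colliding under another, which is the whole point -- and also why the secret must not be recoverable from hash() values or iteration order.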
But the bottom line for me is the following: - randomized hashes eliminate the possibility to use a single exploit for all Python-powered applications: for each application, the attacker now has to find a way to extract the secret; - collision counting doesn't eliminate the possibility of generic exploits, as Frank Sievertsen has just shown in http://mail.python.org/pipermail/python-dev/2012-January/115726.html Regards Antoine. From regebro at gmail.com Fri Jan 20 17:41:34 2012 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 20 Jan 2012 17:41:34 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: On Fri, Jan 20, 2012 at 01:48, Victor Stinner wrote: > - the limit should be configurable: a new function in the sys module > should be enough. It may be private (or replaced by an environment > variable?) in stable versions I'd like to see both. I would like both the programmer and the "user" to be able to control what the limit is. //Lennart From status at bugs.python.org Fri Jan 20 18:07:33 2012 From: status at bugs.python.org (Python tracker) Date: Fri, 20 Jan 2012 18:07:33 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20120120170733.522F21DEC1@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2012-01-13 - 2012-01-20) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message.
Issues counts and deltas: open 3209 ( -1) closed 22405 (+53) total 25614 (+52) Open issues with patches: 1376 Issues opened (37) ================== #13411: Hashable memoryviews http://bugs.python.org/issue13411 reopened by skrah #13782: xml.etree.ElementTree: Element.append doesn't type-check its a http://bugs.python.org/issue13782 opened by sjmachin #13783: Clean up PEP 380 C API additions http://bugs.python.org/issue13783 opened by ncoghlan #13784: Documentation of xml.sax.xmlreader: Locator.getLineNumber() a http://bugs.python.org/issue13784 opened by patrick.vrijlandt #13785: Make concurrent.futures.Future state public http://bugs.python.org/issue13785 opened by jjdominguezm #13788: os.closerange optimization http://bugs.python.org/issue13788 opened by ferringb #13789: _tkinter does not build on Windows 7 http://bugs.python.org/issue13789 opened by terry.reedy #13790: In str.format an incorrect error message for list, tuple, dict http://bugs.python.org/issue13790 opened by py.user #13792: The "os.execl" call doesn't give programs exit code http://bugs.python.org/issue13792 opened by kayhayen #13793: hasattr, delattr, getattr fail with unnormalized names http://bugs.python.org/issue13793 opened by Jim.Jewett #13796: use 'text=...' 
to define the text attribute of and xml.etree.E http://bugs.python.org/issue13796 opened by paaguti #13797: Allow objects implemented in pure Python to export PEP 3118 bu http://bugs.python.org/issue13797 opened by ncoghlan #13798: Pasting and then running code doesn't work in the IDLE Shell http://bugs.python.org/issue13798 opened by ramchandra.apte #13799: Base 16 should be hexadecimal in Unicode HOWTO http://bugs.python.org/issue13799 opened by ramchandra.apte #13801: The Python 3 Docs don't highlight nonlocal http://bugs.python.org/issue13801 opened by ramchandra.apte #13802: IDLE Prefernces/Fonts: use multiple alphabets in examples http://bugs.python.org/issue13802 opened by terry.reedy #13804: Python library structure creates hard to read code when using http://bugs.python.org/issue13804 opened by dwt #13806: Audioop decompression frames size check fix http://bugs.python.org/issue13806 opened by Oleg.Plakhotnyuk #13812: multiprocessing package doesn't flush stderr on child exceptio http://bugs.python.org/issue13812 opened by brandj #13814: Generators as context managers. 
http://bugs.python.org/issue13814 opened by yak #13815: tarfile.ExFileObject can't be wrapped using io.TextIOWrapper http://bugs.python.org/issue13815 opened by cjwatson #13816: Two typos in the docs http://bugs.python.org/issue13816 opened by Retro #13817: deadlock in subprocess while running several threads using Pop http://bugs.python.org/issue13817 opened by glaubich #13818: argparse: -h listening required options under optional argumen http://bugs.python.org/issue13818 opened by mgodinho #13819: _warnings settings are process-wide http://bugs.python.org/issue13819 opened by pitrou #13820: 2.6 is no longer in the future http://bugs.python.org/issue13820 opened by Jim.Jewett #13821: misleading return from isidentifier http://bugs.python.org/issue13821 opened by Jim.Jewett #13822: is(upper/lower/title) are not exactly correct http://bugs.python.org/issue13822 opened by benjamin.peterson #13823: xml.etree.ElementTree.ElementTree.write - argument checking http://bugs.python.org/issue13823 opened by patrick.vrijlandt #13824: argparse.FileType opens a file without excepting resposibility http://bugs.python.org/issue13824 opened by David.Layton #13825: Datetime failing while reading active directory time attribute http://bugs.python.org/issue13825 opened by scape #13826: Having a shlex example in the subprocess.Popen docs is confusi http://bugs.python.org/issue13826 opened by Julian #13828: Further improve casefold documentation http://bugs.python.org/issue13828 opened by Jim.Jewett #13829: exception error http://bugs.python.org/issue13829 opened by Dan.kamp #13830: codecs error handler is called with a UnicodeDecodeError with http://bugs.python.org/issue13830 opened by amaury.forgeotdarc #13831: get method of multiprocessing.pool.Async should return full t http://bugs.python.org/issue13831 opened by fmitha #13833: No documentation for PyStructSequence http://bugs.python.org/issue13833 opened by torsten Most recent 15 issues with no replies (15) 
========================================== #13833: No documentation for PyStructSequence http://bugs.python.org/issue13833 #13831: get method of multiprocessing.pool.Async should return full t http://bugs.python.org/issue13831 #13830: codecs error handler is called with a UnicodeDecodeError with http://bugs.python.org/issue13830 #13829: exception error http://bugs.python.org/issue13829 #13824: argparse.FileType opens a file without excepting resposibility http://bugs.python.org/issue13824 #13823: xml.etree.ElementTree.ElementTree.write - argument checking http://bugs.python.org/issue13823 #13822: is(upper/lower/title) are not exactly correct http://bugs.python.org/issue13822 #13820: 2.6 is no longer in the future http://bugs.python.org/issue13820 #13819: _warnings settings are process-wide http://bugs.python.org/issue13819 #13818: argparse: -h listening required options under optional argumen http://bugs.python.org/issue13818 #13815: tarfile.ExFileObject can't be wrapped using io.TextIOWrapper http://bugs.python.org/issue13815 #13802: IDLE Prefernces/Fonts: use multiple alphabets in examples http://bugs.python.org/issue13802 #13784: Documentation of xml.sax.xmlreader: Locator.getLineNumber() a http://bugs.python.org/issue13784 #13777: socket: communicating with Mac OS X KEXT controls http://bugs.python.org/issue13777 #13771: HTTPSConnection __init__ super implementation causes recursion http://bugs.python.org/issue13771 Most recent 15 issues waiting for review (15) ============================================= #13833: No documentation for PyStructSequence http://bugs.python.org/issue13833 #13817: deadlock in subprocess while running several threads using Pop http://bugs.python.org/issue13817 #13816: Two typos in the docs http://bugs.python.org/issue13816 #13815: tarfile.ExFileObject can't be wrapped using io.TextIOWrapper http://bugs.python.org/issue13815 #13806: Audioop decompression frames size check fix http://bugs.python.org/issue13806 #13788: os.closerange 
optimization http://bugs.python.org/issue13788 #13785: Make concurrent.futures.Future state public http://bugs.python.org/issue13785 #13777: socket: communicating with Mac OS X KEXT controls http://bugs.python.org/issue13777 #13775: Access Denied message on symlink creation misleading for an ex http://bugs.python.org/issue13775 #13773: Support sqlite3 uri filenames http://bugs.python.org/issue13773 #13742: Add a key parameter (like sorted) to heapq.merge http://bugs.python.org/issue13742 #13736: urllib.request.urlopen leaks exceptions from socket and httpli http://bugs.python.org/issue13736 #13734: Add a generic directory walker method to avoid symlink attacks http://bugs.python.org/issue13734 #13733: Change required to sysconfig.py for Python 2.7.2 on OS/2 http://bugs.python.org/issue13733 #13719: bdist_msi upload fails http://bugs.python.org/issue13719 Top 10 most discussed issues (10) ================================= #13703: Hash collision security issue http://bugs.python.org/issue13703 48 msgs #12600: Add example of using load_tests to parameterise Test Cases http://bugs.python.org/issue12600 12 msgs #13790: In str.format an incorrect error message for list, tuple, dict http://bugs.python.org/issue13790 10 msgs #6727: ImportError when package is symlinked on Windows http://bugs.python.org/issue6727 8 msgs #13405: Add DTrace probes http://bugs.python.org/issue13405 8 msgs #6531: atexit_callfuncs() crashing within Py_Finalize() when using mu http://bugs.python.org/issue6531 7 msgs #13804: Python library structure creates hard to read code when using http://bugs.python.org/issue13804 7 msgs #8052: subprocess close_fds behavior should only close open fds http://bugs.python.org/issue8052 6 msgs #11805: package_data only allows one glob per-package http://bugs.python.org/issue11805 6 msgs #10181: Problems with Py_buffer management in memoryobject.c (and else http://bugs.python.org/issue10181 5 msgs Issues closed (51) ================== #2124: xml.sax and xml.dom 
fetch DTDs by default http://bugs.python.org/issue2124 closed by loewis #2134: Add new attribute to TokenInfo to report specific token IDs http://bugs.python.org/issue2134 closed by meador.inge #6528: builtins colored as keyword at beginning of line http://bugs.python.org/issue6528 closed by terry.reedy #8285: IDLE not smart indenting correctly in nested statements http://bugs.python.org/issue8285 closed by terry.reedy #11906: test_argparse failure in interactive mode http://bugs.python.org/issue11906 closed by terry.reedy #11948: Tutorial/Modules - small fix to better clarify the modules sea http://bugs.python.org/issue11948 closed by sandro.tosi #12705: Make compile('1\n2\n', '', 'single') raise an exception instea http://bugs.python.org/issue12705 closed by meador.inge #12949: Documentation of PyCode_New() lacks kwonlyargcount argument http://bugs.python.org/issue12949 closed by meador.inge #13039: IDLE editor: shell-like behaviour on line starting with ">>>" http://bugs.python.org/issue13039 closed by terry.reedy #13516: Gzip old log files in rotating handlers http://bugs.python.org/issue13516 closed by vinay.sajip #13589: Aifc low level serialization primitives fix http://bugs.python.org/issue13589 closed by pitrou #13605: document argparse's nargs=REMAINDER http://bugs.python.org/issue13605 closed by sandro.tosi #13629: _PyParser_TokenNames does not match up with the token.h number http://bugs.python.org/issue13629 closed by meador.inge #13642: urllib incorrectly quotes username and password in https basic http://bugs.python.org/issue13642 closed by orsenthil #13645: import machinery vulnerable to timestamp collisions http://bugs.python.org/issue13645 closed by pitrou #13665: TypeError: string or integer address expected instead of str i http://bugs.python.org/issue13665 closed by ezio.melotti #13695: "type specific" to "type-specific" http://bugs.python.org/issue13695 closed by ezio.melotti #13715: typo in unicodedata documentation 
http://bugs.python.org/issue13715 closed by ezio.melotti #13722: "distributions can disable the encodings package" http://bugs.python.org/issue13722 closed by pitrou #13723: Regular expressions: (?:X|\s+)*$ takes a long time http://bugs.python.org/issue13723 closed by terry.reedy #13725: regrtest does not recognize -d flag http://bugs.python.org/issue13725 closed by meador.inge #13726: regrtest ambiguous -S flag http://bugs.python.org/issue13726 closed by orsenthil #13727: Accessor macros for PyDateTime_Delta members http://bugs.python.org/issue13727 closed by amaury.forgeotdarc #13728: Description of -m and -c cli options wrong? http://bugs.python.org/issue13728 closed by sandro.tosi #13730: Grammar mistake in Decimal documentation http://bugs.python.org/issue13730 closed by python-dev #13746: ast.Tuple's have an inconsistent "col_offset" value http://bugs.python.org/issue13746 closed by georg.brandl #13752: add a str.casefold() method http://bugs.python.org/issue13752 closed by python-dev #13760: ConfigParser exceptions are not pickleable http://bugs.python.org/issue13760 closed by lukasz.langa #13761: Add flush keyword to print() http://bugs.python.org/issue13761 closed by python-dev #13763: Potentially hard to understand wording in devguide http://bugs.python.org/issue13763 closed by terry.reedy #13764: Misc/build.sh is outdated... 
talks about svn http://bugs.python.org/issue13764 closed by pitrou #13766: explain the relationship between Lib/lib2to3/Grammar.txt and G http://bugs.python.org/issue13766 closed by python-dev #13768: Doc/tools/dailybuild.py available only on 2.7 branch http://bugs.python.org/issue13768 closed by georg.brandl #13774: json.loads raises a SystemError for invalid encoding on 2.7.2 http://bugs.python.org/issue13774 closed by amaury.forgeotdarc #13780: make YieldFrom its own node http://bugs.python.org/issue13780 closed by python-dev #13781: gzip module does the wrong thing with an os.fdopen()'ed fileob http://bugs.python.org/issue13781 closed by nadeem.vawda #13786: regrtest.py does not handle --trace http://bugs.python.org/issue13786 closed by meador.inge #13787: PyCode_New not round-trippable (TypeError) http://bugs.python.org/issue13787 closed by amaury.forgeotdarc #13791: Reword “Old versions” in the doc sidebar http://bugs.python.org/issue13791 closed by eric.araujo #13794: Copyright Year - Change it to 2012 please http://bugs.python.org/issue13794 closed by eric.araujo #13795: CDATA Element missing http://bugs.python.org/issue13795 closed by amaury.forgeotdarc #13803: Under Solaris, distutils doesn't include bitness in the direct http://bugs.python.org/issue13803 closed by python-dev #13805: [].sort() should return self http://bugs.python.org/issue13805 closed by ezio.melotti #13807: logging.Handler.handlerError() may raise AttributeError in tra http://bugs.python.org/issue13807 closed by python-dev #13808: url for Tutor mailing list is broken http://bugs.python.org/issue13808 closed by ezio.melotti #13809: bz2 does not work when threads are disabled http://bugs.python.org/issue13809 closed by nadeem.vawda #13810: refer people to Doc/Makefile when not using 'make' to build ma http://bugs.python.org/issue13810 closed by sandro.tosi #13811: In str.format, if invalid fill and alignment are specified, th http://bugs.python.org/issue13811 closed by python-dev
#13813: "sysconfig.py" and "distutils/util.py" redundancy http://bugs.python.org/issue13813 closed by eric.araujo #13827: Unexecuted import changes namespace http://bugs.python.org/issue13827 closed by benjamin.peterson #13832: tokenization assuming ASCII whitespace; missing multiline case http://bugs.python.org/issue13832 closed by benjamin.peterson From ethan at stoneleaf.us Fri Jan 20 18:09:57 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Jan 2012 09:09:57 -0800 Subject: [Python-Dev] exception chaining Message-ID: <4F199FE5.9080005@stoneleaf.us> Summary: Exception Chaining is cool, unless you are writing libraries that want to transform from Exception X to Exception Y, as the previous exception context is unnecessary, potentially confusing, and cluttery (yup, just made that word up!). For all the gory details, see http://bugs.python.org/issue6210. I'm going to attempt a patch implementing MRAB's suggestion:

try:
    some_op
except ValueError:
    raise as OtherError()  # `raise` keeps context, `raise as` does not

The question I have at the moment is: should `raise as` be an error if no exception is currently being handled? Example:

def smurfy(x):
    if x != 'magic flute':
        raise as WrongInstrument
    do_something_with_x

If this is allowed then `smurfy` could be called from inside an `except` clause or outside it. I don't care for it for two reasons:
- I don't like the way it looks
- I can see it encouraging always using `raise as` instead of `raise` and losing the value of exception chaining.

Other thoughts? ~Ethan~ From guido at python.org Fri Jan 20 18:43:27 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 09:43:27 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: On Thu, Jan 19, 2012 at 8:06 PM, Ivan Kozik wrote: > No, I wasn't happy with termination. I wanted to treat it just like a > JSON decoding error, and send the appropriate response.
> So was this attack actually being mounted on your service regularly? I'd think it would be sufficient to treat it as a MemoryError -- unavoidable, if it happens things are really bad, and hopefully you'll crash quickly and some monitor process restarts your service. That's a mechanism that you should have anyway. > I actually forgot to mention the main reason I abandoned the > stop-at-N-collisions approach. I had a server with a dict that stayed > in memory, across many requests. It was being populated with > identifiers chosen by clients. I couldn't have my server stay broken > if this dict filled up with a bunch of colliding keys. (I don't think > I could have done another thing either, like nuke the dict or evict > some keys.) > What would your service do if it ran out of memory? Maybe one tweak to the collision counting would be that the exception needs to inherit from BaseException (like MemoryError) so most generic exception handlers don't actually handle it. (Style note: never use "except:", always use "except Exception:".) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin at python.org Fri Jan 20 18:47:13 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 20 Jan 2012 12:47:13 -0500 Subject: [Python-Dev] exception chaining In-Reply-To: <4F199FE5.9080005@stoneleaf.us> References: <4F199FE5.9080005@stoneleaf.us> Message-ID: 2012/1/20 Ethan Furman : > Summary: > > Exception Chaining is cool, unless you are writing libraries that want to > transform from Exception X to Exception Y, as the previous exception > context is unnecessary, potentially confusing, and cluttery (yup, just made > that word up!). > > For all the gory details, see http://bugs.python.org/issue6210. > > I'm going to attempt a patch implementing MRAB's suggestion:
>
> try:
>     some_op
> except ValueError:
>     raise as OtherError()  # `raise` keeps context, `raise as` does not

I dislike this syntax. Raise what as OtherError()? I think the "raise x from None" idea is preferable, since it indicates you are nulling the context. The optimal solution would be to have "raise X nocontext", but that would obviously require another keyword... -- Regards, Benjamin From guido at python.org Fri Jan 20 18:50:25 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 09:50:25 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F193A98.1070603@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <4F193A98.1070603@sievertsen.de> Message-ID: On Fri, Jan 20, 2012 at 1:57 AM, Frank Sievertsen wrote: > The main issue with that approach is that it allows a new kind of attack. > > > Indeed, I posted another example: http://bugs.python.org/msg151677 > > This kind of fix can be used in a specific application or maybe in a > special-purpose framework, but not on the level of a general-purpose > language. > Right. We have been discussing this issue (for weeks now...) because it makes pretty much any Python app that takes untrusted data vulnerable, especially web apps, and after extensive analysis we came to the conclusion that defenses in the framework or in the app are really hard to do, very disruptive for developers, whereas preventing the attack by a modification of the dict or hash algorithms would fix it for everybody. And moreover, the attack would work against pretty much any Python web app using a set of evil strings computed once (hence encouraging script kiddies to just fire their fully-automatic weapons at random websites). The new attacks that are now being considered require analysis of how the website is implemented, how it uses and stores data, etc. So an attacker has to sit down and come up with an attack tailored to a specific website. That can be dealt with on an ad-hoc basis.
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Jan 20 19:15:21 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 10:15:21 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <20120120081030.75529cf5@resist.wooz.org> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw wrote: > On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote: > > >Counting collision doesn't solve this case, but it doesn't make the > >situation worse than before. Raising quickly an exception is better > >than stalling for minutes, even if I agree that it is not the best > >behaviour. > > ISTM that adding the possibility of raising a new exception on dictionary > insertion is *more* backward incompatible than changing dictionary order, > which for a very long time has been known to not be guaranteed. You're > running some application, you upgrade Python because you apply all security > fixes, and suddenly you're starting to get exceptions in places you can't > really do anything about. Yet those exceptions are now part of the > documented > public API for dictionaries. This is asking for trouble. Bugs will > suddenly > start appearing in that application's tracker and they will seem to the > application developer like Python just added a new public API in a security > release. > Dict insertion can already raise an exception: MemoryError. I think we should be safe if the new exception also derives from BaseException. We should actually seriously consider just raising MemoryException, since introducing a new built-in exception in a bugfix release is also very questionable: code explicitly catching or raising it would not work on previous bugfix releases of the same feature release.
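[Editorial note: the point in the paragraph above — an exception that derives directly from BaseException escapes generic handlers — can be sketched in a few lines. The exception name below is hypothetical; no such built-in existed.]

```python
# Hedged sketch: why an exception deriving directly from BaseException
# escapes generic `except Exception:` handlers. "TooManyHashCollisions"
# is a hypothetical name for illustration -- no such built-in exists.

class TooManyHashCollisions(BaseException):
    """Hypothetical error raised when a dict insert hits a collision limit."""

def guarded_insert():
    try:
        # Imagine the dict implementation raising this during insertion.
        raise TooManyHashCollisions("possible hash-flooding attack")
    except Exception:
        # A typical catch-all never runs: the error is not an Exception
        # subclass, so it propagates, much like KeyboardInterrupt does.
        return "swallowed"

try:
    guarded_insert()
    outcome = "swallowed silently"
except TooManyHashCollisions as exc:
    outcome = "escaped to top level: %s" % exc

print(outcome)  # -> escaped to top level: possible hash-flooding attack
```

The design intent under this assumption: the many deployed `except Exception:` handlers in existing applications would keep behaving exactly as before, while the collision error would surface at process level like a resource-exhaustion failure.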
OTOH, if you change dictionary order and *that* breaks the application, then > the bugs submitted to the application's tracker will be legitimate bugs > that > have to be fixed even if nothing else changed. > There are lots of things that are undefined according to the language spec (and quite possibly known to vary between versions or platforms or implementations like PyPy or Jython) but which we would never change in a bugfix release. So I still think we should ditch the paranoia about dictionary order > changing, > and fix this without counting. A little bit of paranoia could creep back > in > by disabling the hash fix by default in stable releases, but I think it > would > be fine to make that a compile-time option. I'm sorry, but I don't want to break a user's app with a bugfix release and say "haha your code was already broken you just didn't know it". Sure, the dict order already varies across Python implementations, possibly across 32/64 bits or operating systems. But many organizations (I know a few :-) have a very large installed software base, created over many years by many people with varying skills, that is kept working in part by very carefully keeping the environment as constant as possible. This means that the target environment is much more predictable than it is for the typical piece of open source software. Sure, a good Python developer doesn't write apps or tests that depend on dict order. But time and again we see that not everybody writes perfect code every time. Especially users writing "in-house" apps (as opposed to frameworks shared as open source) are less likely to always use the most robust, portable algorithms in existence, because they may know with much more certainty that their code will never be used on certain combinations of platforms. For example, I rarely think about whether code I write might not work on IronPython or Jython, or even CPython on Windows. 
And if something I wrote suddenly needs to be ported to one of those, well, that's considered a port and I'll just accept that it might mean changing a few things. The time to break a dependency on dict order is not with a bugfix release but with a feature release: those are more likely to break other things as well anyway, and users are well aware that they have to test everything and anticipate having to fix some fraction of their code for each feature release. OTOH we have established a long and successful track record of conservative bugfix releases that don't break anything. (I am aware of exactly one thing that was broken by a bugfix release in application code I am familiar with.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Jan 20 19:16:15 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 10:16:15 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <20120120082055.076ffc11@resist.wooz.org> References: <4F18E6F0.2010208@pearwood.info> <20120120082055.076ffc11@resist.wooz.org> Message-ID: On Fri, Jan 20, 2012 at 5:20 AM, Barry Warsaw wrote: > Let's just be clear about it: this exception is new public API. Changing > dictionary order is not. > Not if you raise MemoryError or BaseException. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Jan 20 19:20:08 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 10:20:08 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F198E74.3050807@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <4F198E74.3050807@sievertsen.de> Message-ID: This is the first objection I have seen against collision-counting that might stand.
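[Editorial note: for readers following the thread, the collision-counting proposal being debated can be modeled with a toy table. This is a sketch only — the real patch modified CPython's C dict, which uses perturb-based probing rather than this simplified linear probe; the class name is hypothetical and the 1000-comparison limit is taken from the thread.]

```python
# Toy pure-Python model of the collision-counting idea (illustrative only).

MAX_COLLISIONS = 1000  # comparison limit mentioned in the thread

class CountingTable:
    """Open-addressing table that raises after too many probe collisions."""

    def __init__(self, size=8):
        self.slots = [None] * size  # fixed size, no resizing -- sketch only

    def insert(self, key, value):
        mask = len(self.slots) - 1
        i = hash(key) & mask
        collisions = 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            collisions += 1
            if collisions > MAX_COLLISIONS:
                raise RuntimeError("too many hash collisions")
            i = (i + 1) & mask  # linear probing
        # Note: the counter resets on every call, so repeatedly inserting
        # an existing key can do up to MAX_COLLISIONS comparisons of work
        # each time without ever raising -- the weakness Frank describes.
        self.slots[i] = (key, value)

t = CountingTable()
t.insert("spam", 1)
t.insert("spam", 2)  # replaces in place; no exception raised
```

Frank's attacks below exploit exactly the gap this sketch makes visible: work is bounded per insert, not per request, and probes that terminate at an existing key never trip the counter.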
On Fri, Jan 20, 2012 at 7:55 AM, Frank Sievertsen wrote: > Hello, > > I still see at least two ways to create a DoS attack even with the > collision-counting patch. > > I assumed that it's possible to send ~500KB of payload to the > application. > > 1. It's fully deterministic which slots the dict will look up. > Since we don't count slot-collisions, but only hash-value-collisions > this can be exploited easily by creating strings with the hash-values > along the lookup-way of an arbitrary (short) string. > > So first we pick an arbitrary string. Then calculate which slots it will > visit on the way to the first empty slot. Then we create strings with > hash-values for these slots. > > This attack first injects the strings to fill all the slots that the > one short string will want to visit. Then it adds THE SAME string > again and again. Since the entry is already there, nothing will be added > and no additional collisions happen, no exception raised.
>
> $ ls -l super.txt
> -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt
> $ tail -n3 super.txt
> FX5
> FX5
> FX5
> $ wc -l super.txt
> 90000 super.txt
> $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("super.txt"))'
> real 0m52.724s
> user 0m51.543s
> sys 0m0.028s
>
> 2. The second attack exploits the fact that 1000 allowed string > comparisons are still a lot of work. > First I added 999 strings that collide with a one-byte string "a". In > some applications a zero-byte string might work even better. Then I > can add many thousands of the "a"'s, just like the first attack.
>
> $ ls -l 1000.txt
> -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt
> $ head -n 3 1000.txt
> 7hLci00
> 4wVFm10
> _rZJU50
> $ wc -l 1000.txt
> 247000 1000.txt
> $ tail -n 3 1000.txt
> a
> a
> a
> $ time python -c 'dict((unicode(l[:-1]), 0) for l in open("1000.txt"))'
> real 0m17.408s
> user 0m15.897s
> sys 0m0.008s
>
> Of course the first attack is far more efficient. One could argue > that 16 seconds is not enough for an attack.
But maybe it's possible > to send 1MB, have zero-byte strings, and since for example django > does 5 lookups per query-string this will keep it busy for ~80 seconds on > my PC. > > What to do now? > I think it's not smart to reduce the number of allowed collisions > dramatically > AND count all slot-collisions at the same time. > > Frank > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Fri Jan 20 19:52:44 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 20 Jan 2012 19:52:44 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: On 20.01.2012 19:15, Guido van Rossum wrote: > OTOH, if you change dictionary order and *that* breaks the application, then > the bugs submitted to the application's tracker will be legitimate bugs that > have to be fixed even if nothing else changed. > > > There are lots of things that are undefined according to the language spec (and > quite possibly known to vary between versions or platforms or implementations > like PyPy or Jython) but which we would never change in a bugfix release. > > So I still think we should ditch the paranoia about dictionary order changing, > and fix this without counting. A little bit of paranoia could creep back in > by disabling the hash fix by default in stable releases, but I think it would > be fine to make that a compile-time option. > > > I'm sorry, but I don't want to break a user's app with a bugfix release and say > "haha your code was already broken you just didn't know it".
> > Sure, the dict order already varies across Python implementations, possibly > across 32/64 bits or operating systems. But many organizations (I know a few :-) > have a very large installed software base, created over many years by many > people with varying skills, that is kept working in part by very carefully > keeping the environment as constant as possible. This means that the target > environment is much more predictable than it is for the typical piece of open > source software. I agree. This applies to 3.2 and 2.7, but even more to 3.1 and 2.6, which are in security-fix mode. Even if relying on dict order is a bug right now, I believe it happens many times more often in code bases out there than dicts that are filled with many, many colliding keys. So even if we can honestly blame the programmer in the former case, the users applying the security fix will have the same bad experience and won't likely care if we claim "undefined behavior". This means that it seems preferable to go with the situation where you have fewer breakages in total. Not to mention that changing dict order is likely to lead to much more subtle bugs than a straight MemoryError on a dict insert. Also, another advantage of collision counting I haven't seen in the discussion yet is that it's quite trivial to provide an API in sys to turn it on or off -- while turning on or off randomized hashes has to be done before Python starts up, i.e. at build time or with an environment variable or flag.
Georg From brett at python.org Fri Jan 20 19:49:55 2012 From: brett at python.org (Brett Cannon) Date: Fri, 20 Jan 2012 13:49:55 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: On Fri, Jan 20, 2012 at 13:15, Guido van Rossum wrote: > On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw wrote: > >> On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote: >> >> >Counting collision doesn't solve this case, but it doesn't make the >> >situation worse than before. Raising quickly an exception is better >> >than stalling for minutes, even if I agree than it is not the best >> >behaviour. >> >> ISTM that adding the possibility of raising a new exception on dictionary >> insertion is *more* backward incompatible than changing dictionary order, >> which for a very long time has been known to not be guaranteed. You're >> running some application, you upgrade Python because you apply all >> security >> fixes, and suddenly you're starting to get exceptions in places you can't >> really do anything about. Yet those exceptions are now part of the >> documented >> public API for dictionaries. This is asking for trouble. Bugs will >> suddenly >> start appearing in that application's tracker and they will seem to the >> application developer like Python just added a new public API in a >> security >> release. >> > > Dict insertion can already raise an exception: MemoryError. I think we > should be safe if the new exception also derives from BaseException. We > should actually eriously consider just raising MemoryException, since > introducing a new built-in exception in a bugfix release is also very > questionable: code explicitly catching or raising it would not work on > previous bugfix releases of the same feature release. 
> > OTOH, if you change dictionary order and *that* breaks the application, >> then >> the bugs submitted to the application's tracker will be legitimate bugs >> that >> have to be fixed even if nothing else changed. >> > > There are lots of things that are undefined according to the language spec > (and quite possibly known to vary between versions or platforms or > implementations like PyPy or Jython) but which we would never change in a > bugfix release. > > So I still think we should ditch the paranoia about dictionary order >> changing, >> and fix this without counting. A little bit of paranoia could creep back >> in >> by disabling the hash fix by default in stable releases, but I think it >> would >> be fine to make that a compile-time option. > > > I'm sorry, but I don't want to break a user's app with a bugfix release > and say "haha your code was already broken you just didn't know it". > > Sure, the dict order already varies across Python implementations, > possibly across 32/64 bits or operating systems. But many organizations (I > know a few :-) have a very large installed software base, created over many > years by many people with varying skills, that is kept working in part by > very carefully keeping the environment as constant as possible. This means > that the target environment is much more predictable than it is for the > typical piece of open source software. > > Sure, a good Python developer doesn't write apps or tests that depend on > dict order. But time and again we see that not everybody writes perfect > code every time. Especially users writing "in-house" apps (as opposed to > frameworks shared as open source) are less likely to always use the most > robust, portable algorithms in existence, because they may know with much > more certainty that their code will never be used on certain combinations > of platforms. For example, I rarely think about whether code I write might > not work on IronPython or Jython, or even CPython on Windows. 
And if > something I wrote suddenly needs to be ported to one of those, well, that's > considered a port and I'll just accept that it might mean changing a few > things. > > The time to break a dependency on dict order is not with a bugfix release > but with a feature release: those are more likely to break other things as > well anyway, and uses are well aware that they have to test everything and > anticipate having to fix some fraction of their code for each feature > release. OTOH we have established a long and successful track record of > conservative bugfix releases that don't break anything. (I am aware of > exactly one thing that was broken by a bugfix release in application code I > am familiar with.) > Why can't we have our cake and eat it too? Can we do hash randomization in 3.3 and use the hash count solution for bugfix releases? That way we get a basic fix into the bugfix releases that won't break people's code (hopefully) but we go with a more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this since it's a feature release anyway where we change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will make merging commits for those files affected by the patches, but that will most likely be isolated and not a common collision (and less of any issue once 3.3 is released later this year). I understand the desire to keep backwards-compatibility, but collision counting could cause an error in some random input that someone didn't expect to cause issues whether they were under a DoS attack or just had some unfortunate input from private data. The hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Fri Jan 20 20:03:36 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 Jan 2012 14:03:36 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: On 1/20/2012 11:17 AM, Victor Stinner wrote: > There is no perfect solution; drawbacks of each solution should be compared. Amen. One possible attack that has been described for a collision counting dict depends on knowing precisely the trigger point. So let MAXCOLLISIONS either be configurable or just choose a random count between M and N, say 700 and 999. It would not hurt to have alternate patches available in case a particular Python-powered site comes under prolonged attack. Though given our minuscule share of the market, that is much less likely than an attack on a PHP- or MS-powered site. -- Terry Jan Reedy From donald.stufft at gmail.com Fri Jan 20 20:04:21 2012 From: donald.stufft at gmail.com (Donald Stufft) Date: Fri, 20 Jan 2012 14:04:21 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: Even if a MemoryException is raised I believe that is still a fundamental change in the documented contract of the dictionary API. I don't believe there is a way to fix this without breaking someone's application. The major difference I see between the two solutions is that counting will break people's applications who are otherwise following the documented API contract of dictionaries, and randomization will break people's applications who are violating the documented API contract of dictionaries. Personally I feel that the lesser of two evils is to reward those who followed the documentation, and not reward those who didn't. So +1 for Randomization as the only option in 3.3, and off by default with a flag or environment variable in bug fixes. 
I think it's the only way to proceed that won't hurt people who have followed the documented behavior. On Friday, January 20, 2012 at 1:49 PM, Brett Cannon wrote: > > > On Fri, Jan 20, 2012 at 13:15, Guido van Rossum wrote: > > On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw wrote: > > > On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote: > > > > > > >Counting collisions doesn't solve this case, but it doesn't make the > > > >situation worse than before. Quickly raising an exception is better > > > >than stalling for minutes, even if I agree that it is not the best > > > >behaviour. > > > > > > ISTM that adding the possibility of raising a new exception on dictionary > > > insertion is *more* backward incompatible than changing dictionary order, > > > which for a very long time has been known to not be guaranteed. You're > > > running some application, you upgrade Python because you apply all security > > > fixes, and suddenly you're starting to get exceptions in places you can't > > > really do anything about. Yet those exceptions are now part of the documented > > > public API for dictionaries. This is asking for trouble. Bugs will suddenly > > > start appearing in that application's tracker and they will seem to the > > > application developer like Python just added a new public API in a security > > > release. > > > > Dict insertion can already raise an exception: MemoryError. I think we should be safe if the new exception also derives from BaseException. We should actually seriously consider just raising MemoryError, since introducing a new built-in exception in a bugfix release is also very questionable: code explicitly catching or raising it would not work on previous bugfix releases of the same feature release. > > > > > OTOH, if you change dictionary order and *that* breaks the application, then > > > the bugs submitted to the application's tracker will be legitimate bugs that > > > have to be fixed even if nothing else changed. 
> > > > There are lots of things that are undefined according to the language spec (and quite possibly known to vary between versions or platforms or implementations like PyPy or Jython) but which we would never change in a bugfix release. > > > > > So I still think we should ditch the paranoia about dictionary order changing, > > > and fix this without counting. A little bit of paranoia could creep back in > > > by disabling the hash fix by default in stable releases, but I think it would > > > be fine to make that a compile-time option. > > > > I'm sorry, but I don't want to break a user's app with a bugfix release and say "haha your code was already broken you just didn't know it". > > > > Sure, the dict order already varies across Python implementations, possibly across 32/64 bits or operating systems. But many organizations (I know a few :-) have a very large installed software base, created over many years by many people with varying skills, that is kept working in part by very carefully keeping the environment as constant as possible. This means that the target environment is much more predictable than it is for the typical piece of open source software. > > > > Sure, a good Python developer doesn't write apps or tests that depend on dict order. But time and again we see that not everybody writes perfect code every time. Especially users writing "in-house" apps (as opposed to frameworks shared as open source) are less likely to always use the most robust, portable algorithms in existence, because they may know with much more certainty that their code will never be used on certain combinations of platforms. For example, I rarely think about whether code I write might not work on IronPython or Jython, or even CPython on Windows. And if something I wrote suddenly needs to be ported to one of those, well, that's considered a port and I'll just accept that it might mean changing a few things. 
> > > > The time to break a dependency on dict order is not with a bugfix release but with a feature release: those are more likely to break other things as well anyway, and users are well aware that they have to test everything and anticipate having to fix some fraction of their code for each feature release. OTOH we have established a long and successful track record of conservative bugfix releases that don't break anything. (I am aware of exactly one thing that was broken by a bugfix release in application code I am familiar with.) > > Why can't we have our cake and eat it too? > > Can we do hash randomization in 3.3 and use the hash count solution for bugfix releases? That way we get a basic fix into the bugfix releases that won't break people's code (hopefully) but we go with a more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this since it's a feature release anyway where we change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will make merging commits harder for those files affected by the patches, but those conflicts will most likely be isolated and uncommon (and less of an issue once 3.3 is released later this year). > > I understand the desire to keep backwards-compatibility, but collision counting could cause an error on some random input that someone didn't expect to cause issues, whether they were under a DoS attack or just had some unfortunate input from private data. The hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data. 
> _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org (mailto:Python-Dev at python.org) > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/donald.stufft%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From casevh at gmail.com Fri Jan 20 20:06:46 2012 From: casevh at gmail.com (Case Van Horsen) Date: Fri, 20 Jan 2012 11:06:46 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: On Fri, Jan 20, 2012 at 8:17 AM, Victor Stinner wrote: >> So I still think we should ditch the paranoia about dictionary order changing, >> and fix this without counting. > > The randomized hash has other issues: > > - its security is based on its secret, whereas it appears to be easy to > compute it (see more details in the issue) > - my patch only changes hash(str), whereas other developers asked me > to also patch bytes, int and other types Changing hash(int) in a bugfix release will cause issues with extensions (gmpy, sage, probably others) that calculate the hash of numerical objects. > > hash(bytes) can be changed. But changing hash(int) may easily leak the > secret. We may use a different secret for each type, but if it is easy > to compute the int hash secret, dictionaries using int are still > vulnerable. > > -- > > There is no perfect solution; drawbacks of each solution should be compared. 
> > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/casevh%40gmail.com From g.brandl at gmx.net Fri Jan 20 20:14:49 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 20 Jan 2012 20:14:49 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F18AD18.2080901@v.loewis.de> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <4F18AD18.2080901@v.loewis.de> Message-ID: Am 20.01.2012 00:54, schrieb "Martin v. Löwis": >> I can't help noticing that so far, worries about the workload came mostly from >> people who don't actually bear that load (this is no accusation!), while those >> that do are the proponents of the PEP... > > Ok, so let me add then that I'm worried about the additional work-load. > > I'm particularly worried about the coordination of vacation across the > three people that work on a release. It might well not be possible to > make any release for a period of two months, which, in a six-months > release cycle with two alphas and a beta, might mean that we (the > release people) would need to adjust our vacation plans with the release > schedule, or else step down (unless you would release the "normal" > feature releases as source-only releases). Thanks for the reminder, Martin. Even with the current release schedule, I think that the load on you is too much, and we need a whole team of Windows release experts. 
It's not really fair that the RM usually changes from release to release (at least every 2), and you have to do the same for everyone. It looks like we have one volunteer already; if we find another, I think one of them will also be not on vacation at most times :) For the Mac, at least we're up to two experts, but I'd like to see a third there too. cheers, Georg From tjreedy at udel.edu Fri Jan 20 20:29:31 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 Jan 2012 14:29:31 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F198E74.3050807@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <4F198E74.3050807@sievertsen.de> Message-ID: On 1/20/2012 10:55 AM, Frank Sievertsen wrote: > Hello, > > I still see at least two ways to create a DoS attack even with the > collision-counting patch. > 2. The second attack exploits the fact that even the 1000 allowed string > comparisons are still a lot of work. > First I added 999 strings that collide with a one-byte string "a". In > some applications a zero-byte string might work even better. Then I > can add many thousands of the "a"'s, just like the first attack. If 1000 were replaced by, for instance, random.randint(700, 1000), the dict could not be set up to have an exception triggered with one other entry (which I believe was Martin's idea). But I suppose you would say that 699 entries would still make for much work. The obvious defense for this particular attack is to reject duplicate keys. Perhaps there should be write-once string sets and dicts available. This gets to the point that there is no best blind defense to all possible attacks. 
-- Terry Jan Reedy From tseaver at palladion.com Fri Jan 20 20:36:56 2012 From: tseaver at palladion.com (Tres Seaver) Date: Fri, 20 Jan 2012 14:36:56 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/20/2012 02:04 PM, Donald Stufft wrote: > Even if a MemoryException is raised I believe that is still a > fundamental change in the documented contract of the dictionary API. How so? Dictionary inserts can *already* raise that error. > I don't believe there is a way to fix this without breaking someone's > application. The major difference I see between the two solutions is > that counting will break people's applications who are otherwise > following the documented API contract of dictionaries, Do you have a case in mind where legitimate user data (not crafted as part of a DoS attack) would trip the 1000-collision limit? How likely is it that such cases exist in already-deployed applications, compared to the known breakage in existing applications due to hash randomization? > and randomization will break people's applications who are violating > the documented API contract of dictionaries. > > Personally I feel that the lesser of two evils is to reward those who > followed the documentation, and not reward those who didn't. Except that I think your set is purely hypothetical, while the second set is *lots* of deployed applications. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8ZwlgACgkQ+gerLs4ltQ4KOACglAHDgn5wUb+cye99JbeW0rZo 5oAAn2ja7K4moFLN/aD4ZP7m+8WnwhcA =u7Mt -----END PGP SIGNATURE----- From chris at simplistix.co.uk Fri Jan 20 20:05:41 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 20 Jan 2012 19:05:41 +0000 Subject: [Python-Dev] 2.7 now uses Sphinx 1.0 In-Reply-To: References: Message-ID: <4F19BB05.3010300@simplistix.co.uk> On 14/01/2012 16:14, Sandro Tosi wrote: > Hello, > just a heads-up: documentation for the 2.7 branch has been ported to use > Sphinx 1.0, so now the same syntax can be used for 2.x and 3.x > patches, hopefully easing work on both Python stacks. That's great news! Does that mean the objects inventory for Python 2.7 and Python 3 on python.org now supports referring to section headers from 3rd party packages? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From donald.stufft at gmail.com Fri Jan 20 20:51:16 2012 From: donald.stufft at gmail.com (Donald Stufft) Date: Fri, 20 Jan 2012 14:51:16 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 01/20/2012 02:04 PM, Donald Stufft wrote: > > > Even if a MemoryException is raised I believe that is still a > > fundamental change in the documented contract of the dictionary API. > > > > > How so? Dictionary inserts can *already* raise that error. 
Because it's raising it for a fundamentally different thing. "You have plenty of memory, but we decided to add an arbitrary limit that has nothing to do with memory and pretend you are out of memory anyways". > > > I don't believe there is a way to fix this without breaking someones > > application. The major differences I see between the two solutions is > > that counting will break people's applications who are otherwise > > following the documented api contract of dictionaries, > > > > > Do you have a case in mind where legitimate user data (not crafted as > part of a DoS attack) would trip the 1000-collision limit? How likely is > it that such cases exist in already-deployed applications, compared to > the known breakage in existing applications due to hash randomization? > > I don't, but as there's never been a limit on how many collisions a dictionary can have, this would be a fundamental change in the documented (and undocumented) abilities of a dictionary. Dictionary key order has never been guaranteed, is documented to not be relied on, already changes depending on if you are using 32bit, 64bit, Jython, PyPy etc or as someone else pointed out, to any number of possible improvements to dict. The counting solution violates the existing contract in order to serve people who themselves are violating the contract. Even with their violation the method that I +1'd still serves to not break existing applications by default. > > > and randomization will break people's applications who are violating > > the documented api contract of dictionaries. > > > > Personally I feel that the lesser of two evils is to reward those who > > followed the documentation, and not reward those who didn't. > > > > > Except that I think your set is purely hypothetical, while the second set > is *lots* of deployed applications. > > Which is why I believe that it should be off by default on the bugfix, but easily enabled. (Flag, env var, whatever). 
That allows people to upgrade to a bugfix without breaking their application, and if this vulnerability affects them, they can enable it. I think the counting collision is at best a bandaid and not a proper fix, stemming from a desire to not break existing applications on a bugfix release. That is better served by implementing the real fix and letting people control whether it is enabled (only on the bugfix releases; on 3.3+ it should always be forced on). > > > Tres. > - -- > =================================================================== > Tres Seaver +1 540-429-0999 tseaver at palladion.com (mailto:tseaver at palladion.com) > Palladion Software "Excellence by Design" http://palladion.com > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.10 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk8ZwlgACgkQ+gerLs4ltQ4KOACglAHDgn5wUb+cye99JbeW0rZo > 5oAAn2ja7K4moFLN/aD4ZP7m+8WnwhcA > =u7Mt > -----END PGP SIGNATURE----- > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org (mailto:Python-Dev at python.org) > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/donald.stufft%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Jan 20 20:27:07 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Jan 2012 11:27:07 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: <4F19C00B.6060103@stoneleaf.us> Donald Stufft wrote: > Even if a MemoryException is raised I believe that is still a > fundamental change in the documented contract of the dictionary API. I don't > believe there is a way to fix this without breaking someone's > application. 
The major difference I see between the two solutions is > that counting will break people's applications who are otherwise > following the documented API contract of dictionaries, and randomization > will break people's applications who are violating the documented API > contract of dictionaries. > > Personally I feel that the lesser of two evils is to reward those who > followed the documentation, and not reward those who didn't. > > So +1 for Randomization as the only option in 3.3, and off by default > with a flag or environment variable in bug fixes. I think it's the only > way to proceed that won't hurt people who have followed the documented > behavior. +1 ~Ethan~ From guido at python.org Fri Jan 20 21:02:39 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 12:02:39 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> Message-ID: On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft wrote: > On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote: > > On 01/20/2012 02:04 PM, Donald Stufft wrote: > > Even if a MemoryException is raised I believe that is still a > fundamental change in the documented contract of the dictionary API. > > How so? Dictionary inserts can *already* raise that error. > > Because it's raising it for a fundamentally different thing. "You have > plenty of memory, but we decided to add an arbitrary limit that has nothing > to do with memory and pretend you are out of memory anyways". > Actually due to fragmentation that can already happen. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
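For concreteness, the counting scheme being debated here, folding in Terry Reedy's randomized trigger point and the suggestion to reuse MemoryError rather than introduce a new builtin, can be caricatured in pure Python. This is an illustrative sketch only: the actual patches counted probe collisions inside CPython's C dict implementation, and the class names below are invented for the example.

```python
import random

class CountingDict(dict):
    """Toy dict that refuses new keys once too many share one hash value."""

    def __init__(self):
        super().__init__()
        # Randomizing the trigger point (Terry Reedy's suggestion) keeps an
        # attacker from knowing exactly how many colliding keys to send.
        self._limit = random.randint(700, 1000)
        self._per_hash = {}  # hash value -> number of distinct keys using it

    def __setitem__(self, key, value):
        h = hash(key)
        if key not in self:
            n = self._per_hash.get(h, 0)
            if n >= self._limit:
                # Reuse MemoryError, per the thread's argument that adding a
                # new builtin exception in a bugfix release is questionable.
                raise MemoryError("too many hash collisions")
            self._per_hash[h] = n + 1
        super().__setitem__(key, value)

class Colliding:
    """Keys engineered to share a single hash value, as in a DoS payload."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return 12345
    def __eq__(self, other):
        return isinstance(other, Colliding) and self.name == other.name
```

Feeding such a dict more colliding keys than the randomized limit raises instead of degrading to quadratic probing, which is exactly the behavior change (an exception on a formerly valid insert) that this part of the thread is arguing about.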
URL: From mail at timgolden.me.uk Fri Jan 20 21:08:50 2012 From: mail at timgolden.me.uk (Tim Golden) Date: Fri, 20 Jan 2012 20:08:50 +0000 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <4F18AD18.2080901@v.loewis.de> Message-ID: <4F19C9D2.30107@timgolden.me.uk> On 20/01/2012 19:14, Georg Brandl wrote: > Am 20.01.2012 00:54, schrieb "Martin v. Löwis": >>> I can't help noticing that so far, worries about the workload came mostly from >>> people who don't actually bear that load (this is no accusation!), while those >>> that do are the proponents of the PEP... >> >> Ok, so let me add then that I'm worried about the additional work-load. >> >> I'm particularly worried about the coordination of vacation across the >> three people that work on a release. It might well not be possible to >> make any release for a period of two months, which, in a six-months >> release cycle with two alphas and a beta, might mean that we (the >> release people) would need to adjust our vacation plans with the release >> schedule, or else step down (unless you would release the "normal" >> feature releases as source-only releases). > > Thanks for the reminder, Martin. Even with the current release schedule, > I think that the load on you is too much, and we need a whole team of > Windows release experts. It's not really fair that the RM usually changes > from release to release (at least every 2), and you have to do the same > for everyone. 
> > It looks like we have one volunteer already; if we find another, I think > one of them will also be not on vacation at most times :) I'm certainly happy to help out there. Like everyone I'm not always clear on my availability but the more people who know what needs to be done, the better ISTM. TJG From ethan at stoneleaf.us Fri Jan 20 21:05:12 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Jan 2012 12:05:12 -0800 Subject: [Python-Dev] exception chaining In-Reply-To: References: <4F199FE5.9080005@stoneleaf.us> Message-ID: <4F19C8F8.5000102@stoneleaf.us> Benjamin Peterson wrote: > 2012/1/20 Ethan Furman : >> Summary: >> >> Exception Chaining is cool, unless you are writing libraries that want to >> transform from Exception X to Exception Y as the previous exception >> context is unnecessary, potentially confusing, and cluttery (yup, just made >> that word up!). >> >> For all the gory details, see http://bugs.python.org/issue6210. >> >> I'm going to attempt a patch implementing MRAB's suggestion: >> >> try: >> some_op >> except ValueError: >> raise as OtherError() # `raise` keeps context, `raise as` does not > > I dislike this syntax. Raise what as OtherError()? I think the "raise > x from None" idea is preferable, since it indicates you are nulling > the context. The optimal solution would be to have "raise X > nocontext", but that would obviously require another keyword... Raise 'the error' as OtherError. The problem I have with 'raise x from None' is it puts 'from None' clear at the end of the line -- not a big deal on this very short example, but when you have actual text it's not as obvious: except SomeError(): raise SomeOtherError('explanatory text with actual %data to help track down the problem' % data) from None Of course, I suppose that same issue exists with the 'raise x from exc' syntax, and 'from None' certainly matches that better... 
~Ethan~ From benjamin at python.org Fri Jan 20 21:56:27 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 20 Jan 2012 15:56:27 -0500 Subject: [Python-Dev] exception chaining In-Reply-To: <4F19C8F8.5000102@stoneleaf.us> References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> Message-ID: 2012/1/20 Ethan Furman : > Benjamin Peterson wrote: >> >> 2012/1/20 Ethan Furman : >>> >>> Summary: >>> >>> Exception Chaining is cool, unless you are writing libraries that want to >>> transform from Exception X to Exception Y as the previous exception >>> context is unnecessary, potentially confusing, and cluttery (yup, just >>> made >>> that word up!). >>> >>> For all the gory details, see http://bugs.python.org/issue6210. >>> >>> I'm going to attempt a patch implementing MRAB's suggestion: >>> >>> try: >>> some_op >>> except ValueError: >>> raise as OtherError() # `raise` keeps context, `raise as` does not >> >> >> I dislike this syntax. Raise what as OtherError()? I think the "raise >> x from None" idea is preferable, since it indicates you are nulling >> the context. The optimal solution would be to have "raise X >> nocontext", but that would obviously require another keyword... > > > Raise 'the error' as OtherError. Where 'the error' is? Aren't you trying to override the current error? > > The problem I have with 'raise x from None' is it puts 'from None' clear at > the end of the line -- not a big deal on this very short example, but when you > have actual text it's not as obvious: > > except SomeError(): > raise SomeOtherError('explanatory text with actual %data to help track > down the problem' % data) from None > > Of course, I suppose that same issue exists with the 'raise x from exc' > syntax, and 'from None' certainly matches that better... Exactly! -- Regards, Benjamin From d01c at uni-bremen.de Fri Jan 20 21:20:12 2012 From: d01c at uni-bremen.de (Dr.-Ing. Ingo D. 
Rullhusen) Date: Fri, 20 Jan 2012 21:20:12 +0100 Subject: [Python-Dev] negative ref count on windows debug version Message-ID: <4F19CC7C.7080603@uni-bremen.de> Hello, using loc = PyDict_New(); Py_XDECREF(loc); or PyObject *src = Py_CompileString( code.toStdString().c_str(), "", Py_single_input ); Py_XDECREF(src); results in a "object at blahblah has negative ref count -1" error on windows visual studio in debug mode. And yes, python is compiled and linked in debug mode also. The release version seems to work. This happens in version 2.6.7 and 2.7.2. Any hints? Thanks Ingo From g.brandl at gmx.net Fri Jan 20 22:06:11 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 20 Jan 2012 22:06:11 +0100 Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions In-Reply-To: <4F19C9D2.30107@timgolden.me.uk> References: <20120117213440.0008fd70@pitrou.net> <87zkdl68iz.fsf@uwakimon.sk.tsukuba.ac.jp> <20120118121530.2e6a3b52@pitrou.net> <87lip56urp.fsf@uwakimon.sk.tsukuba.ac.jp> <1326891727.3395.44.camel@localhost.localdomain> <87k44p6njd.fsf@uwakimon.sk.tsukuba.ac.jp> <1326901919.3395.67.camel@localhost.localdomain> <4F175FD6.30502@pearwood.info> <4F18AD18.2080901@v.loewis.de> <4F19C9D2.30107@timgolden.me.uk> Message-ID: Am 20.01.2012 21:08, schrieb Tim Golden: > On 20/01/2012 19:14, Georg Brandl wrote: >> Am 20.01.2012 00:54, schrieb "Martin v. Löwis": >>>> I can't help noticing that so far, worries about the workload came mostly from >>>> people who don't actually bear that load (this is no accusation!), while those >>>> that do are the proponents of the PEP... >>> >>> Ok, so let me add then that I'm worried about the additional work-load. >>> >>> I'm particularly worried about the coordination of vacation across the >>> three people that work on a release. 
It might well not be possible to >>> make any release for a period of two months, which, in a six-months >>> release cycle with two alphas and a beta, might mean that we (the >>> release people) would need to adjust our vacation plans with the release >>> schedule, or else step down (unless you would release the "normal" >>> feature releases as source-only releases). >> >> Thanks for the reminder, Martin. Even with the current release schedule, >> I think that the load on you is too much, and we need a whole team of >> Windows release experts. It's not really fair that the RM usually changes >> from release to release (at least every 2), and you have to do the same >> for everyone. >> >> It looks like we have one volunteer already; if we find another, I think >> one of them will also be not on vacation at most times :) > > > I'm certainly happy to help out there. Like everyone I'm > not always clear on my availability but the more people > who know what needs to be done, the better ISTM. Definitely. Thanks for volunteering, Tim! Georg From g.brandl at gmx.net Fri Jan 20 22:07:54 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 20 Jan 2012 22:07:54 +0100 Subject: [Python-Dev] exception chaining In-Reply-To: <4F19C8F8.5000102@stoneleaf.us> References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> Message-ID: Am 20.01.2012 21:05, schrieb Ethan Furman: > Benjamin Peterson wrote: >> 2012/1/20 Ethan Furman : >>> Summary: >>> >>> Exception Chaining is cool, unless you are writing libraries that want to >>> transform from Exception X to Exception Y as the previous exception >>> context is unnecessary, potentially confusing, and cluttery (yup, just made >>> that word up!). >>> >>> For all the gory details, see http://bugs.python.org/issue6210. 
>>> >>> I'm going to attempt a patch implementing MRAB's suggestion: >>> >>> try: >>> some_op >>> except ValueError: >>> raise as OtherError() # `raise` keeps context, `raise as` does not >> >> I dislike this syntax. Raise what as OtherError()? I think the "raise >> x from None" idea is preferable, since it indicates you are nulling >> the context. The optimal solution would be to have "raise X >> nocontext", but that would obviously require another keyword... > > Raise 'the error' as OtherError. > > The problem I have with 'raise x from None' is it puts 'from None' clear > at the end of the line -- not a big deal on this very short example, but > when you have actual text it's not as obvious: Well, the "as" in "raise as" would be very easily overlooked too. > except SomeError(): > raise SomeOtherError('explanatory text with actual %data to help > track down the problem' % data) from None In any case, I don't think the context suppression is the most important thing about the exception raising, so it doesn't need to stand out... Georg From tjreedy at udel.edu Fri Jan 20 22:38:01 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 Jan 2012 16:38:01 -0500 Subject: [Python-Dev] exception chaining In-Reply-To: <4F19C8F8.5000102@stoneleaf.us> References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> Message-ID: Since 'raise' means 're-raise the current error', 'raise as OtherError' means (clearly to me, anyway) 're-raise the current error as OtherError'. This is just what you want to be able to say. Just as 'raise' without a current error results in a TypeError, so should 'raise as OtherError'. I would just go with this as the proposal. 
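For reference, the `raise x from None` spelling Benjamin prefers is what ultimately landed (PEP 409, later refined by PEP 415): it does not delete the context, it records it and sets a suppression flag that the traceback machinery honors. A small sketch of those semantics on Python 3.3 or later, with `OtherError` as a stand-in exception name:

```python
class OtherError(Exception):
    pass

def convert():
    try:
        int("not a number")
    except ValueError:
        # 'from None' suppresses the implicit "During handling of the above
        # exception, another exception occurred" chaining in the traceback.
        raise OtherError("conversion failed") from None

try:
    convert()
except OtherError as exc:
    # The original ValueError is still recorded on __context__; the flag
    # merely tells the traceback printer not to display it.
    suppressed = exc.__suppress_context__
    context_kept = isinstance(exc.__context__, ValueError)
```

This addresses Ethan's library use case: the caller sees only `OtherError` in the printed traceback, while debuggers can still reach the original exception through `__context__`.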
-- Terry Jan Reedy From amauryfa at gmail.com Fri Jan 20 22:42:20 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 20 Jan 2012 22:42:20 +0100 Subject: [Python-Dev] negative ref count on windows debug version In-Reply-To: <4F19CC7C.7080603@uni-bremen.de> References: <4F19CC7C.7080603@uni-bremen.de> Message-ID: Hi, 2012/1/20 Dr.-Ing. Ingo D. Rullhusen > using > > loc = PyDict_New(); > Py_XDECREF(loc); > > or > > PyObject *src = Py_CompileString( code.toStdString().c_str(), > "", Py_single_input ); > Py_XDECREF(src); > > results in a "object at blahblah has negative ref count -1" error on > windows visual studio in debug mode. And yes, python is compiled and > linked in debug mode also. The release version seems to work. > > This happens in version 2.6.7 and 2.7.2. > This looks very unlikely. Python itself is written with tons of similar constructs, and works very well in debug mode. If you can isolate a reproducible case, please file a ticket on bugs.python.org, with all details: code, versions of the compiler, etc. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Jan 20 23:11:15 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 20 Jan 2012 17:11:15 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> Message-ID: On 1/20/2012 2:51 PM, Donald Stufft wrote: > I think the counting collision is at best a bandaid and not a proper fix > stemmed from a desire to not break existing applications on a bugfix > release ... My opinion of counting is better than yours, but even conceding the theoretical, purity argument, our release process is practical as well. 
There have been a few occasions when fixes to bugs in our code have been delayed from a bugfix release to the next feature release -- because the fix would break too much code depending on the bug. Some years ago there was a proposal that we should deliberately tweak hash() to break 'buggy' code that depended on it not changing. This never happened. So it has been left de facto constant, to the extent it is, for some years. -- Terry Jan Reedy From wolfson at gmail.com Fri Jan 20 23:33:08 2012 From: wolfson at gmail.com (Ben Wolfson) Date: Fri, 20 Jan 2012 14:33:08 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> Message-ID: On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy wrote: > On 1/20/2012 2:51 PM, Donald Stufft wrote: > >> I think the counting collision is at best a bandaid and not a proper fix >> stemmed from a desire to not break existing applications on a bugfix >> release ... > > My opinion of counting is better than yours, but even conceding the > theoretical, purity argument, our release process is practical as well. > There have been a few occasions when fixes to bugs in our code have been > delayed from a bugfix release to the next feature release -- because the fix > would break too much code depending on the bug. AFAICT Brett's suggestion (which had occurred to me as well, but I'm not a core developer by any stretch) seemed to get lost in the debate: would it be possible to go with collision counting for bugfix releases and hash randomization for new feature releases? (Brett made it here: .) -- Ben Wolfson "Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." 
[Larousse, "Drink" entry] From ethan at stoneleaf.us Fri Jan 20 23:17:29 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 20 Jan 2012 14:17:29 -0800 Subject: [Python-Dev] exception chaining In-Reply-To: References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> Message-ID: <4F19E7F9.3010809@stoneleaf.us> Georg Brandl wrote: > > Well, the "as" in "raise as" would be very easily overlooked too. > > In any case, I don't think the context suppression is the most important > thing about the exception raising, so it doesn't need to stand out... Good point. From pydev at sievertsen.de Fri Jan 20 23:35:42 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Fri, 20 Jan 2012 23:35:42 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> Message-ID: <4F19EC3E.6000008@sievertsen.de> Am 20.01.2012 16:33, schrieb Guido van Rossum: > (I'm thinking that the original attack is trivial once the set of > 65000 colliding keys is public knowledge, which must be only a matter > of time. I think it's very likely that this will happen soon. For ASP and PHP there is attack-payload publicly available. PHP and ASP have patches to limit the number of query-variables. We're very lucky that there's no public payload for python yet, and all non-public software and payload I'm aware of is based upon my software. But this can change any moment. It's not really difficult to write software to create 32bit-collisions. 
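[Frank's point that colliding keys are easy to weaponize rests on dicts degrading from O(1) to O(n) per insert when every key hashes alike. A sketch — instrumentation only, not an attack payload; `Key` and `eq_calls` are made-up names — makes the quadratic blow-up visible by counting equality comparisons:]

```python
# Every Key hashes to the same bucket, so in CPython inserting the k-th
# key probes (and __eq__-compares) the k-1 keys already stored along the
# same probe sequence: total work grows roughly as N*(N-1)/2.

eq_calls = 0

class Key:
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42                  # every key collides
    def __eq__(self, other):
        global eq_calls
        eq_calls += 1
        return self.n == other.n

N = 500
d = {Key(i): i for i in range(N)}

print(len(d), eq_calls)            # 500 keys, ~N*(N-1)/2 comparisons
```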
Frank From donald.stufft at gmail.com Fri Jan 20 23:36:20 2012 From: donald.stufft at gmail.com (Donald Stufft) Date: Fri, 20 Jan 2012 17:36:20 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> Message-ID: I believe that either solution has the potential to break existing applications so to ensure that no applications are broken there will need to be a method of disabling it. I also believe that to maintain the backwards compatibility that Python has traditionally had in bug fix releases that either solution will need to default to off. Given those 2 things that I believe, I don't think that the argument should be which solution will break less, because I believe both will need to be off by default, but which solution more adequately solves the underlying problem. On Friday, January 20, 2012 at 5:11 PM, Terry Reedy wrote: > On 1/20/2012 2:51 PM, Donald Stufft wrote: > > > I think the counting collision is at best a bandaid and not a proper fix > > stemmed from a desire to not break existing applications on a bugfix > > release ... > > > > > My opinion of counting is better than yours, but even conceding the > theoretical, purity argument, our release process is practical as well. > There have been a few occasions when fixes to bugs in our code have been > delayed from a bugfix release to the next feature release -- because the > fix would break too much code depending on the bug. > > Some years ago there was a proposal that we should deliberately tweak > hash() to break 'buggy' code that depended on it not changing. This > never happened. So it has been left de facto constant, to the extent it > is, for some years. 
> > -- > Terry Jan Reedy > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org (mailto:Python-Dev at python.org) > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/donald.stufft%40gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vijaymajagaonkar at gmail.com Fri Jan 20 23:40:29 2012 From: vijaymajagaonkar at gmail.com (Vijay Majagaonkar) Date: Fri, 20 Jan 2012 17:40:29 -0500 Subject: [Python-Dev] python build failed on mac In-Reply-To: References: Message-ID: On 2012-01-20, at 4:29 AM, Hynek Schlawack wrote: > Hello Vijay > > > Am Freitag, 20. Januar 2012 um 00:56 schrieb Vijay N. Majagaonkar: > >> I am trying to build python 3 on mac and build failing with following error can somebody help me with this > It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: http://bugs.python.org/issue13241 > > make clean > CC=clang ./configure && make -s > Hi Hynek, Thanks for the help, but above command need to run in different way ./configure CC=clang make this allowed me to build the code but when ran test I got following error message [363/364/3] test_io python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug I am using Mac OS-X 10.7.2 and insatlled Xcode 4.2.1 Thanks ;) From benjamin at python.org Fri Jan 20 23:45:08 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 
20 Jan 2012 17:45:08 -0500 Subject: [Python-Dev] exception chaining In-Reply-To: References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> Message-ID: 2012/1/20 Terry Reedy : > Since 'raise' means 're-raise the current error', 'raise as OtherError' > means (clearly to me, anyway) 're-raise the current error as OtherError'. That doesn't make any sense. You're changing the exception completely not reraising it. -- Regards, Benjamin From guido at python.org Fri Jan 20 23:51:19 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 14:51:19 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> Message-ID: On Fri, Jan 20, 2012 at 2:33 PM, Ben Wolfson wrote: > On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy wrote: >> On 1/20/2012 2:51 PM, Donald Stufft wrote: >> >>> I think the counting collision is at best a bandaid and not a proper fix >>> stemmed from a desire to not break existing applications on a bugfix >>> release ... >> >> My opinion of counting is better than yours, but even conceding the >> theoretical, purity argument, our release process is practical as well. >> There have been a few occasions when fixes to bugs in our code have been >> delayed from a bugfix release to the next feature release -- because the fix >> would break too much code depending on the bug. > > AFAICT Brett's suggestion (which had occurred to me as well, but I'm > not a core developer by any stretch) seemed to get lost in the debate: > would it be possible to go with collision counting for bugfix releases > and hash randomization for new feature releases? (Brett made it here: > .) I made it earlier. 
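[The randomization branch of this debate is what eventually shipped: from Python 3.3 on, string hashing is randomized by default, with the seed forceable via the `PYTHONHASHSEED` environment variable. A sketch of the knob's observable behavior; since the seed is fixed at interpreter startup, it has to be observed through child processes:]

```python
# Observe PYTHONHASHSEED (Python 3.3+) by spawning fresh interpreters.
import os
import subprocess
import sys

def hash_of_abc(seed):
    """hash('abc') as computed by a fresh interpreter with PYTHONHASHSEED=seed."""
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env={**os.environ, "PYTHONHASHSEED": str(seed)},
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

# A fixed seed is reproducible across processes, and seed 0 disables
# randomization entirely; different seeds typically produce different
# hashes, which is the whole point of the defense.
assert hash_of_abc(1) == hash_of_abc(1)
assert hash_of_abc(0) == hash_of_abc(0)
print("fixed seeds are reproducible")
```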
-- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 20 23:55:03 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 14:55:03 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F19EC3E.6000008@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <4F19EC3E.6000008@sievertsen.de> Message-ID: On Fri, Jan 20, 2012 at 2:35 PM, Frank Sievertsen wrote: > Am 20.01.2012 16:33, schrieb Guido van Rossum: > >> (I'm thinking that the original attack is trivial once the set of 65000 >> colliding keys is public knowledge, which must be only a matter of time. > > > > I think it's very likely that this will happen soon. > > For ASP and PHP there is attack-payload publicly available. > PHP and ASP have patches to limit the number of query-variables. > > We're very lucky that there's no public payload for python yet, > and all non-public software and payload I'm aware of is based > upon my software. > > But this can change any moment. It's not really difficult to > write software to create 32bit-collisions. While we're debating the best fix, could we allow people to at least protect themselves against script-kiddies by offering fixes to cgi.py, django, webob and a few other popular frameworks that limits forms to 1000 keys? (I suppose it's really only POST requests that are vulnerable to script kiddies, because of the length restriction on URLs.) -- --Guido van Rossum (python.org/~guido) From amauryfa at gmail.com Sat Jan 21 00:03:55 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sat, 21 Jan 2012 00:03:55 +0100 Subject: [Python-Dev] Hashing proposal: change only string-only dicts In-Reply-To: References: <4F15E130.6010200@v.loewis.de> <20120117222611.64b3fd4e@pitrou.net> <4F161942.5040100@v.loewis.de> <4F170793.9060802@v.loewis.de> Message-ID: 2012/1/19 Gregory P. Smith > str[-1] is not likely to work if you want to maintain ABI compatibility. 
> Appending it to the data after the terminating \0 is more likely to be > possible, but if there is any possibility that existing compiled extension > modules have somehow inlined code to do allocation of the str field even > that is questionable (i don't think there are?). There are. Unfortunately. https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/scalarapi.c#L710 -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jan 21 01:40:15 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Jan 2012 11:40:15 +1100 Subject: [Python-Dev] exception chaining In-Reply-To: References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> Message-ID: <4F1A096F.7060404@pearwood.info> Benjamin Peterson wrote: > 2012/1/20 Terry Reedy : >> Since 'raise' means 're-raise the current error', 'raise as OtherError' >> means (clearly to me, anyway) 're-raise the current error as OtherError'. > > That doesn't make any sense. You're changing the exception completely > not reraising it. I expect Terry is referring to the coder's intention, not the actual nuts and bolts of how it is implemented. def spam(): try: something() except HamError: raise SpamError is implemented by catching a HamError and raising a completely different SpamError, but the intention is to "replace the HamError which actually occurred with a more appropriate SpamError". At least that is *my* intention when I write code like the above, and it appears to be the usual intention in code I've seen that uses that idiom. Typically SpamError is part of the function's API while HamError is not. The fact that Python doesn't actually "replace" anything is besides the point. The purpose of the idiom is to turn one exception into another exception, which is as close as we can get to re-raising HamError as a SpamError instead. 
(It's not actually a re-raise as the traceback will point to a different line of code, but it's close enough.) I'd prefer "raise SpamError from None", but I understand that this cannot work due to technical limitations. If that is the case, then "raise as SpamError" works for me. -- Steven From steve at pearwood.info Sat Jan 21 01:53:31 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Jan 2012 11:53:31 +1100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> Message-ID: <4F1A0C8B.8080500@pearwood.info> Guido van Rossum wrote: > On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft wrote: > >> On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote: >> >> On 01/20/2012 02:04 PM, Donald Stufft wrote: >> >> Even if a MemoryException is raised I believe that is still a >> fundamental change in the documented contract of dictionary API. >> >> How so? Dictionary inserts can *already* raise that error. >> >> Because it's raising it for a fundamentally different thing. "You have >> plenty of memory, but we decided to add an arbitrary limit that has nothing >> to do with memory and pretend you are out of memory anyways". >> > > Actually due to fragmentation that can already happen. Whether you have run out of total memory, or a single contiguous block, it is still a memory error. An arbitrary limit on collisions is not a memory error. If we were designing this API from scratch, would anyone propose using MemoryError for "you have reached a limit on collisions"? It has nothing to do with memory. Using MemoryError for something which isn't a memory error is ugly. How about RuntimeError? 
-- Steven From guido at python.org Sat Jan 21 02:02:53 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 17:02:53 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1A0C8B.8080500@pearwood.info> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> <4F1A0C8B.8080500@pearwood.info> Message-ID: It should derive from BaseException. --Guido van Rossum (sent from Android phone) On Jan 20, 2012 4:59 PM, "Steven D'Aprano" wrote: > Guido van Rossum wrote: > >> On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft >> **wrote: >> >> On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote: >>> >>> On 01/20/2012 02:04 PM, Donald Stufft wrote: >>> >>> Even if a MemoryException is raised I believe that is still a >>> fundamental change in the documented contract of dictionary API. >>> >>> How so? Dictionary inserts can *already* raise that error. >>> >>> Because it's raising it for a fundamentally different thing. "You have >>> plenty of memory, but we decided to add an arbitrary limit that has >>> nothing >>> to do with memory and pretend you are out of memory anyways". >>> >>> >> Actually due to fragmentation that can already happen. >> > > Whether you have run out of total memory, or a single contiguous block, it > is still a memory error. > > An arbitrary limit on collisions is not a memory error. If we were > designing this API from scratch, would anyone propose using MemoryError for > "you have reached a limit on collisions"? It has nothing to do with memory. > Using MemoryError for something which isn't a memory error is ugly. > > How about RuntimeError? 
> > > > -- > Steven > > ______________________________**_________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/**mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/**mailman/options/python-dev/** > guido%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jan 21 03:25:08 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Jan 2012 13:25:08 +1100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: <4F1A2204.1050100@pearwood.info> Terry Reedy wrote: > On 1/20/2012 11:17 AM, Victor Stinner wrote: > >> There is no perfect solutions, drawbacks of each solution should be >> compared. > > Amen. > > One possible attack that has been described for a collision counting > dict depends on knowing precisely the trigger point. So let > MAXCOLLISIONS either be configureable or just choose a random count > between M and N, say 700 and 999. Have I missed something? Why wouldn't the attacker simply target 1000 collisions, and if the collision triggers at 700 instead of 1000, that's a bonus? -- Steven From steve at pearwood.info Sat Jan 21 03:33:24 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Jan 2012 13:33:24 +1100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> <4F1A0C8B.8080500@pearwood.info> Message-ID: <4F1A23F4.3060305@pearwood.info> Guido van Rossum wrote: > It should derive from BaseException. RuntimeError meets that requirement, and it is an existing exception so there are no issues with introducing a new built-in exception to a point release. 
py> issubclass(RuntimeError, BaseException) True -- Steven From benjamin at python.org Sat Jan 21 03:36:32 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 20 Jan 2012 21:36:32 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1A23F4.3060305@pearwood.info> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> <4F1A0C8B.8080500@pearwood.info> <4F1A23F4.3060305@pearwood.info> Message-ID: 2012/1/20 Steven D'Aprano : > Guido van Rossum wrote: >> >> It should derive from BaseException. > > > RuntimeError meets that requirement, and it is an existing exception so > there are no issues with introducing a new built-in exception to a point > release. > > py> issubclass(RuntimeError, BaseException) > True Guido meant a direct subclass. -- Regards, Benjamin From guido at python.org Sat Jan 21 03:37:25 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jan 2012 18:37:25 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1A23F4.3060305@pearwood.info> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <865D1F6FFDE344E89259E3BACF5370DA@gmail.com> <4F1A0C8B.8080500@pearwood.info> <4F1A23F4.3060305@pearwood.info> Message-ID: On Fri, Jan 20, 2012 at 6:33 PM, Steven D'Aprano wrote: > Guido van Rossum wrote: >> >> It should derive from BaseException. > RuntimeError meets that requirement, and it is an existing exception so > there are no issues with introducing a new built-in exception to a point > release. > > py> issubclass(RuntimeError, BaseException) > True Sorry, I was ambiguous. I meant it should not derive from Exception. It goes RuntimeError -> StandardError -> Exception -> BaseException. 
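[The distinction Guido is drawing — derive from `BaseException` directly, not via `Exception` — can be checked in code. Note `StandardError` exists only in Python 2 (it was removed in 3.0), so under Python 3 the chain is one link shorter:]

```python
# Why "derive from BaseException, not Exception" matters: a blanket
# `except Exception` backstop handler swallows RuntimeError, but not
# exceptions that hang directly off BaseException, such as
# KeyboardInterrupt and SystemExit.  Run under Python 3.

print(RuntimeError.__mro__)
# (RuntimeError, Exception, BaseException, object)

def swallowed_by_blanket_handler(exc_type):
    try:
        raise exc_type()
    except Exception:
        return True          # a typical backstop handler catches it
    except BaseException:
        return False         # only a bare/BaseException handler would

print(swallowed_by_blanket_handler(RuntimeError))       # True
print(swallowed_by_blanket_handler(KeyboardInterrupt))  # False
```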
-- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Sat Jan 21 07:02:14 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 21 Jan 2012 17:02:14 +1100 Subject: [Python-Dev] exception chaining In-Reply-To: <4F199FE5.9080005@stoneleaf.us> References: <4F199FE5.9080005@stoneleaf.us> Message-ID: <4F1A54E6.5070507@pearwood.info> Ethan Furman wrote: > The question I have at the moment is: should `raise as` be an error if > no exception is currently being handled? I think so. "raise as Spam" essentially means "raise Spam with the context set to None". That's actually only useful if the context otherwise wouldn't be None, that is, if you're raising an exception when another exception is active. Doing it "just in case" defeats the usefulness of exception chaining, and should be discouraged. It is easier to change our minds later and allow "raise as" outside of an except block, than to change our minds and forbid it. > > Example: > > def smurfy(x): > if x != 'magic flute': > raise as WrongInstrument > do_something_with_x > > If this is allowed then `smurfy` could be called from inside an `except` > clause or outside it. What's your use-case? The only one I can think of is this: def choose_your_own_exception(x): if x < 0: raise as ValueError elif x == 0: raise as SpamError elif x < 1: raise as SmallerThanOneError else: raise as RuntimeError try: something() except TypeError: choose_your_own_exception(x) I don't think we want to encourage such practices. Besides, if you really need such an exception selector, change every "raise as" to return, then do: try: something() except TypeError: raise as choose_your_own_exception(x) Much more clear. > I don't care for it for two reasons: > > - I don't like the way it looks > - I can see it encouraging always using `raise as` instead of `raise` > and losing the value of exception chaining. > > Other thoughts? 
> > ~Ethan~ > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/steve%40pearwood.info > From tjreedy at udel.edu Sat Jan 21 07:07:00 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 21 Jan 2012 01:07:00 -0500 Subject: [Python-Dev] exception chaining In-Reply-To: <4F1A096F.7060404@pearwood.info> References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> <4F1A096F.7060404@pearwood.info> Message-ID: On 1/20/2012 7:40 PM, Steven D'Aprano wrote: > Benjamin Peterson wrote: >> 2012/1/20 Terry Reedy : >>> Since 'raise' means 're-raise the current error', 'raise as OtherError' >>> means (clearly to me, anyway) 're-raise the current error as >>> OtherError'. >> >> That doesn't make any sense. You're changing the exception completely >> not reraising it. > > I expect Terry is referring to the coder's intention, not the actual > nuts and bolts of how it is implemented. Yes, same error situation, translated, typically from developer language to app language. > def spam(): > try: > something() > except HamError: > raise SpamError > > is implemented by catching a HamError and raising a completely different > SpamError, but the intention is to "replace the HamError which actually > occurred with a more appropriate SpamError". > > At least that is *my* intention when I write code like the above, and it > appears to be the usual intention in code I've seen that uses that > idiom. Typically SpamError is part of the function's API while HamError > is not. 
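[The translate-the-exception idiom Steven and Terry describe is what Python 3.3 ultimately standardized as `raise X from None` (PEP 409, authored by Ethan Furman out of this discussion). A sketch of both forms, using the thread's illustrative `HamError`/`SpamError` names; note that since PEP 415, `from None` keeps `__context__` available for debuggers and instead sets `__suppress_context__` so tracebacks omit it:]

```python
# Exception translation with and without context suppression (3.3+).

class HamError(Exception): pass
class SpamError(Exception): pass

def spam_chained():
    try:
        raise HamError("internal detail")
    except HamError:
        raise SpamError("API-level error")             # context shown

def spam_suppressed():
    try:
        raise HamError("internal detail")
    except HamError:
        raise SpamError("API-level error") from None   # context hidden

caught = {}
for fn in (spam_chained, spam_suppressed):
    try:
        fn()
    except SpamError as err:
        caught[fn.__name__] = err
        print(fn.__name__, err.__suppress_context__)
# spam_chained False
# spam_suppressed True
```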
-- Terry Jan Reedy From senthil at uthcode.com Sat Jan 21 07:32:12 2012 From: senthil at uthcode.com (Senthil Kumaran) Date: Sat, 21 Jan 2012 14:32:12 +0800 Subject: [Python-Dev] 2.7 now uses Sphinx 1.0 In-Reply-To: <4F19BB05.3010300@simplistix.co.uk> References: <4F19BB05.3010300@simplistix.co.uk> Message-ID: <20120121063212.GA1988@mathmagic> On Fri, Jan 20, 2012 at 07:05:41PM +0000, Chris Withers wrote: > > That's great news, does that now mean the objects inventory for > Python 2.7 and Python 3 on python.org now supports referring to > section headers from 3rd party packages? Nope. It does not seem to have any relation to that. Would you care to explain more, possibly show an example in rst as what you mean? -- Senthil From g.brandl at gmx.net Sat Jan 21 08:43:33 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 21 Jan 2012 08:43:33 +0100 Subject: [Python-Dev] 2.7 now uses Sphinx 1.0 In-Reply-To: <4F19BB05.3010300@simplistix.co.uk> References: <4F19BB05.3010300@simplistix.co.uk> Message-ID: Am 20.01.2012 20:05, schrieb Chris Withers: > On 14/01/2012 16:14, Sandro Tosi wrote: >> Hello, >> just a heads-up: documentation for 2.7 branch has been ported to use >> sphinx 1.0, so now the same syntax can be used for 2.x and 3.x >> patches, hopefully easying working on both python stacks. > > That's great news, does that now mean the objects inventory for Python > 2.7 and Python 3 on python.org now supports referring to section headers > from 3rd party packages? Yes, they should -- there's something wrong with the automatic build still, but I'll fix that ASAP. Georg From matthew at woodcraft.me.uk Sat Jan 21 16:50:59 2012 From: matthew at woodcraft.me.uk (Matthew Woodcraft) Date: Sat, 21 Jan 2012 15:50:59 +0000 (UTC) Subject: [Python-Dev] Counting collisions for the win References: Message-ID: Victor Stinner wrote: > I propose to solve the hash collision vulnerability by counting > collisions [...] 
> We now know all issues of the randomized hash solution, and I > think that there are more drawbacks than advantages. IMO the > randomized hash is overkill to fix the hash collision issue. For web frameworks, forcing an exception is less harmful than forcing a many-second delay, but I think it's hard to be confident that there aren't other vulnerable applications where it's the other way round. Web frameworks like the exception because they already have backstop exception handlers, and anyway they use short-lived processes and keep valuable data in databases rather than process memory. Web frameworks don't like the delay because they allow unauthenticated users to submit many requests (including multiple requests in parallel), and they normally expect each response to take little cpu time. But many programs are not like this. What about a log analyser or a mailing list archiver or a web crawler or a game server or some other kind of program we haven't considered? -M- From guido at python.org Sat Jan 21 17:45:29 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 21 Jan 2012 08:45:29 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: Message-ID: On Sat, Jan 21, 2012 at 7:50 AM, Matthew Woodcraft wrote: > Victor Stinner ? wrote: >> I propose to solve the hash collision vulnerability by counting >> collisions [...] > >> We now know all issues of the randomized hash solution, and I >> think that there are more drawbacks than advantages. IMO the >> randomized hash is overkill to fix the hash collision issue. > > > For web frameworks, forcing an exception is less harmful than forcing a > many-second delay, but I think it's hard to be confident that there > aren't other vulnerable applications where it's the other way round. > > > Web frameworks like the exception because they already have backstop > exception handlers, and anyway they use short-lived processes and keep > valuable data in databases rather than process memory. 
> > Web frameworks don't like the delay because they allow unauthenticated > users to submit many requests (including multiple requests in parallel), > and they normally expect each response to take little cpu time. > > > But many programs are not like this. > > What about a log analyser or a mailing list archiver or a web crawler or > a game server or some other kind of program we haven't considered? If my log crawler ended up taking minutes per log entry instead of milliseconds I'd have to kill it anyway. Web crawlers are huge multi-process systems that are as robust as web servers, or more. Game servers are just web apps. -- --Guido van Rossum (python.org/~guido) From hs at ox.cx Sat Jan 21 19:57:23 2012 From: hs at ox.cx (Hynek Schlawack) Date: Sat, 21 Jan 2012 19:57:23 +0100 Subject: [Python-Dev] python build failed on mac In-Reply-To: References: Message-ID: <5DED60FA213649C2BE5D8E2C161285C5@gmail.com> Am Freitag, 20. Januar 2012 um 23:40 schrieb Vijay Majagaonkar: > > > I am trying to build python 3 on mac and build failing with following error can somebody help me with this > > > > It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: http://bugs.python.org/issue13241 > > > > make clean > > CC=clang ./configure && make -s > > Thanks for the help, but above command need to run in different way > > ./configure CC=clang > make I'm not sure why you think it "needs" to be that way, but it's fine by me as both ways work fine. 
> this allowed me to build the code but when ran test I got following error message > > [363/364/3] test_io > python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > > I am using Mac OS-X 10.7.2 and insatlled Xcode 4.2.1 Please ensure there aren't any gcc-created objects left by running "make distclean" first. -h From dmalcolm at redhat.com Sat Jan 21 21:22:34 2012 From: dmalcolm at redhat.com (David Malcolm) Date: Sat, 21 Jan 2012 15:22:34 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F198E74.3050807@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <4F198E74.3050807@sievertsen.de> Message-ID: <1327177356.4992.265.camel@surprise> On Fri, 2012-01-20 at 16:55 +0100, Frank Sievertsen wrote: > Hello, > > I still see at least two ways to create a DOS attack even with the > collison-counting-patch. [snip description of two types of attack on the collision counting approach] > What to do now? > I think it's not smart to reduce the number of allowed collisions > dramatically > AND count all slot-collisions at the same time. 
Frank: did you see the new approach I proposed in: http://bugs.python.org/issue13703#msg151735 http://bugs.python.org/file24289/amortized-probe-counting-dmalcolm-2012-01-21-003.patch (it repurposes the ma_smalltable region of large dictionaries to track each such dict's average iterations taken per modification, and raises an exception when that exceeds a particular ratio) I'm interested in hearing how it holds up against your various test cases, or what flaws there are in it. Thanks! Dave From vijaymajagaonkar at gmail.com Sat Jan 21 21:24:00 2012 From: vijaymajagaonkar at gmail.com (Vijay Majagaonkar) Date: Sat, 21 Jan 2012 15:24:00 -0500 Subject: [Python-Dev] python build failed on mac In-Reply-To: <5DED60FA213649C2BE5D8E2C161285C5@gmail.com> References: <5DED60FA213649C2BE5D8E2C161285C5@gmail.com> Message-ID: On 2012-01-21, at 1:57 PM, Hynek Schlawack wrote: > On Friday, 20 January 2012 at 23:40, Vijay Majagaonkar wrote: >>>> I am trying to build Python 3 on Mac and the build is failing with the following error; can somebody help me with this? >>> >>> It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: http://bugs.python.org/issue13241 >>> >>> make clean >>> CC=clang ./configure && make -s >> >> Thanks for the help, but the above commands needed to be run in a different way: >> >> ./configure CC=clang >> make > > > I'm not sure why you think it "needs" to be that way, but it's fine by me, as both ways work.
I am not sure; that was just trial and error on my part. The first option you suggested was throwing the same compile error, then I tried it this way and it worked :) > >> this allowed me to build the code, but when I ran the tests I got the following error message >> >> [363/364/3] test_io >> python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) >> *** error: can't allocate region >> *** set a breakpoint in malloc_error_break to debug >> >> I am using Mac OS X 10.7.2 and installed Xcode 4.2.1 > > Please ensure there aren't any gcc-created objects left by running "make distclean" first. I have tried this option too, but the result is still the same; I have attached the test log in case it helps -------------- next part -------------- A non-text attachment was scrubbed... Name: mac_test.log Type: application/octet-stream Size: 3051915 bytes Desc: not available URL: -------------- next part -------------- I would like to work on this if you can give me some guidance for looking into this issue. Thanks for the help ;) From solipsis at pitrou.net Sat Jan 21 23:20:47 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 21 Jan 2012 23:20:47 +0100 Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about the unused return value. References: Message-ID: <20120121232047.22c19409@pitrou.net> On Sat, 21 Jan 2012 21:51:43 +0100 gregory.p.smith wrote: > http://hg.python.org/cpython/rev/d01fecadf3ea > changeset: 74561:d01fecadf3ea > branch: 3.2 > parent: 74558:03e61104f7a2 > user: Gregory P.
Smith > date: Sat Jan 21 12:31:25 2012 -0800 > summary: > Avoid the compiler warning about the unused return value. Can't that give you another warning about the ssize_t being truncated to int? How about the following instead? (void) write(...); cheers Antoine. From benjamin at python.org Sat Jan 21 23:24:56 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 21 Jan 2012 17:24:56 -0500 Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about the unused return value. In-Reply-To: <20120121232047.22c19409@pitrou.net> References: <20120121232047.22c19409@pitrou.net> Message-ID: 2012/1/21 Antoine Pitrou : > On Sat, 21 Jan 2012 21:51:43 +0100 > gregory.p.smith wrote: >> http://hg.python.org/cpython/rev/d01fecadf3ea >> changeset: 74561:d01fecadf3ea >> branch: 3.2 >> parent: 74558:03e61104f7a2 >> user: Gregory P. Smith >> date: Sat Jan 21 12:31:25 2012 -0800 >> summary: >> Avoid the compiler warning about the unused return value. > > Can't that give you another warning about the ssize_t being truncated to > int? > How about the following instead? > > (void) write(...); Also, if you use a recent enough version of gcc, ./configure will disable the warning. I would prefer it if we stopped using these kinds of hacks. -- Regards, Benjamin From stefan at bytereef.org Sat Jan 21 23:33:08 2012 From: stefan at bytereef.org (Stefan Krah) Date: Sat, 21 Jan 2012 23:33:08 +0100 Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about the unused return value. In-Reply-To: References: <20120121232047.22c19409@pitrou.net> Message-ID: <20120121223308.GA13093@sleipnir.bytereef.org> Benjamin Peterson wrote: > > Can't that give you another warning about the ssize_t being truncated to > > int? > > How about the following instead? > > > > (void) write(...); > > Also, if you use a recent enough version of gcc, ./configure will > disable the warning. I would prefer it if we stopped using these kinds of > hacks. Do you mean (void)write(...)?
Many people think this is good practice, > since it indicates to the reader that the return value is deliberately > ignored. Stefan Krah From benjamin at python.org Sat Jan 21 23:53:17 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 21 Jan 2012 17:53:17 -0500 Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about the unused return value. In-Reply-To: <20120121223308.GA13093@sleipnir.bytereef.org> References: <20120121232047.22c19409@pitrou.net> <20120121223308.GA13093@sleipnir.bytereef.org> Message-ID: 2012/1/21 Stefan Krah : > Benjamin Peterson wrote: >> > Can't that give you another warning about the ssize_t being truncated to >> > int? >> > How about the following instead? >> > >> > (void) write(...); >> >> Also, if you use a recent enough version of gcc, ./configure will >> disable the warning. I would prefer it if we stopped using these kinds of >> hacks. > > Do you mean (void)write(...)? Many people think this is good practice, > since it indicates to the reader that the return value is deliberately > ignored. Not doing anything with it seems fairly deliberate to me. -- Regards, Benjamin From solipsis at pitrou.net Sat Jan 21 23:52:36 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 21 Jan 2012 23:52:36 +0100 Subject: [Python-Dev] cpython (3.2): Fixes issue #8052: The posix subprocess module's close_fds behavior was References: Message-ID: <20120121235236.139cceac@pitrou.net> On Sat, 21 Jan 2012 23:39:41 +0100 gregory.p.smith wrote: > http://hg.python.org/cpython/rev/61aa484a3e54 > changeset: 74563:61aa484a3e54 > branch: 3.2 > parent: 74561:d01fecadf3ea > user: Gregory P. Smith > date: Sat Jan 21 14:01:08 2012 -0800 > summary: > Fixes issue #8052: The posix subprocess module's close_fds behavior was > suboptimal by closing all possible file descriptors rather than just > the open ones in the child process before exec(). For what it's worth, I'm not really comfortable with so much new low-level code in a bugfix release.
IMHO it's more of a new feature, since it's a performance improvement. Regards Antoine. From greg at krypto.org Sun Jan 22 00:02:58 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 21 Jan 2012 15:02:58 -0800 Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about the unused return value. In-Reply-To: <20120121223308.GA13093@sleipnir.bytereef.org> References: <20120121232047.22c19409@pitrou.net> <20120121223308.GA13093@sleipnir.bytereef.org> Message-ID: On Sat, Jan 21, 2012 at 2:33 PM, Stefan Krah wrote: > Benjamin Peterson wrote: >> > Can't that give you another warning about the ssize_t being truncated to >> > int? >> > How about the following instead? >> > >> > (void) write(...); >> >> Also, if you use a recent enough version of gcc, ./configure will >> disable the warning. I would prefer it if we stopped using these kinds of >> hacks. > > Do you mean (void)write(...)? Many people think this is good practice, > since it indicates to the reader that the return value is deliberately > ignored. Unfortunately (void) write(...) does not get rid of the warning. Asking me to change the version of the compiler I'm using is unfortunately not helpful. I don't want to see this warning on any common default compiler versions today. I am not going to use a different gcc/g++ version than what my distro provides to build Python unless we start making that demand of all CPython users as well. It is normally a useful warning message, just not in this specific case.
-gps From greg.ewing at canterbury.ac.nz Sun Jan 22 00:22:56 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 22 Jan 2012 12:22:56 +1300 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> <4F188DFD.6080401@canterbury.ac.nz> Message-ID: <4F1B48D0.3060309@canterbury.ac.nz> Glyph wrote: > Yes, but you /can/ look at a 'yield' and conclude that you /might/ need > a lock, and that you have to think about it. My concern is that you will end up with vastly more 'yield from's than places that require locks, so most of them are just noise. If you bite your nails over whether a lock is needed every time you see one, they will cause you a lot more anxiety than they alleviate. > Sometimes there's no alternative, but wherever I can, I avoid thinking, > especially hard thinking. This maxim has served me very well throughout > my programming career ;-). There are already well-known techniques for dealing with concurrency that minimise the amount of hard thinking required. You devise some well-behaved abstractions, such as queues, and put all your hard thinking into implementing them. Then you build the rest of your code around those abstractions. That way you don't have to rely on crutches such as explicitly marking everything that might cause a task switch, because it doesn't matter. -- Greg From greg at krypto.org Sun Jan 22 00:36:26 2012 From: greg at krypto.org (Gregory P. 
Smith) Date: Sat, 21 Jan 2012 15:36:26 -0800 Subject: [Python-Dev] cpython (3.2): Fixes issue #8052: The posix subprocess module's close_fds behavior was In-Reply-To: <20120121235236.139cceac@pitrou.net> References: <20120121235236.139cceac@pitrou.net> Message-ID: On Sat, Jan 21, 2012 at 2:52 PM, Antoine Pitrou wrote: > On Sat, 21 Jan 2012 23:39:41 +0100 > gregory.p.smith wrote: >> http://hg.python.org/cpython/rev/61aa484a3e54 >> changeset: ? 74563:61aa484a3e54 >> branch: ? ? ?3.2 >> parent: ? ? ?74561:d01fecadf3ea >> user: ? ? ? ?Gregory P. Smith >> date: ? ? ? ?Sat Jan 21 14:01:08 2012 -0800 >> summary: >> ? Fixes issue #8052: The posix subprocess module's close_fds behavior was >> suboptimal by closing all possible file descriptors rather than just >> the open ones in the child process before exec(). > > For what it's worth, I'm not really confident with so much new low-level > code in a bugfix release. > IMHO it's more of a new feature, since it's a performance improvement. No APIs change and it makes the subprocess module usable on systems running with high file descriptor limits where it was painfully slow to use in the past. This was a regression in behavior introduced with 3.2's change to make close_fds=True be the (quite sane) default so I do consider it a fix rather than a performance improvement. Obviously the final decision rests with the 3.2.3 release manager. For anyone uncomfortable with the code itself: The equivalent of that code has been in use in production at work continuously in multithreaded processes across a massive number of machines running a variety of versions of Linux for many years now. And the non-Linux code is effectively what the Java VM's Process module does. -gps From benjamin at python.org Sun Jan 22 01:15:38 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 21 Jan 2012 19:15:38 -0500 Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about the unused return value. 
In-Reply-To: References: <20120121232047.22c19409@pitrou.net> <20120121223308.GA13093@sleipnir.bytereef.org> Message-ID: 2012/1/21 Gregory P. Smith : > On Sat, Jan 21, 2012 at 2:33 PM, Stefan Krah wrote: >> Benjamin Peterson wrote: >>> > Can't that give you another warning about the ssize_t being truncated to >>> > int? >>> > How about the following instead? >>> > >>> > ? (void) write(...); >>> >>> Also, if you use a recent enough version of gcc, ./configure will >>> disable the warning. I would prefer if stop using these kinds of >>> hacks. >> >> Do you mean (void)write(...)? Many people think this is good practice, >> since it indicates to the reader that the return value is deliberately >> ignored. > > Unfortunately (void) write(...) does not get rid of the warning. > > Asking me to change the version of the compiler i'm using is > unfortunately not helpful. ?I don't want to see this warning on any > common default compiler versions today. I'm not asking you to. I'm just saying this annoyance (which is all it is) has been fixed when the infrastructure is new enough to support it. > > I am not going to use a different gcc/g++ version than what my distro > provides to build python unless we start making that demand of all > CPython users as well. > > It is normally a useful warning message, just not in this specific case. -- Regards, Benjamin From jared.grubb at gmail.com Sun Jan 22 01:19:46 2012 From: jared.grubb at gmail.com (Jared Grubb) Date: Sat, 21 Jan 2012 16:19:46 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: On 20 Jan 2012, at 10:49, Brett Cannon wrote: > Why can't we have our cake and eat it too? > > Can we do hash randomization in 3.3 and use the hash count solution for bugfix releases? 
That way we get a basic fix into the bugfix releases that won't break people's code (hopefully) but we go with a more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this since it's a feature release anyway where we change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will complicate merging commits for those files affected by the patches, but that will most likely be isolated and not a common occurrence (and less of an issue once 3.3 is released later this year). > > I understand the desire to keep backwards-compatibility, but collision counting could cause an error in some random input that someone didn't expect to cause issues whether they were under a DoS attack or just had some unfortunate input from private data. The hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data. I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller could do to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash: * this would solve the security issue that someone mentioned about being able to deduce the hash, because if attackers keep being mean it'll change anyway * for bugfix releases, start off without randomization (seed==0) and start to use it only when the collision count hits the threshold * for feature releases, reseeding when you hit a certain threshold still seems like a good idea, as it will make lookups/insertions better in the long run AFAIUI, Python already doesn't guarantee order stability when you insert something into a dictionary, as in the worst case the dictionary has to resize its hash table, and then the order is freshly jumbled again. Just my two cents.
Jared From benjamin at python.org Sun Jan 22 01:21:52 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 21 Jan 2012 19:21:52 -0500 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fixes issue #8052: The posix subprocess module's close_fds behavior was In-Reply-To: References: Message-ID: 2012/1/21 gregory.p.smith : ... > +/* Convert ASCII to a positive int, no libc call. no overflow. -1 on error. */ Is no libc call important? > +static int _pos_int_from_ascii(char *name) To be consistent with the rest of posixmodule.c, "static int" should be on a different line from the signature. This also applies to all other function declarations added by this. > +{ > + ? ?int num = 0; > + ? ?while (*name >= '0' && *name <= '9') { > + ? ? ? ?num = num * 10 + (*name - '0'); > + ? ? ? ?++name; > + ? ?} > + ? ?if (*name) > + ? ? ? ?return -1; ?/* Non digit found, not a number. */ > + ? ?return num; > +} > + > + > +/* Returns 1 if there is a problem with fd_sequence, 0 otherwise. */ > +static int _sanity_check_python_fd_sequence(PyObject *fd_sequence) > +{ > + ? ?Py_ssize_t seq_idx, seq_len = PySequence_Length(fd_sequence); PySequence_Length can fail. > + ? ?long prev_fd = -1; > + ? ?for (seq_idx = 0; seq_idx < seq_len; ++seq_idx) { > + ? ? ? ?PyObject* py_fd = PySequence_Fast_GET_ITEM(fd_sequence, seq_idx); > + ? ? ? ?long iter_fd = PyLong_AsLong(py_fd); > + ? ? ? ?if (iter_fd < 0 || iter_fd < prev_fd || iter_fd > INT_MAX) { > + ? ? ? ? ? ?/* Negative, overflow, not a Long, unsorted, too big for a fd. */ > + ? ? ? ? ? ?return 1; > + ? ? ? ?} > + ? ?} > + ? ?return 0; > +} > + > + > +/* Is fd found in the sorted Python Sequence? */ > +static int _is_fd_in_sorted_fd_sequence(int fd, PyObject *fd_sequence) > +{ > + ? ?/* Binary search. */ > + ? ?Py_ssize_t search_min = 0; > + ? ?Py_ssize_t search_max = PySequence_Length(fd_sequence) - 1; > + ? ?if (search_max < 0) > + ? ? ? ?return 0; > + ? ?do { > + ? ? ? ?long middle = (search_min + search_max) / 2; > + ? ? ? 
?long middle_fd = PyLong_AsLong( > + ? ? ? ? ? ? ? ?PySequence_Fast_GET_ITEM(fd_sequence, middle)); No check for error? > + ? ? ? ?if (fd == middle_fd) > + ? ? ? ? ? ?return 1; > + ? ? ? ?if (fd > middle_fd) > + ? ? ? ? ? ?search_min = middle + 1; > + ? ? ? ?else > + ? ? ? ? ? ?search_max = middle - 1; > + ? ?} while (search_min <= search_max); > + ? ?return 0; > +} > + > + > +/* Close all file descriptors in the range start_fd inclusive to > + * end_fd exclusive except for those in py_fds_to_keep. ?If the > + * range defined by [start_fd, end_fd) is large this will take a > + * long time as it calls close() on EVERY possible fd. > + */ > +static void _close_fds_by_brute_force(int start_fd, int end_fd, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?PyObject *py_fds_to_keep) > +{ > + ? ?Py_ssize_t num_fds_to_keep = PySequence_Length(py_fds_to_keep); > + ? ?Py_ssize_t keep_seq_idx; > + ? ?int fd_num; > + ? ?/* As py_fds_to_keep is sorted we can loop through the list closing > + ? ? * fds inbetween any in the keep list falling within our range. */ > + ? ?for (keep_seq_idx = 0; keep_seq_idx < num_fds_to_keep; ++keep_seq_idx) { > + ? ? ? ?PyObject* py_keep_fd = PySequence_Fast_GET_ITEM(py_fds_to_keep, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?keep_seq_idx); > + ? ? ? ?int keep_fd = PyLong_AsLong(py_keep_fd); > + ? ? ? ?if (keep_fd < start_fd) > + ? ? ? ? ? ?continue; > + ? ? ? ?for (fd_num = start_fd; fd_num < keep_fd; ++fd_num) { > + ? ? ? ? ? ?while (close(fd_num) < 0 && errno == EINTR); > + ? ? ? ?} > + ? ? ? ?start_fd = keep_fd + 1; > + ? ?} > + ? ?if (start_fd <= end_fd) { > + ? ? ? ?for (fd_num = start_fd; fd_num < end_fd; ++fd_num) { > + ? ? ? ? ? ?while (close(fd_num) < 0 && errno == EINTR); > + ? ? ? ?} > + ? ?} > +} > + > + > +#if defined(__linux__) && defined(HAVE_SYS_SYSCALL_H) > +/* It doesn't matter if d_name has room for NAME_MAX chars; we're using this > + * only to read a directory of short file descriptor number names. 
?The kernel > + * will return an error if we didn't give it enough space. ?Highly Unlikely. > + * This structure is very old and stable: It will not change unless the kernel > + * chooses to break compatibility with all existing binaries. ?Highly Unlikely. > + */ > +struct linux_dirent { > + ? unsigned long ?d_ino; ? ? ? ?/* Inode number */ > + ? unsigned long ?d_off; ? ? ? ?/* Offset to next linux_dirent */ > + ? unsigned short d_reclen; ? ? /* Length of this linux_dirent */ > + ? char ? ? ? ? ? d_name[256]; ?/* Filename (null-terminated) */ > +}; > + > +/* Close all open file descriptors in the range start_fd inclusive to end_fd > + * exclusive. Do not close any in the sorted py_fds_to_keep list. > + * > + * This version is async signal safe as it does not make any unsafe C library > + * calls, malloc calls or handle any locks. ?It is _unfortunate_ to be forced > + * to resort to making a kernel system call directly but this is the ONLY api > + * available that does no harm. ?opendir/readdir/closedir perform memory > + * allocation and locking so while they usually work they are not guaranteed > + * to (especially if you have replaced your malloc implementation). ?A version > + * of this function that uses those can be found in the _maybe_unsafe variant. > + * > + * This is Linux specific because that is all I am ready to test it on. ?It > + * should be easy to add OS specific dirent or dirent64 structures and modify > + * it with some cpp #define magic to work on other OSes as well if you want. > + */ > +static void _close_open_fd_range_safe(int start_fd, int end_fd, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?PyObject* py_fds_to_keep) > +{ > + ? ?int fd_dir_fd; > + ? ?if (start_fd >= end_fd) > + ? ? ? ?return; > + ? ?fd_dir_fd = open(LINUX_SOLARIS_FD_DIR, O_RDONLY | O_CLOEXEC, 0); > + ? ?/* Not trying to open the BSD_OSX path as this is currently Linux only. */ > + ? ?if (fd_dir_fd == -1) { > + ? ? ? ?/* No way to get a list of open fds. */ > + ? ? ? 
?_close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep); > + ? ? ? ?return; > + ? ?} else { > + ? ? ? ?char buffer[sizeof(struct linux_dirent)]; > + ? ? ? ?int bytes; > + ? ? ? ?while ((bytes = syscall(SYS_getdents, fd_dir_fd, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(struct linux_dirent *)buffer, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?sizeof(buffer))) > 0) { > + ? ? ? ? ? ?struct linux_dirent *entry; > + ? ? ? ? ? ?int offset; > + ? ? ? ? ? ?for (offset = 0; offset < bytes; offset += entry->d_reclen) { > + ? ? ? ? ? ? ? ?int fd; > + ? ? ? ? ? ? ? ?entry = (struct linux_dirent *)(buffer + offset); > + ? ? ? ? ? ? ? ?if ((fd = _pos_int_from_ascii(entry->d_name)) < 0) > + ? ? ? ? ? ? ? ? ? ?continue; ?/* Not a number. */ > + ? ? ? ? ? ? ? ?if (fd != fd_dir_fd && fd >= start_fd && fd < end_fd && > + ? ? ? ? ? ? ? ? ? ?!_is_fd_in_sorted_fd_sequence(fd, py_fds_to_keep)) { > + ? ? ? ? ? ? ? ? ? ?while (close(fd) < 0 && errno == EINTR); > + ? ? ? ? ? ? ? ?} > + ? ? ? ? ? ?} > + ? ? ? ?} > + ? ? ? ?close(fd_dir_fd); > + ? ?} > +} > + > +#define _close_open_fd_range _close_open_fd_range_safe > + > +#else ?/* NOT (defined(__linux__) && defined(HAVE_SYS_SYSCALL_H)) */ > + > + > +/* Close all open file descriptors in the range start_fd inclusive to end_fd > + * exclusive. Do not close any in the sorted py_fds_to_keep list. > + * > + * This function violates the strict use of async signal safe functions. :( > + * It calls opendir(), readdir64() and closedir(). ?Of these, the one most > + * likely to ever cause a problem is opendir() as it performs an internal > + * malloc(). ?Practically this should not be a problem. ?The Java VM makes the > + * same calls between fork and exec in its own UNIXProcess_md.c implementation. > + * > + * readdir_r() is not used because it provides no benefit. ?It is typically > + * implemented as readdir() followed by memcpy(). ?See also: > + * ? 
http://womble.decadent.org.uk/readdir_r-advisory.html > + */ > +static void _close_open_fd_range_maybe_unsafe(int start_fd, int end_fd, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?PyObject* py_fds_to_keep) > +{ > + ? ?DIR *proc_fd_dir; > +#ifndef HAVE_DIRFD > + ? ?while (_is_fd_in_sorted_fd_sequence(start_fd, py_fds_to_keep) && > + ? ? ? ? ? (start_fd < end_fd)) { > + ? ? ? ?++start_fd; > + ? ?} > + ? ?if (start_fd >= end_fd) > + ? ? ? ?return; > + ? ?/* Close our lowest fd before we call opendir so that it is likely to > + ? ? * reuse that fd otherwise we might close opendir's file descriptor in > + ? ? * our loop. ?This trick assumes that fd's are allocated on a lowest > + ? ? * available basis. */ > + ? ?while (close(start_fd) < 0 && errno == EINTR); > + ? ?++start_fd; > +#endif > + ? ?if (start_fd >= end_fd) > + ? ? ? ?return; > + > + ? ?proc_fd_dir = opendir(BSD_OSX_FD_DIR); > + ? ?if (!proc_fd_dir) > + ? ? ? ?proc_fd_dir = opendir(LINUX_SOLARIS_FD_DIR); > + ? ?if (!proc_fd_dir) { > + ? ? ? ?/* No way to get a list of open fds. */ > + ? ? ? ?_close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep); > + ? ?} else { > + ? ? ? ?struct dirent64 *dir_entry; > +#ifdef HAVE_DIRFD > + ? ? ? ?int fd_used_by_opendir = DIRFD(proc_fd_dir); > +#else > + ? ? ? ?int fd_used_by_opendir = start_fd - 1; > +#endif > + ? ? ? ?errno = 0; > + ? ? ? ?/* readdir64 is used to work around Solaris 9 bug 6395699. */ > + ? ? ? ?while ((dir_entry = readdir64(proc_fd_dir))) { > + ? ? ? ? ? ?int fd; > + ? ? ? ? ? ?if ((fd = _pos_int_from_ascii(dir_entry->d_name)) < 0) > + ? ? ? ? ? ? ? ?continue; ?/* Not a number. */ > + ? ? ? ? ? ?if (fd != fd_used_by_opendir && fd >= start_fd && fd < end_fd && > + ? ? ? ? ? ? ? ?!_is_fd_in_sorted_fd_sequence(fd, py_fds_to_keep)) { > + ? ? ? ? ? ? ? ?while (close(fd) < 0 && errno == EINTR); > + ? ? ? ? ? ?} > + ? ? ? ? ? ?errno = 0; > + ? ? ? ?} > + ? ? ? ?if (errno) { > + ? ? ? ? ? ?/* readdir error, revert behavior. Highly Unlikely. */ > + ? ? ? ? 
? ?_close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep); > + ? ? ? ?} > + ? ? ? ?closedir(proc_fd_dir); > + ? ?} > +} > + > +#define _close_open_fd_range _close_open_fd_range_maybe_unsafe > + > +#endif ?/* else NOT (defined(__linux__) && defined(HAVE_SYS_SYSCALL_H)) */ > + > + > ?/* > ?* This function is code executed in the child process immediately after fork > ?* to set things up and call exec(). > @@ -46,12 +292,12 @@ > ? ? ? ? ? ? ? ? ? ? ? ?int errread, int errwrite, > ? ? ? ? ? ? ? ? ? ? ? ?int errpipe_read, int errpipe_write, > ? ? ? ? ? ? ? ? ? ? ? ?int close_fds, int restore_signals, > - ? ? ? ? ? ? ? ? ? ? ? int call_setsid, Py_ssize_t num_fds_to_keep, > + ? ? ? ? ? ? ? ? ? ? ? int call_setsid, > ? ? ? ? ? ? ? ? ? ? ? ?PyObject *py_fds_to_keep, > ? ? ? ? ? ? ? ? ? ? ? ?PyObject *preexec_fn, > ? ? ? ? ? ? ? ? ? ? ? ?PyObject *preexec_fn_args_tuple) > ?{ > - ? ?int i, saved_errno, fd_num, unused; > + ? ?int i, saved_errno, unused; > ? ? PyObject *result; > ? ? const char* err_msg = ""; > ? ? /* Buffer large enough to hold a hex integer. ?We can't malloc. */ > @@ -113,33 +359,8 @@ > ? ? ? ? POSIX_CALL(close(errwrite)); > ? ? } > > - ? ?/* close() is intentionally not checked for errors here as we are closing */ > - ? ?/* a large range of fds, some of which may be invalid. */ > - ? ?if (close_fds) { > - ? ? ? ?Py_ssize_t keep_seq_idx; > - ? ? ? ?int start_fd = 3; > - ? ? ? ?for (keep_seq_idx = 0; keep_seq_idx < num_fds_to_keep; ++keep_seq_idx) { > - ? ? ? ? ? ?PyObject* py_keep_fd = PySequence_Fast_GET_ITEM(py_fds_to_keep, > - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?keep_seq_idx); > - ? ? ? ? ? ?int keep_fd = PyLong_AsLong(py_keep_fd); > - ? ? ? ? ? ?if (keep_fd < 0) { ?/* Negative number, overflow or not a Long. */ > - ? ? ? ? ? ? ? ?err_msg = "bad value in fds_to_keep."; > - ? ? ? ? ? ? ? ?errno = 0; ?/* We don't want to report an OSError. */ > - ? ? ? ? ? ? ? ?goto error; > - ? ? ? ? ? ?} > - ? ? ? ? ? 
?if (keep_fd < start_fd) > - ? ? ? ? ? ? ? ?continue; > - ? ? ? ? ? ?for (fd_num = start_fd; fd_num < keep_fd; ++fd_num) { > - ? ? ? ? ? ? ? ?close(fd_num); > - ? ? ? ? ? ?} > - ? ? ? ? ? ?start_fd = keep_fd + 1; > - ? ? ? ?} > - ? ? ? ?if (start_fd <= max_fd) { > - ? ? ? ? ? ?for (fd_num = start_fd; fd_num < max_fd; ++fd_num) { > - ? ? ? ? ? ? ? ?close(fd_num); > - ? ? ? ? ? ?} > - ? ? ? ?} > - ? ?} > + ? ?if (close_fds) > + ? ? ? ?_close_open_fd_range(3, max_fd, py_fds_to_keep); > > ? ? if (cwd) > ? ? ? ? POSIX_CALL(chdir(cwd)); > @@ -227,7 +448,7 @@ > ? ? pid_t pid; > ? ? int need_to_reenable_gc = 0; > ? ? char *const *exec_array, *const *argv = NULL, *const *envp = NULL; > - ? ?Py_ssize_t arg_num, num_fds_to_keep; > + ? ?Py_ssize_t arg_num; > > ? ? if (!PyArg_ParseTuple( > ? ? ? ? ? ? args, "OOOOOOiiiiiiiiiiO:fork_exec", > @@ -243,9 +464,12 @@ > ? ? ? ? PyErr_SetString(PyExc_ValueError, "errpipe_write must be >= 3"); > ? ? ? ? return NULL; > ? ? } > - ? ?num_fds_to_keep = PySequence_Length(py_fds_to_keep); > - ? ?if (num_fds_to_keep < 0) { > - ? ? ? ?PyErr_SetString(PyExc_ValueError, "bad fds_to_keep"); > + ? ?if (PySequence_Length(py_fds_to_keep) < 0) { > + ? ? ? ?PyErr_SetString(PyExc_ValueError, "cannot get length of fds_to_keep"); > + ? ? ? ?return NULL; > + ? ?} > + ? ?if (_sanity_check_python_fd_sequence(py_fds_to_keep)) { > + ? ? ? ?PyErr_SetString(PyExc_ValueError, "bad value(s) in fds_to_keep"); > ? ? ? ? return NULL; > ? ? } > > @@ -348,8 +572,7 @@ > ? ? ? ? ? ? ? ? ? ?p2cread, p2cwrite, c2pread, c2pwrite, > ? ? ? ? ? ? ? ? ? ?errread, errwrite, errpipe_read, errpipe_write, > ? ? ? ? ? ? ? ? ? ?close_fds, restore_signals, call_setsid, > - ? ? ? ? ? ? ? ? ? num_fds_to_keep, py_fds_to_keep, > - ? ? ? ? ? ? ? ? ? preexec_fn, preexec_fn_args_tuple); > + ? ? ? ? ? ? ? ? ? py_fds_to_keep, preexec_fn, preexec_fn_args_tuple); > ? ? ? ? _exit(255); > ? ? ? ? return NULL; ?/* Dead code to avoid a potential compiler warning. */ > ? ? 
} > diff --git a/configure b/configure > --- a/configure > +++ b/configure > @@ -6165,7 +6165,7 @@ > ?sys/audioio.h sys/bsdtty.h sys/epoll.h sys/event.h sys/file.h sys/loadavg.h \ > ?sys/lock.h sys/mkdev.h sys/modem.h \ > ?sys/param.h sys/poll.h sys/select.h sys/socket.h sys/statvfs.h sys/stat.h \ > -sys/termio.h sys/time.h \ > +sys/syscall.h sys/termio.h sys/time.h \ > ?sys/times.h sys/types.h sys/un.h sys/utsname.h sys/wait.h pty.h libutil.h \ > ?sys/resource.h netpacket/packet.h sysexits.h bluetooth.h \ > ?bluetooth/bluetooth.h linux/tipc.h spawn.h util.h > diff --git a/configure.in b/configure.in > --- a/configure.in > +++ b/configure.in > @@ -1341,7 +1341,7 @@ > ?sys/audioio.h sys/bsdtty.h sys/epoll.h sys/event.h sys/file.h sys/loadavg.h \ > ?sys/lock.h sys/mkdev.h sys/modem.h \ > ?sys/param.h sys/poll.h sys/select.h sys/socket.h sys/statvfs.h sys/stat.h \ > -sys/termio.h sys/time.h \ > +sys/syscall.h sys/termio.h sys/time.h \ > ?sys/times.h sys/types.h sys/un.h sys/utsname.h sys/wait.h pty.h libutil.h \ > ?sys/resource.h netpacket/packet.h sysexits.h bluetooth.h \ > ?bluetooth/bluetooth.h linux/tipc.h spawn.h util.h) > diff --git a/pyconfig.h.in b/pyconfig.h.in > --- a/pyconfig.h.in > +++ b/pyconfig.h.in > @@ -789,6 +789,9 @@ > ?/* Define to 1 if you have the header file. */ > ?#undef HAVE_SYS_STAT_H > > +/* Define to 1 if you have the header file. */ > +#undef HAVE_SYS_SYSCALL_H > + > ?/* Define to 1 if you have the header file. 
*/ > ?#undef HAVE_SYS_TERMIO_H -- Regards, Benjamin From paul at mcmillan.ws Sun Jan 22 05:09:10 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sat, 21 Jan 2012 20:09:10 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb wrote: > I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash This is nonsense. You have to determine the random seed at startup, and it has to be uniform for the entire life of the process. You can't change it after Python has started. -Paul From steve at pearwood.info Sun Jan 22 05:24:02 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 22 Jan 2012 15:24:02 +1100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> Message-ID: <4F1B8F62.9010503@pearwood.info> Paul McMillan wrote: > On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb wrote: >> I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash > > This is nonsense. You have to determine the random seed at startup, > and it has to be uniform for the entire life of the process. You can't > change it after Python has started. I may have a terminology problem here. I expect that a random seed must change every time it is used, otherwise the pseudorandom number generator using it just returns the same value each time. Should we be talking about a salt rather than a seed? 
-- Steven From stephen at xemacs.org Sun Jan 22 05:59:37 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 22 Jan 2012 13:59:37 +0900 Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about the unused return value. In-Reply-To: References: <20120121232047.22c19409@pitrou.net> <20120121223308.GA13093@sleipnir.bytereef.org> Message-ID: <87obtw5o3q.fsf@uwakimon.sk.tsukuba.ac.jp> Benjamin Peterson writes: > 2012/1/21 Stefan Krah : > > Do you mean (void)write(...)? Many people think this is good practice, > > since it indicates to the reader that the return value is deliberately > > ignored. > > Not doing anything with it seems fairly deliberate to me. It may be deliberate, but then again it may not be. EIBTI applies. From anacrolix at gmail.com Sun Jan 22 07:45:11 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 22 Jan 2012 17:45:11 +1100 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: <4F1B48D0.3060309@canterbury.ac.nz> References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> <4F188DFD.6080401@canterbury.ac.nz> <4F1B48D0.3060309@canterbury.ac.nz> Message-ID: > My concern is that you will end up with vastly more 'yield from's > than places that require locks, so most of them are just noise. > If you bite your nails over whether a lock is needed every time > you see one, they will cause you a lot more anxiety than they > alleviate. Not necessarily. The yield from's follow the blocking control flow, which is surprisingly less common than you might think. Parts of your code naturally arise as not requiring blocking behaviour in the same manner as in Haskell where parts of your code are identified as requiring the IO monad. >> Sometimes there's no alternative, but wherever I can, I avoid thinking, >> especially hard thinking. 
This maxim has served me very well throughout my >> programming career ;-). I'd replace "hard thinking" with "future confusion" here. > There are already well-known techniques for dealing with > concurrency that minimise the amount of hard thinking required. > You devise some well-behaved abstractions, such as queues, and > put all your hard thinking into implementing them. Then you > build the rest of your code around those abstractions. That > way you don't have to rely on crutches such as explicitly > marking everything that might cause a task switch, because > it doesn't matter. It's my firm belief that this isn't sufficient. If this were true, then the Python internals could be improved by replacing the GIL with a series of channels/queues or what have you. State is complex, and without guarantees of immutability, it's just not practical to try to wrap every state object in some protocol to be passed back and forth on queues. From paul at mcmillan.ws Sun Jan 22 08:44:24 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sat, 21 Jan 2012 23:44:24 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1B8F62.9010503@pearwood.info> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: > I may have a terminology problem here. I expect that a random seed must > change every time it is used, otherwise the pseudorandom number generator > using it just returns the same value each time. Should we be talking about a > salt rather than a seed? You should read the several other threads, the bug, as well as the implementation and patch under discussion. Briefly, Python string hashes are calculated once per string, and then used in many places. You can't change the hash value for a string during program execution without breaking everything. The proposed change modifies the starting value of the hash function to include a process-wide randomly generated seed.
This seed is chosen randomly at runtime, but cannot change once chosen. Using the seed changes the final output of the hash to be unpredictable to an attacker, solving the underlying problem. Salt could also be an appropriate term here, but since salt is generally changed on a per-use basis (a single process may use many different salts), seed is more correct, since this value is only chosen once per process. -Paul From greg at krypto.org Sun Jan 22 10:08:13 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 22 Jan 2012 01:08:13 -0800 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fixes issue #8052: The posix subprocess module's close_fds behavior was In-Reply-To: References: Message-ID: On Sat, Jan 21, 2012 at 4:21 PM, Benjamin Peterson wrote: > 2012/1/21 gregory.p.smith : > ... >> +/* Convert ASCII to a positive int, no libc call. no overflow. -1 on error. */ > > Is no libc call important? Yes. strtol() is not on the async signal safe C library function list. > >> +static int _pos_int_from_ascii(char *name) > > To be consistent with the rest of posixmodule.c, "static int" should > be on a different line from the signature. This also applies to all > other function declarations added by this. Python C style as a whole, yes. This file already has a mix of same line vs two line declarations, I added these following the style of the functions immediately surrounding them. Want a style fixup on the whole file? > >> +{ >> +    int num = 0; >> +    while (*name >= '0' && *name <= '9') { >> +        num = num * 10 + (*name - '0'); >> +        ++name; >> +    } >> +    if (*name) >> +        return -1;  /* Non digit found, not a number. */ >> +    return num; >> +} >> + >> + >> +/* Returns 1 if there is a problem with fd_sequence, 0 otherwise. */ >> +static int _sanity_check_python_fd_sequence(PyObject *fd_sequence) >> +{ >> +    Py_ssize_t seq_idx, seq_len = PySequence_Length(fd_sequence); > > PySequence_Length can fail.
It has already been checked not to by the only entry point into the code in this file. > >> +    long prev_fd = -1; >> +    for (seq_idx = 0; seq_idx < seq_len; ++seq_idx) { >> +        PyObject* py_fd = PySequence_Fast_GET_ITEM(fd_sequence, seq_idx); >> +        long iter_fd = PyLong_AsLong(py_fd); >> +        if (iter_fd < 0 || iter_fd < prev_fd || iter_fd > INT_MAX) { >> +            /* Negative, overflow, not a Long, unsorted, too big for a fd. */ >> +            return 1; >> +        } >> +    } >> +    return 0; >> +} >> + >> + >> +/* Is fd found in the sorted Python Sequence? */ >> +static int _is_fd_in_sorted_fd_sequence(int fd, PyObject *fd_sequence) >> +{ >> +    /* Binary search. */ >> +    Py_ssize_t search_min = 0; >> +    Py_ssize_t search_max = PySequence_Length(fd_sequence) - 1; >> +    if (search_max < 0) >> +        return 0; >> +    do { >> +        long middle = (search_min + search_max) / 2; >> +        long middle_fd = PyLong_AsLong( >> +                PySequence_Fast_GET_ITEM(fd_sequence, middle)); > > No check for error? _sanity_check_python_fd_sequence() already checked the entire list to guarantee that there would not be any such error. >> +        if (fd == middle_fd) >> +            return 1; >> +        if (fd > middle_fd) >> +            search_min = middle + 1; >> +        else >> +            search_max = middle - 1; >> +    } while (search_min <= search_max); >> +    return 0; >> +} In general, this is an extension module that is best viewed as a whole, including its existing comments, rather than as a diff. It contains code that will look "odd" in a diff because much of it executes in a path where not much is allowed (post fork, pre exec) and there is no useful way of responding to an error. It therefore pre-checks for any possible errors up front, so that later code that is unable to handle errors cannot possibly fail.
-gps From victor.stinner at haypocalc.com Sun Jan 22 11:11:29 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 22 Jan 2012 11:11:29 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: > This seed is chosen randomly at runtime, but cannot > change once chosen. The hash is used to compare objects: if hash(obj1) != hash(obj2), objects are considered different. So two strings must have the same hash if their value is the same. > Salt could also be an appropriate term here, but since salt is > generally changed on a per-use basis (a single process may use many > different salts), seed is more correct, since this value is only > chosen once per process. We may use a different salt per dictionary. Victor From fuzzyman at voidspace.org.uk Sun Jan 22 14:14:19 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 22 Jan 2012 13:14:19 +0000 Subject: [Python-Dev] python build failed on mac In-Reply-To: References: <5DED60FA213649C2BE5D8E2C161285C5@gmail.com> Message-ID: On 21 Jan 2012, at 20:24, Vijay Majagaonkar wrote: > > On 2012-01-21, at 1:57 PM, Hynek Schlawack wrote: > >> On Friday, 20 January 2012 at 23:40, Vijay Majagaonkar wrote: >>>>> I am trying to build Python 3 on Mac and the build is failing with the following error; can somebody help me with this? >>>> >>>> It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: http://bugs.python.org/issue13241 >>>> >>>> make clean >>>> CC=clang ./configure && make -s >>> >>> Thanks for the help, but the above command needs to be run a different way: >>> >>> ./configure CC=clang >>> make >> >> >> I'm not sure why you think it "needs" to be that way, but it's fine by me as both ways work fine.
> > I am not sure, that was just try and worked for me, with first option suggested by you was throwing same compile error then I tried with this that worked :) The problems compiling Python 3 on the Mac with XCode 4.1 have been reported and discussed here: http://bugs.python.org/issue13241 This invocation worked for me: ./configure CC=gcc-4.2 --prefix=/dev/null --with-pydebug All the best, Michael Foord > >> >>> this allowed me to build the code but when ran test I got following error message >>> >>> [363/364/3] test_io >>> python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12) >>> *** error: can't allocate region >>> *** set a breakpoint in malloc_error_break to debug >>> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) >>> *** error: can't allocate region >>> *** set a breakpoint in malloc_error_break to debug >>> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12) >>> *** error: can't allocate region >>> *** set a breakpoint in malloc_error_break to debug >>> >>> I am using Mac OS-X 10.7.2 and insatlled Xcode 4.2.1 >> >> Please ensure there aren't any gcc-created objects left by running "make distclean" first. > > I have tried this option too but still result is same, I have attached test result if that will helps and I will like to work on this if you give me some guideline to look into this issue > > > Thanks for the help > ;)_______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From lukasz at langa.pl Sun Jan 22 18:43:52 2012 From: lukasz at langa.pl (Łukasz Langa) Date: Sun, 22 Jan 2012 18:43:52 +0100 Subject: [Python-Dev] python build failed on mac In-Reply-To: References: <5DED60FA213649C2BE5D8E2C161285C5@gmail.com> Message-ID: <8FFB3BE9-E68A-4879-8967-3C103B2D61E4@langa.pl> Message written by Michael Foord on 22 Jan 2012 at 14:14: > ./configure CC=gcc-4.2 --prefix=/dev/null --with-pydebug Why the phony prefix? -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o. Please consider the environment before printing out this e-mail. From fuzzyman at voidspace.org.uk Sun Jan 22 19:17:03 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 22 Jan 2012 18:17:03 +0000 Subject: [Python-Dev] python build failed on mac In-Reply-To: <8FFB3BE9-E68A-4879-8967-3C103B2D61E4@langa.pl> References: <5DED60FA213649C2BE5D8E2C161285C5@gmail.com> <8FFB3BE9-E68A-4879-8967-3C103B2D61E4@langa.pl> Message-ID: <9E5A02D9-9E7B-421C-AA4F-327927FD365B@voidspace.org.uk> On 22 Jan 2012, at 17:43, Łukasz Langa wrote: > Message written by Michael Foord on 22 Jan 2012 at 14:14: > >> ./configure CC=gcc-4.2 --prefix=/dev/null --with-pydebug > > Why the phony prefix? Heh, it's what I've always done - I think copied from other developers. The dev guide suggests it: http://docs.python.org/devguide/setup.html#unix There is normally no need to install your built copy of Python! The interpreter will realize where it is being run from and thus use the files found in the working copy. If you are worried you might accidentally install your working copy build, you can add --prefix=/dev/null to the configuration step. Not that this is particularly a worry for me...
All the best, Michael > > -- > Best regards, > Łukasz Langa > Senior Systems Architecture Engineer > > IT Infrastructure Department > Grupa Allegro Sp. z o.o. > > Please consider the environment before printing out this e-mail. > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From regebro at gmail.com Sun Jan 22 20:53:41 2012 From: regebro at gmail.com (Lennart Regebro) Date: Sun, 22 Jan 2012 20:53:41 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: On Sun, Jan 22, 2012 at 11:11, Victor Stinner wrote: >> This seed is chosen randomly at runtime, but cannot >> change once chosen. > > The hash is used to compare objects: if hash(obj1) != hash(obj2), > objects are considered different. So two strings must have the same > hash if their value is the same. > >> Salt could also be an appropriate term here, but since salt is >> generally changed on a per-use basis (a single process may use many >> different salts), seed is more correct, since this value is only >> chosen once per process. > > We may use a different salt per dictionary. Can we do that? I was thinking of ways to not raise errors when we get over a collision count, but instead somehow change the way the dictionary behaves when we get over the collision count, but I couldn't come up with something. Somehow adding a salt would be one possibility. But I don't see how it's doable except for the string-keys only case mentioned before. But I might just be lacking imagination.
:-) //Lennart From solipsis at pitrou.net Sun Jan 22 21:13:32 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 22 Jan 2012 21:13:32 +0100 Subject: [Python-Dev] Counting collisions for the win References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: <20120122211332.6e95c6a6@pitrou.net> I think this thread is approaching the recursion limit. Be careful not to blow the stack :) Regards Antoine. On Sun, 22 Jan 2012 20:53:41 +0100 Lennart Regebro wrote: > On Sun, Jan 22, 2012 at 11:11, Victor Stinner > wrote: > >> This seed is chosen randomly at runtime, but cannot > >> change once chosen. > > > > The hash is used to compare objects: if hash(obj1) != hash(obj2), > > objects are considered different. So two strings must have the same > > hash if their value is the same. > > > >> Salt could also be an appropriate term here, but since salt is > >> generally changed on a per-use basis (a single process may use many > >> different salts), seed is more correct, since this value is only > >> chosen once per process. > > > > We may use a different salt per dictionary. > > Can we do that? I was thinking of ways to not raise errors when we get > over a collision count, but instead somehow change the way the > dictionary behaves when we get over the collision count, but I > couldn't come up with something. Somehow adding a salt would be one > possibility. But I don't see how it's doable except for the > string-keys only case mentioned before. > > But I might just be lacking imagination. :-) > > //Lennart From paul at mcmillan.ws Mon Jan 23 06:02:46 2012 From: paul at mcmillan.ws (Paul McMillan) Date: Sun, 22 Jan 2012 21:02:46 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: > We may use a different salt per dictionary. 
If we're willing to re-hash everything on a per-dictionary basis. That doesn't seem reasonable given our existing usage. From regebro at gmail.com Mon Jan 23 06:49:16 2012 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 23 Jan 2012 06:49:16 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: On Mon, Jan 23, 2012 at 06:02, Paul McMillan wrote: >> We may use a different salt per dictionary. > > If we're willing to re-hash everything on a per-dictionary basis. That > doesn't seem reasonable given our existing usage. Well, if we get crazy amounts of collisions, re-hashing with a new salt to get rid of those collisions seems quite reasonable to me... //Lennart From stephen at xemacs.org Mon Jan 23 07:15:54 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 23 Jan 2012 15:15:54 +0900 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: <87y5sz3pwl.fsf@uwakimon.sk.tsukuba.ac.jp> Lennart Regebro writes: > On Mon, Jan 23, 2012 at 06:02, Paul McMillan wrote: > >> We may use a different salt per dictionary. > > > > If we're willing to re-hash everything on a per-dictionary basis. That > > doesn't seem reasonable given our existing usage. > > Well, if we get crazy amounts of collisions, re-hashing with a new > salt to get rid of those collisions seems quite reasonable to me... But doesn't the whole idea of a hash table fall flat on its face if you need to worry about crazy amounts of collisions (outside of deliberate attacks)? 
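[Editor's note: Stephen's point — that a hash table's contract collapses under pathological collisions — is easy to quantify with a toy open-addressing table. This sketch uses linear probing rather than CPython's perturb scheme, purely to show the quadratic blow-up.]

```python
def probes_to_insert_all(hashes, size=64):
    """Insert each hash into a toy open-addressing table with linear
    probing, and return the total number of slot inspections needed."""
    table = [None] * size
    total = 0
    for h in hashes:
        i = h % size
        while True:
            total += 1
            if table[i] is None:   # found a free slot
                table[i] = h
                break
            i = (i + 1) % size     # linear probe to the next slot

    return total

# 32 well-spread hashes: one probe each.
spread = probes_to_insert_all(range(32))        # 32 probes total
# 32 hashes that all collide: the k-th insert walks k slots,
# so the total is 1 + 2 + ... + 32 = 528 probes -- O(n**2) work.
clustered = probes_to_insert_all([0] * 32)
```

With well-spread hashes the work is linear in the number of keys; once everything collides, total insertion cost grows quadratically — which is exactly the denial-of-service the thread is trying to prevent.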
From timothy.c.delaney at gmail.com Mon Jan 23 07:41:51 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 23 Jan 2012 17:41:51 +1100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: On 23 January 2012 16:49, Lennart Regebro wrote: > On Mon, Jan 23, 2012 at 06:02, Paul McMillan wrote: > >> We may use a different salt per dictionary. > > > > If we're willing to re-hash everything on a per-dictionary basis. That > > doesn't seem reasonable given our existing usage. > > Well, if we get crazy amounts of collisions, re-hashing with a new > salt to get rid of those collisions seems quite reasonable to me... Actually, this looks like it has the seed of a solution in it. I haven't scrutinised the following beyond "it sounds like it could work" - it could well contain nasty flaws. Assumption: We only get an excessive number of collisions during an attack (directly or indirectly). Assumption: Introducing a salt into hashes will change those hashes sufficiently to mitigate the attack (all discussion of randomising hashes makes this assumption). 1. Keep the current hashing (for all dictionaries) i.e. just using hash(key). 2. Count collisions. 3. If any key hits X collisions change that dictionary to use a random salt for hashes (at least for str and unicode keys). This salt would be remembered for the dictionary. Consequence: The dictionary would need to be rebuilt when an attack was detected. Consequence: Hash caching would no longer occur for this dictionary, making most operations more expensive. Consequence: Anything relying on the iteration order of a dictionary which has suffered excessive conflicts would fail. 4. (Optional) in 3.3, provide a way to get a dictionary with random salt (i.e. not wait for attack). Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... 
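[Editor's note: Tim's numbered steps could be sketched in pure Python roughly as below. This is illustrative only — the class name, the chaining layout, and the salting via tuple hashing are all hypothetical; a real fix would live in C inside dictobject.c.]

```python
import random

class FallbackDict:
    """Toy dict (separate chaining) that uses plain hash() until one
    bucket exceeds MAX_COLLISIONS, then rebuilds itself with a random
    per-instance salt, as in Tim's steps 1-3."""
    MAX_COLLISIONS = 20

    def __init__(self, nbuckets=8):
        self.salt = None                      # no salt until an attack is seen
        self.buckets = [[] for _ in range(nbuckets)]
        self.n = 0

    def _index(self, key):
        # Salted path mixes the salt in via tuple hashing (a stand-in
        # for a real salted hash function).
        h = hash(key) if self.salt is None else hash((self.salt, key))
        return h % len(self.buckets)

    def __setitem__(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                      # update in place
                bucket[i] = (key, value)
                return
        if len(bucket) >= self.MAX_COLLISIONS and self.salt is None:
            self._rebuild_with_salt()         # step 3: attack detected
            self[key] = value
            return
        bucket.append((key, value))
        self.n += 1

    def __getitem__(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _rebuild_with_salt(self):
        items = [kv for b in self.buckets for kv in b]
        self.salt = random.getrandbits(64)
        self.buckets = [[] for _ in self.buckets]
        self.n = 0
        for k, v in items:                    # re-insert under the salted hash
            self[k] = v
```

The consequences Tim lists fall straight out of the sketch: the rebuild is a one-time rehash of every key, and after it both iteration order and any cached hashes are no longer meaningful for this dictionary.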
URL: From pydev at sievertsen.de Mon Jan 23 09:53:06 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Mon, 23 Jan 2012 09:53:06 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> Message-ID: <4F1D1FF2.4030701@sievertsen.de> Hello, I'd still prefer to see a randomized hash()-function (at least for 3.3). But to protect against the attacks it would be sufficient to use randomization for collision resolution in dicts (and sets). What if we use a second (randomized) hash-function in case there are many collisions in ONE lookup. This hash-function is used only for collision resolution and is not cached. The benefits: * protection against the known attacks * hash(X) stays stable and the same * dict order is only changed when there are many collisions * doctests will not break * enhanced collision resolution * RNG doesn't have to be initialized in smaller programs * nearly no slowdown of most dicts * second hash-function is only used for keys with higher collision-rate * lower probability to leak secrets * possibility to use different secrets for each dict The drawback: * need to add a second hash-function * slower than using one hash-function only, when > 20 collisions * need to add this to container-types? (if used for py3.3) * need to expose this to the user? (if used for py3.3) * works only for datatypes with this new function * possible to implement without breaking ABI? The following code is meant for explanation purpose only: for(perturb = hash; ; perturb >>= 5) { i = (i << 2) + i + perturb + 1; if((collisions++) == 20) { // perturb is already zero after 13 rounds. // 20 collisions are rare. // you can add && (ma_mask > 256) to make 100% sure // that it's not used for smaller dicts. 
if(Py_TYPE(key)->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH) { // If the type has a randomized hash, use it now for the lookup i = perturb = PyObject_RandomizedHash(key); } ..... If I got this right, we could add a new function "tp_randomized_hash" to the 3.3 release. But can we also add functions in older releases, without breaking the ABI? If not, can we implement this somehow using a flag? FOR OLDER RELEASES < 3.3: Py_hash_t PyObject_RandomizedHash(PyVarObject *v) { PyTypeObject *tp = Py_TYPE(v); if(! (tp->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH)) return -1; global_flags_somewhere->USE_RANDOMIZED_HASH = 1; return (*tp->tp_hash)(v); } .... and in unicodeobject.c: (and wherever we need randomization) static Py_hash_t unicode_hash(PyUnicodeObject *self) { Py_ssize_t len; Py_UNICODE *p; Py_hash_t x; Py_hash_t prefix=0; Py_hash_t suffix=0; if(global_flags_somewhere->USE_RANDOMIZED_HASH) { global_flags_somewhere->USE_RANDOMIZED_HASH = 0; initialize_rng_if_not_already_done_and_return_seed(&prefix, &suffix); ..... (and don't cache in this case) ..... It's ugly, but if I understand this correctly, the GIL will protect us against race conditions, right? Hello, internals experts: Would this work, or is there a better way to do this without breaking the ABI? Frank From lukasz at langa.pl Mon Jan 23 15:49:11 2012 From: lukasz at langa.pl (Łukasz Langa) Date: Mon, 23 Jan 2012 15:49:11 +0100 Subject: [Python-Dev] exception chaining In-Reply-To: <4F19C8F8.5000102@stoneleaf.us> References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> Message-ID: <3289A837-2555-4C5F-8F3D-E1BBF7889B6C@langa.pl> Message written by Ethan Furman on 20 Jan 2012 at
21:05: > The problem I have with 'raise x from None' is it puts 'from None' clear at the end of the line from None raise SomeOtherError('etc.') Better yet: with nocontext(): raise SomeOtherError('etc.') But that's python-ideas territory ;) -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o. Please consider the environment before printing out this e-mail. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 1898 bytes Desc: not available URL: From v+python at g.nevcal.com Mon Jan 23 19:25:43 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 23 Jan 2012 10:25:43 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1D1FF2.4030701@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> <4F1D1FF2.4030701@sievertsen.de> Message-ID: <4F1DA627.2020407@g.nevcal.com> On 1/23/2012 12:53 AM, Frank Sievertsen wrote: > > What if we use a second (randomized) hash-function in case there > are many collisions in ONE lookup. This hash-function is used only > for collision resolution and is not cached. So this sounds like SafeDict, but putting it under the covers and automatically converting from dict to SafeDict after a collision threshold has been reached. Let's call it fallback-dict. Compared to SafeDict as a programmer tool, fallback-dict has these benefits: * No need to change program (or library) source to respond to an attack * Order is preserved until the collision threshold has been reached * Performance is preserved until the collision threshold has been reached and costs: * converting the dict from one hash to the other by rehashing all the keys.
Compared to always randomizing the hash, fallback-dict has these benefits: * hash (and perfomance) is deterministic: programs running on the same data set will have the same performance characteristic, unless the collision threshold is reached for that data set. * lower probability to leak secrets, because each attacked set/dict can have its own secret, randomized hash seed * patch would not need to include RNG initialization during startup, lowering the impact on short-running programs. What is not clear is how much SafeDict degrades performance when it is used; non-cached hashes will definitely have an impact. I'm not sure whether an implementation of fallback-dict in C code, would be significantly faster than the implementation of SafeDict in Python, to know whether comparing the performance of SafeDict and dict would be representative of the two stages of fallback-dict performance, but certainly the performance cost of SafeDict would be an upper bound on the performance cost of fallback-dict, once conversion takes place, but would not measure the conversion cost. The performance of fallback-dict does have to be significantly better than the performance of dict with collisions to be beneficial, but if the conversion cost is significant, triggering conversions could be an attack vector. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pydev at sievertsen.de Mon Jan 23 19:58:25 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Mon, 23 Jan 2012 19:58:25 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1DA627.2020407@g.nevcal.com> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> <4F1D1FF2.4030701@sievertsen.de> <4F1DA627.2020407@g.nevcal.com> Message-ID: <4F1DADD1.4070907@sievertsen.de> On 23.01.2012 19:25, Glenn Linderman wrote: > So this sounds like SafeDict, but putting it under the covers and > automatically converting from dict to SafeDict after a collision > threshold has been reached. Let's call it fallback-dict. > > and costs: > > * converting the dict from one hash to the other by rehashing all the > keys. That's not exactly what it does: it calls the randomized hash-function only for those keys that didn't find a free slot after 20 collisions. And it uses this value only for the further collision resolution. So the value of hash() is used for the first 20 slots, randomized_hash() is used after that. 1st try: slot[i = perturb = hash]; 2nd try: slot[i=i * 5 + 1 + (perturb >>= 5)] 3rd try: slot[i=i * 5 + 1 + (perturb >>= 5)] .... 20th try: slot[i= i * 5 + 1 + (perturb >>= 5)] 21st try: slot[i= perturb = randomized_hash(key)] <---- HERE 22nd try: slot[i= i * 5 + 1 + (perturb >>= 5)] This is also why there is no conversion needed. It's a per-key/per-lookup rule. Frank -------------- next part -------------- An HTML attachment was scrubbed...
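[Editor's note: Frank's per-key fallback can be sketched as a probe-index generator. This is a rough Python sketch of the idea, not CPython's code: `randomized_hash` is a hypothetical stand-in for the proposed second, uncached hash, and the perturb arithmetic only approximates dictobject.c's.]

```python
import random
from itertools import islice

_SALT = random.getrandbits(64)   # per-process value, for the stand-in below

def randomized_hash(key):
    """Hypothetical stand-in for the proposed second, uncached hash;
    here we simply mix a process-wide salt in via tuple hashing."""
    return hash((_SALT, key))

def probe_indices(key, mask, switch_at=20):
    """Yield slot indices the way Frank describes: perturb-based probing
    seeded from the ordinary hash first, then -- after `switch_at`
    tries -- reseed both i and perturb from randomized_hash(key)."""
    i = perturb = hash(key)
    tries = 0
    while True:
        yield i & mask
        tries += 1
        if tries == switch_at:
            # 21st try: switch to the randomized hash for this key only.
            i = perturb = randomized_hash(key)
        else:
            i = i * 5 + 1 + perturb
            perturb >>= 5

# First 40 probe slots for one key in a table of 8 slots (mask 7).
seq = list(islice(probe_indices("example", 7), 40))
```

Because the switch is keyed off the probe count inside a single lookup, nothing stored in the table changes — exactly why Frank says no conversion is needed.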
URL: From g.brandl at gmx.net Mon Jan 23 21:18:33 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 23 Jan 2012 21:18:33 +0100 Subject: [Python-Dev] exception chaining In-Reply-To: <3289A837-2555-4C5F-8F3D-E1BBF7889B6C@langa.pl> References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> <3289A837-2555-4C5F-8F3D-E1BBF7889B6C@langa.pl> Message-ID: On 23.01.2012 15:49, Łukasz Langa wrote: [graphics] > Please consider the environment before printing out this e-mail. Oh please?! Georg From v+python at g.nevcal.com Mon Jan 23 21:15:36 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 23 Jan 2012 12:15:36 -0800 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1DADD1.4070907@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> <4F1D1FF2.4030701@sievertsen.de> <4F1DA627.2020407@g.nevcal.com> <4F1DADD1.4070907@sievertsen.de> Message-ID: <4F1DBFE8.4090607@g.nevcal.com> On 1/23/2012 10:58 AM, Frank Sievertsen wrote: > > > On 23.01.2012 19:25, Glenn Linderman wrote: >> So this sounds like SafeDict, but putting it under the covers and >> automatically converting from dict to SafeDict after a collision >> threshold has been reached. Let's call it fallback-dict. >> >> and costs: >> >> * converting the dict from one hash to the other by rehashing all the >> keys. > > That's not exactly what it does: it calls the randomized hash-function > only for those > keys that didn't find a free slot after 20 collisions. And it > uses this value only for > the further collision resolution. > > So the value of hash() is used for the first 20 slots, > randomized_hash() is used > after that. > > 1st try: slot[i = perturb = hash]; > 2nd try: slot[i=i * 5 + 1 + (perturb >>= 5)] > 3rd try: slot[i=i * 5 + 1 + (perturb >>= 5)] > ....
> 20th try: slot[i= i * 5 + 1 + (perturb >>= 5)] > 21st try: slot[i= perturb = randomized_hash(key)] <---- HERE > 22nd try: slot[i= i * 5 + 1 + (perturb >>= 5)] > > This is also why there is no conversion needed. It's a > per-key/per-lookup rule. > > Frank Interesting idea, and I see it would avoid conversions. What happens if the data are also removed from the hash? So you enter 20 colliding keys, then 20 more that get randomized, then delete 18 of the first 20. Can you still find the second 20 keys? Takes two sets of probes, somehow? -------------- next part -------------- An HTML attachment was scrubbed... URL: From lukasz at langa.pl Mon Jan 23 21:32:22 2012 From: lukasz at langa.pl (Łukasz Langa) Date: Mon, 23 Jan 2012 21:32:22 +0100 Subject: [Python-Dev] exception chaining In-Reply-To: References: <4F199FE5.9080005@stoneleaf.us> <4F19C8F8.5000102@stoneleaf.us> <3289A837-2555-4C5F-8F3D-E1BBF7889B6C@langa.pl> Message-ID: <67731F80-935F-40EA-9FA1-7AA9AEA32FCF@langa.pl> Message written by Georg Brandl on 23 Jan 2012 at 21:18: > On 23.01.2012 15:49, Łukasz Langa wrote: > > [graphics] >> Please consider the environment before printing out this e-mail. > > Oh please?! Excuse me. Corpo speak! (at least it's short) -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o. From mal at egenix.com Mon Jan 23 22:55:47 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 23 Jan 2012 22:55:47 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1D1FF2.4030701@sievertsen.de> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> <4F1D1FF2.4030701@sievertsen.de> Message-ID: <4F1DD763.80800@egenix.com> Frank Sievertsen wrote: > Hello, > > I'd still prefer to see a randomized hash()-function (at least for 3.3).
> > But to protect against the attacks it would be sufficient to use > randomization for collision resolution in dicts (and sets). > > What if we use a second (randomized) hash-function in case there > are many collisions in ONE lookup. This hash-function is used only > for collision resolution and is not cached. This sounds a lot like what I'm referring to as universal hash function in the discussion on the ticket: http://bugs.python.org/issue13703#msg150724 http://bugs.python.org/issue13703#msg150795 http://bugs.python.org/issue13703#msg151813 However, I don't like the term "random" in there. It's better to make the approach deterministic to avoid issues with not being able to easily reproduce Python application runs for debugging purposes. If you find that the data is manipulated, simply incrementing the universal hash parameter and rehashing the dict with that parameter should be enough to solve the issue (if not, which is highly unlikely, the dict will simply reapply the fix). No randomness needed. BTW: I attached a demo script to the ticket which demonstrates both types of collisions using integers. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 23 2012) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From janzert at janzert.com Mon Jan 23 23:38:47 2012 From: janzert at janzert.com (Janzert) Date: Mon, 23 Jan 2012 17:38:47 -0500 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1DA627.2020407@g.nevcal.com> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> <4F1D1FF2.4030701@sievertsen.de> <4F1DA627.2020407@g.nevcal.com> Message-ID: On 1/23/2012 1:25 PM, Glenn Linderman wrote: > On 1/23/2012 12:53 AM, Frank Sievertsen wrote: >> >> What if we use a second (randomized) hash-function in case there >> are many collisions in ONE lookup. This hash-function is used only >> for collision resolution and is not cached. > > So this sounds like SafeDict, but putting it under the covers and > automatically converting from dict to SafeDict after a collision > threshold has been reached. Let's call it fallback-dict. > If you're going to essentially switch data structures dynamically anyway, why not just switch to something that doesn't have n**2 worst-case performance? Janzert From frank at sievertsen.de Mon Jan 23 21:43:11 2012 From: frank at sievertsen.de (Frank Sievertsen) Date: Mon, 23 Jan 2012 21:43:11 +0100 Subject: [Python-Dev] Counting collisions for the win In-Reply-To: <4F1DBFE8.4090607@g.nevcal.com> References: <4F193511.5000102@v.loewis.de> <20120120081030.75529cf5@resist.wooz.org> <4F1B8F62.9010503@pearwood.info> <4F1D1FF2.4030701@sievertsen.de> <4F1DA627.2020407@g.nevcal.com> <4F1DADD1.4070907@sievertsen.de> <4F1DBFE8.4090607@g.nevcal.com> Message-ID: <4F1DC65F.9080300@sievertsen.de> > Interesting idea, and I see it would avoid conversions. What happens > if the data are also removed from the hash? So you enter 20 > colliding keys, then 20 more that get randomized, then delete the 18 > of the first 20. Can you still find the second 20 keys?
Takes two > sets of probes, somehow? > That's no problem, because the dict doesn't really free a slot, it replaces the values with dummy values. These places are later reused for new values or the whole dict is recreated and resized. Frank From brett at python.org Tue Jan 24 16:42:17 2012 From: brett at python.org (Brett Cannon) Date: Tue, 24 Jan 2012 10:42:17 -0500 Subject: [Python-Dev] Sprinting at PyCon US Message-ID: I went ahead and signed us up as usual: https://us.pycon.org/2012/community/sprints/projects/ . I listed myself as the leader, but I will only be at the sprints one full day and whatever part of Tuesday I can fit in before flying out to Toronto (which is probably not much thanks to the timezone difference). So if someone wants to be the official leader who will be there longer feel free to take me off and put yourself in (and you don't need to ask me beforehand). -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Tue Jan 24 19:52:53 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 24 Jan 2012 19:52:53 +0100 Subject: [Python-Dev] devguide: Use -j0 to maximimze parallel execution. In-Reply-To: References: Message-ID: On 24.01.2012 18:58, brett.cannon wrote: > http://hg.python.org/devguide/rev/a34e4a6b89dc > changeset: 489:a34e4a6b89dc > user: Brett Cannon > date: Tue Jan 24 12:58:01 2012 -0500 > summary: > Use -j0 to maximimze parallel execution. > > files: > runtests.rst | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > > diff --git a/runtests.rst b/runtests.rst > --- a/runtests.rst > +++ b/runtests.rst > @@ -41,7 +41,7 @@ > If you have a multi-core or multi-CPU machine, you can enable parallel testing > using several Python processes so as to speed up things:: > > - ./python -m test -j2 > + ./python -m test -j0 That only works on 3.3 though...
Georg From brett at python.org Tue Jan 24 20:03:35 2012 From: brett at python.org (Brett Cannon) Date: Tue, 24 Jan 2012 14:03:35 -0500 Subject: [Python-Dev] devguide: Use -j0 to maximimze parallel execution. In-Reply-To: References: Message-ID: On Tue, Jan 24, 2012 at 13:52, Georg Brandl wrote: > On 24.01.2012 18:58, brett.cannon wrote: > > http://hg.python.org/devguide/rev/a34e4a6b89dc > > changeset: 489:a34e4a6b89dc > > user: Brett Cannon > > date: Tue Jan 24 12:58:01 2012 -0500 > > summary: > > Use -j0 to maximimze parallel execution. > > > > files: > > runtests.rst | 2 +- > > 1 files changed, 1 insertions(+), 1 deletions(-) > > > > > > diff --git a/runtests.rst b/runtests.rst > > --- a/runtests.rst > > +++ b/runtests.rst > > @@ -41,7 +41,7 @@ > > If you have a multi-core or multi-CPU machine, you can enable parallel > testing > > using several Python processes so as to speed up things:: > > > > - ./python -m test -j2 > > + ./python -m test -j0 > > That only works on 3.3 though... > Bugger. I will add a note. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexis at notmyidea.org Tue Jan 24 21:54:13 2012 From: alexis at notmyidea.org (Alexis Métaireau) Date: Tue, 24 Jan 2012 21:54:13 +0100 Subject: [Python-Dev] Packaging and setuptools compatibility Message-ID: <4F1F1A75.1080104@notmyidea.org> Hi folks, I have had this in mind for a long time, but I didn't talk about it on this list; I was only writing on distutils@ or another list we had for distutils2 (the fellowship of packaging). AFAIK, we're almost good about packaging in python 3.3, but there is still something that keeps bugging me. What we've done (I worked especially on this bit) is to provide a compatibility layer for the distributions packaged using setuptools/distribute.
What it does, basically, is to install things using setuptools or distribute (the one present with the system) and then convert the metadata to the new one described in PEP 345. A few things are not handled yet, regarding setuptools: entrypoints and namespaces. I would like to especially talk about entrypoints here. Entrypoints basically are a plugin system. They store information in the metadata and then retrieve it when needed. The problem with this, as with anything that tries to get information from metadata, is that we need to parse all the metadata for all the installed distributions (say O(N)). I'm wondering if we should support that (a way to have plugins) in the new packaging thing, or not. If not, this means we should come up with another solution to support this outside of packaging (maybe in distribute). If yes, then we should design it, and probably make it a sub-part of packaging. What are your opinions on that? Should we do it or not? And if yes, what's the way to go? -- Alexis From glyph at twistedmatrix.com Tue Jan 24 22:58:52 2012 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 24 Jan 2012 13:58:52 -0800 Subject: [Python-Dev] Packaging and setuptools compatibility In-Reply-To: <4F1F1A75.1080104@notmyidea.org> References: <4F1F1A75.1080104@notmyidea.org> Message-ID: <32B530FD-A091-4EAC-A687-40142436C64A@twistedmatrix.com> On Jan 24, 2012, at 12:54 PM, Alexis Métaireau wrote: > I'm wondering if we should support that (a way to have plugins) in the new packaging thing, or not. If not, this means we should come up with another solution to support this outside of packaging (maybe in distribute). If yes, then we should design it, and probably make it a sub-part of packaging. First, my interest: Twisted has its own plugin system. I would like this to continue to work in the future. I do not believe that packaging should support plugins directly. Run-time metadata is not the packaging system's job.
However, the packaging system does need to provide some guarantees about how to install and update data at installation (and post-installation time) so that databases of plugin metadata may be kept up to date. Basically, packaging's job is constructing explicitly declared parallels between your development environment and your deployment environment. Some such databases are outside of Python entirely (for example, you might think of /etc/init.d as such a database), so even if you don't care about the future of Twisted's weirdo plugin system, it would be nice for this to be supported. In other words, packaging should have a meta-plugin system: a way for a plugin system to register itself and provide an API for things to install their metadata, and a way to query the packaging module about the way that a Python package is installed so that it can put things near to it in an appropriate way. (Keep in mind that "near to it" may mean in a filesystem directory, or a zip file, or stuffed inside a bundle or executable.) In my design of Twisted's plugin system, we used PEP 302 as this sort of meta-standard, and (modulo certain bugs in easy_install and pip, most of which are apparently getting fixed in pip pretty soon) it worked out reasonably well. The big missing pieces are post-install and post-uninstall hooks. If we had those, translating to "native" packages for Twisted (and for things that use it) could be made totally automatic. -glyph -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nadeem.vawda at gmail.com Wed Jan 25 05:05:19 2012 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Wed, 25 Jan 2012 06:05:19 +0200 Subject: [Python-Dev] Status of Mac buildbots Message-ID: Hi all, I've noticed that most of the Mac buildbots have been offline for a while: * http://www.python.org/dev/buildbot/all/buildslaves/parc-snowleopard-1 * http://www.python.org/dev/buildbot/all/buildslaves/parc-tiger-1 * http://www.python.org/dev/buildbot/all/buildslaves/parc-leopard-1 Does anyone know what the status of these bots is? Are they permanently down, or just temporarily inaccessible? Cheers, Nadeem From greg at krypto.org Wed Jan 25 06:24:31 2012 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 24 Jan 2012 21:24:31 -0800 Subject: [Python-Dev] Counting collisions w/ no need for a fatal exception Message-ID: On Sun, Jan 22, 2012 at 10:41 PM, Tim Delaney wrote: > On 23 January 2012 16:49, Lennart Regebro wrote: >> >> On Mon, Jan 23, 2012 at 06:02, Paul McMillan wrote: >> >> We may use a different salt per dictionary. >> > >> > If we're willing to re-hash everything on a per-dictionary basis. That >> > doesn't seem reasonable given our existing usage. >> >> Well, if we get crazy amounts of collisions, re-hashing with a new >> salt to get rid of those collisions seems quite reasonable to me... > > > Actually, this looks like it has the seed of a solution in it. I haven't > scrutinised the following beyond "it sounds like it could work" - it could > well contain nasty flaws. > > Assumption: We only get an excessive number of collisions during an attack > (directly or indirectly). > Assumption: Introducing a salt into hashes will change those hashes > sufficiently to mitigate the attack (all discussion of randomising hashes > makes this assumption). > > 1. Keep the current hashing (for all dictionaries) i.e. just using > hash(key). > > 2. Count collisions. > > 3. 
If any key hits X collisions change that dictionary to use a random salt > for hashes (at least for str and unicode keys). This salt would be > remembered for the dictionary. > > Consequence: The dictionary would need to be rebuilt when an attack was > detected. > Consequence: Hash caching would no longer occur for this dictionary, making > most operations more expensive. > Consequence: Anything relying on the iteration order of a dictionary which > has suffered excessive conflicts would fail. +1 I like this! The dictionary would still be O(n) but the constant cost in front of that just went up. When you are dealing with keys coming in from outside of the process, those are unlikely to already have any hash values so the constant cost at insertion time has really not changed at all because they would need hashing anyways. Their cost at non-iteration lookup time will be a constant factor greater but I do not see that as being a problem given that known keys being looked up in a This approach also allows for the dictionary hashing mode switch to occur after a lower number of collisions than the previous investigations into raising a MemoryError or similar were asking for (because they wanted to avoid false hard failures). It prevents that case from breaking in favor of a brief performance hiccup. I would *combine* this with a per process/interpreter-instance seed in 3.3 and later for added impact (less need for this code path to ever be triggered). For the purposes of backporting as a security fix, that part would be disabled by default but #1-3 would be enabled by default. Question A: Does the dictionary get rebuilt -again- with a new dict-salt if a large number of collisions occurs after a dict-salt has already been established? Question B: Is there a size of dictionary in which we refuse to rebuild & rehash it because it would simply be too costly? obviously if we lack the ram to malloc a new table, when else? ever? 
Suggestion: Would there be any benefit to making the number of collisions threshold on when to rebuild & rehash a log function of the dictionary's current size rather than a constant for all dicts? > > 4. (Optional) in 3.3, provide a way to get a dictionary with random salt > (i.e. not wait for attack). I don't like #4 as a documented public API as I'm not sure how well that'd play with other VMs (I suppose they could ignore it) but it would be useful for dict implementation testing purposes and easier studying of the behavior. If this is added it should be a method on the dict such as ._set_hash_salt() or something and for testing purposes it would be good to allow a dictionary to be queried to see if they are using their own salt or not (perhaps just ._get_hash_salt() returning non 0 means they are?) -gps From anacrolix at gmail.com Wed Jan 25 06:32:43 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 25 Jan 2012 16:32:43 +1100 Subject: [Python-Dev] io module types Message-ID: Can calls to the C types in the io module be made into module lookups more akin to how it would work were it written in Python? The C implementation for io_open invokes the C type objects for FileIO, and friends, instead of looking them up on the io or _io modules. This makes it difficult to subclass and/or modify the behaviour of those classes from Python. http://hg.python.org/cpython/file/0bec943f6778/Modules/_io/_iomodule.c#l413 From anacrolix at gmail.com Wed Jan 25 08:35:30 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 25 Jan 2012 18:35:30 +1100 Subject: [Python-Dev] Coroutines and PEP 380 In-Reply-To: References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> <4F188DFD.6080401@canterbury.ac.nz> <4F1B48D0.3060309@canterbury.ac.nz> Message-ID: After much consideration, and playing with PEP380, I've changed my stance on this. 
Full blown coroutines are the proper way forward. greenlet doesn't cut it because the Python interpreter isn't aware of the context switches. Profiling, debugging and tracebacks are completely broken by this. Stackless would need to be merged, and that's clearly not going to happen. I built a basic scheduler and had a go at "enhancing" the stdlib using PEP380, here are some examples making use of this style: https://bitbucket.org/anacrolix/green380/src/8f7fdc20a8ce/examples After realising it was a dead-end, I read up on Mark's ideas, there's some really good stuff in there: http://www.dcs.gla.ac.uk/~marks/ http://hotpy.blogspot.com/ If someone can explain what's stopping real coroutines being into Python (3.3), that would be great. From matteo at naufraghi.net Wed Jan 25 10:41:20 2012 From: matteo at naufraghi.net (Matteo Bertini) Date: Wed, 25 Jan 2012 10:41:20 +0100 Subject: [Python-Dev] distutils 'depends' management Message-ID: Hello, I've noted that distutils manages depends in a way I cannot understand. Suppose I have a minimal setup.py: from distutils.core import setup, Extension setup( name='foo', version='1.0', ext_modules=[ Extension('foo', sources=['foo.c'], depends=['fop.conf'] # <---- note the typo foo->fop ), ] ) Now setup.py will rebuild all every time, this is because the policy of newer_group in build_extension is to consider 'newer' any missing file. http://bit.ly/build_ext_471 def build_extension(self, ext): ... depends = sources + ext.depends if not (self.force or newer_group(depends, ext_path, 'newer')): logger.debug("skipping '%s' extension (up-to-date)", ext.name) return else: logger.info("building '%s' extension", ext.name) ... Can someone suggest me the reason of this choice instead of missing='error' (at least for ext.depends)? 
Cheers, Matteo From janssen at parc.com Wed Jan 25 16:35:20 2012 From: janssen at parc.com (Bill Janssen) Date: Wed, 25 Jan 2012 07:35:20 PST Subject: [Python-Dev] Status of Mac buildbots In-Reply-To: References: Message-ID: <67059.1327505720@parc.com> Nadeem Vawda wrote: > Hi all, > > I've noticed that most of the Mac buildbots have been offline for a while: > > * http://www.python.org/dev/buildbot/all/buildslaves/parc-snowleopard-1 > * http://www.python.org/dev/buildbot/all/buildslaves/parc-tiger-1 > * http://www.python.org/dev/buildbot/all/buildslaves/parc-leopard-1 > > Does anyone know what the status of these bots is? Are they > permanently down, or just temporarily inaccessible? We're tinkering with that server room. They should be back by the end of the week. Bill From nadeem.vawda at gmail.com Wed Jan 25 16:57:38 2012 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Wed, 25 Jan 2012 17:57:38 +0200 Subject: [Python-Dev] Status of Mac buildbots In-Reply-To: <67059.1327505720@parc.com> References: <67059.1327505720@parc.com> Message-ID: On Wed, Jan 25, 2012 at 5:35 PM, Bill Janssen wrote: > We're tinkering with that server room. ?They should be back by the end of > the week. OK, cool. Thanks for the info. From pje at telecommunity.com Wed Jan 25 18:28:23 2012 From: pje at telecommunity.com (PJ Eby) Date: Wed, 25 Jan 2012 12:28:23 -0500 Subject: [Python-Dev] Packaging and setuptools compatibility In-Reply-To: <4F1F1A75.1080104@notmyidea.org> References: <4F1F1A75.1080104@notmyidea.org> Message-ID: 2012/1/24 Alexis M?taireau > Entrypoints basically are a plugin system. They are storing information in > the metadata and then retrieving them when needing them. The problem with > this, as everything when trying to get information from metadata is that we > need to parse all the metadata for all the installed distributions. (say > O(N)). > Note that this is why setuptools doesn't put entry points into PKG-INFO, but instead uses separate metadata files. 
Thus there is a lower "N" as well as smaller files to parse. ;-) Entrypoints are also only one type of extension metadata supported by setuptools; there is for example the EggTranslations system built on setuptools metadata system: it allows plugins to provide translations and localized resources for applications, and for other plugins in the same application. And it does this by using a different metadata file, again stored in the installed project's metadata. Since the new packaging metadata format is still a directory (replacing setuptools' EGG-INFO or .egg-info directories), it seems a reasonable migration path to simply install entry_points.txt and other metadata extensions to that same directory, and provide API to iterate over all the packages that offer a particular metadata file name. Entry points work this way now in setuptools, i.e. they iterate over all eggs containing entry_points metadata, then parse and cache the contents. An API for doing the same sort of thing here seems appropriate. This is still "meta" as Glyph suggests, and allows both setuptools-style entry point plugins, EggTranslations-style plugins, and whatever other sorts of plugin systems people would like. (I believe some other systems exist with this sort of metadata scheme; ISTM that Paster has a metadata format, but I don't know if it's exposed in egg-info metadata like this currently.) Anyway, if you offer an API for finding packages by metadata file (or even just a per-installed-package object API to query the existence of a metadata file), and for process-level caching of extended metadata for installed packages, that is sufficient for the above systems to work, without needing to bless any particular plugin API per se. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at hotpy.org Thu Jan 26 10:30:45 2012 From: mark at hotpy.org (Mark Shannon) Date: Thu, 26 Jan 2012 09:30:45 +0000 Subject: [Python-Dev] [Python-ideas] Coroutines and PEP 380 In-Reply-To: References: <4F15F041.6010607@hotpy.org> <20DB36E8-2538-4FE8-9FBF-6B3DA67E3CD6@twistedmatrix.com> <4F168FA5.2000503@hotpy.org> <7F3B6F9E-A901-4FA5-939E-CDD7B1E6E5B5@twistedmatrix.com> <4F188DFD.6080401@canterbury.ac.nz> <4F1B48D0.3060309@canterbury.ac.nz> Message-ID: <4F211D45.6000504@hotpy.org> Nick Coghlan wrote: > (redirecting to python-ideas - coroutine proposals are nowhere near > mature enough for python-dev) > > On Wed, Jan 25, 2012 at 5:35 PM, Matt Joiner wrote: >> If someone can explain what's stopping real coroutines being into >> Python (3.3), that would be great. > > The general issues with that kind of idea: > - the author hasn't offered the code for inclusion and relicensing > under the PSF license (thus we legally aren't allowed to do it) If by the author you mean me, then of course it can be included. Since it is a fork of CPython and I haven't changed the licence I assumed it already was under the PSF licence. > - complexity > - maintainability Hard to measure, but it adds about 200 lines of code. > - platform support Its all fully portable standard C. > > In the specific case of coroutines, you have the additional hurdle of > convincing people whether or not they're a good idea at all. That may well be the biggest obstacle :) One other obstacle (and this may be a killer) is that it may not be practical to refactor Jython to use coroutines since Jython compiles Python direct to JVM bytecodes and the JVM doesn't support coroutines. Jython should be able to support yield-from much more easily. Cheers, Mark. 
From brian at python.org Thu Jan 26 21:33:43 2012 From: brian at python.org (Brian Curtin) Date: Thu, 26 Jan 2012 14:33:43 -0600 Subject: [Python-Dev] Switching to Visual Studio 2010 In-Reply-To: References: <4F15DD85.6000905@v.loewis.de> <4F15E1A1.6090303@v.loewis.de> Message-ID: On Tue, Jan 17, 2012 at 15:11, Brian Curtin wrote: > On Tue, Jan 17, 2012 at 15:01, "Martin v. Löwis" wrote: >>> I previously completed the port at my old company (but could not >>> release it), and I have a good bit of it completed for us at >>> http://hg.python.org/sandbox/vs2010port/. That repo is a little bit >>> behind 'default' but updating it shouldn't pose any problems. >> >> So: do you agree that we switch? Do you volunteer to drive the change? > > I do, and I'll volunteer. Is this considered a new feature that has to be in by the first beta? I'm hoping to have it completed much sooner than that so we can get mileage on it, but is there a cutoff for changing the compiler? From martin at v.loewis.de Thu Jan 26 21:54:31 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Thu, 26 Jan 2012 21:54:31 +0100 Subject: [Python-Dev] Switching to Visual Studio 2010 In-Reply-To: References: <4F15DD85.6000905@v.loewis.de> <4F15E1A1.6090303@v.loewis.de> Message-ID: <20120126215431.Horde.dSI3OML8999PIb2HJXHnfeA@webmail.df.eu> > Is this considered a new feature that has to be in by the first beta? > I'm hoping to have it completed much sooner than that so we can get > mileage on it, but is there a cutoff for changing the compiler? At some point, I'll start doing this myself if it hasn't been done by then, and I would certainly want the build process adjusted (with all buildbots updated) before beta 1.
Regards, Martin From ethan at stoneleaf.us Fri Jan 27 04:19:45 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 26 Jan 2012 19:19:45 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' Message-ID: <4F2217D1.2000700@stoneleaf.us> PEP: XXX Title: Interpreter support for concurrent programming Version: $Revision$ Last-Modified: $Date$ Author: Ethan Furman Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 26-Jan-2012 Python-Version: 3.3 Post-History: Abstract ======== One of the open issues from PEP 3134 is suppressing context: currently there is no way to do it. This PEP proposes one. Motivation ========== There are two basic ways to generate exceptions: 1) Python does it (buggy code, missing resources, ending loops, etc.); and, 2) manually (with a raise statement). When writing libraries, or even just custom classes, it can become necessary to raise exceptions; moreover it can be useful, even necessary, to change from one exception to another. To take an example from my dbf module: try: value = int(value) except Exception: raise DbfError(...) Whatever the original exception was (ValueError, TypeError, or something else) is irrelevant. The exception from this point on is a DbfError, and the original exception is of no value. However, if this exception is printed, we would currently see both. Alternatives ============ Several possibilities have been put forth: - raise as NewException() Reuses the 'as' keyword; can be confusing since we are not really reraising the originating exception - raise NewException() from None Follows existing syntax of explicitly declaring the originating exception - exc = NewException(); exc.__context__ = None; raise exc Very verbose way of the previous method - raise NewException.no_context(...) Make context suppression a class method. All of the above options will require changes to the core. 
Proposal ======== I propose going with the second option: raise NewException from None It has the advantage of using the existing pattern of explicitly setting the cause: raise KeyError() from NameError() but because the 'cause' is None the previous context is discarded. There is a patch to this effect attached to Issue6210 (http://bugs.python.org/issue6210). Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From benjamin at python.org Fri Jan 27 04:54:06 2012 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 26 Jan 2012 22:54:06 -0500 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F2217D1.2000700@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> Message-ID: 2012/1/26 Ethan Furman : > PEP: XXX > Title: Interpreter support for concurrent programming mm? > Version: $Revision$ > Last-Modified: $Date$ > Author: Ethan Furman > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 26-Jan-2012 > Python-Version: 3.3 > Post-History: BTW, I don't really think this needs a PEP. -- Regards, Benjamin From barry at python.org Fri Jan 27 05:16:06 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 26 Jan 2012 23:16:06 -0500 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> Message-ID: <20120126231606.3c344532@resist.wooz.org> On Jan 26, 2012, at 10:54 PM, Benjamin Peterson wrote: >2012/1/26 Ethan Furman : >> PEP: XXX >> Title: Interpreter support for concurrent programming > >mm? >
I think a PEP is appropriate, but the title is certainly misnamed. -Barry From ethan at stoneleaf.us Fri Jan 27 05:03:46 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 26 Jan 2012 20:03:46 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> Message-ID: <4F222222.2070700@stoneleaf.us> Benjamin Peterson wrote: > 2012/1/26 Ethan Furman : >> PEP: XXX >> Title: Interpreter support for concurrent programming > > mm? Oops! > >> Version: $Revision$ >> Last-Modified: $Date$ >> Author: Ethan Furman >> Status: Draft >> Type: Standards Track >> Content-Type: text/x-rst >> Created: 26-Jan-2012 >> Python-Version: 3.3 >> Post-History: > > BTW, I don't really think this needs a PEP. I was surprised, but Nick seems to think it is. If somebody could fix that oopsie, and any others ;) and then commit it (if necessary) I would appreciate it. ~Ethan~ From benjamin at python.org Fri Jan 27 05:40:02 2012 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 26 Jan 2012 23:40:02 -0500 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F222222.2070700@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> <4F222222.2070700@stoneleaf.us> Message-ID: 2012/1/26 Ethan Furman : >> BTW, I don't really think this needs a PEP. Obviously it doesn't hurt. And I see from the issue that the change was not as uncontroversial as I originally thought, so it's likely for the better. -- Regards, Benjamin From ncoghlan at gmail.com Fri Jan 27 06:18:49 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 Jan 2012 15:18:49 +1000 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> Message-ID: On Fri, Jan 27, 2012 at 1:54 PM, Benjamin Peterson wrote: > BTW, I don't really think this needs a PEP. 
That's largely my influence - the discussion in the relevant tracker item (http://bugs.python.org/issue6210) had covered enough ground that I didn't notice that Ethan's specific proposal *isn't* a syntax change, but is rather just a matter of giving some additional semantics to the "raise X from Y" syntax (some of the other suggestions like "raise as " really were syntax changes). So I've changed my mind to being +1 on the idea and proposed syntax of the draft PEP, but I think there are still some details to be worked through in terms of the detailed semantics. (The approach in Ethan's patch actually *clobbers* the context info when "from None" is used, and I don't believe that's a good idea. My own suggestions in the tracker item aren't very good either, for exactly the same reason) Currently, the raise from syntax is just syntactic sugar for setting __cause__ manually: >>> try: ... 1/0 ... except ZeroDivisionError as ex: ... new_exc = ValueError("Denominator is zero") ... new_exc.__cause__ = ex ... raise new_exc ... Traceback (most recent call last): File "", line 2, in ZeroDivisionError: division by zero The above exception was the direct cause of the following exception: Traceback (most recent call last): File "", line 6, in ValueError: Denominator is zero The context information isn't lost in that case, the display of it is simply suppressed when an explicit cause is set: >>> try: ... try: ... 1/0 ... except ZeroDivisionError as ex: ... new_exc = ValueError() ... new_exc.__cause__ = ex ... raise new_exc ... except ValueError as ex: ... saved = ex ... >>> saved.__context__ ZeroDivisionError('division by zero',) >>> saved.__cause__ ZeroDivisionError('division by zero',) This behaviour (i.e. preserving the context, but not displaying it by default) is retained when using the dedicated syntax: >>> try: ... try: ... 1/0 ... except ZeroDivisionError as ex: ... raise ValueError() from ex ... except ValueError as ex: ... saved = ex ... 
>>> saved.__context__
ZeroDivisionError('division by zero',)
>>> saved.__cause__
ZeroDivisionError('division by zero',)

However, if you try to set the __cause__ to None explicitly, then the display falls back to showing the context:

>>> try:
...     1/0
... except ZeroDivisionError as ex:
...     new_exc = ValueError("Denominator is zero")
...     new_exc.__cause__ = None
...     raise new_exc
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
ValueError: Denominator is zero

This happens because None is used by the exception display logic to indicate "no specific cause, so report the context if that is set". My proposal would be that instead of using None as the "not set" sentinel value for __cause__, we instead use a dedicated sentinel object (exposed to Python at least as "BaseException().__cause__", but potentially being given its own name somewhere). Then the display logic for exceptions would be changed to be:

- if the __cause__ is None, then don't report a cause or exception context at all
- if the __cause__ is BaseException().__cause__, report the exception context (from __context__)
- otherwise report __cause__ as the specific cause of the raised exception

That way we make it easy to emit nicer default tracebacks when replacing exceptions without completely hiding the potentially useful data that can be provided by retaining information in __context__. I've been burnt by too much code that replaces detailed, informative and useful error messages that tell me exactly what is going wrong with bland, useless garbage to be in favour of an approach that doesn't even set the __context__ attribute in the first place. If __context__ is always set regardless, and then __cause__ is used to control whether or not __context__ gets displayed in the standard tracebacks, that's a much more flexible approach. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Jan 27 06:51:35 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 26 Jan 2012 21:51:35 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> Message-ID: On Thu, Jan 26, 2012 at 9:18 PM, Nick Coghlan wrote: > I've been burnt by too much code that replaces detailed, informative > and useful error messages that tell me exactly what is going wrong > with bland, useless garbage to be in favour of an approach that > doesn't even set the __context__ attribute in the first place. Ditto here. > If __context__ is always set regardless, and then __cause__ is used > to control whether or not __context__ gets displayed in the standard > tracebacks, that's a much more flexible approach. Well, but usually all you see is the printed traceback, so it might as well be lost, right? (It gives full control to programmatic handlers, of course, but that's usually not where the problem lies. It's when things go horribly wrong in the hash function and all you see in the traceback is a lousy KeyError. :-) Did you consider just changing the words so users can ignore it more easily? -- --Guido van Rossum (python.org/~guido) From v+python at g.nevcal.com Fri Jan 27 07:47:57 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 26 Jan 2012 22:47:57 -0800 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: References: Message-ID: <4F22489D.7080902@g.nevcal.com> On 1/26/2012 10:25 PM, Gregory P. Smith wrote: > (and on top of all of this I believe we're all settled on having per > interpreter hash randomization _as well_ in 3.3; but this AVL tree > approach is one nice option for a backport to fix the major > vulnerability) If the tree code cures the problem, then randomization just makes debugging harder.
I think if it is included in 3.3, it needs to have a switch to turn it on/off (whichever is not default). I'm curious why AVL tree rather than RB tree, simpler implementation? C++ stdlib includes RB tree, though, for even simpler implementation :) Can we have a tree type in 3.3, independent of dict? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Fri Jan 27 09:32:58 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 27 Jan 2012 09:32:58 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <4F22489D.7080902@g.nevcal.com> References: <4F22489D.7080902@g.nevcal.com> Message-ID: Glenn Linderman, 27.01.2012 07:47: > Can we have a tree type in 3.3, independent of dict? I'd be happy to see that happen, but I guess the usual requirements on stdlib extensions would apply here. I.e., someone has to write the code, make sure people actually use it to prove that it's worth being added, make sure it runs in different Python implementations, donate the code to the PSF asking for stdlib addition and agree to maintain it in the future. Such an addition is a totally separate issue from the hash collision attack issue. Stefan From martin at v.loewis.de Fri Jan 27 09:55:07 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Fri, 27 Jan 2012 09:55:07 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <4F22489D.7080902@g.nevcal.com> References: <4F22489D.7080902@g.nevcal.com> Message-ID: <20120127095507.Horde.bNohN0lCcOxPImZrs73XPjA@webmail.df.eu> > I'm curious why AVL tree rather than RB tree, simpler implementation? Somewhat arbitrary. AVL trees have better performance than RB trees (1.44 log2(N) vs 2 log2(N) in the worst case). Wrt. implementation, I looked around for a trustworthy, reusable, free (as in speech), C-only implementation of both AVL and RB trees.
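For a rough numerical feel for those worst-case height bounds, here is a quick back-of-the-envelope check (my own arithmetic for illustration, not part of the original message), evaluated for one million keys:

```python
import math

# Worst-case tree heights cited above: ~1.44*log2(N) for AVL,
# ~2*log2(N) for red-black, here for N = 1,000,000 keys.
n = 1_000_000
avl_worst = 1.44 * math.log2(n)  # roughly 28.7 comparisons
rb_worst = 2.0 * math.log2(n)    # roughly 39.9 comparisons
print(round(avl_worst, 1), round(rb_worst, 1))
```

So in the worst case an AVL lookup touches about a quarter fewer nodes, which is the performance edge Martin refers to.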
The C++ std::map is out of the question as it is C++, and many other free implementations are out of the question as they are GPLed and LGPLed. Writing an implementation from scratch for a bugfix release is also out of the question. So I found Ian Piumarta's AVL tree 1.0 from 2006. I trust Ian Piumarta to get it right (plus I reviewed the code a little). There are some API glitches (such as assuming a single comparison function, whereas it would better be rewritten to directly invoke rich comparison, or such as node removal not returning the node that was removed). It gets most API decisions right, in particular wrt. memory management. The license is in the style of the MIT license. If somebody could propose an alternative implementation (e.g. one with an even more liberal license, or with a smaller per-node memory usage), I'd be open to change it. From stefan_ml at behnel.de Fri Jan 27 10:49:08 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 27 Jan 2012 10:49:08 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <20120127095507.Horde.bNohN0lCcOxPImZrs73XPjA@webmail.df.eu> References: <4F22489D.7080902@g.nevcal.com> <20120127095507.Horde.bNohN0lCcOxPImZrs73XPjA@webmail.df.eu> Message-ID: martin at v.loewis.de, 27.01.2012 09:55: > So I found Ian Piumarta's AVL tree 1.0 from 2006. I trust Ian Piumarta > to get it right (plus I reviewed the code a little). There are some > API glitches (such as assuming a single comparison function, whereas > it would better be rewritten to directly invoke rich comparison, or > such as node removal not returning the node that was removed). It > gets most API decisions right, in particular wrt. memory management. > The license is in the style of the MIT license. That sounds ok for internal use, and the implementation really looks short enough to allow the adaptations you propose and generic enough to be generally usable.
However, note that my comment on Glenn's question regarding a stdlib addition of a tree type still applies - someone would have to write a suitable CPython wrapper for it as well as a separate pure Python implementation, and then offer both for inclusion and maintenance. I'm not sure it's a good idea to have multiple C tree implementations in CPython, i.e. one for internal use and one for the stdlib. Unless there's a serious interest in maintaining both, that is. After all, writing a Python wrapper for this may not be simpler than the work that went into one of the existing (C)Python tree implementations already. Stefan From martin at v.loewis.de Fri Jan 27 10:59:15 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Fri, 27 Jan 2012 10:59:15 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: References: <4F22489D.7080902@g.nevcal.com> <20120127095507.Horde.bNohN0lCcOxPImZrs73XPjA@webmail.df.eu> Message-ID: <20120127105915.Horde.aaROCElCcOxPInVzyQkH7nA@webmail.df.eu> > However, note that my comment on Glenn's question regarding a stdlib > addition of a tree type still applies I agree with all that. Having a tree-based mapping type in the standard library is a different issue entirely. From eliben at gmail.com Fri Jan 27 14:21:33 2012 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 27 Jan 2012 15:21:33 +0200 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package Message-ID: Hello, Following an earlier discussion on python-ideas [1], we would like to propose the following PEP for review. Discussion is welcome. 
The PEP can also be viewed in HTML form at http://www.python.org/dev/peps/pep-0408/ [1] http://mail.python.org/pipermail/python-ideas/2012-January/013246.html Eli --------------------------- PEP: 408 Title: Standard library __preview__ package Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan , Eli Bendersky Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 2012-01-07 Python-Version: 3.3 Post-History: 2012-01-27 Abstract ======== The process of including a new module into the Python standard library is hindered by the API lock-in and promise of backward compatibility implied by a module being formally part of Python. This PEP proposes a transitional state for modules - inclusion in a special ``__preview__`` package for the duration of a minor release (roughly 18 months) prior to full acceptance into the standard library. On one hand, this state provides the module with the benefits of being formally part of the Python distribution. On the other hand, the core development team explicitly states that no promises are made with regards to the module's eventual full inclusion into the standard library, or to the stability of its API, which may change for the next release. Proposal - the __preview__ package ================================== Whenever the Python core development team decides that a new module should be included into the standard library, but isn't entirely sure about whether the module's API is optimal, the module can be placed in a special package named ``__preview__`` for a single minor release. In the next minor release, the module may either be "graduated" into the standard library (and occupy its natural place within its namespace, leaving the ``__preview__`` package), or be rejected and removed entirely from the Python source tree. If the module ends up graduating into the standard library after spending a minor release in ``__preview__``, its API may be changed according to accumulated feedback. 
The core development team explicitly makes no guarantees about API stability and backward compatibility of modules in ``__preview__``. Entry into the ``__preview__`` package marks the start of a transition of the module into the standard library. It means that the core development team assumes responsibility of the module, similarly to any other module in the standard library. Which modules should go through ``__preview__`` ----------------------------------------------- We expect most modules proposed for addition into the Python standard library to go through a minor release in ``__preview__``. There may, however, be some exceptions, such as modules that use a pre-defined API (for example ``lzma``, which generally follows the API of the existing ``bz2`` module), or modules with an API that has wide acceptance in the Python development community. In any case, modules that are proposed to be added to the standard library, whether via ``__preview__`` or directly, must fulfill the acceptance conditions set by PEP 2. It is important to stress that the aim of this proposal is not to make the process of adding new modules to the standard library more difficult. On the contrary, it tries to provide a means to add *more* useful libraries. Modules which are obvious candidates for entry can be added as before. Modules which due to uncertainties about the API could be stalled for a long time now have a means to still be distributed with Python, via an incubation period in the ``__preview__`` package. Criteria for "graduation" ------------------------- In principle, most modules in the ``__preview__`` package should eventually graduate to the stable standard library. Some reasons for not graduating are: * The module may prove to be unstable or fragile, without sufficient developer support to maintain it. * A much better alternative module may be found during the preview release Essentially, the decision will be made by the core developers on a per-case basis.
The point to emphasize here is that a module's appearance in the ``__preview__`` package in some release does not guarantee it will continue being part of Python in the next release. Example ------- Suppose the ``example`` module is a candidate for inclusion in the standard library, but some Python developers aren't convinced that it presents the best API for the problem it intends to solve. The module can then be added to the ``__preview__`` package in release ``3.X``, importable via:: from __preview__ import example Assuming the module is then promoted to the standard library proper in release ``3.X+1``, it will be moved to a permanent location in the library:: import example And importing it from ``__preview__`` will no longer work. Rationale ========= Benefits for the core development team -------------------------------------- Currently, the core developers are really reluctant to add new interfaces to the standard library. This is because as soon as they're published in a release, API design mistakes get locked in due to backward compatibility concerns. By gating all major API additions through some kind of a preview mechanism for a full release, we get one full release cycle of community feedback before we lock in the APIs with our standard backward compatibility guarantee. We can also start integrating preview modules with the rest of the standard library early, so long as we make it clear to packagers that the preview modules should not be considered optional. The only difference between preview APIs and the rest of the standard library is that preview APIs are explicitly exempted from the usual backward compatibility guarantees. Essentially, the ``__preview__`` package is intended to lower the risk of locking in minor API design mistakes for extended periods of time. Currently, this concern can block new additions, even when the core development team consensus is that a particular addition is a good idea in principle.
Benefits for end users ---------------------- For future end users, the broadest benefit lies in a better "out-of-the-box" experience - rather than being told "oh, the standard library tools for task X are horrible, download this 3rd party library instead", those superior tools are more likely to be just an import away. For environments where developers are required to conduct due diligence on their upstream dependencies (severely harming the cost-effectiveness of, or even ruling out entirely, much of the material on PyPI), the key benefit lies in ensuring that anything in the ``__preview__`` package is clearly under python-dev's aegis from at least the following perspectives: * Licensing: Redistributed by the PSF under a Contributor Licensing Agreement. * Documentation: The documentation of the module is published and organized via the standard Python documentation tools (i.e. ReST source, output generated with Sphinx and published on http://docs.python.org). * Testing: The module test suites are run on the python.org buildbot fleet and results published via http://www.python.org/dev/buildbot. * Issue management: Bugs and feature requests are handled on http://bugs.python.org * Source control: The master repository for the software is published on http://hg.python.org. Candidates for inclusion into __preview__ ========================================= For Python 3.3, there are a number of clear current candidates: * ``regex`` (http://pypi.python.org/pypi/regex) * ``daemon`` (PEP 3143) * ``ipaddr`` (PEP 3144) Other possible future use cases include: * Improved HTTP modules (e.g. ``requests``) * HTML 5 parsing support (e.g. ``html5lib``) * Improved URL/URI/IRI parsing * A standard image API (PEP 368) * Encapsulation of the import state (PEP 406) * Standard event loop API (PEP 3153) * A binary version of WSGI for Python 3 (e.g. PEP 444) * Generic function support (e.g.
``simplegeneric``) Relationship with PEP 407 ========================= PEP 407 proposes a change to the core Python release cycle to permit interim releases every 6 months (perhaps limited to standard library updates). If such a change to the release cycle is made, the following policy for the ``__preview__`` namespace is suggested: * For long term support releases, the ``__preview__`` namespace would always be empty. * New modules would be accepted into the ``__preview__`` namespace only in interim releases that immediately follow a long term support release. * All modules added will either be migrated to their final location in the standard library or dropped entirely prior to the next long term support release. Rejected alternatives and variations ==================================== Using ``__future__`` -------------------- Python already has a "forward-looking" namespace in the form of the ``__future__`` module, so it's reasonable to ask why that can't be re-used for this new purpose. There are two reasons why doing so is not appropriate: 1. The ``__future__`` module is actually linked to a separate compiler directives feature that can actually change the way the Python interpreter compiles a module. We don't want that for the preview package - we just want an ordinary Python package. 2. The ``__future__`` module comes with an express promise that names will be maintained in perpetuity, long after the associated features have become the compiler's default behaviour. Again, this is precisely the opposite of what is intended for the preview package - it is almost certain that all names added to the preview will be removed at some point, most likely due to their being moved to a permanent home in the standard library, but also potentially due to their being reverted to third party package status (if community feedback suggests the proposed addition is irredeemably broken).
Versioning the package ---------------------- One proposed alternative [1]_ was to add explicit versioning to the ``__preview__`` package, i.e. ``__preview34__``. We think that it's better to simply define that a module being in ``__preview__`` in Python 3.X will either graduate to the normal standard library namespace in Python 3.X+1 or will disappear from the Python source tree altogether. Versioning the ``__preview__`` package complicates the process and does not align well with the main intent of this proposal. Using a package name without leading and trailing underscores ------------------------------------------------------------- It was proposed [1]_ to use a package name like ``preview`` or ``exp``, instead of ``__preview__``. This was rejected in the discussion due to the special meaning a "dunder" package name (that is, a name *with* leading and trailing double-underscores) conveys in Python. Besides, a non-dunder name would suggest normal standard library API stability guarantees, which is not the intention of the ``__preview__`` package. Preserving pickle compatibility ------------------------------- A pickled class instance based on a module in ``__preview__`` in release 3.X won't be unpickle-able in release 3.X+1, where the module won't be in ``__preview__``. Special code may be added to make this work, but this goes against the intent of this proposal, since it implies backward compatibility. Therefore, this PEP does not propose to preserve pickle compatibility. Credits ======= Dj Gilcrease initially proposed the idea of having a ``__preview__`` package in Python [2]_. Although his original proposal uses the name ``__experimental__``, we feel that ``__preview__`` conveys the meaning of this package in a better way. References ========== .. [#] Discussed in this thread: http://mail.python.org/pipermail/python-ideas/2012-January/013246.html ..
[#] http://mail.python.org/pipermail/python-ideas/2011-August/011278.html Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From anacrolix at gmail.com Fri Jan 27 14:48:06 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 28 Jan 2012 00:48:06 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: +0. I think the idea is right, and will help to get good quality modules in at a faster rate. However it is compensating for a lack of interface and packaging standardization in the 3rd party module world. From phil at freehackers.org Fri Jan 27 15:37:08 2012 From: phil at freehackers.org (Philippe Fremy) Date: Fri, 27 Jan 2012 15:37:08 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: <4F22B694.6060909@freehackers.org> Hi, A small comment from a user perspective. Since a package in preview is strongly linked to a given version of Python, any program taking advantage of it becomes strongly specific to a given version of Python. Such programs will of course break for any upgrade or downgrade of python version. To make the reason for the breakage more explicit, I believe that the PEP should provide examples of correct versioned usage of the module. Something along the lines of:

if sys.version_info[:2] == (3, X):
    from __preview__ import example
else:
    raise ImportError('Package example is only available as preview in Python version 3.X. Please check the documentation of your version of Python to see if and how you can get the package example.')

cheers, Philippe From solipsis at pitrou.net Fri Jan 27 16:09:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 27 Jan 2012 16:09:34 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: Message-ID: <20120127160934.2ad5e0bf@pitrou.net> On Fri, 27 Jan 2012 15:21:33 +0200 Eli Bendersky wrote: > > Following an earlier discussion on python-ideas [1], we would like to > propose the following PEP for review. Discussion is welcome. The PEP > can also be viewed in HTML form at > http://www.python.org/dev/peps/pep-0408/ A big +1 from me. > Assuming the module is then promoted to the the standard library proper in > release ``3.X+1``, it will be moved to a permanent location in the library:: > > import example > > And importing it from ``__preview__`` will no longer work. Why not leave it accessible through __preview__ too? > Benefits for the core development team > -------------------------------------- > > Currently, the core developers are really reluctant to add new interfaces to > the standard library. A nit, but I think "reluctant" is enough and "really" makes the tone very defensive :) > Relationship with PEP 407 > ========================= > > PEP 407 proposes a change to the core Python release cycle to permit interim > releases every 6 months (perhaps limited to standard library updates). If > such a change to the release cycle is made, the following policy for the > ``__preview__`` namespace is suggested: > > * For long term support releases, the ``__preview__`` namespace would always > be empty. > * New modules would be accepted into the ``__preview__`` namespace only in > interim releases that immediately follow a long term support release. Well this is all speculative (due to the status of PEP 407) but I think a simpler approach of having a __preview__ namespace in all releases (including LTS) would be easier to handle for both us and our users.
People can refrain from using anything in __preview__ if that's what they prefer. The naming and the double underscores make it quite recognizable at the top of a source file :-) > Preserving pickle compatibility > ------------------------------- > > A pickled class instance based on a module in ``__preview__`` in release 3.X > won't be unpickle-able in release 3.X+1, where the module won't be in > ``__preview__``. Special code may be added to make this work, but this goes > against the intent of this proposal, since it implies backward compatibility. > Therefore, this PEP does not propose to preserve pickle compatibility. Wouldn't it be a good argument to keep __preview__.XXX as an alias? Regards Antoine. From fuzzyman at voidspace.org.uk Fri Jan 27 16:25:28 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 27 Jan 2012 15:25:28 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F22B694.6060909@freehackers.org> References: <4F22B694.6060909@freehackers.org> Message-ID: <4F22C1E8.6090500@voidspace.org.uk> On 27/01/2012 14:37, Philippe Fremy wrote: > Hi, > > A small comment from a user perspective. > > Since a package in preview is strongly linked to a given version of > Python, any program taking advantage of it becomes strongly specific to > a given version of Python. > > Such programs will of course break for any upgrade or downgrade of > python version. To make the reason for the breakage more explicit, I > believe that the PEP should provide examples of correct versionned usage > of the module. > > Something along the lines of : > > if sys.version_info[:2] == (3, X): > from __preview__ import example > else: > raise ImportError( 'Package example is only available as preview in > Python version 3.X. Please check the documentation of your version of > Python to see if and how you can get the package example.' 
) A more normal incantation, as is often the way for packages that became parts of the standard library after first being a third party library (sometimes under a different name, e.g. simplejson -> json): try: from __preview__ import thing except ImportError: import thing So no need to target a very specific version of Python. Michael > > cheers, > > Philippe > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From fuzzyman at voidspace.org.uk Fri Jan 27 16:27:36 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 27 Jan 2012 15:27:36 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127160934.2ad5e0bf@pitrou.net> References: <20120127160934.2ad5e0bf@pitrou.net> Message-ID: <4F22C268.40005@voidspace.org.uk> On 27/01/2012 15:09, Antoine Pitrou wrote: > On Fri, 27 Jan 2012 15:21:33 +0200 > Eli Bendersky wrote: >> Following an earlier discussion on python-ideas [1], we would like to >> propose the following PEP for review. Discussion is welcome. The PEP >> can also be viewed in HTML form at >> http://www.python.org/dev/peps/pep-0408/ > A big +1 from me. > >> Assuming the module is then promoted to the the standard library proper in >> release ``3.X+1``, it will be moved to a permanent location in the library:: >> >> import example >> >> And importing it from ``__preview__`` will no longer work. > Why not leave it accessible through __preview__ too? +1 The point about pickling is one good reason, minimising code breakage (due to package name changing) is another. 
Michael > >> Benefits for the core development team >> -------------------------------------- >> >> Currently, the core developers are really reluctant to add new interfaces to >> the standard library. > A nit, but I think "reluctant" is enough and "really" makes the > tone very defensive :) > >> Relationship with PEP 407 >> ========================= >> >> PEP 407 proposes a change to the core Python release cycle to permit interim >> releases every 6 months (perhaps limited to standard library updates). If >> such a change to the release cycle is made, the following policy for the >> ``__preview__`` namespace is suggested: >> >> * For long term support releases, the ``__preview__`` namespace would always >> be empty. >> * New modules would be accepted into the ``__preview__`` namespace only in >> interim releases that immediately follow a long term support release. > Well this is all speculative (due to the status of PEP 407) but I think > a simpler approach of having a __preview__ namespace in all releases > (including LTS) would be easier to handler for both us and our users. > People can refrain from using anything in __preview__ if that's what > they prefer. The naming and the double underscores make it quite > recognizable at the top of a source file :-) > >> Preserving pickle compatibility >> ------------------------------- >> >> A pickled class instance based on a module in ``__preview__`` in release 3.X >> won't be unpickle-able in release 3.X+1, where the module won't be in >> ``__preview__``. Special code may be added to make this work, but this goes >> against the intent of this proposal, since it implies backward compatibility. >> Therefore, this PEP does not propose to preserve pickle compatibility. > Wouldn't it be a good argument to keep __preview__.XXX as an alias? > > Regards > > Antoine. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From benjamin at python.org Fri Jan 27 16:34:59 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 27 Jan 2012 10:34:59 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: 2012/1/27 Eli Bendersky : > Criteria for "graduation" > ------------------------- I think you also need "Criteria for being placed in __preview__". Do we just toss everything someone suggests in? -- Regards, Benjamin From anacrolix at gmail.com Fri Jan 27 16:35:53 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 28 Jan 2012 02:35:53 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F22C1E8.6090500@voidspace.org.uk> References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> Message-ID: > A more normal incantation, as is often the way for packages that became > parts of the standard library after first being a third party library > (sometimes under a different name, e.g. simplejson -> json): > > try: > from __preview__ import thing > except ImportError: > import thing > > So no need to target a very specific version of Python.
I think this is suboptimal: having to guess where modules are located, you end up with this in every module: try: import cjson as json except ImportError: try: import simplejson as json except ImportError: import json as json Perhaps the versioned import stuff could be implemented (whatever the syntax may be), in order that something like this can be done instead: import regex('__preview__') import regex('3.4') Where clearly the __preview__ version makes no guarantees about interface or implementation whatsoever. etc. From fuzzyman at voidspace.org.uk Fri Jan 27 16:37:44 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 27 Jan 2012 15:37:44 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: <4F22C4C8.5090500@voidspace.org.uk> On 27/01/2012 15:34, Benjamin Peterson wrote: > 2012/1/27 Eli Bendersky: >> Criteria for "graduation" >> ------------------------- > I think you also need "Criteria for being placed in __preview__". Do > we just toss everything someone suggests in? > > And given that permanently deleting something from __preview__ would be a big deal (deciding it didn't make the grade and should never graduate), the criteria shouldn't be much less strict than for adopting a package into the standard library. I.e. once something gets into __preview__ people are going to assume it will graduate at some point - __preview__ is a place for APIs to stabilise and mature, not a place for dubious libraries that we may or may not want in the standard library at some point. Michael
-- the sqlite blessing http://www.sqlite.org/different.html From fuzzyman at voidspace.org.uk Fri Jan 27 16:39:44 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 27 Jan 2012 15:39:44 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> Message-ID: <4F22C540.3030007@voidspace.org.uk> On 27/01/2012 15:35, Matt Joiner wrote: >> A more normal incantation, as is often the way for packages that became >> parts of the standard library after first being a third party library >> (sometimes under a different name, e.g. simplejson -> json): >> >> try: >> from __preview__ import thing >> except ImportError: >> import thing >> >> So no need to target a very specific version of Python. > I think this is suboptimal, having to guess where modules are located, > you end up with this in every module: > > try: > import cjson as json > except ImportError: > try: > import simplejson as json > except ImportError: > import json as json It's trivial to wrap in a function though - or do the import in one place and then import the package from there. Michael > Perhaps the versioned import stuff could be implemented (whatever the > syntax may be), in order that something like this can be done instead: > > import regex('__preview__') > import regex('3.4') > > Where clearly the __preview__ version makes no guarantees about > interface or implementation whatsoever. > > etc. > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From benjamin at python.org Fri Jan 27 16:42:51 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 27 Jan 2012 10:42:51 -0500 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F2217D1.2000700@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> Message-ID: 2012/1/26 Ethan Furman : > PEP: XXX Congratulations, you are now PEP 409. -- Regards, Benjamin From phil at freehackers.org Fri Jan 27 17:09:08 2012 From: phil at freehackers.org (Philippe Fremy) Date: Fri, 27 Jan 2012 17:09:08 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F22C1E8.6090500@voidspace.org.uk> References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> Message-ID: <4F22CC24.1080802@freehackers.org> On 27/01/2012 16:25, Michael Foord wrote: > On 27/01/2012 14:37, Philippe Fremy wrote: >> Hi, >> >> A small comment from a user perspective. >> >> Since a package in preview is strongly linked to a given version of >> Python, any program taking advantage of it becomes strongly specific to >> a given version of Python. >> >> Such programs will of course break for any upgrade or downgrade of >> Python version. To make the reason for the breakage more explicit, I >> believe that the PEP should provide examples of correct versioned usage >> of the module. >> >> Something along the lines of: >> >> if sys.version_info[:2] == (3, X): >> from __preview__ import example >> else: >> raise ImportError( 'Package example is only available as preview in >> Python version 3.X. Please check the documentation of your version of >> Python to see if and how you can get the package example.' ) > > A more normal incantation, as is often the way for packages that became > parts of the standard library after first being a third party library > (sometimes under a different name, e.g.
simplejson -> json): > > try: > from __preview__ import thing > except ImportError: > import thing > > So no need to target a very specific version of Python. > According to the PEP, the interface may change between __preview__ and final inclusion in the stdlib. It would be unwise as a developer to assume that a program written for the preview version will work correctly in the stdlib version, wouldn't it? I would use your "normal" incantation only after checking that no significant API changes have occurred after stdlib integration. By the way, if, as Antoine suggests, the package remains available in __preview__ even after it's accepted in the stdlib, how is the user supposed to deal with possible API changes? cheers, Philippe From solipsis at pitrou.net Fri Jan 27 17:39:50 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 27 Jan 2012 17:39:50 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> <4F22CC24.1080802@freehackers.org> Message-ID: <20120127173950.58ca9a80@pitrou.net> Hello Philippe, On Fri, 27 Jan 2012 17:09:08 +0100 Philippe Fremy wrote: > > According to the PEP, the interface may change between __preview__ and > final inclusion in the stdlib. It would be unwise as a developer to assume > that a program written for the preview version will work correctly in > the stdlib version, wouldn't it? > > I would use your "normal" incantation only after checking that no > significant API changes have occurred after stdlib integration. > > By the way, if, as Antoine suggests, the package remains available in > __preview__ even after it's accepted in the stdlib, how is the user > supposed to deal with possible API changes? The API *may* change but it would probably not change much anyway. Consider e.g. the "regex" module: it aims at compatibility with the standard "re" module; there may be additional APIs (e.g.
new flags), but whoever uses it with the standard "re" API would not see any difference between the __preview__ version and the final version. cheers Antoine. From eliben at gmail.com Fri Jan 27 17:44:02 2012 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 27 Jan 2012 18:44:02 +0200 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127160934.2ad5e0bf@pitrou.net> References: <20120127160934.2ad5e0bf@pitrou.net> Message-ID: >> Assuming the module is then promoted to the standard library proper in >> release ``3.X+1``, it will be moved to a permanent location in the library:: >> >>     import example >> >> And importing it from ``__preview__`` will no longer work. > > Why not leave it accessible through __preview__ too? I guess there's no real problem with leaving it accessible, as long as it's clear that the API may have changed between releases. I.e. when a package "graduates" and is also left accessible through __preview__, it should obviously be just a pointer to the same package, so if the API changed, code that imported it from __preview__ in a previous release may stop working. > >> Benefits for the core development team >> -------------------------------------- >> >> Currently, the core developers are really reluctant to add new interfaces to >> the standard library. > A nit, but I think "reluctant" is enough and "really" makes the > tone very defensive :) Agreed, I will change this > >> Relationship with PEP 407 >> ========================= >> >> PEP 407 proposes a change to the core Python release cycle to permit interim >> releases every 6 months (perhaps limited to standard library updates). If >> such a change to the release cycle is made, the following policy for the >> ``__preview__`` namespace is suggested: >> >> * For long term support releases, the ``__preview__`` namespace would always >> be empty. >> * New modules would be accepted into the ``__preview__`` namespace only in >>
interim releases that immediately follow a long term support release. > > Well this is all speculative (due to the status of PEP 407) but I think > a simpler approach of having a __preview__ namespace in all releases > (including LTS) would be easier to handle for both us and our users. > People can refrain from using anything in __preview__ if that's what > they prefer. The naming and the double underscores make it quite > recognizable at the top of a source file :-) I agree that it's speculative, and would recommend decoupling the two PEPs. They surely can live on their own and aren't tied. If PEP 407 gets accepted, this section can be reworded appropriately. > >> Preserving pickle compatibility >> ------------------------------- >> >> A pickled class instance based on a module in ``__preview__`` in release 3.X >> won't be unpickle-able in release 3.X+1, where the module won't be in >> ``__preview__``. Special code may be added to make this work, but this goes >> against the intent of this proposal, since it implies backward compatibility. >> Therefore, this PEP does not propose to preserve pickle compatibility. > Wouldn't it be a good argument to keep __preview__.XXX as an alias? Good point. Eli From eliben at gmail.com Fri Jan 27 17:45:27 2012 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 27 Jan 2012 18:45:27 +0200 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F22C1E8.6090500@voidspace.org.uk> References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> Message-ID: >> Something along the lines of: >> >> if sys.version_info[:2] == (3, X): >>     from __preview__ import example >> else: >>     raise ImportError( 'Package example is only available as preview in >> Python version 3.X. Please check the documentation of your version of >> Python to see if and how you can get the package example.'
) > > A more normal incantation, as is often the way for packages that became > parts of the standard library after first being a third party library > (sometimes under a different name, e.g. simplejson -> json): > > try: >    from __preview__ import thing > except ImportError: >    import thing > > So no need to target a very specific version of Python. > Yep, this is what I had in mind. And it appeared too trivial to place it in the PEP. Eli From eliben at gmail.com Fri Jan 27 17:47:05 2012 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 27 Jan 2012 18:47:05 +0200 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On Fri, Jan 27, 2012 at 17:34, Benjamin Peterson wrote: > 2012/1/27 Eli Bendersky : >> Criteria for "graduation" >> ------------------------- > > I think you also need "Criteria for being placed in __preview__". Do > we just toss everything someone suggests in? > I hoped to have this covered by: "In any case, modules that are proposed to be added to the standard library, whether via __preview__ or directly, must fulfill the acceptance conditions set by PEP 2." PEP 2 is quite detailed and I saw no need to repeat large chunks of it here. The idea is that all the same restrictions and caveats apply. The thing that goes away is the promise of future API stability. Eli From status at bugs.python.org Fri Jan 27 18:07:35 2012 From: status at bugs.python.org (Python tracker) Date: Fri, 27 Jan 2012 18:07:35 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20120127170735.3DDAC1DE27@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2012-01-20 - 2012-01-27) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message.
Issues counts and deltas: open 3234 (+25) closed 22437 (+32) total 25671 (+57) Open issues with patches: 1391 Issues opened (44) ================== #6631: Disallow relative files paths in urllib*.open() http://bugs.python.org/issue6631 reopened by amaury.forgeotdarc #13829: exception error in _scproxy.so http://bugs.python.org/issue13829 reopened by ned.deily #13836: Define key failed http://bugs.python.org/issue13836 opened by olivier57 #13837: test_shutil fails with symlinks enabled under Windows http://bugs.python.org/issue13837 opened by pitrou #13839: -m pstats should combine all the profiles given as arguments http://bugs.python.org/issue13839 opened by anacrolix #13841: multiprocessing should use sys.exit() where possible http://bugs.python.org/issue13841 opened by brandj #13842: Cannot pickle Ellipsis or NotImplemented http://bugs.python.org/issue13842 opened by James.Sanders #13843: Python doesn't compile anymore on our Solaris buildbot: undefi http://bugs.python.org/issue13843 opened by haypo #13845: Use GetSystemTimeAsFileTime() to get a resolution of 100 ns on http://bugs.python.org/issue13845 opened by haypo #13846: Add time.monotonic() function http://bugs.python.org/issue13846 opened by haypo #13847: Catch time(), ftime(), localtime() and clock() errors http://bugs.python.org/issue13847 opened by haypo #13848: io.open() doesn't check for embedded NUL characters http://bugs.python.org/issue13848 opened by pitrou #13849: Add tests for NUL checking in certain strs http://bugs.python.org/issue13849 opened by alex #13850: Summary tables for argparse add_argument options http://bugs.python.org/issue13850 opened by ncoghlan #13851: Packaging distutils2 for Fedora http://bugs.python.org/issue13851 opened by vikash #13854: multiprocessing: SystemExit from child with non-int, non-str a http://bugs.python.org/issue13854 opened by brandj #13855: Add qualname support to types.FunctionType http://bugs.python.org/issue13855 opened by meador.inge #13856: xmlrpc / 
httplib changes to allow for certificate verification http://bugs.python.org/issue13856 opened by Nathanael.Noblet #13857: Add textwrap.indent() as counterpart to textwrap.dedent() http://bugs.python.org/issue13857 opened by ncoghlan #13860: PyBuffer_FillInfo() return value http://bugs.python.org/issue13860 opened by skrah #13861: test_pydoc failure http://bugs.python.org/issue13861 opened by skrah #13863: import.c sometimes generates incorrect timestamps on Windows + http://bugs.python.org/issue13863 opened by mark.dickinson #13865: distutils documentation says Extension has "optional" argument http://bugs.python.org/issue13865 opened by tebeka #13866: {urllib,urllib.parse}.urlencode should not use quote_plus http://bugs.python.org/issue13866 opened by Stephen.Day #13867: misleading comment in weakrefobject.h http://bugs.python.org/issue13867 opened by Jim.Jewett #13868: Add hyphen doc fix http://bugs.python.org/issue13868 opened by Retro #13869: CFLAGS="-UNDEBUG" build failure http://bugs.python.org/issue13869 opened by skrah #13871: namedtuple does not normalize field names when checking for du http://bugs.python.org/issue13871 opened by Jim.Jewett #13872: socket.detach doesn't mark socket._closed http://bugs.python.org/issue13872 opened by anacrolix #13873: SIGBUS in test_zlib on Debian bigmem buildbot http://bugs.python.org/issue13873 opened by nadeem.vawda #13874: test_faulthandler: read_null test fails with current clang http://bugs.python.org/issue13874 opened by skrah #13875: cmd: no user documentation http://bugs.python.org/issue13875 opened by techtonik #13876: Sporadic failure in test_socket http://bugs.python.org/issue13876 opened by nadeem.vawda #13878: test_sched failures on Windows buildbot http://bugs.python.org/issue13878 opened by nadeem.vawda #13879: Argparse does not support subparser aliases in 2.7 http://bugs.python.org/issue13879 opened by Tim.Willis #13880: pydoc -k throws "AssertionError: distutils has already been pa 
http://bugs.python.org/issue13880 opened by __KFL__ #13881: Stream encoder for zlib_codec doesn't use the incremental enco http://bugs.python.org/issue13881 opened by amcnabb #13882: Add format argument for time.time(), time.clock(), ... to get http://bugs.python.org/issue13882 opened by haypo #13884: IDLE 2.6.5 Recent Files undocks http://bugs.python.org/issue13884 opened by mcgrete #13886: readline-related test_builtin failure http://bugs.python.org/issue13886 opened by nadeem.vawda #13888: test_builtin failure when run after test_tk http://bugs.python.org/issue13888 opened by nadeem.vawda #13889: str(float) and round(float) issues with FPU precision http://bugs.python.org/issue13889 opened by samuel.iseli #13890: test_importlib failures under Windows http://bugs.python.org/issue13890 opened by pitrou #1003195: segfault when running smtplib example http://bugs.python.org/issue1003195 reopened by neologix Most recent 15 issues with no replies (15) ========================================== #13890: test_importlib failures under Windows http://bugs.python.org/issue13890 #13889: str(float) and round(float) issues with FPU precision http://bugs.python.org/issue13889 #13888: test_builtin failure when run after test_tk http://bugs.python.org/issue13888 #13881: Stream encoder for zlib_codec doesn't use the incremental enco http://bugs.python.org/issue13881 #13876: Sporadic failure in test_socket http://bugs.python.org/issue13876 #13872: socket.detach doesn't mark socket._closed http://bugs.python.org/issue13872 #13869: CFLAGS="-UNDEBUG" build failure http://bugs.python.org/issue13869 #13868: Add hyphen doc fix http://bugs.python.org/issue13868 #13867: misleading comment in weakrefobject.h http://bugs.python.org/issue13867 #13866: {urllib,urllib.parse}.urlencode should not use quote_plus http://bugs.python.org/issue13866 #13865: distutils documentation says Extension has "optional" argument http://bugs.python.org/issue13865 #13861: test_pydoc failure 
http://bugs.python.org/issue13861 #13860: PyBuffer_FillInfo() return value http://bugs.python.org/issue13860 #13856: xmlrpc / httplib changes to allow for certificate verification http://bugs.python.org/issue13856 #13855: Add qualname support to types.FunctionType http://bugs.python.org/issue13855 Most recent 15 issues waiting for review (15) ============================================= #13889: str(float) and round(float) issues with FPU precision http://bugs.python.org/issue13889 #13886: readline-related test_builtin failure http://bugs.python.org/issue13886 #13882: Add format argument for time.time(), time.clock(), ... to get http://bugs.python.org/issue13882 #13879: Argparse does not support subparser aliases in 2.7 http://bugs.python.org/issue13879 #13872: socket.detach doesn't mark socket._closed http://bugs.python.org/issue13872 #13868: Add hyphen doc fix http://bugs.python.org/issue13868 #13856: xmlrpc / httplib changes to allow for certificate verification http://bugs.python.org/issue13856 #13848: io.open() doesn't check for embedded NUL characters http://bugs.python.org/issue13848 #13847: Catch time(), ftime(), localtime() and clock() errors http://bugs.python.org/issue13847 #13846: Add time.monotonic() function http://bugs.python.org/issue13846 #13845: Use GetSystemTimeAsFileTime() to get a resolution of 100 ns on http://bugs.python.org/issue13845 #13842: Cannot pickle Ellipsis or NotImplemented http://bugs.python.org/issue13842 #13839: -m pstats should combine all the profiles given as arguments http://bugs.python.org/issue13839 #13833: No documentation for PyStructSequence http://bugs.python.org/issue13833 #13817: deadlock in subprocess while running several threads using Pop http://bugs.python.org/issue13817 Top 10 most discussed issues (10) ================================= #13703: Hash collision security issue http://bugs.python.org/issue13703 61 msgs #4966: Improving Lib Doc Sequence Types Section http://bugs.python.org/issue4966 10 msgs #13790: In 
str.format an incorrect error message for list, tuple, dict http://bugs.python.org/issue13790 9 msgs #11457: os.stat(): add new fields to get timestamps as Decimal objects http://bugs.python.org/issue11457 8 msgs #13850: Summary tables for argparse add_argument options http://bugs.python.org/issue13850 8 msgs #6210: Exception Chaining missing method for suppressing context http://bugs.python.org/issue6210 7 msgs #13845: Use GetSystemTimeAsFileTime() to get a resolution of 100 ns on http://bugs.python.org/issue13845 7 msgs #13847: Catch time(), ftime(), localtime() and clock() errors http://bugs.python.org/issue13847 7 msgs #13849: Add tests for NUL checking in certain strs http://bugs.python.org/issue13849 7 msgs #13609: Add "os.get_terminal_size()" function http://bugs.python.org/issue13609 6 msgs Issues closed (31) ================== #8052: subprocess close_fds behavior should only close open fds http://bugs.python.org/issue8052 closed by gregory.p.smith #11235: Source files with date modifed in 2106 cause OverflowError http://bugs.python.org/issue11235 closed by pitrou #12922: StringIO and seek() http://bugs.python.org/issue12922 closed by pitrou #13071: IDLE accepts, then crashes, on invalid key bindings. http://bugs.python.org/issue13071 closed by terry.reedy #13190: ConfigParser uses wrong newline on Windows http://bugs.python.org/issue13190 closed by lukasz.langa #13435: Copybutton does not hide tracebacks http://bugs.python.org/issue13435 closed by ezio.melotti #13737: bugs.python.org/review's Django settings file DEBUG=True http://bugs.python.org/issue13737 closed by ezio.melotti #13772: listdir() doesn't work with non-trivial symlinks http://bugs.python.org/issue13772 closed by pitrou #13793: hasattr, delattr, getattr fail with unnormalized names http://bugs.python.org/issue13793 closed by benjamin.peterson #13796: use 'text=...' 
to define the text attribute of and xml.etree.E http://bugs.python.org/issue13796 closed by terry.reedy #13798: Pasting and then running code doesn't work in the IDLE Shell http://bugs.python.org/issue13798 closed by terry.reedy #13804: Python library structure creates hard to read code when using http://bugs.python.org/issue13804 closed by terry.reedy #13812: multiprocessing package doesn't flush stderr on child exceptio http://bugs.python.org/issue13812 closed by pitrou #13816: Two typos in the docs http://bugs.python.org/issue13816 closed by georg.brandl #13820: 2.6 is no longer in the future http://bugs.python.org/issue13820 closed by terry.reedy #13834: In help(bytes.strip) there is no info about leading ASCII whit http://bugs.python.org/issue13834 closed by georg.brandl #13835: whatsnew/3.3 misspelling/mislink http://bugs.python.org/issue13835 closed by sandro.tosi #13838: In str.format "{0:#.5g}" for decimal.Decimal doesn't print tra http://bugs.python.org/issue13838 closed by eric.smith #13840: create_string_buffer rejects str init_or_size parameter http://bugs.python.org/issue13840 closed by meador.inge #13844: hg.python.org doesn't escape title attributes in annotate view http://bugs.python.org/issue13844 closed by pitrou #13852: Doc fixes with patch http://bugs.python.org/issue13852 closed by georg.brandl #13853: SystemExit/sys.exit() doesn't print boolean argument http://bugs.python.org/issue13853 closed by brett.cannon #13858: readline fails on nonblocking, unbuffered io.FileIO objects http://bugs.python.org/issue13858 closed by neologix #13859: Lingering StandardError in logging module http://bugs.python.org/issue13859 closed by python-dev #13862: test_zlib failure http://bugs.python.org/issue13862 closed by nadeem.vawda #13864: IDLE: Python 2.7.2 refuses to open http://bugs.python.org/issue13864 closed by terry.reedy #13870: Out-of-date comment in collections/__init__.py ordered dict http://bugs.python.org/issue13870 closed by rhettinger #13877: 
segfault when running smtplib example http://bugs.python.org/issue13877 closed by neologix #13883: PYTHONCASEOK docs mistakenly says it is limited to Windows http://bugs.python.org/issue13883 closed by brett.cannon #13885: CVE-2011-3389: _ssl module always disables the CBC IV attack c http://bugs.python.org/issue13885 closed by pitrou #13887: defaultdict.get does not default to initial default but None http://bugs.python.org/issue13887 closed by python-dev From alex.gaynor at gmail.com Fri Jan 27 18:26:29 2012 From: alex.gaynor at gmail.com (Alex) Date: Fri, 27 Jan 2012 17:26:29 +0000 (UTC) Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: Message-ID: Eli Bendersky gmail.com> writes: > > Hello, > > Following an earlier discussion on python-ideas [1], we would like to > propose the following PEP for review. Discussion is welcome. The PEP > can also be viewed in HTML form at > http://www.python.org/dev/peps/pep-0408/ > > [1] http://mail.python.org/pipermail/python-ideas/2012-January/013246.html > I'm -1 on this, for a pretty simple reason. Something goes into __preview__, instead of its final destination directly, because it needs feedback/possibly changes. However, given the release cycle of the stdlib (~18 months), any feedback it gets can't be seen by actual users until it's too late. Essentially you can only get one round of feedback per stdlib release. I think a significantly healthier process (in terms of maximizing feedback and getting something into its best shape) is to let a project evolve naturally on PyPI and in the ecosystem, give feedback to it from an inclusion perspective, and then include it when it becomes ready on its own merits. The counter-argument to this is that putting it in the stdlib gets you significantly more eyeballs (and hopefully more feedback, therefore); my only response to this is: if it doesn't get eyeballs on PyPI I don't think there's a great enough need to justify it in the stdlib.
Alex From ethan at stoneleaf.us Fri Jan 27 18:08:40 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 27 Jan 2012 09:08:40 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> Message-ID: <4F22DA18.4050706@stoneleaf.us> Guido van Rossum wrote: > Did you consider to just change the > words so users can ignore it more easily? Yes, that has also been discussed. Speaking for myself, it would be only slightly better. Speaking for everyone that wants context suppression (using Steven D'Aprano's words): chained exceptions expose details to the caller that are irrelevant implementation details. It seems to me that generating the amount of information needed to track down errors is a balancing act between too much and too little; forcing the print of previous context when switching from exception A to exception B feels like too much: at the very least it's extra noise; at the worst it can obscure the actual problem. When the library (or custom class) author is catching A, saying "Yes, expected, now let's raise B instead", A is no longer necessary. Also, the programmer is free to *not* use 'from None', leaving the complete traceback in place. ~Ethan~ From v+python at g.nevcal.com Fri Jan 27 19:18:35 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 27 Jan 2012 10:18:35 -0800 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <4F22489D.7080902@g.nevcal.com> References: <4F22489D.7080902@g.nevcal.com> Message-ID: <4F22EA7B.1050903@g.nevcal.com> On 1/26/2012 10:47 PM, Glenn Linderman wrote: > On 1/26/2012 10:25 PM, Gregory P.
Smith wrote: >> (and on top of all of this I believe we're all settled on having per >> interpreter hash randomization _as well_ in 3.3; but this AVL tree >> approach is one nice option for a backport to fix the major >> vulnerability) > > If the tree code cures the problem, then randomization just makes > debugging harder. I think if it is included in 3.3, it needs to have > a switch to turn it on/off (whichever is not default). In case it is not clear, I meant randomization should always be able to be switched off. Another issue occurs to me: when a hash with colliding keys (one that has been attacked, and has trees) has a non-string key added, isn't the flattening process likely to have extremely poor performance? Agreed that the common HTML FORM or JSON attack vectors are unlikely to produce anything except string keys, but if an application grabs those, knows that the user keys are all strings, and adds a few more bits of info to the dict for convenience, using other key types, then ... WHAM? Seems a bit unlikely, but I know I've coded things along that line from time to time... I don't recall doing it in Python Web applications... -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Fri Jan 27 20:39:28 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Fri, 27 Jan 2012 20:39:28 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <4F22EA7B.1050903@g.nevcal.com> References: <4F22489D.7080902@g.nevcal.com> <4F22EA7B.1050903@g.nevcal.com> Message-ID: <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> > Another issue occurs to me: when a hash with colliding keys (one > that has been attacked, and has trees) has a non-string key added, > isn't the flattening process likely to have extremely poor > performance? Correct. "Don't do that, then" I don't consider it mandatory to fix all issues with hash collisions.
In fact, none of the strategies fixes all issues with hash collisions; even the hash-randomization solutions only deal with string keys, and don't consider collisions on non-string keys. From guido at python.org Fri Jan 27 20:54:31 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Jan 2012 11:54:31 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F22DA18.4050706@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> <4F22DA18.4050706@stoneleaf.us> Message-ID: On Fri, Jan 27, 2012 at 9:08 AM, Ethan Furman wrote: > Guido van Rossum wrote: >> >> Did you consider to just change the >> words so users can ignore it more easily? > > > Yes, that has also been discussed. > > Speaking for myself, it would be only slightly better. > > Speaking for everyone that wants context suppression (using Steven > D'Aprano's words): chained exceptions expose details to the caller that are > irrelevant implementation details. > > It seems to me that generating the amount of information needed to track > down errors is a balancing act between too much and too little; forcing the > print of previous context when switching from exception A to exception B > feels like too much: at the very least it's extra noise; at the worst it > can obscure the actual problem. When the library (or custom class) > author is catching A, saying "Yes, expected, now let's raise B instead", A > is no longer necessary. > > Also, the programmer is free to *not* use 'from None', leaving the complete > traceback in place. Ok, got it. The developer has to explicitly say "raise from None" and that indicates they have really thought about the issue of suppressing too much information and they are okay with it. I dig that.
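A minimal sketch of the semantics being agreed on here, as they eventually shipped in Python 3.3 (``ConfigError`` and ``read_port`` are invented for illustration):

```python
class ConfigError(Exception):
    pass

def read_port(settings):
    try:
        return settings["port"]
    except KeyError:
        # The KeyError is an irrelevant implementation detail; 'from None'
        # keeps it out of the traceback the caller sees.
        raise ConfigError("no port configured") from None

try:
    read_port({})
except ConfigError as exc:
    # The context is still recorded, merely flagged as suppressed, so the
    # default traceback printer omits the chained KeyError.
    assert exc.__suppress_context__ and isinstance(exc.__context__, KeyError)
```

Without the `from None`, `__suppress_context__` stays False and both tracebacks are printed.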
-- --Guido van Rossum (python.org/~guido) From storchaka at gmail.com Fri Jan 27 20:59:02 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 27 Jan 2012 21:59:02 +0200 Subject: [Python-Dev] Hashing proposal: 64-bit hash Message-ID: As already mentioned, the vulnerability of 64-bit Python is rather theoretical, not practical: the size of the hash makes the attack extremely unlikely. Perhaps the easiest change to shield 32-bit Python from the vulnerability would be to use a 64-bit (or wider) hash on all platforms. The performance cost is comparable to randomization, and key-order-dependent code would be broken no more than it is by a change of platform or Python feature version. Maybe all 64 bits could be used only for strings, with other objects keeping only the lower 32 bits. From benjamin at python.org Fri Jan 27 21:39:44 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 27 Jan 2012 15:39:44 -0500 Subject: [Python-Dev] Hashing proposal: 64-bit hash In-Reply-To: References: Message-ID: 2012/1/27 Serhiy Storchaka : > As already mentioned, the vulnerability of 64-bit Python is rather theoretical, not practical: the size of the hash makes the attack extremely unlikely. Perhaps the easiest change to shield 32-bit Python from the vulnerability would be to use a 64-bit (or wider) hash on all platforms. The performance cost is comparable to randomization, and key-order-dependent code would be broken no more than it is by a change of platform or Python feature version. Maybe all 64 bits could be used only for strings, with other objects keeping only the lower 32 bits. A tempting idea, but binary incompatible.
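To make the trade-off concrete, here is a rough pure-Python sketch of a width-independent 64-bit string hash of the kind being proposed. The FNV-1a-style mixing and both function names are illustrative assumptions for this example only, not CPython's actual string hash:

```python
def hash64(s: str) -> int:
    """FNV-1a over the UTF-8 bytes of s, kept to 64 bits (illustrative only)."""
    h = 0xcbf29ce484222325                 # FNV-1a 64-bit offset basis
    for byte in s.encode('utf-8'):
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF   # FNV prime, mod 2**64
    return h

def hash_for_platform(s: str, is_64bit: bool) -> int:
    # Today, 32-bit builds effectively expose only a 32-bit hash space;
    # the proposal is to always return the full 64-bit value instead.
    full = hash64(s)
    return full if is_64bit else full & 0xFFFFFFFF
```

Benjamin's objection is visible in `hash_for_platform`: C extensions store hashes in a platform-sized integer type, so widening the hash on 32-bit builds changes the binary interface, not just the Python-level values.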
-- Regards, Benjamin From steve at pearwood.info Fri Jan 27 21:43:46 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Jan 2012 07:43:46 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: <4F230C82.9060703@pearwood.info> Eli Bendersky wrote: > Hello, > > Following an earlier discussion on python-ideas [1], we would like to > propose the following PEP for review. Discussion is welcome. I think you need to emphasize that modules in __preview__ are NOT expected to have a forward-compatible, stable, API. This is a feature of __preview__, not a bug, and I believe it is the most important feature. I see responses to this PEP that assume that APIs will be stable, and that having a module fail to graduate out of __preview__ should be an extraordinary event. But if this is the case, then why bother with __preview__? It just adds complexity to the process -- if __preview__.spam and spam are expected to be the same, then just put spam straight into the std lib and be done with it. This PEP only makes sense if we assume that __preview__.spam and spam *will* be different, even if only in minor ways, and that there might not even be a spam. There should be no expectation that every __preview__ module must graduate, or that every standard library module must go through __preview__. If it is stable and uncontroversial, __preview__ adds nothing to the process. Even when there are candidates for inclusion with relatively stable APIs, like regex, we should *assume* that there will be API differences between __preview__.regex and regex, simply because it is less harmful to expect changes that don't eventuate than to expect stability and be surprised by changes. This, I believe, rules out Antoine's suggestion that modules remain importable from __preview__ even after graduation to a full member of the standard library.
We simply can't have all three of these statements true at the same time:

1) regular standard library modules are expected to be backward compatible
2) __preview__ modules are not expected to be forward compatible
3) __preview__.spam is an alias to regular standard library spam

At least one of them has to go. Since both 1) and 2) are powerful features, and 3) is only a convenience, the obvious one to drop is 3). I note that the PEP, as it is currently written, explicitly states that __preview__.spam will be dropped when it graduates to spam. This is a good thing and should not be changed. Keeping __preview__.spam around after graduation is, I believe, actively harmful. It adds complexity to the developer's decision-making process ("Should I import spam from __preview__, or just import spam? What's the difference?"). It gives a dangerous impression that code written for __preview__.spam will still work for spam. We should be discouraging simple-minded recipes like

try:
    import spam
except ImportError:
    from __preview__ import spam

spam.foo(a, b, c)

since they undermine the vital feature of __preview__ that the signature and even the existence of spam.foo is subject to change. I would go further and suggest that __preview__ be explicitly called __unstable__. If that name is scary, and it frightens some users off, good! The last thing we want is when 3.4 comes around to have dozens of bug reports along the line of "spam.foo() and __preview__.spam.foo() have different function signatures and aren't compatible". Of course they do. That's why __preview__.spam existed in the first place, to allow the API to mature without the expectation that it was already stable. Since __preview__.spam (or, as I would prefer, __unstable__.spam) and spam cannot be treated as drop-in replacements, what is __preview__.spam good for? Without a stable API, __preview__.spam is not suitable for use in production applications that expect to run under multiple versions of the standard library.
I think the PEP needs more use-cases on who might use __preview__.spam, and why. These come to my mind:

* if you don't care about Python 3.x+1, then there is no reason not to treat Python 3.x's __preview__.spam as stable;
* rapid development proof-of-concept software ("build one to throw away") can safely use __preview__.spam, since it is expected to be replaced anyway;
* one-use scripts;
* use at the interactive interpreter;
* any other time where forward-compatibility is not required.

I am reminded of the long, often acrimonious arguments that took place on Python-Dev a few years back about the API for the ipaddr library. A lot of the arguments could have been short-circuited if we had said "putting ipaddr into __preview__ does not constitute acceptance of its API". (On the other hand, if __preview__ becomes used in the future for library authors to fob-off criticism for 18 months in the hope it will just be forgotten, then this will be a bad thing.) -- Steven From steve at pearwood.info Fri Jan 27 21:48:55 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Jan 2012 07:48:55 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> Message-ID: <4F230DB7.603@pearwood.info> Eli Bendersky wrote:

>> try:
>>     from __preview__ import thing
>> except ImportError:
>>     import thing
>>
>> So no need to target a very specific version of Python.
>>
>
> Yep, this is what I had in mind. And it appeared too trivial to place
> it in the PEP.

Trivial and wrong. Since thing and __preview__.thing may have subtle, or major, API differences, how do you use it?
try:
    result = thing.foo(a, b, c) + thing.bar(x)
except AttributeError:
    # Must be the preview version
    result = thing.foobar(a, c, b, x)

-- Steven From pydev at sievertsen.de Fri Jan 27 22:08:37 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Fri, 27 Jan 2012 22:08:37 +0100 Subject: [Python-Dev] Hashing proposal: 64-bit hash In-Reply-To: References: Message-ID: <4F231255.3050106@sievertsen.de> > As already mentioned, the vulnerability of 64-bit Python is rather theoretical, not practical: the size of the hash makes the attack extremely unlikely. Unfortunately this assumption is not correct. The attack works very well with 64-bit hashing, too. It is much harder to create 64-bit hash collisions efficiently, but I managed to do so, and created colliding strings with a length of 16 (6-bit) characters (a-z, A-Z, 0-9, _, .). Even 14 characters would have been enough. You need less than twice as many characters for the same effect as in the 32-bit world. Frank From barry at python.org Fri Jan 27 22:10:51 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 27 Jan 2012 16:10:51 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: Message-ID: <20120127161051.3a47b26c@resist.wooz.org> On Jan 27, 2012, at 05:26 PM, Alex wrote: >I'm -1 on this, for a pretty simple reason. Something goes into __preview__, >instead of its final destination directly, because it needs feedback/possibly >changes. However, given the release cycle of the stdlib (~18 months), any >feedback it gets can't be seen by actual users until it's too >late. Essentially you can only get one round of stdlib. I'm -1 on this as well. It just feels like the completely wrong way to stabilize an API, and I think despite the caveats that are explicit in __preview__, Python will just catch tons of grief from users and haters about API instability anyway, because from a practical standpoint, applications written using __preview__ APIs *will* be less stable.
It also won't improve the situation for prospective library developers because they're locked into Python's development cycle anyway. I also think the benefit to users is a false one since it will be much harder to write applications that are portable across Python releases. >I think a significantly healthier process (in terms of maximizing feedback >and getting something into it's best shape) is to let a project evolve >naturally on PyPi and in the ecosystem, give feedback to it from an inclusion >perspective, and then include it when it becomes ready on it's own >merits. The counter argument to this is that putting it in the stdlib gets >you signficantly more eyeballs (and hopefully more feedback, therefore), my >only response to this is: if it doesn't get eyeballs on PyPi I don't think >there's a great enough need to justify it in the stdlib. I agree with everything Alex said here. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From solipsis at pitrou.net Fri Jan 27 22:48:58 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 27 Jan 2012 22:48:58 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: <20120127161051.3a47b26c@resist.wooz.org> Message-ID: <20120127224858.671af059@pitrou.net> On Fri, 27 Jan 2012 16:10:51 -0500 Barry Warsaw wrote: > > I'm -1 on this as well. It just feels like the completely wrong way to > stabilize an API, and I think despite the caveats that are explicit in > __preview__, Python will just catch tons of grief from users and haters about > API instability anyway, because from a practical standpoint, applications > written using __preview__ APIs *will* be less stable. Well, obviously __preview__ is not for the most conservative users. I think the name clearly conveys the idea that you are trying out something which is not in its definitive state, doesn't it? 
> >I think a significantly healthier process (in terms of maximizing feedback > >and getting something into it's best shape) is to let a project evolve > >naturally on PyPi and in the ecosystem, give feedback to it from an inclusion > >perspective, and then include it when it becomes ready on it's own > >merits. The counter argument to this is that putting it in the stdlib gets > >you signficantly more eyeballs (and hopefully more feedback, therefore), my > >only response to this is: if it doesn't get eyeballs on PyPi I don't think > >there's a great enough need to justify it in the stdlib. > > I agree with everything Alex said here. The idea that being on PyPI is sufficient is nice but flawed (the IPaddr example). PyPI doesn't guarantee any visibility (how many packages are there?). Furthermore, having users is not a guarantee that the API is appropriate, either; it just means that the API is appropriate for *some* users. On the other hand, __preview__ would clearly signal that something is on the verge of being frozen as an official stdlib API, and would prompt people to actively try it. Regards Antoine. From p.f.moore at gmail.com Fri Jan 27 23:02:00 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 27 Jan 2012 22:02:00 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127224858.671af059@pitrou.net> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> Message-ID: On 27 January 2012 21:48, Antoine Pitrou wrote: > Well, obviously __preview__ is not for the most conservative users. I > think the name clearly conveys the idea that you are trying out > something which is not in its definitive state, doesn't it? Agreed. But that in turn implies to me that __preview__.foo should not be maintained as an alias for foo once it gets "promoted". 
Firstly, because if you're not comfortable with changing your code to make the simple change to remove the __preview__ prefix in the import, then how could you be comfortable with using a module with no compatibility guarantee anyway? (BTW, I assume that the normal incantation would actually be "from __preview__ import foo", as that limits the module name change to the import statement). > The idea that being on PyPI is sufficient is nice but flawed (the > IPaddr example). PyPI doesn't guarantee any visibility (how many > packages are there?). Furthermore, having users is not a guarantee that > the API is appropriate, either; it just means that the API is > appropriate for *some* users. Agreed entirely. We need a way to signal somehow that a module is being seriously considered for stdlib inclusion. That *would* result in more uptake, and hence more testing and feedback. As an example, I would definitely try out MRAB's regex module if it were in __preview__, but even though I keep meaning to, I've never actually got round to bothering to download from PyPI - I end up just using the stdlib re for my one-off scripts. > On the other hand, __preview__ would clearly signal that something is > on the verge of being frozen as an official stdlib API, and would > prompt people to actively try it. Precisely. It's in effect a "last call for feedback", and people should view it that way, in my opinion. Paul. From tjreedy at udel.edu Fri Jan 27 23:40:16 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 27 Jan 2012 17:40:16 -0500 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> <4F22DA18.4050706@stoneleaf.us> Message-ID: On 1/27/2012 2:54 PM, Guido van Rossum wrote: > On Fri, Jan 27, 2012 at 9:08 AM, Ethan Furman wrote: >> Guido van Rossum wrote: >>> >>> Did you consider to just change the >>> words so users can ignore it more easily? >> >> >> Yes, that has also been discussed. 
>> >> Speaking for myself, it would be only slightly better. >> >> Speaking for everyone that wants context suppression (using Steven >> D'Aprano's words): chained exceptions expose details to the caller that are >> irrelevant implementation details. Especially if the users are non-programmer app users. >> It seems to me that generating the amount of information needed to track >> down errors is a balancing act between too much and too little; forcing the >> print of previous context when switching from exception A to exception B >> feels like too much: at the very least it's extra noise; at the worst it >> can be confusing to the actual problem. When the library (or custom class) >> author is catching A, saying "Yes, expected, now let's raise B instead", A >> is no longer necessary. I find double tracebacks to be 'jarring'. If there is a double bug, one in both the try and except blocks, it *should* stand out. If there is just one bug and the developer merely wants to rename it and change the message, it should not. >> >> Also, the programmer is free to *not* use 'from None', leaving the complete >> traceback in place. > > Ok, got it. The developer has to explicitly say "raise > from None" and that indicates they have really thought about the issue > of suppressing too much information and they are okay with it. I dig > that. Now that I have been reminded that 'from x' was already added to raise statements, I am fine with reusing that. I still think it 'sticks out' more than the 'as' version, but when reading code, having (rare) info suppression stick out is not so bad. The PEP does not address the issue of whether the new variation of raise is valid outside of an except block. My memory is that it was not to be and I think it should not be. One advantage of the 'as' form is that it is clear that raising the default as something else is invalid if there is no default. 
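For reference, the suppression being debated can be demonstrated with the semantics that eventually shipped in Python 3.3 (PEP 409, later refined by PEP 415 into `__suppress_context__`); at the time of this thread the exact mechanism was still being decided, and the helper function here is an invented example:

```python
# Hypothetical helper illustrating 'raise ... from None': the original
# exception stays attached as __context__, but the "During handling of
# the above exception..." section of the traceback is suppressed.
def get_setting(settings, name):
    try:
        return settings[name]
    except KeyError:
        raise ValueError('unknown setting: %r' % name) from None

try:
    get_setting({}, 'colour')
except ValueError as exc:
    assert exc.__suppress_context__               # chained display suppressed
    assert isinstance(exc.__context__, KeyError)  # ...but still introspectable
    assert exc.__cause__ is None
```

This matches Terry's preference: the single, renamed exception is what the end user sees, while a debugger can still reach the original KeyError through `__context__`.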
-- Terry Jan Reedy From barry at python.org Fri Jan 27 23:54:14 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 27 Jan 2012 17:54:14 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127224858.671af059@pitrou.net> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> Message-ID: <20120127175414.385567b6@resist.wooz.org> On Jan 27, 2012, at 10:48 PM, Antoine Pitrou wrote: >On Fri, 27 Jan 2012 16:10:51 -0500 >Barry Warsaw wrote: >> >> I'm -1 on this as well. It just feels like the completely wrong way to >> stabilize an API, and I think despite the caveats that are explicit in >> __preview__, Python will just catch tons of grief from users and haters about >> API instability anyway, because from a practical standpoint, applications >> written using __preview__ APIs *will* be less stable. > >Well, obviously __preview__ is not for the most conservative users. I >think the name clearly conveys the idea that you are trying out >something which is not in its definitive state, doesn't it? Maybe. I could quibble about the name, but let's not bikeshed on that right now. The problem as I see it is that __preview__ will be very tempting to use in production. In fact, its use case is almost predicated on that. (We want you to use it so you can tell us if the API is good.) Once people use it, they will probably ship code that relies on it, and then the pressure will be applied to us to continue to support that API even if a newer, better one gets promoted out of __preview__. I worry that over time, for all practical purposes, there won't be much difference between __preview__ and the stdlib. 
>> >I think a significantly healthier process (in terms of maximizing feedback >> >and getting something into it's best shape) is to let a project evolve >> >naturally on PyPi and in the ecosystem, give feedback to it from an inclusion >> >perspective, and then include it when it becomes ready on it's own >> >merits. The counter argument to this is that putting it in the stdlib gets >> >you signficantly more eyeballs (and hopefully more feedback, therefore), my >> >only response to this is: if it doesn't get eyeballs on PyPi I don't think >> >there's a great enough need to justify it in the stdlib. >> >> I agree with everything Alex said here. > >The idea that being on PyPI is sufficient is nice but flawed (the >IPaddr example). PyPI doesn't guarantee any visibility (how many >packages are there?). Furthermore, having users is not a guarantee that >the API is appropriate, either; it just means that the API is >appropriate for *some* users. I can't argue with that, it's just that I don't think __preview__ solves that problem. And it seems to me that __preview__ introduces a whole 'nother set of problems on top of that. So taking the IPaddr example further. Would having it in the stdlib, relegated to an explicitly unstable API part of the stdlib, increase eyeballs enough to generate the kind of API feedback we're looking for, without imposing an additional maintenance burden on us? If you were writing an app that used something in __preview__, how would you provide feedback on what parts of the API you'd want to change, *and* how would you adapt your application to use those better APIs once they became available 18 months from now? I think we'll just see folks using the unstable APIs and then complaining when we remove them, even though they *know* *upfront* that these APIs will go away. I'm also nervous about it from an OS vender point of view. Should I reject any applications that import from __preview__? 
Or do I have to make a commitment to support those APIs longer than Python does because the application that uses it is important to me? I think the OS vendor problem is easier with an application that uses some PyPI package, because I can always make that package available to the application by pulling in the version I care about. It's harder if a newer, incompatible version is released upstream and I want to provide both, but I don't think __preview__ addresses that. A robust, standard approach to versioning of modules would though, and I think would better solve what __preview__ is trying to solve. >On the other hand, __preview__ would clearly signal that something is >on the verge of being frozen as an official stdlib API, and would >prompt people to actively try it. I'm not so sure about that. If I were to actively try it, I'm not sure how much motivation I'd have to rewrite key parts of my code when an incompatible version gets promoted to the un__preview__d stdlib. -Barry From barry at python.org Fri Jan 27 23:56:03 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 27 Jan 2012 17:56:03 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> Message-ID: <20120127175603.04e56eb1@resist.wooz.org> On Jan 27, 2012, at 10:02 PM, Paul Moore wrote: >Agreed entirely. We need a way to signal somehow that a module is >being seriously considered for stdlib inclusion. That *would* result >in more uptake, and hence more testing and feedback. I'm just not convinced that's a message that we can clearly articulate to users of the library. I think most people will see it in the module documentation, just use it, and then complain when it's gone. 
-Barry From solipsis at pitrou.net Sat Jan 28 00:19:37 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 00:19:37 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> Message-ID: <20120128001937.22e498a4@pitrou.net> On Fri, 27 Jan 2012 17:54:14 -0500 Barry Warsaw wrote: > On Jan 27, 2012, at 10:48 PM, Antoine Pitrou wrote: > > >On Fri, 27 Jan 2012 16:10:51 -0500 > >Barry Warsaw wrote: > >> > >> I'm -1 on this as well. It just feels like the completely wrong way to > >> stabilize an API, and I think despite the caveats that are explicit in > >> __preview__, Python will just catch tons of grief from users and haters about > >> API instability anyway, because from a practical standpoint, applications > >> written using __preview__ APIs *will* be less stable. > > > >Well, obviously __preview__ is not for the most conservative users. I > >think the name clearly conveys the idea that you are trying out > >something which is not in its definitive state, doesn't it? > > Maybe. I could quibble about the name, but let's not bikeshed on that > right now. The problem as I see it is that __preview__ will be very tempting > to use in production. In fact, its use case is almost predicated on that. > (We want you to use it so you can tell us if the API is good.) That's my opinion too. But using it in production doesn't mean you lose control on the code and its users. Perhaps you are used to a kind of production where the code gets disseminated all over the GNUniverse :) But for most people "production" means a single server or machine where they have entire control. 
> If you were writing an app > that used something in __preview__, how would you provide feedback on what > parts of the API you'd want to change, *and* how would you adapt your > application to use those better APIs once they became available 18 months from > now? For the former, the normal channels probably apply (bug tracker or python-dev). For the latter, depending on the API change, catching e.g. AttributeError on module lookup, or TypeError on function call, or explicitly examining the Python version are all plausible choices. Let's take another example: the regex module, where the API is unlikely to change much (since it's meant to be re-compatible), and the main concerns are ease of maintenance, data-wise compatibility with re (rather than API-wise), performance, and the like. > I think we'll just see folks using the unstable APIs and then > complaining when we remove them, even though they *know* *upfront* that these > APIs will go away. Hmm, isn't that a bit pessimistic about our users? > I'm also nervous about it from an OS vender point of view. Should I reject > any applications that import from __preview__? Or do I have to make a > commitment to support those APIs longer than Python does because the > application that uses it is important to me? Well, is the application supported upstream? If yes, then there shouldn't be any additional burden. If no, then you have a complication indeed. > A robust, standard approach to > versioning of modules would though, and I think would better solve what > __preview__ is trying to solve. I don't think versioning can replace API stability. __preview__ is explicitly and visibly special, and that's a protection against us becoming too complacent. > >On the other hand, __preview__ would clearly signal that something is > >on the verge of being frozen as an official stdlib API, and would > >prompt people to actively try it. > > I'm not so sure about that. 
If I were to actively try it, I'm not sure how > much motivation I'd have to rewrite key parts of my code when an incompatible > version gets promoted to the un__preview__d stdlib. Obviously you would only use a module from __preview__ if the functionality is exciting enough for you (or the cost/benefit ratio is good enough). Regards Antoine. From v+python at g.nevcal.com Sat Jan 28 01:17:28 2012 From: v+python at g.nevcal.com (Glenn Linderman) Date: Fri, 27 Jan 2012 16:17:28 -0800 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> References: <4F22489D.7080902@g.nevcal.com> <4F22EA7B.1050903@g.nevcal.com> <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> Message-ID: <4F233E98.2040007@g.nevcal.com> On 1/27/2012 11:39 AM, martin at v.loewis.de wrote: > >> Another issue occurs to me: when a hash with colliding keys (one that >> has been attacked, and has trees) has a non-string key added, isn't >> the flattening process likely to have extremely poor performance? > > Correct. Thanks for the clarification. > "Don't do that, then" > > I don't consider it mandatory to fix all issues with hash collision. > In fact, none of the strategies fixes all issues with hash collisions; > even the hash-randomization solutions only deal with string keys, and > don't consider collisions on non-string keys. Which is fine, I just wanted the clarification. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Jan 28 01:32:42 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 01:32:42 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue References: <4F22489D.7080902@g.nevcal.com> <4F22EA7B.1050903@g.nevcal.com> <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> Message-ID: <20120128013242.3334cf79@pitrou.net> > I don't consider it mandatory to fix all issues with hash collision. 
> In fact, none of the strategies fixes all issues with hash collisions; > even the hash-randomization solutions only deal with string keys, and > don't consider collisions on non-string keys. How so? None of the patches did, but I think it was said several times that other types (int, tuple, float) could also be converted to use randomized hashes. What's more, there isn't any technical difficulty in doing so. And once you have randomized the hashes for these 4 or 5 built-in types, most third-party types follow since the common case of a __hash__ implementation is to call hash() on one or several constituents. Regards Antoine. From steve at pearwood.info Sat Jan 28 01:50:16 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Jan 2012 11:50:16 +1100 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> <4F22DA18.4050706@stoneleaf.us> Message-ID: <4F234648.8070904@pearwood.info> Terry Reedy wrote: > On 1/27/2012 2:54 PM, Guido van Rossum wrote: >> On Fri, Jan 27, 2012 at 9:08 AM, Ethan Furman wrote: >>> Guido van Rossum wrote: >>>> >>>> Did you consider to just change the >>>> words so users can ignore it more easily? >>> >>> >>> Yes, that has also been discussed. >>> >>> Speaking for myself, it would be only slightly better. >>> >>> Speaking for everyone that wants context suppression (using Steven >>> D'Aprano's words): chained exceptions expose details to the caller >>> that are >>> irrelevant implementation details. > > Especially if the users are non-programmer app users. Or beginner programmers, e.g. on the python-list and tutor mailing lists. It is hard enough to get beginners to post the entire traceback without making them bigger. The typical newbie posts just the error message, sometimes not even the exception type. What they will make of chained exceptions, I hate to think. > I find double tracebacks to be 'jarring'. 
If there is a double bug, one > in both the try and except blocks, it *should* stand out. If there is > just one bug and the developer merely wants to rename it and change the > message, it should not. Agreed with all of this. [...] > The PEP does not address the issue of whether the new variation of raise > is valid outside of an except block. My memory is that it was not to be > and I think it should not be. One advantage of the 'as' form is that it > is clear that raising the default as something else is invalid if there > is no default. I think that raise ... from None should be illegal outside an except block. My reasoning is: 1) It ensures that raise from None only occurs when the developer can see the old exception right there, and not "just in case". 2) I can't think of any use-cases for raise from None outside of an except block. 3) When in doubt, start with something more restrictive, because it is easier to loosen the restriction later if it turns out to be too much, than to change our mind and add the restriction afterwards. -- Steven From ethan at stoneleaf.us Sat Jan 28 01:33:21 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 27 Jan 2012 16:33:21 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> <4F22DA18.4050706@stoneleaf.us> Message-ID: <4F234251.8080708@stoneleaf.us> Terry Reedy wrote: > The PEP does not address the issue of whether the new variation of raise > is valid outside of an except block. My memory is that it was not to be > and I think it should not be. One advantage of the 'as' form is that it > is clear that raising the default as something else is invalid if there > is no default. Were you speaking of the original (PEP 3134), or this new one (PEP 409)? Because at this point it is possible to do: raise ValueError from NameError outside a try block. I don't see it as incredibly useful, but I don't know that it's worth making it illegal. 
So the question is: - should 'raise ... from ...' be legal outside a try block? - should 'raise ... from None' be legal outside a try block? ~Ethan~ From martin at v.loewis.de Sat Jan 28 01:53:40 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 28 Jan 2012 01:53:40 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <20120128013242.3334cf79@pitrou.net> References: <4F22489D.7080902@g.nevcal.com> <4F22EA7B.1050903@g.nevcal.com> <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> <20120128013242.3334cf79@pitrou.net> Message-ID: <20120128015340.Horde.kDNqQdjz9kRPI0cUH342vmA@webmail.df.eu> > How so? None of the patches did, but I think it was said several times > that other types (int, tuple, float) could also be converted to use > randomized hashes. What's more, there isn't any technical difficulty in > doing so. The challenge again is about incompatibility: the more types you apply this to, the higher the risk of breaking third-party code. Plus you still risk that the hash seed might leak out of the application, opening it up again to the original attack. From ncoghlan at gmail.com Sat Jan 28 02:04:09 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 11:04:09 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On Fri, Jan 27, 2012 at 11:48 PM, Matt Joiner wrote: > +0. I think the idea is right, and will help to get good quality > modules in at a faster rate. However it is compensating for a lack of > interface and packaging standardization in the 3rd party module world. No, it really isn't. virtualenv and pip already work *beautifully*, so long as you're in an environment where: 1. Due diligence isn't a problem 2. Network connectivity isn't a problem 3. You *already know* about virtual environments and the Python Package Index 4. 
You either don't need dependencies written in C, or the ones you need are written to compile cleanly under distutils and you aren't on Windows (because Microsoft consider building fully functional binaries from source to be an optional extra people should be charged for rather than a fundamental feature of an operating system). It would probably be worth adding a heading specifically countering this myth, though. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 28 02:13:06 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 11:13:06 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On Sat, Jan 28, 2012 at 3:26 AM, Alex wrote: > I think a significantly healthier process (in terms of maximizing feedback and > getting something into its best shape) is to let a project evolve naturally on > PyPI and in the ecosystem, give feedback to it from an inclusion perspective, > and then include it when it becomes ready on its own merits. The counter > argument to this is that putting it in the stdlib gets you significantly more > eyeballs (and hopefully more feedback, therefore); my only response to this is: > if it doesn't get eyeballs on PyPI I don't think there's a great enough need to > justify it in the stdlib. And what about a project like regex, which *has* the eyeballs on PyPI, but the core devs aren't confident enough of its maintainability yet to be happy about adding it directly to the stdlib with full backwards compatibility guarantees? The easy answer for us in that context is to just not add it (i.e. the status quo), which isn't a healthy outcome for the overall language ecosystem. Really, regex is the *reason* this PEP exists: we *know* we need to either replace or seriously enhance "re" (since its Unicode handling isn't up to scratch), but we're only *pretty sure* adding "regex" to the stdlib is the right answer.
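(One concrete illustration of the "re" Unicode gap mentioned here: the stdlib module only performs simple per-character case folding, so IGNORECASE cannot match "ß" against "ss"; full case folding is one of the things the third-party regex package adds.)

```python
# Stdlib re: simple per-character case folding only.
import re

# Simple folding works fine...
assert re.match(r"STRASSE", "strasse", re.IGNORECASE)
# ...but full Unicode case folding does not happen: "ß" never
# matches "ss" under re.IGNORECASE, so this finds no match.
assert re.match(r"strasse", "straße", re.IGNORECASE) is None
```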
Adding "__preview__.regex" instead gives us a chance to back out if we uncover serious problems (e.g. with the cross-platform support). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sat Jan 28 02:13:40 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 02:13:40 +0100 Subject: [Python-Dev] [issue13703] Hash collision security issue References: <4F22489D.7080902@g.nevcal.com> <4F22EA7B.1050903@g.nevcal.com> <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> <20120128013242.3334cf79@pitrou.net> <20120128015340.Horde.kDNqQdjz9kRPI0cUH342vmA@webmail.df.eu> Message-ID: <20120128021340.1983ccb9@pitrou.net> On Sat, 28 Jan 2012 01:53:40 +0100 martin at v.loewis.de wrote: > > > How so? None of the patches did, but I think it was said several times > > that other types (int, tuple, float) could also be converted to use > > randomized hashes. What's more, there isn't any technical difficulty in > > doing so. > > The challenge again is about incompatibility: the more types you apply this > to, the higher the risk of breaking third-party code. > > Plus you still risk that the hash seed might leak out of the application, > opening it up again to the original attack. Attacks on the hash seed are a different level of difficulty than sending a well-known universal payload to a Web site. Unless the application leaks hash() values directly, you have to guess them from the dict ordering observed in the application's output. IMHO it's ok if our hash function is vulnerable to cryptanalysts rather than script kiddies. Regards Antoine. From benjamin at python.org Sat Jan 28 02:19:58 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 27 Jan 2012 20:19:58 -0500 Subject: [Python-Dev] plugging the hash attack Message-ID: Hello everyone, In an effort to get a fix out before Perl 6 goes mainstream, Barry and I have decided to pronounce on what we want for our stable releases.
What we have decided is that 1. Simple hash randomization is the way to go. We think this has the best chance of actually fixing the problem while being fairly straightforward such that we're comfortable putting it in a stable release. 2. It will be off by default in stable releases and enabled by an envar at runtime. This will prevent code breakage from dictionary order changing as well as people depending on the hash stability. -- Regards, Benjamin From ncoghlan at gmail.com Sat Jan 28 02:27:35 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 11:27:35 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F230C82.9060703@pearwood.info> References: <4F230C82.9060703@pearwood.info> Message-ID: On Sat, Jan 28, 2012 at 6:43 AM, Steven D'Aprano wrote: > This PEP only makes sense if we assume that __preview__.spam and spam *will* > be different, even if only in minor ways, and that there might not even be a > spam. There should be no expectation that every __preview__ module must > graduate, or that every standard library module must go through __preview__. > If it is stable and uncontroversial, __preview__ adds nothing to the > process. Yes, the PEP already points to lzma as an example of a module with a sufficiently obvious API that it didn't need to go through a preview round. > Keeping __preview__.spam around after graduation is, I believe, actively > harmful. It adds complexity to the developer's decision-making process > ("Should I import spam from __preview__, or just import spam? What's the > difference?"). It gives a dangerous impression that code written for > __preview__.spam will still work for spam. Yes, this was exactly the reasoning behind removing the names from __preview__ namespace when the modules graduated. It sets a line in the sand: "An API compatibility break is not only allowed, it is 100% guaranteed. 
If you are not prepared to deal with this, then you are *not* part of the target audience for the __preview__ namespace. Wait until the module reaches the main section of the standard library before you start using it, or else download a third party supported version with backwards compatibility guarantees from PyPI. The __preview__ namespace is not designed for anything that requires long term support spanning multiple Python versions - it is intended for use in single version environments, such as intranet web services and student classrooms" > I would go further and suggest that __preview__ be explicitly called > __unstable__. If that name is scary, and it frightens some users off, good! Hmm, the problem with "unstable" is that we only mean the *API* is unstable. The software itself will be as thoroughly tested as everything else we ship. > I think the PEP needs more use-cases on who might use __preview__.spam, and > why. These come to my mind: > > * if you don't care about Python 3.x+1, then there is no reason not to > treat Python 3.x's __preview__.spam as stable; > > * rapid development proof-of-concept software ("build one to throw away") > can safely use __preview__.spam, since they are expected to be replaced > anyway; > > * one-use scripts; > > * use at the interactive interpreter; > > * any other time where forward-compatibility is not required. A specific list of use cases is a good idea. I'd add a couple more: * in a student classroom where the concept of PyPI and third party packages has yet to be introduced * for an intranet web service deployment where due diligence adds significant overhead to any use of third party packages > I am reminded of the long, often acrimonious arguments that took place on > Python-Dev a few years back about the API for the ipaddr library. A lot of > the arguments could have been short-circuited if we had said "putting ipaddr > into __preview__ does not constitute acceptance of its API".
Yep, there's a reason 'ipaddr' was high on the list of modules this could be used for :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From s.brunthaler at uci.edu Sat Jan 28 02:28:28 2012 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Fri, 27 Jan 2012 17:28:28 -0800 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: References: Message-ID: Hi, On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson wrote: > 2011/11/8 stefan brunthaler : >> How does that sound? > > I think I can hear real patches and benchmarks most clearly. > I spent the better part of my -20% time on implementing the work as "suggested". Please find the benchmarks attached to this email, I just did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched off the regular 3.3a0 default tip changeset 73977 shortly after your email. I do not have an official patch yet, but am going to create one if wanted. Changes to the existing interpreter are minimal, the biggest chunk is a new interpreter dispatch loop. Merging dispatch loops eliminates some of my optimizations, but my inline caching technique enables inlining some functionality, which results in visible speedups. The code is normalized to the non-threaded-code version of the CPython interpreter (named "vanilla"), so that I can reference it to my preceding results. I anticipate *no* compatibility issues and the interpreter requires less than 100 KiB of extra memory at run-time. Since my interpreter is using 215 of a maximum of 255 instructions, there is room for adding additional derivatives, e.g., for popular Python libraries, too. Let me know what python-dev thinks of this and have a nice weekend, --stefan PS: AFAIR the version without partial stack frame caching also passes all regression tests modulo the ones that test against specific bytecodes.
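The inline-caching and instruction-derivative idea Stefan describes can be illustrated with a toy sketch (purely illustrative; the names and structure below are invented for this example, not taken from his patch): a generic instruction rewrites itself into a type-specialized derivative after observing its operands once, and deoptimizes back if the assumption later breaks.

```python
# Toy sketch of "quickening" with inline caches in a bytecode-style
# interpreter: a generic ADD instruction rewrites itself into an
# int-specialized derivative once it observes int operands, and
# falls back to the generic form if that assumption breaks.

def op_add_generic(vm, i):
    b, a = vm.stack.pop(), vm.stack.pop()
    if type(a) is int and type(b) is int:
        vm.code[i] = op_add_int      # quicken: cache the observed case
    vm.stack.append(a + b)

def op_add_int(vm, i):
    b, a = vm.stack.pop(), vm.stack.pop()
    if type(a) is not int or type(b) is not int:
        vm.code[i] = op_add_generic  # type assumption failed: deoptimize
    vm.stack.append(a + b)

class ToyVM:
    def __init__(self, code, stack):
        self.code, self.stack = list(code), list(stack)

    def run(self):
        i = 0
        while i < len(self.code):
            self.code[i](self, i)    # dispatch, allowing self-rewriting
            i += 1
        return self.stack

vm = ToyVM([op_add_generic, op_add_generic], [1, 2, 3])
assert vm.run() == [6]
# Both instructions have been rewritten into their int derivatives:
assert vm.code == [op_add_int, op_add_int]
```

In the real interpreter the derivatives are extra opcodes in the dispatch loop rather than Python functions, which is why the 215-of-255 instruction budget mentioned above matters.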
-------------- next part -------------- currently processing: bench/binarytrees.py3.py phd-cpy-3a0-thr-cod-pytho arg: 10 | time: 0.161876 | stdev: 0.007780 | var: 0.000061 | mem: 6633.60 phd-cpy-3a0-thr-cod-pytho arg: 12 | time: 0.699243 | stdev: 0.019112 | var: 0.000365 | mem: 8142.67 phd-cpy-3a0-thr-cod-pytho arg: 14 | time: 3.388344 | stdev: 0.048042 | var: 0.002308 | mem: 13586.93 phd-cpy-pio-sne-pre-pyt-no-psf arg: 10 | time: 0.153875 | stdev: 0.003828 | var: 0.000015 | mem: 6873.73 phd-cpy-pio-sne-pre-pyt-no-psf arg: 12 | time: 0.632572 | stdev: 0.019121 | var: 0.000366 | mem: 8246.27 phd-cpy-pio-sne-pre-pyt-no-psf arg: 14 | time: 3.020988 | stdev: 0.043483 | var: 0.001891 | mem: 13640.27 phd-cpy-pio-sne-pre-pytho arg: 10 | time: 0.150942 | stdev: 0.005157 | var: 0.000027 | mem: 6901.87 phd-cpy-pio-sne-pre-pytho arg: 12 | time: 0.660841 | stdev: 0.020538 | var: 0.000422 | mem: 8286.80 phd-cpy-pio-sne-pre-pytho arg: 14 | time: 3.184198 | stdev: 0.051103 | var: 0.002612 | mem: 13680.40 phd-cpy-3a0-van-pytho arg: 10 | time: 0.202812 | stdev: 0.005480 | var: 0.000030 | mem: 6633.33 phd-cpy-3a0-van-pytho arg: 12 | time: 0.908456 | stdev: 0.015744 | var: 0.000248 | mem: 8153.07 phd-cpy-3a0-van-pytho arg: 14 | time: 4.364805 | stdev: 0.037522 | var: 0.001408 | mem: 13593.60 ### phd-cpy-3a0-thr-cod-pytho : 1.2887 (avg-sum: 1.416488) ### phd-cpy-pio-sne-pre-pyt-no-psf: 1.4383 (avg-sum: 1.269145) ### phd-cpy-pio-sne-pre-pytho : 1.3704 (avg-sum: 1.331994) ### phd-cpy-3a0-van-pytho : 1.0000 (avg-sum: 1.825358) currently processing: bench/fannkuch.py3.py phd-cpy-3a0-thr-cod-pytho arg: 8 | time: 0.172677 | stdev: 0.006620 | var: 0.000044 | mem: 6424.13 phd-cpy-3a0-thr-cod-pytho arg: 9 | time: 1.426755 | stdev: 0.035545 | var: 0.001263 | mem: 6425.20 phd-cpy-pio-sne-pre-pyt-no-psf arg: 8 | time: 0.168010 | stdev: 0.010277 | var: 0.000106 | mem: 6481.07 phd-cpy-pio-sne-pre-pyt-no-psf arg: 9 | time: 1.345817 | stdev: 0.033127 | var: 0.001097 | mem: 6479.60 
phd-cpy-pio-sne-pre-pytho arg: 8 | time: 0.165876 | stdev: 0.007136 | var: 0.000051 | mem: 6520.00 phd-cpy-pio-sne-pre-pytho arg: 9 | time: 1.351150 | stdev: 0.028822 | var: 0.000831 | mem: 6519.73 phd-cpy-3a0-van-pytho arg: 8 | time: 0.216146 | stdev: 0.012879 | var: 0.000166 | mem: 6419.07 phd-cpy-3a0-van-pytho arg: 9 | time: 1.834247 | stdev: 0.028224 | var: 0.000797 | mem: 6418.67 ### phd-cpy-3a0-thr-cod-pytho : 1.2820 (avg-sum: 0.799716) ### phd-cpy-pio-sne-pre-pyt-no-psf: 1.3544 (avg-sum: 0.756913) ### phd-cpy-pio-sne-pre-pytho : 1.3516 (avg-sum: 0.758513) ### phd-cpy-3a0-van-pytho : 1.0000 (avg-sum: 1.025197) currently processing: bench/fasta.py3.py phd-cpy-3a0-thr-cod-pytho arg: 50000 | time: 0.374023 | stdev: 0.010870 | var: 0.000118 | mem: 6495.07 phd-cpy-3a0-thr-cod-pytho arg: 100000 | time: 0.714577 | stdev: 0.024713 | var: 0.000611 | mem: 6495.47 phd-cpy-3a0-thr-cod-pytho arg: 150000 | time: 1.062866 | stdev: 0.040138 | var: 0.001611 | mem: 6496.27 phd-cpy-pio-sne-pre-pyt-no-psf arg: 50000 | time: 0.345621 | stdev: 0.022549 | var: 0.000508 | mem: 6551.87 phd-cpy-pio-sne-pre-pyt-no-psf arg: 100000 | time: 0.656174 | stdev: 0.031608 | var: 0.000999 | mem: 6551.60 phd-cpy-pio-sne-pre-pyt-no-psf arg: 150000 | time: 0.964326 | stdev: 0.046202 | var: 0.002135 | mem: 6552.13 phd-cpy-pio-sne-pre-pytho arg: 50000 | time: 0.381223 | stdev: 0.015771 | var: 0.000249 | mem: 6592.40 phd-cpy-pio-sne-pre-pytho arg: 100000 | time: 0.739112 | stdev: 0.035685 | var: 0.001273 | mem: 6591.60 phd-cpy-pio-sne-pre-pytho arg: 150000 | time: 1.080334 | stdev: 0.035524 | var: 0.001262 | mem: 6591.73 phd-cpy-3a0-van-pytho arg: 50000 | time: 0.417759 | stdev: 0.016483 | var: 0.000272 | mem: 6490.27 phd-cpy-3a0-van-pytho arg: 100000 | time: 0.788182 | stdev: 0.019665 | var: 0.000387 | mem: 6492.40 phd-cpy-3a0-van-pytho arg: 150000 | time: 1.187140 | stdev: 0.035640 | var: 0.001270 | mem: 6491.73 ### phd-cpy-3a0-thr-cod-pytho : 1.1123 (avg-sum: 0.717155) ### 
phd-cpy-pio-sne-pre-pyt-no-psf: 1.2172 (avg-sum: 0.655374) ### phd-cpy-pio-sne-pre-pytho : 1.0874 (avg-sum: 0.733556) ### phd-cpy-3a0-van-pytho : 1.0000 (avg-sum: 0.797694) currently processing: mandelbrot.py phd-cpy-3a0-thr-cod-pytho arg: 200 | time: 0.244281 | stdev: 0.009795 | var: 0.000096 | mem: 6424.13 phd-cpy-3a0-thr-cod-pytho arg: 400 | time: 0.861120 | stdev: 0.019812 | var: 0.000393 | mem: 6501.87 phd-cpy-3a0-thr-cod-pytho arg: 500 | time: 1.338883 | stdev: 0.029741 | var: 0.000885 | mem: 6730.67 phd-cpy-pio-sne-pre-pyt-no-psf arg: 200 | time: 0.220013 | stdev: 0.013307 | var: 0.000177 | mem: 6476.00 phd-cpy-pio-sne-pre-pyt-no-psf arg: 400 | time: 0.789915 | stdev: 0.028319 | var: 0.000802 | mem: 6566.00 phd-cpy-pio-sne-pre-pyt-no-psf arg: 500 | time: 1.180740 | stdev: 0.042762 | var: 0.001829 | mem: 6794.00 phd-cpy-pio-sne-pre-pytho arg: 200 | time: 0.218946 | stdev: 0.014494 | var: 0.000210 | mem: 6519.47 phd-cpy-pio-sne-pre-pytho arg: 400 | time: 0.767381 | stdev: 0.042411 | var: 0.001799 | mem: 6614.67 phd-cpy-pio-sne-pre-pytho arg: 500 | time: 1.162739 | stdev: 0.029852 | var: 0.000891 | mem: 6842.67 phd-cpy-3a0-van-pytho arg: 200 | time: 0.328553 | stdev: 0.009619 | var: 0.000093 | mem: 6419.60 phd-cpy-3a0-van-pytho arg: 400 | time: 1.202208 | stdev: 0.018670 | var: 0.000349 | mem: 6514.27 phd-cpy-3a0-van-pytho arg: 500 | time: 1.860382 | stdev: 0.036647 | var: 0.001343 | mem: 6712.93 ### phd-cpy-3a0-thr-cod-pytho : 1.3874 (avg-sum: 0.814761) ### phd-cpy-pio-sne-pre-pyt-no-psf: 1.5480 (avg-sum: 0.730223) ### phd-cpy-pio-sne-pre-pytho : 1.5780 (avg-sum: 0.716355) ### phd-cpy-3a0-van-pytho : 1.0000 (avg-sum: 1.130381) currently processing: bench/nbody.py3.py phd-cpy-3a0-thr-cod-pytho arg: 50000 | time: 0.907789 | stdev: 0.021787 | var: 0.000475 | mem: 6668.13 phd-cpy-3a0-thr-cod-pytho arg: 100000 | time: 1.788778 | stdev: 0.042285 | var: 0.001788 | mem: 6674.67 phd-cpy-3a0-thr-cod-pytho arg: 150000 | time: 2.666433 | stdev: 0.062115 | var: 0.003858 | 
mem: 6663.20 phd-cpy-pio-sne-pre-pyt-no-psf arg: 50000 | time: 0.789515 | stdev: 0.022475 | var: 0.000505 | mem: 6720.00 phd-cpy-pio-sne-pre-pyt-no-psf arg: 100000 | time: 1.525695 | stdev: 0.039957 | var: 0.001597 | mem: 6735.87 phd-cpy-pio-sne-pre-pyt-no-psf arg: 150000 | time: 2.283342 | stdev: 0.071985 | var: 0.005182 | mem: 6730.93 phd-cpy-pio-sne-pre-pytho arg: 50000 | time: 0.789915 | stdev: 0.012848 | var: 0.000165 | mem: 6771.47 phd-cpy-pio-sne-pre-pytho arg: 100000 | time: 1.563297 | stdev: 0.033950 | var: 0.001153 | mem: 6770.00 phd-cpy-pio-sne-pre-pytho arg: 150000 | time: 2.324945 | stdev: 0.050021 | var: 0.002502 | mem: 6768.93 phd-cpy-3a0-van-pytho arg: 50000 | time: 1.167939 | stdev: 0.025035 | var: 0.000627 | mem: 6666.80 phd-cpy-3a0-van-pytho arg: 100000 | time: 2.327478 | stdev: 0.047759 | var: 0.002281 | mem: 6666.93 phd-cpy-3a0-van-pytho arg: 150000 | time: 3.434881 | stdev: 0.066780 | var: 0.004460 | mem: 6666.67 ### phd-cpy-3a0-thr-cod-pytho : 1.2922 (avg-sum: 1.787667) ### phd-cpy-pio-sne-pre-pyt-no-psf: 1.5071 (avg-sum: 1.532851) ### phd-cpy-pio-sne-pre-pytho : 1.4814 (avg-sum: 1.559386) ### phd-cpy-3a0-van-pytho : 1.0000 (avg-sum: 2.310099) currently processing: bench/spectralnorm.py3.py phd-cpy-3a0-thr-cod-pytho arg: 100 | time: 0.267083 | stdev: 0.010964 | var: 0.000120 | mem: 6548.80 phd-cpy-3a0-thr-cod-pytho arg: 200 | time: 0.970060 | stdev: 0.023750 | var: 0.000564 | mem: 6539.20 phd-cpy-3a0-thr-cod-pytho arg: 300 | time: 2.160668 | stdev: 0.044157 | var: 0.001950 | mem: 6528.93 phd-cpy-pio-sne-pre-pyt-no-psf arg: 100 | time: 0.233081 | stdev: 0.007929 | var: 0.000063 | mem: 6611.87 phd-cpy-pio-sne-pre-pyt-no-psf arg: 200 | time: 0.837918 | stdev: 0.019807 | var: 0.000392 | mem: 6596.80 phd-cpy-pio-sne-pre-pyt-no-psf arg: 300 | time: 1.865183 | stdev: 0.028789 | var: 0.000829 | mem: 6616.40 phd-cpy-pio-sne-pre-pytho arg: 100 | time: 0.241614 | stdev: 0.006662 | var: 0.000044 | mem: 6647.60 phd-cpy-pio-sne-pre-pytho arg: 200 | time: 
0.870454 | stdev: 0.017455 | var: 0.000305 | mem: 6646.53 phd-cpy-pio-sne-pre-pytho arg: 300 | time: 1.969456 | stdev: 0.052760 | var: 0.002784 | mem: 6651.33 phd-cpy-3a0-van-pytho arg: 100 | time: 0.355088 | stdev: 0.007057 | var: 0.000050 | mem: 6545.07 phd-cpy-3a0-van-pytho arg: 200 | time: 1.335549 | stdev: 0.021511 | var: 0.000463 | mem: 6555.47 phd-cpy-3a0-van-pytho arg: 300 | time: 3.042990 | stdev: 0.032533 | var: 0.001058 | mem: 6599.87 ### phd-cpy-3a0-thr-cod-pytho : 1.3931 (avg-sum: 1.132603) ### phd-cpy-pio-sne-pre-pyt-no-psf: 1.6122 (avg-sum: 0.978727) ### phd-cpy-pio-sne-pre-pytho : 1.5361 (avg-sum: 1.027175) ### phd-cpy-3a0-van-pytho : 1.0000 (avg-sum: 1.577876) Overall performance: Interpreter: cpython-3.3a0-threaded-code/python : 1.129733 (speedup: 1.3004, counts: 510) Overall performance: Interpreter: cpython-pio-sneak-preview/python-no-psfc : 1.000752 (speedup: 1.4680, counts: 510) Overall performance: Interpreter: cpython-pio-sneak-preview/python : 1.036613 (speedup: 1.4172, counts: 510) Overall performance: Interpreter: cpython-3.3a0-vanilla/python : 1.469095 (speedup: 1.0000, counts: 510) From benjamin at python.org Sat Jan 28 02:31:46 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 27 Jan 2012 20:31:46 -0500 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: References: Message-ID: 2012/1/27 stefan brunthaler : > Hi, > > On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson wrote: >> 2011/11/8 stefan brunthaler : >>> How does that sound? >> >> I think I can hear real patches and benchmarks most clearly. >> > I spent the better part of my -20% time on implementing the work as > "suggested". Please find the benchmarks attached to this email, I just > did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched > off the regular 3.3a0 default tip changeset 73977 shortly after your > email. I do not have an official patch yet, but am going to create one > if wanted. 
Changes to the existing interpreter are minimal, the > biggest chunk is a new interpreter dispatch loop. > > Merging dispatch loops eliminates some of my optimizations, but my > inline caching technique enables inlining some functionality, which > results in visible speedups. The code is normalized to the > non-threaded-code version of the CPython interpreter (named > "vanilla"), so that I can reference it to my preceding results. I > anticipate *no* compatibility issues and the interpreter requires less > than 100 KiB of extra memory at run-time. Since my interpreter is > using 215 of a maximum of 255 instructions, there is room for adding > additional derivatives, e.g., for popular Python libraries, too. > > > Let me know what python-dev thinks of this and have a nice weekend, Cool. It'd be nice to see a patch. -- Regards, Benjamin From ncoghlan at gmail.com Sat Jan 28 02:37:37 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 11:37:37 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127175414.385567b6@resist.wooz.org> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> Message-ID: On Sat, Jan 28, 2012 at 8:54 AM, Barry Warsaw wrote: > I think the OS vendor problem is easier with an application that uses some > PyPI package, because I can always make that package available to the > application by pulling in the version I care about. It's harder if a newer, > incompatible version is released upstream and I want to provide both, but I > don't think __preview__ addresses that. A robust, standard approach to > versioning of modules would though, and I think would better solve what > __preview__ is trying to solve. I'd be A-OK with an explicit requirement that *any* module shipped in __preview__ must have a third-party supported multi-version compatible alternative on PyPI.
(PEP 2 actually pretty much says that should be the case, but making it mandatory in the __preview__ case would be a good idea). As an OS vendor, you'd then be able to say: "Don't use __preview__, since that will cause problems when we next upgrade the system Python. Use the PyPI version instead." Then the stdlib docs for that module (while it is in __preview__) would say "If you are able to easily use third party packages, package offers this API for multiple Python versions with stronger API stability guarantees. This preview version of the module is for use in environments that specifically target a single Python version and/or where the use of third party packages outside the standard library poses additional complications beyond simply downloading and installing the code." Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Sat Jan 28 02:48:10 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 27 Jan 2012 20:48:10 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> Message-ID: <20120127204810.7d27cd06@resist.wooz.org> On Jan 28, 2012, at 11:37 AM, Nick Coghlan wrote: >Then the stdlib docs for that module (while it is in __preview__) >would say "If you are able to easily use third party packages, package > offers this API for multiple Python versions with stronger API >stability guarantees. This preview version of the module is for use in >environments that specifically target a single Python version and/or >where the use of third party packages outside the standard library >poses additional complications beyond simply downloading and >installing the code." Would it be acceptable then for a distro to disable __preview__ or empty it out?
The thinking goes like this: if you would normally use an __preview__ module because you can't get approval to download some random package from PyPI, well then your distro probably could or should provide it, so get it from them. In fact, if the number of __preview__ modules is kept low, *and* PyPI equivalents were a requirement, then a distro vendor could just ensure those PyPI versions are available as distro packages outside of the __preview__ stdlib namespace (i.e. in their normal third-party namespace). Then folks developing on that platform could just use the distro package and ignore __preview__. If that's acceptable, then maybe it should be explicitly so in the PEP. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From martin at v.loewis.de Sat Jan 28 02:49:26 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 28 Jan 2012 02:49:26 +0100 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: Message-ID: <20120128024926.Horde.UqyzO9jz9kRPI1QmtrqG84A@webmail.df.eu> > 1. Simple hash randomization is the way to go. We think this has the > best chance of actually fixing the problem while being fairly > straightforward such that we're comfortable putting it in a stable > release. > 2. It will be off by default in stable releases and enabled by an > envar at runtime. This will prevent code breakage from dictionary > order changing as well as people depending on the hash stability. I think this is a good compromise given the widely varying assessments of the issue. 
Regards, Martin From barry at python.org Sat Jan 28 02:54:22 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 27 Jan 2012 20:54:22 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <4F230C82.9060703@pearwood.info> Message-ID: <20120127205422.17ab2ffe@resist.wooz.org> On Jan 28, 2012, at 11:27 AM, Nick Coghlan wrote: >* for an intranet web service deployment where due diligence adds >significant overhead to any use of third party packages Which really means that *we* are assuming the responsibility for this due diligence. And of course, we should not add anything to __preview__ without consent (and contributor agreement) of the upstream developers. -Barry From barry at python.org Sat Jan 28 02:58:40 2012 From: barry at python.org (Barry Warsaw) Date: Fri, 27 Jan 2012 20:58:40 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: <20120127205840.03fa50eb@resist.wooz.org> On Jan 28, 2012, at 11:13 AM, Nick Coghlan wrote: >Really, regex is the *reason* this PEP exists: we *know* we need to >either replace or seriously enhance "re" (since its Unicode handling >isn't up to scratch), but we're only *pretty sure* adding "regex" to >the stdlib is the right answer. Adding "__preview__.regex" instead >gives us a chance to back out if we uncover serious problems (e.g. >with the cross-platform support). I'd also feel much better about this PEP if we had specific ways to measure success. If, for example, regex were added to Python 3.3, but removed from 3.4 because we didn't get enough feedback about it, then I'd consider the approach put forward in this PEP to be a failure. Experiments that fail are *okay* of course, if they are viewed as experiments, there are clear metrics to measure their success, and we have the guts to end the experiment if it doesn't work out. Of course, if it's a resounding success, then that's fantastic too. 
-Barry From fuzzyman at voidspace.org.uk Sat Jan 28 03:02:02 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 28 Jan 2012 02:02:02 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F230C82.9060703@pearwood.info> References: <4F230C82.9060703@pearwood.info> Message-ID: <4F23571A.7000407@voidspace.org.uk> On 27/01/2012 20:43, Steven D'Aprano wrote: > Eli Bendersky wrote: >> Hello, >> >> Following an earlier discussion on python-ideas [1], we would like to >> propose the following PEP for review. Discussion is welcome. > > > I think you need to emphasize that modules in __preview__ are NOT > expected to have a forward-compatible, stable, API. This is a feature > of __preview__, not a bug, and I believe it is the most important > feature. > > I see responses to this PEP that assume that APIs will be stable, I didn't see responses like that - the *point* of this PEP is to allow an API we think *should* be in the standard library to stabilise and mature (that's how I see it anyway). There is a difference between "not yet stable" and "we will make huge gratuitous changes" though. We *might* make huge gratuitous changes, but only if they're really needed (meaning they're huge but not gratuitous I guess). > and that having a module fail to graduate out of __preview__ should be > an extraordinary event. I would say this will probably be the case. Once we add something there will be resistance to removing it and we shouldn't let things rot in __preview__ either. I would say failing to graduate would be the exception, although maybe not extraordinary.
The suggestion was that once spam has graduated from __preview__ into the stdlib, __preview__.spam should remain as an alias - so that code using it from __preview__ at least has a fighting chance of working. > This PEP only makes sense if we assume that __preview__.spam and spam > *will* be different, I disagree. Once there is a spam they should remain the same. __preview__ is for packages that haven't yet made it into the standard library - not a place for experimenting with APIs that are already there. > even if only in minor ways, and that there might not even be a spam. > There should be no expectation that every __preview__ module must > graduate, Graduate or die however. > or that every standard library module must go through __preview__. If > it is stable and uncontroversial, __preview__ adds nothing to the > process. > Sure. __preview__ is for things that *need* previewing. All the best, Michael Foord > Even when there are candidates for inclusion with relatively stable > APIs, like regex, we should *assume* that there will be API > differences between __preview__.regex and regex, simply because it is > less harmful to expect changes that don't eventuate than to expect > stability and be surprised by changes. > > This, I believe, rules out Antoine's suggestion that modules remain > importable from __preview__ even after graduation to a full member of > the standard library. We simply can't have all three of these > statements true at the same time: > > 1) regular standard library modules are expected to be backward > compatible > 2) __preview__ modules are not expected to be forward compatible > 3) __preview__.spam is an alias to regular standard library spam > > > At least one of them has to go. Since both 1) and 2) are powerful > features, and 3) is only a convenience, the obvious one to drop is 3). > I note that the PEP, as it is currently written, explicitly states > that __preview__.spam will be dropped when it graduates to spam.
This > is a good thing and should not be changed. > > Keeping __preview__.spam around after graduation is, I believe, > actively harmful. It adds complexity to the developer's > decision-making process ("Should I import spam from __preview__, or > just import spam? What's the difference?"). It gives a dangerous > impression that code written for __preview__.spam will still work for > spam. > > We should be discouraging simple-minded recipes like > > try: > import spam > except ImportError: > from __preview__ import spam > spam.foo(a, b, c) > > since they undermine the vital feature of __preview__ that the > signature and even the existence of spam.foo is subject to change. > > I would go further and suggest that __preview__ be explicitly called > __unstable__. If that name is scary, and it frightens some users off, > good! The last thing we want is when 3.4 comes around to have dozens > of bug reports along the line of "spam.foo() and > __preview__.spam.foo() have different function signatures and aren't > compatible". Of course they do. That's why __preview__.spam existed in > the first place, to allow the API to mature without the expectation > that it was already stable. > > Since __preview__.spam (or, as I would prefer, __unstable__.spam) and > spam cannot be treated as drop-in replacements, what is > __preview__.spam good for? Without a stable API, __preview__.spam is > not suitable for use in production applications that expect to run > under multiple versions of the standard library. > > I think the PEP needs more use-cases on who might use > __preview__.spam, and why. 
These come to my mind: > > > * if you don't care about Python 3.x+1, then there is no reason not to > treat Python 3.x's __preview__.spam as stable; > > * rapid development proof-of-concept software ("build one to throw away") > can safely use __preview__.spam, since they are expected to be replaced > anyway; > > * one-use scripts; > > * use at the interactive interpreter; > > * any other time where forward-compatibility is not required. > > > I am reminded of the long, often acrimonious arguments that took place > on Python-Dev a few years back about the API for the ipaddr library. A > lot of the arguments could have been short-circuited if we had said > "putting ipaddr into __preview__ does not constitute acceptance of its > API". > > (On the other hand, if __preview__ becomes used in the future for > library authors to fob-off criticism for 18 months in the hope it will > just be forgotten, then this will be a bad thing.) > > > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From fuzzyman at voidspace.org.uk Sat Jan 28 03:05:23 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 28 Jan 2012 02:05:23 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F230DB7.603@pearwood.info> References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> <4F230DB7.603@pearwood.info> Message-ID: <4F2357E3.602@voidspace.org.uk> On 27/01/2012 20:48, Steven D'Aprano wrote: > Eli Bendersky wrote: > >>> try: >>> from __preview__ import thing >>> except ImportError: >>> import thing >>> >>> So no need to target a very specific version of Python. >>> >> >> Yep, this is what I had in mind. And it appeared too trivial to place >> it in the PEP. > > Trivial and wrong. 
> > Since thing and __preview__.thing may have subtle, or major, API > differences, how do you use it? > No, potentially wrong in cases where the APIs are different. Even with the try...except ImportError dance around StringIO / cStringIO there are some API differences. But for a lot of use cases it works fine (simplejson and json aren't *identical*, but it works for most people). Michael > try: > result = thing.foo(a, b, c) + thing.bar(x) > except AttributeError: > # Must be the preview version > result = thing.foobar(a, c, b, x) > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From steve at pearwood.info Sat Jan 28 03:28:21 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Jan 2012 13:28:21 +1100 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: Message-ID: <4F235D45.7080707@pearwood.info> Benjamin Peterson wrote: > Hello everyone, > In effort to get a fix out before Perl 6 goes mainstream, Barry and I > have decided to pronounce on what we want for our stable releases. > What we have decided is that > 1. Simple hash randomization is the way to go. We think this has the > best chance of actually fixing the problem while being fairly > straightforward such that we're comfortable putting it in a stable > release. > 2. It will be off by default in stable releases and enabled by an > envar at runtime. This will prevent code breakage from dictionary > order changing as well as people depending on the hash stability. Do you have the expectation that it will become on by default in some future release? 
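The "envar" being pronounced on here ultimately shipped as PYTHONHASHSEED, with the value 0 meaning "randomization disabled" - that name and those semantics come from the released CPython fix, not from this thread. A minimal, runnable sketch of the toggle as it ended up working:

```python
import os
import subprocess
import sys

def child_hash(text, seed):
    # Spawn a fresh interpreter with PYTHONHASHSEED set and report
    # hash() of the given string, as seen by that child process.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env, text=True,
    )
    return int(out)

# A fixed seed makes string hashes reproducible across processes;
# seed "0" disables randomization entirely (the off-by-default case).
assert child_hash("spam", "42") == child_hash("spam", "42")
assert child_hash("spam", "0") == child_hash("spam", "0")
```

With PYTHONHASHSEED unset or set to "random", separate processes will generally disagree on hash("spam"), which is exactly the dictionary-order breakage the off-by-default decision is guarding against.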
-- Steven From benjamin at python.org Sat Jan 28 03:33:57 2012 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 27 Jan 2012 21:33:57 -0500 Subject: [Python-Dev] plugging the hash attack In-Reply-To: <4F235D45.7080707@pearwood.info> References: <4F235D45.7080707@pearwood.info> Message-ID: 2012/1/27 Steven D'Aprano : > Benjamin Peterson wrote: >> >> Hello everyone, >> In effort to get a fix out before Perl 6 goes mainstream, Barry and I >> have decided to pronounce on what we want for our stable releases. >> What we have decided is that >> 1. Simple hash randomization is the way to go. We think this has the >> best chance of actually fixing the problem while being fairly >> straightforward such that we're comfortable putting it in a stable >> release. >> 2. It will be off by default in stable releases and enabled by an >> envar at runtime. This will prevent code breakage from dictionary >> order changing as well as people depending on the hash stability. > > > Do you have the expectation that it will become on by default in some future > release? Yes, 3.3. The solution in 3.3 could even be one of the more sophisticated proposals we have today. -- Regards, Benjamin From guido at python.org Sat Jan 28 03:40:32 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Jan 2012 18:40:32 -0800 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: Message-ID: On Fri, Jan 27, 2012 at 5:19 PM, Benjamin Peterson wrote: > Hello everyone, > In effort to get a fix out before Perl 6 goes mainstream, Barry and I > have decided to pronounce on what we want for our stable releases. > What we have decided is that > 1. Simple hash randomization is the way to go. We think this has the > best chance of actually fixing the problem while being fairly > straightforward such that we're comfortable putting it in a stable > release. > 2. It will be off by default in stable releases and enabled by an > envar at runtime. 
This will prevent code breakage from dictionary > order changing as well as people depending on the hash stability. Okay, good call! -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Sat Jan 28 03:51:41 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Jan 2012 13:51:41 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F2357E3.602@voidspace.org.uk> References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> <4F230DB7.603@pearwood.info> <4F2357E3.602@voidspace.org.uk> Message-ID: <4F2362BD.10704@pearwood.info> Michael Foord wrote: > On 27/01/2012 20:48, Steven D'Aprano wrote: >> Eli Bendersky wrote: >> >>>> try: >>>> from __preview__ import thing >>>> except ImportError: >>>> import thing >>>> >>>> So no need to target a very specific version of Python. >>>> >>> >>> Yep, this is what I had in mind. And it appeared too trivial to place >>> it in the PEP. >> >> Trivial and wrong. >> >> Since thing and __preview__.thing may have subtle, or major, API >> differences, how do you use it? >> > No, potentially wrong in cases where the APIs are different. Even with > the try...except ImportError dance around StringIO / cStringIO there are > some API differences. But for a lot of use cases it works fine > (simplejson and json aren't *identical*, but it works for most people). Okay, granted, I accept your point. But I think we need to distinguish between these cases. In the case of StringIO and cStringIO, API compatibility is expected, and differences are either bugs or implementation differences that you shouldn't be relying on. In the case of the typical[1] __preview__ module, one of the motivations of adding it to __preview__ is to test the existing API. We should expect changes, even if in practice often there won't be. We might hope for no API changes, but we should plan for the case where there will be. 
And that rules out the "try import" dance for the typical __preview__ module. There may be modules which graduate and keep the same API. In those cases, people will quickly work out the import dance on their own; it's a very common idiom. But we shouldn't advertise it as the right way to deal with __preview__, since that implies the expectation of API stability, and we want to send the opposite message: __preview__ is the last time the API can change without a big song and dance, so be prepared for it to change. I'm with Nick on this one: if you're not prepared to change "from __preview__ import module" to "import module" in your app, then you certainly won't be prepared to deal with the potential API changes and you aren't the target audience for __preview__. [1] I am fully aware of the folly of referring to a "typical" example of something that doesn't exist yet -- Steven From eliben at gmail.com Sat Jan 28 04:11:14 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 28 Jan 2012 05:11:14 +0200 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F2362BD.10704@pearwood.info> References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> <4F230DB7.603@pearwood.info> <4F2357E3.602@voidspace.org.uk> <4F2362BD.10704@pearwood.info> Message-ID: >> No, potentially wrong in cases where the APIs are different. Even with the >> try...except ImportError dance around StringIO / cStringIO there are some >> API differences. But for a lot of use cases it works fine (simplejson and >> json aren't *identical*, but it works for most people). > > > > Okay, granted, I accept your point. > > But I think we need to distinguish between these cases. > > In the case of StringIO and cStringIO, API compatibility is expected, and > differences are either bugs or implementation differences that you shouldn't > be relying on. > I just recently ran into a compatibility issue with StringIO and cStringIO.
It's a good thing it's documented: "Another difference from the StringIO module is that calling StringIO() with a string parameter creates a read-only object. Unlike an object created without a string parameter, it does not have write methods. These objects are not generally visible. They turn up in tracebacks as StringI and StringO." But it did cause me a couple of minutes of grief until I found this piece in the docs and wrote a work-around. But no, even in the current stable stdlib, the "try import ... except import from elsewhere" trick doesn't "just work" for StringIO/cStringIO. And as far as I can understand this is documented, not a bug or some obscure implementation detail. My point is that if our users accept *this*, in the stable stdlib, I see no reason they won't accept the same happening between __preview__ and a graduated module, when they (hopefully) understand the intention of __preview__. Eli From turnbull at sk.tsukuba.ac.jp Sat Jan 28 05:44:52 2012 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 28 Jan 2012 13:44:52 +0900 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F22C268.40005@voidspace.org.uk> References: <20120127160934.2ad5e0bf@pitrou.net> <4F22C268.40005@voidspace.org.uk> Message-ID: <87aa58306z.fsf@uwakimon.sk.tsukuba.ac.jp> Michael Foord writes: > >> Assuming the module is then promoted to the standard library proper in > >> release ``3.X+1``, it will be moved to a permanent location in the library:: > >> > >> import example > >> > >> And importing it from ``__preview__`` will no longer work. > > Why not leave it accessible through __preview__ too? > > +1 Er, doesn't this contradict your point about using try: from __preview__ import spam except ImportError: import spam ?
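For contrast, the same fallback shape is harmless when applied to a pair of names that *do* promise API compatibility - the classic accelerated-module idiom. A runnable sketch (cPickle/pickle chosen purely as illustration):

```python
# The fallback idiom is safe here precisely because cPickle documents
# API compatibility with pickle -- the opposite of the __preview__ case,
# where the whole point is that the API may still change.
try:
    import cPickle as pickle  # Python 2 accelerated implementation
except ImportError:
    import pickle  # Python 3, where the accelerator is built in

data = pickle.loads(pickle.dumps({"spam": 1}))
assert data == {"spam": 1}
```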
I think it's a bad idea to introduce a feature that's *supposed* to break (in the sense of "make a break", ie, change the normal pattern) with every release and then try to avoid breaking (in the sense of "causing an unexpected failure") code written by people who don't want to follow the discipline of keeping up with changing APIs. If they want that stability, they should wait for the stable release. Modules should become unavailable from __preview__ as soon as they have a stable home. From stephen at xemacs.org Sat Jan 28 06:22:54 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 28 Jan 2012 14:22:54 +0900 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127175414.385567b6@resist.wooz.org> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> Message-ID: <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> Executive summary: If the promise to remove the module from __preview__ is credible (ie, strictly kept), then __preview__ will have a specific audience in those who want the stdlib candidate code and are willing to deal with a certain amount of instability in that code. (Whether that audience is big enough to be worth the effort of managing __preview__ is another question.) Barry Warsaw writes: > >> I agree with everything Alex said here. I don't necessarily disagree. But: > I can't argue with that, it's just that I don't think __preview__ > solves [the visibility] problem. I do disagree with that. I frequently refer to the library reference for modules that do what I need, but almost never to PyPI (my own needs are usually not very hard to program, but if there's a stdlib module it's almost surely far more general, robust, and tested than my special-case code would be; PyPI provides far less of a robustness guarantee than a stdlib candidate would). 
I don't know how big or important a use case this is, though I think that Antoine's point that a similar argument applies to those who develop software for their own internal use (like me, but they have actual standards for QA) is valid. > I think we'll just see folks using the unstable APIs and then > complaining when we remove them, even though they *know* *upfront* > that these APIs will go away. So maybe the Hon. Mr. Broytman would be willing to supply a form letter for those folks, too. "We promised to remove the module from __preview__, and we did. We warned you the API would be likely unstable, and it was. You have no complaint." would be the gist. > A robust, standard approach to versioning of modules would though, > and I think would better solve what __preview__ is trying to solve. I suspect that "robust, standard approach to versioning of modules" is an oxymoron. The semantics of "module version" from the point of view of application developers and users is very complex, and cannot be encapsulated in a linear sequence. The only reliable comparison that can be done on versions is equality (and Python knows that; that's why there is a stdlib bound to the core in the first place!) > I'm not so sure about that. If I were to actively try it, I'm not > sure how much motivation I'd have to rewrite key parts of my code > when an incompatible version gets promoted to the un__preview__d > stdlib. So use the old version of Python. You do that anyway. Or avoid APIs where you are unwilling to deal with more or less frequent changes. You do that anyway. And if you're motivated enough, use __preview__. I don't understand what you think you lose here. 
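A concrete middle ground between Barry's wish for module versioning and this skepticism is to probe for the API itself rather than compare version numbers. A sketch using names that exist in current CPython - the pattern, not these particular functions, is the point:

```python
import time

# Feature detection instead of version detection: prefer the newer
# monotonic clock if the module offers it, else fall back to time.time.
now = getattr(time, "monotonic", time.time)

t0 = now()
t1 = now()
assert t1 >= t0  # non-decreasing between these two immediate calls
```

The same probing works for a __preview__ module whose methods may be renamed between releases, at the cost of the explicitness that version pinning would give.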
From scott+python-dev at scottdial.com Sat Jan 28 06:09:13 2012 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sat, 28 Jan 2012 00:09:13 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127204810.7d27cd06@resist.wooz.org> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> Message-ID: <4F2382F9.90101@scottdial.com> On 1/27/2012 8:48 PM, Barry Warsaw wrote: > The thinking goes like this: if you would normally use an __preview__ module > because you can't get approval to download some random package from PyPI, well > then your distro probably could or should provide it, so get it from them. That is my thought about the entire __preview__ concept. Anything that would/should go into __preview__ would be better off being packaged for a couple of key distros (e.g., Ubuntu/Fedora/Gentoo) where they would get better visibility than just being on PyPI and would be more flexible in terms of release schedule to allow API changes. If the effort being put into making the __preview__ package was put into packaging those modules for distros, then you would get the same exposure with better flexibility and a better maintenance story. The whole idea of __preview__ seems to be a workaround for the difficult packaging story for Python modules on common distros -- stuffing them into __preview__ is a cheat to get the distro packagers to distribute these interesting modules since we would be bundling them. However, as you have pointed out, it would very desirable to them to not do so. So in the end, these modules may not receive as wide of visibility as the PEP suggests. I could very easily imagine the more stable distributions refusing or patching anything that used __preview__ in order to eliminate difficulties. 
-- Scott Dial scott at scottdial.com From stephen at xemacs.org Sat Jan 28 06:41:44 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 28 Jan 2012 14:41:44 +0900 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> <4F230DB7.603@pearwood.info> <4F2357E3.602@voidspace.org.uk> <4F2362BD.10704@pearwood.info> Message-ID: <877h0c2xk7.fsf@uwakimon.sk.tsukuba.ac.jp> Eli Bendersky writes: > My point is that if our users accept *this*, in the stable stdlib, I > see no reason they won't accept the same happening between __preview__ > and a graduated module, when they (hopefully) understand the intention > of __preview__. If it doesn't happen with sufficiently high frequency and annoyance factors to make attempting to use both the __preview__ and graduated versions in the same code base unacceptable to most users, then __preview__ is unnecessary, and the PEP should be rejected. From eliben at gmail.com Sat Jan 28 07:05:53 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 28 Jan 2012 08:05:53 +0200 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <877h0c2xk7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F22B694.6060909@freehackers.org> <4F22C1E8.6090500@voidspace.org.uk> <4F230DB7.603@pearwood.info> <4F2357E3.602@voidspace.org.uk> <4F2362BD.10704@pearwood.info> <877h0c2xk7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Jan 28, 2012 at 07:41, Stephen J. Turnbull wrote: > Eli Bendersky writes: > > My point is that if our users accept *this*, in the stable stdlib, I > > see no reason they won't accept the same happening between __preview__ > > and a graduated module, when they (hopefully) understand the intention > > of __preview__.
> > If it doesn't happen with sufficiently high frequency and annoyance > factors to make attempting to use both the __preview__ and graduated > versions in the same code base unacceptable to most users, then > __preview__ is unnecessary, and the PEP should be rejected. API differences such as changing one method to another (perhaps repeated over several methods) are unacceptable for stdlib modules. On the other hand, for a determined user importing from either __preview__ or the graduated version it's only a matter of a few lines in a conditional import. IMHO this is much preferable to having the module either external or in the stdlib, because that imposes another external dependency. But I think that the issue of keeping __preview__ in a later release is just an "implementation detail" of the PEP and shouldn't be seen as its main decision point. Eli From ncoghlan at gmail.com Sat Jan 28 07:31:41 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 16:31:41 +1000 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F234251.8080708@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> <4F22DA18.4050706@stoneleaf.us> <4F234251.8080708@stoneleaf.us> Message-ID: On Sat, Jan 28, 2012 at 10:33 AM, Ethan Furman wrote: > Because at this point it is possible to do: > >     raise ValueError from NameError > > outside a try block. I don't see it as incredibly useful, but I don't know > that it's worth making it illegal. > > So the question is: > > - should 'raise ... from ...' be legal outside a try block? > > - should 'raise ... from None' be legal outside a try block? Given that it would be quite a bit of work to make it illegal, my preference is to leave it alone. I believe that means there's only one open question. Should "raise ex from None" be syntactic sugar for: 1.
clearing the current thread's exception state (as I believe Ethan's patch currently does), thus meaning that __context__ and __cause__ both end up being None 2. setting __cause__ to None (so that __context__ still gets set normally, as it is now when __cause__ is set to a specific exception), and having __cause__ default to a *new* sentinel object that indicates "use __context__" I've already stated my own preference in favour of 2 - that approach means developers that think about it can explicitly change exception types such that the context isn't displayed by default, but application and framework developers remain free to insert their own exception handlers that *always* report the full exception stack. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 28 07:37:57 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 16:37:57 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127204810.7d27cd06@resist.wooz.org> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> Message-ID: On Sat, Jan 28, 2012 at 11:48 AM, Barry Warsaw wrote: > Would it be acceptable then for a distro to disable __preview__ or empty it > out? > > The thinking goes like this: if you would normally use an __preview__ module > because you can't get approval to download some random package from PyPI, well > then your distro probably could or should provide it, so get it from them. In > fact, if the number of __preview__ modules is kept low, *and* PyPI equivalents > were a requirement, then a distro vendor could just ensure those PyPI versions > are available as distro packages outside of the __preview__ stdlib namespace > (i.e. in their normal third-party namespace).
Then folks developing on that > platform could just use the distro package and ignore __preview__. > > If that's acceptable, then maybe it should be explicitly so in the PEP. I think that's an excellent idea - in that case, the distro vendor is taking over the due diligence responsibilities, which are the main point of __preview__. Similarly, sumo distributions like ActiveState or Python(x, y) could choose to add the PyPI version. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 28 08:10:22 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 17:10:22 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Jan 28, 2012 at 3:22 PM, Stephen J. Turnbull wrote: > Executive summary: > > If the promise to remove the module from __preview__ is credible (ie, > strictly kept), then __preview__ will have a specific audience in > those who want the stdlib candidate code and are willing to deal with > a certain amount of instability in that code. People need to remember there's another half to this equation: the core dev side. The reason *regex* specifically isn't in the stdlib already is largely due to (perhaps excessive) concerns about the potential maintenance burden. It's not a small chunk of code and we don't want to deal with another bsddb. That's the main roadblock to inclusion. Not lack of user demand. Not blindness to the problems with re. Just concerns about maintainability. Add to that some niggling concerns about backwards compatibility in obscure corner cases that may not be exercised by current users. And so we have an impasse.
Matthew has indicated he's happy to include it and maintain it as part of the core, but it hasn't really gone anywhere because we don't currently have a good way to address those maintainability concerns (aside from saying "you're worrying about it too much", which isn't what I would call persuasive). That's what __preview__ gives us: a way to deal with the *negative* votes that keep positive additions out of the standard library. Most of the PEP's arguments for due diligence etc are actually talking about why we want things in the standard library in the first place, rather than about __preview__ in particular. The core idea behind the __preview__ namespace is to allow *3* possible responses when a module is proposed for stdlib inclusion: 1. Yes, that's a good idea, we'll add it (cf. lzma for 3.3) 2. Maybe, so we'll add it to __preview__ for a release and see if it blows up in our face (hopefully at least regex for 3.3, maybe ipaddr and daemon as well) 3. No, not going to happen. Currently, anything where we would answer "2" ends up becoming a "3" by default, and that's not a good thing for the long-term health of the language. The reason this will be more effective in building core developer confidence than third party distribution via PyPI is due to a few different things: - we all run the test suite, so we get to see that the software builds and tests effectively - we know what our own buildbots cover, so we know it's passing on all those platforms - we'll get to see more of the related discussions in channels we monitor *anyway* (i.e. the bug tracker, python-dev) As far as the criteria for failing to graduate goes, I'd say something that ends up in __preview__ will almost always make it into the main part of the standard library, with the following exceptions: - excessive build process, test suite and buildbot instability. Whether this is due to fragile test cases or fragile code, we don't want to deal with another bsddb. 
If the test suite can't be stabilised over the course of an entire feature release, then the module would most likely be rejected rather than allowing it to graduate to the standard library. - strongly negative (or just plain confused) user feedback. We deal with feedback on APIs all the time. Sometimes we add new ones, or tweak the existing ones. Occasionally we'll judge them to be irredeemably broken and just plain remove them (cf. CObject, contextlib.nested, Bastion, rexec). This wouldn't change just because a module was in __preview__ - instead, we'd just have another option available to us (i.e. rejecting the module for stdlib inclusion post-preview rather than trying to fix it). Really, the main benefit for end users doesn't lie in __preview__ itself: it lies in the positive effect __preview__ will have on the long term evolution of the standard library, as it aims to turn python-dev's inherent conservatism (which is a good thing!) into a speed bump rather than a road block. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 28 08:13:28 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 17:13:28 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> Message-ID: On Sat, Jan 28, 2012 at 4:37 PM, Nick Coghlan wrote: > I think that's an excellent idea - in that case, the distro vendor is > taking over the due diligence responsibilities, which are the main > point of __preview__. Heh, contradicted myself in my next email. python-dev handling due diligence is a key benefit for *stdlib inclusion*, not __preview__ per se. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From anacrolix at gmail.com Sat Jan 28 08:42:42 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 28 Jan 2012 02:42:42 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On Fri, Jan 27, 2012 at 12:26 PM, Alex wrote: > I think a significantly healthier process (in terms of maximizing feedback and > getting something into its best shape) is to let a project evolve naturally on > PyPi and in the ecosystem, give feedback to it from an inclusion perspective, > and then include it when it becomes ready on its own merits. The counter > argument to this is that putting it in the stdlib gets you significantly more > eyeballs (and hopefully more feedback, therefore), my only response to this is: > if it doesn't get eyeballs on PyPi I don't think there's a great enough need to > justify it in the stdlib. Strongly agree. From anacrolix at gmail.com Sat Jan 28 08:49:40 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 28 Jan 2012 02:49:40 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: FWIW I'm now -1 for this idea. Stronger integration with PyPI and packaging systems is much preferable. Python core public releases are no place for testing. On Sat, Jan 28, 2012 at 2:42 AM, Matt Joiner wrote: > On Fri, Jan 27, 2012 at 12:26 PM, Alex wrote: >> I think a significantly healthier process (in terms of maximizing feedback and >> getting something into its best shape) is to let a project evolve naturally on >> PyPi and in the ecosystem, give feedback to it from an inclusion perspective, >> and then include it when it becomes ready on its own merits.
The counter >> argument to this is that putting it in the stdlib gets you significantly more >> eyeballs (and hopefully more feedback, therefore), my only response to this is: >> if it doesn't get eyeballs on PyPi I don't think there's a great enough need to >> justify it in the stdlib. > > Strongly agree. From raymond.hettinger at gmail.com Sat Jan 28 08:50:48 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 27 Jan 2012 23:50:48 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F2217D1.2000700@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> Message-ID: <4F38422F-C0CD-4E9E-83FC-960394830567@gmail.com> On Jan 26, 2012, at 7:19 PM, Ethan Furman wrote: > One of the open issues from PEP 3134 is suppressing context: currently there is no way to do it. This PEP proposes one. Thanks for proposing fixes to this issue. It is an annoying problem. Raymond From hodgestar+pythondev at gmail.com Sat Jan 28 08:58:15 2012 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Sat, 28 Jan 2012 09:58:15 +0200 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On Sat, Jan 28, 2012 at 9:49 AM, Matt Joiner wrote: > FWIW I'm now -1 for this idea. Stronger integration with PyPI and > packaging systems is much preferable. Python core public releases are > no place for testing. +1. I'd much rather just use the module from PyPI. It would be good to have a practical guide on how to manage the transition from third-party to core library module though. A PEP with a list of modules earmarked for upcoming inclusion in the standard library (and which Python version they're intended to be included in) might focus community effort on using, testing and fixing modules before they make it into core and fixing becomes a lot harder.
Schiavo Simon From anacrolix at gmail.com Sat Jan 28 09:08:33 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 28 Jan 2012 19:08:33 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: > +1. I'd much rather just use the module from PyPI. > > It would be good to have a practical guide on how to manage the > transition from third-party to core library module though. A PEP with > a list of modules earmarked for upcoming inclusion in the standard > library (and which Python version they're intended to be included in) > might focus community effort on using, testing and fixing modules > before they make it into core and fixing becomes a lot harder. +1 for your +1, and earmarking. That's the word I was looking for, and instead chose "advocacy". From stephen at xemacs.org Sat Jan 28 09:38:20 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 28 Jan 2012 17:38:20 +0900 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > People need to remember there's another half to this equation: the > core dev side. Why? There's nothing about it in the PEP. > The reason *regex* specifically isn't in the stdlib already is > largely due to (perhaps excessive) concerns about the potential > maintenance burden. 
But then giving regex as an example seems to contradict the PEP: "The only difference between preview APIs and the rest of the standard library is that preview APIs are explicitly exempted from the usual backward compatibility guarantees," "in principle, most modules in the __preview__ package should eventually graduate to the stable standard library," and "whenever the Python core development team decides that a new module should be included into the standard library, but isn't sure about whether the module's API is optimal". True, there were a few bits spilled on the possibility of being "without sufficient developer support to maintain it," but I read that as a risk that is basically a consequence of instability of the API. The rationale is entirely focused on API instability, and a focus on API instability is certainly the reason for calling it "__preview__" rather than "__experimental__". I don't have an opinion on whether this is an argument for rejecting the PEP or for rewriting it (specifically, seriously beefing up the "after trying it, maybe we won't want to maintain it" rationale). I also think that if "we need to try it to decide if the maintenance burden is acceptable" is a rationale, the name "__experimental__" should be seriously reconsidered as more accurately reflecting the intended content of the package. From ncoghlan at gmail.com Sat Jan 28 09:55:13 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 18:55:13 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On Sat, Jan 28, 2012 at 5:49 PM, Matt Joiner wrote: > FWIW I'm now -1 for this idea. Stronger integration with PyPI and > packaging systems is much preferable. Python core public releases are > no place for testing. People saying this: we KNOW this approach doesn't work in all cases. If it worked perfectly, regex would be in the standard library by now. 
Don't consider this PEP a purely theoretical proposal, because it isn't. It's really being put forward to solve a specific problem: the fact that we need to do something about re's lack of proper Unicode support [1]. Those issues are actually hard to solve, so replacing re with Matthew Barnett's regex module (just as re itself was a replacement for the original regex module) that already addresses most of them seems like a good way forward, but this is currently being blocked because there are still a few lingering concerns with maintainability and backwards compatibility. We *need* to break the impasse preventing its inclusion in the standard library, and __preview__ lets us do that without running roughshod over the legitimate core developer concerns raised in the associated tracker issue [2]. With the current criteria for stdlib inclusion, it doesn't *matter* if a module is oh-so-close to being accepted: it gets rejected anyway, just like a module that has no chance of ever being suitable. There is currently *no* path forward for resolving any stdlib-specific concerns that arise with already popular PyPI modules, and so such situations remain unresolved and key components of the standard library stagnate. While regex is the current poster-child for this problem, it's quite likely that similar problems will arise in the future. Kenneth Reitz's requests module is an obvious candidate: it's enormously popular with users, Kenneth has indicated he's amenable to the idea of stdlib inclusion once the feature set is sufficiently stable (i.e. not for 3.3), but I expect there will be legitimate concerns with incorporating it, given its scope. Cheers, Nick. 
[1] http://bugs.python.org/issue?%40search_text=&ignore=file%3Acontent&title=&%40columns=title&id=&%40columns=id&stage=&creation=&creator=tchrist&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search [2] http://bugs.python.org/issue2636 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 28 10:18:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jan 2012 19:18:01 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Jan 28, 2012 at 6:38 PM, Stephen J. Turnbull wrote: > I don't have an opinion on whether this is an argument for rejecting > the PEP or for rewriting it (specifically, seriously beefing up the > "after trying it, maybe we won't want to maintain it" rationale). I > also think that if "we need to try it to decide if the maintenance > burden is acceptable" is a rationale, the name "__experimental__" > should be seriously reconsidered as more accurately reflecting the > intended content of the package. I think it's an argument for rewriting it (and, as you point out, perhaps reverting to __experimental__ as the proposed name). Eli started from a draft I wrote a while back and my own thinking on the topic wasn't particularly clear (in fact, it's only this thread that has really clarified things for me).
The main thing I've realised is that the end user benefits currently discussed in the PEP are really about the importance of a robust *standard library*. They aren't specific to the new namespace at all - that part of the rationale is really only needed to counter the predictable "who cares about the standard library, we can just use PyPI!" responses (and the answer is, "lots of people that can't or won't use PyPI modules for a wide range of reasons"). The only reason to add a new double-underscore namespace is to address *core developer* concerns in cases where we're *almost* sure that we want to add the module to the standard library, but aren't quite prepared to commit to maintaining it for the life of the 3.x series (cf. 2.x and the ongoing problems we had with keeping the bsddb module working properly, especially before Jesus Cea stepped up to wrangle it into submission). It's basically us saying to Python users "We're explicitly flagging this PyPI module for inclusion in the next major Python release. We've integrated it into our build process, test suite and binary releases, so you don't even have to download it from PyPI in order to try it out, you can just import it from the __preview__ namespace (although you're still free to download it from PyPI if you prefer - in fact, if you need to support multiple Python versions, we actively recommend it!). There's still a small chance this module won't make the grade and will be dropped from the standard library entirely (that's why it's only a preview), but most likely it will move into the main part of the standard library with full backwards compatibility guarantees in the next release". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mark at hotpy.org Sat Jan 28 10:56:39 2012 From: mark at hotpy.org (Mark Shannon) Date: Sat, 28 Jan 2012 09:56:39 +0000 Subject: [Python-Dev] Python 3 optimizations, continued, continued again...
In-Reply-To: References: Message-ID: <4F23C657.9050501@hotpy.org> stefan brunthaler wrote: > Hi, > > On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson wrote: >> 2011/11/8 stefan brunthaler : >>> How does that sound? >> I think I can hear real patches and benchmarks most clearly. >> > I spent the better part of my -20% time on implementing the work as > "suggested". Please find the benchmarks attached to this email, I just Could you try benchmarking with the "standard" benchmarks: http://hg.python.org/benchmarks/ and see what sort of performance gains you get? > did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched > off the regular 3.3a0 default tip changeset 73977 shortly after your > email. I do not have an official patch yet, but am going to create one > if wanted. Changes to the existing interpreter are minimal, the > biggest chunk is a new interpreter dispatch loop. How portable is the threaded interpreter? Do you have a public repository for the code, so we can take a look? Cheers, Mark. From lukasz at langa.pl Sat Jan 28 11:37:53 2012 From: lukasz at langa.pl (=?iso-8859-2?Q?=A3ukasz_Langa?=) Date: Sat, 28 Jan 2012 11:37:53 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On 28 Jan 2012, at 08:58, Simon Cross wrote: > +1. I'd much rather just use the module from PyPI. > > It would be good to have a practical guide on how to manage the > transition from third-party to core library module though. A PEP with > a list of modules earmarked for upcoming inclusion in the standard > library (and which Python version they're intended to be included in) > might focus community effort on using, testing and fixing modules > before they make it into core and fixing becomes a lot harder. +1 -- Best regards, Łukasz Langa Senior Systems Architecture Engineer IT Infrastructure Department Grupa Allegro Sp. z o.o.
From fijall at gmail.com Sat Jan 28 13:21:18 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sat, 28 Jan 2012 14:21:18 +0200 Subject: [Python-Dev] Python 3 benchmarks Message-ID: Hi Something that's maybe worth mentioning is that the "official" python benchmark suite http://hg.python.org/benchmarks/ has a pretty incomplete set of benchmarks for python 3 compared to, say, what we run for pypy: https://bitbucket.org/pypy/benchmarks I think a very worthwhile project would be to try to port other benchmarks (that actually use existing python projects like sympy or django) for those that have been ported to python 3. Any thoughts? Cheers, fijal From p.f.moore at gmail.com Sat Jan 28 14:04:45 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 28 Jan 2012 13:04:45 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 28 January 2012 09:18, Nick Coghlan wrote: > It's basically us saying to Python users "We're explicitly flagging > this PyPI module for inclusion in the next major Python release. We've > integrated it into our build process, test suite and binary releases, > so you don't even have to download it from PyPI in order to try it > out, you can just import it from the __preview__ namespace (although > you're still free to download it from PyPI if you prefer - in fact, if > you need to support multiple Python versions, we actively recommend > it!). There's still a small chance this module won't make the grade > and will be dropped from the standard library entirely (that's why > it's only a preview), but most likely it will move into the main part > of the standard library with full backwards compatibility guarantees > in the next release". +1. Paul.
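[Editorial aside: the graduation path Nick describes implies a fallback import on the user's side, the same idiom quoted later in this thread. A minimal, hedged sketch follows; note that `__preview__` is only the namespace proposed by PEP 408 and exists in no released Python, so the fallback branch always runs here, and the helper name is ours, not the PEP's.]

```python
import importlib


def import_with_preview_fallback(name):
    """Import ``name`` from the proposed __preview__ namespace if present,
    otherwise fall back to its permanent top-level location.

    Sketch only: __preview__ was never shipped, so on any real Python the
    except branch is the one that executes.
    """
    try:
        return importlib.import_module("__preview__." + name)
    except ImportError:
        return importlib.import_module(name)


# regex was the PEP's poster child; the stdlib re module stands in here,
# since neither __preview__ nor regex ships with Python itself.
re_module = import_with_preview_fallback("re")
print(re_module.__name__)  # -> re
```

The point of the idiom is that code written against the preview location keeps working unchanged after the module graduates to its permanent home.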
From p.f.moore at gmail.com Sat Jan 28 14:13:09 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 28 Jan 2012 13:13:09 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127204810.7d27cd06@resist.wooz.org> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> Message-ID: On 28 January 2012 01:48, Barry Warsaw wrote: > The thinking goes like this: if you would normally use an __preview__ module > because you can't get approval to download some random package from PyPI, well > then your distro probably could or should provide it, so get it from them. In > fact, if the number of __preview__ modules is kept low, *and* PyPI equivalents > were a requirement, then a distro vendor could just ensure those PyPI versions > are available as distro packages outside of the __preview__ stdlib namespace > (i.e. in their normal third-party namespace). Then folks developing on that > platform could just use the distro package and ignore __preview__. Just so that you know that such cases exist, I am in a position where I have access to systems with (distro-supplied) Python installed. I can use anything supplied with Python (i.e., the stdlib - and __preview__ would fall into this category as well). And yet I have essentially no means of gaining access to any 3rd party modules, whether they are packaged by the distro or obtained from PyPI. (And "build your own" isn't an option in many cases, if only because a C compiler may well not be available!) This is essentially due to corporate inertia and bogged down "do-nothing" policies rather than due diligence or supportability concerns. But it is a reality for me (and many others, I suspect). Having said this, of course, the same corporate inertia means that Python 3.3 is a pipe-dream for me in those environments for many years yet. So ignoring them may be reasonable.
Just some facts to consider :-) Paul. From eric at trueblade.com Sat Jan 28 14:23:45 2012 From: eric at trueblade.com (Eric V. Smith) Date: Sat, 28 Jan 2012 08:23:45 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F23F6E1.8030607@trueblade.com> On 1/28/2012 2:10 AM, Nick Coghlan wrote: > On Sat, Jan 28, 2012 at 3:22 PM, Stephen J. Turnbull wrote: >> Executive summary: >> >> If the promise to remove the module from __preview__ is credible (ie, >> strictly kept), then __preview__ will have a specific audience in >> those who want the stdlib candidate code and are willing to deal with >> a certain amount of instability in that code. > > People need to remember there's another half to this equation: the > core dev side. > > The reason *regex* specifically isn't in the stdlib already is largely > due to (perhaps excessive) concerns about the potential maintenance > burden. It's not a small chunk of code and we don't want to deal with > another bsddb. ... > Really, the main benefit for end users doesn't lie in __preview__ > itself: it lies in the positive effect __preview__ will have on the > long term evolution of the standard library, as it aims to turn > python-dev's inherent conservatism (which is a good thing!) into a > speed bump rather than a road block. I was -0 on this proposal, but after Nick's discussion above I'm now +1. I also think it's worth thinking about how multiprocessing would have benefited from the __preview__ process. And for people saying "just use PyPI": that tends to exclude many Windows users from trying out packages that aren't pure Python. 
From anacrolix at gmail.com Sat Jan 28 14:55:09 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 29 Jan 2012 00:55:09 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> Message-ID: > __preview__ would fall into this category as well). And yet I have > essentially no means of gaining access to any 3rd party modules, > whether they are packaged by the distro or obtained from PyPI. (And > "build your own" isn't an option in many cases, if only because a C > compiler may well not be available!) This is essentially due to > corporate inertia and bogged down "do-nothing" policies rather than > due diligence or supportability concerns. But it is a reality for me > (and many others, I suspect). > > Having said this, of course, the same corporate inertia means that > Python 3.3 is a pipe-dream for me in those environments for many years > yet. So ignoring them may be reasonable. You clearly want access to external modules sooner. A preview namespace addresses this indirectly. The separated stdlib versioning concept is far superior for this use case. From hs at ox.cx Sat Jan 28 15:07:00 2012 From: hs at ox.cx (Hynek Schlawack) Date: Sat, 28 Jan 2012 15:07:00 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: <1224D6D3-E844-4862-A40F-1C1078551C32@ox.cx> Hi, On 27.01.2012 at 18:26, Alex wrote: > I'm -1 on this, for a pretty simple reason. Something goes into __preview__, > instead of its final destination directly, because it needs feedback/possibly > changes. However, given the release cycle of the stdlib (~18 months), any > feedback it gets can't be seen by actual users until it's too late. Essentially > you can only get one round of stdlib.
> > I think a significantly healthier process (in terms of maximizing feedback and > getting something into its best shape) is to let a project evolve naturally on > PyPI and in the ecosystem, give feedback to it from an inclusion perspective, > and then include it when it becomes ready on its own merits. The counter > argument to this is that putting it in the stdlib gets you significantly more > eyeballs (and hopefully more feedback, therefore); my only response to this is: > if it doesn't get eyeballs on PyPI I don't think there's a great enough need to > justify it in the stdlib. I agree with Alex on this: The iterations - even with PEP 407 - would be wayyy too long to be useful. As for the only downside: How about endorsing certain PyPI projects as possible future additions in order to give them more exposure? I'm sure there is some nice way for that. Plus: Everybody could pin the version their code depends on right now, so updates wouldn't break anything. I.e. API users would have more peace of mind and API developers could develop more aggressively. Bye, -h From fuzzyman at voidspace.org.uk Sat Jan 28 15:59:22 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 28 Jan 2012 14:59:22 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F240D4A.1080304@voidspace.org.uk> On 28/01/2012 13:04, Paul Moore wrote: > On 28 January 2012 09:18, Nick Coghlan wrote: > >> It's basically us saying to Python users "We're explicitly flagging >> this PyPI module for inclusion in the next major Python release.
We've >> integrated it into our build process, test suite and binary releases, >> so you don't even have to download it from PyPI in order to try it >> out, you can just import it from the __preview__ namespace (although >> you're still free to download it from PyPI if you prefer - in fact, if >> you need to support multiple Python versions, we actively recommend >> it!). There's still a small chance this module won't make the grade >> and will be dropped from the standard library entirely (that's why >> it's only a preview), but most likely it will move into the main part >> of the standard library with full backwards compatibility guarantees >> in the next release". > +1. Yep, nice way of putting it - and summing up the virtues of the approach. (Although I might say "most likely it will move into the main part of the standard library with full backwards compatibility guarantees in a future release".) Michael > > Paul. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From fuzzyman at voidspace.org.uk Sat Jan 28 16:12:45 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 28 Jan 2012 15:12:45 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120127175414.385567b6@resist.wooz.org> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> Message-ID: <4F24106D.5060604@voidspace.org.uk> On 27/01/2012 22:54, Barry Warsaw wrote: > On Jan 27, 2012, at 10:48 PM, Antoine Pitrou wrote: > >> On Fri, 27 Jan 2012 16:10:51 -0500 >> Barry Warsaw wrote: >>> I'm -1 on this as well. It just feels like the completely wrong way to >>> stabilize an API, and I think despite the caveats that are explicit in >>> __preview__, Python will just catch tons of grief from users and haters about >>> API instability anyway, because from a practical standpoint, applications >>> written using __preview__ APIs *will* be less stable. >> Well, obviously __preview__ is not for the most conservative users. I >> think the name clearly conveys the idea that you are trying out >> something which is not in its definitive state, doesn't it? > Maybe. I could quibble about the name, but let's not bikeshed on that > right now. The problem as I see it is that __preview__ will be very tempting > to use in production. In fact, its use case is almost predicated on that. > (We want you to use it so you can tell us if the API is good.) > > Once people use it, they will probably ship code that relies on it, and then > the pressure will be applied to us to continue to support that API even if a > newer, better one gets promoted out of __preview__. I worry that over time, > for all practical purposes, there won't be much difference between __preview__ > and the stdlib. 
> >>>> I think a significantly healthier process (in terms of maximizing feedback >>>> and getting something into its best shape) is to let a project evolve >>>> naturally on PyPI and in the ecosystem, give feedback to it from an inclusion >>>> perspective, and then include it when it becomes ready on its own >>>> merits. The counter argument to this is that putting it in the stdlib gets >>>> you significantly more eyeballs (and hopefully more feedback, therefore); my >>>> only response to this is: if it doesn't get eyeballs on PyPI I don't think >>>> there's a great enough need to justify it in the stdlib. >>> I agree with everything Alex said here. >> The idea that being on PyPI is sufficient is nice but flawed (the >> IPaddr example). PyPI doesn't guarantee any visibility (how many >> packages are there?). Furthermore, having users is not a guarantee that >> the API is appropriate, either; it just means that the API is >> appropriate for *some* users. > I can't argue with that, it's just that I don't think __preview__ solves that > problem. And it seems to me that __preview__ introduces a whole 'nother set > of problems on top of that. > > So taking the IPaddr example further. Would having it in the stdlib, > relegated to an explicitly unstable API part of the stdlib, increase eyeballs > enough to generate the kind of API feedback we're looking for, without > imposing an additional maintenance burden on us? I think the answer is yes. That's kind of the crux of the matter I guess. > If you were writing an app > that used something in __preview__, how would you provide feedback on what > parts of the API you'd want to change, The bugtracker. > *and* how would you adapt your > application to use those better APIs once they became available 18 months from > now? How do users do it for the standard library? Using the third party version is one way.
> I think we'll just see folks using the unstable APIs and then > complaining when we remove them, even though they *know* *upfront* that these > APIs will go away. > > I'm also nervous about it from an OS vendor point of view. Should I reject > any applications that import from __preview__? Or do I have to make a > commitment to support those APIs longer than Python does because the > application that uses it is important to me? > > I think the OS vendor problem is easier with an application that uses some > PyPI package, because I can always make that package available to the > application by pulling in the version I care about. It's harder if a newer, > incompatible version is released upstream and I want to provide both, but I > don't think __preview__ addresses that. A robust, standard approach to > versioning of modules would though, and I think would better solve what > __preview__ is trying to solve. Don't OS vendors go further and say "pin your dependency to the version we ship", whether it's in the Python standard library or not? So "just use a more recent version from PyPI" is explicitly not an option for people using system packages. As OS packagers tend to target a specific version of python, using __preview__ for that version would be fine - and when they upgrade to the next version applications may need fixing in the same way as they would if the system packaged a new release of the third party library. (When moving between Ubuntu distributions I've found that my software using system packages often needs to change because the version of some library has now changed.) Plus having a package in __preview__ has no bearing on whether or not the system packages the third party version, so I think it's a bit of a red herring. Michael >> On the other hand, __preview__ would clearly signal that something is >> on the verge of being frozen as an official stdlib API, and would >> prompt people to actively try it. > I'm not so sure about that.
If I were to actively try it, I'm not sure how > much motivation I'd have to rewrite key parts of my code when an incompatible > version gets promoted to the un__preview__d stdlib. > > -Barry From fuzzyman at voidspace.org.uk Sat Jan 28 17:05:11 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 28 Jan 2012 16:05:11 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F2382F9.90101@scottdial.com> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> <4F2382F9.90101@scottdial.com> Message-ID: <4F241CB7.9090501@voidspace.org.uk> On 28/01/2012 05:09, Scott Dial wrote: > On 1/27/2012 8:48 PM, Barry Warsaw wrote: >> The thinking goes like this: if you would normally use an __preview__ module >> because you can't get approval to download some random package from PyPI, well >> then your distro probably could or should provide it, so get it from them. > That is my thought about the entire __preview__ concept. Anything that > would/should go into __preview__ would be better off being packaged for > a couple of key distros (e.g., Ubuntu/Fedora/Gentoo) where they would > get better visibility than just being on PyPI and would be more flexible > in terms of release schedule to allow API changes.
> > If the effort being put into making the __preview__ package was put into > packaging those modules for distros, That effort wouldn't be put in though. Largely those involved in working on Python are not the ones packaging for Linux distributions. So it isn't an alternative to __preview__ - it could happily be done alongside it though. Those who work on Python won't just switch to Linux if this proposal isn't accepted, they'll do different work on Python instead. > then you would get the same > exposure Packaging libraries for Linux gets you no exposure on Windows or the Mac, so __preview__ is wider. > with better flexibility and a better maintenance story. The > whole idea of __preview__ seems to be a workaround for the difficult > packaging story for Python modules on common distros I don't know where you got that impression. :-) One of the reasons for __preview__ is that it means integrating libraries with the Python build and test systems, for all platforms. Packaging for [some-variants-of] Linux only doesn't do anything for this. All the best, Michael > -- stuffing them > into __preview__ is a cheat to get the distro packagers to distribute > these interesting modules since we would be bundling them. > > However, as you have pointed out, it would very desirable to them to not > do so. So in the end, these modules may not receive as wide of > visibility as the PEP suggests. I could very easily imagine the more > stable distributions refusing or patching anything that used __preview__ > in order to eliminate difficulties. > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From fuzzyman at voidspace.org.uk Sat Jan 28 17:09:08 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 28 Jan 2012 16:09:08 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> Message-ID: <4F241DA4.6080607@voidspace.org.uk> On 28/01/2012 13:55, Matt Joiner wrote: >> __preview__ would fall into this category as well). And yet I have >> essentially no means of gaining access to any 3rd party modules, >> whether they are packaged by the distro or obtained from PyPI. (And >> "build your own" isn't an option in many cases, if only because a C >> compiler may well not be available!) This is essentially due to >> corporate inertia and bogged down "do-nothing" policies rather than >> due diligence or supportability concerns. But it is a reality for me >> (and many others, I suspect). >> >> Having said this, of course, the same corporate inertia means that >> Python 3.3 is a pipe-dream for me in those environments for many years >> yet. So ignoring them may be reasonable. > You clearly want access to external modules sooner. A preview > namespace addresses this indirectly. The separated stdlib versioning > concept is far superior for this use case. There are two proposals for the standard library - one is to do development in a separate repository to make it easier for other implementations to contribute. To my understanding this proposal is mildly controversial, but doesn't involve changing the way the standard library is distributed or versioned. A separate proposal about standard library versioning has been floated but is *much* more controversial and therefore much less likely to happen. So I wouldn't hold your breath on it...
All the best, Michael Foord From fuzzyman at voidspace.org.uk Sat Jan 28 17:12:47 2012 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 28 Jan 2012 16:12:47 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <87aa58306z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120127160934.2ad5e0bf@pitrou.net> <4F22C268.40005@voidspace.org.uk> <87aa58306z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F241E7F.2070206@voidspace.org.uk> On 28/01/2012 04:44, Stephen J. Turnbull wrote: > Michael Foord writes: > > > >> Assuming the module is then promoted to the standard library proper in > > >> release ``3.X+1``, it will be moved to a permanent location in the library:: > > >> > > >> import example > > >> > > >> And importing it from ``__preview__`` will no longer work. > > > Why not leave it accessible through __preview__ too? > > > > +1 > > Er, doesn't this contradict your point about using > > try: > from __preview__ import spam > except ImportError: > import spam > > ? > > I think it's a bad idea to introduce a feature that's *supposed* to > break (in the sense of "make a break", ie, change the normal pattern) > with every release and then try to avoid breaking (in the sense of > "causing an unexpected failure") code written by people who don't want > to follow the discipline of keeping up with changing APIs. If they > want that stability, they should wait for the stable release.
> > Modules should become unavailable from __preview__ as soon as they > have a stable home. > I like not breaking people's code where *possible*. Michael -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From storchaka at gmail.com Sat Jan 28 11:18:34 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 28 Jan 2012 12:18:34 +0200 Subject: [Python-Dev] Hashing proposal: 64-bit hash In-Reply-To: <4F231255.3050106@sievertsen.de> References: <4F231255.3050106@sievertsen.de> Message-ID: On 27.01.12 23:08, Frank Sievertsen wrote: >> As already mentioned, the vulnerability of 64-bit Python is rather >> theoretical, not practical. The size of the hash makes the attack >> extremely unlikely. > > Unfortunately this assumption is not correct. It works very well with > 64-bit hashing. > > It's much harder to create (efficiently) 64-bit hash-collisions. > But I managed to do so and created strings with > a length of 16 (6-bit)-characters (a-z, A-Z, 0-9, _, .). Even > 14 characters would have been enough. > > You need less than twice as many characters for the same effect as in > the 32bit-world. The point is not the length of the string, but the size of the string space that must be searched. To find a string with a specified 64-bit hash you have to iterate over 2 ** 64 strings. Spending 1 nanosecond per string (a very optimistic estimate), it would take 2 ** 64 / 1e9 / (3600 * 24 * 365.25) = 585 years. For the attack we need to find 1000 such strings -- more than half a million years. For a 32-bit hash, only an hour would be needed. Of course, the hash must be computed with a secure function that does not allow "cutting corners" to reduce the computation time.
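Serhiy's back-of-the-envelope figures are easy to check; this short sketch just reproduces his arithmetic (it models nothing about any concrete attack):

```python
# Brute-force cost of finding a string with one specific 64-bit hash,
# at an (optimistic) scan rate of one candidate string per nanosecond.
tries = 2 ** 64
seconds = tries / 1e9
years = seconds / (3600 * 24 * 365.25)
print(round(years))          # roughly 585 years per colliding string

# An effective attack needs on the order of 1000 colliding strings.
print(round(1000 * years))   # more than half a million years

# The same search for all 1000 strings against a 32-bit hash:
hours = 1000 * (2 ** 32 / 1e9) / 3600
print(round(hours, 1))       # about 1.2 hours
```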
From solipsis at pitrou.net Sat Jan 28 15:30:30 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 15:30:30 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: Message-ID: <20120128153030.763132ec@pitrou.net> On Sat, 28 Jan 2012 02:49:40 -0500 Matt Joiner wrote: > FWIW I'm now -1 for this idea. Stronger integration with PyPI and > packaging systems is much preferable. That will probably never happen. "pip install XXX" is the best we (python-dev and the community) can do. "import some_module" won't magically start fetching some_module from PyPI if it isn't installed on your system. So the bottom line is: we would benefit from an intermediate status between "available on PyPI" and "shipped as a stable API in the stdlib". The __preview__ proposal does just that in a useful way; are there any alternatives you'd like to suggest? Regards Antoine. From solipsis at pitrou.net Sat Jan 28 15:17:17 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 15:17:17 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <20120127204810.7d27cd06@resist.wooz.org> <4F2382F9.90101@scottdial.com> Message-ID: <20120128151717.7ddc33ea@pitrou.net> On Sat, 28 Jan 2012 00:09:13 -0500 Scott Dial wrote: > On 1/27/2012 8:48 PM, Barry Warsaw wrote: > > The thinking goes like this: if you would normally use an __preview__ module > > because you can't get approval to download some random package from PyPI, well > > then your distro probably could or should provide it, so get it from them. > > That is my thought about the entire __preview__ concept.
Anything that > would/should go into __preview__ would be better off being packaged for > a couple of key distros (e.g., Ubuntu/Fedora/Gentoo) where they would > get better visibility than just being on PyPI and would be more flexible > in terms of release schedule to allow API changes. This is a red herring. First, not everyone uses a distro. There are almost a million monthly downloads of the Windows installers. Second, what a distro puts in their packages has nothing to do with considering a module for inclusion in the Python stdlib. Besides, I don't understand how being packaged by a distro makes a difference. My distro has thousands of packages, many of them quite obscure. OTOH, being shipped in the stdlib *and* visibly documented on python.org (in the stdlib docs, in the what's new, etc.) will make a difference. Regards Antoine. From guido at python.org Sat Jan 28 18:15:15 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 28 Jan 2012 09:15:15 -0800 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Jan 28, 2012 at 5:04 AM, Paul Moore wrote: > On 28 January 2012 09:18, Nick Coghlan wrote: > >> It's basically us saying to Python users "We're explicitly flagging >> this PyPI module for inclusion in the next major Python release. We've >> integrated it into our build process, test suite and binary releases, >> so you don't even have to download it from PyPI in order to try it >> out, you can just import it from the __preview__ namespace (although >> you're still free to download it from PyPI if you prefer - in fact, if >> you need to support multiple Python versions, we actively recommend >> it!). 
There's still a small chance this module won't make the grade >> and will be dropped from the standard library entirely (that's why >> it's only a preview), but most likely it will move into the main part >> of the standard library with full backwards compatibility guarantees >> in the next release". > > +1. Hm. You could do this just as well without a __preview__ package. You just flag the module as experimental in the docs and get on with your life. We have some experience with this in Google App Engine. We used to use a separate "labs" package in our namespace and when packages were deemed stable enough they were moved from labs to non-labs. But the move always turned out to be a major pain, causing more breakage than we would have had if we had simply kept the package location the same but let the API mutate. Now we just put new, experimental packages in the right place from the start, and put a loud "experimental" banner on all pages of their docs, which is removed once the API is stable. There is much less pain now: while incompatible changes do happen for experimental package, they are not frequent, and rarely earth-shattering, and usually the final step is simply removing the banner without making any (incompatible) changes to the code. This means that the final step is painless for early adopters, thereby rewarding them for their patience instead of giving them one final kick while they sort out the import changes. So I do not support the __preview__ package. I think we're better off flagging experimental modules in the docs than in their name. For the specific case of the regex module, the best way to adoption may just be to include it in the stdlib as regex and keep it there. Any other solution will just cause too much anxiety. 
-- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Sat Jan 28 18:54:38 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 28 Jan 2012 18:54:38 +0100 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: Message-ID: Am 28.01.2012 02:19, schrieb Benjamin Peterson: > Hello everyone, > In effort to get a fix out before Perl 6 goes mainstream, Barry and I > have decided to pronounce on what we want for our stable releases. > What we have decided is that > 1.
Simple hash randomization is the way to go. We think this has the > best chance of actually fixing the problem while being fairly > straightforward such that we're comfortable putting it in a stable > release. > 2. It will be off by default in stable releases and enabled by an > envar at runtime. This will prevent code breakage from dictionary > order changing as well as people depending on the hash stability. FWIW, the same will be done for 3.2. Georg From barry at python.org Sat Jan 28 19:14:36 2012 From: barry at python.org (Barry Warsaw) Date: Sat, 28 Jan 2012 13:14:36 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120128131436.0179513d@resist.wooz.org> On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote: >So I do not support the __preview__ package. I think we're better off >flagging experimental modules in the docs than in their name. For the >specific case of the regex module, the best way to adoption may just >be to include it in the stdlib as regex and keep it there. Any other >solution will just cause too much anxiety. +1 What does the PEP give you above this "simple as possible" solution? 
-Barry From solipsis at pitrou.net Sat Jan 28 19:29:49 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 19:29:49 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> Message-ID: <20120128192949.68f07267@pitrou.net> On Sat, 28 Jan 2012 13:14:36 -0500 Barry Warsaw wrote: > On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote: > > >So I do not support the __preview__ package. I think we're better off > >flagging experimental modules in the docs than in their name. For the > >specific case of the regex module, the best way to adoption may just > >be to include it in the stdlib as regex and keep it there. Any other > >solution will just cause too much anxiety. > > +1 > > What does the PEP give you above this "simple as possible" solution? "I think we'll just see folks using the unstable APIs and then complaining when we remove them, even though they *know* *upfront* that these APIs will go away." That problem would be much worse if some modules were simply marked "experimental" in the doc, rather than put in a separate namespace. You will see people copying recipes found on the internet without knowing that they rely on unstable APIs. Regards Antoine. 
From solipsis at pitrou.net Sat Jan 28 19:39:08 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 19:39:08 +0100 Subject: [Python-Dev] Python 3 benchmarks References: Message-ID: <20120128193908.3dd8d9fc@pitrou.net> On Sat, 28 Jan 2012 14:21:18 +0200 Maciej Fijalkowski wrote: > Hi > > Something that's maybe worth mentioning is that the "official" python > benchmark suite http://hg.python.org/benchmarks/ has a pretty > incomplete set of benchmarks for python 3 compared to say what we run > for pypy: https://bitbucket.org/pypy/benchmarks I think a very > worthwhile project would be to try to port other benchmarks (that > actually use existing python projects like sympy or django) for those > that has been ported to python 3. Agreed. cheers Antoine. From mwm at mired.org Sat Jan 28 19:46:18 2012 From: mwm at mired.org (Mike Meyer) Date: Sat, 28 Jan 2012 10:46:18 -0800 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120128192949.68f07267@pitrou.net> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> <20120128192949.68f07267@pitrou.net> Message-ID: <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> Antoine Pitrou wrote: >On Sat, 28 Jan 2012 13:14:36 -0500 >Barry Warsaw wrote: >> On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote: >> >> >So I do not support the __preview__ package. I think we're better >off >> >flagging experimental modules in the docs than in their name. For >the >> >specific case of the regex module, the best way to adoption may just >> >be to include it in the stdlib as regex and keep it there. Any other >> >solution will just cause too much anxiety. >> >> +1 >> >> What does the PEP give you above this "simple as possible" solution? 
> >"I think we'll just see folks using the unstable APIs and then >complaining when we remove them, even though they *know* *upfront* that >these APIs will go away." > >That problem would be much worse if some modules were simply marked >"experimental" in the doc, rather than put in a separate namespace. >You will see people copying recipes found on the internet without >knowing that they rely on unstable APIs. How about doing them the way we do deprecated modules, and have them spit warnings to stderr? Maybe add a flag and environment variable to disable that. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From solipsis at pitrou.net Sat Jan 28 19:49:01 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 19:49:01 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> <20120128192949.68f07267@pitrou.net> <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> Message-ID: <1327776541.8904.5.camel@localhost.localdomain> On Saturday, 28 January 2012, at 10:46 -0800, Mike Meyer wrote: > Antoine Pitrou wrote: > > >On Sat, 28 Jan 2012 13:14:36 -0500 > >Barry Warsaw wrote: > >> On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote: > >> > >> >So I do not support the __preview__ package. I think we're better > >off > >> >flagging experimental modules in the docs than in their name. For > >the > >> >specific case of the regex module, the best way to adoption may just > >> >be to include it in the stdlib as regex and keep it there. Any other > >> >solution will just cause too much anxiety. > >> > >> +1 > >> > >> What does the PEP give you above this "simple as possible" solution?
> > > >"I think we'll just see folks using the unstable APIs and then > >complaining when we remove them, even though they *know* *upfront* that > >these APIs will go away." > > > >That problem would be much worse if some modules were simply marked > >"experimental" in the doc, rather than put in a separate namespace. > >You will see people copying recipes found on the internet without > >knowing that they rely on unstable APIs. > > How. About doing them the way we do depreciated modules, and have them > spit warnings to stderr? Maybe add a flag and environment variable to > disable that. You're proposing that new experimental modules spit warnings when you use them? I don't think that's a good way of promoting their use :) (something we do want to do even though we also want to convey the idea that they're not yet "stable" or "fully approved") Regards Antoine. From ethan at stoneleaf.us Sat Jan 28 19:48:31 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 28 Jan 2012 10:48:31 -0800 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F241E7F.2070206@voidspace.org.uk> References: <20120127160934.2ad5e0bf@pitrou.net> <4F22C268.40005@voidspace.org.uk> <87aa58306z.fsf@uwakimon.sk.tsukuba.ac.jp> <4F241E7F.2070206@voidspace.org.uk> Message-ID: <4F2442FF.7050106@stoneleaf.us> Michael Foord wrote: > On 28/01/2012 04:44, Stephen J. Turnbull wrote: >> I think it's a bad idea to introduce a feature that's *supposed* to >> break (in the sense of "make a break", ie, change the normal pattern) >> with every release and then try to avoid breaking (in the sense of >> "causing an unexpected failure") code written by people who don't want >> to follow the discipline of keeping up with changing APIs. If they >> want that stability, they should wait for the stable release. >> >> Modules should become unavailable from __preview__ as soon as they >> have a stable home. >> > I like not breaking people's code where *possible*. 
__preview__ is not about stability. It's about making code easily available for testing before the API freezes. If nothing has changed once it graduates, how hard is it to change a few lines of code from from __preview__ import blahblahblah to import blahblahblah ? It seems to me that including a __preview__ package in production software is a mistake, and not its intention. ~Ethan~ From ethan at stoneleaf.us Sat Jan 28 19:56:57 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 28 Jan 2012 10:56:57 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> <4F22DA18.4050706@stoneleaf.us> <4F234251.8080708@stoneleaf.us> Message-ID: <4F2444F9.6040500@stoneleaf.us> Nick Coghlan wrote: > On Sat, Jan 28, 2012 at 10:33 AM, Ethan Furman wrote: >> So the question is: >> >> - should 'raise ... from ...' be legal outside a try block? >> >> - should 'raise ... from None' be legal outside a try block? > > Given that it would be quite a bit of work to make it illegal, my > preference is to leave it alone. > > I believe that means there's only one open question. Should "raise ex > from None" be syntactic sugar for: > > 1. clearing the current thread's exception state (as I believe Ethan's > patch currently does), thus meaning that __context__ and __cause__ > both end up being None > 2. setting __cause__ to None (so that __context__ still gets set > normally, as it is now when __cause__ is set to a specific exception), > and having __cause__ default to a *new* sentinel object that indicates > "use __context__" > > I've already stated my own preference in favour of 2 - that approach > means developers that think about it can explicitly change exception > types such that the context isn't displayed by default, but > application and framework developers remain free to insert their own > exception handlers that *always* report the full exception stack. 
The reasoning behind choice two makes a lot of sense. My latest effort (I should be able to get the patch posted within two days) involves creating a new dummy exception, SuppressContext, and 'raise ... from None' sets cause to it; the printing logic checks to see if cause is SuppressContext, and if so, prints neither context nor cause. Not exactly how Nick describes it, but as far as I've gotten in my Python core hacking skills. ;) ~Ethan~ From bauertomer at gmail.com Sat Jan 28 20:59:10 2012 From: bauertomer at gmail.com (T.B.) Date: Sat, 28 Jan 2012 21:59:10 +0200 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints Message-ID: <4F24538E.9060705@gmail.com> Hello python-dev, This is probably worth a bug report: While looking at threading.py I noticed that Semaphore's counter can go below zero. This contradicts the docs: "The counter can never go below zero; ...". Just try: import threading s = threading.Semaphore(0.5) # You can now acquire s as many times as you want! # even when s._value < 0. The fix is tiny: diff -r 265d35e8fe82 Lib/threading.py --- a/Lib/threading.py Fri Jan 27 21:17:04 2012 +0000 +++ b/Lib/threading.py Sat Jan 28 21:22:04 2012 +0200 @@ -322,7 +321,7 @@ rc = False endtime = None self._cond.acquire() - while self._value == 0: + while self._value <= 0: if not blocking: break if __debug__: Which is better than forcing s._value to be an int. I also think that the docs should be updated to reflect that the counter is compared for being non-positive, not for being equal to zero. e.g. "when acquire() finds that it is zero...", "If it is zero on entry, block...".
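The report above is easy to reproduce. This sketch pokes at the private `_value` counter, exactly as the report does, and relies on acquire() still using the `== 0` test (which CPython has kept); it shows how a float-initialized semaphore steps right past zero:

```python
import threading

# A semaphore whose counter starts at a non-integer value.
s = threading.Semaphore(0.5)

# acquire() only blocks while the counter compares equal to 0; a float
# counter steps 0.5 -> -0.5 -> -1.5 -> ... and never hits exactly 0,
# so every acquire() succeeds immediately.
for _ in range(3):
    s.acquire()

print(s._value)  # negative, despite "the counter can never go below zero"
```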
On another commit: regarding http://bugs.python.org/issue9346, an unused import was left behind: -from collections import deque Cheers, TB From benjamin at python.org Sat Jan 28 21:07:16 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 28 Jan 2012 15:07:16 -0500 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: <4F24538E.9060705@gmail.com> References: <4F24538E.9060705@gmail.com> Message-ID: 2012/1/28 T.B. : > Hello python-dev, > > This is probably worth a bug report: While looking at threading.py I > noticed that Semaphore's counter can go below zero. This contradicts the > docs: "The counter can never go below zero; ...". Just try: > > import threading > s = threading.Semaphore(0.5) But why would you want to pass a float? It seems like API abuse to me. -- Regards, Benjamin From martin at v.loewis.de Sat Jan 28 21:57:01 2012 From: martin at v.loewis.de (martin at v.loewis.de) Date: Sat, 28 Jan 2012 21:57:01 +0100 Subject: [Python-Dev] Hashing proposal: 64-bit hash In-Reply-To: References: <4F231255.3050106@sievertsen.de> Message-ID: <20120128215701.Horde.VBuyR8L8999PJGEdWyFFnkA@webmail.df.eu> Quoting Serhiy Storchaka: > On 27.01.12 23:08, Frank Sievertsen wrote: >>> As already mentioned, the vulnerability of 64-bit Python is rather >>> theoretical, not practical. The size of the hash makes the attack >>> extremely unlikely. >> >> Unfortunately this assumption is not correct. It works very well with >> 64-bit hashing. >> >> It's much harder to create (efficiently) 64-bit hash-collisions. >> But I managed to do so and created strings with >> a length of 16 (6-bit)-characters (a-z, A-Z, 0-9, _, .). Even >> 14 characters would have been enough. >> >> You need less than twice as many characters for the same effect as in >> the 32bit-world. > > > The point is not the length of the string, but the size of the string > space that must be searched.
> To find a string with a specified 64-bit > hash you have to iterate over 2 ** 64 strings. I think you entirely missed the point of Frank's message. Despite your analysis that it should not be possible, Frank has *actually* computed colliding strings, most likely also for a specified hash value. > Of course, the hash must be computed with a secure function that does > not allow "cutting corners" to reduce the computation time. This issue wouldn't be that relevant if there wasn't a documented algorithm to significantly reduce the number of tries you need to make to produce a string with a desired hash value. My own implementation would need 2**33 tries in the worst case (for a 64-bit hash value); thanks to the birthday paradox, there's actually a significant chance that the algorithm finds collisions even faster. Regards, Martin From mwm at mired.org Sat Jan 28 22:49:52 2012 From: mwm at mired.org (Mike Meyer) Date: Sat, 28 Jan 2012 13:49:52 -0800 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <1327776541.8904.5.camel@localhost.localdomain> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> <20120128192949.68f07267@pitrou.net> <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> <1327776541.8904.5.camel@localhost.localdomain> Message-ID: Antoine Pitrou wrote: >On Saturday, 28 January 2012, at 10:46 -0800, Mike Meyer wrote: >> Antoine Pitrou wrote: >> >You will see people copying recipes found on the internet without >> >knowing that they rely on unstable APIs. >> >> How about doing them the way we do deprecated modules, and have >them >> spit warnings to stderr? Maybe add a flag and environment variable >to >> disable that. >You're proposing that new experimental modules spit warnings when you >use them?
To be explicit, when the system loads them. > I don't think that's a good way of promoting their use :) And importing something from __preview__ or __experimental__ or whatever won't? This thread did include the suggestion that they go into their final location instead of a magic module. >(something we do want to do even though we also want to convey the idea >that they're not yet "stable" or "fully approved") Doing it with a message pointing at the page describing the status makes sure users read the docs before using them. That solves the problem of using them without realizing it. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From solipsis at pitrou.net Sat Jan 28 23:02:37 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Jan 2012 23:02:37 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> <20120128192949.68f07267@pitrou.net> <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> <1327776541.8904.5.camel@localhost.localdomain> Message-ID: <1327788157.8904.17.camel@localhost.localdomain> > >You're proposing that new experimental modules spit warnings when you > >use them? > > To be explicit, when the system loads them. There are many reasons to import a module, such as viewing its documentation. And the warning will trigger if the import happens in non-user code, such as a library; or when there is a fallback for the module not being present. People usually get annoyed by intempestive warnings which don't warn about an actual problem.
> >(something we do want to do even though we also want to convey the idea > >that they're not yet "stable" or "fully approved") > > Doing it with a message pointing at the page describing the status > makes sure users read the docs before using them. Sure, it's just much less user-friendly than conveying that idea in the module's namespace. Besides, it only works if warnings are not silenced. People are used to __future__ (and I've seen no indication that they don't like it). __preview__ is another application of the same pattern (using a special namespace to indicate the status of a feature). Regards Antoine. From ericsnowcurrently at gmail.com Sun Jan 29 00:03:45 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 28 Jan 2012 16:03:45 -0700 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <1327788157.8904.17.camel@localhost.localdomain> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> <20120128192949.68f07267@pitrou.net> <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> <1327776541.8904.5.camel@localhost.localdomain> <1327788157.8904.17.camel@localhost.localdomain> Message-ID: On Sat, Jan 28, 2012 at 3:02 PM, Antoine Pitrou wrote: > There are many reasons to import a module, such as viewing its > documentation. And the warning will trigger if the import happens in > non-user code, such as a library; or when there is a fallback for the > module not being present. People usually get annoyed by intempestive > warnings which don't warn about an actual problem. As an alternative, how about a __preview__ or __provisional__ attribute on modules that are in this provisional state? So just add that big warning to the docs, as Guido suggested, and set the attribute as a programmatic indicator. 
Perhaps also add sys.provisional_modules (or wherever) to explicitly give the full list for the current Python version. -eric From solipsis at pitrou.net Sun Jan 29 00:08:37 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 29 Jan 2012 00:08:37 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> <20120128192949.68f07267@pitrou.net> <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> <1327776541.8904.5.camel@localhost.localdomain> <1327788157.8904.17.camel@localhost.localdomain> Message-ID: <1327792117.4376.8.camel@localhost.localdomain> On Saturday, 28 January 2012, at 16:03 -0700, Eric Snow wrote: > On Sat, Jan 28, 2012 at 3:02 PM, Antoine Pitrou wrote: > > There are many reasons to import a module, such as viewing its > > documentation. And the warning will trigger if the import happens in > > non-user code, such as a library; or when there is a fallback for the > > module not being present. People usually get annoyed by intempestive > > warnings which don't warn about an actual problem. > > As an alternative, how about a __preview__ or __provisional__ > attribute on modules that are in this provisional state? So just add > that big warning to the docs, as Guido suggested, and set the > attribute as a programmatic indicator. Perhaps also add > sys.provisional_modules (or wherever) to explicitly give the full list > for the current Python version. Well, how often do you examine the attributes of a module before using it? I think that's a much too obscure way to convey the information. Regards Antoine.
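Eric's marker idea can be sketched in a few lines. Note that `__provisional__` and `provisional_modules()` below are the proposal's hypothetical names, not anything Python actually provides:

```python
import sys
import types

def provisional_modules(modules=None):
    # Return names of loaded modules carrying the (hypothetical) marker.
    modules = sys.modules if modules is None else modules
    return sorted(name for name, mod in modules.items()
                  if getattr(mod, "__provisional__", False))

# Fake a provisional module to show the check in action.
mod = types.ModuleType("fake_provisional")
mod.__provisional__ = True
sys.modules["fake_provisional"] = mod

print(provisional_modules())  # ['fake_provisional']

del sys.modules["fake_provisional"]  # clean up the demo entry
```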
From ericsnowcurrently at gmail.com Sun Jan 29 00:34:37 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 28 Jan 2012 16:34:37 -0700 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <1327792117.4376.8.camel@localhost.localdomain> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120128131436.0179513d@resist.wooz.org> <20120128192949.68f07267@pitrou.net> <075f602c-7477-4d4f-8fad-483f8c669b5a@email.android.com> <1327776541.8904.5.camel@localhost.localdomain> <1327788157.8904.17.camel@localhost.localdomain> <1327792117.4376.8.camel@localhost.localdomain> Message-ID: On Sat, Jan 28, 2012 at 4:08 PM, Antoine Pitrou wrote: > On Saturday, 28 January 2012, at 16:03 -0700, Eric Snow wrote: >> On Sat, Jan 28, 2012 at 3:02 PM, Antoine Pitrou wrote: >> > There are many reasons to import a module, such as viewing its >> > documentation. And the warning will trigger if the import happens in >> > non-user code, such as a library; or when there is a fallback for the >> > module not being present. People usually get annoyed by intempestive >> > warnings which don't warn about an actual problem. >> >> As an alternative, how about a __preview__ or __provisional__ >> attribute on modules that are in this provisional state? So just add >> that big warning to the docs, as Guido suggested, and set the >> attribute as a programmatic indicator. Perhaps also add >> sys.provisional_modules (or wherever) to explicitly give the full list >> for the current Python version. > > Well, how often do you examine the attributes of a module before using > it? I think that's a much too obscure way to convey the information. Granted. However, actively looking for the attribute is only one of the lesser use-cases.
The key is that it allows you to check any library programmatically for dependence on any of the provisional modules. The warning in the docs is important, but being able to have code check for it is important too. As a small bonus, it would show up in help for the module and in dir(). -eric From pydev at sievertsen.de Sun Jan 29 00:39:48 2012 From: pydev at sievertsen.de (Frank Sievertsen) Date: Sun, 29 Jan 2012 00:39:48 +0100 Subject: [Python-Dev] Hashing proposal: 64-bit hash In-Reply-To: References: <4F231255.3050106@sievertsen.de> Message-ID: <4F248744.7030201@sievertsen.de> > > The point is not the length of the string, but the size of the string > space for inspection. To search for a string with a specified 64-bit > hash, you have to iterate over 2 ** 64 strings. Spending 1 nanosecond per string scan > (a very optimistic estimate), it would take 2 ** 64 / 1e9 > / (3600 * 24 * 365.25) = 585 years. For the attack we need to find > 1000 such strings -- more than half a million years. For a 32-bit hash it > would need only an hour. > With meet-in-the-middle and some other tricks it's possible to generate 25,000 64-bit collisions per hour using an older desktop CPU and 4 GB of RAM. H_dypmRNWgOxiaaG A_ceO8B4Q2eKfabi S_kpgdB3tUFJiaae H_dypmRNWgOxiaaG D_FYzdys3H8qbaba 0_pOwRq15h8vbabO S_kpgdB3tUFJiaae __mdKp1GvI_fcaaM 6_U3_B0pJT1UsaaW 4_1GnK9BmLj9naa5 __X7hMeAOpACdaaw B_7pm.T62SiLlaai I_HSdl0axd8tmaae T_Dv3LwayACpdaaO Frank From tjreedy at udel.edu Sun Jan 29 00:47:23 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 28 Jan 2012 18:47:23 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On 1/28/2012 3:55 AM, Nick Coghlan wrote: I am currently -something on the proposal as it is, because it will surely create a lot of hassles and because I do not think it is necessarily the best solution to the motivating concerns. > Don't consider this PEP a purely theoretical proposal, because it > isn't.
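The brute-force estimates quoted in the hashing thread above can be sanity-checked with a few lines of Python (a sketch; the one-nanosecond-per-string rate is the optimistic figure assumed in the thread):

```python
# Brute-force search time for a single 64-bit hash preimage, assuming
# one candidate string scanned per nanosecond.
seconds = 2 ** 64 * 1e-9
years = seconds / (3600 * 24 * 365.25)
print(round(years))  # -> 585

# The attack needs ~1000 such strings: over half a million years.
print(round(years * 1000))

# A 32-bit hash, by contrast: 1000 preimages in roughly an hour.
hours_32bit = 2 ** 32 * 1000 * 1e-9 / 3600
print(round(hours_32bit, 1))  # -> 1.2
```

Frank's point is that meet-in-the-middle tricks beat this naive estimate by many orders of magnitude, so the raw search-space size alone is not a safety argument.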
It's really being put forward to solve a specific problem: the > fact that we need to do something about re's lack of proper Unicode > support [1]. Those issues are actually hard to solve, so replacing re > with Matthew Barnett's regex module (just as re itself was a > replacement for the original regex module) that already addresses most > of them seems like a good way forward, but this is currently being > blocked because there are still a few lingering concerns with > maintainability and backwards compatibility. I find the concern about 'maintainability' a bit strange as regex seems to be getting more maintenance and improvement than re. The re author is no longer active. If neither were in the library, and we were considering both, regex would certainly win, at least from a user's view. Tom Christiansen reviewed about 8 unicode-capable extended regular expression packages, including both re and regex, and regex came out much better. The concern about backward compatibility ignores the code that re users cannot write. In any case, that problem would be solved by adding regex in addition to re instead of as a replacement. If it were initially added as __preview__.regex, would the next step be to call it regex? or change it to re and remove the current package? If the former, I think we might as well do it now. If the latter, that is different from what the PEP proposes. > While regex is the current poster-child for this problem, I see it as a special case that is not really addressed by the PEP. The other proposed use-case for __preview__ is packages whose api is not stable. Such packages may need their api changed a lot sooner than 18-24 months. Or, their api may change for a lot longer than just one release cycle. So the PEP would be best suited for packages whose api may be fixed but might need code-breaking adjustments *once* in 18 months. A counter-proposal: add an __x__ package to site-packages. Document the contents separately in an X-Library manual.
Let the api of such packages change with every micro release. Don't guarantee that modules won't disappear completely. Don't put a time limit on residence there before being moved up (to the stdlib) or out. Packages that track volatile external standards could stay there indefinitely. If a module is moved to the stdlib, leave a stub for at least two versions that emits a deprecation warning (to switch to import a instead of __x__.a) and a notice that the doc has moved, along with importing the contents of the stdlib version. (This would work for the __preview__ proposal also.) -- Terry Jan Reedy From ncoghlan at gmail.com Sun Jan 29 02:33:21 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Jan 2012 11:33:21 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jan 29, 2012 at 3:15 AM, Guido van Rossum wrote: > Hm. You could do this just as well without a __preview__ package. You > just flag the module as experimental in the docs and get on with your > life. > > We have some experience with this in Google App Engine. We used to use > a separate "labs" package in our namespace and when packages were > deemed stable enough they were moved from labs to non-labs. But the > move always turned out to be a major pain, causing more breakage than > we would have had if we had simply kept the package location the same > but let the API mutate. Now we just put new, experimental packages in > the right place from the start, and put a loud "experimental" banner > on all pages of their docs, which is removed once the API is stable.
> > There is much less pain now: while incompatible changes do happen for > experimental packages, they are not frequent, and rarely > earth-shattering, and usually the final step is simply removing the > banner without making any (incompatible) changes to the code. This > means that the final step is painless for early adopters, thereby > rewarding them for their patience instead of giving them one final > kick while they sort out the import changes. > > So I do not support the __preview__ package. I think we're better off > flagging experimental modules in the docs than in their name. For the > specific case of the regex module, the best way to adoption may just > be to include it in the stdlib as regex and keep it there. Any other > solution will just cause too much anxiety. I'm willing to go along with that (especially given your report of AppEngine's experience with the "labs" namespace). Can we class this as a pronouncement on PEP 408? That is, "No to adding a __preview__ namespace, but yes to adding regex directly for 3.3"? Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Sun Jan 29 04:29:05 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 28 Jan 2012 19:29:05 -0800 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Jan 28, 2012 at 5:33 PM, Nick Coghlan wrote: > I'm willing to go along with that (especially given your report of > AppEngine's experience with the "labs" namespace). > > Can we class this as a pronouncement on PEP 408? That is, "No to > adding a __preview__ namespace, but yes to adding regex directly for > 3.3"? Yup.
We seem to have a tendency to over-analyze decisions a bit lately (witness the hand-wringing about the hash collision DoS attack). For those who worry about people who copy recipes that stop working, I think they're worrying too much. If people want to take a shortcut without reading the documentation or understanding the code they are copying, fine, but they should realize the limitations of free advice. I don't mean to put down the many great recipes that exist or the value of copying code to get started quickly. But I think our liability as maintainers of the library is sufficiently delineated when we clearly mark a module as experimental in the documentation. (Recipe authors should ideally also add this warning to their recipe if it depends on an experimental API.) Finally, if you really want to put warnings in whenever an experimental module is being used, make it a silent warning, like SilentDeprecationWarning. That allows people to request more strict warnings without unduly alarming the users of an app. -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sun Jan 29 07:42:28 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Jan 2012 16:42:28 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jan 29, 2012 at 1:29 PM, Guido van Rossum wrote: > On Sat, Jan 28, 2012 at 5:33 PM, Nick Coghlan wrote: >> I'm willing to go along with that (especially given your report of >> AppEngine's experience with the "labs" namespace). >> >> Can we class this as a pronouncement on PEP 408? That is, "No to >> adding a __preview__ namespace, but yes to adding regex directly for >> 3.3"? > > Yup. 
We seem to have a tendency to over-analyze decisions a bit lately > (witness the hand-wringing about the hash collision DoS attack). I have now updated PEP 408 accordingly (i.e. rejected, but with a specific note about regex). And (since Alex Gaynor brought it up off-list), I'll explicitly note here that I'm taking your approval as granting the special permission PEP 399 needs to accept a C extension module without a pure Python equivalent. Patches to *add* a pure Python version for use by other implementations are of course welcome (in practice, I suspect it's likely only in PyPy that such an engine would be fast enough to be usable). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Sun Jan 29 08:42:20 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 28 Jan 2012 23:42:20 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> Message-ID: <4F24F85C.9080603@stoneleaf.us> Benjamin Peterson wrote: > 2012/1/26 Ethan Furman : >> PEP: XXX > > Congratulations, you are now PEP 409. Thanks, Benjamin! So, how do I make changes to it? ~Ethan~ From ethan at stoneleaf.us Sun Jan 29 08:44:32 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 28 Jan 2012 23:44:32 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F2217D1.2000700@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> Message-ID: <4F24F8E0.6080400@stoneleaf.us> For those not on the nosy list, here's the latest post to http://bugs.python.org/issue6210: ------------------------------------------------------- It looks like agreement is forming around the raise ... from None method. It has been mentioned more than once that having the context saved on the exception would be a Good Thing, and for further debugging (or logging or what-have-you) I must agree.
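Ethan's point — suppress the chained display while keeping the context queryable — can be demonstrated with the semantics that eventually shipped in Python 3.3. Note this is hedged: the final mechanism (PEP 415) uses a `__suppress_context__` flag rather than the interim `__cause__ = True` sentinel discussed in this thread, but the observable behaviour is the same:

```python
# Sketch of the eventual Python 3.3+ semantics of "raise ... from None":
# the implicit context is not displayed, but it is still recorded.
try:
    try:
        raise KeyError("original")
    except KeyError:
        raise ValueError("replacement") from None
except ValueError as exc:
    caught = exc  # keep a reference; "exc" itself is unbound after the block

print(type(caught.__context__).__name__)  # -> KeyError (still queryable)
print(caught.__cause__)                   # -> None
print(caught.__suppress_context__)        # -> True
```

So later debugging or logging code can still reach the original exception through `__context__`, exactly the property being argued for here.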
The patch attached now sets __cause__ to True, leaving __context__ unclobbered. The exception printing routine checks to see if __cause__ is True, and if so simply skips the display of either cause or __context__, but __context__ can still be queried by later code. One concern raised was that since it is possible to write (even before this patch) raise KeyError from NameError outside of a try block that some would get into the habit of writing raise KeyError from None as a way of preemptively suppressing implicit context chaining; I am happy to report that this is not an issue, since when that exception is caught and a new exception raised, it is the new exception that controls the display. In other words:

>>> try:
...     raise ValueError from None
... except:
...     raise NameError
...
Traceback (most recent call last):
  File "", line 2, in
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 4, in
NameError

From g.brandl at gmx.net Sun Jan 29 09:39:01 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 29 Jan 2012 09:39:01 +0100 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F24F85C.9080603@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> <4F24F85C.9080603@stoneleaf.us> Message-ID: On 29.01.2012 08:42, Ethan Furman wrote: > Benjamin Peterson wrote: >> 2012/1/26 Ethan Furman : >>> PEP: XXX >> >> Congratulations, you are now PEP 409. > > Thanks, Benjamin! > > So, how do I make changes to it? Please send PEP updates to the PEP editors at peps at python.org. Georg From mark at hotpy.org Sun Jan 29 11:31:48 2012 From: mark at hotpy.org (Mark Shannon) Date: Sun, 29 Jan 2012 10:31:48 +0000 Subject: [Python-Dev] A new dictionary implementation Message-ID: <4F252014.3080900@hotpy.org> Hi, Now that issue 13703 has been largely settled, I want to propose my new dictionary implementation again. It is a little more polished than before.
https://bitbucket.org/markshannon/hotpy_new_dict Object-oriented benchmarks use considerably less memory and are sometimes faster (by a small amount). (I've only benchmarked on my old 32-bit machine.) E.g.:

2to3: no speed change, -28% memory
GCbench: +10% speed, -47% memory

Other benchmarks show little or no change in behaviour, mainly minor memory savings. If an application is OO and uses lots of memory the new dict will save a lot of memory and maybe boost performance. Other applications will be largely unaffected. It passes all the tests. (I had to change a couple that relied on dict repr() ordering) Cheers, Mark. From solipsis at pitrou.net Sun Jan 29 15:09:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 29 Jan 2012 15:09:34 +0100 Subject: [Python-Dev] A new dictionary implementation References: <4F252014.3080900@hotpy.org> Message-ID: <20120129150934.458702a2@pitrou.net> Hi, On Sun, 29 Jan 2012 10:31:48 +0000 Mark Shannon wrote: > > Now that issue 13703 has been largely settled, > I want to propose my new dictionary implementation again. > It is a little more polished than before. > > https://bitbucket.org/markshannon/hotpy_new_dict I briefly took a look at your code yesterday and it looked generally reasonable to me. It would be nice to open an issue on http://bugs.python.org so that we can review it there (just fill the "repository" field and use the "create patch" button). Regards Antoine. From benjamin at python.org Sun Jan 29 15:56:11 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 29 Jan 2012 09:56:11 -0500 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F252014.3080900@hotpy.org> References: <4F252014.3080900@hotpy.org> Message-ID: 2012/1/29 Mark Shannon : > Hi, > > Now that issue 13703 has been largely settled, > I want to propose my new dictionary implementation again. > It is a little more polished than before.
If you're serious about changing the dictionary implementation, I think you should write a PEP. It should explain the new dict's advantages (and disadvantages?) and give comprehensive benchmark numbers. Something along the lines of http://www.python.org/dev/peps/pep-3128/ I should think. -- Regards, Benjamin From solipsis at pitrou.net Sun Jan 29 16:08:41 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 29 Jan 2012 16:08:41 +0100 Subject: [Python-Dev] A new dictionary implementation References: <4F252014.3080900@hotpy.org> Message-ID: <20120129160841.2343b62f@pitrou.net> On Sun, 29 Jan 2012 09:56:11 -0500 Benjamin Peterson wrote: > 2012/1/29 Mark Shannon : > > Hi, > > > > Now that issue 13703 has been largely settled, > > I want to propose my new dictionary implementation again. > > It is a little more polished than before. > > If you're serious about changing the dictionary implementation, I > think you should write a PEP. It should explain the new dict's > advantages (and disadvantages?) and give comprehensive benchmark > numbers. Something along the lines of > http://www.python.org/dev/peps/pep-3128/ I should think. "New dictionary implementation" is a misnomer here. Mark's patch merely allows sharing the keys array between several dictionaries. The lookup algorithm remains exactly the same as far as I've read. It's actually much less invasive than e.g. Martin's AVL trees-for-hash-collisions proposal. Regards Antoine.
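A rough illustration of the key-sharing idea Antoine describes (a conceptual sketch, not Mark's actual patch; the scheme later landed in CPython 3.3 as PEP 412's key-sharing dictionaries):

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Every Point instance has the same attribute names, so a key-sharing
# scheme can store the keys array once (per class) and keep only the
# values per instance -- the source of the memory savings Mark reports
# for object-oriented benchmarks.
pts = [Point(i, -i) for i in range(1000)]
print(pts[0].__dict__)  # -> {'x': 0, 'y': 0}

# On interpreters with key-sharing dicts, the instance __dict__ is
# typically reported smaller than an equivalent ordinary dict:
print(sys.getsizeof(pts[0].__dict__), sys.getsizeof({'x': 0, 'y': 0}))
```

The lookup path is untouched; only the storage layout changes, which is why the patch is less invasive than it sounds.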
From benjamin at python.org Sun Jan 29 16:19:39 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 29 Jan 2012 10:19:39 -0500 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <20120129160841.2343b62f@pitrou.net> References: <4F252014.3080900@hotpy.org> <20120129160841.2343b62f@pitrou.net> Message-ID: 2012/1/29 Antoine Pitrou : > On Sun, 29 Jan 2012 09:56:11 -0500 > Benjamin Peterson wrote: > >> 2012/1/29 Mark Shannon : >> > Hi, >> > >> > Now that issue 13703 has been largely settled, >> > I want to propose my new dictionary implementation again. >> > It is a little more polished than before. >> >> If you're serious about changing the dictionary implementation, I >> think you should write a PEP. It should explain the new dict's >> advantages (and disadvantages?) and give comprehensive benchmark >> numbers. Something along the lines of >> http://www.python.org/dev/peps/pep-3128/ I should think. > > "New dictionary implementation" is a misnomer here. Mark's patch merely > allows sharing the keys array between several dictionaries. The lookup > algorithm remains exactly the same as far as I've read. It's actually > much less invasive than e.g. Martin's AVL trees-for-hash-collisions > proposal. Ah, okay. So, the subject makes it sound scarier than it is. :) -- Regards, Benjamin From mark at hotpy.org Sun Jan 29 16:16:24 2012 From: mark at hotpy.org (Mark Shannon) Date: Sun, 29 Jan 2012 15:16:24 +0000 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <20120129150934.458702a2@pitrou.net> References: <4F252014.3080900@hotpy.org> <20120129150934.458702a2@pitrou.net> Message-ID: <4F2562C8.4080707@hotpy.org> Antoine Pitrou wrote: > Hi, > > On Sun, 29 Jan 2012 10:31:48 +0000 > Mark Shannon wrote: >> Now that issue 13703 has been largely settled, >> I want to propose my new dictionary implementation again. >> It is a little more polished than before.
>> >> https://bitbucket.org/markshannon/hotpy_new_dict > > I briefly took a look at your code yesterday and it looked generally > reasonable to me. It would be nice to open an issue on > http://bugs.python.org so that we can review it there (just fill the > "repository" field and use the "create patch" button). Done: http://bugs.python.org/issue13903 Cheers, Mark From andrea.crotti.0 at gmail.com Sun Jan 29 16:34:38 2012 From: andrea.crotti.0 at gmail.com (Andrea Crotti) Date: Sun, 29 Jan 2012 15:34:38 +0000 Subject: [Python-Dev] #include "Python.h" Message-ID: <4F25670E.4010701@gmail.com> I have a newbie question about CPython. Looking at the C code I noted that for example in tupleobject.c there is only one include #include "Python.h" Python.h actually includes everything as far as I can see, so: - it's very hard with a not-smart-enough editor to find out where the not-locally defined symbols are actually defined (well sure that is not a problem for most of the people) - if all the files include python.h, doesn't it generate very big object files? Or is it not a problem since they are stripped out after? Thanks, Andrea From mark at hotpy.org Sun Jan 29 17:07:56 2012 From: mark at hotpy.org (Mark Shannon) Date: Sun, 29 Jan 2012 16:07:56 +0000 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <20120129160841.2343b62f@pitrou.net> References: <4F252014.3080900@hotpy.org> <20120129160841.2343b62f@pitrou.net> Message-ID: <4F256EDC.70707@hotpy.org> Antoine Pitrou wrote: > On Sun, 29 Jan 2012 09:56:11 -0500 > Benjamin Peterson wrote: > >> 2012/1/29 Mark Shannon : >>> Hi, >>> >>> Now that issue 13703 has been largely settled, >>> I want to propose my new dictionary implementation again. >>> It is a little more polished than before. >> If you're serious about changing the dictionary implementation, I >> think you should write a PEP. It should explain the new dict's >> advantages (and disadvantages?) and give comprehensive benchmark >> numbers.
Something along the lines of >> http://www.python.org/dev/peps/pep-3128/ I should think. > > "New dictionary implementation" is a misnomer here. Mark's patch merely > allows to share the keys array between several dictionaries. The lookup > algorithm remains exactly the same as far as I've read. It's actually > much less invasive than e.g. Martin's AVL trees-for-hash-collisions > proposal. > Antoine is right. It is a reorganisation of the dict, plus a couple of changes to typeobject.c and object.c to ensure that instance dictionaries do indeed share keys arrays. The lookup algorithm remains the same (it works well). Cheers, Mark From phd at phdru.name Sun Jan 29 18:22:23 2012 From: phd at phdru.name (Oleg Broytman) Date: Sun, 29 Jan 2012 21:22:23 +0400 Subject: [Python-Dev] #include "Python.h" In-Reply-To: <4F25670E.4010701@gmail.com> References: <4F25670E.4010701@gmail.com> Message-ID: <20120129172223.GA32042@iskra.aviel.ru> Hello. We are sorry but we cannot help you. This mailing list is to work on developing Python (adding new features to Python itself and fixing bugs); if you're having problems learning, understanding or using Python, please find another forum. Probably python-list/comp.lang.python mailing list/news group is the best place; there are Python developers who participate in it; you may get a faster, and probably more complete, answer there. See http://www.python.org/community/ for other lists/news groups/fora. Thank you for understanding. On Sun, Jan 29, 2012 at 03:34:38PM +0000, Andrea Crotti wrote: > I have a newbie question about CPython. 
> Looking at the C code I noted that for example in tupleobject.c there is > only one include > #include "Python.h" > > Python.h actually includes everything as far as I can I see so: > - it's very hard with a not-enough smart editor to find out where the > not-locally defined symbols are actually defined (well sure that is > not a problem for most of the people) > > - if all the files include python.h, doesn't it generate very big object > files? Or is it not a problem since they are stripped out after? > > Thanks, > Andrea Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From andrea.crotti.0 at gmail.com Sun Jan 29 18:59:51 2012 From: andrea.crotti.0 at gmail.com (Andrea Crotti) Date: Sun, 29 Jan 2012 17:59:51 +0000 Subject: [Python-Dev] #include "Python.h" In-Reply-To: <20120129172223.GA32042@iskra.aviel.ru> References: <4F25670E.4010701@gmail.com> <20120129172223.GA32042@iskra.aviel.ru> Message-ID: <4F258917.9090706@gmail.com> On 01/29/2012 05:22 PM, Oleg Broytman wrote: > Hello. > > We are sorry but we cannot help you. This mailing list is to work on > developing Python (adding new features to Python itself and fixing bugs); > if you're having problems learning, understanding or using Python, please > find another forum. Probably python-list/comp.lang.python mailing list/news > group is the best place; there are Python developers who participate in it; > you may get a faster, and probably more complete, answer there. See > http://www.python.org/community/ for other lists/news groups/fora. Thank > you for understanding. > I wrote here because I thought it was the best place, but I understand this point of view, I can ask on python or python-core for example.. From ctb at msu.edu Sun Jan 29 19:10:07 2012 From: ctb at msu.edu (C. 
Titus Brown) Date: Sun, 29 Jan 2012 10:10:07 -0800 Subject: [Python-Dev] #include "Python.h" In-Reply-To: <4F258917.9090706@gmail.com> References: <4F25670E.4010701@gmail.com> <20120129172223.GA32042@iskra.aviel.ru> <4F258917.9090706@gmail.com> Message-ID: <20120129181007.GA17631@idyll.org> On Sun, Jan 29, 2012 at 05:59:51PM +0000, Andrea Crotti wrote: > On 01/29/2012 05:22 PM, Oleg Broytman wrote: >> Hello. >> >> We are sorry but we cannot help you. This mailing list is to work on >> developing Python (adding new features to Python itself and fixing bugs); >> if you're having problems learning, understanding or using Python, please >> find another forum. Probably python-list/comp.lang.python mailing list/news >> group is the best place; there are Python developers who participate in it; >> you may get a faster, and probably more complete, answer there. See >> http://www.python.org/community/ for other lists/news groups/fora. Thank >> you for understanding. >> > > I wrote here because I thought it was the best place, but I understand > this point of view, I can ask on python or python-core for example.. python-dev isn't that inappropriate, IMO, but probably the best place to go with this discussion is python-ideas. Could you repost over there? cheers, --titus -- C. Titus Brown, ctb at msu.edu From p.f.moore at gmail.com Sun Jan 29 19:34:49 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Jan 2012 18:34:49 +0000 Subject: [Python-Dev] #include "Python.h" In-Reply-To: <20120129181007.GA17631@idyll.org> References: <4F25670E.4010701@gmail.com> <20120129172223.GA32042@iskra.aviel.ru> <4F258917.9090706@gmail.com> <20120129181007.GA17631@idyll.org> Message-ID: On 29 January 2012 18:10, C. Titus Brown wrote: > python-dev isn't that inappropriate, IMO, but probably the best place to > go with this discussion is python-ideas. Could you repost over there? I agree that python-dev isn't particularly appropriate, python-list is probably your best bet.
The python-ideas isn't really appropriate, as this isn't a proposal for a change to Python, but rather a question about how the Python C code is structured. That's always a grey area, and I can see why the OP thought python-dev might be a reasonable place. Having said all that: > Python.h actually includes everything as far as I can I see so: > - it's very hard with a not-enough smart editor to find out where the > not-locally defined symbols are actually defined (well sure that is > not a problem for most of the people) Well, that's more of a question of what tools you use to edit/read Python code. I guess you could view it as a trade-off between ease of writing the core code and extensions (avoiding micromanagement of headers, and being able to document #include "Python.h" as the canonical way to get access to the Python API from C) versus tracking down macro definitions and symbol declarations (and that's really only for information, as the API is documented in the manuals anyway). I don't use an editor that can automatically find the definitions, but grep and the manuals does me fine. > - if all the files include python.h, doesn't it generate very big object > files? Or is it not a problem since they are stripped out after? That's more of a C/linker question, but generally .h files only contain declarations and macros, and nothing that generates code. So there is no impact on object code size if you include multiple .h files, or too many, or whatever. So no, it doesn't generate big object files. Paul From andrea.crotti.0 at gmail.com Sun Jan 29 19:53:31 2012 From: andrea.crotti.0 at gmail.com (Andrea Crotti) Date: Sun, 29 Jan 2012 18:53:31 +0000 Subject: [Python-Dev] #include "Python.h" In-Reply-To: References: <4F25670E.4010701@gmail.com> <20120129172223.GA32042@iskra.aviel.ru> <4F258917.9090706@gmail.com> <20120129181007.GA17631@idyll.org> Message-ID: <4F2595AB.9040301@gmail.com> On 01/29/2012 06:34 PM, Paul Moore wrote: > On 29 January 2012 18:10, C. 
Titus Brown wrote: >> python-dev isn't that inappropriate, IMO, but probably the best place to >> go with this discussion is python-ideas. Could you repost over there? > I agree that python-dev isn't particularly appropriate, python-list is > probably your best bet. The python-ideas isn't really appropriate, as > this isn't a proposal for a change to Python, but rather a question > about how the Python C code is structured. That's always a grey area, > and I can see why the OP thought python-dev might be a reasonable > place. Ok well for this I won't repost it anywhere else, I have already all the answers I wanted and it was not so important.. > Having said all that: > >> Python.h actually includes everything as far as I can I see so: >> - it's very hard with a not-enough smart editor to find out where the >> not-locally defined symbols are actually defined (well sure that is >> not a problem for most of the people) > Well, that's more of a question of what tools you use to edit/read > Python code. I guess you could view it as a trade-off between ease of > writing the core code and extensions (avoiding micromanagement of > headers, and being able to document #include "Python.h" as the > canonical way to get access to the Python API from C) versus tracking > down macro definitions and symbol declarations (and that's really only > for information, as the API is documented in the manuals anyway). > > I don't use an editor that can automatically find the definitions, but > grep and the manuals does me fine. Yes sure it makes sense, probably it's even better than including only simple files, since all the contributions to Python.h can be moved around and refactored without breaking all the code.. And for editor I use Emacs, which can actually do any kind of magic on the symbols, I just didn't set it up for the python source code.. 
Thanks, Andrea From eliben at gmail.com Sun Jan 29 20:05:33 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 29 Jan 2012 21:05:33 +0200 Subject: [Python-Dev] #include "Python.h" In-Reply-To: <4F25670E.4010701@gmail.com> References: <4F25670E.4010701@gmail.com> Message-ID: On Sun, Jan 29, 2012 at 17:34, Andrea Crotti wrote: > I have a newbie question about CPython. > Looking at the C code I noted that for example in tupleobject.c there is > only one include > #include "Python.h" > > Python.h actually includes everything as far as I can see, so: > - it's very hard with a not-smart-enough editor to find out where the > not-locally defined symbols are actually defined (well sure that is > not a problem for most of the people) Hi Andrea, Not sure what you mean by "not-smart-enough editor". Dismissing IDEs for the moment (which by your classification are probably "smart enough"), Python's source code (including headers included in Python.h) is readily navigable with Emacs or Vim using ctags, which is very easy to set up. Declarations are then easily found. Even if you forgo such features of the editor, grepping (or source-specific greppers like ack or pss) also works fine most of the time. > - if all the files include python.h, doesn't it generate very big object > files? Or is it not a problem since they are stripped out after? Header files usually don't affect object file size. Unless something very fishy is going on (and this is not the case for headers included from Python.h, AFAIK) headers only contain declarations which don't affect code size. They may affect compilation time, but that's not a big problem for Python's code base which is very fast to compile. Eli From greg at krypto.org Sun Jan 29 22:20:07 2012 From: greg at krypto.org (Gregory P.
Smith) Date: Sun, 29 Jan 2012 13:20:07 -0800 Subject: [Python-Dev] [issue13703] Hash collision security issue In-Reply-To: <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> References: <4F22489D.7080902@g.nevcal.com> <4F22EA7B.1050903@g.nevcal.com> <20120127203928.Horde.MoApcUlCcOxPIv1w5_-lUBA@webmail.df.eu> Message-ID: On Fri, Jan 27, 2012 at 11:39 AM, wrote: > > In fact, none of the strategies fixes all issues with hash collisions; > even the hash-randomization solutions only deal with string keys, and > don't consider collisions on non-string keys. The hash-randomization approach also works fine on immutable container objects containing bytes and string keys such as tuples and UserString that merely expose a combination of the hashes of all of their contained elements. -gps From greg at krypto.org Sun Jan 29 22:26:06 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 29 Jan 2012 13:26:06 -0800 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: <4F235D45.7080707@pearwood.info> Message-ID: On Fri, Jan 27, 2012 at 6:33 PM, Benjamin Peterson wrote: > 2012/1/27 Steven D'Aprano : >> Benjamin Peterson wrote: >>> >>> Hello everyone, >>> In effort to get a fix out before Perl 6 goes mainstream, Barry and I >>> have decided to pronounce on what we want for our stable releases. >>> What we have decided is that >>> 1. Simple hash randomization is the way to go. We think this has the >>> best chance of actually fixing the problem while being fairly >>> straightforward such that we're comfortable putting it in a stable >>> release. >>> 2. It will be off by default in stable releases and enabled by an >>> envar at runtime. This will prevent code breakage from dictionary >>> order changing as well as people depending on the hash stability. >> >> >> Do you have the expectation that it will become on by default in some future >> release? > > Yes, 3.3. 
The solution in 3.3 could even be one of the more > sophisticated proposals we have today. Yay! Thanks for the decision Release Managers! -gps From greg at krypto.org Sun Jan 29 22:39:10 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 29 Jan 2012 13:39:10 -0800 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On Fri, Jan 27, 2012 at 9:26 AM, Alex wrote: > Eli Bendersky gmail.com> writes: > >> >> Hello, >> >> Following an earlier discussion on python-ideas [1], we would like to >> propose the following PEP for review. Discussion is welcome. The PEP >> can also be viewed in HTML form at >> http://www.python.org/dev/peps/pep-0408/ >> >> [1] http://mail.python.org/pipermail/python-ideas/2012-January/013246.html >> > > I'm -1 on this, for a pretty simple reason. Something goes into __preview__, > instead of its final destination directly, because it needs feedback/possibly > changes. However, given the release cycle of the stdlib (~18 months), any > feedback it gets can't be seen by actual users until it's too late. Essentially > you can only get one round of stdlib. > > I think a significantly healthier process (in terms of maximizing feedback and > getting something into its best shape) is to let a project evolve naturally on > PyPi and in the ecosystem, give feedback to it from an inclusion perspective, > and then include it when it becomes ready on its own merits. The counter > argument to this is that putting it in the stdlib gets you significantly more > eyeballs (and hopefully more feedback, therefore); my only response to this is: > if it doesn't get eyeballs on PyPi I don't think there's a great enough need to > justify it in the stdlib. -1 from me as well. How is the __preview__ namespace any different than the PendingDeprecationWarning that nobody ever uses? Nobody is likely to write significant code depending on anything in __preview__, thus the amount of feedback received would be low.
A better way to get additional feedback would be to promote libraries that we are considering including by way of direct links to them on pypi from the relevant areas of the Python documentation (including the Module Reference / Index pages?) for that release and let the feedback on them roll in via that route. An example of this working: ipaddr is ready to go in. It got the eyeballs and API modifications while still a pypi library as a result of the discussion around the time it was originally suggested as being added. I or any other committers have simply not added it yet. -gps From francismb at email.de Sun Jan 29 21:59:27 2012 From: francismb at email.de (francis) Date: Sun, 29 Jan 2012 21:59:27 +0100 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F252014.3080900@hotpy.org> References: <4F252014.3080900@hotpy.org> Message-ID: <4F25B32F.1090200@email.de> On 01/29/2012 11:31 AM, Mark Shannon wrote: > It passes all the tests. > (I had to change a couple that relied on dict repr() ordering) Hi Mark, I've cloned the repo, built it, then I've tried with ./python -m test. I got some errors: First in general: 340 tests OK.
2 tests failed: test_dis test_gdb 4 tests altered the execution environment: test_multiprocessing test_packaging test_site test_strlit 18 tests skipped: test_curses test_devpoll test_kqueue test_lzma test_msilib test_ossaudiodev test_smtpnet test_socketserver test_startfile test_timeout test_tk test_ttk_guionly test_urllib2net test_urllibnet test_winreg test_winsound test_xmlrpc_net test_zipfile64 1 skip unexpected on linux: test_lzma [1348560 refs] **************************************************** then test_dis: == CPython 3.3.0a0 (default:f15cf35c9922, Jan 29 2012, 18:12:19) [GCC 4.6.2] == Linux-3.1.0-1-amd64-x86_64-with-debian-wheezy-sid little-endian == /home/ci/prog/cpython/hotpy_new_dict/build/test_python_14470 Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0) [1/1] test_dis test_big_linenos (test.test_dis.DisTests) ... ok test_boundaries (test.test_dis.DisTests) ... ok test_bug_1333982 (test.test_dis.DisTests) ... ok test_bug_708901 (test.test_dis.DisTests) ... ok test_dis (test.test_dis.DisTests) ... ok test_dis_none (test.test_dis.DisTests) ... ok test_dis_object (test.test_dis.DisTests) ... ok test_dis_traceback (test.test_dis.DisTests) ... ok test_disassemble_bytes (test.test_dis.DisTests) ... ok test_disassemble_method (test.test_dis.DisTests) ... ok test_disassemble_method_bytes (test.test_dis.DisTests) ... ok test_disassemble_str (test.test_dis.DisTests) ... ok test_opmap (test.test_dis.DisTests) ... ok test_opname (test.test_dis.DisTests) ... ok test_code_info (test.test_dis.CodeInfoTests) ... FAIL test_code_info_object (test.test_dis.CodeInfoTests) ... ok test_pretty_flags_no_flags (test.test_dis.CodeInfoTests) ... ok test_show_code (test.test_dis.CodeInfoTests) ... 
FAIL ====================================================================== FAIL: test_code_info (test.test_dis.CodeInfoTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/ci/prog/cpython/hotpy_new_dict/Lib/test/test_dis.py", line 439, in test_code_info self.assertRegex(dis.code_info(x), expected) AssertionError: Regex didn't match: 'Name: f\nFilename: (.*)\nArgument count: 1\nKw-only arguments: 0\nNumber of locals: 1\nStack size: 8\nFlags: OPTIMIZED, NEWLOCALS, NESTED\nConstants:\n 0: None\nNames:\n 0: print\nVariable names:\n 0: c\nFree variables:\n 0: e\n 1: d\n 2: f\n 3: y\n 4: x\n 5: z' not found in 'Name: f\nFilename: /home/ci/prog/cpython/hotpy_new_dict/Lib/test/test_dis.py\nArgument count: 1\nKw-only arguments: 0\nNumber of locals: 1\nStack size: 8\nFlags: OPTIMIZED, NEWLOCALS, NESTED\nConstants:\n 0: None\nNames:\n 0: print\nVariable names:\n 0: c\nFree variables:\n 0: y\n 1: e\n 2: d\n 3: f\n 4: x\n 5: z' ====================================================================== FAIL: test_show_code (test.test_dis.CodeInfoTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/ci/prog/cpython/hotpy_new_dict/Lib/test/test_dis.py", line 446, in test_show_code self.assertRegex(output.getvalue(), expected+"\n") AssertionError: Regex didn't match: 'Name: f\nFilename: (.*)\nArgument count: 1\nKw-only arguments: 0\nNumber of locals: 1\nStack size: 8\nFlags: OPTIMIZED, NEWLOCALS, NESTED\nConstants:\n 0: None\nNames:\n 0: print\nVariable names:\n 0: c\nFree variables:\n 0: e\n 1: d\n 2: f\n 3: y\n 4: x\n 5: z\n' not found in 'Name: f\nFilename: /home/ci/prog/cpython/hotpy_new_dict/Lib/test/test_dis.py\nArgument count: 1\nKw-only arguments: 0\nNumber of locals: 1\nStack size: 8\nFlags: OPTIMIZED, NEWLOCALS, NESTED\nConstants:\n 0: None\nNames:\n 0: print\nVariable names:\n 0: c\nFree variables:\n 0: y\n 1: e\n 2: d\n 3: f\n 4: 
x\n 5: z\n' ---------------------------------------------------------------------- Ran 18 tests in 0.070s FAILED (failures=2) test test_dis failed 1 test failed: test_dis [111919 refs] ***************************************************** For test gdb: Lots of output ..... Ran 42 tests in 11.361s FAILED (failures=28) test test_gdb failed 1 test failed: test_gdb [109989 refs] From p.f.moore at gmail.com Sun Jan 29 23:02:44 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Jan 2012 22:02:44 +0000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: On 29 January 2012 21:39, Gregory P. Smith wrote: > An example of this working: ipaddr is ready to go in. It got the > eyeballs and API modifications while still a pypi library as a result > of the discussion around the time it was originally suggested as being > added. ?I or any other committers have simply not added it yet. Interesting. I recall the API debates and uncertainty, but I don't recall having seen anything to indicate that it all got resolved and we're essentially "ready to go". If I were looking for an IP address library, I wouldn't know where to go, and I certainly wouldn't know that there was an option that would become part of the stdlib. Not sure that counts as the approach "working"... (although I concede that my lack of a *real* need for an IP address library may be a contributing factor to my lack of knowledge...) Paul. From martin at v.loewis.de Sun Jan 29 23:15:37 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 29 Jan 2012 23:15:37 +0100 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F252014.3080900@hotpy.org> References: <4F252014.3080900@hotpy.org> Message-ID: <4F25C509.1070002@v.loewis.de> > Now that issue 13703 has been largely settled, > I want to propose my new dictionary implementation again. > It is a little more polished than before. 
Please clarify the status of that code: are you actually proposing 6a21f3b35e20 for inclusion into Python as-is? If so, please post it as a patch to the tracker, as it will need to be reviewed (possibly with requests for further changes). If not, it would be good if you could give a list of things that need to be done before you consider submission to Python. Also, please submit a contrib form if you haven't done so. Regards, Martin From mark at hotpy.org Sun Jan 29 23:26:21 2012 From: mark at hotpy.org (Mark Shannon) Date: Sun, 29 Jan 2012 22:26:21 +0000 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F25B32F.1090200@email.de> References: <4F252014.3080900@hotpy.org> <4F25B32F.1090200@email.de> Message-ID: <4F25C78D.1040001@hotpy.org> francis wrote: > On 01/29/2012 11:31 AM, Mark Shannon wrote: >> It passes all the tests. >> (I had to change a couple that relied on dict repr() ordering) > > Hi Mark, > I've cloned the repo, build it the I've tried with ./python -m test. I > got some errors: > > First in general: > 340 tests OK. > 2 tests failed: > test_dis test_gdb [snip] > > **************************************************** > then test_dis: > [snip] > ====================================================================== > FAIL: test_code_info (test.test_dis.CodeInfoTests) > ---------------------------------------------------------------------- [snip] > > ====================================================================== > FAIL: test_show_code (test.test_dis.CodeInfoTests) > ---------------------------------------------------------------------- [snip] These are known failures, the tests are at fault as they rely on dict ordering. However, they should be commented out. Probably crept back in again when I pulled the latest version of cpython -- I'll fix them now. [snip] > > ***************************************************** > For test gdb: > > Lots of output ..... 
> > Ran 42 tests in 11.361s > > FAILED (failures=28) > test test_gdb failed > 1 test failed: > test_gdb > [109989 refs] I still have gdb 6.something, would you mail me the full output please, so I can see what the problem is. Cheers, Mark. From barry at python.org Sun Jan 29 23:44:07 2012 From: barry at python.org (Barry Warsaw) Date: Sun, 29 Jan 2012 17:44:07 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120129174407.6e240d4c@resist.wooz.org> On Jan 28, 2012, at 07:29 PM, Guido van Rossum wrote: >Finally, if you really want to put warnings in whenever an >experimental module is being used, make it a silent warning, like >SilentDeprecationWarning. That allows people to request more strict >warnings without unduly alarming the users of an app. I'll just note too that we have examples of "stable" APIs in modules being used successfully in the field for years, and still having long hand-wringing debates about whether the API choices are right or not. email Nothing beats people beating on it heavily for years in production code to shake things out. I often think a generic answer to "did I get the API right" could be "no, but it's okay" :) -Barry From mark at hotpy.org Sun Jan 29 23:44:53 2012 From: mark at hotpy.org (Mark Shannon) Date: Sun, 29 Jan 2012 22:44:53 +0000 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F25C509.1070002@v.loewis.de> References: <4F252014.3080900@hotpy.org> <4F25C509.1070002@v.loewis.de> Message-ID: <4F25CBE5.9030908@hotpy.org> Martin v. Löwis wrote: >> Now that issue 13703 has been largely settled, >> I want to propose my new dictionary implementation again. >> It is a little more polished than before.
> > Please clarify the status of that code: are you actually proposing > 6a21f3b35e20 for inclusion into Python as-is? If so, please post it > as a patch to the tracker, as it will need to be reviewed (possibly > with requests for further changes). I thought it already was a patch. What do I need to do to make it a patch? > > If not, it would be good if you could give a list of things that need to > be done before you consider submission to Python. A few tests that rely on dict ordering should probably be fixed first. I'll submit bug reports for those. > > Also, please submit a contrib form if you haven't done so. Where do I find it? Cheers, Mark. From martin at v.loewis.de Sun Jan 29 23:57:36 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 29 Jan 2012 23:57:36 +0100 Subject: [Python-Dev] Switching to Visual Studio 2010 In-Reply-To: <20120129202309.GA21774@snakebite.org> References: <4F15DD85.6000905@v.loewis.de> <4F15E1A1.6090303@v.loewis.de> <20120126215431.Horde.dSI3OML8999PIb2HJXHnfeA@webmail.df.eu> <20120129202309.GA21774@snakebite.org> Message-ID: <4F25CEE0.2020003@v.loewis.de> > I... I think I might have already done this, inadvertently. I > needed an x64 VS2010 debug build of Subversion/APR*/Python a few > weeks ago -- forgetting the fact that we're still on VS2008. There is a lot of duplication of work going on here: at least four people have done the same. The more people duplicate the work, the more urgent it apparently becomes that the trunk switches "officially". > * Three new buildbot scripts: > - build-amd64-vs10.bat > - clean-amd64-vs10.bat > - external-amd64-vs10.bat When we switch, these should actually replace the current ones, rather than being additions. > So, I guess my question is, is that work useful? Perhaps not, given that several other copies of that to draw from may exist. OTOH, I haven't heard anybody reporting these specific changes. In any case, it's now in Brian's hand. 
Regards, Martin From martin at v.loewis.de Mon Jan 30 00:01:24 2012 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 30 Jan 2012 00:01:24 +0100 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F25CBE5.9030908@hotpy.org> References: <4F252014.3080900@hotpy.org> <4F25C509.1070002@v.loewis.de> <4F25CBE5.9030908@hotpy.org> Message-ID: <4F25CFC4.6050808@v.loewis.de> >> Please clarify the status of that code: are you actually proposing >> 6a21f3b35e20 for inclusion into Python as-is? If so, please post it >> as a patch to the tracker, as it will need to be reviewed (possibly >> with requests for further changes). > > I thought it already was a patch. What do I need to do to make it a patch? I missed your announcement of issue13903; all is fine here. > Where do I find it? http://www.python.org/psf/contrib/contrib-form-python/ Thanks, Martin From mark at hotpy.org Mon Jan 30 00:20:32 2012 From: mark at hotpy.org (Mark Shannon) Date: Sun, 29 Jan 2012 23:20:32 +0000 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: References: <4F252014.3080900@hotpy.org> <4F25C509.1070002@v.loewis.de> <4F25CBE5.9030908@hotpy.org> Message-ID: <4F25D440.9080000@hotpy.org> Matt Joiner wrote: > Mark, Good luck with getting this in, I'm also hopeful about coroutines, > maybe after pushing your dict optimization your coroutine implementation > will get more consideration. Shush, don't say the C word or you'll put people off ;) I'm actually not that fussed about the coroutine implementation. With "yield from" generators have all the power of asymmetric coroutines. I think my coroutine implementation is a neater way to do things, but it is not worth the fuss. Anyway, I'm working on my next crazy experiment :) Cheers, Mark. 
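Mark's remark that "yield from" gives generators all the power of asymmetric coroutines refers to PEP 380 delegation, new in 3.3. A minimal sketch of that pattern follows; the averager/grouper names are illustrative only, not taken from any patch discussed in this thread:

```python
# Sketch of asymmetric-coroutine-style delegation with PEP 380's
# "yield from" (requires Python 3.3+).

def averager():
    # A generator-based coroutine: receives numbers via send(),
    # returns the running average when sent a None sentinel.
    total, count = 0.0, 0
    while True:
        value = yield
        if value is None:          # sentinel ends the coroutine
            return total / count
        total += value
        count += 1

def grouper(results, key):
    # Delegates to averager(); "yield from" forwards sent values
    # inward and captures the subgenerator's return value.
    results[key] = yield from averager()

def run():
    results = {}
    g = grouper(results, "a")
    next(g)                        # prime the coroutine
    for v in (10, 20, 30):
        g.send(v)
    try:
        g.send(None)               # terminate; grouper then finishes too
    except StopIteration:
        pass
    return results
```

Here run() returns {"a": 20.0}: the return value of the inner averager flows back through "yield from" into grouper, which is the part that plain "yield" delegation could not express.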
From steve at pearwood.info Mon Jan 30 00:30:14 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 30 Jan 2012 10:30:14 +1100 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F256EDC.70707@hotpy.org> References: <4F252014.3080900@hotpy.org> <20120129160841.2343b62f@pitrou.net> <4F256EDC.70707@hotpy.org> Message-ID: <4F25D686.9070907@pearwood.info> Mark Shannon wrote: > Antoine Pitrou wrote: >> On Sun, 29 Jan 2012 09:56:11 -0500 >> Benjamin Peterson wrote: >> >>> 2012/1/29 Mark Shannon : >>>> Hi, >>>> >>>> Now that issue 13703 has been largely settled, >>>> I want to propose my new dictionary implementation again. >>>> It is a little more polished than before. >>> If you're serious about changing the dictionary implementation, I >>> think you should write a PEP. It should explain the new dicts >>> advantages (and disadvantages?) and give comprehensive benchmark >>> numbers. Something along the lines of >>> http://www.python.org/dev/peps/pep-3128/ I should think. >> >> "New dictionary implementation" is a misnomer here. Mark's patch merely >> allows to share the keys array between several dictionaries. The lookup >> algorithm remains exactly the same as far as I've read. It's actually >> much less invasive than e.g. Martin's AVL trees-for-hash-collisions >> proposal. >> > > Antoine is right. It is a reorganisation of the dict, plus a couple of > changes to typeobject.c and object.c to ensure that instance > dictionaries do indeed share keys arrays. I don't quite follow how that could work. If I have this: class C: pass a = C() b = C() a.spam = 1 b.ham = 2 how can a.__dict__ and b.__dict__ share key arrays? I've tried reading the source, but I'm afraid I don't understand it well enough to make sense of it. 
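One way the sharing can work, sketched in pure Python against Steven's own example, is to split the dict: a keys table (name to slot index) shared by all instances of a class, plus a per-instance values array. This is a deliberately simplified, hypothetical model; the actual patch does this in C (the thread mentions changes to typeobject.c and object.c), with different details:

```python
# Hypothetical, simplified model of a split (shared-key) instance
# dict -- illustration only, not Mark's actual implementation.

class SharedKeys:
    """Maps attribute names to slot indices; one table per class."""
    def __init__(self):
        self.indices = {}              # name -> slot index

    def index_for(self, name):
        # Grow the shared table the first time ANY instance uses a name.
        if name not in self.indices:
            self.indices[name] = len(self.indices)
        return self.indices[name]

class SplitDict:
    """Per-instance values array; the keys live in the shared table."""
    def __init__(self, keys):
        self.keys = keys
        self.values = []               # grows lazily per instance

    def __setitem__(self, name, value):
        i = self.keys.index_for(name)
        while len(self.values) <= i:
            self.values.append(None)   # "unset" slots (simplification)
        self.values[i] = value

    def __getitem__(self, name):
        i = self.keys.indices[name]    # KeyError if no instance has it
        if i >= len(self.values) or self.values[i] is None:
            raise KeyError(name)       # name exists elsewhere, not here
        return self.values[i]

# Steven's example: a.spam and b.ham end up in ONE keys table holding
# both names, while each instance stores only its own values array.
shared = SharedKeys()
a, b = SplitDict(shared), SplitDict(shared)
a["spam"] = 1
b["ham"] = 2
```

So a and b share {"spam": 0, "ham": 1} as their keys, but a's values array fills only slot 0 and b's only slot 1, which is where the memory saving comes from when many instances have the same attribute names.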
-- Steven From ncoghlan at gmail.com Mon Jan 30 00:46:13 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 Jan 2012 09:46:13 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120129174407.6e240d4c@resist.wooz.org> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120129174407.6e240d4c@resist.wooz.org> Message-ID: On Mon, Jan 30, 2012 at 8:44 AM, Barry Warsaw wrote: > Nothing beats people beating on it heavily for years in production code to > shake things out. I often think a generic answer to "did I get the API right" > could be "no, but it's okay" :) Heh, my answer to complaints about the urllib (etc) APIs being horrendous in the modern web era is to point out that they were put together in an age where "web" mostly meant "unauthenticated HTTP GET requests". They're hard to use for modern authentication protocols because they *predate* widespread use of such things... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at haypocalc.com Mon Jan 30 00:46:32 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 30 Jan 2012 00:46:32 +0100 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> Message-ID: >> import threading >> s = threading.Semaphore(0.5) > But why would you want to pass a float? It seems like API abuse to me. If something should be changed, Semaphore(arg) should raise a TypeError if arg is not an integer.
Victor From anacrolix at gmail.com Mon Jan 30 01:31:34 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Mon, 30 Jan 2012 11:31:34 +1100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120129174407.6e240d4c@resist.wooz.org> Message-ID: I think an advocacy of 3rd party modules would start with modules such as ipaddr, requests, regex. Linking directly to them from the python core documentation, while requesting they hold a successful moratorium in order to be included in a later standard module release. On Jan 30, 2012 10:47 AM, "Nick Coghlan" wrote: > On Mon, Jan 30, 2012 at 8:44 AM, Barry Warsaw wrote: > > Nothing beats people beating on it heavily for years in production code > to > > shake things out. I often think a generic answer to "did I get the API > right" > > could be "no, but it's okay" :) > > Heh, my answer to complaints about the urrlib (etc) APIs being > horrendous in the modern web era is to point out that they were put > together in an age where "web" mostly meant "unauthenticated HTTP GET > requests". > > They're hard to use for modern authentication protocols because they > *predate* widespread use of such things... > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From francismb at email.de Mon Jan 30 00:22:07 2012 From: francismb at email.de (francis) Date: Mon, 30 Jan 2012 00:22:07 +0100 Subject: [Python-Dev] A new dictionary implementation In-Reply-To: <4F25C78D.1040001@hotpy.org> References: <4F252014.3080900@hotpy.org> <4F25B32F.1090200@email.de> <4F25C78D.1040001@hotpy.org> Message-ID: <4F25D49F.4040508@email.de> > I still have gdb 6.something, > would you mail me the full output please, > so I can see what the problem is. It's done, let me know if you need more output. Cheers, francis From jxo6948 at rit.edu Mon Jan 30 03:11:08 2012 From: jxo6948 at rit.edu (John O'Connor) Date: Sun, 29 Jan 2012 21:11:08 -0500 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> Message-ID: On Sat, Jan 28, 2012 at 3:07 PM, Benjamin Peterson wrote: > But why would you want to pass a float? It seems like API abuse to me. > Agreed. Anything else seems meaningless. From ethan at stoneleaf.us Mon Jan 30 05:51:29 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 29 Jan 2012 20:51:29 -0800 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: References: <4F2217D1.2000700@stoneleaf.us> <4F24F85C.9080603@stoneleaf.us> Message-ID: <4F2621D1.8010301@stoneleaf.us> Latest addition for PEP 409 has been sent. Text follows: Language Details ================ Currently, __context__ and __cause__ start out as None, and then get set as exceptions occur. To support 'from None', __context__ will stay as it is, but __cause__ will start out as False, and will change to None when the 'raise ... from None' method is used. If __cause__ is False the __context__ (if any) will be printed. If __cause__ is None the __context__ will not be printed. If __cause__ is anything else, __cause__ will be printed.
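For illustration, the behavior described above can be observed in Python 3.3 and later, where PEP 409's False sentinel was subsequently refined into a __suppress_context__ flag (PEP 415); the observable result matches the text above -- __context__ survives for logging, but its display is suppressed:

```python
# Sketch of 'raise ... from None' semantics as they eventually
# shipped (Python 3.3+, PEP 409 as refined by PEP 415).
# The convert() helper is hypothetical, purely for illustration.

def convert(text):
    try:
        return int(text)
    except ValueError:
        # 'from None': hide the chained context in the displayed
        # traceback without actually discarding it.
        raise KeyError("unknown setting: %r" % text) from None

try:
    convert("spam")
except KeyError as e:
    context = e.__context__            # still the original ValueError
    suppressed = e.__suppress_context__  # True: hidden from display
    cause = e.__cause__                  # None, per 'from None'
```

The original ValueError remains reachable on the new exception for querying, while the printed traceback shows only the KeyError.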
This has the benefit of leaving the __context__ intact for future logging, querying, etc., while suppressing its display if it is not caught. raise ... from ... is not disallowed outside a try block, but this behavior is not guaranteed to remain. ------------------------------------------------------------------ Should that last disclaimer be there? Should it be changed? ~Ethan~ From ncoghlan at gmail.com Mon Jan 30 07:23:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 Jan 2012 16:23:01 +1000 Subject: [Python-Dev] PEP for allowing 'raise NewException from None' In-Reply-To: <4F2621D1.8010301@stoneleaf.us> References: <4F2217D1.2000700@stoneleaf.us> <4F24F85C.9080603@stoneleaf.us> <4F2621D1.8010301@stoneleaf.us> Message-ID: On Mon, Jan 30, 2012 at 2:51 PM, Ethan Furman wrote: > raise ... from ... is not disallowed outside a try block, but this > behavior is not guaranteed to remain. > > ------------------------------------------------------------------ > > Should that last disclaimer be there? Should it be changed? I'd leave it out - the original PEP didn't disallow it, enforcing it would be annoying, and it's easy enough to pick up if you happen to care (it will mean __cause__ is set along with __context__ == None). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From thcberserk at me.com Mon Jan 30 10:51:45 2012 From: thcberserk at me.com (Ivano) Date: Mon, 30 Jan 2012 09:51:45 +0000 (UTC) Subject: [Python-Dev] Release cycle question Message-ID: Hello everyone. I'm writing to ask if Python uses a "fixed" release time or if it depends strongly on something else. For example, Blender does, and since I'm diving into Python because I would like to extend it, I would like to know if my work will have a default lifetime or not. By the way, Python 3 changed the game AFAIK, will another major change come soon? Thanks in advance for any help. Bye, Ivano.
From ncoghlan at gmail.com Mon Jan 30 11:55:06 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 Jan 2012 20:55:06 +1000 Subject: [Python-Dev] Release cycle question In-Reply-To: References: Message-ID: On Mon, Jan 30, 2012 at 7:51 PM, Ivano wrote: > Hello everyone. > I'm writing to ask if Python uses a "fixed" release > time or if it depends strongly on something else. > For example, Blender does, and since I'm diving > into Python because I would like to extend it, I > would like to know if my work will have a default > lifetime or not. Hi Ivano, The current release cycle is documented in the developer's guide: http://docs.python.org/devguide/devcycle.html At this point in time, there are two official python.org releases: Python 2.7 and 3.2 2.7 was released in July 2010 and will receive maintenance updates until around July 2015 (as it is the final release in the 2.x series) 3.2 was released in February 2011 and will receive maintenance updates until around August this year (but will receive further source-only security updates until around February 2016) 3.3 is due for release in August this year. However, those are the official support dates specifically for python-dev. OS vendors such as Red Hat and Canonical provide support for older versions of Python as part of their enterprise releases (e.g. RHEL5 is still supported by Red Hat and ships with Python 2.4, even though python-dev ended upstream security updates for 2.4 in 2009) > By the way, Python 3 changed the game AFAIK, > will another major change come soon? No, as noted on the development cycle page, changes on the scale of those between Python 2 and Python 3 are not expected any time in the near future. I'd personally be surprised if anything like that transition happened again within the next decade. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From brett at python.org Mon Jan 30 18:03:20 2012 From: brett at python.org (Brett Cannon) Date: Mon, 30 Jan 2012 12:03:20 -0500 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: <4F235D45.7080707@pearwood.info> Message-ID: On Fri, Jan 27, 2012 at 21:33, Benjamin Peterson wrote: > 2012/1/27 Steven D'Aprano : > > Benjamin Peterson wrote: > >> > >> Hello everyone, > >> In effort to get a fix out before Perl 6 goes mainstream, Barry and I > >> have decided to pronounce on what we want for our stable releases. > >> What we have decided is that > >> 1. Simple hash randomization is the way to go. We think this has the > >> best chance of actually fixing the problem while being fairly > >> straightforward such that we're comfortable putting it in a stable > >> release. > >> 2. It will be off by default in stable releases and enabled by an > >> envar at runtime. This will prevent code breakage from dictionary > >> order changing as well as people depending on the hash stability. > > > Great! > > > > Do you have the expectation that it will become on by default in some > future > > release? > > Yes, 3.3. The solution in 3.3 could even be one of the more > sophisticated proposals we have today. I think that would be good. And I would even argue we remove support for turning it off to force people to no longer lean on dict ordering as a crutch (in 3.3 obviously). -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Mon Jan 30 18:14:44 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 30 Jan 2012 12:14:44 -0500 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: <4F235D45.7080707@pearwood.info> Message-ID: <20120130121444.1281ee6d@resist.wooz.org> On Jan 30, 2012, at 12:03 PM, Brett Cannon wrote: >I think that would be good. 
And I would even argue we remove support for >turning it off to force people to no longer lean on dict ordering as a >crutch (in 3.3 obviously). Yes, please! -Barry From scott+python-dev at scottdial.com Mon Jan 30 19:19:56 2012 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 30 Jan 2012 13:19:56 -0500 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: References: Message-ID: <4F26DF4C.5080006@scottdial.com> On 1/29/2012 4:39 PM, Gregory P. Smith wrote: > An example of this working: ipaddr is ready to go in. It got the > eyeballs and API modifications while still a pypi library as a result > of the discussion around the time it was originally suggested as being > added. I or any other committers have simply not added it yet. This is wrong. PEP 3144 was not pronounced upon, so ipaddr is not just waiting for someone to commit it; it's waiting on consensus and pronouncement. PEP 3144 wasn't pronounced upon because there were significant disagreements about the design of the API proposed in the PEP. As it stands, I believe the authorship of ipaddr either decided that they were not going to compromise their module or lost interest. See Nick Coghlan's summary: http://mail.python.org/pipermail//python-ideas/2011-August/011305.html -- Scott Dial scott at scottdial.com From guido at python.org Mon Jan 30 19:29:44 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jan 2012 10:29:44 -0800 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <4F26DF4C.5080006@scottdial.com> References: <4F26DF4C.5080006@scottdial.com> Message-ID: Maybe that's another example of waiting too long for the perfect decision though. In the last ~12 months, ipaddr was downloaded at least 11,000 times from its home (http://code.google.com/p/ipaddr-py/downloads/list). There's been a fair amount of changes over that time and a new release was put out 10 days ago. What are the stats for the "competing" package? 
--Guido On Mon, Jan 30, 2012 at 10:19 AM, Scott Dial wrote: > On 1/29/2012 4:39 PM, Gregory P. Smith wrote: >> An example of this working: ipaddr is ready to go in. It got the >> eyeballs and API modifications while still a pypi library as a result >> of the discussion around the time it was originally suggested as being >> added. I or any other committers have simply not added it yet. > > This is wrong. PEP 3144 was not pronounced upon, so ipaddr is not just > waiting for someone to commit it; it's waiting on consensus and > pronouncement. > > PEP 3144 wasn't pronounced upon because there were significant > disagreements about the design of the API proposed in the PEP. As it > stands, I believe the authorship of ipaddr either decided that they were > not going to compromise their module or lost interest. > > See Nick Coghlan's summary: > > http://mail.python.org/pipermail//python-ideas/2011-August/011305.html > > -- > Scott Dial > scott at scottdial.com > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From bauertomer at gmail.com Mon Jan 30 19:40:32 2012 From: bauertomer at gmail.com (T.B.) Date: Mon, 30 Jan 2012 20:40:32 +0200 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> Message-ID: <4F26E420.2060707@gmail.com> On 2012-01-30 01:46, Victor Stinner wrote: >> >> But why would you want to pass a float? It seems like API abuse to me. > > If something should be changed, Semaphore(arg) should raise a > TypeError if arg is not an integer. > Short version: I propose the change to be - while self._value == 0: + while self._value < 1: This should not change the flow when Semaphore._value is an int.
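The proposed one-line change can be sketched on a toy, single-threaded model. This is a hypothetical class for illustration only, not the actual threading.Semaphore, which guards _value with a condition variable and blocks instead of returning False:

```python
# Toy, lock-free model of the "< 1" acquire check for a counter
# that may be fractional (illustration of the proposal only).

class FractionalSemaphore:
    def __init__(self, value=1):
        if value < 0:
            raise ValueError("semaphore initial value must be >= 0")
        self._value = value

    def acquire(self):
        # The proposed change: require at least one whole unit, so a
        # fractional counter can never be driven below zero.
        if self._value < 1:        # was: == 0
            return False           # the real class would block here
        self._value -= 1
        return True

    def release(self):
        self._value += 1

s = FractionalSemaphore(0.5)
```

With the old "== 0" test, acquire() on the 0.5-unit semaphore would decrement the counter to -0.5; with "< 1" it refuses, preserving the invariant that the counter never goes below zero.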
Longer explanation: I thought it is surprising to use math.floor() for threading.Semaphore, but now as you propose, we will need to use something like int(math.floor(value)) in Python2.x - which is even more surprising. That is because math.floor() (and round() for that matter) return a float object in Python2.x. Note: isinstance(4.0, numbers.Integral) is False, even in Python3.x, but until now 4.0 was valid as a value for Semaphore(). Also, using the builtin int()/math.trunc() on a float is probably not what you want here, but rather math.floor(). The value argument given to threading.Semaphore() is really a duck (or an object) that can be compared to 0 and 1, incremented by 1 and decremented by 1. These are properties that fit float. Why should you force the entire builtin int behavior on that object? I agree that using a float as the counter smells bad, but at times you might have something like a fractional resource (which is different from a floating point number). In such cases Semaphore.acquire(), after the tiny patch above, can be thought as checking if you have at least one "unit of resource" available. If you do have at least one such resource - acquire it. This will make sure the invariant "The counter can never go below zero" holds. Regards, TB From guido at python.org Mon Jan 30 19:52:29 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jan 2012 10:52:29 -0800 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: <4F26E420.2060707@gmail.com> References: <4F24538E.9060705@gmail.com> <4F26E420.2060707@gmail.com> Message-ID: TB, what's your use case for passing a float to a semaphore? Semaphores are conceptually tied to integers. You've kept arguing a few times now that the workaround you need are clumsy, but you've not explained why you're passing floats in the first place. A "fractional resource" just doesn't sound like a real use case to me. On Mon, Jan 30, 2012 at 10:40 AM, T.B. 
wrote: > > On 2012-01-30 01:46, Victor Stinner wrote: >>> >>> >>> But why would you want to pass a float? It seems like API abuse to me. >> >> >> If something should be changed, Semaphore(arg) should raise a >> TypeError if arg is not an integer. >> > Short version: > I propose the the change to be > > - ? ? ? ?while self._value == 0: > + ? ? ? ?while self._value < 1: > This should not change the flow when Semaphore._value is an int. > > Longer explanation: > I thought it is surprising to use math.floor() for threading.Semaphore, but > now as you propose, we will need to use something like > int(math.floor(value)) in Python2.x - which is even more surprising. That is > because math.floor() (and round() for that matter) return a float object in > Python2.x. > > Note: isinstance(4.0, numbers.Integral) is False, even in Python3.x, but > until now 4.0 was valid as a value for Semaphore(). Also, using the builtin > int()/math.trunc() on a float is probably not what you want here, but rather > math.floor(). > > The value argument given to threading.Semaphore() is really a duck (or an > object) that can be compared to 0 and 1, incremented by 1 and decremented by > 1. These are properties that fit float. Why should you force the entire > builtin int behavior on that object? > > I agree that using a float as the counter smells bad, but at times you might > have something like a fractional resource (which is different from a > floating point number). In such cases Semaphore.acquire(), after the tiny > patch above, can be thought as checking if you have at least one "unit of > resource" available. If you do have at least one such resource - acquire it. > This will make sure the invariant "The counter can never go below zero" > holds. 
> > Regards, > TB > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Mon Jan 30 19:59:22 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 30 Jan 2012 19:59:22 +0100 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120130195922.459cd010@pitrou.net> On Sun, 29 Jan 2012 16:42:28 +1000 Nick Coghlan wrote: > On Sun, Jan 29, 2012 at 1:29 PM, Guido van Rossum wrote: > > On Sat, Jan 28, 2012 at 5:33 PM, Nick Coghlan wrote: > >> I'm willing to go along with that (especially given your report of > >> AppEngine's experience with the "labs" namespace). > >> > >> Can we class this as a pronouncement on PEP 408? That is, "No to > >> adding a __preview__ namespace, but yes to adding regex directly for > >> 3.3"? > > > > Yup. We seem to have a tendency to over-analyze decisions a bit lately > > (witness the hand-wringing about the hash collision DoS attack). > > I have now updated PEP 408 accordingly (i.e. rejected, but with a > specific note about regex). It would be nice if that pronouncement or decision could outline the steps required to include an "experimental" module in the stdlib, and the steps required to move it from "experimental" to "stable". Regards Antoine. From stefan at brunthaler.net Mon Jan 30 20:06:44 2012 From: stefan at brunthaler.net (stefan brunthaler) Date: Mon, 30 Jan 2012 11:06:44 -0800 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... 
In-Reply-To: <4F23C657.9050501@hotpy.org> References: <4F23C657.9050501@hotpy.org> Message-ID: Hello, > Could you try benchmarking with the "standard" benchmarks: > http://hg.python.org/benchmarks/ > and see what sort of performance gains you get? > Yeah, of course. I already did. Refere to the page listed below for details. I did not look into the results yet, though. > How portable is the threaded interpreter? > Well, you can implement threaded code on any machine that support indirect branch instructions. Fortunately, GCC supports the "label-as-values" feature, which makes it available on any machine that supports GCC. My optimizations themselves are portable, and I tested them on a PowerPC for my thesis, too. (AFAIR, llvm supports this feature, too.) > Do you have a public repository for the code, so we can take a look? > I have created a patch (as Benjamin wanted) and put all of the resources (i.e., benchmark results and the patch itself) on my home page: http://www.ics.uci.edu/~sbruntha/pydev.html Regards, --stefan From solipsis at pitrou.net Mon Jan 30 20:13:52 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 30 Jan 2012 20:13:52 +0100 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... References: <4F23C657.9050501@hotpy.org> Message-ID: <20120130201352.6fc893e9@pitrou.net> Hello, > Well, you can implement threaded code on any machine that support > indirect branch instructions. Fortunately, GCC supports the > "label-as-values" feature, which makes it available on any machine > that supports GCC. My optimizations themselves are portable, and I > tested them on a PowerPC for my thesis, too. (AFAIR, llvm supports > this feature, too.) Well, you're aware that Python already uses threaded code where available? Or are you testing against Python 2? Regards Antoine. 
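[Editor's note: for readers unfamiliar with the technique being discussed, "threaded code" replaces the interpreter's central switch/table dispatch with direct jumps to each handler; CPython does this in C using GCC's labels-as-values. The core idea can be sketched in Python as a toy model (this is an illustration only, not CPython's implementation):]

```python
# Toy model of threaded-code dispatch (illustration only, not CPython's
# actual C implementation).  A switch-based interpreter looks up each
# opcode in a table on every iteration; a threaded interpreter rewrites
# the instruction stream so each slot already holds a reference to its
# handler, removing the per-instruction lookup.

def op_push(vm, arg):
    vm["stack"].append(arg)

def op_add(vm, _):
    s = vm["stack"]
    b, a = s.pop(), s.pop()
    s.append(a + b)

HANDLERS = {"PUSH": op_push, "ADD": op_add}

def thread_code(program):
    # "Translation" step: resolve opcode names to handler references once,
    # analogous to storing label addresses in the instruction stream.
    return [(HANDLERS[op], arg) for op, arg in program]

def run(threaded, vm):
    for handler, arg in threaded:   # dispatch is now a direct call
        handler(vm, arg)

vm = {"stack": []}
run(thread_code([("PUSH", 2), ("PUSH", 3), ("ADD", None)]), vm)
print(vm["stack"])  # -> [5]
```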
From stefan at brunthaler.net Mon Jan 30 20:18:09 2012 From: stefan at brunthaler.net (stefan brunthaler) Date: Mon, 30 Jan 2012 11:18:09 -0800 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: <20120130201352.6fc893e9@pitrou.net> References: <4F23C657.9050501@hotpy.org> <20120130201352.6fc893e9@pitrou.net> Message-ID: > Well, you're aware that Python already uses threaded code where > available? Or are you testing against Python 2? > Yes, and I am building on that. --stefan From ncoghlan at gmail.com Mon Jan 30 22:07:53 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 07:07:53 +1000 Subject: [Python-Dev] plugging the hash attack In-Reply-To: References: <4F235D45.7080707@pearwood.info> Message-ID: On Tue, Jan 31, 2012 at 3:03 AM, Brett Cannon wrote: > I think that would be good. And I would even argue we remove support for > turning it off to force people to no longer lean on dict ordering as a > crutch (in 3.3 obviously). On-by-default should be enough to cover that. Just as we allow people to force the random seed to reproduce particular sequences, there's value in being able to increase determinism in cases where the collision attack isn't a concern. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Mon Jan 30 22:26:30 2012 From: brett at python.org (Brett Cannon) Date: Mon, 30 Jan 2012 16:26:30 -0500 Subject: [Python-Dev] [Python-checkins] cpython: Issue #8828: Add new function os.replace(), for cross-platform renaming with In-Reply-To: References: Message-ID: Should this end up being used in importlib through _os? On Mon, Jan 30, 2012 at 16:11, antoine.pitrou wrote: > http://hg.python.org/cpython/rev/80ddbd822227 > changeset: 74689:80ddbd822227 > user: Antoine Pitrou > date: Mon Jan 30 22:08:52 2012 +0100 > summary: > Issue #8828: Add new function os.replace(), for cross-platform renaming > with overwriting.
> > files: > Doc/library/os.rst | 18 +++++++++- > Lib/test/test_os.py | 12 ++++++ > Misc/NEWS | 3 + > Modules/posixmodule.c | 55 +++++++++++++++++++++--------- > 4 files changed, 69 insertions(+), 19 deletions(-) > > > diff --git a/Doc/library/os.rst b/Doc/library/os.rst > --- a/Doc/library/os.rst > +++ b/Doc/library/os.rst > @@ -1889,8 +1889,9 @@ > Unix flavors if *src* and *dst* are on different filesystems. If > successful, > the renaming will be an atomic operation (this is a POSIX requirement). > On > Windows, if *dst* already exists, :exc:`OSError` will be raised even if > it is a > - file; there may be no way to implement an atomic rename when *dst* > names an > - existing file. > + file. > + > + If you want cross-platform overwriting of the destination, use > :func:`replace`. > > Availability: Unix, Windows. > > @@ -1908,6 +1909,19 @@ > permissions needed to remove the leaf directory or file. > > > +.. function:: replace(src, dst) > + > + Rename the file or directory *src* to *dst*. If *dst* is a directory, > + :exc:`OSError` will be raised. If *dst* exists and is a file, it will > + be replaced silently if the user has permission. The operation may > fail > + if *src* and *dst* are on different filesystems. If successful, > + the renaming will be an atomic operation (this is a POSIX requirement). > + > + Availability: Unix, Windows > + > + .. versionadded:: 3.3 > + > + > .. function:: rmdir(path) > > Remove (delete) the directory *path*. 
Only works when the directory is > diff --git a/Lib/test/test_os.py b/Lib/test/test_os.py > --- a/Lib/test/test_os.py > +++ b/Lib/test/test_os.py > @@ -129,6 +129,18 @@ > self.fdopen_helper('r') > self.fdopen_helper('r', 100) > > + def test_replace(self): > + TESTFN2 = support.TESTFN + ".2" > + with open(support.TESTFN, 'w') as f: > + f.write("1") > + with open(TESTFN2, 'w') as f: > + f.write("2") > + self.addCleanup(os.unlink, TESTFN2) > + os.replace(support.TESTFN, TESTFN2) > + self.assertRaises(FileNotFoundError, os.stat, support.TESTFN) > + with open(TESTFN2, 'r') as f: > + self.assertEqual(f.read(), "1") > + > > # Test attributes on return values from os.*stat* family. > class StatAttributeTests(unittest.TestCase): > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -463,6 +463,9 @@ > Library > ------- > > +- Issue #8828: Add new function os.replace(), for cross-platform renaming > + with overwriting. > + > - Issue #13848: open() and the FileIO constructor now check for NUL > characters in the file name. Patch by Hynek Schlawack. > > diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c > --- a/Modules/posixmodule.c > +++ b/Modules/posixmodule.c > @@ -3280,17 +3280,16 @@ > #endif /* HAVE_SETPRIORITY */ > > > -PyDoc_STRVAR(posix_rename__doc__, > -"rename(old, new)\n\n\ > -Rename a file or directory."); > - > -static PyObject * > -posix_rename(PyObject *self, PyObject *args) > +static PyObject * > +internal_rename(PyObject *self, PyObject *args, int is_replace) > { > #ifdef MS_WINDOWS > PyObject *src, *dst; > BOOL result; > - if (PyArg_ParseTuple(args, "UU:rename", &src, &dst)) > + int flags = is_replace ? MOVEFILE_REPLACE_EXISTING : 0; > + if (PyArg_ParseTuple(args, > + is_replace ? 
"UU:replace" : "UU:rename", > + &src, &dst)) > { > wchar_t *wsrc, *wdst; > > @@ -3301,16 +3300,17 @@ > if (wdst == NULL) > return NULL; > Py_BEGIN_ALLOW_THREADS > - result = MoveFileW(wsrc, wdst); > + result = MoveFileExW(wsrc, wdst, flags); > Py_END_ALLOW_THREADS > if (!result) > - return win32_error("rename", NULL); > + return win32_error(is_replace ? "replace" : "rename", NULL); > Py_INCREF(Py_None); > return Py_None; > } > else { > PyErr_Clear(); > - if (!PyArg_ParseTuple(args, "O&O&:rename", > + if (!PyArg_ParseTuple(args, > + is_replace ? "O&O&:replace" : "O&O&:rename", > PyUnicode_FSConverter, &src, > PyUnicode_FSConverter, &dst)) > return NULL; > @@ -3319,15 +3319,15 @@ > goto error; > > Py_BEGIN_ALLOW_THREADS > - result = MoveFileA(PyBytes_AS_STRING(src), > - PyBytes_AS_STRING(dst)); > + result = MoveFileExA(PyBytes_AS_STRING(src), > + PyBytes_AS_STRING(dst), flags); > Py_END_ALLOW_THREADS > > Py_XDECREF(src); > Py_XDECREF(dst); > > if (!result) > - return win32_error("rename", NULL); > + return win32_error(is_replace ? "replace" : "rename", NULL); > Py_INCREF(Py_None); > return Py_None; > > @@ -3337,10 +3337,30 @@ > return NULL; > } > #else > - return posix_2str(args, "O&O&:rename", rename); > -#endif > -} > - > + return posix_2str(args, > + is_replace ? 
"O&O&:replace" : "O&O&:rename", > rename); > +#endif > +} > + > +PyDoc_STRVAR(posix_rename__doc__, > +"rename(old, new)\n\n\ > +Rename a file or directory."); > + > +static PyObject * > +posix_rename(PyObject *self, PyObject *args) > +{ > + return internal_rename(self, args, 0); > +} > + > +PyDoc_STRVAR(posix_replace__doc__, > +"replace(old, new)\n\n\ > +Rename a file or directory, overwriting the destination."); > + > +static PyObject * > +posix_replace(PyObject *self, PyObject *args) > +{ > + return internal_rename(self, args, 1); > +} > > PyDoc_STRVAR(posix_rmdir__doc__, > "rmdir(path)\n\n\ > @@ -10555,6 +10575,7 @@ > {"readlink", win_readlink, METH_VARARGS, win_readlink__doc__}, > #endif /* !defined(HAVE_READLINK) && defined(MS_WINDOWS) */ > {"rename", posix_rename, METH_VARARGS, posix_rename__doc__}, > + {"replace", posix_replace, METH_VARARGS, > posix_replace__doc__}, > {"rmdir", posix_rmdir, METH_VARARGS, posix_rmdir__doc__}, > {"stat", posix_stat, METH_VARARGS, posix_stat__doc__}, > {"stat_float_times", stat_float_times, METH_VARARGS, > stat_float_times__doc__}, > > -- > Repository URL: http://hg.python.org/cpython > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://mail.python.org/mailman/listinfo/python-checkins > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Jan 30 22:34:29 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 30 Jan 2012 22:34:29 +0100 Subject: [Python-Dev] cpython: Issue #8828: Add new function os.replace(), for cross-platform renaming with References: Message-ID: <20120130223429.7f06d7e6@pitrou.net> On Mon, 30 Jan 2012 16:26:30 -0500 Brett Cannon wrote: > Should this end up being used in importlib through _os? Yes, probably. I hadn't thought about that. Regards Antoine. 
From ncoghlan at gmail.com Mon Jan 30 22:44:26 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 07:44:26 +1000 Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 -- Standard library __preview__ package) Message-ID: On Tue, Jan 31, 2012 at 4:19 AM, Scott Dial wrote: > PEP 3144 wasn't pronounced upon because there were significant > disagreements about the design of the API proposed in the PEP. As it > stands, I believe the authorship of ipaddr either decided that they were > not going to compromise their module or lost interest. > > See Nick Coghlan's summary: > > http://mail.python.org/pipermail//python-ideas/2011-August/011305.html Peter Moody actually addressed all my comments from last year (alas, I forgot that python-ideas got dropped from the latter part of the email chain, so it became a private discussion between Peter, Guido and myself). I apparently got distracted by other issues and never followed up on Peter's final review request. The branch with the relevant changes is here (these weren't added back into ipaddr mainline since they aren't all backwards compatible with the existing ipaddr API): http://code.google.com/p/ipaddr-py/source/browse/#svn%2Fbranches%2F3144 Peter was very responsive and accommodating during that discussion :) (The notes below are an edited version of Peter's off-list reply to me from last year, reflecting the final state of the ipaddr 3144 branch) On Mon, Aug 29, 2011 at 7:09 PM, Nick Coghlan wrote: I believe the PEP would be significantly more palatable with the following changes/additions: 1. Draft ReStructuredText documentation for inclusion in the stdlib docs (still needed) 2. 
Removal of the "ip" attribute of IP network objects (since it makes the nominal "networks" behave like IP interface definitions) the Class hierarchy now looks like: _IPAddrBase(object) # mother of everything _BaseAddress(_IPAddrBase) # base for addresses _ BaseNetwork(_IPAddrBase) # base for networks and interfaces, could use be renamed. _BaseV4(object) # ipv4 base _BaseV6(object) # ipv6 base IPv4Address(_BaseV4, _BaseAddress) IPv4Interface(_BaseV4, _BaseNetwork) IPv4Network(IPv4Interface) IPv6Address(_BaseV6, _BaseAddress) IPv6Interface(_BaseV6, _BaseNetwork) IPv6Network(IPv6Interface) (essentially, the current ipaddr "Network" objects become "Interface" objects in PEP 3144, with a new strict "Network" object that has no ip attribute) 3. "network" property renamed to "netaddr" (since it returns an address object rather than a network object) renamed to network_address. did the same for the broadcast_address. 4. "strict" parameter removed from class signatures, replaced with class method for non-strict behaviour 'strict' is gone, just create IPv*Interface objects or use the ip_interface API instead. Network objects are always strict. 5. Factory functions renamed so they don't look like class names (ip_network, ip_address, ip) Now ip_address, ip_network, ip_interface 6. "strict" parameter on factory functions modified to default to True rather than False 'strict' is gone. Interfaces allow a host IP, Networks don't. 7. Addition of an explicit "IPInterface" class to cover the association of an address with a specific network that is currently handled by storing arbitrary addresses on IP network objects done. So with a cleanup of the docstrings (and creation of some ReST docs based on them) a definite +1 from me for inclusion of ipaddr (based on the 3144 branch in SVN) in 3.3. 
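[Editor's note: the reworked API enumerated above can be sketched with the module as it eventually shipped in the Python 3.3 stdlib under the name `ipaddress`; the key behavioural split between strict Network objects and permissive Interface objects looks like this:]

```python
import ipaddress

# Sketch of the reworked API described above, using the names under which
# it eventually landed in the stdlib as `ipaddress` (Python 3.3+).

addr = ipaddress.ip_address("192.0.2.1")   # factory function, lowercase name

# Network objects are always strict: host bits set -> ValueError.
try:
    ipaddress.ip_network("192.0.2.1/24")
except ValueError as exc:
    print("strict network rejected:", exc)

# The permissive behaviour moved to the separate Interface objects,
# which associate a host address with its network.
iface = ipaddress.ip_interface("192.0.2.1/24")
print(iface.network)                   # the containing network
print(iface.network.network_address)   # renamed from the old .network property
```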
(with the tweaks to the API, we may want to use a different name like "ipaddress" or "iptools", though - otherwise people could be legitimately confused by the differences relative to the PyPI "ipaddr" module) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From pmoody at google.com Mon Jan 30 22:52:26 2012 From: pmoody at google.com (Peter Moody) Date: Mon, 30 Jan 2012 13:52:26 -0800 Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 -- Standard library __preview__ package) In-Reply-To: References: Message-ID: On Mon, Jan 30, 2012 at 1:44 PM, Nick Coghlan wrote: > On Tue, Jan 31, 2012 at 4:19 AM, Scott Dial > wrote: >> PEP 3144 wasn't pronounced upon because there were significant >> disagreements about the design of the API proposed in the PEP. As it >> stands, I believe the authorship of ipaddr either decided that they were >> not going to compromise their module or lost interest. >> >> See Nick Coghlan's summary: >> >> http://mail.python.org/pipermail//python-ideas/2011-August/011305.html > > Peter Moody actually addressed all my comments from last year (alas, I > forgot that python-ideas got dropped from the latter part of the email > chain, so it became a private discussion between Peter, Guido and > myself). I apparently got distracted by other issues and never > followed up on Peter's final review request. The branch with the > relevant changes is here (these weren't added back into ipaddr > mainline since they aren't all backwards compatible with the existing > ipaddr API): http://code.google.com/p/ipaddr-py/source/browse/#svn%2Fbranches%2F3144 > > Peter was very responsive and accommodating during that discussion :) > > (The notes below are an edited version of Peter's off-list reply to me > from last year, reflecting the final state of the ipaddr 3144 branch) > > On Mon, Aug 29, 2011 at 7:09 PM, Nick Coghlan wrote: > > ? ?I believe the PEP would be significantly more palatable with the > ? 
?following changes/additions: > ? ?1. Draft ReStructuredText documentation for inclusion in the stdlib docs > > (still needed) > > ? ?2. Removal of the "ip" attribute of IP network objects (since it makes > ? ?the nominal "networks" behave like IP interface definitions) > > the Class hierarchy now looks like: > > _IPAddrBase(object) # mother of everything > _BaseAddress(_IPAddrBase) # base for addresses > _ BaseNetwork(_IPAddrBase) # base for networks and interfaces, could > use be renamed. > _BaseV4(object) # ipv4 base > _BaseV6(object) # ipv6 base > > IPv4Address(_BaseV4, _BaseAddress) > IPv4Interface(_BaseV4, _BaseNetwork) > IPv4Network(IPv4Interface) > > IPv6Address(_BaseV6, _BaseAddress) > IPv6Interface(_BaseV6, _BaseNetwork) > IPv6Network(IPv6Interface) > > (essentially, the current ipaddr "Network" objects become "Interface" > objects in PEP 3144, with a new strict "Network" object that has no ip > attribute) > > ? ?3. "network" property renamed to "netaddr" (since it returns an > ? ?address object rather than a network object) > > renamed to network_address. > did the same for the broadcast_address. > > ? ?4. "strict" parameter removed from class signatures, replaced with > ? ?class method for non-strict behaviour > > 'strict' is gone, just create IPv*Interface objects or use the > ip_interface API instead. Network objects are always strict. > > > ? ?5. Factory functions renamed so they don't look like class names > ? ?(ip_network, ip_address, ip) > > Now ip_address, ip_network, ip_interface > > > ? ?6. "strict" parameter on factory functions modified to default to True > ? ?rather than False > > 'strict' is gone. Interfaces allow a host IP, Networks don't. > > ? ?7. Addition of an explicit "IPInterface" class to cover the > ? ?association of an address with a specific network that is currently > ? ?handled by storing arbitrary addresses on IP network objects > > done. 
> > So with a cleanup of the docstrings (and creation of some ReST docs > based on them) a definite +1 from me for inclusion of ipaddr (based on > the 3144 branch in SVN) in 3.3. (with the tweaks to the API, we may > want to use a different name like "ipaddress" or "iptools", though - > otherwise people could be legitimately confused by the differences > relative to the PyPI "ipaddr" module) Cleaning up the docstrings and re-tooling the PEP was where I stalled after addressing your comments. Easy enough to complete if there's still interest. Note, http://pypi.python.org/pypi/ipaddr is actually the same module, but down a few versions. I'm not sure if your concern is about the same library having such a different api or if you had thought they were completely different libraries. Cheers, peter > Cheers, > Nick. > > -- > Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia -- Peter Moody? ? ? Google? ? 1.650.253.7306 Security Engineer? pgp:0xC3410038 From ncoghlan at gmail.com Mon Jan 30 23:04:28 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 08:04:28 +1000 Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package In-Reply-To: <20120130195922.459cd010@pitrou.net> References: <20120127161051.3a47b26c@resist.wooz.org> <20120127224858.671af059@pitrou.net> <20120127175414.385567b6@resist.wooz.org> <878vks2yfl.fsf@uwakimon.sk.tsukuba.ac.jp> <8739b02pdv.fsf@uwakimon.sk.tsukuba.ac.jp> <20120130195922.459cd010@pitrou.net> Message-ID: On Tue, Jan 31, 2012 at 4:59 AM, Antoine Pitrou wrote: > It would be nice if that pronouncement or decision could outline the > steps required to include an "experimental" module in the stdlib, and > the steps required to move it from "experimental" to "stable". Actually, that's a good idea - Eli, care to try your hand at writing up a counter-PEP to 408 that more explicitly documents Guido's preferred approach? 
It should document a standard note to be placed in the module documentation and in What's New for experimental/provisional/whatever modules. For example: "The module has been included in the standard library on a provisional basis. While major changes are not anticipated, as long as this notice remains in place, backwards incompatible changes are permitted if deemed necessary by the standard library developers. Such changes will not be made gratuitously - they will occur only if serious API flaws are uncovered that were missed prior to inclusion of the module. If the small chance of such changes is not acceptable for your use, the module is also available from PyPI with full backwards compatibility guarantees." (include direct link to module on PyPI) As far as the provisional->stable transition goes, I'd say there are a couple of options: 1. Just make it part of the normal release process to ask for each provisional module "This hasn't been causing any dramas, shall we drop the provisional warning?" 2. Explicitly create 'release blocker' tracker issues for the *next* release whenever a provisional module is added. These will basically say "either drop the provisional warning for module or bump this issue along to the next release" Former is obviously easier, latter means we're less likely to forget to do it. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Mon Jan 30 23:09:22 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 08:09:22 +1000 Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 -- Standard library __preview__ package) In-Reply-To: References: Message-ID: On Tue, Jan 31, 2012 at 7:52 AM, Peter Moody wrote: > Note, http://pypi.python.org/pypi/ipaddr is actually the same module, > but down a few versions. I'm not sure if your concern is about the > same library having such a different api or if you had thought they > were completely different libraries. 
No, I knew that - my point was that the changes in the PEP 3144 branch are backwards incompatible with the existing ipaddr API (mainly due to the always-strict Network objects, with the permissive behaviour moved out to the separate Interface objects, but also due to the renamed factory functions), so it may be easier to just give the 3144 version of the module a different name. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From bauertomer at gmail.com Mon Jan 30 23:11:04 2012 From: bauertomer at gmail.com (T.B.) Date: Tue, 31 Jan 2012 00:11:04 +0200 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> <4F26E420.2060707@gmail.com> Message-ID: <4F271578.80000@gmail.com> On 2012-01-30 20:52, Guido van Rossum wrote: > TB, what's your use case for passing a float to a semaphore? > Semaphores are conceptually tied to integers. You've kept arguing a > few times now that the workaround you need are clumsy, but you've not > explained why you're passing floats in the first place. A "fractional > resource" just doesn't sound like a real use case to me. > Not an example from real life and certainly not one that can't be worked around; rather a thing that caught my eyes while looking at Lib/threading.py: Say you have a "known" constant guaranteed bandwidth and you need to split it among several connections which each of them take a known fixed amount of bandwidth (no more, no less). How many connections can I reliably serve? TOTAL_BANDWIDTH/BANDWIDTH_PER_CONNECTION. Well, actually int(_)... Side note: If someone really want a discrete math implementation of a semaphore, you can replace _value with a list of resources. Then you check in acquire() "while not self._resources:" and pop a resource. In that case when a semaphore is used as a context manager it can have a useful 'as' clause. 
To me it seems too complicated for something that should be simple like a semaphore. Regards, TB From anacrolix at gmail.com Mon Jan 30 23:11:22 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Tue, 31 Jan 2012 09:11:22 +1100 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> <4F26E420.2060707@gmail.com> Message-ID: It's also potentially lossy if you incremented and decremented until integer precision is lost. My vote is for an int type check. No casting. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jan 30 23:19:42 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 08:19:42 +1000 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> <4F26E420.2060707@gmail.com> Message-ID: On Tue, Jan 31, 2012 at 8:11 AM, Matt Joiner wrote: > It's also potentially lossy if you incremented and decremented until integer > precision is lost. My vote is for an int type check. No casting. operator.index() is built for that purpose (it's what we use these days to restrict slicing to integers). +1 for the type restriction from me. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From guido at python.org Mon Jan 30 23:14:51 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jan 2012 14:14:51 -0800 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> <4F26E420.2060707@gmail.com> Message-ID: On Mon, Jan 30, 2012 at 2:11 PM, Matt Joiner wrote: > It's also potentially lossy if you incremented and decremented until integer > precision is lost. My vote is for an int type check. No casting. +1. Anything else is insane scope creep for something called "Semaphore". 
-- --Guido van Rossum (python.org/~guido) From benjamin at python.org Mon Jan 30 23:23:42 2012 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 30 Jan 2012 17:23:42 -0500 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> <4F26E420.2060707@gmail.com> Message-ID: 2012/1/30 Nick Coghlan : > On Tue, Jan 31, 2012 at 8:11 AM, Matt Joiner wrote: >> It's also potentially lossy if you incremented and decremented until integer >> precision is lost. My vote is for an int type check. No casting. > > operator.index() is built for that purpose (it's what we use these > days to restrict slicing to integers). > > +1 for the type restriction from me. We don't need a type check. Just pass integers (obviously the only right type) to it. -- Regards, Benjamin From victor.stinner at haypocalc.com Tue Jan 31 00:31:13 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 31 Jan 2012 00:31:13 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects Message-ID: Hi, In issues #13882 and #11457, I propose to add an argument to functions returning timestamps to choose the timestamp format. Python uses float in most cases whereas float is not enough to store a timestamp with a resolution of 1 nanosecond. I added recently time.clock_gettime() to Python 3.3 which has a resolution of a nanosecond. The (first?) new timestamp format will be decimal.Decimal because it is able to store any timestamp in any resolution without loosing bits. Instead of adding a boolean argument, I would prefer to support more formats. 
My last patch provides the following formats: - "float": float (used by default) - "decimal": decimal.Decimal - "datetime": datetime.datetime - "timespec": (sec, nsec) tuple # I don't think that we need it, it is just another example The proposed API is: time.time(format="datetime") time.clock_gettime(time.CLOCK_REALTIME, format="decimal") os.stat(path, timestamp="datetime") etc. This API has an issue: importing the datetime or decimal module is implicit; I don't know if it is really an issue. (In my last patch, the import is done too late, but it can be fixed; it is not really a problem.) Alexander Belopolsky proposed to use time.time(format=datetime.datetime) instead. -- The first step would be to add an argument to functions returning timestamps. The second step is to accept these new formats (Decimal?) as input, for datetime.datetime.fromtimestamp() and os.utime() for example. (Using decimal.Decimal, we may remove os.utimens() and use the right function depending on the timestamp resolution.) -- I prefer Decimal over a dummy tuple like (sec, nsec) because you can do arithmetic on it: t2-t1, a+b, t/k, etc. It also stores the resolution of the clock: time.time() and time.clock_gettime() have for example different resolutions (sec, ms, us for time.time() and ns for clock_gettime()). The decimal module is still implemented in Python, but there is a working implementation in C which is much faster. 
Store timestamps as Decimal can be a motivation to integrate the C implementation :-) -- Examples with the time module: $ ./python Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31) >>> import time >>> time.time() 1327611705.948446 >>> time.time('decimal') Decimal('1327611708.988419') >>> t1=time.time('decimal'); t2=time.time('decimal'); t2-t1 Decimal('0.000550') >>> t1=time.time('float'); t2=time.time('float'); t2-t1 5.9604644775390625e-06 >>> time.clock_gettime(time.CLOCK_MONOTONIC, 'decimal') Decimal('1211833.389740312') >>> time.clock_getres(time.CLOCK_MONOTONIC, 'decimal') Decimal('1E-9') >>> time.clock() 0.12 >>> time.clock('decimal') Decimal('0.120000') Examples with os.stat: $ ./python Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24) >>> import os >>> s=os.stat("setup.py", timestamp="datetime") >>> s.st_mtime - s.st_ctime datetime.timedelta(0) >>> print(s.st_atime - s.st_ctime) 52 days, 1:44:06.191293 >>> os.stat("setup.py", timestamp="timespec").st_ctime (1323458640, 702327236) >>> os.stat("setup.py", timestamp="decimal").st_ctime Decimal('1323458640.702327236') Victor From anacrolix at gmail.com Tue Jan 31 00:50:45 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Tue, 31 Jan 2012 10:50:45 +1100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: Sounds good, but I also prefer Alexander's method. The type information is already encoded in the class object. This way you don't need to maintain a mapping of strings to classes, and other functions/third party can join in the fun without needing access to the latest canonical mapping. Lastly there will be no confusion or contention for duplicate keys. On Jan 31, 2012 10:32 AM, "Victor Stinner" wrote: > Hi, > > In issues #13882 and #11457, I propose to add an argument to functions > returning timestamps to choose the timestamp format. 
Python uses float > in most cases whereas float is not enough to store a timestamp with a > resolution of 1 nanosecond. I added recently time.clock_gettime() to > Python 3.3 which has a resolution of a nanosecond. The (first?) new > timestamp format will be decimal.Decimal because it is able to store > any timestamp in any resolution without loosing bits. Instead of > adding a boolean argument, I would prefer to support more formats. My > last patch provides the following formats: > > - "float": float (used by default) > - "decimal": decimal.Decimal > - "datetime": datetime.datetime > - "timespec": (sec, nsec) tuple # I don't think that we need it, it > is just another example > > The proposed API is: > > time.time(format="datetime") > time.clock_gettime(time.CLOCK_REALTIME, format="decimal") > os.stat(path, timestamp="datetime) > etc. > > This API has an issue: importing the datetime or decimal object is > implicit, I don't know if it is really an issue. (In my last patch, > the import is done too late, but it can be fixed, it is not really a > matter.) > > Alexander Belopolsky proposed to use > time.time(format=datetime.datetime) instead. > > -- > > The first step would be to add an argument to functions returning > timestamps. The second step is to accept these new formats (Decimal?) > as input, for datetime.datetime.fromtimestamp() and os.utime() for > example. > > (Using decimal.Decimal, we may remove os.utimens() and use the right > function depending on the timestamp resolution.) > > -- > > I prefer Decimal over a dummy tuple like (sec, nsec) because you can > do arithmetic on it: t2-t1, a+b, t/k, etc. It stores also the > resolution of the clock: time.time() and time.clock_gettime() have for > example different resolution (sec, ms, us for time.time() and ns for > clock_gettime()). > > The decimal module is still implemented in Python, but there is > working implementation in C which is much faster. 
Store timestamps as > Decimal can be a motivation to integrate the C implementation :-) > > -- > > Examples with the time module: > > $ ./python > Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31) > >>> import time > >>> time.time() > 1327611705.948446 > >>> time.time('decimal') > Decimal('1327611708.988419') > >>> t1=time.time('decimal'); t2=time.time('decimal'); t2-t1 > Decimal('0.000550') > >>> t1=time.time('float'); t2=time.time('float'); t2-t1 > 5.9604644775390625e-06 > >>> time.clock_gettime(time.CLOCK_MONOTONIC, 'decimal') > Decimal('1211833.389740312') > >>> time.clock_getres(time.CLOCK_MONOTONIC, 'decimal') > Decimal('1E-9') > >>> time.clock() > 0.12 > >>> time.clock('decimal') > Decimal('0.120000') > > Examples with os.stat: > > $ ./python > Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24) > >>> import os > >>> s=os.stat("setup.py", timestamp="datetime") > >>> s.st_mtime - s.st_ctime > datetime.timedelta(0) > >>> print(s.st_atime - s.st_ctime) > 52 days, 1:44:06.191293 > >>> os.stat("setup.py", timestamp="timespec").st_ctime > (1323458640, 702327236) > >>> os.stat("setup.py", timestamp="decimal").st_ctime > Decimal('1323458640.702327236') > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jan 31 01:51:09 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 31 Jan 2012 09:51:09 +0900 Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 -- Standard library __preview__ package) In-Reply-To: References: Message-ID: <87pqe01ypu.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > 1. 
Draft ReStructuredText documentation for inclusion in the stdlib docs > > (still needed) No wonder people (not directly involved in development of the module) think that the proponents don't care! What good is a battery if the odds are even that you will hook it up with wrong polarity and fry your expensive components? I don't mean to criticize the proponents and mentors of *this* PEP; I recall the ipaddr vs. netaddr discussions, and clearly the API needed and got a lot of changes. That's definitely a chilling factor for writing a second document that largely covers the same material as the PEP. On the other hand, people who are not battery manufacturers have every right to use stdlib-ready documentation as a litmus test for readiness (and even if you think otherwise, you can't stop them). While you probably won't get a lot of comments from those people if you publish such docs, if you don't publish docs, you will get none. I suggest emphasizing (in the 408bis PEP that Nick suggested) the importance of documentation in convincing the "just users" audience (which is the one that stdlib is really aimed at) of the value and readiness of a module proposed for stdlib integration. From ncoghlan at gmail.com Tue Jan 31 02:26:09 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 11:26:09 +1000 Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 -- Standard library __preview__ package) In-Reply-To: <87pqe01ypu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87pqe01ypu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jan 31, 2012 at 10:51 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > ?> ? ? 1. Draft ReStructuredText documentation for inclusion in the stdlib docs > ?> > ?> (still needed) > > No wonder people (not directly involved in development of the module) > think that the proponents don't care! ?What good is a battery if the > odds are even that you will hook it up with wrong polarity and fry > your expensive components? 
Thinking about how to document the library from a network engineer's perspective was actually the driving force behind my asking for the Address/Interface/Network split in the PEP 3144 branch. Without that, Network tries to fill both the Interface and Network role and it becomes a bit of a nightmare to write coherent prose documentation. Sure, merging them can *work* from a programming point of view, but you can't document it that way and have the API seems sensible to anyone familiar with the underlying networking concepts. Now that ReadTheDocs exists, it is of course *much* easier to draft and publish such documentation than it once was (*not-so-subtle-hint*). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From g.brandl at gmx.net Tue Jan 31 07:22:08 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 31 Jan 2012 07:22:08 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: Am 31.01.2012 00:50, schrieb Matt Joiner: > Sounds good, but I also prefer Alexander's method. The type information is > already encoded in the class object. This way you don't need to maintain a > mapping of strings to classes, and other functions/third party can join in the > fun without needing access to the latest canonical mapping. Lastly there will be > no confusion or contention for duplicate keys. Sorry, I don't think it makes any sense to pass around classes as flags. Sure, if you do something directly with the class, it's fine, but in this case that's impossible. So you will be testing if format is datetime.datetime: ... elif format is decimal.Decimal: ... else: ... which has no advantage at all over if format == "datetime": ... elif format == "decimal": ... else: Not to speak of formats like "timespec" that don't have a respective class. And how do you propose to handle the extensibility you speak of to work? 
Georg From stefan_ml at behnel.de Tue Jan 31 07:55:29 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 31 Jan 2012 07:55:29 +0100 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: References: <4F23C657.9050501@hotpy.org> <20120130201352.6fc893e9@pitrou.net> Message-ID: stefan brunthaler, 30.01.2012 20:18: >> Well, you're aware that Python already uses threaded code where >> available? Or are you testing against Python 2? >> > Yes, and I am building on that. I assume "yes" here means "yes, I'm aware" and not "yes, I'm using Python 2", right? And you're building on top of the existing support for threaded code in order to improve it? Stefan From g.brandl at gmx.net Tue Jan 31 08:12:22 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 31 Jan 2012 08:12:22 +0100 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: References: <4F23C657.9050501@hotpy.org> Message-ID: Am 30.01.2012 20:06, schrieb stefan brunthaler: >> Do you have a public repository for the code, so we can take a look? >> > I have created a patch (as Benjamin wanted) and put all of the > resources (i.e., benchmark results and the patch itself) on my home > page: > http://www.ics.uci.edu/~sbruntha/pydev.html If I read the patch correctly, most of it is auto-generated (and there is probably a few spurious changes that blow it up, such as the python-gdb.py file). But the tool that actually generates the code doesn't seem to be included? (Which means that in this form, the patch couldn't possibly be accepted.) 
Georg From ncoghlan at gmail.com Tue Jan 31 08:16:06 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 17:16:06 +1000 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: On Tue, Jan 31, 2012 at 9:31 AM, Victor Stinner wrote: > Hi, > > In issues #13882 and #11457, I propose to add an argument to functions > returning timestamps to choose the timestamp format. Python uses float > in most cases whereas float is not enough to store a timestamp with a > resolution of 1 nanosecond. I added recently time.clock_gettime() to > Python 3.3 which has a resolution of a nanosecond. The (first?) new > timestamp format will be decimal.Decimal because it is able to store > any timestamp in any resolution without loosing bits. Instead of > adding a boolean argument, I would prefer to support more formats. I think this is definitely worth elaborating in a PEP (to recap the long discussion in #11457 if nothing else). In particular, I'd want to see a very strong case being made for supporting multiple formats over standardising on a *single* new higher precision format (for example, using decimal.Decimal in conjunction with integration of Stefan's cdecimal work) that can then be converted to other formats (like datetime) via the appropriate APIs. "There are lots of alternatives, so let's choose not to choose!" is a bad way to design an API. Helping to make decisions like this by laying out the alternatives and weighing up their costs and benefits is one of the major reasons the PEP process exists. Regards, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From victor.stinner at haypocalc.com Tue Jan 31 10:42:39 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 31 Jan 2012 10:42:39 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: > I think this is definitely worth elaborating in a PEP (to recap the > long discussion in #11457 if nothing else). The discussion in issues #13882 and #11457 already lists many alternatives with their costs and benefits, but I can produce a PEP if you need a summary. > In particular, I'd want to > see a very strong case being made for supporting multiple formats over > standardising on a *single* new higher precision format (for example, > using decimal.Decimal in conjunction with integration of Stefan's > cdecimal work) that can then be converted to other formats (like > datetime) via the appropriate APIs. To convert a Decimal to a datetime object, we have already the datetime.datetime.fromtimestamp() function (it converts Decimal to float, but the function can be improved without touching its API). But I like the possibility of getting the file modification time directly as a datetime object to have something like: >>> s=os.stat("setup.py", timestamp="datetime") >>> print(s.st_atime - s.st_ctime) 52 days, 1:44:06.191293 We have already more than one timestamp format: os.stat() uses int or float depending on os.stat_float_times() value. In 5 years, we may prefer to use directly float128 instead of Decimal. I prefer to have an extensible API to prepare future needs, even if we just add Decimal today. Hum, by the way, we need a "int" format for os.stat(), so os.stat_float_times() can be deprecated. 
So there will be a minimum of 3 types: - int - float - decimal.Decimal Victor From ncoghlan at gmail.com Tue Jan 31 12:11:37 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jan 2012 21:11:37 +1000 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: On Tue, Jan 31, 2012 at 7:42 PM, Victor Stinner wrote: >> I think this is definitely worth elaborating in a PEP (to recap the >> long discussion in #11457 if nothing else). > > The discussion in issues #13882 and #11457 already lists many > alternatives with their costs and benefits, but I can produce a PEP if > you need a summary. PEPs are about more than just providing a summary - they're about presenting the alternatives in a clear form instead of having them scattered across a long meandering tracker discussion. Laying out the alternatives and clearly articulating their pros and cons (as Larry attempted to do on the tracker) *helps to make better decisions*. I counted several options presented as possibilities and I probably missed some: - expose the raw POSIX (seconds, nanoseconds) 2-tuples (lots of good reasons not to go that way) - use decimal.Decimal (with or without cdecimal) - use float128 (nixed due to cross-platform supportability problems) - use datetime (bad idea for the reasons Martin mentioned) - use timedelta (not mentioned on the tracker, but a *much* better fit for a timestamp than datetime, since timestamps are relative to the epoch while datetime objects try to be absolute) A PEP would also allow the following items to be specifically addressed: - a survey of what other languages are doing to cope with nanosecond time resolutions (as suggested by Raymond but not actually done as far I could see on the tracker) - how to avoid a negative performance impact on os.stat() (new API? flag argument? new lazily populated attributes accessed by name only?) 
Guido's admonition against analysis paralysis doesn't mean we should go to the other extreme and skip clearly documenting our analysis of complex problems altogether (particularly for something like this which may end up having ramifications for a lot of other time related code). Having a low-level module like os needing to know about higher-level types like decimal.Decimal and datetime.datetime (or even timedelta) should be setting off all kinds of warning bells. Of all the possibilties that offer decent arithmetic support, timedelta is probably the one currently most suited to being pushed down to the os level, although decimal.Decimal is also a contender if backed up by Stefan's C implementation. You're right that supporting this does mean being able to at least select between 'int', 'float' and output, but that's the kind of case that can be made most clearly in a PEP. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From p.f.moore at gmail.com Tue Jan 31 12:47:27 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 31 Jan 2012 11:47:27 +0000 Subject: [Python-Dev] cdecimal (Was: Store timestamps as decimal.Decimal objects) Message-ID: On 31 January 2012 11:11, Nick Coghlan wrote: > although decimal.Decimal is also a contender if backed up by > Stefan's C implementation. As you mention this, and given the ongoing thread about __preview__ and "nearly ready for stdlib" modules, what is the current position on cdecimal? I seem to recall it being announced some time ago, but I don't recall any particular discussions/conclusions about including it in the stdlib. Is it being considered for stdlib inclusion? What obstacles remain before inclusion (clearly not many, if it's being seriously considered as an option to support functions in something as fundamental as os)? Do Guido's comments on the __preview__ thread make any difference here? (Note - I don't have any particular *need* for cdecimal, I'm just curious...) Paul. 
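(An aside on why cdecimal and Decimal keep coming up: the precision claim driving this whole thread is easy to verify. A C double carries ~15-16 significant decimal digits, while an epoch timestamp with nanoseconds needs 19 — doubles near 1.3e9 seconds are spaced roughly 238 ns apart. A quick illustration, with a made-up timestamp:)

```python
from decimal import Decimal

ns = 1328006975123456789            # a 2012 timestamp, in integer nanoseconds

as_decimal = Decimal(ns) / Decimal(10**9)   # exact: all 19 digits preserved
as_float = ns / 1e9                         # rounds to the nearest double

print(as_decimal)                   # 1328006975.123456789
print(Decimal(as_float))            # nearby, but the sub-microsecond digits
                                    # are already gone
```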
From victor.stinner at haypocalc.com Tue Jan 31 13:08:21 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 31 Jan 2012 13:08:21 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: Hi, 2012/1/31 Matt Joiner : > Sounds good, but I also prefer Alexander's method. The type information is > already encoded in the class object. Ok, I posted a patch version 6 to use types instead of strings. I also prefer types because it solves the "hidden import" issue. > This way you don't need to maintain a > mapping of strings to classes, and other functions/third party can join in > the fun without needing access to the latest canonical mapping. Lastly there > will be no confusion or contention for duplicate keys. My patch checks isinstance(format, type), format.__module__ and format.__name__ to do the "mapping". It is not a direct mapping because I don't always call the same method; the implementation is completely different for each type. I don't think that we need user-defined timestamp formats. My last patch provides 5 formats: - int - float - decimal.Decimal - datetime.datetime - datetime.timedelta (I removed the timespec format; I consider that we don't need it.) Examples: >>> time.time() 1328006975.681211 >>> time.time(format=int) 1328006979 >>> time.time(format=decimal.Decimal) Decimal('1328006983.761119') >>> time.time(format=datetime.datetime) datetime.datetime(2012, 1, 31, 11, 49, 49, 409831) >>> print(time.time(format=datetime.timedelta)) 15370 days, 10:49:52.842116 If someone wants another format, he/she should build his/her own format on top of an existing one. datetime.datetime and datetime.timedelta can be used on any function, but the datetime.datetime format gives surprising results on clocks using an arbitrary start like time.clock() or time.wallclock(). We may raise an error in these cases. 
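(The patch implements this dispatch in C, but its observable behaviour can be sketched in pure Python. The function name and the exact conversion rules below are illustrative only, not lifted from the patch; in particular, real clock functions return the raw reading, not a (sec, nsec) pair you construct yourself.)

```python
import datetime
from decimal import Decimal

def convert_timestamp(sec, nsec, format=float):
    """Convert a (seconds, nanoseconds) clock reading to the requested type."""
    if format is int:
        return sec
    if format is float:
        return sec + nsec * 1e-9        # may round; float lacks ns precision
    if format is Decimal:
        return Decimal(sec) + Decimal(nsec) / Decimal(10**9)
    if format is datetime.timedelta:
        # time since the epoch; timedelta only resolves microseconds
        return datetime.timedelta(seconds=sec, microseconds=nsec // 1000)
    if format is datetime.datetime:
        # surprising for clocks with an arbitrary start, as Victor notes
        return datetime.datetime.fromtimestamp(sec).replace(microsecond=nsec // 1000)
    raise ValueError("unknown timestamp format: %r" % (format,))

print(convert_timestamp(1328006979, 123456789, Decimal))
# -> 1328006979.123456789
```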
From solipsis at pitrou.net Tue Jan 31 13:13:30 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 31 Jan 2012 13:13:30 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects References: Message-ID: <20120131131330.2349dc6b@pitrou.net> On Tue, 31 Jan 2012 21:11:37 +1000 Nick Coghlan wrote: > > Having a low-level module like os needing to know about higher-level > types like decimal.Decimal and datetime.datetime (or even timedelta) > should be setting off all kinds of warning bells. Decimal is ideally low-level (it's a number), it's just that it has a complicated high-level implementation :) But we can't use Decimal by default, for the obvious reason (performance impact that threatens to contaminate other parts of the code through operator application). > Of all the > possibilties that offer decent arithmetic support, timedelta is > probably the one currently most suited to being pushed down to the os > level, although decimal.Decimal is also a contender if backed up by > Stefan's C implementation. I'm -1 on using timedelta. This is a purity proposition that will make no sense to the average user. By the way, datetimes are relative too, by the same reasoning. Regards Antoine. From victor.stinner at haypocalc.com Tue Jan 31 13:20:23 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 31 Jan 2012 13:20:23 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: > - use datetime (bad idea for the reasons Martin mentioned) It is only a bad idea if it is the only available choice. > - use timedelta (not mentioned on the tracker, but a *much* better fit > for a timestamp than datetime, since timestamps are relative to the > epoch while datetime objects try to be absolute) Last version of my patch supports also timedelta. 
> - a survey of what other languages are doing to cope with nanosecond > time resolutions (as suggested by Raymond but not actually done as far > I could see on the tracker) I didn't check that right now. I don't know if it is really revelant because some languages don't have a builtin Decimal class or no "builtin" datetime module. > - how to avoid a negative performance impact on os.stat() (new API? > flag argument? new lazily populated attributes accessed by name only?) Because timestamp is an optional argument to os.stat() and the behaviour is unchanged by default, the performance impact of my patch on os.stat() is null (if you don't set timestamp argument). > Having a low-level module like os needing to know about higher-level > types like decimal.Decimal and datetime.datetime (or even timedelta) > should be setting off all kinds of warning bells. What is the problem of using decimal in the os module? Especially if it is an option. In my patch version 6, the timestamp argument is now a type (e.g. decimal.Decimal) instead of a string, so the os module doesn't import directly the module (well, to be exact, it does import the module, but the module should already be in the cache, sys.modules). > You're right that supporting this does mean being able to at least > select between 'int', 'float' and output, but that's > the kind of case that can be made most clearly in a PEP. Why do you want to limit the available formats? Why not giving the choice to the user between Decimal, datetime and timedelta? Each type has a different use case and different features, sometimes exclusive. 
Victor From stefan_ml at behnel.de Tue Jan 31 14:19:40 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 31 Jan 2012 14:19:40 +0100 Subject: [Python-Dev] PEPs and cons (was: Re: Store timestamps as decimal.Decimal objects) In-Reply-To: References: Message-ID: Nick Coghlan, 31.01.2012 12:11: > On Tue, Jan 31, 2012 at 7:42 PM, Victor Stinner wrote: >>> I think this is definitely worth elaborating in a PEP (to recap the >>> long discussion in #11457 if nothing else). >> >> The discussion in issues #13882 and #11457 already lists many >> alternatives with their costs and benefits, but I can produce a PEP if >> you need a summary. > > PEPs are about more than just providing a summary - they're about > presenting the alternatives in a clear form instead of having them > scattered across a long meandering tracker discussion. There was a keynote by Jan Lehnardt (of CouchDB fame) on last year's PyCon-DE on the end of language wars and why we should just give each other a hug and get along and all that. To seed some better understanding, he had come up with mottoes for the Ruby and Python language communities, which find themselves in continuous quarrel. I remember the motto for Python being "you do it right - and you document it". A clear hit IMHO. Decisions about language changes and environmental changes (such as the stdlib) aren't easily taken in the Python world, but when they are taken, they tend to show a good amount of well reflected common sense, and we make it transparent how they come to be by writing a PEP about them, so that we (and others) can go back and read them up later on when they are being questioned again or when similar problems appear in other languages. That's a good thing, and we should keep that up. Stefan From s.brunthaler at uci.edu Tue Jan 31 16:33:15 2012 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Tue, 31 Jan 2012 07:33:15 -0800 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... 
In-Reply-To: References: <4F23C657.9050501@hotpy.org> <20120130201352.6fc893e9@pitrou.net> Message-ID: > I assume "yes" here means "yes, I'm aware" and not "yes, I'm using Python > 2", right? And you're building on top of the existing support for threaded > code in order to improve it? > Your assumption is correct, I'm sorry for the sloppiness (I was heading out for lunch.) None of the code is 2.x compatible, all of my work has always targeted Python 3.x. My work does not improve threaded code (as in interpreter dispatch technique), but enables efficient and purely interpretative inline caching via quickening. (So, after execution of BINARY_ADD, I rewrite the specific occurence of the bytecode instruction to a, say, FLOAT_ADD instruction and ensure that my assumption is correct in the FLOAT_ADD instruction.) Thanks, --stefan From s.brunthaler at uci.edu Tue Jan 31 16:46:04 2012 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Tue, 31 Jan 2012 07:46:04 -0800 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: References: <4F23C657.9050501@hotpy.org> Message-ID: > If I read the patch correctly, most of it is auto-generated (and there > is probably a few spurious changes that blow it up, such as the > python-gdb.py file). Hm, honestly I don't know where the python-gdb.py file comes from, I thought it came with the switch from 3.1 to the tip version I was using. Anyways, I did not tuch it or at least have no recollection of doing so. Regarding the spurious changes: This might very well be, regression testing works, and it would actually be fairly easy to figure out crashes (e.g., by tracing all executed bytecode instructions and seeing if all of them are actually executed, I could easily do that if wanted/necessary.) > But the tool that actually generates the code > doesn't seem to be included? ?(Which means that in this form, the > patch couldn't possibly be accepted.) 
> Well, the tool is not included because it does a lot more (e.g., generate the code for elimination of reference count operations.) Unfortunately, my interpreter architecture that achieves the highest speedups is more complicated, and I got the feeling that this is not going well with python-dev. So, I had the idea of basically using just one (but a major one) optimization technique and going with that. I don't see why you would need my code generator, though. Not that I cared, but I would need to strip down and remove many parts of it and also make it more accessible to other people. However, if python-dev decides that it wants to include the optimizations and requires the code generator, I'll happily chip in the extra work an give you the corresponding code generator, too. Thanks, --stefan From brett at python.org Tue Jan 31 16:54:22 2012 From: brett at python.org (Brett Cannon) Date: Tue, 31 Jan 2012 10:54:22 -0500 Subject: [Python-Dev] cdecimal (Was: Store timestamps as decimal.Decimal objects) In-Reply-To: References: Message-ID: On Tue, Jan 31, 2012 at 06:47, Paul Moore wrote: > On 31 January 2012 11:11, Nick Coghlan wrote: > > although decimal.Decimal is also a contender if backed up by > > Stefan's C implementation. > > As you mention this, and given the ongoing thread about __preview__ > and "nearly ready for stdlib" modules, what is the current position on > cdecimal? I seem to recall it being announced some time ago, but I > don't recall any particular discussions/conclusions about including it > in the stdlib. > > Is it being considered for stdlib inclusion? What obstacles remain > before inclusion (clearly not many, if it's being seriously considered > as an option to support functions in something as fundamental as os)? > Do Guido's comments on the __preview__ thread make any difference > here? > > (Note - I don't have any particular *need* for cdecimal, I'm just > curious...) 
> Because cdecimal is just an accelerated version of decimal there is no specific stdlib restriction from it going in. At this point I think it just needs to be finished and then committed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bauertomer at gmail.com Tue Jan 31 19:46:54 2012 From: bauertomer at gmail.com (T.B.) Date: Tue, 31 Jan 2012 20:46:54 +0200 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> <4F26E420.2060707@gmail.com> Message-ID: <4F28371E.7000001@gmail.com> On 2012-01-31 00:23, Benjamin Peterson wrote: > 2012/1/30 Nick Coghlan: >> On Tue, Jan 31, 2012 at 8:11 AM, Matt Joiner wrote: >>> It's also potentially lossy if you incremented and decremented until integer >>> precision is lost. My vote is for an int type check. No casting. >> >> operator.index() is built for that purpose (it's what we use these >> days to restrict slicing to integers). >> >> +1 for the type restriction from me. > > We don't need a type check. Just pass integers (obviously the only > right type) to it. > > When a float is used, think of debugging such a thing, e.g. a float from integer division. I don't care if float (or generally non-integers) are not allowed in threading.Semaphore, but please make it fail with a bang. Regards, TB From alexander.belopolsky at gmail.com Tue Jan 31 19:57:49 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 31 Jan 2012 13:57:49 -0500 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: On Mon, Jan 30, 2012 at 6:31 PM, Victor Stinner wrote: > Alexander Belopolsky proposed to use > time.time(format=datetime.datetime) instead. Just to make sure my view is fully expressed: I am against adding flag arguments to time.time(). My preferred solution to exposing high resolution clocks is to do it in a separate module. 
You can even call the new function time() and access it as hirestime.time(). Longer names that reflect the various time representations are also an option: hirestime.decimal_time(), hirestime.datetime_time(), etc. The suggestion to use the actual type as a flag was motivated by the desire to require a module import before the fancy time.time() can be called. When you care about nanoseconds in your timestamps, you won't tolerate an I/O delay between calling time() and getting the result. A separate module can solve this issue much better: simply import decimal or datetime or both at the top of the module. From alexander.belopolsky at gmail.com Tue Jan 31 20:08:31 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 31 Jan 2012 14:08:31 -0500 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: <20120131131330.2349dc6b@pitrou.net> References: <20120131131330.2349dc6b@pitrou.net> Message-ID: On Tue, Jan 31, 2012 at 7:13 AM, Antoine Pitrou wrote: > On Tue, 31 Jan 2012 21:11:37 +1000 > Nick Coghlan wrote: >> >> Having a low-level module like os needing to know about higher-level >> types like decimal.Decimal and datetime.datetime (or even timedelta) >> should be setting off all kinds of warning bells. > > Decimal is ideally low-level (it's a number), it's just that it has a > complicated high-level implementation :) FWIW, my vote is also for Decimal and against datetime or timedelta. (I dream of Decimal replacing float in Python 4000, so take my vote with an appropriate amount of salt.
:-) From raymond.hettinger at gmail.com Tue Jan 31 21:10:16 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 31 Jan 2012 12:10:16 -0800 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: References: <4F24538E.9060705@gmail.com> Message-ID: <0817F73E-95E6-4B88-B967-3DBE0A40D7C6@gmail.com> On Jan 29, 2012, at 6:11 PM, John O'Connor wrote: > On Sat, Jan 28, 2012 at 3:07 PM, Benjamin Peterson wrote: >> But why would you want to pass a float? It seems like API abuse to me. >> > > Agreed. Anything else seems meaningless. I concur. This is very much a non-problem. There is no need to add more code and slow running time with superfluous type checks. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Tue Jan 31 21:49:52 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 31 Jan 2012 21:49:52 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: Am 31.01.2012 13:08, schrieb Victor Stinner: >> This way you don't need to maintain a >> mapping of strings to classes, and other functions/third party can join in >> the fun without needing access to the latest canonical mapping. Lastly there >> will be no confusion or contention for duplicate keys. > > My patch checks isinstance(format, type), format.__module__ and > format.__name__ to do the "mapping". It is not a direct mapping > because I don't always call the same method, the implementation is > completely different for each type. > > I don't think that we need user defined timestamp formats. My last > patch provides 5 formats: > > - int > - float > - decimal.Decimal > - datetime.datetime > - datetime.timedelta > > (I removed the timespec format, I consider that we don't need it.) Rather, I guess you removed it because it didn't fit the "types as flags" pattern.
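To make the pattern concrete, here is a rough sketch of type-based dispatch next to the string-flag alternative (neither function exists anywhere; both are purely illustrative):

```python
# Purely illustrative: neither of these functions exists in the stdlib.
# They sketch the two API styles being debated in this thread.
import time
from decimal import Decimal


def time_with_type_flag(format=float):
    """'Types as flags': the caller passes the class it wants back."""
    ts = time.time()
    if format is float:
        return ts
    if format is Decimal:
        return Decimal(repr(ts))
    raise ValueError("unsupported format: %r" % (format,))


def time_with_string_flag(format="float"):
    """String flags: the alternative raised later in this thread."""
    ts = time.time()
    if format == "float":
        return ts
    if format == "decimal":
        return Decimal(repr(ts))
    raise ValueError("unsupported format: %r" % (format,))
```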
As I said in another message, another hint that this is the wrong API design: Will the APIs ever support passing in types other than these five? Probably not, so I strongly believe they should not be passed in as types. Georg From g.brandl at gmx.net Tue Jan 31 21:50:57 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 31 Jan 2012 21:50:57 +0100 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: References: <4F23C657.9050501@hotpy.org> Message-ID: Am 31.01.2012 16:46, schrieb stefan brunthaler: >> If I read the patch correctly, most of it is auto-generated (and there >> are probably a few spurious changes that blow it up, such as the >> python-gdb.py file). > > Hm, honestly I don't know where the python-gdb.py file comes from, I > thought it came with the switch from 3.1 to the tip version I was > using. Anyways, I did not touch it, or at least have no recollection of > doing so. Regarding the spurious changes: this might very well be; regression testing works, and it would actually be fairly easy to > figure out crashes (e.g., by tracing all executed bytecode > instructions and seeing if all of them are actually executed, I could > easily do that if wanted/necessary.) There is also the issue of the two test modules removed from the test suite. >> But the tool that actually generates the code >> doesn't seem to be included? (Which means that in this form, the >> patch couldn't possibly be accepted.) >> > Well, the tool is not included because it does a lot more (e.g., > generating the code for elimination of reference count operations). > Unfortunately, my interpreter architecture that achieves the highest > speedups is more complicated, and I got the feeling that this would not > go over well with python-dev. So, I had the idea of basically using just > one (but a major one) optimization technique and going with that. I > don't see why you would need my code generator, though.
Not that I > would mind, but I would need to strip down and remove many parts of it and > also make it more accessible to other people. However, if python-dev > decides that it wants to include the optimizations and requires the > code generator, I'll happily chip in the extra work and give you the > corresponding code generator, too. Well, nobody wants to review generated code. Georg From stefan at brunthaler.net Tue Jan 31 22:17:41 2012 From: stefan at brunthaler.net (stefan brunthaler) Date: Tue, 31 Jan 2012 13:17:41 -0800 Subject: [Python-Dev] Python 3 optimizations, continued, continued again... In-Reply-To: References: <4F23C657.9050501@hotpy.org> Message-ID: > There is also the issue of the two test modules removed from the > test suite. > Oh, I'm sorry, it seems the patch did contain too much of my development stuff. (I did remove them before, because they were always failing due to the instruction opcodes being changed by quickening; they pass the tests, though.) > Well, nobody wants to review generated code. > I agree. The code generator basically uses templates that contain the information, and a dump of the C-structure of several types to traverse and see which of them implements which functions. There is really no magic there; the most "complex" thing is to get the inline-cache miss checks for function calls right. But I tried to make the generated code look pretty, so that working with it is not too much of a hassle. The code generator itself is a little bit more complicated, so I am not sure it would help a lot... best, --stefan From victor.stinner at haypocalc.com Tue Jan 31 22:41:39 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 31 Jan 2012 22:41:39 +0100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: > (I removed the timespec format, I consider that we don't need it.) > > Rather, I guess you removed it because it didn't fit the "types as flags" > pattern.
I removed it because I don't like tuples: you cannot do arithmetic on a tuple, like t2-t1. Printing a tuple doesn't give you nice output. It is used in C because you have no other choice, but in Python, we can do better. > As I said in another message, another hint that this is the wrong API design: > Will the APIs ever support passing in types other than these five? Probably > not, so I strongly believe they should not be passed in as types. I don't know if we should only support 3 types today, or more, but I suppose that we will add more later (e.g. if datetime is replaced by another new and better datetime module). You mean that we should use a string instead of a type, so time.time(format="decimal")? Or do something else? Victor From bauertomer at gmail.com Tue Jan 31 22:58:40 2012 From: bauertomer at gmail.com (T.B.) Date: Tue, 31 Jan 2012 23:58:40 +0200 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: <0817F73E-95E6-4B88-B967-3DBE0A40D7C6@gmail.com> References: <4F24538E.9060705@gmail.com> <0817F73E-95E6-4B88-B967-3DBE0A40D7C6@gmail.com> Message-ID: <4F286410.3050002@gmail.com> > I concur. This is very much a non-problem. > There is no need to add more code and slow > running time with superfluous type checks. > > > Raymond > What do you think about the following check from threading.py: @@ -317,8 +317,6 @@ self._value = value def acquire(self, blocking=True, timeout=None): - if not blocking and timeout is not None: - raise ValueError("can't specify timeout for non-blocking acquire") rc = False (There are similar checks in Modules/_threadmodule.c) Removing the check means that we ignore the timeout argument when blocking=False. Currently in the multiprocessing docs there is an outdated note concerning acquire() methods that also says: "If block is False then timeout is ignored". This makes the acquire() methods of the threading and multiprocessing modules have different behaviors.
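For reference, the current threading behaviour that the diff above would remove can be exercised like this:

```python
import threading

# Current behaviour of threading.Semaphore.acquire(): a non-blocking
# acquire combined with a timeout raises ValueError -- this is exactly
# the check the diff above would delete.
sem = threading.Semaphore(1)
try:
    sem.acquire(blocking=False, timeout=1)
except ValueError as err:
    print(err)  # can't specify timeout for non-blocking acquire

# Without a timeout, the non-blocking acquire works as usual.
print(sem.acquire(blocking=False))  # True
```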
Related: http://bugs.python.org/issue850728#msg103227 TB From tjreedy at udel.edu Tue Jan 31 23:07:53 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 31 Jan 2012 17:07:53 -0500 Subject: [Python-Dev] threading.Semaphore()'s counter can become negative for non-ints In-Reply-To: <0817F73E-95E6-4B88-B967-3DBE0A40D7C6@gmail.com> References: <4F24538E.9060705@gmail.com> <0817F73E-95E6-4B88-B967-3DBE0A40D7C6@gmail.com> Message-ID: On 1/31/2012 3:10 PM, Raymond Hettinger wrote: > > On Jan 29, 2012, at 6:11 PM, John O'Connor wrote: > >> On Sat, Jan 28, 2012 at 3:07 PM, Benjamin Peterson >> > wrote: >>> But why would you want to pass a float? It seems like API abuse to me. >>> >> >> Agreed. Anything else seems meaningless. > > I concur. This is very much a non-problem. > There is no need to add more code and slow > running time with superfluous type checks. If it does not now, the doc could be changed to say that the arg must be an int, and behavior is undefined otherwise. Then the contract is clear. -- Terry Jan Reedy From anacrolix at gmail.com Tue Jan 31 23:41:56 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Wed, 1 Feb 2012 09:41:56 +1100 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: Message-ID: Nick mentioned using a single type and converting upon return; I'm starting to like that more. A limited set of time formats is mostly arbitrary, and there will always be a performance hit deciding which type to return. The goal here is to allow high precision timings with minimal cost. A separate module, and agreement on the best-performing high-precision type, is, I think, the best way forward. On Feb 1, 2012 8:47 AM, "Victor Stinner" wrote: > > (I removed the timespec format, I consider that we don't need it.) > > > > Rather, I guess you removed it because it didn't fit the "types as flags" > > pattern. > > I removed it because I don't like tuple: you cannot do arithmetic on > tuple, like t2-t1.
Print a tuple doesn't give you a nice output. It is > used in C because you have no other choice, but in Python, we can do > better. > > > As I said in another message, another hint that this is the wrong API > design: > > Will the APIs ever support passing in types other than these five? > Probably > > not, so I strongly believe they should not be passed in as types. > > I don't know if we should only support 3 types today, or more, but I > suppose that we will add more later (e.g. if datetime is replaced by > another new and better datetime module). > > You mean that we should use a string instead of type, so > time.time(format="decimal")? Or do something else? > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/anacrolix%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at hotpy.org Tue Jan 31 23:58:48 2012 From: mark at hotpy.org (Mark Shannon) Date: Tue, 31 Jan 2012 22:58:48 +0000 Subject: [Python-Dev] Store timestamps as decimal.Decimal objects In-Reply-To: References: <20120131131330.2349dc6b@pitrou.net> Message-ID: <4F287228.1090403@hotpy.org> Alexander Belopolsky wrote: > On Tue, Jan 31, 2012 at 7:13 AM, Antoine Pitrou wrote: >> On Tue, 31 Jan 2012 21:11:37 +1000 >> Nick Coghlan wrote: >>> Having a low-level module like os needing to know about higher-level >>> types like decimal.Decimal and datetime.datetime (or even timedelta) >>> should be setting off all kinds of warning bells. >> Decimal is ideally low-level (it's a number), it's just that it has a >> complicated high-level implementation :) > > FWIW, my vote is also for Decimal and against datetime or timedelta. > (I dream of Decimal replacing float in Python 4000, so take my vote > with an appropriate amount of salt. 
:-) Why not add a new function rather than modifying time.time()? (After all, it's just a timestamp; does it really need nanosecond precision?) For those who do want super-accuracy, add a new function time.picotime() (it could be nanotime, but why not future-proof it :) ) which returns an int representing the number of picoseconds since the epoch. ints never lose precision and never overflow. Cheers, Mark. From trent at snakebite.org Sun Jan 29 21:23:14 2012 From: trent at snakebite.org (Trent Nelson) Date: Sun, 29 Jan 2012 15:23:14 -0500 Subject: [Python-Dev] Switching to Visual Studio 2010 In-Reply-To: <20120126215431.Horde.dSI3OML8999PIb2HJXHnfeA@webmail.df.eu> References: <4F15DD85.6000905@v.loewis.de> <4F15E1A1.6090303@v.loewis.de> <20120126215431.Horde.dSI3OML8999PIb2HJXHnfeA@webmail.df.eu> Message-ID: <20120129202309.GA21774@snakebite.org> On Thu, Jan 26, 2012 at 12:54:31PM -0800, martin at v.loewis.de wrote: > > Is this considered a new feature that has to be in by the first beta? > > I'm hoping to have it completed much sooner than that so we can get > > mileage on it, but is there a cutoff for changing the compiler? > > At some point, I'll start doing this myself if it hasn't been done by > then, and I would certainly want the build process adjusted (with > all buildbots updated) before beta 1. I... I think I might have already done this, inadvertently. I needed an x64 VS2010 debug build of Subversion/APR*/Python a few weeks ago -- forgetting the fact that we're still on VS2008. By the time I got to building Python, I'd already coerced everything else to use VS2010, so I just bit the bullet and coerced Python to use it too, including updating all the buildbot scripts and relevant externals to use VS2010, too. Things that immediately come to mind as potentially being useful: * Three new buildbot scripts: - build-amd64-vs10.bat - clean-amd64-vs10.bat - external-amd64-vs10.bat * Updates to externals/(tcl|tk)-8.5.9.x so that they both build with VS2010.
This was a tad fiddly. I ended up creating makefile.vs10 from win/makefile.vc and encapsulating the changes there, then calling that from the buildbot *-vs10.bat scripts. I had to change win/rules.vc, too. * A few other things I can't remember off the top of my head. So, I guess my question is, is that work useful? Based on Martin's original list, it seems to check a few boxes. Brian, what are your plans? Are you going to continue working in hg.python.org/sandbox/vs2010port then merge everything over when ready? I have some time available to work on this for the next three weeks or so and would like to help out. Regards, Trent.